We need end-to-end tests that exercise the full user flow through the API: registering a user, creating a project, creating different kinds of servers, and creating a deployment. Our repos are on GitHub and we run RSpec in CI, but setting up a full E2E environment there is complex and slow.
All necessary endpoints are already exposed through the Main API (26 controllers with `valid_api_actions`). We need a way to test the complete flow against a real running instance before deploying to staging.
Tests must run before the staging deployment, where we merge all new features. The flow is:
```
Push to staging branch
  → GitHub Actions: RSpec tests (4 parallel groups)
  → E2E smoke test   ← NEW
  → Capistrano deploy to staging
```
| Option | Approach | Effort | Maintenance | Reliability | CI-ready |
|---|---|---|---|---|---|
| A | Claude Code (ad-hoc) | ~1h | None | N/A (manual) | No |
| B | Ruby script agent | ~1-2 days | Medium | High | Yes |
| C | AI agent (hybrid) | ~2-3 days | Low | Medium-High | Yes |
| D | Postman/Newman | ~1-2 days | High | High | Yes |
Run Claude Code interactively and ask it to call the staging API via curl.
Verdict: Useful for debugging, not for automated testing.
A standalone Ruby script that calls the staging API in a fixed sequence, asserting responses at each step.
```
GitHub Actions runner (Ubuntu)
└── ruby scripts/e2e_smoke_test.rb
    └── HTTP calls → staging.deployhq.com (or local Docker)
```
```ruby
class E2ESmokeTest
  def initialize(base_url, api_key)
    @base_url = base_url
    @api_key = api_key
  end

  def run!
    project = create_project(name: "e2e-test-#{Time.now.to_i}")
    create_ssh_server(project)
    create_ftp_server(project)
    trigger_deployment(project)
    wait_for_deployment(project)
    cleanup(project)
    puts "E2E smoke test passed"
  end
end
```
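The `create_*` helpers above would all funnel through one HTTP wrapper. A minimal sketch, assuming Net::HTTP and the Basic Auth scheme noted at the end of this doc (empty username, API key as password); the class and method names here are illustrative, not existing code:

```ruby
require "json"
require "net/http"
require "uri"

# Hypothetical HTTP plumbing for the smoke test (not the real client).
class E2EClient
  def initialize(base_url, api_key)
    @base_url = base_url
    @api_key  = api_key
  end

  # Map an HTTP verb ("post", :delete, "GET") to its Net::HTTP request class.
  def request_class(method)
    Net::HTTP.const_get(method.to_s.capitalize) # e.g. "post" -> Net::HTTP::Post
  end

  # Perform one JSON API call; raise on any non-2xx so a failed step
  # aborts the whole run with a useful message.
  def request(method, path, body = nil)
    uri = URI.join(@base_url, path)
    req = request_class(method).new(uri)
    req.basic_auth("", @api_key) # empty username, API key as password
    req["Content-Type"] = "application/json"
    req.body = JSON.generate(body) if body
    res = Net::HTTP.start(uri.host, uri.port, use_ssl: uri.scheme == "https") { |h| h.request(req) }
    raise "#{method} #{path} failed: #{res.code} #{res.body}" unless res.code.start_with?("2")
    res.body.to_s.empty? ? {} : JSON.parse(res.body)
  end
end
```

With this in place, `create_project` becomes a one-liner along the lines of `request(:post, "/projects", name: name)`.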
Implementation:

- `scripts/e2e_smoke_test.rb` using Net::HTTP or httparty
- Configured via `E2E_BASE_URL` and `E2E_API_KEY` environment variables
- A new job in `.github/workflows/actions.yaml`:

```yaml
e2e-smoke:
  needs: [test]
  if: github.ref == 'refs/heads/staging'
  runs-on: ubuntu-latest
  steps:
    - uses: actions/checkout@v4
    - uses: ruby/setup-ruby@v1
      with:
        ruby-version: 2.7.8
    - run: ruby scripts/e2e_smoke_test.rb
      env:
        E2E_BASE_URL: ${{ secrets.E2E_STAGING_URL }}
        E2E_API_KEY: ${{ secrets.E2E_API_KEY }}
```
The script can detect what changed and select additional test scenarios:
```ruby
changed_files = `git diff staging --name-only`.split("\n")

scenarios = [HappyPath] # always run baseline
if changed_files.any? { |f| f.include?("servers") }
  scenarios << AllServerTypes
end
if changed_files.any? { |f| f.include?("deployment") }
  scenarios << DeploymentEdgeCases
end

scenarios.each(&:run!)
```
This gives adaptability without AI non-determinism. But you maintain the scenarios manually.
| Location | How | Best for |
|---|---|---|
| GitHub Actions runner | HTTP calls over internet to staging | Automated CI gate |
| Staging server itself | Via Capistrano before-hook, calls localhost | Avoids network dependency |
| Developer machine | `ruby scripts/e2e_smoke_test.rb` | Debugging |
GitHub Actions runner is the natural choice since tests already run there.
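If the staging-server row ever becomes preferable, the before-hook could look like this; a sketch only, with the task name, the port, and the `:e2e_api_key` setting all assumptions to adapt to the actual Capistrano config:

```ruby
# config/deploy.rb (hypothetical) -- run the smoke test on the staging
# box against the locally running app before updating the release, so
# a failure aborts the deploy without a network round-trip from CI.
namespace :e2e do
  desc "Run E2E smoke test against the locally running app"
  task :smoke do
    on roles(:app) do
      within current_path do
        # E2E_BASE_URL / E2E_API_KEY become environment variables here.
        with e2e_base_url: "http://localhost:3000", e2e_api_key: fetch(:e2e_api_key) do
          execute :ruby, "scripts/e2e_smoke_test.rb"
        end
      end
    end
  end
end

before "deploy:updated", "e2e:smoke"
```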
An AI agent that reads the git diff, understands what changed, and generates/executes targeted tests.
Option B runs the same test every time. Option C adapts:
- Reads the diff (`git diff staging..HEAD`) and selects tests to match the changeset
- No test maintenance: coverage adapts to the changeset automatically
AI agents are non-deterministic. Concrete failure modes:
| Failure Mode | Example |
|---|---|
| False positive | Agent decides a 422 is "expected behavior" and passes |
| False negative | Agent constructs an invalid request, reports a bug that isn't there |
| Flaky | Same code passes Monday, fails Tuesday with no changes |
| Prompt sensitivity | Minor wording change alters which API calls it makes |
| Hard to debug | Need to read agent reasoning trace to understand what it tried |
Split the AI and execution into two steps to get adaptability with reliability:
Step 1 (AI): Claude reads the diff → outputs a JSON test plan
Step 2 (Script): Ruby script reads the JSON → executes API calls deterministically
This way, the AI's decisions are inspectable separately from execution: read `test_plan.json` to see what it chose to run.

Using the Claude CLI in CI:

```yaml
e2e-smoke:
  needs: [test]
  if: github.ref == 'refs/heads/staging'
  steps:
    - uses: actions/checkout@v4
    - name: Generate test plan
      run: |
        claude --print --dangerously-skip-permissions \
          "Read the git diff for staging. \
          Output a JSON array of test scenarios. \
          Format: [{method, path, body, expected_status}]" \
          > test_plan.json
      env:
        ANTHROPIC_API_KEY: ${{ secrets.ANTHROPIC_API_KEY }}
    - name: Execute test plan
      run: ruby scripts/run_e2e_plan.rb test_plan.json
      env:
        E2E_BASE_URL: ${{ secrets.E2E_STAGING_URL }}
        E2E_API_KEY: ${{ secrets.E2E_API_KEY }}
```
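The script side of that pair can stay tiny. A sketch of what `scripts/run_e2e_plan.rb` might contain (a hypothetical file, not in the repo yet); the HTTP transport is passed in as a lambda so the deterministic half can be exercised without a network:

```ruby
require "json"

# Execute a Claude-generated plan step by step, collecting failures.
# `http` is any callable returning a status code for (method, path, body).
def run_plan(plan, http:)
  plan.filter_map do |step|
    status = http.call(step["method"], step["path"], step["body"])
    next if status == step["expected_status"]
    "#{step["method"]} #{step["path"]}: expected #{step["expected_status"]}, got #{status}"
  end
end

plan = JSON.parse(<<~JSON)
  [{"method": "POST", "path": "/projects",
    "body": {"name": "e2e-test"}, "expected_status": 201}]
JSON

# Stubbed transport that always returns 201, so this sample plan passes.
failures = run_plan(plan, http: ->(*_) { 201 })
puts failures.empty? ? "PASS" : failures.join("\n") # → PASS
```

The real script would swap the lambda for an actual HTTP call and `exit(1)` when `failures` is non-empty, which is what makes it a CI gate.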
Pros: uses tooling you already have (the Claude CLI).
Cons: `--dangerously-skip-permissions` is broad, and Opus is expensive per run.
A standalone agent with constrained tools (pseudocode):

```
tools = [
  Tool("create_project",    calls POST /projects),
  Tool("create_server",     calls POST /projects/:id/servers),
  Tool("create_deployment", calls POST /projects/:id/deployments),
  Tool("check_deployment",  calls GET /projects/:id/deployments/:id),
  Tool("read_git_diff",     runs git diff),
  Tool("cleanup",           deletes test resources),
]

agent = Agent(
  model="claude-sonnet-4-6",
  tools=tools,
  instructions="""
    You are an E2E tester for DeployHQ.
    1. Read the git diff to see what changed
    2. Always run the baseline happy path
    3. Add targeted tests for changed areas
    4. Clean up all test resources
    5. Exit 0 on pass, exit 1 on fail with details
  """
)
```
Pros: constrained tools (the agent can only call the defined operations), cheaper (Sonnet), faster.
Cons: a new dependency (Agent SDK), and you maintain the tool definitions.
AI generates the plan, script executes it:
```yaml
- name: Generate test plan (AI)
  run: |
    claude --print --dangerously-skip-permissions \
      "Read the git diff for staging. \
      Output a JSON array of test scenarios to run. \
      Format: [{method, path, body, expected_status}]" \
      > test_plan.json
- name: Execute test plan (deterministic)
  run: ruby scripts/run_e2e_plan.rb test_plan.json
```
Pros: AI adaptability plus deterministic execution, easy to debug (inspect `test_plan.json`).
Cons: two-step process, and the AI might generate invalid plans.
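One way to blunt that last con: have the executor validate the plan's shape before making any API calls. A sketch (the field names follow the prompt format above; the specific checks are assumptions):

```ruby
require "json"

# Reject malformed AI output up front, before any HTTP requests.
ALLOWED_METHODS = %w[GET POST PUT DELETE].freeze

def validate_plan(plan)
  return ["plan is not an array"] unless plan.is_a?(Array)
  plan.each_with_index.flat_map do |step, i|
    errors = []
    errors << "step #{i}: bad method" unless ALLOWED_METHODS.include?(step["method"])
    errors << "step #{i}: path must start with /" unless step["path"].to_s.start_with?("/")
    errors << "step #{i}: expected_status must be an Integer" unless step["expected_status"].is_a?(Integer)
    errors
  end
end

plan = JSON.parse('[{"method": "POST", "path": "/projects", "expected_status": 201}]')
puts validate_plan(plan).empty? ? "plan OK" : validate_plan(plan).inspect # → plan OK
```

A restrictive allow-list (methods, and possibly path prefixes) also means a bad plan cannot touch endpoints outside the test surface.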
| Approach | Model | Cost per run |
|---|---|---|
| Claude CLI (full) | Opus | ~$0.50 |
| Agent SDK | Sonnet | ~$0.05 |
| Hybrid (one prompt) | Sonnet | ~$0.03 |
| Option B (no AI) | N/A | $0 |
Same as Option B: GitHub Actions runner. The difference is it also calls the Anthropic API for reasoning, so it needs an ANTHROPIC_API_KEY secret in addition to the staging API key.
```
GitHub Actions runner (Ubuntu)
├── Anthropic API → Claude (generates test plan / makes decisions)
└── HTTP calls → staging.deployhq.com (executes tests)
```
Define the full flow as a Postman collection, run via newman CLI in CI.
- Add newman to CI: `newman run e2e-collection.json -e staging-env.json`

Verdict: Works, but awkward for a Ruby shop.
| Criteria | Option B (Script) | Option C Hybrid | Option C Agent SDK |
|---|---|---|---|
| Setup effort | 1-2 days | 2-3 days | 3-5 days |
| Ongoing maintenance | Medium (manual) | Low (AI adapts) | Low (AI adapts) |
| Reliability | High | Medium-High | Medium |
| Cost per run | $0 | ~$0.03 | ~$0.05 |
| Deterministic | Yes | Execution yes, plan no | No |
| Debuggability | Easy | Easy (inspect JSON) | Medium (reasoning trace) |
| Adapts to changes | Only with manual scenarios | Yes (AI reads diff) | Yes (AI reads diff) |
| New dependencies | None | Claude CLI in CI | Agent SDK + Python/TS |
| CI integration | Simple | Medium | Medium |
Start with Option B: a working, reliable deploy gate with minimal effort. It covers the happy path and critical flows, with no new dependencies and no API costs, and it runs in your existing CI.
Once B is stable, layer in the hybrid AI step from Option C.
This gives you the adaptability of AI with the reliability of a script, and the baseline always protects you even if the AI generates a bad plan.
```
GitHub Actions (staging branch only)
│
├── Job: RSpec tests (existing, 4 groups)
│
└── Job: E2E smoke test
    │
    ├── Step 1: Run baseline scenarios (Option B)
    │   └── Always: happy path, all server types, deployment
    │
    ├── Step 2: AI-generated scenarios (Option C hybrid, optional)
    │   ├── Claude reads git diff
    │   ├── Outputs additional test_plan.json
    │   └── Script executes the plan
    │
    └── Step 3: Cleanup test resources
```
If Step 2 fails or produces garbage, Step 1 still ran and gave you the core coverage. Over time, you can tune the AI prompt and expand its scope as you build confidence.
Based on the existing Main API controllers, the full flow would exercise:
| Step | Method | Endpoint | Controller |
|---|---|---|---|
| Create project | POST | /projects | projects_controller.rb |
| Set up repository | POST | /projects/:id/repository | repositories_controller.rb |
| Create SSH server | POST | /projects/:id/servers | servers_controller.rb |
| Create FTP server | POST | /projects/:id/servers | servers_controller.rb |
| Add config file | POST | /projects/:id/config_files | config_files_controller.rb |
| Add env variable | POST | /projects/:id/environment_variables | environment_variables_controller.rb |
| Add build command | POST | /projects/:id/build_commands | build_commands_controller.rb |
| Trigger deployment | POST | /projects/:id/deployments | deployments_controller.rb |
| Check status | GET | /projects/:id/deployments/:id | deployments_controller.rb |
| Cleanup | DELETE | /projects/:id | projects_controller.rb |
Authentication: HTTP Basic Auth with an empty username and the API key as the password (`-u :<api_key>` with curl).