# E2E API Testing Strategy for DeployHQ
We need end-to-end tests that exercise the full user flow through the API: registering a user, creating a project, creating different kinds of servers, and creating a deployment. Our repos are on GitHub and we run RSpec in CI, but setting up a full E2E environment there is complex and slow.
All necessary endpoints are already exposed through the Main API (26 controllers with `valid_api_actions`). We need a way to test the complete flow against a real running instance before deploying to staging.
Tests must run before the staging deployment, where we merge all new features. The flow is:

```
Push to staging branch
  → GitHub Actions: RSpec tests (4 parallel groups)
  → E2E smoke test        ← NEW
  → Capistrano deploy to staging
```
| Option | Approach | Effort | Maintenance | Reliability | CI-ready |
|---|---|---|---|---|---|
| A | Claude Code (ad-hoc) | ~1h | None | N/A (manual) | No |
| B | Ruby script agent | ~1-2 days | Medium | High | Yes |
| C | AI agent (hybrid) | ~2-3 days | Low | Medium-High | Yes |
| D | Postman/Newman | ~1-2 days | High | High | Yes |
## Option A: Claude Code (ad-hoc)

Run Claude Code interactively, ask it to call the staging API via curl.
- Setup: An API key for a test account on staging
- Trigger: Manual -- someone runs it before deploying
- Pros: Zero setup, good for one-off exploration
- Cons: Cannot be automated, not a deploy gate
Verdict: Useful for debugging, not for automated testing.
## Option B: Ruby Script Agent

A standalone Ruby script that calls the staging API in a fixed sequence, asserting responses at each step.
### Architecture

```
GitHub Actions runner (Ubuntu)
└── ruby scripts/e2e_smoke_test.rb
    └── HTTP calls → staging.deployhq.com (or local Docker)
```
### What the Script Does

```ruby
class E2ESmokeTest
  def initialize(base_url, api_key)
    @base_url = base_url
    @api_key = api_key
  end

  def run!
    project = create_project(name: "e2e-test-#{Time.now.to_i}")
    create_ssh_server(project)
    create_ftp_server(project)
    trigger_deployment(project)
    wait_for_deployment(project)
    cleanup(project)
    puts "E2E smoke test passed"
  end
end
```
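One step of the skeleton might be fleshed out like the following sketch. The `/projects` path, payload shape, and 201 success code are assumptions about the Main API; adjust them to the real controllers:

```ruby
require "net/http"
require "json"
require "uri"

class E2ESmokeTest
  def initialize(base_url, api_key)
    @base_url = base_url
    @api_key = api_key
  end

  # Build an authenticated JSON POST without sending it, so request
  # construction can be checked in isolation.
  def build_post(path, payload)
    uri = URI.join(@base_url, path)
    req = Net::HTTP::Post.new(uri)
    req.basic_auth("", @api_key) # equivalent of curl -u :<api_key>
    req["Content-Type"] = "application/json"
    req.body = JSON.generate(payload)
    req
  end

  # Hypothetical first step: path, params, and 201 are assumptions.
  def create_project(name:)
    req = build_post("/projects", { project: { name: name } })
    res = Net::HTTP.start(req.uri.host, req.uri.port,
                          use_ssl: req.uri.scheme == "https") { |http| http.request(req) }
    raise "create_project failed: #{res.code} #{res.body}" unless res.code == "201"
    JSON.parse(res.body)
  end
end
```

Separating request construction from sending keeps each step unit-testable without a live server.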
### Setup Required

- Script: `scripts/e2e_smoke_test.rb` using `Net::HTTP` or `httparty`
- Test account: Dedicated account on staging with a known API key
- GitHub secrets: `E2E_BASE_URL`, `E2E_API_KEY`
- Workflow job: New job in `.github/workflows/actions.yaml`
### GitHub Actions Integration

```yaml
e2e-smoke:
  needs: [test]
  if: github.ref == 'refs/heads/staging'
  runs-on: ubuntu-latest
  steps:
    - uses: actions/checkout@v4
    - uses: ruby/setup-ruby@v1
      with:
        ruby-version: 2.7.8
    - run: ruby scripts/e2e_smoke_test.rb
      env:
        E2E_BASE_URL: ${{ secrets.E2E_STAGING_URL }}
        E2E_API_KEY: ${{ secrets.E2E_API_KEY }}
```
### Change-Aware Variant

The script can detect what changed and select additional test scenarios:

```ruby
changed_files = `git diff staging --name-only`.split("\n")

scenarios = [HappyPath] # always run baseline
if changed_files.any? { |f| f.include?("servers") }
  scenarios << AllServerTypes
end
if changed_files.any? { |f| f.include?("deployment") }
  scenarios << DeploymentEdgeCases
end

scenarios.each(&:run!)
```
This gives adaptability without AI non-determinism. But you maintain the scenarios manually.
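The scenario constants referenced above could be plain objects sharing a `run!` interface. A sketch with placeholder steps (real steps would make API calls against staging):

```ruby
# Minimal scenario objects for the change-aware selector.
# Steps are placeholder lambdas; real ones would call the staging API.
Scenario = Struct.new(:name, :steps) do
  def run!
    steps.each(&:call)
    puts "#{name} passed"
  end
end

HappyPath = Scenario.new("happy_path", [
  -> { :create_project },   # placeholder
  -> { :trigger_deploy },   # placeholder
])

AllServerTypes = Scenario.new("all_server_types", [
  -> { :create_ssh_server },
  -> { :create_ftp_server },
])

DeploymentEdgeCases = Scenario.new("deployment_edge_cases", [
  -> { :deploy_with_no_changes },  # placeholder
])
```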
### Where It Runs
| Location | How | Best for |
|---|---|---|
| GitHub Actions runner | HTTP calls over internet to staging | Automated CI gate |
| Staging server itself | Via Capistrano before-hook, calls localhost | Avoids network dependency |
| Developer machine | ruby scripts/e2e_smoke_test.rb | Debugging |
GitHub Actions runner is the natural choice since tests already run there.
### Strengths
- Deterministic: same code = same result
- Fast: no LLM round-trips, just HTTP calls
- Debuggable: "Step 3 returned 422: {error details}"
- Free: no API costs
- Fits the stack: Ruby, same language as the app
### Weaknesses
- Tests must be maintained manually
- New API features require new test code
- Doesn't automatically adapt to changes
## Option C: AI Agent (Hybrid)

An AI agent that reads the git diff, understands what changed, and generates/executes targeted tests.
### The Core Advantage

Option B runs the same test every time. Option C adapts:

- Reads the git diff (`staging..HEAD`)
- Sees "the server creation endpoint changed its params"
- Adjusts tests to cover that specific change
- Also runs the baseline happy path
No test maintenance -- coverage adapts to the changeset automatically.
### The Reliability Concern
AI agents are non-deterministic. Concrete failure modes:
| Failure Mode | Example |
|---|---|
| False positive | Agent decides a 422 is "expected behavior" and passes |
| False negative | Agent constructs an invalid request, reports a bug that isn't there |
| Flaky | Same code passes Monday, fails Tuesday with no changes |
| Prompt sensitivity | Minor wording change alters which API calls it makes |
| Hard to debug | Need to read agent reasoning trace to understand what it tried |
### Recommended Approach: Hybrid (AI Plan + Deterministic Execution)
Split the AI and execution into two steps to get adaptability with reliability:

```
Step 1 (AI):     Claude reads the diff → outputs a JSON test plan
Step 2 (Script): Ruby script reads the JSON → executes API calls deterministically
```

This way:

- The AI decides what to test (adaptive)
- The script decides how to test (deterministic)
- When it fails, inspect `test_plan.json` to see the AI's decisions separately from execution
### Implementation Approaches
#### Approach 1: Claude CLI in GitHub Actions
```yaml
e2e-smoke:
  needs: [test]
  if: github.ref == 'refs/heads/staging'
  steps:
    - uses: actions/checkout@v4
    - name: Generate test plan
      run: |
        claude --print --dangerously-skip-permissions \
          "Read the git diff for staging. \
          Output a JSON array of test scenarios. \
          Format: [{method, path, body, expected_status}]" \
          > test_plan.json
      env:
        ANTHROPIC_API_KEY: ${{ secrets.ANTHROPIC_API_KEY }}
    - name: Execute test plan
      run: ruby scripts/run_e2e_plan.rb test_plan.json
      env:
        E2E_BASE_URL: ${{ secrets.E2E_STAGING_URL }}
        E2E_API_KEY: ${{ secrets.E2E_API_KEY }}
```
Pros: Uses tooling you already have (Claude CLI)

Cons: `--dangerously-skip-permissions` is broad; Opus is expensive per run
#### Approach 2: Claude Agent SDK (purpose-built)

A standalone agent with constrained tools:
```
tools = [
  Tool("create_project",    calls POST /projects),
  Tool("create_server",     calls POST /projects/:id/servers),
  Tool("create_deployment", calls POST /projects/:id/deployments),
  Tool("check_deployment",  calls GET /projects/:id/deployments/:id),
  Tool("read_git_diff",     runs git diff),
  Tool("cleanup",           deletes test resources),
]

agent = Agent(
  model="claude-sonnet-4-6",
  tools=tools,
  instructions="""
    You are an E2E tester for DeployHQ.
    1. Read the git diff to see what changed
    2. Always run the baseline happy path
    3. Add targeted tests for changed areas
    4. Clean up all test resources
    5. Exit 0 on pass, exit 1 on fail with details
  """
)
```
Pros: Constrained tools (the agent can only call defined operations), cheaper (Sonnet), faster

Cons: New dependency (Agent SDK), you maintain tool definitions
#### Approach 3: Hybrid (Recommended for Option C)
AI generates the plan, script executes it:
```yaml
- name: Generate test plan (AI)
  run: |
    claude --print --dangerously-skip-permissions \
      "Read the git diff for staging. \
      Output a JSON array of test scenarios to run. \
      Format: [{method, path, body, expected_status}]" \
      > test_plan.json

- name: Execute test plan (deterministic)
  run: ruby scripts/run_e2e_plan.rb test_plan.json
```
Pros: AI adaptability + deterministic execution, easy to debug (inspect `test_plan.json`)

Cons: Two-step process, AI might generate invalid plans
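The invalid-plan risk can be contained with a schema check before anything is executed. A sketch (the allowed methods and status range are assumptions to tighten against the real API surface):

```ruby
require "json"

# Reject malformed AI-generated plans before any API call is made.
ALLOWED_METHODS = %w[GET POST PUT PATCH DELETE].freeze

def validate_plan!(plan)
  raise "plan must be an array" unless plan.is_a?(Array)
  plan.each_with_index do |step, i|
    raise "step #{i}: not an object" unless step.is_a?(Hash)
    method = step["method"].to_s.upcase
    raise "step #{i}: bad method #{method}" unless ALLOWED_METHODS.include?(method)
    raise "step #{i}: path must start with /" unless step["path"].to_s.start_with?("/")
    status = step["expected_status"].to_i
    raise "step #{i}: bad expected_status" unless (100..599).cover?(status)
  end
  plan
end
```

If validation fails, the job can fall back to the baseline scenarios instead of failing the deploy outright.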
### Cost Comparison
| Approach | Model | Cost per run |
|---|---|---|
| Claude CLI (full) | Opus | ~$0.50 |
| Agent SDK | Sonnet | ~$0.05 |
| Hybrid (one prompt) | Sonnet | ~$0.03 |
| Option B (no AI) | N/A | $0 |
### Where It Runs

Same as Option B: GitHub Actions runner. The difference is it also calls the Anthropic API for reasoning, so it needs an `ANTHROPIC_API_KEY` secret in addition to the staging API key.

```
GitHub Actions runner (Ubuntu)
├── Anthropic API → Claude (generates test plan / makes decisions)
└── HTTP calls → staging.deployhq.com (executes tests)
```
## Option D: Postman/Newman

Define the full flow as a Postman collection, run via the newman CLI in CI.

- Setup: Build the collection in the Postman UI, export JSON, add `newman` to CI
- Trigger: `newman run e2e-collection.json -e staging-env.json`
- Pros: Visual editor, shareable
- Cons: Collections are JSON blobs (hard to review in PRs), limited programming logic, another tool to maintain, can't share logic with the Ruby codebase
Verdict: Works but awkward for a Ruby shop.
| Criteria | Option B (Script) | Option C Hybrid | Option C Agent SDK |
|---|---|---|---|
| Setup effort | 1-2 days | 2-3 days | 3-5 days |
| Ongoing maintenance | Medium (manual) | Low (AI adapts) | Low (AI adapts) |
| Reliability | High | Medium-High | Medium |
| Cost per run | $0 | ~$0.03 | ~$0.05 |
| Deterministic | Yes | Execution yes, plan no | No |
| Debuggability | Easy | Easy (inspect JSON) | Medium (reasoning trace) |
| Adapts to changes | Only with manual scenarios | Yes (AI reads diff) | Yes (AI reads diff) |
| New dependencies | None | Claude CLI in CI | Agent SDK + Python/TS |
| CI integration | Simple | Medium | Medium |
## Start with Option B
Get a working, reliable deploy gate with minimal effort. Covers the happy path and critical flows. No new dependencies, no API costs, runs in your existing CI.
## Layer Option C (Hybrid) on top
Once B is stable, add the hybrid AI layer:
- AI generates additional test scenarios based on the git diff
- Deterministic script executes them
- Baseline happy path always runs regardless of AI output (fall back to Option B if AI fails)
This gives you the adaptability of AI with the reliability of a script, and the baseline always protects you even if the AI generates a bad plan.
## Architecture of the combined approach

```
GitHub Actions (staging branch only)
│
├── Job: RSpec tests (existing, 4 groups)
│
└── Job: E2E smoke test
    │
    ├── Step 1: Run baseline scenarios (Option B)
    │   └── Always: happy path, all server types, deployment
    │
    ├── Step 2: AI-generated scenarios (Option C hybrid, optional)
    │   ├── Claude reads git diff
    │   ├── Outputs additional test_plan.json
    │   └── Script executes the plan
    │
    └── Step 3: Cleanup test resources
```
If Step 2 fails or produces garbage, Step 1 still ran and gave you the core coverage. Over time, you can tune the AI prompt and expand its scope as you build confidence.
Based on the existing Main API controllers, the full flow would exercise:
| Step | Method | Endpoint | Controller |
|---|---|---|---|
| Create project | POST | /projects | projects_controller.rb |
| Set up repository | POST | /projects/:id/repository | repositories_controller.rb |
| Create SSH server | POST | /projects/:id/servers | servers_controller.rb |
| Create FTP server | POST | /projects/:id/servers | servers_controller.rb |
| Add config file | POST | /projects/:id/config_files | config_files_controller.rb |
| Add env variable | POST | /projects/:id/environment_variables | environment_variables_controller.rb |
| Add build command | POST | /projects/:id/build_commands | build_commands_controller.rb |
| Trigger deployment | POST | /projects/:id/deployments | deployments_controller.rb |
| Check status | GET | /projects/:id/deployments/:id | deployments_controller.rb |
| Cleanup | DELETE | /projects/:id | projects_controller.rb |
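The "Check status" step implies a polling loop. A sketch with the status fetch injected so the loop is testable; the status values ("completed"/"failed") are assumptions about the deployments endpoint:

```ruby
# Poll until the deployment reaches a terminal state.
# fetch_status is injected (a lambda wrapping GET /projects/:id/deployments/:id)
# so the loop can be exercised without a live server.
def wait_for_deployment(fetch_status:, max_attempts: 30, interval: 2)
  max_attempts.times do
    status = fetch_status.call
    return status if status == "completed"
    raise "deployment failed" if status == "failed"
    sleep interval
  end
  raise "deployment still running after #{max_attempts} checks"
end
```

Bounding the attempts matters in CI: a hung deployment should fail the gate rather than stall the workflow until the runner timeout.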
Authentication: HTTP Basic Auth with the API key (`-u :<api_key>`).
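In Ruby, the `-u :<api_key>` form corresponds to Basic Auth with an empty username (the key below is a placeholder):

```ruby
require "net/http"
require "uri"

# Equivalent of curl's `-u :<api_key>`: empty username, API key as password.
req = Net::HTTP::Get.new(URI("https://staging.deployhq.com/projects"))
req.basic_auth("", "my-api-key") # placeholder key
```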
## Open Questions

- Target environment: Run against live staging (tests the current deploy) or a Docker instance in CI (tests the new code before deploy)?
- Test account: Create a dedicated test account on staging, or provision one dynamically?
- Cleanup strategy: Delete test resources after each run, or use a TTL-based cleanup?
- Failure handling: Should E2E failure block the deploy, or just alert?
- Server connectivity: E2E tests create servers -- do we need real servers to connect to, or just validate the API accepts the configuration?