# E2E API Testing Strategy for DeployHQ
We need end-to-end tests that exercise the full user flow through the API: registering a user, creating a project, creating different kinds of servers, and creating a deployment. Our repos are on GitHub and we run RSpec in CI, but setting up a full E2E environment there is complex and slow.
All necessary endpoints are already exposed through the Main API (26 controllers with `valid_api_actions`). We need a way to test the complete flow against a real running instance before deploying to staging.
Tests must run before the staging deployment, where we merge all new features. The flow is:

```
Push to staging branch
  → GitHub Actions: RSpec tests (4 parallel groups)
  → E2E smoke test        ← NEW
  → Capistrano deploy to staging
```
| Option | Approach | Effort | Maintenance | Reliability | CI-ready |
|---|---|---|---|---|---|
| A | Claude Code (ad-hoc) | ~1h | None | N/A (manual) | No |
| B | Ruby script agent | ~1-2 days | Medium | High | Yes |
| C | AI agent (hybrid) | ~2-3 days | Low | Medium-High | Yes |
| D | Postman/Newman | ~1-2 days | High | High | Yes |
## Option A: Claude Code (ad-hoc)

Run Claude Code interactively, ask it to call the staging API via curl.
- Setup: An API key for a test account on staging
- Trigger: Manual -- someone runs it before deploying
- Pros: Zero setup, good for one-off exploration
- Cons: Cannot be automated, not a deploy gate
Verdict: Useful for debugging, not for automated testing.
## Option B: Ruby Script Agent

A standalone Ruby script that calls the staging API in a fixed sequence, asserting responses at each step.
### Architecture

```
GitHub Actions runner (Ubuntu)
└── ruby scripts/e2e_smoke_test.rb
    └── HTTP calls → staging.deployhq.com (or local Docker)
```
### What the Script Does

```ruby
class E2ESmokeTest
  def initialize(base_url, api_key)
    @base_url = base_url
    @api_key = api_key
  end

  def run!
    project = create_project(name: "e2e-test-#{Time.now.to_i}")
    create_ssh_server(project)
    create_ftp_server(project)
    trigger_deployment(project)
    wait_for_deployment(project)
    cleanup(project)
    puts "E2E smoke test passed"
  end
end
```
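One step of the skeleton might be fleshed out like the following sketch. The `/projects` path, payload shape, and 201 success code are assumptions about the Main API; adjust them to the real controllers:

```ruby
require "net/http"
require "json"
require "uri"

class E2ESmokeTest
  def initialize(base_url, api_key)
    @base_url = base_url
    @api_key = api_key
  end

  # Build an authenticated JSON POST without sending it, so request
  # construction can be checked in isolation.
  def build_post(path, payload)
    uri = URI.join(@base_url, path)
    req = Net::HTTP::Post.new(uri)
    req.basic_auth("", @api_key) # equivalent of curl -u :<api_key>
    req["Content-Type"] = "application/json"
    req.body = JSON.generate(payload)
    req
  end

  # Hypothetical first step: path, params, and 201 are assumptions.
  def create_project(name:)
    req = build_post("/projects", { project: { name: name } })
    res = Net::HTTP.start(req.uri.host, req.uri.port,
                          use_ssl: req.uri.scheme == "https") { |http| http.request(req) }
    raise "create_project failed: #{res.code} #{res.body}" unless res.code == "201"
    JSON.parse(res.body)
  end
end
```

Separating request construction from sending keeps each step unit-testable without a live server.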
### Setup Required

- Script: `scripts/e2e_smoke_test.rb` using `Net::HTTP` or `httparty`
- Test account: Dedicated account on staging with a known API key
- GitHub secrets: `E2E_BASE_URL`, `E2E_API_KEY`
- Workflow job: New job in `.github/workflows/actions.yaml`
### GitHub Actions Integration

```yaml
e2e-smoke:
  needs: [test]
  if: github.ref == 'refs/heads/staging'
  runs-on: ubuntu-latest
  steps:
    - uses: actions/checkout@v4
    - uses: ruby/setup-ruby@v1
      with:
        ruby-version: 2.7.8
    - run: ruby scripts/e2e_smoke_test.rb
      env:
        E2E_BASE_URL: ${{ secrets.E2E_STAGING_URL }}
        E2E_API_KEY: ${{ secrets.E2E_API_KEY }}
```
### Change-Aware Variant

The script can detect what changed and select additional test scenarios:

```ruby
changed_files = `git diff staging --name-only`.split("\n")

scenarios = [HappyPath] # always run baseline
if changed_files.any? { |f| f.include?("servers") }
  scenarios << AllServerTypes
end
if changed_files.any? { |f| f.include?("deployment") }
  scenarios << DeploymentEdgeCases
end

scenarios.each(&:run!)
```
This gives adaptability without AI non-determinism. But you maintain the scenarios manually.
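The scenario constants referenced above could be plain objects sharing a `run!` interface. A sketch with placeholder steps (real steps would make API calls against staging):

```ruby
# Minimal scenario objects for the change-aware selector.
# Steps are placeholder lambdas; real ones would call the staging API.
Scenario = Struct.new(:name, :steps) do
  def run!
    steps.each(&:call)
    puts "#{name} passed"
  end
end

HappyPath = Scenario.new("happy_path", [
  -> { :create_project },   # placeholder
  -> { :trigger_deploy },   # placeholder
])

AllServerTypes = Scenario.new("all_server_types", [
  -> { :create_ssh_server },
  -> { :create_ftp_server },
])

DeploymentEdgeCases = Scenario.new("deployment_edge_cases", [
  -> { :deploy_with_no_changes },  # placeholder
])
```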
### Where It Runs
| Location | How | Best for |
|---|---|---|
| GitHub Actions runner | HTTP calls over internet to staging | Automated CI gate |
| Staging server itself | Via Capistrano before-hook, calls localhost | Avoids network dependency |
| Developer machine | ruby scripts/e2e_smoke_test.rb | Debugging |
GitHub Actions runner is the natural choice since tests already run there.
### Strengths
- Deterministic: same code = same result
- Fast: no LLM round-trips, just HTTP calls
- Debuggable: "Step 3 returned 422: {error details}"
- Free: no API costs
- Fits the stack: Ruby, same language as the app
### Weaknesses
- Tests must be maintained manually
- New API features require new test code
- Doesn't automatically adapt to changes
## Option C: AI Agent (Hybrid)

An AI agent that reads the git diff, understands what changed, and generates/executes targeted tests.
### The Core Advantage

Option B runs the same test every time. Option C adapts:

- Reads the git diff (`staging..HEAD`)
- Sees "the server creation endpoint changed its params"
- Adjusts tests to cover that specific change
- Also runs the baseline happy path
No test maintenance -- coverage adapts to the changeset automatically.
### The Reliability Concern
AI agents are non-deterministic. Concrete failure modes:
| Failure Mode | Example |
|---|---|
| False positive | Agent decides a 422 is "expected behavior" and passes |
| False negative | Agent constructs an invalid request, reports a bug that isn't there |
| Flaky | Same code passes Monday, fails Tuesday with no changes |
| Prompt sensitivity | Minor wording change alters which API calls it makes |
| Hard to debug | Need to read agent reasoning trace to understand what it tried |
### Recommended Approach: Hybrid (AI Plan + Deterministic Execution)
Split the AI and execution into two steps to get adaptability with reliability:

```
Step 1 (AI):     Claude reads the diff → outputs a JSON test plan
Step 2 (Script): Ruby script reads the JSON → executes API calls deterministically
```

This way:

- The AI decides what to test (adaptive)
- The script decides how to test (deterministic)
- When it fails, inspect `test_plan.json` to see the AI's decisions separately from execution
### Implementation Approaches
#### Approach 1: Claude CLI in GitHub Actions
```yaml
e2e-smoke:
  needs: [test]
  if: github.ref == 'refs/heads/staging'
  steps:
    - uses: actions/checkout@v4
    - name: Generate test plan
      run: |
        claude --print --dangerously-skip-permissions \
          "Read the git diff for staging. \
          Output a JSON array of test scenarios. \
          Format: [{method, path, body, expected_status}]" \
          > test_plan.json
      env:
        ANTHROPIC_API_KEY: ${{ secrets.ANTHROPIC_API_KEY }}
    - name: Execute test plan
      run: ruby scripts/run_e2e_plan.rb test_plan.json
      env:
        E2E_BASE_URL: ${{ secrets.E2E_STAGING_URL }}
        E2E_API_KEY: ${{ secrets.E2E_API_KEY }}
```
Pros: Uses tooling you already have (Claude CLI)

Cons: `--dangerously-skip-permissions` is broad; Opus is expensive per run
#### Approach 2: Claude Agent SDK (purpose-built)

A standalone agent with constrained tools:
```
tools = [
  Tool("create_project",    calls POST /projects),
  Tool("create_server",     calls POST /projects/:id/servers),
  Tool("create_deployment", calls POST /projects/:id/deployments),
  Tool("check_deployment",  calls GET /projects/:id/deployments/:id),
  Tool("read_git_diff",     runs git diff),
  Tool("cleanup",           deletes test resources),
]

agent = Agent(
  model="claude-sonnet-4-6",
  tools=tools,
  instructions="""
    You are an E2E tester for DeployHQ.
    1. Read the git diff to see what changed
    2. Always run the baseline happy path
    3. Add targeted tests for changed areas
    4. Clean up all test resources
    5. Exit 0 on pass, exit 1 on fail with details
  """
)
```
Pros: Constrained tools (the agent can only call defined operations), cheaper (Sonnet), faster

Cons: New dependency (Agent SDK), you maintain tool definitions
#### Approach 3: Hybrid (Recommended for Option C)
AI generates the plan, script executes it:
```yaml
- name: Generate test plan (AI)
  run: |
    claude --print --dangerously-skip-permissions \
      "Read the git diff for staging. \
      Output a JSON array of test scenarios to run. \
      Format: [{method, path, body, expected_status}]" \
      > test_plan.json

- name: Execute test plan (deterministic)
  run: ruby scripts/run_e2e_plan.rb test_plan.json
```
Pros: AI adaptability + deterministic execution, easy to debug (inspect `test_plan.json`)

Cons: Two-step process, AI might generate invalid plans
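The invalid-plan risk can be contained with a schema check before anything is executed. A sketch (the allowed methods and status range are assumptions to tighten against the real API surface):

```ruby
require "json"

# Reject malformed AI-generated plans before any API call is made.
ALLOWED_METHODS = %w[GET POST PUT PATCH DELETE].freeze

def validate_plan!(plan)
  raise "plan must be an array" unless plan.is_a?(Array)
  plan.each_with_index do |step, i|
    raise "step #{i}: not an object" unless step.is_a?(Hash)
    method = step["method"].to_s.upcase
    raise "step #{i}: bad method #{method}" unless ALLOWED_METHODS.include?(method)
    raise "step #{i}: path must start with /" unless step["path"].to_s.start_with?("/")
    status = step["expected_status"].to_i
    raise "step #{i}: bad expected_status" unless (100..599).cover?(status)
  end
  plan
end
```

If validation fails, the job can fall back to the baseline scenarios instead of failing the deploy outright.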
### Cost Comparison
| Approach | Model | Cost per run |
|---|---|---|
| Claude CLI (full) | Opus | ~$0.50 |
| Agent SDK | Sonnet | ~$0.05 |
| Hybrid (one prompt) | Sonnet | ~$0.03 |
| Option B (no AI) | N/A | $0 |
### Where It Runs

Same as Option B: GitHub Actions runner. The difference is it also calls the Anthropic API for reasoning, so it needs an `ANTHROPIC_API_KEY` secret in addition to the staging API key.

```
GitHub Actions runner (Ubuntu)
├── Anthropic API → Claude (generates test plan / makes decisions)
└── HTTP calls → staging.deployhq.com (executes tests)
```
## Option D: Postman/Newman

Define the full flow as a Postman collection, run via the newman CLI in CI.

- Setup: Build the collection in the Postman UI, export JSON, add `newman` to CI
- Trigger: `newman run e2e-collection.json -e staging-env.json`
- Pros: Visual editor, shareable
- Cons: Collections are JSON blobs (hard to review in PRs), limited programming logic, another tool to maintain, can't share logic with the Ruby codebase
Verdict: Works but awkward for a Ruby shop.
| Criteria | Option B (Script) | Option C Hybrid | Option C Agent SDK |
|---|---|---|---|
| Setup effort | 1-2 days | 2-3 days | 3-5 days |
| Ongoing maintenance | Medium (manual) | Low (AI adapts) | Low (AI adapts) |
| Reliability | High | Medium-High | Medium |
| Cost per run | $0 | ~$0.03 | ~$0.05 |
| Deterministic | Yes | Execution yes, plan no | No |
| Debuggability | Easy | Easy (inspect JSON) | Medium (reasoning trace) |
| Adapts to changes | Only with manual scenarios | Yes (AI reads diff) | Yes (AI reads diff) |
| New dependencies | None | Claude CLI in CI | Agent SDK + Python/TS |
| CI integration | Simple | Medium | Medium |
## Start with Option B
Get a working, reliable deploy gate with minimal effort. Covers the happy path and critical flows. No new dependencies, no API costs, runs in your existing CI.
## Layer Option C (Hybrid) on top
Once B is stable, add the hybrid AI layer:
- AI generates additional test scenarios based on the git diff
- Deterministic script executes them
- Baseline happy path always runs regardless of AI output (fall back to Option B if AI fails)
This gives you the adaptability of AI with the reliability of a script, and the baseline always protects you even if the AI generates a bad plan.
## Architecture of the combined approach

```
GitHub Actions (staging branch only)
│
├── Job: RSpec tests (existing, 4 groups)
│
└── Job: E2E smoke test
    │
    ├── Step 1: Run baseline scenarios (Option B)
    │   └── Always: happy path, all server types, deployment
    │
    ├── Step 2: AI-generated scenarios (Option C hybrid, optional)
    │   ├── Claude reads git diff
    │   ├── Outputs additional test_plan.json
    │   └── Script executes the plan
    │
    └── Step 3: Cleanup test resources
```
If Step 2 fails or produces garbage, Step 1 still ran and gave you the core coverage. Over time, you can tune the AI prompt and expand its scope as you build confidence.
Based on the existing Main API controllers, the full flow would exercise:
| Step | Method | Endpoint | Controller |
|---|---|---|---|
| Create project | POST | /projects | projects_controller.rb |
| Set up repository | POST | /projects/:id/repository | repositories_controller.rb |
| Create SSH server | POST | /projects/:id/servers | servers_controller.rb |
| Create FTP server | POST | /projects/:id/servers | servers_controller.rb |
| Add config file | POST | /projects/:id/config_files | config_files_controller.rb |
| Add env variable | POST | /projects/:id/environment_variables | environment_variables_controller.rb |
| Add build command | POST | /projects/:id/build_commands | build_commands_controller.rb |
| Trigger deployment | POST | /projects/:id/deployments | deployments_controller.rb |
| Check status | GET | /projects/:id/deployments/:id | deployments_controller.rb |
| Cleanup | DELETE | /projects/:id | projects_controller.rb |
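The "Check status" step implies a polling loop. A sketch with the status fetch injected so the loop is testable; the status values ("completed"/"failed") are assumptions about the deployments endpoint:

```ruby
# Poll until the deployment reaches a terminal state.
# fetch_status is injected (a lambda wrapping GET /projects/:id/deployments/:id)
# so the loop can be exercised without a live server.
def wait_for_deployment(fetch_status:, max_attempts: 30, interval: 2)
  max_attempts.times do
    status = fetch_status.call
    return status if status == "completed"
    raise "deployment failed" if status == "failed"
    sleep interval
  end
  raise "deployment still running after #{max_attempts} checks"
end
```

Bounding the attempts matters in CI: a hung deployment should fail the gate rather than stall the workflow until the runner timeout.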
Authentication: HTTP Basic Auth with the API key (`-u :<api_key>`).
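In Ruby, the `-u :<api_key>` form corresponds to Basic Auth with an empty username (the key below is a placeholder):

```ruby
require "net/http"
require "uri"

# Equivalent of curl's `-u :<api_key>`: empty username, API key as password.
req = Net::HTTP::Get.new(URI("https://staging.deployhq.com/projects"))
req.basic_auth("", "my-api-key") # placeholder key
```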
## Open Questions

- Target environment: Run against live staging (tests the current deploy) or a Docker instance in CI (tests the new code before deploy)?
- Test account: Create a dedicated test account on staging, or provision one dynamically?
- Cleanup strategy: Delete test resources after each run, or use a TTL-based cleanup?
- Failure handling: Should E2E failure block the deploy, or just alert?
- Server connectivity: E2E tests create servers -- do we need real servers to connect to, or just validate the API accepts the configuration?