E2E API Testing Strategy for DeployHQ


We need end-to-end tests that exercise the full user flow through the API: registering a user, creating a project, creating different kinds of servers, and creating a deployment. Our repos are on GitHub and we run RSpec in CI, but setting up a full E2E environment there is complex and slow.

All necessary endpoints are already exposed through the Main API (26 controllers with valid_api_actions). We need a way to test the complete flow against a real running instance before deploying to staging.

Tests must run before the staging deployment (staging is the branch where we merge all new features). The flow is:

Push to staging branch
  → GitHub Actions: RSpec tests (4 parallel groups)
  → E2E smoke test ← NEW
  → Capistrano deploy to staging

| Option | Approach | Effort | Maintenance | Reliability | CI-ready |
|---|---|---|---|---|---|
| A | Claude Code (ad-hoc) | ~1h | None | N/A (manual) | No |
| B | Ruby script agent | ~1-2 days | Medium | High | Yes |
| C | AI agent (hybrid) | ~2-3 days | Low | Medium-High | Yes |
| D | Postman/Newman | ~1-2 days | High | High | Yes |

#Option A: Claude Code (Ad-hoc)

Run Claude Code interactively and ask it to call the staging API via curl.

  • Setup: An API key for a test account on staging
  • Trigger: Manual -- someone runs it before deploying
  • Pros: Zero setup, good for one-off exploration
  • Cons: Cannot be automated, not a deploy gate

Verdict: Useful for debugging, not for automated testing.


#Option B: Ruby Script Agent

A standalone Ruby script that calls the staging API in a fixed sequence, asserting responses at each step.

#Architecture

GitHub Actions runner (Ubuntu)
  └── ruby scripts/e2e_smoke_test.rb
        └── HTTP calls → staging.deployhq.com (or local Docker)

#What the Script Does

class E2ESmokeTest
  def initialize(base_url, api_key)
    @base_url = base_url
    @api_key = api_key
  end

  def run!
    project = create_project(name: "e2e-test-#{Time.now.to_i}")
    create_ssh_server(project)
    create_ftp_server(project)
    trigger_deployment(project)
    wait_for_deployment(project)
    cleanup(project)
    puts "E2E smoke test passed"
  end
end
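Expanding that sketch, one step might look like the following using Ruby's standard Net::HTTP. This is a sketch under assumptions: the payload shape (`project: { name: ... }`) and the `build_post` helper are hypothetical, not the real DeployHQ schema; authentication uses HTTP Basic with a blank username and the API key as the password (the `curl -u :<api_key>` form).

```ruby
require "json"
require "net/http"
require "uri"

class E2ESmokeTest
  def initialize(base_url, api_key)
    @base_url = base_url
    @api_key  = api_key
  end

  # Build an authenticated JSON POST. Basic auth uses a blank username and
  # the API key as the password (matching curl's `-u :<api_key>`).
  def build_post(path, body)
    req = Net::HTTP::Post.new(URI.join(@base_url, path))
    req.basic_auth("", @api_key)
    req["Content-Type"] = "application/json"
    req.body = JSON.generate(body)
    req
  end

  # Hypothetical payload shape -- adjust to the real projects_controller params.
  def create_project(name:)
    req = build_post("/projects", { project: { name: name } })
    res = Net::HTTP.start(req.uri.host, req.uri.port,
                          use_ssl: req.uri.scheme == "https") { |h| h.request(req) }
    raise "create_project failed: #{res.code} #{res.body}" unless res.is_a?(Net::HTTPSuccess)
    JSON.parse(res.body)
  end
end
```

The other steps (create_ssh_server, trigger_deployment, and so on) follow the same request-build-assert pattern against their respective endpoints.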

#Setup Required

  1. Script: scripts/e2e_smoke_test.rb using Net::HTTP or httparty
  2. Test account: Dedicated account on staging with a known API key
  3. GitHub secrets: E2E_BASE_URL, E2E_API_KEY
  4. Workflow job: New job in .github/workflows/actions.yaml

#GitHub Actions Integration

e2e-smoke:
  needs: [test]
  if: github.ref == 'refs/heads/staging'
  runs-on: ubuntu-latest
  steps:
    - uses: actions/checkout@v4
    - uses: ruby/setup-ruby@v1
      with:
        ruby-version: 2.7.8
    - run: ruby scripts/e2e_smoke_test.rb
      env:
        E2E_BASE_URL: ${{ secrets.E2E_STAGING_URL }}
        E2E_API_KEY: ${{ secrets.E2E_API_KEY }}

#Change-Aware Variant

The script can detect what changed and select additional test scenarios:

changed_files = `git diff staging --name-only`.split("\n")

scenarios = [HappyPath] # always run baseline

if changed_files.any? { |f| f.include?("servers") }
  scenarios << AllServerTypes
end

if changed_files.any? { |f| f.include?("deployment") }
  scenarios << DeploymentEdgeCases
end

scenarios.each(&:run!)

This gives adaptability without AI non-determinism, but you still maintain the scenarios by hand.
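For `scenarios.each(&:run!)` to work, each scenario can simply be a class exposing a class-level `run!` that raises on any failed step. A minimal illustrative shape (the step names are assumptions):

```ruby
# Minimal scenario contract for `scenarios.each(&:run!)`: a class-level run!
# that raises on failure so the whole run aborts. Step names are illustrative.
class HappyPath
  STEPS = %w[create_project create_ssh_server trigger_deployment cleanup].freeze

  def self.run!
    STEPS.each do |step|
      puts "HappyPath: #{step}"  # the real version calls the staging API here
    end
    true
  end
end

scenarios = [HappyPath]
scenarios.each(&:run!)
```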

#Where It Runs

| Location | How | Best for |
|---|---|---|
| GitHub Actions runner | HTTP calls over internet to staging | Automated CI gate |
| Staging server itself | Via Capistrano before-hook, calls localhost | Avoids network dependency |
| Developer machine | ruby scripts/e2e_smoke_test.rb | Debugging |

GitHub Actions runner is the natural choice since tests already run there.

#Strengths

  • Deterministic: same code = same result
  • Fast: no LLM round-trips, just HTTP calls
  • Debuggable: "Step 3 returned 422: {error details}"
  • Free: no API costs
  • Fits the stack: Ruby, same language as the app

#Weaknesses

  • Tests must be maintained manually
  • New API features require new test code
  • Doesn't automatically adapt to changes

#Option C: AI Agent (Hybrid)

An AI agent that reads the git diff, understands what changed, and generates/executes targeted tests.

#The Core Advantage

Option B runs the same test every time. Option C adapts:

  1. Reads the git diff (staging..HEAD)
  2. Sees "the server creation endpoint changed its params"
  3. Adjusts tests to cover that specific change
  4. Also runs the baseline happy path

Minimal test maintenance -- coverage adapts to the changeset automatically.

#The Reliability Concern

AI agents are non-deterministic. Concrete failure modes:

| Failure Mode | Example |
|---|---|
| False positive | Agent decides a 422 is "expected behavior" and passes |
| False negative | Agent constructs an invalid request, reports a bug that isn't there |
| Flaky | Same code passes Monday, fails Tuesday with no changes |
| Prompt sensitivity | Minor wording change alters which API calls it makes |
| Hard to debug | Need to read agent reasoning trace to understand what it tried |

#The Hybrid Fix

Split the AI and the execution into two steps to get adaptability with reliability:

Step 1 (AI):   Claude reads the diff → outputs a JSON test plan
Step 2 (Script): Ruby script reads the JSON → executes API calls deterministically

This way:

  • The AI decides what to test (adaptive)
  • The script decides how to test (deterministic)
  • When it fails, inspect test_plan.json to see the AI's decisions separately from execution
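A sketch of what the deterministic executor (scripts/run_e2e_plan.rb) could look like, assuming the `{method, path, body, expected_status}` plan format the prompt asks for; the helper names are illustrative:

```ruby
require "json"
require "net/http"
require "uri"

# Turn one plan step into an authenticated Net::HTTP request object.
def build_step_request(step, base_url, api_key)
  uri   = URI.join(base_url, step["path"])
  klass = Net::HTTP.const_get(step["method"].capitalize)  # "POST" -> Net::HTTP::Post
  req   = klass.new(uri)
  req.basic_auth("", api_key)  # blank user, API key as password
  if step["body"]
    req["Content-Type"] = "application/json"
    req.body = JSON.generate(step["body"])
  end
  req
end

# Execute the plan in order, failing fast on the first status mismatch.
def run_plan(plan, base_url, api_key)
  plan.each_with_index do |step, i|
    req = build_step_request(step, base_url, api_key)
    res = Net::HTTP.start(req.uri.host, req.uri.port,
                          use_ssl: req.uri.scheme == "https") { |h| h.request(req) }
    next if res.code.to_i == step["expected_status"]
    abort "Step #{i + 1} failed: #{step["method"]} #{step["path"]} " \
          "expected #{step["expected_status"]}, got #{res.code}"
  end
  puts "All #{plan.size} plan steps passed"
end

# Entry point for scripts/run_e2e_plan.rb:
#   run_plan(JSON.parse(File.read(ARGV[0])),
#            ENV.fetch("E2E_BASE_URL"), ENV.fetch("E2E_API_KEY"))
```

Because the executor only replays whatever the JSON says, a failing run can always be reproduced by re-running the same test_plan.json by hand.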

#Implementation Approaches

Approach 1: Claude CLI in GitHub Actions

e2e-smoke:
  needs: [test]
  if: github.ref == 'refs/heads/staging'
  steps:
    - uses: actions/checkout@v4
    - name: Generate test plan
      run: |
        claude --print --dangerously-skip-permissions \
          "Read the git diff for staging. \
           Output a JSON array of test scenarios. \
           Format: [{method, path, body, expected_status}]" \
          > test_plan.json
      env:
        ANTHROPIC_API_KEY: ${{ secrets.ANTHROPIC_API_KEY }}
    - name: Execute test plan
      run: ruby scripts/run_e2e_plan.rb test_plan.json
      env:
        E2E_BASE_URL: ${{ secrets.E2E_STAGING_URL }}
        E2E_API_KEY: ${{ secrets.E2E_API_KEY }}

Pros: Uses tooling you already have (Claude CLI)
Cons: --dangerously-skip-permissions is broad; Opus is expensive per run

Approach 2: Claude Agent SDK (purpose-built)

A standalone agent with constrained tools:

tools = [
    Tool("create_project"),     # calls POST /projects
    Tool("create_server"),      # calls POST /projects/:id/servers
    Tool("create_deployment"),  # calls POST /projects/:id/deployments
    Tool("check_deployment"),   # calls GET /projects/:id/deployments/:id
    Tool("read_git_diff"),      # runs git diff
    Tool("cleanup"),            # deletes test resources
]

agent = Agent(
    model="claude-sonnet-4-6",
    tools=tools,
    instructions="""
    You are an E2E tester for DeployHQ.
    1. Read the git diff to see what changed
    2. Always run the baseline happy path
    3. Add targeted tests for changed areas
    4. Clean up all test resources
    5. Exit 0 on pass, exit 1 on fail with details
    """
)

Pros: Constrained tools (the agent can only call defined operations), cheaper (Sonnet), faster
Cons: New dependency (the Agent SDK); you maintain the tool definitions

Approach 3: Hybrid (Recommended for Option C)

AI generates the plan, script executes it:

- name: Generate test plan (AI)
  run: |
    claude --print --dangerously-skip-permissions \
      "Read the git diff for staging. \
       Output a JSON array of test scenarios to run. \
       Format: [{method, path, body, expected_status}]" \
      > test_plan.json

- name: Execute test plan (deterministic)
  run: ruby scripts/run_e2e_plan.rb test_plan.json

Pros: AI adaptability + deterministic execution; easy to debug (inspect test_plan.json)
Cons: Two-step process; the AI might generate invalid plans

#Cost Comparison

| Approach | Model | Cost per run |
|---|---|---|
| Claude CLI (full) | Opus | ~$0.50 |
| Agent SDK | Sonnet | ~$0.05 |
| Hybrid (one prompt) | Sonnet | ~$0.03 |
| Option B (no AI) | N/A | $0 |

#Where It Runs

Same as Option B: the GitHub Actions runner. The difference is that it also calls the Anthropic API for reasoning, so it needs an ANTHROPIC_API_KEY secret in addition to the staging API key.

GitHub Actions runner (Ubuntu)
  ├── Anthropic API → Claude (generates test plan / makes decisions)
  └── HTTP calls → staging.deployhq.com (executes tests)

#Option D: Postman/Newman

Define the full flow as a Postman collection and run it via the newman CLI in CI.

  • Setup: Build collection in Postman UI, export JSON, add newman to CI
  • Trigger: newman run e2e-collection.json -e staging-env.json
  • Pros: Visual editor, shareable
  • Cons: Collections are JSON blobs (hard to review in PRs), limited programming logic, another tool to maintain, can't share logic with Ruby codebase

Verdict: Works but awkward for a Ruby shop.


#Comparison

| Criteria | Option B (Script) | Option C Hybrid | Option C Agent SDK |
|---|---|---|---|
| Setup effort | 1-2 days | 2-3 days | 3-5 days |
| Ongoing maintenance | Medium (manual) | Low (AI adapts) | Low (AI adapts) |
| Reliability | High | Medium-High | Medium |
| Cost per run | $0 | ~$0.03 | ~$0.05 |
| Deterministic | Yes | Execution yes, plan no | No |
| Debuggability | Easy | Easy (inspect JSON) | Medium (reasoning trace) |
| Adapts to changes | Only with manual scenarios | Yes (AI reads diff) | Yes (AI reads diff) |
| New dependencies | None | Claude CLI in CI | Agent SDK + Python/TS |
| CI integration | Simple | Medium | Medium |

#Start with Option B

Get a working, reliable deploy gate with minimal effort. Covers the happy path and critical flows. No new dependencies, no API costs, runs in your existing CI.

#Layer Option C (Hybrid) on top

Once B is stable, add the hybrid AI layer:

  • AI generates additional test scenarios based on the git diff
  • Deterministic script executes them
  • Baseline happy path always runs regardless of AI output (fall back to Option B if AI fails)

This gives you the adaptability of AI with the reliability of a script, and the baseline always protects you even if the AI generates a bad plan.
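That fallback logic can be sketched as follows. The `baseline` and `executor` callables stand in for the Option B runner and the plan executor script; all names here are illustrative:

```ruby
require "json"

# Combined gate: the baseline always runs, and the AI-generated plan is
# strictly additive -- a missing or malformed test_plan.json degrades to
# baseline-only instead of blocking the deploy on the AI step itself.
def run_gate(plan_path, baseline:, executor:)
  baseline_ok = baseline.call                    # Option B, unconditionally
  plan = begin
    JSON.parse(File.read(plan_path))
  rescue Errno::ENOENT, JSON::ParserError => e
    warn "No usable AI plan (#{e.class}); running baseline only"
    []
  end
  ai_ok = plan.empty? || executor.call(plan)     # AI scenarios are additive
  baseline_ok && ai_ok
end
```

The deploy gate's exit status then depends on the baseline in every run, and on the AI-generated scenarios only when a valid plan exists.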

#Architecture of the combined approach

GitHub Actions (staging branch only)
  │
  ├── Job: RSpec tests (existing, 4 groups)
  │
  └── Job: E2E smoke test
        │
        ├── Step 1: Run baseline scenarios (Option B)
        │     └── Always: happy path, all server types, deployment
        │
        ├── Step 2: AI-generated scenarios (Option C hybrid, optional)
        │     ├── Claude reads git diff
        │     ├── Outputs additional test_plan.json
        │     └── Script executes the plan
        │
        └── Step 3: Cleanup test resources

If Step 2 fails or produces garbage, Step 1 still ran and gave you the core coverage. Over time, you can tune the AI prompt and expand its scope as you build confidence.


#Endpoints Exercised

Based on the existing Main API controllers, the full flow would exercise:

| Step | Method | Endpoint | Controller |
|---|---|---|---|
| Create project | POST | /projects | projects_controller.rb |
| Set up repository | POST | /projects/:id/repository | repositories_controller.rb |
| Create SSH server | POST | /projects/:id/servers | servers_controller.rb |
| Create FTP server | POST | /projects/:id/servers | servers_controller.rb |
| Add config file | POST | /projects/:id/config_files | config_files_controller.rb |
| Add env variable | POST | /projects/:id/environment_variables | environment_variables_controller.rb |
| Add build command | POST | /projects/:id/build_commands | build_commands_controller.rb |
| Trigger deployment | POST | /projects/:id/deployments | deployments_controller.rb |
| Check status | GET | /projects/:id/deployments/:id | deployments_controller.rb |
| Cleanup | DELETE | /projects/:id | projects_controller.rb |

Authentication: HTTP Basic Auth with API key (-u :<api_key>).


#Open Questions

  • Target environment: Run against live staging (tests the current deploy) or a Docker instance in CI (tests new code before deploy)?
  • Test account: Create a dedicated test account on staging, or provision one dynamically?
  • Cleanup strategy: Delete test resources after each run, or use a TTL-based cleanup?
  • Failure handling: Should E2E failure block the deploy, or just alert?
  • Server connectivity: E2E tests create servers -- do we need real servers to connect to, or just validate the API accepts the configuration?
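On the cleanup question, one TTL-based sketch: because the script names projects e2e-test-<unix timestamp>, leftovers can be identified by age alone and swept on the next run. The one-hour TTL and the helper names are arbitrary choices:

```ruby
# TTL-based cleanup sketch: smoke-test projects are named
# "e2e-test-<unix timestamp>", so survivors older than the TTL can be found
# by name alone and then deleted via DELETE /projects/:id.
E2E_PREFIX  = "e2e-test-"
TTL_SECONDS = 3600  # arbitrary one-hour window

def stale_e2e_project?(name, now = Time.now.to_i)
  return false unless name.start_with?(E2E_PREFIX)
  created_at = name.delete_prefix(E2E_PREFIX).to_i
  now - created_at > TTL_SECONDS
end

def stale_e2e_projects(projects, now = Time.now.to_i)
  projects.select { |p| stale_e2e_project?(p["name"], now) }
end
```

The sweeper would then issue DELETE /projects/:id for each returned project, so a crashed run never leaks resources for longer than the TTL.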
