# E2E API Testing Strategy for DeployHQ

## Problem Statement

We need end-to-end tests that exercise the full user flow through the API: registering a user, creating a project, creating different kinds of servers, and creating a deployment. Our repos are on GitHub and we run RSpec in CI, but setting up a full E2E environment there is complex and slow.

All necessary endpoints are already exposed through the Main API (26 controllers with `valid_api_actions`). We need a way to test the complete flow against a real running instance before deploying to staging.

## Trigger Point

Tests must run **before the staging deployment**. The staging branch is where we merge all new features, so the flow is:

```
Push to staging branch
  → GitHub Actions: RSpec tests (4 parallel groups)
  → E2E smoke test ← NEW
  → Capistrano deploy to staging
```

---

## Options Overview

| Option | Approach | Effort | Maintenance | Reliability | CI-ready |
|--------|----------|--------|-------------|-------------|----------|
| **A** | Claude Code (ad-hoc) | ~1h | None | N/A (manual) | No |
| **B** | Ruby script agent | ~1-2 days | Medium | High | Yes |
| **C** | AI agent (hybrid) | ~2-3 days | Low | Medium-High | Yes |
| **D** | Postman/Newman | ~1-2 days | High | High | Yes |

---

## Option A: Claude Code (Ad-hoc)

Run Claude Code interactively, ask it to call the staging API via curl.

- **Setup**: An API key for a test account on staging
- **Trigger**: Manual -- someone runs it before deploying
- **Pros**: Zero setup, good for one-off exploration
- **Cons**: Cannot be automated, not a deploy gate

**Verdict**: Useful for debugging, not for automated testing.

---

## Option B: Ruby Script Agent (Deterministic)

A standalone Ruby script that calls the staging API in a fixed sequence, asserting responses at each step.

### Architecture

```
GitHub Actions runner (Ubuntu)
  └── ruby scripts/e2e_smoke_test.rb
        └── HTTP calls → staging.deployhq.com (or local Docker)
```

### What the Script Does

```ruby
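class E2ESmokeTest
  def initialize(base_url, api_key)
    @base_url = base_url
    @api_key = api_key
  end

  # Runs the full flow; any failed step raises and fails the CI job.
  # The step helpers (create_project, create_ssh_server, ...) are omitted here.
  def run!
    project = create_project(name: "e2e-test-#{Time.now.to_i}")
    create_ssh_server(project)
    create_ftp_server(project)
    trigger_deployment(project)
    wait_for_deployment(project)
    puts "E2E smoke test passed"
  ensure
    # Clean up even when a step fails, so failed runs do not leave test projects behind.
    cleanup(project) if project
  end
end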
```
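The `wait_for_deployment` step has to poll, because a deployment finishes asynchronously. A minimal sketch of a polling helper with a timeout could look like the following. `poll_until`, `PollTimeout`, and the default timeout and interval are illustrative assumptions, not existing code.

```ruby
# Hypothetical helper: calls the block until it returns a truthy value
# or the timeout elapses. Names and defaults are assumptions.
class PollTimeout < StandardError; end

def poll_until(timeout: 300, interval: 5, sleeper: ->(seconds) { sleep(seconds) })
  deadline = Process.clock_gettime(Process::CLOCK_MONOTONIC) + timeout
  loop do
    result = yield
    return result if result

    if Process.clock_gettime(Process::CLOCK_MONOTONIC) >= deadline
      raise PollTimeout, "condition not met within #{timeout}s"
    end

    # The sleeper is injectable so the helper can be tested without waiting.
    sleeper.call(interval)
  end
end
```

`wait_for_deployment` could wrap the status request in `poll_until { ... }` and return the status once it reaches a terminal value. The timeout turns a stuck deployment into a clear CI failure instead of a hung job.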

### Setup Required

1. **Script**: `scripts/e2e_smoke_test.rb` using `Net::HTTP` or `httparty`
2. **Test account**: Dedicated account on staging with a known API key
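3. **GitHub secrets**: `E2E_STAGING_URL` and `E2E_API_KEY` (the workflow passes them to the script as the `E2E_BASE_URL` and `E2E_API_KEY` environment variables)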
4. **Workflow job**: New job in `.github/workflows/actions.yaml`
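Each step in the script boils down to one authenticated request plus one assertion on the response. The sketch below shows that shape with `Net::HTTP`. The helper names (`build_request`, `assert_status!`, `send_request`, `StepFailed`) are hypothetical, and it assumes JSON request bodies and Basic Auth with an empty username and the API key as the password.

```ruby
require "json"
require "net/http"
require "uri"

class StepFailed < StandardError; end

# Builds an authenticated JSON request without sending it.
def build_request(method, base_url, path, api_key, body = nil)
  uri = URI.join(base_url, path)
  klass = { "GET" => Net::HTTP::Get, "POST" => Net::HTTP::Post,
            "DELETE" => Net::HTTP::Delete }.fetch(method)
  request = klass.new(uri)
  request.basic_auth("", api_key) # assumption: empty username, API key as password
  request["Accept"] = "application/json"
  if body
    request["Content-Type"] = "application/json"
    request.body = JSON.generate(body)
  end
  request
end

# Raises with the step name, status, and body when the status is unexpected.
def assert_status!(step, response_code, response_body, expected)
  return if response_code.to_i == expected

  raise StepFailed, "#{step} returned #{response_code}: #{response_body}"
end

# Sends a request built by build_request and returns the Net::HTTP response.
def send_request(request)
  uri = request.uri
  Net::HTTP.start(uri.host, uri.port, use_ssl: uri.scheme == "https") do |http|
    http.request(request)
  end
end
```

Keeping request construction separate from sending makes the helpers easy to unit test without a running instance, and the error message carries the step name and response body for debugging.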

### GitHub Actions Integration

```yaml
e2e-smoke:
  needs: [test]
  if: github.ref == 'refs/heads/staging'
  runs-on: ubuntu-latest
  steps:
    - uses: actions/checkout@v4
    - uses: ruby/setup-ruby@v1
      with:
        ruby-version: 2.7.8
    - run: ruby scripts/e2e_smoke_test.rb
      env:
        E2E_BASE_URL: ${{ secrets.E2E_STAGING_URL }}
        E2E_API_KEY: ${{ secrets.E2E_API_KEY }}
```

### Change-Aware Variant

The script can detect what changed and select additional test scenarios:

```ruby
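# Diff against the commit staging pointed to before this push. On the staging
# branch itself, `git diff staging` compares the branch with itself and is empty.
# E2E_BASE_SHA is a placeholder; in GitHub Actions it could be set from github.event.before.
base_sha = ENV.fetch("E2E_BASE_SHA")
changed_files = `git diff --name-only #{base_sha} HEAD`.split("\n")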

scenarios = [HappyPath] # always run baseline

if changed_files.any? { |f| f.include?("servers") }
  scenarios << AllServerTypes
end

if changed_files.any? { |f| f.include?("deployment") }
  scenarios << DeploymentEdgeCases
end

scenarios.each(&:run!)
```

This gives adaptability without AI non-determinism, but the scenarios and the path-matching rules have to be maintained by hand.

### Where It Runs

| Location | How | Best for |
|----------|-----|----------|
| **GitHub Actions runner** | HTTP calls over internet to staging | Automated CI gate |
| **Staging server itself** | Via Capistrano before-hook, calls localhost | Avoids network dependency |
| **Developer machine** | `ruby scripts/e2e_smoke_test.rb` | Debugging |

GitHub Actions runner is the natural choice since tests already run there.

### Strengths

- Deterministic: same code = same result
- Fast: no LLM round-trips, just HTTP calls
- Debuggable: "Step 3 returned 422: {error details}"
- Free: no API costs
- Fits the stack: Ruby, same language as the app

### Weaknesses

- Tests must be maintained manually
- New API features require new test code
- Doesn't automatically adapt to changes

---

## Option C: AI Agent (Adaptive)

An AI agent that reads the git diff, understands what changed, and generates/executes targeted tests.

### The Core Advantage

Option B runs the **same test every time**. Option C adapts:

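1. Reads the git diff for the changes being deployed (for example, from the previous staging commit to `HEAD`; on the staging branch itself, `staging..HEAD` is empty)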
2. Sees "the server creation endpoint changed its params"
3. Adjusts tests to cover that specific change
4. Also runs the baseline happy path

Little test maintenance -- coverage adapts to the changeset automatically, although the prompt and the executor script still need upkeep.

### The Reliability Concern

AI agents are non-deterministic. Concrete failure modes:

| Failure Mode | Example |
|--------------|---------|
| **False negative** | Agent decides a 422 is "expected behavior" and passes, hiding a real bug |
| **False positive** | Agent constructs an invalid request, reports a bug that isn't there |
| **Flaky** | Same code passes Monday, fails Tuesday with no changes |
| **Prompt sensitivity** | Minor wording change alters which API calls it makes |
| **Hard to debug** | Need to read agent reasoning trace to understand what it tried |

### Recommended Approach: Hybrid (AI Plan + Deterministic Execution)

Split the AI and execution into two steps to get adaptability with reliability:

```
Step 1 (AI):     Claude reads the diff → outputs a JSON test plan
Step 2 (Script): Ruby script reads the JSON → executes API calls deterministically
```

This way:
- The AI decides **what** to test (adaptive)
- The script decides **how** to test (deterministic)
- When it fails, inspect `test_plan.json` to see the AI's decisions separately from execution
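Because the plan comes from a model, the script should validate it before executing anything. The sketch below parses a plan in the `[{method, path, body, expected_status}]` format and rejects entries outside a narrow shape. The allowed methods and the `/projects` path prefix are assumptions drawn from the endpoints this flow uses, and `parse_test_plan` and `InvalidPlan` are hypothetical names.

```ruby
require "json"

# Assumption: the E2E flow only needs these methods.
ALLOWED_METHODS = %w[GET POST DELETE].freeze

class InvalidPlan < StandardError; end

# Parses the AI-generated plan and rejects anything outside the expected shape.
def parse_test_plan(json_text)
  plan = JSON.parse(json_text)
  raise InvalidPlan, "plan must be a JSON array" unless plan.is_a?(Array)

  plan.each_with_index.map do |step, index|
    raise InvalidPlan, "step #{index} must be an object" unless step.is_a?(Hash)

    method = step["method"].to_s.upcase
    path = step["path"].to_s
    status = step["expected_status"]

    unless ALLOWED_METHODS.include?(method)
      raise InvalidPlan, "step #{index}: method #{method.inspect} not allowed"
    end
    unless path.start_with?("/projects")
      raise InvalidPlan, "step #{index}: path must start with /projects"
    end
    unless status.is_a?(Integer) && (100..599).cover?(status)
      raise InvalidPlan, "step #{index}: expected_status must be an HTTP status code"
    end

    { method: method, path: path, body: step["body"], expected_status: status }
  end
rescue JSON::ParserError => e
  raise InvalidPlan, "plan is not valid JSON: #{e.message}"
end
```

A rejected plan can then be treated as "AI step skipped" rather than as a test failure, which keeps model output from ever reaching the API unchecked.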

### Implementation Approaches

#### Approach 1: Claude CLI in GitHub Actions

```yaml
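e2e-smoke:
  needs: [test]
  if: github.ref == 'refs/heads/staging'
  runs-on: ubuntu-latest
  steps:
    - uses: actions/checkout@v4
      with:
        fetch-depth: 0 # full history, so there is a base commit to diff against
    # Assumes the claude CLI is already installed on the runner.
    - name: Generate test plan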
      run: |
        claude --print --dangerously-skip-permissions \
          "Read the git diff for staging. \
           Output a JSON array of test scenarios. \
           Format: [{method, path, body, expected_status}]" \
          > test_plan.json
      env:
        ANTHROPIC_API_KEY: ${{ secrets.ANTHROPIC_API_KEY }}
    - name: Execute test plan
      run: ruby scripts/run_e2e_plan.rb test_plan.json
      env:
        E2E_BASE_URL: ${{ secrets.E2E_STAGING_URL }}
        E2E_API_KEY: ${{ secrets.E2E_API_KEY }}
```

- **Pros**: Uses tooling you already have (Claude CLI)
- **Cons**: `--dangerously-skip-permissions` is broad, and Opus is expensive per run

#### Approach 2: Claude Agent SDK (purpose-built)

A standalone agent with constrained tools:

```python
# Illustrative sketch: plain data, not the Agent SDK's actual API.
# Each tool name maps to the single operation the agent may perform.
tools = {
    "create_project": "POST /projects",
    "create_server": "POST /projects/:id/servers",
    "create_deployment": "POST /projects/:id/deployments",
    "check_deployment": "GET /projects/:id/deployments/:id",
    "read_git_diff": "git diff",
    "cleanup": "delete test resources",
}

agent_config = {
    "model": "claude-sonnet-4-6",
    "tools": list(tools),
    "instructions": """
    You are an E2E tester for DeployHQ.
    1. Read the git diff to see what changed
    2. Always run the baseline happy path
    3. Add targeted tests for changed areas
    4. Clean up all test resources
    5. Exit 0 on pass, exit 1 on fail with details
    """,
}
```

- **Pros**: Constrained tools (agent can only call defined operations), cheaper (Sonnet), faster
- **Cons**: New dependency (Agent SDK), you maintain tool definitions

#### Approach 3: Hybrid (Recommended for Option C)

AI generates the plan, script executes it:

```yaml
- name: Generate test plan (AI)
  run: |
    claude --print --dangerously-skip-permissions \
      "Read the git diff for staging. \
       Output a JSON array of test scenarios to run. \
       Format: [{method, path, body, expected_status}]" \
      > test_plan.json
  env:
    ANTHROPIC_API_KEY: ${{ secrets.ANTHROPIC_API_KEY }}

- name: Execute test plan (deterministic)
  run: ruby scripts/run_e2e_plan.rb test_plan.json
  env:
    E2E_BASE_URL: ${{ secrets.E2E_STAGING_URL }}
    E2E_API_KEY: ${{ secrets.E2E_API_KEY }}
```

- **Pros**: AI adaptability + deterministic execution, easy to debug (inspect test_plan.json)
- **Cons**: Two-step process, AI might generate invalid plans
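The deterministic half, `scripts/run_e2e_plan.rb`, can stay very small. The following is a rough sketch of its core loop, not an existing implementation: `run_plan` and `report` are hypothetical names, and the HTTP client is injected as a callable that returns a status code so the logic can be tested without a running instance.

```ruby
# Hypothetical executor: runs each plan step through an injected HTTP client
# and collects every mismatch instead of stopping at the first one.
# `client` is any callable taking (method, path, body) and returning a status code.
def run_plan(plan, client)
  plan.each_with_index.filter_map do |step, index|
    actual = client.call(step[:method], step[:path], step[:body])
    next if actual == step[:expected_status]

    "step #{index} #{step[:method]} #{step[:path]}: " \
      "expected #{step[:expected_status]}, got #{actual}"
  end
end

# Prints the outcome and returns the process exit code (0 pass, 1 fail).
def report(failures, out: $stdout)
  if failures.empty?
    out.puts "E2E plan passed"
    0
  else
    failures.each { |failure| out.puts "FAIL #{failure}" }
    1
  end
end
```

In the real script the client would wrap the actual HTTP call, and the script would end with `exit report(run_plan(plan, client))`. Collecting all mismatches gives one complete failure report per run rather than one failure at a time.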

### Cost Comparison

| Approach | Model | Cost per run |
|----------|-------|-------------|
| Claude CLI (full) | Opus | ~$0.50 |
| Agent SDK | Sonnet | ~$0.05 |
| Hybrid (one prompt) | Sonnet | ~$0.03 |
| Option B (no AI) | N/A | $0 |

### Where It Runs

Same as Option B: **GitHub Actions runner**. The difference is it also calls the Anthropic API for reasoning, so it needs an `ANTHROPIC_API_KEY` secret in addition to the staging API key.

```
GitHub Actions runner (Ubuntu)
  ├── Anthropic API → Claude (generates test plan / makes decisions)
  └── HTTP calls → staging.deployhq.com (executes tests)
```

---

## Option D: Postman/Newman

Define the full flow as a Postman collection, run via `newman` CLI in CI.

- **Setup**: Build collection in Postman UI, export JSON, add `newman` to CI
- **Trigger**: `newman run e2e-collection.json -e staging-env.json`
- **Pros**: Visual editor, shareable
- **Cons**: Collections are JSON blobs (hard to review in PRs), limited programming logic, another tool to maintain, can't share logic with Ruby codebase

**Verdict**: Works but awkward for a Ruby shop.

---

## Comparison Matrix

| Criteria | Option B (Script) | Option C Hybrid | Option C Agent SDK |
|----------|-------------------|-----------------|-------------------|
| **Setup effort** | 1-2 days | 2-3 days | 3-5 days |
| **Ongoing maintenance** | Medium (manual) | Low (AI adapts) | Low (AI adapts) |
| **Reliability** | High | Medium-High | Medium |
| **Cost per run** | $0 | ~$0.03 | ~$0.05 |
| **Deterministic** | Yes | Execution yes, plan no | No |
| **Debuggability** | Easy | Easy (inspect JSON) | Medium (reasoning trace) |
| **Adapts to changes** | Only with manual scenarios | Yes (AI reads diff) | Yes (AI reads diff) |
| **New dependencies** | None | Claude CLI in CI | Agent SDK + Python/TS |
| **CI integration** | Simple | Medium | Medium |

---

## Recommendation

### Start with Option B

Get a working, reliable deploy gate with minimal effort. Covers the happy path and critical flows. No new dependencies, no API costs, runs in your existing CI.

### Layer Option C (Hybrid) on top

Once B is stable, add the hybrid AI layer:
- AI generates additional test scenarios based on the git diff
- Deterministic script executes them
- Baseline happy path always runs regardless of AI output (fall back to Option B if AI fails)

This gives you the adaptability of AI with the reliability of a script, and the baseline always protects you even if the AI generates a bad plan.

### Architecture of the combined approach

```
GitHub Actions (staging branch only)
  │
  ├── Job: RSpec tests (existing, 4 groups)
  │
  └── Job: E2E smoke test
        │
        ├── Step 1: Run baseline scenarios (Option B)
        │     └── Always: happy path, all server types, deployment
        │
        ├── Step 2: AI-generated scenarios (Option C hybrid, optional)
        │     ├── Claude reads git diff
        │     ├── Outputs additional test_plan.json
        │     └── Script executes the plan
        │
        └── Step 3: Cleanup test resources
```

If Step 2 fails or produces garbage, Step 1 still ran and gave you the core coverage. Over time, you can tune the AI prompt and expand its scope as you build confidence.
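The three steps can be sketched as a small orchestration function. This version treats the baseline as the gate and the AI step as best-effort, which is one possible reading of the design; whether a failing AI-generated scenario should also block the deploy is a policy choice left open here. `run_smoke_test` and its parameters are hypothetical names.

```ruby
# Hypothetical orchestration of the combined approach.
# Each argument is a callable so the steps can be swapped or stubbed.
def run_smoke_test(baseline:, ai_scenarios:, cleanup:, logger: ->(msg) { warn msg })
  # Step 1: baseline is the deploy gate; an exception here propagates and fails the job.
  baseline.call

  # Step 2: AI scenarios are optional; any error is logged and ignored.
  begin
    ai_scenarios.call
    :baseline_and_ai
  rescue StandardError => e
    logger.call("AI scenarios skipped: #{e.class}: #{e.message}")
    :baseline_only
  end
ensure
  # Step 3: cleanup always runs, even when the baseline fails.
  cleanup.call
end
```

The return value records which coverage the run actually had, which is useful to print in the CI log so a skipped AI step is visible rather than silent.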

---

## API Endpoints for E2E Flow

Based on the existing Main API controllers, the full flow would exercise:

| Step | Method | Endpoint | Controller |
|------|--------|----------|------------|
| Create project | POST | `/projects` | `projects_controller.rb` |
| Set up repository | POST | `/projects/:id/repository` | `repositories_controller.rb` |
| Create SSH server | POST | `/projects/:id/servers` | `servers_controller.rb` |
| Create FTP server | POST | `/projects/:id/servers` | `servers_controller.rb` |
| Add config file | POST | `/projects/:id/config_files` | `config_files_controller.rb` |
| Add env variable | POST | `/projects/:id/environment_variables` | `environment_variables_controller.rb` |
| Add build command | POST | `/projects/:id/build_commands` | `build_commands_controller.rb` |
| Trigger deployment | POST | `/projects/:id/deployments` | `deployments_controller.rb` |
| Check status | GET | `/projects/:id/deployments/:id` | `deployments_controller.rb` |
| Cleanup | DELETE | `/projects/:id` | `projects_controller.rb` |

Authentication: HTTP Basic Auth with API key (`-u :<api_key>`).
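The endpoint table can be kept as data so the script and the documentation stay in step. The sketch below is illustrative only: `FLOW_STEPS` and `expand_path` are hypothetical names, and the `:project_id` and `:deployment_id` placeholders are introduced here to disambiguate the two `:id` segments in the table.

```ruby
# Flow steps from the table above, as data. Placeholder names are assumptions.
FLOW_STEPS = [
  { name: "Create project",     method: "POST",   path: "/projects" },
  { name: "Set up repository",  method: "POST",   path: "/projects/:project_id/repository" },
  { name: "Create SSH server",  method: "POST",   path: "/projects/:project_id/servers" },
  { name: "Create FTP server",  method: "POST",   path: "/projects/:project_id/servers" },
  { name: "Add config file",    method: "POST",   path: "/projects/:project_id/config_files" },
  { name: "Add env variable",   method: "POST",   path: "/projects/:project_id/environment_variables" },
  { name: "Add build command",  method: "POST",   path: "/projects/:project_id/build_commands" },
  { name: "Trigger deployment", method: "POST",   path: "/projects/:project_id/deployments" },
  { name: "Check status",       method: "GET",    path: "/projects/:project_id/deployments/:deployment_id" },
  { name: "Cleanup",            method: "DELETE", path: "/projects/:project_id" }
].freeze

# Replaces each :placeholder in the template with the matching value from params.
# Raises KeyError when a placeholder has no value.
def expand_path(template, params)
  template.gsub(/:(\w+)/) do
    key = Regexp.last_match(1).to_sym
    params.fetch(key) { raise KeyError, "missing path parameter #{key}" }.to_s
  end
end
```

For example, `expand_path(FLOW_STEPS[8][:path], { project_id: "abc", deployment_id: 7 })` yields `/projects/abc/deployments/7`.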

---

## Open Questions

- **Target environment**: Run against live staging (tests current deploy) or Docker instance in CI (tests new code before deploy)?
- **Test account**: Create a dedicated test account on staging, or provision one dynamically?
- **Cleanup strategy**: Delete test resources after each run, or use a TTL-based cleanup?
- **Failure handling**: Should E2E failure block the deploy, or just alert?
- **Server connectivity**: E2E tests create servers -- do we need real servers to connect to, or just validate the API accepts the configuration?
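On the cleanup question, the two strategies can be combined: delete after each run, and sweep leftovers by age. Since the Option B script names projects `e2e-test-<unix timestamp>`, a TTL sweep can work from the name alone. The sketch below is a hypothetical helper under that naming assumption; the one-hour default is arbitrary.

```ruby
# Matches names like "e2e-test-1700000000" and captures the Unix timestamp.
E2E_NAME_PATTERN = /\Ae2e-test-(\d+)\z/

# Hypothetical TTL sweep: returns the test project names older than max_age seconds.
# Names that do not match the pattern are never selected.
def stale_test_projects(project_names, now: Time.now, max_age: 3600)
  project_names.select do |name|
    match = E2E_NAME_PATTERN.match(name)
    match && (now.to_i - match[1].to_i) > max_age
  end
end
```

Anchoring the pattern at both ends keeps the sweep from touching any project that merely contains `e2e-test` in its name.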