# Admin Diagnostics API - Implementation Plan

## Overview

Build a cross-account, read-only diagnostics API for DeployHQ that enables autonomous support investigation via the MCP server. Currently, investigating customer issues requires human-run Rails console queries because the public API is account-scoped. This plan adds admin-level diagnostic endpoints to the Rails app (Tier 1) and documents a future read-only SQL tool (Tier 2).

## Current State Analysis

### Rails App
- **Admin namespace exists**: `Admin::` controllers at `app/controllers/admin/`, accessed via admin subdomain with IP whitelist (`AdminConstraint` in `routes.rb:9-13`)
- **Admin deployments controller exists**: `Admin::DeploymentsController` already has `recover`, `fix`, `requeue_deployment` actions -- but all are HTML-only, session-authenticated, and don't return JSON diagnostics
- **Internal API pattern exists**: `InternalApiController` uses Bearer token auth (`callback_verification_token`) for cross-service communication -- good pattern to follow. Test pattern at `spec/requests/internal_api_controller_spec.rb` uses request specs.
- **API auth pattern**: Public API uses HTTP Basic Auth with account-scoped API keys via `Authentication` concern

### MCP Server
- **Repository**: `deployhq/deployhq-mcp-server`, cloned locally at `/Users/martakaras/Projects/DeployHQ/deployhq-mcp-server/`
- **Architecture**: Factory function `createMCPServer()` in `src/mcp-server.ts` (line 46), tool dispatch via switch/case (lines 81-364), Zod validation from `src/tools.ts`
- **API client**: `src/api-client.ts` -- `DeployHQClient` class with HTTP Basic Auth, `request<T>()` private method (line 220)
- **Config**: `src/config.ts` -- env vars `DEPLOYHQ_EMAIL`, `DEPLOYHQ_API_KEY`, `DEPLOYHQ_ACCOUNT` + optional `DEPLOYHQ_URL`, `DEPLOYHQ_READ_ONLY`
- **Entry points**: `src/stdio.ts` (primary, line 1-104), `src/index.ts` (Express hosted, line 1-110)
- **Tests**: Vitest in `src/__tests__/mcp-server.test.ts`, helper `invokeToolForTest()` at line 375
- **Read-only mode**: Already supports `readOnlyMode` config flag that blocks write operations (checked per-tool in mcp-server.ts)

### Key Models
- `Deployment` -- main deployment record, `has_many :deployments_servers`, `has_many :deployment_steps`
- `DeploymentsServer` -- join table linking deployment to server, has `status`, `file_operations`
- `DeploymentStep` -- granular step record with `step` type (e.g. `transfer_files`), `server_id`, `status`
- `ServerGroup` -- `belongs_to :parent, polymorphic: true` (`server_group.rb:36`). Has `identifier` (string UUID) and `permalink` fields. Parent is typically a `Project` but the association is polymorphic.
- `Account` -- has `permalink` field but NO `identifier` field. Lookup must use `permalink` or numeric `id`.
- The bug that motivated this: mismatch between `deployments_servers` count and `transfer_files` step count

## Desired End State

The MCP server can look up any deployment by UUID (regardless of account) and return diagnostic data including server/step counts, mismatches, recent failures, and server group membership. All access is read-only, authenticated with a dedicated admin token, and logged.

### Verification
- MCP tool `admin_get_deployment` returns deployment with server count, step counts, and mismatch detection
- MCP tool `admin_list_recent_failures` returns recent failed deployments for a given account (looked up by permalink)
- MCP tool `admin_get_deployment_servers` returns per-server step breakdown
- MCP tool `admin_get_server_group` returns server group details with current membership
- All tools reject requests without valid admin token
- All tools are read-only (no mutations)

## What We're NOT Doing

- No write/mutation admin endpoints (no retry, recover, or requeue via MCP)
- No general-purpose SQL query endpoint (that's Tier 2, documented below)
- No changes to the existing public API or account-scoped MCP tools
- No admin UI changes -- this is API-only
- No changes to the admin subdomain/session-based admin access
- No global "all accounts" failure listing -- Tier 1 is scoped to a single account's failures at a time

## Key Design Decisions

### Identifier Strategy
All admin API endpoints use the same identifiers that appear in support workflows:
- **Deployments**: looked up by `identifier` (UUID string, e.g. `aab2065a-72ca-4263-a621-b4ddf59ca036`)
- **Accounts**: looked up by `permalink` (e.g. `kicksite`) -- Account has no `identifier` field
- **Server groups**: looked up by `identifier` (UUID string from `server_group.identifier`)

No endpoints accept raw numeric database IDs. This matches how support tickets reference these entities.

### Mismatch Detection Rules
A "server/step mismatch" is defined precisely as:
- For each `DeploymentsServer` record, there MUST be exactly one `DeploymentStep` with `step = 'transfer_files'` and matching `server_id`
- **Missing step**: a `DeploymentsServer` exists but no `transfer_files` step for that `server_id` -- this is the bug we fixed
- **Orphaned step**: a `transfer_files` step exists for a `server_id` that has no `DeploymentsServer` record
- **Duplicate steps**: more than one `transfer_files` step for the same `server_id`
- Preview deployments are excluded from mismatch detection (they don't have transfer steps)
- Disabled servers that still have `DeploymentsServer` records ARE included (they were part of the deployment)

The diagnostics response reports all three categories separately.

### Security & Exposure Model
- The `/admin_api/` routes are **intentionally outside** `AdminConstraint` (no subdomain restriction)
- Authentication relies solely on the Bearer token in the `Authorization` header
- This is the same model as `InternalApiController` which is also outside `AdminConstraint`
- **Token source**: `DEPLOYHQ_ADMIN_API_TOKEN` env var, set in deployment config
- **Token rotation**: Change the env var and restart. No database records to update.
- **Cancelled/inactive accounts**: Queryable. Support investigations often involve inactive accounts. The endpoints return account status in the response so the caller can see the state.
- **Future hardening**: IP allowlisting can be added later as a `before_action` if needed, using the same `AuthorizedNetworks.valid_ip?` pattern as `AdminConstraint`

## Implementation Approach

Follow the `InternalApiController` pattern: a new `AdminApiController` base class with Bearer token auth, under a `/admin_api/` route namespace. This keeps it separate from both the session-based admin panel and the account-scoped public API. The MCP server gets a new `DEPLOYHQ_ADMIN_TOKEN` env var and additional admin tools alongside existing tools.

---

## Phase 1: Rails Admin API Foundation

### Overview
Create the base controller, authentication, routing, and diagnostic endpoints.

### Changes Required:

#### 1. Admin API Token Configuration

The token will be read from `DEPLOYHQ_ADMIN_API_TOKEN` environment variable. No database storage needed -- single static token is sufficient for machine-to-machine auth. Same pattern as `AtechIdentity.callback_verification_token` used by `InternalApiController`.

#### 2. Base Controller
**File**: `app/controllers/admin_api_controller.rb` (new)
**Pattern**: Follow `InternalApiController` (`app/controllers/internal_api_controller.rb:61-86`)

```ruby
# frozen_string_literal: true

class AdminApiController < ActionController::Base
  skip_before_action :verify_authenticity_token

  before_action :authenticate_admin_api

  private

  def authenticate_admin_api
    auth_header = request.headers['Authorization']

    unless auth_header&.start_with?('Bearer ')
      Rails.logger.warn "[AdminAPI] Missing or invalid Authorization header from #{request.remote_ip}"
      render json: { error: 'Unauthorized' }, status: :unauthorized
      return
    end

    token = auth_header.sub('Bearer ', '')
    expected_token = ENV['DEPLOYHQ_ADMIN_API_TOKEN']

    if expected_token.blank?
      Rails.logger.error '[AdminAPI] DEPLOYHQ_ADMIN_API_TOKEN not configured'
      render json: { error: 'Internal server error' }, status: :internal_server_error
      return
    end

    unless ActiveSupport::SecurityUtils.secure_compare(expected_token, token)
      Rails.logger.warn "[AdminAPI] Invalid token from #{request.remote_ip}"
      render json: { error: 'Unauthorized' }, status: :unauthorized
      return
    end

    Rails.logger.info "[AdminAPI] Authenticated request from #{request.remote_ip} for #{request.path}"
  end
end
```

#### 3. Diagnostics Controller
**File**: `app/controllers/admin_api/diagnostics_controller.rb` (new)

```ruby
# frozen_string_literal: true

class AdminApi::DiagnosticsController < AdminApiController

  # GET /admin_api/deployments/:identifier
  #
  # Looks up a deployment by UUID across all accounts.
  # Returns deployment metadata, server/step counts, and mismatch diagnostics.
  def deployment
    @deployment = Deployment.find_by!(identifier: params[:identifier])
    project = @deployment.project
    account = project.account

    servers = @deployment.deployments_servers.includes(:server)
    steps = @deployment.deployment_steps

    server_ids = servers.pluck(:server_id)
    transfer_steps = steps.where(step: 'transfer_files')
    transfer_server_ids = transfer_steps.pluck(:server_id)

    # Mismatch detection (skip for preview deployments)
    if @deployment.preview?
      diagnostics = { preview: true, mismatch_detection_skipped: true }
    else
      missing_steps = server_ids - transfer_server_ids
      orphaned_steps = transfer_server_ids - server_ids
      duplicate_server_ids = transfer_server_ids.group_by(&:itself).select { |_, v| v.size > 1 }.keys

      diagnostics = {
        server_step_mismatch: missing_steps.any? || orphaned_steps.any? || duplicate_server_ids.any?,
        servers_missing_transfer_steps: missing_steps,
        orphaned_transfer_steps: orphaned_steps,
        duplicate_transfer_steps: duplicate_server_ids,
        server_count: server_ids.size,
        transfer_step_count: transfer_server_ids.size
      }
    end

    render json: {
      deployment: {
        identifier: @deployment.identifier,
        id: @deployment.id,
        status: @deployment.status,
        branch: @deployment.branch,
        preview: @deployment.preview?,
        created_at: @deployment.created_at,
        started_at: @deployment.started_at,
        completed_at: @deployment.completed_at,
        failed_at: @deployment.failed_at
      },
      project: {
        id: project.id,
        name: project.name,
        permalink: project.permalink
      },
      account: {
        id: account.id,
        name: account.name,
        permalink: account.permalink
      },
      servers: {
        count: server_ids.size,
        ids: server_ids
      },
      steps: {
        total: steps.count,
        by_type: steps.group(:step).count,
        failed: steps.where(status: 'failed').map { |s| { step: s.step, server_id: s.server_id, description: s.description } }
      },
      diagnostics: diagnostics
    }
  rescue ActiveRecord::RecordNotFound
    render json: { error: 'Deployment not found' }, status: :not_found
  end

  # GET /admin_api/deployments/:identifier/servers
  #
  # Returns per-server step breakdown for a deployment.
  def deployment_servers
    @deployment = Deployment.find_by!(identifier: params[:identifier])
    all_steps = @deployment.deployment_steps.to_a

    data = @deployment.deployments_servers.includes(:server).map do |ds|
      server_steps = all_steps.select { |s| s.server_id == ds.server_id }
      {
        server_id: ds.server_id,
        server_name: ds.server&.name,
        server_identifier: ds.server&.identifier,
        status: ds.status,
        steps: server_steps.map { |s| { step: s.step, status: s.status, identifier: s.identifier } },
        has_transfer_step: server_steps.any? { |s| s.step == 'transfer_files' }
      }
    end

    render json: { deployment_identifier: @deployment.identifier, servers: data }
  rescue ActiveRecord::RecordNotFound
    render json: { error: 'Deployment not found' }, status: :not_found
  end

  # GET /admin_api/accounts/:permalink/recent_failures
  #
  # Returns recent failed deployments for an account, looked up by permalink.
  def recent_failures
    account = Account.find_by!(permalink: params[:permalink])
    limit = [(params[:limit] || 20).to_i, 100].min

    deployments = Deployment
      .joins(:project)
      .where(projects: { account_id: account.id })
      .where.not(failed_at: nil)
      .order(failed_at: :desc)
      .limit(limit)
      .includes(:project)

    render json: {
      account: { id: account.id, name: account.name, permalink: account.permalink },
      failures: deployments.map do |d|
        {
          identifier: d.identifier,
          project: d.project.permalink,
          branch: d.branch,
          status: d.status,
          failed_at: d.failed_at,
          created_at: d.created_at
        }
      end
    }
  rescue ActiveRecord::RecordNotFound
    render json: { error: 'Account not found' }, status: :not_found
  end

  # GET /admin_api/server_groups/:identifier
  #
  # Returns server group details with current membership, looked up by UUID identifier.
  # Note: ServerGroup#parent is polymorphic but in practice is always a Project.
  # We guard against non-Project parents to avoid NoMethodError.
  def server_group
    group = ServerGroup.find_by!(identifier: params[:identifier])
    project = group.parent

    group_data = {
      id: group.id,
      identifier: group.identifier,
      name: group.name
    }

    # ServerGroup#parent is polymorphic (server_group.rb:36).
    # In practice it's always a Project, but guard against other types.
    if project.is_a?(Project)
      group_data[:project] = project.permalink
      group_data[:account] = project.account.permalink
    else
      group_data[:parent_type] = group.parent_type
      group_data[:parent_id] = group.parent_id
    end

    render json: {
      server_group: group_data,
      servers: group.servers.map { |s|
        {
          id: s.id,
          identifier: s.identifier,
          name: s.name,
          enabled: s.enabled?,
          protocol_type: s.protocol_type
        }
      },
      server_count: group.servers.count
    }
  rescue ActiveRecord::RecordNotFound
    render json: { error: 'Server group not found' }, status: :not_found
  end
end
```

#### 4. Routes
**File**: `config/routes.rb`
**Location**: Near the existing `internal_api` route at line 31

```ruby
# Admin API - token-authenticated diagnostics for MCP/support tooling
# Intentionally outside AdminConstraint: uses Bearer token auth, not session/subdomain.
# Same exposure model as InternalApiController.
namespace :admin_api, defaults: { format: :json } do
  get 'deployments/:identifier', to: 'diagnostics#deployment', as: :deployment_diagnostics
  get 'deployments/:identifier/servers', to: 'diagnostics#deployment_servers', as: :deployment_servers_diagnostics
  get 'accounts/:permalink/recent_failures', to: 'diagnostics#recent_failures', as: :account_recent_failures
  get 'server_groups/:identifier', to: 'diagnostics#server_group', as: :server_group_diagnostics
end
```

### Success Criteria:

#### Automated Verification:
- [ ] `bin/rspec spec/requests/admin_api/` passes
- [ ] Routes resolve correctly: `bin/rails routes | grep admin_api` shows 4 GET routes
- [ ] Requests without token return 401
- [ ] Requests with invalid token return 401
- [ ] Requests with valid token return 200 with expected JSON shape
- [ ] Mismatch detection correctly identifies missing, orphaned, and duplicate steps
- [ ] Preview deployments skip mismatch detection

#### Manual Verification:
- [ ] Test against staging with `DEPLOYHQ_ADMIN_API_TOKEN` env var set
- [ ] curl a known deployment UUID and verify JSON response shape

**Implementation Note**: Complete this phase and all automated verification before proceeding to Phase 2.

---

## Phase 2: MCP Server Admin Tools

### Overview
Add admin tools to the existing MCP server that call the new Rails admin API endpoints.

**Repository**: `deployhq/deployhq-mcp-server` (local: `/Users/martakaras/Projects/DeployHQ/deployhq-mcp-server/`)

### Changes Required:

#### 1. Config
**File**: `src/config.ts` (lines 6-12 `ServerConfig` interface, lines 75-91 `parseServerConfig`)
**Changes**: Add `adminToken?: string` to `ServerConfig`. Read from `DEPLOYHQ_ADMIN_TOKEN` env var. No CLI flag needed (env-only is fine for a token).

```typescript
// In ServerConfig interface
export interface ServerConfig {
  readOnlyMode: boolean;
  adminToken?: string;  // Optional admin API token for cross-account diagnostics
}

// In parseServerConfig()
const adminToken = process.env.DEPLOYHQ_ADMIN_TOKEN || undefined;
return { readOnlyMode, adminToken };
```

#### 2. Admin API Client
**File**: `src/api-client.ts`
**Changes**: Add `DeployHQAdminClient` class. Follows the same pattern as `DeployHQClient` (line 192+) but uses Bearer token auth instead of Basic Auth, and targets the `/admin_api/` path prefix.

```typescript
export interface AdminClientConfig {
  adminToken: string;
  timeout?: number;
  baseUrl: string;  // Same base URL as the regular client
}

export class DeployHQAdminClient {
  private readonly baseUrl: string;
  private readonly authHeader: string;
  private readonly timeout: number;

  constructor(config: AdminClientConfig) {
    this.baseUrl = config.baseUrl;
    this.timeout = config.timeout || 30000;
    this.authHeader = `Bearer ${config.adminToken}`;
  }

  // Same request() pattern as DeployHQClient (line 220)
  // but with Bearer auth and /admin_api/ prefix

  async getDeploymentDiagnostics(identifier: string): Promise<DeploymentDiagnostics> { ... }
  async getDeploymentServers(identifier: string): Promise<DeploymentServersResponse> { ... }
  async getRecentFailures(accountPermalink: string, limit?: number): Promise<RecentFailuresResponse> { ... }
  async getServerGroup(identifier: string): Promise<ServerGroupResponse> { ... }
}
```

Type interfaces for each response shape to match the Rails JSON contracts defined in Phase 1.

#### 3. Tool Definitions
**File**: `src/tools.ts`
**Changes**: Add 4 new Zod schemas and tool definitions at the end of the file.

```typescript
// Schemas
export const AdminGetDeploymentSchema = z.object({
  identifier: z.string().describe('Deployment UUID'),
});

export const AdminGetDeploymentServersSchema = z.object({
  identifier: z.string().describe('Deployment UUID'),
});

export const AdminListRecentFailuresSchema = z.object({
  account: z.string().describe('Account permalink'),
  limit: z.number().optional().describe('Max results (default 20, max 100)'),
});

export const AdminGetServerGroupSchema = z.object({
  identifier: z.string().describe('Server group UUID'),
});

// Tool definitions with readOnlyHint: true, destructiveHint: false
```

#### 4. Tool Registration & Dispatch
**File**: `src/mcp-server.ts`
**Changes**:

In `createMCPServer()` (line 46), accept `adminToken` parameter. The `ListToolsRequestSchema` handler (line 75) conditionally includes admin tools only when `adminToken` is set:

```typescript
server.setRequestHandler(ListToolsRequestSchema, async () => {
  const availableTools = adminToken ? [...tools, ...adminTools] : tools;
  return { tools: availableTools };
});
```

In the `CallToolRequestSchema` handler (line 81), add cases for admin tools. If an admin tool is called but no `adminToken` is configured, return an error explaining admin tools are not available.

#### 5. Entry Points
**Files**: `src/stdio.ts` (line 32-46), `src/index.ts` (line 23-36)
**Changes**: Read `DEPLOYHQ_ADMIN_TOKEN` from env. Pass `config.adminToken` to `createMCPServer()`. Log whether admin tools are enabled:

```typescript
log.info(`Admin tools: ${config.adminToken ? 'enabled' : 'disabled (no DEPLOYHQ_ADMIN_TOKEN)'}`);
```

### Success Criteria:

#### Automated Verification:
- [ ] `npm run build` succeeds
- [ ] `npm run lint` passes
- [ ] `npm run test` passes with new admin tool tests
- [ ] Admin tools don't appear in tool list when `DEPLOYHQ_ADMIN_TOKEN` is not set
- [ ] Admin tools appear when `DEPLOYHQ_ADMIN_TOKEN` is set
- [ ] Calling an admin tool without token configured returns descriptive error

#### Manual Verification:
- [ ] Configure MCP server with `DEPLOYHQ_ADMIN_TOKEN` in `.claude.json`
- [ ] From Claude Code, run `admin_get_deployment` with a known deployment UUID
- [ ] Verify the response includes server counts, step counts, and mismatch detection

**Implementation Note**: Complete this phase and all automated verification before proceeding to Phase 3.

---

## Phase 3: Tests

### Overview
Write request specs for the Rails endpoints and Vitest tests for the MCP tools.

### Changes Required:

#### 1. Rails Request Specs
**File**: `spec/requests/admin_api/diagnostics_spec.rb` (new)
**Pattern**: Follow `spec/requests/internal_api_controller_spec.rb` (request specs with Bearer token auth)

```ruby
# Test structure:
describe 'Admin API Diagnostics' do
  let(:valid_token) { 'test-admin-api-token' }

  before do
    ENV['DEPLOYHQ_ADMIN_API_TOKEN'] = valid_token
  end

  after do
    ENV.delete('DEPLOYHQ_ADMIN_API_TOKEN')
  end

  describe 'authentication' do
    # 401 without token
    # 401 with wrong token
    # 500 when DEPLOYHQ_ADMIN_API_TOKEN not configured
    # 200 with correct token
  end

  describe 'GET /admin_api/deployments/:identifier' do
    # Returns deployment with correct metadata
    # Returns correct server and step counts
    # Detects missing transfer_files steps (mismatch)
    # Detects orphaned transfer_files steps
    # Detects duplicate transfer_files steps
    # Skips mismatch detection for preview deployments
    # Returns 404 for unknown identifier
  end

  describe 'GET /admin_api/deployments/:identifier/servers' do
    # Returns per-server step breakdown
    # has_transfer_step is true/false correctly
    # Returns 404 for unknown identifier
  end

  describe 'GET /admin_api/accounts/:permalink/recent_failures' do
    # Returns failures scoped to account
    # Respects limit param
    # Caps limit at 100
    # Returns 404 for unknown permalink
  end

  describe 'GET /admin_api/server_groups/:identifier' do
    # Returns group with server list
    # Handles polymorphic parent gracefully
    # Returns 404 for unknown identifier
  end
end
```

Use factories to create deterministic test data:
- A deployment with 3 servers and only 2 transfer_files steps (to test mismatch detection)
- A deployment with matching servers and steps (no mismatch)
- A preview deployment (mismatch detection skipped)
- A failed deployment (for recent_failures)
- A server group with mixed enabled/disabled servers

#### 2. MCP Server Tests
**File**: `src/__tests__/admin-tools.test.ts` (new)
**Pattern**: Follow `src/__tests__/mcp-server.test.ts` (Vitest, `invokeToolForTest()` helper at line 375)

Tests:
- Admin tools hidden when no `adminToken` in config
- Admin tools visible when `adminToken` is provided
- Each admin tool validates input schema correctly
- Each admin tool calls the correct `DeployHQAdminClient` method

### Success Criteria:

#### Automated Verification:
- [ ] `bin/rspec spec/requests/admin_api/` passes
- [ ] `npm run test` passes in MCP server
- [ ] All mismatch scenarios covered with deterministic fixtures

---

## Testing Strategy

### Unit Tests (Rails)
- Auth: token validation, missing token, wrong token, unconfigured token
- Endpoints: correct JSON shape, correct counts, edge cases
- Mismatch detection: missing steps, orphaned steps, duplicate steps, preview skip
- Identifier lookup: by UUID/permalink, 404 for unknown

### Unit Tests (MCP Server)
- Tool registration: conditional on admin token
- Tool dispatch: correct schema validation, correct client method calls
- Error handling: descriptive errors when admin tools unavailable

### Integration Tests
- Full round-trip: MCP tool -> HTTP request -> Rails controller -> JSON response
- Test against staging with `DEPLOYHQ_ADMIN_API_TOKEN` set

### Manual Testing Steps
1. Set `DEPLOYHQ_ADMIN_API_TOKEN` on staging
2. Configure MCP server with `DEPLOYHQ_ADMIN_TOKEN` matching
3. From Claude Code, call `admin_get_deployment` with a known deployment UUID
4. Verify response shape matches the contract

## Performance Considerations

- Admin endpoints do eager loading (`includes`) to avoid N+1 queries
- `deployment_servers` endpoint loads all steps once then filters in Ruby (avoids N+1)
- `recent_failures` has a default limit of 20, hard cap at 100
- No caching needed -- these are diagnostic queries, not high-traffic endpoints

## Security Considerations

- **Auth model**: Bearer token via `Authorization` header, validated with `ActiveSupport::SecurityUtils.secure_compare` (prevents timing attacks)
- **Exposure**: Routes are outside `AdminConstraint` (no subdomain restriction), same model as `InternalApiController`. Authentication relies solely on the token.
- **Read-only**: All endpoints are GET requests returning JSON. No mutations possible.
- **Logging**: Every authenticated request logs IP and path. Failed auth attempts log IP.
- **Token management**: Stored as env var `DEPLOYHQ_ADMIN_API_TOKEN`. Rotation = change env var + restart. No database records.
- **Cancelled accounts**: Queryable by design. Support investigations often involve inactive accounts. Account status is visible in responses.
- **Future hardening**: IP allowlisting can be added as a `before_action` using `AuthorizedNetworks.valid_ip?` (same pattern as `AdminConstraint` in `routes.rb:9-13`).

---

## Tier 2: Read-Only SQL Tool (Future)

### Motivation
Tier 1 covers ~80% of support investigations with structured endpoints. The remaining 20% are ad-hoc questions that can't be predicted: "how many deployments hit this code path last week?", "is there a pattern in which server types fail?", "what's the distribution of server group sizes across accounts?".

### Design
A single admin API endpoint that accepts a SQL query and returns results:

```
POST /admin_api/query
{
  "sql": "SELECT COUNT(*) FROM deployments WHERE failed_at > '2026-04-01'"
}
```

### Guardrails Required
- **Read replica only**: Connect to a read replica, never the primary
- **SELECT only**: Parse/validate that the query starts with SELECT (or use a read-only database user)
- **Statement timeout**: 10-second max query time to prevent table scans
- **Row limit**: Max 1000 rows returned
- **Audit logging**: Log every query with timestamp, IP, and the full SQL
- **No sensitive columns**: Strip or redact known sensitive fields (encrypted passwords, tokens)
- **Rate limiting**: Max N queries per minute

### MCP Tool
```
admin_sql_query:
  sql: string (the SELECT query)
  -> returns: { columns: string[], rows: any[][], row_count: number, truncated: boolean }
```

### When to Build
Build Tier 2 when there are 3+ support cases where Tier 1 endpoints were insufficient and a custom query would have solved the problem. Track these cases to justify the investment and inform what guardrails are most important.

### Risks
- SQL injection is mitigated by read-only user + read replica, but still requires careful implementation
- Large result sets could be expensive -- row limit and timeout are essential
- Query complexity is unbounded -- may need a query cost estimator

---

## References

- Internal API pattern: `app/controllers/internal_api_controller.rb`
- Internal API test pattern: `spec/requests/internal_api_controller_spec.rb`
- Admin deployments controller: `app/controllers/admin/deployments_controller.rb`
- ServerGroup model (polymorphic parent): `app/models/server_group.rb:36`
- MCP server entry point: `deployhq-mcp-server/src/stdio.ts`
- MCP server tool dispatch: `deployhq-mcp-server/src/mcp-server.ts:81-364`
- MCP server test helper: `deployhq-mcp-server/src/__tests__/mcp-server.test.ts:375`
- Bug that motivated this: `fix/retry-deployment-server-step-mismatch` branch
- Handoff: `thoughts/shared/handoffs/general/2026-04-14_10-38-26_retry-deployment-server-step-mismatch.md`