
## Context
[DHQ-537](https://linear.app/deployhq/issue/DHQ-537/api-add-progress-fields-to-deployment-steps-response) asks for progress fields in the deployment steps API. 

PR #719 adds `started_at`/`finished_at` timestamps and a computed `transfer_speed` to the deployment steps API response. This requires an Lhm migration on `deployment_steps` (166M+ rows in production, ~30 min downtime window, weekend-only).

Reviewers raised two concerns:

1. **@facundofarias**: "Changing this table is tricky, we will have to do it during the weekend. Last time it took 30m." / "Also wondering if this is the only way to do it?"
2. **@thdurante**: Suggested skipping detailed progress fields and showing simpler status (like "thinking"/"processing") instead.

This proposal evaluates alternatives.

## Options

### Option A: Expose `completed_items` only

Expose the existing `completed_items` column in the API. No new columns, no migration.

**Changes**: One line in `to_hash`, already done in PR #719. `completed_items` was always returned; it just was not documented as the key addition.

**API response**:
```json
{
  "step": "transfer_files",
  "status": "running",
  "total_items": 31344,
  "completed_items": 12400
}
```

**CLI can show**: `Transferring files: 12,400/31,344 (39%)`

**CLI cannot show**: Speed, ETA, elapsed time.

| Pros | Cons |
|------|------|
| Zero migration risk | No speed/ETA |
| Ships immediately | Clients must compute their own rate by polling delta |
| Already populated in production | |
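
A minimal sketch of what a client could do with Option A alone: build the progress line from `total_items`/`completed_items`, and estimate a rate from the delta between two polls. The helper names (`with_commas`, `progress_line`, `items_per_second`) are hypothetical and not part of the API or CLI.

```ruby
# Hypothetical client-side helpers; names are illustrative only.

# Formats an integer with thousands separators, e.g. 12400 -> "12,400".
def with_commas(number)
  number.to_s.reverse.scan(/\d{1,3}/).join(",").reverse
end

# Builds a progress line from a parsed step hash (string keys, as in the JSON).
def progress_line(step)
  total = step.fetch("total_items")
  done  = step.fetch("completed_items")
  # Integer division floors the percentage; guard against an empty step.
  percent = total.positive? ? (done * 100) / total : 0
  "Transferring files: #{with_commas(done)}/#{with_commas(total)} (#{percent}%)"
end

# Estimates items per second from two polls taken `interval` seconds apart.
# Returns nil when the interval is not positive.
def items_per_second(previous_completed, current_completed, interval)
  return nil unless interval.positive?

  (current_completed - previous_completed) / interval.to_f
end
```

For the sample response above, `progress_line` returns `Transferring files: 12,400/31,344 (39%)`. The polled rate is only as accurate as the polling interval, which is the main cost of leaving speed to clients.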

---

### Option B: Separate `deployment_step_timings` table

Create a new one-to-one side table instead of altering the 166M-row `deployment_steps` table.

```ruby
# New table — instant migration, no Lhm
class CreateDeploymentStepTimings < ActiveRecord::Migration[6.1]
  def change
    create_table :deployment_step_timings do |t|
      t.references :deployment_step, null: false, index: { unique: true }
      t.datetime :started_at, precision: 6
      t.datetime :finished_at, precision: 6
    end
  end
end
```

```ruby
# Models
class DeploymentStep < ApplicationRecord
  has_one :timing, class_name: 'DeploymentStepTiming', dependent: :delete
end

class DeploymentStepTiming < ApplicationRecord
  belongs_to :deployment_step
end
```

**API response**: Same as original PR (with `started_at`, `finished_at`, `transfer_speed`).

| Pros | Cons |
|------|------|
| No Lhm, instant migration | Extra query or join on API reads (N+1 risk, mitigated with `includes`) |
| Sub-second precision (DATETIME(6)) | Slightly more complex model |
| Only new deployments create rows | Two tables to reason about |
| Full speed/ETA support | |
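
As a rough illustration of how speed and ETA could be derived once timings exist, here is a plain-Ruby sketch. It assumes `transfer_speed` means completed items per elapsed second, which the PR may define differently. The method names and keyword arguments are hypothetical.

```ruby
# Hypothetical helpers; assumes transfer_speed = completed items / elapsed seconds.

# Returns items per second, or nil when the step has no usable timing
# (not started yet, created before the timings table, or zero elapsed time).
def transfer_speed(completed_items:, started_at:, finished_at: nil, now: Time.now)
  return nil if started_at.nil?

  # A running step has no finished_at yet, so measure against `now`.
  elapsed = (finished_at || now) - started_at
  return nil unless elapsed.positive?

  (completed_items / elapsed).round(2)
end

# Returns the estimated seconds remaining, rounded up, or nil without a speed.
def eta_seconds(total_items:, completed_items:, speed:)
  return nil if speed.nil? || !speed.positive?

  ((total_items - completed_items) / speed).ceil
end
```

Returning `nil` rather than `0` for steps without a timing row lets clients tell "no data" apart from "no progress", which matters because only new deployments create rows.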

---

### Option C: Store timing in Redis

Track `started_at`/`finished_at` in Redis, keyed by deployment step identifier. No schema change at all.

```ruby
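# Assumes Rails.cache is backed by Redis.
def set_started_at
  Rails.cache.write("step_timing:#{id}:started_at", Time.current.iso8601(6), expires_in: 24.hours)
end

def started_at
  return @started_at if @started_at

  raw = Rails.cache.read("step_timing:#{id}:started_at")
  # Never pass nil to Time.zone.parse; return nil when the key expired or was evicted.
  @started_at = Time.zone.parse(raw) if raw
end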
```

**API response**: Same as original PR (with `started_at`, `finished_at`, `transfer_speed`).

| Pros | Cons |
|------|------|
| No migration at all | Ephemeral — data lost on Redis restart/eviction |
| Very fast reads/writes | No historical queries |
| Zero DB impact | Adds Redis as a dependency for this feature |
| | Cache expiry edge cases |

---

## Discarded options

| Option | Why discarded |
|--------|--------------|
| **Derive from `updated_at`** | `updated_at` is overwritten by `finalise` (which calls `save` after batch-inserting logs), so you can't recover when a step started running |
| **Compute from `deployment.started_at`** | Too coarse — a deployment runs many steps sequentially (preparing, building, transferring, finishing), so deployment start != step start |
| **Original PR (Lhm on `deployment_steps`)** | 30min production migration on 166M-row table, weekend-only window, risk flagged by reviewer |

## Recommendation

**Start with Option A, follow up with Option B.**

1. **Now**: Expose `completed_items` in the API (trivial, no migration). This unblocks CLI progress bars (`39% complete`) immediately.
2. **Follow-up**: Add `deployment_step_timings` table for `started_at`/`finished_at`/`transfer_speed`. This is a safe, instant migration that can ship on any day — no weekend window needed.

This splits the PR into a small shippable piece and a low-risk follow-up, addressing both reviewers' concerns.

## For discussion

- Is progress percentage (`completed_items / total_items`) sufficient for the CLI MVP, or is speed/ETA a hard requirement?
- If we go with Option B, should we eager-load timings in the deployments API to avoid N+1?
- Should `transfer_speed` be computed server-side or left to clients?
