DHQ-537: Progress Fields for Deployment Steps API — Alternatives Proposal

DHQ-537 asks for progress fields in the deployment steps API.

PR #719 adds started_at/finished_at timestamps and a computed transfer_speed field to the deployment steps API response. This requires an Lhm migration on deployment_steps (166M+ rows in production, ~30min downtime window, weekend-only).

Reviewers raised two concerns:

  1. @facundofarias: "Changing this table is tricky, we will have to do it during the weekend. Last time it took 30m." / "Also wondering if this is the only way to do it?"
  2. @thdurante: Suggested skipping detailed progress fields and showing simpler status (like "thinking"/"processing") instead.

This proposal evaluates alternatives.

#Option A: Expose completed_items only

Expose the existing completed_items column in the API. No new columns, no migration.

Changes: One line in to_hash. This is already done in PR #719: completed_items was always returned, just not documented as the key addition.

API response:

{
  "step": "transfer_files",
  "status": "running",
  "total_items": 31344,
  "completed_items": 12400
}

CLI can show: Transferring files: 12,400/31,344 (39%)

CLI cannot show: Speed, ETA, elapsed time.
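
As a rough illustration of what a client could do with only these two counters, here is a minimal sketch. The helper names (format_count, progress_line) are hypothetical and not part of PR #719; the percentage is rounded down, which matches the 39% shown above.

```ruby
# Hypothetical client-side helpers; names are illustrative, not from PR #719.

# Insert thousands separators: 31344 -> "31,344"
def format_count(n)
  n.to_s.reverse.scan(/\d{1,3}/).join(",").reverse
end

# Build a progress line from the counters the API already returns.
def progress_line(label, completed_items, total_items)
  # Without a usable total we can only show the raw count.
  return "#{label}: #{format_count(completed_items)} items" if total_items.nil? || total_items.zero?

  percent = completed_items * 100 / total_items # integer division, rounds down
  "#{label}: #{format_count(completed_items)}/#{format_count(total_items)} (#{percent}%)"
end

puts progress_line("Transferring files", 12_400, 31_344)
```

Rounding down keeps the display from reaching 100% before the step has actually finished.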

| Pros | Cons |
| --- | --- |
| Zero migration risk | No speed/ETA |
| Ships immediately | Clients must compute their own rate by polling delta |
| Already populated in production | |
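
The "polling delta" con can be made concrete with a small sketch of what each client would have to implement under Option A. Everything here (the Sample struct, rate_and_eta, items per second as the unit) is an assumption for illustration, not an existing client API.

```ruby
# Hypothetical client-side estimator: derives rate and ETA from two polls.
Sample = Struct.new(:completed_items, :polled_at)

def rate_and_eta(previous, current, total_items)
  elapsed = current.polled_at - previous.polled_at # Time - Time gives seconds as a Float
  delta = current.completed_items - previous.completed_items
  # No estimate if no time passed or no progress was made between polls.
  return nil if elapsed <= 0 || delta <= 0

  rate = delta / elapsed.to_f
  remaining = total_items - current.completed_items
  { items_per_second: rate, eta_seconds: remaining / rate }
end

previous = Sample.new(12_000, Time.at(0))
current  = Sample.new(12_400, Time.at(10))
p rate_and_eta(previous, current, 31_344)
```

An estimate built from two samples is noisy, so a real client would likely smooth it over several polls.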

#Option B: Separate deployment_step_timings table

Create a new one-to-one side table (one timing row per step) instead of altering the 166M-row deployment_steps table.

# New table: instant migration, no Lhm
class CreateDeploymentStepTimings < ActiveRecord::Migration[6.1]
  def change
    create_table :deployment_step_timings do |t|
      # The unique index enforces at most one timing row per step
      t.references :deployment_step, null: false, index: { unique: true }
      t.datetime :started_at, precision: 6
      t.datetime :finished_at, precision: 6
    end
  end
end

# Models
class DeploymentStep < ApplicationRecord
  has_one :timing, class_name: 'DeploymentStepTiming', dependent: :delete
end

class DeploymentStepTiming < ApplicationRecord
  belongs_to :deployment_step
end

API response: Same as original PR (with started_at, finished_at, transfer_speed).
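
For reference, a minimal sketch of how transfer_speed might be derived from the timing columns. This assumes the unit is completed items per second of step runtime; the proposal does not specify the unit, and PR #719 may compute it differently. The function signature is hypothetical.

```ruby
# Assumption: transfer_speed = completed items per second of step runtime.
# For a running step (no finished_at yet) the current time is used instead.
def transfer_speed(completed_items, started_at, finished_at = nil, now: Time.now)
  return nil if started_at.nil? # e.g. steps created before the timings table existed

  elapsed = (finished_at || now) - started_at # seconds as a Float
  return nil unless elapsed.positive?

  (completed_items / elapsed).round(2)
end

p transfer_speed(12_400, Time.at(0), Time.at(100))
```

Returning nil rather than 0 for steps without a timing row lets clients distinguish "unknown" from "stalled".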

| Pros | Cons |
| --- | --- |
| No Lhm, instant migration | Extra join on API reads (N+1 risk, mitigated with includes) |
| Sub-second precision (DATETIME(6)) | Slightly more complex model |
| Only new deployments create rows | Two tables to reason about |
| Full speed/ETA support | |

#Option C: Store timing in Redis

Track started_at/finished_at in Redis, keyed by deployment step identifier. No schema change at all.

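# Assumes Rails.cache is backed by Redis (e.g. :redis_cache_store)
def set_started_at
  Rails.cache.write("step_timing:#{id}:started_at", Time.current.iso8601(6), expires_in: 24.hours)
end

def started_at
  return @started_at if @started_at

  # On a cache miss (expiry, eviction, restart) return nil instead of
  # passing nil to Time.zone.parse
  raw = Rails.cache.read("step_timing:#{id}:started_at")
  @started_at = Time.zone.parse(raw) if raw
end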

API response: Same as original PR (with started_at, finished_at, transfer_speed).

| Pros | Cons |
| --- | --- |
| No migration at all | Ephemeral: data lost on Redis restart/eviction |
| Very fast reads/writes | No historical queries |
| Zero DB impact | Adds Redis as a dependency for this feature |
| | Cache expiry edge cases |

| Option | Why discarded |
| --- | --- |
| Derive from updated_at | updated_at is overwritten by finalise (which calls save after batch-inserting logs), so you can't recover when a step started running |
| Compute from deployment.started_at | Too coarse: a deployment runs many steps sequentially (preparing, building, transferring, finishing), so deployment start != step start |
| Original PR (Lhm on deployment_steps) | 30min production migration on 166M-row table, weekend-only window, risk flagged by reviewer |

Start with Option A, follow up with Option B.

  1. Now: Expose completed_items in the API (trivial, no migration). This unblocks CLI progress bars (39% complete) immediately.
  2. Follow-up: Add deployment_step_timings table for started_at/finished_at/transfer_speed. This is a safe, instant migration that can ship on any day — no weekend window needed.

This splits the PR into a small shippable piece and a low-risk follow-up, addressing both reviewers' concerns.
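
One way to picture the two phases is a serializer whose first-phase output is a strict subset of the second, so clients written against Option A keep working once timings arrive. This is only a sketch: step_payload and the hash-based step/timing arguments are hypothetical stand-ins, not the actual to_hash in PR #719.

```ruby
require "time" # Time#iso8601

# Hypothetical serializer sketch. Phase 1 returns only the counters;
# phase 2 adds timing fields when a timing record exists for the step.
def step_payload(step, timing = nil)
  payload = {
    step: step[:step],
    status: step[:status],
    total_items: step[:total_items],
    completed_items: step[:completed_items]
  }
  # Steps without a timing row keep the phase 1 shape.
  return payload if timing.nil? || timing[:started_at].nil?

  payload.merge(
    started_at: timing[:started_at].utc.iso8601(6),
    finished_at: timing[:finished_at]&.utc&.iso8601(6) # nil while still running
  )
end

step = { step: "transfer_files", status: "running", total_items: 31_344, completed_items: 12_400 }
p step_payload(step)
p step_payload(step, { started_at: Time.at(0), finished_at: nil })
```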

  • Is progress percentage (completed_items / total_items) sufficient for the CLI MVP, or is speed/ETA a hard requirement?
  • If we go with Option B, should we eager-load timings in the deployments API to avoid N+1?
  • Should transfer_speed be computed server-side or left to clients?