Archon/migrations/019_workflow_resume_path.sql
Rasmus Widing 54697d40d1
feat: resume failed workflows from prior artifacts on same branch (#440)
* feat: resume failed workflows from prior artifacts on same branch

When a workflow fails mid-execution (rate limits, crashes) and is
re-run on the same branch/worktree, the executor now detects the prior
failed run and resumes from the first incomplete step instead of
starting fresh.

- Adds working_path column to workflow_runs (migration 019 + combined)
- Adds findResumableRun() and resumeWorkflowRun() to db/workflows
- Resume detection runs before createWorkflowRun in the executor;
  falls through to fresh run on DB error (non-critical)
- Step skip logic at top of loop emits step_skipped_prior_success events
- Only resumes when current_step_index > 0 (something to skip)
- Completed runs and runs with 0 completed steps always get a fresh start
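The detection rules above can be sketched as a single lookup. This is a minimal sketch that assumes a trimmed-down version of `remote_agent_workflow_runs`. The columns other than `working_path` and `current_step_index` (`workflow_name`, `status`, `started_at`), the status strings, and the sample paths are hypothetical. It also omits the `conversation_id` filter that was added later in this series, so the real `findResumableRun()` query will differ. The lookup selects the most recent failed run for the same workflow and worktree, then reports whether it has any completed steps to skip.

```sql
-- Hypothetical, trimmed-down table for illustration; the real
-- remote_agent_workflow_runs table has more (and possibly different) columns.
CREATE TABLE remote_agent_workflow_runs (
  id                 TEXT PRIMARY KEY,
  workflow_name      TEXT NOT NULL,
  status             TEXT NOT NULL,               -- assumed values: 'running', 'failed', 'completed'
  current_step_index INTEGER NOT NULL DEFAULT 0,  -- assumed: index of the step that was running when the run failed
  working_path       TEXT,                        -- column added by migration 019
  started_at         TEXT NOT NULL                -- ISO-8601 text sorts chronologically
);

INSERT INTO remote_agent_workflow_runs VALUES
  ('run-1', 'implement', 'completed', 4, '/worktrees/feat-x', '2026-02-18T09:00:00'),
  ('run-2', 'implement', 'failed',    2, '/worktrees/feat-x', '2026-02-18T10:00:00'),
  ('run-3', 'implement', 'failed',    0, '/worktrees/feat-y', '2026-02-18T11:00:00');

-- Most recent failed run of this workflow in this worktree. Completed runs
-- are never candidates. A failed run with current_step_index = 0 is found
-- but should not be resumed, because there is nothing to skip.
SELECT id,
       current_step_index,
       current_step_index > 0 AS resumable
FROM remote_agent_workflow_runs
WHERE workflow_name = 'implement'
  AND working_path  = '/worktrees/feat-x'
  AND status        = 'failed'
ORDER BY started_at DESC
LIMIT 1;
```

For `/worktrees/feat-x` this selects `run-2`, because the newer `run-1` has already completed. For `/worktrees/feat-y` the only candidate, `run-3`, has `current_step_index = 0` and would get a fresh start.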

* fix: add working_path to SQLite schema and split resume error handling

Two gaps from the initial implementation:

1. SQLite migrateColumns() never added working_path for existing
   databases. createSchema() already had it for fresh installs, but
   existing users would hit 'no such column: working_path' on first
   run. Added the IF NOT EXISTS guard alongside parent_conversation_id.
   Also added working_path to createSchema() CREATE TABLE for completeness.

2. findResumableRun and resumeWorkflowRun shared a single catch block
   that treated both errors as non-critical (fall through to fresh run).
   resumeWorkflowRun failing is different: a run was detected but
   couldn't be activated, so silently creating a fresh run would leave
   the user without prior artifacts. Split into two separate try-catch
   blocks: findResumableRun errors fall through (non-critical),
   resumeWorkflowRun errors propagate as { success: false }.
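SQLite's `ALTER TABLE ... ADD COLUMN` does not accept `IF NOT EXISTS`. On that backend, the guard described in item 1 therefore has to be an existence check that runs before the `ALTER`. Below is a minimal sketch of such a check against a hypothetical pre-019 table. The actual `migrateColumns()` code is not shown in this commit view and may do this differently. The branch itself belongs in application code, because plain SQLite SQL has no conditional DDL.

```sql
-- Hypothetical pre-019 table: no working_path column yet.
CREATE TABLE remote_agent_workflow_runs (
  id                     TEXT PRIMARY KEY,
  parent_conversation_id TEXT
);

-- Step 1: ask SQLite whether the column exists (table-valued pragma,
-- available since SQLite 3.16). Returns 0 here.
SELECT COUNT(*) AS has_working_path
FROM pragma_table_info('remote_agent_workflow_runs')
WHERE name = 'working_path';

-- Step 2: the application runs this only when step 1 returned 0.
-- Running it again would fail with a "duplicate column name" error.
ALTER TABLE remote_agent_workflow_runs ADD COLUMN working_path TEXT;

-- Re-checking now returns 1.
SELECT COUNT(*) AS has_working_path
FROM pragma_table_info('remote_agent_workflow_runs')
WHERE name = 'working_path';
```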

* fix: address resume workflow review feedback

- Add conversation_id filter to findResumableRun to prevent cross-conversation resume leaks
- Fix stepNumber initialization on resume (now starts at resumeFromStepIndex to show correct Step X/N)
- Add session-context warning to resume notification message
- Guard updateWorkflowRun(status=running) behind !resumeFromStepIndex to avoid redundant write on resumed runs
- Fix resumeWorkflowRun: move not-found throw outside try-catch to distinguish race from DB error
- Lower findResumableRun log level from error to warn (failure is non-critical at executor level)
- Add user notification when findResumableRun silently falls through to fresh run
- Log when current_step_index=0 guard drops a found-but-not-resumable run
- Use WorkflowRun directly instead of Awaited<ReturnType<...>> for workflowRun variable
- Add comment explaining no emitter.emit for step_skipped_prior_success events
- Add unit tests for findResumableRun and resumeWorkflowRun in workflows.test.ts
- Add executor tests for resume activation failure and findResumableRun error fall-through
- Fix mockQuery.mockRestore() → mockImplementation(defaultMockQuery) in resume tests
- Tighten resume message assertion to verify step numbers and context warning text

* feat: extend workflow resume to loop workflows

Loop workflows now resume from the recorded iteration instead of always
restarting from iteration 1. Derives startIteration from
workflowRun.current_step_index (set at the start of each iteration), and
notifies the user when resuming mid-loop.

* fix: harden loop workflow resume against edge cases

- Guard against startIteration > max_iterations (YAML reduced between runs):
  fail fast with a clear user message instead of silently misfiring
- Fix needsFreshSession to use startIteration instead of i===1 so the
  first resumed iteration is correctly treated as a session start
- Show resume banner for all resumed loop runs (not just startIteration > 1),
  since resuming from iteration 1 is still a resume, not a fresh start
- Add comment clarifying that current_step_index is written at iteration START
2026-02-18 12:29:30 +02:00

-- Add working_path to workflow_runs for resume detection
-- Version: 19.0
-- Description: Stores the cwd (worktree path) for each workflow run so
-- re-runs on the same branch can find prior failed runs and resume.
ALTER TABLE remote_agent_workflow_runs
ADD COLUMN IF NOT EXISTS working_path TEXT;
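The column is added without a default, so rows that existed before migration 019 read as `working_path = NULL`. Assume the resume lookup filters with an equality test on the current working path. An equality comparison against `NULL` yields `NULL`, not true, so pre-migration failed runs are simply never offered for resume. Below is a small sketch demonstrating this with a hypothetical subset of the table's columns and illustrative paths.

```sql
-- Hypothetical subset of columns. Row 'legacy' stands for a run written
-- before migration 019, so its working_path is NULL.
CREATE TABLE remote_agent_workflow_runs (
  id           TEXT PRIMARY KEY,
  status       TEXT NOT NULL,
  working_path TEXT
);

INSERT INTO remote_agent_workflow_runs VALUES
  ('legacy', 'failed', NULL),
  ('recent', 'failed', '/worktrees/feat-x');

-- NULL = '/worktrees/feat-x' evaluates to NULL, so only 'recent' is returned.
SELECT id
FROM remote_agent_workflow_runs
WHERE status = 'failed'
  AND working_path = '/worktrees/feat-x';
```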