mirror of
https://github.com/coleam00/Archon
synced 2026-04-21 13:37:41 +00:00
* feat: resume failed workflows from prior artifacts on same branch
When a workflow fails mid-execution (rate limits, crashes) and is
re-run on the same branch/worktree, the executor now detects the prior
failed run and resumes from the first incomplete step instead of
starting fresh.
- Adds working_path column to workflow_runs (migration 019 + combined)
- Adds findResumableRun() and resumeWorkflowRun() to db/workflows
- Resume detection runs before createWorkflowRun in the executor;
  falls through to a fresh run on DB error (non-critical)
- Step skip logic at top of loop emits step_skipped_prior_success events
- Only resumes when current_step_index > 0 (something to skip)
- Completed runs and runs with 0 completed steps always get a fresh start
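The resume rules listed above can be sketched as a small decision function. This is a minimal illustration, not the executor's actual code: the `PriorRun` shape and the names `resumeIndexFor` and `planSteps` are hypothetical, and the sketch assumes `current_step_index` holds the index of the first incomplete step.

```typescript
// Hypothetical row shape: only status, current_step_index and working_path
// are named in the commit message; everything else is an assumption.
interface PriorRun {
  id: string;
  status: "running" | "completed" | "failed";
  current_step_index: number;
  working_path: string | null;
}

// Returns the step index to resume from, or null when a fresh run is needed.
function resumeIndexFor(prior: PriorRun | null, cwd: string): number | null {
  if (prior === null) return null;               // nothing found
  if (prior.status !== "failed") return null;    // completed runs start fresh
  if (prior.working_path !== cwd) return null;   // different branch/worktree
  if (prior.current_step_index <= 0) return null; // nothing to skip
  return prior.current_step_index;
}

// Labels each step as skipped (already succeeded in the prior run) or to be run.
function planSteps(stepNames: string[], resumeFrom: number | null): string[] {
  return stepNames.map((name, i) =>
    resumeFrom !== null && i < resumeFrom
      ? `step_skipped_prior_success:${name}`
      : `run:${name}`
  );
}
```

With three steps and a prior run that failed at index 2, the first two steps would be reported as skipped and only the third would execute.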
* fix: add working_path to SQLite schema and split resume error handling
Two gaps from the initial implementation:
1. SQLite migrateColumns() never added working_path for existing
   databases. createSchema() already had it for fresh installs, but
   existing users would hit 'no such column: working_path' on first
   run. Added the IF NOT EXISTS guard alongside parent_conversation_id.
   Also added working_path to createSchema() CREATE TABLE for completeness.
2. findResumableRun and resumeWorkflowRun shared a single catch block
   that treated both errors as non-critical (fall through to fresh run).
   resumeWorkflowRun failing is different: a run was detected but
   couldn't be activated, so silently creating a fresh run would leave
   the user without prior artifacts. Split into two separate try-catch
   blocks: findResumableRun errors fall through (non-critical),
   resumeWorkflowRun errors propagate as { success: false }.
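The split described in item 2 can be sketched as two independent try-catch blocks. This is an illustration under assumptions, not the executor's code: the `Deps` bundle, the `StartResult` type, and `startOrResume` are hypothetical, and the real db/workflows functions take arguments that the commit message does not show.

```typescript
interface WorkflowRun {
  id: string;
  current_step_index: number;
}

// Hypothetical dependency bundle so the sketch runs without a database.
interface Deps {
  findResumableRun: () => Promise<WorkflowRun | null>;
  resumeWorkflowRun: (id: string) => Promise<WorkflowRun>;
  createWorkflowRun: () => Promise<WorkflowRun>;
}

type StartResult =
  | { success: true; run: WorkflowRun; resumed: boolean }
  | { success: false; error: string };

async function startOrResume(deps: Deps): Promise<StartResult> {
  // Block 1: detection failure is non-critical, so fall through to a fresh run.
  let candidate: WorkflowRun | null = null;
  try {
    candidate = await deps.findResumableRun();
  } catch {
    candidate = null;
  }

  // Block 2: a run was found, so failing to activate it must surface to the
  // caller instead of silently discarding the prior artifacts.
  if (candidate !== null && candidate.current_step_index > 0) {
    try {
      const run = await deps.resumeWorkflowRun(candidate.id);
      return { success: true, run, resumed: true };
    } catch (err) {
      const message = err instanceof Error ? err.message : String(err);
      return { success: false, error: message };
    }
  }

  const run = await deps.createWorkflowRun();
  return { success: true, run, resumed: false };
}
```

Keeping the two blocks separate means each failure gets its own policy: a lookup error costs at most the chance to resume, while an activation error stops the run before a fresh one can be created.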
* fix: address resume workflow review feedback
- Add conversation_id filter to findResumableRun to prevent cross-conversation resume leaks
- Fix stepNumber initialization on resume (now starts at resumeFromStepIndex to show correct Step X/N)
- Add session-context warning to resume notification message
- Guard updateWorkflowRun(status=running) behind !resumeFromStepIndex to avoid redundant write on resumed runs
- Fix resumeWorkflowRun: move not-found throw outside try-catch to distinguish race from DB error
- Lower findResumableRun log level from error to warn (failure is non-critical at executor level)
- Add user notification when findResumableRun silently falls through to fresh run
- Log when current_step_index=0 guard drops a found-but-not-resumable run
- Use WorkflowRun directly instead of Awaited<ReturnType<...>> for workflowRun variable
- Add comment explaining no emitter.emit for step_skipped_prior_success events
- Add unit tests for findResumableRun and resumeWorkflowRun in workflows.test.ts
- Add executor tests for resume activation failure and findResumableRun error fall-through
- Fix mockQuery.mockRestore() → mockImplementation(defaultMockQuery) in resume tests
- Tighten resume message assertion to verify step numbers and context warning text
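One item above moves the not-found throw outside the try-catch so that a race can be told apart from a DB error. A minimal sketch of that shape follows; `UpdateFn`, `activateRun`, and both error messages are hypothetical, and the sketch assumes the underlying UPDATE reports how many rows it touched.

```typescript
// Hypothetical: resolves to the number of rows the UPDATE touched.
type UpdateFn = (runId: string) => Promise<number>;

async function activateRun(update: UpdateFn, runId: string): Promise<void> {
  let rowCount: number;
  try {
    rowCount = await update(runId);
  } catch (err) {
    // Genuine database failure: wrap it with context and rethrow.
    const reason = err instanceof Error ? err.message : String(err);
    throw new Error(`Failed to resume workflow run: ${reason}`);
  }

  // Deliberately outside the try-catch: zero rows means the run disappeared
  // or changed state between detection and activation (a race), and this
  // error must not be rewrapped as a DB failure by the catch above.
  if (rowCount === 0) {
    throw new Error(`Workflow run not found (id: ${runId})`);
  }
}
```

If the throw lived inside the `try`, the `catch` would intercept it and the caller would see a "Failed to resume" message for what is really a missing row.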
* feat: extend workflow resume to loop workflows
Loop workflows now resume from the recorded iteration instead of always
restarting from iteration 1. Derives startIteration from
workflowRun.current_step_index (set at the start of each iteration), and
notifies the user when resuming mid-loop.
* fix: harden loop workflow resume against edge cases
- Guard against startIteration > max_iterations (YAML reduced between runs):
  fail fast with a clear user message instead of silently misfiring
- Fix needsFreshSession to use startIteration instead of i===1 so the
  first resumed iteration is correctly treated as a session start
- Show resume banner for all resumed loop runs (not just startIteration > 1),
  since resuming from iteration 1 is still a resume, not a fresh start
- Add comment clarifying that current_step_index is written at iteration START
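The loop-resume rules from the last two commits can be sketched together. This is an illustration under assumptions: `planLoop` and `LoopPlan` are hypothetical names, the sketch assumes `current_step_index` stores the 1-based iteration number written at iteration start, and the real `needsFreshSession` may combine this check with other conditions.

```typescript
type LoopPlan =
  | { ok: true; startIteration: number; showResumeBanner: boolean }
  | { ok: false; message: string };

// resumedStepIndex is null for a fresh run, otherwise the recorded
// current_step_index of the run being resumed (assumed 1-based here).
function planLoop(resumedStepIndex: number | null, maxIterations: number): LoopPlan {
  const startIteration =
    resumedStepIndex !== null ? Math.max(1, resumedStepIndex) : 1;

  // max_iterations may have been lowered in the YAML between runs: fail fast.
  if (startIteration > maxIterations) {
    return {
      ok: false,
      message: `Cannot resume from iteration ${startIteration}: max_iterations is now ${maxIterations}.`,
    };
  }

  // Every resumed run shows the banner, even when it resumes at iteration 1.
  return { ok: true, startIteration, showResumeBanner: resumedStepIndex !== null };
}

// The first iteration actually executed starts a session, whether that is
// iteration 1 (fresh run) or a later one (resumed run).
function needsFreshSession(i: number, startIteration: number): boolean {
  return i === startIteration;
}
```

Comparing against `startIteration` rather than the literal `1` is what lets a run resumed at iteration 3 open a new session there instead of trying to continue one that no longer exists.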
7 lines
304 B
SQL
-- Add working_path to workflow_runs for resume detection
-- Version: 19.0
-- Description: Stores the cwd (worktree path) for each workflow run so
-- re-runs on the same branch can find prior failed runs and resume.

ALTER TABLE remote_agent_workflow_runs
ADD COLUMN IF NOT EXISTS working_path TEXT;
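The migration above relies on `ADD COLUMN IF NOT EXISTS`, which PostgreSQL accepts. SQLite's `ALTER TABLE ... ADD COLUMN` has no `IF NOT EXISTS` clause, so a SQLite-side guard has to check for the column first. A minimal sketch of such a guard follows; `addColumnIfMissing` is a hypothetical helper and not the repository's `migrateColumns()`, and it assumes the caller has already read the table's columns with `PRAGMA table_info(<table>)`.

```typescript
// One row of SQLite's `PRAGMA table_info(<table>)` output (only the field
// this sketch needs; the pragma also returns cid, type, notnull, dflt_value, pk).
interface ColumnInfo {
  name: string;
}

// Returns the ALTER statement to run, or null when the column already exists.
// Hypothetical helper: identifiers are interpolated directly, so they must
// come from trusted migration code, never from user input.
function addColumnIfMissing(
  table: string,
  column: string,
  sqlType: string,
  existing: ColumnInfo[]
): string | null {
  const present = existing.some((c) => c.name === column);
  return present ? null : `ALTER TABLE ${table} ADD COLUMN ${column} ${sqlType}`;
}
```

Returning the statement instead of executing it keeps the check testable without a database; a caller would run the returned SQL only when it is not null.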