Authoring Workflows for Archon
This guide explains how to create workflows that orchestrate multiple commands into automated pipelines. Read Authoring Commands first - workflows are built from commands.
What is a Workflow?
A workflow is a YAML file that defines a directed acyclic graph (DAG) of nodes to execute. Workflows enable:
- Multi-node automation: Chain multiple AI agents together with dependency edges
- Parallel execution: Independent nodes in the same topological layer run concurrently
- Conditional branching: `when:` conditions and `trigger_rule` control which nodes run
- Autonomous loops: Loop nodes iterate until a condition is met
```yaml
name: fix-github-issue
description: Investigate and fix a GitHub issue end-to-end
nodes:
  - id: investigate
    command: investigate-issue
  - id: implement
    command: implement-issue
    depends_on: [investigate]
    context: fresh
```
File Location
Workflows live in .archon/workflows/ relative to the working directory:
```text
.archon/
├── workflows/
│   ├── my-workflow.yaml
│   └── review/
│       └── full-review.yaml   # Subdirectories work
└── commands/
    └── [commands used by workflows]
```
Archon discovers workflows recursively - subdirectories are fine. If a workflow file fails to load (syntax error, validation failure), it's skipped and the error is reported via /workflow list.
CLI vs Server: The CLI reads workflow files from wherever you run it (sees uncommitted changes). The server reads from the workspace clone at `~/.archon/workspaces/owner/repo/`, which only syncs from the remote before worktree creation. If you edit a workflow locally but don't push, the server won't see it.
Workflow Schema
Workflows use a `nodes:` format where nodes declare explicit dependency edges. Independent nodes in the same topological layer run concurrently via `Promise.allSettled`. Skipped nodes (a false `when:` condition or an unsatisfied `trigger_rule`) propagate their skipped state to dependants.
Example: Conditional Branching
```yaml
name: classify-and-fix
description: Classify issue type, then run the appropriate fix path
nodes:
  - id: classify
    command: classify-issue
    output_format:
      type: object
      properties:
        type:
          type: string
          enum: [BUG, FEATURE]
      required: [type]
  - id: investigate
    command: investigate-bug
    depends_on: [classify]
    when: "$classify.output.type == 'BUG'"
  - id: plan
    command: plan-feature
    depends_on: [classify]
    when: "$classify.output.type == 'FEATURE'"
  - id: implement
    command: implement-changes
    depends_on: [investigate, plan]
    trigger_rule: none_failed_min_one_success
```
Full Workflow Schema
```yaml
# Required
name: workflow-name
description: |
  What this workflow does.

# Optional
provider: claude
model: sonnet
modelReasoningEffort: medium   # Codex only
webSearchMode: live            # Codex only

# Required
nodes:
  - id: classify                # Unique node ID (used for dependency refs and $id.output)
    command: classify-issue     # Loads from .archon/commands/classify-issue.md
    output_format:              # Optional: enforce structured JSON output (Claude + Codex)
      type: object
      properties:
        type:
          type: string
          enum: [BUG, FEATURE]
      required: [type]
  - id: investigate
    command: investigate-bug
    depends_on: [classify]                   # Wait for classify to complete
    when: "$classify.output.type == 'BUG'"   # Skip if condition is false
  - id: plan
    command: plan-feature
    depends_on: [classify]
    when: "$classify.output.type == 'FEATURE'"
  - id: implement
    command: implement-changes
    depends_on: [investigate, plan]
    trigger_rule: none_failed_min_one_success  # Run if at least one dep succeeded
  - id: inline-node
    prompt: "Summarize the changes made in $implement.output"  # Inline prompt (no command file)
    depends_on: [implement]
    context: fresh     # Force fresh session for this node
    provider: claude   # Per-node provider override
    model: haiku       # Per-node model override
    # hooks:                             # Optional: per-node SDK hook callbacks (Claude only) — see docs/hooks.md
    # mcp: .archon/mcp/servers.json      # Optional: per-node MCP servers (Claude only)
    # skills: [remotion-best-practices]  # Optional: per-node skills (Claude only) — see docs/skills.md
```
Node Fields
| Field | Type | Default | Description |
|---|---|---|---|
| `id` | string | required | Unique node identifier. Used in `depends_on`, `when:`, and `$id.output` substitution |
| `command` | string | — | Command name to load from `.archon/commands/`. Mutually exclusive with `prompt` |
| `prompt` | string | — | Inline prompt string. Mutually exclusive with `command` |
| `depends_on` | string[] | `[]` | Node IDs that must complete before this node runs |
| `when` | string | — | Condition expression. Node is skipped if false |
| `trigger_rule` | string | `all_success` | Join semantics when multiple upstreams exist |
| `output_format` | object | — | JSON Schema for structured output. Supported for Claude and Codex nodes |
| `context` | `'fresh'` | — | Force a fresh AI session for this node |
| `provider` | `'claude'` \| `'codex'` | inherited | Per-node provider override |
| `model` | string | inherited | Per-node model override |
| `allowed_tools` | string[] | — | Whitelist of built-in tools for this node. `[]` disables all built-in tools (MCP-only mode). Claude only — Codex nodes emit a warning and ignore this field |
| `denied_tools` | string[] | — | Blacklist of built-in tools to remove from this node. Applied after `allowed_tools` if both are set. Claude only — Codex nodes emit a warning and ignore this field |
| `retry` | object | — | Per-node retry configuration. See Retry Configuration. Omit to use the automatic default (2 retries, 3 s base delay, transient errors only) |
| `hooks` | object | — | Per-node SDK hook callbacks. Claude only — Codex nodes emit a warning and ignore this field. See docs/hooks.md |
| `mcp` | string | — | Path to MCP server config JSON file (relative to cwd or absolute). Environment variables (`$VAR_NAME`) in env/headers values are expanded from `process.env` at execution time. Claude only — Codex nodes emit a warning and ignore this field. See docs/mcp-servers.md |
| `skills` | string[] | — | Skill names to preload into this node's agent context. Skills must be installed in `.claude/skills/`. The node is wrapped in an AgentDefinition with these skills, and `Skill` is auto-added to `allowedTools`. Claude only — Codex nodes emit a warning and ignore this field. See docs/skills.md |
trigger_rule Values
| Value | Behavior |
|---|---|
| `all_success` | Run only if all upstream deps completed successfully (default) |
| `one_success` | Run if at least one upstream dep completed successfully |
| `none_failed_min_one_success` | Run if no deps failed AND at least one succeeded (skipped deps are ok) |
| `all_done` | Run when all deps are in a terminal state (completed, failed, or skipped) |
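These join semantics can be read as a predicate over upstream states. A minimal TypeScript sketch for illustration (`shouldRun` and `NodeState` are hypothetical names, not the executor's actual API):

```typescript
type NodeState = "completed" | "failed" | "skipped";

// Hypothetical sketch of the documented join semantics. By the time a
// trigger_rule is evaluated, every upstream dep is already terminal.
function shouldRun(rule: string, deps: NodeState[]): boolean {
  const successes = deps.filter((s) => s === "completed").length;
  const anyFailed = deps.some((s) => s === "failed");
  switch (rule) {
    case "one_success":
      return successes >= 1;
    case "none_failed_min_one_success":
      return !anyFailed && successes >= 1; // skipped deps are ok
    case "all_done":
      return true; // all deps being terminal is the precondition for evaluation
    default: // all_success
      return deps.every((s) => s === "completed");
  }
}
```

For example, a node behind `none_failed_min_one_success` still runs when one upstream branch was skipped but the other succeeded.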
when: Condition Syntax
Conditions use string equality against upstream node outputs:
```yaml
when: "$nodeId.output == 'VALUE'"
when: "$nodeId.output != 'VALUE'"
when: "$nodeId.output.field == 'VALUE'"   # JSON dot notation for output_format nodes
```
- Use `$nodeId.output` to reference the full output string of a completed node
- Use `$nodeId.output.field` to access a JSON field (for `output_format` nodes)
- Invalid expressions default to `true` (fail open — the node runs rather than silently skipping)
- Skipped nodes propagate their skipped state to dependants
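The fail-open rule can be illustrated with a small evaluator for the plain-output form (a sketch under stated assumptions, not the real condition parser; JSON field access is omitted here):

```typescript
// Hypothetical sketch: evaluates "$id.output == 'VALUE'" / "$id.output != 'VALUE'".
// Anything that doesn't match the grammar fails open (returns true),
// mirroring the documented behavior for invalid expressions.
function evalWhen(expr: string, outputs: Record<string, string>): boolean {
  const m = expr.match(/^\$(\w+)\.output\s*(==|!=)\s*'([^']*)'$/);
  if (!m) return true; // invalid expression: node runs rather than silently skipping
  const [, id, op, value] = m;
  const actual = outputs[id] ?? "";
  return op === "==" ? actual === value : actual !== value;
}
```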
$node_id.output Substitution
In node prompts and commands, reference the output of any upstream node:
```yaml
nodes:
  - id: classify
    command: classify-issue
  - id: fix
    command: implement-fix
    depends_on: [classify]
    # The command file can use $classify.output or $classify.output.field
```
Variable substitution order:
1. Standard variables (`$WORKFLOW_ID`, `$USER_MESSAGE`, `$ARTIFACTS_DIR`, etc.)
2. Node output references (`$nodeId.output`, `$nodeId.output.field`)
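Step 2 of that order can be sketched as a pair of string replacements (illustrative only; `substituteNodeOutputs` is a hypothetical name, and field access assumes the upstream node produced JSON via `output_format`):

```typescript
// Hypothetical sketch of node-output substitution (not the actual executor code).
function substituteNodeOutputs(prompt: string, outputs: Record<string, string>): string {
  // Field access first ($id.output.field), so the plain form doesn't clobber it.
  return prompt
    .replace(/\$(\w+)\.output\.(\w+)/g, (match, id, field) => {
      try {
        const parsed = JSON.parse(outputs[id] ?? "{}");
        return String(parsed[field] ?? match);
      } catch {
        return match; // upstream output wasn't JSON; leave the reference as-is
      }
    })
    .replace(/\$(\w+)\.output/g, (match, id) => outputs[id] ?? match);
}
```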
output_format for Structured JSON
Use output_format to enforce JSON output from an AI node. For Claude, the schema is passed via the SDK's outputFormat option and structured_output is used directly. For Codex (v0.116.0+), the schema is passed via TurnOptions.outputSchema and the agent's inline JSON response is used. Both ensure clean JSON for when: conditions and $nodeId.output substitution:
```yaml
nodes:
  - id: classify
    command: classify-issue
    output_format:
      type: object
      properties:
        type:
          type: string
          enum: [BUG, FEATURE]
        severity:
          type: string
          enum: [low, medium, high]
      required: [type]
```
- The output is captured as a JSON string and available via `$classify.output` (full JSON) or `$classify.output.type` (field access)
- Use `output_format` when downstream nodes need to branch on specific values via `when:`
allowed_tools and denied_tools for Tool Restrictions
Restrict which built-in tools a node can use without relying on prompt instructions. Restrictions are enforced at the Claude SDK level.
```yaml
nodes:
  - id: review
    command: code-review
    allowed_tools: [Read, Grep, Glob]    # whitelist — only these tools available
  - id: implement
    command: implement-feature
    denied_tools: [WebSearch, WebFetch]  # blacklist — remove these tools
  - id: mcp-only
    command: mcp-command
    allowed_tools: []                    # empty list = disable all built-in tools
```
- `allowed_tools: []` disables all built-in tools (useful for MCP-only nodes). Use the `mcp` field on a node to attach per-node MCP servers — see Node Fields
- If both are set, `denied_tools` is applied after `allowed_tools`
- `undefined` (field absent) and `[]` have different semantics — absent means use the default tool set, `[]` means no tools
- Claude only — Codex nodes emit a warning and continue (Codex doesn't support per-call tool restrictions)
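The ordering rules above amount to: `allowed_tools` picks the base set, then `denied_tools` subtracts from it. A hypothetical sketch (not the SDK's actual logic):

```typescript
// Hypothetical sketch of the documented precedence. An absent allowed_tools
// means "use the default tool set"; an empty array means "no tools at all".
function effectiveTools(
  defaults: string[],
  allowed?: string[],
  denied?: string[],
): string[] {
  const base = allowed !== undefined ? allowed : defaults;
  return denied ? base.filter((t) => !denied.includes(t)) : base;
}
```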
Retry Configuration
Every node automatically retries on transient errors (SDK subprocess crashes, rate limits, network timeouts) using a default configuration: 2 retries, 3 s base delay with exponential backoff. You will see a platform notification before each retry attempt.
To opt out or customise, add a retry: block:
```yaml
nodes:
  - id: flaky-node
    command: flaky-command
    retry:
      max_attempts: 3      # Total attempts including the first (1–5)
      delay_ms: 5000       # Base delay before first retry in ms (1000–60000, default: 3000)
      on_error: transient  # 'transient' (default) | 'all'
  - id: no-retry-node
    command: stable-command
    retry:
      max_attempts: 1      # Effectively disables retry
  - id: aggressive-retry
    prompt: "Summarise the output"
    retry:
      max_attempts: 4
      on_error: all        # Retry even non-transient errors (use with caution)
```
Retry Fields
| Field | Type | Default | Constraints | Description |
|---|---|---|---|---|
| `max_attempts` | number | `3` | 1–5 | Total attempts including the first. `1` disables retry |
| `delay_ms` | number | `3000` | 1000–60000 | Base delay in ms before the first retry. Doubles each attempt (exponential backoff) |
| `on_error` | `'transient'` \| `'all'` | `'transient'` | — | Which errors trigger a retry. `'transient'` = SDK crashes, rate limits, network timeouts only. `'all'` = any error including unknown errors (FATAL errors such as auth failures are never retried regardless) |
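The backoff schedule implied by the doubling `delay_ms` can be computed as follows (a sketch for illustration; `retryDelays` is not part of Archon's API):

```typescript
// Hypothetical helper: the wait before retry k is delay_ms * 2^k,
// matching the documented "doubles each attempt" backoff.
function retryDelays(maxAttempts: number, delayMs: number): number[] {
  const delays: number[] = [];
  for (let retry = 0; retry < maxAttempts - 1; retry++) {
    delays.push(delayMs * 2 ** retry);
  }
  return delays;
}
```

With the defaults (`max_attempts: 3`, `delay_ms: 3000`) this yields waits of 3 s and then 6 s.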
Error Classification
Archon classifies errors into three buckets before deciding whether to retry:
| Class | Examples | Retried by default? |
|---|---|---|
| FATAL | Auth failure, permission denied, credit balance exhausted | ❌ Never (even with `on_error: all`) |
| TRANSIENT | Process crashed (exited with code), rate limit, network timeout | ✅ Yes |
| UNKNOWN | Unrecognised error messages | ❌ No (unless `on_error: all`) |
Retry Notifications
Before each retry the platform receives a message like:
⚠️ Node `node-name` failed with transient error (attempt 1/3). Retrying in 3s...
Two-Layer Retry Stack
Archon uses two independent retry layers:
SDK subprocess retry (claude.ts) — 3 total attempts, 2 s base backoff
↓ only if all SDK retries exhausted
Node retry (dag-executor) — default 2 retries, 3 s base backoff
↓ only if all node retries exhausted
Workflow fails → next invocation auto-resumes
This means a single transient crash may trigger up to 3 SDK retries before a single node retry attempt is consumed.
Resume: Resume is automatic — the next invocation detects the prior failed run and skips already-completed nodes. No `--resume` flag is needed. See Resume on Failure below.
Resume on Failure
When a workflow fails, the next invocation automatically resumes from where it left off — no --resume flag required.
How it works:
1. On each invocation, Archon checks for a prior failed run of the same workflow in the same conversation.
2. If found, it loads the `node_completed` events from that run to determine which nodes finished successfully.
3. Completed nodes are skipped; only failed and not-yet-run nodes are executed.
4. You receive a platform message like: `▶️ Resuming workflow — skipping 3 already-completed node(s).`
Known limitation: AI session context from prior nodes is not restored. If a downstream node relies on in-context knowledge from a prior run's session (rather than artifacts), it may need to re-read those artifacts explicitly.
Fresh start: If zero nodes completed in the prior run, Archon starts fresh (no nodes to skip).
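The skip set reduces to a simple filter over the DAG's node IDs (an illustrative sketch; `nodesToRun` is a hypothetical name, not the executor's API):

```typescript
// Hypothetical sketch: nodes whose node_completed event appeared in the
// prior failed run are skipped; everything else is (re)executed.
function nodesToRun(allNodes: string[], completedPrior: Set<string>): string[] {
  return allNodes.filter((id) => !completedPrior.has(id));
}
```

In the fresh-start case (empty completed set), every node runs.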
Parallel Execution
Nodes without dependencies (or whose dependencies have all completed) run concurrently in the same topological layer:
```yaml
nodes:
  - id: setup
    command: setup-scope          # Creates shared context
  - id: review-code
    command: review-code
    depends_on: [setup]           # These three run in parallel
  - id: review-comments
    command: review-comments
    depends_on: [setup]
  - id: review-security
    command: review-security
    depends_on: [setup]
  - id: synthesize
    command: synthesize-reviews   # Waits for all three reviews
    depends_on: [review-code, review-comments, review-security]
    context: fresh
```
Parallel Execution Rules
- Each node gets its own session - no context sharing (use `context: fresh` for explicit control)
- All nodes in a layer must complete before the next layer runs
- All failures are reported - not just the first one
- Shared state via artifacts - nodes read/write to known paths
Pattern: Coordinator + Parallel Agents
```yaml
name: comprehensive-review
nodes:
  - id: scope
    command: create-review-scope
  - id: code-review
    command: code-review-agent
    depends_on: [scope]
  - id: comment-quality
    command: comment-quality-agent
    depends_on: [scope]
  - id: test-coverage
    command: test-coverage-agent
    depends_on: [scope]
  - id: synthesize
    command: synthesize-review
    depends_on: [code-review, comment-quality, test-coverage]
    context: fresh
```
The coordinator writes to `.archon/artifacts/reviews/pr-{n}/scope.md`. Each agent reads the scope and writes to `{category}-findings.md`. The synthesizer reads all findings and produces the final output.
The Artifact Chain
Workflows work because artifacts pass data between nodes:
```text
┌─────────────────┐     ┌─────────────────┐     ┌─────────────────┐
│   investigate   │     │    implement    │     │    create-pr    │
│                 │     │                 │     │                 │
│ Reads: input    │     │ Reads: artifact │     │ Reads: git diff │
│ Writes: artifact│────▶│ Writes: code    │────▶│ Writes: PR      │
└─────────────────┘     └─────────────────┘     └─────────────────┘
         │                       │
         ▼                       ▼
 .archon/artifacts/       src/feature.ts
 issues/issue-123.md      src/feature.test.ts
```
Designing Artifact Flow
When creating a workflow, plan the artifact chain:
| Node | Reads | Writes |
|---|---|---|
| `investigate-issue` | GitHub issue via `gh` | `.archon/artifacts/issues/issue-{n}.md` |
| `implement-issue` | Artifact from investigate | Code files, tests |
| `create-pr` | Git diff | GitHub PR |
Each command must know:
- Where to find its input
- Where to write its output
- What format to use
Model Configuration
Workflows can configure AI models and provider-specific options at the workflow level.
Configuration Priority
Model and options are resolved in this order:
1. Workflow-level - Explicit settings in the workflow YAML
2. Config defaults - `assistants.*` in `.archon/config.yaml`
3. SDK defaults - Built-in defaults from the Claude/Codex SDKs
Provider and Model
```yaml
name: my-workflow
provider: claude   # 'claude' or 'codex' (default: from config)
model: sonnet      # Model override (default: from config assistants.claude.model)
```
Claude models:
- `sonnet` - Fast, balanced (recommended)
- `opus` - Powerful, expensive
- `haiku` - Fast, lightweight
- `claude-*` - Full model IDs (e.g., `claude-3-5-sonnet-20241022`)
- `inherit` - Use model from previous session
Codex models:
- Any OpenAI model ID (e.g., `gpt-5.3-codex`, `o5-pro`)
- Cannot use Claude model aliases
Codex-Specific Options
```yaml
name: my-workflow
provider: codex
model: gpt-5.3-codex
modelReasoningEffort: medium   # 'minimal' | 'low' | 'medium' | 'high' | 'xhigh'
webSearchMode: live            # 'disabled' | 'cached' | 'live'
additionalDirectories:
  - /absolute/path/to/other/repo
  - /path/to/shared/library
```
Model reasoning effort:
- `minimal`, `low` - Fast, cheaper
- `medium` - Balanced (default)
- `high`, `xhigh` - More thorough, expensive
Web search mode:
- `disabled` - No web access (default)
- `cached` - Use cached search results
- `live` - Real-time web search
Additional directories:
- Codex can access files outside the codebase
- Useful for shared libraries, documentation repos
- Must be absolute paths
Model Validation
Workflows are validated at load time:
- Provider/model compatibility checked
- Invalid combinations fail with clear error messages
- Validation errors shown in `/workflow list`
Example validation error:
```text
Model "sonnet" is not compatible with provider "codex"
```
Example: Config Defaults + Workflow Override
`.archon/config.yaml`:
```yaml
assistants:
  claude:
    model: haiku   # Fast model for most tasks
  codex:
    model: gpt-5.3-codex
    modelReasoningEffort: low
    webSearchMode: disabled
```
Workflow with override:
```yaml
name: complex-analysis
description: Deep code analysis requiring powerful model
provider: claude
model: opus   # Override config default (haiku) for this workflow
nodes:
  - id: analyze
    command: analyze-architecture
  - id: report
    command: generate-report
    depends_on: [analyze]
```
The workflow uses `opus` instead of the config default `haiku`, but other settings inherit from config.
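The three-level precedence reduces to a null-coalescing chain. A hypothetical sketch (the `"sonnet"` SDK fallback here is an assumption for illustration, not a documented default):

```typescript
// Hypothetical sketch of the documented resolution order:
// workflow YAML > config defaults > SDK default.
function resolveModel(
  workflowModel?: string,
  configModel?: string,
  sdkDefault: string = "sonnet", // assumed fallback, for illustration only
): string {
  return workflowModel ?? configModel ?? sdkDefault;
}
```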
Workflow Description Best Practices
Write descriptions that help with routing and user understanding:
```yaml
description: |
  Investigate and fix a GitHub issue end-to-end.

  **Use when**: User provides a GitHub issue number or URL
  **NOT for**: Feature requests, refactoring, documentation

  **Produces**:
  - Investigation artifact
  - Code changes
  - Pull request linked to issue

  **Steps**:
  1. Investigate root cause
  2. Implement fix with tests
  3. Create PR
```
Good descriptions include:
- What the workflow does
- When to use it (and when NOT to)
- What it produces
- High-level steps
Variable Substitution
All workflows support these variables in prompts and commands:
| Variable | Description |
|---|---|
| `$WORKFLOW_ID` | Unique ID for this workflow run |
| `$USER_MESSAGE` | Original message that triggered the workflow |
| `$ARGUMENTS` | Same as `$USER_MESSAGE` |
| `$ARTIFACTS_DIR` | Pre-created artifacts directory for this workflow run |
| `$BASE_BRANCH` | Base branch; auto-detected from git when `worktree.baseBranch` is not set. Fails only if referenced and detection fails |
| `$CONTEXT` | GitHub issue/PR context (if available) |
| `$EXTERNAL_CONTEXT` | Same as `$CONTEXT` |
| `$ISSUE_CONTEXT` | Same as `$CONTEXT` |
| `$nodeId.output` | Output of a completed upstream DAG node |
| `$nodeId.output.field` | JSON field from a structured upstream node's output |
Example:
```yaml
prompt: |
  Workflow: $WORKFLOW_ID
  Original request: $USER_MESSAGE

  GitHub context:
  $CONTEXT

  [Instructions...]
```
Example Workflows
Simple Two-Node
```yaml
name: quick-fix
description: |
  Fast bug fix without full investigation.
  Use when: Simple, obvious bugs.
  NOT for: Complex issues needing root cause analysis.
nodes:
  - id: fix
    command: analyze-and-fix
  - id: pr
    command: create-pr
    depends_on: [fix]
    context: fresh
```
Investigation Pipeline
```yaml
name: fix-github-issue
description: |
  Full investigation and fix for GitHub issues.
  Use when: User provides issue number/URL
  Produces: Investigation artifact, code fix, PR
nodes:
  - id: investigate
    command: investigate-issue   # Creates .archon/artifacts/issues/issue-{n}.md
  - id: implement
    command: implement-issue     # Reads artifact, implements fix
    depends_on: [investigate]
    context: fresh
```
Parallel Review
```yaml
name: comprehensive-pr-review
description: |
  Multi-agent PR review covering code, comments, tests, and security.
  Use when: Reviewing PRs before merge
  Produces: Review findings, synthesized summary
nodes:
  - id: scope
    command: create-review-scope
  - id: code-review
    command: code-review-agent
    depends_on: [scope]
  - id: comment-review
    command: comment-quality-agent
    depends_on: [scope]
  - id: test-review
    command: test-coverage-agent
    depends_on: [scope]
  - id: security-review
    command: security-review-agent
    depends_on: [scope]
  - id: synthesize
    command: synthesize-reviews
    depends_on: [code-review, comment-review, test-review, security-review]
    context: fresh
```
Loop Node
Loop nodes iterate until a completion signal is detected. Use them within a DAG for autonomous iteration:
```yaml
name: implement-prd
description: |
  Autonomously implement a PRD, iterating until all stories pass.
  Use when: Full PRD implementation
  Requires: PRD file at .archon/prd.md
nodes:
  - id: implement
    loop:
      until: COMPLETE
      max_iterations: 15
      fresh_context: true   # Progress tracked in files
    prompt: |
      # PRD Implementation Loop
      Workflow: $WORKFLOW_ID

      ## Instructions
      1. Read PRD from `.archon/prd.md`
      2. Read progress from `.archon/progress.json`
      3. Find the next incomplete story
      4. Implement it with tests
      5. Run validation: `bun run validate`
      6. Update progress file
      7. If ALL stories complete and validated:
         Output: <promise>COMPLETE</promise>

      ## Important
      - Implement ONE story per iteration
      - Always run validation after changes
      - Update progress file before ending iteration
```
Classify and Route
```yaml
name: classify-and-fix
description: |
  Classify issue type and run the appropriate path in parallel.
  Use when: User reports a bug or requests a feature
  Produces: Code fix (bug path) or feature plan (feature path), then PR
nodes:
  - id: classify
    command: classify-issue
    output_format:
      type: object
      properties:
        type:
          type: string
          enum: [BUG, FEATURE]
      required: [type]
  - id: investigate
    command: investigate-bug
    depends_on: [classify]
    when: "$classify.output.type == 'BUG'"
  - id: plan
    command: plan-feature
    depends_on: [classify]
    when: "$classify.output.type == 'FEATURE'"
  - id: implement
    command: implement-changes
    depends_on: [investigate, plan]
    trigger_rule: none_failed_min_one_success
  - id: create-pr
    command: create-pr
    depends_on: [implement]
    context: fresh
```
Test-Fix Loop
```yaml
name: fix-until-green
description: |
  Keep fixing until all tests pass.
  Use when: Tests are failing and need automated fixing.
nodes:
  - id: fix-loop
    loop:
      until: ALL_TESTS_PASS
      max_iterations: 5
      fresh_context: false   # Remember what we've tried
    prompt: |
      # Fix Until Green

      ## Instructions
      1. Run tests: `bun test`
      2. If all pass: <promise>ALL_TESTS_PASS</promise>
      3. If failures:
         - Analyze the failure
         - Fix the code (not the test, unless test is wrong)
         - Run tests again

      ## Rules
      - Don't skip or delete failing tests
      - Don't modify test expectations unless they're wrong
      - Each iteration should fix at least one failure
```
Common Patterns
Pattern: Gated Execution
Run different paths based on conditions using when::
```yaml
name: smart-fix
description: Route to appropriate fix strategy based on issue complexity
nodes:
  - id: analyze
    command: analyze-complexity
    output_format:
      type: object
      properties:
        complexity:
          type: string
          enum: [simple, complex]
      required: [complexity]
  - id: quick-fix
    command: quick-fix-strategy
    depends_on: [analyze]
    when: "$analyze.output.complexity == 'simple'"
  - id: deep-fix
    command: deep-fix-strategy
    depends_on: [analyze]
    when: "$analyze.output.complexity == 'complex'"
```
Pattern: Checkpoint and Resume
For long workflows, save checkpoints. Resume is automatic on re-invocation — completed nodes are skipped:
```yaml
name: large-migration
description: Multi-file migration with checkpoint recovery
nodes:
  - id: plan
    command: create-migration-plan
  - id: batch-1
    command: migrate-batch-1
    depends_on: [plan]
    context: fresh
  - id: batch-2
    command: migrate-batch-2
    depends_on: [batch-1]
    context: fresh
  - id: validate
    command: validate-migration
    depends_on: [batch-2]
    context: fresh
```
Each batch command saves progress to an artifact. If the workflow fails mid-way, re-invoking it skips already-completed nodes.
Pattern: Human-in-the-Loop
Pause for human approval:
```yaml
name: careful-refactor
description: Refactor with human approval at each stage
nodes:
  - id: propose
    command: propose-refactor   # Creates proposal artifact
# Workflow pauses here - human reviews proposal
# Human triggers next workflow to continue:
```
Then a separate workflow to continue:
```yaml
name: execute-refactor
nodes:
  - id: execute
    command: execute-approved-refactor
  - id: pr
    command: create-pr
    depends_on: [execute]
    context: fresh
```
Debugging Workflows
Check Workflow Discovery
```shell
bun run cli workflow list
```
Run with Verbose Output
```shell
bun run cli workflow run {name} "test input"
```
Watch the streaming output to see each node.
Check Artifacts
After a workflow runs, check the artifacts:
```shell
ls -la .archon/artifacts/
cat .archon/artifacts/issues/issue-*.md
```
Check Logs
Workflow execution logs to:
`.archon/logs/{workflow-id}.jsonl`
Each line is a JSON event (node start, AI response, tool call, etc.).
Workflow Validation
Before deploying a workflow:
-
Test each command individually
bun run cli workflow run {workflow} "test input" -
Verify artifact flow
- Does each node produce what downstream nodes expect?
- Are paths correct?
- Is the format complete?
-
Test edge cases
- What if the input is invalid?
- What if a node fails?
- What if an artifact is missing?
-
Check iteration limits (for loops)
- Is
max_iterationsreasonable? - What happens when limit is hit?
- Is
Summary
- Workflows orchestrate commands - YAML files that define a DAG of nodes
- Nodes with dependencies - `depends_on` edges control execution order; independent nodes run in parallel
- Artifacts are the glue - Commands communicate via files, not memory
- `context: fresh` - Fresh session for a node, works from artifacts
- Parallel execution - Nodes in the same topological layer run concurrently
- Loop nodes - `loop:` on a node iterates until the `<promise>COMPLETE</promise>` signal
- Conditional branching - `when:` conditions and `trigger_rule` control which nodes run
- `output_format` - Enforce structured JSON output from AI nodes for reliable branching
- `allowed_tools` / `denied_tools` - Restrict which tools a node can use (Claude only, enforced at SDK level)
- `retry:` - All nodes auto-retry transient errors (default: 2 retries, 3 s backoff); configure per node with a `retry:` block
- `hooks` - Attach static SDK hook callbacks to individual Claude nodes for tool control and context injection (see docs/hooks.md)
- `mcp:` - Attach per-node MCP servers via a JSON config file path (Claude only; env vars expanded at execution time); use with `allowed_tools: []` for MCP-only nodes
- `skills:` - Preload named skills into individual Claude nodes for domain expertise (Claude only; see docs/skills.md)
- Test thoroughly - Each command, the artifact flow, and edge cases