* Fix: Add stale workflow cleanup and defense-in-depth error handling
Problem: Workflows could get stuck in "running" state indefinitely when
the async generator disconnected but the AI subprocess continued working.
This blocked new workflow invocations with "Workflow already running" errors.
Root cause: No cleanup mechanism existed for workflows that failed to
complete due to disconnection between the executor and the Claude SDK.
Solution (defense-in-depth):
1. Activity-based staleness detection: Workflows inactive for 15+ minutes
are auto-failed when a new workflow is triggered on the same conversation
2. Top-level error handling: All errors in workflow execution are caught
and the workflow is properly marked as failed (prevents stuck state)
3. Manual cancel command: /workflow cancel lets users force-fail stuck
workflows immediately
Changes:
- Add last_activity_at column via migration for staleness tracking
- Add updateWorkflowActivity() to track activity during execution
- Add staleness check before blocking concurrent workflows
- Wrap workflow execution in try-catch to ensure failure is recorded
- Add /workflow cancel subcommand to command handler
- Update test to match new error handling behavior
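A minimal sketch of the activity-based staleness check from solution item 1. `WorkflowRunRow`, `STALE_AFTER_MS`, and `isStale` are hypothetical names; only the `last_activity_at` column, the fallback to `started_at`, and the 15-minute threshold come from this changeset.

```typescript
// Hypothetical row shape; only last_activity_at and started_at are named in the commits.
interface WorkflowRunRow {
  id: string;
  started_at: Date;
  last_activity_at: Date | null;
}

// 15 minutes of inactivity, per the commit description.
const STALE_AFTER_MS = 15 * 60 * 1000;

// A run counts as stale when its latest recorded activity (or its start time,
// if no activity was ever recorded) is at least 15 minutes old.
function isStale(run: WorkflowRunRow, now: Date = new Date()): boolean {
  const lastSeen = run.last_activity_at ?? run.started_at;
  return now.getTime() - lastSeen.getTime() >= STALE_AFTER_MS;
}
```

A caller would presumably run this check on the active workflow before rejecting a new trigger, and auto-fail the old run when it returns true.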
Fixes #232
* docs: Add /workflow cancel command to documentation
* Improve error handling and add comprehensive tests for stale workflow cleanup
Error handling improvements:
- Add workflow ID and error context to updateWorkflowActivity logs
- Add stack trace, error name, and cause to top-level catch block
- Separate DB failure recording from file logging for clearer error messages
- Add try-catch around staleness cleanup with user-facing error message
- Check sendCriticalMessage return value and log when user not notified
Test coverage additions:
- Add staleness detection tests (stale vs non-stale, fallback to started_at)
- Add /workflow cancel command tests
- Add updateWorkflowActivity function tests (including non-throwing behavior)
All 845 tests pass, type-check clean, lint clean.
- Fix router to not hardcode "review-pr" workflow name, instead directing
AI to check the available workflow list for PR review workflows
- Apply Prettier auto-formatting to multiple files
* feat: Add parallel block execution for workflows
- Add SingleStep, ParallelBlock, and WorkflowStep types
- Extend workflow parser to handle parallel: blocks in YAML
- Implement executeParallelBlock() for concurrent step execution
- Refactor executeStep into executeStepInternal for reusability
- Update main execution loop to handle parallel blocks
- Add 8 comprehensive tests for parallel block parsing
- Add logging functions for parallel block events
- Maintain backward compatibility with existing workflows
Each parallel step runs as an independent Claude Code agent with its own
fresh session, all working on the same worktree. Steps inside parallel
blocks execute concurrently using Promise.all(), enabling 2-5x faster
execution for parallel-safe workflows like code reviews.
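As an illustration of concurrent step execution with Promise.all(), here is a sketch under stated assumptions: `StepOutcome`, `StepRunner`, and `runParallelBlock` are hypothetical names, and `runStep` stands in for launching one independent agent session. Each step converts its own failure into a result value, so Promise.all() waits for every step instead of rejecting on the first error. Whether the real executeParallelBlock() is structured this way is not stated in the commit.

```typescript
// Hypothetical outcome type for one step of a parallel block.
type StepOutcome =
  | { command: string; ok: true }
  | { command: string; ok: false; error: string };

// Stand-in for running one step in its own fresh session (assumption).
type StepRunner = (command: string) => Promise<void>;

async function runParallelBlock(
  commands: string[],
  runStep: StepRunner,
): Promise<StepOutcome[]> {
  // All steps start immediately; results come back in input order.
  return Promise.all(
    commands.map(async (command): Promise<StepOutcome> => {
      try {
        await runStep(command);
        return { command, ok: true };
      } catch (err) {
        // Capture the failure instead of rejecting, so sibling steps still finish.
        const error = err instanceof Error ? err.message : String(err);
        return { command, ok: false, error };
      }
    }),
  );
}
```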
Resolves #205
* fix: Address PR #217 review feedback for parallel block execution
Fixes based on comprehensive code review:
**Type Errors Fixed:**
- Added type guards (isSingleStep) before accessing .command on WorkflowStep
- Removed unused executeStep function (dead code)
- Removed unnecessary type assertions in type guards
**Parser Bugs Fixed:**
- Fixed nested parallel detection to check raw input before parsing
- Fixed invalid command rejection to fail the entire parallel block if any step is invalid
**Error Handling Restored:**
- Added logWorkflowError() calls for parallel and sequential failure paths
- Added logParallelBlockStart/Complete calls for workflow logging
**Test Improvements:**
- Fixed step notification format from "Step N" to "Step N/M"
- Added 5 new tests for parallel block execution covering:
- Basic parallel execution
- Parallel failure handling
- Sequential + parallel mix workflows
- Notification format verification
- Fresh session isolation for parallel steps
**Type Design Improvements:**
- Made type guards mutually exclusive (isSingleStep checks !('parallel' in step))
- Removed unnecessary type assertions after 'in' checks
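The mutually exclusive type guards described above might look roughly like this. The `SingleStep`, `ParallelBlock`, and `WorkflowStep` names and the `isSingleStep` guard come from these commits; the field layout, `isParallelBlock`, and `describeStep` are assumptions made for the sketch.

```typescript
// Assumed shapes; the commits name the types but not their fields.
interface SingleStep {
  command: string;
}
interface ParallelBlock {
  parallel: SingleStep[];
}
type WorkflowStep = SingleStep | ParallelBlock;

// The `in` check narrows the union, so no type assertion is needed afterwards.
function isParallelBlock(step: WorkflowStep): step is ParallelBlock {
  return "parallel" in step && Array.isArray(step.parallel);
}

// Mutually exclusive with isParallelBlock: a step carrying `parallel` is never single.
function isSingleStep(step: WorkflowStep): step is SingleStep {
  return !("parallel" in step) && "command" in step;
}

// Hypothetical helper: .command is only touched after the guard has narrowed the type.
function describeStep(step: WorkflowStep): string {
  if (isSingleStep(step)) {
    return step.command;
  }
  return `parallel(${step.parallel.map((s) => s.command).join(", ")})`;
}
```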
All 828 tests pass, type-check passes, no lint errors.
* fix: Address PR #217 review feedback for parallel block execution
Critical fixes:
- Restore try-catch around updateWorkflowRun to prevent transient DB
errors from crashing workflows (regression fix)
- Update documentation to reflect wait-for-all behavior (not fail-fast)
Important fixes:
- Report ALL parallel failures in error message, not just the first one
- Add error handling for DB operations in failure path to ensure user
notification even when DB is unavailable
- Replace `as any` casts with proper type guards in loader tests
- Make StepDefinition a type alias for SingleStep (removes duplication)
Test improvements:
- Add test for workflows with multiple parallel blocks
- Add test for all-parallel-steps-fail scenario
- Add explicit backward compatibility test for step-only workflows
Loader improvements:
- Aggregate validation errors for better debugging - all errors are
now collected and logged together instead of one at a time
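A sketch of the aggregated validation idea: collect every problem in one pass and report them together, rather than stopping at the first. `ValidationResult`, `validateSteps`, and the message wording are hypothetical; the actual loader rules are not spelled out in these commits.

```typescript
type ValidationResult = { ok: true } | { ok: false; errors: string[] };

// Validates raw (already YAML-parsed) steps and gathers all errors before returning.
function validateSteps(rawSteps: unknown[]): ValidationResult {
  const errors: string[] = [];

  rawSteps.forEach((raw, index) => {
    const label = `Step ${index + 1}`;
    if (typeof raw !== "object" || raw === null) {
      errors.push(`${label}: must be an object`);
      return;
    }
    const step = raw as Record<string, unknown>;
    if ("parallel" in step) {
      const parallel = step.parallel;
      if (!Array.isArray(parallel) || parallel.length === 0) {
        errors.push(`${label}: parallel must be a non-empty list`);
      } else if (
        // Nested blocks are detected on the raw input, before any further parsing.
        parallel.some((s) => typeof s === "object" && s !== null && "parallel" in s)
      ) {
        errors.push(`${label}: nested parallel blocks are not allowed`);
      }
    } else if (typeof step.command !== "string" || step.command.trim() === "") {
      errors.push(`${label}: command must be a non-empty string`);
    }
  });

  return errors.length === 0 ? { ok: true } : { ok: false, errors };
}
```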
* fix: Add missing try-catch for sequential step failure and session test
- Wrap failWorkflowRun in try-catch for sequential step failures
(matches parallel block failure path for consistency)
- Add test verifying session reset after parallel block completes
(next sequential step correctly gets fresh session)
* Investigate issue #211: Workflow executor missing GitHub context
Root cause: issueContext built in GitHub adapter and used for routing,
but not passed to workflow executor. Context lost during workflow
simplification refactor (commit 0352067).
Fix requires threading issueContext through orchestrator -> workflow
routing -> executor -> variable substitution. Pattern already exists
in command system (orchestrator.ts:473-476).
* Fix: Workflow executor missing GitHub issue context (#211)
When workflows were triggered on GitHub issues/PRs, the issue context (title, body, labels) was built but never passed to the workflow executor. This caused the AI to ask clarifying questions instead of executing workflows with the provided context.
Changes:
- Added issueContext parameter throughout workflow execution chain
- Threaded context from orchestrator → routing → executor → steps
- Added variable substitution support ($CONTEXT, $EXTERNAL_CONTEXT, $ISSUE_CONTEXT)
- Appended context to prompts following existing command system pattern
- Stored context in WorkflowRun metadata for session persistence
Fixes #211
* Fix: Add missing issueContext parameter to executeLoopWorkflow
A self-review caught a critical bug: executeLoopWorkflow used the
issueContext variable without receiving it as a parameter. This would cause
a compilation failure and a runtime error for any loop-based workflow
triggered from GitHub.
Changes:
- Added issueContext parameter to executeLoopWorkflow function signature
- Passed issueContext argument at call site in executeWorkflow
This completes the context threading for ALL workflow execution paths
(both step-based and loop-based workflows).
* Archive implementation report for issue #211
* Fix PR review findings: test coverage, silent failures, and double-context
- Fix CI: Update workflows.test.ts to expect 5 parameters (metadata)
- Fix silent failure: Clear $CONTEXT variables when issueContext is undefined
to avoid sending literal "$CONTEXT" to AI
- Fix double-context: Only append context if not already substituted via
$CONTEXT variables (prevents duplicate context in prompts)
- Add comprehensive tests for issueContext handling:
- Step workflow with context passing and $CONTEXT substitution
- Loop workflow with context passing and $ISSUE_CONTEXT substitution
- Metadata storage verification
- Edge case: clearing variables when no context provided
- Add JSDoc documentation for issueContext parameters
- Introduce SubstitutionResult type for cleaner tracking of context usage
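The silent-failure and double-context fixes can be sketched as follows. `SubstitutionResult` and the three variable names come from these commits; `substituteContext`, `buildPrompt`, the field names, and the separator are hypothetical and not the project's actual signatures.

```typescript
interface SubstitutionResult {
  prompt: string;
  // True only when a context variable was found and real context replaced it.
  contextSubstituted: boolean;
}

function substituteContext(prompt: string, issueContext?: string): SubstitutionResult {
  // The regex is created per call, so no lastIndex state is shared between calls.
  const pattern = /\$(?:EXTERNAL_CONTEXT|ISSUE_CONTEXT|CONTEXT)\b/g;
  let found = false;
  // A replacer function keeps "$&"-style sequences in the context text literal.
  const replaced = prompt.replace(pattern, () => {
    found = true;
    // With no context, the variable is cleared instead of reaching the AI verbatim.
    return issueContext ?? "";
  });
  return { prompt: replaced, contextSubstituted: found && issueContext !== undefined };
}

function buildPrompt(template: string, issueContext?: string): string {
  const { prompt, contextSubstituted } = substituteContext(template, issueContext);
  // Append context only if it was not already injected through a variable.
  if (issueContext === undefined || contextSubstituted) {
    return prompt;
  }
  return `${prompt}\n\n---\n\n${issueContext}`;
}
```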
* Simplify workflow context substitution with helper function
- Extract CONTEXT_VAR_PATTERN as module-level constant (single compilation)
- Add buildPromptWithContext() helper to eliminate duplication
- Simplify substituteWorkflowVariables() with chained replacements
- Reduce code by 5 lines while improving readability
* Address PR review findings: regex safety, error handling, tests, and docs
Important fixes:
- Fix regex lastIndex hazard by creating fresh regex instances for each operation
- Add user warning when loop workflow metadata tracking fails (database issues)
- Add JSON.stringify validation in createWorkflowRun to catch serialization errors
Test improvements:
- Add test for context with special regex characters ($, .*, [a-z]+, etc.)
- Add test for multiple context variables in same prompt
Documentation:
- Add @param tags to substituteWorkflowVariables() JSDoc
- Expand buildPromptWithContext() JSDoc with all 5 parameters documented
- Enhance context variable clearing log with structured data
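For readers unfamiliar with the regex lastIndex hazard mentioned above: a regex with the `g` flag remembers where its last `test()` match ended, so reusing one shared instance can make a later call miss a match. The function and constant names below are illustrative, not the project's.

```typescript
// Shared global regex: test() advances lastIndex after each successful match.
const SHARED_PATTERN = /\$CONTEXT\b/g;

// Hazardous: the result depends on state left behind by earlier calls.
function hasContextVarShared(prompt: string): boolean {
  return SHARED_PATTERN.test(prompt);
}

// Keep only the pattern source at module level and build a fresh regex per operation.
const CONTEXT_VAR_SOURCE = "\\$(?:EXTERNAL_CONTEXT|ISSUE_CONTEXT|CONTEXT)\\b";

function hasContextVar(prompt: string): boolean {
  return new RegExp(CONTEXT_VAR_SOURCE, "g").test(prompt);
}
```

Called twice in a row with `"use $CONTEXT"`, the shared version returns true and then false, because the second search starts at the end of the string; the fresh-instance version returns true both times.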
* Investigate issue #192: Detect and block concurrent workflow execution
* Fix: Detect and block concurrent workflow execution (#192)
When multiple workflow triggers are posted on the same issue, each one
was starting a separate workflow execution, leading to duplicate work,
wasted API tokens, and potential duplicate PRs.
Changes:
- Add concurrency check before createWorkflowRun() in executeWorkflow()
- Use existing getActiveWorkflowRun() to query for active workflows
- Send rejection message when workflow already running
- Update test mocks to properly handle getActiveWorkflowRun() queries
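A sketch of the check-before-create flow, using an in-memory stand-in for the database. `getActiveWorkflowRun()` and `createWorkflowRun()` are named in the commit, but their signatures here, along with `InMemoryRunStore`, `StartResult`, and `startWorkflow`, are assumptions for illustration.

```typescript
interface WorkflowRun {
  id: number;
  conversationId: string;
  workflowName: string;
  status: "running" | "completed" | "failed";
}

// In-memory stand-in for the workflow runs table (assumption).
class InMemoryRunStore {
  private runs: WorkflowRun[] = [];

  getActiveWorkflowRun(conversationId: string): WorkflowRun | null {
    const active = this.runs.find(
      (r) => r.conversationId === conversationId && r.status === "running",
    );
    return active ?? null;
  }

  createWorkflowRun(conversationId: string, workflowName: string): WorkflowRun {
    const run: WorkflowRun = {
      id: this.runs.length + 1,
      conversationId,
      workflowName,
      status: "running",
    };
    this.runs.push(run);
    return run;
  }
}

type StartResult =
  | { started: true; run: WorkflowRun }
  | { started: false; message: string };

function startWorkflow(
  store: InMemoryRunStore,
  conversationId: string,
  workflowName: string,
): StartResult {
  // Reject before creating a run if one is already active on this conversation.
  const active = store.getActiveWorkflowRun(conversationId);
  if (active !== null) {
    return { started: false, message: `Workflow already running: \`${active.workflowName}\`` };
  }
  return { started: true, run: store.createWorkflowRun(conversationId, workflowName) };
}
```

Against a real database the check and the insert are two separate operations, so two near-simultaneous triggers could in principle both pass the check; the commit does not say whether that window is handled.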
Fixes #192
* Archive investigation for issue #192
* Investigate issue #156: Add code formatting for workflow/command names
Created comprehensive investigation artifact analyzing the enhancement
request to use backticks for workflow and command names in bot messages.
Assessment:
- Priority: MEDIUM (improves clarity, doesn't block functionality)
- Complexity: LOW (simple string formatting in 2 files + tests)
- Confidence: HIGH (all locations identified, pattern established)
Changes required:
- src/workflows/executor.ts: 4 message templates
- src/handlers/command-handler.ts: workflow list formatting
- src/workflows/executor.test.ts: test expectations
Artifact: .archon/artifacts/issues/issue-156.md
* Fix: Use code formatting for workflow/command names (#156)
Workflow and command names in bot messages were shown in plain text or bold formatting, making them hard to distinguish from prose. This reduced readability and was inconsistent with how commands are shown elsewhere (e.g., /help uses backticks).
Changes:
- Wrap workflow names in backticks in workflow start/complete/failure messages
- Wrap command names in backticks in step notifications
- Wrap workflow/command names in backticks in /workflows list
- Update all test expectations to match new formatting
Fixes #156
* Archive investigation for issue #156
* Investigate issue #154: Skip step notification for single-step workflows
* Fix: Skip step notification for single-step workflows (#154)
Single-step workflows like 'assist' or 'review-pr' were showing "Step 1/1"
notifications, which added no information since the workflow start message already
indicates what's running. This created unnecessary noise for users.
Changes:
- Add conditional check in executor.ts to only send step notifications for multi-step workflows
- Update test to verify single-step workflows skip "Step 1/1" notification
- Multi-step workflows continue to show progress with step notifications
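The conditional can be pictured as a small helper that returns no message for single-step workflows. `stepNotification` and the exact message wording are hypothetical; only the "Step N/M" format and the backtick-wrapped command name are taken from this changeset.

```typescript
// Returns the notification text, or null when the notification should be skipped.
function stepNotification(
  stepIndex: number, // zero-based index of the current step
  totalSteps: number,
  command: string,
): string | null {
  // "Step 1/1" adds nothing beyond the workflow start message.
  if (totalSteps <= 1) {
    return null;
  }
  return `Step ${stepIndex + 1}/${totalSteps}: \`${command}\``;
}
```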
Fixes #154
* Archive investigation for issue #154
* Investigate issue #158: Remove redundant workflow completion message for GitHub
* Fix: Remove redundant workflow completion message on GitHub (#158)
After workflows post their artifacts to GitHub issues, a separate "Workflow complete"
comment was creating redundant notifications. Since GitHub uses batch mode, the artifact
itself signals completion, making the extra comment unnecessary noise.
Changes:
- Add platform check in workflow executor to suppress completion message for GitHub
- Keep completion message for streaming platforms (Telegram, Slack, Discord)
- Add tests for platform-specific completion message behavior
- Error messages remain unchanged (still sent for all platforms)
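A sketch of the platform check, with the expected behavior laid out as a table. `Platform`, `shouldSendCompletionMessage`, and `completionCases` are hypothetical names; the four platforms and their expected outcomes come from this changeset.

```typescript
type Platform = "telegram" | "slack" | "discord" | "github";

// On GitHub the posted artifact already signals completion, so the extra comment is skipped.
function shouldSendCompletionMessage(platform: Platform): boolean {
  return platform !== "github";
}

// Expected behavior per platform, in the style of a parameterized test table.
const completionCases: Array<[Platform, boolean]> = [
  ["telegram", true],
  ["slack", true],
  ["discord", true],
  ["github", false],
];
```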
Fixes #158
* Archive investigation for issue #158
* Address PR review feedback
- Fix inaccurate comment: Changed from "streaming platforms" to
"non-GitHub platforms" since Slack/Discord default to batch mode
- Add structured context to suppression log (workflowName, workflowId,
conversationId) for better debugging
- Add explicit tests for Slack and Discord platforms
* Simplify platform completion tests with it.each()
Consolidate 4 near-identical test cases into single parameterized test:
- telegram, slack, discord: should send completion message
- github: should suppress completion message
Reduces test code from 97 lines to 35 lines while maintaining coverage.
* Fix outdated command loading documentation
* feat: Add Ralph-style autonomous iteration loops to workflow engine
Enable workflows to iterate autonomously until a completion signal is
detected (e.g., `<promise>COMPLETE</promise>`) or max iterations reached.
Changes:
- Add LoopConfig type with until signal, max_iterations, fresh_context
- Extend WorkflowDefinition to support loop + prompt (mutually exclusive with steps)
- Add executeLoopWorkflow function with completion signal detection
- Update loader to parse and validate loop configuration
- Add ralph.yaml example workflow demonstrating PRD implementation pattern
- Add 22 new tests covering loop execution and parsing
Loop workflows allow developers to run long-running tasks (like PRD
implementation) without manual phase transitions, following the pattern
popularized by Geoffrey Huntley.
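A simplified sketch of the loop: iterate until the completion signal appears or max_iterations is reached. The `LoopConfig` field names come from the commit; `runLoop`, `IterationRunner`, `LoopOutcome`, and the assumption that `until` is wrapped in a `<promise>` tag are illustrative. The runner is synchronous here to keep the example short, whereas a real AI call would be asynchronous.

```typescript
interface LoopConfig {
  until: string; // completion signal, e.g. "COMPLETE"
  max_iterations: number;
  fresh_context?: boolean; // not exercised in this sketch
}

// Stand-in for one AI iteration; returns that iteration's output text (assumption).
type IterationRunner = (iteration: number) => string;

interface LoopOutcome {
  completed: boolean;
  iterations: number;
}

function runLoop(config: LoopConfig, runIteration: IterationRunner): LoopOutcome {
  const signal = `<promise>${config.until}</promise>`;
  for (let i = 1; i <= config.max_iterations; i++) {
    const output = runIteration(i);
    if (output.includes(signal)) {
      return { completed: true, iterations: i };
    }
  }
  // Signal never seen: the caller should report that max iterations was reached.
  return { completed: false, iterations: config.max_iterations };
}
```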
* Add test-loop workflow for Ralph loop testing
* feat: Update worktree config to copy .archon files
- Include all .archon files in worktree copy (not just .archon/ralph)
- Update ralph.yaml with dynamic path detection for feature directories
- Add PR creation step at completion
- Use {prd-dir} variable for flexible path handling
* feat: Add ralph-prd command for generating PRD files
Creates structured PRD files for Ralph autonomous loops:
- Outputs to .archon/ralph/{feature-slug}/ directory
- Generates prd.md (full context) and prd.json (story tracking)
- Feature-based naming to avoid conflicts between projects
- Guides user through requirements gathering phases
* feat: Add ralph-fresh workflow with fresh_context: true
Fresh context mode for Ralph loops where each iteration:
- Starts with a clean slate (no memory of previous iterations)
- Re-reads progress.txt, prd.json, prd.md to understand current state
- Relies on progress.txt "Codebase Patterns" section for learnings
- Better for long loops and avoiding context confusion
* refactor: Rename ralph workflows with explicit descriptions
- Rename ralph.yaml → ralph-stateful.yaml (persistent memory mode)
- Update ralph-fresh.yaml description for clarity
- Both workflows now require explicit invocation
- Clear INVOKE WITH / NOT FOR / HOW IT WORKS / TRADE-OFFS sections
- Neither is "default" - user must choose explicitly
* chore: Set max_iterations to 10 on ralph workflows
* fix: Address PR review feedback for loop workflow
- Wrap database metadata update in try-catch to prevent misleading errors
- Add dropped message tracking and user warning in loop workflow
- Make plain signal detection more restrictive (end of output or own line)
- Add context (line number, preview) to YAML parse errors
- Make max iterations error message actionable with suggestions
- Remove unnecessary type assertions after discriminated union refactor
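One way to read "end of output or own line" for plain (untagged) signals is sketched below. `hasPlainSignal` and `escapeRegExp` are hypothetical helpers, and the project's actual matching rules may differ.

```typescript
// Escape characters that have special meaning in a regular expression.
function escapeRegExp(text: string): string {
  return text.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Accept the signal only when it sits alone on a line or closes the output,
// so a passing mention in the middle of a sentence does not end the loop.
function hasPlainSignal(output: string, signal: string): boolean {
  const escaped = escapeRegExp(signal);
  const onOwnLine = new RegExp(`^[ \\t]*${escaped}[ \\t]*$`, "m");
  const atEndOfOutput = new RegExp(`${escaped}\\s*$`);
  return onOwnLine.test(output) || atEndOfOutput.test(output);
}
```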
* chore: Remove unrelated plan files from branch
* Investigate error handling improvements (#128, #126, #129)
Add combined investigation artifact for batched error handling issues:
- #128: loadCommandPrompt error specificity
- #126: AI client error classification for user hints
- #129: Move isValidCommandName to parse-time validation
Artifact includes detailed implementation plan with code changes.
* Improve error handling in workflow engine (#128, #126, #129)
The workflow engine caught and handled errors but lost important context.
Users saw generic messages like "Command prompt not found" without knowing
the specific cause (security rejection vs empty file vs network timeout).
Changes:
- Add LoadCommandResult discriminated union for specific error reasons
- Return reason: 'invalid_name' | 'empty_file' | 'not_found' from loadCommandPrompt
- Add user-friendly hints for AI client errors based on error classification
- Move command name validation to parse time in loader (fail fast)
- Export isValidCommandName for use in loader
- Add tests for parse-time command validation
Fixes #128, fixes #126, fixes #129
* Archive investigation artifact for #128, #126, #129
* Address PR review feedback for error handling improvements
- Add database error handling in executeWorkflow with try-catch blocks
- Fix loadRepoConfig to log non-ENOENT errors (YAML syntax, permission denied)
- Extend LoadCommandResult type with permission_denied and read_error reasons
- Update loadCommandPrompt to return specific errors for EACCES vs other errors
- Add JSDoc documenting content non-empty invariant
- Re-export LoadCommandResult from types/index.ts for consistency
- Add test for 403 permission error hint
- Add unit tests for isValidCommandName function
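A sketch of how a discriminated union can carry the specific failure reason. The `LoadCommandResult` name and the five reason strings come from these commits; the field names, `reasonFromErrorCode`, `describeFailure`, and the message wording are assumptions.

```typescript
type FailureReason =
  | "invalid_name"
  | "empty_file"
  | "not_found"
  | "permission_denied"
  | "read_error";

type LoadCommandResult =
  | { success: true; content: string } // content is non-empty by invariant
  | { success: false; reason: FailureReason; message: string };

// Map a Node.js filesystem error code to a specific reason.
function reasonFromErrorCode(code: string | undefined): FailureReason {
  if (code === "ENOENT") return "not_found";
  if (code === "EACCES") return "permission_denied";
  return "read_error";
}

// The exhaustive switch makes the compiler flag any reason added later without a message.
function describeFailure(reason: FailureReason, name: string): string {
  switch (reason) {
    case "invalid_name":
      return `Command name \`${name}\` is not allowed`;
    case "empty_file":
      return `Command file for \`${name}\` is empty`;
    case "not_found":
      return `Command \`${name}\` was not found`;
    case "permission_denied":
      return `Permission denied reading command \`${name}\``;
    case "read_error":
      return `Could not read command \`${name}\``;
  }
}

function failure(code: string | undefined, name: string): LoadCommandResult {
  const reason = reasonFromErrorCode(code);
  return { success: false, reason, message: describeFailure(reason, name) };
}
```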
* Improve workflow router to always invoke a workflow
- Add $ARGUMENTS substitution to workflow executor so commands receive user message
- Create assist workflow as catch-all fallback for questions, debugging, one-off tasks
- Create review-pr workflow wrapper for code reviews
- Update router prompt to require workflow selection (no text-only responses)
- Enhance workflow descriptions to serve as routing instructions
- Add tests for $ARGUMENTS substitution and multi-line descriptions
* Fix re-triggering loop: remove @archon from command output
The investigate-issue command was outputting "@archon implement issue #X"
which triggered the bot to process its own output as a new mention.
- Change step→command in types and YAML
- Add StepResult discriminated union for proper error handling
- Remove global workflow registry (pass as parameters)
- Rewrite router with /invoke-workflow pattern and restrictive prompt
- Add path validation to prevent directory traversal
- Move .archon/steps/ to .archon/commands/
- Add error handling to db/workflows.ts
- Update tests for new patterns
- Add logger.test.ts with 15 tests for JSONL logging
- Add db/workflows.test.ts with 15 tests for database operations
- Add edge case tests to loader.test.ts, router.test.ts, executor.test.ts
- Fix test pollution by mocking at connection level instead of module level
- All 641 tests pass
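The path validation item above could be implemented with an allowlist, as in this sketch. `isValidCommandName` is named elsewhere in this changeset, but the rule shown here (letters, digits, underscores, and hyphens only) is an assumption, not the project's documented rule.

```typescript
// Allowlist check: rejecting anything outside a small character set also rules out
// "/", "\" and "..", so the name cannot escape the commands directory.
function isValidCommandName(name: string): boolean {
  return /^[A-Za-z0-9][A-Za-z0-9_-]*$/.test(name);
}
```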
Implement a prompt orchestrator that chains prompts together for sequential
AI execution with artifacts passed between steps:
- Add workflow YAML parser for .archon/workflows/ discovery
- Create step executor with context management (clearContext flag)
- Implement router response parser for WORKFLOW: name detection
- Add JSONL event logging for observability
- Create /workflow list and /workflow reload commands
- Add database table for workflow run tracking
Workflows enable automated multi-step development tasks like
plan -> implement -> create-pr with each step receiving context
from previous steps.
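A sketch of the `WORKFLOW: name` detection listed above. `parseRouterResponse` and its signature are hypothetical, as is the choice to ignore names that are not in the known workflow list.

```typescript
// Returns the workflow name from a line like "WORKFLOW: review-pr", or null if the
// response has no such line or names a workflow that is not known.
function parseRouterResponse(response: string, knownWorkflows: string[]): string | null {
  const match = /^WORKFLOW:\s*([A-Za-z0-9_-]+)\s*$/m.exec(response);
  if (match === null) {
    return null;
  }
  const name = match[1];
  return knownWorkflows.includes(name) ? name : null;
}
```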