Commit graph

1284 commits

Author SHA1 Message Date
Cole Medin
2732288f07
Merge pull request #1065 from coleam00/archon/task-fix-issue-1055
feat(core): inject workflow run context into orchestrator prompt
2026-04-16 07:55:00 -05:00
Cole Medin
b100cd4b48
Merge pull request #1064 from coleam00/archon/task-fix-issue-1054
fix(web): interleave tool calls with text during SSE streaming
2026-04-16 07:44:48 -05:00
Cole Medin
5acf5640c8
Merge pull request #1063 from coleam00/archon/task-fix-issue-1035
fix: archon setup --spawn fails on Windows with spaces in repo path
2026-04-16 07:36:58 -05:00
Cole Medin
68ecb75f0f
Merge pull request #1052 from coleam00/archon/task-fix-github-issue-1775831868291
fix(cli): send workflow dispatch/result messages for Web UI cards
2026-04-16 07:32:52 -05:00
Cole Medin
51b8652d43 fix: complete defensive chaining and add missing test coverage for PR #1052
- Fix half-applied optional chaining in WorkflowProgressCard refetchInterval
  (query.state.data?.run.status → ?.run?.status) preventing TypeError in polling
- Add dispatch-failure test verifying executeWorkflow still runs when
  dispatch sendMessage fails
- Add paused-workflow test proving paused guard fires before summary check
- Strengthen dispatch metadata assertion to verify workerConversationId format
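
A minimal TypeScript sketch of the optional-chaining fix in the first bullet above, assuming `run` can be missing while `data` is present. The `RunData` type and both function names are hypothetical, and the non-null assertion only stands in for the original unchecked access; this is not the WorkflowProgressCard code itself.

```typescript
// Hypothetical shape: `run` may be absent even after `data` has loaded.
type RunData = { run?: { status: string } };

// Half-applied chaining: guards `data` but still dereferences `run` blindly.
// The `!` mimics the original unchecked `.run.status` access.
function statusHalfChained(data: RunData | undefined): string | undefined {
  return data?.run!.status;
}

// Fully chained: yields undefined instead of throwing when `run` is missing.
function statusFullyChained(data: RunData | undefined): string | undefined {
  return data?.run?.status;
}
```

With `data` set to `{}`, the half-chained version throws a TypeError while the fully chained version returns `undefined`, which a polling callback can treat as "no status yet".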

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-04-16 07:32:37 -05:00
jinglesthula
3dedc22537
Fix incorrect substep numbering in setup.md (#1013)
Substeps for Step 4 were: 4a, 4b, 5c, 5d

Co-authored-by: Jon Anderson <jonathan.anderson@byu.edu>
2026-04-15 12:15:35 +03:00
Rasmus Widing
882fc58f7c
fix: stop server startup from auto-failing in-flight workflow runs (#1216) (#1231)
* fix: stop server startup from auto-failing in-flight workflow runs (#1216)

`failOrphanedRuns()` at server startup unconditionally flipped every
`running` workflow row to `failed`, including runs actively executing in
another process (CLI / adapters). The dag-executor's between-layer
status check then bailed out of the run, exit code 1 — even though every
node had completed successfully. Same class of bug the CLI already
learned (see comment at packages/cli/src/cli.ts:256-258).

Per the new CLAUDE.md principle "No Autonomous Lifecycle Mutation Across
Process Boundaries", we don't replace the call with a timer-based
heuristic. Instead we remove it and surface running workflows to the
user with one-click actions.

Backend
- `packages/server/src/index.ts` — remove the `failOrphanedRuns()` call
  at startup. Replace with explanatory comment referencing the CLI
  precedent and the CLAUDE.md principle. The function in
  `packages/core/src/db/workflows.ts:911` is preserved for use by the
  explicit `archon workflow cleanup` command.

UI
- `packages/web/src/components/layout/TopNav.tsx` — replace the binary
  pulse dot on the Dashboard nav with a numeric count badge sourced
  from `/api/dashboard/runs` `counts.running`. Hidden when count is 0.
  Same 10s polling interval as before. No animation — a steady factual
  count is honest; a pulse would imply system judgment.

- `packages/web/src/components/dashboard/ConfirmRunActionDialog.tsx`
  (new) — shadcn AlertDialog wrapper for destructive workflow-run
  actions, mirroring the codebase-delete pattern in
  `sidebar/ProjectSelector.tsx`. Caller passes the existing button as
  `trigger` slot; dialog handles open/close via Radix.

- `packages/web/src/components/dashboard/WorkflowRunCard.tsx` — replace
  4 `window.confirm()` callsites (Reject, Abandon, Cancel, Delete) with
  ConfirmRunActionDialog. Each gets a context-appropriate description.

- `packages/web/src/components/dashboard/WorkflowHistoryTable.tsx` —
  replace 1 `window.confirm()` (Delete) with the same dialog.

CHANGELOG entries under [Unreleased]: Fixed for #1216, two Changed
entries for the nav badge and dialog upgrade.

No new tests: the web package has no React component testing
infrastructure (existing `bun test` covers `src/lib/` and `src/stores/`
only). Type-check + lint + manual UI verification + the backend
reproducer are the verification levels.

Closes #1216.

* review: address PR #1231 nits — stale doc + 3 code polish

PR review surfaced one real correctness issue in docs and three small
code polish items. None block merge; addressing for cleanliness.

- packages/docs-web/src/content/docs/guides/authoring-workflows.md:486
  removed the "auto-marked as failed on next startup" paragraph that
  described the now-deleted behavior. Replaced with a "Crashed servers /
  orphaned runs" note pointing users at `archon workflow cleanup` and
  the dashboard Cancel/Abandon buttons; explains the auto-resume
  mechanism still works once the row reaches a terminal status.

- ConfirmRunActionDialog: narrow `onConfirm` from
  `() => void | Promise<void>` to `() => void`. All five callsites are
  synchronous wrappers around React Query mutations whose error
  handling lives at the page level (`runAction` in DashboardPage). The
  union widened the API for no current caller. Documented in the JSDoc
  what to do if an awaiting caller appears later.

- TopNav: dropped the redundant `String(runningCount)` cast in the
  aria-label — template literal coerces. Also rewrote the comment above
  the `listDashboardRuns` query: the previous version implied `limit=1`
  constrained `counts.running`; in fact `counts` is a server-side
  aggregate independent of `limit`, and `limit=1` only minimises the
  `runs` array we discard.

* review: correct remediation docs — cleanup ≠ abandon

CodeRabbit caught a factual error I introduced in the doc update:
`archon workflow cleanup` calls `deleteOldWorkflowRuns(days)` which
DELETEs old terminal rows (`completed`/`failed`/`cancelled` older than
N days) for disk hygiene. It does NOT transition stuck `running` rows.

The correct remediation for a stuck `running` row is either the
dashboard's per-row Cancel/Abandon button (already documented) or
`archon workflow abandon <run-id>` from the CLI (existing subcommand,
see packages/cli/src/cli.ts:366-374).

Fixed three locations:
- packages/docs-web/.../guides/authoring-workflows.md — replaced the
  vague "clean up explicitly" with concrete Web UI / CLI instructions
  and an explicit "Not to be confused with `archon workflow cleanup`"
  callout to close off the ambiguity CodeRabbit flagged.
- packages/server/src/index.ts — comment updated to point at the
  correct remediation (`archon workflow abandon`) and clarify that
  `archon workflow cleanup` is unrelated disk-hygiene.
- CHANGELOG.md — same correction in the [Unreleased] Fixed entry.
2026-04-15 12:05:41 +03:00
Rasmus Widing
5c8c39e5c9
fix(test): update stale mocks in cleanup-service 'continues processing' test (#1230) (#1232)
After PR #1034 changed worktree existence checks from execFileAsync to
fs/promises.access, the mockExecFileAsync rejections had no effect.
removeEnvironment needs getById + getCodebase mocks to proceed past
the early-return guard, otherwise envs route to report.skipped instead
of report.removed.

Replace the two stale mockExecFileAsync rejection calls with proper
mockGetById and mockGetCodebase return values for both test environments.

Fixes #1230
2026-04-15 11:53:02 +03:00
Shane McCarron
f61d576a4d
feat(isolation): auto-init submodules in worktrees (#1189)
Worktrees created via `git worktree add` do not initialize submodules — monorepo workflows that need submodule content find empty directories. Auto-detect `.gitmodules` and run `git submodule update --init --recursive` after worktree creation; classify failures through the isolation error pipeline.

Behavior:
- `.gitmodules` absent → skip silently (zero-cost probe, no effect on non-submodule repos)
- `.gitmodules` present → run submodule init by default (opt out via `worktree.initSubmodules: false`)
- submodule init or `.gitmodules` read failure → throw with classified error including opt-out guidance
- Only `ENOENT` on `.gitmodules` is treated as "no submodules"; other access errors (EACCES, EIO) surface as failures to prevent silent empty-dir worktrees
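
The behavior list above can be read as a small decision function. The sketch below is an illustration only: `GitmodulesProbe` and `shouldInitSubmodules` are hypothetical names, the probe result is modeled as plain data rather than a real filesystem call, and checking the opt-out before the probe is an assumption.

```typescript
// Hypothetical model of the `.gitmodules` probe outcome.
type GitmodulesProbe = { ok: true } | { ok: false; code: string };

// Returns true when submodule init should run, false when it should be skipped,
// and throws when the probe failed for a reason other than "file not found".
function shouldInitSubmodules(probe: GitmodulesProbe, initSubmodules: boolean = true): boolean {
  if (!initSubmodules) return false; // explicit opt-out (assumed to be checked first)
  if (probe.ok) return true; // .gitmodules present: init by default
  if (probe.code === 'ENOENT') return false; // absent: skip silently
  // EACCES, EIO, etc. surface as failures with opt-out guidance.
  throw new Error(
    `Cannot read .gitmodules (${probe.code}); set worktree.initSubmodules: false to skip submodule init`
  );
}
```

Treating only `ENOENT` as "no submodules" keeps an unreadable `.gitmodules` from silently producing a worktree with empty submodule directories.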

Changes:
- `packages/isolation/src/providers/worktree.ts` — `initSubmodules()` method + call site in `createWorktree()`
- `packages/isolation/src/errors.ts` — collapsed `errorPatterns` + `knownPatterns` into single `ERROR_PATTERNS` source of truth with `known: boolean` per entry; added submodule pattern with opt-out guidance
- `packages/isolation/src/types.ts` + `packages/core/src/config/config-types.ts` — new `initSubmodules?: boolean` config option
- `packages/docs-web/src/content/docs/reference/configuration.md` — documented the new option and submodule behavior
- Tests: default-on, explicit opt-in, explicit opt-out, skip-when-absent, fail-fast on EACCES, fail-fast on git failure, fail-fast on timeout

Credit to @halindrome for the original implementation and root-cause mapping across #1183, #1187, #1188, #1192.

Follow-up: #1192 (codebase identity rearchitect) would retire the cross-clone guard code in `resolver.ts` and `worktree.ts` that #1198, #1206 added. Separate PR.

Closes #1187
2026-04-15 09:48:18 +03:00
Rasmus Widing
c4ab0a2333 docs(claude.md): codify "no autonomous lifecycle mutation across process boundaries"
Generalize the lesson from #1216 (and the CLI precedent at
packages/cli/src/cli.ts:256-258) into a project-wide engineering
principle. When a process cannot reliably distinguish "actively
running elsewhere" from "orphaned by a crash" — typically because the
work was started by a different process or input source (CLI, adapter,
webhook, web UI, cron) — it must not autonomously mutate that work
based on a timer or staleness guess. Surface and ask instead.

Phrased to be specific about what is still allowed: heuristics for
recoverable operations (retry backoff, subprocess timeouts, hygiene
cleanup of terminal-status data) are not banned. The rule targets
destructive mutation of non-terminal state owned by an unknowable
other party.
2026-04-15 09:14:15 +03:00
Kagura
73d9240eb3
fix(isolation): complete reports false success when worktree remains on disk (fixes #964) (#1034)
* fix(isolation): complete reports false success when worktree remains on disk (fixes #964)

Three changes to prevent ghost worktrees:

1. isolationCompleteCommand now checks result.worktreeRemoved — if the
   worktree was not actually removed (partial failure), it reports
   'Partial' with warnings and counts as failed, not completed.
   Previously only skippedReason was checked; a destroy that returned
   successfully but with worktreeRemoved=false would still print
   'Completed'.

2. WorktreeProvider.destroy() now runs 'git worktree prune' after
   removal to clean up stale worktree references that git may keep
   even after the directory is removed.

3. WorktreeProvider.destroy() adds post-removal verification: after
   git worktree remove, it checks 'git worktree list --porcelain' to
   confirm the worktree is actually unregistered. If still registered,
   worktreeRemoved is set back to false with a descriptive warning.

* fix: address CodeRabbit review — ghost worktree prune, partial cleanup callers, accurate messages

* test: add regression test for Partial branch in isolation complete

Exercises the !result.worktreeRemoved path (without skippedReason)
that was flagged as uncovered by CodeRabbit review.
2026-04-14 17:58:45 +03:00
Matt Chapman
28b258286f
Extra backticks for markdown block to fix formatting (#1218)
of nested code blocks.
2026-04-14 17:58:31 +03:00
Rasmus Widing
81859d6842
fix(providers): replace Claude SDK embed with explicit binary-path resolver (#1217)
* feat(providers): replace Claude SDK embed with explicit binary-path resolver

Drop `@anthropic-ai/claude-agent-sdk/embed` and resolve Claude Code via
CLAUDE_BIN_PATH env → assistants.claude.claudeBinaryPath config → throw
with install instructions. The embed's silent failure modes on macOS
(#1210) and Windows (#1087) become actionable errors with a documented
recovery path.

Dev mode (bun run) remains auto-resolved via node_modules. The setup
wizard auto-detects Claude Code by probing the native installer path
(~/.local/bin/claude), npm global cli.js, and PATH, then writes
CLAUDE_BIN_PATH to ~/.archon/.env. Dockerfile pre-sets CLAUDE_BIN_PATH
so extenders using the compiled binary keep working. Release workflow
gets negative and positive resolver smoke tests.

Docs, CHANGELOG, README, .env.example, CLAUDE.md, test-release and
archon skills all updated to reflect the curl-first install story.

Retires #1210, #1087, #1091 (never merged, now obsolete).
Implements #1176.

* fix(providers): only pass --no-env-file when spawning Claude via Bun/Node

`--no-env-file` is a Bun flag that prevents Bun from auto-loading
`.env` from the subprocess cwd. It is only meaningful when the Claude
Code executable is a `cli.js` file — in which case the SDK spawns it
via `bun`/`node` and the flag reaches the runtime.

When `CLAUDE_BIN_PATH` points at a native compiled Claude binary (e.g.
`~/.local/bin/claude` from the curl installer, which is Anthropic's
recommended default), the SDK executes the binary directly. Passing
`--no-env-file` then goes straight to the native binary, which
rejects it with `error: unknown option '--no-env-file'` and the
subprocess exits code 1.

Emit `executableArgs` only when the target is a `.js` file (dev mode
or explicit cli.js path). Caught by end-to-end smoke testing against
the curl-installed native Claude binary.
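
A rough sketch of the rule described above, assuming the decision depends only on the `.js` suffix of the resolved executable path. `targetsJsFile` and `buildExecutableArgs` are hypothetical names for illustration, not the provider's actual functions, and the example paths in the note below are placeholders.

```typescript
// Hypothetical helper: true only when the Claude executable is a JS entrypoint
// that the SDK would spawn via bun/node.
function targetsJsFile(executablePath: string): boolean {
  return executablePath.endsWith('.js');
}

// Emit the Bun-only flag for JS entrypoints; pass nothing to a native binary,
// which would reject the unknown option.
function buildExecutableArgs(executablePath: string): string[] {
  return targetsJsFile(executablePath) ? ['--no-env-file'] : [];
}
```

Under these assumptions, a path such as `/some/path/cli.js` yields `['--no-env-file']`, while `~/.local/bin/claude` yields an empty argument list.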

* docs: record env-leak validation result in provider comment

Verified end-to-end with sentinel `.env` and `.env.local` files in a
workflow CWD that the native Claude binary (curl installer) does not
auto-load `.env` files. With Archon's full spawn pathway and parent
env stripped, the subprocess saw both sentinels as UNSET. The
first-layer protection in `@archon/paths` (#1067) handles the
inheritance leak; `--no-env-file` only matters for the Bun-spawned
cli.js path, where it is still emitted.

* chore(providers): cleanup pass — exports, docs, troubleshooting

Final-sweep cleanup tied to the binary-resolver PR:

- Mirror Codex's package surface for the new Claude resolver: add
  `./claude/binary-resolver` subpath export and re-export
  `resolveClaudeBinaryPath` + `claudeFileExists` from the package
  index. Renames the previously single `fileExists` re-export to
  `codexFileExists` for symmetry; nothing outside the providers
  package was importing it.
- Add a "Claude Code not found" entry to the troubleshooting reference
  doc with platform-specific install snippets and pointers to the
  AI Assistants binary-path section.
- Reframe the example claudeBinaryPath in reference/configuration.md
  away from cli.js-only language; it accepts either the native binary
  or cli.js.

* test+refactor(providers, cli): address PR review feedback

Two test gaps and one doc nit from the PR review (#1217):

- Extract the `--no-env-file` decision into a pure exported helper
  `shouldPassNoEnvFile(cliPath)` so the native-binary branch is unit
  testable without mocking `BUNDLED_IS_BINARY` or running the full
  sendQuery pathway. Six new tests cover undefined, cli.js, native
  binary (Linux + Windows), Homebrew symlink, and suffix-only matching.
  Also adds a `claude.subprocess_env_file_flag` debug log so the
  security-adjacent decision is auditable.

- Extract the three install-location probes in setup.ts into exported
  wrappers (`probeFileExists`, `probeNpmRoot`, `probeWhichClaude`) and
  export `detectClaudeExecutablePath` itself, so the probe order can be
  spied on. Six new tests cover each tier winning, fall-through
  ordering, npm-tier skip when not installed, and the
  which-resolved-but-stale-path edge case.

- CLAUDE.md `claudeBinaryPath` placeholder updated to reflect that the
  field accepts either the native binary or cli.js (the example value
  was previously `/absolute/path/to/cli.js`, slightly misleading now
  that the curl-installer native binary is the default).

Skipped from the review by deliberate scope decision:

- `resolveClaudeBinaryPath` async-with-no-await: matches Codex's
  resolver signature exactly. Changing only Claude breaks symmetry;
  if pursued, do both providers in a separate cleanup PR.
- `isAbsolute()` validation in parseClaudeConfig: Codex doesn't do it
  either. Resolver throws on non-existence already.
- Atomic `.env` writes in setup wizard: pre-existing pattern this PR
  touched only adjacently. File as separate issue if needed.
- classifyError branch in dag-executor for setup errors: scope creep.
- `.env.example` "missing #" claim: false positive (verified all
  CLAUDE_BIN_PATH lines have proper comment prefixes).

* fix(test): use path.join in Windows-compatible probe-order test

The "tier 2 wins (npm cli.js)" test hardcoded forward-slash path
comparisons, but `path.join` produces backslashes on Windows. Caused
the Windows CI leg of the test suite to fail while macOS and Linux
passed. Use `path.join` for both the mock return value and the
expectation so the separator matches whatever the platform produces.
2026-04-14 17:56:37 +03:00
Rasmus Widing
33d31c44f1
fix: lock workflow runs by working_path (#1036, #1188 part 2) (#1212)
* fix: lock workflow runs by working_path (#1036, #1188 part 2)

Both bugs reduce to the same primitive: there's no enforced lock on
working_path, so two dispatches that resolve to the same filesystem
location can race. The DB row is the lock token; pending/running/paused
are "lock held"; terminal statuses release.

Changes:

- getActiveWorkflowRunByPath includes `pending` (with 5-min stale-orphan
  age window), accepts excludeId + selfStartedAt, and orders by
  (started_at ASC, id ASC) for a deterministic older-wins tiebreaker.
  Eliminates the both-abort race where two near-simultaneous dispatches
  with similar timestamps could mutually abort each other.

- Move the executor's guard call site to AFTER workflowRun is finalized
  (preCreated, resumed, or freshly created). This guarantees we always
  have self-ID + started_at to pass to the lock query.

- On guard fire after row creation: mark self as 'cancelled' so we don't
  leave a zombie pending row that would then become its own lock holder.

- New error message includes workflow name, duration, short run id, and
  three concrete next-action commands (status / cancel / different
  branch). Replaces the vague "Workflow already running".

- Resume orphan fix: when executor activates a resumable run, mark the
  orchestrator's pre-created row as 'cancelled'. Without this, every
  resume leaks a pending row that would block the user's own
  back-to-back resume until the 5-min stale window.

- New formatDuration helper for the error message (8 unit tests).
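
The lock semantics and the older-wins tiebreaker from the list above can be sketched in memory. This is an assumption-laden illustration, not the SQL in getActiveWorkflowRunByPath: `RunRow`, `holdsLock`, `isOlder`, and `findBlockingRun` are hypothetical names, and timestamps are modeled as epoch milliseconds.

```typescript
type RunStatus = 'pending' | 'running' | 'paused' | 'completed' | 'failed' | 'cancelled';

interface RunRow {
  id: string;
  status: RunStatus;
  startedAt: number; // epoch milliseconds (simplification)
}

const STALE_PENDING_MS = 5 * 60 * 1000; // 5-minute stale-orphan window for pending rows

// running/paused always hold the lock; pending holds it only while fresh.
function holdsLock(row: RunRow, now: number): boolean {
  if (row.status === 'running' || row.status === 'paused') return true;
  return row.status === 'pending' && now - row.startedAt <= STALE_PENDING_MS;
}

// Deterministic ordering by (startedAt ASC, id ASC).
function isOlder(a: RunRow, b: RunRow): boolean {
  return a.startedAt < b.startedAt || (a.startedAt === b.startedAt && a.id < b.id);
}

// Returns the oldest other lock holder that beats `self`, or undefined if `self` wins.
function findBlockingRun(rowsOnPath: RunRow[], self: RunRow, now: number): RunRow | undefined {
  return rowsOnPath
    .filter(row => row.id !== self.id && holdsLock(row, now) && isOlder(row, self))
    .sort((a, b) => (isOlder(a, b) ? -1 : isOlder(b, a) ? 1 : 0))[0];
}
```

Because the ordering is total, two dispatches with identical timestamps cannot each see the other as older: exactly one of them gets `undefined` back and proceeds.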

Tests:

- 5 new tests in db/workflows.test.ts: pending in active set, age window,
  excludeId exclusion, tiebreaker SQL shape, ordering.
- 5 new tests in executor.test.ts: self-id passed to query, self-cancel
  on guard fire, new message format, resume orphan cancellation,
  resume proceeds even if orphan cancel fails.
- Updated 2 executor-preamble tests for new structural behavior
  (row-then-guard, new message format).
- 8 new tests for formatDuration.

Deferred (kept scope tight):
- Worktree-layer advisory lockfile (residual #1188.2 microsecond race
  where both dispatches reach provider.create — bounded by git's own
  atomicity for `worktree add`).
- Startup cleanup of pre-existing stale pending rows (5-min age window
  makes them harmless).
- DB partial UNIQUE constraint migration (code-only is sufficient).

Fixes #1036
Fixes #1188 (part 2)

* fix: SQLite Date binding + UTC timestamp parse for path lock guard

Two issues found during E2E smoke testing:

1. bun:sqlite rejects Date objects as bindings ("Binding expected
   string, TypedArray, boolean, number, bigint or null"). Serialize
   selfStartedAt to ISO string before passing — PostgreSQL accepts
   ISO strings for TIMESTAMPTZ comparison too.

2. SQLite returns datetimes as plain strings without timezone suffix
   ("YYYY-MM-DD HH:MM:SS"), and JS new Date() parses such strings as
   local time. The blocking message was showing "running 3h" for
   workflows started seconds ago in a UTC+3 timezone.

   Added parseDbTimestamp helper that:
   - Returns Date.getTime() unchanged for Date inputs (PG path)
   - Treats SQLite-style strings as UTC by appending Z

   Used at both call sites: the lock query (selfStartedAt) and the
   blocking message duration.
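
A minimal sketch of a helper with the behavior described above, assuming inputs are either `Date` objects or full date-time strings. The body is a guess at one reasonable implementation, not the actual parseDbTimestamp; in particular, replacing the space separator with `T` is an added assumption to keep the string in a format that `Date` parses consistently.

```typescript
// Returns epoch milliseconds for a DB timestamp.
function parseDbTimestamp(value: Date | string): number {
  // PostgreSQL path: drivers hand back Date objects.
  if (value instanceof Date) return value.getTime();

  // SQLite path: "YYYY-MM-DD HH:MM:SS" has no zone, so treat it as UTC.
  const hasZone = /(Z|[+-]\d{2}:\d{2})$/.test(value);
  const iso = value.replace(' ', 'T');
  return new Date(hasZone ? iso : `${iso}Z`).getTime();
}
```

Strings that already carry `Z` or an explicit `+/-HH:MM` offset are left as-is, so only zone-less SQLite values are reinterpreted.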

Tests:
- 4 new tests in duration.test.ts for parseDbTimestamp covering
  Date input, SQLite UTC interpretation, explicit Z, and explicit
  +/-HH:MM offsets.
- Updated workflows.test.ts assertion for ISO serialization.

E2E smoke verified end-to-end:
- Sanity (single dispatch) succeeds.
- Two concurrent --no-worktree dispatches: one wins, one blocked
  with actionable message showing correct "Xs" duration.
- Resume + back-to-back resume both succeed (orphan correctly
  cancelled when resume activates).

* fix: address review — resume timestamp, lock-leak paths, status copy

CodeRabbit review on #1212 surfaced three real correctness gaps:

CRITICAL — resumeWorkflowRun preserved historical started_at, letting
a resumed row sort ahead of a currently-active holder in the lock
query's older-wins tiebreaker. Two active workflows could end up on
the same working_path. Fix: refresh started_at to NOW() in
resumeWorkflowRun. Original creation time is recoverable from
workflow_events history if needed for analytics.

MAJOR — lock-leak failure paths:
- If resumeWorkflowRun() throws, the orchestrator's pre-created row
  was left as 'pending' until the 5-min stale window. Fix: cancel
  preCreatedRun in the resume catch.
- If getActiveWorkflowRunByPath() throws, workflowRun (possibly
  already promoted to 'running' via resume) was left active with no
  auto-cleanup. Fix: cancel workflowRun in the guard catch.

MINOR — the blocking message always said "running" but the lock
query returns running, paused, AND fresh-pending rows. Telling a
user to "wait for it to finish" on a paused run (waiting on user
approval) would block them indefinitely. Fix: status-aware copy:
- paused: "paused waiting for user input" + approve/reject actions
- pending: "starting" verb
- running: keep current

Tests:
- New: resume refreshes started_at (asserts SQL contains
  `started_at = NOW()`)
- New: cancels preCreatedRun when resumeWorkflowRun throws
- New: cancels workflowRun when guard query throws
- New: paused message uses approve/reject actions, NOT "wait"
- New: pending message uses "starting" verb
- New: running message uses default copy
- Updated: existing tests for new error string ("already active"
  reflects status-aware semantics, not just "running")

Note: the error string changed from "already running on
this path" to "already active on this path (status)". It is for internal
use only: surfaced via getResult().error, not directly to users.

* fix: SQLite tiebreaker dialect bug + paired self struct + UX polish

CodeRabbit second review found one critical issue and several polish
items not addressed in 008013da.

CRITICAL — SQLite tiebreaker silently broken under default deployment.
SQLite stores started_at as TEXT "YYYY-MM-DD HH:MM:SS" (space sep).
Our ISO param is "YYYY-MM-DDTHH:MM:SS.mmmZ" (T sep). SQLite compares
text lexically: char 11 is space (0x20) in column vs T (0x54) in param,
so EVERY column value lex-sorts before EVERY ISO param. Result:
`started_at < $param` is always TRUE regardless of actual time. In
true concurrent dispatches, both sides see each other as "older" and
both abort — defeating the older-wins guarantee under SQLite, which
is the default deployment.

Fix: dialect-aware comparison in getActiveWorkflowRunByPath:
  - PostgreSQL: `started_at < $3::timestamptz` (TIMESTAMPTZ + cast)
  - SQLite: `datetime(started_at) < datetime($3)` (forces chronological
    via SQLite's date/time functions)

Documented with reproducer tests in adapters/sqlite.test.ts: lexical
returns wrong answer for "2026-04-14 12:00:00" < "2026-04-14T10:00:00Z";
datetime() returns correct answer.
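
The lexical-versus-chronological mismatch can be reproduced outside SQLite. The TypeScript sketch below uses plain string comparison as a stand-in for SQLite's default text ordering (both compare these ASCII strings character by character) and epoch milliseconds as a stand-in for `datetime()`; `toEpochMs` is a hypothetical helper.

```typescript
const columnValue = '2026-04-14 12:00:00'; // SQLite TEXT format, noon UTC
const isoParam = '2026-04-14T10:00:00Z'; // ISO bind parameter, 10:00 UTC

// Lexical comparison is decided at index 10, where ' ' (0x20) sorts before 'T' (0x54).
const lexicalSaysOlder = columnValue < isoParam;

// Chronological comparison: normalize both sides to epoch milliseconds first.
function toEpochMs(s: string): number {
  const iso = s.replace(' ', 'T');
  return Date.parse(iso.endsWith('Z') ? iso : `${iso}Z`);
}
const chronologicalSaysOlder = toEpochMs(columnValue) < toEpochMs(isoParam);
```

Here `lexicalSaysOlder` is `true` even though noon is later than 10:00, while `chronologicalSaysOlder` is `false`.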

Type design — collapse paired params into struct.
`excludeId` and `selfStartedAt` had to travel together (tiebreaker
references both) but were two independent optionals — future callers
could pass one without the other and silently degrade. Replaced with
a single `self?: { id: string; startedAt: Date }` to make the
paired-or-nothing invariant structural.

formatDuration(0) consistency.
Old: `if (ms <= 0) return '0s'` — special-cased 0ms despite the
"sub-second rounds up to 1s" comment. Fixed to `ms < 0` so 0ms
returns '1s' (a run that just started in the same DB second should
display as active, not literal zero).
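
A sketch of the boundary behavior described above. Only the `ms < 0` guard and the "sub-second rounds up to 1s" rule come from the commit message; the minute and hour formatting is an assumption added to make the example complete, and the real formatDuration may differ.

```typescript
function formatDuration(ms: number): string {
  // Negative input (e.g. clock skew) is the only case that renders as zero.
  if (ms < 0) return '0s';

  // Anything under one second, including exactly 0ms, rounds up to 1s.
  const totalSeconds = Math.max(1, Math.floor(ms / 1000));
  if (totalSeconds < 60) return `${totalSeconds}s`;

  // Assumed coarse units beyond seconds.
  const totalMinutes = Math.floor(totalSeconds / 60);
  if (totalMinutes < 60) return `${totalMinutes}m`;
  return `${Math.floor(totalMinutes / 60)}h`;
}
```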

Comment fix: "We acquired the lock via createWorkflowRun" was
misleading — createWorkflowRun creates a row; the lock is determined
later by the query.

Log context: added cwd to workflow.guard_self_cancel_failed and
pendingRunId to db_active_workflow_check_failed so operators can
correlate leaked rows.

Doc fixes:
- /workflow abandon doc said "marks as failed" — actually 'cancelled'
- database.md "Prevents concurrent workflow execution" → accurate
  description of path-based lock with stale-pending tolerance

Test additions:
- 3 SQLite-direct tests in adapters/sqlite.test.ts proving the
  lexical-vs-chronological bug and the datetime() fix
- Guard self-cancel update throw still surfaces failure to user

Signature change rippled through:
- IWorkflowStore.getActiveWorkflowRunByPath now takes (path, self?)
- All internal callers updated
2026-04-14 15:19:38 +03:00
Rasmus Widing
5a4541b391
fix: route canonical path failures through blocked classification (#1211)
Follow-up to #1206 review: the early getCanonicalRepoPath() wrap in
resolve() threw directly, escaping the classification flow that
createNewEnvironment uses. Permission errors, malformed worktree
pointers, ENOENT, etc. surfaced as unclassified crashes instead of
becoming an actionable `blocked` result.

Mirror createNewEnvironment's contract:
- isKnownIsolationError → return { status: 'blocked', reason:
  'creation_failed', userMessage: classifyIsolationError(err) + suffix }
- unknown errors → throw (programming bugs stay visible as crashes,
  not silent isolation failures)
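
The contract above (known errors become a `blocked` result, unknown errors keep throwing) can be sketched as follows. `isKnownIsolationError` and `classifyIsolationError` are the names from the commit message, but the bodies here are hypothetical stand-ins, `resolveCanonicalPath` is an invented wrapper, and the synchronous signature is a simplification.

```typescript
type ResolveResult =
  | { status: 'ok'; canonicalRepoPath: string }
  | { status: 'blocked'; reason: 'creation_failed'; userMessage: string };

// Stand-in classifier: the real pattern table lives in the isolation package.
function isKnownIsolationError(err: Error): boolean {
  return /EACCES|permission denied/i.test(err.message);
}

// Stand-in message builder for the one pattern recognized above.
function classifyIsolationError(err: Error): string {
  return `Permission denied (${err.message})`;
}

function resolveCanonicalPath(getCanonicalRepoPath: () => string): ResolveResult {
  try {
    return { status: 'ok', canonicalRepoPath: getCanonicalRepoPath() };
  } catch (err) {
    if (err instanceof Error && isKnownIsolationError(err)) {
      // Known failure: return an actionable result instead of crashing.
      return { status: 'blocked', reason: 'creation_failed', userMessage: classifyIsolationError(err) };
    }
    // Unknown failure: rethrow so programming bugs stay visible.
    throw err;
  }
}
```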

Adds two tests in resolver.test.ts:
- EACCES classifies to "Permission denied" blocked message
- Unknown error propagates as throw

Addresses CodeRabbit review comment on #1206.
2026-04-14 15:19:13 +03:00
Rasmus Widing
fd3f043125
fix: extend worktree ownership guard to resolver adoption paths (#1206)
* fix: extend worktree ownership guard to resolver adoption paths (#1183, #1188)

PR #1198 guarded WorktreeProvider.findExisting(), but IsolationResolver
has three earlier adoption paths that bypass the provider layer:

- findReusable (DB lookup by workflow identity)
- findLinkedIssueEnv (cross-reference via linked issues)
- tryBranchAdoption (PR branch discovery)

Two clones of the same remote share codebase_id (identity is derived
from owner/repo). Without these guards, clone B silently adopts clone
A's worktree via any of the three paths.

Changes:
- Extract verifyWorktreeOwnership from WorktreeProvider (private) to
  @archon/git/src/worktree.ts as an exported function, sitting next to
  getCanonicalRepoPath which parses the same .git file format
- Call the shared function from all three resolver paths; throw on
  cross-clone mismatch (DB rows are preserved — they legitimately
  belong to the other clone)
- Compute canonicalRepoPath once at the top of resolve()
- Six new tests in resolver.test.ts covering each guarded path's
  cross-checkout and same-clone behaviors

Fixes #1183
Fixes #1188 (part 1 — cross-checkout; part 2 parallel collision deferred
to follow-up alongside #1036)

* fix: address PR review — polish, observability, secondary gap, docs

Addresses the multi-agent review on #1206:

Code fixes:
- worktree.adoption_refused_cross_checkout log event renamed to match
  CLAUDE.md {domain}.{action}_{state} convention
- verifyWorktreeOwnership now preserves err.code and err via { cause }
  when wrapping fs errors, so classifyIsolationError is robust to Node
  message format changes
- Structured fields (codebaseId, canonicalRepoPath) added to all
  cross-clone rejection logs for incident debugging
- Wrap getCanonicalRepoPath at top of resolve() with classified error
  instead of letting it propagate as an unclassified crash
- Extract assertWorktreeOwnership helper on IsolationResolver —
  centralizes warn-then-rethrow contract, removes duplication
- Dedupe toWorktreePath(existing.working_path) calls in resolver paths
- Add code comment on findLinkedIssueEnv explaining why throw-on-first
  is intentional (user decision — surfaces anomaly instead of masking)

Secondary gap closed:
- WorktreeProvider.findExisting PR-branch adoption path
  (findWorktreeByBranch) now also verifies ownership — same class of
  bug as the main path, just via a different lookup

Tests:
- 8 new unit tests for verifyWorktreeOwnership in @archon/git
  (matching pointer, different clone, EISDIR/ENOENT errno preservation,
  submodule pointer, corrupted .git, trailing-slash normalization,
  cause chain)
- tryBranchAdoption cross-clone test now asserts store.create was
  never called (symmetry with paths 1+2 asserting updateStatus)
- New test for cross-clone rejection in the PR-branch-adoption
  secondary path in worktree.test.ts

Docs:
- CHANGELOG.md Unreleased entry for the cross-clone fix series
- troubleshooting.md "Worktree Belongs to a Different Clone" section
  documenting all four new error patterns with resolution steps and
  pointer to #1192 for the architectural fix

* fix(git): use raw .git pointer in cross-clone error message

verifyWorktreeOwnership previously called path.resolve() on the gitdir
path before embedding it in the error message. On Windows, resolve()
prepends a drive letter to a POSIX-style path (e.g., /other/clone →
C:\other\clone), which:

1. Misled users by showing a path that doesn't match what's actually
   in their .git file
2. Broke a Windows-only test asserting the error contains the literal
   /other/clone path

Compare on resolved paths (correct — normalizes trailing slashes and
relative components for the equality check) but display the raw match
in the error message (recognizable to the user).
2026-04-14 12:10:19 +03:00
Rasmus Widing
af9ed84157
fix: prevent worktree isolation bypass via prompt and git-level adoption (#1198)
* fix: prevent worktree isolation bypass via prompt and git-level adoption (#1193, #1188)

Three fixes for workflows operating on wrong branches:

- archon-implement prompt: replace ambiguous branch table with decision
  tree that trusts the worktree isolation system, uses $BASE_BRANCH
  explicitly, and instructs AI to never switch branches
- WorktreeProvider.findExisting: verify worktree's parent repo matches
  the request before adopting, preventing cross-clone adoption
- WorktreeProvider.createNewBranch: reset stale orphan branches to the
  intended start-point instead of silently inheriting old commits

Fixes #1193
Relates to #1188

* fix: address PR review — strict worktree verification, align sibling prompts

Address CodeRabbit + self-review findings on #1198:

Code fixes:
- findExisting now throws on cross-checkout or unverifiable state instead of
  returning null, avoiding a confusing cascade through createNewBranch
- verifyWorktreeOwnership handles .git errors precisely: ENOENT/EACCES/EIO
  throw a fail-fast error; EISDIR (full checkout at path) throws a clear
  "not a worktree" error; unmatched gitdir (submodule, malformed) throws
- Path comparison uses resolve() to normalize trailing slashes
- Added classifyIsolationError patterns so new errors produce actionable
  user messages

Test fixes:
- mockClear readFile/rm in afterEach
- New tests: cross-checkout throws, EISDIR throws, EACCES throws,
  submodule pointer throws, trailing-slash normalization, branch -f
  reset failure propagates without retry
- Updated existing tests that relied on permissive adoption to provide
  valid matching gitdir

Prompt fixes (sweep of all default commands):
- archon-implement.md: clarify "never switch branches" applies to worktree
  context; non-worktree branch creation still allowed
- archon-fix-issue.md + archon-implement-issue.md: aligned decision tree
  with archon-implement pattern; use $BASE_BRANCH instead of MAIN/MASTER
- archon-plan-setup.md: converted table to ordered decision tree with
  IN WORKTREE? first; removed ambiguous "already on correct feature
  branch" row
2026-04-14 09:44:12 +03:00
Rasmus Widing
d6e24f5075
feat: Phase 2 — community-friendly provider registry system (#1195)
* feat: replace hardcoded provider factory with typed registry system

Replace the built-in-only factory switch with a typed ProviderRegistration
registry where entries carry metadata (displayName, capabilities,
isModelCompatible) alongside the factory function. This enables community
providers to register without modifying core code.

- Add ProviderRegistration and ProviderInfo types to contract layer
- Create registry.ts with register/get/list/clear API, delete factory.ts
- Bootstrap registerBuiltinProviders() at server and CLI entrypoints
- Widen provider unions from 'claude' | 'codex' to string across schemas,
  config types, deps, executors, and API validation
- Replace hardcoded model-validation with registry-driven isModelCompatible
  and inferProviderFromModel (built-in only inference)
- Add GET /api/providers endpoint returning registry metadata
- Dynamic provider dropdowns in Web UI (BuilderToolbar, NodeInspector,
  WorkflowBuilder, SettingsPage) via useProviders hook
- Dynamic provider selection in CLI setup command
- Registry test suite covering full lifecycle
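
The registry described above could look roughly like the sketch below. The field names `displayName`, `capabilities`, `isModelCompatible` and the factory function come from the commit message; the function names, the `id` and `builtIn` fields, the shape of `capabilities`, and the duplicate-registration check are assumptions made for illustration, not the contents of registry.ts.

```typescript
// Assumed shape: the commit only says entries carry capabilities metadata.
interface ProviderCapabilities {
  envInjection: boolean;
}

interface ProviderRegistration {
  id: string;
  displayName: string;
  builtIn: boolean;
  capabilities: ProviderCapabilities;
  isModelCompatible: (model: string) => boolean;
  factory: () => unknown; // stands in for whatever creates the provider instance
}

const registry = new Map<string, ProviderRegistration>();

// Hypothetical API names for the register/get/list/clear operations.
function registerProvider(entry: ProviderRegistration): void {
  if (registry.has(entry.id)) {
    throw new Error(`Provider already registered: ${entry.id}`); // assumption
  }
  registry.set(entry.id, entry);
}

function getRegistration(id: string): ProviderRegistration {
  const entry = registry.get(id);
  if (!entry) throw new Error(`Unknown provider: ${id}`);
  return entry;
}

function listProviders(): ProviderRegistration[] {
  return Array.from(registry.values());
}

function clearRegistry(): void {
  registry.clear();
}
```

With this shape, a community provider registers itself by calling the register function at startup, and core code only ever looks entries up by string id.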

* feat: generalize assistant config and tighten registry validation

- Add ProviderDefaults/ProviderDefaultsMap generic types to contract layer
- Add index signatures to ClaudeProviderDefaults/CodexProviderDefaults
- Introduce AssistantDefaults/AssistantDefaultsConfig intersection types
  that combine ProviderDefaultsMap with typed built-in entries
- Replace hardcoded claude/codex config merging with generic
  mergeAssistantDefaults() that iterates all provider entries
- Replace hardcoded toSafeConfig projection with generic
  toSafeAssistantDefaults() that strips server-internal fields
- Validate provider strings at all config-entry surfaces: env override,
  global config, repo config all throw on unknown providers
- Validate provider on PATCH /api/config/assistants (400 on unknown)
- Move validator.ts from hardcoded Codex checks to capability-driven
  warnings using registry getProviderCapabilities()
- Remove resolveProvider() default to 'claude' — returns undefined when
  no provider is set, skipping capability warnings for unresolved nodes
- Widen config API schemas to generic Record<string, ProviderDefaults>
- Rewrite SettingsPage to iterate providers dynamically with built-in
  specific UI for Claude/Codex and generic JSON view for community
- Extract bootstrap to provider-bootstrap modules in CLI and server
- Remove all as Record<...> casts from dag-executor, executor,
  orchestrator — clean indexing via ProviderDefaultsMap intersection
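
As an illustration of the generic merge that replaces the hardcoded claude/codex handling, here is a small sketch. The type names `ProviderDefaults` and `ProviderDefaultsMap` and the function name `mergeAssistantDefaults` appear in the commit message; the two-argument signature and the shallow per-provider merge are assumptions.

```typescript
type ProviderDefaults = Record<string, unknown>;
type ProviderDefaultsMap = Record<string, ProviderDefaults>;

// Assumed behavior: iterate every provider id present in either map and
// shallow-merge its settings, with the override map winning on conflicts.
function mergeAssistantDefaults(
  base: ProviderDefaultsMap,
  override: ProviderDefaultsMap,
): ProviderDefaultsMap {
  const merged: ProviderDefaultsMap = {};
  for (const id of Object.keys(base)) {
    merged[id] = { ...base[id] };
  }
  for (const id of Object.keys(override)) {
    merged[id] = { ...(merged[id] ?? {}), ...override[id] };
  }
  return merged;
}
```

Because nothing in the loop names a specific provider, a community provider's defaults flow through the same path as the built-in ones.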

* fix: remove remaining hardcoded provider assumptions and regenerate types

- Replace hardcoded 'claude' defaults in CLI setup with registry lookup
  (getRegisteredProviders().find(p => p.builtIn)?.id)
- Replace hardcoded 'claude' default in clone.ts folder detection with
  registry-driven fallback
- Update config YAML comment from "claude or codex" to "registered provider"
- Make bootstrap test assertions use toContain instead of exact toEqual
  so they don't break when community providers are registered
- Widen validator.test.ts helper from 'claude' | 'codex' to string
- Remove unnecessary type casts in NodeInspector, WorkflowBuilder,
  SettingsPage now that generated types use string
- Regenerate api.generated.d.ts from updated OpenAPI spec — all provider
  fields are now string instead of 'claude' | 'codex' union

* fix: address PR review findings — consistency, tests, docs

Critical fixes:
- isModelCompatible now throws on unknown providers (fail-fast parity
  with getProviderCapabilities) instead of silently returning true
- Schema provider fields use z.string().trim().min(1) to reject
  whitespace-only values
- validator.ts resolveProvider accepts defaultProvider param so
  capability warnings fire for config-inherited providers
- PATCH /api/config/assistants validates assistants keys against
  registry (rejects unknown provider IDs in the map)

YAGNI cleanup:
- Delete provider-bootstrap.ts wrappers in CLI and server — call
  registerBuiltinProviders() directly
- Remove no-op .map(provider => provider) in SettingsPage

Test coverage:
- Add GET /api/providers endpoint tests (shape, projection, capabilities)
- Add config-loader throw-path tests for unknown providers in env var,
  global config, and repo config
- Add isModelCompatible throw test for unknown providers

Docs:
- CLAUDE.md: factory.ts → registry.ts in directory tree, add
  GET /api/providers to API endpoints section
- .env.example: update DEFAULT_AI_ASSISTANT comment
- docs-web configuration reference: update provider constraint docs

UI:
- Settings default-assistant dropdown uses allProviderEntries fallback
  (no longer silently empty on API failure)
- clearRegistry marked @internal in JSDoc

* fix: use registry defaults in getDefaults/registerProject, document type design

- getDefaults() initializes assistant defaults from registered providers
  instead of hardcoding { claude: {}, codex: {} }
- getDefaults() uses first registered built-in as default assistant
  instead of hardcoding 'claude'
- handleRegisterProject uses config.assistant instead of hardcoded 'claude'
  for new codebase ai_assistant_type
- Document AssistantDefaults/AssistantDefaultsConfig intersection types:
  built-in keys are typed for parseClaudeConfig/parseCodexConfig type
  safety; community providers use the generic [string] index
- Document WorkflowConfig.assistants intersection type with same rationale

* docs: update stale provider references to reflect registry system

- architecture.md: DB schema comment now says 'registered provider'
- first-workflow.md: provider field accepts any registered provider
- quick-reference.md: provider type changed from enum to string
- authoring-workflows.md: provider type changed from enum to string
- title-generator.ts: @param doc updated from 'claude or codex' to
  generic provider identifier

* docs: fix remaining stale provider references in quick-reference and authoring guide

- quick-reference.md: per-node provider type changed from enum to string
- quick-reference.md: model mismatch guidance updated for registry pattern
- authoring-workflows.md: provider comment says 'any registered provider'
2026-04-13 21:27:11 +03:00
Rasmus Widing
b5c5f81c8a
refactor: extract provider metadata seam for Phase 2 registry readiness (#1185)
* refactor: extract provider metadata seam for Phase 2 registry readiness

- Add static capability constants (capabilities.ts) for Claude and Codex
- Export getProviderCapabilities() from @archon/providers for capability
  queries without provider instantiation
- Add inferProviderFromModel() to model-validation.ts, replacing three
  copy-pasted inline inference blocks in executor.ts and dag-executor.ts
- Replace throwaway provider instantiation in dag-executor with static
  capability lookup (getProviderCapabilities)
- Add orchestrator warning when env vars are configured but provider
  doesn't support envInjection

* refactor: address LOW findings from code review

- Remove CLAUDE_CAPABILITIES/CODEX_CAPABILITIES from public index (YAGNI —
  callers should use getProviderCapabilities(), not raw constants)
- Remove dead _deps parameter from resolveNodeProviderAndModel and its
  two call-sites (no longer needed after static capability lookup refactor)
- Update factory.ts module JSDoc to mention both exported functions
- Add edge-case tests for getProviderCapabilities: empty string and
  case-sensitive throws (parity with existing getAgentProvider tests)
- Add test for inferProviderFromModel with empty string (returns default,
  documenting the falsy-string shortcut)
2026-04-13 16:10:48 +03:00
Rasmus Widing
bf20063e5a
feat: propagate managed execution env to all workflow surfaces (#1161)
* Implement managed execution env propagation

* Address managed env review feedback
2026-04-13 15:21:57 +03:00
Rasmus Widing
a8ac3f057b
security: prevent target repo .env from leaking into subprocesses (#1135)
Remove the entire env-leak scanning/consent infrastructure: scanner,
allow_env_keys DB column usage, allow_target_repo_keys config, PATCH
consent route, --allow-env-keys CLI flag, and UI consent toggle.

The env-leak gate was the wrong primitive. Target repo .env protection
is already structural:
- stripCwdEnv() at boot removes Bun-auto-loaded CWD .env keys
- Archon loads its own env sources afterward (~/.archon/.env)
- process.env is clean before any subprocess spawns
- Managed env injection (config.yaml env: + DB vars) is unchanged

No scanning, no consent, no blocking. Any repo can be registered and
used. Subprocesses receive the already-clean process.env.
2026-04-13 13:46:24 +03:00
Rasmus Widing
c9c6ab47cb test: add comprehensive e2e smoke test workflows
- e2e-all-nodes: exercises bash, prompt, script (bun), structured output,
  model override (haiku), effort control, and $nodeId.output refs
- e2e-mixed-providers: tests Claude + Codex in the same workflow with
  cross-provider output references
- echo-args.js: simple script node test helper
2026-04-13 11:26:05 +03:00
Rasmus Widing
37aeadb8c8
refactor: decompose provider sendQuery() into explicit helper boundaries (#1162)
* refactor: decompose provider sendQuery() into explicit helper boundaries (#1139)

sendQuery() in both Claude and Codex providers was a monolith mixing SDK option
building, nodeConfig translation, stream normalization, and error classification.
This makes it hard to safely extend for Phase 2 provider extensibility.

Decompose both providers into focused internal helpers:

Claude:
- buildBaseClaudeOptions: SDK option construction
- buildToolCaptureHooks: PostToolUse/PostToolUseFailure hook setup
- applyNodeConfig: workflow nodeConfig → SDK translation + structured warnings
- streamClaudeMessages: raw SDK event → MessageChunk normalization
- classifyAndEnrichError: error classification with retry decisions

Codex:
- buildTurnOptions: per-turn option construction (output schema, abort)
- streamCodexEvents: raw SDK event → MessageChunk normalization
- classifyAndEnrichCodexError: error classification with retry decisions

Also introduces ProviderWarning { code, message } replacing raw string warnings
for machine-readable provider translation warnings.

Adds 43 focused unit tests covering the extracted helpers directly.

Fixes #1139

* fix: export ToolResultEntry type used in public buildBaseClaudeOptions API

* fix: unexport internal helpers to prevent API surface leakage, fix retry state bug

Review findings:
1. Internal helpers were exported and reachable through package.json subpath
   exports (./claude/provider, ./codex/provider), widening the public API.
   All new helpers are now file-local — the only public exports remain
   ClaudeProvider, CodexProvider, loadMcpConfig, buildSDKHooksFromYAML,
   withFirstMessageTimeout, getProcessUid.

2. Codex streamState (lastTodoListSignature) was shared across retry
   attempts, causing todo-list dedup to suppress output on retry.
   Now creates fresh state per attempt.

Removed direct helper test imports — existing sendQuery e2e tests
(51 Claude + 42 Codex) cover all behavior paths.
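
The retry-state bug in finding 2 can be reproduced with a small sketch. The field name `lastTodoListSignature` is from the commit message; `StreamState`, `runAttempt`, and the signature format are hypothetical stand-ins for the Codex stream code.

```typescript
interface StreamState {
  lastTodoListSignature: string | null;
}

// Hypothetical attempt runner: emits a todo list only when its signature
// differs from the last one recorded in the given state.
function runAttempt(todoLists: string[][], state: StreamState): string[] {
  const emitted: string[] = [];
  for (const items of todoLists) {
    const signature = JSON.stringify(items);
    if (signature === state.lastTodoListSignature) continue; // dedup
    state.lastTodoListSignature = signature;
    emitted.push(items.join(", "));
  }
  return emitted;
}

const events = [["write tests", "fix bug"]];

// Buggy pattern: one state object reused across attempts.
const shared: StreamState = { lastTodoListSignature: null };
const sharedFirst = runAttempt(events, shared);
const sharedRetry = runAttempt(events, shared); // suppressed by stale state

// Fixed pattern: fresh state per attempt.
const freshFirst = runAttempt(events, { lastTodoListSignature: null });
const freshRetry = runAttempt(events, { lastTodoListSignature: null });
```

With shared state the retry emits nothing for the same todo list; with fresh state each attempt emits it once.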

* fix: address review findings — abort handling, retry bugs, error swallowing

Fixes from CodeRabbit + multi-agent review:

1. classifyAndEnrichError preserves first-event timeout diagnostic instead
   of collapsing it into generic "Query aborted" (the timeout aborts the
   controller, but the original error carries the #1067 breadcrumb)

2. nodeConfigWarnings emitted once before retry loop, not per attempt

3. buildSubprocessEnv() called once before retry loop (was re-logging
   auth mode and rebuilding { ...process.env } per attempt)

4. Abort signal listener registered once with forwarding to current
   controller (was accumulating per-retry listeners)

5. PostToolUse hook wrapped in try/catch (JSON.stringify can throw on
   circular refs — was asymmetric with PostToolUseFailure which had it)

6. Codex streamCodexEvents throws on abort instead of silent break
   (callers were getting truncated stream with no result/error)

7. Both providers store enrichedError (not raw error) for retry
   exhaustion — preserves stderr context in final throw

8. Log is_error result events at error level in Claude stream normalizer
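
Item 5 above relies on the fact that `JSON.stringify` throws a `TypeError` on circular structures. A minimal sketch of the guard, with a hypothetical helper name and a placeholder fallback string that is not the actual hook code:

```typescript
// Hypothetical helper: serialize tool output for capture without letting a
// circular reference crash the hook.
function safeStringify(value: unknown): string {
  try {
    return JSON.stringify(value);
  } catch {
    return "[unserializable tool output]"; // placeholder text, an assumption
  }
}

// A self-referencing object is the simplest circular structure.
const circular: Record<string, unknown> = { name: "tool-result" };
circular["self"] = circular;

const serialized = safeStringify({ ok: true });
const fallback = safeStringify(circular);
```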

* test: add black-box behavioral tests for sendQuery decomposition fixes

Restore test coverage for the specific fixes from the decomposition review,
exercised through sendQuery (black-box) since helpers are file-local:

Claude (6 tests):
- Timeout error preserved (not collapsed into "Query aborted")
- nodeConfig warnings emitted once even when retries occur
- Abort signal cancels across retries via single forwarding listener
- Enriched error (with stderr) thrown at retry exhaustion
- PostToolUse hook handles circular reference without crashing
- is_error result events logged at error level

Codex (3 tests):
- Abort signal throws instead of silently truncating stream
- Enriched error thrown at retry exhaustion
- Todo-list dedup state resets between retry attempts
2026-04-13 11:24:36 +03:00
Rasmus Widing
6a6740af38
fix: make env-integration test cross-platform (Windows CI) (#1160)
* fix: make env-integration test cross-platform (Windows CI)

Check for Windows env var equivalents (Path instead of PATH,
USERPROFILE instead of HOME) in scenario 3 assertions.

Closes #1128

* fix: Windows PATH/HOME casing in provider subprocess env test

Same cross-platform fix for ClaudeProvider test — spread objects
lose Windows case-insensitive behavior (Path vs PATH, USERPROFILE
vs HOME).
2026-04-13 09:44:58 +03:00
Rasmus Widing
c1ed76524b
refactor: extract providers from @archon/core into @archon/providers (#1137)
* refactor: extract providers from @archon/core into @archon/providers

Move Claude and Codex provider implementations, factory, and SDK
dependencies into a new @archon/providers package. This establishes a
clean boundary: providers own SDK translation, core owns business logic.

Key changes:
- New @archon/providers package with zero-dep contract layer (types.ts)
- @archon/workflows imports from @archon/providers/types — no mirror types
- dag-executor delegates option building to providers via nodeConfig
- IAgentProvider gains getCapabilities() for provider-agnostic warnings
- @archon/core no longer depends on SDK packages directly
- UnknownProviderError standardizes error shape across all surfaces

Zero user-facing changes — same providers, same config, same behavior.

* refactor: remove config type duplication and backward-compat re-exports

Address review findings:
- Move ClaudeProviderDefaults and CodexProviderDefaults to the
  @archon/providers/types contract layer as the single source of truth.
  @archon/core/config/config-types.ts now imports from there.
- Remove provider re-exports from @archon/core (index.ts and types/).
  Consumers should import from @archon/providers directly.
- Update @archon/server to depend on @archon/providers for MessageChunk.

* refactor: move structured output validation into providers

Each provider now normalizes its own structured output semantics:
- Claude already yields structuredOutput from the SDK's native field
- Codex now parses inline agent_message text as JSON when outputFormat
  is set, populating structuredOutput on the result chunk

This eliminates the last provider === 'codex' branch from dag-executor,
making it fully provider-agnostic. The dag-executor checks structuredOutput
uniformly regardless of provider.

Also removes the ClaudeCodexProviderDefaults deprecated alias — all
consumers now use ClaudeProviderDefaults directly.
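
A sketch of the Codex-side normalization described above, under stated assumptions: the `ResultChunk` shape and the function name are hypothetical, and leaving `structuredOutput` unset on a parse failure is a guess at reasonable behavior rather than the provider's documented handling.

```typescript
interface ResultChunk {
  type: "result";
  text: string;
  structuredOutput?: unknown;
}

// When an output format was requested, try to read the agent's inline text as
// JSON so downstream code can check structuredOutput uniformly.
function toResultChunk(agentMessageText: string, outputFormatSet: boolean): ResultChunk {
  const chunk: ResultChunk = { type: "result", text: agentMessageText };
  if (!outputFormatSet) return chunk;
  try {
    chunk.structuredOutput = JSON.parse(agentMessageText);
  } catch {
    // Not valid JSON: leave structuredOutput undefined (assumption).
  }
  return chunk;
}
```

The executor then needs no provider-specific branch: it reads `structuredOutput` from the result chunk regardless of which provider produced it.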

* fix: address PR review — restore warnings, fix loop options, cleanup

Critical fixes:
- Restore MCP missing env vars user-facing warning (was silently dropped)
- Restore Haiku + MCP tool search warning
- Fix buildLoopNodeOptions to pass workflow-level nodeConfig (effort,
  thinking, betas, sandbox were silently lost for loop nodes)
- Add TODO(#1135) comments documenting env-leak gate gap

Cleanup:
- Remove backward-compat type aliases from deps.ts (keep WorkflowTokenUsage)
- Remove 26 unnecessary eslint-disable comments from test files
- Trim internal helpers from providers barrel (withFirstMessageTimeout,
  getProcessUid, loadMcpConfig, buildSDKHooksFromYAML)
- Add @archon/providers dep to CLI package.json
- Fix 8 stale documentation paths pointing to deleted core/src/providers/
- Add E2E smoke test workflows for both Claude and Codex providers

* fix: forward provider system warnings to users in dag-executor

The dag-executor only forwarded system chunks starting with
"MCP server connection failed:" — all other provider warnings
(missing env vars, Haiku+MCP, structured output issues) were
logged but never reached the user.

Now forwards all system chunks starting with ⚠️ (the prefix
providers use for user-actionable warnings).
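
The forwarding rule reduces to a prefix check. A minimal sketch, where the `SystemChunk` shape and the function name are hypothetical and only the ⚠️ prefix convention comes from the commit message:

```typescript
interface SystemChunk {
  type: "system";
  content: string;
}

// Prefix that providers use for user-actionable warnings.
const WARNING_PREFIX = "⚠️";

// Forward a system chunk to the user only when it carries the warning prefix;
// everything else stays in the logs.
function shouldForwardToUser(chunk: SystemChunk): boolean {
  return chunk.content.startsWith(WARNING_PREFIX);
}
```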

* fix: add providers package to Dockerfile and fix CI module resolution

- Add packages/providers/ to all three Dockerfile stages (deps,
  production package.json copy, production source copy)
- Replace wildcard export map (./*) with explicit subpath entries
  to fix module resolution in CI (bun workspace linking)

* chore: update bun.lock for providers package exports
2026-04-13 09:21:36 +03:00
Rasmus Widing
eb75ab60e5
Merge pull request #1130 from coleam00/rules-cleanup
docs: consolidate Claude guidance into CLAUDE.md
2026-04-12 20:31:49 +03:00
Rasmus Widing
39c6f05bad docs: consolidate Claude guidance into CLAUDE.md 2026-04-12 20:21:16 +03:00
Rasmus Widing
a4242e6b49
Merge pull request #1116 from coleam00/rename-iassistantclient-to-iagentprovider
refactor: rename IAssistantClient to IAgentProvider
2026-04-12 20:02:49 +03:00
Rasmus Widing
a7b3b94388 refactor: simplify provider rename follow-through
- ProviderDefaults → CodexProviderDefaults (symmetric with ClaudeProviderDefaults)
- Fix stale "AI client" comments in orchestrator-agent.ts and orchestrator.test.ts
- Remove dead createMockAgentProvider in test/mocks/streaming.ts (zero importers, wrong method names)
- Fix irregular whitespace in .claude/rules/workflows.md
2026-04-12 13:51:45 +03:00
Rasmus Widing
b9a70a5d17 refactor: complete provider rename in config types, logger domains, and docs
- AssistantDefaults → ProviderDefaults, ClaudeAssistantDefaults → ClaudeProviderDefaults
- Logger domains: client.claude → provider.claude, client.codex → provider.codex
- Fix stale JSDoc, error messages, and references in architecture docs, CHANGELOG, testing rules
2026-04-12 13:47:05 +03:00
Rasmus Widing
91c184af57 refactor: rename IAssistantClient to IAgentProvider
Rename the core AI provider interface and all related types, classes,
factory functions, and directory from clients/ to providers/.

Rename map:
- IAssistantClient → IAgentProvider
- ClaudeClient → ClaudeProvider
- CodexClient → CodexProvider
- getAssistantClient → getAgentProvider
- AssistantRequestOptions → AgentRequestOptions
- IWorkflowAssistantClient → IWorkflowAgentProvider
- AssistantClientFactory → AgentProviderFactory
- WorkflowAssistantOptions → WorkflowAgentOptions
- packages/core/src/clients/ → packages/core/src/providers/

NOT renamed (user-facing/DB-stored): assistant config key,
DEFAULT_AI_ASSISTANT env var, ai_assistant_type DB column.

No behavioral changes — purely naming.
2026-04-12 13:11:21 +03:00
github-actions[bot]
c2089117fa chore: update Homebrew formula for v0.3.6 2026-04-12 09:19:27 +00:00
Rasmus Widing
59cda08efa
Merge pull request #1114 from coleam00/dev
Release 0.3.6
2026-04-12 12:17:34 +03:00
Rasmus Widing
883d1369f4 Release 0.3.6 2026-04-12 12:16:49 +03:00
Cole Medin
6da994815c
fix: strip CWD .env leak, remove subprocess allowlist, add first-event timeout (#1067, #1030, #1098, #1070)
* fix: strip CWD .env leak, enable platform adapters in serve, add first-event timeout (#1067)

Three bugs fixed: (1) Bun auto-loads CWD .env files before user code, leaking
non-overlapping keys into the Archon process — new stripCwdEnv() boot import
removes them before any module reads env. (2) archon serve hardcoded
skipPlatformAdapters:true, preventing Slack/Telegram/Discord from starting.
(3) Claude SDK query had no first-event timeout, causing silent 30-min hangs
when the subprocess wedges — new withFirstMessageTimeout wrapper races the
first event against a configurable deadline (default 60s).

Changes:
- Add @archon/paths/strip-cwd-env and strip-cwd-env-boot modules
- Import boot module as first import in CLI entry point
- Remove skipPlatformAdapters: true from serve.ts
- Add withFirstMessageTimeout + diagnostics to ClaudeClient
- Add CLAUDECODE=1 nested-session warning to CLI
- Add 9 unit tests (6 strip-cwd-env + 3 timeout)

Fixes #1067

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: address review findings for PR #1092

Fixed:
- Clear setTimeout timer in withFirstMessageTimeout finally block (HIGH-1)
- Add strip-cwd-env-boot to server/src/index.ts for direct dev:server path (MEDIUM-1)
- Warn to stderr on non-ENOENT errors in stripCwdEnv (MEDIUM-2)
- Update stale configuration.md docs for new env-loading mechanism (HIGH-2)
- Add ARCHON_CLAUDE_FIRST_EVENT_TIMEOUT_MS and ARCHON_SUPPRESS_NESTED_CLAUDE_WARNING env vars to docs
- Add nested Claude Code hang troubleshooting entry
- Fix boot module JSDoc: "CLI and server" → "CLI" only
- Fix stripCwdEnv JSDoc: remove stale "override: true" reference
- Update .claude/rules/cli.md startup behavior section
- Update CLAUDE.md @archon/paths description with new exports

Tests added:
- Assert controller.signal.aborted on timeout
- Handle generator that completes immediately without yielding
- Strip distinct keys from different .env files

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* simplify: replace string sentinel with typed error class in withFirstMessageTimeout

Replace the '__timeout__' string sentinel used to identify timeout rejections
with a dedicated FirstEventTimeoutError class. instanceof checks are more
explicit and robust than string comparison on error messages.
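
A simplified sketch of the first-event race with a typed error, assuming the wrapper takes the source generator, an `AbortController`, and a deadline in milliseconds. The names `withFirstMessageTimeout` and `FirstEventTimeoutError` are from the commit messages; the signature and internals here are assumptions and omit the diagnostics the real wrapper attaches.

```typescript
class FirstEventTimeoutError extends Error {
  timeoutMs: number;
  constructor(timeoutMs: number) {
    super(`No first event within ${timeoutMs}ms`);
    this.name = "FirstEventTimeoutError";
    this.timeoutMs = timeoutMs;
  }
}

async function* withFirstMessageTimeout<T>(
  source: AsyncGenerator<T>,
  controller: AbortController,
  timeoutMs: number,
): AsyncGenerator<T> {
  let timer: ReturnType<typeof setTimeout> | undefined;
  const deadline = new Promise<never>((_, reject) => {
    timer = setTimeout(() => {
      controller.abort(); // stop the wedged work
      reject(new FirstEventTimeoutError(timeoutMs));
    }, timeoutMs);
  });

  let first: IteratorResult<T>;
  try {
    // Only the first event is raced against the deadline.
    first = await Promise.race([source.next(), deadline]);
  } finally {
    if (timer !== undefined) clearTimeout(timer); // never leave the timer running
  }

  if (first.done) return; // generator finished without yielding
  yield first.value;
  yield* source; // later events pass through untimed
}
```

Callers can then distinguish the timeout with `error instanceof FirstEventTimeoutError` instead of comparing message strings.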

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* fix: address review findings — dotenv version, docs, server warning, marker strip, tests

1. Align dotenv to ^17 (was ^16, rest of monorepo uses ^17.2.3)
2. Remove incorrect SUBPROCESS_ENV_ALLOWLIST claim from docs — the SDK
   bypasses the env option and uses process.env directly (#1097)
3. Add CLAUDECODE=1 warning to server entry point (was only in CLI)
4. Add diagnostic payload content test for withFirstMessageTimeout
5. Integrate #1097's finding: strip CLAUDECODE + CLAUDE_CODE_* session
   markers (except auth vars) + NODE_OPTIONS + VSCODE_INSPECTOR_OPTIONS
   from process.env at entry point. Pattern-matched on CLAUDE_CODE_*
   prefix rather than hardcoding 6 names, so future Claude Code markers
   are handled automatically. Auth vars (CLAUDE_CODE_OAUTH_TOKEN,
   CLAUDE_CODE_USE_BEDROCK, CLAUDE_CODE_USE_VERTEX) are preserved.

   Root cause per #1097: the Claude Agent SDK leaks process.env into the
   spawned child regardless of the explicit env option, so the only way
   to prevent the nested-session deadlock is to delete the markers from
   process.env at the entry point.

Validation: bun run validate passes, 125 paths tests (6 new marker
tests), 60 claude tests (1 new diagnostic test), DATABASE_URL leak
verified stripped (target repo .env DATABASE_URL does not affect Archon
DB selection).
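
The marker strip in item 5 can be sketched as a prefix match with an allowlist of preserved auth variables. The variable names are the ones listed in the commit message; the function name and the env-as-parameter signature are assumptions (the real code operates on `process.env` inside `stripCwdEnv()`).

```typescript
type Env = Record<string, string | undefined>;

// Auth variables that must survive the strip, per the commit message.
const PRESERVED_AUTH_VARS = new Set([
  "CLAUDE_CODE_OAUTH_TOKEN",
  "CLAUDE_CODE_USE_BEDROCK",
  "CLAUDE_CODE_USE_VERTEX",
]);

// Variables removed by exact name.
const ALWAYS_STRIPPED = new Set(["CLAUDECODE", "NODE_OPTIONS", "VSCODE_INSPECTOR_OPTIONS"]);

// Hypothetical helper: deletes session markers in place, returns removed keys.
function stripSessionMarkers(env: Env): string[] {
  const removed: string[] = [];
  for (const key of Object.keys(env)) {
    const isMarker = key.startsWith("CLAUDE_CODE_") && !PRESERVED_AUTH_VARS.has(key);
    if (ALWAYS_STRIPPED.has(key) || isMarker) {
      delete env[key];
      removed.push(key);
    }
  }
  return removed;
}
```

Matching on the prefix means a marker added by a future Claude Code release is stripped without a code change, while the three auth variables are kept explicitly.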

* refactor: remove SUBPROCESS_ENV_ALLOWLIST — trust user config, strip only CWD

The allowlist was wrong for a single-developer tool:
- It blocked keys the user intentionally set in ~/.archon/.env
  (ANTHROPIC_API_KEY, AWS_*, CLAUDE_CONFIG_DIR, MiniMax vars, etc.)
- It was bypassed by the SDK anyway (process.env leaks to subprocess
  regardless of the env option — see #1097)
- It attracted a constant stream of PRs adding keys (#1060, #1093, #1099)

New model: CWD .env keys are the only untrusted source. stripCwdEnv()
at entry point handles that. Everything in ~/.archon/.env + shell env
passes through to the subprocess. No filtering, no second-guessing.

Changes:
- Delete env-allowlist.ts and env-allowlist.test.ts
- Simplify buildSubprocessEnv() to return { ...process.env } with
  auth-mode logging (no token stripping — user controls their config)
- Replace 4 allowlist-based tests with 1 pass-through test
- Remove env-allowlist.test.ts from core test batch
- Update security.md and cli.md docs to reflect the new model

The CLAUDECODE + CLAUDE_CODE_* marker strip and NODE_OPTIONS strip
remain in stripCwdEnv() at entry point — those are process-level
safety (not per-subprocess filtering) and are needed regardless.

* fix: restore override:true for archon env, add integration tests

The integration tests caught a real issue: without override:true, the
~/.archon/.env load doesn't win over shell-inherited env vars. If the
user's shell profile exports PORT=9999 and ~/.archon/.env has PORT=3000,
the user expects Archon to use 3000.

stripCwdEnv() handles CWD .env files (untrusted). override:true handles
shell-inherited vars (trusted but less specific than ~/.archon/.env).
Different concerns, both needed.

Also adds 6 integration tests covering the full entry-point flow:
1. Global auth user with ANTHROPIC_API_KEY in CWD .env — stripped
2. OAuth token in archon env + random key in CWD — CWD stripped, archon kept
3. General leak test — nothing from CWD reaches subprocess
4. Same key in both CWD and archon — archon value wins
5. CLAUDECODE markers stripped even when not from CWD .env
6. CLAUDE_CODE_OAUTH_TOKEN survives marker strip
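
The two mechanisms above (strip CWD keys, then load the Archon env file with override) can be modeled on plain objects. This is a sketch of the precedence rules only: `bootEnv` and its parameters are hypothetical, and the real code works on `process.env` and uses a dotenv-style `override` option rather than this loop.

```typescript
type Env = Record<string, string | undefined>;

// Hypothetical model of the entry-point flow:
// 1. remove every key that came from the CWD .env (untrusted),
// 2. apply ~/.archon/.env values, overwriting shell-inherited ones when
//    override is true.
function bootEnv(
  shellEnv: Env,
  cwdEnvKeys: string[],
  archonEnv: Record<string, string>,
  override: boolean,
): Env {
  const env: Env = { ...shellEnv };
  for (const key of cwdEnvKeys) delete env[key];
  for (const key of Object.keys(archonEnv)) {
    if (override || env[key] === undefined) env[key] = archonEnv[key];
  }
  return env;
}
```

For the PORT example in the message: a shell value of 9999 plus an Archon value of 3000 yields 3000 with override enabled and 9999 without it.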

* test: add DATABASE_URL leak scenarios to env integration tests

* fix: move CLAUDECODE warning into stripCwdEnv, remove dead useGlobalAuth logic

Review findings addressed:

1. CLAUDECODE warning was dead code — the boot import deleted CLAUDECODE
   from process.env before the warning check in cli.ts/server/index.ts
   could fire. Moved the warning into stripCwdEnv() itself, emitted
   BEFORE the deletion. Removed duplicate warning code from both entry
   points.

2. useGlobalAuth token stripping removed (intentional, not regression) —
   the old code stripped CLAUDE_CODE_OAUTH_TOKEN and CLAUDE_API_KEY when
   useGlobalAuth=true. Per design discussion: the user controls
   ~/.archon/.env and all keys they set are intentional. If they want
   global auth, they just don't set tokens. Simplified buildSubprocessEnv
   to log auth mode for diagnostics only, no filtering.

3. Docs "no override needed" corrected — cli.md and configuration.md
   now reflect the actual code (override: true).

---------

Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-authored-by: Rasmus Widing <rasmus.widing@gmail.com>
2026-04-12 12:11:16 +03:00
Cole Medin
b620c04e27 fix(web): add defensive optional chaining for workflow run data access
Prevents "Cannot read properties of undefined (reading 'status')" crash
when navigating between chat and workflow execution views during race
conditions where run data may be transiently undefined.
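
A minimal illustration of the guard, assuming a response shape where both the data object and its `run` field can be transiently missing. The `RunData` type and `readStatus` name are hypothetical.

```typescript
interface RunData {
  run?: { status?: string };
}

// Each ?. short-circuits to undefined instead of throwing when the value to
// its left is null or undefined.
function readStatus(data: RunData | undefined): string | undefined {
  return data?.run?.status;
}
```

Chaining only the first access (`data?.run.status`) would still throw when `data` is present but `run` is not, so every link that can be missing needs its own `?.`.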

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-04-10 20:09:09 -05:00
Cole Medin
bf8bc8e4ae fix: address review findings for workflow context injection
- CRITICAL: fix metadata filter in getRecentWorkflowResultMessages to check
  for workflowResult key presence instead of category (which is never persisted
  to DB); feature was completely non-functional on every call
- HIGH: guard JSON.parse(msg.metadata) with typeof check to handle PostgreSQL
  JSONB columns returned as objects (not strings) by node-postgres
- MEDIUM: add structured warn log inside inner metadata parse catch block
- LOW: use SELECT id, content, metadata instead of SELECT * in new DB query
- LOW: update comments in messages.ts and prompt-builder.ts for accuracy
- Tests: add formatWorkflowContextSection unit tests (pure function coverage)
- Tests: add getRecentWorkflowResultMessages tests (dialect switch + contract)
- Tests: add getDatabaseType mock to messages.test.ts connection mock
- Tests: add ../db/messages mock and formatWorkflowContextSection to
  prompt-builder mock in orchestrator-agent.test.ts
- Tests: add handleMessage workflow context injection behavioral tests
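
The CRITICAL and HIGH fixes above combine into one small pattern: accept metadata as either a JSON string or an already-parsed object, then filter on key presence. A sketch with hypothetical helper names; only the `workflowResult` key and the string-versus-object distinction come from the commit message.

```typescript
function isRecord(value: unknown): value is Record<string, unknown> {
  return typeof value === "object" && value !== null && !Array.isArray(value);
}

// Some drivers return JSON columns as strings, others as parsed objects.
// Only call JSON.parse when the value is actually a string.
function parseMetadata(raw: unknown): Record<string, unknown> | null {
  if (typeof raw === "string") {
    try {
      const parsed: unknown = JSON.parse(raw);
      return isRecord(parsed) ? parsed : null;
    } catch {
      return null; // malformed JSON is treated as "no metadata"
    }
  }
  return isRecord(raw) ? raw : null;
}

// Filter on the presence of the workflowResult key, not on a category field.
function isWorkflowResultMessage(rawMetadata: unknown): boolean {
  const metadata = parseMetadata(rawMetadata);
  return metadata !== null && "workflowResult" in metadata;
}
```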

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-10 17:59:19 -05:00
Cole Medin
4292c3a24b simplify: replace nested ternary with if/else for headerTitle in WorkflowResultCard
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-10 17:49:55 -05:00
Cole Medin
e4555a769b simplify: reduce complexity in changed files
- Parallelize checksums + tarball fetch in serve.ts (removes waterfall latency)
- Remove redundant existsSync before readFileSync in update-check.ts (catch already handles ENOENT)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-10 17:47:53 -05:00
Cole Medin
dbe559efd1 fix(web): address review findings — logging and test extraction
- Add console.error logging to silent .catch on SSE reconnect re-fetch
  (ChatInterface.tsx:~544) so production failures are visible in logs
- Extract onText setMessages reducer to chat-message-reducer.ts as a
  pure function (applyOnText) with 14 unit tests covering all 6
  segmentation rules including the new tool-call boundary (issue #1054)
- Refactor ChatInterface.onText to delegate to applyOnText

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-10 17:45:08 -05:00
Cole Medin
3e3ddf25d5 feat: inject workflow run context into orchestrator prompt (#1055)
After a workflow completes, the AI had no awareness of results when
answering follow-up questions. This adds a "Recent Workflow Results"
section to the orchestrator prompt by querying persisted workflow_result
messages from the conversation.

Changes:
- Add getRecentWorkflowResultMessages() to db/messages.ts
- Add WorkflowResultContext type and formatWorkflowContextSection() to prompt-builder.ts
- Extend buildFullPrompt() with optional workflowContext parameter
- Fetch and inject workflow context in handleMessage() before prompt building

Fixes #1055

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-10 17:34:17 -05:00
Cole Medin
4ee5232da3 fix(web): interleave tool calls with text during SSE streaming (#1054)
During SSE streaming, tool calls always appeared below all text because
onText appended to the existing message even when it already had tool
calls. The server-side persistence already segments at this boundary.
Mirror that rule in the client's onText handler: when the last streaming
message has tool calls, seal it and start a new message for incoming text.
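
The boundary rule can be written as a pure reducer. This sketch covers only the tool-call boundary described above, with a hypothetical `ChatMessage` shape and function name; it is not the full set of segmentation rules in the client.

```typescript
interface ChatMessage {
  text: string;
  toolCalls: string[];
  streaming: boolean;
}

// Append incoming text to the last streaming message, unless that message
// already has tool calls: then seal it and start a new message.
function appendStreamingText(messages: readonly ChatMessage[], text: string): ChatMessage[] {
  const last = messages[messages.length - 1];
  if (last && last.streaming && last.toolCalls.length === 0) {
    return [...messages.slice(0, -1), { ...last, text: last.text + text }];
  }
  const sealed =
    last && last.streaming
      ? [...messages.slice(0, -1), { ...last, streaming: false }]
      : [...messages];
  return [...sealed, { text, toolCalls: [], streaming: true }];
}
```

Because the function never mutates its input, it can be unit-tested without rendering the chat component.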

Fixes #1054

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-10 17:31:38 -05:00
Cole Medin
16b47d3dde fix: archon setup --spawn fails on Windows when repo path contains spaces (#1035)
The cmd.exe fallback in spawnWindowsTerminal() used shell: true, which caused
Bun/Node to flatten args into a single string without proper quoting. Paths
with spaces were split at whitespace, breaking the /D argument to start.

Changes:
- Remove shell: true from cmd.exe fallback spawn options
- Remove shell?: boolean from trySpawn options type (no callers need it)

Fixes #1035

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-10 17:29:25 -05:00
Cole Medin
536584db8f
Merge pull request #1026 from coleam00/archon/task-fix-issue-1014
feat(web): loop node iteration visibility in workflow execution view
2026-04-10 16:14:12 -05:00
Cole Medin
60ddda3a12 revert: remove incorrect remainingMessage suppression in stream mode
The suppression broke the "sends remaining message before dispatching
workflow" test — when the AI response contains both text and a command
in a single chunk, the text was never streamed, so suppressing
remainingMessage loses it entirely. The actual duplicate was in the
WorkflowLogs execution view, not the routing AI path, and is already
fixed by the onText message splitting and text content dedup.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-04-10 16:13:45 -05:00
Cole Medin
1eddf3e6aa fix(web): split workflow status messages in WorkflowLogs onText handler
WorkflowLogs' onText handler was blindly concatenating all SSE text into
a single streaming message, unlike ChatInterface which splits on workflow
status text (🚀/). This caused the "Starting workflow" text to merge
with subsequent text into one giant message, breaking text dedup against
DB messages (which are stored as separate segments). The SSE message
content never matched any single DB message exactly, so both appeared.

Add the same workflow status boundary detection from ChatInterface:
close the current streaming message and start a new one when a workflow
status message arrives or when regular text follows a status message.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-04-10 15:51:40 -05:00
Cole Medin
4e56c86dff fix: eliminate duplicate text and tool calls in workflow execution view
Three fixes for message duplication during live workflow execution:

1. dag-executor: Add missing `tool_call_formatted` category to loop iteration
   tool messages. Without this, the web adapter sent tool text as both a regular
   SSE text event AND a structured tool_call event, causing each tool to appear
   twice (raw text + rendered card). Regular DAG nodes already had this metadata.

2. WorkflowLogs: Add text content dedup in SSE/DB merge. During live execution,
   the same text (e.g. "Starting workflow...") can appear in both DB (REST fetch)
   and SSE (event buffer replay). Collect DB text into a Set and skip matching
   SSE text messages.

3. orchestrator-agent: Suppress remainingMessage re-send in stream mode. The
   routing AI streams text chunks before /invoke-workflow is detected, then
   retracts them. Without suppression, remainingMessage re-sends the same text.
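
The text dedup in item 2 could look roughly like the sketch below. The LogMessage shape and the mergeDbAndSse name are assumptions for illustration; only the idea of collecting DB text into a Set and skipping matching SSE text comes from the commit message.

```typescript
// Hypothetical message shape for illustration only.
interface LogMessage {
  id: string;
  kind: "text" | "tool_call";
  content: string;
}

// Merge DB-fetched messages with SSE messages, dropping SSE text already present in the DB.
function mergeDbAndSse(
  dbMessages: LogMessage[],
  sseMessages: LogMessage[]
): LogMessage[] {
  // Exact-match lookup of text that the REST fetch already returned.
  const dbText = new Set(
    dbMessages.filter((m) => m.kind === "text").map((m) => m.content)
  );

  // Keep every non-text SSE message, and text only when the DB has no identical copy.
  const freshSse = sseMessages.filter(
    (m) => !(m.kind === "text" && dbText.has(m.content))
  );

  return [...dbMessages, ...freshSse];
}
```

The match is exact, so it only works when SSE text is segmented the same way as the stored DB messages.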

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-04-10 15:48:40 -05:00
Cole Medin
5685b41d18 fix(cli): add cli. domain prefix to log event names
Apply review finding: rename flat log event names to use the
cli.{action}_{state} convention matching the rest of the file.

- workflow_dispatch_surface_failed → cli.workflow_dispatch_surface_failed
- workflow_output_surface_failed → cli.workflow_result_surface_failed

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-10 10:27:12 -05:00
Cole Medin
b8e367f35d simplify: reduce complexity in changed files
Deduplicate JSON branch in workflowStatusCommand by computing the output
array once with a single console.log call, removing the duplicated
verbose/non-verbose conditional branches.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-10 10:16:27 -05:00
Cole Medin
e8334b313a Merge branch 'archon/task-fix-issue-1015' into dev
Resolve merge conflict in MessageList.tsx by combining:
- PR #1025: status/duration/nodes/artifacts enrichment for WorkflowResultCard
- PR #1023: ArtifactViewerModal clickable file paths in result card content

Both features now work together — the result card shows status-aware
headers, node counts, duration, and artifact summaries while also
supporting clickable artifact file paths in the markdown content.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-10 10:16:06 -05:00