Commit graph

674 commits

Author SHA1 Message Date
Alex Siri
7ea321419f
fix: initialize options.hooks before merging YAML node hooks (#1177)
When a workflow node defines hooks (PreToolUse/PostToolUse) in YAML but
no hooks exist yet on the options object, applyNodeConfig crashes with
"undefined is not an object" because it tries to assign properties on
the undefined options.hooks.

Initialize options.hooks to {} before the merge loop.

Reproduces with: archon workflow run archon-architect (which uses
per-node hooks extensively).

Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-21 14:52:56 +03:00
Rasmus Widing
ba4b9b47e6
docs(worktree): fix stale rename example + document copyFiles properly (#1328)
Three related fixes around the `worktree.copyFiles` primitive:

1. Remove the `.env.example -> .env` rename example from
   reference/configuration.md and getting-started/overview.md. The
   `->` parser was removed in #739 (2026-03-19) because it caused
   the stale-credentials production bug in #228 — but the docs kept
   advertising it. A user writing `.env.example -> .env` today gets
   `parseCopyFileEntry` returning `{source: '.env.example -> .env',
   destination: '.env.example -> .env'}`, stat() fails with ENOENT,
   and the copy silently no-ops at debug level.

2. Replace the single-line "Default behavior: .archon/ is always
   copied" note with a proper "Worktree file copying" subsection
   that explains:
   - Why this exists (git worktree add = tracked files only; gitignored
     workflow inputs need this hook)
   - The `.archon/` default (no config needed for the common case)
   - Common entries: .env, .vscode/, .claude/, plans/, reports/,
     data fixtures
   - Semantics: source=destination, ENOENT silently skipped, per-entry
     error isolation, path-traversal rejected
   - Interaction with `worktree.path` (both layouts get the same
     treatment)

3. Update the overview example to drop the `.env.example + .env` pair
   (which implied rename semantics) in favor of `.env + plans/`, and
   call out that `.archon/` is auto-copied so users don't list it.
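The parse behavior point 1 describes can be sketched as a stand-in — `parseCopyFileEntry` is named in the message, but this body is a guess at the post-#739 semantics, not the shipped code:

```typescript
interface CopyFileEntry {
  source: string;
  destination: string;
}

// No `->` rename syntax anymore: the raw string is both sides, so a
// legacy ".env.example -> .env" entry stats a file that doesn't exist
// and the copy silently no-ops (ENOENT, logged at debug level).
function parseCopyFileEntry(entry: string): CopyFileEntry {
  const trimmed = entry.trim();
  return { source: trimmed, destination: trimmed };
}
```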

No code changes. `bun run format:check` and `bun run lint` green.
2026-04-21 12:15:37 +03:00
Lior Franko
08de8ee5c6
fix(web,server): show real platform connection status in Settings (#1061)
The Settings page's Platform Connections section hardcoded every platform
except Web to 'Not configured', so users couldn't tell whether their Slack/
Telegram/Discord/GitHub/Gitea/GitLab adapters had actually started.

- Server: /api/health now returns an activePlatforms array populated live
  as each adapter's start() resolves. Passed into registerApiRoutes so the
  reference stays mutable — Telegram starts after the HTTP listener is
  already accepting requests, so a snapshot would miss it.
- Web: SettingsPage.PlatformConnectionsSection now reads activePlatforms
  from /api/health and looks each platform up in a Set. Also adds Gitea
  and GitLab to the list (they already ship as adapters).

Closes #1031

Co-authored-by: Lior Franko <liorfr@dreamgroup.com>
2026-04-21 11:47:32 +03:00
Rasmus Widing
5ed38dc765
feat(isolation,workflows): worktree location + per-workflow isolation policy (#1310)
* feat(isolation): per-project worktree.path + collapse to two layouts

Adds an opt-in `worktree.path` to .archon/config.yaml so a repo can co-locate
worktrees with its own checkout (`<repoRoot>/<path>/<branch>`) instead of the
default `~/.archon/workspaces/<owner>/<repo>/worktrees/<branch>`. Requested in
joelsb's #1117.

Primitive changes (clean up the graveyard rather than add parallel code paths):

- Collapse worktree layouts from three to two. The old "legacy global" layout
  (`~/.archon/worktrees/<owner>/<repo>/<branch>`) is gone — every repo resolves
  to the workspace-scoped layout (`~/.archon/workspaces/<owner>/<repo>/worktrees/<branch>`),
  whether it was archon-cloned or locally registered. `extractOwnerRepo()` on
  the repo path is the stable identity fallback. Ends the divergence where
  workspace-cloned and local repos had visibly different worktree trees.

- `getWorktreeBase()` in @archon/git now returns `{ base, layout }` and accepts
  an optional `{ repoLocal }` override. The layout value replaces the old
  `isProjectScopedWorktreeBase()` classification at the call sites
  (`isProjectScopedWorktreeBase` stays exported as deprecated back-compat).

- `WorktreeCreateConfig.path` carries the validated override from repo config.
  `resolveRepoLocalOverride()` fails loudly on absolute paths, `..` escapes,
  and resolve-escape edge cases (Fail Fast — no silent default fallback when
  the config is syntactically wrong).

- `WorktreeProvider.create()` now loads repo config exactly once and threads it
  through `getWorktreePath()` + `createWorktree()`. Replaces the prior
  swallow-then-retry pattern flagged on #1117. `generateEnvId()` is gone —
  envId is assigned directly from the resolved path (the invariant was already
  documented on `destroy(envId)`).
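The two-layout return shape can be sketched like this. The real helper lives in `@archon/git`; the layout names and path segments below are assumptions drawn from the message:

```typescript
import * as path from "node:path";
import * as os from "node:os";

type WorktreeLayout = "workspace" | "repo-local";

function getWorktreeBase(
  owner: string,
  repo: string,
  opts?: { repoLocal?: string },
): { base: string; layout: WorktreeLayout } {
  if (opts?.repoLocal) {
    // Per-repo override from .archon/config.yaml `worktree.path`.
    return { base: opts.repoLocal, layout: "repo-local" };
  }
  // Default: workspace-scoped layout for every repo, archon-cloned or
  // locally registered.
  return {
    base: path.join(os.homedir(), ".archon", "workspaces", owner, repo, "worktrees"),
    layout: "workspace",
  };
}
```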

Tests (packages/git + packages/isolation):
- Update the pre-existing `getWorktreeBase` / `isProjectScopedWorktreeBase`
  suite for the new two-layout return shape and precedence.
- Add 8 tests for `worktree.path`: default fallthrough, empty/whitespace
  ignored, override wins for workspace-scoped repos, rejects absolute, rejects
  `../` escapes (three variants), accepts nested relative paths.

Docs: add `worktree.path` to the repo config reference with explicit precedence
and the `.gitignore` responsibility note.

Co-authored-by: Joel Bastos <joelsb2001@gmail.com>

* feat(workflows): per-workflow worktree.enabled policy

Introduces a declarative top-level `worktree:` block on a workflow so
authors can pin isolation behavior regardless of invocation surface. Solves
the case where read-only workflows (e.g. `repo-triage`) should always run in
the live checkout, without every CLI/web/scheduled-trigger caller having to
remember to set the right flag.

Schema (packages/workflows/src/schemas/workflow.ts + loader.ts):

- New optional `worktree.enabled: boolean` on `workflowBaseSchema`. Loader
  parses with the same warn-and-ignore discipline used for `interactive`
  and `modelReasoningEffort` — invalid shapes log and drop rather than
  killing workflow discovery.

Policy reconciliation (packages/cli/src/commands/workflow.ts):

- Three hard-error cases when YAML policy contradicts invocation flags:
  • `enabled: false` + `--branch`       (worktree required by flag, forbidden by policy)
  • `enabled: false` + `--from`         (start-point only meaningful with worktree)
  • `enabled: true`  + `--no-worktree`  (policy requires worktree, flag forbids it)
- `enabled: false` + `--no-worktree` is redundant, accepted silently.
- `--resume` ignores the pinned policy (it reuses the existing run's worktree
  even when policy would disable — avoids disturbing a paused run).
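The reconciliation matrix above can be sketched as one function. Flag and field names mirror the message; the function itself is illustrative, not the CLI code:

```typescript
interface Invocation {
  branch?: string;
  from?: string;
  noWorktree?: boolean;
  resume?: boolean;
}

function reconcileWorktreePolicy(enabled: boolean | undefined, inv: Invocation): void {
  // --resume reuses the existing run's worktree; pinned policy is ignored.
  if (inv.resume) return;
  if (enabled === false && inv.branch) {
    throw new Error("--branch requires a worktree, but policy disables it");
  }
  if (enabled === false && inv.from) {
    throw new Error("--from is only meaningful with a worktree");
  }
  if (enabled === true && inv.noWorktree) {
    throw new Error("policy requires a worktree, --no-worktree forbids it");
  }
  // enabled: false + --no-worktree is redundant: accepted silently.
}
```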

Orchestrator wiring (packages/core/src/orchestrator/orchestrator-agent.ts):

- `dispatchOrchestratorWorkflow` short-circuits `validateAndResolveIsolation`
  when `workflow.worktree?.enabled === false` and runs directly in
  `codebase.default_cwd`. Web chat/slack/telegram callers have no flag
  equivalent to `--no-worktree`, so the YAML field is their only control.
- Logged as `workflow.worktree_disabled_by_policy` for operator visibility.

First consumer (.archon/workflows/repo-triage.yaml):

- `worktree: { enabled: false }` — triage reads issues/PRs and writes gh
  labels; no code mutations, no reason to spin up a worktree per run.

Tests:

- Loader: parses `worktree.enabled: true|false`, omits block when absent.
- CLI: four new integration tests for the reconciliation matrix (skip when
  policy false, three hard-error cases, redundant `--no-worktree` accepted,
  `--no-worktree` + `enabled: true` rejected).

Docs: authoring-workflows.md gets the new top-level field in the schema
example with a comment explaining the precedence and the `enabled: true|false`
semantics.

* fix(isolation): use path.sep for repo-containment check on Windows

resolveRepoLocalOverride was hardcoding '/' as the separator in the
startsWith check, so on Windows (where `resolve()` returns backslash
paths like `D:\Users\dev\Projects\myapp`) every otherwise-valid
relative `worktree.path` was rejected with "resolves outside the repo
root". Fixed by importing `path.sep` and using it in the sentinel.

Fixes the 3 Windows CI failures in `worktree.path repo-local override`.
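The gist of the fix, sketched with a hypothetical helper name — build the containment sentinel with `path.sep` so backslash paths from `resolve()` on Windows still match:

```typescript
import * as path from "node:path";

function isInsideRepo(repoRoot: string, resolved: string): boolean {
  // Hardcoding '/' here broke on Windows, where resolve() returns
  // backslash paths like D:\Users\dev\Projects\myapp. The path.sep
  // suffix also rejects sibling-prefix tricks (/repo vs /repository).
  return resolved === repoRoot || resolved.startsWith(repoRoot + path.sep);
}
```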

---------

Co-authored-by: Joel Bastos <joelsb2001@gmail.com>
2026-04-20 21:54:10 +03:00
Rasmus Widing
7be4d0a35e
feat(paths,workflows): unify ~/.archon/{workflows,commands,scripts} + drop globalSearchPath (closes #1136) (#1315)
* feat(paths,workflows): unify ~/.archon/{workflows,commands,scripts} + drop globalSearchPath

Collapses the awkward `~/.archon/.archon/workflows/` convention to a direct
`~/.archon/workflows/` child (matching `workspaces/`, `archon.db`, etc.), adds
home-scoped commands and scripts with the same loading story, and kills the
opt-in `globalSearchPath` parameter so every call site gets home-scope for free.

Closes #1136 (supersedes @jonasvanderhaegen's tactical fix — the bug was the
primitive itself: an easy-to-forget parameter that five of six call sites on
dev dropped).

Primitive changes:

- Home paths are direct children of `~/.archon/`. New helpers in `@archon/paths`:
  `getHomeWorkflowsPath()`, `getHomeCommandsPath()`, `getHomeScriptsPath()`,
  and `getLegacyHomeWorkflowsPath()` (detection-only for migration).
- `discoverWorkflowsWithConfig(cwd, loadConfig)` reads home-scope internally.
  The old `{ globalSearchPath }` option is removed. Chat command handler, Web
  UI workflow picker, orchestrator resolve path — all inherit home-scope for
  free without maintainer patches at every new site.
- `discoverScriptsForCwd(cwd)` merges home + repo scripts (repo wins on name
  collision). dag-executor and validator use it; the hardcoded
  `resolve(cwd, '.archon', 'scripts')` single-scope path is gone.
- Command resolution is now walked-by-basename in each scope. `loadCommand`
  and `resolveCommand` walk 1 subfolder deep and match by `.md` basename, so
  `.archon/commands/triage/review.md` resolves as `review` — closes the
  latent bug where subfolder commands were listed but unresolvable.
- All three (`workflows/`, `commands/`, `scripts/`) enforce a 1-level
  subfolder cap (matches the existing `defaults/` convention). Deeper
  nesting is silently skipped.
- `WorkflowSource` gains `'global'` alongside `'bundled'` and `'project'`.
  Web UI node palette shows a dedicated "Global (~/.archon/commands/)"
  section; badges updated.
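The shared 1-level subfolder cap can be sketched as a depth-limited scan. Helper name and structure are assumptions; only the cap semantics (depth 1, deeper nesting silently skipped, missing dir tolerated) come from the message:

```typescript
import * as fs from "node:fs";
import * as path from "node:path";

const MAX_DISCOVERY_DEPTH = 1; // matches the existing defaults/ convention

function listFiles(dir: string, ext: string, depth = 0): string[] {
  let entries: fs.Dirent[];
  try {
    entries = fs.readdirSync(dir, { withFileTypes: true });
  } catch {
    return []; // a missing scope dir (e.g. no ~/.archon/scripts/) is fine
  }
  const out: string[] = [];
  for (const e of entries) {
    const full = path.join(dir, e.name);
    if (e.isDirectory()) {
      // Deeper nesting is silently skipped.
      if (depth < MAX_DISCOVERY_DEPTH) out.push(...listFiles(full, ext, depth + 1));
    } else if (e.name.endsWith(ext)) {
      out.push(full);
    }
  }
  return out;
}
```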

Migration (clean cut — no fallback read):

- First use after upgrade: if `~/.archon/.archon/workflows/` exists, Archon
  logs a one-time WARN per process with the exact `mv` command:
  `mv ~/.archon/.archon/workflows ~/.archon/workflows && rmdir ~/.archon/.archon`
  The legacy path is NOT read — users migrate manually. Rollback caveat
  noted in CHANGELOG.

Tests:

- `@archon/paths/archon-paths.test.ts`: new helper tests (default HOME,
  ARCHON_HOME override, Docker), plus regression guards for the double-`.archon/`
  path.
- `@archon/workflows/loader.test.ts`: home-scoped workflows, precedence,
  subfolder 1-depth cap, legacy-path deprecation warning fires exactly once
  per process.
- `@archon/workflows/validator.test.ts`: home-scoped commands + subfolder
  resolution.
- `@archon/workflows/script-discovery.test.ts`: depth cap + merge semantics
  (repo wins, home-missing tolerance).
- Existing CLI + orchestrator tests updated to drop `globalSearchPath`
  assertions.

E2E smoke (verified locally, before cleanup):

- `.archon/workflows/e2e-home-scope.yaml` + scratch repo at /tmp
- Home-scoped workflow discovered from an unrelated git repo
- Home-scoped script (`~/.archon/scripts/*.ts`) executes inside a script node
- 1-level subfolder workflow (`~/.archon/workflows/triage/*.yaml`) listed
- Legacy path warning fires with actionable `mv` command; workflows there
  are NOT loaded

Docs: `CLAUDE.md`, `docs-web/guides/global-workflows.md` (full rewrite for
three-type scope + subfolder convention + migration), `docs-web/reference/
configuration.md` (directory tree), `docs-web/reference/cli.md`,
`docs-web/guides/authoring-workflows.md`.

Co-authored-by: Jonas Vanderhaegen <7755555+jonasvanderhaegen@users.noreply.github.com>

* test(script-discovery): normalize path separators in mocks for Windows

The 4 new tests in `scanScriptDir depth cap` and `discoverScriptsForCwd —
merge repo + home with repo winning` compared incoming mock paths with
hardcoded forward-slash strings (`if (path === '/scripts/triage')`). On
Windows, `path.join('/scripts', 'triage')` produces `\scripts\triage`, so
those branches never matched, readdir returned `[]`, and the tests failed.

Added a `norm()` helper at module scope and wrapped the incoming `path`
argument in every `mockImplementation` before comparing. Stored paths go
through `normalizeSep()` in production code, so the existing equality
assertions on `script.path` remain OS-independent.

Fixes Windows CI job `test (windows-latest)` on PR #1315.
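The `norm()` helper described above amounts to collapsing the OS separator before comparing against hardcoded forward-slash mock paths (sketch, not the exact test code):

```typescript
import * as path from "node:path";

// On Windows path.join('/scripts', 'triage') yields '\scripts\triage';
// normalize to '/' so mock-path equality checks are OS-independent.
const norm = (p: string): string => p.split(path.sep).join("/");
```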

* address review feedback: home-scope error handling, depth cap, and tests

Critical fixes:
- api.ts: add `maxDepth: 1` to all 3 findMarkdownFilesRecursive calls in
  GET /api/commands (bundled/home/project). Without this the UI palette
  surfaced commands from deep subfolders that the executor (capped at 1)
  could not resolve — silent "command not found" at runtime.
- validator.ts: wrap home-scope findMarkdownFilesRecursive and
  resolveCommandInDir calls in try/catch so EACCES/EPERM on
  ~/.archon/commands/ doesn't crash the validator with a raw filesystem
  error. ENOENT still returns [] via the underlying helper.

Error handling fixes:
- workflow-discovery.ts: maybeWarnLegacyHomePath now sets the
  "warned-once" flag eagerly before `await access()`, so concurrent
  discovery calls (server startup with parallel codebase resolution)
  can't double-warn. Non-ENOENT probe errors (EACCES/EPERM) now log at
  WARN instead of DEBUG so permission issues on the legacy dir are
  visible in default operation.
- dag-executor.ts: wrap discoverScriptsForCwd in its own try/catch so
  an EACCES on ~/.archon/scripts/ routes through safeSendMessage /
  logNodeError with a dedicated "failed to discover scripts" message
  instead of being mis-attributed by the outer catch's
  "permission denied (check cwd permissions)" branch.

Tests:
- load-command-prompt.test.ts (new): 6 tests covering the executor's
  command resolution hot path — home-scope resolves when repo misses,
  repo shadows home, 1-level subfolder resolvable by basename, 2-level
  rejected, not-found, empty-file. Runs in its own bun test batch.
- archon-paths.test.ts: add getHomeScriptsPath describe block to match
  the existing getHomeCommandsPath / getHomeWorkflowsPath coverage.

Comment clarity:
- workflow-discovery.ts: MAX_DISCOVERY_DEPTH comment now leads with the
  actual value (1) before describing what 0 would mean.
- script-discovery.ts: copy the "routing ambiguity" rationale from
  MAX_DISCOVERY_DEPTH to MAX_SCRIPT_DISCOVERY_DEPTH.

Cleanup:
- Remove .archon/workflows/e2e-home-scope.yaml — one-off smoke test that
  would ship permanently in every project's workflow list. Equivalent
  coverage exists in loader.test.ts.

Addresses all blocking and important feedback from the multi-agent
review on PR #1315.

---------

Co-authored-by: Jonas Vanderhaegen <7755555+jonasvanderhaegen@users.noreply.github.com>
2026-04-20 21:45:32 +03:00
Rasmus Widing
cc78071ff6
fix(isolation): raise worktree git-operation timeout to 5m (#1306)
All 15 worktree git-subprocess timeouts in WorktreeProvider were hardcoded
at 30000ms. Repos with heavy post-checkout hooks (lint, dependency install,
submodule init) routinely exceed that budget and fail worktree creation.

Consolidate them onto a single GIT_OPERATION_TIMEOUT_MS constant at 5 min.
Generous enough to cover reported cases while still catching genuine hangs
(credential prompts in non-TTY, stalled fetches).

Chosen over the config-key approach in #1029 to avoid adding permanent
.archon/config.yaml surface for a problem a raised default solves cleanly.
If 5 min turns out to also be too tight for real-world use, we'll revisit.
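The consolidation can be sketched as one constant threaded through a subprocess wrapper — the wrapper below is illustrative, not the WorktreeProvider code:

```typescript
import { execFile } from "node:child_process";

// One constant replaces 15 hardcoded 30000ms literals.
const GIT_OPERATION_TIMEOUT_MS = 5 * 60 * 1000;

function runGit(args: string[], cwd: string): Promise<string> {
  return new Promise((resolve, reject) => {
    // timeout kills the subprocess if a hook or fetch genuinely hangs.
    execFile("git", args, { cwd, timeout: GIT_OPERATION_TIMEOUT_MS }, (err, stdout) =>
      err ? reject(err) : resolve(stdout),
    );
  });
}
```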

Closes #1119
Supersedes #1029

Co-authored-by: Shay Elmualem <12733941+norbinsh@users.noreply.github.com>
2026-04-20 21:45:24 +03:00
Kagura
39a05b762f
fix(db): throw on corrupt commands JSON instead of silent empty fallback (#1033)
* fix(db): throw on corrupt commands JSON instead of silent empty fallback (#967)

getCodebaseCommands() silently returned {} when the commands column
contained corrupt JSON. Callers had no way to distinguish 'no commands'
from 'unreadable data', violating fail-fast principles.

Now throws a descriptive error with the codebase ID and a recovery hint.
The error is still logged for observability before throwing.

Adds two test cases: corrupt JSON throws, valid JSON string parses.
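The fail-fast shape can be sketched like this; the function name and error text are illustrative, not the shipped db layer:

```typescript
function parseCodebaseCommands(
  codebaseId: string,
  raw: string | null,
): Record<string, string> {
  if (raw == null || raw === "") return {}; // genuinely no commands
  try {
    return JSON.parse(raw);
  } catch (err) {
    // Still logged for observability, then thrown so callers can
    // distinguish "no commands" from "unreadable data".
    console.error(`corrupt commands JSON for codebase ${codebaseId}:`, err);
    throw new Error(
      `Corrupt commands JSON for codebase ${codebaseId}; ` +
        `inspect or clear the commands column to recover`,
    );
  }
}
```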

* fix: include parse error in log for better diagnostics
2026-04-20 16:19:50 +03:00
Cole Medin
cb44b96f7b
feat(providers/pi): interactive flag binds UIContext for extensions (#1299)
* feat(providers/pi): interactive flag binds UIContext for extensions

Adds `interactive: true` opt-in to Pi provider (in `.archon/config.yaml`
under `assistants.pi`) that binds a minimal `ExtensionUIContext` stub to
each session. Without this, Pi's `ExtensionRunner.hasUI()` reports false,
causing extensions like `@plannotator/pi-extension` to silently auto-approve
every plan instead of opening their browser review UI.

Semantics: clamped to `enableExtensions: true` — no extensions loaded
means nothing would consume `hasUI`, so `interactive` alone is silently
dropped. Stub forwards `notify()` to Archon's event stream; interactive
dialogs (select/confirm/input/editor/custom) resolve to undefined/false;
TUI-only setters (widgets/headers/footers/themes) no-op. Theme access
throws with a clear diagnostic — Pi's theme singleton is coupled to its
own `Symbol.for()` registry which Archon doesn't own.

Trust boundary: only binds when the operator has explicitly enabled
both flags. Extensions gated on `ctx.hasUI` (plannotator and similar)
get a functional UI context; extensions that reach for TUI features
still fail loudly rather than rendering garbage.

Includes smoke-test workflow documenting the integration surface.
End-to-end plannotator UI rendering requires plan-mode activation
(Pi `--plan` CLI flag or `/plannotator` TUI slash command) which is
out of reach for programmatic Archon sessions — manual test only.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* fix(providers/pi): end-to-end interactive extension UI

Three fixes that together get plannotator's browser review UI to actually
render from an Archon workflow and reach the reviewer's browser.

1. Call resourceLoader.reload() when enableExtensions is true.
   createAgentSession's internal reload is gated on `!resourceLoader`, so
   caller-supplied loaders must reload themselves. Without this,
   getExtensions() returns the empty default, no ExtensionRunner is built,
   and session.extensionRunner.setFlagValue() silently no-ops.

2. Set PLANNOTATOR_REMOTE=1 in interactive mode.
   plannotator-browser.ts only calls ctx.ui.notify(url) when openBrowser()
   returns { isRemote: true }; otherwise it spawns xdg-open/start on the
   Archon server host — invisible to the user and untestable from bash
   asserts. From the workflow runner's POV every Archon execution IS
   remote; flipping the heuristic routes the URL through notify(), which
   the ExtensionUIContext stub forwards into the event stream. Respect
   explicit operator overrides.

3. notify() emits as assistant chunks, not system chunks.
   The DAG executor's system-chunk filter only forwards warnings/MCP
   prefixes, and only assistant chunks accumulate into $nodeId.output.
   Emitting as assistant makes the URL available both in the user's
   stream and in downstream bash/script nodes via output substitution.

Plus: extensionFlags config pass-through (equivalent to `pi --plan` on the
CLI) applied via ExtensionRunner.setFlagValue() BEFORE bindExtensions
fires session_start, so extensions reading flags in their startup handler
actually see them. Also bind extensions with an empty binding when
enableExtensions is on but interactive is off, so session_start still
fires for flag-driven but UI-less extensions.

Smoke test (.archon/workflows/e2e-plannotator-smoke.yaml) uses
openai-codex/gpt-5.4-mini (ChatGPT Plus OAuth compatible) and bumps
idle_timeout to 600000ms so plannotator's server survives while a human
approves in the browser.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* refactor(providers/pi): keep Archon extension-agnostic

Remove the plannotator-specific PLANNOTATOR_REMOTE=1 env var write from
the Pi provider. Archon's provider layer shouldn't know about any
specific extension's internals. Document the env var in the plannotator
smoke test instead — operators who use plannotator set it via their shell
or per-codebase env config.

Workflow smoke test updated with:
- Instructions for setting PLANNOTATOR_REMOTE=1 externally
- Simpler assertion (URL emission only) — validated in a real
  reject-revise-approve run: reviewer annotated, clicked Send Feedback,
  Pi received the feedback as a tool result, revised the plan (added
  aria-label and WCAG contrast per the annotation), resubmitted, and
  reviewer approved. Plannotator's tool result signals approval but
  doesn't return the plan text, so the bash assertion now only checks
   that the review URL reached the stream (not that plan content flowed
   into $nodeId.output — it can't).
- Known-limitation note documenting the tool-result shape so downstream
  workflow authors know to Write the plan separately if they need it.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* chore(providers/pi): keep e2e-plannotator-smoke workflow local-only

The smoke test is plannotator-specific (calls plannotator_submit_plan,
expects PLAN.md on disk, requires PLANNOTATOR_REMOTE=1) and is better
kept out of the PR while the extension-agnostic infra lands.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* style(providers/pi): trim verbose inline comments

Collapse multi-paragraph SDK explanations to 1-2 line "why" notes across
provider.ts, types.ts, ui-context-stub.ts, and event-bridge.ts. No
behavior change.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* fix(providers/pi): wire assistants.pi.env + theme-proxy identity

Two end-to-end fixes discovered while exercising the combined
plannotator + @pi-agents/loop smoke flow:

- PiProviderDefaults gains an optional `env` map; parsePiConfig picks
  it up and the provider applies it to process.env at session start
  (shell env wins, no override). Needed so extensions like plannotator
  can read PLANNOTATOR_REMOTE=1 from config.yaml without requiring a
  shell export before `archon workflow run`.

- ui-context-stub theme proxy returns identity decorators instead of
  throwing on unknown methods. Styled strings flow into no-op
  setStatus/setWidget sinks anyway, so the throw was blocking
  plannotator_submit_plan after HTTP approval with no benefit.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* fix(providers/pi): flush notify() chunks immediately in batch mode

Batch-mode adapters (CLI) accumulate assistant chunks and only flush on
node completion. That broke plannotator's review-URL flow: Pi's notify()
emitted the URL as an assistant chunk, but the user needed the URL to
POST /api/approve — which is what unblocks the node in the first place.

Adds an optional `flush` flag on assistant MessageChunks. notify() sets
it, and the DAG executor drains pending batched content before surfacing
the flushed chunk so ordering is preserved.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* docs: mention Pi alongside Claude and Codex in README + top-level docs

The AI assistants docs page already covers Pi in depth, but the README
architecture diagram + docs table, overview "Further Reading" section,
and local-deployment .env comment still listed only Claude/Codex.

Left feature-specific mentions alone where Pi genuinely lacks support
(e.g. structured output — Claude + Codex only).

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* docs: note Pi structured output (best-effort) in matrix + workflow docs

Pi gained structured output support via prompt augmentation + JSON
extraction (see packages/providers/src/community/pi/capabilities.ts).
Unlike Claude/Codex, which use SDK-enforced JSON mode, Pi appends the
schema to the prompt and parses JSON out of the result text (bare or
fenced). Updates four stale references that still said Claude/Codex-only:

- ai-assistants.md capabilities matrix
- authoring-workflows.md (YAML example + field table)
- workflow-dag.md skill reference
- CLAUDE.md DAG-format node description

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* feat(providers/pi): default extensions + interactive to on

Extensions (community packages like @plannotator/pi-extension and
user-authored ones) are a core reason users pick Pi. Defaulting
enableExtensions and interactive to false previously silenced installed
extensions with no signal, leading to "did my extension even load?"
confusion.

Opt out in .archon/config.yaml when you want the prior behavior:

  assistants:
    pi:
      enableExtensions: false   # skip extension discovery entirely
      # interactive: false       # load extensions, but no UI bridge

Docs gain a new "Extensions (on by default)" section in
getting-started/ai-assistants.md that documents the three config
surfaces (extensionFlags, env, workflow-level interactive) and uses
plannotator as a concrete walk-through example.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>
2026-04-20 07:37:40 -05:00
Cocoon-Break
45682bd2c8
fix(providers/claude): use || instead of ?? in hasExplicitTokens to handle empty-string env vars (#1028)
Closes #1027
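A minimal sketch of why `||` matters here, with hypothetical env var names (the real guard checks the provider's actual token variables):

```typescript
// `env.TOKEN ?? fallback` treats TOKEN="" as an explicit (empty) token,
// because ?? only falls through on null/undefined. `||` falls through
// on any falsy value, including the empty string.
function hasExplicitTokens(env: Record<string, string | undefined>): boolean {
  return Boolean(env.OAUTH_TOKEN || env.API_KEY);
}
```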
2026-04-20 14:15:27 +03:00
Rasmus Widing
28908f0c75
feat(paths/cli/setup): unify env load + write on three-path model (#1302, #1303) (#1304)
* feat(paths/cli/setup): unify env load + write on three-path model (#1302, #1303)

Key env handling on directory ownership rather than filename. `.archon/` (at
`~/` or `<cwd>/`) is archon-owned; anything else is the user's.

- `<repo>/.env` — stripped at boot (guard kept), never loaded, never written
- `<repo>/.archon/.env` — loaded at repo scope (wins over home), writable via
  `archon setup --scope project`
- `~/.archon/.env` — loaded at home scope, writable via `--scope home` (default)

Read side (#1302):
- New `@archon/paths/env-loader` with `loadArchonEnv(cwd)` shared by CLI and
  server entry points. Loads both archon-owned files with `override: true`;
  repo scope wins.
- Replaced `[dotenv@17.3.1] injecting env (0) from .env` (always lied about
  stripped keys) with `[archon] stripped N keys from <cwd> (...)` and
  `[archon] loaded N keys from <path>` lines, emitted only when N > 0.
  `quiet: true` passed to dotenv to silence its own output.
- `stripCwdEnv` unchanged in semantics — still the only source that deletes
  keys from `process.env`; now logs what it did.
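The load order can be sketched with a minimal stand-in. The real loader uses dotenv with `override: true` and `quiet: true`; this version parses bare `KEY=VALUE` lines directly and is illustrative only:

```typescript
import * as fs from "node:fs";

// Load home scope first, then repo scope: later loads override, so
// <repo>/.archon/.env wins over ~/.archon/.env. Returns the key count
// so the caller can emit "[archon] loaded N keys from <path>" when N > 0.
function loadEnvFile(file: string, env: Record<string, string | undefined>): number {
  let text: string;
  try {
    text = fs.readFileSync(file, "utf8");
  } catch {
    return 0; // a missing scope file is fine
  }
  let loaded = 0;
  for (const line of text.split("\n")) {
    const m = /^([A-Za-z_][A-Za-z0-9_]*)=(.*)$/.exec(line.trim());
    if (m) {
      env[m[1]] = m[2]; // override semantics: later scopes win
      loaded++;
    }
  }
  return loaded;
}
```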

Write side (#1303):
- `archon setup` never writes to `<repo>/.env`. Writing there was incoherent
  because `stripCwdEnv` deletes those keys on every run.
- New `--scope home|project` (default home) targets exactly one archon-owned
  file. New `--force` overrides the merge; backup still written.
- Merge-only by default: existing non-empty values win, user-added custom keys
  survive, `<path>.archon-backup-<ISO-ts>` written before every rewrite. Fixes
  silent PostgreSQL→SQLite downgrade and silent token loss in Add mode.
- One-time migration note emitted when `<cwd>/.env` exists at setup start.

Tests: new `env-loader.test.ts` (6), extended `strip-cwd-env.test.ts` (+4 for
the log line), extended `setup.test.ts` (+10 for scope/merge/backup/force/
repo-untouched), extended `cli.test.ts` (+5 for flag parsing).

Docs: configuration.md, cli.md, security.md, cli-internals.md, setup skill —
all updated to the three-path model.

* fix(cli/setup): address PR review — scope/path/secret-handling edge cases

- cli: resolve --scope project to git repo root so running setup from a
  subdir writes to <repo-root>/.archon/.env (what loadArchonEnv reads at
  boot), not <subdir>/.archon/.env. Fail fast with a useful message when
  --scope project is used outside a git repo.
- setup: resolveScopedEnvPath() now delegates to @archon/paths helpers
  (getArchonEnvPath / getRepoArchonEnvPath) so Docker's /.archon home,
  ARCHON_HOME overrides, and the "undefined" literal guard all behave
  identically between the loader and the writer.
- setup: wrap the writeScopedEnv call in try/catch so an fs exception
  (permission denied, read-only FS, backup copy failure) stops the clack
  spinner cleanly and emits an actionable error instead of a raw stack
  trace after the user has completed the entire wizard.
- setup: checkExistingConfig(envPath?) — scope-aware existing-config read.
  Add/Update/Fresh now reflects the actual write target, not an
  unconditional ~/.archon/.env.
- setup: serializeEnv escapes \r (was only \n) so values with bare CR or
  CRLF round-trip through dotenv.parse without corruption. Regression
  test added.
- setup: merge path treats whitespace-only existing values ('   ') as
  empty, so a copy-paste stray space doesn't silently defeat the wizard
  update for that key forever. Regression test added.
- setup: 0o600 mode on the written env file AND on backup copies —
  writeFileSync+copyFileSync default to 0o666 & ~umask, which can leave
  secrets group/world-readable on a permissive umask.
- docs/cli.md + setup skill: appendix sections that still described the
  pre-#1303 two-file symlink model now reflect the three-path model.
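The secret-safe write and the `\r` escaping can be sketched together; the serializer below is illustrative, not the setup command's code:

```typescript
import * as fs from "node:fs";

// Explicit 0o600 instead of the writeFileSync default (0o666 & ~umask),
// which can leave secrets group/world-readable on a permissive umask.
// \r is escaped alongside \n so CRLF values round-trip through
// dotenv-style parsing without corruption.
function writeEnvFile(file: string, entries: Record<string, string>): void {
  const body = Object.entries(entries)
    .map(([k, v]) => {
      const escaped = v
        .replace(/\\/g, "\\\\")
        .replace(/\n/g, "\\n")
        .replace(/\r/g, "\\r");
      return `${k}="${escaped}"`;
    })
    .join("\n");
  fs.writeFileSync(file, body + "\n", { mode: 0o600 });
}
```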

* fix(paths/env-loader): Windows-safe assertion for home-scope load line

The test asserted the log line contained `from ~/`, which is opportunistic
tilde-shortening that only happens when the tmpdir lives under `homedir()`.
On Windows CI the tmpdir is on `D:\\` while homedir is `C:\\Users\\...`, so
the path renders absolute and the `~/` never appears.

Match on the count and the archon-home tmpdir segment instead — robust on
both Unix tilde-short paths and Windows absolute paths.
2026-04-20 12:49:14 +03:00
Rasmus Widing
8ae4a56193
feat(workflows): add repo-triage — periodic maintenance via inline Haiku sub-agents (#1293)
* feat(workflows): add repo-triage — 6-node periodic maintenance workflow

Adds .archon/workflows/repo-triage.yaml: a self-contained periodic
maintenance workflow that uses inline sub-agents (Claude SDK agents:
field introduced in #1276) for map-reduce across open issues and PRs.

Six DAG nodes, three-layer topology:
- Layer 1 (parallel): triage-issues, link-prs, closed-pr-dedup-check,
  stale-nudge
- Layer 2: closed-dedup-check (reads triage-issues state)
- Layer 3: digest (synthesises all prior nodes + writes markdown)

Capabilities per node:
- triage-issues: delegates labeling to on-disk triage-agent; inline
  brief-gen Haiku for duplicate detection; 3-day auto-close clock
  for unanswered duplicate warnings
- link-prs: conservative PR ↔ issue cross-refs via inline pr-issue-
  matcher Haiku, Sonnet re-verifies fully-addresses claims before
  suggesting Closes #X; auto-nudges on low-quality PR template fill
  with first-run grandfather guard (snapshot-only, no nudge spam)
- closed-dedup-check: cross-matches open issues against recently-
  closed ones via inline closed-brief-gen Haiku; same 3-day clock
- closed-pr-dedup-check: flags open PRs duplicating recently-closed
  PRs via inline pr-brief-gen Haiku; comment-only, never closes PRs
- stale-nudge: 60-day inactivity pings (configurable); no auto-close
- digest: synthesises per-node outputs + reads state files to emit
  $ARTIFACTS_DIR/digest.md with clickable GitHub comment links

Env-gated rollout knobs:
- DRY_RUN=1 (read-only; prints [DRY] lines, no gh/state mutations)
- SKIP_PR_LINK=1, SKIP_CLOSED_DEDUP=1, SKIP_CLOSED_PR_DEDUP=1,
  SKIP_STALE_NUDGE=1
- STALE_DAYS=N (stale-nudge window; default 60)

Cross-run state under .archon/state/ (gitignored):
- triage-state.json        briefs + pendingDedupComments
- closed-dedup-state.json  closedBriefs + closedMatchComments
- closed-pr-dedup-state.json openBriefs + closedBriefs + matches
- pr-state.json            linkedPrs + commentIds + templateAdherence
- stale-nudge-state.json   nudged (with updatedAtAtNudge for re-nudge)

Every bot comment:
- @-tags the target human (reporter for issues, author for PRs)
- Tracks comment ID in state for traceability
- Is idempotent — re-runs skip existing comments

Intended use: invoke periodically (`archon workflow run repo-triage
--no-worktree`) once a scheduler lands; live state persists across
runs so previously-flagged items reconcile correctly.

.gitignore: adds .archon/state/ for cross-run memory files.

* feat(workflows/repo-triage): post digest to Slack when SLACK_WEBHOOK is set

Extends the digest node with an optional Slack-post step after the
canonical digest.md artifact is written. Uses Slack incoming webhook
(no bot token required beyond the incoming-webhook scope).

Behavior:
- SLACK_WEBHOOK unset → skipped silently with a one-line note
- DRY_RUN=1 → prints full payload, does not curl
- Otherwise → POSTs a compact (<3500 char) mrkdwn-formatted summary
  containing headline numbers, this-run comment index (clickable
  GitHub URLs), pending items, and a path reference to digest.md
- curl failure or non-ok Slack response is logged but does not fail
  the node — digest.md on disk remains authoritative
- Intermediate Slack text written to $ARTIFACTS_DIR/digest-slack.txt
  for traceability; payload JSON assembled via jq and written to
  $ARTIFACTS_DIR/slack-payload.json before curl posts it

Slack mrkdwn conversion rules baked into the prompt (no tables, link
shape <url|text>, single-asterisk bold) so Sonnet emits a variant
that renders cleanly in Slack rather than being sent raw.

The webhook URL is read from the operator's environment (Archon
auto-loads ~/.archon/.env on CLI startup — put SLACK_WEBHOOK=... there).

* fix(workflows/repo-triage): address PR #1293 review feedback

Critical (3):
- `gh issue close --reason "not planned"` (space, not underscore) — the
  CLI expects lowercase with a space; `not_planned` fails at runtime.
  Fixed in both auto-close paths (triage-issues step 8, closed-dedup-
  check step 7).
- link-prs step 7 state save was sparse `{ sha, processedAt, related,
  fullyAddresses }`, overwriting `commentIds` / `templateNudgedAt` /
  `templateAdherence`. Changed to explicit merge that spreads existing
  entry first so per-run captured fields survive.
- Corrupt-JSON state files previously treated as first-run default
  (silent `pendingDedupComments` reset → 3-day clock restarts forever).
  All five state-load sites now abort loudly on JSON.parse throw;
  ENOENT/empty continue to default-shape.
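The state-load policy above — ENOENT/empty → default shape, corrupt JSON → loud abort — roughly looks like this sketch; function and parameter names are assumptions, not the actual workflow code:

```typescript
import { readFileSync } from "node:fs";

// Hedged sketch: ENOENT and empty files are legitimate first-run states;
// a JSON.parse failure means corruption and must fail the run.
function loadState<T>(path: string, defaults: T): T {
  let raw: string;
  try {
    raw = readFileSync(path, "utf8");
  } catch (err) {
    if ((err as NodeJS.ErrnoException).code === "ENOENT") {
      return defaults; // first run: no state file yet
    }
    throw err; // permission errors etc. propagate
  }
  if (raw.trim() === "") return defaults; // empty file counts as first run
  // Deliberately NOT caught: silently resetting pendingDedupComments
  // would restart the 3-day clock forever.
  return JSON.parse(raw) as T;
}
```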

Important (7):
- Sub-agents (`brief-gen`, `closed-brief-gen`, `pr-brief-gen`,
  `pr-issue-matcher`) emit `ERROR: <reason>` on gh failures rather than
  partial/fabricated JSON. Orchestrator detects the sentinel, logs the
  failed ID + first 200 chars of raw response, tracks in a failed-list,
  and aborts the cluster/match pass if ≥50% of items failed (avoids
  acting on bad data).
- `pr-brief-gen` now sets `diffTruncated: true` when the 30k-char diff
  cap hits; link-prs verify pass downgrades any `fully-addresses` claim
  to `related` when either side's brief was truncated.
- 3-day auto-close validates `postedAt` parses as ISO-8601 before the
  elapsed-time comparison; corrupt timestamps are logged and skipped,
  never acted on.
- `gh issue close` failure path no longer drops state — sets
  `closeAttemptFailed: true` on the entry for next-run retry. Only
  drops on exit 0.
- `closed-pr-dedup-check` idempotency check (`gh pr view --json comments`)
  now aborts the post on fetch failure rather than falling through —
  prevents double-posts on gh hiccups.
- `triage-agent` label pass has preflight `test -f` check for
  `.claude/agents/triage-agent.md`; skips the pass with a clear log if
  the file is missing rather than firing Task calls that fail obscurely.
- `brief-gen` template-adherence wording flipped from "Ignore … as
  'filled'" (ambiguous, read as affirmative) to explicit "A section
  counts as MISSING when …", matching the `pr-issue-matcher` phrasing.

Minor:
- `stale-nudge` idempotency check uses substring "has been quiet for"
  instead of a prefix check that never matched (posted body starts
  with @<author>).
- `closed-dedup-check` distinguishes "upstream crashed" (missing/corrupt
  triage-state.json, or `lastRunAt == null`) from "legitimately quiet
  day" (state present, briefs empty) — different log lines.
- Slack curl adds `-w "\nHTTP_STATUS:%{http_code}"` + `2>&1` so TLS /
  4xx / 5xx errors are visible in captured output.
- `stateReason` values from `gh issue view --json stateReason` are
  UPPERCASE (`COMPLETED`, `NOT_PLANNED`); documented this and instructed
  the sub-agent to normalize to lowercase for consistency.
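The `-w "\nHTTP_STATUS:%{http_code}"` pattern works because curl appends the status line to the captured output; the non-fatal parse step can be sketched like this (the response value is simulated here, not a real webhook call):

```shell
# Simulated captured curl output: body first, then the -w suffix line.
response="ok
HTTP_STATUS:200"

# Everything after the last HTTP_STATUS: marker is the status code.
status="${response##*HTTP_STATUS:}"

if [ "$status" = "200" ]; then
  echo "slack post ok"
else
  # Logged but non-fatal: digest.md on disk remains authoritative.
  echo "slack post failed: status=$status"
fi
```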

Docs:
- CLAUDE.md repo-level `.archon/` tree now lists `state/`.
- archon-directories.md tree adds `state/` + `scripts/` (both were
  missing) with purpose descriptions.

Deferred (worth doing as a follow-up, not blocking):
- DRY/SKIP preamble duplication (~30-50 lines across 5 nodes).
- Explicit `BASELINE_IS_EMPTY` capture in link-prs (current derived
  check works but is a load-bearing model instruction).
- Digest `WARNING` prefix block when upstream nodes are missing
  outputs — today's "(output unavailable)" sub-line is functional.
- Pre-existing README workflow-count (17 → 20) and table gaps — not
  caused by this PR.
2026-04-20 11:34:38 +03:00
Fly Lee
eb730c0b82
fix(docs): prevent theme reset to dark after user switches to auto/light (#1079)
Starlight removes the `starlight-theme` localStorage key when the user
selects "auto" mode. The old init script checked that key, so every
navigation or refresh re-forced dark theme. Use a separate
`archon-theme-init` sentinel that persists across theme changes.

Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-20 10:01:27 +03:00
Cole Medin
ec5e5a5cf9
feat(providers/pi): opt-in extension discovery via config flag (#1298)
Adds `assistants.pi.enableExtensions` (default false) to `.archon/config.yaml`.
When true, Pi's `noExtensions` guard is lifted so the session loads tools and
lifecycle hooks from `~/.pi/agent/extensions/`, packages installed via
`pi install npm:<pkg>`, and the workflow's cwd `.pi/` directory — opening up
the community extension ecosystem at https://shittycodingagent.ai/packages.

Default stays suppressed to preserve the "Archon is source of truth" trust
boundary: enabling this loads arbitrary JS under the Archon server's OS
permissions, including whatever extension code the target repo happens to
ship. Operators opt in explicitly, per-host.

Skills, prompt templates, themes, and context files remain suppressed even
when extensions are enabled — only the extensions gate opens.

Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>
2026-04-19 14:35:52 -05:00
Cole Medin
fb73a500d7
feat(providers/pi): best-effort structured output via prompt engineering (#1297)
Pi's SDK has no native JSON-schema mode (unlike Claude's outputFormat /
Codex's outputSchema). Previously Pi declared structuredOutput: false
and any workflow using output_format silently degraded — the node ran,
the transcript was treated as free text, and downstream $nodeId.output.field
refs resolved to empty strings. 8 bundled/repo workflows across 10 nodes
were affected (archon-create-issue, archon-fix-github-issue,
archon-smart-pr-review, archon-workflow-builder, archon-validate-pr, etc.).

This PR closes the gap via prompt engineering + post-parse:

1. When requestOptions.outputFormat is present, the provider appends a
   "respond with ONLY a JSON object matching this schema" instruction plus
   JSON.stringify(schema) to the prompt before calling session.prompt().

2. bridgeSession accepts an optional jsonSchema param. When set, it buffers
   every assistant text_delta and — on the terminal result chunk — parses
   the buffer via tryParseStructuredOutput (trims whitespace, strips
   ```json / ``` fences, JSON.parse). On success, attaches
   structuredOutput to the result chunk (matching Claude's shape). On
   failure, emits a warn event and leaves structuredOutput undefined so
   the executor's existing dag.structured_output_missing path handles it.

3. Flipped PI_CAPABILITIES.structuredOutput to true. Unlike Claude/Codex
   this is best-effort, not SDK-enforced — reliable on GPT-5, Claude,
   Gemini 2.x, recent Qwen Coder, DeepSeek V3, less reliable on smaller
   or older models that ignore JSON-only instructions.
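A minimal sketch of the trim / strip-fence / parse sequence in (2) — the shipped tryParseStructuredOutput may handle more edge cases than shown:

```typescript
// Hedged sketch, not the Pi provider's actual implementation.
function tryParseStructuredOutput(raw: string): unknown | undefined {
  let text = raw.trim();
  // Strip a leading ```json (or bare ```) fence and its closing fence.
  const fence = text.match(/^```(?:json)?\s*\n([\s\S]*?)\n```$/);
  if (fence) text = fence[1].trim();
  try {
    return JSON.parse(text);
  } catch {
    // Caller emits a warn event; structuredOutput stays undefined so the
    // executor's structured_output_missing path takes over.
    return undefined;
  }
}
```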

Tests added (14 total):
- tryParseStructuredOutput: clean JSON, fenced, bare fences, arrays,
  whitespace, empty, prose-wrapped (fails), malformed, inner backticks
- augmentPromptForJsonSchema via provider integration: schema appended,
  prompt unchanged when absent
- End-to-end: clean JSON → structuredOutput parsed; fenced JSON parses;
  prose-wrapped → no structuredOutput + no crash; no outputFormat →
  never sets structuredOutput even if assistant happens to emit JSON

Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>
2026-04-19 10:16:02 -05:00
Cole Medin
83c119af78
fix(providers/pi): wire env injection + harden silent-failure paths (#1296)
Four defensive fixes to the Pi community provider to match the
Claude/Codex contract and eliminate silent error swallowing.

1. envInjection now actually wired (capability was declared but unused)
   Pi's SDK has no top-level `env` option on createAgentSession, so
   per-project env vars were being dropped. Routes requestOptions.env
   through a BashSpawnHook that merges caller env over the inherited
   baseline (caller wins, matching Claude/Codex semantics). When env is
   present with no allow/deny, resolvePiTools now explicitly returns Pi's
   4 default tools so the pre-constructed default bashTool is replaced
   with an env-aware one.

2. AsyncQueue no longer leaks on consumer abort. Added close() that
   drains pending waiters with { done: true } so iterate() exits instead
   of hanging forever when the producer's finally fires before the next
   push. bridgeSession calls queue.close() in its finally block.

3. buildResultChunk no longer reports silent success when agent_end fires
   with no assistant message. Now returns { isError: true, errorSubtype:
   'missing_assistant_message' } and logs a warn event so broken Pi
   sessions don't masquerade as clean completions.

4. session-resolver no longer swallows arbitrary errors from
   SessionManager.list(). Narrowed the catch to ENOENT/ENOTDIR (the only
   "session dir doesn't exist yet" signals); permission errors, parse
   failures, and other unexpected errors now propagate.
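The close() semantics in (2) can be sketched as follows — a simplified AsyncQueue, with field names assumed for illustration:

```typescript
// Hedged sketch of a single-producer async queue whose close() drains
// blocked consumers instead of leaving them hanging forever.
class AsyncQueue<T> {
  private buffer: T[] = [];
  private waiters: Array<(r: IteratorResult<T>) => void> = [];
  private closed = false;

  push(item: T): void {
    const waiter = this.waiters.shift();
    if (waiter) waiter({ value: item, done: false });
    else this.buffer.push(item);
  }

  // Resolve every pending waiter with { done: true } so a consumer
  // blocked in `for await` exits when the producer's finally fires.
  close(): void {
    this.closed = true;
    for (const waiter of this.waiters.splice(0)) {
      waiter({ value: undefined, done: true });
    }
  }

  async *[Symbol.asyncIterator](): AsyncGenerator<T> {
    while (true) {
      if (this.buffer.length > 0) {
        yield this.buffer.shift()!;
      } else if (this.closed) {
        return;
      } else {
        const result = await new Promise<IteratorResult<T>>((resolve) => {
          this.waiters.push(resolve);
        });
        if (result.done) return;
        yield result.value;
      }
    }
  }
}
```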

Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>
2026-04-19 09:20:32 -05:00
Rasmus Widing
60eeb00e42
feat(workflows): inline sub-agent definitions on DAG nodes (#1276)
* feat(workflows): inline sub-agent definitions on DAG nodes

Add `agents:` node field letting workflow YAML define Claude Agent SDK
sub-agents inline, keyed by kebab-case ID. The main agent can spawn
them via the Task tool — useful for map-reduce patterns where a cheap
model briefs items and a stronger model reduces.

Authors no longer need standalone `.claude/agents/*.md` files for
workflow-scoped helpers; the definitions live with the workflow.

Claude only. Codex and community providers without the capability
emit a capability warning and ignore the field. Merges with the
internal `dag-node-skills` wrapper when `skills:` is also set —
user-defined agents win on ID collision.

* fix(workflows): address PR #1276 review feedback

Critical:
- Re-export agentDefinitionSchema + AgentDefinition from schemas/index.ts
  (matches the "schemas/index.ts re-exports all" convention).

Important:
- Surface user-override of internal 'dag-node-skills' wrapper: warn-level
  provider log + platform message to the user when agents: redefines the
  reserved ID alongside skills:. User-wins behavior preserved (by design)
  but silent capability removal is now observable.
- Add validator test coverage for the agents-capability warning (codex
  node with agents: → warning; claude node → no warning; no-agents
  field → no warning).
- Strengthen NodeConfig.agents duplicate-type comment explaining the
  intentional circular-dep avoidance and pointing at the Zod schema as
  authoritative source. Actual extraction is follow-up work.

Simplifications:
- Drop redundant typeof check in validator (schema already enforces).
- Drop unreachable Object.keys(...).length > 0 check in dag-executor.
- Drop rot-prone "(out of v1 scope)" parenthetical.
- Drop WHAT-only comment on AGENT_ID_REGEX.
- Tighten AGENT_ID_REGEX to reject trailing/double hyphens
  (/^[a-z0-9]+(-[a-z0-9]+)*$/).
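The tightened regex quoted above accepts only kebab-case IDs; a quick illustration of what it admits and rejects:

```typescript
// Regex from the commit above; the sample IDs are illustrative.
const AGENT_ID_REGEX = /^[a-z0-9]+(-[a-z0-9]+)*$/;

console.log(AGENT_ID_REGEX.test("brief-gen"));  // true
console.log(AGENT_ID_REGEX.test("brief--gen")); // false: double hyphen
console.log(AGENT_ID_REGEX.test("brief-gen-")); // false: trailing hyphen
```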

Tests:
- parseWorkflow strips agents on script: and loop: nodes (parallel to
  the existing bash: coverage).
- provider emits warn log on dag-node-skills collision; no warn on
  non-colliding inline agents.

Docs:
- Renumber authoring-workflows Summary section (12b → 13; bump 13-19).
- Add Pi capability-table row for inline agents (Claude-only).
- Add when-to-use guidance (agents: vs .claude/agents/*.md) in the
  new "Inline sub-agents" section.
- Cross-link skills.md Related → inline-sub-agents.
- CHANGELOG [Unreleased] Added entry for #1276.
2026-04-19 09:16:01 +03:00
Cole Medin
4c6ddd994f
fix(workflows): fail loudly on SDK isError results (#1208) (#1291)
Previously, `dag-executor` only failed nodes/iterations when the SDK
returned an `error_max_budget_usd` result. Every other `isError: true`
subtype — including `error_during_execution` — was silently `break`ed
out of the stream with whatever partial output had accumulated, letting
failed runs masquerade as successful ones with empty output.

This is the most likely explanation for the "5-second crash" symptom in
#1208: iterations finish instantly with empty text, the loop keeps
going, and only the `claude.result_is_error` log tips the user off.

Changes:
- Capture the SDK's `errors: string[]` detail on result messages
  (previously discarded) and surface it through `MessageChunk.errors`.
- Log `errors`, `stopReason` alongside `errorSubtype` in
  `claude.result_is_error` so users can see what actually failed.
- Throw from both the general node path and the loop iteration path
  on any `isError: true` result, including the subtype and SDK errors
  detail in the thrown message.

Note: this does not implement auto-retry. See PR comments on #1121 and
the analysis on #1208 — a retry-with-fresh-session approach for loop
iterations is not obviously correct until we see what
`error_during_execution` actually carries in the reporter's env.
This change is the observability + fail-loud step that has to come
first so that signal is no longer silent.

Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>
2026-04-18 15:02:35 -05:00
DIY Smart Code
d89bc767d2
fix(setup): align PORT default on 3090 across .env.example, wizard, and JSDoc (#1152) (#1271)
The server's getPort() fallback changed from 3000 to 3090 in the Hono
migration (#318), but .env.example, the setup wizard's generated .env,
and the JSDoc describing the fallback were not updated — leaving three
different sources of truth for "the default PORT."

When the wizard writes PORT=3000 to ~/.archon/.env (which the Hono
server loads with override: true, while Vite only reads repo-local
.env), the two processes can land on different ports silently. That
mismatch is the real mechanism behind the failure described in #1152.

- .env.example: comment out PORT, document 3090 as the default
- packages/cli/src/commands/setup.ts: wizard no longer writes PORT=3000
  into the generated .env; fix the "Additional Options" note
- packages/cli/src/commands/setup.test.ts: assert no bare PORT= line and
  the commented default is present
- packages/core/src/utils/port-allocation.ts: fix stale JSDoc "default
  3000" -> "default 3090"
- deploy/.env.example: keep Docker default at 3000 (compose/Caddy target
  that) but annotate it so users don't copy it for local dev

Single source of truth for the local-dev default is now basePort in
port-allocation.ts.
2026-04-17 14:15:37 +02:00
Rasmus Widing
c864d8e427
refactor(providers/pi): drop rot-prone file:line refs from code comments (#1275)
Applies the CLAUDE.md comment rule ("don't embed paths/callers that rot
as the codebase evolves") flagged by the PR #1271 review to the Pi
provider's inline comments.

Three spots in the merged Pi code embed `packages/.../provider.ts:N-M`
line ranges pointing at the Claude and Codex providers. These ranges
will drift the moment those files change — the Claude auth-merge
pattern's line numbers are already off-by-a-few in some local branches.

Keep the conceptual cross-reference ("mirrors Claude's process-env +
request-env merge pattern", "matches the Codex provider's fallback
pattern for the same condition") — that's the load-bearing part of the
comment — drop the fragile line numbers and file paths.

Same treatment for the upstream Pi auth-storage.ts:424-485 reference,
which points at a specific line range in a moving dependency.

No behavior change; comment-only refactor.
2026-04-17 14:08:43 +02:00
Rasmus Widing
4e56991b72
feat(providers): add Pi community provider (@mariozechner/pi-coding-agent) (#1270)
* feat(providers): add Pi community provider (@mariozechner/pi-coding-agent)

Introduces Pi as the first community provider under the Phase 2 registry,
registered with builtIn: false. Wraps Pi's full coding-agent harness the
same way ClaudeProvider wraps @anthropic-ai/claude-agent-sdk and
CodexProvider wraps @openai/codex-sdk.

- PiProvider implements IAgentProvider; fresh AgentSession per sendQuery call
- AsyncQueue bridges Pi's callback-based session.subscribe() to Archon's
  AsyncGenerator<MessageChunk> contract
- Server-safe: AuthStorage.inMemory + SessionManager.inMemory +
  SettingsManager.inMemory + DefaultResourceLoader with all no* flags —
  no filesystem access, no cross-request state
- API key seeded per-call from options.env → process.env fallback
- Model refs: '<pi-provider-id>/<model-id>' (e.g. google/gemini-2.5-pro,
  openrouter/qwen/qwen3-coder) with syntactic compatibility check
- registerPiProvider() wired at CLI, server, and config-loader entrypoints,
  kept separate from registerBuiltinProviders() since builtIn: false is
  load-bearing for the community-provider validation story
- All 12 capability flags declared false in v1 — dag-executor warnings fire
  honestly for any unmapped nodeConfig field
- 58 new tests covering event mapping, async-queue semantics, model-ref
  parsing, defensive config parsing, registry integration

Supported Pi providers (v1): anthropic, openai, google, groq, mistral,
cerebras, xai, openrouter, huggingface. Extend PI_PROVIDER_ENV_VARS as
needed.

Out of scope (v1): session resume, MCP, hooks, skills mapping, thinking
level mapping, structured output, OAuth flows, model catalog validation.
These remain false on PI_CAPABILITIES until intentionally wired.

* feat(providers/pi): read ~/.pi/agent/auth.json for OAuth + api_key passthrough

Replaces the v1 env-var-only auth flow with AuthStorage.create(), which
reads ~/.pi/agent/auth.json. This transparently picks up credentials the
user has populated via `pi` → `/login` (OAuth subscriptions: Claude
Pro/Max, ChatGPT Plus, GitHub Copilot, Gemini CLI, Antigravity) or by
editing the file directly.

Env-var behavior preserved: when ANTHROPIC_API_KEY / GEMINI_API_KEY /
etc. is set (in process.env or per-request options.env), the adapter
calls setRuntimeApiKey which is priority #1 in Pi's resolution chain.
Auth.json entries are priority #2-#3. Pi's internal env-var fallback
remains priority #4 as a safety net.

Archon does not implement OAuth flows itself — it only rides on creds
the user created via the Pi CLI. OAuth refresh still happens inside Pi
(auth-storage.ts:369-413) under a file lock; concurrent refreshes
between the Pi CLI and Archon are race-safe by Pi's own design.

- Fail-fast error now mentions both the env-var path and `pi /login`
- 2 new tests: OAuth cred from auth.json; env var wins over auth.json
- 12 existing tests still pass (env-var-only path unchanged)

CI compatibility: no auth.json in CI, no change — env-var (secrets)
flows through Pi's getEnvApiKey fallback identically to v1.

* test(e2e): add Pi provider smoke test workflow

Mirrors e2e-claude-smoke.yaml: single prompt node + bash assert.
Targets `anthropic/claude-haiku-4-5` via `provider: pi`; works in CI
(ANTHROPIC_API_KEY secret) and locally (user's `pi /login` OAuth).

Verified locally with an Anthropic OAuth subscription — full run takes
~4s from session_started to assert PASS, exercising the async-queue
bridge and agent_end → result-chunk assembly under real Pi event timing.

Not yet wired into .github/workflows/e2e-smoke.yml — separate PR once
this lands, to keep the Pi provider PR minimal.

* feat(providers/pi): v2 — thinkingLevel, tool restrictions, systemPrompt

Extends the Pi adapter with three node-level translations, flipping the
corresponding capability flags from false → true so the dag-executor no
longer emits warnings for these fields on Pi nodes.

1. effort / thinking → Pi thinkingLevel (options-translator.ts)
   - Archon EffortLevel enum: low|medium|high|max (from
     packages/workflows/src/schemas/dag-node.ts). `max` maps to Pi's
     `xhigh` since Archon's enum lacks it.
   - Pi-native strings (minimal, xhigh, off) also accepted for
     programmatic callers bypassing the schema.
   - `off` on either field → no thinkingLevel (Pi's implicit off).
   - Claude-shape object `thinking: {type:'enabled', budget_tokens:N}`
     yields a system warning and is not applied.

2. allowed_tools / denied_tools → filtered Pi built-in tools
   - Supports all 7 Pi tools: read, bash, edit, write, grep, find, ls.
   - Case-insensitive normalization.
   - Empty `allowed_tools: []` means no tools (LLM-only), matching
     e2e-claude-smoke's idiom.
   - Unknown names (Claude-specific like `WebFetch`) collected and
     surfaced as a system warning; ignored tools don't fail the run.

3. systemPrompt (AgentRequestOptions + nodeConfig.systemPrompt)
   - Threaded through `DefaultResourceLoader({systemPrompt})`; Pi's
     default prompt is replaced entirely. Request-level wins over
     node-level.
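The effort → thinkingLevel translation in (1) can be sketched as a small mapping — the exact table and handling of the `thinking` field are assumptions for illustration:

```typescript
// Hedged sketch: Archon's EffortLevel enum is low|medium|high|max;
// Pi additionally accepts minimal/xhigh/off for programmatic callers.
type PiThinkingLevel = "minimal" | "low" | "medium" | "high" | "xhigh";

function toThinkingLevel(effort?: string): PiThinkingLevel | undefined {
  // "off" (or absence) → no thinkingLevel, which is Pi's implicit off.
  if (!effort || effort === "off") return undefined;
  // Archon's enum lacks xhigh, so max maps onto it.
  if (effort === "max") return "xhigh";
  const piNative: PiThinkingLevel[] = ["minimal", "low", "medium", "high", "xhigh"];
  return piNative.includes(effort as PiThinkingLevel)
    ? (effort as PiThinkingLevel)
    : undefined;
}
```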

Capability flag changes:
- thinkingControl: false → true
- effortControl:   false → true
- toolRestrictions: false → true

Package delta:
- +1 direct dep: @sinclair/typebox (Pi types reference it; adding as
  direct dep resolves the TS portable-type error).
- +1 test file: options-translator.test.ts (19 tests, 100% coverage).
- provider.test.ts extended with 11 new tests covering all three paths.
- registry.test.ts updated: capability assertion reflects new flags.

Live-verified: `bun run cli workflow run e2e-pi-smoke --no-worktree`
succeeds in 1.2s with thinkingLevel=low, toolCount=0. Smoke YAML updated
to use `effort: low` (schema-valid) + `allowed_tools: []` (LLM-only).

* test(e2e): add comprehensive Pi smoke covering every CI-compatible node type

Exercises every node type Archon supports under `provider: pi`, except
`approval:` (pauses for human input, incompatible with CI):
  1. prompt   — inline AI prompt
  2. command  — named command file (uses e2e-echo-command.md)
  3. loop     — bounded iterative AI prompt (max_iterations: 2)
  4. bash     — shell script with JSON output
  5. script   — bun runtime (echo-args.js)
  6. script   — uv / Python runtime (echo-py.py)

Plus DAG features on top of Pi:
  - depends_on + $nodeId.output substitution
  - when: conditional with JSON dot-access
  - trigger_rule: all_success merge
  - final assert node validates every upstream output is non-empty

Complements the minimal e2e-pi-smoke.yaml — that stays as the fast-path
smoke for connectivity checks; this one is the broader surface coverage.

Verified locally end-to-end against Anthropic OAuth (pi /login): PASS,
all 9 non-final nodes produce output, assert succeeds.

* feat(providers/pi): resolve Archon `skills:` names to Pi skill paths

Flips capabilities.skills: false → true by translating Archon's name-based
`skills:` nodeConfig (e.g. `skills: [agent-browser]`) to absolute directory
paths Pi's DefaultResourceLoader can consume via additionalSkillPaths.

Search order for each skill name (first match wins):
  1. <cwd>/.agents/skills/<name>/      — project-local, agentskills.io
  2. <cwd>/.claude/skills/<name>/      — project-local, Claude convention
  3. ~/.agents/skills/<name>/          — user-global, agentskills.io
  4. ~/.claude/skills/<name>/          — user-global, Claude convention

A directory resolves only if it contains a SKILL.md. Unresolved names are
collected and surfaced as a system-chunk warning (e.g. "Pi could not
resolve skill names: foo, bar. Searched .agents/skills and .claude/skills
(project + user-global)."), matching the semantic of "requested but not
found" without aborting the run.

Pi's buildSystemPrompt auto-appends the agentskills.io XML block for each
loaded skill, so the model sees them — no separate prompt injection needed
(Pi differs from Claude here; Claude wraps in an AgentDefinition with a
preloaded prompt, Pi uses XML block in system prompt).

Ancestor directory traversal above cwd is deliberately skipped in this
pass — matches the Pi provider's cwd-bound scope and avoids ambiguity
about which repo's skills win when Archon runs from a subdirectory.

Bun's os.homedir() bypasses the HOME env var; the resolver uses
`process.env.HOME ?? homedir()` so tests can stage a synthetic home dir.
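The four-location search order above can be sketched as follows — helper name is an assumption; a directory counts only if it contains a SKILL.md:

```typescript
import { existsSync } from "node:fs";
import { homedir } from "node:os";
import { join } from "node:path";

// Hedged sketch of the resolver: first match wins, project beats
// user-global, .agents/ (agentskills.io) beats .claude/ at each level.
function resolveSkillPath(name: string, cwd: string): string | undefined {
  const home = process.env.HOME ?? homedir(); // lets tests stage a fake home
  const candidates = [
    join(cwd, ".agents", "skills", name),
    join(cwd, ".claude", "skills", name),
    join(home, ".agents", "skills", name),
    join(home, ".claude", "skills", name),
  ];
  return candidates.find((dir) => existsSync(join(dir, "SKILL.md")));
}
```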

Tests:
- 11 new tests in options-translator.test.ts cover project/user, .agents/
  vs .claude/, project-wins-over-user, SKILL.md presence check, dedup,
  missing-name collection.
- 2 new integration tests in provider.test.ts cover the missing-skill
  warning path and the "no skills configured → no additionalSkillPaths"
  path.
- registry.test.ts updated to assert skills: true in capabilities.

Live-verified locally: `.claude/skills/archon-dev/SKILL.md` resolves,
pi.session_started log shows `skillCount: 1, missingSkillCount: 0`,
smoke workflow passes in 1.2s.

* feat(providers/pi): session resume via Pi session store

Flips capabilities.sessionResume: false → true. Pi now persists sessions
under ~/.pi/agent/sessions/<encoded-cwd>/<uuid>.jsonl by default — same
pattern Claude and Codex use for their respective stores, same blast
radius as those providers.

Flow:
  - No resumeSessionId → SessionManager.create(cwd) (fresh, persisted)
  - resumeSessionId + match in SessionManager.list(cwd) → open(path)
  - resumeSessionId + no match → fresh session + system warning
    ("⚠️ Could not resume Pi session. Starting fresh conversation.")
    Matches Codex's resume_thread_failed fallback at
    packages/providers/src/codex/provider.ts:553-558.
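The three-way flow above (create / open / fresh-with-warning) can be sketched as a pure resolver — the SessionManager shape here is an assumption for illustration:

```typescript
// Hedged sketch of the resume decision, factored as in session-resolver.ts.
interface SessionInfo {
  id: string;
  path: string;
}

function resolveSession(
  resumeId: string | undefined,
  list: () => SessionInfo[],
): { action: "create" | "open"; path?: string; resumeFailed: boolean } {
  // Empty string counts as "no resume requested".
  if (!resumeId) return { action: "create", resumeFailed: false };
  let sessions: SessionInfo[];
  try {
    sessions = list();
  } catch {
    // Session dir unreadable → graceful fallback to a fresh session.
    return { action: "create", resumeFailed: true };
  }
  const match = sessions.find((s) => s.id === resumeId);
  return match
    ? { action: "open", path: match.path, resumeFailed: false }
    : { action: "create", resumeFailed: true };
}
```

resumeFailed: true is what triggers the "⚠️ Could not resume Pi session" system warning.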

The sessionId flows back to Archon via the terminal `result` chunk —
bridgeSession annotates it with session.sessionId unconditionally so
Archon's orchestrator can persist it and pass it as resumeSessionId on
the next turn. Same mechanism used for Claude/Codex.

Cross-cwd resume (e.g. worktree switch) is deliberately not supported in
this pass: list(cwd) scans only the current cwd's session dir. A workflow
that changes cwd mid-run lands on a fresh session, which matches Pi's
mental model.

Bridge sessionId annotation uses session.sessionId, which Pi always
populates (UUID) — so no special-case for inMemory sessions is needed.

Factored the resolver into session-resolver.ts (5 unit tests):
  - no id → create
  - id + match → open
  - id + no match → create with resumeFailed: true
  - list() throws → resumeFailed: true (graceful)
  - empty-string id → treated as "no resume requested"

Integration tests in provider.test.ts add 3 cases:
  - resume-not-found yields warning + calls create
  - resume-match calls open with the file path, no warning
  - result chunk always carries sessionId

Verified live end-to-end against Anthropic OAuth:
  - first call → sessionId 019d...; model replies "noted"
  - second call with that sessionId → "resumed: true" in logs; model
    correctly recalls prior turn ("Crimson.")
  - bogus sessionId → "⚠️ Could not resume..." warning + fresh UUID

* refactor(providers,core): generalize community-provider registration

Addresses the community-pattern regression flagged in the PR #1270 review:
a second community provider should require editing only its own directory,
not seven files across providers/ + core/ + cli/ + server/.

Three changes:

1. Drop typed `pi` slot from AssistantDefaultsConfig + AssistantDefaults.
   Community providers live behind the generic `[string]` index that
   `ProviderDefaultsMap` was explicitly designed to provide. The typed
   claude/codex slots stay — they give IDE autocomplete for built-in
   config access without `as` casts, which was the whole reason the
   intersection exists. Community providers parse their own config via
   Record<string, unknown> anyway, so the typed slot added no real
   parser safety.

2. Loop-based getDefaults + mergeAssistantDefaults. No more hardcoded
   `pi: {}` spreads. getDefaults() seeds from `getRegisteredProviders()`;
   mergeAssistantDefaults clones every slot present in `base`. Adding a
   new provider requires zero edits to this function.

3. New `registerCommunityProviders()` aggregator in registry.ts.
   Entrypoints (CLI, server, config-loader) call ONE function after
   `registerBuiltinProviders()` rather than one call per community
   provider. Adding a new community provider is now a single-line edit
   to registerCommunityProviders().

This makes Pi (and future community providers) actually behave like
Phase 2 (#1195) advertised: drop the implementation under
packages/providers/src/community/<id>/, export a `register<Id>Provider`,
add one line to the aggregator.
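
The aggregator pattern described above reduces to a few lines. The registry shape and the `registerPiProvider` body here are assumptions for illustration, not Archon's actual internals.

```typescript
// Illustrative sketch of the registerCommunityProviders aggregator.
const registry = new Map<string, () => unknown>();

function registerProvider(id: string, factory: () => unknown): void {
  registry.set(id, factory); // Map.set makes re-registration idempotent
}

function registerPiProvider(): void {
  registerProvider('pi', () => ({ name: 'pi' }));
}

// Entrypoints call this ONE function after registerBuiltinProviders().
// Adding a future community provider is a single line in this body.
function registerCommunityProviders(): void {
  registerPiProvider();
}

function getRegisteredProviders(): string[] {
  return [...registry.keys()];
}
```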

Tests:
- New `registerCommunityProviders` suite (2 tests: registers pi,
  idempotent).
- config-loader.test updated: assert built-in slots explicitly rather
  than exhaustive map shape.

No functional change for Pi end-users. Purely structural.

* fix(providers/pi,core): correctness + hygiene fixes from PR #1270 review

Addresses six of the review's important findings, all within the same
PR branch:

1. envInjection: false → true
   The provider reads requestOptions.env on every call (for API-key
   passthrough). Declaring the capability false caused a spurious
   dag-executor warning for every Pi user who configured codebase env
   vars — which is the MAIN auth path. Flipping to true removes the
   false positive.

2. toSafeAssistantDefaults: denylist → allowlist
   The old shape deleted `additionalDirectories`, `settingSources`,
   `codexBinaryPath` before sending defaults to the web UI. Any future
   sensitive provider field (OAuth token, absolute path, internal
   metadata) would silently leak via the `[key: string]: unknown` index
   signature. New SAFE_ASSISTANT_FIELDS map lists exactly what to
   expose per provider; unknown providers get an empty allowlist so
   the web UI sees "provider exists" but no config details.

3. AsyncQueue single-consumer invariant
   The type was documented single-consumer but unenforced. A second
   `for await` would silently race with the first over buffer +
   waiters. Added a synchronous guard in Symbol.asyncIterator that
   throws on second call — copy-paste mistakes now fail fast with a
   clear message instead of dropping items.
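
A minimal sketch of that guard, assuming a heavily simplified queue (the real AsyncQueue also manages waiters and close()); only the second-consumer check is the point here.

```typescript
// Simplified sketch: a second `for await` throws instead of racing the first.
class AsyncQueue<T> {
  private buffer: T[] = [];
  private consuming = false;

  push(item: T): void {
    this.buffer.push(item);
  }

  [Symbol.asyncIterator](): AsyncIterator<T> {
    if (this.consuming) {
      // Fail fast and loudly instead of letting two loops race over buffer.
      throw new Error('AsyncQueue is single-consumer: second iteration detected');
    }
    this.consuming = true;
    const buffer = this.buffer;
    return {
      next(): Promise<IteratorResult<T>> {
        return Promise.resolve(
          buffer.length > 0
            ? { value: buffer.shift() as T, done: false as const }
            : { value: undefined, done: true as const },
        );
      },
    };
  }
}
```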

4. session.dispose() / session.abort() silent catches
   Both catch blocks now log at debug via a module-scoped logger so
   SDK regressions surface without polluting normal output.

5. Type scripted events as AgentSessionEvent in provider.test.ts
   Was `Record<string, unknown>` — Pi field renames would silently
   keep tests passing. Now typed against Pi's actual event union.

6. Leaked /tmp/pi-research/... path in provider.ts comment
   Local-machine path that crept in during research. Replaced with
   the upstream GitHub URL (matches convention at provider.ts:110).

Plus review-flagged simplifications:
  - Extract lookupPiModel wrapper — isolates the `as unknown as` cast
    behind one searchable name.
  - Hoist QueueItem → BridgeQueueItem at module scope (exported for
    test visibility; not used externally yet but enables unit testing
    the mapping in isolation if needed later).
  - Hoist QueueItem → BridgeQueueItem at module scope (exported for
    test visibility; not used externally yet but enables unit testing
    the mapping in isolation if needed later).
  - getRegisteredProviderNames: remove side-effecting registration
    calls. `loadConfig()` already bootstraps the registry before any
    caller can observe this helper — the hidden coupling was
    misleading.

Plus missing-coverage tests from the review (pr-test-analyzer):
  - session.prompt() rejection → error surfaces to consumer
  - pre-aborted signal → session.abort() called
  - mid-stream abort → session.abort() called
  - modelFallbackMessage → system chunk yielded
  - AsyncQueue second-consumer → throws synchronously

No behavioral changes for end users beyond the envInjection warning
fix.

* docs: Pi provider + community-provider contributor guide

Addresses the PR #1270 review's docs-impact findings: the original Pi
PR had no user-facing or contributor-facing documentation, and
architecture.md still referenced the pre-Phase-2 factory.ts pattern
(factory.ts was deleted in #1195).

1. packages/docs-web/src/content/docs/reference/architecture.md
   - Replace stale factory.ts references with the registry pattern.
   - Update inline IAgentProvider block: add getCapabilities, add
     options parameter.
   - Rewrite MessageChunk block as the actual discriminated union
     (was a placeholder with optional fields that didn't match the
     current type).
   - "Adding a New AI Agent Provider" checklist now distinguishes
     built-in (register in registerBuiltinProviders) from community
     (separate guide). Links to the new contributor guide.

2. packages/docs-web/src/content/docs/contributing/adding-a-community-provider.md (new)
   - Step-by-step guide using Pi as the reference implementation.
   - Covers: directory layout, capability discipline (start false,
     flip one at a time), provider class skeleton, registration via
     aggregator, test isolation (Bun mock.module pollution), what
     NOT to do (no edits to AssistantDefaultsConfig, no direct
     registerProvider from entrypoints, no overclaiming capabilities).

3. packages/docs-web/src/content/docs/getting-started/ai-assistants.md
   - New "Pi (Community Provider)" section: install, OAuth +
     API-key table per Pi backend, model ref format, workflow
     examples, capability matrix showing what Pi supports (session
     resume, tool restrictions, effort/thinking, skills, system
     prompt, envInjection) and what it doesn't (MCP, hooks,
     structured output, cost control, fallback model, sandbox).

4. .env.example
   - New Pi section with commented env vars for each supported
     backend (ANTHROPIC_API_KEY through HUGGINGFACE_API_KEY), each
     paired with its Pi provider id. OAuth flow (pi /login → auth.json)
     is explicitly called out — Archon reads that file too.

5. CHANGELOG.md
   - Unreleased entry for Pi, registerCommunityProviders aggregator,
     and the new contributor guide.
2026-04-17 13:52:03 +02:00
Rasmus Widing
301a139e5a
fix(core/test): split connection.test.ts from DB-test batch to avoid mock pollution (#1269)
messages.test.ts uses mock.module('./connection', ...) at module-load time.
Per CLAUDE.md:131 (Bun issue oven-sh/bun#7823), mock.module() is process-
global and irreversible. When Bun pre-loads all test files in a batch, the
mock shadows the real connection module before connection.test.ts runs,
causing getDatabaseType() to always return the mocked value regardless of
DATABASE_URL.

Move connection.test.ts into its own `bun test` invocation immediately
after postgres.test.ts (which runs alone) and before the big DB/utils/
config/state batch that contains messages.test.ts. This follows the same
isolation pattern already used for command-handler, clone, postgres, and
path-validation tests.
2026-04-17 09:33:52 +02:00
Cole Medin
bed36ca4ad
fix(workflows): add word boundary to context variable substitution regex (#1256)
* fix(workflows): add word boundary to context variable substitution regex (#1112)

Variable substitution for $CONTEXT, $EXTERNAL_CONTEXT, and $ISSUE_CONTEXT
was matching as a prefix of longer identifiers like $CONTEXT_FILE, silently
corrupting bash node scripts. Added negative lookahead (?![A-Za-z0-9_]) to
CONTEXT_VAR_PATTERN_STR so only exact variable names are substituted.

Changes:
- Add negative lookahead to CONTEXT_VAR_PATTERN_STR regex in executor-shared.ts
- Add regression test for prefix-match boundary case

Fixes #1112
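
The fix can be demonstrated in miniature. The pattern string mirrors the commit's description; the real `CONTEXT_VAR_PATTERN_STR` lives in executor-shared.ts and the surrounding substitution logic is more involved than this helper.

```typescript
// Sketch of the word-boundary fix: the negative lookahead stops $CONTEXT
// from matching inside longer identifiers like $CONTEXT_FILE.
const CONTEXT_VAR_PATTERN_STR =
  '\\$(CONTEXT|EXTERNAL_CONTEXT|ISSUE_CONTEXT)(?![A-Za-z0-9_])';

function substituteContext(script: string, values: Record<string, string>): string {
  return script.replace(
    new RegExp(CONTEXT_VAR_PATTERN_STR, 'g'),
    (match, name: string) => values[name] ?? match, // leave unknown vars untouched
  );
}
```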

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* test(workflows): add missing boundary cases for context variable substitution

Add three new test cases that complete coverage of the word-boundary fix
from #1112: $ISSUE_CONTEXT with suffix variants, $ISSUE_CONTEXT with multiple
suffixes, and contextSubstituted=false for suffix-only prompts.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-16 18:32:06 -05:00
Cole Medin
df828594d7 fix(test): normalize on-disk content to LF in bundled-defaults test
Companion to 75427c7c. The bundle-completeness test compared
BUNDLED_* strings (now LF-normalized by the generator) against raw
readFileSync output, which is CRLF on Windows checkouts. Apply the
same normalization to the on-disk side so the defense-in-depth check
stays meaningful on every platform.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-04-16 17:59:41 -05:00
Leex
9dd57b2f3c fix(web): unify Add Project URL/path classification across UI entry points
Settings → Projects Add Project only submitted { path }, so GitHub URLs
entered there failed even though the API and the Sidebar Add Project
already accepted them. Closes #1108.

Changes:
- Add packages/web/src/lib/codebase-input.ts: shared getCodebaseInput()
  helper returning a discriminated { path } | { url } union (re-exported
  from api.ts for convenience).
- Use the helper from all three Add Project entry points: Sidebar,
  Settings, and ChatPage. Removes three divergent inline heuristics.
- SettingsPage: rename addPath → addValue (state now holds either URL
  or local path) and update placeholder text.
- Tests: cover https://, git@ shorthand, ssh://, git://, whitespace,
  unix/relative/home/Windows/UNC paths.
- Docs: document the unified Add Project entry point in adapters/web.md.

Heuristic flips from "assume URL unless explicitly local" to "assume
local unless explicitly remote" — only inputs starting with https?://,
ssh://, git@, or git:// are sent as { url }; everything else is sent
as { path }. The server already resolves tilde/relative paths.
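
The heuristic flip reads naturally as a small helper. The prefix list is taken from the commit message; the real implementation is packages/web/src/lib/codebase-input.ts and may differ in detail.

```typescript
// Sketch of "assume local unless explicitly remote" classification.
type CodebaseInput = { path: string } | { url: string };

const REMOTE_PREFIX = /^(https?:\/\/|ssh:\/\/|git:\/\/|git@)/;

function getCodebaseInput(raw: string): CodebaseInput {
  const value = raw.trim(); // tolerate pasted whitespace
  return REMOTE_PREFIX.test(value) ? { url: value } : { path: value };
}
```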

Co-authored-by: Nguyen Huu Loc <lockbkbang@gmail.com>
2026-04-16 23:43:19 +02:00
Rasmus Widing
86e4c8d605
fix(bundled-defaults): auto-generate import list, emit inline strings (#1263)
* fix(bundled-defaults): auto-generate import list, emit inline strings

Root-cause fix for bundle drift (15 commands + 7 workflows previously
missing from binary distributions) and a prerequisite for packaging
@archon/workflows as a Node-loadable SDK.

The hand-maintained `bundled-defaults.ts` import list is replaced by
`scripts/generate-bundled-defaults.ts`, which walks
`.archon/{commands,workflows}/defaults/` and emits a generated source
file with inline string literals. `bundled-defaults.ts` becomes a thin
facade that re-exports the generated records and keeps the
`isBinaryBuild()` helper.

Inline strings (via JSON.stringify) replace Bun's
`import X from '...' with { type: 'text' }` attributes. The binary build
still embeds the data at compile time, but the module now loads under
Node too — removing SDK blocker #2.

- Generator: `scripts/generate-bundled-defaults.ts` (+ `--check` mode for CI)
- `package.json`: `generate:bundled`, `check:bundled`; wired into `validate`
- `build-binaries.sh`: regenerates defaults before compile
- Test: `bundle completeness` now derives expected set from on-disk files
- All 56 defaults (36 commands + 20 workflows) now in the bundle
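
The emit step described above boils down to JSON.stringify per file. This is a rough sketch with illustrative names; the real generator is scripts/generate-bundled-defaults.ts.

```typescript
// Inline each file's content as a JSON string literal so the generated
// module loads under plain Node, with no Bun-only `with { type: 'text' }`
// import attributes.
function emitBundledRecord(
  exportName: string,
  entries: Array<{ name: string; content: string }>,
): string {
  const lines = entries.map(
    (e) => `  ${JSON.stringify(e.name)}: ${JSON.stringify(e.content)},`,
  );
  return [
    '// GENERATED FILE: regenerate with `bun run generate:bundled`',
    `export const ${exportName}: Record<string, string> = {`,
    ...lines,
    '};',
    '',
  ].join('\n');
}
```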

* fix(bundled-defaults): address PR review feedback

Review: https://github.com/coleam00/Archon/pull/1263#issuecomment-4262719090

Generator:
- Guard against .yaml/.yml name collisions (previously silent overwrite)
- Add early access() check with actionable error when run from wrong cwd
- Type top-level catch as unknown; print only message for Error instances
- Drop redundant /* eslint-disable */ emission (global ignore covers it)
- Fix misleading CI-mechanism claim in header comment
- Collapse dead `if (!ext) continue` guard into a single typed pass

Scripts get real type-checking + linting:
- New scripts/tsconfig.json extending root config
- type-check now includes scripts/ via `tsc --noEmit -p scripts/tsconfig.json`
- Drop `scripts/**` from eslint ignores; add to projectService file scope

Tests:
- Inline listNames helper (Rule of Three)
- Drop redundant toBeDefined/typeof assertions; the Record<string, string>
  type plus length > 50 already cover them
- Add content-fidelity round-trip assertion (defense against generator
  content bugs, not just key-set drift)

Facade comment: drop dead reference to .claude/rules/dx-quirks.md.

CI: wire `bun run check:bundled` into .github/workflows/test.yml so the
header's CI-verification claim is truthful.

Docs: CLAUDE.md step count four→five; add contributor bullet about
`bun run generate:bundled` in the Defaults section and CONTRIBUTING.md.

* chore(e2e): bump Codex model to gpt-5.2

gpt-5.1-codex-mini is deprecated and unavailable on ChatGPT-account Codex
auth. Plain gpt-5.2 works. Verified end-to-end:

- e2e-codex-smoke: structured output returns {category:'math'}
- e2e-mixed-providers: claude+codex both return expected tokens
2026-04-16 21:27:51 +02:00
Cole Medin
d535c832e3
feat(telemetry): anonymous PostHog workflow-invocation tracking (#1262)
* feat(telemetry): add anonymous PostHog workflow-invocation tracking

Emits one `workflow_invoked` event per run with workflow name/description,
platform, and Archon version. Uses a stable random UUID persisted to
`$ARCHON_HOME/telemetry-id` for distinct-install counting, with
`$process_person_profile: false` to stay in PostHog's anonymous tier.
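
The stable-id mechanism is simple enough to sketch. The file name `telemetry-id` and the function name `getOrCreateTelemetryId` come from the commits below; the exact signature and surrounding logic here are assumptions.

```typescript
import { existsSync, mkdirSync, readFileSync, writeFileSync } from 'node:fs';
import { join } from 'node:path';
import { randomUUID } from 'node:crypto';

// One random UUID, written to $ARCHON_HOME/telemetry-id on first use
// and reused forever after, so installs can be counted without
// identifying anyone.
function getOrCreateTelemetryId(archonHome: string): string {
  mkdirSync(archonHome, { recursive: true });
  const idFile = join(archonHome, 'telemetry-id');
  if (existsSync(idFile)) return readFileSync(idFile, 'utf8').trim();
  const id = randomUUID();
  writeFileSync(idFile, id);
  return id;
}
```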

Opt out with `ARCHON_TELEMETRY_DISABLED=1` or `DO_NOT_TRACK=1`. Self-host
via `POSTHOG_API_KEY` / `POSTHOG_HOST`.

Closes #1261

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* test(telemetry): stop leaking test events to production PostHog

The `telemetry-id preservation` test exercised the real capture path with
the embedded production key, so every `bun run validate` published a
tombstone `workflow_name: "w"` event. Redirect POSTHOG_HOST to loopback
so the flush fails silently; bump test timeout to accommodate the
retry-then-give-up window.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* fix(telemetry): silence posthog-node stderr leak on network failure

The PostHog SDK's internal logFlushError() writes 'Error while flushing
PostHog' directly to stderr via console.error on any network or HTTP
error, bypassing logger config. For a fire-and-forget telemetry path
this leaked stack traces to users' terminals whenever PostHog was
unreachable (offline, firewalled, DNS broken, rate-limited).

Pass a silentFetch wrapper to the PostHog client that masks failures as
fake 200 responses. The SDK never sees an error, so it never logs.
Original failure is still recorded at debug level for diagnostics.

Side benefit: shutdown is now fast on network failure (no retry loop),
so offline CLI commands no longer hang ~10s on exit.
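
The wrapper idea reduces to a try/catch around fetch. This sketch uses a minimal response shape for clarity; the real wrapper returns an actual `Response` compatible with the posthog-node client's fetch option.

```typescript
// Mask transport failures as a fake success so the SDK's internal error
// logger never fires. Only safe for fire-and-forget telemetry where the
// response body is never inspected.
type MinimalResponse = { status: number };
type FetchLike = (url: string) => Promise<MinimalResponse>;

function makeSilentFetch(
  realFetch: FetchLike,
  debugLog: (err: unknown) => void,
): FetchLike {
  return async (url) => {
    try {
      return await realFetch(url);
    } catch (err) {
      debugLog(err); // original failure still recorded for diagnostics
      return { status: 200 }; // fake 200: SDK sees success, logs nothing
    }
  };
}
```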

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* test(telemetry): make id-preservation test deterministic

Replace the fire-and-forget capture + setTimeout + POSTHOG_HOST-loopback
dance with a direct synchronous call to getOrCreateTelemetryId(). Export
the function with an @internal marker so tests can exercise the id path
without spinning up the PostHog client. No network, no timer, no flake.

Addresses CodeRabbit feedback on #1262.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>
2026-04-16 13:45:55 -05:00
Cole Medin
367de7a625 test(ci): inject deliberate failure to verify CI red X
Injects exit 1 into e2e-deterministic bash-echo node to prove the engine
fix (failWorkflowRun on anyFailed) propagates to a non-zero CLI exit code
and a red X in GitHub Actions. Will be reverted in the next commit.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-04-16 11:40:55 -05:00
Cole Medin
7721259bdc
fix(core): surface auth errors instead of silently dropping them (#1089)
* fix: surface auth errors instead of silently dropping them (#1076)

When Claude OAuth refresh token is expired, the SDK yields a result chunk
with is_error=true and no session_id. Both handleStreamMode and
handleBatchMode guarded the result branch with `&& msg.sessionId`,
silently dropping the error. Users saw no response at all.

Changes:
- Remove sessionId guard from result branches in orchestrator-agent.ts
- Add isError early-exit that sends error message to user
- Add 4 OAuth patterns to AUTH_PATTERNS in claude.ts and codex.ts
- Add OAuth refresh-token handler to error-formatter.ts
- Add tests for new error-formatter branches

Fixes #1076

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: add structured logging to isError path and remove overly broad auth pattern

- Add getLog().warn({ conversationId, errorSubtype }, 'ai_result_error') in both
  handleStreamMode and handleBatchMode isError branches so auth failures are
  visible server-side instead of silently swallowed
- Remove 'access token' from AUTH_PATTERNS in claude.ts and codex.ts; the real
  OAuth refresh error is already covered by 'refresh token' and 'could not be
  refreshed', eliminating false-positive auth classification risk

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* fix: route isError results through classifyAndFormatError with provider-specific messages

The isError path in stream/batch mode used a hardcoded generic message,
bypassing the classifyAndFormatError infrastructure. Now constructs a
synthetic Error from errorSubtype and routes through the formatter.

Error formatter updated with provider-specific auth detection:
- Claude: OAuth token refresh, sign-in expired → guidance to run /login
- Codex: 401 retry exhaustion → guidance to run codex login
- General: tightened patterns (removed broad 'auth error' substring match)

Also persists session ID before early-returning on isError.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-16 09:36:40 -05:00
Cole Medin
818854474f
fix(workflows): stop warning about model/provider on loop nodes (#1090)
* fix(workflows): stop warning about model/provider on loop nodes (#1082)

The loader incorrectly classified loop nodes as "non-AI nodes" and warned
that model/provider fields were ignored, even though the DAG executor has
supported these fields on loop nodes since commit 594d5daa.

Changes:
- Add LOOP_NODE_AI_FIELDS constant excluding model/provider from the warn list
- Update loader to use LOOP_NODE_AI_FIELDS for loop node field checking
- Fix BASH_NODE_AI_FIELDS comment that incorrectly referenced loop nodes
- Add tests for loop node model/provider acceptance and unsupported field warnings

Fixes #1082

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix(workflows): update stale comment and add LOOP_NODE_AI_FIELDS unit tests

- Update section comment from "bash/loop nodes" to "non-AI nodes" since loop
  nodes do support model/provider (the fix in this PR)
- Export LOOP_NODE_AI_FIELDS from schemas/index.ts alongside BASH/SCRIPT variants
- Add dedicated describe block in schemas.test.ts verifying that model and
  provider are excluded and all other BASH_NODE_AI_FIELDS are still present

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* simplify: merge nodeType and aiFields into a single if/else chain in parseDagNode

Eliminates the separate isNonAiNode predicate and nested ternary for aiFields
selection by combining both into one explicit if/else block — each branch sets
nodeType and aiFields together, removing the need to check the node type twice.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-16 09:19:18 -05:00
Cole Medin
a5e5d5ceeb fix: address review findings for grammY Telegram adapter
- Fix misleading 'unde***' log when ctx.from is undefined; use 'unknown'
  to match the Slack/Discord adapter pattern
- Log post-startup bot runtime errors before reject() (no-op after
  onStart fires but errors are now visible in logs)
- Add debug log when message is dropped due to no handler registered
- Add stop() unit test to guard against grammY API rename regressions

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-16 09:15:47 -05:00
Cole Medin
da1f8b7d97 fix: replace Telegraf with grammY to fix Bun TypeError crash (#1042)
Telegraf v4's internal `redactToken()` assigns to readonly `error.message`
properties, which crashes under Bun's strict ESM mode. Telegraf is EOL.

Changes:
- Replace `telegraf` dependency with `grammy` ^1.36.0
- Migrate adapter from Telegraf API to grammY API (Bot, bot.api, bot.start)
- Use grammY's `onStart` callback pattern for async polling launch
- Preserve 409 retry logic and all existing behavior
- Update test mocks from telegraf types to grammy types

Fixes #1042

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-16 09:15:46 -05:00
Cole Medin
2732288f07
Merge pull request #1065 from coleam00/archon/task-fix-issue-1055
feat(core): inject workflow run context into orchestrator prompt
2026-04-16 07:55:00 -05:00
Cole Medin
b100cd4b48
Merge pull request #1064 from coleam00/archon/task-fix-issue-1054
fix(web): interleave tool calls with text during SSE streaming
2026-04-16 07:44:48 -05:00
Cole Medin
5acf5640c8
Merge pull request #1063 from coleam00/archon/task-fix-issue-1035
fix: archon setup --spawn fails on Windows with spaces in repo path
2026-04-16 07:36:58 -05:00
Cole Medin
68ecb75f0f
Merge pull request #1052 from coleam00/archon/task-fix-github-issue-1775831868291
fix(cli): send workflow dispatch/result messages for Web UI cards
2026-04-16 07:32:52 -05:00
Cole Medin
51b8652d43 fix: complete defensive chaining and add missing test coverage for PR #1052
- Fix half-applied optional chaining in WorkflowProgressCard refetchInterval
  (query.state.data?.run.status → ?.run?.status) preventing TypeError in polling
- Add dispatch-failure test verifying executeWorkflow still runs when
  dispatch sendMessage fails
- Add paused-workflow test proving paused guard fires before summary check
- Strengthen dispatch metadata assertion to verify workerConversationId format

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-04-16 07:32:37 -05:00
Rasmus Widing
882fc58f7c
fix: stop server startup from auto-failing in-flight workflow runs (#1216) (#1231)
* fix: stop server startup from auto-failing in-flight workflow runs (#1216)

`failOrphanedRuns()` at server startup unconditionally flipped every
`running` workflow row to `failed`, including runs actively executing in
another process (CLI / adapters). The dag-executor's between-layer
status check then bailed out of the run, exit code 1 — even though every
node had completed successfully. Same class of bug the CLI already
learned (see comment at packages/cli/src/cli.ts:256-258).

Per the new CLAUDE.md principle "No Autonomous Lifecycle Mutation Across
Process Boundaries", we don't replace the call with a timer-based
heuristic. Instead we remove it and surface running workflows to the
user with one-click actions.

Backend
- `packages/server/src/index.ts` — remove the `failOrphanedRuns()` call
  at startup. Replace with explanatory comment referencing the CLI
  precedent and the CLAUDE.md principle. The function in
  `packages/core/src/db/workflows.ts:911` is preserved for use by the
  explicit `archon workflow cleanup` command.

UI
- `packages/web/src/components/layout/TopNav.tsx` — replace the binary
  pulse dot on the Dashboard nav with a numeric count badge sourced
  from `/api/dashboard/runs` `counts.running`. Hidden when count is 0.
  Same 10s polling interval as before. No animation — a steady factual
  count is honest; a pulse would imply system judgment.

- `packages/web/src/components/dashboard/ConfirmRunActionDialog.tsx`
  (new) — shadcn AlertDialog wrapper for destructive workflow-run
  actions, mirroring the codebase-delete pattern in
  `sidebar/ProjectSelector.tsx`. Caller passes the existing button as
  `trigger` slot; dialog handles open/close via Radix.

- `packages/web/src/components/dashboard/WorkflowRunCard.tsx` — replace
  4 `window.confirm()` callsites (Reject, Abandon, Cancel, Delete) with
  ConfirmRunActionDialog. Each gets a context-appropriate description.

- `packages/web/src/components/dashboard/WorkflowHistoryTable.tsx` —
  replace 1 `window.confirm()` (Delete) with the same dialog.

CHANGELOG entries under [Unreleased]: Fixed for #1216, two Changed
entries for the nav badge and dialog upgrade.

No new tests: the web package has no React component testing
infrastructure (existing `bun test` covers `src/lib/` and `src/stores/`
only). Type-check + lint + manual UI verification + the backend
reproducer are the verification levels.

Closes #1216.

* review: address PR #1231 nits — stale doc + 3 code polish

PR review surfaced one real correctness issue in docs and three small
code polish items. None block merge; addressing for cleanliness.

- packages/docs-web/src/content/docs/guides/authoring-workflows.md:486
  removed the "auto-marked as failed on next startup" paragraph that
  described the now-deleted behavior. Replaced with a "Crashed servers /
  orphaned runs" note pointing users at `archon workflow cleanup` and
  the dashboard Cancel/Abandon buttons; explains the auto-resume
  mechanism still works once the row reaches a terminal status.

- ConfirmRunActionDialog: narrow `onConfirm` from
  `() => void | Promise<void>` to `() => void`. All five callsites are
  synchronous wrappers around React Query mutations whose error
  handling lives at the page level (`runAction` in DashboardPage). The
  union widened the API for no current caller. Documented in the JSDoc
  what to do if an awaiting caller appears later.

- TopNav: dropped the redundant `String(runningCount)` cast in the
  aria-label — template literal coerces. Also rewrote the comment above
  the `listDashboardRuns` query: the previous version implied `limit=1`
  constrained `counts.running`; in fact `counts` is a server-side
  aggregate independent of `limit`, and `limit=1` only minimises the
  `runs` array we discard.

* review: correct remediation docs — cleanup ≠ abandon

CodeRabbit caught a factual error I introduced in the doc update:
`archon workflow cleanup` calls `deleteOldWorkflowRuns(days)` which
DELETEs old terminal rows (`completed`/`failed`/`cancelled` older than
N days) for disk hygiene. It does NOT transition stuck `running` rows.

The correct remediation for a stuck `running` row is either the
dashboard's per-row Cancel/Abandon button (already documented) or
`archon workflow abandon <run-id>` from the CLI (existing subcommand,
see packages/cli/src/cli.ts:366-374).

Fixed three locations:
- packages/docs-web/.../guides/authoring-workflows.md — replaced the
  vague "clean up explicitly" with concrete Web UI / CLI instructions
  and an explicit "Not to be confused with `archon workflow cleanup`"
  callout to close off the ambiguity CodeRabbit flagged.
- packages/server/src/index.ts — comment updated to point at the
  correct remediation (`archon workflow abandon`) and clarify that
  `archon workflow cleanup` is unrelated disk-hygiene.
- CHANGELOG.md — same correction in the [Unreleased] Fixed entry.
2026-04-15 12:05:41 +03:00
Rasmus Widing
5c8c39e5c9
fix(test): update stale mocks in cleanup-service 'continues processing' test (#1230) (#1232)
After PR #1034 changed worktree existence checks from execFileAsync to
fs/promises.access, the mockExecFileAsync rejections had no effect.
removeEnvironment needs getById + getCodebase mocks to proceed past
the early-return guard, otherwise envs route to report.skipped instead
of report.removed.

Replace the two stale mockExecFileAsync rejection calls with proper
mockGetById and mockGetCodebase return values for both test environments.

Fixes #1230
2026-04-15 11:53:02 +03:00
Shane McCarron
f61d576a4d
feat(isolation): auto-init submodules in worktrees (#1189)
Worktrees created via `git worktree add` do not initialize submodules — monorepo workflows that need submodule content find empty directories. Auto-detect `.gitmodules` and run `git submodule update --init --recursive` after worktree creation; classify failures through the isolation error pipeline.

Behavior:
- `.gitmodules` absent → skip silently (zero-cost probe, no effect on non-submodule repos)
- `.gitmodules` present → run submodule init by default (opt out via `worktree.initSubmodules: false`)
- submodule init or `.gitmodules` read failure → throw with classified error including opt-out guidance
- Only `ENOENT` on `.gitmodules` is treated as "no submodules"; other access errors (EACCES, EIO) surface as failures to prevent silent empty-dir worktrees
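
The error-classification rule above can be isolated as a pure decision function. This is simplified from the real `initSubmodules()` in worktree.ts, which also runs the git command; the helper name is illustrative.

```typescript
// Only ENOENT on .gitmodules means "no submodules"; any other probe
// error must fail fast, or the worktree silently ships empty
// submodule directories.
type ProbeDecision = 'skip' | 'init' | 'fail';

function classifyGitmodulesProbe(
  err: { code?: string } | null,
  initSubmodules: boolean, // worktree.initSubmodules, default true
): ProbeDecision {
  if (err === null) return initSubmodules ? 'init' : 'skip'; // .gitmodules present
  if (err.code === 'ENOENT') return 'skip'; // genuinely no submodules
  return 'fail'; // EACCES, EIO, ...: never mask as "no submodules"
}
```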

Changes:
- `packages/isolation/src/providers/worktree.ts` — `initSubmodules()` method + call site in `createWorktree()`
- `packages/isolation/src/errors.ts` — collapsed `errorPatterns` + `knownPatterns` into single `ERROR_PATTERNS` source of truth with `known: boolean` per entry; added submodule pattern with opt-out guidance
- `packages/isolation/src/types.ts` + `packages/core/src/config/config-types.ts` — new `initSubmodules?: boolean` config option
- `packages/docs-web/src/content/docs/reference/configuration.md` — documented the new option and submodule behavior
- Tests: default-on, explicit opt-in, explicit opt-out, skip-when-absent, fail-fast on EACCES, fail-fast on git failure, fail-fast on timeout

Credit to @halindrome for the original implementation and root-cause mapping across #1183, #1187, #1188, #1192.

Follow-up: #1192 (codebase identity rearchitect) would retire the cross-clone guard code in `resolver.ts` and `worktree.ts` that #1198 and #1206 added. Separate PR.

Closes #1187
2026-04-15 09:48:18 +03:00
Kagura
73d9240eb3
fix(isolation): complete reports false success when worktree remains on disk (fixes #964) (#1034)
* fix(isolation): complete reports false success when worktree remains on disk (fixes #964)

Three changes to prevent ghost worktrees:

1. isolationCompleteCommand now checks result.worktreeRemoved — if the
   worktree was not actually removed (partial failure), it reports
   'Partial' with warnings and counts as failed, not completed.
   Previously only skippedReason was checked; a destroy that returned
   successfully but with worktreeRemoved=false would still print
   'Completed'.

2. WorktreeProvider.destroy() now runs 'git worktree prune' after
   removal to clean up stale worktree references that git may keep
   even after the directory is removed.

3. WorktreeProvider.destroy() adds post-removal verification: after
   git worktree remove, it checks 'git worktree list --porcelain' to
   confirm the worktree is actually unregistered. If still registered,
   worktreeRemoved is set back to false with a descriptive warning.

* fix: address CodeRabbit review — ghost worktree prune, partial cleanup callers, accurate messages

* test: add regression test for Partial branch in isolation complete

Exercises the !result.worktreeRemoved path (without skippedReason)
that was flagged as uncovered by CodeRabbit review.
2026-04-14 17:58:45 +03:00
Matt Chapman
28b258286f
Extra backticks for markdown block to fix formatting (#1218)
Adds extra backticks to the outer fence so nested code blocks render correctly.
2026-04-14 17:58:31 +03:00
Rasmus Widing
81859d6842
fix(providers): replace Claude SDK embed with explicit binary-path resolver (#1217)
* feat(providers): replace Claude SDK embed with explicit binary-path resolver

Drop `@anthropic-ai/claude-agent-sdk/embed` and resolve Claude Code via
CLAUDE_BIN_PATH env → assistants.claude.claudeBinaryPath config → throw
with install instructions. The embed's silent failure modes on macOS
(#1210) and Windows (#1087) become actionable errors with a documented
recovery path.

Dev mode (bun run) remains auto-resolved via node_modules. The setup
wizard auto-detects Claude Code by probing the native installer path
(~/.local/bin/claude), npm global cli.js, and PATH, then writes
CLAUDE_BIN_PATH to ~/.archon/.env. Dockerfile pre-sets CLAUDE_BIN_PATH
so extenders using the compiled binary keep working. Release workflow
gets negative and positive resolver smoke tests.

Docs, CHANGELOG, README, .env.example, CLAUDE.md, test-release and
archon skills all updated to reflect the curl-first install story.

Retires #1210, #1087, #1091 (never merged, now obsolete).
Implements #1176.

* fix(providers): only pass --no-env-file when spawning Claude via Bun/Node

`--no-env-file` is a Bun flag that prevents Bun from auto-loading
`.env` from the subprocess cwd. It is only meaningful when the Claude
Code executable is a `cli.js` file — in which case the SDK spawns it
via `bun`/`node` and the flag reaches the runtime.

When `CLAUDE_BIN_PATH` points at a native compiled Claude binary (e.g.
`~/.local/bin/claude` from the curl installer, which is Anthropic's
recommended default), the SDK executes the binary directly. Passing
`--no-env-file` then goes straight to the native binary, which
rejects it with `error: unknown option '--no-env-file'` and the
subprocess exits code 1.

Emit `executableArgs` only when the target is a `.js` file (dev mode
or explicit cli.js path). Caught by end-to-end smoke testing against
the curl-installed native Claude binary.

* docs: record env-leak validation result in provider comment

Verified end-to-end with sentinel `.env` and `.env.local` files in a
workflow CWD that the native Claude binary (curl installer) does not
auto-load `.env` files. With Archon's full spawn pathway and parent
env stripped, the subprocess saw both sentinels as UNSET. The
first-layer protection in `@archon/paths` (#1067) handles the
inheritance leak; `--no-env-file` only matters for the Bun-spawned
cli.js path, where it is still emitted.

* chore(providers): cleanup pass — exports, docs, troubleshooting

Final-sweep cleanup tied to the binary-resolver PR:

- Mirror Codex's package surface for the new Claude resolver: add
  `./claude/binary-resolver` subpath export and re-export
  `resolveClaudeBinaryPath` + `claudeFileExists` from the package
  index. Renames the previously single `fileExists` re-export to
  `codexFileExists` for symmetry; nothing outside the providers
  package was importing it.
- Add a "Claude Code not found" entry to the troubleshooting reference
  doc with platform-specific install snippets and pointers to the
  AI Assistants binary-path section.
- Reframe the example claudeBinaryPath in reference/configuration.md
  away from cli.js-only language; it accepts either the native binary
  or cli.js.

* test+refactor(providers, cli): address PR review feedback

Two test gaps and one doc nit from the PR review (#1217):

- Extract the `--no-env-file` decision into a pure exported helper
  `shouldPassNoEnvFile(cliPath)` so the native-binary branch is unit
  testable without mocking `BUNDLED_IS_BINARY` or running the full
  sendQuery pathway. Six new tests cover undefined, cli.js, native
  binary (Linux + Windows), Homebrew symlink, and suffix-only matching.
  Also adds a `claude.subprocess_env_file_flag` debug log so the
  security-adjacent decision is auditable.

- Extract the three install-location probes in setup.ts into exported
  wrappers (`probeFileExists`, `probeNpmRoot`, `probeWhichClaude`) and
  export `detectClaudeExecutablePath` itself, so the probe order can be
  spied on. Six new tests cover each tier winning, fall-through
  ordering, npm-tier skip when not installed, and the
  which-resolved-but-stale-path edge case.

- CLAUDE.md `claudeBinaryPath` placeholder updated to reflect that the
  field accepts either the native binary or cli.js (the example value
  was previously `/absolute/path/to/cli.js`, slightly misleading now
  that the curl-installer native binary is the default).
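The suffix-only decision described above can be sketched as a pure helper. This is an illustrative reconstruction based on the commit text, not the actual export from the providers package:

```typescript
// Sketch: pass Bun's --no-env-file only when the Claude entrypoint is
// a .js file that will be spawned via bun/node. Native compiled
// binaries reject the flag with "unknown option", so they must not
// receive it. Name and signature are assumptions from the commit text.
function shouldPassNoEnvFile(cliPath: string | undefined): boolean {
  if (!cliPath) return false; // no explicit path: nothing to decide
  return cliPath.toLowerCase().endsWith(".js"); // dev mode or explicit cli.js
}
```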

Skipped from the review by deliberate scope decision:

- `resolveClaudeBinaryPath` async-with-no-await: matches Codex's
  resolver signature exactly. Changing only Claude breaks symmetry;
  if pursued, do both providers in a separate cleanup PR.
- `isAbsolute()` validation in parseClaudeConfig: Codex doesn't do it
  either. Resolver throws on non-existence already.
- Atomic `.env` writes in setup wizard: pre-existing pattern this PR
  touched only adjacently. File as separate issue if needed.
- classifyError branch in dag-executor for setup errors: scope creep.
- `.env.example` "missing #" claim: false positive (verified all
  CLAUDE_BIN_PATH lines have proper comment prefixes).

* fix(test): use path.join in Windows-compatible probe-order test

The "tier 2 wins (npm cli.js)" test hardcoded forward-slash path
comparisons, but `path.join` produces backslashes on Windows. Caused
the Windows CI leg of the test suite to fail while macOS and Linux
passed. Use `path.join` for both the mock return value and the
expectation so the separator matches whatever the platform produces.
2026-04-14 17:56:37 +03:00
Rasmus Widing
33d31c44f1
fix: lock workflow runs by working_path (#1036, #1188 part 2) (#1212)
* fix: lock workflow runs by working_path (#1036, #1188 part 2)

Both bugs reduce to the same primitive: there's no enforced lock on
working_path, so two dispatches that resolve to the same filesystem
location can race. The DB row is the lock token; pending/running/paused
are "lock held"; terminal statuses release.

Changes:

- getActiveWorkflowRunByPath includes `pending` (with 5-min stale-orphan
  age window), accepts excludeId + selfStartedAt, and orders by
  (started_at ASC, id ASC) for a deterministic older-wins tiebreaker.
  Eliminates the both-abort race where two near-simultaneous dispatches
  with similar timestamps could mutually abort each other.

- Move the executor's guard call site to AFTER workflowRun is finalized
  (preCreated, resumed, or freshly created). This guarantees we always
  have self-ID + started_at to pass to the lock query.

- On guard fire after row creation: mark self as 'cancelled' so we don't
  leave a zombie pending row that would then become its own lock holder.

- New error message includes workflow name, duration, short run id, and
  three concrete next-action commands (status / cancel / different
  branch). Replaces the vague "Workflow already running".

- Resume orphan fix: when executor activates a resumable run, mark the
  orchestrator's pre-created row as 'cancelled'. Without this, every
  resume leaks a pending row that would block the user's own
  back-to-back resume until the 5-min stale window.

- New formatDuration helper for the error message (8 unit tests).
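The "DB row is the lock token" semantics above can be sketched as a predicate. Status names come from the commit; the field names and the exact stale-window check are assumptions:

```typescript
// Sketch of the path-lock semantics: pending/running/paused hold the
// lock, terminal statuses release it, and pending rows older than the
// 5-minute stale-orphan window are ignored.
const STALE_PENDING_MS = 5 * 60 * 1000;

function holdsPathLock(
  row: { status: string; startedAtMs: number },
  nowMs: number,
): boolean {
  if (row.status === "running" || row.status === "paused") return true;
  if (row.status === "pending") {
    return nowMs - row.startedAtMs < STALE_PENDING_MS; // fresh pending only
  }
  return false; // completed/failed/cancelled release the lock
}
```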

Tests:

- 5 new tests in db/workflows.test.ts: pending in active set, age window,
  excludeId exclusion, tiebreaker SQL shape, ordering.
- 5 new tests in executor.test.ts: self-id passed to query, self-cancel
  on guard fire, new message format, resume orphan cancellation,
  resume proceeds even if orphan cancel fails.
- Updated 2 executor-preamble tests for new structural behavior
  (row-then-guard, new message format).
- 8 new tests for formatDuration.

Deferred (kept scope tight):
- Worktree-layer advisory lockfile (residual #1188.2 microsecond race
  where both dispatches reach provider.create — bounded by git's own
  atomicity for `worktree add`).
- Startup cleanup of pre-existing stale pending rows (5-min age window
  makes them harmless).
- DB partial UNIQUE constraint migration (code-only is sufficient).

Fixes #1036
Fixes #1188 (part 2)

* fix: SQLite Date binding + UTC timestamp parse for path lock guard

Two issues found during E2E smoke testing:

1. bun:sqlite rejects Date objects as bindings ("Binding expected
   string, TypedArray, boolean, number, bigint or null"). Serialize
   selfStartedAt to ISO string before passing — PostgreSQL accepts
   ISO strings for TIMESTAMPTZ comparison too.

2. SQLite returns datetimes as plain strings without timezone suffix
   ("YYYY-MM-DD HH:MM:SS"), and JS new Date() parses such strings as
   local time. The blocking message was showing "running 3h" for
   workflows started seconds ago in a UTC+3 timezone.

   Added parseDbTimestamp helper that:
   - Returns Date.getTime() unchanged for Date inputs (PG path)
   - Treats SQLite-style strings as UTC by appending Z

   Used at both call sites: the lock query (selfStartedAt) and the
   blocking message duration.
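The helper's behavior as described can be sketched roughly as follows (an assumed shape, not the actual implementation in the duration module):

```typescript
// Sketch of parseDbTimestamp: Date objects pass through (PG path);
// strings with an explicit Z or +/-HH:MM offset are trusted as-is;
// SQLite-style "YYYY-MM-DD HH:MM:SS" strings are interpreted as UTC
// by normalizing to ISO form with a trailing Z.
function parseDbTimestamp(value: Date | string): number {
  if (value instanceof Date) return value.getTime(); // PG path
  if (/(?:Z|[+-]\d{2}:\d{2})$/.test(value)) return new Date(value).getTime();
  return new Date(value.replace(" ", "T") + "Z").getTime(); // SQLite → UTC
}
```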

Tests:
- 4 new tests in duration.test.ts for parseDbTimestamp covering
  Date input, SQLite UTC interpretation, explicit Z, and explicit
  +/-HH:MM offsets.
- Updated workflows.test.ts assertion for ISO serialization.

E2E smoke verified end-to-end:
- Sanity (single dispatch) succeeds.
- Two concurrent --no-worktree dispatches: one wins, one blocked
  with actionable message showing correct "Xs" duration.
- Resume + back-to-back resume both succeed (orphan correctly
  cancelled when resume activates).

* fix: address review — resume timestamp, lock-leak paths, status copy

CodeRabbit review on #1212 surfaced three real correctness gaps:

CRITICAL — resumeWorkflowRun preserved historical started_at, letting
a resumed row sort ahead of a currently-active holder in the lock
query's older-wins tiebreaker. Two active workflows could end up on
the same working_path. Fix: refresh started_at to NOW() in
resumeWorkflowRun. Original creation time is recoverable from
workflow_events history if needed for analytics.

MAJOR — lock-leak failure paths:
- If resumeWorkflowRun() throws, the orchestrator's pre-created row
  was left as 'pending' until the 5-min stale window. Fix: cancel
  preCreatedRun in the resume catch.
- If getActiveWorkflowRunByPath() throws, workflowRun (possibly
  already promoted to 'running' via resume) was left active with no
  auto-cleanup. Fix: cancel workflowRun in the guard catch.

MINOR — the blocking message always said "running" but the lock
query returns running, paused, AND fresh-pending rows. Telling a
user to "wait for it to finish" on a paused run (waiting on user
approval) would block them indefinitely. Fix: status-aware copy:
- paused: "paused waiting for user input" + approve/reject actions
- pending: "starting" verb
- running: keep current

Tests:
- New: resume refreshes started_at (asserts SQL contains
  `started_at = NOW()`)
- New: cancels preCreatedRun when resumeWorkflowRun throws
- New: cancels workflowRun when guard query throws
- New: paused message uses approve/reject actions, NOT "wait"
- New: pending message uses "starting" verb
- New: running message uses default copy
- Updated: existing tests for new error string ("already active"
  reflects status-aware semantics, not just "running")

Note: the user-facing error string changed from "already running on
this path" to "already active on this path (status)". Internal use
only — surfaced via getResult().error, not directly to users.

* fix: SQLite tiebreaker dialect bug + paired self struct + UX polish

CodeRabbit second review found one critical issue and several polish
items not addressed in 008013da.

CRITICAL — SQLite tiebreaker silently broken under default deployment.
SQLite stores started_at as TEXT "YYYY-MM-DD HH:MM:SS" (space sep).
Our ISO param is "YYYY-MM-DDTHH:MM:SS.mmmZ" (T sep). SQLite compares
text lexically: char 11 is space (0x20) in column vs T (0x54) in param,
so EVERY column value lex-sorts before EVERY ISO param. Result:
`started_at < $param` is always TRUE regardless of actual time. In
true concurrent dispatches, both sides see each other as "older" and
both abort — defeating the older-wins guarantee under SQLite, which
is the default deployment.

Fix: dialect-aware comparison in getActiveWorkflowRunByPath:
  - PostgreSQL: `started_at < $3::timestamptz` (TIMESTAMPTZ + cast)
  - SQLite: `datetime(started_at) < datetime($3)` (forces chronological
    via SQLite's date/time functions)

Documented with reproducer tests in adapters/sqlite.test.ts: lexical
returns wrong answer for "2026-04-14 12:00:00" < "2026-04-14T10:00:00Z";
datetime() returns correct answer.
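The lexical-vs-chronological mismatch is reproducible with plain string comparison, using the values from the reproducer tests:

```typescript
// SQLite compares TEXT lexically. The first 10 chars agree, then the
// column has a space (0x20) where the ISO binding has 'T' (0x54), so
// the column always sorts before the parameter regardless of time.
const column = "2026-04-14 12:00:00";     // SQLite TEXT storage format
const param = "2026-04-14T10:00:00.000Z"; // ISO-serialized binding

const lexicallyOlder = column < param; // true — the bug
const chronologicallyOlder =
  new Date(column.replace(" ", "T") + "Z").getTime() <
  new Date(param).getTime(); // false — 12:00Z is after 10:00Z
```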

Type design — collapse paired params into struct.
`excludeId` and `selfStartedAt` had to travel together (tiebreaker
references both) but were two independent optionals — future callers
could pass one without the other and silently degrade. Replaced with
a single `self?: { id: string; startedAt: Date }` to make the
paired-or-nothing invariant structural.

formatDuration(0) consistency.
Old: `if (ms <= 0) return '0s'` — special-cased 0ms despite the
"sub-second rounds up to 1s" comment. Fixed to `ms < 0` so 0ms
returns '1s' (a run that just started in the same DB second should
display as active, not literal zero).
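The fixed boundary behavior can be sketched as follows (seconds/minutes only; the real helper presumably also renders hours, given the "running 3h" message elsewhere in this PR):

```typescript
// Sketch: only strictly negative input clamps to '0s'; 0ms and any
// sub-second value round up to '1s', consistent with the comment the
// old `ms <= 0` special case contradicted.
function formatDuration(ms: number): string {
  if (ms < 0) return "0s"; // was `ms <= 0` before the fix
  const totalSeconds = Math.max(1, Math.ceil(ms / 1000)); // sub-second → 1s
  if (totalSeconds < 60) return `${totalSeconds}s`;
  const minutes = Math.floor(totalSeconds / 60);
  const seconds = totalSeconds % 60;
  return seconds === 0 ? `${minutes}m` : `${minutes}m ${seconds}s`;
}
```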

Comment fix: "We acquired the lock via createWorkflowRun" was
misleading — createWorkflowRun creates a row; the lock is determined
later by the query.

Log context: added cwd to workflow.guard_self_cancel_failed and
pendingRunId to db_active_workflow_check_failed so operators can
correlate leaked rows.

Doc fixes:
- /workflow abandon doc said "marks as failed" — actually 'cancelled'
- database.md "Prevents concurrent workflow execution" → accurate
  description of path-based lock with stale-pending tolerance

Test additions:
- 3 SQLite-direct tests in adapters/sqlite.test.ts proving the
  lexical-vs-chronological bug and the datetime() fix
- Guard self-cancel update throw still surfaces failure to user

Signature change rippled through:
- IWorkflowStore.getActiveWorkflowRunByPath now takes (path, self?)
- All internal callers updated
2026-04-14 15:19:38 +03:00
Rasmus Widing
5a4541b391
fix: route canonical path failures through blocked classification (#1211)
Follow-up to #1206 review: the early getCanonicalRepoPath() wrap in
resolve() threw directly, escaping the classification flow that
createNewEnvironment uses. Permission errors, malformed worktree
pointers, ENOENT, etc. surfaced as unclassified crashes instead of
becoming an actionable `blocked` result.

Mirror createNewEnvironment's contract:
- isKnownIsolationError → return { status: 'blocked', reason:
  'creation_failed', userMessage: classifyIsolationError(err) + suffix }
- unknown errors → throw (programming bugs stay visible as crashes,
  not silent isolation failures)
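The classify-or-throw contract mirrored here can be sketched as below. The error codes, result shape, and message text are illustrative, not the actual classifyIsolationError table:

```typescript
type ResolveResult =
  | { status: "resolved"; path: string }
  | { status: "blocked"; reason: "creation_failed"; userMessage: string };

// Codes treated as "known isolation errors" here are illustrative.
const KNOWN_ISOLATION_CODES = new Set(["EACCES", "EPERM", "ENOENT"]);

function classifyOrThrow(err: unknown): ResolveResult {
  const code = (err as { code?: string } | null)?.code;
  if (code && KNOWN_ISOLATION_CODES.has(code)) {
    return {
      status: "blocked",
      reason: "creation_failed",
      userMessage: `Could not access the repository (${code}). Check path and permissions.`,
    };
  }
  throw err; // unknown errors stay visible as crashes, not silent failures
}
```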

Adds two tests in resolver.test.ts:
- EACCES classifies to "Permission denied" blocked message
- Unknown error propagates as throw

Addresses CodeRabbit review comment on #1206.
2026-04-14 15:19:13 +03:00
Rasmus Widing
fd3f043125
fix: extend worktree ownership guard to resolver adoption paths (#1206)
* fix: extend worktree ownership guard to resolver adoption paths (#1183, #1188)

PR #1198 guarded WorktreeProvider.findExisting(), but IsolationResolver
has three earlier adoption paths that bypass the provider layer:

- findReusable (DB lookup by workflow identity)
- findLinkedIssueEnv (cross-reference via linked issues)
- tryBranchAdoption (PR branch discovery)

Two clones of the same remote share codebase_id (identity is derived
from owner/repo). Without these guards, clone B silently adopts clone
A's worktree via any of the three paths.

Changes:
- Extract verifyWorktreeOwnership from WorktreeProvider (private) to
  @archon/git/src/worktree.ts as an exported function, sitting next to
  getCanonicalRepoPath which parses the same .git file format
- Call the shared function from all three resolver paths; throw on
  cross-clone mismatch (DB rows are preserved — they legitimately
  belong to the other clone)
- Compute canonicalRepoPath once at the top of resolve()
- Six new tests in resolver.test.ts covering each guarded path's
  cross-checkout and same-clone behaviors

Fixes #1183
Fixes #1188 (part 1 — cross-checkout; part 2 parallel collision deferred
to follow-up alongside #1036)

* fix: address PR review — polish, observability, secondary gap, docs

Addresses the multi-agent review on #1206:

Code fixes:
- worktree.adoption_refused_cross_checkout log event renamed to match
  CLAUDE.md {domain}.{action}_{state} convention
- verifyWorktreeOwnership now preserves err.code and err via { cause }
  when wrapping fs errors, so classifyIsolationError is robust to Node
  message format changes
- Structured fields (codebaseId, canonicalRepoPath) added to all
  cross-clone rejection logs for incident debugging
- Wrap getCanonicalRepoPath at top of resolve() with classified error
  instead of letting it propagate as an unclassified crash
- Extract assertWorktreeOwnership helper on IsolationResolver —
  centralizes warn-then-rethrow contract, removes duplication
- Dedupe toWorktreePath(existing.working_path) calls in resolver paths
- Add code comment on findLinkedIssueEnv explaining why throw-on-first
  is intentional (user decision — surfaces anomaly instead of masking)

Secondary gap closed:
- WorktreeProvider.findExisting PR-branch adoption path
  (findWorktreeByBranch) now also verifies ownership — same class of
  bug as the main path, just via a different lookup

Tests:
- 8 new unit tests for verifyWorktreeOwnership in @archon/git
  (matching pointer, different clone, EISDIR/ENOENT errno preservation,
  submodule pointer, corrupted .git, trailing-slash normalization,
  cause chain)
- tryBranchAdoption cross-clone test now asserts store.create was
  never called (symmetry with paths 1+2 asserting updateStatus)
- New test for cross-clone rejection in the PR-branch-adoption
  secondary path in worktree.test.ts

Docs:
- CHANGELOG.md Unreleased entry for the cross-clone fix series
- troubleshooting.md "Worktree Belongs to a Different Clone" section
  documenting all four new error patterns with resolution steps and
  pointer to #1192 for the architectural fix

* fix(git): use raw .git pointer in cross-clone error message

verifyWorktreeOwnership previously called path.resolve() on the gitdir
path before embedding it in the error message. On Windows, resolve()
prepends a drive letter to a POSIX-style path (e.g., /other/clone →
C:\other\clone), which:

1. Misled users by showing a path that doesn't match what's actually
   in their .git file
2. Broke a Windows-only test asserting the error contains the literal
   /other/clone path

Compare on resolved paths (correct — normalizes trailing slashes and
relative components for the equality check) but display the raw match
in the error message (recognizable to the user).
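The pointer-file handling can be sketched with pure helpers. Names are hypothetical; the real verifyWorktreeOwnership also stats the file and classifies errno:

```typescript
// A linked worktree's .git is a FILE, not a directory, containing a
// single "gitdir: <path>" line that points into the parent clone's
// .git/worktrees/<name>. Keep the raw pointer for error messages;
// derive the parent root for the resolved-path equality check.
function parseWorktreeGitdir(pointerFileContents: string): string | null {
  const match = pointerFileContents.match(/^gitdir:\s*(.+)$/m);
  return match ? match[1].trim() : null;
}

function parentCloneRoot(gitdir: string): string {
  // "/clone-a/.git/worktrees/feature-x" -> "/clone-a"
  return gitdir.replace(/[\\/]\.git[\\/]worktrees[\\/][^\\/]+$/, "");
}
```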
2026-04-14 12:10:19 +03:00
Rasmus Widing
af9ed84157
fix: prevent worktree isolation bypass via prompt and git-level adoption (#1198)
* fix: prevent worktree isolation bypass via prompt and git-level adoption (#1193, #1188)

Three fixes for workflows operating on wrong branches:

- archon-implement prompt: replace ambiguous branch table with decision
  tree that trusts the worktree isolation system, uses $BASE_BRANCH
  explicitly, and instructs AI to never switch branches
- WorktreeProvider.findExisting: verify worktree's parent repo matches
  the request before adopting, preventing cross-clone adoption
- WorktreeProvider.createNewBranch: reset stale orphan branches to the
  intended start-point instead of silently inheriting old commits

Fixes #1193
Relates to #1188

* fix: address PR review — strict worktree verification, align sibling prompts

Address CodeRabbit + self-review findings on #1198:

Code fixes:
- findExisting now throws on cross-checkout or unverifiable state instead of
  returning null, avoiding a confusing cascade through createNewBranch
- verifyWorktreeOwnership handles .git errors precisely: ENOENT/EACCES/EIO
  throw a fail-fast error; EISDIR (full checkout at path) throws a clear
  "not a worktree" error; unmatched gitdir (submodule, malformed) throws
- Path comparison uses resolve() to normalize trailing slashes
- Added classifyIsolationError patterns so new errors produce actionable
  user messages

Test fixes:
- mockClear readFile/rm in afterEach
- New tests: cross-checkout throws, EISDIR throws, EACCES throws,
  submodule pointer throws, trailing-slash normalization, branch -f
  reset failure propagates without retry
- Updated existing tests that relied on permissive adoption to provide
  valid matching gitdir

Prompt fixes (sweep of all default commands):
- archon-implement.md: clarify "never switch branches" applies to worktree
  context; non-worktree branch creation still allowed
- archon-fix-issue.md + archon-implement-issue.md: aligned decision tree
  with archon-implement pattern; use $BASE_BRANCH instead of MAIN/MASTER
- archon-plan-setup.md: converted table to ordered decision tree with
  IN WORKTREE? first; removed ambiguous "already on correct feature
  branch" row
2026-04-14 09:44:12 +03:00
Rasmus Widing
d6e24f5075
feat: Phase 2 — community-friendly provider registry system (#1195)
* feat: replace hardcoded provider factory with typed registry system

Replace the built-in-only factory switch with a typed ProviderRegistration
registry where entries carry metadata (displayName, capabilities,
isModelCompatible) alongside the factory function. This enables community
providers to register without modifying core code.

- Add ProviderRegistration and ProviderInfo types to contract layer
- Create registry.ts with register/get/list/clear API, delete factory.ts
- Bootstrap registerBuiltinProviders() at server and CLI entrypoints
- Widen provider unions from 'claude' | 'codex' to string across schemas,
  config types, deps, executors, and API validation
- Replace hardcoded model-validation with registry-driven isModelCompatible
  and inferProviderFromModel (built-in only inference)
- Add GET /api/providers endpoint returning registry metadata
- Dynamic provider dropdowns in Web UI (BuilderToolbar, NodeInspector,
  WorkflowBuilder, SettingsPage) via useProviders hook
- Dynamic provider selection in CLI setup command
- Registry test suite covering full lifecycle
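A minimal sketch of the registry surface described above. The `ProviderRegistration` fields beyond id/factory are assumptions loosely based on the commit text (displayName, capabilities, isModelCompatible):

```typescript
interface ProviderRegistration {
  id: string;
  displayName: string;
  builtIn: boolean;
  isModelCompatible?: (model: string) => boolean;
  factory: (config?: unknown) => unknown;
}

const providers = new Map<string, ProviderRegistration>();

function registerProvider(reg: ProviderRegistration): void {
  providers.set(reg.id, reg); // community providers call this too
}

function getProvider(id: string): ProviderRegistration {
  const entry = providers.get(id);
  if (!entry) throw new Error(`Unknown provider: ${id}`); // fail fast
  return entry;
}

function listProviders(): ProviderRegistration[] {
  return [...providers.values()]; // metadata for GET /api/providers
}
```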

* feat: generalize assistant config and tighten registry validation

- Add ProviderDefaults/ProviderDefaultsMap generic types to contract layer
- Add index signatures to ClaudeProviderDefaults/CodexProviderDefaults
- Introduce AssistantDefaults/AssistantDefaultsConfig intersection types
  that combine ProviderDefaultsMap with typed built-in entries
- Replace hardcoded claude/codex config merging with generic
  mergeAssistantDefaults() that iterates all provider entries
- Replace hardcoded toSafeConfig projection with generic
  toSafeAssistantDefaults() that strips server-internal fields
- Validate provider strings at all config-entry surfaces: env override,
  global config, repo config all throw on unknown providers
- Validate provider on PATCH /api/config/assistants (400 on unknown)
- Move validator.ts from hardcoded Codex checks to capability-driven
  warnings using registry getProviderCapabilities()
- Remove resolveProvider() default to 'claude' — returns undefined when
  no provider is set, skipping capability warnings for unresolved nodes
- Widen config API schemas to generic Record<string, ProviderDefaults>
- Rewrite SettingsPage to iterate providers dynamically with built-in
  specific UI for Claude/Codex and generic JSON view for community
- Extract bootstrap to provider-bootstrap modules in CLI and server
- Remove all as Record<...> casts from dag-executor, executor,
  orchestrator — clean indexing via ProviderDefaultsMap intersection

* fix: remove remaining hardcoded provider assumptions and regenerate types

- Replace hardcoded 'claude' defaults in CLI setup with registry lookup
  (getRegisteredProviders().find(p => p.builtIn)?.id)
- Replace hardcoded 'claude' default in clone.ts folder detection with
  registry-driven fallback
- Update config YAML comment from "claude or codex" to "registered provider"
- Make bootstrap test assertions use toContain instead of exact toEqual
  so they don't break when community providers are registered
- Widen validator.test.ts helper from 'claude' | 'codex' to string
- Remove unnecessary type casts in NodeInspector, WorkflowBuilder,
  SettingsPage now that generated types use string
- Regenerate api.generated.d.ts from updated OpenAPI spec — all provider
  fields are now string instead of 'claude' | 'codex' union

* fix: address PR review findings — consistency, tests, docs

Critical fixes:
- isModelCompatible now throws on unknown providers (fail-fast parity
  with getProviderCapabilities) instead of silently returning true
- Schema provider fields use z.string().trim().min(1) to reject
  whitespace-only values
- validator.ts resolveProvider accepts defaultProvider param so
  capability warnings fire for config-inherited providers
- PATCH /api/config/assistants validates assistants keys against
  registry (rejects unknown provider IDs in the map)

YAGNI cleanup:
- Delete provider-bootstrap.ts wrappers in CLI and server — call
  registerBuiltinProviders() directly
- Remove no-op .map(provider => provider) in SettingsPage

Test coverage:
- Add GET /api/providers endpoint tests (shape, projection, capabilities)
- Add config-loader throw-path tests for unknown providers in env var,
  global config, and repo config
- Add isModelCompatible throw test for unknown providers

Docs:
- CLAUDE.md: factory.ts → registry.ts in directory tree, add
  GET /api/providers to API endpoints section
- .env.example: update DEFAULT_AI_ASSISTANT comment
- docs-web configuration reference: update provider constraint docs

UI:
- Settings default-assistant dropdown uses allProviderEntries fallback
  (no longer silently empty on API failure)
- clearRegistry marked @internal in JSDoc

* fix: use registry defaults in getDefaults/registerProject, document type design

- getDefaults() initializes assistant defaults from registered providers
  instead of hardcoding { claude: {}, codex: {} }
- getDefaults() uses first registered built-in as default assistant
  instead of hardcoding 'claude'
- handleRegisterProject uses config.assistant instead of hardcoded 'claude'
  for new codebase ai_assistant_type
- Document AssistantDefaults/AssistantDefaultsConfig intersection types:
  built-in keys are typed for parseClaudeConfig/parseCodexConfig type
  safety; community providers use the generic [string] index
- Document WorkflowConfig.assistants intersection type with same rationale

* docs: update stale provider references to reflect registry system

- architecture.md: DB schema comment now says 'registered provider'
- first-workflow.md: provider field accepts any registered provider
- quick-reference.md: provider type changed from enum to string
- authoring-workflows.md: provider type changed from enum to string
- title-generator.ts: @param doc updated from 'claude or codex' to
  generic provider identifier

* docs: fix remaining stale provider references in quick-reference and authoring guide

- quick-reference.md: per-node provider type changed from enum to string
- quick-reference.md: model mismatch guidance updated for registry pattern
- authoring-workflows.md: provider comment says 'any registered provider'
2026-04-13 21:27:11 +03:00
Rasmus Widing
b5c5f81c8a
refactor: extract provider metadata seam for Phase 2 registry readiness (#1185)
* refactor: extract provider metadata seam for Phase 2 registry readiness

- Add static capability constants (capabilities.ts) for Claude and Codex
- Export getProviderCapabilities() from @archon/providers for capability
  queries without provider instantiation
- Add inferProviderFromModel() to model-validation.ts, replacing three
  copy-pasted inline inference blocks in executor.ts and dag-executor.ts
- Replace throwaway provider instantiation in dag-executor with static
  capability lookup (getProviderCapabilities)
- Add orchestrator warning when env vars are configured but provider
  doesn't support envInjection

* refactor: address LOW findings from code review

- Remove CLAUDE_CAPABILITIES/CODEX_CAPABILITIES from public index (YAGNI —
  callers should use getProviderCapabilities(), not raw constants)
- Remove dead _deps parameter from resolveNodeProviderAndModel and its
  two call-sites (no longer needed after static capability lookup refactor)
- Update factory.ts module JSDoc to mention both exported functions
- Add edge-case tests for getProviderCapabilities: empty string and
  case-sensitive throws (parity with existing getAgentProvider tests)
- Add test for inferProviderFromModel with empty string (returns default,
  documenting the falsy-string shortcut)
2026-04-13 16:10:48 +03:00
Rasmus Widing
bf20063e5a
feat: propagate managed execution env to all workflow surfaces (#1161)
* Implement managed execution env propagation

* Address managed env review feedback
2026-04-13 15:21:57 +03:00
Rasmus Widing
a8ac3f057b
security: prevent target repo .env from leaking into subprocesses (#1135)
Remove the entire env-leak scanning/consent infrastructure: scanner,
allow_env_keys DB column usage, allow_target_repo_keys config, PATCH
consent route, --allow-env-keys CLI flag, and UI consent toggle.

The env-leak gate was the wrong primitive. Target repo .env protection
is already structural:
- stripCwdEnv() at boot removes Bun-auto-loaded CWD .env keys
- Archon loads its own env sources afterward (~/.archon/.env)
- process.env is clean before any subprocess spawns
- Managed env injection (config.yaml env: + DB vars) is unchanged

No scanning, no consent, no blocking. Any repo can be registered and
used. Subprocesses receive the already-clean process.env.
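The structural protection can be sketched with pure helpers. The real stripCwdEnv handles more .env syntax; this simplification ignores quoting edge cases:

```typescript
// Parse the key names a CWD .env file would have auto-loaded, then
// delete exactly those keys from the environment before any
// subprocess spawns. Comments and malformed lines are skipped.
function cwdEnvKeys(dotEnvContents: string): string[] {
  const keys: string[] = [];
  for (const line of dotEnvContents.split("\n")) {
    const m = line.match(/^\s*(?:export\s+)?([A-Za-z_][A-Za-z0-9_]*)\s*=/);
    if (m) keys.push(m[1]);
  }
  return keys;
}

function stripCwdEnv(
  dotEnvContents: string,
  env: Record<string, string | undefined>,
): void {
  for (const key of cwdEnvKeys(dotEnvContents)) delete env[key];
}
```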
2026-04-13 13:46:24 +03:00