When a workflow node defines hooks (PreToolUse/PostToolUse) in YAML but
no hooks exist yet on the options object, applyNodeConfig crashes with
"undefined is not an object" because it tries to assign properties on
the undefined options.hooks.
Initialize options.hooks to {} before the merge loop.
Reproduces with: archon workflow run archon-architect (which uses
per-node hooks extensively).
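
A minimal sketch of the guard, assuming simplified shapes. `NodeOptions`, `HookMap`, and `mergeNodeHooks` are hypothetical stand-ins for illustration, not the actual `applyNodeConfig` code:

```typescript
// Hypothetical, simplified shapes; the real types in the codebase may differ.
type HookEntry = { matcher?: string; command: string };
type HookMap = Record<string, HookEntry[]>;
interface NodeOptions {
  hooks?: HookMap;
}

function mergeNodeHooks(options: NodeOptions, nodeHooks: HookMap): NodeOptions {
  // Initialize before the merge loop so the assignments below never
  // touch an undefined options.hooks.
  if (options.hooks === undefined) {
    options.hooks = {};
  }
  const hooks = options.hooks;
  for (const [event, entries] of Object.entries(nodeHooks)) {
    // Append node-level hooks to whatever is already registered for the event.
    hooks[event] = [...(hooks[event] ?? []), ...entries];
  }
  return options;
}
```

Whether node hooks should append to or replace existing entries is an assumption here; the fix itself is only the initialization.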
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Three related fixes around the `worktree.copyFiles` primitive:
1. Remove the `.env.example -> .env` rename example from
reference/configuration.md and getting-started/overview.md. The
`->` parser was removed in #739 (2026-03-19) because it caused
the stale-credentials production bug in #228 — but the docs kept
advertising it. A user writing `.env.example -> .env` today gets
`parseCopyFileEntry` returning `{source: '.env.example -> .env',
destination: '.env.example -> .env'}`, stat() fails with ENOENT,
and the copy silently no-ops at debug level.
2. Replace the single-line "Default behavior: .archon/ is always
copied" note with a proper "Worktree file copying" subsection
that explains:
- Why this exists (git worktree add = tracked files only; gitignored
workflow inputs need this hook)
- The `.archon/` default (no config needed for the common case)
- Common entries: .env, .vscode/, .claude/, plans/, reports/,
data fixtures
- Semantics: source=destination, ENOENT silently skipped, per-entry
error isolation, path-traversal rejected
- Interaction with `worktree.path` (both layouts get the same
treatment)
3. Update the overview example to drop the `.env.example + .env` pair
(which implied rename semantics) in favor of `.env + plans/`, and
call out that `.archon/` is auto-copied so users don't list it.
No code changes. `bun run format:check` and `bun run lint` green.
The Settings page's Platform Connections section hardcoded every platform
except Web to 'Not configured', so users couldn't tell whether their Slack/
Telegram/Discord/GitHub/Gitea/GitLab adapters had actually started.
- Server: /api/health now returns an activePlatforms array populated live
as each adapter's start() resolves. Passed into registerApiRoutes so the
reference stays mutable — Telegram starts after the HTTP listener is
already accepting requests, so a snapshot would miss it.
- Web: SettingsPage.PlatformConnectionsSection now reads activePlatforms
from /api/health and looks each platform up in a Set. Also adds Gitea
and GitLab to the list (they already ship as adapters).
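
Why the live reference matters, as a rough sketch. `createHealthHandler` and the response shape are hypothetical; only the shared-reference idea comes from the description above:

```typescript
// Hypothetical names; the real route registration takes more arguments.
type HealthResponse = { status: "ok"; activePlatforms: string[] };

function createHealthHandler(activePlatforms: Set<string>): () => HealthResponse {
  // Keep the Set itself, not a copy, so adapters that start later are visible.
  return () => ({ status: "ok", activePlatforms: [...activePlatforms].sort() });
}

const active = new Set<string>(["web"]);
const health = createHealthHandler(active);

const before = health().activePlatforms; // ["web"]
active.add("telegram"); // an adapter's start() resolving after the listener is up
const after = health().activePlatforms; // ["telegram", "web"]
```

Copying the set at registration time (`new Set(activePlatforms)`) would freeze the response at `["web"]`.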
Closes #1031
Co-authored-by: Lior Franko <liorfr@dreamgroup.com>
* feat(isolation): per-project worktree.path + collapse to two layouts
Adds an opt-in `worktree.path` to .archon/config.yaml so a repo can co-locate
worktrees with its own checkout (`<repoRoot>/<path>/<branch>`) instead of the
default `~/.archon/workspaces/<owner>/<repo>/worktrees/<branch>`. Requested in
joelsb's #1117.
Primitive changes (clean up the graveyard rather than add parallel code paths):
- Collapse worktree layouts from three to two. The old "legacy global" layout
(`~/.archon/worktrees/<owner>/<repo>/<branch>`) is gone — every repo resolves
to the workspace-scoped layout (`~/.archon/workspaces/<owner>/<repo>/worktrees/<branch>`),
whether it was archon-cloned or locally registered. `extractOwnerRepo()` on
the repo path is the stable identity fallback. Ends the divergence where
workspace-cloned and local repos had visibly different worktree trees.
- `getWorktreeBase()` in @archon/git now returns `{ base, layout }` and accepts
an optional `{ repoLocal }` override. The layout value replaces the old
`isProjectScopedWorktreeBase()` classification at the call sites
(`isProjectScopedWorktreeBase` stays exported as deprecated back-compat).
- `WorktreeCreateConfig.path` carries the validated override from repo config.
`resolveRepoLocalOverride()` fails loudly on absolute paths, `..` escapes,
and resolve-escape edge cases (Fail Fast — no silent default fallback when
the config is syntactically wrong).
- `WorktreeProvider.create()` now loads repo config exactly once and threads it
through `getWorktreePath()` + `createWorktree()`. Replaces the prior
swallow-then-retry pattern flagged on #1117. `generateEnvId()` is gone —
envId is assigned directly from the resolved path (the invariant was already
documented on `destroy(envId)`).
Tests (packages/git + packages/isolation):
- Update the pre-existing `getWorktreeBase` / `isProjectScopedWorktreeBase`
suite for the new two-layout return shape and precedence.
- Add 8 tests for `worktree.path`: default fallthrough, empty/whitespace
ignored, override wins for workspace-scoped repos, rejects absolute, rejects
`../` escapes (three variants), accepts nested relative paths.
Docs: add `worktree.path` to the repo config reference with explicit precedence
and the `.gitignore` responsibility note.
Co-authored-by: Joel Bastos <joelsb2001@gmail.com>
* feat(workflows): per-workflow worktree.enabled policy
Introduces a declarative top-level `worktree:` block on a workflow so
authors can pin isolation behavior regardless of invocation surface. Solves
the case where read-only workflows (e.g. `repo-triage`) should always run in
the live checkout, without every CLI/web/scheduled-trigger caller having to
remember to set the right flag.
Schema (packages/workflows/src/schemas/workflow.ts + loader.ts):
- New optional `worktree.enabled: boolean` on `workflowBaseSchema`. Loader
parses with the same warn-and-ignore discipline used for `interactive`
and `modelReasoningEffort` — invalid shapes log and drop rather than
killing workflow discovery.
Policy reconciliation (packages/cli/src/commands/workflow.ts):
- Three hard-error cases when YAML policy contradicts invocation flags:
• `enabled: false` + `--branch` (worktree required by flag, forbidden by policy)
• `enabled: false` + `--from` (start-point only meaningful with worktree)
• `enabled: true` + `--no-worktree` (policy requires worktree, flag forbids it)
- `enabled: false` + `--no-worktree` is redundant, accepted silently.
- `--resume` ignores the pinned policy (it reuses the existing run's worktree
even when policy would disable — avoids disturbing a paused run).
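
The matrix above, sketched as a pure function. All names are hypothetical, and the no-policy branch is an assumption about the pre-existing default rather than something stated here:

```typescript
// Hypothetical types; the real CLI code works on its own flag objects.
interface InvocationFlags {
  branch?: string;
  from?: string;
  noWorktree?: boolean;
  resume?: boolean;
}

type Reconciliation =
  | { ok: true; worktree: "create" | "skip" | "reuse" }
  | { ok: false; error: string };

function reconcileWorktreePolicy(
  policyEnabled: boolean | undefined,
  flags: InvocationFlags,
): Reconciliation {
  // --resume reuses the existing run's worktree and ignores the pinned policy.
  if (flags.resume) return { ok: true, worktree: "reuse" };

  if (policyEnabled === false) {
    if (flags.branch !== undefined) {
      return { ok: false, error: "--branch requires a worktree, but the workflow disables it" };
    }
    if (flags.from !== undefined) {
      return { ok: false, error: "--from is only meaningful with a worktree" };
    }
    // --no-worktree is redundant here and accepted silently.
    return { ok: true, worktree: "skip" };
  }

  if (policyEnabled === true) {
    if (flags.noWorktree) {
      return { ok: false, error: "--no-worktree contradicts worktree.enabled: true" };
    }
    return { ok: true, worktree: "create" };
  }

  // Assumption: with no policy pinned, the invocation flags decide.
  return { ok: true, worktree: flags.noWorktree ? "skip" : "create" };
}
```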
Orchestrator wiring (packages/core/src/orchestrator/orchestrator-agent.ts):
- `dispatchOrchestratorWorkflow` short-circuits `validateAndResolveIsolation`
when `workflow.worktree?.enabled === false` and runs directly in
`codebase.default_cwd`. Web chat/slack/telegram callers have no flag
equivalent to `--no-worktree`, so the YAML field is their only control.
- Logged as `workflow.worktree_disabled_by_policy` for operator visibility.
First consumer (.archon/workflows/repo-triage.yaml):
- `worktree: { enabled: false }` — triage reads issues/PRs and writes gh
labels; no code mutations, no reason to spin up a worktree per run.
Tests:
- Loader: parses `worktree.enabled: true|false`, omits block when absent.
- CLI: four new integration tests for the reconciliation matrix (skip when
policy false, three hard-error cases, redundant `--no-worktree` accepted,
`--no-worktree` + `enabled: true` rejected).
Docs: authoring-workflows.md gets the new top-level field in the schema
example with a comment explaining the precedence and the `enabled: true|false`
semantics.
* fix(isolation): use path.sep for repo-containment check on Windows
resolveRepoLocalOverride was hardcoding '/' as the separator in the
startsWith check, so on Windows (where `resolve()` returns backslash
paths like `D:\Users\dev\Projects\myapp`) every otherwise-valid
relative `worktree.path` was rejected with "resolves outside the repo
root". Fixed by importing `path.sep` and using it in the sentinel.
Fixes the 3 Windows CI failures in `worktree.path repo-local override`.
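
The failure mode, sketched with a hypothetical helper. `isInsideRepo` takes the separator as a parameter so both platforms can be shown side by side; real code would pass `path.sep`:

```typescript
// Hypothetical helper, not the actual resolveRepoLocalOverride code.
function isInsideRepo(repoRoot: string, resolved: string, sep: string): boolean {
  // The trailing separator is the sentinel that stops "myapp-other"
  // from matching "myapp".
  const root = repoRoot.endsWith(sep) ? repoRoot : repoRoot + sep;
  return resolved.startsWith(root);
}

const winRoot = "D:\\Users\\dev\\Projects\\myapp";
const winTarget = "D:\\Users\\dev\\Projects\\myapp\\.worktrees";

const withSlash = isInsideRepo(winRoot, winTarget, "/"); // false: the old hardcoded check
const withSep = isInsideRepo(winRoot, winTarget, "\\"); // true: path.sep on Windows
```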
---------
Co-authored-by: Joel Bastos <joelsb2001@gmail.com>
* feat(paths,workflows): unify ~/.archon/{workflows,commands,scripts} + drop globalSearchPath
Collapses the awkward `~/.archon/.archon/workflows/` convention to a direct
`~/.archon/workflows/` child (matching `workspaces/`, `archon.db`, etc.), adds
home-scoped commands and scripts with the same loading story, and kills the
opt-in `globalSearchPath` parameter so every call site gets home-scope for free.
Closes #1136 (supersedes @jonasvanderhaegen's tactical fix; the bug was the
primitive itself: an easy-to-forget parameter that five of six call sites on
dev dropped).
Primitive changes:
- Home paths are direct children of `~/.archon/`. New helpers in `@archon/paths`:
`getHomeWorkflowsPath()`, `getHomeCommandsPath()`, `getHomeScriptsPath()`,
and `getLegacyHomeWorkflowsPath()` (detection-only for migration).
- `discoverWorkflowsWithConfig(cwd, loadConfig)` reads home-scope internally.
The old `{ globalSearchPath }` option is removed. Chat command handler, Web
UI workflow picker, orchestrator resolve path — all inherit home-scope for
free without maintainer patches at every new site.
- `discoverScriptsForCwd(cwd)` merges home + repo scripts (repo wins on name
collision). dag-executor and validator use it; the hardcoded
`resolve(cwd, '.archon', 'scripts')` single-scope path is gone.
- Command resolution is now walked-by-basename in each scope. `loadCommand`
and `resolveCommand` walk 1 subfolder deep and match by `.md` basename, so
`.archon/commands/triage/review.md` resolves as `review` — closes the
latent bug where subfolder commands were listed but unresolvable.
- All three (`workflows/`, `commands/`, `scripts/`) enforce a 1-level
subfolder cap (matches the existing `defaults/` convention). Deeper
nesting is silently skipped.
- `WorkflowSource` gains `'global'` alongside `'bundled'` and `'project'`.
Web UI node palette shows a dedicated "Global (~/.archon/commands/)"
section; badges updated.
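
Basename resolution with the 1-level cap, as a sketch. `resolveByBasename` is hypothetical and works on paths relative to a commands directory with `/` separators; the real `loadCommand` / `resolveCommand` walk the filesystem:

```typescript
// Hypothetical helper for illustration only.
const MAX_DEPTH = 1; // subfolders allowed below the commands root

function resolveByBasename(relativePaths: string[], name: string): string | undefined {
  return relativePaths.find((p) => {
    const parts = p.split("/");
    const file = parts[parts.length - 1];
    const depth = parts.length - 1; // number of subfolders above the file
    return depth <= MAX_DEPTH && file === `${name}.md`;
  });
}

const files = ["plan.md", "triage/review.md", "a/b/deep.md"];
const review = resolveByBasename(files, "review"); // "triage/review.md"
const deep = resolveByBasename(files, "deep"); // undefined: nested two levels down
```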
Migration (clean cut — no fallback read):
- First use after upgrade: if `~/.archon/.archon/workflows/` exists, Archon
logs a one-time WARN per process with the exact `mv` command:
`mv ~/.archon/.archon/workflows ~/.archon/workflows && rmdir ~/.archon/.archon`
The legacy path is NOT read — users migrate manually. Rollback caveat
noted in CHANGELOG.
Tests:
- `@archon/paths/archon-paths.test.ts`: new helper tests (default HOME,
ARCHON_HOME override, Docker), plus regression guards for the double-`.archon/`
path.
- `@archon/workflows/loader.test.ts`: home-scoped workflows, precedence,
subfolder 1-depth cap, legacy-path deprecation warning fires exactly once
per process.
- `@archon/workflows/validator.test.ts`: home-scoped commands + subfolder
resolution.
- `@archon/workflows/script-discovery.test.ts`: depth cap + merge semantics
(repo wins, home-missing tolerance).
- Existing CLI + orchestrator tests updated to drop `globalSearchPath`
assertions.
E2E smoke (verified locally, before cleanup):
- `.archon/workflows/e2e-home-scope.yaml` + scratch repo at /tmp
- Home-scoped workflow discovered from an unrelated git repo
- Home-scoped script (`~/.archon/scripts/*.ts`) executes inside a script node
- 1-level subfolder workflow (`~/.archon/workflows/triage/*.yaml`) listed
- Legacy path warning fires with actionable `mv` command; workflows there
are NOT loaded
Docs: `CLAUDE.md`, `docs-web/guides/global-workflows.md` (full rewrite for
three-type scope + subfolder convention + migration),
`docs-web/reference/configuration.md` (directory tree),
`docs-web/reference/cli.md`, `docs-web/guides/authoring-workflows.md`.
Co-authored-by: Jonas Vanderhaegen <7755555+jonasvanderhaegen@users.noreply.github.com>
* test(script-discovery): normalize path separators in mocks for Windows
The 4 new tests in `scanScriptDir depth cap` and `discoverScriptsForCwd —
merge repo + home with repo winning` compared incoming mock paths with
hardcoded forward-slash strings (`if (path === '/scripts/triage')`). On
Windows, `path.join('/scripts', 'triage')` produces `\scripts\triage`, so
those branches never matched, readdir returned `[]`, and the tests failed.
Added a `norm()` helper at module scope and wrapped the incoming `path`
argument in every `mockImplementation` before comparing. Stored paths go
through `normalizeSep()` in production code, so the existing equality
assertions on `script.path` remain OS-independent.
Fixes Windows CI job `test (windows-latest)` on PR #1315.
* address review feedback: home-scope error handling, depth cap, and tests
Critical fixes:
- api.ts: add `maxDepth: 1` to all 3 findMarkdownFilesRecursive calls in
GET /api/commands (bundled/home/project). Without this the UI palette
surfaced commands from deep subfolders that the executor (capped at 1)
could not resolve — silent "command not found" at runtime.
- validator.ts: wrap home-scope findMarkdownFilesRecursive and
resolveCommandInDir calls in try/catch so EACCES/EPERM on
~/.archon/commands/ doesn't crash the validator with a raw filesystem
error. ENOENT still returns [] via the underlying helper.
Error handling fixes:
- workflow-discovery.ts: maybeWarnLegacyHomePath now sets the
"warned-once" flag eagerly before `await access()`, so concurrent
discovery calls (server startup with parallel codebase resolution)
can't double-warn. Non-ENOENT probe errors (EACCES/EPERM) now log at
WARN instead of DEBUG so permission issues on the legacy dir are
visible in default operation.
- dag-executor.ts: wrap discoverScriptsForCwd in its own try/catch so
an EACCES on ~/.archon/scripts/ routes through safeSendMessage /
logNodeError with a dedicated "failed to discover scripts" message
instead of being mis-attributed by the outer catch's
"permission denied (check cwd permissions)" branch.
Tests:
- load-command-prompt.test.ts (new): 6 tests covering the executor's
command resolution hot path — home-scope resolves when repo misses,
repo shadows home, 1-level subfolder resolvable by basename, 2-level
rejected, not-found, empty-file. Runs in its own bun test batch.
- archon-paths.test.ts: add getHomeScriptsPath describe block to match
the existing getHomeCommandsPath / getHomeWorkflowsPath coverage.
Comment clarity:
- workflow-discovery.ts: MAX_DISCOVERY_DEPTH comment now leads with the
actual value (1) before describing what 0 would mean.
- script-discovery.ts: copy the "routing ambiguity" rationale from
MAX_DISCOVERY_DEPTH to MAX_SCRIPT_DISCOVERY_DEPTH.
Cleanup:
- Remove .archon/workflows/e2e-home-scope.yaml — one-off smoke test that
would ship permanently in every project's workflow list. Equivalent
coverage exists in loader.test.ts.
Addresses all blocking and important feedback from the multi-agent
review on PR #1315.
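
The eager warned-once flag from the error handling fixes above, sketched under assumptions. `maybeWarn`, `warnedLegacyPath`, and `probe` are hypothetical; `probe` stands in for the `await access()` call on the legacy directory:

```typescript
// Hypothetical module state and helper, not the real maybeWarnLegacyHomePath.
let warnedLegacyPath = false;
const warnings: string[] = [];

async function maybeWarn(probe: () => Promise<boolean>): Promise<void> {
  if (warnedLegacyPath) return;
  // Set the flag before the first await. Setting it afterwards would let a
  // second caller pass the check while the probe is still pending.
  warnedLegacyPath = true;
  if (await probe()) {
    warnings.push("legacy workflows directory found; migrate it manually");
  }
}
```

Two concurrent callers then trigger at most one probe and one warning per process.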
---------
Co-authored-by: Jonas Vanderhaegen <7755555+jonasvanderhaegen@users.noreply.github.com>
All 15 worktree git-subprocess timeouts in WorktreeProvider were hardcoded
at 30000ms. Repos with heavy post-checkout hooks (lint, dependency install,
submodule init) routinely exceed that budget and fail worktree creation.
Consolidate them onto a single GIT_OPERATION_TIMEOUT_MS constant at 5 min.
Generous enough to cover reported cases while still catching genuine hangs
(credential prompts in non-TTY, stalled fetches).
Chosen over the config-key approach in #1029 to avoid adding permanent
.archon/config.yaml surface for a problem a raised default solves cleanly.
If 5 min also turns out to be too tight for real-world use, we'll revisit.
Closes #1119
Supersedes #1029
Co-authored-by: Shay Elmualem <12733941+norbinsh@users.noreply.github.com>
* fix(db): throw on corrupt commands JSON instead of silent empty fallback (#967)
getCodebaseCommands() silently returned {} when the commands column
contained corrupt JSON. Callers had no way to distinguish 'no commands'
from 'unreadable data', violating fail-fast principles.
Now throws a descriptive error with the codebase ID and a recovery hint.
The error is still logged for observability before throwing.
Adds two test cases: corrupt JSON throws, valid JSON string parses.
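
A sketch of the fail-fast parse. `parseCommandsColumn`, the `Commands` shape, and the wording of the recovery hint are hypothetical; the real `getCodebaseCommands()` reads the column from the database and logs before throwing:

```typescript
// Hypothetical signature and types for illustration.
type Commands = Record<string, { path: string; description: string }>;

function parseCommandsColumn(codebaseId: string, raw: string | null): Commands {
  // Assumption: a NULL or empty column genuinely means "no commands".
  if (raw === null || raw.trim() === "") return {};
  try {
    return JSON.parse(raw) as Commands;
  } catch (err) {
    const reason = err instanceof Error ? err.message : String(err);
    // Surface unreadable data instead of pretending the codebase has no commands.
    throw new Error(
      `Corrupt commands JSON for codebase ${codebaseId}: ${reason}. ` +
        `Placeholder hint: re-register the codebase's commands to repair the row.`,
    );
  }
}
```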
* fix: include parse error in log for better diagnostics
CLAUDE.md is the primary entry point for agents working in this repo, but it
only mentioned Pi once — buried in a DAG-node capability parenthetical. Add
Pi to the directory tree, Package Split blurb, and AI Agent Providers list
so Pi is discoverable without relying on the docs site or git log.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
* feat(providers/pi): interactive flag binds UIContext for extensions
Adds `interactive: true` opt-in to Pi provider (in `.archon/config.yaml`
under `assistants.pi`) that binds a minimal `ExtensionUIContext` stub to
each session. Without this, Pi's `ExtensionRunner.hasUI()` reports false,
causing extensions like `@plannotator/pi-extension` to silently auto-approve
every plan instead of opening their browser review UI.
Semantics: clamped to `enableExtensions: true` — no extensions loaded
means nothing would consume `hasUI`, so `interactive` alone is silently
dropped. Stub forwards `notify()` to Archon's event stream; interactive
dialogs (select/confirm/input/editor/custom) resolve to undefined/false;
TUI-only setters (widgets/headers/footers/themes) no-op. Theme access
throws with a clear diagnostic — Pi's theme singleton is coupled to its
own `Symbol.for()` registry which Archon doesn't own.
Trust boundary: only binds when the operator has explicitly enabled
both flags. Extensions gated on `ctx.hasUI` (plannotator and similar)
get a functional UI context; extensions that reach for TUI features
still fail loudly rather than rendering garbage.
Includes smoke-test workflow documenting the integration surface.
End-to-end plannotator UI rendering requires plan-mode activation
(Pi `--plan` CLI flag or `/plannotator` TUI slash command) which is
out of reach for programmatic Archon sessions — manual test only.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
* fix(providers/pi): end-to-end interactive extension UI
Three fixes that together get plannotator's browser review UI to actually
render from an Archon workflow and reach the reviewer's browser.
1. Call resourceLoader.reload() when enableExtensions is true.
createAgentSession's internal reload is gated on `!resourceLoader`, so
caller-supplied loaders must reload themselves. Without this,
getExtensions() returns the empty default, no ExtensionRunner is built,
and session.extensionRunner.setFlagValue() silently no-ops.
2. Set PLANNOTATOR_REMOTE=1 in interactive mode.
plannotator-browser.ts only calls ctx.ui.notify(url) when openBrowser()
returns { isRemote: true }; otherwise it spawns xdg-open/start on the
Archon server host — invisible to the user and untestable from bash
asserts. From the workflow runner's POV every Archon execution IS
remote; flipping the heuristic routes the URL through notify(), which
the ExtensionUIContext stub forwards into the event stream. Respect
explicit operator overrides.
3. notify() emits as assistant chunks, not system chunks.
The DAG executor's system-chunk filter only forwards warnings/MCP
prefixes, and only assistant chunks accumulate into $nodeId.output.
Emitting as assistant makes the URL available both in the user's
stream and in downstream bash/script nodes via output substitution.
Plus: extensionFlags config pass-through (equivalent to `pi --plan` on the
CLI) applied via ExtensionRunner.setFlagValue() BEFORE bindExtensions
fires session_start, so extensions reading flags in their startup handler
actually see them. Also bind extensions with an empty binding when
enableExtensions is on but interactive is off, so session_start still
fires for flag-driven but UI-less extensions.
Smoke test (.archon/workflows/e2e-plannotator-smoke.yaml) uses
openai-codex/gpt-5.4-mini (ChatGPT Plus OAuth compatible) and bumps
idle_timeout to 600000ms so plannotator's server survives while a human
approves in the browser.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
* refactor(providers/pi): keep Archon extension-agnostic
Remove the plannotator-specific PLANNOTATOR_REMOTE=1 env var write from
the Pi provider. Archon's provider layer shouldn't know about any
specific extension's internals. Document the env var in the plannotator
smoke test instead — operators who use plannotator set it via their shell
or per-codebase env config.
Workflow smoke test updated with:
- Instructions for setting PLANNOTATOR_REMOTE=1 externally
- Simpler assertion (URL emission only) — validated in a real
reject-revise-approve run: reviewer annotated, clicked Send Feedback,
Pi received the feedback as a tool result, revised the plan (added
aria-label and WCAG contrast per the annotation), resubmitted, and
reviewer approved. Plannotator's tool result signals approval but
doesn't return the plan text, so the bash assertion now only checks
that the review URL reached the stream (not that plan content flowed
into $nodeId.output, which it can't).
- Known-limitation note documenting the tool-result shape so downstream
workflow authors know to Write the plan separately if they need it.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
* chore(providers/pi): keep e2e-plannotator-smoke workflow local-only
The smoke test is plannotator-specific (calls plannotator_submit_plan,
expects PLAN.md on disk, requires PLANNOTATOR_REMOTE=1) and is better
kept out of the PR while the extension-agnostic infra lands.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
* style(providers/pi): trim verbose inline comments
Collapse multi-paragraph SDK explanations to 1-2 line "why" notes across
provider.ts, types.ts, ui-context-stub.ts, and event-bridge.ts. No
behavior change.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
* fix(providers/pi): wire assistants.pi.env + theme-proxy identity
Two end-to-end fixes discovered while exercising the combined
plannotator + @pi-agents/loop smoke flow:
- PiProviderDefaults gains an optional `env` map; parsePiConfig picks
it up and the provider applies it to process.env at session start
(shell env wins, no override). Needed so extensions like plannotator
can read PLANNOTATOR_REMOTE=1 from config.yaml without requiring a
shell export before `archon workflow run`.
- ui-context-stub theme proxy returns identity decorators instead of
throwing on unknown methods. Styled strings flow into no-op
setStatus/setWidget sinks anyway, so the throw was blocking
plannotator_submit_plan after HTTP approval with no benefit.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
* fix(providers/pi): flush notify() chunks immediately in batch mode
Batch-mode adapters (CLI) accumulate assistant chunks and only flush on
node completion. That broke plannotator's review-URL flow: Pi's notify()
emitted the URL as an assistant chunk, but the user needed the URL to
POST /api/approve — which is what unblocks the node in the first place.
Adds an optional `flush` flag on assistant MessageChunks. notify() sets
it, and the DAG executor drains pending batched content before surfacing
the flushed chunk so ordering is preserved.
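
The drain-then-surface ordering, as a rough sketch. `createBatchBuffer` and the chunk shape are hypothetical; only the `flush` flag and the ordering guarantee come from the description above:

```typescript
// Hypothetical shapes; the real MessageChunk type has more variants.
interface AssistantChunk {
  type: "assistant";
  content: string;
  flush?: boolean;
}

function createBatchBuffer(send: (text: string) => void) {
  let pending: string[] = [];

  const drain = (): void => {
    if (pending.length === 0) return;
    send(pending.join(""));
    pending = [];
  };

  const push = (chunk: AssistantChunk): void => {
    if (chunk.flush) {
      drain(); // earlier batched content goes out first
      send(chunk.content); // then the flushed chunk, immediately
      return;
    }
    pending.push(chunk.content);
  };

  return { push, drain };
}
```

A batch adapter would call `drain()` on node completion as before; a chunk with `flush: true` no longer waits for that point.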
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
* docs: mention Pi alongside Claude and Codex in README + top-level docs
The AI assistants docs page already covers Pi in depth, but the README
architecture diagram + docs table, overview "Further Reading" section,
and local-deployment .env comment still listed only Claude/Codex.
Left feature-specific mentions alone where Pi genuinely lacks support
(e.g. structured output — Claude + Codex only).
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
* docs: note Pi structured output (best-effort) in matrix + workflow docs
Pi gained structured output support via prompt augmentation + JSON
extraction (see packages/providers/src/community/pi/capabilities.ts).
Unlike Claude/Codex, which use SDK-enforced JSON mode, Pi appends the
schema to the prompt and parses JSON out of the result text (bare or
fenced). Updates four stale references that still said Claude/Codex-only:
- ai-assistants.md capabilities matrix
- authoring-workflows.md (YAML example + field table)
- workflow-dag.md skill reference
- CLAUDE.md DAG-format node description
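
What "parses JSON out of the result text (bare or fenced)" could look like, as a sketch only. `extractJson` is a hypothetical helper, not the code in capabilities.ts, and it handles objects embedded in prose on a best-effort basis:

```typescript
// Hypothetical helper for illustration.
const FENCE = "`".repeat(3); // built at runtime so this example can sit inside a fence
const FENCED_BLOCK = new RegExp(FENCE + "(?:json)?\\s*\\n([\\s\\S]*?)" + FENCE);

function extractJson(text: string): unknown {
  // Prefer the body of a fenced block when one is present.
  const fenced = FENCED_BLOCK.exec(text);
  let candidate = (fenced?.[1] ?? text).trim();

  // Otherwise fall back to the outermost {...} span in the text.
  if (!candidate.startsWith("{") && !candidate.startsWith("[")) {
    const start = candidate.indexOf("{");
    const end = candidate.lastIndexOf("}");
    if (start === -1 || end <= start) {
      throw new Error("no JSON object found in output");
    }
    candidate = candidate.slice(start, end + 1);
  }
  return JSON.parse(candidate);
}
```

Because nothing enforces the schema at the SDK level, callers should still validate the parsed value.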
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
* feat(providers/pi): default extensions + interactive to on
Extensions (community packages like @plannotator/pi-extension and
user-authored ones) are a core reason users pick Pi. Defaulting
enableExtensions and interactive to false previously silenced installed
extensions with no signal, leading to "did my extension even load?"
confusion.
Opt out in .archon/config.yaml when you want the prior behavior:
    assistants:
      pi:
        enableExtensions: false   # skip extension discovery entirely
        # interactive: false      # load extensions, but no UI bridge
Docs gain a new "Extensions (on by default)" section in
getting-started/ai-assistants.md that documents the three config
surfaces (extensionFlags, env, workflow-level interactive) and uses
plannotator as a concrete walk-through example.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
---------
Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>
* feat(paths/cli/setup): unify env load + write on three-path model (#1302, #1303)
Key env handling on directory ownership rather than filename. `.archon/` (at
`~/` or `<cwd>/`) is archon-owned; anything else is the user's.
- `<repo>/.env` — stripped at boot (guard kept), never loaded, never written
- `<repo>/.archon/.env` — loaded at repo scope (wins over home), writable via
`archon setup --scope project`
- `~/.archon/.env` — loaded at home scope, writable via `--scope home` (default)
Read side (#1302):
- New `@archon/paths/env-loader` with `loadArchonEnv(cwd)` shared by CLI and
server entry points. Loads both archon-owned files with `override: true`;
repo scope wins.
- Replaced `[dotenv@17.3.1] injecting env (0) from .env` (always lied about
stripped keys) with `[archon] stripped N keys from <cwd> (...)` and
`[archon] loaded N keys from <path>` lines, emitted only when N > 0.
`quiet: true` passed to dotenv to silence its own output.
- `stripCwdEnv` unchanged in semantics — still the only source that deletes
keys from `process.env`; now logs what it did.
Write side (#1303):
- `archon setup` never writes to `<repo>/.env`. Writing there was incoherent
because `stripCwdEnv` deletes those keys on every run.
- New `--scope home|project` (default home) targets exactly one archon-owned
file. New `--force` overrides the merge; backup still written.
- Merge-only by default: existing non-empty values win, user-added custom keys
survive, `<path>.archon-backup-<ISO-ts>` written before every rewrite. Fixes
silent PostgreSQL→SQLite downgrade and silent token loss in Add mode.
- One-time migration note emitted when `<cwd>/.env` exists at setup start.
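
The merge rule on the write side, sketched as a pure function. `mergeEnv` is hypothetical, the backup step is omitted, and treating `--force` as "incoming values always win" is an assumption:

```typescript
// Hypothetical helper; the real setup code also writes a backup before rewriting.
type EnvMap = Record<string, string>;

function mergeEnv(existing: EnvMap, incoming: EnvMap, force = false): EnvMap {
  // Start from the existing file so user-added custom keys survive.
  const merged: EnvMap = { ...existing };
  for (const [key, value] of Object.entries(incoming)) {
    const current = existing[key];
    const hasValue = current !== undefined && current !== "";
    // Existing non-empty values win unless force is set.
    if (force || !hasValue) merged[key] = value;
  }
  return merged;
}
```

With this rule, a wizard run that leaves `DATABASE_URL` blank no longer wipes an existing PostgreSQL URL.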
Tests: new `env-loader.test.ts` (6), extended `strip-cwd-env.test.ts` (+4 for
the log line), extended `setup.test.ts` (+10 for scope/merge/backup/force/
repo-untouched), extended `cli.test.ts` (+5 for flag parsing).
Docs: configuration.md, cli.md, security.md, cli-internals.md, setup skill —
all updated to the three-path model.
* fix(cli/setup): address PR review — scope/path/secret-handling edge cases
- cli: resolve --scope project to git repo root so running setup from a
subdir writes to <repo-root>/.archon/.env (what loadArchonEnv reads at
boot), not <subdir>/.archon/.env. Fail fast with a useful message when
--scope project is used outside a git repo.
- setup: resolveScopedEnvPath() now delegates to @archon/paths helpers
(getArchonEnvPath / getRepoArchonEnvPath) so Docker's /.archon home,
ARCHON_HOME overrides, and the "undefined" literal guard all behave
identically between the loader and the writer.
- setup: wrap the writeScopedEnv call in try/catch so an fs exception
(permission denied, read-only FS, backup copy failure) stops the clack
spinner cleanly and emits an actionable error instead of a raw stack
trace after the user has completed the entire wizard.
- setup: checkExistingConfig(envPath?) — scope-aware existing-config read.
Add/Update/Fresh now reflects the actual write target, not an
unconditional ~/.archon/.env.
- setup: serializeEnv escapes \r (was only \n) so values with bare CR or
CRLF round-trip through dotenv.parse without corruption. Regression
test added.
- setup: merge path treats whitespace-only existing values (' ') as
empty, so a copy-paste stray space doesn't silently defeat the wizard
update for that key forever. Regression test added.
- setup: 0o600 mode on the written env file AND on backup copies —
writeFileSync+copyFileSync default to 0o666 & ~umask, which can leave
secrets group/world-readable on a permissive umask.
- docs/cli.md + setup skill: appendix sections that still described the
pre-#1303 two-file symlink model now reflect the three-path model.
* fix(paths/env-loader): Windows-safe assertion for home-scope load line
The test asserted the log line contained `from ~/`, which is opportunistic
tilde-shortening that only happens when the tmpdir lives under `homedir()`.
On Windows CI the tmpdir is on `D:\` while homedir is `C:\Users\...`, so
the path renders absolute and the `~/` never appears.
Match on the count and the archon-home tmpdir segment instead — robust on
both Unix tilde-short paths and Windows absolute paths.
* feat(workflows): add repo-triage — 6-node periodic maintenance workflow
Adds .archon/workflows/repo-triage.yaml: a self-contained periodic
maintenance workflow that uses inline sub-agents (Claude SDK agents:
field introduced in #1276) for map-reduce across open issues and PRs.
Six DAG nodes, three-layer topology:
- Layer 1 (parallel): triage-issues, link-prs, closed-pr-dedup-check,
stale-nudge
- Layer 2: closed-dedup-check (reads triage-issues state)
- Layer 3: digest (synthesises all prior nodes + writes markdown)
Capabilities per node:
- triage-issues: delegates labeling to on-disk triage-agent; inline
brief-gen Haiku for duplicate detection; 3-day auto-close clock
for unanswered duplicate warnings
- link-prs: conservative PR ↔ issue cross-refs via inline
pr-issue-matcher Haiku, Sonnet re-verifies fully-addresses claims before
suggesting Closes #X; auto-nudges on low-quality PR template fill
with first-run grandfather guard (snapshot-only, no nudge spam)
- closed-dedup-check: cross-matches open issues against
recently-closed ones via inline closed-brief-gen Haiku; same 3-day clock
- closed-pr-dedup-check: flags open PRs duplicating recently-closed
PRs via inline pr-brief-gen Haiku; comment-only, never closes PRs
- stale-nudge: 60-day inactivity pings (configurable); no auto-close
- digest: synthesises per-node outputs + reads state files to emit
$ARTIFACTS_DIR/digest.md with clickable GitHub comment links
Env-gated rollout knobs:
- DRY_RUN=1 (read-only; prints [DRY] lines, no gh/state mutations)
- SKIP_PR_LINK=1, SKIP_CLOSED_DEDUP=1, SKIP_CLOSED_PR_DEDUP=1,
SKIP_STALE_NUDGE=1
- STALE_DAYS=N (stale-nudge window; default 60)
Cross-run state under .archon/state/ (gitignored):
- triage-state.json briefs + pendingDedupComments
- closed-dedup-state.json closedBriefs + closedMatchComments
- closed-pr-dedup-state.json openBriefs + closedBriefs + matches
- pr-state.json linkedPrs + commentIds + templateAdherence
- stale-nudge-state.json nudged (with updatedAtAtNudge for re-nudge)
Every bot comment:
- @-tags the target human (reporter for issues, author for PRs)
- Tracks comment ID in state for traceability
- Is idempotent — re-runs skip existing comments
Intended use: invoke periodically (`archon workflow run repo-triage
--no-worktree`) once a scheduler lands; live state persists across
runs so previously-flagged items reconcile correctly.
.gitignore: adds .archon/state/ for cross-run memory files.
* feat(workflows/repo-triage): post digest to Slack when SLACK_WEBHOOK is set
Extends the digest node with an optional Slack-post step after the
canonical digest.md artifact is written. Uses Slack incoming webhook
(no bot token required beyond the incoming-webhook scope).
Behavior:
- SLACK_WEBHOOK unset → skipped silently with a one-line note
- DRY_RUN=1 → prints full payload, does not curl
- Otherwise → POSTs a compact (<3500 char) mrkdwn-formatted summary
containing headline numbers, this-run comment index (clickable
GitHub URLs), pending items, and a path reference to digest.md
- curl failure or non-ok Slack response is logged but does not fail
the node — digest.md on disk remains authoritative
- Intermediate Slack text written to $ARTIFACTS_DIR/digest-slack.txt
for traceability; payload JSON assembled via jq and written to
$ARTIFACTS_DIR/slack-payload.json before curl posts it
Slack mrkdwn conversion rules baked into the prompt (no tables, link
shape <url|text>, single-asterisk bold) so Sonnet emits a variant
that renders cleanly in Slack rather than being sent raw.
The webhook URL is read from the operator's environment (Archon
auto-loads ~/.archon/.env on CLI startup — put SLACK_WEBHOOK=... there).
* fix(workflows/repo-triage): address PR #1293 review feedback
Critical (3):
- `gh issue close --reason "not planned"` (space, not underscore) — the
CLI expects lowercase with a space; `not_planned` fails at runtime.
Fixed in both auto-close paths (triage-issues step 8, closed-dedup-
check step 7).
- link-prs step 7 state save was sparse `{ sha, processedAt, related,
fullyAddresses }`, overwriting `commentIds` / `templateNudgedAt` /
`templateAdherence`. Changed to explicit merge that spreads existing
entry first so per-run captured fields survive.
- Corrupt-JSON state files previously treated as first-run default
(silent `pendingDedupComments` reset → 3-day clock restarts forever).
All five state-load sites now abort loudly on JSON.parse throw;
ENOENT/empty continue to default-shape.
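The load rule above (default shape on ENOENT or empty, hard abort on corrupt JSON) could look roughly like the following. This is a minimal sketch, not the workflow's actual code: `loadTriageState`, the `raw` parameter, and the trimmed-down `TriageState` shape are hypothetical, and the caller is assumed to pass `null` when the file does not exist.

```typescript
// Hypothetical, simplified state shape; the real file carries more fields.
interface TriageState {
  briefs: Record<string, unknown>;
  pendingDedupComments: Record<string, unknown>;
}

// `raw` is the file's contents, or null when the file does not exist (ENOENT).
function loadTriageState(raw: string | null): TriageState {
  // First run or empty file: fall back to the default shape.
  if (raw === null || raw.trim() === "") {
    return { briefs: {}, pendingDedupComments: {} };
  }
  try {
    return JSON.parse(raw) as TriageState;
  } catch (err) {
    // Corrupt JSON must not look like a first run, or the 3-day clock resets.
    throw new Error(`triage-state.json is corrupt, aborting: ${String(err)}`);
  }
}
```

Keeping "missing" and "corrupt" as separate outcomes is what stops a damaged file from silently resetting `pendingDedupComments`.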
Important (7):
- Sub-agents (`brief-gen`, `closed-brief-gen`, `pr-brief-gen`,
`pr-issue-matcher`) emit `ERROR: <reason>` on gh failures rather than
partial/fabricated JSON. Orchestrator detects the sentinel, logs the
failed ID + first 200 chars of raw response, tracks in a failed-list,
and aborts the cluster/match pass if ≥50% of items failed (avoids
acting on bad data).
- `pr-brief-gen` now sets `diffTruncated: true` when the 30k-char diff
cap hits; link-prs verify pass downgrades any `fully-addresses` claim
to `related` when either side's brief was truncated.
- 3-day auto-close validates `postedAt` parses as ISO-8601 before the
elapsed-time comparison; corrupt timestamps are logged and skipped,
never acted on.
- `gh issue close` failure path no longer drops state — sets
`closeAttemptFailed: true` on the entry for next-run retry. Only
drops on exit 0.
- `closed-pr-dedup-check` idempotency check (`gh pr view --json comments`)
now aborts the post on fetch failure rather than falling through —
prevents double-posts on gh hiccups.
- `triage-agent` label pass has preflight `test -f` check for
`.claude/agents/triage-agent.md`; skips the pass with a clear log if
the file is missing rather than firing Task calls that fail obscurely.
- `brief-gen` template-adherence wording flipped from "Ignore … as
'filled'" (ambiguous, read as affirmative) to explicit "A section
counts as MISSING when …", matching the `pr-issue-matcher` phrasing.
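As an illustration of the `postedAt` guard in the list above, here is a minimal sketch of validating the timestamp before the elapsed-time comparison. The function name, the `Decision` type, and the exact ISO-8601 pattern are assumptions made for this example; the workflow itself expresses this rule in its node prompts.

```typescript
const THREE_DAYS_MS = 3 * 24 * 60 * 60 * 1000;

// Assumed shape: full date-time with seconds and an explicit zone (Z or +hh:mm).
const ISO_8601 =
  /^\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}(\.\d+)?(Z|[+-]\d{2}:\d{2})$/;

type Decision = "close" | "wait" | "skip-corrupt";

function autoCloseDecision(postedAt: unknown, now: Date): Decision {
  // Reject anything that is not a well-formed ISO-8601 string.
  if (typeof postedAt !== "string" || !ISO_8601.test(postedAt)) {
    return "skip-corrupt";
  }
  const posted = Date.parse(postedAt);
  // Well-formed but impossible dates (month 13, etc.) parse to NaN.
  if (Number.isNaN(posted)) return "skip-corrupt";
  return now.getTime() - posted >= THREE_DAYS_MS ? "close" : "wait";
}
```

A corrupt timestamp maps to its own outcome so the caller can log and skip it rather than treating it as "old enough to close".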
Minor:
- `stale-nudge` idempotency check uses substring "has been quiet for"
instead of a prefix check that never matched (posted body starts
with @<author>).
- `closed-dedup-check` distinguishes "upstream crashed" (missing/corrupt
triage-state.json, or `lastRunAt == null`) from "legitimately quiet
day" (state present, briefs empty) — different log lines.
- Slack curl adds `-w "\nHTTP_STATUS:%{http_code}"` + `2>&1` so TLS /
4xx / 5xx errors are visible in captured output.
- `stateReason` values from `gh issue view --json stateReason` are
UPPERCASE (`COMPLETED`, `NOT_PLANNED`); this is now documented, and the
sub-agent is instructed to normalize them to lowercase for consistency.
Docs:
- CLAUDE.md repo-level `.archon/` tree now lists `state/`.
- archon-directories.md tree adds `state/` + `scripts/` (both were
missing) with purpose descriptions.
Deferred (worth doing as a follow-up, not blocking):
- DRY/SKIP preamble duplication (~30-50 lines across 5 nodes).
- Explicit `BASELINE_IS_EMPTY` capture in link-prs (current derived
check works but is a load-bearing model instruction).
- Digest `WARNING` prefix block when upstream nodes are missing
outputs — today's "(output unavailable)" sub-line is functional.
- Pre-existing README workflow-count (17 → 20) and table gaps — not
caused by this PR.
Starlight removes the `starlight-theme` localStorage key when the user
selects "auto" mode. The old init script checked that key, so every
navigation or refresh re-forced dark theme. Use a separate
`archon-theme-init` sentinel that persists across theme changes.
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Adds `assistants.pi.enableExtensions` (default false) to `.archon/config.yaml`.
When true, Pi's `noExtensions` guard is lifted so the session loads tools and
lifecycle hooks from `~/.pi/agent/extensions/`, packages installed via
`pi install npm:<pkg>`, and the workflow's cwd `.pi/` directory — opening up
the community extension ecosystem at https://shittycodingagent.ai/packages.
Default stays suppressed to preserve the "Archon is source of truth" trust
boundary: enabling this loads arbitrary JS under the Archon server's OS
permissions, including whatever extension code the target repo happens to
ship. Operators opt in explicitly, per-host.
Skills, prompt templates, themes, and context files remain suppressed even
when extensions are enabled — only the extensions gate opens.
Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>
Pi's SDK has no native JSON-schema mode (unlike Claude's outputFormat /
Codex's outputSchema). Previously Pi declared structuredOutput: false
and any workflow using output_format silently degraded — the node ran,
the transcript was treated as free text, and downstream $nodeId.output.field
refs resolved to empty strings. 8 bundled/repo workflows across 10 nodes
were affected (archon-create-issue, archon-fix-github-issue,
archon-smart-pr-review, archon-workflow-builder, archon-validate-pr, etc.).
This PR closes the gap via prompt engineering + post-parse:
1. When requestOptions.outputFormat is present, the provider appends a
"respond with ONLY a JSON object matching this schema" instruction plus
JSON.stringify(schema) to the prompt before calling session.prompt().
2. bridgeSession accepts an optional jsonSchema param. When set, it buffers
every assistant text_delta and — on the terminal result chunk — parses
the buffer via tryParseStructuredOutput (trims whitespace, strips
```json / ``` fences, JSON.parse). On success, attaches
structuredOutput to the result chunk (matching Claude's shape). On
failure, emits a warn event and leaves structuredOutput undefined so
the executor's existing dag.structured_output_missing path handles it.
3. Flipped PI_CAPABILITIES.structuredOutput to true. Unlike Claude/Codex
this is best-effort, not SDK-enforced — reliable on GPT-5, Claude,
Gemini 2.x, recent Qwen Coder, DeepSeek V3, less reliable on smaller
or older models that ignore JSON-only instructions.
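The post-parse step in item 2 (trim, strip a surrounding fence, `JSON.parse`) can be sketched as below. This is an illustrative approximation, not the provider's `tryParseStructuredOutput`: the name `parseStructuredOutputSketch`, the `ParseResult` type, and the fence regex are assumptions.

```typescript
// Three backticks, built at runtime so this sketch contains no literal fence.
const FENCE = "`" + "`" + "`";
const FENCED_BLOCK = new RegExp(
  "^" + FENCE + "(?:json)?\\s*\\n([\\s\\S]*?)\\n?" + FENCE + "$",
);

type ParseResult = { ok: true; value: unknown } | { ok: false };

function parseStructuredOutputSketch(buffer: string): ParseResult {
  let text = buffer.trim();
  // Unwrap a single surrounding fence (with or without a "json" tag).
  const fenced = FENCED_BLOCK.exec(text);
  if (fenced) text = (fenced[1] ?? "").trim();
  if (text === "") return { ok: false };
  try {
    return { ok: true, value: JSON.parse(text) };
  } catch {
    // Prose-wrapped or malformed output: report failure, never throw.
    return { ok: false };
  }
}
```

Returning a failure value instead of throwing mirrors the described behavior, where a parse miss leaves `structuredOutput` undefined and the executor's existing missing-output path takes over.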
Tests added (14 total):
- tryParseStructuredOutput: clean JSON, fenced, bare fences, arrays,
whitespace, empty, prose-wrapped (fails), malformed, inner backticks
- augmentPromptForJsonSchema via provider integration: schema appended,
prompt unchanged when absent
- End-to-end: clean JSON → structuredOutput parsed; fenced JSON parses;
prose-wrapped → no structuredOutput + no crash; no outputFormat →
never sets structuredOutput even if assistant happens to emit JSON
Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>
Four defensive fixes to the Pi community provider to match the
Claude/Codex contract and eliminate silent error swallowing.
1. envInjection now actually wired (capability was declared but unused)
Pi's SDK has no top-level `env` option on createAgentSession, so
per-project env vars were being dropped. Routes requestOptions.env
through a BashSpawnHook that merges caller env over the inherited
baseline (caller wins, matching Claude/Codex semantics). When env is
present with no allow/deny, resolvePiTools now explicitly returns Pi's
4 default tools so the pre-constructed default bashTool is replaced
with an env-aware one.
2. AsyncQueue no longer leaks on consumer abort. Added close() that
drains pending waiters with { done: true } so iterate() exits instead
of hanging forever when the producer's finally fires before the next
push. bridgeSession calls queue.close() in its finally block.
3. buildResultChunk no longer reports silent success when agent_end fires
with no assistant message. Now returns { isError: true, errorSubtype:
'missing_assistant_message' } and logs a warn event so broken Pi
sessions don't masquerade as clean completions.
4. session-resolver no longer swallows arbitrary errors from
SessionManager.list(). Narrowed the catch to ENOENT/ENOTDIR (the only
"session dir doesn't exist yet" signals); permission errors, parse
failures, and other unexpected errors now propagate.
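To make item 2 concrete, here is a minimal sketch of a callback-to-iterator queue whose `close()` releases parked waiters. It is a simplified stand-in written for this note, not the provider's `AsyncQueue`; field names and the decision to ignore pushes after close are assumptions.

```typescript
class AsyncQueue<T> implements AsyncIterable<T> {
  private buffer: T[] = [];
  private waiters: Array<(r: IteratorResult<T>) => void> = [];
  private closed = false;

  push(item: T): void {
    if (this.closed) return; // assumption: pushes after close are dropped
    const waiter = this.waiters.shift();
    if (waiter) waiter({ value: item, done: false });
    else this.buffer.push(item);
  }

  // Resolve every parked next() with done: true so consumers stop waiting.
  close(): void {
    this.closed = true;
    for (const waiter of this.waiters.splice(0)) {
      waiter({ value: undefined, done: true });
    }
  }

  [Symbol.asyncIterator](): AsyncIterator<T> {
    return {
      next: (): Promise<IteratorResult<T>> => {
        if (this.buffer.length > 0) {
          return Promise.resolve({ value: this.buffer.shift() as T, done: false });
        }
        if (this.closed) {
          return Promise.resolve({ value: undefined, done: true });
        }
        // Nothing buffered yet: park until push() or close() resolves us.
        return new Promise<IteratorResult<T>>((resolve) => {
          this.waiters.push(resolve);
        });
      },
    };
  }
}
```

Without the drain in `close()`, a consumer parked in `next()` when the producer finishes would never be resolved, which is the hang the fix describes.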
Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>
* feat(workflows): inline sub-agent definitions on DAG nodes
Add `agents:` node field letting workflow YAML define Claude Agent SDK
sub-agents inline, keyed by kebab-case ID. The main agent can spawn
them via the Task tool — useful for map-reduce patterns where a cheap
model briefs items and a stronger model reduces.
Authors no longer need standalone `.claude/agents/*.md` files for
workflow-scoped helpers; the definitions live with the workflow.
Claude only. Codex and community providers without the capability
emit a capability warning and ignore the field. Merges with the
internal `dag-node-skills` wrapper when `skills:` is also set —
user-defined agents win on ID collision.
* fix(workflows): address PR #1276 review feedback
Critical:
- Re-export agentDefinitionSchema + AgentDefinition from schemas/index.ts
(matches the "schemas/index.ts re-exports all" convention).
Important:
- Surface user-override of internal 'dag-node-skills' wrapper: warn-level
provider log + platform message to the user when agents: redefines the
reserved ID alongside skills:. User-wins behavior preserved (by design)
but silent capability removal is now observable.
- Add validator test coverage for the agents-capability warning (codex
node with agents: → warning; claude node → no warning; no-agents
field → no warning).
- Strengthen NodeConfig.agents duplicate-type comment explaining the
intentional circular-dep avoidance and pointing at the Zod schema as
authoritative source. Actual extraction is follow-up work.
Simplifications:
- Drop redundant typeof check in validator (schema already enforces).
- Drop unreachable Object.keys(...).length > 0 check in dag-executor.
- Drop rot-prone "(out of v1 scope)" parenthetical.
- Drop WHAT-only comment on AGENT_ID_REGEX.
- Tighten AGENT_ID_REGEX to reject trailing/double hyphens
(/^[a-z0-9]+(-[a-z0-9]+)*$/).
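A quick check of what the tightened pattern accepts and rejects. The regex is the one quoted above; the `isValidAgentId` wrapper is a hypothetical helper added only for this example.

```typescript
// Pattern as quoted in the commit: lowercase alphanumeric segments
// joined by single hyphens, with no leading or trailing hyphen.
const AGENT_ID_REGEX = /^[a-z0-9]+(-[a-z0-9]+)*$/;

// Hypothetical wrapper for illustration.
function isValidAgentId(id: string): boolean {
  return AGENT_ID_REGEX.test(id);
}

const samples = ["brief-gen", "pr-issue-matcher", "trailing-", "double--hyphen", "Upper"];
for (const id of samples) {
  console.log(id, isValidAgentId(id));
}
```

Each hyphen must be followed by at least one alphanumeric character, which is what rules out both `trailing-` and `double--hyphen`.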
Tests:
- parseWorkflow strips agents on script: and loop: nodes (parallel to
the existing bash: coverage).
- provider emits warn log on dag-node-skills collision; no warn on
non-colliding inline agents.
Docs:
- Renumber authoring-workflows Summary section (12b → 13; bump 13-19).
- Add Pi capability-table row for inline agents (❌, Claude-only).
- Add when-to-use guidance (agents: vs .claude/agents/*.md) in the
new "Inline sub-agents" section.
- Cross-link skills.md Related → inline-sub-agents.
- CHANGELOG [Unreleased] Added entry for #1276.
Previously, `dag-executor` only failed nodes/iterations when the SDK
returned an `error_max_budget_usd` result. Every other `isError: true`
subtype — including `error_during_execution` — was silently `break`ed
out of the stream with whatever partial output had accumulated, letting
failed runs masquerade as successful ones with empty output.
This is the most likely explanation for the "5-second crash" symptom in
#1208: iterations finish instantly with empty text, the loop keeps
going, and only the `claude.result_is_error` log tips the user off.
Changes:
- Capture the SDK's `errors: string[]` detail on result messages
(previously discarded) and surface it through `MessageChunk.errors`.
- Log `errors` and `stopReason` alongside `errorSubtype` in
`claude.result_is_error` so users can see what actually failed.
- Throw from both the general node path and the loop iteration path
on any `isError: true` result, including the subtype and SDK errors
detail in the thrown message.
Note: this does not implement auto-retry. See PR comments on #1121 and
the analysis on #1208 — a retry-with-fresh-session approach for loop
iterations is not obviously correct until we see what
`error_during_execution` actually carries in the reporter's env.
This change is the observability + fail-loud step that has to come
first so that signal is no longer silent.
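The fail-loud rule can be sketched as a small guard. This is an assumption-laden illustration rather than the `dag-executor` code: `ResultChunk` is a reduced stand-in for the real chunk type, and `assertResultOk` and the message format are hypothetical.

```typescript
// Reduced stand-in for a terminal result chunk; the real type has more fields.
interface ResultChunk {
  type: "result";
  isError?: boolean;
  errorSubtype?: string;
  errors?: string[];
}

// Throw on any isError result, carrying the subtype and SDK error detail.
function assertResultOk(nodeId: string, chunk: ResultChunk): void {
  if (!chunk.isError) return;
  const subtype = chunk.errorSubtype ?? "unknown";
  const detail =
    chunk.errors && chunk.errors.length > 0 ? `: ${chunk.errors.join("; ")}` : "";
  throw new Error(`Node '${nodeId}' failed (${subtype})${detail}`);
}
```

The point of the change is that every error subtype takes this path, not only `error_max_budget_usd`.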
Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>
The server's getPort() fallback changed from 3000 to 3090 in the Hono
migration (#318), but .env.example, the setup wizard's generated .env,
and the JSDoc describing the fallback were not updated — leaving three
different sources of truth for "the default PORT."
When the wizard writes PORT=3000 to ~/.archon/.env (which the Hono
server loads with override: true, while Vite only reads repo-local
.env), the two processes can land on different ports silently. That
mismatch is the real mechanism behind the failure described in #1152.
- .env.example: comment out PORT, document 3090 as the default
- packages/cli/src/commands/setup.ts: wizard no longer writes PORT=3000
into the generated .env; fix the "Additional Options" note
- packages/cli/src/commands/setup.test.ts: assert no bare PORT= line and
the commented default is present
- packages/core/src/utils/port-allocation.ts: fix stale JSDoc "default
3000" -> "default 3090"
- deploy/.env.example: keep Docker default at 3000 (compose/Caddy target
that) but annotate it so users don't copy it for local dev
Single source of truth for the local-dev default is now basePort in
port-allocation.ts.
Applies the CLAUDE.md comment rule ("don't embed paths/callers that rot
as the codebase evolves") flagged by the PR #1271 review to the Pi
provider's inline comments.
Three spots in the merged Pi code embed `packages/.../provider.ts:N-M`
line ranges pointing at the Claude and Codex providers. These ranges
will drift the moment those files change — the Claude auth-merge
pattern's line numbers are already off-by-a-few in some local branches.
Keep the conceptual cross-reference ("mirrors Claude's process-env +
request-env merge pattern", "matches the Codex provider's fallback
pattern for the same condition") — that's the load-bearing part of the
comment — drop the fragile line numbers and file paths.
Same treatment for the upstream Pi auth-storage.ts:424-485 reference,
which points at a specific line range in a moving dependency.
No behavior change; comment-only refactor.
* feat(providers): add Pi community provider (@mariozechner/pi-coding-agent)
Introduces Pi as the first community provider under the Phase 2 registry,
registered with builtIn: false. Wraps Pi's full coding-agent harness the
same way ClaudeProvider wraps @anthropic-ai/claude-agent-sdk and
CodexProvider wraps @openai/codex-sdk.
- PiProvider implements IAgentProvider; fresh AgentSession per sendQuery call
- AsyncQueue bridges Pi's callback-based session.subscribe() to Archon's
AsyncGenerator<MessageChunk> contract
- Server-safe: AuthStorage.inMemory + SessionManager.inMemory +
SettingsManager.inMemory + DefaultResourceLoader with all no* flags —
no filesystem access, no cross-request state
- API key seeded per-call from options.env → process.env fallback
- Model refs: '<pi-provider-id>/<model-id>' (e.g. google/gemini-2.5-pro,
openrouter/qwen/qwen3-coder) with syntactic compatibility check
- registerPiProvider() wired at CLI, server, and config-loader entrypoints,
kept separate from registerBuiltinProviders() since builtIn: false is
load-bearing for the community-provider validation story
- All 12 capability flags declared false in v1 — dag-executor warnings fire
honestly for any unmapped nodeConfig field
- 58 new tests covering event mapping, async-queue semantics, model-ref
parsing, defensive config parsing, registry integration
Supported Pi providers (v1): anthropic, openai, google, groq, mistral,
cerebras, xai, openrouter, huggingface. Extend PI_PROVIDER_ENV_VARS as
needed.
Out of scope (v1): session resume, MCP, hooks, skills mapping, thinking
level mapping, structured output, OAuth flows, model catalog validation.
These remain false on PI_CAPABILITIES until intentionally wired.
* feat(providers/pi): read ~/.pi/agent/auth.json for OAuth + api_key passthrough
Replaces the v1 env-var-only auth flow with AuthStorage.create(), which
reads ~/.pi/agent/auth.json. This transparently picks up credentials the
user has populated via `pi` → `/login` (OAuth subscriptions: Claude
Pro/Max, ChatGPT Plus, GitHub Copilot, Gemini CLI, Antigravity) or by
editing the file directly.
Env-var behavior preserved: when ANTHROPIC_API_KEY / GEMINI_API_KEY /
etc. is set (in process.env or per-request options.env), the adapter
calls setRuntimeApiKey which is priority #1 in Pi's resolution chain.
Auth.json entries are priority #2-#3. Pi's internal env-var fallback
remains priority #4 as a safety net.
Archon does not implement OAuth flows itself — it only rides on creds
the user created via the Pi CLI. OAuth refresh still happens inside Pi
(auth-storage.ts:369-413) under a file lock; concurrent refreshes
between the Pi CLI and Archon are race-safe by Pi's own design.
- Fail-fast error now mentions both the env-var path and `pi /login`
- 2 new tests: OAuth cred from auth.json; env var wins over auth.json
- 12 existing tests still pass (env-var-only path unchanged)
CI compatibility: no auth.json in CI, no change — env-var (secrets)
flows through Pi's getEnvApiKey fallback identically to v1.
* test(e2e): add Pi provider smoke test workflow
Mirrors e2e-claude-smoke.yaml: single prompt node + bash assert.
Targets `anthropic/claude-haiku-4-5` via `provider: pi`; works in CI
(ANTHROPIC_API_KEY secret) and locally (user's `pi /login` OAuth).
Verified locally with an Anthropic OAuth subscription — full run takes
~4s from session_started to assert PASS, exercising the async-queue
bridge and agent_end → result-chunk assembly under real Pi event timing.
Not yet wired into .github/workflows/e2e-smoke.yml — separate PR once
this lands, to keep the Pi provider PR minimal.
* feat(providers/pi): v2 — thinkingLevel, tool restrictions, systemPrompt
Extends the Pi adapter with three node-level translations, flipping the
corresponding capability flags from false → true so the dag-executor no
longer emits warnings for these fields on Pi nodes.
1. effort / thinking → Pi thinkingLevel (options-translator.ts)
- Archon EffortLevel enum: low|medium|high|max (from
packages/workflows/src/schemas/dag-node.ts). `max` maps to Pi's
`xhigh` since Archon's enum lacks it.
- Pi-native strings (minimal, xhigh, off) also accepted for
programmatic callers bypassing the schema.
- `off` on either field → no thinkingLevel (Pi's implicit off).
- Claude-shape object `thinking: {type:'enabled', budget_tokens:N}`
yields a system warning and is not applied.
2. allowed_tools / denied_tools → filtered Pi built-in tools
- Supports all 7 Pi tools: read, bash, edit, write, grep, find, ls.
- Case-insensitive normalization.
- Empty `allowed_tools: []` means no tools (LLM-only), matching
e2e-claude-smoke's idiom.
- Unknown names (Claude-specific like `WebFetch`) collected and
surfaced as a system warning; ignored tools don't fail the run.
3. systemPrompt (AgentRequestOptions + nodeConfig.systemPrompt)
- Threaded through `DefaultResourceLoader({systemPrompt})`; Pi's
default prompt is replaced entirely. Request-level wins over
node-level.
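The first two translations can be illustrated with a short sketch. It is not the contents of options-translator.ts: the function names are hypothetical, and treating a missing `allowed_tools` as "all seven tools" before applying `denied_tools` is an assumption made for this example.

```typescript
type PiThinkingLevel = "minimal" | "low" | "medium" | "high" | "xhigh";

// Archon's `max` maps to Pi's `xhigh`; Pi-native names pass through.
const EFFORT_TO_PI = new Map<string, PiThinkingLevel>([
  ["minimal", "minimal"],
  ["low", "low"],
  ["medium", "medium"],
  ["high", "high"],
  ["xhigh", "xhigh"],
  ["max", "xhigh"],
]);

// "off", undefined, and unrecognized values yield no thinkingLevel.
function toPiThinkingLevel(effort?: string): PiThinkingLevel | undefined {
  return effort === undefined ? undefined : EFFORT_TO_PI.get(effort);
}

const PI_TOOLS = ["read", "bash", "edit", "write", "grep", "find", "ls"] as const;
type PiTool = (typeof PI_TOOLS)[number];

function resolveTools(
  allowed?: string[],
  denied?: string[],
): { tools: PiTool[]; unknown: string[] } {
  const unknown: string[] = [];
  // Case-insensitive match; names Pi does not know are collected, not fatal.
  const normalize = (names: string[]): PiTool[] => {
    const out: PiTool[] = [];
    for (const name of names) {
      const match = PI_TOOLS.find((t) => t === name.toLowerCase());
      if (match) out.push(match);
      else unknown.push(name);
    }
    return out;
  };
  const base = allowed ? normalize(allowed) : [...PI_TOOLS];
  const deny = new Set<PiTool>(denied ? normalize(denied) : []);
  return { tools: base.filter((t) => !deny.has(t)), unknown };
}
```

An empty `allowed` array stays distinct from an absent one, which is how `allowed_tools: []` can mean "no tools" rather than "defaults".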
Capability flag changes:
- thinkingControl: false → true
- effortControl: false → true
- toolRestrictions: false → true
Package delta:
- +1 direct dep: @sinclair/typebox (Pi types reference it; adding as
direct dep resolves the TS portable-type error).
- +1 test file: options-translator.test.ts (19 tests, 100% coverage).
- provider.test.ts extended with 11 new tests covering all three paths.
- registry.test.ts updated: capability assertion reflects new flags.
Live-verified: `bun run cli workflow run e2e-pi-smoke --no-worktree`
succeeds in 1.2s with thinkingLevel=low, toolCount=0. Smoke YAML updated
to use `effort: low` (schema-valid) + `allowed_tools: []` (LLM-only).
* test(e2e): add comprehensive Pi smoke covering every CI-compatible node type
Exercises every node type Archon supports under `provider: pi`, except
`approval:` (pauses for human input, incompatible with CI):
1. prompt — inline AI prompt
2. command — named command file (uses e2e-echo-command.md)
3. loop — bounded iterative AI prompt (max_iterations: 2)
4. bash — shell script with JSON output
5. script — bun runtime (echo-args.js)
6. script — uv / Python runtime (echo-py.py)
Plus DAG features on top of Pi:
- depends_on + $nodeId.output substitution
- when: conditional with JSON dot-access
- trigger_rule: all_success merge
- final assert node validates every upstream output is non-empty
Complements the minimal e2e-pi-smoke.yaml — that stays as the fast-path
smoke for connectivity checks; this one is the broader surface coverage.
Verified locally end-to-end against Anthropic OAuth (pi /login): PASS,
all 9 non-final nodes produce output, assert succeeds.
* feat(providers/pi): resolve Archon `skills:` names to Pi skill paths
Flips capabilities.skills: false → true by translating Archon's name-based
`skills:` nodeConfig (e.g. `skills: [agent-browser]`) to absolute directory
paths Pi's DefaultResourceLoader can consume via additionalSkillPaths.
Search order for each skill name (first match wins):
1. <cwd>/.agents/skills/<name>/ — project-local, agentskills.io
2. <cwd>/.claude/skills/<name>/ — project-local, Claude convention
3. ~/.agents/skills/<name>/ — user-global, agentskills.io
4. ~/.claude/skills/<name>/ — user-global, Claude convention
A directory resolves only if it contains a SKILL.md. Unresolved names are
collected and surfaced as a system-chunk warning (e.g. "Pi could not
resolve skill names: foo, bar. Searched .agents/skills and .claude/skills
(project + user-global)."), matching the semantics of "requested but not
found" without aborting the run.
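The search order above can be sketched as a pure function. This is a simplified illustration, not the resolver itself: `resolveSkillDirs` is a hypothetical name, the `hasSkillMd` predicate is injected so the sketch needs no filesystem access, and paths are joined with plain `/` for brevity.

```typescript
function resolveSkillDirs(
  names: string[],
  cwd: string,
  home: string,
  hasSkillMd: (dir: string) => boolean, // true when <dir>/SKILL.md exists
): { resolved: string[]; missing: string[] } {
  const resolved: string[] = [];
  const missing: string[] = [];
  for (const name of names) {
    // First match wins: project-local before user-global,
    // .agents before .claude within each scope.
    const candidates = [
      `${cwd}/.agents/skills/${name}`,
      `${cwd}/.claude/skills/${name}`,
      `${home}/.agents/skills/${name}`,
      `${home}/.claude/skills/${name}`,
    ];
    const hit = candidates.find((dir) => hasSkillMd(dir));
    if (hit === undefined) missing.push(name);
    else if (resolved.indexOf(hit) === -1) resolved.push(hit); // dedup
  }
  return { resolved, missing };
}
```

Names that resolve nowhere end up in `missing` so the caller can emit one warning instead of aborting the run.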
Pi's buildSystemPrompt auto-appends the agentskills.io XML block for each
loaded skill, so the model sees them — no separate prompt injection needed
(Pi differs from Claude here; Claude wraps in an AgentDefinition with a
preloaded prompt, while Pi uses an XML block in the system prompt).
Ancestor directory traversal above cwd is deliberately skipped in this
pass — matches the Pi provider's cwd-bound scope and avoids ambiguity
about which repo's skills win when Archon runs from a subdirectory.
Bun's os.homedir() bypasses the HOME env var; the resolver uses
`process.env.HOME ?? homedir()` so tests can stage a synthetic home dir.
Tests:
- 11 new tests in options-translator.test.ts cover project/user, .agents/
vs .claude/, project-wins-over-user, SKILL.md presence check, dedup,
missing-name collection.
- 2 new integration tests in provider.test.ts cover the missing-skill
warning path and the "no skills configured → no additionalSkillPaths"
path.
- registry.test.ts updated to assert skills: true in capabilities.
Live-verified locally: `.claude/skills/archon-dev/SKILL.md` resolves,
pi.session_started log shows `skillCount: 1, missingSkillCount: 0`,
smoke workflow passes in 1.2s.
* feat(providers/pi): session resume via Pi session store
Flips capabilities.sessionResume: false → true. Pi now persists sessions
under ~/.pi/agent/sessions/<encoded-cwd>/<uuid>.jsonl by default — same
pattern Claude and Codex use for their respective stores, same blast
radius as those providers.
Flow:
- No resumeSessionId → SessionManager.create(cwd) (fresh, persisted)
- resumeSessionId + match in SessionManager.list(cwd) → open(path)
- resumeSessionId + no match → fresh session + system warning
("⚠️ Could not resume Pi session. Starting fresh conversation.")
Matches Codex's resume_thread_failed fallback at
packages/providers/src/codex/provider.ts:553-558.
The sessionId flows back to Archon via the terminal `result` chunk —
bridgeSession annotates it with session.sessionId unconditionally so
Archon's orchestrator can persist it and pass it as resumeSessionId on
the next turn. Same mechanism used for Claude/Codex.
Cross-cwd resume (e.g. worktree switch) is deliberately not supported in
this pass: list(cwd) scans only the current cwd's session dir. A workflow
that changes cwd mid-run lands on a fresh session, which matches Pi's
mental model.
Bridge sessionId annotation uses session.sessionId, which Pi always
populates (UUID) — so no special-case for inMemory sessions is needed.
Factored the resolver into session-resolver.ts (5 unit tests):
- no id → create
- id + match → open
- id + no match → create with resumeFailed: true
- list() throws → resumeFailed: true (graceful)
- empty-string id → treated as "no resume requested"
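The five resolver cases listed above can be expressed as a small decision function. This sketch is illustrative only: `planSession`, `StoredSession`, and the injected `listSessions` callback are hypothetical and do not reflect Pi's actual SessionManager API.

```typescript
// Hypothetical record for a persisted session.
interface StoredSession {
  id: string;
  path: string;
}

type SessionPlan =
  | { action: "create"; resumeFailed: boolean }
  | { action: "open"; path: string; resumeFailed: false };

function planSession(
  resumeSessionId: string | undefined,
  listSessions: () => StoredSession[],
): SessionPlan {
  // Undefined and empty string both mean "no resume requested".
  if (!resumeSessionId) return { action: "create", resumeFailed: false };
  try {
    const match = listSessions().find((s) => s.id === resumeSessionId);
    if (match) return { action: "open", path: match.path, resumeFailed: false };
  } catch {
    // In this pass a failing list() is treated as "could not resume".
  }
  return { action: "create", resumeFailed: true };
}
```

A caller would surface the warning message only when `resumeFailed` is true.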
Integration tests in provider.test.ts add 3 cases:
- resume-not-found yields warning + calls create
- resume-match calls open with the file path, no warning
- result chunk always carries sessionId
Verified live end-to-end against Anthropic OAuth:
- first call → sessionId 019d...; model replies "noted"
- second call with that sessionId → "resumed: true" in logs; model
correctly recalls prior turn ("Crimson.")
- bogus sessionId → "⚠️ Could not resume..." warning + fresh UUID
* refactor(providers,core): generalize community-provider registration
Addresses the community-pattern regression flagged in the PR #1270 review:
a second community provider should require editing only its own directory,
not seven files across providers/ + core/ + cli/ + server/.
Three changes:
1. Drop typed `pi` slot from AssistantDefaultsConfig + AssistantDefaults.
Community providers live behind the generic `[string]` index that
`ProviderDefaultsMap` was explicitly designed to provide. The typed
claude/codex slots stay — they give IDE autocomplete for built-in
config access without `as` casts, which was the whole reason the
intersection exists. Community providers parse their own config via
Record<string, unknown> anyway, so the typed slot added no real
parser safety.
2. Loop-based getDefaults + mergeAssistantDefaults. No more hardcoded
`pi: {}` spreads. getDefaults() seeds from `getRegisteredProviders()`;
mergeAssistantDefaults clones every slot present in `base`. Adding a
new provider requires zero edits to this function.
3. New `registerCommunityProviders()` aggregator in registry.ts.
Entrypoints (CLI, server, config-loader) call ONE function after
`registerBuiltinProviders()` rather than one call per community
provider. Adding a new community provider is now a single-line edit
to registerCommunityProviders().
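The aggregator pattern in items 2 and 3 might look roughly like this. It is a toy registry written for illustration, not the code in registry.ts: the `ProviderRegistration` shape and the bodies of every function here are assumptions; only the function names mirror the ones mentioned above.

```typescript
interface ProviderRegistration {
  id: string;
  builtIn: boolean;
}

const registry = new Map<string, ProviderRegistration>();

// Idempotent: registering the same id twice is a no-op.
function registerProvider(reg: ProviderRegistration): void {
  if (!registry.has(reg.id)) registry.set(reg.id, reg);
}

function registerBuiltinProviders(): void {
  registerProvider({ id: "claude", builtIn: true });
  registerProvider({ id: "codex", builtIn: true });
}

function registerPiProvider(): void {
  registerProvider({ id: "pi", builtIn: false });
}

// Entrypoints call this once; a new community provider adds one line here.
function registerCommunityProviders(): void {
  registerPiProvider();
}

// Loop-based defaults: one empty slot per registered provider, no hardcoding.
function getDefaults(): Record<string, Record<string, unknown>> {
  const defaults: Record<string, Record<string, unknown>> = {};
  registry.forEach((_reg, id) => {
    defaults[id] = {};
  });
  return defaults;
}
```

Because `getDefaults` iterates the registry, adding a provider never requires touching it.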
This makes Pi (and future community providers) actually behave like
Phase 2 (#1195) advertised: drop the implementation under
packages/providers/src/community/<id>/, export a `register<Id>Provider`,
add one line to the aggregator.
Tests:
- New `registerCommunityProviders` suite (2 tests: registers pi,
idempotent).
- config-loader.test updated: assert built-in slots explicitly rather
than exhaustive map shape.
No functional change for Pi end-users. Purely structural.
* fix(providers/pi,core): correctness + hygiene fixes from PR #1270 review
Addresses six of the review's important findings, all within the same
PR branch:
1. envInjection: false → true
The provider reads requestOptions.env on every call (for API-key
passthrough). Declaring the capability false caused a spurious
dag-executor warning for every Pi user who configured codebase env
vars — which is the MAIN auth path. Flipping to true removes the
false positive.
2. toSafeAssistantDefaults: denylist → allowlist
The old shape deleted `additionalDirectories`, `settingSources`,
`codexBinaryPath` before sending defaults to the web UI. Any future
sensitive provider field (OAuth token, absolute path, internal
metadata) would silently leak via the `[key: string]: unknown` index
signature. New SAFE_ASSISTANT_FIELDS map lists exactly what to
expose per provider; unknown providers get an empty allowlist so
the web UI sees "provider exists" but no config details.
3. AsyncQueue single-consumer invariant
The type was documented single-consumer but unenforced. A second
`for await` would silently race with the first over buffer +
waiters. Added a synchronous guard in Symbol.asyncIterator that
throws on second call — copy-paste mistakes now fail fast with a
clear message instead of dropping items.
4. session.dispose() / session.abort() silent catches
Both catch blocks now log at debug via a module-scoped logger so
SDK regressions surface without polluting normal output.
5. Type scripted events as AgentSessionEvent in provider.test.ts
Was `Record<string, unknown>` — Pi field renames would silently
keep tests passing. Now typed against Pi's actual event union.
6. Leaked /tmp/pi-research/... path in provider.ts comment
Local-machine path that crept in during research. Replaced with
the upstream GitHub URL (matches convention at provider.ts:110).
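The denylist-to-allowlist change in item 2 can be illustrated as follows. This is a sketch under assumptions: the per-provider field lists shown here are invented for the example, and the real `SAFE_ASSISTANT_FIELDS` contents may differ.

```typescript
type ProviderConfig = Record<string, unknown>;

// Hypothetical allowlist; the actual exposed fields are not given in the commit.
const SAFE_ASSISTANT_FIELDS = new Map<string, readonly string[]>([
  ["claude", ["model"]],
  ["codex", ["model"]],
]);

function toSafeAssistantDefaults(
  defaults: Record<string, ProviderConfig>,
): Record<string, ProviderConfig> {
  const safe: Record<string, ProviderConfig> = {};
  for (const provider of Object.keys(defaults)) {
    const config = defaults[provider] ?? {};
    // Unknown providers get an empty allowlist: visible, but no details.
    const allowed = SAFE_ASSISTANT_FIELDS.get(provider) ?? [];
    const exposed: ProviderConfig = {};
    for (const field of allowed) {
      if (Object.prototype.hasOwnProperty.call(config, field)) {
        exposed[field] = config[field];
      }
    }
    safe[provider] = exposed;
  }
  return safe;
}
```

With an allowlist, a newly added sensitive field stays hidden by default instead of leaking until someone remembers to delete it.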
Plus review-flagged simplifications:
- Extract lookupPiModel wrapper — isolates the `as unknown as` cast
behind one searchable name.
- Hoist QueueItem → BridgeQueueItem at module scope (exported for
test visibility; not used externally yet but enables unit testing
the mapping in isolation if needed later).
- getRegisteredProviderNames: remove side-effecting registration
calls. `loadConfig()` already bootstraps the registry before any
caller can observe this helper — the hidden coupling was
misleading.
Plus missing-coverage tests from the review (pr-test-analyzer):
- session.prompt() rejection → error surfaces to consumer
- pre-aborted signal → session.abort() called
- mid-stream abort → session.abort() called
- modelFallbackMessage → system chunk yielded
- AsyncQueue second-consumer → throws synchronously
No behavioral changes for end users beyond the envInjection warning
fix.
* docs: Pi provider + community-provider contributor guide
Addresses the PR #1270 review's docs-impact findings: the original Pi
PR had no user-facing or contributor-facing documentation, and
architecture.md still referenced the pre-Phase-2 factory.ts pattern
(factory.ts was deleted in #1195).
1. packages/docs-web/src/content/docs/reference/architecture.md
- Replace stale factory.ts references with the registry pattern.
- Update inline IAgentProvider block: add getCapabilities, add
options parameter.
- Rewrite MessageChunk block as the actual discriminated union
(was a placeholder with optional fields that didn't match the
current type).
- "Adding a New AI Agent Provider" checklist now distinguishes
built-in (register in registerBuiltinProviders) from community
(separate guide). Links to the new contributor guide.
2. packages/docs-web/src/content/docs/contributing/adding-a-community-provider.md (new)
- Step-by-step guide using Pi as the reference implementation.
- Covers: directory layout, capability discipline (start false,
flip one at a time), provider class skeleton, registration via
aggregator, test isolation (Bun mock.module pollution), what
NOT to do (no edits to AssistantDefaultsConfig, no direct
registerProvider from entrypoints, no overclaiming capabilities).
3. packages/docs-web/src/content/docs/getting-started/ai-assistants.md
- New "Pi (Community Provider)" section: install, OAuth +
API-key table per Pi backend, model ref format, workflow
examples, capability matrix showing what Pi supports (session
resume, tool restrictions, effort/thinking, skills, system
prompt, envInjection) and what it doesn't (MCP, hooks,
structured output, cost control, fallback model, sandbox).
4. .env.example
- New Pi section with commented env vars for each supported
backend (ANTHROPIC_API_KEY through HUGGINGFACE_API_KEY), each
paired with its Pi provider id. OAuth flow (pi /login → auth.json)
is explicitly called out — Archon reads that file too.
5. CHANGELOG.md
- Unreleased entry for Pi, registerCommunityProviders aggregator,
and the new contributor guide.
Named volumes inherit /.archon/workspaces and /.archon/worktrees from the
image layer on first run, but bind mounts do not. Without these directories,
the Claude subprocess is spawned with a non-existent cwd and fails silently,
causing the 60s first-event timeout.
Adding mkdir -p in the entrypoint is idempotent for named volumes and fixes
bind-mount setups (e.g. ARCHON_DATA pointing to a host path on macOS/Linux).
messages.test.ts uses mock.module('./connection', ...) at module-load time.
Per CLAUDE.md:131 (Bun issue oven-sh/bun#7823), mock.module() is process-
global and irreversible. When Bun pre-loads all test files in a batch, the
mock shadows the real connection module before connection.test.ts runs,
causing getDatabaseType() to always return the mocked value regardless of
DATABASE_URL.
Move connection.test.ts into its own `bun test` invocation immediately
after postgres.test.ts (which runs alone) and before the big DB/utils/
config/state batch that contains messages.test.ts. This follows the same
isolation pattern already used for command-handler, clone, postgres, and
path-validation tests.
* fix(workflows): add word boundary to context variable substitution regex (#1112)
Variable substitution for $CONTEXT, $EXTERNAL_CONTEXT, and $ISSUE_CONTEXT
was matching as a prefix of longer identifiers like $CONTEXT_FILE, silently
corrupting bash node scripts. Added negative lookahead (?![A-Za-z0-9_]) to
CONTEXT_VAR_PATTERN_STR so only exact variable names are substituted.
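A minimal sketch of the boundary fix described above, assuming a simplified pattern. The real CONTEXT_VAR_PATTERN_STR in executor-shared.ts may be built differently, and `substituteContext` is a hypothetical helper used only for illustration:

```typescript
// Assumed, simplified pattern: the three variable names followed by a
// negative lookahead that rejects any identifier character.
const CONTEXT_VAR_PATTERN_STR = String.raw`\$(?:EXTERNAL_CONTEXT|ISSUE_CONTEXT|CONTEXT)(?![A-Za-z0-9_])`;

// Hypothetical helper: replaces exact variable names only.
function substituteContext(prompt: string, context: string): string {
  // A replacer function avoids special handling of "$" in the replacement text.
  return prompt.replace(new RegExp(CONTEXT_VAR_PATTERN_STR, "g"), () => context);
}
```

Without the lookahead, `$CONTEXT_FILE` would be rewritten to the context text followed by `_FILE`; with it, longer identifiers are left untouched.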
Changes:
- Add negative lookahead to CONTEXT_VAR_PATTERN_STR regex in executor-shared.ts
- Add regression test for prefix-match boundary case
Fixes #1112
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* test(workflows): add missing boundary cases for context variable substitution
Add three new test cases that complete coverage of the word-boundary fix
from #1112: $ISSUE_CONTEXT with suffix variants, $ISSUE_CONTEXT with multiple
suffixes, and contextSubstituted=false for suffix-only prompts.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
---------
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Companion to 75427c7c. The bundle-completeness test compared
BUNDLED_* strings (now LF-normalized by the generator) against raw
readFileSync output, which is CRLF on Windows checkouts. Apply the
same normalization to the on-disk side so the defense-in-depth check
stays meaningful on every platform.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
On Windows, `git checkout` converts source files to CRLF via the
`* text=auto` policy. The generator inlined raw file content as JSON
strings, so the Windows regeneration produced `\r\n` escapes while the
committed artifact (written on Linux) used `\n`. `bun run check:bundled`
then flagged the file as stale and failed the Windows CI job.
Fix by normalizing CRLF → LF both when reading source defaults and when
comparing against the existing generated file. No-op on Linux.
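The normalization can be sketched as follows. Both function names are hypothetical and the generator's actual code may differ; the point is that both sides of the staleness comparison pass through the same CRLF → LF step:

```typescript
// Hypothetical helper: convert Windows line endings to LF.
function normalizeLineEndings(text: string): string {
  return text.replace(/\r\n/g, "\n");
}

// Hypothetical staleness check: compare after normalizing both sides,
// so a CRLF checkout does not look different from an LF artifact.
function isStale(freshlyGenerated: string, committed: string): boolean {
  return normalizeLineEndings(freshlyGenerated) !== normalizeLineEndings(committed);
}
```

On Linux checkouts the inputs contain no `\r\n`, so the replace leaves them unchanged.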
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
The Add Project form under Settings → Projects only submitted { path },
so GitHub URLs entered there failed even though the API and the Sidebar
Add Project form already accepted them. Closes #1108.
Changes:
- Add packages/web/src/lib/codebase-input.ts: shared getCodebaseInput()
helper returning a discriminated { path } | { url } union (re-exported
from api.ts for convenience).
- Use the helper from all three Add Project entry points: Sidebar,
Settings, and ChatPage. Removes three divergent inline heuristics.
- SettingsPage: rename addPath → addValue (state now holds either URL
or local path) and update placeholder text.
- Tests: cover https://, git@ shorthand, ssh://, git://, whitespace,
unix/relative/home/Windows/UNC paths.
- Docs: document the unified Add Project entry point in adapters/web.md.
Heuristic flips from "assume URL unless explicitly local" to "assume
local unless explicitly remote" — only inputs starting with https?://,
ssh://, git@, or git:// are sent as { url }; everything else is sent
as { path }. The server already resolves tilde/relative paths.
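A sketch of the "assume local unless explicitly remote" heuristic, assuming the helper trims its input first. The function name and union shape come from the description above; the body is illustrative and not the actual contents of codebase-input.ts:

```typescript
type CodebaseInput = { path: string } | { url: string };

// Illustrative body: only explicitly remote prefixes are treated as URLs.
function getCodebaseInput(raw: string): CodebaseInput {
  const value = raw.trim(); // assumption: surrounding whitespace is ignored
  const isRemote = /^(?:https?:\/\/|ssh:\/\/|git@|git:\/\/)/.test(value);
  return isRemote ? { url: value } : { path: value };
}
```

Under this sketch, `~/projects/app`, `./app`, `C:\repos\app`, and `\\server\share\app` all fall through to `{ path }`, leaving tilde and relative-path resolution to the server.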
Co-authored-by: Nguyen Huu Loc <lockbkbang@gmail.com>
* fix(bundled-defaults): auto-generate import list, emit inline strings
Root-cause fix for bundle drift (15 commands + 7 workflows previously
missing from binary distributions) and a prerequisite for packaging
@archon/workflows as a Node-loadable SDK.
The hand-maintained `bundled-defaults.ts` import list is replaced by
`scripts/generate-bundled-defaults.ts`, which walks
`.archon/{commands,workflows}/defaults/` and emits a generated source
file with inline string literals. `bundled-defaults.ts` becomes a thin
facade that re-exports the generated records and keeps the
`isBinaryBuild()` helper.
Inline strings (via JSON.stringify) replace Bun's
`import X from '...' with { type: 'text' }` attributes. The binary build
still embeds the data at compile time, but the module now loads under
Node too — removing SDK blocker #2.
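The inline-string emission might look roughly like the sketch below. `emitRecord`, the `DefaultFile` shape, and the export name used in the usage note are assumptions for illustration, not the contents of scripts/generate-bundled-defaults.ts:

```typescript
interface DefaultFile {
  fileName: string; // e.g. "assist.md" (hypothetical example name)
  content: string;
}

// Hypothetical emitter: turns file contents into a TypeScript record of
// string literals. JSON.stringify handles quoting and escaping.
function emitRecord(exportName: string, files: DefaultFile[]): string {
  const entries = files.map(({ fileName, content }) => {
    const key = fileName.replace(/\.(?:md|yaml|yml)$/, "");
    return `  ${JSON.stringify(key)}: ${JSON.stringify(content)},`;
  });
  return `export const ${exportName}: Record<string, string> = {\n${entries.join("\n")}\n};\n`;
}
```

Calling `emitRecord("BUNDLED_COMMANDS", files)` yields plain source text with no import attributes, which is why the resulting module can load under Node as well as Bun.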
- Generator: `scripts/generate-bundled-defaults.ts` (+ `--check` mode for CI)
- `package.json`: `generate:bundled`, `check:bundled`; wired into `validate`
- `build-binaries.sh`: regenerates defaults before compile
- Test: `bundle completeness` now derives expected set from on-disk files
- All 56 defaults (36 commands + 20 workflows) now in the bundle
* fix(bundled-defaults): address PR review feedback
Review: https://github.com/coleam00/Archon/pull/1263#issuecomment-4262719090
Generator:
- Guard against .yaml/.yml name collisions (previously silent overwrite)
- Add early access() check with actionable error when run from wrong cwd
- Type top-level catch as unknown; print only message for Error instances
- Drop redundant /* eslint-disable */ emission (global ignore covers it)
- Fix misleading CI-mechanism claim in header comment
- Collapse dead `if (!ext) continue` guard into a single typed pass
Scripts get real type-checking + linting:
- New scripts/tsconfig.json extending root config
- type-check now includes scripts/ via `tsc --noEmit -p scripts/tsconfig.json`
- Drop `scripts/**` from eslint ignores; add to projectService file scope
Tests:
- Inline listNames helper (Rule of Three)
- Drop redundant toBeDefined/typeof assertions; the Record<string, string>
type plus length > 50 already cover them
- Add content-fidelity round-trip assertion (defense against generator
content bugs, not just key-set drift)
Facade comment: drop dead reference to .claude/rules/dx-quirks.md.
CI: wire `bun run check:bundled` into .github/workflows/test.yml so the
header's CI-verification claim is truthful.
Docs: CLAUDE.md step count four→five; add contributor bullet about
`bun run generate:bundled` in the Defaults section and CONTRIBUTING.md.
* chore(e2e): bump Codex model to gpt-5.2
gpt-5.1-codex-mini is deprecated and unavailable on ChatGPT-account Codex
auth. Plain gpt-5.2 works. Verified end-to-end:
- e2e-codex-smoke: structured output returns {category:'math'}
- e2e-mixed-providers: claude+codex both return expected tokens
* feat(telemetry): add anonymous PostHog workflow-invocation tracking
Emits one `workflow_invoked` event per run with workflow name/description,
platform, and Archon version. Uses a stable random UUID persisted to
`$ARCHON_HOME/telemetry-id` for distinct-install counting, with
`$process_person_profile: false` to stay in PostHog's anonymous tier.
Opt out with `ARCHON_TELEMETRY_DISABLED=1` or `DO_NOT_TRACK=1`. Self-host
via `POSTHOG_API_KEY` / `POSTHOG_HOST`.
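A sketch of the opt-out check and event shape, under stated assumptions: both function names are hypothetical, and every property key except `workflow_name` and `$process_person_profile` is a guess at naming rather than the shipped payload:

```typescript
type Env = Record<string, string | undefined>;

// Either variable set to "1" disables telemetry.
function isTelemetryDisabled(env: Env): boolean {
  return env.ARCHON_TELEMETRY_DISABLED === "1" || env.DO_NOT_TRACK === "1";
}

// Hypothetical builder for the single per-run event.
function buildWorkflowInvokedEvent(
  distinctId: string, // the UUID persisted in $ARCHON_HOME/telemetry-id
  workflow: { name: string; description?: string },
  platform: string,
  archonVersion: string,
) {
  return {
    distinctId,
    event: "workflow_invoked",
    properties: {
      workflow_name: workflow.name,
      workflow_description: workflow.description, // assumed key name
      platform, // assumed key name
      archon_version: archonVersion, // assumed key name
      $process_person_profile: false, // keeps the event anonymous in PostHog
    },
  };
}
```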
Closes #1261
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
* test(telemetry): stop leaking test events to production PostHog
The `telemetry-id preservation` test exercised the real capture path with
the embedded production key, so every `bun run validate` published a
tombstone `workflow_name: "w"` event. Redirect POSTHOG_HOST to loopback
so the flush fails silently; bump test timeout to accommodate the
retry-then-give-up window.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
* fix(telemetry): silence posthog-node stderr leak on network failure
The PostHog SDK's internal logFlushError() writes 'Error while flushing
PostHog' directly to stderr via console.error on any network or HTTP
error, bypassing logger config. For a fire-and-forget telemetry path
this leaked stack traces to users' terminals whenever PostHog was
unreachable (offline, firewalled, DNS broken, rate-limited).
Pass a silentFetch wrapper to the PostHog client that masks failures as
fake 200 responses. The SDK never sees an error, so it never logs.
Original failure is still recorded at debug level for diagnostics.
Side benefit: shutdown is now fast on network failure (no retry loop),
so offline CLI commands no longer hang ~10s on exit.
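A minimal sketch of the masking wrapper, assuming the client accepts a custom fetch function. `makeSilentFetch`, `MinimalResponse`, and the `onFailure` callback are hypothetical names, and the response shape is reduced to what the sketch needs:

```typescript
// Assumed minimal response shape for illustration.
interface MinimalResponse {
  status: number;
  text(): Promise<string>;
  json(): Promise<unknown>;
}

type FetchLike = (url: string, options?: unknown) => Promise<MinimalResponse>;

// Stand-in success response returned whenever the real request fails.
const FAKE_OK: MinimalResponse = {
  status: 200,
  text: async () => "{}",
  json: async () => ({}),
};

// Wraps a fetch function so that network errors and non-2xx responses
// are reported to onFailure and then masked as a 200.
function makeSilentFetch(inner: FetchLike, onFailure: (reason: unknown) => void): FetchLike {
  return async (url, options) => {
    try {
      const response = await inner(url, options);
      if (response.status >= 200 && response.status < 300) return response;
      onFailure(new Error(`HTTP ${response.status}`));
    } catch (error) {
      onFailure(error);
    }
    return FAKE_OK;
  };
}
```

Because the caller only ever sees a success, it has nothing to log or retry; `onFailure` is where the original error would be recorded at debug level.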
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
* test(telemetry): make id-preservation test deterministic
Replace the fire-and-forget capture + setTimeout + POSTHOG_HOST-loopback
dance with a direct synchronous call to getOrCreateTelemetryId(). Export
the function with an @internal marker so tests can exercise the id path
without spinning up the PostHog client. No network, no timer, no flake.
Addresses CodeRabbit feedback on #1262.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
---------
Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>
Removes feat/e2e-smoke-tests from E2E workflow triggers. CI failure
detection verified: red X on run 24522356737 (deliberate bash exit 1),
green on run 24522484762 (reverted), and credit-exhaustion failure also
correctly produced exit 1.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Reverts the injected exit 1 in bash-echo (CI red X confirmed in run
24522356737). Removes feat/e2e-smoke-tests from branch triggers — ready
to merge to dev.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Injects exit 1 into e2e-deterministic bash-echo node to prove the engine
fix (failWorkflowRun on anyFailed) propagates to a non-zero CLI exit code
and a red X in GitHub Actions. Will be reverted in the next commit.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Command nodes consistently produce zero output and hit the 30s idle
timeout in CI, even with allowed_tools: []. This appears to be a bug
in how command: nodes interact with the Claude CLI subprocess — the
process never emits output. This adds 30s of wasted time to every run.
The simple prompt node already verifies Claude connectivity. Command
file discovery/loading is a deterministic operation that doesn't need
an AI call to validate in a smoke test.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
The command-test node was missing allowed_tools: [], causing the Claude
CLI to load full tool access. Without tools restricted, the subprocess
hangs after responding. The simple prompt node with allowed_tools: []
completes in 4s — this should match.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Claude CLI is extremely slow with structured output (~4 min) and tool use
(~2 min) in CI, making the previous multi-workflow approach take 10+ min.
Radical simplification:
- Remove e2e-all-nodes (redundant with deterministic + claude-smoke)
- Remove e2e-skills-mcp (advanced features too slow for per-commit smoke)
- Remove structured output and tool use from Claude smoke test (too slow)
- Strip Claude smoke to: 1 prompt + 1 command + 1 bash verify node
- Keep mixed providers (simplified: 1 Claude + 1 Codex + bash verify)
- All timeouts reduced to 30s, all job timeouts to 5 min
- Remove MCP test fixtures and e2e-test-skill (no longer needed)
Expected: Claude job ~15s of AI time, Codex ~5s, mixed ~10s
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Claude CLI is slow with structured output and tool use in CI (~4 min for
structured output, ~2 min for tool use). With 3 sequential workflow runs
(claude-smoke, all-nodes, skills-mcp), 10 minutes is insufficient.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Rename echo-args.py → echo-py.py to avoid duplicate script name conflict
with echo-args.js (script discovery uses base name, not extension)
- Add CODEX_API_KEY env var to codex and mixed CI jobs (Codex CLI requires
this, not OPENAI_API_KEY, for headless auth)
- Sequentialize all Claude AI nodes via depends_on chains to prevent
concurrent CLI subprocess idle timeouts in CI
- Increase idle_timeout from 60s to 120s on all AI nodes for CI headroom
- Override MCP test node to model: sonnet (Haiku doesn't support MCP tool search)
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Adds real workflow execution to CI, verifying the full engine works
end-to-end with both providers. Organized into 4 tiers: deterministic
(0 API calls), Claude, Codex, and mixed-provider tests.
New workflows:
- e2e-deterministic: bash, script (bun/uv), conditions, trigger rules
- e2e-skills-mcp: skills injection, MCP server, effort, systemPrompt
- Enhanced existing e2e-claude-smoke, e2e-codex-smoke, e2e-mixed-providers
- Fixed e2e-all-nodes (was broken due to script node syntax)
Supporting files:
- e2e-echo-command.md (test command file)
- echo-args.py (Python script for uv runtime test)
- e2e-test-skill/SKILL.md (minimal skill for injection test)
- e2e-filesystem.json (MCP config for filesystem server test)
GitHub Actions: .github/workflows/e2e-smoke.yml
- Runs on push to main/dev only (no PR trigger to avoid API cost abuse)
- Uses haiku (Claude) and gpt-5.1-codex-mini (Codex) for cost efficiency
Closes #1254
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* fix: surface auth errors instead of silently dropping them (#1076)
When Claude OAuth refresh token is expired, the SDK yields a result chunk
with is_error=true and no session_id. Both handleStreamMode and
handleBatchMode guarded the result branch with `&& msg.sessionId`,
silently dropping the error. Users saw no response at all.
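The shape of the bug and the fix can be sketched like this. `ResultChunk`, `Outcome`, and `handleResultChunk` are hypothetical simplifications; the real branches live in handleStreamMode and handleBatchMode and do more work:

```typescript
interface ResultChunk {
  type: "result";
  sessionId?: string;
  isError?: boolean;
  errorSubtype?: string;
}

interface Outcome {
  userMessage?: string;
  sessionId?: string;
}

// Before the fix the branch was guarded by `msg.sessionId`, so an error
// result without a session id fell through and produced no reply.
function handleResultChunk(msg: ResultChunk): Outcome {
  if (msg.isError) {
    // Early exit: always tell the user, even when sessionId is missing.
    return {
      userMessage: `AI request failed: ${msg.errorSubtype ?? "unknown error"}`,
      sessionId: msg.sessionId,
    };
  }
  return { sessionId: msg.sessionId };
}
```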
Changes:
- Remove sessionId guard from result branches in orchestrator-agent.ts
- Add isError early-exit that sends error message to user
- Add 4 OAuth patterns to AUTH_PATTERNS in claude.ts and codex.ts
- Add OAuth refresh-token handler to error-formatter.ts
- Add tests for new error-formatter branches
Fixes #1076
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* fix: add structured logging to isError path and remove overly broad auth pattern
- Add getLog().warn({ conversationId, errorSubtype }, 'ai_result_error') in both
handleStreamMode and handleBatchMode isError branches so auth failures are
visible server-side instead of silently swallowed
- Remove 'access token' from AUTH_PATTERNS in claude.ts and codex.ts; the real
OAuth refresh error is already covered by 'refresh token' and 'could not be
refreshed', eliminating false-positive auth classification risk
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
* fix: route isError results through classifyAndFormatError with provider-specific messages
The isError path in stream/batch mode used a hardcoded generic message,
bypassing the classifyAndFormatError infrastructure. Now constructs a
synthetic Error from errorSubtype and routes through the formatter.
Error formatter updated with provider-specific auth detection:
- Claude: OAuth token refresh, sign-in expired → guidance to run /login
- Codex: 401 retry exhaustion → guidance to run codex login
- General: tightened patterns (removed broad 'auth error' substring match)
Also persists session ID before early-returning on isError.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
---------
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* fix(workflows): stop warning about model/provider on loop nodes (#1082)
The loader incorrectly classified loop nodes as "non-AI nodes" and warned
that model/provider fields were ignored, even though the DAG executor has
supported these fields on loop nodes since commit 594d5daa.
Changes:
- Add LOOP_NODE_AI_FIELDS constant excluding model/provider from the warn list
- Update loader to use LOOP_NODE_AI_FIELDS for loop node field checking
- Fix BASH_NODE_AI_FIELDS comment that incorrectly referenced loop nodes
- Add tests for loop node model/provider acceptance and unsupported field warnings
Fixes #1082
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* fix(workflows): update stale comment and add LOOP_NODE_AI_FIELDS unit tests
- Update section comment from "bash/loop nodes" to "non-AI nodes" since loop
nodes do support model/provider (the fix in this PR)
- Export LOOP_NODE_AI_FIELDS from schemas/index.ts alongside BASH/SCRIPT variants
- Add dedicated describe block in schemas.test.ts verifying that model and
provider are excluded and all other BASH_NODE_AI_FIELDS are still present
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
* simplify: merge nodeType and aiFields into a single if/else chain in parseDagNode
Eliminates the separate isNonAiNode predicate and nested ternary for aiFields
selection by combining both into one explicit if/else block — each branch sets
nodeType and aiFields together, so the node type is checked only once.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
---------
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Fix misleading 'unde***' log when ctx.from is undefined; use 'unknown'
to match the Slack/Discord adapter pattern
- Log post-startup bot runtime errors before calling reject() (reject() is
a no-op after onStart fires, but the errors are now visible in logs)
- Add debug log when message is dropped due to no handler registered
- Add stop() unit test to guard against grammY API rename regressions
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Telegraf v4's internal `redactToken()` assigns to readonly `error.message`
properties, which crashes under Bun's strict ESM mode. Telegraf is EOL.
Changes:
- Replace `telegraf` dependency with `grammy` ^1.36.0
- Migrate adapter from Telegraf API to grammY API (Bot, bot.api, bot.start)
- Use grammY's `onStart` callback pattern for async polling launch
- Preserve 409 retry logic and all existing behavior
- Update test mocks from telegraf types to grammy types
Fixes #1042
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>