Archon/Dockerfile


feat(docker): complete Docker deployment setup (#756) * fix: overhaul Docker setup for working builds and server deployments Multi-stage Dockerfile: deps → web build → production image. Fixes missing workspace packages (was 3/9, now all 9), adds Vite web UI build, removes broken single-file bundle, uses --production install. Merges docker-compose.yml and docker-compose.cloud.yml into a single file with composable profiles (with-db, cloud). Fixes health check path (/api/health), postgres volume (/data), adds Caddyfile.example. * docs: add comprehensive Docker guide and update cloud-deployment.md New docs/docker.md covers quick start, composable profiles, config, cloud deployment with HTTPS, pre-built image usage, building, and troubleshooting. Updates cloud-deployment.md to use the new single compose file with profiles and fixes stale health endpoint paths. * docs: restructure docker.md — prerequisites before commands Moves .env and Caddyfile setup to a Prerequisites section at the top, before any docker compose commands. Adds troubleshooting entry for the "not a directory" Caddyfile mount error. * fix: pass env_file to Caddy container for DOMAIN variable Caddy needs {$DOMAIN} from .env but the container had no env_file. Without it, {$DOMAIN} is empty and Caddy parses the site block as a global options block, causing "unrecognized global option" error. * docs: rewrite docker.md with server quickstart and fix auth guidance Restructures around a step-by-step Quick Start that walks through the full server deployment (Docker install → .env → Caddyfile → DNS → run). Removes CLAUDE_USE_GLOBAL_AUTH references — Docker has no local claude CLI, so users must provide CLAUDE_CODE_OAUTH_TOKEN or CLAUDE_API_KEY. * feat: warn when Docker app falls back to SQLite with postgres running When ARCHON_DOCKER=true and DATABASE_URL is not set, logs a warning with the exact connection string to add to .env. 
Prevents users from running --profile with-db and unknowingly using SQLite instead. * feat: configurable data directory via ARCHON_DATA env var Users can set ARCHON_DATA=/opt/archon-data in .env to control where Archon stores workspaces, worktrees, artifacts, and logs on the host. Defaults to a Docker-managed volume when not set. * fix: fix volume permission errors with entrypoint script Docker volume mounts create /.archon/ as root, but the app runs as appuser (UID 1001). New docker-entrypoint.sh runs as root to fix permissions, then drops to appuser via gosu. Works both when running as root (default) and as non-root (--user flag, Kubernetes). * fix: configure git credentials from GH_TOKEN in Docker entrypoint Git inside the container can't authenticate for HTTPS clones without credentials. The entrypoint now configures git url.insteadOf to inject GH_TOKEN into GitHub HTTPS URLs automatically. * security: use credential helper for GH_TOKEN instead of url.insteadOf The url.insteadOf approach stored the raw token in ~/.gitconfig as a key name, visible to any process. Credential helper keeps the token in the environment only. Also fixes: chown -Rh (no symlink follow), signal propagation (exec bun directly as PID 1), error diagnostics, and deduplicates root/non-root branches via RUNNER variable. * security: scope SSE flush_interval to /api/stream/*, harden headers flush_interval -1 was global, disabling buffering for all endpoints. Now scoped to @sse path matcher. Also adds HSTS, changes X-Frame-Options to DENY, and trims the comment header. * security: use env-var for postgres password, bind port to localhost Hardcoded postgres:postgres with port exposed to 0.0.0.0 is a risk on servers with permissive firewalls. Now uses POSTGRES_PASSWORD env var with fallback, and binds to 127.0.0.1 only. * fix: caddy depends_on app with service_healthy condition Without the health condition, Caddy starts proxying before the app is ready, returning 502s on first boot. 
* fix: remove hardcoded container_name from caddy service Hardcoded name prevents running multiple instances on the same host. Other services already use Compose default naming. * security: exclude .claude/ from Docker image Skills, commands, rules, and prompt engineering details are not needed at runtime and expose internal architecture in the production image. * fix: assert web build produces index.html in Dockerfile A silent Vite failure could produce an empty dist/ — the container would start with a healthy backend but a broken UI serving 404s. * chore: remove redundant WORKDIR in Dockerfile Stage 2 WORKDIR /app is inherited from Stage 1 (deps). Re-declaring it adds a no-op layer and implies something changed. * feat: add cloud-init config for automated server setup New deploy/cloud-init.yml for VPS providers — paste into User Data field to auto-install Docker, clone repo, build image, and configure firewall. User only needs to edit .env and run docker compose up. * feat: add optional Caddy basic auth for cloud deployments Single env var (CADDY_BASIC_AUTH) expands to the full basicauth directive or nothing when unset — no app changes needed. Webhooks and health check are excluded. Documented in .env.example, deploy config, and docker.md. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * feat(docker): add agent-browser + Chromium for E2E testing workflows Enables E2E validation workflows (archon-validate-pr, validate-ui, replicate-issue) to run inside Docker containers out of the box. 
- Install system Chromium via apt-get (~200MB vs ~500MB Chrome for Testing) - Install agent-browser@0.22.1 via npm (postinstall downloads Rust binary) - Purge nodejs/npm after install to keep image lean - Set AGENT_BROWSER_EXECUTABLE_PATH=/usr/bin/chromium - agent-browser auto-detects Docker and adds --no-sandbox Closes #787 * fix(docker): symlink agent-browser native binary before purging nodejs The npm entry point (bin/agent-browser.js) is a Node.js wrapper that launches the Rust binary. After purging nodejs/npm to save ~60MB, the wrapper can't execute. Fix by copying the native Rust binary directly to /usr/local/bin and symlinking agent-browser to it. * feat(docker): add cookie-based form auth sidecar for Caddy - Add auth-service/ Node.js sidecar (/verify, /login GET/POST, /logout) - Use bcryptjs for password hashing, HMAC-SHA256 signed HttpOnly cookies - Add auth-service to docker-compose.yml under ["auth"] profile (expose: not ports:) - Restructure Caddyfile.example with handle blocks for Option A (form auth), Option B (basic auth), None - Add AUTH_USERNAME, AUTH_PASSWORD_HASH, COOKIE_SECRET env vars to .env.example and deploy/.env.example - Add Form-Based Authentication section to docs/docker.md Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com> * fix: address review findings for auth-service (HIGH/MEDIUM) Fixes applied: - HIGH: validate AUTH_PASSWORD_HASH is a valid bcrypt hash at startup (bcrypt.getRounds() guard — prevents silent lockout on placeholder hash) - HIGH: add request method/URL context to unhandled error log + non-empty 500 body - HIGH: add server.on('error') handler for port bind failures (EADDRINUSE/EACCES) - HIGH: document AUTH_PORT/AUTH_SERVICE_PORT indirection in server.js comment - HIGH: add auth-service/test.js with isSafeRedirect and cookie sign/verify tests - MEDIUM: add escapeHtml() helper; apply to loginPage error param (latent XSS) - MEDIUM: add 4 KB body size limit in readBody (prevents memory exhaustion) - MEDIUM: export 
helpers + require.main guard (enables clean import-level testing) - MEDIUM: fix docs/docker.md Step 4 instruction — clarify which handle block to comment out Tests added: - auth-service/test.js: 12 assertions for isSafeRedirect (safe paths + open redirect vectors) - auth-service/test.js: 5 assertions for signCookie/verifyCookie round-trip and edge cases Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com> * fix: escape $ in AUTH_PASSWORD_HASH example to prevent Docker Compose variable substitution Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * refactor(core): break up god function in command-handler (#742) * refactor(core): break up god function in command-handler Extract handleWorktreeCommand, handleWorkflowCommand, handleRepoCommand, and handleRepoRemoveCommand from the 1300-line handleCommand switch statement. Add resolveRepoArg helper to eliminate duplication between repo and repo-remove cases. handleCommand now contains ~200 lines of routing logic only. * fix: address review findings from PR #742 command-handler.ts: - Replace fragile 'success' in discriminator with proper ResolveRepoArgResult discriminated union (ok: true/false) and fix misleading JSDoc - Add missing error handling to worktree orphans, workflow cancel, workflow reload - Fix isolation_env_id UUID used as filesystem path in worktree create/list/orphans (look up working_path from DB instead) - Add cmd. 
domain prefix to all log events per CLAUDE.md convention - Add identifier/isolationEnvId context to repo_switch_failed and worktree_remove_failed logs - Capture isCurrentCodebase before mutation in handleRepoRemoveCommand - Hoist duplicated workflowCwd computation in handleWorkflowCommand - Remove stale (Phase 3D) comment marker docs: - Remove all /command-invoke references from CLAUDE.md, README.md, docs/architecture.md, and .claude/rules/orchestrator.md - Update command list to match actual handleCommand cases - Replace outdated routing examples with current AI router pattern * refactor: remove MAX_WORKTREES_PER_CODEBASE limit Worktree count is no longer restricted. Remove the constant, the limit field from WorktreeStatusBreakdown, the limit_reached block reason, formatWorktreeLimitMessage, and all associated tests. * fix: address review findings — error handling, log prefixes, tests, docs - Wrap workflow list discoverWorkflowsWithConfig in try/catch (was the only unprotected async call among workflow subcommands) - Cast error to Error before logging in workflow cancel/status catch blocks - Add cmd. domain prefix to all command-handler log events (12 events) - Update worktree create test to use UUID isolation_env_id with DB lookup - Add resolveRepoArg boundary tests (/repo 0, /repo N > count) - Add worktree cleanup subcommand tests (merged, stale, invalid type) - Add updateConversation assertion to repo-remove session test - Fix stale docs: architecture.md command handler section, .claude → .archon paths, remove /command-invoke from commands-reference, fix github.md example * feat(workflows)!: replace standalone loop with DAG loop node (#785) * feat(workflows): add loop node type to DAG workflows Add LoopNode as a fourth DAG node type alongside command, prompt, and bash. Loop nodes run an AI prompt repeatedly until a completion signal is detected (LLM-decided via <promise>SIGNAL</promise>) or a deterministic bash condition succeeds (until_bash exit 0). 
This enables Ralph-style autonomous iteration as a composable node within DAG workflows — upstream nodes can produce plans/task lists that feed into the loop, and downstream nodes can act on the loop's output via $nodeId.output substitution. Changes: - Add LoopNodeConfig, LoopNode interface, isLoopNode type guard - Add loop branch in parseDagNode with full validation - Extract detectCompletionSignal/stripCompletionTags to executor-shared - Add executeLoopNode function in dag-executor with iteration logic - Add nodeId field to loop iteration event interfaces - Add 17 new tests (9 loader + 8 executor) - Add archon-test-loop-dag and archon-ralph-dag default workflows The standalone loop: workflow type is preserved but deprecated. * refactor(workflows): rewrite archon-ralph-dag prompt to match command quality bar Expand the loop prompt from ~75 lines to ~430 lines with: - 7 numbered phases with checkpoints (matching archon-implement.md pattern) - Environment setup: dependency install, CLAUDE.md reading, git state check - Explicit DO/DON'T implementation rules - Per-failure-type validation handling (type-check, lint, tests, format) - Acceptance criteria verification before commit - Exact commit message template with heredoc format - Edge case handling (validation loops, blocked stories, dirty state, large stories) - File format specs for prd.json schema and progress.txt structure - Critical fix: "context is stale — re-read from disk" for fresh_context loops Also improved bash setup node (dep install, structured output delimiters, story counts) and report node (git log/diff stats, PR status check). * feat(workflows)!: remove standalone loop workflow type BREAKING: Standalone `loop:` workflows are no longer supported. Loop iteration is now exclusively a DAG node type (LoopNode). Existing loop workflows should be migrated to DAG workflows with loop nodes — see archon-ralph-dag.yaml for the pattern. 
Removed: - LoopConfig type and LoopWorkflow from WorkflowDefinition union - executeLoopWorkflow function (~600 lines) from executor.ts - Loop dispatch in executeWorkflow - Top-level loop: parsing in loader (now returns clear error message) - archon-ralph-fresh.yaml, archon-ralph-stateful.yaml, archon-test-loop.yaml - LoopEditor.tsx and loop mode from WorkflowBuilder UI - ~900 lines of standalone loop tests Kept (for DAG loop nodes): - LoopNodeConfig, LoopNode, isLoopNode - executeLoopNode in dag-executor.ts - Loop iteration events in store/event-emitter - isLoop tracking in web UI workflow store (fires for DAG loop nodes) * fix: address all review findings for loop-dag-node PR - Fix missing isDagWorkflow import in command-handler.ts (shipping bug) - Wrap substituteWorkflowVariables and getAssistantClient in try-catch with structured error output in executeLoopNode - Add onTimeout callback for idle timeout (log + user notification + abort) - Add cancellation user notification before returning failed state - Differentiate until_bash ENOENT/system errors from expected non-zero exit - Use logDir for per-iteration AI output logging (logAssistant, logTool, logStepComplete, tool_called/tool_completed events, sendStructuredEvent) - Reject retry: on loop nodes at load time (executor doesn't apply it) - Remove dead isLoop field from WorkflowStartedEvent - Fix stale error message "DAG/loop dispatch" -> "DAG dispatch" - Fix stale commitWorkflowArtifacts doc referencing "loop-based" - Fix archon-ralph-dag.yaml referencing deleted workflows - Update CLAUDE.md: "Two execution modes", add loop node to DAG description - Extract parseIdleTimeout helper (3 copies -> 1 in loader.ts) - Use isLoopNode() type guard in validateDagStructure - Simplify buildLoopNodeOptions with conditional spread - Restore loop?: never on StepWorkflow for type safety - Add tests: AI error mid-iteration, plain signal detection, false positive - Fix stale test assertion for standalone loop rejection message * 
feat: refactor Gitea adapter to community forge structure + tea CLI

Moves the Gitea platform adapter from the old location (packages/server/src/adapters/gitea.ts) to the proper community forge adapter structure:

packages/adapters/src/community/forge/gitea/
├── adapter.ts       # Main adapter class
├── auth.ts          # parseAllowedUsers, isGiteaUserAuthorized
├── types.ts         # WebhookEvent interface
├── index.ts         # Barrel export
└── adapter.test.ts  # 43 passing tests

Key changes:
- Fix imports: createLogger, getArchonWorkspacesPath, getCommandFolderSearchPaths now from @archon/paths
- Fix imports: cloneRepository, syncRepository, addSafeDirectory, toRepoPath, toBranchName, isWorktreePath now from @archon/git
- Remove execAsync / child_process / promisify; use @archon/git functions for all git operations
- auth.ts extracted from @archon/core into adapter package (mirrors GitHub adapter's auth.ts pattern)
- types.ts extracted: WebhookEvent interface now standalone
- Replace gh CLI hints with tea CLI in context strings: 'tea issue view N' and 'tea pr view N'
- Register GiteaAdapter in packages/server/src/index.ts via @archon/adapters/community/forge/gitea import
- Document GITEA_* env vars in .env.example

Tests: 43 pass, 0 fail

Co-authored-by: John Fitzpatrick <john@cyberfitz.org>

---------

Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-authored-by: Archon <archon@dynamous.ai>
Co-authored-by: Thomas <info@smartcode.diy>
Co-authored-by: Rasmus Widing <152263317+Wirasm@users.noreply.github.com>
Co-authored-by: Fitzy <fitzy@cyberfitz.org>
Co-authored-by: John Fitzpatrick <john@cyberfitz.org>
2026-03-26 13:02:04 +00:00
# =============================================================================
# Archon - Remote Agentic Coding Platform
# Multi-stage build: deps → web build → production image
# =============================================================================
# ---------------------------------------------------------------------------
# Stage 1: Install dependencies
# ---------------------------------------------------------------------------
fix(docker): fix Docker build failures and add CI guard (#1022)

* fix(docker): update Bun base image from 1.2 to 1.3

The lockfile was generated with Bun 1.3.x locally but the Docker image used oven/bun:1.2-slim. Bun 1.3 changed the lockfile format, causing --frozen-lockfile to fail during docker build.

* fix(docker): pin Bun to exact version 1.3.9 matching lockfile

Floating tag 1.3-slim resolved to 1.3.11 which has a different lockfile format than 1.3.9 used to generate bun.lock. Pin to exact patch version to prevent --frozen-lockfile failures.

* fix(docker): add missing docs-web workspace package.json

The docs-web package was added as a workspace member but its package.json was never added to the Dockerfile COPY steps. This caused bun install --frozen-lockfile to fail because the workspace layout in Docker didn't match the lockfile.

* fix(docker): use hoisted linker for Vite/Rollup compatibility

Bun's default "isolated" linker stores packages in node_modules/.bun/ with symlinks that Vite's Rollup bundler cannot resolve during production builds (e.g., remark-gfm → mdast-util-gfm chain). Using --linker=hoisted gives the classic flat node_modules layout that Rollup expects. Local dev is unaffected (Vite dev server handles the isolated layout fine).

* ci: pin Bun version to 1.3.9 and add Docker build check

- Align CI Bun version (was 1.3.11) with Dockerfile and local dev (1.3.9) to prevent lockfile format mismatches between environments
- Add docker-build job to test.yml that builds the Docker image on every PR, catching Dockerfile regressions (missing workspace packages, linker issues, build failures) before they reach deploy

* fix(ci): add permissions for GHA cache and tighten Bun engine

- Add actions: write permission to docker-build job so GHA layer cache writes succeed on PRs from forks
- Tighten package.json engines.bun from >=1.0.0 to >=1.3.9 to document the minimum version that matches the lockfile format

* fix(ci): add smoke test, align Bun version across all workflows

Review fixes:
- Add load: true + health endpoint smoke test to docker-build CI job so we verify the image actually starts, not just compiles
- Align Bun 1.3.9 in deploy-docs.yml and release.yml (were still 1.3.11)
- Document why docs-web source is intentionally omitted from Docker

* chore: float Docker to bun:1.3 and align CI to 1.3.11

- Dockerfile: oven/bun:1.3-slim (auto-tracks latest 1.3.x patches)
- CI workflows: bun-version 1.3.11 (current latest, reproducible)
- engines.bun: >=1.3.9 (minimum for local devs)

Lockfile format is stable across 1.3.x patches, so this is safe.

* fix(docker,ci): pin Docker to 1.3.11, loosen engines, harden smoke test

- Dockerfile: pin oven/bun:1.3.11-slim (was floating 1.3-slim) so Docker builds are reproducible and match CI exactly.
- package.json: loosen engines to ^1.3.0 so end users on any 1.3.x can run the CLI; CI/Docker remain pinned to the canonical latest.
- CI smoke test: replace 'sleep 5' with curl --retry-connrefused, and move container cleanup to an 'if: always()' step so a failed health check no longer leaks the named container.

---------

Co-authored-by: Rasmus Widing <rasmus.widing@gmail.com>
2026-04-07 07:37:47 +00:00
FROM oven/bun:1.3.11-slim AS deps
feat: refactor Gitea adapter to community forge structure + tea CLI Moves the Gitea platform adapter from the old location (packages/server/src/adapters/gitea.ts) to the proper community forge adapter structure: packages/adapters/src/community/forge/gitea/ ├── adapter.ts # Main adapter class ├── auth.ts # parseAllowedUsers, isGiteaUserAuthorized ├── types.ts # WebhookEvent interface ├── index.ts # Barrel export └── adapter.test.ts # 43 passing tests Key changes: - Fix imports: createLogger, getArchonWorkspacesPath, getCommandFolderSearchPaths now from @archon/paths - Fix imports: cloneRepository, syncRepository, addSafeDirectory, toRepoPath, toBranchName, isWorktreePath now from @archon/git - Remove execAsync / child_process / promisify — use @archon/git functions for all git operations - auth.ts extracted from @archon/core into adapter package (mirrors GitHub adapter's auth.ts pattern) - types.ts extracted: WebhookEvent interface now standalone - Replace gh CLI hints with tea CLI in context strings: 'tea issue view N' and 'tea pr view N' - Register GiteaAdapter in packages/server/src/index.ts via @archon/adapters/community/forge/gitea import - Document GITEA_* env vars in .env.example Tests: 43 pass, 0 fail Co-authored-by: John Fitzpatrick <john@cyberfitz.org> --------- Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com> Co-authored-by: Archon <archon@dynamous.ai> Co-authored-by: Thomas <info@smartcode.diy> Co-authored-by: Rasmus Widing <152263317+Wirasm@users.noreply.github.com> Co-authored-by: Fitzy <fitzy@cyberfitz.org> Co-authored-by: John Fitzpatrick <john@cyberfitz.org>
2026-03-26 13:02:04 +00:00
WORKDIR /app
# Copy root package files and lockfile
COPY package.json bun.lock ./
# Copy ALL workspace package.json files (monorepo lockfile depends on all of them)
COPY packages/adapters/package.json ./packages/adapters/
COPY packages/cli/package.json ./packages/cli/
COPY packages/core/package.json ./packages/core/
# docs-web source is NOT copied — it's a static site deployed separately
# (see .github/workflows/deploy-docs.yml). package.json is included only
# so Bun's workspace lockfile resolves correctly.
COPY packages/docs-web/package.json ./packages/docs-web/
COPY packages/git/package.json ./packages/git/
COPY packages/isolation/package.json ./packages/isolation/
COPY packages/paths/package.json ./packages/paths/
COPY packages/providers/package.json ./packages/providers/
domain prefix to all log events per CLAUDE.md convention - Add identifier/isolationEnvId context to repo_switch_failed and worktree_remove_failed logs - Capture isCurrentCodebase before mutation in handleRepoRemoveCommand - Hoist duplicated workflowCwd computation in handleWorkflowCommand - Remove stale (Phase 3D) comment marker docs: - Remove all /command-invoke references from CLAUDE.md, README.md, docs/architecture.md, and .claude/rules/orchestrator.md - Update command list to match actual handleCommand cases - Replace outdated routing examples with current AI router pattern * refactor: remove MAX_WORKTREES_PER_CODEBASE limit Worktree count is no longer restricted. Remove the constant, the limit field from WorktreeStatusBreakdown, the limit_reached block reason, formatWorktreeLimitMessage, and all associated tests. * fix: address review findings — error handling, log prefixes, tests, docs - Wrap workflow list discoverWorkflowsWithConfig in try/catch (was the only unprotected async call among workflow subcommands) - Cast error to Error before logging in workflow cancel/status catch blocks - Add cmd. domain prefix to all command-handler log events (12 events) - Update worktree create test to use UUID isolation_env_id with DB lookup - Add resolveRepoArg boundary tests (/repo 0, /repo N > count) - Add worktree cleanup subcommand tests (merged, stale, invalid type) - Add updateConversation assertion to repo-remove session test - Fix stale docs: architecture.md command handler section, .claude → .archon paths, remove /command-invoke from commands-reference, fix github.md example * feat(workflows)!: replace standalone loop with DAG loop node (#785) * feat(workflows): add loop node type to DAG workflows Add LoopNode as a fourth DAG node type alongside command, prompt, and bash. Loop nodes run an AI prompt repeatedly until a completion signal is detected (LLM-decided via <promise>SIGNAL</promise>) or a deterministic bash condition succeeds (until_bash exit 0). 
This enables Ralph-style autonomous iteration as a composable node within DAG workflows — upstream nodes can produce plans/task lists that feed into the loop, and downstream nodes can act on the loop's output via $nodeId.output substitution. Changes: - Add LoopNodeConfig, LoopNode interface, isLoopNode type guard - Add loop branch in parseDagNode with full validation - Extract detectCompletionSignal/stripCompletionTags to executor-shared - Add executeLoopNode function in dag-executor with iteration logic - Add nodeId field to loop iteration event interfaces - Add 17 new tests (9 loader + 8 executor) - Add archon-test-loop-dag and archon-ralph-dag default workflows The standalone loop: workflow type is preserved but deprecated. * refactor(workflows): rewrite archon-ralph-dag prompt to match command quality bar Expand the loop prompt from ~75 lines to ~430 lines with: - 7 numbered phases with checkpoints (matching archon-implement.md pattern) - Environment setup: dependency install, CLAUDE.md reading, git state check - Explicit DO/DON'T implementation rules - Per-failure-type validation handling (type-check, lint, tests, format) - Acceptance criteria verification before commit - Exact commit message template with heredoc format - Edge case handling (validation loops, blocked stories, dirty state, large stories) - File format specs for prd.json schema and progress.txt structure - Critical fix: "context is stale — re-read from disk" for fresh_context loops Also improved bash setup node (dep install, structured output delimiters, story counts) and report node (git log/diff stats, PR status check). * feat(workflows)!: remove standalone loop workflow type BREAKING: Standalone `loop:` workflows are no longer supported. Loop iteration is now exclusively a DAG node type (LoopNode). Existing loop workflows should be migrated to DAG workflows with loop nodes — see archon-ralph-dag.yaml for the pattern. 
Removed: - LoopConfig type and LoopWorkflow from WorkflowDefinition union - executeLoopWorkflow function (~600 lines) from executor.ts - Loop dispatch in executeWorkflow - Top-level loop: parsing in loader (now returns clear error message) - archon-ralph-fresh.yaml, archon-ralph-stateful.yaml, archon-test-loop.yaml - LoopEditor.tsx and loop mode from WorkflowBuilder UI - ~900 lines of standalone loop tests Kept (for DAG loop nodes): - LoopNodeConfig, LoopNode, isLoopNode - executeLoopNode in dag-executor.ts - Loop iteration events in store/event-emitter - isLoop tracking in web UI workflow store (fires for DAG loop nodes) * fix: address all review findings for loop-dag-node PR - Fix missing isDagWorkflow import in command-handler.ts (shipping bug) - Wrap substituteWorkflowVariables and getAssistantClient in try-catch with structured error output in executeLoopNode - Add onTimeout callback for idle timeout (log + user notification + abort) - Add cancellation user notification before returning failed state - Differentiate until_bash ENOENT/system errors from expected non-zero exit - Use logDir for per-iteration AI output logging (logAssistant, logTool, logStepComplete, tool_called/tool_completed events, sendStructuredEvent) - Reject retry: on loop nodes at load time (executor doesn't apply it) - Remove dead isLoop field from WorkflowStartedEvent - Fix stale error message "DAG/loop dispatch" -> "DAG dispatch" - Fix stale commitWorkflowArtifacts doc referencing "loop-based" - Fix archon-ralph-dag.yaml referencing deleted workflows - Update CLAUDE.md: "Two execution modes", add loop node to DAG description - Extract parseIdleTimeout helper (3 copies -> 1 in loader.ts) - Use isLoopNode() type guard in validateDagStructure - Simplify buildLoopNodeOptions with conditional spread - Restore loop?: never on StepWorkflow for type safety - Add tests: AI error mid-iteration, plain signal detection, false positive - Fix stale test assertion for standalone loop rejection message * 
feat: refactor Gitea adapter to community forge structure + tea CLI Moves the Gitea platform adapter from the old location (packages/server/src/adapters/gitea.ts) to the proper community forge adapter structure: packages/adapters/src/community/forge/gitea/ ├── adapter.ts # Main adapter class ├── auth.ts # parseAllowedUsers, isGiteaUserAuthorized ├── types.ts # WebhookEvent interface ├── index.ts # Barrel export └── adapter.test.ts # 43 passing tests Key changes: - Fix imports: createLogger, getArchonWorkspacesPath, getCommandFolderSearchPaths now from @archon/paths - Fix imports: cloneRepository, syncRepository, addSafeDirectory, toRepoPath, toBranchName, isWorktreePath now from @archon/git - Remove execAsync / child_process / promisify — use @archon/git functions for all git operations - auth.ts extracted from @archon/core into adapter package (mirrors GitHub adapter's auth.ts pattern) - types.ts extracted: WebhookEvent interface now standalone - Replace gh CLI hints with tea CLI in context strings: 'tea issue view N' and 'tea pr view N' - Register GiteaAdapter in packages/server/src/index.ts via @archon/adapters/community/forge/gitea import - Document GITEA_* env vars in .env.example Tests: 43 pass, 0 fail Co-authored-by: John Fitzpatrick <john@cyberfitz.org> --------- Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com> Co-authored-by: Archon <archon@dynamous.ai> Co-authored-by: Thomas <info@smartcode.diy> Co-authored-by: Rasmus Widing <152263317+Wirasm@users.noreply.github.com> Co-authored-by: Fitzy <fitzy@cyberfitz.org> Co-authored-by: John Fitzpatrick <john@cyberfitz.org>
2026-03-26 13:02:04 +00:00
COPY packages/server/package.json ./packages/server/
COPY packages/web/package.json ./packages/web/
COPY packages/workflows/package.json ./packages/workflows/
# Install ALL dependencies (including devDependencies needed for web build)
fix(docker): fix Docker build failures and add CI guard (#1022) * fix(docker): update Bun base image from 1.2 to 1.3 The lockfile was generated with Bun 1.3.x locally but the Docker image used oven/bun:1.2-slim. Bun 1.3 changed the lockfile format, causing --frozen-lockfile to fail during docker build. * fix(docker): pin Bun to exact version 1.3.9 matching lockfile Floating tag 1.3-slim resolved to 1.3.11 which has a different lockfile format than 1.3.9 used to generate bun.lock. Pin to exact patch version to prevent --frozen-lockfile failures. * fix(docker): add missing docs-web workspace package.json The docs-web package was added as a workspace member but its package.json was never added to the Dockerfile COPY steps. This caused bun install --frozen-lockfile to fail because the workspace layout in Docker didn't match the lockfile. * fix(docker): use hoisted linker for Vite/Rollup compatibility Bun's default "isolated" linker stores packages in node_modules/.bun/ with symlinks that Vite's Rollup bundler cannot resolve during production builds (e.g., remark-gfm → mdast-util-gfm chain). Using --linker=hoisted gives the classic flat node_modules layout that Rollup expects. Local dev is unaffected (Vite dev server handles the isolated layout fine). 
* ci: pin Bun version to 1.3.9 and add Docker build check - Align CI Bun version (was 1.3.11) with Dockerfile and local dev (1.3.9) to prevent lockfile format mismatches between environments - Add docker-build job to test.yml that builds the Docker image on every PR — catches Dockerfile regressions (missing workspace packages, linker issues, build failures) before they reach deploy * fix(ci): add permissions for GHA cache and tighten Bun engine - Add actions: write permission to docker-build job so GHA layer cache writes succeed on PRs from forks - Tighten package.json engines.bun from >=1.0.0 to >=1.3.9 to document the minimum version that matches the lockfile format * fix(ci): add smoke test, align Bun version across all workflows Review fixes: - Add load: true + health endpoint smoke test to docker-build CI job so we verify the image actually starts, not just compiles - Align Bun 1.3.9 in deploy-docs.yml and release.yml (were still 1.3.11) - Document why docs-web source is intentionally omitted from Docker * chore: float Docker to bun:1.3 and align CI to 1.3.11 - Dockerfile: oven/bun:1.3-slim (auto-tracks latest 1.3.x patches) - CI workflows: bun-version 1.3.11 (current latest, reproducible) - engines.bun: >=1.3.9 (minimum for local devs) Lockfile format is stable across 1.3.x patches, so this is safe. * fix(docker,ci): pin Docker to 1.3.11, loosen engines, harden smoke test - Dockerfile: pin oven/bun:1.3.11-slim (was floating 1.3-slim) so Docker builds are reproducible and match CI exactly. - package.json: loosen engines to ^1.3.0 so end users on any 1.3.x can run the CLI; CI/Docker remain pinned to the canonical latest. - CI smoke test: replace 'sleep 5' with curl --retry-connrefused, and move container cleanup to an 'if: always()' step so a failed health check no longer leaks the named container. --------- Co-authored-by: Rasmus Widing <rasmus.widing@gmail.com>
2026-04-07 07:37:47 +00:00
# --linker=hoisted: Bun's default "isolated" linker stores packages in
# node_modules/.bun/ with symlinks that Vite/Rollup cannot resolve during
# production builds. Hoisted layout gives classic flat node_modules.
RUN bun install --frozen-lockfile --linker=hoisted
# ---------------------------------------------------------------------------
# Stage 2: Build web UI (Vite + React)
# ---------------------------------------------------------------------------
FROM deps AS web-build
# Copy full source (needed for workspace resolution and web build)
COPY . .
# Build the web frontend — output goes to packages/web/dist/
RUN bun run build:web && \
    test -f packages/web/dist/index.html || \
    (echo "ERROR: Web build produced no index.html" >&2 && exit 1)
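# Hedged aside, not part of the original file: assuming Docker's standard
# multi-stage --target flag, this stage can be built on its own to debug a
# failing web build without producing the production image. The tag below
# (archon-web-build) is a hypothetical name chosen for illustration.
#   docker build --target web-build -t archon-web-build .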
# ---------------------------------------------------------------------------
# Stage 3: Production image
# ---------------------------------------------------------------------------
FROM oven/bun:1.3.11-slim AS production
# OCI Labels for GHCR
LABEL org.opencontainers.image.source="https://github.com/coleam00/Archon"
LABEL org.opencontainers.image.description="Control AI coding assistants remotely from Telegram, Slack, Discord, and GitHub"
LABEL org.opencontainers.image.licenses="MIT"
# Prevent interactive prompts during installation
ENV DEBIAN_FRONTEND=noninteractive
WORKDIR /app
# Install system dependencies + gosu for privilege dropping in entrypoint
RUN apt-get update && apt-get install -y \
    curl \
    git \
    bash \
    ca-certificates \
    gnupg \
feat: refactor Gitea adapter to community forge structure + tea CLI Moves the Gitea platform adapter from the old location (packages/server/src/adapters/gitea.ts) to the proper community forge adapter structure: packages/adapters/src/community/forge/gitea/ ├── adapter.ts # Main adapter class ├── auth.ts # parseAllowedUsers, isGiteaUserAuthorized ├── types.ts # WebhookEvent interface ├── index.ts # Barrel export └── adapter.test.ts # 43 passing tests Key changes: - Fix imports: createLogger, getArchonWorkspacesPath, getCommandFolderSearchPaths now from @archon/paths - Fix imports: cloneRepository, syncRepository, addSafeDirectory, toRepoPath, toBranchName, isWorktreePath now from @archon/git - Remove execAsync / child_process / promisify — use @archon/git functions for all git operations - auth.ts extracted from @archon/core into adapter package (mirrors GitHub adapter's auth.ts pattern) - types.ts extracted: WebhookEvent interface now standalone - Replace gh CLI hints with tea CLI in context strings: 'tea issue view N' and 'tea pr view N' - Register GiteaAdapter in packages/server/src/index.ts via @archon/adapters/community/forge/gitea import - Document GITEA_* env vars in .env.example Tests: 43 pass, 0 fail Co-authored-by: John Fitzpatrick <john@cyberfitz.org> --------- Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com> Co-authored-by: Archon <archon@dynamous.ai> Co-authored-by: Thomas <info@smartcode.diy> Co-authored-by: Rasmus Widing <152263317+Wirasm@users.noreply.github.com> Co-authored-by: Fitzy <fitzy@cyberfitz.org> Co-authored-by: John Fitzpatrick <john@cyberfitz.org>
2026-03-26 13:02:04 +00:00
gosu \
postgresql-client \
# Chromium for agent-browser E2E testing (drives browser via CDP)
chromium \
&& rm -rf /var/lib/apt/lists/*
# Install GitHub CLI
RUN curl -fsSL https://cli.github.com/packages/githubcli-archive-keyring.gpg | dd of=/usr/share/keyrings/githubcli-archive-keyring.gpg \
&& chmod go+r /usr/share/keyrings/githubcli-archive-keyring.gpg \
&& echo "deb [arch=$(dpkg --print-architecture) signed-by=/usr/share/keyrings/githubcli-archive-keyring.gpg] https://cli.github.com/packages stable main" | tee /etc/apt/sources.list.d/github-cli.list > /dev/null \
&& apt-get update \
&& apt-get install -y gh \
&& rm -rf /var/lib/apt/lists/*
feat: refactor Gitea adapter to community forge structure + tea CLI Moves the Gitea platform adapter from the old location (packages/server/src/adapters/gitea.ts) to the proper community forge adapter structure: packages/adapters/src/community/forge/gitea/ ├── adapter.ts # Main adapter class ├── auth.ts # parseAllowedUsers, isGiteaUserAuthorized ├── types.ts # WebhookEvent interface ├── index.ts # Barrel export └── adapter.test.ts # 43 passing tests Key changes: - Fix imports: createLogger, getArchonWorkspacesPath, getCommandFolderSearchPaths now from @archon/paths - Fix imports: cloneRepository, syncRepository, addSafeDirectory, toRepoPath, toBranchName, isWorktreePath now from @archon/git - Remove execAsync / child_process / promisify — use @archon/git functions for all git operations - auth.ts extracted from @archon/core into adapter package (mirrors GitHub adapter's auth.ts pattern) - types.ts extracted: WebhookEvent interface now standalone - Replace gh CLI hints with tea CLI in context strings: 'tea issue view N' and 'tea pr view N' - Register GiteaAdapter in packages/server/src/index.ts via @archon/adapters/community/forge/gitea import - Document GITEA_* env vars in .env.example Tests: 43 pass, 0 fail Co-authored-by: John Fitzpatrick <john@cyberfitz.org> --------- Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com> Co-authored-by: Archon <archon@dynamous.ai> Co-authored-by: Thomas <info@smartcode.diy> Co-authored-by: Rasmus Widing <152263317+Wirasm@users.noreply.github.com> Co-authored-by: Fitzy <fitzy@cyberfitz.org> Co-authored-by: John Fitzpatrick <john@cyberfitz.org>
2026-03-26 13:02:04 +00:00
# Install agent-browser CLI (Vercel Labs) for E2E testing workflows
# - Uses npm (not bun) because postinstall script downloads the native Rust binary
# - After install, copy the native Rust binary to /usr/local/bin, symlink agent-browser to it, and purge nodejs/npm (~60MB saved)
# - The npm entry point is a Node.js wrapper; the native binary works standalone
# - agent-browser auto-detects Docker (via /.dockerenv) and adds --no-sandbox to Chromium
RUN apt-get update && apt-get install -y --no-install-recommends nodejs npm \
&& npm install -g agent-browser@0.22.1 \
&& NATIVE_BIN=$(find /usr/local/lib/node_modules/agent-browser -name 'agent-browser-*' -type f -executable 2>/dev/null | head -1) \
&& if [ -n "$NATIVE_BIN" ]; then \
cp "$NATIVE_BIN" /usr/local/bin/agent-browser-native \
&& chmod +x /usr/local/bin/agent-browser-native \
&& ln -sf /usr/local/bin/agent-browser-native /usr/local/bin/agent-browser; \
else \
echo "ERROR: agent-browser native binary not found after npm install" >&2 && exit 1; \
fi \
&& npm cache clean --force \
&& rm -rf /usr/local/lib/node_modules/agent-browser \
&& apt-get purge -y nodejs npm \
&& apt-get autoremove -y \
&& rm -rf /var/lib/apt/lists/*
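# Optional build-time check (a sketch, not part of the original build): it assumes
# the copy/symlink step above succeeded and would fail the build early if
# /usr/local/bin/agent-browser is missing or not executable after nodejs/npm are purged.
# Left commented out so it does not add a layer unless explicitly enabled.
# RUN test -x /usr/local/bin/agent-browser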
# Point agent-browser to system Chromium (avoids ~400MB Chrome for Testing download)
ENV AGENT_BROWSER_EXECUTABLE_PATH=/usr/bin/chromium
# Pre-configure the Claude Code SDK cli.js path for any consumer that runs
# a compiled Archon binary inside (or extending) this image. In source mode
# (the default `bun run start` ENTRYPOINT), BUNDLED_IS_BINARY is false and
# this variable is ignored — the SDK resolves cli.js via node_modules. Kept
# here so extenders don't need to rediscover the path.
# Path matches the hoisted layout produced by `bun install --linker=hoisted`.
ENV CLAUDE_BIN_PATH=/app/node_modules/@anthropic-ai/claude-agent-sdk/cli.js
# Create non-root user for running Claude Code
# Claude Code refuses to run with --dangerously-skip-permissions as root for security
RUN useradd -m -u 1001 -s /bin/bash appuser \
&& chown -R appuser:appuser /app
# Create Archon directories
RUN mkdir -p /.archon/workspaces /.archon/worktrees \
&& chown -R appuser:appuser /.archon
# Copy root package files and lockfile
COPY package.json bun.lock ./
# Copy ALL workspace package.json files
COPY packages/adapters/package.json ./packages/adapters/
COPY packages/cli/package.json ./packages/cli/
COPY packages/core/package.json ./packages/core/
# docs-web source is NOT copied — it's a static site deployed separately
# (see .github/workflows/deploy-docs.yml). package.json is included only
# so Bun's workspace lockfile resolves correctly.
COPY packages/docs-web/package.json ./packages/docs-web/
feat: refactor Gitea adapter to community forge structure + tea CLI Moves the Gitea platform adapter from the old location (packages/server/src/adapters/gitea.ts) to the proper community forge adapter structure: packages/adapters/src/community/forge/gitea/ ├── adapter.ts # Main adapter class ├── auth.ts # parseAllowedUsers, isGiteaUserAuthorized ├── types.ts # WebhookEvent interface ├── index.ts # Barrel export └── adapter.test.ts # 43 passing tests Key changes: - Fix imports: createLogger, getArchonWorkspacesPath, getCommandFolderSearchPaths now from @archon/paths - Fix imports: cloneRepository, syncRepository, addSafeDirectory, toRepoPath, toBranchName, isWorktreePath now from @archon/git - Remove execAsync / child_process / promisify — use @archon/git functions for all git operations - auth.ts extracted from @archon/core into adapter package (mirrors GitHub adapter's auth.ts pattern) - types.ts extracted: WebhookEvent interface now standalone - Replace gh CLI hints with tea CLI in context strings: 'tea issue view N' and 'tea pr view N' - Register GiteaAdapter in packages/server/src/index.ts via @archon/adapters/community/forge/gitea import - Document GITEA_* env vars in .env.example Tests: 43 pass, 0 fail Co-authored-by: John Fitzpatrick <john@cyberfitz.org> --------- Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com> Co-authored-by: Archon <archon@dynamous.ai> Co-authored-by: Thomas <info@smartcode.diy> Co-authored-by: Rasmus Widing <152263317+Wirasm@users.noreply.github.com> Co-authored-by: Fitzy <fitzy@cyberfitz.org> Co-authored-by: John Fitzpatrick <john@cyberfitz.org>
2026-03-26 13:02:04 +00:00
COPY packages/git/package.json ./packages/git/
COPY packages/isolation/package.json ./packages/isolation/
COPY packages/paths/package.json ./packages/paths/
COPY packages/providers/package.json ./packages/providers/
COPY packages/server/package.json ./packages/server/
COPY packages/web/package.json ./packages/web/
COPY packages/workflows/package.json ./packages/workflows/
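# Install production dependencies only. --ignore-scripts skips the husky prepare hook;
# --linker=hoisted installs the classic flat node_modules layout instead of Bun's default isolated one.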
RUN bun install --frozen-lockfile --production --ignore-scripts --linker=hoisted
domain prefix to all log events per CLAUDE.md convention - Add identifier/isolationEnvId context to repo_switch_failed and worktree_remove_failed logs - Capture isCurrentCodebase before mutation in handleRepoRemoveCommand - Hoist duplicated workflowCwd computation in handleWorkflowCommand - Remove stale (Phase 3D) comment marker docs: - Remove all /command-invoke references from CLAUDE.md, README.md, docs/architecture.md, and .claude/rules/orchestrator.md - Update command list to match actual handleCommand cases - Replace outdated routing examples with current AI router pattern * refactor: remove MAX_WORKTREES_PER_CODEBASE limit Worktree count is no longer restricted. Remove the constant, the limit field from WorktreeStatusBreakdown, the limit_reached block reason, formatWorktreeLimitMessage, and all associated tests. * fix: address review findings — error handling, log prefixes, tests, docs - Wrap workflow list discoverWorkflowsWithConfig in try/catch (was the only unprotected async call among workflow subcommands) - Cast error to Error before logging in workflow cancel/status catch blocks - Add cmd. domain prefix to all command-handler log events (12 events) - Update worktree create test to use UUID isolation_env_id with DB lookup - Add resolveRepoArg boundary tests (/repo 0, /repo N > count) - Add worktree cleanup subcommand tests (merged, stale, invalid type) - Add updateConversation assertion to repo-remove session test - Fix stale docs: architecture.md command handler section, .claude → .archon paths, remove /command-invoke from commands-reference, fix github.md example * feat(workflows)!: replace standalone loop with DAG loop node (#785) * feat(workflows): add loop node type to DAG workflows Add LoopNode as a fourth DAG node type alongside command, prompt, and bash. Loop nodes run an AI prompt repeatedly until a completion signal is detected (LLM-decided via <promise>SIGNAL</promise>) or a deterministic bash condition succeeds (until_bash exit 0). 
This enables Ralph-style autonomous iteration as a composable node within DAG workflows — upstream nodes can produce plans/task lists that feed into the loop, and downstream nodes can act on the loop's output via $nodeId.output substitution. Changes: - Add LoopNodeConfig, LoopNode interface, isLoopNode type guard - Add loop branch in parseDagNode with full validation - Extract detectCompletionSignal/stripCompletionTags to executor-shared - Add executeLoopNode function in dag-executor with iteration logic - Add nodeId field to loop iteration event interfaces - Add 17 new tests (9 loader + 8 executor) - Add archon-test-loop-dag and archon-ralph-dag default workflows The standalone loop: workflow type is preserved but deprecated. * refactor(workflows): rewrite archon-ralph-dag prompt to match command quality bar Expand the loop prompt from ~75 lines to ~430 lines with: - 7 numbered phases with checkpoints (matching archon-implement.md pattern) - Environment setup: dependency install, CLAUDE.md reading, git state check - Explicit DO/DON'T implementation rules - Per-failure-type validation handling (type-check, lint, tests, format) - Acceptance criteria verification before commit - Exact commit message template with heredoc format - Edge case handling (validation loops, blocked stories, dirty state, large stories) - File format specs for prd.json schema and progress.txt structure - Critical fix: "context is stale — re-read from disk" for fresh_context loops Also improved bash setup node (dep install, structured output delimiters, story counts) and report node (git log/diff stats, PR status check). * feat(workflows)!: remove standalone loop workflow type BREAKING: Standalone `loop:` workflows are no longer supported. Loop iteration is now exclusively a DAG node type (LoopNode). Existing loop workflows should be migrated to DAG workflows with loop nodes — see archon-ralph-dag.yaml for the pattern. 
Removed: - LoopConfig type and LoopWorkflow from WorkflowDefinition union - executeLoopWorkflow function (~600 lines) from executor.ts - Loop dispatch in executeWorkflow - Top-level loop: parsing in loader (now returns clear error message) - archon-ralph-fresh.yaml, archon-ralph-stateful.yaml, archon-test-loop.yaml - LoopEditor.tsx and loop mode from WorkflowBuilder UI - ~900 lines of standalone loop tests Kept (for DAG loop nodes): - LoopNodeConfig, LoopNode, isLoopNode - executeLoopNode in dag-executor.ts - Loop iteration events in store/event-emitter - isLoop tracking in web UI workflow store (fires for DAG loop nodes) * fix: address all review findings for loop-dag-node PR - Fix missing isDagWorkflow import in command-handler.ts (shipping bug) - Wrap substituteWorkflowVariables and getAssistantClient in try-catch with structured error output in executeLoopNode - Add onTimeout callback for idle timeout (log + user notification + abort) - Add cancellation user notification before returning failed state - Differentiate until_bash ENOENT/system errors from expected non-zero exit - Use logDir for per-iteration AI output logging (logAssistant, logTool, logStepComplete, tool_called/tool_completed events, sendStructuredEvent) - Reject retry: on loop nodes at load time (executor doesn't apply it) - Remove dead isLoop field from WorkflowStartedEvent - Fix stale error message "DAG/loop dispatch" -> "DAG dispatch" - Fix stale commitWorkflowArtifacts doc referencing "loop-based" - Fix archon-ralph-dag.yaml referencing deleted workflows - Update CLAUDE.md: "Two execution modes", add loop node to DAG description - Extract parseIdleTimeout helper (3 copies -> 1 in loader.ts) - Use isLoopNode() type guard in validateDagStructure - Simplify buildLoopNodeOptions with conditional spread - Restore loop?: never on StepWorkflow for type safety - Add tests: AI error mid-iteration, plain signal detection, false positive - Fix stale test assertion for standalone loop rejection message * 
feat: refactor Gitea adapter to community forge structure + tea CLI Moves the Gitea platform adapter from the old location (packages/server/src/adapters/gitea.ts) to the proper community forge adapter structure: packages/adapters/src/community/forge/gitea/ ├── adapter.ts # Main adapter class ├── auth.ts # parseAllowedUsers, isGiteaUserAuthorized ├── types.ts # WebhookEvent interface ├── index.ts # Barrel export └── adapter.test.ts # 43 passing tests Key changes: - Fix imports: createLogger, getArchonWorkspacesPath, getCommandFolderSearchPaths now from @archon/paths - Fix imports: cloneRepository, syncRepository, addSafeDirectory, toRepoPath, toBranchName, isWorktreePath now from @archon/git - Remove execAsync / child_process / promisify — use @archon/git functions for all git operations - auth.ts extracted from @archon/core into adapter package (mirrors GitHub adapter's auth.ts pattern) - types.ts extracted: WebhookEvent interface now standalone - Replace gh CLI hints with tea CLI in context strings: 'tea issue view N' and 'tea pr view N' - Register GiteaAdapter in packages/server/src/index.ts via @archon/adapters/community/forge/gitea import - Document GITEA_* env vars in .env.example Tests: 43 pass, 0 fail Co-authored-by: John Fitzpatrick <john@cyberfitz.org> --------- Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com> Co-authored-by: Archon <archon@dynamous.ai> Co-authored-by: Thomas <info@smartcode.diy> Co-authored-by: Rasmus Widing <152263317+Wirasm@users.noreply.github.com> Co-authored-by: Fitzy <fitzy@cyberfitz.org> Co-authored-by: John Fitzpatrick <john@cyberfitz.org>
2026-03-26 13:02:04 +00:00
# Copy application source (Bun runs TypeScript directly, no compile step needed)
COPY packages/adapters/ ./packages/adapters/
COPY packages/cli/ ./packages/cli/
COPY packages/core/ ./packages/core/
COPY packages/git/ ./packages/git/
COPY packages/isolation/ ./packages/isolation/
COPY packages/paths/ ./packages/paths/
COPY packages/providers/ ./packages/providers/
COPY packages/server/ ./packages/server/
COPY packages/workflows/ ./packages/workflows/
# Copy pre-built web UI from build stage
COPY --from=web-build /app/packages/web/dist/ ./packages/web/dist/
# Copy config, migrations, and bundled defaults
COPY .archon/ ./.archon/
COPY migrations/ ./migrations/
COPY tsconfig*.json ./
# Fix permissions for appuser
RUN chown -R appuser:appuser /app
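# Alternative sketch (an assumption, not part of the original build): because
# `RUN chown -R` changes ownership in a separate layer, the files under /app
# may be stored twice in the image. Setting ownership at copy time avoids that,
# provided appuser already exists at this point in the stage, for example:
#   COPY --chown=appuser:appuser packages/server/ ./packages/server/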
# Create .codex directory for Codex authentication
RUN mkdir -p /home/appuser/.codex && chown appuser:appuser /home/appuser/.codex
# Configure git to trust Archon directories (as appuser)
RUN gosu appuser git config --global --add safe.directory '/.archon/workspaces' && \
    gosu appuser git config --global --add safe.directory '/.archon/workspaces/*' && \
    gosu appuser git config --global --add safe.directory '/.archon/worktrees' && \
    gosu appuser git config --global --add safe.directory '/.archon/worktrees/*'
# Copy entrypoint script (fixes volume permissions, drops to appuser)
# sed strips Windows CRLF in case .gitattributes eol=lf was bypassed
COPY docker-entrypoint.sh /usr/local/bin/
RUN sed -i 's/\r$//' /usr/local/bin/docker-entrypoint.sh \
&& chmod +x /usr/local/bin/docker-entrypoint.sh
# Default port (matches .env.example PORT=3000)
EXPOSE 3000
ENTRYPOINT ["docker-entrypoint.sh"]