onju-v2

mirror of https://github.com/justLV/onju-v2 synced 2026-04-21 07:37:34 +00:00

Author	SHA1	Message	Date
justLV	18e9f9d541	add link to battery base for m5 echo	2026-04-13 10:01:03 -07:00
justLV	55a4019a8a	add link to battery base for m5 echo Updated M5 Echo support description with battery base link.	2026-04-13 10:00:33 -07:00
justLV	126ed40a20	add order link	2026-04-13 00:13:16 -07:00
justLV	a2106d032b	title	2026-04-12 19:29:50 -07:00
justLV	44c7be03b8	Only persist assistant history after successful TTS delivery Move the conversational backend's _finalize() call out of stream()'s finally block and expose it as a public commit(text) method instead. The previous behavior persisted whatever was yielded even when the stream errored, the user interrupted, or a TTS send failed — so saved history diverged from what the user actually heard, and the next turn replayed phantom context to the LLM. main.py now calls backend.commit(response_text) only after a turn successfully completes and produced content. Agentic backend gets a no-op commit() since history lives on the remote service.	2026-04-12 19:11:09 -07:00
justLV	0bc3ae209f	Pre-publish fixes: local TTS key, multicast crash guard, doc drift - Rename TTS backend "qwen3" -> "local" across tts.py and README; the code is a generic /v1/audio/speech client, not qwen-specific, and config.yaml.example already used the local: key. - Guard multicast_listener against non-UTF8 and empty packets so a single bad announcement packet can't cancel the pipeline via gather. - Fix credentials.h.template comments to reference flash.sh (not the old flash_firmware.sh name). - Drop stray test.wav arg from serial_monitor.py usage example in README; the script takes an optional serial port, not an audio file.	2026-04-12 19:09:58 -07:00
justLV	002ed7388d	Refine stall classifier prompt and group benchmark cases by label Rework the stall prompt to distinguish LOOKUP (say something specific, three-to-seven words) from ACTION (content-free backchannel, two-to-five words, no action verbs or promises) and restructure test_stall.py to group cases by expected label for easier manual review.	2026-04-12 19:08:40 -07:00
justLV	9ae918009b	Move Streaming and stalls section below OpenClaw setup	2026-04-12 15:01:23 -07:00
justLV	f0f6e38e7c	Move test scripts into tests/ and add stall/stream benchmarks - git mv test_client, test_mic, test_speaker into tests/ - Add tests/test_stall.py (benchmarks the Gemini stall classifier against conversational/fetch/capture/act/follow-up queries) - Add tests/test_stream.py (raw SSE chunk inspection against the agentic gateway) - Update config path resolution in the new tests to climb one level - Update README Testing section with new tests/ paths	2026-04-12 14:22:52 -07:00
justLV	04990145ae	Embed git hash in m5_echo, generate git_hash.h in flash.sh Moves git_hash.h generation from the one-shot setup-git-hash.sh post-commit hook into flash.sh, so fresh checkouts don't need a bootstrap step. Only rewrites the file when the hash changes to avoid triggering unnecessary recompiles. Also wires GIT_HASH into m5_echo: startup log, multicast announce, and the Device: line. Both sketches now append the hash to Device:.	2026-04-12 14:18:21 -07:00
justLV	dccb6ced15	Stream agentic LLM responses, add contextual stall classifier, rename backends - SSE sentence-level streaming: consume agent deltas, split on sentence boundaries (handles no-space chunk joins), synthesize+send each sentence as it forms; intermediate sends keep mic_timeout=0 - Gemini-backed stall classifier for agentic mode only: narrow to retrieval-only, pass prev user/assistant for context awareness, avoid action promises the stall can't honor, sub-second latency via reasoning_effort=none - Rename backends: local -> conversational, managed -> agentic (files, classes, config keys) - PTT interrupt fix: set device.interrupted when button-press frames arrive mid-response and keep buffering so the next utterance captures cleanly instead of being dropped - Startup summary log showing ASR, LLM, STALL, and TTS config at a glance - run.sh launcher with Homebrew libopus path for macOS - voice_prompt config for per-turn agentic reminders; inline continuity note injection so the agent knows what the stall just said aloud - README section on streaming, stalls, and the first-turn OpenClaw caveat	2026-04-12 13:55:59 -07:00
justLV	19aca75ba8	Add separate default voice for PTT devices PTT devices have smaller speakers that don't carry bass well, so default them to a female voice (Emma) while keeping Archer as the VOX default. Per-device voice overrides still take precedence.	2026-04-12 12:23:24 -07:00
justLV	a2ab42929c	Use m5stack_atom board for Atom Echo, MAC-based hostname flash.sh: switch m5_echo target from generic pico32 @ 115200 baud to esp32:esp32:m5stack_atom @ 1500000 baud — ~8x faster uploads and correct partition scheme (3MB app vs 1.3MB). m5_echo: derive hostname as m5-echo-XXYYZZ from the WiFi STA MAC, matching onjuino's pattern, and print Device line after Opus init.	2026-04-10 16:50:51 -07:00
justLV	28040a77bb	Fix red-flash after interrupt, VAD LED gate, MAC readout, logging On playback interrupt both Opus and PCM paths now close the TCP socket instead of trying a frame-aligned drain. The drain was misaligned when an interrupt hit mid-frame, leaving stray bytes that the next persistent- loop iteration read as a garbage "header" and flashed the error red LED. The bridge opens a fresh TCP per audio push anyway, so closing is safe. 0xCC LED blink handler now treats level=0 as a no-op heartbeat. setLed is an overwrite (not cumulative), so a level=0 write would abruptly truncate an in-flight fade. Only level>0 writes touch LED state or extend mic_timeout — leaves headroom for future keepalive schemes without freezing ongoing fades. MAC address now read via esp_read_mac(ESP_MAC_WIFI_STA) instead of WiFi.macAddress(). On arduino-esp32 v3.x the latter returns all zeros before WiFi is fully initialized, producing a hostname like "onju-000000". eFuse read works unconditionally. Log a "Device: <hostname> @ <ip>" line right after the mic offset on boot so it's easy to spot the IP in serial traces, and label the saved server IP log as "Saved server IP" so it's not confused with the device's own IP.	2026-04-10 16:32:55 -07:00
justLV	c962d3efbf	Gate double-tap on prior normal tap; recover from TCP stalls Double-tap to disable now requires a previously completed standalone normal tap (one whose double-tap window expired without a 2nd tap). Cold start and re-enable both begin locked, so tap-tap can no longer disable on first interaction or back-to-back after re-enabling. handleShortPress reports whether the tap was a real action so no-ops (mute / no server) and re-enables don't satisfy the prerequisite. Center-touch debounce dropped to 150ms and the double-tap window bumped to 700ms so the 2nd tap has real slack. Both Opus and PCM playback loops now break out and force-close the TCP socket if no bytes arrive for 2s, instead of spinning while the I2S DMA buffer loops the last chunk. Inner Opus reads also poll interruptPlayback so user double-taps actually unblock a stalled read. isPlaying is no longer cleared in the touch handlers — playback cleanup clears it after I2S DMA is zeroed, so the mic loop can't reopen while the speaker tail is audible.	2026-04-10 15:34:30 -07:00
justLV	dd42fdb668	Simplify device state model, reduce TCP churn, fix multi-device identity Replace callActive/sendDisconnect with deviceEnabled toggle — device starts enabled on boot, double-tap disables, single-tap re-enables. Removes 0xFF disconnect packets (bridge detects via silence + refused audio). Generates unique hostname from MAC suffix (onju-A1B2C3). Restructure onjuino TCP handling to persistent connection loop (500ms header timeout) so LED blinks reuse one connection per VAD session instead of opening 4-10 connections/sec. Pipeline tracks VAD recording transitions, opens/closes LED TCP accordingly. Remove send_stop_listening from pipeline — mic stays active during ASR/LLM/TTS for better interrupt detection, eliminates zombie-state failure mode. greet_device always sends 0xCC LED pulse for IP registration. Fix config chunk_size 480→512 to match device.	2026-04-09 13:32:23 -07:00
justLV	502187efdc	Set Archer as default ElevenLabs voice in example config	2026-04-08 16:28:00 -07:00
justLV	4f0fbaafda	Change tap LED from green to white for consistency	2026-04-08 14:02:23 -07:00
justLV	fcc2ef284b	Fix Opus TCP read race: check available() after disconnect in frame read loop	2026-04-08 14:00:33 -07:00
justLV	260fbea9eb	Fix onjuino interaction description (VAD, not double-tap), update m5_echo README terminology	2026-04-08 13:49:09 -07:00
justLV	2943b07596	minor	2026-04-08 13:47:53 -07:00
justLV	742d31bcd7	readme	2026-04-08 13:45:01 -07:00
justLV	bf1ceb3e69	Remove redundant top-level default_voice from TTS config device.py now reads default_voice from tts.elevenlabs directly.	2026-04-08 13:37:20 -07:00
justLV	09f218b80d	Add OpenClaw setup script and documentation Script enables gateway chat completions endpoint, appends voice mode prompt to AGENTS.md (idempotent), and restarts the gateway.	2026-04-08 13:22:09 -07:00
justLV	36f4988867	readme	2026-04-08 13:08:12 -07:00
justLV	188aeae1c6	Remove voice agent section from README, fix diagram arrows	2026-04-08 13:06:54 -07:00
justLV	3e70ad5ee5	Move Schematic.pdf from images/ to hardware/	2026-04-08 13:02:08 -07:00
justLV	b2586d2c61	Update schematic PDF to latest revision	2026-04-08 13:01:02 -07:00
justLV	398f89dca7	Prepare repo for v2 release: rewrite README, clean up dev scripts, embed ASR server - Rewrite README with v2 features (OpenClaw, M5 Echo, Opus, pluggable backends), fold ARCHITECTURE.md and PIPELINE.md content inline - Remove dev-only test scripts (streaming TTS, UDP recv, qwen3 bench, etc.) - Remove redundant m5_echo/flash.sh and terminal.py (root scripts handle both) - Consolidate credentials to .template naming, remove .example - Embed parakeet-mlx ASR server as optional dependency (pipeline/services/asr_server.py) - Default LLM to Claude Haiku 4.5 via OpenRouter, local example uses Gemma 4 E4B - Update pyproject.toml with metadata, bump to 2.0.0 - Clean up .gitignore	2026-04-08 13:00:15 -07:00
justLV	81452009d7	Check for .ino.bin artifact to detect stale/missing builds Look for the actual firmware binary (.ino.bin) instead of any .bin when deciding whether to skip compilation. If the build dir exists but the artifact is missing, force a recompile automatically.	2026-04-08 10:53:58 -07:00
justLV	7b734b96b8	Fix firmware: callActive on boot, 0xDD timeout units, opus frame read - Set callActive=true on boot for both VOX and PTT modes - Fix 0xDD mic timeout: multiply by 1000 (was treating seconds as ms) - Mic timeout no longer kills callActive — only double-tap ends call - Fix opus frame length read: loop until both bytes arrive (was reading 1 byte + uninitialized garbage → invalid frame lengths like 18605) - Drain TCP on invalid frame to avoid corrupting next connection	2026-04-07 20:24:02 -07:00
justLV	19d48d4e3c	OpenClaw managed backend, VAD-aware interrupt, firmware fixes - Add managed conversation backend for OpenClaw (x-openclaw-message-channel header, user field for device identity) - Replace aggressive interrupt logic with VAD-aware check: only interrupt on actual speech, not background noise/trailing packets - Fix 0xDD timeout units (was milliseconds, now seconds) and keep callActive alive with 30s hold during LLM+TTS processing - Set callActive on boot for VOX mode so device accepts audio without tap - Mic timeout no longer kills callActive — only double-tap ends the call - LED feedback: scale to configured led_power, let device handle fade-down - Add greeting toggle, TTS/SEND logging, pyserial dep, setuptools config	2026-04-07 20:16:33 -07:00
justLV	a3ac260e1c	Remove old flash_firmware.sh, replaced by unified flash.sh	2026-04-07 19:33:16 -07:00
justLV	781945fa56	Unify flash scripts, auto-install Arduino libs, fix SSID number selection Combine flash_firmware.sh and m5_echo/flash.sh into a single flash.sh that takes a target arg (onjuino default, m5_echo). Auto-installs required Arduino libraries (Adafruit NeoPixel, esp32_opus). Typing a number at the WiFi SSID prompt now selects the corresponding network.	2026-04-07 19:28:57 -07:00
justLV	74890f3202	Fix crash: defer UDP disconnect signal to task context handleDoubleTap runs in an ISR where UDP operations cause a scheduler assert (prvSelectHighestPriorityTaskSMP). Move the disconnect signal send to touchTask via a volatile flag. Also allow numeric selection in flash_firmware.sh WiFi picker.	2026-04-07 17:24:33 -07:00
justLV	a8cb4b9576	Remove 0xDD thinking LED, add delay between disconnect signals The green "thinking" pulse on mic stop was from the old local pipeline. With the sesame bridge, 0xDD is only used at call end — no LED needed.	2026-04-07 17:20:16 -07:00
justLV	1eaaddbc26	Add 5ms delay between disconnect signal UDP sends Spreads the 3 packets across different network frames for better reliability against packet loss.	2026-04-07 17:09:36 -07:00
justLV	a91cb8a879	Send UDP disconnect signal on double-tap (0xFF byte, 3x) Bridge detects the 1-byte UDP packet and ends the call instantly, replacing timeout-based disconnect detection.	2026-04-07 17:07:16 -07:00
justLV	e4d7bc7ca5	End-of-speech protocol, LED tweaks, call-end guard - Handle zero-length Opus frame (0x00 0x00) as end-of-speech marker: exits opusDecodeTask cleanly, clears isPlaying, re-enables mic - Zero I2S DMA buffer on opusDecodeTask exit (prevents stale DMA) - Reject 0xAA audio commands when callActive is false (prevents bridge from restarting playback after user double-tapped to end) - Don't reset mic_timeout after playback if call was ended - LED: white flash for tap/interrupt, red-orange for call end - Pipeline: append end-of-speech marker to Opus TCP payload - ARCHITECTURE.md: document end-of-speech marker protocol	2026-04-07 16:41:59 -07:00
justLV	7bcb94833c	Add PTT device support, IIR DC offset fix, control API, test script updates PTT devices (--device name=ip:ptt): skip VAD, buffer audio until packets stop, skip LED commands, interrupt in-flight responses on new audio. Auto-detected from multicast "PTT" announcement. HTTP control server on :3002 for runtime device management: POST/GET/DELETE /devices Firmware: replace per-chunk DC offset with IIR filter to eliminate zipper noise at chunk boundaries (m5_echo + onjuino). Protocol: TCP timeouts use actual timeout param, failures are silent for non-critical commands (LED blink). Pipeline: labeled error logging (ASR/LLM/TTS), env var resolution warning, Gemini OpenAI-compatible endpoint support. Test scripts: rewritten to use pipeline modules, delete redundant test_opus_tts.py, add pyproject.toml (replaces requirements.txt).	2026-04-06 14:22:20 -07:00
justLV	fe8e71131b	Move conversation persistence into local backend, simplify DeviceManager LocalConversation now owns its own per-device message files (data/conversations/{hostname}.json) controlled by persist_dir config. DeviceManager becomes a pure in-memory device registry — devices re-announce via multicast on boot so no persistence needed. Removes --persist CLI flag.	2026-04-06 11:55:08 -07:00
justLV	faea573ab9	Extract conversation layer from device, delete deprecated server/ Separate memory/context/conversation into a pluggable backend so the LLM layer can be swapped without touching device or main. Two backends: - local: manages message history, sends full context to any OpenAI-compatible endpoint - managed: delegates to a remote service (OpenClaw) that owns session state Also: rename persist_file -> registry_file, remove unused services/llm.py, delete old server/ directory.	2026-04-06 11:31:38 -07:00
justLV	3c133ef40e	Add LED toggle, TCP_NODELAY, volume controls, --no-monitor flag - L command toggles LED on/off (persisted to NVS) to reduce power jitter - +/- commands adjust volume live, including during playback - tcpServer.setNoDelay(true) to reduce TCP latency - flash.sh --no-monitor flag to skip serial monitor after upload	2026-04-03 17:06:44 -07:00
justLV	4cd008d822	Fix I2S stereo interleaving, persist server IP, add volume controls Write each sample as L+R stereo pair for ALL_RIGHT I2S format — previous mono writes dropped every other sample causing aliasing/sibilance on speech. Set I2S rate to 16kHz (was 8kHz with broken 2x assumption). Also: save server IP to NVS for auto-reconnect on reboot, add +/- serial volume commands (work during playback), lower default volume to 5.	2026-04-03 16:16:48 -07:00
justLV	529981de54	Add M5Stack ATOM Echo PTT firmware and onjuino PTT mode flag New m5_echo/ firmware for the ATOM Echo (ESP32-PICO-D4) with push-to-talk: - Auto-starts call on boot via PTT multicast announcement - Button hold = record mic (PDM, mu-law), release = listen - Persistent TCP connection survives PTT cycles (Opus task discards frames during PTT instead of closing connection) - Handles ESP32 I2S ALL_RIGHT stereo quirks (2x sample rate compensation for both mic and speaker) - Includes flash script, serial terminal, and integration test tools Also adds PTT_MODE flag to onjuino for bridge compatibility (multicast announcement, auto-start call, skip VAD mic timeouts).	2026-04-03 15:36:42 -07:00
justLV	ede39e0c67	Compile before port detection so skip-compile message is visible	2026-03-30 11:29:17 -07:00
justLV	d1e115c272	Fix silent exit when no USB device connected The ls glob failure was caught by set -e before the error message could print.	2026-03-30 11:27:49 -07:00
justLV	ea9385b74d	Fix WiFi detection on macOS Tahoe (SSID redacted from APIs) Show preferred networks list and let user confirm instead of silently picking the wrong network from Keychain.	2026-03-30 11:26:36 -07:00
justLV	1a475c2f4c	Auto-detect WiFi from macOS Keychain, skip recompile when unchanged flash_firmware.sh now generates credentials.h from template using the system's current WiFi SSID and Keychain password (with interactive fallback). Skips compilation when source files haven't changed. Adds --regen and --force flags. Also switches center touch from long-press to double-tap for ending calls.	2026-03-29 20:00:59 -07:00
justLV	daeaba9bf8	Move touch polling to FreeRTOS task (fix long press during playback) Long-press detection was in loop() which blocks during TCP audio handling. Moved to dedicated touchTask on Core 1 that polls every 20ms regardless of what loop() is doing.	2026-03-27 19:15:16 -07:00
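
Note on commit dccb6ced15 (sentence-level streaming): the message describes consuming agent deltas, splitting on sentence boundaries (including chunks joined without a space), and synthesizing each sentence as it forms. The following is a minimal Python sketch of that idea only, not the repository's code. The name split_sentences and the boundary regex are assumptions made for illustration; the real pipeline may use different rules and would hand each sentence to TTS rather than collecting them in a list.

```python
import re
from typing import Iterable, Iterator

# Assumed boundary rule (hypothetical, not taken from the repo): a sentence
# ends at '.', '!' or '?' when followed by whitespace, or directly by an
# uppercase letter, which covers chunks that were joined without a space.
_BOUNDARY = re.compile(r"(?<=[.!?])(?:\s+|(?=[A-Z]))")


def split_sentences(deltas: Iterable[str]) -> Iterator[str]:
    """Yield complete sentences as soon as streamed text deltas form them."""
    buffer = ""
    for delta in deltas:
        buffer += delta
        while True:
            match = _BOUNDARY.search(buffer)
            if match is None:
                break  # no complete sentence yet, wait for more text
            sentence = buffer[:match.start()].strip()
            buffer = buffer[match.end():]
            if sentence:
                yield sentence
    # Flush whatever is left once the stream ends.
    tail = buffer.strip()
    if tail:
        yield tail


if __name__ == "__main__":
    chunks = ["Sure", ".", "Here is", " the weather. It", " is sunny", "!Bye"]
    for sentence in split_sentences(chunks):
        print(sentence)
```

A trailing '.' is held back until the next delta arrives, so a decimal such as "3.14" split across chunks is not cut in half. Abbreviations like "U.S. Army" would still be split by this simple rule.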

97 commits