Commit graph

97 commits

Author SHA1 Message Date
justLV
18e9f9d541
add link to battery base for m5 echo 2026-04-13 10:01:03 -07:00
justLV
55a4019a8a
add link to battery base for m5 echo
Updated M5 Echo support description with battery base link.
2026-04-13 10:00:33 -07:00
justLV
126ed40a20
add order link 2026-04-13 00:13:16 -07:00
justLV
a2106d032b
title 2026-04-12 19:29:50 -07:00
justLV
44c7be03b8 Only persist assistant history after successful TTS delivery
Move the conversational backend's _finalize() call out of stream()'s
finally block and expose it as a public commit(text) method instead.
The previous behavior persisted whatever was yielded even when the
stream errored, the user interrupted, or a TTS send failed — so saved
history diverged from what the user actually heard, and the next turn
replayed phantom context to the LLM.

main.py now calls backend.commit(response_text) only after a turn
successfully completes and produced content. Agentic backend gets a
no-op commit() since history lives on the remote service.
2026-04-12 19:11:09 -07:00
justLV
0bc3ae209f Pre-publish fixes: local TTS key, multicast crash guard, doc drift
- Rename TTS backend "qwen3" -> "local" across tts.py and README; the
  code is a generic /v1/audio/speech client, not qwen-specific, and
  config.yaml.example already used the local: key.
- Guard multicast_listener against non-UTF8 and empty packets so a
  single bad announcement packet can't cancel the pipeline via gather.
- Fix credentials.h.template comments to reference flash.sh (not the
  old flash_firmware.sh name).
- Drop stray test.wav arg from serial_monitor.py usage example in
  README; the script takes an optional serial port, not an audio file.
2026-04-12 19:09:58 -07:00
justLV
002ed7388d Refine stall classifier prompt and group benchmark cases by label
Rework the stall prompt to distinguish LOOKUP (say something specific,
three-to-seven words) from ACTION (content-free backchannel, two-to-five
words, no action verbs or promises) and restructure test_stall.py to
group cases by expected label for easier manual review.
2026-04-12 19:08:40 -07:00
justLV
9ae918009b Move Streaming and stalls section below OpenClaw setup 2026-04-12 15:01:23 -07:00
justLV
f0f6e38e7c Move test scripts into tests/ and add stall/stream benchmarks
- git mv test_client, test_mic, test_speaker into tests/
- Add tests/test_stall.py (benchmarks the Gemini stall classifier against
  conversational/fetch/capture/act/follow-up queries)
- Add tests/test_stream.py (raw SSE chunk inspection against the agentic
  gateway)
- Update config path resolution in the new tests to climb one level
- Update README Testing section with new tests/ paths
2026-04-12 14:22:52 -07:00
justLV
04990145ae Embed git hash in m5_echo, generate git_hash.h in flash.sh
Moves git_hash.h generation from the one-shot setup-git-hash.sh
post-commit hook into flash.sh, so fresh checkouts don't need a
bootstrap step. Only rewrites the file when the hash changes to
avoid triggering unnecessary recompiles.

Also wires GIT_HASH into m5_echo: startup log, multicast announce,
and the Device: line. Both sketches now append the hash to Device:.
2026-04-12 14:18:21 -07:00
justLV
dccb6ced15 Stream agentic LLM responses, add contextual stall classifier, rename backends
- SSE sentence-level streaming: consume agent deltas, split on sentence
  boundaries (handles no-space chunk joins), synthesize+send each sentence
  as it forms; intermediate sends keep mic_timeout=0
- Gemini-backed stall classifier for agentic mode only: narrow to
  retrieval-only, pass prev user/assistant for context awareness, avoid
  action promises the stall can't honor, sub-second latency via
  reasoning_effort=none
- Rename backends: local -> conversational, managed -> agentic
  (files, classes, config keys)
- PTT interrupt fix: set device.interrupted when button-press frames
  arrive mid-response and keep buffering so the next utterance captures
  cleanly instead of being dropped
- Startup summary log showing ASR, LLM, STALL, and TTS config at a glance
- run.sh launcher with Homebrew libopus path for macOS
- voice_prompt config for per-turn agentic reminders; inline continuity
  note injection so the agent knows what the stall just said aloud
- README section on streaming, stalls, and the first-turn OpenClaw caveat
2026-04-12 13:55:59 -07:00
justLV
19aca75ba8 Add separate default voice for PTT devices
PTT devices have smaller speakers that don't carry bass well, so default
them to a female voice (Emma) while keeping Archer as the VOX default.
Per-device voice overrides still take precedence.
2026-04-12 12:23:24 -07:00
justLV
a2ab42929c Use m5stack_atom board for Atom Echo, MAC-based hostname
flash.sh: switch m5_echo target from generic pico32 @ 115200 baud to
esp32:esp32:m5stack_atom @ 1500000 baud — ~8x faster uploads and
correct partition scheme (3MB app vs 1.3MB).

m5_echo: derive hostname as m5-echo-XXYYZZ from the WiFi STA MAC,
matching onjuino's pattern, and print Device line after Opus init.
2026-04-10 16:50:51 -07:00
justLV
28040a77bb Fix red-flash after interrupt, VAD LED gate, MAC readout, logging
On playback interrupt both Opus and PCM paths now close the TCP socket
instead of trying a frame-aligned drain. The drain was misaligned when
an interrupt hit mid-frame, leaving stray bytes that the next persistent-
loop iteration read as a garbage "header" and flashed the error red LED.
The bridge opens a fresh TCP per audio push anyway, so closing is safe.

0xCC LED blink handler now treats level=0 as a no-op heartbeat. setLed
is an overwrite (not cumulative), so a level=0 write would abruptly
truncate an in-flight fade. Only level>0 writes touch LED state or
extend mic_timeout — leaves headroom for future keepalive schemes
without freezing ongoing fades.

MAC address now read via esp_read_mac(ESP_MAC_WIFI_STA) instead of
WiFi.macAddress(). On arduino-esp32 v3.x the latter returns all zeros
before WiFi is fully initialized, producing a hostname like
"onju-000000". eFuse read works unconditionally.

Log a "Device: <hostname> @ <ip>" line right after the mic offset on
boot so it's easy to spot the IP in serial traces, and label the saved
server IP log as "Saved server IP" so it's not confused with the
device's own IP.
2026-04-10 16:32:55 -07:00
justLV
c962d3efbf Gate double-tap on prior normal tap; recover from TCP stalls
Double-tap to disable now requires a previously completed standalone normal
tap (one whose double-tap window expired without a 2nd tap). Cold start and
re-enable both begin locked, so tap-tap can no longer disable on first
interaction or back-to-back after re-enabling. handleShortPress reports
whether the tap was a real action so no-ops (mute / no server) and re-enables
don't satisfy the prerequisite. Center-touch debounce dropped to 150ms and
the double-tap window bumped to 700ms so the 2nd tap has real slack.

Both Opus and PCM playback loops now break out and force-close the TCP
socket if no bytes arrive for 2s, instead of spinning while the I2S DMA
buffer loops the last chunk. Inner Opus reads also poll interruptPlayback
so user double-taps actually unblock a stalled read. isPlaying is no longer
cleared in the touch handlers — playback cleanup clears it after I2S DMA
is zeroed, so the mic loop can't reopen while the speaker tail is audible.
2026-04-10 15:34:30 -07:00
justLV
dd42fdb668 Simplify device state model, reduce TCP churn, fix multi-device identity
Replace callActive/sendDisconnect with deviceEnabled toggle — device
starts enabled on boot, double-tap disables, single-tap re-enables.
Removes 0xFF disconnect packets (bridge detects via silence + refused
audio). Generates unique hostname from MAC suffix (onju-A1B2C3).

Restructure onjuino TCP handling to persistent connection loop (500ms
header timeout) so LED blinks reuse one connection per VAD session
instead of opening 4-10 connections/sec. Pipeline tracks VAD recording
transitions, opens/closes LED TCP accordingly.

Remove send_stop_listening from pipeline — mic stays active during
ASR/LLM/TTS for better interrupt detection, eliminates zombie-state
failure mode. greet_device always sends 0xCC LED pulse for IP
registration. Fix config chunk_size 480→512 to match device.
2026-04-09 13:32:23 -07:00
justLV
502187efdc Set Archer as default ElevenLabs voice in example config 2026-04-08 16:28:00 -07:00
justLV
4f0fbaafda Change tap LED from green to white for consistency 2026-04-08 14:02:23 -07:00
justLV
fcc2ef284b Fix Opus TCP read race: check available() after disconnect in frame read loop 2026-04-08 14:00:33 -07:00
justLV
260fbea9eb Fix onjuino interaction description (VAD, not double-tap), update m5_echo README terminology 2026-04-08 13:49:09 -07:00
justLV
2943b07596 minor 2026-04-08 13:47:53 -07:00
justLV
742d31bcd7 readme 2026-04-08 13:45:01 -07:00
justLV
bf1ceb3e69 Remove redundant top-level default_voice from TTS config
device.py now reads default_voice from tts.elevenlabs directly.
2026-04-08 13:37:20 -07:00
justLV
09f218b80d Add OpenClaw setup script and documentation
Script enables gateway chat completions endpoint, appends voice mode
prompt to AGENTS.md (idempotent), and restarts the gateway.
2026-04-08 13:22:09 -07:00
justLV
36f4988867 readme 2026-04-08 13:08:12 -07:00
justLV
188aeae1c6 Remove voice agent section from README, fix diagram arrows 2026-04-08 13:06:54 -07:00
justLV
3e70ad5ee5 Move Schematic.pdf from images/ to hardware/ 2026-04-08 13:02:08 -07:00
justLV
b2586d2c61 Update schematic PDF to latest revision 2026-04-08 13:01:02 -07:00
justLV
398f89dca7 Prepare repo for v2 release: rewrite README, clean up dev scripts, embed ASR server
- Rewrite README with v2 features (OpenClaw, M5 Echo, Opus, pluggable backends),
  fold ARCHITECTURE.md and PIPELINE.md content inline
- Remove dev-only test scripts (streaming TTS, UDP recv, qwen3 bench, etc.)
- Remove redundant m5_echo/flash.sh and terminal.py (root scripts handle both)
- Consolidate credentials to .template naming, remove .example
- Embed parakeet-mlx ASR server as optional dependency (pipeline/services/asr_server.py)
- Default LLM to Claude Haiku 4.5 via OpenRouter, local example uses Gemma 4 E4B
- Update pyproject.toml with metadata, bump to 2.0.0
- Clean up .gitignore
2026-04-08 13:00:15 -07:00
justLV
81452009d7 Check for .ino.bin artifact to detect stale/missing builds
Look for the actual firmware binary (*.ino.bin) instead of any *.bin
when deciding whether to skip compilation. If the build dir exists but
the artifact is missing, force a recompile automatically.
2026-04-08 10:53:58 -07:00
justLV
7b734b96b8 Fix firmware: callActive on boot, 0xDD timeout units, opus frame read
- Set callActive=true on boot for both VOX and PTT modes
- Fix 0xDD mic timeout: multiply by 1000 (was treating seconds as ms)
- Mic timeout no longer kills callActive — only double-tap ends call
- Fix opus frame length read: loop until both bytes arrive (was reading
  1 byte + uninitialized garbage → invalid frame lengths like 18605)
- Drain TCP on invalid frame to avoid corrupting next connection
2026-04-07 20:24:02 -07:00
justLV
19d48d4e3c OpenClaw managed backend, VAD-aware interrupt, firmware fixes
- Add managed conversation backend for OpenClaw (x-openclaw-message-channel
  header, user field for device identity)
- Replace aggressive interrupt logic with VAD-aware check: only interrupt
  on actual speech, not background noise/trailing packets
- Fix 0xDD timeout units (was milliseconds, now seconds) and keep callActive
  alive with 30s hold during LLM+TTS processing
- Set callActive on boot for VOX mode so device accepts audio without tap
- Mic timeout no longer kills callActive — only double-tap ends the call
- LED feedback: scale to configured led_power, let device handle fade-down
- Add greeting toggle, TTS/SEND logging, pyserial dep, setuptools config
2026-04-07 20:16:33 -07:00
justLV
a3ac260e1c Remove old flash_firmware.sh, replaced by unified flash.sh 2026-04-07 19:33:16 -07:00
justLV
781945fa56 Unify flash scripts, auto-install Arduino libs, fix SSID number selection
Combine flash_firmware.sh and m5_echo/flash.sh into a single flash.sh
that takes a target arg (onjuino default, m5_echo). Auto-installs
required Arduino libraries (Adafruit NeoPixel, esp32_opus). Typing a
number at the WiFi SSID prompt now selects the corresponding network.
2026-04-07 19:28:57 -07:00
justLV
74890f3202 Fix crash: defer UDP disconnect signal to task context
handleDoubleTap runs in an ISR where UDP operations cause a scheduler
assert (prvSelectHighestPriorityTaskSMP). Move the disconnect signal
send to touchTask via a volatile flag.

Also allow numeric selection in flash_firmware.sh WiFi picker.
2026-04-07 17:24:33 -07:00
justLV
a8cb4b9576 Remove 0xDD thinking LED, add delay between disconnect signals
The green "thinking" pulse on mic stop was from the old local pipeline.
With the sesame bridge, 0xDD is only used at call end — no LED needed.
2026-04-07 17:20:16 -07:00
justLV
1eaaddbc26 Add 5ms delay between disconnect signal UDP sends
Spreads the 3 packets across different network frames for better
reliability against packet loss.
2026-04-07 17:09:36 -07:00
justLV
a91cb8a879 Send UDP disconnect signal on double-tap (0xFF byte, 3x)
Bridge detects the 1-byte UDP packet and ends the call instantly,
replacing timeout-based disconnect detection.
2026-04-07 17:07:16 -07:00
justLV
e4d7bc7ca5 End-of-speech protocol, LED tweaks, call-end guard
- Handle zero-length Opus frame (0x00 0x00) as end-of-speech marker:
  exits opusDecodeTask cleanly, clears isPlaying, re-enables mic
- Zero I2S DMA buffer on opusDecodeTask exit (prevents stale DMA)
- Reject 0xAA audio commands when callActive is false (prevents
  bridge from restarting playback after user double-tapped to end)
- Don't reset mic_timeout after playback if call was ended
- LED: white flash for tap/interrupt, red-orange for call end
- Pipeline: append end-of-speech marker to Opus TCP payload
- ARCHITECTURE.md: document end-of-speech marker protocol
2026-04-07 16:41:59 -07:00
justLV
7bcb94833c Add PTT device support, IIR DC offset fix, control API, test script updates
PTT devices (--device name=ip:ptt): skip VAD, buffer audio until packets
stop, skip LED commands, interrupt in-flight responses on new audio.
Auto-detected from multicast "PTT" announcement.

HTTP control server on :3002 for runtime device management:
  POST/GET/DELETE /devices

Firmware: replace per-chunk DC offset with IIR filter to eliminate
zipper noise at chunk boundaries (m5_echo + onjuino).

Protocol: TCP timeouts use actual timeout param, failures are silent
for non-critical commands (LED blink).

Pipeline: labeled error logging (ASR/LLM/TTS), env var resolution
warning, Gemini OpenAI-compatible endpoint support.

Test scripts: rewritten to use pipeline modules, delete redundant
test_opus_tts.py, add pyproject.toml (replaces requirements.txt).
2026-04-06 14:22:20 -07:00
justLV
fe8e71131b Move conversation persistence into local backend, simplify DeviceManager
LocalConversation now owns its own per-device message files
(data/conversations/{hostname}.json) controlled by persist_dir config.
DeviceManager becomes a pure in-memory device registry — devices
re-announce via multicast on boot so no persistence needed. Removes
--persist CLI flag.
2026-04-06 11:55:08 -07:00
justLV
faea573ab9 Extract conversation layer from device, delete deprecated server/
Separate memory/context/conversation into a pluggable backend so the
LLM layer can be swapped without touching device or main. Two backends:
- local: manages message history, sends full context to any OpenAI-compatible endpoint
- managed: delegates to a remote service (OpenClaw) that owns session state

Also: rename persist_file -> registry_file, remove unused services/llm.py,
delete old server/ directory.
2026-04-06 11:31:38 -07:00
justLV
3c133ef40e Add LED toggle, TCP_NODELAY, volume controls, --no-monitor flag
- L command toggles LED on/off (persisted to NVS) to reduce power jitter
- +/- commands adjust volume live, including during playback
- tcpServer.setNoDelay(true) to reduce TCP latency
- flash.sh --no-monitor flag to skip serial monitor after upload
2026-04-03 17:06:44 -07:00
justLV
4cd008d822 Fix I2S stereo interleaving, persist server IP, add volume controls
Write each sample as L+R stereo pair for ALL_RIGHT I2S format — previous
mono writes dropped every other sample causing aliasing/sibilance on speech.
Set I2S rate to 16kHz (was 8kHz with broken 2x assumption).

Also: save server IP to NVS for auto-reconnect on reboot, add +/- serial
volume commands (work during playback), lower default volume to 5.
2026-04-03 16:16:48 -07:00
justLV
529981de54 Add M5Stack ATOM Echo PTT firmware and onjuino PTT mode flag
New m5_echo/ firmware for the ATOM Echo (ESP32-PICO-D4) with push-to-talk:
- Auto-starts call on boot via PTT multicast announcement
- Button hold = record mic (PDM, mu-law), release = listen
- Persistent TCP connection survives PTT cycles (Opus task discards
  frames during PTT instead of closing connection)
- Handles ESP32 I2S ALL_RIGHT stereo quirks (2x sample rate
  compensation for both mic and speaker)
- Includes flash script, serial terminal, and integration test tools

Also adds PTT_MODE flag to onjuino for bridge compatibility (multicast
announcement, auto-start call, skip VAD mic timeouts).
2026-04-03 15:36:42 -07:00
justLV
ede39e0c67 Compile before port detection so skip-compile message is visible 2026-03-30 11:29:17 -07:00
justLV
d1e115c272 Fix silent exit when no USB device connected
The ls glob failure was caught by set -e before the error message
could print.
2026-03-30 11:27:49 -07:00
justLV
ea9385b74d Fix WiFi detection on macOS Tahoe (SSID redacted from APIs)
Show preferred networks list and let user confirm instead of silently
picking the wrong network from Keychain.
2026-03-30 11:26:36 -07:00
justLV
1a475c2f4c Auto-detect WiFi from macOS Keychain, skip recompile when unchanged
flash_firmware.sh now generates credentials.h from template using the
system's current WiFi SSID and Keychain password (with interactive
fallback). Skips compilation when source files haven't changed.
Adds --regen and --force flags. Also switches center touch from
long-press to double-tap for ending calls.
2026-03-29 20:00:59 -07:00
justLV
daeaba9bf8 Move touch polling to FreeRTOS task (fix long press during playback)
Long-press detection was in loop() which blocks during TCP audio handling.
Moved to dedicated touchTask on Core 1 that polls every 20ms regardless
of what loop() is doing.
2026-03-27 19:15:16 -07:00