- Add managed conversation backend for OpenClaw (x-openclaw-message-channel
header, user field for device identity)
- Replace aggressive interrupt logic with VAD-aware check: only interrupt
on actual speech, not background noise/trailing packets
- Fix 0xDD timeout units (was milliseconds, now seconds) and keep callActive
alive with 30s hold during LLM+TTS processing
- Set callActive on boot for VOX mode so device accepts audio without tap
- Mic timeout no longer kills callActive — only double-tap ends the call
- LED feedback: scale to configured led_power, let device handle fade-down
- Add greeting toggle, TTS/SEND logging, pyserial dep, setuptools config
PTT devices (--device name=ip:ptt): skip VAD, buffer audio until packets
stop, skip LED commands, interrupt in-flight responses on new audio.
Auto-detected from multicast "PTT" announcement.
HTTP control server on :3002 for runtime device management:
POST/GET/DELETE /devices
Firmware: replace per-chunk DC offset with IIR filter to eliminate
zipper noise at chunk boundaries (m5_echo + onjuino).
Protocol: TCP timeouts use actual timeout param, failures are silent
for non-critical commands (LED blink).
Pipeline: labeled error logging (ASR/LLM/TTS), env var resolution
warning, Gemini OpenAI-compatible endpoint support.
Test scripts: rewritten to use pipeline modules, delete redundant
test_opus_tts.py, add pyproject.toml (replaces requirements.txt).
LocalConversation now owns its own per-device message files
(data/conversations/{hostname}.json) controlled by persist_dir config.
DeviceManager becomes a pure in-memory device registry — devices
re-announce via multicast on boot so no persistence needed. Removes
--persist CLI flag.
Separate memory/context/conversation into a pluggable backend so the
LLM layer can be swapped without touching device or main. Two backends:
- local: manages message history, sends full context to any OpenAI-compatible endpoint
- managed: delegates to a remote service (OpenClaw) that owns session state
Also: rename persist_file -> registry_file, remove unused services/llm.py,
delete old server/ directory.
Adds mlx-audio-based Qwen3-TTS as an alternative to ElevenLabs,
enabling fully offline voice synthesis with voice cloning from a
short reference audio clip. Benchmarked at 0.52x RTF (sub-realtime)
on Apple Silicon with the 1.7B-Base-4bit model.
Switch from webrtcvad's binary is_speech to Silero VAD's calibrated
float probability via direct ONNX session calls with numpy. The LSTM
provides temporal smoothing natively, eliminating the sliding window
hack. Frame size changes from 480 (30ms) to 512 (32ms) end-to-end
to match Silero's requirements.
Consolidate pipeline/requirements.txt into root requirements.txt,
swap webrtcvad+setuptools for silero-vad+onnxruntime.
Move venv to repo root with combined requirements.txt, fix libopus/portaudio
discovery on macOS, replace deprecated audioop with numpy u-law encoder,
add colored pipeline logging with suppressed third-party noise, fix mic
deadlock on non-speech rejection, fix localhost IP mismatch for test client,
add VAD visualization bar, tune VAD for conversational speech, and move
runtime data to gitignored data/ directory.
Pipeline: async voice pipeline replacing monolithic threaded server.
ASR, LLM, and TTS are independent pluggable services. ASR calls
external parakeet-asr-server, LLM uses any OpenAI-compatible
endpoint, TTS uses ElevenLabs with pluggable backend interface.
Firmware: add mDNS hostname resolution as fallback when multicast
discovery doesn't work. Resolves configured server_hostname via
MDNS.queryHost() on boot, falls back to multicast if resolution fails.
Also adds test_client.py that emulates an ESP32 device for testing
without hardware (TCP server, Opus decode, mic streaming).