Commit graph

8 commits

Author SHA1 Message Date
justLV
19d48d4e3c OpenClaw managed backend, VAD-aware interrupt, firmware fixes
- Add managed conversation backend for OpenClaw (x-openclaw-message-channel
  header, user field for device identity)
- Replace aggressive interrupt logic with VAD-aware check: only interrupt
  on actual speech, not background noise/trailing packets
- Fix 0xDD timeout units (was milliseconds, now seconds) and keep callActive
  alive with 30s hold during LLM+TTS processing
- Set callActive on boot for VOX mode so device accepts audio without tap
- Mic timeout no longer kills callActive — only double-tap ends the call
- LED feedback: scale to configured led_power, let device handle fade-down
- Add greeting toggle, TTS/SEND logging, pyserial dep, setuptools config
2026-04-07 20:16:33 -07:00
justLV
7bcb94833c Add PTT device support, IIR DC offset fix, control API, test script updates
PTT devices (--device name=ip:ptt): skip VAD, buffer audio until packets
stop, skip LED commands, interrupt in-flight responses on new audio.
Auto-detected from multicast "PTT" announcement.

HTTP control server on :3002 for runtime device management:
  POST/GET/DELETE /devices

Firmware: replace per-chunk DC offset with IIR filter to eliminate
zipper noise at chunk boundaries (m5_echo + onjuino).

Protocol: TCP timeouts use actual timeout param, failures are silent
for non-critical commands (LED blink).

Pipeline: labeled error logging (ASR/LLM/TTS), env var resolution
warning, Gemini OpenAI-compatible endpoint support.

Test scripts: rewritten to use pipeline modules, delete redundant
test_opus_tts.py, add pyproject.toml (replaces requirements.txt).
2026-04-06 14:22:20 -07:00
justLV
fe8e71131b Move conversation persistence into local backend, simplify DeviceManager
LocalConversation now owns its own per-device message files
(data/conversations/{hostname}.json) controlled by persist_dir config.
DeviceManager becomes a pure in-memory device registry — devices
re-announce via multicast on boot so no persistence needed. Removes
--persist CLI flag.
2026-04-06 11:55:08 -07:00
justLV
faea573ab9 Extract conversation layer from device, delete deprecated server/
Separate memory/context/conversation into a pluggable backend so the
LLM layer can be swapped without touching device or main. Two backends:
- local: manages message history, sends full context to any OpenAI-compatible endpoint
- managed: delegates to a remote service (OpenClaw) that owns session state

Also: rename persist_file -> registry_file, remove unused services/llm.py,
delete old server/ directory.
2026-04-06 11:31:38 -07:00
justLV
13f9d59245 Add Qwen3-TTS as local TTS backend with voice cloning
Adds mlx-audio-based Qwen3-TTS as an alternative to ElevenLabs,
enabling fully offline voice synthesis with voice cloning from a
short reference audio clip. Benchmarked at 0.52x RTF (sub-realtime)
on Apple Silicon with the 1.7B-Base-4bit model.
2026-02-09 13:53:46 -08:00
justLV
0c9c75b3bf Replace webrtcvad with Silero VAD (ONNX, no PyTorch)
Switch from webrtcvad's binary is_speech to Silero VAD's calibrated
float probability via direct ONNX session calls with numpy. The LSTM
provides temporal smoothing natively, eliminating the sliding window
hack. Frame size changes from 480 (30ms) to 512 (32ms) end-to-end
to match Silero's requirements.

Consolidate pipeline/requirements.txt into root requirements.txt,
swap webrtcvad+setuptools for silero-vad+onnxruntime.
2026-02-07 17:00:02 -08:00
justLV
7162aa0f3b Improve pipeline setup, logging, and test client compatibility
Move venv to repo root with combined requirements.txt, fix libopus/portaudio
discovery on macOS, replace deprecated audioop with numpy u-law encoder,
add colored pipeline logging with suppressed third-party noise, fix mic
deadlock on non-speech rejection, fix localhost IP mismatch for test client,
add VAD visualization bar, tune VAD for conversational speech, and move
runtime data to gitignored data/ directory.
2026-02-07 16:22:53 -08:00
justLV
b3538493a6 Add modular async pipeline server and ESP32 mDNS fallback
Pipeline: async voice pipeline replacing monolithic threaded server.
ASR, LLM, and TTS are independent pluggable services. ASR calls
external parakeet-asr-server, LLM uses any OpenAI-compatible
endpoint, TTS uses ElevenLabs with pluggable backend interface.

Firmware: add mDNS hostname resolution as fallback when multicast
discovery doesn't work. Resolves configured server_hostname via
MDNS.queryHost() on boot, falls back to multicast if resolution fails.

Also adds test_client.py that emulates an ESP32 device for testing
without hardware (TCP server, Opus decode, mic streaming).
2026-02-07 15:04:12 -08:00