onju-v2

mirror of https://github.com/justLV/onju-v2 synced 2026-04-21 23:57:26 +00:00

Author	SHA1	Message	Date
justLV	19d48d4e3c	OpenClaw managed backend, VAD-aware interrupt, firmware fixes - Add managed conversation backend for OpenClaw (x-openclaw-message-channel header, user field for device identity) - Replace aggressive interrupt logic with VAD-aware check: only interrupt on actual speech, not background noise/trailing packets - Fix 0xDD timeout units (was milliseconds, now seconds) and keep callActive alive with 30s hold during LLM+TTS processing - Set callActive on boot for VOX mode so device accepts audio without tap - Mic timeout no longer kills callActive — only double-tap ends the call - LED feedback: scale to configured led_power, let device handle fade-down - Add greeting toggle, TTS/SEND logging, pyserial dep, setuptools config	2026-04-07 20:16:33 -07:00
justLV	7bcb94833c	Add PTT device support, IIR DC offset fix, control API, test script updates PTT devices (--device name=ip:ptt): skip VAD, buffer audio until packets stop, skip LED commands, interrupt in-flight responses on new audio. Auto-detected from multicast "PTT" announcement. HTTP control server on :3002 for runtime device management: POST/GET/DELETE /devices Firmware: replace per-chunk DC offset with IIR filter to eliminate zipper noise at chunk boundaries (m5_echo + onjuino). Protocol: TCP timeouts use actual timeout param, failures are silent for non-critical commands (LED blink). Pipeline: labeled error logging (ASR/LLM/TTS), env var resolution warning, Gemini OpenAI-compatible endpoint support. Test scripts: rewritten to use pipeline modules, delete redundant test_opus_tts.py, add pyproject.toml (replaces requirements.txt).	2026-04-06 14:22:20 -07:00
justLV	fe8e71131b	Move conversation persistence into local backend, simplify DeviceManager LocalConversation now owns its own per-device message files (data/conversations/{hostname}.json) controlled by persist_dir config. DeviceManager becomes a pure in-memory device registry — devices re-announce via multicast on boot so no persistence needed. Removes --persist CLI flag.	2026-04-06 11:55:08 -07:00
justLV	faea573ab9	Extract conversation layer from device, delete deprecated server/ Separate memory/context/conversation into a pluggable backend so the LLM layer can be swapped without touching device or main. Two backends: - local: manages message history, sends full context to any OpenAI-compatible endpoint - managed: delegates to a remote service (OpenClaw) that owns session state Also: rename persist_file -> registry_file, remove unused services/llm.py, delete old server/ directory.	2026-04-06 11:31:38 -07:00
justLV	13f9d59245	Add Qwen3-TTS as local TTS backend with voice cloning Adds mlx-audio-based Qwen3-TTS as an alternative to ElevenLabs, enabling fully offline voice synthesis with voice cloning from a short reference audio clip. Benchmarked at 0.52x RTF (sub-realtime) on Apple Silicon with the 1.7B-Base-4bit model.	2026-02-09 13:53:46 -08:00
justLV	0c9c75b3bf	Replace webrtcvad with Silero VAD (ONNX, no PyTorch) Switch from webrtcvad's binary is_speech to Silero VAD's calibrated float probability via direct ONNX session calls with numpy. The LSTM provides temporal smoothing natively, eliminating the sliding window hack. Frame size changes from 480 (30ms) to 512 (32ms) end-to-end to match Silero's requirements. Consolidate pipeline/requirements.txt into root requirements.txt, swap webrtcvad+setuptools for silero-vad+onnxruntime.	2026-02-07 17:00:02 -08:00
justLV	7162aa0f3b	Improve pipeline setup, logging, and test client compatibility Move venv to repo root with combined requirements.txt, fix libopus/portaudio discovery on macOS, replace deprecated audioop with numpy u-law encoder, add colored pipeline logging with suppressed third-party noise, fix mic deadlock on non-speech rejection, fix localhost IP mismatch for test client, add VAD visualization bar, tune VAD for conversational speech, and move runtime data to gitignored data/ directory.	2026-02-07 16:22:53 -08:00
justLV	b3538493a6	Add modular async pipeline server and ESP32 mDNS fallback Pipeline: async voice pipeline replacing monolithic threaded server. ASR, LLM, and TTS are independent pluggable services. ASR calls external parakeet-asr-server, LLM uses any OpenAI-compatible endpoint, TTS uses ElevenLabs with pluggable backend interface. Firmware: add mDNS hostname resolution as fallback when multicast discovery doesn't work. Resolves configured server_hostname via MDNS.queryHost() on boot, falls back to multicast if resolution fails. Also adds test_client.py that emulates an ESP32 device for testing without hardware (TCP server, Opus decode, mic streaming).	2026-02-07 15:04:12 -08:00
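
The 0c9c75b3bf entry changes the frame size from 480 samples (30 ms) to 512 samples (32 ms). Both figures imply a 16 kHz sample rate, which is inferred from the numbers and not stated in the log. The sketch below checks that arithmetic and shows one hypothetical way to regroup incoming packets of arbitrary size into fixed 512-sample frames. `frame_ms` and `FrameBuffer` are illustrative names, not identifiers from the repository.

```python
def frame_ms(samples: int, sample_rate_hz: int = 16_000) -> float:
    """Duration of a frame in milliseconds (16 kHz is an inferred assumption)."""
    return 1000.0 * samples / sample_rate_hz


class FrameBuffer:
    """Hypothetical helper: regroup arbitrary-length packets into fixed-size frames."""

    def __init__(self, frame_size: int = 512):
        self.frame_size = frame_size
        self._pending = []  # samples not yet emitted as part of a full frame

    def push(self, samples):
        """Append samples and return every complete frame now available."""
        self._pending.extend(samples)
        frames = []
        while len(self._pending) >= self.frame_size:
            frames.append(self._pending[:self.frame_size])
            del self._pending[:self.frame_size]
        return frames


if __name__ == "__main__":
    print(frame_ms(480), frame_ms(512))
    buf = FrameBuffer()
    print(len(buf.push([0] * 480)))  # not enough samples for a 512-sample frame yet
    print(len(buf.push([0] * 480)))  # 960 pending samples yield one full frame
```

A buffer like this would only be needed if a sender still produced 480-sample packets; the log says the change was made end-to-end, so the real pipeline may not need one.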

8 commits
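
The 7bcb94833c entry replaces per-chunk DC offset removal with an IIR filter to eliminate zipper noise at chunk boundaries. A minimal sketch of that idea follows, assuming a standard one-pole DC blocker, y[n] = x[n] - x[n-1] + r*y[n-1], with state carried across chunks. The firmware's actual filter form and coefficient are not given in the log; the class name, the function name, and `r = 0.995` are assumptions, and the example is in Python for illustration only.

```python
class DCBlocker:
    """One-pole DC-blocking IIR filter: y[n] = x[n] - x[n-1] + r * y[n-1].

    State persists between calls, so chunk boundaries introduce no discontinuity.
    The coefficient r is an assumed value, not taken from the firmware.
    """

    def __init__(self, r: float = 0.995):
        self.r = r
        self.prev_x = 0.0
        self.prev_y = 0.0

    def process(self, chunk):
        out = []
        x1, y1 = self.prev_x, self.prev_y
        for x in chunk:
            y = x - x1 + self.r * y1
            out.append(y)
            x1, y1 = float(x), y
        # Save the state so the next chunk continues where this one stopped.
        self.prev_x, self.prev_y = x1, y1
        return out


def remove_mean_per_chunk(chunk):
    """The replaced approach: subtract each chunk's own mean.

    When the offset drifts, adjacent chunks get different corrections,
    which produces a step at every chunk boundary.
    """
    mean = sum(chunk) / len(chunk)
    return [x - mean for x in chunk]


if __name__ == "__main__":
    # A slowly rising offset, split into two chunks.
    signal = [100.0 + 0.5 * n for n in range(200)]
    chunks = [signal[:100], signal[100:]]

    per_chunk = [y for c in chunks for y in remove_mean_per_chunk(c)]
    print("per-chunk step at boundary:", per_chunk[100] - per_chunk[99])

    blocker = DCBlocker()
    filtered = [y for c in chunks for y in blocker.process(c)]
    print("IIR step at boundary:", filtered[100] - filtered[99])
```

Because the filter state is stored on the object, processing a stream in chunks gives the same output as processing it in one pass, which is the property that removes the boundary artifact.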