onju-v2

mirror of https://github.com/justLV/onju-v2 synced 2026-04-21 15:47:55 +00:00

Author	SHA1	Message	Date
justLV	e4d7bc7ca5	End-of-speech protocol, LED tweaks, call-end guard	2026-04-07 16:41:59 -07:00
    - Handle zero-length Opus frame (0x00 0x00) as end-of-speech marker: exits opusDecodeTask cleanly, clears isPlaying, re-enables mic
    - Zero I2S DMA buffer on opusDecodeTask exit (prevents stale DMA)
    - Reject 0xAA audio commands when callActive is false (prevents bridge from restarting playback after user double-tapped to end)
    - Don't reset mic_timeout after playback if call was ended
    - LED: white flash for tap/interrupt, red-orange for call end
    - Pipeline: append end-of-speech marker to Opus TCP payload
    - ARCHITECTURE.md: document end-of-speech marker protocol
justLV	0c9c75b3bf	Replace webrtcvad with Silero VAD (ONNX, no PyTorch)	2026-02-07 17:00:02 -08:00
    Switch from webrtcvad's binary is_speech to Silero VAD's calibrated float probability via direct ONNX session calls with numpy. The LSTM provides temporal smoothing natively, eliminating the sliding window hack. Frame size changes from 480 (30ms) to 512 (32ms) end-to-end to match Silero's requirements.

    Consolidate pipeline/requirements.txt into root requirements.txt, swap webrtcvad+setuptools for silero-vad+onnxruntime.
justLV	496f614cb5	cleanup architecture doc	2026-02-07 16:27:32 -08:00
justLV	c3514ceb49	Add Opus compression for speaker audio	2026-01-31 17:41:16 -08:00
    Implements Opus decoding on ESP32 for TTS playback, achieving 14-16x compression over raw PCM. This improves WiFi throughput margin from 2.2x to 30x+, enabling reliable operation throughout the home even with poor WiFi conditions.

    Key changes:
    - Add Opus decoder to ESP32 firmware with dedicated 32KB FreeRTOS task
    - Implement length-prefixed TCP framing for variable-bitrate Opus frames
    - Update header protocol: header[5] = compression type (0=PCM, 1=μ-law, 2=Opus)
    - Auto-detect USB port in flash and serial monitor scripts
    - Add test script with opuslib encoder supporting WAV/M4A/MP3 input
    - Document architecture and design rationale for μ-law/UDP (mic) vs Opus/TCP (speaker)

    Performance:
    - Compression: 640 bytes PCM → 35-50 bytes Opus per 20ms frame (14-16x)
    - Bandwidth: 256 kbps → 16 kbps (94% reduction)
    - WiFi margin: 2.2x → 30x+ throughput safety margin
    - CPU usage: ~10-20% during playback on ESP32-S3
    - Quality: High-fidelity voice suitable for human listening

    🤖 Generated with [Claude Code](https://claude.com/claude-code)
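
The length-prefixed TCP framing and the zero-length end-of-speech marker mentioned in the commits above can be sketched roughly as follows. This is an illustration under stated assumptions, not the repository's code: the 2-byte prefix width is inferred from the `0x00 0x00` marker, while the big-endian byte order and the helper names `frame_payload` and `iter_frames` are hypothetical.

```python
import struct

# Zero-length frame used as the end-of-speech marker (per the commit message).
END_OF_SPEECH = b"\x00\x00"


def frame_payload(opus_frames):
    """Prefix each Opus frame with its length, then append the marker.

    Assumption: 2-byte big-endian length prefix. The commit log only
    confirms that a zero length is encoded as 0x00 0x00.
    """
    out = bytearray()
    for frame in opus_frames:
        if not 0 < len(frame) <= 0xFFFF:
            raise ValueError("frame length must fit in a non-zero 16-bit prefix")
        out += struct.pack(">H", len(frame)) + frame
    out += END_OF_SPEECH
    return bytes(out)


def iter_frames(payload):
    """Yield frames until the zero-length marker, as a decoder loop might."""
    offset = 0
    while offset + 2 <= len(payload):
        (length,) = struct.unpack_from(">H", payload, offset)
        offset += 2
        if length == 0:  # end-of-speech: stop decoding cleanly
            return
        if offset + length > len(payload):
            raise ValueError("truncated frame")
        yield payload[offset:offset + length]
        offset += length
```

A receiver that stops on the zero-length prefix does not need to wait for the TCP connection to close to know that playback is finished, which appears to be the motivation for the marker.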

4 commits
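
The frame-size and bandwidth figures quoted in the commit messages can be checked with a few lines of arithmetic. The sketch below assumes 16 kHz sampling, which is consistent with 480 samples per 30 ms and 512 samples per 32 ms but is not stated in the log; the function names are hypothetical.

```python
SAMPLE_RATE_HZ = 16_000  # assumption: not stated in the commit log
FRAME_MS = 20            # Opus/PCM frame duration quoted in the log


def kbps(bytes_per_frame, frame_ms=FRAME_MS):
    """Bitrate in kbit/s for a fixed-duration frame (bits per ms == kbit/s)."""
    return bytes_per_frame * 8 / frame_ms


def frame_duration_ms(samples, sample_rate_hz=SAMPLE_RATE_HZ):
    """Duration in milliseconds of a frame with the given sample count."""
    return samples * 1000 / sample_rate_hz


pcm_kbps = kbps(640)                     # 256.0, the raw PCM figure
opus_kbps_range = (kbps(35), kbps(50))   # 14.0 to 20.0 for 35-50 byte frames
opus_kbps_typical = kbps(40)             # 16.0, matching the quoted 16 kbps
reduction = 1 - opus_kbps_typical / pcm_kbps  # 0.9375, i.e. about 94%

old_vad_ms = frame_duration_ms(480)      # 30.0 (webrtcvad frame)
new_vad_ms = frame_duration_ms(512)      # 32.0 (Silero VAD frame)
```

Under these assumptions the quoted 16 kbps and 16x figures correspond to a 40-byte Opus frame, roughly the middle of the 35-50 byte range.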