Commit graph

4 commits

Author SHA1 Message Date
justLV
e4d7bc7ca5 End-of-speech protocol, LED tweaks, call-end guard
- Handle zero-length Opus frame (0x00 0x00) as end-of-speech marker:
  exits opusDecodeTask cleanly, clears isPlaying, re-enables mic
- Zero I2S DMA buffer on opusDecodeTask exit (prevents stale DMA)
- Reject 0xAA audio commands when callActive is false (prevents
  bridge from restarting playback after user double-tapped to end)
- Don't reset mic_timeout after playback if call was ended
- LED: white flash for tap/interrupt, red-orange for call end
- Pipeline: append end-of-speech marker to Opus TCP payload
- ARCHITECTURE.md: document end-of-speech marker protocol
2026-04-07 16:41:59 -07:00
justLV
0c9c75b3bf Replace webrtcvad with Silero VAD (ONNX, no PyTorch)
Switch from webrtcvad's binary is_speech to Silero VAD's calibrated
float probability via direct ONNX session calls with numpy. The LSTM
provides temporal smoothing natively, eliminating the sliding window
hack. Frame size changes from 480 (30ms) to 512 (32ms) end-to-end
to match Silero's requirements.

Consolidate pipeline/requirements.txt into root requirements.txt,
swap webrtcvad+setuptools for silero-vad+onnxruntime.
2026-02-07 17:00:02 -08:00
justLV
496f614cb5 cleanup architecture doc 2026-02-07 16:27:32 -08:00
justLV
c3514ceb49 Add Opus compression for speaker audio
Implements Opus decoding on ESP32 for TTS playback, achieving 14-16x
compression over raw PCM. This improves WiFi throughput margin from 2.2x
to 30x+, enabling reliable operation throughout the home even with poor
WiFi conditions.

Key changes:
- Add Opus decoder to ESP32 firmware with dedicated 32KB FreeRTOS task
- Implement length-prefixed TCP framing for variable-bitrate Opus frames
- Update header protocol: header[5] = compression type (0=PCM, 1=μ-law, 2=Opus)
- Auto-detect USB port in flash and serial monitor scripts
- Add test script with opuslib encoder supporting WAV/M4A/MP3 input
- Document architecture and design rationale for μ-law/UDP (mic) vs Opus/TCP (speaker)

Performance:
- Compression: 640 bytes PCM → 35-50 bytes Opus per 20ms frame (14-16x)
- Bandwidth: 256 kbps → 16 kbps (94% reduction)
- WiFi margin: 2.2x → 30x+ throughput safety margin
- CPU usage: ~10-20% during playback on ESP32-S3
- Quality: High-fidelity voice suitable for human listening

🤖 Generated with [Claude Code](https://claude.com/claude-code)
2026-01-31 17:41:16 -08:00