onju-v2

Elgato_dark/onju-v2

mirror of https://github.com/justLV/onju-v2 synced 2026-04-21 15:47:55 +00:00

Commit graph

Author	SHA1	Message	Date

justLV

0c9c75b3bf

Replace webrtcvad with Silero VAD (ONNX, no PyTorch)

Switch from webrtcvad's binary is_speech to Silero VAD's calibrated
float probability via direct ONNX session calls with numpy. The LSTM
provides temporal smoothing natively, eliminating the sliding window
hack. Frame size changes from 480 (30ms) to 512 (32ms) end-to-end
to match Silero's requirements.

Consolidate pipeline/requirements.txt into root requirements.txt,
swap webrtcvad+setuptools for silero-vad+onnxruntime.
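The frame-size change described in this commit can be checked with a little arithmetic. The sketch below is illustrative and not taken from the repository: it assumes 16 kHz mono audio (which is what makes 480 samples equal 30 ms), and the names `frame_duration_ms`, `split_frames`, `speech_flags`, and the 0.5 threshold are all hypothetical. It shows how a buffer might be cut into 512-sample frames and how per-frame float probabilities could be turned into speech decisions without a sliding window.

```python
SAMPLE_RATE_HZ = 16_000  # assumption: 16 kHz mono, consistent with 480 samples = 30 ms


def frame_duration_ms(samples, sample_rate_hz=SAMPLE_RATE_HZ):
    """Duration of a frame of `samples` samples, in milliseconds."""
    return 1000.0 * samples / sample_rate_hz


def split_frames(pcm, frame_size=512):
    """Yield consecutive full frames; a trailing partial frame is dropped."""
    for start in range(0, len(pcm) - frame_size + 1, frame_size):
        yield pcm[start:start + frame_size]


def speech_flags(probabilities, threshold=0.5):
    """Map per-frame speech probabilities to booleans.

    The 0.5 threshold is a placeholder, not the project's setting.
    """
    return [p >= threshold for p in probabilities]


if __name__ == "__main__":
    print(frame_duration_ms(480), frame_duration_ms(512))
    print(len(list(split_frames([0] * 1100))))
```

Because the model is described as stateful (an LSTM), each frame's probability already reflects recent context, so a plain threshold per frame is a plausible replacement for the earlier windowed vote.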

2026-02-07 17:00:02 -08:00

justLV

496f614cb5

cleanup architecture doc

2026-02-07 16:27:32 -08:00

justLV

c3514ceb49

Add Opus compression for speaker audio

Implements Opus decoding on ESP32 for TTS playback, achieving 14-16x
compression over raw PCM. This improves WiFi throughput margin from 2.2x
to 30x+, enabling reliable operation throughout the home even with poor
WiFi conditions.

Key changes:
- Add Opus decoder to ESP32 firmware with dedicated 32KB FreeRTOS task
- Implement length-prefixed TCP framing for variable-bitrate Opus frames
- Update header protocol: header[5] = compression type (0=PCM, 1=μ-law, 2=Opus)
- Auto-detect USB port in flash and serial monitor scripts
- Add test script with opuslib encoder supporting WAV/M4A/MP3 input
- Document architecture and design rationale for μ-law/UDP (mic) vs Opus/TCP (speaker)
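Length-prefixed framing, as named in the list above, lets a TCP receiver recover variable-size Opus frames from a byte stream. The following is a minimal sketch of that idea, not the project's wire format: the 2-byte big-endian prefix and the function names `pack_frame` and `unpack_frames` are assumptions made for illustration. Only the compression-type codes come from the commit message.

```python
import struct

# Assumption: a 2-byte big-endian length prefix. The commit message does not
# state the prefix width or byte order.
LENGTH_PREFIX = struct.Struct(">H")

# Values of header[5] as listed in the commit message.
COMPRESSION_TYPES = {0: "PCM", 1: "mu-law", 2: "Opus"}


def pack_frame(payload):
    """Prepend the payload length so the receiver knows where the frame ends."""
    return LENGTH_PREFIX.pack(len(payload)) + payload


def unpack_frames(buffer):
    """Split a received byte buffer into complete frames.

    Returns (frames, remainder); the remainder holds any incomplete frame
    and should be prepended to the next chunk read from the socket.
    """
    frames = []
    offset = 0
    while len(buffer) - offset >= LENGTH_PREFIX.size:
        (length,) = LENGTH_PREFIX.unpack_from(buffer, offset)
        end = offset + LENGTH_PREFIX.size + length
        if end > len(buffer):
            break  # frame body not fully received yet
        frames.append(buffer[offset + LENGTH_PREFIX.size:end])
        offset = end
    return frames, buffer[offset:]


if __name__ == "__main__":
    stream = pack_frame(b"a" * 35) + pack_frame(b"b" * 50)
    frames, rest = unpack_frames(stream[:-1])  # simulate a short read
    print(len(frames), len(rest))
```

Returning the unread remainder keeps the parser stateless, which suits TCP reads that can end in the middle of a frame.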

Performance:
- Compression: 640 bytes PCM → 35-50 bytes Opus per 20ms frame (14-16x)
- Bandwidth: 256 kbps → 16 kbps (94% reduction)
- WiFi margin: 2.2x → 30x+ throughput safety margin
- CPU usage: ~10-20% during playback on ESP32-S3
- Quality: High-fidelity voice suitable for human listening

🤖 Generated with [Claude Code](https://claude.com/claude-code)

2026-01-31 17:41:16 -08:00

3 commits