Commit graph

48 commits

Author SHA1 Message Date
justLV
daeaba9bf8 Move touch polling to FreeRTOS task (fix long press during playback)
Long-press detection was in loop(), which blocks during TCP audio handling.
Moved to dedicated touchTask on Core 1 that polls every 20ms regardless
of what loop() is doing.
2026-03-27 19:15:16 -07:00
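The polling approach described in daeaba9bf8 and 9dc9abf753 can be modelled off-device as a small state machine. This is only a sketch in Python, not the firmware code: the class name `TouchPoller` and its return values are hypothetical, and it assumes the long press fires as soon as the 1.5 s hold is reached rather than on release.

```python
POLL_MS = 20          # polling interval named in the commit message
LONG_PRESS_MS = 1500  # 1.5 s hold threshold from commit 9dc9abf753


class TouchPoller:
    """Hypothetical model of a touch task that is ticked every POLL_MS."""

    def __init__(self):
        self.held_ms = 0     # how long the pad has been held so far
        self.fired = False   # True once a long press was reported for this hold

    def poll(self, touched):
        """Process one tick; return 'long', 'short', or None."""
        if touched:
            self.held_ms += POLL_MS
            if self.held_ms >= LONG_PRESS_MS and not self.fired:
                self.fired = True
                return "long"   # ASSUMPTION: report while still held
            return None
        # Pad released: a hold that never reached the threshold is a short press.
        event = "short" if (self.held_ms > 0 and not self.fired) else None
        self.held_ms = 0
        self.fired = False
        return event
```

Because the poller only depends on being ticked at a fixed interval, it keeps working whatever the main loop is doing, which is the point of moving it to its own task.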
justLV
9dc9abf753 Add long-press to end call, short-press call lifecycle, reduce mic timeout
- Add long-press detection (1.5s hold) on center touch to explicitly end call:
  stops mic, interrupts playback, shows slow amber LED pulse
- Rewrite touch handler: ISR records touch start, loop() polls for release
  to distinguish short press (<1.5s) from long press (>=1.5s)
- Add callActive state to track call lifecycle (tap to start, long-hold to end)
- Short press when idle shows subtle green flash (server confirms with full
  pulse once WebRTC call is established)
- Reduce default mic timeout from 60s to 20s (server VAD extends when active)
- Guard 0xCC handler: don't extend mic after user explicitly ended call
- Reset callActive on natural mic timeout
2026-03-27 11:04:48 -07:00
justLV
4312f8134a Switch LLM to Gemini 2.5 Flash with thinking disabled
- Resolve ${VAR} env var references in config api_key
- Support thinking_budget config to control Gemini thinking mode
2026-03-01 18:10:58 -08:00
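A minimal sketch of how `${VAR}` references in a config value such as `api_key` might be resolved (commit 4312f8134a). The function name `resolve_env_refs` and the choice to raise on an unset variable are assumptions for illustration; the repository's actual behaviour may differ.

```python
import os
import re

# Matches ${NAME} where NAME is a conventional environment variable name.
_VAR_PATTERN = re.compile(r"\$\{([A-Za-z_][A-Za-z0-9_]*)\}")


def resolve_env_refs(value, env=None):
    """Replace every ${NAME} in value with env[NAME] (hypothetical helper)."""
    env = os.environ if env is None else env

    def _substitute(match):
        name = match.group(1)
        if name not in env:
            # ASSUMPTION: fail loudly instead of leaving the reference in place.
            raise KeyError(f"environment variable {name} is not set")
        return env[name]

    return _VAR_PATTERN.sub(_substitute, value)
```

Passing an explicit `env` mapping keeps the helper easy to test without touching the real process environment.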
justLV
9024dd53a6 Add --warmup and --persist CLI flags for pipeline startup
--warmup validates LLM and TTS backends on startup with test requests,
logging timing and response validation. --persist (off by default)
restores device state across restarts with message sanitization to
ensure proper role alternation for Gemma 3's chat template.
2026-02-11 17:59:49 -08:00
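The message sanitization mentioned in 9024dd53a6 is not spelled out in the commit message. One plausible way to guarantee strict user/assistant alternation on restored history is sketched below; `sanitize_messages` is a hypothetical name, and dropping non-user/assistant roles and leading assistant turns is an assumption, not confirmed behaviour.

```python
def sanitize_messages(messages):
    """Return a copy of messages in which user/assistant roles alternate."""
    cleaned = []
    for msg in messages:
        role = msg.get("role")
        content = (msg.get("content") or "").strip()
        if role not in ("user", "assistant") or not content:
            continue  # ASSUMPTION: drop empty or unexpected entries
        if cleaned and cleaned[-1]["role"] == role:
            # Merge consecutive turns from the same speaker.
            cleaned[-1]["content"] += "\n" + content
        else:
            cleaned.append({"role": role, "content": content})
    # ASSUMPTION: the chat template expects the history to start with a user turn.
    while cleaned and cleaned[0]["role"] != "user":
        cleaned.pop(0)
    return cleaned
```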
justLV
13f9d59245 Add Qwen3-TTS as local TTS backend with voice cloning
Adds mlx-audio-based Qwen3-TTS as an alternative to ElevenLabs,
enabling fully offline voice synthesis with voice cloning from a
short reference audio clip. Benchmarked at 0.52x RTF (sub-realtime)
on Apple Silicon with the 1.7B-Base-4bit model.
2026-02-09 13:53:46 -08:00
justLV
0c9c75b3bf Replace webrtcvad with Silero VAD (ONNX, no PyTorch)
Switch from webrtcvad's binary is_speech to Silero VAD's calibrated
float probability via direct ONNX session calls with numpy. The LSTM
provides temporal smoothing natively, eliminating the sliding window
hack. Frame size changes from 480 (30ms) to 512 (32ms) end-to-end
to match Silero's requirements.

Consolidate pipeline/requirements.txt into root requirements.txt,
swap webrtcvad+setuptools for silero-vad+onnxruntime.
2026-02-07 17:00:02 -08:00
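The frame-size change in 0c9c75b3bf follows from the sample rate: 480 samples is 30 ms and 512 samples is 32 ms at 16 kHz (the rate implied by those figures). A small sketch of the arithmetic and of cutting a PCM buffer into 512-sample frames; `frame_ms` and `split_frames` are illustrative names, not functions from the repository.

```python
SAMPLE_RATE = 16_000  # Hz, implied by 480 samples = 30 ms


def frame_ms(samples):
    """Duration in milliseconds of a frame with the given sample count."""
    return samples * 1000 / SAMPLE_RATE


def split_frames(pcm, frame_samples=512, sample_width=2):
    """Split 16-bit PCM bytes into whole frames; return (frames, leftover)."""
    frame_bytes = frame_samples * sample_width
    usable = len(pcm) - len(pcm) % frame_bytes
    frames = [pcm[i:i + frame_bytes] for i in range(0, usable, frame_bytes)]
    return frames, pcm[usable:]
```

The leftover bytes would be carried over and prepended to the next chunk so that the VAD always sees complete frames.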
justLV
4efeeaea2b Update PIPELINE.md for root-level venv and setup 2026-02-07 16:28:39 -08:00
justLV
496f614cb5 cleanup architecture doc 2026-02-07 16:27:32 -08:00
justLV
7162aa0f3b Improve pipeline setup, logging, and test client compatibility
Move venv to repo root with combined requirements.txt, fix libopus/portaudio
discovery on macOS, replace deprecated audioop with numpy u-law encoder,
add colored pipeline logging with suppressed third-party noise, fix mic
deadlock on non-speech rejection, fix localhost IP mismatch for test client,
add VAD visualization bar, tune VAD for conversational speech, and move
runtime data to gitignored data/ directory.
2026-02-07 16:22:53 -08:00
justLV
b3538493a6 Add modular async pipeline server and ESP32 mDNS fallback
Pipeline: async voice pipeline replacing monolithic threaded server.
ASR, LLM, and TTS are independent pluggable services. ASR calls
external parakeet-asr-server, LLM uses any OpenAI-compatible
endpoint, TTS uses ElevenLabs with pluggable backend interface.

Firmware: add mDNS hostname resolution as fallback when multicast
discovery doesn't work. Resolves configured server_hostname via
MDNS.queryHost() on boot, falls back to multicast if resolution fails.

Also adds test_client.py that emulates an ESP32 device for testing
without hardware (TCP server, Opus decode, mic streaming).
2026-02-07 15:04:12 -08:00
justLV
7c531c90df Reduce playback buffer from 512ms to 256ms with Opus
With Opus compression providing consistent frame delivery, we can
safely reduce the jitter buffer from 8192 samples (512ms) to 4096
samples (256ms), cutting latency in half.

🤖 Generated with [Claude Code](https://claude.com/claude-code)
2026-02-03 18:37:32 -08:00
justLV
aedea0d568 Document flash_firmware.sh compile-only usage in README
Added section explaining how to use flash_firmware.sh for:
- Compile-only mode (no ESP32 needed)
- Auto-detect and flash
- Flash to specific port

Emphasized using compile-only mode to verify code before committing.

🤖 Generated with Claude Code (https://claude.com/claude-code)
2026-02-02 11:33:02 -08:00
justLV
f3b5b6a7f8 Add compile-only mode to flash_firmware.sh
Added compile-only mode that skips upload:
  ./flash_firmware.sh compile

Updated usage:
- flash_firmware.sh                    # Auto-detect and upload
- flash_firmware.sh /dev/cu.usbmodem1  # Upload to specific port
- flash_firmware.sh compile            # Compile only, no upload

Useful for verifying code compiles without needing ESP32 connected.

🤖 Generated with Claude Code (https://claude.com/claude-code)
2026-02-02 11:32:27 -08:00
justLV
fc2d8412ed Standardize mic timeout to 60s for all activation paths
Changed all mic activation paths to use 60s timeout:
- Center tap to start call: 30s → 60s
- Center tap to interrupt: 30s → 60s
- After assistant audio: Enforce minimum 60s (was using server value)

Behavior:
- Tap center → mic enabled for 60s
- Assistant speaks → mic auto-enabled for 60s after playback
- Tap during playback → interrupts and mic enabled for 60s

This ensures users always have adequate time to respond without
premature timeout, matching the intended UX.

🤖 Generated with Claude Code (https://claude.com/claude-code)
2026-02-02 11:30:07 -08:00
justLV
3586a05b0b Add touch debouncing to prevent rapid-fire interruptions
Added 800ms debounce to all touch pads (left, center, right) to prevent
accidental multiple touches from interrupting audio playback.

Changes:
- Added debounce timing variables (lines 57-61)
- Implemented debounce logic in gotTouch1/2/3 handlers (lines 967-1032)
- Each touch pad has independent debounce timer
- Touches within 800ms of previous touch are ignored

This prevents issues where:
- Center tap would trigger multiple times from single press
- Audio playback would be interrupted repeatedly
- User experience was degraded by touch sensitivity

The 800ms window provides good balance between preventing hardware
bounces and maintaining responsive feel for legitimate user input.

🤖 Generated with Claude Code
2026-02-02 01:32:12 -08:00
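The debounce rule in 3586a05b0b (ignore touches within 800 ms, one timer per pad) can be sketched as follows. This is a Python model rather than the firmware handlers; `Debouncer` is a hypothetical name, and measuring the window from the last accepted touch is an assumption.

```python
DEBOUNCE_MS = 800  # window from the commit message


class Debouncer:
    """Accepts a touch only if the debounce window has elapsed."""

    def __init__(self, window_ms=DEBOUNCE_MS):
        self.window_ms = window_ms
        self.last_ms = None  # timestamp of the last accepted touch

    def accept(self, now_ms):
        if self.last_ms is not None and now_ms - self.last_ms < self.window_ms:
            return False  # too soon after the previous accepted touch
        self.last_ms = now_ms
        return True


# One independent timer per pad, as the commit describes.
pads = {name: Debouncer() for name in ("left", "center", "right")}
```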
justLV
dd3dad883a Add Opus compression support to ElevenLabs streaming test
Enhances test_streaming_tts.py to support optional Opus encoding for
streaming TTS audio from ElevenLabs to ESP32.

Features:
- Add --opus flag to enable Opus compression
- Accept ESP32 IP as command-line argument
- Buffer PCM chunks into 20ms frames (640 bytes) for Opus encoding
- Send with length-prefixed framing (compatible with ESP32 decoder)
- Display compression statistics when using Opus

Usage:
  python test_streaming_tts.py [ESP32_IP] [--opus]

Results with Opus:
- Compression ratio: ~14.5x (248KB PCM → 17KB Opus)
- Bandwidth: 256 kbps → ~17 kbps (93% reduction)
- Maintains streaming latency (~2s to first chunk)
- High quality voice for human listening

Tested successfully with ElevenLabs API streaming to ESP32-S3.
2026-01-31 19:18:58 -08:00
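The buffering and framing steps listed in dd3dad883a might look roughly like this. The Opus encoder itself is left out; the sketch only shows re-chunking arbitrary PCM chunks into 640-byte (20 ms) frames and adding a length prefix. The 2-byte big-endian prefix is an assumption, since the commit message does not state the prefix width or byte order, and `FrameBuffer` and `frame_packet` are hypothetical names.

```python
import struct

FRAME_BYTES = 640  # 20 ms of 16 kHz, 16-bit mono PCM


class FrameBuffer:
    """Accumulates PCM chunks of any size and emits fixed 640-byte frames."""

    def __init__(self):
        self._pending = bytearray()

    def feed(self, chunk):
        self._pending.extend(chunk)
        frames = []
        while len(self._pending) >= FRAME_BYTES:
            frames.append(bytes(self._pending[:FRAME_BYTES]))
            del self._pending[:FRAME_BYTES]
        return frames


def frame_packet(payload):
    """Prefix an encoded frame with its length.

    ASSUMPTION: 2-byte big-endian length; check the firmware decoder
    for the real wire format.
    """
    return struct.pack(">H", len(payload)) + payload
```

Each frame returned by `feed` would be passed to the Opus encoder, and the encoded bytes to `frame_packet` before being written to the TCP socket.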
justLV
b25300a6c6 Add tap-to-interrupt playback feature
Allows user to interrupt TTS playback mid-stream by tapping the center
touch button. Enables immediate voice input without waiting for assistant
to finish speaking.

Implementation:
- Add interruptPlayback volatile flag for ISR-safe signaling
- Opus decode task checks flag on each frame decode iteration
- PCM playback checks flag on each buffer read iteration
- On interrupt: stop decoding, clear I2S DMA buffers, drain TCP
- TCP drain runs for 1s to discard in-flight audio from server
- Skip silence buffer flush when interrupted (exit immediately)
- Enable microphone with 30s timeout for user response

Behavior:
- Latency: ~500ms (acceptable - next buffer iteration)
- Visual feedback: Green LED indicates listening mode
- Server timeout value still respected (gives user time to speak)
- Works for both Opus and PCM audio streams

User flow:
1. User taps during playback
2. Audio stops within ~500ms
3. Green LED pulses (listening mode)
4. Microphone enabled for 30s
5. User can speak immediately
2026-01-31 19:01:31 -08:00
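The per-iteration flag check described in b25300a6c6 is a common cooperative-cancellation pattern. A minimal Python analogue, using `threading.Event` in place of the firmware's volatile flag; `play_frames` and its parameters are hypothetical and stand in for the decode and playback loops.

```python
import threading


def play_frames(frames, write, interrupt):
    """Write frames until exhausted or until interrupt is set.

    Returns the number of frames actually written.
    """
    played = 0
    for frame in frames:
        if interrupt.is_set():
            break  # stop before writing the next frame
        write(frame)
        played += 1
    return played
```

Since the flag is only checked between frames, the worst-case reaction time is one loop iteration, which matches the commit's note that latency depends on the next buffer iteration.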
justLV
c3514ceb49 Add Opus compression for speaker audio
Implements Opus decoding on ESP32 for TTS playback, achieving 14-16x
compression over raw PCM. This improves WiFi throughput margin from 2.2x
to 30x+, enabling reliable operation throughout the home even with poor
WiFi conditions.

Key changes:
- Add Opus decoder to ESP32 firmware with dedicated 32KB FreeRTOS task
- Implement length-prefixed TCP framing for variable-bitrate Opus frames
- Update header protocol: header[5] = compression type (0=PCM, 1=μ-law, 2=Opus)
- Auto-detect USB port in flash and serial monitor scripts
- Add test script with opuslib encoder supporting WAV/M4A/MP3 input
- Document architecture and design rationale for μ-law/UDP (mic) vs Opus/TCP (speaker)

Performance:
- Compression: 640 bytes PCM → 35-50 bytes Opus per 20ms frame (14-16x)
- Bandwidth: 256 kbps → 16 kbps (94% reduction)
- WiFi margin: 2.2x → 30x+ throughput safety margin
- CPU usage: ~10-20% during playback on ESP32-S3
- Quality: High-fidelity voice suitable for human listening

🤖 Generated with [Claude Code](https://claude.com/claude-code)
2026-01-31 17:41:16 -08:00
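Two details from c3514ceb49 lend themselves to a short sketch: the compression type stored in `header[5]`, and the bandwidth figures. Only the byte index and the three values come from the commit message; the names `Compression`, `compression_from_header` and `bitrate_kbps` are illustrative, and the rest of the header layout is not assumed.

```python
from enum import IntEnum


class Compression(IntEnum):
    PCM = 0
    ULAW = 1
    OPUS = 2


def compression_from_header(header):
    """Read the compression type from byte 5 of the stream header."""
    if len(header) < 6:
        raise ValueError("header too short to contain a compression byte")
    return Compression(header[5])  # raises ValueError for unknown values


def bitrate_kbps(bytes_per_frame, frame_ms=20):
    """Bitrate for a stream of fixed-duration frames (bits per ms = kbit/s)."""
    return bytes_per_frame * 8 / frame_ms
```

With 20 ms frames, 640 bytes of PCM works out to 256 kbps, and a 40-byte Opus frame (inside the 35-50 byte range quoted) to 16 kbps.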
justLV
4b450439c1 Add μ-law compression and fix ESP32-S3 V3 audio issues
This commit adds audio compression and fixes critical I2S configuration
issues that prevented audio playback on ESP32-S3 V3 boards.

Key Changes:
- Fix I2S channel from RIGHT to LEFT (V3 board requirement)
- Fix deprecated I2S_COMM_FORMAT_I2S to I2S_COMM_FORMAT_STAND_I2S
- Add μ-law compression for 2x bandwidth reduction (960→480 bytes/packet)
- Add DC offset removal for microphone to fix compression artifacts
- Add DISABLE_HARDWARE_MUTE option for boards without mute switch
- Improve mute button behavior to control mic_timeout

New Files:
- onjuino/audio_compression.h: μ-law encoding/decoding implementation
- flash_firmware.sh: Automated compilation and flashing script
- serial_monitor.py: Interactive serial monitor with auto-reconnect
- test_mic_receiver.py: UDP audio recording and compression testing
- test_speaker.py: Speaker testing with local WAV file
- test_streaming_tts.py: ElevenLabs streaming TTS performance testing
- record_from_esp32.py: Simple recording script for testing
- TESTING.md: Testing documentation

Fixes:
- I2S audio output now works on V3 boards (issues #57, #75)
- Microphone compression produces valid audio data
- Serial reset command now works properly (ESP.restart vs esp_restart)

🤖 Generated with [Claude Code](https://claude.com/claude-code)
2026-01-29 00:42:08 -08:00
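For reference, the 2x reduction in 4b450439c1 (960 → 480 bytes per packet) comes from mapping each 16-bit sample to one 8-bit μ-law code. The sketch below implements the standard G.711 μ-law encoding in plain Python; it is not a port of `onjuino/audio_compression.h`, whose exact implementation may differ, and the function names are illustrative.

```python
import struct

BIAS = 0x84   # 132, added before the segment search (G.711 mu-law)
CLIP = 32635  # largest magnitude that still fits after adding BIAS


def lin2ulaw(sample):
    """Encode one signed 16-bit sample as an 8-bit mu-law code."""
    sign = 0x80 if sample < 0 else 0x00
    magnitude = min(abs(sample), CLIP) + BIAS
    # Find the segment: position of the highest set bit, from bit 14 down to bit 7.
    exponent, mask = 7, 0x4000
    while exponent > 0 and not (magnitude & mask):
        exponent -= 1
        mask >>= 1
    mantissa = (magnitude >> (exponent + 3)) & 0x0F
    return ~(sign | (exponent << 4) | mantissa) & 0xFF


def encode_ulaw(pcm):
    """Encode little-endian 16-bit PCM bytes; output is half the input size."""
    count = len(pcm) // 2
    samples = struct.unpack(f"<{count}h", pcm[:count * 2])
    return bytes(lin2ulaw(s) for s in samples)
```

μ-law quantizes samples near zero most finely, so a constant DC offset pushes quiet audio into coarser steps; this is consistent with the commit adding DC offset removal before compression.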
justLV
bc57e687e8 clear 2024-09-09 17:56:26 -07:00
justLV
a5969a7fd4 update to defaults 2024-09-09 17:53:50 -07:00
justLV
53dc65194a config mode 2024-09-09 17:46:51 -07:00
justLV
0649cc72cb Merge pull request #39 from srwalter/srwalter-patch-1
Remove unnecessary header
2024-03-30 11:37:24 -07:00
Steven Walter
f976eef1fc Update onjuino.ino
Remove unnecessary header
2024-01-03 11:31:40 -05:00
justLV
ef4f4a1a86 Update README.md 2023-10-09 19:47:08 -07:00
justLV
fd9e4d4341 Update README.md 2023-10-09 19:46:35 -07:00
justLV
8737480697 Updated PCBWay & design file details 2023-10-03 12:32:51 -07:00
Justin Alvey
6083d12109 Added design files 2023-10-03 12:22:05 -07:00
justLV
d33b584316 Updated readme with downloadable Altium files & PCBWay link 2023-10-03 12:16:22 -07:00
justLV
7c2482aca5 Update README.md 2023-08-09 18:33:46 -07:00
justLV
5041ca6ae3 Update README.md 2023-08-09 14:10:09 -07:00
justLV
4ed7a41b2a Update README.md 2023-08-09 12:49:28 -07:00
Justin Alvey
bc65b07431 Missing openai req 2023-08-09 12:45:54 -07:00
justLV
ff9986a5cd Update README.md 2023-08-09 11:30:27 -07:00
Justin Alvey
24df7d1256 Added white header 2023-08-09 11:29:27 -07:00
justLV
b49a715707 Update README.md 2023-08-09 00:10:26 -07:00
justLV
90c2ea8025 Update README.md 2023-08-09 00:09:55 -07:00
justLV
5511a43f05 Update README.md 2023-08-09 00:05:23 -07:00
justLV
4243a4913d Update README.md 2023-08-08 23:56:28 -07:00
justLV
81eb364d8c Update README.md 2023-08-08 20:42:05 -07:00
justLV
ab2c7c8f0a Update README.md 2023-08-08 20:38:12 -07:00
justLV
c62df66223 Update README.md 2023-08-08 20:34:58 -07:00
Justin Alvey
27030c8094 add to public repo 2023-08-08 20:32:52 -07:00
justLV
1a7fae3d4d Update README.md 2023-08-08 20:16:12 -07:00
Justin Alvey
7b59929af6 renders 2023-08-08 20:07:01 -07:00
Justin Alvey
3f08890e26 images 2023-08-08 20:01:13 -07:00
Justin Alvey
ff2a1c8402 init readme 2023-08-08 19:22:27 -07:00
justLV
ead49decc1 Create LICENSE 2023-08-08 19:17:47 -07:00