onju-v2

mirror of https://github.com/justLV/onju-v2 synced 2026-04-21 15:47:55 +00:00

Author	SHA1	Message	Date
justLV	daeaba9bf8	Move touch polling to FreeRTOS task (fix long press during playback). Long-press detection was in loop(), which blocks during TCP audio handling. It now runs in a dedicated touchTask on Core 1 that polls every 20ms regardless of what loop() is doing.	2026-03-27 19:15:16 -07:00
justLV	9dc9abf753	Add long-press to end call, short-press call lifecycle, reduce mic timeout. Changes: add long-press detection (1.5s hold) on center touch to explicitly end the call (stops mic, interrupts playback, shows slow amber LED pulse); rewrite touch handler so the ISR records touch start and loop() polls for release to distinguish short press (<1.5s) from long press (>=1.5s); add callActive state to track call lifecycle (tap to start, long-hold to end); short press when idle shows a subtle green flash (server confirms with full pulse once WebRTC call is established); reduce default mic timeout from 60s to 20s (server VAD extends it when active); guard the 0xCC handler so it doesn't extend the mic after the user explicitly ended the call; reset callActive on natural mic timeout.	2026-03-27 11:04:48 -07:00
justLV	4312f8134a	Switch LLM to Gemini 2.5 Flash with thinking disabled. Resolve ${VAR} env var references in config api_key; support thinking_budget config to control Gemini thinking mode.	2026-03-01 18:10:58 -08:00
justLV	9024dd53a6	Add --warmup and --persist CLI flags for pipeline startup. --warmup validates LLM and TTS backends on startup with test requests, logging timing and response validation. --persist (off by default) restores device state across restarts, with message sanitization to ensure proper role alternation for Gemma 3's chat template.	2026-02-11 17:59:49 -08:00
justLV	13f9d59245	Add Qwen3-TTS as local TTS backend with voice cloning. Adds mlx-audio-based Qwen3-TTS as an alternative to ElevenLabs, enabling fully offline voice synthesis with voice cloning from a short reference audio clip. Benchmarked at 0.52x RTF (sub-realtime) on Apple Silicon with the 1.7B-Base-4bit model.	2026-02-09 13:53:46 -08:00
justLV	0c9c75b3bf	Replace webrtcvad with Silero VAD (ONNX, no PyTorch). Switch from webrtcvad's binary is_speech to Silero VAD's calibrated float probability via direct ONNX session calls with numpy. The LSTM provides temporal smoothing natively, eliminating the sliding window hack. Frame size changes from 480 (30ms) to 512 (32ms) end-to-end to match Silero's requirements. Consolidate pipeline/requirements.txt into root requirements.txt, swapping webrtcvad+setuptools for silero-vad+onnxruntime.	2026-02-07 17:00:02 -08:00
justLV	4efeeaea2b	Update PIPELINE.md for root-level venv and setup	2026-02-07 16:28:39 -08:00
justLV	496f614cb5	cleanup architecture doc	2026-02-07 16:27:32 -08:00
justLV	7162aa0f3b	Improve pipeline setup, logging, and test client compatibility. Move venv to repo root with combined requirements.txt, fix libopus/portaudio discovery on macOS, replace deprecated audioop with numpy u-law encoder, add colored pipeline logging with suppressed third-party noise, fix mic deadlock on non-speech rejection, fix localhost IP mismatch for test client, add VAD visualization bar, tune VAD for conversational speech, and move runtime data to gitignored data/ directory.	2026-02-07 16:22:53 -08:00
justLV	b3538493a6	Add modular async pipeline server and ESP32 mDNS fallback. Pipeline: async voice pipeline replacing monolithic threaded server. ASR, LLM, and TTS are independent pluggable services. ASR calls external parakeet-asr-server, LLM uses any OpenAI-compatible endpoint, TTS uses ElevenLabs with pluggable backend interface. Firmware: add mDNS hostname resolution as fallback when multicast discovery doesn't work. Resolves configured server_hostname via MDNS.queryHost() on boot, falls back to multicast if resolution fails. Also adds test_client.py, which emulates an ESP32 device for testing without hardware (TCP server, Opus decode, mic streaming).	2026-02-07 15:04:12 -08:00
justLV	7c531c90df	Reduce playback buffer from 512ms to 256ms with Opus. With Opus compression providing consistent frame delivery, we can safely reduce the jitter buffer from 8192 samples (512ms) to 4096 samples (256ms), cutting latency in half. 🤖 Generated with [Claude Code](https://claude.com/claude-code)	2026-02-03 18:37:32 -08:00
justLV	aedea0d568	Document flash_firmware.sh compile-only usage in README. Added section explaining how to use flash_firmware.sh for compile-only mode (no ESP32 needed), auto-detect and flash, and flash to a specific port. Emphasized using compile-only mode to verify code before committing. 🤖 Generated with Claude Code (https://claude.com/claude-code)	2026-02-02 11:33:02 -08:00
justLV	f3b5b6a7f8	Add compile-only mode to flash_firmware.sh. Added compile-only mode that skips upload: ./flash_firmware.sh compile. Updated usage: flash_firmware.sh (auto-detect and upload); flash_firmware.sh /dev/cu.usbmodem1 (upload to specific port); flash_firmware.sh compile (compile only, no upload). Useful for verifying that code compiles without needing an ESP32 connected. 🤖 Generated with Claude Code (https://claude.com/claude-code)	2026-02-02 11:32:27 -08:00
justLV	fc2d8412ed	Standardize mic timeout to 60s for all activation paths. Changed all mic activation paths to use a 60s timeout: center tap to start call, 30s → 60s; center tap to interrupt, 30s → 60s; after assistant audio, enforce minimum 60s (was using server value). Behavior: tap center → mic enabled for 60s; assistant speaks → mic auto-enabled for 60s after playback; tap during playback → interrupts and mic enabled for 60s. This ensures users always have adequate time to respond without premature timeout, matching the intended UX. 🤖 Generated with Claude Code (https://claude.com/claude-code)	2026-02-02 11:30:07 -08:00
justLV	3586a05b0b	Add touch debouncing to prevent rapid-fire interruptions. Added 800ms debounce to all touch pads (left, center, right) to prevent accidental multiple touches from interrupting audio playback. Changes: added debounce timing variables (lines 57-61); implemented debounce logic in gotTouch1/2/3 handlers (lines 967-1032); each touch pad has an independent debounce timer; touches within 800ms of the previous touch are ignored. This prevents issues where a center tap would trigger multiple times from a single press, audio playback would be interrupted repeatedly, and user experience was degraded by touch sensitivity. The 800ms window provides a good balance between preventing hardware bounces and maintaining a responsive feel for legitimate user input. 🤖 Generated with Claude Code	2026-02-02 01:32:12 -08:00
justLV	dd3dad883a	Add Opus compression support to ElevenLabs streaming test. Enhances test_streaming_tts.py to support optional Opus encoding for streaming TTS audio from ElevenLabs to ESP32. Features: add --opus flag to enable Opus compression; accept ESP32 IP as command-line argument; buffer PCM chunks into 20ms frames (640 bytes) for Opus encoding; send with length-prefixed framing (compatible with ESP32 decoder); display compression statistics when using Opus. Usage: python test_streaming_tts.py [ESP32_IP] [--opus]. Results with Opus: compression ratio ~14.5x (248KB PCM → 17KB Opus); bandwidth 256 kbps → ~17 kbps (93% reduction); maintains streaming latency (~2s to first chunk); high quality voice for human listening. Tested successfully with ElevenLabs API streaming to ESP32-S3.	2026-01-31 19:18:58 -08:00
justLV	b25300a6c6	Add tap-to-interrupt playback feature. Allows the user to interrupt TTS playback mid-stream by tapping the center touch button. Enables immediate voice input without waiting for the assistant to finish speaking. Implementation: add interruptPlayback volatile flag for ISR-safe signaling; Opus decode task checks flag on each frame decode iteration; PCM playback checks flag on each buffer read iteration; on interrupt, stop decoding, clear I2S DMA buffers, drain TCP; TCP drain runs for 1s to discard in-flight audio from server; skip silence buffer flush when interrupted (exit immediately); enable microphone with 30s timeout for user response. Behavior: latency ~500ms (acceptable, next buffer iteration); visual feedback: green LED indicates listening mode; server timeout value still respected (gives user time to speak); works for both Opus and PCM audio streams. User flow: (1) user taps during playback; (2) audio stops within ~500ms; (3) green LED pulses (listening mode); (4) microphone enabled for 30s; (5) user can speak immediately.	2026-01-31 19:01:31 -08:00
justLV	c3514ceb49	Add Opus compression for speaker audio. Implements Opus decoding on ESP32 for TTS playback, achieving 14-16x compression over raw PCM. This improves WiFi throughput margin from 2.2x to 30x+, enabling reliable operation throughout the home even with poor WiFi conditions. Key changes: add Opus decoder to ESP32 firmware with dedicated 32KB FreeRTOS task; implement length-prefixed TCP framing for variable-bitrate Opus frames; update header protocol: header[5] = compression type (0=PCM, 1=μ-law, 2=Opus); auto-detect USB port in flash and serial monitor scripts; add test script with opuslib encoder supporting WAV/M4A/MP3 input; document architecture and design rationale for μ-law/UDP (mic) vs Opus/TCP (speaker). Performance: compression 640 bytes PCM → 35-50 bytes Opus per 20ms frame (14-16x); bandwidth 256 kbps → 16 kbps (94% reduction); WiFi margin 2.2x → 30x+ throughput safety margin; CPU usage ~10-20% during playback on ESP32-S3; quality: high-fidelity voice suitable for human listening. 🤖 Generated with [Claude Code](https://claude.com/claude-code)	2026-01-31 17:41:16 -08:00
justLV	4b450439c1	Add μ-law compression and fix ESP32-S3 V3 audio issues. This commit adds audio compression and fixes critical I2S configuration issues that prevented audio playback on ESP32-S3 V3 boards. Key changes: fix I2S channel from RIGHT to LEFT (V3 board requirement); replace deprecated I2S_COMM_FORMAT_I2S with I2S_COMM_FORMAT_STAND_I2S; add μ-law compression for 2x bandwidth reduction (960→480 bytes/packet); add DC offset removal for microphone to fix compression artifacts; add DISABLE_HARDWARE_MUTE option for boards without mute switch; improve mute button behavior to control mic_timeout. New files: onjuino/audio_compression.h (μ-law encoding/decoding implementation); flash_firmware.sh (automated compilation and flashing script); serial_monitor.py (interactive serial monitor with auto-reconnect); test_mic_receiver.py (UDP audio recording and compression testing); test_speaker.py (speaker testing with local WAV file); test_streaming_tts.py (ElevenLabs streaming TTS performance testing); record_from_esp32.py (simple recording script for testing); TESTING.md (testing documentation). Fixes: I2S audio output now works on V3 boards (issue #57, #75); microphone compression produces valid audio data; serial reset command now works properly (ESP.restart vs esp_restart). 🤖 Generated with [Claude Code](https://claude.com/claude-code)	2026-01-29 00:42:08 -08:00
justLV	bc57e687e8	clear	2024-09-09 17:56:26 -07:00
justLV	a5969a7fd4	update to defaults	2024-09-09 17:53:50 -07:00
justLV	53dc65194a	config mode	2024-09-09 17:46:51 -07:00
justLV	0649cc72cb	Merge pull request #39 from srwalter/srwalter-patch-1: Remove unnecessary header	2024-03-30 11:37:24 -07:00
Steven Walter	f976eef1fc	Update onjuino.ino: Remove unnecessary header	2024-01-03 11:31:40 -05:00
justLV	ef4f4a1a86	Update README.md	2023-10-09 19:47:08 -07:00
justLV	fd9e4d4341	Update README.md	2023-10-09 19:46:35 -07:00
justLV	8737480697	Updated PCBWay & design file details	2023-10-03 12:32:51 -07:00
Justin Alvey	6083d12109	Added design files	2023-10-03 12:22:05 -07:00
justLV	d33b584316	Updated readme with downloadable Altium files & PCBWay link	2023-10-03 12:16:22 -07:00
justLV	7c2482aca5	Update README.md	2023-08-09 18:33:46 -07:00
justLV	5041ca6ae3	Update README.md	2023-08-09 14:10:09 -07:00
justLV	4ed7a41b2a	Update README.md	2023-08-09 12:49:28 -07:00
Justin Alvey	bc65b07431	Missing openai req	2023-08-09 12:45:54 -07:00
justLV	ff9986a5cd	Update README.md	2023-08-09 11:30:27 -07:00
Justin Alvey	24df7d1256	Added white header	2023-08-09 11:29:27 -07:00
justLV	b49a715707	Update README.md	2023-08-09 00:10:26 -07:00
justLV	90c2ea8025	Update README.md	2023-08-09 00:09:55 -07:00
justLV	5511a43f05	Update README.md	2023-08-09 00:05:23 -07:00
justLV	4243a4913d	Update README.md	2023-08-08 23:56:28 -07:00
justLV	81eb364d8c	Update README.md	2023-08-08 20:42:05 -07:00
justLV	ab2c7c8f0a	Update README.md	2023-08-08 20:38:12 -07:00
justLV	c62df66223	Update README.md	2023-08-08 20:34:58 -07:00
Justin Alvey	27030c8094	add to public repo	2023-08-08 20:32:52 -07:00
justLV	1a7fae3d4d	Update README.md	2023-08-08 20:16:12 -07:00
Justin Alvey	7b59929af6	renders	2023-08-08 20:07:01 -07:00
Justin Alvey	3f08890e26	images	2023-08-08 20:01:13 -07:00
Justin Alvey	ff2a1c8402	init readme	2023-08-08 19:22:27 -07:00
justLV	ead49decc1	Create LICENSE	2023-08-08 19:17:47 -07:00
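
Commits c3514ceb49 and dd3dad883a mention length-prefixed TCP framing for variable-size Opus frames. The log does not state the prefix width or byte order, so the sketch below assumes a 2-byte big-endian length before each frame. The names pack_frames and FrameParser are hypothetical, chosen for illustration, and are not taken from the repository.

```python
import struct

# Assumption: each frame is preceded by a 2-byte big-endian length.
# The real wire format is not given in this log.
PREFIX = struct.Struct(">H")


def pack_frames(frames):
    """Concatenate frames, each preceded by its length prefix."""
    return b"".join(PREFIX.pack(len(f)) + f for f in frames)


class FrameParser:
    """Reassemble whole frames from arbitrary TCP chunk boundaries."""

    def __init__(self):
        self._buf = bytearray()

    def feed(self, chunk):
        """Add received bytes and return the frames completed by them."""
        self._buf.extend(chunk)
        frames = []
        while len(self._buf) >= PREFIX.size:
            (length,) = PREFIX.unpack_from(self._buf, 0)
            end = PREFIX.size + length
            if len(self._buf) < end:
                break  # frame body not fully received yet
            frames.append(bytes(self._buf[PREFIX.size:end]))
            del self._buf[:end]
        return frames
```

Because TCP delivers a byte stream rather than messages, the receiver has to buffer until a full prefix and body are present. That is why feed() may return zero, one, or several frames per call.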

48 commits
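
Commit 4b450439c1 adds μ-law compression that halves mic packets (960→480 bytes), and 7162aa0f3b replaces audioop with a numpy u-law encoder. Neither implementation appears in this log, so the following is only a plain-Python sketch of the standard G.711 μ-law encoding step, one byte per signed 16-bit sample. The function names are hypothetical.

```python
BIAS = 0x84   # 132, added to the magnitude before the segment is found
CLIP = 32635  # largest magnitude that still fits in 15 bits after BIAS


def ulaw_encode_sample(sample):
    """Encode one signed 16-bit PCM sample as one G.711 mu-law byte."""
    sign = 0x80 if sample < 0 else 0x00
    magnitude = min(abs(sample), CLIP) + BIAS
    # Segment (exponent): position of the highest set bit relative to bit 7.
    exponent = magnitude.bit_length() - 8
    # Mantissa: the four bits just below the highest set bit.
    mantissa = (magnitude >> (exponent + 3)) & 0x0F
    # G.711 transmits the bitwise complement of the assembled byte.
    return ~(sign | (exponent << 4) | mantissa) & 0xFF


def ulaw_encode(samples):
    """Encode an iterable of 16-bit samples into a bytes object."""
    return bytes(ulaw_encode_sample(s) for s in samples)
```

For example, 480 samples (960 bytes of 16-bit PCM) encode to 480 bytes, which matches the 2x reduction quoted in the commit message.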