mirror of
https://github.com/justLV/onju-v2
synced 2026-04-21 15:47:55 +00:00
2.2 KiB
2.2 KiB
Pipeline Server
Async voice pipeline that connects ESP32 onju-voice devices to ASR, LLM, and TTS services.
ESP32 (mic) ──UDP/μ-law──▶ Pipeline ──HTTP──▶ ASR Service
│
├──▶ LLM (OpenAI-compatible)
│
├──▶ TTS (ElevenLabs)
│
ESP32 (speaker) ◀──TCP/Opus──┘
Prerequisites
ASR Service — parakeet-asr-server running on port 8100.
LLM — Any OpenAI-compatible server. Examples:
# Local (mlx_lm)
mlx_lm.server --model mlx-community/gemma-3-4b-it-qat-4bit --port 8080
# Local (Ollama)
ollama serve # default port 11434
# Hosted — just set base_url and api_key in config.yaml
TTS — ElevenLabs API key (add to config.yaml).
Setup
# From repo root
uv venv && source .venv/bin/activate
uv pip install -r requirements.txt
# macOS: install system libraries
brew install opus portaudio
Configuration
cp pipeline/config.yaml.example pipeline/config.yaml
# Edit config.yaml with your API keys and preferences
Running
Ensure the prerequisite services are running, then start the pipeline from the repo root:
source .venv/bin/activate
python -m pipeline.main
Test Client
A Python script that emulates an ESP32 device (TCP server, Opus decoding, mic streaming):
# From repo root
python test_client.py # localhost
python test_client.py 192.168.1.50 # remote server
python test_client.py --no-mic # playback only
Config Reference
| Section | Key | Description |
|---|---|---|
asr.url |
ASR service endpoint | Default: http://localhost:8100 |
llm.base_url |
OpenAI-compatible API base | Ollama, mlx_lm, OpenRouter, OpenAI |
llm.model |
Model name | Passed to chat completions API |
tts.backend |
TTS provider | Currently: elevenlabs |
vad.* |
Voice activity detection | Tune thresholds for sensitivity |
network.* |
Ports | UDP 3000 (mic), TCP 3001 (speaker), multicast 239.0.0.1:12345 |