onju-v2/PIPELINE.md

2.2 KiB

Pipeline Server

Async voice pipeline that connects ESP32 onju-voice devices to ASR, LLM, and TTS services.

ESP32 (mic) ──UDP/μ-law──▶ Pipeline ──HTTP──▶ ASR Service
                              │
                              ├──▶ LLM (OpenAI-compatible)
                              │
                              ├──▶ TTS (ElevenLabs)
                              │
ESP32 (speaker) ◀──TCP/Opus──┘

Prerequisites

ASR Serviceparakeet-asr-server running on port 8100.

LLM — Any OpenAI-compatible server. Examples:

# Local (mlx_lm)
mlx_lm.server --model mlx-community/gemma-3-4b-it-qat-4bit --port 8080

# Local (Ollama)
ollama serve  # default port 11434

# Hosted — just set base_url and api_key in config.yaml

TTS — ElevenLabs API key (add to config.yaml).

Setup

# From repo root
uv venv && source .venv/bin/activate
uv pip install -r requirements.txt

# macOS: install system libraries
brew install opus portaudio

Configuration

cp pipeline/config.yaml.example pipeline/config.yaml
# Edit config.yaml with your API keys and preferences

Running

Ensure the prerequisite services are running, then start the pipeline from the repo root:

source .venv/bin/activate
python -m pipeline.main

Test Client

A Python script that emulates an ESP32 device (TCP server, Opus decoding, mic streaming):

# From repo root
python test_client.py                  # localhost
python test_client.py 192.168.1.50     # remote server
python test_client.py --no-mic         # playback only

Config Reference

Section Key Description
asr.url ASR service endpoint Default: http://localhost:8100
llm.base_url OpenAI-compatible API base Ollama, mlx_lm, OpenRouter, OpenAI
llm.model Model name Passed to chat completions API
tts.backend TTS provider Currently: elevenlabs
vad.* Voice activity detection Tune thresholds for sensitivity
network.* Ports UDP 3000 (mic), TCP 3001 (speaker), multicast 239.0.0.1:12345