mirror of
https://github.com/justLV/onju-v2
synced 2026-04-21 15:47:55 +00:00
79 lines
2.2 KiB
Markdown
79 lines
2.2 KiB
Markdown
# Pipeline Server
|
|
|
|
Async voice pipeline that connects ESP32 onju-voice devices to ASR, LLM, and TTS services.
|
|
|
|
```
|
|
ESP32 (mic) ──UDP/μ-law──▶ Pipeline ──HTTP──▶ ASR Service
|
|
│
|
|
├──▶ LLM (OpenAI-compatible)
|
|
│
|
|
├──▶ TTS (ElevenLabs)
|
|
│
|
|
ESP32 (speaker) ◀──TCP/Opus──┘
|
|
```
|
|
|
|
## Prerequisites
|
|
|
|
**ASR Service** — [parakeet-asr-server](https://github.com/justLV/parakeet-asr-server) running on port 8100.
|
|
|
|
**LLM** — Any OpenAI-compatible server. Examples:
|
|
```bash
|
|
# Local (mlx_lm)
|
|
mlx_lm.server --model mlx-community/gemma-3-4b-it-qat-4bit --port 8080
|
|
|
|
# Local (Ollama)
|
|
ollama serve # default port 11434
|
|
|
|
# Hosted — just set base_url and api_key in config.yaml
|
|
```
|
|
|
|
**TTS** — ElevenLabs API key (add to `config.yaml`).
|
|
|
|
## Setup
|
|
|
|
```bash
|
|
# From repo root
|
|
uv venv && source .venv/bin/activate
|
|
uv pip install -r requirements.txt
|
|
|
|
# macOS: install system libraries
|
|
brew install opus portaudio
|
|
```
|
|
|
|
## Configuration
|
|
|
|
```bash
|
|
cp pipeline/config.yaml.example pipeline/config.yaml
|
|
# Edit config.yaml with your API keys and preferences
|
|
```
|
|
|
|
## Running
|
|
|
|
Ensure the prerequisite services are running, then start the pipeline from the repo root:
|
|
|
|
```bash
|
|
source .venv/bin/activate
|
|
python -m pipeline.main
|
|
```
|
|
|
|
## Test Client
|
|
|
|
A Python script that emulates an ESP32 device (TCP server, Opus decoding, mic streaming):
|
|
|
|
```bash
|
|
# From repo root
|
|
python test_client.py # localhost
|
|
python test_client.py 192.168.1.50 # remote server
|
|
python test_client.py --no-mic # playback only
|
|
```
|
|
|
|
## Config Reference
|
|
|
|
| Section | Key | Description |
|
|
|---------|-----|-------------|
|
|
| `asr.url` | ASR service endpoint | Default: `http://localhost:8100` |
|
|
| `llm.base_url` | OpenAI-compatible API base | Ollama, mlx_lm, OpenRouter, OpenAI |
|
|
| `llm.model` | Model name | Passed to chat completions API |
|
|
| `tts.backend` | TTS provider | Currently: `elevenlabs` |
|
|
| `vad.*` | Voice activity detection | Tune thresholds for sensitivity |
|
|
| `network.*` | Ports | UDP 3000 (mic), TCP 3001 (speaker), multicast 239.0.0.1:12345 |
|