asr:
  url: "http://localhost:8100"  # parakeet-asr-server
conversation:
  backend: "agentic"  # "agentic" (e.g. OpenClaw, with tools) or "conversational" (plain chat)
  stall:
    enabled: true  # decide if a stall phrase is needed while the agent works
    base_url: "https://generativelanguage.googleapis.com/v1beta/openai/"
    api_key: "${GEMINI_API_KEY}"
    model: "gemini-2.5-flash"
    reasoning_effort: "none"  # disable thinking for sub-second latency (Gemini 2.5 Flash only)
    max_tokens: 200
    timeout: 1.5  # seconds; skip stall if slower than this
    prompt: |
      You are the bridge voice for a voice assistant — you speak a short,
      natural utterance immediately while the real assistant starts
      working. Your job is to decide whether the user's latest utterance
      needs one, and if so, to say it.
      {recent_context}

      The user just said: {user_text}

      If the assistant can answer entirely from its own knowledge or
      creativity — facts, opinions, jokes, explanations, general knowledge,
      small talk, a partial thought, or a request to keep talking — output
      the literal word NONE. The assistant is itself a capable language
      model and doesn't need bridge audio for anything it can just answer.
      Note: a follow-up that changes a parameter in a previous lookup is a
      fresh lookup, not small talk.

      Otherwise the assistant is about to do slow agentic work — a live
      lookup, a file or API call, or an action like scheduling, saving,
      sending, remembering, or updating something — and you should speak
      a brief, warm bridge phrase while that runs. Two situations:

      Asking FOR information. React naturally and signal you're going to
      go look. Roughly three to seven words, friend energy, specific to
      what the user actually mentioned — use the name of the place,
      person, or thing instead of vague filler. Never predict the answer.

      Asking you to DO something. You are ONLY the bridge voice — you
      have no authority to commit to the action, and the real agent will
      confirm it itself once it's done.

      Your job: speak a short listener-sound that tells the user "I heard
      you" without actually responding to the substance of their request.
      Two to five words, warm and natural, like the reaction a friend
      gives mid-conversation to show they're following. It should feel
      like a backchannel, not a reply.

      Content test you must pass: if a third party read ONLY your phrase,
      without the user's message, they should be unable to guess what the
      user asked for. That means:
      - No verb form of the action — no "adding", "saving", "scheduling",
        "sending", "marking", "reminding", "noting", "creating",
        "updating", "setting up", "putting", etc.
      - No naming of the thing being acted on.
      - No "I'll", "I will", "let me", "I'm going to", "on it",
        "will do", "right away".

      The common failure mode is helpfully narrating the action ("Okay,
      adding that…", "Sure, I'll remember that…") — that is exactly what
      NOT to do, because you cannot honestly make that promise. Stay
      content-free.

      Write fresh each time — don't reach for stock phrases. Match the
      user's register: relaxed if they were relaxed, brisk if they were
      brisk. Keep it under seven words either way. End with normal spoken
      punctuation.

      Output ONLY the spoken phrase, or the literal word NONE. No quotes,
      no explanation, no preamble.
  agentic:
    base_url: "http://127.0.0.1:18789/v1"  # OpenClaw gateway
    api_key: "${OPENCLAW_GATEWAY_TOKEN}"  # env var reference
    model: "openclaw/default"
    max_tokens: 300
    message_channel: "onju-voice"  # x-openclaw-message-channel header
    # provider_model: "anthropic/claude-opus-4-6"  # optional: override backend LLM
    voice_prompt: >-  # prepended to every user message as a reminder
      [voice: this is spoken input transcribed from a microphone and your
      entire response will be read aloud by TTS on a small speaker. Write
      only plain spoken prose — no markdown, no lists, no structured
      reports, no code. If your research produces detailed findings, save
      them to a file and just give a brief spoken summary. Remember, keep
      it conversational.]
  conversational:
    base_url: "https://openrouter.ai/api/v1"  # OpenRouter, Ollama, mlx_lm.server, Gemini, etc.
    api_key: "${OPENROUTER_API_KEY}"  # set key or use ${ENV_VAR} reference
    model: "anthropic/claude-haiku-4.5"
    max_messages: 20
    max_tokens: 300
    system_prompt: "You are a helpful voice assistant. Keep responses concise (under 2 sentences)."
    persist_dir: "data/conversations"  # per-device message history (omit to disable)
    # Fully local example (Ollama):
    # base_url: "http://localhost:11434/v1"
    # api_key: "none"
    # model: "gemma4:e4b"
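The `${ENV_VAR}` references used by the `api_key` fields can be resolved with a small helper. This is an illustrative sketch (function name and warning text are assumptions, not the pipeline's actual resolver):

```python
import os
import re

_ENV_REF = re.compile(r"\$\{([A-Za-z_][A-Za-z0-9_]*)\}")

def resolve_env(value):
    # Expand ${VAR} references in a config string; warn and substitute
    # an empty string when the variable is unset (assumed behavior).
    def repl(match):
        name = match.group(1)
        if name not in os.environ:
            print(f"warning: config references unset env var {name}")
            return ""
        return os.environ[name]
    return _ENV_REF.sub(repl, value)
```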
tts:
  backend: "elevenlabs"  # "local" or "elevenlabs" (cloud)

  local:
    url: "http://localhost:8880"
    model: "mlx-community/Qwen3-TTS-12Hz-1.7B-Base-4bit"
    ref_audio: ""
    ref_text: ""

  elevenlabs:
    api_key: ""  # your ElevenLabs API key
    default_voice: "Archer"
    default_voice_ptt: "Emma"  # PTT devices (smaller speaker)
    voices:
      Archer: "Fahco4VZzobUeiPqni1S"  # British conversational male
      Emma: "56bWURjYFHyYyVf490Dp"  # female, better on small speakers
      Rachel: "21m00Tcm4TlvDq8ikWAM"  # add your voice IDs here
vad:
  threshold: 0.5  # speech onset probability
  neg_threshold: 0.35  # speech offset probability (hysteresis)
  silence_time: 1.5
  pre_buffer_s: 1.0
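The two thresholds implement hysteresis: speech starts when the VAD probability rises above `threshold` and only ends once it falls below `neg_threshold`, so values wobbling between the two don't chop the utterance. A minimal sketch of that gate (not the pipeline's actual VAD code):

```python
def vad_gate(probs, threshold=0.5, neg_threshold=0.35):
    # Map per-chunk speech probabilities to a speaking/silent flag
    # using hysteresis between the onset and offset thresholds.
    speaking = False
    out = []
    for p in probs:
        if not speaking and p >= threshold:
            speaking = True          # onset: crossed the high threshold
        elif speaking and p < neg_threshold:
            speaking = False         # offset: dropped below the low one
        out.append(speaking)
    return out
```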
network:
  udp_port: 3000
  tcp_port: 3001
  multicast_group: "239.0.0.1"
  multicast_port: 12345
  control_port: 3002
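The `control_port` serves the runtime device-management API (POST/GET/DELETE on `/devices`). A minimal client sketch; the JSON payload shape is an assumption mirroring the `name=ip:ptt` device syntax, so check the server for the real schema:

```python
import json
import urllib.request

def device_request(method, host="localhost", port=3002, payload=None):
    # Build an HTTP request against the /devices endpoint on the
    # control port; the body fields below are assumed, not documented.
    url = f"http://{host}:{port}/devices"
    data = json.dumps(payload).encode() if payload is not None else None
    req = urllib.request.Request(url, data=data, method=method)
    if data is not None:
        req.add_header("Content-Type", "application/json")
    return req

# Send with e.g.:
#   urllib.request.urlopen(device_request("POST",
#       payload={"name": "desk", "ip": "192.168.1.50", "ptt": True}))
```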
audio:
  sample_rate: 16000
  chunk_size: 512  # 32ms at 16kHz (matches ESP32 SAMPLE_CHUNK_SIZE)
  opus_frame_size: 320  # 20ms at 16kHz
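The millisecond figures in the comments follow from samples divided by sample rate; a quick arithmetic check (helper name is illustrative):

```python
def chunk_ms(samples, sample_rate=16000):
    # Duration of one audio chunk in milliseconds: samples / rate * 1000.
    return samples * 1000 // sample_rate
```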
device:
  default_volume: 15
  default_mic_timeout: 60
  led_fade: 2
  led_power: 50
  led_update_period: 0.25
  greeting: false
  greeting_wav: "data/hello_imhere.wav"
logging:
  level: "INFO"