LocalAI/gallery
Richard Palethorpe c60ed75258 feat(middleware): Model routing, PII filtering, Cloud model proxies
Add a routing middleware stack and a cloud-proxy backend.

* cloud-proxy: a Go gRPC backend that forwards OpenAI- and
  Anthropic-shaped chat requests to upstream providers, with an
  optional translate mode (OpenAI request -> Anthropic /v1/messages
  -> OpenAI response) and full tool-calling support.

* routing: admission control, content-aware model routing
  (embedding cache + classifier + rerank + Arch-Router score),
  PII detection/redaction (regex + NER) with streaming filter and
  OpenAI/Anthropic adapters, and a per-user/per-key billing recorder
  backed by GORM or in-memory storage.

* middleware: UsageMiddleware records usage via the billing recorder,
  plus admission, route-model, usage-stamp and trace middlewares.

* observability: BackendTrace ring buffer stores full request bodies
  (capped), MITM proxy emits structured trace events, and router
  classifier decisions surface at /api/router/decide.

* gallery: Arch-Router-1.5B (Q4_K_M and Q8_0).

* UI: cloud-proxy model-editor fields, classifier system-prompt and
  score-normalization config, and a Traces page rendering request
  bodies.

Assisted-by: claude-code:claude-opus-4-7 [Read] [Edit] [Bash]
Signed-off-by: Richard Palethorpe <io@richiejp.com>
2026-05-24 09:42:31 +01:00
alpaca.yaml feat: refactor build process, drop embedded backends (#5875) 2025-07-22 16:31:04 +02:00
arch-function.yaml feat: refactor build process, drop embedded backends (#5875) 2025-07-22 16:31:04 +02:00
bge-m3-colbert.yaml feat(middleware): Model routing, PII filtering, Cloud model proxies 2026-05-24 09:42:31 +01:00
cerbero.yaml fix: yamlint warnings and errors (#2131) 2024-04-25 17:25:56 +00:00
chatml-hercules.yaml feat: refactor build process, drop embedded backends (#5875) 2025-07-22 16:31:04 +02:00
chatml.yaml feat(gallery): Speed up load times and clean gallery entries (#9211) 2026-05-06 14:51:38 +02:00
codellama.yaml fix: yamlint warnings and errors (#2131) 2024-04-25 17:25:56 +00:00
command-r.yaml feat: refactor build process, drop embedded backends (#5875) 2025-07-22 16:31:04 +02:00
deephermes.yaml feat: refactor build process, drop embedded backends (#5875) 2025-07-22 16:31:04 +02:00
deepseek-r1.yaml feat(gallery): Speed up load times and clean gallery entries (#9211) 2026-05-06 14:51:38 +02:00
deepseek.yaml feat: refactor build process, drop embedded backends (#5875) 2025-07-22 16:31:04 +02:00
dreamshaper.yaml fix: yamlint warnings and errors (#2131) 2024-04-25 17:25:56 +00:00
falcon3.yaml feat: refactor build process, drop embedded backends (#5875) 2025-07-22 16:31:04 +02:00
flux-ggml.yaml fix(flux): Set CFG=1 so that prompts are followed (#5378) 2025-05-16 17:53:54 +02:00
flux.yaml fix(flux): Set CFG=1 so that prompts are followed (#5378) 2025-05-16 17:53:54 +02:00
gemma.yaml feat(gallery): Speed up load times and clean gallery entries (#9211) 2026-05-06 14:51:38 +02:00
granite.yaml feat: refactor build process, drop embedded backends (#5875) 2025-07-22 16:31:04 +02:00
granite3-2.yaml feat: refactor build process, drop embedded backends (#5875) 2025-07-22 16:31:04 +02:00
granite4.yaml feat(gallery): Speed up load times and clean gallery entries (#9211) 2026-05-06 14:51:38 +02:00
harmony.yaml feat(gallery): Speed up load times and clean gallery entries (#9211) 2026-05-06 14:51:38 +02:00
hermes-2-pro-mistral.yaml feat: refactor build process, drop embedded backends (#5875) 2025-07-22 16:31:04 +02:00
hermes-vllm.yaml chore(model-gallery): add more quants for popular models (#3365) 2024-08-24 00:29:24 +02:00
index.yaml feat(middleware): Model routing, PII filtering, Cloud model proxies 2026-05-24 09:42:31 +01:00
jamba.yaml chore(model gallery): add ai21labs_ai21-jamba-reasoning-3b (#6417) 2025-10-09 15:00:56 +02:00
kokoros.yaml feat: Add Kokoros backend (#9212) 2026-04-08 19:23:16 +02:00
lfm.yaml feat(realtime): Add Liquid Audio s2s model and assistant mode on talk page (#9801) 2026-05-13 21:57:27 +02:00
liquid-audio.yaml feat(realtime): Add Liquid Audio s2s model and assistant mode on talk page (#9801) 2026-05-13 21:57:27 +02:00
llama3-instruct.yaml feat: refactor build process, drop embedded backends (#5875) 2025-07-22 16:31:04 +02:00
llama3.1-instruct-grammar.yaml feat: refactor build process, drop embedded backends (#5875) 2025-07-22 16:31:04 +02:00
llama3.1-instruct.yaml feat: refactor build process, drop embedded backends (#5875) 2025-07-22 16:31:04 +02:00
llama3.1-reflective.yaml feat: refactor build process, drop embedded backends (#5875) 2025-07-22 16:31:04 +02:00
llama3.2-fcall.yaml feat: refactor build process, drop embedded backends (#5875) 2025-07-22 16:31:04 +02:00
llama3.2-quantized.yaml feat: refactor build process, drop embedded backends (#5875) 2025-07-22 16:31:04 +02:00
llava.yaml fix: yamlint warnings and errors (#2131) 2024-04-25 17:25:56 +00:00
mathstral.yaml feat: refactor build process, drop embedded backends (#5875) 2025-07-22 16:31:04 +02:00
mistral-0.3.yaml feat: refactor build process, drop embedded backends (#5875) 2025-07-22 16:31:04 +02:00
moondream.yaml feat(gallery): Speed up load times and clean gallery entries (#9211) 2026-05-06 14:51:38 +02:00
mudler.yaml feat: refactor build process, drop embedded backends (#5875) 2025-07-22 16:31:04 +02:00
nanbeige4.1.yaml feat(gallery): Speed up load times and clean gallery entries (#9211) 2026-05-06 14:51:38 +02:00
noromaid.yaml fix: yamlint warnings and errors (#2131) 2024-04-25 17:25:56 +00:00
openvino.yaml feat(gallery): Speed up load times and clean gallery entries (#9211) 2026-05-06 14:51:38 +02:00
parler-tts.yaml fix: yamlint warnings and errors (#2131) 2024-04-25 17:25:56 +00:00
phi-2-chat.yaml feat: refactor build process, drop embedded backends (#5875) 2025-07-22 16:31:04 +02:00
phi-2-orange.yaml feat: refactor build process, drop embedded backends (#5875) 2025-07-22 16:31:04 +02:00
phi-3-chat.yaml feat: refactor build process, drop embedded backends (#5875) 2025-07-22 16:31:04 +02:00
phi-3-vision.yaml fix(phi3-vision): add multimodal template (#3944) 2024-10-23 15:34:45 +02:00
phi-4-chat-fcall.yaml feat: refactor build process, drop embedded backends (#5875) 2025-07-22 16:31:04 +02:00
phi-4-chat.yaml feat: refactor build process, drop embedded backends (#5875) 2025-07-22 16:31:04 +02:00
piper.yaml fix: yamlint warnings and errors (#2131) 2024-04-25 17:25:56 +00:00
pocket-tts.yaml feat(tts): add pocket-tts backend (#8018) 2026-01-13 23:35:19 +01:00
qwen-fcall.yaml feat: refactor build process, drop embedded backends (#5875) 2025-07-22 16:31:04 +02:00
qwen-image.yaml Update qwen-image.yaml 2025-08-06 10:40:46 +02:00
qwen3-deepresearch.yaml chore(model gallery): add alibaba-nlp_tongyi-deepresearch-30b-a3b (#6295) 2025-09-17 09:22:19 +02:00
qwen3-openbuddy.yaml feat: refactor build process, drop embedded backends (#5875) 2025-07-22 16:31:04 +02:00
qwen3.yaml feat(gallery): Speed up load times and clean gallery entries (#9211) 2026-05-06 14:51:38 +02:00
rerankers.yaml fix: yamlint warnings and errors (#2131) 2024-04-25 17:25:56 +00:00
rwkv.yaml feat: refactor build process, drop embedded backends (#5875) 2025-07-22 16:31:04 +02:00
sd-ggml.yaml chore(model gallery): add sd-3.5-large-ggml (#4647) 2025-01-20 19:04:23 +01:00
sentencetransformers.yaml fix: yamlint warnings and errors (#2131) 2024-04-25 17:25:56 +00:00
sglang-gemma-4-e2b-mtp.yaml feat(sglang): wire engine_args, add cuda13 build, ship MTP gallery demos (#9686) 2026-05-07 17:27:29 +02:00
sglang-gemma-4-e4b-mtp.yaml feat(sglang): wire engine_args, add cuda13 build, ship MTP gallery demos (#9686) 2026-05-07 17:27:29 +02:00
sglang-mimo-7b-mtp.yaml feat(sglang): wire engine_args, add cuda13 build, ship MTP gallery demos (#9686) 2026-05-07 17:27:29 +02:00
sglang.yaml feat(sglang): wire engine_args, add cuda13 build, ship MTP gallery demos (#9686) 2026-05-07 17:27:29 +02:00
sherpa-onnx-asr.yaml feat: Add Sherpa ONNX backend for ASR and TTS (#8523) 2026-04-24 14:40:06 +02:00
sherpa-onnx-tts.yaml feat: Add Sherpa ONNX backend for ASR and TTS (#8523) 2026-04-24 14:40:06 +02:00
sherpa-onnx-vad.yaml feat: Add Sherpa ONNX backend for ASR and TTS (#8523) 2026-04-24 14:40:06 +02:00
smolvlm.yaml feat(gallery): Speed up load times and clean gallery entries (#9211) 2026-05-06 14:51:38 +02:00
stablediffusion3.yaml feat(sd-3): add stablediffusion 3 support (#2591) 2024-06-18 15:09:39 +02:00
tuluv2.yaml feat: refactor build process, drop embedded backends (#5875) 2025-07-22 16:31:04 +02:00
vibevoice.yaml feat(vibevoice): add new backend (#7494) 2025-12-10 21:14:21 +01:00
vicuna-chat.yaml models(gallery): add apollo2-9b (#3860) 2024-10-17 10:16:52 +02:00
virtual.yaml fix: yamlint warnings and errors (#2131) 2024-04-25 17:25:56 +00:00
vllm.yaml feat: refactor build process, drop embedded backends (#5875) 2025-07-22 16:31:04 +02:00
wan-ggml.yaml chore(gallery): fixup wan 2026-04-19 21:31:22 +00:00
whisper-base.yaml models(gallery): add all whisper variants (#2462) 2024-06-01 20:04:03 +02:00
wizardlm2.yaml feat: refactor build process, drop embedded backends (#5875) 2025-07-22 16:31:04 +02:00
z-image-ggml.yaml Fix load of z-image-turbo (#9264) 2026-04-11 08:42:13 +02:00