LocalAI/docs/content/features
Richard Palethorpe c60ed75258 feat(middleware): Model routing, PII filtering, Cloud model proxies
Add a routing middleware stack and a cloud-proxy backend.

* cloud-proxy: a Go gRPC backend that forwards OpenAI- and
  Anthropic-shaped chat requests to upstream providers, with an
  optional translate mode (OpenAI request -> Anthropic /v1/messages
  -> OpenAI response) and full tool-calling support.

* routing: admission control, content-aware model routing
  (embedding cache + classifier + rerank + Arch-Router score),
  PII detection/redaction (regex + NER) with streaming filter and
  OpenAI/Anthropic adapters, and a per-user/per-key billing recorder
  backed by GORM or in-memory storage.

* middleware: UsageMiddleware (records usage via the billing recorder)
  plus admission, route-model, usage-stamp and trace middlewares.

* observability: BackendTrace ring buffer stores full request bodies
  (capped), MITM proxy emits structured trace events, and router
  classifier decisions surface at /api/router/decide.

* gallery: Arch-Router-1.5B (Q4_K_M and Q8_0).

* UI: cloud-proxy model-editor fields, classifier system-prompt and
  score-normalization config, and a Traces page rendering request
  bodies.

Assisted-by: claude-code:claude-opus-4-7 [Read] [Edit] [Bash]
Signed-off-by: Richard Palethorpe <io@richiejp.com>
2026-05-24 09:42:31 +01:00
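The commit message says the routing stack does PII detection/redaction with "regex + NER". As a rough illustration of the regex half only, here is a minimal Go sketch. The patterns, the placeholder tokens and the `RedactPII` name are hypothetical and are not taken from LocalAI's code. The NER pass, the streaming filter and the OpenAI/Anthropic adapters are not shown.

```go
package main

import "regexp"

// piiRule pairs a pattern with the placeholder that replaces each match.
// Hypothetical rule set for illustration; not LocalAI's actual patterns.
type piiRule struct {
	pattern     *regexp.Regexp
	placeholder string
}

var piiRules = []piiRule{
	// Email addresses such as bob@example.com.
	{regexp.MustCompile(`[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}`), "[EMAIL]"},
	// Dotted IPv4-looking strings such as 10.0.0.1 (octet ranges are not validated).
	{regexp.MustCompile(`\b(?:\d{1,3}\.){3}\d{1,3}\b`), "[IPV4]"},
}

// RedactPII applies each rule in order, replacing every match with the
// rule's placeholder. ReplaceAllLiteralString avoids $-expansion in placeholders.
func RedactPII(text string) string {
	for _, r := range piiRules {
		text = r.pattern.ReplaceAllLiteralString(text, r.placeholder)
	}
	return text
}
```

Regex rules like these are cheap and predictable but only catch well-structured identifiers. Names, addresses and similar free-form PII are the kind of thing an NER model is needed for, which is presumably why the commit combines the two.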
_index.en.md fix(docs): fix broken references to distributed mode 2026-04-03 09:46:06 +02:00
agents.md fix(docs): fix broken references to distributed mode 2026-04-03 09:46:06 +02:00
api-discovery.md feat(api): Allow coding agents to interactively discover how to control and configure LocalAI (#9084) 2026-04-04 15:14:35 +02:00
audio-diarization.md feat(api): add /v1/audio/diarization endpoint with sherpa-onnx + vibevoice.cpp (#9654) 2026-05-05 15:10:13 +02:00
audio-to-text.md feat(api): add /v1/audio/diarization endpoint with sherpa-onnx + vibevoice.cpp (#9654) 2026-05-05 15:10:13 +02:00
audio-transform.md fix: unbreak master CI (docs, kokoros, vibevoice-cpp ABI) (#9682) 2026-05-06 10:36:59 +02:00
authentication.md feat(usage): track and visualise usage per API key (#9920) 2026-05-21 16:34:02 +02:00
backend-monitor.md fix(backend-monitor): accept model as a query parameter (#9411) 2026-04-21 22:06:35 +02:00
backends.md fix(docs): fix broken references to distributed mode 2026-04-03 09:46:06 +02:00
cloud-proxy.md feat(middleware): Model routing, PII filtering, Cloud model proxies 2026-05-24 09:42:31 +01:00
constrained_grammars.md fix(docs): fix broken references to distributed mode 2026-04-03 09:46:06 +02:00
distributed-mode.md fix(distributed): make admin backend installs resilient and observable (#9958) 2026-05-23 12:35:44 +02:00
distributed_inferencing.md fix(docs): fix broken references to distributed mode 2026-04-03 09:46:06 +02:00
distribution.md fix(docs): commit distribution.md 2026-04-03 10:14:13 +02:00
embeddings.md feat(face-recognition): add insightface/onnx backend for 1:1 verify, 1:N identify, embedding, detection, analysis (#9480) 2026-04-22 21:55:41 +02:00
face-recognition.md fix(docs): replace Docsy alert shortcode with Relearn notice 2026-04-25 21:04:31 +00:00
fine-tuning.md fix(docs): Use notice instead of alert (#9134) 2026-03-25 13:55:48 +01:00
gpt-vision.md fix(docs): fix broken references to distributed mode 2026-04-03 09:46:06 +02:00
GPU-acceleration.md feat(rocm): bump to 7.x (#9323) 2026-04-12 08:51:30 +02:00
image-generation.md fix(docs): fix broken references to distributed mode 2026-04-03 09:46:06 +02:00
localai-assistant.md feat: localai assistant chat modality (#9602) 2026-04-28 19:29:27 +02:00
mcp.md fix(docs): fix broken references to distributed mode 2026-04-03 09:46:06 +02:00
middleware.md feat(middleware): Model routing, PII filtering, Cloud model proxies 2026-05-24 09:42:31 +01:00
mitm-proxy.md feat(middleware): Model routing, PII filtering, Cloud model proxies 2026-05-24 09:42:31 +01:00
mlx-distributed.md feat(mlx-distributed): add new MLX-distributed backend (#8801) 2026-03-09 17:29:32 +01:00
model-gallery.md fix(docs): fix broken references to distributed mode 2026-04-03 09:46:06 +02:00
object-detection.md feat(face-recognition): add insightface/onnx backend for 1:1 verify, 1:N identify, embedding, detection, analysis (#9480) 2026-04-22 21:55:41 +02:00
openai-functions.md docs: document tool calling on vLLM and MLX backends 2026-04-13 16:58:55 +00:00
openai-realtime.md Remove header from OpenAI Realtime API documentation 2026-04-09 09:00:28 +02:00
p2p.md feat: Add documentation for undocumented API endpoints (#8852) 2026-03-08 17:59:33 +01:00
quantization.md fix(docs): Use notice instead of alert (#9134) 2026-03-25 13:55:48 +01:00
reranker.md fix(docs): fix broken references to distributed mode 2026-04-03 09:46:06 +02:00
runtime-settings.md fix(docs): fix broken references to distributed mode 2026-04-03 09:46:06 +02:00
sound-generation.md feat: Add documentation for undocumented API endpoints (#8852) 2026-03-08 17:59:33 +01:00
stores.md fix(docs): replace Docsy alert shortcode with Relearn notice 2026-04-25 21:04:31 +00:00
text-generation.md feat(llama-cpp): make server-side prompt cache work by default (#9925) 2026-05-21 16:31:48 +02:00
text-to-audio.md fix(docs): fix broken references to distributed mode 2026-04-03 09:46:06 +02:00
video-generation.md feat: Add documentation for undocumented API endpoints (#8852) 2026-03-08 17:59:33 +01:00
voice-activity-detection.md feat: Add documentation for undocumented API endpoints (#8852) 2026-03-08 17:59:33 +01:00
voice-recognition.md fix(docs): replace Docsy alert shortcode with Relearn notice 2026-04-25 21:04:31 +00:00
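The commit message at the top of this listing mentions a BackendTrace ring buffer that "stores full request bodies (capped)". The listing does not show how that is implemented. The following is a minimal Go sketch of the general idea: a fixed-capacity, mutex-guarded ring that overwrites the oldest entry and truncates bodies to a byte limit. The type and field names (`Trace`, `traceRing`, `maxBody`) are hypothetical.

```go
package main

import "sync"

// Trace is a hypothetical record of one proxied backend request.
type Trace struct {
	Model     string
	Body      []byte
	Truncated bool // true if Body was cut to the size cap
}

// traceRing keeps the most recent len(buf) traces, overwriting the oldest.
type traceRing struct {
	mu      sync.Mutex
	buf     []Trace
	next    int  // slot the next Add writes to
	full    bool // true once every slot has been written at least once
	maxBody int  // bodies longer than this many bytes are truncated
}

// newTraceRing assumes capacity > 0.
func newTraceRing(capacity, maxBody int) *traceRing {
	return &traceRing{buf: make([]Trace, capacity), maxBody: maxBody}
}

// Add stores a copy of body, truncated to maxBody bytes if necessary.
func (r *traceRing) Add(model string, body []byte) {
	r.mu.Lock()
	defer r.mu.Unlock()
	t := Trace{Model: model}
	if len(body) > r.maxBody {
		body = body[:r.maxBody]
		t.Truncated = true
	}
	// Copy so later changes to the caller's slice do not alter the stored trace.
	t.Body = append([]byte(nil), body...)
	r.buf[r.next] = t
	r.next = (r.next + 1) % len(r.buf)
	if r.next == 0 {
		r.full = true
	}
}

// Snapshot returns the stored traces ordered from oldest to newest.
func (r *traceRing) Snapshot() []Trace {
	r.mu.Lock()
	defer r.mu.Unlock()
	if !r.full {
		return append([]Trace(nil), r.buf[:r.next]...)
	}
	out := make([]Trace, 0, len(r.buf))
	out = append(out, r.buf[r.next:]...)
	return append(out, r.buf[:r.next]...)
}
```

Both bounds matter for memory. The entry count caps how many traces are kept, and the body cap keeps a single large prompt from dominating the buffer, at the cost of storing only a prefix of oversized requests.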