LocalAI

mirror of https://github.com/mudler/LocalAI synced 2026-05-24 09:28:23 +00:00

Richard Palethorpe c60ed75258 feat(middleware): Model routing, PII filtering, Cloud model proxies

Add a routing middleware stack and a cloud-proxy backend.

* cloud-proxy: a Go gRPC backend that forwards OpenAI- and Anthropic-shaped chat requests to upstream providers, with an optional translate mode (OpenAI request -> Anthropic /v1/messages -> OpenAI response) and full tool-calling support.
* routing: admission control, content-aware model routing (embedding cache + classifier + rerank + Arch-Router score), PII detection/redaction (regex + NER) with streaming filter and OpenAI/Anthropic adapters, and a per-user/per-key billing recorder backed by GORM or in-memory storage.
* middleware: UsageMiddleware records usage via the billing recorder, plus admission, route-model, usage-stamp and trace middlewares.
* observability: BackendTrace ring buffer stores full request bodies (capped), MITM proxy emits structured trace events, and router classifier decisions surface at /api/router/decide.
* gallery: Arch-Router-1.5B (Q4_K_M and Q8_0).
* UI: cloud-proxy model-editor fields, classifier system-prompt and score-normalization config, and a Traces page rendering request bodies.

Assisted-by: claude-code:claude-opus-4-7 [Read] [Edit] [Bash]
Signed-off-by: Richard Palethorpe <io@richiejp.com>
2026-05-24 09:42:31 +01:00
gen_inference_defaults	feat: inferencing default, automatic tool parsing fallback and wire min_p (#9092)	2026-03-22 00:57:15 +01:00
meta	feat(middleware): Model routing, PII filtering, Cloud model proxies	2026-05-24 09:42:31 +01:00
application_config.go	feat(middleware): Model routing, PII filtering, Cloud model proxies	2026-05-24 09:42:31 +01:00
application_config_test.go	feat: backend versioning, upgrade detection and auto-upgrade (#9315)	2026-04-11 22:31:15 +02:00
backend_capabilities.go	feat(realtime): Add Liquid Audio s2s model and assistant mode on talk page (#9801)	2026-05-13 21:57:27 +02:00
backend_capabilities_test.go	feat(gallery): Speed up load times and clean gallery entries (#9211)	2026-05-06 14:51:38 +02:00
backend_hooks.go	feat(vllm): parity with llama.cpp backend (#9328)	2026-04-13 11:00:29 +02:00
config_suite_test.go	dependencies(grpcio): bump to fix CI issues (#2362)	2024-05-21 14:33:47 +02:00
distributed_config.go	fix(distributed): make admin backend installs resilient and observable (#9958)	2026-05-23 12:35:44 +02:00
distributed_config_test.go	fix(distributed): make admin backend installs resilient and observable (#9958)	2026-05-23 12:35:44 +02:00
gallery.go	feat(gallery): verify backend OCI images with keyless cosign (#9823)	2026-05-18 08:02:20 +02:00
gguf.go	feat(llama-cpp): bump to MTP-merge SHA and automatically set MTP defaults (#9852)	2026-05-16 22:42:48 +02:00
gguf_reasoning_test.go	Respect explicit reasoning config during GGUF thinking probe (#9463)	2026-04-21 21:53:10 +02:00
hooks_llamacpp.go	feat(vllm): parity with llama.cpp backend (#9328)	2026-04-13 11:00:29 +02:00
hooks_test.go	feat(config): default prompt_cache_all to true (#9951)	2026-05-22 22:06:22 +02:00
hooks_vllm.go	feat(vllm): expose AsyncEngineArgs via generic engine_args YAML map (#9563)	2026-04-29 00:49:28 +02:00
inference_defaults.go	feat: inferencing default, automatic tool parsing fallback and wire min_p (#9092)	2026-03-22 00:57:15 +01:00
inference_defaults.json	chore: bump inference defaults from unsloth (#9396)	2026-04-17 09:05:55 +02:00
inference_defaults_test.go	feat: inferencing default, automatic tool parsing fallback and wire min_p (#9092)	2026-03-22 00:57:15 +01:00
mitm_host_owners_test.go	feat(middleware): Model routing, PII filtering, Cloud model proxies	2026-05-24 09:42:31 +01:00
model_config.go	feat(middleware): Model routing, PII filtering, Cloud model proxies	2026-05-24 09:42:31 +01:00
model_config_filter.go	feat: add distributed mode (#9124)	2026-03-30 00:47:27 +02:00
model_config_loader.go	feat(middleware): Model routing, PII filtering, Cloud model proxies	2026-05-24 09:42:31 +01:00
model_config_loader_test.go	feat(concurrency-groups): per-model exclusive groups for backend loading (#9662)	2026-05-05 08:42:50 +02:00
model_config_test.go	feat(middleware): Model routing, PII filtering, Cloud model proxies	2026-05-24 09:42:31 +01:00
model_test.go	fix(tests): inline model_test fixtures after tests/models_fixtures removal	2026-04-28 12:58:49 +00:00
mtp.go	feat(llama-cpp): bump to MTP-merge SHA and automatically set MTP defaults (#9852)	2026-05-16 22:42:48 +02:00
mtp_test.go	feat(llama-cpp): bump to MTP-merge SHA and automatically set MTP defaults (#9852)	2026-05-16 22:42:48 +02:00
parser_defaults.json	feat(vllm): parity with llama.cpp backend (#9328)	2026-04-13 11:00:29 +02:00
runtime_settings.go	feat(middleware): Model routing, PII filtering, Cloud model proxies	2026-05-24 09:42:31 +01:00
runtime_settings_persist.go	feat(branding): admin-configurable instance name, tagline, and assets (#9635)	2026-05-02 15:51:36 +02:00
runtime_settings_persist_test.go	feat(middleware): Model routing, PII filtering, Cloud model proxies	2026-05-24 09:42:31 +01:00