LocalAI/backend/python
ace-step feat(rocm): bump to 7.x (#9323) 2026-04-12 08:51:30 +02:00
chatterbox fix(chatterbox): install chatterbox-tts with --no-deps and pin runtime deps 2026-05-07 09:03:40 +00:00
common fix(python-backend): make JIT subprocesses work on hosts of any size (#9679) 2026-05-06 00:28:01 +02:00
coqui chore(deps): bump packaging from 24.1 to 26.2 in /backend/python/coqui (#9594) 2026-04-28 08:44:53 +02:00
diffusers fix(diffusers): drop compel from requirements to unblock pip resolver (#9632) 2026-05-01 14:45:14 +02:00
faster-qwen3-tts feat: add distributed mode (#9124) 2026-03-30 00:47:27 +02:00
faster-whisper test(ci): trigger faster-whisper rebuild to observe per-arch+merge 2026-05-08 22:09:46 +00:00
fish-speech feat(rocm): bump to 7.x (#9323) 2026-04-12 08:51:30 +02:00
insightface feat: add biometrics UI (#9524) 2026-04-24 08:50:34 +02:00
kitten-tts feat: add distributed mode (#9124) 2026-03-30 00:47:27 +02:00
kokoro feat(rocm): bump to 7.x (#9323) 2026-04-12 08:51:30 +02:00
liquid-audio feat(realtime): Add Liquid Audio s2s model and assistant mode on talk page (#9801) 2026-05-13 21:57:27 +02:00
llama-cpp-quantization feat: add distributed mode (#9124) 2026-03-30 00:47:27 +02:00
mlx feat: refactor shared helpers and enhance MLX backend functionality (#9335) 2026-04-13 18:44:03 +02:00
mlx-audio feat: add distributed mode (#9124) 2026-03-30 00:47:27 +02:00
mlx-distributed feat: refactor shared helpers and enhance MLX backend functionality (#9335) 2026-04-13 18:44:03 +02:00
mlx-vlm fix(mlx-vlm): pin upstream to v0.4.4 to unblock CUDA builds (#9568) 2026-04-25 22:06:01 +02:00
moonshine feat: add distributed mode (#9124) 2026-03-30 00:47:27 +02:00
nemo feat(rocm): bump to 7.x (#9323) 2026-04-12 08:51:30 +02:00
neutts feat(rocm): bump to 7.x (#9323) 2026-04-12 08:51:30 +02:00
outetts feat(rocm): bump to 7.x (#9323) 2026-04-12 08:51:30 +02:00
pocket-tts feat(backends/python): use tempfile.gettempdir() instead of hardcoded /tmp (#9629) 2026-05-01 10:56:24 +02:00
qwen-asr feat(rocm): bump to 7.x (#9323) 2026-04-12 08:51:30 +02:00
qwen-tts feat(rocm): bump to 7.x (#9323) 2026-04-12 08:51:30 +02:00
rerankers fix(ci): unbreak rerankers (torch bump) and vllm-omni on aarch64 (#9688) 2026-05-06 17:07:24 +02:00
rfdetr feat(rocm): bump to 7.x (#9323) 2026-04-12 08:51:30 +02:00
sglang fix(L4T13 backends): switch vllm/sglang/vllm-omni to PyPI aarch64+cu130 wheels (#9950) 2026-05-22 23:01:22 +02:00
speaker-recognition feat: add biometrics UI (#9524) 2026-04-24 08:50:34 +02:00
tinygrad feat(backends/python): use tempfile.gettempdir() instead of hardcoded /tmp (#9629) 2026-05-01 10:56:24 +02:00
transformers feat(middleware): Model routing, PII filtering, Cloud model proxies 2026-05-24 09:42:31 +01:00
trl feat: add distributed mode (#9124) 2026-03-30 00:47:27 +02:00
vibevoice feat(rocm): bump to 7.x (#9323) 2026-04-12 08:51:30 +02:00
vllm feat(middleware): Model routing, PII filtering, Cloud model proxies 2026-05-24 09:42:31 +01:00
vllm-omni fix(L4T13 backends): switch vllm/sglang/vllm-omni to PyPI aarch64+cu130 wheels (#9950) 2026-05-22 23:01:22 +02:00
voxcpm feat(rocm): bump to 7.x (#9323) 2026-04-12 08:51:30 +02:00
whisperx chore(whisperx): drop ROCm/hipblas build target (#9474) 2026-04-21 21:50:18 +02:00
README.md chore: drop bark which is unmaintained (#8207) 2026-01-25 09:26:40 +01:00

Python Backends for LocalAI

This directory contains Python-based AI backends for LocalAI, providing support for various AI models and hardware acceleration targets.

Overview

The Python backends share a build system based on common/libbackend.sh, which provides:

  • Automatic virtual environment management with support for both uv and pip
  • Hardware-specific dependency installation (CPU, CUDA, Intel, MLX, etc.)
  • Portable Python support for standalone deployments
  • Consistent backend execution across different environments

Available Backends

Core AI Models

  • transformers - Hugging Face Transformers framework (PyTorch-based)
  • vllm - High-performance LLM inference engine
  • mlx - Apple Silicon optimized ML framework

Audio & Speech

  • coqui - Coqui TTS models
  • faster-whisper - Fast Whisper speech recognition
  • kitten-tts - Lightweight TTS
  • mlx-audio - Apple Silicon audio processing
  • chatterbox - TTS model
  • kokoro - TTS models

Computer Vision

  • diffusers - Stable Diffusion and image generation
  • mlx-vlm - Vision-language models for Apple Silicon
  • rfdetr - Object detection models

Specialized

  • rerankers - Text reranking models

Quick Start

Prerequisites

  • Python 3.10+ (default: 3.10.18)
  • uv package manager (recommended) or pip
  • Appropriate hardware drivers for your target (CUDA, Intel, etc.)

Installation

Each backend can be installed individually:

# Navigate to a specific backend
cd backend/python/transformers

# Install dependencies
make transformers
# or
bash install.sh

# Run the backend
make run
# or
bash run.sh

Using the Unified Build System

The libbackend.sh script provides consistent commands across all backends:

# Source the library in your backend script
source "$(dirname "$0")/../common/libbackend.sh"

# Install requirements (automatically handles hardware detection)
installRequirements

# Start the backend server, forwarding all script arguments unchanged
startBackend "$@"

# Run tests
runUnittests

Hardware Targets

The build system detects the available hardware and configures the build for one of the following targets:

  • CPU - Standard CPU-only builds
  • CUDA - NVIDIA GPU acceleration (supports CUDA 12/13)
  • Intel - Intel XPU/GPU optimization
  • MLX - Apple Silicon (M1/M2/M3) optimization
  • HIP - AMD GPU acceleration

Target-Specific Requirements

Backends can specify hardware-specific dependencies:

  • requirements.txt - Base requirements
  • requirements-cpu.txt - CPU-specific packages
  • requirements-cublas12.txt - CUDA 12 packages
  • requirements-cublas13.txt - CUDA 13 packages
  • requirements-intel.txt - Intel-optimized packages
  • requirements-mps.txt - Apple Silicon packages

Configuration Options

Environment Variables

  • PYTHON_VERSION - Python version (default: 3.10)
  • PYTHON_PATCH - Python patch version (default: 18)
  • BUILD_TYPE - Force specific build target
  • USE_PIP - Use pip instead of uv (default: false)
  • PORTABLE_PYTHON - Enable portable Python builds
  • LIMIT_TARGETS - Restrict backend to specific targets
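
The sketch below shows one way a script could apply the documented defaults and derive values from them. The helper names python_full_version and installer_name are hypothetical and used only for illustration; the defaults (3.10, 18, false) are the ones listed above.

```shell
# Apply the documented defaults when the variables are unset or empty.
PYTHON_VERSION="${PYTHON_VERSION:-3.10}"
PYTHON_PATCH="${PYTHON_PATCH:-18}"
USE_PIP="${USE_PIP:-false}"

# Hypothetical helper: join the version and patch level, e.g. 3.10 and 18.
python_full_version() {
    printf '%s.%s\n' "$PYTHON_VERSION" "$PYTHON_PATCH"
}

# Hypothetical helper: name the installer implied by USE_PIP.
installer_name() {
    if [ "$USE_PIP" = "true" ]; then
        printf 'pip\n'
    else
        printf 'uv\n'
    fi
}
```

With nothing overridden, python_full_version yields the default interpreter version given under Prerequisites, and installer_name yields uv.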

Example: CUDA 12 Only Backend

# In your backend script
LIMIT_TARGETS="cublas12"
source "$(dirname "$0")/../common/libbackend.sh"

Example: Intel-Optimized Backend

# In your backend script
LIMIT_TARGETS="intel"
source "$(dirname "$0")/../common/libbackend.sh"

Development

Adding a New Backend

  1. Create a new directory in backend/python/
  2. Copy the template structure from common/template/
  3. Implement your backend.py with the required gRPC interface
  4. Add appropriate requirements files for your target hardware
  5. Use libbackend.sh for consistent build and execution
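
Steps 1 and 2 can be sketched as a small shell function. The name scaffold_backend is hypothetical; the only path taken from this README is common/template, and the function assumes it is given the path to backend/python as its first argument.

```shell
# Hypothetical helper: create a new backend directory from the template.
# Usage: scaffold_backend <path to backend/python> <new backend name>
scaffold_backend() (
    python_dir=$1
    name=$2
    template="$python_dir/common/template"
    target="$python_dir/$name"

    if [ ! -d "$template" ]; then
        echo "template directory not found: $template" >&2
        exit 1
    fi
    if [ -e "$target" ]; then
        echo "refusing to overwrite existing path: $target" >&2
        exit 1
    fi

    # Copy the template tree, preserving file modes so scripts stay executable.
    cp -R -p "$template" "$target"
)
```

Refusing to overwrite an existing directory is a deliberate choice in this sketch, so that running it twice cannot clobber a backend that is already in progress.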

Testing

# Run backend tests
make test
# or
bash test.sh

Building

# Install dependencies
make <backend-name>

# Clean build artifacts
make clean

Architecture

Each backend follows a consistent structure:

backend-name/
├── backend.py          # Main backend implementation
├── requirements.txt    # Base dependencies
├── requirements-*.txt  # Hardware-specific dependencies
├── install.sh          # Installation script
├── run.sh              # Execution script
├── test.sh             # Test script
├── Makefile            # Build targets
└── test.py             # Unit tests
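
A quick way to compare a backend directory against this layout is sketched below. The helper name check_backend_layout is hypothetical, and the file list is simply the one shown above; individual backends may legitimately differ, so treat a reported file as a hint rather than an error.

```shell
# Hypothetical helper: report files from the layout above that are missing.
# Prints one missing file name per line; returns non-zero if any are missing.
# Usage: check_backend_layout <backend directory>
check_backend_layout() (
    dir=$1
    status=0
    for f in backend.py requirements.txt install.sh run.sh test.sh Makefile test.py; do
        if [ ! -f "$dir/$f" ]; then
            printf '%s\n' "$f"
            status=1
        fi
    done
    exit "$status"
)
```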

Troubleshooting

Common Issues

  1. Missing dependencies: Check that the backend has a requirements file for your target (for example, requirements-cublas12.txt for CUDA 12)
  2. Hardware detection: If the wrong target is selected, set BUILD_TYPE explicitly to match your system
  3. Python version: Verify that Python 3.10 or newer is available
  4. Virtual environment: Use ensureVenv to create and activate the environment
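
For the Python version check in item 3, a minimal sketch of a version comparison in plain shell follows. The helper name version_at_least is hypothetical, and it compares only the major and minor components, which is enough for a "3.10 or newer" test.

```shell
# Hypothetical helper: succeed if version $1 (major.minor[.patch])
# is at least version $2 (major.minor).
# Example use, assuming python3 is on PATH:
#   version_at_least "$(python3 -c 'import platform; print(platform.python_version())')" 3.10
version_at_least() (
    have=$1
    want=$2

    # Split "3.10.18" into major=3 and minor=10; the patch level is ignored.
    have_major=${have%%.*}
    rest=${have#*.}
    have_minor=${rest%%.*}
    want_major=${want%%.*}
    want_minor=${want#*.}

    if [ "$have_major" -gt "$want_major" ]; then exit 0; fi
    if [ "$have_major" -lt "$want_major" ]; then exit 1; fi
    [ "$have_minor" -ge "$want_minor" ]
)
```

Comparing the components numerically avoids the common mistake of comparing version strings lexically, where 3.9 would sort after 3.10.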

Contributing

When adding new backends or modifying existing ones:

  1. Follow the established directory structure
  2. Use libbackend.sh for consistent behavior
  3. Include appropriate requirements files for all target hardware
  4. Add comprehensive tests
  5. Update this README if adding new backend types