LocalAI supports object detection and image segmentation through various backends. This feature allows you to identify and locate objects within images with high accuracy and real-time performance. Available backends include [RF-DETR](https://github.com/roboflow/rf-detr) for object detection and [sam3.cpp](https://github.com/PABannier/sam3.cpp) for image segmentation (SAM 3/2/EdgeTAM).
Object detection in LocalAI is implemented through dedicated backends that can identify and locate objects within images. Each backend provides different capabilities and model architectures. Key features include:
- Support for multiple hardware accelerators (CPU, NVIDIA GPU, Intel GPU, AMD GPU)
- Structured detection results with confidence scores
- Easy integration through the `/v1/detection` endpoint
## Usage
### Detection Endpoint
LocalAI provides a dedicated `/v1/detection` endpoint for object detection tasks. This endpoint is specifically designed for object detection and returns structured detection results with bounding boxes and confidence scores.
### API Reference
To perform object detection, send a POST request to the `/v1/detection` endpoint:
```bash
curl -X POST http://localhost:8080/v1/detection \
-H "Content-Type: application/json" \
-d '{
"model": "rfdetr-base",
"image": "https://media.roboflow.com/dog.jpeg"
}'
```
### Request Format
The request body should contain:
- `model`: The name of the object detection model (e.g., `rfdetr-base`)
- `image`: The image to analyze, given as a URL or a base64-encoded data URI
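A detection response contains structured results with bounding boxes and confidence scores. The sketch below parses one such response; the exact field names (`detections`, `x`, `y`, `width`, `height`, `confidence`, `class_name`) are assumptions for illustration, so check them against your server's actual output:

```python
import json

# Hypothetical response shape -- the field names used here are
# assumptions, not taken verbatim from the LocalAI API reference.
raw = '''
{
  "detections": [
    {"x": 52.0, "y": 30.5, "width": 310.0, "height": 420.0,
     "confidence": 0.92, "class_name": "dog"}
  ]
}
'''

response = json.loads(raw)
for det in response["detections"]:
    # Filter out low-confidence detections before using the boxes
    if det["confidence"] >= 0.5:
        print(f'{det["class_name"]}: {det["confidence"]:.2f} '
              f'at ({det["x"]}, {det["y"]}, {det["width"]}x{det["height"]})')
```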
### RF-DETR Backend

The RF-DETR backend is implemented as a Python-based gRPC service that integrates with LocalAI. It provides object detection using the RF-DETR model architecture and supports multiple hardware configurations:
- **CPU**: Optimized for CPU inference
- **NVIDIA GPU**: CUDA acceleration for NVIDIA GPUs
- **Intel GPU**: Intel oneAPI optimization
- **AMD GPU**: ROCm acceleration for AMD GPUs
- **NVIDIA Jetson**: Optimized for ARM64 NVIDIA Jetson devices
#### Setup
1. **Using the Model Gallery (Recommended)**
The easiest way to get started is using the model gallery. The `rfdetr-base` model is available in the official LocalAI gallery:
```bash
# Install and run the rfdetr-base model
local-ai run rfdetr-base
```
You can also install it through the web interface by navigating to the Models section and searching for "rfdetr-base".
2. **Manual Configuration**
Create a model configuration file in your `models` directory:
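A configuration along the following lines should work; the backend name and model filename here are assumptions and may need adjusting for your LocalAI version and downloaded weights:

```yaml
# Hypothetical configuration -- backend name and model file are
# illustrative; adjust to match your installation.
name: rfdetr-base
backend: rfdetr
parameters:
  model: rf-detr-base.pth
known_usecases:
- detection
```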
### sam3-cpp Backend

The sam3-cpp backend provides image segmentation using [sam3.cpp](https://github.com/PABannier/sam3.cpp), a portable C++ implementation of Meta's Segment Anything Model. It supports multiple model architectures:
- **SAM 3**: Full model with text encoder for text-prompted detection and segmentation
- **SAM 2 / SAM 2.1**: Hiera backbone models in multiple sizes
- **SAM 3 Visual-Only**: Point/box segmentation without text encoder
- **EdgeTAM**: Ultra-efficient mobile variant (~15MB quantized)
#### Setup
1. **Manual Configuration**
Create a model configuration file in your `models` directory:
```yaml
name: sam3
backend: sam3-cpp
parameters:
  model: edgetam_q4_0.ggml
threads: 4
known_usecases:
- detection
```
Download the model from [Hugging Face](https://huggingface.co/PABannier/sam3.cpp).
#### Segmentation Modes
**Point-prompted segmentation** (all models):
```bash
curl -X POST http://localhost:8080/v1/detection \
-H "Content-Type: application/json" \
-d '{
"model": "sam3",
"image": "data:image/jpeg;base64,...",
"points": [256.0, 256.0, 1.0],
"threshold": 0.5
}'
```
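The truncated `data:image/jpeg;base64,...` value above is a standard data URI. A minimal Python sketch for building such a request body from raw image bytes (the fake JPEG bytes and the meaning of the third point value are illustrative assumptions):

```python
import base64
import json

# Illustrative input: in practice, read real JPEG bytes from disk, e.g.
#   image_bytes = open("dog.jpeg", "rb").read()
image_bytes = b"\xff\xd8\xff\xe0" + b"\x00" * 16  # fake JPEG header for demo

# Encode the image as a base64 data URI, as expected by the endpoint
data_uri = "data:image/jpeg;base64," + base64.b64encode(image_bytes).decode("ascii")

payload = json.dumps({
    "model": "sam3",
    "image": data_uri,
    # x, y, and a label (1.0 presumably marks a foreground point)
    "points": [256.0, 256.0, 1.0],
    "threshold": 0.5,
})
```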
**Box-prompted segmentation** (all models):
```bash
curl -X POST http://localhost:8080/v1/detection \
-H "Content-Type: application/json" \
-d '{
"model": "sam3",
"image": "data:image/jpeg;base64,...",
"boxes": [100.0, 100.0, 400.0, 400.0],
"threshold": 0.5
}'
```
**Text-prompted segmentation** (SAM 3 full model only):
```bash
curl -X POST http://localhost:8080/v1/detection \
-H "Content-Type: application/json" \
-d '{
"model": "sam3",
"image": "data:image/jpeg;base64,...",
"prompt": "cat",
"threshold": 0.5
}'
```
The response includes segmentation masks as base64-encoded PNGs in the `mask` field of each detection.
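Decoding such a mask back into image bytes is straightforward with the standard library; the `detection` dict below is an illustrative stand-in (a real `mask` value would be a full base64-encoded PNG, here only the PNG signature is used):

```python
import base64

# Stand-in for one detection from the response; only the 8-byte PNG
# file signature is encoded here, for demonstration purposes.
detection = {"mask": base64.b64encode(b"\x89PNG\r\n\x1a\n").decode("ascii")}

mask_bytes = base64.b64decode(detection["mask"])

# Sanity-check that the decoded payload is a PNG before writing it out
assert mask_bytes.startswith(b"\x89PNG\r\n\x1a\n")

# with open("mask.png", "wb") as f:
#     f.write(mask_bytes)
```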
## Troubleshooting

### Common Issues

1. **Model Loading Issues**
- Verify model compatibility with your backend version
2. **Low Detection Accuracy**
- Ensure good image quality and lighting
- Check if objects are clearly visible
- Consider using a larger model for better accuracy
3. **Slow Performance**
- Enable GPU acceleration if available
- Use a smaller model for faster inference
- Optimize image resolution
### Debug Mode
Enable debug logging for troubleshooting:
```bash
local-ai run --debug rfdetr-base
```
## Object Detection Category
LocalAI includes a dedicated **object-detection** category for models and backends that specialize in identifying and locating objects within images. This category currently includes:
- **RF-DETR**: object detection models such as `rfdetr-base`
- **sam3-cpp**: image segmentation with SAM 3, SAM 2/2.1, and EdgeTAM models
Additional object detection models and backends will be added to this category in the future. You can filter models by the `object-detection` tag in the model gallery to find all available object detection models.