ultralytics 8.4.23 Refactor AutoBackend into modular per-backend classes (#23790)

Signed-off-by: Jing Qiu <61612323+Laughing-q@users.noreply.github.com>
Signed-off-by: Glenn Jocher <glenn.jocher@ultralytics.com>
Co-authored-by: UltralyticsAssistant <web@ultralytics.com>
Co-authored-by: Ultralytics Assistant <135830346+UltralyticsAssistant@users.noreply.github.com>
Co-authored-by: Lakshantha Dissanayake <lakshantha@ultralytics.com>
Co-authored-by: Onuralp SEZER <onuralp@ultralytics.com>
Co-authored-by: Glenn Jocher <glenn.jocher@ultralytics.com>
Jing Qiu 2026-03-17 07:39:27 +08:00 committed by GitHub
parent b3c79532e3
commit b10fa7be23
GPG key ID: B5690EEEBB952194
38 changed files with 1844 additions and 773 deletions

@@ -0,0 +1,16 @@
---
description: Explore AxeleraBackend for Axelera hardware inference, deploying YOLO models on Axelera AI accelerators with optimized performance.
keywords: Ultralytics, AxeleraBackend, Axelera inference, AI accelerator, hardware inference, edge AI, deep learning acceleration
---
# Reference for `ultralytics/nn/backends/axelera.py`
!!! success "Improvements"
This page is sourced from [https://github.com/ultralytics/ultralytics/blob/main/ultralytics/nn/backends/axelera.py](https://github.com/ultralytics/ultralytics/blob/main/ultralytics/nn/backends/axelera.py). Have an improvement or example to add? Open a [Pull Request](https://docs.ultralytics.com/help/contributing/) — thank you! 🙏
<br>
## ::: ultralytics.nn.backends.axelera.AxeleraBackend
<br><br>

@@ -0,0 +1,16 @@
---
description: Explore the BaseBackend class, the abstract foundation for all inference backends in Ultralytics, defining the interface for model loading and inference.
keywords: Ultralytics, BaseBackend, inference backend, abstract class, model loading, deep learning, neural network inference
---
# Reference for `ultralytics/nn/backends/base.py`
!!! success "Improvements"
This page is sourced from [https://github.com/ultralytics/ultralytics/blob/main/ultralytics/nn/backends/base.py](https://github.com/ultralytics/ultralytics/blob/main/ultralytics/nn/backends/base.py). Have an improvement or example to add? Open a [Pull Request](https://docs.ultralytics.com/help/contributing/) — thank you! 🙏
<br>
## ::: ultralytics.nn.backends.base.BaseBackend
<br><br>

@@ -0,0 +1,16 @@
---
description: Explore CoreMLBackend for Apple CoreML inference, enabling efficient YOLO model deployment on iOS, macOS, and Apple Silicon devices.
keywords: Ultralytics, CoreMLBackend, CoreML inference, Apple CoreML, iOS deployment, macOS inference, Apple Silicon, mobile AI
---
# Reference for `ultralytics/nn/backends/coreml.py`
!!! success "Improvements"
This page is sourced from [https://github.com/ultralytics/ultralytics/blob/main/ultralytics/nn/backends/coreml.py](https://github.com/ultralytics/ultralytics/blob/main/ultralytics/nn/backends/coreml.py). Have an improvement or example to add? Open a [Pull Request](https://docs.ultralytics.com/help/contributing/) — thank you! 🙏
<br>
## ::: ultralytics.nn.backends.coreml.CoreMLBackend
<br><br>

@@ -0,0 +1,16 @@
---
description: Explore ExecuTorchBackend for Meta ExecuTorch inference, enabling efficient PyTorch model deployment on mobile and edge devices.
keywords: Ultralytics, ExecuTorchBackend, ExecuTorch inference, Meta ExecuTorch, mobile inference, edge deployment, PyTorch Mobile
---
# Reference for `ultralytics/nn/backends/executorch.py`
!!! success "Improvements"
This page is sourced from [https://github.com/ultralytics/ultralytics/blob/main/ultralytics/nn/backends/executorch.py](https://github.com/ultralytics/ultralytics/blob/main/ultralytics/nn/backends/executorch.py). Have an improvement or example to add? Open a [Pull Request](https://docs.ultralytics.com/help/contributing/) — thank you! 🙏
<br>
## ::: ultralytics.nn.backends.executorch.ExecuTorchBackend
<br><br>

@@ -0,0 +1,16 @@
---
description: Explore MNNBackend for Alibaba MNN inference, enabling lightweight and efficient model deployment on mobile and edge devices.
keywords: Ultralytics, MNNBackend, MNN inference, Alibaba MNN, mobile inference, edge AI, .mnn models, deep learning
---
# Reference for `ultralytics/nn/backends/mnn.py`
!!! success "Improvements"
This page is sourced from [https://github.com/ultralytics/ultralytics/blob/main/ultralytics/nn/backends/mnn.py](https://github.com/ultralytics/ultralytics/blob/main/ultralytics/nn/backends/mnn.py). Have an improvement or example to add? Open a [Pull Request](https://docs.ultralytics.com/help/contributing/) — thank you! 🙏
<br>
## ::: ultralytics.nn.backends.mnn.MNNBackend
<br><br>

@@ -0,0 +1,16 @@
---
description: Explore NCNNBackend for Tencent NCNN inference, optimized for mobile and embedded platforms with Vulkan acceleration support.
keywords: Ultralytics, NCNNBackend, NCNN inference, Tencent NCNN, mobile inference, Vulkan acceleration, embedded AI, deep learning
---
# Reference for `ultralytics/nn/backends/ncnn.py`
!!! success "Improvements"
This page is sourced from [https://github.com/ultralytics/ultralytics/blob/main/ultralytics/nn/backends/ncnn.py](https://github.com/ultralytics/ultralytics/blob/main/ultralytics/nn/backends/ncnn.py). Have an improvement or example to add? Open a [Pull Request](https://docs.ultralytics.com/help/contributing/) — thank you! 🙏
<br>
## ::: ultralytics.nn.backends.ncnn.NCNNBackend
<br><br>

@@ -0,0 +1,20 @@
---
description: Explore ONNXBackend and ONNXIMXBackend for Microsoft ONNX Runtime inference, supporting standard ONNX models and NXP IMX-optimized variants.
keywords: Ultralytics, ONNXBackend, ONNXIMXBackend, Microsoft ONNX Runtime, Sony IMX, ONNX inference, edge deployment, deep learning
---
# Reference for `ultralytics/nn/backends/onnx.py`
!!! success "Improvements"
This page is sourced from [https://github.com/ultralytics/ultralytics/blob/main/ultralytics/nn/backends/onnx.py](https://github.com/ultralytics/ultralytics/blob/main/ultralytics/nn/backends/onnx.py). Have an improvement or example to add? Open a [Pull Request](https://docs.ultralytics.com/help/contributing/) — thank you! 🙏
<br>
## ::: ultralytics.nn.backends.onnx.ONNXBackend
<br><br><hr><br>
## ::: ultralytics.nn.backends.onnx.ONNXIMXBackend
<br><br>

@@ -0,0 +1,16 @@
---
description: Explore OpenVINOBackend for optimized inference on Intel hardware, supporting OpenVINO IR models for efficient deployment on CPUs, GPUs, and VPUs.
keywords: Ultralytics, OpenVINOBackend, OpenVINO inference, Intel OpenVINO, CPU inference, VPU, edge AI, deep learning optimization
---
# Reference for `ultralytics/nn/backends/openvino.py`
!!! success "Improvements"
This page is sourced from [https://github.com/ultralytics/ultralytics/blob/main/ultralytics/nn/backends/openvino.py](https://github.com/ultralytics/ultralytics/blob/main/ultralytics/nn/backends/openvino.py). Have an improvement or example to add? Open a [Pull Request](https://docs.ultralytics.com/help/contributing/) — thank you! 🙏
<br>
## ::: ultralytics.nn.backends.openvino.OpenVINOBackend
<br><br>

@@ -0,0 +1,16 @@
---
description: Explore PaddleBackend for Baidu PaddlePaddle inference, supporting deployment with Paddle Inference engine on various hardware platforms.
keywords: Ultralytics, PaddleBackend, PaddlePaddle inference, Baidu Paddle, Paddle Inference, deep learning, model deployment
---
# Reference for `ultralytics/nn/backends/paddle.py`
!!! success "Improvements"
This page is sourced from [https://github.com/ultralytics/ultralytics/blob/main/ultralytics/nn/backends/paddle.py](https://github.com/ultralytics/ultralytics/blob/main/ultralytics/nn/backends/paddle.py). Have an improvement or example to add? Open a [Pull Request](https://docs.ultralytics.com/help/contributing/) — thank you! 🙏
<br>
## ::: ultralytics.nn.backends.paddle.PaddleBackend
<br><br>

@@ -0,0 +1,20 @@
---
description: Explore PyTorchBackend and TorchScriptBackend for native PyTorch and TorchScript model inference in Ultralytics YOLO models.
keywords: Ultralytics, PyTorchBackend, TorchScriptBackend, PyTorch inference, TorchScript inference, .pt models, deep learning, YOLO
---
# Reference for `ultralytics/nn/backends/pytorch.py`
!!! success "Improvements"
This page is sourced from [https://github.com/ultralytics/ultralytics/blob/main/ultralytics/nn/backends/pytorch.py](https://github.com/ultralytics/ultralytics/blob/main/ultralytics/nn/backends/pytorch.py). Have an improvement or example to add? Open a [Pull Request](https://docs.ultralytics.com/help/contributing/) — thank you! 🙏
<br>
## ::: ultralytics.nn.backends.pytorch.PyTorchBackend
<br><br><hr><br>
## ::: ultralytics.nn.backends.pytorch.TorchScriptBackend
<br><br>

@@ -0,0 +1,16 @@
---
description: Explore RKNNBackend for Rockchip RKNN inference, enabling optimized YOLO deployment on Rockchip NPU-equipped edge devices.
keywords: Ultralytics, RKNNBackend, RKNN inference, Rockchip RKNN, NPU inference, edge AI, embedded deployment, deep learning
---
# Reference for `ultralytics/nn/backends/rknn.py`
!!! success "Improvements"
This page is sourced from [https://github.com/ultralytics/ultralytics/blob/main/ultralytics/nn/backends/rknn.py](https://github.com/ultralytics/ultralytics/blob/main/ultralytics/nn/backends/rknn.py). Have an improvement or example to add? Open a [Pull Request](https://docs.ultralytics.com/help/contributing/) — thank you! 🙏
<br>
## ::: ultralytics.nn.backends.rknn.RKNNBackend
<br><br>

@@ -0,0 +1,16 @@
---
description: Explore TensorFlowBackend for Google TensorFlow inference including SavedModel, GraphDef, TFLite, and Edge TPU formats.
keywords: Ultralytics, TensorFlowBackend, Google TensorFlow, TFLite, Edge TPU, SavedModel, GraphDef, deep learning, model deployment
---
# Reference for `ultralytics/nn/backends/tensorflow.py`
!!! success "Improvements"
This page is sourced from [https://github.com/ultralytics/ultralytics/blob/main/ultralytics/nn/backends/tensorflow.py](https://github.com/ultralytics/ultralytics/blob/main/ultralytics/nn/backends/tensorflow.py). Have an improvement or example to add? Open a [Pull Request](https://docs.ultralytics.com/help/contributing/) — thank you! 🙏
<br>
## ::: ultralytics.nn.backends.tensorflow.TensorFlowBackend
<br><br>

@@ -0,0 +1,16 @@
---
description: Explore TensorRTBackend for high-performance GPU inference with NVIDIA TensorRT, optimizing YOLO models for production deployment.
keywords: Ultralytics, TensorRTBackend, TensorRT inference, NVIDIA TensorRT, GPU inference, .engine models, production deployment, deep learning
---
# Reference for `ultralytics/nn/backends/tensorrt.py`
!!! success "Improvements"
This page is sourced from [https://github.com/ultralytics/ultralytics/blob/main/ultralytics/nn/backends/tensorrt.py](https://github.com/ultralytics/ultralytics/blob/main/ultralytics/nn/backends/tensorrt.py). Have an improvement or example to add? Open a [Pull Request](https://docs.ultralytics.com/help/contributing/) — thank you! 🙏
<br>
## ::: ultralytics.nn.backends.tensorrt.TensorRTBackend
<br><br>

@@ -0,0 +1,16 @@
---
description: Explore TritonBackend for NVIDIA Triton Inference Server, enabling scalable cloud and edge deployment of YOLO models.
keywords: Ultralytics, TritonBackend, Triton Inference Server, NVIDIA Triton, cloud inference, model serving, scalable deployment
---
# Reference for `ultralytics/nn/backends/triton.py`
!!! success "Improvements"
This page is sourced from [https://github.com/ultralytics/ultralytics/blob/main/ultralytics/nn/backends/triton.py](https://github.com/ultralytics/ultralytics/blob/main/ultralytics/nn/backends/triton.py). Have an improvement or example to add? Open a [Pull Request](https://docs.ultralytics.com/help/contributing/) — thank you! 🙏
<br>
## ::: ultralytics.nn.backends.triton.TritonBackend
<br><br>

@@ -745,6 +745,21 @@ nav:
- val: reference/models/yolo/yoloe/val.md
- nn:
- autobackend: reference/nn/autobackend.md
- backends:
- axelera: reference/nn/backends/axelera.md
- base: reference/nn/backends/base.md
- coreml: reference/nn/backends/coreml.md
- executorch: reference/nn/backends/executorch.md
- mnn: reference/nn/backends/mnn.md
- ncnn: reference/nn/backends/ncnn.md
- onnx: reference/nn/backends/onnx.md
- openvino: reference/nn/backends/openvino.md
- paddle: reference/nn/backends/paddle.md
- pytorch: reference/nn/backends/pytorch.md
- rknn: reference/nn/backends/rknn.md
- tensorflow: reference/nn/backends/tensorflow.md
- tensorrt: reference/nn/backends/tensorrt.md
- triton: reference/nn/backends/triton.md
- modules:
- activation: reference/nn/modules/activation.md
- block: reference/nn/modules/block.md

@@ -1,6 +1,6 @@
# Ultralytics 🚀 AGPL-3.0 License - https://ultralytics.com/license
__version__ = "8.4.22"
__version__ = "8.4.23"
import importlib
import os

@@ -195,7 +195,7 @@ class BasePredictor:
self.imgsz,
auto=same_shapes
and self.args.rect
and (self.model.pt or (getattr(self.model, "dynamic", False) and not self.model.imx)),
and (self.model.format == "pt" or (getattr(self.model, "dynamic", False) and self.model.format != "imx")),
stride=self.model.stride,
)
return [letterbox(image=x) for x in im]
@@ -258,7 +258,7 @@ class BasePredictor:
batch=self.args.batch,
vid_stride=self.args.vid_stride,
buffer=self.args.stream_buffer,
channels=getattr(self.model, "ch", 3),
channels=getattr(self.model, "channels", 3),
)
self.source_type = self.dataset.source_type
if (
@@ -305,7 +305,11 @@ class BasePredictor:
# Warmup model
if not self.done_warmup:
self.model.warmup(
imgsz=(1 if self.model.pt or self.model.triton else self.dataset.bs, self.model.ch, *self.imgsz)
imgsz=(
1 if self.model.format in {"pt", "triton"} else self.dataset.bs,
self.model.channels,
*self.imgsz,
)
)
self.done_warmup = True
@@ -372,7 +376,7 @@ class BasePredictor:
t = tuple(x.t / self.seen * 1e3 for x in profilers) # speeds per image
LOGGER.info(
f"Speed: %.1fms preprocess, %.1fms inference, %.1fms postprocess per image at shape "
f"{(min(self.args.batch, self.seen), getattr(self.model, 'ch', 3), *im.shape[2:])}" % t
f"{(min(self.args.batch, self.seen), getattr(self.model, 'channels', 3), *im.shape[2:])}" % t
)
if self.args.save or self.args.save_txt or self.args.save_crop:
nl = len(list(self.save_dir.glob("labels/*.txt"))) # number of labels

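The warmup change above can be summarized as: PyTorch and Triton backends accept any batch size, so a batch of 1 suffices for warmup, while exported static-batch models must be warmed up with the dataloader's batch size. A minimal standalone sketch of that rule (function name and arguments are illustrative, not part of the Ultralytics API):

```python
def warmup_shape(fmt: str, dataset_bs: int, channels: int, imgsz: tuple) -> tuple:
    """Sketch of BasePredictor's warmup-shape rule: dynamic-batch backends warm up with batch 1."""
    bs = 1 if fmt in {"pt", "triton"} else dataset_bs
    return (bs, channels, *imgsz)

print(warmup_shape("pt", 16, 3, (640, 640)))    # (1, 3, 640, 640)
print(warmup_shape("onnx", 16, 3, (640, 640)))  # (16, 3, 640, 640)
```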
@@ -172,9 +172,10 @@ class BaseValidator:
)
self.device = model.device # update device
self.args.half = model.fp16 # update half
stride, pt, jit = model.stride, model.pt, model.jit
stride, fmt = model.stride, model.format
pt = fmt == "pt"
imgsz = check_imgsz(self.args.imgsz, stride=stride)
if not (pt or jit or getattr(model, "dynamic", False)):
if fmt not in {"pt", "torchscript"} and not getattr(model, "dynamic", False):
self.args.batch = model.metadata.get("batch", 1) # export.py models default to batch-size 1
LOGGER.info(f"Setting batch={self.args.batch} input of shape ({self.args.batch}, 3, {imgsz}, {imgsz})")
@@ -187,7 +188,7 @@ class BaseValidator:
if self.device.type in {"cpu", "mps"}:
self.args.workers = 0 # faster CPU val as time dominated by inference, not dataloading
if not (pt or (getattr(model, "dynamic", False) and not model.imx)):
if not (pt or (getattr(model, "dynamic", False) and fmt != "imx")):
self.args.rect = False
self.stride = model.stride # used in get_dataloader() for padding
self.dataloader = self.dataloader or self.get_dataloader(self.data.get(self.args.split), self.args.batch)

@@ -462,8 +462,7 @@ class Predictor(BasePredictor):
self.std = torch.tensor([58.395, 57.12, 57.375]).view(-1, 1, 1).to(device)
# Ultralytics compatibility settings
self.model.pt = False
self.model.triton = False
self.model.format = "sam"
self.model.stride = 32
self.model.fp16 = self.args.half
self.done_warmup = True

@@ -59,7 +59,7 @@ class ClassificationPredictor(BasePredictor):
else False
)
self.transforms = (
classify_transforms(self.imgsz) if updated or not self.model.pt else self.model.model.transforms
classify_transforms(self.imgsz) if updated or self.model.format != "pt" else self.model.model.transforms
)
def preprocess(self, img):

@@ -63,7 +63,7 @@ class WorldTrainerFromScratch(WorldTrainer):
Args:
cfg (dict): Configuration dictionary with default parameters for model training.
overrides (dict, optional): Dictionary of parameter overrides to customize the configuration.
_callbacks (dict, optional): Dictionary of callback functions to be executed during different stages of training.
_callbacks (dict, optional): Dictionary of callback functions to run during different stages of training.
"""
if overrides is None:
overrides = {}

File diff suppressed because it is too large

@@ -0,0 +1,41 @@
# Ultralytics 🚀 AGPL-3.0 License - https://ultralytics.com/license
"""Ultralytics YOLO inference backends.
This package provides modular inference backends for various deep learning frameworks and hardware accelerators.
Each backend implements the `BaseBackend` interface and can be used independently or through the unified
`AutoBackend` dispatcher for automatic format detection and inference routing.
"""
from .axelera import AxeleraBackend
from .base import BaseBackend
from .coreml import CoreMLBackend
from .executorch import ExecuTorchBackend
from .mnn import MNNBackend
from .ncnn import NCNNBackend
from .onnx import ONNXBackend, ONNXIMXBackend
from .openvino import OpenVINOBackend
from .paddle import PaddleBackend
from .pytorch import PyTorchBackend, TorchScriptBackend
from .rknn import RKNNBackend
from .tensorflow import TensorFlowBackend
from .tensorrt import TensorRTBackend
from .triton import TritonBackend
__all__ = [
"AxeleraBackend",
"BaseBackend",
"CoreMLBackend",
"ExecuTorchBackend",
"MNNBackend",
"NCNNBackend",
"ONNXBackend",
"ONNXIMXBackend",
"OpenVINOBackend",
"PaddleBackend",
"PyTorchBackend",
"RKNNBackend",
"TensorFlowBackend",
"TensorRTBackend",
"TorchScriptBackend",
"TritonBackend",
]
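The package docstring above describes `AutoBackend` as a dispatcher that routes each weights file to the matching backend class by detected format. A minimal sketch of the dispatch idea, using a hypothetical suffix table (the real `AutoBackend` inspects more than the file suffix, e.g. directories and Triton URLs):

```python
from pathlib import Path

# Hypothetical suffix → backend-name table for illustration only.
BACKEND_BY_SUFFIX = {
    ".pt": "PyTorchBackend",
    ".torchscript": "TorchScriptBackend",
    ".onnx": "ONNXBackend",
    ".engine": "TensorRTBackend",
    ".mlpackage": "CoreMLBackend",
    ".mnn": "MNNBackend",
    ".pte": "ExecuTorchBackend",
}

def pick_backend(weight: str) -> str:
    """Return the backend class name for a weights path, by file suffix."""
    suffix = Path(weight).suffix.lower()
    try:
        return BACKEND_BY_SUFFIX[suffix]
    except KeyError:
        raise ValueError(f"Unrecognized model format: {weight!r}")

print(pick_backend("yolo11n.onnx"))  # ONNXBackend
```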

@@ -0,0 +1,69 @@
# Ultralytics 🚀 AGPL-3.0 License - https://ultralytics.com/license
from __future__ import annotations
import os
from pathlib import Path
import torch
from ultralytics.utils import LOGGER
from ultralytics.utils.checks import check_requirements
from .base import BaseBackend
class AxeleraBackend(BaseBackend):
"""Axelera AI inference backend for Axelera Metis AI accelerators.
Loads compiled Axelera models (.axm files) and runs inference using the Axelera AI runtime SDK. Requires the Axelera
runtime environment to be activated before use.
"""
def load_model(self, weight: str | Path) -> None:
"""Load an Axelera model from a directory containing a .axm file.
Args:
weight (str | Path): Path to the Axelera model directory containing the .axm binary.
"""
if not os.environ.get("AXELERA_RUNTIME_DIR"):
LOGGER.warning(
"Axelera runtime environment is not activated.\n"
"Please run: source /opt/axelera/sdk/latest/axelera_activate.sh\n\n"
"If this fails, verify driver installation: "
"https://docs.ultralytics.com/integrations/axelera/#axelera-driver-installation"
)
try:
from axelera.runtime import op
except ImportError:
check_requirements(
"axelera_runtime2==0.1.2",
cmds="--extra-index-url https://software.axelera.ai/artifactory/axelera-runtime-pypi",
)
from axelera.runtime import op
w = Path(weight)
found = next(w.rglob("*.axm"), None)
if found is None:
raise FileNotFoundError(f"No .axm file found in: {w}")
self.model = op.load(str(found))
# Load metadata
metadata_file = found.parent / "metadata.yaml"
if metadata_file.exists():
from ultralytics.utils import YAML
self.apply_metadata(YAML.load(metadata_file))
def forward(self, im: torch.Tensor) -> list:
"""Run inference on the Axelera hardware accelerator.
Args:
im (torch.Tensor): Input image tensor in BCHW format, normalized to [0, 1].
Returns:
(list): Model predictions as a list of output arrays.
"""
return self.model(im.cpu())

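The `.axm` discovery above takes the first match of a recursive glob under the model directory, falling back to `None` when no compiled model exists. A self-contained sketch of that pattern using a temporary directory:

```python
import tempfile
from pathlib import Path

# Sketch of the rglob-based discovery in AxeleraBackend.load_model:
# first *.axm anywhere under the directory, or None if absent.
with tempfile.TemporaryDirectory() as d:
    root = Path(d)
    (root / "build").mkdir()
    (root / "build" / "model.axm").touch()
    found = next(root.rglob("*.axm"), None)
    print(found.name if found else "missing")  # model.axm
```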
@@ -0,0 +1,104 @@
# Ultralytics 🚀 AGPL-3.0 License - https://ultralytics.com/license
from __future__ import annotations
import ast
from abc import ABC, abstractmethod
import torch
class BaseBackend(ABC):
"""Base class for all inference backends.
This abstract class defines the interface that all inference backends must implement. It provides common
functionality for model loading, metadata processing, and device management.
Attributes:
model: The underlying inference model or runtime session.
device (torch.device): The device to run inference on.
fp16 (bool): Whether to use FP16 (half-precision) inference.
nhwc (bool): Whether the model expects NHWC input format instead of NCHW.
stride (int): Model stride, typically 32 for YOLO models.
names (dict): Dictionary mapping class indices to class names.
task (str | None): The task type (detect, segment, classify, pose, obb).
batch (int): Batch size for inference.
imgsz (tuple): Input image size as (height, width).
channels (int): Number of input channels, typically 3 for RGB.
end2end (bool): Whether the model includes end-to-end NMS post-processing.
dynamic (bool): Whether the model supports dynamic input shapes.
metadata (dict): Model metadata dictionary containing export configuration.
"""
def __init__(self, weight: str | torch.nn.Module, device: torch.device | str, fp16: bool = False):
"""Initialize the base backend with common attributes and load the model.
Args:
weight (str | torch.nn.Module): Path to the model weights file or a PyTorch module instance.
device (torch.device | str): Device to run inference on (e.g., 'cpu', 'cuda:0').
fp16 (bool): Whether to use FP16 half-precision inference.
"""
self.device = device
self.fp16 = fp16
self.nhwc = False
self.stride = 32
self.names = {}
self.task = None
self.batch = 1
self.channels = 3
self.end2end = False
self.dynamic = False
self.metadata = {}
self.model = None
self.load_model(weight)
@abstractmethod
def load_model(self, weight: str | torch.nn.Module) -> None:
"""Load the model from a weights file or module instance.
Args:
weight (str | torch.nn.Module): Path to model weights or a PyTorch module.
"""
raise NotImplementedError
@abstractmethod
def forward(self, im: torch.Tensor) -> torch.Tensor | list[torch.Tensor]:
"""Run inference on the input image tensor.
Args:
im (torch.Tensor): Input image tensor in BCHW format, normalized to [0, 1].
Returns:
(torch.Tensor | list[torch.Tensor]): Model output as a single tensor or list of tensors.
"""
raise NotImplementedError
def apply_metadata(self, metadata: dict | None) -> None:
"""Process and apply model metadata to backend attributes.
Handles type conversions for common metadata fields (e.g., stride, batch, names) and sets them as
instance attributes. Also resolves end-to-end NMS and dynamic shape settings from export args.
Args:
metadata (dict | None): Dictionary containing metadata key-value pairs from model export.
"""
if not metadata:
return
# Store raw metadata
self.metadata = metadata
# Process type conversions for known fields
for k, v in metadata.items():
if k in {"stride", "batch", "channels"}:
metadata[k] = int(v)
elif k in {"imgsz", "names", "kpt_shape", "kpt_names", "args", "end2end"} and isinstance(v, str):
metadata[k] = ast.literal_eval(v)
# Handle models exported with end-to-end NMS
metadata["end2end"] = metadata.get("end2end", False) or metadata.get("args", {}).get("nms", False)
metadata["dynamic"] = metadata.get("args", {}).get("dynamic", self.dynamic)
# Apply all metadata fields as backend attributes
for k, v in metadata.items():
setattr(self, k, v)

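The conversion rules in `apply_metadata` can be exercised standalone: numeric fields are cast to `int`, stringified containers are parsed with `ast.literal_eval`, and `end2end`/`dynamic` are resolved from the export args. A pure-Python sketch of just those conversions (not the attribute-setting part):

```python
import ast

def convert_metadata(metadata: dict) -> dict:
    """Standalone sketch of BaseBackend.apply_metadata's type conversions."""
    md = dict(metadata)
    for k, v in md.items():
        if k in {"stride", "batch", "channels"}:
            md[k] = int(v)  # numeric fields may arrive as strings
        elif k in {"imgsz", "names", "args", "end2end"} and isinstance(v, str):
            md[k] = ast.literal_eval(v)  # stringified dicts/lists/bools
    # end2end is true if set directly or if the model was exported with NMS
    md["end2end"] = md.get("end2end", False) or md.get("args", {}).get("nms", False)
    md["dynamic"] = md.get("args", {}).get("dynamic", False)
    return md

meta = {"stride": "32", "names": "{0: 'person'}", "args": "{'nms': True, 'dynamic': False}"}
out = convert_metadata(meta)
print(out["stride"], out["names"][0], out["end2end"])  # 32 person True
```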
@@ -0,0 +1,64 @@
# Ultralytics 🚀 AGPL-3.0 License - https://ultralytics.com/license
from __future__ import annotations
from pathlib import Path
import numpy as np
import torch
from PIL import Image
from ultralytics.utils import LOGGER
from ultralytics.utils.checks import check_requirements
from .base import BaseBackend
class CoreMLBackend(BaseBackend):
"""CoreML inference backend for Apple hardware.
Loads and runs inference with CoreML models (.mlpackage files) using the coremltools library. Supports both static
and dynamic input shapes and handles NMS-included model outputs.
"""
def load_model(self, weight: str | Path) -> None:
"""Load a CoreML model from a .mlpackage file.
Args:
weight (str | Path): Path to the .mlpackage model file.
"""
check_requirements(["coremltools>=9.0", "numpy>=1.14.5,<=2.3.5"])
import coremltools as ct
LOGGER.info(f"Loading {weight} for CoreML inference...")
self.model = ct.models.MLModel(weight)
self.dynamic = self.model.get_spec().description.input[0].type.HasField("multiArrayType")
# Load metadata
self.apply_metadata(dict(self.model.user_defined_metadata))
def forward(self, im: torch.Tensor) -> np.ndarray | list[np.ndarray]:
"""Run CoreML inference with automatic input format handling.
Args:
im (torch.Tensor): Input image tensor in BHWC format (converted from BCHW by AutoBackend).
Returns:
(np.ndarray | list[np.ndarray]): Model predictions as numpy array(s).
"""
im = im.cpu().numpy()
h, w = im.shape[1:3]
im = im.transpose(0, 3, 1, 2) if self.dynamic else Image.fromarray((im[0] * 255).astype("uint8"))
y = self.model.predict({"image": im})
if "confidence" in y: # NMS included
from ultralytics.utils.ops import xywh2xyxy
box = xywh2xyxy(y["coordinates"] * [[w, h, w, h]])
cls = y["confidence"].argmax(1, keepdims=True)
y = np.concatenate((box, np.take_along_axis(y["confidence"], cls, axis=1), cls), 1)[None]
else:
y = list(y.values())
if len(y) == 2 and len(y[1].shape) != 4: # segmentation model
y = list(reversed(y))
return y

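The NMS-included branch above converts CoreML's normalized xywh boxes to pixel xyxy via `ultralytics.utils.ops.xywh2xyxy` after scaling by `[[w, h, w, h]]`. A pure-Python sketch of that conversion for a single box (the real helper is vectorized over arrays):

```python
def xywh2xyxy(box):
    """Convert one (cx, cy, w, h) box to (x1, y1, x2, y2)."""
    cx, cy, bw, bh = box
    return [cx - bw / 2, cy - bh / 2, cx + bw / 2, cy + bh / 2]

w = h = 640
cx, cy, bw, bh = 0.5, 0.5, 0.25, 0.5  # normalized CoreML coordinates
print(xywh2xyxy([cx * w, cy * h, bw * w, bh * h]))  # [240.0, 160.0, 400.0, 480.0]
```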
@@ -0,0 +1,59 @@
# Ultralytics 🚀 AGPL-3.0 License - https://ultralytics.com/license
from __future__ import annotations
from pathlib import Path
import torch
from ultralytics.utils import LOGGER
from ultralytics.utils.checks import check_executorch_requirements
from .base import BaseBackend
class ExecuTorchBackend(BaseBackend):
"""Meta ExecuTorch inference backend for on-device deployment.
Loads and runs inference with Meta ExecuTorch models (.pte files) using the ExecuTorch runtime. Supports both
standalone .pte files and directory-based model packages with metadata.
"""
def load_model(self, weight: str | Path) -> None:
"""Load an ExecuTorch model from a .pte file or directory.
Args:
weight (str | Path): Path to the .pte model file or directory containing the model.
"""
LOGGER.info(f"Loading {weight} for ExecuTorch inference...")
check_executorch_requirements()
from executorch.runtime import Runtime
w = Path(weight)
if w.is_dir():
model_file = next(w.rglob("*.pte"))
metadata_file = w / "metadata.yaml"
else:
model_file = w
metadata_file = w.parent / "metadata.yaml"
program = Runtime.get().load_program(str(model_file))
self.model = program.load_method("forward")
# Load metadata
if metadata_file.exists():
from ultralytics.utils import YAML
self.apply_metadata(YAML.load(metadata_file))
def forward(self, im: torch.Tensor) -> list:
"""Run inference using the ExecuTorch runtime.
Args:
im (torch.Tensor): Input image tensor in BCHW format, normalized to [0, 1].
Returns:
(list): Model predictions as a list of ExecuTorch output values.
"""
return self.model.execute([im])

@@ -0,0 +1,59 @@
# Ultralytics 🚀 AGPL-3.0 License - https://ultralytics.com/license
from __future__ import annotations
import json
import os
from pathlib import Path
import torch
from ultralytics.utils import LOGGER
from ultralytics.utils.checks import check_requirements
from .base import BaseBackend
class MNNBackend(BaseBackend):
"""MNN (Mobile Neural Network) inference backend.
Loads and runs inference with MNN models (.mnn files) using the Alibaba MNN framework. Optimized for mobile and edge
deployment with configurable thread count and precision.
"""
def load_model(self, weight: str | Path) -> None:
"""Load an Alibaba MNN model from a .mnn file.
Args:
weight (str | Path): Path to the .mnn model file.
"""
LOGGER.info(f"Loading {weight} for MNN inference...")
check_requirements("MNN")
import MNN
config = {"precision": "low", "backend": "CPU", "numThread": (os.cpu_count() + 1) // 2}
rt = MNN.nn.create_runtime_manager((config,))
self.net = MNN.nn.load_module_from_file(weight, [], [], runtime_manager=rt, rearrange=True)
self.expr = MNN.expr
# Load metadata from bizCode
info = self.net.get_info()
if "bizCode" in info:
try:
self.apply_metadata(json.loads(info["bizCode"]))
except json.JSONDecodeError:
pass
def forward(self, im: torch.Tensor) -> list:
"""Run inference using the MNN runtime.
Args:
im (torch.Tensor): Input image tensor in BCHW format, normalized to [0, 1].
Returns:
(list): Model predictions as a list of numpy arrays.
"""
input_var = self.expr.const(im.data_ptr(), im.shape)
output_var = self.net.onForward([input_var])
# NOTE: copy() is required here; without it, results are incorrect on ARM devices
return [x.read().copy() for x in output_var]

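The MNN config above sets `numThread` to about half the logical cores, rounded up, leaving headroom for the rest of the application. The heuristic in isolation:

```python
def mnn_threads(cpu_count: int) -> int:
    """Sketch of the MNN numThread heuristic: half the cores, rounded up."""
    return (cpu_count + 1) // 2

print(mnn_threads(8), mnn_threads(7), mnn_threads(1))  # 4 4 1
```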
@@ -0,0 +1,72 @@
# Ultralytics 🚀 AGPL-3.0 License - https://ultralytics.com/license
from __future__ import annotations
from pathlib import Path
import numpy as np
import torch
from ultralytics.utils import LOGGER
from ultralytics.utils.checks import check_requirements
from .base import BaseBackend
class NCNNBackend(BaseBackend):
"""Tencent NCNN inference backend for mobile and embedded deployment.
Loads and runs inference with Tencent NCNN models (*_ncnn_model/ directories). Optimized for mobile platforms with
optional Vulkan GPU acceleration when available.
"""
def load_model(self, weight: str | Path) -> None:
"""Load an NCNN model from a .param/.bin file pair or model directory.
Args:
weight (str | Path): Path to the .param file or directory containing NCNN model files.
"""
LOGGER.info(f"Loading {weight} for NCNN inference...")
check_requirements("ncnn", cmds="--no-deps")
import ncnn as pyncnn
self.pyncnn = pyncnn
self.net = pyncnn.Net()
# Setup Vulkan if available
if isinstance(self.device, str) and self.device.startswith("vulkan"):
self.net.opt.use_vulkan_compute = True
self.net.set_vulkan_device(int(self.device.split(":")[1]))
self.device = torch.device("cpu")
else:
self.net.opt.use_vulkan_compute = False
w = Path(weight)
if not w.is_file():
w = next(w.glob("*.param"))
self.net.load_param(str(w))
self.net.load_model(str(w.with_suffix(".bin")))
# Load metadata
metadata_file = w.parent / "metadata.yaml"
if metadata_file.exists():
from ultralytics.utils import YAML
self.apply_metadata(YAML.load(metadata_file))
def forward(self, im: torch.Tensor) -> list[np.ndarray]:
"""Run inference using the NCNN runtime.
Args:
im (torch.Tensor): Input image tensor in BCHW format, normalized to [0, 1].
Returns:
(list[np.ndarray]): Model predictions as a list of numpy arrays, one per output layer.
"""
mat_in = self.pyncnn.Mat(im[0].cpu().numpy())
with self.net.create_extractor() as ex:
ex.input(self.net.input_names()[0], mat_in)
# Sort output names as temporary fix for pnnx issue
y = [np.array(ex.extract(x)[1])[None] for x in sorted(self.net.output_names())]
return y

@@ -0,0 +1,196 @@
# Ultralytics 🚀 AGPL-3.0 License - https://ultralytics.com/license
from __future__ import annotations
from pathlib import Path
import numpy as np
import torch
from ultralytics.utils import LOGGER
from ultralytics.utils.checks import check_requirements
from .base import BaseBackend
class ONNXBackend(BaseBackend):
"""Microsoft ONNX Runtime inference backend with optional OpenCV DNN support.
Loads and runs inference with ONNX models (.onnx files) using either Microsoft ONNX Runtime with CUDA/CoreML
execution providers, or OpenCV DNN for lightweight CPU inference. Supports IO binding for optimized GPU inference
with static input shapes.
"""
def __init__(self, weight: str | Path, device: torch.device, fp16: bool = False, format: str = "onnx"):
"""Initialize the ONNX backend.
Args:
weight (str | Path): Path to the .onnx model file.
device (torch.device): Device to run inference on.
fp16 (bool): Whether to use FP16 half-precision inference.
format (str): Inference engine, either "onnx" for ONNX Runtime or "dnn" for OpenCV DNN.
"""
assert format in {"onnx", "dnn"}, f"Unsupported ONNX format: {format}."
self.format = format
super().__init__(weight, device, fp16)
def load_model(self, weight: str | Path) -> None:
"""Load an ONNX model using ONNX Runtime or OpenCV DNN.
Args:
weight (str | Path): Path to the .onnx model file.
"""
cuda = isinstance(self.device, torch.device) and torch.cuda.is_available() and self.device.type != "cpu"
if self.format == "dnn":
# OpenCV DNN
LOGGER.info(f"Loading {weight} for ONNX OpenCV DNN inference...")
check_requirements("opencv-python>=4.5.4")
import cv2
self.net = cv2.dnn.readNetFromONNX(weight)
else:
# ONNX Runtime
LOGGER.info(f"Loading {weight} for ONNX Runtime inference...")
check_requirements(("onnx", "onnxruntime-gpu" if cuda else "onnxruntime"))
import onnxruntime
# Select execution provider
available = onnxruntime.get_available_providers()
if cuda and "CUDAExecutionProvider" in available:
providers = [("CUDAExecutionProvider", {"device_id": self.device.index}), "CPUExecutionProvider"]
elif self.device.type == "mps" and "CoreMLExecutionProvider" in available:
providers = ["CoreMLExecutionProvider", "CPUExecutionProvider"]
else:
providers = ["CPUExecutionProvider"]
if cuda:
LOGGER.warning("CUDA requested but CUDAExecutionProvider not available. Using CPU...")
self.device = torch.device("cpu")
cuda = False
LOGGER.info(
f"Using ONNX Runtime {onnxruntime.__version__} with "
f"{providers[0] if isinstance(providers[0], str) else providers[0][0]}"
)
self.session = onnxruntime.InferenceSession(weight, providers=providers)
self.output_names = [x.name for x in self.session.get_outputs()]
# Get metadata
metadata_map = self.session.get_modelmeta().custom_metadata_map
if metadata_map:
self.apply_metadata(dict(metadata_map))
# Check if dynamic shapes
self.dynamic = isinstance(self.session.get_outputs()[0].shape[0], str)
self.fp16 = "float16" in self.session.get_inputs()[0].type
# Setup IO binding for CUDA
self.use_io_binding = not self.dynamic and cuda
if self.use_io_binding:
self.io = self.session.io_binding()
self.bindings = []
for output in self.session.get_outputs():
out_fp16 = "float16" in output.type
y_tensor = torch.empty(output.shape, dtype=torch.float16 if out_fp16 else torch.float32).to(
self.device
)
self.io.bind_output(
name=output.name,
device_type=self.device.type,
device_id=self.device.index if cuda else 0,
element_type=np.float16 if out_fp16 else np.float32,
shape=tuple(y_tensor.shape),
buffer_ptr=y_tensor.data_ptr(),
)
self.bindings.append(y_tensor)
def forward(self, im: torch.Tensor) -> torch.Tensor | list[torch.Tensor] | np.ndarray:
"""Run ONNX inference using IO binding (CUDA) or standard session execution.
Args:
im (torch.Tensor): Input image tensor in BCHW format, normalized to [0, 1].
Returns:
(torch.Tensor | list[torch.Tensor] | np.ndarray): Model predictions as tensor(s) or numpy array(s).
"""
if self.format == "dnn":
# OpenCV DNN
self.net.setInput(im.cpu().numpy())
return self.net.forward()
# ONNX Runtime
if self.use_io_binding:
if self.device.type == "cpu":
im = im.cpu()
self.io.bind_input(
name="images",
device_type=im.device.type,
device_id=im.device.index if im.device.type == "cuda" else 0,
element_type=np.float16 if self.fp16 else np.float32,
shape=tuple(im.shape),
buffer_ptr=im.data_ptr(),
)
self.session.run_with_iobinding(self.io)
return self.bindings
else:
return self.session.run(self.output_names, {self.session.get_inputs()[0].name: im.cpu().numpy()})
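For reference, the execution-provider fallback in `load_model` above reduces to a small pure function. A minimal sketch — the function name and signature are illustrative, not part of the backend API, and the warning/device-reset side effects are omitted:

```python
def select_providers(device_type, device_index, available):
    """Mirror the ONNX Runtime provider-selection order above: CUDA, then CoreML for MPS, then CPU."""
    if device_type == "cuda" and "CUDAExecutionProvider" in available:
        return [("CUDAExecutionProvider", {"device_id": device_index or 0}), "CPUExecutionProvider"]
    if device_type == "mps" and "CoreMLExecutionProvider" in available:
        return ["CoreMLExecutionProvider", "CPUExecutionProvider"]
    return ["CPUExecutionProvider"]

print(select_providers("cuda", 0, ["CUDAExecutionProvider", "CPUExecutionProvider"]))
```

The CPU provider is always appended as a fallback so the session can still run kernels the accelerated provider does not support.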
class ONNXIMXBackend(ONNXBackend):
"""ONNX IMX inference backend for NXP i.MX processors.
Extends `ONNXBackend` with support for quantized models targeting NXP i.MX edge devices. Uses MCT (Model Compression
Toolkit) quantizers and custom NMS operations for optimized inference.
"""
def load_model(self, weight: str | Path) -> None:
"""Load a quantized ONNX model from an IMX model directory.
Args:
weight (str | Path): Path to the IMX model directory containing the .onnx file.
"""
check_requirements(("model-compression-toolkit>=2.4.1", "edge-mdt-cl<1.1.0", "onnxruntime-extensions"))
check_requirements(("onnx", "onnxruntime"))
import mct_quantizers as mctq
import onnxruntime
from edgemdt_cl.pytorch.nms import nms_ort # noqa - register custom NMS ops
w = Path(weight)
onnx_file = next(w.glob("*.onnx"))
LOGGER.info(f"Loading {onnx_file} for ONNX IMX inference...")
session_options = mctq.get_ort_session_options()
session_options.enable_mem_reuse = False
self.session = onnxruntime.InferenceSession(onnx_file, session_options, providers=["CPUExecutionProvider"])
self.output_names = [x.name for x in self.session.get_outputs()]
self.dynamic = isinstance(self.session.get_outputs()[0].shape[0], str)
self.fp16 = "float16" in self.session.get_inputs()[0].type
metadata_map = self.session.get_modelmeta().custom_metadata_map
if metadata_map:
self.apply_metadata(dict(metadata_map))
def forward(self, im: torch.Tensor) -> np.ndarray | list[np.ndarray] | tuple[np.ndarray, ...]:
"""Run IMX inference with task-specific output concatenation for detect, pose, and segment tasks.
Args:
im (torch.Tensor): Input image tensor in BCHW format, normalized to [0, 1].
Returns:
(np.ndarray | list[np.ndarray] | tuple[np.ndarray, ...]): Task-formatted model predictions.
"""
y = self.session.run(self.output_names, {self.session.get_inputs()[0].name: im.cpu().numpy()})
if self.task == "detect":
# boxes, conf, cls
return np.concatenate([y[0], y[1][:, :, None], y[2][:, :, None]], axis=-1)
elif self.task == "pose":
# boxes, conf, kpts
return np.concatenate([y[0], y[1][:, :, None], y[2][:, :, None], y[3]], axis=-1, dtype=y[0].dtype)
elif self.task == "segment":
return (
np.concatenate([y[0], y[1][:, :, None], y[2][:, :, None], y[3]], axis=-1, dtype=y[0].dtype),
y[4],
)
return y
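The detect branch above stitches separate box, confidence, and class outputs into a single `(B, N, 6)` array. A toy numpy sketch of that concatenation, with invented values standing in for real session outputs:

```python
import numpy as np

# Toy outputs shaped like the detect task: boxes (B, N, 4), conf (B, N), cls (B, N)
boxes = np.zeros((1, 5, 4), dtype=np.float32)
conf = np.ones((1, 5), dtype=np.float32)
cls = np.full((1, 5), 2.0, dtype=np.float32)

# conf/cls gain a trailing axis via [:, :, None] so all arrays concatenate on axis=-1
dets = np.concatenate([boxes, conf[:, :, None], cls[:, :, None]], axis=-1)
print(dets.shape)  # (1, 5, 6): xyxy + conf + cls per detection
```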


@ -0,0 +1,105 @@
# Ultralytics 🚀 AGPL-3.0 License - https://ultralytics.com/license
from __future__ import annotations
from pathlib import Path
import numpy as np
import torch
from ultralytics.utils import LOGGER
from ultralytics.utils.checks import check_requirements
from .base import BaseBackend
class OpenVINOBackend(BaseBackend):
"""Intel OpenVINO inference backend for Intel hardware acceleration.
Loads and runs inference with Intel OpenVINO IR models (*_openvino_model/ directories). Supports automatic device
selection, Intel-specific device targeting, and async inference for throughput optimization.
"""
def load_model(self, weight: str | Path) -> None:
"""Load an Intel OpenVINO IR model from a .xml/.bin file pair or model directory.
Args:
weight (str | Path): Path to the .xml file or directory containing OpenVINO model files.
"""
LOGGER.info(f"Loading {weight} for OpenVINO inference...")
check_requirements("openvino>=2024.0.0")
import openvino as ov
core = ov.Core()
device_name = "AUTO"
if isinstance(self.device, str) and self.device.startswith("intel"):
device_name = self.device.split(":")[1].upper()
self.device = torch.device("cpu")
if device_name not in core.available_devices:
LOGGER.warning(f"OpenVINO device '{device_name}' not available. Using 'AUTO' instead.")
device_name = "AUTO"
w = Path(weight)
if not w.is_file():
w = next(w.glob("*.xml"))
ov_model = core.read_model(model=str(w), weights=w.with_suffix(".bin"))
if ov_model.get_parameters()[0].get_layout().empty:
ov_model.get_parameters()[0].set_layout(ov.Layout("NCHW"))
# Load metadata
metadata_file = w.parent / "metadata.yaml"
if metadata_file.exists():
from ultralytics.utils import YAML
self.apply_metadata(YAML.load(metadata_file))
# Set inference mode
self.inference_mode = "CUMULATIVE_THROUGHPUT" if self.dynamic and self.batch > 1 else "LATENCY"
self.ov_compiled_model = core.compile_model(
ov_model,
device_name=device_name,
config={"PERFORMANCE_HINT": self.inference_mode},
)
LOGGER.info(
f"Using OpenVINO {self.inference_mode} mode for batch={self.batch} inference on "
f"{', '.join(self.ov_compiled_model.get_property('EXECUTION_DEVICES'))}..."
)
self.input_name = self.ov_compiled_model.input().get_any_name()
self.ov = ov
def forward(self, im: torch.Tensor) -> list[np.ndarray]:
"""Run Intel OpenVINO inference with sync or async execution based on inference mode.
Args:
im (torch.Tensor): Input image tensor in BCHW format, normalized to [0, 1].
Returns:
(list[np.ndarray]): Model predictions as a list of numpy arrays, one per output layer.
"""
im = im.cpu().numpy().astype(np.float32)
if self.inference_mode in {"THROUGHPUT", "CUMULATIVE_THROUGHPUT"}:
# Async inference for larger batch sizes
n = im.shape[0]
results = [None] * n
def callback(request, userdata):
"""Store async inference result in the preallocated results list at the given index."""
results[userdata] = request.results
async_queue = self.ov.AsyncInferQueue(self.ov_compiled_model)
async_queue.set_callback(callback)
for i in range(n):
async_queue.start_async(inputs={self.input_name: im[i : i + 1]}, userdata=i)
async_queue.wait_all()
y = [list(r.values()) for r in results]
y = [np.concatenate(x) for x in zip(*y)]
else:
# Sync inference for LATENCY mode
y = list(self.ov_compiled_model(im).values())
return y
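The callback/`zip` reassembly above turns per-image async results back into batched arrays. A self-contained sketch, with plain dicts standing in for OpenVINO infer-request results:

```python
import numpy as np

# Four per-image "requests", each yielding a dict of named outputs
results = [
    {"out0": np.full((1, 3), i, dtype=np.float32), "out1": np.full((1, 2), -i, dtype=np.float32)}
    for i in range(4)
]
y = [list(r.values()) for r in results]   # per-image output lists
y = [np.concatenate(x) for x in zip(*y)]  # one batched array per output name
print([a.shape for a in y])  # [(4, 3), (4, 2)]
```

Because `userdata` indexes a preallocated list in the real backend, batch order is preserved even when requests complete out of order.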


@ -0,0 +1,79 @@
# Ultralytics 🚀 AGPL-3.0 License - https://ultralytics.com/license
from __future__ import annotations
from pathlib import Path
import numpy as np
import torch
from ultralytics.utils import ARM64, LOGGER
from ultralytics.utils.checks import check_requirements
from .base import BaseBackend
class PaddleBackend(BaseBackend):
"""Baidu PaddlePaddle inference backend.
Loads and runs inference with Baidu PaddlePaddle models (*_paddle_model/ directories). Supports both CPU and GPU
execution with automatic device configuration and memory pool initialization.
"""
def load_model(self, weight: str | Path) -> None:
"""Load a Baidu PaddlePaddle model from a directory containing .json and .pdiparams files.
Args:
weight (str | Path): Path to the model directory or .pdiparams file.
"""
cuda = isinstance(self.device, torch.device) and torch.cuda.is_available() and self.device.type != "cpu"
LOGGER.info(f"Loading {weight} for PaddlePaddle inference...")
if cuda:
check_requirements("paddlepaddle-gpu>=3.0.0,!=3.3.0")
elif ARM64:
check_requirements("paddlepaddle==3.0.0")
else:
check_requirements("paddlepaddle>=3.0.0,!=3.3.0")
import paddle.inference as pdi
w = Path(weight)
model_file, params_file = None, None
if w.is_dir():
model_file = next(w.rglob("*.json"), None)
params_file = next(w.rglob("*.pdiparams"), None)
elif w.suffix == ".pdiparams":
model_file = w.with_name("model.json")
params_file = w
if not (model_file and params_file and model_file.is_file() and params_file.is_file()):
raise FileNotFoundError(f"Paddle model not found in {w}. Both .json and .pdiparams files are required.")
config = pdi.Config(str(model_file), str(params_file))
if cuda:
config.enable_use_gpu(memory_pool_init_size_mb=2048, device_id=self.device.index or 0)
self.predictor = pdi.create_predictor(config)
self.input_handle = self.predictor.get_input_handle(self.predictor.get_input_names()[0])
self.output_names = self.predictor.get_output_names()
# Load metadata
metadata_file = (w if w.is_dir() else w.parent) / "metadata.yaml"
if metadata_file.exists():
from ultralytics.utils import YAML
self.apply_metadata(YAML.load(metadata_file))
def forward(self, im: torch.Tensor) -> list[np.ndarray]:
"""Run Baidu PaddlePaddle inference.
Args:
im (torch.Tensor): Input image tensor in BCHW format, normalized to [0, 1].
Returns:
(list[np.ndarray]): Model predictions as a list of numpy arrays, one per output handle.
"""
self.input_handle.copy_from_cpu(im.cpu().numpy().astype(np.float32))
self.predictor.run()
return [self.predictor.get_output_handle(x).copy_to_cpu() for x in self.output_names]
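The `.json`/`.pdiparams` discovery above can be exercised standalone. A sketch — the helper name is illustrative — using a temporary directory in place of a real `*_paddle_model/` export:

```python
import tempfile
from pathlib import Path

def find_paddle_files(w):
    """Mirror the model/params file discovery above for a directory or .pdiparams path."""
    if w.is_dir():
        return next(w.rglob("*.json"), None), next(w.rglob("*.pdiparams"), None)
    if w.suffix == ".pdiparams":
        return w.with_name("model.json"), w
    return None, None

with tempfile.TemporaryDirectory() as d:
    root = Path(d)
    (root / "model.json").write_text("{}")
    (root / "model.pdiparams").write_bytes(b"")
    model_file, params_file = find_paddle_files(root)
    print(model_file.name, params_file.name)  # model.json model.pdiparams
```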


@ -0,0 +1,137 @@
# Ultralytics 🚀 AGPL-3.0 License - https://ultralytics.com/license
from __future__ import annotations
from pathlib import Path
from typing import Any
import torch
import torch.nn as nn
from ultralytics.utils import IS_JETSON, LOGGER, is_jetson
from .base import BaseBackend
class PyTorchBackend(BaseBackend):
"""PyTorch inference backend for native model execution.
Loads and runs inference with native PyTorch models (.pt checkpoint files) or pre-loaded nn.Module
instances. Supports model layer fusion, FP16 precision, and NVIDIA Jetson compatibility.
"""
def __init__(
self,
weight: str | Path | nn.Module,
device: torch.device,
fp16: bool = False,
fuse: bool = True,
verbose: bool = True,
):
"""Initialize the PyTorch backend.
Args:
weight (str | Path | nn.Module): Path to the .pt model file or a pre-loaded nn.Module instance.
device (torch.device): Device to run inference on (e.g., 'cpu', 'cuda:0').
fp16 (bool): Whether to use FP16 half-precision inference.
fuse (bool): Whether to fuse Conv2D + BatchNorm layers for optimization.
verbose (bool): Whether to print verbose model loading messages.
"""
self.fuse = fuse
self.verbose = verbose
super().__init__(weight, device, fp16)
def load_model(self, weight: str | torch.nn.Module) -> None:
"""Load a PyTorch model from a checkpoint file or nn.Module instance.
Args:
weight (str | torch.nn.Module): Path to the .pt checkpoint or a pre-loaded module.
"""
from ultralytics.nn.tasks import load_checkpoint
if isinstance(weight, torch.nn.Module):
if self.fuse and hasattr(weight, "fuse"):
if IS_JETSON and is_jetson(jetpack=5):
weight = weight.to(self.device)
weight = weight.fuse(verbose=self.verbose)
model = weight.to(self.device)
else:
model, _ = load_checkpoint(weight, device=self.device, fuse=self.fuse)
# Extract model attributes
if hasattr(model, "kpt_shape"):
self.kpt_shape = model.kpt_shape
self.stride = max(int(model.stride.max()), 32) if hasattr(model, "stride") else 32
self.names = model.module.names if hasattr(model, "module") else getattr(model, "names", {})
self.channels = model.yaml.get("channels", 3) if hasattr(model, "yaml") else 3
model.half() if self.fp16 else model.float()
for p in model.parameters():
p.requires_grad = False
self.model = model
self.end2end = getattr(model, "end2end", False)
def forward(
self, im: torch.Tensor, augment: bool = False, visualize: bool = False, embed: list | None = None, **kwargs: Any
) -> torch.Tensor | list[torch.Tensor]:
"""Run native PyTorch inference with support for augmentation, visualization, and embeddings.
Args:
im (torch.Tensor): Input image tensor in BCHW format, normalized to [0, 1].
augment (bool): Whether to apply test-time augmentation.
visualize (bool): Whether to visualize intermediate feature maps.
embed (list | None): List of layer indices to extract embeddings from, or None.
**kwargs (Any): Additional keyword arguments passed to the model forward method.
Returns:
(torch.Tensor | list[torch.Tensor]): Model predictions as tensor(s).
"""
return self.model(im, augment=augment, visualize=visualize, embed=embed, **kwargs)
class TorchScriptBackend(BaseBackend):
"""PyTorch TorchScript inference backend for serialized model execution.
Loads and runs inference with TorchScript models (.torchscript files) created via torch.jit.trace or
torch.jit.script. Supports FP16 precision and embedded metadata extraction.
"""
def __init__(self, weight: str | Path, device: torch.device, fp16: bool = False):
"""Initialize the TorchScript backend.
Args:
weight (str | Path): Path to the .torchscript model file.
device (torch.device): Device to run inference on (e.g., 'cpu', 'cuda:0').
fp16 (bool): Whether to use FP16 half-precision inference.
"""
super().__init__(weight, device, fp16)
def load_model(self, weight: str) -> None:
"""Load a TorchScript model from a .torchscript file with optional embedded metadata.
Args:
weight (str): Path to the .torchscript model file.
"""
import json
import torchvision # noqa - required for TorchScript model deserialization
LOGGER.info(f"Loading {weight} for TorchScript inference...")
extra_files = {"config.txt": ""}
self.model = torch.jit.load(weight, _extra_files=extra_files, map_location=self.device)
self.model.half() if self.fp16 else self.model.float()
if extra_files["config.txt"]:
self.apply_metadata(json.loads(extra_files["config.txt"], object_hook=lambda x: dict(x.items())))
def forward(self, im: torch.Tensor) -> torch.Tensor | list[torch.Tensor]:
"""Run TorchScript inference.
Args:
im (torch.Tensor): Input image tensor in BCHW format, normalized to [0, 1].
Returns:
(torch.Tensor | list[torch.Tensor]): Model predictions as tensor(s).
"""
return self.model(im)
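The metadata embedded in `extra_files["config.txt"]` above is plain JSON, so the parse is an ordinary `json.loads` (the `object_hook` simply rebuilds each object as a dict). A minimal sketch with invented sample values:

```python
import json

# Sample config.txt contents (illustrative values, not from a real export)
config_txt = '{"stride": 32, "task": "detect", "names": {"0": "person"}}'
metadata = json.loads(config_txt, object_hook=lambda x: dict(x.items()))
print(metadata["stride"], metadata["task"])  # 32 detect
```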


@ -0,0 +1,70 @@
# Ultralytics 🚀 AGPL-3.0 License - https://ultralytics.com/license
from __future__ import annotations
from pathlib import Path
import torch
from ultralytics.utils import LOGGER
from ultralytics.utils.checks import check_requirements, is_rockchip
from .base import BaseBackend
class RKNNBackend(BaseBackend):
"""Rockchip RKNN inference backend for Rockchip NPU hardware.
Loads and runs inference with RKNN models (.rknn files) using the RKNN-Toolkit-Lite2 runtime. Only supported on
Rockchip devices with NPU hardware (e.g., RK3588, RK3566).
"""
def load_model(self, weight: str | Path) -> None:
"""Load a Rockchip RKNN model from a .rknn file or model directory.
Args:
weight (str | Path): Path to the .rknn file or directory containing the model.
Raises:
OSError: If not running on a Rockchip device.
RuntimeError: If model loading or runtime initialization fails.
"""
if not is_rockchip():
raise OSError("RKNN inference is only supported on Rockchip devices.")
LOGGER.info(f"Loading {weight} for RKNN inference...")
check_requirements("rknn-toolkit-lite2")
from rknnlite.api import RKNNLite
w = Path(weight)
if not w.is_file():
w = next(w.rglob("*.rknn"))
self.model = RKNNLite()
ret = self.model.load_rknn(str(w))
if ret != 0:
raise RuntimeError(f"Failed to load RKNN model: {ret}")
ret = self.model.init_runtime()
if ret != 0:
raise RuntimeError(f"Failed to init RKNN runtime: {ret}")
# Load metadata
metadata_file = w.parent / "metadata.yaml"
if metadata_file.exists():
from ultralytics.utils import YAML
self.apply_metadata(YAML.load(metadata_file))
def forward(self, im: torch.Tensor) -> list:
"""Run inference on the Rockchip NPU.
Args:
im (torch.Tensor): Input image tensor in BCHW format, normalized to [0, 1].
Returns:
(list): Model predictions as a list of output arrays.
"""
im = (im.cpu().numpy() * 255).astype("uint8")
im = im if isinstance(im, (list, tuple)) else [im]
return self.model.inference(inputs=im)
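The input preparation above rescales the normalized float tensor to uint8 for the NPU; `astype` truncates toward zero rather than rounding. A numpy sketch of that conversion:

```python
import numpy as np

# Normalized [0, 1] input, as produced upstream of the backend
im = np.array([[0.0, 0.5, 1.0]], dtype=np.float32)
im_u8 = (im * 255).astype("uint8")  # 127.5 truncates to 127
print(im_u8.tolist())  # [[0, 127, 255]]
```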


@ -0,0 +1,183 @@
# Ultralytics 🚀 AGPL-3.0 License - https://ultralytics.com/license
from __future__ import annotations
import ast
import json
import platform
import zipfile
from pathlib import Path
import numpy as np
import torch
from ultralytics.utils import LOGGER
from .base import BaseBackend
class TensorFlowBackend(BaseBackend):
"""Google TensorFlow inference backend supporting multiple serialization formats.
Loads and runs inference with Google TensorFlow models in SavedModel, GraphDef (.pb), TFLite (.tflite), and Edge TPU
formats. Handles quantized model dequantization and task-specific output formatting.
"""
def __init__(self, weight: str | Path, device: torch.device, fp16: bool = False, format: str = "saved_model"):
"""Initialize the Google TensorFlow backend.
Args:
weight (str | Path): Path to the SavedModel directory, .pb file, or .tflite file.
device (torch.device): Device to run inference on.
fp16 (bool): Whether to use FP16 half-precision inference.
format (str): Model format, one of "saved_model", "pb", "tflite", or "edgetpu".
"""
assert format in {"saved_model", "pb", "tflite", "edgetpu"}, f"Unsupported TensorFlow format: {format}."
self.format = format
super().__init__(weight, device, fp16)
def load_model(self, weight: str | Path) -> None:
"""Load a Google TensorFlow model in SavedModel, GraphDef, TFLite, or Edge TPU format.
Args:
weight (str | Path): Path to the model file or directory.
"""
import tensorflow as tf
if self.format == "saved_model":
LOGGER.info(f"Loading {weight} for TensorFlow SavedModel inference...")
self.model = tf.saved_model.load(weight)
# Load metadata
metadata_file = Path(weight) / "metadata.yaml"
if metadata_file.exists():
from ultralytics.utils import YAML
self.apply_metadata(YAML.load(metadata_file))
elif self.format == "pb":
LOGGER.info(f"Loading {weight} for TensorFlow GraphDef inference...")
from ultralytics.utils.export.tensorflow import gd_outputs
def wrap_frozen_graph(gd, inputs, outputs):
"""Wrap a TensorFlow frozen graph for inference by pruning to specified input/output nodes."""
x = tf.compat.v1.wrap_function(lambda: tf.compat.v1.import_graph_def(gd, name=""), [])
ge = x.graph.as_graph_element
return x.prune(tf.nest.map_structure(ge, inputs), tf.nest.map_structure(ge, outputs))
gd = tf.Graph().as_graph_def()
with open(weight, "rb") as f:
gd.ParseFromString(f.read())
self.frozen_func = wrap_frozen_graph(gd, inputs="x:0", outputs=gd_outputs(gd))
# Try to find metadata
try:
metadata_file = next(
Path(weight).resolve().parent.rglob(f"{Path(weight).stem}_saved_model*/metadata.yaml")
)
from ultralytics.utils import YAML
self.apply_metadata(YAML.load(metadata_file))
except StopIteration:
pass
else: # tflite and edgetpu
try:
from tflite_runtime.interpreter import Interpreter, load_delegate
self.tf = None
except ImportError:
import tensorflow as tf
self.tf = tf
Interpreter, load_delegate = tf.lite.Interpreter, tf.lite.experimental.load_delegate
if self.format == "edgetpu":
device = self.device[3:] if str(self.device).startswith("tpu") else ":0"
LOGGER.info(f"Loading {weight} on device {device[1:]} for TensorFlow Lite Edge TPU inference...")
delegate = {"Linux": "libedgetpu.so.1", "Darwin": "libedgetpu.1.dylib", "Windows": "edgetpu.dll"}[
platform.system()
]
self.interpreter = Interpreter(
model_path=str(weight),
experimental_delegates=[load_delegate(delegate, options={"device": device})],
)
self.device = torch.device("cpu") # Edge TPU runs on CPU from PyTorch's perspective
else:
LOGGER.info(f"Loading {weight} for TensorFlow Lite inference...")
self.interpreter = Interpreter(model_path=weight)
self.interpreter.allocate_tensors()
self.input_details = self.interpreter.get_input_details()
self.output_details = self.interpreter.get_output_details()
# Load metadata
try:
with zipfile.ZipFile(weight, "r") as zf:
name = zf.namelist()[0]
contents = zf.read(name).decode("utf-8")
if name == "metadata.json":
self.apply_metadata(json.loads(contents))
else:
self.apply_metadata(ast.literal_eval(contents))
except (zipfile.BadZipFile, SyntaxError, ValueError, json.JSONDecodeError):
pass
def forward(self, im: torch.Tensor) -> list[np.ndarray]:
"""Run Google TensorFlow inference with format-specific execution and output post-processing.
Args:
im (torch.Tensor): Input image tensor in BHWC format (converted from BCHW by AutoBackend).
Returns:
(list[np.ndarray]): Model predictions as a list of numpy arrays.
"""
im = im.cpu().numpy()
if self.format == "saved_model":
y = self.model.serving_default(im)
if not isinstance(y, list):
y = [y]
elif self.format == "pb":
import tensorflow as tf
y = self.frozen_func(x=tf.constant(im))
else:
h, w = im.shape[1:3]
details = self.input_details[0]
is_int = details["dtype"] in {np.int8, np.int16}
if is_int:
scale, zero_point = details["quantization"]
im = (im / scale + zero_point).astype(details["dtype"])
self.interpreter.set_tensor(details["index"], im)
self.interpreter.invoke()
y = []
for output in self.output_details:
x = self.interpreter.get_tensor(output["index"])
if is_int:
scale, zero_point = output["quantization"]
x = (x.astype(np.float32) - zero_point) * scale
if x.ndim == 3:
# Denormalize xywh by image size
if x.shape[-1] == 6 or self.end2end:
x[:, :, [0, 2]] *= w
x[:, :, [1, 3]] *= h
if self.task == "pose":
x[:, :, 6::3] *= w
x[:, :, 7::3] *= h
else:
x[:, [0, 2]] *= w
x[:, [1, 3]] *= h
if self.task == "pose":
x[:, 5::3] *= w
x[:, 6::3] *= h
y.append(x)
if self.task == "segment": # segment with (det, proto) output order reversed
if len(y[1].shape) != 4:
y = list(reversed(y)) # should be y = (1, 116, 8400), (1, 160, 160, 32)
if y[1].shape[-1] == 6: # end-to-end model
y = [y[1]]
else:
y[1] = np.transpose(y[1], (0, 3, 1, 2)) # should be y = (1, 116, 8400), (1, 32, 160, 160)
return [x if isinstance(x, np.ndarray) else x.numpy() for x in y]
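The int8 path above applies affine quantization on input (`q = x / scale + zero_point`) and the inverse on output. A self-contained numpy sketch of the round trip — the scale and zero point here are illustrative, typical of a [0, 1] input range:

```python
import numpy as np

scale, zero_point = 0.00392157, -128  # ~1/255, illustrative quantization parameters
x = np.array([0.0, 0.25, 0.5, 1.0], dtype=np.float32)

q = (x / scale + zero_point).astype(np.int8)             # quantize for the interpreter
x_hat = (q.astype(np.float32) - zero_point) * scale      # dequantize the output
print(np.abs(x_hat - x).max() < scale)  # round-trip error stays under one quantization step
```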


@ -0,0 +1,144 @@
# Ultralytics 🚀 AGPL-3.0 License - https://ultralytics.com/license
from __future__ import annotations
import json
from collections import OrderedDict, namedtuple
from pathlib import Path
import numpy as np
import torch
from ultralytics.utils import IS_JETSON, LINUX, LOGGER, PYTHON_VERSION
from ultralytics.utils.checks import check_requirements, check_version
from .base import BaseBackend
class TensorRTBackend(BaseBackend):
"""NVIDIA TensorRT inference backend for GPU-accelerated deployment.
Loads and runs inference with NVIDIA TensorRT serialized engines (.engine files). Supports both TensorRT 7-9 and
TensorRT 10+ APIs, dynamic input shapes, FP16 precision, and DLA core offloading.
"""
def load_model(self, weight: str | Path) -> None:
"""Load an NVIDIA TensorRT engine from a serialized .engine file.
Args:
weight (str | Path): Path to the .engine file with optional embedded metadata.
"""
LOGGER.info(f"Loading {weight} for TensorRT inference...")
if IS_JETSON and check_version(PYTHON_VERSION, "<=3.8.10"):
check_requirements("numpy==1.23.5")
try:
import tensorrt as trt
except ImportError:
if LINUX:
check_requirements("tensorrt>7.0.0,!=10.1.0")
import tensorrt as trt
check_version(trt.__version__, ">=7.0.0", hard=True)
check_version(trt.__version__, "!=10.1.0", msg="https://github.com/ultralytics/ultralytics/pull/14239")
if self.device.type == "cpu":
self.device = torch.device("cuda:0")
Binding = namedtuple("Binding", ("name", "dtype", "shape", "data", "ptr"))
logger = trt.Logger(trt.Logger.INFO)
# Read engine file
with open(weight, "rb") as f, trt.Runtime(logger) as runtime:
try:
meta_len = int.from_bytes(f.read(4), byteorder="little")
metadata = json.loads(f.read(meta_len).decode("utf-8"))
dla = metadata.get("dla", None)
if dla is not None:
runtime.DLA_core = int(dla)
except UnicodeDecodeError:
f.seek(0)
metadata = None
engine = runtime.deserialize_cuda_engine(f.read())
self.apply_metadata(metadata)
try:
self.context = engine.create_execution_context()
except Exception as e:
LOGGER.error("TensorRT model exported with a different version than expected\n")
raise e
# Setup bindings
self.bindings = OrderedDict()
self.output_names = []
self.fp16 = False
self.dynamic = False
self.is_trt10 = not hasattr(engine, "num_bindings")
num = range(engine.num_io_tensors) if self.is_trt10 else range(engine.num_bindings)
for i in num:
if self.is_trt10:
name = engine.get_tensor_name(i)
dtype = trt.nptype(engine.get_tensor_dtype(name))
is_input = engine.get_tensor_mode(name) == trt.TensorIOMode.INPUT
shape = tuple(engine.get_tensor_shape(name))
profile_shape = tuple(engine.get_tensor_profile_shape(name, 0)[2]) if is_input else None
else:
name = engine.get_binding_name(i)
dtype = trt.nptype(engine.get_binding_dtype(i))
is_input = engine.binding_is_input(i)
shape = tuple(engine.get_binding_shape(i))
profile_shape = tuple(engine.get_profile_shape(0, i)[1]) if is_input else None
if is_input:
if -1 in shape:
self.dynamic = True
if self.is_trt10:
self.context.set_input_shape(name, profile_shape)
else:
self.context.set_binding_shape(i, profile_shape)
if dtype == np.float16:
self.fp16 = True
else:
self.output_names.append(name)
shape = (
tuple(self.context.get_tensor_shape(name))
if self.is_trt10
else tuple(self.context.get_binding_shape(i))
)
im = torch.from_numpy(np.empty(shape, dtype=dtype)).to(self.device)
self.bindings[name] = Binding(name, dtype, shape, im, int(im.data_ptr()))
self.binding_addrs = OrderedDict((n, d.ptr) for n, d in self.bindings.items())
self.model = engine
def forward(self, im: torch.Tensor) -> list[torch.Tensor]:
"""Run NVIDIA TensorRT inference with dynamic shape handling.
Args:
im (torch.Tensor): Input image tensor in BCHW format on the CUDA device.
Returns:
(list[torch.Tensor]): Model predictions as a list of tensors on the CUDA device.
"""
if self.dynamic and im.shape != self.bindings["images"].shape:
if self.is_trt10:
self.context.set_input_shape("images", im.shape)
self.bindings["images"] = self.bindings["images"]._replace(shape=im.shape)
for name in self.output_names:
self.bindings[name].data.resize_(tuple(self.context.get_tensor_shape(name)))
else:
i = self.model.get_binding_index("images")
self.context.set_binding_shape(i, im.shape)
self.bindings["images"] = self.bindings["images"]._replace(shape=im.shape)
for name in self.output_names:
i = self.model.get_binding_index(name)
self.bindings[name].data.resize_(tuple(self.context.get_binding_shape(i)))
s = self.bindings["images"].shape
assert im.shape == s, f"input size {im.shape} {'>' if self.dynamic else 'not equal to'} max model size {s}"
self.binding_addrs["images"] = int(im.data_ptr())
self.context.execute_v2(list(self.binding_addrs.values()))
return [self.bindings[x].data for x in sorted(self.output_names)]
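The engine files read in `load_model` above prepend a 4-byte little-endian length followed by JSON metadata before the serialized engine. A sketch of writing and reading that header end to end with an in-memory buffer (metadata values are illustrative):

```python
import io
import json

meta = {"stride": 32, "batch": 1, "dla": None}
payload = json.dumps(meta).encode("utf-8")

# Write: 4-byte little-endian length, JSON metadata, then the engine bytes
buf = io.BytesIO(len(payload).to_bytes(4, byteorder="little") + payload + b"<engine bytes>")

# Read it back the way load_model does
meta_len = int.from_bytes(buf.read(4), byteorder="little")
decoded = json.loads(buf.read(meta_len).decode("utf-8"))
engine_bytes = buf.read()
print(decoded["stride"], engine_bytes)
```

The `UnicodeDecodeError` fallback in `load_model` handles engines serialized without this header by rewinding and deserializing the whole file.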


@ -0,0 +1,45 @@
# Ultralytics 🚀 AGPL-3.0 License - https://ultralytics.com/license
from __future__ import annotations
from pathlib import Path
import torch
from ultralytics.utils.checks import check_requirements
from .base import BaseBackend
class TritonBackend(BaseBackend):
"""NVIDIA Triton Inference Server backend for remote model serving.
Connects to and runs inference with models hosted on an NVIDIA Triton Inference Server instance via HTTP or gRPC
protocols. The model is specified using a triton:// URL scheme.
"""
def load_model(self, weight: str | Path) -> None:
"""Connect to a remote model on an NVIDIA Triton Inference Server.
Args:
weight (str | Path): Triton model URL (e.g., 'http://localhost:8000/model_name').
"""
check_requirements("tritonclient[all]")
from ultralytics.utils.triton import TritonRemoteModel
self.model = TritonRemoteModel(weight)
# Copy metadata from Triton model
if hasattr(self.model, "metadata"):
self.apply_metadata(self.model.metadata)
def forward(self, im: torch.Tensor) -> list:
"""Run inference via the NVIDIA Triton Inference Server.
Args:
im (torch.Tensor): Input image tensor in BCHW format, normalized to [0, 1].
Returns:
(list): Model predictions as a list of numpy arrays from the Triton server.
"""
return self.model(im.cpu().numpy())


@ -253,7 +253,7 @@ class BaseSolution:
f" {', '.join([f'{v} {self.names[k]}' for k, v in counts.items()])}\n"
f"Speed: {track_or_predict_speed:.1f}ms {track_or_predict}, "
f"{solution_speed:.1f}ms solution per image at shape "
f"(1, {getattr(self.model, 'ch', 3)}, {result.plot_im.shape[0]}, {result.plot_im.shape[1]})\n"
f"(1, {getattr(self.model, 'channels', 3)}, {result.plot_im.shape[0]}, {result.plot_im.shape[1]})\n"
)
return result