mirror of
https://github.com/ultralytics/ultralytics
synced 2026-04-21 14:07:18 +00:00
ultralytics 8.4.23 Refactor AutoBackend into modular per-backend classes (#23790)
Signed-off-by: Jing Qiu <61612323+Laughing-q@users.noreply.github.com>
Signed-off-by: Glenn Jocher <glenn.jocher@ultralytics.com>
Co-authored-by: UltralyticsAssistant <web@ultralytics.com>
Co-authored-by: Ultralytics Assistant <135830346+UltralyticsAssistant@users.noreply.github.com>
Co-authored-by: Lakshantha Dissanayake <lakshantha@ultralytics.com>
Co-authored-by: Onuralp SEZER <onuralp@ultralytics.com>
Co-authored-by: Glenn Jocher <glenn.jocher@ultralytics.com>
parent b3c79532e3
commit b10fa7be23
38 changed files with 1844 additions and 773 deletions
docs/en/reference/nn/backends/axelera.md (new file, +16)
@@ -0,0 +1,16 @@
---
description: Explore AxeleraBackend for Axelera hardware inference, deploying YOLO models on Axelera AI accelerators with optimized performance.
keywords: Ultralytics, AxeleraBackend, Axelera inference, AI accelerator, hardware inference, edge AI, deep learning acceleration
---

# Reference for `ultralytics/nn/backends/axelera.py`

!!! success "Improvements"

    This page is sourced from [https://github.com/ultralytics/ultralytics/blob/main/ultralytics/nn/backends/axelera.py](https://github.com/ultralytics/ultralytics/blob/main/ultralytics/nn/backends/axelera.py). Have an improvement or example to add? Open a [Pull Request](https://docs.ultralytics.com/help/contributing/) — thank you! 🙏

<br>

## ::: ultralytics.nn.backends.axelera.AxeleraBackend

<br><br>
docs/en/reference/nn/backends/base.md (new file, +16)
@@ -0,0 +1,16 @@
---
description: Explore the BaseBackend class, the abstract foundation for all inference backends in Ultralytics, defining the interface for model loading and inference.
keywords: Ultralytics, BaseBackend, inference backend, abstract class, model loading, deep learning, neural network inference
---

# Reference for `ultralytics/nn/backends/base.py`

!!! success "Improvements"

    This page is sourced from [https://github.com/ultralytics/ultralytics/blob/main/ultralytics/nn/backends/base.py](https://github.com/ultralytics/ultralytics/blob/main/ultralytics/nn/backends/base.py). Have an improvement or example to add? Open a [Pull Request](https://docs.ultralytics.com/help/contributing/) — thank you! 🙏

<br>

## ::: ultralytics.nn.backends.base.BaseBackend

<br><br>
docs/en/reference/nn/backends/coreml.md (new file, +16)
@@ -0,0 +1,16 @@
---
description: Explore CoreMLBackend for Apple CoreML inference, enabling efficient YOLO model deployment on iOS, macOS, and Apple Silicon devices.
keywords: Ultralytics, CoreMLBackend, CoreML inference, Apple CoreML, iOS deployment, macOS inference, Apple Silicon, mobile AI
---

# Reference for `ultralytics/nn/backends/coreml.py`

!!! success "Improvements"

    This page is sourced from [https://github.com/ultralytics/ultralytics/blob/main/ultralytics/nn/backends/coreml.py](https://github.com/ultralytics/ultralytics/blob/main/ultralytics/nn/backends/coreml.py). Have an improvement or example to add? Open a [Pull Request](https://docs.ultralytics.com/help/contributing/) — thank you! 🙏

<br>

## ::: ultralytics.nn.backends.coreml.CoreMLBackend

<br><br>
docs/en/reference/nn/backends/executorch.md (new file, +16)
@@ -0,0 +1,16 @@
---
description: Explore ExecuTorchBackend for Meta ExecuTorch inference, enabling efficient PyTorch model deployment on mobile and edge devices.
keywords: Ultralytics, ExecuTorchBackend, ExecuTorch inference, Meta ExecuTorch, mobile inference, edge deployment, PyTorch Mobile
---

# Reference for `ultralytics/nn/backends/executorch.py`

!!! success "Improvements"

    This page is sourced from [https://github.com/ultralytics/ultralytics/blob/main/ultralytics/nn/backends/executorch.py](https://github.com/ultralytics/ultralytics/blob/main/ultralytics/nn/backends/executorch.py). Have an improvement or example to add? Open a [Pull Request](https://docs.ultralytics.com/help/contributing/) — thank you! 🙏

<br>

## ::: ultralytics.nn.backends.executorch.ExecuTorchBackend

<br><br>
docs/en/reference/nn/backends/mnn.md (new file, +16)
@@ -0,0 +1,16 @@
---
description: Explore MNNBackend for Alibaba MNN inference, enabling lightweight and efficient model deployment on mobile and edge devices.
keywords: Ultralytics, MNNBackend, MNN inference, Alibaba MNN, mobile inference, edge AI, .mnn models, deep learning
---

# Reference for `ultralytics/nn/backends/mnn.py`

!!! success "Improvements"

    This page is sourced from [https://github.com/ultralytics/ultralytics/blob/main/ultralytics/nn/backends/mnn.py](https://github.com/ultralytics/ultralytics/blob/main/ultralytics/nn/backends/mnn.py). Have an improvement or example to add? Open a [Pull Request](https://docs.ultralytics.com/help/contributing/) — thank you! 🙏

<br>

## ::: ultralytics.nn.backends.mnn.MNNBackend

<br><br>
docs/en/reference/nn/backends/ncnn.md (new file, +16)
@@ -0,0 +1,16 @@
---
description: Explore NCNNBackend for Tencent NCNN inference, optimized for mobile and embedded platforms with Vulkan acceleration support.
keywords: Ultralytics, NCNNBackend, NCNN inference, Tencent NCNN, mobile inference, Vulkan acceleration, embedded AI, deep learning
---

# Reference for `ultralytics/nn/backends/ncnn.py`

!!! success "Improvements"

    This page is sourced from [https://github.com/ultralytics/ultralytics/blob/main/ultralytics/nn/backends/ncnn.py](https://github.com/ultralytics/ultralytics/blob/main/ultralytics/nn/backends/ncnn.py). Have an improvement or example to add? Open a [Pull Request](https://docs.ultralytics.com/help/contributing/) — thank you! 🙏

<br>

## ::: ultralytics.nn.backends.ncnn.NCNNBackend

<br><br>
docs/en/reference/nn/backends/onnx.md (new file, +20)
@@ -0,0 +1,20 @@
---
description: Explore ONNXBackend and ONNXIMXBackend for Microsoft ONNX Runtime inference, supporting standard ONNX models and Sony IMX-optimized variants.
keywords: Ultralytics, ONNXBackend, ONNXIMXBackend, Microsoft ONNX Runtime, Sony IMX, ONNX inference, edge deployment, deep learning
---

# Reference for `ultralytics/nn/backends/onnx.py`

!!! success "Improvements"

    This page is sourced from [https://github.com/ultralytics/ultralytics/blob/main/ultralytics/nn/backends/onnx.py](https://github.com/ultralytics/ultralytics/blob/main/ultralytics/nn/backends/onnx.py). Have an improvement or example to add? Open a [Pull Request](https://docs.ultralytics.com/help/contributing/) — thank you! 🙏

<br>

## ::: ultralytics.nn.backends.onnx.ONNXBackend

<br><br><hr><br>

## ::: ultralytics.nn.backends.onnx.ONNXIMXBackend

<br><br>
docs/en/reference/nn/backends/openvino.md (new file, +16)
@@ -0,0 +1,16 @@
---
description: Explore OpenVINOBackend for optimized inference on Intel hardware, supporting OpenVINO IR models for efficient deployment on CPUs, GPUs, and VPUs.
keywords: Ultralytics, OpenVINOBackend, OpenVINO inference, Intel OpenVINO, CPU inference, VPU, edge AI, deep learning optimization
---

# Reference for `ultralytics/nn/backends/openvino.py`

!!! success "Improvements"

    This page is sourced from [https://github.com/ultralytics/ultralytics/blob/main/ultralytics/nn/backends/openvino.py](https://github.com/ultralytics/ultralytics/blob/main/ultralytics/nn/backends/openvino.py). Have an improvement or example to add? Open a [Pull Request](https://docs.ultralytics.com/help/contributing/) — thank you! 🙏

<br>

## ::: ultralytics.nn.backends.openvino.OpenVINOBackend

<br><br>
docs/en/reference/nn/backends/paddle.md (new file, +16)
@@ -0,0 +1,16 @@
---
description: Explore PaddleBackend for Baidu PaddlePaddle inference, supporting deployment with Paddle Inference engine on various hardware platforms.
keywords: Ultralytics, PaddleBackend, PaddlePaddle inference, Baidu Paddle, Paddle Inference, deep learning, model deployment
---

# Reference for `ultralytics/nn/backends/paddle.py`

!!! success "Improvements"

    This page is sourced from [https://github.com/ultralytics/ultralytics/blob/main/ultralytics/nn/backends/paddle.py](https://github.com/ultralytics/ultralytics/blob/main/ultralytics/nn/backends/paddle.py). Have an improvement or example to add? Open a [Pull Request](https://docs.ultralytics.com/help/contributing/) — thank you! 🙏

<br>

## ::: ultralytics.nn.backends.paddle.PaddleBackend

<br><br>
docs/en/reference/nn/backends/pytorch.md (new file, +20)
@@ -0,0 +1,20 @@
---
description: Explore PyTorchBackend and TorchScriptBackend for native PyTorch and TorchScript model inference in Ultralytics YOLO models.
keywords: Ultralytics, PyTorchBackend, TorchScriptBackend, PyTorch inference, TorchScript inference, .pt models, deep learning, YOLO
---

# Reference for `ultralytics/nn/backends/pytorch.py`

!!! success "Improvements"

    This page is sourced from [https://github.com/ultralytics/ultralytics/blob/main/ultralytics/nn/backends/pytorch.py](https://github.com/ultralytics/ultralytics/blob/main/ultralytics/nn/backends/pytorch.py). Have an improvement or example to add? Open a [Pull Request](https://docs.ultralytics.com/help/contributing/) — thank you! 🙏

<br>

## ::: ultralytics.nn.backends.pytorch.PyTorchBackend

<br><br><hr><br>

## ::: ultralytics.nn.backends.pytorch.TorchScriptBackend

<br><br>
docs/en/reference/nn/backends/rknn.md (new file, +16)
@@ -0,0 +1,16 @@
---
description: Explore RKNNBackend for Rockchip RKNN inference, enabling optimized YOLO deployment on Rockchip NPU-equipped edge devices.
keywords: Ultralytics, RKNNBackend, RKNN inference, Rockchip RKNN, NPU inference, edge AI, embedded deployment, deep learning
---

# Reference for `ultralytics/nn/backends/rknn.py`

!!! success "Improvements"

    This page is sourced from [https://github.com/ultralytics/ultralytics/blob/main/ultralytics/nn/backends/rknn.py](https://github.com/ultralytics/ultralytics/blob/main/ultralytics/nn/backends/rknn.py). Have an improvement or example to add? Open a [Pull Request](https://docs.ultralytics.com/help/contributing/) — thank you! 🙏

<br>

## ::: ultralytics.nn.backends.rknn.RKNNBackend

<br><br>
docs/en/reference/nn/backends/tensorflow.md (new file, +16)
@@ -0,0 +1,16 @@
---
description: Explore TensorFlowBackend for Google TensorFlow inference including SavedModel, GraphDef, TFLite, and Edge TPU formats.
keywords: Ultralytics, TensorFlowBackend, Google TensorFlow, TFLite, Edge TPU, SavedModel, GraphDef, deep learning, model deployment
---

# Reference for `ultralytics/nn/backends/tensorflow.py`

!!! success "Improvements"

    This page is sourced from [https://github.com/ultralytics/ultralytics/blob/main/ultralytics/nn/backends/tensorflow.py](https://github.com/ultralytics/ultralytics/blob/main/ultralytics/nn/backends/tensorflow.py). Have an improvement or example to add? Open a [Pull Request](https://docs.ultralytics.com/help/contributing/) — thank you! 🙏

<br>

## ::: ultralytics.nn.backends.tensorflow.TensorFlowBackend

<br><br>
docs/en/reference/nn/backends/tensorrt.md (new file, +16)
@@ -0,0 +1,16 @@
---
description: Explore TensorRTBackend for high-performance GPU inference with NVIDIA TensorRT, optimizing YOLO models for production deployment.
keywords: Ultralytics, TensorRTBackend, TensorRT inference, NVIDIA TensorRT, GPU inference, .engine models, production deployment, deep learning
---

# Reference for `ultralytics/nn/backends/tensorrt.py`

!!! success "Improvements"

    This page is sourced from [https://github.com/ultralytics/ultralytics/blob/main/ultralytics/nn/backends/tensorrt.py](https://github.com/ultralytics/ultralytics/blob/main/ultralytics/nn/backends/tensorrt.py). Have an improvement or example to add? Open a [Pull Request](https://docs.ultralytics.com/help/contributing/) — thank you! 🙏

<br>

## ::: ultralytics.nn.backends.tensorrt.TensorRTBackend

<br><br>
docs/en/reference/nn/backends/triton.md (new file, +16)
@@ -0,0 +1,16 @@
---
description: Explore TritonBackend for NVIDIA Triton Inference Server, enabling scalable cloud and edge deployment of YOLO models.
keywords: Ultralytics, TritonBackend, Triton Inference Server, NVIDIA Triton, cloud inference, model serving, scalable deployment
---

# Reference for `ultralytics/nn/backends/triton.py`

!!! success "Improvements"

    This page is sourced from [https://github.com/ultralytics/ultralytics/blob/main/ultralytics/nn/backends/triton.py](https://github.com/ultralytics/ultralytics/blob/main/ultralytics/nn/backends/triton.py). Have an improvement or example to add? Open a [Pull Request](https://docs.ultralytics.com/help/contributing/) — thank you! 🙏

<br>

## ::: ultralytics.nn.backends.triton.TritonBackend

<br><br>
mkdocs.yml (+15)
@@ -745,6 +745,21 @@ nav:
             - val: reference/models/yolo/yoloe/val.md
     - nn:
         - autobackend: reference/nn/autobackend.md
+        - backends:
+            - axelera: reference/nn/backends/axelera.md
+            - base: reference/nn/backends/base.md
+            - coreml: reference/nn/backends/coreml.md
+            - executorch: reference/nn/backends/executorch.md
+            - mnn: reference/nn/backends/mnn.md
+            - ncnn: reference/nn/backends/ncnn.md
+            - onnx: reference/nn/backends/onnx.md
+            - openvino: reference/nn/backends/openvino.md
+            - paddle: reference/nn/backends/paddle.md
+            - pytorch: reference/nn/backends/pytorch.md
+            - rknn: reference/nn/backends/rknn.md
+            - tensorflow: reference/nn/backends/tensorflow.md
+            - tensorrt: reference/nn/backends/tensorrt.md
+            - triton: reference/nn/backends/triton.md
         - modules:
             - activation: reference/nn/modules/activation.md
             - block: reference/nn/modules/block.md
@@ -1,6 +1,6 @@
 # Ultralytics 🚀 AGPL-3.0 License - https://ultralytics.com/license

-__version__ = "8.4.22"
+__version__ = "8.4.23"

 import importlib
 import os
@@ -195,7 +195,7 @@ class BasePredictor:
             self.imgsz,
             auto=same_shapes
             and self.args.rect
-            and (self.model.pt or (getattr(self.model, "dynamic", False) and not self.model.imx)),
+            and (self.model.format == "pt" or (getattr(self.model, "dynamic", False) and self.model.format != "imx")),
             stride=self.model.stride,
         )
         return [letterbox(image=x) for x in im]
@@ -258,7 +258,7 @@ class BasePredictor:
             batch=self.args.batch,
             vid_stride=self.args.vid_stride,
             buffer=self.args.stream_buffer,
-            channels=getattr(self.model, "ch", 3),
+            channels=getattr(self.model, "channels", 3),
         )
         self.source_type = self.dataset.source_type
         if (
@@ -305,7 +305,11 @@ class BasePredictor:
         # Warmup model
         if not self.done_warmup:
             self.model.warmup(
-                imgsz=(1 if self.model.pt or self.model.triton else self.dataset.bs, self.model.ch, *self.imgsz)
+                imgsz=(
+                    1 if self.model.format in {"pt", "triton"} else self.dataset.bs,
+                    self.model.channels,
+                    *self.imgsz,
+                )
             )
             self.done_warmup = True

@@ -372,7 +376,7 @@ class BasePredictor:
         t = tuple(x.t / self.seen * 1e3 for x in profilers)  # speeds per image
         LOGGER.info(
             f"Speed: %.1fms preprocess, %.1fms inference, %.1fms postprocess per image at shape "
-            f"{(min(self.args.batch, self.seen), getattr(self.model, 'ch', 3), *im.shape[2:])}" % t
+            f"{(min(self.args.batch, self.seen), getattr(self.model, 'channels', 3), *im.shape[2:])}" % t
         )
         if self.args.save or self.args.save_txt or self.args.save_crop:
             nl = len(list(self.save_dir.glob("labels/*.txt")))  # number of labels
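The warmup change above replaces boolean backend flags (`model.pt`, `model.triton`) with a single `model.format` string. A standalone sketch of that shape-selection logic (function name and signature are illustrative, not from the PR):

```python
def warmup_shape(fmt: str, batch_size: int, channels: int, imgsz: tuple) -> tuple:
    """Return the (batch, channels, height, width) shape used to warm up a model.

    PyTorch and Triton backends warm up with a single image; exported models
    warm up with the dataset batch size, mirroring the refactored condition.
    """
    b = 1 if fmt in {"pt", "triton"} else batch_size
    return (b, channels, *imgsz)


print(warmup_shape("pt", 8, 3, (640, 640)))    # (1, 3, 640, 640)
print(warmup_shape("onnx", 8, 3, (640, 640)))  # (8, 3, 640, 640)
```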
@@ -172,9 +172,10 @@ class BaseValidator:
         )
         self.device = model.device  # update device
         self.args.half = model.fp16  # update half
-        stride, pt, jit = model.stride, model.pt, model.jit
+        stride, fmt = model.stride, model.format
+        pt = fmt == "pt"
         imgsz = check_imgsz(self.args.imgsz, stride=stride)
-        if not (pt or jit or getattr(model, "dynamic", False)):
+        if fmt not in {"pt", "torchscript"} and not getattr(model, "dynamic", False):
             self.args.batch = model.metadata.get("batch", 1)  # export.py models default to batch-size 1
             LOGGER.info(f"Setting batch={self.args.batch} input of shape ({self.args.batch}, 3, {imgsz}, {imgsz})")

@@ -187,7 +188,7 @@ class BaseValidator:

         if self.device.type in {"cpu", "mps"}:
             self.args.workers = 0  # faster CPU val as time dominated by inference, not dataloading
-        if not (pt or (getattr(model, "dynamic", False) and not model.imx)):
+        if not (pt or (getattr(model, "dynamic", False) and fmt != "imx")):
             self.args.rect = False
         self.stride = model.stride  # used in get_dataloader() for padding
         self.dataloader = self.dataloader or self.get_dataloader(self.data.get(self.args.split), self.args.batch)
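The validator hunk above rewrites `not (pt or jit or dynamic)` as a format-string membership check. A minimal standalone sketch of the new predicate (the helper name is illustrative):

```python
def forces_fixed_batch(fmt: str, dynamic: bool) -> bool:
    """True when validation must use the batch size baked in at export time.

    PyTorch ("pt") and TorchScript ("torchscript") models, and any model
    exported with dynamic axes, can accept an arbitrary batch size; all
    other exported formats are fixed-shape.
    """
    return fmt not in {"pt", "torchscript"} and not dynamic


print(forces_fixed_batch("onnx", dynamic=False))  # True
print(forces_fixed_batch("pt", dynamic=False))    # False
```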
@@ -462,8 +462,7 @@ class Predictor(BasePredictor):
         self.std = torch.tensor([58.395, 57.12, 57.375]).view(-1, 1, 1).to(device)

         # Ultralytics compatibility settings
-        self.model.pt = False
-        self.model.triton = False
+        self.model.format = "sam"
         self.model.stride = 32
         self.model.fp16 = self.args.half
         self.done_warmup = True
@@ -59,7 +59,7 @@ class ClassificationPredictor(BasePredictor):
             else False
         )
         self.transforms = (
-            classify_transforms(self.imgsz) if updated or not self.model.pt else self.model.model.transforms
+            classify_transforms(self.imgsz) if updated or self.model.format != "pt" else self.model.model.transforms
         )

     def preprocess(self, img):
@@ -63,7 +63,7 @@ class WorldTrainerFromScratch(WorldTrainer):
         Args:
             cfg (dict): Configuration dictionary with default parameters for model training.
             overrides (dict, optional): Dictionary of parameter overrides to customize the configuration.
-            _callbacks (dict, optional): Dictionary of callback functions to be executed during different stages of training.
+            _callbacks (dict, optional): Dictionary of callback functions to run during different stages of training.
         """
         if overrides is None:
             overrides = {}
(File diff suppressed because it is too large.)
ultralytics/nn/backends/__init__.py (new file, +41)
@@ -0,0 +1,41 @@
# Ultralytics 🚀 AGPL-3.0 License - https://ultralytics.com/license
"""Ultralytics YOLO inference backends.

This package provides modular inference backends for various deep learning frameworks and hardware accelerators.
Each backend implements the `BaseBackend` interface and can be used independently or through the unified
`AutoBackend` dispatcher for automatic format detection and inference routing.
"""

from .axelera import AxeleraBackend
from .base import BaseBackend
from .coreml import CoreMLBackend
from .executorch import ExecuTorchBackend
from .mnn import MNNBackend
from .ncnn import NCNNBackend
from .onnx import ONNXBackend, ONNXIMXBackend
from .openvino import OpenVINOBackend
from .paddle import PaddleBackend
from .pytorch import PyTorchBackend, TorchScriptBackend
from .rknn import RKNNBackend
from .tensorflow import TensorFlowBackend
from .tensorrt import TensorRTBackend
from .triton import TritonBackend

__all__ = [
    "AxeleraBackend",
    "BaseBackend",
    "CoreMLBackend",
    "ExecuTorchBackend",
    "MNNBackend",
    "NCNNBackend",
    "ONNXBackend",
    "ONNXIMXBackend",
    "OpenVINOBackend",
    "PaddleBackend",
    "PyTorchBackend",
    "RKNNBackend",
    "TensorFlowBackend",
    "TensorRTBackend",
    "TorchScriptBackend",
    "TritonBackend",
]
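The package docstring says `AutoBackend` dispatches to these classes by detected format. A hypothetical suffix-to-backend routing table illustrating that kind of dispatch; the mapping and function are assumptions for illustration, not the actual `AutoBackend` logic:

```python
from pathlib import Path

# Illustrative mapping from weight-file suffix to backend class name; the
# class names come from this package's __all__, the suffixes are the usual
# Ultralytics export extensions.
SUFFIX_TO_BACKEND = {
    ".pt": "PyTorchBackend",
    ".torchscript": "TorchScriptBackend",
    ".onnx": "ONNXBackend",
    ".engine": "TensorRTBackend",
    ".mlpackage": "CoreMLBackend",
    ".pte": "ExecuTorchBackend",
    ".mnn": "MNNBackend",
}


def pick_backend(weight: str) -> str:
    """Return the backend class name for a weights path, by file suffix."""
    suffix = Path(weight).suffix.lower()
    try:
        return SUFFIX_TO_BACKEND[suffix]
    except KeyError:
        raise ValueError(f"Unsupported model format: {suffix!r}")


print(pick_backend("yolo11n.onnx"))  # ONNXBackend
```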
ultralytics/nn/backends/axelera.py (new file, +69)
@@ -0,0 +1,69 @@
# Ultralytics 🚀 AGPL-3.0 License - https://ultralytics.com/license

from __future__ import annotations

import os
from pathlib import Path

import torch

from ultralytics.utils import LOGGER
from ultralytics.utils.checks import check_requirements

from .base import BaseBackend


class AxeleraBackend(BaseBackend):
    """Axelera AI inference backend for Axelera Metis AI accelerators.

    Loads compiled Axelera models (.axm files) and runs inference using the Axelera AI runtime SDK. Requires the Axelera
    runtime environment to be activated before use.
    """

    def load_model(self, weight: str | Path) -> None:
        """Load an Axelera model from a directory containing a .axm file.

        Args:
            weight (str | Path): Path to the Axelera model directory containing the .axm binary.
        """
        if not os.environ.get("AXELERA_RUNTIME_DIR"):
            LOGGER.warning(
                "Axelera runtime environment is not activated.\n"
                "Please run: source /opt/axelera/sdk/latest/axelera_activate.sh\n\n"
                "If this fails, verify driver installation: "
                "https://docs.ultralytics.com/integrations/axelera/#axelera-driver-installation"
            )

        try:
            from axelera.runtime import op
        except ImportError:
            check_requirements(
                "axelera_runtime2==0.1.2",
                cmds="--extra-index-url https://software.axelera.ai/artifactory/axelera-runtime-pypi",
            )
            from axelera.runtime import op

        w = Path(weight)
        found = next(w.rglob("*.axm"), None)
        if found is None:
            raise FileNotFoundError(f"No .axm file found in: {w}")

        self.model = op.load(str(found))

        # Load metadata
        metadata_file = found.parent / "metadata.yaml"
        if metadata_file.exists():
            from ultralytics.utils import YAML

            self.apply_metadata(YAML.load(metadata_file))

    def forward(self, im: torch.Tensor) -> list:
        """Run inference on the Axelera hardware accelerator.

        Args:
            im (torch.Tensor): Input image tensor in BCHW format, normalized to [0, 1].

        Returns:
            (list): Model predictions as a list of output arrays.
        """
        return self.model(im.cpu())
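`AxeleraBackend.load_model` locates the `.axm` binary with a recursive glob and fails loudly when none exists. A standalone sketch of that discovery pattern, runnable without the Axelera SDK (function name is illustrative):

```python
import tempfile
from pathlib import Path


def find_model_binary(model_dir: str, pattern: str = "*.axm") -> Path:
    """Return the first file matching `pattern` under `model_dir`, recursively."""
    found = next(Path(model_dir).rglob(pattern), None)
    if found is None:
        raise FileNotFoundError(f"No {pattern} file found in: {model_dir}")
    return found


with tempfile.TemporaryDirectory() as d:
    # Simulate a model package with the binary in a nested directory
    (Path(d) / "nested").mkdir()
    (Path(d) / "nested" / "model.axm").touch()
    print(find_model_binary(d).name)  # model.axm
```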
ultralytics/nn/backends/base.py (new file, +104)
@@ -0,0 +1,104 @@
# Ultralytics 🚀 AGPL-3.0 License - https://ultralytics.com/license

from __future__ import annotations

import ast
from abc import ABC, abstractmethod

import torch


class BaseBackend(ABC):
    """Base class for all inference backends.

    This abstract class defines the interface that all inference backends must implement. It provides common
    functionality for model loading, metadata processing, and device management.

    Attributes:
        model: The underlying inference model or runtime session.
        device (torch.device): The device to run inference on.
        fp16 (bool): Whether to use FP16 (half-precision) inference.
        nhwc (bool): Whether the model expects NHWC input format instead of NCHW.
        stride (int): Model stride, typically 32 for YOLO models.
        names (dict): Dictionary mapping class indices to class names.
        task (str | None): The task type (detect, segment, classify, pose, obb).
        batch (int): Batch size for inference.
        imgsz (tuple): Input image size as (height, width).
        channels (int): Number of input channels, typically 3 for RGB.
        end2end (bool): Whether the model includes end-to-end NMS post-processing.
        dynamic (bool): Whether the model supports dynamic input shapes.
        metadata (dict): Model metadata dictionary containing export configuration.
    """

    def __init__(self, weight: str | torch.nn.Module, device: torch.device | str, fp16: bool = False):
        """Initialize the base backend with common attributes and load the model.

        Args:
            weight (str | torch.nn.Module): Path to the model weights file or a PyTorch module instance.
            device (torch.device | str): Device to run inference on (e.g., 'cpu', 'cuda:0').
            fp16 (bool): Whether to use FP16 half-precision inference.
        """
        self.device = device
        self.fp16 = fp16
        self.nhwc = False
        self.stride = 32
        self.names = {}
        self.task = None
        self.batch = 1
        self.channels = 3
        self.end2end = False
        self.dynamic = False
        self.metadata = {}
        self.model = None
        self.load_model(weight)

    @abstractmethod
    def load_model(self, weight: str | torch.nn.Module) -> None:
        """Load the model from a weights file or module instance.

        Args:
            weight (str | torch.nn.Module): Path to model weights or a PyTorch module.
        """
        raise NotImplementedError

    @abstractmethod
    def forward(self, im: torch.Tensor) -> torch.Tensor | list[torch.Tensor]:
        """Run inference on the input image tensor.

        Args:
            im (torch.Tensor): Input image tensor in BCHW format, normalized to [0, 1].

        Returns:
            (torch.Tensor | list[torch.Tensor]): Model output as a single tensor or list of tensors.
        """
        raise NotImplementedError

    def apply_metadata(self, metadata: dict | None) -> None:
        """Process and apply model metadata to backend attributes.

        Handles type conversions for common metadata fields (e.g., stride, batch, names) and sets them as
        instance attributes. Also resolves end-to-end NMS and dynamic shape settings from export args.

        Args:
            metadata (dict | None): Dictionary containing metadata key-value pairs from model export.
        """
        if not metadata:
            return

        # Store raw metadata
        self.metadata = metadata

        # Process type conversions for known fields
        for k, v in metadata.items():
            if k in {"stride", "batch", "channels"}:
                metadata[k] = int(v)
            elif k in {"imgsz", "names", "kpt_shape", "kpt_names", "args", "end2end"} and isinstance(v, str):
                metadata[k] = ast.literal_eval(v)

        # Handle models exported with end-to-end NMS
        metadata["end2end"] = metadata.get("end2end", False) or metadata.get("args", {}).get("nms", False)
        metadata["dynamic"] = metadata.get("args", {}).get("dynamic", self.dynamic)

        # Apply all metadata fields as backend attributes
        for k, v in metadata.items():
            setattr(self, k, v)
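The normalization in `apply_metadata` can be exercised standalone: integer fields are cast, stringified Python literals (as written by exporters into metadata files) are parsed with `ast.literal_eval`, and `end2end`/`dynamic` are resolved from the export args. A self-contained sketch of that logic (the free function is illustrative; the class method sets attributes instead of returning a dict):

```python
import ast


def normalize_metadata(metadata: dict) -> dict:
    """Normalize export metadata the way BaseBackend.apply_metadata does."""
    metadata = dict(metadata)
    for k, v in metadata.items():
        if k in {"stride", "batch", "channels"}:
            metadata[k] = int(v)  # numeric fields may arrive as strings
        elif k in {"imgsz", "names", "kpt_shape", "kpt_names", "args", "end2end"} and isinstance(v, str):
            metadata[k] = ast.literal_eval(v)  # parse stringified Python literals
    # end2end is true if stored directly or if the model was exported with NMS
    metadata["end2end"] = metadata.get("end2end", False) or metadata.get("args", {}).get("nms", False)
    metadata["dynamic"] = metadata.get("args", {}).get("dynamic", False)
    return metadata


meta = normalize_metadata({"stride": "32", "names": "{0: 'person'}", "args": "{'nms': True}"})
print(meta["stride"], meta["names"][0], meta["end2end"])  # 32 person True
```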
ultralytics/nn/backends/coreml.py (new file, +64)
@@ -0,0 +1,64 @@
# Ultralytics 🚀 AGPL-3.0 License - https://ultralytics.com/license

from __future__ import annotations

from pathlib import Path

import numpy as np
import torch
from PIL import Image

from ultralytics.utils import LOGGER
from ultralytics.utils.checks import check_requirements

from .base import BaseBackend


class CoreMLBackend(BaseBackend):
    """CoreML inference backend for Apple hardware.

    Loads and runs inference with CoreML models (.mlpackage files) using the coremltools library. Supports both static
    and dynamic input shapes and handles NMS-included model outputs.
    """

    def load_model(self, weight: str | Path) -> None:
        """Load a CoreML model from a .mlpackage file.

        Args:
            weight (str | Path): Path to the .mlpackage model file.
        """
        check_requirements(["coremltools>=9.0", "numpy>=1.14.5,<=2.3.5"])
        import coremltools as ct

        LOGGER.info(f"Loading {weight} for CoreML inference...")
        self.model = ct.models.MLModel(weight)
        self.dynamic = self.model.get_spec().description.input[0].type.HasField("multiArrayType")

        # Load metadata
        self.apply_metadata(dict(self.model.user_defined_metadata))

    def forward(self, im: torch.Tensor) -> np.ndarray | list[np.ndarray]:
        """Run CoreML inference with automatic input format handling.

        Args:
            im (torch.Tensor): Input image tensor in BHWC format (converted from BCHW by AutoBackend).

        Returns:
            (np.ndarray | list[np.ndarray]): Model predictions as numpy array(s).
        """
        im = im.cpu().numpy()
        h, w = im.shape[1:3]

        im = im.transpose(0, 3, 1, 2) if self.dynamic else Image.fromarray((im[0] * 255).astype("uint8"))
        y = self.model.predict({"image": im})
        if "confidence" in y:  # NMS included
            from ultralytics.utils.ops import xywh2xyxy

            box = xywh2xyxy(y["coordinates"] * [[w, h, w, h]])
            cls = y["confidence"].argmax(1, keepdims=True)
            y = np.concatenate((box, np.take_along_axis(y["confidence"], cls, axis=1), cls), 1)[None]
        else:
            y = list(y.values())
            if len(y) == 2 and len(y[1].shape) != 4:  # segmentation model
                y = list(reversed(y))
        return y
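The NMS-included branch of `CoreMLBackend.forward` scales normalized xywh boxes to pixels, converts them to xyxy, and appends the best class score and index. A standalone sketch of that decoding, with `xywh2xyxy` re-implemented inline for self-containment (the helper names are illustrative):

```python
import numpy as np


def xywh2xyxy(x: np.ndarray) -> np.ndarray:
    """Convert (center-x, center-y, width, height) boxes to (x1, y1, x2, y2)."""
    y = x.copy()
    y[..., 0] = x[..., 0] - x[..., 2] / 2  # x1
    y[..., 1] = x[..., 1] - x[..., 3] / 2  # y1
    y[..., 2] = x[..., 0] + x[..., 2] / 2  # x2
    y[..., 3] = x[..., 1] + x[..., 3] / 2  # y2
    return y


def decode_nms_output(coordinates: np.ndarray, confidence: np.ndarray, w: int, h: int) -> np.ndarray:
    """Decode CoreML NMS outputs into (1, n, 6) arrays of [x1, y1, x2, y2, score, cls]."""
    box = xywh2xyxy(coordinates * [[w, h, w, h]])  # scale normalized xywh to pixel xyxy
    cls = confidence.argmax(1, keepdims=True)       # best class index per box
    return np.concatenate((box, np.take_along_axis(confidence, cls, axis=1), cls), 1)[None]


coords = np.array([[0.5, 0.5, 0.2, 0.4]])  # one centered box, normalized xywh
conf = np.array([[0.1, 0.8]])              # two class scores
out = decode_nms_output(coords, conf, w=640, h=640)
print(out.shape)  # (1, 1, 6)
```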
59 ultralytics/nn/backends/executorch.py Normal file

@@ -0,0 +1,59 @@
# Ultralytics 🚀 AGPL-3.0 License - https://ultralytics.com/license

from __future__ import annotations

from pathlib import Path

import torch

from ultralytics.utils import LOGGER
from ultralytics.utils.checks import check_executorch_requirements

from .base import BaseBackend


class ExecuTorchBackend(BaseBackend):
    """Meta ExecuTorch inference backend for on-device deployment.

    Loads and runs inference with Meta ExecuTorch models (.pte files) using the ExecuTorch runtime. Supports both
    standalone .pte files and directory-based model packages with metadata.
    """

    def load_model(self, weight: str | Path) -> None:
        """Load an ExecuTorch model from a .pte file or directory.

        Args:
            weight (str | Path): Path to the .pte model file or directory containing the model.
        """
        LOGGER.info(f"Loading {weight} for ExecuTorch inference...")
        check_executorch_requirements()

        from executorch.runtime import Runtime

        w = Path(weight)
        if w.is_dir():
            model_file = next(w.rglob("*.pte"))
            metadata_file = w / "metadata.yaml"
        else:
            model_file = w
            metadata_file = w.parent / "metadata.yaml"

        program = Runtime.get().load_program(str(model_file))
        self.model = program.load_method("forward")

        # Load metadata
        if metadata_file.exists():
            from ultralytics.utils import YAML

            self.apply_metadata(YAML.load(metadata_file))

    def forward(self, im: torch.Tensor) -> list:
        """Run inference using the ExecuTorch runtime.

        Args:
            im (torch.Tensor): Input image tensor in BCHW format, normalized to [0, 1].

        Returns:
            (list): Model predictions as a list of ExecuTorch output values.
        """
        return self.model.execute([im])
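The ExecuTorch loader accepts either a bare .pte file or a package directory containing the .pte plus a metadata.yaml. A sketch of just that path-resolution logic; the package and file names below are hypothetical:

```python
import tempfile
from pathlib import Path


def resolve_pte(weight: str) -> tuple[Path, Path]:
    """Mirror load_model's path handling: accept a .pte file or a package directory."""
    w = Path(weight)
    if w.is_dir():
        return next(w.rglob("*.pte")), w / "metadata.yaml"
    return w, w.parent / "metadata.yaml"


with tempfile.TemporaryDirectory() as d:
    pkg = Path(d) / "yolo11n_executorch_model"  # hypothetical package layout
    pkg.mkdir()
    (pkg / "model.pte").touch()
    (pkg / "metadata.yaml").touch()
    model_file, metadata_file = resolve_pte(str(pkg))
    print(model_file.name, metadata_file.exists())  # model.pte True
```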
59 ultralytics/nn/backends/mnn.py Normal file

@@ -0,0 +1,59 @@
# Ultralytics 🚀 AGPL-3.0 License - https://ultralytics.com/license

from __future__ import annotations

import json
import os
from pathlib import Path

import torch

from ultralytics.utils import LOGGER
from ultralytics.utils.checks import check_requirements

from .base import BaseBackend


class MNNBackend(BaseBackend):
    """MNN (Mobile Neural Network) inference backend.

    Loads and runs inference with MNN models (.mnn files) using the Alibaba MNN framework. Optimized for mobile and edge
    deployment with configurable thread count and precision.
    """

    def load_model(self, weight: str | Path) -> None:
        """Load an Alibaba MNN model from a .mnn file.

        Args:
            weight (str | Path): Path to the .mnn model file.
        """
        LOGGER.info(f"Loading {weight} for MNN inference...")
        check_requirements("MNN")
        import MNN

        config = {"precision": "low", "backend": "CPU", "numThread": (os.cpu_count() + 1) // 2}
        rt = MNN.nn.create_runtime_manager((config,))
        self.net = MNN.nn.load_module_from_file(weight, [], [], runtime_manager=rt, rearrange=True)
        self.expr = MNN.expr

        # Load metadata from bizCode
        info = self.net.get_info()
        if "bizCode" in info:
            try:
                self.apply_metadata(json.loads(info["bizCode"]))
            except json.JSONDecodeError:
                pass

    def forward(self, im: torch.Tensor) -> list:
        """Run inference using the MNN runtime.

        Args:
            im (torch.Tensor): Input image tensor in BCHW format, normalized to [0, 1].

        Returns:
            (list): Model predictions as a list of numpy arrays.
        """
        input_var = self.expr.const(im.data_ptr(), im.shape)
        output_var = self.net.onForward([input_var])
        # NOTE: need this copy(), or it'd get incorrect results on ARM devices
        return [x.read().copy() for x in output_var]
72 ultralytics/nn/backends/ncnn.py Normal file

@@ -0,0 +1,72 @@
# Ultralytics 🚀 AGPL-3.0 License - https://ultralytics.com/license

from __future__ import annotations

from pathlib import Path

import numpy as np
import torch

from ultralytics.utils import LOGGER
from ultralytics.utils.checks import check_requirements

from .base import BaseBackend


class NCNNBackend(BaseBackend):
    """Tencent NCNN inference backend for mobile and embedded deployment.

    Loads and runs inference with Tencent NCNN models (*_ncnn_model/ directories). Optimized for mobile platforms with
    optional Vulkan GPU acceleration when available.
    """

    def load_model(self, weight: str | Path) -> None:
        """Load an NCNN model from a .param/.bin file pair or model directory.

        Args:
            weight (str | Path): Path to the .param file or directory containing NCNN model files.
        """
        LOGGER.info(f"Loading {weight} for NCNN inference...")
        check_requirements("ncnn", cmds="--no-deps")
        import ncnn as pyncnn

        self.pyncnn = pyncnn
        self.net = pyncnn.Net()

        # Setup Vulkan if available
        if isinstance(self.device, str) and self.device.startswith("vulkan"):
            self.net.opt.use_vulkan_compute = True
            self.net.set_vulkan_device(int(self.device.split(":")[1]))
            self.device = torch.device("cpu")
        else:
            self.net.opt.use_vulkan_compute = False

        w = Path(weight)
        if not w.is_file():
            w = next(w.glob("*.param"))

        self.net.load_param(str(w))
        self.net.load_model(str(w.with_suffix(".bin")))

        # Load metadata
        metadata_file = w.parent / "metadata.yaml"
        if metadata_file.exists():
            from ultralytics.utils import YAML

            self.apply_metadata(YAML.load(metadata_file))

    def forward(self, im: torch.Tensor) -> list[np.ndarray]:
        """Run inference using the NCNN runtime.

        Args:
            im (torch.Tensor): Input image tensor in BCHW format, normalized to [0, 1].

        Returns:
            (list[np.ndarray]): Model predictions as a list of numpy arrays, one per output layer.
        """
        mat_in = self.pyncnn.Mat(im[0].cpu().numpy())
        with self.net.create_extractor() as ex:
            ex.input(self.net.input_names()[0], mat_in)
            # Sort output names as temporary fix for pnnx issue
            y = [np.array(ex.extract(x)[1])[None] for x in sorted(self.net.output_names())]
        return y
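NCNN models ship as a .param/.bin pair; the loader finds the .param inside a *_ncnn_model/ directory and derives the .bin by swapping the suffix. A sketch of that pairing logic; the directory and file names below are hypothetical:

```python
import tempfile
from pathlib import Path


def resolve_ncnn(weight: str) -> tuple[Path, Path]:
    """Mirror load_model's file handling: accept a .param file or a model directory."""
    w = Path(weight)
    if not w.is_file():
        w = next(w.glob("*.param"))
    return w, w.with_suffix(".bin")


with tempfile.TemporaryDirectory() as d:
    mdir = Path(d) / "yolo11n_ncnn_model"  # hypothetical directory name
    mdir.mkdir()
    (mdir / "model.ncnn.param").touch()
    (mdir / "model.ncnn.bin").touch()
    param, binf = resolve_ncnn(str(mdir))
    print(param.name, binf.name)  # model.ncnn.param model.ncnn.bin
```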
196 ultralytics/nn/backends/onnx.py Normal file

@@ -0,0 +1,196 @@
# Ultralytics 🚀 AGPL-3.0 License - https://ultralytics.com/license

from __future__ import annotations

from pathlib import Path

import numpy as np
import torch

from ultralytics.utils import LOGGER
from ultralytics.utils.checks import check_requirements

from .base import BaseBackend


class ONNXBackend(BaseBackend):
    """Microsoft ONNX Runtime inference backend with optional OpenCV DNN support.

    Loads and runs inference with ONNX models (.onnx files) using either Microsoft ONNX Runtime with CUDA/CoreML
    execution providers, or OpenCV DNN for lightweight CPU inference. Supports IO binding for optimized GPU inference
    with static input shapes.
    """

    def __init__(self, weight: str | Path, device: torch.device, fp16: bool = False, format: str = "onnx"):
        """Initialize the ONNX backend.

        Args:
            weight (str | Path): Path to the .onnx model file.
            device (torch.device): Device to run inference on.
            fp16 (bool): Whether to use FP16 half-precision inference.
            format (str): Inference engine, either "onnx" for ONNX Runtime or "dnn" for OpenCV DNN.
        """
        assert format in {"onnx", "dnn"}, f"Unsupported ONNX format: {format}."
        self.format = format
        super().__init__(weight, device, fp16)

    def load_model(self, weight: str | Path) -> None:
        """Load an ONNX model using ONNX Runtime or OpenCV DNN.

        Args:
            weight (str | Path): Path to the .onnx model file.
        """
        cuda = isinstance(self.device, torch.device) and torch.cuda.is_available() and self.device.type != "cpu"

        if self.format == "dnn":
            # OpenCV DNN
            LOGGER.info(f"Loading {weight} for ONNX OpenCV DNN inference...")
            check_requirements("opencv-python>=4.5.4")
            import cv2

            self.net = cv2.dnn.readNetFromONNX(weight)
        else:
            # ONNX Runtime
            LOGGER.info(f"Loading {weight} for ONNX Runtime inference...")
            check_requirements(("onnx", "onnxruntime-gpu" if cuda else "onnxruntime"))
            import onnxruntime

            # Select execution provider
            available = onnxruntime.get_available_providers()
            if cuda and "CUDAExecutionProvider" in available:
                providers = [("CUDAExecutionProvider", {"device_id": self.device.index}), "CPUExecutionProvider"]
            elif self.device.type == "mps" and "CoreMLExecutionProvider" in available:
                providers = ["CoreMLExecutionProvider", "CPUExecutionProvider"]
            else:
                providers = ["CPUExecutionProvider"]
                if cuda:
                    LOGGER.warning("CUDA requested but CUDAExecutionProvider not available. Using CPU...")
                    self.device = torch.device("cpu")
                    cuda = False

            LOGGER.info(
                f"Using ONNX Runtime {onnxruntime.__version__} with "
                f"{providers[0] if isinstance(providers[0], str) else providers[0][0]}"
            )

            self.session = onnxruntime.InferenceSession(weight, providers=providers)
            self.output_names = [x.name for x in self.session.get_outputs()]

            # Get metadata
            metadata_map = self.session.get_modelmeta().custom_metadata_map
            if metadata_map:
                self.apply_metadata(dict(metadata_map))

            # Check if dynamic shapes
            self.dynamic = isinstance(self.session.get_outputs()[0].shape[0], str)
            self.fp16 = "float16" in self.session.get_inputs()[0].type

            # Setup IO binding for CUDA
            self.use_io_binding = not self.dynamic and cuda
            if self.use_io_binding:
                self.io = self.session.io_binding()
                self.bindings = []
                for output in self.session.get_outputs():
                    out_fp16 = "float16" in output.type
                    y_tensor = torch.empty(output.shape, dtype=torch.float16 if out_fp16 else torch.float32).to(
                        self.device
                    )
                    self.io.bind_output(
                        name=output.name,
                        device_type=self.device.type,
                        device_id=self.device.index if cuda else 0,
                        element_type=np.float16 if out_fp16 else np.float32,
                        shape=tuple(y_tensor.shape),
                        buffer_ptr=y_tensor.data_ptr(),
                    )
                    self.bindings.append(y_tensor)

    def forward(self, im: torch.Tensor) -> torch.Tensor | list[torch.Tensor] | np.ndarray:
        """Run ONNX inference using IO binding (CUDA) or standard session execution.

        Args:
            im (torch.Tensor): Input image tensor in BCHW format, normalized to [0, 1].

        Returns:
            (torch.Tensor | list[torch.Tensor] | np.ndarray): Model predictions as tensor(s) or numpy array(s).
        """
        if self.format == "dnn":
            # OpenCV DNN
            self.net.setInput(im.cpu().numpy())
            return self.net.forward()

        # ONNX Runtime
        if self.use_io_binding:
            if self.device.type == "cpu":
                im = im.cpu()
            self.io.bind_input(
                name="images",
                device_type=im.device.type,
                device_id=im.device.index if im.device.type == "cuda" else 0,
                element_type=np.float16 if self.fp16 else np.float32,
                shape=tuple(im.shape),
                buffer_ptr=im.data_ptr(),
            )
            self.session.run_with_iobinding(self.io)
            return self.bindings
        else:
            return self.session.run(self.output_names, {self.session.get_inputs()[0].name: im.cpu().numpy()})


class ONNXIMXBackend(ONNXBackend):
    """ONNX IMX inference backend for NXP i.MX processors.

    Extends `ONNXBackend` with support for quantized models targeting NXP i.MX edge devices. Uses MCT (Model Compression
    Toolkit) quantizers and custom NMS operations for optimized inference.
    """

    def load_model(self, weight: str | Path) -> None:
        """Load a quantized ONNX model from an IMX model directory.

        Args:
            weight (str | Path): Path to the IMX model directory containing the .onnx file.
        """
        check_requirements(("model-compression-toolkit>=2.4.1", "edge-mdt-cl<1.1.0", "onnxruntime-extensions"))
        check_requirements(("onnx", "onnxruntime"))
        import mct_quantizers as mctq
        import onnxruntime
        from edgemdt_cl.pytorch.nms import nms_ort  # noqa - register custom NMS ops

        w = Path(weight)
        onnx_file = next(w.glob("*.onnx"))
        LOGGER.info(f"Loading {onnx_file} for ONNX IMX inference...")

        session_options = mctq.get_ort_session_options()
        session_options.enable_mem_reuse = False

        self.session = onnxruntime.InferenceSession(onnx_file, session_options, providers=["CPUExecutionProvider"])
        self.output_names = [x.name for x in self.session.get_outputs()]
        self.dynamic = isinstance(self.session.get_outputs()[0].shape[0], str)
        self.fp16 = "float16" in self.session.get_inputs()[0].type
        metadata_map = self.session.get_modelmeta().custom_metadata_map
        if metadata_map:
            self.apply_metadata(dict(metadata_map))

    def forward(self, im: torch.Tensor) -> np.ndarray | list[np.ndarray] | tuple[np.ndarray, ...]:
        """Run IMX inference with task-specific output concatenation for detect, pose, and segment tasks.

        Args:
            im (torch.Tensor): Input image tensor in BCHW format, normalized to [0, 1].

        Returns:
            (np.ndarray | list[np.ndarray] | tuple[np.ndarray, ...]): Task-formatted model predictions.
        """
        y = self.session.run(self.output_names, {self.session.get_inputs()[0].name: im.cpu().numpy()})

        if self.task == "detect":
            # boxes, conf, cls
            return np.concatenate([y[0], y[1][:, :, None], y[2][:, :, None]], axis=-1)
        elif self.task == "pose":
            # boxes, conf, cls, kpts
            return np.concatenate([y[0], y[1][:, :, None], y[2][:, :, None], y[3]], axis=-1, dtype=y[0].dtype)
        elif self.task == "segment":
            return (
                np.concatenate([y[0], y[1][:, :, None], y[2][:, :, None], y[3]], axis=-1, dtype=y[0].dtype),
                y[4],
            )
        return y
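For the detect task, the IMX head returns separate boxes, scores, and class outputs that forward() fuses into the usual (batch, n, 6) layout by appending scores and classes as trailing channels. A numpy sketch of that concatenation with random stand-in outputs:

```python
import numpy as np

# Hypothetical IMX outputs for a batch of 1 with 3 detections
boxes = np.random.rand(1, 3, 4).astype(np.float32)  # xyxy boxes
conf = np.random.rand(1, 3).astype(np.float32)  # per-detection scores
cls = np.zeros((1, 3), dtype=np.float32)  # class indices

# Mirror the "detect" branch: append conf and cls as trailing channels
out = np.concatenate([boxes, conf[:, :, None], cls[:, :, None]], axis=-1)
print(out.shape)  # (1, 3, 6)
```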
105 ultralytics/nn/backends/openvino.py Normal file

@@ -0,0 +1,105 @@
# Ultralytics 🚀 AGPL-3.0 License - https://ultralytics.com/license

from __future__ import annotations

from pathlib import Path

import numpy as np
import torch

from ultralytics.utils import LOGGER
from ultralytics.utils.checks import check_requirements

from .base import BaseBackend


class OpenVINOBackend(BaseBackend):
    """Intel OpenVINO inference backend for Intel hardware acceleration.

    Loads and runs inference with Intel OpenVINO IR models (*_openvino_model/ directories). Supports automatic device
    selection, Intel-specific device targeting, and async inference for throughput optimization.
    """

    def load_model(self, weight: str | Path) -> None:
        """Load an Intel OpenVINO IR model from a .xml/.bin file pair or model directory.

        Args:
            weight (str | Path): Path to the .xml file or directory containing OpenVINO model files.
        """
        LOGGER.info(f"Loading {weight} for OpenVINO inference...")
        check_requirements("openvino>=2024.0.0")
        import openvino as ov

        core = ov.Core()
        device_name = "AUTO"

        if isinstance(self.device, str) and self.device.startswith("intel"):
            device_name = self.device.split(":")[1].upper()
            self.device = torch.device("cpu")
            if device_name not in core.available_devices:
                LOGGER.warning(f"OpenVINO device '{device_name}' not available. Using 'AUTO' instead.")
                device_name = "AUTO"

        w = Path(weight)
        if not w.is_file():
            w = next(w.glob("*.xml"))

        ov_model = core.read_model(model=str(w), weights=w.with_suffix(".bin"))
        if ov_model.get_parameters()[0].get_layout().empty:
            ov_model.get_parameters()[0].set_layout(ov.Layout("NCHW"))

        # Load metadata
        metadata_file = w.parent / "metadata.yaml"
        if metadata_file.exists():
            from ultralytics.utils import YAML

            self.apply_metadata(YAML.load(metadata_file))

        # Set inference mode
        self.inference_mode = "CUMULATIVE_THROUGHPUT" if self.dynamic and self.batch > 1 else "LATENCY"

        self.ov_compiled_model = core.compile_model(
            ov_model,
            device_name=device_name,
            config={"PERFORMANCE_HINT": self.inference_mode},
        )
        LOGGER.info(
            f"Using OpenVINO {self.inference_mode} mode for batch={self.batch} inference on "
            f"{', '.join(self.ov_compiled_model.get_property('EXECUTION_DEVICES'))}..."
        )
        self.input_name = self.ov_compiled_model.input().get_any_name()
        self.ov = ov

    def forward(self, im: torch.Tensor) -> list[np.ndarray]:
        """Run Intel OpenVINO inference with sync or async execution based on inference mode.

        Args:
            im (torch.Tensor): Input image tensor in BCHW format, normalized to [0, 1].

        Returns:
            (list[np.ndarray]): Model predictions as a list of numpy arrays, one per output layer.
        """
        im = im.cpu().numpy().astype(np.float32)

        if self.inference_mode in {"THROUGHPUT", "CUMULATIVE_THROUGHPUT"}:
            # Async inference for larger batch sizes
            n = im.shape[0]
            results = [None] * n

            def callback(request, userdata):
                """Store async inference result in the preallocated results list at the given index."""
                results[userdata] = request.results

            async_queue = self.ov.AsyncInferQueue(self.ov_compiled_model)
            async_queue.set_callback(callback)

            for i in range(n):
                async_queue.start_async(inputs={self.input_name: im[i : i + 1]}, userdata=i)
            async_queue.wait_all()

            y = [list(r.values()) for r in results]
            y = [np.concatenate(x) for x in zip(*y)]
        else:
            # Sync inference for LATENCY mode
            y = list(self.ov_compiled_model(im).values())
        return y
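In CUMULATIVE_THROUGHPUT mode, forward() collects one result set per image and then regroups them per output before stacking along the batch axis. A small numpy sketch of that `zip` + `concatenate` regrouping, using made-up output shapes:

```python
import numpy as np

# Hypothetical per-image async results: each request returns two output arrays
results = [
    [np.ones((1, 84, 10)), np.ones((1, 32))],  # image 0
    [np.zeros((1, 84, 10)), np.zeros((1, 32))],  # image 1
]

# Mirror the async branch: regroup per-output, then stack along the batch axis
y = [np.concatenate(x) for x in zip(*results)]
print([a.shape for a in y])  # [(2, 84, 10), (2, 32)]
```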
79 ultralytics/nn/backends/paddle.py Normal file

@@ -0,0 +1,79 @@
# Ultralytics 🚀 AGPL-3.0 License - https://ultralytics.com/license

from __future__ import annotations

from pathlib import Path

import numpy as np
import torch

from ultralytics.utils import ARM64, LOGGER
from ultralytics.utils.checks import check_requirements

from .base import BaseBackend


class PaddleBackend(BaseBackend):
    """Baidu PaddlePaddle inference backend.

    Loads and runs inference with Baidu PaddlePaddle models (*_paddle_model/ directories). Supports both CPU and GPU
    execution with automatic device configuration and memory pool initialization.
    """

    def load_model(self, weight: str | Path) -> None:
        """Load a Baidu PaddlePaddle model from a directory containing .json and .pdiparams files.

        Args:
            weight (str | Path): Path to the model directory or .pdiparams file.
        """
        cuda = isinstance(self.device, torch.device) and torch.cuda.is_available() and self.device.type != "cpu"
        LOGGER.info(f"Loading {weight} for PaddlePaddle inference...")
        if cuda:
            check_requirements("paddlepaddle-gpu>=3.0.0,!=3.3.0")
        elif ARM64:
            check_requirements("paddlepaddle==3.0.0")
        else:
            check_requirements("paddlepaddle>=3.0.0,!=3.3.0")

        import paddle.inference as pdi

        w = Path(weight)
        model_file, params_file = None, None

        if w.is_dir():
            model_file = next(w.rglob("*.json"), None)
            params_file = next(w.rglob("*.pdiparams"), None)
        elif w.suffix == ".pdiparams":
            model_file = w.with_name("model.json")
            params_file = w

        if not (model_file and params_file and model_file.is_file() and params_file.is_file()):
            raise FileNotFoundError(f"Paddle model not found in {w}. Both .json and .pdiparams files are required.")

        config = pdi.Config(str(model_file), str(params_file))
        if cuda:
            config.enable_use_gpu(memory_pool_init_size_mb=2048, device_id=self.device.index or 0)

        self.predictor = pdi.create_predictor(config)
        self.input_handle = self.predictor.get_input_handle(self.predictor.get_input_names()[0])
        self.output_names = self.predictor.get_output_names()

        # Load metadata
        metadata_file = (w if w.is_dir() else w.parent) / "metadata.yaml"
        if metadata_file.exists():
            from ultralytics.utils import YAML

            self.apply_metadata(YAML.load(metadata_file))

    def forward(self, im: torch.Tensor) -> list[np.ndarray]:
        """Run Baidu PaddlePaddle inference.

        Args:
            im (torch.Tensor): Input image tensor in BCHW format, normalized to [0, 1].

        Returns:
            (list[np.ndarray]): Model predictions as a list of numpy arrays, one per output handle.
        """
        self.input_handle.copy_from_cpu(im.cpu().numpy().astype(np.float32))
        self.predictor.run()
        return [self.predictor.get_output_handle(x).copy_to_cpu() for x in self.output_names]
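The Paddle loader accepts either a *_paddle_model/ directory or a bare .pdiparams file, and requires both a .json program and a .pdiparams weights file to be present. A sketch of just that resolution logic; the directory layout below is hypothetical:

```python
import tempfile
from pathlib import Path


def resolve_paddle(weight: str) -> tuple[Path, Path]:
    """Mirror load_model's discovery: directory with .json + .pdiparams, or a .pdiparams file."""
    w = Path(weight)
    model_file = params_file = None
    if w.is_dir():
        model_file = next(w.rglob("*.json"), None)
        params_file = next(w.rglob("*.pdiparams"), None)
    elif w.suffix == ".pdiparams":
        model_file = w.with_name("model.json")
        params_file = w
    if not (model_file and params_file and model_file.is_file() and params_file.is_file()):
        raise FileNotFoundError(f"Paddle model not found in {w}.")
    return model_file, params_file


with tempfile.TemporaryDirectory() as d:
    mdir = Path(d)
    (mdir / "model.json").touch()
    (mdir / "model.pdiparams").touch()
    m, p = resolve_paddle(str(mdir))
    print(m.name, p.name)  # model.json model.pdiparams
```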
137 ultralytics/nn/backends/pytorch.py Normal file

@@ -0,0 +1,137 @@
# Ultralytics 🚀 AGPL-3.0 License - https://ultralytics.com/license

from __future__ import annotations

from pathlib import Path
from typing import Any

import torch
import torch.nn as nn

from ultralytics.utils import IS_JETSON, LOGGER, is_jetson

from .base import BaseBackend


class PyTorchBackend(BaseBackend):
    """PyTorch inference backend for native model execution.

    Loads and runs inference with native PyTorch models (.pt checkpoint files) or pre-loaded nn.Module
    instances. Supports model layer fusion, FP16 precision, and NVIDIA Jetson compatibility.
    """

    def __init__(
        self,
        weight: str | Path | nn.Module,
        device: torch.device,
        fp16: bool = False,
        fuse: bool = True,
        verbose: bool = True,
    ):
        """Initialize the PyTorch backend.

        Args:
            weight (str | Path | nn.Module): Path to the .pt model file or a pre-loaded nn.Module instance.
            device (torch.device): Device to run inference on (e.g., 'cpu', 'cuda:0').
            fp16 (bool): Whether to use FP16 half-precision inference.
            fuse (bool): Whether to fuse Conv2D + BatchNorm layers for optimization.
            verbose (bool): Whether to print verbose model loading messages.
        """
        self.fuse = fuse
        self.verbose = verbose
        super().__init__(weight, device, fp16)

    def load_model(self, weight: str | torch.nn.Module) -> None:
        """Load a PyTorch model from a checkpoint file or nn.Module instance.

        Args:
            weight (str | torch.nn.Module): Path to the .pt checkpoint or a pre-loaded module.
        """
        from ultralytics.nn.tasks import load_checkpoint

        if isinstance(weight, torch.nn.Module):
            if self.fuse and hasattr(weight, "fuse"):
                if IS_JETSON and is_jetson(jetpack=5):
                    weight = weight.to(self.device)
                weight = weight.fuse(verbose=self.verbose)
            model = weight.to(self.device)
        else:
            model, _ = load_checkpoint(weight, device=self.device, fuse=self.fuse)

        # Extract model attributes
        if hasattr(model, "kpt_shape"):
            self.kpt_shape = model.kpt_shape
        self.stride = max(int(model.stride.max()), 32) if hasattr(model, "stride") else 32
        self.names = model.module.names if hasattr(model, "module") else getattr(model, "names", {})
        self.channels = model.yaml.get("channels", 3) if hasattr(model, "yaml") else 3
        model.half() if self.fp16 else model.float()

        for p in model.parameters():
            p.requires_grad = False

        self.model = model
        self.end2end = getattr(model, "end2end", False)

    def forward(
        self, im: torch.Tensor, augment: bool = False, visualize: bool = False, embed: list | None = None, **kwargs: Any
    ) -> torch.Tensor | list[torch.Tensor]:
        """Run native PyTorch inference with support for augmentation, visualization, and embeddings.

        Args:
            im (torch.Tensor): Input image tensor in BCHW format, normalized to [0, 1].
            augment (bool): Whether to apply test-time augmentation.
            visualize (bool): Whether to visualize intermediate feature maps.
            embed (list | None): List of layer indices to extract embeddings from, or None.
            **kwargs (Any): Additional keyword arguments passed to the model forward method.

        Returns:
            (torch.Tensor | list[torch.Tensor]): Model predictions as tensor(s).
        """
        return self.model(im, augment=augment, visualize=visualize, embed=embed, **kwargs)


class TorchScriptBackend(BaseBackend):
    """PyTorch TorchScript inference backend for serialized model execution.

    Loads and runs inference with TorchScript models (.torchscript files) created via torch.jit.trace or
    torch.jit.script. Supports FP16 precision and embedded metadata extraction.
    """

    def __init__(self, weight: str | Path, device: torch.device, fp16: bool = False):
        """Initialize the TorchScript backend.

        Args:
            weight (str | Path): Path to the .torchscript model file.
            device (torch.device): Device to run inference on (e.g., 'cpu', 'cuda:0').
            fp16 (bool): Whether to use FP16 half-precision inference.
        """
        super().__init__(weight, device, fp16)

    def load_model(self, weight: str) -> None:
        """Load a TorchScript model from a .torchscript file with optional embedded metadata.

        Args:
            weight (str): Path to the .torchscript model file.
        """
        import json

        import torchvision  # noqa - required for TorchScript model deserialization

        LOGGER.info(f"Loading {weight} for TorchScript inference...")
        extra_files = {"config.txt": ""}
        self.model = torch.jit.load(weight, _extra_files=extra_files, map_location=self.device)
        self.model.half() if self.fp16 else self.model.float()

        if extra_files["config.txt"]:
            self.apply_metadata(json.loads(extra_files["config.txt"], object_hook=lambda x: dict(x.items())))

    def forward(self, im: torch.Tensor) -> torch.Tensor | list[torch.Tensor]:
        """Run TorchScript inference.

        Args:
            im (torch.Tensor): Input image tensor in BCHW format, normalized to [0, 1].

        Returns:
            (torch.Tensor | list[torch.Tensor]): Model predictions as tensor(s).
        """
        return self.model(im)
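TorchScript metadata rides along in the `config.txt` extra file as a JSON string; the loader parses it with an `object_hook` that copies each JSON object into a plain dict. A standalone sketch of just that parse step, with a made-up metadata payload standing in for what `torch.jit.load` would return:

```python
import json

# Hypothetical extra_files content as torch.jit.load would populate it
extra_files = {"config.txt": '{"stride": 32, "names": {"0": "person"}}'}

if extra_files["config.txt"]:
    # Mirror the metadata handling: JSON parse with an object_hook that copies each object
    metadata = json.loads(extra_files["config.txt"], object_hook=lambda x: dict(x.items()))
    print(metadata["stride"], metadata["names"]["0"])  # 32 person
```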
70 ultralytics/nn/backends/rknn.py Normal file

@@ -0,0 +1,70 @@
# Ultralytics 🚀 AGPL-3.0 License - https://ultralytics.com/license

from __future__ import annotations

from pathlib import Path

import torch

from ultralytics.utils import LOGGER
from ultralytics.utils.checks import check_requirements, is_rockchip

from .base import BaseBackend


class RKNNBackend(BaseBackend):
    """Rockchip RKNN inference backend for Rockchip NPU hardware.

    Loads and runs inference with RKNN models (.rknn files) using the RKNN-Toolkit-Lite2 runtime. Only supported on
    Rockchip devices with NPU hardware (e.g., RK3588, RK3566).
    """

    def load_model(self, weight: str | Path) -> None:
        """Load a Rockchip RKNN model from a .rknn file or model directory.

        Args:
            weight (str | Path): Path to the .rknn file or directory containing the model.

        Raises:
            OSError: If not running on a Rockchip device.
            RuntimeError: If model loading or runtime initialization fails.
        """
        if not is_rockchip():
            raise OSError("RKNN inference is only supported on Rockchip devices.")

        LOGGER.info(f"Loading {weight} for RKNN inference...")
        check_requirements("rknn-toolkit-lite2")
        from rknnlite.api import RKNNLite

        w = Path(weight)
        if not w.is_file():
            w = next(w.rglob("*.rknn"))

        self.model = RKNNLite()
        ret = self.model.load_rknn(str(w))
        if ret != 0:
            raise RuntimeError(f"Failed to load RKNN model: {ret}")

        ret = self.model.init_runtime()
        if ret != 0:
            raise RuntimeError(f"Failed to init RKNN runtime: {ret}")

        # Load metadata
        metadata_file = w.parent / "metadata.yaml"
        if metadata_file.exists():
            from ultralytics.utils import YAML

            self.apply_metadata(YAML.load(metadata_file))

    def forward(self, im: torch.Tensor) -> list:
        """Run inference on the Rockchip NPU.

        Args:
            im (torch.Tensor): Input image tensor in BCHW format, normalized to [0, 1].

        Returns:
            (list): Model predictions as a list of output arrays.
        """
        im = (im.cpu().numpy() * 255).astype("uint8")
        im = im if isinstance(im, (list, tuple)) else [im]
        return self.model.inference(inputs=im)
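forward() converts the normalized float batch back to uint8 pixels before handing it to RKNNLite, wrapping it in a list as `inference()` expects. A numpy-only sketch of that conversion (no RKNN hardware needed; the input values are made up):

```python
import numpy as np

# Hypothetical float input batch in [0, 1], BCHW layout
im = np.full((1, 3, 4, 4), 0.5, dtype=np.float32)

# Mirror forward(): RKNN expects uint8 pixels, so scale back to [0, 255]
im_u8 = (im * 255).astype("uint8")
inputs = im_u8 if isinstance(im_u8, (list, tuple)) else [im_u8]
print(inputs[0].dtype, inputs[0].max())  # uint8 127
```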
183
ultralytics/nn/backends/tensorflow.py
Normal file
183
ultralytics/nn/backends/tensorflow.py
Normal file
|
|
@ -0,0 +1,183 @@
|
|||
# Ultralytics 🚀 AGPL-3.0 License - https://ultralytics.com/license

from __future__ import annotations

import ast
import json
import platform
import zipfile
from pathlib import Path

import numpy as np
import torch

from ultralytics.utils import LOGGER

from .base import BaseBackend


class TensorFlowBackend(BaseBackend):
    """Google TensorFlow inference backend supporting multiple serialization formats.

    Loads and runs inference with Google TensorFlow models in SavedModel, GraphDef (.pb), TFLite (.tflite), and Edge TPU
    formats. Handles quantized model dequantization and task-specific output formatting.
    """

    def __init__(self, weight: str | Path, device: torch.device, fp16: bool = False, format: str = "saved_model"):
        """Initialize the Google TensorFlow backend.

        Args:
            weight (str | Path): Path to the SavedModel directory, .pb file, or .tflite file.
            device (torch.device): Device to run inference on.
            fp16 (bool): Whether to use FP16 half-precision inference.
            format (str): Model format, one of "saved_model", "pb", "tflite", or "edgetpu".
        """
        assert format in {"saved_model", "pb", "tflite", "edgetpu"}, f"Unsupported TensorFlow format: {format}."
        self.format = format
        super().__init__(weight, device, fp16)

    def load_model(self, weight: str | Path) -> None:
        """Load a Google TensorFlow model in SavedModel, GraphDef, TFLite, or Edge TPU format.

        Args:
            weight (str | Path): Path to the model file or directory.
        """
        import tensorflow as tf

        if self.format == "saved_model":
            LOGGER.info(f"Loading {weight} for TensorFlow SavedModel inference...")
            self.model = tf.saved_model.load(weight)
            # Load metadata
            metadata_file = Path(weight) / "metadata.yaml"
            if metadata_file.exists():
                from ultralytics.utils import YAML

                self.apply_metadata(YAML.load(metadata_file))
        elif self.format == "pb":
            LOGGER.info(f"Loading {weight} for TensorFlow GraphDef inference...")
            from ultralytics.utils.export.tensorflow import gd_outputs

            def wrap_frozen_graph(gd, inputs, outputs):
                """Wrap a TensorFlow frozen graph for inference by pruning to specified input/output nodes."""
                x = tf.compat.v1.wrap_function(lambda: tf.compat.v1.import_graph_def(gd, name=""), [])
                ge = x.graph.as_graph_element
                return x.prune(tf.nest.map_structure(ge, inputs), tf.nest.map_structure(ge, outputs))

            gd = tf.Graph().as_graph_def()
            with open(weight, "rb") as f:
                gd.ParseFromString(f.read())
            self.frozen_func = wrap_frozen_graph(gd, inputs="x:0", outputs=gd_outputs(gd))

            # Try to find metadata
            try:
                metadata_file = next(
                    Path(weight).resolve().parent.rglob(f"{Path(weight).stem}_saved_model*/metadata.yaml")
                )
                from ultralytics.utils import YAML

                self.apply_metadata(YAML.load(metadata_file))
            except StopIteration:
                pass
        else:  # tflite and edgetpu
            try:
                from tflite_runtime.interpreter import Interpreter, load_delegate

                self.tf = None
            except ImportError:
                import tensorflow as tf

                self.tf = tf
                Interpreter, load_delegate = tf.lite.Interpreter, tf.lite.experimental.load_delegate

            if self.format == "edgetpu":
                device = self.device[3:] if str(self.device).startswith("tpu") else ":0"
                LOGGER.info(f"Loading {weight} on device {device[1:]} for TensorFlow Lite Edge TPU inference...")
                delegate = {"Linux": "libedgetpu.so.1", "Darwin": "libedgetpu.1.dylib", "Windows": "edgetpu.dll"}[
                    platform.system()
                ]
                self.interpreter = Interpreter(
                    model_path=str(weight),
                    experimental_delegates=[load_delegate(delegate, options={"device": device})],
                )
                self.device = torch.device("cpu")  # Edge TPU runs on CPU from PyTorch's perspective
            else:
                LOGGER.info(f"Loading {weight} for TensorFlow Lite inference...")
                self.interpreter = Interpreter(model_path=weight)

            self.interpreter.allocate_tensors()
            self.input_details = self.interpreter.get_input_details()
            self.output_details = self.interpreter.get_output_details()

            # Load metadata
            try:
                with zipfile.ZipFile(weight, "r") as zf:
                    name = zf.namelist()[0]
                    contents = zf.read(name).decode("utf-8")
                    if name == "metadata.json":
                        self.apply_metadata(json.loads(contents))
                    else:
                        self.apply_metadata(ast.literal_eval(contents))
            except (zipfile.BadZipFile, SyntaxError, ValueError, json.JSONDecodeError):
                pass

    def forward(self, im: torch.Tensor) -> list[np.ndarray]:
        """Run Google TensorFlow inference with format-specific execution and output post-processing.

        Args:
            im (torch.Tensor): Input image tensor in BHWC format (converted from BCHW by AutoBackend).

        Returns:
            (list[np.ndarray]): Model predictions as a list of numpy arrays.
        """
        im = im.cpu().numpy()
        if self.format == "saved_model":
            y = self.model.serving_default(im)
            if not isinstance(y, list):
                y = [y]
        elif self.format == "pb":
            import tensorflow as tf

            y = self.frozen_func(x=tf.constant(im))
        else:
            h, w = im.shape[1:3]

            details = self.input_details[0]
            is_int = details["dtype"] in {np.int8, np.int16}

            if is_int:
                scale, zero_point = details["quantization"]
                im = (im / scale + zero_point).astype(details["dtype"])

            self.interpreter.set_tensor(details["index"], im)
            self.interpreter.invoke()

            y = []
            for output in self.output_details:
                x = self.interpreter.get_tensor(output["index"])
                if is_int:
                    scale, zero_point = output["quantization"]
                    x = (x.astype(np.float32) - zero_point) * scale
                if x.ndim == 3:
                    # Denormalize xywh by image size
                    if x.shape[-1] == 6 or self.end2end:
                        x[:, :, [0, 2]] *= w
                        x[:, :, [1, 3]] *= h
                        if self.task == "pose":
                            x[:, :, 6::3] *= w
                            x[:, :, 7::3] *= h
                    else:
                        x[:, [0, 2]] *= w
                        x[:, [1, 3]] *= h
                        if self.task == "pose":
                            x[:, 5::3] *= w
                            x[:, 6::3] *= h
                y.append(x)

            if self.task == "segment":  # segment with (det, proto) output order reversed
                if len(y[1].shape) != 4:
                    y = list(reversed(y))  # should be y = (1, 116, 8400), (1, 160, 160, 32)
                if y[1].shape[-1] == 6:  # end-to-end model
                    y = [y[1]]
                else:
                    y[1] = np.transpose(y[1], (0, 3, 1, 2))  # should be y = (1, 116, 8400), (1, 32, 160, 160)
        return [x if isinstance(x, np.ndarray) else x.numpy() for x in y]
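The int8 branch of the TFLite path above maps floats into the quantized domain with `q = x / scale + zero_point` on the way in and recovers floats with `x = (q - zero_point) * scale` on the way out, then denormalizes xywh box coordinates by the image size. A self-contained sketch of both steps (the scale/zero-point values are typical examples, not taken from the diff; pure Python stands in for the numpy array math):

```python
def quantize(x, scale, zero_point):
    """Float -> quantized domain, as in `(im / scale + zero_point).astype(dtype)` (truncating)."""
    return int(x / scale + zero_point)

def dequantize(q, scale, zero_point):
    """Quantized domain -> float, as in `(x - zero_point) * scale`."""
    return (q - zero_point) * scale

def denormalize_xywh(box, w, h):
    """Scale normalized xywh coordinates by image width/height, as in the forward() loop."""
    x, y, bw, bh = box
    return [x * w, y * h, bw * w, bh * h]

scale, zero_point = 0.003921568859368563, -128  # example int8 params for a [0, 1] input
q = quantize(0.5, scale, zero_point)
x = dequantize(q, scale, zero_point)
print(q, round(x, 3))
print(denormalize_xywh([0.5, 0.5, 0.25, 0.5], 640, 640))  # [320.0, 320.0, 160.0, 320.0]
```

The round trip is lossy by up to one quantization step (`scale`), which is why the dequantized value is close to, but not exactly, 0.5.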
ultralytics/nn/backends/tensorrt.py (new file, 144 lines)
@@ -0,0 +1,144 @@
# Ultralytics 🚀 AGPL-3.0 License - https://ultralytics.com/license

from __future__ import annotations

import json
from collections import OrderedDict, namedtuple
from pathlib import Path

import numpy as np
import torch

from ultralytics.utils import IS_JETSON, LINUX, LOGGER, PYTHON_VERSION
from ultralytics.utils.checks import check_requirements, check_version

from .base import BaseBackend


class TensorRTBackend(BaseBackend):
    """NVIDIA TensorRT inference backend for GPU-accelerated deployment.

    Loads and runs inference with NVIDIA TensorRT serialized engines (.engine files). Supports both TensorRT 7-9 and
    TensorRT 10+ APIs, dynamic input shapes, FP16 precision, and DLA core offloading.
    """

    def load_model(self, weight: str | Path) -> None:
        """Load an NVIDIA TensorRT engine from a serialized .engine file.

        Args:
            weight (str | Path): Path to the .engine file with optional embedded metadata.
        """
        LOGGER.info(f"Loading {weight} for TensorRT inference...")

        if IS_JETSON and check_version(PYTHON_VERSION, "<=3.8.10"):
            check_requirements("numpy==1.23.5")

        try:
            import tensorrt as trt
        except ImportError:
            if LINUX:
                check_requirements("tensorrt>7.0.0,!=10.1.0")
            import tensorrt as trt

        check_version(trt.__version__, ">=7.0.0", hard=True)
        check_version(trt.__version__, "!=10.1.0", msg="https://github.com/ultralytics/ultralytics/pull/14239")

        if self.device.type == "cpu":
            self.device = torch.device("cuda:0")

        Binding = namedtuple("Binding", ("name", "dtype", "shape", "data", "ptr"))
        logger = trt.Logger(trt.Logger.INFO)

        # Read engine file
        with open(weight, "rb") as f, trt.Runtime(logger) as runtime:
            try:
                meta_len = int.from_bytes(f.read(4), byteorder="little")
                metadata = json.loads(f.read(meta_len).decode("utf-8"))
                dla = metadata.get("dla", None)
                if dla is not None:
                    runtime.DLA_core = int(dla)
            except UnicodeDecodeError:
                f.seek(0)
                metadata = None
            engine = runtime.deserialize_cuda_engine(f.read())
        self.apply_metadata(metadata)
        try:
            self.context = engine.create_execution_context()
        except Exception as e:
            LOGGER.error("TensorRT model exported with a different version than expected\n")
            raise e

        # Setup bindings
        self.bindings = OrderedDict()
        self.output_names = []
        self.fp16 = False
        self.dynamic = False
        self.is_trt10 = not hasattr(engine, "num_bindings")
        num = range(engine.num_io_tensors) if self.is_trt10 else range(engine.num_bindings)

        for i in num:
            if self.is_trt10:
                name = engine.get_tensor_name(i)
                dtype = trt.nptype(engine.get_tensor_dtype(name))
                is_input = engine.get_tensor_mode(name) == trt.TensorIOMode.INPUT
                shape = tuple(engine.get_tensor_shape(name))
                profile_shape = tuple(engine.get_tensor_profile_shape(name, 0)[2]) if is_input else None
            else:
                name = engine.get_binding_name(i)
                dtype = trt.nptype(engine.get_binding_dtype(i))
                is_input = engine.binding_is_input(i)
                shape = tuple(engine.get_binding_shape(i))
                profile_shape = tuple(engine.get_profile_shape(0, i)[1]) if is_input else None

            if is_input:
                if -1 in shape:
                    self.dynamic = True
                    if self.is_trt10:
                        self.context.set_input_shape(name, profile_shape)
                    else:
                        self.context.set_binding_shape(i, profile_shape)
                if dtype == np.float16:
                    self.fp16 = True
            else:
                self.output_names.append(name)

            shape = (
                tuple(self.context.get_tensor_shape(name))
                if self.is_trt10
                else tuple(self.context.get_binding_shape(i))
            )
            im = torch.from_numpy(np.empty(shape, dtype=dtype)).to(self.device)
            self.bindings[name] = Binding(name, dtype, shape, im, int(im.data_ptr()))

        self.binding_addrs = OrderedDict((n, d.ptr) for n, d in self.bindings.items())
        self.model = engine

    def forward(self, im: torch.Tensor) -> list[torch.Tensor]:
        """Run NVIDIA TensorRT inference with dynamic shape handling.

        Args:
            im (torch.Tensor): Input image tensor in BCHW format on the CUDA device.

        Returns:
            (list[torch.Tensor]): Model predictions as a list of tensors on the CUDA device.
        """
        if self.dynamic and im.shape != self.bindings["images"].shape:
            if self.is_trt10:
                self.context.set_input_shape("images", im.shape)
                self.bindings["images"] = self.bindings["images"]._replace(shape=im.shape)
                for name in self.output_names:
                    self.bindings[name].data.resize_(tuple(self.context.get_tensor_shape(name)))
            else:
                i = self.model.get_binding_index("images")
                self.context.set_binding_shape(i, im.shape)
                self.bindings["images"] = self.bindings["images"]._replace(shape=im.shape)
                for name in self.output_names:
                    i = self.model.get_binding_index(name)
                    self.bindings[name].data.resize_(tuple(self.context.get_binding_shape(i)))

        s = self.bindings["images"].shape
        assert im.shape == s, f"input size {im.shape} {'>' if self.dynamic else 'not equal to'} max model size {s}"

        self.binding_addrs["images"] = int(im.data_ptr())
        self.context.execute_v2(list(self.binding_addrs.values()))
        return [self.bindings[x].data for x in sorted(self.output_names)]
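The TensorRT backend above keeps one `Binding` record per I/O tensor and a parallel name-to-pointer table whose values are handed to `execute_v2`; dynamic shapes are handled by swapping the recorded shape with `_replace`. A toy sketch of that bookkeeping pattern (tensor names and the integer "device pointers" are illustrative stand-ins, no TensorRT required):

```python
from collections import OrderedDict, namedtuple

Binding = namedtuple("Binding", ("name", "dtype", "shape", "data", "ptr"))

# Fake I/O tensors standing in for real engine bindings
bindings = OrderedDict()
bindings["images"] = Binding("images", "float32", (1, 3, 640, 640), None, 0x1000)
bindings["output0"] = Binding("output0", "float32", (1, 84, 8400), None, 0x2000)

# execute_v2 receives the device addresses in binding order
binding_addrs = OrderedDict((n, d.ptr) for n, d in bindings.items())
print(list(binding_addrs.values()))  # [4096, 8192]

# Dynamic shapes: _replace swaps the recorded shape without rebuilding the record
bindings["images"] = bindings["images"]._replace(shape=(1, 3, 320, 320))
print(bindings["images"].shape)  # (1, 3, 320, 320)
```

Because `namedtuple` instances are immutable, `_replace` returns a new record, which is why the backend reassigns `self.bindings["images"]` rather than mutating it in place.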
ultralytics/nn/backends/triton.py (new file, 45 lines)
@@ -0,0 +1,45 @@
# Ultralytics 🚀 AGPL-3.0 License - https://ultralytics.com/license

from __future__ import annotations

from pathlib import Path

import torch

from ultralytics.utils.checks import check_requirements

from .base import BaseBackend


class TritonBackend(BaseBackend):
    """NVIDIA Triton Inference Server backend for remote model serving.

    Connects to and runs inference with models hosted on an NVIDIA Triton Inference Server instance via HTTP or gRPC
    protocols. The model is specified using a triton:// URL scheme.
    """

    def load_model(self, weight: str | Path) -> None:
        """Connect to a remote model on an NVIDIA Triton Inference Server.

        Args:
            weight (str | Path): Triton model URL (e.g., 'http://localhost:8000/model_name').
        """
        check_requirements("tritonclient[all]")
        from ultralytics.utils.triton import TritonRemoteModel

        self.model = TritonRemoteModel(weight)

        # Copy metadata from Triton model
        if hasattr(self.model, "metadata"):
            self.apply_metadata(self.model.metadata)

    def forward(self, im: torch.Tensor) -> list:
        """Run inference via the NVIDIA Triton Inference Server.

        Args:
            im (torch.Tensor): Input image tensor in BCHW format, normalized to [0, 1].

        Returns:
            (list): Model predictions as a list of numpy arrays from the Triton server.
        """
        return self.model(im.cpu().numpy())
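`TritonRemoteModel` (defined elsewhere, not in this diff) receives a URL such as `http://localhost:8000/model_name` and must split it into a protocol, a server endpoint, and a model name. How such a split might look with the standard library (a hypothetical helper for illustration, not the actual parser used by `ultralytics.utils.triton`):

```python
from urllib.parse import urlsplit

def parse_triton_url(url):
    """Split a Triton model URL into (scheme, endpoint, model_name)."""
    parts = urlsplit(url)
    return parts.scheme, parts.netloc, parts.path.strip("/")

print(parse_triton_url("http://localhost:8000/yolo_model"))
# ('http', 'localhost:8000', 'yolo_model')
```

The scheme selects HTTP vs gRPC client transport, while the path component names the model deployed on the server.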
@@ -253,7 +253,7 @@ class BaseSolution:
                f" {', '.join([f'{v} {self.names[k]}' for k, v in counts.items()])}\n"
                f"Speed: {track_or_predict_speed:.1f}ms {track_or_predict}, "
                f"{solution_speed:.1f}ms solution per image at shape "
-               f"(1, {getattr(self.model, 'ch', 3)}, {result.plot_im.shape[0]}, {result.plot_im.shape[1]})\n"
+               f"(1, {getattr(self.model, 'channels', 3)}, {result.plot_im.shape[0]}, {result.plot_im.shape[1]})\n"
            )
            return result