ultralytics 8.4.23 Refactor AutoBackend into modular per-backend classes (#23790)

Signed-off-by: Jing Qiu <61612323+Laughing-q@users.noreply.github.com>
Signed-off-by: Glenn Jocher <glenn.jocher@ultralytics.com>
Co-authored-by: UltralyticsAssistant <web@ultralytics.com>
Co-authored-by: Ultralytics Assistant <135830346+UltralyticsAssistant@users.noreply.github.com>
Co-authored-by: Lakshantha Dissanayake <lakshantha@ultralytics.com>
Co-authored-by: Onuralp SEZER <onuralp@ultralytics.com>
Co-authored-by: Glenn Jocher <glenn.jocher@ultralytics.com>
Jing Qiu 2026-03-17 07:39:27 +08:00 committed by GitHub
parent b3c79532e3
commit b10fa7be23
GPG key ID: B5690EEEBB952194
38 changed files with 1844 additions and 773 deletions

@@ -0,0 +1,16 @@
---
description: Explore AxeleraBackend for Axelera hardware inference, deploying YOLO models on Axelera AI accelerators with optimized performance.
keywords: Ultralytics, AxeleraBackend, Axelera inference, AI accelerator, hardware inference, edge AI, deep learning acceleration
---
# Reference for `ultralytics/nn/backends/axelera.py`
!!! success "Improvements"
This page is sourced from [https://github.com/ultralytics/ultralytics/blob/main/ultralytics/nn/backends/axelera.py](https://github.com/ultralytics/ultralytics/blob/main/ultralytics/nn/backends/axelera.py). Have an improvement or example to add? Open a [Pull Request](https://docs.ultralytics.com/help/contributing/) — thank you! 🙏
<br>
## ::: ultralytics.nn.backends.axelera.AxeleraBackend
<br><br>

@@ -0,0 +1,16 @@
---
description: Explore the BaseBackend class, the abstract foundation for all inference backends in Ultralytics, defining the interface for model loading and inference.
keywords: Ultralytics, BaseBackend, inference backend, abstract class, model loading, deep learning, neural network inference
---
# Reference for `ultralytics/nn/backends/base.py`
!!! success "Improvements"
This page is sourced from [https://github.com/ultralytics/ultralytics/blob/main/ultralytics/nn/backends/base.py](https://github.com/ultralytics/ultralytics/blob/main/ultralytics/nn/backends/base.py). Have an improvement or example to add? Open a [Pull Request](https://docs.ultralytics.com/help/contributing/) — thank you! 🙏
<br>
## ::: ultralytics.nn.backends.base.BaseBackend
<br><br>

@@ -0,0 +1,16 @@
---
description: Explore CoreMLBackend for Apple CoreML inference, enabling efficient YOLO model deployment on iOS, macOS, and Apple Silicon devices.
keywords: Ultralytics, CoreMLBackend, CoreML inference, Apple CoreML, iOS deployment, macOS inference, Apple Silicon, mobile AI
---
# Reference for `ultralytics/nn/backends/coreml.py`
!!! success "Improvements"
This page is sourced from [https://github.com/ultralytics/ultralytics/blob/main/ultralytics/nn/backends/coreml.py](https://github.com/ultralytics/ultralytics/blob/main/ultralytics/nn/backends/coreml.py). Have an improvement or example to add? Open a [Pull Request](https://docs.ultralytics.com/help/contributing/) — thank you! 🙏
<br>
## ::: ultralytics.nn.backends.coreml.CoreMLBackend
<br><br>

@@ -0,0 +1,16 @@
---
description: Explore ExecuTorchBackend for Meta ExecuTorch inference, enabling efficient PyTorch model deployment on mobile and edge devices.
keywords: Ultralytics, ExecuTorchBackend, ExecuTorch inference, Meta ExecuTorch, mobile inference, edge deployment, PyTorch Mobile
---
# Reference for `ultralytics/nn/backends/executorch.py`
!!! success "Improvements"
This page is sourced from [https://github.com/ultralytics/ultralytics/blob/main/ultralytics/nn/backends/executorch.py](https://github.com/ultralytics/ultralytics/blob/main/ultralytics/nn/backends/executorch.py). Have an improvement or example to add? Open a [Pull Request](https://docs.ultralytics.com/help/contributing/) — thank you! 🙏
<br>
## ::: ultralytics.nn.backends.executorch.ExecuTorchBackend
<br><br>

@@ -0,0 +1,16 @@
---
description: Explore MNNBackend for Alibaba MNN inference, enabling lightweight and efficient model deployment on mobile and edge devices.
keywords: Ultralytics, MNNBackend, MNN inference, Alibaba MNN, mobile inference, edge AI, .mnn models, deep learning
---
# Reference for `ultralytics/nn/backends/mnn.py`
!!! success "Improvements"
This page is sourced from [https://github.com/ultralytics/ultralytics/blob/main/ultralytics/nn/backends/mnn.py](https://github.com/ultralytics/ultralytics/blob/main/ultralytics/nn/backends/mnn.py). Have an improvement or example to add? Open a [Pull Request](https://docs.ultralytics.com/help/contributing/) — thank you! 🙏
<br>
## ::: ultralytics.nn.backends.mnn.MNNBackend
<br><br>

@@ -0,0 +1,16 @@
---
description: Explore NCNNBackend for Tencent NCNN inference, optimized for mobile and embedded platforms with Vulkan acceleration support.
keywords: Ultralytics, NCNNBackend, NCNN inference, Tencent NCNN, mobile inference, Vulkan acceleration, embedded AI, deep learning
---
# Reference for `ultralytics/nn/backends/ncnn.py`
!!! success "Improvements"
This page is sourced from [https://github.com/ultralytics/ultralytics/blob/main/ultralytics/nn/backends/ncnn.py](https://github.com/ultralytics/ultralytics/blob/main/ultralytics/nn/backends/ncnn.py). Have an improvement or example to add? Open a [Pull Request](https://docs.ultralytics.com/help/contributing/) — thank you! 🙏
<br>
## ::: ultralytics.nn.backends.ncnn.NCNNBackend
<br><br>

@@ -0,0 +1,20 @@
---
description: Explore ONNXBackend and ONNXIMXBackend for Microsoft ONNX Runtime inference, supporting standard ONNX models and NXP IMX-optimized variants.
keywords: Ultralytics, ONNXBackend, ONNXIMXBackend, Microsoft ONNX Runtime, Sony IMX, ONNX inference, edge deployment, deep learning
---
# Reference for `ultralytics/nn/backends/onnx.py`
!!! success "Improvements"
This page is sourced from [https://github.com/ultralytics/ultralytics/blob/main/ultralytics/nn/backends/onnx.py](https://github.com/ultralytics/ultralytics/blob/main/ultralytics/nn/backends/onnx.py). Have an improvement or example to add? Open a [Pull Request](https://docs.ultralytics.com/help/contributing/) — thank you! 🙏
<br>
## ::: ultralytics.nn.backends.onnx.ONNXBackend
<br><br><hr><br>
## ::: ultralytics.nn.backends.onnx.ONNXIMXBackend
<br><br>

@@ -0,0 +1,16 @@
---
description: Explore OpenVINOBackend for optimized inference on Intel hardware, supporting OpenVINO IR models for efficient deployment on CPUs, GPUs, and VPUs.
keywords: Ultralytics, OpenVINOBackend, OpenVINO inference, Intel OpenVINO, CPU inference, VPU, edge AI, deep learning optimization
---
# Reference for `ultralytics/nn/backends/openvino.py`
!!! success "Improvements"
This page is sourced from [https://github.com/ultralytics/ultralytics/blob/main/ultralytics/nn/backends/openvino.py](https://github.com/ultralytics/ultralytics/blob/main/ultralytics/nn/backends/openvino.py). Have an improvement or example to add? Open a [Pull Request](https://docs.ultralytics.com/help/contributing/) — thank you! 🙏
<br>
## ::: ultralytics.nn.backends.openvino.OpenVINOBackend
<br><br>

@@ -0,0 +1,16 @@
---
description: Explore PaddleBackend for Baidu PaddlePaddle inference, supporting deployment with Paddle Inference engine on various hardware platforms.
keywords: Ultralytics, PaddleBackend, PaddlePaddle inference, Baidu Paddle, Paddle Inference, deep learning, model deployment
---
# Reference for `ultralytics/nn/backends/paddle.py`
!!! success "Improvements"
This page is sourced from [https://github.com/ultralytics/ultralytics/blob/main/ultralytics/nn/backends/paddle.py](https://github.com/ultralytics/ultralytics/blob/main/ultralytics/nn/backends/paddle.py). Have an improvement or example to add? Open a [Pull Request](https://docs.ultralytics.com/help/contributing/) — thank you! 🙏
<br>
## ::: ultralytics.nn.backends.paddle.PaddleBackend
<br><br>

@@ -0,0 +1,20 @@
---
description: Explore PyTorchBackend and TorchScriptBackend for native PyTorch and TorchScript model inference in Ultralytics YOLO models.
keywords: Ultralytics, PyTorchBackend, TorchScriptBackend, PyTorch inference, TorchScript inference, .pt models, deep learning, YOLO
---
# Reference for `ultralytics/nn/backends/pytorch.py`
!!! success "Improvements"
This page is sourced from [https://github.com/ultralytics/ultralytics/blob/main/ultralytics/nn/backends/pytorch.py](https://github.com/ultralytics/ultralytics/blob/main/ultralytics/nn/backends/pytorch.py). Have an improvement or example to add? Open a [Pull Request](https://docs.ultralytics.com/help/contributing/) — thank you! 🙏
<br>
## ::: ultralytics.nn.backends.pytorch.PyTorchBackend
<br><br><hr><br>
## ::: ultralytics.nn.backends.pytorch.TorchScriptBackend
<br><br>

@@ -0,0 +1,16 @@
---
description: Explore RKNNBackend for Rockchip RKNN inference, enabling optimized YOLO deployment on Rockchip NPU-equipped edge devices.
keywords: Ultralytics, RKNNBackend, RKNN inference, Rockchip RKNN, NPU inference, edge AI, embedded deployment, deep learning
---
# Reference for `ultralytics/nn/backends/rknn.py`
!!! success "Improvements"
This page is sourced from [https://github.com/ultralytics/ultralytics/blob/main/ultralytics/nn/backends/rknn.py](https://github.com/ultralytics/ultralytics/blob/main/ultralytics/nn/backends/rknn.py). Have an improvement or example to add? Open a [Pull Request](https://docs.ultralytics.com/help/contributing/) — thank you! 🙏
<br>
## ::: ultralytics.nn.backends.rknn.RKNNBackend
<br><br>

@@ -0,0 +1,16 @@
---
description: Explore TensorFlowBackend for Google TensorFlow inference including SavedModel, GraphDef, TFLite, and Edge TPU formats.
keywords: Ultralytics, TensorFlowBackend, Google TensorFlow, TFLite, Edge TPU, SavedModel, GraphDef, deep learning, model deployment
---
# Reference for `ultralytics/nn/backends/tensorflow.py`
!!! success "Improvements"
This page is sourced from [https://github.com/ultralytics/ultralytics/blob/main/ultralytics/nn/backends/tensorflow.py](https://github.com/ultralytics/ultralytics/blob/main/ultralytics/nn/backends/tensorflow.py). Have an improvement or example to add? Open a [Pull Request](https://docs.ultralytics.com/help/contributing/) — thank you! 🙏
<br>
## ::: ultralytics.nn.backends.tensorflow.TensorFlowBackend
<br><br>

@@ -0,0 +1,16 @@
---
description: Explore TensorRTBackend for high-performance GPU inference with NVIDIA TensorRT, optimizing YOLO models for production deployment.
keywords: Ultralytics, TensorRTBackend, TensorRT inference, NVIDIA TensorRT, GPU inference, .engine models, production deployment, deep learning
---
# Reference for `ultralytics/nn/backends/tensorrt.py`
!!! success "Improvements"
This page is sourced from [https://github.com/ultralytics/ultralytics/blob/main/ultralytics/nn/backends/tensorrt.py](https://github.com/ultralytics/ultralytics/blob/main/ultralytics/nn/backends/tensorrt.py). Have an improvement or example to add? Open a [Pull Request](https://docs.ultralytics.com/help/contributing/) — thank you! 🙏
<br>
## ::: ultralytics.nn.backends.tensorrt.TensorRTBackend
<br><br>

@@ -0,0 +1,16 @@
---
description: Explore TritonBackend for NVIDIA Triton Inference Server, enabling scalable cloud and edge deployment of YOLO models.
keywords: Ultralytics, TritonBackend, Triton Inference Server, NVIDIA Triton, cloud inference, model serving, scalable deployment
---
# Reference for `ultralytics/nn/backends/triton.py`
!!! success "Improvements"
This page is sourced from [https://github.com/ultralytics/ultralytics/blob/main/ultralytics/nn/backends/triton.py](https://github.com/ultralytics/ultralytics/blob/main/ultralytics/nn/backends/triton.py). Have an improvement or example to add? Open a [Pull Request](https://docs.ultralytics.com/help/contributing/) — thank you! 🙏
<br>
## ::: ultralytics.nn.backends.triton.TritonBackend
<br><br>

@@ -745,6 +745,21 @@ nav:
- val: reference/models/yolo/yoloe/val.md
- nn:
- autobackend: reference/nn/autobackend.md
- backends:
- axelera: reference/nn/backends/axelera.md
- base: reference/nn/backends/base.md
- coreml: reference/nn/backends/coreml.md
- executorch: reference/nn/backends/executorch.md
- mnn: reference/nn/backends/mnn.md
- ncnn: reference/nn/backends/ncnn.md
- onnx: reference/nn/backends/onnx.md
- openvino: reference/nn/backends/openvino.md
- paddle: reference/nn/backends/paddle.md
- pytorch: reference/nn/backends/pytorch.md
- rknn: reference/nn/backends/rknn.md
- tensorflow: reference/nn/backends/tensorflow.md
- tensorrt: reference/nn/backends/tensorrt.md
- triton: reference/nn/backends/triton.md
- modules:
- activation: reference/nn/modules/activation.md
- block: reference/nn/modules/block.md

@@ -1,6 +1,6 @@
# Ultralytics 🚀 AGPL-3.0 License - https://ultralytics.com/license
__version__ = "8.4.22"
__version__ = "8.4.23"
import importlib
import os

@@ -195,7 +195,7 @@ class BasePredictor:
self.imgsz,
auto=same_shapes
and self.args.rect
and (self.model.pt or (getattr(self.model, "dynamic", False) and not self.model.imx)),
and (self.model.format == "pt" or (getattr(self.model, "dynamic", False) and self.model.format != "imx")),
stride=self.model.stride,
)
return [letterbox(image=x) for x in im]
@@ -258,7 +258,7 @@ class BasePredictor:
batch=self.args.batch,
vid_stride=self.args.vid_stride,
buffer=self.args.stream_buffer,
channels=getattr(self.model, "ch", 3),
channels=getattr(self.model, "channels", 3),
)
self.source_type = self.dataset.source_type
if (
@@ -305,7 +305,11 @@ class BasePredictor:
# Warmup model
if not self.done_warmup:
self.model.warmup(
imgsz=(1 if self.model.pt or self.model.triton else self.dataset.bs, self.model.ch, *self.imgsz)
imgsz=(
1 if self.model.format in {"pt", "triton"} else self.dataset.bs,
self.model.channels,
*self.imgsz,
)
)
self.done_warmup = True
@@ -372,7 +376,7 @@ class BasePredictor:
t = tuple(x.t / self.seen * 1e3 for x in profilers) # speeds per image
LOGGER.info(
f"Speed: %.1fms preprocess, %.1fms inference, %.1fms postprocess per image at shape "
f"{(min(self.args.batch, self.seen), getattr(self.model, 'ch', 3), *im.shape[2:])}" % t
f"{(min(self.args.batch, self.seen), getattr(self.model, 'channels', 3), *im.shape[2:])}" % t
)
if self.args.save or self.args.save_txt or self.args.save_crop:
nl = len(list(self.save_dir.glob("labels/*.txt"))) # number of labels

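The warmup change above can be summarized as: PyTorch and Triton backends accept any batch size, so a batch of 1 suffices for warmup, while exported static-batch models must be warmed up with the dataloader's batch size. A minimal standalone sketch of that rule (function name and arguments are illustrative, not part of the Ultralytics API):

```python
def warmup_shape(fmt: str, dataset_bs: int, channels: int, imgsz: tuple) -> tuple:
    """Sketch of BasePredictor's warmup-shape rule: dynamic-batch backends warm up with batch 1."""
    bs = 1 if fmt in {"pt", "triton"} else dataset_bs
    return (bs, channels, *imgsz)

print(warmup_shape("pt", 16, 3, (640, 640)))    # (1, 3, 640, 640)
print(warmup_shape("onnx", 16, 3, (640, 640)))  # (16, 3, 640, 640)
```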
@@ -172,9 +172,10 @@ class BaseValidator:
)
self.device = model.device # update device
self.args.half = model.fp16 # update half
stride, pt, jit = model.stride, model.pt, model.jit
stride, fmt = model.stride, model.format
pt = fmt == "pt"
imgsz = check_imgsz(self.args.imgsz, stride=stride)
if not (pt or jit or getattr(model, "dynamic", False)):
if fmt not in {"pt", "torchscript"} and not getattr(model, "dynamic", False):
self.args.batch = model.metadata.get("batch", 1) # export.py models default to batch-size 1
LOGGER.info(f"Setting batch={self.args.batch} input of shape ({self.args.batch}, 3, {imgsz}, {imgsz})")
@@ -187,7 +188,7 @@ class BaseValidator:
if self.device.type in {"cpu", "mps"}:
self.args.workers = 0 # faster CPU val as time dominated by inference, not dataloading
if not (pt or (getattr(model, "dynamic", False) and not model.imx)):
if not (pt or (getattr(model, "dynamic", False) and fmt != "imx")):
self.args.rect = False
self.stride = model.stride # used in get_dataloader() for padding
self.dataloader = self.dataloader or self.get_dataloader(self.data.get(self.args.split), self.args.batch)

@@ -462,8 +462,7 @@ class Predictor(BasePredictor):
self.std = torch.tensor([58.395, 57.12, 57.375]).view(-1, 1, 1).to(device)
# Ultralytics compatibility settings
self.model.pt = False
self.model.triton = False
self.model.format = "sam"
self.model.stride = 32
self.model.fp16 = self.args.half
self.done_warmup = True

@@ -59,7 +59,7 @@ class ClassificationPredictor(BasePredictor):
else False
)
self.transforms = (
classify_transforms(self.imgsz) if updated or not self.model.pt else self.model.model.transforms
classify_transforms(self.imgsz) if updated or self.model.format != "pt" else self.model.model.transforms
)
def preprocess(self, img):

@@ -63,7 +63,7 @@ class WorldTrainerFromScratch(WorldTrainer):
Args:
cfg (dict): Configuration dictionary with default parameters for model training.
overrides (dict, optional): Dictionary of parameter overrides to customize the configuration.
_callbacks (dict, optional): Dictionary of callback functions to be executed during different stages of training.
_callbacks (dict, optional): Dictionary of callback functions to run during different stages of training.
"""
if overrides is None:
overrides = {}

File diff suppressed because it is too large

@@ -0,0 +1,41 @@
# Ultralytics 🚀 AGPL-3.0 License - https://ultralytics.com/license
"""Ultralytics YOLO inference backends.
This package provides modular inference backends for various deep learning frameworks and hardware accelerators.
Each backend implements the `BaseBackend` interface and can be used independently or through the unified
`AutoBackend` dispatcher for automatic format detection and inference routing.
"""
from .axelera import AxeleraBackend
from .base import BaseBackend
from .coreml import CoreMLBackend
from .executorch import ExecuTorchBackend
from .mnn import MNNBackend
from .ncnn import NCNNBackend
from .onnx import ONNXBackend, ONNXIMXBackend
from .openvino import OpenVINOBackend
from .paddle import PaddleBackend
from .pytorch import PyTorchBackend, TorchScriptBackend
from .rknn import RKNNBackend
from .tensorflow import TensorFlowBackend
from .tensorrt import TensorRTBackend
from .triton import TritonBackend
__all__ = [
"AxeleraBackend",
"BaseBackend",
"CoreMLBackend",
"ExecuTorchBackend",
"MNNBackend",
"NCNNBackend",
"ONNXBackend",
"ONNXIMXBackend",
"OpenVINOBackend",
"PaddleBackend",
"PyTorchBackend",
"RKNNBackend",
"TensorFlowBackend",
"TensorRTBackend",
"TorchScriptBackend",
"TritonBackend",
]
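The package docstring above describes `AutoBackend` as a dispatcher that routes each weights file to the matching backend class by detected format. A minimal sketch of the dispatch idea, using a hypothetical suffix table (the real `AutoBackend` inspects more than the file suffix, e.g. directories and Triton URLs):

```python
from pathlib import Path

# Hypothetical suffix → backend-name table for illustration only.
BACKEND_BY_SUFFIX = {
    ".pt": "PyTorchBackend",
    ".torchscript": "TorchScriptBackend",
    ".onnx": "ONNXBackend",
    ".engine": "TensorRTBackend",
    ".mlpackage": "CoreMLBackend",
    ".mnn": "MNNBackend",
    ".pte": "ExecuTorchBackend",
}

def pick_backend(weight: str) -> str:
    """Return the backend class name for a weights path, by file suffix."""
    suffix = Path(weight).suffix.lower()
    try:
        return BACKEND_BY_SUFFIX[suffix]
    except KeyError:
        raise ValueError(f"Unrecognized model format: {weight!r}")

print(pick_backend("yolo11n.onnx"))  # ONNXBackend
```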

@@ -0,0 +1,69 @@
# Ultralytics 🚀 AGPL-3.0 License - https://ultralytics.com/license
from __future__ import annotations
import os
from pathlib import Path
import torch
from ultralytics.utils import LOGGER
from ultralytics.utils.checks import check_requirements
from .base import BaseBackend
class AxeleraBackend(BaseBackend):
"""Axelera AI inference backend for Axelera Metis AI accelerators.
Loads compiled Axelera models (.axm files) and runs inference using the Axelera AI runtime SDK. Requires the Axelera
runtime environment to be activated before use.
"""
def load_model(self, weight: str | Path) -> None:
"""Load an Axelera model from a directory containing a .axm file.
Args:
weight (str | Path): Path to the Axelera model directory containing the .axm binary.
"""
if not os.environ.get("AXELERA_RUNTIME_DIR"):
LOGGER.warning(
"Axelera runtime environment is not activated.\n"
"Please run: source /opt/axelera/sdk/latest/axelera_activate.sh\n\n"
"If this fails, verify driver installation: "
"https://docs.ultralytics.com/integrations/axelera/#axelera-driver-installation"
)
try:
from axelera.runtime import op
except ImportError:
check_requirements(
"axelera_runtime2==0.1.2",
cmds="--extra-index-url https://software.axelera.ai/artifactory/axelera-runtime-pypi",
)
from axelera.runtime import op
w = Path(weight)
found = next(w.rglob("*.axm"), None)
if found is None:
raise FileNotFoundError(f"No .axm file found in: {w}")
self.model = op.load(str(found))
# Load metadata
metadata_file = found.parent / "metadata.yaml"
if metadata_file.exists():
from ultralytics.utils import YAML
self.apply_metadata(YAML.load(metadata_file))
def forward(self, im: torch.Tensor) -> list:
"""Run inference on the Axelera hardware accelerator.
Args:
im (torch.Tensor): Input image tensor in BCHW format, normalized to [0, 1].
Returns:
(list): Model predictions as a list of output arrays.
"""
return self.model(im.cpu())

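The `.axm` discovery above takes the first match of a recursive glob under the model directory, falling back to `None` when no compiled model exists. A self-contained sketch of that pattern using a temporary directory:

```python
import tempfile
from pathlib import Path

# Sketch of the rglob-based discovery in AxeleraBackend.load_model:
# first *.axm anywhere under the directory, or None if absent.
with tempfile.TemporaryDirectory() as d:
    root = Path(d)
    (root / "build").mkdir()
    (root / "build" / "model.axm").touch()
    found = next(root.rglob("*.axm"), None)
    print(found.name if found else "missing")  # model.axm
```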
@@ -0,0 +1,104 @@
# Ultralytics 🚀 AGPL-3.0 License - https://ultralytics.com/license
from __future__ import annotations
import ast
from abc import ABC, abstractmethod
import torch
class BaseBackend(ABC):
"""Base class for all inference backends.
This abstract class defines the interface that all inference backends must implement. It provides common
functionality for model loading, metadata processing, and device management.
Attributes:
model: The underlying inference model or runtime session.
device (torch.device): The device to run inference on.
fp16 (bool): Whether to use FP16 (half-precision) inference.
nhwc (bool): Whether the model expects NHWC input format instead of NCHW.
stride (int): Model stride, typically 32 for YOLO models.
names (dict): Dictionary mapping class indices to class names.
task (str | None): The task type (detect, segment, classify, pose, obb).
batch (int): Batch size for inference.
imgsz (tuple): Input image size as (height, width).
channels (int): Number of input channels, typically 3 for RGB.
end2end (bool): Whether the model includes end-to-end NMS post-processing.
dynamic (bool): Whether the model supports dynamic input shapes.
metadata (dict): Model metadata dictionary containing export configuration.
"""
def __init__(self, weight: str | torch.nn.Module, device: torch.device | str, fp16: bool = False):
"""Initialize the base backend with common attributes and load the model.
Args:
weight (str | torch.nn.Module): Path to the model weights file or a PyTorch module instance.
device (torch.device | str): Device to run inference on (e.g., 'cpu', 'cuda:0').
fp16 (bool): Whether to use FP16 half-precision inference.
"""
self.device = device
self.fp16 = fp16
self.nhwc = False
self.stride = 32
self.names = {}
self.task = None
self.batch = 1
self.channels = 3
self.end2end = False
self.dynamic = False
self.metadata = {}
self.model = None
self.load_model(weight)
@abstractmethod
def load_model(self, weight: str | torch.nn.Module) -> None:
"""Load the model from a weights file or module instance.
Args:
weight (str | torch.nn.Module): Path to model weights or a PyTorch module.
"""
raise NotImplementedError
@abstractmethod
def forward(self, im: torch.Tensor) -> torch.Tensor | list[torch.Tensor]:
"""Run inference on the input image tensor.
Args:
im (torch.Tensor): Input image tensor in BCHW format, normalized to [0, 1].
Returns:
(torch.Tensor | list[torch.Tensor]): Model output as a single tensor or list of tensors.
"""
raise NotImplementedError
def apply_metadata(self, metadata: dict | None) -> None:
"""Process and apply model metadata to backend attributes.
Handles type conversions for common metadata fields (e.g., stride, batch, names) and sets them as
instance attributes. Also resolves end-to-end NMS and dynamic shape settings from export args.
Args:
metadata (dict | None): Dictionary containing metadata key-value pairs from model export.
"""
if not metadata:
return
# Store raw metadata
self.metadata = metadata
# Process type conversions for known fields
for k, v in metadata.items():
if k in {"stride", "batch", "channels"}:
metadata[k] = int(v)
elif k in {"imgsz", "names", "kpt_shape", "kpt_names", "args", "end2end"} and isinstance(v, str):
metadata[k] = ast.literal_eval(v)
# Handle models exported with end-to-end NMS
metadata["end2end"] = metadata.get("end2end", False) or metadata.get("args", {}).get("nms", False)
metadata["dynamic"] = metadata.get("args", {}).get("dynamic", self.dynamic)
# Apply all metadata fields as backend attributes
for k, v in metadata.items():
setattr(self, k, v)

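The conversion rules in `apply_metadata` can be exercised standalone: numeric fields are cast to `int`, stringified containers are parsed with `ast.literal_eval`, and `end2end`/`dynamic` are resolved from the export args. A pure-Python sketch of just those conversions (not the attribute-setting part):

```python
import ast

def convert_metadata(metadata: dict) -> dict:
    """Standalone sketch of BaseBackend.apply_metadata's type conversions."""
    md = dict(metadata)
    for k, v in md.items():
        if k in {"stride", "batch", "channels"}:
            md[k] = int(v)  # numeric fields may arrive as strings
        elif k in {"imgsz", "names", "args", "end2end"} and isinstance(v, str):
            md[k] = ast.literal_eval(v)  # stringified dicts/lists/bools
    # end2end is true if set directly or if the model was exported with NMS
    md["end2end"] = md.get("end2end", False) or md.get("args", {}).get("nms", False)
    md["dynamic"] = md.get("args", {}).get("dynamic", False)
    return md

meta = {"stride": "32", "names": "{0: 'person'}", "args": "{'nms': True, 'dynamic': False}"}
out = convert_metadata(meta)
print(out["stride"], out["names"][0], out["end2end"])  # 32 person True
```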
@@ -0,0 +1,64 @@
# Ultralytics 🚀 AGPL-3.0 License - https://ultralytics.com/license
from __future__ import annotations
from pathlib import Path
import numpy as np
import torch
from PIL import Image
from ultralytics.utils import LOGGER
from ultralytics.utils.checks import check_requirements
from .base import BaseBackend
class CoreMLBackend(BaseBackend):
"""CoreML inference backend for Apple hardware.
Loads and runs inference with CoreML models (.mlpackage files) using the coremltools library. Supports both static
and dynamic input shapes and handles NMS-included model outputs.
"""
def load_model(self, weight: str | Path) -> None:
"""Load a CoreML model from a .mlpackage file.
Args:
weight (str | Path): Path to the .mlpackage model file.
"""
check_requirements(["coremltools>=9.0", "numpy>=1.14.5,<=2.3.5"])
import coremltools as ct
LOGGER.info(f"Loading {weight} for CoreML inference...")
self.model = ct.models.MLModel(weight)
self.dynamic = self.model.get_spec().description.input[0].type.HasField("multiArrayType")
# Load metadata
self.apply_metadata(dict(self.model.user_defined_metadata))
def forward(self, im: torch.Tensor) -> np.ndarray | list[np.ndarray]:
"""Run CoreML inference with automatic input format handling.
Args:
im (torch.Tensor): Input image tensor in BHWC format (converted from BCHW by AutoBackend).
Returns:
(np.ndarray | list[np.ndarray]): Model predictions as numpy array(s).
"""
im = im.cpu().numpy()
h, w = im.shape[1:3]
im = im.transpose(0, 3, 1, 2) if self.dynamic else Image.fromarray((im[0] * 255).astype("uint8"))
y = self.model.predict({"image": im})
if "confidence" in y: # NMS included
from ultralytics.utils.ops import xywh2xyxy
box = xywh2xyxy(y["coordinates"] * [[w, h, w, h]])
cls = y["confidence"].argmax(1, keepdims=True)
y = np.concatenate((box, np.take_along_axis(y["confidence"], cls, axis=1), cls), 1)[None]
else:
y = list(y.values())
if len(y) == 2 and len(y[1].shape) != 4: # segmentation model
y = list(reversed(y))
return y

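The NMS-included branch above converts CoreML's normalized xywh boxes to pixel xyxy via `ultralytics.utils.ops.xywh2xyxy` after scaling by `[[w, h, w, h]]`. A pure-Python sketch of that conversion for a single box (the real helper is vectorized over arrays):

```python
def xywh2xyxy(box):
    """Convert one (cx, cy, w, h) box to (x1, y1, x2, y2)."""
    cx, cy, bw, bh = box
    return [cx - bw / 2, cy - bh / 2, cx + bw / 2, cy + bh / 2]

w = h = 640
cx, cy, bw, bh = 0.5, 0.5, 0.25, 0.5  # normalized CoreML coordinates
print(xywh2xyxy([cx * w, cy * h, bw * w, bh * h]))  # [240.0, 160.0, 400.0, 480.0]
```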
@@ -0,0 +1,59 @@
# Ultralytics 🚀 AGPL-3.0 License - https://ultralytics.com/license
from __future__ import annotations
from pathlib import Path
import torch
from ultralytics.utils import LOGGER
from ultralytics.utils.checks import check_executorch_requirements
from .base import BaseBackend
class ExecuTorchBackend(BaseBackend):
"""Meta ExecuTorch inference backend for on-device deployment.
Loads and runs inference with Meta ExecuTorch models (.pte files) using the ExecuTorch runtime. Supports both
standalone .pte files and directory-based model packages with metadata.
"""
def load_model(self, weight: str | Path) -> None:
"""Load an ExecuTorch model from a .pte file or directory.
Args:
weight (str | Path): Path to the .pte model file or directory containing the model.
"""
LOGGER.info(f"Loading {weight} for ExecuTorch inference...")
check_executorch_requirements()
from executorch.runtime import Runtime
w = Path(weight)
if w.is_dir():
model_file = next(w.rglob("*.pte"))
metadata_file = w / "metadata.yaml"
else:
model_file = w
metadata_file = w.parent / "metadata.yaml"
program = Runtime.get().load_program(str(model_file))
self.model = program.load_method("forward")
# Load metadata
if metadata_file.exists():
from ultralytics.utils import YAML
self.apply_metadata(YAML.load(metadata_file))
def forward(self, im: torch.Tensor) -> list:
"""Run inference using the ExecuTorch runtime.
Args:
im (torch.Tensor): Input image tensor in BCHW format, normalized to [0, 1].
Returns:
(list): Model predictions as a list of ExecuTorch output values.
"""
return self.model.execute([im])

@@ -0,0 +1,59 @@
# Ultralytics 🚀 AGPL-3.0 License - https://ultralytics.com/license
from __future__ import annotations
import json
import os
from pathlib import Path
import torch
from ultralytics.utils import LOGGER
from ultralytics.utils.checks import check_requirements
from .base import BaseBackend
class MNNBackend(BaseBackend):
"""MNN (Mobile Neural Network) inference backend.
Loads and runs inference with MNN models (.mnn files) using the Alibaba MNN framework. Optimized for mobile and edge
deployment with configurable thread count and precision.
"""
def load_model(self, weight: str | Path) -> None:
"""Load an Alibaba MNN model from a .mnn file.
Args:
weight (str | Path): Path to the .mnn model file.
"""
LOGGER.info(f"Loading {weight} for MNN inference...")
check_requirements("MNN")
import MNN
config = {"precision": "low", "backend": "CPU", "numThread": (os.cpu_count() + 1) // 2}
rt = MNN.nn.create_runtime_manager((config,))
self.net = MNN.nn.load_module_from_file(weight, [], [], runtime_manager=rt, rearrange=True)
self.expr = MNN.expr
# Load metadata from bizCode
info = self.net.get_info()
if "bizCode" in info:
try:
self.apply_metadata(json.loads(info["bizCode"]))
except json.JSONDecodeError:
pass
def forward(self, im: torch.Tensor) -> list:
"""Run inference using the MNN runtime.
Args:
im (torch.Tensor): Input image tensor in BCHW format, normalized to [0, 1].
Returns:
(list): Model predictions as a list of numpy arrays.
"""
input_var = self.expr.const(im.data_ptr(), im.shape)
output_var = self.net.onForward([input_var])
# NOTE: copy() is required here; without it, results are incorrect on ARM devices
return [x.read().copy() for x in output_var]

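The MNN config above sets `numThread` to about half the logical cores, rounded up, leaving headroom for the rest of the application. The heuristic in isolation:

```python
def mnn_threads(cpu_count: int) -> int:
    """Sketch of the MNN numThread heuristic: half the cores, rounded up."""
    return (cpu_count + 1) // 2

print(mnn_threads(8), mnn_threads(7), mnn_threads(1))  # 4 4 1
```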
@@ -0,0 +1,72 @@
# Ultralytics 🚀 AGPL-3.0 License - https://ultralytics.com/license
from __future__ import annotations
from pathlib import Path
import numpy as np
import torch
from ultralytics.utils import LOGGER
from ultralytics.utils.checks import check_requirements
from .base import BaseBackend
class NCNNBackend(BaseBackend):
"""Tencent NCNN inference backend for mobile and embedded deployment.
Loads and runs inference with Tencent NCNN models (*_ncnn_model/ directories). Optimized for mobile platforms with
optional Vulkan GPU acceleration when available.
"""
def load_model(self, weight: str | Path) -> None:
"""Load an NCNN model from a .param/.bin file pair or model directory.
Args:
weight (str | Path): Path to the .param file or directory containing NCNN model files.
"""
LOGGER.info(f"Loading {weight} for NCNN inference...")
check_requirements("ncnn", cmds="--no-deps")
import ncnn as pyncnn
self.pyncnn = pyncnn
self.net = pyncnn.Net()
# Setup Vulkan if available
if isinstance(self.device, str) and self.device.startswith("vulkan"):
self.net.opt.use_vulkan_compute = True
self.net.set_vulkan_device(int(self.device.split(":")[1]))
self.device = torch.device("cpu")
else:
self.net.opt.use_vulkan_compute = False
w = Path(weight)
if not w.is_file():
w = next(w.glob("*.param"))
self.net.load_param(str(w))
self.net.load_model(str(w.with_suffix(".bin")))
# Load metadata
metadata_file = w.parent / "metadata.yaml"
if metadata_file.exists():
from ultralytics.utils import YAML
self.apply_metadata(YAML.load(metadata_file))
def forward(self, im: torch.Tensor) -> list[np.ndarray]:
"""Run inference using the NCNN runtime.
Args:
im (torch.Tensor): Input image tensor in BCHW format, normalized to [0, 1].
Returns:
(list[np.ndarray]): Model predictions as a list of numpy arrays, one per output layer.
"""
mat_in = self.pyncnn.Mat(im[0].cpu().numpy())
with self.net.create_extractor() as ex:
ex.input(self.net.input_names()[0], mat_in)
# Sort output names as temporary fix for pnnx issue
y = [np.array(ex.extract(x)[1])[None] for x in sorted(self.net.output_names())]
return y

@@ -0,0 +1,196 @@
# Ultralytics 🚀 AGPL-3.0 License - https://ultralytics.com/license
from __future__ import annotations
from pathlib import Path
import numpy as np
import torch
from ultralytics.utils import LOGGER
from ultralytics.utils.checks import check_requirements
from .base import BaseBackend
class ONNXBackend(BaseBackend):
"""Microsoft ONNX Runtime inference backend with optional OpenCV DNN support.
Loads and runs inference with ONNX models (.onnx files) using either Microsoft ONNX Runtime with CUDA/CoreML
execution providers, or OpenCV DNN for lightweight CPU inference. Supports IO binding for optimized GPU inference
with static input shapes.
"""
def __init__(self, weight: str | Path, device: torch.device, fp16: bool = False, format: str = "onnx"):
"""Initialize the ONNX backend.
Args:
weight (str | Path): Path to the .onnx model file.
device (torch.device): Device to run inference on.
fp16 (bool): Whether to use FP16 half-precision inference.
format (str): Inference engine, either "onnx" for ONNX Runtime or "dnn" for OpenCV DNN.
"""
assert format in {"onnx", "dnn"}, f"Unsupported ONNX format: {format}."
self.format = format
super().__init__(weight, device, fp16)
def load_model(self, weight: str | Path) -> None:
"""Load an ONNX model using ONNX Runtime or OpenCV DNN.
Args:
weight (str | Path): Path to the .onnx model file.
"""
cuda = isinstance(self.device, torch.device) and torch.cuda.is_available() and self.device.type != "cpu"
if self.format == "dnn":
# OpenCV DNN
LOGGER.info(f"Loading {weight} for ONNX OpenCV DNN inference...")
check_requirements("opencv-python>=4.5.4")
import cv2
self.net = cv2.dnn.readNetFromONNX(weight)
else:
# ONNX Runtime
LOGGER.info(f"Loading {weight} for ONNX Runtime inference...")
check_requirements(("onnx", "onnxruntime-gpu" if cuda else "onnxruntime"))
import onnxruntime
# Select execution provider
available = onnxruntime.get_available_providers()
if cuda and "CUDAExecutionProvider" in available:
providers = [("CUDAExecutionProvider", {"device_id": self.device.index}), "CPUExecutionProvider"]
elif self.device.type == "mps" and "CoreMLExecutionProvider" in available:
providers = ["CoreMLExecutionProvider", "CPUExecutionProvider"]
else:
providers = ["CPUExecutionProvider"]
if cuda:
LOGGER.warning("CUDA requested but CUDAExecutionProvider not available. Using CPU...")
self.device = torch.device("cpu")
cuda = False
LOGGER.info(
f"Using ONNX Runtime {onnxruntime.__version__} with "
f"{providers[0] if isinstance(providers[0], str) else providers[0][0]}"
)
self.session = onnxruntime.InferenceSession(weight, providers=providers)
self.output_names = [x.name for x in self.session.get_outputs()]
# Get metadata
metadata_map = self.session.get_modelmeta().custom_metadata_map
if metadata_map:
self.apply_metadata(dict(metadata_map))
# Check if dynamic shapes
self.dynamic = isinstance(self.session.get_outputs()[0].shape[0], str)
self.fp16 = "float16" in self.session.get_inputs()[0].type
# Setup IO binding for CUDA
self.use_io_binding = not self.dynamic and cuda
if self.use_io_binding:
self.io = self.session.io_binding()
self.bindings = []
for output in self.session.get_outputs():
out_fp16 = "float16" in output.type
y_tensor = torch.empty(output.shape, dtype=torch.float16 if out_fp16 else torch.float32).to(
self.device
)
self.io.bind_output(
name=output.name,
device_type=self.device.type,
device_id=self.device.index if cuda else 0,
element_type=np.float16 if out_fp16 else np.float32,
shape=tuple(y_tensor.shape),
buffer_ptr=y_tensor.data_ptr(),
)
self.bindings.append(y_tensor)
def forward(self, im: torch.Tensor) -> torch.Tensor | list[torch.Tensor] | np.ndarray:
"""Run ONNX inference using IO binding (CUDA) or standard session execution.
Args:
im (torch.Tensor): Input image tensor in BCHW format, normalized to [0, 1].
Returns:
(torch.Tensor | list[torch.Tensor] | np.ndarray): Model predictions as tensor(s) or numpy array(s).
"""
if self.format == "dnn":
# OpenCV DNN
self.net.setInput(im.cpu().numpy())
return self.net.forward()
# ONNX Runtime
if self.use_io_binding:
if self.device.type == "cpu":
im = im.cpu()
self.io.bind_input(
name="images",
device_type=im.device.type,
device_id=im.device.index if im.device.type == "cuda" else 0,
element_type=np.float16 if self.fp16 else np.float32,
shape=tuple(im.shape),
buffer_ptr=im.data_ptr(),
)
self.session.run_with_iobinding(self.io)
return self.bindings
else:
return self.session.run(self.output_names, {self.session.get_inputs()[0].name: im.cpu().numpy()})
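For reference, the execution-provider fallback in `load_model` above reduces to a small pure function. A minimal sketch — the function name and signature are illustrative, not part of the backend API, and the warning/device-reset side effects are omitted:

```python
def select_providers(device_type, device_index, available):
    """Mirror the ONNX Runtime provider-selection order above: CUDA, then CoreML for MPS, then CPU."""
    if device_type == "cuda" and "CUDAExecutionProvider" in available:
        return [("CUDAExecutionProvider", {"device_id": device_index or 0}), "CPUExecutionProvider"]
    if device_type == "mps" and "CoreMLExecutionProvider" in available:
        return ["CoreMLExecutionProvider", "CPUExecutionProvider"]
    return ["CPUExecutionProvider"]

print(select_providers("cuda", 0, ["CUDAExecutionProvider", "CPUExecutionProvider"]))
```

The CPU provider is always appended as a fallback so the session can still run kernels the accelerated provider does not support.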
class ONNXIMXBackend(ONNXBackend):
"""ONNX IMX inference backend for NXP i.MX processors.
Extends `ONNXBackend` with support for quantized models targeting NXP i.MX edge devices. Uses MCT (Model Compression
Toolkit) quantizers and custom NMS operations for optimized inference.
"""
def load_model(self, weight: str | Path) -> None:
"""Load a quantized ONNX model from an IMX model directory.
Args:
weight (str | Path): Path to the IMX model directory containing the .onnx file.
"""
check_requirements(("model-compression-toolkit>=2.4.1", "edge-mdt-cl<1.1.0", "onnxruntime-extensions"))
check_requirements(("onnx", "onnxruntime"))
import mct_quantizers as mctq
import onnxruntime
from edgemdt_cl.pytorch.nms import nms_ort # noqa - register custom NMS ops
w = Path(weight)
onnx_file = next(w.glob("*.onnx"))
LOGGER.info(f"Loading {onnx_file} for ONNX IMX inference...")
session_options = mctq.get_ort_session_options()
session_options.enable_mem_reuse = False
self.session = onnxruntime.InferenceSession(onnx_file, session_options, providers=["CPUExecutionProvider"])
self.output_names = [x.name for x in self.session.get_outputs()]
self.dynamic = isinstance(self.session.get_outputs()[0].shape[0], str)
self.fp16 = "float16" in self.session.get_inputs()[0].type
metadata_map = self.session.get_modelmeta().custom_metadata_map
if metadata_map:
self.apply_metadata(dict(metadata_map))
def forward(self, im: torch.Tensor) -> np.ndarray | list[np.ndarray] | tuple[np.ndarray, ...]:
"""Run IMX inference with task-specific output concatenation for detect, pose, and segment tasks.
Args:
im (torch.Tensor): Input image tensor in BCHW format, normalized to [0, 1].
Returns:
(np.ndarray | list[np.ndarray] | tuple[np.ndarray, ...]): Task-formatted model predictions.
"""
y = self.session.run(self.output_names, {self.session.get_inputs()[0].name: im.cpu().numpy()})
if self.task == "detect":
# boxes, conf, cls
return np.concatenate([y[0], y[1][:, :, None], y[2][:, :, None]], axis=-1)
elif self.task == "pose":
# boxes, conf, kpts
return np.concatenate([y[0], y[1][:, :, None], y[2][:, :, None], y[3]], axis=-1, dtype=y[0].dtype)
elif self.task == "segment":
return (
np.concatenate([y[0], y[1][:, :, None], y[2][:, :, None], y[3]], axis=-1, dtype=y[0].dtype),
y[4],
)
return y
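The detect branch above stitches separate box, confidence, and class outputs into a single `(B, N, 6)` array. A toy numpy sketch of that concatenation, with invented values standing in for real session outputs:

```python
import numpy as np

# Toy outputs shaped like the detect task: boxes (B, N, 4), conf (B, N), cls (B, N)
boxes = np.zeros((1, 5, 4), dtype=np.float32)
conf = np.ones((1, 5), dtype=np.float32)
cls = np.full((1, 5), 2.0, dtype=np.float32)

# conf/cls gain a trailing axis via [:, :, None] so all arrays concatenate on axis=-1
dets = np.concatenate([boxes, conf[:, :, None], cls[:, :, None]], axis=-1)
print(dets.shape)  # (1, 5, 6): xyxy + conf + cls per detection
```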


@ -0,0 +1,105 @@
# Ultralytics 🚀 AGPL-3.0 License - https://ultralytics.com/license
from __future__ import annotations
from pathlib import Path
import numpy as np
import torch
from ultralytics.utils import LOGGER
from ultralytics.utils.checks import check_requirements
from .base import BaseBackend
class OpenVINOBackend(BaseBackend):
"""Intel OpenVINO inference backend for Intel hardware acceleration.
Loads and runs inference with Intel OpenVINO IR models (*_openvino_model/ directories). Supports automatic device
selection, Intel-specific device targeting, and async inference for throughput optimization.
"""
def load_model(self, weight: str | Path) -> None:
"""Load an Intel OpenVINO IR model from a .xml/.bin file pair or model directory.
Args:
weight (str | Path): Path to the .xml file or directory containing OpenVINO model files.
"""
LOGGER.info(f"Loading {weight} for OpenVINO inference...")
check_requirements("openvino>=2024.0.0")
import openvino as ov
core = ov.Core()
device_name = "AUTO"
if isinstance(self.device, str) and self.device.startswith("intel"):
device_name = self.device.split(":")[1].upper()
self.device = torch.device("cpu")
if device_name not in core.available_devices:
LOGGER.warning(f"OpenVINO device '{device_name}' not available. Using 'AUTO' instead.")
device_name = "AUTO"
w = Path(weight)
if not w.is_file():
w = next(w.glob("*.xml"))
ov_model = core.read_model(model=str(w), weights=w.with_suffix(".bin"))
if ov_model.get_parameters()[0].get_layout().empty:
ov_model.get_parameters()[0].set_layout(ov.Layout("NCHW"))
# Load metadata
metadata_file = w.parent / "metadata.yaml"
if metadata_file.exists():
from ultralytics.utils import YAML
self.apply_metadata(YAML.load(metadata_file))
# Set inference mode
self.inference_mode = "CUMULATIVE_THROUGHPUT" if self.dynamic and self.batch > 1 else "LATENCY"
self.ov_compiled_model = core.compile_model(
ov_model,
device_name=device_name,
config={"PERFORMANCE_HINT": self.inference_mode},
)
LOGGER.info(
f"Using OpenVINO {self.inference_mode} mode for batch={self.batch} inference on "
f"{', '.join(self.ov_compiled_model.get_property('EXECUTION_DEVICES'))}..."
)
self.input_name = self.ov_compiled_model.input().get_any_name()
self.ov = ov
def forward(self, im: torch.Tensor) -> list[np.ndarray]:
"""Run Intel OpenVINO inference with sync or async execution based on inference mode.
Args:
im (torch.Tensor): Input image tensor in BCHW format, normalized to [0, 1].
Returns:
(list[np.ndarray]): Model predictions as a list of numpy arrays, one per output layer.
"""
im = im.cpu().numpy().astype(np.float32)
if self.inference_mode in {"THROUGHPUT", "CUMULATIVE_THROUGHPUT"}:
# Async inference for larger batch sizes
n = im.shape[0]
results = [None] * n
def callback(request, userdata):
"""Store async inference result in the preallocated results list at the given index."""
results[userdata] = request.results
async_queue = self.ov.AsyncInferQueue(self.ov_compiled_model)
async_queue.set_callback(callback)
for i in range(n):
async_queue.start_async(inputs={self.input_name: im[i : i + 1]}, userdata=i)
async_queue.wait_all()
y = [list(r.values()) for r in results]
y = [np.concatenate(x) for x in zip(*y)]
else:
# Sync inference for LATENCY mode
y = list(self.ov_compiled_model(im).values())
return y
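The callback/`zip` reassembly above turns per-image async results back into batched arrays. A self-contained sketch, with plain dicts standing in for OpenVINO infer-request results:

```python
import numpy as np

# Four per-image "requests", each yielding a dict of named outputs
results = [
    {"out0": np.full((1, 3), i, dtype=np.float32), "out1": np.full((1, 2), -i, dtype=np.float32)}
    for i in range(4)
]
y = [list(r.values()) for r in results]   # per-image output lists
y = [np.concatenate(x) for x in zip(*y)]  # one batched array per output name
print([a.shape for a in y])  # [(4, 3), (4, 2)]
```

Because `userdata` indexes a preallocated list in the real backend, batch order is preserved even when requests complete out of order.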


@ -0,0 +1,79 @@
# Ultralytics 🚀 AGPL-3.0 License - https://ultralytics.com/license
from __future__ import annotations
from pathlib import Path
import numpy as np
import torch
from ultralytics.utils import ARM64, LOGGER
from ultralytics.utils.checks import check_requirements
from .base import BaseBackend
class PaddleBackend(BaseBackend):
"""Baidu PaddlePaddle inference backend.
Loads and runs inference with Baidu PaddlePaddle models (*_paddle_model/ directories). Supports both CPU and GPU
execution with automatic device configuration and memory pool initialization.
"""
def load_model(self, weight: str | Path) -> None:
"""Load a Baidu PaddlePaddle model from a directory containing .json and .pdiparams files.
Args:
weight (str | Path): Path to the model directory or .pdiparams file.
"""
cuda = isinstance(self.device, torch.device) and torch.cuda.is_available() and self.device.type != "cpu"
LOGGER.info(f"Loading {weight} for PaddlePaddle inference...")
if cuda:
check_requirements("paddlepaddle-gpu>=3.0.0,!=3.3.0")
elif ARM64:
check_requirements("paddlepaddle==3.0.0")
else:
check_requirements("paddlepaddle>=3.0.0,!=3.3.0")
import paddle.inference as pdi
w = Path(weight)
model_file, params_file = None, None
if w.is_dir():
model_file = next(w.rglob("*.json"), None)
params_file = next(w.rglob("*.pdiparams"), None)
elif w.suffix == ".pdiparams":
model_file = w.with_name("model.json")
params_file = w
if not (model_file and params_file and model_file.is_file() and params_file.is_file()):
raise FileNotFoundError(f"Paddle model not found in {w}. Both .json and .pdiparams files are required.")
config = pdi.Config(str(model_file), str(params_file))
if cuda:
config.enable_use_gpu(memory_pool_init_size_mb=2048, device_id=self.device.index or 0)
self.predictor = pdi.create_predictor(config)
self.input_handle = self.predictor.get_input_handle(self.predictor.get_input_names()[0])
self.output_names = self.predictor.get_output_names()
# Load metadata
metadata_file = (w if w.is_dir() else w.parent) / "metadata.yaml"
if metadata_file.exists():
from ultralytics.utils import YAML
self.apply_metadata(YAML.load(metadata_file))
def forward(self, im: torch.Tensor) -> list[np.ndarray]:
"""Run Baidu PaddlePaddle inference.
Args:
im (torch.Tensor): Input image tensor in BCHW format, normalized to [0, 1].
Returns:
(list[np.ndarray]): Model predictions as a list of numpy arrays, one per output handle.
"""
self.input_handle.copy_from_cpu(im.cpu().numpy().astype(np.float32))
self.predictor.run()
return [self.predictor.get_output_handle(x).copy_to_cpu() for x in self.output_names]
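The `.json`/`.pdiparams` discovery above can be exercised standalone. A sketch — the helper name is illustrative — using a temporary directory in place of a real `*_paddle_model/` export:

```python
import tempfile
from pathlib import Path

def find_paddle_files(w):
    """Mirror the model/params file discovery above for a directory or .pdiparams path."""
    if w.is_dir():
        return next(w.rglob("*.json"), None), next(w.rglob("*.pdiparams"), None)
    if w.suffix == ".pdiparams":
        return w.with_name("model.json"), w
    return None, None

with tempfile.TemporaryDirectory() as d:
    root = Path(d)
    (root / "model.json").write_text("{}")
    (root / "model.pdiparams").write_bytes(b"")
    model_file, params_file = find_paddle_files(root)
    print(model_file.name, params_file.name)  # model.json model.pdiparams
```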


@ -0,0 +1,137 @@
# Ultralytics 🚀 AGPL-3.0 License - https://ultralytics.com/license
from __future__ import annotations
from pathlib import Path
from typing import Any
import torch
import torch.nn as nn
from ultralytics.utils import IS_JETSON, LOGGER, is_jetson
from .base import BaseBackend
class PyTorchBackend(BaseBackend):
"""PyTorch inference backend for native model execution.
Loads and runs inference with native PyTorch models (.pt checkpoint files) or pre-loaded nn.Module
instances. Supports model layer fusion, FP16 precision, and NVIDIA Jetson compatibility.
"""
def __init__(
self,
weight: str | Path | nn.Module,
device: torch.device,
fp16: bool = False,
fuse: bool = True,
verbose: bool = True,
):
"""Initialize the PyTorch backend.
Args:
weight (str | Path | nn.Module): Path to the .pt model file or a pre-loaded nn.Module instance.
device (torch.device): Device to run inference on (e.g., 'cpu', 'cuda:0').
fp16 (bool): Whether to use FP16 half-precision inference.
fuse (bool): Whether to fuse Conv2D + BatchNorm layers for optimization.
verbose (bool): Whether to print verbose model loading messages.
"""
self.fuse = fuse
self.verbose = verbose
super().__init__(weight, device, fp16)
def load_model(self, weight: str | torch.nn.Module) -> None:
"""Load a PyTorch model from a checkpoint file or nn.Module instance.
Args:
weight (str | torch.nn.Module): Path to the .pt checkpoint or a pre-loaded module.
"""
from ultralytics.nn.tasks import load_checkpoint
if isinstance(weight, torch.nn.Module):
if self.fuse and hasattr(weight, "fuse"):
if IS_JETSON and is_jetson(jetpack=5):
weight = weight.to(self.device)
weight = weight.fuse(verbose=self.verbose)
model = weight.to(self.device)
else:
model, _ = load_checkpoint(weight, device=self.device, fuse=self.fuse)
# Extract model attributes
if hasattr(model, "kpt_shape"):
self.kpt_shape = model.kpt_shape
self.stride = max(int(model.stride.max()), 32) if hasattr(model, "stride") else 32
self.names = model.module.names if hasattr(model, "module") else getattr(model, "names", {})
self.channels = model.yaml.get("channels", 3) if hasattr(model, "yaml") else 3
model.half() if self.fp16 else model.float()
for p in model.parameters():
p.requires_grad = False
self.model = model
self.end2end = getattr(model, "end2end", False)
def forward(
self, im: torch.Tensor, augment: bool = False, visualize: bool = False, embed: list | None = None, **kwargs: Any
) -> torch.Tensor | list[torch.Tensor]:
"""Run native PyTorch inference with support for augmentation, visualization, and embeddings.
Args:
im (torch.Tensor): Input image tensor in BCHW format, normalized to [0, 1].
augment (bool): Whether to apply test-time augmentation.
visualize (bool): Whether to visualize intermediate feature maps.
embed (list | None): List of layer indices to extract embeddings from, or None.
**kwargs (Any): Additional keyword arguments passed to the model forward method.
Returns:
(torch.Tensor | list[torch.Tensor]): Model predictions as tensor(s).
"""
return self.model(im, augment=augment, visualize=visualize, embed=embed, **kwargs)
class TorchScriptBackend(BaseBackend):
"""PyTorch TorchScript inference backend for serialized model execution.
Loads and runs inference with TorchScript models (.torchscript files) created via torch.jit.trace or
torch.jit.script. Supports FP16 precision and embedded metadata extraction.
"""
def __init__(self, weight: str | Path, device: torch.device, fp16: bool = False):
"""Initialize the TorchScript backend.
Args:
weight (str | Path): Path to the .torchscript model file.
device (torch.device): Device to run inference on (e.g., 'cpu', 'cuda:0').
fp16 (bool): Whether to use FP16 half-precision inference.
"""
super().__init__(weight, device, fp16)
def load_model(self, weight: str) -> None:
"""Load a TorchScript model from a .torchscript file with optional embedded metadata.
Args:
weight (str): Path to the .torchscript model file.
"""
import json
import torchvision # noqa - required for TorchScript model deserialization
LOGGER.info(f"Loading {weight} for TorchScript inference...")
extra_files = {"config.txt": ""}
self.model = torch.jit.load(weight, _extra_files=extra_files, map_location=self.device)
self.model.half() if self.fp16 else self.model.float()
if extra_files["config.txt"]:
self.apply_metadata(json.loads(extra_files["config.txt"], object_hook=lambda x: dict(x.items())))
def forward(self, im: torch.Tensor) -> torch.Tensor | list[torch.Tensor]:
"""Run TorchScript inference.
Args:
im (torch.Tensor): Input image tensor in BCHW format, normalized to [0, 1].
Returns:
(torch.Tensor | list[torch.Tensor]): Model predictions as tensor(s).
"""
return self.model(im)
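The metadata embedded in `extra_files["config.txt"]` above is plain JSON, so the parse is an ordinary `json.loads` (the `object_hook` simply rebuilds each object as a dict). A minimal sketch with invented sample values:

```python
import json

# Sample config.txt contents (illustrative values, not from a real export)
config_txt = '{"stride": 32, "task": "detect", "names": {"0": "person"}}'
metadata = json.loads(config_txt, object_hook=lambda x: dict(x.items()))
print(metadata["stride"], metadata["task"])  # 32 detect
```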


@ -0,0 +1,70 @@
# Ultralytics 🚀 AGPL-3.0 License - https://ultralytics.com/license
from __future__ import annotations
from pathlib import Path
import torch
from ultralytics.utils import LOGGER
from ultralytics.utils.checks import check_requirements, is_rockchip
from .base import BaseBackend
class RKNNBackend(BaseBackend):
"""Rockchip RKNN inference backend for Rockchip NPU hardware.
Loads and runs inference with RKNN models (.rknn files) using the RKNN-Toolkit-Lite2 runtime. Only supported on
Rockchip devices with NPU hardware (e.g., RK3588, RK3566).
"""
def load_model(self, weight: str | Path) -> None:
"""Load a Rockchip RKNN model from a .rknn file or model directory.
Args:
weight (str | Path): Path to the .rknn file or directory containing the model.
Raises:
OSError: If not running on a Rockchip device.
RuntimeError: If model loading or runtime initialization fails.
"""
if not is_rockchip():
raise OSError("RKNN inference is only supported on Rockchip devices.")
LOGGER.info(f"Loading {weight} for RKNN inference...")
check_requirements("rknn-toolkit-lite2")
from rknnlite.api import RKNNLite
w = Path(weight)
if not w.is_file():
w = next(w.rglob("*.rknn"))
self.model = RKNNLite()
ret = self.model.load_rknn(str(w))
if ret != 0:
raise RuntimeError(f"Failed to load RKNN model: {ret}")
ret = self.model.init_runtime()
if ret != 0:
raise RuntimeError(f"Failed to init RKNN runtime: {ret}")
# Load metadata
metadata_file = w.parent / "metadata.yaml"
if metadata_file.exists():
from ultralytics.utils import YAML
self.apply_metadata(YAML.load(metadata_file))
def forward(self, im: torch.Tensor) -> list:
"""Run inference on the Rockchip NPU.
Args:
im (torch.Tensor): Input image tensor in BCHW format, normalized to [0, 1].
Returns:
(list): Model predictions as a list of output arrays.
"""
im = (im.cpu().numpy() * 255).astype("uint8")
im = im if isinstance(im, (list, tuple)) else [im]
return self.model.inference(inputs=im)
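The input preparation above rescales the normalized float tensor to uint8 for the NPU; `astype` truncates toward zero rather than rounding. A numpy sketch of that conversion:

```python
import numpy as np

# Normalized [0, 1] input, as produced upstream of the backend
im = np.array([[0.0, 0.5, 1.0]], dtype=np.float32)
im_u8 = (im * 255).astype("uint8")  # 127.5 truncates to 127
print(im_u8.tolist())  # [[0, 127, 255]]
```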


@ -0,0 +1,183 @@
# Ultralytics 🚀 AGPL-3.0 License - https://ultralytics.com/license
from __future__ import annotations
import ast
import json
import platform
import zipfile
from pathlib import Path
import numpy as np
import torch
from ultralytics.utils import LOGGER
from .base import BaseBackend
class TensorFlowBackend(BaseBackend):
"""Google TensorFlow inference backend supporting multiple serialization formats.
Loads and runs inference with Google TensorFlow models in SavedModel, GraphDef (.pb), TFLite (.tflite), and Edge TPU
formats. Handles quantized model dequantization and task-specific output formatting.
"""
def __init__(self, weight: str | Path, device: torch.device, fp16: bool = False, format: str = "saved_model"):
"""Initialize the Google TensorFlow backend.
Args:
weight (str | Path): Path to the SavedModel directory, .pb file, or .tflite file.
device (torch.device): Device to run inference on.
fp16 (bool): Whether to use FP16 half-precision inference.
format (str): Model format, one of "saved_model", "pb", "tflite", or "edgetpu".
"""
assert format in {"saved_model", "pb", "tflite", "edgetpu"}, f"Unsupported TensorFlow format: {format}."
self.format = format
super().__init__(weight, device, fp16)
def load_model(self, weight: str | Path) -> None:
"""Load a Google TensorFlow model in SavedModel, GraphDef, TFLite, or Edge TPU format.
Args:
weight (str | Path): Path to the model file or directory.
"""
import tensorflow as tf
if self.format == "saved_model":
LOGGER.info(f"Loading {weight} for TensorFlow SavedModel inference...")
self.model = tf.saved_model.load(weight)
# Load metadata
metadata_file = Path(weight) / "metadata.yaml"
if metadata_file.exists():
from ultralytics.utils import YAML
self.apply_metadata(YAML.load(metadata_file))
elif self.format == "pb":
LOGGER.info(f"Loading {weight} for TensorFlow GraphDef inference...")
from ultralytics.utils.export.tensorflow import gd_outputs
def wrap_frozen_graph(gd, inputs, outputs):
"""Wrap a TensorFlow frozen graph for inference by pruning to specified input/output nodes."""
x = tf.compat.v1.wrap_function(lambda: tf.compat.v1.import_graph_def(gd, name=""), [])
ge = x.graph.as_graph_element
return x.prune(tf.nest.map_structure(ge, inputs), tf.nest.map_structure(ge, outputs))
gd = tf.Graph().as_graph_def()
with open(weight, "rb") as f:
gd.ParseFromString(f.read())
self.frozen_func = wrap_frozen_graph(gd, inputs="x:0", outputs=gd_outputs(gd))
# Try to find metadata
try:
metadata_file = next(
Path(weight).resolve().parent.rglob(f"{Path(weight).stem}_saved_model*/metadata.yaml")
)
from ultralytics.utils import YAML
self.apply_metadata(YAML.load(metadata_file))
except StopIteration:
pass
else: # tflite and edgetpu
try:
from tflite_runtime.interpreter import Interpreter, load_delegate
self.tf = None
except ImportError:
import tensorflow as tf
self.tf = tf
Interpreter, load_delegate = tf.lite.Interpreter, tf.lite.experimental.load_delegate
if self.format == "edgetpu":
device = self.device[3:] if str(self.device).startswith("tpu") else ":0"
LOGGER.info(f"Loading {weight} on device {device[1:]} for TensorFlow Lite Edge TPU inference...")
delegate = {"Linux": "libedgetpu.so.1", "Darwin": "libedgetpu.1.dylib", "Windows": "edgetpu.dll"}[
platform.system()
]
self.interpreter = Interpreter(
model_path=str(weight),
experimental_delegates=[load_delegate(delegate, options={"device": device})],
)
self.device = torch.device("cpu") # Edge TPU runs on CPU from PyTorch's perspective
else:
LOGGER.info(f"Loading {weight} for TensorFlow Lite inference...")
self.interpreter = Interpreter(model_path=weight)
self.interpreter.allocate_tensors()
self.input_details = self.interpreter.get_input_details()
self.output_details = self.interpreter.get_output_details()
# Load metadata
try:
with zipfile.ZipFile(weight, "r") as zf:
name = zf.namelist()[0]
contents = zf.read(name).decode("utf-8")
if name == "metadata.json":
self.apply_metadata(json.loads(contents))
else:
self.apply_metadata(ast.literal_eval(contents))
except (zipfile.BadZipFile, SyntaxError, ValueError, json.JSONDecodeError):
pass
def forward(self, im: torch.Tensor) -> list[np.ndarray]:
"""Run Google TensorFlow inference with format-specific execution and output post-processing.
Args:
im (torch.Tensor): Input image tensor in BHWC format (converted from BCHW by AutoBackend).
Returns:
(list[np.ndarray]): Model predictions as a list of numpy arrays.
"""
im = im.cpu().numpy()
if self.format == "saved_model":
y = self.model.serving_default(im)
if not isinstance(y, list):
y = [y]
elif self.format == "pb":
import tensorflow as tf
y = self.frozen_func(x=tf.constant(im))
else:
h, w = im.shape[1:3]
details = self.input_details[0]
is_int = details["dtype"] in {np.int8, np.int16}
if is_int:
scale, zero_point = details["quantization"]
im = (im / scale + zero_point).astype(details["dtype"])
self.interpreter.set_tensor(details["index"], im)
self.interpreter.invoke()
y = []
for output in self.output_details:
x = self.interpreter.get_tensor(output["index"])
if is_int:
scale, zero_point = output["quantization"]
x = (x.astype(np.float32) - zero_point) * scale
if x.ndim == 3:
# Denormalize xywh by image size
if x.shape[-1] == 6 or self.end2end:
x[:, :, [0, 2]] *= w
x[:, :, [1, 3]] *= h
if self.task == "pose":
x[:, :, 6::3] *= w
x[:, :, 7::3] *= h
else:
x[:, [0, 2]] *= w
x[:, [1, 3]] *= h
if self.task == "pose":
x[:, 5::3] *= w
x[:, 6::3] *= h
y.append(x)
if self.task == "segment": # segment with (det, proto) output order reversed
if len(y[1].shape) != 4:
y = list(reversed(y)) # should be y = (1, 116, 8400), (1, 160, 160, 32)
if y[1].shape[-1] == 6: # end-to-end model
y = [y[1]]
else:
y[1] = np.transpose(y[1], (0, 3, 1, 2)) # should be y = (1, 116, 8400), (1, 32, 160, 160)
return [x if isinstance(x, np.ndarray) else x.numpy() for x in y]
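The int8 path above applies affine quantization on input (`q = x / scale + zero_point`) and the inverse on output. A self-contained numpy sketch of the round trip — the scale and zero point here are illustrative, typical of a [0, 1] input range:

```python
import numpy as np

scale, zero_point = 0.00392157, -128  # ~1/255, illustrative quantization parameters
x = np.array([0.0, 0.25, 0.5, 1.0], dtype=np.float32)

q = (x / scale + zero_point).astype(np.int8)             # quantize for the interpreter
x_hat = (q.astype(np.float32) - zero_point) * scale      # dequantize the output
print(np.abs(x_hat - x).max() < scale)  # round-trip error stays under one quantization step
```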


@ -0,0 +1,144 @@
# Ultralytics 🚀 AGPL-3.0 License - https://ultralytics.com/license
from __future__ import annotations
import json
from collections import OrderedDict, namedtuple
from pathlib import Path
import numpy as np
import torch
from ultralytics.utils import IS_JETSON, LINUX, LOGGER, PYTHON_VERSION
from ultralytics.utils.checks import check_requirements, check_version
from .base import BaseBackend
class TensorRTBackend(BaseBackend):
"""NVIDIA TensorRT inference backend for GPU-accelerated deployment.
Loads and runs inference with NVIDIA TensorRT serialized engines (.engine files). Supports both TensorRT 7-9 and
TensorRT 10+ APIs, dynamic input shapes, FP16 precision, and DLA core offloading.
"""
def load_model(self, weight: str | Path) -> None:
"""Load an NVIDIA TensorRT engine from a serialized .engine file.
Args:
weight (str | Path): Path to the .engine file with optional embedded metadata.
"""
LOGGER.info(f"Loading {weight} for TensorRT inference...")
if IS_JETSON and check_version(PYTHON_VERSION, "<=3.8.10"):
check_requirements("numpy==1.23.5")
try:
import tensorrt as trt
except ImportError:
if LINUX:
check_requirements("tensorrt>7.0.0,!=10.1.0")
import tensorrt as trt
check_version(trt.__version__, ">=7.0.0", hard=True)
check_version(trt.__version__, "!=10.1.0", msg="https://github.com/ultralytics/ultralytics/pull/14239")
if self.device.type == "cpu":
self.device = torch.device("cuda:0")
Binding = namedtuple("Binding", ("name", "dtype", "shape", "data", "ptr"))
logger = trt.Logger(trt.Logger.INFO)
# Read engine file
with open(weight, "rb") as f, trt.Runtime(logger) as runtime:
try:
meta_len = int.from_bytes(f.read(4), byteorder="little")
metadata = json.loads(f.read(meta_len).decode("utf-8"))
dla = metadata.get("dla", None)
if dla is not None:
runtime.DLA_core = int(dla)
except UnicodeDecodeError:
f.seek(0)
metadata = None
engine = runtime.deserialize_cuda_engine(f.read())
self.apply_metadata(metadata)
try:
self.context = engine.create_execution_context()
except Exception as e:
LOGGER.error("TensorRT model exported with a different version than expected\n")
raise e
# Setup bindings
self.bindings = OrderedDict()
self.output_names = []
self.fp16 = False
self.dynamic = False
self.is_trt10 = not hasattr(engine, "num_bindings")
num = range(engine.num_io_tensors) if self.is_trt10 else range(engine.num_bindings)
for i in num:
if self.is_trt10:
name = engine.get_tensor_name(i)
dtype = trt.nptype(engine.get_tensor_dtype(name))
is_input = engine.get_tensor_mode(name) == trt.TensorIOMode.INPUT
shape = tuple(engine.get_tensor_shape(name))
profile_shape = tuple(engine.get_tensor_profile_shape(name, 0)[2]) if is_input else None
else:
name = engine.get_binding_name(i)
dtype = trt.nptype(engine.get_binding_dtype(i))
is_input = engine.binding_is_input(i)
shape = tuple(engine.get_binding_shape(i))
profile_shape = tuple(engine.get_profile_shape(0, i)[1]) if is_input else None
if is_input:
if -1 in shape:
self.dynamic = True
if self.is_trt10:
self.context.set_input_shape(name, profile_shape)
else:
self.context.set_binding_shape(i, profile_shape)
if dtype == np.float16:
self.fp16 = True
else:
self.output_names.append(name)
shape = (
tuple(self.context.get_tensor_shape(name))
if self.is_trt10
else tuple(self.context.get_binding_shape(i))
)
im = torch.from_numpy(np.empty(shape, dtype=dtype)).to(self.device)
self.bindings[name] = Binding(name, dtype, shape, im, int(im.data_ptr()))
self.binding_addrs = OrderedDict((n, d.ptr) for n, d in self.bindings.items())
self.model = engine
def forward(self, im: torch.Tensor) -> list[torch.Tensor]:
"""Run NVIDIA TensorRT inference with dynamic shape handling.
Args:
im (torch.Tensor): Input image tensor in BCHW format on the CUDA device.
Returns:
(list[torch.Tensor]): Model predictions as a list of tensors on the CUDA device.
"""
if self.dynamic and im.shape != self.bindings["images"].shape:
if self.is_trt10:
self.context.set_input_shape("images", im.shape)
self.bindings["images"] = self.bindings["images"]._replace(shape=im.shape)
for name in self.output_names:
self.bindings[name].data.resize_(tuple(self.context.get_tensor_shape(name)))
else:
i = self.model.get_binding_index("images")
self.context.set_binding_shape(i, im.shape)
self.bindings["images"] = self.bindings["images"]._replace(shape=im.shape)
for name in self.output_names:
i = self.model.get_binding_index(name)
self.bindings[name].data.resize_(tuple(self.context.get_binding_shape(i)))
s = self.bindings["images"].shape
assert im.shape == s, f"input size {im.shape} {'>' if self.dynamic else 'not equal to'} max model size {s}"
self.binding_addrs["images"] = int(im.data_ptr())
self.context.execute_v2(list(self.binding_addrs.values()))
return [self.bindings[x].data for x in sorted(self.output_names)]
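The engine files read in `load_model` above prepend a 4-byte little-endian length followed by JSON metadata before the serialized engine. A sketch of writing and reading that header end to end with an in-memory buffer (metadata values are illustrative):

```python
import io
import json

meta = {"stride": 32, "batch": 1, "dla": None}
payload = json.dumps(meta).encode("utf-8")

# Write: 4-byte little-endian length, JSON metadata, then the engine bytes
buf = io.BytesIO(len(payload).to_bytes(4, byteorder="little") + payload + b"<engine bytes>")

# Read it back the way load_model does
meta_len = int.from_bytes(buf.read(4), byteorder="little")
decoded = json.loads(buf.read(meta_len).decode("utf-8"))
engine_bytes = buf.read()
print(decoded["stride"], engine_bytes)
```

The `UnicodeDecodeError` fallback in `load_model` handles engines serialized without this header by rewinding and deserializing the whole file.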


@ -0,0 +1,45 @@
# Ultralytics 🚀 AGPL-3.0 License - https://ultralytics.com/license
from __future__ import annotations
from pathlib import Path
import torch
from ultralytics.utils.checks import check_requirements
from .base import BaseBackend
class TritonBackend(BaseBackend):
"""NVIDIA Triton Inference Server backend for remote model serving.
Connects to and runs inference with models hosted on an NVIDIA Triton Inference Server instance via HTTP or gRPC
protocols. The model is specified using a triton:// URL scheme.
"""
def load_model(self, weight: str | Path) -> None:
"""Connect to a remote model on an NVIDIA Triton Inference Server.
Args:
weight (str | Path): Triton model URL (e.g., 'http://localhost:8000/model_name').
"""
check_requirements("tritonclient[all]")
from ultralytics.utils.triton import TritonRemoteModel
self.model = TritonRemoteModel(weight)
# Copy metadata from Triton model
if hasattr(self.model, "metadata"):
self.apply_metadata(self.model.metadata)
def forward(self, im: torch.Tensor) -> list:
"""Run inference via the NVIDIA Triton Inference Server.
Args:
im (torch.Tensor): Input image tensor in BCHW format, normalized to [0, 1].
Returns:
(list): Model predictions as a list of numpy arrays from the Triton server.
"""
return self.model(im.cpu().numpy())


@ -253,7 +253,7 @@ class BaseSolution:
f" {', '.join([f'{v} {self.names[k]}' for k, v in counts.items()])}\n"
f"Speed: {track_or_predict_speed:.1f}ms {track_or_predict}, "
f"{solution_speed:.1f}ms solution per image at shape "
f"(1, {getattr(self.model, 'ch', 3)}, {result.plot_im.shape[0]}, {result.plot_im.shape[1]})\n"
f"(1, {getattr(self.model, 'channels', 3)}, {result.plot_im.shape[0]}, {result.plot_im.shape[1]})\n"
)
return result