diff --git a/docs/en/models/yoloe.md b/docs/en/models/yoloe.md
index 1e07cba54b..ae84bb1cd5 100644
--- a/docs/en/models/yoloe.md
+++ b/docs/en/models/yoloe.md
@@ -12,6 +12,17 @@ keywords: YOLOE, open-vocabulary detection, real-time object detection, instance
 [YOLOE (Real-Time Seeing Anything)](https://arxiv.org/html/2503.07465v1) is a new advancement in zero-shot, promptable YOLO models, designed for **open-vocabulary** detection and segmentation. Unlike previous YOLO models limited to fixed categories, YOLOE uses text, image, or internal vocabulary prompts, enabling real-time detection of any object class. Built upon YOLOv10 and inspired by [YOLO-World](yolo-world.md), YOLOE achieves **state-of-the-art zero-shot performance** with minimal impact on speed and accuracy.
+

+
+
+
+ Watch: How to use YOLOE with Ultralytics Python package: Open Vocabulary & Real-Time Seeing Anything 🚀 +

+
 Compared to earlier YOLO models, YOLOE significantly boosts efficiency and accuracy. It improves by **+3.5 AP** over YOLO-Worldv2 on LVIS while using just a third of the training resources and achieving 1.4× faster inference speeds. Fine-tuned on COCO, YOLOE-v8-large surpasses YOLOv8-L by **0.1 mAP**, using nearly **4× less training time**. This demonstrates YOLOE's exceptional balance of accuracy, efficiency, and versatility. The sections below explore YOLOE's architecture, benchmark comparisons, and integration with the [Ultralytics](https://www.ultralytics.com/) framework.
 ## Architecture Overview