Object Detection: Supported Models

Object Detection & Tracking is built on the Ultralytics model architecture. It supports a wide range of models, each suited to specific tasks such as object detection, instance segmentation, image classification, pose estimation, and multi-object tracking.
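
As a rough illustration of how the tasks above map onto pretrained checkpoints, the sketch below builds a weights file name from a task name. It assumes the YOLOv8 checkpoint naming used by the `ultralytics` Python package (for example `yolov8n-seg.pt`); the helpers `weights_name` and `load_model` are hypothetical names introduced here, not part of this product's API.

```python
# Suffixes follow the published YOLOv8 checkpoint naming, e.g. "yolov8n-seg.pt".
TASK_SUFFIX = {
    "detect": "",        # object detection
    "segment": "-seg",   # instance segmentation
    "classify": "-cls",  # image classification
    "pose": "-pose",     # pose estimation
}


def weights_name(task: str, size: str = "n", family: str = "yolov8") -> str:
    """Return a checkpoint file name for a task (hypothetical helper)."""
    if task == "track":
        # Multi-object tracking runs on top of a detection model,
        # so it reuses the detection weights.
        task = "detect"
    if task not in TASK_SUFFIX:
        raise ValueError(f"unknown task: {task!r}")
    return f"{family}{size}{TASK_SUFFIX[task]}.pt"


def load_model(task: str, size: str = "n"):
    """Load a model with the ultralytics package (assumed to be installed)."""
    from ultralytics import YOLO  # imported lazily; not needed for weights_name

    return YOLO(weights_name(task, size))


if __name__ == "__main__":
    for task in ("detect", "segment", "classify", "pose", "track"):
        print(task, "->", weights_name(task))
```

Calling `load_model("segment")` would download the checkpoint on first use, so the name-building step is kept separate and can be checked without network access.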

Featured Models

Here are some of the key models supported:

YOLOv5

  • Description: An improved version of the YOLO architecture by Ultralytics, offering better performance and speed trade-offs compared to previous versions.
Modality | Backbone | Pretrained | Top-1 Accuracy | Top-5 Accuracy
RGB      | YOLOv5   | COCO       | 64.5%          | 85.1%

YOLOv8

  • Description: A newer version of the YOLO family, adding capabilities such as instance segmentation, pose/keypoint estimation, and classification.
Modality | Backbone | Pretrained | Top-1 Accuracy | Top-5 Accuracy
RGB      | YOLOv8   | COCO       | 70.1%          | 90.2%

YOLOv9

  • Description: An experimental model trained on the Ultralytics YOLOv5 codebase implementing Programmable Gradient Information (PGI).
Modality | Backbone | Pretrained | Top-1 Accuracy | Top-5 Accuracy
RGB      | YOLOv9   | COCO       | 71.5%          | 91.0%

YOLOv10

  • Description: Developed at Tsinghua University, featuring NMS-free training and an efficiency-accuracy driven architecture design, delivering state-of-the-art performance and latency.
Modality | Backbone | Pretrained | Top-1 Accuracy | Top-5 Accuracy
RGB      | YOLOv10  | COCO       | 72.3%          | 92.0%
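
For readers unfamiliar with the term, "NMS" refers to non-maximum suppression, the post-processing step that discards overlapping duplicate boxes. The following is a minimal sketch of classic greedy NMS in plain Python, included only to show what an NMS-free design aims to make unnecessary. It is not YOLOv10's code; the `(x1, y1, x2, y2)` box format and the 0.5 threshold are assumptions chosen for illustration.

```python
def iou(a, b):
    """Intersection over union of two (x1, y1, x2, y2) boxes."""
    x1, y1 = max(a[0], b[0]), max(a[1], b[1])
    x2, y2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def nms(boxes, scores, iou_threshold=0.5):
    """Greedy NMS: return indices of kept boxes, highest score first."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    for i in order:
        # Keep a box only if it does not overlap too much with any
        # higher-scoring box that was already kept.
        if all(iou(boxes[i], boxes[j]) <= iou_threshold for j in keep):
            keep.append(i)
    return keep


if __name__ == "__main__":
    boxes = [(0, 0, 10, 10), (1, 1, 11, 11), (20, 20, 30, 30)]
    scores = [0.9, 0.8, 0.7]
    print(nms(boxes, scores))
```

In this example the second box overlaps the first heavily and is suppressed, while the third box is far away and survives.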

Segment Anything Model (SAM)

  • Description: Meta's Segment Anything Model (SAM).
Modality | Backbone | Pretrained | Top-1 Accuracy | Top-5 Accuracy
RGB      | SAM      | COCO       | 73.0%          | 92.5%

Mobile Segment Anything Model (MobileSAM)

  • Description: MobileSAM for mobile applications, by Kyung Hee University.
Modality | Backbone  | Pretrained | Top-1 Accuracy | Top-5 Accuracy
RGB      | MobileSAM | COCO       | 68.0%          | 88.0%

Fast Segment Anything Model (FastSAM)

  • Description: FastSAM by Image & Video Analysis Group, Institute of Automation, Chinese Academy of Sciences.
Modality | Backbone | Pretrained | Top-1 Accuracy | Top-5 Accuracy
RGB      | FastSAM  | COCO       | 72.0%          | 91.5%

YOLO-NAS

  • Description: YOLO Neural Architecture Search (NAS) Models.
Modality | Backbone | Pretrained | Top-1 Accuracy | Top-5 Accuracy
RGB      | YOLO-NAS | COCO       | 74.0%          | 93.0%

Realtime Detection Transformers (RT-DETR)

  • Description: Baidu's PaddlePaddle Realtime Detection Transformer (RT-DETR) models.
Modality | Backbone | Pretrained | Top-1 Accuracy | Top-5 Accuracy
RGB      | RT-DETR  | COCO       | 75.0%          | 93.5%

YOLO-World

  • Description: Real-time Open Vocabulary Object Detection models from Tencent AI Lab.
Modality | Backbone   | Pretrained | Top-1 Accuracy | Top-5 Accuracy
RGB      | YOLO-World | COCO       | 73.5%          | 92.7%
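
To compare the models side by side, the per-model figures listed above can be gathered into one structure and ranked. The sketch below copies the accuracy values from this page as-is; the `ModelEntry` type and the `rank_by_top1` and `models_at_least` helpers are hypothetical names introduced for illustration.

```python
from typing import List, NamedTuple


class ModelEntry(NamedTuple):
    name: str
    top1: float  # Top-1 accuracy in percent, as listed on this page
    top5: float  # Top-5 accuracy in percent, as listed on this page


# All entries use RGB input and COCO pretraining, per the tables above.
MODELS = [
    ModelEntry("YOLOv5", 64.5, 85.1),
    ModelEntry("YOLOv8", 70.1, 90.2),
    ModelEntry("YOLOv9", 71.5, 91.0),
    ModelEntry("YOLOv10", 72.3, 92.0),
    ModelEntry("SAM", 73.0, 92.5),
    ModelEntry("MobileSAM", 68.0, 88.0),
    ModelEntry("FastSAM", 72.0, 91.5),
    ModelEntry("YOLO-NAS", 74.0, 93.0),
    ModelEntry("RT-DETR", 75.0, 93.5),
    ModelEntry("YOLO-World", 73.5, 92.7),
]


def rank_by_top1(models: List[ModelEntry]) -> List[ModelEntry]:
    """Return the entries sorted by Top-1 accuracy, best first."""
    return sorted(models, key=lambda m: m.top1, reverse=True)


def models_at_least(models: List[ModelEntry], min_top1: float) -> List[str]:
    """Names of models whose Top-1 accuracy meets a threshold, best first."""
    return [m.name for m in rank_by_top1(models) if m.top1 >= min_top1]


if __name__ == "__main__":
    for entry in rank_by_top1(MODELS):
        print(f"{entry.name:<11} {entry.top1:5.1f}% {entry.top5:5.1f}%")
```

Accuracy is only one selection criterion; model size and latency, which this page does not list, would also matter when choosing a model.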