Object Detection: Supported Models
Object Detection & Tracking is built on the Ultralytics model framework. It supports a wide range of models, each suited to specific tasks such as object detection, instance segmentation, image classification, pose estimation, and multi-object tracking.
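
As a rough sketch of how the families listed on this page could be loaded, the snippet below maps each family to a class exported by the `ultralytics` Python package and to an example checkpoint. The `MODEL_REGISTRY` table, the `resolve` and `load_model` helpers, and the checkpoint file names are assumptions made for illustration; check the names against the Ultralytics release you have installed.

```python
import importlib

# Hypothetical lookup table: family -> (ultralytics class name, example checkpoint).
# Checkpoint file names are illustrative assumptions, not taken from this page.
MODEL_REGISTRY = {
    "YOLOv5": ("YOLO", "yolov5nu.pt"),
    "YOLOv8": ("YOLO", "yolov8n.pt"),
    "YOLOv9": ("YOLO", "yolov9c.pt"),
    "YOLOv10": ("YOLO", "yolov10n.pt"),
    "SAM": ("SAM", "sam_b.pt"),
    "MobileSAM": ("SAM", "mobile_sam.pt"),
    "FastSAM": ("FastSAM", "FastSAM-s.pt"),
    "YOLO-NAS": ("NAS", "yolo_nas_s.pt"),
    "RT-DETR": ("RTDETR", "rtdetr-l.pt"),
    "YOLO-World": ("YOLOWorld", "yolov8s-world.pt"),
}


def resolve(family):
    """Return (class_name, checkpoint) for a model family listed on this page."""
    try:
        return MODEL_REGISTRY[family]
    except KeyError:
        known = ", ".join(sorted(MODEL_REGISTRY))
        raise KeyError(f"Unknown family {family!r}; expected one of: {known}")


def load_model(family):
    """Instantiate the model; needs `pip install ultralytics` and may download weights."""
    class_name, checkpoint = resolve(family)
    # Imported lazily so the registry can be inspected without the package installed.
    ultralytics = importlib.import_module("ultralytics")
    return getattr(ultralytics, class_name)(checkpoint)
```

With the package installed, `load_model("RT-DETR")("path/to/image.jpg")` would run inference on a single image; the image path is a placeholder.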
Featured Models
Here are some of the key models supported:
YOLOv5
- Description: An improved version of the YOLO architecture by Ultralytics, offering a better trade-off between accuracy and speed than previous versions.
| Modality | Backbone | Pretrained | Top-1 Accuracy | Top-5 Accuracy |
|---|---|---|---|---|
| RGB | YOLOv5 | COCO | 64.5% | 85.1% |
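
For readers unfamiliar with the Top-1 and Top-5 columns used in the tables on this page, here is a minimal sketch of how top-k accuracy is generally computed from per-class scores. The `top_k_accuracy` helper and the toy data are hypothetical; this is not the evaluation code behind the reported numbers.

```python
def top_k_accuracy(scores, labels, k):
    """Fraction of samples whose true label is among the k highest-scoring classes.

    scores: list of per-sample score lists (one score per class)
    labels: list of integer class indices, one per sample
    """
    if len(scores) != len(labels) or not labels:
        raise ValueError("scores and labels must be non-empty and the same length")
    hits = 0
    for row, label in zip(scores, labels):
        # Class indices sorted from highest to lowest score.
        ranked = sorted(range(len(row)), key=lambda i: row[i], reverse=True)
        if label in ranked[:k]:
            hits += 1
    return hits / len(labels)


# Toy example with three samples and three classes.
toy_scores = [[0.1, 0.7, 0.2], [0.5, 0.3, 0.2], [0.2, 0.3, 0.5]]
toy_labels = [1, 1, 0]
top1 = top_k_accuracy(toy_scores, toy_labels, k=1)  # one of three samples is correct
top2 = top_k_accuracy(toy_scores, toy_labels, k=2)  # two of three samples are correct
```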
YOLOv8
- Description: A version of the YOLO family that adds capabilities such as instance segmentation, pose/keypoint estimation, and classification.
| Modality | Backbone | Pretrained | Top-1 Accuracy | Top-5 Accuracy |
|---|---|---|---|---|
| RGB | YOLOv8 | COCO | 70.1% | 90.2% |
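
The tasks named in the YOLOv8 description are typically selected by loading a task-specific checkpoint. The sketch below assumes the `-seg`, `-pose`, and `-cls` file-name suffixes used by Ultralytics releases; the `yolov8_checkpoint` and `predict` helpers are hypothetical names introduced for illustration.

```python
# Assumed file-name suffixes for task-specific YOLOv8 checkpoints.
TASK_SUFFIX = {
    "detect": "",
    "segment": "-seg",
    "pose": "-pose",
    "classify": "-cls",
}


def yolov8_checkpoint(task, size="n"):
    """Build a checkpoint name such as 'yolov8n-seg.pt' for a task and model size."""
    if task not in TASK_SUFFIX:
        raise ValueError(f"Unsupported task {task!r}; choose from {sorted(TASK_SUFFIX)}")
    return f"yolov8{size}{TASK_SUFFIX[task]}.pt"


def predict(task, source, size="n"):
    """Run inference for a task; needs the `ultralytics` package and may download weights."""
    from ultralytics import YOLO  # lazy import keeps the helper above dependency-free

    model = YOLO(yolov8_checkpoint(task, size))
    return model(source)
```

For example, `predict("pose", "path/to/image.jpg")` would load `yolov8n-pose.pt` and return keypoint results for the placeholder image.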
YOLOv9
- Description: An experimental model trained on the Ultralytics YOLOv5 codebase implementing Programmable Gradient Information (PGI).
| Modality | Backbone | Pretrained | Top-1 Accuracy | Top-5 Accuracy |
|---|---|---|---|---|
| RGB | YOLOv9 | COCO | 71.5% | 91.0% |
YOLOv10
- Description: Developed at Tsinghua University, YOLOv10 features NMS-free training and an efficiency-accuracy driven architecture design, delivering state-of-the-art performance and latency.
| Modality | Backbone | Pretrained | Top-1 Accuracy | Top-5 Accuracy |
|---|---|---|---|---|
| RGB | YOLOv10 | COCO | 72.3% | 92.0% |
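
For context on what "NMS-free" removes, the sketch below shows classic greedy non-maximum suppression (NMS), the post-processing step that many detectors apply to discard overlapping duplicate boxes. This is a generic illustration with hypothetical helper names, not YOLOv10's own method; boxes are assumed to be `(x1, y1, x2, y2)` tuples.

```python
def iou(a, b):
    """Intersection over union of two (x1, y1, x2, y2) boxes."""
    x1, y1 = max(a[0], b[0]), max(a[1], b[1])
    x2, y2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def nms(boxes, scores, iou_threshold=0.5):
    """Greedy NMS: keep the highest-scoring box, drop boxes that overlap it too much."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    for i in order:
        # Keep box i only if it does not overlap any already-kept box above the threshold.
        if all(iou(boxes[i], boxes[j]) <= iou_threshold for j in keep):
            keep.append(i)
    return keep


# Two heavily overlapping boxes and one separate box.
demo_boxes = [(0, 0, 10, 10), (1, 1, 11, 11), (20, 20, 30, 30)]
demo_scores = [0.9, 0.8, 0.7]
kept = nms(demo_boxes, demo_scores)  # indices of the boxes that survive suppression
```

A model trained to emit one box per object can skip this step at inference time, which is where the latency benefit of an NMS-free design comes from.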
Segment Anything Model (SAM)
- Description: Meta's Segment Anything Model (SAM).
| Modality | Backbone | Pretrained | Top-1 Accuracy | Top-5 Accuracy |
|---|---|---|---|---|
| RGB | SAM | COCO | 73.0% | 92.5% |
Mobile Segment Anything Model (MobileSAM)
- Description: MobileSAM for mobile applications, by Kyung Hee University.
| Modality | Backbone | Pretrained | Top-1 Accuracy | Top-5 Accuracy |
|---|---|---|---|---|
| RGB | MobileSAM | COCO | 68.0% | 88.0% |
Fast Segment Anything Model (FastSAM)
- Description: FastSAM by Image & Video Analysis Group, Institute of Automation, Chinese Academy of Sciences.
| Modality | Backbone | Pretrained | Top-1 Accuracy | Top-5 Accuracy |
|---|---|---|---|---|
| RGB | FastSAM | COCO | 72.0% | 91.5% |
YOLO-NAS
- Description: YOLO Neural Architecture Search (NAS) Models.
| Modality | Backbone | Pretrained | Top-1 Accuracy | Top-5 Accuracy |
|---|---|---|---|---|
| RGB | YOLO-NAS | COCO | 74.0% | 93.0% |
Realtime Detection Transformers (RT-DETR)
- Description: Baidu's PaddlePaddle Realtime Detection Transformer (RT-DETR) models.
| Modality | Backbone | Pretrained | Top-1 Accuracy | Top-5 Accuracy |
|---|---|---|---|---|
| RGB | RT-DETR | COCO | 75.0% | 93.5% |
YOLO-World
- Description: Real-time Open Vocabulary Object Detection models from Tencent AI Lab.
| Modality | Backbone | Pretrained | Top-1 Accuracy | Top-5 Accuracy |
|---|---|---|---|---|
| RGB | YOLO-World | COCO | 73.5% | 92.7% |
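
Open-vocabulary detection means the classes to detect are supplied as text at inference time. A minimal sketch follows, assuming the `YOLOWorld` class and its `set_classes` method from the `ultralytics` package; the `clean_vocabulary` and `detect_with_vocabulary` helpers and the checkpoint name `yolov8s-world.pt` are illustrative assumptions.

```python
def clean_vocabulary(prompts):
    """Strip whitespace, drop empty entries, and remove duplicates while keeping order."""
    seen = set()
    cleaned = []
    for prompt in prompts:
        text = prompt.strip()
        if text and text.lower() not in seen:
            seen.add(text.lower())
            cleaned.append(text)
    return cleaned


def detect_with_vocabulary(source, prompts, checkpoint="yolov8s-world.pt"):
    """Detect only the prompted classes; needs `ultralytics` and may download weights."""
    from ultralytics import YOLOWorld  # lazy import keeps clean_vocabulary dependency-free

    model = YOLOWorld(checkpoint)
    model.set_classes(clean_vocabulary(prompts))  # restrict detection to these text labels
    return model.predict(source)
```

For example, `detect_with_vocabulary("path/to/image.jpg", ["person", "bus"])` would search the placeholder image for those two classes only.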