YOLO Format Guide
Annotation format for YOLO object detection models
Annotation Specification
The YOLO annotation format is a simple text-based labeling format used for training YOLO (You Only Look Once) object detection models. Each image in the dataset has a corresponding .txt annotation file with the same base name. Every line in the annotation file represents one bounding box and contains five space-separated values: the class index (integer), the x-center coordinate, the y-center coordinate, the box width, and the box height. All coordinates are normalized to the range [0, 1] relative to the image dimensions.
The normalization convention means that x-center and width are divided by the image width, while y-center and height are divided by the image height. This makes annotations resolution-independent — the same annotation file works correctly regardless of whether the image is resized. The class index is a zero-based integer that maps to class names defined in a separate configuration file (typically data.yaml). An image with no objects has an empty annotation file or no annotation file at all.
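As a minimal sketch of the normalization described above, the following converts an absolute pixel box (given as corner coordinates, which is a common but assumed input convention) into a YOLO detection line. The function name is illustrative, not part of the specification:

```python
def to_yolo_line(class_id: int, x_min: float, y_min: float,
                 x_max: float, y_max: float, img_w: int, img_h: int) -> str:
    """Convert absolute pixel corners to one normalized YOLO detection line."""
    x_center = (x_min + x_max) / 2 / img_w   # x values divided by image width
    y_center = (y_min + y_max) / 2 / img_h   # y values divided by image height
    width = (x_max - x_min) / img_w
    height = (y_max - y_min) / img_h
    return f"{class_id} {x_center:.6f} {y_center:.6f} {width:.6f} {height:.6f}"
```

For example, a 200x200 pixel box at (100, 100) in a 640x480 image produces the same normalized line no matter how the image is later resized.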
For segmentation tasks, the YOLO format extends to polygon annotations where each line contains the class index followed by pairs of x,y coordinates defining the polygon vertices. For oriented bounding boxes (OBB), the format uses class index followed by four x,y corner point pairs. For pose estimation, keypoints are appended after the bounding box as x,y,visibility triplets for each keypoint. The Ultralytics YOLOv8 framework has standardized these extended formats across detection, segmentation, classification, pose estimation, and oriented bounding box tasks.
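The extended line layouts can be illustrated with hypothetical annotation lines (all coordinate values below are made up for illustration). Note how the field count encodes the structure: a segmentation line has 1 + 2n fields for n polygon vertices, and a pose line has 5 + 3k fields for k keypoints:

```python
# Segmentation: class index followed by polygon vertex pairs (x1 y1 x2 y2 ...).
seg_line = "1 0.12 0.20 0.45 0.18 0.50 0.60 0.15 0.55"

# Pose: detection box followed by (x, y, visibility) triplets per keypoint.
pose_line = "0 0.50 0.50 0.20 0.40 0.48 0.35 2 0.52 0.35 2 0.50 0.45 1"

# Recover the structure from the field count.
n_vertices = (len(seg_line.split()) - 1) // 2    # (9 - 1) / 2 = 4 vertices
n_keypoints = (len(pose_line.split()) - 5) // 3  # (14 - 5) / 3 = 3 keypoints
```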
When to Use YOLO Format
Use the YOLO format when training any model in the YOLO family — YOLOv5, YOLOv7, YOLOv8, YOLOv9, YOLOv10, YOLO11, or RT-DETR through the Ultralytics framework. YOLO is among the most widely used object detection architectures for real-time applications, and the annotation format is supported by all major labeling tools, including Label Studio, CVAT, Roboflow, Labelbox, and V7. If your use case involves real-time object detection, the YOLO format and training pipeline are likely the fastest path to deployment.
Choose YOLO format over COCO format when your workflow is centered on YOLO-family models and you prefer the simplicity of one text file per image. YOLO format is easier to manually inspect, edit, and version control because each annotation is a small text file rather than an entry in a single large JSON file. It also avoids the complexity of COCO's nested JSON structure, with its separate category, annotation, and image dictionaries.
YOLO format is less suitable when you need to store rich annotation metadata (annotator ID, confidence scores, annotation timestamps), when your task requires non-rectangular annotations beyond what YOLO's polygon format supports, or when you are training models outside the YOLO family that expect COCO, VOC, or other annotation formats.
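If a dataset starts out in COCO format, the box conversion itself is small. This sketch assumes the standard COCO convention of `[x_min, y_min, width, height]` in pixels; the function name is my own:

```python
def coco_box_to_yolo(bbox, img_w, img_h):
    """Convert a COCO [x_min, y_min, width, height] pixel box to
    YOLO's normalized (x_center, y_center, width, height)."""
    x_min, y_min, w, h = bbox
    return ((x_min + w / 2) / img_w,   # shift corner to center, then normalize
            (y_min + h / 2) / img_h,
            w / img_w,
            h / img_h)
```

A full converter also has to remap COCO category IDs (which may be sparse and 1-based) onto YOLO's contiguous zero-based class indices.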
Schema / Structure
YOLO Detection Format (per line):
<class_id> <x_center> <y_center> <width> <height>
Where:
class_id - Integer class index (0-based)
x_center - Bounding box center X (normalized 0.0-1.0)
y_center - Bounding box center Y (normalized 0.0-1.0)
width - Bounding box width (normalized 0.0-1.0)
height - Bounding box height (normalized 0.0-1.0)
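Reading the five fields back into pixel coordinates is simply the inverse of the normalization. A minimal parsing sketch (the helper name is my own choice):

```python
def parse_detection_line(line: str, img_w: int, img_h: int):
    """Parse one YOLO detection line into
    (class_id, x_min, y_min, x_max, y_max) in pixels."""
    fields = line.split()
    class_id = int(fields[0])
    x_c, y_c, w, h = (float(v) for v in fields[1:])
    x_c, w = x_c * img_w, w * img_w   # un-normalize x values by image width
    y_c, h = y_c * img_h, h * img_h   # un-normalize y values by image height
    return class_id, x_c - w / 2, y_c - h / 2, x_c + w / 2, y_c + h / 2
```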
YOLO Segmentation Format (per line):
<class_id> <x1> <y1> <x2> <y2> ... <xn> <yn>
Dataset Directory Structure:
dataset/
├── data.yaml              # Class names and paths
├── train/
│   ├── images/
│   │   ├── img001.jpg
│   │   └── img002.jpg
│   └── labels/
│       ├── img001.txt
│       └── img002.txt
├── val/
│   ├── images/
│   └── labels/
└── test/
    ├── images/
    └── labels/

Example Data
# data.yaml - Dataset configuration
path: ./dataset
train: train/images
val: val/images
test: test/images
names:
  0: person
  1: car
  2: bicycle
  3: traffic_light
# --- labels/img001.txt ---
# Two people and one car detected in the image
0 0.4531 0.3275 0.1200 0.4500
0 0.7125 0.4100 0.0950 0.3800
1 0.2800 0.5500 0.3200 0.2800
# --- labels/img002.txt ---
# One bicycle and one traffic light
2 0.6200 0.6800 0.1500 0.2200
3 0.1500 0.1200 0.0400 0.0800

Ertas Support
Ertas Data Suite supports YOLO format datasets for computer vision training data preparation. You can import YOLO-formatted annotation datasets, apply data quality checks including annotation validation (verifying coordinate ranges, class index validity, and bounding box sanity), and export cleaned datasets maintaining the YOLO directory structure. PII redaction can be applied to associated metadata files while preserving annotation integrity.
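The kinds of annotation checks described above — coordinate ranges, class index validity, and bounding box sanity — can be sketched as a simple per-file validator. This is an illustrative implementation, not Ertas's actual code:

```python
def validate_label_file(path: str, num_classes: int) -> list[str]:
    """Return a list of human-readable problems found in one YOLO label file."""
    problems = []
    with open(path) as f:
        for lineno, line in enumerate(f, start=1):
            fields = line.split()
            if not fields:
                continue  # blank lines are harmless
            if len(fields) != 5:
                problems.append(f"line {lineno}: expected 5 fields, got {len(fields)}")
                continue
            class_id = int(fields[0])
            x_c, y_c, w, h = (float(v) for v in fields[1:])
            if not 0 <= class_id < num_classes:
                problems.append(f"line {lineno}: class index {class_id} out of range")
            if not all(0.0 <= v <= 1.0 for v in (x_c, y_c, w, h)):
                problems.append(f"line {lineno}: coordinate outside [0, 1]")
            if w <= 0 or h <= 0:
                problems.append(f"line {lineno}: degenerate box (w={w}, h={h})")
    return problems
```

An empty returned list means the file passed; each string pinpoints a line that needs review.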