COCO Format Guide

    Microsoft COCO annotation format for object detection and segmentation

    Annotation Specification

    The COCO (Common Objects in Context) annotation format is a comprehensive JSON-based annotation standard developed by Microsoft Research for the COCO dataset and benchmark. It has become one of the most widely adopted annotation formats in computer vision, supporting object detection, instance segmentation, keypoint detection, panoptic segmentation, image captioning, and dense pose estimation. Unlike the simpler YOLO format, COCO stores all annotations for an entire dataset in a single JSON file with a rich relational structure.

    The COCO format uses a relational data model with four primary entities: images (with id, file_name, width, height), annotations (with id, image_id, category_id, bbox, segmentation, area, iscrowd), categories (with id, name, supercategory), and optionally licenses and info metadata. Bounding boxes are stored in [x_min, y_min, width, height] format using absolute pixel coordinates (not normalized like YOLO). Segmentation masks are stored as polygon vertex lists or compressed RLE (Run-Length Encoding) for binary masks.
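    To make the stored fields concrete, here is a small self-contained sketch (plain Python, no pycocotools) showing how a COCO annotation's `bbox` and `area` relate: `bbox` is `[x_min, y_min, width, height]` in absolute pixels, and for a polygon segmentation the `area` field is the mask area, which the shoelace formula yields.

```python
def bbox_area(bbox):
    """Area of a COCO [x_min, y_min, width, height] box."""
    _, _, w, h = bbox
    return w * h

def polygon_area(flat_poly):
    """Shoelace area of a COCO polygon stored as a flat [x1,y1,x2,y2,...] list."""
    xs = flat_poly[0::2]
    ys = flat_poly[1::2]
    n = len(xs)
    s = 0.0
    for i in range(n):
        j = (i + 1) % n  # wrap around to the first vertex
        s += xs[i] * ys[j] - xs[j] * ys[i]
    return abs(s) / 2.0

bbox = [100.0, 200.0, 300.0, 150.0]
poly = [100, 200, 400, 200, 400, 350, 100, 350]
print(bbox_area(bbox))     # 45000.0
print(polygon_area(poly))  # 45000.0 — a rectangle, so it matches the bbox area
```

    For non-rectangular polygons the two values differ, which is why COCO stores `area` explicitly rather than deriving it from the bounding box.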

    The relational structure enables powerful queries — finding all annotations for a specific image, all instances of a specific category, or filtering by annotation attributes like area or crowd status. The pycocotools library provides a Python API (COCO class) for loading, querying, and evaluating against COCO-format datasets. The COCO evaluation metrics (AP, AP50, AP75, AP_small, AP_medium, AP_large) have become the standard benchmarks for object detection and segmentation model evaluation.
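    The queries described above reduce to joining annotations against images and categories by id. A minimal sketch of that relational lookup in plain Python (field names follow the COCO spec; the tiny inline dataset is illustrative):

```python
# Build lookup indexes once over a COCO-style dict, then query cheaply.
dataset = {
    "categories": [{"id": 1, "name": "car"}, {"id": 2, "name": "person"}],
    "images": [{"id": 1, "file_name": "img001.jpg", "width": 1920, "height": 1080}],
    "annotations": [
        {"id": 1, "image_id": 1, "category_id": 1,
         "bbox": [100, 200, 300, 150], "area": 45000, "iscrowd": 0},
        {"id": 2, "image_id": 1, "category_id": 2,
         "bbox": [500, 300, 80, 200], "area": 16000, "iscrowd": 0},
    ],
}

# Index annotations by image_id and map category ids to names.
anns_by_image = {}
for ann in dataset["annotations"]:
    anns_by_image.setdefault(ann["image_id"], []).append(ann)
cat_name = {c["id"]: c["name"] for c in dataset["categories"]}

# All "car" annotations on image 1 with area above a threshold.
hits = [ann for ann in anns_by_image[1]
        if cat_name[ann["category_id"]] == "car" and ann["area"] > 10000]
print([ann["id"] for ann in hits])  # [1]
```

    This is essentially what pycocotools' `COCO` class does when it builds its internal indexes at load time, which is why `getAnnIds`/`loadAnns` queries are fast even on large datasets.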

    When to Use COCO Format

    Use the COCO format when training models with frameworks that expect COCO-style annotations, including Detectron2, MMDetection, DETR, and many Hugging Face vision models. COCO format is the standard for academic research papers and benchmark comparisons — if you are publishing results or comparing against published baselines, COCO format and evaluation metrics are expected. It is also the best choice when your annotations include segmentation masks, keypoints, or captions alongside bounding boxes.

    Choose COCO format over YOLO format when you need rich annotation metadata (area, iscrowd flags, segmentation polygons), when you are working with panoptic segmentation tasks that require both "stuff" and "thing" annotations, or when your evaluation workflow uses the standard COCO metrics. COCO format is also preferred when you have multiple annotation types per image (bounding boxes plus segmentation plus keypoints) that need to be stored in a unified format.

    COCO format is less convenient when your dataset is very large and you want to version-control individual image annotations (a single JSON file can be difficult to diff and merge). It is also more complex to parse and generate than YOLO format, requiring either the pycocotools library or careful JSON manipulation. For simple bounding-box-only detection tasks trained with YOLO models, the YOLO format is simpler and equally effective.
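    The bbox conventions of the two formats differ in both anchor point and scale, and the conversion is a one-liner each way. A sketch (hypothetical helper, not from either spec's tooling):

```python
def coco_to_yolo(bbox, img_w, img_h):
    """COCO [x_min, y_min, w, h] in absolute pixels ->
    YOLO [x_center, y_center, w, h] normalized to [0, 1]."""
    x, y, w, h = bbox
    return [(x + w / 2) / img_w, (y + h / 2) / img_h, w / img_w, h / img_h]

def yolo_to_coco(bbox, img_w, img_h):
    """Inverse conversion back to COCO absolute-pixel coordinates."""
    cx, cy, w, h = bbox
    return [cx * img_w - w * img_w / 2, cy * img_h - h * img_h / 2,
            w * img_w, h * img_h]

print(coco_to_yolo([100.0, 200.0, 300.0, 150.0], 1920, 1080))
```

    Note that a round trip through normalized coordinates can introduce small floating-point differences, so converted datasets should not be compared bit-for-bit.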

    Schema / Structure

    json
    {
      "info": {
        "year": 2026,
        "version": "1.0",
        "description": "Custom object detection dataset",
        "contributor": "Ertas",
        "date_created": "2026-03-15"
      },
      "licenses": [
        {"id": 1, "name": "CC BY 4.0", "url": "https://creativecommons.org/licenses/by/4.0/"}
      ],
      "categories": [
        {"id": 1, "name": "car", "supercategory": "vehicle"},
        {"id": 2, "name": "person", "supercategory": "human"}
      ],
      "images": [
        {"id": 1, "file_name": "img001.jpg", "width": 1920, "height": 1080}
      ],
      "annotations": [
        {
          "id": 1,
          "image_id": 1,
          "category_id": 1,
          "bbox": [100.0, 200.0, 300.0, 150.0],
          "area": 45000.0,
          "segmentation": [[100,200, 400,200, 400,350, 100,350]],
          "iscrowd": 0
        }
      ]
    }
    COCO JSON annotation format with info, categories, images, and annotations sections

    Example Data

    python
    from pycocotools.coco import COCO
    import json
    
    # Load and query a COCO dataset
    coco = COCO("annotations/instances_train.json")
    
    # Get all images containing 'car'
    car_id = coco.getCatIds(catNms=["car"])[0]
    car_img_ids = coco.getImgIds(catIds=[car_id])
    print(f"Found {len(car_img_ids)} images with cars")
    
    # Get annotations for a specific image
    img_info = coco.loadImgs(car_img_ids[0])[0]
    ann_ids = coco.getAnnIds(imgIds=img_info["id"])
    anns = coco.loadAnns(ann_ids)
    for ann in anns:
        cat = coco.loadCats(ann["category_id"])[0]
        print(f"  {cat['name']}: bbox={ann['bbox']}, area={ann['area']}")
    
    # Create a COCO dataset programmatically
    dataset = {
        "info": {"version": "1.0", "description": "My dataset"},
        "categories": [
            {"id": 1, "name": "dog", "supercategory": "animal"},
            {"id": 2, "name": "cat", "supercategory": "animal"},
        ],
        "images": [
            {"id": 1, "file_name": "photo_001.jpg", "width": 640, "height": 480},
        ],
        "annotations": [
            {"id": 1, "image_id": 1, "category_id": 1,
             "bbox": [50, 100, 200, 180], "area": 36000, "iscrowd": 0,
             "segmentation": [[50,100, 250,100, 250,280, 50,280]]},
        ],
    }
    with open("annotations.json", "w") as f:
        json.dump(dataset, f, indent=2)
    Loading, querying, and creating COCO-format annotation datasets with pycocotools

    Ertas Support

    Ertas Data Suite supports COCO format import and export for computer vision training data workflows. You can import COCO JSON annotation files alongside their image datasets, validate annotation integrity (checking for orphaned annotations, missing images, and invalid category references), and export cleaned datasets in COCO format. Format conversion between COCO and YOLO is supported for workflows that require both formats.
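    The integrity checks mentioned above amount to cross-referencing the three id spaces. A minimal sketch of such a validator (a hypothetical helper, not Ertas's implementation):

```python
def validate_coco(dataset):
    """Return a list of integrity problems in a COCO-style dict:
    orphaned annotations, invalid category references, unannotated images."""
    image_ids = {img["id"] for img in dataset.get("images", [])}
    category_ids = {cat["id"] for cat in dataset.get("categories", [])}
    annotated = set()
    errors = []
    for ann in dataset.get("annotations", []):
        if ann["image_id"] not in image_ids:
            errors.append(f"annotation {ann['id']}: unknown image_id {ann['image_id']}")
        else:
            annotated.add(ann["image_id"])
        if ann["category_id"] not in category_ids:
            errors.append(f"annotation {ann['id']}: unknown category_id {ann['category_id']}")
    for img_id in sorted(image_ids - annotated):
        errors.append(f"image {img_id}: has no annotations")
    return errors

bad = {
    "images": [{"id": 1, "file_name": "a.jpg"}, {"id": 2, "file_name": "b.jpg"}],
    "categories": [{"id": 1, "name": "car"}],
    "annotations": [
        {"id": 1, "image_id": 1, "category_id": 1},
        {"id": 2, "image_id": 99, "category_id": 7},  # orphaned + bad category
    ],
}
for problem in validate_coco(bad):
    print(problem)
```

    Whether an unannotated image is an error or merely a warning depends on the dataset: negative (background-only) images are legitimate in many detection training setups.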
