
    The Annotation Bottleneck: When Only 3 People in Your Org Can Label Data

    Most enterprises have 2-3 ML engineers who can operate annotation tools. Meanwhile, dozens of domain experts hold the knowledge needed for high-quality labels but never touch those tools. This bottleneck is killing AI timelines.

    Ertas Team

    Here is a scenario that plays out in almost every enterprise AI project. The ML team needs 10,000 labeled examples. The organization employs 40 people with the domain expertise to label accurately. But the annotation tools require Python, Docker, or cloud platform access. So the actual labeling falls to 2-3 ML engineers who have to interpret domain knowledge they do not possess.

    The 40 domain experts have the knowledge. The 3 ML engineers have the tool access. The project takes 4 months instead of 3 weeks.

    This is the annotation bottleneck, and it is one of the most underappreciated reasons enterprise AI projects miss deadlines, exceed budgets, and produce underwhelming results.

    How the Bottleneck Forms

    The annotation bottleneck is not about effort or willingness. It is about tool accessibility. Here is how it typically develops:

    Phase 1: Tool Selection. The ML team evaluates annotation platforms. They choose based on features, API quality, model integration, and export formats. Usability for non-technical users is a secondary consideration, if it is considered at all.

    Phase 2: Setup. The chosen tool requires either cloud deployment or self-hosting. Cloud deployment means uploading potentially sensitive enterprise data to a third-party server. Self-hosting means Docker, reverse proxies, authentication systems, and ongoing maintenance. Either way, the ML team owns the infrastructure.

    Phase 3: Onboarding Attempt. The team tries to onboard domain experts. This requires creating accounts, explaining the interface, configuring permissions, and often writing custom scripts to load domain-specific data formats. After 2-3 training sessions, adoption stalls. The domain experts have their own jobs to do. Learning a new technical tool is not in their workflow.

    Phase 4: The ML Team Labels. Deadlines approach. The ML engineers start labeling data themselves, consulting domain experts via Slack, email, or scheduled meetings when they encounter ambiguous examples. The annotation workload now competes with their engineering responsibilities.

    This is the bottleneck. And it has three compounding effects.

    Effect 1: The Telephone Game

    When an ML engineer encounters an ambiguous example, they ask a domain expert for guidance. This creates a communication chain that degrades information quality at every step.

    Consider an insurance claims processing project. The ML engineer sees a claim description and needs to classify the damage type. They message the underwriter: "Is this water damage or structural damage?" The underwriter responds: "It's water damage that caused structural damage — you'd typically classify it based on the proximate cause, which is water, but if the structural damage exceeds 40% of the total claim value, some carriers reclassify it."

    The ML engineer now has to translate that nuanced domain logic into a single label. They pick "water damage" and move on. The nuance — the 40% threshold, the carrier-specific variation — is lost.

    This is the telephone game effect. Domain knowledge is compressed through a communication channel that cannot carry its full complexity. Over thousands of examples, these compressions accumulate into systematic labeling errors.

    In our experience working with enterprise teams, telephone-game labeling introduces 5-12% additional label errors compared to direct expert labeling. On a 10,000-example dataset, that is 500-1,200 examples with degraded labels — more than enough to measurably reduce model performance.

    Effect 2: Throughput Collapse

    The math is simple. If your organization has 3 ML engineers who can operate the annotation tools, and each can label approximately 200 examples per day when labeling is their primary task, your maximum throughput is 600 labeled examples per day.

    If you need 10,000 labeled examples, that is roughly 17 working days — over 3 calendar weeks — assuming the ML engineers do nothing else.

    In reality, they are also building pipelines, training models, debugging infrastructure, and attending meetings. Realistic throughput is closer to 50-100 labeled examples per engineer per day. At that rate, 10,000 examples takes 5-10 weeks.

    Now consider the alternative. If 20 domain experts could each label 100 examples per day — which is conservative, since labeling is faster when you understand the domain — the same dataset completes in 5 working days.
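
    To make the comparison concrete, here is a back-of-the-envelope sketch using the headcounts and per-person rates above. The figures are the article's illustrative numbers, not measurements, and the function is only a rough planning aid.

```python
import math

def working_days_to_label(total_examples: int, labelers: int, per_person_daily_rate: int) -> int:
    """Working days needed to label a dataset at a given per-person rate."""
    daily_throughput = labelers * per_person_daily_rate
    return math.ceil(total_examples / daily_throughput)

DATASET_SIZE = 10_000

# 3 ML engineers at a realistic 75 examples/day alongside their engineering work
print(working_days_to_label(DATASET_SIZE, labelers=3, per_person_daily_rate=75))    # 45 working days

# 20 domain experts at a conservative 100 examples/day
print(working_days_to_label(DATASET_SIZE, labelers=20, per_person_daily_rate=100))  # 5 working days
```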

    The throughput difference is not incremental. It is an order of magnitude. And it cascades through the entire project timeline. Every model iteration, every schema revision, every data refresh waits on the same 3 people.

    Effect 3: Timeline Destruction

    Enterprise AI projects typically follow a cycle: label data, train model, evaluate, identify gaps, label more data, retrain. Each cycle ideally takes 1-2 weeks. Most projects need 3-5 cycles to reach production quality.

    With the annotation bottleneck, each cycle stretches to 4-8 weeks. A project that should take 3-4 months takes 9-12 months. During those extra months, requirements shift, stakeholders lose confidence, budgets get questioned, and competing priorities absorb the ML team's attention.
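
    To connect labeling throughput to cycle length, here is a minimal sketch. The per-cycle label count, the fixed training-and-evaluation overhead, and the five-day working week are assumptions chosen for illustration, and the totals cover only the iteration loop, not the setup and deployment work around it.

```python
def weeks_per_cycle(new_labels: int, labelers: int, per_person_daily_rate: int,
                    train_eval_weeks: float = 1.0) -> float:
    """Length of one label -> train -> evaluate cycle, in weeks.

    Labeling time is derived from headcount and per-person throughput;
    training and evaluation are modeled as a fixed overhead.
    """
    labeling_days = new_labels / (labelers * per_person_daily_rate)
    return labeling_days / 5 + train_eval_weeks  # 5 working days per week

# Assume each cycle needs 2,000 fresh labels (a hypothetical figure).
bottlenecked = weeks_per_cycle(2_000, labelers=3, per_person_daily_rate=50)    # ~3.7 weeks
direct = weeks_per_cycle(2_000, labelers=20, per_person_daily_rate=100)        # ~1.2 weeks

# Over 4 cycles, the iteration loop alone differs by roughly 10 weeks.
print(f"bottlenecked: {4 * bottlenecked:.1f} weeks, direct: {4 * direct:.1f} weeks")
```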

    We tracked timelines across 15 enterprise AI projects in 2025. The ones with annotation bottlenecks — where fewer than 5 people could operate the labeling tools — averaged 11.2 months from kickoff to production deployment. The ones where domain experts could label directly averaged 4.8 months. Same types of projects, similar data volumes, comparable model architectures.

    The 6-month difference was almost entirely attributable to labeling throughput and iteration speed.

    Why This Bottleneck Is Invisible

    The annotation bottleneck rarely shows up in project plans or retrospectives. Here is why:

    It looks like an engineering problem. When the project is behind schedule, the visible symptom is "the model is not accurate enough" or "we need more training data." The root cause — that labeling throughput is constrained by tool accessibility — hides behind these symptoms.

    Nobody tracks labeling velocity. Most teams track model accuracy, training time, and inference latency. Almost nobody measures labeled-examples-per-day or time-to-label-per-example. Without these metrics, the bottleneck is invisible.

    The ML team absorbs the cost. ML engineers do not typically escalate "I spent 60% of my week labeling data" as a project risk. They view it as part of the job. The organizational cost — senior engineers doing work that domain experts could do better and faster — goes unrecognized.

    Breaking the Bottleneck

    The fix is not hiring more ML engineers. It is not buying a more expensive annotation platform. It is removing the technical barriers that prevent domain experts from labeling directly.

    This requires a specific set of capabilities:

    Zero-infrastructure deployment. The labeling tool must install and run without IT involvement, Docker, or cloud configuration. If it requires a ticket to the infrastructure team, adoption will stall.

    Local data processing. Enterprise data is sensitive: healthcare records, legal documents, financial data, engineering specifications. The tool must work with files on the user's machine, with no data leaving the organization's perimeter.

    Visual schema definition. Domain experts should define what labels look like — categories, hierarchies, relationships — through a visual interface, not a JSON configuration file.

    Standard export formats. The output must integrate with existing ML pipelines without custom conversion scripts.
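
    To illustrate the last point, here is a sketch of what consuming a standard export could look like downstream. The JSON Lines layout, field names, and file name are hypothetical; they stand in for whatever format your pipeline already expects.

```python
import json
from pathlib import Path

def load_labels(path: str) -> list[dict]:
    """Read a JSON Lines export: one labeled example per line.

    Hypothetical record shape: {"text": ..., "label": ..., "annotator": ...}
    """
    examples = []
    with Path(path).open(encoding="utf-8") as f:
        for line in f:
            if line.strip():
                examples.append(json.loads(line))
    return examples

# Because the export is a standard format, it feeds the existing training
# code directly, with no custom conversion script in between.
dataset = load_labels("claims_labels.jsonl")
texts = [example["text"] for example in dataset]
labels = [example["label"] for example in dataset]
```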

    Ertas Data Suite was designed to eliminate the annotation bottleneck. It is a native desktop application that domain experts install and run like any other software on their machine. There is no Docker, no cloud upload, no Python requirement. Domain experts point it at their local data, configure labeling schemas visually, and start producing labeled datasets.

    The result: instead of 3 ML engineers labeling data they do not fully understand, 30 domain experts label data they work with every day. The bottleneck disappears. Projects that took 11 months take 5.

