
    Why Domain Experts — Not ML Engineers — Should Own Data Labeling

    The biggest quality bottleneck in enterprise AI isn't the tools — it's that the people with actual domain knowledge are locked out of the labeling process. Here's why that needs to change.

    Ertas Team

    There is a fundamental misalignment in how most organizations build AI systems. The people who understand the data — clinicians, attorneys, engineers, underwriters, analysts — are not the people who label it. Instead, ML engineers sit between the data and the model, making judgment calls about domains they do not fully understand.

    This is not a tooling problem. It is a structural problem. And it is the single biggest reason enterprise AI projects produce mediocre results.

    The Knowledge Gap in Every Labeling Pipeline

    Consider a concrete example. A legal AI team is building a contract analysis model. The model needs to classify clauses as "favorable," "neutral," or "unfavorable" from the client's perspective.

    An ML engineer can set up the annotation environment, write the labeling schema, and configure the export pipeline. But when they encounter a limitation-of-liability clause with a carve-out for gross negligence, they cannot reliably determine whether that clause favors the client. That judgment requires years of contract negotiation experience.

    What happens in practice: the ML engineer labels it based on surface-level heuristics. Maybe they flag anything with "limitation" as unfavorable. Maybe they ask an attorney over Slack, get a one-word answer without context, and move on. The label goes into the dataset. It trains the model. The model learns a shallow pattern.
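    For concreteness, here is a hypothetical sketch of that kind of surface-level heuristic. Nothing about it comes from a real project; it simply shows how a keyword rule bakes a shallow judgment into the dataset:

    ```python
    # Hypothetical surface-level labeling heuristic. A keyword rule like
    # this flags every limitation-of-liability clause as "unfavorable",
    # regardless of carve-outs or context.

    def label_clause(clause_text: str) -> str:
        text = clause_text.lower()
        if "limitation" in text or "liability" in text:
            # The gross-negligence carve-out never enters the decision.
            return "unfavorable"
        return "neutral"

    clause = ("Supplier's liability is capped at fees paid, except for losses "
              "arising from Supplier's gross negligence or willful misconduct.")
    print(label_clause(clause))  # -> "unfavorable", no matter what an attorney would say
    ```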

    Multiply this by 5,000 examples and you get a model that is confidently wrong about the cases that matter most — the edge cases where domain expertise is the difference between a useful classification and a dangerous one.

    Why This Keeps Happening

    The answer is straightforward: annotation tools require technical skills that domain experts do not have.

    Most enterprise labeling workflows look like this:

    1. Data lives in a cloud storage bucket or database
    2. An ML engineer writes a Python script to extract and format the data
    3. The data gets loaded into an annotation platform (Label Studio, Prodigy, Labelbox)
    4. The platform requires either self-hosting (Docker, networking, authentication) or cloud upload
    5. Annotators need accounts, training on the tool's interface, and often API access for custom label types
    6. Completed labels get exported via Python scripts for model training

    At minimum, steps 1, 2, 3, and 6 require someone comfortable with Python, command-line tools, and data engineering concepts. In most organizations, that means 2-5 people on the ML team.
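    To make that barrier concrete, here is a minimal sketch of what step 2 typically involves. Everything specific in it, the S3 bucket, the key prefix, the record fields, is invented for illustration:

    ```python
    # Hypothetical step-2 glue code: pull raw records from cloud storage
    # and reshape them into a generic import format for an annotation
    # platform. Bucket name, prefix, and field names are all invented.

    import json

    import boto3  # AWS SDK; assumes credentials are already configured

    s3 = boto3.client("s3")
    tasks = []

    # Walk the (hypothetical) raw-data prefix and pull each document.
    listing = s3.list_objects_v2(Bucket="acme-contracts", Prefix="raw/")
    for obj in listing.get("Contents", []):
        body = s3.get_object(Bucket="acme-contracts", Key=obj["Key"])["Body"].read()
        record = json.loads(body)
        # One labeling task per clause, in a shape the platform can import.
        for clause in record["clauses"]:
            tasks.append({"text": clause["text"], "source": obj["Key"]})

    with open("import_tasks.json", "w") as f:
        json.dump(tasks, f, indent=2)
    ```

    None of this is hard for an engineer. All of it is a wall for an attorney or a radiologist.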

    The domain experts — the people whose knowledge actually determines label quality — are locked out by the infrastructure.

    The Numbers Tell the Story

    Research from Google's Data Cascades paper (Sambasivan et al., 2021) found that 92% of AI practitioners reported data quality issues in their projects, with many of those issues tracing back to labeling and annotation problems. A study from MIT (Northcutt et al., 2021) found label errors in approximately 3-5% of examples in major benchmark datasets, and these are datasets built by dedicated research teams.

    In enterprise settings, where labeling is done by proxy (ML engineers labeling domain-specific data), error rates are significantly higher. We have seen organizations with 8-15% label error rates on domain-specific classification tasks. Not because anyone is careless, but because the labelers lack the domain knowledge to make correct judgments consistently.

    The cost compounds. A model trained on data with 10% label errors does not just lose 10% accuracy. The errors create contradictory training signals that degrade performance across the board. In practice, a 10% label error rate can reduce model accuracy by 20-30% on the hardest examples — which are usually the ones that matter most.
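    A toy simulation makes the effect visible. The sketch below flips 10% of training labels on synthetic data and compares accuracy overall and on the lowest-confidence slice of the test set. The exact numbers are illustrative, not a reproduction of any cited result:

    ```python
    # Toy simulation of how label noise degrades a model, illustrative
    # only. We flip 10% of training labels and compare accuracy overall
    # and on the hardest (lowest-confidence) slice of the test set.

    import numpy as np
    from sklearn.datasets import make_classification
    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import train_test_split

    X, y = make_classification(n_samples=4000, n_features=20, random_state=0)
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.5, random_state=0)

    rng = np.random.default_rng(0)
    flip = rng.random(len(y_tr)) < 0.10      # corrupt 10% of training labels
    y_noisy = np.where(flip, 1 - y_tr, y_tr)

    clean = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)
    noisy = LogisticRegression(max_iter=1000).fit(X_tr, y_noisy)

    # "Hard" examples: the ones the clean model is least confident about.
    conf = clean.predict_proba(X_te).max(axis=1)
    hard = conf < np.quantile(conf, 0.2)     # bottom 20% by confidence

    for name, model in [("clean labels", clean), ("10% noise", noisy)]:
        overall = model.score(X_te, y_te)
        hard_acc = (model.predict(X_te[hard]) == y_te[hard]).mean()
        print(f"{name:12s} overall={overall:.3f}  hard slice={hard_acc:.3f}")
    ```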

    What "Domain Expert Labeling" Actually Means

    Giving domain experts ownership of labeling does not mean teaching them Python. It does not mean giving them a crash course in Docker or Jupyter notebooks. It means removing every technical barrier between them and the labeling task.

    A radiologist should be able to open an application, see medical images, and apply labels using terminology they already understand. An attorney should be able to review contract clauses and tag them using the same categories they use in practice. A quantity surveyor should be able to look at a bill of quantities line item and classify it without learning what a JSON schema is.

    The requirements for this are specific:

    No installation complexity. The tool installs like any desktop application — download, double-click, run. No Docker, no terminal commands, no environment variables.

    No data upload. Domain-specific data is often sensitive. Medical records, legal documents, financial data. The tool must work with local files, on the user's machine, without sending data to external servers.

    No code required. Schema definition, label application, quality review, and export should all happen through a visual interface. If someone needs to write a single line of code to label data, you have already lost 90% of your domain experts.

    Domain-appropriate interfaces. Text annotation for documents. Image annotation for visual data. Structured field annotation for tabular data. The interface should match how the expert thinks about the data, not how the ML pipeline consumes it.
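    To be clear, "no code" is a constraint on the expert, not on the system. Under the hood, a visually defined schema is still structured data. A purely hypothetical sketch of what such a tool might store:

    ```python
    # Purely hypothetical: what a visually defined schema might reduce to
    # under the hood. The expert clicks through a form; something like
    # this is what the tool stores and the pipeline consumes.

    clause_schema = {
        "task": "classification",
        "input_field": "clause_text",
        "labels": ["favorable", "neutral", "unfavorable"],
        "allow_notes": True,  # free-text rationale captured alongside each label
    }
    ```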

    The Proxy Labeling Tax

    When domain experts cannot label directly, organizations pay what we call the "proxy labeling tax." This shows up in three ways:

    Time tax. Every labeling decision requires a round-trip between the ML engineer and the domain expert. The engineer encounters an ambiguous example, messages the expert, waits for a response, interprets the response, applies the label. A task that should take 5 seconds takes 15 minutes.

    Accuracy tax. Communication compresses nuance. The expert's response of "it depends on the jurisdiction and the specific carve-out language" gets compressed to a binary label. Context is lost. Edge cases get flattened.

    Throughput tax. The ML team becomes the bottleneck. If you have 3 ML engineers and 50 domain experts, you are operating at 6% of your potential labeling capacity. Projects that should take weeks take months.

    Organizations that eliminate the proxy labeling tax — by giving domain experts direct access to labeling tools — typically see 3-5x improvement in labeling throughput and measurable improvements in label accuracy within the first month.

    What Changes When Experts Label Directly

    The shift from proxy labeling to direct expert labeling changes more than throughput numbers. It changes the quality of the dataset in ways that are hard to quantify but easy to observe.

    First, edge cases get labeled correctly. The examples that trip up proxy labelers — the ones that require deep domain knowledge — are exactly the examples that domain experts handle confidently.

    Second, label schemas improve. When domain experts interact with the labeling schema directly, they immediately spot categories that are too broad, too narrow, or missing entirely. An attorney labeling contract clauses will tell you within an hour that "unfavorable" needs subcategories. An ML engineer might never discover that.

    Third, inter-annotator agreement goes up. Domain experts share a common understanding of terminology and classification criteria. Two attorneys will agree on clause classification far more often than two ML engineers attempting the same task.
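    Agreement is also measurable, so the improvement is easy to verify. A standard check is Cohen's kappa, which scores agreement between two annotators corrected for chance. A minimal sketch with made-up labels:

    ```python
    # Measuring inter-annotator agreement with Cohen's kappa
    # (1.0 = perfect agreement, 0.0 = chance level). Labels are made up.

    from sklearn.metrics import cohen_kappa_score

    attorney_a = ["favorable", "unfavorable", "neutral", "unfavorable", "favorable"]
    attorney_b = ["favorable", "unfavorable", "neutral", "neutral", "favorable"]

    print(f"Cohen's kappa: {cohen_kappa_score(attorney_a, attorney_b):.2f}")
    ```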

    Fourth, iteration cycles shorten. When the model produces incorrect outputs, the domain expert can look at the training data and identify the labeling decisions that led to the error. They do not need to file a ticket with the ML team, wait for an investigation, and hope the engineer understands the domain context.

    Making This Practical

    The shift to domain-expert-owned labeling requires tooling that meets experts where they are. That means native desktop applications that work with local data, visual interfaces that require zero code, and export formats that integrate with existing ML pipelines.

    Ertas Data Suite was built specifically for this use case. It runs as a native desktop application — no Docker, no cloud, no Python environment. Domain experts install it like any other application, point it at their local data, define labeling schemas through a visual interface, and start labeling. The data never leaves their machine. The labeled dataset exports in standard formats ready for model training.
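    On the engineering side, consuming the result stays a few lines of code. Assuming a JSONL export with one labeled example per line (the file name and field names here are illustrative, not a documented export schema), the handoff looks like this:

    ```python
    # Loading a labeled export into a training pipeline. A JSONL file
    # with "text" and "label" fields is assumed for illustration.

    import json

    examples = []
    with open("contract_clauses_labeled.jsonl") as f:
        for line in f:
            record = json.loads(line)
            examples.append((record["text"], record["label"]))

    print(f"Loaded {len(examples)} labeled examples")
    ```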

    The result is that the people who understand the data are the people who label the data. Which is how it should have been from the start.

    Turn unstructured data into AI-ready datasets — without it leaving the building.

    On-premise data preparation with full audit trail. No data egress. No fragmented toolchains. EU AI Act Article 30 compliance built in.
