
    No-Code Data Labeling for Legal Teams

    Attorneys understand contracts and privilege better than any ML engineer. Here's why legal AI needs attorney-labeled data, why privileged documents can't go to cloud platforms, and how desktop tools preserve privilege.

Ertas Team

    An ML engineer is labeling contract clauses for a legal AI model. They encounter an indemnification clause with a mutual carve-out for willful misconduct and a cap tied to 12 months of fees. Is this "standard," "favorable," or "unfavorable"?

    The ML engineer guesses "standard." A commercial litigator would recognize this as favorable to the indemnifying party — the willful misconduct carve-out is narrow, and fee-based caps are typically more protective than uncapped indemnification. The difference between those labels determines whether the model learns to flag this clause for negotiation or wave it through.

    Legal AI is only as good as the legal judgment embedded in its training data. And that judgment cannot come from ML engineers.

    Legal documents are not just text. They are instruments with specific legal effects that depend on jurisdiction, governing law, parties, context, and how courts have interpreted similar language. Labeling legal data accurately requires the same skills as practicing law.

    Contractual language is intentionally ambiguous. Lawyers draft provisions with constructive ambiguity — language that both parties can interpret favorably. Determining what a clause "means" for labeling purposes requires understanding how a court would likely interpret it, which requires legal training and experience.

    Classification depends on perspective. The same clause is "favorable" to one party and "unfavorable" to the other. A labeler must understand which perspective the model is being trained to adopt. An ML engineer labeling without this context will produce inconsistent labels that confuse the model.

    Legal significance is not proportional to text length. A two-word phrase — "including, without limitation" — has significant legal effect. A three-page recital section might have almost none. ML engineers tend to weight labels by text volume. Attorneys weight by legal consequence.

    Precedent matters. Whether a particular clause structure has been upheld or struck down by courts affects its classification. This knowledge lives in attorneys' experience, not in the text itself.

    A 2025 study from Stanford's CodeX lab found that contract review models trained on attorney-labeled data achieved 89% agreement with senior attorney judgment, while models trained on paralegal-labeled data achieved 71% and models trained on non-legal-annotator data achieved 54%. The gap is not small. It is the difference between a useful tool and an unreliable one.

    The Privilege Problem

    Attorney-client privilege and work product doctrine create a hard constraint that most annotation platforms cannot satisfy.

    Privilege can be waived by disclosure. Attorney-client privilege protects confidential communications between attorneys and clients. When privileged documents are uploaded to a cloud annotation platform, there is a risk of privilege waiver. If the platform's employees can access the data, if the data transits through third-party infrastructure, or if the platform's terms of service grant any rights to uploaded data, privilege may be compromised.

    This is not theoretical. Courts have found privilege waiver when documents were shared with third-party litigation support vendors without adequate confidentiality protections. A cloud annotation platform with access to privileged legal documents creates the same risk.

    Work product doctrine has similar constraints. Documents prepared in anticipation of litigation — case analyses, strategy memos, deposition summaries — are protected work product. Sharing them with a third-party annotation platform can waive that protection if the platform is not covered by a common-interest or confidentiality agreement.

    Ethical obligations compound the problem. Attorneys have professional responsibility obligations to maintain client confidences. ABA Model Rule 1.6 requires "reasonable efforts" to prevent unauthorized disclosure. Uploading client documents to a cloud platform for ML training purposes raises questions about whether this constitutes a "reasonable" use, especially without explicit client consent.

Conflicts of interest compound the risk. Large law firms handle matters for competing clients. If contract data from Client A and Client B is uploaded to the same annotation platform, there is a risk of cross-contamination — even if the data is logically separated. The ethical screens that law firms maintain internally do not extend to third-party platforms.

    The practical effect: most law firms and legal departments cannot use cloud-based annotation tools for the documents that matter most. The data that would produce the best legal AI models — privileged communications, work product, confidential client documents — is exactly the data that cannot leave the organization's control.

    Self-Hosting Is Not the Answer

    The obvious alternative is self-hosting an annotation platform on the firm's own infrastructure. This keeps data internal but introduces a different set of problems.

    Law firms do not have DevOps teams. Most law firm IT departments manage desktops, email, document management systems, and network infrastructure. They do not run containerized applications. Asking them to deploy and maintain a Docker-based annotation platform is asking them to develop capabilities they do not have and do not need for any other purpose.

    Security review is intensive. Any new application that touches client data requires review by the firm's information security team (and often the client's security team for matters governed by outside counsel guidelines). Self-hosted applications with web interfaces, database backends, and API endpoints present a larger attack surface than a desktop application, leading to longer review cycles.

    Cost is disproportionate. For a firm that needs to label 5,000-10,000 examples for a specific legal AI project, the infrastructure cost and IT labor of self-hosting an annotation platform can exceed $30,000-50,000 — before anyone labels a single document.

    What Attorneys Need from a Labeling Tool

    Based on our work with legal teams at firms ranging from 50 to 2,000 attorneys, the requirements are clear:

    Desktop-native operation. The tool runs on the attorney's laptop or workstation. Documents stay on local storage or the firm's document management system. Nothing is transmitted externally. Privilege is preserved by architecture, not by policy.

    No technical prerequisites. Attorneys should not need to install Python, run terminal commands, or understand data formats. The tool should install from a standard installer and open like any desktop application.

    Legal-workflow integration. Attorneys work with documents in PDF, DOCX, and text formats. The tool should open these formats natively, display them in a readable layout, and allow annotation directly on the document. Requiring format conversion before labeling adds friction that kills adoption.

    Configurable taxonomy. Legal classification schemas vary by practice area, firm, and client. Contract review uses different categories than litigation document review, which uses different categories than regulatory compliance. The labeling schema should be configurable through a visual interface without modifying code.
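To make that concrete, here is a rough sketch of what a configured contract-review schema captures. The field names, categories, and the "perspective" setting are assumptions for illustration, not an actual configuration format — but note how an explicit perspective field resolves the favorability ambiguity described earlier.

```python
# Hypothetical clause-classification schema for contract review.
# Category names, the "perspective" field, and clause types are illustrative,
# not a real product configuration format.
contract_review_schema = {
    "task": "clause_classification",
    "perspective": "customer",  # which party's favorability the labels reflect
    "labels": [
        {"name": "standard", "description": "Market-standard language; no flag"},
        {"name": "favorable", "description": "Better than market for our side"},
        {"name": "unfavorable", "description": "Worse than market; flag for negotiation"},
    ],
    "clause_types": [
        "indemnification",
        "limitation_of_liability",
        "termination",
        "confidentiality",
    ],
}
```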

    Audit trail. Legal work requires accountability. Every label should be attributed to the attorney who applied it, timestamped, and logged. This supports quality review, inter-annotator reliability measurement, and — if the labeled data is ever challenged — defensibility of the training dataset.
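A minimal sketch of what one entry in that trail might contain, plus the simplest possible reliability check. All field names here are assumptions, not an actual log format.

```python
from dataclasses import dataclass, asdict
from datetime import datetime, timezone

# Hypothetical audit record for a single labeling decision.
# Field names are illustrative; the point is attribution, a timestamp,
# and enough context to trace the label back to a document and schema version.
@dataclass
class LabelEvent:
    document_id: str     # reference into the firm's document management system
    clause_id: str       # which span of the document was labeled
    label: str           # e.g. "favorable"
    annotator: str       # the attorney who applied the label
    timestamp: str       # ISO 8601, UTC
    schema_version: str  # which taxonomy revision was in force

def percent_agreement(labels_a: list[str], labels_b: list[str]) -> float:
    """Simple inter-annotator agreement: share of clauses where two attorneys agree."""
    assert len(labels_a) == len(labels_b)
    return sum(a == b for a, b in zip(labels_a, labels_b)) / len(labels_a)

event = LabelEvent(
    document_id="DMS-2024-001742",
    clause_id="clause-087",
    label="favorable",
    annotator="jdoe@firm.example",
    timestamp=datetime.now(timezone.utc).isoformat(),
    schema_version="contract-review-v3",
)
print(asdict(event))
print(percent_agreement(["standard", "favorable"], ["standard", "unfavorable"]))  # 0.5
```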

    The Efficiency Case

    Beyond privilege and compliance, there is a straightforward efficiency argument for attorney labeling.

    A mid-level associate can review and label 40-60 contract clauses per hour. They understand the language, recognize standard provisions immediately, and only slow down for genuinely unusual terms. At billing rates of $400-600/hour, the cost per labeled example is $7-15.

    An ML engineer labeling the same clauses manages 15-25 per hour because they must look up terms, consult references, and message attorneys about ambiguous provisions. Their fully loaded cost is $80-120/hour, making the cost per labeled example $3-8 — cheaper per example, but lower quality and slower throughput.
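The per-example figures follow directly from rate divided by throughput. A quick check using the ranges quoted above:

```python
def cost_per_example(hourly_rate: float, examples_per_hour: float) -> float:
    """Labeling cost per example: hourly cost divided by labeling throughput."""
    return hourly_rate / examples_per_hour

# Mid-level associate: $400-600/hour at 40-60 clauses/hour
print(f"${cost_per_example(400, 60):.2f} - ${cost_per_example(600, 40):.2f}")  # $6.67 - $15.00
# ML engineer: $80-120/hour at 15-25 clauses/hour
print(f"${cost_per_example(80, 25):.2f} - ${cost_per_example(120, 15):.2f}")   # $3.20 - $8.00
```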

    When you factor in the cost of retraining models on corrected labels — which happens in roughly 40% of projects using non-attorney-labeled legal data — the attorney-labeled approach is cheaper overall and produces better models on the first iteration.

    The constraints are clear: data cannot leave the firm, attorneys will not use technical tools, and privilege must be preserved by design.

    Ertas Data Suite meets these constraints directly. It is a native desktop application that attorneys install on their workstation. Documents stay on local storage. The labeling interface is visual — no code, no command line, no data engineering. Labeling schemas are configured through point-and-click. Exports produce standard formats that ML teams consume directly.
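"Standard formats" in practice usually means something like JSON Lines, one labeled example per row. The fields below are an assumption for illustration, not a documented export schema:

```python
import json

# Hypothetical JSONL export: one labeled clause per line, ready for a training pipeline.
# Field names are illustrative, not an actual export schema.
examples = [
    {
        "text": "Supplier shall indemnify Customer against third-party claims, "
                "except to the extent caused by Customer's willful misconduct...",
        "label": "favorable",
        "clause_type": "indemnification",
        "annotator": "jdoe@firm.example",
        "labeled_at": "2025-06-12T14:03:00Z",
    },
]

with open("contract_clauses.jsonl", "w", encoding="utf-8") as f:
    for example in examples:
        f.write(json.dumps(example, ensure_ascii=False) + "\n")
```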

    Privilege is preserved because the architecture makes waiver impossible — data never leaves the attorney's machine. IT review is straightforward because there is no server component, no network listener, no database to secure.

    Legal AI needs legal judgment in its training data. The tooling should make that judgment accessible, not lock it behind technical barriers.

    Turn unstructured data into AI-ready datasets — without it leaving the building.

    On-premise data preparation with full audit trail. No data egress. No fragmented toolchains. EU AI Act Article 30 compliance built in.
