Ertas for Document Classification
Fine-tune AI models that automatically categorize documents by type, department, urgency, or custom taxonomies — with accuracy that generic models cannot match.
The Challenge
Organizations process thousands of documents daily — contracts, invoices, correspondence, reports, applications, and compliance filings — and routing each document to the right team or workflow depends on accurate classification. Manual classification is slow, inconsistent, and scales poorly. When a single misrouted document can delay a legal filing or lose a time-sensitive business opportunity, the cost of errors is significant.
Generic AI models struggle with document classification in specialized domains because they lack context about an organization's specific document taxonomy. A general model might distinguish between an invoice and a contract, but it cannot reliably differentiate between a master services agreement and a statement of work, or between a regulatory filing and an internal compliance memo. These fine-grained distinctions depend on domain knowledge found in the organization's actual document corpus, which is exactly the kind of problem fine-tuning is designed to solve.
The Solution
Ertas enables organizations to fine-tune classification models on their own document taxonomy using real examples from their archives. With Ertas Studio, teams upload labeled document samples in JSONL format — where each entry maps document text to its correct category — and train a lightweight LoRA adapter that teaches the model to recognize the specific patterns, vocabulary, and structural cues that distinguish each document type in their taxonomy.
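As a rough illustration of the JSONL format described above, the following Python sketch writes labeled samples as one JSON object per line and runs a basic sanity check before upload. The field names `text` and `label`, the category names, and the sample texts are assumptions made for illustration; the schema Ertas Studio actually expects may differ.

```python
import json

# Hypothetical field names; check the schema your training tool expects.
REQUIRED_FIELDS = ("text", "label")

def to_jsonl(samples):
    """Serialize a list of dicts as JSONL: one JSON object per line."""
    return "\n".join(json.dumps(s, ensure_ascii=False) for s in samples)

def validate_jsonl(jsonl_text, allowed_labels):
    """Return a list of (line_number, problem) tuples; an empty list means no problems found."""
    problems = []
    for lineno, line in enumerate(jsonl_text.splitlines(), start=1):
        if not line.strip():
            continue  # ignore blank lines
        try:
            record = json.loads(line)
        except json.JSONDecodeError:
            problems.append((lineno, "not valid JSON"))
            continue
        if not isinstance(record, dict):
            problems.append((lineno, "not a JSON object"))
            continue
        missing = [f for f in REQUIRED_FIELDS if f not in record]
        if missing:
            problems.append((lineno, "missing fields: " + ", ".join(missing)))
        elif record["label"] not in allowed_labels:
            problems.append((lineno, "unknown label: " + str(record["label"])))
    return problems

# Illustrative samples only; real entries would hold full document text.
samples = [
    {"text": "This Master Services Agreement is entered into by ...", "label": "master_services_agreement"},
    {"text": "Statement of Work No. 3 under the Agreement dated ...", "label": "statement_of_work"},
]
jsonl_text = to_jsonl(samples)
problems = validate_jsonl(jsonl_text, {"master_services_agreement", "statement_of_work"})
```

Checking labels against the allowed taxonomy before training catches typos in category names, which would otherwise show up as spurious extra classes.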
The fine-tuned model can be deployed as a classification endpoint through Ollama, vLLM, or Ertas Cloud, processing incoming documents in real time with sub-second latency. When the model runs on your own infrastructure, for example through Ollama or vLLM, sensitive document content never leaves your network. Ertas Vault encrypts and access-controls all training data and model artifacts, which helps meet the data governance requirements of regulated industries. As the document taxonomy evolves (new categories are added, existing ones are split or merged), teams can retrain the model in Ertas Studio with updated examples and redeploy without any application changes.
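For a self-hosted deployment, a client might call the model along the following lines. This is a minimal sketch, assuming a local Ollama server on its default port and a fine-tuned model registered under the hypothetical name `doc-classifier`; the prompt wording and the helper functions are illustrative and are not part of any Ertas API.

```python
import json
import urllib.request

# Assumptions: local Ollama server on its default port; the model name is hypothetical.
OLLAMA_URL = "http://localhost:11434/api/generate"
MODEL_NAME = "doc-classifier"

def build_prompt(document_text, categories):
    """Ask the model to answer with exactly one category name."""
    return (
        "Classify the document into one of these categories: "
        + ", ".join(categories)
        + ".\nAnswer with the category name only.\n\nDocument:\n"
        + document_text
    )

def parse_label(model_output, categories):
    """Map raw model output to a known category, or None if it matches none."""
    cleaned = model_output.strip().lower()
    for category in categories:
        if cleaned == category.lower():
            return category
    return None

def classify(document_text, categories, timeout=10.0):
    """Send one document to the model and return the predicted category (or None)."""
    payload = json.dumps({
        "model": MODEL_NAME,
        "prompt": build_prompt(document_text, categories),
        "stream": False,  # request a single JSON response instead of a stream
    }).encode("utf-8")
    request = urllib.request.Request(
        OLLAMA_URL, data=payload, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        body = json.loads(response.read().decode("utf-8"))
    return parse_label(body.get("response", ""), categories)
```

Returning `None` for output that matches no known category lets the caller send those documents to manual handling instead of silently misrouting them.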
Key Features
Custom Taxonomy Training
Train classification models on your organization's exact document taxonomy using labeled examples. Training supports hierarchical categories, multi-label classification, and confidence scoring per category.
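One way to picture hierarchical categories and per-category confidence is sketched below. It assumes categories are written as slash-separated paths and that the classifier returns a confidence score per category; both conventions, along with the scores and the threshold, are hypothetical and chosen only for illustration.

```python
def ancestors(category_path, sep="/"):
    """Expand a path such as 'legal/contract/msa' into every level of the hierarchy."""
    parts = category_path.split(sep)
    return [sep.join(parts[: i + 1]) for i in range(len(parts))]

def select_labels(scores, threshold=0.5):
    """Multi-label selection: keep each category whose confidence meets the
    threshold, ordered from most to least confident."""
    kept = [(cat, conf) for cat, conf in scores.items() if conf >= threshold]
    return sorted(kept, key=lambda item: item[1], reverse=True)

# Illustrative scores, not real model output.
scores = {
    "legal/contract/msa": 0.91,
    "finance/invoice": 0.12,
    "legal/compliance/memo": 0.64,
}
labels = select_labels(scores, threshold=0.5)
levels = ancestors("legal/contract/msa")
```

Expanding a leaf category into its ancestors makes it possible to route on a coarse level (for example, anything under `legal`) even when the fine-grained prediction is uncertain.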
Pre-Trained Document Models
Start from base models on Hub that already understand document structure — headers, footers, tables, signatures — so your fine-tuning focuses on classification accuracy rather than basic document comprehension.
Real-Time Classification API
Deploy your classifier as a low-latency REST endpoint through Ertas Cloud. Process documents on arrival with sub-second classification and route them automatically to downstream workflows.
Secure Document Processing
Ertas Vault ensures all training documents and inference data are encrypted at rest and in transit. Configurable retention policies automatically purge processed documents after classification.
Example Workflow
A large insurance company receives 10,000+ documents daily across email, fax, and web portal channels. The documents include new claims, policy amendments, medical records, adjuster reports, and legal correspondence — each requiring routing to a different department. The team exports 50,000 labeled document examples from their archive and uploads them to Ertas Vault. In Ertas Studio, they fine-tune a 7B model with a LoRA adapter targeting their 28-category taxonomy. After training, the model achieves 96% classification accuracy on a held-out test set — compared to 71% from a generic model. The classifier is deployed as an API endpoint behind their document intake system, automatically routing each incoming document to the correct department queue with a confidence score. Documents below the confidence threshold are flagged for human review, creating a feedback loop that generates additional training data for future model improvements.
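The routing and evaluation steps in this workflow can be sketched in a few lines of Python. The threshold value, the review queue name, and the `Prediction` structure below are hypothetical; a real deployment would tune the threshold against its own held-out data.

```python
from dataclasses import dataclass

# Hypothetical values for illustration only.
CONFIDENCE_THRESHOLD = 0.85
REVIEW_QUEUE = "human_review"

@dataclass
class Prediction:
    document_id: str
    category: str
    confidence: float

def route(prediction, threshold=CONFIDENCE_THRESHOLD):
    """Return the destination queue: the predicted category if the model is
    confident enough, otherwise the human review queue."""
    if prediction.confidence >= threshold:
        return prediction.category
    return REVIEW_QUEUE

def accuracy(predicted_labels, true_labels):
    """Fraction of held-out examples where the predicted label equals the true label."""
    if len(predicted_labels) != len(true_labels):
        raise ValueError("label lists must have the same length")
    if not true_labels:
        raise ValueError("cannot compute accuracy on an empty test set")
    correct = sum(p == t for p, t in zip(predicted_labels, true_labels))
    return correct / len(true_labels)

confident = route(Prediction("doc-1", "new_claim", 0.97))
uncertain = route(Prediction("doc-2", "medical_record", 0.40))
```

Documents sent to the review queue can be stored with the reviewer's corrected label, giving the feedback loop described above a steady source of new training examples.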