
How to Fine-Tune a Legal AI Model Without an ML Team
Most AI agencies don't have ML engineers on staff. Here's how to fine-tune production-quality legal AI models using Ertas Studio — no Python, no GPU rental, no ML expertise required.
The biggest bottleneck for AI agencies entering the legal vertical is not sales or compliance knowledge — it is ML expertise. Fine-tuning a language model traditionally requires Python proficiency, understanding of training hyperparameters, access to GPU infrastructure, and experience debugging training runs.
Most agencies do not have this. They have automation engineers, workflow specialists, and client-facing consultants. Hiring an ML engineer adds $150K-250K in salary before they produce anything.
This article shows how to bypass the ML bottleneck entirely using Ertas Studio, walking through a complete legal model fine-tuning workflow from data preparation to deployment.
The Agency Bottleneck
A typical AI agency workflow for a legal client looks like this:
- Client wants AI-powered contract review
- Agency builds prompt-engineered solution with GPT-4o
- Results are decent but not accurate enough for production use
- Client asks for better accuracy and data privacy
- Agency realises they need fine-tuning and on-premises deployment
- Agency hits the wall — no one on the team knows how to fine-tune a model
At this point, agencies either:
- Turn down the engagement (lost revenue)
- Outsource to an ML contractor ($10K-30K per model, 4-8 week timeline)
- Try to learn fine-tuning from scratch (months of experimentation)
None of these options are sustainable. Ertas Studio provides a fourth option: a no-code fine-tuning interface purpose-built for non-ML teams.
Walkthrough: Fine-Tuning a Legal Contract Review Model
Here is the complete workflow, step by step.
Step 1: Prepare Your Training Data
Training data for legal contract review comes from the law firm's historical work. You need instruction-response pairs where:
- Instruction: A contract clause or section + the analysis question
- Response: The lawyer's analysis, risk assessment, or extracted information
Example:
```json
{
  "instruction": "Review the following indemnification clause and assess risk level:\n\n'Tenant shall indemnify and hold harmless Landlord from and against any and all claims, damages, losses, costs and expenses (including reasonable attorneys' fees) arising out of or in connection with Tenant's use of the Premises, except to the extent caused by Landlord's gross negligence or willful misconduct.'",
  "response": "Risk Level: MODERATE\n\nAnalysis: This is a standard one-way indemnification clause favouring the landlord. The carve-out for landlord's gross negligence/willful misconduct provides some protection, but the clause is broader than ideal. Key concerns:\n1. 'Any and all claims' is very broad — consider narrowing to claims arising from tenant's negligence\n2. No cap on indemnification liability\n3. Includes attorneys' fees which could be significant\n\nRecommendation: Negotiate to add mutual indemnification and a liability cap tied to the lease value."
}
```
Data requirements:
- Minimum: 500 examples (for a basic model)
- Recommended: 2,000-3,000 examples (for production quality)
- Format: JSONL (one JSON object per line)
Where to source data:
- Export from document management systems (iManage, NetDocuments)
- Convert lawyer annotations and comments into structured pairs
- Use historical review memoranda as response templates
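Studio validates the file at upload, so none of this is required, but if you want a quick local sanity check first, the format is simple enough to verify in a few lines of Python. A minimal sketch, assuming your export is named train.jsonl (the filename is illustrative):

```python
import json

# Assumed filename; point this at wherever your export lives.
PATH = "train.jsonl"
MIN_EXAMPLES = 500  # the minimum recommended above for a basic model

examples = []
with open(PATH, encoding="utf-8") as f:
    for line_no, line in enumerate(f, start=1):
        line = line.strip()
        if not line:
            continue  # tolerate blank lines
        record = json.loads(line)  # raises if a line is not valid JSON
        # Every record needs both keys, each a non-empty string.
        for key in ("instruction", "response"):
            assert isinstance(record.get(key), str) and record[key].strip(), \
                f"line {line_no}: missing or empty {key!r}"
        examples.append(record)

print(f"{len(examples)} valid examples")
if len(examples) < MIN_EXAMPLES:
    print(f"Warning: below the {MIN_EXAMPLES}-example minimum for a basic model")
```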
Step 2: Upload to Ertas Studio
In Ertas Studio:
- Create a new project and name it (e.g., "Acme Legal - Contract Review")
- Upload your JSONL training file
- Studio automatically validates the format and shows a preview of your examples
- Review the data statistics — distribution of response lengths, instruction categories
Studio flags potential data quality issues: duplicate entries, extremely short responses, formatting inconsistencies. Fix these before proceeding.
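If you would rather catch these before upload, the two most common issues are easy to check locally. A rough sketch; the 50-character threshold is an assumption, not Studio's actual cut-off:

```python
import json
from collections import Counter

PATH = "train.jsonl"     # assumed filename
MIN_RESPONSE_CHARS = 50  # assumption: tune to your shortest legitimate analysis

with open(PATH, encoding="utf-8") as f:
    records = [json.loads(line) for line in f if line.strip()]

# Duplicate entries: identical instruction/response pairs.
counts = Counter((r["instruction"], r["response"]) for r in records)
duplicates = sum(n - 1 for n in counts.values() if n > 1)

# Extremely short responses are usually annotation or export mistakes.
short = [r for r in records if len(r["response"].strip()) < MIN_RESPONSE_CHARS]

print(f"{duplicates} duplicate entries, {len(short)} suspiciously short responses")
```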
Step 3: Configure Training
Studio presents training configuration with sensible defaults:
| Parameter | Default | What It Means |
|---|---|---|
| Base model | Llama 3.1 8B | The foundation model to fine-tune |
| Adapter type | LoRA | Trains a small adapter, not the full model |
| LoRA rank | 16 | Controls adapter capacity (higher = more capacity, more compute) |
| Epochs | 3 | Number of passes through the training data |
| Learning rate | 2e-4 | How aggressively the model learns (lower = more stable) |
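You never touch these values as code inside Studio, but it can help to know what they correspond to in open-source tooling. Here is a rough equivalent of the adapter settings expressed with Hugging Face's PEFT library; the alpha, dropout, and target_modules values are common conventions assumed here, not something the table specifies:

```python
from peft import LoraConfig

# Approximately what Studio's defaults map to in open-source LoRA tooling.
lora_config = LoraConfig(
    r=16,                                 # LoRA rank from the table: adapter capacity
    lora_alpha=32,                        # assumption: 2x rank is a common scaling convention
    lora_dropout=0.05,                    # assumption: a typical regularisation default
    target_modules=["q_proj", "v_proj"],  # assumption: attention projections, a common choice
    task_type="CAUSAL_LM",                # instruction-following is causal LM fine-tuning
)

# Epochs (3) and learning rate (2e-4) belong to the trainer's arguments,
# not to the LoRA config itself.
```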
For legal tasks, the defaults work well. The main decision is base model size:
- 8B: Fast training, runs on consumer GPUs, sufficient for single-task models (e.g., just contract review)
- 13B: Slower training, needs more VRAM, better for multi-task models (contract review + case summarisation + document classification)
Step 4: Train
Click "Start Training." Studio handles:
- Tokenisation and data formatting
- GPU allocation and scheduling
- Training execution with automatic checkpointing
- Evaluation on a held-out validation set
- Loss curves and quality metrics displayed in real time
Training time for a 2,000-example dataset on an 8B model: approximately 30-60 minutes.
Step 5: Evaluate
Once training completes, Studio provides an evaluation interface:
- Side-by-side comparison: Send the same contract clause to both the base model and your fine-tuned model. Compare outputs.
- Validation metrics: Loss on held-out data, response quality scores
- Test with custom inputs: Paste any contract clause and see the fine-tuned model's analysis
This is where the quality difference becomes obvious. The base model produces generic, sometimes inaccurate analysis. The fine-tuned model produces analysis that mirrors the firm's own lawyers — using their terminology, applying their risk thresholds, following their reporting format.
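When testing with custom inputs, it helps to script a few acceptance checks rather than eyeball every output. A minimal sketch, with section names drawn from the example responses in this article (illustrative only, not a Studio feature):

```python
# Hypothetical acceptance check for the firm's reporting format.
REQUIRED_SECTIONS = ["Risk Level:", "Recommendation:"]

def follows_house_format(output: str) -> bool:
    """True only if the analysis contains every section the firm's memos use."""
    return all(section in output for section in REQUIRED_SECTIONS)

sample = "Risk Level: MODERATE\n...\nRecommendation: Negotiate a liability cap."
assert follows_house_format(sample)
```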
Step 6: Export and Deploy
Export the fine-tuned model in your preferred format:
- GGUF: For deployment via Ollama (recommended for most agency deployments)
- SafeTensors: For deployment via vLLM or Hugging Face Transformers
- LoRA adapter only: If deploying alongside other adapters from the same base model
Deploy to the client's hardware using Ollama or vLLM. The model runs locally — no internet connection required after deployment.
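As a concrete example, suppose the GGUF export has been registered locally with something like `ollama create acme-contract-review -f Modelfile` (the model name is an assumption). Ollama then serves it over a local HTTP API, which makes it easy to reproduce Studio's side-by-side comparison on the client's own hardware:

```python
import json
import urllib.request

# Ollama's default local endpoint; nothing here leaves the machine.
OLLAMA_URL = "http://localhost:11434/api/generate"

def ask(model: str, prompt: str) -> str:
    """Send one prompt to a locally served model and return the full response."""
    payload = json.dumps({"model": model, "prompt": prompt, "stream": False}).encode()
    request = urllib.request.Request(
        OLLAMA_URL, data=payload, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)["response"]

clause = "Tenant shall indemnify and hold harmless Landlord ..."  # any clause to test
prompt = f"Review the following clause and assess risk level:\n\n{clause}"

# "llama3.1:8b" is the untuned base pulled for comparison;
# "acme-contract-review" is the assumed name of the fine-tuned model.
print("--- base model ---\n" + ask("llama3.1:8b", prompt))
print("--- fine-tuned ---\n" + ask("acme-contract-review", prompt))
```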
Before and After: Output Quality Comparison
Input clause:
"Notwithstanding any other provision of this Agreement, either party may terminate this Agreement immediately upon written notice if the other party commits a material breach and fails to cure such breach within ten (10) business days of receiving written notice thereof."
Base model (Llama 3.1 8B, no fine-tuning):
"This is a termination clause. It allows either party to terminate the agreement if there is a material breach. The cure period is 10 business days."
Fine-tuned model (same base, trained on firm's data):
"Risk Level: LOW-MODERATE
Standard mutual termination for material breach with cure period. The 10 business day cure period is shorter than our typical recommendation of 30 calendar days. Key considerations:
- 'Material breach' is undefined — recommend adding examples or referencing specific obligations
- 10 business days may be insufficient for complex curing actions (e.g., insurance procurement, structural repairs)
- 'Immediately upon written notice' after cure failure is acceptable
- 'Notwithstanding any other provision' creates potential conflict with force majeure clause in Section 14
Recommendation: Negotiate cure period to 30 calendar days. Add definition of material breach or cross-reference to specific obligations."
The difference is not incremental — it is categorical. The fine-tuned model produces the kind of analysis a junior associate would write after reviewing the firm's analysis guidelines.
From One Model to a Scalable Practice
Once you have fine-tuned your first legal model, the process is replicable:
- Same workflow, different client: Each new law firm engagement follows the same data → train → deploy pipeline
- Same base model, different adapters: Train client-specific LoRA adapters from the same base model
- Same infrastructure, multiple models: A single GPU serves multiple client models through adapter hot-swapping
- Portfolio pricing: Your per-client cost decreases with each additional client, improving margins
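The adapter hot-swapping point deserves a concrete illustration. With vLLM's multi-LoRA support, one copy of the base model stays in GPU memory and each request names the client adapter to apply. A minimal sketch; the adapter names and paths are assumptions:

```python
from vllm import LLM, SamplingParams
from vllm.lora.request import LoRARequest

# One base model in GPU memory; per-client LoRA adapters are applied per request.
llm = LLM(model="meta-llama/Llama-3.1-8B-Instruct", enable_lora=True)
params = SamplingParams(temperature=0.2, max_tokens=512)

# Assumed paths: one "LoRA adapter only" export per client, as in Step 6.
client_a = LoRARequest("client_a", 1, "/models/adapters/client-a")
client_b = LoRARequest("client_b", 2, "/models/adapters/client-b")

# The same call serves different clients by swapping the adapter argument.
outputs = llm.generate(
    ["Review the following termination clause and assess risk level: ..."],
    params,
    lora_request=client_a,
)
print(outputs[0].outputs[0].text)
```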
The ML bottleneck that stopped your agency from entering the legal vertical no longer exists.
Ship AI that runs on your users' devices.
Ertas early bird pricing starts at $14.50/mo — locked in for life. Plans for builders and agencies.
Further Reading
- Fine-Tune AI Models Without Code — General guide to no-code fine-tuning with Ertas Studio
- Introducing Ertas Studio — Platform overview and capabilities