
Getting Started: Fine-Tune Your First Legal AI Model in 30 Minutes with Ertas
A hands-on tutorial: sign up for Ertas Studio, upload a sample legal dataset, configure and run fine-tuning, evaluate the output, and export your model — all in 30 minutes.
By the end of this tutorial, you will have a fine-tuned AI model that can analyse legal contract clauses — trained on sample data, evaluated, and ready to export.
No ML experience needed. No Python. No GPU rental. Just a browser and 30 minutes.
What You Will Build
A fine-tuned model that:
- Takes a contract clause as input
- Identifies the clause type (indemnification, termination, confidentiality, etc.)
- Assesses the risk level (low, moderate, high)
- Provides a brief analysis with key concerns
- Suggests negotiation points
This is a simplified version of the contract review models that agencies deploy for law firm clients. The process is identical — only the training data and scope differ.
Prerequisites
- An Ertas account (sign up at ertas.ai)
- 30 minutes of uninterrupted time
- No technical prerequisites — the tutorial covers everything
Minute 0-5: Sign Up and Create a Project
- Go to ertas.ai and create your account
- From the dashboard, click New Project
- Name it: "Legal Contract Review - Tutorial"
- Select project type: Fine-Tuning
You are now in the Ertas Studio workspace. This is where you will upload data, configure training, and evaluate your model.
Minute 5-10: Upload Sample Legal Dataset
For this tutorial, we will use a sample dataset of annotated contract clauses. In a real deployment, this data comes from the law firm's historical document reviews.
Sample Data Format
The training data is a JSONL file where each line is a JSON object with an instruction and response:
```json
{"instruction": "Analyse the following contract clause for risk level and key concerns:\n\n'The Vendor shall indemnify, defend, and hold harmless the Client from and against any and all claims, losses, damages, liabilities, costs, and expenses (including reasonable attorneys' fees) arising out of or relating to any breach of this Agreement by the Vendor.'", "response": "Clause Type: Indemnification\nRisk Level: LOW\n\nAnalysis: This is a standard one-way indemnification clause where the vendor indemnifies the client. Key observations:\n1. Scope is limited to breaches of the agreement — this is appropriately narrow\n2. Includes attorneys' fees, which is standard\n3. Uses 'arising out of or relating to' — broader than 'arising out of' alone but still reasonable\n4. One-way protection favours the client\n\nNegotiation Points:\n- Consider adding a reciprocal indemnification clause\n- Consider adding a liability cap (e.g., aggregate limit equal to fees paid)\n- The 'relating to' language could be narrowed to 'directly arising from' if the vendor has leverage"}
```
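Before uploading, it is worth checking that every line of your file parses and carries both required fields. A minimal validator (the `validate_jsonl` helper and its error messages are illustrative, not part of the Ertas platform) might look like this:

```python
import json

def validate_jsonl(path):
    """Check that every non-empty line is a JSON object with a non-empty
    'instruction' and 'response' string, matching the format above."""
    problems = []
    with open(path, encoding="utf-8") as f:
        for lineno, line in enumerate(f, start=1):
            line = line.strip()
            if not line:
                continue  # skip blank lines
            try:
                record = json.loads(line)
            except json.JSONDecodeError as exc:
                problems.append((lineno, f"invalid JSON: {exc}"))
                continue
            for key in ("instruction", "response"):
                if not isinstance(record.get(key), str) or not record[key].strip():
                    problems.append((lineno, f"missing or empty '{key}'"))
    return problems
```

Running it before upload catches the most common failure mode: a stray unescaped newline or quote that breaks one line of the file.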
Upload Steps
1. Click Upload Data in your project
2. Select your JSONL file (or drag and drop)
3. Studio validates the format and shows a preview:
   - Total examples: displayed
   - Average instruction length: displayed
   - Average response length: displayed
   - Any formatting issues: flagged
4. Review the preview — scroll through a few examples to confirm they look correct
5. Click Confirm Upload
If you do not have a legal dataset prepared, Ertas Studio includes sample datasets for common use cases. Select the "Legal Contract Analysis" sample dataset to proceed with the tutorial.
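The summary statistics in the upload preview are simple to reproduce locally if you want to sanity-check your data first. A sketch (the returned field names are illustrative, not the platform's):

```python
def preview_stats(records):
    """Summarise a list of {'instruction', 'response'} dicts the way the
    upload preview does: total count plus average field lengths."""
    total = len(records)
    avg_instr = sum(len(r["instruction"]) for r in records) / total
    avg_resp = sum(len(r["response"]) for r in records) / total
    return {
        "total_examples": total,
        "avg_instruction_chars": round(avg_instr, 1),
        "avg_response_chars": round(avg_resp, 1),
    }
```

Unusually short responses or wildly uneven lengths are worth investigating before training, since they often indicate truncated annotations.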
Minute 10-15: Configure Training
With your data uploaded, configure the fine-tuning job.
Base Model Selection
Click Select Base Model. For this tutorial:
- Llama 3.1 8B (Recommended) — Good balance of quality and speed for legal tasks
- Mistral 7B — Viable alternative, slightly different output style
Select Llama 3.1 8B.
Training Parameters
Studio shows default parameters with explanations. For this tutorial, keep the defaults:
| Parameter | Default | What It Means |
|---|---|---|
| Adapter type | LoRA | Trains a small adapter instead of modifying the entire model |
| LoRA rank | 16 | Controls how much the model can learn (16 is a good default) |
| LoRA alpha | 32 | Scaling factor (2× rank is standard) |
| Epochs | 3 | Number of passes through the training data |
| Learning rate | 2e-4 | How aggressively the model learns |
| Batch size | Auto | Studio optimises based on your data and available GPU |
For a production model, you might adjust these based on results. For the tutorial, defaults work well.
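For orientation, the defaults in the table map onto a configuration like the following. This is a hypothetical plain-dict rendering for illustration only — Studio's internal representation may differ:

```python
# Illustrative rendering of the tutorial defaults (not Studio's actual config schema)
TRAINING_DEFAULTS = {
    "adapter_type": "lora",
    "lora_rank": 16,
    "lora_alpha": 32,      # 2x rank, the common convention
    "epochs": 3,
    "learning_rate": 2e-4,
    "batch_size": "auto",  # chosen by the platform at runtime
}

def follows_alpha_convention(config):
    """True when alpha follows the usual 2x-rank scaling convention."""
    return config["lora_alpha"] == 2 * config["lora_rank"]
```

If you later raise the rank (say, to 32 for a harder task), the same convention suggests raising alpha to 64 alongside it.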
Validation Split
Studio automatically holds out 10% of your data for validation — these examples are withheld from training and used afterward to measure quality. This is standard ML practice and happens automatically.
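Conceptually, the hold-out works like this (a minimal sketch of a seeded 90/10 split, not the platform's implementation):

```python
import random

def train_val_split(records, val_fraction=0.1, seed=42):
    """Shuffle with a fixed seed, then hold out a fraction for validation,
    mirroring the automatic 10% hold-out described above."""
    shuffled = records[:]             # copy so the caller's list is untouched
    random.Random(seed).shuffle(shuffled)
    n_val = max(1, int(len(shuffled) * val_fraction))
    return shuffled[n_val:], shuffled[:n_val]
```

The fixed seed matters: it keeps the validation set stable across runs, so metric changes reflect the model, not a different sample.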
Minute 15-20: Run Fine-Tuning
Click Start Training.
Studio handles everything:
- Formats your data for the selected base model's tokeniser
- Allocates GPU resources
- Runs the training loop with automatic checkpointing
- Evaluates on the held-out validation set
What You Will See
A real-time training dashboard shows:
- Training loss curve — should decrease and stabilise. A smooth downward curve means the model is learning.
- Validation loss — should track training loss without diverging. If validation loss increases while training loss decreases, the model is overfitting (unlikely with these defaults).
- Estimated time remaining — typically 15-30 minutes for a sample dataset on an 8B model.
- Checkpoints saved — Studio saves the model at each epoch, so you can revert if a later epoch performs worse.
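The overfitting pattern described above — validation loss rising while training loss keeps falling — is easy to detect mechanically. A sketch of the check (the function and its `patience` parameter are illustrative, not a Studio API):

```python
def is_overfitting(train_losses, val_losses, patience=2):
    """Flag the divergence pattern: validation loss rising for `patience`
    consecutive epochs while training loss has kept falling."""
    if len(val_losses) <= patience:
        return False  # not enough history to judge
    rising = all(
        val_losses[-i] > val_losses[-i - 1] for i in range(1, patience + 1)
    )
    falling = train_losses[-1] < train_losses[-patience - 1]
    return rising and falling
```

This is also why per-epoch checkpoints matter: if the flag trips at epoch 3, you can simply export the epoch-2 checkpoint instead.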
While training runs, take a break or read about how LoRA works to understand what is happening under the hood.
Minute 20-25: Evaluate the Model
Training complete. Now test your fine-tuned model.
Side-by-Side Comparison
Studio provides a comparison interface. Enter a contract clause — one the model has not seen before — and compare outputs from the base model and your fine-tuned model.
Example input:
"Either party may terminate this Agreement upon sixty (60) days' prior written notice to the other party. In the event of termination, all outstanding invoices shall become immediately due and payable, and the receiving party shall return or destroy all Confidential Information within thirty (30) days."
Base model output (before fine-tuning):
This is a termination clause. It allows either party to terminate with 60 days notice. There are provisions about invoices and confidential information.
Fine-tuned model output:
Clause Type: Termination
Risk Level: LOW-MODERATE
Analysis: Standard mutual termination clause with reasonable notice period. Key observations:
- 60-day notice period is within normal range (30-90 days typical)
- Immediate acceleration of outstanding invoices on termination is standard but could create cash flow pressure if triggered unexpectedly
- 30-day window for return/destruction of Confidential Information is reasonable
- "Return or destroy" gives the receiving party flexibility — consider requiring certification of destruction
Negotiation Points:
- If you are the service provider, negotiate for a longer notice period (90 days) to allow for transition
- Add a survival clause specifying which provisions continue after termination (indemnification, confidentiality, IP ownership)
- Consider adding a termination-for-cause provision with a shorter notice period and cure right
The difference is immediately visible. The fine-tuned model produces structured, analytical output that matches what a junior associate would write.
Validation Metrics
Studio shows quantitative metrics:
- Validation loss — lower is better, compare against the base model
- Response quality score — automated assessment of output coherence and completeness
- Format adherence — how consistently the model follows the output format from training data
Review these metrics and test 5-10 additional clauses to build confidence in the model's quality.
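Of the three metrics, format adherence is the easiest to reason about: it is essentially the fraction of outputs that contain the section headers your training data taught. A sketch of one way to compute it (the exact headers come from this tutorial's examples; Studio's scoring may be more sophisticated):

```python
def format_adherence(outputs):
    """Fraction of model outputs containing all of the structural headers
    used in the tutorial's training examples."""
    required = ("Clause Type:", "Risk Level:", "Analysis:", "Negotiation Points:")
    if not outputs:
        return 0.0
    ok = sum(1 for text in outputs if all(header in text for header in required))
    return ok / len(outputs)
```

A score well below 1.0 usually means the training data itself was inconsistent about the output format, which is worth fixing before retraining.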
Minute 25-30: Export Your Model
Satisfied with the quality? Export the model for deployment.
Export Options
Click Export Model and select your format:
- GGUF (Recommended for most deployments) — Compatible with Ollama for local inference
- SafeTensors — Compatible with vLLM, Hugging Face Transformers
- LoRA adapter only — Just the adapter file, to be used alongside the base model
For this tutorial, select GGUF.
Download
Studio packages and quantises the model (reducing file size while preserving quality). The download is typically 4-6 GB for an 8B model.
Deploy (Bonus Step)
To run your model locally:
```shell
# Install Ollama (if you haven't already)
# Visit https://ollama.com

# Create a Modelfile
echo 'FROM /path/to/your-exported-model.gguf' > Modelfile

# Register the model
ollama create legal-contract-review -f Modelfile

# Test it
ollama run legal-contract-review "Analyse this clause: [paste a clause]"
```
Your fine-tuned legal AI model is now running locally. No API costs. No data sent to third parties. Ready to integrate with n8n or any application that supports OpenAI-compatible APIs.
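To call the local model from application code, Ollama exposes an OpenAI-compatible chat endpoint. A minimal sketch — it assumes `ollama serve` is running on the default port and that you registered the model under the name used earlier; the `build_payload`/`analyse_clause` helpers are illustrative:

```python
import json
import urllib.request

def build_payload(clause, model="legal-contract-review"):
    """Assemble an OpenAI-style chat request for a single clause."""
    return {
        "model": model,
        "messages": [{
            "role": "user",
            "content": ("Analyse the following contract clause for risk "
                        f"level and key concerns:\n\n{clause}"),
        }],
    }

def analyse_clause(clause, base_url="http://localhost:11434/v1"):
    """POST the clause to the locally running model and return its reply."""
    req = urllib.request.Request(
        f"{base_url}/chat/completions",
        data=json.dumps(build_payload(clause)).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        body = json.load(resp)
    return body["choices"][0]["message"]["content"]
```

The same request shape works from n8n's HTTP node or any OpenAI-compatible client, pointed at the local base URL instead of a hosted API.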
What is Next
You have completed a fine-tuning run on sample data. To move from tutorial to production:
- Collect real training data from your law firm client — historical contract reviews, annotated documents, analysis memos
- Increase the dataset — 2,000-3,000 examples for production quality
- Customise for the client — their risk thresholds, terminology, formatting preferences
- Deploy on client hardware — on-premise for privilege and compliance
- Iterate — collect feedback, add examples, retrain periodically
The process scales to any legal task — due diligence, legal research, regulatory compliance, document classification. The pipeline is the same: data → fine-tune → evaluate → deploy.
For a deeper dive into building a legal AI practice, see our guide on fine-tuning legal AI without an ML team.
Ship AI that runs on your users' devices.
Ertas early bird pricing starts at $14.50/mo — locked in for life. Plans for builders and agencies.
Further Reading
- Getting Started with Ertas — General platform overview and first steps
- How to Fine-Tune a Legal AI Model Without an ML Team — The complete agency workflow for legal AI