    Getting Started: Fine-Tune Your First Legal AI Model in 30 Minutes with Ertas
Tags: tutorial, fine-tuning, legal, ertas-studio, getting-started, segment:agency


    A hands-on tutorial: sign up for Ertas Studio, upload a sample legal dataset, configure and run fine-tuning, evaluate the output, and export your model — all in 30 minutes.

Ertas Team

    This is a hands-on tutorial. By the end, you will have a fine-tuned AI model that can analyse legal contract clauses — trained on sample data, evaluated, and ready to export.

    No ML experience needed. No Python. No GPU rental. Just a browser and 30 minutes.

    What You Will Build

    A fine-tuned model that:

    • Takes a contract clause as input
    • Identifies the clause type (indemnification, termination, confidentiality, etc.)
    • Assesses the risk level (low, moderate, high)
    • Provides a brief analysis with key concerns
    • Suggests negotiation points

    This is a simplified version of the contract review models that agencies deploy for law firm clients. The process is identical — only the training data and scope differ.

    Prerequisites

    • An Ertas account (sign up at ertas.ai)
    • 30 minutes of uninterrupted time
    • No technical prerequisites — the tutorial covers everything

    Minute 0-5: Sign Up and Create a Project

    1. Go to ertas.ai and create your account
    2. From the dashboard, click New Project
    3. Name it: "Legal Contract Review - Tutorial"
    4. Select project type: Fine-Tuning

    You are now in the Ertas Studio workspace. This is where you will upload data, configure training, and evaluate your model.

Minute 5-10: Upload Your Data

For this tutorial, we will use a sample dataset of annotated contract clauses. In a real deployment, this data comes from the law firm's historical document reviews.

    Sample Data Format

    The training data is a JSONL file where each line is a JSON object with an instruction and response:

    {"instruction": "Analyse the following contract clause for risk level and key concerns:\n\n'The Vendor shall indemnify, defend, and hold harmless the Client from and against any and all claims, losses, damages, liabilities, costs, and expenses (including reasonable attorneys' fees) arising out of or relating to any breach of this Agreement by the Vendor.'", "response": "Clause Type: Indemnification\nRisk Level: LOW\n\nAnalysis: This is a standard one-way indemnification clause where the vendor indemnifies the client. Key observations:\n1. Scope is limited to breaches of the agreement — this is appropriately narrow\n2. Includes attorneys' fees, which is standard\n3. Uses 'arising out of or relating to' — broader than 'arising out of' alone but still reasonable\n4. One-way protection favours the client\n\nNegotiation Points:\n- Consider adding a reciprocal indemnification clause\n- Consider adding a liability cap (e.g., aggregate limit equal to fees paid)\n- The 'relating to' language could be narrowed to 'directly arising from' if the vendor has leverage"}
    

    Upload Steps

    1. Click Upload Data in your project

    2. Select your JSONL file (or drag and drop)

    3. Studio validates the format and shows a preview:

      • Total examples: displayed
      • Average instruction length: displayed
      • Average response length: displayed
      • Any formatting issues: flagged
    4. Review the preview — scroll through a few examples to confirm they look correct

    5. Click Confirm Upload

    If you do not have a legal dataset prepared, Ertas Studio includes sample datasets for common use cases. Select the "Legal Contract Analysis" sample dataset to proceed with the tutorial.
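The preview numbers Studio displays are simple aggregates, and you can reproduce them yourself before uploading. A sketch of the same statistics (not Ertas's implementation):

```python
def dataset_stats(records):
    """Compute the upload-preview numbers: example count and
    average instruction/response lengths in characters."""
    n = len(records)
    avg_instr = sum(len(r["instruction"]) for r in records) / n
    avg_resp = sum(len(r["response"]) for r in records) / n
    return {
        "total_examples": n,
        "avg_instruction_chars": round(avg_instr, 1),
        "avg_response_chars": round(avg_resp, 1),
    }

records = [
    {"instruction": "Analyse clause A", "response": "Risk Level: LOW"},
    {"instruction": "Analyse clause BB", "response": "Risk Level: HIGH!"},
]
print(dataset_stats(records))
```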

    Minute 10-15: Configure Training

    With your data uploaded, configure the fine-tuning job.

    Base Model Selection

    Click Select Base Model. For this tutorial:

    • Llama 3.1 8B (Recommended) — Good balance of quality and speed for legal tasks
    • Mistral 7B — Viable alternative, slightly different output style

    Select Llama 3.1 8B.

    Training Parameters

    Studio shows default parameters with explanations. For this tutorial, keep the defaults:

    • Adapter type: LoRA (trains a small adapter instead of modifying the entire model)
    • LoRA rank: 16 (controls how much the model can learn; 16 is a good default)
    • LoRA alpha: 32 (scaling factor; 2× rank is standard)
    • Epochs: 3 (number of passes through the training data)
    • Learning rate: 2e-4 (how aggressively the model learns)
    • Batch size: Auto (Studio optimises based on your data and available GPU)

    For a production model, you might adjust these based on results. For the tutorial, defaults work well.
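To make the rank parameter concrete, here is back-of-envelope arithmetic for how few parameters a rank-16 adapter actually trains. It assumes the 4096 hidden size of Llama 3.1 8B and a square attention projection; exact shapes vary by layer:

```python
def lora_params(d_in, d_out, rank):
    """Trainable parameters LoRA adds to one weight matrix:
    matrix A (rank x d_in) plus matrix B (d_out x rank)."""
    return rank * d_in + d_out * rank

d = 4096            # hidden size of Llama 3.1 8B
full = d * d        # parameters in one full 4096x4096 projection
adapter = lora_params(d, d, rank=16)
print(adapter, full, f"{adapter / full:.1%}")  # the adapter is under 1% of the matrix
```

This is why LoRA training fits on modest hardware: only the small A and B matrices receive gradient updates, while the base weights stay frozen.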

    Validation Split

    Studio automatically holds out 10% of your data for validation — these examples are not used during training and are used to measure quality afterward. This is standard ML practice and happens automatically.
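Studio does this split for you, but the idea is easy to see in code. A sketch of a deterministic 90/10 hold-out (illustrative only, not what Studio runs):

```python
import random

def train_val_split(examples, val_fraction=0.1, seed=42):
    """Shuffle deterministically, then hold out the last
    val_fraction of examples for validation."""
    rng = random.Random(seed)
    shuffled = examples[:]
    rng.shuffle(shuffled)
    n_val = max(1, int(len(shuffled) * val_fraction))
    return shuffled[:-n_val], shuffled[-n_val:]

train, val = train_val_split(list(range(100)))
print(len(train), len(val))  # 90 10
```

The fixed seed matters: it makes the split reproducible, so validation numbers from different training runs are comparable.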

    Minute 15-20: Run Fine-Tuning

    Click Start Training.

    Studio handles everything:

    1. Formats your data for the selected base model's tokeniser
    2. Allocates GPU resources
    3. Runs the training loop with automatic checkpointing
    4. Evaluates on the held-out validation set

    What You Will See

    A real-time training dashboard shows:

    • Training loss curve — should decrease and stabilise. A smooth downward curve means the model is learning.
    • Validation loss — should track training loss without diverging. If validation loss increases while training loss decreases, the model is overfitting (unlikely with these defaults).
    • Estimated time remaining — typically 15-30 minutes for a sample dataset on an 8B model.
    • Checkpoints saved — Studio saves the model at each epoch, so you can revert if a later epoch performs worse.
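The overfitting signal described above is mechanical enough to express in a few lines. A sketch of the check (an illustration of the heuristic, not Studio's monitoring code):

```python
def is_overfitting(train_losses, val_losses, patience=2):
    """Flag overfitting when validation loss rises for `patience`
    consecutive epochs while training loss keeps falling."""
    bad = 0
    for i in range(1, len(val_losses)):
        if val_losses[i] > val_losses[i - 1] and train_losses[i] < train_losses[i - 1]:
            bad += 1
            if bad >= patience:
                return True
        else:
            bad = 0
    return False

healthy = is_overfitting([2.1, 1.6, 1.3, 1.1], [2.2, 1.7, 1.5, 1.4])
diverging = is_overfitting([2.1, 1.6, 1.2, 0.9], [2.2, 1.7, 1.9, 2.1])
print(healthy, diverging)  # False True
```

When the check fires, the fix is usually to stop earlier (fewer epochs) or revert to an earlier checkpoint, which is exactly why Studio saves one per epoch.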

    While training runs, take a break or read about how LoRA works to understand what is happening under the hood.

    Minute 20-25: Evaluate the Model

    Training complete. Now test your fine-tuned model.

    Side-by-Side Comparison

    Studio provides a comparison interface. Enter a contract clause — one the model has not seen before — and compare outputs from the base model and your fine-tuned model.

    Example input:

    "Either party may terminate this Agreement upon sixty (60) days' prior written notice to the other party. In the event of termination, all outstanding invoices shall become immediately due and payable, and the receiving party shall return or destroy all Confidential Information within thirty (30) days."

    Base model output (before fine-tuning):

    This is a termination clause. It allows either party to terminate with 60 days notice. There are provisions about invoices and confidential information.

    Fine-tuned model output:

    Clause Type: Termination
    Risk Level: LOW-MODERATE

    Analysis: Standard mutual termination clause with reasonable notice period. Key observations:

    1. 60-day notice period is within normal range (30-90 days typical)
    2. Immediate acceleration of outstanding invoices on termination is standard but could create cash flow pressure if triggered unexpectedly
    3. 30-day window for return/destruction of Confidential Information is reasonable
    4. "Return or destroy" gives the receiving party flexibility — consider requiring certification of destruction

    Negotiation Points:

    • If you are the service provider, negotiate for a longer notice period (90 days) to allow for transition
    • Add a survival clause specifying which provisions continue after termination (indemnification, confidentiality, IP ownership)
    • Consider adding a termination-for-cause provision with a shorter notice period and cure right

    The difference is immediately visible. The fine-tuned model produces structured, analytical output that matches what a junior associate would write.

    Validation Metrics

    Studio shows quantitative metrics:

    • Validation loss — lower is better, compare against the base model
    • Response quality score — automated assessment of output coherence and completeness
    • Format adherence — how consistently the model follows the output format from training data

    Review these metrics and test 5-10 additional clauses to build confidence in the model's quality.
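Format adherence in particular is easy to spot-check yourself: it amounts to asking whether each response contains the section headers your training data taught the model to emit. A sketch of one way to score it (the patterns below match this tutorial's output format; Studio's actual metric may differ):

```python
import re

EXPECTED = [
    r"^Clause Type: .+",
    r"^Risk Level: (LOW|LOW-MODERATE|MODERATE|HIGH)",
    r"^Analysis:",
    r"^Negotiation Points:",
]

def format_adherence(output):
    """Fraction of expected section headers present in a model response."""
    found = sum(
        1 for pattern in EXPECTED
        if re.search(pattern, output, flags=re.MULTILINE)
    )
    return found / len(EXPECTED)

response = (
    "Clause Type: Termination\n"
    "Risk Level: LOW-MODERATE\n\n"
    "Analysis: Standard mutual termination clause...\n\n"
    "Negotiation Points:\n- Negotiate a longer notice period"
)
print(format_adherence(response))  # 1.0
```

Running this over your 5-10 manual test clauses gives a quick, repeatable number to compare across training runs.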

    Minute 25-30: Export Your Model

    Satisfied with the quality? Export the model for deployment.

    Export Options

    Click Export Model and select your format:

    • GGUF (Recommended for most deployments) — Compatible with Ollama for local inference
    • SafeTensors — Compatible with vLLM, Hugging Face Transformers
    • LoRA adapter only — Just the adapter file, to be used alongside the base model

    For this tutorial, select GGUF.

    Download

    Studio packages and quantises the model (reducing file size while preserving quality). The download is typically 4-6 GB for an 8B model.

    Deploy (Bonus Step)

    To run your model locally:

    # Install Ollama (if you haven't already)
    # Visit https://ollama.com
    
    # Create a Modelfile
    echo 'FROM /path/to/your-exported-model.gguf' > Modelfile
    
    # Register the model
    ollama create legal-contract-review -f Modelfile
    
    # Test it
    ollama run legal-contract-review "Analyse this clause: [paste a clause]"
    

    Your fine-tuned legal AI model is now running locally. No API costs. No data sent to third parties. Ready to integrate with n8n or any application that supports OpenAI-compatible APIs.
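Integration is a standard chat-completions call. A minimal sketch against Ollama's OpenAI-compatible endpoint (the `legal-contract-review` model name comes from the `ollama create` step above; the URL is Ollama's default, and `analyse` assumes `ollama serve` is running locally):

```python
import json
import urllib.request

def build_request(clause, model="legal-contract-review"):
    """Payload for an OpenAI-compatible chat completions endpoint."""
    return {
        "model": model,
        "messages": [
            {
                "role": "user",
                "content": (
                    "Analyse the following contract clause for risk level "
                    f"and key concerns:\n\n{clause}"
                ),
            },
        ],
        "stream": False,
    }

def analyse(clause, base_url="http://localhost:11434/v1"):
    """Send one clause to the local model; requires a running `ollama serve`."""
    req = urllib.request.Request(
        f"{base_url}/chat/completions",
        data=json.dumps(build_request(clause)).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)["choices"][0]["message"]["content"]

print(build_request("Either party may terminate this Agreement...")["model"])
```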

What Comes Next

    You have completed a fine-tuning run on sample data. To move from tutorial to production:

    1. Collect real training data from your law firm client — historical contract reviews, annotated documents, analysis memos
    2. Increase the dataset — 2,000-3,000 examples for production quality
    3. Customise for the client — their risk thresholds, terminology, formatting preferences
    4. Deploy on client hardware — on-premise for privilege and compliance
    5. Iterate — collect feedback, add examples, retrain periodically

    The process scales to any legal task — due diligence, legal research, regulatory compliance, document classification. The pipeline is the same: data → fine-tune → evaluate → deploy.

    For a deeper dive into building a legal AI practice, see our guide on fine-tuning legal AI without an ML team.


    Ship AI that runs on your users' devices.

    Ertas early bird pricing starts at $14.50/mo — locked in for life. Plans for builders and agencies.
