    Getting Started: Fine-Tune Your First Legal AI Model in 30 Minutes with Ertas
Tags: tutorial, fine-tuning, legal, ertas-studio, getting-started, segment:agency


    A hands-on tutorial: sign up for Ertas Studio, upload a sample legal dataset, configure and run fine-tuning, evaluate the output, and export your model — all in 30 minutes.

Ertas Team

    This is a hands-on tutorial. By the end, you will have a fine-tuned AI model that can analyse legal contract clauses — trained on sample data, evaluated, and ready to export.

    No ML experience needed. No Python. No GPU rental. Just a browser and 30 minutes.

    What You Will Build

    A fine-tuned model that:

    • Takes a contract clause as input
    • Identifies the clause type (indemnification, termination, confidentiality, etc.)
    • Assesses the risk level (low, moderate, high)
    • Provides a brief analysis with key concerns
    • Suggests negotiation points

    This is a simplified version of the contract review models that agencies deploy for law firm clients. The process is identical — only the training data and scope differ.

    Prerequisites

    • An Ertas account (sign up at ertas.ai)
    • 30 minutes of uninterrupted time
    • No technical prerequisites — the tutorial covers everything

    Minute 0-5: Sign Up and Create a Project

    1. Go to ertas.ai and create your account
    2. From the dashboard, click New Project
    3. Name it: "Legal Contract Review - Tutorial"
    4. Select project type: Fine-Tuning

    You are now in the Ertas Studio workspace. This is where you will upload data, configure training, and evaluate your model.

Minute 5-10: Upload Your Data

For this tutorial, we will use a sample dataset of annotated contract clauses. In a real deployment, this data comes from the law firm's historical document reviews.

    Sample Data Format

    The training data is a JSONL file where each line is a JSON object with an instruction and response:

    {"instruction": "Analyse the following contract clause for risk level and key concerns:\n\n'The Vendor shall indemnify, defend, and hold harmless the Client from and against any and all claims, losses, damages, liabilities, costs, and expenses (including reasonable attorneys' fees) arising out of or relating to any breach of this Agreement by the Vendor.'", "response": "Clause Type: Indemnification\nRisk Level: LOW\n\nAnalysis: This is a standard one-way indemnification clause where the vendor indemnifies the client. Key observations:\n1. Scope is limited to breaches of the agreement — this is appropriately narrow\n2. Includes attorneys' fees, which is standard\n3. Uses 'arising out of or relating to' — broader than 'arising out of' alone but still reasonable\n4. One-way protection favours the client\n\nNegotiation Points:\n- Consider adding a reciprocal indemnification clause\n- Consider adding a liability cap (e.g., aggregate limit equal to fees paid)\n- The 'relating to' language could be narrowed to 'directly arising from' if the vendor has leverage"}
    

    Upload Steps

    1. Click Upload Data in your project

    2. Select your JSONL file (or drag and drop)

    3. Studio validates the format and shows a preview:

      • Total examples: displayed
      • Average instruction length: displayed
      • Average response length: displayed
      • Any formatting issues: flagged
    4. Review the preview — scroll through a few examples to confirm they look correct

    5. Click Confirm Upload

    If you do not have a legal dataset prepared, Ertas Studio includes sample datasets for common use cases. Select the "Legal Contract Analysis" sample dataset to proceed with the tutorial.
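The preview numbers Studio displays are simple aggregates, and you can reproduce them yourself before uploading. A sketch of the same statistics (not Ertas's implementation):

```python
def dataset_stats(records):
    """Compute the upload-preview numbers: example count and
    average instruction/response lengths in characters."""
    n = len(records)
    avg_instr = sum(len(r["instruction"]) for r in records) / n
    avg_resp = sum(len(r["response"]) for r in records) / n
    return {
        "total_examples": n,
        "avg_instruction_chars": round(avg_instr, 1),
        "avg_response_chars": round(avg_resp, 1),
    }

records = [
    {"instruction": "Analyse clause A", "response": "Risk Level: LOW"},
    {"instruction": "Analyse clause BB", "response": "Risk Level: HIGH!"},
]
print(dataset_stats(records))
```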

    Minute 10-15: Configure Training

    With your data uploaded, configure the fine-tuning job.

    Base Model Selection

    Click Select Base Model. For this tutorial:

    • Llama 3.1 8B (Recommended) — Good balance of quality and speed for legal tasks
    • Mistral 7B — Viable alternative, slightly different output style

    Select Llama 3.1 8B.

    Training Parameters

    Studio shows default parameters with explanations. For this tutorial, keep the defaults:

    • Adapter type: LoRA (trains a small adapter instead of modifying the entire model)
    • LoRA rank: 16 (controls how much the model can learn; 16 is a good default)
    • LoRA alpha: 32 (scaling factor; 2× rank is standard)
    • Epochs: 3 (number of passes through the training data)
    • Learning rate: 2e-4 (how aggressively the model learns)
    • Batch size: Auto (Studio optimises based on your data and available GPU)

    For a production model, you might adjust these based on results. For the tutorial, defaults work well.
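To make the rank parameter concrete, here is back-of-envelope arithmetic for how few parameters a rank-16 adapter actually trains. It assumes the 4096 hidden size of Llama 3.1 8B and a square attention projection; exact shapes vary by layer:

```python
def lora_params(d_in, d_out, rank):
    """Trainable parameters LoRA adds to one weight matrix:
    matrix A (rank x d_in) plus matrix B (d_out x rank)."""
    return rank * d_in + d_out * rank

d = 4096            # hidden size of Llama 3.1 8B
full = d * d        # parameters in one full 4096x4096 projection
adapter = lora_params(d, d, rank=16)
print(adapter, full, f"{adapter / full:.1%}")  # the adapter is under 1% of the matrix
```

This is why LoRA training fits on modest hardware: only the small A and B matrices receive gradient updates, while the base weights stay frozen.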

    Validation Split

    Studio automatically holds out 10% of your data for validation — these examples are not used during training and are used to measure quality afterward. This is standard ML practice and happens automatically.
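Studio does this split for you, but the idea is easy to see in code. A sketch of a deterministic 90/10 hold-out (illustrative only, not what Studio runs):

```python
import random

def train_val_split(examples, val_fraction=0.1, seed=42):
    """Shuffle deterministically, then hold out the last
    val_fraction of examples for validation."""
    rng = random.Random(seed)
    shuffled = examples[:]
    rng.shuffle(shuffled)
    n_val = max(1, int(len(shuffled) * val_fraction))
    return shuffled[:-n_val], shuffled[-n_val:]

train, val = train_val_split(list(range(100)))
print(len(train), len(val))  # 90 10
```

The fixed seed matters: it makes the split reproducible, so validation numbers from different training runs are comparable.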

    Minute 15-20: Run Fine-Tuning

    Click Start Training.

    Studio handles everything:

    1. Formats your data for the selected base model's tokeniser
    2. Allocates GPU resources
    3. Runs the training loop with automatic checkpointing
    4. Evaluates on the held-out validation set

    What You Will See

    A real-time training dashboard shows:

    • Training loss curve — should decrease and stabilise. A smooth downward curve means the model is learning.
    • Validation loss — should track training loss without diverging. If validation loss increases while training loss decreases, the model is overfitting (unlikely with these defaults).
    • Estimated time remaining — typically 15-30 minutes for a sample dataset on an 8B model.
    • Checkpoints saved — Studio saves the model at each epoch, so you can revert if a later epoch performs worse.
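The overfitting signal described above is mechanical enough to express in a few lines. A sketch of the check (an illustration of the heuristic, not Studio's monitoring code):

```python
def is_overfitting(train_losses, val_losses, patience=2):
    """Flag overfitting when validation loss rises for `patience`
    consecutive epochs while training loss keeps falling."""
    bad = 0
    for i in range(1, len(val_losses)):
        if val_losses[i] > val_losses[i - 1] and train_losses[i] < train_losses[i - 1]:
            bad += 1
            if bad >= patience:
                return True
        else:
            bad = 0
    return False

healthy = is_overfitting([2.1, 1.6, 1.3, 1.1], [2.2, 1.7, 1.5, 1.4])
diverging = is_overfitting([2.1, 1.6, 1.2, 0.9], [2.2, 1.7, 1.9, 2.1])
print(healthy, diverging)  # False True
```

When the check fires, the fix is usually to stop earlier (fewer epochs) or revert to an earlier checkpoint, which is exactly why Studio saves one per epoch.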

    While training runs, take a break or read about how LoRA works to understand what is happening under the hood.

    Minute 20-25: Evaluate the Model

    Training complete. Now test your fine-tuned model.

    Side-by-Side Comparison

    Studio provides a comparison interface. Enter a contract clause — one the model has not seen before — and compare outputs from the base model and your fine-tuned model.

    Example input:

    "Either party may terminate this Agreement upon sixty (60) days' prior written notice to the other party. In the event of termination, all outstanding invoices shall become immediately due and payable, and the receiving party shall return or destroy all Confidential Information within thirty (30) days."

    Base model output (before fine-tuning):

    This is a termination clause. It allows either party to terminate with 60 days notice. There are provisions about invoices and confidential information.

    Fine-tuned model output:

    Clause Type: Termination
    Risk Level: LOW-MODERATE

    Analysis: Standard mutual termination clause with reasonable notice period. Key observations:

    1. 60-day notice period is within normal range (30-90 days typical)
    2. Immediate acceleration of outstanding invoices on termination is standard but could create cash flow pressure if triggered unexpectedly
    3. 30-day window for return/destruction of Confidential Information is reasonable
    4. "Return or destroy" gives the receiving party flexibility — consider requiring certification of destruction

    Negotiation Points:

    • If you are the service provider, negotiate for a longer notice period (90 days) to allow for transition
    • Add a survival clause specifying which provisions continue after termination (indemnification, confidentiality, IP ownership)
    • Consider adding a termination-for-cause provision with a shorter notice period and cure right

    The difference is immediately visible. The fine-tuned model produces structured, analytical output that matches what a junior associate would write.

    Validation Metrics

    Studio shows quantitative metrics:

    • Validation loss — lower is better, compare against the base model
    • Response quality score — automated assessment of output coherence and completeness
    • Format adherence — how consistently the model follows the output format from training data

    Review these metrics and test 5-10 additional clauses to build confidence in the model's quality.
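Format adherence in particular is easy to spot-check yourself: it amounts to asking whether each response contains the section headers your training data taught the model to emit. A sketch of one way to score it (the patterns below match this tutorial's output format; Studio's actual metric may differ):

```python
import re

EXPECTED = [
    r"^Clause Type: .+",
    r"^Risk Level: (LOW|LOW-MODERATE|MODERATE|HIGH)",
    r"^Analysis:",
    r"^Negotiation Points:",
]

def format_adherence(output):
    """Fraction of expected section headers present in a model response."""
    found = sum(
        1 for pattern in EXPECTED
        if re.search(pattern, output, flags=re.MULTILINE)
    )
    return found / len(EXPECTED)

response = (
    "Clause Type: Termination\n"
    "Risk Level: LOW-MODERATE\n\n"
    "Analysis: Standard mutual termination clause...\n\n"
    "Negotiation Points:\n- Negotiate a longer notice period"
)
print(format_adherence(response))  # 1.0
```

Running this over your 5-10 manual test clauses gives a quick, repeatable number to compare across training runs.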

    Minute 25-30: Export Your Model

    Satisfied with the quality? Export the model for deployment.

    Export Options

    Click Export Model and select your format:

    • GGUF (Recommended for most deployments) — Compatible with Ollama for local inference
    • SafeTensors — Compatible with vLLM, Hugging Face Transformers
    • LoRA adapter only — Just the adapter file, to be used alongside the base model

    For this tutorial, select GGUF.

    Download

    Studio packages and quantises the model (reducing file size while preserving quality). The download is typically 4-6 GB for an 8B model.

    Deploy (Bonus Step)

    To run your model locally:

    # Install Ollama (if you haven't already)
    # Visit https://ollama.com
    
    # Create a Modelfile
    echo 'FROM /path/to/your-exported-model.gguf' > Modelfile
    
    # Register the model
    ollama create legal-contract-review -f Modelfile
    
    # Test it
    ollama run legal-contract-review "Analyse this clause: [paste a clause]"
    

    Your fine-tuned legal AI model is now running locally. No API costs. No data sent to third parties. Ready to integrate with n8n or any application that supports OpenAI-compatible APIs.
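Integration is a standard chat-completions call. A minimal sketch against Ollama's OpenAI-compatible endpoint (the `legal-contract-review` model name comes from the `ollama create` step above; the URL is Ollama's default, and `analyse` assumes `ollama serve` is running locally):

```python
import json
import urllib.request

def build_request(clause, model="legal-contract-review"):
    """Payload for an OpenAI-compatible chat completions endpoint."""
    return {
        "model": model,
        "messages": [
            {
                "role": "user",
                "content": (
                    "Analyse the following contract clause for risk level "
                    f"and key concerns:\n\n{clause}"
                ),
            },
        ],
        "stream": False,
    }

def analyse(clause, base_url="http://localhost:11434/v1"):
    """Send one clause to the local model; requires a running `ollama serve`."""
    req = urllib.request.Request(
        f"{base_url}/chat/completions",
        data=json.dumps(build_request(clause)).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)["choices"][0]["message"]["content"]

print(build_request("Either party may terminate this Agreement...")["model"])
```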

What Comes Next

    You have completed a fine-tuning run on sample data. To move from tutorial to production:

    1. Collect real training data from your law firm client — historical contract reviews, annotated documents, analysis memos
    2. Increase the dataset — 2,000-3,000 examples for production quality
    3. Customise for the client — their risk thresholds, terminology, formatting preferences
    4. Deploy on client hardware — on-premise for privilege and compliance
    5. Iterate — collect feedback, add examples, retrain periodically

    The process scales to any legal task — due diligence, legal research, regulatory compliance, document classification. The pipeline is the same: data → fine-tune → evaluate → deploy.

    For a deeper dive into building a legal AI practice, see our guide on fine-tuning legal AI without an ML team.


    Ship AI that runs on your users' devices.

    Ertas early bird pricing starts at $14.50/mo — locked in for life. Plans for builders and agencies.
