    How to Fine-Tune a Legal AI Model Without an ML Team


    Most AI agencies don't have ML engineers on staff. Here's how to fine-tune production-quality legal AI models using Ertas Studio — no Python, no GPU rental, no ML expertise required.

    Ertas Team

    The biggest bottleneck for AI agencies entering the legal vertical is not sales or compliance knowledge — it is ML expertise. Fine-tuning a language model traditionally requires Python proficiency, understanding of training hyperparameters, access to GPU infrastructure, and experience debugging training runs.

    Most agencies do not have this. They have automation engineers, workflow specialists, and client-facing consultants. Hiring an ML engineer adds $150K-250K in salary before they produce anything.

    This article shows how to bypass the ML bottleneck entirely using Ertas Studio, walking through a complete legal model fine-tuning workflow from data preparation to deployment.

    The Agency Bottleneck

    A typical AI agency workflow for a legal client looks like this:

    1. Client wants AI-powered contract review
    2. Agency builds prompt-engineered solution with GPT-4o
    3. Results are decent but not accurate enough for production use
    4. Client asks for better accuracy and data privacy
    5. Agency realises they need fine-tuning and on-premise deployment
    6. Agency hits the wall — no one on the team knows how to fine-tune a model

    At this point, agencies either:

    • Turn down the engagement (lost revenue)
    • Outsource to an ML contractor ($10K-30K per model, 4-8 week timeline)
    • Try to learn fine-tuning from scratch (months of experimentation)

    None of these options are sustainable. Ertas Studio provides a fourth option: a no-code fine-tuning interface purpose-built for non-ML teams.

    Here is the complete workflow, step by step.

    Step 1: Prepare Your Training Data

    Training data for legal contract review comes from the law firm's historical work. You need instruction-response pairs where:

    • Instruction: A contract clause or section + the analysis question
    • Response: The lawyer's analysis, risk assessment, or extracted information

    Example:

    {
      "instruction": "Review the following indemnification clause and assess risk level:\n\n'Tenant shall indemnify and hold harmless Landlord from and against any and all claims, damages, losses, costs and expenses (including reasonable attorneys' fees) arising out of or in connection with Tenant's use of the Premises, except to the extent caused by Landlord's gross negligence or willful misconduct.'",
      "response": "Risk Level: MODERATE\n\nAnalysis: This is a standard one-way indemnification clause favouring the landlord. The carve-out for landlord's gross negligence/willful misconduct provides some protection, but the clause is broader than ideal. Key concerns:\n1. 'Any and all claims' is very broad — consider narrowing to claims arising from tenant's negligence\n2. No cap on indemnification liability\n3. Includes attorneys' fees which could be significant\n\nRecommendation: Negotiate to add mutual indemnification and a liability cap tied to the lease value."
    }
    

    Data requirements:

    • Minimum: 500 examples (for a basic model)
    • Recommended: 2,000-3,000 examples (for production quality)
    • Format: JSONL (one JSON object per line)

    Where to source data:

    • Export from document management systems (iManage, NetDocuments)
    • Convert lawyer annotations and comments into structured pairs
    • Use historical review memoranda as response templates
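    Whatever the source, the conversion step is mechanical. As a sketch, assuming the export lands in a CSV with hypothetical `clause`, `question`, and `analysis` columns, building the JSONL file looks like this:

    ```python
    import csv
    import json

    def rows_to_jsonl(rows, out_path):
        """Convert exported (clause, question, analysis) rows into
        instruction-response pairs, one JSON object per line (JSONL)."""
        with open(out_path, "w", encoding="utf-8") as out:
            for row in rows:
                pair = {
                    "instruction": f"{row['question']}\n\n'{row['clause']}'",
                    "response": row["analysis"],
                }
                out.write(json.dumps(pair, ensure_ascii=False) + "\n")

    # Typical usage with a document-management-system export:
    # with open("export.csv", newline="", encoding="utf-8") as f:
    #     rows_to_jsonl(csv.DictReader(f), "train.jsonl")
    ```

    The column names are placeholders — map them to whatever your DMS export actually produces.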

    Step 2: Upload to Ertas Studio

    In Ertas Studio:

    1. Create a new project and name it (e.g., "Acme Legal - Contract Review")
    2. Upload your JSONL training file
    3. Studio automatically validates the format and shows a preview of your examples
    4. Review the data statistics — distribution of response lengths, instruction categories

    Studio flags potential data quality issues: duplicate entries, extremely short responses, formatting inconsistencies. Fix these before proceeding.
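    You can also run the same checks locally before uploading. A minimal pre-flight script (the thresholds are illustrative, not Studio's actual rules) might look like:

    ```python
    import json

    def preflight(jsonl_path, min_response_chars=50):
        """Flag common data-quality issues in a JSONL training file:
        malformed lines, missing keys, very short responses, duplicates."""
        issues, seen = [], set()
        with open(jsonl_path, encoding="utf-8") as f:
            for n, line in enumerate(f, start=1):
                try:
                    ex = json.loads(line)
                except json.JSONDecodeError:
                    issues.append((n, "not valid JSON"))
                    continue
                if not {"instruction", "response"} <= ex.keys():
                    issues.append((n, "missing instruction/response key"))
                    continue
                if len(ex["response"]) < min_response_chars:
                    issues.append((n, "response shorter than threshold"))
                key = (ex["instruction"], ex["response"])
                if key in seen:
                    issues.append((n, "duplicate of an earlier example"))
                seen.add(key)
        return issues
    ```

    Each issue comes back with its line number, so fixes can go straight back into the source file.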

    Step 3: Configure Training

    Studio presents training configuration with sensible defaults:

    | Parameter | Default | What It Means |
    | --- | --- | --- |
    | Base model | Llama 3.1 8B | The foundation model to fine-tune |
    | Adapter type | LoRA | Trains a small adapter, not the full model |
    | LoRA rank | 16 | Controls adapter capacity (higher = more capacity, more compute) |
    | Epochs | 3 | Number of passes through the training data |
    | Learning rate | 2e-4 | How aggressively the model learns (lower = more stable) |

    For legal tasks, the defaults work well. The main decision is base model size:

    • 8B: Fast training, runs on consumer GPUs, sufficient for single-task models (e.g., just contract review)
    • 13B: Slower training, needs more VRAM, better for multi-task models (contract review + case summarisation + document classification)
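    As a back-of-the-envelope check on the table above, the number of optimiser steps a run takes follows directly from dataset size, batch size, and epochs (the batch size here is an assumed value for illustration, not a Studio setting):

    ```python
    import math

    def training_steps(num_examples, epochs=3, batch_size=8):
        """Total optimiser steps = steps per epoch x number of epochs."""
        return math.ceil(num_examples / batch_size) * epochs

    # A 2,000-example dataset at the default 3 epochs:
    # training_steps(2000) -> 750 steps
    ```

    More steps means more compute but also more opportunity to overfit a small dataset, which is why Studio defaults to a modest 3 epochs.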

    Step 4: Train

    Click "Start Training." Studio handles:

    • Tokenisation and data formatting
    • GPU allocation and scheduling
    • Training execution with automatic checkpointing
    • Evaluation on a held-out validation set
    • Loss curves and quality metrics displayed in real-time

    Training time for a 2,000-example dataset on an 8B model: approximately 30-60 minutes.

    Step 5: Evaluate

    Once training completes, Studio provides an evaluation interface:

    • Side-by-side comparison: Send the same contract clause to both the base model and your fine-tuned model. Compare outputs.
    • Validation metrics: Loss on held-out data, response quality scores
    • Test with custom inputs: Paste any contract clause and see the fine-tuned model's analysis

    This is where the quality difference becomes obvious. The base model produces generic, sometimes inaccurate analysis. The fine-tuned model produces analysis that mirrors the firm's own lawyers — using their terminology, applying their risk thresholds, following their reporting format.
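    Beyond eyeballing outputs, one cheap quantitative check is whether the fine-tuned model assigns the same risk labels as the reference answers on a held-out set. A minimal scorer, assuming responses follow the firm's "Risk Level: X" format shown above:

    ```python
    import re

    def risk_label(text):
        """Extract the label from a 'Risk Level: X' style response."""
        m = re.search(r"Risk Level:\s*([A-Z-]+)", text)
        return m.group(1) if m else None

    def label_agreement(model_outputs, reference_outputs):
        """Fraction of held-out examples where the model's risk label
        matches the reference lawyer's label."""
        pairs = list(zip(model_outputs, reference_outputs))
        hits = sum(risk_label(a) == risk_label(b) for a, b in pairs)
        return hits / len(pairs)
    ```

    Label agreement is a crude proxy — it says nothing about the quality of the reasoning — but it is a quick way to spot a model that drifted from the firm's risk thresholds.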

    Step 6: Export and Deploy

    Export the fine-tuned model in your preferred format:

    • GGUF: For deployment via Ollama (recommended for most agency deployments)
    • SafeTensors: For deployment via vLLM or Hugging Face Transformers
    • LoRA adapter only: If deploying alongside other adapters from the same base model

    Deploy to the client's hardware using Ollama or vLLM. The model runs locally — no internet connection required after deployment.
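    For the Ollama route, deployment reduces to wrapping the exported GGUF in a Modelfile. A minimal sketch (the file name and system prompt are placeholders for whatever your export and client engagement dictate):

    ```
    FROM ./legal-review.gguf
    PARAMETER temperature 0.2
    SYSTEM You are a contract review assistant. Analyse clauses using the firm's risk framework and reporting format.
    ```

    Then `ollama create acme-legal -f Modelfile` registers the model and `ollama run acme-legal` serves it locally on the client's hardware.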

    Before and After: Output Quality Comparison

    Input clause:

    "Notwithstanding any other provision of this Agreement, either party may terminate this Agreement immediately upon written notice if the other party commits a material breach and fails to cure such breach within ten (10) business days of receiving written notice thereof."

    Base model (Llama 3.1 8B, no fine-tuning):

    "This is a termination clause. It allows either party to terminate the agreement if there is a material breach. The cure period is 10 business days."

    Fine-tuned model (same base, trained on firm's data):

    "Risk Level: LOW-MODERATE

    Standard mutual termination for material breach with cure period. The 10 business day cure period is shorter than our typical recommendation of 30 calendar days. Key considerations:

    1. 'Material breach' is undefined — recommend adding examples or referencing specific obligations
    2. 10 business days may be insufficient for complex curing actions (e.g., insurance procurement, structural repairs)
    3. 'Immediately upon written notice' after cure failure is acceptable
    4. 'Notwithstanding any other provision' creates potential conflict with force majeure clause in Section 14

    Recommendation: Negotiate cure period to 30 calendar days. Add definition of material breach or cross-reference to specific obligations."

    The difference is not incremental — it is categorical. The fine-tuned model produces the kind of analysis a junior associate would write after reviewing the firm's analysis guidelines.

    From One Model to a Scalable Practice

    Once you have fine-tuned your first legal model, the process is replicable:

    1. Same workflow, different client: Each new law firm engagement follows the same data → train → deploy pipeline
    2. Same base model, different adapters: Train client-specific LoRA adapters from the same base model
    3. Same infrastructure, multiple models: A single GPU serves multiple client models through adapter hot-swapping
    4. Portfolio pricing: Your per-client cost decreases with each additional client, improving margins

    The ML bottleneck that stopped your agency from entering the legal vertical no longer exists.


    Ship AI that runs on your users' devices.

    Ertas early bird pricing starts at $14.50/mo — locked in for life. Plans for builders and agencies.
