    From n8n Workflow to Fine-Tuned Model: A Step-by-Step Agency Playbook

    A tactical guide for n8n agencies: collect client interaction data via workflows, clean and format it, fine-tune a model in Ertas Studio, deploy locally, and connect back to n8n for inference.

    Ertas Team

    You have n8n workflows running for your clients. They call OpenAI or Anthropic APIs for classification, summarisation, generation, or analysis tasks. The workflows work, but the API costs eat your margins and the quality is inconsistent.

    Here is the playbook for turning those existing n8n workflows into a fine-tuning pipeline — using the interaction data you are already generating to train a custom model that is cheaper, faster, and more accurate.

    The Pipeline Overview

    n8n Workflows (existing) → Data Collection → Cleaning → Fine-Tuning → Local Deployment → n8n Workflows (updated)
    

    You start and end with n8n. The middle steps transform your client's usage data into a custom model that replaces the API calls.

    Step 1: Collect Client Interaction Data via n8n

    Your existing n8n workflows already contain the training data you need — every API call includes an input (the instruction) and an output (the model's response). You just need to capture it.

    Add a Data Collection Branch

    For each workflow that calls an AI API, add a parallel branch that logs the interaction:

    1. After the HTTP Request node (API call), add a Set node that extracts:

      • The input prompt/message sent to the API
      • The response received from the API
      • A timestamp
      • Client identifier
      • Task type (classification, summarisation, etc.)
    2. Route this to a Google Sheets, Airtable, or PostgreSQL node that stores the records.

    For workflows already in production, you can add this logging branch without disrupting the existing flow — n8n's branching model lets you add parallel paths.

    What to Capture

    {
      "instruction": "Summarise this customer support ticket: [ticket text]",
      "response": "The customer is requesting a refund for order #12345 due to a defective product received on 2026-01-15...",
      "task_type": "ticket_summarisation",
      "client_id": "client_acme",
      "timestamp": "2026-02-10T14:30:00Z",
      "model_used": "gpt-4o",
      "was_accepted": true
    }
    

    The was_accepted field is optional but valuable — if the client's team reviews AI outputs and sometimes rejects them, tracking this helps filter for high-quality training data.
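
    On the logging branch, an n8n Code node can assemble this record before it is written to storage. A minimal sketch, assuming "Run Once for Each Item" mode, upstream nodes named Prepare Prompt and OpenAI Call (placeholder names — match them to your workflow), and an OpenAI-style chat-completions response:

    // n8n Code node, "Run Once for Each Item" mode.
    // "Prepare Prompt" and "OpenAI Call" are placeholder node names.
    const prompt = $('Prepare Prompt').item.json.prompt;
    const api = $('OpenAI Call').item.json;

    return {
      json: {
        instruction: prompt,
        response: api.choices[0].message.content,  // OpenAI chat-completions shape
        task_type: 'ticket_summarisation',         // set per workflow
        client_id: 'client_acme',
        timestamp: new Date().toISOString(),
        model_used: api.model,
        was_accepted: null,                        // fill in later if the client reviews outputs
      },
    };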

    Volume Targets

    | Fine-Tuning Quality | Examples Needed | Collection Time (typical) |
    | --- | --- | --- |
    | Minimum viable | 500 | 1-2 weeks |
    | Good quality | 1,500-2,000 | 3-6 weeks |
    | Production-ready | 3,000+ | 6-12 weeks |

    Start collecting now, even if you are weeks away from fine-tuning. More data produces better models.

    Step 2: Clean and Format the Dataset

    Raw interaction logs need cleaning before fine-tuning. Build an n8n workflow for this or do it manually — the choice depends on volume.

    Automated Cleaning (n8n Workflow)

    Create a data cleaning workflow (or adapt the standalone script sketched after this list) that:

    1. Reads from your data store (Google Sheets, PostgreSQL, etc.)
    2. Filters out rejected responses (where was_accepted is false)
    3. Removes duplicates (same instruction with same response)
    4. Normalises formatting (consistent line breaks, trim whitespace)
    5. Validates structure (instruction and response fields are non-empty, reasonable length)
    6. Exports as JSONL (one JSON object per line)
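
    Here is that standalone version: a minimal Node.js sketch of steps 2-6, assuming your data store exports a JSON array of records in the Step 1 shape (file names are placeholders):

    // clean.mjs: steps 2-6 above as a standalone script.
    import fs from 'node:fs';

    const records = JSON.parse(fs.readFileSync('raw-logs.json', 'utf8'));
    const seen = new Set();

    const cleaned = records
      // Validate structure: both fields present.
      .filter((r) => typeof r.instruction === 'string' && typeof r.response === 'string')
      // Filter out rejected responses.
      .filter((r) => r.was_accepted !== false)
      // Normalise formatting: consistent line breaks, trimmed whitespace.
      .map((r) => ({
        instruction: r.instruction.replace(/\r\n/g, '\n').trim(),
        response: r.response.replace(/\r\n/g, '\n').trim(),
      }))
      // Non-empty and reasonable length: tune the bounds to your task.
      .filter((r) => r.instruction.length >= 10 && r.response.length >= 1 && r.response.length <= 8000)
      // Remove duplicates (same instruction with same response).
      .filter((r) => {
        const key = `${r.instruction}\u0000${r.response}`;
        if (seen.has(key)) return false;
        seen.add(key);
        return true;
      });

    // Export as JSONL: one JSON object per line.
    fs.writeFileSync('train.jsonl', cleaned.map((r) => JSON.stringify(r)).join('\n') + '\n');
    console.log(`Wrote ${cleaned.length} of ${records.length} records to train.jsonl`);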

    Manual Review

    For the first fine-tuning run, manually review a sample (100-200 examples):

    • Are the instructions clear and representative of the task?
    • Are the responses high quality? (Would you want the model to produce this?)
    • Is there sensitive data that needs removal? (PII, API keys, internal references)
    • Are there edge cases that should be excluded from training?

    Output Format

    The final JSONL file should look like:

    {"instruction": "Classify this email as: billing, technical, general, or spam.\n\nEmail: I can't log into my account after the update...", "response": "technical"}
    {"instruction": "Summarise this support ticket for the weekly report:\n\nTicket: Customer reported that...", "response": "Customer experienced login failure after v2.3 update. Resolution: cleared browser cache and reset session tokens. Time to resolve: 15 minutes."}
    

    Step 3: Fine-Tune in Ertas Studio

    With your cleaned JSONL file ready:

    1. Create a project in Ertas Studio for this client and task
    2. Upload the JSONL file — Studio validates format and shows data statistics
    3. Select base model — Llama 3.1 8B for most agency tasks, Mistral 7B as an alternative
    4. Configure training:
      • LoRA rank: 16 (default, works for most tasks)
      • Epochs: 3
      • Learning rate: 2e-4
    5. Start training — typically 30-60 minutes for 2,000 examples on an 8B model
    6. Evaluate — use Studio's side-by-side comparison to test the fine-tuned model against sample inputs
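
    Before uploading in step 2, hold back the 20-30 unseen examples you will need for the quality check below. A minimal sketch that splits the cleaned JSONL (file names follow the cleaning sketch in Step 2):

    // split.mjs: hold out 25 random examples for post-training evaluation.
    import fs from 'node:fs';

    const lines = fs.readFileSync('train.jsonl', 'utf8').trim().split('\n');

    // Fisher-Yates shuffle so the holdout is a random sample.
    for (let i = lines.length - 1; i > 0; i--) {
      const j = Math.floor(Math.random() * (i + 1));
      [lines[i], lines[j]] = [lines[j], lines[i]];
    }

    const HOLDOUT = 25;
    fs.writeFileSync('holdout.jsonl', lines.slice(0, HOLDOUT).join('\n') + '\n');
    fs.writeFileSync('train-final.jsonl', lines.slice(HOLDOUT).join('\n') + '\n');
    console.log(`${lines.length - HOLDOUT} training examples, ${HOLDOUT} held out`);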

    Quality Check

    Before deploying, test with 20-30 examples the model has never seen:

    • Does the fine-tuned model match or exceed the API model's quality?
    • Is the output format consistent?
    • Does it handle edge cases correctly?

    If quality is not sufficient, common fixes:

    • Add more training data (especially for the cases where quality is weak)
    • Increase LoRA rank from 16 to 32
    • Add another epoch of training
    • Improve data quality (remove noisy examples)

    Step 4: Deploy the Model Locally

    Export your fine-tuned model from Ertas Studio in GGUF format (for Ollama) or SafeTensors (for vLLM).

    Deploy with Ollama

    # Create a Modelfile
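    # FROM must name the same base model the adapter was trained on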
    echo 'FROM llama3.1:8b
    ADAPTER /path/to/your-adapter.gguf' > Modelfile
    
    # Register the model
    ollama create client-acme-summariser -f Modelfile
    
    # Test it
    ollama run client-acme-summariser "Summarise this ticket: ..."
    

    Ollama exposes an OpenAI-compatible API at http://localhost:11434/v1/chat/completions.

    Deploy with vLLM

    python -m vllm.entrypoints.openai.api_server \
      --model meta-llama/Llama-3.1-8B \
      --enable-lora \
      --lora-modules client-acme=/path/to/adapter \
      --host 0.0.0.0 --port 8000
    

    vLLM exposes an OpenAI-compatible API at http://your-server:8000/v1/chat/completions.
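
    Both servers speak the OpenAI chat-completions format, so a quick smoke test looks the same either way. A minimal Node.js (18+) sketch, run as an ES module; the model name is whatever you registered: client-acme-summariser from the ollama create step, or the --lora-modules name (client-acme) for vLLM:

    // smoke-test.mjs: verify the local endpoint before touching n8n.
    // For vLLM, swap in http://your-server:8000/v1/chat/completions and model "client-acme".
    const res = await fetch('http://localhost:11434/v1/chat/completions', {
      method: 'POST',
      headers: { 'Content-Type': 'application/json' },
      body: JSON.stringify({
        model: 'client-acme-summariser',
        messages: [{ role: 'user', content: 'Summarise this ticket: ...' }],
      }),
    });

    const data = await res.json();
    console.log(data.choices[0].message.content);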

    Step 5: Connect n8n to Your Local Model

    This is the payoff. Update your existing n8n workflows to point at your local model instead of the cloud API.

    Option A: Change the API URL

    If you are using n8n's HTTP Request node to call OpenAI:

    1. Change the URL from https://api.openai.com/v1/chat/completions to http://localhost:11434/v1/chat/completions (Ollama) or http://your-server:8000/v1/chat/completions (vLLM)
    2. Update the model parameter from gpt-4o to your model name
    3. Remove or update the API key (Ollama does not require one)

    That is it. The request/response format is identical. Your workflow logic, error handling, and output processing stay the same.

    Option B: Use n8n's OpenAI Credentials with Custom Base URL

    1. In n8n, create a new OpenAI credential
    2. Set the base URL to your local endpoint
    3. Set the API key to any string (e.g., "local")
    4. Use this credential in your existing OpenAI nodes
    5. Change the model name to your fine-tuned model

    This approach requires no workflow changes beyond updating the credential — every node that uses the credential automatically switches to local inference.

    Testing the Switch

    Before switching production workflows:

    1. Clone the workflow — create a copy that uses the local model
    2. Run both in parallel for 24-48 hours
    3. Compare outputs — are the local model's results equal or better? (a comparison sketch follows this list)
    4. Monitor latency — local inference should be faster for most workloads
    5. Switch over — update the production workflow to use the local endpoint
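
    To make step 3 concrete, a small script can replay your holdout prompts against both endpoints and print the answers side by side. A sketch, reusing the request shape from Step 4 (endpoints, model names, and file names are placeholders):

    // compare.mjs: replay holdout prompts against the cloud API and the local model.
    import fs from 'node:fs';

    const ENDPOINTS = {
      cloud: {
        url: 'https://api.openai.com/v1/chat/completions',
        model: 'gpt-4o',
        key: process.env.OPENAI_API_KEY, // your existing OpenAI key
      },
      local: {
        url: 'http://localhost:11434/v1/chat/completions',
        model: 'client-acme-summariser',
        key: 'local', // Ollama ignores the key
      },
    };

    async function ask({ url, model, key }, instruction) {
      const res = await fetch(url, {
        method: 'POST',
        headers: { 'Content-Type': 'application/json', Authorization: `Bearer ${key}` },
        body: JSON.stringify({ model, messages: [{ role: 'user', content: instruction }] }),
      });
      return (await res.json()).choices[0].message.content;
    }

    const holdout = fs.readFileSync('holdout.jsonl', 'utf8').trim().split('\n').map(JSON.parse);

    for (const { instruction, response } of holdout) {
      const [cloud, local] = await Promise.all([
        ask(ENDPOINTS.cloud, instruction),
        ask(ENDPOINTS.local, instruction),
      ]);
      console.log('instruction:', instruction.slice(0, 80));
      console.log('  reference:', response.slice(0, 80));
      console.log('  cloud:    ', cloud.slice(0, 80));
      console.log('  local:    ', local.slice(0, 80));
    }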

    Step 6: Iterate and Improve

    Fine-tuning is not a one-time event. The model improves with feedback:

    Continuous Data Collection

    Keep the data collection branch active in your updated workflows. Now it captures:

    • Interactions with your fine-tuned model (not the API)
    • Client feedback (accepted/rejected)
    • Edge cases where the model underperforms

    Periodic Retraining

    Every 4-8 weeks (or when quality issues surface):

    1. Export new interaction data from your logging pipeline
    2. Add corrective examples for cases where the model struggled
    3. Combine with original training data
    4. Retrain in Ertas Studio
    5. Evaluate against the previous model version
    6. Deploy if improved

    Track Improvement Over Time

    Log model versions and corresponding quality metrics. Over 3-4 training cycles, you will see measurable improvement as the model learns from real-world usage patterns.
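
    An append-only JSONL log is enough for this. A sketch of one record per training cycle (the fields are illustrative, not a fixed schema):

    // log-version.mjs: append one record per training cycle.
    import fs from 'node:fs';

    const record = {
      version: 'client-acme-summariser-v3',
      trained_at: new Date().toISOString(),
      training_examples: 2400,
      holdout_pass_rate: 0.92, // share of holdout outputs a reviewer accepted
      avg_latency_ms: 310,
      notes: 'Added 150 corrective examples for refund edge cases',
    };

    fs.appendFileSync('model-versions.jsonl', JSON.stringify(record) + '\n');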

    The Business Impact

    | Metric | Before (API) | After (Local Fine-Tuned) |
    | --- | --- | --- |
    | Monthly API cost (per client) | $150-500 | ~$0 |
    | Response latency | 800-2000ms | 200-500ms |
    | Output quality | Generic | Client-specific |
    | Data privacy | Data sent to third party | Data stays local |
    | Scalability | Linear cost increase | Fixed cost (GPU tier) |

    For most agencies, the switch pays for itself within 1-3 months. More importantly, it transforms the service from "we connect your workflows to ChatGPT" to "we build custom AI models trained on your data" — a significantly higher-value proposition.


    Ship AI that runs on your users' devices.

    Ertas early bird pricing starts at $14.50/mo — locked in for life. Plans for builders and agencies.