
From n8n Workflow to Fine-Tuned Model: A Step-by-Step Agency Playbook
A tactical guide for n8n agencies: collect client interaction data via workflows, clean and format it, fine-tune a model in Ertas Studio, deploy locally, and connect back to n8n for inference.
You have n8n workflows running for your clients. They call OpenAI or Anthropic APIs for classification, summarisation, generation, or analysis tasks. The workflows work, but the API costs eat your margins and the quality is inconsistent.
Here is the playbook for turning those existing n8n workflows into a fine-tuning pipeline — using the interaction data you are already generating to train a custom model that is cheaper, faster, and more accurate.
The Pipeline Overview
n8n Workflows (existing) → Data Collection → Cleaning → Fine-Tuning → Local Deployment → n8n Workflows (updated)
You start and end with n8n. The middle steps transform your client's usage data into a custom model that replaces the API calls.
Step 1: Collect Client Interaction Data via n8n
Your existing n8n workflows already contain the training data you need — every API call includes an input (the instruction) and an output (the model's response). You just need to capture it.
Add a Data Collection Branch
For each workflow that calls an AI API, add a parallel branch that logs the interaction:
- After the HTTP Request node (API call), add a Set node that extracts:
- The input prompt/message sent to the API
- The response received from the API
- A timestamp
- Client identifier
- Task type (classification, summarisation, etc.)
- Route this to a Google Sheets, Airtable, or PostgreSQL node that stores the records.
For workflows already in production, you can add this logging branch without disrupting the existing flow — n8n's branching model lets you add parallel paths.
What to Capture
{
"instruction": "Summarise this customer support ticket: [ticket text]",
"response": "The customer is requesting a refund for order #12345 due to a defective product received on 2026-01-15...",
"task_type": "ticket_summarisation",
"client_id": "client_acme",
"timestamp": "2026-02-10T14:30:00Z",
"model_used": "gpt-4o",
"was_accepted": true
}
The was_accepted field is optional but valuable — if the client's team reviews AI outputs and sometimes rejects them, tracking this helps filter for high-quality training data.
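The record above can be assembled in an n8n Code node or an external logging script. A minimal Python sketch, with all field values illustrative:

```python
from datetime import datetime, timezone

def build_log_record(instruction, response, task_type, client_id,
                     model_used, was_accepted=None):
    """Assemble one interaction record in the capture schema shown above."""
    record = {
        "instruction": instruction,
        "response": response,
        "task_type": task_type,
        "client_id": client_id,
        "timestamp": datetime.now(timezone.utc).isoformat(timespec="seconds"),
        "model_used": model_used,
    }
    # was_accepted is optional: only include it when review data exists
    if was_accepted is not None:
        record["was_accepted"] = was_accepted
    return record

record = build_log_record(
    "Summarise this customer support ticket: ...",
    "The customer is requesting a refund...",
    "ticket_summarisation", "client_acme", "gpt-4o", was_accepted=True)
```

Whatever builds the record, keep the field names stable from day one; a consistent schema makes the cleaning step far simpler.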
Volume Targets
| Fine-Tuning Quality | Examples Needed | Collection Time (typical) |
|---|---|---|
| Minimum viable | 500 | 1-2 weeks |
| Good quality | 1,500-2,000 | 3-6 weeks |
| Production-ready | 3,000+ | 6-12 weeks |
Start collecting now, even if you are weeks away from fine-tuning. More data produces better models.
Step 2: Clean and Format the Dataset
Raw interaction logs need cleaning before fine-tuning. Build an n8n workflow for this or do it manually — the choice depends on volume.
Automated Cleaning (n8n Workflow)
Create a data cleaning workflow that:
- Reads from your data store (Google Sheets, PostgreSQL, etc.)
- Filters out rejected responses (where was_accepted is false)
- Removes duplicates (same instruction with same response)
- Normalises formatting (consistent line breaks, trim whitespace)
- Validates structure (instruction and response fields are non-empty, reasonable length)
- Exports as JSONL (one JSON object per line)
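The cleaning steps above can also run as a standalone Python script. A sketch, assuming the capture schema from Step 1 (the length thresholds are illustrative):

```python
import json

def clean_records(records, min_len=10, max_len=8000):
    """Filter, dedupe, and lightly normalise raw interaction logs."""
    seen = set()
    cleaned = []
    for rec in records:
        # Drop responses the client's reviewers explicitly rejected
        if rec.get("was_accepted") is False:
            continue
        # Normalise line endings and trim whitespace
        instruction = str(rec.get("instruction", "")).replace("\r\n", "\n").strip()
        response = str(rec.get("response", "")).replace("\r\n", "\n").strip()
        # Validate structure: non-empty fields of reasonable length
        if not instruction or not response:
            continue
        if not (min_len <= len(instruction) <= max_len) or len(response) > max_len:
            continue
        # Remove duplicates (same instruction with same response)
        key = (instruction, response)
        if key in seen:
            continue
        seen.add(key)
        cleaned.append({"instruction": instruction, "response": response})
    return cleaned

def write_jsonl(records, path):
    """Export as JSONL: one JSON object per line."""
    with open(path, "w", encoding="utf-8") as f:
        for rec in records:
            f.write(json.dumps(rec, ensure_ascii=False) + "\n")
```

At a few thousand records this runs in seconds; the n8n version of the same pipeline is worth building once you are cleaning data on a schedule.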
Manual Review
For the first fine-tuning run, manually review a sample (100-200 examples):
- Are the instructions clear and representative of the task?
- Are the responses high quality? (Would you want the model to produce this?)
- Is there sensitive data that needs removal? (PII, API keys, internal references)
- Are there edge cases that should be excluded from training?
Output Format
The final JSONL file should look like:
{"instruction": "Classify this email as: billing, technical, general, or spam.\n\nEmail: I can't log into my account after the update...", "response": "technical"}
{"instruction": "Summarise this support ticket for the weekly report:\n\nTicket: Customer reported that...", "response": "Customer experienced login failure after v2.3 update. Resolution: cleared browser cache and reset session tokens. Time to resolve: 15 minutes."}
Step 3: Fine-Tune in Ertas Studio
With your cleaned JSONL file ready:
- Create a project in Ertas Studio for this client and task
- Upload the JSONL file — Studio validates format and shows data statistics
- Select base model — Llama 3.1 8B for most agency tasks, Mistral 7B as an alternative
- Configure training:
- LoRA rank: 16 (default, works for most tasks)
- Epochs: 3
- Learning rate: 2e-4
- Start training — typically 30-60 minutes for 2,000 examples on an 8B model
- Evaluate — use Studio's side-by-side comparison to test the fine-tuned model against sample inputs
Quality Check
Before deploying, test with 20-30 examples the model has never seen:
- Does the fine-tuned model match or exceed the API model's quality?
- Is the output format consistent?
- Does it handle edge cases correctly?
If quality is not sufficient, common fixes:
- Add more training data (especially for the cases where quality is weak)
- Increase LoRA rank from 16 to 32
- Add another epoch of training
- Improve data quality (remove noisy examples)
Step 4: Deploy the Model Locally
Export your fine-tuned model from Ertas Studio in GGUF format (for Ollama) or SafeTensors (for vLLM).
Deploy with Ollama
# Create a Modelfile
echo 'FROM llama3.1:8b
ADAPTER /path/to/your-adapter.gguf' > Modelfile
# Register the model
ollama create client-acme-summariser -f Modelfile
# Test it
ollama run client-acme-summariser "Summarise this ticket: ..."
Ollama exposes an OpenAI-compatible API at http://localhost:11434/v1/chat/completions.
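Any OpenAI-compatible client can call this endpoint. A stdlib-only Python sketch, assuming the server is running locally and using the model name registered above:

```python
import json
import urllib.request

def build_chat_request(model, prompt, base_url="http://localhost:11434/v1"):
    """Build an OpenAI-style chat completion request for a local endpoint."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "temperature": 0.2,
    }
    return urllib.request.Request(
        f"{base_url}/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )

if __name__ == "__main__":
    req = build_chat_request("client-acme-summariser", "Summarise this ticket: ...")
    with urllib.request.urlopen(req) as resp:
        body = json.load(resp)
    print(body["choices"][0]["message"]["content"])
```

The identical request works against the vLLM endpoint; only the base URL and model name change.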
Deploy with vLLM
python -m vllm.entrypoints.openai.api_server \
--model meta-llama/Llama-3.1-8B \
--enable-lora \
--lora-modules client-acme=/path/to/adapter \
--host 0.0.0.0 --port 8000
vLLM exposes an OpenAI-compatible API at http://your-server:8000/v1/chat/completions.
Step 5: Connect n8n to Your Local Model
This is the payoff. Update your existing n8n workflows to point at your local model instead of the cloud API.
Option A: Change the API URL
If you are using n8n's HTTP Request node to call OpenAI:
- Change the URL from https://api.openai.com/v1/chat/completions to http://localhost:11434/v1/chat/completions (Ollama) or http://your-server:8000/v1/chat/completions (vLLM)
- Update the model parameter from gpt-4o to your fine-tuned model's name
- Remove or update the API key (Ollama does not require one)
That is it. The request/response format is identical. Your workflow logic, error handling, and output processing stay the same.
Option B: Use n8n's OpenAI Credentials with Custom Base URL
- In n8n, create a new OpenAI credential
- Set the base URL to your local endpoint
- Set the API key to any string (e.g., "local")
- Use this credential in your existing OpenAI nodes
- Change the model name to your fine-tuned model
This approach requires no workflow changes beyond updating the credential — every node that uses the credential automatically switches to local inference.
Testing the Switch
Before switching production workflows:
- Clone the workflow — create a copy that uses the local model
- Run both in parallel for 24-48 hours
- Compare outputs — are the local model's results equal or better?
- Monitor latency — local inference should be faster for most workloads
- Switch over — update the production workflow to use the local endpoint
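For classification-style tasks, the parallel-run comparison can be scored automatically with a normalised exact-match rate. A minimal sketch:

```python
def agreement_rate(api_outputs, local_outputs):
    """Fraction of inputs where both models produced the same (normalised) output."""
    assert len(api_outputs) == len(local_outputs), "runs must cover the same inputs"
    def norm(text):
        # Case-insensitive, whitespace-normalised comparison
        return " ".join(text.lower().split())
    matches = sum(norm(a) == norm(b) for a, b in zip(api_outputs, local_outputs))
    return matches / len(api_outputs)
```

Exact match only makes sense for short, constrained outputs like labels; for summarisation or generation tasks, spot-check a sample of the parallel outputs by hand instead.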
Step 6: Iterate and Improve
Fine-tuning is not a one-time event. The model improves with feedback:
Continuous Data Collection
Keep the data collection branch active in your updated workflows. Now it captures:
- Interactions with your fine-tuned model (not the API)
- Client feedback (accepted/rejected)
- Edge cases where the model underperforms
Periodic Retraining
Every 4-8 weeks (or when quality issues surface):
- Export new interaction data from your logging pipeline
- Add corrective examples for cases where the model struggled
- Combine with original training data
- Retrain in Ertas Studio
- Evaluate against the previous model version
- Deploy if improved
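Combining the original and new datasets before retraining can be sketched as follows (file paths are illustrative; the deduplication mirrors the Step 2 cleaning):

```python
import json

def merge_datasets(paths, out_path):
    """Combine training JSONL files, deduplicating on the instruction/response pair."""
    seen = set()
    merged = []
    for path in paths:
        with open(path, encoding="utf-8") as f:
            for line in f:
                if not line.strip():
                    continue
                ex = json.loads(line)
                key = (ex["instruction"], ex["response"])
                if key in seen:   # skip examples already present in an earlier file
                    continue
                seen.add(key)
                merged.append(ex)
    with open(out_path, "w", encoding="utf-8") as f:
        for ex in merged:
            f.write(json.dumps(ex, ensure_ascii=False) + "\n")
    return len(merged)
```

Version the merged file (for example, client-acme-v3.jsonl) so each retraining run can be traced back to exactly the data it saw.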
Track Improvement Over Time
Log model versions and corresponding quality metrics. Over 3-4 training cycles, you will see measurable improvement as the model learns from real-world usage patterns.
The Business Impact
| Metric | Before (API) | After (Local Fine-Tuned) |
|---|---|---|
| Monthly API cost (per client) | $150-500 | ~$0 |
| Response latency | 800-2000ms | 200-500ms |
| Output quality | Generic | Client-specific |
| Data privacy | Data sent to third party | Data stays local |
| Scalability | Linear cost increase | Fixed cost (GPU tier) |
For most agencies, the switch pays for itself within 1-3 months. More importantly, it transforms the service from "we connect your workflows to ChatGPT" to "we build custom AI models trained on your data" — a significantly higher-value proposition.
Ship AI that runs on your users' devices.
Ertas early bird pricing starts at $14.50/mo — locked in for life. Plans for builders and agencies.
Further Reading
- n8n + Local LLMs: Building HIPAA-Compliant Automation — Healthcare-specific n8n + local LLM workflows
- AI Agency Tech Stack for Legal Clients — The complete architecture for agencies serving law firms