
From n8n Workflow to Fine-Tuned Model: A Step-by-Step Agency Playbook
A tactical guide for n8n agencies: collect client interaction data via workflows, clean and format it, fine-tune a model in Ertas Studio, deploy locally, and connect back to n8n for inference.
You have n8n workflows running for your clients. They call OpenAI or Anthropic APIs for classification, summarisation, generation, or analysis tasks. The workflows work, but the API costs eat your margins and the quality is inconsistent.
Here is the playbook for turning those existing n8n workflows into a fine-tuning pipeline — using the interaction data you are already generating to train a custom model that is cheaper, faster, and more accurate.
The Pipeline Overview
n8n Workflows (existing) → Data Collection → Cleaning → Fine-Tuning → Local Deployment → n8n Workflows (updated)
You start and end with n8n. The middle steps transform your client's usage data into a custom model that replaces the API calls.
Step 1: Collect Client Interaction Data via n8n
Your existing n8n workflows already contain the training data you need — every API call includes an input (the instruction) and an output (the model's response). You just need to capture it.
Add a Data Collection Branch
For each workflow that calls an AI API, add a parallel branch that logs the interaction:
- After the HTTP Request node (API call), add a Set node that extracts:
- The input prompt/message sent to the API
- The response received from the API
- A timestamp
- Client identifier
- Task type (classification, summarisation, etc.)
- Route this to a Google Sheets, Airtable, or PostgreSQL node that stores the records.
For workflows already in production, you can add this logging branch without disrupting the existing flow — n8n's branching model lets you add parallel paths.
What to Capture
{
"instruction": "Summarise this customer support ticket: [ticket text]",
"response": "The customer is requesting a refund for order #12345 due to a defective product received on 2026-01-15...",
"task_type": "ticket_summarisation",
"client_id": "client_acme",
"timestamp": "2026-02-10T14:30:00Z",
"model_used": "gpt-4o",
"was_accepted": true
}
The was_accepted field is optional but valuable — if the client's team reviews AI outputs and sometimes rejects them, tracking this helps filter for high-quality training data.
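The record above can be assembled in an n8n Code node or an external logging script. A minimal Python sketch, with all field values illustrative:

```python
from datetime import datetime, timezone

def build_log_record(instruction, response, task_type, client_id,
                     model_used, was_accepted=None):
    """Assemble one interaction record in the capture schema shown above."""
    record = {
        "instruction": instruction,
        "response": response,
        "task_type": task_type,
        "client_id": client_id,
        "timestamp": datetime.now(timezone.utc).isoformat(timespec="seconds"),
        "model_used": model_used,
    }
    # was_accepted is optional: only include it when review data exists
    if was_accepted is not None:
        record["was_accepted"] = was_accepted
    return record

record = build_log_record(
    "Summarise this customer support ticket: ...",
    "The customer is requesting a refund...",
    "ticket_summarisation", "client_acme", "gpt-4o", was_accepted=True)
```

Whatever builds the record, keep the field names stable from day one; a consistent schema makes the cleaning step far simpler.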
Volume Targets
| Fine-Tuning Quality | Examples Needed | Collection Time (typical) |
|---|---|---|
| Minimum viable | 500 | 1-2 weeks |
| Good quality | 1,500-2,000 | 3-6 weeks |
| Production-ready | 3,000+ | 6-12 weeks |
Start collecting now, even if you are weeks away from fine-tuning. More data produces better models.
Step 2: Clean and Format the Dataset
Raw interaction logs need cleaning before fine-tuning. Build an n8n workflow for this or do it manually — the choice depends on volume.
Automated Cleaning (n8n Workflow)
Create a data cleaning workflow that:
- Reads from your data store (Google Sheets, PostgreSQL, etc.)
- Filters out rejected responses (where was_accepted is false)
- Removes duplicates (same instruction with same response)
- Normalises formatting (consistent line breaks, trim whitespace)
- Validates structure (instruction and response fields are non-empty, reasonable length)
- Exports as JSONL (one JSON object per line)
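The cleaning steps above can also run as a standalone Python script. A sketch, assuming the capture schema from Step 1 (the length thresholds are illustrative):

```python
import json

def clean_records(records, min_len=10, max_len=8000):
    """Filter, dedupe, and lightly normalise raw interaction logs."""
    seen = set()
    cleaned = []
    for rec in records:
        # Drop responses the client's reviewers explicitly rejected
        if rec.get("was_accepted") is False:
            continue
        # Normalise line endings and trim whitespace
        instruction = str(rec.get("instruction", "")).replace("\r\n", "\n").strip()
        response = str(rec.get("response", "")).replace("\r\n", "\n").strip()
        # Validate structure: non-empty fields of reasonable length
        if not instruction or not response:
            continue
        if not (min_len <= len(instruction) <= max_len) or len(response) > max_len:
            continue
        # Remove duplicates (same instruction with same response)
        key = (instruction, response)
        if key in seen:
            continue
        seen.add(key)
        cleaned.append({"instruction": instruction, "response": response})
    return cleaned

def write_jsonl(records, path):
    """Export as JSONL: one JSON object per line."""
    with open(path, "w", encoding="utf-8") as f:
        for rec in records:
            f.write(json.dumps(rec, ensure_ascii=False) + "\n")
```

At a few thousand records this runs in seconds; the n8n version of the same pipeline is worth building once you are cleaning data on a schedule.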
Manual Review
For the first fine-tuning run, manually review a sample (100-200 examples):
- Are the instructions clear and representative of the task?
- Are the responses high quality? (Would you want the model to produce this?)
- Is there sensitive data that needs removal? (PII, API keys, internal references)
- Are there edge cases that should be excluded from training?
Output Format
The final JSONL file should look like:
{"instruction": "Classify this email as: billing, technical, general, or spam.\n\nEmail: I can't log into my account after the update...", "response": "technical"}
{"instruction": "Summarise this support ticket for the weekly report:\n\nTicket: Customer reported that...", "response": "Customer experienced login failure after v2.3 update. Resolution: cleared browser cache and reset session tokens. Time to resolve: 15 minutes."}
Step 3: Fine-Tune in Ertas Studio
With your cleaned JSONL file ready:
- Create a project in Ertas Studio for this client and task
- Upload the JSONL file — Studio validates format and shows data statistics
- Select base model — Llama 3.1 8B for most agency tasks, Mistral 7B as an alternative
- Configure training:
- LoRA rank: 16 (default, works for most tasks)
- Epochs: 3
- Learning rate: 2e-4
- Start training — typically 30-60 minutes for 2,000 examples on an 8B model
- Evaluate — use Studio's side-by-side comparison to test the fine-tuned model against sample inputs
Quality Check
Before deploying, test with 20-30 examples the model has never seen:
- Does the fine-tuned model match or exceed the API model's quality?
- Is the output format consistent?
- Does it handle edge cases correctly?
If quality is not sufficient, common fixes:
- Add more training data (especially for the cases where quality is weak)
- Increase LoRA rank from 16 to 32
- Add another epoch of training
- Improve data quality (remove noisy examples)
Step 4: Deploy the Model Locally
Export your fine-tuned model from Ertas Studio in GGUF format (for Ollama) or SafeTensors (for vLLM).
Deploy with Ollama
# Create a Modelfile
echo 'FROM llama3.1:8b
ADAPTER /path/to/your-adapter.gguf' > Modelfile
# Register the model
ollama create client-acme-summariser -f Modelfile
# Test it
ollama run client-acme-summariser "Summarise this ticket: ..."
Ollama exposes an OpenAI-compatible API at http://localhost:11434/v1/chat/completions.
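Any OpenAI-compatible client can call this endpoint. A stdlib-only Python sketch, assuming the server is running locally and using the model name registered above:

```python
import json
import urllib.request

def build_chat_request(model, prompt, base_url="http://localhost:11434/v1"):
    """Build an OpenAI-style chat completion request for a local endpoint."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "temperature": 0.2,
    }
    return urllib.request.Request(
        f"{base_url}/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )

if __name__ == "__main__":
    req = build_chat_request("client-acme-summariser", "Summarise this ticket: ...")
    with urllib.request.urlopen(req) as resp:
        body = json.load(resp)
    print(body["choices"][0]["message"]["content"])
```

The identical request works against the vLLM endpoint; only the base URL and model name change.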
Deploy with vLLM
python -m vllm.entrypoints.openai.api_server \
--model meta-llama/Llama-3.1-8B \
--enable-lora \
--lora-modules client-acme=/path/to/adapter \
--host 0.0.0.0 --port 8000
vLLM exposes an OpenAI-compatible API at http://your-server:8000/v1/chat/completions.
Step 5: Connect n8n to Your Local Model
This is the payoff. Update your existing n8n workflows to point at your local model instead of the cloud API.
Option A: Change the API URL
If you are using n8n's HTTP Request node to call OpenAI:
- Change the URL from https://api.openai.com/v1/chat/completions to http://localhost:11434/v1/chat/completions (Ollama) or http://your-server:8000/v1/chat/completions (vLLM)
- Update the model parameter from gpt-4o to your fine-tuned model's name
- Remove or update the API key (Ollama does not require one)
That is it. The request/response format is identical. Your workflow logic, error handling, and output processing stay the same.
Option B: Use n8n's OpenAI Credentials with Custom Base URL
- In n8n, create a new OpenAI credential
- Set the base URL to your local endpoint
- Set the API key to any string (e.g., "local")
- Use this credential in your existing OpenAI nodes
- Change the model name to your fine-tuned model
This approach requires no workflow changes beyond updating the credential — every node that uses the credential automatically switches to local inference.
Testing the Switch
Before switching production workflows:
- Clone the workflow — create a copy that uses the local model
- Run both in parallel for 24-48 hours
- Compare outputs — are the local model's results equal or better?
- Monitor latency — local inference should be faster for most workloads
- Switch over — update the production workflow to use the local endpoint
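For classification-style tasks, the parallel-run comparison can be scored automatically with a normalised exact-match rate. A minimal sketch:

```python
def agreement_rate(api_outputs, local_outputs):
    """Fraction of inputs where both models produced the same (normalised) output."""
    assert len(api_outputs) == len(local_outputs), "runs must cover the same inputs"
    def norm(text):
        # Case-insensitive, whitespace-normalised comparison
        return " ".join(text.lower().split())
    matches = sum(norm(a) == norm(b) for a, b in zip(api_outputs, local_outputs))
    return matches / len(api_outputs)
```

Exact match only makes sense for short, constrained outputs like labels; for summarisation or generation tasks, spot-check a sample of the parallel outputs by hand instead.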
Step 6: Iterate and Improve
Fine-tuning is not a one-time event. The model improves with feedback:
Continuous Data Collection
Keep the data collection branch active in your updated workflows. Now it captures:
- Interactions with your fine-tuned model (not the API)
- Client feedback (accepted/rejected)
- Edge cases where the model underperforms
Periodic Retraining
Every 4-8 weeks (or when quality issues surface):
- Export new interaction data from your logging pipeline
- Add corrective examples for cases where the model struggled
- Combine with original training data
- Retrain in Ertas Studio
- Evaluate against the previous model version
- Deploy if improved
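Combining the original and new datasets before retraining can be sketched as follows (file paths are illustrative; the deduplication mirrors the Step 2 cleaning):

```python
import json

def merge_datasets(paths, out_path):
    """Combine training JSONL files, deduplicating on the instruction/response pair."""
    seen = set()
    merged = []
    for path in paths:
        with open(path, encoding="utf-8") as f:
            for line in f:
                if not line.strip():
                    continue
                ex = json.loads(line)
                key = (ex["instruction"], ex["response"])
                if key in seen:   # skip examples already present in an earlier file
                    continue
                seen.add(key)
                merged.append(ex)
    with open(out_path, "w", encoding="utf-8") as f:
        for ex in merged:
            f.write(json.dumps(ex, ensure_ascii=False) + "\n")
    return len(merged)
```

Version the merged file (for example, client-acme-v3.jsonl) so each retraining run can be traced back to exactly the data it saw.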
Track Improvement Over Time
Log model versions and corresponding quality metrics. Over 3-4 training cycles, you will see measurable improvement as the model learns from real-world usage patterns.
The Business Impact
| Metric | Before (API) | After (Local Fine-Tuned) |
|---|---|---|
| Monthly API cost (per client) | $150-500 | ~$0 |
| Response latency | 800-2000ms | 200-500ms |
| Output quality | Generic | Client-specific |
| Data privacy | Data sent to third party | Data stays local |
| Scalability | Linear cost increase | Fixed cost (GPU tier) |
For most agencies, the switch pays for itself within 1-3 months. More importantly, it transforms the service from "we connect your workflows to ChatGPT" to "we build custom AI models trained on your data" — a significantly higher-value proposition.
Ship AI that runs on your users' devices.
Ertas early bird pricing starts at $14.50/mo — locked in for life. Plans for builders and agencies.
Further Reading
- n8n + Local LLMs: Building HIPAA-Compliant Automation — Healthcare-specific n8n + local LLM workflows
- AI Agency Tech Stack for Legal Clients — The complete architecture for agencies serving law firms