
n8n Local AI: Replace OpenAI With Your Own Fine-Tuned Model
Step-by-step guide to replacing OpenAI API calls in your n8n workflows with a locally-running fine-tuned model. Cut costs to zero without sacrificing quality.
Your n8n workflows work great. You've built automations that classify emails, extract data from invoices, score leads, and draft follow-up messages — all powered by OpenAI nodes that reliably deliver quality results.
But every one of those AI nodes is a recurring cost. Each execution burns tokens. Each token appears on your monthly bill. And as your automations scale — more workflows, more clients, more volume — that bill grows in lockstep.
What if you could run AI of the same quality locally, at zero cost per execution?
That's what this tutorial walks you through, step by step. By the end, you'll have a fine-tuned model running on Ollama, connected to your n8n workflows, producing the same quality outputs as OpenAI — at zero per-execution cost.
No ML background required. No Python scripting. Just the specific steps to go from "paying OpenAI per token" to "running your own model locally."
What We're Building
Here's the end state:
┌──────────────────────────────────────────┐
│               n8n Workflow               │
│                                          │
│  Trigger → Process → [AI Node] → Action  │
│                          │               │
│                          ▼               │
│             HTTP Request Node            │
│      POST localhost:11434/api/chat       │
└──────────────────┬───────────────────────┘
                   │
                   ▼
┌──────────────────────────────────────────┐
│              Ollama Server               │
│                                          │
│  ┌─────────────────────────────────────┐ │
│  │    Your Fine-Tuned Model (GGUF)     │ │
│  │    Trained on YOUR workflow data    │ │
│  │     Runs on CPU — no GPU needed     │ │
│  └─────────────────────────────────────┘ │
└──────────────────────────────────────────┘
Your n8n workflow triggers exactly as before. Instead of the OpenAI node sending a request to api.openai.com (and costing tokens), an HTTP Request node sends the same prompt to Ollama running on your local machine or VPS. Ollama runs your fine-tuned model and returns the result. Same format. Same quality. Zero API cost.
The key insight: your fine-tuned model is trained on the exact input/output pairs from your existing OpenAI workflows. It doesn't need to know everything GPT-4 knows. It just needs to replicate the specific behavior your workflows rely on.
Prerequisites
Before we start, make sure you have:
- A running n8n instance — self-hosted or n8n cloud. This tutorial assumes you have existing workflows with OpenAI nodes.
- Ollama installed — on your n8n server, a nearby VPS, or your local machine. Installation:
curl -fsSL https://ollama.com/install.sh | sh
- An Ertas account — for fine-tuning your model without code. Sign up at ertas.io ($14.50/month).
- Workflow execution history — at least 200 successful executions from the workflow you want to convert. More is better.
Hardware requirements for Ollama:
| Setup | Specs | Monthly Cost |
|---|---|---|
| Same server as n8n | 4+ vCPU, 16GB+ RAM | Already paying for it |
| Separate VPS | 4 vCPU, 16GB RAM | ~$30/month |
| Local machine | Any modern laptop with 16GB RAM | $0 |
A 7B parameter model with Q4 quantization uses about 4-5GB of RAM. If your n8n server has 16GB+ RAM, you can run Ollama alongside n8n on the same machine without issues.
Step 1: Export Your OpenAI Training Data
Your existing workflows have been generating training data every time they execute. Each execution contains the input that was sent to OpenAI and the output that came back. That's exactly what we need to fine-tune a model.
Identify the target workflow
Pick the workflow you want to convert first. The ideal candidate is:
- High volume — runs 50+ times per day (biggest cost savings)
- Narrow task — classification, extraction, or templated generation (easiest to fine-tune)
- Consistent quality — OpenAI outputs are reliably good (clean training data)
Extract input/output pairs
Method 1: Manual extraction from n8n UI
- Open the target workflow in n8n
- Click "Executions" in the sidebar
- Filter for "Success" status
- Click into each execution and find the OpenAI node
- Record the input (the messages array sent to OpenAI) and output (the response content)
- Format as JSONL:
{"input": "Classify this email: Hi, I'd like to cancel my subscription...", "output": "cancellation"}
{"input": "Classify this email: When will my order arrive?", "output": "shipping_inquiry"}
{"input": "Classify this email: The product broke after 2 days...", "output": "defect_report"}
This works for collecting 50-100 examples but becomes tedious for larger datasets.
Method 2: Automated extraction via n8n API
Create a separate n8n workflow that pulls execution data programmatically:
- Add an HTTP Request node that calls the n8n API:
  - URL: http://localhost:5678/api/v1/executions?workflowId=YOUR_WORKFLOW_ID&status=success&limit=500
  - Add your n8n API key in the headers
- Add a Code node to extract and format the AI node data:
const executions = $input.all();
const trainingData = [];

for (const exec of executions) {
  const nodes = exec.json.data?.resultData?.runData;
  // 'OpenAI' must match the exact name of the OpenAI node in your workflow
  if (nodes && nodes['OpenAI']) {
    const openaiNode = nodes['OpenAI'][0];
    const input = openaiNode.data?.main?.[0]?.[0]?.json?.messages;
    const output = openaiNode.data?.main?.[0]?.[0]?.json?.choices?.[0]?.message?.content;
    if (input && output) {
      // Keep only the user message; the system prompt repeats across every pair
      const userMessage = input.find(m => m.role === 'user')?.content || '';
      trainingData.push({
        input: userMessage,
        output: output
      });
    }
  }
}

return [{ json: { trainingData, count: trainingData.length } }];
- Add a Write File node to save the output as a JSONL file.
Method 3: Prospective data collection
If you don't have enough execution history yet, add a logging step to your current workflow. Insert a Code node after your OpenAI node that appends each input/output pair to a file or database. Run this for 1-2 weeks until you have 300+ examples.
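A minimal logging sketch, assuming a self-hosted instance started with NODE_FUNCTION_ALLOW_BUILTIN=fs so the Code node can require('fs'). The two field names are placeholders; point them at wherever your prompt and response actually live:
// Logging sketch: assumes self-hosted n8n with NODE_FUNCTION_ALLOW_BUILTIN=fs
const fs = require('fs');

for (const item of $input.all()) {
  const pair = {
    // Placeholder field names: adapt to your workflow's actual data
    input: item.json.prompt_text,
    output: item.json.ai_response
  };
  if (pair.input && pair.output) {
    // One JSON object per line: the JSONL format Ertas expects
    fs.appendFileSync('/home/node/training-data.jsonl', JSON.stringify(pair) + '\n');
  }
}

// Pass items through unchanged so the rest of the workflow keeps working
return $input.all();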
How much data do you need?
| Task Type | Minimum | Sweet Spot | Diminishing Returns |
|---|---|---|---|
| Binary classification (yes/no) | 100 pairs | 250 pairs | 500+ pairs |
| Multi-class classification | 200 pairs | 500 pairs | 1,000+ pairs |
| Data extraction (structured) | 200 pairs | 500 pairs | 1,000+ pairs |
| Short text generation | 300 pairs | 800 pairs | 2,000+ pairs |
| Summarization | 300 pairs | 1,000 pairs | 3,000+ pairs |
Aim for the "sweet spot" column. You'll get good results at the minimum, but the sweet spot gives you a model that handles edge cases better.
Clean your data
Before uploading, do a quick quality pass (a script automating the mechanical checks follows this list):
- Remove failed outputs. If OpenAI returned an error or a nonsensical response, drop that pair.
- Remove duplicates. Exact duplicate input/output pairs don't help. Keep one copy.
- Check for consistency. If similar inputs produced wildly different outputs, investigate. Your model will learn the average — inconsistent training data produces inconsistent outputs.
- Standardize format. Make sure all your outputs follow the same format (e.g., all classification labels are lowercase, all JSON outputs use the same schema).
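A small Node script can automate the mechanical checks (malformed lines, incomplete pairs, duplicates, label format). A minimal sketch, assuming a classification dataset where outputs are labels; drop the lowercase step for generation tasks:
// clean-dataset.js — usage: node clean-dataset.js training-data.jsonl > cleaned.jsonl
const fs = require('fs');

const lines = fs.readFileSync(process.argv[2], 'utf8').split('\n').filter(Boolean);
const seen = new Set();

for (const line of lines) {
  let pair;
  try { pair = JSON.parse(line); } catch { continue; }   // drop malformed lines
  if (!pair.input || !pair.output) continue;             // drop incomplete pairs
  const output = pair.output.trim().toLowerCase();       // standardize label format
  const key = pair.input + '||' + output;
  if (seen.has(key)) continue;                           // drop exact duplicates
  seen.add(key);
  console.log(JSON.stringify({ input: pair.input, output }));
}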
Step 2: Fine-Tune Your Model With Ertas
Now you've got a clean dataset. Time to turn it into a model.
Upload your data
- Log into Ertas Studio at app.ertas.io
- Create a new project (name it after your workflow, e.g., "Email Classifier" or "Invoice Extractor")
- Click "Upload Dataset" and drag in your JSONL file
- Ertas validates the file and shows you a preview — verify a few examples look correct
Select your base model
For n8n automation tasks, these are the recommended choices:
| Base Model | Best For | Inference Speed | Quality |
|---|---|---|---|
| Qwen 2.5 7B | Classification, extraction, structured output | Fast | Excellent |
| Llama 3.1 8B | Generation, summarization, longer outputs | Fast | Excellent |
| Mistral 7B | High-throughput automation, short outputs | Fastest | Very Good |
Our recommendation for most n8n workflows: Qwen 2.5 7B. It handles structured tasks exceptionally well and produces clean, consistent outputs — exactly what automation workflows need.
Configure training
Ertas auto-configures the training parameters based on your dataset:
- LoRA rank: Automatically selected based on task complexity (typically 16-32 for automation tasks)
- Learning rate: Optimized for your dataset size
- Epochs: Usually 3-5 for automation datasets
- Validation split: 10% of your data is held out for evaluation
You can adjust these if you want, but the defaults are tuned for automation use cases and work well out of the box.
Start training
Click "Start Training." Depending on your dataset size:
- 200-500 examples: ~15-20 minutes
- 500-1,000 examples: ~25-35 minutes
- 1,000+ examples: ~35-50 minutes
You can watch the training loss curve in real time. For automation tasks, you typically see the loss drop sharply in the first epoch and stabilize by epoch 3.
Evaluate results
When training completes, Ertas shows you:
- Accuracy metrics: For classification tasks, you get precision, recall, and F1 scores broken down by class.
- Side-by-side comparisons: Your fine-tuned model's outputs vs. the original GPT-4 outputs for test examples.
- Sample predictions: Run any input through the model and see the output instantly.
What to look for:
- Classification accuracy above 90% (for well-defined categories)
- Extraction completeness (all fields captured)
- Generation quality (reads naturally, matches expected format)
If quality isn't where you need it, the most common fixes are:
- Add more training examples (especially for underperforming categories)
- Clean up inconsistent training pairs
- Increase LoRA rank (from 16 to 32)
Step 3: Export to GGUF and Load in Ollama
Export from Ertas
- In your Ertas project, click "Export Model"
- Select GGUF format
- Choose Q4_K_M quantization — this is the optimal balance between quality and file size for 7B models:
| Quantization | File Size (7B) | Quality | Speed |
|---|---|---|---|
| Q8_0 | ~7.5GB | Highest | Slower |
| Q5_K_M | ~5.5GB | Very High | Medium |
| Q4_K_M | ~4.5GB | High | Fast |
| Q3_K_M | ~3.5GB | Good | Fastest |
Q4_K_M gives you about 99% of the quality at 60% of the file size compared to Q8. For automation tasks where outputs are short and structured, the quality difference is negligible.
- Download the GGUF file. It'll be named something like email-classifier-q4km.gguf.
Load in Ollama
Transfer the GGUF file to your server (the machine running Ollama):
scp email-classifier-q4km.gguf user@your-server:/home/user/models/
Create a Modelfile that tells Ollama how to serve your model:
FROM /home/user/models/email-classifier-q4km.gguf
PARAMETER temperature 0.1
PARAMETER num_ctx 4096
PARAMETER stop "<|im_end|>"
Key parameters:
- temperature 0.1 — Low temperature for consistent, deterministic outputs. Critical for automation tasks.
- num_ctx 4096 — Context window size. Increase if your inputs are longer.
- stop token — Depends on your base model. Qwen uses <|im_end|>; Llama 3 uses <|eot_id|>.
Create the model in Ollama:
ollama create email-classifier -f Modelfile
Test it:
ollama run email-classifier "Classify this email: Hi, I need to change my shipping address for order #4521."
You should get a response like shipping_inquiry — matching the format your workflow expects.
Verify the API endpoint
Ollama serves a REST API on port 11434 by default. Test it with curl:
curl http://localhost:11434/api/chat -d '{
  "model": "email-classifier",
  "messages": [
    {
      "role": "system",
      "content": "Classify the following email into one of these categories: cancellation, shipping_inquiry, defect_report, billing_question, general_inquiry"
    },
    {
      "role": "user",
      "content": "Hi, I was charged twice for my last order."
    }
  ],
  "stream": false
}'
Expected response (timing metadata fields trimmed for clarity):
{
  "model": "email-classifier",
  "message": {
    "role": "assistant",
    "content": "billing_question"
  }
}
If this works, you're ready to wire it into n8n.
Step 4: Create the Ollama Node in n8n
Now for the actual workflow change. Open the n8n workflow you want to convert.
Option A: Replace with HTTP Request Node (recommended)
This gives you full control over the request and works with any n8n version.
- Disable the OpenAI node (don't delete it yet — you'll want it for comparison testing)
- Add an HTTP Request node right where the OpenAI node was
- Configure the HTTP Request node:
Method: POST
URL: http://localhost:11434/api/chat
(If Ollama is on a different server, use that server's IP instead of localhost)
Headers:
- Content-Type: application/json
Body (JSON):
{
  "model": "email-classifier",
  "messages": [
    {
      "role": "system",
      "content": "Classify the following email into one of these categories: cancellation, shipping_inquiry, defect_report, billing_question, general_inquiry"
    },
    {
      "role": "user",
      "content": "={{ $json.email_body }}"
    }
  ],
  "stream": false,
  "options": {
    "temperature": 0.1
  }
}
Replace $json.email_body with whatever expression references the input data from the previous node in your workflow.
- Add a Code node after the HTTP Request to extract the response:
const response = $input.first().json;
// /api/chat returns message.content; /api/generate would return response
const result = response.message?.content?.trim() || response.response?.trim() || '';

return [{
  json: {
    classification: result,
    model: response.model,
    raw_response: response
  }
}];
- Connect the Code node's output to whatever comes next in your workflow (the same node the OpenAI node was connected to).
Option B: Use the OpenAI-Compatible Endpoint
Ollama also serves an OpenAI-compatible API at /v1/chat/completions. If your n8n version has an OpenAI node that lets you change the base URL, you can:
- Open the OpenAI node settings
- Change the base URL to http://localhost:11434/v1
- Set the model to email-classifier
- Remove the API key (or set any dummy value — Ollama doesn't need one)
This approach requires fewer workflow changes but depends on your n8n version supporting custom base URLs in the OpenAI node.
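To sanity-check the /v1 endpoint outside n8n first, here's a minimal sketch. Save it as check.mjs and run it with Node 18+ (which ships fetch built in):
// check.mjs — verify Ollama's OpenAI-compatible endpoint returns the OpenAI response shape
const res = await fetch('http://localhost:11434/v1/chat/completions', {
  method: 'POST',
  headers: { 'Content-Type': 'application/json' },
  body: JSON.stringify({
    model: 'email-classifier',
    messages: [{ role: 'user', content: 'Classify this email: Where is my refund?' }],
    temperature: 0.1
  })
});
const data = await res.json();
// Same response shape as the OpenAI API: choices[0].message.content
console.log(data.choices[0].message.content);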
Option C: Native Ollama Node
Recent n8n versions (1.20+) include native Ollama integration in the AI nodes section. If available:
- Add the Ollama Chat Model node
- Set Base URL to http://localhost:11434
- Select your model name
- Wire it into your workflow
This is the simplest option but gives you the least control over request parameters.
Step 5: Test and Compare
Before you go live, run a proper comparison test.
A/B test your workflow
- Keep both the OpenAI node and the Ollama node in your workflow
- Add a Switch node before them that sends 50% of executions to each (one way to set this up is sketched after this list)
- Add logging nodes after each to capture the outputs
- Run 100+ real executions through the split
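If you'd rather not configure Switch rules by hand, one way to drive the split is a Code node that tags each item with a random route, with the Switch node branching on that field (the route field name is just for illustration):
// Tag each item with a random A/B route for a 50/50 split
return $input.all().map(item => ({
  json: {
    ...item.json,
    route: Math.random() < 0.5 ? 'openai' : 'ollama'
  }
}));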
Compare outputs
After collecting comparison data, evaluate:
For classification tasks:
| Metric | OpenAI (GPT-4) | Fine-Tuned Local |
|---|---|---|
| Accuracy | Baseline | Compare |
| Consistency (same input → same output) | ~95% | ~99% |
| Speed | 2-5 seconds | 0.3-1 second |
A fine-tuned model is typically more consistent than GPT-4 for classification because it's been trained specifically on your categories and doesn't exhibit the creative variation that general-purpose models do.
For extraction tasks:
| Metric | OpenAI (GPT-4) | Fine-Tuned Local |
|---|---|---|
| Field completeness | Baseline | Compare |
| Format adherence (valid JSON, etc.) | ~92% | ~97% |
| Speed | 3-8 seconds | 0.5-2 seconds |
Fine-tuned models tend to produce more consistent output formats because they've learned your specific schema through training, rather than following schema instructions through prompting.
For generation tasks:
| Metric | OpenAI (GPT-4) | Fine-Tuned Local |
|---|---|---|
| Quality (subjective) | Baseline | Compare |
| Tone consistency | Variable | Consistent |
| Speed | 3-10 seconds | 1-3 seconds |
Generation tasks are the most subjective. Run 20-30 outputs past a human reviewer and score them on a 1-5 scale for quality and appropriateness.
Performance Benchmarks
Here's real-world performance data for common n8n automation tasks running on a $30/month VPS (4 vCPU, 16GB RAM) with a fine-tuned Qwen 2.5 7B model:
| Metric | OpenAI API (GPT-4) | Local Fine-Tuned (7B) |
|---|---|---|
| Response time (classification) | 1.5-3.0 seconds | 0.2-0.5 seconds |
| Response time (extraction) | 2.0-5.0 seconds | 0.4-1.0 seconds |
| Response time (generation) | 3.0-8.0 seconds | 0.8-2.5 seconds |
| Throughput (requests/second) | Limited by rate tier | 10-20 req/sec |
| Cost per execution | $0.02-0.10 | $0.00 |
| Monthly cost (1K exec/day) | $600-3,000 | $44.50 flat |
| Monthly cost (10K exec/day) | $6,000-30,000 | $44.50 flat |
| Uptime dependency | OpenAI status page | Your server |
| Data leaves your infra | Yes | No |
The throughput advantage is significant for batch workflows. If you have a workflow that processes 500 emails every morning, the OpenAI version takes 12-25 minutes (rate-limited). The local version completes in 25-50 seconds.
Troubleshooting Common Issues
Model too slow
Symptom: Responses take 5+ seconds for simple tasks.
Fixes:
- Check VPS CPU usage — if it's maxed at 100%, you need more vCPUs or a beefier machine
- Use Q4_K_M quantization instead of Q8 — half the memory, 30% faster
- Reduce num_ctx if your inputs are short — a 2048 context window is faster than 4096
- Make sure no other resource-heavy processes are running on the same server
Quality drop compared to OpenAI
Symptom: Outputs are noticeably worse than GPT-4 was producing.
Fixes:
- More training data. The most common fix. Go from 200 to 500+ examples.
- Cleaner training data. Remove any examples where OpenAI's output was wrong or inconsistent.
- More representative data. If certain categories or input types are underrepresented, add more examples of those specifically.
- Higher LoRA rank. If you used rank 8 or 16, try 32. This gives the model more capacity to learn your task.
- Try a different base model. If Mistral 7B isn't cutting it, try Qwen 2.5 7B or Llama 3.1 8B. Different base models have different strengths.
Context length errors
Symptom: Model returns garbage or errors on longer inputs.
Fixes:
- Increase num_ctx in your Modelfile (e.g., from 4096 to 8192). Note: larger context uses more RAM. A 7B model with 8K context needs ~6GB RAM.
- If your inputs are regularly over 4K tokens, consider truncating or summarizing the input before sending it to the model (see the truncation sketch after this list)
- For very long inputs (8K+ tokens), consider a two-stage approach: summarize first, then classify/extract from the summary
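A rough truncation guard for such a Code node, using the common rule of thumb of ~4 characters per token (not an exact tokenizer count); email_body stands in for your input field:
// Rough guard against overflowing the model's context window.
// ~4 characters per token is a rule of thumb, not an exact count.
const MAX_TOKENS = 3500;            // leave headroom below num_ctx 4096
const MAX_CHARS = MAX_TOKENS * 4;

return $input.all().map(item => ({
  json: {
    ...item.json,
    email_body: (item.json.email_body || '').slice(0, MAX_CHARS)
  }
}));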
Ollama not responding
Symptom: n8n gets connection refused or timeout errors.
Fixes:
- Verify Ollama is running: systemctl status ollama or ollama list
- Check the port: curl http://localhost:11434/api/tags should return a JSON response
- If n8n is on a different machine, make sure Ollama is bound to 0.0.0.0: set OLLAMA_HOST=0.0.0.0 in the Ollama environment config
- Check firewall rules: port 11434 must be accessible from the n8n machine
- Check RAM: if the server ran out of memory, Ollama may have crashed. dmesg | grep -i oom will show out-of-memory kills.
Inconsistent output format
Symptom: Model sometimes returns "billing_question" and sometimes "Billing Question" or "The category is billing_question."
Fixes:
- Add a post-processing step in n8n (Code node) that normalizes the output: lowercase, trim whitespace, strip prefixes (see the sketch after this list)
- Improve training data consistency — make sure all your training examples use the exact same format
- Lower temperature to 0.05 (almost deterministic)
- Add a system prompt that explicitly specifies the output format
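A minimal normalizer for that post-processing Code node. The label list is this tutorial's example categories; substitute your own:
// Normalize the raw model output into one of the known labels
const LABELS = ['cancellation', 'shipping_inquiry', 'defect_report', 'billing_question', 'general_inquiry'];

const raw = $input.first().json.classification || '';
// Lowercase, trim, and turn spaces into underscores ("Billing Question" → "billing_question")
const cleaned = raw.toLowerCase().trim().replace(/\s+/g, '_');
// Strip chatty prefixes like "the_category_is_billing_question"
const label = LABELS.find(l => cleaned.endsWith(l));

// Fall back to a catch-all label (or route unmatched items to manual review)
return [{ json: { classification: label || 'general_inquiry', raw } }];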
Going Live
Once you've validated quality and resolved any issues:
- Remove the A/B split — route 100% of traffic to the local model
- Keep the OpenAI node disabled (not deleted) as a fallback for the first week
- Monitor for 7 days — check outputs daily, compare error rates
- After 7 days: If everything looks good, delete the OpenAI node and remove the API key from n8n credentials
- Set up retraining schedule — every 4-8 weeks, export new execution data and retrain the model on the expanded dataset
Your n8n workflows now run with zero API costs. Every execution is free. Scale to 10x the volume and your bill stays exactly the same: $14.50 for Ertas plus $30 for your VPS.
That's $44.50/month for unlimited AI automation. No tokens. No rate limits. No surprise invoices.
Further Reading
- n8n to Fine-Tuned Model: The Agency Playbook — How agencies are productizing local AI for their n8n automation clients.
- Fine-Tune a Model for Your App — The general guide to fine-tuning for any application, not just n8n.
- GGUF Format Explained — Everything you need to know about the GGUF format and why it matters for local deployment.
Ship AI that runs on your users' devices.
Early bird pricing starts at $14.50/mo — locked in for life. Plans for builders and agencies.