
n8n + Ollama + Fine-Tuned Models: The Zero-API-Cost Automation Stack
Build powerful AI automations in n8n that cost nothing per execution. This guide shows you how to replace every OpenAI node with a locally-running fine-tuned model via Ollama.
n8n is one of the best tools in the automation world. Self-hosted, open source, endlessly flexible. You can wire up hundreds of workflows that move data between apps, process documents, classify inputs, and generate outputs — all without writing a traditional backend.
But the moment you add an AI node to an n8n workflow, you introduce a variable cost that scales with every execution. Each time that workflow fires — each email classified, each document summarized, each lead scored — you're paying OpenAI for tokens.
At 100 executions per day, it's negligible. At 1,000 per day, you start noticing. At 10,000+ per day, it's your single largest operational expense after hosting.
This guide shows you how to eliminate that cost entirely by replacing every OpenAI node in your n8n workflows with a locally-running fine-tuned model served through Ollama.
The Hidden Cost of AI Nodes in n8n
Let's start with what's actually happening behind the scenes when you use an OpenAI node in n8n.
Every time an n8n workflow with an AI node executes, it sends a request to the OpenAI API. That request includes your system prompt, the input data from the previous node, and any context you've injected. OpenAI processes it, returns the response, and charges you based on token count.
Here's what that costs for common automation patterns:
| Workflow Type | Avg Input Tokens | Avg Output Tokens | Cost Per Execution | 1K Runs/Day (Monthly) |
|---|---|---|---|---|
| Email classification | 800 | 50 | $0.027 | $810/mo |
| Document summarization | 2,500 | 400 | $0.099 | $2,970/mo |
| Lead scoring | 600 | 100 | $0.024 | $720/mo |
| Support ticket routing | 1,000 | 80 | $0.035 | $1,050/mo |
| Invoice data extraction | 1,500 | 200 | $0.057 | $1,710/mo |
Those numbers assume GPT-4-level pricing ($30/1M input, $60/1M output). Even if you use GPT-3.5-turbo at 10x lower prices, running 1,000 executions per day still costs $70-300/month depending on the workflow.
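If you want to sanity-check those numbers against your own workflows, the per-execution math is just token counts times rates:

```javascript
// Cost per execution at GPT-4-level pricing: $30/1M input tokens, $60/1M output tokens
const costPerExecution = (inputTokens, outputTokens) =>
  (inputTokens * 30 + outputTokens * 60) / 1_000_000;

// Email classification row from the table above: 800 in, 50 out
const perRun = costPerExecution(800, 50);          // 0.027
console.log((perRun * 1000 * 30).toFixed(0));      // ~810 ($/mo at 1,000 runs/day)
```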
And most n8n power users don't run just one workflow. They run dozens. An agency managing client automations might have 50+ workflows with AI nodes across all their clients. The cost compounds fast.
The worst part: most of these tasks are narrow and repetitive. Classifying emails into 5 categories. Extracting names and dates from invoices. Scoring leads as hot/warm/cold. These aren't tasks that require GPT-4's knowledge of Shakespearean sonnets. They're pattern-matching tasks that a small, specialized model handles perfectly.
The Stack: n8n + Ollama + Fine-Tuned Models
Here's the architecture we're building:
```
┌─────────────┐      ┌─────────────┐      ┌──────────────────┐
│     n8n     │────▶│   Ollama    │────▶│ Fine-Tuned Model │
│  Workflow   │◀────│   Server    │◀────│  (GGUF format)   │
└─────────────┘      └─────────────┘      └──────────────────┘
       │                    │
       │  HTTP Request      │  Local inference
       │  (localhost:11434) │  (zero API cost)
```
n8n orchestrates your workflows — triggers, data routing, transformations, output actions. This doesn't change.
Ollama runs on the same server (or a nearby VPS) and serves your fine-tuned model through an API endpoint that's compatible with the OpenAI format. n8n talks to it the same way it talks to OpenAI — just at a different URL.
Your fine-tuned model is a 7B-parameter model trained on your specific workflow data. It knows how to do the exact tasks your automations need, and nothing else. It runs on CPU (no GPU required for 7B models at automation-scale throughput).
The result: every execution costs $0 in API fees. Your only costs are the VPS ($30/month) and the fine-tuning platform ($14.50/month for Ertas).
Step 1: Identify Your AI Workflows
Start by auditing your n8n instance. Open the workflow list and look for any workflow that contains:
- OpenAI node (the most obvious one)
- AI Agent node with an OpenAI model configured
- HTTP Request node pointing at api.openai.com
- LangChain nodes using OpenAI as the LLM provider
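If you're running dozens of workflows, clicking through each one gets slow. Here's a rough sketch of the same audit against the n8n REST API (Node 18+ for global fetch); the node type strings vary by n8n version, so treat the filter as a starting point, not a complete list:

```javascript
// List workflows whose nodes reference OpenAI (sketch; adjust URL and filter to your setup)
const N8N_URL = "http://localhost:5678";      // your n8n instance
const API_KEY = process.env.N8N_API_KEY;      // created under Settings -> API

const res = await fetch(`${N8N_URL}/api/v1/workflows`, {
  headers: { "X-N8N-API-KEY": API_KEY },
});
const { data: workflows } = await res.json();

for (const wf of workflows) {
  const aiNodes = (wf.nodes ?? []).filter(
    (n) =>
      /openai/i.test(n.type) ||
      /api\.openai\.com/i.test(JSON.stringify(n.parameters ?? {}))
  );
  if (aiNodes.length) console.log(wf.name, "->", aiNodes.map((n) => n.name));
}
```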
For each workflow, note:
- What task the AI is performing (classification, extraction, generation, summarization)
- How many times it executes per day/week
- The system prompt being used
- The typical input and output format
The best candidates for replacement are workflows where:
- The task is narrow and well-defined (classify into N categories, extract specific fields, generate from a template)
- The workflow runs frequently (100+ times per day)
- The output format is predictable (JSON, short text, category labels)
Common high-value targets:
- Email triage and classification
- Lead scoring and routing
- Invoice and receipt data extraction
- Support ticket categorization
- Content moderation
- Sentiment analysis
Step 2: Collect Training Data From Your Workflows
This is the step most guides skip, and it's the most important one. Your fine-tuned model is only as good as the data you train it on. The good news: n8n has already been generating your training data.
Every time your n8n workflow executes, it logs the input and output. These execution logs are your training dataset.
Here's how to extract them:
Option A: n8n Execution History (UI)
- Open the workflow you want to replace
- Click "Executions" in the left sidebar
- Filter for successful executions
- For each execution, click to view the data at the OpenAI node
- Copy the input (what was sent to OpenAI) and output (what OpenAI returned)
This works for small datasets (under 200 examples) but gets tedious at scale.
Option B: n8n API (Programmatic)
n8n has a REST API. You can pull execution data programmatically:
```
GET /api/v1/executions?workflowId={id}&status=success&limit=500
```
For each execution, extract the data at the AI node and format it as a training pair:
```json
{
  "input": "The system prompt + user input that was sent to OpenAI",
  "output": "The response OpenAI returned"
}
```
Option C: Add a Logging Node
If you want to start collecting data going forward, add a Function node right after your OpenAI node that writes each input/output pair to a Google Sheet, Airtable, or a JSON file. After a few weeks, you'll have a clean dataset.
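In current n8n versions that logging step is a Code node. A sketch, where "Format Prompt" and the field paths are placeholders to swap for your own workflow's names:

```javascript
// Code node placed right after the AI node (mode: run once for all items).
// "Format Prompt" stands in for whatever node feeds your AI node.
const sent = $("Format Prompt").all();  // items that went into the AI node
const got = $input.all();               // the AI node's responses
return got.map((item, i) => ({
  json: {
    input: sent[i]?.json?.text ?? "",                             // adapt to your fields
    output: item.json?.message?.content ?? item.json?.text ?? "", // adapt to your AI node
    logged_at: new Date().toISOString(),
  },
}));
```

Route this node's output into your Google Sheets, Airtable, or file-writing node.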
How much data do you need?
| Task Type | Minimum Examples | Recommended |
|---|---|---|
| Binary classification | 100 | 300+ |
| Multi-class classification (5-10 classes) | 200 | 500+ |
| Data extraction | 200 | 500+ |
| Short text generation | 300 | 800+ |
| Summarization | 300 | 1,000+ |
Quality matters more than quantity. 300 clean, representative examples will outperform 3,000 noisy ones.
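Whichever option you use, the goal is a JSONL file with one example per line. An illustrative pair for an email classification task (the prompt and categories here are made up):

```json
{"input": "Classify this email into one of: billing, support, sales, spam, other.\n\nSubject: Can't log in after password reset...", "output": "support"}
{"input": "Classify this email into one of: billing, support, sales, spam, other.\n\nSubject: Invoice #4821 is overdue...", "output": "billing"}
```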
Step 3: Fine-Tune With Ertas
Now you've got a dataset. Time to build your model.
- Sign up for Ertas at ertas.io. The platform costs $14.50/month and includes everything you need for fine-tuning.
- Upload your dataset. Ertas accepts JSONL (one JSON object per line with "input" and "output" fields), CSV (two columns: input and output), or you can paste data directly into the Studio interface.
- Select a base model. For automation tasks, we recommend:
  - Qwen 2.5 7B — best all-around for classification and extraction
  - Llama 3.1 8B — strong for generation tasks and longer outputs
  - Mistral 7B — fast inference, good for high-throughput workflows
- Configure and train. Ertas auto-selects LoRA rank, learning rate, and epoch count based on your dataset. You can adjust these if you know what you're doing, but the defaults work well for 90% of use cases. Click "Start Training" and wait — typically 15-45 minutes depending on dataset size.
- Evaluate. Ertas runs your model against a held-out test set and shows you accuracy metrics. For classification tasks, you'll see precision, recall, and F1 scores. For generation tasks, you'll see sample outputs compared against the expected outputs.
- Export to GGUF. Click "Export" and select GGUF format with Q4_K_M quantization (the best balance of quality and file size for 7B models). Download the file — it'll be 4-5GB.
Step 4: Deploy With Ollama
Ollama is the bridge between your fine-tuned model and n8n. It serves your GGUF model through a local API that's compatible with the OpenAI format.
Install Ollama on your VPS:
```
curl -fsSL https://ollama.com/install.sh | sh
```
Create a Modelfile for your fine-tuned model:
```
FROM /path/to/your-model.gguf
PARAMETER temperature 0.1
PARAMETER num_ctx 4096
```
The low temperature (0.1) is important for automation tasks — you want consistent, deterministic outputs, not creative variation.
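Modelfiles also accept a SYSTEM directive, which is a handy place to bake in the system prompt you were previously sending with every API call. Appended to the Modelfile above (the prompt text is illustrative):

```
SYSTEM """Classify the email into exactly one of: billing, support, sales, spam, other."""
```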
Create and run the model:
```
ollama create my-workflow-model -f Modelfile
ollama run my-workflow-model "Test input here"
```
Verify the API is accessible:
```
curl http://localhost:11434/api/generate -d '{
  "model": "my-workflow-model",
  "prompt": "Test input",
  "stream": false
}'
```
For production, make sure Ollama starts on boot and is accessible from your n8n instance. If n8n and Ollama are on the same server, localhost works. If they're on different servers, configure Ollama to bind to 0.0.0.0 and secure the connection.
VPS Sizing:
| Model Size | Minimum VPS | Recommended VPS | Monthly Cost |
|---|---|---|---|
| 7B (Q4) | 4 vCPU, 8GB RAM | 4 vCPU, 16GB RAM | $20-30/mo |
| 13B (Q4) | 8 vCPU, 16GB RAM | 8 vCPU, 32GB RAM | $40-60/mo |
A $30/month VPS from Hetzner or DigitalOcean handles a 7B model comfortably, processing 10-20 requests per second for short classification/extraction tasks.
Step 5: Update n8n Nodes
Now wire it together. For each workflow where you're replacing the OpenAI node:
Option A: Replace OpenAI node with HTTP Request node
- Delete (or disable) the OpenAI node
- Add an HTTP Request node
- Configure it:
  - Method: POST
  - URL: http://localhost:11434/api/chat
  - Body (JSON):

```json
{
  "model": "my-workflow-model",
  "messages": [
    {
      "role": "system",
      "content": "Your system prompt here"
    },
    {
      "role": "user",
      "content": "{{ $json.input_field }}"
    }
  ],
  "stream": false
}
```
- Add a Function node after the HTTP Request to extract the response:

```javascript
// Ollama's /api/chat with "stream": false returns { message: { content: "..." } }
const response = $input.first().json;
return [{
  json: {
    result: response.message.content
  }
}];
```
Option B: Use the Ollama node (if available)
Recent versions of n8n include a native Ollama node in the AI nodes section. Configure it with:
- Base URL: http://localhost:11434
- Model: my-workflow-model
This is simpler but gives you less control over parameters.
Test thoroughly. Run 20-30 real executions through the updated workflow and compare the outputs against what OpenAI was producing. For classification tasks, check that the categories match. For extraction tasks, verify all fields are captured correctly.
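One lightweight way to run that comparison is to replay logged inputs from Step 2 against the local model and count agreements. A sketch assuming a training.jsonl in the format above; exact-match scoring only makes sense for classification-style outputs:

```javascript
// Replay logged inputs against the local model and measure agreement (sketch)
import { readFileSync } from "node:fs";

const pairs = readFileSync("training.jsonl", "utf8")
  .trim().split("\n").map(JSON.parse);
const sample = pairs.slice(0, 30);   // spot-check 30 real examples

let matches = 0;
for (const { input, output } of sample) {
  const res = await fetch("http://localhost:11434/api/chat", {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({
      model: "my-workflow-model",
      messages: [{ role: "user", content: input }],
      stream: false,
    }),
  });
  const local = (await res.json()).message.content.trim();
  if (local === output.trim()) matches++;   // exact match: fine for category labels
}
console.log(`Agreement: ${matches}/${sample.length}`);
```

For generation or summarization workflows, swap the exact-match check for a manual review of the mismatches.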
Cost Comparison
Here's the math for an agency running multiple AI workflows:
| Monthly Executions | OpenAI API Cost | Local Fine-Tuned Cost | Savings |
|---|---|---|---|
| 10,000 | $270 - $990 | $44.50 | 84-95% |
| 50,000 | $1,350 - $4,950 | $44.50 | 97-99% |
| 100,000 | $2,700 - $9,900 | $44.50 | 98-99.5% |
| 500,000 | $13,500 - $49,500 | $44.50* | 99.7%+ |
*At 500K+ monthly executions, you may need a beefier VPS ($60-100/month) for throughput. Still a rounding error compared to API costs.
The local cost column is $14.50/month for Ertas + $30/month for the VPS. That's it. No per-execution fee. No per-token charge. No surprise bill at the end of the month.
For agencies managing automations across multiple clients, this is transformative. Instead of passing API costs through to clients (and watching them churn when the bills climb), you offer a flat-rate service with near-zero marginal cost per execution.
When This Stack Works Best
Not every AI task should be fine-tuned. Here's where the n8n + Ollama + fine-tuned stack delivers the biggest wins:
Classification tasks — Email routing, ticket categorization, sentiment analysis, lead scoring. These are the sweet spot. The task is well-defined, the output format is constrained, and a fine-tuned 7B model typically matches or exceeds GPT-4 accuracy.
Data extraction — Pulling structured data from invoices, receipts, forms, emails. Fine-tuned models excel here because they learn your specific schema and field names.
Templated generation — Drafting responses from templates, generating product descriptions from specs, writing follow-up emails based on meeting notes. The output follows a predictable pattern that a small model learns quickly.
Summarization — Condensing documents, emails, or transcripts into key points. Fine-tuned models produce summaries that match your preferred style and length.
Where to keep using APIs:
- Complex multi-step reasoning across diverse domains
- Tasks requiring up-to-date information (news, current events)
- One-off creative tasks where consistency doesn't matter
- Workflows with fewer than 100 monthly executions (the cost savings don't justify the setup)
The 80/20 rule applies: 80% of your AI automation spend probably comes from 20% of your workflows. Target those high-volume, narrow-task workflows first and you'll capture most of the savings immediately.
Scaling the Stack
As your automation volume grows, here's how the stack scales:
10K-50K executions/month: Single VPS with one model handles this easily. A 7B model on a 4 vCPU / 16GB RAM VPS can process 15-20 requests per second for short tasks.
50K-200K executions/month: You might need a slightly beefier VPS (8 vCPU / 32GB RAM, ~$50/month) or optimize with model batching. Still dramatically cheaper than API costs.
200K+ executions/month: Consider running multiple model instances behind a simple load balancer. Two $30 VPS instances give you redundancy and double throughput. Your total infra cost is $75/month compared to $5,000+ in API costs.
Multiple models for different tasks: You can run several fine-tuned models on the same Ollama instance. One for email classification, another for data extraction, a third for summarization. Each model gets loaded into memory when called and unloaded when idle. A 16GB RAM VPS can serve 2-3 7B models concurrently.
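Which model serves a given workflow is just the model field in the request body, and Ollama's keep_alive parameter lets you hint how long a model should stay resident after a request. For example, a high-traffic classifier's request body might look like this:

```json
{
  "model": "email-classifier",
  "messages": [{ "role": "user", "content": "{{ $json.email_body }}" }],
  "stream": false,
  "keep_alive": "10m"
}
```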
Getting Started Today
Here's the minimal path to your first zero-cost AI workflow:
- Pick your highest-volume AI workflow in n8n — the one that executes most frequently
- Export 300+ execution examples as input/output pairs
- Fine-tune on Ertas — upload data, pick Qwen 2.5 7B, train, export GGUF
- Deploy on Ollama — install on your VPS, load the model, verify the endpoint
- Swap the node — replace the OpenAI node with an HTTP Request pointing at Ollama
- Monitor for a week — compare outputs and confirm quality matches
Once you've validated the first workflow, repeat for every AI workflow in your n8n instance. Most of the work is in Step 2 (collecting the data). The fine-tuning and deployment become routine after you've done it once.
Your n8n automations shouldn't have a per-execution tax. Build the stack once, run it forever.
Ship AI that runs on your users' devices.
Ertas early bird pricing starts at $14.50/mo — locked in for life. Plans for builders and agencies.
Further Reading
- n8n + Local LLMs: HIPAA-Compliant AI Automation — How local models solve compliance requirements for healthcare and legal automations.
- Running AI Models Locally: A Practical Guide — Everything you need to know about Ollama, GGUF, and local deployment.
- LM Studio vs Ollama: Which Local AI Runtime Should You Use? — A head-to-head comparison of the two most popular local model runtimes.