
From $500/Month OpenAI Bills to $0: Migrating n8n Workflows to Local Models
A practical migration guide for n8n users spending hundreds on OpenAI API calls. Move your workflows to local fine-tuned models without breaking anything.
You started with one n8n workflow using OpenAI. A simple one — maybe it classified incoming emails or extracted data from form submissions. The API call cost fractions of a penny per execution. Barely noticeable. So you built five more workflows. Then ten. Then you added GPT-4 to the ones that needed better reasoning. Then your colleague saw what you built and asked for three more.
Now you are staring at a $500/month OpenAI bill. And it is climbing.
Here is the thing: most of those workflows do not need GPT-4. They do not even need GPT-3.5. They need a model that is really good at one specific task — classifying, extracting, reformatting, summarizing — and that is exactly what a fine-tuned 7B model does. The migration from OpenAI API calls to local fine-tuned models is not as scary as it sounds, and the cost savings are dramatic: from hundreds of dollars per month to literally zero in per-token costs.
This guide walks through the entire migration, step by step. We will audit your workflows, prioritize what to migrate, fine-tune models for each workflow type, deploy them with Ollama, and swap the endpoints in n8n without breaking anything.
The Migration Audit
Before you migrate anything, you need to know what you are working with. The goal of the audit is to inventory every n8n workflow that uses an AI node, categorize each one by complexity and volume, and identify the quick wins.
Step 1: List every workflow with an AI node. In n8n, go to your workflow list and search for workflows containing OpenAI nodes (or any AI/LLM node). For each workflow, document:
- Workflow name and purpose
- Which model it uses (GPT-4, GPT-4o, GPT-3.5-turbo)
- Approximate executions per day
- Average input token count per execution
- Average output token count per execution
- Whether it uses structured output (JSON mode, function calling)
Step 2: Categorize by task type. Most AI-powered n8n workflows fall into these buckets:
| Task Type | Examples | Complexity | Migration Difficulty |
|---|---|---|---|
| Classification | Email routing, ticket categorization, sentiment analysis | Low | Easy |
| Extraction | Pull names/dates/amounts from text, parse invoices | Low-Medium | Easy |
| Reformatting | Convert prose to bullet points, standardize formats | Low | Easy |
| Summarization | Summarize emails, meeting notes, documents | Medium | Moderate |
| Generation | Write email replies, create descriptions, draft content | Medium-High | Moderate |
| Reasoning | Multi-step analysis, decision-making, complex Q&A | High | Hard |
| Code generation | Write SQL queries, generate scripts | High | Hard |
Step 3: Calculate per-workflow costs. Multiply each workflow's daily executions by its token usage and the model's per-token rate. Here is a quick reference:
| Model | Input Cost (per 1M tokens) | Output Cost (per 1M tokens) |
|---|---|---|
| GPT-4o | $2.50 | $10.00 |
| GPT-4 | $30.00 | $60.00 |
| GPT-3.5-turbo | $0.50 | $1.50 |
A workflow running 500 times/day with 800 input tokens and 200 output tokens on GPT-4o costs:
- Input: 500 * 800 = 400K tokens/day = 12M tokens/month = $30/month
- Output: 500 * 200 = 100K tokens/day = 3M tokens/month = $30/month
- Total: $60/month for one workflow
Multiply that across 10-15 workflows and you see how $500/month happens fast.
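The same arithmetic is easy to script if a spreadsheet gets unwieldy. A minimal sketch in JavaScript, using the per-token rates from the table above (the helper name and the 30-day month are my assumptions):

```javascript
// Per-million-token rates (USD), from the pricing table above
const rates = {
  "gpt-4o": { input: 2.5, output: 10.0 },
  "gpt-4": { input: 30.0, output: 60.0 },
  "gpt-3.5-turbo": { input: 0.5, output: 1.5 },
};

// Approximate monthly cost of one workflow, assuming a 30-day month
function monthlyCost({ model, execsPerDay, inputTokens, outputTokens }) {
  const inputMTok = (execsPerDay * inputTokens * 30) / 1_000_000;
  const outputMTok = (execsPerDay * outputTokens * 30) / 1_000_000;
  return inputMTok * rates[model].input + outputMTok * rates[model].output;
}

// The worked example above: 500 execs/day, 800 in / 200 out, GPT-4o
console.log(monthlyCost({ model: "gpt-4o", execsPerDay: 500, inputTokens: 800, outputTokens: 200 })); // 60
```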
Which Workflows to Migrate First
Not all workflows are equal candidates for migration. The ideal first targets are:
High volume, low complexity. A workflow that classifies 2,000 emails per day into 5 categories is perfect. It has a clear input-output pattern, high volume (so high savings), and low complexity (a fine-tuned 3B model can handle it easily).
Structured output. Workflows that expect JSON output — like extracting fields from invoices or parsing form data — are excellent candidates. The output format is constrained and predictable, which makes fine-tuning straightforward and evaluation simple. Either the JSON is correct or it is not.
Repetitive patterns. If a workflow does essentially the same transformation thousands of times with only the input data changing, fine-tuning works beautifully. The model just needs to learn the pattern once.
Workflows NOT to migrate first:
- Anything requiring up-to-date world knowledge (the fine-tuned model knows what it was trained on, nothing more)
- Multi-step reasoning chains where earlier outputs feed into later prompts
- Creative generation where quality is subjective and hard to evaluate
- Anything with safety-critical consequences (medical, legal, financial advice)
Here is a prioritization framework:
| Priority | Criteria | Expected Savings |
|---|---|---|
| P0 — Migrate immediately | Classification, extraction, reformatting; >100 executions/day | 90-100% cost reduction |
| P1 — Migrate next | Summarization, simple generation; >50 executions/day | 85-95% cost reduction |
| P2 — Evaluate carefully | Complex generation, multi-step reasoning; any volume | 70-90% cost reduction |
| P3 — Keep on API | Safety-critical, requires world knowledge, highly variable tasks | 0% (stay on API) |
The Migration Framework
The migration follows four phases. Do not skip phases. Do not rush.
Phase 1: Export Execution Data
For each workflow you are migrating, you need the actual input-output pairs from real executions. This is your training data.
From n8n execution logs: n8n stores execution data for every workflow run. You can access this through the n8n API or directly from the database if you are self-hosting. For each execution of an AI node, extract:
- The prompt/input that was sent to OpenAI
- The response/output that was received
- Whether the workflow completed successfully (filter out failures)
Export script approach:
```javascript
// Sketch for extracting training pairs from n8n executions.
// Assumes a client wrapper (n8nApi) around the n8n REST API;
// the exact runData path varies by node type and version.
import fs from "node:fs";

const executions = await n8nApi.getExecutions({
  workflowId: "your-workflow-id",
  status: "success", // skip failed runs
  limit: 5000,
});

const trainingData = executions
  .map((exec) => {
    const runs = exec.data.resultData.runData["OpenAI Node"];
    if (!runs?.[0]) return null; // the AI node never ran in this execution
    return {
      input: runs[0].parameters.prompt,
      output: runs[0].data.main[0][0].json.text,
    };
  })
  .filter((pair) => pair?.input && pair?.output); // drop incomplete pairs

// Write one JSON object per line (JSONL)
const jsonl = trainingData.map((d) => JSON.stringify(d)).join("\n");
fs.writeFileSync("training-data.jsonl", jsonl);
```
How much data do you need per workflow type?
| Workflow Complexity | Minimum Examples | Recommended | Diminishing Returns After |
|---|---|---|---|
| Classification (5-10 classes) | 200 | 500 | 2,000 |
| Data extraction | 300 | 800 | 3,000 |
| Reformatting | 200 | 500 | 1,500 |
| Summarization | 500 | 1,500 | 5,000 |
| Content generation | 800 | 2,000 | 5,000+ |
For most n8n workflows, two to four weeks of execution logs provide more than enough training data.
Phase 2: Fine-Tune Per Workflow
Now the question: do you train one model for all workflows or one model per workflow type?
One model per workflow type is almost always the right choice. Here is why:
- Each model can be small and fast (3B-7B parameters) because it only needs to handle one task
- Quality is higher because the model is not confused by competing task patterns
- You can update each model independently when requirements change
- If one model underperforms, you only retrain that one — not everything
The fine-tuning process with Ertas:
- Upload your JSONL training file to Ertas (example format below)
- Select the base model:
  - Qwen 2.5 3B for simple classification and extraction (runs on 4GB RAM)
  - Qwen 2.5 7B for summarization and generation (runs on 8GB RAM)
- Configure LoRA training (defaults are fine for most workflows)
- Train — a 500-example run takes about 15 minutes on a 3B model and about 30 minutes on a 7B
- Evaluate against held-out test examples
- Export as GGUF
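What does one training example look like? Using the input/output pairs from the export script in Phase 1, a JSONL line for the email classifier might look like this (the email text and category labels are invented for illustration; yours will differ):

```
{"input": "Subject: Invoice overdue\n\nHi, our payment for order #1042 still shows as pending...", "output": "billing"}
{"input": "Subject: Cannot log in\n\nI reset my password twice and still get an error...", "output": "technical-support"}
```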
Cost for fine-tuning: Ertas at $14.50/month includes unlimited training runs. If you are migrating 10 workflows and training one model per workflow type (after deduplicating similar workflows), you might need 5-7 training runs. All included.
Phase 3: Deploy and Test
Deploy all your fine-tuned models on a single Ollama instance. Ollama handles multiple models efficiently — it loads the active model into memory and can swap between models in seconds.
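If you want control over how long models stay resident in RAM, recent Ollama versions expose this through environment variables. A sketch as a systemd drop-in (the specific values here are assumptions to tune for your workload):

```
# /etc/systemd/system/ollama.service.d/override.conf
[Service]
# How many models may sit in RAM at once
Environment="OLLAMA_MAX_LOADED_MODELS=2"
# Keep a model loaded for 30 minutes after its last request
Environment="OLLAMA_KEEP_ALIVE=30m"
```

After editing, `systemctl daemon-reload` and `systemctl restart ollama` apply the change.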
Deployment setup:
```bash
# Install Ollama on your VPS
curl -fsSL https://ollama.com/install.sh | sh

# Create each model from its Modelfile
ollama create email-classifier -f Modelfile.email-classifier
ollama create invoice-extractor -f Modelfile.invoice-extractor
ollama create ticket-summarizer -f Modelfile.ticket-summarizer
```
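Each Modelfile points Ollama at your exported GGUF and bakes in the task prompt so your n8n workflows do not have to resend it. A minimal sketch for the email classifier, assuming your export is named email-classifier.gguf and a five-category scheme (adjust both to match your training data):

```
# Modelfile.email-classifier
FROM ./email-classifier.gguf

# Deterministic output is what you want for classification
PARAMETER temperature 0

SYSTEM "Classify the incoming email into exactly one category: billing, technical-support, sales, spam, or other. Respond with the category name only."
```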
VPS sizing for multiple models:
| Number of Models | Active Concurrently | Recommended VPS | Monthly Cost |
|---|---|---|---|
| 1-3 (3B models) | 1 | 4 vCPU, 8GB RAM | ~$14/mo |
| 3-5 (mix 3B/7B) | 1-2 | 4 vCPU, 16GB RAM | ~$26/mo |
| 5-10 (mix 3B/7B) | 2-3 | 8 vCPU, 32GB RAM | ~$48/mo |
| 10+ or high concurrency | 3+ | 16 vCPU, 64GB RAM | ~$96/mo |
Even at the high end — $96/month for running 10+ models — you are paying a fraction of your $500/month OpenAI bill.
Parallel testing strategy: Before swapping anything in production, run your fine-tuned models in parallel with your existing OpenAI workflows for at least one week. Here is how:
- Clone each workflow you are migrating
- In the clone, replace the OpenAI node with an HTTP Request node pointing to your Ollama endpoint (see the request sketch after this list)
- Run both workflows simultaneously on the same triggers
- Compare outputs side by side
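The HTTP Request node in the clone calls Ollama's /api/generate endpoint. A minimal request sketch, assuming the email-classifier model from above and a field name (emailBody) that stands in for whatever your trigger actually provides:

```
POST http://your-vps:11434/api/generate
Content-Type: application/json

{
  "model": "email-classifier",
  "prompt": "Classify this email: {{ $json.emailBody }}",
  "stream": false
}
```

With "stream": false, Ollama returns a single JSON object, and the model's answer is in its response field, which the next node can read directly.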
Create a simple comparison spreadsheet:
| Input | OpenAI Output | Local Model Output | Match? | Notes |
|---|---|---|---|---|
| ... | ... | ... | Yes/No | ... |
You want at least 95% match rate for classification and extraction workflows. For generation and summarization, human judgment is needed — but the outputs should be functionally equivalent, not necessarily identical.
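If the spreadsheet gets tedious at volume, the same check is easy to script. A minimal sketch, assuming both runs logged their results as {input, output} JSONL files in the same order (exact matching only makes sense for classification and extraction):

```javascript
import fs from "node:fs";

// Load a JSONL file into an array of objects
const load = (path) =>
  fs.readFileSync(path, "utf8").trim().split("\n").map(JSON.parse);

const openai = load("openai-outputs.jsonl");
const local = load("local-outputs.jsonl");

// Exact-match rate across paired executions
const matches = openai.filter(
  (row, i) => row.output.trim() === local[i].output.trim()
).length;

console.log(`Match rate: ${((matches / openai.length) * 100).toFixed(1)}%`);
```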
Phase 4: Swap and Monitor
Once parallel testing confirms quality, swap the production workflows.
Gradual cutover approach:
- Week 1: Migrate P0 workflows (classification, extraction). Keep OpenAI as a fallback — if the local model returns an error or confidence is low, fall back to OpenAI.
- Week 2: If P0 is stable, remove the OpenAI fallback for P0 workflows. Migrate P1 workflows with fallback.
- Week 3: Remove fallback for P1. Evaluate P2 workflows.
- Week 4: Migrate or defer P2 based on evaluation results.
Fallback pattern in n8n:
```
Input → Local Model (Ollama) → IF (confidence > threshold) → Use local result
                               → ELSE → OpenAI API → Use API result
```
For classification workflows, you can implement confidence thresholds based on the model's output probabilities. For extraction and generation, use a simpler heuristic: if the local model returns a valid response within the expected format, use it. If it errors or returns malformed output, fall back.
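In n8n, that heuristic fits in a Code node ahead of the IF. A sketch for an extraction workflow, assuming Ollama's reply text arrives as $json.response and that requiredFields lists the keys your downstream nodes expect (both are assumptions; adapt to your schema):

```javascript
// Code node, "Run Once for Each Item" mode:
// decide whether to trust the local model's output
const raw = $json.response; // /api/generate puts the model's text in "response"
const requiredFields = ["name", "date", "amount"]; // hypothetical schema

let parsed = null;
try {
  parsed = JSON.parse(raw);
} catch (e) {
  // Malformed JSON: leave parsed as null so we fall back
}

const useLocal =
  parsed !== null && requiredFields.every((f) => parsed[f] !== undefined);

// The IF node downstream routes on useLocal:
// true -> use the local result, false -> call the OpenAI API
return { json: { ...$json, extracted: parsed, useLocal } };
```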
Monitoring checklist:
- Track error rates per workflow per day
- Compare execution times (local should be faster for most tasks)
- Log any fallback-to-API events and investigate why
- Monitor VPS resource utilization (CPU, RAM)
- Check output quality weekly by sampling 20-30 results per workflow
Migration Cost Calculator
Here is what the numbers look like for a typical migration:
Before: OpenAI API costs
| Workflow | Model | Executions/Day | Monthly Token Cost |
|---|---|---|---|
| Email classifier | GPT-4o | 800 | $45 |
| Invoice extractor | GPT-4o | 200 | $38 |
| Ticket summarizer | GPT-4 | 150 | $85 |
| Lead scorer | GPT-3.5 | 500 | $12 |
| Content reformatter | GPT-4o | 300 | $28 |
| Report generator | GPT-4 | 50 | $62 |
| Sentiment analyzer | GPT-3.5 | 1,000 | $18 |
| Data normalizer | GPT-4o | 400 | $32 |
| FAQ responder | GPT-4o | 250 | $55 |
| Email drafter | GPT-4 | 100 | $78 |
| Total | | 3,750/day | $453/mo |
After: Local fine-tuned models
| Cost Component | Monthly |
|---|---|
| Ollama VPS (8 vCPU, 32GB RAM, Hetzner) | $48 |
| Ertas subscription (unlimited training) | $14.50 |
| OpenAI API (P3 workflows kept on API) | $35 |
| Total | $97.50/mo |
Monthly savings: $355.50. Annual savings: $4,266.
And this is a conservative estimate. As workflow volumes grow, the API costs would have grown linearly while the local infrastructure costs stay flat. If your email classifier doubles to 1,600 executions/day, your local cost is still $48/month for the VPS. On OpenAI, that workflow alone would jump to $90/month.
What We Did Not Migrate (and Why)
Honesty matters. Not everything should move off the API. Here are the workflows we intentionally kept on OpenAI:
The report generator. This workflow takes 15 data points and generates a 2,000-word analysis with strategic recommendations. It requires genuine reasoning, synthesis of multiple data sources, and creative framing. A 7B model can handle the formatting, but the analytical quality drops noticeably compared to GPT-4. We kept it on the API. At 50 executions/day, the cost is manageable ($62/month), and the quality difference matters.
The email drafter. Similar to the report generator — it drafts complex, multi-paragraph emails that reference previous conversation history and require nuanced tone matching. A fine-tuned model handles simple replies well but struggles with the long-form, context-heavy drafts. We kept the complex drafts on GPT-4 and migrated simple reply templates to the local model, splitting the workflow in two.
Anything touching financial calculations. We have one workflow that takes raw transaction data and produces financial summaries with computed totals. The computation is done in n8n (not the LLM), but the LLM formats the final report. Even though the LLM is just formatting, the stakes are high enough that we kept it on GPT-4 with its lower hallucination rate for numerical tasks. Peace of mind is worth $35/month.
The pattern: keep tasks on the API when they require (a) genuine reasoning over novel inputs, (b) long-form creative generation, or (c) high-stakes accuracy where even a 2% error rate is unacceptable.
Results After 30 Days
Here is what actually happened after one month of running the migrated stack:
Cost reduction: 78%. From $453/month to $97.50/month. We expected to save more, but we kept three workflows on the API that we originally planned to migrate (see above). The savings are still $4,266/year.
Latency improvement: 40% faster on average. This surprised us. Local inference on a Hetzner VPS was consistently faster than OpenAI API calls, especially during peak hours. The email classifier went from 800ms average (OpenAI) to 320ms average (local Ollama). No network round-trip, no API queue, no rate limiting.
| Metric | OpenAI API | Local Ollama | Change |
|---|---|---|---|
| Avg response time (classification) | 800ms | 320ms | -60% |
| Avg response time (extraction) | 1,200ms | 650ms | -46% |
| Avg response time (summarization) | 2,500ms | 1,800ms | -28% |
| P99 response time (all) | 8,500ms | 2,100ms | -75% |
| Rate limit errors/day | 3-5 | 0 | -100% |
Quality metrics:
- Classification accuracy: 97.2% (local) vs 98.1% (OpenAI). Less than 1% difference.
- Extraction accuracy: 95.8% (local) vs 96.4% (OpenAI). Negligible difference.
- Summarization quality (human eval, 100 samples): 4.2/5 (local) vs 4.4/5 (OpenAI). Acceptable.
Reliability: Zero downtime on the Ollama VPS in 30 days. Zero rate limit errors. The OpenAI API, by comparison, had 3-5 rate limit errors per day during peak hours, each requiring retry logic and adding latency.
Surprise benefit: data privacy. With local inference, none of our workflow data leaves our infrastructure. For workflows processing customer emails, invoices, and support tickets, this is a significant compliance benefit we had not fully valued upfront.
The Migration Timeline
For a team with 10-15 n8n workflows and moderate technical comfort, here is a realistic timeline:
- Week 1: Audit all workflows, categorize, prioritize. Export training data.
- Week 2: Fine-tune models for P0 workflows on Ertas. Set up Ollama VPS.
- Week 3: Parallel testing for P0 workflows. Fine-tune P1 models.
- Week 4: Swap P0 to production. Start parallel testing P1.
- Week 5-6: Swap P1. Evaluate P2. Settle into steady state.
Six weeks from start to finish. The first cost savings hit in week 4. By week 6, you are running at full savings with confidence.
The $500/month OpenAI bill was not inevitable. It was a scaling artifact of using general-purpose models for specific tasks. Fine-tuned local models are the fix — and the migration is more straightforward than you think.
Ship AI that runs on your users' devices.
Ertas early bird pricing starts at $14.50/mo — locked in for life. Plans for builders and agencies.
Further Reading
- From API-Dependent to Model Owner: A 90-Day Migration Playbook — The complete phased plan for moving from API to owned models.
- n8n + Local LLMs for HIPAA-Compliant Automation — How local models solve the compliance problem for healthcare workflows.
- The n8n-to-Fine-Tuned-Model Agency Playbook — Productizing n8n migrations as an agency service.
- The Hidden Cost of Per-Token AI Pricing — Why per-token pricing is fundamentally misaligned with sustainable businesses.