
From $500/Month OpenAI Bills to $0: Migrating n8n Workflows to Local Models
A practical migration guide for n8n users spending hundreds on OpenAI API calls. Move your workflows to local fine-tuned models without breaking anything.
You started with one n8n workflow using OpenAI. A simple one — maybe it classified incoming emails or extracted data from form submissions. The API call cost fractions of a penny per execution. Barely noticeable. So you built five more workflows. Then ten. Then you added GPT-4 to the ones that needed better reasoning. Then your colleague saw what you built and asked for three more.
Now you are staring at a $500/month OpenAI bill. And it is climbing.
Here is the thing: most of those workflows do not need GPT-4. They do not even need GPT-3.5. They need a model that is really good at one specific task — classifying, extracting, reformatting, summarizing — and that is exactly what a fine-tuned 7B model does. The migration from OpenAI API calls to local fine-tuned models is not as scary as it sounds, and the cost savings are dramatic: from hundreds of dollars per month to literally zero in per-token costs.
This guide walks through the entire migration, step by step. We will audit your workflows, prioritize what to migrate, fine-tune models for each workflow type, deploy them with Ollama, and swap the endpoints in n8n without breaking anything.
The Migration Audit
Before you migrate anything, you need to know what you are working with. The goal of the audit is to inventory every n8n workflow that uses an AI node, categorize each one by complexity and volume, and identify the quick wins.
Step 1: List every workflow with an AI node. In n8n, go to your workflow list and search for workflows containing OpenAI nodes (or any AI/LLM node). For each workflow, document:
- Workflow name and purpose
- Which model it uses (GPT-4, GPT-4o, GPT-3.5-turbo)
- Approximate executions per day
- Average input token count per execution
- Average output token count per execution
- Whether it uses structured output (JSON mode, function calling)
Step 2: Categorize by task type. Most AI-powered n8n workflows fall into these buckets:
| Task Type | Examples | Complexity | Migration Difficulty |
|---|---|---|---|
| Classification | Email routing, ticket categorization, sentiment analysis | Low | Easy |
| Extraction | Pull names/dates/amounts from text, parse invoices | Low-Medium | Easy |
| Reformatting | Convert prose to bullet points, standardize formats | Low | Easy |
| Summarization | Summarize emails, meeting notes, documents | Medium | Moderate |
| Generation | Write email replies, create descriptions, draft content | Medium-High | Moderate |
| Reasoning | Multi-step analysis, decision-making, complex Q&A | High | Hard |
| Code generation | Write SQL queries, generate scripts | High | Hard |
Step 3: Calculate per-workflow costs. Multiply each workflow's daily executions by its token usage and the model's per-token rate. Here is a quick reference:
| Model | Input Cost (per 1M tokens) | Output Cost (per 1M tokens) |
|---|---|---|
| GPT-4o | $2.50 | $10.00 |
| GPT-4 | $30.00 | $60.00 |
| GPT-3.5-turbo | $0.50 | $1.50 |
A workflow running 500 times/day with 800 input tokens and 200 output tokens on GPT-4o costs:
- Input: 500 * 800 = 400K tokens/day = 12M tokens/month = $30/month
- Output: 500 * 200 = 100K tokens/day = 3M tokens/month = $30/month
- Total: $60/month for one workflow
Multiply that across 10-15 workflows and you see how $500/month happens fast.
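The same arithmetic is easy to script if a spreadsheet gets unwieldy. A minimal sketch in JavaScript, using the per-token rates from the table above (the helper name and the 30-day month are my assumptions):

```javascript
// Per-million-token rates (USD), from the pricing table above
const rates = {
  "gpt-4o": { input: 2.5, output: 10.0 },
  "gpt-4": { input: 30.0, output: 60.0 },
  "gpt-3.5-turbo": { input: 0.5, output: 1.5 },
};

// Approximate monthly cost of one workflow, assuming a 30-day month
function monthlyCost({ model, execsPerDay, inputTokens, outputTokens }) {
  const inputMTok = (execsPerDay * inputTokens * 30) / 1_000_000;
  const outputMTok = (execsPerDay * outputTokens * 30) / 1_000_000;
  return inputMTok * rates[model].input + outputMTok * rates[model].output;
}

// The worked example above: 500 execs/day, 800 in / 200 out, GPT-4o
console.log(monthlyCost({ model: "gpt-4o", execsPerDay: 500, inputTokens: 800, outputTokens: 200 })); // 60
```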
Which Workflows to Migrate First
Not all workflows are equal candidates for migration. The ideal first targets are:
High volume, low complexity. A workflow that classifies 2,000 emails per day into 5 categories is perfect. It has a clear input-output pattern, high volume (so high savings), and low complexity (a fine-tuned 3B model can handle it easily).
Structured output. Workflows that expect JSON output — like extracting fields from invoices or parsing form data — are excellent candidates. The output format is constrained and predictable, which makes fine-tuning straightforward and evaluation simple. Either the JSON is correct or it is not.
Repetitive patterns. If a workflow does essentially the same transformation thousands of times with only the input data changing, fine-tuning works beautifully. The model just needs to learn the pattern once.
Workflows NOT to migrate first:
- Anything requiring up-to-date world knowledge (the fine-tuned model knows what it was trained on, nothing more)
- Multi-step reasoning chains where earlier outputs feed into later prompts
- Creative generation where quality is subjective and hard to evaluate
- Anything with safety-critical consequences (medical, legal, financial advice)
Here is a prioritization framework:
| Priority | Criteria | Expected Savings |
|---|---|---|
| P0 — Migrate immediately | Classification, extraction, reformatting; >100 executions/day | 90-100% cost reduction |
| P1 — Migrate next | Summarization, simple generation; >50 executions/day | 85-95% cost reduction |
| P2 — Evaluate carefully | Complex generation, multi-step reasoning; any volume | 70-90% cost reduction |
| P3 — Keep on API | Safety-critical, requires world knowledge, highly variable tasks | 0% (stay on API) |
The Migration Framework
The migration follows four phases. Do not skip phases. Do not rush.
Phase 1: Export Execution Data
For each workflow you are migrating, you need the actual input-output pairs from real executions. This is your training data.
From n8n execution logs: n8n stores execution data for every workflow run. You can access this through the n8n API or directly from the database if you are self-hosting. For each execution of an AI node, extract:
- The prompt/input that was sent to OpenAI
- The response/output that was received
- Whether the workflow completed successfully (filter out failures)
Export script approach:
```javascript
// Sketch for extracting training pairs from n8n executions.
// Assumes a client wrapper (n8nApi) around the n8n REST API;
// the exact runData path varies by node type and version.
import fs from "node:fs";

const executions = await n8nApi.getExecutions({
  workflowId: "your-workflow-id",
  status: "success", // skip failed runs
  limit: 5000,
});

const trainingData = executions
  .map((exec) => {
    const runs = exec.data.resultData.runData["OpenAI Node"];
    if (!runs?.[0]) return null; // the AI node never ran in this execution
    return {
      input: runs[0].parameters.prompt,
      output: runs[0].data.main[0][0].json.text,
    };
  })
  .filter((pair) => pair?.input && pair?.output); // drop incomplete pairs

// Write one JSON object per line (JSONL)
const jsonl = trainingData.map((d) => JSON.stringify(d)).join("\n");
fs.writeFileSync("training-data.jsonl", jsonl);
```
How much data do you need per workflow type?
| Workflow Complexity | Minimum Examples | Recommended | Diminishing Returns After |
|---|---|---|---|
| Classification (5-10 classes) | 200 | 500 | 2,000 |
| Data extraction | 300 | 800 | 3,000 |
| Reformatting | 200 | 500 | 1,500 |
| Summarization | 500 | 1,500 | 5,000 |
| Content generation | 800 | 2,000 | 5,000+ |
For most n8n workflows, two to four weeks of execution logs provide more than enough training data.
Phase 2: Fine-Tune Per Workflow
Now the question: do you train one model for all workflows or one model per workflow type?
One model per workflow type is almost always the right choice. Here is why:
- Each model can be small and fast (3B-7B parameters) because it only needs to handle one task
- Quality is higher because the model is not confused by competing task patterns
- You can update each model independently when requirements change
- If one model underperforms, you only retrain that one — not everything
The fine-tuning process with Ertas:
- Upload your JSONL training file to Ertas (example format below)
- Select the base model:
  - Qwen 2.5 3B for simple classification and extraction (runs on 4GB RAM)
  - Qwen 2.5 7B for summarization and generation (runs on 8GB RAM)
- Configure LoRA training (defaults are fine for most workflows)
- Train — a 500-example run takes about 15 minutes on a 3B model and about 30 minutes on a 7B
- Evaluate against held-out test examples
- Export as GGUF
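What does one training example look like? Using the input/output pairs from the export script in Phase 1, a JSONL line for the email classifier might look like this (the email text and category labels are invented for illustration; yours will differ):

```
{"input": "Subject: Invoice overdue\n\nHi, our payment for order #1042 still shows as pending...", "output": "billing"}
{"input": "Subject: Cannot log in\n\nI reset my password twice and still get an error...", "output": "technical-support"}
```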
Cost for fine-tuning: Ertas at $14.50/month includes unlimited training runs. If you are migrating 10 workflows and training one model per workflow type (after deduplicating similar workflows), you might need 5-7 training runs. All included.
Phase 3: Deploy and Test
Deploy all your fine-tuned models on a single Ollama instance. Ollama handles multiple models efficiently — it loads the active model into memory and can swap between models in seconds.
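If you want control over how long models stay resident in RAM, recent Ollama versions expose this through environment variables. A sketch as a systemd drop-in (the specific values here are assumptions to tune for your workload):

```
# /etc/systemd/system/ollama.service.d/override.conf
[Service]
# How many models may sit in RAM at once
Environment="OLLAMA_MAX_LOADED_MODELS=2"
# Keep a model loaded for 30 minutes after its last request
Environment="OLLAMA_KEEP_ALIVE=30m"
```

After editing, `systemctl daemon-reload` and `systemctl restart ollama` apply the change.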
Deployment setup:
```bash
# Install Ollama on your VPS
curl -fsSL https://ollama.com/install.sh | sh

# Create each model from its Modelfile
ollama create email-classifier -f Modelfile.email-classifier
ollama create invoice-extractor -f Modelfile.invoice-extractor
ollama create ticket-summarizer -f Modelfile.ticket-summarizer
```
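Each Modelfile points Ollama at your exported GGUF and bakes in the task prompt so your n8n workflows do not have to resend it. A minimal sketch for the email classifier, assuming your export is named email-classifier.gguf and a five-category scheme (adjust both to match your training data):

```
# Modelfile.email-classifier
FROM ./email-classifier.gguf

# Deterministic output is what you want for classification
PARAMETER temperature 0

SYSTEM "Classify the incoming email into exactly one category: billing, technical-support, sales, spam, or other. Respond with the category name only."
```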
VPS sizing for multiple models:
| Number of Models | Active Concurrently | Recommended VPS | Monthly Cost |
|---|---|---|---|
| 1-3 (3B models) | 1 | 4 vCPU, 8GB RAM | ~$14/mo |
| 3-5 (mix 3B/7B) | 1-2 | 4 vCPU, 16GB RAM | ~$26/mo |
| 5-10 (mix 3B/7B) | 2-3 | 8 vCPU, 32GB RAM | ~$48/mo |
| 10+ or high concurrency | 3+ | 16 vCPU, 64GB RAM | ~$96/mo |
Even at the high end — $96/month for running 10+ models — you are paying a fraction of your $500/month OpenAI bill.
Parallel testing strategy: Before swapping anything in production, run your fine-tuned models in parallel with your existing OpenAI workflows for at least one week. Here is how:
- Clone each workflow you are migrating
- In the clone, replace the OpenAI node with an HTTP Request node pointing to your Ollama endpoint (see the request sketch after this list)
- Run both workflows simultaneously on the same triggers
- Compare outputs side by side
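The HTTP Request node in the clone calls Ollama's /api/generate endpoint. A minimal request sketch, assuming the email-classifier model from above and a field name (emailBody) that stands in for whatever your trigger actually provides:

```
POST http://your-vps:11434/api/generate
Content-Type: application/json

{
  "model": "email-classifier",
  "prompt": "Classify this email: {{ $json.emailBody }}",
  "stream": false
}
```

With "stream": false, Ollama returns a single JSON object, and the model's answer is in its response field, which the next node can read directly.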
Create a simple comparison spreadsheet:
| Input | OpenAI Output | Local Model Output | Match? | Notes |
|---|---|---|---|---|
| ... | ... | ... | Yes/No | ... |
You want at least 95% match rate for classification and extraction workflows. For generation and summarization, human judgment is needed — but the outputs should be functionally equivalent, not necessarily identical.
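If the spreadsheet gets tedious at volume, the same check is easy to script. A minimal sketch, assuming both runs logged their results as {input, output} JSONL files in the same order (exact matching only makes sense for classification and extraction):

```javascript
import fs from "node:fs";

// Load a JSONL file into an array of objects
const load = (path) =>
  fs.readFileSync(path, "utf8").trim().split("\n").map(JSON.parse);

const openai = load("openai-outputs.jsonl");
const local = load("local-outputs.jsonl");

// Exact-match rate across paired executions
const matches = openai.filter(
  (row, i) => row.output.trim() === local[i].output.trim()
).length;

console.log(`Match rate: ${((matches / openai.length) * 100).toFixed(1)}%`);
```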
Phase 4: Swap and Monitor
Once parallel testing confirms quality, swap the production workflows.
Gradual cutover approach:
- Week 1: Migrate P0 workflows (classification, extraction). Keep OpenAI as a fallback — if the local model returns an error or confidence is low, fall back to OpenAI.
- Week 2: If P0 is stable, remove the OpenAI fallback for P0 workflows. Migrate P1 workflows with fallback.
- Week 3: Remove fallback for P1. Evaluate P2 workflows.
- Week 4: Migrate or defer P2 based on evaluation results.
Fallback pattern in n8n:
```
Input → Local Model (Ollama) → IF (confidence > threshold) → Use local result
                               → ELSE → OpenAI API → Use API result
```
For classification workflows, you can implement confidence thresholds based on the model's output probabilities. For extraction and generation, use a simpler heuristic: if the local model returns a valid response within the expected format, use it. If it errors or returns malformed output, fall back.
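In n8n, that heuristic fits in a Code node ahead of the IF. A sketch for an extraction workflow, assuming Ollama's reply text arrives as $json.response and that requiredFields lists the keys your downstream nodes expect (both are assumptions; adapt to your schema):

```javascript
// Code node, "Run Once for Each Item" mode:
// decide whether to trust the local model's output
const raw = $json.response; // /api/generate puts the model's text in "response"
const requiredFields = ["name", "date", "amount"]; // hypothetical schema

let parsed = null;
try {
  parsed = JSON.parse(raw);
} catch (e) {
  // Malformed JSON: leave parsed as null so we fall back
}

const useLocal =
  parsed !== null && requiredFields.every((f) => parsed[f] !== undefined);

// The IF node downstream routes on useLocal:
// true -> use the local result, false -> call the OpenAI API
return { json: { ...$json, extracted: parsed, useLocal } };
```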
Monitoring checklist:
- Track error rates per workflow per day
- Compare execution times (local should be faster for most tasks)
- Log any fallback-to-API events and investigate why
- Monitor VPS resource utilization (CPU, RAM)
- Check output quality weekly by sampling 20-30 results per workflow
Migration Cost Calculator
Here is what the numbers look like for a typical migration:
Before: OpenAI API costs
| Workflow | Model | Executions/Day | Monthly Token Cost |
|---|---|---|---|
| Email classifier | GPT-4o | 800 | $45 |
| Invoice extractor | GPT-4o | 200 | $38 |
| Ticket summarizer | GPT-4 | 150 | $85 |
| Lead scorer | GPT-3.5 | 500 | $12 |
| Content reformatter | GPT-4o | 300 | $28 |
| Report generator | GPT-4 | 50 | $62 |
| Sentiment analyzer | GPT-3.5 | 1,000 | $18 |
| Data normalizer | GPT-4o | 400 | $32 |
| FAQ responder | GPT-4o | 250 | $55 |
| Email drafter | GPT-4 | 100 | $78 |
| Total | | 3,750/day | $453/mo |
After: Local fine-tuned models
| Cost Component | Monthly |
|---|---|
| Ollama VPS (8 vCPU, 32GB RAM, Hetzner) | $48 |
| Ertas subscription (unlimited training) | $14.50 |
| OpenAI API (P3 workflows kept on API) | $35 |
| Total | $97.50/mo |
Monthly savings: $355.50. Annual savings: $4,266.
And this is a conservative estimate. As workflow volumes grow, the API costs would have grown linearly while the local infrastructure costs stay flat. If your email classifier doubles to 1,600 executions/day, your local cost is still $48/month for the VPS. On OpenAI, that workflow alone would jump to $90/month.
What We Did Not Migrate (and Why)
Honesty matters. Not everything should move off the API. Here are the workflows we intentionally kept on OpenAI:
The report generator. This workflow takes 15 data points and generates a 2,000-word analysis with strategic recommendations. It requires genuine reasoning, synthesis of multiple data sources, and creative framing. A 7B model can handle the formatting, but the analytical quality drops noticeably compared to GPT-4. We kept it on the API. At 50 executions/day, the cost is manageable ($62/month), and the quality difference matters.
The email drafter. Similar to the report generator — it drafts complex, multi-paragraph emails that reference previous conversation history and require nuanced tone matching. A fine-tuned model handles simple replies well but struggles with the long-form, context-heavy drafts. We kept the complex drafts on GPT-4 and migrated simple reply templates to the local model, splitting the workflow in two.
Anything touching financial calculations. We have one workflow that takes raw transaction data and produces financial summaries with computed totals. The computation is done in n8n (not the LLM), but the LLM formats the final report. Even though the LLM is just formatting, the stakes are high enough that we kept it on GPT-4 with its lower hallucination rate for numerical tasks. Peace of mind is worth $35/month.
The pattern: keep tasks on the API when they require (a) genuine reasoning over novel inputs, (b) long-form creative generation, or (c) high-stakes accuracy where even a 2% error rate is unacceptable.
Results After 30 Days
Here is what actually happened after one month of running the migrated stack:
Cost reduction: 78%. From $453/month to $97.50/month. We expected to save more, but we kept three workflows on the API that we originally planned to migrate (see above). The savings are still $4,266/year.
Latency improvement: 40% faster on average. This surprised us. Local inference on a Hetzner VPS was consistently faster than OpenAI API calls, especially during peak hours. The email classifier went from 800ms average (OpenAI) to 320ms average (local Ollama). No network round-trip, no API queue, no rate limiting.
| Metric | OpenAI API | Local Ollama | Change |
|---|---|---|---|
| Avg response time (classification) | 800ms | 320ms | -60% |
| Avg response time (extraction) | 1,200ms | 650ms | -46% |
| Avg response time (summarization) | 2,500ms | 1,800ms | -28% |
| P99 response time (all) | 8,500ms | 2,100ms | -75% |
| Rate limit errors/day | 3-5 | 0 | -100% |
Quality metrics:
- Classification accuracy: 97.2% (local) vs 98.1% (OpenAI). Less than 1% difference.
- Extraction accuracy: 95.8% (local) vs 96.4% (OpenAI). Negligible difference.
- Summarization quality (human eval, 100 samples): 4.2/5 (local) vs 4.4/5 (OpenAI). Acceptable.
Reliability: Zero downtime on the Ollama VPS in 30 days. Zero rate limit errors. The OpenAI API, by comparison, had 3-5 rate limit errors per day during peak hours, each requiring retry logic and adding latency.
Surprise benefit: data privacy. With local inference, none of our workflow data leaves our infrastructure. For workflows processing customer emails, invoices, and support tickets, this is a significant compliance benefit we had not fully valued upfront.
The Migration Timeline
For a team with 10-15 n8n workflows and moderate technical comfort, here is a realistic timeline:
- Week 1: Audit all workflows, categorize, prioritize. Export training data.
- Week 2: Fine-tune models for P0 workflows on Ertas. Set up Ollama VPS.
- Week 3: Parallel testing for P0 workflows. Fine-tune P1 models.
- Week 4: Swap P0 to production. Start parallel testing P1.
- Week 5-6: Swap P1. Evaluate P2. Settle into steady state.
Six weeks from start to finish. The first cost savings hit in week 4. By week 6, you are running at full savings with confidence.
The $500/month OpenAI bill was not inevitable. It was a scaling artifact of using general-purpose models for specific tasks. Fine-tuned local models are the fix — and the migration is more straightforward than you think.
Ship AI that runs on your users' devices.
Ertas early bird pricing starts at $14.50/mo — locked in for life. Plans for builders and agencies.
Further Reading
- From API-Dependent to Model Owner: A 90-Day Migration Playbook — The complete phased plan for moving from API to owned models.
- n8n + Local LLMs for HIPAA-Compliant Automation — How local models solve the compliance problem for healthcare workflows.
- The n8n-to-Fine-Tuned-Model Agency Playbook — Productizing n8n migrations as an agency service.
- The Hidden Cost of Per-Token AI Pricing — Why per-token pricing is fundamentally misaligned with sustainable businesses.