    From $500/Month OpenAI Bills to $0: Migrating n8n Workflows to Local Models


    A practical migration guide for n8n users spending hundreds on OpenAI API calls. Move your workflows to local fine-tuned models without breaking anything.

Ertas Team

    You started with one n8n workflow using OpenAI. A simple one — maybe it classified incoming emails or extracted data from form submissions. The API call cost fractions of a penny per execution. Barely noticeable. So you built five more workflows. Then ten. Then you added GPT-4 to the ones that needed better reasoning. Then your colleague saw what you built and asked for three more.

    Now you are staring at a $500/month OpenAI bill. And it is climbing.

    Here is the thing: most of those workflows do not need GPT-4. They do not even need GPT-3.5. They need a model that is really good at one specific task — classifying, extracting, reformatting, summarizing — and that is exactly what a fine-tuned 7B model does. The migration from OpenAI API calls to local fine-tuned models is not as scary as it sounds, and the cost savings are dramatic: from hundreds of dollars per month to literally zero in per-token costs.

    This guide walks through the entire migration, step by step. We will audit your workflows, prioritize what to migrate, fine-tune models for each workflow type, deploy them with Ollama, and swap the endpoints in n8n without breaking anything.

    The Migration Audit

    Before you migrate anything, you need to know what you are working with. The goal of the audit is to inventory every n8n workflow that uses an AI node, categorize each one by complexity and volume, and identify the quick wins.

    Step 1: List every workflow with an AI node. In n8n, go to your workflow list and search for workflows containing OpenAI nodes (or any AI/LLM node). For each workflow, document:

    • Workflow name and purpose
    • Which model it uses (GPT-4, GPT-4o, GPT-3.5-turbo)
    • Approximate executions per day
    • Average input token count per execution
    • Average output token count per execution
    • Whether it uses structured output (JSON mode, function calling)

    Step 2: Categorize by task type. Most AI-powered n8n workflows fall into these buckets:

| Task Type | Examples | Complexity | Migration Difficulty |
|---|---|---|---|
| Classification | Email routing, ticket categorization, sentiment analysis | Low | Easy |
| Extraction | Pull names/dates/amounts from text, parse invoices | Low-Medium | Easy |
| Reformatting | Convert prose to bullet points, standardize formats | Low | Easy |
| Summarization | Summarize emails, meeting notes, documents | Medium | Moderate |
| Generation | Write email replies, create descriptions, draft content | Medium-High | Moderate |
| Reasoning | Multi-step analysis, decision-making, complex Q&A | High | Hard |
| Code generation | Write SQL queries, generate scripts | High | Hard |

    Step 3: Calculate per-workflow costs. Multiply each workflow's daily executions by its token usage and the model's per-token rate. Here is a quick reference:

| Model | Input Cost (per 1M tokens) | Output Cost (per 1M tokens) |
|---|---|---|
| GPT-4o | $2.50 | $10.00 |
| GPT-4 | $30.00 | $60.00 |
| GPT-3.5-turbo | $0.50 | $1.50 |

    A workflow running 500 times/day with 800 input tokens and 200 output tokens on GPT-4o costs:

• Input: 500 * 800 = 400K tokens/day = 12M tokens/month * $2.50/1M = $30/month
• Output: 500 * 200 = 100K tokens/day = 3M tokens/month * $10.00/1M = $30/month
• Total: $60/month for one workflow

    Multiply that across 10-15 workflows and you see how $500/month happens fast.
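The arithmetic above generalizes to a small helper you can run across your whole audit. This is a sketch, assuming 30-day months and the per-1M-token rates from the pricing table; the function and variable names are our own:

```javascript
// Per-1M-token rates, taken from the pricing table above.
const RATES = {
  "gpt-4o": { input: 2.5, output: 10.0 },
  "gpt-4": { input: 30.0, output: 60.0 },
  "gpt-3.5-turbo": { input: 0.5, output: 1.5 },
};

// Monthly cost of one workflow, assuming a 30-day month.
function monthlyCost(model, execsPerDay, inputTokens, outputTokens) {
  const rate = RATES[model];
  const inputMillions = (execsPerDay * inputTokens * 30) / 1e6;
  const outputMillions = (execsPerDay * outputTokens * 30) / 1e6;
  return inputMillions * rate.input + outputMillions * rate.output;
}

console.log(monthlyCost("gpt-4o", 500, 800, 200)); // 60
```

Running this per workflow gives you the per-workflow cost column for the audit spreadsheet.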

    Which Workflows to Migrate First

    Not all workflows are equal candidates for migration. The ideal first targets are:

    High volume, low complexity. A workflow that classifies 2,000 emails per day into 5 categories is perfect. It has a clear input-output pattern, high volume (so high savings), and low complexity (a fine-tuned 3B model can handle it easily).

    Structured output. Workflows that expect JSON output — like extracting fields from invoices or parsing form data — are excellent candidates. The output format is constrained and predictable, which makes fine-tuning straightforward and evaluation simple. Either the JSON is correct or it is not.

    Repetitive patterns. If a workflow does essentially the same transformation thousands of times with only the input data changing, fine-tuning works beautifully. The model just needs to learn the pattern once.

    Workflows NOT to migrate first:

    • Anything requiring up-to-date world knowledge (the fine-tuned model knows what it was trained on, nothing more)
    • Multi-step reasoning chains where earlier outputs feed into later prompts
    • Creative generation where quality is subjective and hard to evaluate
    • Anything with safety-critical consequences (medical, legal, financial advice)

    Here is a prioritization framework:

| Priority | Criteria | Expected Savings |
|---|---|---|
| P0 — Migrate immediately | Classification, extraction, reformatting; >100 executions/day | 90-100% cost reduction |
| P1 — Migrate next | Summarization, simple generation; >50 executions/day | 85-95% cost reduction |
| P2 — Evaluate carefully | Complex generation, multi-step reasoning; any volume | 70-90% cost reduction |
| P3 — Keep on API | Safety-critical, requires world knowledge, highly variable tasks | 0% (stay on API) |

    The Migration Framework

    The migration follows four phases. Do not skip phases. Do not rush.

    Phase 1: Export Execution Data

    For each workflow you are migrating, you need the actual input-output pairs from real executions. This is your training data.

    From n8n execution logs: n8n stores execution data for every workflow run. You can access this through the n8n API or directly from the database if you are self-hosting. For each execution of an AI node, extract:

    • The prompt/input that was sent to OpenAI
    • The response/output that was received
    • Whether the workflow completed successfully (filter out failures)

    Export script approach:

```javascript
// Sketch: extract training pairs from n8n execution logs.
// `n8nApi` stands in for your n8n REST API client; the node name and
// result paths below match the default OpenAI node and may differ in your setup.
const fs = require("fs");

const executions = await n8nApi.getExecutions({
  workflowId: "your-workflow-id",
  status: "success",
  limit: 5000,
});

const trainingData = executions
  .map((exec) => {
    const aiNode = exec.data.resultData.runData["OpenAI Node"]?.[0];
    if (!aiNode) return null; // skip executions where the AI node did not run
    return {
      input: aiNode.parameters.prompt,
      output: aiNode.data.main[0][0].json.text,
    };
  })
  .filter(Boolean);

// Write as JSONL: one {input, output} pair per line
fs.writeFileSync(
  "training-data.jsonl",
  trainingData.map((d) => JSON.stringify(d)).join("\n")
);
```
    

    How much data do you need per workflow type?

| Workflow Complexity | Minimum Examples | Recommended | Diminishing Returns After |
|---|---|---|---|
| Classification (5-10 classes) | 200 | 500 | 2,000 |
| Data extraction | 300 | 800 | 3,000 |
| Reformatting | 200 | 500 | 1,500 |
| Summarization | 500 | 1,500 | 5,000 |
| Content generation | 800 | 2,000 | 5,000+ |

    For most n8n workflows, two to four weeks of execution logs provide more than enough training data.

    Phase 2: Fine-Tune Per Workflow

    Now the question: do you train one model for all workflows or one model per workflow type?

    One model per workflow type is almost always the right choice. Here is why:

    • Each model can be small and fast (3B-7B parameters) because it only needs to handle one task
    • Quality is higher because the model is not confused by competing task patterns
    • You can update each model independently when requirements change
    • If one model underperforms, you only retrain that one — not everything

    The fine-tuning process with Ertas:

    1. Upload your JSONL training file to Ertas
    2. Select the base model:
      • Qwen 2.5 3B for simple classification and extraction (runs on 4GB RAM)
      • Qwen 2.5 7B for summarization and generation (runs on 8GB RAM)
    3. Configure LoRA training (defaults are fine for most workflows)
    4. Train — 500 examples on a 3B model takes about 15 minutes, 7B takes about 30 minutes
    5. Evaluate against held-out test examples
    6. Export as GGUF

    Cost for fine-tuning: Ertas at $14.50/month includes unlimited training runs. If you are migrating 10 workflows and training one model per workflow type (after deduplicating similar workflows), you might need 5-7 training runs. All included.

    Phase 3: Deploy and Test

    Deploy all your fine-tuned models on a single Ollama instance. Ollama handles multiple models efficiently — it loads the active model into memory and can swap between models in seconds.

    Deployment setup:

```shell
# Install Ollama on your VPS
curl -fsSL https://ollama.com/install.sh | sh

# Create each model from its Modelfile
ollama create email-classifier -f Modelfile.email-classifier
ollama create invoice-extractor -f Modelfile.invoice-extractor
ollama create ticket-summarizer -f Modelfile.ticket-summarizer
```
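Each Modelfile points Ollama at your exported GGUF and pins the generation settings. A minimal example for the email classifier — the file path and system prompt are placeholders for your own:

```
FROM ./email-classifier.gguf
PARAMETER temperature 0
SYSTEM "You classify incoming emails into one of five categories. Respond with the category name only."
```

Temperature 0 keeps classification output deterministic, which matters when you compare results against the OpenAI baseline.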
    

    VPS sizing for multiple models:

| Number of Models | Active Concurrently | Recommended VPS | Monthly Cost |
|---|---|---|---|
| 1-3 (3B models) | 1 | 4 vCPU, 8GB RAM | ~$14/mo |
| 3-5 (mix 3B/7B) | 1-2 | 4 vCPU, 16GB RAM | ~$26/mo |
| 5-10 (mix 3B/7B) | 2-3 | 8 vCPU, 32GB RAM | ~$48/mo |
| 10+ or high concurrency | 3+ | 16 vCPU, 64GB RAM | ~$96/mo |

    Even at the high end — $96/month for running 10+ models — you are paying a fraction of your $500/month OpenAI bill.

    Parallel testing strategy: Before swapping anything in production, run your fine-tuned models in parallel with your existing OpenAI workflows for at least one week. Here is how:

    1. Clone each workflow you are migrating
    2. In the clone, replace the OpenAI node with an HTTP Request node pointing to your Ollama endpoint
    3. Run both workflows simultaneously on the same triggers
    4. Compare outputs side by side
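For step 2, the HTTP Request node calls Ollama's `/api/generate` endpoint. A sketch of the request shape, assuming Ollama's default port and a model created as `email-classifier` (the helper names are our own):

```javascript
// The request the n8n HTTP Request node should send to Ollama.
function buildOllamaRequest(model, prompt) {
  return {
    method: "POST",
    url: "http://localhost:11434/api/generate",
    body: { model, prompt, stream: false },
  };
}

// Standalone equivalent (Node 18+, global fetch):
async function runLocal(prompt) {
  const req = buildOllamaRequest("email-classifier", prompt);
  const res = await fetch(req.url, {
    method: req.method,
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify(req.body),
  });
  const data = await res.json();
  return data.response; // Ollama returns the completion in `response`
}
```

Setting `stream: false` makes Ollama return one JSON object per call, which is what an n8n HTTP Request node expects.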

    Create a simple comparison spreadsheet:

| Input | OpenAI Output | Local Model Output | Match? | Notes |
|---|---|---|---|---|
| ... | ... | ... | Yes/No | ... |

    You want at least 95% match rate for classification and extraction workflows. For generation and summarization, human judgment is needed — but the outputs should be functionally equivalent, not necessarily identical.
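For classification and extraction, the match rate can be computed mechanically instead of by hand. A sketch, comparing normalized outputs pairwise so trivial whitespace or casing differences do not count as mismatches (the row field names are our own):

```javascript
// Percentage of rows where both pipelines produced the same answer.
function matchRate(rows) {
  const normalize = (s) => s.trim().toLowerCase().replace(/\s+/g, " ");
  const matches = rows.filter(
    (r) => normalize(r.openaiOutput) === normalize(r.localOutput)
  ).length;
  return (matches / rows.length) * 100;
}
```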

    Phase 4: Swap and Monitor

    Once parallel testing confirms quality, swap the production workflows.

    Gradual cutover approach:

    • Week 1: Migrate P0 workflows (classification, extraction). Keep OpenAI as a fallback — if the local model returns an error or confidence is low, fall back to OpenAI.
    • Week 2: If P0 is stable, remove the OpenAI fallback for P0 workflows. Migrate P1 workflows with fallback.
    • Week 3: Remove fallback for P1. Evaluate P2 workflows.
    • Week 4: Migrate or defer P2 based on evaluation results.

    Fallback pattern in n8n:

```
Input → Local Model (Ollama) → IF (confidence > threshold) → Use local result
                               ELSE → OpenAI API → Use API result
```

    For classification workflows, you can implement confidence thresholds based on the model's output probabilities. For extraction and generation, use a simpler heuristic: if the local model returns a valid response within the expected format, use it. If it errors or returns malformed output, fall back.
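For extraction workflows, that heuristic can be as simple as a JSON validity check. A sketch, where the required field list is whatever your downstream nodes expect (the function name is our own):

```javascript
// Returns the parsed local result if it is valid JSON with all required fields,
// or null to signal that the workflow should fall back to the OpenAI branch.
function acceptLocalResult(rawOutput, requiredFields) {
  let parsed;
  try {
    parsed = JSON.parse(rawOutput);
  } catch {
    return null; // malformed output → fall back
  }
  const complete = requiredFields.every(
    (f) => parsed[f] !== undefined && parsed[f] !== null
  );
  return complete ? parsed : null;
}
```

Wire this into an n8n Code node right after the Ollama call; a `null` result routes the IF node to the OpenAI branch.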

    Monitoring checklist:

    • Track error rates per workflow per day
    • Compare execution times (local should be faster for most tasks)
    • Log any fallback-to-API events and investigate why
    • Monitor VPS resource utilization (CPU, RAM)
    • Check output quality weekly by sampling 20-30 results per workflow

    Migration Cost Calculator

    Here is what the numbers look like for a typical migration:

    Before: OpenAI API costs

| Workflow | Model | Executions/Day | Monthly Token Cost |
|---|---|---|---|
| Email classifier | GPT-4o | 800 | $45 |
| Invoice extractor | GPT-4o | 200 | $38 |
| Ticket summarizer | GPT-4 | 150 | $85 |
| Lead scorer | GPT-3.5 | 500 | $12 |
| Content reformatter | GPT-4o | 300 | $28 |
| Report generator | GPT-4 | 50 | $62 |
| Sentiment analyzer | GPT-3.5 | 1,000 | $18 |
| Data normalizer | GPT-4o | 400 | $32 |
| FAQ responder | GPT-4o | 250 | $55 |
| Email drafter | GPT-4 | 100 | $78 |
| Total | | 3,750/day | $453/mo |

    After: Local fine-tuned models

| Cost Component | Monthly |
|---|---|
| Ollama VPS (8 vCPU, 32GB RAM, Hetzner) | $48 |
| Ertas subscription (unlimited training) | $14.50 |
| OpenAI API (P3 workflows kept on API) | $35 |
| Total | $97.50/mo |

    Monthly savings: $355.50. Annual savings: $4,266.

    And this is a conservative estimate. As workflow volumes grow, the API costs would have grown linearly while the local infrastructure costs stay flat. If your email classifier doubles to 1,600 executions/day, your local cost is still $48/month for the VPS. On OpenAI, that workflow alone would jump to $90/month.
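The flat-vs-linear point is easy to see in code. A sketch of the break-even volume, derived from the $45/month-at-800-executions/day rate in the table above (the function and variable names are our own):

```javascript
// API cost scales linearly with daily volume; local infra cost is flat.
// apiCostPerDailyExec: monthly dollars contributed by each execution/day.
function breakEvenExecsPerDay(localFlatMonthly, apiCostPerDailyExec) {
  return localFlatMonthly / apiCostPerDailyExec;
}

const perDailyExec = 45 / 800; // email classifier: $45/mo at 800 execs/day
console.log(breakEvenExecsPerDay(48, perDailyExec)); // ≈ 853 execs/day
```

Above roughly 850 executions/day for this workflow alone, the $48/month VPS is already cheaper than the API, and every workflow after the first rides on the same flat cost.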

    What We Did Not Migrate (and Why)

    Honesty matters. Not everything should move off the API. Here are the workflows we intentionally kept on OpenAI:

    The report generator. This workflow takes 15 data points and generates a 2,000-word analysis with strategic recommendations. It requires genuine reasoning, synthesis of multiple data sources, and creative framing. A 7B model can handle the formatting, but the analytical quality drops noticeably compared to GPT-4. We kept it on the API. At 50 executions/day, the cost is manageable ($62/month), and the quality difference matters.

    The email drafter. Similar to the report generator — it drafts complex, multi-paragraph emails that reference previous conversation history and require nuanced tone matching. A fine-tuned model handles simple replies well but struggles with the long-form, context-heavy drafts. We kept the complex drafts on GPT-4 and migrated simple reply templates to the local model, splitting the workflow in two.

    Anything touching financial calculations. We have one workflow that takes raw transaction data and produces financial summaries with computed totals. The computation is done in n8n (not the LLM), but the LLM formats the final report. Even though the LLM is just formatting, the stakes are high enough that we kept it on GPT-4 with its lower hallucination rate for numerical tasks. Peace of mind is worth $35/month.

    The pattern: keep tasks on the API when they require (a) genuine reasoning over novel inputs, (b) long-form creative generation, or (c) high-stakes accuracy where even a 2% error rate is unacceptable.

    Results After 30 Days

    Here is what actually happened after one month of running the migrated stack:

    Cost reduction: 78%. From $453/month to $97.50/month. We expected to save more, but we kept three workflows on the API that we originally planned to migrate (see above). The savings are still $4,266/year.

    Latency improvement: 40% faster on average. This surprised us. Local inference on a Hetzner VPS was consistently faster than OpenAI API calls, especially during peak hours. The email classifier went from 800ms average (OpenAI) to 320ms average (local Ollama). No network round-trip, no API queue, no rate limiting.

| Metric | OpenAI API | Local Ollama | Change |
|---|---|---|---|
| Avg response time (classification) | 800ms | 320ms | -60% |
| Avg response time (extraction) | 1,200ms | 650ms | -46% |
| Avg response time (summarization) | 2,500ms | 1,800ms | -28% |
| P99 response time (all) | 8,500ms | 2,100ms | -75% |
| Rate limit errors/day | 3-5 | 0 | -100% |

    Quality metrics:

    • Classification accuracy: 97.2% (local) vs 98.1% (OpenAI). Less than 1% difference.
    • Extraction accuracy: 95.8% (local) vs 96.4% (OpenAI). Negligible difference.
    • Summarization quality (human eval, 100 samples): 4.2/5 (local) vs 4.4/5 (OpenAI). Acceptable.

    Reliability: Zero downtime on the Ollama VPS in 30 days. Zero rate limit errors. The OpenAI API, by comparison, had 3-5 rate limit errors per day during peak hours, each requiring retry logic and adding latency.

    Surprise benefit: data privacy. With local inference, none of our workflow data leaves our infrastructure. For workflows processing customer emails, invoices, and support tickets, this is a significant compliance benefit we had not fully valued upfront.

    The Migration Timeline

    For a team with 10-15 n8n workflows and moderate technical comfort, here is a realistic timeline:

    • Week 1: Audit all workflows, categorize, prioritize. Export training data.
    • Week 2: Fine-tune models for P0 workflows on Ertas. Set up Ollama VPS.
    • Week 3: Parallel testing for P0 workflows. Fine-tune P1 models.
    • Week 4: Swap P0 to production. Start parallel testing P1.
    • Week 5-6: Swap P1. Evaluate P2. Settle into steady state.

    Six weeks from start to finish. The first cost savings hit in week 4. By week 6, you are running at full savings with confidence.

    The $500/month OpenAI bill was not inevitable. It was a scaling artifact of using general-purpose models for specific tasks. Fine-tuned local models are the fix — and the migration is more straightforward than you think.


    Ship AI that runs on your users' devices.

    Ertas early bird pricing starts at $14.50/mo — locked in for life. Plans for builders and agencies.
