
    n8n + Ollama + Fine-Tuned Models: The Zero-API-Cost Automation Stack

    Build powerful AI automations in n8n that cost nothing per execution. This guide shows you how to replace every OpenAI node with a locally-running fine-tuned model via Ollama.

    Ertas Team

    n8n is one of the best tools in the automation world. Self-hosted, open source, endlessly flexible. You can wire up hundreds of workflows that move data between apps, process documents, classify inputs, and generate outputs — all without writing a traditional backend.

    But the moment you add an AI node to an n8n workflow, you introduce a variable cost that scales with every execution. Each time that workflow fires — each email classified, each document summarized, each lead scored — you're paying OpenAI for tokens.

    At 100 executions per day, it's negligible. At 1,000 per day, you start noticing. At 10,000+ per day, it's your single largest operational expense after hosting.

    This guide shows you how to eliminate that cost entirely by replacing every OpenAI node in your n8n workflows with a locally-running fine-tuned model served through Ollama.

    The Hidden Cost of AI Nodes in n8n

    Let's start with what's actually happening behind the scenes when you use an OpenAI node in n8n.

    Every time an n8n workflow with an AI node executes, it sends a request to the OpenAI API. That request includes your system prompt, the input data from the previous node, and any context you've injected. OpenAI processes it, returns the response, and charges you based on token count.

    Here's what that costs for common automation patterns:

    | Workflow Type | Avg Input Tokens | Avg Output Tokens | Cost Per Execution | 1K Runs/Day (Monthly) |
    | --- | --- | --- | --- | --- |
    | Email classification | 800 | 50 | $0.027 | $810/mo |
    | Document summarization | 2,500 | 400 | $0.099 | $2,970/mo |
    | Lead scoring | 600 | 100 | $0.024 | $720/mo |
    | Support ticket routing | 1,000 | 80 | $0.035 | $1,050/mo |
    | Invoice data extraction | 1,500 | 200 | $0.057 | $1,710/mo |

    Those numbers assume GPT-4-level pricing ($30/1M input, $60/1M output). Even if you use GPT-3.5-turbo at 10x lower prices, running 1,000 executions per day still costs $70-300/month depending on the workflow.
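    The per-execution figures above are simple token arithmetic. As a sanity check, here it is as a small helper — the prices are parameters, so you can plug in whatever tier your provider actually charges:

```javascript
// Cost of one execution, given token counts and per-million-token prices.
// Prices are parameters — substitute your provider's current rates.
function costPerExecution(inputTokens, outputTokens, inputPricePerM, outputPricePerM) {
  return (inputTokens * inputPricePerM + outputTokens * outputPricePerM) / 1_000_000;
}

// Monthly cost at a given daily execution volume, assuming a 30-day month.
function monthlyCost(perExecution, runsPerDay) {
  return perExecution * runsPerDay * 30;
}

// Email classification at GPT-4-level pricing ($30/1M input, $60/1M output):
const perRun = costPerExecution(800, 50, 30, 60); // ≈ $0.027
const monthly = monthlyCost(perRun, 1000);        // ≈ $810/mo at 1K runs/day
```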

    And most n8n power users don't run just one workflow. They run dozens. An agency managing client automations might have 50+ workflows with AI nodes across all their clients. The cost compounds fast.

    The worst part: most of these tasks are narrow and repetitive. Classifying emails into 5 categories. Extracting names and dates from invoices. Scoring leads as hot/warm/cold. These aren't tasks that require GPT-4's knowledge of Shakespearean sonnets. They're pattern-matching tasks that a small, specialized model handles perfectly.

    The Stack: n8n + Ollama + Fine-Tuned Models

    Here's the architecture we're building:

    ┌─────────────┐     ┌─────────────┐     ┌───────────────────┐
    │     n8n     │────▶│   Ollama    │────▶│  Fine-Tuned Model │
    │  Workflow   │◀────│   Server    │◀────│   (GGUF format)   │
    └─────────────┘     └─────────────┘     └───────────────────┘
          │                   │
          │  HTTP Request     │  Local inference
          │  (localhost:11434)│  (zero API cost)
    

    n8n orchestrates your workflows — triggers, data routing, transformations, output actions. This doesn't change.

    Ollama runs on the same server (or a nearby VPS) and serves your fine-tuned model through an API endpoint that's compatible with the OpenAI format. n8n talks to it the same way it talks to OpenAI — just at a different URL.

    Your fine-tuned model is a 7B-parameter model trained on your specific workflow data. It knows how to do the exact tasks your automations need, and nothing else. It runs on CPU (no GPU required for 7B models at automation-scale throughput).

    The result: every execution costs $0 in API fees. Your only costs are the VPS ($30/month) and the fine-tuning platform ($14.50/month for Ertas).

    Step 1: Identify Your AI Workflows

    Start by auditing your n8n instance. Open the workflow list and look for any workflow that contains:

    • OpenAI node (the most obvious one)
    • AI Agent node with an OpenAI model configured
    • HTTP Request node pointing at api.openai.com
    • LangChain nodes using OpenAI as the LLM provider

    For each workflow, note:

    • What task the AI is performing (classification, extraction, generation, summarization)
    • How many times it executes per day/week
    • The system prompt being used
    • The typical input and output format

    The best candidates for replacement are workflows where:

    • The task is narrow and well-defined (classify into N categories, extract specific fields, generate from a template)
    • The workflow runs frequently (100+ times per day)
    • The output format is predictable (JSON, short text, category labels)

    Common high-value targets:

    • Email triage and classification
    • Lead scoring and routing
    • Invoice and receipt data extraction
    • Support ticket categorization
    • Content moderation
    • Sentiment analysis

    Step 2: Collect Training Data From Your Workflows

    This is the step most guides skip, and it's the most important one. Your fine-tuned model is only as good as the data you train it on. The good news: n8n has already been generating your training data.

    Every time your n8n workflow executes, it logs the input and output. These execution logs are your training dataset.

    Here's how to extract them:

    Option A: n8n Execution History (UI)

    1. Open the workflow you want to replace
    2. Click "Executions" in the left sidebar
    3. Filter for successful executions
    4. For each execution, click to view the data at the OpenAI node
    5. Copy the input (what was sent to OpenAI) and output (what OpenAI returned)

    This works for small datasets (under 200 examples) but gets tedious at scale.

    Option B: n8n API (Programmatic)

    n8n has a REST API. You can pull execution data programmatically:

    GET /api/v1/executions?workflowId={id}&status=success&limit=500
    

    For each execution, extract the data at the AI node and format it as a training pair:

    {
      "input": "The system prompt + user input that was sent to OpenAI",
      "output": "The response OpenAI returned"
    }
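    A small script can turn those API responses into JSONL. The exact shape of n8n's execution payload varies by version and node type, so treat the field paths below (`data.resultData.runData` and the nested `main` arrays) as assumptions to verify against a real execution from your own instance:

```javascript
// Convert one n8n execution into a training pair for the named AI node.
// The runData field paths are assumptions — inspect a real execution
// payload from your instance and adjust the access path to match.
function executionToPair(execution, nodeName) {
  const runs = execution.data?.resultData?.runData?.[nodeName];
  if (!runs || runs.length === 0) return null;
  const run = runs[0];
  const input = run.inputData?.main?.[0]?.[0]?.json;  // hypothetical shape
  const output = run.data?.main?.[0]?.[0]?.json;      // hypothetical shape
  if (!input || !output) return null;
  return { input: JSON.stringify(input), output: JSON.stringify(output) };
}

// Serialize a batch of executions as JSONL, skipping any that fail to parse.
function toJsonl(executions, nodeName) {
  return executions
    .map((e) => executionToPair(e, nodeName))
    .filter(Boolean)
    .map((p) => JSON.stringify(p))
    .join("\n");
}
```

    You would feed this from a paginated fetch against `/api/v1/executions` (authenticated with your n8n API key) and write the result to a `.jsonl` file.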
    

    Option C: Add a Logging Node

    If you want to start collecting data going forward, add a Function node right after your OpenAI node that writes each input/output pair to a Google Sheet, Airtable, or a JSON file. After a few weeks, you'll have a clean dataset.
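    A sketch of what that logging node's code might look like. The field names (`prompt`, `completion`) are placeholders — map them to whatever your OpenAI node actually outputs:

```javascript
// Shape one item into a training pair. Field names are placeholders —
// match them to the actual output of your OpenAI node.
function toTrainingPair(item) {
  return {
    input: item.prompt ?? "",
    output: item.completion ?? "",
    loggedAt: new Date().toISOString(),
  };
}

// Inside an n8n Code node you would return the mapped items, e.g.:
//   return $input.all().map((i) => ({ json: toTrainingPair(i.json) }));
```

    Connect the output to a Google Sheets or Airtable append node and the dataset builds itself.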

    How much data do you need?

    | Task Type | Minimum Examples | Recommended |
    | --- | --- | --- |
    | Binary classification | 100 | 300+ |
    | Multi-class classification (5-10 classes) | 200 | 500+ |
    | Data extraction | 200 | 500+ |
    | Short text generation | 300 | 800+ |
    | Summarization | 300 | 1,000+ |

    Quality matters more than quantity. 300 clean, representative examples will outperform 3,000 noisy ones.
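    Before uploading, a quick cleaning pass helps make the counts above reflect usable examples: drop empty pairs and exact duplicates. A minimal sketch:

```javascript
// Keep only pairs with non-empty input and output, and drop exact
// duplicates (the same input/output combination seen before).
function cleanDataset(pairs) {
  const seen = new Set();
  return pairs.filter((p) => {
    if (!p.input?.trim() || !p.output?.trim()) return false;
    const key = p.input + "\u0000" + p.output;
    if (seen.has(key)) return false;
    seen.add(key);
    return true;
  });
}
```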

    Step 3: Fine-Tune With Ertas

    Now you've got a dataset. Time to build your model.

    1. Sign up for Ertas at ertas.io. The platform costs $14.50/month and includes everything you need for fine-tuning.

    2. Upload your dataset. Ertas accepts JSONL (one JSON object per line with "input" and "output" fields), CSV (two columns: input and output), or you can paste data directly into the Studio interface.

    3. Select a base model. For automation tasks, we recommend:

      • Qwen 2.5 7B — best all-around for classification and extraction
      • Llama 3.1 8B — strong for generation tasks and longer outputs
      • Mistral 7B — fast inference, good for high-throughput workflows
    4. Configure and train. Ertas auto-selects LoRA rank, learning rate, and epoch count based on your dataset. You can adjust these if you know what you're doing, but the defaults work well for 90% of use cases. Click "Start Training" and wait — typically 15-45 minutes depending on dataset size.

    5. Evaluate. Ertas runs your model against a held-out test set and shows you accuracy metrics. For classification tasks, you'll see precision, recall, and F1 scores. For generation tasks, you'll see sample outputs compared against the expected outputs.

    6. Export to GGUF. Click "Export" and select GGUF format with Q4_K_M quantization (the best balance of quality and file size for 7B models). Download the file — it'll be 4-5GB.

    Step 4: Deploy With Ollama

    Ollama is the bridge between your fine-tuned model and n8n. It serves your GGUF model through a local API that's compatible with the OpenAI format.

    Install Ollama on your VPS:

    curl -fsSL https://ollama.com/install.sh | sh
    

    Create a Modelfile for your fine-tuned model:

    FROM /path/to/your-model.gguf
    
    PARAMETER temperature 0.1
    PARAMETER num_ctx 4096
    

    The low temperature (0.1) is important for automation tasks — you want consistent, deterministic outputs, not creative variation.

    Create and run the model:

    ollama create my-workflow-model -f Modelfile
    ollama run my-workflow-model "Test input here"
    

    Verify the API is accessible:

    curl http://localhost:11434/api/generate -d '{
      "model": "my-workflow-model",
      "prompt": "Test input",
      "stream": false
    }'
    

    For production, make sure Ollama starts on boot and is accessible from your n8n instance. If n8n and Ollama are on the same server, localhost works. If they're on different servers, configure Ollama to bind to 0.0.0.0 and secure the connection.
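    On a systemd-based install (which the install script above sets up), boot startup and the bind address are a unit-file concern. The override below assumes the standard `ollama.service` unit:

```shell
# Start Ollama on boot (the install script usually enables this already)
sudo systemctl enable ollama

# To accept connections from another server, set OLLAMA_HOST in an override:
sudo systemctl edit ollama
#   [Service]
#   Environment="OLLAMA_HOST=0.0.0.0:11434"

sudo systemctl restart ollama
```

    If you expose Ollama beyond localhost, put it behind a firewall rule or an authenticating reverse proxy — the Ollama server has no authentication of its own.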

    VPS Sizing:

    | Model Size | Minimum VPS | Recommended VPS | Monthly Cost |
    | --- | --- | --- | --- |
    | 7B (Q4) | 4 vCPU, 8GB RAM | 4 vCPU, 16GB RAM | $20-30/mo |
    | 13B (Q4) | 8 vCPU, 16GB RAM | 8 vCPU, 32GB RAM | $40-60/mo |

    A $30/month VPS from Hetzner or DigitalOcean handles a 7B model comfortably, processing 10-20 requests per second for short classification/extraction tasks.

    Step 5: Update n8n Nodes

    Now wire it together. For each workflow where you're replacing the OpenAI node:

    Option A: Replace OpenAI node with HTTP Request node

    1. Delete (or disable) the OpenAI node
    2. Add an HTTP Request node
    3. Configure it:
      • Method: POST
      • URL: http://localhost:11434/api/chat
      • Body (JSON):
    {
      "model": "my-workflow-model",
      "messages": [
        {
          "role": "system",
          "content": "Your system prompt here"
        },
        {
          "role": "user",
          "content": "{{ $json.input_field }}"
        }
      ],
      "stream": false
    }
    
    4. Add a Function node after the HTTP Request to extract the response:
    const response = $input.first().json;
    return [{
      json: {
        result: response.message.content
      }
    }];
    

    Option B: Use the Ollama node (if available)

    Recent versions of n8n include a native Ollama node in the AI nodes section. Configure it with:

    • Base URL: http://localhost:11434
    • Model: my-workflow-model

    This is simpler but gives you less control over parameters.

    Test thoroughly. Run 20-30 real executions through the updated workflow and compare the outputs against what OpenAI was producing. For classification tasks, check that the categories match. For extraction tasks, verify all fields are captured correctly.
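    For classification workflows, you can quantify that comparison: replay the same inputs through both endpoints, collect the label pairs, and measure how often they agree. The scoring itself is trivial:

```javascript
// Fraction of cases where the local model's label matches the API's label.
// Normalizes case and whitespace so "Billing " and "billing" count as equal.
function agreementRate(pairs) {
  if (pairs.length === 0) return 0;
  const norm = (s) => s.trim().toLowerCase();
  const matches = pairs.filter((p) => norm(p.api) === norm(p.local)).length;
  return matches / pairs.length;
}
```

    As a rough rule of thumb, high agreement (say 95%+) on a narrow classification task suggests the fine-tune is ready; read through the disagreements individually before cutting over.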

    Cost Comparison

    Here's the math for an agency running multiple AI workflows:

    | Monthly Executions | OpenAI API Cost | Local Fine-Tuned Cost | Savings |
    | --- | --- | --- | --- |
    | 10,000 | $270 - $990 | $44.50 | 84-95% |
    | 50,000 | $1,350 - $4,950 | $44.50 | 97-99% |
    | 100,000 | $2,700 - $9,900 | $44.50 | 98-99.5% |
    | 500,000 | $13,500 - $49,500 | $44.50* | 99.7%+ |

    *At 500K+ monthly executions, you may need a beefier VPS ($60-100/month) for throughput. Still a rounding error compared to API costs.

    The local cost column is $14.50/month for Ertas + $30/month for the VPS. That's it. No per-execution fee. No per-token charge. No surprise bill at the end of the month.

    For agencies managing automations across multiple clients, this is transformative. Instead of passing API costs through to clients (and watching them churn when the bills climb), you offer a flat-rate service with near-zero marginal cost per execution.

    When This Stack Works Best

    Not every AI task should be fine-tuned. Here's where the n8n + Ollama + fine-tuned stack delivers the biggest wins:

    Classification tasks — Email routing, ticket categorization, sentiment analysis, lead scoring. These are the sweet spot. The task is well-defined, the output format is constrained, and a fine-tuned 7B model typically matches or exceeds GPT-4 accuracy.

    Data extraction — Pulling structured data from invoices, receipts, forms, emails. Fine-tuned models excel here because they learn your specific schema and field names.

    Templated generation — Drafting responses from templates, generating product descriptions from specs, writing follow-up emails based on meeting notes. The output follows a predictable pattern that a small model learns quickly.

    Summarization — Condensing documents, emails, or transcripts into key points. Fine-tuned models produce summaries that match your preferred style and length.

    Where to keep using APIs:

    • Complex multi-step reasoning across diverse domains
    • Tasks requiring up-to-date information (news, current events)
    • One-off creative tasks where consistency doesn't matter
    • Workflows with fewer than 100 monthly executions (the cost savings don't justify the setup)

    The 80/20 rule applies: 80% of your AI automation spend probably comes from 20% of your workflows. Target those high-volume, narrow-task workflows first and you'll capture most of the savings immediately.

    Scaling the Stack

    As your automation volume grows, here's how the stack scales:

    10K-50K executions/month: Single VPS with one model handles this easily. A 7B model on a 4 vCPU / 16GB RAM VPS can process 15-20 requests per second for short tasks.

    50K-200K executions/month: You might need a slightly beefier VPS (8 vCPU / 32GB RAM, ~$50/month) or optimize with model batching. Still dramatically cheaper than API costs.

    200K+ executions/month: Consider running multiple model instances behind a simple load balancer. Two $30 VPS instances give you redundancy and double throughput. Your total infra cost is $75/month compared to $5,000+ in API costs.
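    That load balancer can be as simple as an nginx upstream over two Ollama hosts (the hostnames here are placeholders for your own internal addresses):

```nginx
# Round-robin across two Ollama instances; ollama-1/ollama-2 are placeholders.
upstream ollama_pool {
    server ollama-1.internal:11434;
    server ollama-2.internal:11434;
}

server {
    listen 11434;
    location / {
        proxy_pass http://ollama_pool;
        proxy_read_timeout 300s;  # model inference can take a while
    }
}
```

    Point your n8n HTTP Request nodes at the balancer instead of an individual instance, and a failed host just drops out of rotation.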

    Multiple models for different tasks: You can run several fine-tuned models on the same Ollama instance. One for email classification, another for data extraction, a third for summarization. Each model gets loaded into memory when called and unloaded when idle. A 16GB RAM VPS can serve 2-3 7B models concurrently.
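    Ollama's `keep_alive` request parameter controls how long a model stays resident after a call (the default is around five minutes). For a high-frequency workflow you can pin the model in memory; the model name below is a placeholder:

```shell
# Keep the model loaded for an hour after each call instead of the default
# ~5 minutes (model name is a placeholder for your own).
curl http://localhost:11434/api/chat -d '{
  "model": "my-workflow-model",
  "messages": [{"role": "user", "content": "ping"}],
  "stream": false,
  "keep_alive": "1h"
}'
```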

    Getting Started Today

    Here's the minimal path to your first zero-cost AI workflow:

    1. Pick your highest-volume AI workflow in n8n — the one that executes most frequently
    2. Export 300+ execution examples as input/output pairs
    3. Fine-tune on Ertas — upload data, pick Qwen 2.5 7B, train, export GGUF
    4. Deploy on Ollama — install on your VPS, load the model, verify the endpoint
    5. Swap the node — replace the OpenAI node with an HTTP Request pointing at Ollama
    6. Monitor for a week — compare outputs and confirm quality matches

    Once you've validated the first workflow, repeat for every AI workflow in your n8n instance. Most of the work is in Step 2 (collecting the data). The fine-tuning and deployment become routine after you've done it once.

    Your n8n automations shouldn't have a per-execution tax. Build the stack once, run it forever.


    Ship AI that runs on your users' devices.

    Ertas early bird pricing starts at $14.50/mo — locked in for life. Plans for builders and agencies.
