
Make.com + Local AI: Automations That Don't Bill You Per Token
Connect Make.com to a locally-running AI model and eliminate per-token API costs from your automations. Step-by-step setup guide for no-code AI builders.
Make.com is one of the most powerful tools in any AI agency's stack. But if you are building high-volume automations — content pipelines, customer support flows, data enrichment workflows — you already know the pain: every AI module call costs tokens, and those token costs add up fast.
The fix is straightforward: run your AI locally and point your Make.com HTTP modules at your local endpoint instead of OpenAI. This guide walks through exactly how to do it.
Why Local AI Changes the Economics
Standard Make.com AI module setup:
- Make.com AI module → calls OpenAI → charges per 1K tokens
- 100 scenarios/day × 2,000 tokens per run = 200,000 tokens/day
- At GPT-4o pricing: ~AU$6/day, AU$180/month per workflow
Local AI setup:
- Make.com HTTP module → calls local Ollama endpoint → no per-token charge
- Same 100 scenarios/day × 2,000 tokens per run = AU$0/day
The hardware that runs a small local model (a Mac Mini M4 or a used RTX 3080 rig) costs around AU$800-1,500. At AU$180/month per workflow, a single workflow pays that back in roughly four to eight months; run two or three such workflows on the same machine and break-even drops to around two months.
What You Need
- Make.com account (any plan with access to the HTTP module)
- Ollama installed locally (free, runs on Mac, Linux, or Windows with WSL)
- A model pulled via Ollama (ollama pull llama3.2 or ollama pull mistral)
- ngrok or a similar tunnel, since Make.com scenarios run in Make.com's cloud and can't reach localhost directly
Step 1: Install Ollama and Pull a Model
Ollama is the easiest way to run local models. Install it from ollama.ai, then open a terminal and pull the model you want:
# For general-purpose tasks
ollama pull llama3.2
# For a smaller, faster model
ollama pull phi4-mini
# For code-heavy workflows
ollama pull qwen2.5-coder
Ollama automatically starts serving an API on http://localhost:11434. You can verify it's working:
# Streams JSON chunks by default; add "stream": false for a single complete response
curl http://localhost:11434/api/generate -d '{"model": "llama3.2", "prompt": "Say hello", "stream": false}'
Step 2: Expose Your Local Endpoint
Make.com's automation engine runs in Make.com's own cloud, not on your machine. To make your local Ollama endpoint accessible to Make.com, you need to expose it via a tunnel.
Option A: ngrok (simplest)
# Install ngrok (free tier works)
# Then run:
ngrok http 11434
ngrok gives you a public URL like https://abc123.ngrok-free.app. This is what you will use in Make.com.
Option B: Cloudflare Tunnel (more stable)
If you want a free tunnel without ngrok's session limits:
# Install cloudflared, then start a quick tunnel
cloudflared tunnel --url http://localhost:11434
The quick tunnel prints a temporary trycloudflare.com URL. For a hostname that survives restarts, set up a named tunnel with a free Cloudflare account and a domain you control.
Option C: Self-hosted VPS
For production use, run Ollama on a VPS or cloud server instead of your local machine. This eliminates the tunnel requirement entirely and gives you a stable, always-on endpoint.
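A minimal setup sketch, assuming an Ubuntu VPS and the standard Ollama systemd install (the ollama service and the OLLAMA_HOST variable come with the official Linux installer):
# Install Ollama on the VPS
curl -fsSL https://ollama.com/install.sh | sh
# Ollama binds to 127.0.0.1 by default; override it to listen on all interfaces
sudo systemctl edit ollama
#   add under [Service]:  Environment="OLLAMA_HOST=0.0.0.0"
sudo systemctl restart ollama
ollama pull llama3.2
Ollama has no built-in authentication, so restrict access with a firewall rule or put a reverse proxy with an API key in front of it before exposing the port.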
Step 3: Configure Make.com HTTP Module
In Make.com, instead of using the "OpenAI" module, use the HTTP module with a custom request. Ollama serves an OpenAI-compatible API, so the request format is familiar.
Module settings:
- Method: POST
- URL: https://your-ngrok-url.ngrok-free.app/v1/chat/completions
- Headers: Content-Type: application/json (no Authorization header is needed for local Ollama)
- Body type: Raw
- Content type: JSON (application/json)
- Parse response: Yes (so the JSON fields appear in the variable picker in Step 4)
Request body:
{
"model": "llama3.2",
"messages": [
{
"role": "system",
"content": "{{1.system_prompt}}"
},
{
"role": "user",
"content": "{{1.user_input}}"
}
],
"temperature": 0.7,
"max_tokens": 500
}
Map the variables to your Make.com scenario data using the variable picker as usual.
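Before wiring this into Make.com, it's worth sending the same request from a terminal. A sketch with a hypothetical tunnel URL (substitute your own) and placeholder prompts:
curl https://abc123.ngrok-free.app/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "llama3.2",
    "messages": [
      {"role": "system", "content": "You are a helpful assistant."},
      {"role": "user", "content": "Say hello"}
    ],
    "temperature": 0.7,
    "max_tokens": 500
  }'
If this returns a chat completion, the tunnel and model are working and any remaining issue is in the Make.com module configuration.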
Step 4: Parse the Response
The Ollama response follows the same format as OpenAI's chat completions response. The text you want is at:
{{body.choices[1].message.content}}
Make.com arrays are 1-indexed, so in the HTTP module's response mapping, point a variable at body.choices[1].message.content to extract the AI's reply.
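To check the same path outside Make.com, pipe the curl test from Step 3 through jq (note that jq is 0-indexed where Make.com is 1-indexed):
curl -s https://abc123.ngrok-free.app/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"model": "llama3.2", "messages": [{"role": "user", "content": "Say hello"}]}' \
  | jq -r '.choices[0].message.content'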
Practical Use Cases
Customer Support Triage
Trigger: New support ticket submitted via Typeform → Make.com
HTTP module: Sends ticket text to local Llama model with classification prompt
Output: Routes to Slack channel based on urgent/billing/technical/general classification
With local AI, you can run this for every incoming ticket — even at 500+ tickets/day — without worrying about API bills.
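A plausible request body for the triage step, assuming the ticket text arrives from the Typeform module as {{1.ticket_text}} (the variable name is illustrative):
{
  "model": "llama3.2",
  "messages": [
    {
      "role": "system",
      "content": "Classify the support ticket as exactly one of: urgent, billing, technical, general. Respond with only the label."
    },
    {
      "role": "user",
      "content": "{{1.ticket_text}}"
    }
  ],
  "temperature": 0,
  "max_tokens": 5
}
Temperature 0 keeps the classification deterministic, and the tiny max_tokens cap stops the model from explaining itself instead of answering with a label.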
Content Enrichment Pipeline
Trigger: New row added to Airtable product database
HTTP module: Sends product title + features to local model for SEO description generation
Output: Updates Airtable row with generated description
This workflow can process thousands of products for AU$0 in AI costs.
Lead Research Summarization
Trigger: New lead added to CRM
HTTP module: Sends company name + industry to local model to generate outreach context
Output: Adds research summary to CRM lead record before sales team follow-up
Using Fine-Tuned Models in Make.com
The real power comes when you use a fine-tuned model in these workflows instead of a generic base model. If you have fine-tuned a model on your client's brand voice, customer support style, or domain-specific content, you point the Make.com HTTP module at the same Ollama endpoint and specify the fine-tuned model name.
When you fine-tune with Ertas, the output is a GGUF model file you can load directly into Ollama with a custom modelfile. The Make.com integration stays identical — only the model name in the request body changes.
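A minimal sketch of that loading step, assuming the export is named client-voice.gguf (the filename and model name here are illustrative):
# Modelfile
FROM ./client-voice.gguf
# Register the model with Ollama
ollama create client-voice -f Modelfile
In the Make.com request body, change "model": "llama3.2" to "model": "client-voice" and everything else stays the same.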
This gives you:
- Per-client customization without duplicate infrastructure
- No per-token charges regardless of volume
- Model output specifically trained on your client's data and style
Troubleshooting Common Issues
Make.com can't reach your endpoint: Check that ngrok is running and the URL hasn't changed. ngrok free tier rotates URLs on restart — use Cloudflare Tunnel or a fixed domain for stability.
Responses are slow: Local models run on your hardware. A 7B model on an M4 Mac Mini processes at ~30-50 tokens/second. For high-concurrency workflows, either run a smaller model (3B) or use server hardware with a GPU.
JSON parsing errors: Some models add markdown formatting or extra text around JSON. Add a post-processing step in Make.com to extract the relevant text, or include "respond only with raw JSON, no markdown" in your system prompt.
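If a workflow depends on machine-readable output, Ollama's native API can also force valid JSON with the format parameter. A sketch against the native /api/chat endpoint (a different route from the OpenAI-compatible one used above):
curl http://localhost:11434/api/chat -d '{
  "model": "llama3.2",
  "messages": [{"role": "user", "content": "List three product tags as JSON under the key \"tags\""}],
  "format": "json",
  "stream": false
}'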
Model output quality is lower than expected: Try a different model — Mistral 7B and Llama 3.2 perform differently on different task types. For domain-specific tasks, consider fine-tuning on your data to significantly improve quality.
The Bigger Picture
Make.com is a powerful automation layer, but its value proposition is undermined when every AI call costs money at scale. Moving to local inference is not just a cost optimization — it changes what automations are economically viable.
Workflows that were previously only profitable at low volume become viable at any volume. High-frequency tasks like content classification, entity extraction, and response generation move from "cost center" to "fixed infrastructure cost."
The combination of Make.com's automation flexibility and locally-run fine-tuned models is the foundation of a serious AI agency practice that scales without blowing up your cost structure.
Ship AI that runs on your users' devices.
Ertas early bird pricing starts at $14.50/mo — locked in for life. Plans for builders and agencies.
Further Reading
- How to Cut Your AI Agency Costs by 90% with Fine-Tuned Local Models — The full economics of moving off cloud APIs
- Running AI Models Locally — Hardware guide and Ollama deep dive
- n8n + Local LLM + HIPAA Automation — Similar setup for n8n workflows with compliance requirements