
Make.com + Local AI: Automations That Don't Bill You Per Token
Connect Make.com to a locally-running AI model and eliminate per-token API costs from your automations. Step-by-step setup guide for no-code AI builders.
Make.com is one of the most powerful tools in any AI agency's stack. But if you are building high-volume automations — content pipelines, customer support flows, data enrichment workflows — you already know the pain: every AI module call costs tokens, and those token costs add up fast.
The fix is straightforward: run your AI locally and point your Make.com HTTP modules at your local endpoint instead of OpenAI. This guide walks through exactly how to do it.
Why Local AI Changes the Economics
Standard Make.com AI module setup:
- Make.com AI module → calls OpenAI → charges per 1K tokens
- 100 scenarios/day × 2,000 tokens per run = 200,000 tokens/day
- At GPT-4o pricing: ~AU$6/day, AU$180/month per workflow
Local AI setup:
- Make.com HTTP module → calls local Ollama endpoint → no per-token charge
- Same 100 scenarios/day × 2,000 tokens per run = AU$0/day
The hardware that runs a small local model (a Mac Mini M4 or a used RTX 3080 rig) costs around AU$800-1,500. At AU$180/month per workflow, a single workflow pays that back in roughly four to eight months; run two or three such workflows on the same machine and break-even drops to around two months.
What You Need
- Make.com account (any plan with access to the HTTP module)
- Ollama installed locally (free, runs on Mac, Linux, or Windows with WSL)
- A model pulled via Ollama (ollama pull llama3.2 or ollama pull mistral)
- ngrok or a similar tunnel, since Make.com scenarios run in Make.com's cloud and can't reach localhost directly
Step 1: Install Ollama and Pull a Model
Ollama is the easiest way to run local models. Install it from ollama.ai, then open a terminal and pull the model you want:
# For general-purpose tasks
ollama pull llama3.2
# For a smaller, faster model
ollama pull phi4-mini
# For code-heavy workflows
ollama pull qwen2.5-coder
Ollama automatically starts serving an API on http://localhost:11434. You can verify it's working:
# Streams JSON chunks by default; add "stream": false for a single complete response
curl http://localhost:11434/api/generate -d '{"model": "llama3.2", "prompt": "Say hello", "stream": false}'
Step 2: Expose Your Local Endpoint
Make.com's automation engine runs in Make.com's own cloud, not on your machine. To make your local Ollama endpoint accessible to Make.com, you need to expose it via a tunnel.
Option A: ngrok (simplest)
# Install ngrok (free tier works)
# Then run:
ngrok http 11434
ngrok gives you a public URL like https://abc123.ngrok-free.app. This is what you will use in Make.com.
Option B: Cloudflare Tunnel (more stable)
If you want a free tunnel without ngrok's session limits:
# Install cloudflared, then start a quick tunnel
cloudflared tunnel --url http://localhost:11434
The quick tunnel prints a temporary trycloudflare.com URL. For a hostname that survives restarts, set up a named tunnel with a free Cloudflare account and a domain you control.
Option C: Self-hosted VPS
For production use, run Ollama on a VPS or cloud server instead of your local machine. This eliminates the tunnel requirement entirely and gives you a stable, always-on endpoint.
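A minimal setup sketch, assuming an Ubuntu VPS and the standard Ollama systemd install (the ollama service and the OLLAMA_HOST variable come with the official Linux installer):
# Install Ollama on the VPS
curl -fsSL https://ollama.com/install.sh | sh
# Ollama binds to 127.0.0.1 by default; override it to listen on all interfaces
sudo systemctl edit ollama
#   add under [Service]:  Environment="OLLAMA_HOST=0.0.0.0"
sudo systemctl restart ollama
ollama pull llama3.2
Ollama has no built-in authentication, so restrict access with a firewall rule or put a reverse proxy with an API key in front of it before exposing the port.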
Step 3: Configure Make.com HTTP Module
In Make.com, instead of using the "OpenAI" module, use the HTTP module with a custom request. Ollama serves an OpenAI-compatible API, so the request format is familiar.
Module settings:
- Method: POST
- URL: https://your-ngrok-url.ngrok-free.app/v1/chat/completions
- Headers: Content-Type: application/json (no Authorization header is needed for local Ollama)
- Body type: Raw
- Content type: JSON (application/json)
- Parse response: Yes (so the JSON fields appear in the variable picker in Step 4)
Request body:
{
"model": "llama3.2",
"messages": [
{
"role": "system",
"content": "{{1.system_prompt}}"
},
{
"role": "user",
"content": "{{1.user_input}}"
}
],
"temperature": 0.7,
"max_tokens": 500
}
Map the variables to your Make.com scenario data using the variable picker as usual.
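Before wiring this into Make.com, it's worth sending the same request from a terminal. A sketch with a hypothetical tunnel URL (substitute your own) and placeholder prompts:
curl https://abc123.ngrok-free.app/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "llama3.2",
    "messages": [
      {"role": "system", "content": "You are a helpful assistant."},
      {"role": "user", "content": "Say hello"}
    ],
    "temperature": 0.7,
    "max_tokens": 500
  }'
If this returns a chat completion, the tunnel and model are working and any remaining issue is in the Make.com module configuration.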
Step 4: Parse the Response
The Ollama response follows the same format as OpenAI's chat completions response. The text you want is at:
{{body.choices[1].message.content}}
Make.com arrays are 1-indexed, so in the HTTP module's response mapping, point a variable at body.choices[1].message.content to extract the AI's reply.
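To check the same path outside Make.com, pipe the curl test from Step 3 through jq (note that jq is 0-indexed where Make.com is 1-indexed):
curl -s https://abc123.ngrok-free.app/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"model": "llama3.2", "messages": [{"role": "user", "content": "Say hello"}]}' \
  | jq -r '.choices[0].message.content'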
Practical Use Cases
Customer Support Triage
Trigger: New support ticket submitted via Typeform → Make.com
HTTP module: Sends ticket text to local Llama model with classification prompt
Output: Routes to Slack channel based on urgent/billing/technical/general classification
With local AI, you can run this for every incoming ticket — even at 500+ tickets/day — without worrying about API bills.
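A plausible request body for the triage step, assuming the ticket text arrives from the Typeform module as {{1.ticket_text}} (the variable name is illustrative):
{
  "model": "llama3.2",
  "messages": [
    {
      "role": "system",
      "content": "Classify the support ticket as exactly one of: urgent, billing, technical, general. Respond with only the label."
    },
    {
      "role": "user",
      "content": "{{1.ticket_text}}"
    }
  ],
  "temperature": 0,
  "max_tokens": 5
}
Temperature 0 keeps the classification deterministic, and the tiny max_tokens cap stops the model from explaining itself instead of answering with a label.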
Content Enrichment Pipeline
Trigger: New row added to Airtable product database
HTTP module: Sends product title + features to local model for SEO description generation
Output: Updates Airtable row with generated description
This workflow can process thousands of products for AU$0 in AI costs.
Lead Research Summarization
Trigger: New lead added to CRM
HTTP module: Sends company name + industry to local model to generate outreach context
Output: Adds research summary to CRM lead record before sales team follow-up
Using Fine-Tuned Models in Make.com
The real power comes when you use a fine-tuned model in these workflows instead of a generic base model. If you have fine-tuned a model on your client's brand voice, customer support style, or domain-specific content, you point the Make.com HTTP module at the same Ollama endpoint and specify the fine-tuned model name.
When you fine-tune with Ertas, the output is a GGUF model file you can load directly into Ollama with a custom modelfile. The Make.com integration stays identical — only the model name in the request body changes.
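A minimal sketch of that loading step, assuming the export is named client-voice.gguf (the filename and model name here are illustrative):
# Modelfile
FROM ./client-voice.gguf
# Register the model with Ollama
ollama create client-voice -f Modelfile
In the Make.com request body, change "model": "llama3.2" to "model": "client-voice" and everything else stays the same.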
This gives you:
- Per-client customization without duplicate infrastructure
- No per-token charges regardless of volume
- Model output specifically trained on your client's data and style
Troubleshooting Common Issues
Make.com can't reach your endpoint: Check that ngrok is running and the URL hasn't changed. ngrok free tier rotates URLs on restart — use Cloudflare Tunnel or a fixed domain for stability.
Responses are slow: Local models run on your hardware. A 7B model on an M4 Mac Mini processes at ~30-50 tokens/second. For high-concurrency workflows, either run a smaller model (3B) or use server hardware with a GPU.
JSON parsing errors: Some models add markdown formatting or extra text around JSON. Add a post-processing step in Make.com to extract the relevant text, or include "respond only with raw JSON, no markdown" in your system prompt.
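If a workflow depends on machine-readable output, Ollama's native API can also force valid JSON with the format parameter. A sketch against the native /api/chat endpoint (a different route from the OpenAI-compatible one used above):
curl http://localhost:11434/api/chat -d '{
  "model": "llama3.2",
  "messages": [{"role": "user", "content": "List three product tags as JSON under the key \"tags\""}],
  "format": "json",
  "stream": false
}'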
Model output quality is lower than expected: Try a different model — Mistral 7B and Llama 3.2 perform differently on different task types. For domain-specific tasks, consider fine-tuning on your data to significantly improve quality.
The Bigger Picture
Make.com is a powerful automation layer, but its value proposition is undermined when every AI call costs money at scale. Moving to local inference is not just a cost optimization — it changes what automations are economically viable.
Workflows that were previously only profitable at low volume become viable at any volume. High-frequency tasks like content classification, entity extraction, and response generation move from "cost center" to "fixed infrastructure cost."
The combination of Make.com's automation flexibility and locally-run fine-tuned models is the foundation of a serious AI agency practice that scales without blowing up your cost structure.
Ship AI that runs on your users' devices.
Ertas early bird pricing starts at $14.50/mo — locked in for life. Plans for builders and agencies.
Further Reading
- How to Cut Your AI Agency Costs by 90% with Fine-Tuned Local Models — The full economics of moving off cloud APIs
- Running AI Models Locally — Hardware guide and Ollama deep dive
- n8n + Local LLM + HIPAA Automation — Similar setup for n8n workflows with compliance requirements