
Fine-Tuned Tool Calling for n8n and Make.com Workflows
Replace the OpenAI node in your n8n or Make.com workflow with a fine-tuned local model. Same tool routing, same structured output, zero API cost. Here's the exact pattern — from extracting training data from workflow logs to deploying via Ollama.
You have an n8n workflow that works. A customer sends a message, an OpenAI node decides which action to take, and the workflow branches accordingly. Lookup order, check status, issue refund, escalate to human. It runs 500 times a day. The OpenAI bill is $450/month and climbing.
Here is the pattern that eliminates that cost entirely: extract training data from your existing workflow executions, fine-tune a small model on your specific tool schema, deploy it locally with Ollama, and swap the OpenAI node for an HTTP Request node pointing at localhost.
Same workflow. Same routing logic. Same structured output. Zero API cost.
The Architecture
The current pattern in most n8n AI workflows:
Webhook → OpenAI Node (tool selection) → Switch Node → Action Branches
The replacement:
Webhook → HTTP Request (local Ollama) → Switch Node → Action Branches
The only node that changes is the AI decision point. Everything downstream — the Switch node, the action branches, the database writes, the email sends — stays identical.
The fine-tuned model's job is narrow: receive a user message, output a tool name and parameters as JSON. It does not need to be a general-purpose assistant. It does not need to write poetry or summarize documents. It needs to classify intent and extract parameters for your specific tools.
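Concretely, for this workflow the entire expected output for a given message is one small JSON object in the tool-call shape used throughout this post:

{"tool_calls": [{"function": {"name": "lookup_order", "arguments": "{\"order_id\": \"ORD-48291\"}"}}]}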
The Example: An 8-Tool Customer Support Workflow
We will walk through a real workflow with eight tools:
| Tool | What It Does | Trigger Example |
|---|---|---|
| lookup_order | Find order by ID or email | "Where's my order ORD-48291?" |
| check_status | Get current order status | "Has my order shipped yet?" |
| initiate_refund | Start refund process | "I want my money back" |
| update_address | Change shipping address | "I moved to 123 Oak St" |
| send_notification | Send email/SMS to customer | "Can you email me a receipt?" |
| check_inventory | Check product availability | "Do you have the blue widget in stock?" |
| apply_discount | Apply promo code to order | "I have a coupon code SAVE20" |
| escalate_to_human | Transfer to human agent | "Let me talk to a person" |
In n8n, this workflow looks like: Webhook trigger, OpenAI node with tool definitions, Switch node on the tool name, eight branches each executing the appropriate action.
The OpenAI node costs $0.02-0.04 per execution depending on the prompt length and model. At 500 executions per day, that is $300-600/month just for the routing decision.
Step 1: Extract Training Data from Workflow Logs
n8n stores execution data for every workflow run. This is your training data source — real user messages paired with the actual tool calls that worked.
In n8n, go to Executions in your workflow and filter for successful runs. Each execution contains the input (user message) and the OpenAI node's output (tool selection + parameters).
Export this data. You can use the n8n API or a Code node to extract it:
// n8n Code Node — extract training pairs from execution history.
// includeData=true is needed for the API to return node run data;
// the endpoint paginates, so follow nextCursor if you need more runs.
const executions = await this.helpers.httpRequest({
  method: 'GET',
  url: 'http://localhost:5678/api/v1/executions',
  headers: { 'X-N8N-API-KEY': 'your-api-key' },
  qs: {
    workflowId: 'your-workflow-id',
    status: 'success',
    includeData: true,
    limit: 1000
  }
});

const trainingPairs = executions.data
  // keep only runs where both nodes actually executed
  .filter(exec => {
    const runData = exec.data.resultData.runData;
    return runData['Webhook'] && runData['OpenAI'];
  })
  .map(exec => {
    const webhookData = exec.data.resultData.runData['Webhook'][0];
    const openaiData = exec.data.resultData.runData['OpenAI'][0];
    return {
      userMessage: webhookData.data.main[0][0].json.body.message,
      toolCall: openaiData.data.main[0][0].json.tool_calls[0]
    };
  });

// Code nodes must return items wrapped as { json: ... } objects
return trainingPairs.map(pair => ({ json: pair }));
This gives you real input-output pairs. If you have been running the workflow for a month at 500 executions/day, that is 15,000 labeled examples. More than enough.
Filter for quality: Remove executions where the user message was empty, where the tool call was followed by an error in the downstream branch (indicating a wrong routing decision), or where the workflow was manually corrected.
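As a sketch, that filter can run inside the same Code node before the return. The downstreamError flag is hypothetical; substitute whatever failure signal your execution data actually records for the routed branch:

const cleanPairs = trainingPairs.filter(pair =>
  pair.userMessage &&
  pair.userMessage.trim().length > 0 &&   // drop empty messages
  pair.toolCall?.function?.name &&        // drop runs with no usable tool call
  !pair.downstreamError                   // hypothetical flag: branch errored downstream
);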
Step 2: Format for Fine-Tuning
Convert your extracted pairs into the standard JSONL training format:
{"messages": [{"role": "system", "content": "You are a customer support router. Given a customer message, call the appropriate tool. Available tools: lookup_order, check_status, initiate_refund, update_address, send_notification, check_inventory, apply_discount, escalate_to_human."}, {"role": "user", "content": "Where's my order ORD-48291?"}, {"role": "assistant", "tool_calls": [{"function": {"name": "lookup_order", "arguments": "{\"order_id\": \"ORD-48291\"}"}}]}]}
Include the full tool schemas in the system message. Keep it identical across all examples — this is the same system prompt you will use at inference time.
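A short Node.js script handles the conversion. This is a sketch: clean-pairs.json stands in for wherever you saved the filtered pairs from Step 1.

// Convert extracted pairs to JSONL for fine-tuning.
const fs = require('fs');

const SYSTEM_PROMPT =
  'You are a customer support router. Given a customer message, call the ' +
  'appropriate tool. Available tools: lookup_order, check_status, ' +
  'initiate_refund, update_address, send_notification, check_inventory, ' +
  'apply_discount, escalate_to_human. Respond with a JSON tool call or a ' +
  'conversational message if no tool applies.';

const pairs = JSON.parse(fs.readFileSync('clean-pairs.json', 'utf8'));

const jsonl = pairs.map(pair => JSON.stringify({
  messages: [
    { role: 'system', content: SYSTEM_PROMPT },  // identical in every example
    { role: 'user', content: pair.userMessage },
    { role: 'assistant', tool_calls: [pair.toolCall] }
  ]
})).join('\n');

fs.writeFileSync('training.jsonl', jsonl);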
For the 8-tool workflow, aim for this distribution:
| Tool | Training Examples | % |
|---|---|---|
| lookup_order | 150 | 14% |
| check_status | 150 | 14% |
| initiate_refund | 100 | 9% |
| update_address | 80 | 7% |
| send_notification | 80 | 7% |
| check_inventory | 100 | 9% |
| apply_discount | 80 | 7% |
| escalate_to_human | 60 | 6% |
| No tool (negative) | 300 | 27% |
| Total | 1,100 | 100% |
The distribution should roughly match your real traffic. If lookup_order accounts for 30% of your requests, it should have the most examples. If you do not have enough real data for a tool, use synthetic expansion to fill the gaps.
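One way to expand synthetically without any external API is template-based generation: hand-write a few phrasings per tool and fill the slots with plausible values. A sketch for apply_discount, where the phrasings, the code format, and the promo_code parameter name are all illustrative:

const templates = [
  'I have a coupon code {code}',
  'Can you apply {code} to my order?',
  'Does the promo {code} still work?'
];

// Generate a plausible-looking promo code for each synthetic example.
const randomCode = () => 'SAVE' + Math.floor(10 + Math.random() * 40);

const synthetic = [];
for (const phrasing of templates) {
  for (let i = 0; i < 10; i++) {
    const code = randomCode();
    synthetic.push({
      userMessage: phrasing.replace('{code}', code),
      toolCall: { function: { name: 'apply_discount', arguments: JSON.stringify({ promo_code: code }) } }
    });
  }
}

Repeat per tool, and vary the phrasings more than the slot values: the model learns wording patterns, not specific codes.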
Step 3: Fine-Tune the Model
For tool-calling routing, a 7B or 8B model is more than sufficient. Qwen 2.5 7B and Llama 3.1 8B both work well. You are training a classifier with structured output, not a general assistant.
With Ertas, the fine-tuning process is:
- Upload your JSONL dataset
- Select your base model (Qwen 2.5 7B recommended for tool calling)
- Configure LoRA — rank 16, alpha 32, target all attention layers
- Train for 3 epochs with learning rate 2e-4
Training time on a single GPU: 30-45 minutes for 1,100 examples. The output is a LoRA adapter — a 50-100MB file that layers on top of the base model.
If you do not have enough real execution data yet, build a synthetic dataset from your tool schemas first, train an initial model, deploy it, and retrain on real data after a few weeks.
Step 4: Deploy with Ollama
Export your fine-tuned model in GGUF format and create an Ollama Modelfile:
FROM ./qwen2.5-7b-customer-tools.gguf
PARAMETER temperature 0.1
PARAMETER num_ctx 2048
SYSTEM """You are a customer support router. Given a customer message, call the appropriate tool. Available tools: lookup_order, check_status, initiate_refund, update_address, send_notification, check_inventory, apply_discount, escalate_to_human. Respond with a JSON tool call or a conversational message if no tool applies."""
ollama create customer-router -f Modelfile
ollama run customer-router
Test it:
curl http://localhost:11434/api/chat -d '{
  "model": "customer-router",
  "messages": [{"role": "user", "content": "I want a refund for ORD-55123, it was defective"}],
  "stream": false,
  "format": "json"
}'
Expected output (Ollama wraps this in its response envelope; with "format": "json" set, this JSON arrives as a string in the message.content field):
{
"tool_calls": [{
"function": {
"name": "initiate_refund",
"arguments": "{\"order_id\": \"ORD-55123\", \"reason\": \"defective\"}"
}
}]
}
Typical response time on a machine with 16GB of RAM: 200-400ms. Fast enough for real-time workflow execution.
Step 5: Replace the OpenAI Node in n8n
In your n8n workflow, replace the OpenAI node with an HTTP Request node:
HTTP Request node configuration:
- Method: POST
- URL: http://localhost:11434/api/chat
- Body (JSON):
{
"model": "customer-router",
"messages": [
{
"role": "user",
"content": "={{ $json.body.message }}"
}
],
"stream": false,
"format": "json"
}
The response from Ollama carries the same tool-call content in a different envelope: because the request sets "format": "json", the model's output arrives as a JSON string in the response's message.content field. Everything downstream still routes on the tool name exactly like before.
Update the Switch node to parse the tool name from message.content instead of from the OpenAI response format. The tool name is the same; only the JSON path changes.
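For reference, a sketch of the before/after expressions. Both paths are assumptions to verify against your actual node output; the OpenAI path in particular varies by node version:

// Before: OpenAI response shape (verify against your node's output)
{{ $json.choices[0].message.tool_calls[0].function.name }}

// After: Ollama /api/chat with format "json"; parse the string in message.content
{{ JSON.parse($json.message.content).tool_calls[0].function.name }}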
The Cost Comparison
Here is the real math at different automation volumes:
| Daily Executions | Monthly Total | OpenAI Cost (GPT-4o-mini) | OpenAI Cost (GPT-4o) | Fine-Tuned Local | Monthly Savings |
|---|---|---|---|---|---|
| 100 | 3,000 | $60/mo | $90/mo | $0/mo | $60-90/mo |
| 500 | 15,000 | $300/mo | $450/mo | $0/mo | $300-450/mo |
| 2,000 | 60,000 | $1,200/mo | $1,800/mo | $0/mo | $1,200-1,800/mo |
| 5,000 | 150,000 | $3,000/mo | $4,500/mo | $0/mo | $3,000-4,500/mo |
The "Fine-Tuned Local" column is $0 because inference on your own hardware has no marginal cost. You are already paying for the server. The electricity cost for inference is negligible — under $5/month even at 5,000 daily executions.
One-time costs: fine-tuning compute ($5-20 on cloud GPU, or free on Ertas), plus 4-8 hours of your time building the dataset and testing.
Break-even on the compute cost at 100 daily executions: less than one week.
Make.com Equivalent
The same pattern works in Make.com. Replace the OpenAI module with an HTTP module calling your local Ollama endpoint.
HTTP Module configuration:
- URL: http://your-server-ip:11434/api/chat
- Method: POST
- Body type: Raw (JSON)
- Request content:
{
"model": "customer-router",
"messages": [{"role": "user", "content": "{{1.body.message}}"}],
"stream": false,
"format": "json"
}
If your Make.com instance runs in the cloud and your Ollama instance runs on-premise, you will need to expose the Ollama endpoint. Options: Cloudflare Tunnel (free), ngrok, or a VPN. For production, use Cloudflare Tunnel with access policies to restrict who can reach the endpoint.
Make.com's per-operation pricing does not penalize the switch. The OpenAI module counts as one operation and you pay OpenAI separately on top; the HTTP module calling a local endpoint also counts as one operation, so your Make.com bill stays the same while the OpenAI cost disappears entirely.
Reliability Improvement
This is the underrated benefit. Cost savings get attention, but reliability is often the bigger win.
A generic GPT-4 model occasionally hallucinates tool names. It might output search_order instead of lookup_order, or refund_order instead of initiate_refund. Close enough for a human to understand, wrong enough to fall through your Switch node without matching any branch.
A fine-tuned model trained exclusively on your eight tool names will never hallucinate a ninth. It has only ever seen these eight names in its training data. The output space is constrained to your actual schema.
In testing across customer support workflows, fine-tuned 7B models achieve:
- Tool selection accuracy: 94-97% (vs 89-93% for GPT-4o-mini with prompt engineering)
- Parameter extraction accuracy: 91-95% (vs 87-92% for GPT-4o-mini)
- Schema compliance: 99.5%+ (vs 96-98% for GPT-4o-mini)
- Hallucinated tool names: 0% (vs 1-3% for generic models)
That 1-3% hallucination rate on generic models does not sound like much until you are running 500 executions per day. At 2%, that is 10 failed executions daily — 10 customers who hit an error, 10 support tickets, 10 manual interventions.
Handling Edge Cases
What about messages that do not match any tool? The model needs to respond conversationally.
Include negative examples in your training data — messages like "What are your business hours?" or "Thanks for the help!" that should not trigger any tool. Train the model to output a conversational response for these.
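In the training format, a negative example simply carries a plain assistant message instead of a tool call. The reply text here is illustrative; the system message is the same one used in every other example:

{"messages": [{"role": "system", "content": "You are a customer support router. Given a customer message, call the appropriate tool. Available tools: lookup_order, check_status, initiate_refund, update_address, send_notification, check_inventory, apply_discount, escalate_to_human. Respond with a JSON tool call or a conversational message if no tool applies."}, {"role": "user", "content": "What are your business hours?"}, {"role": "assistant", "content": "Our support team is available 9am to 6pm, Monday through Friday. Is there anything order-related I can help with?"}]}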
What about messages that could match multiple tools? "I want to return my order and change my address" — this needs two tool calls. Train multi-tool examples if your workflow supports sequential execution, or train the model to pick the primary intent and handle the second in a follow-up turn.
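If your workflow does support sequential execution, a multi-tool training example carries both calls in one assistant turn. The system message is abbreviated here, and the parameter names and values are illustrative, so match them to your real schema:

{"messages": [{"role": "system", "content": "...same system prompt as every other example..."}, {"role": "user", "content": "I want to return my order and change my address to 123 Oak St"}, {"role": "assistant", "tool_calls": [{"function": {"name": "initiate_refund", "arguments": "{\"reason\": \"return\"}"}}, {"function": {"name": "update_address", "arguments": "{\"new_address\": \"123 Oak St\"}"}}]}]}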
What about new tools? When you add a ninth tool to your workflow, you need to retrain. Add examples for the new tool to your dataset, include negative examples that distinguish it from similar existing tools, and run a quick fine-tune. With LoRA, retraining takes 30-45 minutes. You can do it during a lunch break.
Making the Switch
Here is the migration checklist:
- Export execution history from your n8n/Make.com workflow (minimum 500 successful runs)
- Format as JSONL with proper tool-call structure and 20-30% negative examples, matching the distribution above
- Fine-tune a 7B model with LoRA (30-45 minutes)
- Deploy to Ollama on the same machine or network as your workflow engine
- Replace the OpenAI node with an HTTP Request node pointing at localhost:11434
- Test with 50 real messages before switching production traffic
- Monitor the first 1,000 executions, comparing tool selection accuracy against the OpenAI baseline (see the shadow-test sketch after this list)
- Retrain monthly with new execution data for continuous improvement
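A minimal shadow-test sketch for that comparison, assuming Node 18+ (for the global fetch), the endpoints used earlier in this post, and your own tool definitions filled in:

const TOOL_SCHEMAS = [ /* your existing OpenAI tool definitions */ ];

async function compareRouting(message) {
  const [openai, local] = await Promise.all([
    fetch('https://api.openai.com/v1/chat/completions', {
      method: 'POST',
      headers: {
        'Authorization': `Bearer ${process.env.OPENAI_API_KEY}`,
        'Content-Type': 'application/json'
      },
      body: JSON.stringify({
        model: 'gpt-4o-mini',
        messages: [{ role: 'user', content: message }],
        tools: TOOL_SCHEMAS
      })
    }).then(r => r.json()),
    fetch('http://localhost:11434/api/chat', {
      method: 'POST',
      headers: { 'Content-Type': 'application/json' },
      body: JSON.stringify({
        model: 'customer-router',
        messages: [{ role: 'user', content: message }],
        stream: false,
        format: 'json'
      })
    }).then(r => r.json())
  ]);

  // Pull the chosen tool name from each response shape.
  const openaiTool = openai.choices[0].message.tool_calls?.[0]?.function.name;
  const localTool = JSON.parse(local.message.content).tool_calls?.[0]?.function.name;
  return { message, openaiTool, localTool, match: openaiTool === localTool };
}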
The hardest part is not any step on that list. It is getting comfortable with the idea that a 7B model running on your own hardware can match GPT-4's routing accuracy for your specific tools. Once you see the test results, the rest is straightforward plumbing.
Your workflows keep running. Your automation keeps working. Your API bill drops to zero.
Further Reading
- How to Replace the OpenAI Node in n8n with a Fine-Tuned Model — Step-by-step migration guide with node-by-node configuration screenshots.
- Make.com with Local AI: The Complete Integration Guide — Full walkthrough of connecting Make.com HTTP modules to local Ollama endpoints.
- n8n + Ollama + Fine-Tuned Models: The Zero-Cost Stack — Architecture guide for running the complete automation stack on a single server with no recurring costs.
Ship AI that runs on your users' devices.
Early bird pricing starts at $14.50/mo — locked in for life. Plans for builders and agencies.