
Fine-Tuned Tool Calling for n8n and Make.com Workflows
Replace the OpenAI node in your n8n or Make.com workflow with a fine-tuned local model. Same tool routing, same structured output, zero API cost. Here's the exact pattern — from extracting training data from workflow logs to deploying via Ollama.
You have an n8n workflow that works. A customer sends a message, an OpenAI node decides which action to take, and the workflow branches accordingly. Lookup order, check status, issue refund, escalate to human. It runs 500 times a day. The OpenAI bill is $450/month and climbing.
Here is the pattern that eliminates that cost entirely: extract training data from your existing workflow executions, fine-tune a small model on your specific tool schema, deploy it locally with Ollama, and swap the OpenAI node for an HTTP Request node pointing at localhost.
Same workflow. Same routing logic. Same structured output. Zero API cost.
The Architecture
The current pattern in most n8n AI workflows:
Webhook → OpenAI Node (tool selection) → Switch Node → Action Branches
The replacement:
Webhook → HTTP Request (local Ollama) → Switch Node → Action Branches
The only node that changes is the AI decision point. Everything downstream — the Switch node, the action branches, the database writes, the email sends — stays identical.
The fine-tuned model's job is narrow: receive a user message, output a tool name and parameters as JSON. It does not need to be a general-purpose assistant. It does not need to write poetry or summarize documents. It needs to classify intent and extract parameters for your specific tools.
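Concretely, for this workflow the entire expected output for a given message is one small JSON object in the tool-call shape used throughout this post:

{"tool_calls": [{"function": {"name": "lookup_order", "arguments": "{\"order_id\": \"ORD-48291\"}"}}]}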
The Example: An 8-Tool Customer Support Workflow
We will walk through a real workflow with eight tools:
| Tool | What It Does | Trigger Example |
|---|---|---|
| lookup_order | Find order by ID or email | "Where's my order ORD-48291?" |
| check_status | Get current order status | "Has my order shipped yet?" |
| initiate_refund | Start refund process | "I want my money back" |
| update_address | Change shipping address | "I moved to 123 Oak St" |
| send_notification | Send email/SMS to customer | "Can you email me a receipt?" |
| check_inventory | Check product availability | "Do you have the blue widget in stock?" |
| apply_discount | Apply promo code to order | "I have a coupon code SAVE20" |
| escalate_to_human | Transfer to human agent | "Let me talk to a person" |
In n8n, this workflow looks like: Webhook trigger, OpenAI node with tool definitions, Switch node on the tool name, eight branches each executing the appropriate action.
The OpenAI node costs $0.02-0.04 per execution depending on the prompt length and model. At 500 executions per day, that is $300-600/month just for the routing decision.
Step 1: Extract Training Data from Workflow Logs
n8n stores execution data for every workflow run. This is your training data source — real user messages paired with the actual tool calls that worked.
In n8n, go to Executions in your workflow and filter for successful runs. Each execution contains the input (user message) and the OpenAI node's output (tool selection + parameters).
Export this data. You can use the n8n API or a Code node to extract it:
// n8n Code Node — extract training pairs from execution history.
// includeData=true is needed for the API to return node run data;
// the endpoint paginates, so follow nextCursor if you need more runs.
const executions = await this.helpers.httpRequest({
  method: 'GET',
  url: 'http://localhost:5678/api/v1/executions',
  headers: { 'X-N8N-API-KEY': 'your-api-key' },
  qs: {
    workflowId: 'your-workflow-id',
    status: 'success',
    includeData: true,
    limit: 1000
  }
});

const trainingPairs = executions.data
  // keep only runs where both nodes actually executed
  .filter(exec => {
    const runData = exec.data.resultData.runData;
    return runData['Webhook'] && runData['OpenAI'];
  })
  .map(exec => {
    const webhookData = exec.data.resultData.runData['Webhook'][0];
    const openaiData = exec.data.resultData.runData['OpenAI'][0];
    return {
      userMessage: webhookData.data.main[0][0].json.body.message,
      toolCall: openaiData.data.main[0][0].json.tool_calls[0]
    };
  });

// Code nodes must return items wrapped as { json: ... } objects
return trainingPairs.map(pair => ({ json: pair }));
This gives you real input-output pairs. If you have been running the workflow for a month at 500 executions/day, that is 15,000 labeled examples. More than enough.
Filter for quality: Remove executions where the user message was empty, where the tool call was followed by an error in the downstream branch (indicating a wrong routing decision), or where the workflow was manually corrected.
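As a sketch, that filter can run inside the same Code node before the return. The downstreamError flag is hypothetical; substitute whatever failure signal your execution data actually records for the routed branch:

const cleanPairs = trainingPairs.filter(pair =>
  pair.userMessage &&
  pair.userMessage.trim().length > 0 &&   // drop empty messages
  pair.toolCall?.function?.name &&        // drop runs with no usable tool call
  !pair.downstreamError                   // hypothetical flag: branch errored downstream
);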
Step 2: Format for Fine-Tuning
Convert your extracted pairs into the standard JSONL training format:
{"messages": [{"role": "system", "content": "You are a customer support router. Given a customer message, call the appropriate tool. Available tools: lookup_order, check_status, initiate_refund, update_address, send_notification, check_inventory, apply_discount, escalate_to_human."}, {"role": "user", "content": "Where's my order ORD-48291?"}, {"role": "assistant", "tool_calls": [{"function": {"name": "lookup_order", "arguments": "{\"order_id\": \"ORD-48291\"}"}}]}]}
Include the full tool schemas in the system message. Keep it identical across all examples — this is the same system prompt you will use at inference time.
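A short Node.js script handles the conversion. This is a sketch: clean-pairs.json stands in for wherever you saved the filtered pairs from Step 1.

// Convert extracted pairs to JSONL for fine-tuning.
const fs = require('fs');

const SYSTEM_PROMPT =
  'You are a customer support router. Given a customer message, call the ' +
  'appropriate tool. Available tools: lookup_order, check_status, ' +
  'initiate_refund, update_address, send_notification, check_inventory, ' +
  'apply_discount, escalate_to_human. Respond with a JSON tool call or a ' +
  'conversational message if no tool applies.';

const pairs = JSON.parse(fs.readFileSync('clean-pairs.json', 'utf8'));

const jsonl = pairs.map(pair => JSON.stringify({
  messages: [
    { role: 'system', content: SYSTEM_PROMPT },  // identical in every example
    { role: 'user', content: pair.userMessage },
    { role: 'assistant', tool_calls: [pair.toolCall] }
  ]
})).join('\n');

fs.writeFileSync('training.jsonl', jsonl);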
For the 8-tool workflow, aim for this distribution:
| Tool | Training Examples | % |
|---|---|---|
| lookup_order | 150 | 14% |
| check_status | 150 | 14% |
| initiate_refund | 100 | 9% |
| update_address | 80 | 7% |
| send_notification | 80 | 7% |
| check_inventory | 100 | 9% |
| apply_discount | 80 | 7% |
| escalate_to_human | 60 | 6% |
| No tool (negative) | 300 | 27% |
| Total | 1,100 | 100% |
The distribution should roughly match your real traffic. If lookup_order accounts for 30% of your requests, it should have the most examples. If you do not have enough real data for a tool, use synthetic expansion to fill the gaps.
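One way to expand synthetically without any external API is template-based generation: hand-write a few phrasings per tool and fill the slots with plausible values. A sketch for apply_discount, where the phrasings, the code format, and the promo_code parameter name are all illustrative:

const templates = [
  'I have a coupon code {code}',
  'Can you apply {code} to my order?',
  'Does the promo {code} still work?'
];

// Generate a plausible-looking promo code for each synthetic example.
const randomCode = () => 'SAVE' + Math.floor(10 + Math.random() * 40);

const synthetic = [];
for (const phrasing of templates) {
  for (let i = 0; i < 10; i++) {
    const code = randomCode();
    synthetic.push({
      userMessage: phrasing.replace('{code}', code),
      toolCall: { function: { name: 'apply_discount', arguments: JSON.stringify({ promo_code: code }) } }
    });
  }
}

Repeat per tool, and vary the phrasings more than the slot values: the model learns wording patterns, not specific codes.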
Step 3: Fine-Tune the Model
For tool-calling routing, a 7B or 8B model is more than sufficient. Qwen 2.5 7B and Llama 3.1 8B both work well. You are training a classifier with structured output, not a general assistant.
With Ertas, the fine-tuning process is:
- Upload your JSONL dataset
- Select your base model (Qwen 2.5 7B recommended for tool calling)
- Configure LoRA — rank 16, alpha 32, target all attention layers
- Train for 3 epochs with learning rate 2e-4
Training time on a single GPU: 30-45 minutes for 1,100 examples. The output is a LoRA adapter — a 50-100MB file that layers on top of the base model.
If you do not have enough real execution data yet, build a synthetic dataset from your tool schemas first, train an initial model, deploy it, and retrain on real data after a few weeks.
Step 4: Deploy with Ollama
Export your fine-tuned model in GGUF format and create an Ollama Modelfile:
FROM ./qwen2.5-7b-customer-tools.gguf
PARAMETER temperature 0.1
PARAMETER num_ctx 2048
SYSTEM """You are a customer support router. Given a customer message, call the appropriate tool. Available tools: lookup_order, check_status, initiate_refund, update_address, send_notification, check_inventory, apply_discount, escalate_to_human. Respond with a JSON tool call or a conversational message if no tool applies."""
ollama create customer-router -f Modelfile
ollama run customer-router
Test it:
curl http://localhost:11434/api/chat -d '{
  "model": "customer-router",
  "messages": [{"role": "user", "content": "I want a refund for ORD-55123, it was defective"}],
  "stream": false,
  "format": "json"
}'
Expected output (Ollama wraps this in its response envelope; with "format": "json" set, this JSON arrives as a string in the message.content field):
{
"tool_calls": [{
"function": {
"name": "initiate_refund",
"arguments": "{\"order_id\": \"ORD-55123\", \"reason\": \"defective\"}"
}
}]
}
Typical response time on a machine with 16GB of RAM: 200-400ms. Fast enough for real-time workflow execution.
Step 5: Replace the OpenAI Node in n8n
In your n8n workflow, replace the OpenAI node with an HTTP Request node:
HTTP Request node configuration:
- Method: POST
- URL: http://localhost:11434/api/chat
- Body (JSON):
{
"model": "customer-router",
"messages": [
{
"role": "user",
"content": "={{ $json.body.message }}"
}
],
"stream": false,
"format": "json"
}
The response from Ollama carries the same tool-call content in a different envelope: because the request sets "format": "json", the model's output arrives as a JSON string in the response's message.content field. Everything downstream still routes on the tool name exactly like before.
Update the Switch node to parse the tool name from message.content instead of from the OpenAI response format. The tool name is the same; only the JSON path changes.
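For reference, a sketch of the before/after expressions. Both paths are assumptions to verify against your actual node output; the OpenAI path in particular varies by node version:

// Before: OpenAI response shape (verify against your node's output)
{{ $json.choices[0].message.tool_calls[0].function.name }}

// After: Ollama /api/chat with format "json"; parse the string in message.content
{{ JSON.parse($json.message.content).tool_calls[0].function.name }}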
The Cost Comparison
Here is the real math at different automation volumes:
| Daily Executions | Monthly Total | OpenAI Cost (GPT-4o-mini) | OpenAI Cost (GPT-4o) | Fine-Tuned Local | Monthly Savings |
|---|---|---|---|---|---|
| 100 | 3,000 | $60/mo | $90/mo | $0/mo | $60-90/mo |
| 500 | 15,000 | $300/mo | $450/mo | $0/mo | $300-450/mo |
| 2,000 | 60,000 | $1,200/mo | $1,800/mo | $0/mo | $1,200-1,800/mo |
| 5,000 | 150,000 | $3,000/mo | $4,500/mo | $0/mo | $3,000-4,500/mo |
The "Fine-Tuned Local" column is $0 because inference on your own hardware has no marginal cost. You are already paying for the server. The electricity cost for inference is negligible — under $5/month even at 5,000 daily executions.
One-time costs: fine-tuning compute ($5-20 on cloud GPU, or free on Ertas), plus 4-8 hours of your time building the dataset and testing.
Break-even on the compute cost at 100 daily executions: less than one week.
Make.com Equivalent
The same pattern works in Make.com. Replace the OpenAI module with an HTTP module calling your local Ollama endpoint.
HTTP Module configuration:
- URL: http://your-server-ip:11434/api/chat
- Method: POST
- Body type: Raw (JSON)
- Request content:
{
"model": "customer-router",
"messages": [{"role": "user", "content": "{{1.body.message}}"}],
"stream": false,
"format": "json"
}
If your Make.com instance runs in the cloud and your Ollama instance runs on-premise, you will need to expose the Ollama endpoint. Options: Cloudflare Tunnel (free), ngrok, or a VPN. For production, use Cloudflare Tunnel with access policies to restrict who can reach the endpoint.
Make.com's per-operation pricing does not penalize the switch. The OpenAI module counts as one operation and you pay OpenAI separately on top; the HTTP module calling a local endpoint also counts as one operation, so your Make.com bill stays the same while the OpenAI cost disappears entirely.
Reliability Improvement
This is the underrated benefit. Cost savings get attention, but reliability is often the bigger win.
A generic GPT-4 model occasionally hallucinates tool names. It might output search_order instead of lookup_order, or refund_order instead of initiate_refund. Close enough for a human to understand, wrong enough to fall through your Switch node without matching any branch.
A fine-tuned model trained exclusively on your eight tool names will never hallucinate a ninth. It has only ever seen these eight names in its training data. The output space is constrained to your actual schema.
In testing across customer support workflows, fine-tuned 7B models achieve:
- Tool selection accuracy: 94-97% (vs 89-93% for GPT-4o-mini with prompt engineering)
- Parameter extraction accuracy: 91-95% (vs 87-92% for GPT-4o-mini)
- Schema compliance: 99.5%+ (vs 96-98% for GPT-4o-mini)
- Hallucinated tool names: 0% (vs 1-3% for generic models)
That 1-3% hallucination rate on generic models does not sound like much until you are running 500 executions per day. At 2%, that is 10 failed executions daily — 10 customers who hit an error, 10 support tickets, 10 manual interventions.
Handling Edge Cases
What about messages that do not match any tool? The model needs to respond conversationally.
Include negative examples in your training data — messages like "What are your business hours?" or "Thanks for the help!" that should not trigger any tool. Train the model to output a conversational response for these.
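In the training format, a negative example simply carries a plain assistant message instead of a tool call. The reply text here is illustrative; the system message is the same one used in every other example:

{"messages": [{"role": "system", "content": "You are a customer support router. Given a customer message, call the appropriate tool. Available tools: lookup_order, check_status, initiate_refund, update_address, send_notification, check_inventory, apply_discount, escalate_to_human. Respond with a JSON tool call or a conversational message if no tool applies."}, {"role": "user", "content": "What are your business hours?"}, {"role": "assistant", "content": "Our support team is available 9am to 6pm, Monday through Friday. Is there anything order-related I can help with?"}]}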
What about messages that could match multiple tools? "I want to return my order and change my address" — this needs two tool calls. Train multi-tool examples if your workflow supports sequential execution, or train the model to pick the primary intent and handle the second in a follow-up turn.
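If your workflow does support sequential execution, a multi-tool training example carries both calls in one assistant turn. The system message is abbreviated here, and the parameter names and values are illustrative, so match them to your real schema:

{"messages": [{"role": "system", "content": "...same system prompt as every other example..."}, {"role": "user", "content": "I want to return my order and change my address to 123 Oak St"}, {"role": "assistant", "tool_calls": [{"function": {"name": "initiate_refund", "arguments": "{\"reason\": \"return\"}"}}, {"function": {"name": "update_address", "arguments": "{\"new_address\": \"123 Oak St\"}"}}]}]}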
What about new tools? When you add a ninth tool to your workflow, you need to retrain. Add examples for the new tool to your dataset, include negative examples that distinguish it from similar existing tools, and run a quick fine-tune. With LoRA, retraining takes 30-45 minutes. You can do it during a lunch break.
Making the Switch
Here is the migration checklist:
- Export execution history from your n8n/Make.com workflow (minimum 500 successful runs)
- Format as JSONL with proper tool-call structure and 20-30% negative examples, matching the distribution above
- Fine-tune a 7B model with LoRA (30-45 minutes)
- Deploy to Ollama on the same machine or network as your workflow engine
- Replace the OpenAI node with an HTTP Request node pointing at localhost:11434
- Test with 50 real messages before switching production traffic
- Monitor the first 1,000 executions, comparing tool selection accuracy against the OpenAI baseline (see the shadow-test sketch after this list)
- Retrain monthly with new execution data for continuous improvement
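A minimal shadow-test sketch for that comparison, assuming Node 18+ (for the global fetch), the endpoints used earlier in this post, and your own tool definitions filled in:

const TOOL_SCHEMAS = [ /* your existing OpenAI tool definitions */ ];

async function compareRouting(message) {
  const [openai, local] = await Promise.all([
    fetch('https://api.openai.com/v1/chat/completions', {
      method: 'POST',
      headers: {
        'Authorization': `Bearer ${process.env.OPENAI_API_KEY}`,
        'Content-Type': 'application/json'
      },
      body: JSON.stringify({
        model: 'gpt-4o-mini',
        messages: [{ role: 'user', content: message }],
        tools: TOOL_SCHEMAS
      })
    }).then(r => r.json()),
    fetch('http://localhost:11434/api/chat', {
      method: 'POST',
      headers: { 'Content-Type': 'application/json' },
      body: JSON.stringify({
        model: 'customer-router',
        messages: [{ role: 'user', content: message }],
        stream: false,
        format: 'json'
      })
    }).then(r => r.json())
  ]);

  // Pull the chosen tool name from each response shape.
  const openaiTool = openai.choices[0].message.tool_calls?.[0]?.function.name;
  const localTool = JSON.parse(local.message.content).tool_calls?.[0]?.function.name;
  return { message, openaiTool, localTool, match: openaiTool === localTool };
}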
The hardest part is not any step on that list. It is getting comfortable with the idea that a 7B model running on your own hardware can match GPT-4's routing accuracy for your specific tools. Once you see the test results, the rest is straightforward plumbing.
Your workflows keep running. Your automation keeps working. Your API bill drops to zero.
Further Reading
- How to Replace the OpenAI Node in n8n with a Fine-Tuned Model — Step-by-step migration guide with node-by-node configuration screenshots.
- Make.com with Local AI: The Complete Integration Guide — Full walkthrough of connecting Make.com HTTP modules to local Ollama endpoints.
- n8n + Ollama + Fine-Tuned Models: The Zero-Cost Stack — Architecture guide for running the complete automation stack on a single server with no recurring costs.
Ship AI that runs on your users' devices.
Early bird pricing starts at $14.50/mo — locked in for life. Plans for builders and agencies.