What is Tool Use?
The ability of an LLM to invoke external functions, APIs, or tools as part of its response generation. Tool use is implemented through structured function-call schemas that the model produces and a runtime executes, and it is foundational to all modern agent architectures.
Definition
Tool use is the capability of a language model to invoke external functions, APIs, or tools during response generation rather than relying solely on its internal knowledge. The pattern is implemented through structured function-call schemas: the developer registers tools (with names, descriptions, and parameter schemas), the model decides when to invoke a tool and produces a structured call, the runtime executes that call against the actual tool, and the result is fed back to the model for continued reasoning. This loop — model decides, runtime executes, result returns — is the foundation of all modern agent architectures.
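The loop above can be sketched in a few lines. This is a minimal illustration, not any particular vendor's API: the tool registry, the `get_weather` tool, and the `fake_model` stand-in for the LLM are all hypothetical, but the shape of the loop (model emits a structured call, runtime executes it, result returns as a message) matches real function-calling systems.

```python
import json

# Hypothetical tool registry: name -> description, parameter schema, callable.
# The schema shape mirrors common function-calling formats; names are illustrative.
TOOLS = {
    "get_weather": {
        "description": "Return current weather for a city.",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
        "fn": lambda city: {"city": city, "temp_c": 21, "conditions": "clear"},
    }
}

def fake_model(messages):
    """Stand-in for an LLM: decides to call a tool, then answers from its result."""
    last = messages[-1]
    if last["role"] == "user":
        # The model emits a structured call instead of prose.
        return {"tool_call": {"name": "get_weather",
                              "arguments": json.dumps({"city": "Oslo"})}}
    if last["role"] == "tool":
        result = json.loads(last["content"])
        return {"content": f"It is {result['temp_c']}°C and "
                           f"{result['conditions']} in {result['city']}."}

def run_agent(user_message):
    """The loop: model decides, runtime executes, result returns to the model."""
    messages = [{"role": "user", "content": user_message}]
    while True:
        reply = fake_model(messages)
        if "tool_call" in reply:
            call = reply["tool_call"]
            args = json.loads(call["arguments"])        # parse the structured call
            result = TOOLS[call["name"]]["fn"](**args)  # runtime executes the tool
            messages.append({"role": "tool", "content": json.dumps(result)})
            continue                                    # feed result back to model
        return reply["content"]

print(run_agent("What's the weather in Oslo?"))
```

In a real deployment, `fake_model` is replaced by an API call to an actual model, and the loop typically adds guardrails: argument validation, timeouts, and a cap on iterations.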
Tool-use fidelity (the model's ability to produce well-formed tool calls reliably under pressure) is now a primary axis of model capability separate from raw reasoning quality. Models from labs that invest heavily in tool-use training (OpenAI, Anthropic, increasingly Alibaba and Moonshot) typically have higher fidelity than community fine-tunes that don't include explicit tool-use training data. Open-weight bases like GPT-OSS, Qwen 3+, Kimi K2.6, and Hermes 4 have particularly strong tool-use behavior; older or general-purpose bases often need fine-tuning to achieve production reliability.
Why It Matters
Tool use is the line between LLMs as text generators and LLMs as agents. Without tool use, a model can only produce text; with tool use, a model can take actions in the world — querying databases, calling APIs, controlling browsers, executing code. Every agent framework (LangChain, LangGraph, CrewAI, AutoGen, Mastra, smolagents, Hermes Agent) builds on tool use as its primitive. For production deployments, tool-use fidelity is often more important than peak reasoning capability — a model that hallucinates tool calls 5% of the time produces unreliable agents regardless of how clever its reasoning is otherwise.
Key Takeaways
- Tool use enables LLMs to invoke external functions and APIs during response generation
- Implemented through structured function-call schemas (name, description, parameters)
- Foundational to all modern agent frameworks and architectures
- Tool-use fidelity is a separate capability axis from raw reasoning quality
- Strong open-weight tool-use bases: GPT-OSS, Qwen 3+, Kimi K2.6, Hermes 4
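To make the schema point concrete, here is roughly what a registered tool looks like, along with a runtime-side check that a model's call is well-formed. The `search_orders` tool and field names are hypothetical; the schema follows the common JSON-Schema-style shape, though exact field names vary by API.

```python
import json

# A typical function-call schema (JSON-Schema-style shape; illustrative names).
tool_schema = {
    "name": "search_orders",  # hypothetical tool
    "description": "Search customer orders by status.",
    "parameters": {
        "type": "object",
        "properties": {
            "status": {"type": "string", "enum": ["open", "shipped", "cancelled"]},
            "limit": {"type": "integer"},
        },
        "required": ["status"],
    },
}

def validate_call(schema, arguments_json):
    """Minimal runtime-side check that a model's call matches the schema."""
    args = json.loads(arguments_json)
    props = schema["parameters"]["properties"]
    required = schema["parameters"].get("required", [])
    missing = [k for k in required if k not in args]   # required args omitted
    unknown = [k for k in args if k not in props]      # hallucinated parameters
    return {"ok": not missing and not unknown, "missing": missing, "unknown": unknown}

print(validate_call(tool_schema, '{"status": "open", "limit": 5}'))
```

Checks like this are one place tool-use fidelity shows up in practice: a model with weak tool-use training produces calls that fail validation (missing required arguments, invented parameters) at a measurably higher rate.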
How Ertas Helps
When fine-tuning models for agentic deployments in Ertas Studio, including explicit tool-use traces in the training data substantially improves the fine-tuned model's tool-use fidelity in production. Ertas Studio supports training data formats with structured function calls, observed tool outputs, and multi-step reasoning traces — letting you produce a fine-tune that handles your specific tool surface reliably rather than degrading toward generic tool-use behavior.