smolagents + Ertas
Build code-action agents with smolagents — Hugging Face's ~1,000-line agent framework where the LLM writes Python code as its primary action mode, with first-class support for fine-tuned and local models.
Overview
smolagents is Hugging Face's minimal agent framework, designed around the 'code-action' paradigm: rather than choosing from a fixed list of tools via JSON function calls, the agent writes and executes Python code as its action format. This pattern is inspired by research showing that code-action agents typically outperform JSON-tool-call agents on complex tasks, and it has the practical benefit that the agent can compose, iterate, and reason over code naturally rather than being constrained to pre-defined tool schemas.
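To make the contrast concrete, here is a sketch of what a single "code action" can express that a one-tool-per-turn JSON call cannot: composition and intermediate logic in one step. The two tool functions below are stand-in stubs for illustration, not smolagents or Ertas APIs.

```python
# Sketch: the kind of Python an agent might emit as one "code action".
# query_database and generate_chart are illustrative stubs, not real APIs.

def query_database(sql: str) -> list[dict]:
    # Stub: a real tool would run the query against a database.
    return [{"product": "A", "revenue": 120}, {"product": "B", "revenue": 80}]

def generate_chart(data: list[dict], chart_type: str) -> str:
    # Stub: a real tool would render a chart file and return its path.
    return f"{chart_type}_chart_{len(data)}_series.png"

# One code action composes both tools plus intermediate logic (sorting)
# without a model round-trip between steps, unlike JSON tool calling.
rows = query_database("SELECT product, revenue FROM sales")
top = sorted(rows, key=lambda r: r["revenue"], reverse=True)
chart_path = generate_chart(top, "bar")
print(chart_path)  # → bar_chart_2_series.png
```

A JSON-tool-call agent would need a separate model turn for the query, the sort, and the chart; here all three happen inside one executed action.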
The framework's defining trait is its small footprint — the core implementation is approximately 1,000 lines of code. This makes smolagents unusually approachable for teams who want to understand exactly what their agent framework is doing, customize it deeply, or integrate it into existing systems without inheriting framework opinions about workflow, memory, or orchestration. Hugging Face's `ml-intern` (released April 2026) is built on smolagents and demonstrates the framework's capability — improving a 1.7B model's reasoning score by 200% in a 10-hour H100 self-improvement run.
How Ertas Integrates
Ertas-trained models work with smolagents through the framework's flexible LLM provider system. After fine-tuning your model in Ertas Studio and deploying to an OpenAI-compatible endpoint (Ollama, vLLM, or Ertas Cloud), you configure smolagents to call your endpoint via the LiteLLM or OpenAI provider classes. The code-action paradigm pairs particularly well with fine-tuned models: training data that includes Python code traces — task descriptions, code attempts, execution outputs, and corrections — produces a model that writes more reliable agent code.
For teams building self-improving agentic systems, the smolagents + Ertas combination is especially powerful. You can run a smolagents agent in production, log its successful and failed code-action traces, then use those traces to fine-tune a smaller model in Ertas Studio that performs the same task at lower inference cost. This pattern — large model in production, small model trained on its traces — is the operational backbone of how teams scale agent deployments cost-effectively.
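The trace-logging half of that loop can be sketched with the standard library alone. The record fields below are an assumed schema for illustration, not an Ertas or smolagents format:

```python
import json
from pathlib import Path

def log_trace(path: Path, task: str, code: str, output: str, success: bool) -> None:
    """Append one code-action trace as a JSON line for later fine-tuning."""
    record = {"task": task, "code": code, "output": output, "success": success}
    with path.open("a", encoding="utf-8") as f:
        f.write(json.dumps(record) + "\n")

trace_file = Path("traces.jsonl")
trace_file.unlink(missing_ok=True)  # start fresh for this example

# Log one successful and one failed run
log_trace(trace_file, "Q3 revenue by product", "rows = query_database(...)", "[...]", True)
log_trace(trace_file, "broken task", "1/0", "ZeroDivisionError", False)

# Keep only successful traces as fine-tuning candidates
good = [json.loads(line) for line in trace_file.read_text().splitlines()
        if json.loads(line)["success"]]
print(len(good))  # → 1
```

Failed traces are worth keeping too: pairs of (failed attempt, corrected attempt) teach a model error recovery, which the paragraph above notes is part of useful code-action training data.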
Getting Started
1. Fine-tune your code-action model in Ertas Studio
Train on data that includes Python code traces: task description, code attempts, execution results. Ertas Studio supports this format natively.
2. Deploy to an OpenAI-compatible endpoint
Export to GGUF and serve via Ollama, vLLM, or Ertas Cloud. smolagents calls any endpoint that exposes the standard chat completion API.
3. Install smolagents and configure the model
Install smolagents from Hugging Face and configure a LiteLLMModel or OpenAIServerModel pointed at your inference endpoint.
4. Define tools and create the CodeAgent
Add Python tools the agent can use (HTTP requests, database queries, file operations). Create a CodeAgent that uses these tools by writing executable Python code.
5. Run, log, and continuously improve
Execute agent runs in production, log code-action traces, and feed them back into Ertas Studio for incremental fine-tuning that distills production behavior into smaller, faster models.
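Steps 2 and 3 can be sketched as shell commands, assuming an Ollama deployment and an exported GGUF file; the model name and file path are illustrative, and the `litellm` install extra follows smolagents' documented pattern:

```shell
# Install smolagents with the LiteLLM provider support
pip install "smolagents[litellm]"

# Register the exported GGUF with Ollama via a Modelfile, then serve it
echo "FROM ./ertas-coder-14b.gguf" > Modelfile    # illustrative file name
ollama create ertas-coder-14b -f Modelfile
ollama serve    # exposes an OpenAI-compatible API at http://localhost:11434/v1
```

With the endpoint running, the configuration in the example below points smolagents at it.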
from smolagents import CodeAgent, LiteLLMModel, tool

# Point smolagents at your Ertas-trained model served via Ollama
model = LiteLLMModel(
    model_id="openai/ertas-coder-14b",
    api_base="http://localhost:11434/v1",
    api_key="not-needed",  # Ollama ignores the key, but LiteLLM requires one
)

@tool
def query_database(sql: str) -> str:
    """Execute a SQL query against the analytics database.

    Args:
        sql: The SQL query to execute.
    """
    return run_query(sql)  # run_query: your own database helper

@tool
def generate_chart(data: list, chart_type: str) -> str:
    """Generate a chart from data. Returns the chart file path.

    Args:
        data: The rows to plot.
        chart_type: The kind of chart to render (e.g. "bar").
    """
    return create_chart(data, chart_type)  # create_chart: your own charting helper

# Create a code-action agent
agent = CodeAgent(
    tools=[query_database, generate_chart],
    model=model,
)

# The agent writes Python code to accomplish the task
result = agent.run("Show me Q3 revenue by product line as a bar chart")

Benefits
- Code-action agents outperform JSON-tool-call agents on complex multi-step tasks
- ~1,000-line core implementation — small enough to fully understand and customize
- Hugging Face's official agent framework with strong community support
- Pairs with Ertas fine-tuning: train on code-action traces for higher reliability
- Self-improvement loop: log production traces, fine-tune smaller models on them
- Works with any LiteLLM or OpenAI-compatible endpoint including local Ertas models
Related Resources
Fine-Tuning
GGUF
Inference
LoRA
Getting Started with Ertas: Fine-Tune and Deploy Custom AI Models
Fine-Tune AI Models Without Writing Code
Running AI Models Locally: The Complete Guide to Local LLM Inference
CrewAI
Hugging Face
LangChain
Ollama
vLLM
Ertas for Data Extraction
Ertas for AI Automation Agencies