    Building AI Agents That Work Offline: Fine-Tuned Models for Edge Automation
    Tags: ai-agents, edge-ai, offline, fine-tuning, tool-calling, local-inference, iot, automation


    AI agents that depend on cloud APIs are fragile, expensive, and privacy-risky. Fine-tuned tool-calling models running on edge hardware create agents that work offline, respond instantly, and keep data local.

    Ertas Team

    Nearly every AI agent in production today depends on the internet. A user sends a message. The message travels to a cloud API. The API returns a response. The agent acts on it.

    This creates three dependencies:

    1. Network connectivity — if the internet goes down, the agent goes down
    2. API availability — if the provider has an outage, the agent is offline
    3. Data exposure — every user interaction transits through third-party infrastructure

    For many use cases, these dependencies are acceptable. For others — industrial automation, medical devices, field operations, secure facilities, retail point-of-sale — they're deal-breakers.

    The alternative: AI agents powered by fine-tuned models running on edge hardware. No internet required. No cloud dependency. No data leaving the device.

    Why Offline Agents Matter

    Industrial IoT and Manufacturing

    A factory floor agent that monitors sensors, classifies anomalies, and triggers maintenance workflows can't depend on a cloud API. Network latency introduces delays in safety-critical systems. Network outages create blind spots. And sending proprietary manufacturing data to third-party servers creates IP risk.

    A fine-tuned model running on an edge server in the factory processes sensor data locally, makes classification decisions in milliseconds, and triggers workflows without ever connecting to the internet.

    Medical Devices and Clinical Settings

    Medical devices that use AI for diagnostic assistance, patient monitoring, or clinical decision support face HIPAA constraints that effectively prohibit cloud API usage for patient data. But they also face a more basic constraint: reliability. A clinical AI tool that stops working because of a WiFi outage is worse than no AI tool at all.

    Fine-tuned models on edge hardware provide AI capabilities that work regardless of network status — in operating rooms, ambulances, remote clinics, and anywhere connectivity is uncertain.

    Field Service and Remote Operations

    Field technicians, agricultural operations, mining sites, maritime vessels, and military deployments all operate in environments where internet connectivity is unreliable or nonexistent. AI-powered diagnostic tools, maintenance assistants, and decision support systems must work offline.

    Retail Point-of-Sale

    Retail locations need AI for inventory queries, product recommendations, and customer service. But POS systems in stores can't afford latency or downtime. A local AI agent that runs on store hardware provides consistent, fast responses regardless of network conditions.

    Secure Facilities

    Government agencies, defense contractors, and high-security corporate environments operate in air-gapped networks where cloud APIs are architecturally impossible. Any AI capability must run on-premise, on hardware that never connects to the internet.

    The Architecture: Edge Agent Stack

    An offline AI agent has four components:

    1. Fine-Tuned Tool-Calling Model

    The agent's brain. A fine-tuned model that knows your specific tools, your domain terminology, and your workflow patterns. Runs on edge hardware via Ollama or llama.cpp.

    The base model should be small enough for your edge hardware (3B-8B parameters, quantized appropriately) and fine-tuned for the specific tool-calling patterns the agent needs.
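A quick sizing check helps here. The footprint of a quantized model is roughly parameters × bits-per-weight ÷ 8, plus runtime overhead for the KV cache and buffers. The helper below is a rough rule of thumb, not an exact figure; the 1.2× overhead factor is an assumption:

```python
def quantized_size_gb(params_billion: float, bits_per_weight: float,
                      overhead: float = 1.2) -> float:
    """Rough memory footprint of a quantized model in GB.

    overhead is an assumed multiplier for embeddings, KV cache,
    and runtime buffers -- a rule of thumb, not an exact figure.
    """
    weight_bytes = params_billion * 1e9 * bits_per_weight / 8
    return weight_bytes * overhead / 1e9

# An 8B model at ~4.5 bits/weight (Q4-class) needs roughly 5-6 GB,
# so it fits a 16 GB edge box but not a Raspberry Pi.
print(round(quantized_size_gb(8, 4.5), 1))
```

This is why the 3B-8B range dominates edge deployments: a 3B model at Q4 fits in under 2 GB, while anything above 13B starts demanding workstation-class memory.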

    2. Local Tool Registry

    The set of tools the agent can invoke — API endpoints, scripts, system commands, sensor interfaces. These run on the same device or local network. No external API calls.

    Examples:

    • Query a local database for inventory status
    • Trigger a PLC command on a factory machine
    • Write a log entry to local storage
    • Send a notification to a local display
    • Read sensor data from connected devices
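A local tool registry can be as simple as a dictionary mapping the names the model emits to on-device callables. The sketch below uses hypothetical tools (query_inventory, write_log) purely for illustration:

```python
from typing import Callable, Dict

# Hypothetical local tools -- each runs on-device, no external API calls.
def query_inventory(sku: str) -> dict:
    # In a real deployment this would query a local database.
    return {"sku": sku, "on_hand": 12}

def write_log(message: str) -> dict:
    with open("agent.log", "a") as f:
        f.write(message + "\n")
    return {"logged": True}

# The registry maps tool names (as the model emits them) to callables.
TOOL_REGISTRY: Dict[str, Callable] = {
    "query_inventory": query_inventory,
    "write_log": write_log,
}

def dispatch(name: str, arguments: dict) -> dict:
    """Execute a tool call locally; unknown names return an error payload."""
    if name not in TOOL_REGISTRY:
        return {"error": f"unknown tool: {name}"}
    return TOOL_REGISTRY[name](**arguments)
```

Keeping dispatch defensive matters on small models: an unknown tool name should produce an error the agent can recover from, not a crash.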

    3. Automation Engine

    The workflow orchestrator that connects the model to the tools. n8n (self-hosted) works well for this — it runs entirely on the edge device, connects to the local model via Ollama, and executes workflows without any cloud dependency.

    Alternative: a lightweight script-based agent framework that calls the Ollama API, parses tool calls, and executes them locally.

    4. Edge Hardware

    The physical device running all of the above. Options scale with budget and requirements:

    | Hardware | Cost | Models Supported | Power | Use Case |
    | --- | --- | --- | --- | --- |
    | Raspberry Pi 5 (8 GB) | $80 | 1-3B quantized | 5W | Simple classification, IoT sensors |
    | Nvidia Jetson Orin | $500-2,000 | 3-8B quantized | 15-60W | Industrial IoT, robotics |
    | Intel NUC / Mini PC | $300-800 | 3-7B quantized (CPU) | 30-65W | Retail POS, office automation |
    | Mac Mini M4 | $600-1,600 | 7-13B at Q5 | 15-20W | General edge inference |
    | RTX 4090 workstation | $2,500-3,000 | 8-13B at Q8 | 100-200W | High-throughput edge server |

    LoRA Adapters as Agent Personalities

    One of the most powerful patterns for edge agents: use a shared base model with per-deployment LoRA adapters.

    Same base model (Llama 3.1 8B) running on the same hardware. Different LoRA adapter for each deployment context:

    • Factory floor adapter: Trained on manufacturing terminology, equipment codes, maintenance procedures, sensor classifications
    • Retail adapter: Trained on product catalog, customer inquiry patterns, inventory terminology
    • Clinical adapter: Trained on medical terminology, clinical workflows, diagnostic patterns
    • Field service adapter: Trained on equipment manuals, diagnostic procedures, repair protocols

    Each adapter is 50-200MB. Swap one adapter for another and the same hardware serves a completely different use case. This is the multi-tenant deployment model applied to edge hardware instead of cloud servers.
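Ollama's Modelfile format supports an ADAPTER directive for exactly this pattern; the model tag, adapter paths, and agent names below are illustrative:

```shell
# Build one agent per adapter from the same base model (paths illustrative).
cat > Modelfile.retail <<'EOF'
FROM llama3.1:8b
ADAPTER ./adapters/retail-lora.gguf
SYSTEM You are the in-store inventory and product assistant.
EOF
ollama create retail-agent -f Modelfile.retail

# Same base weights, different personality:
cat > Modelfile.factory <<'EOF'
FROM llama3.1:8b
ADAPTER ./adapters/factory-lora.gguf
SYSTEM You are the factory-floor maintenance assistant.
EOF
ollama create factory-agent -f Modelfile.factory
```

The base weights are stored once; each ollama create registers a new agent that differs only by its adapter and system prompt.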

    The Development Workflow

    1. Define the Agent's Tools

    List every action the agent can take. Define each as a function with typed parameters. Keep the tool count manageable — 5-15 tools is the sweet spot for reliable tool selection on small models.
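One common way to express a typed tool is the JSON-schema function format accepted by most tool-calling runtimes, including Ollama's tools field. The tool name and fields below are illustrative:

```python
# A typed tool definition in JSON-schema style. The name, description,
# and parameters here are illustrative, not from a real deployment.
TOOLS = [
    {
        "type": "function",
        "function": {
            "name": "trigger_maintenance",
            "description": "Open a maintenance ticket for a machine on the floor.",
            "parameters": {
                "type": "object",
                "properties": {
                    "machine_id": {
                        "type": "string",
                        "description": "Equipment code, e.g. CNC-07",
                    },
                    "severity": {
                        "type": "string",
                        "enum": ["low", "medium", "high"],
                    },
                },
                "required": ["machine_id", "severity"],
            },
        },
    },
]
```

Tight enums and required fields do real work on small models: the narrower the schema, the less room a 3B-8B model has to hallucinate arguments.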

    2. Collect Training Data

    Build 300-500 training examples covering:

    • Clear tool calls for each tool
    • Ambiguous cases where context determines the right tool
    • No-tool cases (the agent should respond directly)
    • Error cases (invalid inputs, the agent should ask for clarification)
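The exact training-data schema depends on your fine-tuning framework; a chat-style JSONL layout like the following is a common pattern (the example content is invented for illustration):

```python
import json

# One training example per line of a JSONL file. The schema shown here
# (messages with an assistant tool_calls turn) is a common convention;
# adapt it to whatever your fine-tuning framework expects.
example = {
    "messages": [
        {"role": "user", "content": "Spindle temp on CNC-07 just hit 92C"},
        {
            "role": "assistant",
            "content": "",
            "tool_calls": [{
                "function": {
                    "name": "trigger_maintenance",
                    "arguments": {"machine_id": "CNC-07", "severity": "high"},
                }
            }],
        },
    ]
}

with open("train.jsonl", "a") as f:
    f.write(json.dumps(example) + "\n")
```

No-tool and clarification cases use the same layout with a plain assistant text reply instead of tool_calls, which is how the model learns when not to call a tool.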

    3. Fine-Tune on Cloud GPUs

    Use Ertas to fine-tune on cloud GPUs. Training is a one-time cost (minutes on cloud GPUs). The resulting model runs forever on edge hardware.

    4. Export and Deploy to Edge

    Export as GGUF at the quantization level that fits your edge hardware. Deploy via Ollama on the target device. Connect to n8n or your automation framework.

    5. Test Offline

    Disconnect the device from the internet. Run your full test suite. The agent should operate identically — because it never needed the internet in the first place.

    6. Deploy to Production

    Ship the configured hardware to the deployment site. The agent works immediately upon power-on. No internet setup required (unless you want remote monitoring, which is optional).

    7. Update Periodically

    When the agent needs new capabilities or improved accuracy, fine-tune an updated model, export to GGUF, and ship a new model file to the device. This can be automated via local network updates or even sneakernet (USB drive) for air-gapped environments.

    The Reliability Advantage

    Beyond cost and privacy, offline agents offer a reliability profile that cloud-dependent agents can't match:

    • No API latency: Response time is hardware-limited (milliseconds), not network-limited (50-200ms)
    • No rate limits: Process as many queries as your hardware can handle, no throttling
    • No outages: No dependency on OpenAI's uptime, no service disruptions from provider incidents
    • No API deprecation: Your model doesn't get deprecated. It runs until you choose to update it
    • Deterministic behavior: With greedy decoding, same input → same output, every time. No model version changes in the background

    For mission-critical applications — safety systems, medical devices, industrial control — this reliability profile is often the primary selling point, ahead of cost or privacy.

    Getting Started

    1. Identify a use case where cloud dependency is a problem (latency, connectivity, privacy, reliability)
    2. Define the agent's tools (5-15 actions)
    3. Build training data (300-500 examples)
    4. Fine-tune on Ertas → export as GGUF
    5. Deploy on edge hardware (Mac Mini, Jetson, consumer GPU)
    6. Test offline to verify full independence from cloud services
    7. Ship to production

    The future of AI agents isn't more cloud APIs. It's local, fine-tuned models running on hardware at the point of need — edge inference that works anywhere, anytime, without permission from a cloud provider.

    Ship AI that runs on your users' devices.

    Early bird pricing starts at $14.50/mo — locked in for life. Plans for builders and agencies.
