Ertas for Voice Agent Fine-Tuning
Fine-tune the LLM backbone of voice agents for faster, more accurate spoken interactions — with domain-specific understanding and consistent conversational patterns.
The Challenge
Voice agents — AI systems that handle phone calls, drive-through orders, appointment scheduling, and customer service conversations — are rapidly replacing traditional IVR systems. The language model is the brain of every voice agent, responsible for understanding caller intent, generating natural responses, making decisions about call routing, and maintaining coherent multi-turn conversations. Yet most voice agent builders rely on generic language models that do not understand the specific domain, vocabulary, or conversational patterns of the business they serve.
The consequences of using a generic model in a voice agent are immediately apparent to callers. The agent misunderstands industry-specific terms, asks redundant questions because it cannot infer context, generates responses that are too long for natural speech timing, and fails to follow the specific call scripts and escalation procedures the business requires. Latency is another critical factor — voice conversations require sub-second response times, and sending requests to large cloud models introduces perceptible delays that make the conversation feel unnatural. These issues compound caller frustration and drive abandonment rates that undermine the business case for voice AI.
The Solution
Ertas enables voice agent builders to fine-tune compact, fast language models on domain-specific conversational data. With Ertas Studio, teams train on transcripts of successful calls, approved call scripts, and conversation flows that capture the exact patterns callers expect. The fine-tuned model understands the business's terminology, follows its call handling procedures, and generates responses optimized for spoken delivery — concise, natural-sounding, and appropriately timed.
Because Ertas exports models in GGUF format, the fine-tuned model can be deployed on edge infrastructure for ultra-low latency inference. A 7B model running on a local GPU delivers responses in under 200 milliseconds — fast enough for natural conversation pacing. The model's compact size also means lower per-call compute costs compared to large cloud model API calls. Deployed through Ollama, vLLM, or Ertas Cloud, the model serves as the reasoning engine behind voice agent platforms like Retell, Vapi, Bland, or custom telephony integrations. Ertas Vault ensures all call transcripts and training data are handled according to call recording regulations and privacy requirements.
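As a sketch of how a voice pipeline might call the deployed model, the snippet below builds a request body following Ollama's `/api/chat` schema and checks elapsed time against a latency budget. The model tag `voice-agent-7b`, the 200 ms budget, and the token cap are illustrative assumptions, not Ertas defaults:

```python
import json
import time

LATENCY_BUDGET_MS = 200  # illustrative budget for natural conversation pacing

def build_chat_request(model, system_prompt, history, user_turn, max_tokens=60):
    """Build a request body for Ollama's /api/chat endpoint.

    max_tokens maps to Ollama's num_predict option, capping replies so
    they stay short enough for spoken delivery.
    """
    messages = [{"role": "system", "content": system_prompt}]
    messages += history
    messages.append({"role": "user", "content": user_turn})
    return {
        "model": model,
        "messages": messages,
        "stream": False,
        "options": {"num_predict": max_tokens, "temperature": 0.3},
    }

def within_budget(started_at, budget_ms=LATENCY_BUDGET_MS):
    """True if elapsed time since started_at is under the latency budget."""
    return (time.monotonic() - started_at) * 1000 < budget_ms

request = build_chat_request(
    model="voice-agent-7b",  # hypothetical fine-tuned model tag
    system_prompt="You are a phone scheduling assistant. Keep replies under two sentences.",
    history=[],
    user_turn="Hi, I need to move my cleaning to next week.",
)
print(json.dumps(request["options"]))
```

In a live deployment this body would be POSTed to the local Ollama server, with `within_budget` guarding the turn-taking loop.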
Key Features
Conversational Fine-Tuning
Train models on call transcripts, approved scripts, and multi-turn conversation flows using Studio. Optimize for spoken delivery with response length controls and natural turn-taking patterns.
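One way the transcript-to-training-data step might look in practice: each call becomes a chat-format example, and calls whose agent turns exceed a word cap are filtered out so the model learns speech-friendly brevity. The 40-word cap and the `(speaker, text)` input shape are assumptions for illustration, not a documented Studio format:

```python
import json

MAX_REPLY_WORDS = 40  # illustrative cap for spoken delivery

def transcript_to_example(turns):
    """Convert one call transcript into a chat-format training example.

    turns: list of (speaker, text) pairs, speaker is "caller" or "agent".
    Returns None for calls containing agent replies too long to speak
    naturally, so they are excluded from the training set.
    """
    messages = []
    for speaker, text in turns:
        role = "user" if speaker == "caller" else "assistant"
        if role == "assistant" and len(text.split()) > MAX_REPLY_WORDS:
            return None
        messages.append({"role": role, "content": text})
    return {"messages": messages}

call = [
    ("caller", "Can I book a cleaning for Tuesday?"),
    ("agent", "Sure, we have 9 AM or 2 PM on Tuesday. Which works for you?"),
]
example = transcript_to_example(call)
print(json.dumps(example))
```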
Compact Voice-Optimized Models
Start from efficient models on Hub that deliver fast inference on edge hardware. Because these models stay compact after fine-tuning, the resulting voice agents respond with sub-200ms latency.
Low-Latency Inference Endpoints
Deploy through Cloud or edge infrastructure for the sub-second response times that voice conversations demand. Scale endpoints based on concurrent call volume.
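Scaling by concurrent call volume reduces to simple capacity arithmetic. The sketch below assumes a measured per-replica capacity (how many simultaneous calls one GPU sustains inside the latency budget — a figure you would get from a load test, not from this page) plus headroom for spikes:

```python
import math

def replicas_needed(concurrent_calls, calls_per_gpu, headroom=0.2):
    """Estimate GPU replicas for a target concurrent call volume.

    calls_per_gpu: concurrent calls one replica sustains within the
    latency budget (measured via load test; assumption here).
    headroom reserves spare capacity for traffic spikes.
    """
    peak = concurrent_calls * (1 + headroom)
    return math.ceil(peak / calls_per_gpu)

# e.g. 100 concurrent calls, 16 calls per GPU, 20% headroom
print(replicas_needed(100, calls_per_gpu=16))
```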
Call Data Compliance
Vault ensures all call recordings, transcripts, and training data comply with call recording consent laws, PCI-DSS requirements for payment processing, and HIPAA for healthcare calls.
Example Workflow
A dental practice management company builds voice agents that handle appointment scheduling for 500 dental offices. They collect 100,000 call transcripts from successful scheduling interactions — including appointment types, insurance verification questions, schedule negotiations, and cancellation handling — and upload them to Ertas Vault. Using Ertas Studio, they fine-tune a 7B model on the dental scheduling domain, training it to understand dental terminology (prophylaxis, periodontal maintenance, crown prep), insurance plan names, and the specific scheduling logic for different procedure types. The model is deployed on GPU servers in their data center, achieving 150ms average response latency. The fine-tuned voice agent handles 75% of scheduling calls end-to-end without human intervention, up from 45% with the generic model. Call duration drops by 30% because the model understands caller intent faster, and patient satisfaction scores increase because responses are natural and contextually appropriate.
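The containment figures above translate directly into staff workload. Using the 45% → 75% end-to-end handling rates from the workflow, and an illustrative assumption of 40 scheduling calls per office per day, the arithmetic looks like:

```python
def escalations_avoided(daily_calls, baseline_rate, tuned_rate):
    """Calls per day no longer escalated to staff after fine-tuning.

    Rates are end-to-end containment: the fraction of calls handled
    with no human intervention (45% generic vs 75% fine-tuned, per the
    workflow above).
    """
    return round(daily_calls * (tuned_rate - baseline_rate))

# 500 offices x 40 scheduling calls/day is an assumed volume.
daily_calls = 500 * 40
print(escalations_avoided(daily_calls, baseline_rate=0.45, tuned_rate=0.75))
```

Under these assumptions, roughly 6,000 calls a day that previously needed a human are now handled end-to-end by the agent.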
Related Resources
Fine-Tuning
GGUF
Inference
LoRA
Quantization
Getting Started with Ertas: Fine-Tune and Deploy Custom AI Models
How to Fine-Tune an LLM: The Complete 2026 Guide
Fine-Tune AI Models Without Writing Code
Running AI Models Locally: The Complete Guide to Local LLM Inference
Privacy-Conscious AI Development: Fine-Tune in the Cloud, Run on Your Terms
LangChain
llama.cpp
Ollama
OpenRouter
vLLM
Ertas for Healthcare
Ertas for Customer Support
Ertas for AI Automation Agencies
Ertas for Indie Developers & Vibe-Coded Apps
Ship AI that runs on your users' devices.
Early bird pricing starts at $14.50/mo — locked in for life. Plans for builders and agencies.