Cookbook

    Five worked recipes for fine-tuning Ertas models on real on-device use cases: support bots, summarisers, structured extraction, transcript cleanup, and code completion.

    The Cookbook is the answer to "OK, but what would I actually build with this?" Each recipe walks an end-to-end on-device fine-tune from scratch: pick the use case, decide what data to collect or synthesise, choose a base model, set training hyperparameters, run a probe set, and ship into a real app shape (iOS, Android, desktop, or web). The recipes are not lifted from any single customer; they take real-world product surfaces from companies you have heard of and use them as anchors so the trade-offs feel concrete.

    How to read these recipes

    Every recipe has the same backbone:

    • The problem: who needs this, why a fine-tune is the right answer, and why on-device beats a hosted API for this specific shape.
    • The dataset: what rows look like, where to source or synthesise them, and how many you need before the fine-tune starts paying back.
    • The base model: which model from the Ertas catalogue fits the task, with the reasoning behind the pick.
    • Training config: the hyperparameters that move the needle, plus an honest read on cost and wall-clock time.
    • Integration: at least one of the Ship paths with adapted code samples.
    • Probe set: 8 to 10 sample prompts you can run by hand to confirm the model learned what you wanted, before you wire up a full eval.
    • Limits: where the model will quietly fail and what to do about it.
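
    A probe set can be as simple as a list of prompts with a few strings each reply must or must not contain. The sketch below is one minimal way to hold that list and score it by hand; the `Probe` class, `run_probes`, and the `fake_generate` stand-in are hypothetical names for illustration and are not part of any Ertas API.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Probe:
    prompt: str
    must_contain: List[str]      # substrings the reply should include (case-insensitive)
    must_not_contain: List[str]  # substrings that signal a failure, e.g. an invented feature


def run_probes(generate: Callable[[str], str], probes: List[Probe]) -> List[bool]:
    """Run each probe through `generate` and return one pass/fail flag per probe."""
    results = []
    for probe in probes:
        reply = generate(probe.prompt).lower()
        ok = all(s.lower() in reply for s in probe.must_contain) and not any(
            s.lower() in reply for s in probe.must_not_contain
        )
        results.append(ok)
    return results


# Stand-in for the fine-tuned model (assumption); replace with a call into your runtime.
def fake_generate(prompt: str) -> str:
    if "export" in prompt.lower():
        return "Open Settings, then choose Export to CSV."
    return "I don't know. Please contact support."


probes = [
    Probe("How do I export my data?", ["export to csv"], []),
    Probe("Does the app support teleportation?", ["i don't know"], ["yes"]),
]
results = run_probes(fake_generate, probes)
```

    Substring checks are crude, but for 8 to 10 prompts they are usually enough to show whether the model picked up the behaviour you trained for, including refusals.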

    You can read a recipe in two ways: end-to-end if you are picking a use case for the first time and want to feel the shape of the work, or skim-to-section if you already know what you are building and want a specific answer (typically the dataset section, since that is where most projects under-invest).

    The five recipes

    | Recipe | Anchor scenario | Base model | Hardest part |
    | --- | --- | --- | --- |
    | Customer support bot | A SaaS company ships a desktop helper trained on its public docs and past support tickets | Gemma 4 E2B (3B class) | Authoring refuse-unknown behaviour |
    | Document summariser | A browser adds "TL;DR for any open tab" that runs without sending the page to a server | Gemma 4 E2B (3B class) | Holding the summary length and tone steady across genres |
    | Structured data extraction | An expense card extracts line items from photographed receipts on-device | Qwen 2.5 3B Instruct | Producing strict JSON and recovering when OCR is half-wrong |
    | Voice transcript cleanup | A meeting-notes app adds offline cleanup so interviews and standups work in airplane mode | Llama 3.2 3B Instruct | Sourcing realistic ASR noise without breaking diarisation |
    | Code completion | A game studio adds completions for its scripting language inside its creator app | Qwen 2.5 Coder 3B | Authoring a fill-in-the-middle dataset that does not leak the answer into the prefix |

    The order runs roughly from easiest to hardest in terms of data work. Support is the gentlest first project, because the dataset is largely already written: your docs, your tickets. Code completion is the hardest: it needs the largest dataset, a fill-in-the-middle authoring pipeline with prefix-leak gotchas, and the broadest task surface, where subtle failures (wrong API, wrong dialect) hide in plausible-looking output. Extraction and transcript cleanup sit between them: both have well-defined output shapes and paired-data sources that travel from public corpora to your project with reasonable effort.
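
    A well-defined output shape is something you can check mechanically. As an illustration for the extraction recipe, the sketch below accepts a model reply only if it is strict JSON in an expected receipt shape. The field names (`merchant`, `total`, `line_items`, `description`, `amount`) are assumptions made up for this example, not a schema the recipe prescribes.

```python
import json
from typing import Optional

# Hypothetical receipt schema: key -> accepted Python type(s).
REQUIRED_TOP = {"merchant": str, "total": (int, float), "line_items": list}
REQUIRED_ITEM = {"description": str, "amount": (int, float)}


def _matches(obj: object, schema: dict) -> bool:
    """True if `obj` is a dict with exactly the schema's keys and matching value types."""
    if not isinstance(obj, dict) or set(obj) != set(schema):
        return False
    for key, typ in schema.items():
        value = obj[key]
        # bool is a subclass of int in Python, so reject it explicitly.
        if isinstance(value, bool) or not isinstance(value, typ):
            return False
    return True


def parse_receipt(raw: str) -> Optional[dict]:
    """Return the parsed receipt if `raw` is strict JSON in the expected shape, else None."""
    try:
        data = json.loads(raw)  # fails on chatty wrappers like "Sure! {...}"
    except json.JSONDecodeError:
        return None
    if not _matches(data, REQUIRED_TOP):
        return None
    if not all(_matches(item, REQUIRED_ITEM) for item in data["line_items"]):
        return None
    return data
```

    A `None` result is the signal to retry, fall back, or ask the user to confirm, which is where recovery from half-wrong OCR would plug in.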

    Before you start

    Two prerequisites the recipes assume:

    1. You have a clear picture of where the model will run. Browse the Ship section first if you do not. The choice between iOS-only, Android-only, desktop via Ollama, or web via WebAssembly changes the base model you pick (and sometimes the dataset shape, e.g. you may want shorter outputs for web because of memory ceilings).
    2. You can describe the success criteria in one sentence. "The bot answers 80% of frequently-asked product questions correctly without making up features," "the summariser fits in 60 words and does not invent numbers," and so on. Recipes without a one-sentence success criterion tend to produce models that no one trusts to ship.
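
    A one-sentence criterion is most useful when it can be turned into a check. As a rough sketch, the summariser example above ("fits in 60 words and does not invent numbers") could be tested like this; the function name and the digit-matching regex are assumptions for illustration, and the check only catches numbers written as digits.

```python
import re

# Matches digit runs with optional decimal or thousands separators, e.g. "12", "4.5", "1,200".
NUMBER = re.compile(r"\d+(?:[.,]\d+)*")


def meets_summary_criterion(source: str, summary: str, max_words: int = 60) -> bool:
    """True if the summary is within the word budget and every number in it appears in the source."""
    within_length = len(summary.split()) <= max_words
    source_numbers = set(NUMBER.findall(source))
    invented = [n for n in NUMBER.findall(summary) if n not in source_numbers]
    return within_length and not invented


source = "Revenue grew 12% to 4.5 million in 2023."
faithful = meets_summary_criterion(source, "Revenue rose 12% to 4.5 million.")
made_up = meets_summary_criterion(source, "Revenue rose 15%.")
```

    Here `faithful` is `True` and `made_up` is `False`, because 15 never appears in the source text.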

    If either is missing, work through Concepts and Picking a base model before you start, then come back.

    How to adapt a recipe

    The recipes use specific companies as worked examples because abstract advice doesn't stick. None of them are Ertas customers; they are stand-ins picked for shape, not identity. Read the customer-support-bot recipe as "any SaaS with public docs and a support team," the code-completion recipe as "any vertical that has its own scripting language or framework," and so on.

    The hyperparameters in the recipes are starting points that work for the anchor scenario. Move them when your situation differs: a smaller dataset wants more epochs and a lower learning rate; a much bigger dataset wants fewer epochs and a slightly higher learning rate. The Training tips page covers the heuristics.
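
    To make the direction of those adjustments concrete, here is a small sketch that nudges a recipe's starting values based on how your dataset size compares with the anchor's. Every threshold and multiplier in it (0.5x, 4x, halving, 1.25x) is an illustrative placeholder rather than a value from the recipes or the Training tips page, and `TrainConfig` and `adapt` are hypothetical names.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class TrainConfig:
    epochs: int
    learning_rate: float


def adapt(anchor: TrainConfig, anchor_rows: int, your_rows: int) -> TrainConfig:
    """Shift the anchor config in the direction the text describes (placeholder magnitudes)."""
    ratio = your_rows / anchor_rows
    if ratio < 0.5:
        # Smaller dataset: more epochs, lower learning rate.
        return replace(anchor, epochs=anchor.epochs + 1,
                       learning_rate=anchor.learning_rate * 0.5)
    if ratio > 4:
        # Much bigger dataset: fewer epochs, slightly higher learning rate.
        return replace(anchor, epochs=max(1, anchor.epochs - 1),
                       learning_rate=anchor.learning_rate * 1.25)
    return anchor  # close enough to the anchor scenario: keep the starting point


anchor = TrainConfig(epochs=3, learning_rate=2e-4)
small = adapt(anchor, anchor_rows=2000, your_rows=500)
large = adapt(anchor, anchor_rows=2000, your_rows=20000)
```

    Treat the output as a first guess to confirm against your probe set, not as a tuned configuration.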

    What's next