Cookbook
Five worked recipes for fine-tuning Ertas models on real on-device use cases: support bots, summarisers, code completion, transcript cleanup, and structured extraction.
The Cookbook is the answer to "OK, but what would I actually build with this?" Each recipe walks an end-to-end on-device fine-tune from scratch: pick the use case, decide what data to collect or synthesise, choose a base model, set training hyperparameters, run a probe set, and ship into a real app shape (iOS, Android, desktop, or web). The recipes are not lifted from any single customer; they take real-world product surfaces from companies you have heard of and use them as anchors so the trade-offs feel concrete.
Customer Support Bot
Read the customer support bot guide.
Document Summarizer
Read the document summarizer guide.
Structured Data Extraction
Read the structured data extraction guide.
Voice Transcript Cleanup
Read the voice transcript cleanup guide.
Code Completion
Read the code completion guide.
How to read these recipes
Every recipe has the same backbone:
- The problem: who needs this, why a fine-tune is the right answer, and why on-device beats a hosted API for this specific shape.
- The dataset: what rows look like, where to source or synthesise them, and how many you need before the fine-tune starts paying back.
- The base model: which model from the Ertas catalogue fits the task, with the reasoning behind the pick.
- Training config: the hyperparameters that move the needle, plus an honest read on cost and wall-clock time.
- Integration: at least one of the Ship paths with adapted code samples.
- Probe set: 8 to 10 sample prompts you can run by hand to confirm the model learned what you wanted, before you wire up a full eval.
- Limits: where the model will quietly fail and what to do about it.
You can read a recipe in two ways: end-to-end, if you are picking a use case for the first time and want to feel the shape of the work; or section by section, if you already know what you are building and want a specific answer (typically found in the dataset section, since that is where most projects under-invest).
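As a rough illustration of the probe-set step in the backbone above, the sketch below runs a handful of prompts through a model and flags answers that miss an expected phrase or contain a forbidden one. Everything here is an assumption made for illustration: `Probe`, `run_probes`, and `fake_generate` are hypothetical names, the substring checks are a deliberately crude stand-in for reading the answers yourself, and the refund policy in the sample data is invented.

```python
from typing import Callable, List, NamedTuple


class Probe(NamedTuple):
    prompt: str
    must_contain: List[str]      # substrings the answer should include
    must_not_contain: List[str]  # substrings that signal a failure


def run_probes(generate: Callable[[str], str], probes: List[Probe]) -> List[bool]:
    """Return one pass/fail flag per probe and print failures for manual review."""
    results = []
    for probe in probes:
        answer = generate(probe.prompt).lower()
        ok = (
            all(s.lower() in answer for s in probe.must_contain)
            and not any(s.lower() in answer for s in probe.must_not_contain)
        )
        if not ok:
            print(f"FAIL: {probe.prompt!r} -> {answer!r}")
        results.append(ok)
    return results


# Hypothetical stand-in for the fine-tuned model; swap in a call to your own runtime.
def fake_generate(prompt: str) -> str:
    if "refund" in prompt:
        return "Refunds are available within 30 days."  # invented sample policy
    return "I don't know."


probes = [
    Probe("What is the refund window?", ["30 days"], ["I don't know"]),
    Probe("Does the product support teleportation?", ["don't know"], ["yes"]),
]
results = run_probes(fake_generate, probes)
```

The second probe is the kind that exercises refuse-unknown behaviour: it passes only when the model declines rather than inventing a feature.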
The five recipes
| Recipe | Anchor scenario | Base model | Hardest part |
|---|---|---|---|
| Customer support bot | A SaaS company ships a desktop helper trained on its public docs and past support tickets | Gemma 4 E2B (3B class) | Authoring refuse-unknown behaviour |
| Document summariser | A browser adds "TL;DR for any open tab" that runs without sending the page to a server | Gemma 4 E2B (3B class) | Holding the summary length and tone steady across genres |
| Structured data extraction | An expense card extracts line items from photographed receipts on-device | Qwen 2.5 3B Instruct | Producing strict JSON and recovering when OCR is half-wrong |
| Voice transcript cleanup | A meeting-notes app adds offline cleanup so interviews and standups work in airplane mode | Llama 3.2 3B Instruct | Sourcing realistic ASR noise without breaking diarisation |
| Code completion | A game studio adds completions for its scripting language inside its creator app | Qwen 2.5 Coder 3B | Authoring a fill-in-the-middle dataset that does not leak the answer into the prefix |
The order is roughly easiest to hardest in terms of data work. Support is the gentlest first project (the dataset is largely already written: your docs, your tickets). Code completion is the hardest: it needs the largest dataset, a fill-in-the-middle authoring pipeline with prefix-leak gotchas, and the broadest task surface, where subtle failures (wrong API, wrong dialect) hide in plausible-looking output. Extraction and transcript cleanup sit between them: both have well-defined output shapes and paired-data sources that travel from public corpora to your project with reasonable effort.
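To make the prefix-leak gotcha a little more concrete, here is a minimal sketch of one plausible reading of it: the span the model is asked to fill in already appears verbatim earlier in the prefix, for example in a doc comment. The helper names (`make_fim_example`, `leaks_answer`), the `min_len` threshold, and the Lua-like sample script are all hypothetical; the code completion recipe may define the check differently.

```python
from typing import Dict


def make_fim_example(source: str, start: int, end: int) -> Dict[str, str]:
    """Split source into prefix / middle / suffix around the span [start, end)."""
    if not 0 <= start < end <= len(source):
        raise ValueError("span must be non-empty and inside the source")
    return {
        "prefix": source[:start],
        "middle": source[start:end],
        "suffix": source[end:],
    }


def leaks_answer(example: Dict[str, str], min_len: int = 8) -> bool:
    """Flag examples whose middle span already appears verbatim in the prefix."""
    middle = example["middle"].strip()
    if len(middle) < min_len:
        # Very short spans (a bracket, a keyword) repeat naturally; skip them.
        return False
    return middle in example["prefix"]


# Invented sample in a Lua-like scripting language.
source = (
    "-- heal(target, amount) restores hit points\n"
    "function heal(target, amount)\n"
    "  target.hp = target.hp + amount\n"
    "end\n"
)

# Masking the signature leaks: the comment above it spells out the answer.
sig = "heal(target, amount)"
sig_start = source.index(sig, source.index("function"))
leaky = make_fim_example(source, sig_start, sig_start + len(sig))

# Masking the body does not: nothing in the prefix repeats it.
body = "target.hp = target.hp + amount"
body_start = source.index(body)
clean = make_fim_example(source, body_start, body_start + len(body))
```

A filter like this only catches verbatim repeats; paraphrased leaks (a comment that describes the answer in words) would still need manual review.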
Before you start
Two prerequisites the recipes assume:
- You have a clear picture of where the model will run. Browse the Ship section first if you do not. The choice between iOS-only, Android-only, desktop via Ollama, or web via WebAssembly changes the base model you pick (and sometimes the dataset shape, e.g. you may want shorter outputs for web because of memory ceilings).
- You can describe the success criteria in one sentence, for example "The bot answers 80% of frequently asked product questions correctly without making up features" or "The summariser fits in 60 words and does not invent numbers." Projects that start without a one-sentence success criterion tend to produce models that no one trusts to ship.
If either is missing, work through Concepts and Picking a base model before you start, then come back.
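A one-sentence criterion is most useful when it can be turned into a mechanical check. As a sketch, the summariser example above ("fits in 60 words and does not invent numbers") could be approximated like this. The function names and the regular expression are assumptions for illustration, and the number check is deliberately naive: it only compares digit strings, so "twelve" versus "12" or a reformatted figure would slip through or be wrongly flagged.

```python
import re
from typing import Set

# Matches integers and simple decimals such as 12, 3.4, or 1,000.
NUMBER = re.compile(r"\d+(?:[.,]\d+)*")


def numbers_in(text: str) -> Set[str]:
    """Collect every digit string that appears in the text."""
    return set(NUMBER.findall(text))


def meets_criterion(source: str, summary: str, max_words: int = 60) -> bool:
    """True when the summary is short enough and introduces no new numbers."""
    within_length = len(summary.split()) <= max_words
    no_new_numbers = numbers_in(summary) <= numbers_in(source)
    return within_length and no_new_numbers


# Invented sample text.
source = "Revenue rose 12% to 3.4 million in 2023."
faithful = "Revenue grew 12% in 2023."
invented = "Revenue grew 15% in 2023."
```

Here `meets_criterion(source, faithful)` passes, while `meets_criterion(source, invented)` fails because 15 never appears in the source.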
How to adapt a recipe
The recipes use specific companies as worked examples because abstract advice doesn't stick. None of them are Ertas customers; they are stand-ins picked for shape, not identity. Read the customer-support-bot recipe as "any SaaS with public docs and a support team," the code-completion recipe as "any vertical that has its own scripting language or framework," and so on.
The hyperparameters in the recipes are starting points that work for the anchor scenario. Move them when your situation differs: a smaller dataset wants more epochs and a lower learning rate; a much bigger dataset wants fewer epochs and a slightly higher learning rate. The Training tips page covers the heuristics.
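The direction of those adjustments can be written down as a small helper. This is only a sketch of the rule of thumb stated above: `TrainConfig` and `adapt` are hypothetical names, and every threshold and multiplier (the 0.5 and 4.0 ratios, the 0.5 and 1.2 learning-rate factors, the one-epoch step) is a placeholder rather than a recommendation from the recipes or the Training tips page.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class TrainConfig:
    epochs: int
    learning_rate: float


def adapt(base: TrainConfig, anchor_rows: int, your_rows: int) -> TrainConfig:
    """Nudge a recipe's starting config for a dataset of a different size."""
    ratio = your_rows / anchor_rows
    if ratio < 0.5:
        # Smaller dataset: more epochs, lower learning rate (placeholder factors).
        return replace(
            base,
            epochs=base.epochs + 1,
            learning_rate=base.learning_rate * 0.5,
        )
    if ratio > 4.0:
        # Much bigger dataset: fewer epochs, slightly higher learning rate.
        return replace(
            base,
            epochs=max(1, base.epochs - 1),
            learning_rate=base.learning_rate * 1.2,
        )
    # Close enough to the anchor scenario: keep the recipe's values.
    return base


# Placeholder starting values, not taken from any recipe.
base = TrainConfig(epochs=3, learning_rate=2e-4)
small = adapt(base, anchor_rows=1000, your_rows=300)
large = adapt(base, anchor_rows=1000, your_rows=10000)
same = adapt(base, anchor_rows=1000, your_rows=1200)
```

Treat the output as a first guess to confirm against your probe set, not as a tuned configuration.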