    Fine-Tune a Model on Your App's Data: A Guide for Solo Developers

    A step-by-step guide for non-ML developers who want to fine-tune an AI model on their app's actual usage data — from collecting training examples to deploying the model.

    Ertas Team

    You built an app. It uses an AI API — maybe OpenAI, maybe Anthropic — and it works. But the API costs are climbing, the latency is unpredictable, and you are starting to realize that 90% of what the model does for your app is the same narrow task repeated thousands of times. You do not need a genius generalist. You need a specialist.

    Fine-tuning a smaller model on your app's actual data is how you get that specialist. And despite what the ML Twitter discourse might suggest, you do not need a PhD or a cluster of H100s to do it. This guide walks through the entire process, step by step, for developers who have never touched model training before.

    Why Fine-Tuning on YOUR Data Matters

    Generic frontier models are remarkable generalists. They can write poetry, debug code, summarize legal documents, and generate SQL — all in the same conversation. But that generality comes at a cost: they are not optimized for any single task.

    Your app probably needs the model to do one thing well. Maybe it classifies support tickets. Maybe it extracts structured data from invoices. Maybe it generates product descriptions in a specific tone. For tasks like these, a 3B parameter model fine-tuned on your data will consistently outperform a 70B general model — at a fraction of the cost and latency.

    Fine-tuning is not about making a model smarter. It is about making it focused.

    Step 1: What Training Data Looks Like

    Training data for fine-tuning is simply a collection of input-output pairs that demonstrate the task you want the model to perform. If your app sends a prompt to an API and gets a response, you already have the raw material.

    The standard format is JSONL — one JSON object per line:

    {"input": "Classify this ticket: My order hasn't arrived in 2 weeks", "output": "shipping_delay"}
    {"input": "Classify this ticket: The app crashes when I upload photos", "output": "bug_report"}
    {"input": "Classify this ticket: Can I change my subscription plan?", "output": "account_inquiry"}
    

    The specifics of the format depend on your tooling, but the concept is universal: show the model what goes in and what should come out.

    Start by logging your API calls. Every request your app sends to the AI API and every response it receives is a potential training example. Add a logging layer if you do not already have one, and let it accumulate for a few weeks.
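    A minimal sketch of that logging layer, assuming an OpenAI-style chat client and a JSONL log file (the helper name, model name, and file path are illustrative, not prescribed):

    import json
    import time

    def call_and_log(client, prompt, log_path="training_log.jsonl"):
        # The API call your app already makes (OpenAI-style chat client assumed)
        response = client.chat.completions.create(
            model="gpt-4o-mini",  # whichever model your app currently uses
            messages=[{"role": "user", "content": prompt}],
        )
        output = response.choices[0].message.content

        # Append an input/output pair in the same JSONL shape shown above
        with open(log_path, "a") as f:
            f.write(json.dumps({"input": prompt, "output": output, "ts": time.time()}) + "\n")

        return output

    Every call that goes through a wrapper like this becomes a candidate training example.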

    Step 2: How Much Data You Need

    The most common question — and the answer is less than you think.

    For narrow, well-defined tasks (classification, extraction, reformatting), 1,000 to 3,000 high-quality examples are typically sufficient to see meaningful improvements over the base model. Some tasks converge with as few as 500 examples.

    Quality matters far more than quantity. A thousand carefully curated examples will outperform ten thousand noisy ones. Focus on:

    • Diversity: Cover the full range of inputs your app handles, including edge cases
    • Correctness: Every output in your training set should be exactly what you want the model to produce
    • Consistency: Similar inputs should have consistently formatted outputs

    If you have 50,000 API logs but half of them contain errors or inconsistent formatting, filter ruthlessly. A clean dataset of 2,000 examples beats a messy one of 20,000.
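    A rough sketch of that filtering pass, assuming the JSONL log from Step 1 and a classification task (the label set and helper name are placeholders for your own rules):

    import json

    def load_clean_examples(log_path, valid_labels):
        seen, clean = set(), []
        with open(log_path) as f:
            for line in f:
                ex = json.loads(line)
                inp = ex.get("input", "").strip()
                out = ex.get("output", "").strip()
                # Drop empty inputs, unexpected labels, and exact duplicates
                if not inp or out not in valid_labels or inp in seen:
                    continue
                seen.add(inp)
                clean.append({"input": inp, "output": out})
        return clean

    examples = load_clean_examples(
        "training_log.jsonl",
        {"shipping_delay", "bug_report", "account_inquiry"},
    )
    print(f"{len(examples)} clean examples")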

    Step 3: Choosing a Base Model

    For app-specific fine-tuning, smaller models are almost always the right choice. Here is the reasoning:

    • 3B parameters: Fast inference, runs on consumer hardware, excellent for classification and extraction tasks
    • 7B parameters: The sweet spot for most applications, handles generation tasks well, still runs on a single GPU
    • 13B+ parameters: Only needed for complex generation tasks where output quality is paramount

    Start with the smallest model that could plausibly handle your task. Fine-tuning a 3B model takes minutes. Fine-tuning a 13B model takes hours. If the 3B model gets you 90% of the way there, ship it and move on.

    For base model selection, look at the Hugging Face leaderboards filtered by size. Models like Llama 3, Mistral, Phi, and Qwen all have strong small variants that fine-tune well.
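    Before committing to a training run, it is worth loading one small candidate and sanity-checking it on a handful of real inputs. A quick sketch using the Hugging Face transformers pipeline (the model name is just one example of a small instruct variant):

    from transformers import pipeline

    # Any strong small instruct model works here; this name is illustrative
    generator = pipeline(
        "text-generation",
        model="Qwen/Qwen2.5-3B-Instruct",
        device_map="auto",
    )

    prompt = "Classify this ticket: My order hasn't arrived in 2 weeks\nLabel:"
    result = generator(prompt, max_new_tokens=10, do_sample=False)
    print(result[0]["generated_text"])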

    Step 4: The Fine-Tuning Process

    You do not need to understand the math behind fine-tuning to use it effectively. Here is what matters at a practical level:

    LoRA (Low-Rank Adaptation) is the technique you will use. Instead of modifying all the billions of parameters in a model, LoRA adds small trainable matrices to specific layers. This means:

    • Training is fast — minutes to hours instead of days
    • Memory requirements are low — a 7B model can be fine-tuned on a single consumer GPU with 16GB VRAM
    • The output is a small adapter file (50-150MB) rather than a full model copy

    The training loop itself is straightforward: the model sees your input-output pairs, adjusts the LoRA weights to better predict the expected outputs, and repeats for a few passes (epochs) over your dataset. Two to four epochs is typical for most tasks.
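    A condensed sketch of that loop using the peft and trl libraries, one common way to run LoRA fine-tuning (the base model name and hyperparameters are illustrative starting points, and trl's exact API varies a little between versions):

    from datasets import Dataset
    from peft import LoraConfig
    from trl import SFTConfig, SFTTrainer

    # Turn the cleaned input/output pairs into single training strings
    data = Dataset.from_list(
        [{"text": f"{ex['input']}\nLabel: {ex['output']}"} for ex in examples]
    )

    lora = LoraConfig(r=16, lora_alpha=32, lora_dropout=0.05, task_type="CAUSAL_LM")

    trainer = SFTTrainer(
        model="Qwen/Qwen2.5-3B-Instruct",  # illustrative small base model
        train_dataset=data,
        peft_config=lora,
        args=SFTConfig(
            output_dir="ticket-classifier-lora",
            num_train_epochs=3,
            per_device_train_batch_size=4,
            learning_rate=2e-4,
        ),
    )
    trainer.train()
    trainer.save_model("ticket-classifier-lora")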

    Step 5: Evaluating Your Model

    Before deploying, you need to know whether the fine-tuned model actually performs better than your current setup. Set aside 10-20% of your data as a test set — examples the model never saw during training.

    Run both your current API-based solution and the fine-tuned model on the same test set, and compare them on the points below (a minimal scoring sketch follows the list):

    • Accuracy: Does the fine-tuned model produce correct outputs as often or more often?
    • Format compliance: Does it follow your expected output structure consistently?
    • Edge cases: How does it handle the unusual inputs that trip up the generic model?
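    A minimal scoring sketch for a classification-style task, assuming the cleaned examples from Step 2 and two hypothetical predict functions, call_api_model and call_finetuned_model, that each take the raw input text and return a label:

    import random

    # Hold out 15% of the cleaned examples before training, and never train on them
    random.seed(0)
    random.shuffle(examples)
    split = int(len(examples) * 0.85)
    train_set, test_set = examples[:split], examples[split:]

    def exact_match_accuracy(predict_fn, test_set):
        correct = sum(predict_fn(ex["input"]).strip() == ex["output"] for ex in test_set)
        return correct / len(test_set)

    # call_api_model and call_finetuned_model are stand-ins for your two setups
    print("API baseline:", exact_match_accuracy(call_api_model, test_set))
    print("Fine-tuned:  ", exact_match_accuracy(call_finetuned_model, test_set))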

    Common quality issues at this stage:

    • Overfitting: The model memorizes training examples instead of learning the pattern. Fix by adding more diverse data or reducing training epochs.
    • Format drift: The model produces correct answers in inconsistent formats. Fix by ensuring your training data has perfectly consistent formatting.
    • Hallucination on unfamiliar inputs: The model confidently produces wrong outputs for input types not well-represented in training. Fix by expanding your training data to cover those cases.

    Step 6: Deploying Your Model

    Once your model passes evaluation, deployment is the easy part.

    Export to GGUF format. GGUF is the standard format for running models locally with tools like Ollama and llama.cpp. Your fine-tuned adapter gets merged with the base model into a single portable file.
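    A sketch of the merge step with peft; the actual GGUF conversion is usually handled afterwards by llama.cpp's conversion script (folder and model names here are illustrative):

    from peft import AutoPeftModelForCausalLM
    from transformers import AutoTokenizer

    # Load the base model with the LoRA adapter applied, then fold the adapter in
    model = AutoPeftModelForCausalLM.from_pretrained("ticket-classifier-lora")
    merged = model.merge_and_unload()

    merged.save_pretrained("ticket-classifier-merged")
    AutoTokenizer.from_pretrained("Qwen/Qwen2.5-3B-Instruct").save_pretrained("ticket-classifier-merged")

    # The merged folder can then be converted to GGUF with llama.cpp's
    # convert_hf_to_gguf.py (the exact script name depends on your llama.cpp version).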

    Run with Ollama. Install Ollama, load your GGUF file, and you have a local API endpoint that is a drop-in replacement for the cloud API your app currently uses. Change the endpoint URL in your app config, and you are live.
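    Ollama also exposes an OpenAI-compatible endpoint on localhost, so the swap can be as small as re-pointing your existing client. A sketch assuming the OpenAI Python SDK and a model registered with ollama create (the model name is illustrative):

    from openai import OpenAI

    # Same client your app already uses, pointed at the local Ollama server
    client = OpenAI(base_url="http://localhost:11434/v1", api_key="ollama")

    response = client.chat.completions.create(
        model="ticket-classifier",  # name you gave the model in ollama create
        messages=[{"role": "user", "content": "Classify this ticket: My order hasn't arrived in 2 weeks"}],
    )
    print(response.choices[0].message.content)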

    What you gain: Zero per-token costs, predictable latency, full data privacy, and no rate limits. For a solo developer running a SaaS product, switching from API calls to a self-hosted fine-tuned model can cut AI costs by 95% or more.

    Common Mistakes to Avoid

    Too little data diversity. If your training data only covers happy-path inputs, the model will fail on anything unusual. Deliberately include edge cases and error scenarios.

    Wrong base model size. Starting with a 70B model because "bigger is better" wastes time and money. Start small, evaluate, and only scale up if the smaller model genuinely cannot handle the task.

    Skipping evaluation. Deploying without a proper test set is flying blind. You will not know whether your fine-tuned model is better until you measure it against a baseline.


    How Ertas Makes This Accessible

    Ertas is built for developers who want the benefits of fine-tuning without the ML infrastructure overhead. Upload your training data, select a base model, and kick off a training run — no Python scripts, no CUDA debugging, no YAML configuration files.

    The platform handles data validation, training orchestration, evaluation metrics, and GGUF export. You go from API logs to a deployable model without leaving the browser.

    For solo developers and small teams, this is the difference between "I should fine-tune someday" and actually shipping a fine-tuned model this week.


    Ship AI that runs on your users' devices.

    Early bird pricing starts at $14.50/mo — locked in for life. Plans for builders and agencies.
