
Fine-Tune a Support Bot for Your Lovable App (No API Costs in Production)
Build an AI support bot that actually knows your product — trained on your docs, your tickets, your tone. Then run it locally for zero ongoing API costs.
Every Lovable app eventually needs customer support. The standard play is plugging in GPT-4 with a system prompt stuffed full of your docs. You set the temperature low, tell it to be helpful, and hope for the best. It works — sort of. But it costs $0.03 to $0.06 per conversation, it hallucinates your pricing page at least once a week, and it has no idea that you renamed the "Pro Plan" to "Growth Plan" three months ago. There is a better way.
What if your support bot actually knew your product? Not in the "I scraped your FAQ" sense, but in the "I have been trained on 500 real support conversations and I know the exact edge cases your users hit" sense. That is what fine-tuning gives you. And when you deploy the fine-tuned model locally with Ollama, the per-conversation cost drops from $0.04 to exactly $0.00.
This guide walks through the entire process: collecting your training data, fine-tuning a model with Ertas, deploying it locally, and integrating it into a Lovable-built app. No ML background required. No GPU cluster. No ongoing API bill.
Why Generic LLMs Make Bad Support Bots
Let us be honest about what happens when you point GPT-4 at your product docs and tell it to answer customer questions.
It hallucinates product specifics. Ask it about your pricing and it will confidently state numbers that were true six months ago — or were never true at all. Ask about a feature limitation and it will invent a workaround that does not exist. Generic LLMs do not know your product. They know the general pattern of "answering questions about SaaS products," and they fill in the blanks with plausible-sounding fabrications.
It has an inconsistent tone. One conversation it sounds like a Silicon Valley marketing page. The next it sounds like a Wikipedia article. Your support bot should sound like your brand. Generic models sound like themselves.
It is expensive at scale. Here is the math on GPT-4o support conversations:
| Conversations/Month | Avg Tokens/Conversation | Monthly API Cost |
|---|---|---|
| 500 | ~2,000 | $15 - $30 |
| 2,000 | ~2,000 | $60 - $120 |
| 5,000 | ~2,000 | $150 - $300 |
| 10,000 | ~2,000 | $300 - $600 |
| 25,000 | ~2,000 | $750 - $1,500 |
At 10,000 conversations per month, you are spending $300 to $600 just on support bot inference. For an indie app charging $9.99/month, that is a significant margin hit. And unlike your hosting costs, which grow sub-linearly, API costs scale linearly with every single conversation.
It does not improve over time. Your human support agents learn. They see the same ticket about the CSV export bug twelve times and they get faster at resolving it. A GPT-4 bot learns nothing from one conversation to the next. Every interaction starts from zero.
Fine-tuning fixes all four problems. A model trained on your actual support data knows your product specifics, maintains a consistent tone, runs locally for zero per-token cost, and can be retrained as your product evolves.
The Better Approach: A Model Trained on YOUR Support Data
Fine-tuning is not about making a small model generally smarter. It is about making it an expert in one narrow domain: your product, your users, your edge cases.
A 7B parameter model fine-tuned on 400 high-quality support conversations from your actual product will consistently outperform GPT-4 with a system prompt for your specific support use case. Why? Because:
- It has seen the real questions your users ask, not hypothetical ones
- It knows your exact feature set, limitations, and pricing
- It has learned your brand voice from hundreds of examples
- It handles your specific edge cases because it was trained on them
The tradeoff is that it will be worse at everything else. Ask it to write a poem and it will probably try to troubleshoot your billing page. But that is the entire point. You do not need your support bot to write poems.
Collecting Your Training Data
You need input-output pairs: a customer question and the ideal support response. Here is where to find them.
Source 1: Previous support tickets. If you have been doing support manually (email, Intercom, Crisp, whatever), you are sitting on a goldmine. Export your ticket history. Every resolved ticket is a potential training example. Focus on tickets where the customer confirmed their issue was resolved — those represent successful support interactions.
Source 2: Your help documentation. Turn each help article into Q&A pairs. For every section of your docs, write 3-5 questions a user might ask that this section answers, then write the ideal response. A 20-page help center can generate 60-100 training examples.
Source 3: Your FAQ. Similar to docs, but these are already in Q&A format. Expand each FAQ answer into a conversational support response rather than a documentation-style answer.
Source 4: Synthetic generation. This is the secret weapon for bootstrapping when you do not have a large ticket history. Use GPT-4 to generate realistic customer questions about your product, then manually write (or edit) the ideal responses. The key word is "manually." You are curating the responses, not auto-generating them.
Here is a concrete workflow for synthetic generation (a code sketch follows the list):
- Give GPT-4 your product description, feature list, and pricing page
- Ask it to generate 50 realistic support questions across categories (billing, bugs, features, how-to, account management)
- For each question, write the ideal response yourself — or have GPT-4 draft one that you then edit for accuracy and tone
- Review every single pair for correctness
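If you want to script the generation step, here is a minimal sketch using the OpenAI SDK (the same client shown later in this guide). The product summary, category list, and output handling are placeholders; adapt them to your app, and remember that you still write or edit every response yourself.
import OpenAI from "openai";

// Sketch of the question-generation step. Everything marked "placeholder" is an assumption.
const openai = new OpenAI(); // reads OPENAI_API_KEY from the environment

const productSummary = "YourApp description, feature list, and pricing go here."; // placeholder
const categories = ["billing", "bugs", "features", "how-to", "account management"];

const completion = await openai.chat.completions.create({
  model: "gpt-4o",
  messages: [
    { role: "system", content: "You write realistic customer support questions. Return one question per line, no numbering." },
    { role: "user", content: `${productSummary}\n\nGenerate 10 realistic support questions for each category: ${categories.join(", ")}.` },
  ],
});

// Each line becomes the "input" half of a training pair; the "output" half you write yourself.
const questions = (completion.choices[0].message.content ?? "")
  .split("\n")
  .map((q) => q.trim())
  .filter(Boolean);

console.log(`${questions.length} candidate questions generated`);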
Source 5: Your changelog and known issues. Users ask about recent changes and known bugs constantly. Create Q&A pairs for every changelog entry and every known issue.
Building Your Dataset
Your training data needs to be in JSONL format, with each line representing one training example. For a support bot, the format looks like this:
{"input": "I upgraded to the Growth plan but I'm still seeing the free plan limits. It's been 24 hours.", "output": "Thanks for reaching out! Plan upgrades can take up to 10 minutes to propagate, but 24 hours definitely isn't right. Let me check a few things: First, can you try logging out and back in? This forces a session refresh that picks up plan changes. If that doesn't fix it, go to Settings > Billing and confirm the plan shows as 'Growth' there. If it still shows 'Free,' the payment may not have completed — check your email for a receipt from Stripe. Let me know what you see and I'll get this sorted."}
{"input": "How do I export my data as CSV?", "output": "You can export to CSV from the Dashboard page. Click the three-dot menu in the top right corner of any data table, then select 'Export > CSV.' The file will download immediately for tables under 10,000 rows. For larger datasets, you'll get an email with a download link within a few minutes. Note: CSV exports are available on the Growth plan and above. If you're on the Free plan, you can export up to 100 rows at a time."}
How many examples do you need? For a focused support bot, aim for 300 to 500 high-quality examples. This is enough to cover:
- 50-80 common how-to questions
- 30-50 billing and account questions
- 40-60 bug reports and troubleshooting flows
- 30-40 feature requests and workaround explanations
- 20-30 onboarding questions
- The rest for edge cases and misc
Quality matters more than quantity. Every response in your training set should be exactly what you want the bot to say. If you would not send that response to a real customer, do not include it.
Dataset hygiene tips (a quick validation script follows this list):
- Remove personally identifiable information (names, emails, account IDs)
- Standardize your response format (greeting style, sign-off style, level of detail)
- Include examples of the bot gracefully handling questions it cannot answer ("I'd recommend reaching out to our team at support@yourapp.com for account-specific billing questions")
- Add examples of the bot handling frustrated or upset users with empathy
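Most of that hygiene pass can be automated before you upload anything. Here is a minimal validation sketch in Node/TypeScript; the filename, length threshold, and email check are assumptions, so adjust them to your dataset.
import { readFileSync } from "node:fs";

// Quick hygiene check over a JSONL dataset. "support-dataset.jsonl" is a placeholder path.
const lines = readFileSync("support-dataset.jsonl", "utf8")
  .split("\n")
  .filter((line) => line.trim().length > 0);

const emailPattern = /[\w.+-]+@[\w-]+\.[\w.]+/; // crude PII check: flag anything that looks like an email

lines.forEach((line, i) => {
  let example: { input?: string; output?: string };
  try {
    example = JSON.parse(line);
  } catch {
    console.warn(`Line ${i + 1}: not valid JSON`);
    return;
  }
  if (!example.input || !example.output) {
    console.warn(`Line ${i + 1}: missing "input" or "output"`);
  } else if (example.output.length < 40) {
    console.warn(`Line ${i + 1}: very short response, may carry little signal`);
  }
  if (emailPattern.test(line)) {
    console.warn(`Line ${i + 1}: contains something that looks like an email address, check for PII`);
  }
});

console.log(`Checked ${lines.length} examples`);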
Fine-Tuning With Ertas
This is the part that sounds intimidating, but it takes about 15 minutes of hands-on work.
Step 1: Upload your dataset. Log into Ertas and navigate to the Studio. Upload your JSONL file. Ertas will validate the format and flag any issues — malformed JSON, duplicate entries, extremely short responses that might not carry enough signal.
Step 2: Select your base model. For a support bot, Qwen 2.5 7B is the sweet spot. It is large enough to handle nuanced support conversations but small enough to run on modest hardware. If you are deploying on a machine with limited RAM (8GB or less), consider Qwen 2.5 3B — it will still dramatically outperform a prompted generic model for your specific use case.
Step 3: Configure training. Use LoRA fine-tuning (the default in Ertas). Key settings:
- Epochs: 3-5 for a support dataset of 300-500 examples
- Learning rate: Leave at the Ertas default unless you have a reason to change it
- LoRA rank: 16 is a solid default for support bots
Step 4: Train. Hit start. Training a 7B model with LoRA on 500 examples typically takes 20-40 minutes on Ertas. You will see loss metrics updating in real time. You want to see the loss curve descending steadily and flattening — not bouncing around wildly.
Step 5: Evaluate. Ertas provides an evaluation interface where you can test your fine-tuned model against sample inputs. Try 20-30 questions from your actual support queue and compare the responses to what your human agents would say. Pay attention to:
- Factual accuracy (does it get your pricing right?)
- Tone consistency (does it sound like your brand?)
- Edge case handling (what does it do with questions it was not trained on?)
If the results are not satisfactory, the fix is almost always better training data, not more training data. Go back and refine your dataset — fix incorrect responses, add more examples for weak areas, remove noisy entries.
Deploying Your Support Bot
Once you are happy with the model, export it from Ertas as a GGUF file. This is the format Ollama uses for local inference.
Step 1: Install Ollama on your server. If you are running your Lovable app on a VPS (DigitalOcean, Hetzner, Railway), install Ollama:
curl -fsSL https://ollama.com/install.sh | sh
Step 2: Load your model.
ollama create my-support-bot -f Modelfile
Where your Modelfile points to your exported GGUF:
FROM ./my-support-bot.gguf
PARAMETER temperature 0.3
SYSTEM "You are the support assistant for [YourApp]. Be helpful, accurate, and concise."
Step 3: Test it.
curl http://localhost:11434/api/generate -d '{
  "model": "my-support-bot",
  "prompt": "How do I upgrade my plan?",
  "stream": false
}'
Step 4: Verify performance. Run your evaluation set against the deployed model and confirm the quality matches what you saw in Ertas. Quantization to GGUF can occasionally affect quality at the margins — if you notice degradation, try a less aggressive quantization (Q6_K or Q8_0 instead of Q4_K_M).
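A quick way to do this is to loop your evaluation questions through the same /api/generate route used in the curl test above. The questions below are placeholders; use 20-30 from your real support queue.
// Spot-check the deployed model against a few evaluation questions (placeholders).
const evalQuestions = [
  "How do I upgrade my plan?",
  "How do I export my data as CSV?",
  "I was charged twice this month.",
];

for (const prompt of evalQuestions) {
  const res = await fetch("http://localhost:11434/api/generate", {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({ model: "my-support-bot", prompt, stream: false }),
  });
  const data = await res.json();
  // With stream: false, Ollama returns the full completion in the "response" field.
  console.log(`Q: ${prompt}\nA: ${data.response}\n`);
}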
Integrating With Your Lovable App
Your Lovable app probably already has a frontend component for the chat widget. The only change is pointing it at your local Ollama endpoint instead of the OpenAI API.
Option A: Direct API call. If your backend is a simple API route, swap the OpenAI SDK call for a fetch to your Ollama instance:
// Before: OpenAI API
const response = await openai.chat.completions.create({
  model: "gpt-4o",
  messages: [{ role: "user", content: userQuestion }],
});
// After: Local Ollama
const response = await fetch("http://localhost:11434/api/chat", {
  method: "POST",
  headers: { "Content-Type": "application/json" },
  body: JSON.stringify({
    model: "my-support-bot",
    messages: [{ role: "user", content: userQuestion }],
    stream: false,
  }),
});

// Ollama puts the reply at message.content rather than choices[0].message.content
const data = await response.json();
const reply = data.message.content;
Option B: OpenAI-compatible endpoint. Ollama exposes an OpenAI-compatible API at /v1/chat/completions. This means you can literally just change the base URL in your existing OpenAI SDK configuration:
import OpenAI from "openai";

const client = new OpenAI({
  baseURL: "http://localhost:11434/v1",
  apiKey: "not-needed", // Ollama doesn't require a key
});

const response = await client.chat.completions.create({
  model: "my-support-bot",
  messages: [{ role: "user", content: userQuestion }],
});
This approach requires zero changes to your existing chat widget frontend code. The response format is identical.
Cost Comparison: The Numbers That Matter
Let us compare the total cost of running an AI support bot with GPT-4o versus a fine-tuned local model over 12 months:
| | GPT-4o API | Fine-Tuned Local (Ertas + Ollama) |
|---|---|---|
| Setup cost | $0 | $14.50/mo Ertas (or one-time training) |
| Infrastructure | $0 (OpenAI hosts it) | $30/mo VPS (8GB+ RAM, Hetzner) |
| Per-conversation cost | $0.03 - $0.06 | $0.00 |
| 1K conversations/mo | $30 - $60/mo | $30/mo flat |
| 5K conversations/mo | $150 - $300/mo | $30/mo flat |
| 10K conversations/mo | $300 - $600/mo | $30/mo flat |
| 25K conversations/mo | $750 - $1,500/mo | $30/mo flat |
| Annual cost at 10K/mo | $3,600 - $7,200 | $360 + $174 Ertas = $534 |
The breakeven point is around 1,000 conversations per month. Below that, the API approach is simpler and cheaper. Above it, the savings compound every month. At 10K conversations per month, you are saving $3,000 to $6,600 per year.
And here is the kicker: as your app grows and conversations increase, the API cost goes up linearly while the local model cost stays flat. At 50K conversations per month, the API would cost $1,500 to $3,000 monthly. Your local model still costs $30.
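If you want to sanity-check the breakeven claim with your own numbers, the math is one line. The figures below are the ones from the tables above.
// Breakeven: flat local cost vs. per-conversation API cost (figures from the tables above).
const flatMonthlyCost = 30 + 14.5; // VPS + Ertas, USD per month
for (const apiCostPerConversation of [0.03, 0.06]) {
  const breakeven = Math.round(flatMonthlyCost / apiCostPerConversation);
  console.log(`At $${apiCostPerConversation}/conversation, breakeven ≈ ${breakeven} conversations/month`);
}
// Prints roughly 1,483 and 742, which is why breakeven lands around 1,000 conversations/month.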
Quality Comparison: Fine-Tuned vs. Prompted
Cost savings mean nothing if the bot gives worse answers. Here is what we consistently see when comparing a fine-tuned 7B model against GPT-4o with a system prompt for product-specific support:
Factual accuracy on product specifics: Fine-tuned model wins. It knows your exact pricing tiers, feature limitations, and known issues because it was trained on them. GPT-4 with a system prompt will occasionally hallucinate or confuse details from its general training data.
Tone consistency: Fine-tuned model wins. Every response sounds like your brand because every training example demonstrated your brand voice. GPT-4 drifts between tones depending on the question phrasing.
Handling edge cases: Fine-tuned model wins for known edge cases (those in your training data) and ties for novel edge cases. If a user asks something completely outside your training set, both models may struggle — but you can add a fallback to route truly novel questions to a human agent.
General knowledge questions: GPT-4 wins. If someone asks your support bot about quantum physics, GPT-4 will answer correctly and your fine-tuned model will probably try to relate it to your billing page. But this is not a scenario that matters for a support bot.
Response latency: Local model wins. Ollama on a decent VPS returns responses in 500ms to 2 seconds. OpenAI API calls typically take 2-5 seconds, with occasional spikes to 10+ seconds during high-traffic periods.
Keeping Your Bot Updated
Your product changes. Your support bot needs to keep up. The workflow is:
- Collect new support conversations monthly (or after major product changes)
- Add 20-50 new training examples covering the changes
- Retrain on Ertas (takes 20-40 minutes)
- Export the updated GGUF and hot-swap it on your server
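The hot-swap itself is just a rebuild under the same model name. Assuming the setup from the deployment section above, something like this should work:
# Point the Modelfile's FROM line at the newly exported GGUF, then rebuild under the same name.
ollama create my-support-bot -f Modelfile
# New requests to "my-support-bot" now use the retrained weights; no changes needed in your app.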
With Ertas at $14.50/month, you can retrain as often as you need. Most teams settle into a monthly cadence — retrain once a month with new examples from the previous month's support tickets. This keeps the bot current without requiring constant attention.
Getting Started This Weekend
Here is the minimum viable plan:
- Saturday morning: Export your last 3 months of support tickets. Clean them into 200-300 JSONL examples (this is the hardest part — budget 3-4 hours).
- Saturday afternoon: Upload to Ertas, select Qwen 2.5 7B, train with LoRA defaults. While it trains, spin up the $30/month Hetzner VPS from the cost table and install Ollama.
- Sunday morning: Export the GGUF, deploy to your VPS, test with 20 sample questions.
- Sunday afternoon: Swap the API endpoint in your Lovable app. Deploy. Done.
Total time: one weekend. Total ongoing cost: $30/month for the VPS (plus $14.50/month for Ertas if you keep retraining). Every support conversation from that point forward costs you exactly nothing in AI inference.
Your support bot should know your product better than GPT-4 ever could. Fine-tuning makes that happen, and local deployment makes it free.
Ship AI that runs on your users' devices.
Ertas early bird pricing starts at $14.50/mo — locked in for life. Plans for builders and agencies.
Further Reading
- Fine-Tune a Model on Your App's Data — The complete beginner's guide to fine-tuning for solo developers.
- Your Vibe-Coded App Hit 10K Users. Now Your AI Bill Is $3K/Month. — Understanding the AI cost cliff and how to avoid it.
- Fine-Tune AI Without Writing Code — How Ertas makes fine-tuning accessible to non-ML developers.