
Fine-Tuned Chatbot vs RAG Chatbot: What to Actually Build for a Client
Fine-tuning and RAG are both ways to make AI systems smarter about your client's business. They solve different problems. Here's the decision framework for AI solutions architects.
Every AI consultant and agency gets asked the same question eventually: "Should we fine-tune a model or use RAG?" The honest answer is: it depends on the problem, and often you need both.
But "it depends" is not useful guidance. This article gives you a precise decision framework so you can walk into a client scoping session and know within 30 minutes which approach you need.
What Each Technique Does
Fine-tuning modifies the model's weights to change its behavior. You train the model on examples of the task you want it to perform, and the model learns to perform that task better — with the right style, terminology, output format, and behavioral patterns. Fine-tuning is about behavior.
RAG (Retrieval-Augmented Generation) injects relevant documents or data into the model's context at inference time. The model's weights are unchanged; instead, it is given information to reason about at the moment of each query. RAG is about knowledge access.
This distinction is fundamental and determines which technique applies to a given problem.
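The distinction is easy to see in code. The sketch below shows the RAG flow in miniature: retrieval happens at query time and the model's weights are never touched. The keyword-overlap scoring is a toy stand-in for a real embedding similarity search, and the documents are invented for illustration:

```python
# Illustrative RAG flow: fetch relevant context, then inject it into the
# prompt at inference time. No training, no weight updates.

DOCS = [
    "Returns are accepted within 30 days of purchase.",
    "The Pro plan costs $49/month and includes priority support.",
    "Shipping to EU countries takes 3-5 business days.",
]

def retrieve(query: str, docs: list[str], k: int = 1) -> list[str]:
    """Rank documents by shared words with the query (toy similarity)."""
    q_words = set(query.lower().split())
    ranked = sorted(docs, key=lambda d: len(q_words & set(d.lower().split())), reverse=True)
    return ranked[:k]

def build_prompt(query: str) -> str:
    """Inject retrieved context into the prompt, the essence of RAG."""
    context = "\n".join(retrieve(query, DOCS))
    return f"Answer using only this context:\n{context}\n\nQuestion: {query}"

print(build_prompt("How much does the Pro plan cost?"))
```

Fine-tuning, by contrast, has no equivalent inference-time step: the behavioral change is baked into the weights before deployment.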
The Core Decision Framework
Ask these four questions about the client's use case:
Question 1: Is the failure mode "wrong style/behavior" or "wrong facts"?
If wrong style/behavior: Fine-tuning. The model gives the right information but sounds wrong — too formal, too casual, uses generic AI language instead of the client's voice, structures outputs incorrectly, fails to follow the client's specific format requirements.
If wrong facts: RAG. The model gives confidently wrong information because it does not have access to the correct facts — wrong product specifications, outdated pricing, incorrect policy details, information about specific people or records the model was never trained on.
Question 2: Does the knowledge change frequently?
If knowledge changes frequently: RAG. Product catalogs, pricing, inventory, case statuses, policy updates, personnel directories — anything that updates more than once a month. RAG pulls from a database you can update without retraining. Fine-tuning is a snapshot.
If knowledge is stable: Fine-tuning is viable. Domain terminology that rarely changes, stylistic conventions, task patterns — these can be learned through fine-tuning and will remain accurate for 12+ months.
Question 3: How much data does the client have?
Fewer than 200 examples: RAG is easier to get started with. RAG requires document chunking and embedding, not training data. Fine-tuning needs sufficient examples to learn from.
200+ high-quality examples: Fine-tuning is viable. More examples (500-2,000) produce noticeably better results.
Both exist: Use both techniques. Fine-tune on behavioral examples, add RAG for factual retrieval.
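For scoping purposes, it helps to show clients what "high-quality examples" look like on disk. The sketch below uses the chat-messages JSONL layout (the field names follow the OpenAI fine-tuning format; other trainers use similar schemas), with invented example content:

```python
import json

# One training example in chat-format JSONL: a system prompt, a user turn,
# and the assistant reply you want the model to learn to produce.
examples = [
    {
        "messages": [
            {"role": "system", "content": "You are Acme's support assistant."},
            {"role": "user", "content": "Where is my order?"},
            {"role": "assistant", "content": "Happy to help! Could you share your order number? I'll check right away."},
        ]
    },
]

# Fine-tuning APIs typically expect one JSON object per line.
with open("train.jsonl", "w") as f:
    for ex in examples:
        f.write(json.dumps(ex) + "\n")
```

Multiply this by 200 to 2,000 rows and you have a fine-tuning dataset; if the client cannot produce that, start with RAG.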
Question 4: Is there a data sovereignty requirement?
Yes, data cannot leave the premises: Both techniques are viable with local deployment (Ollama). Fine-tuned models are fully self-contained — no API calls. RAG with a local vector database (Chroma, Qdrant running locally) also satisfies data sovereignty. This requirement does not determine which technique; it determines the deployment architecture.
No specific requirement: Cloud-hosted RAG (Pinecone, Weaviate Cloud) is an option for reduced operational overhead.
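Question 3 noted that RAG requires chunking and embedding rather than training data. A minimal fixed-size chunker with overlap shows the shape of that pipeline; production systems usually split on semantic boundaries (headings, paragraphs), but the mechanics are the same:

```python
# Split a document into overlapping character windows for embedding.
# Overlap prevents a fact from being cut in half at a chunk boundary.

def chunk(text: str, size: int = 200, overlap: int = 50) -> list[str]:
    """Return overlapping windows of `text`, each at most `size` chars."""
    step = size - overlap
    return [text[i:i + size] for i in range(0, max(len(text) - overlap, 1), step)]

policy = "x" * 500  # stands in for a lengthy policy document
pieces = chunk(policy)  # each piece would then be embedded and indexed
```

Each chunk is then embedded and stored in the vector database (local or cloud, per the sovereignty answer above).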
Decision Matrix
| Situation | Recommendation |
|---|---|
| Client needs specific tone/voice | Fine-tuning |
| Client has product catalog updated weekly | RAG |
| Client wants accurate answers about their services | RAG |
| Client wants consistent format in all outputs | Fine-tuning |
| Client has 2,000+ support ticket examples | Fine-tuning |
| Client's domain terminology is specific and unusual | Fine-tuning |
| Client asks questions about current orders/records | RAG |
| Client needs answers from lengthy policy documents | RAG |
| Client wants a model that "sounds like us" | Fine-tuning |
| Client needs current information from a database | RAG |
| Complex client use case, budget allows | Both |
Use Case Deep Dives
Customer Support Chatbot
Typical requirements: Answer common questions, maintain brand voice, handle escalations appropriately, cover FAQs and product questions.
Recommendation: Fine-tuning + RAG.
Fine-tune for: tone, escalation behavior, format (always include order number in response, always offer to transfer to human agent), response style.
RAG for: current product specs, pricing, order status (if connected to live data), policy details that update frequently.
Why not just RAG? Because RAG alone will produce answers in generic AI assistant style, not in the client's voice. Fine-tuning fixes the behavior; RAG fixes the knowledge.
Why not just fine-tuning? Because fine-tuned models memorize facts from their training data. If you fine-tune on a product catalog that then changes, the model gives wrong answers until you retrain. RAG solves this.
Internal Document Q&A
Typical requirements: Answer questions about internal policies, procedures, HR documents, technical documentation.
Recommendation: RAG, potentially with light fine-tuning.
RAG is the primary technique — the entire value proposition is "answer questions using our documents." The model needs access to the documents at inference time, not memorized knowledge.
Light fine-tuning adds value if: the client has specific formatting requirements for answers (always cite the source document, always provide a confidence statement), or if the document style is unusual enough that the base model struggles to understand it.
Content Generation (Brand Voice)
Typical requirements: Generate blog posts, social media content, product descriptions, email drafts that sound like the client.
Recommendation: Fine-tuning, potentially with RAG for product details.
Brand voice is a behavioral characteristic — the right tone, word choices, sentence rhythm, structural patterns. This is learned through fine-tuning on examples of existing brand content.
If the content generation also needs to include accurate product specifications, pricing, or other factual details — add RAG to pull this data at generation time.
Sales Prospect Research
Typical requirements: Summarize company information, generate outreach context, research lead backgrounds.
Recommendation: RAG with live web/database integration.
This use case needs current information that changes constantly. A fine-tuned model does not help here — the problem is data access, not behavior. Connect a RAG pipeline to relevant data sources (LinkedIn, company websites, CRM data) to provide the model with fresh context at inference time.
Code Review Assistant
Typical requirements: Review code against team conventions, suggest improvements in the team's style.
Recommendation: Fine-tuning.
Team coding conventions are stable behavioral patterns (always add error handling, prefer functional style, specific naming conventions). These are learned through fine-tuning on examples of approved vs. flagged code reviews. RAG over documentation adds little beyond what a well-prompted base model provides.
The "Use Both" Architecture
For most serious production deployments, the right answer is not fine-tuning OR RAG — it is both, playing different roles:
```
User query
    ↓
[Retrieval system: pulls relevant docs/data from knowledge base]
    ↓
[Fine-tuned model: processes query + retrieved context, generates response]
    ↓
Response
```
The fine-tuned model brings the behavioral characteristics (tone, format, task performance). The retrieval system brings current, factual grounding. Together, they produce responses that are both stylistically correct and factually accurate.
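Wiring the two together is straightforward. In the sketch below, `call_fine_tuned_model` is a hypothetical placeholder for whatever inference API the deployment uses (hosted or local), and the knowledge base and retrieval are toys; only the wiring, retrieve first and then generate with context, is the point:

```python
# Combined architecture: retrieval supplies current facts, the fine-tuned
# model supplies tone and format.

KNOWLEDGE_BASE = {
    "pricing": "The Pro plan is $49/month as of this quarter.",
    "returns": "Returns are accepted within 30 days.",
}

def retrieve(query: str) -> str:
    """Toy retrieval: pick the entry sharing the most words with the query."""
    q = set(query.lower().split())
    key = max(KNOWLEDGE_BASE, key=lambda k: len(q & set(KNOWLEDGE_BASE[k].lower().split())))
    return KNOWLEDGE_BASE[key]

def call_fine_tuned_model(prompt: str) -> str:
    """Stub: a real deployment calls the fine-tuned model's inference API here."""
    return f"[model response grounded in]: {prompt}"

def answer(query: str) -> str:
    context = retrieve(query)                 # retrieval system: current facts
    prompt = f"Context: {context}\nUser: {query}"
    return call_fine_tuned_model(prompt)      # fine-tuned model: tone and format
```

Updating the knowledge base changes the facts immediately; updating the model's behavior requires a new fine-tuning run, which is why the two layers are maintained separately.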
This architecture is more complex to build and maintain than either approach alone. It requires:
- A knowledge base with regular update processes
- An embedding and indexing pipeline
- A fine-tuning pipeline for behavioral updates
- Evaluation metrics that cover both behavior quality and factual accuracy
For clients with the budget and the use case, it is worth the investment. For simpler use cases, start with the technique that addresses the primary failure mode and add the second layer later if needed.
Building the Client Recommendation
When you present a recommendation to a client, frame it as:
"Your primary challenge is [wrong style vs. wrong facts]. This means [fine-tuning / RAG] is the right starting point. Here's what it will do for you: [specific improvement]. The secondary challenge is [other issue], which we would address in phase two with [the other technique]."
Clients appreciate being told what the problem actually is and why you chose the approach you did. This is more persuasive than a technical explainer and sets better expectations for what the system will and will not do.
Ship AI that runs on your users' devices.
Ertas early bird pricing starts at $14.50/mo — locked in for life. Plans for builders and agencies.
Further Reading
- Fine-Tuning vs RAG — Technical deep dive into the two approaches
- Prompt Engineering Has a Ceiling. Here's What Comes After. — When to graduate from prompts to fine-tuning
- 7B vs GPT-4: Which Model Size Actually Fits Your Client's Task — Model selection after you have chosen your technique