    E-Commerce Customer Service AI: Build a Fine-Tuned Support Model

    Replace expensive GPT-4 support calls with a fine-tuned model trained on your ticket history. Here's the full build: data prep, training, deployment, and accuracy targets.

    Ertas Team

    An e-commerce brand handling 8,000 support tickets per month with GPT-4 spends roughly $3,000-5,000/month in API costs. A fine-tuned model trained on their ticket history costs $20/month in infrastructure and handles the same volume with better accuracy on brand-specific questions.

    This is the most direct ROI case in AI agency work. Here is how to build it.

    Why Generic AI Underperforms on E-Commerce Support

    Generic AI handles common questions well. But e-commerce support is mostly brand-specific:

    • "What is your return policy for sale items?"
    • "My order #84521 says delivered but I never received it — what do I do?"
    • "Do you ship to Puerto Rico?"
    • "Is the blue version of product X back in stock?"

    These questions require knowledge of this specific brand's policies, catalog, and procedures. Generic AI either hallucinates (invents a return policy) or deflects (says "please contact customer service" instead of answering). A model fine-tuned on the brand's actual support resolutions answers correctly from its training.

    What You Need to Build

    Your deliverable: A model that takes a support ticket (customer message) as input and returns the correct resolution or draft response — accurately handling the brand's policies, products, and procedures.

    Quality target: ≥85% fully correct resolutions on a held-out test set. The remaining 15% should be escalated or partially drafted, not wrong.

    Step 1: Extract and Clean Training Data

    Source: Your ticketing system (Zendesk, Gorgias, Freshdesk, Intercom). Export all resolved tickets from the past 12-18 months.

    What you need per ticket:

    • Customer message (the input)
    • Resolution or response (the correct output)
    • Resolution status (resolved, escalated, requires human)

    Filter the dataset:

    • Include: tickets where AI could realistically handle the resolution (policy questions, order status questions, product questions, returns, tracking)
    • Exclude: tickets requiring human judgment (fraud disputes, exceptions to policy, emotional escalations requiring empathy-heavy handling, complex multi-issue tickets)

    Typically 60-70% of a support ticket archive is AI-handleable with a trained model.

    Clean the resolutions:

    • Remove agent name signatures
    • Remove internal team notes appended to resolutions
    • Standardize policy language (remove outdated policies from old tickets)
    • Fix any factual errors in old resolutions

    Target dataset size: 1,000-3,000 clean (ticket, resolution) pairs.
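The extraction-and-cleaning step above can be sketched in a few lines of Python. This is a minimal sketch, not a definitive pipeline: the column names (`status`, `category`, `customer_message`, `resolution`) and the signature/internal-note patterns are assumptions you would adapt to your helpdesk's actual export format.

```python
import csv
import re

# Hypothetical category labels -- map these to your helpdesk's taxonomy.
AI_HANDLEABLE = {"policy", "order_status", "product", "returns", "tracking"}

def clean_resolution(text: str) -> str:
    """Strip agent signature blocks and internal notes from a resolution."""
    text = re.sub(r"\n--\s*\n.*", "", text, flags=re.DOTALL)            # "--" signature block
    text = re.sub(r"\[internal\].*?\[/internal\]", "", text, flags=re.DOTALL)  # internal notes
    return text.strip()

def load_pairs(path: str) -> list[tuple[str, str]]:
    """Load resolved, AI-handleable tickets as (message, resolution) pairs."""
    pairs = []
    with open(path, newline="", encoding="utf-8") as f:
        for row in csv.DictReader(f):
            if row["status"] != "resolved":
                continue
            if row["category"] not in AI_HANDLEABLE:
                continue  # fraud disputes, policy exceptions, etc. stay human
            pairs.append((row["customer_message"].strip(),
                          clean_resolution(row["resolution"])))
    return pairs
```

Standardizing outdated policy language and fixing factual errors in old resolutions still needs a human pass; the script only handles the mechanical cleanup.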

    Step 2: Construct the JSONL Dataset

    Each training example is a conversation pair:

    {"messages": [{"role": "system", "content": "You are a customer support agent for [Brand], an e-commerce store selling outdoor gear. Answer customer questions accurately based on our current policies: Returns: 30 days for unworn items; Sale items: Final sale, no returns; Shipping: Free on orders $75+, typically 3-5 business days; Contact: support@brand.com for exceptions."}, {"role": "user", "content": "Hi, I ordered a jacket last week but it's too big. Can I return it?"}, {"role": "assistant", "content": "Hi! Absolutely — you have 30 days from purchase to return unworn items for a full refund or exchange. Just make sure the jacket still has its original tags and hasn't been worn outside. You can start your return at [brand.com/returns] and we'll email you a prepaid shipping label. Let me know if you have any other questions!"}]}
    

    Include your current policies in the system message. This is critical: it lets you update policy without retraining. The model follows the system message; fine-tuning teaches it tone, format, and handling patterns.
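Assembling the JSONL file from cleaned (ticket, resolution) pairs is mechanical. A minimal sketch, assuming the pairs are already cleaned; `SYSTEM_PROMPT` here is a hypothetical stand-in for your brand's actual policy text:

```python
import json

# Hypothetical system prompt -- swap in the brand's current policies.
SYSTEM_PROMPT = (
    "You are a customer support agent for [Brand], an e-commerce store. "
    "Returns: 30 days for unworn items; Sale items: final sale; "
    "Shipping: free on orders $75+."
)

def to_jsonl(pairs, path):
    """Write (customer_message, resolution) pairs as chat-format JSONL records."""
    with open(path, "w", encoding="utf-8") as f:
        for user_msg, assistant_msg in pairs:
            record = {"messages": [
                {"role": "system", "content": SYSTEM_PROMPT},
                {"role": "user", "content": user_msg},
                {"role": "assistant", "content": assistant_msg},
            ]}
            f.write(json.dumps(record, ensure_ascii=False) + "\n")
```

Because the system prompt is injected at write time, regenerating the dataset after a policy change is a one-line edit rather than a manual pass over thousands of records.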

    Include escalation examples:

    {"messages": [..., {"role": "user", "content": "I've been waiting 3 weeks for my order and I'm really frustrated. This is unacceptable."}, {"role": "assistant", "content": "I'm really sorry about this — a 3-week wait is absolutely not acceptable, and I completely understand your frustration. Let me look into this right now. I'm escalating your case to our senior support team who will reach out within 2 hours with a resolution. Your satisfaction is our priority. — [Escalated to: Senior Support]"}]}
    

    The model learns when to escalate vs resolve, which is as important as knowing the answers.

    Step 3: Train With Ertas

    Upload your JSONL file, validate, and start training. For a 1,500-example customer support dataset:

    • Base model: Llama 3 8B Instruct or Mistral 7B Instruct (both handle conversational support well)
    • Training duration: ~45-75 minutes
    • Default LoRA settings work well for support tasks

    Step 4: Evaluate

    Hold out 150-200 tickets before training. After training, run the evaluation set through the model and score each response:

    • Correct (3): Accurate answer, appropriate tone, matches or improves on the human resolution
    • Partial (2): Direction is right but missing a specific detail or policy clarification
    • Wrong (1): Factually incorrect or clearly off-base

    Target: 85%+ at score 3, less than 5% at score 1

    Pay special attention to:

    • Policy accuracy (does the model state the correct return window?)
    • Escalation accuracy (does the model escalate when it should?)
    • Hallucination rate (does the model invent order numbers or make up stock availability?)

    If the hallucination rate is high, add more explicit instructions to the system message, and add training examples that demonstrate the correct response when the model lacks the information ("I don't have access to your current order status; please check at [order tracking URL] or share your order number for help").
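The scoring itself is a human judgment call, but aggregating the 1-2-3 rubric scores against the targets above is easy to automate. A minimal sketch:

```python
def score_report(scores):
    """Aggregate 1-3 rubric scores from the held-out evaluation set."""
    n = len(scores)
    correct = scores.count(3) / n
    wrong = scores.count(1) / n
    return {
        "correct_pct": round(100 * correct, 1),
        "partial_pct": round(100 * scores.count(2) / n, 1),
        "wrong_pct": round(100 * wrong, 1),
        # Ship criteria: >=85% fully correct, <5% wrong (targets above).
        "ship": correct >= 0.85 and wrong < 0.05,
    }
```

Tracking this report across retraining cycles also gives you a regression check: if a new training run drops `correct_pct`, roll back before deploying.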

    Step 5: Deploy and Route

    Deployment: Ollama on a dedicated VPS. Route incoming support tickets to the model API before creating a Zendesk/Gorgias ticket.

    Routing logic:

    1. Ticket arrives
    2. Send to fine-tuned model with ticket text
    3. Model returns: {response: "...", confidence: "high|medium|low", escalate: true|false}
    4. If escalate: true or confidence: low: create agent ticket with model draft attached
    5. If confidence: high and not escalation: send response automatically or queue for agent 1-click approval

    Starting with 1-click approval mode (agent sees response, clicks Send or edits) builds trust before you go fully automated. Most clients reach 60-70% fully automated within 3 months.

    Ongoing Maintenance

    Each month:

    1. Review automated responses that were edited or rejected by agents
    2. The edits are your new training data — they show you where the model is wrong
    3. Retrain quarterly (or monthly for high-volume clients) with new examples added
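Harvesting that feedback can be a one-function job. A minimal sketch, assuming a hypothetical log shape (`action`, `ticket_text`, `final_response`) recorded by your approval UI:

```python
def feedback_pairs(review_log):
    """Turn agent edits/rejections into new (ticket, resolution) training pairs."""
    pairs = []
    for entry in review_log:  # hypothetical dicts logged by the approval UI
        if entry["action"] in ("edited", "rejected") and entry.get("final_response"):
            # The agent's final text is the corrected label for this ticket.
            pairs.append((entry["ticket_text"], entry["final_response"]))
    return pairs
```

The returned pairs feed straight back into the step 2 JSONL format for the next retraining run.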

    The model improves continuously as long as you have this feedback loop. This is the retainer justification: every month of logged resolutions makes the model better.


    Ship AI that runs on your users' devices.

    Ertas early bird pricing starts at $14.50/mo — locked in for life. Plans for builders and agencies.
