
Building a Training Dataset from Your App's User Interactions
Your app already generates the training data you need for fine-tuning. How to collect, clean, and format user interactions into a dataset that produces a high-quality on-device model.
The best training data for your AI model comes from your own app. Your users' real interactions, real questions, and real content represent exactly the domain your model needs to learn. No synthetic data or public dataset will match the quality of data from your actual use case.
This guide covers how to collect, clean, and format that data for fine-tuning.
What Counts as Training Data
Every user interaction in your app is a potential training example:
| App Type | Raw Data | Training Example |
|---|---|---|
| Customer support | User question + agent response | Q&A pair |
| Note-taking | User notes + auto-generated summaries | Summarization pair |
| Finance | Transaction description + assigned category | Classification pair |
| Email | Incoming email + user's reply | Reply generation pair |
| E-commerce | Product + user review | Sentiment pair |
| Health | Symptom description + triage outcome | Classification pair |
The pattern: any input-output pair where the "correct" output is known (either from explicit user action or expert judgment) is a training example.
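This input-output pattern can be captured in a small record type. A minimal Python sketch; the dataclass and its field names are illustrative, not from any particular framework:

```python
from dataclasses import dataclass

@dataclass
class TrainingExample:
    input: str           # what the model sees (query, document, context)
    output: str          # the known-correct response
    source: str          # where the label came from, e.g. "user_correction"
    weight: float = 1.0  # optional quality weight, used later when filtering

# One labeled example from a user's explicit categorization action
example = TrainingExample(
    input="Classify: Morning run in the park",
    output="Cardio",
    source="user_categorization",
)
```

Keeping the label's provenance in a `source` field pays off later, when you score and filter examples by how the label was produced.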
Data Collection Strategy
Passive Collection (Recommended Start)
Log user interactions that naturally produce input-output pairs:
- Search queries + clicked results: The clicked result is the "correct" answer
- Categorization actions: When a user assigns a category to content, that is a labeled example
- Corrections: When a user edits an AI-generated response, the edited version is the "correct" output
- Completions: When a user accepts a suggestion, that is a positive example
```typescript
// Log user corrections for training data
function onAiResponseEdited(original: string, edited: string, context: string) {
  logTrainingExample({
    input: context,
    output: edited, // The user's correction is the training target
    source: "user_correction",
    timestamp: Date.now(),
  });
}
```
Active Collection
Prompt users to provide feedback that directly produces training data:
- Thumbs up/down on AI responses: Filter for thumbs-up responses as positive examples
- Correction interface: Let users fix AI responses; log the corrections
- Template usage: When users select and use a template, the filled template is a training example
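Turning thumbs-up feedback into positive examples is a simple filter. A sketch, assuming a hypothetical event log of dicts with `prompt`, `response`, and `feedback` keys:

```python
def collect_feedback_examples(events):
    """Keep only thumbs-up interactions as positive training examples.

    `events` is assumed to be a list of dicts with hypothetical keys
    "prompt", "response", and "feedback" ("up" or "down").
    """
    return [
        {"input": e["prompt"], "output": e["response"], "source": "thumbs_up"}
        for e in events
        if e.get("feedback") == "up"
    ]

events = [
    {"prompt": "Q1", "response": "A1", "feedback": "up"},
    {"prompt": "Q2", "response": "A2", "feedback": "down"},
]
positives = collect_feedback_examples(events)
```

Rejected responses are not discarded outright; as noted later in the quality-scoring table, they can serve sparingly as contrast signal.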
Synthetic Augmentation
Supplement real data with synthetic examples:
- Take your best real examples
- Use a larger model (GPT-4o, Claude Sonnet) to generate variations
- Validate synthetic examples against real ones
- Mix synthetic and real data (aim for at least 30% real data)
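The mixing step can be enforced in code. A sketch that caps synthetic volume so real examples stay at or above the 30% floor; the function and its parameters are illustrative:

```python
import random

def mix_datasets(real, synthetic, min_real_fraction=0.3, seed=0):
    """Combine real and synthetic examples, capping synthetic volume so
    real examples stay at or above min_real_fraction of the final mix.

    Assumes each input is a list of already-formatted examples.
    """
    # If real must be >= f of the total, synthetic can be at most
    # real * (1 - f) / f examples.
    max_synthetic = int(len(real) * (1 - min_real_fraction) / min_real_fraction)
    kept_synthetic = list(synthetic[:max_synthetic])
    mixed = list(real) + kept_synthetic
    random.Random(seed).shuffle(mixed)  # seeded so runs are reproducible
    return mixed

real = [{"kind": "real"}] * 30
synthetic = [{"kind": "synthetic"}] * 200
mixed = mix_datasets(real, synthetic)
```

With 30 real examples and a 0.3 floor, at most 70 synthetic examples survive, so the real fraction never drops below 30% no matter how many variations the larger model generates.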
Privacy and Consent
Legal Requirements
Before collecting any user data for training:
- Update your privacy policy to disclose that anonymized interaction data may be used to improve AI features
- Obtain consent where required (GDPR requires explicit consent for processing personal data)
- Provide opt-out for users who do not want their interactions used for training
- Anonymize data before using it for training. Remove names, emails, phone numbers, and other PII.
Technical Anonymization
```python
import re

def anonymize(text: str) -> str:
    # Remove email addresses
    text = re.sub(r'\b[\w.-]+@[\w.-]+\.\w+\b', '[EMAIL]', text)
    # Remove phone numbers
    text = re.sub(r'\b\d{3}[-.]?\d{3}[-.]?\d{4}\b', '[PHONE]', text)
    # Remove names (requires NER or a name list)
    text = replace_names(text, '[NAME]')
    return text
```
On-Device Collection
The safest approach: collect training data on-device and only transmit anonymized, aggregated data. The raw interaction stays on the user's phone. Only the anonymized training example leaves the device.
Data Cleaning
Raw interaction data is noisy. Cleaning is the most important step in the pipeline.
Quality Filters
- Remove too-short examples: Inputs under 10 characters or outputs under 20 characters rarely contain useful signal
- Remove duplicates: Exact and near-duplicate examples add noise
- Remove errors: Interactions where the app crashed or the user abandoned mid-flow
- Remove off-topic: Interactions that do not match your target task
- Remove PII that slipped through anonymization: Secondary pass with stricter patterns
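The length and duplicate filters can be combined in a single pass. A sketch, assuming examples are dicts with hypothetical `input`/`output` keys:

```python
def quality_filter(examples, min_input=10, min_output=20):
    """Apply the length and duplicate filters in one pass over the data."""
    seen = set()
    kept = []
    for ex in examples:
        # Drop too-short examples: they rarely contain useful signal
        if len(ex["input"]) < min_input or len(ex["output"]) < min_output:
            continue
        # Drop exact duplicates (normalized for case and whitespace)
        key = (ex["input"].strip().lower(), ex["output"].strip().lower())
        if key in seen:
            continue
        seen.add(key)
        kept.append(ex)
    return kept

examples = [
    {"input": "How do I log a workout?",
     "output": "Open the Activity tab and tap the plus button."},
    {"input": "hi",  # input too short, dropped
     "output": "Hello! How can I help you today?"},
    {"input": "How do I log a workout?",  # duplicate, dropped
     "output": "Open the Activity tab and tap the plus button."},
]
clean = quality_filter(examples)
```

Near-duplicate detection (fuzzy matching or embedding similarity) is a separate, heavier pass; the normalized-key check above only catches exact and case-level duplicates.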
Quality Scoring
Not all examples are equally useful. Score each example:
| Signal | Weight | Rationale |
|---|---|---|
| User accepted the AI response | High | Direct positive signal |
| User edited then accepted | Highest | The edit is the ideal output |
| User rejected the AI response | Low (use sparingly) | Negative signal, useful for contrast |
| Long, detailed interaction | Medium | More context for the model |
| Common query pattern | Medium | High-frequency patterns matter most |
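One way to apply the table is a lookup from the example's source tag to a weight. The tag names and numeric weights below are illustrative, not prescribed values:

```python
# Hypothetical weights mirroring the scoring table
SOURCE_WEIGHTS = {
    "user_edited_then_accepted": 1.0,  # highest: the edit is the ideal output
    "user_accepted": 0.8,              # high: direct positive signal
    "long_interaction": 0.5,           # medium: more context for the model
    "common_pattern": 0.5,             # medium: high-frequency patterns
    "user_rejected": 0.2,              # low: negative signal, use sparingly
}

def score_example(example):
    """Look up a quality weight by the example's "source" tag."""
    return SOURCE_WEIGHTS.get(example.get("source"), 0.5)
```

The scores can drive a threshold cut ("keep everything above 0.5") or sampling weights when assembling the final training set.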
Target Distribution
Your training set should roughly match your production query distribution. If 40% of user queries are about topic A and 10% about topic B, your training set should reflect that ratio. Over-representing rare topics can skew the model.
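Matching the production distribution amounts to stratified downsampling. A sketch, assuming a hypothetical dict of per-topic example pools and a dict of target fractions:

```python
import random

def match_distribution(examples_by_topic, target_fractions, total, seed=0):
    """Downsample per-topic pools so the training set mirrors the
    production query distribution.

    examples_by_topic: dict of topic -> list of examples (hypothetical)
    target_fractions:  dict of topic -> fraction of production queries
    total:             desired size of the resulting training set
    """
    rng = random.Random(seed)
    sampled = []
    for topic, fraction in target_fractions.items():
        pool = examples_by_topic.get(topic, [])
        # Cap at pool size so rare topics are not over-sampled
        n = min(int(total * fraction), len(pool))
        sampled.extend(rng.sample(pool, n))
    return sampled

pools = {"topic_a": list(range(100)), "topic_b": list(range(100))}
fractions = {"topic_a": 0.4, "topic_b": 0.1}
subset = match_distribution(pools, fractions, total=100)
```

Because the per-topic draw is capped at the pool size, a rare topic with few logged examples simply contributes fewer examples rather than being inflated to hit its quota.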
Formatting for Fine-Tuning
Chat Format (Standard)
Most fine-tuning frameworks expect the chat format:
```json
{"messages": [
  {"role": "system", "content": "You are an assistant for FitTracker app."},
  {"role": "user", "content": "How many calories in a banana?"},
  {"role": "assistant", "content": "A medium banana has about 105 calories, 27g carbs, 1.3g protein, and 0.4g fat."}
]}
```
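Converting collected pairs into this format is a few lines of Python. A sketch that writes one JSON object per line (JSONL), reusing the example system prompt; the function name and pair structure are illustrative:

```python
import json

SYSTEM_PROMPT = "You are an assistant for FitTracker app."

def to_chat_jsonl(pairs, path):
    """Write (user_text, assistant_text) pairs as one chat-format
    JSON object per line, the layout most fine-tuning frameworks accept."""
    with open(path, "w", encoding="utf-8") as f:
        for user_text, assistant_text in pairs:
            record = {"messages": [
                {"role": "system", "content": SYSTEM_PROMPT},
                {"role": "user", "content": user_text},
                {"role": "assistant", "content": assistant_text},
            ]}
            # ensure_ascii=False keeps non-ASCII user text readable
            f.write(json.dumps(record, ensure_ascii=False) + "\n")
```

One object per line, not a JSON array: JSONL lets training tools stream the file without loading it all into memory.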
Multi-Turn Conversations
For chat features, include the full conversation:
```json
{"messages": [
  {"role": "system", "content": "You are an assistant for FitTracker app."},
  {"role": "user", "content": "What should I eat before a workout?"},
  {"role": "assistant", "content": "A light meal 1-2 hours before works best. Good options: banana with peanut butter, oatmeal, or a small smoothie. Focus on easily digestible carbs."},
  {"role": "user", "content": "What about protein?"},
  {"role": "assistant", "content": "Add a small amount of protein: a scoop of whey in your smoothie, Greek yogurt with your oatmeal, or a handful of almonds. Keep it under 20g to avoid feeling heavy during the workout."}
]}
```
Classification Format
For classification tasks, the format is simpler:
```json
{"messages": [
  {"role": "user", "content": "Classify: Morning run in the park"},
  {"role": "assistant", "content": "Cardio"}
]}
```
Dataset Size Guidelines
| Task | Minimum | Good | Excellent |
|---|---|---|---|
| Classification (5-10 categories) | 200 | 500-1,000 | 2,000+ |
| Q&A (bounded domain) | 300 | 1,000-2,000 | 3,000+ |
| Chat (multi-turn) | 500 | 2,000-3,000 | 5,000+ |
| Summarization | 300 | 1,000-2,000 | 3,000+ |
| Content generation | 500 | 1,500-3,000 | 5,000+ |
Quality matters more than quantity. 500 carefully curated examples outperform 5,000 noisy ones.
The Pipeline
- Instrument your app to log interactions (with user consent)
- Accumulate data over 2-4 weeks of normal usage
- Export and anonymize the logged interactions
- Clean and filter using the quality criteria above
- Format into the chat JSON structure
- Split into training (90%) and evaluation (10%) sets
- Fine-tune using a platform like Ertas: upload the formatted dataset, select your base model, train with LoRA, export GGUF
- Evaluate on the held-out set
- Deploy the GGUF model on-device
- Iterate by collecting more data and retraining periodically
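The train/eval split in step 6 can be sketched as a seeded shuffle-and-slice:

```python
import random

def train_eval_split(examples, eval_fraction=0.1, seed=42):
    """Shuffle, then hold out eval_fraction of examples for evaluation.

    Seeded so the same split is reproducible across retraining runs.
    """
    shuffled = list(examples)
    random.Random(seed).shuffle(shuffled)
    n_eval = max(1, int(len(shuffled) * eval_fraction))
    return shuffled[n_eval:], shuffled[:n_eval]

train, evaluation = train_eval_split(list(range(100)))
```

Shuffling before slicing matters: logged interactions arrive in time order, and a tail-of-file holdout would evaluate only your most recent users.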
Your app is generating training data right now. The question is whether you are capturing it.
Ship AI that runs on your users' devices.
Early bird pricing starts at $14.50/mo — locked in for life. Plans for builders and agencies.