
    API Logs to Training Data: Using Your Cloud AI History to Fine-Tune

    Your existing cloud AI API logs are a ready-made training dataset. How to extract, clean, and format API interaction logs into fine-tuning data for an on-device model.

    Ertas Team

    If you are currently using a cloud AI API (OpenAI, Anthropic, Google Gemini), you are already generating training data. Every API call you make contains an input (the user's request) and an output (the model's response). That is a training example.

    Your API logs are the fastest path from cloud AI to on-device AI. You do not need to create a dataset from scratch. You already have one.

    What API Logs Contain

    A typical API call log entry:

    {
      "timestamp": "2026-03-15T14:22:03Z",
      "model": "gpt-4o-mini",
      "messages": [
        {
          "role": "system",
          "content": "You are the shopping assistant for StyleApp..."
        },
        {
          "role": "user",
          "content": "Find me a blue dress for a summer wedding"
        },
        {
          "role": "assistant",
          "content": "Here are some suggestions for a summer wedding...\n\n1. Floral midi dress in navy blue...\n2. Light blue chiffon maxi dress..."
        }
      ],
      "tokens_used": {"input": 1842, "output": 387},
      "latency_ms": 1203
    }
    

    This log entry is already in the exact format needed for fine-tuning. The messages array is a training conversation.

    Extraction Pipeline

    Step 1: Export Your Logs

    Where your logs live depends on your architecture:

    If you log API calls yourself: Export from your database (PostgreSQL, MongoDB, etc.) or log aggregation service (Datadog, CloudWatch, etc.).

    If you use OpenAI's API: The API does not store logs by default. You need your own logging middleware. If you do not have one, set it up now. Every future API call is a potential training example.

    # Simple logging middleware example
    import json
    from datetime import datetime, timezone

    def log_api_call(request_messages, response_content, model, tokens,
                     finish_reason=None):
        log_entry = {
            # Timezone-aware UTC timestamp, matching the "Z"-suffixed
            # format in the example log above
            "timestamp": datetime.now(timezone.utc).isoformat(),
            "model": model,
            "messages": request_messages + [
                {"role": "assistant", "content": response_content}
            ],
            "tokens_used": tokens,
            # Logged so the quality filter in Step 2 can drop truncated outputs
            "finish_reason": finish_reason,
        }
        with open("api_logs.jsonl", "a") as f:
            f.write(json.dumps(log_entry) + "\n")
    
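    To capture every call, wrap your client so logging happens in one place. A minimal sketch using the OpenAI Python SDK's v1 chat-completions interface (the chat_and_log wrapper is our name, not part of the SDK):

    from openai import OpenAI

    client = OpenAI()

    def chat_and_log(messages, model="gpt-4o-mini"):
        # Make the API call, then log the full conversation before returning
        response = client.chat.completions.create(model=model, messages=messages)
        choice = response.choices[0]
        log_api_call(
            request_messages=messages,
            response_content=choice.message.content,
            model=model,
            tokens={
                "input": response.usage.prompt_tokens,
                "output": response.usage.completion_tokens,
            },
            finish_reason=choice.finish_reason,
        )
        return choice.message.content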

    Step 2: Filter for Quality

    Not every API response is good training data. Filter out:

    Failed responses: Timeout errors, malformed output, refusals.

    Low-quality outputs: Responses where the user immediately retried (indicating dissatisfaction; see the retry sketch after the filter function below), or where the output was truncated.

    Outliers: Unusually long or short responses that do not represent typical behavior.

    Off-task interactions: If users occasionally ask off-topic questions, exclude those unless you want the model to handle them.

    def is_quality_example(log_entry):
        messages = log_entry["messages"]
        assistant_msg = next(
            (m for m in reversed(messages) if m["role"] == "assistant"), None
        )
        if not assistant_msg:
            return False
    
        content = assistant_msg["content"]
    
        # Filter too-short responses
        if len(content) < 50:
            return False
    
        # Filter error responses
        if "I apologize" in content and "I cannot" in content:
            return False
    
        # Filter truncated responses
        if log_entry.get("finish_reason") == "length":
            return False
    
        return True
    
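    Detecting the "user immediately retried" signal requires session context. A rough sketch, assuming each log entry carries a session_id field and the file is sorted by timestamp (neither is part of the example schema above):

    from datetime import datetime, timedelta

    def drop_retried(logs, window=timedelta(seconds=60)):
        # Drop responses that the same session followed up on within
        # `window` -- a rough proxy for user dissatisfaction.
        # ASSUMES: entries have a "session_id" field and logs are
        # sorted by timestamp; neither is in the example schema above.
        flagged = set()
        last_seen = {}  # session_id -> (index, timestamp)
        for i, entry in enumerate(logs):
            sid = entry.get("session_id")
            if sid is None:
                continue
            ts = datetime.fromisoformat(entry["timestamp"].replace("Z", "+00:00"))
            if sid in last_seen:
                prev_i, prev_ts = last_seen[sid]
                if ts - prev_ts <= window:
                    flagged.add(prev_i)  # earlier response likely failed the user
            last_seen[sid] = (i, ts)
        return [e for i, e in enumerate(logs) if i not in flagged]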

    Step 3: Remove the System Prompt (Optional)

    If you are fine-tuning, the model learns the behavior from training examples. You may not need the system prompt in the training data because the fine-tuned model will internalize the instructions.

    Two approaches:

    Keep the system prompt: The model learns to follow these specific instructions. Good if your system prompt is short and stable.

    Remove the system prompt: The model learns the behavior pattern without explicit instructions. Good if your system prompt is long (saves tokens in training) or if you want the behavior to be intrinsic rather than instruction-dependent.
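    If you take the second approach, the transformation is simple; a minimal sketch:

    def strip_system_prompt(log_entry):
        # Drop system messages so the behavior is learned from the
        # user/assistant turns alone (the second approach above)
        log_entry["messages"] = [
            m for m in log_entry["messages"] if m["role"] != "system"
        ]
        return log_entry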

    Step 4: Anonymize

    Remove PII from the training data:

    import re
    
    def anonymize_message(content):
        # Emails
        content = re.sub(r'\b[\w.-]+@[\w.-]+\.\w+\b', '[EMAIL]', content)
        # Phone numbers
        content = re.sub(r'\b\d{3}[-.]?\d{3}[-.]?\d{4}\b', '[PHONE]', content)
        # Credit card numbers
        content = re.sub(r'\b\d{4}[-\s]?\d{4}[-\s]?\d{4}[-\s]?\d{4}\b', '[CARD]', content)
        # Addresses (basic pattern)
        content = re.sub(r'\d+\s+[\w\s]+(?:Street|St|Avenue|Ave|Road|Rd|Drive|Dr)\b', '[ADDRESS]', content)
        return content
    
    def anonymize_log(log_entry):
        for msg in log_entry["messages"]:
            msg["content"] = anonymize_message(msg["content"])
        return log_entry
    
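    A quick spot check with synthetic values (no real PII) confirms the substitutions:

    sample = "Email jane@example.com or call 555-867-5309."
    print(anonymize_message(sample))
    # -> "Email [EMAIL] or call [PHONE]."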

    Step 5: Format for Training

    Convert your filtered, anonymized logs to the standard fine-tuning format:

    def load_logs(path):
        # Stream log entries from a JSONL file, one JSON object per line
        with open(path) as f:
            for line in f:
                yield json.loads(line)

    def log_to_training_example(log_entry):
        messages = []
        for msg in log_entry["messages"]:
            if msg["role"] in ("system", "user", "assistant"):
                messages.append({
                    "role": msg["role"],
                    "content": msg["content"]
                })
        return {"messages": messages}

    # Process all logs
    training_data = []
    for log_entry in load_logs("api_logs.jsonl"):
        if is_quality_example(log_entry):
            anonymized = anonymize_log(log_entry)
            example = log_to_training_example(anonymized)
            training_data.append(example)

    # Write training file
    with open("training_data.jsonl", "w") as f:
        for example in training_data:
            f.write(json.dumps(example) + "\n")
    

    How Many Logs Do You Need?

    API Calls Per Day | Time to Collect 1,000 Examples | Time to Collect 5,000 Examples
    --- | --- | ---
    100 | 2-3 weeks (after quality filtering) | 2-3 months
    500 | 3-5 days | 2-3 weeks
    1,000 | 2-3 days | 1-2 weeks
    5,000 | 1 day | 3-5 days

    Assume 50-70% of raw API calls survive quality filtering. At 500 calls per day, you accumulate 250-350 quality examples daily.
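    The arithmetic behind the table is simple; a sketch using a mid-range 60% survival rate:

    import math

    def days_to_collect(target_examples, calls_per_day, survival_rate=0.6):
        # Mid-range assumption: 60% of raw calls survive quality filtering
        return math.ceil(target_examples / (calls_per_day * survival_rate))

    print(days_to_collect(1_000, 500))  # 4 days
    print(days_to_collect(5_000, 500))  # 17 days, roughly 2.5 weeks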

    For most tasks, 1,000 quality examples are sufficient for a well-performing fine-tuned model. You can start training within days to weeks of setting up logging.

    The Distillation Advantage

    When you fine-tune a small model (1-3B) on outputs from a larger model (GPT-4o, Claude Sonnet), you are performing knowledge distillation. The small model learns to reproduce the behavior of the large model on your specific task.

    The result: a 3B fine-tuned model that matches the large model's performance on your domain tasks while running on-device. This is not theoretical. Research and production deployments consistently show that fine-tuned small models match or exceed prompted large models on narrow, domain-specific tasks.

    Your API logs are the distillation dataset. The large model has already done the work. You just need to teach a small model to replicate it.

    From Logs to Deployment

    The end-to-end pipeline:

    1. Set up API logging (if not already in place)
    2. Accumulate 1,000+ quality examples (days to weeks)
    3. Extract, filter, anonymize, and format the logs
    4. Upload to a fine-tuning platform like Ertas
    5. Select a base model (Llama 3.2 3B recommended)
    6. Fine-tune with LoRA (30 min - 3 hours)
    7. Export GGUF (Q4_K_M quantization)
    8. Integrate llama.cpp in your mobile app
    9. A/B test the on-device model against your cloud API
    10. Migrate when the on-device model meets your quality bar
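    Before the upload in step 4, it is worth validating the file you are about to send; a minimal sketch:

    import json

    def validate_training_file(path):
        # Fail fast on malformed lines before uploading (step 4 above)
        with open(path) as f:
            for i, line in enumerate(f, start=1):
                example = json.loads(line)  # raises on invalid JSON
                messages = example.get("messages", [])
                assert messages, f"line {i}: empty messages"
                assert messages[-1]["role"] == "assistant", \
                    f"line {i}: must end with an assistant turn"

    validate_training_file("training_data.jsonl")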

    Your API logs are not just a cost center. They are the bridge to on-device AI.

    Ship AI that runs on your users' devices.

    Early bird pricing starts at $14.50/mo — locked in for life. Plans for builders and agencies.
