客户支持训练数据 Template

用于构建客户支持对话数据集的模板，微调能够处理工单、解决问题并保持品牌语气的 AI 助手。

NLP

Overview

客户支持训练数据集旨在微调语言模型，使其能够处理客户咨询、排查问题、升级复杂案例，并在整个交互过程中保持一致的品牌语气。这些数据集捕捉了高效客服对话的模式——问候客户、理解问题、提供分步解决方案，以及在确认问题已解决后结束交互。

构建高质量的客户支持数据集需要精心整理真实的支持交互记录，并结合涵盖边缘案例、罕见产品问题和升级场景的合成示例（这些在历史数据中可能代表不足）。数据集应涵盖完整的支持场景：账户管理、账单咨询、技术排查、产品退货、功能请求和投诉处理。每个示例都应体现定义品牌支持标准的语气、同理心和问题解决方法。

最有效的客户支持数据集同时包含简单的单轮交互（密码重置、订单状态查询）和复杂的多轮对话（客服需要收集诊断信息、尝试多种解决方案，并根据客户的反馈调整方法）。包含优雅升级的示例——识别何时需要人工介入并平稳交接——能训练模型认识到自身的局限性。

Dataset Schema

typescript

interface CustomerSupportExample {
  conversations: {
    from: "system" | "human" | "gpt";
    value: string;
  }[];
  metadata: {
    category: "billing" | "technical" | "account" | "returns" | "general";
    product: string;
    resolution: "resolved" | "escalated" | "pending";
    sentiment_shift: "positive" | "neutral" | "negative_to_positive";
    turns: number;
  };
}

客户支持训练示例的 TypeScript 架构，包含对话元数据

Sample Data

json

[
  {
    "conversations": [
      {"from": "system", "value": "You are a friendly and professional customer support agent for TechCorp. Always greet customers by name when available, empathize with their issue, and provide clear step-by-step solutions. If you cannot resolve the issue, offer to escalate to a specialist."},
      {"from": "human", "value": "Hi, I've been charged twice for my subscription this month. Order #TC-48291."},
      {"from": "gpt", "value": "I'm sorry to hear about the double charge — I understand how frustrating that must be. Let me look into order #TC-48291 right away.\n\nI can see that a duplicate charge of $14.99 was processed on March 3rd. This appears to be a billing system error. I've initiated a refund for the duplicate charge, which should appear on your statement within 3-5 business days.\n\nIs there anything else I can help you with regarding your account?"},
      {"from": "human", "value": "That was fast, thank you! Actually, can I also upgrade to the annual plan?"},
      {"from": "gpt", "value": "Of course! I'd be happy to help with that. The annual plan is $149.99/year, which saves you about 17% compared to the monthly plan.\n\nHere's what will happen when I process the upgrade:\n1. Your current monthly billing cycle will end today\n2. A prorated credit for the remaining days will be applied\n3. The annual plan will start immediately\n\nShall I go ahead and process the upgrade?"}
    ],
    "metadata": {
      "category": "billing",
      "product": "subscription",
      "resolution": "resolved",
      "sentiment_shift": "negative_to_positive",
      "turns": 4
    }
  },
  {
    "conversations": [
      {"from": "system", "value": "You are a friendly and professional customer support agent for TechCorp."},
      {"from": "human", "value": "My smart thermostat keeps disconnecting from WiFi every few hours. Model ST-200."},
      {"from": "gpt", "value": "I'm sorry you're experiencing connectivity issues with your ST-200 thermostat. Let's troubleshoot this step by step.\n\nFirst, let's check a few things:\n1. Is your thermostat running firmware version 3.2 or later? You can check in Settings > About.\n2. How far is the thermostat from your WiFi router?\n3. Are other devices on your network experiencing similar disconnections?\n\nThis will help me narrow down whether it's a firmware issue, signal strength problem, or router compatibility concern."}
    ],
    "metadata": {
      "category": "technical",
      "product": "smart-thermostat",
      "resolution": "pending",
      "sentiment_shift": "neutral",
      "turns": 2
    }
  }
]

涵盖账单和技术支持场景的客户支持训练示例

Data Collection Guide

最优质的客户支持训练数据来源于你自己的历史支持交互记录。从你的工单系统（Zendesk、Intercom、Freshdesk）中导出已解决的工单，筛选出客户评分较高的对话（CSAT 4-5 星），并将其转换为训练格式。对所有客户个人信息进行脱敏处理——姓名、电子邮件地址、订单号和账户详情应替换为逼真的占位符或进行遮蔽。Ertas Data Suite 的 PII 脱敏引擎可自动完成这一关键步骤。

用合成示例补充历史数据，以填补覆盖空白。常见的空白包括：在真实数据中出现频率较低的罕见产品问题、没有历史支持数据的新产品上线、客服应将问题交由人工处理的升级场景、多语言支持交互，以及政策冲突的边缘案例。让你最优秀的客服人员撰写或审核合成示例，确保其反映真实的客户语言和适当的解决模式。

质量重于数量。2,000-5,000 条高质量、多样化的支持对话训练出的模型，优于 50,000 条低质量或重复的示例。确保在产品类别、问题类型、客户情绪等级和解决结果之间均衡分布。同时包含简单的快速解决方案和复杂的多步骤排查序列，以训练模型应对全方位的支持场景。

Quality Criteria

每个训练示例都应展示专业、有同理心的沟通方式，符合你品牌的支持标准。检查客服回复是否在提供解决方案之前先对客户的不满或担忧表示理解，是否提供了清晰可操作的步骤，是否避免使用专业术语（除非客户已展示出技术知识），以及是否在交互结束时确认问题已解决并提供进一步帮助。

验证数据集中技术解决方案的准确性和时效性。过时的排查步骤、错误的定价信息或对已停产产品的引用会训练模型给出错误答案。建立审核周期，让支持团队负责人每季度验证数据集中技术内容的准确性。标记那些解决方案需要模型无法获取的信息（如访问内部管理工具）的示例，并将其删除或修改为反映模型实际能力的版本。

通过覆盖指标衡量数据集质量：实际支持工单类别中有多少比例被涵盖，简单与复杂交互的分布如何，有多少示例包含成功的升级处理，以及有多少比例展示了从负面客户情绪中恢复的能力。一个均衡的数据集应在所有主要支持类别中都有代表，每个类别至少包含 50-100 个示例以确保充分覆盖。

Using This Template with Ertas

将你的历史支持对话导入 Ertas Data Suite，PII 脱敏引擎将自动检测并遮蔽客户姓名、电子邮件地址、电话号码、账户号码和其他个人标识信息。数据溯源追踪系统会准确记录哪些字段被脱敏处理以及使用了什么脱敏方法，为 GDPR 和 CCPA 合规生成所需的文档。

清洗和脱敏完成后，以 ShareGPT 或 JSONL 格式导出数据集，并使用 Ertas Studio 进行微调。训练后的模型以 GGUF 格式导出用于本地推理，确保你的支持 AI 处理的客户数据永远不会离开你的基础设施。

Recommended Model

对于客户支持微调，建议从 7B-8B 参数的基础模型开始，如 Llama 3.1 8B 或 Mistral 7B。这些模型足够大，能处理支持交互中细微的语言差异，同时又足够小，可实现高效微调和低延迟本地推理。以 Q4_K_M 量化导出的 GGUF 模型不到 5 GB，可在现代 CPU 上流畅运行。

对于处理高支持量的企业部署，可考虑 13B-14B 参数的模型以提升回复质量，尤其在复杂的技术排查场景中。针对你的具体支持基准测试两种大小的模型，找到适合你用例的质量与速度之间的最佳平衡。

Ship AI that runs on your users' devices.

Free plan with 30 credits/mo, no card required. Paid plans from $25/mo USD.

or view pricing →