客戶支援資料集 Template

用於訓練客戶支援自動化和工單分類 AI 模型的結構化對話資料集範本

NLP

Overview

客戶支援訓練資料集旨在微調能夠處理客戶諮詢、排除問題、升級複雜案例，並在整個互動過程中維持一致品牌語調的語言模型。這些資料集捕捉有效支援對話的模式——問候客戶、了解他們的問題、提供逐步解決方案，以及在確認問題已解決後結束互動。

建構高品質的客戶支援資料集需要仔細整理真實的支援互動，並結合涵蓋邊緣案例、罕見產品問題以及在歷史資料中可能代表性不足的升級情境的合成範例。資料集應涵蓋完整的支援情境範圍：帳戶管理、帳單查詢、技術故障排除、產品退貨、功能請求和投訴處理。每個範例都應展示定義您品牌支援標準的語調、同理心和問題解決方法。

最有效的客戶支援資料集同時包含簡單的單輪互動（密碼重設、訂單狀態查詢）和複雜的多輪對話，其中客服人員需要收集診斷資訊、嘗試多種解決方案，並根據客戶回應調整方法。包含優雅升級的範例——識別何時問題需要人工介入並順暢移交——可訓練模型了解自身的局限性。

Dataset Schema

typescript

interface CustomerSupportExample {
  conversations: {
    from: "system" | "human" | "gpt";
    value: string;
  }[];
  metadata: {
    category: "billing" | "technical" | "account" | "returns" | "general";
    product: string;
    resolution: "resolved" | "escalated" | "pending";
    sentiment_shift: "positive" | "neutral" | "negative_to_positive";
    turns: number;
  };
}

客戶支援訓練範例的 TypeScript 結構定義，包含對話元資料

Sample Data

json

[
  {
    "conversations": [
      {"from": "system", "value": "You are a friendly and professional customer support agent for TechCorp. Always greet customers by name when available, empathize with their issue, and provide clear step-by-step solutions. If you cannot resolve the issue, offer to escalate to a specialist."},
      {"from": "human", "value": "Hi, I've been charged twice for my subscription this month. Order #TC-48291."},
      {"from": "gpt", "value": "I'm sorry to hear about the double charge — I understand how frustrating that must be. Let me look into order #TC-48291 right away.\n\nI can see that a duplicate charge of $14.99 was processed on March 3rd. This appears to be a billing system error. I've initiated a refund for the duplicate charge, which should appear on your statement within 3-5 business days.\n\nIs there anything else I can help you with regarding your account?"},
      {"from": "human", "value": "That was fast, thank you! Actually, can I also upgrade to the annual plan?"},
      {"from": "gpt", "value": "Of course! I'd be happy to help with that. The annual plan is $149.99/year, which saves you about 17% compared to the monthly plan.\n\nHere's what will happen when I process the upgrade:\n1. Your current monthly billing cycle will end today\n2. A prorated credit for the remaining days will be applied\n3. The annual plan will start immediately\n\nShall I go ahead and process the upgrade?"}
    ],
    "metadata": {
      "category": "billing",
      "product": "subscription",
      "resolution": "resolved",
      "sentiment_shift": "negative_to_positive",
      "turns": 4
    }
  },
  {
    "conversations": [
      {"from": "system", "value": "You are a friendly and professional customer support agent for TechCorp."},
      {"from": "human", "value": "My smart thermostat keeps disconnecting from WiFi every few hours. Model ST-200."},
      {"from": "gpt", "value": "I'm sorry you're experiencing connectivity issues with your ST-200 thermostat. Let's troubleshoot this step by step.\n\nFirst, let's check a few things:\n1. Is your thermostat running firmware version 3.2 or later? You can check in Settings > About.\n2. How far is the thermostat from your WiFi router?\n3. Are other devices on your network experiencing similar disconnections?\n\nThis will help me narrow down whether it's a firmware issue, signal strength problem, or router compatibility concern."}
    ],
    "metadata": {
      "category": "technical",
      "product": "smart-thermostat",
      "resolution": "pending",
      "sentiment_shift": "neutral",
      "turns": 2
    }
  }
]

涵蓋帳單與技術支援情境的客戶支援訓練範例

Data Collection Guide

最佳的客戶支援訓練資料來自您自身的歷史支援互動。從您的客服系統（Zendesk、Intercom、Freshdesk）匯出已解決的工單，篩選客戶評分較高的對話（CSAT 4-5 星），然後轉換為訓練格式。匿名化所有客戶個人資訊——姓名、電子郵件地址、訂單號碼和帳戶詳情應替換為逼真的佔位符或予以遮蔽。Ertas Data Suite 的個資遮蔽引擎可自動化這個關鍵步驟。

以合成範例補充歷史資料，針對覆蓋範圍的缺口。常見的缺口包括：在真實資料中出現頻率較低的罕見產品問題、尚無歷史支援資料的新產品上線、客服人員應移交給人工處理的升級情境、多語言支援互動，以及政策衝突的邊緣案例。請您最優秀的支援人員撰寫或審核合成範例，以確保它們反映真實的客戶用語和適當的解決模式。

品質優先於數量至關重要。2,000 至 5,000 筆高品質、多樣化的支援對話所產生的模型，將優於 50,000 筆低品質或重複的範例。確保在產品類別、問題類型、客戶情緒等級和解決結果之間有均衡的代表性。同時包含直接明瞭的解決方案和複雜的多步驟故障排除序列，以訓練模型應對完整範圍的支援情境。

Quality Criteria

每個訓練範例都應展示專業、富有同理心的溝通方式，符合您品牌的支援標準。檢查客服回應是否在跳入解決方案之前先承認客戶的挫折或關切、提供清晰可行的步驟、避免使用專業術語（除非客戶已展示技術知識），以及在確認問題已解決並提供進一步協助後結束互動。

驗證資料集中的技術解決方案是否準確且為最新。過時的故障排除步驟、不正確的定價資訊，或對已停產產品的引用，都會訓練模型給出錯誤答案。建立審核週期，讓支援團隊主管每季度驗證資料集中技術內容的準確性。標記那些解決方案需要模型無法取得的資訊（例如存取內部管理工具）的範例，並移除或修改它們以反映模型實際能做到的事。

透過覆蓋率指標衡量資料集品質：您實際支援工單類別的覆蓋百分比為何、簡單與複雜互動的分佈如何、有多少範例包含成功的升級，以及有多大比例展示了從負面客戶情緒的恢復。一個平衡良好的資料集應在所有主要支援類別中都有代表性，每個類別至少有 50-100 個範例以確保足夠的覆蓋。

Using This Template with Ertas

將您的歷史支援對話匯入 Ertas Data Suite，個資遮蔽引擎將自動偵測並遮蔽客戶姓名、電子郵件地址、電話號碼、帳號及其他個人識別資訊。資料血統追蹤功能記錄確切哪些欄位被遮蔽及使用何種遮蔽方法，為您建立符合 GDPR 和 CCPA 所需的合規文件。

清理和遮蔽完成後，以 ShareGPT 或 JSONL 格式匯出資料集，並使用 Ertas Studio 進行微調。訓練完成的模型以 GGUF 格式匯出供本地推論使用，確保您的支援 AI 所處理的客戶資料永遠不會離開您的基礎設施。

Recommended Model

對於客戶支援微調，建議從 7B-8B 參數的基礎模型開始，例如 Llama 3.1 8B 或 Mistral 7B。這些模型足夠大，能處理支援互動中細膩的語言，同時又足夠小，可進行高效微調和低延遲的本地推論。以 Q4_K_M 量化的 GGUF 匯出所產生的模型不到 5 GB，可在現代 CPU 上流暢運行。

對於處理大量支援需求的企業部署，可考慮 13B-14B 參數的模型以提升回應品質，特別是在複雜的技術故障排除情境中。針對您的具體支援基準測試兩種尺寸，以找到適合您使用案例的最佳品質與速度平衡。

Ship AI that runs on your users' devices.

Free plan with 30 credits/mo, no card required. Paid plans from $25/mo USD.

or view pricing →