
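DPO and Preference Data: Preparing Alignment Datasets On-Premise
DPO alignment requires pairs of chosen and rejected responses. For enterprises with sensitive data, this preparation has to happen on-premise. Below is a complete workflow for building a preference dataset without any data leaving the organization.
Direct Preference Optimization (DPO) is arguably the most practical alignment technique available to enterprise teams today. It steers model behavior (tone, accuracy, policy compliance, safety) without the infrastructure complexity of RLHF. All it takes is a set of response pairs labeled "chosen" and "rejected", plus a single fine-tuning run.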
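Preference dataset format
{
  "prompt": "Customer asks: 'Can I get a refund on my subscription fee?'",
  "chosen": "I can help with that. Our refund policy allows a full refund within 30 days of purchase. Could you share your order number?",
  "rejected": "Sure, I'll process your refund right away! You should see the money back within 24 hours."
}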
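Before records like the one above go into training, it can help to check them mechanically. The following is a minimal sketch, assuming each record is a flat object with exactly the three string fields shown (`prompt`, `chosen`, `rejected`). The helper names `validate_pair` and `to_jsonl_line` are hypothetical and not part of any DPO library.

```python
import json

# Field names taken from the example record; everything else here is illustrative.
REQUIRED_KEYS = ("prompt", "chosen", "rejected")


def validate_pair(record: dict) -> list[str]:
    """Return a list of problems found in one preference record (empty if valid)."""
    problems = []
    for key in REQUIRED_KEYS:
        value = record.get(key)
        if not isinstance(value, str) or not value.strip():
            problems.append(f"missing or empty field: {key}")
    # A pair whose two responses are identical carries no preference signal.
    if not problems and record["chosen"].strip() == record["rejected"].strip():
        problems.append("chosen and rejected are identical")
    return problems


def to_jsonl_line(record: dict) -> str:
    """Serialize one record as a single JSONL line, keeping non-ASCII text readable."""
    return json.dumps({k: record[k] for k in REQUIRED_KEYS}, ensure_ascii=False)
```

Running `validate_pair` over every record before export catches empty fields and degenerate pairs early, while the data is still with the people who labeled it.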
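Sources of preference data in the enterprise
Human feedback logs, A/B test results, model outputs that have gone through quality review, expert corrections, internal style guides, and compliance rules.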
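Preparation pipeline
- Collect prompt-response pairs (aim for a raw set of 1,000-2,000)
- Have domain experts rank the responses or pick the preferred one (expect 40-60 pairs per hour)
- Format the results as DPO pairs
- Run an inter-annotator agreement quality check (Cohen's kappa above 0.7)
- Export to JSONL (85% training / 15% validation)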
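The last two steps of the pipeline can be sketched in plain Python. This is an illustrative sketch, not a reference implementation: it assumes two annotators have labeled the same items with a categorical choice (for example which of two responses, "A" or "B", they prefer), and the function names, the fixed shuffle seed, and the output file paths are all hypothetical.

```python
import json
import random
from collections import Counter


def cohens_kappa(labels_a, labels_b):
    """Cohen's kappa for two annotators who labeled the same items."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("need two equal-length, non-empty label lists")
    n = len(labels_a)
    # Observed agreement: fraction of items where both annotators agree.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Chance agreement: product of each annotator's label frequencies, summed.
    counts_a, counts_b = Counter(labels_a), Counter(labels_b)
    expected = sum(counts_a[c] * counts_b[c] for c in counts_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both annotators used one identical label throughout
    return (observed - expected) / (1 - expected)


def split_pairs(pairs, train_fraction=0.85, seed=0):
    """Shuffle deterministically, then split into (train, validation)."""
    shuffled = list(pairs)
    random.Random(seed).shuffle(shuffled)
    cut = round(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


def write_jsonl(pairs, path):
    """Write one JSON object per line, UTF-8 encoded."""
    with open(path, "w", encoding="utf-8") as f:
        for pair in pairs:
            f.write(json.dumps(pair, ensure_ascii=False) + "\n")
```

A typical use would be to compute `cohens_kappa` on a doubly annotated subset, proceed only if it clears the 0.7 threshold, and then call `split_pairs` followed by `write_jsonl` once for each split, for example to hypothetical files `train.jsonl` and `val.jsonl` on local disk.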
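Why it has to be on-premise
Preference data is arguably more sensitive than the raw training data it is derived from. Chosen/rejected pairs reveal what the organization considers "good". The rejected responses are especially revealing, since they show exactly which behaviors the organization wants to avoid.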
Scale requirements
Minimum viable: 500 pairs. Recommended: 2,000-3,000 pairs. Comprehensive: 5,000+ pairs.
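These tiers translate directly into expert time. As a rough back-of-the-envelope sketch, the snippet below combines the dataset sizes above with the 40-60 pairs per hour throughput quoted in the pipeline. It assumes each pair is labeled once by a single annotator, so any double annotation done for the agreement check would add to the totals. The function name `annotation_hours` is hypothetical.

```python
def annotation_hours(num_pairs, pairs_per_hour_low=40, pairs_per_hour_high=60):
    """Return (best_case_hours, worst_case_hours) for labeling num_pairs once."""
    return num_pairs / pairs_per_hour_high, num_pairs / pairs_per_hour_low


# Lower bound of each size tier from the text.
for label, size in [("minimum viable", 500), ("recommended", 2000), ("comprehensive", 5000)]:
    best, worst = annotation_hours(size)
    print(f"{label}: {size} pairs -> {best:.1f} to {worst:.1f} expert hours")
```

Under these assumptions, the 2,000-pair tier works out to somewhere between roughly 33 and 50 hours of domain-expert time.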
DPO alignment is a data quality problem, not a data quantity problem.