
EU AI Act Article 10 Compliance: Data Prep Documentation as a Client Deliverable
How service providers turn EU AI Act Article 10 training data requirements into a structured client deliverable with documentation templates and deadlines.
The EU AI Act requires specific documentation for training data used in high-risk AI systems. Article 10 lays out requirements for data governance: how training data is collected, prepared, examined for biases, and documented. Article 11 and Annex IV extend these into technical documentation requirements for high-risk systems, which must be kept up to date throughout the system's lifecycle. Article 53 sets separate documentation obligations for providers of general-purpose AI models.
The August 2, 2026 applicability deadline for high-risk AI systems is not theoretical. It applies to systems placed on the EU market or put into service after that date. If your enterprise client is deploying an AI system that falls under Annex III's high-risk categories (which include systems used in healthcare, employment, law enforcement, education, and critical infrastructure), the training data documentation is not optional.
For service providers who prepare training data for these clients, this creates both an obligation and an opportunity. The obligation: your data preparation process must produce the documentation that Article 10 requires. The opportunity: delivering Article 10-compliant documentation as part of your engagement package differentiates you from providers who deliver a JSONL file and nothing else.
What Article 10 Actually Requires
Article 10 ("Data and Data Governance") specifies that high-risk AI systems must be developed using training, validation, and testing datasets that are subject to appropriate data governance and management practices. Specifically:
Data Governance Practices (Article 10(2))
- Design choices for the datasets
- Data collection processes and origin of the data
- Relevant data preparation processing operations (annotation, labeling, cleaning, enrichment, aggregation)
- The formulation of assumptions about what the data measures and represents
- An assessment of availability, quantity, and suitability of the datasets
- Examination of possible biases
- Identification of data gaps or shortcomings
Data Quality Criteria (Article 10(3))
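Training, validation, and testing datasets must:
- Be relevant and sufficiently representative for the intended purpose
- Be, to the best extent possible, free of errors and complete in view of the intended purpose
- Have appropriate statistical properties, including for the persons or groups the system is intended to be used on
Article 10(4) adds that datasets must take into account the geographic, contextual, behavioral, or functional setting in which the system is intended to be used.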
Bias Examination (Article 10(2)(f))
Datasets must be examined for possible biases "that are likely to affect the health and safety of persons, have a negative impact on fundamental rights, or lead to discrimination." This is not a check-the-box exercise. The examination must be documented with methodology and findings.
Connecting Article 10 to Technical Documentation (Article 11 / Annex IV)
Article 11 requires providers to draw up technical documentation before the system is placed on the market or put into service, and to keep it up to date. Annex IV specifies what this documentation must contain. Within Section 2 of Annex IV, point 2(d) covers the training data requirements (described in datasheets), and point 2(g) covers validation and testing:
| Annex IV reference | Requirement |
|---|---|
| 2(d) | Training methodologies and techniques used |
| 2(d) | Training datasets: general description and provenance |
| 2(d) | Scope and main characteristics of the datasets |
| 2(d) | How data was obtained and selected |
| 2(d) | Labeling procedures (e.g. for supervised learning) |
| 2(d) | Data cleaning methodologies (e.g. outlier detection) |
| 2(g) | Validation and testing procedures, including the validation and testing data used |
For service providers, this means the technical documentation for the training data portion is effectively a deliverable you must produce.
The August 2026 Deadline Reality
The EU AI Act entered into force on August 1, 2024. The compliance timeline for high-risk AI systems is:
- February 2, 2025: Prohibited AI practices take effect
- August 2, 2025: Obligations for general-purpose AI models take effect
- August 2, 2026: Obligations for Annex III high-risk AI systems apply (high-risk systems embedded in products covered by Annex I follow on August 2, 2027)
Any high-risk AI system placed on the EU market or put into service after August 2, 2026 must comply with the full requirements — including Article 10 training data governance documentation.
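For engagement planning, the date check can be written down explicitly. The following is a minimal sketch: the function names are hypothetical, and treating August 2, 2026 itself as in scope is an assumption that should be confirmed with legal counsel.

```python
from datetime import date

# Application date from the timeline above.
# Assumption: the boundary date itself counts as in scope.
HIGH_RISK_APPLICATION_DATE = date(2026, 8, 2)


def needs_article_10_docs(go_live: date) -> bool:
    """True if the planned go-live falls on or after the application date."""
    return go_live >= HIGH_RISK_APPLICATION_DATE


def days_until_application(today: date) -> int:
    """Days left before the application date (negative once it has passed)."""
    return (HIGH_RISK_APPLICATION_DATE - today).days
```

A project plan could call `needs_article_10_docs` on each engagement's planned delivery date to flag which statements of work need the documentation deliverable.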
For service providers, this means engagements that will deliver in Q3 2026 or later must already be planning for Article 10 documentation. If your current pipeline does not produce the required documentation, you have roughly 5 months to close the gap.
Enterprise clients are already adding EU AI Act compliance requirements to RFPs and vendor assessments. Service providers who can demonstrate their data preparation process produces Article 10-compliant documentation will be selected over those who cannot.
Practical Documentation Template
The following template structure covers the Article 10 and Annex IV requirements for training data documentation. Adapt it per engagement.
Section 1: Data Sources and Collection
1.1 Data Source Inventory
- Source name, type, owner, collection period
- Per-source record count and characteristics
- Legal basis for data processing (per GDPR if applicable)
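One way to keep the inventory machine-readable is a small record type per source. This is a sketch only: the `DataSource` class, its field names, and the sample values are illustrative placeholders that mirror the bullets above, not a schema prescribed by the Act.

```python
import json
from dataclasses import asdict, dataclass


@dataclass
class DataSource:
    # Hypothetical field names mirroring the inventory bullets above.
    name: str
    source_type: str
    owner: str
    collection_start: str  # ISO 8601 date
    collection_end: str    # ISO 8601 date
    record_count: int
    legal_basis: str       # relevant where GDPR applies


def inventory_to_json(sources: list[DataSource]) -> str:
    """Serialize the inventory so it can be attached to the deliverable."""
    return json.dumps([asdict(s) for s in sources], indent=2)


# Placeholder entry for illustration only.
inventory = [
    DataSource("support_tickets", "internal database", "Client Ops",
               "2023-01-01", "2024-12-31", 48210, "legitimate interests"),
]
print(inventory_to_json(inventory))
```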
1.2 Data Selection Criteria
- Inclusion/exclusion criteria applied
- Sampling methodology (if applicable)
- Rationale for data selection relative to intended purpose
1.3 Data Representativeness Assessment
- Geographic coverage
- Temporal coverage
- Demographic coverage (where relevant)
- Known limitations and gaps
Section 2: Data Preparation Operations
2.1 Preprocessing Steps
- Document parsing method and parameters
- Text extraction approach
- Cleaning operations (deduplication, normalization, filtering)
- Operator IDs and timestamps for each operation
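Capturing operator IDs and timestamps is easiest when each preprocessing step writes a structured record at the moment it runs. The sketch below assumes a JSON Lines log. `OperationRecord` and its fields are hypothetical names, not a required format.

```python
import json
from dataclasses import asdict, dataclass
from datetime import datetime, timezone


@dataclass
class OperationRecord:
    operation: str
    parameters: dict
    operator_id: str
    timestamp: str      # UTC, ISO 8601, set when the record is created
    records_in: int
    records_out: int


def record_operation(operation: str, parameters: dict, operator_id: str,
                     records_in: int, records_out: int) -> OperationRecord:
    """Create a log record stamped with the current UTC time."""
    now = datetime.now(timezone.utc).isoformat()
    return OperationRecord(operation, parameters, operator_id,
                           now, records_in, records_out)


def to_jsonl(records: list[OperationRecord]) -> str:
    """One JSON object per line, suitable for appending to an audit log file."""
    return "\n".join(json.dumps(asdict(r), sort_keys=True) for r in records)
```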
2.2 De-identification and Redaction
- PII/PHI detection methods
- Entity types targeted
- Replacement strategy (mask, pseudonymize, remove)
- Validation results (detection rate, sample size)
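The detection rate in the validation results can be computed from a manually reviewed sample. This sketch assumes spans are compared by exact match on `(document_id, start, end, entity_type)` tuples. Real validation may need partial-overlap matching, and the span format here is an assumption.

```python
def detection_rate(gold_spans: set, detected_spans: set) -> float:
    """Share of manually confirmed PII spans that the detector also found."""
    if not gold_spans:
        raise ValueError("validation sample contains no annotated PII spans")
    return len(gold_spans & detected_spans) / len(gold_spans)


# Illustrative sample: four reviewer-confirmed spans, three of them detected.
gold = {
    ("doc1", 0, 10, "NAME"),
    ("doc1", 25, 37, "PHONE"),
    ("doc2", 5, 20, "EMAIL"),
    ("doc2", 40, 50, "NAME"),
}
detected = {
    ("doc1", 0, 10, "NAME"),
    ("doc1", 25, 37, "PHONE"),
    ("doc2", 5, 20, "EMAIL"),
    ("doc2", 60, 70, "NAME"),  # false positive, does not affect the rate
}
rate = detection_rate(gold, detected)
```

Reporting the sample size (`len(gold)`) alongside the rate lets the client judge how much weight the figure can bear.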
2.3 Data Quality Measures
- Quality scoring criteria
- Records removed and reasons
- Error rate measurements
- Completeness assessment
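Two of these measures reduce to simple counts. The sketch below assumes a hypothetical three-field record schema and a list of `(record_id, reason)` pairs for removals. Both are placeholders to adapt per engagement.

```python
from collections import Counter

REQUIRED_FIELDS = ("text", "label", "source")  # hypothetical schema


def completeness(records: list[dict]) -> float:
    """Fraction of records in which every required field is present and non-empty."""
    if not records:
        return 0.0
    complete = sum(
        all(r.get(field) not in (None, "") for field in REQUIRED_FIELDS)
        for r in records
    )
    return complete / len(records)


def removal_summary(removed: list[tuple[str, str]]) -> dict[str, int]:
    """Count removed records per reason from (record_id, reason) pairs."""
    return dict(Counter(reason for _, reason in removed))
```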
Section 3: Annotation and Labeling
3.1 Annotation Methodology
- Task definition and label schema
- Annotation guideline version
- Annotator qualifications and training
3.2 Annotation Process
- Number of annotators
- Inter-annotator agreement methodology and results
- Disagreement resolution process
- Review and approval workflow
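For two annotators labeling the same items, Cohen's kappa is a common inter-annotator agreement statistic. The Act does not mandate a particular metric, so this is one reasonable choice rather than a requirement. A minimal standard-library sketch:

```python
from collections import Counter


def cohens_kappa(labels_a: list[str], labels_b: list[str]) -> float:
    """Cohen's kappa for two annotators who labeled the same items in the same order."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("need two equal-length, non-empty label lists")
    n = len(labels_a)
    # Observed agreement: share of items with identical labels.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Expected agreement by chance, from each annotator's label frequencies.
    counts_a, counts_b = Counter(labels_a), Counter(labels_b)
    expected = sum(counts_a[c] * counts_b[c] for c in counts_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both annotators used one identical label throughout
    return (observed - expected) / (1 - expected)


annotator_1 = ["pos", "pos", "neg", "neg"]
annotator_2 = ["pos", "neg", "neg", "neg"]
kappa = cohens_kappa(annotator_1, annotator_2)
```

In the example, observed agreement is 0.75 and chance agreement is 0.5, giving a kappa of 0.5. With more than two annotators, a different statistic (for example Fleiss' kappa or Krippendorff's alpha) would be needed.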
3.3 Label Distribution
- Per-label record counts
- Class balance assessment
- Underrepresented categories identification
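The label distribution section can be generated directly from the final labels. In this sketch the 5% threshold for flagging underrepresented categories is an arbitrary illustration. The right cutoff depends on the task and should be agreed with the client.

```python
from collections import Counter


def label_distribution(labels: list[str]) -> dict[str, dict]:
    """Per-label record count and share of the whole dataset."""
    counts = Counter(labels)
    total = sum(counts.values())
    return {label: {"count": n, "share": n / total} for label, n in counts.items()}


def underrepresented(labels: list[str], min_share: float = 0.05) -> list[str]:
    """Labels whose share falls below min_share (illustrative threshold)."""
    dist = label_distribution(labels)
    return sorted(label for label, stats in dist.items() if stats["share"] < min_share)


labels = ["approve"] * 60 + ["reject"] * 38 + ["escalate"] * 2
flagged = underrepresented(labels)
```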
Section 4: Bias Examination
4.1 Bias Assessment Methodology
- Methods used to examine potential biases
- Protected characteristics examined
- Tools and metrics employed
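As one example of a metric that could be listed here, the sketch below compares the rate of a positive label across groups and reports the ratio of the lowest rate to the highest. This is a simple screening statistic, not a complete bias examination. The record keys and function names are hypothetical.

```python
from collections import defaultdict


def positive_rate_by_group(records: list[dict], group_key: str,
                           label_key: str, positive_label: str) -> dict[str, float]:
    """Share of records carrying the positive label, computed per group."""
    totals: dict[str, int] = defaultdict(int)
    positives: dict[str, int] = defaultdict(int)
    for r in records:
        group = r[group_key]
        totals[group] += 1
        if r[label_key] == positive_label:
            positives[group] += 1
    return {g: positives[g] / totals[g] for g in totals}


def disparity_ratio(rates: dict[str, float]) -> float:
    """Lowest group rate divided by the highest; 1.0 means equal rates."""
    highest = max(rates.values())
    return min(rates.values()) / highest if highest > 0 else 1.0
```

A ratio well below 1.0 signals a difference worth investigating and documenting under 4.2. It does not by itself establish that the dataset is biased, since group differences can have legitimate explanations.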
4.2 Findings
- Identified biases and their potential impact
- Mitigation measures applied
- Residual bias assessment
4.3 Limitations
- Known gaps in bias examination
- Areas where further assessment is recommended
Section 5: Dataset Description
5.1 Final Dataset Composition
- Total records, format, schema
- Source distribution
- Label distribution
- Quality score distribution
5.2 Dataset Versioning
- Version identifier
- Relationship to previous versions (if any)
- Change log from previous version
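A version identifier is most useful when it is derived from the data itself. The sketch below assumes records are JSON-serializable dicts and uses a truncated SHA-256 digest as the identifier. The 16-character length and the added/removed change log format are illustrative choices.

```python
import hashlib
import json


def _canonical(record: dict) -> str:
    """Stable JSON form so that key order does not affect the hash."""
    return json.dumps(record, sort_keys=True, ensure_ascii=False)


def dataset_version_id(records: list[dict]) -> str:
    """Content-derived identifier that ignores record order."""
    line_hashes = sorted(
        hashlib.sha256(_canonical(r).encode("utf-8")).hexdigest() for r in records
    )
    digest = hashlib.sha256("\n".join(line_hashes).encode("utf-8")).hexdigest()
    return digest[:16]


def change_log(previous: list[dict], current: list[dict]) -> dict[str, int]:
    """Count distinct records added and removed between two versions."""
    prev = {_canonical(r) for r in previous}
    curr = {_canonical(r) for r in current}
    return {"added": len(curr - prev), "removed": len(prev - curr)}
```

Because the identifier depends only on content, two exports of the same dataset produce the same ID, which makes it straightforward to show which dataset version a given documentation package describes.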
5.3 Known Limitations
- Coverage gaps
- Quality limitations
- Recommended use constraints
Turning Documentation Into Competitive Advantage
Most AI service providers deliver a training dataset and a brief README. The compliance documentation — if it exists — is assembled retroactively, often weeks after the engagement ends, from whatever logs and notes can be found.
Providers who integrate documentation production into their pipeline — generating it automatically as data flows through each stage — deliver a structurally different product. The documentation is:
- Complete: Every operation is captured, not just the ones someone remembered to log
- Contemporaneous: Timestamps and operator IDs are recorded at the time of the action, not reconstructed later
- Consistent: The same schema and format across all engagements, making it auditable and comparable
This is where the service provider's tool choice has direct business impact. A fragmented pipeline (Docling + custom scripts + Label Studio + augmentation scripts) requires manual documentation assembly. An integrated platform produces the documentation as a byproduct of normal operations.
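To make "documentation as a byproduct" concrete, here is a minimal sketch of a decorator that records every pipeline stage call without the stage author having to remember to log it. The names (`logged_stage`, `AUDIT_LOG`, `drop_short`) are hypothetical, the in-memory list stands in for a persistent audit trail, and this is not a description of how any particular platform works.

```python
import functools
from datetime import datetime, timezone

AUDIT_LOG: list[dict] = []  # in-memory stand-in for a persistent audit trail


def logged_stage(stage_name: str, operator_id: str):
    """Wrap a stage function so each call appends an entry to the audit log."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(records, **params):
            result = func(records, **params)
            AUDIT_LOG.append({
                "stage": stage_name,
                "operator_id": operator_id,
                "timestamp": datetime.now(timezone.utc).isoformat(),
                "parameters": params,
                "records_in": len(records),
                "records_out": len(result),
            })
            return result
        return wrapper
    return decorator


@logged_stage("clean", operator_id="op-017")
def drop_short(records, min_chars=10):
    """Example cleaning step: drop records with very short text."""
    return [r for r in records if len(r["text"]) >= min_chars]


cleaned = drop_short(
    [{"text": "too short"}, {"text": "long enough to keep"}],
    min_chars=10,
)
```

The log entry is written at call time, which is what makes the record contemporaneous rather than reconstructed.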
Ertas Data Suite generates EU AI Act-compliant documentation automatically. Its Article 30 documentation export feature produces structured reports covering data governance, preprocessing operations, annotation methodology, and bias examination — formatted for inclusion in the technical documentation package required by Annex IV. Because every operation in the Ingest → Clean → Label → Augment → Export pipeline is logged to a unified audit trail, the documentation is complete by construction, not by retrospective effort.
Delivering to the Client
Structure the Article 10 documentation as a standalone section of your deliverable:
- PDF summary report for the compliance team (non-technical, high-level)
- Structured data export (JSON/CSV) for the technical team and for integration with the client's compliance management system
- Raw audit log for detailed review if needed
- Bias examination report as a separate document (some clients route this to a separate review committee)
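The structured export can be produced from the same operation log with the standard library. This sketch assumes log entries are dicts with the hypothetical keys shown in `FIELDS`. Keys outside that list are dropped from the CSV and missing ones are left blank.

```python
import csv
import io
import json

FIELDS = ["stage", "operation", "operator_id", "timestamp", "records_in", "records_out"]


def export_json(documentation: dict) -> str:
    """Full structured documentation for the client's technical team."""
    return json.dumps(documentation, indent=2, sort_keys=True)


def export_operations_csv(operations: list[dict]) -> str:
    """Flat CSV view of the operation log for tools that expect tabular input."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=FIELDS, extrasaction="ignore")
    writer.writeheader()
    writer.writerows(operations)
    return buffer.getvalue()
```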
Include it in your Statement of Work from the beginning. If the client knows they are getting Article 10 documentation as part of the engagement, it changes how they evaluate your proposal.
Conclusion
EU AI Act Article 10 is not a theoretical regulatory concern. It is a concrete set of documentation requirements with a concrete deadline, and it applies to a wide category of AI systems in regulated industries. For service providers who prepare training data for these systems, producing Article 10-compliant documentation is becoming a standard deliverable — and the providers who can produce it efficiently will capture the engagements.
The underlying requirement is structural: your data preparation process must log enough information, at the right granularity, to produce this documentation. If it does not, no amount of post-hoc writing will close the gap.