
EU AI Act Article 10 Implementation Playbook: From Raw Data to Compliant Pipeline
Article 10 mandates specific data governance practices for high-risk AI training data. This playbook turns those legal requirements into concrete engineering tasks — step by step.
Article 10 of the EU AI Act is the article that affects data teams most directly. While other articles address system-level requirements, risk management, and transparency obligations, Article 10 focuses specifically on training, validation, and testing data. It mandates how high-risk AI systems must handle the data that shapes their behavior.
The problem is that Article 10 is written in legal language. It says things like "training, validation and testing data sets shall be subject to data governance and management practices appropriate for the intended purpose of the AI system." Correct, but not actionable. A data engineer reading that sentence cannot start writing code.
This playbook translates each Article 10 requirement into concrete engineering tasks. For each sub-article, you get: the legal requirement, what it means in practice, the specific engineering work needed, and a template for the documentation that proves compliance.
Article 10(2): Data Governance and Management Practices
Legal requirement: "Training, validation and testing data sets shall be subject to data governance and management practices appropriate for the intended purpose of the AI system."
What it means in practice: You need a documented system for managing training data throughout its lifecycle — from collection to retirement. "Appropriate for the intended purpose" means the governance must be proportional to the risk. A high-risk credit scoring system needs more rigorous governance than an internal document search tool.
Engineering tasks:
- Establish a data governance policy document that covers:
  - Who is responsible for training data quality (named role, not a team)
  - What approval process is required before data enters the pipeline
  - How data quality issues are reported and resolved
  - How often data quality is reviewed
  - What happens when data quality falls below thresholds
- Implement role-based access control on training data:
  - Data steward: can approve data sources and quality thresholds
  - Data engineer: can process and transform data within approved parameters
  - Annotator: can label data according to approved guidelines
  - Auditor: read-only access to data, logs, and documentation
  - Define these roles in your pipeline tooling, not just in a policy document
- Create a data registry that tracks (a minimal sketch follows this list):
  - All datasets used for training, validation, and testing
  - The current version and status of each dataset (draft, approved, in-use, retired)
  - The assigned data steward for each dataset
  - Links to quality reports, bias assessments, and gap analyses
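In practice the registry can start as a small, versioned structure rather than a full catalog product. Here is a minimal sketch in Python, assuming a JSON file as the backing store and illustrative field names; none of this schema is prescribed by Article 10:

```python
# Minimal sketch of a dataset registry backed by a JSON file.
# All names (DatasetRecord, registry.json) are illustrative assumptions.
import json
from dataclasses import dataclass, asdict, field
from pathlib import Path

@dataclass
class DatasetRecord:
    name: str
    version: str
    status: str                      # draft | approved | in-use | retired
    steward: str                     # named role responsible for quality
    purpose: str                     # train | validation | test
    reports: dict = field(default_factory=dict)  # links to quality/bias/gap reports

def register(record: DatasetRecord, registry_path: str = "registry.json") -> None:
    """Append or update a dataset entry in the registry file."""
    path = Path(registry_path)
    entries = json.loads(path.read_text()) if path.exists() else []
    # Keep one entry per (name, version); a new registration replaces the old one.
    entries = [e for e in entries
               if not (e["name"] == record.name and e["version"] == record.version)]
    entries.append(asdict(record))
    path.write_text(json.dumps(entries, indent=2))

register(DatasetRecord(
    name="customer-complaints",
    version="2025-06-01",
    status="approved",
    steward="Jane Doe (Data Steward)",
    purpose="train",
    reports={"bias": "reports/bias-2025-06.pdf"},
))
```

Whatever the storage, the point is that every dataset version has exactly one approved record, one named steward, and links to its evidence.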
Documentation template:
Data Governance Policy — [System Name]
1. Scope: This policy applies to all training, validation, and testing data for [system name].
2. Roles: Data Steward: [name]. Data Engineers: [names]. Annotators: [names].
3. Approval Process: New data sources require Data Steward approval before ingestion.
4. Quality Thresholds: [defined metrics and minimum values]
5. Review Cadence: Quality review conducted [monthly/quarterly].
6. Issue Resolution: Quality issues reported via [system] and resolved within [timeframe].
Article 10(2)(a): Design Choices for Datasets
Legal requirement: Training data shall be subject to "the relevant design choices."
What it means in practice: Document why you chose this specific dataset. What alternatives did you consider? Why was this data appropriate for the intended purpose? This prevents the common pattern of using whatever data was available without evaluating whether it was suitable.
Engineering tasks:
- Create a dataset design document for each training dataset that records:
  - The intended purpose of the AI system (specific use case, not a general description)
  - What data would be ideal for this purpose
  - What data is actually available
  - The gap between ideal and available, and how this gap was addressed
  - Alternative datasets that were considered and why they were not selected
- Document the train/validation/test split strategy (see the split sketch after this list):
  - How the data was split (random, stratified, temporal, domain-based)
  - The rationale for the split ratios
  - How the split ensures the validation set is representative of production conditions
- Record any assumptions made during dataset design:
  - "We assume that historical customer interactions from 2023-2025 are representative of future interactions"
  - "We assume that the distribution of complaint types in the training data matches the distribution in production"
  - These assumptions are testable — and auditors may ask for evidence
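As an illustration of the split documentation, here is a minimal sketch using scikit-learn's train_test_split; the file name, label column, ratios, and rationale text are assumptions for the example, not a prescribed approach:

```python
# Minimal sketch: stratified train/validation/test split with the rationale
# recorded next to the artefacts. File and column names are illustrative.
import json
import pandas as pd
from sklearn.model_selection import train_test_split

df = pd.read_parquet("complaints.parquet")        # assumed input dataset

# 80/10/10 split, stratified by the label so class balance is preserved
# in the validation and test sets.
train_df, rest_df = train_test_split(
    df, test_size=0.20, stratify=df["label"], random_state=42
)
val_df, test_df = train_test_split(
    rest_df, test_size=0.50, stratify=rest_df["label"], random_state=42
)

split_record = {
    "method": "stratified random split on 'label'",
    "ratios": {"train": 0.8, "validation": 0.1, "test": 0.1},
    "rationale": "Class frequencies are skewed; stratification keeps rare "
                 "complaint types represented in validation and test.",
    "random_state": 42,
    "sizes": {"train": len(train_df), "validation": len(val_df), "test": len(test_df)},
}
with open("split_record.json", "w") as f:
    json.dump(split_record, f, indent=2)
```

The JSON record is the artefact an auditor can read next to the split itself; regenerate it whenever the split changes.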
Documentation template:
Dataset Design Document — [Dataset Name]
1. Intended Purpose: [specific use case description]
2. Ideal Data: [description of what perfect training data would look like]
3. Available Data: [description of what was actually available]
4. Gap Analysis: [how ideal and available differ, and how gaps were addressed]
5. Alternatives Considered: [other datasets evaluated and reasons for rejection]
6. Split Strategy: [train/val/test split method and rationale]
7. Assumptions: [listed with testability assessment]
Article 10(2)(b): Data Collection Processes
Legal requirement: Training data shall be subject to "data collection processes and the origin of data, and in the case of personal data, the original purpose of the data collection."
What it means in practice: Document how the data was obtained. This includes the sources, the collection methods, any consent or licensing requirements, and the dates of collection.
Engineering tasks:
- Implement automated source tracking in your ingestion pipeline (a sketch follows this list):
  - Record the source URI or file path for every ingested document
  - Record the collection date (when the data was obtained, not when the source was created)
  - Record the collection method (API pull, file upload, web scrape, manual entry)
- Maintain a source registry that tracks:
  - All data sources currently in use
  - The legal basis for using each source (consent, legitimate interest, license, public domain)
  - License restrictions (if applicable): can the data be used for model training? Are there geographic restrictions?
  - Expiration dates for time-limited licenses or consent
- Implement consent tracking for data sourced from individuals:
  - What consent was given?
  - When was it given?
  - Can it be withdrawn? If so, what is the process for removing that person's data from the training set?
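Here is a minimal sketch of the source-tracking step, assuming a SQLite table and illustrative field names; your pipeline's actual store and schema will differ:

```python
# Minimal sketch of per-document source tracking at ingestion time.
# The table, field names, and example values are assumptions, not a prescribed schema.
import sqlite3
from datetime import datetime, timezone

conn = sqlite3.connect("ingestion_metadata.db")
conn.execute("""
    CREATE TABLE IF NOT EXISTS source_records (
        source_uri        TEXT NOT NULL,
        collection_date   TEXT NOT NULL,   -- when the data was obtained
        collection_method TEXT NOT NULL,   -- api | upload | scrape | manual
        legal_basis       TEXT NOT NULL,   -- consent | license | legitimate_interest | public_domain
        license_expires   TEXT             -- NULL if not time-limited
    )
""")

def record_ingestion(source_uri: str, method: str, legal_basis: str,
                     license_expires: str | None = None) -> None:
    """Write one source-tracking row for every ingested document."""
    conn.execute(
        "INSERT INTO source_records VALUES (?, ?, ?, ?, ?)",
        (source_uri, datetime.now(timezone.utc).isoformat(), method,
         legal_basis, license_expires),
    )
    conn.commit()

record_ingestion("s3://corp-dms/contracts/2024/agreement-0173.pdf",
                 method="api", legal_basis="license",
                 license_expires="2027-01-01")
```

Consent withdrawal then becomes a query against this table: find every record tied to the affected source and remove or reprocess it.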
Documentation template:
Data Collection Record — [Source Name]
1. Source: [URI, database, file location]
2. Collection Method: [API, upload, scrape, manual]
3. Collection Date: [date range]
4. Legal Basis: [consent, license, legitimate interest, public domain]
5. License Details: [license name, restrictions, expiration]
6. Consent Status: [applicable/not applicable, withdrawal process]
7. Update Frequency: [how often this source is re-collected]
Article 10(2)(c): Data Preparation Operations
Legal requirement: Training data shall be subject to "relevant data-preparation processing operations, such as annotation, labelling, cleaning, updating, enrichment and aggregation."
What it means in practice: Every transformation applied to the data must be documented and logged. This is the most operationally intensive requirement because it covers the entire data pipeline.
Engineering tasks:
- Implement automated logging for every transformation (see our article on immutable audit trails for the technical specification, and the sketch after this list):
  - Cleaning: what was removed, why, how many records affected
  - Labeling: who labeled, what taxonomy was used, what quality checks were applied
  - Augmentation: what method was used, how many synthetic records were created
  - Aggregation: what records were combined, what aggregation logic was applied
- Version every intermediate dataset:
  - After each transformation stage, create a versioned snapshot
  - Link each snapshot to the transformation log entries that produced it
  - Enable rollback to any previous version
- Validate at each stage:
  - Run automated quality checks after each transformation
  - Log the validation results alongside the transformation logs
  - Define minimum quality thresholds — if a transformation degrades quality below the threshold, halt the pipeline and alert the data steward
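One way to wire logging and validation into the same step is to gate every transformation behind a wrapper. A minimal sketch, with hypothetical stage functions, quality metric, and threshold:

```python
# Minimal sketch of logged, gated transformation stages. The stage functions,
# quality metric, and threshold are illustrative assumptions.
import json
import logging
from datetime import datetime, timezone

logging.basicConfig(level=logging.INFO)
log = logging.getLogger("pipeline")

QUALITY_THRESHOLD = 0.95  # assumed minimum share of records passing validation

def run_stage(name, records, transform, validate):
    """Apply one transformation, log what changed, halt if quality degrades."""
    output = transform(records)
    passed = [r for r in output if validate(r)]
    quality = len(passed) / max(len(output), 1)
    entry = {
        "stage": name,
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "records_in": len(records),
        "records_out": len(output),
        "quality": round(quality, 4),
    }
    log.info(json.dumps(entry))
    if quality < QUALITY_THRESHOLD:
        raise RuntimeError(f"Stage '{name}' below quality threshold - alert data steward")
    return output

# Example: a cleaning stage that drops empty texts.
records = [{"text": "valid entry"}, {"text": ""}, {"text": "another entry"}]
cleaned = run_stage(
    "cleaning",
    records,
    transform=lambda rs: [r for r in rs if r["text"].strip()],
    validate=lambda r: len(r["text"]) > 3,
)
```

In a real pipeline the log entry would go to the immutable audit trail rather than standard logging, but the shape of the record is the same.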
Documentation template:
Data Preparation Log — [Dataset Version]
Generated automatically by pipeline logging system.
See audit trail entries [ID range] for detailed transformation records.
Summary:
- Cleaning: [X] operations, [Y] records modified, [Z] records removed
- Labeling: [X] records labeled by [Y] annotators, agreement rate [Z]%
- Augmentation: [X] synthetic records generated, [Y] passed quality filter
- Total input records: [N], Total output records: [M]
Article 10(2)(e): Assessment of Availability, Quantity, and Suitability
Legal requirement: Training data shall be subject to "an assessment of the availability, quantity and suitability of the data sets that are needed."
What it means in practice: Before training, formally assess whether you have enough data, whether the data you have is suitable for the intended purpose, and whether there are gaps.
Engineering tasks:
- Quantitative assessment (a sketch follows this list):
  - Total records in training, validation, and test sets
  - Records per class/category (for classification tasks)
  - Minimum class representation threshold (e.g., at least 50 examples per category)
  - Statistical power analysis: is the dataset large enough to detect the effects you need?
- Suitability assessment:
  - Domain coverage: does the training data cover all the scenarios the model will encounter in production?
  - Temporal coverage: is the data current enough? If trained on 2024 data, will the model perform well on 2026 queries?
  - Linguistic coverage: does the data cover all languages and dialects the system will serve?
  - Edge case coverage: are rare but important scenarios represented?
- Availability assessment:
  - What data would improve the model but is not available?
  - What is the plan for acquiring additional data?
  - Are there legal or technical barriers to obtaining needed data?
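The quantitative part of this assessment is straightforward to automate. A minimal sketch using pandas, with an assumed label column and the illustrative 50-example threshold from above:

```python
# Minimal sketch of the quantitative assessment: per-class counts checked
# against a minimum representation threshold. File and column names are assumptions.
import pandas as pd

MIN_EXAMPLES_PER_CLASS = 50  # illustrative threshold from the design document

train_df = pd.read_parquet("train.parquet")       # assumed training set
counts = train_df["label"].value_counts()

report = pd.DataFrame({
    "records": counts,
    "share": (counts / counts.sum()).round(3),
    "meets_threshold": counts >= MIN_EXAMPLES_PER_CLASS,
})
print(report)

under_represented = report[~report["meets_threshold"]].index.tolist()
if under_represented:
    print(f"Classes below {MIN_EXAMPLES_PER_CLASS} examples: {under_represented}")
    # Feed these classes into the availability assessment and the remediation plan.
```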
Documentation template:
Data Assessment Report — [Dataset Version]
1. Quantity: [total records] training, [total] validation, [total] test
2. Class Distribution: [table of classes and record counts]
3. Minimum Representation: [threshold] — all classes meet/do not meet threshold
4. Domain Coverage: [assessment with specific gaps identified]
5. Temporal Coverage: Data from [date range]. Currency assessment: [current/stale/mixed]
6. Availability Gaps: [data that would improve the model but is unavailable]
7. Remediation Plan: [how gaps will be addressed]
Article 10(2)(f): Examination for Biases
Legal requirement: Training data shall be subject to "examination in view of possible biases that are likely to affect the health and safety of persons, have a negative impact on fundamental rights or lead to discrimination prohibited under Union law, especially where data outputs influence inputs for future operations."
What it means in practice: Analyze the training data for biases — not as a one-time checkbox, but as an ongoing process. The focus is on biases that could cause real harm: discrimination in hiring, biased credit decisions, unfair treatment in education or law enforcement.
Engineering tasks:
- Demographic analysis (where applicable):
  - Measure representation of protected characteristics in the training data
  - Compare training data demographics to the population the model will serve
  - Identify underrepresented or overrepresented groups
- Label bias detection:
  - Measure inter-annotator agreement across demographic groups
  - Check whether label distributions differ by annotator background
  - Identify systematic labeling patterns that could encode bias
- Proxy variable detection:
  - Identify features that correlate with protected characteristics (e.g., zip code as proxy for race)
  - Document the decision to include, exclude, or mitigate each proxy variable
- Fairness metrics (a sketch follows this list):
  - Compute fairness metrics on the validation set: demographic parity, equalized odds, predictive parity
  - Define acceptable thresholds for each metric
  - Document any metric that falls outside acceptable bounds and the remediation taken
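Demographic parity is the simplest of these metrics to compute by hand. A minimal sketch on validation predictions, with assumed column names, groups, and threshold; libraries such as Fairlearn provide this and the other metrics if you prefer not to roll your own:

```python
# Minimal sketch of one fairness metric (demographic parity difference) computed
# on validation predictions. Column names, groups, and the threshold are assumptions.
import pandas as pd

val_df = pd.read_parquet("validation_predictions.parquet")
# assumed columns: "group" (protected characteristic), "prediction" (0/1 decision)

selection_rates = val_df.groupby("group")["prediction"].mean()
dp_difference = selection_rates.max() - selection_rates.min()

THRESHOLD = 0.10  # illustrative acceptable bound, set by the data steward

print("Selection rate per group:")
print(selection_rates.round(3))
print(f"Demographic parity difference: {dp_difference:.3f} "
      f"({'within' if dp_difference <= THRESHOLD else 'outside'} the {THRESHOLD} bound)")
# A value outside the bound goes into the Bias Examination Report together
# with the remediation taken (reweighting, resampling, relabelling, ...).
```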
Documentation template:
Bias Examination Report — [Dataset Version]
1. Demographic Representation: [table comparing training data demographics to target population]
2. Underrepresented Groups: [identified groups and degree of underrepresentation]
3. Label Bias Analysis: [inter-annotator agreement by group, findings]
4. Proxy Variables: [identified proxies, decision for each (include/exclude/mitigate)]
5. Fairness Metrics: [metric values vs thresholds, pass/fail for each]
6. Remediation Actions: [what was done to address identified biases]
7. Residual Risk: [biases that could not be fully mitigated, with justification]
Article 10(2)(h): Identification of Data Gaps
Legal requirement: Training data shall be subject to "the identification of relevant data gaps or shortcomings that prevent compliance with this Regulation, and how those gaps and shortcomings can be addressed."
What it means in practice: This is not the same as the suitability assessment in (e). Gap identification is specifically about shortcomings that could affect the system's performance on its intended purpose — missing scenarios, underrepresented use cases, blind spots.
Engineering tasks:
- Coverage gap analysis (a sketch follows this list):
  - Map all intended use cases to training data coverage
  - For each use case, verify that the training data contains representative examples
  - Identify use cases with zero or insufficient training data
- Error analysis on validation set:
  - Run the model on the validation set and analyze errors
  - Categorize errors by type (false positives, false negatives, wrong class)
  - Identify systematic error patterns that suggest data gaps
- Remediation planning:
  - For each identified gap, define a remediation plan: collect more data, synthesize examples, adjust the model scope, or add a human-in-the-loop for gap scenarios
  - Set timelines for remediation
  - Track remediation progress
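A coverage matrix is easy to generate once the intended use cases are enumerated. A minimal sketch with hypothetical use-case labels and an illustrative floor of 100 examples per use case:

```python
# Minimal sketch of a use-case coverage check: count training examples per
# intended use case and flag anything below a floor. Labels, file name, and
# the floor value are illustrative assumptions.
import pandas as pd

INTENDED_USE_CASES = ["billing_dispute", "delivery_delay", "product_defect",
                      "account_closure", "data_access_request"]
MIN_EXAMPLES_PER_USE_CASE = 100

train_df = pd.read_parquet("train.parquet")        # assumed "use_case" column
counts = train_df["use_case"].value_counts()

coverage = pd.DataFrame({
    "use_case": INTENDED_USE_CASES,
    "examples": [int(counts.get(uc, 0)) for uc in INTENDED_USE_CASES],
})
coverage["gap"] = coverage["examples"] < MIN_EXAMPLES_PER_USE_CASE
print(coverage)

for _, row in coverage[coverage["gap"]].iterrows():
    # Each flagged row becomes an entry in the remediation plan:
    # collect more data, synthesize examples, narrow scope, or add human review.
    print(f"GAP: '{row.use_case}' has {row.examples} examples "
          f"(minimum {MIN_EXAMPLES_PER_USE_CASE})")
```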
Documentation template:
Data Gap Analysis — [Dataset Version]
1. Use Case Coverage Matrix: [table mapping use cases to training data availability]
2. Identified Gaps: [list of gaps with severity assessment]
3. Error Pattern Analysis: [systematic errors linked to data gaps]
4. Remediation Plan:
- Gap 1: [description] → [remediation action] → [timeline] → [status]
- Gap 2: [description] → [remediation action] → [timeline] → [status]
5. Accepted Risks: [gaps that cannot be remediated, with impact assessment and mitigations]
Putting It All Together
Article 10 compliance is not a one-time project. It is an ongoing operational requirement. The documentation described above must be:
- Created before the system is deployed (or before the August 2, 2026 deadline for existing systems)
- Updated whenever the training data changes, the model is retrained, or the system's scope changes
- Reviewed periodically (quarterly at minimum) to ensure accuracy
- Available for auditor review within a reasonable timeframe (same business day for high-risk systems)
The engineering tasks are substantial but well-defined. For a team starting from scratch, expect 8-12 weeks of implementation work across the pipeline logging, quality assessment, bias analysis, and documentation systems. For teams using Ertas Data Suite, much of this infrastructure is built in — the platform generates Article 10-compliant documentation from the data that its audit trail already captures.
The key principle: if it is not logged, it did not happen. Every assessment, every decision, every transformation must produce a verifiable record. That record is the evidence that Article 10 compliance is operational, not aspirational.
Your data is the bottleneck — not your models.
Ertas Data Suite turns unstructured enterprise files into AI-ready datasets — on-premise, air-gapped, with full audit trail. One platform replaces 3–7 tools.
Further Reading
- EU AI Act Article 10 Data Prep Documentation — Deeper look at the documentation requirements specific to data preparation operations.
- EU AI Act Article 10 vs Article 30: Training Data Requirements — How Article 10 and Article 30 interact and what each requires from your data pipeline.
- EU AI Act Data Governance Checklist for High-Risk Systems — Comprehensive checklist for data governance requirements across all relevant EU AI Act articles.