
EU AI Act Operational Evidence: What Auditors Actually Ask For
EU AI Act compliance isn't about checkboxes — it's about operational evidence. Here's what auditors actually look for when examining AI data pipelines: logs, lineage, and live demonstrations.
There is a gap between what companies think compliance means and what auditors actually examine. Most organizations approach the EU AI Act like they approach ISO certifications — write policies, create checklists, have someone sign off, file it away. This worked for traditional compliance frameworks. It does not work for the EU AI Act.
The EU AI Act is evidence-based, not declaration-based. The regulation requires that high-risk AI systems demonstrate compliance through operational evidence — verifiable, machine-readable records that show the system is compliant right now, not that someone declared it compliant six months ago.
This distinction matters enormously for AI data pipelines. You cannot declare that your training data is high-quality. You must demonstrate it with metrics, logs, and traceable lineage. You cannot declare that your data governance is adequate. You must show the auditor the system that enforces it.
Here is what auditors actually examine, based on the requirements in Articles 10, 11, 12, and 30 of the regulation.
What Auditors Actually Examine
1. Data Lineage
The first question an auditor will ask: "Show me how this model's training data was produced."
This is not a conceptual question. They want to see the actual chain — from source document to final training example, through every transformation, with timestamps and operator identification at each step.
What they want to see:
- Source identification: Where did the raw data come from? Document names, dates, collection methods, consent/license status.
- Transformation chain: Every operation applied to the data — filtering, cleaning, labeling, augmenting, format conversion — logged in sequence with timestamps.
- Version linkage: Which version of the training data was used for which version of the model? If the model was retrained in February 2026, which dataset version was used?
- Reversibility: Can you roll back to a previous dataset version and explain why changes were made?
What they will test: The auditor picks a random output from the model, then asks you to trace it back through the model version, dataset version, transformation history, and source data. If the chain breaks at any point, that is a finding.
Typical failure mode: The data engineering team processed the data in a Jupyter notebook, exported the final dataset, and did not save the notebook or its intermediate outputs. Six months later, no one can explain how the dataset was produced.
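The backward trace the auditor performs can be sketched as a simple walk over recorded parent links. This is a minimal illustration, not a real lineage API — all artifact IDs and field names here are hypothetical:

```python
# Hypothetical lineage store: each artifact points to the artifact it was
# derived from and the operation that produced it. A source document has
# no parent, so the chain terminates there.
lineage = {
    "example:00417": {"parent": "dataset:v3.1", "operation": "tokenize"},
    "dataset:v3.1": {"parent": "dataset:v3.0-clean", "operation": "dedupe"},
    "dataset:v3.0-clean": {"parent": "doc:contract_2025_081.pdf",
                           "operation": "text_cleaning"},
    "doc:contract_2025_081.pdf": None,  # source document: end of chain
}

def trace_to_source(artifact_id: str) -> list:
    """Walk from a training example back to its source document.

    A missing entry at any step is exactly the broken chain an auditor
    would record as a finding.
    """
    chain = [artifact_id]
    node = lineage.get(artifact_id)
    while node is not None:
        chain.append(node["parent"])
        node = lineage.get(node["parent"])
    return chain
```

The point is not the data structure but the guarantee: every hop is recorded at processing time, so the trace is a lookup, not an archaeology project.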
2. Transformation Logs
Logs are the backbone of operational evidence. Every operation on training data must produce an immutable, timestamped record.
What a compliant log entry looks like:
{
  "timestamp": "2026-02-14T09:32:17.441Z",
  "operator_id": "anna.schmidt@company.eu",
  "operation": "text_cleaning",
  "parameters": {
    "remove_html_tags": true,
    "normalize_unicode": true,
    "min_text_length": 50,
    "language_filter": ["en", "de"]
  },
  "input_records": 24891,
  "output_records": 22347,
  "records_removed": 2544,
  "removal_reason_distribution": {
    "below_min_length": 1823,
    "unsupported_language": 721
  },
  "pipeline_version": "3.1.2",
  "log_integrity_hash": "sha256:a4f2e8..."
}
What auditors look for in logs:
- Completeness: Is every transformation logged, or are there gaps? If the dataset went from 50,000 records to 35,000 records with no log entry explaining the reduction, that is a gap.
- Granularity: "Cleaned the data" is not a log entry. What was cleaned? How? What was removed? Why?
- Operator identification: Who performed the operation? "System" or "admin" is not acceptable — the regulation requires individual accountability.
- Immutability: Can log entries be modified after creation? If yes, the log has no evidentiary value.
- Continuity: Are logs continuous from the beginning of the pipeline, or did logging start three months ago? Gaps in the log timeline suggest the system was not consistently monitored.
What they will test: The auditor requests logs for a specific time period or a specific transformation type. They check timestamps for consistency, look for gaps, and verify that operator IDs correspond to real personnel.
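One way to provide both immutability and verifiability is a hash chain: each entry's integrity hash covers its own content plus the previous entry's hash, so any retroactive edit breaks verification from that point forward. A minimal sketch (one possible implementation, not a prescribed format):

```python
import hashlib
import json

def append_entry(log: list, entry: dict) -> None:
    """Append a log entry whose integrity hash chains to the previous
    entry. Editing any earlier entry invalidates every later hash."""
    prev_hash = log[-1]["log_integrity_hash"] if log else "genesis"
    payload = json.dumps(entry, sort_keys=True) + prev_hash
    digest = hashlib.sha256(payload.encode()).hexdigest()
    log.append(dict(entry, log_integrity_hash="sha256:" + digest))

def verify_chain(log: list) -> bool:
    """Recompute every hash; a single tampered field breaks the chain."""
    prev_hash = "genesis"
    for stored in log:
        entry = {k: v for k, v in stored.items() if k != "log_integrity_hash"}
        payload = json.dumps(entry, sort_keys=True) + prev_hash
        digest = hashlib.sha256(payload.encode()).hexdigest()
        if stored["log_integrity_hash"] != "sha256:" + digest:
            return False
        prev_hash = stored["log_integrity_hash"]
    return True
```

In production the chain would live in append-only storage rather than a Python list, but the verification logic an auditor runs is the same.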
3. Quality Metrics
Article 10 requires that training data be "relevant, sufficiently representative, and to the best extent possible, free of errors and complete." Auditors need evidence that you assessed these properties, not just a claim that the data is good.
What qualifies as quality evidence:
- Statistical profiles of the dataset: size, feature distributions, class balance, coverage analysis
- Quality scores at each pipeline stage: OCR confidence, cleaning validation, label agreement rates
- Threshold documentation: What quality standards were applied? What was the minimum acceptable accuracy? What happened to data that fell below the threshold?
- Trend analysis: How has data quality changed over time? Are quality metrics improving, degrading, or stable?
What does NOT qualify:
- "We reviewed the data and it looked good." This is subjective and unverifiable.
- A single quality assessment from six months ago. Quality must be monitored continuously, not assessed once.
- Quality metrics without defined thresholds. A 92% accuracy score means nothing without a documented threshold (e.g., "minimum 90% accuracy required for production use").
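The threshold point above is mechanical enough to automate: every quality score should be evaluated against a documented minimum, and the pass/fail result should itself be logged. A sketch, with hypothetical metric names and threshold values:

```python
# Documented quality thresholds (illustrative values — your actual
# thresholds belong in version-controlled threshold documentation).
THRESHOLDS = {
    "ocr_confidence": 0.90,    # e.g. "minimum 90% OCR confidence"
    "label_agreement": 0.85,   # e.g. "minimum 85% inter-annotator agreement"
}

def evaluate_quality(metrics: dict) -> dict:
    """Compare measured quality scores against documented thresholds.

    Returns per-metric value, threshold, and pass/fail — the record an
    auditor expects instead of "we reviewed the data and it looked good".
    """
    return {
        name: {
            "value": value,
            "threshold": THRESHOLDS[name],
            "passed": value >= THRESHOLDS[name],
        }
        for name, value in metrics.items()
        if name in THRESHOLDS
    }
```

Running this at every pipeline stage, and logging the result, turns a subjective claim into the continuous monitoring record Article 10 implies.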
4. Version Control
The regulation implicitly requires reproducibility — the ability to recreate the exact training data used for any deployed model version. This requires version control for datasets, not just code.
What auditors look for:
- Dataset version identifiers linked to model version identifiers
- The ability to check out or regenerate any historical dataset version
- Change logs between dataset versions (what changed and why)
- Protection against accidental modification (write protection on finalized dataset versions)
What they will test: "Give me the exact dataset used to train the model version deployed on January 15, 2026." If you cannot produce it — exact contents, not an approximation — that is a finding.
5. Live Demonstration
This is the one that catches unprepared organizations. Auditors do not just review documents — they ask for a live demonstration of the system.
What a live demonstration looks like:
- The auditor watches an operator process new data through the pipeline
- The auditor observes that the system automatically generates log entries
- The auditor checks that the log entries match what was observed
- The auditor attempts to modify a log entry (expecting it to fail)
- The auditor follows a data lineage chain in real-time
Why this matters: Live demonstrations catch "paper compliance" — organizations that created documentation but do not actually use the systems described. If the operator hesitates, opens a different tool than the one documented, or cannot navigate the audit trail interface, the auditor sees that the compliance infrastructure is not operationally integrated.
What Is NOT Acceptable
Based on the regulation's requirements and early enforcement guidance, the following do not constitute compliance evidence:
Manual spreadsheets: A Google Sheet or Excel file where team members manually log their data processing activities. Spreadsheets have no write protection, no guaranteed timestamps, and no integrity verification. They can be edited retroactively without trace.
Shared drive documentation: A folder of documents describing the data pipeline. Documents can be modified, backdated, or fabricated. Without version control and integrity hashing, they have no evidentiary value.
Self-attestation without supporting logs: A signed statement saying "Our training data meets quality standards" without the metrics, thresholds, and continuous monitoring records to support it.
Retroactive documentation: Documentation created in response to an audit request rather than as part of ongoing operations. Auditors check document metadata — creation dates, modification dates, and version histories reveal when documentation was actually produced.
Third-party certifications alone: A certificate from a vendor saying their tool is "EU AI Act compliant" does not transfer compliance to the user. The user must demonstrate that they use the tool in a compliant manner with operational evidence from their own deployment.
How to Prepare: Run a Mock Audit
The single most effective preparation step is a mock audit. Here is how to run one.
Select an auditor: Choose someone outside the data team — a compliance officer, a legal team member, or an external consultant. They need enough technical understanding to evaluate evidence but should not be involved in producing it.
Scope the audit: Pick one AI system and its associated data pipeline. The mock audit should cover all five evidence categories described above.
Provide auditor access: Give the mock auditor the same access a real auditor would have — read-only access to logs, lineage systems, documentation, and the pipeline interface.
Execute the audit: The mock auditor follows the same process a real auditor would:
- Request the data lineage for the current production model
- Request transformation logs for a specific date range
- Request quality metrics and threshold documentation
- Request the ability to reproduce a historical dataset version
- Observe a live demonstration of the pipeline in operation
- Attempt to identify gaps, inconsistencies, or weaknesses
Document findings: Every gap, inconsistency, or missing piece of evidence is documented with a severity rating and a remediation recommendation.
Remediate: Fix every finding before the real audit arrives. The findings from a mock audit are a gift — they tell you exactly what to fix while there is still time to fix it.
Plan 2-3 days for a mock audit of a single pipeline. If your organization has multiple in-scope AI systems, run mock audits on the highest-risk systems first.
Evidence Format Requirements
The regulation does not prescribe a specific log format, but the implicit requirements point to clear standards.
Machine-readable: Logs must be parseable by automated tools. JSON, structured CSV, or database records — not free-text notes.
Timestamped: Every record must have a timestamp from a trusted time source. NTP-synchronized system clocks are the minimum; hardware security module timestamps are the gold standard for high-risk systems.
Immutable: Once written, log entries cannot be modified or deleted. Append-only databases, write-once storage, or cryptographically signed log chains provide immutability.
Attributable: Every record must identify the operator (person or system) that performed the action. Service accounts are acceptable for automated operations, but the person who configured and authorized the automated operation must be traceable.
Retainable: Records must be retained for the lifetime of the AI system plus a reasonable period (10 years is the standard interpretation for high-risk systems). Plan storage accordingly — at roughly 1KB per log entry, a pipeline processing 10,000 records per day generates approximately 3.6 million log entries per year, or about 3.6GB of raw log data.
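The storage arithmetic above is worth wiring into capacity planning. A trivial helper (assumed parameters, not a prescribed formula):

```python
def retention_storage_bytes(records_per_day: int,
                            bytes_per_entry: int = 1000,
                            retention_years: int = 10) -> int:
    """Rough retention storage estimate:
    entries/day x entry size x days retained."""
    return records_per_day * bytes_per_entry * 365 * retention_years
```

At 10,000 records per day and roughly 1KB per entry, this gives about 3.65GB per year, or around 37GB over a ten-year retention window — small enough that storage cost is never a valid excuse for truncating logs.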
How Ertas Data Suite Generates Evidence
Ertas Data Suite was built with EU AI Act compliance as a design requirement, not an afterthought. Every operation in the platform — every filter, every label, every export — automatically generates a compliant log entry with timestamp, operator ID, parameters, and record counts.
Data lineage is tracked natively. You can select any training example in the exported dataset and trace it back to the source document, through every transformation, with a visual lineage graph.
Dataset versioning is built in. Every export creates a versioned snapshot. You can reproduce any historical dataset version with a single action.
The audit trail is immutable. Log entries are append-only with cryptographic integrity hashing. An auditor can verify that no entries have been modified since creation.
For organizations facing the August 2, 2026 deadline, adopting a platform with built-in compliance evidence generation is faster than building compliance infrastructure from scratch. The implementation timeline drops from 3-4 months of engineering work to 2-3 weeks of data migration and configuration.
Your data is the bottleneck — not your models.
Ertas Data Suite turns unstructured enterprise files into AI-ready datasets — on-premise, air-gapped, with full audit trail. One platform replaces 3–7 tools.
Further Reading
- Compliance Audit for AI Data Preparation Workflows — How to structure your data preparation workflow for audit readiness from day one.
- Enterprise AI Audit Trails: What to Log — Technical specification for what data pipeline events should generate audit log entries.
- Data Lineage as a Legal Requirement Under the EU AI Act — The legal basis for data lineage requirements and how they translate to engineering tasks.