    AI Incident Response Playbook: What to Do When Your Model Gets It Wrong

    A complete playbook for responding to AI model failures in production — from detection to root cause analysis, remediation, and disclosure. Adapt for your organization.

    Ertas Team

    AI incidents are not like software incidents. When a software system has a bug, you find the line of code, fix it, and deploy the patch. The failure mode is deterministic: the same input always produces the same wrong output.

    AI failure modes are statistical. A model doesn't "break" in the traditional sense — it produces the wrong output with some probability across some distribution of inputs. The failure may have been occurring for weeks before anyone noticed. The affected population may be identifiable and large, which triggers disclosure obligations. The root cause may be something that happened during training — months before the failure surfaced — making remediation significantly more complicated than a code rollback.

    These differences require a distinct incident response process. This playbook covers the full lifecycle: detection, triage, investigation, remediation, disclosure, and post-incident review.

    Incident Severity Classification

    Not all AI failures are equal. Classify severity immediately upon detection to ensure proportional response.

    | Severity | Definition | Examples |
    | --- | --- | --- |
    | P0 — Critical | AI system caused or contributed to physical harm; financial loss >$100K; regulatory breach; or 1,000+ individuals affected by incorrect decisions | Incorrect medical recommendation acted upon; discriminatory loan decisions at scale; GDPR-notifiable data breach involving AI processing |
    | P1 — High | AI system produced systematically wrong outputs for a defined group; compliance gap discovered; reputational risk if the incident became public | Fraud detection model blocking a demographic group at significantly higher rates; LLM generating factually false claims in customer-facing context |
    | P2 — Medium | AI system producing incorrect outputs for a subset of inputs; no immediate harm to individuals; correctable without notification | Document summarization model failing on a specific document format; recommendation model producing irrelevant results for a specific input category |
    | P3 — Low | Quality degradation noticed; no individual harm; no compliance implication | Model accuracy metrics declining toward but not beyond alert threshold; user-reported reduction in output quality |

    Severity escalation: Start with your best estimate and escalate if investigation reveals broader scope. It is better to over-classify and de-escalate than to under-classify and miss a notification deadline.
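    The classification rules above are mechanical enough to encode as a triage helper. A minimal sketch, assuming the triage facts are collected into a simple record first — the field names and the mapping from table rows to flags are illustrative, not a standard:

```python
from dataclasses import dataclass

@dataclass
class IncidentFacts:
    """Facts gathered during triage. Field names are illustrative."""
    physical_harm: bool = False
    financial_loss_usd: float = 0.0
    regulatory_breach: bool = False
    individuals_affected: int = 0
    systematic_group_error: bool = False  # defined group receiving wrong outputs
    compliance_gap: bool = False
    subset_incorrect: bool = False        # wrong outputs on a subset, no harm

def classify_severity(f: IncidentFacts) -> str:
    """Map the severity table onto triage facts.

    Thresholds ($100K, 1,000 individuals) come from the table above.
    Returns the initial classification; escalate if scope grows.
    """
    if (f.physical_harm or f.financial_loss_usd > 100_000
            or f.regulatory_breach or f.individuals_affected >= 1000):
        return "P0"
    if f.systematic_group_error or f.compliance_gap:
        return "P1"
    if f.subset_incorrect:
        return "P2"
    return "P3"
```

    Encoding the rules keeps the initial call consistent across on-call responders; the escalation judgment stays human.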


    Phase 1: Detection and Triage

    Target: 0-2 hours for P0 and P1 incidents

    Detection Sources

    AI incidents typically surface through one of five channels. Make sure your monitoring covers all of them:

    1. Automated monitoring alerts — threshold breaches in model accuracy metrics, output distribution anomalies, latency or error rate spikes
    2. User reports — customer support tickets, internal reports from employees using AI tools
    3. Downstream metric anomalies — business metrics behaving unexpectedly in systems that depend on AI outputs (e.g., loan approval rates changing without policy changes)
    4. Audit log anomalies — patterns in the audit log that indicate unexpected behavior (e.g., unusually high override rates from human reviewers, unusual input patterns)
    5. Third-party reports — regulatory inquiry, journalist inquiry, partner notification, security researcher disclosure
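    For channel 1, even a simple rolling-window monitor on labeled outcomes catches many failures earlier than user reports do. A sketch, assuming you eventually learn whether each output was correct — the window size and threshold are illustrative defaults, not recommendations:

```python
from collections import deque

class AccuracyAlert:
    """Rolling-window accuracy monitor (detection channel 1).

    Fires when accuracy over the last `window` labeled outcomes
    drops below `threshold`. Both parameters are illustrative.
    """
    def __init__(self, threshold: float = 0.90, window: int = 500):
        self.threshold = threshold
        self.outcomes = deque(maxlen=window)

    def record(self, correct: bool) -> bool:
        """Record one labeled outcome; return True if the alert fires."""
        self.outcomes.append(correct)
        if len(self.outcomes) < self.outcomes.maxlen:
            return False  # not enough data to judge yet
        return sum(self.outcomes) / len(self.outcomes) < self.threshold
```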

    Triage Checklist

    Complete within the first 2 hours for P0/P1:

    Step 1: Determine severity

    • What is the nature of the failure? (wrong classification, wrong generation, missing output, model unavailable)
    • Are individual people affected? If yes, how many and how severely?
    • Is there a regulatory reporting obligation (EU AI Act, GDPR Article 33, HIPAA)?
    • Assign initial severity: P0 / P1 / P2 / P3

    Step 2: Identify the affected system

    • Which system is affected? (Reference model inventory ID)
    • What is the current model version in production?
    • When was this version deployed? Has it changed recently?
    • Is the failure limited to one model version, or could it affect other deployments?

    Step 3: Estimate scope

    • How long has the failure likely been occurring? (Check logs from before detection)
    • How many decisions or outputs are potentially affected?
    • Is the failure on all inputs or a specific subset?

    Step 4: Preserve evidence — do this before any remediation

    • Export audit logs covering the incident period (minimum: 48 hours before first detected anomaly to now)
    • Save sample inputs and outputs that demonstrate the failure
    • Record the current model version and configuration
    • Screenshot monitoring dashboards showing the anomaly
    • Do NOT update, rollback, or modify the model before evidence is preserved
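    Evidence preservation is scriptable, which matters when the first hours are chaotic. A sketch that snapshots the relevant files and hashes each one so later tampering is detectable — the paths, arguments, and manifest layout are assumptions to adapt to your stack:

```python
import hashlib
import json
import shutil
import time
from pathlib import Path

def preserve_evidence(audit_log: Path, dashboards: list[Path],
                      model_version: str, out_dir: Path) -> Path:
    """Snapshot incident evidence before any remediation.

    Copies the audit log export and dashboard screenshots into a
    timestamped directory and records a SHA-256 hash of each file
    in a manifest, alongside the model version in production.
    """
    snapshot = out_dir / f"incident-{time.strftime('%Y%m%dT%H%M%S')}"
    snapshot.mkdir(parents=True)
    manifest = {"model_version": model_version, "files": {}}
    for src in [audit_log, *dashboards]:
        dst = snapshot / src.name
        shutil.copy2(src, dst)  # copy2 preserves file timestamps
        manifest["files"][src.name] = hashlib.sha256(
            dst.read_bytes()).hexdigest()
    (snapshot / "manifest.json").write_text(json.dumps(manifest, indent=2))
    return snapshot
```

    Run something like this before anyone touches the deployment; the manifest hashes are what make the snapshot defensible later.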

    Step 5: Immediate notifications

    • P0/P1: AI System Owner → AI Risk Officer → CISO/DPO → Legal (same hour)
    • P2: AI System Owner → AI Risk Officer (within 4 hours)
    • P3: AI System Owner (log and review)

    Immediate Containment

    After evidence is preserved, decide on containment:

    Option A: Traffic rerouting — Route traffic away from the affected model version (to a backup version, fallback logic, or human-only workflow). Use this when a fallback is available and the failure is version-specific.

    Option B: Pause the use case — Suspend AI-assisted processing and route all cases to human review or manual processing. Use this when no safe fallback exists or when human review is required by the incident's severity.

    Document your containment decision and rationale. The choice between Option A and Option B, and the timing of that decision, will be reviewed in the post-incident review and may be examined by regulators.
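    Both containment options reduce to a routing decision that on-call staff should be able to flip without a deploy. A sketch using feature flags, with Option B taking precedence over Option A — the flag names and handler signatures are illustrative:

```python
def route(request, flags: dict, primary, fallback, human_review):
    """Containment switch for an AI-assisted workflow.

    `primary`, `fallback`, and `human_review` are caller-supplied
    handlers; `flags` would come from your feature-flag service.
    """
    if flags.get("pause_use_case"):    # Option B: human-only workflow
        return human_review(request)
    if flags.get("reroute_traffic"):   # Option A: known-good fallback
        return fallback(request)
    return primary(request)
```

    Keeping the switch in configuration rather than code also gives you a timestamped record of exactly when containment took effect.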


    Phase 2: Investigation

    Target: 2-24 hours for P0/P1; up to 5 days for P2

    Root Cause Analysis Framework

    AI incidents typically trace to one of four root causes. Work through each systematically:

    Root Cause Type 1: Model behavior change

    • Has the model version changed recently? Check the Model Change Log.
    • For vendor-API models: did the vendor update the model without notice? Compare current model behavior against your logged baseline.
    • For internal models: was the model retrained recently? On different data?
    • Diagnostic: run your standard evaluation set through the current model version and compare scores to the pre-incident baseline.
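    The Type 1 diagnostic can be a one-function check against your logged baseline. A sketch — the model callable, the (input, expected) eval-set shape, and the tolerance are assumptions:

```python
def eval_regression(model_fn, eval_set, baseline_score: float,
                    tolerance: float = 0.02) -> dict:
    """Re-run the standard eval set and compare to the pre-incident
    baseline (Root Cause Type 1 diagnostic).

    `eval_set` is assumed to be (input, expected_output) pairs;
    `tolerance` absorbs normal run-to-run noise.
    """
    correct = sum(1 for x, expected in eval_set if model_fn(x) == expected)
    score = correct / len(eval_set)
    return {
        "score": score,
        "baseline": baseline_score,
        "regressed": score < baseline_score - tolerance,
    }
```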

    Root Cause Type 2: Data distribution shift

    • Are the failing inputs qualitatively different from the model's training data?
    • Has something changed in your upstream data pipeline that affects what the model receives?
    • Diagnostic: compare the statistical distribution of recent inputs to the training data distribution. Flag inputs that fall outside the training distribution.
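    One common way to quantify the Type 2 diagnostic is the Population Stability Index (PSI) over binned input features. A sketch — the binning, the widely used 0.2 rule of thumb, and the small floor that avoids log(0) are conventions, not requirements:

```python
import math

def psi(expected_counts: list[int], actual_counts: list[int]) -> float:
    """Population Stability Index between training-time (expected)
    and recent (actual) input histograms over the same bins.

    A common rule of thumb treats PSI > 0.2 as significant shift.
    """
    e_total, a_total = sum(expected_counts), sum(actual_counts)
    total = 0.0
    for e, a in zip(expected_counts, actual_counts):
        e_pct = max(e / e_total, 1e-6)  # floor empty bins to avoid log(0)
        a_pct = max(a / a_total, 1e-6)
        total += (a_pct - e_pct) * math.log(a_pct / e_pct)
    return total
```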

    Root Cause Type 3: Prompt or integration bug

    • For LLM-based systems: has the prompt template changed? Is there a bug in how context is assembled?
    • For pipeline systems: has the preprocessing logic changed in a way that produces malformed inputs?
    • Diagnostic: manually trace a failing case through the integration layer, step by step, before the model receives it.

    Root Cause Type 4: Human oversight failure

    • Were human reviewers approving outputs they should have rejected?
    • Is the override rate unusually low? (Possible rubber-stamping)
    • Did reviewers receive sufficient context to identify the failure?
    • Diagnostic: review the audit log of human review decisions during the incident period. Calculate override rate and time-to-decision. Interview reviewers.
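    The Type 4 diagnostic starts with two numbers from the audit log: override rate and time-to-decision. A sketch, assuming each review record carries an action plus ISO-format start/end timestamps — the field names are illustrative:

```python
from datetime import datetime

def review_stats(decisions: list[dict]) -> dict:
    """Summarize human review activity during the incident window.

    Each record is assumed to have 'action' ('approve'/'override')
    and ISO 8601 'start'/'end' timestamps. A very low override rate
    combined with fast decisions suggests rubber-stamping.
    """
    overrides = sum(1 for d in decisions if d["action"] == "override")
    seconds = sorted(
        (datetime.fromisoformat(d["end"])
         - datetime.fromisoformat(d["start"])).total_seconds()
        for d in decisions
    )
    return {
        "override_rate": overrides / len(decisions),
        "median_seconds_to_decision": seconds[len(seconds) // 2],
    }
```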

    Evidence to Collect

    | Evidence Type | Where to Find It | Why It Matters |
    | --- | --- | --- |
    | Model version at time of incident | Model inventory + deployment logs | Establishes exactly what was running |
    | Sample of affected inputs and outputs | Audit logs | Characterizes the failure pattern |
    | Model performance on eval set before and after | Model validation records | Quantifies performance change |
    | Human review decisions during incident period | Audit logs | Determines if oversight failure contributed |
    | Upstream data statistics | Data pipeline logs | Identifies distribution shift |
    | Vendor change notifications (if applicable) | Email/API changelogs | Establishes if vendor caused change |

    Scope Confirmation

    Once you have a hypothesis for the root cause, run a scope confirmation:

    1. Identify the full population of inputs processed during the affected period
    2. Apply the root cause hypothesis to classify each as likely-affected or not
    3. For a sample of likely-affected cases, verify the failure manually
    4. Produce a confirmed count and percentage of affected decisions

    This number is what you report to regulators and what determines individual notification obligations. Take the time to get it right — over-reporting and under-reporting both have consequences.
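    The four-step scope confirmation can be run as a small sampling analysis. A sketch where the root-cause hypothesis and the manual verification step are caller-supplied predicates — both, along with the sample size, are illustrative:

```python
import random

def confirm_scope(population: list, likely_affected, verify_failure,
                  sample_size: int = 50, seed: int = 0) -> dict:
    """Scope confirmation per the four steps above.

    `likely_affected` applies the root-cause hypothesis to a case;
    `verify_failure` stands in for manual verification of a sampled
    case. The seed makes the sample reproducible for the record.
    """
    flagged = [case for case in population if likely_affected(case)]
    sample = random.Random(seed).sample(flagged, min(sample_size, len(flagged)))
    verified = sum(1 for case in sample if verify_failure(case))
    precision = verified / len(sample) if sample else 0.0
    return {
        "flagged": len(flagged),
        "flagged_pct": len(flagged) / len(population),
        "sample_verified_pct": precision,
        "estimated_affected": round(len(flagged) * precision),
    }
```

    The verified sample rate is what turns "flagged by the hypothesis" into a defensible affected count, with the sample size bounding your uncertainty.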


    Phase 3: Remediation

    Remediation happens in three stages with distinct timelines.

    Immediate Remediation (Day 1-2)

    • Roll back to the last known-good model version, OR suspend the use case if no safe version exists
    • Verify the rollback actually resolves the failure by testing on the affected input types
    • Restore normal operation only after confirming resolution — not before

    Short-Term Remediation (Days 3-30)

    • Identify all cases processed during the incident period that may have been affected
    • Re-process affected cases with the corrected model (or human review, depending on severity)
    • Notify affected individuals if required by regulation or policy (see Phase 4)
    • Implement enhanced monitoring targeting the failure pattern to detect recurrence

    Long-Term Remediation (Days 30+)

    | Root Cause | Long-Term Fix |
    | --- | --- |
    | Model behavior change (vendor) | Negotiate version pinning; evaluate vendor scorecard; consider alternative vendor or owned model |
    | Model behavior change (internal retraining) | Improve evaluation process before promoting retrained models to production; add A/B testing period |
    | Data distribution shift | Implement input distribution monitoring; update training data to include the new distribution |
    | Prompt/integration bug | Add integration tests covering the failing case type; add input validation before model inference |
    | Human oversight failure | Recalibrate reviewers; adjust review interface to surface relevant context; review threshold settings |

    Phase 4: Disclosure

    Internal Disclosure

    • P0: Board-level notification within 24 hours of severity confirmation
    • P1: Executive (C-suite) notification within 24 hours; board notification at next regular meeting or sooner if warranted

    Regulatory Disclosure

    Regulatory disclosure obligations depend on jurisdiction, industry, and what happened:

    EU AI Act (if applicable): Article 73 requires providers of high-risk AI systems to report serious incidents to the market surveillance authority of the Member State where the incident occurred. A "serious incident" includes an incident or malfunction that directly or indirectly leads to death or serious harm to a person's health, serious and irreversible disruption of critical infrastructure, infringement of fundamental-rights obligations, or serious harm to property or the environment. Timeline: immediately after establishing a causal link (or its reasonable likelihood), and no later than 15 days after becoming aware; shorter deadlines apply to the most serious cases.

    GDPR Article 33: Personal data breaches (including those caused by or involving AI processing) must be reported to the supervisory authority within 72 hours. If AI processing caused incorrect decisions affecting individuals whose personal data was involved, assess whether this constitutes a breach.

    HIPAA Breach Notification Rule (US, healthcare): If PHI was involved in the incident, assess breach notification obligations. Business associates must notify covered entities within 60 days.

    SR 11-7 (US banking regulators): Model risk events should be documented and reported through existing model risk management reporting channels. P0 incidents may require direct regulator notification depending on your institution's agreement with its primary regulator.

    Document your disclosure assessment even if you determine no notification is required. The documented analysis showing why you concluded notification wasn't required is itself a compliance artifact.

    Individual Notification

    If individuals received incorrect AI-generated decisions that affected their rights or access to services, Legal will advise on notification obligations (which vary by jurisdiction and industry). The technical investigation should produce: the list of affected individuals, the nature of the incorrect decision they received, and the corrected outcome.


    Phase 5: Post-Incident Review

    Conduct within 10 business days of incident closure. Document the results and store with the incident record.

    Timeline reconstruction

    • When did the failure start? (Not when it was detected — when did it actually begin?)
    • When was it detected?
    • When was it contained?
    • When was it resolved?
    • What was the total time from start to resolution?

    Root cause confirmed

    • What was the confirmed root cause?
    • What evidence confirmed it?

    Lessons learned

    • What controls failed to prevent or detect this incident?
    • What controls worked as intended?
    • Was the response process followed? If not, why not?
    • What would have caught this faster?

    Policy and process updates

    • What changes to monitoring, thresholds, or review processes will prevent recurrence?
    • What changes to the incident response process itself would improve future response?
    • Owner and deadline for each change

    Model governance documentation updates

    • Update the Model Inventory entry (validation status, incident log link)
    • Update the model card if root cause reveals capability limitation
    • Update the Model Change Log if a rollback was performed

    Common AI Incident Pitfalls

    Not preserving logs before remediation. The single most common and damaging mistake. Once you roll back the model or clear processing queues, the evidence of exactly what happened may be gone. Preserve first, remediate second — always.

    Assuming the rollback fixed everything without validation. Test on the specific input types that triggered the failure before declaring the incident resolved. Rollbacks can introduce different problems, or the root cause may not be the model version at all.

    Treating the incident as purely technical and not engaging Legal and Compliance. Even P2 incidents can have regulatory implications that aren't apparent at first. Loop Legal in early and let them determine whether reporting is required — that determination should not be made by the engineering team alone.

    Scope estimation based on gut feel rather than data. "We think a few hundred records were affected" is not an acceptable scope estimate for regulatory reporting. Run the analysis. If you can't run it accurately in time for a regulatory deadline, say so and provide your best estimate with explicit uncertainty bounds.

    Not updating the model inventory after the incident. The inventory entry should reflect what happened: validation status, incident log reference, and any changes to oversight level. Auditors check consistency between incident records and inventory entries.


    Connecting Audit Logs to Investigation

    Root cause analysis for AI incidents depends entirely on the quality and completeness of your audit logs. If your logs don't capture model version at inference time, you can't confirm when a version change occurred. If they don't capture the full input, you can't characterize the failure pattern. If they don't capture human review decisions, you can't assess oversight failure as a contributing factor.

    Ertas Data Suite generates immutable, timestamped audit records for every processing step — who ran it, what inputs were used, what the outputs were, and which operator reviewed it. For incident investigations, this means you have a complete, tamper-evident record to work from rather than reconstructing events from incomplete logs.
