    AI Governance Framework for Healthcare: HIPAA, FDA SaMD, and Clinical Oversight Requirements
    ai-governance · healthcare-ai · hipaa · fda-samd · human-in-the-loop · regulated-industries


    A practical AI governance framework for healthcare organizations. Covers HIPAA compliance, FDA Software as a Medical Device requirements, clinical human-in-the-loop design, and audit trail specifications.

    Ertas Team

    AI in healthcare isn't a theoretical governance challenge. It's a regulatory compliance requirement with specific standards, audit expectations, and liability implications. Healthcare organizations deploying AI face a layered regulatory environment: HIPAA governs data handling, FDA regulates software that qualifies as a medical device, and the EU AI Act's high-risk classification captures clinical decision support systems used across European operations.

    This framework covers the governance structures, oversight mechanisms, and documentation practices that healthcare AI deployments require.


    Regulatory Landscape: What Applies to Your AI System

    Before designing governance, establish which regulations apply.

    HIPAA applies if your AI system processes, stores, or transmits protected health information (PHI). This includes systems that receive patient data as input (clinical notes, imaging metadata, lab values) or that produce outputs containing PHI. HIPAA's security, privacy, and breach notification rules all extend to AI systems that touch PHI.

    FDA's Software as a Medical Device (SaMD) framework applies if your AI system meets the definition of a medical device under 21 USC 321(h): software intended to diagnose, cure, mitigate, treat, or prevent disease, or to affect the structure or function of the body. Clinical decision support software that provides patient-specific recommendations based on patient data typically qualifies. Administrative AI (scheduling, billing, documentation summarization without clinical recommendations) typically doesn't.

    The EU AI Act's high-risk classification applies to AI systems intended for use in medical devices and clinical decision support systems covered by EU medical device regulations, deployed within the EU or to EU patients. High-risk classification requires risk management systems, data governance documentation, transparency to deployers, human oversight capability, and registration in the EU database.

    Joint Commission and CMS standards increasingly address AI in accreditation and certification criteria, particularly around clinical decision support governance and clinical staff training requirements.

    Determine which of these apply to each AI system in your environment before building the governance structure. The compliance burden differs substantially across these frameworks.
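    As a sanity check, this determination can be captured as a simple intake checklist. The sketch below is a hypothetical illustration; the attribute names and the mapping are ours, not drawn from any regulation:

```python
from dataclasses import dataclass

@dataclass
class AISystemProfile:
    """Governance intake attributes for one AI system (hypothetical fields)."""
    touches_phi: bool                      # receives, stores, or emits PHI
    patient_specific_clinical_recs: bool   # diagnose/treat function per 21 USC 321(h)
    deployed_to_eu_patients: bool

def applicable_frameworks(p: AISystemProfile) -> list[str]:
    """Map intake attributes to the frameworks discussed above."""
    frameworks = []
    if p.touches_phi:
        frameworks.append("HIPAA (privacy, security, breach notification)")
    if p.patient_specific_clinical_recs:
        frameworks.append("FDA SaMD")
    if p.deployed_to_eu_patients and p.patient_specific_clinical_recs:
        frameworks.append("EU AI Act high-risk")
    return frameworks

# Example: a sepsis risk model serving both US and EU hospital sites
print(applicable_frameworks(AISystemProfile(True, True, True)))
```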


    Clinical Human-in-the-Loop Design

    For AI systems that inform clinical decisions, human-in-the-loop architecture isn't optional governance — it's the design baseline. Three structural questions must be answered for every clinical AI deployment:

    What decisions does the AI inform? Define the decision scope precisely. "Assists with sepsis risk stratification" is specific enough to design governance around. "Assists clinical decision-making" is too broad and creates liability without clarity.

    Who reviews the AI's output before it affects patient care? Identify the specific role (attending physician, nurse, pharmacist) responsible for review. The reviewer must have the clinical authority to accept, modify, or reject the AI recommendation. If the only person reviewing the output is the same person who entered the data, that's not meaningful oversight.

    What happens when the AI and the clinician disagree? The override workflow is as important as the default workflow. Document the override mechanism, require documentation of the clinical rationale when the AI recommendation is not followed, and track override patterns as a quality signal.
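    A minimal sketch of how the override requirement can be enforced in software, assuming a simple record type of our own design rather than any standard schema:

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class ClinicianAction:
    """One clinician response to an AI recommendation (illustrative schema)."""
    decision: str                    # "accepted" | "modified" | "rejected"
    rationale: Optional[str] = None  # required whenever the AI is not followed

    def __post_init__(self):
        if self.decision in ("modified", "rejected") and not self.rationale:
            raise ValueError("Override requires a documented clinical rationale")

# Accepted recommendations need no rationale; overrides do.
ClinicianAction("accepted")
ClinicianAction("rejected", rationale="Lactate trend inconsistent with sepsis")
```

    Making the rationale a hard requirement turns override documentation from a policy request into a system guarantee, and the resulting records feed the override-pattern tracking described above.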

    Confidence Thresholds and Escalation

    Well-designed clinical AI systems don't present all outputs the same way. Build confidence thresholds into the presentation layer:

    • High confidence outputs (the model is operating in its validated range): present to the responsible clinician with a clear recommended action
    • Moderate confidence outputs (the model is at the edges of its validated range): flag for explicit clinician review before action
    • Low confidence outputs (out-of-distribution inputs): escalate to a senior clinician or flag for human-only assessment, do not present the AI recommendation as guidance

    The threshold calibration should be validated against your clinical population, not just against benchmark datasets. Distribution shift is a real phenomenon — a model trained on one hospital's population may have different confidence characteristics on another's.
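    A sketch of the presentation-layer routing, assuming calibrated confidence scores; the cutoff values below are placeholders to be validated against your own clinical population, not recommendations:

```python
def route_output(confidence: float, high: float = 0.85, moderate: float = 0.60) -> str:
    """Map a calibrated confidence score to a presentation tier.

    Threshold values here are placeholders; calibrate against your own
    clinical population, not benchmark datasets.
    """
    if confidence >= high:
        return "present_with_recommended_action"     # validated range
    if confidence >= moderate:
        return "flag_for_explicit_clinician_review"  # edge of validated range
    return "escalate_human_only_assessment"          # treat as out-of-distribution

assert route_output(0.92) == "present_with_recommended_action"
assert route_output(0.40) == "escalate_human_only_assessment"
```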


    HIPAA Compliance for AI Systems

    HIPAA compliance for AI systems has several components that standard data handling frameworks don't fully cover.

    Minimum Necessary Standard

    HIPAA's minimum necessary standard (45 CFR §164.502(b), with implementation specifications at §164.514(d)) requires limiting PHI access to the minimum necessary to accomplish the intended purpose. For AI, this means:

    • The AI system should only receive the data elements it actually needs to perform its function. A sepsis prediction model doesn't need the patient's insurance information or billing codes. Design data pipelines to pass only necessary fields (sketched after this list).
    • Role-based access controls on the AI system itself: a billing coder should be able to query AI for coding assistance but should not be able to query the clinical AI for patient-specific clinical assessments.
    • Query-level audit: every AI query that includes PHI must be logged with the user's identity and the role authorization that permitted the query.
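    The first two controls above can be combined in the data pipeline. A minimal sketch, with hypothetical system names and role tables:

```python
# Minimum-necessary enforcement sketch; the field and role tables below are
# hypothetical examples, not derived from HIPAA itself.
NEEDED_FIELDS = {"sepsis_risk_model": {"vitals", "labs", "clinical_notes"}}
AUTHORIZED_ROLES = {"sepsis_risk_model": {"attending_physician", "icu_nurse"}}

def prepare_query(system: str, role: str, record: dict) -> dict:
    """Gate by role, then strip the record to the fields the model needs."""
    if role not in AUTHORIZED_ROLES[system]:
        raise PermissionError(f"{role} is not authorized to query {system}")
    needed = NEEDED_FIELDS[system]
    return {k: v for k, v in record.items() if k in needed}

record = {"vitals": {...}, "labs": {...}, "insurance_id": "X-123", "billing_codes": []}
prepare_query("sepsis_risk_model", "attending_physician", record)
# -> insurance_id and billing_codes never reach the model
```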

    Audit Control Requirements

    HIPAA's audit control standard (45 CFR §164.312(b)) requires mechanisms that record and examine activity in systems containing PHI. AI systems containing or processing PHI must produce audit-grade logs that include:

    • User identity and role at time of query
    • Timestamp in UTC
    • Data elements present in the query (not necessarily full query content for high-volume systems, but enough to identify what PHI was involved)
    • Model version queried
    • Output delivered
    • Whether the output was used in a clinical decision (if your system captures this)

    These logs must be retained per your HIPAA retention schedule and must be producible in response to a breach investigation or OCR audit.

    Business Associate Agreements

    If your AI vendor processes PHI on your behalf, a Business Associate Agreement (BAA) is required under HIPAA. This applies to:

    • Cloud AI API providers that receive PHI-containing prompts
    • Fine-tuning platforms that train on datasets containing PHI
    • Inference hosting providers where your AI model runs

    If you fine-tune on PHI and deploy inference locally, the inference layer doesn't require a BAA, because no third party handles PHI at that stage. The training step does if it involves a third-party compute provider.


    FDA SaMD Governance Requirements

    If your AI system qualifies as a Software as a Medical Device, FDA governance requirements apply in addition to HIPAA.

    Predetermined Change Control Plan

    FDA's AI/ML SaMD guidance emphasizes a Predetermined Change Control Plan (PCCP) — documentation of how you will manage model updates, retraining, and performance changes after initial clearance. The PCCP should describe:

    • Performance metrics and thresholds that trigger model review
    • The process for evaluating whether a model change constitutes a modification requiring new regulatory submission
    • Clinical validation requirements before deploying a retrained model
    • How you will communicate changes to clinical users

    For fine-tuned models you own: you control the retraining process. Document it. For cloud API-based systems: your PCCP must account for the vendor's model update practices, which may be outside your control — one reason many SaMD developers prefer owned model infrastructure.
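    As an illustration of the first PCCP element, here is a sketch of a trigger check that compares rolling production metrics against the thresholds documented in the plan; the metric names and floors are placeholders:

```python
# Illustrative PCCP trigger check: compare rolling production metrics to the
# floors documented in the change control plan (values are placeholders).
REVIEW_THRESHOLDS = {"sensitivity": 0.90, "specificity": 0.80, "auroc": 0.85}

def pccp_review_triggers(rolling_metrics: dict) -> list[str]:
    """Return the metrics that have fallen below their documented floor."""
    return [
        name for name, floor in REVIEW_THRESHOLDS.items()
        if rolling_metrics.get(name, 0.0) < floor
    ]

triggers = pccp_review_triggers({"sensitivity": 0.87, "specificity": 0.84, "auroc": 0.86})
if triggers:  # e.g. ["sensitivity"] -> convene model review per the PCCP
    print(f"PCCP review triggered by: {triggers}")
```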

    Algorithm Transparency

    FDA's guidance on AI/ML-based SaMD includes transparency requirements. Clinical users and clinical decision-makers must have access to:

    • What the AI is doing (its clinical function)
    • What data it was trained on (at the category level)
    • Its known performance characteristics (accuracy, sensitivity, specificity by relevant subgroup)
    • Its known limitations (patient populations where performance is lower, input conditions where the model may underperform)

    This information should be documented in a model card and made accessible to the clinical staff who use the system.
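    The model card can be kept as structured data alongside the deployment config. The sketch below is an assumption about shape, covering the four transparency items above with illustrative values; it is not an FDA-mandated schema:

```python
# Illustrative model card covering the four transparency items above.
# Field names and numbers are examples, not a prescribed format.
MODEL_CARD = {
    "clinical_function": "Sepsis risk stratification for adult ICU patients",
    "training_data": ["vitals", "labs", "clinical notes (de-identified)"],  # category level
    "performance": {
        "overall": {"sensitivity": 0.91, "specificity": 0.83},
        "age_65_plus": {"sensitivity": 0.88, "specificity": 0.81},  # by subgroup
    },
    "known_limitations": [
        "Lower sensitivity in immunocompromised patients",
        "Not validated for pediatric populations",
    ],
}
```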

    Post-Market Surveillance

    SaMD requires post-market surveillance programs. For AI systems, this means tracking:

    • Model performance against real-world clinical outcomes (not just predicted vs. observed on held-out data)
    • Adverse event reports where the AI may have contributed to a clinical error
    • Override patterns as a performance signal
    • Demographic performance monitoring for bias detection
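    A sketch of the demographic monitoring element, computing per-subgroup sensitivity from outcome-labeled production data; the row format is an assumption:

```python
# Illustrative subgroup sensitivity check for post-market bias monitoring.
# Input rows are (subgroup, model_flagged, outcome_occurred); names assumed.
from collections import defaultdict

def subgroup_sensitivity(rows: list[tuple[str, bool, bool]]) -> dict[str, float]:
    """Sensitivity = true positives / actual positives, per subgroup."""
    tp, pos = defaultdict(int), defaultdict(int)
    for group, flagged, occurred in rows:
        if occurred:
            pos[group] += 1
            if flagged:
                tp[group] += 1
    return {g: tp[g] / pos[g] for g in pos if pos[g] > 0}

rows = [("F", True, True), ("F", False, True), ("M", True, True)]
print(subgroup_sensitivity(rows))  # {'F': 0.5, 'M': 1.0}
```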

    Audit Trail Specification for Healthcare AI

    A healthcare AI audit trail must capture enough information to reconstruct every AI-influenced clinical decision. Minimum record per query:

    • Event ID: unique identifier per query
    • Timestamp: UTC, millisecond precision
    • User ID: linked to HR/credentialing system
    • User role: as authorized at time of query
    • Patient ID: de-identified or pseudonymized per logging policy
    • Data elements included: categories (labs, vitals, notes) without full PHI in the log
    • Model ID: specific model version and adapter version
    • Output delivered: the recommendation presented to the clinician
    • Clinician action: accepted / modified / rejected (if captured)
    • Override reason: free text if rejected (if captured)

    Retention: HIPAA's documentation retention minimum is 6 years for covered entities, and state laws may extend this. The EU AI Act requires providers of high-risk systems to retain technical documentation for 10 years and automatically generated logs for at least six months.
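    A sketch of this record as a typed structure; the field names mirror the spec above, while the types and ID formats are assumptions:

```python
import json
from dataclasses import dataclass, asdict
from datetime import datetime, timezone
from typing import Optional

@dataclass
class AuditRecord:
    """One AI query event, mirroring the minimum record above."""
    event_id: str                   # unique per query
    timestamp: str                  # UTC, millisecond precision
    user_id: str                    # linked to HR/credentialing system
    user_role: str                  # as authorized at time of query
    patient_id: str                 # pseudonymized per logging policy
    data_elements: list[str]        # categories only, no PHI in the log
    model_id: str                   # model version + adapter version
    output_delivered: str
    clinician_action: Optional[str] = None  # accepted / modified / rejected
    override_reason: Optional[str] = None

record = AuditRecord(
    event_id="evt-0001",
    timestamp=datetime.now(timezone.utc).isoformat(timespec="milliseconds"),
    user_id="u-4821", user_role="attending_physician", patient_id="pseud-77",
    data_elements=["labs", "vitals"], model_id="sepsis-v3.2+adapter-v1.1",
    output_delivered="High sepsis risk; recommend lactate and blood cultures",
)
print(json.dumps(asdict(record)))  # append to your tamper-evident audit store
```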


    Governance Committee Structure

    Healthcare AI governance requires cross-functional accountability. A minimum viable governance committee includes:

    Clinical champion: A physician or advanced practice provider who can assess clinical validity of AI recommendations and serve as the liaison between governance and the clinical staff using the system.

    Compliance officer or privacy officer: Responsible for HIPAA and regulatory compliance review of each AI deployment.

    Information security: Responsible for access control, audit log integrity, and data security of the AI infrastructure.

    Legal counsel: Review of FDA regulatory status, vendor contracts (BAAs, indemnification), and liability implications.

    IT/engineering: Responsible for infrastructure, model deployment, and log implementation.

    This committee should meet at minimum quarterly, and before each new AI deployment, model update, or significant change to AI scope.


    On-Premise Infrastructure for Healthcare AI

    Healthcare organizations with strict PHI handling requirements have structural reasons to prefer on-premise AI inference. When AI models run on your infrastructure:

    • PHI never leaves your network to reach a cloud API endpoint
    • Your existing network access controls apply to the AI system
    • Your audit logging infrastructure captures AI queries the same way it captures any other clinical system query
    • You are not dependent on a BAA with a third-party inference provider for each production query

    The deployment model: fine-tune on cloud GPUs using de-identified or synthetic training data, export to GGUF format, deploy inference locally via Ollama or llama.cpp on your clinical network. Cloud for training, on-premise for inference.
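    A sketch of what the inference call looks like on this architecture, assuming an Ollama server running at its default local port with a fine-tuned model already imported (the model name here is hypothetical). The prompt, which may contain PHI, resolves entirely inside the network:

```python
# Local inference against an on-network Ollama server: the prompt never
# leaves the clinical network. Model name is a hypothetical example.
import json
from urllib.request import Request, urlopen

def local_generate(prompt: str, model: str = "sepsis-notes-ft") -> str:
    payload = json.dumps({"model": model, "prompt": prompt, "stream": False})
    req = Request(
        "http://localhost:11434/api/generate",  # Ollama's default local endpoint
        data=payload.encode(),
        headers={"Content-Type": "application/json"},
    )
    with urlopen(req) as resp:
        return json.loads(resp.read())["response"]

summary = local_generate("Summarize: patient presents with fever, HR 118, ...")
```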

    Book a discovery call with Ertas →

    Ertas Data Suite runs entirely on your infrastructure — no data egress during operation, no cloud inference calls, and logging of every processing step with operator identity. For healthcare organizations where PHI handling is non-negotiable, the on-premise architecture is the foundation that makes AI governance tractable.

    Turn unstructured data into AI-ready datasets — without it leaving the building.

    On-premise data preparation with full audit trail. No data egress. No fragmented toolchains. EU AI Act Article 30 compliance built in.
