    AI Governance Framework for Healthcare: HIPAA, FDA SaMD, and Clinical Oversight Requirements
    ai-governance · healthcare-ai · hipaa · fda-samd · human-in-the-loop · regulated-industries


    A practical AI governance framework for healthcare organizations. Covers HIPAA compliance, FDA Software as a Medical Device requirements, clinical human-in-the-loop design, and audit trail specifications.

    Ertas Team

    AI in healthcare isn't a theoretical governance challenge. It's a regulatory compliance requirement with specific standards, audit expectations, and liability implications. Healthcare organizations deploying AI face a layered regulatory environment: HIPAA governs data handling, FDA regulates software that qualifies as a medical device, and the EU AI Act's high-risk classification captures clinical decision support systems used across European operations.

    This framework covers the governance structures, oversight mechanisms, and documentation practices that healthcare AI deployments require.


    Regulatory Landscape: What Applies to Your AI System

    Before designing governance, establish which regulations apply.

    HIPAA applies if your AI system processes, stores, or transmits protected health information (PHI). This includes systems that receive patient data as input (clinical notes, imaging metadata, lab values) or that produce outputs containing PHI. HIPAA's security, privacy, and breach notification rules all extend to AI systems that touch PHI.

    FDA's Software as a Medical Device (SaMD) framework applies if your AI system meets the definition of a medical device under 21 USC 321(h): software intended to diagnose, cure, mitigate, treat, or prevent disease, or to affect the structure or function of the body. Clinical decision support software that provides patient-specific recommendations based on patient data typically qualifies. Administrative AI (scheduling, billing, documentation summarization without clinical recommendations) typically doesn't.

    The EU AI Act's high-risk classification applies to AI systems intended for use in medical devices and clinical decision support systems covered by EU medical device regulations, deployed within the EU or to EU patients. High-risk classification requires risk management systems, data governance documentation, transparency to deployers, human oversight capability, and registration in the EU database.

    Joint Commission and CMS standards increasingly address AI in accreditation and certification criteria, particularly around clinical decision support governance and clinical staff training requirements.

    Determine which of these apply to each AI system in your environment before building the governance structure. The compliance burden differs substantially across these frameworks.
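    As a sanity check, this determination can be captured as a simple intake checklist. The sketch below is a hypothetical illustration; the attribute names and the mapping are ours, not drawn from any regulation:

```python
from dataclasses import dataclass

@dataclass
class AISystemProfile:
    """Governance intake attributes for one AI system (hypothetical fields)."""
    touches_phi: bool                      # receives, stores, or emits PHI
    patient_specific_clinical_recs: bool   # diagnose/treat function per 21 USC 321(h)
    deployed_to_eu_patients: bool

def applicable_frameworks(p: AISystemProfile) -> list[str]:
    """Map intake attributes to the frameworks discussed above."""
    frameworks = []
    if p.touches_phi:
        frameworks.append("HIPAA (privacy, security, breach notification)")
    if p.patient_specific_clinical_recs:
        frameworks.append("FDA SaMD")
    if p.deployed_to_eu_patients and p.patient_specific_clinical_recs:
        frameworks.append("EU AI Act high-risk")
    return frameworks

# Example: a sepsis risk model serving both US and EU hospital sites
print(applicable_frameworks(AISystemProfile(True, True, True)))
```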


    Clinical Human-in-the-Loop Design

    For AI systems that inform clinical decisions, human-in-the-loop architecture isn't optional governance — it's the design baseline. Three structural questions must be answered for every clinical AI deployment:

    What decisions does the AI inform? Define the decision scope precisely. "Assists with sepsis risk stratification" is specific enough to design governance around. "Assists clinical decision-making" is too broad and creates liability without clarity.

    Who reviews the AI's output before it affects patient care? Identify the specific role (attending physician, nurse, pharmacist) responsible for review. The reviewer must have the clinical authority to accept, modify, or reject the AI recommendation. If the only person reviewing the output is the same person who entered the data, that's not meaningful oversight.

    What happens when the AI and the clinician disagree? The override workflow is as important as the default workflow. Document the override mechanism, require documentation of the clinical rationale when the AI recommendation is not followed, and track override patterns as a quality signal.
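    A minimal sketch of how the override requirement can be enforced in software, assuming a simple record type of our own design rather than any standard schema:

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class ClinicianAction:
    """One clinician response to an AI recommendation (illustrative schema)."""
    decision: str                    # "accepted" | "modified" | "rejected"
    rationale: Optional[str] = None  # required whenever the AI is not followed

    def __post_init__(self):
        if self.decision in ("modified", "rejected") and not self.rationale:
            raise ValueError("Override requires a documented clinical rationale")

# Accepted recommendations need no rationale; overrides do.
ClinicianAction("accepted")
ClinicianAction("rejected", rationale="Lactate trend inconsistent with sepsis")
```

    Making the rationale a hard requirement turns override documentation from a policy request into a system guarantee, and the resulting records feed the override-pattern tracking described above.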

    Confidence Thresholds and Escalation

    Well-designed clinical AI systems don't present all outputs the same way. Build confidence thresholds into the presentation layer:

    • High confidence outputs (the model is operating in its validated range): present to the responsible clinician with a clear recommended action
    • Moderate confidence outputs (the model is at the edges of its validated range): flag for explicit clinician review before action
    • Low confidence outputs (out-of-distribution inputs): escalate to a senior clinician or flag for human-only assessment, do not present the AI recommendation as guidance

    The threshold calibration should be validated against your clinical population, not just against benchmark datasets. Distribution shift is a real phenomenon — a model trained on one hospital's population may have different confidence characteristics on another's.
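    A sketch of the presentation-layer routing, assuming calibrated confidence scores; the cutoff values below are placeholders to be validated against your own clinical population, not recommendations:

```python
def route_output(confidence: float, high: float = 0.85, moderate: float = 0.60) -> str:
    """Map a calibrated confidence score to a presentation tier.

    Threshold values here are placeholders; calibrate against your own
    clinical population, not benchmark datasets.
    """
    if confidence >= high:
        return "present_with_recommended_action"     # validated range
    if confidence >= moderate:
        return "flag_for_explicit_clinician_review"  # edge of validated range
    return "escalate_human_only_assessment"          # treat as out-of-distribution

assert route_output(0.92) == "present_with_recommended_action"
assert route_output(0.40) == "escalate_human_only_assessment"
```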


    HIPAA Compliance for AI Systems

    HIPAA compliance for AI systems has several components that standard data handling frameworks don't fully cover.

    Minimum Necessary Standard

    HIPAA's minimum necessary standard (45 CFR §164.502(b), with implementation specifications at §164.514(d)) requires limiting PHI access to the minimum necessary to accomplish the intended purpose. For AI, this means:

    • The AI system should only receive the data elements it actually needs to perform its function. A sepsis prediction model doesn't need the patient's insurance information or billing codes. Design data pipelines to pass only necessary fields (sketched after this list).
    • Role-based access controls on the AI system itself: a billing coder should be able to query AI for coding assistance but should not be able to query the clinical AI for patient-specific clinical assessments.
    • Query-level audit: every AI query that includes PHI must be logged with the user's identity and the role authorization that permitted the query.
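    The first two controls above can be combined in the data pipeline. A minimal sketch, with hypothetical system names and role tables:

```python
# Minimum-necessary enforcement sketch; the field and role tables below are
# hypothetical examples, not derived from HIPAA itself.
NEEDED_FIELDS = {"sepsis_risk_model": {"vitals", "labs", "clinical_notes"}}
AUTHORIZED_ROLES = {"sepsis_risk_model": {"attending_physician", "icu_nurse"}}

def prepare_query(system: str, role: str, record: dict) -> dict:
    """Gate by role, then strip the record to the fields the model needs."""
    if role not in AUTHORIZED_ROLES[system]:
        raise PermissionError(f"{role} is not authorized to query {system}")
    needed = NEEDED_FIELDS[system]
    return {k: v for k, v in record.items() if k in needed}

record = {"vitals": {...}, "labs": {...}, "insurance_id": "X-123", "billing_codes": []}
prepare_query("sepsis_risk_model", "attending_physician", record)
# -> insurance_id and billing_codes never reach the model
```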

    Audit Control Requirements

    HIPAA's audit control standard (45 CFR §164.312(b)) requires mechanisms that record and examine activity in systems containing PHI. AI systems containing or processing PHI must produce audit-grade logs that include:

    • User identity and role at time of query
    • Timestamp in UTC
    • Data elements present in the query (not necessarily full query content for high-volume systems, but enough to identify what PHI was involved)
    • Model version queried
    • Output delivered
    • Whether the output was used in a clinical decision (if your system captures this)

    These logs must be retained per your HIPAA retention schedule and must be producible in response to a breach investigation or OCR audit.

    Business Associate Agreements

    If your AI vendor processes PHI on your behalf, a Business Associate Agreement (BAA) is required under HIPAA. This applies to:

    • Cloud AI API providers that receive PHI-containing prompts
    • Fine-tuning platforms that train on datasets containing PHI
    • Inference hosting providers where your AI model runs

    If you fine-tune on PHI and deploy inference locally, the inference layer doesn't require a BAA, because no third party handles PHI at that stage. The training step does if it involves a third-party compute provider.


    FDA SaMD Governance Requirements

    If your AI system qualifies as a Software as a Medical Device, FDA governance requirements apply in addition to HIPAA.

    Predetermined Change Control Plan

    FDA's AI/ML SaMD guidance emphasizes a Predetermined Change Control Plan (PCCP) — documentation of how you will manage model updates, retraining, and performance changes after initial clearance. The PCCP should describe:

    • Performance metrics and thresholds that trigger model review
    • The process for evaluating whether a model change constitutes a modification requiring new regulatory submission
    • Clinical validation requirements before deploying a retrained model
    • How you will communicate changes to clinical users

    For fine-tuned models you own: you control the retraining process. Document it. For cloud API-based systems: your PCCP must account for the vendor's model update practices, which may be outside your control — one reason many SaMD developers prefer owned model infrastructure.
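    As an illustration of the first PCCP element, here is a sketch of a trigger check that compares rolling production metrics against the thresholds documented in the plan; the metric names and floors are placeholders:

```python
# Illustrative PCCP trigger check: compare rolling production metrics to the
# floors documented in the change control plan (values are placeholders).
REVIEW_THRESHOLDS = {"sensitivity": 0.90, "specificity": 0.80, "auroc": 0.85}

def pccp_review_triggers(rolling_metrics: dict) -> list[str]:
    """Return the metrics that have fallen below their documented floor."""
    return [
        name for name, floor in REVIEW_THRESHOLDS.items()
        if rolling_metrics.get(name, 0.0) < floor
    ]

triggers = pccp_review_triggers({"sensitivity": 0.87, "specificity": 0.84, "auroc": 0.86})
if triggers:  # e.g. ["sensitivity"] -> convene model review per the PCCP
    print(f"PCCP review triggered by: {triggers}")
```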

    Algorithm Transparency

    FDA's guidance on AI/ML-based SaMD includes transparency requirements. Clinical users and clinical decision-makers must have access to:

    • What the AI is doing (its clinical function)
    • What data it was trained on (at the category level)
    • Its known performance characteristics (accuracy, sensitivity, specificity by relevant subgroup)
    • Its known limitations (patient populations where performance is lower, input conditions where the model may underperform)

    This information should be documented in a model card and made accessible to the clinical staff who use the system.
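    The model card can be kept as structured data alongside the deployment config. The sketch below is an assumption about shape, covering the four transparency items above with illustrative values; it is not an FDA-mandated schema:

```python
# Illustrative model card covering the four transparency items above.
# Field names and numbers are examples, not a prescribed format.
MODEL_CARD = {
    "clinical_function": "Sepsis risk stratification for adult ICU patients",
    "training_data": ["vitals", "labs", "clinical notes (de-identified)"],  # category level
    "performance": {
        "overall": {"sensitivity": 0.91, "specificity": 0.83},
        "age_65_plus": {"sensitivity": 0.88, "specificity": 0.81},  # by subgroup
    },
    "known_limitations": [
        "Lower sensitivity in immunocompromised patients",
        "Not validated for pediatric populations",
    ],
}
```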

    Post-Market Surveillance

    SaMD requires post-market surveillance programs. For AI systems, this means tracking:

    • Model performance against real-world clinical outcomes (not just predicted vs. observed on held-out data)
    • Adverse event reports where the AI may have contributed to a clinical error
    • Override patterns as a performance signal
    • Demographic performance monitoring for bias detection
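    A sketch of the demographic monitoring element, computing per-subgroup sensitivity from outcome-labeled production data; the row format is an assumption:

```python
# Illustrative subgroup sensitivity check for post-market bias monitoring.
# Input rows are (subgroup, model_flagged, outcome_occurred); names assumed.
from collections import defaultdict

def subgroup_sensitivity(rows: list[tuple[str, bool, bool]]) -> dict[str, float]:
    """Sensitivity = true positives / actual positives, per subgroup."""
    tp, pos = defaultdict(int), defaultdict(int)
    for group, flagged, occurred in rows:
        if occurred:
            pos[group] += 1
            if flagged:
                tp[group] += 1
    return {g: tp[g] / pos[g] for g in pos if pos[g] > 0}

rows = [("F", True, True), ("F", False, True), ("M", True, True)]
print(subgroup_sensitivity(rows))  # {'F': 0.5, 'M': 1.0}
```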

    Audit Trail Specification for Healthcare AI

    A healthcare AI audit trail must capture enough information to reconstruct every AI-influenced clinical decision. Minimum record per query:

    • Event ID: unique identifier per query
    • Timestamp: UTC, millisecond precision
    • User ID: linked to HR/credentialing system
    • User role: as authorized at time of query
    • Patient ID: de-identified or pseudonymized per logging policy
    • Data elements included: categories (labs, vitals, notes) without full PHI in the log
    • Model ID: specific model version and adapter version
    • Output delivered: the recommendation presented to the clinician
    • Clinician action: accepted / modified / rejected (if captured)
    • Override reason: free text if rejected (if captured)

    Retention: HIPAA's documentation retention minimum is 6 years for covered entities, and state laws may extend this. The EU AI Act requires providers of high-risk systems to retain technical documentation for 10 years and automatically generated logs for at least six months.
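    A sketch of this record as a typed structure; the field names mirror the spec above, while the types and ID formats are assumptions:

```python
import json
from dataclasses import dataclass, asdict
from datetime import datetime, timezone
from typing import Optional

@dataclass
class AuditRecord:
    """One AI query event, mirroring the minimum record above."""
    event_id: str                   # unique per query
    timestamp: str                  # UTC, millisecond precision
    user_id: str                    # linked to HR/credentialing system
    user_role: str                  # as authorized at time of query
    patient_id: str                 # pseudonymized per logging policy
    data_elements: list[str]        # categories only, no PHI in the log
    model_id: str                   # model version + adapter version
    output_delivered: str
    clinician_action: Optional[str] = None  # accepted / modified / rejected
    override_reason: Optional[str] = None

record = AuditRecord(
    event_id="evt-0001",
    timestamp=datetime.now(timezone.utc).isoformat(timespec="milliseconds"),
    user_id="u-4821", user_role="attending_physician", patient_id="pseud-77",
    data_elements=["labs", "vitals"], model_id="sepsis-v3.2+adapter-v1.1",
    output_delivered="High sepsis risk; recommend lactate and blood cultures",
)
print(json.dumps(asdict(record)))  # append to your tamper-evident audit store
```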


    Governance Committee Structure

    Healthcare AI governance requires cross-functional accountability. A minimum viable governance committee includes:

    Clinical champion: A physician or advanced practice provider who can assess clinical validity of AI recommendations and serve as the liaison between governance and the clinical staff using the system.

    Compliance officer or privacy officer: Responsible for HIPAA and regulatory compliance review of each AI deployment.

    Information security: Responsible for access control, audit log integrity, and data security of the AI infrastructure.

    Legal counsel: Review of FDA regulatory status, vendor contracts (BAAs, indemnification), and liability implications.

    IT/engineering: Responsible for infrastructure, model deployment, and log implementation.

    This committee should meet at minimum quarterly, and before each new AI deployment, model update, or significant change to AI scope.


    On-Premise Infrastructure for Healthcare AI

    Healthcare organizations with strict PHI handling requirements have structural reasons to prefer on-premise AI inference. When AI models run on your infrastructure:

    • PHI never leaves your network to reach a cloud API endpoint
    • Your existing network access controls apply to the AI system
    • Your audit logging infrastructure captures AI queries the same way it captures any other clinical system query
    • You are not dependent on a BAA with a third-party inference provider for each production query

    The deployment model: fine-tune on cloud GPUs using de-identified or synthetic training data, export to GGUF format, deploy inference locally via Ollama or llama.cpp on your clinical network. Cloud for training, on-premise for inference.
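    A sketch of what the inference call looks like on this architecture, assuming an Ollama server running at its default local port with a fine-tuned model already imported (the model name here is hypothetical). The prompt, which may contain PHI, resolves entirely inside the network:

```python
# Local inference against an on-network Ollama server: the prompt never
# leaves the clinical network. Model name is a hypothetical example.
import json
from urllib.request import Request, urlopen

def local_generate(prompt: str, model: str = "sepsis-notes-ft") -> str:
    payload = json.dumps({"model": model, "prompt": prompt, "stream": False})
    req = Request(
        "http://localhost:11434/api/generate",  # Ollama's default local endpoint
        data=payload.encode(),
        headers={"Content-Type": "application/json"},
    )
    with urlopen(req) as resp:
        return json.loads(resp.read())["response"]

summary = local_generate("Summarize: patient presents with fever, HR 118, ...")
```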

    Book a discovery call with Ertas →

    Ertas Data Suite runs entirely on your infrastructure — no data egress during operation, no cloud inference calls, and logging of every processing step with operator identity. For healthcare organizations where PHI handling is non-negotiable, the on-premise architecture is the foundation that makes AI governance tractable.

    Turn unstructured data into AI-ready datasets — without it leaving the building.

    On-premise data preparation with full audit trail. No data egress. No fragmented toolchains. EU AI Act Article 30 compliance built in.
