HIPAA & AI Compliance
Building HIPAA-compliant AI models with on-premise training data and local inference
Overview
The Health Insurance Portability and Accountability Act (HIPAA) is the primary United States federal law governing the privacy and security of protected health information (PHI). Enacted in 1996 and substantially expanded by the 2009 HITECH Act and subsequent rulemaking, HIPAA establishes national standards for the protection of individually identifiable health information. For organizations developing AI in healthcare, HIPAA compliance is not optional: violations carry civil penalties ranging from $100 to $50,000 per violation, with annual maximums of $1.5 million per violation category.
HIPAA's scope covers all "covered entities" (health plans, healthcare clearinghouses, and healthcare providers) and their "business associates": any organization that handles PHI on behalf of a covered entity. When an AI vendor processes patient records, clinical notes, diagnostic images, or any other data that could identify a patient, it becomes a business associate subject to HIPAA's full regulatory requirements. This has profound implications for how AI training pipelines are designed and where patient data is allowed to flow.
The regulation is built on three main rules: the Privacy Rule, the Security Rule, and the Breach Notification Rule. The Privacy Rule establishes who can access PHI and under what conditions. The Security Rule specifies technical safeguards for electronic PHI (ePHI), including access controls, audit controls, integrity controls, and transmission security. The Breach Notification Rule mandates that covered entities notify affected individuals, HHS, and sometimes the media when unsecured PHI is compromised.
AI-Specific Requirements
AI systems in healthcare face heightened HIPAA scrutiny because they often require large volumes of patient data for training. The Security Rule's technical safeguard requirements (45 CFR 164.312) demand access controls that restrict system access to authorized users, audit controls that record and examine access to ePHI, integrity controls that protect ePHI from improper alteration or destruction, and transmission security that guards against unauthorized access during electronic transmission.
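To make the audit control requirement concrete: every access to ePHI should produce a tamper-evident, reviewable record. The sketch below shows one minimal way such a record could be structured; the field names are hypothetical and do not represent a HIPAA-mandated schema or any specific product's log format.

```python
# Illustrative structured audit record for an ePHI access event.
# Field names are hypothetical, not a mandated or product-specific schema.
import json
import hashlib
from datetime import datetime, timezone

def audit_record(user_id: str, action: str, resource: str, phi_involved: bool) -> str:
    """Build a structured audit log entry for an ePHI access event."""
    record = {
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "user_id": user_id,          # who accessed the system
        "action": action,            # e.g. "read", "export", "train"
        "resource": resource,        # dataset or model identifier
        "phi_involved": phi_involved,
    }
    # A hash over the record contents supports later integrity checks.
    record["integrity_hash"] = hashlib.sha256(
        json.dumps(record, sort_keys=True).encode()
    ).hexdigest()
    return json.dumps(record)

# Example: log a training-data read by an authorized analyst.
print(audit_record("analyst-042", "read", "oncology-notes-v3", phi_involved=True))
```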
The minimum necessary standard is particularly relevant for AI training data preparation. Under this principle, organizations must limit PHI access to the minimum amount necessary to accomplish the intended purpose. For AI teams, this means carefully scoping training datasets to include only the clinical data elements required for the model's specific use case, rather than including entire patient records. De-identification under the HIPAA Safe Harbor method (removing 18 specific identifiers) or the Expert Determination method can allow data to be used outside HIPAA restrictions, but the de-identification must be thorough and verifiable.
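As a concrete illustration of scoping a training set under the minimum necessary standard, the sketch below drops direct identifiers and keeps only the features the model needs. The column names are hypothetical, and a genuine Safe Harbor de-identification must address all 18 identifier categories, including identifiers embedded in free text, which this snippet does not attempt.

```python
# Minimal sketch of "minimum necessary" dataset scoping plus a partial
# Safe Harbor-style identifier drop. Column names are hypothetical; a real
# Safe Harbor pass must remove all 18 identifier categories.
import pandas as pd

DIRECT_IDENTIFIERS = [
    "patient_name", "ssn", "mrn", "date_of_birth",
    "street_address", "phone", "email",
]

# Only the clinical fields the model's use case actually requires.
MODEL_FEATURES = ["age_bucket", "diagnosis_code", "lab_result", "clinical_note"]

def scope_training_data(records: pd.DataFrame) -> pd.DataFrame:
    """Drop direct identifiers and keep only the columns the use case requires."""
    deidentified = records.drop(columns=DIRECT_IDENTIFIERS, errors="ignore")
    return deidentified[MODEL_FEATURES]
```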
Business Associate Agreements (BAAs) are required whenever PHI is shared with third-party service providers. For AI development, this means any cloud platform, annotation service, or model hosting provider that touches PHI must sign a BAA. Many cloud AI services either refuse to sign BAAs or impose significant limitations on how their services can be used with PHI. This creates a strong incentive for healthcare organizations to adopt on-premise AI solutions that eliminate the need for BAAs with external AI infrastructure providers.
How Ertas Helps
Ertas provides a HIPAA-aligned architecture that keeps PHI entirely within your organization's controlled environment. Ertas Data Suite operates as a fully on-premise, air-gapped desktop application — patient data never leaves your facility's network. There is no data egress to external servers, no cloud processing of PHI, and no need to execute Business Associate Agreements with AI infrastructure providers. This dramatically simplifies your HIPAA compliance surface area for AI development.
The built-in PII redaction engine in Ertas Data Suite is specifically designed to handle healthcare data patterns, detecting and masking patient names, medical record numbers, dates of birth, Social Security numbers, and other PHI identifiers. The data lineage tracking provides a complete chain of custody for every data transformation, allowing compliance officers to verify that de-identification was properly applied before data enters the training pipeline. Every action in the system is captured in comprehensive audit logs, satisfying HIPAA's audit control requirements under the Security Rule.
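To make pattern-based PHI masking concrete, here is a simplified, regex-only sketch. It is not Ertas's actual detection logic, and it is far from exhaustive; production redaction engines typically combine pattern matching with trained entity-recognition models and human review.

```python
# Illustrative regex-based PHI masking. A deliberate simplification of what a
# redaction engine does -- not Ertas's implementation and not exhaustive.
import re

PHI_PATTERNS = {
    "SSN": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "MRN": re.compile(r"\bMRN[:\s]*\d{6,10}\b", re.IGNORECASE),
    "DOB": re.compile(r"\b\d{1,2}/\d{1,2}/\d{4}\b"),
    "PHONE": re.compile(r"\b\d{3}[-.]\d{3}[-.]\d{4}\b"),
}

def redact(text: str) -> str:
    """Replace matched identifiers with labeled placeholder tokens."""
    for label, pattern in PHI_PATTERNS.items():
        text = pattern.sub(f"[{label}]", text)
    return text

note = "Pt DOB 04/12/1967, MRN: 00482913, callback 555-867-5309."
print(redact(note))
# -> "Pt DOB [DOB], [MRN], callback [PHONE]."
```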
Ertas Studio's Vault feature provides the encryption and access control layer that HIPAA's Security Rule demands. All stored datasets and models are encrypted at rest, and role-based access controls ensure that only authorized personnel can view or modify PHI-derived training data. After training, models are exported in GGUF format for local inference, meaning clinical AI applications can run predictions on-site without transmitting patient data to external inference endpoints. This end-to-end on-premise workflow eliminates the breach risks associated with cloud-based AI inference on healthcare data.
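The closing step, running an exported GGUF model on local hardware, might look like the following sketch using the open-source llama-cpp-python bindings. The model path, prompt, and generation settings are placeholders; the point is that generation happens entirely on-site, with no network call carrying patient data.

```python
# Minimal sketch of local inference on an exported GGUF model using the
# open-source llama-cpp-python bindings. Paths and prompts are placeholders.
from llama_cpp import Llama

llm = Llama(
    model_path="./models/clinical-summarizer.gguf",  # hypothetical exported model
    n_ctx=4096,        # context window sized for clinical notes
    n_gpu_layers=-1,   # offload all layers to a local GPU if available
)

output = llm(
    "Summarize the following de-identified discharge note:\n...",
    max_tokens=256,
    temperature=0.2,
)
print(output["choices"][0]["text"])
```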
Compliance Checklist
Relevant Ertas Features
- Air-gapped on-premise deployment
- PHI-aware PII redaction engine
- Comprehensive audit logging
- Vault encryption at rest
- Role-based access controls
- Local GGUF inference with zero data egress