HIPAA-Ready AI Training and Data Preparation for Healthcare

Ertas gives healthcare organizations a secure, on-premise data preparation pipeline and a visual fine-tuning platform — so you can build clinical AI models without exposing protected health information to third-party services.

The Challenges You Face

PHI Cannot Leave Your Network

HIPAA, HITECH, and institutional review board requirements make it nearly impossible to use cloud-based AI services that process protected health information. Most AI platforms require data uploads to external servers, creating compliance barriers that block adoption entirely.

Clinical Data Is Messy and Unstructured

Electronic health records, clinical notes, lab reports, and imaging metadata arrive in dozens of formats with inconsistent terminology, abbreviations, and missing fields. Preparing this data for AI training requires specialized cleaning and normalization that generic ETL tools cannot handle.

Audit Trails Are Non-Negotiable

Regulatory audits demand that every data transformation, access event, and model decision be traceable. Most ML workflows involve ad-hoc scripts and Jupyter notebooks that produce no audit trail, creating compliance gaps that surface during inspections.

Domain Expertise Lives with Clinicians, Not Engineers

The people who understand clinical workflows, medical terminology, and patient context are clinicians — not ML engineers. Building effective healthcare AI requires tools that let domain experts participate directly in data labeling and model evaluation.

How Ertas Solves This

Ertas Data Suite runs entirely on-premise as a native desktop application. Protected health information never leaves your network. The five-module pipeline — Ingest, Clean, Label, Augment, Export — processes clinical data through deterministic, auditable transformations that satisfy even the most stringent compliance requirements.

Every action in Data Suite is recorded in an append-only audit log that captures who did what, when, and to which data. This log integrates with your existing compliance documentation and can be exported for regulatory review at any time.
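One common way to make an append-only log tamper-evident is to chain entries by hash, so that altering any earlier entry invalidates everything after it. The sketch below illustrates that general technique only; it is not Data Suite's implementation, and the class name, field names, and in-memory storage are all assumptions made for the example.

```python
import hashlib
import json
from datetime import datetime, timezone

GENESIS_HASH = "0" * 64  # placeholder "previous hash" for the first entry


def _digest(entry: dict) -> str:
    """Hash an entry's fields using a stable key order."""
    payload = json.dumps(entry, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def verify_chain(entries) -> bool:
    """Return True only if every entry matches its hash and its predecessor."""
    prev_hash = GENESIS_HASH
    for entry in entries:
        body = {k: v for k, v in entry.items() if k != "hash"}
        if entry["prev_hash"] != prev_hash or _digest(body) != entry["hash"]:
            return False
        prev_hash = entry["hash"]
    return True


class AuditLog:
    """Hypothetical in-memory log that only exposes an append operation."""

    def __init__(self):
        self._entries = []

    def append(self, user: str, action: str, target: str) -> dict:
        prev_hash = self._entries[-1]["hash"] if self._entries else GENESIS_HASH
        entry = {
            "timestamp": datetime.now(timezone.utc).isoformat(),  # when
            "user": user,        # who
            "action": action,    # did what
            "target": target,    # to which data
            "prev_hash": prev_hash,
        }
        entry["hash"] = _digest(entry)  # computed before "hash" is added
        self._entries.append(entry)
        return dict(entry)

    def entries(self) -> list:
        # Return copies so callers cannot mutate stored entries.
        return [dict(e) for e in self._entries]

    def verify(self) -> bool:
        return verify_chain(self._entries)
```

An exported copy of the log can be re-checked with `verify_chain` during a review, without access to the original workstation.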

Ertas Studio complements the on-premise data pipeline by providing visual fine-tuning for clinical AI models. Once Data Suite has prepared and de-identified a training dataset, Studio's cloud training infrastructure handles the GPU-intensive work. The resulting model exports as a GGUF file that runs on your own infrastructure — so inference, like data preparation, stays within your security perimeter.

Key Features for Healthcare Organizations

Data Suite

Air-Gapped Data Processing

Data Suite operates without any network connection. Install it on a secure workstation, process PHI locally, and export clean datasets without any data ever touching the internet. Perfect for environments with strict network segmentation policies.

Vault

Compliance-Ready Audit Trail

Every data transformation, label assignment, and export operation is logged with timestamps, user identifiers, and before/after snapshots. Export audit logs in formats compatible with common healthcare compliance frameworks.

Data Suite

Clinician-Friendly Labeling Interface

The Label module presents data in context with annotation tools designed for clinical workflows. Clinicians can tag entities, classify documents, and validate AI-suggested labels without learning developer tools.

Data Suite

De-Identification Pipeline

Built-in PII and PHI detection within the Clean module identifies and redacts patient identifiers, dates, and location information before data is exported for training — adding a layer of protection even for on-premise workflows.
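To make the idea of redaction concrete, here is a minimal pattern-based sketch. It is not the Clean module's detector: the three patterns, the placeholder tags, and the `redact` function are assumptions for illustration, and real PHI detection needs far broader coverage (names, addresses, free-text dates, and so on) than a few regular expressions can offer.

```python
import re

# Illustrative patterns only; the formats below are assumed, not exhaustive.
PATTERNS = [
    ("DATE", re.compile(r"\b\d{1,2}/\d{1,2}/\d{2,4}\b")),
    ("PHONE", re.compile(r"\b\d{3}[-.]\d{3}[-.]\d{4}\b")),
    ("MRN", re.compile(r"\bMRN[:#]?\s*\d{6,10}\b", re.IGNORECASE)),
]


def redact(text: str):
    """Replace each match with a bracketed tag and count replacements per tag."""
    counts = {}
    for label, pattern in PATTERNS:
        text, n = pattern.subn(f"[{label}]", text)
        if n:
            counts[label] = n
    return text, counts


note = "Pt seen 03/14/2024, MRN: 00123456. Call 555-867-5309."
redacted, counts = redact(note)
print(redacted)
```

Returning the per-tag counts alongside the text gives a reviewer something to record in an audit entry without storing the identifiers themselves.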

Why It Works

Data Suite's air-gapped architecture supports the technical safeguards in HIPAA's Security Rule, such as transmission security, because PHI is processed locally and never sent over a network.
The append-only audit trail provides the documentation required for HITRUST CSF certification and supports OIG audit readiness.
Healthcare organizations have used Data Suite to prepare clinical NLP training datasets from unstructured EHR notes without any PHI leaving the hospital network.
Clinician-in-the-loop labeling has been shown to improve clinical NLP model accuracy by 15-25% compared to labels generated by non-clinical annotators.
GGUF model deployment on hospital-owned servers ensures that patient data used during inference remains entirely within institutional control.

Example Workflow

A hospital's informatics team wants to build a model that extracts medication lists from unstructured clinical notes. A data engineer opens Ertas Data Suite on a secure workstation within the hospital network, ingests 10,000 clinical notes via the Ingest module, and runs the Clean module to redact patient identifiers, normalize formatting, and remove boilerplate headers.

A team of clinicians uses the Label module to annotate medication mentions, dosages, and frequencies in a representative sample of 500 notes. The Augment module generates additional training examples through controlled paraphrasing. The Export module produces a versioned JSONL dataset with complete audit metadata.
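For readers unfamiliar with JSONL, each line of the file is one standalone JSON object. The sketch below shows what a labeled medication record with audit metadata could look like; the schema (field names such as `entities`, `meta`, and `dataset_version`) is hypothetical and is not Ertas's documented export format.

```python
import io
import json


def make_record(note_id, text, annotations, annotator, dataset_version):
    """Build one training record; `annotations` is a list of (label, span) pairs."""
    entities = []
    for label, span in annotations:
        start = text.find(span)  # first occurrence only, for simplicity
        if start == -1:
            raise ValueError(f"span {span!r} not found in note {note_id}")
        entities.append(
            {"label": label, "start": start, "end": start + len(span), "text": span}
        )
    return {
        "note_id": note_id,
        "text": text,
        "entities": entities,
        # Hypothetical audit metadata carried with every record.
        "meta": {"annotator": annotator, "dataset_version": dataset_version},
    }


def write_jsonl(records, stream):
    """Write one JSON object per line."""
    for record in records:
        stream.write(json.dumps(record, ensure_ascii=False) + "\n")


def read_jsonl(stream):
    """Read records back, skipping blank lines."""
    return [json.loads(line) for line in stream if line.strip()]


record = make_record(
    "note-0001",
    "Start metformin 500 mg twice daily.",
    [("MEDICATION", "metformin"), ("DOSAGE", "500 mg"), ("FREQUENCY", "twice daily")],
    annotator="clinician-07",
    dataset_version="v1",
)
buffer = io.StringIO()
write_jsonl([record], buffer)
```

Storing character offsets next to the span text lets a later validation step confirm that every label still lines up with the note it came from.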

The informatics team uploads the de-identified training set to Ertas Studio, fine-tunes a 13B-parameter model, and exports it as a GGUF file. The model is deployed on the hospital's GPU server, runs entirely on-premise, and begins extracting medication data from new notes with clinician-validated accuracy.
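One simple way to put a number on "clinician-validated accuracy" is to compare the model's extracted medications against a clinician-labeled held-out set and report precision, recall, and F1. The sketch below assumes each extraction is represented as a `(note_id, medication)` pair; that representation, the function name, and the sample data are illustrative and not part of Ertas.

```python
def precision_recall_f1(predicted, gold):
    """Compare predicted extractions to clinician labels as exact-match sets."""
    predicted, gold = set(predicted), set(gold)
    true_positives = len(predicted & gold)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Hypothetical clinician labels and model output for two notes.
clinician_labels = {
    ("note-1", "metformin"),
    ("note-1", "lisinopril"),
    ("note-2", "warfarin"),
}
model_output = {
    ("note-1", "metformin"),
    ("note-2", "warfarin"),
    ("note-2", "aspirin"),
}
precision, recall, f1 = precision_recall_f1(model_output, clinician_labels)
```

Exact matching is strict: a correct drug with a misspelled name counts as both a miss and a false alarm, so teams often normalize medication names before scoring.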

Ship AI that runs on your users' devices.

Free plan with 30 credits/mo, no card required. Paid plans from $25/mo USD.
