Ertas for PII Redaction Pipelines
Build on-premises PII redaction pipelines that handle email addresses, phone numbers, SSNs, postal addresses, and medical IDs, with a full audit trail and compliance logging. Designed for AI/ML teams preparing training data from sensitive enterprise documents.
The Challenge
Organizations handling client data for AI/ML projects must redact PII before any model training or RAG ingestion. Manual redaction is slow and error-prone. Regex-based scripts miss edge cases. Cloud redaction tools require data egress that regulated clients prohibit.
The Solution
Ertas Data Suite's PII Redactor node deterministically redacts email addresses, phone numbers, SSNs, postal addresses, and medical IDs. It runs as one node in a visual pipeline: File Import → Parser → PII Redactor → Quality Scorer → Exporter. Every redaction is logged with a timestamp and operator ID, and all processing stays on-prem.
Key Features
Configurable PII Entity Detection
Select the entity types to detect and choose a redaction method: mask, replace, or remove. Configure both per pipeline to match each client's compliance requirements.
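The three redaction methods can be illustrated with a minimal Python sketch. This is not Ertas Data Suite's implementation or API: the `PATTERNS` table and the `redact` function are hypothetical names, and the two regular expressions are deliberately simple stand-ins for a real detector.

```python
import re

# Hypothetical, simplified patterns for illustration only.
# A production detector needs far broader coverage than this.
PATTERNS = {
    "EMAIL": re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"),
    "SSN": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
}


def redact(text, entity_types, method="mask"):
    """Apply one redaction method to every selected entity type."""
    for name in entity_types:
        pattern = PATTERNS[name]
        if method == "mask":
            # Keep the length, hide the characters.
            text = pattern.sub(lambda m: "*" * len(m.group()), text)
        elif method == "replace":
            # Swap the value for a typed placeholder.
            text = pattern.sub(f"[{name}]", text)
        elif method == "remove":
            # Drop the value entirely.
            text = pattern.sub("", text)
        else:
            raise ValueError(f"unknown redaction method: {method}")
    return text


sample = "Contact jane@example.com, SSN 123-45-6789."
print(redact(sample, ["EMAIL", "SSN"], method="replace"))
```

Masking preserves document length and offsets, replacement keeps the entity type visible to downstream models, and removal leaves the least trace. Which one fits depends on the client's compliance requirements.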
Pipeline-Integrated Redaction
PII redaction runs as a node in the visual pipeline, not as a standalone tool. Chain it with parsing, quality scoring, and export nodes for end-to-end workflows.
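Conceptually, chaining nodes resembles composing functions that each take a record and return an updated one. The Python sketch below is only an analogy under that assumption. The stage functions (`parse`, `redact`, `score`), `run_pipeline`, and `to_jsonl` are hypothetical and do not reflect how Ertas Data Suite is built or scripted.

```python
import json
import re
from functools import reduce

# Simplified SSN pattern, used here only to give the stages something to do.
SSN = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")


def parse(record):
    # Stand-in for a parser node: collapse runs of whitespace.
    return {**record, "text": " ".join(record["text"].split())}


def redact(record):
    # Stand-in for a redaction node: replace SSN-shaped strings.
    return {**record, "text": SSN.sub("[SSN]", record["text"])}


def score(record):
    # Stand-in for a scoring node: 1.0 if no SSN-shaped string remains.
    return {**record, "quality": 0.0 if SSN.search(record["text"]) else 1.0}


def run_pipeline(records, stages):
    """Pass each record through every stage, in order."""
    for record in records:
        yield reduce(lambda acc, stage: stage(acc), stages, record)


def to_jsonl(records):
    """Serialize records as one JSON object per line."""
    return "\n".join(json.dumps(r) for r in records)


docs = [{"id": "d1", "text": "SSN:  123-45-6789\n"}]
print(to_jsonl(run_pipeline(docs, [parse, redact, score])))
```

Because every stage has the same shape, stages can be reordered or swapped without touching the others, which is the property a node-based pipeline relies on.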
Redaction Audit Trail
Every detected and redacted entity is logged with its entity type, location, redaction method, timestamp, and operator. The log is exportable for compliance verification.
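An audit record with the fields listed above might look like the following Python sketch. The `RedactionEvent` class, its field names, and the JSON Lines export are assumptions made for illustration, not the product's actual log schema. The sketch records character offsets rather than the matched value, so the log itself carries no PII.

```python
import json
import re
from dataclasses import asdict, dataclass
from datetime import datetime, timezone

# Simplified SSN pattern for illustration only.
SSN = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")


@dataclass
class RedactionEvent:
    # Hypothetical schema mirroring the fields named in the text.
    document_id: str
    entity_type: str
    start: int  # character offset in the original text
    end: int
    method: str
    timestamp: str
    operator_id: str


def redact_with_audit(document_id, text, operator_id):
    """Replace SSNs and return the redacted text plus one event per match."""
    events = []

    def _replace(match):
        events.append(RedactionEvent(
            document_id=document_id,
            entity_type="SSN",
            start=match.start(),
            end=match.end(),
            method="replace",
            timestamp=datetime.now(timezone.utc).isoformat(),
            operator_id=operator_id,
        ))
        return "[SSN]"

    return SSN.sub(_replace, text), events


def export_audit(events):
    """Serialize events as JSON Lines for a compliance reviewer."""
    return "\n".join(json.dumps(asdict(e)) for e in events)


redacted, events = redact_with_audit("doc-1", "SSN 123-45-6789 on file", "op-42")
print(redacted)
print(export_audit(events))
```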
Quality Verification
A downstream Quality Scorer node verifies redaction completeness. Documents that may still contain PII are flagged for manual review before export.
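One way to picture a completeness check is a second pass that scans redacted text with deliberately loose patterns and flags anything suspicious. The Python sketch below assumes that approach. The source does not describe how the Quality Scorer node actually works, and `RESIDUAL_CHECKS` and `flag_for_review` are hypothetical names.

```python
import re

# Loose, hypothetical patterns: tuned to over-flag rather than miss.
RESIDUAL_CHECKS = {
    "email_like": re.compile(r"\S+@\S+\.\S+"),
    "nine_digits": re.compile(r"\b\d{3}[-\s]?\d{2}[-\s]?\d{4}\b"),
}


def flag_for_review(documents):
    """Return {doc_id: [check names]} for documents that trip any check."""
    flagged = {}
    for doc_id, text in documents.items():
        hits = [name for name, pattern in RESIDUAL_CHECKS.items()
                if pattern.search(text)]
        if hits:
            flagged[doc_id] = hits
    return flagged


redacted_docs = {
    "a": "Patient [NAME], SSN [SSN]",
    "b": "Reach me at bob (at) mail, or 123 45 6789",
}
print(flag_for_review(redacted_docs))
```

Flagged documents would go to a human reviewer instead of being exported automatically. A false positive costs review time, while a false negative leaks PII, so loose patterns are the safer default in this sketch.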
Example Workflow
A service provider receives client healthcare documents for clinical NLP model training. They build a pipeline in Ertas Data Suite: File Import → PDF Parser → PII Redactor (configured for medical IDs, patient names, addresses) → Quality Scorer → JSONL Exporter. The pipeline processes 10,000 documents on the client's on-prem workstation. The audit trail, which records every redaction decision, is exported for the client's compliance team. The resulting clean, de-identified JSONL is ready for clinical NLP model training.
Compliance & Security
The PII Redactor supports GDPR data minimization, the HIPAA Safe Harbor de-identification method, and EU AI Act Article 10 data governance documentation. All processing runs on-prem with no data egress.