Ertas for PII Redaction Pipelines

    Build on-premises PII redaction pipelines that handle emails, phone numbers, SSNs, addresses, and medical IDs, with a full audit trail and compliance logging. Designed for AI/ML teams preparing training data from sensitive enterprise documents.

    The Challenge

    Organizations handling client data for AI/ML projects must redact PII before any model training or RAG ingestion. Manual redaction is slow and error-prone. Regex-based scripts miss edge cases. Cloud redaction tools require data egress that regulated clients prohibit.

    The Solution

    Ertas Data Suite's PII Redactor node deterministically handles emails, phone numbers, SSNs, addresses, and medical IDs. It runs as one step in a visual pipeline: File Import → Parser → PII Redactor → Quality Scorer → Exporter. Every redaction is logged with a timestamp and operator ID, and all processing stays on-prem.

    Key Features

    Data Suite

    Configurable PII Entity Detection

    Select the entity types to detect and choose a redaction method: mask, replace, or remove. Configure both per pipeline to match each client's compliance requirements.
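
    To make the three redaction methods concrete, here is a minimal Python sketch. It is an illustration only and not the Ertas Data Suite API: the `PATTERNS` table, the `redact` function, and its parameters are hypothetical names, and two regular expressions stand in for a real detector.

```python
import re

# Hypothetical patterns for illustration only; a production detector needs
# far broader coverage than two regular expressions.
PATTERNS = {
    "EMAIL": re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"),
    "SSN": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
}


def redact(text: str, entity_types: list[str], method: str = "mask") -> str:
    """Apply one redaction method to every match of the selected entity types."""
    for entity in entity_types:
        pattern = PATTERNS[entity]
        if method == "mask":
            # Hide the characters but keep the original length.
            text = pattern.sub(lambda m: "*" * len(m.group()), text)
        elif method == "replace":
            # Substitute a typed placeholder such as [EMAIL].
            text = pattern.sub(f"[{entity}]", text)
        elif method == "remove":
            # Drop the matched text entirely.
            text = pattern.sub("", text)
        else:
            raise ValueError(f"unknown redaction method: {method}")
    return text


sample = "Contact jane@example.com, SSN 123-45-6789."
print(redact(sample, ["EMAIL", "SSN"], method="replace"))
# prints: Contact [EMAIL], SSN [SSN].
```

    Choosing entity types and method per call mirrors the per-pipeline configuration described above: one client might require `remove` for SSNs while another accepts `mask`.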

    Pipeline-Integrated Redaction

    PII redaction runs as a node in the visual pipeline, not as a standalone tool. Chain it with parsing, quality scoring, and export nodes for end-to-end workflows.
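
    The chaining idea can be sketched in a few lines of Python. This is a conceptual model, not how Ertas Data Suite is implemented: the node functions (`parser`, `pii_redactor`, `exporter`) and `run_pipeline` are hypothetical stand-ins, and the redactor handles only SSN-shaped strings.

```python
import re
from functools import reduce
from typing import Callable

Document = dict
Node = Callable[[Document], Document]


def parser(doc: Document) -> Document:
    # Stand-in for a parser node: normalise whitespace in the raw text.
    return {**doc, "text": " ".join(doc["raw"].split())}


def pii_redactor(doc: Document) -> Document:
    # Stand-in for the redactor node: only handles SSN-shaped strings.
    redacted = re.sub(r"\b\d{3}-\d{2}-\d{4}\b", "[SSN]", doc["text"])
    return {**doc, "text": redacted}


def exporter(doc: Document) -> Document:
    # Stand-in for an exporter node: keep only the fields meant for output.
    return {"id": doc["id"], "text": doc["text"]}


def run_pipeline(doc: Document, nodes: list[Node]) -> Document:
    # Feed each node's output into the next node, in order.
    return reduce(lambda current, node: node(current), nodes, doc)


result = run_pipeline(
    {"id": "doc-1", "raw": "Patient   SSN:\n123-45-6789"},
    [parser, pii_redactor, exporter],
)
print(result)
# prints: {'id': 'doc-1', 'text': 'Patient SSN: [SSN]'}
```

    Because every node has the same document-in, document-out shape, nodes can be reordered or inserted without changing the others.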

    Redaction Audit Trail

    Every detected and redacted entity is logged with its entity type, location, redaction method, timestamp, and operator. The log is exportable for compliance verification.
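
    As a rough picture of what one audit entry could contain, the sketch below models the five logged fields as a Python dataclass and serialises entries as JSON Lines. The field names, the character-offset representation of location, and the JSONL format are assumptions for illustration; the actual Ertas log schema and export format are not documented here.

```python
import json
from dataclasses import asdict, dataclass
from datetime import datetime, timezone


@dataclass
class RedactionEvent:
    # Hypothetical schema covering the fields named above.
    entity_type: str   # e.g. "SSN"
    start: int         # location: character offset where the entity begins
    end: int           # location: character offset where the entity ends
    method: str        # "mask", "replace", or "remove"
    timestamp: str     # ISO 8601, UTC
    operator_id: str


def log_event(entity_type: str, start: int, end: int,
              method: str, operator_id: str) -> RedactionEvent:
    # Stamp the event with the current UTC time when it is recorded.
    now = datetime.now(timezone.utc).isoformat()
    return RedactionEvent(entity_type, start, end, method, now, operator_id)


def export_jsonl(events: list[RedactionEvent]) -> str:
    # One JSON object per line, which is easy to diff and to stream.
    return "\n".join(json.dumps(asdict(event)) for event in events)


events = [
    log_event("SSN", 14, 25, "mask", "op-042"),
    log_event("EMAIL", 40, 56, "replace", "op-042"),
]
print(export_jsonl(events))
```

    The log records where an entity was and how it was handled, but not the entity's value, so the audit trail itself does not become a second copy of the PII.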

    Quality Verification

    A downstream Quality Scorer node verifies redaction completeness. Documents with potentially missed PII are flagged for manual review before export.
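
    One simple way to think about a completeness check is a second, independent scan of the redacted output. The Python sketch below is an assumption about how such a check might work, not a description of the Quality Scorer's actual logic; `RESIDUAL_PATTERNS`, `find_residual_pii`, and `needs_review` are hypothetical names.

```python
import re

# Hypothetical residual-PII patterns; a real check would cover more entity
# types and would not rely on regular expressions alone.
RESIDUAL_PATTERNS = {
    "EMAIL": re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"),
    "SSN": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "PHONE": re.compile(r"\b\d{3}[-.\s]\d{3}[-.\s]\d{4}\b"),
}


def find_residual_pii(text: str) -> list[str]:
    """Return the entity types that still appear in already-redacted text."""
    return [name for name, pattern in RESIDUAL_PATTERNS.items()
            if pattern.search(text)]


def needs_review(text: str) -> bool:
    # Any surviving match routes the document to manual review.
    return bool(find_residual_pii(text))


clean = "Contact [EMAIL], SSN [SSN]."
leaky = "Contact [EMAIL] or call 555-867-5309."

print(find_residual_pii(clean))  # prints: []
print(find_residual_pii(leaky))  # prints: ['PHONE']
```

    Flagging rather than silently re-redacting keeps a human in the loop for exactly the documents where the first pass may have been incomplete.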

    Example Workflow

    A service provider receives client healthcare documents for clinical NLP model training. They build a pipeline in Ertas Data Suite: File Import → PDF Parser → PII Redactor (configured for medical IDs, patient names, addresses) → Quality Scorer → JSONL Exporter. The pipeline processes 10,000 documents on the client's on-prem workstation. The audit trail, which records every redaction decision, is exported for the client's compliance team. The resulting clean, de-identified JSONL is ready for model training.

    Compliance & Security

    PII Redactor supports GDPR data minimization, the HIPAA Safe Harbor de-identification method, and EU AI Act Article 10 data governance documentation. All processing runs on-prem with no data egress.
