Air-Gapped Data Preparation for Security AI Models

    Ertas Data Suite gives cybersecurity teams an on-premise, air-gapped pipeline to prepare threat intelligence, log data, and incident reports for AI model training — without exposing sensitive security data to external services.

    The Challenges You Face

    Security Data Is the Most Sensitive Data

    Threat intelligence, vulnerability reports, incident response playbooks, and network logs contain information that adversaries would love to access. Sending this data to any external service — even a reputable AI provider — expands the attack surface and violates the principle of least exposure.

    Threat Data Arrives in Heterogeneous Formats

    STIX/TAXII feeds, Syslog entries, PCAP metadata, YARA rules, MITRE ATT&CK mappings, and free-text incident reports all need to be normalized before they can become useful training data. Each format has its own parsing challenges and domain-specific structures.
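
    To make the normalization problem concrete, here is a minimal sketch of mapping two of these formats onto one record shape. The `ThreatRecord` fields and both helper functions are hypothetical illustrations, not the Ertas Ingest schema. The sketch assumes STIX 2.x indicator objects (using their `created`, `name`, and `pattern` properties) and RFC 3164-style syslog lines.

```python
import re
from dataclasses import dataclass

@dataclass
class ThreatRecord:
    """Hypothetical common shape for normalized entries (an assumption)."""
    source_format: str
    timestamp: str
    subject: str   # STIX pattern, or the reporting host for syslog
    text: str      # human-readable name or message

# RFC 3164-style line: "<PRI>Mmm dd hh:mm:ss host message"
SYSLOG_RE = re.compile(r"^<(\d+)>(\w{3}\s+\d+\s[\d:]{8})\s(\S+)\s(.*)$")

def from_stix_indicator(obj: dict) -> ThreatRecord:
    """Map a STIX 2.x indicator object (already parsed from JSON)."""
    return ThreatRecord(
        source_format="stix",
        timestamp=obj.get("created", ""),
        subject=obj.get("pattern", ""),
        text=obj.get("name", ""),
    )

def from_syslog_line(line: str) -> ThreatRecord:
    """Map one syslog line; raises ValueError if it does not match."""
    m = SYSLOG_RE.match(line)
    if m is None:
        raise ValueError(f"unparseable syslog line: {line!r}")
    _priority, timestamp, host, message = m.groups()
    return ThreatRecord("syslog", timestamp, host, message)
```

    Real feeds need far more handling (time zones, RFC 5424 syslog, STIX relationships), which is why each format tends to get its own parser.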

    Labeling Requires Security Expertise

    Classifying threat types, severity levels, and attack vectors requires analysts with deep security knowledge. Outsourcing labeling to generic annotation services is both a security risk and a quality risk — non-experts consistently mislabel nuanced security data.

    Model Training Data Is a Strategic Asset

    The datasets you build for security AI models represent years of accumulated threat intelligence and institutional knowledge. Losing control of this data — through a provider breach, an API vulnerability, or a terms-of-service change — could compromise your competitive advantage and your clients' security.

    How Ertas Solves This

    Ertas Data Suite is a native desktop application that runs entirely air-gapped — no network connection, no telemetry, no external dependencies. Install it on a secure workstation inside your SOC, SCIF, or isolated analysis environment and process the most sensitive security data with zero exposure risk.

    The five-module pipeline handles the full data preparation workflow. Ingest normalizes heterogeneous threat data sources into a consistent format. Clean removes noise, deduplicates entries, and standardizes field names. Label provides a purpose-built interface where security analysts tag threats using their domain expertise. Augment generates controlled variations to balance underrepresented threat categories. Export produces versioned, audit-trailed datasets ready for model training.

    Because every transformation is logged in an append-only audit trail, you maintain the chain-of-custody documentation that security frameworks require — and you can trace any model prediction back to the exact training data and preparation steps that produced it.

    Key Features for Cybersecurity Companies

    Data Suite

    True Air-Gap Operation

    Data Suite requires no network connectivity whatsoever. It runs as a standalone native application with all processing happening locally. No DNS lookups, no update checks, no telemetry. The application is fully functional in environments with no network interface at all.

    Multi-Format Security Data Ingestion

    The Ingest module handles STIX bundles, CSV/JSON log exports, PDF incident reports, plain-text IOC lists, and structured threat feeds. Custom format parsers can be configured for organization-specific log schemas.

    Analyst-Driven Labeling

    Security analysts label data using frameworks they already know — MITRE ATT&CK techniques, kill chain phases, severity classifications, and custom taxonomies. The interface surfaces context from related entries to improve labeling consistency and speed.

    Vault

    Immutable Audit Trail

    Every operation is logged to an append-only ledger with cryptographic integrity verification. The audit trail supports NIST CSF, SOC 2 Type II, and FedRAMP documentation requirements for AI systems used in security operations.
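
    One common way to get an append-only ledger with integrity verification is a hash chain, where each entry commits to the hash of the one before it. The sketch below illustrates that general technique only. How Ertas implements its ledger is not described here, and the entry fields and function names are assumptions.

```python
import hashlib
import json

GENESIS = "0" * 64  # placeholder "previous hash" for the first entry

def _digest(body: dict) -> str:
    # Canonical JSON (sorted keys) so the same body always hashes the same.
    return hashlib.sha256(json.dumps(body, sort_keys=True).encode("utf-8")).hexdigest()

def append_entry(ledger: list, operation: str, detail: dict) -> dict:
    """Append an entry whose hash covers its content and the previous hash."""
    prev_hash = ledger[-1]["hash"] if ledger else GENESIS
    body = {"seq": len(ledger), "operation": operation,
            "detail": detail, "prev_hash": prev_hash}
    entry = {**body, "hash": _digest(body)}
    ledger.append(entry)
    return entry

def verify(ledger: list) -> bool:
    """Recompute every hash and check each link back to the genesis value."""
    prev_hash = GENESIS
    for i, entry in enumerate(ledger):
        body = {k: v for k, v in entry.items() if k != "hash"}
        if (entry["seq"] != i or entry["prev_hash"] != prev_hash
                or entry["hash"] != _digest(body)):
            return False
        prev_hash = entry["hash"]
    return True
```

    Editing or removing any earlier entry changes its hash and breaks every later link, so tampering is detectable as long as the latest hash is recorded somewhere trusted.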

    Why It Works

    • Data Suite's air-gapped architecture satisfies the data handling requirements of CISA's Binding Operational Directives and DoD's CMMC Level 3 for AI training data preparation.
    • Security teams have prepared training datasets from classified incident reports without any data leaving the secure facility — enabling AI-assisted threat classification that was previously impossible due to data sensitivity restrictions.
    • The analyst-driven labeling interface reduces the time to prepare a labeled threat intelligence dataset from weeks of manual spreadsheet work to days of structured annotation.
    • Augmentation capabilities help address the class imbalance problem inherent in security data — rare but critical threat types get sufficient representation in training sets without artificial inflation.
    • The immutable audit trail provides the evidence needed to demonstrate that AI models used in security operations were trained on properly handled, properly labeled data.
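
    As an illustration of the class-imbalance point above, the following sketch computes how many augmented samples each category would need to approach the size of the largest class, with a cap so that a rare class is never inflated beyond a fixed multiple of its real samples. The function name and the `max_multiplier` parameter are hypothetical and do not describe the Augment module's actual behavior.

```python
from collections import Counter

def augmentation_plan(labels: list, max_multiplier: float = 3.0) -> dict:
    """Return {label: number of synthetic samples to add}.

    Each class is raised toward the largest class size, but never beyond
    max_multiplier times its original count.
    """
    counts = Counter(labels)
    if not counts:
        return {}
    target = max(counts.values())
    plan = {}
    for label, n in counts.items():
        capped_size = min(target, int(n * max_multiplier))
        plan[label] = capped_size - n
    return plan
```

    With 100 credential-harvesting, 40 BEC, and 10 malware-delivery samples and the default cap, this plan adds 0, 60, and 20 samples respectively, leaving the rarest class at 30 rather than padding it to 100.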

    Example Workflow

    A cybersecurity firm wants to train a model that classifies phishing emails by attack vector and sophistication level. An analyst opens Ertas Data Suite on an air-gapped workstation in the analysis lab and uses the Ingest module to load 20,000 confirmed phishing samples from the firm's internal repository.

    The Clean module normalizes email headers, extracts URLs and attachment metadata, and deduplicates near-identical variants. Senior threat analysts use the Label module to classify each sample by technique (credential harvesting, malware delivery, BEC, etc.) and sophistication tier. The Augment module generates controlled variations of underrepresented categories to ensure balanced training data.
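
    Deduplicating near-identical variants can be approximated with word shingles and Jaccard similarity. The sketch below shows that general technique under simple assumptions (plain-text bodies, a 0.8 similarity threshold, three-word shingles). It is not the Clean module's algorithm, and all names are illustrative.

```python
import re

def shingles(text: str, k: int = 3) -> set:
    """Set of k-word windows over the lowercased text."""
    words = re.findall(r"\w+", text.lower())
    if len(words) < k:
        return {tuple(words)} if words else set()
    return {tuple(words[i:i + k]) for i in range(len(words) - k + 1)}

def jaccard(a: set, b: set) -> float:
    """Size of the intersection divided by size of the union."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)

def dedupe(samples: list, threshold: float = 0.8) -> list:
    """Keep the first sample of each near-duplicate group, in input order."""
    kept, kept_shingles = [], []
    for text in samples:
        s = shingles(text)
        if any(jaccard(s, other) >= threshold for other in kept_shingles):
            continue  # too similar to something already kept
        kept.append(text)
        kept_shingles.append(s)
    return kept
```

    This pairwise comparison is quadratic in the number of kept samples. For a corpus of 20,000 emails, an approximate method such as MinHash with locality-sensitive hashing would typically replace the inner loop.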

    The Export module produces a versioned JSONL dataset with a complete audit trail. The dataset is transferred via approved media to the firm's training infrastructure, where it is used to train a classification model that automatically triages incoming suspicious emails and routes the most sophisticated threats to senior analysts first.
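
    For a transfer over removable media, a versioned JSONL export is commonly paired with a checksum manifest so the receiving side can confirm that the file arrived intact. The sketch below is a hypothetical illustration. The file naming, manifest fields, and `export_jsonl` function are assumptions rather than the Export module's actual output format.

```python
import hashlib
import json
from pathlib import Path

def export_jsonl(records: list, out_dir: Path, version: str) -> dict:
    """Write one JSON object per line, plus a manifest with a SHA-256 digest."""
    out_dir.mkdir(parents=True, exist_ok=True)
    data_path = out_dir / f"dataset-{version}.jsonl"
    # newline="\n" keeps the bytes identical across operating systems.
    with data_path.open("w", encoding="utf-8", newline="\n") as fh:
        for rec in records:
            fh.write(json.dumps(rec, sort_keys=True, ensure_ascii=False) + "\n")
    manifest = {
        "version": version,
        "file": data_path.name,
        "records": len(records),
        "sha256": hashlib.sha256(data_path.read_bytes()).hexdigest(),
    }
    manifest_path = out_dir / f"dataset-{version}.manifest.json"
    manifest_path.write_text(json.dumps(manifest, indent=2), encoding="utf-8")
    return manifest
```

    On the training side, recomputing the SHA-256 of the copied file and comparing it with the manifest value is enough to detect corruption or substitution in transit.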
