
    Best RAG Pipeline for Financial Services: Air-Gapped Retrieval for PII-Heavy Data

    Financial institutions handle PII-dense documents that cannot touch cloud infrastructure. Here is how to build an air-gapped RAG pipeline that meets SOC 2, GDPR, and internal audit requirements while keeping retrieval fast.

    Ertas Team

    Financial statements, customer PII, and threat intelligence data must stay in air-gapped environments. That is not a preference — it is a regulatory requirement. Yet most RAG pipeline vendors assume internet connectivity for embeddings, vector database hosting, and model inference. That assumption disqualifies them from the conversation before a single document is ingested.

    This article covers how to build a RAG pipeline for financial services that operates entirely on-premise, handles PII-heavy documents without exposure risk, and satisfies the compliance frameworks that govern the industry.

    Why Standard RAG Pipelines Fail in Financial Services

    A typical RAG pipeline sends documents to a cloud embedding API, stores vectors in a hosted database, and calls a cloud LLM at inference time. Each of those three steps creates a compliance violation for most financial institutions.

    Embedding API calls transmit raw document text. When a financial analyst queries a RAG system about a client's portfolio, the retrieval step sends document chunks — containing account numbers, SSNs, transaction histories — to an external API. Under most regulatory frameworks, that transmission is a reportable exposure of regulated data, regardless of whether the API provider claims SOC 2 compliance on their end.

    Hosted vector databases store document representations externally. Even though embeddings are not human-readable, they can be inverted to reconstruct approximate document content. Storing them on third-party infrastructure means PII has left your perimeter.

    Cloud LLM inference exposes query context. The retrieved chunks, combined with the user query, are sent to a cloud model. The full context window — including PII from retrieved documents — is now on someone else's servers.

    An air-gapped RAG pipeline eliminates all three failure points. Every component runs within your network perimeter. No data leaves.

    Compliance Requirements That Shape the Architecture

    Financial services RAG deployments must satisfy overlapping regulatory frameworks. The architecture is not optional — it is dictated by the following requirements.

    SOC 2 Type II

    SOC 2 Type II audits evaluate controls over a minimum six-month period. For a RAG pipeline, this means:

    • Access controls on who can query which document collections
    • Audit logging of every retrieval and inference event, with user identity, timestamp, documents retrieved, and query text
    • Change management for model updates, embedding model swaps, and index rebuilds
    • Encryption at rest for the vector store and document store
    • Encryption in transit for all internal API calls between pipeline components

    GDPR (Articles 17, 20, 25, 35)

    GDPR applies to any financial institution handling EU citizen data, regardless of where the institution is headquartered.

    • Right to erasure (Art. 17): You must be able to delete a specific individual's data from the vector store and re-index without that data. Cloud-hosted embeddings make this nearly impossible to verify.
    • Data portability (Art. 20): The RAG system must be able to export all data associated with a data subject in a portable format.
    • Data protection by design (Art. 25): PII must be identified and handled with appropriate safeguards at every stage — ingestion, chunking, embedding, storage, retrieval, and generation.
    • DPIA (Art. 35): A data protection impact assessment is required before deploying AI systems that process PII at scale.

    MiFID II Record-Keeping

    MiFID II requires financial firms to retain records of all communications and decisions related to client transactions. If a RAG-powered system contributes to investment research, risk assessment, or client communication, every query and every generated response must be retained for a minimum of five years — seven years in some jurisdictions.

    This means the RAG pipeline needs an immutable audit log with the following fields per event: timestamp, user identity, query text, retrieved document IDs with relevance scores, generated response, and model version.
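A minimal sketch of one such event record. The class and field names here are illustrative, not a mandated schema — the point is that every field listed above is captured once, immutably, per event:

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone

@dataclass(frozen=True)  # frozen: records cannot be mutated after creation
class RagAuditEvent:
    """One immutable record per retrieval-and-generation event."""
    user_id: str
    query_text: str
    retrieved: list          # (document_id, relevance_score) pairs
    response_text: str
    model_version: str
    timestamp: str = field(
        default_factory=lambda: datetime.now(timezone.utc).isoformat()
    )

event = RagAuditEvent(
    user_id="analyst-042",
    query_text="Summarize Q3 exposure for client X",
    retrieved=[("doc-981", 0.87), ("doc-112", 0.81)],
    response_text="...",
    model_version="llama-3.1-8b-instruct-q5",
)
```

Serialized records like this would then be appended to the retention store for the five-to-seven-year window.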

    The Air-Gapped RAG Architecture

    An air-gapped RAG pipeline for financial data has five stages, all running within the network perimeter.

    Stage 1: Document Ingestion and PII Detection

    Raw documents enter the pipeline — financial statements, KYC forms, transaction records, compliance reports. Before any processing, a PII detection pass identifies and tags sensitive fields: account numbers, SSNs, tax IDs, names, addresses, dates of birth.

    This is where Ertas Data Suite's PII Redactor operates. Running as a desktop application with no internet requirement, it scans incoming documents and tags every financial identifier. The tagged PII metadata travels with the document through the pipeline, enabling field-level access controls downstream.
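To illustrate the tagging step only — this is not Ertas's detection logic, which covers far more identifier types and jurisdiction-specific formats — a simple regex-based pass produces tags that can travel with the document:

```python
import re

# Illustrative patterns only; a production detector covers many more
# identifier types and validates formats (checksums, issuer prefixes, etc.).
PII_PATTERNS = {
    "ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "account_number": re.compile(r"\b\d{10,12}\b"),
}

def tag_pii(text: str) -> list:
    """Return (pii_type, start, end) tags for every match in the text."""
    tags = []
    for pii_type, pattern in PII_PATTERNS.items():
        for m in pattern.finditer(text):
            tags.append((pii_type, m.start(), m.end()))
    return tags

doc = "Client SSN 123-45-6789, account 0012345678."
tags = tag_pii(doc)
```

The offsets let downstream stages redact, tokenize, or gate specific spans without re-scanning the document.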

    Stage 2: Chunking and Preprocessing

    Tagged documents are split into retrieval-friendly chunks. Financial documents require domain-aware chunking:

    • Table-aware splitting preserves financial tables as atomic units rather than splitting rows across chunks
    • Section-boundary detection keeps regulatory filing sections (risk factors, management discussion, financial statements) intact
    • Metadata propagation ensures every chunk inherits the PII tags from its source document
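A minimal chunker along these lines. Presence of a "|" character stands in for real table detection (production splitters inspect document layout, not characters), and every chunk inherits the source document's PII tags:

```python
def chunk_document(text: str, pii_tags: list, max_chars: int = 800) -> list:
    """Split on blank lines; keep table blocks atomic; propagate PII tags."""
    # Group consecutive non-blank lines into blocks.
    blocks, current = [], []
    for line in text.splitlines():
        if line.strip() == "":
            if current:
                blocks.append("\n".join(current))
                current = []
        else:
            current.append(line)
    if current:
        blocks.append("\n".join(current))

    # Pack blocks into chunks; a table block always becomes its own chunk.
    chunks, buf = [], ""
    for block in blocks:
        is_table = all("|" in ln for ln in block.splitlines())
        if is_table or len(buf) + len(block) > max_chars:
            if buf:
                chunks.append({"text": buf, "pii_tags": pii_tags})
                buf = ""
            chunks.append({"text": block, "pii_tags": pii_tags})
        else:
            buf = (buf + "\n\n" + block) if buf else block
    if buf:
        chunks.append({"text": buf, "pii_tags": pii_tags})
    return chunks
```

Because tags ride along in each chunk's metadata, retrieval-time access controls can filter on them without touching the original document.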

    Stage 3: Local Embedding Generation

    An open-source embedding model runs on-premise. No API calls. Models in the 300M-500M parameter range (such as E5-large or BGE-large) produce high-quality embeddings on modest hardware — a single GPU or even CPU-only inference for smaller document collections.

    Embedding generation is a batch process. A collection of 100,000 document chunks can be embedded in under two hours on a single NVIDIA T4.
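The batch loop itself is simple. In the sketch below the model call is a pluggable function standing in for a locally loaded encoder's encode() call (an on-premise E5 or BGE model, for example) — nothing here touches the network either way:

```python
from typing import Callable, List

def embed_in_batches(
    texts: List[str],
    embed_fn: Callable[[List[str]], list],
    batch_size: int = 64,
) -> list:
    """Run the local embedding model over chunks in fixed-size batches."""
    vectors = []
    for i in range(0, len(texts), batch_size):
        vectors.extend(embed_fn(texts[i:i + batch_size]))
    return vectors

# Toy stand-in model: "embed" each chunk as a 1-dim vector of its length.
demo_vectors = embed_in_batches(
    ["chunk one", "chunk two"],
    lambda batch: [[float(len(t))] for t in batch],
)
```

Batch size is the main tuning knob: larger batches improve GPU utilization during re-indexing, at the cost of memory.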

    Stage 4: Local Vector Storage and Retrieval

    The vector store runs on-premise. Open-source options like Qdrant, Milvus, or Weaviate deploy as self-hosted services within your network. No data leaves.

    Retrieval queries run locally. When a user queries the system, the query is embedded using the same local model, similarity search runs against the local vector store, and the top-k chunks are returned — all within the air-gapped perimeter.
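At its core, that similarity search is a cosine comparison. A brute-force in-memory version is sketched below; self-hosted stores like Qdrant or Milvus implement the same idea with approximate-nearest-neighbor indexing for scale:

```python
import numpy as np

def top_k(query_vec: np.ndarray, index_vecs: np.ndarray, k: int = 5) -> list:
    """Return the k most cosine-similar rows of index_vecs as (id, score)."""
    q = query_vec / np.linalg.norm(query_vec)
    m = index_vecs / np.linalg.norm(index_vecs, axis=1, keepdims=True)
    scores = m @ q                       # cosine similarity per row
    order = np.argsort(scores)[::-1][:k] # best first
    return [(int(i), float(scores[i])) for i in order]
```

The returned IDs map back to chunks (and their PII tags) in the document store, all inside the perimeter.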

    Stage 5: Local Inference with Audit Logging

    A locally deployed LLM generates responses using retrieved context. The model, the query, and the retrieved chunks never leave your infrastructure. Every inference event is logged to the immutable audit store with full provenance: which documents were retrieved, which user initiated the query, and what response was generated.
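One simple way to make that log tamper-evident is to hash-chain entries, so any after-the-fact edit breaks every subsequent hash. The sketch below is an illustrative stand-in for a proper WORM (write-once-read-many) store, not a complete solution:

```python
import hashlib
import json

class AuditLog:
    """Append-only log; each entry carries the hash of the previous one."""

    def __init__(self):
        self.entries = []

    def append(self, event: dict) -> str:
        prev = self.entries[-1]["hash"] if self.entries else "genesis"
        payload = json.dumps(event, sort_keys=True)
        h = hashlib.sha256((prev + payload).encode()).hexdigest()
        self.entries.append({"event": event, "prev": prev, "hash": h})
        return h

    def verify(self) -> bool:
        """Recompute the chain; any edited entry breaks verification."""
        prev = "genesis"
        for e in self.entries:
            payload = json.dumps(e["event"], sort_keys=True)
            expected = hashlib.sha256((prev + payload).encode()).hexdigest()
            if e["prev"] != prev or e["hash"] != expected:
                return False
            prev = e["hash"]
        return True
```

An auditor can re-run the verification independently, which is exactly the kind of proof MiFID II record-keeping reviews look for.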

    Comparison: Cloud RAG vs. Air-Gapped RAG for Financial Services

    • PII exposure risk. Cloud-hosted: high — document text is sent to external APIs. Air-gapped (Ertas): none — all processing is on-premise.
    • SOC 2 Type II audit. Cloud-hosted: requires vendor SOC 2 reports and a shared responsibility model. Air-gapped: fully within your audit perimeter.
    • GDPR right to erasure. Cloud-hosted: difficult to verify deletion across third-party systems. Air-gapped: full control — delete and re-index locally.
    • MiFID II record-keeping. Cloud-hosted: audit logs split across vendor and internal systems. Air-gapped: single immutable log store on-premise.
    • Internet dependency. Cloud-hosted: required for embeddings, vector DB, and inference. Air-gapped: none — fully air-gapped operation.
    • PII redaction. Cloud-hosted: manual or third-party service (data leaves the perimeter). Air-gapped: Ertas PII Redactor — local, no internet.
    • Embedding model control. Cloud-hosted: vendor-selected, may change without notice. Air-gapped: you choose and version-control the model.
    • Latency. Cloud-hosted: variable — depends on API response times. Air-gapped: predictable — local network only.
    • Cost model. Cloud-hosted: per-token and per-query fees that scale with usage. Air-gapped: fixed infrastructure cost, no per-query fees.
    • Vendor lock-in. Cloud-hosted: high — proprietary embeddings and vector formats. Air-gapped: none — open-source components throughout.

    PII Handling: The Make-or-Break Requirement

    The single biggest differentiator for financial services RAG is PII handling. Most RAG pipelines treat PII as someone else's problem. In financial services, PII is the core data.

    A best-in-class RAG pipeline for sensitive documents must handle PII at three levels:

    Pre-embedding redaction. Certain PII fields (SSNs, full account numbers) should be redacted or tokenized before embedding. The embeddings should encode the semantic content of the document without encoding recoverable PII. Ertas PII Redactor handles this automatically for financial identifier types.

    Field-level access controls. Different users should see different levels of PII in retrieved results. A compliance officer reviewing AML alerts needs full account details. A research analyst querying market commentary does not. The RAG pipeline must enforce these controls at retrieval time, not just at the UI layer.

    Deletion and re-indexing. When a customer exercises their right to erasure, the pipeline must delete all chunks derived from that customer's documents, remove corresponding vectors from the store, and verify that no residual data remains. This is straightforward with a local vector store. It is nearly impossible to verify with a cloud-hosted one.
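A delete-and-verify pass over an in-memory stand-in for the vector store illustrates the shape of the operation (real self-hosted stores expose equivalent delete-by-filter APIs; the dict-based store here is purely illustrative):

```python
def erase_data_subject(vector_store: dict, chunk_meta: dict,
                       subject_id: str) -> int:
    """Delete every chunk derived from the subject's documents, then verify.

    vector_store maps chunk_id -> embedding vector;
    chunk_meta maps chunk_id -> {"subject_id": ...} metadata.
    """
    to_delete = [cid for cid, m in chunk_meta.items()
                 if m.get("subject_id") == subject_id]
    for cid in to_delete:
        vector_store.pop(cid, None)
        chunk_meta.pop(cid, None)

    # Verification pass: no residual chunks may remain for the subject.
    residual = [cid for cid, m in chunk_meta.items()
                if m.get("subject_id") == subject_id]
    assert not residual, "erasure incomplete"
    return len(to_delete)
```

The verification pass is the part a cloud-hosted store cannot give you: locally, you can enumerate what remains and attach the result to the erasure record.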

    Hardware Requirements

    An air-gapped RAG pipeline for a mid-size financial institution (processing 50,000 to 500,000 documents) requires modest hardware:

    • Embedding server: 1x NVIDIA T4 16GB or equivalent. CPU-only is viable for collections under 50,000 chunks but slower for batch re-indexing.
    • Vector store: 64GB RAM, 1TB NVMe SSD. Scales linearly with collection size.
    • Inference server: 1x NVIDIA T4 16GB for 7B-8B parameter models. Add a second for high availability.
    • Audit log store: Append-only storage, sized for five to seven years of retention. 500GB covers most deployments.

    Total hardware cost is typically between $20,000 and $50,000 — a fraction of annual cloud RAG API costs at financial services query volumes.

    Getting Started

    The fastest path to an air-gapped RAG pipeline for financial data is to start with PII handling. If your PII detection and redaction pipeline is solid, the rest of the architecture follows standard patterns.

    Ertas Data Suite provides the PII Redactor as part of its on-premise desktop application. It handles the financial identifiers that generic PII tools miss — account number formats, tax ID patterns across jurisdictions, and institution-specific reference numbers. No internet connection required. Full audit trail for every redaction decision.

    From there, pair it with open-source embedding models and a self-hosted vector store. The best RAG pipeline for enterprise financial services is the one where no data leaves your perimeter — and you can prove it to every auditor who asks.

    Turn unstructured data into AI-ready datasets — without it leaving the building.

    On-premise data preparation with full audit trail. No data egress. No fragmented toolchains. EU AI Act Article 30 compliance built in.
