
    Best RAG Pipeline for Financial Services: Air-Gapped Retrieval for PII-Heavy Data

    Financial institutions handle PII-dense documents that cannot touch cloud infrastructure. Here is how to build an air-gapped RAG pipeline that meets SOC 2, GDPR, and internal audit requirements while keeping retrieval fast.

    Ertas Team

    Financial statements, customer PII, and threat intelligence data must stay in air-gapped environments. That is not a preference — it is a regulatory requirement. Yet most RAG pipeline vendors assume internet connectivity for embeddings, vector database hosting, and model inference. That assumption disqualifies them from the conversation before a single document is ingested.

    This article covers how to build a RAG pipeline for financial services that operates entirely on-premise, handles PII-heavy documents without exposure risk, and satisfies the compliance frameworks that govern the industry.

    Why Standard RAG Pipelines Fail in Financial Services

    A typical RAG pipeline sends documents to a cloud embedding API, stores vectors in a hosted database, and calls a cloud LLM at inference time. Each of those three steps creates a compliance violation for most financial institutions.

    Embedding API calls transmit raw document text. When a financial analyst queries a RAG system about a client's portfolio, the retrieval step sends document chunks — containing account numbers, SSNs, transaction histories — to an external API. Under most regulatory frameworks, that transmission is a reportable exposure of regulated data, regardless of whether the API provider claims SOC 2 compliance on their end.

    Hosted vector databases store document representations externally. Even though embeddings are not human-readable, they can be inverted to reconstruct approximate document content. Storing them on third-party infrastructure means PII has left your perimeter.

    Cloud LLM inference exposes query context. The retrieved chunks, combined with the user query, are sent to a cloud model. The full context window — including PII from retrieved documents — is now on someone else's servers.

    An air-gapped RAG pipeline eliminates all three failure points. Every component runs within your network perimeter. No data leaves.

    Compliance Requirements That Shape the Architecture

    Financial services RAG deployments must satisfy overlapping regulatory frameworks. The architecture is not optional — it is dictated by the following requirements.

    SOC 2 Type II

    SOC 2 Type II audits evaluate controls over a minimum six-month period. For a RAG pipeline, this means:

    • Access controls on who can query which document collections
    • Audit logging of every retrieval and inference event, with user identity, timestamp, documents retrieved, and query text
    • Change management for model updates, embedding model swaps, and index rebuilds
    • Encryption at rest for the vector store and document store
    • Encryption in transit for all internal API calls between pipeline components

    GDPR (Articles 17, 20, 25, 35)

    GDPR applies to any financial institution handling EU citizen data, regardless of where the institution is headquartered.

    • Right to erasure (Art. 17): You must be able to delete a specific individual's data from the vector store and re-index without that data. Cloud-hosted embeddings make this nearly impossible to verify.
    • Data portability (Art. 20): The RAG system must be able to export all data associated with a data subject in a portable format.
    • Data protection by design (Art. 25): PII must be identified and handled with appropriate safeguards at every stage — ingestion, chunking, embedding, storage, retrieval, and generation.
    • DPIA (Art. 35): A data protection impact assessment is required before deploying AI systems that process PII at scale.

    MiFID II Record-Keeping

    MiFID II requires financial firms to retain records of all communications and decisions related to client transactions. If a RAG-powered system contributes to investment research, risk assessment, or client communication, every query and every generated response must be retained for a minimum of five years — seven years in some jurisdictions.

    This means the RAG pipeline needs an immutable audit log with the following fields per event: timestamp, user identity, query text, retrieved document IDs with relevance scores, generated response, and model version.
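A minimal sketch of one such event record. The class and field names here are illustrative, not a mandated schema — the point is that every field listed above is captured once, immutably, per event:

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone

@dataclass(frozen=True)  # frozen: records cannot be mutated after creation
class RagAuditEvent:
    """One immutable record per retrieval-and-generation event."""
    user_id: str
    query_text: str
    retrieved: list          # (document_id, relevance_score) pairs
    response_text: str
    model_version: str
    timestamp: str = field(
        default_factory=lambda: datetime.now(timezone.utc).isoformat()
    )

event = RagAuditEvent(
    user_id="analyst-042",
    query_text="Summarize Q3 exposure for client X",
    retrieved=[("doc-981", 0.87), ("doc-112", 0.81)],
    response_text="...",
    model_version="llama-3.1-8b-instruct-q5",
)
```

Serialized records like this would then be appended to the retention store for the five-to-seven-year window.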

    The Air-Gapped RAG Architecture

    An air-gapped RAG pipeline for financial data has five stages, all running within the network perimeter.

    Stage 1: Document Ingestion and PII Detection

    Raw documents enter the pipeline — financial statements, KYC forms, transaction records, compliance reports. Before any processing, a PII detection pass identifies and tags sensitive fields: account numbers, SSNs, tax IDs, names, addresses, dates of birth.

    This is where Ertas Data Suite's PII Redactor operates. Running as a desktop application with no internet requirement, it scans incoming documents and tags every financial identifier. The tagged PII metadata travels with the document through the pipeline, enabling field-level access controls downstream.
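To illustrate the tagging step only — this is not Ertas's detection logic, which covers far more identifier types and jurisdiction-specific formats — a simple regex-based pass produces tags that can travel with the document:

```python
import re

# Illustrative patterns only; a production detector covers many more
# identifier types and validates formats (checksums, issuer prefixes, etc.).
PII_PATTERNS = {
    "ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "account_number": re.compile(r"\b\d{10,12}\b"),
}

def tag_pii(text: str) -> list:
    """Return (pii_type, start, end) tags for every match in the text."""
    tags = []
    for pii_type, pattern in PII_PATTERNS.items():
        for m in pattern.finditer(text):
            tags.append((pii_type, m.start(), m.end()))
    return tags

doc = "Client SSN 123-45-6789, account 0012345678."
tags = tag_pii(doc)
```

The offsets let downstream stages redact, tokenize, or gate specific spans without re-scanning the document.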

    Stage 2: Chunking and Preprocessing

    Tagged documents are split into retrieval-friendly chunks. Financial documents require domain-aware chunking:

    • Table-aware splitting preserves financial tables as atomic units rather than splitting rows across chunks
    • Section-boundary detection keeps regulatory filing sections (risk factors, management discussion, financial statements) intact
    • Metadata propagation ensures every chunk inherits the PII tags from its source document
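A minimal chunker along these lines. Presence of a "|" character stands in for real table detection (production splitters inspect document layout, not characters), and every chunk inherits the source document's PII tags:

```python
def chunk_document(text: str, pii_tags: list, max_chars: int = 800) -> list:
    """Split on blank lines; keep table blocks atomic; propagate PII tags."""
    # Group consecutive non-blank lines into blocks.
    blocks, current = [], []
    for line in text.splitlines():
        if line.strip() == "":
            if current:
                blocks.append("\n".join(current))
                current = []
        else:
            current.append(line)
    if current:
        blocks.append("\n".join(current))

    # Pack blocks into chunks; a table block always becomes its own chunk.
    chunks, buf = [], ""
    for block in blocks:
        is_table = all("|" in ln for ln in block.splitlines())
        if is_table or len(buf) + len(block) > max_chars:
            if buf:
                chunks.append({"text": buf, "pii_tags": pii_tags})
                buf = ""
            chunks.append({"text": block, "pii_tags": pii_tags})
        else:
            buf = (buf + "\n\n" + block) if buf else block
    if buf:
        chunks.append({"text": buf, "pii_tags": pii_tags})
    return chunks
```

Because tags ride along in each chunk's metadata, retrieval-time access controls can filter on them without touching the original document.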

    Stage 3: Local Embedding Generation

    An open-source embedding model runs on-premise. No API calls. Models in the 300M-500M parameter range (such as E5-large or BGE-large) produce high-quality embeddings on modest hardware — a single GPU or even CPU-only inference for smaller document collections.

    Embedding generation is a batch process. A collection of 100,000 document chunks can be embedded in under two hours on a single NVIDIA T4.
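The batch loop itself is simple. In the sketch below the model call is a pluggable function standing in for a locally loaded encoder's encode() call (an on-premise E5 or BGE model, for example) — nothing here touches the network either way:

```python
from typing import Callable, List

def embed_in_batches(
    texts: List[str],
    embed_fn: Callable[[List[str]], list],
    batch_size: int = 64,
) -> list:
    """Run the local embedding model over chunks in fixed-size batches."""
    vectors = []
    for i in range(0, len(texts), batch_size):
        vectors.extend(embed_fn(texts[i:i + batch_size]))
    return vectors

# Toy stand-in model: "embed" each chunk as a 1-dim vector of its length.
demo_vectors = embed_in_batches(
    ["chunk one", "chunk two"],
    lambda batch: [[float(len(t))] for t in batch],
)
```

Batch size is the main tuning knob: larger batches improve GPU utilization during re-indexing, at the cost of memory.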

    Stage 4: Local Vector Storage and Retrieval

    The vector store runs on-premise. Open-source options like Qdrant, Milvus, or Weaviate deploy as self-hosted services within your network. No data leaves.

    Retrieval queries run locally. When a user queries the system, the query is embedded using the same local model, similarity search runs against the local vector store, and the top-k chunks are returned — all within the air-gapped perimeter.
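At its core, that similarity search is a cosine comparison. A brute-force in-memory version is sketched below; self-hosted stores like Qdrant or Milvus implement the same idea with approximate-nearest-neighbor indexing for scale:

```python
import numpy as np

def top_k(query_vec: np.ndarray, index_vecs: np.ndarray, k: int = 5) -> list:
    """Return the k most cosine-similar rows of index_vecs as (id, score)."""
    q = query_vec / np.linalg.norm(query_vec)
    m = index_vecs / np.linalg.norm(index_vecs, axis=1, keepdims=True)
    scores = m @ q                       # cosine similarity per row
    order = np.argsort(scores)[::-1][:k] # best first
    return [(int(i), float(scores[i])) for i in order]
```

The returned IDs map back to chunks (and their PII tags) in the document store, all inside the perimeter.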

    Stage 5: Local Inference with Audit Logging

    A locally deployed LLM generates responses using retrieved context. The model, the query, and the retrieved chunks never leave your infrastructure. Every inference event is logged to the immutable audit store with full provenance: which documents were retrieved, which user initiated the query, and what response was generated.
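One simple way to make that log tamper-evident is to hash-chain entries, so any after-the-fact edit breaks every subsequent hash. The sketch below is an illustrative stand-in for a proper WORM (write-once-read-many) store, not a complete solution:

```python
import hashlib
import json

class AuditLog:
    """Append-only log; each entry carries the hash of the previous one."""

    def __init__(self):
        self.entries = []

    def append(self, event: dict) -> str:
        prev = self.entries[-1]["hash"] if self.entries else "genesis"
        payload = json.dumps(event, sort_keys=True)
        h = hashlib.sha256((prev + payload).encode()).hexdigest()
        self.entries.append({"event": event, "prev": prev, "hash": h})
        return h

    def verify(self) -> bool:
        """Recompute the chain; any edited entry breaks verification."""
        prev = "genesis"
        for e in self.entries:
            payload = json.dumps(e["event"], sort_keys=True)
            expected = hashlib.sha256((prev + payload).encode()).hexdigest()
            if e["prev"] != prev or e["hash"] != expected:
                return False
            prev = e["hash"]
        return True
```

An auditor can re-run the verification independently, which is exactly the kind of proof MiFID II record-keeping reviews look for.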

    Comparison: Cloud RAG vs. Air-Gapped RAG for Financial Services

    • PII exposure risk. Cloud-hosted: high — document text is sent to external APIs. Air-gapped (Ertas): none — all processing is on-premise.
    • SOC 2 Type II audit. Cloud-hosted: requires vendor SOC 2 reports and a shared responsibility model. Air-gapped: fully within your audit perimeter.
    • GDPR right to erasure. Cloud-hosted: difficult to verify deletion across third-party systems. Air-gapped: full control — delete and re-index locally.
    • MiFID II record-keeping. Cloud-hosted: audit logs split across vendor and internal systems. Air-gapped: single immutable log store on-premise.
    • Internet dependency. Cloud-hosted: required for embeddings, vector DB, and inference. Air-gapped: none — fully air-gapped operation.
    • PII redaction. Cloud-hosted: manual or third-party service (data leaves the perimeter). Air-gapped: Ertas PII Redactor — local, no internet.
    • Embedding model control. Cloud-hosted: vendor-selected, may change without notice. Air-gapped: you choose and version-control the model.
    • Latency. Cloud-hosted: variable — depends on API response times. Air-gapped: predictable — local network only.
    • Cost model. Cloud-hosted: per-token and per-query fees that scale with usage. Air-gapped: fixed infrastructure cost, no per-query fees.
    • Vendor lock-in. Cloud-hosted: high — proprietary embeddings and vector formats. Air-gapped: none — open-source components throughout.

    PII Handling: The Make-or-Break Requirement

    The single biggest differentiator for financial services RAG is PII handling. Most RAG pipelines treat PII as someone else's problem. In financial services, PII is the core data.

    A best-in-class RAG pipeline for sensitive documents must handle PII at three levels:

    Pre-embedding redaction. Certain PII fields (SSNs, full account numbers) should be redacted or tokenized before embedding. The embeddings should encode the semantic content of the document without encoding recoverable PII. Ertas PII Redactor handles this automatically for financial identifier types.

    Field-level access controls. Different users should see different levels of PII in retrieved results. A compliance officer reviewing AML alerts needs full account details. A research analyst querying market commentary does not. The RAG pipeline must enforce these controls at retrieval time, not just at the UI layer.

    Deletion and re-indexing. When a customer exercises their right to erasure, the pipeline must delete all chunks derived from that customer's documents, remove corresponding vectors from the store, and verify that no residual data remains. This is straightforward with a local vector store. It is nearly impossible to verify with a cloud-hosted one.
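A delete-and-verify pass over an in-memory stand-in for the vector store illustrates the shape of the operation (real self-hosted stores expose equivalent delete-by-filter APIs; the dict-based store here is purely illustrative):

```python
def erase_data_subject(vector_store: dict, chunk_meta: dict,
                       subject_id: str) -> int:
    """Delete every chunk derived from the subject's documents, then verify.

    vector_store maps chunk_id -> embedding vector;
    chunk_meta maps chunk_id -> {"subject_id": ...} metadata.
    """
    to_delete = [cid for cid, m in chunk_meta.items()
                 if m.get("subject_id") == subject_id]
    for cid in to_delete:
        vector_store.pop(cid, None)
        chunk_meta.pop(cid, None)

    # Verification pass: no residual chunks may remain for the subject.
    residual = [cid for cid, m in chunk_meta.items()
                if m.get("subject_id") == subject_id]
    assert not residual, "erasure incomplete"
    return len(to_delete)
```

The verification pass is the part a cloud-hosted store cannot give you: locally, you can enumerate what remains and attach the result to the erasure record.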

    Hardware Requirements

    An air-gapped RAG pipeline for a mid-size financial institution (processing 50,000 to 500,000 documents) requires modest hardware:

    • Embedding server: 1x NVIDIA T4 16GB or equivalent. CPU-only is viable for collections under 50,000 chunks but slower for batch re-indexing.
    • Vector store: 64GB RAM, 1TB NVMe SSD. Scales linearly with collection size.
    • Inference server: 1x NVIDIA T4 16GB for 7B-8B parameter models. Add a second for high availability.
    • Audit log store: Append-only storage, sized for five to seven years of retention. 500GB covers most deployments.

    Total hardware cost is typically between $20,000 and $50,000 — a fraction of annual cloud RAG API costs at financial services query volumes.

    Getting Started

    The fastest path to an air-gapped RAG pipeline for financial data is to start with PII handling. If your PII detection and redaction pipeline is solid, the rest of the architecture follows standard patterns.

    Ertas Data Suite provides the PII Redactor as part of its on-premise desktop application. It handles the financial identifiers that generic PII tools miss — account number formats, tax ID patterns across jurisdictions, and institution-specific reference numbers. No internet connection required. Full audit trail for every redaction decision.

    From there, pair it with open-source embedding models and a self-hosted vector store. The best RAG pipeline for enterprise financial services is the one where no data leaves your perimeter — and you can prove it to every auditor who asks.

    Turn unstructured data into AI-ready datasets — without it leaving the building.

    On-premise data preparation with full audit trail. No data egress. No fragmented toolchains. EU AI Act Article 30 compliance built in.
