
Best RAG Pipeline for Financial Services: Air-Gapped Retrieval for PII-Heavy Data
Financial institutions handle PII-dense documents that cannot touch cloud infrastructure. Here is how to build an air-gapped RAG pipeline that meets SOC 2, GDPR, and internal audit requirements while keeping retrieval fast.
Financial statements, customer PII, and threat intelligence data must stay in air-gapped environments. That is not a preference — it is a regulatory requirement. Yet most RAG pipeline vendors assume internet connectivity for embeddings, vector database hosting, and model inference. That assumption disqualifies them from the conversation before a single document is ingested.
This article covers how to build a RAG pipeline for financial services that operates entirely on-premise, handles PII-heavy documents without exposure risk, and satisfies the compliance frameworks that govern the industry.
Why Standard RAG Pipelines Fail in Financial Services
A typical RAG pipeline sends documents to a cloud embedding API, stores vectors in a hosted database, and calls a cloud LLM at inference time. Each of those three steps creates a compliance violation for most financial institutions.
Embedding API calls transmit raw document text. When a financial analyst queries a RAG system about a client's portfolio, the retrieval step sends document chunks — containing account numbers, SSNs, transaction histories — to an external API. That is a data breach under most regulatory frameworks, regardless of whether the API provider claims SOC 2 compliance on their end.
Hosted vector databases store document representations externally. Even though embeddings are not human-readable, they can be inverted to reconstruct approximate document content. Storing them on third-party infrastructure means PII has left your perimeter.
Cloud LLM inference exposes query context. The retrieved chunks, combined with the user query, are sent to a cloud model. The full context window — including PII from retrieved documents — is now on someone else's servers.
An air-gapped RAG pipeline eliminates all three failure points. Every component runs within your network perimeter. No data leaves.
Compliance Requirements That Shape the Architecture
Financial services RAG deployments must satisfy overlapping regulatory frameworks. The architecture is not optional — it is dictated by the following requirements.
SOC 2 Type II
SOC 2 Type II audits evaluate controls over an extended observation period, typically six to twelve months. For a RAG pipeline, this means:
- Access controls on who can query which document collections
- Audit logging of every retrieval and inference event, with user identity, timestamp, documents retrieved, and query text
- Change management for model updates, embedding model swaps, and index rebuilds
- Encryption at rest for the vector store and document store
- Encryption in transit for all internal API calls between pipeline components
GDPR (Articles 17, 20, 25, 35)
GDPR applies to any financial institution handling EU citizen data, regardless of where the institution is headquartered.
- Right to erasure (Art. 17): You must be able to delete a specific individual's data from the vector store and re-index without that data. Cloud-hosted embeddings make this nearly impossible to verify.
- Data portability (Art. 20): The RAG system must be able to export all data associated with a data subject in a portable format.
- Data protection by design (Art. 25): PII must be identified and handled with appropriate safeguards at every stage — ingestion, chunking, embedding, storage, retrieval, and generation.
- DPIA (Art. 35): A data protection impact assessment is required before deploying AI systems that process PII at scale.
MiFID II Record-Keeping
MiFID II requires financial firms to retain records of all communications and decisions related to client transactions. If a RAG-powered system contributes to investment research, risk assessment, or client communication, every query and every generated response must be retained for a minimum of five years — seven years in some jurisdictions.
This means the RAG pipeline needs an immutable audit log with the following fields per event: timestamp, user identity, query text, retrieved document IDs with relevance scores, generated response, and model version.
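As a sketch of what such a log might look like, the following hash-chains each event record to the previous one, so tampering is detectable at audit time. The field names mirror the list above; the function name and JSONL-on-disk format are illustrative, not a prescribed implementation.

```python
import hashlib
import json
import time

def append_audit_event(log_path, prev_hash, *, user, query, doc_ids, scores,
                       response, model_version):
    """Append one retrieval/inference event to a hash-chained JSONL log.

    Each record embeds the hash of the previous record, so any later
    modification breaks the chain and is detectable at audit time.
    """
    record = {
        "timestamp": time.time(),
        "user": user,
        "query": query,
        "retrieved": [{"doc_id": d, "score": s} for d, s in zip(doc_ids, scores)],
        "response": response,
        "model_version": model_version,
        "prev_hash": prev_hash,
    }
    payload = json.dumps(record, sort_keys=True)
    record_hash = hashlib.sha256(payload.encode()).hexdigest()
    with open(log_path, "a") as f:
        f.write(json.dumps({"hash": record_hash, **record}) + "\n")
    return record_hash  # feed into the next append
```

In practice the log lives on append-only (WORM) storage sized for the retention period; the hash chain gives auditors a cheap integrity check across the full five-to-seven-year window.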
The Air-Gapped RAG Architecture
An air-gapped RAG pipeline for financial data has five stages, all running within the network perimeter.
Stage 1: Document Ingestion and PII Detection
Raw documents enter the pipeline — financial statements, KYC forms, transaction records, compliance reports. Before any processing, a PII detection pass identifies and tags sensitive fields: account numbers, SSNs, tax IDs, names, addresses, dates of birth.
This is where Ertas Data Suite's PII Redactor operates. Running as a desktop application with no internet requirement, it scans incoming documents and tags every financial identifier. The tagged PII metadata travels with the document through the pipeline, enabling field-level access controls downstream.
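To make the detection step concrete, here is a minimal regex-based tagger. It is a toy stand-in, not the PII Redactor's actual method: the patterns cover only three identifier types, and a production detector handles far more formats plus contextual validation.

```python
import re

# Illustrative patterns only -- a real detector covers many more formats
# and validates matches in context (checksums, surrounding labels, etc.).
PII_PATTERNS = {
    "ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "account_number": re.compile(r"\b\d{10,12}\b"),
    "iban": re.compile(r"\b[A-Z]{2}\d{2}[A-Z0-9]{11,30}\b"),
}

def tag_pii(text):
    """Return a (start, end, type) tag for every PII match in a document.

    The tags become metadata that travels with the document through
    chunking, embedding, and retrieval.
    """
    tags = []
    for pii_type, pattern in PII_PATTERNS.items():
        for m in pattern.finditer(text):
            tags.append({"start": m.start(), "end": m.end(), "type": pii_type})
    return sorted(tags, key=lambda t: t["start"])
```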
Stage 2: Chunking and Preprocessing
Tagged documents are split into retrieval-friendly chunks. Financial documents require domain-aware chunking:
- Table-aware splitting preserves financial tables as atomic units rather than splitting rows across chunks
- Section-boundary detection keeps regulatory filing sections (risk factors, management discussion, financial statements) intact
- Metadata propagation ensures every chunk inherits the PII tags from its source document
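A minimal sketch of the metadata-propagation requirement, splitting on blank-line section boundaries (table-aware splitting is omitted for brevity; the function name and `doc_meta` shape are assumptions for illustration):

```python
def chunk_document(doc_text, doc_meta, max_chars=1000):
    """Split on blank-line section boundaries, packing sections into
    chunks of at most max_chars. Every chunk inherits the source
    document's PII tags and metadata."""
    sections = [s.strip() for s in doc_text.split("\n\n") if s.strip()]
    chunks, current = [], ""
    for section in sections:
        if current and len(current) + len(section) > max_chars:
            chunks.append(current)
            current = section
        else:
            current = (current + "\n\n" + section).strip()
    if current:
        chunks.append(current)
    # Metadata propagation: PII tags travel with every chunk, enabling
    # field-level access controls at retrieval time.
    return [{"text": c, "doc_id": doc_meta["doc_id"],
             "pii_tags": doc_meta.get("pii_tags", [])} for c in chunks]
```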
Stage 3: Local Embedding Generation
An open-source embedding model runs on-premise. No API calls. Models in the 300M-500M parameter range (such as E5-large or BGE-large) produce high-quality embeddings on modest hardware — a single GPU or even CPU-only inference for smaller document collections.
Embedding generation is a batch process. A collection of 100,000 document chunks can be embedded in under two hours on a single NVIDIA T4.
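The batching loop itself is simple. In the sketch below a deterministic hash-based stub stands in for the real model (which in deployment would be E5-large or BGE-large served locally) so the logic is runnable without a GPU; `embed_stub` and its 8-dimensional output are illustrative only.

```python
import hashlib
import math

def embed_stub(text, dim=8):
    """Placeholder for a local embedding model. Produces a deterministic,
    unit-normalised vector from a hash of the text -- useful only for
    exercising the pipeline, not for real retrieval quality."""
    digest = hashlib.sha256(text.encode()).digest()
    vec = [digest[i] - 128 for i in range(dim)]
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]

def embed_batch(chunks, batch_size=64, embed_fn=embed_stub):
    """Batch embedding loop: process chunks in fixed-size batches so a
    real model can amortise data-transfer and forward-pass costs."""
    vectors = []
    for i in range(0, len(chunks), batch_size):
        batch = chunks[i:i + batch_size]
        vectors.extend(embed_fn(c) for c in batch)
    return vectors
```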
Stage 4: Local Vector Storage and Retrieval
The vector store runs on-premise. Open-source options like Qdrant, Milvus, or Weaviate deploy as self-hosted services within your network. No data leaves.
Retrieval queries run locally. When a user queries the system, the query is embedded using the same local model, similarity search runs against the local vector store, and the top-k chunks are returned — all within the air-gapped perimeter.
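The core of that retrieval step is a similarity search. A brute-force in-memory version looks like this (a production deployment delegates it to the self-hosted vector store, which uses approximate indexes for scale; the `index` shape here is an assumption for illustration):

```python
def cosine(a, b):
    """Cosine similarity between two vectors given as number lists."""
    dot = sum(x * y for x, y in zip(a, b))
    na = sum(x * x for x in a) ** 0.5
    nb = sum(y * y for y in b) ** 0.5
    return dot / (na * nb) if na and nb else 0.0

def retrieve(query_vec, index, k=5):
    """Brute-force top-k search over an in-memory list of
    (chunk_id, vector) pairs, highest similarity first."""
    scored = [(chunk_id, cosine(query_vec, vec)) for chunk_id, vec in index]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]
```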
Stage 5: Local Inference with Audit Logging
A locally deployed LLM generates responses using retrieved context. The model, the query, and the retrieved chunks never leave your infrastructure. Every inference event is logged to the immutable audit store with full provenance: which documents were retrieved, which user initiated the query, and what response was generated.
Comparison: Cloud RAG vs. Air-Gapped RAG for Financial Services
| Dimension | Cloud-Hosted RAG | Air-Gapped RAG (Ertas) |
|---|---|---|
| PII exposure risk | High — document text sent to external APIs | None — all processing on-premise |
| SOC 2 Type II audit | Requires vendor SOC 2 reports and shared responsibility model | Fully within your audit perimeter |
| GDPR right to erasure | Difficult to verify deletion across third-party systems | Full control — delete and re-index locally |
| MiFID II record-keeping | Audit logs split across vendor and internal systems | Single immutable log store on-premise |
| Internet dependency | Required for embeddings, vector DB, and inference | None — fully air-gapped operation |
| PII redaction | Manual or third-party service (data leaves perimeter) | Ertas PII Redactor — local, no internet |
| Embedding model control | Vendor-selected, may change without notice | You choose and version-control the model |
| Latency | Variable — depends on API response times | Predictable — local network only |
| Cost model | Per-token and per-query fees that scale with usage | Fixed infrastructure cost, no per-query fees |
| Vendor lock-in | High — proprietary embeddings, vector formats | None — open-source components throughout |
PII Handling: The Make-or-Break Requirement
The single biggest differentiator for financial services RAG is PII handling. Most RAG pipelines treat PII as someone else's problem. In financial services, PII is the core data.
A best-in-class RAG pipeline for sensitive documents must handle PII at three levels:
Pre-embedding redaction. Certain PII fields (SSNs, full account numbers) should be redacted or tokenized before embedding. The embeddings should encode the semantic content of the document without encoding recoverable PII. Ertas PII Redactor handles this automatically for financial identifier types.
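One common tokenization approach, sketched here with an SSN pattern only (the salt value and token format are illustrative; this is not the PII Redactor's internal method):

```python
import hashlib
import re

SSN_RE = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")

def tokenize_pii(text, salt="per-deployment-secret"):
    """Replace SSNs with stable opaque tokens before embedding.

    The same SSN always maps to the same token, so cross-document
    references survive, but the raw value never reaches the embedding
    model or the vector store. The salt must stay inside the perimeter,
    or the tokens become a rainbow-table target."""
    def repl(m):
        token = hashlib.sha256((salt + m.group()).encode()).hexdigest()[:12]
        return f"[SSN:{token}]"
    return SSN_RE.sub(repl, text)
```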
Field-level access controls. Different users should see different levels of PII in retrieved results. A compliance officer reviewing AML alerts needs full account details. A research analyst querying market commentary does not. The RAG pipeline must enforce these controls at retrieval time, not just at the UI layer.
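Enforcing this at retrieval time can be as direct as masking tagged spans against a role's clearance before the chunk leaves the retrieval layer. The role names and clearance sets below are hypothetical; the tag offsets are the ones produced at ingestion.

```python
# Hypothetical role-to-clearance mapping; real deployments would pull
# this from the institution's identity and access management system.
ROLE_CLEARANCE = {
    "compliance_officer": {"ssn", "account_number", "name"},
    "research_analyst": set(),  # sees no raw PII fields
}

def filter_chunk_for_role(chunk, role):
    """Mask any tagged PII span the role is not cleared to see,
    at retrieval time rather than in the UI layer."""
    allowed = ROLE_CLEARANCE.get(role, set())
    text = chunk["text"]
    # Apply masks right-to-left so earlier offsets stay valid.
    for tag in sorted(chunk["pii_tags"], key=lambda t: t["start"], reverse=True):
        if tag["type"] not in allowed:
            text = text[:tag["start"]] + f"[REDACTED:{tag['type']}]" + text[tag["end"]:]
    return {**chunk, "text": text}
```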
Deletion and re-indexing. When a customer exercises their right to erasure, the pipeline must delete all chunks derived from that customer's documents, remove corresponding vectors from the store, and verify that no residual data remains. This is straightforward with a local vector store. It is nearly impossible to verify with a cloud-hosted one.
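With local stores, the erasure-plus-verification flow can be sketched as follows. The store shapes (plain dicts keyed by chunk ID) and function name are assumptions; against a real vector store like Qdrant or Milvus, the deletions become API calls, but the verification pass is the same idea.

```python
def erase_data_subject(subject_id, doc_index, vector_store, chunk_store):
    """Delete every chunk and vector derived from a data subject's
    documents, then verify nothing remains (GDPR Art. 17).

    doc_index maps subject_id -> list of document IDs; chunk_store and
    vector_store are keyed by chunk ID."""
    doc_ids = doc_index.pop(subject_id, [])
    for doc_id in doc_ids:
        chunk_ids = [cid for cid, c in chunk_store.items() if c["doc_id"] == doc_id]
        for cid in chunk_ids:
            del chunk_store[cid]
            vector_store.pop(cid, None)
    # Verification pass: no residual chunk may reference the erased
    # documents. This check is what a cloud-hosted store cannot give you.
    residual = [cid for cid, c in chunk_store.items() if c["doc_id"] in doc_ids]
    assert not residual, f"erasure incomplete: {residual}"
    return len(doc_ids)
```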
Hardware Requirements
An air-gapped RAG pipeline for a mid-size financial institution (processing 50,000 to 500,000 documents) requires modest hardware:
- Embedding server: 1x NVIDIA T4 16GB or equivalent. CPU-only is viable for collections under 50,000 chunks but slower for batch re-indexing.
- Vector store: 64GB RAM, 1TB NVMe SSD. Scales linearly with collection size.
- Inference server: 1x NVIDIA T4 16GB for 7B-8B parameter models. Add a second for high availability.
- Audit log store: Append-only storage, sized for five to seven years of retention. 500GB covers most deployments.
Total hardware cost is typically between $20,000 and $50,000 — a fraction of annual cloud RAG API costs at financial services query volumes.
Getting Started
The fastest path to an air-gapped RAG pipeline for financial data is to start with PII handling. If your PII detection and redaction pipeline is solid, the rest of the architecture follows standard patterns.
Ertas Data Suite provides the PII Redactor as part of its on-premise desktop application. It handles the financial identifiers that generic PII tools miss — account number formats, tax ID patterns across jurisdictions, and institution-specific reference numbers. No internet connection required. Full audit trail for every redaction decision.
From there, pair it with open-source embedding models and a self-hosted vector store. The best RAG pipeline for enterprise financial services is the one where no data leaves your perimeter — and you can prove it to every auditor who asks.