
Embedding Drift and Stale Vectors: The Silent RAG Pipeline Killer
How embeddings go stale, how semantic drift degrades retrieval quality over time, and practical strategies for detection and remediation in production RAG pipelines.
Your RAG pipeline worked perfectly three months ago. Retrieval was sharp, answers were accurate, and stakeholders were happy. Now the same queries return worse results, users complain that the system "used to be better," and you cannot pinpoint when it started degrading. Welcome to embedding drift.
This article explains the mechanics of how and why embeddings go stale, how to detect drift before users notice, and how to remediate it without rebuilding your entire pipeline from scratch.
What Embedding Drift Actually Is
Embedding drift is the gradual divergence between the semantic representations stored in your vector database and the actual meaning of the content they represent. It manifests in two distinct ways:
Content drift: The source documents have changed, but the embeddings in the vector store still reflect the old versions. A policy document was updated last month, but the vector store contains embeddings from the original version. Queries about the new policy retrieve the old content.
Model drift: The embedding model you used for indexing has been updated, deprecated, or replaced. If you re-embed even a subset of documents with a newer model version, those new embeddings exist in a different vector space than the original embeddings. Similarity scores between old and new embeddings are meaningless — you are comparing coordinates from two different maps.
Both types of drift are invisible. Your pipeline keeps running. The vector database keeps returning results. The similarity scores look normal. But the results are wrong.
How Content Drift Happens in Practice
Content drift does not require dramatic changes. These everyday scenarios introduce it:
Document updates without reindexing. Someone updates a pricing page, replaces a policy PDF, or edits a knowledge base article. The new version exists in the source system, but nobody triggers reindexing. The vector store serves the old embeddings indefinitely.
Partial corpus updates. A reindexing job fails partway through, perhaps timing out after processing 60% of the corpus. Some documents get new embeddings; others keep stale ones, and nothing alerts you to the gap.
Schema changes in source documents. A CRM changes its export format. Field names shift, column order changes, or new fields are added. The ingestion pipeline does not break — it just parses the new format differently, producing chunks with different structure than the originals.
Deleted documents that persist as vectors. A document is removed from the source system, but its embeddings remain in the vector store. The RAG pipeline retrieves information from a document that no longer exists — and the user has no way to verify it.
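For the last scenario, a periodic orphan sweep catches vectors whose source documents have disappeared. A minimal sketch, assuming you can list document IDs from both the vector store and the source system (the IDs below are illustrative):

```python
# Orphan-vector sweep sketch: finds chunks whose source document no longer exists.
# The two ID sets stand in for listings from your vector store and source system.

def find_orphaned_doc_ids(vector_store_doc_ids: set[str], source_system_doc_ids: set[str]) -> set[str]:
    """Document IDs present in the vector store but gone from the source system."""
    return vector_store_doc_ids - source_system_doc_ids

vector_store_doc_ids = {"policy-refunds", "kb-api-keys", "handbook-2022"}
source_system_doc_ids = {"policy-refunds", "kb-api-keys"}

orphans = find_orphaned_doc_ids(vector_store_doc_ids, source_system_doc_ids)
print(orphans)  # {'handbook-2022'} -> delete these chunks from the vector store
```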
How Model Drift Happens
Model drift is rarer but more dangerous because it corrupts the entire vector space:
Embedding provider updates. OpenAI, Cohere, and other embedding API providers update their models. If you indexed your corpus with text-embedding-ada-002 and later queries use text-embedding-3-small, the similarity scores are comparing vectors from incompatible spaces. Some providers maintain backward compatibility; many do not.
Self-hosted model version changes. You upgrade your local embedding model (a new sentence-transformers release, a patched model checkpoint) and start querying with the new model without reindexing the existing corpus.
Dimension mismatches. A model update changes the embedding dimension (e.g., 1536 to 3072). This usually causes hard errors. But if you configured dimensionality reduction and the reduced dimensions happen to match, the pipeline runs without errors while producing meaningless similarity scores.
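A cheap guard against both failure modes is to record the model name and dimension at indexing time and refuse to serve queries that do not match. A minimal sketch, with assumed metadata fields and example model names:

```python
# Query-time compatibility guard (field names are illustrative): refuse to search
# when the query-side embedding model or dimension differs from the index's.

def check_embedding_compatibility(index_meta: dict, query_model: str, query_dim: int) -> None:
    """index_meta holds the model name and dimension recorded when the index was built."""
    if index_meta["embedding_model"] != query_model:
        raise ValueError(
            f"Model drift: index built with {index_meta['embedding_model']}, "
            f"queries use {query_model}. Reindex before serving."
        )
    if index_meta["embedding_dim"] != query_dim:
        raise ValueError(
            f"Dimension mismatch: index stores {index_meta['embedding_dim']}-d vectors, "
            f"query produces {query_dim}-d vectors."
        )

# Example usage with assumed values:
index_meta = {"embedding_model": "text-embedding-3-small", "embedding_dim": 1536}
check_embedding_compatibility(index_meta, query_model="text-embedding-3-small", query_dim=1536)
```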
The Detection Checklist
Use these techniques to detect embedding drift before users report degraded quality.
1. Retrieval Quality Regression Tests
Maintain a "golden set" of 50-100 query-document pairs where the correct document for each query is known in advance. Run this test suite weekly. Track hit rate at top-1, top-3, and top-5. A declining hit rate is the clearest signal of drift.
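A minimal sketch of such a regression test, assuming a retrieve(query, k) function that returns ranked document IDs; the stub and example pairs below are placeholders for your own retriever and golden set:

```python
# Golden-set regression test sketch: measures hit rate at several cutoffs.

def retrieve(query: str, k: int) -> list[str]:
    """Placeholder: replace with a call to your vector store, returning top-k doc IDs."""
    return []

def hit_rate(golden_set: list[tuple[str, str]], k: int) -> float:
    """Fraction of queries whose known-correct document appears in the top-k results."""
    hits = sum(1 for query, expected in golden_set if expected in retrieve(query, k=k))
    return hits / len(golden_set)

golden_set = [
    ("What is the refund window?", "policy-refunds"),
    ("How do I rotate API keys?", "kb-api-keys"),
    # ... 50-100 pairs in practice
]

for k in (1, 3, 5):
    print(f"hit@{k}: {hit_rate(golden_set, k):.2%}")  # log weekly; alert on a sustained decline
```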
2. Freshness Auditing
Store a last_indexed_at timestamp in every chunk's metadata. Run a weekly query: "What percentage of chunks were last indexed more than 30 days ago?" If that number climbs above 20%, you have content drift.
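The audit itself can be a few lines once the timestamp exists. A sketch assuming last_indexed_at is stored as an ISO-8601 string and that you can iterate or export chunk metadata (the example chunks are illustrative):

```python
# Freshness audit sketch: percentage of chunks last indexed more than 30 days ago.
from datetime import datetime, timedelta, timezone

def stale_fraction(chunks: list[dict], max_age_days: int = 30) -> float:
    cutoff = datetime.now(timezone.utc) - timedelta(days=max_age_days)
    stale = sum(
        1 for c in chunks
        if datetime.fromisoformat(c["last_indexed_at"]) < cutoff
    )
    return stale / len(chunks) if chunks else 0.0

chunks = [
    {"chunk_id": "a1", "last_indexed_at": "2024-01-05T09:00:00+00:00"},  # illustrative timestamps
    {"chunk_id": "b2", "last_indexed_at": "2024-06-20T14:30:00+00:00"},
]

pct = stale_fraction(chunks) * 100
print(f"{pct:.1f}% of chunks older than 30 days")
if pct > 20:
    print("ALERT: content drift threshold exceeded")
```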
3. Source Hash Comparison
When indexing, store a hash of the source document content alongside the chunk metadata. Periodically compare the stored hashes against the current source documents. Any mismatch means the embeddings are stale.
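A sketch of the comparison, assuming you kept a SHA-256 of each source document at indexing time; the document IDs and texts below are illustrative:

```python
# Source-hash comparison sketch: flags documents whose current content no longer
# matches the hash stored when their embeddings were created.
import hashlib

def content_hash(text: str) -> str:
    # Normalize whitespace so trivial formatting changes do not count as drift
    normalized = " ".join(text.split())
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()

def find_stale_documents(stored_hashes: dict[str, str], current_sources: dict[str, str]) -> list[str]:
    stale = []
    for doc_id, text in current_sources.items():
        if stored_hashes.get(doc_id) != content_hash(text):
            stale.append(doc_id)  # changed since indexing, or never indexed at all
    return stale

stored_hashes = {"policy-refunds": content_hash("Refunds within 14 days.")}
current_sources = {"policy-refunds": "Refunds within 30 days."}
print(find_stale_documents(stored_hashes, current_sources))  # ['policy-refunds']
```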
4. Embedding Model Version Tracking
Record the embedding model identifier and version with every chunk. If your vector store contains chunks from multiple model versions, you have model drift. This should be a hard alert, not a warning.
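A sketch of the audit, assuming each chunk's metadata includes an embedding_model field; more than one distinct value in a single index is the hard-alert condition:

```python
# Model-version audit sketch: counts embedding model identifiers across the index.
from collections import Counter

def model_versions(chunks: list[dict]) -> Counter:
    return Counter(c["embedding_model"] for c in chunks)

chunks = [
    {"chunk_id": "a1", "embedding_model": "text-embedding-ada-002"},
    {"chunk_id": "b2", "embedding_model": "text-embedding-3-small"},
]

versions = model_versions(chunks)
if len(versions) > 1:
    # Hard alert: vectors from different models share one index
    raise RuntimeError(f"Model drift detected, versions in index: {dict(versions)}")
```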
5. Similarity Score Distribution Monitoring
Track the distribution of similarity scores returned by your retrieval queries over time. A shift in the distribution (scores clustering lower, or the spread widening) can indicate that embeddings and queries are drifting apart.
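One lightweight way to watch the distribution, assuming you log retrieval scores: compare summary statistics for the current window against a baseline captured at a known-good point. The thresholds and scores below are illustrative, not universal:

```python
# Score-distribution monitoring sketch: compares current retrieval scores against a
# stored baseline using simple summary statistics (standard library only).
import statistics

def score_shift(baseline_scores: list[float], current_scores: list[float]) -> dict:
    return {
        "mean_delta": statistics.mean(current_scores) - statistics.mean(baseline_scores),
        "stdev_delta": statistics.stdev(current_scores) - statistics.stdev(baseline_scores),
    }

baseline_scores = [0.82, 0.79, 0.85, 0.81, 0.80, 0.83]  # logged at a known-good point
current_scores = [0.71, 0.68, 0.74, 0.70, 0.69, 0.72]   # logged this week

shift = score_shift(baseline_scores, current_scores)
if shift["mean_delta"] < -0.05 or abs(shift["stdev_delta"]) > 0.05:
    print(f"Possible drift: {shift}")
```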
6. User Feedback Correlation
If you collect thumbs-up/thumbs-down on answers, correlate negative feedback with the age of retrieved chunks. If users consistently reject answers sourced from older chunks, those chunks are likely stale.
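A sketch of the correlation, assuming each feedback event records the age of the retrieved chunk at answer time; the bucket boundary and events below are illustrative:

```python
# Feedback-correlation sketch: buckets feedback by the age of the retrieved chunk
# and compares rejection rates across buckets.
from collections import defaultdict

def rejection_rate_by_age(events: list[dict]) -> dict[str, float]:
    """Each event: {"chunk_age_days": int, "thumbs_up": bool}."""
    buckets = defaultdict(lambda: [0, 0])  # bucket -> [negatives, total]
    for e in events:
        bucket = "0-30d" if e["chunk_age_days"] <= 30 else "30d+"
        buckets[bucket][0] += 0 if e["thumbs_up"] else 1
        buckets[bucket][1] += 1
    return {b: neg / total for b, (neg, total) in buckets.items()}

events = [
    {"chunk_age_days": 5, "thumbs_up": True},
    {"chunk_age_days": 90, "thumbs_up": False},
    {"chunk_age_days": 120, "thumbs_up": False},
]
print(rejection_rate_by_age(events))  # older chunks rejected more often => likely stale
```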
Reindexing Strategy Comparison
When drift is detected, how you reindex matters. Each strategy has different trade-offs (a scheduling sketch for the rolling strategy follows the table):
| Strategy | When to Use | Pros | Cons |
|---|---|---|---|
| Full reindex | Model version change, initial setup, quarterly maintenance | Guarantees consistency; eliminates all drift; simplest to reason about | Expensive (compute + API costs); downtime or dual-index complexity; can take hours for large corpora |
| Incremental reindex | Document updates, daily/weekly maintenance | Only re-embeds changed documents; fast; low cost | Requires change detection logic; does not catch model drift; can accumulate errors over time |
| Rolling reindex | Continuous freshness requirement, large corpora | Spreads compute evenly; bounds staleness (e.g., 5% per day refreshes the full corpus every 20 days); no big-bang jobs | Higher baseline compute cost; chunks are at different freshness levels at any given time |
| Triggered reindex | Event-driven updates (CMS webhook, file watcher) | Reindexes immediately when source changes; lowest latency for freshness | Requires integration with source systems; burst compute on high-change days; does not catch silent drift |
| Shadow reindex | Zero-downtime production systems | Build new index alongside old one; swap atomically when complete; no queries hit partially-reindexed state | Requires 2x storage; more complex infrastructure; swap logic needs testing |
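Here is the rolling-reindex scheduling sketch mentioned above. It deterministically assigns each document to one of 20 daily buckets so roughly 5% of the corpus is re-embedded per day; the actual reindex step is left as a placeholder for your own pipeline run:

```python
# Rolling-reindex selection sketch: each document hashes into one of CYCLE_DAYS
# buckets, so the whole corpus is refreshed once per cycle.
import hashlib
from datetime import date

CYCLE_DAYS = 20  # 100% / 5% per day

def bucket_for(doc_id: str) -> int:
    digest = hashlib.sha256(doc_id.encode("utf-8")).hexdigest()
    return int(digest, 16) % CYCLE_DAYS

def due_today(doc_ids: list[str], today: date) -> list[str]:
    todays_bucket = today.toordinal() % CYCLE_DAYS
    return [d for d in doc_ids if bucket_for(d) == todays_bucket]

doc_ids = [f"doc-{i}" for i in range(1000)]  # illustrative corpus
docs_due = due_today(doc_ids, date.today())
print(f"{len(docs_due)} documents due for re-embedding today")
# for doc_id in docs_due: re-parse, re-chunk, re-embed, and upsert into the vector store
```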
Which Strategy for Which Situation
Startup with a few hundred documents: Triggered reindex with a weekly full reindex as a safety net. The corpus is small enough that full reindexing takes minutes.
Mid-market product with 10K-100K documents: Incremental reindex on document change events, with a monthly full reindex scheduled during off-hours. Use source hash comparison to catch missed updates.
Enterprise with 500K+ documents: Rolling reindex (3-5% daily) as the baseline, with triggered reindex for high-priority document categories. Reserve full re-embedding for embedding model changes, and run it as a shadow reindex (typically quarterly) so the cutover is atomic.
The Embedding Model Upgrade Decision
At some point, a better embedding model becomes available and you face a choice: upgrade and reindex everything, or stay on the current model.
Upgrade when:
- Retrieval quality regression tests show the new model outperforms the current one on your golden set by more than 5%
- Your current model is being deprecated by the provider
- You are already planning a full reindex for another reason (new chunking strategy, schema change)
- The new model reduces embedding dimensions, saving storage and compute
Do not upgrade when:
- The improvement on benchmarks is marginal (under 3% on your specific queries)
- You cannot afford the downtime or compute cost for a full reindex
- You are mid-sprint on a product feature and cannot allocate engineering time to validate the migration
Never do this:
- Mix embeddings from different models in the same vector index without metadata-based isolation
- Upgrade the query-side model without reindexing the document-side embeddings
- Assume backward compatibility between model versions without testing
Building Drift-Resistant Pipelines
The most effective defense against embedding drift is a pipeline architecture that makes freshness observable and reindexing routine — not an emergency procedure.
Key principles:
Every chunk carries provenance metadata. Source document ID, content hash, embedding model version, index timestamp (a metadata sketch follows these principles). Without this metadata, you cannot diagnose drift, only observe its effects.
Change detection is automated. Whether through file system watchers, CMS webhooks, or scheduled hash comparisons, your pipeline should know when source documents change without a human checking manually.
Reindexing is a button press, not a project. If reindexing requires an engineer to write a script, SSH into a server, and babysit the process, it will not happen frequently enough. Reindexing should be a pipeline you trigger and walk away from.
Freshness is measured and reported. A dashboard or alert that shows the percentage of chunks older than your freshness threshold keeps drift visible to the team, not hidden until users complain.
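The first principle above implies a concrete per-chunk record. One possible shape, with illustrative field names rather than a fixed schema, enough to diagnose both content drift and model drift:

```python
# Per-chunk provenance record sketch (field names are illustrative).
from dataclasses import dataclass

@dataclass
class ChunkProvenance:
    chunk_id: str
    source_doc_id: str        # ties the chunk back to its source document
    source_content_hash: str  # hash of the source at indexing time
    embedding_model: str      # e.g. "text-embedding-3-small"
    embedding_dim: int
    indexed_at: str           # ISO-8601 timestamp

meta = ChunkProvenance(
    chunk_id="a1",
    source_doc_id="policy-refunds",
    source_content_hash="9f2c",  # truncated for readability
    embedding_model="text-embedding-3-small",
    embedding_dim=1536,
    indexed_at="2024-06-20T14:30:00+00:00",
)
```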
Ertas Data Suite builds these principles into the pipeline architecture. The visual canvas makes every stage of indexing explicit — from file import through parsing, PII redaction, chunking, and embedding to vector store writing. When you need to reindex, you re-run the pipeline on changed documents. Every node logs what it processed and when. Freshness is not a mystery you need to investigate; it is visible on the canvas.
The Cost of Doing Nothing
Teams that ignore embedding drift pay a compounding tax. Retrieval quality can erode by a percentage point or two each month: imperceptible week to week, devastating over a quarter. By the time users complain loudly enough to trigger an investigation, the vector store may contain months of stale embeddings, and the remediation effort is a full reindex instead of the incremental maintenance that would have prevented the problem.
Treat your vector store like a database, not a write-once archive. The data in it needs to stay current, and the infrastructure around it needs to make that easy. If reindexing is painful, you will avoid it. If you avoid it, your RAG pipeline will slowly, silently, stop working.