
RAG Chunking Strategies: How Chunk Size, Overlap, and Boundary Detection Affect Retrieval Quality
Chunking is the most underestimated step in a RAG pipeline. Too large and you waste context window. Too small and you lose meaning. Wrong boundaries and you split sentences mid-thought. Here is how to get it right.
Most teams building RAG pipelines spend their time optimizing embedding models, vector databases, and prompt templates. They treat chunking as a detail — split the documents into pieces, embed them, move on. This is a mistake. Chunking is the single most impactful variable in retrieval quality, and getting it wrong cascades through every downstream step.
A bad RAG chunking strategy produces embeddings that are either too generic to match specific queries or too fragmented to carry useful context. The retriever returns irrelevant passages. The generator hallucinates or hedges. Users lose trust. The fix is rarely a better model — it is better chunks.
Why Chunking Matters More Than You Think
When a user queries a RAG system, the retriever searches for the chunks most semantically similar to the query. If your chunks are 4,000 tokens each, the embedding represents an average of everything in that block. Specific details get diluted. A question about a particular compliance clause returns a chunk containing three pages of unrelated policy language, and the model has to hunt for the relevant sentence buried inside.
If your chunks are 50 tokens each, you get the opposite problem. The embedding captures a sentence fragment with no surrounding context. The retriever might find the right fragment, but the generator has no way to produce a coherent answer from a sentence that was cut in half.
The best RAG chunking tool for enterprise documents gives you control over three variables: chunk size, overlap percentage, and boundary detection method. These three settings interact with each other, and the right combination depends on your document type.
Fixed-Size Chunking vs. Semantic Chunking
The simplest approach is fixed-size chunking: split every document into blocks of N tokens regardless of content structure. This is fast, predictable, and easy to implement. It is also the wrong default for most enterprise use cases.
Fixed-size chunking ignores document structure entirely. A 512-token window might split a table in half, cut a paragraph mid-sentence, or combine the end of one section with the beginning of an unrelated section. The resulting embeddings are noisy, and retrieval quality suffers.
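To see why, here is a minimal sketch of fixed-size chunking. Whitespace-separated words stand in for real tokenizer tokens; a production chunker would count model tokens instead:

```python
def fixed_size_chunks(text, size=512):
    """Split text into blocks of `size` tokens, ignoring structure.

    Whitespace splitting stands in for a real tokenizer here.
    """
    tokens = text.split()
    return [" ".join(tokens[i:i + size])
            for i in range(0, len(tokens), size)]

doc = "First sentence here. Second sentence follows. Third one ends it."
# The split point ignores punctuation, so the second sentence is cut
# across the first two chunks.
print(fixed_size_chunks(doc, size=4))
```

The boundary lands wherever the token counter happens to be, which is exactly the mid-sentence split described above.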
Semantic chunking respects the natural boundaries in a document. Instead of splitting at a fixed token count, it splits at sentence boundaries, paragraph breaks, or section headers. The chunks vary in size, but each one represents a coherent unit of meaning. Embeddings are cleaner, and retrieval is more precise.
The best way to chunk PDFs for RAG retrieval is almost always semantic chunking with a maximum size constraint. You set an upper bound — say 512 tokens — and the chunker splits at the nearest natural boundary below that limit. This gives you the consistency of fixed-size chunking with the coherence of semantic chunking.
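One way to sketch that combination, assuming sentence-level boundaries and greedy packing (whitespace words again standing in for tokens):

```python
import re

def semantic_chunks(text, max_tokens=512):
    """Pack whole sentences greedily so no chunk exceeds max_tokens.

    Sentences are detected by sentence-ending punctuation; a single
    sentence longer than max_tokens is kept whole here, where a
    production chunker would fall back to a harder split.
    """
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    chunks, current, count = [], [], 0
    for sent in sentences:
        n = len(sent.split())
        if current and count + n > max_tokens:
            chunks.append(" ".join(current))
            current, count = [], 0
        current.append(sent)
        count += n
    if current:
        chunks.append(" ".join(current))
    return chunks
```

Every chunk ends at a sentence boundary, but no chunk grows past the upper bound, which is the consistency-plus-coherence tradeoff described above.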
Boundary Detection Methods
Not all boundaries are created equal. The right boundary detection method depends on your document structure.
Sentence boundaries split at sentence-ending punctuation. This is the safest default for unstructured text — articles, emails, support tickets, legal prose. Every chunk contains complete sentences, so embeddings capture complete thoughts. The downside is that sentence-level chunks can be very small, especially in documents with short sentences.
Paragraph boundaries split at double line breaks or explicit paragraph markers. This works well for well-formatted documents — reports, contracts, policy manuals. Each chunk captures a full idea or argument. Paragraph-level chunks tend to be larger and more self-contained, which improves generator performance at the cost of slightly lower retrieval precision.
Section boundaries split at headers (H1, H2, H3 in HTML or Markdown, or detected section titles in PDFs). This is the most aggressive boundary detection and works best for highly structured documents — technical documentation, compliance frameworks, product manuals. Each chunk maps to a logical section of the document, which makes retrieval results easier to trace back to their source.
In practice, you want hierarchical boundary detection: try section boundaries first, fall back to paragraph boundaries if sections are too large, and fall back to sentence boundaries as a last resort. This is the approach that produces the most consistently useful chunks across mixed document types.
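A sketch of that fallback chain, assuming Markdown-style headers; the splitter patterns are illustrative, not any particular tool's behavior:

```python
import re

def hierarchical_chunks(text, max_tokens=512):
    """Try section boundaries first, then paragraphs, then sentences.

    A piece that fits under max_tokens (whitespace tokens as a stand-in
    for real tokens) is emitted as-is; otherwise it is split at the
    current boundary level and each part recurses at the next level.
    A piece still too large after the sentence level is kept whole.
    """
    splitters = [
        re.compile(r"\n(?=#{1,3} )"),   # Markdown section headers
        re.compile(r"\n\s*\n"),         # paragraph breaks
        re.compile(r"(?<=[.!?])\s+"),   # sentence boundaries
    ]

    def split(piece, level):
        if len(piece.split()) <= max_tokens or level >= len(splitters):
            return [piece.strip()]
        out = []
        for part in splitters[level].split(piece):
            if part.strip():
                out.extend(split(part, level + 1))
        return out

    return split(text, 0)
```

Structured documents split cleanly at section headers, while a wall of unstructured prose degrades gracefully to sentence-level chunks.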
Overlap: The Overlooked Setting
Chunk overlap is the percentage of tokens shared between adjacent chunks. If you have 512-token chunks with 10% overlap, each chunk shares approximately 51 tokens with the next chunk. Those shared tokens appear in both embeddings.
Why does this matter? Without overlap, information that spans a chunk boundary is lost. A sentence that starts at token 510 and ends at token 530 gets split across two chunks, and neither chunk captures the full meaning. Overlap ensures that boundary-spanning content appears in at least one chunk in its complete form.
The tradeoff is storage and compute. Higher overlap means more chunks per document, which means more embeddings to store and more candidates to search. For most enterprise deployments, the sweet spot is between 10% and 20% overlap. Below 10%, you lose too much boundary context. Above 20%, you are storing redundant information with diminishing returns.
Zero overlap is only appropriate when your boundary detection is reliable enough that no meaningful content spans boundaries — typically section-level chunking on well-structured documents.
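In token terms, overlap is just a sliding window whose step is smaller than its size. A minimal sketch:

```python
def chunks_with_overlap(tokens, size=512, overlap_pct=0.10):
    """Slide a window of `size` tokens over the token list.

    The step is shortened by the overlap percentage, so adjacent
    chunks share tokens across each boundary.
    """
    step = max(1, int(size * (1 - overlap_pct)))
    chunks = []
    for i in range(0, len(tokens), step):
        chunks.append(tokens[i:i + size])
        if i + size >= len(tokens):
            break
    return chunks

# With a window of 4 and 25% overlap, each chunk repeats the last
# token of the previous chunk.
print(chunks_with_overlap(list(range(10)), size=4, overlap_pct=0.25))
```

Setting `overlap_pct=0` reduces this to plain fixed-size chunking, which makes the storage tradeoff explicit: every extra point of overlap shortens the step and adds chunks.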
Practical Settings by Document Type
The following table summarizes recommended starting configurations for common enterprise document types. These are starting points — you should validate with your own retrieval benchmarks.
| Document Type | Chunk Size (tokens) | Overlap | Boundary Detection | Notes |
|---|---|---|---|---|
| Legal contracts | 256–512 | 15% | Paragraph | Clauses are self-contained paragraphs |
| Policy manuals | 512–768 | 10% | Section then paragraph | Hierarchical structure maps well to sections |
| Support tickets | 128–256 | 10% | Sentence | Short documents, conversational language |
| Technical docs | 512–1024 | 15% | Section then paragraph | Code blocks should stay intact |
| Financial reports | 256–512 | 20% | Paragraph | Tables and figures need surrounding context |
| Meeting transcripts | 256–512 | 15% | Sentence | Speaker turns create natural boundaries |
| Research papers | 512–768 | 10% | Section | Abstract, methods, results are distinct sections |
| Email threads | 128–256 | 10% | Paragraph | Each message is a natural chunk |
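In code, the table above reduces to a lookup keyed by document type. The names and structure here are illustrative, not any particular tool's configuration schema:

```python
# Starting configurations from the table above; sizes are (min, max)
# token bounds, overlap is a fraction, and boundary is the fallback
# order for boundary detection.
CHUNK_CONFIGS = {
    "legal_contract": {"size": (256, 512), "overlap": 0.15, "boundary": ["paragraph"]},
    "policy_manual":  {"size": (512, 768), "overlap": 0.10, "boundary": ["section", "paragraph"]},
    "support_ticket": {"size": (128, 256), "overlap": 0.10, "boundary": ["sentence"]},
    "technical_doc":  {"size": (512, 1024), "overlap": 0.15, "boundary": ["section", "paragraph"]},
}

def config_for(doc_type):
    """Look up a starting configuration, defaulting to a safe baseline
    for unknown document types."""
    default = {"size": (256, 512), "overlap": 0.15, "boundary": ["sentence"]}
    return CHUNK_CONFIGS.get(doc_type, default)
```

Keeping the configuration as data rather than hard-coded constants is what makes per-document-type chunking practical in a single pipeline.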
Measuring Chunk Quality
Configuring chunk settings without measuring their impact is guesswork. You need a feedback loop: change settings, re-chunk, re-embed, and evaluate retrieval quality on a test set of queries.
The metrics that matter are retrieval precision (what percentage of retrieved chunks are actually relevant), retrieval recall (what percentage of relevant chunks were retrieved), and answer quality (does the generator produce correct, complete answers from the retrieved chunks).
A common failure mode is optimizing for precision alone. You can get perfect precision by making chunks extremely large — every chunk contains the answer because every chunk contains everything. But this wastes context window and degrades generator performance. The goal is the smallest chunks that still carry enough context for the generator to produce good answers.
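Precision and recall over a labeled query set reduce to set arithmetic on chunk identifiers. A minimal sketch, assuming you have ground-truth relevant chunks per query:

```python
def retrieval_metrics(retrieved, relevant):
    """Precision: fraction of retrieved chunks that are relevant.
    Recall: fraction of relevant chunks that were retrieved."""
    retrieved, relevant = set(retrieved), set(relevant)
    hits = len(retrieved & relevant)
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall
```

Averaging these over a representative query set after each re-chunk is the feedback loop described above; tracking both metrics together guards against the large-chunk precision trap.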
How Ertas Handles Chunking
The Ertas RAG Chunker node gives you direct control over chunk size, overlap percentage, and boundary detection method — sentence, paragraph, or section. You configure these settings per pipeline, which means you can use different chunking strategies for different document types within the same workflow.
The visual pipeline shows element counts at each stage, so you can immediately see how changing chunk size from 512 to 256 tokens roughly doubles your chunk count. This visibility makes it practical to experiment with settings and understand their impact before committing to a configuration.
After chunking, the Quality Scorer node validates chunk quality by checking for truncated sentences, overly short or long chunks, and content coherence. This catches configuration mistakes early — before bad chunks propagate through embedding and indexing.
Getting Started
If you are building a RAG pipeline and have not spent time on your chunking strategy, start here:
- Identify your primary document types and their structural characteristics.
- Choose a boundary detection method that matches your document structure.
- Set chunk size to 512 tokens as a baseline and adjust based on retrieval benchmarks.
- Start with 15% overlap and reduce only if storage costs are a concern.
- Measure retrieval precision and recall on a representative query set.
- Iterate on settings until retrieval quality meets your threshold.
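The iteration loop above can be sketched as a small parameter sweep. Here `rechunk_and_index` and `evaluate` are hypothetical stand-ins for your own pipeline and benchmark harness:

```python
def tune_chunking(rechunk_and_index, evaluate):
    """Sweep candidate chunk sizes and overlaps, returning the
    configuration with the best F1 (harmonic mean of precision and
    recall) on the test queries."""
    best_cfg, best_f1 = None, -1.0
    for size in (256, 512, 768):
        for overlap in (0.10, 0.15, 0.20):
            rechunk_and_index(size, overlap)      # re-chunk and re-embed
            precision, recall = evaluate()        # run the query set
            f1 = 2 * precision * recall / (precision + recall) \
                if (precision + recall) else 0.0
            if f1 > best_f1:
                best_cfg, best_f1 = (size, overlap), f1
    return best_cfg, best_f1
```

The candidate grid is deliberately coarse; re-chunking and re-embedding a large corpus is expensive, so start from the table's recommended range for your document type rather than an exhaustive search.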
The best RAG chunking strategy is not a universal configuration — it is a systematic process of matching chunk settings to document characteristics and validating with real queries. The teams that invest in this process consistently outperform those that treat chunking as an afterthought.
Turn unstructured data into AI-ready datasets — without it leaving the building.
On-premise data preparation with full audit trail. No data egress. No fragmented toolchains. EU AI Act Article 30 compliance built in.