    RAG Chunking Strategy Benchmark: Fixed-Size vs Semantic vs Document-Aware

    Controlled benchmark comparing five RAG chunking strategies — fixed-size, recursive, semantic, document-aware, and sliding window — across retrieval accuracy, latency, token efficiency, and best-fit use cases.

    Ertas Team

    Chunking is the single highest-leverage decision in any RAG pipeline. Get it right and your retrieval accuracy jumps 15-30 percentage points. Get it wrong and no amount of prompt engineering or model upgrades will compensate.

    Yet most teams pick a chunking strategy based on blog posts or defaults in their framework of choice, not on empirical data. This article provides that data. We benchmarked five chunking strategies across a standardized enterprise document corpus and measured what actually matters: retrieval accuracy, latency, token efficiency, and robustness across document types.

    The Five Strategies

    Before the numbers, a brief primer on each approach.

    Fixed-size chunking splits documents into chunks of a predetermined token count (typically 256-512 tokens) with optional overlap. It is the simplest approach and the default in most RAG frameworks. Every chunk is the same size regardless of content structure.
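    As a concrete illustration, the sketch below shows a minimal fixed-size chunker (not the benchmark's exact code), assuming the tiktoken library for tokenization; the 512-token size and 20% overlap defaults match the benchmark configuration described below.

```python
import tiktoken  # OpenAI's tokenizer library, assumed here for token counting

def fixed_size_chunks(text: str, chunk_tokens: int = 512, overlap: float = 0.2):
    """Split text into equal token-count chunks with fractional overlap."""
    enc = tiktoken.get_encoding("cl100k_base")
    tokens = enc.encode(text)
    step = max(1, int(chunk_tokens * (1 - overlap)))  # stride between chunk starts
    return [enc.decode(tokens[i:i + chunk_tokens])
            for i in range(0, len(tokens), step)]
```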

    Recursive character splitting uses a hierarchy of separators — paragraph breaks, then sentence boundaries, then word boundaries — to split documents at natural breakpoints while staying within a target chunk size. LangChain popularized this approach, and it remains the most commonly deployed strategy in production systems.
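    A minimal sketch using LangChain's splitter follows; the import path is from the langchain-text-splitters package (older releases expose it as langchain.text_splitter), and document_text is a placeholder for your input.

```python
from langchain_text_splitters import RecursiveCharacterTextSplitter

splitter = RecursiveCharacterTextSplitter(
    chunk_size=512,     # target size, measured in characters by default;
                        # pass length_function=... to count tokens instead
    chunk_overlap=100,  # roughly 20% overlap, matching the benchmark setup
    separators=["\n\n", "\n", ". ", " ", ""],  # paragraph -> sentence -> word
)
chunks = splitter.split_text(document_text)
```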

    Semantic chunking uses an embedding model to detect topic boundaries within a document. Adjacent sentences are grouped based on cosine similarity of their embeddings, and a new chunk starts when the similarity drops below a threshold. This produces variable-sized chunks that correspond to coherent topics.
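    The sketch below shows the core of this approach, assuming a black-box embed callable that returns one vector per sentence (any embedding model works) and an illustrative 0.75 similarity threshold.

```python
import numpy as np

def semantic_chunks(sentences, embed, threshold=0.75):
    """Start a new chunk wherever similarity between adjacent sentences drops."""
    vectors = np.asarray(embed(sentences), dtype=float)
    vectors /= np.linalg.norm(vectors, axis=1, keepdims=True)  # unit-normalize
    chunks, current = [], [sentences[0]]
    for i in range(1, len(sentences)):
        if float(vectors[i - 1] @ vectors[i]) < threshold:  # topic boundary
            chunks.append(" ".join(current))
            current = []
        current.append(sentences[i])
    chunks.append(" ".join(current))
    return chunks
```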

    Document-aware chunking leverages document structure — headings, sections, tables, lists — to define chunk boundaries. A section with its heading becomes one chunk. A table stays intact rather than being split mid-row. This requires a parser that understands document layout, not just raw text.
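    A sketch of the grouping logic, assuming an upstream layout parser that emits typed blocks (the dict shape here is hypothetical):

```python
def heading_chunks(blocks):
    """Each heading starts a new chunk; tables travel as single atomic blocks.

    `blocks` is assumed to be a list of dicts such as
    {"type": "heading" | "paragraph" | "table", "text": str}.
    """
    chunks, current = [], []
    for block in blocks:
        if block["type"] == "heading" and current:
            chunks.append("\n".join(current))  # close the previous section
            current = []
        current.append(block["text"])  # a table is appended whole, never split
    if current:
        chunks.append("\n".join(current))
    return chunks
```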

    Sliding window creates overlapping chunks at fixed intervals, where each chunk shares a percentage of tokens with its neighbors (typically 20-50% overlap). This ensures no information falls at a boundary, at the cost of increased index size and token usage.
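    In code, sliding window is simply the fixed-size chunker sketched above with a larger overlap fraction:

```python
# 50% overlap: each 512-token window shares 256 tokens with its neighbour,
# reusing the fixed_size_chunks sketch from the fixed-size section.
window_chunks = fixed_size_chunks(document_text, chunk_tokens=512, overlap=0.5)
```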

    Test Methodology

    We constructed a benchmark corpus from four enterprise document types:

    • Contracts (50 documents): Multi-party agreements with nested clauses, defined terms, and cross-references
    • Technical manuals (50 documents): Structured documentation with headings, code blocks, tables, and numbered procedures
    • Financial reports (50 documents): Annual reports with narrative sections, data tables, footnotes, and charts
    • Support tickets (50 documents): Unstructured text with short messages, timestamps, and mixed formatting

    For each document type, we created 100 ground-truth question-answer pairs where the answer exists within a specific passage. Retrieval accuracy is measured as Recall@5 — the percentage of queries where the correct passage appears in the top 5 retrieved chunks.
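    Concretely, the metric reduces to a per-query membership test, as in this sketch (chunk ids are assumed to map back to ground-truth passages):

```python
def recall_at_k(retrieved_chunk_ids, gold_passage_id, k=5):
    """1 if the chunk containing the gold passage is in the top k, else 0."""
    return int(gold_passage_id in retrieved_chunk_ids[:k])

# Averaged over all 400 question-answer pairs:
# recall = sum(recall_at_k(r, g) for r, g in eval_pairs) / len(eval_pairs)
```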

    • Embedding model: OpenAI text-embedding-3-large (3072 dimensions)
    • Vector store: Qdrant with HNSW indexing
    • Chunk target size: 512 tokens (where applicable)
    • Overlap: 20% for fixed-size; 50% for sliding window

    All strategies were tested on identical hardware (32-core CPU, 64GB RAM) with the same embedding model and vector store configuration.
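    For reference, the vector store setup resembles the following sketch using the qdrant-client package; the collection name and URL are placeholders, not the benchmark's actual values.

```python
from qdrant_client import QdrantClient
from qdrant_client.models import Distance, VectorParams

client = QdrantClient(url="http://localhost:6333")  # placeholder endpoint
client.create_collection(
    collection_name="chunking-benchmark",  # placeholder name
    vectors_config=VectorParams(
        size=3072,                 # text-embedding-3-large dimensionality
        distance=Distance.COSINE,  # HNSW is Qdrant's default index type
    ),
)
```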

    Benchmark Results

| Strategy | Retrieval Accuracy (Recall@5) | Avg Latency (ms) | Token Efficiency | Index Size (relative) |
| --- | --- | --- | --- | --- |
| Fixed-size (512 tokens) | 71.3% | 12 | 1.0x (baseline) | 1.0x |
| Recursive character | 78.6% | 14 | 1.05x | 1.02x |
| Semantic | 83.2% | 38 | 0.92x | 0.95x |
| Document-aware | 86.7% | 16 | 0.88x | 0.91x |
| Sliding window (50% overlap) | 76.8% | 13 | 1.82x | 1.45x |

    The results tell a clear story. Document-aware chunking achieves the highest retrieval accuracy (86.7%) while also being the most token-efficient. Semantic chunking comes close in accuracy (83.2%) but at significantly higher latency due to the embedding-based boundary detection during indexing. Fixed-size chunking, despite being the most common default, ranks last in retrieval accuracy.

    Results by Document Type

    The aggregate numbers mask important differences across document types.

| Strategy | Contracts | Technical Manuals | Financial Reports | Support Tickets |
| --- | --- | --- | --- | --- |
| Fixed-size | 64.0% | 73.0% | 68.0% | 80.0% |
| Recursive character | 72.0% | 81.0% | 76.0% | 85.0% |
| Semantic | 80.0% | 84.0% | 82.0% | 87.0% |
| Document-aware | 89.0% | 91.0% | 88.0% | 78.0% |
| Sliding window | 70.0% | 79.0% | 74.0% | 84.0% |

    Document-aware chunking dominates on structured documents (contracts, manuals, reports) where heading and section boundaries carry semantic meaning. However, it underperforms on support tickets — unstructured, short-form text with no reliable document structure to leverage. For unstructured content, semantic chunking is the strongest performer.

    This is the key insight: the best chunking strategy depends on your document mix. Teams processing primarily structured enterprise documents (contracts, reports, manuals) should default to document-aware chunking. Teams handling unstructured or mixed-format content benefit more from semantic chunking.

    Latency Breakdown

    Latency in the table above measures query-time retrieval latency, not indexing time. Indexing latency differences are more dramatic:

| Strategy | Indexing Time (200 docs) | Indexing Time (10K docs) |
| --- | --- | --- |
| Fixed-size | 4 min | 3.2 hrs |
| Recursive character | 5 min | 3.8 hrs |
| Semantic | 22 min | 18.4 hrs |
| Document-aware | 8 min | 6.1 hrs |
| Sliding window | 6 min | 4.8 hrs |

    Semantic chunking's indexing time is roughly three to six times longer than the alternatives (5.5x versus fixed-size, 3x versus document-aware) because it must embed every sentence to detect topic boundaries. For pipelines that re-index frequently or process high volumes, this cost adds up. Document-aware chunking requires a capable document parser but avoids the embedding overhead during indexing.

    Token Efficiency and Cost Implications

    Token efficiency measures how many tokens are consumed per query when retrieving context. Sliding window's 1.82x overhead means nearly double the embedding and LLM context costs compared to fixed-size chunking.

    At enterprise scale (10,000 queries per day), the cost differences are meaningful:

| Strategy | Monthly Embedding Cost | Monthly LLM Context Cost | Total Monthly Overhead |
| --- | --- | --- | --- |
| Fixed-size | $450 | $1,200 | $1,650 (baseline) |
| Recursive character | $473 | $1,260 | $1,733 |
| Semantic | $414 | $1,104 | $1,518 |
| Document-aware | $396 | $1,056 | $1,452 |
| Sliding window | $819 | $2,184 | $3,003 |
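    These totals follow directly from the token-efficiency multipliers in the first table applied to the fixed-size baseline: sliding window's 1.82x of $1,650 gives $3,003 per month, while document-aware's 0.88x gives $1,452.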

    Document-aware chunking is not only the most accurate but also the cheapest to operate at scale. Sliding window — often recommended as a "safe default" — is the most expensive, more than twice the cost of document-aware chunking ($3,003 vs. $1,452 per month) for lower accuracy.

    When to Use Each Strategy

    Fixed-size (512 tokens): Prototyping and rapid iteration where simplicity matters more than accuracy. Acceptable for homogeneous, paragraph-level content like blog posts or wiki articles. Not recommended for production enterprise RAG.

    Recursive character: A reasonable default when you need better-than-fixed accuracy without the complexity of semantic or document-aware parsing. Good for teams just starting with RAG who want incremental improvement over fixed-size.

    Semantic: Best for unstructured content where document layout provides no useful signal — customer emails, chat logs, social media, support tickets. The indexing latency penalty makes it less suitable for high-volume pipelines with frequent re-indexing.

    Document-aware: The clear winner for structured enterprise documents — contracts, reports, manuals, policies, specifications. Requires a parser that understands document structure (headings, tables, sections), but the accuracy and cost benefits justify the investment.

    Sliding window: Useful only when you cannot tolerate any information loss at chunk boundaries and are willing to pay the token overhead. Consider it for safety-critical applications where missing a passage is more costly than higher operating expenses.

    Implementation Considerations

    Choosing a strategy is only part of the challenge. Implementation details matter significantly:

    Chunk size selection. Even within a strategy, chunk size dramatically affects performance. Our testing showed a sweet spot between 256 and 768 tokens for most enterprise documents. Chunks smaller than 200 tokens lose context; chunks larger than 1,000 tokens dilute relevance.

    Metadata preservation. Regardless of strategy, attaching metadata (document title, section heading, page number) to each chunk improves retrieval by 8-12% in our tests. This metadata enables hybrid search and provides context for reranking.
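    As a sketch, a chunk record might carry that metadata in its payload; the field names and values here are illustrative, not a fixed schema.

```python
# Illustrative chunk record; `chunk_text` comes from whichever chunker ran.
chunk_record = {
    "text": chunk_text,
    "metadata": {
        "title": "Master Services Agreement",      # document title
        "section": "7.2 Limitation of Liability",  # nearest heading
        "page": 14,                                # source page number
    },
}
```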

    Hybrid approaches. The highest-performing production systems we have observed combine document-aware chunking for structured content with semantic chunking as a fallback for unstructured sections. This requires a document classifier upstream in the pipeline but achieves 89-92% Recall@5 across mixed corpora.
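    The routing itself can be as simple as the sketch below, reusing the heading_chunks and semantic_chunks functions sketched earlier; has_structure stands in for the upstream document classifier and split_sentences for a sentence tokenizer, both assumptions of this sketch.

```python
def chunk_document(doc):
    # Structured documents get document-aware chunking; everything else
    # falls back to semantic chunking. `has_structure` is a stand-in for
    # the upstream document classifier mentioned above.
    if has_structure(doc):
        return heading_chunks(doc.blocks)
    return semantic_chunks(split_sentences(doc.text), embed)
```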

    How Ertas Approaches Chunking

    Ertas Data Suite includes a RAG Chunker node that supports multiple chunking strategies within the visual pipeline canvas. Because Ertas processes documents through structured parsing nodes (PDF Parser, Word Parser, Excel/CSV Parser) before chunking, the document structure — headings, tables, sections — is already extracted and available.

    This makes document-aware chunking a natural fit. The RAG Chunker node receives parsed, structured output from upstream nodes and can leverage that structure to define chunk boundaries. Teams can also chain the Quality Scorer node after chunking to flag low-quality chunks before they reach the embedding stage.

    For teams processing mixed document types, Ertas pipelines can route structured and unstructured documents through different chunking configurations on the same canvas, with full observability at every stage.

    Key Takeaways

    Document-aware chunking achieves the highest retrieval accuracy (86.7% Recall@5) and the best token efficiency across structured enterprise documents. Semantic chunking is the strongest choice for unstructured content but carries a significant indexing latency penalty. Fixed-size chunking, while simple, leaves 15+ percentage points of accuracy on the table compared to document-aware approaches.

    The choice of chunking strategy has a direct, measurable impact on both RAG quality and operating costs. Teams building enterprise RAG pipelines should benchmark against their own document corpus, but the data suggests that investing in document-aware parsing and chunking pays for itself quickly — in both retrieval accuracy and reduced token spend.
