    RAG Chunking Strategy Benchmark: Fixed-Size vs Semantic vs Document-Aware

    Controlled benchmark comparing five RAG chunking strategies — fixed-size, recursive, semantic, document-aware, and sliding window — across retrieval accuracy, latency, token efficiency, and best-fit use cases.

    Ertas Team

    Chunking is the single highest-leverage decision in any RAG pipeline. Get it right and your retrieval accuracy jumps 15-30 percentage points. Get it wrong and no amount of prompt engineering or model upgrades will compensate.

    Yet most teams pick a chunking strategy based on blog posts or defaults in their framework of choice, not on empirical data. This article provides that data. We benchmarked five chunking strategies across a standardized enterprise document corpus and measured what actually matters: retrieval accuracy, latency, token efficiency, and robustness across document types.

    The Five Strategies

    Before the numbers, a brief primer on each approach.

    Fixed-size chunking splits documents into chunks of a predetermined token count (typically 256-512 tokens) with optional overlap. It is the simplest approach and the default in most RAG frameworks. Every chunk is the same size regardless of content structure.
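    As a concrete illustration, the sketch below shows a minimal fixed-size chunker (not the benchmark's exact code), assuming the tiktoken library for tokenization; the 512-token size and 20% overlap defaults match the benchmark configuration described below.

```python
import tiktoken  # OpenAI's tokenizer library, assumed here for token counting

def fixed_size_chunks(text: str, chunk_tokens: int = 512, overlap: float = 0.2):
    """Split text into equal token-count chunks with fractional overlap."""
    enc = tiktoken.get_encoding("cl100k_base")
    tokens = enc.encode(text)
    step = max(1, int(chunk_tokens * (1 - overlap)))  # stride between chunk starts
    return [enc.decode(tokens[i:i + chunk_tokens])
            for i in range(0, len(tokens), step)]
```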

    Recursive character splitting uses a hierarchy of separators — paragraph breaks, then sentence boundaries, then word boundaries — to split documents at natural breakpoints while staying within a target chunk size. LangChain popularized this approach, and it remains the most commonly deployed strategy in production systems.
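    A minimal sketch using LangChain's splitter follows; the import path is from the langchain-text-splitters package (older releases expose it as langchain.text_splitter), and document_text is a placeholder for your input.

```python
from langchain_text_splitters import RecursiveCharacterTextSplitter

splitter = RecursiveCharacterTextSplitter(
    chunk_size=512,     # target size, measured in characters by default;
                        # pass length_function=... to count tokens instead
    chunk_overlap=100,  # roughly 20% overlap, matching the benchmark setup
    separators=["\n\n", "\n", ". ", " ", ""],  # paragraph -> sentence -> word
)
chunks = splitter.split_text(document_text)
```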

    Semantic chunking uses an embedding model to detect topic boundaries within a document. Adjacent sentences are grouped based on cosine similarity of their embeddings, and a new chunk starts when the similarity drops below a threshold. This produces variable-sized chunks that correspond to coherent topics.
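    The sketch below shows the core of this approach, assuming a black-box embed callable that returns one vector per sentence (any embedding model works) and an illustrative 0.75 similarity threshold.

```python
import numpy as np

def semantic_chunks(sentences, embed, threshold=0.75):
    """Start a new chunk wherever similarity between adjacent sentences drops."""
    vectors = np.asarray(embed(sentences), dtype=float)
    vectors /= np.linalg.norm(vectors, axis=1, keepdims=True)  # unit-normalize
    chunks, current = [], [sentences[0]]
    for i in range(1, len(sentences)):
        if float(vectors[i - 1] @ vectors[i]) < threshold:  # topic boundary
            chunks.append(" ".join(current))
            current = []
        current.append(sentences[i])
    chunks.append(" ".join(current))
    return chunks
```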

    Document-aware chunking leverages document structure — headings, sections, tables, lists — to define chunk boundaries. A section with its heading becomes one chunk. A table stays intact rather than being split mid-row. This requires a parser that understands document layout, not just raw text.
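    A sketch of the grouping logic, assuming an upstream layout parser that emits typed blocks (the dict shape here is hypothetical):

```python
def heading_chunks(blocks):
    """Each heading starts a new chunk; tables travel as single atomic blocks.

    `blocks` is assumed to be a list of dicts such as
    {"type": "heading" | "paragraph" | "table", "text": str}.
    """
    chunks, current = [], []
    for block in blocks:
        if block["type"] == "heading" and current:
            chunks.append("\n".join(current))  # close the previous section
            current = []
        current.append(block["text"])  # a table is appended whole, never split
    if current:
        chunks.append("\n".join(current))
    return chunks
```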

    Sliding window creates overlapping chunks at fixed intervals, where each chunk shares a percentage of tokens with its neighbors (typically 20-50% overlap). This ensures no information falls at a boundary, at the cost of increased index size and token usage.
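    In code, sliding window is simply the fixed-size chunker sketched above with a larger overlap fraction:

```python
# 50% overlap: each 512-token window shares 256 tokens with its neighbour,
# reusing the fixed_size_chunks sketch from the fixed-size section.
window_chunks = fixed_size_chunks(document_text, chunk_tokens=512, overlap=0.5)
```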

    Test Methodology

    We constructed a benchmark corpus from four enterprise document types:

    • Contracts (50 documents): Multi-party agreements with nested clauses, defined terms, and cross-references
    • Technical manuals (50 documents): Structured documentation with headings, code blocks, tables, and numbered procedures
    • Financial reports (50 documents): Annual reports with narrative sections, data tables, footnotes, and charts
    • Support tickets (50 documents): Unstructured text with short messages, timestamps, and mixed formatting

    For each document type, we created 100 ground-truth question-answer pairs where the answer exists within a specific passage. Retrieval accuracy is measured as Recall@5 — the percentage of queries where the correct passage appears in the top 5 retrieved chunks.
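    Concretely, the metric reduces to a per-query membership test, as in this sketch (chunk ids are assumed to map back to ground-truth passages):

```python
def recall_at_k(retrieved_chunk_ids, gold_passage_id, k=5):
    """1 if the chunk containing the gold passage is in the top k, else 0."""
    return int(gold_passage_id in retrieved_chunk_ids[:k])

# Averaged over all 400 question-answer pairs:
# recall = sum(recall_at_k(r, g) for r, g in eval_pairs) / len(eval_pairs)
```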

    • Embedding model: OpenAI text-embedding-3-large (3072 dimensions)
    • Vector store: Qdrant with HNSW indexing
    • Chunk target size: 512 tokens (where applicable)
    • Overlap: 20% for fixed-size; 50% for sliding window

    All strategies were tested on identical hardware (32-core CPU, 64GB RAM) with the same embedding model and vector store configuration.
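    For reference, the vector store setup resembles the following sketch using the qdrant-client package; the collection name and URL are placeholders, not the benchmark's actual values.

```python
from qdrant_client import QdrantClient
from qdrant_client.models import Distance, VectorParams

client = QdrantClient(url="http://localhost:6333")  # placeholder endpoint
client.create_collection(
    collection_name="chunking-benchmark",  # placeholder name
    vectors_config=VectorParams(
        size=3072,                 # text-embedding-3-large dimensionality
        distance=Distance.COSINE,  # HNSW is Qdrant's default index type
    ),
)
```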

    Benchmark Results

| Strategy | Retrieval Accuracy (Recall@5) | Avg Latency (ms) | Token Efficiency | Index Size (relative) |
| --- | --- | --- | --- | --- |
| Fixed-size (512 tokens) | 71.3% | 12 | 1.0x (baseline) | 1.0x |
| Recursive character | 78.6% | 14 | 1.05x | 1.02x |
| Semantic | 83.2% | 38 | 0.92x | 0.95x |
| Document-aware | 86.7% | 16 | 0.88x | 0.91x |
| Sliding window (50% overlap) | 76.8% | 13 | 1.82x | 1.45x |

    The results tell a clear story. Document-aware chunking achieves the highest retrieval accuracy (86.7%) while also being the most token-efficient. Semantic chunking comes close in accuracy (83.2%) but at significantly higher latency due to the embedding-based boundary detection during indexing. Fixed-size chunking, despite being the most common default, ranks last in retrieval accuracy.

    Results by Document Type

    The aggregate numbers mask important differences across document types.

| Strategy | Contracts | Technical Manuals | Financial Reports | Support Tickets |
| --- | --- | --- | --- | --- |
| Fixed-size | 64.0% | 73.0% | 68.0% | 80.0% |
| Recursive character | 72.0% | 81.0% | 76.0% | 85.0% |
| Semantic | 80.0% | 84.0% | 82.0% | 87.0% |
| Document-aware | 89.0% | 91.0% | 88.0% | 78.0% |
| Sliding window | 70.0% | 79.0% | 74.0% | 84.0% |

    Document-aware chunking dominates on structured documents (contracts, manuals, reports) where heading and section boundaries carry semantic meaning. However, it underperforms on support tickets — unstructured, short-form text with no reliable document structure to leverage. For unstructured content, semantic chunking is the strongest performer.

    This is the key insight: the best chunking strategy depends on your document mix. Teams processing primarily structured enterprise documents (contracts, reports, manuals) should default to document-aware chunking. Teams handling unstructured or mixed-format content benefit more from semantic chunking.

    Latency Breakdown

    Latency in the table above measures query-time retrieval latency, not indexing time. Indexing latency differences are more dramatic:

| Strategy | Indexing Time (200 docs) | Indexing Time (10K docs) |
| --- | --- | --- |
| Fixed-size | 4 min | 3.2 hrs |
| Recursive character | 5 min | 3.8 hrs |
| Semantic | 22 min | 18.4 hrs |
| Document-aware | 8 min | 6.1 hrs |
| Sliding window | 6 min | 4.8 hrs |

    Semantic chunking's indexing time is roughly three to six times longer than the alternatives (5.5x versus fixed-size, 3x versus document-aware) because it must embed every sentence to detect topic boundaries. For pipelines that re-index frequently or process high volumes, this cost adds up. Document-aware chunking requires a capable document parser but avoids the embedding overhead during indexing.

    Token Efficiency and Cost Implications

    Token efficiency measures how many tokens are consumed per query when retrieving context. Sliding window's 1.82x overhead means nearly double the embedding and LLM context costs compared to fixed-size chunking.

    At enterprise scale (10,000 queries per day), the cost differences are meaningful:

| Strategy | Monthly Embedding Cost | Monthly LLM Context Cost | Total Monthly Overhead |
| --- | --- | --- | --- |
| Fixed-size | $450 | $1,200 | $1,650 (baseline) |
| Recursive character | $473 | $1,260 | $1,733 |
| Semantic | $414 | $1,104 | $1,518 |
| Document-aware | $396 | $1,056 | $1,452 |
| Sliding window | $819 | $2,184 | $3,003 |
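    These totals follow directly from the token-efficiency multipliers in the first table applied to the fixed-size baseline: sliding window's 1.82x of $1,650 gives $3,003 per month, while document-aware's 0.88x gives $1,452.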

    Document-aware chunking is not only the most accurate but also the cheapest to operate at scale. Sliding window — often recommended as a "safe default" — is the most expensive, more than twice the cost of document-aware chunking ($3,003 vs. $1,452 per month) for lower accuracy.

    When to Use Each Strategy

    Fixed-size (512 tokens): Prototyping and rapid iteration where simplicity matters more than accuracy. Acceptable for homogeneous, paragraph-level content like blog posts or wiki articles. Not recommended for production enterprise RAG.

    Recursive character: A reasonable default when you need better-than-fixed accuracy without the complexity of semantic or document-aware parsing. Good for teams just starting with RAG who want incremental improvement over fixed-size.

    Semantic: Best for unstructured content where document layout provides no useful signal — customer emails, chat logs, social media, support tickets. The indexing latency penalty makes it less suitable for high-volume pipelines with frequent re-indexing.

    Document-aware: The clear winner for structured enterprise documents — contracts, reports, manuals, policies, specifications. Requires a parser that understands document structure (headings, tables, sections), but the accuracy and cost benefits justify the investment.

    Sliding window: Useful only when you cannot tolerate any information loss at chunk boundaries and are willing to pay the token overhead. Consider it for safety-critical applications where missing a passage is more costly than higher operating expenses.

    Implementation Considerations

    Choosing a strategy is only part of the challenge. Implementation details matter significantly:

    Chunk size selection. Even within a strategy, chunk size dramatically affects performance. Our testing showed a sweet spot between 256 and 768 tokens for most enterprise documents. Chunks smaller than 200 tokens lose context; chunks larger than 1,000 tokens dilute relevance.

    Metadata preservation. Regardless of strategy, attaching metadata (document title, section heading, page number) to each chunk improves retrieval by 8-12% in our tests. This metadata enables hybrid search and provides context for reranking.
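    As a sketch, a chunk record might carry that metadata in its payload; the field names and values here are illustrative, not a fixed schema.

```python
# Illustrative chunk record; `chunk_text` comes from whichever chunker ran.
chunk_record = {
    "text": chunk_text,
    "metadata": {
        "title": "Master Services Agreement",      # document title
        "section": "7.2 Limitation of Liability",  # nearest heading
        "page": 14,                                # source page number
    },
}
```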

    Hybrid approaches. The highest-performing production systems we have observed combine document-aware chunking for structured content with semantic chunking as a fallback for unstructured sections. This requires a document classifier upstream in the pipeline but achieves 89-92% Recall@5 across mixed corpora.
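    The routing itself can be as simple as the sketch below, reusing the heading_chunks and semantic_chunks functions sketched earlier; has_structure stands in for the upstream document classifier and split_sentences for a sentence tokenizer, both assumptions of this sketch.

```python
def chunk_document(doc):
    # Structured documents get document-aware chunking; everything else
    # falls back to semantic chunking. `has_structure` is a stand-in for
    # the upstream document classifier mentioned above.
    if has_structure(doc):
        return heading_chunks(doc.blocks)
    return semantic_chunks(split_sentences(doc.text), embed)
```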

    How Ertas Approaches Chunking

    Ertas Data Suite includes a RAG Chunker node that supports multiple chunking strategies within the visual pipeline canvas. Because Ertas processes documents through structured parsing nodes (PDF Parser, Word Parser, Excel/CSV Parser) before chunking, the document structure — headings, tables, sections — is already extracted and available.

    This makes document-aware chunking a natural fit. The RAG Chunker node receives parsed, structured output from upstream nodes and can leverage that structure to define chunk boundaries. Teams can also chain the Quality Scorer node after chunking to flag low-quality chunks before they reach the embedding stage.

    For teams processing mixed document types, Ertas pipelines can route structured and unstructured documents through different chunking configurations on the same canvas, with full observability at every stage.

    Key Takeaways

    Document-aware chunking achieves the highest retrieval accuracy (86.7% Recall@5) and the best token efficiency across structured enterprise documents. Semantic chunking is the strongest choice for unstructured content but carries a significant indexing latency penalty. Fixed-size chunking, while simple, leaves 15+ percentage points of accuracy on the table compared to document-aware approaches.

    The choice of chunking strategy has a direct, measurable impact on both RAG quality and operating costs. Teams building enterprise RAG pipelines should benchmark against their own document corpus, but the data suggests that investing in document-aware parsing and chunking pays for itself quickly — in both retrieval accuracy and reduced token spend.
