Chunking Guidelines


A breakdown of how to configure chunking settings for optimal performance in search, retrieval, summarization, and embedding pipelines.


What is Chunking?

Chunking breaks down large documents into smaller, manageable pieces that can be processed more effectively by AI models. This is essential for search and retrieval systems, embedding generation, and staying within model token limits.


Chunking Parameters Explained

Setting                          Description
Chunk Size                       Max size (in tokens or characters) for each chunk
Chunk Overlap                    Overlapping size between chunks to preserve context
Minimum Characters per Sentence  Prevents splitting tiny, noisy sentences
Minimum Sentences per Chunk      Prevents creation of unhelpful, short fragments

Chunk Size

Controls the granularity of each chunk.

Use Case              Recommended Chunk Size
Document Search       256–512
Chatbot Memory        128–256
Legal/Financial Docs  512–768
Code Embedding        128–256
Academic Papers       384–768

Example:
Chunk Size = 512 → Includes ~2–3 paragraphs for better semantic understanding.


Chunk Overlap

Repeats a span of tokens or characters at the boundary between adjacent chunks so that context carries over from one chunk to the next.

Use Case                Recommended Overlap
Question-Answer Bots    64–128
General Search          32–64
RAG Pipelines           128+
Speed-Critical Systems  0 (no overlap)

Example:
Chunk Size = 256, Overlap = 64 → Smooth "sliding window."
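
To make the sliding window concrete, here is a minimal sketch in Python. It rests on two assumptions that are not stated in this guide: tokens are approximated by whitespace-separated words (a real pipeline would use the embedding model's tokenizer), and the overlap counts toward the chunk size, so each chunk holds at most chunk_size tokens. Some chunkers instead add the overlap on top of the chunk size. The function name sliding_window_chunks is hypothetical.

```python
from typing import List


def sliding_window_chunks(tokens: List[str], chunk_size: int, overlap: int) -> List[List[str]]:
    """Split tokens into windows of at most chunk_size that share `overlap` tokens.

    Assumption: the overlap is part of the chunk, not added on top of it.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must satisfy 0 <= overlap < chunk_size")

    step = chunk_size - overlap  # how far the window advances each time
    chunks: List[List[str]] = []
    for start in range(0, len(tokens), step):
        chunks.append(tokens[start:start + chunk_size])
        if start + chunk_size >= len(tokens):
            break  # the window already reached the end of the input
    return chunks


# Whitespace words stand in for real tokens in this illustration.
words = "one two three four five six seven eight nine ten".split()
windows = sliding_window_chunks(words, chunk_size=4, overlap=1)
```

With chunk_size=4 and overlap=1 the window advances three words at a time, so the last word of each chunk is repeated as the first word of the next.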


Minimum Characters per Sentence

Filters out fragmented or irrelevant sentences.

Use Case         Recommended Min Characters
General Text     50–64
Technical Notes  30–40
Spoken Dialogue  15–20

Ensures each sentence carries meaningful information.
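
One way a minimum-characters threshold could be applied is sketched below. This is an illustration under assumptions, not the behavior of any particular chunker: it splits sentences naively on ., ! and ? and merges any fragment shorter than min_chars into its neighbor rather than dropping it. The function name split_sentences is hypothetical.

```python
import re
from typing import List


def split_sentences(text: str, min_chars: int) -> List[str]:
    """Naive sentence splitter that merges fragments shorter than min_chars.

    Assumption: short fragments are merged forward, not discarded.
    """
    # Split after sentence-ending punctuation followed by whitespace.
    parts = [p.strip() for p in re.split(r"(?<=[.!?])\s+", text) if p.strip()]

    sentences: List[str] = []
    buffer = ""
    for part in parts:
        buffer = f"{buffer} {part}".strip()
        if len(buffer) >= min_chars:
            sentences.append(buffer)
            buffer = ""

    # A trailing short fragment is attached to the last sentence, if any.
    if buffer:
        if sentences:
            sentences[-1] = f"{sentences[-1]} {buffer}"
        else:
            sentences.append(buffer)
    return sentences


dialogue = "Ok. Sure. The deployment finished without any errors today. Yes."
merged = split_sentences(dialogue, min_chars=20)
unfiltered = split_sentences(dialogue, min_chars=1)
```

With min_chars=20 the short replies are folded into the surrounding sentence; with min_chars=1 every fragment stays separate.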


Minimum Sentences per Chunk

Prevents tiny, contextless chunks.

Use Case              Recommended Min Sentences
Support Logs          2–3
Wikipedia-style Text  1
Transcripts           3–5

A value of 1 allows flexible chunking; 3 or more ensures that each chunk holds a more coherent unit of thought.
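
The interaction between chunk size and a sentence minimum can be sketched as follows. The sketch assumes that chunk size is measured in characters, that the sentence minimum takes priority over the size limit, and that a too-short final group is merged into the previous chunk; a real chunker may resolve these conflicts differently. The name group_sentences is hypothetical.

```python
from typing import List


def group_sentences(sentences: List[str], chunk_size: int, min_sentences: int) -> List[List[str]]:
    """Group sentences into chunks of roughly chunk_size characters.

    Assumption: a chunk is only closed once it holds at least min_sentences,
    even if that pushes it past chunk_size.
    """
    chunks: List[List[str]] = []
    current: List[str] = []
    current_len = 0

    for sentence in sentences:
        would_overflow = current_len + len(sentence) > chunk_size
        if current and would_overflow and len(current) >= min_sentences:
            chunks.append(current)
            current, current_len = [], 0
        current.append(sentence)
        current_len += len(sentence)

    # Merge a too-short tail into the previous chunk instead of emitting it alone.
    if current:
        if chunks and len(current) < min_sentences:
            chunks[-1].extend(current)
        else:
            chunks.append(current)
    return chunks


sents = ["aaaa", "bbbb", "cccc", "dddd", "eeee"]
flexible = group_sentences(sents, chunk_size=5, min_sentences=1)
coherent = group_sentences(sents, chunk_size=5, min_sentences=2)
```

Here min_sentences=1 yields five single-sentence chunks, while min_sentences=2 yields two larger chunks and no orphaned sentence.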


Use Case Presets (Quick Reference)

Use Case                 Chunk Size  Overlap  Min Sentences  Notes
Chatbots (FAQ)           256         64       1              Fast + good recall
Search on Long Docs      512         128      2              Preserve topic continuity
RAG Pipeline for QA      384         128      3              Chunks remain LLM-friendly
Real-Time Summarization  128         32       1              Keeps results concise
Support Transcripts      512         64       3              Captures complete interactions
Legal/Policy Documents   768         128      2              Avoids mid-clause cuts
Coding Documentation     256         64       1              Logical function or block splits
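
If you keep these presets in code, one possible representation is a small lookup table. The values below are copied from the quick reference above; the class name ChunkingPreset, the dictionary keys, and the helper get_preset are hypothetical names chosen for this sketch.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass(frozen=True)
class ChunkingPreset:
    chunk_size: int
    overlap: int
    min_sentences: int


# Keys are illustrative labels, not identifiers from any tool.
PRESETS: Dict[str, ChunkingPreset] = {
    "chatbot_faq": ChunkingPreset(256, 64, 1),
    "long_doc_search": ChunkingPreset(512, 128, 2),
    "rag_qa": ChunkingPreset(384, 128, 3),
    "realtime_summarization": ChunkingPreset(128, 32, 1),
    "support_transcripts": ChunkingPreset(512, 64, 3),
    "legal_policy": ChunkingPreset(768, 128, 2),
    "coding_docs": ChunkingPreset(256, 64, 1),
}


def get_preset(name: str) -> ChunkingPreset:
    """Return a preset by label, with a readable error for unknown labels."""
    try:
        return PRESETS[name]
    except KeyError:
        known = ", ".join(sorted(PRESETS))
        raise ValueError(f"unknown preset {name!r}; expected one of: {known}") from None


rag = get_preset("rag_qa")
```

Treat a preset as a starting point and override individual fields once you have measured retrieval quality on your own content.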

Embedding Model Considerations

Different embedding models have different token limits that affect your chunking strategy.

Model-Specific Token Limits

Model Family                  Max Tokens  Recommended Chunk Size  Best For
E5 Models (small/base/large)  512         384–448                 General purpose, multilingual
ModernBERT Models             512         384–448                 BERT-based tasks
Snowflake Arctic Embed        8192        768–2048                Long documents, transcripts

Avoiding Truncation

Critical: Ensure your chunk size + overlap does not exceed the model's maximum token limit.

  • Exceeding limits causes truncation at the embedding stage
  • Truncation leads to context loss and reduced semantic understanding
  • Always verify: chunk_size + overlap ≤ model_max_tokens

Example for E5 models:
Chunk Size = 400, Overlap = 100 → Total = 500 tokens (within 512 limit)
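
The check above is simple enough to automate before indexing. The sketch below applies the rule exactly as stated in this section (chunk size plus overlap compared against the model limit). The limits are taken from the table in this section; the dictionary keys and the function name fits_model are hypothetical.

```python
from typing import Dict

# Limits as listed in this guide; keys are illustrative labels only.
MODEL_MAX_TOKENS: Dict[str, int] = {
    "e5": 512,
    "modernbert": 512,
    "snowflake-arctic-embed": 8192,
}


def fits_model(chunk_size: int, overlap: int, model: str) -> bool:
    """Return True when chunk_size + overlap stays within the model's token limit."""
    if chunk_size <= 0 or overlap < 0:
        raise ValueError("chunk_size must be positive and overlap non-negative")
    return chunk_size + overlap <= MODEL_MAX_TOKENS[model]


ok = fits_model(400, 100, "e5")         # 500 tokens against a 512 limit
too_big = fits_model(448, 128, "e5")    # 576 tokens against a 512 limit
```

Running this check when a configuration is loaded surfaces a truncation risk before any embeddings are generated.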


Configuration Guidelines

Choosing Settings

  • Start with the use case presets, then adjust based on your specific content and performance requirements
  • Balance chunk size and overlap to maintain context while avoiding redundancy
  • Consider your content type when setting minimum sentence and character thresholds
  • Account for embedding model token limits to avoid truncation and context loss

Common Considerations

  • Smaller chunks: Better precision, may lose context
  • Larger chunks: Better context, may exceed model limits
  • More overlap: Better context preservation, more storage/processing
  • Less overlap: Faster processing, potential context loss at boundaries
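
The storage cost of overlap can be estimated with a little arithmetic. The sketch below assumes a sliding window in which the overlap counts toward the chunk size, so each new chunk starts chunk_size - overlap tokens after the previous one; the function name chunk_count is hypothetical.

```python
import math


def chunk_count(total_tokens: int, chunk_size: int, overlap: int) -> int:
    """Number of chunks a sliding window produces for a document.

    Assumption: the window advances by chunk_size - overlap tokens per chunk.
    """
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must satisfy 0 <= overlap < chunk_size")
    if total_tokens <= 0:
        return 0
    if total_tokens <= chunk_size:
        return 1
    stride = chunk_size - overlap
    return math.ceil((total_tokens - chunk_size) / stride) + 1


no_overlap = chunk_count(10_000, chunk_size=256, overlap=0)
with_overlap = chunk_count(10_000, chunk_size=256, overlap=64)
```

Under these assumptions, a 10,000-token document yields 40 chunks without overlap and 52 chunks with an overlap of 64, which is 30% more embeddings to compute and store.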