Node parsers transform documents into smaller chunks (nodes) that are optimized for embedding and retrieval. Effective chunking is critical for RAG performance.
Why Chunking Matters
Chunking breaks large documents into smaller pieces because:
- Embedding models have token limits - Most models work best with 512-2048 tokens
- Better semantic granularity - Smaller chunks provide more precise retrieval
- Improved context relevance - Return only the most relevant sections to the LLM
- Efficient processing - Easier to embed and index smaller text segments
SentenceSplitter
The most commonly used parser. It splits text while respecting sentence boundaries.
Basic Usage
Configuration Options
How It Works
- Paragraph splitting: First tries to split by paragraph separators
- Sentence splitting: Uses a sentence tokenizer to find sentence boundaries
- Regex fallback: If sentences are too long, falls back to a secondary regex
- Word splitting: As a final fallback, splits by words
- Chunk merging: Combines splits into chunks of up to chunkSize, with chunkOverlap shared between consecutive chunks
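The steps above can be sketched in plain TypeScript. This is an illustrative approximation and not the library's implementation: sizes are measured in characters rather than tokens, the regexes are naive stand-ins for a real sentence tokenizer, and the names `splitRecursively`, `mergeWithOverlap`, and `chunkText` are hypothetical.

```typescript
type Splitter = (text: string) => string[];

// Ordered from coarsest to finest, mirroring the fallback order in the list above.
const splitters: Splitter[] = [
  (t) => t.split(/\n\s*\n/),                          // paragraphs
  (t) => t.match(/[^.!?]+(?:[.!?]+|$)\s*/g) ?? [t],   // sentences (naive regex)
  (t) => t.match(/[^,;:]+(?:[,;:]+|$)\s*/g) ?? [t],   // secondary regex: clauses
  (t) => t.split(/\s+/),                              // words
];

// Break text into pieces no longer than chunkSize, using finer splitters only when needed.
// A single word longer than chunkSize is returned as-is.
function splitRecursively(text: string, chunkSize: number, level = 0): string[] {
  const trimmed = text.trim();
  if (trimmed.length === 0) return [];
  if (trimmed.length <= chunkSize || level >= splitters.length) return [trimmed];
  const pieces: string[] = [];
  for (const part of splitters[level](trimmed)) {
    for (const piece of splitRecursively(part, chunkSize, level + 1)) pieces.push(piece);
  }
  return pieces;
}

const joinedLength = (parts: string[]): number => parts.join(" ").length;

// Greedily pack pieces into chunks, carrying trailing pieces forward as overlap.
function mergeWithOverlap(pieces: string[], chunkSize: number, chunkOverlap: number): string[] {
  const chunks: string[] = [];
  let current: string[] = [];
  for (const piece of pieces) {
    if (current.length > 0 && joinedLength([...current, piece]) > chunkSize) {
      chunks.push(current.join(" "));
      // Seed the next chunk with trailing pieces that fit the overlap budget
      // and still leave room for the incoming piece.
      const carried: string[] = [];
      for (let i = current.length - 1; i >= 0; i--) {
        const candidate = [current[i], ...carried];
        if (joinedLength(candidate) > chunkOverlap) break;
        if (joinedLength([...candidate, piece]) > chunkSize) break;
        carried.unshift(current[i]);
      }
      current = carried;
    }
    current.push(piece);
  }
  if (current.length > 0) chunks.push(current.join(" "));
  return chunks;
}

function chunkText(text: string, chunkSize = 1024, chunkOverlap = 200): string[] {
  if (chunkOverlap >= chunkSize) {
    throw new Error("chunkOverlap must be smaller than chunkSize");
  }
  return mergeWithOverlap(splitRecursively(text, chunkSize), chunkSize, chunkOverlap);
}
```

Overlap is carried in whole pieces rather than raw characters, so a chunk never starts mid-sentence or mid-word. The cost is that the actual overlap can be smaller than chunkOverlap, or empty when the last piece is already larger than the budget.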
Metadata-Aware Splitting
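When metadata is embedded alongside the chunk text, it consumes part of the size budget, so the text portion of each chunk needs to be smaller. The sketch below shows that arithmetic under stated assumptions: the helper `effectiveChunkSize` is hypothetical, sizes are counted in characters rather than tokens, and the "key: value" rendering of metadata is a guess at a typical format, not the library's exact template.

```typescript
// Hypothetical helper: how much of chunkSize is left for text once metadata is rendered.
function effectiveChunkSize(chunkSize: number, metadata: Record<string, string>): number {
  // Assumption: metadata is rendered as one "key: value" line per field.
  const rendered = Object.keys(metadata)
    .map((key) => `${key}: ${metadata[key]}`)
    .join("\n");
  const budget = chunkSize - rendered.length;
  if (budget <= 0) {
    throw new Error(
      `Metadata length (${rendered.length}) leaves no room for text in a chunk of size ${chunkSize}`,
    );
  }
  return budget;
}
```

If the remaining budget is very small, either trim the metadata or raise the chunk size rather than letting the text portion shrink to a few words.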
MarkdownNodeParser
Splits markdown documents by headers, preserving document structure.
Features
- Splits on markdown headers (#, ##, ###, etc.)
- Preserves header hierarchy in metadata
- Handles code blocks correctly
- Each chunk contains one section’s content
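The features above can be illustrated with a small header-based splitter. This is a sketch of the general technique, not MarkdownNodeParser's actual code: the `Section` type and `splitMarkdownByHeaders` function are hypothetical, and only ATX headers (lines starting with #) are recognized.

```typescript
interface Section {
  headerPath: string[]; // header hierarchy, e.g. ["Guide", "Install"]
  text: string;         // the section's own content, including its header line
}

function splitMarkdownByHeaders(markdown: string): Section[] {
  const sections: Section[] = [];
  const path: string[] = [];
  let buffer: string[] = [];
  let inFence = false;

  // Close the section collected so far, tagging it with the current header path.
  const flush = (): void => {
    const text = buffer.join("\n").trim();
    if (text.length > 0) sections.push({ headerPath: path.slice(), text });
    buffer = [];
  };

  for (const line of markdown.split("\n")) {
    // Toggle on fenced code delimiters so "#" lines inside code are not treated as headers.
    if (/^\s*(`{3,}|~{3,})/.test(line)) inFence = !inFence;
    const header = inFence ? null : /^(#{1,6})\s+(.*)$/.exec(line);
    if (header) {
      flush();
      const level = header[1].length;
      path.splice(level - 1);       // drop headers at this level or deeper
      path.push(header[2].trim());
    }
    buffer.push(line);
  }
  flush();
  return sections;
}
```

Keeping the header path with each section is what allows a retrieved chunk to be cited as, for example, "Guide > Install" instead of as an anonymous fragment.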
CodeSplitter
Parses code using tree-sitter for syntax-aware chunking.
Features
- Syntax-aware: Respects language structure (functions, classes, etc.)
- Configurable size: Set maxChars for chunk length
- Multi-language: Works with any tree-sitter grammar
- Recursive chunking: Splits large syntax nodes intelligently
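Recursive chunking over a syntax tree can be sketched as follows. To keep the example self-contained, it uses a hypothetical `SyntaxNodeLike` interface that mimics the `startIndex`, `endIndex`, and `children` fields of a tree-sitter node instead of loading a real grammar. The `chunkNode` function is an illustration of the idea, not CodeSplitter's implementation.

```typescript
interface SyntaxNodeLike {
  startIndex: number;
  endIndex: number;
  children: SyntaxNodeLike[];
}

function chunkNode(node: SyntaxNodeLike, source: string, maxChars: number): string[] {
  const chunks: string[] = [];
  let current = "";
  let lastEnd = node.startIndex;

  for (const child of node.children) {
    // Text between siblings (whitespace, punctuation) travels with the following child.
    const gap = source.slice(lastEnd, child.startIndex);
    const body = source.slice(child.startIndex, child.endIndex);
    lastEnd = child.endIndex;

    if (body.length > maxChars && child.children.length > 0) {
      // Too large to keep whole: emit what we have, then descend into the child.
      current += gap;
      if (current.length > 0) chunks.push(current);
      current = "";
      for (const piece of chunkNode(child, source, maxChars)) chunks.push(piece);
    } else if (current.length > 0 && current.length + gap.length + body.length > maxChars) {
      // Adding this child would overflow: start a new chunk at a syntax boundary.
      chunks.push(current);
      current = gap + body;
    } else {
      current += gap + body;
    }
  }

  // Keep any trailing text after the last child (e.g. a closing brace).
  current += source.slice(lastEnd, node.endIndex);
  if (current.length > 0) chunks.push(current);
  return chunks;
}
```

In this sketch the chunks always concatenate back to the original source, and splits only happen between sibling nodes. A chunk can still exceed maxChars when a leaf node or the text around it is larger than the limit.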
SentenceWindowNodeParser
Creates overlapping windows around sentences for better context.
TokenTextSplitter
Splits text by token count without respecting sentence boundaries.
SimpleNodeParser (Deprecated)
Custom Parsers
Create your own parser by extending NodeParser:
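A minimal sketch of the pattern follows. To stay runnable without the library installed, it defines a local stand-in base class, `NodeParserStandIn`, and a simplified `SimpleNode` type; both are hypothetical. The real NodeParser's abstract method name, node type, and sync/async signature should be checked against the LlamaIndexTS API reference before adapting this.

```typescript
interface SimpleNode {
  text: string;
  metadata: Record<string, unknown>;
}

// Local stand-in for the library's base class (an assumption, not the real API).
abstract class NodeParserStandIn {
  protected abstract parseNodes(nodes: SimpleNode[]): SimpleNode[];

  getNodesFromDocuments(documents: SimpleNode[]): SimpleNode[] {
    return this.parseNodes(documents);
  }
}

// Example custom parser: splits each document on a fixed delimiter line such as "---".
class DelimiterParser extends NodeParserStandIn {
  private readonly delimiter: string;

  constructor(delimiter: string) {
    super();
    this.delimiter = delimiter;
  }

  protected parseNodes(nodes: SimpleNode[]): SimpleNode[] {
    const result: SimpleNode[] = [];
    for (const node of nodes) {
      const parts = node.text
        .split(this.delimiter)
        .map((part) => part.trim())
        .filter((part) => part.length > 0);
      parts.forEach((text, chunkIndex) => {
        // Copy the parent's metadata and record the chunk's position for later filtering.
        result.push({ text, metadata: { ...node.metadata, chunkIndex } });
      });
    }
    return result;
  }
}
```

The custom logic lives in a single overridden method, while the public entry point stays the same for every parser. That is what lets parsers be swapped without changing the calling code.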
Choosing a Chunking Strategy
General text documents (articles, books, documentation)
Use SentenceSplitter with:
- chunkSize: 1024 for most cases
- chunkSize: 512 for more precise retrieval
- chunkSize: 2048 for broader context
- chunkOverlap: 200 to maintain continuity
Markdown documentation
Use MarkdownNodeParser to:
- Preserve document structure
- Keep sections together
- Add header hierarchy to metadata
- Improve navigation and citations
Code files
Use CodeSplitter to:
- Respect syntax boundaries
- Keep functions/classes intact
- Enable code search and analysis
- Support multiple languages
Precise question answering
Use SentenceWindowNodeParser to:
- Retrieve exact sentences
- Provide surrounding context
- Improve answer accuracy
- Support citation to specific sentences
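The sentence-window idea behind this recommendation can be sketched as follows: each node holds exactly one sentence for matching, plus a window of neighbouring sentences that is handed to the LLM as context. The names `WindowNode` and `buildSentenceWindows` are hypothetical, and the regex is a naive stand-in for a proper sentence tokenizer.

```typescript
interface WindowNode {
  sentence: string; // the text that is embedded and matched against the query
  window: string;   // the sentence plus its neighbours, used as context for the answer
}

function buildSentenceWindows(text: string, windowSize = 3): WindowNode[] {
  const sentences = (text.match(/[^.!?]+(?:[.!?]+|$)\s*/g) ?? [])
    .map((s) => s.trim())
    .filter((s) => s.length > 0);

  return sentences.map((sentence, i) => ({
    sentence,
    // Take up to windowSize sentences on each side, clamped at the document edges.
    window: sentences.slice(Math.max(0, i - windowSize), i + windowSize + 1).join(" "),
  }));
}
```

Embedding only the single sentence keeps matching precise. Swapping in the window at answer time restores the surrounding context that a lone sentence lacks.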
Complete Example
Best Practices
- Match chunk size to your use case
  - Smaller (512) for precise retrieval
  - Larger (2048) for broad context
- Use appropriate overlap
  - 10-20% of chunk size typically works well
  - Prevents losing context at boundaries
- Respect document structure
  - Use MarkdownNodeParser for markdown
  - Use CodeSplitter for code
  - Don’t split across major boundaries
- Consider metadata
  - Account for metadata in chunk size
  - Use metadata to preserve structure
  - Add custom fields for filtering
- Test your strategy
  - Evaluate retrieval quality
  - Adjust chunk size based on results
  - Monitor token usage
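The overlap guideline can be turned into simple arithmetic when tuning settings. The helpers below are hypothetical and only illustrate the calculation. `estimateChunkCount` assumes that every chunk is filled to chunkSize and that each new chunk advances by chunkSize minus chunkOverlap, so it gives a rough estimate, not a prediction of what a real parser produces.

```typescript
// Overlap as a fraction of chunk size (10-20% is the range suggested above).
function overlapFor(chunkSize: number, ratio = 0.15): number {
  if (ratio < 0 || ratio >= 1) throw new Error("ratio must be in [0, 1)");
  return Math.round(chunkSize * ratio);
}

// Rough number of chunks for a document of totalTokens tokens.
function estimateChunkCount(totalTokens: number, chunkSize: number, chunkOverlap: number): number {
  if (chunkOverlap >= chunkSize) throw new Error("chunkOverlap must be smaller than chunkSize");
  if (totalTokens <= chunkSize) return totalTokens > 0 ? 1 : 0;
  const step = chunkSize - chunkOverlap; // new tokens contributed by each additional chunk
  return 1 + Math.ceil((totalTokens - chunkSize) / step);
}
```

For example, a chunkOverlap of 200 with a chunkSize of 1024 is about 19.5% of the chunk, which sits at the upper end of the suggested range. Larger overlap means more chunks, and therefore more tokens embedded, for the same document.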
Next Steps
- Documents: Learn about Document structure
- Ingestion: Build complete processing pipelines
- Embeddings: Configure embedding models
- Retrieval: Optimize retrieval strategies