Overview

Node parsers split documents into smaller chunks (nodes) for processing. They handle text segmentation, maintain relationships between chunks, and preserve metadata.

NodeParser

Abstract base class for all node parsers.
import { NodeParser } from "@llamaindex/core/node-parser";

Properties

includeMetadata (boolean, default: true)
Whether to include document metadata in parsed nodes.
includePrevNextRel (boolean, default: true)
Whether to include previous/next relationships between consecutive chunks.

Methods

getNodesFromDocuments
Parse documents into nodes.
getNodesFromDocuments(documents: TextNode[]): TextNode[] | Promise<TextNode[]>

TextSplitter

Abstract base class for text splitting strategies.
import { TextSplitter } from "@llamaindex/core/node-parser";

Methods

splitText
Split a single text into chunks.
abstract splitText(text: string): string[]
splitTexts
Split multiple texts into chunks.
splitTexts(texts: string[]): string[]
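To make the contract concrete, here is a standalone sketch that mirrors the TextSplitter shape. The `SimpleTextSplitter` and `ParagraphSplitter` classes below are hand-rolled for illustration, not imported from the library: a subclass implements splitText, and splitTexts applies it across many inputs.

```typescript
// Minimal stand-in for the TextSplitter contract, defined locally for illustration.
abstract class SimpleTextSplitter {
  abstract splitText(text: string): string[];

  // Default splitTexts: flatten the per-text chunk lists.
  splitTexts(texts: string[]): string[] {
    return texts.flatMap((t) => this.splitText(t));
  }
}

// Example subclass: split on blank lines.
class ParagraphSplitter extends SimpleTextSplitter {
  splitText(text: string): string[] {
    return text
      .split(/\n\n+/)
      .map((s) => s.trim())
      .filter((s) => s.length > 0);
  }
}

const splitter = new ParagraphSplitter();
console.log(splitter.splitTexts(["One.\n\nTwo.", "Three."]));
// [ "One.", "Two.", "Three." ]
```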

SentenceSplitter

Splits text by sentences with configurable chunk size and overlap.
import { SentenceSplitter } from "@llamaindex/core/node-parser";

Constructor Options

chunkSize (number, default: 1024)
Maximum number of characters per chunk.
chunkOverlap (number, default: 200)
Number of characters to overlap between consecutive chunks.
separator (string, default: " ")
Separator to use when splitting.
paragraphSeparator (string, default: "\n\n\n")
Separator for paragraph boundaries.
secondarySeparator (string, default: "\n\n")
Secondary separator (e.g., line breaks).

Example

import { SentenceSplitter } from "@llamaindex/core/node-parser";
import { Document } from "@llamaindex/core/schema";

const parser = new SentenceSplitter({
  chunkSize: 512,
  chunkOverlap: 50
});

const document = new Document({
  text: "Long document text..."
});

const nodes = parser.getNodesFromDocuments([document]);
console.log(nodes.length); // Number of chunks created

MarkdownNodeParser

Splits markdown documents while preserving structure.
import { MarkdownNodeParser } from "@llamaindex/core/node-parser";

Constructor Options

chunkSize (number, default: 1024)
Maximum characters per chunk.
chunkOverlap (number, default: 200)
Overlap between chunks.

Example

const parser = new MarkdownNodeParser({
  chunkSize: 1024,
  chunkOverlap: 100
});

const document = new Document({
  text: "# Heading\n\nParagraph text...",
  metadata: { format: "markdown" }
});

const nodes = parser.getNodesFromDocuments([document]);
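Structure-preserving splitting keeps each heading together with the text under it. The function below is an illustrative sketch of that idea, not the library's implementation: it breaks markdown into sections at ATX heading boundaries.

```typescript
// Illustrative sketch (not the library's implementation): split markdown
// into sections at heading boundaries, keeping each heading with its body.
function splitByHeadings(markdown: string): string[] {
  const lines = markdown.split("\n");
  const sections: string[] = [];
  let current: string[] = [];
  for (const line of lines) {
    if (/^#{1,6}\s/.test(line) && current.length > 0) {
      sections.push(current.join("\n").trim());
      current = [];
    }
    current.push(line);
  }
  if (current.length > 0) sections.push(current.join("\n").trim());
  return sections.filter((s) => s.length > 0);
}

console.log(splitByHeadings("# A\n\ntext one\n\n## B\n\ntext two"));
// [ "# A\n\ntext one", "## B\n\ntext two" ]
```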

MetadataAwareTextSplitter

Abstract base for splitters that consider metadata when chunking.
abstract class MetadataAwareTextSplitter extends TextSplitter {
  abstract splitTextMetadataAware(
    text: string,
    metadata: string
  ): string[];
}
Useful when metadata should be included in chunk size calculations.
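The core idea is that metadata text counts against the chunk budget. The standalone function below sketches one way splitTextMetadataAware could behave (it is not the library's implementation, and the whitespace-based splitting is a simplification): the metadata's length is subtracted from the chunk size so that "metadata + chunk" stays within the target.

```typescript
// Illustrative only: shrink the chunk budget by the metadata's length so that
// "metadata + chunk" stays within the target size, then split on whitespace.
function splitTextMetadataAware(
  text: string,
  metadata: string,
  chunkSize: number,
): string[] {
  const effectiveSize = chunkSize - metadata.length;
  if (effectiveSize <= 0) {
    throw new Error("Metadata is longer than the chunk size");
  }
  const words = text.split(/\s+/);
  const chunks: string[] = [];
  let current = "";
  for (const word of words) {
    const candidate = current === "" ? word : current + " " + word;
    if (candidate.length > effectiveSize && current !== "") {
      chunks.push(current);
      current = word;
    } else {
      current = candidate;
    }
  }
  if (current !== "") chunks.push(current);
  return chunks;
}
```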

Node Relationships

Parsed nodes automatically include relationships:
const nodes = parser.getNodesFromDocuments([document]);

// First node
console.log(nodes[0].relationships);
// {
//   [NodeRelationship.SOURCE]: { nodeId: "doc-id", ... },
//   [NodeRelationship.NEXT]: { nodeId: "node-1-id", ... }
// }

// Middle node
console.log(nodes[1].relationships);
// {
//   [NodeRelationship.SOURCE]: { nodeId: "doc-id", ... },
//   [NodeRelationship.PREVIOUS]: { nodeId: "node-0-id", ... },
//   [NodeRelationship.NEXT]: { nodeId: "node-2-id", ... }
// }
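The wiring above can be sketched in isolation. The `SimpleNode` and `linkNodes` names below are invented for illustration and simplify the library's actual node and relationship types: every node points back to its source document, and consecutive nodes link to each other.

```typescript
// Hypothetical, simplified shape of a node's relationship map, wired up the
// way consecutive chunks from a single document would be.
interface Rel {
  nodeId: string;
}
interface SimpleNode {
  id: string;
  relationships: { source?: Rel; previous?: Rel; next?: Rel };
}

function linkNodes(sourceId: string, ids: string[]): SimpleNode[] {
  const nodes: SimpleNode[] = ids.map((id) => ({
    id,
    relationships: { source: { nodeId: sourceId } },
  }));
  for (let i = 0; i < nodes.length; i++) {
    if (i > 0) nodes[i].relationships.previous = { nodeId: nodes[i - 1].id };
    if (i < nodes.length - 1) nodes[i].relationships.next = { nodeId: nodes[i + 1].id };
  }
  return nodes;
}
```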

Metadata Inheritance

Nodes inherit metadata from parent documents:
const document = new Document({
  text: "Document text...",
  metadata: {
    title: "My Document",
    author: "John Doe"
  }
});

const nodes = parser.getNodesFromDocuments([document]);

// All nodes inherit parent metadata
console.log(nodes[0].metadata);
// { title: "My Document", author: "John Doe" }

Character Positions

Parsers track character positions in the original document:
const nodes = parser.getNodesFromDocuments([document]);

console.log(nodes[0].startCharIdx); // 0
console.log(nodes[0].endCharIdx);   // 512
console.log(nodes[1].startCharIdx); // 462 (with overlap)
console.log(nodes[1].endCharIdx);   // 1024
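The arithmetic behind those indices can be shown with a standalone helper. This is illustrative only: real splitters snap boundaries to separators, so actual indices will differ, but the overlap relationship (each chunk starts `overlap` characters before the previous one ended) is the same.

```typescript
// Illustrative arithmetic: character ranges for chunks of a given size and
// overlap. Real splitters snap boundaries to separators; this shows only
// how overlap shifts each start index.
function chunkRanges(
  textLength: number,
  chunkSize: number,
  overlap: number,
): Array<{ start: number; end: number }> {
  const ranges: Array<{ start: number; end: number }> = [];
  let start = 0;
  while (start < textLength) {
    const end = Math.min(start + chunkSize, textLength);
    ranges.push({ start, end });
    if (end === textLength) break;
    start = end - overlap; // next chunk re-reads `overlap` characters
  }
  return ranges;
}

console.log(chunkRanges(1200, 512, 50));
// [ { start: 0, end: 512 }, { start: 462, end: 974 }, { start: 924, end: 1200 } ]
```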

Custom Node Parser

Create custom parsers by extending NodeParser:
import { NodeParser } from "@llamaindex/core/node-parser";
import { TextNode } from "@llamaindex/core/schema";

class CustomParser extends NodeParser {
  protected parseNodes(documents: TextNode[]): TextNode[] {
    return documents.flatMap(doc => {
      // Custom splitting logic
      const chunks = this.customSplit(doc.text);
      
      return chunks.map(chunk => new TextNode({
        text: chunk,
        metadata: { ...doc.metadata }
      }));
    });
  }
  
  private customSplit(text: string): string[] {
    // Your custom splitting logic
    return text.split(/\n---\n/);
  }
}
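To exercise the delimiter rule from the class above in isolation, the same regex split can be run standalone:

```typescript
// The custom splitting rule above, isolated: split on "---" horizontal-rule
// delimiters that sit on their own line.
function customSplit(text: string): string[] {
  return text.split(/\n---\n/);
}

const parts = customSplit("section one\n---\nsection two\n---\nsection three");
console.log(parts.length); // 3
```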

Best Practices

  1. Choose appropriate chunk size: Smaller chunks (256-512) for precise retrieval, larger chunks (1024-2048) for more context
  2. Use overlap: 10-20% overlap helps maintain context across chunk boundaries
  3. Preserve structure: Use MarkdownNodeParser for markdown to maintain headings and formatting
  4. Consider token limits: Account for model context windows when setting chunk sizes
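The overlap guideline above (10-20% of chunk size) can be captured in a small helper. `suggestedOverlap` is a hypothetical name introduced here, not a library function:

```typescript
// Hypothetical helper applying the rule of thumb above: pick a chunk overlap
// of roughly 15% of the chunk size.
function suggestedOverlap(chunkSize: number, ratio = 0.15): number {
  return Math.round(chunkSize * ratio);
}

console.log(suggestedOverlap(512)); // 77
console.log(suggestedOverlap(1024)); // 154
```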