Readers ingest data from various sources and convert it into Document objects that LlamaIndex can process. LlamaIndex.TS provides built-in readers for common file formats and integrations.
Overview
Readers implement the BaseReader interface:
interface BaseReader {
  loadData(...args: unknown[]): Promise<Document[]>;
}
All readers convert their input format into one or more Document objects with text content and metadata.
File Readers
SimpleDirectoryReader
Load multiple file types from a directory:
import { SimpleDirectoryReader } from "@llamaindex/readers/directory";
const reader = new SimpleDirectoryReader();
const documents = await reader.loadData("./data");
console.log(`Loaded ${documents.length} documents`);
Supported formats: TXT, PDF, CSV, Markdown, DOCX, HTML, JPG/PNG/GIF, XML
Custom File Extensions
import { SimpleDirectoryReader } from "@llamaindex/readers/directory";
import { JSONReader } from "@llamaindex/readers/json";
const reader = new SimpleDirectoryReader();
const documents = await reader.loadData({
  directoryPath: "./data",
  fileExtToReader: {
    json: new JSONReader(),
    // Add custom readers for other extensions
  }
});
PDF Reader
import { PDFReader } from "@llamaindex/readers/pdf";
const reader = new PDFReader();
const documents = await reader.loadData("document.pdf");
// Each page becomes a separate document
for (const doc of documents) {
  console.log(doc.metadata.page_number);
}
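Because each page arrives as its own document, it can be useful to reassemble the full text in page order. The sketch below is a minimal illustration, not part of the library: `PageDoc` and `joinPages` are hypothetical names, and it assumes `page_number` is numeric, as the loop above suggests.

```typescript
// Stand-in for the per-page documents a PDF reader returns (hypothetical shape).
interface PageDoc {
  text: string;
  metadata: { page_number: number };
}

// Join page texts in ascending page order without mutating the input array.
function joinPages(pages: readonly PageDoc[], separator = "\n\n"): string {
  return [...pages]
    .sort((a, b) => a.metadata.page_number - b.metadata.page_number)
    .map((page) => page.text)
    .join(separator);
}

const pages: PageDoc[] = [
  { text: "Second page", metadata: { page_number: 2 } },
  { text: "First page", metadata: { page_number: 1 } },
];
const fullText = joinPages(pages);
console.log(fullText);
```

Sorting a copy keeps the original per-page array intact, so page-level metadata stays available for citations.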
DOCX Reader
import { DocxReader } from "@llamaindex/readers/docx";
const reader = new DocxReader();
const documents = await reader.loadData("document.docx");
CSV Reader
import { CSVReader } from "@llamaindex/readers/csv";
// Concatenate all rows into one document
const reader = new CSVReader(
  true, // concatRows
  ", ", // colJoiner
  "\n" // rowJoiner
);
const documents = await reader.loadData("data.csv");
// Or create one document per row
const rowReader = new CSVReader(false);
const rowDocuments = await rowReader.loadData("data.csv");
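To make the effect of the three constructor arguments concrete, here is a small stand-alone sketch of the two output shapes. It only mimics the behavior described above and is not the CSVReader's actual implementation; `rowsToTexts` is a hypothetical helper.

```typescript
type Row = string[];

// Illustrative only: joins columns with colJoiner, then either merges all
// rows into a single text (concatRows = true) or keeps one text per row.
function rowsToTexts(
  rows: Row[],
  concatRows: boolean,
  colJoiner = ", ",
  rowJoiner = "\n",
): string[] {
  const lines = rows.map((row) => row.join(colJoiner));
  return concatRows ? [lines.join(rowJoiner)] : lines;
}

const rows: Row[] = [
  ["name", "role"],
  ["Ada", "engineer"],
];

const combined = rowsToTexts(rows, true); // one text holding every row
const perRow = rowsToTexts(rows, false); // one text per row
console.log(combined.length, perRow.length);
```

One text per row tends to suit row-level retrieval, while a single concatenated text keeps the table together for later chunking.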
Markdown Reader
import { MarkdownReader } from "@llamaindex/readers/markdown";
const reader = new MarkdownReader(
  true, // removeHyperlinks
  true // removeImages
);
const documents = await reader.loadData("README.md");
// Documents are split by headers
for (const doc of documents) {
  console.log(doc.text);
}
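For intuition about what "split by headers" means, the following sketch splits a Markdown string at ATX headers (`#` through `######`). It is a simplified illustration, not the MarkdownReader's own logic: `splitByHeaders` and `Section` are hypothetical names, and `#` lines inside fenced code are not treated specially.

```typescript
interface Section {
  header: string | null; // null for text that precedes the first header
  text: string;
}

function splitByHeaders(markdown: string): Section[] {
  const sections: Section[] = [];
  let current: Section = { header: null, text: "" };

  // Skip a leading section that has neither a header nor any text.
  const flush = (): void => {
    if (current.header !== null || current.text.trim() !== "") {
      sections.push({ header: current.header, text: current.text.trim() });
    }
  };

  for (const line of markdown.split("\n")) {
    const match = /^#{1,6}\s+(.*)$/.exec(line);
    if (match) {
      flush(); // close the previous section before starting a new one
      current = { header: (match[1] ?? "").trim(), text: "" };
    } else {
      current.text += line + "\n";
    }
  }
  flush();
  return sections;
}

const sections = splitByHeaders("intro\n# A\nalpha\n## B\nbeta");
console.log(sections.map((s) => s.header));
```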
HTML Reader
import { HTMLReader } from "@llamaindex/readers/html";
const reader = new HTMLReader();
const documents = await reader.loadData("page.html");
JSON Reader
import { JSONReader } from "@llamaindex/readers/json";
const reader = new JSONReader();
const documents = await reader.loadData("data.json");
Image Reader
import { ImageReader } from "@llamaindex/readers/image";
const reader = new ImageReader();
const imageDocuments = await reader.loadData("photo.jpg");
// Creates ImageDocument with image blob
Text File Reader
import { TextFileReader } from "@llamaindex/readers/text";
const reader = new TextFileReader();
const documents = await reader.loadData("file.txt");
XML Reader
import { XMLReader } from "@llamaindex/readers/xml";
const reader = new XMLReader();
const documents = await reader.loadData("data.xml");
LlamaParse
LlamaParse is a premium document parsing service that handles complex layouts, tables, and figures:
import { LlamaParseReader } from "llamaindex";
const reader = new LlamaParseReader({
  apiKey: process.env.LLAMA_CLOUD_API_KEY,
  resultType: "markdown", // or "text"
  language: "en"
});
const documents = await reader.loadData("complex-document.pdf");
Features
Advanced PDF parsing: Tables, charts, multi-column layouts
Image extraction: Embedded images and figures
Format preservation: Maintains document structure
Multiple formats: PDF, DOCX, PPTX, and more
Configuration
const reader = new LlamaParseReader({
  apiKey: process.env.LLAMA_CLOUD_API_KEY,
  resultType: "markdown",
  numWorkers: 4,
  verbose: true,
  language: "en",
  // Advanced options
  parsingInstructions: "Focus on extracting tables",
  skipDiagonalText: false,
  invalidateCache: false,
  doNotCache: false,
  fastMode: false
});
Notion Reader
import { NotionReader } from "@llamaindex/notion";
const reader = new NotionReader({
  auth: process.env.NOTION_TOKEN
});
const documents = await reader.loadData({
  databaseId: "your-database-id"
});
Discord Reader
import { DiscordReader } from "@llamaindex/discord";
const reader = new DiscordReader({
  token: process.env.DISCORD_TOKEN
});
const documents = await reader.loadData({
  channelId: "channel-id",
  limit: 100
});
AssemblyAI Reader
Transcribe audio/video files:
import { AssemblyAIReader } from "@llamaindex/assemblyai";
const reader = new AssemblyAIReader({
  apiKey: process.env.ASSEMBLYAI_API_KEY
});
const documents = await reader.loadData("podcast.mp3");
Loading from URLs
Many readers support loading from HTTP/HTTPS URLs:
import { PDFReader } from "@llamaindex/readers/pdf";
const reader = new PDFReader();
const documents = await reader.loadData(
  "https://example.com/document.pdf"
);
Custom Readers
Create your own reader by implementing BaseReader:
import { BaseReader, Document } from "llamaindex";
class CustomAPIReader implements BaseReader {
  constructor(private apiKey: string) {}
  async loadData(endpoint: string): Promise<Document[]> {
    // Fetch data from your API
    const response = await fetch(endpoint, {
      headers: {
        Authorization: `Bearer ${this.apiKey}`
      }
    });
    // Fail early on HTTP errors instead of parsing an error body
    if (!response.ok) {
      throw new Error(`Request failed with status ${response.status}`);
    }
    const data = await response.json();
    // Convert each item to a Document
    return data.items.map((item: any) =>
      new Document({
        text: item.content,
        metadata: {
          id: item.id,
          title: item.title,
          date: item.created_at
        }
      })
    );
  }
}
const reader = new CustomAPIReader(process.env.API_KEY!);
const documents = await reader.loadData("https://api.example.com/items");
Extending FileReader
For file-based readers, extend FileReader:
import { FileReader, Document } from "@llamaindex/core/schema";
// Shape of one parsed section; adjust to your format
type Section = { name: string; content: string };
class CustomFileReader extends FileReader {
  async loadDataAsContent(
    fileContent: Uint8Array,
    filename?: string
  ): Promise<Document[]> {
    // Decode the raw bytes as UTF-8 text
    const text = new TextDecoder().decode(fileContent);
    // Custom parsing logic
    const sections = this.parseCustomFormat(text);
    // Return one document per section
    return sections.map((section) =>
      new Document({
        text: section.content,
        metadata: {
          filename,
          section: section.name
        }
      })
    );
  }
  private parseCustomFormat(text: string): Section[] {
    // Your parsing logic
    return [];
  }
}
const reader = new CustomFileReader();
const documents = await reader.loadData("file.custom");
Complete Example
import {
  VectorStoreIndex,
  IngestionPipeline,
  SentenceSplitter
} from "llamaindex";
import { SimpleDirectoryReader } from "@llamaindex/readers/directory";
import { OpenAIEmbedding } from "@llamaindex/openai";
import { PDFReader } from "@llamaindex/readers/pdf";
import { MarkdownReader } from "@llamaindex/readers/markdown";
async function main() {
  // Load documents from directory
  const reader = new SimpleDirectoryReader();
  const documents = await reader.loadData({
    directoryPath: "./data",
    fileExtToReader: {
      pdf: new PDFReader(),
      md: new MarkdownReader()
    }
  });
  console.log(`Loaded ${documents.length} documents`);
  // Inspect documents
  for (const doc of documents.slice(0, 3)) {
    console.log("File:", doc.metadata.file_name);
    console.log("Preview:", doc.text.substring(0, 100));
  }
  // Process with pipeline
  const pipeline = new IngestionPipeline({
    transformations: [
      new SentenceSplitter({ chunkSize: 1024 }),
      new OpenAIEmbedding()
    ]
  });
  const nodes = await pipeline.run({ documents });
  // Create index
  const index = await VectorStoreIndex.init({ nodes });
  // Query
  const queryEngine = index.asQueryEngine();
  const response = await queryEngine.query({
    query: "What are the main topics across all documents?"
  });
  console.log(response.toString());
}
main().catch(console.error);
Available Reader Packages
Core Readers (@llamaindex/readers)
SimpleDirectoryReader
PDFReader
CSVReader
MarkdownReader
DocxReader
HTMLReader
JSONReader
ImageReader
TextFileReader
XMLReader
Platform Integrations
@llamaindex/notion - Notion databases
@llamaindex/discord - Discord channels
@llamaindex/assemblyai - Audio/video transcription
Premium Services
LlamaParse - Advanced document parsing
LlamaCloud - Managed data ingestion
Community
Check the LlamaIndex Hub for community-contributed readers:
Web scrapers
Database connectors
API integrations
And more
Best Practices
Choose the right reader
Use format-specific readers for better parsing
LlamaParse for complex PDFs with tables
SimpleDirectoryReader for mixed formats
Handle metadata
Readers automatically add file paths and names
Preserve source information for citations
Add custom metadata after loading
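As a rough illustration of adding custom metadata after loading, the sketch below merges extra fields into each document while leaving reader-supplied keys such as `file_name` untouched. `DocLike` is a stand-in type and `addMetadata` a hypothetical helper; neither comes from the library.

```typescript
// Minimal stand-in for a loaded document (hypothetical shape).
interface DocLike {
  text: string;
  metadata: Record<string, unknown>;
}

// Merge extra fields into each document's metadata in place.
// Keys already set by the reader take precedence over the extras.
function addMetadata<T extends DocLike>(
  docs: T[],
  extra: Record<string, unknown>,
): T[] {
  for (const doc of docs) {
    doc.metadata = { ...extra, ...doc.metadata };
  }
  return docs;
}

const loaded: DocLike[] = [
  { text: "Quarterly report", metadata: { file_name: "q1.pdf" } },
];
const tagged = addMetadata(loaded, {
  source: "finance-share",
  file_name: "ignored.pdf", // does not overwrite the reader's value
});
console.log(tagged[0]?.metadata);
```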
Process in batches
Load files in chunks for large datasets
Monitor memory usage
Use streaming when possible
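One simple way to load a large dataset in chunks is to split the list of file paths into fixed-size batches and process one batch at a time. The helper below is a generic sketch; `chunk` and the sample paths are hypothetical and not part of LlamaIndex.TS.

```typescript
// Split a list into consecutive batches of at most `size` items.
function chunk<T>(items: readonly T[], size: number): T[][] {
  if (!Number.isInteger(size) || size < 1) {
    throw new RangeError("size must be a positive integer");
  }
  const batches: T[][] = [];
  for (let i = 0; i < items.length; i += size) {
    batches.push(items.slice(i, i + size));
  }
  return batches;
}

const paths = ["a.pdf", "b.pdf", "c.pdf", "d.pdf", "e.pdf"];
const batches = chunk(paths, 2);
console.log(batches.length); // 3
```

Each batch can then be loaded, passed through the pipeline, and released before the next one starts, which keeps peak memory bounded by the batch size rather than the dataset size.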
Error handling
Catch and log file-specific errors
Continue processing other files on failure
Validate file formats before reading
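The error-handling points above can be sketched as a loop that records failures per file and keeps going. This is a minimal example under stated assumptions: `LoadFn` stands in for any reader's `loadData` bound to a single path, and `DocLike`, `LoadResult`, and `loadAllTolerant` are hypothetical names.

```typescript
interface DocLike {
  text: string;
  metadata: Record<string, unknown>;
}

// Any function that loads one file, e.g. (p) => reader.loadData(p).
type LoadFn = (path: string) => Promise<DocLike[]>;

interface LoadResult {
  documents: DocLike[];
  failures: { path: string; error: string }[];
}

// Load files one by one; a failure is recorded and the loop continues.
async function loadAllTolerant(
  paths: readonly string[],
  load: LoadFn,
): Promise<LoadResult> {
  const result: LoadResult = { documents: [], failures: [] };
  for (const path of paths) {
    try {
      result.documents.push(...(await load(path)));
    } catch (err) {
      const error = err instanceof Error ? err.message : String(err);
      console.error(`Failed to load ${path}: ${error}`);
      result.failures.push({ path, error });
    }
  }
  return result;
}
```

Returning the failures alongside the documents lets the caller decide whether to retry, skip, or abort, instead of losing the whole run to one bad file.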
Combine with pipelines
Use readers with IngestionPipeline
Chain transformations after reading
Cache results for repeated access
Next Steps
Documents: Work with Document objects
Ingestion: Build data processing pipelines
Node Parsers: Split documents into chunks
LlamaParse: Advanced document parsing