Readers load data from external sources into Document objects that LlamaIndex can process. LlamaIndex.TS provides built-in readers for common file formats and integrations.
Overview
Readers implement the BaseReader interface and return Document objects with text content and metadata.
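The value of a shared reader interface is that calling code can treat every source the same way. The sketch below illustrates that idea under stated assumptions: `DocumentLike` and `BaseReaderLike` are local stand-ins written for this example (not the library's exports, whose generics and argument types may differ), and `greetingReader`, `farewellReader`, and `loadFromAll` are hypothetical names.

```typescript
// Local stand-in types for illustration; the library's own types may differ.
interface DocumentLike {
  text: string;
  metadata: Record<string, unknown>;
}

interface BaseReaderLike {
  // A reader turns some source into an array of documents.
  loadData(...args: unknown[]): Promise<DocumentLike[]>;
}

// Two hypothetical readers backed by hard-coded content.
const greetingReader: BaseReaderLike = {
  async loadData() {
    return [{ text: "hello", metadata: { source: "greeting" } }];
  },
};

const farewellReader: BaseReaderLike = {
  async loadData() {
    return [{ text: "goodbye", metadata: { source: "farewell" } }];
  },
};

// Run every reader and concatenate the results in reader order.
async function loadFromAll(readers: BaseReaderLike[]): Promise<DocumentLike[]> {
  const perReader = await Promise.all(readers.map((reader) => reader.loadData()));
  return perReader.flat();
}
```

Because `loadFromAll` depends only on the interface, a new source can be added without changing the code that consumes the documents.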
File Readers
SimpleDirectoryReader
Load multiple file types from a directory.
Custom File Extensions
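Custom extension handling comes down to mapping a file extension to the parser that should handle it, with a fallback for everything else. The sketch below illustrates that dispatch idea only; it does not use the library's API. `DocumentLike`, `ParseFn`, `extensionOf`, `pickParser`, and the `.log` parser are hypothetical names introduced for this example.

```typescript
// Local stand-in type for illustration; not the library's export.
interface DocumentLike {
  text: string;
  metadata: Record<string, unknown>;
}

// Hypothetical parser signature: raw file text in, documents out.
type ParseFn = (content: string, filePath: string) => DocumentLike[];

// Return the lowercase extension without the dot, or "" if there is none.
function extensionOf(filePath: string): string {
  const name = filePath.split(/[\\/]/).pop() ?? "";
  const dot = name.lastIndexOf(".");
  return dot > 0 ? name.slice(dot + 1).toLowerCase() : "";
}

// Pick the parser registered for the file's extension, else the fallback.
function pickParser(
  filePath: string,
  byExtension: Map<string, ParseFn>,
  fallback: ParseFn,
): ParseFn {
  return byExtension.get(extensionOf(filePath)) ?? fallback;
}

// Fallback: the whole file becomes one document.
const plainText: ParseFn = (content, filePath) => [
  { text: content, metadata: { file_path: filePath } },
];

// Hypothetical ".log" parser: one document per non-empty line.
const logLines: ParseFn = (content, filePath) =>
  content
    .split("\n")
    .filter((line) => line.trim() !== "")
    .map((line, index) => ({
      text: line,
      metadata: { file_path: filePath, entry: index + 1 },
    }));

const parsers = new Map<string, ParseFn>([["log", logLines]]);
const logDocs = pickParser("logs/app.LOG", parsers, plainText)("a\n\nb", "logs/app.LOG");
```

Extensions are lowercased before lookup so that `app.LOG` and `app.log` reach the same parser, and files without an extension fall through to `plainText`.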
PDF Reader
DOCX Reader
CSV Reader
Markdown Reader
HTML Reader
JSON Reader
Image Reader
Text File Reader
XML Reader
LlamaParse
LlamaParse is a premium document parsing service that handles complex layouts, tables, and figures.
Features
- Advanced PDF parsing: Tables, charts, multi-column layouts
- Image extraction: Embedded images and figures
- Format preservation: Maintains document structure
- Multiple formats: PDF, DOCX, PPTX, and more
Configuration
Platform Integrations
Notion Reader
Discord Reader
AssemblyAI Reader
Transcribe audio/video files.
Loading from URLs
Many readers support loading from HTTP/HTTPS URLs.
Custom Readers
Create your own reader by implementing BaseReader.
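As a minimal sketch of what such a reader might look like, the example below wraps a hypothetical ticketing source. `DocumentLike` and `BaseReaderLike` are local stand-ins for the library's types, and `Ticket`, `TicketReader`, and the injected `fetchTickets` function are assumptions made for illustration. In a real project the class would implement the library's own interface instead.

```typescript
// Local stand-in types for illustration; the library's own types may differ.
interface DocumentLike {
  text: string;
  metadata: Record<string, unknown>;
}

interface BaseReaderLike {
  loadData(...args: unknown[]): Promise<DocumentLike[]>;
}

// Hypothetical record shape returned by some ticketing API.
interface Ticket {
  id: number;
  title: string;
  body: string;
}

type FetchTickets = () => Promise<Ticket[]>;

class TicketReader implements BaseReaderLike {
  private readonly fetchTickets: FetchTickets;

  // The fetch function is injected so the reader can run without a network.
  constructor(fetchTickets: FetchTickets) {
    this.fetchTickets = fetchTickets;
  }

  // Convert each ticket into one document and keep its id for citations.
  async loadData(): Promise<DocumentLike[]> {
    const tickets = await this.fetchTickets();
    return tickets.map((ticket) => ({
      text: `${ticket.title}\n\n${ticket.body}`,
      metadata: { source: "tickets", ticketId: ticket.id },
    }));
  }
}
```

Keeping the source identifier in `metadata` lets later stages cite where each document came from.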
Extending FileReader
For file-based readers, extend FileReader.
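The pattern behind a file-oriented base class is that the base handles reading bytes and stamping file metadata, while each subclass supplies only the parsing step. The sketch below shows that split under stated assumptions: `FileReaderLike` is a local stand-in, not the library's `FileReader`, whose method names and signatures may differ. The injected `readBytes` function and the `LineReader` subclass are hypothetical and exist so the example runs without touching the disk.

```typescript
// Local stand-in type for illustration; not the library's export.
interface DocumentLike {
  text: string;
  metadata: Record<string, unknown>;
}

// Hypothetical byte loader; a real base class would read from disk.
type ReadBytes = (filePath: string) => Promise<Uint8Array>;

abstract class FileReaderLike {
  private readonly readBytes: ReadBytes;

  constructor(readBytes: ReadBytes) {
    this.readBytes = readBytes;
  }

  // Subclasses implement only the parsing step.
  abstract loadDataAsContent(
    content: Uint8Array,
    filePath: string,
  ): Promise<DocumentLike[]>;

  // Shared step: read bytes, parse them, then record the file path.
  async loadData(filePath: string): Promise<DocumentLike[]> {
    const content = await this.readBytes(filePath);
    const docs = await this.loadDataAsContent(content, filePath);
    return docs.map((doc) => ({
      ...doc,
      metadata: { ...doc.metadata, file_path: filePath },
    }));
  }
}

// Hypothetical subclass: one document per non-empty line of a UTF-8 file.
class LineReader extends FileReaderLike {
  async loadDataAsContent(content: Uint8Array): Promise<DocumentLike[]> {
    const text = new TextDecoder("utf-8").decode(content);
    return text
      .split(/\r?\n/)
      .filter((line) => line.trim() !== "")
      .map((line) => ({ text: line, metadata: {} }));
  }
}
```

With this split, supporting a new file format means writing one parsing method rather than repeating the file handling in every reader.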
Complete Example
Available Reader Packages
Core Readers
@llamaindex/readers
- SimpleDirectoryReader
- PDFReader
- CSVReader
- MarkdownReader
- DocxReader
- HTMLReader
- JSONReader
- ImageReader
- TextFileReader
- XMLReader
Platform Integrations
- @llamaindex/notion - Notion databases
- @llamaindex/discord - Discord channels
- @llamaindex/assemblyai - Audio/video transcription
Premium Services
- LlamaParse - Advanced document parsing
- LlamaCloud - Managed data ingestion
Community
Check the LlamaIndex Hub for community-contributed readers:
- Web scrapers
- Database connectors
- API integrations
- And more
Best Practices
- Choose the right reader
  - Use format-specific readers for better parsing
  - LlamaParse for complex PDFs with tables
  - SimpleDirectoryReader for mixed formats
- Handle metadata
  - Readers automatically add file paths and names
  - Preserve source information for citations
  - Add custom metadata after loading
- Process in batches
  - Load files in chunks for large datasets
  - Monitor memory usage
  - Use streaming when possible
- Error handling
  - Catch and log file-specific errors
  - Continue processing other files on failure
  - Validate file formats before reading
- Combine with pipelines
  - Use readers with IngestionPipeline
  - Chain transformations after reading
  - Cache results for repeated access
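The batching and error-handling advice above can be sketched as a small helper that loads files a few at a time and records failures instead of aborting. This is one possible approach rather than a library feature: `DocumentLike`, `LoadFile`, `LoadReport`, and `loadInBatches` are hypothetical names, and `loadFile` stands for whatever reader call is in use.

```typescript
// Local stand-in type for illustration; not the library's export.
interface DocumentLike {
  text: string;
  metadata: Record<string, unknown>;
}

// Hypothetical per-file loader, e.g. a thin wrapper around a reader.
type LoadFile = (filePath: string) => Promise<DocumentLike[]>;

interface LoadReport {
  documents: DocumentLike[];
  failures: { filePath: string; reason: string }[];
}

async function loadInBatches(
  filePaths: string[],
  loadFile: LoadFile,
  batchSize = 10,
): Promise<LoadReport> {
  if (batchSize < 1) {
    throw new Error("batchSize must be at least 1");
  }
  const report: LoadReport = { documents: [], failures: [] };

  for (let start = 0; start < filePaths.length; start += batchSize) {
    const batch = filePaths.slice(start, start + batchSize);
    // allSettled keeps one rejected file from failing the whole batch.
    const results = await Promise.allSettled(batch.map((p) => loadFile(p)));

    results.forEach((result, i) => {
      if (result.status === "fulfilled") {
        report.documents.push(...result.value);
      } else {
        // Record the failure and continue with the remaining files.
        report.failures.push({ filePath: batch[i], reason: String(result.reason) });
      }
    });
  }
  return report;
}
```

Only one batch is in flight at a time, which bounds the number of files being read concurrently, and the `failures` list can be logged or retried after the run.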
Next Steps
- Documents - Work with Document objects
- Ingestion - Build data processing pipelines
- Node Parsers - Split documents into chunks
- LlamaParse - Advanced document parsing