What is RAG?
Retrieval-Augmented Generation (RAG) combines the power of large language models with external knowledge retrieval. Instead of relying solely on the LLM’s training data, RAG applications:
- Retrieve relevant information from your documents
- Augment the LLM prompt with this context
- Generate accurate, grounded responses
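The three steps above can be sketched in plain Python. This is a toy illustration only: retrieval here is naive word overlap rather than vector similarity, and the "generate" step is a stub standing in for a real LLM call.

```python
# Toy retrieve-augment-generate loop. Real RAG systems use vector
# embeddings for retrieval and an LLM for generation; this sketch
# only shows the shape of the data flow.

def retrieve(query: str, documents: list[str], top_k: int = 1) -> list[str]:
    """Rank documents by how many words they share with the query."""
    query_words = set(query.lower().split())
    scored = sorted(
        documents,
        key=lambda d: len(query_words & set(d.lower().split())),
        reverse=True,
    )
    return scored[:top_k]

def augment(query: str, context: list[str]) -> str:
    """Prepend the retrieved context to the user's question."""
    return "Context:\n" + "\n".join(context) + f"\n\nQuestion: {query}"

def generate(prompt: str) -> str:
    """Stand-in for an LLM call; a real app would send `prompt` to a model."""
    return f"(answer grounded in: {prompt.splitlines()[1]})"

docs = [
    "Paris is the capital of France.",
    "The Nile is a river in Africa.",
]
query = "What is the capital of France?"
print(generate(augment(query, retrieve(query, docs))))
```

Because the prompt carries the retrieved passage, the model's answer is grounded in your source material rather than in whatever its training data happens to contain.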
When to Use RAG
RAG is ideal when you need to:
- Answer questions about your own documents or data
- Build chatbots with up-to-date information
- Create knowledge bases that can be queried naturally
- Reduce hallucinations by grounding responses in source material
Building Your First RAG App
Load Your Documents
Create Document objects directly from your text data, or use a directory reader to load multiple files at once.
Create a Vector Index
Index your documents with embeddings. This automatically:
- Splits documents into chunks
- Generates embeddings for each chunk
- Stores them in a vector store for similarity search
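Building the index is a single call — a sketch that assumes an OpenAI API key is configured in the environment, since the default embedding model calls the OpenAI API:

```python
from llama_index.core import Document, VectorStoreIndex

documents = [Document(text="RAG grounds model answers in retrieved context.")]

# Chunks the documents, embeds each chunk (via the configured
# embedding model), and stores the vectors for similarity search
index = VectorStoreIndex.from_documents(documents)

# The index can now answer queries
query_engine = index.as_query_engine()
```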
Complete Working Example
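Here is a runnable sketch of the whole flow. It assumes llama-index is installed, an OpenAI API key is exported in your shell, and a ./data directory of files exists (the path and the query are example values):

```python
from llama_index.core import SimpleDirectoryReader, VectorStoreIndex

# 1. Load documents from a directory (example path)
documents = SimpleDirectoryReader("./data").load_data()

# 2. Index them: chunking + embeddings + in-memory vector store
index = VectorStoreIndex.from_documents(documents)

# 3. Query: retrieve relevant chunks, then generate a grounded answer
query_engine = index.as_query_engine()
response = query_engine.query("What does this document say about pricing?")
print(response)
```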
The example loads documents, builds an index, and answers a query end to end.
VectorStoreIndex Configuration
Customizing Chunk Size
Smaller chunks give more precise retrieval matches; larger chunks preserve more surrounding context.
Adjusting Retrieval Parameters
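Configure how many chunks are retrieved per query with similarity_top_k — a sketch; building the index still requires embedding credentials, and the default top-k (2 in recent versions) is an assumption:

```python
from llama_index.core import Document, VectorStoreIndex

documents = [Document(text=f"Fact number {i}.") for i in range(10)]
index = VectorStoreIndex.from_documents(documents)

# Retrieve the 5 most similar chunks per query instead of the default
query_engine = index.as_query_engine(similarity_top_k=5)
response = query_engine.query("Which facts are relevant?")
```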
Raising similarity_top_k gives the LLM more context per query at the cost of a longer prompt.
Using Different Vector Stores
By default, VectorStoreIndex uses an in-memory vector store. For production, use a persistent store:
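One dependency-free option is persisting the default store to disk and reloading it later; dedicated stores such as Chroma or Pinecone plug in through their own integration packages. A sketch, with ./storage as an example directory:

```python
from llama_index.core import (
    StorageContext,
    VectorStoreIndex,
    load_index_from_storage,
)

# Build once, then write the index (nodes, embeddings, metadata) to disk:
# index = VectorStoreIndex.from_documents(documents)
# index.storage_context.persist(persist_dir="./storage")

# In a later process, reload without re-embedding anything
storage_context = StorageContext.from_defaults(persist_dir="./storage")
index = load_index_from_storage(storage_context)
```

Reloading from disk avoids paying the embedding cost on every restart, which matters once your corpus grows.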
Advanced: Low-Level RAG Pipeline
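A sketch of a manually assembled pipeline, using the retriever, response synthesizer, and query engine classes from llama_index.core (an OpenAI key is assumed for embeddings and generation):

```python
from llama_index.core import Document, VectorStoreIndex, get_response_synthesizer
from llama_index.core.query_engine import RetrieverQueryEngine
from llama_index.core.retrievers import VectorIndexRetriever

documents = [Document(text="RAG grounds model answers in retrieved context.")]
index = VectorStoreIndex.from_documents(documents)

# 1. Retrieval: fetch candidate chunks by vector similarity
retriever = VectorIndexRetriever(index=index, similarity_top_k=3)

# 2. Synthesis: turn retrieved chunks + question into an LLM answer
synthesizer = get_response_synthesizer(response_mode="compact")

# 3. Wire the stages into a query engine
query_engine = RetrieverQueryEngine(
    retriever=retriever,
    response_synthesizer=synthesizer,
)
response = query_engine.query("What does RAG do?")
```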
For fine-grained control, assemble the retriever, response synthesizer, and query engine yourself so each stage can be swapped or tuned independently.
Next Steps
- Learn about Chat Engines for conversational RAG
- Explore Query Engines for advanced querying
- Build Agents that can use RAG as a tool