Overview
Engines are high-level interfaces for querying and chatting with your indexed data. LlamaIndex provides two main types:
- Query Engines: Question-answering over data
- Chat Engines: Multi-turn conversations with context
Query Engines
RetrieverQueryEngine
Standard query engine combining retrieval with response synthesis.
import { RetrieverQueryEngine } from "llamaindex/engines/query";
import { VectorStoreIndex } from "llamaindex";

// `documents` is assumed to be an array of documents loaded elsewhere
const index = await VectorStoreIndex.fromDocuments(documents);

const queryEngine = index.asQueryEngine({
  similarityTopK: 3
});

const response = await queryEngine.query({
  query: "What is LlamaIndex?"
});

console.log(response.response);
console.log(response.sourceNodes); // Retrieved context
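Logging the raw `sourceNodes` array can be noisy. As a minimal sketch, a small helper can print one line per source. The `SourceSketch` shape and the `summarizeSources` name are hypothetical and simplified for illustration; the objects the library returns may expose text and score differently.

```typescript
// Simplified, assumed shape of one retrieved source (not the library's type).
type SourceSketch = { text: string; score?: number };

// One line per source: rank, score (if present), and a short text preview.
function summarizeSources(
  sources: SourceSketch[],
  previewLength = 40,
): string[] {
  return sources.map((source, i) => {
    const score = source.score === undefined ? "n/a" : source.score.toFixed(2);
    const preview =
      source.text.length > previewLength
        ? `${source.text.slice(0, previewLength)}...`
        : source.text;
    return `#${i + 1} score=${score} ${preview}`;
  });
}
```

A low score or an unrelated preview on the top-ranked source is usually a sign that the retrieval settings need adjusting.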
SubQuestionQueryEngine
Breaks complex questions into sub-questions.
import { SubQuestionQueryEngine } from "llamaindex/engines/query";

const queryEngineTools = [
  {
    queryEngine: docsQueryEngine,
    description: "Documentation query engine"
  },
  {
    queryEngine: codeQueryEngine,
    description: "Code query engine"
  }
];

const queryEngine = new SubQuestionQueryEngine({
  queryEngineTools
});

const response = await queryEngine.query({
  query: "Compare the performance of algorithm A vs algorithm B"
});
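To make the pattern concrete, here is a conceptual sketch of the sub-question flow. It is not the library's implementation: `ToolSketch`, `SubQuestion`, and `answerWithSubQuestions` are hypothetical names, the sub-questions are hard-coded, and the partial answers are simply joined. In the real engine an LLM would be expected to generate the sub-questions and synthesize the final answer.

```typescript
// Hypothetical stand-in for a query engine tool: a name plus an answer function.
type ToolSketch = {
  name: string;
  answer: (question: string) => string;
};

// A sub-question paired with the tool that should answer it.
type SubQuestion = { toolName: string; question: string };

// Answer each sub-question with its tool, then join the partial answers.
function answerWithSubQuestions(
  subQuestions: SubQuestion[],
  tools: ToolSketch[],
): string {
  const partials = subQuestions.map((sq) => {
    const tool = tools.find((t) => t.name === sq.toolName);
    if (!tool) {
      throw new Error(`No tool named "${sq.toolName}"`);
    }
    return `${sq.question} -> ${tool.answer(sq.question)}`;
  });
  return partials.join("\n");
}

// Toy tools that return canned answers.
const sketchTools: ToolSketch[] = [
  { name: "docs", answer: () => "A is documented as O(n log n)" },
  { name: "code", answer: () => "B is implemented as a nested loop" },
];

const combined = answerWithSubQuestions(
  [
    { toolName: "docs", question: "How fast is algorithm A?" },
    { toolName: "code", question: "How is algorithm B implemented?" },
  ],
  sketchTools,
);
```

The sketch shows why tool descriptions matter: each sub-question has to be matched to a tool that can actually answer it.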
RouterQueryEngine
Routes queries to the most appropriate engine.
import { RouterQueryEngine } from "llamaindex/engines/query";
import { LLMSingleSelector } from "llamaindex/selectors";

const selector = new LLMSingleSelector();

const queryEngine = new RouterQueryEngine({
  selector,
  queryEngineTools: [
    {
      queryEngine: vectorEngine,
      description: "Good for semantic search"
    },
    {
      queryEngine: keywordEngine,
      description: "Good for keyword search"
    }
  ]
});
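The routing step can be pictured as "score each description against the query, then dispatch to the winner". The sketch below illustrates that idea with a toy word-overlap selector. `RouteSketch` and `selectRoute` are hypothetical, and word overlap is only a stand-in for the LLM-based choice that `LLMSingleSelector` is meant to make.

```typescript
// Hypothetical shape: each route has a description and a handler.
type RouteSketch = {
  description: string;
  handle: (query: string) => string;
};

// Toy selector: pick the route whose description shares the most words
// with the query. Ties go to the earliest route.
function selectRoute(query: string, routes: RouteSketch[]): number {
  const queryWords = new Set(query.toLowerCase().split(/\W+/).filter(Boolean));
  let best = 0;
  let bestScore = -1;
  routes.forEach((route, i) => {
    const score = route.description
      .toLowerCase()
      .split(/\W+/)
      .filter((word) => queryWords.has(word)).length;
    if (score > bestScore) {
      best = i;
      bestScore = score;
    }
  });
  return best;
}

const sketchRoutes: RouteSketch[] = [
  { description: "Good for semantic search", handle: (q) => `vector: ${q}` },
  { description: "Good for keyword search", handle: (q) => `keyword: ${q}` },
];

const routeQuery = "exact keyword lookup for TODO";
const chosenIndex = selectRoute(routeQuery, sketchRoutes);
const routedAnswer = sketchRoutes[chosenIndex]!.handle(routeQuery);
```

Whatever selector is used, distinct and specific descriptions make the choice easier; two near-identical descriptions leave little to route on.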
Chat Engines
ContextChatEngine
Chat engine with retrieval augmented generation (RAG).
import { ContextChatEngine } from "llamaindex/engines/chat";
import { VectorStoreIndex } from "llamaindex";

const index = await VectorStoreIndex.fromDocuments(documents);
const chatEngine = index.asChatEngine();

const response1 = await chatEngine.chat({
  message: "What is LlamaIndex?"
});

const response2 = await chatEngine.chat({
  message: "Tell me more about its features"
});

// Chat history is maintained automatically
const history = await chatEngine.chatHistory;
SimpleChatEngine
Basic chat without retrieval.
import { SimpleChatEngine } from "llamaindex/engines/chat";
import { OpenAI } from "@llamaindex/openai";

const llm = new OpenAI({ model: "gpt-4" });
const chatEngine = new SimpleChatEngine({ llm });

const response = await chatEngine.chat({
  message: "Hello!"
});
Streaming
Both query and chat engines support streaming:
Streaming Queries
const stream = await queryEngine.query({
  query: "Explain LlamaIndex",
  stream: true
});

for await (const chunk of stream) {
  process.stdout.write(chunk.response);
}
Streaming Chat
const stream = await chatEngine.chat({
  message: "Tell me a story",
  stream: true
});

for await (const chunk of stream) {
  process.stdout.write(chunk.response);
}
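Applications often need both the incremental chunks (to update a UI) and the full text (to store or log). A minimal sketch of a helper that does both is shown here. `ChunkSketch`, `collectStream`, and `fakeStream` are hypothetical names; the only assumption about the engines is that each chunk exposes a `response` string, as in the loops above.

```typescript
// Assumed chunk shape, mirroring the `chunk.response` field used above.
type ChunkSketch = { response: string };

// Consume a stream chunk by chunk, forwarding each piece to `onChunk`
// and returning the concatenated text once the stream ends.
async function collectStream(
  stream: AsyncIterable<ChunkSketch>,
  onChunk: (text: string) => void = () => {},
): Promise<string> {
  let full = "";
  for await (const chunk of stream) {
    onChunk(chunk.response);
    full += chunk.response;
  }
  return full;
}

// Stand-in stream for trying the helper without an LLM.
async function* fakeStream(): AsyncGenerator<ChunkSketch> {
  for (const piece of ["Llama", "Index ", "streams"]) {
    yield { response: piece };
  }
}
```

With a real engine, the value returned by a call made with `stream: true` would be passed in place of `fakeStream()`.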
Response Synthesis
Customize how responses are generated from retrieved context:
import { ResponseSynthesizer, CompactAndRefine, TreeSummarize } from "llamaindex";

// Compact and refine
const synthesizer1 = new ResponseSynthesizer({
  responseBuilder: new CompactAndRefine()
});

// Tree summarize
const synthesizer2 = new ResponseSynthesizer({
  responseBuilder: new TreeSummarize()
});

const queryEngine = index.asQueryEngine({
  responseSynthesizer: synthesizer1
});
Retrieval Configuration
const queryEngine = index.asQueryEngine({
  // Number of nodes to retrieve
  similarityTopK: 5,
  // Custom retriever
  retriever: index.asRetriever({
    similarityTopK: 10
  }),
  // Post-processing (for example, a SimilarityPostprocessor instance)
  nodePostprocessors: [similarityPostprocessor]
});
Multi-modal Queries
Engines support multi-modal input:
const response = await queryEngine.query({
  query: [
    { type: "text", text: "What's in this diagram?" },
    { type: "image_url", image_url: { url: "data:image/png;base64,..." } }
  ]
});
Custom System Prompts
const chatEngine = index.asChatEngine({
  systemPrompt: "You are a helpful AI assistant specialized in technical documentation."
});

const queryEngine = index.asQueryEngine({
  textQATemplate: "Context: {context}\n\nQuestion: {query}\n\nAnswer:"
});
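A template like the one above is only useful once its placeholders are filled in. The following sketch shows one way such a substitution could work. `fillTemplate` is a hypothetical helper, and whether the library substitutes exactly the `{context}` and `{query}` names is an assumption taken from the example template, not a confirmed detail.

```typescript
// Replace each {name} placeholder with the matching value.
// Unknown placeholders are left untouched so mistakes stay visible.
function fillTemplate(
  template: string,
  values: Record<string, string>,
): string {
  return template.replace(/\{(\w+)\}/g, (match: string, key: string) => {
    const value = Object.prototype.hasOwnProperty.call(values, key)
      ? values[key]
      : undefined;
    return value ?? match;
  });
}

const filledPrompt = fillTemplate(
  "Context: {context}\n\nQuestion: {query}\n\nAnswer:",
  {
    context: "LlamaIndex is a data framework.",
    query: "What is LlamaIndex?",
  },
);
```

Rendering a template by hand like this is a quick way to check what the model will see before wiring the template into an engine.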
Memory Management
import { ChatMemoryBuffer } from "@llamaindex/core/memory";

const memory = new ChatMemoryBuffer({
  tokenLimit: 3000
});

const chatEngine = index.asChatEngine({
  chatHistory: memory
});
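The idea behind a token-limited buffer is that older messages are dropped once the conversation exceeds the budget. The sketch below illustrates that behavior under stated assumptions. `MessageSketch`, `estimateTokens`, and `trimToTokenLimit` are hypothetical, and counting whitespace-separated words is a crude substitute for a real tokenizer. It is not how `ChatMemoryBuffer` is implemented.

```typescript
// Minimal message shape for the sketch.
type MessageSketch = { role: "user" | "assistant"; content: string };

// Crude token estimate: one token per whitespace-separated word.
function estimateTokens(text: string): number {
  return text.split(/\s+/).filter(Boolean).length;
}

// Keep the most recent messages whose combined estimate fits the limit.
function trimToTokenLimit(
  messages: MessageSketch[],
  tokenLimit: number,
): MessageSketch[] {
  const kept: MessageSketch[] = [];
  let used = 0;
  // Walk from newest to oldest so recent messages are kept first.
  for (const message of [...messages].reverse()) {
    const cost = estimateTokens(message.content);
    if (used + cost > tokenLimit) break;
    kept.unshift(message);
    used += cost;
  }
  return kept;
}

const conversation: MessageSketch[] = [
  { role: "user", content: "What is LlamaIndex?" },
  {
    role: "assistant",
    content: "A framework for building LLM applications over your data.",
  },
  { role: "user", content: "Tell me more about its features" },
];

// With a budget of 15 estimated tokens, only the two newest messages fit.
const trimmedConversation = trimToTokenLimit(conversation, 15);
```

A smaller limit keeps prompts cheap but makes the assistant forget earlier turns sooner, so the limit is a trade-off between cost and continuity.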
Node Post-processors
Filter or rerank retrieved nodes:
import { SimilarityPostprocessor } from "llamaindex/postprocessors";

const postprocessor = new SimilarityPostprocessor({
  similarityCutoff: 0.7
});

const queryEngine = index.asQueryEngine({
  nodePostprocessors: [postprocessor]
});
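Conceptually, a similarity cutoff is a filter over scored nodes. Here is a minimal sketch of that filtering step. `ScoredNodeSketch` and `applySimilarityCutoff` are hypothetical names, and dropping nodes that have no score at all is a choice made for this sketch rather than documented library behavior.

```typescript
// Assumed shape of a retrieved node with an optional similarity score.
type ScoredNodeSketch = { text: string; score?: number };

// Keep only nodes whose score is at or above the cutoff.
function applySimilarityCutoff(
  nodes: ScoredNodeSketch[],
  similarityCutoff: number,
): ScoredNodeSketch[] {
  return nodes.filter(
    (node) => node.score !== undefined && node.score >= similarityCutoff,
  );
}

const retrievedNodes: ScoredNodeSketch[] = [
  { text: "Highly relevant chunk", score: 0.91 },
  { text: "Borderline chunk", score: 0.7 },
  { text: "Off-topic chunk", score: 0.42 },
  { text: "Unscored chunk" },
];

// Two nodes survive: the 0.91 and the 0.7 entries.
const filteredNodes = applySimilarityCutoff(retrievedNodes, 0.7);
```

A cutoff that is too high can leave the engine with no context at all, so it is worth checking how many nodes survive for typical queries.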
Best Practices
- Use ContextChatEngine for RAG: Automatically retrieves relevant context
- Configure similarity threshold: Filter low-quality retrieval results
- Stream long responses: Better UX for lengthy answers
- Inspect source nodes: Verify response quality
- Use SubQuestionQueryEngine for complex queries: Break down multi-part questions
- Set an appropriate similarityTopK: Balance context against noise