Overview
Query engines provide interfaces for querying indexed data and generating responses. They combine retrieval with response synthesis to answer questions over your data.
BaseQueryEngine
Abstract base class for all query engines.
import { BaseQueryEngine } from "@llamaindex/core/query-engine";
Methods
Query the engine with a streaming or non-streaming response.
Non-streaming: query(params: NonStreamingQueryParams): Promise<EngineResponse>
Streaming: query(params: StreamingQueryParams): Promise<AsyncIterable<EngineResponse>>
query (string | QueryBundle, required): The query string or QueryBundle.
stream: Whether to stream the response. Defaults to false.
response: The generated response text.
sourceNodes: Retrieved source nodes used to generate the response.
metadata: Additional response metadata.
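To make the two call shapes concrete, here is a minimal sketch of how a `stream` flag can select the return type through overloads. `MiniResponse`, `NonStreamingParams`, `StreamingParams`, and `EchoEngine` are hypothetical local stand-ins written for this illustration, not the library's classes; only the `query`/`stream` parameter shape follows the signatures above.

```typescript
// Local stand-ins for illustration only; these are not LlamaIndexTS types.
type MiniResponse = { response: string };
type NonStreamingParams = { query: string; stream?: false };
type StreamingParams = { query: string; stream: true };

class EchoEngine {
  // Overloads: the `stream` flag decides the return type at compile time.
  query(params: NonStreamingParams): Promise<MiniResponse>;
  query(params: StreamingParams): Promise<AsyncIterable<MiniResponse>>;
  async query(
    params: NonStreamingParams | StreamingParams,
  ): Promise<MiniResponse | AsyncIterable<MiniResponse>> {
    const words = params.query.split(" ");
    if (params.stream) {
      // Streaming: yield one chunk per word.
      return (async function* () {
        for (const word of words) yield { response: word };
      })();
    }
    // Non-streaming: return the whole answer at once.
    return { response: words.join(" ") };
  }
}

async function demoOverloads(): Promise<string[]> {
  const engine = new EchoEngine();
  const full = await engine.query({ query: "hello query engine" });
  const stream = await engine.query({ query: "hello query engine", stream: true });
  const chunks: string[] = [];
  for await (const chunk of stream) {
    chunks.push(chunk.response);
  }
  return [full.response, chunks.join("|")];
}
```

With overloads like these, callers need no casts: the non-streaming call is typed as a single response and the streaming call as an async iterable of chunks.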
Retrieve relevant nodes without generating a response.
retrieve(query: QueryType): Promise<NodeWithScore[]>
query (string | QueryBundle, required): The retrieval query.
Returns an array of retrieved nodes with relevance scores.
QueryBundle
Enhanced query object with optional embeddings.
type QueryBundle = {
  query: MessageContent;
  customEmbeddings?: string[];
  embeddings?: number[];
};
query (string | MessageContentDetail[]): The query text or multi-modal content.
customEmbeddings (string[], optional): Custom embedding strings.
embeddings (number[], optional): Pre-computed query embeddings.
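Because a query can arrive as either a plain string or a QueryBundle, code that accepts both usually normalizes the input first. The sketch below shows one way to do that. The type declarations are simplified local copies of the shapes described above (text parts only), and `toQueryBundle` and `queryText` are hypothetical helper names, not library functions.

```typescript
// Simplified local copies of the shapes described above (text-only content parts).
type MessageContentDetail = { type: "text"; text: string };
type MessageContent = string | MessageContentDetail[];
type QueryBundle = {
  query: MessageContent;
  customEmbeddings?: string[];
  embeddings?: number[];
};
type QueryType = string | QueryBundle;

// Hypothetical helper: wrap a bare string so callers always get a QueryBundle.
function toQueryBundle(query: QueryType): QueryBundle {
  return typeof query === "string" ? { query } : query;
}

// Hypothetical helper: flatten the query content to plain text.
function queryText(query: QueryType): string {
  const content = toQueryBundle(query).query;
  return typeof content === "string"
    ? content
    : content.map((part) => part.text).join("\n");
}
```

Normalizing early keeps the rest of the pipeline free of `typeof` checks, and any pre-computed `embeddings` on the bundle pass through unchanged.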
Usage Examples
Basic Query
import { VectorStoreIndex } from "llamaindex";
import { Document } from "@llamaindex/core/schema";

const documents = [
  new Document({ text: "LlamaIndex is a data framework for LLM applications." }),
  new Document({ text: "It provides tools for ingestion, indexing, and querying." }),
];

// Build the index, then wrap it in a query engine
const index = await VectorStoreIndex.fromDocuments(documents);
const queryEngine = index.asQueryEngine();

const response = await queryEngine.query({
  query: "What is LlamaIndex?",
});

console.log(response.response);
console.log(response.sourceNodes); // Nodes used to generate response
Streaming Query
const queryEngine = index.asQueryEngine();

const stream = await queryEngine.query({
  query: "What is LlamaIndex?",
  stream: true,
});

// Print each chunk as it arrives
for await (const chunk of stream) {
  process.stdout.write(chunk.response);
}
Retrieve Only
const nodes = await queryEngine.retrieve("LlamaIndex");

nodes.forEach((nodeWithScore) => {
  console.log(`Score: ${nodeWithScore.score}`);
  console.log(`Text: ${nodeWithScore.node.text}`);
});
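Retrieved nodes can be filtered and ranked before they are used. The following is a minimal sketch, assuming a local `ScoredNode` stand-in for the node-with-score shape in which `score` may be missing. `selectNodes`, `minScore`, and `topK` are hypothetical names chosen for this example, not library API.

```typescript
// Local stand-in for a retrieved node; `score` is optional and may be absent.
type ScoredNode = { node: { text: string }; score?: number };

// Hypothetical helper: keep nodes scoring at least `minScore`, best first, at most `topK`.
// Nodes without a score are treated as scoring 0.
function selectNodes(nodes: ScoredNode[], minScore: number, topK: number): ScoredNode[] {
  return nodes
    .filter((n) => (n.score ?? 0) >= minScore) // filter() copies, so the input is not reordered
    .sort((a, b) => (b.score ?? 0) - (a.score ?? 0))
    .slice(0, topK);
}
```

Treating a missing score as 0 is a design choice for this sketch; depending on the retriever, it may be more appropriate to keep unscored nodes instead.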
QueryBundle with Custom Embeddings
import { OpenAIEmbedding } from "@llamaindex/openai";

const embedModel = new OpenAIEmbedding();
const queryEmbedding = await embedModel.getTextEmbedding("What is LlamaIndex?");

// Pass the pre-computed embedding alongside the query text
const response = await queryEngine.query({
  query: {
    query: "What is LlamaIndex?",
    embeddings: queryEmbedding,
  },
});
Advanced Query Engines
RetrieverQueryEngine
Query engine that uses a retriever and response synthesizer.
import { RetrieverQueryEngine } from "llamaindex";

// `index` and `responseSynthesizer` are assumed to be defined elsewhere
const queryEngine = new RetrieverQueryEngine({
  retriever: index.asRetriever(),
  responseSynthesizer: responseSynthesizer,
});
SubQuestionQueryEngine
Breaks down complex queries into sub-questions.
import { SubQuestionQueryEngine } from "llamaindex";

// `tool1`, `tool2`, and `responseSynthesizer` are assumed to be defined elsewhere
const queryEngine = new SubQuestionQueryEngine({
  queryEngineTools: [tool1, tool2],
  responseSynthesizer: responseSynthesizer,
});

const response = await queryEngine.query({
  query: "Compare feature A and feature B",
});
RouterQueryEngine
Routes queries to appropriate query engines based on content.
import { RouterQueryEngine } from "llamaindex";

// `selector`, `docEngine`, and `codeEngine` are assumed to be defined elsewhere
const queryEngine = new RouterQueryEngine({
  selector: selector,
  queryEngineTools: [docEngine, codeEngine],
});
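The routing idea can be illustrated without any library code. The sketch below uses plain keyword overlap as a deliberately simple stand-in for the selection step; it is not how RouterQueryEngine or its selector decides. `RoutedTool` and `selectTool` are hypothetical names introduced for this example.

```typescript
// Hypothetical tool shape: a name plus keywords describing what the engine covers.
type RoutedTool = { name: string; keywords: string[] };

// Pick the tool whose keywords overlap most with the query; the first tool wins ties.
function selectTool(query: string, tools: RoutedTool[]): RoutedTool {
  const words = new Set(query.toLowerCase().split(/\W+/).filter(Boolean));
  let best: RoutedTool | undefined;
  let bestScore = -1;
  for (const tool of tools) {
    const score = tool.keywords.filter((k) => words.has(k.toLowerCase())).length;
    if (score > bestScore) {
      best = tool;
      bestScore = score;
    }
  }
  if (!best) throw new Error("at least one tool is required");
  return best;
}
```

A query with no matching keywords falls back to the first tool, so the order of the tool list acts as the default route in this sketch.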
Response Synthesis
Query engines use response synthesizers to generate answers:
import { ResponseSynthesizer, CompactAndRefine } from "llamaindex";

const synthesizer = new ResponseSynthesizer({
  responseBuilder: new CompactAndRefine(),
  streaming: true,
});

const queryEngine = index.asQueryEngine({
  responseSynthesizer: synthesizer,
});
Query Events
Query engines emit events during execution:
import { Settings } from "llamaindex";

Settings.callbackManager.on("query-start", (event) => {
  console.log("Query started:", event.query);
});

Settings.callbackManager.on("query-end", (event) => {
  console.log("Query completed:", event.response);
});

const response = await queryEngine.query({ query: "What is LlamaIndex?" });
Customization
Custom Query Engine
import { BaseQueryEngine } from "@llamaindex/core/query-engine";
import { EngineResponse, type NodeWithScore } from "@llamaindex/core/schema";

class CustomQueryEngine extends BaseQueryEngine {
  async _query(query: string, stream?: boolean): Promise<EngineResponse> {
    // Custom query logic: retrieve nodes, then synthesize an answer from them
    const nodes = await this.customRetrieve(query);
    const response = await this.customSynthesize(nodes);
    return {
      response: response,
      sourceNodes: nodes,
      metadata: {},
    };
  }

  private async customRetrieve(query: string): Promise<NodeWithScore[]> {
    // Custom retrieval logic
    return [];
  }

  private async customSynthesize(nodes: NodeWithScore[]): Promise<string> {
    // Custom synthesis logic
    return "Generated response";
  }
}
Best Practices
Use streaming for long responses: Improves perceived latency
Inspect source nodes: Verify response quality by checking retrieved sources
Configure retrieval parameters: Adjust top_k and similarity threshold for better results
Handle errors gracefully: Implement error handling for failed queries
Cache embeddings: Reuse QueryBundle with embeddings for repeated queries
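The embedding-cache practice can be sketched as a small memoizer keyed on the query text. This is one possible approach under stated assumptions: `Embedder`, `CachedBundle`, and `createBundleCache` are hypothetical names, and the embedding function is injected so that any model's embedding call can be plugged in.

```typescript
// Hypothetical types: an async embedding function and the bundle it produces.
type Embedder = (text: string) => Promise<number[]>;
type CachedBundle = { query: string; embeddings: number[] };

// Memoize bundles by query text so each distinct text is embedded only once.
function createBundleCache(embed: Embedder): (query: string) => Promise<CachedBundle> {
  const cache = new Map<string, Promise<CachedBundle>>();
  return (query: string): Promise<CachedBundle> => {
    let bundle = cache.get(query);
    if (!bundle) {
      bundle = embed(query).then((embeddings) => ({ query, embeddings }));
      cache.set(query, bundle);
      // Drop failed entries so a later call can retry instead of reusing the error.
      bundle.catch(() => {
        cache.delete(query);
      });
    }
    return bundle;
  };
}
```

Caching the promise rather than the resolved value means concurrent calls for the same text share one embedding request. The returned bundle can then be passed as the `query` field of a query call, as in the QueryBundle example above.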