What are Chat Engines?

Chat engines provide conversational interfaces over your data. Unlike query engines that handle single questions, chat engines:
  • Maintain chat history for context
  • Support follow-up questions
  • Enable streaming responses
  • Handle multi-turn conversations

Chat Engine Types

LlamaIndex.TS provides several chat engine types:

SimpleChatEngine

Basic chat without retrieval (just LLM conversation):
import { SimpleChatEngine } from "llamaindex";

const chatEngine = new SimpleChatEngine();

const response = await chatEngine.chat({
  message: "Hello! How are you?"
});

console.log(response.message.content);

ContextChatEngine

Chat with document retrieval for every message:
import { ContextChatEngine, VectorStoreIndex, Document } from "llamaindex";

const document = new Document({ text: "Your document text" });
const index = await VectorStoreIndex.fromDocuments([document]);
const retriever = index.asRetriever({ similarityTopK: 5 });

const chatEngine = new ContextChatEngine({ retriever });

const response = await chatEngine.chat({
  message: "What does the document say?"
});
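
The idea of retrieving for every message can be sketched without the library. This is a simplified illustration under stated assumptions, not how ContextChatEngine is implemented: `retrieve` and `buildTurn` are hypothetical names, and the word-overlap matching stands in for real embedding-based retrieval.

```typescript
type Role = "system" | "user" | "assistant";

interface Message {
  role: Role;
  content: string;
}

// Hypothetical retriever: keeps snippets that share at least one word with the query.
function retrieve(query: string, corpus: string[], topK: number): string[] {
  const queryWords = new Set(query.toLowerCase().split(/\W+/).filter(Boolean));
  return corpus
    .filter((text) =>
      text.toLowerCase().split(/\W+/).some((word) => queryWords.has(word)),
    )
    .slice(0, topK);
}

// Build the message list for one turn: fresh context first, then history, then the new message.
function buildTurn(
  history: Message[],
  userMessage: string,
  corpus: string[],
  topK = 2,
): Message[] {
  const context = retrieve(userMessage, corpus, topK).join("\n");
  return [
    { role: "system", content: `Context:\n${context}` },
    ...history,
    { role: "user", content: userMessage },
  ];
}
```

Because retrieval runs again on every call to `buildTurn`, the context tracks the latest message rather than the start of the conversation.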

CondenseQuestionChatEngine

Condenses chat history into standalone questions before retrieval:
import { CondenseQuestionChatEngine } from "llamaindex";

const queryEngine = index.asQueryEngine();

const chatEngine = new CondenseQuestionChatEngine({
  queryEngine,
  chatHistory: []  // Manages history internally
});

Creating a Chat Engine from an Index

The easiest way to create a chat engine from an index:
import { VectorStoreIndex, Document } from "llamaindex";

const document = new Document({ text: "Your document text" });
const index = await VectorStoreIndex.fromDocuments([document]);

// Creates a ContextChatEngine internally
const chatEngine = index.asChatEngine({
  similarityTopK: 5
});

const response = await chatEngine.chat({
  message: "Tell me about the document"
});

console.log(response.message.content);

Complete Working Example

Here’s a full conversational RAG application:
import {
  ContextChatEngine,
  Document,
  Settings,
  VectorStoreIndex
} from "llamaindex";
import { stdin as input, stdout as output } from "node:process";
import readline from "node:readline/promises";

// Configure chunk size
Settings.chunkSize = 512;

async function main() {
  // Load document
  const essay = await loadEssay(); // Your document loading logic
  const document = new Document({ text: essay });
  
  // Create index and retriever
  const index = await VectorStoreIndex.fromDocuments([document]);
  const retriever = index.asRetriever({
    similarityTopK: 5
  });
  
  // Create chat engine
  const chatEngine = new ContextChatEngine({ retriever });
  
  // Interactive chat loop
  const rl = readline.createInterface({ input, output });
  
  console.log("Chat with your document! Type 'exit' to quit.\n");
  
  while (true) {
    const query = await rl.question("You: ");
    
    if (query.toLowerCase() === "exit") break;
    
    const stream = await chatEngine.chat({ 
      message: query, 
      stream: true 
    });
    
    process.stdout.write("Assistant: ");
    for await (const chunk of stream) {
      process.stdout.write(chunk.response);
    }
    process.stdout.write("\n\n");
  }

  // Close the readline interface so the process can exit after "exit"
  rl.close();
}

main().catch(console.error);

Chat History Management

Accessing Chat History

Get the conversation history:
const chatEngine = new CondenseQuestionChatEngine({
  queryEngine,
  chatHistory: []
});

// After some chats...
// chatHistory may be a Promise in some versions; await handles both cases
const history = await chatEngine.chatHistory;
console.log(history);

Custom Chat History

Provide initial chat context:
import type { ChatMessage } from "llamaindex";

const initialHistory: ChatMessage[] = [
  {
    role: "user",
    content: "What is LlamaIndex?"
  },
  {
    role: "assistant",
    content: "LlamaIndex is a data framework for LLM applications."
  }
];

const chatEngine = new CondenseQuestionChatEngine({
  queryEngine,
  chatHistory: initialHistory
});

Resetting Chat History

Clear the conversation:
chatEngine.reset();

Streaming Responses

Stream tokens as they’re generated:
const stream = await chatEngine.chat({
  message: "Tell me about the document",
  stream: true
});

for await (const chunk of stream) {
  process.stdout.write(chunk.response);
}
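
When the full text is needed after streaming (for logging or storage, say), the chunks can be accumulated while they are displayed. A minimal sketch, assuming each chunk exposes a `response` string as in the loop above; `fakeStream` and `collectStream` are hypothetical helpers, and `fakeStream` only stands in for a real chat stream so the example runs without an LLM.

```typescript
// Shape assumed from the loop above: each streamed chunk carries a `response` string.
interface StreamChunk {
  response: string;
}

// Stand-in for a chat stream so the sketch runs without an LLM.
async function* fakeStream(parts: string[]): AsyncGenerator<StreamChunk> {
  for (const part of parts) {
    yield { response: part };
  }
}

// Pass each chunk to a callback as it arrives and return the full text at the end.
async function collectStream(
  stream: AsyncIterable<StreamChunk>,
  onChunk: (text: string) => void = () => {},
): Promise<string> {
  let full = "";
  for await (const chunk of stream) {
    onChunk(chunk.response);
    full += chunk.response;
  }
  return full;
}
```

With a real engine, the stream returned by `chatEngine.chat({ message, stream: true })` would take the place of `fakeStream(...)`, provided its chunks have the same shape.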

Streaming with Different Indices

Chat engines created from the index types below stream in the same way:
import { VectorStoreIndex, SummaryIndex, KeywordTableIndex } from "llamaindex";

// Vector store index chat
const vectorChat = (await VectorStoreIndex.fromDocuments([doc]))
  .asChatEngine();

// Summary index chat
const summaryChat = (await SummaryIndex.fromDocuments([doc]))
  .asChatEngine();

// Keyword index chat  
const keywordChat = (await KeywordTableIndex.fromDocuments([doc]))
  .asChatEngine();

// All support streaming
const stream = await vectorChat.chat({ 
  message: "Hello", 
  stream: true 
});

CondenseQuestionChatEngine Deep Dive

This engine is ideal for question-focused conversations.

How It Works

  1. Condenses the chat history + new message into a standalone question
  2. Queries the index with the condensed question
  3. Returns the answer and updates chat history
import { CondenseQuestionChatEngine } from "llamaindex";

const queryEngine = index.asQueryEngine();

const chatEngine = new CondenseQuestionChatEngine({
  queryEngine,
  chatHistory: []
});

// First question
await chatEngine.chat({
  message: "What is the main topic?"
});
// Internally: "What is the main topic?" (no history)

// Follow-up question
await chatEngine.chat({
  message: "Tell me more about it"
});
// Internally: Condenses to "Tell me more about the main topic" using history
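
The three steps can be sketched in plain TypeScript. This is an illustration only, not the library's implementation: `CondenseSketch`, `Complete`, and `Query` are hypothetical names, the two callbacks stand in for an LLM call and a query engine call, and skipping the condense step when history is empty is an assumption of this sketch.

```typescript
interface ChatMessage {
  role: "user" | "assistant";
  content: string;
}

// Hypothetical stand-ins for an LLM completion call and a query engine call.
type Complete = (prompt: string) => Promise<string>;
type Query = (question: string) => Promise<string>;

class CondenseSketch {
  readonly history: ChatMessage[] = [];
  private readonly complete: Complete;
  private readonly query: Query;

  constructor(complete: Complete, query: Query) {
    this.complete = complete;
    this.query = query;
  }

  async chat(message: string): Promise<string> {
    // Step 1: condense history + new message into a standalone question.
    // Assumption of this sketch: with empty history the message is used as-is.
    let standalone = message;
    if (this.history.length > 0) {
      const transcript = this.history
        .map((m) => `${m.role}: ${m.content}`)
        .join("\n");
      standalone = await this.complete(
        `Chat history:\n${transcript}\n\nFollow-up: ${message}\n\nStandalone question:`,
      );
    }

    // Step 2: query the index with the condensed question.
    const answer = await this.query(standalone);

    // Step 3: record the original message and the answer, then return the answer.
    this.history.push(
      { role: "user", content: message },
      { role: "assistant", content: answer },
    );
    return answer;
  }
}
```

Note that the history stores the user's original wording, not the condensed question, so later turns are condensed against what was actually said.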

Custom Condense Prompt

Customize how questions are condensed:
import { 
  CondenseQuestionChatEngine,
  type CondenseQuestionPrompt 
} from "llamaindex";

const customPrompt: CondenseQuestionPrompt = ({
  question,
  chatHistory
}) => {
  return `Given this chat history:
${chatHistory}

Rewrite this follow-up question as a standalone question:
${question}

Standalone question:`;
};

const chatEngine = new CondenseQuestionChatEngine({
  queryEngine,
  chatHistory: [],
  condenseMessagePrompt: customPrompt
});
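
A template like the one above reads well only if the chat history reaches it as text; interpolating an array of message objects directly would print "[object Object]". Whether the library passes a string or an array may depend on the version, so the sketch below shows one way to render messages before building the prompt. `formatHistory` and `buildCondensePrompt` are hypothetical helpers, not part of LlamaIndex.TS.

```typescript
interface ChatMessage {
  role: string;
  content: string;
}

// Render each message as "role: content", one per line.
function formatHistory(messages: ChatMessage[]): string {
  return messages.map((m) => `${m.role}: ${m.content}`).join("\n");
}

// Assemble the same prompt layout as the custom template above.
function buildCondensePrompt(question: string, messages: ChatMessage[]): string {
  return [
    "Given this chat history:",
    formatHistory(messages),
    "",
    "Rewrite this follow-up question as a standalone question:",
    question,
    "",
    "Standalone question:",
  ].join("\n");
}
```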

When to Use CondenseQuestionChatEngine

  • Questions build on previous context
  • Queries are primarily questions (not commands)
  • You want explicit question reformulation

When NOT to Use It

  • Messages are conversational statements
  • Heavy use of pronouns (“it”, “that”, “this”)
  • Non-question interactions

Configuration Options

Retrieval Parameters

Control how many chunks to retrieve:
const chatEngine = index.asChatEngine({
  similarityTopK: 10  // Retrieve top 10 chunks
});
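
To make the effect of `similarityTopK` concrete, here is a minimal sketch of top-k selection by cosine similarity. It illustrates the general technique, not the library's retriever: `Chunk`, `cosineSimilarity`, and `topK` are hypothetical names, and the embeddings are assumed to be plain number arrays of equal length.

```typescript
interface Chunk {
  text: string;
  embedding: number[];
}

// Cosine similarity between two vectors; returns 0 if either has zero length.
function cosineSimilarity(a: number[], b: number[]): number {
  let dot = 0;
  let normA = 0;
  let normB = 0;
  a.forEach((x, i) => {
    const y = b[i] ?? 0;
    dot += x * y;
    normA += x * x;
    normB += y * y;
  });
  const denom = Math.sqrt(normA) * Math.sqrt(normB);
  return denom === 0 ? 0 : dot / denom;
}

// Score every chunk against the query, sort by score descending, keep the first k.
function topK(query: number[], chunks: Chunk[], k: number): Chunk[] {
  return chunks
    .map((chunk) => ({ chunk, score: cosineSimilarity(query, chunk.embedding) }))
    .sort((x, y) => y.score - x.score)
    .slice(0, k)
    .map((scored) => scored.chunk);
}
```

A larger k gives the LLM more material to draw on, at the cost of a longer prompt and a higher chance of including weakly related chunks.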

Custom Settings

Global configuration:
import { Settings } from "llamaindex";

Settings.chunkSize = 1024;
Settings.chunkOverlap = 100;
Settings.llm = customLLM;
Settings.embedModel = customEmbedding;
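
The relationship between chunk size and overlap can be illustrated with a small sketch. It splits by characters to keep the arithmetic visible; the library's own splitter may work on tokens and sentence boundaries instead, so treat `chunkWithOverlap` as a hypothetical helper that only shows how consecutive chunks share text.

```typescript
// Split text into windows of `chunkSize` characters, where each window
// starts `chunkSize - chunkOverlap` characters after the previous one.
function chunkWithOverlap(
  text: string,
  chunkSize: number,
  chunkOverlap: number,
): string[] {
  if (chunkSize <= 0 || chunkOverlap < 0 || chunkOverlap >= chunkSize) {
    throw new RangeError("require chunkSize > 0 and 0 <= chunkOverlap < chunkSize");
  }
  const step = chunkSize - chunkOverlap;
  const chunks: string[] = [];
  for (let start = 0; start < text.length; start += step) {
    chunks.push(text.slice(start, start + chunkSize));
    // Stop once a window has reached the end of the text.
    if (start + chunkSize >= text.length) break;
  }
  return chunks;
}
```

For example, `chunkWithOverlap("abcdefghij", 4, 2)` returns `["abcd", "cdef", "efgh", "ghij"]`: each chunk repeats the last two characters of the one before it.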

Choosing the Right Chat Engine

| Engine | Use Case | Pros | Cons |
|---|---|---|---|
| SimpleChatEngine | Pure conversation | Fast, no retrieval overhead | No document context |
| ContextChatEngine | General chat over docs | Simple, always has context | May retrieve irrelevant info |
| CondenseQuestionChatEngine | Q&A sessions | Better follow-ups | Only good for questions |
| Index.asChatEngine() | Quick start | Easy setup | Less customization |
