Overview

Chat engines enable conversational interactions with your data, maintaining chat history and context across multiple turns.

BaseChatEngine

Abstract base class for all chat engines.
import { BaseChatEngine } from "@llamaindex/core/chat-engine";

Properties

chatHistory
ChatMessage[] | Promise<ChatMessage[]>
The conversation history

Methods

chat
method
Send a message and get a response.
Non-streaming:
chat(params: NonStreamingChatEngineParams): Promise<EngineResponse>
Streaming:
chat(params: StreamingChatEngineParams): Promise<AsyncIterable<EngineResponse>>
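
The two signatures above differ in their parameter type, and the later examples on this page select streaming with `stream: true`. The sketch below shows how such a flag can drive TypeScript overloads. `Reply`, `NonStreamingParams`, `StreamingParams`, and `EchoEngine` are hypothetical stand-ins written for this illustration, not the library's own types, and the engine only echoes its input.

```typescript
// Hypothetical stand-ins for illustration; these are not the LlamaIndexTS types.
type Reply = { response: string };
type NonStreamingParams = { message: string; stream?: false };
type StreamingParams = { message: string; stream: true };

class EchoEngine {
  // Overloads: the `stream` flag selects the return type.
  chat(params: NonStreamingParams): Promise<Reply>;
  chat(params: StreamingParams): Promise<AsyncIterable<Reply>>;
  async chat(
    params: NonStreamingParams | StreamingParams,
  ): Promise<Reply | AsyncIterable<Reply>> {
    const text = `echo: ${params.message}`;
    if (params.stream) {
      // Split into word-sized chunks, keeping trailing whitespace on each.
      const pieces = text.match(/\S+\s*/g) ?? [];
      return (async function* () {
        for (const piece of pieces) yield { response: piece };
      })();
    }
    return { response: text };
  }
}

async function demo(): Promise<string[]> {
  const engine = new EchoEngine();
  // Without `stream`, the result is a single reply.
  const single = await engine.chat({ message: "hello there" });
  // With `stream: true`, the result is an async iterable of chunks.
  const chunks: string[] = [];
  for await (const chunk of await engine.chat({ message: "hello there", stream: true })) {
    chunks.push(chunk.response);
  }
  return [single.response, ...chunks];
}

demo().then((lines) => console.log(lines));
// Logs the full reply "echo: hello there" followed by the chunks "echo: ", "hello ", "there"
```

Because the compiler resolves the overload from the literal `stream` value, callers get the correct return type without casting.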

SimpleChatEngine

A basic chat engine without retrieval: it sends the conversation directly to the LLM.
import { SimpleChatEngine } from "@llamaindex/core/chat-engine";
import { OpenAI } from "@llamaindex/openai";

Example

const llm = new OpenAI({ model: "gpt-4" });
const chatEngine = new SimpleChatEngine({ llm });

const response1 = await chatEngine.chat({
  message: "Hello! My name is Alice."
});
console.log(response1.response); // "Hello Alice! How can I help you?"

const response2 = await chatEngine.chat({
  message: "What's my name?"
});
console.log(response2.response); // "Your name is Alice."

ContextChatEngine

A chat engine with retrieval: for each message, it fetches relevant context from the retriever and passes it to the LLM.
import { ContextChatEngine } from "@llamaindex/core/chat-engine";

Constructor Options

retriever
BaseRetriever
required
Retriever for fetching relevant context
llm
LLM
Language model (defaults to Settings.llm)
chatHistory
ChatMessage[]
Initial chat history
systemPrompt
string
System prompt for the chat
contextSystemPrompt
string
Template for injecting retrieved context

Example

import { VectorStoreIndex } from "llamaindex";
import { Document } from "@llamaindex/core/schema";

const documents = [
  new Document({ text: "The company was founded in 2020." }),
  new Document({ text: "Our main product is a data framework." })
];

const index = await VectorStoreIndex.fromDocuments(documents);
const chatEngine = index.asChatEngine();

const response = await chatEngine.chat({
  message: "When was the company founded?"
});

console.log(response.response); // "The company was founded in 2020."
console.log(response.sourceNodes); // Retrieved context nodes

Streaming Chat

const chatEngine = index.asChatEngine();

const stream = await chatEngine.chat({
  message: "Tell me about the company",
  stream: true
});

for await (const chunk of stream) {
  process.stdout.write(chunk.response);
}
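
When the full text is needed after streaming (for logging or storage, say), the chunks can be accumulated as they arrive. This is a minimal sketch that assumes only that each chunk exposes a string `response` field, as in the loop above. `collectStream`, `TextChunk`, and `fakeStream` are hypothetical names, and the fake stream stands in for a real engine so the example runs without an LLM.

```typescript
// Hypothetical helper: concatenates the text of each streamed chunk.
type TextChunk = { response: string };

async function collectStream(stream: AsyncIterable<TextChunk>): Promise<string> {
  let text = "";
  for await (const chunk of stream) {
    text += chunk.response;
  }
  return text;
}

// Stand-in stream so the sketch runs without an LLM.
async function* fakeStream(): AsyncGenerator<TextChunk> {
  yield { response: "The company " };
  yield { response: "was founded in 2020." };
}

collectStream(fakeStream()).then((fullText) => {
  console.log(fullText); // "The company was founded in 2020."
});
```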

Multi-modal Chat

Chat engines accept messages that combine text with images and other media, provided the underlying LLM is multi-modal:
const response = await chatEngine.chat({
  message: [
    { type: "text", text: "What's in this image?" },
    {
      type: "image_url",
      image_url: { url: "data:image/jpeg;base64,..." }
    }
  ]
});

Custom Chat History

Using ChatMemoryBuffer

import { ChatMemoryBuffer } from "@llamaindex/core/memory";

const memory = new ChatMemoryBuffer({ tokenLimit: 3000 });

const response = await chatEngine.chat({
  message: "Hello",
  chatHistory: memory
});
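
To give a feel for what a token limit does, the sketch below trims a history to the most recent messages that fit a budget. It is an illustration of the general idea only, not the ChatMemoryBuffer implementation. `Message`, `estimateTokens`, and `trimToTokenLimit` are hypothetical names, and the four-characters-per-token estimate is a rough assumption that real tokenizers will not match.

```typescript
// Illustrative only: a simplified stand-in for a token-limited chat memory.
type Message = { role: "system" | "user" | "assistant"; content: string };

// Rough heuristic (assumption): about 4 characters per token.
function estimateTokens(text: string): number {
  return Math.ceil(text.length / 4);
}

// Keep the most recent messages whose combined estimate fits the limit.
function trimToTokenLimit(messages: Message[], tokenLimit: number): Message[] {
  const kept: Message[] = [];
  let used = 0;
  // Walk from newest to oldest and stop at the first message that does not fit.
  for (const msg of [...messages].reverse()) {
    const cost = estimateTokens(msg.content);
    if (used + cost > tokenLimit) break;
    kept.unshift(msg);
    used += cost;
  }
  return kept;
}

const pastMessages: Message[] = [
  { role: "user", content: "a".repeat(40) }, // about 10 tokens
  { role: "assistant", content: "b".repeat(40) }, // about 10 tokens
  { role: "user", content: "c".repeat(40) }, // about 10 tokens
];

const trimmed = trimToTokenLimit(pastMessages, 25);
console.log(trimmed.length); // 2 (the oldest message is dropped)
```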

Manual Chat History

import type { ChatMessage } from "@llamaindex/core/llms";

const customHistory: ChatMessage[] = [
  { role: "user", content: "Previous question" },
  { role: "assistant", content: "Previous answer" }
];

const response = await chatEngine.chat({
  message: "Follow-up question",
  chatHistory: customHistory
});

System Prompts

Setting System Prompt

const chatEngine = index.asChatEngine({
  systemPrompt: "You are a helpful assistant that always speaks in rhymes."
});

Custom Context Template

const chatEngine = index.asChatEngine({
  contextSystemPrompt: `
    Use the following context to answer the question.
    If you don't know, say so.
    
    Context:
    {context}
    
    Question: {query}
  `
});
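
The template above relies on `{context}` and `{query}` placeholders. As a rough illustration of how such placeholders could be filled, here is a small substitution helper. `fillTemplate` is a hypothetical function written for this sketch, and the library's own template handling may work differently.

```typescript
// Hypothetical helper, not part of LlamaIndexTS: replaces {name} placeholders
// with values from a map and leaves unknown placeholders untouched.
function fillTemplate(template: string, values: Map<string, string>): string {
  return template.replace(
    /\{(\w+)\}/g,
    (placeholder: string, key: string) => values.get(key) ?? placeholder,
  );
}

const contextTemplate = "Context:\n{context}\n\nQuestion: {query}";

const filledPrompt = fillTemplate(
  contextTemplate,
  new Map([
    ["context", "The company was founded in 2020."],
    ["query", "When was the company founded?"],
  ]),
);

console.log(filledPrompt);
// Context:
// The company was founded in 2020.
//
// Question: When was the company founded?
```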

Chat History Management

Accessing Chat History

const response = await chatEngine.chat({
  message: "Hello"
});

const history = await chatEngine.chatHistory;
console.log(history);
// [
//   { role: "user", content: "Hello" },
//   { role: "assistant", content: "Hi! How can I help you?" }
// ]

Resetting Chat History

import { SimpleChatEngine } from "@llamaindex/core/chat-engine";

const chatEngine = new SimpleChatEngine({ llm });

// Chat history is stored in the engine
await chatEngine.chat({ message: "Message 1" });
await chatEngine.chat({ message: "Message 2" });

// Reset by creating a new engine, which starts with an empty history
const newChatEngine = new SimpleChatEngine({ llm });

Retrieval Configuration

const chatEngine = index.asChatEngine({
  retriever: index.asRetriever({
    similarityTopK: 5,
    mode: "default"
  })
});

Custom Chat Engine

import {
  BaseChatEngine,
  type NonStreamingChatEngineParams,
} from "@llamaindex/core/chat-engine";
import type { ChatMessage } from "@llamaindex/core/llms";
import { EngineResponse } from "@llamaindex/core/schema";

// Minimal non-streaming sketch; a complete engine would also handle `stream: true`
class CustomChatEngine extends BaseChatEngine {
  private history: ChatMessage[] = [];

  async chat(params: NonStreamingChatEngineParams): Promise<EngineResponse> {
    const { message, chatHistory } = params;

    // Copy the provided history (or the internal one) so the caller's array is not mutated
    const messages: ChatMessage[] = [
      ...(Array.isArray(chatHistory) ? chatHistory : this.history),
    ];

    // Add user message
    messages.push({ role: "user", content: message });

    // Generate response (custom logic)
    const response = await this.generateResponse(messages);

    // Add assistant message
    messages.push({ role: "assistant", content: response });

    // Update internal history
    this.history = messages;

    // Assumes EngineResponse.fromResponse(text, stream, sourceNodes) exists in your version
    return EngineResponse.fromResponse(response, false, []);
  }

  get chatHistory(): ChatMessage[] {
    return this.history;
  }

  private async generateResponse(messages: ChatMessage[]): Promise<string> {
    // Custom response generation
    return "Response";
  }
}

Best Practices

  1. Use a context chat engine for RAG: It retrieves relevant information on each turn
  2. Manage token limits: Use ChatMemoryBuffer to prevent context overflow
  3. Provide clear system prompts: Guide the assistant’s behavior
  4. Stream long responses: Better user experience for lengthy answers
  5. Reset history periodically: Prevent context from becoming too large or stale
  6. Include source nodes: Track which documents informed the response