# Core Concepts
LlamaIndex.TS is built around a few key concepts that work together to enable powerful LLM applications. Understanding these concepts will help you build more effective RAG systems, agents, and workflows.
## Overview
At its core, LlamaIndex.TS helps you:
1. **Load** and process your data into structured formats
2. **Index** that data for efficient retrieval
3. **Query** the indexed data using natural language
4. **Generate** responses using LLMs with relevant context
All of these components are modular and composable, allowing you to customize every part of the pipeline.
## Documents and Nodes

### Documents
Documents are the primary data containers in LlamaIndex.TS. They represent your raw data with metadata.
```typescript
import { Document } from "llamaindex";

// Create a document from text
const doc = new Document({
  text: "LlamaIndex is a data framework for LLM applications.",
  id_: "doc1",
});

// Documents can have metadata
const docWithMetadata = new Document({
  text: "Annual revenue increased by 25%.",
  metadata: {
    year: 2024,
    source: "financial_report.pdf",
    page: 5,
  },
});
```
### Nodes
Nodes are atomic units of data in LlamaIndex.TS. Documents are split into nodes (chunks) for efficient retrieval.
```typescript
import { TextNode } from "@llamaindex/core/schema";

const node = new TextNode({
  text: "This is a chunk of text.",
  metadata: { source: "doc1" },
});
```
Nodes are created automatically when you index documents, but you can also create them manually for fine-grained control.
### Node Parsing
LlamaIndex.TS includes several node parsers (text splitters) to chunk your documents:
- `SentenceSplitter`: Splits by sentences while respecting chunk size
- `SimpleNodeParser`: Basic chunking with overlap
- `MarkdownNodeParser`: Preserves markdown structure
- `CodeSplitter`: Language-aware code splitting
```typescript
import { SentenceSplitter } from "@llamaindex/core/node-parser";
import { Document } from "llamaindex";

const parser = new SentenceSplitter({
  chunkSize: 1024,
  chunkOverlap: 20,
});

const document = new Document({ text: "Your long document text..." });
const nodes = parser.getNodesFromDocuments([document]);
```
## Embeddings
Embeddings are vector representations of text that capture semantic meaning. They enable semantic search by measuring similarity between queries and documents.
```typescript
import { OpenAIEmbedding } from "@llamaindex/openai";

const embedModel = new OpenAIEmbedding({
  model: "text-embedding-3-small",
});

// Embed a single text
const embedding = await embedModel.getTextEmbedding("What is LlamaIndex?");
// embedding is a number array: [0.123, -0.456, ...]
```
By default, LlamaIndex.TS uses OpenAI’s embedding models, but you can use any provider including local models via Ollama.
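Under the hood, "similarity" between a query embedding and node embeddings is typically cosine similarity. A minimal, provider-agnostic sketch of the math (for illustration only, not LlamaIndex.TS's internal implementation):

```typescript
// Cosine similarity between two embedding vectors:
// dot(a, b) / (|a| * |b|). Scores near 1 mean "semantically close".
function cosineSimilarity(a: number[], b: number[]): number {
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Vectors pointing the same way score 1; orthogonal vectors score 0.
console.log(cosineSimilarity([1, 0], [2, 0])); // 1
console.log(cosineSimilarity([1, 0], [0, 1])); // 0
```

This is why the query must be embedded with the same model as the documents: scores are only meaningful when both vectors live in the same embedding space.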
## Indices

Indices are data structures that organize your nodes for efficient retrieval. The most common is the `VectorStoreIndex`.
### VectorStoreIndex
Stores embeddings for semantic search:
```typescript
import { VectorStoreIndex, Document } from "llamaindex";

const documents = [
  new Document({ text: "LlamaIndex supports multiple runtimes." }),
  new Document({ text: "RAG improves LLM accuracy with your data." }),
];

// Create an index from documents
const index = await VectorStoreIndex.fromDocuments(documents);

// Add more documents later
await index.insert(new Document({ text: "New information" }));
```
### Other Index Types
LlamaIndex.TS provides several index types for different use cases:
- `SummaryIndex`: Sequential scanning of all nodes
- `KeywordTableIndex`: Keyword-based retrieval
- `KnowledgeGraphIndex`: Graph-based relationships
```typescript
import { SummaryIndex } from "llamaindex/indices";

const summaryIndex = await SummaryIndex.fromDocuments(documents);
```
## Retrieval
Retrievers fetch relevant nodes from an index based on a query.
```typescript
// Create a retriever from an index
const retriever = index.asRetriever({
  similarityTopK: 5, // Return the top 5 most similar nodes
});

// Retrieve relevant nodes
const results = await retriever.retrieve({
  query: "What runtimes does LlamaIndex support?",
});

results.forEach((result) => {
  console.log(result.node.getText());
  console.log("Score:", result.score);
});
```
### Advanced Retrieval
Combine multiple retrieval strategies:
- **Hybrid Search**: Combine semantic and keyword search
- **Reranking**: Improve results with a reranker model
- **Metadata Filtering**: Filter by document metadata
```typescript
import { MetadataFilters } from "@llamaindex/core/schema";

const retriever = index.asRetriever({
  similarityTopK: 10,
  filters: new MetadataFilters({
    filters: [
      {
        key: "year",
        value: 2024,
        operator: "==",
      },
    ],
  }),
});
```
## Query Engines
Query Engines combine retrieval and response generation to answer questions.
```typescript
// Create a query engine from an index
const queryEngine = index.asQueryEngine();

// Query your data
const response = await queryEngine.query({
  query: "What is RAG?",
});

console.log(response.toString());
```
### Streaming Responses
Stream responses for better UX:
```typescript
const stream = await queryEngine.query({
  query: "Explain LlamaIndex",
  stream: true,
});

for await (const chunk of stream) {
  process.stdout.write(chunk.response);
}
```
## Chat Engines
Chat Engines enable multi-turn conversations with context retention.
```typescript
import { ContextChatEngine } from "llamaindex";

const chatEngine = new ContextChatEngine({
  retriever,
  chatHistory: [], // Optional: provide existing history
});

// First message
const response1 = await chatEngine.chat({
  message: "What is LlamaIndex?",
});

// Follow-up (context is maintained)
const response2 = await chatEngine.chat({
  message: "What runtimes does it support?",
});
```
### Chat Engine Types
Different chat engines for different use cases:
- `ContextChatEngine`: Retrieves context for each message
- `SimpleChatEngine`: Direct chat without retrieval
- `CondensePlusContextChatEngine`: Condenses chat history before retrieval
## LLMs (Large Language Models)
LLMs generate the final responses in your application. LlamaIndex.TS supports multiple providers.
```typescript
import { OpenAI } from "@llamaindex/openai";
import { Settings } from "llamaindex";

// Configure the default LLM globally
Settings.llm = new OpenAI({
  model: "gpt-4o",
  temperature: 0.1,
});

// Or use an LLM directly
const llm = new OpenAI({ model: "gpt-4o-mini" });
const response = await llm.chat({
  messages: [{ role: "user", content: "Hello!" }],
});
```
### Switching Providers
Easily switch between LLM providers:
```typescript
import { claude } from "@llamaindex/anthropic";
import { Gemini } from "@llamaindex/gemini";
import { Settings } from "llamaindex";

// Use Anthropic
Settings.llm = claude({ model: "claude-3-5-sonnet-20241022" });

// Or Google Gemini
Settings.llm = new Gemini({ model: "gemini-pro" });
```
## RAG (Retrieval-Augmented Generation)
RAG is the core pattern that combines retrieval with generation. It allows LLMs to answer questions using your data.
### How RAG Works
1. **Index Your Data**: Documents are chunked, embedded, and stored in a vector index.
2. **Query Processing**: The user query is embedded using the same embedding model.
3. **Retrieval**: The most similar chunks are retrieved based on vector similarity.
4. **Context Augmentation**: Retrieved chunks are added to the LLM prompt as context.
5. **Generation**: The LLM generates a response using the provided context.
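Conceptually, the context-augmentation step is just string templating: retrieved chunks are spliced into the prompt ahead of the question. The template below is illustrative only, not the exact prompt LlamaIndex.TS uses internally:

```typescript
// Illustrative context augmentation: build the final LLM prompt
// from the retrieved chunks and the user's question.
function buildRagPrompt(query: string, retrievedChunks: string[]): string {
  const context = retrievedChunks
    .map((chunk, i) => `[${i + 1}] ${chunk}`)
    .join("\n");
  return [
    "Answer the question using only the context below.",
    "",
    "Context:",
    context,
    "",
    `Question: ${query}`,
    "Answer:",
  ].join("\n");
}

const prompt = buildRagPrompt("What runtimes does LlamaIndex support?", [
  "LlamaIndex.TS runs on Node.js and other JavaScript runtimes.",
]);
console.log(prompt);
```

Query engines handle this assembly for you; you only need to think about it when customizing prompts.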
### Basic RAG Pipeline
```typescript
import { VectorStoreIndex, Document } from "llamaindex";

// 1. Index
const index = await VectorStoreIndex.fromDocuments([
  new Document({ text: "Your data here" }),
]);

// 2. Query (retrieval + generation)
const queryEngine = index.asQueryEngine();
const response = await queryEngine.query({
  query: "Your question",
});

console.log(response.toString());
```
## Agents and Workflows
For more advanced use cases, agents can reason, plan, and use tools to accomplish tasks.
```typescript
import { agent } from "@llamaindex/workflow";
import { openai } from "@llamaindex/openai";
import { tool } from "llamaindex";
import { z } from "zod";

// Define a tool
const searchTool = tool({
  name: "search",
  description: "Search the web for information",
  parameters: z.object({
    query: z.string(),
  }),
  execute: async ({ query }) => {
    // Your search implementation
    return `Results for: ${query}`;
  },
});

// Create an agent
const myAgent = agent({
  llm: openai({ model: "gpt-4o" }),
  tools: [searchTool],
});

// Run the agent
const result = await myAgent.run("Find recent news about AI");
```
Agents are powerful for tasks that require multiple steps, external API calls, or decision-making.
## Settings and Configuration

`Settings` is a global configuration object that controls default behavior:
```typescript
import { Settings } from "llamaindex";
import { OpenAI, OpenAIEmbedding } from "@llamaindex/openai";

// Configure the LLM
Settings.llm = new OpenAI({
  model: "gpt-4o",
  temperature: 0.1,
});

// Configure embeddings
Settings.embedModel = new OpenAIEmbedding({
  model: "text-embedding-3-small",
});

// Configure chunking
Settings.chunkSize = 512;
Settings.chunkOverlap = 50;

// Configure callbacks for logging
Settings.callbackManager.on("llm-tool-call", (event) => {
  console.log("Tool called:", event);
});
```
## Vector Stores
For production applications, use a dedicated vector database instead of in-memory storage:
```typescript
import { VectorStoreIndex, Document } from "llamaindex";
import { PineconeVectorStore } from "@llamaindex/pinecone";
import { Pinecone } from "@pinecone-database/pinecone";

// Initialize the vector store
const pinecone = new Pinecone({ apiKey: process.env.PINECONE_API_KEY! });
const pineconeIndex = pinecone.Index("my-index");
const vectorStore = new PineconeVectorStore({ pineconeIndex });

// Create an index backed by the vector store
const documents = [new Document({ text: "Your data here" })];
const index = await VectorStoreIndex.fromDocuments(documents, { vectorStore });
```
Supported vector stores:
- Pinecone
- Qdrant
- Chroma
- Weaviate
- Milvus
- MongoDB Atlas
- PostgreSQL (pgvector)
- And more!
## Next Steps
Now that you understand the core concepts, dive deeper into specific topics:
- **Query Engines**: Learn about different query engine types and customization
- **Chat Engines**: Build conversational interfaces with context
- **Agents**: Create intelligent agents with tools and reasoning
- **Vector Stores**: Integrate production vector databases