Together AI

Overview

Together AI provides fast inference for open-source LLMs and embedding models. The provider reuses the OpenAI-compatible interface and points it at Together AI's API endpoints.

Installation

npm install @llamaindex/together

Basic Usage

LLM

import { TogetherLLM } from "@llamaindex/together";

const llm = new TogetherLLM({
  model: "meta-llama/Meta-Llama-3.1-70B-Instruct-Turbo",
  apiKey: process.env.TOGETHER_API_KEY
});

const response = await llm.chat({
  messages: [
    { role: "user", content: "Explain machine learning" }
  ]
});

console.log(response.message.content);

Embeddings

import { TogetherEmbedding } from "@llamaindex/together";

const embedModel = new TogetherEmbedding({
  model: "togethercomputer/m2-bert-80M-32k-retrieval",
  apiKey: process.env.TOGETHER_API_KEY
});

const embedding = await embedModel.getTextEmbedding(
  "LlamaIndex is a data framework for LLM applications"
);
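
Embedding vectors are usually compared with cosine similarity. The following is a minimal sketch of that comparison, assuming `getTextEmbedding` resolves to a plain `number[]`. The `cosineSimilarity` helper is illustrative and is not part of `@llamaindex/together`.

```typescript
// Hypothetical helper (not part of @llamaindex/together): cosine similarity
// between two embedding vectors of equal length.
function cosineSimilarity(a: number[], b: number[]): number {
  if (a.length !== b.length) {
    throw new Error(`Dimension mismatch: ${a.length} vs ${b.length}`);
  }
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  // An all-zero vector has no direction, so report zero similarity
  // instead of dividing by zero.
  if (normA === 0 || normB === 0) return 0;
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}
```

Scoring a query embedding against each document embedding and sorting by the result gives a simple nearest-neighbour ranking.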

Constructor Options

TogetherLLM

Option	Type	Default	Description
model	string	"togethercomputer/llama-2-7b-chat"	Together AI model name
apiKey	string	TOGETHER_API_KEY env variable	Together AI API key
temperature	number	-	Sampling temperature
maxTokens	number	-	Maximum tokens in the response
topP	number	-	Nucleus sampling parameter
additionalSessionOptions	object	-	Additional OpenAI client options (e.g., custom baseURL)

TogetherEmbedding

Option	Type	Default	Description
model	string	"togethercomputer/m2-bert-80M-32k-retrieval"	Together AI embedding model name
apiKey	string	TOGETHER_API_KEY env variable	Together AI API key
additionalSessionOptions	object	-	Additional OpenAI client options

Supported Models

Chat Models

Llama 3.1

meta-llama/Meta-Llama-3.1-405B-Instruct-Turbo: 405B, most capable
meta-llama/Meta-Llama-3.1-70B-Instruct-Turbo: 70B, balanced
meta-llama/Meta-Llama-3.1-8B-Instruct-Turbo: 8B, fast

Llama 3

meta-llama/Meta-Llama-3-70B-Instruct-Turbo
meta-llama/Meta-Llama-3-8B-Instruct-Turbo

Llama 2

togethercomputer/llama-2-7b-chat: Default model
togethercomputer/llama-2-13b-chat
togethercomputer/llama-2-70b-chat

Mixtral

mistralai/Mixtral-8x7B-Instruct-v0.1
mistralai/Mixtral-8x22B-Instruct-v0.1

Qwen

Qwen/Qwen2.5-72B-Instruct-Turbo
Qwen/Qwen2.5-7B-Instruct-Turbo

Embedding Models

togethercomputer/m2-bert-80M-32k-retrieval: Default, 32K context
togethercomputer/m2-bert-80M-8k-retrieval: 8K context
WhereIsAI/UAE-Large-V1: 512-token context
BAAI/bge-large-en-v1.5: BGE large English

Streaming

const stream = await llm.chat({
  messages: [{ role: "user", content: "Write a story about AI" }],
  stream: true
});

for await (const chunk of stream) {
  process.stdout.write(chunk.delta);
}
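
When the full response text is needed after streaming finishes, the deltas can be accumulated while they are displayed. This is a minimal sketch, assuming each chunk exposes a string `delta` as in the loop above. `collectStream` and `DeltaChunk` are hypothetical names, not part of the package.

```typescript
// Structural type covering the only field this helper reads from each chunk.
type DeltaChunk = { delta: string };

// Hypothetical helper: forwards each delta to a callback as it arrives
// and resolves to the concatenated text once the stream ends.
async function collectStream(
  stream: AsyncIterable<DeltaChunk>,
  onDelta: (delta: string) => void = () => {}
): Promise<string> {
  let text = "";
  for await (const chunk of stream) {
    text += chunk.delta;
    onDelta(chunk.delta);
  }
  return text;
}
```

With the stream from the example above, `await collectStream(stream, (d) => process.stdout.write(d))` would print tokens as they arrive and return the complete story.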

Function Calling

Together AI supports function calling on compatible models:

import { tool } from "@llamaindex/core/tools";
import { TogetherLLM } from "@llamaindex/together";
import { z } from "zod";

const weatherTool = tool({
  name: "get_weather",
  description: "Get current weather",
  parameters: z.object({
    location: z.string(),
    units: z.enum(["celsius", "fahrenheit"]).optional()
  }),
  execute: async ({ location, units = "celsius" }) => {
    return `Weather in ${location}: 22°${units === "celsius" ? "C" : "F"}`;
  }
});

const llm = new TogetherLLM({
  model: "meta-llama/Meta-Llama-3.1-70B-Instruct-Turbo"
});

const response = await llm.chat({
  messages: [{ role: "user", content: "What's the weather in Paris?" }],
  tools: [weatherTool]
});

With LlamaIndex

import { Document, Settings, VectorStoreIndex } from "llamaindex";
import { TogetherLLM, TogetherEmbedding } from "@llamaindex/together";

// Use Together AI for both generation and embeddings
Settings.llm = new TogetherLLM({
  model: "meta-llama/Meta-Llama-3.1-70B-Instruct-Turbo"
});

Settings.embedModel = new TogetherEmbedding({
  model: "togethercomputer/m2-bert-80M-32k-retrieval"
});

// Placeholder document; replace with your own data
const documents = [
  new Document({ text: "LlamaIndex is a data framework for LLM applications" })
];

const index = await VectorStoreIndex.fromDocuments(documents);
const queryEngine = index.asQueryEngine();

const response = await queryEngine.query({
  query: "What are the key features?"
});

console.log(response.toString());

Convenience Functions

import { together } from "@llamaindex/together";

// Quick LLM instance
const llm = together({
  model: "meta-llama/Meta-Llama-3.1-8B-Instruct-Turbo"
});

Configuration

Environment Variables

TOGETHER_API_KEY=your-api-key-here
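
A missing key is easier to diagnose if the application checks for it at startup instead of at the first request. The sketch below shows one way to do that. `requireTogetherApiKey` is a hypothetical helper, and it takes the environment as a parameter so that it can be exercised without touching the real `process.env`.

```typescript
// Hypothetical helper: look up TOGETHER_API_KEY in an env-like object
// and fail fast with a clear message if it is missing or blank.
function requireTogetherApiKey(
  env: Record<string, string | undefined>
): string {
  const key = env["TOGETHER_API_KEY"]?.trim();
  if (!key) {
    throw new Error(
      "TOGETHER_API_KEY is not set; export it before creating a TogetherLLM"
    );
  }
  return key;
}
```

In an application this would typically be called once as `requireTogetherApiKey(process.env)` and the result passed as `apiKey`.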

Custom Base URL

const llm = new TogetherLLM({
  additionalSessionOptions: {
    baseURL: "https://custom-together-endpoint.com/v1"
  }
});

Default base URL: https://api.together.xyz/v1

Model Selection Guide

Use Case	Recommended Model	Why
Complex reasoning	meta-llama/Meta-Llama-3.1-405B-Instruct-Turbo	Largest model, best quality
General purpose	meta-llama/Meta-Llama-3.1-70B-Instruct-Turbo	Balance of quality and speed
Speed critical	meta-llama/Meta-Llama-3.1-8B-Instruct-Turbo	Smallest and fastest
Long-document embeddings	togethercomputer/m2-bert-80M-32k-retrieval	Embedding model with 32K-token context
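
The chat-model rows of this guide can be encoded as a lookup so that model choice lives in one place. This is only a sketch: the `UseCase` labels and the `pickChatModel` function are hypothetical names, and the model IDs are the ones listed under Supported Models.

```typescript
// Hypothetical use-case labels mirroring the chat-model rows of the guide.
type UseCase = "complex-reasoning" | "general-purpose" | "speed-critical";

// Model IDs taken from the Supported Models list.
const CHAT_MODEL_BY_USE_CASE: Record<UseCase, string> = {
  "complex-reasoning": "meta-llama/Meta-Llama-3.1-405B-Instruct-Turbo",
  "general-purpose": "meta-llama/Meta-Llama-3.1-70B-Instruct-Turbo",
  "speed-critical": "meta-llama/Meta-Llama-3.1-8B-Instruct-Turbo",
};

// Returns the recommended chat model ID for a use case.
function pickChatModel(useCase: UseCase): string {
  return CHAT_MODEL_BY_USE_CASE[useCase];
}
```

The returned string can be passed as the `model` option when constructing a `TogetherLLM`.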

Performance

Together AI is built for fast inference. Three features matter most for speed and throughput:

Turbo models: Optimized for low latency
Batch processing: Efficient for high-throughput workloads
Streaming: Tokens are returned as they are generated
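
For batch processing, a large list of inputs is usually split into fixed-size groups before being sent. The helper below is a minimal sketch of that splitting step only. `toBatches` is a hypothetical name, and the right batch size depends on your rate limits and input lengths.

```typescript
// Hypothetical helper: split a list into consecutive batches of at most
// `batchSize` items, preserving order.
function toBatches<T>(items: T[], batchSize: number): T[][] {
  if (!Number.isInteger(batchSize) || batchSize < 1) {
    throw new RangeError("batchSize must be a positive integer");
  }
  const batches: T[][] = [];
  for (let i = 0; i < items.length; i += batchSize) {
    batches.push(items.slice(i, i + batchSize));
  }
  return batches;
}
```

Each batch can then be processed in turn, for example by embedding its texts before moving on to the next group.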

Error Handling

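// `llm` is the TogetherLLM instance created in Basic Usage
const messages = [{ role: "user" as const, content: "Explain machine learning" }];

try {
  const response = await llm.chat({ messages });
  console.log(response.message.content);
} catch (error) {
  // `error` is `unknown` in strict TypeScript, so narrow it before reading `.message`
  const message = error instanceof Error ? error.message : String(error);
  if (message.includes("TOGETHER_API_KEY")) {
    console.error("API key not set or invalid");
  } else {
    console.error("API error:", message);
  }
}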
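
Transient failures such as timeouts are often handled by retrying with exponential backoff. The wrapper below is a generic sketch of that pattern and is not part of `@llamaindex/together`. The names `withRetry` and `RetryOptions` and the default values are assumptions chosen for illustration.

```typescript
// Hypothetical options for the retry wrapper; defaults are illustrative.
interface RetryOptions {
  maxAttempts?: number; // total attempts, including the first call
  baseDelayMs?: number; // delay before the first retry; doubles each time
  shouldRetry?: (error: unknown) => boolean; // return false to give up early
}

// Calls `fn` until it succeeds, the attempts run out, or `shouldRetry`
// rejects the error. The last error is rethrown unchanged.
async function withRetry<T>(
  fn: () => Promise<T>,
  options: RetryOptions = {}
): Promise<T> {
  const { maxAttempts = 3, baseDelayMs = 500, shouldRetry = () => true } = options;
  if (maxAttempts < 1) {
    throw new RangeError("maxAttempts must be at least 1");
  }
  for (let attempt = 1; ; attempt++) {
    try {
      return await fn();
    } catch (error) {
      if (attempt >= maxAttempts || !shouldRetry(error)) throw error;
      // Exponential backoff: baseDelayMs, then 2x, then 4x, and so on.
      const delayMs = baseDelayMs * 2 ** (attempt - 1);
      await new Promise<void>((resolve) => setTimeout(resolve, delayMs));
    }
  }
}
```

A call such as `withRetry(() => llm.chat({ messages }))` would retry up to three times. Passing a `shouldRetry` that returns `false` for configuration errors, such as a missing API key, avoids retrying failures that cannot succeed.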

Best Practices

Use Turbo models: Optimized for low latency in production
Match embedding context: Use the 32K-context embedding model for long documents
Enable streaming: Better UX for chat applications
Choose the right model size: Balance cost against quality
Set maxTokens: Control response length and cost

Pricing

Together AI offers competitive pricing for open-source models. Check the Together AI pricing page for current rates.
