Overview
Together AI provides fast inference for open-source LLMs and embedding models. The provider extends OpenAI’s interface with Together AI’s API endpoints.
Installation
Basic Usage
LLM
Embeddings
Constructor Options
TogetherLLM
- Together AI model name
- Together AI API key (defaults to the TOGETHER_API_KEY environment variable)
- Sampling temperature
- Maximum tokens in response
- Nucleus sampling parameter
- Additional OpenAI client options (e.g., custom baseURL)
TogetherEmbedding
- Together AI embedding model name
- Together AI API key (defaults to the TOGETHER_API_KEY environment variable)
- Additional OpenAI client options
Supported Models
Chat Models
Llama 3.1
- meta-llama/Meta-Llama-3.1-405B-Instruct-Turbo: 405B, most capable
- meta-llama/Meta-Llama-3.1-70B-Instruct-Turbo: 70B, balanced
- meta-llama/Meta-Llama-3.1-8B-Instruct-Turbo: 8B, fast
Llama 3
- meta-llama/Meta-Llama-3-70B-Instruct-Turbo
- meta-llama/Meta-Llama-3-8B-Instruct-Turbo
Llama 2
- togethercomputer/llama-2-7b-chat: Default model
- togethercomputer/llama-2-13b-chat
- togethercomputer/llama-2-70b-chat
Mixtral
- mistralai/Mixtral-8x7B-Instruct-v0.1
- mistralai/Mixtral-8x22B-Instruct-v0.1
Qwen
- Qwen/Qwen2.5-72B-Instruct-Turbo
- Qwen/Qwen2.5-7B-Instruct-Turbo
Embedding Models
- togethercomputer/m2-bert-80M-32k-retrieval: Default, 32K context
- togethercomputer/m2-bert-80M-8k-retrieval: 8K context
- WhereIsAI/UAE-Large-V1: 512-token context
- BAAI/bge-large-en-v1.5: BGE large English
Streaming
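Because the provider builds on OpenAI's interface, streamed responses can be thought of as a sequence of OpenAI-style chunks. The following is a minimal sketch of how text deltas might be pulled out of a server-sent-events payload in that format. The function name `extractStreamedText` and the sample payload are hypothetical, and the sketch assumes each `data:` line arrives intact, which a real network stream does not guarantee.

```typescript
// Shape of one streamed chunk, reduced to the fields this sketch reads.
interface StreamChunk {
  choices?: { delta?: { content?: string | null } }[];
}

// Hypothetical helper: collects the text deltas from an SSE payload
// that follows the OpenAI-compatible streaming format.
function extractStreamedText(ssePayload: string): string[] {
  const pieces: string[] = [];
  for (const rawLine of ssePayload.split("\n")) {
    const line = rawLine.trim();
    if (!line.startsWith("data:")) continue; // skip blank lines and comments
    const data = line.slice("data:".length).trim();
    if (data === "[DONE]") break; // end-of-stream sentinel
    const chunk = JSON.parse(data) as StreamChunk;
    const text = chunk.choices?.[0]?.delta?.content;
    if (text) pieces.push(text);
  }
  return pieces;
}

// Hand-written sample payload for illustration, not captured API output.
const samplePayload = [
  'data: {"choices":[{"delta":{"content":"Hel"}}]}',
  "",
  'data: {"choices":[{"delta":{"content":"lo"}}]}',
  "",
  "data: [DONE]",
].join("\n");

const streamedText = extractStreamedText(samplePayload).join("");
```

In a chat UI, each extracted piece would be appended to the visible message as it arrives rather than joined at the end.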
Function Calling
Together AI supports function calling on compatible models:
With LlamaIndex
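As a library-agnostic illustration, here is a minimal sketch of the two halves of function calling in the OpenAI-compatible format: a tool definition the model can see, and a dispatcher that runs the tool the model asked for. The `get_weather` tool, its handler, and `dispatchToolCall` are hypothetical names chosen for this example. They are not part of LlamaIndex or Together AI.

```typescript
// Tool description in the OpenAI-compatible "function" format.
interface ToolDefinition {
  type: "function";
  function: { name: string; description: string; parameters: Record<string, unknown> };
}

// Tool call as returned by the model; arguments arrive as a JSON string.
interface ToolCall {
  id: string;
  type: "function";
  function: { name: string; arguments: string };
}

type ToolHandler = (args: Record<string, unknown>) => string;

// Hypothetical tool definition (JSON Schema for the parameters).
const getWeatherTool: ToolDefinition = {
  type: "function",
  function: {
    name: "get_weather",
    description: "Look up the current weather for a city",
    parameters: {
      type: "object",
      properties: { city: { type: "string" } },
      required: ["city"],
    },
  },
};

// Runs the handler registered for the requested tool and wraps the result
// as a "tool" message that can be sent back to the model.
function dispatchToolCall(call: ToolCall, handlers: Map<string, ToolHandler>) {
  const handler = handlers.get(call.function.name);
  if (handler === undefined) {
    throw new Error(`Unknown tool: ${call.function.name}`);
  }
  const args = JSON.parse(call.function.arguments) as Record<string, unknown>;
  return { role: "tool" as const, tool_call_id: call.id, content: handler(args) };
}

// Stand-in handler; a real one would call a weather service.
const toolHandlers = new Map<string, ToolHandler>([
  ["get_weather", (args) => `Sunny in ${String(args["city"])}`],
]);

const toolResult = dispatchToolCall(
  { id: "call_1", type: "function", function: { name: "get_weather", arguments: '{"city":"Paris"}' } },
  toolHandlers,
);
```

Using a `Map` for the handler registry avoids accidentally matching inherited object properties when the model returns an unexpected tool name.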
Convenience Functions
Configuration
Environment Variables
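The constructor options describe the API key as defaulting to the `TOGETHER_API_KEY` environment variable. A minimal sketch of that fallback logic follows. The helper `resolveTogetherApiKey` is hypothetical and only illustrates the precedence (explicit key first, then environment). It is not the provider's actual implementation.

```typescript
// A plain object standing in for the process environment.
type Env = Record<string, string | undefined>;

// Hypothetical helper: an explicitly passed key wins, otherwise fall back
// to TOGETHER_API_KEY; fail early with a clear message if neither is usable.
function resolveTogetherApiKey(explicitKey: string | undefined, env: Env): string {
  const key = explicitKey ?? env["TOGETHER_API_KEY"];
  if (key === undefined || key.trim() === "") {
    throw new Error("Set TOGETHER_API_KEY or pass an API key explicitly");
  }
  return key;
}

// Usage with stand-in environment objects (in Node.js you would pass process.env).
const keyFromEnv = resolveTogetherApiKey(undefined, { TOGETHER_API_KEY: "env-key" });
const keyExplicit = resolveTogetherApiKey("explicit-key", { TOGETHER_API_KEY: "env-key" });
```

Failing at startup when the key is missing tends to be easier to diagnose than an authentication error on the first request.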
Custom Base URL
Default base URL: https://api.together.xyz/v1
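To make the role of the base URL concrete, here is a minimal sketch of how an OpenAI-compatible chat request could be assembled against it. `buildChatRequest` and the `ChatRequestOptions` field names are hypothetical. The sketch only builds the request description and does not send anything.

```typescript
interface ChatMessage {
  role: "system" | "user" | "assistant";
  content: string;
}

// Hypothetical option names, loosely mirroring the constructor options above.
interface ChatRequestOptions {
  baseURL?: string;
  temperature?: number;
  maxTokens?: number;
  topP?: number;
}

const DEFAULT_BASE_URL = "https://api.together.xyz/v1";

// Builds the URL, headers, and JSON body for a chat completion request.
function buildChatRequest(
  apiKey: string,
  model: string,
  messages: ChatMessage[],
  options: ChatRequestOptions = {},
) {
  // Strip trailing slashes so the path is joined with exactly one "/".
  const base = (options.baseURL ?? DEFAULT_BASE_URL).replace(/\/+$/, "");
  return {
    url: `${base}/chat/completions`,
    method: "POST" as const,
    headers: {
      Authorization: `Bearer ${apiKey}`,
      "Content-Type": "application/json",
    },
    // JSON.stringify omits fields whose value is undefined.
    body: JSON.stringify({
      model,
      messages,
      temperature: options.temperature,
      max_tokens: options.maxTokens,
      top_p: options.topP,
    }),
  };
}

// Default endpoint versus a custom one (for example, a local proxy).
const defaultRequest = buildChatRequest(
  "example-key",
  "meta-llama/Meta-Llama-3.1-8B-Instruct-Turbo",
  [{ role: "user", content: "Hello" }],
  { maxTokens: 256 },
);
const proxiedRequest = buildChatRequest(
  "example-key",
  "meta-llama/Meta-Llama-3.1-8B-Instruct-Turbo",
  [{ role: "user", content: "Hello" }],
  { baseURL: "http://localhost:8080/v1/" },
);
```

Only the base changes between the two requests; the path, headers, and body format stay the same, which is what makes a proxy or gateway a drop-in replacement.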
Model Selection Guide
| Use Case | Recommended Model | Why |
|---|---|---|
| Complex reasoning | Meta-Llama-3.1-405B-Instruct-Turbo | Best quality |
| General purpose | Meta-Llama-3.1-70B-Instruct-Turbo | Balanced |
| Speed critical | Meta-Llama-3.1-8B-Instruct-Turbo | Fastest |
| Long-document embeddings | togethercomputer/m2-bert-80M-32k-retrieval | 32K-token context |
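The chat rows of the table can be encoded as a small lookup so the choice lives in one place. This is a sketch only: the `UseCase` labels and `pickChatModel` are hypothetical names, and the model identifiers are the full names from the Supported Models list.

```typescript
type UseCase = "complex-reasoning" | "general-purpose" | "speed-critical";

// Chat model recommendations from the table, keyed by use case.
const CHAT_MODEL_BY_USE_CASE: Record<UseCase, string> = {
  "complex-reasoning": "meta-llama/Meta-Llama-3.1-405B-Instruct-Turbo",
  "general-purpose": "meta-llama/Meta-Llama-3.1-70B-Instruct-Turbo",
  "speed-critical": "meta-llama/Meta-Llama-3.1-8B-Instruct-Turbo",
};

// Hypothetical helper: returns the recommended chat model for a use case.
function pickChatModel(useCase: UseCase): string {
  return CHAT_MODEL_BY_USE_CASE[useCase];
}

const fastModel = pickChatModel("speed-critical");
```

Typing the keys as a union means an unsupported use case is rejected at compile time rather than silently falling back to a default.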
Performance
Together AI offers competitive inference speeds:
- Turbo models: Optimized for low latency
- Batch processing: Efficient for high throughput
- Streaming: Real-time token generation
Error Handling
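One common approach is to separate errors worth retrying (rate limits, server errors) from those that need a configuration fix (authentication). The sketch below assumes the thrown error exposes an HTTP status code, which is an assumption about the client rather than something this page states. `classifyHttpStatus`, `isRetryable`, and `backoffDelayMs` are hypothetical helpers.

```typescript
type ErrorKind = "auth" | "rate_limit" | "server" | "other";

// Maps an HTTP status code to a coarse error category.
function classifyHttpStatus(status: number): ErrorKind {
  if (status === 401 || status === 403) return "auth"; // bad or missing API key
  if (status === 429) return "rate_limit"; // too many requests
  if (status >= 500 && status <= 599) return "server"; // upstream failure
  return "other";
}

// Only rate limits and server errors are worth retrying.
function isRetryable(kind: ErrorKind): boolean {
  return kind === "rate_limit" || kind === "server";
}

// Exponential backoff: baseMs * 2^attempt, capped at capMs.
// attempt is zero-based (0 for the first retry).
function backoffDelayMs(attempt: number, baseMs: number = 500, capMs: number = 8000): number {
  return Math.min(capMs, baseMs * Math.pow(2, attempt));
}

const rateLimitKind = classifyHttpStatus(429);
const thirdRetryDelay = backoffDelayMs(2); // 500 * 2^2 = 2000
```

A retry loop built on these helpers would wait `backoffDelayMs(attempt)` between attempts and rethrow immediately for non-retryable categories.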
Best Practices
- Use Turbo models: Better performance for production
- Match embedding context: Use 32K model for long documents
- Enable streaming: Better UX for chat applications
- Choose right model size: Balance cost vs. quality needs
- Set an appropriate maximum token limit: Control response length and costs
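The "match embedding context" advice can be sketched as picking the smallest embedding model whose context fits the document. The token limits below (8192 and 32768) are assumptions read off the "8K" and "32K" labels, the four-characters-per-token estimate is a rough heuristic, and `pickEmbeddingModel` is a hypothetical helper.

```typescript
// Embedding models from the list above, ordered from smallest context to largest.
// The exact token limits are assumed from the "8K" / "32K" labels.
const EMBEDDING_MODELS = [
  { name: "togethercomputer/m2-bert-80M-8k-retrieval", maxTokens: 8192 },
  { name: "togethercomputer/m2-bert-80M-32k-retrieval", maxTokens: 32768 },
];

// Rough heuristic: about four characters per token for English text.
function estimateTokens(text: string): number {
  return Math.ceil(text.length / 4);
}

// Returns the smallest model that fits, or null if the document
// should be split into chunks first.
function pickEmbeddingModel(tokenCount: number): string | null {
  for (const model of EMBEDDING_MODELS) {
    if (tokenCount <= model.maxTokens) return model.name;
  }
  return null;
}

const shortDocModel = pickEmbeddingModel(estimateTokens("A short note about embeddings."));
const longDocModel = pickEmbeddingModel(20000);
```

A `null` result signals that the document exceeds every listed context window and needs chunking before embedding.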