
Together AI Pricing Plans & Tiers

Fast inference and fine-tuning for open-source AI models

AI & ML · usage-based

Pricing last verified: March 16, 2026

Data compiled by Arthur Jacquemin, Founder & Lead Analyst
Updated March 16, 2026

Pricing Analysis

Together AI's pricing model exposes a critical gap in the AI infrastructure market: the platform offers six separate pricing tiers (Serverless, Dedicated, GPU On-Demand, GPU Reserved, Sandbox, Fine-Tuning) with no public unit costs disclosed. This opacity is intentional: Together targets infrastructure-savvy organizations (ML engineers, research labs) that will request custom quotes rather than sign self-serve contracts. The absence of published pricing signals that Together's real business is not SaaS but consulting and infrastructure partnerships, positioning the company as an outsourced ML ops provider rather than an API vendor.

Dedicated Inference (private GPU instances with guaranteed performance) targets teams optimizing for inference latency at scale—real-time recommendation systems, low-latency LLM APIs, and production ML pipelines where shared infrastructure introduces unacceptable variance. Organizations choosing Dedicated sacrifice commodity pricing for performance guarantees, a tradeoff that's only economic for workloads worth >$10K/month in infrastructure cost. This tier creates natural segmentation where high-volume production teams self-select into expensive contracts.
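The >$10K/month threshold above can be sanity-checked with simple arithmetic. The sketch below estimates the monthly token volume at which a flat dedicated contract matches serverless per-token billing; both the contract price and the per-1M-token rate are hypothetical placeholders, since Together AI does not publish unit costs:

```python
# Hypothetical breakeven sketch: flat dedicated-instance contract vs.
# serverless usage billed per 1M tokens. All rates are illustrative
# assumptions -- Together AI's actual prices are custom.

def breakeven_tokens(dedicated_monthly_usd: float,
                     serverless_usd_per_1m_tokens: float) -> float:
    """Monthly token volume at which a dedicated instance costs the
    same as equivalent serverless usage."""
    return dedicated_monthly_usd / serverless_usd_per_1m_tokens * 1_000_000

# Example: a $10,000/month dedicated contract vs. an assumed $0.60 per 1M tokens.
tokens = breakeven_tokens(10_000, 0.60)
print(f"{tokens:,.0f} tokens/month")  # roughly 16.7B tokens/month
```

Below that volume, per-token serverless billing is cheaper on paper; above it, the dedicated contract wins even before counting its latency guarantees.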

GPU Clusters (On-demand and Reserved) blur the line between AI API and infrastructure service. Teams can provision raw GPU capacity and run any workload, not just inference—fine-tuning, training, and batch processing share the same cost model. This flexibility comes at the cost of operational overhead: provisioning clusters requires ML engineering expertise, making this tier inaccessible to product teams without dedicated ML infrastructure staff.

Strengths

  • Dedicated GPU inference with guaranteed performance enables low-latency production deployments without the variance inherent in shared infrastructure.
  • GPU Clusters (On-demand and Reserved) provide raw compute flexibility to run any workload—inference, fine-tuning, or custom pipelines—on identical infrastructure.
  • Multiple tier options (Serverless, Dedicated, Sandbox) segment customers by operational maturity, preventing over-engineered solutions for simple use cases.

Considerations

  • Fully custom pricing (no public unit costs) creates evaluation friction: teams cannot self-assess cost-benefit before contacting sales, so budget negotiation precedes even pilot projects.
  • Operational overhead for Dedicated and GPU Clusters requires ML engineering expertise, limiting accessibility for product teams.
  • Reserved GPU capacity locks teams into long-term commitments to achieve discounts, reducing pricing flexibility for variable workloads.

Ideal For

ML-native organizations and research labs requiring custom GPU cluster provisioning and guaranteed inference latency with dedicated infrastructure support.

Pricing Takeaway

Together AI's absence of published pricing signals a shift from SaaS to infrastructure consulting—teams negotiate custom contracts, not self-serve pricing.


Pricing Plans (6)

Serverless Inference

Custom
  • Price per 1M tokens
  • Batch API price

Dedicated Inference

Custom
  • Single-tenant GPU instances
  • Guaranteed performance
  • Support for custom models
  • Autoscaling & traffic spike handling

GPU Clusters (On-demand)

Custom
  • Pay-as-you-go GPU capacity
  • Hourly pricing

GPU Clusters (Reserved)

Custom
  • Reserve GPU capacity
  • Pricing for different durations

Sandbox

Custom
  • Customize a deployment of VM sandboxes
  • Per vCPU pricing
  • Per GiB RAM pricing

Fine-Tuning

Custom
  • Train open-source models
  • Supervised Fine-Tuning
  • Direct Preference Optimization
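The Serverless Inference bullets above describe per-1M-token billing with a separate Batch API rate. A minimal sketch of how such usage-based billing is typically estimated; the per-token rates and the batch discount below are hypothetical placeholders, not Together AI's actual prices (which are custom):

```python
# Hypothetical serverless cost estimator. The per-1M-token rates are
# illustrative assumptions -- Together AI does not publish unit costs.

RATES_PER_1M = {     # USD per 1M tokens (assumed)
    "input": 0.20,
    "output": 0.60,
}

def estimate_cost(input_tokens: int, output_tokens: int,
                  batch_discount: float = 0.0) -> float:
    """Estimate a bill from token counts, with an optional Batch API
    discount expressed as a fraction (e.g. 0.5 = 50% off)."""
    base = (input_tokens * RATES_PER_1M["input"]
            + output_tokens * RATES_PER_1M["output"]) / 1_000_000
    return base * (1 - batch_discount)

print(f"${estimate_cost(50_000_000, 10_000_000):.2f}")       # real-time traffic
print(f"${estimate_cost(50_000_000, 10_000_000, 0.5):.2f}")  # via Batch API
```

Plugging a team's projected monthly token counts into a model like this is the usual first step before requesting a quote, since it bounds what any negotiated rate must beat.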


Frequently Asked Questions

How much does Together AI cost?
As of March 2026, Together AI offers custom pricing. Contact the vendor for a quote based on your team size and requirements.
Does Together AI offer a free plan?
As of March 2026, Together AI does not advertise a free plan. Contact the vendor for pricing details.
What pricing model does Together AI use?
As of March 2026, Together AI follows a usage-based pricing structure, where costs are determined by how much you actually use the platform. This model is common among AI & ML platforms.
Does Together AI offer enterprise or custom pricing?
As of March 2026, Together AI prices every tier, including Serverless Inference, as a custom contract tailored to your organization. Request a quote from the Together AI team for details.
What features are included in Together AI's plans?
As of March 2026, each Together AI plan lists between two and four headline features. Lower tiers cover core functionality, and each upgrade unlocks additional tools and integrations for growing teams.



Sources

  1. Together AI Official Pricing (vendor pricing page)
  2. Together AI Reviews (independent reviews on G2)

