- **DeepSeek V3** (671B, 37B active): Get GPT-4 class performance at a fraction of the cost.
- **DeepSeek R1** (671B, 37B active): Unlock advanced reasoning capabilities for your applications.
- **Llama 4 Maverick** (17B x 128 experts): Experience Meta's most capable open model with 128K context.
- **GPT OSS 120B** (120B): Start with our serverless API or deploy on dedicated GPUs.
Pre-optimized models ready for deployment on our GPU infrastructure.
| Model | Parameters | Category | Context |
|---|---|---|---|
| DeepSeek V3 | 671B (37B active) | MoE | 64K tokens |
| DeepSeek R1 | 671B (37B active) | Reasoning | 64K tokens |
| Llama 4 Maverick | 17B x 128 experts | MoE | 128K tokens |
| Hermes 3 Llama 3.1 405B | 405B | General Purpose | 128K tokens |
| Sarvam-2B | 2B | Multilingual | 4,096 tokens |
| GPT OSS 120B | 120B | General Purpose | 8,192 tokens |
| Llama 4 Scout | 17B x 16 experts | MoE | 128K tokens |
| Dolphin 2.9.2 Mistral 8x22B | 8 x 22B | MoE | 64K tokens |
| DeepSeek V3 0324 | 671B (37B active) | General Purpose | 64K tokens |
Choose how you want to run your AI models.
- **Serverless**: Pay-per-token pricing with instant scaling. No GPU management, auto-scaling to zero, pay only for usage, sub-second latency.
- **Dedicated GPUs**: Reserved GPU capacity for consistent performance. Guaranteed capacity, custom fine-tuning, VPC deployment, SLA guarantees.
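A quick back-of-the-envelope comparison of the two pricing models. The per-token and per-GPU-hour rates below are hypothetical placeholders, not published prices; the point is only the shape of the cost curve (usage-proportional vs fixed).

```python
# Hypothetical rates for illustration only -- not published prices.
SERVERLESS_RATE_PER_M_TOKENS = 0.50   # USD per million tokens (assumed)
DEDICATED_RATE_PER_GPU_HOUR = 2.00    # USD per GPU-hour (assumed)

def serverless_cost(tokens: int) -> float:
    """Pay-per-token: cost scales linearly with usage, zero when idle."""
    return tokens / 1_000_000 * SERVERLESS_RATE_PER_M_TOKENS

def dedicated_cost(gpus: int, hours: float) -> float:
    """Reserved capacity: cost is fixed by fleet size, not by traffic."""
    return gpus * hours * DEDICATED_RATE_PER_GPU_HOUR

# 10M tokens per day on serverless vs one reserved GPU running all day:
print(serverless_cost(10_000_000))   # 5.0
print(dedicated_cost(1, 24))         # 48.0
```

Under these assumed rates, serverless wins at low or bursty traffic, while dedicated capacity pays off once sustained throughput keeps the reserved GPUs busy.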
- Deploy any model in seconds with pre-optimized configurations.
- Drop-in replacement for OpenAI API with minimal code changes.
- Scale from zero to thousands of requests automatically.
- Customize models on your data with built-in fine-tuning.
- Deploy in your VPC for data privacy and compliance.
- Monitor costs, latency, and usage with detailed dashboards.
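The drop-in OpenAI API compatibility mentioned above can be sketched as follows: switching providers only means changing the base URL, because the request path, headers, and JSON body follow the standard Chat Completions format. The endpoint URL and API key below are placeholder assumptions.

```python
import json

def build_chat_request(base_url: str, api_key: str, model: str, messages: list) -> dict:
    """Build an OpenAI-compatible chat completion HTTP request.

    Pointing an existing OpenAI client at a different provider only
    requires changing base_url; everything else stays identical.
    """
    return {
        "url": f"{base_url.rstrip('/')}/chat/completions",
        "headers": {
            "Authorization": f"Bearer {api_key}",     # same auth scheme as OpenAI
            "Content-Type": "application/json",
        },
        "body": json.dumps({"model": model, "messages": messages}),
    }

req = build_chat_request(
    "https://api.example.com/v1",            # hypothetical endpoint
    "YOUR_API_KEY",
    "deepseek-v3",                           # a model ID from the catalog above
    [{"role": "user", "content": "Hello!"}],
)
print(req["url"])  # https://api.example.com/v1/chat/completions
```

In practice the official OpenAI SDK achieves the same thing via its `base_url` client option, so existing application code needs only that one-line change.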