MODEL INFERENCE

Deploy AI Models

One-click deployment for the latest open-source AI models. Run DeepSeek, Llama 4, and more with serverless inference or dedicated GPU infrastructure.

Available Models

Pre-optimized models ready for deployment on our GPU infrastructure.

Model | Parameters | Category | Context
DeepSeek V3 | 671B (37B active) | MoE | 64K tokens
DeepSeek R1 | 671B (37B active) | Reasoning | 64K tokens
Llama 4 Maverick | 17B x 128 Experts | MoE | 128K tokens
Hermes 3 Llama 3.1 405B | 405 Billion | General Purpose | 128K tokens
Sarvam-2B | 2 Billion | Multilingual | 4,096 tokens
GPT OSS 120B | 120 Billion | General Purpose | 8,192 tokens
Llama 4 Scout | 17B x 16 Experts | MoE | 128K tokens
Dolphin 2.9.2 Mistral 8x22B | 8 x 22B MoE | MoE | 64K tokens
DeepSeek V3 0324 | 671B (37B active) | General Purpose | 64K tokens

Deployment Options

Choose how you want to run your AI models.

Serverless API

Pay-per-token pricing with instant scaling. No GPU management, auto-scaling to zero, pay only for usage, sub-second latency.
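To illustrate how pay-per-token billing works, here is a minimal cost-estimate sketch. The per-million-token rates below are hypothetical placeholders, not the platform's actual pricing.

```python
# Hedged sketch of pay-per-token billing math.
# RATE_* values are illustrative placeholders, not real prices.
RATE_INPUT_PER_M = 0.50   # USD per 1M input tokens (placeholder)
RATE_OUTPUT_PER_M = 1.50  # USD per 1M output tokens (placeholder)

def estimate_cost(input_tokens: int, output_tokens: int) -> float:
    """Return the estimated USD cost of a single request."""
    return (input_tokens * RATE_INPUT_PER_M
            + output_tokens * RATE_OUTPUT_PER_M) / 1_000_000

# A request with 2,000 prompt tokens and 500 completion tokens:
cost = estimate_cost(input_tokens=2_000, output_tokens=500)
```

With serverless pricing you pay only this per-request amount; there is no idle GPU cost when traffic drops to zero.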

Dedicated Instance

Reserved GPU capacity for consistent performance. Guaranteed capacity, custom fine-tuning, VPC deployment, SLA guarantees.

Platform Features

Tools for deploying, scaling, customizing, and monitoring your models.

One-Click Deploy

Deploy any model in seconds with pre-optimized configurations.

OpenAI-Compatible API

Drop-in replacement for OpenAI API with minimal code changes.
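As a sketch of what "drop-in replacement" means in practice: the request payload keeps the OpenAI chat-completions shape, and only the endpoint URL (and model id) changes. The URL, API key, and model name below are illustrative placeholders, not real values.

```python
import json
import urllib.request

# Hedged sketch: an OpenAI-style chat completion request pointed at a
# compatible endpoint. Only the URL differs from calling OpenAI directly.
COMPATIBLE_URL = "https://inference.example.com/v1/chat/completions"  # placeholder

payload = {
    "model": "deepseek-v3",  # model id as listed by the platform (assumed)
    "messages": [{"role": "user", "content": "Hello"}],
    "max_tokens": 128,
}

# Build the request; sending it (urllib.request.urlopen) is omitted here.
req = urllib.request.Request(
    COMPATIBLE_URL,
    data=json.dumps(payload).encode("utf-8"),
    headers={
        "Authorization": "Bearer YOUR_API_KEY",  # placeholder credential
        "Content-Type": "application/json",
    },
    method="POST",
)
```

Existing OpenAI SDK code typically needs only its base URL and API key swapped to migrate.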

Auto-Scaling

Scale from zero to thousands of requests automatically.

Fine-Tuning Ready

Customize models on your data with built-in fine-tuning.

Private Deployment

Deploy in your VPC for data privacy and compliance.

Usage Analytics

Monitor costs, latency, and usage with detailed dashboards.

Start Deploying AI Models

Get started with our free tier. No credit card required.