Model catalog

The current MVP catalog. We add models the day they ship; older models stay supported with deprecation notice.

Model	Family	Context	TPS	$/1M in	$/1M out	License	Hardware
`gpt-oss-120b`	OpenAI	128k	70	$0.039	$0.18	Apache 2.0	AWS Trainium
`deepseek-v4-pro`	DeepSeek	1M	70	$0.435	$0.87	DeepSeek License	NVIDIA Blackwell
`deepseek-v4-flash`	DeepSeek	1M	70	$0.14	$0.28	DeepSeek License	AWS Trainium
`kimi-k2.6`	Moonshot	256k	70	$0.74	$3.49	Modified MIT	NVIDIA Blackwell
`deepseek-v3.2`	DeepSeek	128k	70	$0.252	$0.378	DeepSeek License	AWS Trainium

Throughput is locked at 70 tokens/sec across the fleet for the MVP — we run a uniform serving target while we tune Trainium / GPU autoscaling. Real-world per-request throughput varies with prompt length and concurrent batch saturation. Hardware indicates the primary tier. Cogito may transparently route between Trainium and GPU within a model’s tier if it produces lower P99 — output is bit-identical either way.

Capabilities

All catalog models support:

Streaming SSE
Function / tool calling (OpenAI-shape tools[])
Structured JSON outputs (grammar-constrained decoding)
Multi-turn chat with system prompts

Picking a model

Default chat / agent → gpt-oss-120b. Cheapest path to GPT-4-class output quality.
Hardest reasoning + long context → deepseek-v4-pro. 1M-token window, frontier reasoning.
High-volume, long context, cheap → deepseek-v4-flash. RAG and summarization workhorse.
Coding agents and tool-heavy workflows → kimi-k2.6.
Cost-sensitive production chat → deepseek-v3.2. Proven, cheap, fast.

For more on each model, see the per-model pages in the sidebar.

Getting started

Guides

Models

Capabilities

Picking a model

Getting started

Guides

Models

Documentation Index

​Capabilities

​Picking a model

Capabilities

Picking a model