Skip to main content
Cogito serves a curated set of frontier open-source models — Kimi K2.6 · DeepSeek V4 (Flash & Pro) · GPT-OSS · Qwen. The full served catalog below. We add models the day they ship; older models stay supported with deprecation notice.
ModelFamilyContextMax outputTPSLicense
moonshotai/kimi-k2.6Moonshot256K256K70Modified MIT
moonshotai/kimi-k2.6:fastMoonshot256K256K70Modified MIT
kimi-ttMoonshot256K256K70Modified MIT
deepseek-v4-proDeepSeek1M128K70DeepSeek License
deepseek-v4-flashDeepSeek1M1M70DeepSeek License
gpt-oss-120bOpenAI128K32K70Apache 2.0
qwen-3-235bAlibaba256K32K70Apache 2.0
Live pricing — input, cached input, and output rates per model — is published on the model catalog and returned by GET /v1/models. Single source of truth so the website, the gateway, and your code never drift. For Kimi K2.6 routes, Moonshot documents the limit as input plus output fitting within the 256K context window; Cogito therefore advertises a 256K max-output cap while upstream may still reject requests whose prompt leaves insufficient room. Throughput is locked at 70 tokens/sec across the fleet for the MVP — we run a uniform serving target while we tune Trainium / GPU autoscaling. Real-world per-request throughput varies with prompt length and concurrent batch saturation. Hardware is managed by Cogito. We route each model to the right silicon for the workload and may transparently shift between tiers if it produces lower P99 — output is bit-identical either way.

Capabilities

All catalog models support:
  • Streaming SSE
  • Function / tool calling (OpenAI-shape tools[])
  • Structured JSON outputs (grammar-constrained decoding)
  • Multi-turn chat with system prompts

Picking a model

  • Default agent / high-volume RAGdeepseek-v4-flash. Cheap, 1M-token window, the live workhorse today.
  • Hardest reasoning + long contextdeepseek-v4-pro. 1M-token window, frontier reasoning.
  • Coding agents and tool-heavy workflowsmoonshotai/kimi-k2.6. Served from the AWS B300 high-capacity route.
  • Latency-sensitive Kimi requestsmoonshotai/kimi-k2.6:fast. CoreWeave B200 low-latency route; lower per-route capacity than the default.
  • Experimental high-concurrency Kimi routekimi-tt.
  • Cheap general chat / codinggpt-oss-120b. Cheapest path to GPT-4-class output quality.
  • Multilingual + tool useqwen-3-235b. Strong non-English coverage.
For more on each model, see the per-model pages in the sidebar.