Documentation Index

Fetch the complete documentation index at: https://docs.cogito.decart.ai/llms.txt

Use this file to discover all available pages before exploring further.

This is the current MVP catalog. We add models the day they ship; older models stay supported, with a deprecation notice before removal.
| Model             | Family   | Context | TPS | $/1M in | $/1M out | License          | Hardware        |
|-------------------|----------|---------|-----|---------|----------|------------------|-----------------|
| gpt-oss-120b      | OpenAI   | 128k    | 70  | $0.039  | $0.18    | Apache 2.0       | AWS Trainium    |
| deepseek-v4-pro   | DeepSeek | 1M      | 70  | $0.435  | $0.87    | DeepSeek License | NVIDIA Blackwell|
| deepseek-v4-flash | DeepSeek | 1M      | 70  | $0.14   | $0.28    | DeepSeek License | AWS Trainium    |
| kimi-k2.6         | Moonshot | 256k    | 70  | $0.74   | $3.49    | Modified MIT     | NVIDIA Blackwell|
| deepseek-v3.2     | DeepSeek | 128k    | 70  | $0.252  | $0.378   | DeepSeek License | AWS Trainium    |
Throughput is locked at 70 tokens/sec across the fleet for the MVP — we run a uniform serving target while we tune Trainium / GPU autoscaling. Real-world per-request throughput varies with prompt length and concurrent batch saturation. Hardware indicates the primary tier. Cogito may transparently route between Trainium and GPU within a model’s tier if it produces lower P99 — output is bit-identical either way.
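The table numbers make back-of-envelope budgeting straightforward. The sketch below is illustrative only, not part of any Cogito SDK: the prices and the 70 tok/s target come from the catalog above, while the `estimate` helper and its simplifying assumptions (generation time only, no prompt processing or queueing) are ours.

```python
# Back-of-envelope cost and latency estimator built from the catalog
# above. Prices and the 70 tok/s target are from the table; the helper
# itself is an illustrative sketch, not a Cogito API.

CATALOG = {
    # model: ($ per 1M input tokens, $ per 1M output tokens)
    "gpt-oss-120b":      (0.039, 0.18),
    "deepseek-v4-pro":   (0.435, 0.87),
    "deepseek-v4-flash": (0.14,  0.28),
    "kimi-k2.6":         (0.74,  3.49),
    "deepseek-v3.2":     (0.252, 0.378),
}
TPS = 70  # uniform MVP serving target, tokens/sec

def estimate(model: str, in_tokens: int, out_tokens: int) -> tuple[float, float]:
    """Return (dollar cost, rough generation seconds) for one request."""
    in_price, out_price = CATALOG[model]
    cost = in_tokens / 1e6 * in_price + out_tokens / 1e6 * out_price
    seconds = out_tokens / TPS  # ignores prompt processing and queueing
    return cost, seconds

cost, seconds = estimate("gpt-oss-120b", 10_000, 1_000)
# roughly $0.00057 and about 14.3 s of generation at the 70 tok/s target
```

Note that the time estimate only covers token generation; real-world latency adds prompt processing and, as noted above, varies with batch saturation.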

Capabilities

All catalog models support:
  • Streaming SSE
  • Function / tool calling (OpenAI-shape tools[])
  • Structured JSON outputs (grammar-constrained decoding)
  • Multi-turn chat with system prompts
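A single request body can combine all four capabilities. The sketch below assumes an OpenAI-compatible request shape (which the `tools[]` note above implies); the exact field names, the `get_weather` tool, and how you transmit the body are assumptions for illustration, not confirmed Cogito specifics.

```python
import json

# Sketch of a chat-completions request body combining the capabilities
# listed above: streaming SSE, an OpenAI-shape tool definition, and a
# multi-turn conversation with a system prompt. Field names assume an
# OpenAI-compatible API; the get_weather tool is hypothetical.

payload = {
    "model": "gpt-oss-120b",
    "stream": True,  # streaming SSE
    "messages": [  # multi-turn chat with a system prompt
        {"role": "system", "content": "You are a terse weather assistant."},
        {"role": "user", "content": "What's the weather in Lisbon?"},
    ],
    "tools": [  # function / tool calling, OpenAI shape
        {
            "type": "function",
            "function": {
                "name": "get_weather",
                "description": "Look up current weather for a city.",
                "parameters": {
                    "type": "object",
                    "properties": {"city": {"type": "string"}},
                    "required": ["city"],
                },
            },
        }
    ],
}

body = json.dumps(payload)  # send with the HTTP client of your choice
```

Structured JSON outputs are requested the same way, via a grammar-constrained response-format field; see the per-model pages for the exact schema options each model accepts.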

Picking a model

  • Default chat / agent: gpt-oss-120b. Cheapest path to GPT-4-class output quality.
  • Hardest reasoning + long context: deepseek-v4-pro. 1M-token window, frontier reasoning.
  • High-volume, long-context, cheap: deepseek-v4-flash. RAG and summarization workhorse.
  • Coding agents and tool-heavy workflows: kimi-k2.6.
  • Cost-sensitive production chat: deepseek-v3.2. Proven, cheap, fast.
For more on each model, see the per-model pages in the sidebar.