Documentation Index
Fetch the complete documentation index at: https://docs.cogito.decart.ai/llms.txt
Use this file to discover all available pages before exploring further.
What Cogito is
Cogito is an LLM inference API for the leading open-weight models — Llama, Qwen, DeepSeek, Mistral — and your fine-tunes. Inference runs on a hybrid fleet: AWS Trainium for cost-efficient throughput, NVIDIA Blackwell for the largest configurations. We pick the right silicon per workload; you call one API.
What it isn’t
Cogito is not a model lab. We don’t ship a frontier model of our own and we don’t fine-tune behind the scenes. The output you get is the open-weight model, untouched.
Drop-in OpenAI compatible
The whole API matches the OpenAI spec — chat completions, streaming SSE, function calling, structured JSON outputs. Swap two values in your existing code and you’re on Cogito.
Three-step quickstart
Create an API key
Sign up at cogito.decart.ai/sign-up. You start with free credits — no card required.
Pick a model
Browse the model catalog. Each model lists context window, throughput, and per-million-token pricing.
Send a request
Copy the quickstart snippet. Five-minute time-to-first-token.
Why Cogito
Trainium + GPU
Hybrid fleet, automatic routing. Open-weight models at proprietary-model speeds.
No asterisks
Token-aware rate limits, deterministic errors with request_id, hard spend caps, zero retention by default.
Cogito, ergo ship
Five-minute TTFT. Same API for production and your fine-tunes. Get back to shipping.