> ## Documentation Index > Fetch the complete documentation index at: https://docs.cogito.decart.ai/llms.txt > Use this file to discover all available pages before exploring further. # Overview > Stop watching your agent type. Cogito serves open-source LLMs 14× faster than typical providers, on AWS Trainium and NVIDIA Blackwell. > **Stop watching your agent type.** Cogito runs your agents 14× faster than typical inference providers — on the open-source models you already use. Calibrated against the \~70 tok/s baseline that Together publishes for Llama 70B class models; Cogito's premium tier sustains 1000+ tok/s on supported workloads. Cogito is the LLM inference platform from Decart, an efficiency-first research lab. It's built on DOS — the Decart Optimization Stack — which extracts frontier performance from any silicon. The same stack that runs Lucy 2.0 (a large diffusion model) at sub-50ms on AWS Trainium now serves open-source LLMs at 14× the speed of typical providers. **Now serving:** Kimi K2.6 · DeepSeek V4 (Flash & Pro) · GPT-OSS · Qwen. Inference runs on a hybrid fleet: **AWS Trainium** for cost-efficient throughput, **NVIDIA Blackwell** for the largest configurations. Cogito picks the right silicon per workload; you call one API. ## Built on DOS Cogito is built on **DOS — the Decart Optimization Stack** — a multi-silicon performance layer that routes workloads to optimal silicon and extracts frontier throughput. The same stack runs Lucy 2.0, a large diffusion model, at sub-50ms latency on AWS Trainium. For LLM inference, DOS lets Cogito serve open-source models at 14× the speed of typical providers — particularly for agentic workloads, where multi-step inference latency compounds. ## Drop-in OpenAI compatible The whole API matches the OpenAI spec — chat completions, legacy text completions, streaming SSE, function calling, structured JSON outputs. Swap two values in your existing code and you're on Cogito: ```python theme={null} from openai import OpenAI client = OpenAI( base_url="https://api.cogito.decart.ai/v1", # was: https://api.openai.com/v1 api_key=os.environ["COGITO_API_KEY"], ) ``` ## Three-step quickstart Sign up at [cogito.decart.ai/sign-up](https://cogito.decart.ai/sign-up). You start with free credits — no card required. Browse the [model catalog](/getting-started/models). Each model lists context window, throughput, and per-million-token pricing. Copy the [quickstart snippet](/getting-started/quickstart). Five-minute time-to-first-token. ## Why Cogito Hybrid fleet, automatic routing. Frontier-tier throughput on the open-source models you actually use. Token-aware rate limits, deterministic errors with `request_id`, hard spend caps, zero retention by default. Five-minute TTFT. Same API for production and your fine-tunes. Get back to shipping.