> ## Documentation Index
> Fetch the complete documentation index at: https://docs.cogito.decart.ai/llms.txt
> Use this file to discover all available pages before exploring further.

# Overview

> Stop watching your agent type. Cogito serves open-source LLMs 14× faster than typical providers, on AWS Trainium and NVIDIA Blackwell.

> **Stop watching your agent type.**

Cogito runs your agents 14× faster than typical inference providers — on the open-source models you already use. Calibrated against the \~70 tok/s baseline that Together publishes for Llama 70B class models; Cogito's premium tier sustains 1000+ tok/s on supported workloads.

Cogito is the LLM inference platform from Decart, an efficiency-first research lab. It's built on DOS — the Decart Optimization Stack — which extracts frontier performance from any silicon. The same stack that runs Lucy 2.0 (a large diffusion model) at sub-50ms on AWS Trainium now serves open-source LLMs at 14× the speed of typical providers.

**Now serving:** Kimi K2.6 · DeepSeek V4 (Flash & Pro) · GPT-OSS · Qwen.

Inference runs on a hybrid fleet: **AWS Trainium** for cost-efficient throughput, **NVIDIA Blackwell** for the largest configurations. Cogito picks the right silicon per workload; you call one API.

## Built on DOS

Cogito is built on **DOS — the Decart Optimization Stack** — a multi-silicon performance layer that routes workloads to optimal silicon and extracts frontier throughput. The same stack runs Lucy 2.0, a large diffusion model, at sub-50ms latency on AWS Trainium.

For LLM inference, DOS lets Cogito serve open-source models at 14× the speed of typical providers — particularly for agentic workloads, where multi-step inference latency compounds.

## Drop-in OpenAI compatible

The whole API matches the OpenAI spec — chat completions, legacy text completions, streaming SSE, function calling, structured JSON outputs. Swap two values in your existing code and you're on Cogito:

```python theme={null}
from openai import OpenAI

client = OpenAI(
    base_url="https://api.cogito.decart.ai/v1",  # was: https://api.openai.com/v1
    api_key=os.environ["COGITO_API_KEY"],
)
```

## Three-step quickstart

<Steps>
  <Step title="Create an API key">
    Sign up at [cogito.decart.ai/sign-up](https://cogito.decart.ai/sign-up). You start with free credits — no card required.
  </Step>

  <Step title="Pick a model">
    Browse the [model catalog](/getting-started/models). Each model lists context window, throughput, and per-million-token pricing.
  </Step>

  <Step title="Send a request">
    Copy the [quickstart snippet](/getting-started/quickstart). Five-minute time-to-first-token.
  </Step>
</Steps>

## Why Cogito

<CardGroup cols={3}>
  <Card title="Trainium + Blackwell" icon="microchip">
    Hybrid fleet, automatic routing. Frontier-tier throughput on the open-source models you actually use.
  </Card>

  <Card title="Built for operators" icon="circle-check">
    Token-aware rate limits, deterministic errors with `request_id`, hard spend caps, zero retention by default.
  </Card>

  <Card title="Cogito, ergo ship." icon="rocket">
    Five-minute TTFT. Same API for production and your fine-tunes. Get back to shipping.
  </Card>
</CardGroup>
