
Documentation Index

Fetch the complete documentation index at: https://docs.cogito.decart.ai/llms.txt

Use this file to discover all available pages before exploring further.

What Cogito is

Cogito is an LLM inference API for the leading open-weight models — Llama, Qwen, DeepSeek, Mistral — and your fine-tunes. Inference runs on a hybrid fleet: AWS Trainium for cost-efficient throughput, NVIDIA Blackwell for the largest configurations. We pick the right silicon per workload; you call one API.

What it isn’t

Cogito is not a model lab. We don’t ship a frontier model of our own and we don’t fine-tune behind the scenes. The output you get is the open-weight model, untouched.

Drop-in OpenAI-compatible

The whole API matches the OpenAI spec — chat completions, streaming SSE, function calling, structured JSON outputs. Swap two values in your existing code and you’re on Cogito:
import os

from openai import OpenAI

client = OpenAI(
    base_url="https://api.cogito.decart.ai/v1",  # was: https://api.openai.com/v1
    api_key=os.environ["COGITO_API_KEY"],
)
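Since the spec includes streaming SSE, a streamed request looks the same as with OpenAI, just pointed at the Cogito base URL. A minimal sketch: the model identifier below is illustrative (check the catalog for real names), and the SDK call only runs when a key is configured.

```python
import os

def join_deltas(deltas):
    """Concatenate streamed text deltas, skipping empty keep-alive chunks."""
    return "".join(d for d in deltas if d)

# The live call runs only when a key is present; the model ID is an assumption.
if os.environ.get("COGITO_API_KEY"):
    from openai import OpenAI

    client = OpenAI(
        base_url="https://api.cogito.decart.ai/v1",
        api_key=os.environ["COGITO_API_KEY"],
    )
    stream = client.chat.completions.create(
        model="llama-3.1-8b-instruct",  # assumed name; browse the catalog
        messages=[{"role": "user", "content": "Say hello."}],
        stream=True,  # tokens arrive incrementally over SSE
    )
    print(join_deltas(c.choices[0].delta.content for c in stream if c.choices))
```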

Three-step quickstart

1. Create an API key. Sign up at cogito.decart.ai/sign-up; you start with free credits, no card required.

2. Pick a model. Browse the model catalog. Each model lists context window, throughput, and per-million-token pricing.

3. Send a request. Copy the quickstart snippet; five minutes to first token.
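The three steps boil down to one authenticated POST. A sketch of the request shape, assuming the standard OpenAI chat-completions path and an illustrative model ID; any HTTP client works, the SDK is optional.

```python
import json
import os
import urllib.request

def build_chat_request(model, prompt):
    """OpenAI-style chat-completions payload."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }

payload = build_chat_request("llama-3.1-8b-instruct", "Hello!")  # model ID assumed

# Send it only when a key is configured:
if os.environ.get("COGITO_API_KEY"):
    req = urllib.request.Request(
        "https://api.cogito.decart.ai/v1/chat/completions",
        data=json.dumps(payload).encode(),
        headers={
            "Authorization": f"Bearer {os.environ['COGITO_API_KEY']}",
            "Content-Type": "application/json",
        },
    )
    with urllib.request.urlopen(req) as resp:
        print(json.load(resp)["choices"][0]["message"]["content"])
```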

Why Cogito

Trainium + GPU

Hybrid fleet, automatic routing. Open-weight models at proprietary-model speeds.

No asterisks

Token-aware rate limits, deterministic errors with request_id, hard spend caps, zero retention by default.
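Deterministic errors with a request_id mean failures can be logged and retried mechanically. A hedged sketch of a client-side policy; the `request_id` field name and error-body shape here are assumptions for illustration, not a documented schema.

```python
def should_retry(status_code):
    """Retry only transient failures: 429 (rate limited) and 5xx."""
    return status_code == 429 or 500 <= status_code < 600

def summarize_error(status_code, body):
    """One-line log entry; 'request_id' placement is an assumed schema."""
    rid = body.get("request_id", "<unknown>")
    msg = body.get("error", {}).get("message", "")
    return f"request_id={rid} status={status_code} {msg}".rstrip()
```

Logging the request_id with every failure lets you quote the exact failed call when contacting support.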

Cogito, ergo ship

Five-minute TTFT. Same API for production and your fine-tunes. Get back to shipping.