
Documentation Index

Fetch the complete documentation index at: https://docs.cogito.decart.ai/llms.txt

Use this file to discover all available pages before exploring further.

What Cogito is

Cogito is an LLM inference API for the leading open-weight models — Llama, Qwen, DeepSeek, Mistral — and your fine-tunes. Inference runs on a hybrid fleet: AWS Trainium for cost-efficient throughput, NVIDIA Blackwell for the largest configurations. We pick the right silicon per workload; you call one API.

What it isn’t

Cogito is not a model lab. We don’t ship a frontier model of our own and we don’t fine-tune behind the scenes. The output you get is the open-weight model, untouched.

Drop-in OpenAI-compatible

The whole API matches the OpenAI spec — chat completions, streaming SSE, function calling, structured JSON outputs. Swap two values in your existing code and you’re on Cogito:
import os

from openai import OpenAI

client = OpenAI(
    base_url="https://api.cogito.decart.ai/v1",  # was: https://api.openai.com/v1
    api_key=os.environ["COGITO_API_KEY"],
)
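Since the spec includes streaming SSE, a streamed request looks the same as with OpenAI, just pointed at the Cogito base URL. A minimal sketch: the model identifier below is illustrative (check the catalog for real names), and the SDK call only runs when a key is configured.

```python
import os

def join_deltas(deltas):
    """Concatenate streamed text deltas, skipping empty keep-alive chunks."""
    return "".join(d for d in deltas if d)

# The live call runs only when a key is present; the model ID is an assumption.
if os.environ.get("COGITO_API_KEY"):
    from openai import OpenAI

    client = OpenAI(
        base_url="https://api.cogito.decart.ai/v1",
        api_key=os.environ["COGITO_API_KEY"],
    )
    stream = client.chat.completions.create(
        model="llama-3.1-8b-instruct",  # assumed name; browse the catalog
        messages=[{"role": "user", "content": "Say hello."}],
        stream=True,  # tokens arrive incrementally over SSE
    )
    print(join_deltas(c.choices[0].delta.content for c in stream if c.choices))
```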

Three-step quickstart

1. Create an API key. Sign up at cogito.decart.ai/sign-up; you start with free credits, no card required.

2. Pick a model. Browse the model catalog. Each model lists context window, throughput, and per-million-token pricing.

3. Send a request. Copy the quickstart snippet; five minutes to first token.
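The three steps boil down to one authenticated POST. A sketch of the request shape, assuming the standard OpenAI chat-completions path and an illustrative model ID; any HTTP client works, the SDK is optional.

```python
import json
import os
import urllib.request

def build_chat_request(model, prompt):
    """OpenAI-style chat-completions payload."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }

payload = build_chat_request("llama-3.1-8b-instruct", "Hello!")  # model ID assumed

# Send it only when a key is configured:
if os.environ.get("COGITO_API_KEY"):
    req = urllib.request.Request(
        "https://api.cogito.decart.ai/v1/chat/completions",
        data=json.dumps(payload).encode(),
        headers={
            "Authorization": f"Bearer {os.environ['COGITO_API_KEY']}",
            "Content-Type": "application/json",
        },
    )
    with urllib.request.urlopen(req) as resp:
        print(json.load(resp)["choices"][0]["message"]["content"])
```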

Why Cogito

Trainium + GPU

Hybrid fleet, automatic routing. Open-weight models at proprietary-model speeds.

No asterisks

Token-aware rate limits, deterministic errors with request_id, hard spend caps, zero retention by default.
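Deterministic errors with a request_id mean failures can be logged and retried mechanically. A hedged sketch of a client-side policy; the `request_id` field name and error-body shape here are assumptions for illustration, not a documented schema.

```python
def should_retry(status_code):
    """Retry only transient failures: 429 (rate limited) and 5xx."""
    return status_code == 429 or 500 <= status_code < 600

def summarize_error(status_code, body):
    """One-line log entry; 'request_id' placement is an assumed schema."""
    rid = body.get("request_id", "<unknown>")
    msg = body.get("error", {}).get("message", "")
    return f"request_id={rid} status={status_code} {msg}".rstrip()
```

Logging the request_id with every failure lets you quote the exact failed call when contacting support.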

Cogito, ergo ship

Five-minute TTFT. Same API for production and your fine-tunes. Get back to shipping.