Pricing - Cogito

Pay for performance, not for the wait.

How billing works

Cogito charges per token of usage, with separate rates for standard input tokens, cached input tokens, and output tokens. Per-model rates are published in one place: the model catalog. Each model card lists all three rates, and GET /v1/models returns the same numbers (pricing.prompt, pricing.input_cache_read, pricing.completion). Each request is billed as:

cost = (input_tokens − cached_input_tokens) × input_rate
     + cached_input_tokens × cached_input_rate
     + output_tokens × output_rate
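As a rough sketch, the formula above can be reproduced in a few lines of Python. The key names prompt, input_cache_read, and completion mirror the pricing fields returned by GET /v1/models. The rate values below are made up for illustration, and the sketch assumes rates are expressed in dollars per single token; check the model catalog for the real numbers and units.

```python
from decimal import Decimal


def request_cost(input_tokens, cached_input_tokens, output_tokens, pricing):
    """Estimate the cost of one request from its token counts.

    `pricing` is a dict with the keys "prompt", "input_cache_read", and
    "completion". Assumption: each value is a per-token dollar rate.
    """
    if not 0 <= cached_input_tokens <= input_tokens:
        raise ValueError("cached_input_tokens must be between 0 and input_tokens")

    # Convert via str() so the sketch works whether rates arrive as strings or numbers.
    input_rate = Decimal(str(pricing["prompt"]))
    cached_rate = Decimal(str(pricing["input_cache_read"]))
    output_rate = Decimal(str(pricing["completion"]))

    uncached_tokens = input_tokens - cached_input_tokens
    return (
        uncached_tokens * input_rate
        + cached_input_tokens * cached_rate
        + output_tokens * output_rate
    )


# Hypothetical rates, not real Cogito prices.
example_pricing = {
    "prompt": "0.000002",
    "input_cache_read": "0.0000005",
    "completion": "0.000008",
}

# 10,000 input tokens (8,000 of them cached) and 500 output tokens.
cost = request_cost(10_000, 8_000, 500, example_pricing)
print(cost)
```

Decimal is used instead of float so that very small per-token rates do not accumulate rounding error when summed over many requests.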

Context cache

When you re-send the same prefix tokens (a long system prompt, a reference document, a multi-turn conversation), Cogito automatically charges them at the model’s cached-input rate, which is published per-model alongside the standard rate. No code change required. Every response includes usage.cached_input_tokens alongside prompt_tokens and completion_tokens so you can verify the discount applied.
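One way to check the discount is to summarize the usage object from a response. The sketch below is illustrative only: it assumes that prompt_tokens counts all input tokens, cached ones included, and that cached_input_tokens may be absent when nothing was cached. The helper name cache_summary is hypothetical.

```python
def cache_summary(usage):
    """Split a response's input tokens into cached and uncached counts.

    Assumption: usage["prompt_tokens"] is the total input token count,
    of which usage["cached_input_tokens"] were billed at the cached rate.
    """
    prompt_tokens = usage["prompt_tokens"]
    cached_tokens = usage.get("cached_input_tokens", 0)
    hit_ratio = cached_tokens / prompt_tokens if prompt_tokens else 0.0
    return {
        "cached": cached_tokens,
        "uncached": prompt_tokens - cached_tokens,
        "hit_ratio": hit_ratio,
    }


# Example usage object with made-up numbers.
usage = {"prompt_tokens": 1000, "cached_input_tokens": 750, "completion_tokens": 50}
summary = cache_summary(usage)
print(summary)
```

A hit ratio that stays near zero on repeated calls with a long shared prefix suggests the prefix is changing between requests.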

Fine-tuned variants

Your fine-tunes are billed at the same per-token rates as their base model. We don’t charge a premium for serving them.

Hard spend caps

Set a maximum monthly spend in Dashboard → Billing. Once you hit it, requests return:

{
  "error": {
    "type": "insufficient_quota",
    "code": "spend_cap_reached",
    "message": "Monthly spend cap reached. Raise the cap or top up to continue.",
    "request_id": "req_..."
  }
}

This protects you from runaway loops and DDoS-by-bug. We never silently bypass your cap.
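Client code can recognize this error and stop instead of retrying, since a retry cannot succeed until the cap is raised or the account is topped up. The sketch below checks only the type and code fields shown in the error body above; the function name is_spend_cap_error is hypothetical, and the HTTP status that accompanies the error is not assumed.

```python
import json


def is_spend_cap_error(body):
    """Return True if a response body is the spend-cap error shown above."""
    try:
        payload = json.loads(body)
    except (TypeError, ValueError):
        # Not JSON at all, so it cannot be the structured error.
        return False

    error = payload.get("error") if isinstance(payload, dict) else None
    return (
        isinstance(error, dict)
        and error.get("type") == "insufficient_quota"
        and error.get("code") == "spend_cap_reached"
    )


# The error body from this page, as a raw JSON string.
sample_body = """
{
  "error": {
    "type": "insufficient_quota",
    "code": "spend_cap_reached",
    "message": "Monthly spend cap reached. Raise the cap or top up to continue.",
    "request_id": "req_..."
  }
}
"""

if is_spend_cap_error(sample_body):
    print("Spend cap reached: pausing requests instead of retrying.")
```

Logging the request_id alongside the error makes it easier to reference the failed call later.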

Free credits

New accounts start with $5 in free credits. No card is required to spend them. You need a card only when you top up beyond the free tier.

Plans

Plan	Pricing	When to pick it
Free	$0 + free credits	Evaluation, prototyping
Pro	Pay as you go	Production traffic
Enterprise	Contract	VPC isolation, P99 SLA, BAA, SSO

See full plans →
