Pay for performance, not for the wait.

How billing works

Cogito charges per token of usage, with separate rates for standard input tokens, cached input tokens, and output tokens. Per-model rates are published in one place, the model catalog: each model card lists all three, and GET /v1/models returns the same numbers (pricing.prompt, pricing.input_cache_read, and pricing.completion, respectively). For each request you’re billed:
cost = (input_tokens − cached_input_tokens) × input_rate
     + cached_input_tokens × cached_input_rate
     + output_tokens × output_rate
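A minimal sketch of the formula above, useful for estimating a bill client-side. It assumes rates are expressed in USD per token (the catalog's actual unit should be checked), and the rate values in the example are hypothetical, not real Cogito prices. It uses `Decimal` rather than `float` so that small per-token rates do not accumulate rounding error.

```python
from dataclasses import dataclass
from decimal import Decimal


@dataclass(frozen=True)
class Rates:
    """Per-token rates for one model. Assumed unit: USD per token."""
    input: Decimal         # maps to pricing.prompt
    cached_input: Decimal  # maps to pricing.input_cache_read
    output: Decimal        # maps to pricing.completion


def request_cost(rates: Rates, input_tokens: int,
                 cached_input_tokens: int, output_tokens: int) -> Decimal:
    """Apply the per-request billing formula from this page."""
    if not 0 <= cached_input_tokens <= input_tokens:
        raise ValueError("cached_input_tokens must be between 0 and input_tokens")
    if output_tokens < 0:
        raise ValueError("output_tokens must be non-negative")
    uncached = input_tokens - cached_input_tokens  # billed at the standard input rate
    return (uncached * rates.input
            + cached_input_tokens * rates.cached_input
            + output_tokens * rates.output)


# Hypothetical rates, for illustration only.
rates = Rates(input=Decimal("0.000002"),
              cached_input=Decimal("0.0000005"),
              output=Decimal("0.000008"))
print(request_cost(rates, input_tokens=10_000, cached_input_tokens=8_000, output_tokens=500))
```

The cached tokens are subtracted from the input count before the standard rate is applied, so every input token is billed exactly once, at either the standard or the cached rate.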

Context cache

When you re-send the same prefix (a long system prompt, a reference document, the earlier turns of a multi-turn conversation), Cogito automatically bills those repeated tokens at the model’s cached-input rate, which is published per model alongside the standard rate. No code change is required. Every response includes usage.cached_input_tokens alongside usage.prompt_tokens and usage.completion_tokens, so you can verify that the discount was applied.
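One way to check the discount is to read the usage fields from each response and compare what the cached tokens cost against the standard rate. The sketch below assumes `usage` is a plain dict with the three fields named above, and that `prompt_tokens` counts all input tokens, cached ones included (consistent with the `input_tokens − cached_input_tokens` term of the billing formula). The helper name `cache_report`, the rates, and the token counts are hypothetical.

```python
from decimal import Decimal


def cache_report(usage: dict, input_rate: Decimal, cached_input_rate: Decimal) -> dict:
    """Summarize how much of a request's input was billed at the cached rate.

    Assumption: usage["prompt_tokens"] includes the cached tokens.
    """
    prompt = usage["prompt_tokens"]
    cached = usage["cached_input_tokens"]  # present on every response, per this page
    hit_ratio = cached / prompt if prompt else 0.0
    # Savings compared with billing the cached tokens at the standard input rate.
    saved = cached * (input_rate - cached_input_rate)
    return {"cached_input_tokens": cached, "hit_ratio": hit_ratio, "saved_usd": saved}


# Values made up for illustration; rates assumed to be USD per token.
usage = {"prompt_tokens": 12_000, "completion_tokens": 400, "cached_input_tokens": 9_000}
report = cache_report(usage, Decimal("0.000002"), Decimal("0.0000005"))
print(report)
```

A hit ratio that stays near zero across requests sharing a long system prompt is a hint that the prefix is changing between calls (for example, a timestamp placed at the top of the prompt).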

Fine-tuned variants

Your fine-tunes are billed at the same per-token rates as their base model. We don’t charge a premium for serving them.

Hard spend caps

Set a maximum monthly spend in Dashboard → Billing. Once you hit it, requests return:
{
  "error": {
    "type": "insufficient_quota",
    "code": "spend_cap_reached",
    "message": "Monthly spend cap reached. Raise the cap or top up to continue.",
    "request_id": "req_..."
  }
}
This protects you from runaway loops and DDoS-by-bug. We never silently bypass your cap.
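Because the cap holds until you raise it or top up, a client that retries on every error will just keep failing. A minimal sketch of detecting this case from the error body shown above follows. It matches on `code` rather than `type`, on the assumption that `insufficient_quota` may also cover other situations; it inspects only the body because this page does not state the HTTP status code. The function name is hypothetical.

```python
import json


def is_spend_cap_error(body: str) -> bool:
    """Return True if an error body reports that the monthly spend cap was reached."""
    try:
        payload = json.loads(body)
    except json.JSONDecodeError:
        return False  # not JSON: leave it to generic error handling
    error = payload.get("error") if isinstance(payload, dict) else None
    # Match on `code`; `type` (insufficient_quota) is assumed to be shared with other errors.
    return isinstance(error, dict) and error.get("code") == "spend_cap_reached"


body = """{"error": {"type": "insufficient_quota", "code": "spend_cap_reached",
  "message": "Monthly spend cap reached. Raise the cap or top up to continue.",
  "request_id": "req_..."}}"""

if is_spend_cap_error(body):
    # Retrying cannot succeed until the cap is raised or the balance is topped up,
    # so stop the job and alert someone instead of backing off and retrying.
    print("Spend cap reached; halting batch job.")
```

Treating this error as terminal, rather than feeding it into a generic retry-with-backoff loop, keeps a runaway job from hammering the API after the cap has already done its work.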

Free credits

New accounts start with $5 in free credits. No card is required to spend them; you only need a card when you top up beyond the free tier.

Plans

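| Plan       | Pricing           | When to pick it                   |
|------------|-------------------|-----------------------------------|
| Free       | $0 + free credits | Evaluation, prototyping           |
| Pro        | Pay as you go     | Production traffic                |
| Enterprise | Contract          | VPC isolation, P99 SLA, BAA, SSO  |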