How billing works

Cogito charges per input token and per output token, with separate rates for each. Output is generally 1.5–3× more expensive per token than input, because output tokens are generated sequentially on GPU/Trainium and consume more compute. For each request you’re billed:
cost = (input_tokens × input_rate) + (output_tokens × output_rate)
Both rates are listed per model on the model catalog.
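As a rough illustration of the formula above, here is a minimal Python sketch. The rates are made-up placeholders, and quoting them per million tokens is an assumption; take the real values and units from the model catalog. The name request_cost is hypothetical and not part of any Cogito SDK.

```python
from decimal import Decimal

# Hypothetical rates in USD per 1M tokens (placeholders, not real Cogito prices).
INPUT_RATE_PER_M = Decimal("0.50")
OUTPUT_RATE_PER_M = Decimal("1.50")


def request_cost(
    input_tokens: int,
    output_tokens: int,
    input_rate_per_m: Decimal = INPUT_RATE_PER_M,
    output_rate_per_m: Decimal = OUTPUT_RATE_PER_M,
) -> Decimal:
    """cost = (input_tokens x input_rate) + (output_tokens x output_rate)."""
    million = Decimal(1_000_000)
    total = input_tokens * input_rate_per_m + output_tokens * output_rate_per_m
    return total / million


if __name__ == "__main__":
    # 12,000 input tokens and 800 output tokens at the placeholder rates.
    print(request_cost(12_000, 800))
```

Decimal is used instead of float so that small per-token amounts add up without rounding drift when you sum many requests.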

Context cache discount

When you re-send the same prefix tokens (a long system prompt, a reference document, or a multi-turn conversation), Cogito automatically caches them and bills cached input tokens at 50% of the standard input rate. No code change is required. To verify that the discount was applied, check usage.cached_input_tokens, which every response includes alongside prompt_tokens and completion_tokens.
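To see how the discount might show up in your own cost tracking, the sketch below estimates the input-side cost from a response's usage object. It rests on one assumption the text above does not confirm: that cached_input_tokens is counted as a subset of prompt_tokens. The rate and the helper name input_cost_with_cache are hypothetical.

```python
from decimal import Decimal

# Cached input tokens are billed at 50% of the standard input rate.
CACHED_DISCOUNT = Decimal("0.5")


def input_cost_with_cache(usage: dict, input_rate_per_m: Decimal) -> Decimal:
    """Estimate the input cost for one response, given its usage object.

    ASSUMPTION: cached_input_tokens is a subset of prompt_tokens.
    ASSUMPTION: the rate is quoted in USD per 1M tokens.
    """
    prompt = usage["prompt_tokens"]
    cached = usage.get("cached_input_tokens", 0)
    uncached = prompt - cached
    billable = uncached + cached * CACHED_DISCOUNT
    return billable * input_rate_per_m / Decimal(1_000_000)


if __name__ == "__main__":
    # Example usage object shaped like the fields named in the text above.
    usage = {
        "prompt_tokens": 10_000,
        "cached_input_tokens": 8_000,
        "completion_tokens": 500,
    }
    rate = Decimal("0.50")  # hypothetical placeholder rate
    print(input_cost_with_cache(usage, rate))
```

With these placeholder numbers, 8,000 of the 10,000 prompt tokens are cached, so the estimated input cost is 60% of what it would be with no cache hit.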

Fine-tuned variants

Fine-tuned variants are billed at the same per-token rates as their base model. We don’t charge a premium for serving them.

Hard spend caps

Set a maximum monthly spend in Dashboard → Billing. Once you hit it, requests return:
{
  "error": {
    "type": "insufficient_quota",
    "code": "spend_cap_reached",
    "message": "Monthly spend cap reached. Raise the cap or top up to continue.",
    "request_id": "req_..."
  }
}
This protects you from runaway loops and DDoS-by-bug. We never silently bypass your cap.
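Because a spend-cap error will not clear until the cap is raised or the account is topped up, client code may want to stop rather than retry when it sees one. The sketch below checks a raw response body for the type and code fields shown above. The helper name is_spend_cap_error is hypothetical, and the sketch makes no assumption about which HTTP status accompanies the error.

```python
import json


def is_spend_cap_error(body: str) -> bool:
    """Return True if a response body matches the spend-cap error shape."""
    try:
        payload = json.loads(body)
    except json.JSONDecodeError:
        return False
    if not isinstance(payload, dict):
        return False
    error = payload.get("error")
    if not isinstance(error, dict):
        return False
    return (
        error.get("type") == "insufficient_quota"
        and error.get("code") == "spend_cap_reached"
    )


if __name__ == "__main__":
    sample = json.dumps({
        "error": {
            "type": "insufficient_quota",
            "code": "spend_cap_reached",
            "message": "Monthly spend cap reached. Raise the cap or top up to continue.",
            "request_id": "req_...",
        }
    })
    if is_spend_cap_error(sample):
        # Retrying would fail the same way; surface the problem to an operator instead.
        print("Spend cap reached: stop retrying and alert an operator.")
```

Matching on the machine-readable type and code fields rather than the message string keeps the check stable if the wording of the message changes.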

Free credits

New accounts start with $5 in free credits. No card is required to spend them. A card is required only when you top up beyond the free tier.

Plans

| Plan | Pricing | When to pick it |
| --- | --- | --- |
| Free | $0 + free credits | Evaluation, prototyping |
| Pro | Pay as you go | Production traffic |
| Enterprise | Contract | VPC isolation, P99 SLA, BAA, SSO |
See full plans →