Pay for performance, not for the wait.
How billing works
Cogito charges per token of usage, with separate rates for standard input tokens, cached input tokens, and output tokens. Per-model rates are published in one place: the model catalog. Each model card lists all three, and the same numbers come back from `GET /v1/models` (`pricing.prompt`, `pricing.input_cache_read`, `pricing.completion`).
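A minimal sketch of pulling those three rates out of a `GET /v1/models` response. It assumes the body uses a `{"data": [{"id": ..., "pricing": {...}}]}` list shape and that each rate is a decimal string (or number) in USD per token. Both are assumptions, so check them against the catalog and a real response. The host `https://api.example.com` and the bearer-token header are placeholders, not documented values.

```python
import json
import urllib.request
from decimal import Decimal

# Placeholder host: the real API base URL is not given on this page.
BASE_URL = "https://api.example.com"

RATE_FIELDS = ("prompt", "input_cache_read", "completion")


def fetch_models(api_key: str) -> dict:
    """Call GET /v1/models and return the decoded JSON body."""
    req = urllib.request.Request(
        f"{BASE_URL}/v1/models",
        # Assumption: bearer-token auth; adjust to whatever Cogito actually uses.
        headers={"Authorization": f"Bearer {api_key}"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)


def extract_pricing(models_body: dict) -> dict[str, dict[str, Decimal]]:
    """Map model id -> {prompt, input_cache_read, completion} rates.

    Assumes a {"data": [{"id": ..., "pricing": {...}}]} shape and that each
    rate is given in USD per token. Missing fields are simply skipped.
    """
    rates: dict[str, dict[str, Decimal]] = {}
    for model in models_body.get("data", []):
        pricing = model.get("pricing", {})
        # Decimal(str(...)) avoids binary-float rounding on tiny per-token prices.
        rates[model["id"]] = {
            field: Decimal(str(pricing[field]))
            for field in RATE_FIELDS
            if field in pricing
        }
    return rates
```

Parsing into `Decimal` rather than `float` keeps sub-cent per-token prices exact when you later multiply them by token counts.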
For each request you’re billed:
Context cache
When you re-send the same prefix tokens (a long system prompt, a reference document, a multi-turn conversation), Cogito automatically charges them at the model’s cached-input rate, published per model alongside the standard rate. No code change is required. Every response includes `usage.cached_input_tokens` alongside `prompt_tokens` and `completion_tokens`, so you can verify that the discount applied.
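To verify the discount, you can recompute a request’s cost from its `usage` block and the model’s three rates. This is a sketch, not Cogito’s billing code. It assumes rates are in USD per token, and because this page doesn’t say whether `prompt_tokens` already includes the cached tokens, it exposes that as a hypothetical `cached_in_prompt` flag instead of guessing.

```python
from decimal import Decimal


def request_cost(
    usage: dict,
    rates: dict,
    cached_in_prompt: bool = True,
) -> tuple[Decimal, Decimal]:
    """Return (cost, cache_savings) in USD for one response's usage block.

    rates holds "prompt", "input_cache_read" and "completion" as Decimal USD
    per token (assumed unit). cached_in_prompt is an assumption about the
    usage fields: True means prompt_tokens also counts the cached tokens.
    """
    prompt = usage["prompt_tokens"]
    cached = usage.get("cached_input_tokens", 0)
    completion = usage["completion_tokens"]

    # Input tokens billed at the standard (uncached) input rate.
    uncached = prompt - cached if cached_in_prompt else prompt

    cost = (
        uncached * rates["prompt"]
        + cached * rates["input_cache_read"]
        + completion * rates["completion"]
    )
    # Difference between billing the cached tokens at the standard input rate
    # and billing them at the cached-input rate.
    savings = cached * (rates["prompt"] - rates["input_cache_read"])
    return cost, savings
```

Keeping the flag explicit means the same check works whichever convention the `usage` fields turn out to follow; once you have confirmed it, you can hard-code the matching value.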
Fine-tuned variants
Your fine-tunes are billed at the same per-token rates as their base model. We don’t charge a premium for serving them.
Hard spend caps
Set a maximum monthly spend in Dashboard → Billing. Once you hit it, requests return:
Free credits
New accounts start with $5 in free credits, and no card is required to spend them. A card is required only when you top up beyond the free tier.
Plans
| Plan | Pricing | When to pick it |
|---|---|---|
| Free | $0 + free credits | Evaluation, prototyping |
| Pro | Pay as you go | Production traffic |
| Enterprise | Contract | VPC isolation, P99 SLA, BAA, SSO |