Cogito returns a single, deterministic error shape on every failure:
```json
{
  "error": {
    "type": "invalid_request_error",
    "code": "model_not_found",
    "message": "Model 'foo' is not available.",
    "request_id": "req_..."
  }
}
```
Always log request_id. We trace it through the gateway, scheduler, and inference cluster. Support can pinpoint your request from this single ID.
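Here is a minimal Python sketch of client-side handling. It assumes you already have the HTTP status and raw body of a failed response. `CogitoError`, `parse_error`, and the logger name are hypothetical; the field names mirror the envelope above. The parser tolerates bodies that are not the documented shape, such as HTML from a proxy in front of the gateway.

```python
import json
import logging
from dataclasses import dataclass
from typing import Optional

logger = logging.getLogger("cogito.client")  # hypothetical logger name


@dataclass(frozen=True)
class CogitoError:
    """Fields of the documented {"error": {...}} envelope, plus the HTTP status."""
    status: int
    type: str
    code: str
    message: str
    request_id: Optional[str]


def parse_error(status: int, body: str) -> CogitoError:
    """Parse a failed response body; fall back to placeholders if the shape is unexpected."""
    try:
        err = json.loads(body).get("error")
    except (ValueError, AttributeError):
        err = None  # not JSON, or JSON that is not an object
    if not isinstance(err, dict):
        err = {}
    parsed = CogitoError(
        status=status,
        type=err.get("type", "unknown"),
        code=err.get("code", "unknown"),
        message=err.get("message", body[:200]),
        request_id=err.get("request_id"),
    )
    # Log code and request_id together so support can trace the failure.
    logger.warning(
        "cogito error status=%s type=%s code=%s request_id=%s",
        parsed.status, parsed.type, parsed.code, parsed.request_id,
    )
    return parsed
```

Call `parse_error` on every non-2xx response before deciding whether to retry. That way the log line exists even when the caller swallows the exception.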

Error types

The gateway sets error.type from the HTTP status. The mapping is deterministic, and the table below lists every type you can encounter.
| Type | HTTP | When |
|---|---|---|
| invalid_request_error | 400, 404 | Malformed JSON, missing required field, invalid params, unknown model slug. All 404s land here, both body-level (unknown model) and gateway-level (missing route); distinguish them via error.code (model_not_found vs. not_found). |
| authentication_error | 401 | Missing or revoked API key |
| insufficient_quota | 402 | Balance exhausted (out of credit) |
| permission_denied | 403 | Key doesn't have access to a model or feature |
| rate_limit_error | 429 | Token or request rate limit hit, or monthly hard spend cap reached |
| api_error | 500–502, 504+ | Cogito-side problem; safe to retry |
| service_unavailable | 503 | Billing not configured, or inference cluster temporarily degraded; retry with backoff |
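Because the mapping is fixed, a client can mirror it. One use is asserting in tests that a response's error.type matches its status. The sketch below is minimal, and `expected_error_type` and `is_missing_model` are hypothetical helpers. Statuses outside the table raise an error instead of guessing.

```python
def expected_error_type(status: int) -> str:
    """Return the error.type the status table documents for an HTTP status."""
    if status in (400, 404):
        return "invalid_request_error"
    if status == 401:
        return "authentication_error"
    if status == 402:
        return "insufficient_quota"
    if status == 403:
        return "permission_denied"
    if status == 429:
        return "rate_limit_error"
    if status == 503:
        return "service_unavailable"
    if status >= 500:
        # 500-502 and 504+ all map to api_error.
        return "api_error"
    raise ValueError(f"HTTP {status} is not in the documented mapping")


def is_missing_model(status: int, code: str) -> bool:
    """Unknown models and unknown routes are both 404 invalid_request_error; only error.code differs."""
    return status == 404 and code == "model_not_found"
```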

Common codes

error.code is the machine-readable handle. Always log it alongside request_id. The common codes, grouped by category:
Auth
  • invalid_api_key — Bearer header present, but the API key is invalid or revoked (401)
  • missing_authorization — no Bearer header at all (401)
Billing
  • insufficient_quota — balance ≤ 0 (402). The code is the same string as error.type: both are insufficient_quota. Renamed from the earlier insufficient_balance for OpenAI parity.
  • spend_cap_reached — month-to-date spend has hit your hard cap (429). Renamed from the earlier spend_capped. Raise the cap in the dashboard or wait for the month to roll over.
  • billing_not_configured — server misconfiguration on our side (503, surfaces as error.type: service_unavailable)
Validation
  • invalid_request_error — generic body-shape failure (400)
  • model_not_found — slug typo or model retired (404)
Upstream / cluster
  • backend_unavailable — gateway-side error before reaching the upstream (e.g. missing backend credentials) (500, surfaces as error.type: api_error)
  • upstream_unavailable — TCP/connection failure to the upstream cluster (502, surfaces as error.type: api_error)
  • upstream_<status> — verbatim upstream HTTP status pass-through (e.g. upstream_503 when the inference cluster itself returned 503)
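Client code that branches on these codes can stay robust if it first maps the renamed codes forward and parses the upstream_<status> pattern. The sketch below is hypothetical. The action labels are illustrative, and treating only 5xx pass-through statuses as retryable is an assumption, not documented behavior.

```python
import re
from typing import Optional

# Legacy wire codes documented as renamed, mapped to their current names.
LEGACY_CODES = {
    "insufficient_balance": "insufficient_quota",
    "spend_capped": "spend_cap_reached",
}

_UPSTREAM_RE = re.compile(r"upstream_(\d{3})")


def normalize_code(code: str) -> str:
    """Map a legacy code to its current name; pass other codes through unchanged."""
    return LEGACY_CODES.get(code, code)


def upstream_status(code: str) -> Optional[int]:
    """Return the passed-through HTTP status for upstream_<status> codes, else None."""
    m = _UPSTREAM_RE.fullmatch(code)
    return int(m.group(1)) if m else None


def next_action(code: str) -> str:
    """Hypothetical triage of an error.code into a coarse action label."""
    code = normalize_code(code)
    if code in ("invalid_api_key", "missing_authorization"):
        return "fix_credentials"
    if code == "insufficient_quota":
        return "top_up_balance"
    if code == "spend_cap_reached":
        return "raise_cap_or_wait_for_month"
    if code in ("invalid_request_error", "model_not_found"):
        return "fix_request"
    if code in ("backend_unavailable", "upstream_unavailable", "billing_not_configured"):
        return "retry_with_backoff"
    passthrough = upstream_status(code)
    if passthrough is not None and passthrough >= 500:
        return "retry_with_backoff"
    return "inspect"
```

Note that `upstream_unavailable` does not match the `upstream_(\d{3})` pattern, so it is handled by name rather than parsed as a status.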

Retry-after

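429 and 503 responses include a Retry-After header giving the delay in seconds. Honor it: do not retry before that many seconds have passed. Adding exponential backoff on top of Retry-After is fine; ignoring the header will get you rate-limited harder. One exception: a 429 with code spend_cap_reached will not clear on retry, so raise the cap or wait for the month to roll over.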
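Below is a minimal retry loop that honors Retry-After and adds exponential backoff with full jitter on top of it. It treats 429 and every 5xx as retryable, following the status table (api_error is retry-safe, and service_unavailable calls for backoff). It stops immediately on spend_cap_reached, since retrying cannot clear a spend cap. The `send` callable, the `base` and `cap` values, and all function names are assumptions for illustration. Retry-After is parsed as seconds, as documented here.

```python
import json
import random
import time
from typing import Callable, Mapping, Optional, Tuple

Response = Tuple[int, Mapping[str, str], str]  # (status, headers, body)


def parse_retry_after(headers: Mapping[str, str]) -> Optional[float]:
    """Read Retry-After (case-insensitive) as seconds; None if absent or unparseable."""
    for name, value in headers.items():
        if name.lower() == "retry-after":
            try:
                return max(0.0, float(value))
            except ValueError:
                return None
    return None


def backoff_delay(attempt: int, retry_after: Optional[float],
                  base: float = 0.5, cap: float = 30.0) -> float:
    """Retry-After (if any) plus full-jitter exponential backoff for this attempt."""
    jitter = random.uniform(0.0, min(cap, base * (2 ** attempt)))
    return (retry_after or 0.0) + jitter


def _is_spend_cap(body: str) -> bool:
    """True if the body is a spend_cap_reached error, which retrying cannot fix."""
    try:
        return json.loads(body)["error"]["code"] == "spend_cap_reached"
    except (ValueError, KeyError, TypeError):
        return False


def call_with_retries(send: Callable[[], Response], max_attempts: int = 5,
                      sleep: Callable[[float], None] = time.sleep) -> Response:
    """Call send() until success, a non-retryable error, or max_attempts is reached."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be >= 1")
    attempt = 0
    while True:
        status, headers, body = send()
        attempt += 1
        retryable = (status == 429 or status >= 500) and not _is_spend_cap(body)
        if not retryable or attempt >= max_attempts:
            return status, headers, body
        sleep(backoff_delay(attempt - 1, parse_retry_after(headers)))
```

Wrap your HTTP client call in a zero-argument `send` that returns `(status, headers, body)`. Passing `sleep=lambda s: None` makes the loop testable without real waits. Adding the jittered backoff to Retry-After, rather than replacing it, guarantees the wait never falls below the server's hint.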