Errors - Cogito

Cogito returns a single, deterministic error shape on every failure:

{
  "error": {
    "type": "invalid_request_error",
    "code": "model_not_found",
    "message": "Model 'foo' is not available.",
    "request_id": "req_..."
  }
}

Always log request_id. We trace it through the gateway, scheduler, and inference cluster. Support can pinpoint your request from this single ID.
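As a rough illustration of the shape above, the following Python sketch parses an error body and logs its request_id. The `CogitoError` class, the `parse_error` helper, and the logger name are hypothetical and not part of any Cogito SDK. The sketch assumes only the four fields shown in the example.

```python
import json
import logging
from dataclasses import dataclass
from typing import Optional

# Hypothetical logger name; use whatever your application already has.
logger = logging.getLogger("cogito.client")


@dataclass(frozen=True)
class CogitoError:
    """Hypothetical container mirroring the documented error shape."""
    type: str
    code: str
    message: str
    request_id: Optional[str]


def parse_error(body: str) -> CogitoError:
    """Parse an error response body and log code and request_id together."""
    err = json.loads(body)["error"]
    parsed = CogitoError(
        type=err["type"],
        code=err["code"],
        message=err["message"],
        request_id=err.get("request_id"),
    )
    logger.error(
        "cogito error type=%s code=%s request_id=%s",
        parsed.type, parsed.code, parsed.request_id,
    )
    return parsed
```

Logging the ID at the point of parsing means it lands in your logs even if a caller later swallows the exception.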

Error types

error.type is set per HTTP status by the gateway. The mapping is deterministic — the table below is the full set you can encounter.

Type	HTTP	When
`invalid_request_error`	400, 404	Malformed JSON, missing required field, invalid params, unknown model slug. All 404s file here (both body-level and gateway-level missing routes) — distinguish via `error.code` (`model_not_found` vs. `not_found`).
`authentication_error`	401	Missing / revoked API key
`insufficient_quota`	402	Balance exhausted (out of credit)
`permission_denied`	403	Key doesn’t have access to a model or feature
`rate_limit_error`	429	Token / request rate limit hit, or monthly hard spend cap reached
`api_error`	500–502, 504+	Server-side problem at Cogito; safe to retry
`service_unavailable`	503	Billing not configured, or inference cluster temporarily degraded; retry with backoff
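One way a client might turn the table into a retry decision is sketched below. The `should_retry` helper and both sets are hypothetical. Treating `spend_cap_reached` as non-retryable is a judgment call based on its description (the cap must be raised or the month must roll over), not a documented rule.

```python
# Types the table describes as retryable or rate-limited (assumption: a 429
# is worth retrying once the limit window has passed).
RETRYABLE_TYPES = {"api_error", "service_unavailable", "rate_limit_error"}

# Codes where an immediate retry cannot succeed (assumption, see lead-in).
NO_RETRY_CODES = {"spend_cap_reached"}


def should_retry(error_type: str, error_code: str = "") -> bool:
    """Return True if a request failing with this type/code may be retried."""
    if error_code in NO_RETRY_CODES:
        return False
    return error_type in RETRYABLE_TYPES
```

Client errors such as `invalid_request_error` or `authentication_error` fall through to `False`, because the same request fails again until the caller changes it.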

Common codes

error.code is the machine-readable handle. Always log it alongside request_id. Codes are grouped by category below.

Auth

invalid_api_key — Bearer header missing or revoked (401)
missing_authorization — no Bearer header at all (401)

Billing

insufficient_quota — balance ≤ 0 (402). The wire code matches error.type — both are insufficient_quota. Renamed from the earlier insufficient_balance for OpenAI parity.
spend_cap_reached — month-to-date spend has hit your hard cap (429). Renamed from the earlier spend_capped. Raise the cap in the dashboard or wait for the month to roll over.
billing_not_configured — server misconfiguration on our side (503)
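If older handlers or stored logs still use the earlier code names, a small alias map can keep them comparable with current responses. This is a sketch only: `LEGACY_CODE_ALIASES` and `normalize_code` are hypothetical names, and the two mappings are the renames mentioned above.

```python
# Earlier code name -> current code name, per the renames noted above.
LEGACY_CODE_ALIASES = {
    "insufficient_balance": "insufficient_quota",
    "spend_capped": "spend_cap_reached",
}


def normalize_code(code: str) -> str:
    """Map an earlier error code to its current name; pass others through."""
    return LEGACY_CODE_ALIASES.get(code, code)
```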

Validation

invalid_request_error — generic body-shape failure (400)
model_not_found — slug typo or model retired (404)

Upstream / cluster

backend_unavailable — gateway-side error before reaching the upstream (e.g. missing backend credentials) (500, surfaces as error.type: api_error)
upstream_unavailable — TCP/connection failure to the upstream cluster (502, surfaces as error.type: api_error)
upstream_<status> — verbatim upstream HTTP status pass-through (e.g. upstream_503 when the inference cluster itself returned 503)
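Because `upstream_<status>` embeds the upstream HTTP status in the code itself, a client that wants the number has to parse it out. A minimal sketch follows, assuming the status is always three digits. The `upstream_status` helper is hypothetical.

```python
import re
from typing import Optional

# Matches codes such as "upstream_503"; assumes a three-digit HTTP status.
_UPSTREAM_STATUS = re.compile(r"^upstream_(\d{3})$")


def upstream_status(code: str) -> Optional[int]:
    """Return the upstream HTTP status embedded in the code, or None."""
    match = _UPSTREAM_STATUS.match(code)
    return int(match.group(1)) if match else None
```

Note that `upstream_unavailable` does not match the pattern, so connection failures are not mistaken for a status pass-through.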

Retry-after

429 and 503 responses include a Retry-After header (seconds). Honor it. Exponential backoff on top of Retry-After is fine; ignoring it will get you rate-limited harder.
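The guidance above could be implemented roughly as follows: wait at least the Retry-After value, then add a jittered exponential backoff on top. The `retry_delay` helper and its `base` and `cap` defaults are hypothetical choices, not values Cogito prescribes.

```python
import random
from typing import Optional


def retry_delay(retry_after: Optional[str], attempt: int,
                base: float = 1.0, cap: float = 60.0) -> float:
    """Seconds to sleep before retry number `attempt` (0-based).

    The result is never shorter than the server's Retry-After value.
    """
    # Exponential backoff ceiling: base, 2*base, 4*base, ... up to `cap`.
    backoff = min(cap, base * (2 ** attempt))
    # Full jitter so that many clients do not retry in lockstep.
    jitter = random.uniform(0.0, backoff)
    try:
        server_delay = int(retry_after) if retry_after is not None else 0
    except ValueError:
        # Header present but not an integer number of seconds; ignore it.
        server_delay = 0
    return max(server_delay, 0) + jitter
```

A caller would pass the raw header value, for example `time.sleep(retry_delay(response.headers.get("Retry-After"), attempt))`, where `response` stands for whatever HTTP client object is in use.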
