request_id. We trace it through the gateway, scheduler, and inference cluster. Support can pinpoint your request from this single ID.
Error types
error.type is set per HTTP status by the gateway. The mapping is
deterministic — the table below is the full set you can encounter.
| Type | HTTP | When |
|---|---|---|
| invalid_request_error | 400, 404 | Malformed JSON, missing required field, invalid params, unknown model slug. All 404s fall under this type (both body-level and gateway-level missing routes); distinguish them via error.code (model_not_found vs. not_found). |
| authentication_error | 401 | Missing or revoked API key |
| insufficient_quota | 402 | Balance exhausted (out of credit) |
| permission_denied | 403 | Key doesn’t have access to a model or feature |
| rate_limit_error | 429 | Token / request rate limit hit, or monthly hard spend cap reached |
| api_error | 500–502, 504+ | Cogito problem; safe to retry |
| service_unavailable | 503 | Billing not configured, or inference cluster temporarily degraded; retry with backoff |
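
One way to act on the table is to map each error.type to a coarse client-side action. The sketch below is illustrative only: the Action names and the action_for helper are hypothetical and not part of the API, and the spend-cap special case assumes you would rather raise the cap than retry until the month rolls over.

```python
from enum import Enum
from typing import Optional


class Action(Enum):
    """Hypothetical client-side reactions; these names are not part of the API."""
    FIX_REQUEST = "fix_request"
    FIX_CREDENTIALS = "fix_credentials"
    ADD_CREDIT = "add_credit"
    RAISE_SPEND_CAP = "raise_spend_cap"
    RETRY = "retry"
    INSPECT = "inspect"


# Derived from the error type table; retrying only makes sense for the last three.
ACTION_BY_TYPE = {
    "invalid_request_error": Action.FIX_REQUEST,
    "authentication_error": Action.FIX_CREDENTIALS,
    "permission_denied": Action.FIX_CREDENTIALS,
    "insufficient_quota": Action.ADD_CREDIT,
    "rate_limit_error": Action.RETRY,
    "api_error": Action.RETRY,
    "service_unavailable": Action.RETRY,
}


def action_for(error_type: str, error_code: Optional[str] = None) -> Action:
    """Pick a reaction for an error; unknown types fall back to INSPECT."""
    # A reached spend cap is reported as rate_limit_error (429), but a quick
    # retry will not clear it, so it is treated separately here.
    if error_code == "spend_cap_reached":
        return Action.RAISE_SPEND_CAP
    return ACTION_BY_TYPE.get(error_type, Action.INSPECT)
```

For example, action_for("api_error") returns Action.RETRY, while action_for("rate_limit_error", "spend_cap_reached") returns Action.RAISE_SPEND_CAP.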
Common codes
error.code is the machine-readable handle. Always log it alongside
request_id. Grouped by category:
Auth
- invalid_api_key: Bearer header missing or revoked (401)
- missing_authorization: no Bearer header at all (401)

- insufficient_quota: balance ≤ 0 (402). The wire code matches error.type; both are insufficient_quota. Renamed from the earlier insufficient_balance for OpenAI parity.
- spend_cap_reached: month-to-date spend has hit your hard cap (429). Renamed from the earlier spend_capped. Raise the cap in the dashboard or wait for the month to roll over.
- billing_not_configured: server misconfiguration on our side (503)

- invalid_request_error: generic body-shape failure (400)
- model_not_found: slug typo or model retired (404)

- backend_unavailable: gateway-side error before reaching the upstream (e.g. missing backend credentials) (500, surfaces as error.type: api_error)
- upstream_unavailable: TCP/connection failure to the upstream cluster (502, surfaces as error.type: api_error)
- upstream_<status>: verbatim upstream HTTP status pass-through (e.g. upstream_503 when the inference cluster itself returned 503)
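
A minimal sketch of logging error.code alongside request_id, including pulling the status out of an upstream_<status> code. It assumes the parsed error object is a dict with type and code keys, which the dotted names suggest but this section does not spell out. How you obtain request_id is also not specified here, so it is passed in by the caller. The helper names upstream_status and log_fields are hypothetical.

```python
import re
from typing import Optional

# Matches pass-through codes such as "upstream_503", but not "upstream_unavailable".
_UPSTREAM_STATUS = re.compile(r"^upstream_(\d{3})$")


def upstream_status(code: str) -> Optional[int]:
    """Return the upstream HTTP status embedded in the code, or None if there is none."""
    match = _UPSTREAM_STATUS.match(code)
    return int(match.group(1)) if match else None


def log_fields(request_id: str, error: dict) -> dict:
    """Build a structured log record from an error object (assumed shape: type, code)."""
    code = error.get("code") or ""
    return {
        "request_id": request_id,
        "type": error.get("type"),
        "code": code,
        "upstream_status": upstream_status(code),
    }
```

Keeping the upstream status as its own field lets you tell gateway failures (backend_unavailable, upstream_unavailable) apart from errors the inference cluster returned itself.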
Retry-after
429 and 503 responses include a Retry-After header (seconds). Honor it. Exponential backoff on top of Retry-After is fine; ignoring it will get you rate-limited harder.
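
As a rough sketch, the wait before the next attempt can be computed as the Retry-After value plus an exponential backoff term. The retry_delay helper, its base and cap defaults, and the fallback to backoff alone when the header is missing or unparseable are all assumptions for illustration, not documented behavior.

```python
from typing import Mapping


def retry_delay(headers: Mapping[str, str], attempt: int,
                base: float = 0.5, cap: float = 30.0) -> float:
    """Seconds to wait before retry number `attempt` (0-based).

    The server's Retry-After value (in seconds) is always honored in full;
    a capped exponential backoff is added on top of it.
    """
    # Header names are case-insensitive, so compare in lowercase.
    raw = next((v for k, v in headers.items() if k.lower() == "retry-after"), None)
    try:
        retry_after = max(0.0, float(raw)) if raw is not None else 0.0
    except ValueError:
        # Unparseable header: fall back to backoff alone (an assumption).
        retry_after = 0.0
    backoff = min(cap, base * (2 ** attempt))
    return retry_after + backoff
```

With the defaults, a 429 carrying Retry-After: 5 waits 5.5 seconds on the first retry and 6 seconds on the second. Adding random jitter to the backoff term is a common refinement if many clients retry at once.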