Create chat completion

OpenAI-compatible chat completions endpoint. Supports both streaming and non-streaming responses.

Request body

model

string

required

Model slug from the catalog. Example: gpt-oss-120b.

messages

array

required

Conversation history. Each entry is { role: "system" | "user" | "assistant" | "tool", content: string }.

stream

boolean

default: false

When true, responses are streamed as Server-Sent Events. See the streaming guide.

temperature

number

default: 1

Sampling temperature, from 0 to 2. Lower values make output more deterministic.

top_p

number

default: 1

Nucleus sampling. Use either temperature or top_p, not both.

max_tokens

integer

Maximum number of output tokens. The cap depends on your tier; see pricing.

tools

array

Available tools the model may call. See function calling.

tool_choice

string | object

default:"auto"

"auto" (default), "none", or { type: "function", function: { name: ... } } to force.

response_format

object

{ type: "json_object" } or { type: "json_schema", json_schema: {...} }. See structured outputs.

stop

string | string[]

Up to 4 sequences at which generation stops.

frequency_penalty

number

default: 0

Range -2.0 to 2.0. Penalizes tokens in proportion to how often they have already appeared in the response so far.

presence_penalty

number

default: 0

Range -2.0 to 2.0. Penalizes any token that has already appeared in the response, regardless of how often.
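
The limits listed above can be checked client-side before a request is sent. The following is a minimal sketch, not an official client: `build_chat_request` is a hypothetical helper, and rejecting a request that sets both temperature and top_p is a choice made for this example rather than documented server behavior. It only builds the JSON body and makes no network call.

```python
import json

ALLOWED_ROLES = {"system", "user", "assistant", "tool"}


def build_chat_request(model, messages, *, temperature=None, top_p=None,
                       stop=None, frequency_penalty=None,
                       presence_penalty=None, **extra):
    """Build a request body, checking the limits from the parameter list."""
    for i, msg in enumerate(messages):
        if msg.get("role") not in ALLOWED_ROLES:
            raise ValueError(f"messages[{i}]: unknown role {msg.get('role')!r}")
        if not isinstance(msg.get("content"), str):
            raise ValueError(f"messages[{i}]: content must be a string")

    body = {"model": model, "messages": messages}

    # The reference says to use temperature or top_p, not both. Raising here
    # is an assumption of this sketch, not documented API behavior.
    if temperature is not None and top_p is not None:
        raise ValueError("set temperature or top_p, not both")
    if temperature is not None:
        if not 0 <= temperature <= 2:
            raise ValueError("temperature must be between 0 and 2")
        body["temperature"] = temperature
    if top_p is not None:
        body["top_p"] = top_p

    if stop is not None:
        # stop may be a single string or a list of strings.
        sequences = [stop] if isinstance(stop, str) else list(stop)
        if len(sequences) > 4:
            raise ValueError("at most 4 stop sequences are allowed")
        body["stop"] = stop

    for name, value in (("frequency_penalty", frequency_penalty),
                        ("presence_penalty", presence_penalty)):
        if value is not None:
            if not -2.0 <= value <= 2.0:
                raise ValueError(f"{name} must be between -2.0 and 2.0")
            body[name] = value

    # Other documented fields (stream, max_tokens, tools, tool_choice,
    # response_format) are passed through unchanged.
    body.update(extra)
    return body


request_body = build_chat_request(
    "gpt-oss-120b",
    [
        {"role": "system", "content": "Reply with a JSON object."},
        {"role": "user", "content": "Name one prime number."},
    ],
    temperature=0.2,
    stop=["\n\n"],
    max_tokens=64,
    response_format={"type": "json_object"},
)
print(json.dumps(request_body, indent=2))
```

Omitted optional fields are left out of the body entirely, so the server-side defaults apply.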

Response (non-streaming)

{
  "id": "req_...",
  "object": "chat.completion",
  "created": 1714521600,
  "model": "gpt-oss-120b",
  "choices": [
    {
      "index": 0,
      "message": { "role": "assistant", "content": "..." },
      "finish_reason": "stop"
    }
  ],
  "usage": {
    "prompt_tokens": 32,
    "completion_tokens": 71,
    "total_tokens": 103,
    "cached_input_tokens": 0
  }
}
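
As an illustration of reading this payload, the sketch below pulls out the assistant text, the finish reason, and the token counts. `summarize_completion` is a hypothetical helper name; the sample string is the example response above, placeholders included.

```python
import json

SAMPLE_RESPONSE = """
{
  "id": "req_...",
  "object": "chat.completion",
  "created": 1714521600,
  "model": "gpt-oss-120b",
  "choices": [
    {
      "index": 0,
      "message": { "role": "assistant", "content": "..." },
      "finish_reason": "stop"
    }
  ],
  "usage": {
    "prompt_tokens": 32,
    "completion_tokens": 71,
    "total_tokens": 103,
    "cached_input_tokens": 0
  }
}
"""


def summarize_completion(payload):
    """Extract the first choice and the usage counters from a response body."""
    data = json.loads(payload)
    choice = data["choices"][0]
    usage = data["usage"]
    return {
        "text": choice["message"]["content"],
        "finish_reason": choice["finish_reason"],
        "prompt_tokens": usage["prompt_tokens"],
        "completion_tokens": usage["completion_tokens"],
        "total_tokens": usage["total_tokens"],
        # Treated as optional here; that is an assumption of this sketch.
        "cached_input_tokens": usage.get("cached_input_tokens", 0),
    }


summary = summarize_completion(SAMPLE_RESPONSE)
print(summary["finish_reason"], summary["total_tokens"])
```

In the sample, total_tokens (103) equals prompt_tokens (32) plus completion_tokens (71).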

Response (streaming)

Content-Type: text/event-stream. Each event is data: <chat.completion.chunk JSON>. Stream ends with data: [DONE]. See the streaming guide.
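
The event framing described above can be decoded with a few lines of code. This is a rough sketch under stated assumptions: `iter_chunks` is a hypothetical helper, the sample lines are made up for illustration, and the `choices[0].delta.content` shape follows the OpenAI chunk convention, which this page does not spell out. It handles single-line `data:` events only.

```python
import json


def iter_chunks(lines):
    """Yield decoded chunk objects from an iterable of SSE text lines."""
    for raw in lines:
        line = raw.strip()
        if not line.startswith("data:"):
            continue  # skip blank separators and any non-data fields
        data = line[len("data:"):].strip()
        if data == "[DONE]":
            return  # end-of-stream sentinel; nothing after it is read
        yield json.loads(data)


# Hypothetical stream contents, for illustration only.
sample_stream = [
    'data: {"object": "chat.completion.chunk", '
    '"choices": [{"index": 0, "delta": {"content": "Hel"}}]}',
    "",
    'data: {"object": "chat.completion.chunk", '
    '"choices": [{"index": 0, "delta": {"content": "lo"}}]}',
    "",
    "data: [DONE]",
]

text = "".join(
    chunk["choices"][0]["delta"].get("content", "")
    for chunk in iter_chunks(sample_stream)
)
print(text)
```

With a real HTTP client, the same generator can be fed the decoded lines of the response body as they arrive.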

Headers on every response

x-request-id — opaque ID. Log it. We trace it through every layer.
x-tokens-used — billed total for this request (omitted on errors that didn’t consume tokens).
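
A small sketch of capturing these headers for logging follows. `request_metadata` is a hypothetical helper, the header values shown are invented, and parsing x-tokens-used as an integer is an assumption. Because x-tokens-used can be omitted on errors, the helper returns None for it in that case.

```python
def request_metadata(headers):
    """Return the request ID and billed token count from response headers."""
    # HTTP header names are case-insensitive, so normalize before lookup.
    lowered = {name.lower(): value for name, value in headers.items()}
    tokens = lowered.get("x-tokens-used")
    return {
        "request_id": lowered.get("x-request-id"),
        # Assumed to be an integer string; absent when no tokens were billed.
        "tokens_used": int(tokens) if tokens is not None else None,
    }


# Invented header values, for illustration only.
ok = request_metadata({"X-Request-Id": "req_example", "X-Tokens-Used": "103"})
failed = request_metadata({"x-request-id": "req_example"})
print(ok, failed)
```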
