# Create chat completion

> POST /v1/chat/completions

OpenAI-compatible chat completions endpoint. Streaming and non-streaming.

## Request body

* `model`: Model slug from the [catalog](/getting-started/models). Example: `gpt-oss-120b`.
* `messages`: Conversation history. Each entry is `{ role: "system" | "user" | "assistant" | "tool", content: string }`.
* `stream`: When `true`, responses are streamed as Server-Sent Events. See the [streaming guide](/guides/streaming).
* `temperature`: Sampling temperature, `0` to `2`. Lower values give more deterministic output.
* `top_p`: Nucleus sampling. Use either `temperature` or `top_p`, not both.
* `max_tokens`: Maximum output tokens. Capped per tier; see [pricing](/getting-started/pricing). Clamped to the model's `max_output_length` (visible on `/v1/models`).
* `max_completion_tokens`: Same semantics as `max_tokens`; OpenAI's canonical field for o1/o3 reasoning models. Either field is accepted; if both are sent, `max_completion_tokens` wins. Clamped to the model's `max_output_length`.
* `tools`: Available tools the model may call. See [function calling](/guides/function-calling).
* `tool_choice`: `"auto"` (default), `"none"`, or `{ type: "function", function: { name: ... } }` to force a specific function.
* `response_format`: `{ type: "json_object" }` or `{ type: "json_schema", json_schema: {...} }`. See [structured outputs](/guides/structured-outputs).
* `stop`: Up to 4 stop sequences.
* `frequency_penalty`: `-2.0` to `2.0`. Penalize tokens by their frequency in the response so far.
* `presence_penalty`: `-2.0` to `2.0`. Penalize tokens that have appeared at all.
* `reasoning_effort`: Reasoning effort hint for models that emit a chain of thought, accepted as the standard OpenAI top-level field. For DeepSeek-V4 the gateway mirrors this value into `chat_template_kwargs.reasoning_effort` and strips the top-level field before forwarding, because DeepSeek-V4's chat template only consumes the engine-specific form.
Without this mirror, a top-level `reasoning_effort` is silently a no-op on V4, and it can also interfere with SamplingParams when the value is outside OpenAI's enum. The OpenRouter-style alias `"xhigh"` is mapped to DeepSeek's `"max"` ("Think Max" mode). DeepSeek-V4 documents `"high"` and `"max"`; the other OpenAI tiers (`minimal | low | medium`) are forwarded literally but fall back to the encoder's default branch on this build: they don't return a 400, but the resulting reasoning depth may be indistinguishable from sending no hint. Disable thinking entirely with `chat_template_kwargs.enable_thinking: false`. If you set `chat_template_kwargs.reasoning_effort` explicitly, the gateway honors your value and leaves the top-level field alone.
* `chat_template_kwargs`: Engine-specific chat-template knobs forwarded verbatim to the upstream. On DeepSeek-V4:
  * `{ enable_thinking: false }` disables reasoning and routes output directly to `content`.
  * `{ drop_thinking: true }` drops prior assistant `reasoning_content` from the encoded prompt. Cogito defaults `drop_thinking` to `false` for reasoning-capable models, so multi-turn requests preserve prior reasoning traces unless you explicitly opt out.
  * `{ reasoning_effort: "high" | "max" }` selects the model's Think-High or Think-Max mode.

  The gateway force-injects `enable_thinking: false` when `response_format` is set, so that JSON-mode and structured outputs land in `content` rather than `reasoning_content`. Any caller-supplied value here always wins.

## Response (non-streaming)

```json theme={null}
{
  "id": "req_...",
  "object": "chat.completion",
  "created": 1714521600,
  "model": "gpt-oss-120b",
  "choices": [
    {
      "index": 0,
      "message": { "role": "assistant", "content": "..." },
      "finish_reason": "stop"
    }
  ],
  "usage": {
    "prompt_tokens": 32,
    "completion_tokens": 71,
    "total_tokens": 103,
    "prompt_tokens_details": { "cached_tokens": 0 },
    "completion_tokens_details": { "reasoning_tokens": 0 }
  }
}
```

## Response (streaming)

`Content-Type: text/event-stream`. Each event is a `data: ` line.
The stream ends with `data: [DONE]`. See the [streaming guide](/guides/streaming).

## Headers on every response

* `x-request-id`: opaque ID. Log it. We trace it through every layer.
* `x-tokens-used`: billed total for this request (omitted on errors that didn't consume tokens).
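
The request-body rules and the streaming format described on this page can be sketched client-side as follows, in Python with only the standard library. This is a minimal illustration, not an official client. The field names follow the OpenAI chat completions schema that this endpoint is stated to be compatible with. The helper names `build_payload` and `iter_sse_chunks` are hypothetical, and the range checks are a client-side convenience rather than documented server behavior. Sending the request over HTTP is left out.

```python
import json
from typing import Any, Dict, Iterable, Iterator, List, Optional


def build_payload(
    model: str,
    messages: List[Dict[str, str]],
    *,
    stream: bool = False,
    temperature: Optional[float] = None,
    top_p: Optional[float] = None,
    max_completion_tokens: Optional[int] = None,
    stop: Optional[List[str]] = None,
) -> Dict[str, Any]:
    """Assemble a request body (hypothetical helper, not part of any SDK)."""
    # Client-side checks mirroring the limits stated in the reference.
    if temperature is not None and top_p is not None:
        raise ValueError("set temperature or top_p, not both")
    if temperature is not None and not 0 <= temperature <= 2:
        raise ValueError("temperature must be between 0 and 2")
    if stop is not None and len(stop) > 4:
        raise ValueError("at most 4 stop sequences are allowed")

    payload: Dict[str, Any] = {"model": model, "messages": messages, "stream": stream}
    optional = {
        "temperature": temperature,
        "top_p": top_p,
        "max_completion_tokens": max_completion_tokens,
        "stop": stop,
    }
    # Send only the fields the caller actually set.
    payload.update({k: v for k, v in optional.items() if v is not None})
    return payload


def iter_sse_chunks(lines: Iterable[str]) -> Iterator[Dict[str, Any]]:
    """Yield the decoded JSON of each `data: ` line until `data: [DONE]`."""
    for raw in lines:
        line = raw.strip()
        if not line.startswith("data:"):
            continue  # skip blank separators and any non-data lines
        data = line[len("data:"):].strip()
        if data == "[DONE]":
            return  # end-of-stream sentinel; ignore anything after it
        yield json.loads(data)


if __name__ == "__main__":
    body = build_payload(
        "gpt-oss-120b",
        [{"role": "user", "content": "Hello"}],
        stream=True,
        temperature=0.2,
    )
    print(json.dumps(body))

    # Hand-written sample events; the chunk fields here are placeholders.
    sample = ['data: {"id": "req_1"}', "", "data: [DONE]", 'data: {"id": "ignored"}']
    print(list(iter_sse_chunks(sample)))
```

Feeding `iter_sse_chunks` the decoded lines of a streaming response body yields one parsed object per event and stops at the `[DONE]` sentinel. The fields inside each chunk are not specified on this page, so the sample above uses placeholder objects.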