Create completion
POST /v1/completions
Legacy OpenAI-compatible text completions endpoint for prompt-based clients and benchmark harnesses. Use chat completions for new applications.
This route is available for Dynamo-backed models. Placeholder catalog rows are rejected with model_not_found instead of returning a synthetic completion.
Request body
Prompt text or token ids to complete.
When true, responses are streamed as Server-Sent Events and end with data: [DONE].
Maximum output tokens. Clamped to the model’s max_output_length.
Sampling temperature, 0 to 2. Lower values are more deterministic.
Nucleus sampling. Use either temperature or top_p, not both.
Stop sequence or sequences.
-2.0 to 2.0. Penalize tokens by their frequency in the response so far.
-2.0 to 2.0. Penalize tokens that have appeared at all.
Engine extension used by fixed-length benchmark harnesses. When supported by the selected upstream, the model continues until max_tokens or another stop condition is reached.
Example
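A minimal sketch of building a request body that respects the ranges described above. The field names (prompt, stream, max_tokens, temperature, top_p, stop, frequency_penalty, presence_penalty) are assumed from the OpenAI legacy completions API that this endpoint is described as compatible with; the helper name build_completion_request and the model id "example-model" are hypothetical. Rejecting a body that sets both temperature and top_p is a client-side choice made here, not documented server behavior.

```python
import json
from typing import Optional, Sequence, Union


def build_completion_request(
    model: str,
    prompt: Union[str, Sequence[int]],
    max_tokens: Optional[int] = None,
    temperature: Optional[float] = None,
    top_p: Optional[float] = None,
    stop: Union[str, Sequence[str], None] = None,
    frequency_penalty: Optional[float] = None,
    presence_penalty: Optional[float] = None,
    stream: bool = False,
) -> dict:
    """Return a JSON-serializable body for POST /v1/completions.

    Field names are assumed to match the OpenAI legacy completions API.
    """
    # The reference advises using temperature or top_p, not both.
    if temperature is not None and top_p is not None:
        raise ValueError("set either temperature or top_p, not both")
    if temperature is not None and not 0 <= temperature <= 2:
        raise ValueError("temperature must be between 0 and 2")
    for name, value in (
        ("frequency_penalty", frequency_penalty),
        ("presence_penalty", presence_penalty),
    ):
        if value is not None and not -2.0 <= value <= 2.0:
            raise ValueError(f"{name} must be between -2.0 and 2.0")

    body = {
        "model": model,
        # A prompt may be text or a sequence of token ids.
        "prompt": prompt if isinstance(prompt, str) else list(prompt),
        "stream": stream,
    }
    optional = {
        "max_tokens": max_tokens,
        "temperature": temperature,
        "top_p": top_p,
        "stop": stop,
        "frequency_penalty": frequency_penalty,
        "presence_penalty": presence_penalty,
    }
    # Send only the fields the caller actually set.
    body.update({k: v for k, v in optional.items() if v is not None})
    return body


print(json.dumps(build_completion_request(
    "example-model", "Once upon a time", max_tokens=32, temperature=0.2,
)))
```

The printed JSON can be sent as the body of POST /v1/completions with any HTTP client. Because max_tokens is clamped server-side to the model's max_output_length, the sketch does not enforce an upper bound on it.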
Response
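For streamed responses, the body arrives as Server-Sent Events terminated by data: [DONE]. The sketch below shows one way a client might consume such a stream. It assumes each data payload is a JSON object and that generated text lives at choices[0].text, as in the OpenAI legacy completions API; neither detail is confirmed by this page. The function name iter_sse_events and the sample lines are hypothetical.

```python
import json
from typing import Iterable, Iterator


def iter_sse_events(lines: Iterable[str]) -> Iterator[dict]:
    """Yield decoded JSON payloads from SSE lines until data: [DONE]."""
    for raw in lines:
        line = raw.strip()
        if not line.startswith("data:"):
            continue  # skip blank separators, comments, and other SSE fields
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":
            return  # end-of-stream sentinel
        yield json.loads(payload)


# Hand-written sample lines standing in for a real streamed response.
sample = [
    'data: {"choices": [{"text": "Hel"}]}',
    "",
    'data: {"choices": [{"text": "lo"}]}',
    "",
    "data: [DONE]",
]
text = "".join(event["choices"][0]["text"] for event in iter_sse_events(sample))
print(text)
```

In a real client, the lines argument would be the line iterator of a streaming HTTP response rather than a list.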
Headers on every response
x-request-id: opaque ID. Log it; we trace it through every layer.
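One way to make sure the ID is always logged is a small helper called on every response. This is a sketch only: the logger name and the function name log_request_id are hypothetical, and the header lookup is case-insensitive because HTTP header names are not case-sensitive.

```python
import logging
from typing import Mapping, Optional

logger = logging.getLogger("completions_client")


def log_request_id(headers: Mapping[str, str]) -> Optional[str]:
    """Log and return the x-request-id header value, if present."""
    for name, value in headers.items():
        if name.lower() == "x-request-id":
            logger.info("x-request-id=%s", value)
            return value
    logger.warning("response carried no x-request-id header")
    return None
```

Calling log_request_id(response.headers) for both successful and failed requests keeps the ID available when reporting a problem.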