Create chat completion

POST /v1/chat/completions
curl --request POST \
  --url https://api.cogito.decart.ai/v1/chat/completions \
  --header 'Content-Type: application/json' \
  --data '
{
  "model": "gpt-oss-120b",
  "messages": [
    { "role": "user", "content": "Hello!" }
  ],
  "stream": false,
  "temperature": 0.7,
  "max_tokens": 256
}
'

OpenAI-compatible chat completions endpoint. Supports both streaming and non-streaming responses.

Request body

  • model (string, required): Model slug from the catalog. Example: gpt-oss-120b.
  • messages (array, required): Conversation history. Each entry is { role: "system" | "user" | "assistant" | "tool", content: string }.
  • stream (boolean, default false): When true, responses are streamed as Server-Sent Events. See the streaming guide.
  • temperature (number, default 1): Sampling temperature, 0 to 2. Lower → more deterministic.
  • top_p (number, default 1): Nucleus sampling. Use either temperature or top_p, not both.
  • max_tokens (integer): Maximum output tokens. Capped per tier; see pricing.
  • tools (array): Available tools the model may call. See function calling and the sketch after this list.
  • tool_choice (string | object, default "auto"): "auto" (default), "none", or { type: "function", function: { name: ... } } to force a specific function.
  • response_format (object): { type: "json_object" } or { type: "json_schema", json_schema: {...} }. See structured outputs.
  • stop (string | string[]): Up to 4 stop sequences.
  • frequency_penalty (number, default 0): -2.0 to 2.0. Penalize tokens by their frequency in the response so far.
  • presence_penalty (number, default 0): -2.0 to 2.0. Penalize tokens that have appeared at all.
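
A minimal non-streaming call, sketched in Python with the requests library. The Bearer scheme and the DECART_API_KEY variable name are assumptions, not confirmed by this page; substitute whatever credentials your account uses.

import os

import requests

API_URL = "https://api.cogito.decart.ai/v1/chat/completions"
# Bearer auth and the env var name are assumptions; adjust to your setup.
HEADERS = {
    "Authorization": f"Bearer {os.environ['DECART_API_KEY']}",
    "Content-Type": "application/json",
}

payload = {
    "model": "gpt-oss-120b",  # slug from the model catalog
    "messages": [
        {"role": "system", "content": "You are a concise assistant."},
        {"role": "user", "content": "Name three uses for a Raspberry Pi."},
    ],
    "temperature": 0.7,  # 0 to 2; leave top_p at its default
    "max_tokens": 256,
}

resp = requests.post(API_URL, headers=HEADERS, json=payload, timeout=60)
resp.raise_for_status()
body = resp.json()
print(body["choices"][0]["message"]["content"])
print(body["usage"])  # prompt_tokens / completion_tokens / total_tokens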
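
For tools and tool_choice, a sketch of a request body following the OpenAI function-calling shape this endpoint advertises compatibility with. The get_weather function and its parameter schema are hypothetical; send the payload with the same headers as above.

payload = {
    "model": "gpt-oss-120b",
    "messages": [{"role": "user", "content": "What's the weather in Lisbon?"}],
    "tools": [
        {
            "type": "function",
            "function": {
                "name": "get_weather",  # hypothetical tool
                "description": "Look up current weather for a city.",
                "parameters": {
                    "type": "object",
                    "properties": {"city": {"type": "string"}},
                    "required": ["city"],
                },
            },
        }
    ],
    # Force the model to call get_weather instead of answering directly.
    "tool_choice": {"type": "function", "function": {"name": "get_weather"}},
}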

Response (non-streaming)

{
  "id": "req_...",
  "object": "chat.completion",
  "created": 1714521600,
  "model": "gpt-oss-120b",
  "choices": [
    {
      "index": 0,
      "message": { "role": "assistant", "content": "..." },
      "finish_reason": "stop"
    }
  ],
  "usage": {
    "prompt_tokens": 32,
    "completion_tokens": 71,
    "total_tokens": 103,
    "cached_input_tokens": 0
  }
}

Response (streaming)

Content-Type: text/event-stream. Each event is data: <chat.completion.chunk JSON>. Stream ends with data: [DONE]. See the streaming guide.
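
A sketch of consuming the stream in Python, reusing API_URL and HEADERS from the request example above. It assumes each data: line carries OpenAI-style chat.completion.chunk JSON with incremental content deltas, as described here.

import json

import requests

payload = {
    "model": "gpt-oss-120b",
    "messages": [{"role": "user", "content": "Write a haiku about rivers."}],
    "stream": True,
}

with requests.post(API_URL, headers=HEADERS, json=payload, stream=True, timeout=60) as resp:
    resp.raise_for_status()
    for line in resp.iter_lines(decode_unicode=True):
        if not line or not line.startswith("data: "):
            continue  # skip blank SSE separators
        data = line[len("data: "):]
        if data == "[DONE]":
            break  # end-of-stream sentinel
        chunk = json.loads(data)
        # Each chunk carries a per-choice delta; content may be absent.
        delta = chunk["choices"][0]["delta"].get("content")
        if delta:
            print(delta, end="", flush=True)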

Headers on every response

  • x-request-id — opaque ID. Log it. We trace it through every layer.
  • x-tokens-used — billed total for this request (omitted on errors that didn’t consume tokens).
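
Both are ordinary response headers; continuing from the requests sketches above:

request_id = resp.headers.get("x-request-id")    # log this for tracing and support
tokens_used = resp.headers.get("x-tokens-used")  # None on errors that consumed no tokens
print(f"request={request_id} billed_tokens={tokens_used}")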