POST /v1/completions
Create completion
curl --request POST \
  --url https://api.cogito.decart.ai/v1/completions \
  --header "Authorization: Bearer $COGITO_API_KEY" \
  --header 'Content-Type: application/json' \
  --data '{
  "model": "<string>",
  "prompt": [
    "<string>"
  ],
  "stream": true,
  "max_tokens": 123,
  "temperature": 1,
  "top_p": 1,
  "stop": [
    "<string>"
  ],
  "frequency_penalty": 0,
  "presence_penalty": 0,
  "ignore_eos": true
}'
Legacy OpenAI-compatible text completions endpoint for prompt-based clients and benchmark harnesses. Use chat completions for new applications. This route is available for Dynamo-backed models. Placeholder catalog rows are rejected with model_not_found instead of returning a synthetic completion.

Request body

model
string
required
Model id from the catalog. Example: moonshotai/kimi-k2.6:appliedcompute.
prompt
string | string[] | integer[] | integer[][]
required
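Prompt to complete, given as text (a single string or an array of strings) or as token ids (a single array of integers or an array of such arrays).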
stream
boolean
default:"false"
When true, responses are streamed as Server-Sent Events and end with data: [DONE].
max_tokens
integer
Maximum output tokens. Clamped to the model’s max_output_length.
temperature
number
default:"1"
Sampling temperature, 0 to 2. Lower values are more deterministic.
top_p
number
default:"1"
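Nucleus sampling: the model samples only from the smallest set of tokens whose cumulative probability reaches top_p. Adjust either temperature or top_p, not both.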
stop
string | string[]
Stop sequence or sequences.
frequency_penalty
number
default:"0"
-2.0 to 2.0. Positive values penalize tokens in proportion to how often they have already appeared in the response so far.
presence_penalty
number
default:"0"
-2.0 to 2.0. Positive values penalize any token that has already appeared in the response, regardless of how often.
ignore_eos
boolean
Engine extension used by fixed-length benchmark harnesses. When supported by the selected upstream, the model keeps generating past its end-of-sequence token until max_tokens or another stop condition is reached.
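
The constraints listed above can be checked on the client before a request is sent. The following is a minimal sketch in Python, not an official client: build_payload and its helpers are hypothetical names, the numeric ranges are copied from the parameter descriptions, and rejecting empty prompt lists is an assumption of this sketch rather than documented server behavior.

```python
# Ranges copied from the parameter descriptions above. The helper names are
# hypothetical and are not part of any official client library.
_RANGES = {
    "temperature": (0.0, 2.0),
    "frequency_penalty": (-2.0, 2.0),
    "presence_penalty": (-2.0, 2.0),
}


def _is_token_ids(value):
    """True for a non-empty list of integers (booleans are excluded)."""
    return (
        isinstance(value, list)
        and len(value) > 0
        and all(isinstance(t, int) and not isinstance(t, bool) for t in value)
    )


def _valid_prompt(prompt):
    """Accept string | string[] | integer[] | integer[][]."""
    if isinstance(prompt, str):
        return True
    # Assumption of this sketch: empty lists are treated as invalid.
    if not isinstance(prompt, list) or not prompt:
        return False
    if all(isinstance(p, str) for p in prompt):
        return True
    if _is_token_ids(prompt):
        return True
    return all(_is_token_ids(p) for p in prompt)


def build_payload(model, prompt, **options):
    """Return a JSON-serializable request body, raising ValueError on bad input."""
    if not isinstance(model, str) or not model:
        raise ValueError("model is required")
    if not _valid_prompt(prompt):
        raise ValueError(
            "prompt must be str, list[str], list[int], or list[list[int]]"
        )
    for name, (low, high) in _RANGES.items():
        if name in options and not low <= options[name] <= high:
            raise ValueError(f"{name} must be between {low} and {high}")
    payload = {"model": model, "prompt": prompt}
    payload.update(options)  # other fields (stream, stop, ...) pass through
    return payload
```

For example, build_payload("moonshotai/kimi-k2.6:appliedcompute", "Hello", max_tokens=32) returns a dict ready for JSON encoding, while temperature=3 raises ValueError. max_tokens is deliberately not range-checked here, since the server clamps it to the model's max_output_length.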

Example

curl https://api.cogito.decart.ai/v1/completions \
  -H "Authorization: Bearer $COGITO_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "moonshotai/kimi-k2.6:appliedcompute",
    "prompt": "Write one sentence about fast inference:",
    "max_tokens": 32
  }'
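
The same request can be made from Python using only the standard library. This is a sketch that mirrors the curl command above: the URL and headers are taken from it, while make_request and create_completion are hypothetical helper names, and the 60-second timeout is an arbitrary choice.

```python
import json
import urllib.request

API_URL = "https://api.cogito.decart.ai/v1/completions"


def make_request(payload, api_key):
    """Build (but do not send) the POST request for a completion."""
    return urllib.request.Request(
        API_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def create_completion(payload, api_key, timeout=60):
    """Send the request and return the decoded JSON body.

    urllib raises urllib.error.HTTPError for non-2xx responses.
    """
    request = make_request(payload, api_key)
    with urllib.request.urlopen(request, timeout=timeout) as resp:
        return json.load(resp)
```

Calling create_completion({"model": "moonshotai/kimi-k2.6:appliedcompute", "prompt": "Write one sentence about fast inference:", "max_tokens": 32}, os.environ["COGITO_API_KEY"]) (after import os) should be equivalent to the curl example. Building the request in a separate function keeps it inspectable without network access.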

Response

{
  "id": "cmpl_...",
  "object": "text_completion",
  "created": 1714521600,
  "model": "moonshotai/kimi-k2.6",
  "choices": [
    {
      "index": 0,
      "text": " Fast inference keeps agent loops tight.",
      "finish_reason": "stop"
    }
  ],
  "usage": {
    "prompt_tokens": 8,
    "completion_tokens": 8,
    "total_tokens": 16
  }
}
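
A client reads this body differently depending on the stream flag. The sketch below shows both paths under stated assumptions: completion_texts, usage_is_consistent, and iter_stream_text are hypothetical helpers; the usage check reflects the arithmetic of the sample above; and the shape of a streamed chunk (an object with choices[].text, like the non-streaming body) is assumed from the OpenAI-compatible convention, since this page documents only the data: [DONE] terminator.

```python
import json


def completion_texts(body):
    """Return the text of each choice in a non-streaming response, by index."""
    choices = sorted(body["choices"], key=lambda c: c["index"])
    return [c["text"] for c in choices]


def usage_is_consistent(body):
    """Check prompt_tokens + completion_tokens == total_tokens, as in the sample."""
    usage = body["usage"]
    return usage["prompt_tokens"] + usage["completion_tokens"] == usage["total_tokens"]


def iter_stream_text(lines):
    """Yield text fragments from SSE lines until the data: [DONE] sentinel.

    `lines` is any iterable of str or bytes lines, for example an open HTTP
    response object. The chunk layout is an assumption (see the lead-in).
    """
    for raw in lines:
        line = raw.decode("utf-8") if isinstance(raw, bytes) else raw
        line = line.strip()
        if not line.startswith("data:"):
            continue  # skip blank separators, comments, and other SSE fields
        data = line[len("data:"):].strip()
        if data == "[DONE]":
            return  # end of stream; anything after the sentinel is ignored
        chunk = json.loads(data)
        for choice in chunk.get("choices", []):
            yield choice.get("text", "")
```

For the sample response, completion_texts returns a one-element list containing " Fast inference keeps agent loops tight." and usage_is_consistent returns True (8 + 8 == 16). With stream set to true, "".join(iter_stream_text(resp)) reassembles the text of a single-prompt request.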

Headers on every response

  • x-request-id — opaque ID. Log it. We trace it through every layer.
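
One way to follow the logging advice above is to pull the header out of every response and record it next to the status. A minimal sketch, assuming headers is any mapping-like object with an items() method (a plain dict, or the headers attribute of a urllib response); the logger name and both function names are hypothetical.

```python
import logging

logger = logging.getLogger("cogito.client")  # hypothetical logger name


def request_id_from(headers):
    """Return the x-request-id value, matching the header name case-insensitively."""
    for name, value in headers.items():
        if name.lower() == "x-request-id":
            return value
    return None


def log_request_id(headers, status):
    """Log the request ID alongside the HTTP status and return the ID."""
    request_id = request_id_from(headers)
    logger.info("completion status=%s x-request-id=%s", status, request_id)
    return request_id
```

Logging the ID on failures as well as successes means it is already on hand when a request needs to be traced.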