When you need the model to return JSON that exactly matches a schema, use response_format. Cogito enforces the schema with grammar-constrained decoding: the model cannot emit a token that would violate the schema, so you don’t need to add “respond only in JSON” to the system prompt.

JSON schema mode

# Assumes `client` is an already-configured chat-completions client (setup not shown here).
response = client.chat.completions.create(
    model="gpt-oss-120b",
    messages=[
        {"role": "user", "content": "Extract entities from: 'Apple acquired Beats in 2014 for $3B.'"}
    ],
    response_format={
        "type": "json_schema",
        "json_schema": {
            "name": "entity_extraction",
            "strict": True,
            "schema": {
                "type": "object",
                "properties": {
                    "company": {"type": "string"},
                    "acquired": {"type": "string"},
                    "year": {"type": "integer"},
                    "amount_usd": {"type": "string"}
                },
                "required": ["company", "acquired", "year", "amount_usd"],
                "additionalProperties": False
            }
        }
    }
)

import json

# The message content is a JSON string; parse it into a Python dict.
data = json.loads(response.choices[0].message.content)
# {"company": "Apple", "acquired": "Beats", "year": 2014, "amount_usd": "$3B"}

What “strict” guarantees

With strict: true (written as True in the Python call above), every property listed in required is present, every value matches its declared type, and no extra fields are emitted. On a completed response, json.loads() does not need a try/except to guard against malformed output.
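To make the three guarantees concrete, here is a minimal sketch that checks them by hand against the example schema above. The helper `check_strict` is hypothetical (it is not part of any Cogito SDK) and only handles flat objects with `string` and `integer` properties. With strict mode on, a check like this should be redundant, but it can be useful as an assertion in a test suite.

```python
import json

# Hypothetical helper for illustration; covers only flat "string"/"integer" properties.
JSON_TYPES = {"string": str, "integer": int}

def check_strict(payload: str, schema: dict) -> dict:
    """Parse payload and verify required keys, declared types, and no extra keys."""
    data = json.loads(payload)
    props = schema["properties"]
    missing = [k for k in schema["required"] if k not in data]
    extra = [k for k in data if k not in props]
    wrong = [
        k for k, v in data.items()
        if k in props and (
            isinstance(v, bool)  # bool subclasses int in Python but is neither type here
            or not isinstance(v, JSON_TYPES[props[k]["type"]])
        )
    ]
    if missing or extra or wrong:
        raise ValueError(f"missing={missing} extra={extra} wrong_type={wrong}")
    return data

schema = {
    "type": "object",
    "properties": {
        "company": {"type": "string"},
        "acquired": {"type": "string"},
        "year": {"type": "integer"},
        "amount_usd": {"type": "string"},
    },
    "required": ["company", "acquired", "year", "amount_usd"],
    "additionalProperties": False,
}

sample = '{"company": "Apple", "acquired": "Beats", "year": 2014, "amount_usd": "$3B"}'
result = check_strict(sample, schema)
```

A payload with a missing key, an extra key, or a mistyped value raises `ValueError` listing the offending fields.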

json_object mode

For “give me any valid JSON” without a schema:
response = client.chat.completions.create(
    model="gpt-oss-120b",
    messages=[...],
    response_format={"type": "json_object"},
)
This mode is less strict: the structure is up to the model, but the result is still guaranteed to be valid JSON.
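Because the shape is not fixed in this mode, code that reads the result may want to check keys and types itself. The following is a minimal sketch of that idea; `get_field` is a hypothetical helper and the reply string is made up for illustration.

```python
import json
from typing import Any

def get_field(payload: str, key: str, expected: type, default: Any = None) -> Any:
    """Return payload[key] if it exists and has the expected type, else default."""
    data = json.loads(payload)  # parsing should succeed; the shape is unverified
    if not isinstance(data, dict):
        return default
    value = data.get(key, default)
    return value if isinstance(value, expected) else default

# Made-up reply in which the model chose to emit the year as a string.
reply = '{"company": "Apple", "year": "2014"}'
company = get_field(reply, "company", str)
year = get_field(reply, "year", int, default=0)
```

Here `company` is read as-is, while `year` falls back to the default because the value is not an integer. If the fields matter, JSON schema mode avoids the need for this kind of defensive code.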

Streaming structured outputs

Both modes work with stream: true. Chunks arrive as token-level deltas and assemble into valid JSON once the stream completes. Don’t parse partial chunks (json.loads in Python, JSON.parse in JavaScript); only the fully assembled string is guaranteed valid.
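The accumulate-then-parse pattern can be sketched as follows. This example uses a simulated list of deltas rather than a live stream, and `assemble_json` is a hypothetical helper name.

```python
import json
from typing import Iterable, Optional

def assemble_json(deltas: Iterable[Optional[str]]) -> dict:
    """Concatenate streamed content deltas and parse once, after the stream ends."""
    parts = []
    for delta in deltas:
        if delta:  # skip chunks that carry no content
            parts.append(delta)
    return json.loads("".join(parts))

# Simulated deltas standing in for the content of each streamed chunk.
simulated = ['{"comp', 'any": "App', 'le", "year"', ': 20', '14}', None]
parsed = assemble_json(simulated)
```

With a real stream, and assuming the chunks follow the common chat-completions shape, the argument would be something like `(chunk.choices[0].delta.content for chunk in stream)`. That attribute path is an assumption, not something this page confirms.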

Errors

If the schema is malformed (for example, recursive without a depth limit, or containing mutually exclusive constraints), Cogito returns a 400 with code: "invalid_response_format" before generation starts. We don’t waste your tokens on impossible schemas.
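A caller might want to tell this error apart from other 400s so that it can fix the schema instead of retrying. The sketch below works on a status code and an already-decoded error body. The body layout is an assumption: the page only states the status and the code value, so the helper (a hypothetical name) looks for `code` both at the top level and under an `error` object.

```python
from collections.abc import Mapping
from typing import Any

def is_invalid_response_format(status: int, body: Mapping[str, Any]) -> bool:
    """True if an HTTP error looks like a rejected response_format schema."""
    if status != 400:
        return False
    # Assumption: the code sits at the top level or inside an "error" object.
    error = body.get("error")
    nested = error.get("code") if isinstance(error, Mapping) else None
    return "invalid_response_format" in (body.get("code"), nested)

flat = is_invalid_response_format(400, {"code": "invalid_response_format"})
nested = is_invalid_response_format(400, {"error": {"code": "invalid_response_format"}})
other = is_invalid_response_format(400, {"code": "context_length_exceeded"})
```

Since the request is rejected before generation starts, retrying with the same schema would presumably fail the same way; the useful response is to correct the schema.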