
# Streaming

> SSE chunks, cancellation, error recovery.

Set `stream: true` to receive an OpenAI-compatible Server-Sent Events stream of chat completion chunks.

## Chunk shape

Each `data:` line carries one JSON object. It has the same structure as a non-streaming response, except that each choice carries a `delta` instead of a `message`:

```json
{
  "id": "req_...",
  "object": "chat.completion.chunk",
  "created": 1714521600,
  "model": "gpt-oss-120b",
  "choices": [
    {
      "index": 0,
      "delta": { "content": "tok" },
      "finish_reason": null
    }
  ]
}
```

The first chunk includes `delta.role: "assistant"`. The final chunk has `finish_reason: "stop"` (or `length`, `tool_calls`, `content_filter`). The stream ends with the literal sentinel:

```
data: [DONE]
```
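
To make the framing concrete, here is a minimal sketch of folding a complete SSE body into one message. The helper name `assembleSse` and the `ChatChunk` and `Assembled` types are illustrative assumptions, not part of any SDK, and the function assumes the whole body is already available as text. In practice the OpenAI SDK does this parsing for you and handles chunks that arrive split across reads.

```typescript
// Minimal shape of a streamed chunk, limited to the fields shown above.
interface ChatChunk {
  choices: {
    index: number;
    delta: { role?: string; content?: string };
    finish_reason: string | null;
  }[];
}

interface Assembled {
  content: string;
  finishReason: string | null;
  done: boolean; // true only if the [DONE] sentinel was seen
}

// Hypothetical helper: fold the raw text of an SSE body into one message.
function assembleSse(raw: string): Assembled {
  const result: Assembled = { content: "", finishReason: null, done: false };
  for (const line of raw.split(/\r?\n/)) {
    if (!line.startsWith("data:")) continue; // skip blank separator lines
    const payload = line.slice("data:".length).trim();
    if (payload === "") continue;
    if (payload === "[DONE]") {
      result.done = true;
      break;
    }
    const chunk = JSON.parse(payload) as ChatChunk;
    const choice = chunk.choices[0];
    if (!choice) continue;
    // Each delta holds only the new text, so concatenate in arrival order.
    result.content += choice.delta.content ?? "";
    if (choice.finish_reason !== null) result.finishReason = choice.finish_reason;
  }
  return result;
}
```

If `done` is still `false` after the body ends, the stream was cut off before the sentinel and the assembled content should be treated as partial.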

## Cancellation

Closing the connection cancels generation server-side and stops billing immediately. You are not charged for tokens that had not yet arrived when you cancelled.

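```typescript
// `client` is an OpenAI SDK client already configured for Cogito.
const controller = new AbortController();

const stream = await client.chat.completions.create(
  {
    model: "gpt-oss-120b",
    messages: [{ role: "user", content: "Hello" }], // placeholder prompt
    stream: true,
  },
  { signal: controller.signal },
);

// Cancel after 3 seconds
const timer = setTimeout(() => controller.abort(), 3000);

// Consume chunks as they arrive
for await (const chunk of stream) {
  process.stdout.write(chunk.choices[0]?.delta?.content ?? "");
}
clearTimeout(timer);
```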

## Error recovery

Network blips during a stream show up as a truncated SSE stream: the connection drops without a `[DONE]` sentinel. The OpenAI SDK throws on this. On retry, replay your full message history; chunks already received are not retained server-side.
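
One way to structure such a retry is sketched below. The names `completeWithRetry`, `OpenStream`, and `Message` are hypothetical; `openStream` stands in for whatever function opens a new streamed completion and yields content deltas, and it is assumed to throw when the connection drops before `[DONE]`.

```typescript
type Message = { role: string; content: string };

// Hypothetical signature: opens a fresh stream for the given history and
// yields content deltas; it throws if the connection drops before [DONE].
type OpenStream = (messages: Message[]) => AsyncIterable<string>;

async function completeWithRetry(
  openStream: OpenStream,
  messages: Message[],
  maxAttempts = 3,
): Promise<string> {
  let lastError: unknown;
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    let text = ""; // partial output from a failed attempt is discarded
    try {
      // Always resend the full history; the server keeps no partial state.
      for await (const delta of openStream(messages)) text += delta;
      return text;
    } catch (err) {
      lastError = err;
    }
  }
  throw lastError;
}
```

This sketch retries on any error and without delay. A production version would likely add backoff between attempts and skip the retry for deliberate cancellations and for errors that a retry cannot fix, such as an invalid request.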

Every response (streamed or not) includes an `x-request-id` header. Log it. When you open a support ticket, we can trace the request through the entire stack from that single ID.
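
A minimal sketch of capturing the ID for logging follows. The helper `requestLogLine`, the `HeaderReader` interface, and the log format are illustrative assumptions; the interface is shaped so that a Fetch API `Headers` object satisfies it.

```typescript
// Structural type satisfied by the Fetch API `Headers` class.
interface HeaderReader {
  get(name: string): string | null;
}

// Hypothetical helper: pull the request ID out of the response headers and
// build a log line that can be pasted into a support ticket.
function requestLogLine(headers: HeaderReader, status: number): string {
  const requestId = headers.get("x-request-id") ?? "unknown";
  return `status=${status} x-request-id=${requestId}`;
}
```

With the OpenAI Node SDK, the raw response (and so its headers) should be reachable by calling `.withResponse()` on the request promise; check this against the SDK version you use.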

## Backpressure

Cogito's gateway respects HTTP/2 backpressure. If you're consuming the stream slowly (e.g. piping to a slow UI), the server pauses generation rather than buffering. This means a lazy reader doesn't get billed for tokens it never reads.
