Set stream: true to receive an OpenAI-compatible Server-Sent Events (SSE) stream of chat completion chunks.
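
A minimal end-to-end sketch, assuming the openai Node SDK and a hypothetical base URL for the Cogito gateway (substitute your real endpoint and key):

import OpenAI from "openai";

const client = new OpenAI({
  baseURL: "https://api.cogito.decart.ai/v1", // hypothetical endpoint
  apiKey: process.env.COGITO_API_KEY,
});

const stream = await client.chat.completions.create({
  model: "gpt-oss-120b",
  messages: [{ role: "user", content: "Hello!" }],
  stream: true,
});

// The SDK parses each SSE data: line into a typed chunk object.
for await (const chunk of stream) {
  process.stdout.write(chunk.choices[0]?.delta?.content ?? "");
}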

Chunk shape

Each data: line carries one JSON object with the same structure as a non-streaming response, but with delta instead of message:
{
  "id": "req_...",
  "object": "chat.completion.chunk",
  "created": 1714521600,
  "model": "gpt-oss-120b",
  "choices": [
    {
      "index": 0,
      "delta": { "content": "tok" },
      "finish_reason": null
    }
  ]
}
The first chunk includes delta.role: "assistant". The final chunk has finish_reason: "stop" (or length, tool_calls, content_filter). The stream ends with the literal sentinel:
data: [DONE]
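
When you iterate the stream through the SDK, the [DONE] sentinel is consumed for you and the loop simply ends. A sketch of accumulating deltas back into a full message, assuming stream comes from a stream: true call as above:

let text = "";
let finishReason: string | null = null;

for await (const chunk of stream) {
  const choice = chunk.choices[0];
  text += choice.delta?.content ?? "";
  // Only the final chunk carries a non-null finish_reason.
  finishReason = choice.finish_reason ?? finishReason;
}

// finishReason distinguishes a natural stop ("stop") from truncation ("length").
console.log(finishReason, text);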

Cancellation

Closing the connection cancels generation server-side and stops billing immediately; you aren't charged for tokens that hadn't arrived when you cancelled.
const controller = new AbortController();

// Pass the signal in the request options; `client` is the same
// OpenAI-compatible SDK instance as above.
const stream = await client.chat.completions.create(
  { model: "gpt-oss-120b", messages: [...], stream: true },
  { signal: controller.signal },
);

// Cancel after 3 seconds; the server stops generating as soon as the
// connection closes.
setTimeout(() => controller.abort(), 3000);
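
When the abort fires mid-stream, the SDK surfaces it as an error from the iteration. If a deliberate cutoff is expected behavior, catch it rather than treating it as a failure. A sketch, assuming the openai Node SDK's APIUserAbortError:

try {
  for await (const chunk of stream) {
    process.stdout.write(chunk.choices[0]?.delta?.content ?? "");
  }
} catch (err) {
  if (err instanceof OpenAI.APIUserAbortError) {
    // We aborted on purpose; billing stopped when the connection closed.
  } else {
    throw err; // anything else is a real failure
  }
}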

Error recovery

Network blips during a stream show up as truncated SSE: the connection drops without a [DONE] sentinel, and the OpenAI SDK throws. On retry, replay your full message history; chunks already received are not retained server-side. Every response (streamed or not) includes an x-request-id header. Log it: when you open a support ticket, we trace through the entire stack from that single ID.
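
Putting those rules together, a retry sketch (streamWithRetry is a hypothetical helper; withResponse() is how the SDK exposes response headers alongside the parsed body):

async function streamWithRetry(
  messages: OpenAI.Chat.ChatCompletionMessageParam[],
  maxAttempts = 3,
): Promise<string> {
  for (let attempt = 1; ; attempt++) {
    // Replay the full history on every attempt; the server retains nothing.
    const { data: stream, response } = await client.chat.completions
      .create({ model: "gpt-oss-120b", messages, stream: true })
      .withResponse();

    // Log the request id up front so even a truncated stream is traceable.
    console.log("x-request-id:", response.headers.get("x-request-id"));

    let text = "";
    try {
      for await (const chunk of stream) {
        text += chunk.choices[0]?.delta?.content ?? "";
      }
      return text; // loop completed, so [DONE] arrived
    } catch (err) {
      if (attempt >= maxAttempts) throw err;
      // Discard the partial text; the next attempt regenerates from scratch.
    }
  }
}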

Backpressure

Cogito’s gateway respects HTTP/2 backpressure. If you’re consuming the stream slowly (e.g. piping to a slow UI), the server pauses generation rather than buffering. This means a lazy reader doesn’t get billed for tokens it never reads.
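
You can observe this from the client side with a deliberately slow consumer: chunks simply arrive later, and no extra tokens are billed. A sketch (render is a hypothetical UI hook):

const sleep = (ms: number) => new Promise((resolve) => setTimeout(resolve, ms));

for await (const chunk of stream) {
  render(chunk.choices[0]?.delta?.content ?? ""); // hypothetical slow UI
  // Reading slowly propagates backpressure upstream; the gateway pauses
  // generation instead of buffering unread tokens.
  await sleep(250);
}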