Streaming Guide

Streaming sends tokens as they're generated rather than returning the full completion at the end, which reduces the time until the first token reaches your application.

Enabling Streaming

Set stream: true in your request:

{
  "model": "gemma-4-26b",
  "messages": [{"role": "user", "content": "Tell me a story"}],
  "stream": true
}

Response Format

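The response is a stream of Server-Sent Events (SSE). Each event is a data: line carrying one JSON chunk, events are separated by a blank line (omitted below for compactness), and the stream ends with data: [DONE]: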

data: {"id":"chatcmpl-abc","object":"chat.completion.chunk","choices":[{"delta":{"role":"assistant"},"finish_reason":null}]}
data: {"id":"chatcmpl-abc","object":"chat.completion.chunk","choices":[{"delta":{"content":"Once"},"finish_reason":null}]}
data: {"id":"chatcmpl-abc","object":"chat.completion.chunk","choices":[{"delta":{"content":" upon"},"finish_reason":null}]}
data: {"id":"chatcmpl-abc","object":"chat.completion.chunk","choices":[{"delta":{"content":" a"},"finish_reason":null}]}
data: {"id":"chatcmpl-abc","object":"chat.completion.chunk","choices":[{"delta":{"content":" time"},"finish_reason":null}]}
data: {"id":"chatcmpl-abc","object":"chat.completion.chunk","choices":[{"delta":{},"finish_reason":"stop"}]}
data: [DONE]
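If you consume the stream without an SDK, you have to parse these lines yourself. The sketch below is a minimal Python parser for the format above; iter_sse_payloads and stream_text are hypothetical helper names, and it assumes each event carries a single data: line, as in the example. It skips blank separator lines and SSE comment lines (lines starting with :), strips the data: prefix, and stops at [DONE].

```python
import json
from typing import Iterable, Iterator


def iter_sse_payloads(lines: Iterable[str]) -> Iterator[dict]:
    """Yield the decoded JSON payload of each data: line until data: [DONE].

    Assumes every event carries exactly one data: line, as in the
    example stream; multi-line events are not handled.
    """
    for raw in lines:
        line = raw.rstrip("\r\n")
        if not line or line.startswith(":"):
            continue  # blank event separator or SSE comment
        if not line.startswith("data:"):
            continue  # other SSE fields (event:, id:, retry:) are ignored here
        data = line[len("data:"):]
        if data.startswith(" "):
            data = data[1:]  # SSE strips one optional space after the colon
        if data == "[DONE]":
            return
        yield json.loads(data)


def stream_text(lines: Iterable[str]) -> str:
    """Concatenate every delta.content fragment in the stream."""
    parts = []
    for payload in iter_sse_payloads(lines):
        for choice in payload.get("choices", []):
            content = choice.get("delta", {}).get("content")
            if content:
                parts.append(content)
    return "".join(parts)


example = [
    'data: {"choices":[{"delta":{"role":"assistant"},"finish_reason":null}]}',
    "",
    'data: {"choices":[{"delta":{"content":"Once"},"finish_reason":null}]}',
    "",
    'data: {"choices":[{"delta":{"content":" upon a time"},"finish_reason":null}]}',
    "",
    'data: {"choices":[{"delta":{},"finish_reason":"stop"}]}',
    "",
    "data: [DONE]",
]
print(stream_text(example))  # Once upon a time
```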

Python Example

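from openai import OpenAI

client = OpenAI(
    api_key="your-api-key",
    base_url="https://cryptgpt.co/v1",
)

stream = client.chat.completions.create(
    model="gemma-4-26b",
    messages=[{"role": "user", "content": "Tell me a story"}],
    stream=True,
)

# Print each content fragment as soon as it arrives.
for chunk in stream:
    # The role-only first chunk and the final chunk carry no content;
    # the choices check is a defensive guard against chunks without choices.
    if chunk.choices and chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="", flush=True)
print()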
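The SDK example prints fragments and discards them. If you also need the complete reply afterwards, you can accumulate the deltas as they arrive. This sketch works on chunks as plain dicts in the JSON shape shown under Response Format (an assumption; SDK chunk objects use attribute access instead), tracks only the first choice, and accumulate is a hypothetical helper name.

```python
from typing import Iterable, Optional


def accumulate(chunks: Iterable[dict]) -> dict:
    """Rebuild the final assistant message from streamed chunk dicts.

    Only choices[0] is tracked: role arrives in the first delta,
    content in later deltas, and finish_reason in the last chunk.
    """
    role: Optional[str] = None
    parts = []
    finish_reason: Optional[str] = None
    for chunk in chunks:
        choices = chunk.get("choices") or []
        if not choices:
            continue  # skip chunks that carry no choices
        choice = choices[0]
        delta = choice.get("delta") or {}
        role = delta.get("role", role)
        if delta.get("content"):
            parts.append(delta["content"])
        if choice.get("finish_reason") is not None:
            finish_reason = choice["finish_reason"]
    return {"role": role, "content": "".join(parts), "finish_reason": finish_reason}


chunks = [
    {"choices": [{"delta": {"role": "assistant"}, "finish_reason": None}]},
    {"choices": [{"delta": {"content": "Once"}, "finish_reason": None}]},
    {"choices": [{"delta": {"content": " upon a time"}, "finish_reason": None}]},
    {"choices": [{"delta": {}, "finish_reason": "stop"}]},
]
print(accumulate(chunks))
# {'role': 'assistant', 'content': 'Once upon a time', 'finish_reason': 'stop'}
```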

Node.js Example

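// Top-level await requires an ES module (e.g. a .mjs file or "type": "module" in package.json).
import OpenAI from 'openai';

const client = new OpenAI({
  apiKey: 'your-api-key',
  baseURL: 'https://cryptgpt.co/v1',
});

const stream = await client.chat.completions.create({
  model: 'gemma-4-26b',
  messages: [{ role: 'user', content: 'Tell me a story' }],
  stream: true,
});

// Write each content fragment as it arrives; role-only and final chunks have no content.
for await (const chunk of stream) {
  process.stdout.write(chunk.choices[0]?.delta?.content || '');
}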

curl Example

Pass -N (--no-buffer) so curl prints each event as soon as it arrives instead of buffering the output:

curl -N -X POST https://cryptgpt.co/v1/chat/completions \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gemma-4-26b",
    "messages": [{"role": "user", "content": "Tell me a story"}],
    "stream": true
  }'
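For environments without an SDK or curl, here is a rough standard-library Python equivalent of the curl request. build_request and stream_lines are hypothetical names; the endpoint, headers, and body come from the example above. The timeout passed to urlopen applies to each blocking socket operation, so a stream that stalls for longer than that raises an exception rather than hanging.

```python
import json
import urllib.request
from typing import Iterator

API_URL = "https://cryptgpt.co/v1/chat/completions"


def build_request(api_key: str, prompt: str, model: str = "gemma-4-26b") -> urllib.request.Request:
    """Build the same POST request as the curl example, with streaming enabled."""
    body = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "stream": True,
    }
    return urllib.request.Request(
        API_URL,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def stream_lines(req: urllib.request.Request, timeout: float = 60.0) -> Iterator[str]:
    """Yield decoded SSE lines as they arrive.

    urlopen raises urllib.error.HTTPError for non-2xx responses, and a read
    that blocks longer than `timeout` seconds raises a timeout error.
    """
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        for raw in resp:  # the response object iterates line by line
            yield raw.decode("utf-8")
```

To print raw events as they arrive, iterate over stream_lines(build_request("YOUR_API_KEY", "Tell me a story")) and print each line with end="".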

Handling Errors

If an error occurs after the stream has started, you'll receive an event whose payload is an error object instead of a chunk:

data: {"error": {"message": "Rate limit exceeded", "type": "rate_limit_error"}}
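Because an error event has an error object and no choices, check for it before reading chunk fields. A minimal sketch, assuming payloads have already been decoded from the data: lines; StreamError and check_payload are hypothetical names, and deciding whether to retry is left to the caller.

```python
import json


class StreamError(Exception):
    """Raised when the stream delivers an error event instead of a chunk."""

    def __init__(self, message: str, error_type: str):
        super().__init__(f"{error_type}: {message}")
        self.message = message
        self.error_type = error_type


def check_payload(payload: dict) -> dict:
    """Return the payload unchanged, or raise StreamError if it carries an error object."""
    error = payload.get("error")
    if error is not None:
        raise StreamError(error.get("message", "unknown error"), error.get("type", "unknown"))
    return payload


event = 'data: {"error": {"message": "Rate limit exceeded", "type": "rate_limit_error"}}'
try:
    check_payload(json.loads(event[len("data: "):]))
except StreamError as exc:
    print(exc.error_type)  # rate_limit_error
```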

Cancellation

To cancel a streaming request, close the HTTP connection. The server will stop generating tokens.
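Closing the connection is also how you stop early on your side, for example once you have enough output. The sketch below stops after max_chars characters and always closes the source in a finally block, even if iteration raises. It assumes a hypothetical response object that yields text fragments and has a close() method (an http.client.HTTPResponse, for instance, has close()); FakeStream exists only for the demonstration.

```python
def read_with_limit(response, max_chars: int) -> str:
    """Collect text fragments until max_chars is reached, then close the response.

    `response` is any iterable of text fragments that also has a close()
    method (hypothetical interface for this sketch).
    """
    collected = ""
    try:
        for fragment in response:
            collected += fragment
            if len(collected) >= max_chars:
                break  # stop consuming; the finally block closes the connection
    finally:
        response.close()
    return collected[:max_chars]


class FakeStream:
    """Stand-in for a streaming response, used only to demonstrate the helper."""

    def __init__(self, fragments):
        self._fragments = fragments
        self.closed = False

    def __iter__(self):
        return iter(self._fragments)

    def close(self):
        self.closed = True


demo = FakeStream(["Once", " upon", " a", " time"])
print(read_with_limit(demo, 9), demo.closed)  # Once upon True
```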

Best Practices

  1. Buffer partial tokens for display
  2. Handle connection drops gracefully
  3. Implement a timeout for stalled streams
  4. Use backpressure to avoid overwhelming slow clients
  5. Parse the SSE format correctly: strip the data: prefix, skip blank lines, and stop at data: [DONE]
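Parsing SSE correctly over a raw connection also means coping with reads that end mid-line, or mid-character for multi-byte UTF-8 text. A minimal sketch of a line buffer, assuming \n or \r\n line endings (a bare \r, which SSE also permits, is not handled); SSELineBuffer is a hypothetical name. Feed it each chunk of bytes as read from the socket and parse only the complete lines it returns.

```python
import codecs
from typing import List


class SSELineBuffer:
    """Reassemble complete lines from arbitrary byte chunks read off the socket.

    Bytes are decoded incrementally so a multi-byte character split across
    reads is not corrupted, and a trailing partial line is held back until
    the next chunk arrives.
    """

    def __init__(self) -> None:
        self._decoder = codecs.getincrementaldecoder("utf-8")()
        self._pending = ""

    def feed(self, chunk: bytes) -> List[str]:
        """Add a chunk and return the lines it completed, without line endings."""
        self._pending += self._decoder.decode(chunk)
        *complete, self._pending = self._pending.split("\n")
        return [line.rstrip("\r") for line in complete]


buf = SSELineBuffer()
print(buf.feed(b'data: {"choices":[{"delta":{"content":"Onc'))  # []
print(buf.feed(b'e"}}]}\n\ndata: [DO'))
```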