Streaming Guide

Streaming lets you receive tokens as they are generated instead of waiting for the complete response, which reduces the time until the first token arrives.

Enabling Streaming

Set stream: true in your request:

{
  "model": "gemma-4-26b",
  "messages": [{"role": "user", "content": "Tell me a story"}],
  "stream": true
}

Response Format

The response is a series of Server-Sent Events (SSE):

data: {"id":"chatcmpl-abc","object":"chat.completion.chunk","choices":[{"delta":{"role":"assistant"},"finish_reason":null}]}

data: {"id":"chatcmpl-abc","object":"chat.completion.chunk","choices":[{"delta":{"content":"Once"},"finish_reason":null}]}

data: {"id":"chatcmpl-abc","object":"chat.completion.chunk","choices":[{"delta":{"content":" upon"},"finish_reason":null}]}

data: {"id":"chatcmpl-abc","object":"chat.completion.chunk","choices":[{"delta":{"content":" a"},"finish_reason":null}]}

data: {"id":"chatcmpl-abc","object":"chat.completion.chunk","choices":[{"delta":{"content":" time"},"finish_reason":null}]}

data: {"id":"chatcmpl-abc","object":"chat.completion.chunk","choices":[{"delta":{},"finish_reason":"stop"}]}

data: [DONE]
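
If you are not using an SDK, you need to decode this format yourself. The following is a minimal sketch of that step, not a complete SSE parser: it assumes each event carries a single data: line, as in the sample above, and the function name iter_content is hypothetical.

```python
import json
from typing import Iterable, Iterator


def iter_content(lines: Iterable[str]) -> Iterator[str]:
    """Yield content fragments from raw SSE lines until the [DONE] sentinel."""
    for raw in lines:
        line = raw.strip()
        # Skip blank separator lines and anything that is not a data field.
        if not line.startswith("data:"):
            continue
        payload = line[len("data:"):].strip()
        # The literal [DONE] marks the end of the stream and is not JSON.
        if payload == "[DONE]":
            return
        chunk = json.loads(payload)
        for choice in chunk.get("choices", []):
            # The first chunk carries only the role and the last an empty delta.
            content = choice.get("delta", {}).get("content")
            if content:
                yield content


sample = [
    'data: {"choices":[{"delta":{"role":"assistant"},"finish_reason":null}]}',
    "",
    'data: {"choices":[{"delta":{"content":"Once"},"finish_reason":null}]}',
    "",
    'data: {"choices":[{"delta":{"content":" upon"},"finish_reason":null}]}',
    "",
    'data: {"choices":[{"delta":{},"finish_reason":"stop"}]}',
    "",
    "data: [DONE]",
]
print("".join(iter_content(sample)))  # prints "Once upon"
```

In practice the lines would come from an HTTP response read incrementally rather than from a list.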

Python Example

from openai import OpenAI

client = OpenAI(
    api_key="your-api-key",
    base_url="https://cryptgpt.co/v1"
)

stream = client.chat.completions.create(
    model="gemma-4-26b",
    messages=[{"role": "user", "content": "Tell me a story"}],
    stream=True
)

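# Print each content fragment as soon as it arrives.
for chunk in stream:
    # Skip chunks with no choices or no content (e.g. the role-only first chunk).
    if chunk.choices and chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="", flush=True)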

Node.js Example

import OpenAI from 'openai';

const client = new OpenAI({
  apiKey: 'your-api-key',
  baseURL: 'https://cryptgpt.co/v1',
});

const stream = await client.chat.completions.create({
  model: 'gemma-4-26b',
  messages: [{ role: 'user', content: 'Tell me a story' }],
  stream: true,
});

for await (const chunk of stream) {
  process.stdout.write(chunk.choices[0]?.delta?.content || '');
}

curl Example

curl -N -X POST https://cryptgpt.co/v1/chat/completions \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gemma-4-26b",
    "messages": [{"role": "user", "content": "Tell me a story"}],
    "stream": true
  }'

Handling Errors

If an error occurs during streaming, the stream delivers a data event containing an error object instead of a completion chunk:

data: {"error": {"message": "Rate limit exceeded", "type": "rate_limit_error"}}
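
When parsing the stream yourself, one way to handle this is to check each decoded payload for an error key before treating it as a chunk. This is a sketch under that assumption; StreamError and parse_event are hypothetical names, and the payload shape is taken from the example above.

```python
import json


class StreamError(Exception):
    """Hypothetical exception for an error object received mid-stream."""

    def __init__(self, message: str, error_type: str) -> None:
        super().__init__(message)
        self.error_type = error_type


def parse_event(payload: str) -> dict:
    """Decode one data payload, raising StreamError if it is an error object."""
    data = json.loads(payload)
    error = data.get("error") if isinstance(data, dict) else None
    if error:
        raise StreamError(
            error.get("message", "unknown error"),
            error.get("type", "unknown"),
        )
    return data


try:
    parse_event('{"error": {"message": "Rate limit exceeded", "type": "rate_limit_error"}}')
except StreamError as exc:
    print(f"{exc.error_type}: {exc}")  # prints "rate_limit_error: Rate limit exceeded"
```

Raising an exception keeps the error path separate from normal chunk handling, so the caller can decide whether to retry or surface the message.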

Cancellation

To cancel a streaming request, close the HTTP connection. The server will stop generating tokens.
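
As an illustration, the sketch below stops reading once enough text has arrived and then closes the stream. It assumes the stream object exposes a close() method that releases the connection (Python generators do; check this for your HTTP client or SDK). The helper read_until and the stand-in fake_stream are hypothetical.

```python
from typing import Iterable


def read_until(stream: Iterable[str], max_chars: int) -> str:
    """Collect text fragments, closing the stream once max_chars is reached."""
    collected = []
    total = 0
    try:
        for fragment in stream:
            collected.append(fragment)
            total += len(fragment)
            if total >= max_chars:
                break  # stop reading early
    finally:
        # Assumption: close() tears down the underlying connection.
        close = getattr(stream, "close", None)
        if callable(close):
            close()
    return "".join(collected)


def fake_stream():
    """Stand-in for a real streaming response."""
    for word in ["Once", " upon", " a", " time"]:
        yield word


print(read_until(fake_stream(), max_chars=8))  # prints "Once upon"
```

Closing in a finally block means the connection is also released if processing a fragment raises an exception.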

Best Practices

Buffer partial tokens for display
Handle connection drops gracefully
Implement a timeout for stalled streams
Use backpressure to avoid overwhelming slow clients
Parse the SSE format correctly (handle the data: prefix and the [DONE] sentinel)
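
The stalled-stream timeout can be sketched as a wrapper that reads the stream on a background thread and raises if no item arrives within a given interval. This is one possible approach, not the only one; with_stall_timeout is a hypothetical helper, and many HTTP clients offer a read timeout that achieves the same effect more simply.

```python
import queue
import threading
from typing import Iterable, Iterator


def with_stall_timeout(source: Iterable, timeout: float) -> Iterator:
    """Yield items from source; raise TimeoutError if none arrives in time."""
    events: queue.Queue = queue.Queue()

    def pump() -> None:
        # Runs on a background thread and forwards everything to the queue.
        try:
            for item in source:
                events.put(("item", item))
            events.put(("end", None))
        except Exception as exc:
            events.put(("error", exc))

    threading.Thread(target=pump, daemon=True).start()

    while True:
        try:
            kind, value = events.get(timeout=timeout)
        except queue.Empty:
            raise TimeoutError(f"no data received for {timeout} seconds") from None
        if kind == "end":
            return
        if kind == "error":
            raise value
        yield value


print(list(with_stall_timeout(iter(["Once", " upon"]), timeout=1.0)))
```

After a timeout the background thread may still be blocked on the source, so you should also close the underlying connection.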