Streaming Guide
Streaming lets you receive tokens as they are generated instead of waiting for the complete response, so the first output reaches the client sooner.
Enabling Streaming
Set stream: true in your request:
{
  "model": "gemma-4-26b",
  "messages": [{"role": "user", "content": "Tell me a story"}],
  "stream": true
}

Response Format
The response is a series of Server-Sent Events (SSE):
data: {"id":"chatcmpl-abc","object":"chat.completion.chunk","choices":[{"delta":{"role":"assistant"},"finish_reason":null}]}
data: {"id":"chatcmpl-abc","object":"chat.completion.chunk","choices":[{"delta":{"content":"Once"},"finish_reason":null}]}
data: {"id":"chatcmpl-abc","object":"chat.completion.chunk","choices":[{"delta":{"content":" upon"},"finish_reason":null}]}
data: {"id":"chatcmpl-abc","object":"chat.completion.chunk","choices":[{"delta":{"content":" a"},"finish_reason":null}]}
data: {"id":"chatcmpl-abc","object":"chat.completion.chunk","choices":[{"delta":{"content":" time"},"finish_reason":null}]}
data: {"id":"chatcmpl-abc","object":"chat.completion.chunk","choices":[{"delta":{},"finish_reason":"stop"}]}
data: [DONE]

Python Example
from openai import OpenAI

client = OpenAI(
    api_key="your-api-key",
    base_url="https://cryptgpt.co/v1",
)

stream = client.chat.completions.create(
    model="gemma-4-26b",
    messages=[{"role": "user", "content": "Tell me a story"}],
    stream=True,
)

# Print each content delta as it arrives; role-only and final chunks carry no content.
for chunk in stream:
    if chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="", flush=True)

Node.js Example
import OpenAI from 'openai';

const client = new OpenAI({
  apiKey: 'your-api-key',
  baseURL: 'https://cryptgpt.co/v1',
});

const stream = await client.chat.completions.create({
  model: 'gemma-4-26b',
  messages: [{ role: 'user', content: 'Tell me a story' }],
  stream: true,
});

// Write each content delta to stdout; chunks without content write nothing.
for await (const chunk of stream) {
  process.stdout.write(chunk.choices[0]?.delta?.content || '');
}

curl Example
curl -X POST https://cryptgpt.co/v1/chat/completions \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gemma-4-26b",
    "messages": [{"role": "user", "content": "Tell me a story"}],
    "stream": true
  }'

Handling Errors
If an error occurs during streaming, you’ll receive an error event:
data: {"error": {"message": "Rate limit exceeded", "type": "rate_limit_error"}}

Cancellation
To cancel a streaming request, close the HTTP connection. The server will stop generating tokens.
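As a rough illustration of client-side cancellation, the sketch below stops reading once a character budget is reached and then closes the stream. The helper name read_with_limit and its parameters are hypothetical. It assumes the stream object exposes a close() method that releases the HTTP connection, as the openai Python SDK's stream object does; if yours does not, close the underlying connection directly.

```python
from typing import Any, Callable, Iterable, List, Optional


def read_with_limit(
    stream: Iterable[Any],
    extract_text: Callable[[Any], Optional[str]],
    max_chars: int,
) -> str:
    """Collect streamed text and cancel once max_chars have been received.

    Hypothetical helper: `extract_text` pulls the text out of one chunk and
    may return None for chunks that carry no content.
    """
    parts: List[str] = []
    received = 0
    try:
        for chunk in stream:
            text = extract_text(chunk) or ""
            parts.append(text)
            received += len(text)
            if received >= max_chars:
                break  # stop reading; the finally block closes the stream
    finally:
        # Assumption: closing the stream closes the HTTP connection,
        # which is what tells the server to stop generating.
        close = getattr(stream, "close", None)
        if callable(close):
            close()
    return "".join(parts)[:max_chars]
```

With the stream from the Python example, extract_text could be something like lambda c: c.choices[0].delta.content. Closing in a finally block means the connection is also released if the consumer raises partway through.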
Best Practices
- Buffer partial tokens for display
- Handle connection drops gracefully
- Implement a timeout for stalled streams
- Use backpressure to avoid overwhelming slow clients
- Parse the SSE format correctly (handle the data: prefix)
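If you read the raw stream yourself rather than through an SDK, the parsing step might look like the minimal sketch below. It strips the data: prefix, stops at the [DONE] sentinel, and raises on an error payload shaped like the one under Handling Errors. The names iter_content and StreamError are hypothetical, and the sketch assumes each event fits on a single data: line, as in the sample response above; a full SSE parser would also join multi-line data fields.

```python
import json
from typing import Iterable, Iterator


class StreamError(Exception):
    """Raised when the stream carries an error payload (hypothetical name)."""


def iter_content(lines: Iterable[str]) -> Iterator[str]:
    """Yield content deltas from SSE lines until the [DONE] sentinel."""
    for raw in lines:
        line = raw.strip()
        if not line.startswith("data:"):
            continue  # skip blank separators, comments, and other SSE fields
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":
            return  # end of stream; ignore anything after the sentinel
        event = json.loads(payload)
        if "error" in event:
            raise StreamError(event["error"].get("message", "unknown error"))
        for choice in event.get("choices", []):
            content = choice.get("delta", {}).get("content")
            if content:
                yield content
```

For example, "".join(iter_content(response_lines)) would reassemble the full text, where response_lines is any iterable of decoded lines from the HTTP response body.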