Stream
The stream parameter is a Boolean setting that controls whether the model returns results incrementally, chunk by chunk, or waits until the entire response is ready before returning it. Setting stream=True allows you to receive partial responses in real time, which is useful when you want to give users faster feedback while the completion is still being generated.
Here’s an example demonstrating how to use the streaming feature. It assumes an OpenAI-compatible client configured with your TaskingAI credentials; the placeholder values below should be replaced with your own:
from openai import OpenAI

# Assumes an OpenAI client pointed at your TaskingAI endpoint; replace the placeholders with your own values.
client = OpenAI(api_key="YOUR_TASKINGAI_API_KEY", base_url="YOUR_TASKINGAI_API_BASE_URL")

response = client.chat.completions.create(
    model="YOUR_TASKINGAI_MODEL_ID",
    messages=[
        {"role": "user", "content": "Hello, how are you?"},
    ],
    stream=True,
)

# The response is received in chunks, allowing you to process it incrementally.
for chunk in response:
    # Each chunk carries an incremental delta; print the new text as it arrives.
    if chunk.choices and chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="", flush=True)
Using streaming can improve the responsiveness of your application, especially for longer responses where waiting for the entire completion might introduce noticeable latency.
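If you also want the complete text once the stream finishes, you can print each delta as it arrives and accumulate the pieces. A minimal sketch, reusing the client defined above (a stream can only be iterated once, so a fresh request is created here; the prompt is illustrative):

# Re-create the stream, then collect the deltas while printing them live.
response = client.chat.completions.create(
    model="YOUR_TASKINGAI_MODEL_ID",
    messages=[
        {"role": "user", "content": "Hello, how are you?"},
    ],
    stream=True,
)

collected = []
for chunk in response:
    if chunk.choices and chunk.choices[0].delta.content:
        delta = chunk.choices[0].delta.content
        print(delta, end="", flush=True)  # live output for the user
        collected.append(delta)           # keep the pieces for later use

full_text = "".join(collected)            # complete completion text once the stream ends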
Other Configurations
Other configurations for chat completions include options such as temperature, max_tokens, top_p, and stop, among others. However, these parameters are currently not supported in the OpenAI-compatible API; invocations instead fall back to the pre-set configuration of the assistant application.
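In practice, this means a request only needs the model and the messages; any sampling behavior is whatever the assistant is configured with in TaskingAI. A minimal non-streaming sketch, again reusing the client from above (the prompt text is illustrative):

# No temperature, max_tokens, top_p, or stop here: per the note above, these are
# governed by the assistant's pre-set configuration, not by the request.
response = client.chat.completions.create(
    model="YOUR_TASKINGAI_MODEL_ID",
    messages=[
        {"role": "user", "content": "Give me a one-sentence status summary."},
    ],
)
print(response.choices[0].message.content)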