文章目录

一、介绍：
二、不足之处：
三、示例代码：
- 0. 引入库：
- 1. 不使用stream的后台代码（官方示例）：
- 2. 使用stream的后台代码（官方示例）：
- 3. 实际生产环境的示例后台代码（Sanic）：
- 4. 实际生产环境的示例前端代码（SSE）：
四、总结：

一、介绍：

默认情况下，当请求OpenAI的API时，整个响应将在生成后一次性发送回来。如果需要的响应比较复杂，就会需要很长时间来等待响应。

为了更快地获得响应，可以在请求API时选择“流式传输”。

要使用流式传输，调用API时设置stream=True。这将返回一个对象，以data-only server-sent events流式返回响应。需要从delta字段而不是message字段中提取块。

二、不足之处：

由于是逐步传输的，所以使用流式传输会提高对内容审核的难度。
流式响应的另一个小缺点是，响应不再包括“usage”字段，所以无法立即得知使用了多少令牌。

三、示例代码：

0. 引入库：

import openai  # for OpenAI API calls
import time  # for measuring time duration of API calls

1. 不使用stream的后台代码（官方示例）：

在不使用stream的ChatCompletions API调用中，响应被计算出来后一次性地返回。

# Example of an OpenAI ChatCompletion request
# https://platform.openai.com/docs/guides/chat

# record the time before the request is sent
start_time = time.time()

# send a ChatCompletion request to count to 100
response = openai.ChatCompletion.create(
    model='gpt-3.5-turbo',
    messages=[
        {'role': 'user', 'content': 'Count to 100, with a comma between each number and no newlines. E.g., 1, 2, 3, ...'}
    ],
    temperature=0,
)

# calculate the time it took to receive the response
response_time = time.time() - start_time

# print the time delay and text received
print(f"Full response received {response_time:.2f} seconds after request")
print(f"Full response received:\n{response}")

reply = response['choices'][0]['message']
print(f"Extracted reply: \n{reply}")

reply_content = response['choices'][0]['message']['content']
print(f"Extracted content: \n{reply_content}")

2. 使用stream的后台代码（官方示例）：

在流式 API 调用中，响应通过事件流以分块的方式递增送回。在Python中，可以用一个for循环来迭代这些事件。

# Example of an OpenAI ChatCompletion request with stream=True
# https://platform.openai.com/docs/guides/chat

# record the time before the request is sent
start_time = time.time()

# send a ChatCompletion request to count to 100
response = openai.ChatCompletion.create(
    model='gpt-3.5-turbo',
    messages=[
        {'role': 'user', 'content': 'Count to 100, with a comma between each number and no newlines. E.g., 1, 2, 3, ...'}
    ],
    temperature=0,
    stream=True  # again, we set stream=True
)

# create variables to collect the stream of chunks
collected_chunks = []
collected_messages = []
# iterate through the stream of events
for chunk in response:
    chunk_time = time.time() - start_time  # calculate the time delay of the chunk
    collected_chunks.append(chunk)  # save the event response
    chunk_message = chunk['choices'][0]['delta']  # extract the message
    collected_messages.append(chunk_message)  # save the message
    print(f"Message received {chunk_time:.2f} seconds after request: {chunk_message}")  # print the delay and text

# print the time delay and text received
print(f"Full response received {chunk_time:.2f} seconds after request")
full_reply_content = ''.join([m.get('content', '') for m in collected_messages])
print(f"Full conversation received: {full_reply_content}")

3. 实际生产环境的示例后台代码（Sanic）：

	api_response = openai.ChatCompletion.create(
	        model='gpt-3.5-turbo',
	        messages=request.ctx.session['message'],
	        temperature=1,
	        stream=True
	)

    response = await request.respond(content_type='text/event-stream')
    answer = ''
    for part in api_response:
        finish_reason = part["choices"][0]["finish_reason"]
        if "content" in part["choices"][0]["delta"]:
            content = part["choices"][0]["delta"]["content"]
            answer += content
            content = content.replace('\n', '<br>')  # 将换行替换为<br>，用于前端显示。
            await response.send(f"data: {content}\n\n")  # 使用 Server-Sent Events (SSE) 格式发送数据
        elif finish_reason:
            await response.send(f"event: end\ndata: {answer}\n\n")

4. 实际生产环境的示例前端代码（SSE）：

const source = new EventSource("/api/chat_stream/?message=" + message);
const messageDiv = document.createElement('div');
let res_msg = ''
chatContainer.appendChild(messageDiv);
source.addEventListener('message', (e) => {
    messageDiv.innerHTML += e.data;
    chatContainer.scrollTop = chatContainer.scrollHeight;
    res_msg += e.data;
})
// 收到所有的回复后，重新整理格式，这里用的是marked.js，也可以用mark-it。
source.addEventListener('end', (e) => {
    source.close()
    res_msg = res_msg.replaceAll('<br>', '\r\n')
    messageDiv.innerHTML = marked.parse(res_msg)
    chatContainer.scrollTop = chatContainer.scrollHeight;
})
source.onerror = function (e) {
    console.log(e)
}

四、总结：

本文介绍了OpenAI API中流式传输（stream=True）的实现方法，以及如何使用该功能来处理大型文本数据。

此外，还列出了使用流式传输的优缺点，以及示例代码，包括不使用流式传输的代码和使用流式传输的代码。其中使用流式传输的代码示例演示了如何通过事件流以分块的方式递增接收响应，并在 Python 中使用 for 循环迭代这些事件，最终获得完整的响应。同时，本文还提供了实际生产环境的示例代码，包括后台代码和前端代码。

文章出处登录后可见！

已经登录？立即刷新

ChatGPT流式传输（stream=True)的实现-OpenAI API 流式传输