流式输出

更新时间:2025-02-14 07:39:42

大模型收到输入后并不是一次性生成最终结果,而是逐步地生成中间结果,最终结果由中间结果拼接而成。用流式输出的方式调用大模型 API,能够实时返回中间结果,减少用户的阅读等待时间,并降低请求的超时风险。

概述

相比非流式输出,流式输出可以实时地将中间结果返回,您可以在模型进行输出的同时进行阅读,减少等待模型回复的时间;并且当输出内容较长时,有效降低请求超时的风险。

请求超时错误的报错信息:Request timed out, please try again later. 或 Response timeout。

以下为流式输出与非流式输出的效果对比。

⏱️ 等待时间:3 秒
已关闭流式输出
以上组件仅供您参考,并未真实发送请求。

如何使用

前提条件

您需要已获取API Key配置API Key到环境变量。如果通过OpenAI SDKDashScope SDK进行调用,还需要安装SDK

开始使用

OpenAI兼容
DashScope

通过 OpenAI 兼容方式开启流式输出十分简便,只需在请求参数中设置 stream 为 true 即可,详情请参见以下代码。

Python
Node.js
curl
流式输出默认不会返回本次请求所使用的 Token 量。您可以通过设置stream_options参数为{"include_usage": True},使最后一个返回的 chunk 包含本次请求所使用的 Token 量。
我们未来会将stream_options参数默认设置为{"include_usage": True},这会使最后一个chunkchoices字段成为空列表,我们建议您参考本文的最新代码,在业务代码中加上if chunk.choices:的判断条件。
import os
from openai import OpenAI

client = OpenAI(
    # 若没有配置环境变量,请用百炼API Key将下行替换为:api_key="sk-xxx",
    api_key=os.getenv("DASHSCOPE_API_KEY"),
    base_url="https://dashscope.aliyuncs.com/compatible-mode/v1"
)

completion = client.chat.completions.create(
    model="qwen-plus",  # 此处以qwen-plus为例,您可按需更换模型名称。模型列表:https://help.aliyun.com/zh/model-studio/getting-started/models
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "你是谁?"}
    ],
    stream=True
)

full_content = ""
print("流式输出内容为:")
for chunk in completion:
    # 如果stream_options.include_usage为True,则最后一个chunk的choices字段为空列表,需要跳过(可以通过chunk.usage获取 Token 使用量)
    if chunk.choices:
        full_content += chunk.choices[0].delta.content
        print(chunk.choices[0].delta.content)
print(f"完整内容为:{full_content}")

返回结果

流式输出内容为:

我是来自
阿里
云
的大规模语言模型
,我叫通
义千问。

完整内容为:我是来自阿里云的大规模语言模型,我叫通义千问。
流式输出默认不会返回本次请求所使用的 Token 量。您可以通过设置stream_options参数为{"include_usage": true},使最后一个返回的 chunk 包含本次请求所使用的 Token 量。
我们未来会将stream_options参数默认设置为{"include_usage": true},这会使最后一个chunkchoices字段成为空数组。我们建议您参考本文的最新代码,在业务代码中加上if (Array.isArray(chunk.choices) && chunk.choices.length > 0)的判断条件。
import OpenAI from "openai";

const openai = new OpenAI(
    {
        // 若没有配置环境变量,请用百炼API Key将下行替换为:apiKey: "sk-xxx",
        apiKey: process.env.DASHSCOPE_API_KEY,
        baseURL: "https://dashscope.aliyuncs.com/compatible-mode/v1"
    }
);

const completion = await openai.chat.completions.create({
    model: "qwen-plus", // 此处以qwen-plus为例,您可按需更换模型名称。模型列表:https://help.aliyun.com/zh/model-studio/getting-started/models
    messages: [
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "你是谁?"}
    ],
    stream: true,
    stream_options: {
        include_usage: true
    }
});

let fullContent = "";
console.log("流式输出内容为:")
for await (const chunk of completion) {
    // 如果stream_options.include_usage为true,则最后一个chunk的choices字段为空数组,需要跳过(可以通过chunk.usage获取 Token 使用量)
    if (Array.isArray(chunk.choices) && chunk.choices.length > 0) {
        fullContent = fullContent + chunk.choices[0].delta.content;
        console.log(chunk.choices[0].delta.content);
    }
}
console.log("\n完整内容为:")
console.log(fullContent);

返回结果

流式输出内容为:

我是
来自
阿里
云
的大规模语言模型
,我叫通
义千问。


完整内容为:
我是来自阿里云的大规模语言模型,我叫通义千问。
curl -X POST https://dashscope.aliyuncs.com/compatible-mode/v1/chat/completions \
-H "Authorization: Bearer $DASHSCOPE_API_KEY" \
-H "Content-Type: application/json" \
-d '{
    "model": "qwen-plus",
    "messages": [
        {
            "role": "system",
            "content": "You are a helpful assistant."
        },
        {
            "role": "user", 
            "content": "你是谁?"
        }
    ],
    "stream":true,
    "stream_options":{
        "include_usage":true
    }
}'

返回结果

data: {"choices":[{"delta":{"content":"","role":"assistant"},"index":0,"logprobs":null,"finish_reason":null}],"object":"chat.completion.chunk","usage":null,"created":1726132850,"system_fingerprint":null,"model":"qwen-max","id":"chatcmpl-428b414f-fdd4-94c6-b179-8f576ad653a8"}

data: {"choices":[{"finish_reason":null,"delta":{"content":"我是"},"index":0,"logprobs":null}],"object":"chat.completion.chunk","usage":null,"created":1726132850,"system_fingerprint":null,"model":"qwen-max","id":"chatcmpl-428b414f-fdd4-94c6-b179-8f576ad653a8"}

data: {"choices":[{"delta":{"content":"来自"},"finish_reason":null,"index":0,"logprobs":null}],"object":"chat.completion.chunk","usage":null,"created":1726132850,"system_fingerprint":null,"model":"qwen-max","id":"chatcmpl-428b414f-fdd4-94c6-b179-8f576ad653a8"}

data: {"choices":[{"delta":{"content":"阿里"},"finish_reason":null,"index":0,"logprobs":null}],"object":"chat.completion.chunk","usage":null,"created":1726132850,"system_fingerprint":null,"model":"qwen-max","id":"chatcmpl-428b414f-fdd4-94c6-b179-8f576ad653a8"}

data: {"choices":[{"delta":{"content":"云的超大规模语言"},"finish_reason":null,"index":0,"logprobs":null}],"object":"chat.completion.chunk","usage":null,"created":1726132850,"system_fingerprint":null,"model":"qwen-max","id":"chatcmpl-428b414f-fdd4-94c6-b179-8f576ad653a8"}

data: {"choices":[{"delta":{"content":"模型,我叫通义千问"},"finish_reason":null,"index":0,"logprobs":null}],"object":"chat.completion.chunk","usage":null,"created":1726132850,"system_fingerprint":null,"model":"qwen-max","id":"chatcmpl-428b414f-fdd4-94c6-b179-8f576ad653a8"}

data: {"choices":[{"delta":{"content":"。"},"finish_reason":null,"index":0,"logprobs":null}],"object":"chat.completion.chunk","usage":null,"created":1726132850,"system_fingerprint":null,"model":"qwen-max","id":"chatcmpl-428b414f-fdd4-94c6-b179-8f576ad653a8"}

data: {"choices":[{"finish_reason":"stop","delta":{"content":""},"index":0,"logprobs":null}],"object":"chat.completion.chunk","usage":null,"created":1726132850,"system_fingerprint":null,"model":"qwen-max","id":"chatcmpl-428b414f-fdd4-94c6-b179-8f576ad653a8"}

data: {"choices":[],"object":"chat.completion.chunk","usage":{"prompt_tokens":22,"completion_tokens":17,"total_tokens":39},"created":1726132850,"system_fingerprint":null,"model":"qwen-max","id":"chatcmpl-428b414f-fdd4-94c6-b179-8f576ad653a8"}

data: [DONE]

您可以通过DashScope SDKHTTP方式使用流式输出的功能。Python SDK 需要设置 stream参数 为 True;Java SDK 需要通过streamCall接口使用;HTTP 方式需要设置 Header 参数 X-DashScope-SSEenable

流式输出的内容默认是非增量式(即每次返回的内容都包含之前生成的内容),如果您需要使用增量式流式输出,请设置incremental_output(Java 为incrementalOutput)参数为 true
Python
Java
curl
import os
from dashscope import Generation


messages = [
    {'role':'system','content':'you are a helpful assistant'},
    {'role': 'user','content': '你是谁?'}]
responses = Generation.call(
    # 若没有配置环境变量,请用百炼API Key将下行替换为:api_key="sk-xxx",
    api_key=os.getenv("DASHSCOPE_API_KEY"),
    model="qwen-plus", # 此处以qwen-plus为例,您可按需更换模型名称。模型列表:https://help.aliyun.com/zh/model-studio/getting-started/models
    messages=messages,
    result_format='message',
    stream=True,
    # 增量式流式输出
    incremental_output=True
    )
full_content = ""
print("流式输出内容为:")
for response in responses:
    full_content += response.output.choices[0].message.content
    print(response.output.choices[0].message.content)
print(f"完整内容为:{full_content}")

返回结果

流式输出内容为:
我是来自
阿里
云
的大规模语言模型
,我叫通
义千问。

完整内容为:我是来自阿里云的大规模语言模型,我叫通义千问。
import java.util.Arrays;
import java.lang.System;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
import com.alibaba.dashscope.aigc.generation.Generation;
import com.alibaba.dashscope.aigc.generation.GenerationParam;
import com.alibaba.dashscope.aigc.generation.GenerationResult;
import com.alibaba.dashscope.common.Message;
import com.alibaba.dashscope.common.Role;
import com.alibaba.dashscope.exception.ApiException;
import com.alibaba.dashscope.exception.InputRequiredException;
import com.alibaba.dashscope.exception.NoApiKeyException;
import io.reactivex.Flowable;

public class Main {
    private static final Logger logger = LoggerFactory.getLogger(Main.class);
    private static StringBuilder fullContent = new StringBuilder();
    private static void handleGenerationResult(GenerationResult message) {
        String content = message.getOutput().getChoices().get(0).getMessage().getContent();
        fullContent.append(content);
        System.out.println(content);
    }
    public static void streamCallWithMessage(Generation gen, Message userMsg)
            throws NoApiKeyException, ApiException, InputRequiredException {
        GenerationParam param = buildGenerationParam(userMsg);
        System.out.println("流式输出内容为:");
        Flowable<GenerationResult> result = gen.streamCall(param);
        result.blockingForEach(message -> handleGenerationResult(message));
        System.out.println("完整内容为: " + fullContent.toString());
    }
    private static GenerationParam buildGenerationParam(Message userMsg) {
        return GenerationParam.builder()
                // 若没有配置环境变量,请用百炼API Key将下行替换为:.apiKey("sk-xxx")
                .apiKey(System.getenv("DASHSCOPE_API_KEY"))
                .model("qwen-plus")   // 此处以qwen-plus为例,您可按需更换模型名称。模型列表:https://help.aliyun.com/zh/model-studio/getting-started/models
                .messages(Arrays.asList(userMsg))
                .resultFormat(GenerationParam.ResultFormat.MESSAGE)
                .incrementalOutput(true)
                .build();
    }
    public static void main(String[] args) {
        try {
            Generation gen = new Generation();
            Message userMsg = Message.builder().role(Role.USER.getValue()).content("你是谁?").build();
            streamCallWithMessage(gen, userMsg);
        } catch (ApiException | NoApiKeyException | InputRequiredException  e) {
            logger.error("An exception occurred: {}", e.getMessage());
        }
        System.exit(0);
    }
}

返回结果

流式输出内容为:
我是通
义
千
问,由阿里
云开发的人工
智能助手。我
被设计用来回答
各种问题、提供
信息和与用户
进行对话。有什么
我可以帮助你的吗
?

完整内容为: 我是通义千问,由阿里云开发的人工智能助手。我被设计用来回答各种问题、提供信息和与用户进行对话。有什么我可以帮助你的吗?
curl -X POST https://dashscope.aliyuncs.com/api/v1/services/aigc/text-generation/generation \
-H "Authorization: Bearer $DASHSCOPE_API_KEY" \
-H "Content-Type: application/json" \
-H "X-DashScope-SSE: enable" \
-d '{
    "model": "qwen-plus",
    "input":{
        "messages":[      
            {
                "role": "system",
                "content": "You are a helpful assistant."
            },
            {
                "role": "user",
                "content": "你是谁?"
            }
        ]
    },
    "parameters": {
        "result_format": "message",
        "incremental_output":true
    }
}'

返回结果

id:1
event:result
:HTTP_STATUS/200
data:{"output":{"choices":[{"message":{"content":"我是","role":"assistant"},"finish_reason":"null"}]},"usage":{"total_tokens":23,"input_tokens":22,"output_tokens":1},"request_id":"xxx"}

id:2
event:result
:HTTP_STATUS/200
data:{"output":{"choices":[{"message":{"content":"通","role":"assistant"},"finish_reason":"null"}]},"usage":{"total_tokens":24,"input_tokens":22,"output_tokens":2},"request_id":"xxx"}

id:3
event:result
:HTTP_STATUS/200
data:{"output":{"choices":[{"message":{"content":"义","role":"assistant"},"finish_reason":"null"}]},"usage":{"total_tokens":25,"input_tokens":22,"output_tokens":3},"request_id":"xxx"}

id:4
event:result
:HTTP_STATUS/200
data:{"output":{"choices":[{"message":{"content":"千问,由阿里","role":"assistant"},"finish_reason":"null"}]},"usage":{"total_tokens":30,"input_tokens":22,"output_tokens":8},"request_id":"xxx"}

id:5
event:result
:HTTP_STATUS/200
data:{"output":{"choices":[{"message":{"content":"云开发的AI助手。我被","role":"assistant"},"finish_reason":"null"}]},"usage":{"total_tokens":38,"input_tokens":22,"output_tokens":16},"request_id":"xxx"}

id:6
event:result
:HTTP_STATUS/200
data:{"output":{"choices":[{"message":{"content":"设计用来回答各种问题、提供信息","role":"assistant"},"finish_reason":"null"}]},"usage":{"total_tokens":46,"input_tokens":22,"output_tokens":24},"request_id":"xxx"}

id:7
event:result
:HTTP_STATUS/200
data:{"output":{"choices":[{"message":{"content":"和与用户进行对话。有什么我可以","role":"assistant"},"finish_reason":"null"}]},"usage":{"total_tokens":54,"input_tokens":22,"output_tokens":32},"request_id":"xxx"}

id:8
event:result
:HTTP_STATUS/200
data:{"output":{"choices":[{"message":{"content":"帮助你的吗?","role":"assistant"},"finish_reason":"stop"}]},"usage":{"total_tokens":58,"input_tokens":22,"output_tokens":36},"request_id":"xxx"}

错误码

如果模型调用失败并返回报错信息,请参见错误信息进行解决。

常见问题

Q1:开启流式输出对模型的回复效果是否有影响?

A1:不会有影响。

Q2:开启流式输出功能需要额外付费吗?

A2:不需要额外付费,流式输出与非流式输出计费规则一样,都是按照输入与输出的 Token 计费。

  • 本页导读 (1)
  • 概述
  • 如何使用
  • 前提条件
  • 开始使用
  • 错误码
  • 常见问题
AI助理

点击开启售前

在线咨询服务

你好,我是AI助理

可以解答问题、推荐解决方案等