Supported APIs for the Qwen (Tongyi Qianwen) 7B, 14B, and 72B models and how to use them - Model Studio (Bailian) - Alibaba Cloud Help Center

Large language models

Qwen2

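Qwen2 is the new series of Qwen open-source large language models. Parameter sizes range from 0.5B to 72B and include a Mixture-of-Experts model. Compared with state-of-the-art open-source language models (including the previously released Qwen1.5), Qwen2 generally surpasses most open-source models across benchmarks for language understanding, language generation, multilingual capability, coding, mathematics, and reasoning, and it is competitive with proprietary models. Qwen2 also extends the supported context length, up to 128K tokens (Qwen2-72B-Instruct), so it can handle large inputs.

The DashScope (Lingji) platform offers API services based on the instruct versions of the open-source Qwen2 0.5B, 1.5B, 7B, 72B, and 57B-A14B MoE models, with targeted inference performance optimizations, giving developers convenient API access. Each version corresponds to the model version open-sourced on the ModelScope community; for details, see the ModelScope community.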

Qwen1.5

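Qwen1.5 is the next version of the Qwen open-source series. Compared with earlier versions, Qwen1.5 significantly improves how well the chat models align with human preferences, improves their multilingual capabilities, and adds strong capabilities for linking to external systems. The API service on DashScope offers the chat versions of the new Qwen models, whose chat ability is substantially improved; the Qwen1.5-Chat series also performs well on the English MT-Bench.

Model Studio (Bailian) provides the 0.5B, 1.8B, 7B, 14B, 32B, 72B, and 110B models. They are based on the open-source Qwen versions, with targeted inference performance optimizations, giving developers convenient API access. Each version corresponds to the model version open-sourced on the ModelScope community; for details, see the ModelScope community.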

Qwen

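The Tongyi Qianwen (Qwen) series consists of large language models developed by Alibaba Cloud. Qwen models are based on the Transformer architecture and are trained on very large-scale pretraining data. The pretraining data is diverse and broad in coverage, including large amounts of web text, professional books, and code. On top of the pretrained models, alignment techniques are used to build the chat versions. Qwen-1.8B has 1.8 billion parameters, Qwen-7B has 7 billion, Qwen-14B has 14 billion, and Qwen-72B has 72 billion.

The open-source Qwen models provided by Model Studio have targeted inference performance optimizations, giving developers convenient API access. The 1.8B, 14B, and 72B models are based on the latest versions open-sourced on the ModelScope community, and the 7B model is based on the V1.1 version open-sourced on ModelScope.

Model overview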

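Model name	Description	Input/output limits
qwen2-57b-a14b-instruct	Open-source Qwen2 MoE model with 57B total parameters and 14B activated parameters	Supports a 32,768-token context. To ensure normal use and output, the API limits user input to 30,720 tokens and output to at most 6,144 tokens.
qwen2-72b-instruct	Open-source Qwen2 models, 0.5B to 72B	Supports a 131,072-token context. To ensure normal use and output, the API limits user input to 128,000 tokens and output to at most 6,144 tokens.
qwen2-7b-instruct	Open-source Qwen2 models, 0.5B to 72B	Supports a 131,072-token context. To ensure normal use and output, the API limits user input to 128,000 tokens and output to at most 6,144 tokens.
qwen2-1.5b-instruct	Open-source Qwen2 models, 0.5B to 72B	Supports a 32,768-token context. To ensure normal use and output, the API limits user input to 30,720 tokens and output to at most 6,144 tokens.
qwen2-0.5b-instruct	Open-source Qwen2 models, 0.5B to 72B	Supports a 32,768-token context. To ensure normal use and output, the API limits user input to 30,720 tokens and output to at most 6,144 tokens.
qwen1.5-110b-chat	Open-source Qwen1.5 chat model with 110B parameters, aligned to human instructions	Supports a 32k-token context. To ensure normal use and output, the API limits user input to 30k tokens and output to at most 8k tokens.
qwen1.5-72b-chat	Open-source Qwen1.5 chat model with 72B parameters, aligned to human instructions	Supports a 32k-token context; maximum input 30k tokens, maximum output 2k tokens.
qwen1.5-32b-chat	Open-source Qwen1.5 chat model with 32B parameters, aligned to human instructions	Supports a 32k-token context; maximum input 30k tokens, maximum output 2k tokens.
qwen1.5-14b-chat	Open-source Qwen1.5 chat model with 14B parameters, aligned to human instructions	Supports an 8k-token context. To ensure normal use and output, the API limits user input to 6k tokens.
qwen1.5-7b-chat	Open-source Qwen1.5 chat model with 7B parameters, aligned to human instructions	Supports an 8k-token context. To ensure normal use and output, the API limits user input to 6k tokens.
qwen1.5-1.8b-chat	Open-source Qwen1.5 chat model with 1.8B parameters, aligned to human instructions	Supports a 32k-token context; maximum input 30k tokens, maximum output 2k tokens.
qwen1.5-0.5b-chat	Open-source Qwen1.5 chat model with 0.5B parameters, aligned to human instructions	Supports a 32k-token context; maximum input 30k tokens, maximum output 2k tokens.
qwen-72b-chat	Open-source Qwen chat model with 72B parameters, aligned to human instructions	Supports a 32k-token context; maximum input 30k tokens, maximum output 2k tokens.
qwen-14b-chat	Open-source Qwen chat model with 14B parameters, aligned to human instructions	Supports an 8k-token context. To ensure normal use and output, the API limits user input to 6k tokens.
qwen-7b-chat	Open-source Qwen chat model with 7B parameters, aligned to human instructions	Supports an 8k-token context. To ensure normal use and output, the API limits user input to 6k tokens.
qwen-1.8b-longcontext-chat	Open-source Qwen chat model with 1.8B parameters, aligned to human instructions	Supports a 32k-token context; maximum input 30k tokens, maximum output 2k tokens.
qwen-1.8b-chat	Open-source Qwen chat model with 1.8B parameters, aligned to human instructions	Supports an 8k-token context. To ensure normal use and output, the API limits user input to 6k tokens.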
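
The limits in the table can be checked on the client before a request is sent. The following is a minimal sketch, not part of the SDK: `TokenLimits`, `LIMITS`, and `check_request` are hypothetical names, only the three Qwen2 rows that state exact token counts are included, and counting the input tokens is left to the caller.

```python
from typing import NamedTuple


class TokenLimits(NamedTuple):
    max_input: int   # largest user input the API accepts, in tokens
    max_output: int  # largest output the API returns, in tokens


# Values copied from the model overview table (rows with exact figures only).
LIMITS = {
    "qwen2-57b-a14b-instruct": TokenLimits(30_720, 6_144),
    "qwen2-72b-instruct": TokenLimits(128_000, 6_144),
    "qwen2-1.5b-instruct": TokenLimits(30_720, 6_144),
}


def check_request(model: str, input_tokens: int, max_tokens: int) -> list[str]:
    """Return a list of problems; an empty list means the request fits."""
    limits = LIMITS.get(model)
    if limits is None:
        return [f"no limits recorded for model {model!r}"]
    problems = []
    if input_tokens > limits.max_input:
        problems.append(f"input of {input_tokens} tokens exceeds {limits.max_input}")
    if max_tokens > limits.max_output:
        problems.append(f"max_tokens of {max_tokens} exceeds {limits.max_output}")
    return problems
```

A caller would typically log the returned problems and either shorten the input or lower `max_tokens` before calling the API.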

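Using the SDK

Prerequisites

The service is activated and an API-KEY has been obtained: see Obtain an API-KEY.
The latest version of the SDK is installed: see Install the SDK.

Set the API-KEY: you can configure the API-KEY through an environment variable, which reduces the risk of leaking the key and avoids unnecessary losses.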
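
As an illustration of the environment-variable approach, the sketch below reads the key at startup and fails early if it is missing. The variable name `DASHSCOPE_API_KEY` is the one used by the curl examples in this document; `load_api_key` and `mask` are hypothetical helpers, not SDK functions.

```python
import os


def load_api_key(env=None, name="DASHSCOPE_API_KEY"):
    """Return the API key stored in the environment, or raise if it is missing."""
    env = os.environ if env is None else env
    key = env.get(name, "").strip()
    if not key:
        raise RuntimeError(f"environment variable {name} is not set")
    return key


def mask(key):
    """Shorten a key for log output so the full value is never printed."""
    return "**" if len(key) <= 4 else f"{key[:2]}**{key[-2:]}"
```

Keeping the key out of source files means it does not end up in version control, and masking it in logs avoids leaking it through diagnostics.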

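Single-turn Q&A

The following example shows code that calls the Qwen 72B model to respond to a user instruction. To use the 110B, 32B, 14B, 7B, or 1.8B model, replace the model name accordingly.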

Python

import random
from http import HTTPStatus
import dashscope


def call_with_messages():
    messages = [
        {'role': 'user', 'content': '用萝卜、土豆、茄子做饭，给我个菜谱'}]
    response = dashscope.Generation.call(
        'qwen1.5-72b-chat',
        messages=messages,
        # set the random seed, optional, default to 1234 if not set
        seed=random.randint(1, 10000),
        result_format='message',  # set the result to be "message" format.
    )
    if response.status_code == HTTPStatus.OK:
        print(response)
    else:
        print('Request id: %s, Status code: %s, error code: %s, error message: %s' % (
            response.request_id, response.status_code,
            response.code, response.message
        ))


if __name__ == '__main__':
    call_with_messages()

Java

// Copyright (c) Alibaba, Inc. and its affiliates.
// dashscope SDK version >= 2.12.0 is recommended

import java.util.Arrays;  
import com.alibaba.dashscope.aigc.generation.Generation;  
import com.alibaba.dashscope.aigc.generation.GenerationParam;  
import com.alibaba.dashscope.aigc.generation.GenerationResult;  
import com.alibaba.dashscope.common.Message;  
import com.alibaba.dashscope.common.Role;  
import com.alibaba.dashscope.exception.ApiException;  
import com.alibaba.dashscope.exception.InputRequiredException;  
import com.alibaba.dashscope.exception.NoApiKeyException;  
  
public class Main {  
  
    public static GenerationResult callWithMessage() throws ApiException, NoApiKeyException, InputRequiredException {  
        Generation gen = new Generation();  
          
        Message systemMsg = Message.builder()  
                .role(Role.SYSTEM.getValue())  
                .content("You are a helpful assistant.")  
                .build();  
          
        Message userMsg = Message.builder()  
                .role(Role.USER.getValue())  
                .content("如何做西红柿炒鸡蛋？")  
                .build();  
          
        GenerationParam param = GenerationParam.builder()  
                .model("qwen-72b-chat")  
                .messages(Arrays.asList(systemMsg, userMsg))  
                .resultFormat(GenerationParam.ResultFormat.MESSAGE)  
                .topP(0.8)  
                .build();  
          
        return gen.call(param);  
    }  
  
    public static void main(String[] args) {  
        try {  
            GenerationResult result = callWithMessage();  
            System.out.println(result);  
        } catch (ApiException | NoApiKeyException | InputRequiredException e) {  
            // Record the exception with a logging framework
            // Logger.error("An error occurred while calling the generation service", e);  
            System.err.println("An error occurred while calling the generation service: " + e.getMessage());  
        }  
        System.exit(0);
    }  
}

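Multi-turn conversation

You can also pass the conversation history through the messages parameter to hold a multi-turn interaction with the model.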

Python

import random
from http import HTTPStatus
from dashscope import Generation
from dashscope.api_entities.dashscope_response import Role


def multi_round_conversation():
    messages = [{'role': 'system', 'content': 'You are a helpful assistant.'},
                {'role': 'user', 'content': '请介绍一下上海有什么好玩的地方'}]
    response = Generation.call(
        'qwen1.5-72b-chat',
        messages=messages,
        # set the random seed, optional, default to 1234 if not set
        seed=random.randint(1, 10000),
        result_format='message',  # set the result to be "message"  format.
    )
    if response.status_code == HTTPStatus.OK:
        print(response)
        messages.append({'role': response.output.choices[0]['message']['role'],
                         'content': response.output.choices[0]['message']['content']})
    else:
        print('Request id: %s, Status code: %s, error code: %s, error message: %s' % (
            response.request_id, response.status_code,
            response.code, response.message
        ))
        # Stop here: without an assistant reply, the next user turn would break role alternation.
        return
    messages.append({'role': Role.USER, 'content': '能否缩短一些，只讲三点'})
    response = Generation.call(
        'qwen1.5-72b-chat',
        messages=messages,
        result_format='message',  # set the result to be "message"  format.
    )
    if response.status_code == HTTPStatus.OK:
        print(response)
    else:
        print('Request id: %s, Status code: %s, error code: %s, error message: %s' % (
            response.request_id, response.status_code,
            response.code, response.message
        ))


if __name__ == '__main__':
    multi_round_conversation()

Java

// Copyright (c) Alibaba, Inc. and its affiliates.

import java.util.ArrayList;  
import java.util.List;    
import com.alibaba.dashscope.aigc.generation.Generation;  
import com.alibaba.dashscope.aigc.generation.GenerationParam;  
import com.alibaba.dashscope.aigc.generation.GenerationResult;  
import com.alibaba.dashscope.common.Message;  
import com.alibaba.dashscope.common.Role;  
import com.alibaba.dashscope.exception.ApiException;  
import com.alibaba.dashscope.exception.InputRequiredException;  
import com.alibaba.dashscope.exception.NoApiKeyException;  
import com.alibaba.dashscope.utils.JsonUtils;  
  
public class Main {  
  
    public static GenerationParam createGenerationParam(List<Message> messages) {  
        return GenerationParam.builder()  
                .model("qwen-72b-chat")  
                .messages(messages)  
                .resultFormat(GenerationParam.ResultFormat.MESSAGE)  
                .topP(0.8)  
                .build();  
    }  
  
    public static GenerationResult callGenerationWithMessages(GenerationParam param) throws ApiException, NoApiKeyException, InputRequiredException {  
        Generation gen = new Generation();  
        return gen.call(param);  
    }  
  
    public static void main(String[] args) {  
        try {  
            List<Message> messages = new ArrayList<>();  
            messages.add(createMessage(Role.SYSTEM, "You are a helpful assistant."));  
            messages.add(createMessage(Role.USER, "如何做西红柿炖牛腩？"));  
  
            GenerationParam param = createGenerationParam(messages);  
            GenerationResult result = callGenerationWithMessages(param);  
            printResult(result);  
  
            // Append the message returned by the assistant to the list
            messages.add(result.getOutput().getChoices().get(0).getMessage());

            // Append the new user message
            messages.add(createMessage(Role.USER, "不放糖可以吗？"));
  
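            // Rebuild the request so that it carries the updated message history
            param = createGenerationParam(messages);
            result = callGenerationWithMessages(param);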
            printResult(result);  
            printResultAsJson(result);  
        } catch (ApiException | NoApiKeyException | InputRequiredException e) {  
            e.printStackTrace();
        }
        System.exit(0);
    }  
  
    private static Message createMessage(Role role, String content) {  
        return Message.builder().role(role.getValue()).content(content).build();  
    }  
  
    private static void printResult(GenerationResult result) {  
        System.out.println(result);  
    }  
  
    private static void printResultAsJson(GenerationResult result) {  
        System.out.println(JsonUtils.toJson(result));  
    }  
}
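
When the history is assembled by hand, it can help to check it against the role rules stated in the parameter table: only `messages[0]` may use the system role, and user and assistant must alternate. The sketch below is a hypothetical client-side check (`validate_messages` is not an SDK function), and the service may enforce further rules that are not covered here.

```python
VALID_ROLES = {"system", "user", "assistant"}


def validate_messages(messages):
    """Return a list of rule violations; an empty list means the history is well formed."""
    errors = []
    previous = None  # role of the last user/assistant message seen
    for i, msg in enumerate(messages):
        role = msg.get("role")
        if role not in VALID_ROLES:
            errors.append(f"messages[{i}]: unknown role {role!r}")
            continue
        if role == "system":
            if i != 0:
                errors.append(f"messages[{i}]: system is only allowed at messages[0]")
            continue
        if role == previous:
            errors.append(f"messages[{i}]: {role} follows {previous}; roles must alternate")
        previous = role
    return errors
```

Running the check before each call catches the common mistake of appending a new user turn after a failed request, which would leave two user messages in a row.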

Streaming output

Python

import random
from http import HTTPStatus
from dashscope import Generation


def call_stream_with_messages():
    messages = [
        {'role': 'user', 'content': '用萝卜、土豆、茄子做饭，给我个菜谱'}]
    responses = Generation.call(
        'qwen1.5-72b-chat',
        messages=messages,
        seed=random.randint(1, 10000),  # set the random seed, optional, default to 1234 if not set
        result_format='message',  # set the result to be "message"  format.
        stream=True,
        incremental_output=True  # get streaming output incrementally
    )
    full_content = ''
    for response in responses:
        if response.status_code == HTTPStatus.OK:
            full_content += response.output.choices[0]['message']['content']
            print(response)
        else:
            print('Request id: %s, Status code: %s, error code: %s, error message: %s' % (
                response.request_id, response.status_code,
                response.code, response.message
            ))
    print('Full content: \n' + full_content)


if __name__ == '__main__':
    call_stream_with_messages()

Java

// Copyright (c) Alibaba, Inc. and its affiliates.
import java.util.Arrays;
import java.util.concurrent.Semaphore;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
import com.alibaba.dashscope.aigc.generation.Generation;
import com.alibaba.dashscope.aigc.generation.GenerationParam;
import com.alibaba.dashscope.aigc.generation.GenerationResult;
import com.alibaba.dashscope.common.Message;
import com.alibaba.dashscope.common.ResultCallback;
import com.alibaba.dashscope.common.Role;
import com.alibaba.dashscope.exception.ApiException;
import com.alibaba.dashscope.exception.InputRequiredException;
import com.alibaba.dashscope.exception.NoApiKeyException;
import com.alibaba.dashscope.utils.JsonUtils;
import io.reactivex.Flowable;

public class Main {

    private static final Logger logger = LoggerFactory.getLogger(Main.class);

    private static void handleGenerationResult(GenerationResult message, StringBuilder fullContent) {
        fullContent.append(message.getOutput().getChoices().get(0).getMessage().getContent());
        logger.info("Received message: {}", JsonUtils.toJson(message));
    }

    public static void streamCallWithMessage(Generation gen, Message userMsg)
            throws NoApiKeyException, ApiException, InputRequiredException {
        GenerationParam param = buildGenerationParam(userMsg);
        Flowable<GenerationResult> result = gen.streamCall(param);
        StringBuilder fullContent = new StringBuilder();

        result.blockingForEach(message -> handleGenerationResult(message, fullContent));

        logger.info("Full content: \n{}", fullContent.toString());
    }

    public static void streamCallWithCallback(Generation gen, Message userMsg)
            throws NoApiKeyException, ApiException, InputRequiredException, InterruptedException {
        GenerationParam param = buildGenerationParam(userMsg);
        Semaphore semaphore = new Semaphore(0);
        StringBuilder fullContent = new StringBuilder();

        gen.streamCall(param, new ResultCallback<GenerationResult>() {
            @Override
            public void onEvent(GenerationResult message) {
                handleGenerationResult(message, fullContent);
            }

            @Override
            public void onError(Exception err) {
                logger.error("Exception occurred: {}", err.getMessage());
                semaphore.release();
            }

            @Override
            public void onComplete() {
                logger.info("Completed");
                semaphore.release();
            }
        });

        semaphore.acquire();
        logger.info("Full content: \n{}", fullContent.toString());
    }

    private static GenerationParam buildGenerationParam(Message userMsg) {
        return GenerationParam.builder()
                .model("qwen-72b-chat")
                .messages(Arrays.asList(userMsg))
                .resultFormat(GenerationParam.ResultFormat.MESSAGE)
                .topP(0.8)
                .incrementalOutput(true)
                .build();
    }

    public static void main(String[] args) {
        try {
            Generation gen = new Generation();
            Message userMsg = Message.builder().role(Role.USER.getValue()).content("如何做西红柿炖牛腩？").build();

            streamCallWithMessage(gen, userMsg);
            streamCallWithCallback(gen, userMsg);
        } catch (ApiException | NoApiKeyException | InputRequiredException | InterruptedException e) {
            logger.error("An exception occurred: {}", e.getMessage());
        }
    }
}
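
The two streaming modes differ only in what each response carries: the text generated so far, or only the newly generated text. The sketch below shows how the full reply can be rebuilt in either case. `assemble` and `to_increments` are hypothetical helpers that work on plain strings, so they make no assumption about the SDK response objects.

```python
def assemble(chunks, incremental):
    """Rebuild the full reply from the text of each streamed response."""
    if incremental:
        # Each chunk holds only new text, so the pieces are concatenated.
        return "".join(chunks)
    # Each chunk already holds everything generated so far; the last one is complete.
    return chunks[-1] if chunks else ""


def to_increments(cumulative):
    """Convert cumulative chunks into the new text that each one added."""
    out, seen = [], ""
    for text in cumulative:
        out.append(text[len(seen):] if text.startswith(seen) else text)
        seen = text
    return out
```

For example, `to_increments` is useful when a user interface should print only the new characters while the request itself runs in the default, non-incremental mode.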

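Parameter configuration

Parameter	Type	Default	Description
model	string	-	Name of the Qwen model used for the conversation; see the model name column in the model overview.
messages	array	-	The conversation history between the user and the model. Each element of the list has the form {"role": role, "content": content}. Currently supported roles are system, user, and assistant; only messages[0] may use the system role, and user and assistant must alternate. Use either messages or prompt; messages is recommended for chat scenarios.
prompt	string	-	The instruction that the user currently wants the model to carry out. Use either messages or prompt; messages is recommended for chat scenarios.
history (optional)	list[dict]	[]	To be deprecated; use the messages field instead. The conversation history between the user and the model. Each element of the list is one round of dialogue of the form {"user":"user input","bot":"model output"}, and rounds are listed in chronological order.
seed (optional)	int	1234	Random seed used during generation, which lets the user control the randomness of the generated content. seed supports unsigned 64-bit integers and defaults to 1234. When a seed is used, the model tries to produce the same or similar results, but identical results are currently not guaranteed on every run.
max_tokens (optional)	int	1500	Limits the number of tokens the model generates. max_tokens is an upper bound and does not mean that this many tokens will be generated. For qwen1.5-14b-chat, qwen1.5-7b-chat, qwen-14b-chat, and qwen-7b-chat, the maximum and the default are both 1500; for qwen-1.8b-chat, qwen-1.8b-longcontext-chat, and qwen-72b-chat, both are 2000.
top_p (optional)	float	0.8	Probability threshold for nucleus sampling during generation. For example, with a value of 0.8, only the smallest set of most likely tokens whose probabilities sum to at least 0.8 is kept as the candidate set. The range is (0, 1.0). Larger values make generation more random; smaller values make it more deterministic.
top_k (optional)	int	0	Size of the sampling candidate set during generation. For example, with a value of 50, only the 50 highest-scoring tokens in a single generation step form the candidate set for random sampling. Larger values make generation more random; smaller values make it more deterministic. The default of 0 disables the top_k strategy, so only the top_p strategy applies.
repetition_penalty (optional)	float	1.1	Controls how repetitive the generated text is. Raising repetition_penalty reduces repetition. 1.0 means no penalty.
temperature (optional)	float	0.85	Controls the degree of randomness and diversity. Specifically, temperature controls how much the probability distribution over candidate tokens is smoothed during generation. Higher values flatten the peak of the distribution, so more low-probability tokens are selected and results are more diverse; lower values sharpen the peak, so high-probability tokens are more likely to be selected and results are more deterministic. Range: [0, 2). A value of 0 is not recommended because it is meaningless. Requires Python SDK version >= 1.10.1 or Java SDK version >= 2.5.1.
stop (optional)	str/list[str] for strings; list[int]/list[list[int]] for token_ids	None	Stops generation when certain content is about to be produced. If strings or token_ids are specified, generation stops when the model is about to produce them, and the result does not contain the specified content. For example, with stop set to "你好", generation stops when "你好" is about to be generated; with stop set to [37763, 367], generation stops when "Observation" is about to be generated. stop also accepts a list of strings or a list of token_ids arrays to support multiple stop conditions. Note that in list mode strings and token_ids cannot be mixed; all elements must have the same type.
stream (optional)	bool	False	Whether to use streaming output. In stream mode the interface returns a generator, and results are obtained by iterating over it. By default each output is the entire sequence generated so far, and the last output is the complete final result. To receive only the newly generated text in each output, set incremental_output to True in Python (see the incremental_output row) or set the incrementalOutput request parameter to true in Java.
result_format (optional)	String	text	text or message; the default is text. When set to message, the output follows the message result example.
incremental_output (optional)	bool	False	Controls the streaming output mode. With the default False, each later output contains the content that was already output. With True, incremental output is enabled: later outputs do not contain earlier content, and you must concatenate the outputs yourself (see the streaming output sample code). For example, with False the successive outputs are "I", "I like", "I like apple"; with True they are "I", "like", "apple". This parameter can only be used together with stream output mode.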
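
To make the interaction of temperature, top_k, and top_p concrete, the following sketch computes the candidate distribution from a small set of token scores. It illustrates the general sampling technique only: the order in which the filters are applied, the name `candidate_probs`, and the toy scores are assumptions, and the service's actual implementation is not documented here.

```python
import math


def candidate_probs(logits, temperature=0.85, top_k=0, top_p=0.8):
    """Return the renormalized candidate distribution after temperature, top_k and top_p."""
    if temperature <= 0:
        raise ValueError("temperature must be positive in this sketch")
    # Temperature scaling followed by a numerically stable softmax.
    scaled = {tok: score / temperature for tok, score in logits.items()}
    peak = max(scaled.values())
    weights = {tok: math.exp(s - peak) for tok, s in scaled.items()}
    total = sum(weights.values())
    ranked = sorted(((w / total, tok) for tok, w in weights.items()), reverse=True)
    # top_k: keep only the k most probable tokens (0 disables the filter).
    if top_k > 0:
        ranked = ranked[:top_k]
    # top_p: keep the smallest prefix whose cumulative probability reaches top_p.
    kept, cumulative = [], 0.0
    for prob, tok in ranked:
        kept.append((prob, tok))
        cumulative += prob
        if cumulative >= top_p:
            break
    norm = sum(prob for prob, _ in kept)
    return {tok: prob / norm for prob, tok in kept}
```

With the scores `{"a": 2.0, "b": 1.0, "c": 0.0}` and top_p of 0.8, a temperature of 1.0 keeps both "a" and "b" as candidates, while a temperature of 0.5 sharpens the distribution enough that only "a" remains, which matches the description of lower temperatures giving more deterministic results.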

Response

Example of a message-format result

JSON

{
    "status_code": 200,
    "request_id": "b3d8bb75-05a2-9044-8e9e-ec8c87689a5e",
    "code": "",
    "message": "",
    "output": {
        "text": null,
        "finish_reason": null,
        "choices": [
            {
                "finish_reason": "stop",
                "message": {
                    "role": "assistant",
                    "content": "材料：\n- 萝卜：2根\n- 土豆：2个\n- 茄子：2个\n- 大葱：1根\n- 姜：适量\n- 蒜：适量\n- 食用油：适量\n- 盐：适量\n- 生抽：适量\n- 蚝油：适量\n\n做法：\n\n1. 将萝卜、土豆、茄子分别洗净去皮，切成块状备用。\n2. 大葱切断，姜切片，蒜切末备用。\n3. 烧热锅，加入适量的食用油，放入葱段、姜片、蒜末爆香。\n4. 加入萝卜块，翻炒几分钟，加入适量的盐、生抽调味。\n5. 加入土豆块，继续翻炒几分钟，加入适量的盐、生抽调味。\n6. 加入茄子块，继续翻炒几分钟，加入适量的盐、生抽调味。\n7. 加入适量的蚝油，翻炒均匀，让每一块蔬菜都均匀地裹上蚝油。\n8. 翻炒几分钟，让蔬菜熟透，即可出锅。\n\n这道菜色香味俱佳，营养丰富，可以作为主食或配菜食用。"
                }
            }
        ]
    },
    "usage": {
        "input_tokens": 31,
        "output_tokens": 267
    }
}

Example of a text-format result

{
    "status_code": 200,
    "request_id": "446877aa-dbb8-99ca-98eb-d78a5e90fe61",
    "code": "",
    "message": "",
    "output": {
        "text": "材料：\n- 萝卜：2根\n- 土豆：2个\n- 茄子：2个\n- 大葱：1根\n- 姜：适量\n- 蒜：适量\n- 食用油：适量\n- 盐：适量\n- 生抽：适量\n- 蚝油：适量\n\n做法：\n\n1. 将萝卜、土豆、茄子分别洗净去皮，切成块状备用。\n2. 大葱切段，姜切片，蒜切末备用。\n3. 烧热锅，加入适量的食用油，放入葱段、姜片、蒜末爆香。\n4. 加入萝卜块，翻炒几分钟，加入适量的盐、生抽调味。\n5. 加入土豆块，继续翻炒几分钟，加入适量的盐、生抽调味。\n6. 加入茄子块，继续翻炒几分钟，加入适量的盐、生抽调味。\n7. 加入适量的蚝油，翻炒均匀，让每一块蔬菜都均匀地裹上蚝油。\n8. 翻炒几分钟，让蔬菜熟透，即可出锅。\n\n这道菜色香味俱佳，营养丰富，可以作为主食或配菜食用。",
        "finish_reason": "stop",
        "choices": null
    },
    "usage": {
        "input_tokens": 31,
        "output_tokens": 267
    }
}

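Response parameters

Field	Type	Description
status_code	int	200 (HTTPStatus.OK) indicates that the request succeeded; any other value indicates failure. Use code to get the error code and message to get the error details.
request_id	string	System-generated ID that identifies this call.
code	string	Error code when the request fails; ignore on success.
message	string	Error details when the request fails; ignore on success.
output	dict	Result of the call; for Qwen models it contains the output text.
usage	dict	Metering information for this request.
output.text	string	Reply generated by the model.
output.finish_reason	string	One of three cases: null while generation is in progress; stop if generation ended because of a stop token; length if generation ended because the output became too long.
usage.input_tokens	int	Length of the user input after conversion to tokens.
usage.output_tokens	int	Length of the model reply after conversion to tokens.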
choices	List	[]
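choices[i].finish_reason	String	One of three cases: null while generation is in progress; stop if generation ended because of a stop token; length if generation ended because the output became too long.
choices[i].message	dict	Message output generated by the model.
message.role	String	Role of the model; always assistant.
message.content	String	Text generated by the model.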
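
Because the reply sits in a different place depending on result_format, client code often normalizes the two shapes. The sketch below works on a plain dict shaped like the JSON examples above; `extract_text` is a hypothetical helper, and it assumes the SDK response has been converted to such a dict.

```python
def extract_text(response):
    """Return (text, finish_reason) from a response dict in either result format."""
    output = response.get("output") or {}
    choices = output.get("choices")
    if choices:
        # result_format="message": the reply is in choices[0].message.content
        first = choices[0]
        return first["message"]["content"], first.get("finish_reason")
    # result_format="text": the reply is directly in output.text
    return output.get("text"), output.get("finish_reason")
```

A finish_reason of `length` in the returned pair signals that the reply was cut off by the token limit, so the caller may want to raise max_tokens or ask the model to continue.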

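HTTP interface

Description

The open-source Qwen models can also be called over HTTP to serve client requests. Two protocols are currently available, plain HTTP and HTTP SSE; choose whichever suits your needs.

Prerequisites

The service is activated and an API-KEY has been obtained: see Obtain an API-KEY.

Submit an API call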

POST https://dashscope.aliyuncs.com/api/v1/services/aigc/text-generation/generation

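Request parameters

Passed in	Field	Type	Required	Description	Example
Header	Content-Type	String	Yes	Request type: application/json, or text/event-stream (enables SSE responses)	application/json
	Accept	String	No	*/*; choosing text/event-stream enables SSE responses. Not set by default.	text/event-stream
	Authorization	String	Yes	API-Key, for example: Bearer d1**2a	Bearer d1**2a
	X-DashScope-WorkSpace	String	No	Specifies the workspace to use for this call. For calls made with a sub-account API key this parameter is required, because a sub-account must belong to a workspace to make calls. For a main-account API key it is optional: if it is provided, the call uses the corresponding workspace identity; otherwise it uses the main account identity.	ws_QTggmeAxxxxx
	X-DashScope-SSE	String	No	Either this header or Accept: text/event-stream enables SSE responses.	enable
Body	model	String	Yes	Currently one of qwen1.5-72b-chat, qwen1.5-14b-chat, qwen1.5-7b-chat, qwen-72b-chat, qwen-14b-chat, or qwen-7b-chat	qwen1.5-72b-chat
	input.prompt	String	Yes	The instruction that the user currently wants the model to carry out; Chinese and English are supported.	哪个公园距离我更近
	input.history	List	No	To be deprecated; use the messages field instead. The conversation history between the user and the model. Each element of the list is one round of dialogue of the form {"user":"user input","bot":"model output"}, and rounds are listed in chronological order.	"history": [ { "user":"今天天气好吗？", "bot":"今天天气不错，要出去玩玩嘛？" }, { "user":"那你有什么地方推荐？", "bot":"我建议你去公园，春天来了，花朵开了，很美丽。" } ]
	input.messages	List	No	The conversation history between the user and the model. Chat interfaces will move to messages in the future, although prompt and history will remain supported for compatibility. Each element of the list has the form {"role": role, "content": content}. Currently supported roles are system, user, and assistant; more roles may be added in the future.	"input":{ "messages":[ { "role": "system", "content": "You are a helpful assistant." }, { "role": "user", "content": "你好，附近哪里有博物馆？" }] }
	input.messages.role	String	Required when messages is present
	input.messages.content	String	Required when messages is present
	parameters.result_format	String	No	"text" selects the legacy text format; "message" selects the OpenAI-compatible message format.	"text"
	parameters.seed	Integer	No	Random seed used during generation, which lets the user control the randomness of the generated content. seed supports unsigned 64-bit integers and defaults to 1234. When a seed is used, the model tries to produce the same or similar results, but identical results are currently not guaranteed on every run.	65535
	parameters.max_tokens	Integer	No	Limits the number of tokens the model generates. max_tokens is an upper bound and does not mean that this many tokens will be generated. For qwen1.5-14b-chat, qwen1.5-7b-chat, qwen-14b-chat, and qwen-7b-chat, the maximum and the default are both 1500; for qwen-1.8b-chat, qwen-1.8b-longcontext-chat, and qwen-72b-chat, both are 2000.	1500
	parameters.top_p	Float	No	Probability threshold for nucleus sampling during generation. For example, with a value of 0.8, only the tokens whose cumulative probability reaches at least 0.8 are kept as the candidate set for random sampling. The range is (0, 1.0). Larger values make generation more random; smaller values make it less random. The default is 0.8. Note that the value must be less than 1.	0.8
	parameters.top_k	Integer	No	Size of the sampling candidate set during generation. For example, with a value of 50, only the 50 highest-scoring tokens in a single generation step form the candidate set for random sampling. Larger values make generation more random; smaller values make it more deterministic. Note: if top_k is empty or greater than 100, the top_k strategy is disabled and only the top_p strategy applies.	50
	parameters.repetition_penalty	Float	No	Controls how repetitive the generated text is. Raising repetition_penalty reduces repetition. 1.0 means no penalty. The default is 1.1.	1.1
	parameters.temperature	Float	No	Controls the degree of randomness and diversity. Specifically, temperature controls how much the probability distribution over candidate tokens is smoothed during generation. Higher values flatten the peak of the distribution, so more low-probability tokens are selected and results are more diverse; lower values sharpen the peak, so high-probability tokens are more likely to be selected and results are more deterministic. Range: [0, 2). A value of 0 is not recommended because it is meaningless. The system default is 0.85.	0.85
	parameters.stop	String/List[String] for strings; List[Integer]/List[List[Integer]] for token_ids	No	Stops generation when certain content is about to be produced. If strings or token_ids are specified, generation stops when the model is about to produce them, and the result does not contain the specified content. For example, with stop set to "你好", generation stops when "你好" is about to be generated; with stop set to [37763, 367], generation stops when "Observation" is about to be generated. stop also accepts a list of strings or a list of token_ids arrays to support multiple stop conditions. Note that in list mode strings and token_ids cannot be mixed; all elements must have the same type.	[[37763, 367]]
	parameters.incremental_output	Bool	No	Controls the streaming output mode. With the default False, each later output contains the content that was already output. With True, incremental output is enabled: later outputs do not contain earlier content, and you must concatenate the outputs yourself (see the streaming output sample code). For example, with False the successive outputs are "I", "I like", "I like apple"; with True they are "I", "like", "apple". This parameter can only be used together with stream output mode.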
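
Since input.history is marked for deprecation, existing history data can be converted to the messages form. The sketch below shows one way to do this; `history_to_messages` is a hypothetical helper, and mapping the "bot" field to the assistant role is an assumption based on the two formats described in the table.

```python
def history_to_messages(history, prompt=None, system=None):
    """Convert deprecated history rounds plus the current prompt into a messages list."""
    messages = []
    if system:
        # The system role is only allowed as the first message.
        messages.append({"role": "system", "content": system})
    for turn in history:
        messages.append({"role": "user", "content": turn["user"]})
        messages.append({"role": "assistant", "content": turn["bot"]})
    if prompt is not None:
        # The current instruction becomes the final user message.
        messages.append({"role": "user", "content": prompt})
    return messages
```

Because each history round contributes one user and one assistant message in order, the converted list keeps the alternation that the messages parameter requires.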

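Response parameters

Field	Type	Description	Example
output.text	String	Output content produced by the model for this request.	我建议你去颐和园
output.finish_reason	String	One of three cases: null while generation is in progress; stop if generation ended because of a stop token; length if generation ended because the output became too long.	stop
output.choices[list]	List	Returned when the request parameter result_format is message.	Returned when the request parameter result_format is message.
output.choices[x].finish_reason	String		Stop reason. null: generation is in progress; stop: ended by a stop token; length: ended because of the generation length.
output.choices[x].message	String		Each message has the form {"role": role, "content": content}. Currently supported roles are system, user, and assistant; more roles may be added in the future. content holds the output content of this request.
output.choices[x].message.role	String
output.choices[x].message.content	String
usage.output_tokens	Integer	Number of tokens in the output content of this request.	380
usage.input_tokens	Integer	Number of tokens in the input content of this request. When search is enabled, the input token count exceeds what the client sent in the request, because search-related content has to be added.	633
request_id	String	Unique system identifier of this request.	7574ee8f-38a3-4b1e-9280-11c33ab46e51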

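Request example (SSE disabled)

The following example shows a script that calls the Qwen 14B model with curl (SSE disabled). To call the 7B or 72B model, replace the model parameter.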

Shell

curl -X POST 'https://dashscope.aliyuncs.com/api/v1/services/aigc/text-generation/generation' \
    -H "Authorization: Bearer $DASHSCOPE_API_KEY" \
    -H "Content-Type: application/json" \
    -d '{
        "model": "qwen-14b-chat",
        "input": {
            "messages": [
                {
                    "role": "system",
                    "content": "You are a helpful assistant."
                },
                {
                    "role": "user",
                    "content": "你好，哪个公园距离我最近？"
                }
            ]
        },
        "parameters": {}
    }'

Response example (SSE disabled)

JSON

{
    "output":{
        "text":"最近的公园是公园，它距离你的家大约1.5公里 ... ... 你可以使用Google地图或者百度地图来查看具体的路线和距离。",
        "finish_reason":"stop"    
    },
    "usage":{
        "output_tokens":51,
        "input_tokens":85
    },
    "request_id":"d89c06fb-46a1-47b6-acb9-bfb17f814969"
}
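
The same request can be prepared from Python with only the standard library. The sketch below builds the request object without sending it; the endpoint, headers, and body layout are taken from the curl example and the request parameter table, while `build_request` is a hypothetical helper name.

```python
import json
import urllib.request

URL = "https://dashscope.aliyuncs.com/api/v1/services/aigc/text-generation/generation"


def build_request(api_key, model, messages, sse=False, **parameters):
    """Build (but do not send) a POST request for the text-generation endpoint."""
    headers = {
        "Authorization": f"Bearer {api_key}",
        "Content-Type": "application/json",
    }
    if sse:
        # Either this header or "Accept: text/event-stream" enables SSE responses.
        headers["X-DashScope-SSE"] = "enable"
    body = {"model": model, "input": {"messages": messages}, "parameters": parameters}
    data = json.dumps(body, ensure_ascii=False).encode("utf-8")
    return urllib.request.Request(URL, data=data, headers=headers, method="POST")
```

Passing the returned object to `urllib.request.urlopen` would send it; that step is left out here so the sketch can be inspected without network access or a valid key.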

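Request example (SSE enabled)

The following example shows a script that calls the Qwen1.5 72B model with curl (SSE enabled). To call another model, replace the model parameter.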

Shell

curl -X POST 'https://dashscope.aliyuncs.com/api/v1/services/aigc/text-generation/generation' \
 -H "Authorization: Bearer $DASHSCOPE_API_KEY" \
 -H "Content-Type: application/json" \
 -H "X-DashScope-SSE: enable" \
 -d '{
     "model": "qwen1.5-72b-chat",
     "input": {
         "messages": [
             {
                 "role": "system",
                 "content": "You are a helpful assistant."
             },
             {
                 "role": "user",
                 "content": "你好，哪个公园距离我最近？"
             }
         ]
     },
     "parameters": {}
 }'

Response example (SSE enabled)

JSON

id:1
event:result
data:{"output":{"finish_reason":"null","text":"最近"},"usage":{"output_tokens":3,"input_tokens":85},"request_id":"1117fb64-5dd9-9df0-a5ca-d7ee0e97032d"}

id:2
event:result
data:{"output":{"finish_reason":"null","text":"最近的公园是公园，它"},"usage":{"output_tokens":11,"input_tokens":85},"request_id":"1117fb64-5dd9-9df0-a5ca-d7ee0e97032d"}

... ... ... ...
... ... ... ...

id:8
event:result
data:{"output":{"finish_reason":"stop","text":"最近的公园是公园，它距离你的家大约1.5公里。你可以使用Google地图或者百度地图来查看具体的路线和距离。"},"usage":{"output_tokens":51,"input_tokens":85},"request_id":"1117fb64-5dd9-9df0-a5ca-d7ee0e97032d"}
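
An SSE response like the one above is a sequence of blocks separated by blank lines, each made of `id:`, `event:`, and `data:` lines. The sketch below parses such text into Python dicts. `parse_sse` is a hypothetical helper that assumes each event carries a single `data:` line containing JSON, as in the example; it is not a complete SSE parser.

```python
import json


def parse_sse(stream_text):
    """Parse SSE text into a list of dicts with id, event and decoded data."""
    events = []
    normalized = stream_text.replace("\r\n", "\n").strip()
    for block in normalized.split("\n\n"):
        event = {}
        for line in block.splitlines():
            field, sep, value = line.partition(":")
            if not sep:
                continue  # skip lines that are not "field:value"
            if value.startswith(" "):
                value = value[1:]  # a single leading space is not part of the value
            if field == "data":
                event["data"] = json.loads(value)
            elif field in ("id", "event"):
                event[field] = value
        if event:
            events.append(event)
    return events
```

In the default, non-incremental mode the text of the last parsed event is the complete reply. Note that in the example above the intermediate events carry finish_reason as the string "null" rather than a JSON null.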

Error response example

When a request fails, the output indicates the cause of the error through code and message.

{
    "code":"InvalidApiKey",
    "message":"Invalid API-key provided.",
    "request_id":"fb53c4ec-1c12-4fc4-a580-cdb7c3261fc1"
}
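
On the client, an error body like this can be turned into an exception so that failures are not silently treated as replies. The sketch below assumes the body has already been decoded into a dict; `ApiError` and `raise_for_error` are hypothetical names, not part of the SDK.

```python
class ApiError(Exception):
    """Raised when a response body carries a non-empty error code."""

    def __init__(self, code, message, request_id=None):
        super().__init__(f"{code}: {message} (request_id={request_id})")
        self.code = code
        self.message = message
        self.request_id = request_id


def raise_for_error(body):
    """Return the body unchanged on success; raise ApiError if it reports an error."""
    code = body.get("code")
    if code:
        raise ApiError(code, body.get("message", ""), body.get("request_id"))
    return body
```

Including the request_id in the exception makes it easier to quote the failing call when contacting support.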

Status codes

For the general status codes of Model Studio, see Status codes.