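Data mining (Qwen-Doc-Turbo)

Last updated: 2025-08-14 15:59:34

The Qwen data mining model extracts structured information from documents and performs well in areas such as data annotation and content moderation.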

Important

This document applies only to the China (Beijing) region. To use the model, you must use an API key from the China (Beijing) region.

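Model overview

Model name: qwen-doc-turbo
Context length (tokens): 131,072
Maximum input (tokens): 129,024
Maximum output (tokens): 8,192
Input cost (per 1,000 tokens): 0.0006 yuan
Output cost (per 1,000 tokens): 0.001 yuan
Free quota: limited-time free trial, valid until July 31, 2025. After the trial ends, usage is billed at the standard price.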
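
As a rough illustration of how the per-1,000-token prices above translate into a charge, the sketch below estimates the cost of one request from its token counts. The helper name `estimate_cost_yuan` is hypothetical, the prices are copied from the overview, and the sketch ignores the free trial and any rounding the billing system may apply.

```python
from decimal import Decimal

# Prices copied from the model overview (yuan per 1,000 tokens).
INPUT_PRICE_PER_1K = Decimal("0.0006")
OUTPUT_PRICE_PER_1K = Decimal("0.001")


def estimate_cost_yuan(prompt_tokens: int, completion_tokens: int) -> Decimal:
    """Estimate the cost of one request in yuan (hypothetical helper)."""
    input_cost = Decimal(prompt_tokens) / 1000 * INPUT_PRICE_PER_1K
    output_cost = Decimal(completion_tokens) / 1000 * OUTPUT_PRICE_PER_1K
    return input_cost + output_cost


# Token counts taken from the usage field of the sample curl response in this document.
print(estimate_cost_yuan(5395, 71))
```

Decimal arithmetic is used here so that very small per-token prices do not pick up binary floating-point noise.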

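Quick start

You need to have obtained an API key and configured it as an environment variable. If you call the model through the OpenAI SDK or the DashScope SDK, you also need to install the SDK.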

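Choosing how to upload document content

Consider the following when choosing how to upload document content:

  1. Upload by file ID

    • Recommended: suited to documents that you reference and manage frequently. It reduces text input errors and is simple to use.

      The model can parse TXT, DOCX, DOC, PPTX, PPT, XLSX, XLS, and MD files through plain-text extraction and structured parsing. A single file cannot exceed 150 MB, and a single Alibaba Cloud account can upload at most 10,000 files with a total size of no more than 100 GB. If any limit is exceeded, delete some files or file content to meet the limits before uploading again. For details, see OpenAI file interface compatibility.
  2. Upload as plain text

    • Use case: suited to small documents or temporary content. If the document is short and does not need long-term storage, choose this method.

Choose the method that best fits your needs and the characteristics of your documents. We recommend uploading by file ID for the best experience.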
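
The file limits above can be checked locally before uploading. The following is a minimal sketch, not part of any SDK: the helper name `check_upload_candidate` is hypothetical, and it assumes that 150 MB means 150 * 1024 * 1024 bytes. It does not check the per-account limits (10,000 files, 100 GB), which only the platform can verify.

```python
from pathlib import Path
from typing import List

# File types listed in the section above.
SUPPORTED_EXTENSIONS = {".txt", ".docx", ".doc", ".pptx", ".ppt", ".xlsx", ".xls", ".md"}
# Assumption: 150 MB is interpreted as 150 * 1024 * 1024 bytes.
MAX_FILE_BYTES = 150 * 1024 * 1024


def check_upload_candidate(path: Path) -> List[str]:
    """Return a list of problems; an empty list means the file passes these local checks."""
    problems = []
    if path.suffix.lower() not in SUPPORTED_EXTENSIONS:
        problems.append(f"unsupported file type: {path.suffix or '(none)'}")
    if not path.is_file():
        problems.append("file does not exist")
    elif path.stat().st_size > MAX_FILE_BYTES:
        problems.append("file is larger than 150 MB")
    return problems


print(check_upload_candidate(Path("阿里云百炼系列手机产品介绍.docx")))
```

An empty list means the file can be passed to the upload call; otherwise each entry names one reason it would likely be rejected.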

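Passing document information by file ID

You can upload a document through the OpenAI-compatible interface and put the returned file ID in a System Message so that the model refers to the document when it replies.

Currently, file IDs can be used only with the Qwen-Long and Qwen-Doc-Turbo models and with Batch interface calls.

The Qwen-Doc-Turbo model can reply based on the documents you upload. This section uses 阿里云百炼系列手机产品介绍.docx as the sample file.

  1. Upload the file to the Alibaba Cloud Bailian platform through the OpenAI-compatible interface. Once the file is saved to the platform's secure storage, you receive a file ID. For the parameters and usage of the file upload interface, see the API documentation.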

    Python

    import os
    from pathlib import Path
    from openai import OpenAI
    
    client = OpenAI(
        api_key=os.getenv("DASHSCOPE_API_KEY"),  # If you have not configured the environment variable, replace this with your API key
        base_url="https://dashscope.aliyuncs.com/compatible-mode/v1",  # base_url of the DashScope service
    )
    
    file_object = client.files.create(file=Path("阿里云百炼系列手机产品介绍.docx"), purpose="file-extract")
    print(file_object.id)

    Java

    import com.openai.client.OpenAIClient;
    import com.openai.client.okhttp.OpenAIOkHttpClient;
    import com.openai.models.*;
    
    import java.nio.file.Path;
    import java.nio.file.Paths;
    
    public class Main {
        public static void main(String[] args) {
            // Create the client using the API key from the environment variable
            OpenAIClient client = OpenAIOkHttpClient.builder()
                    .apiKey(System.getenv("DASHSCOPE_API_KEY"))
                    .baseUrl("https://dashscope.aliyuncs.com/compatible-mode/v1")
                    .build();
            // Set the file path; change the path and file name as needed
            Path filePath = Paths.get("src/main/java/org/example/阿里云百炼系列手机产品介绍.docx");
            // Build the file upload parameters
            FileCreateParams fileParams = FileCreateParams.builder()
                    .file(filePath)
                    .purpose(FilePurpose.of("file-extract"))
                    .build();
    
            // Upload the file and print the file ID
            FileObject fileObject = client.files().create(fileParams);
            System.out.println(fileObject.id());
        }
    }

    curl

    curl --location --request POST 'https://dashscope.aliyuncs.com/compatible-mode/v1/files' \
      --header "Authorization: Bearer $DASHSCOPE_API_KEY" \
      --form 'file=@"阿里云百炼系列手机产品介绍.docx"' \
      --form 'purpose="file-extract"'

    Run the code above to get the file ID for the uploaded file.

  2. Pass the file ID in a System Message and enter your question in the User Message.

    Python

    import os
    from openai import OpenAI, BadRequestError
    
    client = OpenAI(
        api_key=os.getenv("DASHSCOPE_API_KEY"),
        base_url="https://dashscope.aliyuncs.com/compatible-mode/v1",
    )
    
    try:
        completion = client.chat.completions.create(
            model="qwen-doc-turbo",
            messages=[
                {'role': 'system', 'content': 'You are a helpful assistant.'},
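                # Replace 'file-fe-xxx' with the file ID used in your conversation.
                {'role': 'system', 'content': 'fileid://file-fe-xxx'},
                {'role': 'user', 'content': 'Which phones does Alibaba Cloud Bailian offer?'}
            ],
            # All code examples use streaming output to show the model's output process clearly. For non-streaming examples, see https://help.aliyun.com/zh/model-studio/text-generation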
            stream=True,
            stream_options={"include_usage": True}
        )
    
        full_content = ""
        for chunk in completion:
            if chunk.choices and chunk.choices[0].delta.content:
                full_content += chunk.choices[0].delta.content
                print(chunk.model_dump())
        
        print(full_content)
    
    except BadRequestError as e:
        print(f"Error: {e}")
        print("See the documentation: https://help.aliyun.com/zh/model-studio/developer-reference/error-code")

    Java

    import com.openai.client.OpenAIClient;
    import com.openai.client.okhttp.OpenAIOkHttpClient;
    import com.openai.core.http.StreamResponse;
    import com.openai.models.*;
    
    public class Main {
        public static void main(String[] args) {
            // Create the client using the API key from the environment variable
            OpenAIClient client = OpenAIOkHttpClient.builder()
                    .apiKey(System.getenv("DASHSCOPE_API_KEY"))
                    .baseUrl("https://dashscope.aliyuncs.com/compatible-mode/v1")
                    .build();
    
            ChatCompletionCreateParams chatParams = ChatCompletionCreateParams.builder()
                    .addSystemMessage("You are a helpful assistant.")
                    // Replace 'file-fe-xxx' with the file ID used in your conversation.
                    .addSystemMessage("fileid://file-fe-xxx")
                    .addUserMessage("Which phones does Alibaba Cloud Bailian offer?")
                    .model("qwen-doc-turbo")
                    .build();
    
            try (StreamResponse<ChatCompletionChunk> streamResponse = client.chat().completions().createStreaming(chatParams)) {
                streamResponse.stream().forEach(chunk -> {
                    String content = chunk.choices().get(0).delta().content().orElse("");
                    if (!content.isEmpty()) {
                        System.out.print(content);
                    }
                });
            } catch (Exception e) {
                System.err.println("Error: " + e.getMessage());
            }
        }
    }

    curl

    curl --location 'https://dashscope.aliyuncs.com/compatible-mode/v1/chat/completions' \
    --header "Authorization: Bearer $DASHSCOPE_API_KEY" \
    --header "Content-Type: application/json" \
    --data '{
        "model": "qwen-doc-turbo",
        "messages": [
            {"role": "system","content": "You are a helpful assistant."},
            {"role": "system","content": "fileid://file-fe-xxx"},
            {"role": "user","content": "Which phones does Alibaba Cloud Bailian offer?"}
        ],
        "stream": true,
        "stream_options": {
            "include_usage": true
        }
    }'
    When the stream and stream_options parameters are set, the Qwen-Doc-Turbo model streams its reply and reports token usage in the usage field of the last returned object.

    Python

    {'id': 'chatcmpl-ddbb10e9-dba2-930e-9e5b-xxxxxxxxxxxx', 'choices': [{'delta': {'content': '这篇文章', 'function_call': None, 'refusal': None, 'role': None, 'tool_calls': None}, 'finish_reason': None, 'index': 0, 'logprobs': None}], 'created': 1753076578, 'model': 'qwen-doc-turbo', 'object': 'chat.completion.chunk', 'service_tier': None, 'system_fingerprint': None, 'usage': None}
    {'id': 'chatcmpl-ddbb10e9-dba2-930e-9e5b-xxxxxxxxxxxx', 'choices': [{'delta': {'content': '是', 'function_call': None, 'refusal': None, 'role': None, 'tool_calls': None}, 'finish_reason': None, 'index': 0, 'logprobs': None}], 'created': 1753076578, 'model': 'qwen-doc-turbo', 'object': 'chat.completion.chunk', 'service_tier': None, 'system_fingerprint': None, 'usage': None}
    ...
    {'id': 'chatcmpl-ddbb10e9-dba2-930e-9e5b-xxxxxxxxxxxx', 'choices': [{'delta': {'content': '方面的强大功能和', 'function_call': None, 'refusal': None, 'role': None, 'tool_calls': None}, 'finish_reason': None, 'index': 0, 'logprobs': None}], 'created': 1753076578, 'model': 'qwen-doc-turbo', 'object': 'chat.completion.chunk', 'service_tier': None, 'system_fingerprint': None, 'usage': None}
    {'id': 'chatcmpl-ddbb10e9-dba2-930e-9e5b-xxxxxxxxxxxx', 'choices': [{'delta': {'content': '高性价比,吸引', 'function_call': None, 'refusal': None, 'role': None, 'tool_calls': None}, 'finish_reason': None, 'index': 0, 'logprobs': None}], 'created': 1753076578, 'model': 'qwen-doc-turbo', 'object': 'chat.completion.chunk', 'service_tier': None, 'system_fingerprint': None, 'usage': None}
    {'id': 'chatcmpl-ddbb10e9-dba2-930e-9e5b-xxxxxxxxxxxx', 'choices': [{'delta': {'content': '潜在消费者的关注。', 'function_call': None, 'refusal': None, 'role': None, 'tool_calls': None}, 'finish_reason': None, 'index': 0, 'logprobs': None}], 'created': 1753076578, 'model': 'qwen-doc-turbo', 'object': 'chat.completion.chunk', 'service_tier': None, 'system_fingerprint': None, 'usage': None}
    这篇文章是关于“百炼系列手机产品介绍”的内容。文章详细介绍了百炼系列的不同手机型号,包括它们的屏幕尺寸、分辨率、刷新率、存储空间、RAM、电池容量、摄像头配置、特色功能以及参考售价等信息。每一款手机都强调了其在特定领域的优势,如视觉体验、摄影体验、游戏性能、轻薄便携性以及折叠屏创新等。文章旨在展示百炼手机系列在各个方面的强大功能和高性价比,吸引潜在消费者的关注。

    Java

    ChatCompletionChunk{id=chatcmpl-2a9ebb9d-93ed-9342-8139-xxxxxxxxxxxx, choices=[Choice{delta=Delta{content=, functionCall=, refusal=, role=assistant, toolCalls=, additionalProperties={}}, finishReason=null, index=0, logprobs=null, additionalProperties={}}], created=1744943511, model=qwen-doc-turbo, object_=chat.completion.chunk, serviceTier=, systemFingerprint=null, usage=null, additionalProperties={}}
    ChatCompletionChunk{id=chatcmpl-2a9ebb9d-93ed-9342-8139-xxxxxxxxxxxx, choices=[Choice{delta=Delta{content=这篇文章, functionCall=, refusal=, role=, toolCalls=, additionalProperties={}}, finishReason=null, index=0, logprobs=null, additionalProperties={}}], created=1744943511, model=qwen-doc-turbo, object_=chat.completion.chunk, serviceTier=, systemFingerprint=null, usage=null, additionalProperties={}}
    ChatCompletionChunk{id=chatcmpl-2a9ebb9d-93ed-9342-8139-xxxxxxxxxxxx, choices=[Choice{delta=Delta{content=介绍了, functionCall=, refusal=, role=, toolCalls=, additionalProperties={}}, finishReason=null, index=0, logprobs=null, additionalProperties={}}], created=1744943511, model=qwen-doc-turbo, object_=chat.completion.chunk, serviceTier=, systemFingerprint=null, usage=null, additionalProperties={}}
    ...
    ChatCompletionChunk{id=chatcmpl-2a9ebb9d-93ed-9342-8139-xxxxxxxxxxxx, choices=[Choice{delta=Delta{content=手中的“科技艺术品, functionCall=, refusal=, role=, toolCalls=, additionalProperties={}}, finishReason=null, index=0, logprobs=null, additionalProperties={}}], created=1744943511, model=qwen-doc-turbo, object_=chat.completion.chunk, serviceTier=, systemFingerprint=null, usage=null, additionalProperties={}}
    ChatCompletionChunk{id=chatcmpl-2a9ebb9d-93ed-9342-8139-xxxxxxxxxxxx, choices=[Choice{delta=Delta{content=”。, functionCall=, refusal=, role=, toolCalls=, additionalProperties={}}, finishReason=null, index=0, logprobs=null, additionalProperties={}}], created=1744943511, model=qwen-doc-turbo, object_=chat.completion.chunk, serviceTier=, systemFingerprint=null, usage=null, additionalProperties={}}
    ChatCompletionChunk{id=chatcmpl-2a9ebb9d-93ed-9342-8139-xxxxxxxxxxxx, choices=[Choice{delta=Delta{content=, functionCall=, refusal=, role=, toolCalls=, additionalProperties={}}, finishReason=stop, index=0, logprobs=null, additionalProperties={}}], created=1744943511, model=qwen-doc-turbo, object_=chat.completion.chunk, serviceTier=, systemFingerprint=null, usage=null, additionalProperties={}}
    这篇文章介绍了多个品牌的手机产品,具体描述了每一款手机的主要特点和卖点,包括屏幕尺寸.....每款手机都强调了自己的独特之处,力求成为用户手中的“科技艺术品”。

    curl

    data: {"choices":[{"delta":{"content":"","role":"assistant"},"index":0,"logprobs":null,"finish_reason":null}],"object":"chat.completion.chunk","usage":null,"created":1728649489,"system_fingerprint":null,"model":"qwen-doc-turbo","id":"chatcmpl-e2434284-140a-9e3a-8ca5-f81e65e98d01"}
    
    data: {"choices":[{"finish_reason":null,"delta":{"content":"这篇文章"},"index":0,"logprobs":null}],"object":"chat.completion.chunk","usage":null,"created":1728649489,"system_fingerprint":null,"model":"qwen-doc-turbo","id":"chatcmpl-e2434284-140a-9e3a-8ca5-f81e65e98d01"}
    
    data: {"choices":[{"delta":{"content":"是"},"finish_reason":null,"index":0,"logprobs":null}],"object":"chat.completion.chunk","usage":null,"created":1728649489,"system_fingerprint":null,"model":"qwen-doc-turbo","id":"chatcmpl-e2434284-140a-9e3a-8ca5-f81e65e98d01"}
    
    data: {"choices":[{"delta":{"content":"关于"},"finish_reason":null,"index":0,"logprobs":null}],"object":"chat.completion.chunk","usage":null,"created":1728649489,"system_fingerprint":null,"model":"qwen-doc-turbo","id":"chatcmpl-e2434284-140a-9e3a-8ca5-f81e65e98d01"}
    
    .....
    
    data: {"choices":[{"delta":{"content":"描述了每款"},"finish_reason":null,"index":0,"logprobs":null}],"object":"chat.completion.chunk","usage":null,"created":1728649489,"system_fingerprint":null,"model":"qwen-doc-turbo","id":"chatcmpl-e2434284-140a-9e3a-8ca5-f81e65e98d01"}
    
    data: {"choices":[{"delta":{"content":"手机的主要特点和"},"finish_reason":null,"index":0,"logprobs":null}],"object":"chat.completion.chunk","usage":null,"created":1728649489,"system_fingerprint":null,"model":"qwen-doc-turbo","id":"chatcmpl-e2434284-140a-9e3a-8ca5-f81e65e98d01"}
    
    data: {"choices":[{"delta":{"content":"规格,并提供了参考"},"finish_reason":null,"index":0,"logprobs":null}],"object":"chat.completion.chunk","usage":null,"created":1728649489,"system_fingerprint":null,"model":"qwen-doc-turbo","id":"chatcmpl-e2434284-140a-9e3a-8ca5-f81e65e98d01"}
    
    data: {"choices":[{"delta":{"content":"售价信息。"},"finish_reason":null,"index":0,"logprobs":null}],"object":"chat.completion.chunk","usage":null,"created":1728649489,"system_fingerprint":null,"model":"qwen-doc-turbo","id":"chatcmpl-e2434284-140a-9e3a-8ca5-f81e65e98d01"}
    
    data: {"choices":[{"finish_reason":"stop","delta":{"content":""},"index":0,"logprobs":null}],"object":"chat.completion.chunk","usage":null,"created":1728649489,"system_fingerprint":null,"model":"qwen-doc-turbo","id":"chatcmpl-e2434284-140a-9e3a-8ca5-f81e65e98d01"}
    
    data: {"choices":[],"object":"chat.completion.chunk","usage":{"prompt_tokens":5395,"completion_tokens":71,"total_tokens":5466},"created":1728649489,"system_fingerprint":null,"model":"qwen-doc-turbo","id":"chatcmpl-e2434284-140a-9e3a-8ca5-f81e65e98d01"}
    
    data: [DONE]
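
Because the usage information arrives in a final chunk whose choices list is empty, a loop that only looks at chunks with content never sees it. The sketch below shows one way to collect both the text and the usage object. The helper name `collect_stream` is hypothetical, and the stand-in chunks only mimic the shape of the streamed objects shown above so that the example runs without a network call.

```python
from types import SimpleNamespace


def collect_stream(chunks):
    """Accumulate streamed text and return it together with the usage object, if any.

    `chunks` is any iterable of objects shaped like the SDK's streaming chunks:
    each has a `choices` list and a `usage` attribute.
    """
    parts = []
    usage = None
    for chunk in chunks:
        # The usage-only chunk at the end has an empty `choices` list.
        if chunk.choices and chunk.choices[0].delta.content:
            parts.append(chunk.choices[0].delta.content)
        if chunk.usage is not None:
            usage = chunk.usage
    return "".join(parts), usage


# Stand-in chunks for illustration only; real chunks come from
# client.chat.completions.create(..., stream=True, stream_options={"include_usage": True}).
def _fake_chunk(content=None, usage=None):
    choices = [] if content is None else [SimpleNamespace(delta=SimpleNamespace(content=content))]
    return SimpleNamespace(choices=choices, usage=usage)


demo = [
    _fake_chunk("Hello"),
    _fake_chunk(", world"),
    _fake_chunk(usage=SimpleNamespace(prompt_tokens=5395, completion_tokens=71, total_tokens=5466)),
]
text, usage = collect_stream(demo)
print(text)
print(usage.total_tokens)
```

With a real client, pass the object returned by the streaming call directly as `chunks`.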

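Passing document information as plain text

Besides passing document information by file ID, you can pass the document content directly as a string. With this method, make sure the first message in messages carries the role setting, so that the model does not confuse the role setting with the document content.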
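
The ordering rule above can be sketched as a small helper that always places the role setting first, the document text second, and the question last. The name `build_plain_text_messages` and the empty-document check are assumptions made for illustration. The full request examples follow.

```python
DEFAULT_ROLE_PROMPT = "You are a helpful assistant."


def build_plain_text_messages(document_text, question, role_prompt=DEFAULT_ROLE_PROMPT):
    """Build a messages list: role setting first, then the document, then the question."""
    if not document_text.strip():
        raise ValueError("document_text is empty")
    return [
        {"role": "system", "content": role_prompt},    # role setting must come first
        {"role": "system", "content": document_text},  # document content as plain text
        {"role": "user", "content": question},
    ]


messages = build_plain_text_messages(
    "Sample document text goes here.",
    "What is this document about?",
)
print([m["role"] for m in messages])
```

The returned list can be passed as the messages argument of a chat completion request.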

Python

import os
from openai import OpenAI, BadRequestError

client = OpenAI(
    api_key=os.getenv("DASHSCOPE_API_KEY"),
    base_url="https://dashscope.aliyuncs.com/compatible-mode/v1",
)

try:
    completion = client.chat.completions.create(
        model="qwen-doc-turbo",
        messages=[
            {'role': 'system', 'content': 'You are a helpful assistant.'},
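            # The document content is passed directly as plain text.
            {'role': 'system', 'content': 'Alibaba Cloud Bailian phone product introduction. Alibaba Cloud Bailian X1: Enjoy the ultimate view. Equipped with a 6.7-inch 1440 x 3200 ultra-clear display...'},
            {'role': 'user', 'content': 'Which phones does Alibaba Cloud Bailian offer?'}
        ],
        # All code examples use streaming output to show the model's output process clearly. For non-streaming examples, see https://help.aliyun.com/zh/model-studio/text-generation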
        stream=True,
        stream_options={"include_usage": True}
    )

    full_content = ""
    for chunk in completion:
        if chunk.choices and chunk.choices[0].delta.content:
            full_content += chunk.choices[0].delta.content
            print(chunk.model_dump())
    
    print(full_content)

except BadRequestError as e:
    print(f"Error: {e}")
    print("See the documentation: https://help.aliyun.com/zh/model-studio/developer-reference/error-code")

Java

import com.openai.client.OpenAIClient;
import com.openai.client.okhttp.OpenAIOkHttpClient;
import com.openai.core.http.StreamResponse;
import com.openai.models.*;

public class Main {
    public static void main(String[] args) {
        // Create the client using the API key from the environment variable
        OpenAIClient client = OpenAIOkHttpClient.builder()
                .apiKey(System.getenv("DASHSCOPE_API_KEY"))
                .baseUrl("https://dashscope.aliyuncs.com/compatible-mode/v1")
                .build();

        ChatCompletionCreateParams chatParams = ChatCompletionCreateParams.builder()
                .addSystemMessage("You are a helpful assistant.")
                // The document content is passed directly as plain text.
                .addSystemMessage("Alibaba Cloud Bailian phone product introduction. Alibaba Cloud Bailian X1: Enjoy the ultimate view. Equipped with a 6.7-inch 1440 x 3200 ultra-clear display...")
                .addUserMessage("Which phones does Alibaba Cloud Bailian offer?")
                .model("qwen-doc-turbo")
                .build();

        try (StreamResponse<ChatCompletionChunk> streamResponse = client.chat().completions().createStreaming(chatParams)) {
            streamResponse.stream().forEach(chunk -> {
                String content = chunk.choices().get(0).delta().content().orElse("");
                if (!content.isEmpty()) {
                    System.out.print(content);
                }
            });
        } catch (Exception e) {
            System.err.println("Error: " + e.getMessage());
        }
    }
}

curl

curl --location 'https://dashscope.aliyuncs.com/compatible-mode/v1/chat/completions' \
--header "Authorization: Bearer $DASHSCOPE_API_KEY" \
--header "Content-Type: application/json" \
--data '{
    "model": "qwen-doc-turbo",
    "messages": [
        {"role": "system","content": "You are a helpful assistant."},
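        {"role": "system","content": "Alibaba Cloud Bailian X1: Enjoy the ultimate view. Equipped with a 6.7-inch 1440 x 3200 ultra-clear display with a 120Hz refresh rate,..."},
        {"role": "user","content": "Which phones does Alibaba Cloud Bailian offer?"}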
    ],
    "stream": true,
    "stream_options": {
        "include_usage": true
    }
}'
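
The curl requests in this document return the reply as a stream of `data:` lines, ending with a usage-only event and `data: [DONE]`. The following sketch shows one way to reassemble the text and pick up the usage figures from such lines. The helper name `parse_sse_lines` is hypothetical, and the sample lines are shortened, made-up stand-ins modeled on the response format shown earlier, not real model output.

```python
import json


def parse_sse_lines(lines):
    """Return (text, usage) from an iterable of raw `data: ...` lines."""
    parts = []
    usage = None
    for raw in lines:
        line = raw.strip()
        if not line.startswith("data:"):
            continue  # skip blank separator lines
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":
            break  # end-of-stream marker
        event = json.loads(payload)
        for choice in event.get("choices", []):
            content = choice.get("delta", {}).get("content")
            if content:
                parts.append(content)
        if event.get("usage"):
            usage = event["usage"]
    return "".join(parts), usage


# Shortened stand-in events for illustration only.
sample = [
    'data: {"choices":[{"delta":{"content":"","role":"assistant"},"index":0,"finish_reason":null}],"usage":null}',
    '',
    'data: {"choices":[{"delta":{"content":"Hello"},"index":0,"finish_reason":null}],"usage":null}',
    'data: {"choices":[{"delta":{"content":" there"},"index":0,"finish_reason":"stop"}],"usage":null}',
    'data: {"choices":[],"usage":{"prompt_tokens":5395,"completion_tokens":71,"total_tokens":5466}}',
    'data: [DONE]',
]
text, usage = parse_sse_lines(sample)
print(text)
print(usage)
```

In practice the lines would come from the HTTP response body read line by line rather than from a hard-coded list.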

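FAQ

  1. Is calling through the DashScope SDK supported?

    Yes. Model calls are compatible with the DashScope SDK, but file upload is available only through the OpenAI SDK.

  2. Can different API keys share a file ID?

    Only API keys within the same Alibaba Cloud account can share file IDs.

  3. Where are files stored after they are uploaded through the OpenAI-compatible file interface?

    They are stored in the Bailian storage space of the current Alibaba Cloud account, at no charge. To query and manage uploaded files, see OpenAI file interface.

  4. Can a file ID be used with other models or features?

    Currently, file IDs can be used only for conversations with the Qwen-Long and Qwen-Doc-Turbo models and for batch calls through the Batch interface.

  5. Is there a sample request with non-streaming output?

    See the non-streaming output examples.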

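API reference

For the input and output parameters of the Qwen-Doc-Turbo model, see the Qwen API details.

Error codes

If a model call fails and returns an error message, see Error messages to resolve it.

Rate limits

For the conditions that trigger model rate limiting, see Rate limits.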
