Web extractor


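Large models cannot fetch web page data directly. The web extractor tool visits a specified URL and extracts its content, giving the model the information it needs.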

Usage

The web extractor can be called in three ways, and each one is enabled with different parameters:

OpenAI compatible - Responses API

Enable the web extractor through the tools parameter, and:

  • Add both the web_search (web search) and web_extractor (web extractor) tools;

  • Set the enable_thinking parameter to turn on thinking mode.

For the best responses, we recommend also enabling the code_interpreter tool.
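# Import dependencies and create the client...
response = client.responses.create(
    model="qwen3-max-2026-01-23",
    input="Visit the official Alibaba Cloud Model Studio documentation on the code interpreter and summarize its main content",
    tools=[
        # The web extractor requires the web search tool to be enabled as well
        {"type": "web_search"},
        {"type": "web_extractor"},
        {"type": "code_interpreter"}
    ],
    extra_body={
        # Thinking mode is required
        "enable_thinking": True
    }
)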

print(response.output_text)
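The requirements above can be checked in code before a request is sent. The following is a minimal sketch; `check_web_extractor_request` is a hypothetical helper, not part of the OpenAI SDK or Model Studio, and it only encodes the two rules stated in this section (web_search must accompany web_extractor, and enable_thinking must be true).

```python
def check_web_extractor_request(tools, extra_body):
    """Return a list of problems with a Responses API request that uses web_extractor.

    Hypothetical helper: it only checks the rules described in this document.
    """
    problems = []
    tool_types = {tool.get("type") for tool in tools}
    if "web_extractor" not in tool_types:
        problems.append("web_extractor tool is missing")
    elif "web_search" not in tool_types:
        # The web extractor only works together with web search
        problems.append("web_extractor requires the web_search tool")
    if not extra_body.get("enable_thinking"):
        problems.append("enable_thinking must be True")
    return problems


tools = [{"type": "web_search"}, {"type": "web_extractor"}, {"type": "code_interpreter"}]
print(check_web_extractor_request(tools, {"enable_thinking": True}))  # prints []
```

An empty list means the request satisfies both rules; code_interpreter is only recommended, so the helper does not require it.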

OpenAI compatible - Chat Completions API

Enable web search with the enable_search parameter, and set search_strategy to agent_max to enable the web extractor. You must also set the enable_thinking parameter to turn on thinking mode.

Non-streaming output is not supported.
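# Import dependencies and create the client...
completion = client.chat.completions.create(
    model="qwen3-max-2026-01-23",
    messages=[{"role": "user", "content": "Visit the official Alibaba Cloud Model Studio documentation on the code interpreter and summarize its main content"}],
    extra_body={
        "enable_thinking": True,
        "enable_search": True,
        "search_options": {"search_strategy": "agent_max"}
    },
    stream=True
)

# Print the response content as it streams in
for chunk in completion:
    # Skip chunks without choices, such as a final usage-only chunk
    if chunk.choices and chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="", flush=True)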

DashScope

Enable web search with the enable_search parameter, and set search_strategy to agent_max to enable the web extractor. You must also set the enable_thinking parameter to turn on thinking mode.

Non-streaming output is not supported.
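from dashscope import Generation

response = Generation.call(
    model="qwen3-max-2026-01-23",
    messages=[{"role": "user", "content": "Visit the official Alibaba Cloud Model Studio documentation on the code interpreter and summarize its main content"}],
    enable_search=True,
    search_options={"search_strategy": "agent_max"},
    enable_thinking=True,
    result_format="message",
    stream=True,
    incremental_output=True
)

# Print the response content as it streams in
for chunk in response:
    if chunk.status_code == 200 and chunk.output.choices[0].message.content:
        print(chunk.output.choices[0].message.content, end="", flush=True)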
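The DashScope Python SDK takes these options as keyword arguments, while the HTTP API used in the DashScope curl example later on this page nests them under a "parameters" object next to "input", and requests streaming through the X-DashScope-SSE: enable header instead of a body field. A minimal sketch of that mapping; `to_rest_payload` is a hypothetical helper whose key layout is copied from that curl example.

```python
import json


def to_rest_payload(model, messages, **parameters):
    """Nest DashScope SDK keyword arguments under "parameters", as in the curl example.

    Hypothetical helper. Streaming is not a body field here; the curl example
    enables it with the X-DashScope-SSE: enable header.
    """
    return {
        "model": model,
        "input": {"messages": messages},
        "parameters": parameters,
    }


payload = to_rest_payload(
    "qwen3-max-2026-01-23",
    [{"role": "user", "content": "Summarize the page at the given URL"}],
    enable_search=True,
    search_options={"search_strategy": "agent_max"},
    enable_thinking=True,
    result_format="message",
    incremental_output=True,
)
print(json.dumps(payload, indent=2))
```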

Supported models

Only qwen3-max-2026-01-23 in thinking mode is supported. The Responses API is available only in the Chinese Mainland region.

Quick start

Run the following code to call the web extractor tool through the Responses API and automatically summarize a technical document.

Before you begin, obtain an API Key and configure the API Key as an environment variable.
import os
from openai import OpenAI

client = OpenAI(
    # If you have not configured the environment variable, replace the next line with your Model Studio API Key: api_key="sk-xxx" (not recommended)
    api_key=os.getenv("DASHSCOPE_API_KEY"),
    base_url="https://dashscope.aliyuncs.com/api/v2/apps/protocols/compatible-mode/v1"
)

response = client.responses.create(
    model="qwen3-max-2026-01-23",
    input="Visit the official Alibaba Cloud Model Studio documentation on the code interpreter and summarize its main content",
    tools=[
        {
            "type": "web_search"
        },
        {
            "type": "web_extractor"
        },
        {
            "type": "code_interpreter"
        }
    ],
    extra_body={
        "enable_thinking": True
    }
)
# Uncomment the next line to view the intermediate output
# print(response.output)
print("="*20+"Response"+"="*20)
print(response.output_text)
# Print tool call counts
usage = response.usage
print("="*20+"Tool call counts"+"="*20)
if hasattr(usage, 'x_tools') and usage.x_tools:
    print(f"\nWeb extractor calls: {usage.x_tools.get('web_extractor', {}).get('count', 0)}")
import OpenAI from "openai";
import process from 'process';

const openai = new OpenAI({
    // If you have not configured the environment variable, replace the next line with your Model Studio API Key: apiKey: "sk-xxx",
    apiKey: process.env.DASHSCOPE_API_KEY,
    baseURL: "https://dashscope.aliyuncs.com/api/v2/apps/protocols/compatible-mode/v1"
});

async function main() {
    const response = await openai.responses.create({
        model: "qwen3-max-2026-01-23",
        input: "Visit the official Alibaba Cloud Model Studio documentation on the code interpreter and summarize its main content",
        tools: [
            { type: "web_search" },
            { type: "web_extractor" },
            { type: "code_interpreter" }
        ],
        enable_thinking: true
    });

    console.log("====================Response====================");
    console.log(response.output_text);

    // Print tool call counts
    console.log("====================Tool call counts====================");
    if (response.usage && response.usage.x_tools) {
        console.log(`Web extractor calls: ${response.usage.x_tools.web_extractor?.count || 0}`);
        console.log(`Web search calls: ${response.usage.x_tools.web_search?.count || 0}`);
    }
    // Uncomment the next line to view the intermediate output
    // console.log(JSON.stringify(response.output[0], null, 2));
}

main();
curl -X POST https://dashscope.aliyuncs.com/api/v2/apps/protocols/compatible-mode/v1/responses \
-H "Authorization: Bearer $DASHSCOPE_API_KEY" \
-H "Content-Type: application/json" \
-d '{
    "model": "qwen3-max-2026-01-23",
    "input": "Visit the official Alibaba Cloud Model Studio documentation on the code interpreter and summarize its main content",
    "tools": [
        {"type": "web_search"},
        {"type": "web_extractor"},
        {"type": "code_interpreter"}
    ],
    "enable_thinking": true
}'

Running the code above returns a response similar to the following:

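====================Response====================
Based on the official Alibaba Cloud Model Studio documentation, here is a summary of the core content of the **code interpreter** feature:

## 1. Feature overview

...

> **Sources**: Alibaba Cloud Model Studio official documentation - [Qwen code interpreter](https://help.aliyun.com/zh/model-studio/qwen-code-interpreter) and [Assistant API code interpreter](https://help.aliyun.com/zh/model-studio/code-interpreter) (updated: December 2025)
====================Tool call counts====================

Web extractor calls: 1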
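The counts printed above are read from `usage.x_tools`. A small sketch that gathers them into one dict, assuming (as the Quick start code does) that `usage.x_tools` is a dict shaped like `{"web_extractor": {"count": 1}}`; `tool_call_counts` is a hypothetical helper, and a `SimpleNamespace` stands in for `response.usage` so the example runs without a network call.

```python
from types import SimpleNamespace


def tool_call_counts(usage, tools=("web_extractor", "web_search")):
    """Read per-tool call counts from a Responses API usage object.

    Assumes usage.x_tools is a dict such as {"web_extractor": {"count": 1}};
    a missing x_tools field or a missing tool counts as 0.
    """
    x_tools = getattr(usage, "x_tools", None) or {}
    return {name: x_tools.get(name, {}).get("count", 0) for name in tools}


# Stand-in for response.usage, shaped like the sample output above
usage = SimpleNamespace(x_tools={"web_extractor": {"count": 1}})
print(tool_call_counts(usage))  # prints {'web_extractor': 1, 'web_search': 0}
```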

Streaming output

Web extraction can take a while, so we recommend enabling streaming output to receive intermediate results in real time.

OpenAI compatible - Responses API

import os
from openai import OpenAI

client = OpenAI(
    # If you have not configured the environment variable, replace the next line with your Model Studio API Key: api_key="sk-xxx",
    api_key=os.getenv("DASHSCOPE_API_KEY"),
    base_url="https://dashscope.aliyuncs.com/api/v2/apps/protocols/compatible-mode/v1"
)

stream = client.responses.create(
    model="qwen3-max-2026-01-23",
    input="Visit the official Alibaba Cloud Model Studio documentation on the code interpreter and summarize its main content",
    tools=[
        {"type": "web_search"},
        {"type": "web_extractor"},
        {"type": "code_interpreter"}
    ],
    stream=True,
    extra_body={"enable_thinking": True}
)

reasoning_started = False
output_started = False

for chunk in stream:
    # Print the reasoning process
    if chunk.type == 'response.reasoning_summary_text.delta':
        if not reasoning_started:
            print("="*20 + "Reasoning" + "="*20)
            reasoning_started = True
        print(chunk.delta, end='', flush=True)
    # Print each completed tool call
    elif chunk.type == 'response.output_item.done':
        if hasattr(chunk, 'item') and hasattr(chunk.item, 'type'):
            if chunk.item.type == 'web_extractor_call':
                print("\n" + "="*20 + "Tool call" + "="*20)
                print(chunk.item.goal)
                print(chunk.item.output)
            elif chunk.item.type == 'reasoning':
                reasoning_started = False
    # Print the response content
    elif chunk.type == 'response.output_text.delta':
        if not output_started:
            print("\n" + "="*20 + "Response" + "="*20)
            output_started = True
        print(chunk.delta, end='', flush=True)
    # When the response completes, print tool call counts
    elif chunk.type == 'response.completed':
        print("\n" + "="*20 + "Tool call counts" + "="*20)
        usage = chunk.response.usage
        if hasattr(usage, 'x_tools') and usage.x_tools:
            print(f"Web extractor calls: {usage.x_tools.get('web_extractor', {}).get('count', 0)}")
            print(f"Web search calls: {usage.x_tools.get('web_search', {}).get('count', 0)}")
import OpenAI from "openai";
import process from 'process';

const openai = new OpenAI({
    // If you have not configured the environment variable, replace the next line with your Model Studio API Key: apiKey: "sk-xxx",
    apiKey: process.env.DASHSCOPE_API_KEY,
    baseURL: "https://dashscope.aliyuncs.com/api/v2/apps/protocols/compatible-mode/v1"
});

async function main() {
    const stream = await openai.responses.create({
        model: "qwen3-max-2026-01-23",
        input: "Visit the official Alibaba Cloud Model Studio documentation on the code interpreter and summarize its main content",
        tools: [
            { type: "web_search" },
            { type: "web_extractor" },
            { type: "code_interpreter" }
        ],
        stream: true,
        enable_thinking: true
    });

    let reasoningStarted = false;
    let outputStarted = false;

    for await (const chunk of stream) {
        // Print the reasoning process
        if (chunk.type === 'response.reasoning_summary_text.delta') {
            if (!reasoningStarted) {
                console.log("====================Reasoning====================");
                reasoningStarted = true;
            }
            process.stdout.write(chunk.delta);
        }
        // Print each completed tool call
        else if (chunk.type === 'response.output_item.done') {
            if (chunk.item && chunk.item.type === 'web_extractor_call') {
                console.log("\n" + "====================Tool call====================");
                console.log(chunk.item.goal);
                console.log(chunk.item.output);
            } else if (chunk.item && chunk.item.type === 'reasoning') {
                reasoningStarted = false;
            }
        }
        // Print the response content
        else if (chunk.type === 'response.output_text.delta') {
            if (!outputStarted) {
                console.log("\n" + "====================Response====================");
                outputStarted = true;
            }
            process.stdout.write(chunk.delta);
        }
        // When the response completes, print tool call counts
        else if (chunk.type === 'response.completed') {
            console.log("\n" + "====================Tool call counts====================");
            const usage = chunk.response.usage;
            if (usage && usage.x_tools) {
                console.log(`Web extractor calls: ${usage.x_tools.web_extractor?.count || 0}`);
                console.log(`Web search calls: ${usage.x_tools.web_search?.count || 0}`);
            }
        }
    }
}

main();
curl -X POST https://dashscope.aliyuncs.com/api/v2/apps/protocols/compatible-mode/v1/responses \
-H "Authorization: Bearer $DASHSCOPE_API_KEY" \
-H "Content-Type: application/json" \
-d '{
    "model": "qwen3-max-2026-01-23",
    "input": "Visit the official Alibaba Cloud Model Studio documentation on the code interpreter and summarize its main content",
    "tools": [
        {"type": "web_search"},
        {"type": "web_extractor"},
        {"type": "code_interpreter"}
    ],
    "enable_thinking": true,
    "stream": true
}'
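When the output is needed as a whole rather than printed as it arrives, the same event types can be collected instead. A minimal sketch, assuming events expose `type`, `delta` and `item` as in the streaming examples above; `collect_stream` is a hypothetical helper, and `SimpleNamespace` objects stand in for real SDK events so the logic runs offline. With a real call, pass the `stream` object returned by `client.responses.create(..., stream=True)`.

```python
from types import SimpleNamespace


def collect_stream(events):
    """Fold Responses API stream events into reasoning text, tool calls, and answer text.

    Handles the same event types as the streaming examples; all other event
    types are ignored.
    """
    reasoning, answer, extractor_calls = [], [], []
    for event in events:
        if event.type == "response.reasoning_summary_text.delta":
            reasoning.append(event.delta)
        elif event.type == "response.output_text.delta":
            answer.append(event.delta)
        elif event.type == "response.output_item.done":
            item = getattr(event, "item", None)
            if getattr(item, "type", None) == "web_extractor_call":
                extractor_calls.append(item)
    return "".join(reasoning), extractor_calls, "".join(answer)


# Hypothetical events standing in for a real stream
events = [
    SimpleNamespace(type="response.reasoning_summary_text.delta", delta="Need the docs. "),
    SimpleNamespace(type="response.output_item.done",
                    item=SimpleNamespace(type="web_extractor_call", goal="read page", output="...")),
    SimpleNamespace(type="response.output_text.delta", delta="Summary: "),
    SimpleNamespace(type="response.output_text.delta", delta="done."),
]
reasoning, calls, answer = collect_stream(events)
print(answer)  # prints Summary: done.
```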

OpenAI compatible - Chat Completions API

import os
from openai import OpenAI

client = OpenAI(
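    # If you have not configured the environment variable, replace the next line with your Model Studio API Key: api_key="sk-xxx",
    api_key=os.getenv("DASHSCOPE_API_KEY"),
    base_url="https://dashscope.aliyuncs.com/compatible-mode/v1"
)

stream = client.chat.completions.create(
    model="qwen3-max-2026-01-23",
    messages=[
        {"role": "user", "content": "Visit the official Alibaba Cloud Model Studio documentation on the code interpreter and summarize its main content"}
    ],
    extra_body={
        # Thinking mode is required for web extraction
        "enable_thinking": True,
        "enable_search": True,
        "search_options": {"search_strategy": "agent_max"}
    },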
    stream=True
)

for chunk in stream:
    print(chunk)
import OpenAI from "openai";
import process from 'process';

const openai = new OpenAI({
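    // If you have not configured the environment variable, replace the next line with your Model Studio API Key: apiKey: "sk-xxx",
    apiKey: process.env.DASHSCOPE_API_KEY,
    baseURL: "https://dashscope.aliyuncs.com/compatible-mode/v1"
});

async function main() {
    const stream = await openai.chat.completions.create({
        model: "qwen3-max-2026-01-23",
        messages: [
            { role: "user", content: "Visit the official Alibaba Cloud Model Studio documentation on the code interpreter and summarize its main content" }
        ],
        // Thinking mode is required for web extraction
        enable_thinking: true,
        enable_search: true,
        search_options: { search_strategy: "agent_max" },
        stream: true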
    });

    for await (const chunk of stream) {
        console.log(chunk);
    }
}

main();
curl -X POST https://dashscope.aliyuncs.com/compatible-mode/v1/chat/completions \
-H "Authorization: Bearer $DASHSCOPE_API_KEY" \
-H "Content-Type: application/json" \
-d '{
    "model": "qwen3-max-2026-01-23",
    "messages": [
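        {"role": "user", "content": "Visit the official Alibaba Cloud Model Studio documentation on the code interpreter and summarize its main content"}
    ],
    "enable_thinking": true,
    "enable_search": true,
    "search_options": {"search_strategy": "agent_max"},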
    "stream": true
}'
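The Chat Completions examples above print raw chunks. To assemble the text instead, the deltas can be concatenated. A minimal sketch, assuming the thinking text arrives in a non-standard `reasoning_content` field on each delta (read with `getattr` because it is not part of the OpenAI schema); `collect_chat_stream` and `fake_chunk` are hypothetical helpers, and `SimpleNamespace` objects stand in for real chunks.

```python
from types import SimpleNamespace


def collect_chat_stream(chunks):
    """Concatenate reasoning_content and content deltas from Chat Completions chunks.

    Chunks without choices (for example a final usage-only chunk) are skipped.
    """
    reasoning, content = [], []
    for chunk in chunks:
        if not chunk.choices:
            continue
        delta = chunk.choices[0].delta
        if getattr(delta, "reasoning_content", None):
            reasoning.append(delta.reasoning_content)
        if getattr(delta, "content", None):
            content.append(delta.content)
    return "".join(reasoning), "".join(content)


def fake_chunk(**delta):
    """Build a stand-in chunk with one choice carrying the given delta fields."""
    return SimpleNamespace(choices=[SimpleNamespace(delta=SimpleNamespace(**delta))])


chunks = [
    fake_chunk(reasoning_content="Look up the page. ", content=None),
    fake_chunk(content="Summary: "),
    fake_chunk(content="done."),
    SimpleNamespace(choices=[]),  # usage-only final chunk
]
reasoning, content = collect_chat_stream(chunks)
print(content)  # prints Summary: done.
```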

DashScope

The Java SDK is not supported.
import os
import dashscope
from dashscope import Generation

# If you have not configured the environment variable, replace the next line with your Model Studio API Key: dashscope.api_key = "sk-xxx"
dashscope.api_key = os.getenv("DASHSCOPE_API_KEY")

response = Generation.call(
    model="qwen3-max-2026-01-23",
    messages=[
        {"role": "user", "content": "Visit the official Alibaba Cloud Model Studio documentation on the code interpreter and summarize its main content"}
    ],
    enable_search=True,
    search_options={"search_strategy": "agent_max"},
    enable_thinking=True,
    result_format="message",
    stream=True,
    incremental_output=True
)

reasoning_started = False
output_started = False
last_usage = None

for chunk in response:
    if chunk.status_code == 200:
        message = chunk.output.choices[0].message

        # Print the reasoning process
        if hasattr(message, 'reasoning_content') and message.reasoning_content:
            if not reasoning_started:
                print("="*20 + "Reasoning" + "="*20)
                reasoning_started = True
            print(message.reasoning_content, end='', flush=True)

        # Print the response content
        if hasattr(message, 'content') and message.content:
            if not output_started:
                print("\n" + "="*20 + "Response" + "="*20)
                output_started = True
            print(message.content, end='', flush=True)

        # Keep the latest usage information
        if hasattr(chunk, 'usage') and chunk.usage:
            last_usage = chunk.usage
    else:
        # Report failed chunks instead of silently skipping them
        print(f"\nRequest failed: {chunk.code} {chunk.message}")

# Print tool call counts
if last_usage:
    print("\n" + "="*20 + "Tool call counts" + "="*20)
    if hasattr(last_usage, 'plugins') and last_usage.plugins:
        print(f"Web extractor calls: {last_usage.plugins.get('web_extractor', {}).get('count', 0)}")
curl -X POST https://dashscope.aliyuncs.com/api/v1/services/aigc/text-generation/generation \
-H "Authorization: Bearer $DASHSCOPE_API_KEY" \
-H "X-DashScope-SSE: enable" \
-H "Content-Type: application/json" \
-d '{
    "model": "qwen3-max-2026-01-23",
    "input": {
        "messages": [
            {
                "role": "user",
                "content": "Visit the official Alibaba Cloud Model Studio documentation on the code interpreter and summarize its main content"
            }
        ]
    },
    "parameters": {
        "enable_thinking": true,
        "enable_search": true,
        "search_options": {
            "search_strategy": "agent_max"
        },
        "result_format": "message",
        "incremental_output": true
    }
}'
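The curl example returns Server-Sent Events. A minimal sketch of reading them, assuming each event's JSON sits on a `data:` line and uses the `output.choices[0].message` layout that the Python SDK example reads when `result_format` is `"message"`; `parse_sse_messages` is a hypothetical helper, and the sample lines are made up for illustration. Joining the pieces reconstructs the full answer only with incremental output (as with `incremental_output=True` in the Python SDK example); without it, each event may repeat the text accumulated so far.

```python
import json


def parse_sse_messages(lines):
    """Yield (reasoning_content, content) pairs from DashScope SSE lines.

    Assumes payloads are on "data:" lines with the layout
    output.choices[0].message; other SSE fields such as id: and event: are skipped.
    """
    for line in lines:
        line = line.strip()
        if not line.startswith("data:"):
            continue
        payload = json.loads(line[len("data:"):])
        message = payload["output"]["choices"][0]["message"]
        # Fields may be absent or null in a given event
        yield message.get("reasoning_content") or "", message.get("content") or ""


# Made-up SSE lines standing in for curl output
lines = [
    "id:1",
    "event:result",
    'data:{"output":{"choices":[{"message":{"role":"assistant","content":"","reasoning_content":"Checking the page."}}]}}',
    "",
    "id:2",
    'data:{"output":{"choices":[{"message":{"role":"assistant","content":"Summary: done."}}]}}',
]
answer = "".join(content for _, content in parse_sse_messages(lines))
print(answer)  # prints Summary: done.
```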

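Billing

Billing covers the following:

  • Model call fees: the extracted web content is appended to the prompt, which increases the model's input tokens. These tokens are billed at the model's standard price. For pricing details, see the model list.

  • Tool call fees: include the fees for the web extractor and web search.

    • Web search tool fee per 1,000 calls:

      • Chinese Mainland: 4

      • International: 73.392381

    • The web extractor tool is free for a limited time.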
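Tool fees scale linearly with call counts, so they can be estimated ahead of time. A minimal sketch; `estimate_web_search_fee` is a hypothetical helper that applies the per-1,000-call web search price above and treats web extractor calls as free, which holds only during the limited-time offer. Token fees for the extracted content are billed separately at the model's standard price and are not included, and the result is in the same unit as the listed price.

```python
def estimate_web_search_fee(web_search_calls, price_per_1000_calls):
    """Estimate the web search tool fee for a number of calls.

    Web extractor calls are not charged here because the tool is currently
    free for a limited time; model token fees are not included either.
    """
    if web_search_calls < 0:
        raise ValueError("web_search_calls must be non-negative")
    return web_search_calls * price_per_1000_calls / 1000


# Chinese Mainland price from the list above: 4 per 1,000 calls
print(estimate_web_search_fee(250, 4))  # prints 1.0
```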