Alibaba Cloud Model Studio supports the OpenAI-compatible Responses API. Building on the Chat Completions API, the Responses API streamlines native agent functionality.
Advantages over the OpenAI Chat Completions API:
- Built-in tools: Improve results for complex tasks with web search, web scraping, a code interpreter, text-to-image, and image-to-image. For details, see Call built-in tools.
- More flexible input: The API supports both direct string inputs and message arrays in the standard chat format.
- Simplified context management: Passing previous_response_id eliminates the need to manually build a complete message history array.
See the OpenAI Responses API reference for parameter details.
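To make the two input forms concrete, here is a minimal sketch that builds a request body from either a plain string or a chat-format message array. The helper name build_request_body is hypothetical and not part of any SDK; it only illustrates the shape of the JSON payload.

```python
from typing import Union


def build_request_body(model: str, user_input: Union[str, list]) -> dict:
    """Build a Responses API request body (hypothetical helper, not an SDK call)."""
    if isinstance(user_input, str):
        # Form 1: a direct string input
        return {"model": model, "input": user_input}
    # Form 2: a message array in the standard chat format
    for message in user_input:
        if not {"role", "content"} <= message.keys():
            raise ValueError("each message needs 'role' and 'content'")
    return {"model": model, "input": list(user_input)}


string_body = build_request_body("qwen3.7-plus", "What can you do?")
array_body = build_request_body(
    "qwen3.7-plus",
    [
        {"role": "user", "content": "My name is Alice."},
        {"role": "assistant", "content": "Nice to meet you, Alice."},
        {"role": "user", "content": "What can you do?"},
    ],
)
```

Either dictionary can be serialized with json.dumps and sent as the POST body, or its fields passed as keyword arguments to the SDK.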
Prerequisites
First, get an API key and set it as an environment variable. If you use the OpenAI SDK, install the SDK.
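A small sketch of how a script might fail early when the key is missing. The variable name DASHSCOPE_API_KEY matches the examples on this page; the helper load_api_key is a hypothetical name introduced here.

```python
import os
from typing import Mapping, Optional


def load_api_key(env: Optional[Mapping[str, str]] = None,
                 name: str = "DASHSCOPE_API_KEY") -> str:
    """Return the API key from the environment, or raise a clear error."""
    env = os.environ if env is None else env
    key = env.get(name, "").strip()
    if not key:
        raise RuntimeError(f"{name} is not set; export it before calling the API")
    return key
```

Passing a dictionary instead of os.environ makes the check easy to exercise in tests without touching the real environment.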
The legacy path /api/v2/apps/protocols/compatible-mode/v1/responses for the OpenAI-compatible Responses API will be deprecated soon. Please migrate to the new path /compatible-mode/v1/responses as soon as possible.
Model Studio has released workspace-specific domains for the Singapore region. The new dedicated domains deliver better performance and higher stability for inference requests. We recommend migrating to the new domain:
- Singapore: from https://dashscope-intl.aliyuncs.com to https://{WorkspaceId}.ap-southeast-1.maas.aliyuncs.com
{WorkspaceId} is your workspace ID, which can be found on the Workspace Details page in the Model Studio console. The existing domain remains fully functional.
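The two migrations described above amount to simple URL rewrites. The sketch below shows one possible way to apply them with the standard library; migrate_url is a hypothetical helper, and the workspace ID "ws-123" in the usage lines is a made-up placeholder.

```python
from urllib.parse import urlsplit, urlunsplit

LEGACY_PATH = "/api/v2/apps/protocols/compatible-mode/v1/responses"
NEW_PATH = "/compatible-mode/v1/responses"
LEGACY_SINGAPORE_HOST = "dashscope-intl.aliyuncs.com"


def migrate_url(url: str, workspace_id: str = "") -> str:
    """Rewrite the legacy path and, if a workspace ID is given, the Singapore host."""
    parts = urlsplit(url)
    path = NEW_PATH if parts.path == LEGACY_PATH else parts.path
    host = parts.netloc
    if workspace_id and host == LEGACY_SINGAPORE_HOST:
        host = f"{workspace_id}.ap-southeast-1.maas.aliyuncs.com"
    return urlunsplit((parts.scheme, host, path, parts.query, parts.fragment))


old_url = "https://dashscope-intl.aliyuncs.com/api/v2/apps/protocols/compatible-mode/v1/responses"
path_only = migrate_url(old_url)            # keeps the existing domain
fully_migrated = migrate_url(old_url, "ws-123")  # "ws-123" is a placeholder ID
```

Because the existing domain remains functional, the host rewrite is optional and only happens when a workspace ID is supplied.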
Supported models
qwen3-max, qwen3-max-2026-01-23, qwen3.7-max, qwen3.7-max-2026-05-20, qwen3.7-max-2026-06-08, qwen3.7-max-preview, qwen3.7-max-2026-05-17, qwen3.7-plus, qwen3.7-plus-2026-05-26, qwen3.6-plus, qwen3.6-plus-2026-04-02, qwen3.5-plus, qwen3.5-plus-2026-02-15, qwen3.6-flash, qwen3.6-flash-2026-04-16, qwen3.5-flash, qwen3.5-flash-2026-02-23, qwen3.6-35b-a3b, qwen3.5-397b-a17b, qwen3.5-122b-a10b, qwen3.5-27b, qwen3.5-35b-a3b, qwen-plus, qwen-flash, qwen3-coder-plus, qwen3-coder-flash, and qwen3-coder-next.
Endpoints
China (Beijing)
SDK call configuration base_url: https://dashscope.aliyuncs.com/compatible-mode/v1
HTTP request URL: POST https://dashscope.aliyuncs.com/compatible-mode/v1/responses
Singapore
SDK call configuration base_url: https://{WorkspaceId}.ap-southeast-1.maas.aliyuncs.com/compatible-mode/v1
HTTP request URL: POST https://{WorkspaceId}.ap-southeast-1.maas.aliyuncs.com/compatible-mode/v1/responses
Replace {WorkspaceId} with your actual workspace ID.
US (Virginia)
SDK call configuration base_url: https://dashscope-us.aliyuncs.com/compatible-mode/v1
HTTP request URL: POST https://dashscope-us.aliyuncs.com/compatible-mode/v1/responses
Germany (Frankfurt)
SDK call configuration base_url: https://{WorkspaceId}.eu-central-1.maas.aliyuncs.com/compatible-mode/v1
HTTP request URL: POST https://{WorkspaceId}.eu-central-1.maas.aliyuncs.com/compatible-mode/v1/responses
Replace {WorkspaceId} with your actual workspace ID.
Japan (Tokyo)
SDK call configuration base_url: https://{WorkspaceId}.ap-northeast-1.maas.aliyuncs.com/compatible-mode/v1
HTTP request URL: POST https://{WorkspaceId}.ap-northeast-1.maas.aliyuncs.com/compatible-mode/v1/responses
Replace {WorkspaceId} with your actual workspace ID.
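The endpoints listed above follow two patterns: a shared host for Beijing and Virginia, and a workspace-specific host for Singapore, Frankfurt, and Tokyo. A minimal sketch of a lookup helper follows; the region keys ("beijing", "tokyo", and so on) and the function names are hypothetical labels chosen for this example.

```python
from typing import Optional

# Region labels below are hypothetical keys chosen for this sketch.
SHARED_HOSTS = {
    "beijing": "dashscope.aliyuncs.com",
    "virginia": "dashscope-us.aliyuncs.com",
}
WORKSPACE_REGION_CODES = {
    "singapore": "ap-southeast-1",
    "frankfurt": "eu-central-1",
    "tokyo": "ap-northeast-1",
}


def base_url(region: str, workspace_id: Optional[str] = None) -> str:
    """Return the SDK base_url for a region."""
    if region in SHARED_HOSTS:
        host = SHARED_HOSTS[region]
    elif region in WORKSPACE_REGION_CODES:
        if not workspace_id:
            raise ValueError(f"region '{region}' needs a workspace ID")
        host = f"{workspace_id}.{WORKSPACE_REGION_CODES[region]}.maas.aliyuncs.com"
    else:
        raise ValueError(f"unknown region: {region}")
    return f"https://{host}/compatible-mode/v1"


def responses_url(region: str, workspace_id: Optional[str] = None) -> str:
    """Return the full HTTP request URL for the Responses API."""
    return base_url(region, workspace_id) + "/responses"
```

The value returned by base_url can be passed as base_url to the OpenAI client, while responses_url gives the address for raw HTTP POST requests.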
Code examples
Basic call
Send a message and get a response.
Python
import os
from openai import OpenAI
client = OpenAI(
# If an environment variable is not set, replace with your Model Studio API key: api_key="sk-xxx"
api_key=os.getenv("DASHSCOPE_API_KEY"),
base_url="https://dashscope.aliyuncs.com/compatible-mode/v1",
)
response = client.responses.create(
model="qwen3.7-plus",
input="What can you do?"
)
# Get model response
# print(response.model_dump_json())
print(response.output_text)
Node.js
import OpenAI from "openai";
const openai = new OpenAI({
// If the environment variable is not set, replace with your Model Studio API key: apiKey: "sk-xxx"
apiKey: process.env.DASHSCOPE_API_KEY,
baseURL: "https://dashscope.aliyuncs.com/compatible-mode/v1"
});
async function main() {
const response = await openai.responses.create({
model: "qwen3.7-plus",
input: "What can you do?"
});
// Get model response
console.log(response.output_text);
}
main();
Curl
curl -X POST https://dashscope.aliyuncs.com/compatible-mode/v1/responses \
-H "Authorization: Bearer $DASHSCOPE_API_KEY" \
-H "Content-Type: application/json" \
-d '{
"model": "qwen3.7-plus",
"input": "What can you do?"
}'
Example response
This is a complete API response.
{
"created_at": 1771225825,
"id": "0c842a11-c7d1-45da-b7ec-4e668c389xxx",
"model": "qwen3.7-plus",
"object": "response",
"output": [
{
"id": "msg_0bdb8ab9-f1de-4db6-82c8-6c1185b91xxx",
"summary": [
{
"text": "Thinking Process:\n\n1. **Analyze the Request ...",
"type": "summary_text"
}
],
"type": "reasoning"
},
{
"content": [
{
"annotations": [],
"text": "Hello! As an AI assistant, I can help you with a variety of tasks. Here are some of my main capabilities:\n\n1. **Text Creation and Editing**\n * Write emails, articles, reports, stories, or social media copy.\n * Polish, rewrite, or summarize existing text.\n\n2. **Programming and Technical Support**\n * Write, debug, or explain code (supporting multiple programming languages like Python, JavaScript, C++, etc.).\n * Provide explanations of technical concepts and suggest solutions.\n\n3. **Q&A and Learning**\n * Answer questions across various fields (my knowledge includes information up to 2026).\n * Assist with learning new concepts, creating study plans, or solving exercises.\n\n4. **Language Translation**\n * Translate between multiple languages to help you overcome language barriers.\n\n5. **Data Analysis and Organization**\n * Help organize information, extract key points, or perform logical analysis.\n * Help format data or generate table structures.\n\n6. **Creativity and Brainstorming**\n * Provide creative inspiration, project plans, or suggestions.\n * Chat with you, offering emotional support or life advice.\n\nIs there anything specific I can help you with? Just let me know!",
"type": "output_text"
}
],
"id": "msg_c8bb3db1-d235-44e7-9704-55b584022xxx",
"role": "assistant",
"status": "completed",
"type": "message"
}
],
"parallel_tool_calls": false,
"status": "completed",
"tool_choice": "auto",
"tools": [],
"usage": {
"input_tokens": 49,
"input_tokens_details": {
"cached_tokens": 0
},
"output_tokens": 1384,
"output_tokens_details": {
"reasoning_tokens": 1110
},
"total_tokens": 1433,
"x_details": [
{
"input_tokens": 49,
"output_tokens": 1384,
"output_tokens_details": {
"reasoning_tokens": 1110
},
"total_tokens": 1433,
"x_billing_type": "response_api"
}
]
}
}
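When calling the endpoint over raw HTTP rather than through the SDK, the answer text has to be pulled out of the output array by hand. The sketch below, written against the response structure shown above, collects the output_text parts of message items and skips reasoning items. The helper name extract_output_text is hypothetical, and the sample dictionary is a shortened copy of the example response.

```python
def extract_output_text(response: dict) -> str:
    """Concatenate all output_text parts from message items in a response."""
    parts = []
    for item in response.get("output", []):
        if item.get("type") != "message":
            continue  # skip reasoning items and any other non-message output
        for part in item.get("content", []):
            if part.get("type") == "output_text":
                parts.append(part.get("text", ""))
    return "".join(parts)


sample_response = {
    "object": "response",
    "output": [
        {
            "type": "reasoning",
            "summary": [{"type": "summary_text", "text": "Thinking Process: ..."}],
        },
        {
            "type": "message",
            "role": "assistant",
            "content": [
                {"type": "output_text", "text": "Hello! As an AI assistant, I can help you."}
            ],
        },
    ],
}
answer = extract_output_text(sample_response)
```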
Multi-turn conversation
The previous_response_id parameter automatically maintains the conversation context, so you do not need to manually build the message history. Each response id is valid for 7 days.
The previous_response_id must be the top-level id from the previous response (e.g., resp_xxx, in UUID format), not the message id from within the output array (e.g., msg_56c860c4-3ad8-4a96-8553-d2f94c259xxx).
Python
import os
from openai import OpenAI
client = OpenAI(
api_key=os.getenv("DASHSCOPE_API_KEY"),
base_url="https://dashscope.aliyuncs.com/compatible-mode/v1",
)
# First round
response1 = client.responses.create(
model="qwen3.7-plus",
input="My name is Alice, please remember it."
)
print(f"First response: {response1.output_text}")
# Second round - use previous_response_id to link context
# The response id expires in 7 days
response2 = client.responses.create(
model="qwen3.7-plus",
input="Do you remember my name?",
previous_response_id=response1.id
)
print(f"Second response: {response2.output_text}")
Node.js
import OpenAI from "openai";
const openai = new OpenAI({
apiKey: process.env.DASHSCOPE_API_KEY,
baseURL: "https://dashscope.aliyuncs.com/compatible-mode/v1"
});
async function main() {
// First round
const response1 = await openai.responses.create({
model: "qwen3.7-plus",
input: "My name is Alice, please remember it."
});
console.log(`First response: ${response1.output_text}`);
// Second round - use previous_response_id to link context
// The response id expires in 7 days
const response2 = await openai.responses.create({
model: "qwen3.7-plus",
input: "Do you remember my name?",
previous_response_id: response1.id
});
console.log(`Second response: ${response2.output_text}`);
}
main();
Curl
# First round
curl -X POST https://dashscope.aliyuncs.com/compatible-mode/v1/responses \
-H "Authorization: Bearer $DASHSCOPE_API_KEY" \
-H "Content-Type: application/json" \
-d '{
"model": "qwen3.7-plus",
"input": "My name is Alice, please remember it."
}'
# Second round - use the id from the first response as previous_response_id
curl -X POST https://dashscope.aliyuncs.com/compatible-mode/v1/responses \
-H "Authorization: Bearer $DASHSCOPE_API_KEY" \
-H "Content-Type: application/json" \
-d '{
"model": "qwen3.7-plus",
"input": "Do you remember my name?",
"previous_response_id": "response_id_from_first_round"
}'
Second-round response example
{
"id": "f0dbb153-117f-9bbf-8176-5284b47f3xxx",
"created_at": 1769169951.0,
"model": "qwen3.7-plus",
"object": "response",
"status": "completed",
"output": [
{
"id": "msg_56c860c4-3ad8-4a96-8553-d2f94c259xxx",
"type": "message",
"role": "assistant",
"status": "completed",
"content": [
{
"type": "output_text",
"text": "Of course. Your name is Alice!",
"annotations": []
}
]
}
],
"usage": {
"input_tokens": 73,
"output_tokens": 10,
"total_tokens": 83,
"input_tokens_details": {
"cached_tokens": 0
},
"output_tokens_details": {
"reasoning_tokens": 0
}
}
}
Note: In the second round, the input_tokens count is 73. This count includes the context carried over from the first round, which is how the model can recall the name "Alice".
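The two rules for previous_response_id (use the top-level id, and use it within 7 days) can be checked before sending the next request. This is a sketch under one assumption that the source does not confirm: the 7-day window is counted from the response's created_at timestamp. The helper name previous_response_id_for is hypothetical.

```python
import time
from typing import Optional

RESPONSE_ID_TTL_SECONDS = 7 * 24 * 60 * 60  # 7 days, as stated above


def previous_response_id_for(response: dict, now: Optional[float] = None) -> str:
    """Return the top-level id of a response for use as previous_response_id."""
    response_id = response["id"]
    message_ids = {item.get("id") for item in response.get("output", [])}
    if response_id.startswith("msg_") or response_id in message_ids:
        raise ValueError("expected the top-level response id, got a message id")
    # Assumption: the 7-day validity window starts at created_at.
    current = time.time() if now is None else now
    if current - float(response["created_at"]) > RESPONSE_ID_TTL_SECONDS:
        raise ValueError("response id is older than 7 days")
    return response_id


earlier_response = {
    "id": "f0dbb153-117f-9bbf-8176-5284b47f3xxx",
    "created_at": 1769169951.0,
    "output": [{"id": "msg_56c860c4-3ad8-4a96-8553-d2f94c259xxx", "type": "message"}],
}
next_request = {
    "model": "qwen3.7-plus",
    "input": "Do you remember my name?",
    "previous_response_id": previous_response_id_for(
        earlier_response, now=1769169951.0 + 60
    ),
}
```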
Deep thinking
Use the reasoning parameter to control the model's reasoning strength. When you set reasoning.effort, the model thinks before replying, and returns the thinking process in a reasoning output item. The effort parameter supports the following values:
- none: Disables thinking and provides a direct answer.
- minimal: Minimizes thinking for the fastest response.
- low: Performs light thinking, prioritizing a quick response.
- medium (default): Performs moderate thinking, balancing speed and depth.
- high: Performs deep thinking, focusing on complex and specialized problems.
You cannot use the thinking_budget parameter to control the maximum thinking length. reasoning.effort takes precedence over enable_thinking. Use reasoning.effort, as enable_thinking will be deprecated.
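Since effort accepts only the five values listed above, a client can validate the value locally before sending a request. A minimal sketch follows; reasoning_request is a hypothetical helper that only builds the payload and does not call the API.

```python
ALLOWED_EFFORTS = ("none", "minimal", "low", "medium", "high")


def reasoning_request(model: str, prompt: str, effort: str = "medium") -> dict:
    """Build a request body with a validated reasoning.effort value."""
    if effort not in ALLOWED_EFFORTS:
        raise ValueError(
            f"effort must be one of {', '.join(ALLOWED_EFFORTS)}; got '{effort}'"
        )
    return {"model": model, "input": prompt, "reasoning": {"effort": effort}}


quick = reasoning_request("qwen3.7-plus", "Which is larger, 9.9 or 9.11?", effort="none")
deep = reasoning_request("qwen3.7-plus", "Which is larger, 9.9 or 9.11?", effort="high")
```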
Python
import os
from openai import OpenAI
client = OpenAI(
api_key=os.getenv("DASHSCOPE_API_KEY"),
base_url="https://dashscope.aliyuncs.com/compatible-mode/v1",
)
response = client.responses.create(
model="qwen3.7-plus",
input="Which is larger, 9.9 or 9.11?",
reasoning={"effort": "medium"}
)
# Process the output
for item in response.output:
    if item.type == "reasoning":
        print("=== Thinking Process ===")
        for summary in item.summary:
            print(summary.text)
    elif item.type == "message":
        print("\n=== Final Answer ===")
        print(item.content[0].text)
# Check the thinking token count
print(f"\nThinking token count: {response.usage.output_tokens_details.reasoning_tokens}")
Node.js
import OpenAI from "openai";
const openai = new OpenAI({
apiKey: process.env.DASHSCOPE_API_KEY,
baseURL: "https://dashscope.aliyuncs.com/compatible-mode/v1"
});
async function main() {
const response = await openai.responses.create({
model: "qwen3.7-plus",
input: "Which is larger, 9.9 or 9.11?",
reasoning: { effort: "medium" }
});
for (const item of response.output) {
if (item.type === "reasoning") {
console.log("=== Thinking Process ===");
for (const summary of item.summary) {
console.log(summary.text);
}
} else if (item.type === "message") {
console.log("\n=== Final Answer ===");
console.log(item.content[0].text);
}
}
// Check the thinking token count
console.log(`\nThinking token count: ${response.usage.output_tokens_details.reasoning_tokens}`);
}
main();
Curl
curl -X POST https://dashscope.aliyuncs.com/compatible-mode/v1/responses \
-H "Authorization: Bearer $DASHSCOPE_API_KEY" \
-H "Content-Type: application/json" \
-d '{
"model": "qwen3.7-plus",
"input": "Which is larger, 9.9 or 9.11?",
"reasoning": {"effort": "medium"}
}'
Example response
{
"created_at": 1774498317,
"id": "resp_xxx",
"model": "qwen3.7-plus",
"object": "response",
"output": [
{
"id": "msg_xxx",
"summary": [
{
"text": "Thinking Process:\n\n1. Analyze the Request:\n * Question: \"Which is larger, 9.9 or 9.11?\"\n * Context: The user is asking a simple mathematical comparison question.\n * Current Date: Thursday, March 26, 2026 (provided in the system prompt).\n * Knowledge Cutoff: 2026 (provided in the system prompt).\n\n2. Evaluate the Numbers:\n * Number A: 9.9\n * Number B: 9.11\n * These are decimal numbers.\n * Compare the integer part: Both are 9.\n * Compare the tenths place (first decimal digit):\n * 9.9 has 9 in the tenths place.\n * 9.11 has 1 in the tenths place.\n * Since 9 > 1, 9.9 is greater than 9.11.\n\n3. Consider Potential Ambiguities:\n * Could this be version numbering? (e.g., software versions). In versioning, 9.11 is often \"newer\" or \"higher\" than 9.9. However, mathematically, 9.9 > 9.11.\n * Could this be dates? (September 9th vs September 11th). 11th is later.\n * Standard interpretation: Without context, decimal numbers are assumed to be mathematical values.\n * Common pitfall: Some people mistakenly treat decimals like whole numbers (where 11 > 9), leading them to think 9.11 > 9.9. This is a known cognitive bias or misconception in elementary math.\n * Decision: Provide the mathematical answer clearly, but perhaps acknowledge the versioning context if relevant (though usually, for this specific question, it's a math test). Given the simplicity, stick to the mathematical truth first.\n\n4. Formulate the Answer:\n * Direct answer: 9.9 is larger.\n * Explanation: Compare place values. 9.9 = 9.90, 9.11 = 9.11. 90 hundredths > 11 hundredths.\n * Tone: Helpful, clear.\n\n5. Drafting the Response (Internal Monologue/Draft):\n * Okay, mathematically 9.9 is bigger. 9.9 is 9 and 9 tenths. 9.11 is 9 and 11 hundredths (or 1 tenth and 1 hundredth). 9 tenths is more than 1 tenth.\n * So, 9.9 > 9.11.\n * I should answer in English since the question is in English.\n * \"9.9 is larger.\"\n * Add a brief explanation to avoid confusion. 
\"Because 9.9 equals 9.90, and 9.90 is greater than 9.11.\"\n\n6. Refining based on \"Version Number\" possibility:\n * Sometimes this is a trick question about software versions. In semver, 9.11 > 9.9.\n * However, usually, when asked plainly \"which is larger\", it refers to numerical value.\n * I will provide the numerical answer as primary, maybe add a note about versions if it feels necessary. But keeping it simple is better. Most likely, this is testing the decimal comparison logic.\n * Let's stick to the mathematical value as the default.\n\n7. Final Polish:\n * Answer: 9.9 is larger.\n * Reasoning: When comparing decimals, first look at the integer part, then the tenths place. The tenths digit of 9.9 is 9, and the tenths digit of 9.11 is 1. Since 9 > 1, 9.9 > 9.11.\n\n8. Output Generation: (Matching the thought process)\n * State clearly which is larger.\n * Explain why.\n\n *Self-Correction on Date/Time:* The system prompt mentions the current date is 2026. This doesn't affect the math question, but I should ensure I don't reference anything post-2026 if I were adding context (not needed here).\n\n *Final Decision:* Just answer the math question directly.\n\n \"Numerically, 9.9 is larger.\"\n Explain: 9.9 = 9.90, 9.11 = 9.11. 90 > 11.\n\n Wait, is there any other interpretation?\n - Dates? 9/9 vs 9/11. 11th is later.\n - Versions? 9.11 is newer.\n - But \"who is big/larger\" usually implies magnitude.\n - I will answer based on numerical magnitude but briefly mention the version context if it adds value without confusing. Actually, better to be decisive. Numerical value is the standard interpretation for \"who is larger\" with numbers.\n\n Let's go with the numerical answer.\n\n Plan:\n 1. State 9.9 is larger numerically.\n 2. Explain the decimal comparison.\n 3. (Optional but helpful) Mention that if it's version numbers, 9.11 might be considered \"higher\". 
But prioritize the math.\n Actually, for a simple query like this, over-explaining versions might be confusing. I'll stick to the math but acknowledge the common confusion.\n\n *Revised Plan:*\n 1. Direct answer: 9.9 is larger (mathematically).\n 2. Explanation: Place value.\n 3. Note: Unless it's a version number.\n\n Let's keep it concise.\n\n \"In terms of mathematical value, 9.9 is larger.\"\n \"Because 9.9 equals 9.90, which is greater than 9.11.\"\n \"If it's a version number, version 9.11 is usually higher than version 9.9.\"\n This covers both bases.\n\n Wait, looking at common AI training, this is often a \"trap\" question to see if the AI treats decimals like integers. The correct mathematical answer is crucial.\n\n Okay, final output structure:\n 1. Mathematical answer.\n 2. Brief explanation.\n 3. Contextual note (versions).\n\n Let's write it.\n \"In terms of mathematical value, 9.9 is larger.\"\n \"When comparing decimals, first compare the integer part (both are 9), then the tenths place: The tenths digit of 9.9 is 9, while the tenths digit of 9.11 is 1. Because 9 > 1, 9.9 > 9.11.\"\n \"However, if these are software version numbers, version 9.11 is typically newer than version 9.9.\"\n\n This is balanced.\n\n Wait, I need to check the instruction again. \"Please remember the current actual time...\". This is just system context. I don't need to mention the date in the answer unless relevant. It's not relevant here.\n\n Okay, ready to generate.\n\n Wait, one more check. Is there any cultural context? In some contexts, people might say 9.11 is bigger because 11 > 9. This is the misconception I need to correct.\n So, emphasizing the mathematical rule is key.\n\n Final check:\n 9.9 = 9 + 9/10 = 9.90\n 9.11 = 9 + 1/10 + 1/100 = 9.11\n 9.90 > 9.11. Correct.\n\n Response will be in English.\n \"In terms of numerical value, 9.9 is larger.\"\n Explain why.\n Mention version exception.\n Done.\n",
"type": "summary_text"
}
],
"type": "reasoning"
},
{
"content": [
{
"annotations": [],
"text": "In terms of mathematical value, 9.9 is larger.\n\nHere's why:\nWhen comparing decimals, you first look at the integer part, which is 9 for both. Then you look at the tenths place (the first digit after the decimal point):\n* The tenths digit of 9.9 is 9\n* The tenths digit of 9.11 is 1\n\nSince 9 is greater than 1, 9.9 > 9.11 (you can think of 9.9 as 9.90 for comparison).\n\nNote: If these were software version numbers, version 9.11 would typically be considered newer (or higher) than version 9.9. However, in a direct numerical comparison, 9.9 is larger.",
"type": "output_text"
}
],
"id": "msg_xxx",
"role": "assistant",
"status": "completed",
"type": "message"
}
],
"parallel_tool_calls": false,
"status": "completed",
"tool_choice": "auto",
"tools": [],
"usage": {
"input_tokens": 57,
"input_tokens_details": {
"cached_tokens": 0
},
"output_tokens": 2018,
"output_tokens_details": {
"reasoning_tokens": 1861
},
"total_tokens": 2075,
"x_details": [
{
"input_tokens": 57,
"output_tokens": 2018,
"output_tokens_details": {
"reasoning_tokens": 1861
},
"total_tokens": 2075,
"x_billing_type": "response_api"
}
]
}
}
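The usage block in the response above shows how much of the output was spent on thinking. As a rough sketch, the helper below splits output_tokens into reasoning and answer tokens. It assumes that output_tokens already includes reasoning_tokens, which is consistent with the totals in the examples on this page; reasoning_breakdown is a hypothetical name.

```python
def reasoning_breakdown(usage: dict) -> dict:
    """Split output tokens into reasoning tokens and answer tokens."""
    output_tokens = usage.get("output_tokens", 0)
    reasoning = usage.get("output_tokens_details", {}).get("reasoning_tokens", 0)
    share = reasoning / output_tokens if output_tokens else 0.0
    return {
        "reasoning_tokens": reasoning,
        # Assumption: output_tokens includes the reasoning tokens.
        "answer_tokens": output_tokens - reasoning,
        "reasoning_share": share,
    }


usage = {
    "input_tokens": 57,
    "output_tokens": 2018,
    "output_tokens_details": {"reasoning_tokens": 1861},
    "total_tokens": 2075,
}
breakdown = reasoning_breakdown(usage)
```

For the example above, most of the output tokens are reasoning tokens, which is one reason to pick a lower effort value for simple questions.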
Stream output
Receive content from the model in real time. This is especially useful for long-form text generation.
Python
import os
from openai import OpenAI
client = OpenAI(
api_key=os.getenv("DASHSCOPE_API_KEY"),
base_url="https://dashscope.aliyuncs.com/compatible-mode/v1",
)
stream = client.responses.create(
model="qwen3.7-plus",
input="Please briefly introduce artificial intelligence.",
stream=True
)
print("Receiving stream output:")
for event in stream:
    # print(event.model_dump_json())  # Uncomment to see the raw event response
    if event.type == 'response.output_text.delta':
        # Print the text delta in real time
        print(event.delta, end='', flush=True)
    elif event.type == 'response.completed':
        print("\nStream completed")
        print(f"Total tokens: {event.response.usage.total_tokens}")
Node.js
import OpenAI from "openai";
const openai = new OpenAI({
apiKey: process.env.DASHSCOPE_API_KEY,
baseURL: "https://dashscope.aliyuncs.com/compatible-mode/v1"
});
async function main() {
const stream = await openai.responses.create({
model: "qwen3.7-plus",
input: "Please briefly introduce artificial intelligence.",
stream: true
});
console.log("Receiving stream output:");
for await (const event of stream) {
// console.log(JSON.stringify(event)); // Uncomment to see the raw event response
if (event.type === 'response.output_text.delta') {
process.stdout.write(event.delta);
} else if (event.type === 'response.completed') {
console.log("\nStream completed");
console.log(`Total tokens: ${event.response.usage.total_tokens}`);
}
}
}
main();
Curl
curl -X POST https://dashscope.aliyuncs.com/compatible-mode/v1/responses \
-H "Authorization: Bearer $DASHSCOPE_API_KEY" \
-H "Content-Type: application/json" \
-d '{
"model": "qwen3.7-plus",
"input": "Please briefly introduce artificial intelligence.",
"stream": true
}'
Example response
{"response":{"id":"47a71e7d-868c-4204-9693-ef8ff9058xxx","created_at":1769417481.0,"error":null,"incomplete_details":null,"instructions":null,"metadata":null,"model":"","object":"response","output":[],"parallel_tool_calls":false,"temperature":null,"tool_choice":"auto","tools":[],"top_p":null,"background":null,"completed_at":null,"conversation":null,"max_output_tokens":null,"max_tool_calls":null,"previous_response_id":null,"prompt":null,"prompt_cache_key":null,"prompt_cache_retention":null,"reasoning":null,"safety_identifier":null,"service_tier":null,"status":"queued","text":null,"top_logprobs":null,"truncation":null,"usage":null,"user":null},"sequence_number":0,"type":"response.created"}
{"response":{"id":"47a71e7d-868c-4204-9693-ef8ff9058xxx","created_at":1769417481.0,"error":null,"incomplete_details":null,"instructions":null,"metadata":null,"model":"","object":"response","output":[],"parallel_tool_calls":false,"temperature":null,"tool_choice":"auto","tools":[],"top_p":null,"background":null,"completed_at":null,"conversation":null,"max_output_tokens":null,"max_tool_calls":null,"previous_response_id":null,"prompt":null,"prompt_cache_key":null,"prompt_cache_retention":null,"reasoning":null,"safety_identifier":null,"service_tier":null,"status":"in_progress","text":null,"top_logprobs":null,"truncation":null,"usage":null,"user":null},"sequence_number":1,"type":"response.in_progress"}
{"item":{"id":"msg_16db29d6-c1d3-47d7-9177-0fba81964xxx","content":[],"role":"assistant","status":"in_progress","type":"message"},"output_index":0,"sequence_number":2,"type":"response.output_item.added"}
{"content_index":0,"item_id":"msg_16db29d6-c1d3-47d7-9177-0fba81964xxx","output_index":0,"part":{"annotations":[],"text":"","type":"output_text","logprobs":null},"sequence_number":3,"type":"response.content_part.added"}
{"content_index":0,"delta":"Artificial","item_id":"msg_16db29d6-c1d3-47d7-9177-0fba81964xxx","logprobs":[],"output_index":0,"sequence_number":4,"type":"response.output_text.delta"}
{"content_index":0,"delta":" intelligence","item_id":"msg_16db29d6-c1d3-47d7-9177-0fba81964xxx","logprobs":[],"output_index":0,"sequence_number":5,"type":"response.output_text.delta"}
{"content_index":0,"delta":" (","item_id":"msg_16db29d6-c1d3-47d7-9177-0fba81964xxx","logprobs":[],"output_index":0,"sequence_number":6,"type":"response.output_text.delta"}
{"content_index":0,"delta":"AI","item_id":"msg_16db29d6-c1d3-47d7-9177-0fba81964xxx","logprobs":[],"output_index":0,"sequence_number":7,"type":"response.output_text.delta"}
... (intermediate events omitted) ...
{"content_index":0,"delta":"fields, and is profoundly changing our","item_id":"msg_16db29d6-c1d3-47d7-9177-0fba81964xxx","logprobs":[],"output_index":0,"sequence_number":38,"type":"response.output_text.delta"}
{"content_index":0,"delta":" lives and ways of working","item_id":"msg_16db29d6-c1d3-47d7-9177-0fba81964xxx","logprobs":[],"output_index":0,"sequence_number":39,"type":"response.output_text.delta"}
{"content_index":0,"delta":".","item_id":"msg_16db29d6-c1d3-47d7-9177-0fba81964xxx","logprobs":[],"output_index":0,"sequence_number":40,"type":"response.output_text.delta"}
{"content_index":0,"item_id":"msg_16db29d6-c1d3-47d7-9177-0fba81964xxx","logprobs":[],"output_index":0,"sequence_number":41,"text":"Artificial intelligence (AI) is the technology and science of simulating human intelligent behavior by using computer systems. xxxx","type":"response.output_text.done"}
{"content_index":0,"item_id":"msg_16db29d6-c1d3-47d7-9177-0fba81964xxx","output_index":0,"part":{"annotations":[],"text":"Artificial intelligence (AI) is the technology and science of simulating human intelligent behavior by using computer systems. xxx","type":"output_text","logprobs":null},"sequence_number":42,"type":"response.content_part.done"}
{"item":{"id":"msg_16db29d6-c1d3-47d7-9177-0fba81964xxx","content":[{"annotations":[],"text":"Artificial intelligence (AI) is the technology and science of simulating human intelligent behavior by using computer systems. It aims to enable machines to perform tasks that typically require human intelligence, such as:\n\n- Learning (for example, training models with data) \n- Reasoning (for example, logical judgment and problem-solving) \n- Perception (for example, recognizing images, speech, or text) \n- Understanding language (for example, natural language processing) \n- Decision-making (for example, making optimal choices in complex environments)\n\nAI can be divided into weak AI (focused on specific tasks, such as voice assistants and recommendation systems) and strong AI (possessing general, human-like intelligence, which has not yet been achieved).\n\nCurrently, AI is widely used in various fields, including healthcare, finance, transportation, education, and entertainment, and is profoundly changing the way we live and work.","type":"output_text","logprobs":null}],"role":"assistant","status":"completed","type":"message"},"output_index":0,"sequence_number":43,"type":"response.output_item.done"}
{"response":{"id":"47a71e7d-868c-4204-9693-ef8ff9058xxx","created_at":1769417481.0,"error":null,"incomplete_details":null,"instructions":null,"metadata":null,"model":"qwen3.7-plus","object":"response","output":[{"id":"msg_16db29d6-c1d3-47d7-9177-0fba81964xxx","content":[{"annotations":[],"text":"Artificial intelligence (AI) is xxxxxx","type":"output_text","logprobs":null}],"role":"assistant","status":"completed","type":"message"}],"parallel_tool_calls":false,"temperature":null,"tool_choice":"auto","tools":[],"top_p":null,"background":null,"completed_at":null,"conversation":null,"max_output_tokens":null,"max_tool_calls":null,"previous_response_id":null,"prompt":null,"prompt_cache_key":null,"prompt_cache_retention":null,"reasoning":null,"safety_identifier":null,"service_tier":null,"status":"completed","text":null,"top_logprobs":null,"truncation":null,"usage":{"input_tokens":37,"input_tokens_details":{"cached_tokens":0},"output_tokens":166,"output_tokens_details":{"reasoning_tokens":0},"total_tokens":203},"user":null},"sequence_number":44,"type":"response.completed"}
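For callers that read the event stream without the SDK, the events above can be reassembled by concatenating the delta fields and reading usage from the final response.completed event. The sketch below works on event lines like the ones shown; it also strips an optional "data:" prefix, on the assumption that a raw server-sent-events stream may carry one. assemble_stream is a hypothetical helper, and the sample events are shortened.

```python
import json


def assemble_stream(lines):
    """Rebuild the full text and final usage from streamed event lines."""
    text_parts = []
    usage = None
    for line in lines:
        line = line.strip()
        if line.startswith("data:"):
            # Assumption: raw SSE lines may be prefixed with "data:".
            line = line[len("data:"):].strip()
        if not line.startswith("{"):
            continue  # skip blank lines and non-JSON markers
        event = json.loads(line)
        if event.get("type") == "response.output_text.delta":
            text_parts.append(event["delta"])
        elif event.get("type") == "response.completed":
            usage = event["response"].get("usage")
    return "".join(text_parts), usage


sample_events = [
    '{"delta": "Artificial", "sequence_number": 4, "type": "response.output_text.delta"}',
    '{"delta": " intelligence", "sequence_number": 5, "type": "response.output_text.delta"}',
    'data: {"delta": " (AI)", "sequence_number": 6, "type": "response.output_text.delta"}',
    '{"response": {"usage": {"input_tokens": 37, "output_tokens": 166, "total_tokens": 203}},'
    ' "sequence_number": 44, "type": "response.completed"}',
]
text, final_usage = assemble_stream(sample_events)
```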
Use built-in tools
Enable built-in tools for complex tasks. The web extractor and code interpreter are free for a limited time. See tool calling for supported tools.
Python
import os
from openai import OpenAI
client = OpenAI(
api_key=os.getenv("DASHSCOPE_API_KEY"),
base_url="https://dashscope.aliyuncs.com/compatible-mode/v1",
)
response = client.responses.create(
model="qwen3.7-plus",
input="Find the Alibaba Cloud website and extract key information from the homepage",
# For best results, enable all the built-in tools.
tools=[
{"type": "web_search"},
{"type": "code_interpreter"},
{"type": "web_extractor"}
],
reasoning={"effort": "medium"}
)
# Uncomment the following line to see the intermediate output.
# print(response.output)
print(response.output_text)
Node.js
import OpenAI from "openai";
const openai = new OpenAI({
apiKey: process.env.DASHSCOPE_API_KEY,
baseURL: "https://dashscope.aliyuncs.com/compatible-mode/v1"
});
async function main() {
const response = await openai.responses.create({
model: "qwen3.7-plus",
input: "Find the Alibaba Cloud website and extract key information from the homepage",
tools: [
{ type: "web_search" },
{ type: "code_interpreter" },
{ type: "web_extractor" }
],
reasoning: { effort: "medium" }
});
for (const item of response.output) {
if (item.type === "reasoning") {
console.log("Model is thinking...");
} else if (item.type === "web_search_call") {
console.log(`Search query: ${item.action.query}`);
} else if (item.type === "web_extractor_call") {
console.log("Extracting web content...");
} else if (item.type === "message") {
console.log(`Response: ${item.content[0].text}`);
}
}
}
main();
Curl
curl -X POST https://dashscope.aliyuncs.com/compatible-mode/v1/responses \
-H "Authorization: Bearer $DASHSCOPE_API_KEY" \
-H "Content-Type: application/json" \
-d '{
"model": "qwen3.7-plus",
"input": "Find the Alibaba Cloud website and extract key information from the homepage",
"tools": [
{
"type": "web_search"
},
{
"type": "code_interpreter"
},
{
"type": "web_extractor"
}
],
"reasoning": {"effort": "medium"}
}'
Example response
{
"id": "69258b21-5099-9d09-92e8-8492b1955xxx",
"object": "response",
"status": "completed",
"output": [
{
"type": "reasoning",
"summary": [
{
"type": "summary_text",
"text": "The user wants to find the Alibaba Cloud website and extract information..."
}
]
},
{
"type": "web_search_call",
"status": "completed",
"action": {
"query": "Alibaba Cloud official website",
"type": "search",
"sources": [
{
"type": "url",
"url": "https://cn.aliyun.com/"
},
{
"type": "url",
"url": "https://www.alibabacloud.com/zh"
}
]
}
},
{
"type": "reasoning",
"summary": [
{
"type": "summary_text",
"text": "The search results show the Alibaba Cloud website URL..."
}
]
},
{
"type": "web_extractor_call",
"status": "completed",
"goal": "Extract key information from the Alibaba Cloud homepage",
"output": "Tongyi Large Language Model, full product portfolio, AI solutions...",
"urls": [
"https://cn.aliyun.com/"
]
},
{
"type": "message",
"role": "assistant",
"status": "completed",
"content": [
{
"type": "output_text",
"text": "Key information from the Alibaba Cloud website: Tongyi Large Language Model, cloud computing services..."
}
]
}
],
"usage": {
"input_tokens": 40836,
"output_tokens": 2106,
"total_tokens": 42942,
"output_tokens_details": {
"reasoning_tokens": 677
},
"x_tools": {
"web_extractor": {
"count": 1
},
"web_search": {
"count": 1
}
}
}
}
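A response that uses built-in tools interleaves reasoning, tool calls, and the final message, so it can help to summarize what was called. The sketch below counts tool-call items in the output array and reads the per-tool counts from usage.x_tools. It assumes that tool-call items have a type ending in "_call", as the two in the example above do; summarize_tool_calls is a hypothetical name.

```python
from collections import Counter


def summarize_tool_calls(response: dict):
    """Return (tool-call items seen in output, counts reported in usage.x_tools)."""
    seen = Counter(
        item["type"]
        for item in response.get("output", [])
        # Assumption: tool-call items use a type that ends in "_call".
        if item.get("type", "").endswith("_call")
    )
    reported = {
        name: info.get("count", 0)
        for name, info in response.get("usage", {}).get("x_tools", {}).items()
    }
    return dict(seen), reported


sample_response = {
    "output": [
        {"type": "reasoning", "summary": []},
        {"type": "web_search_call", "status": "completed"},
        {"type": "reasoning", "summary": []},
        {"type": "web_extractor_call", "status": "completed"},
        {"type": "message", "role": "assistant", "content": []},
    ],
    "usage": {
        "x_tools": {"web_extractor": {"count": 1}, "web_search": {"count": 1}},
    },
}
seen_calls, reported_counts = summarize_tool_calls(sample_response)
```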
Session cache
In multi-turn conversations, enable the session cache to let the server cache conversation context automatically. This reduces latency and costs without manual cache management.
Usage: To enable the session cache, add x-dashscope-session-cache: enable to the request header. To disable it, set the value to disable.
Supported models: qwen3.7-max, qwen3.7-max-2026-05-20, qwen3.7-max-2026-06-08, qwen3-max, qwen3.7-plus, qwen3.6-plus, qwen3.5-plus, qwen3.6-flash, qwen3.5-flash, qwen-plus, qwen-flash, qwen3-coder-plus, qwen3-coder-flash
The session cache requires a minimum prompt length of 1024 tokens and expires after 5 minutes. It shares the same constraints as the explicit cache.
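The header and the two limits above can be captured in a few lines. This sketch builds the request header and gives a rough local estimate of whether a follow-up request could hit the cache. Treating exactly 1024 tokens and exactly 5 minutes as still eligible is an assumption, and both function names are hypothetical.

```python
SESSION_CACHE_HEADER = "x-dashscope-session-cache"
MIN_CACHE_TOKENS = 1024      # minimum prompt length stated above
CACHE_TTL_SECONDS = 5 * 60   # cache lifetime stated above


def session_cache_headers(enabled: bool = True) -> dict:
    """Return the request header that enables or disables the session cache."""
    return {SESSION_CACHE_HEADER: "enable" if enabled else "disable"}


def cache_may_hit(context_tokens: int, seconds_since_last_request: float) -> bool:
    """Rough estimate only; the server makes the actual caching decision."""
    # Assumption: the boundary values (1024 tokens, 300 seconds) are eligible.
    return (
        context_tokens >= MIN_CACHE_TOKENS
        and seconds_since_last_request <= CACHE_TTL_SECONDS
    )
```

The dictionary from session_cache_headers can be passed as default_headers when constructing the client. The authoritative signal for a hit remains cached_tokens in the response usage.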
Python
import os
from openai import OpenAI
client = OpenAI(
    api_key=os.getenv("DASHSCOPE_API_KEY"),
    base_url="https://dashscope.aliyuncs.com/compatible-mode/v1",
    # Enable session cache via default_headers
    default_headers={"x-dashscope-session-cache": "enable"}
)
# Construct a long text exceeding 1024 tokens to trigger cache creation.
# (If the initial prompt is under 1024 tokens, the server creates the cache once the total context exceeds this threshold.)
long_context = "Artificial intelligence is an important branch of computer science that focuses on the research and development of theories, methods, technologies, and application systems that can simulate, extend, and expand human intelligence." * 50
# First request
response1 = client.responses.create(
    model="qwen3.7-plus",
    input=long_context + "\n\nBased on the background knowledge above, briefly introduce the random forest algorithm in machine learning.",
)
print(f"First response: {response1.output_text}")
# Second request: Link the context using previous_response_id. The server manages the cache automatically.
response2 = client.responses.create(
    model="qwen3.7-plus",
    input="What are the main differences between it and GBDT?",
    previous_response_id=response1.id,
)
print(f"Second response: {response2.output_text}")
# Check the cache hit status
usage = response2.usage
print(f"Input tokens: {usage.input_tokens}")
print(f"Cached tokens: {usage.input_tokens_details.cached_tokens}")
Node.js
import OpenAI from "openai";
const openai = new OpenAI({
    apiKey: process.env.DASHSCOPE_API_KEY,
    baseURL: "https://dashscope.aliyuncs.com/compatible-mode/v1",
    // Enable session cache via defaultHeaders
    defaultHeaders: {"x-dashscope-session-cache": "enable"}
});
// Construct a long text exceeding 1024 tokens to trigger cache creation.
// (If the initial prompt is under 1024 tokens, the server creates the cache once the total context exceeds this threshold.)
const longContext = "Artificial intelligence is an important branch of computer science that focuses on the research and development of theories, methods, technologies, and application systems that can simulate, extend, and expand human intelligence.".repeat(50);
async function main() {
    // First request
    const response1 = await openai.responses.create({
        model: "qwen3.7-plus",
        input: longContext + "\n\nBased on the background knowledge above, briefly introduce the random forest algorithm in machine learning, including its basic principles and use cases."
    });
    console.log(`First response: ${response1.output_text}`);
    // Second request: Link the context using previous_response_id. The server manages the cache automatically.
    const response2 = await openai.responses.create({
        model: "qwen3.7-plus",
        input: "What are the main differences between it and GBDT?",
        previous_response_id: response1.id
    });
    console.log(`Second response: ${response2.output_text}`);
    // Check the cache hit status
    console.log(`Input tokens: ${response2.usage.input_tokens}`);
    console.log(`Cached tokens: ${response2.usage.input_tokens_details.cached_tokens}`);
}
main();
curl
# First request
# The 'input' text must exceed 1024 tokens to trigger cache creation.
curl -X POST https://dashscope.aliyuncs.com/compatible-mode/v1/responses \
-H "Authorization: Bearer $DASHSCOPE_API_KEY" \
-H "Content-Type: application/json" \
-H "x-dashscope-session-cache: enable" \
-d '{
"model": "qwen3.7-plus",
"input": "Artificial intelligence is an important branch of computer science that focuses on the research and development of theories, methods, technologies, and application systems that can simulate, extend, and expand human intelligence. Artificial intelligence is an important branch of computer science that focuses on the research and development of theories, methods, technologies, and application systems that can simulate, extend, and expand human intelligence. Artificial intelligence is an important branch of computer science that focuses on the research and development of theories, methods, technologies, and application systems that can simulate, extend, and expand human intelligence. Artificial intelligence is an important branch of computer science that focuses on the research and development of theories, methods, technologies, and application systems that can simulate, extend, and expand human intelligence. Artificial intelligence is an important branch of computer science that focuses on the research and development of theories, methods, technologies, and application systems that can simulate, extend, and expand human intelligence. Artificial intelligence is an important branch of computer science that focuses on the research and development of theories, methods, technologies, and application systems that can simulate, extend, and expand human intelligence. Artificial intelligence is an important branch of computer science that focuses on the research and development of theories, methods, technologies, and application systems that can simulate, extend, and expand human intelligence. Artificial intelligence is an important branch of computer science that focuses on the research and development of theories, methods, technologies, and application systems that can simulate, extend, and expand human intelligence. Artificial intelligence is an important branch of computer science that focuses on the research and development of theories, methods, technologies, and application systems that can simulate, extend, and expand human intelligence. Artificial intelligence is an important branch of computer science that focuses on the research and development of theories, methods, technologies, and application systems that can simulate, extend, and expand human intelligence. Artificial intelligence is an important branch of computer science that focuses on the research and development of theories, methods, technologies, and application systems that can simulate, extend, and expand human intelligence. Artificial intelligence is an important branch of computer science that focuses on the research and development of theories, methods, technologies, and application systems that can simulate, extend, and expand human intelligence. Artificial intelligence is an important branch of computer science that focuses on the research and development of theories, methods, technologies, and application systems that can simulate, extend, and expand human intelligence. Artificial intelligence is an important branch of computer science that focuses on the research and development of theories, methods, technologies, and application systems that can simulate, extend, and expand human intelligence. Artificial intelligence is an important branch of computer science that focuses on the research and development of theories, methods, technologies, and application systems that can simulate, extend, and expand human intelligence. Artificial intelligence is an important branch of computer science that focuses on the research and development of theories, methods, technologies, and application systems that can simulate, extend, and expand human intelligence. Artificial intelligence is an important branch of computer science that focuses on the research and development of theories, methods, technologies, and application systems that can simulate, extend, and expand human intelligence. Artificial intelligence is an important branch of computer science that focuses on the research and development of theories, methods, technologies, and application systems that can simulate, extend, and expand human intelligence. Artificial intelligence is an important branch of computer science that focuses on the research and development of theories, methods, technologies, and application systems that can simulate, extend, and expand human intelligence. Artificial intelligence is an important branch of computer science that focuses on the research and development of theories, methods, technologies, and application systems that can simulate, extend, and expand human intelligence.\n\nBased on the background knowledge above, briefly introduce the random forest algorithm in machine learning, including its basic principles and use cases."
}'
# Second request - Set 'previous_response_id' to the 'id' from the first response.
curl -X POST https://dashscope.aliyuncs.com/compatible-mode/v1/responses \
-H "Authorization: Bearer $DASHSCOPE_API_KEY" \
-H "Content-Type: application/json" \
-H "x-dashscope-session-cache: enable" \
-d '{
"model": "qwen3.7-plus",
"input": "What are the main differences between it and GBDT?",
"previous_response_id": "id_from_first_response"
}'
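To judge how much of a follow-up request was served from the session cache, the two usage fields printed in the examples above can be combined into a ratio. This is a minimal sketch: the helper name cached_fraction is hypothetical, and the stand-in usage object and its numbers are made up for illustration rather than taken from a real response.

```python
from types import SimpleNamespace


def cached_fraction(usage) -> float:
    """Return the share of input tokens served from cache (0.0 to 1.0)."""
    input_tokens = getattr(usage, "input_tokens", 0) or 0
    details = getattr(usage, "input_tokens_details", None)
    # `details` may be missing; getattr on None falls back to the default.
    cached = getattr(details, "cached_tokens", 0) or 0
    if input_tokens <= 0:
        return 0.0
    return cached / input_tokens


# Stand-in for `response2.usage`; the numbers are illustrative only.
example_usage = SimpleNamespace(
    input_tokens=2000,
    input_tokens_details=SimpleNamespace(cached_tokens=1500),
)
print(f"Cached fraction: {cached_fraction(example_usage):.0%}")  # Cached fraction: 75%
```

With a real client, pass response2.usage from the Python example instead of the stand-in object.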
Migrate from Chat Completions API to Responses API
The Responses API simplifies the Chat Completions API interface while maintaining compatibility. To migrate, follow these steps.
1. Update the endpoint address
Change the request path from /v1/chat/completions to /v1/responses. If you use the OpenAI SDK, call client.responses.create instead of client.chat.completions.create.
Python
import os
from openai import OpenAI
client = OpenAI(
    api_key=os.getenv("DASHSCOPE_API_KEY"),
    base_url="https://dashscope.aliyuncs.com/compatible-mode/v1",
)
# Chat Completions API
completion = client.chat.completions.create(
    model="qwen3.7-plus",
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Hello!"}
    ]
)
print(completion.choices[0].message.content)
# Responses API - can use the same message format
response = client.responses.create(
    model="qwen3.7-plus",
    input=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Hello!"}
    ]
)
print(response.output_text)
# Responses API - or use a more concise format
response = client.responses.create(
    model="qwen3.7-plus",
    input="Hello!"
)
print(response.output_text)
Node.js
// Top-level await requires an ES module (for example, a .mjs file).
import OpenAI from "openai";
const client = new OpenAI({
    apiKey: process.env.DASHSCOPE_API_KEY,
    baseURL: "https://dashscope.aliyuncs.com/compatible-mode/v1"
});
// Chat Completions API
const completion = await client.chat.completions.create({
    model: "qwen3.7-plus",
    messages: [
        { role: "system", content: "You are a helpful assistant." },
        { role: "user", content: "Hello!" }
    ]
});
console.log(completion.choices[0].message.content);
// Responses API - can use the same message format
const response = await client.responses.create({
    model: "qwen3.7-plus",
    input: [
        { role: "system", content: "You are a helpful assistant." },
        { role: "user", content: "Hello!" }
    ]
});
console.log(response.output_text);
// Responses API - or use a more concise format
const response2 = await client.responses.create({
    model: "qwen3.7-plus",
    input: "Hello!"
});
console.log(response2.output_text);
curl
# Chat Completions API
curl -X POST https://dashscope.aliyuncs.com/compatible-mode/v1/chat/completions \
-H "Authorization: Bearer $DASHSCOPE_API_KEY" \
-H "Content-Type: application/json" \
-d '{
"model": "qwen3.7-plus",
"messages": [
{"role": "system", "content": "You are a helpful assistant."},
{"role": "user", "content": "Hello!"}
]
}'
# Responses API - use a more concise format
curl -X POST https://dashscope.aliyuncs.com/compatible-mode/v1/responses \
-H "Authorization: Bearer $DASHSCOPE_API_KEY" \
-H "Content-Type: application/json" \
-d '{
"model": "qwen3.7-plus",
"input": "Hello!"
}'
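For code that builds raw HTTP requests, the change in step 1 amounts to swapping the path and renaming messages to input. The sketch below covers only those two changes, as shown in the examples above. The helper names are hypothetical, and it assumes every other request field carries over unchanged, which may not hold for all parameters, so check the parameter reference before relying on it.

```python
def to_responses_payload(chat_payload: dict) -> dict:
    """Convert a Chat Completions request body into a Responses API request body."""
    payload = dict(chat_payload)  # shallow copy so the caller's dict is untouched
    if "messages" in payload:
        # The Responses API accepts the same message array under `input`.
        payload["input"] = payload.pop("messages")
    return payload


def to_responses_url(chat_url: str) -> str:
    """Swap a trailing /chat/completions path for /responses."""
    suffix = "/chat/completions"
    if not chat_url.endswith(suffix):
        raise ValueError(f"Not a Chat Completions URL: {chat_url}")
    return chat_url[: -len(suffix)] + "/responses"


chat_payload = {
    "model": "qwen3.7-plus",
    "messages": [
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Hello!"},
    ],
}
responses_payload = to_responses_payload(chat_payload)
responses_url = to_responses_url(
    "https://dashscope.aliyuncs.com/compatible-mode/v1/chat/completions"
)
print(responses_url)
print(sorted(responses_payload))  # ['input', 'model']
```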
2. Update response handling
The Responses API returns a different response structure. Use the output_text shortcut to retrieve the text output, or access detailed information through the output array.
Response comparison
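As an illustration of the two shapes, the sketch below reads the reply text from each kind of response when it is handled as a plain dict, for example a parsed HTTP body. The helper names are hypothetical and the sample payloads are trimmed, illustrative stand-ins, not real responses. The reasoning item in the second sample is only there to show that non-message items are skipped.

```python
def chat_completion_text(completion: dict) -> str:
    """Chat Completions: the text lives in choices[0].message.content."""
    return completion["choices"][0]["message"]["content"]


def responses_text(response: dict) -> str:
    """Responses API: join every output_text part found in the output array."""
    parts = []
    for item in response.get("output", []):
        if item.get("type") != "message":
            continue  # skip tool-call, reasoning, and other non-message items
        for part in item.get("content", []):
            if part.get("type") == "output_text":
                parts.append(part.get("text", ""))
    return "".join(parts)


# Trimmed, illustrative payloads.
chat_sample = {
    "choices": [{"message": {"role": "assistant", "content": "Hello! How can I help?"}}]
}
responses_sample = {
    "output": [
        {"type": "reasoning", "summary": []},
        {
            "type": "message",
            "role": "assistant",
            "status": "completed",
            "content": [{"type": "output_text", "text": "Hello! How can I help?"}],
        },
    ]
}

print(chat_completion_text(chat_sample))
print(responses_text(responses_sample))
```

Both calls print the same sentence. With the OpenAI SDK, response.output_text performs the second lookup for you.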
3. Simplify multi-turn conversations
With the Chat Completions API, you must manually manage the message history array. The Responses API simplifies this process by using the previous_response_id parameter to automatically link conversation context. The response id is valid for 7 days.
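The chaining described above can be sketched as a small loop. The helper name run_conversation and the sample prompts are hypothetical. The client argument is assumed to be an OpenAI SDK client configured with the Model Studio base URL, as in the earlier examples.

```python
def run_conversation(client, model: str, prompts: list[str]) -> list[str]:
    """Send prompts in order, chaining each request to the previous response."""
    replies = []
    previous_id = None
    for prompt in prompts:
        kwargs = {"model": model, "input": prompt}
        if previous_id is not None:
            # Link this turn to the prior one instead of resending the history.
            kwargs["previous_response_id"] = previous_id
        response = client.responses.create(**kwargs)
        replies.append(response.output_text)
        previous_id = response.id
    return replies


# Usage with a real client (requires DASHSCOPE_API_KEY and network access):
#
# import os
# from openai import OpenAI
# client = OpenAI(
#     api_key=os.getenv("DASHSCOPE_API_KEY"),
#     base_url="https://dashscope.aliyuncs.com/compatible-mode/v1",
# )
# prompts = ["My name is Alice.", "What is my name?"]
# for reply in run_conversation(client, "qwen3.7-plus", prompts):
#     print(reply)
```

Only the newest user message is sent on each turn. The equivalent Chat Completions loop would append every user and assistant message to a growing messages array and resend it in full.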
4. Use built-in tools
The Responses API includes built-in tools. Specify them in the tools parameter. The Code Interpreter and web search tools are free for a limited time. See tool calling.
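As a rough sketch of what specifying tools can look like, the helper below builds the keyword arguments for client.responses.create. The helper name build_tool_request and the prompt are hypothetical. The type strings web_search and web_extractor are an assumption based on the x_tools keys reported in the usage block of a response. Some tools may need additional parameters, so confirm the exact names and options in Call built-in tools.

```python
def build_tool_request(model: str, prompt: str, tool_types: list[str]) -> dict:
    """Build keyword arguments for client.responses.create with built-in tools."""
    if not tool_types:
        raise ValueError("tool_types must contain at least one tool name")
    return {
        "model": model,
        "input": prompt,
        # Each built-in tool is declared as an object with a `type` field.
        "tools": [{"type": tool_type} for tool_type in tool_types],
    }


# Assumption: these type strings mirror the x_tools keys seen in responses.
request = build_tool_request(
    "qwen3.7-plus",
    "Summarize the key information on the Alibaba Cloud homepage.",
    ["web_search", "web_extractor"],
)
print(request["tools"])  # [{'type': 'web_search'}, {'type': 'web_extractor'}]

# With a configured OpenAI SDK client:
# response = client.responses.create(**request)
# print(response.output_text)
```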
FAQ
Q: How do I pass multi-turn conversation context?
A: Pass the id from the previous successful model response as the previous_response_id parameter in your next conversation request.
Q: Why can't I print 'output_text'?
A: This attribute is missing in some versions of the OpenAI Python SDK, such as 1.99.2. To resolve this error, update the SDK to the latest version.