The Qwen API is stateless. To implement multi-turn conversations, pass conversation history in each request. Use truncation, summarization, or retrieval to manage context and reduce token consumption.
This topic covers OpenAI-compatible Chat Completion and DashScope interfaces. For a simpler alternative, see OpenAI-compatible - Responses.
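As a minimal sketch of the truncation strategy mentioned above, the helper below keeps system messages plus the most recent turns. The name `truncate_history` and its `max_messages` parameter are hypothetical and not part of any SDK; a production version would more likely budget by token count than by message count.

```python
def truncate_history(messages, max_messages=6):
    """Keep system messages plus the last `max_messages` other messages.

    Hypothetical helper for illustration; not part of the OpenAI or DashScope SDKs.
    """
    system = [m for m in messages if m["role"] == "system"]
    rest = [m for m in messages if m["role"] != "system"]
    recent = rest[-max_messages:] if max_messages > 0 else []
    # Drop leading non-user messages so the window starts with a user turn.
    while recent and recent[0]["role"] != "user":
        recent = recent[1:]
    return system + recent


# Build a sample history: one system prompt followed by five question/answer rounds.
history = [{"role": "system", "content": "You are a helpful assistant."}]
for i in range(5):
    history.append({"role": "user", "content": f"question {i}"})
    history.append({"role": "assistant", "content": f"answer {i}"})

trimmed = truncate_history(history, max_messages=4)
print([m["content"] for m in trimmed])
```

Passing `trimmed` instead of `history` in the next request shortens the context at the cost of the model forgetting the earlier rounds.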
How it works
To implement multi-turn conversations, maintain a messages array. After each round, append the user's question and model's response and then use the updated array for the next request.
The following example shows how the state of the messages array changes during a multi-turn conversation:
- First round

  Add the user's question to the messages array.

  // Use a text model
  [
    {"role": "user", "content": "Recommend a sci-fi movie about space exploration."}
  ]

  // Use a multimodal model, for example, Qwen-VL
  // {"role": "user",
  //  "content": [{"type": "image_url", "image_url": {"url": "https://help-static-aliyun-doc.aliyuncs.com/file-manage-files/zh-CN/20251031/ownrof/f26d201b1e3f4e62ab4a1fc82dd5c9bb.png"}},
  //              {"type": "text", "text": "What products are shown in the image?"}]
  // }

- Second round

  Add the model's response and the user's latest question to the messages array.

  // Use a text model
  [
    {"role": "user", "content": "Recommend a sci-fi movie about space exploration."},
    {"role": "assistant", "content": "I recommend 'XXX'. It is a classic sci-fi work."},
    {"role": "user", "content": "Who is the director of this movie?"}
  ]

  // Use a multimodal model, for example, Qwen-VL
  // [
  //   {"role": "user", "content": [
  //     {"type": "image_url", "image_url": {"url": "https://help-static-aliyun-doc.aliyuncs.com/file-manage-files/zh-CN/20251031/ownrof/f26d201b1e3f4e62ab4a1fc82dd5c9bb.png"}},
  //     {"type": "text", "text": "What products are shown in the image?"}]},
  //   {"role": "assistant", "content": "The image shows three items: a pair of light blue overalls, a blue and white striped short-sleeve shirt, and a pair of white sneakers."},
  //   {"role": "user", "content": "What style are they?"}
  // ]
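The bookkeeping described above can be sketched as a small wrapper. `Conversation` and `echo_model` are hypothetical names; `echo_model` is a stand-in for a real API call so that the sketch runs offline.

```python
class Conversation:
    """Hypothetical wrapper that owns the messages array for one dialogue."""

    def __init__(self, call_model, system_prompt=None):
        self.call_model = call_model  # callable: list of messages -> reply text
        self.messages = []
        if system_prompt:
            self.messages.append({"role": "system", "content": system_prompt})

    def ask(self, question):
        # Append the user's question, send the full history, then store the reply.
        self.messages.append({"role": "user", "content": question})
        reply = self.call_model(self.messages)
        self.messages.append({"role": "assistant", "content": reply})
        return reply


def echo_model(messages):
    """Stand-in for a real request; reports how many messages it received."""
    return f"(reply after seeing {len(messages)} message(s))"


chat = Conversation(echo_model)
chat.ask("Recommend a sci-fi movie about space exploration.")
chat.ask("Who is the director of this movie?")
print([m["role"] for m in chat.messages])
```

In real use, `call_model` would wrap a request such as the `get_response` functions in the samples that follow; the second call receives three messages because the first round is replayed in full.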
Getting started
OpenAI compatible
Python
import os
from openai import OpenAI
client = OpenAI(
    # If you have not configured the environment variable, replace the following line with your Model Studio API key: api_key="sk-xxx",
    # API keys vary by region. To obtain an API key, see https://help.aliyun.com/document_detail/2795253.html
    api_key=os.getenv("DASHSCOPE_API_KEY"),
    # China (Beijing) region URL. Replace {WorkspaceId} with your actual workspace ID. URLs vary by region.
    base_url="https://{WorkspaceId}.cn-beijing.maas.aliyuncs.com/compatible-mode/v1",
)

def get_response(messages):
    completion = client.chat.completions.create(
        model="qwen-plus",
        messages=messages
    )
    return completion.choices[0].message.content
# Initialize messages
messages = []
# Round 1
messages.append({"role": "user", "content": "Recommend a sci-fi movie about space exploration."})
print("Round 1")
print(f"User: {messages[0]['content']}")
assistant_output = get_response(messages)
messages.append({"role": "assistant", "content": assistant_output})
print(f"Model: {assistant_output}\n")
# Round 2
messages.append({"role": "user", "content": "Who is the director of this movie?"})
print("Round 2")
print(f"User: {messages[-1]['content']}")
assistant_output = get_response(messages)
messages.append({"role": "assistant", "content": assistant_output})
print(f"Model: {assistant_output}\n")
Node.js
import OpenAI from "openai";
// China (Beijing) region URL. Replace {WorkspaceId} with your actual workspace ID. URLs vary by region.
const BASE_URL = "https://{WorkspaceId}.cn-beijing.maas.aliyuncs.com/compatible-mode/v1";
// API keys vary by region. To obtain an API key, see https://help.aliyun.com/document_detail/2795253.html
const openai = new OpenAI({
// If you have not configured the environment variable, replace the following line with: apiKey:"sk-xxx",
apiKey: process.env.DASHSCOPE_API_KEY,
baseURL: BASE_URL,
});
async function getResponse(messages) {
const completion = await openai.chat.completions.create({
model: "qwen-plus",
messages: messages,
});
return completion.choices[0].message.content;
}
async function runConversation() {
const messages = [];
// Round 1
messages.push({ role: "user", content: "Recommend a sci-fi movie about space exploration." });
console.log("Round 1");
console.log("User: " + messages[0].content);
let assistant_output = await getResponse(messages);
messages.push({ role: "assistant", content: assistant_output });
console.log("Model: " + assistant_output + "\n");
// Round 2
messages.push({ role: "user", content: "Who is the director of this movie?" });
console.log("Round 2");
console.log("User: " + messages[messages.length - 1].content);
assistant_output = await getResponse(messages);
messages.push({ role: "assistant", content: assistant_output });
console.log("Model: " + assistant_output + "\n");
}
runConversation();
curl
# ======= Important =======
# API keys vary by region. To obtain an API key, see https://help.aliyun.com/document_detail/2795253.html
# China (Beijing) region URL. Replace {WorkspaceId} with your actual workspace ID. URLs vary by region.
# === Delete this comment before execution ===
curl -X POST https://{WorkspaceId}.cn-beijing.maas.aliyuncs.com/compatible-mode/v1/chat/completions \
-H "Authorization: Bearer $DASHSCOPE_API_KEY" \
-H "Content-Type: application/json" \
-d '{
"model": "qwen-plus",
"messages":[
{
"role": "system",
"content": "You are a helpful assistant."
},
{
"role": "user",
"content": "Hello"
},
{
"role": "assistant",
"content": "Hello, I am Qwen."
},
{
"role": "user",
"content": "What can you do?"
}
]
}'
DashScope
Python
This sample simulates a phone store salesperson who asks the customer follow-up questions over several turns to determine their purchase intent, and then ends the session.
import os
from dashscope import Generation
import dashscope
# China (Beijing) region URL. Replace {WorkspaceId} with your actual workspace ID. URLs vary by region.
dashscope.base_http_api_url = "https://{WorkspaceId}.cn-beijing.maas.aliyuncs.com/api/v1"
def get_response(messages):
    response = Generation.call(
        # API keys vary by region. To obtain an API key, see https://help.aliyun.com/document_detail/2795253.html
        # If you have not configured the environment variable, replace the following line with your Model Studio API key: api_key="sk-xxx",
        api_key=os.getenv("DASHSCOPE_API_KEY"),
        # For a list of models, see https://help.aliyun.com/document_detail/2751232.html
        model="qwen-plus",
        messages=messages,
        result_format="message",
    )
    return response
# Initialize a messages array
messages = [
{
"role": "system",
"content": """You are a salesperson at the Bailian phone store. You are responsible for recommending phones to users. The phones have two parameters: screen size (including 6.1-inch, 6.5-inch, and 6.7-inch) and resolution (including 2K and 4K).
You can only ask the user for one parameter at a time. If the user does not provide complete information, you need to ask a follow-up question to get the missing parameter. When all parameters are collected, you must say: I have understood your purchase intention. Please wait.""",
}
]
assistant_output = "Welcome to the Bailian phone store. What screen size are you looking for?"
print(f"Model output: {assistant_output}\n")
while "I have understood your purchase intention" not in assistant_output:
    user_input = input("Please enter: ")
    # Add the user's question to the messages list
    messages.append({"role": "user", "content": user_input})
    assistant_output = get_response(messages).output.choices[0].message.content
    # Add the model's response to the messages list
    messages.append({"role": "assistant", "content": assistant_output})
    print(f"Model output: {assistant_output}")
    print("\n")
Java
import java.util.ArrayList;
import java.util.List;
import com.alibaba.dashscope.aigc.generation.Generation;
import com.alibaba.dashscope.aigc.generation.GenerationParam;
import com.alibaba.dashscope.aigc.generation.GenerationResult;
import com.alibaba.dashscope.common.Message;
import com.alibaba.dashscope.common.Role;
import com.alibaba.dashscope.exception.ApiException;
import com.alibaba.dashscope.exception.InputRequiredException;
import com.alibaba.dashscope.exception.NoApiKeyException;
import java.util.Scanner;
import com.alibaba.dashscope.utils.Constants;
public class Main {
// China (Beijing) region URL. Replace {WorkspaceId} with your actual workspace ID. URLs vary by region.
static {
Constants.baseHttpApiUrl="https://{WorkspaceId}.cn-beijing.maas.aliyuncs.com/api/v1";
}
public static GenerationParam createGenerationParam(List<Message> messages) {
return GenerationParam.builder()
// If you have not configured the environment variable, replace the following line with your Model Studio API key: .apiKey("sk-xxx")
// API keys vary by region. To obtain an API key, see https://help.aliyun.com/document_detail/2795253.html
.apiKey(System.getenv("DASHSCOPE_API_KEY"))
// For a list of models, see https://help.aliyun.com/document_detail/2751232.html
.model("qwen-plus")
.messages(messages)
.resultFormat(GenerationParam.ResultFormat.MESSAGE)
.build();
}
public static GenerationResult callGenerationWithMessages(GenerationParam param) throws ApiException, NoApiKeyException, InputRequiredException {
Generation gen = new Generation();
return gen.call(param);
}
public static void main(String[] args) {
try {
List<Message> messages = new ArrayList<>();
messages.add(createMessage(Role.SYSTEM, "You are a helpful assistant."));
for (int i = 0; i < 3;i++) {
Scanner scanner = new Scanner(System.in);
System.out.print("Please enter: ");
String userInput = scanner.nextLine();
if ("exit".equalsIgnoreCase(userInput)) {
break;
}
messages.add(createMessage(Role.USER, userInput));
GenerationParam param = createGenerationParam(messages);
GenerationResult result = callGenerationWithMessages(param);
System.out.println("Model output: "+result.getOutput().getChoices().get(0).getMessage().getContent());
messages.add(result.getOutput().getChoices().get(0).getMessage());
}
} catch (ApiException | NoApiKeyException | InputRequiredException e) {
e.printStackTrace();
}
System.exit(0);
}
private static Message createMessage(Role role, String content) {
return Message.builder().role(role.getValue()).content(content).build();
}
}
curl
# ======= Important =======
# API keys vary by region. To obtain an API key, see https://help.aliyun.com/document_detail/2795253.html
# China (Beijing) region URL. Replace {WorkspaceId} with your actual workspace ID. URLs vary by region.
# === Delete this comment before execution ===
curl -X POST https://{WorkspaceId}.cn-beijing.maas.aliyuncs.com/api/v1/services/aigc/text-generation/generation \
-H "Authorization: Bearer $DASHSCOPE_API_KEY" \
-H "Content-Type: application/json" \
-d '{
"model": "qwen-plus",
"input":{
"messages":[
{
"role": "system",
"content": "You are a helpful assistant."
},
{
"role": "user",
"content": "Hello"
},
{
"role": "assistant",
"content": "Hello, I am Qwen."
},
{
"role": "user",
"content": "What can you do?"
}
]
}
}'
For multimodal models
Multimodal models support images and audio in conversations. Implementation differs from text models as follows:
-
Construction of user messages: User messages for multimodal models can contain multimodal information, such as images and audio, in addition to text.
-
DashScope SDK interface: When you use the DashScope Python SDK, call the MultiModalConversation interface. When you use the DashScope Java SDK, call the MultiModalConversation class.
For multimodal models, see: Image and video understanding, User interface interaction, and Kimi. For Qwen-Omni, see Non-real-time (Qwen-Omni). Qwen-VL-OCR and Qwen3-Omni-Captioner are designed for specific single-turn tasks and do not support multi-turn conversations.
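The two interfaces expect different content shapes for the same user turn, as the samples in this section show. The helpers below are a sketch that builds each shape from a question and an optional image URL; `openai_user_message` and `dashscope_user_message` are hypothetical names, the URL is a placeholder, and audio or video parts are not covered.

```python
def openai_user_message(text, image_url=None):
    """Build a user message in the OpenAI-compatible content format."""
    content = []
    if image_url:
        content.append({"type": "image_url", "image_url": {"url": image_url}})
    content.append({"type": "text", "text": text})
    return {"role": "user", "content": content}


def dashscope_user_message(text, image_url=None):
    """Build a user message in the DashScope multimodal content format."""
    content = []
    if image_url:
        content.append({"image": image_url})
    content.append({"text": text})
    return {"role": "user", "content": content}


# Placeholder URL for illustration only.
url = "https://example.com/products.png"
first_openai = openai_user_message("What products are shown in the image?", url)
first_dashscope = dashscope_user_message("What products are shown in the image?", url)
follow_up = dashscope_user_message("What style are they?")
print(first_openai)
print(first_dashscope)
print(follow_up)
```

Follow-up turns usually omit the image, because the earlier message that carries it is still present in the history.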
OpenAI compatible
Python
from openai import OpenAI
import os
client = OpenAI(
# If you have not configured the environment variable, replace the following line with your Model Studio API key: api_key="sk-xxx"
# API keys vary by region. To obtain an API key, see https://help.aliyun.com/document_detail/2795253.html
api_key=os.getenv("DASHSCOPE_API_KEY"),
# China (Beijing) region URL. Replace {WorkspaceId} with your actual workspace ID. URLs vary by region.
base_url="https://{WorkspaceId}.cn-beijing.maas.aliyuncs.com/compatible-mode/v1"
)
messages = [
{
"role": "user",
"content": [
{
"type": "image_url",
"image_url": {
"url": "https://help-static-aliyun-doc.aliyuncs.com/file-manage-files/zh-CN/20251031/ownrof/f26d201b1e3f4e62ab4a1fc82dd5c9bb.png"
},
},
{"type": "text", "text": "What products are shown in the image?"},
],
}
]
completion = client.chat.completions.create(
model="qwen3-vl-plus", # You can replace this with other multimodal models and modify the messages as needed
messages=messages,
)
print(f"First round output: {completion.choices[0].message.content}")
assistant_message = completion.choices[0].message
messages.append(assistant_message.model_dump())
messages.append({
"role": "user",
"content": [
{
"type": "text",
"text": "What style are they?"
}
]
})
completion = client.chat.completions.create(
model="qwen3-vl-plus",
messages=messages,
)
print(f"Second round output: {completion.choices[0].message.content}")
Node.js
import OpenAI from "openai";
const openai = new OpenAI(
{
// If you have not configured the environment variable, replace the following line with your Model Studio API key: apiKey: "sk-xxx",
// API keys vary by region. To obtain an API key, see https://help.aliyun.com/document_detail/2795253.html
apiKey: process.env.DASHSCOPE_API_KEY,
// China (Beijing) region URL. Replace {WorkspaceId} with your actual workspace ID. URLs vary by region.
baseURL: "https://{WorkspaceId}.cn-beijing.maas.aliyuncs.com/compatible-mode/v1"
}
);
let messages = [
{
role: "user",
content: [
{ type: "image_url", image_url: { "url": "https://help-static-aliyun-doc.aliyuncs.com/file-manage-files/zh-CN/20251031/ownrof/f26d201b1e3f4e62ab4a1fc82dd5c9bb.png" } },
{ type: "text", text: "What products are shown in the image?" },
]
}]
async function main() {
let response = await openai.chat.completions.create({
model: "qwen3-vl-plus", // You can replace this with other multimodal models and modify the messages as needed
messages: messages
});
console.log(`First round output: ${response.choices[0].message.content}`);
messages.push(response.choices[0].message);
messages.push({"role": "user", "content": "What style are they?"});
response = await openai.chat.completions.create({
model: "qwen3-vl-plus",
messages: messages
});
console.log(`Second round output: ${response.choices[0].message.content}`);
}
main()
curl
# ======= Important =======
# API keys vary by region. To obtain an API key, see https://help.aliyun.com/document_detail/2795253.html
# China (Beijing) region URL. Replace {WorkspaceId} with your actual workspace ID. URLs vary by region.
# === Delete this comment before execution ===
curl -X POST https://{WorkspaceId}.cn-beijing.maas.aliyuncs.com/compatible-mode/v1/chat/completions \
-H "Authorization: Bearer $DASHSCOPE_API_KEY" \
-H 'Content-Type: application/json' \
-d '{
"model": "qwen3-vl-plus",
"messages": [
{
"role": "user",
"content": [
{
"type": "image_url",
"image_url": {
"url": "https://help-static-aliyun-doc.aliyuncs.com/file-manage-files/zh-CN/20251031/ownrof/f26d201b1e3f4e62ab4a1fc82dd5c9bb.png"
}
},
{
"type": "text",
"text": "What products are shown in the image?"
}
]
},
{
"role": "assistant",
"content": [
{
"type": "text",
"text": "The image shows three items: a pair of light blue overalls, a blue and white striped short-sleeve shirt, and a pair of white sneakers."
}
]
},
{
"role": "user",
"content": [
{
"type": "text",
"text": "What style are they?"
}
]
}
]
}'
DashScope
Python
import os
import dashscope
from dashscope import MultiModalConversation
# China (Beijing) region URL. Replace {WorkspaceId} with your actual workspace ID. URLs vary by region.
dashscope.base_http_api_url = "https://{WorkspaceId}.cn-beijing.maas.aliyuncs.com/api/v1"
messages = [
{
"role": "user",
"content": [
{
"image": "https://help-static-aliyun-doc.aliyuncs.com/file-manage-files/zh-CN/20251031/ownrof/f26d201b1e3f4e62ab4a1fc82dd5c9bb.png"
},
{"text": "What products are shown in the image?"},
],
}
]
response = MultiModalConversation.call(
# If you have not configured the environment variable, replace the following line with your Model Studio API key: api_key="sk-xxx",
# API keys vary by region. To obtain an API key, see https://help.aliyun.com/document_detail/2795253.html
api_key=os.getenv('DASHSCOPE_API_KEY'),
model='qwen3-vl-plus', # You can replace this with other multimodal models and modify the messages as needed
messages=messages)
print(f"Model first round output: {response.output.choices[0].message.content[0]['text']}")
messages.append(response['output']['choices'][0]['message'])
user_msg = {"role": "user", "content": [{"text": "What style are they?"}]}
messages.append(user_msg)
response = MultiModalConversation.call(
# If you have not configured the environment variable, replace the following line with your Model Studio API key: api_key="sk-xxx",
api_key=os.getenv('DASHSCOPE_API_KEY'),
model='qwen3-vl-plus',
messages=messages)
print(f"Model second round output: {response.output.choices[0].message.content[0]['text']}")
Java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import com.alibaba.dashscope.aigc.multimodalconversation.MultiModalConversation;
import com.alibaba.dashscope.aigc.multimodalconversation.MultiModalConversationParam;
import com.alibaba.dashscope.aigc.multimodalconversation.MultiModalConversationResult;
import com.alibaba.dashscope.common.MultiModalMessage;
import com.alibaba.dashscope.common.Role;
import com.alibaba.dashscope.exception.ApiException;
import com.alibaba.dashscope.exception.NoApiKeyException;
import com.alibaba.dashscope.exception.UploadFileException;
import com.alibaba.dashscope.utils.Constants;
public class Main {
// China (Beijing) region URL. Replace {WorkspaceId} with your actual workspace ID. URLs vary by region.
static {Constants.baseHttpApiUrl="https://{WorkspaceId}.cn-beijing.maas.aliyuncs.com/api/v1";}
private static final String modelName = "qwen3-vl-plus"; // You can replace this with other multimodal models and modify the messages as needed
public static void MultiRoundConversationCall() throws ApiException, NoApiKeyException, UploadFileException {
MultiModalConversation conv = new MultiModalConversation();
MultiModalMessage userMessage = MultiModalMessage.builder().role(Role.USER.getValue())
.content(Arrays.asList(Collections.singletonMap("image", "https://help-static-aliyun-doc.aliyuncs.com/file-manage-files/zh-CN/20251031/ownrof/f26d201b1e3f4e62ab4a1fc82dd5c9bb.png"),
Collections.singletonMap("text", "What products are shown in the image?"))).build();
List<MultiModalMessage> messages = new ArrayList<>();
messages.add(userMessage);
MultiModalConversationParam param = MultiModalConversationParam.builder()
// If you have not configured the environment variable, replace the following line with your Model Studio API key: .apiKey("sk-xxx")
// API keys vary by region. To obtain an API key, see https://help.aliyun.com/document_detail/2795253.html
.apiKey(System.getenv("DASHSCOPE_API_KEY"))
.model(modelName)
.messages(messages)
.build();
MultiModalConversationResult result = conv.call(param);
System.out.println("First round output: "+result.getOutput().getChoices().get(0).getMessage().getContent().get(0).get("text")); // add the result to conversation
messages.add(result.getOutput().getChoices().get(0).getMessage());
MultiModalMessage msg = MultiModalMessage.builder().role(Role.USER.getValue())
.content(Arrays.asList(Collections.singletonMap("text", "What style are they?"))).build();
messages.add(msg);
param.setMessages((List)messages);
result = conv.call(param);
System.out.println("Second round output: "+result.getOutput().getChoices().get(0).getMessage().getContent().get(0).get("text")); }
public static void main(String[] args) {
try {
MultiRoundConversationCall();
} catch (ApiException | NoApiKeyException | UploadFileException e) {
System.out.println(e.getMessage());
}
System.exit(0);
}
}
curl
# ======= Important =======
# API keys vary by region. To obtain an API key, see https://help.aliyun.com/document_detail/2795253.html
# China (Beijing) region URL. Replace {WorkspaceId} with your actual workspace ID. URLs vary by region.
# === Delete this comment before execution ===
curl -X POST https://{WorkspaceId}.cn-beijing.maas.aliyuncs.com/api/v1/services/aigc/multimodal-generation/generation \
-H "Authorization: Bearer $DASHSCOPE_API_KEY" \
-H 'Content-Type: application/json' \
-d '{
"model": "qwen3-vl-plus",
"input":{
"messages":[
{
"role": "user",
"content": [
{"image": "https://help-static-aliyun-doc.aliyuncs.com/file-manage-files/zh-CN/20251031/ownrof/f26d201b1e3f4e62ab4a1fc82dd5c9bb.png"},
{"text": "What products are shown in the image?"}
]
},
{
"role": "assistant",
"content": [
{"text": "The image shows three items: a pair of light blue overalls, a blue and white striped short-sleeve shirt, and a pair of white sneakers."}
]
},
{
"role": "user",
"content": [
{"text": "What style are they?"}
]
}
]
}
}'
For thinking models
Thinking models return reasoning_content (thinking process) and content (response). When updating messages, retain only content and ignore reasoning_content.
[
{"role": "user", "content": "Recommend a sci-fi movie about space exploration."},
{"role": "assistant", "content": "I recommend 'XXX'. It is a classic sci-fi work."}, # Do not add the reasoning_content field when you add to the context
{"role": "user", "content": "Who is the director of this movie?"}
]
For more information about thinking models, see Deep thinking, Image and video understanding, and Visual reasoning.
For more information about implementing multi-turn conversations with Qwen3-Omni-Flash (thinking mode), see omni-modal.
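The rule above (keep content, drop reasoning_content) can be sketched as a small filter applied before a reply is appended to the history. `to_context_message` is a hypothetical helper, and the sketch assumes the assistant reply is available as a plain dict; SDK response objects would need converting first.

```python
def to_context_message(assistant_message):
    """Return a new assistant message that carries only role and content.

    Hypothetical helper; assumes `assistant_message` is a plain dict.
    """
    return {"role": "assistant", "content": assistant_message.get("content") or ""}


# Example reply containing both fields, written by hand for illustration.
raw_reply = {
    "role": "assistant",
    "reasoning_content": "The user wants a space exploration film...",
    "content": "I recommend 'XXX'. It is a classic sci-fi work.",
}

messages = [{"role": "user", "content": "Recommend a sci-fi movie about space exploration."}]
messages.append(to_context_message(raw_reply))
messages.append({"role": "user", "content": "Who is the director of this movie?"})
print(messages[1])
```

Building a new dict, rather than deleting the key in place, leaves the original reply intact in case the thinking process still needs to be logged or displayed.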
OpenAI compatible
Python
Sample code
from openai import OpenAI
import os
# Initialize the OpenAI client
client = OpenAI(
# If you have not configured the environment variable, replace the following line with your Model Studio API key: api_key="sk-xxx"
# API keys vary by region. To obtain an API key, see https://help.aliyun.com/document_detail/2795253.html
api_key = os.getenv("DASHSCOPE_API_KEY"),
# China (Beijing) region URL. Replace {WorkspaceId} with your actual workspace ID. URLs vary by region.
base_url="https://{WorkspaceId}.cn-beijing.maas.aliyuncs.com/compatible-mode/v1"
)
messages = []
conversation_idx = 1
while True:
    reasoning_content = ""  # Accumulates the complete thinking process
    answer_content = ""  # Accumulates the complete response
    is_answering = False  # Becomes True once the thinking process ends and the response starts
    print("=" * 20 + f"Conversation Round {conversation_idx}" + "=" * 20)
    conversation_idx += 1
    user_msg = {"role": "user", "content": input("Enter your message: ")}
    messages.append(user_msg)
    # Create a chat completion request
    completion = client.chat.completions.create(
        # You can replace this with other deep thinking models as needed
        model="qwen-plus",
        messages=messages,
        extra_body={"enable_thinking": True},
        stream=True,
        # stream_options={
        #     "include_usage": True
        # }
    )
    print("\n" + "=" * 20 + "Thinking Process" + "=" * 20 + "\n")
    for chunk in completion:
        # If chunk.choices is empty, print usage
        if not chunk.choices:
            print("\nUsage:")
            print(chunk.usage)
        else:
            delta = chunk.choices[0].delta
            # Print the thinking process
            if hasattr(delta, 'reasoning_content') and delta.reasoning_content is not None:
                print(delta.reasoning_content, end='', flush=True)
                reasoning_content += delta.reasoning_content
            # Skip chunks whose content is None or empty so that the concatenation below cannot fail
            elif delta.content:
                # Start responding
                if not is_answering:
                    print("\n" + "=" * 20 + "Complete Response" + "=" * 20 + "\n")
                    is_answering = True
                # Print the response as it streams
                print(delta.content, end='', flush=True)
                answer_content += delta.content
    # Add only the response content (not reasoning_content) to the context
    messages.append({"role": "assistant", "content": answer_content})
    print("\n")
Node.js
Sample code
import OpenAI from "openai";
import process from 'process';
import readline from 'readline/promises';
// Initialize the readline interface
const rl = readline.createInterface({
input: process.stdin,
output: process.stdout
});
// Initialize the openai client
const openai = new OpenAI({
// API keys vary by region. To obtain an API key, see https://help.aliyun.com/document_detail/2795253.html
apiKey: process.env.DASHSCOPE_API_KEY, // Read from environment variables
// China (Beijing) region URL. Replace {WorkspaceId} with your actual workspace ID. URLs vary by region.
baseURL: 'https://{WorkspaceId}.cn-beijing.maas.aliyuncs.com/compatible-mode/v1'
});
let reasoningContent = '';
let answerContent = '';
let isAnswering = false;
let messages = [];
let conversationIdx = 1;
async function main() {
while (true) {
console.log("=".repeat(20) + `Conversation Round ${conversationIdx}` + "=".repeat(20));
conversationIdx++;
// Read user input
const userInput = await rl.question("Enter your message: ");
messages.push({ role: 'user', content: userInput });
// Reset state
reasoningContent = '';
answerContent = '';
isAnswering = false;
try {
const stream = await openai.chat.completions.create({
// You can replace this with other deep thinking models as needed
model: 'qwen-plus',
messages: messages,
enable_thinking: true,
stream: true,
// stream_options:{
// include_usage: true
// }
});
console.log("\n" + "=".repeat(20) + "Thinking Process" + "=".repeat(20) + "\n");
for await (const chunk of stream) {
if (!chunk.choices?.length) {
console.log('\nUsage:');
console.log(chunk.usage);
continue;
}
const delta = chunk.choices[0].delta;
// Process the thinking process
if (delta.reasoning_content) {
process.stdout.write(delta.reasoning_content);
reasoningContent += delta.reasoning_content;
}
// Process the formal response
if (delta.content) {
if (!isAnswering) {
console.log('\n' + "=".repeat(20) + "Complete Response" + "=".repeat(20) + "\n");
isAnswering = true;
}
process.stdout.write(delta.content);
answerContent += delta.content;
}
}
// Add the complete response to the message history
messages.push({ role: 'assistant', content: answerContent });
console.log("\n");
} catch (error) {
console.error('Error:', error);
}
}
}
// Start the program
main().catch(console.error);
HTTP
Sample code
curl
# ======= Important =======
# API keys vary by region. To obtain an API key, see https://help.aliyun.com/document_detail/2795253.html
# China (Beijing) region URL. Replace {WorkspaceId} with your actual workspace ID. URLs vary by region.
# === Delete this comment before execution ===
curl -X POST https://{WorkspaceId}.cn-beijing.maas.aliyuncs.com/compatible-mode/v1/chat/completions \
-H "Authorization: Bearer $DASHSCOPE_API_KEY" \
-H "Content-Type: application/json" \
-d '{
"model": "qwen-plus",
"messages": [
{
"role": "user",
"content": "Hello"
},
{
"role": "assistant",
"content": "Hello! Nice to meet you. Is there anything I can help you with?"
},
{
"role": "user",
"content": "Who are you?"
}
],
"stream": true,
"stream_options": {
"include_usage": true
},
"enable_thinking": true
}'
DashScope
Python
Sample code
import os
import dashscope
# China (Beijing) region URL. Replace {WorkspaceId} with your actual workspace ID. URLs vary by region.
dashscope.base_http_api_url = "https://{WorkspaceId}.cn-beijing.maas.aliyuncs.com/api/v1"
messages = []
conversation_idx = 1
while True:
    print("=" * 20 + f"Conversation Round {conversation_idx}" + "=" * 20)
    conversation_idx += 1
    user_msg = {"role": "user", "content": input("Enter your message: ")}
    messages.append(user_msg)
    response = dashscope.Generation.call(
        # If you have not configured the environment variable, replace the following line with your Model Studio API key: api_key="sk-xxx",
        api_key=os.getenv('DASHSCOPE_API_KEY'),
        # This example uses qwen-plus. You can replace it with other deep thinking models as needed
        model="qwen-plus",
        messages=messages,
        enable_thinking=True,
        result_format="message",
        stream=True,
        incremental_output=True
    )
    # Define the complete thinking process
    reasoning_content = ""
    # Define the complete response
    answer_content = ""
    # Determine whether to end the thinking process and start responding
    is_answering = False
    print("=" * 20 + "Thinking Process" + "=" * 20)
    for chunk in response:
        # If both the thinking process and the response are empty, ignore
        if (chunk.output.choices[0].message.content == "" and
                chunk.output.choices[0].message.reasoning_content == ""):
            pass
        else:
            # If it is currently the thinking process
            if (chunk.output.choices[0].message.reasoning_content != "" and
                    chunk.output.choices[0].message.content == ""):
                print(chunk.output.choices[0].message.reasoning_content, end="", flush=True)
                reasoning_content += chunk.output.choices[0].message.reasoning_content
            # If it is currently the response
            elif chunk.output.choices[0].message.content != "":
                if not is_answering:
                    print("\n" + "=" * 20 + "Complete Response" + "=" * 20)
                    is_answering = True
                print(chunk.output.choices[0].message.content, end="", flush=True)
                answer_content += chunk.output.choices[0].message.content
    # Add the content of the model's response to the context
    messages.append({"role": "assistant", "content": answer_content})
    print("\n")
    # To print the complete thinking process and complete response, uncomment and run the following code
    # print("=" * 20 + "Complete Thinking Process" + "=" * 20 + "\n")
    # print(f"{reasoning_content}")
    # print("=" * 20 + "Complete Response" + "=" * 20 + "\n")
    # print(f"{answer_content}")
Java
Sample code
// DashScope SDK version >= 2.19.4
import java.util.Arrays;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
import com.alibaba.dashscope.aigc.generation.Generation;
import com.alibaba.dashscope.aigc.generation.GenerationParam;
import com.alibaba.dashscope.aigc.generation.GenerationResult;
import com.alibaba.dashscope.common.Message;
import com.alibaba.dashscope.common.Role;
import com.alibaba.dashscope.exception.ApiException;
import com.alibaba.dashscope.exception.InputRequiredException;
import com.alibaba.dashscope.exception.NoApiKeyException;
import com.alibaba.dashscope.utils.Constants;
import io.reactivex.Flowable;
import java.lang.System;
import java.util.List;
public class Main {
    private static final Logger logger = LoggerFactory.getLogger(Main.class);
    private static StringBuilder reasoningContent = new StringBuilder();
    private static StringBuilder finalContent = new StringBuilder();
    private static boolean isFirstPrint = true;
    // China (Beijing) region URL. Replace {WorkspaceId} with your actual workspace ID. URLs vary by region.
    static {Constants.baseHttpApiUrl="https://{WorkspaceId}.cn-beijing.maas.aliyuncs.com/api/v1";}

    private static void handleGenerationResult(GenerationResult message) {
        if (message != null && message.getOutput() != null
                && message.getOutput().getChoices() != null
                && !message.getOutput().getChoices().isEmpty()
                && message.getOutput().getChoices().get(0) != null
                && message.getOutput().getChoices().get(0).getMessage() != null) {
            String reasoning = message.getOutput().getChoices().get(0).getMessage().getReasoningContent();
            String content = message.getOutput().getChoices().get(0).getMessage().getContent();
            if (reasoning != null && !reasoning.isEmpty()) {
                reasoningContent.append(reasoning);
                if (isFirstPrint) {
                    System.out.println("====================Thinking Process====================");
                    isFirstPrint = false;
                }
                System.out.print(reasoning);
            }
            if (content != null && !content.isEmpty()) {
                finalContent.append(content);
                if (!isFirstPrint) {
                    System.out.println("\n====================Complete Response====================");
                    isFirstPrint = true;
                }
                System.out.print(content);
            }
        }
    }

    private static GenerationParam buildGenerationParam(List<Message> messages) {
        return GenerationParam.builder()
                // If you have not configured the environment variable, replace the following line with your Model Studio API key: .apiKey("sk-xxx")
                .apiKey(System.getenv("DASHSCOPE_API_KEY"))
                // This example uses qwen-plus. You can replace it with other model names as needed.
                .model("qwen-plus")
                .enableThinking(true)
                .messages(messages)
                .incrementalOutput(true)
                .resultFormat("message")
                .build();
    }

    public static void streamCallWithMessage(Generation gen, List<Message> messages)
            throws NoApiKeyException, ApiException, InputRequiredException {
        GenerationParam param = buildGenerationParam(messages);
        Flowable<GenerationResult> result = gen.streamCall(param);
        result.doOnError(throwable -> logger.error("Error occurred in stream processing: {}", throwable.getMessage(), throwable))
                .blockingForEach(Main::handleGenerationResult);
    }

    public static void main(String[] args) {
        try {
            Generation gen = new Generation();
            Message userMsg1 = Message.builder()
                    .role(Role.USER.getValue())
                    .content("Hello")
                    .build();
            Message assistantMsg = Message.builder()
                    .role(Role.ASSISTANT.getValue())
                    .content("Hello! Nice to meet you. Is there anything I can help you with?")
                    .build();
            Message userMsg2 = Message.builder()
                    .role(Role.USER.getValue())
                    .content("Who are you")
                    .build();
            List<Message> messages = Arrays.asList(userMsg1, assistantMsg, userMsg2);
            streamCallWithMessage(gen, messages);
        } catch (ApiException | NoApiKeyException | InputRequiredException e) {
            logger.error("An exception occurred: {}", e.getMessage(), e);
        } catch (Exception e) {
            logger.error("Unexpected error occurred: {}", e.getMessage(), e);
        } finally {
            // Ensure the program exits normally
            System.exit(0);
        }
    }
}
HTTP
Sample code
curl
# ======= Important =======
# API keys vary by region. To obtain an API key, see https://help.aliyun.com/document_detail/2795253.html
# China (Beijing) region URL. Replace {WorkspaceId} with your actual workspace ID. URLs vary by region.
# === Delete this comment before execution ===
curl -X POST "https://{WorkspaceId}.cn-beijing.maas.aliyuncs.com/api/v1/services/aigc/text-generation/generation" \
-H "Authorization: Bearer $DASHSCOPE_API_KEY" \
-H "Content-Type: application/json" \
-H "X-DashScope-SSE: enable" \
-d '{
    "model": "qwen-plus",
    "input": {
        "messages": [
            {
                "role": "user",
                "content": "Hello"
            },
            {
                "role": "assistant",
                "content": "Hello! Nice to meet you. Is there anything I can help you with?"
            },
            {
                "role": "user",
                "content": "Who are you?"
            }
        ]
    },
    "parameters": {
        "enable_thinking": true,
        "incremental_output": true,
        "result_format": "message"
    }
}'
Going live
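Because every request resends the conversation history, multi-turn conversations can consume a large number of tokens and may eventually exceed the model's maximum context length, which causes the request to fail. Use these strategies to manage context and control costs.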
1. Context management
The messages array grows with each round and may exceed the model's token limit. Use these methods to manage context length:
1.1. Context truncation
Keep only the most recent N rounds when history becomes too long. This is simple to implement but loses earlier conversation information.
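The truncation approach can be sketched as follows. This is a minimal illustration rather than an official implementation: the function name `truncate_history`, the `max_rounds` parameter, and the sample messages are hypothetical, and the choice to always keep system messages is an assumption that suits most chat applications.

```python
from typing import Any, Dict, List

Message = Dict[str, Any]


def truncate_history(messages: List[Message], max_rounds: int) -> List[Message]:
    """Keep system messages plus the most recent `max_rounds` rounds.

    A round starts at a user message and includes everything up to the next
    user message, so the pending question counts as the latest round.
    """
    if max_rounds < 1:
        raise ValueError("max_rounds must be at least 1")

    # Assumption: system messages carry instructions, so they are never dropped.
    system = [m for m in messages if m["role"] == "system"]
    dialogue = [m for m in messages if m["role"] != "system"]

    # Cut at a user message so the kept history never starts with an answer.
    user_positions = [i for i, m in enumerate(dialogue) if m["role"] == "user"]
    if len(user_positions) <= max_rounds:
        return system + dialogue
    return system + dialogue[user_positions[-max_rounds]:]


# Illustrative history; the contents are placeholders.
history = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Recommend a sci-fi movie about space exploration."},
    {"role": "assistant", "content": "I recommend 'XXX'. It is a classic sci-fi work."},
    {"role": "user", "content": "Who is the director of this movie?"},
    {"role": "assistant", "content": "It was directed by 'YYY'."},
    {"role": "user", "content": "What else has this director made?"},
]
trimmed = truncate_history(history, max_rounds=2)
print([m["role"] for m in trimmed])  # ['system', 'user', 'assistant', 'user']
```

Cutting at a user message keeps the remaining history well-formed: the first round (the movie recommendation) is dropped as a whole instead of leaving an orphaned assistant reply.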
1.2. Rolling summary
Summarize context as the conversation progresses to compress history and control length without losing core information:
a. When the history reaches a threshold, such as 70% of the model's maximum context length, extract an earlier part of it (such as the first half) and make a separate API call to generate a "memory summary".
b. In the next request, replace that earlier part of the history with the "memory summary", followed by the most recent rounds.
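The two steps above might look like the following sketch. Everything named here is hypothetical: `compact_history`, `estimate_tokens`, and `fake_summarize` are illustrative helpers, the 4-characters-per-token estimate is a rough placeholder for a real token count, and placing the summary in a system message is one possible choice (it could also be sent as part of a user message). In a real application, `summarize` would make the separate API call that produces the summary.

```python
from typing import Any, Callable, Dict, List

Message = Dict[str, Any]


def estimate_tokens(messages: List[Message]) -> int:
    """Crude placeholder: assumes roughly 4 characters per token."""
    return sum(len(str(m["content"])) for m in messages) // 4


def compact_history(
    messages: List[Message],
    max_context_tokens: int,
    summarize: Callable[[List[Message]], str],
    count_tokens: Callable[[List[Message]], int] = estimate_tokens,
    threshold: float = 0.7,
) -> List[Message]:
    """Replace the older part of the history with a memory summary."""
    if count_tokens(messages) < threshold * max_context_tokens:
        return messages  # Still under the threshold: nothing to do.

    system = [m for m in messages if m["role"] == "system"]
    dialogue = [m for m in messages if m["role"] != "system"]

    # Start from the midpoint, then move forward to the next user message so
    # the kept part begins with a question rather than a dangling answer.
    split = len(dialogue) // 2
    while split < len(dialogue) and dialogue[split]["role"] != "user":
        split += 1
    older, recent = dialogue[:split], dialogue[split:]
    if not older:
        return messages

    # `summarize` is where the separate summarization API call would happen.
    memory = {
        "role": "system",
        "content": "Memory summary of the earlier conversation: " + summarize(older),
    }
    return system + [memory] + recent


def fake_summarize(older: List[Message]) -> str:
    """Stand-in for a model call; reports only how much was condensed."""
    return f"{len(older)} earlier messages condensed."


history = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "q1 " * 100},
    {"role": "assistant", "content": "a1 " * 100},
    {"role": "user", "content": "q2 " * 100},
    {"role": "assistant", "content": "a2 " * 100},
    {"role": "user", "content": "q3 " * 100},
]
compacted = compact_history(history, max_context_tokens=500, summarize=fake_summarize)
print([m["role"] for m in compacted])
# ['system', 'system', 'user', 'assistant', 'user']
```

Passing the summarizer and token counter in as functions keeps the compaction logic testable offline; swapping in a real tokenizer or the token usage reported by the API would make the threshold check more accurate.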
1.3. Vectorized retrieval
Rolling summaries can lose some information. To let the model recall relevant information from large conversation histories, use on-demand retrieval instead of linear context passing:
a. After each conversation round, store the conversation in a vector database.
b. When a user asks a question, retrieve relevant conversation records based on similarity.
c. Combine the retrieved conversation records with the most recent user input and send the combined content to the model.
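The retrieval flow in steps a to c can be sketched with a small in-memory example. This is only an illustration under loud assumptions: `embed` is a toy word-count embedding standing in for a real embedding model, `ConversationMemory` is a hypothetical stand-in for a vector database, and replaying retrieved rounds as user/assistant pairs is just one way to combine them with the new question.

```python
import math
import re
from collections import Counter
from typing import Dict, List, Tuple


def embed(text: str) -> Counter:
    """Toy embedding: word counts. A real system would call an embedding model."""
    return Counter(re.findall(r"\w+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(count * b[word] for word, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


class ConversationMemory:
    """In-memory stand-in for a vector database (hypothetical helper)."""

    def __init__(self) -> None:
        self._records: List[Tuple[Counter, str, str]] = []

    def add_round(self, question: str, answer: str) -> None:
        # Step a: store each finished round together with its vector.
        self._records.append((embed(question + " " + answer), question, answer))

    def search(self, query: str, top_k: int = 2) -> List[Tuple[str, str]]:
        # Step b: rank stored rounds by similarity to the new question.
        query_vec = embed(query)
        scored = [(cosine(query_vec, vec), q, a) for vec, q, a in self._records]
        scored.sort(key=lambda item: item[0], reverse=True)
        return [(q, a) for score, q, a in scored[:top_k] if score > 0]


def build_messages(
    memory: ConversationMemory, question: str, top_k: int = 2
) -> List[Dict[str, str]]:
    # Step c: retrieved rounds first, then the latest user input.
    messages: List[Dict[str, str]] = []
    for q, a in memory.search(question, top_k):
        messages.append({"role": "user", "content": q})
        messages.append({"role": "assistant", "content": a})
    messages.append({"role": "user", "content": question})
    return messages


memory = ConversationMemory()
memory.add_round("Recommend a space exploration movie", "Try a classic space movie")
memory.add_round("Suggest a pasta recipe", "Cook spaghetti with tomato sauce")

messages = build_messages(memory, "Which space movie is longest?", top_k=1)
print([m["content"] for m in messages])
```

Here only the movie round shares words with the new question, so the pasta round is left out of the request. A word-count embedding matches on literal words only; a real embedding model would also match paraphrases.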
2. Cost control
Input tokens increase with each round, significantly raising costs. Use these cost management strategies:
2.1. Reduce input tokens
Use the context management strategies described previously to reduce input tokens and lower costs.
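To see how much a bounded history can save, the following back-of-the-envelope sketch compares total input tokens with and without a sliding window. It rests on simplifying assumptions: every round contributes the same number of tokens, and the hypothetical `cumulative_input_tokens` helper ignores system prompts and output-token billing.

```python
from typing import Optional


def cumulative_input_tokens(
    rounds: int, tokens_per_round: int, window: Optional[int] = None
) -> int:
    """Total input tokens sent across `rounds` requests.

    Request r resends the history, so it carries r rounds' worth of tokens,
    or at most `window` rounds when truncation is applied.
    """
    total = 0
    for r in range(1, rounds + 1):
        kept_rounds = r if window is None else min(r, window)
        total += kept_rounds * tokens_per_round
    return total


# Assumed figures: 20 rounds at about 500 tokens each.
full = cumulative_input_tokens(rounds=20, tokens_per_round=500)
windowed = cumulative_input_tokens(rounds=20, tokens_per_round=500, window=5)
print(full)      # 105000
print(windowed)  # 45000
```

Without truncation the total grows quadratically with the number of rounds (500 × 20 × 21 / 2 = 105,000), whereas a 5-round window makes each request's input constant after the fifth round.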
2.2. Use models that support context cache
In multi-turn requests, the messages array is repeatedly processed and billed. Model Studio provides context cache for models like qwen-max and qwen-plus, which reduces costs and improves response speed. Prioritize models that support context cache.
Context cache is enabled automatically. No code changes are required.
Error codes
If the model call fails and returns an error message, see Error codes for resolution.