For scenarios like code completion and text continuation, you can generate new content starting from an existing text fragment (prefix). Partial Mode ensures the model's output connects seamlessly with your prefix for improved accuracy and control.
How it works
To use Partial Mode, configure the messages array. In the last message of the array, set the role to assistant and provide the prefix in the content field. You must also set the "partial": true parameter in that message. The messages format is as follows:
[
  {
    "role": "user",
    "content": "Complete this Fibonacci function. Do not add anything else."
  },
  {
    "role": "assistant",
    "content": "def calculate_fibonacci(n):\n    if n <= 1:\n        return n\n    else:\n",
    "partial": true
  }
]
The model then starts generating text from the specified prefix.
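As a minimal sketch of the rule described above, the helpers below assemble a messages array for Partial Mode and check that an existing array follows the rule. The names `build_partial_messages` and `is_partial_request` are hypothetical and are not part of any SDK.

```python
from typing import Dict, List, Union

Message = Dict[str, Union[str, bool]]


def build_partial_messages(user_prompt: str, prefix: str) -> List[Message]:
    """Return a messages array whose last entry is the assistant prefix."""
    return [
        {"role": "user", "content": user_prompt},
        # The prefix goes in the final assistant message, flagged with "partial": True.
        {"role": "assistant", "content": prefix, "partial": True},
    ]


def is_partial_request(messages: List[Message]) -> bool:
    """Check that the last message is an assistant message with partial set."""
    if not messages:
        return False
    last = messages[-1]
    return last.get("role") == "assistant" and last.get("partial") is True


messages = build_partial_messages(
    "Complete this Fibonacci function. Do not add anything else.",
    "def calculate_fibonacci(n):\n    if n <= 1:\n        return n\n    else:\n",
)
print(is_partial_request(messages))
```

A check like this can run before a request is sent, so that a missing `partial` flag is caught locally rather than showing up as an unexpected model response.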
Supported models
- Text generation models
  - Qwen-Max (non-thinking mode): Qwen3.7-Max series, Qwen3.6-Max series, Qwen3-Max series, Qwen-Max series
  - Qwen-Plus (non-thinking mode): Qwen3.7-Plus series, Qwen3.6-Plus series, Qwen3.5-Plus series, Qwen-Plus series
  - Qwen-Flash (non-thinking mode): Qwen3.6-Flash series, Qwen3.5-Flash series, Qwen-Flash series
  - Qwen-Coder: Qwen3-Coder series, Qwen2.5-Coder series, Qwen-Coder series
  - Qwen-Turbo (non-thinking mode): Qwen-Turbo series
  - Qwen3.6 open source series (non-thinking mode)
  - Qwen3.5 open source series (non-thinking mode)
  - Qwen3 open source series (non-thinking mode)
  - Qwen2.5 open source series
  - Qwen-Math: Qwen-Math series, Qwen2.5-Math series
  - DeepSeek (SiliconFlow deployment): siliconflow/deepseek-v3.2 (non-thinking mode), siliconflow/deepseek-v3.1-terminus (non-thinking mode), siliconflow/deepseek-v3-0324
  - DeepSeek (Vanchin deployment): vanchin/deepseek-v3.2-think, vanchin/deepseek-r1, vanchin/deepseek-v3
- Multimodal models
  - Qwen-VL: Qwen3-VL-Plus series, Qwen3-VL-Flash series, Qwen-VL-Max series, Qwen-VL-Plus series
  - Qwen3-VL open source series (non-thinking mode)
  - Kimi (deployed on Moonshot AI): kimi/kimi-k2.6, kimi/kimi-k2.5
Getting started
Prerequisites
Before you begin, get an API key and set it as an environment variable. If you call the service through the OpenAI SDK or the DashScope SDK, install the SDK first. If you are a member of a sub-workspace, make sure that the super administrator has granted your workspace access to the model.
The DashScope Java SDK is not supported.
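The samples on this page read the key from the `DASHSCOPE_API_KEY` environment variable. As a small sketch, a helper like the following fails early with a clear message when the variable is missing. The name `get_api_key` is hypothetical and is not part of either SDK.

```python
import os


def get_api_key(env_var: str = "DASHSCOPE_API_KEY") -> str:
    """Read the API key from the environment and fail early if it is missing."""
    api_key = os.getenv(env_var)
    if not api_key:
        raise RuntimeError(f"Environment variable {env_var} is not set.")
    return api_key
```

It can then be passed wherever the samples use `os.getenv("DASHSCOPE_API_KEY")`, for example `api_key=get_api_key()`.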
Sample code
Code completion is the core use case for Partial Mode. The following example shows how to complete a Python function.
OpenAI compatible
Python
import os
from openai import OpenAI
# 1. Initialize the client
client = OpenAI(
# If not set in environment, replace the next line with: api_key="sk-xxx"
# API keys differ by region. Get your API key: https://help.aliyun.com/en/model-studio/get-api-key
api_key=os.getenv("DASHSCOPE_API_KEY"),
# Beijing region. For Singapore models use: https://{WorkspaceId}.ap-southeast-1.maas.aliyuncs.com/compatible-mode/v1
base_url="https://{WorkspaceId}.cn-beijing.maas.aliyuncs.com/compatible-mode/v1",
)
# 2. Define the code prefix to complete
prefix = """def calculate_fibonacci(n):
    if n <= 1:
        return n
    else:
"""
# 3. Make a Partial Mode request
# Note: The last message in the messages array must have role "assistant" and include "partial": True
completion = client.chat.completions.create(
model="qwen3.7-max",
messages=[
{"role": "user", "content": "Complete this Fibonacci function. Do not add anything else."},
{"role": "assistant", "content": prefix, "partial": True},
],
)
# 4. Manually join the prefix and the model's generated content
generated_code = completion.choices[0].message.content
complete_code = prefix + generated_code
print(complete_code)
Response
def calculate_fibonacci(n):
    if n <= 1:
        return n
    else:
        return calculate_fibonacci(n-1) + calculate_fibonacci(n-2)
Node.js
import OpenAI from "openai";
const openai = new OpenAI({
// If not set in environment, replace the next line with: apiKey: "sk-xxx"
// API keys differ by region. Get your API key: https://help.aliyun.com/en/model-studio/get-api-key
apiKey: process.env.DASHSCOPE_API_KEY,
// Beijing region. For Singapore models use: https://{WorkspaceId}.ap-southeast-1.maas.aliyuncs.com/compatible-mode/v1
baseURL: "https://{WorkspaceId}.cn-beijing.maas.aliyuncs.com/compatible-mode/v1"
});
// Define the code prefix to complete
const prefix = `def calculate_fibonacci(n):
    if n <= 1:
        return n
    else:
`;
const completion = await openai.chat.completions.create({
model: "qwen3.7-max",
messages: [
{ role: "user", content: "Complete this Fibonacci function. Do not add anything else." },
{ role: "assistant", content: prefix, partial: true }
],
});
// Manually join the prefix and the model's generated content
const generatedCode = completion.choices[0].message.content;
const completeCode = prefix + generatedCode;
console.log(completeCode);
curl
# ======= Important notice =======
# API keys differ by region. Get your API key: https://help.aliyun.com/en/model-studio/get-api-key
# Beijing region. For Singapore models use: https://{WorkspaceId}.ap-southeast-1.maas.aliyuncs.com/compatible-mode/v1/chat/completions
# === Remove this comment before running ===
curl -X POST https://{WorkspaceId}.cn-beijing.maas.aliyuncs.com/compatible-mode/v1/chat/completions \
-H "Authorization: Bearer $DASHSCOPE_API_KEY" \
-H "Content-Type: application/json" \
-d '{
"model": "qwen3.7-max",
"messages": [
{
"role": "user",
"content": "Complete this Fibonacci function. Do not add anything else."
},
{
"role": "assistant",
"content": "def calculate_fibonacci(n):\n    if n <= 1:\n        return n\n    else:\n",
"partial": true
}
]
}'
Response
{
"choices": [
{
"message": {
"content": "        return calculate_fibonacci(n-1) + calculate_fibonacci(n-2)",
"role": "assistant"
},
"finish_reason": "stop",
"index": 0,
"logprobs": null
}
],
"object": "chat.completion",
"usage": {
"prompt_tokens": 48,
"completion_tokens": 19,
"total_tokens": 67,
"prompt_tokens_details": {
"cache_type": "implicit",
"cached_tokens": 0
}
},
"created": 1756800231,
"system_fingerprint": null,
"model": "qwen3.7-max",
"id": "chatcmpl-d103b1cf-4bda-942f-92d6-d7ecabfeeccb"
}
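The response contains only the continuation, not the prefix, so the caller joins the two. The sketch below does this for a response that has already been parsed into a dictionary. The name `join_completion` is hypothetical, and the dictionary is modeled on the sample response above with standard four-space Python indentation assumed.

```python
from typing import Any, Dict


def join_completion(prefix: str, response: Dict[str, Any]) -> str:
    """Join the prefix with the first choice of an OpenAI-compatible response."""
    choice = response["choices"][0]
    # Treat a missing or null content field as an empty continuation.
    generated = choice["message"].get("content") or ""
    return prefix + generated


prefix = "def calculate_fibonacci(n):\n    if n <= 1:\n        return n\n    else:\n"
# Modeled on the sample response; only the fields used here are kept.
response = {
    "choices": [
        {
            "message": {
                "content": "        return calculate_fibonacci(n-1) + calculate_fibonacci(n-2)",
                "role": "assistant",
            },
            "finish_reason": "stop",
        }
    ]
}
complete_code = join_completion(prefix, response)
print(complete_code)
```

Because the prefix ends with a newline and the continuation starts with its own indentation, plain string concatenation is enough here. No separator should be added between the two parts.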
DashScope
Python
import os
import dashscope
# The following URL is for the China (Beijing) region. Replace {WorkspaceId} with your actual workspace ID. URLs vary by region.
dashscope.base_http_api_url = "https://{WorkspaceId}.cn-beijing.maas.aliyuncs.com/api/v1"
messages = [
{
"role": "user",
"content": "Complete this Fibonacci function. Do not add anything else."
},
{
"role": "assistant",
"content": "def calculate_fibonacci(n):\n    if n <= 1:\n        return n\n    else:\n",
"partial": True
}
]
response = dashscope.Generation.call(
# If not set in environment, replace the next line with: api_key="sk-xxx",
api_key=os.getenv("DASHSCOPE_API_KEY"),
model='qwen3.7-max',
messages=messages,
result_format='message',
)
# Manually join the prefix and the model's generated content
prefix = "def calculate_fibonacci(n):\n    if n <= 1:\n        return n\n    else:\n"
generated_code = response.output.choices[0].message.content
complete_code = prefix + generated_code
print(complete_code)
Response
def calculate_fibonacci(n):
    if n <= 1:
        return n
    else:
        return calculate_fibonacci(n-1) + calculate_fibonacci(n-2)
curl
# ======= Important notice =======
# API keys differ by region. Get your API key: https://help.aliyun.com/en/model-studio/get-api-key
# Beijing region. For Singapore models use: https://{WorkspaceId}.ap-southeast-1.maas.aliyuncs.com/api/v1/services/aigc/text-generation/generation
# === Remove this comment before running ===
curl -X POST "https://{WorkspaceId}.cn-beijing.maas.aliyuncs.com/api/v1/services/aigc/text-generation/generation" \
-H "Authorization: Bearer $DASHSCOPE_API_KEY" \
-H "Content-Type: application/json" \
-d '{
"model": "qwen3.7-max",
"input":{
"messages":[
{
"role": "user",
"content": "Complete this Fibonacci function. Do not add anything else."
},
{
"role": "assistant",
"content": "def calculate_fibonacci(n):\n    if n <= 1:\n        return n\n    else:\n",
"partial": true
}
]
},
"parameters": {
"result_format": "message"
}
}'
Response
{
"output": {
"choices": [
{
"message": {
"content": "        return calculate_fibonacci(n-1) + calculate_fibonacci(n-2)",
"role": "assistant"
},
"finish_reason": "stop"
}
]
},
"usage": {
"total_tokens": 67,
"output_tokens": 19,
"input_tokens": 48,
"prompt_tokens_details": {
"cached_tokens": 0
}
},
"request_id": "c61c62e5-cf97-90bc-a4ee-50e5e117b93f"
}
Use cases
Pass images or videos
Qwen-VL models support Partial Mode with image or video data, which is useful for scenarios such as product descriptions, social posts, news articles, and creative copywriting.
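The samples in this section use two layouts for the user message: the OpenAI-compatible interface takes typed parts (`image_url` and `text`), while the DashScope interface takes parts keyed by `image` and `text`. In both cases the assistant prefix message is the same. The builder below is a sketch of that difference. The function name, the `style` parameter, and the example URL are hypothetical.

```python
from typing import Any, Dict, List


def build_image_partial_messages(
    image_url: str, prompt: str, prefix: str, style: str = "openai"
) -> List[Dict[str, Any]]:
    """Build a user message with one image plus an assistant prefix message."""
    if style == "openai":
        user_content = [
            {"type": "image_url", "image_url": {"url": image_url}},
            {"type": "text", "text": prompt},
        ]
    elif style == "dashscope":
        user_content = [{"image": image_url}, {"text": prompt}]
    else:
        raise ValueError(f"Unknown style: {style}")
    return [
        {"role": "user", "content": user_content},
        # The prefix message has the same shape for both interfaces.
        {"role": "assistant", "content": prefix, "partial": True},
    ]


messages = build_image_partial_messages(
    "https://example.com/cafe.png",  # placeholder URL, replace with a real image
    "I want to post this on social media. Help me write a caption.",
    "Today I discovered a hidden-gem café",
    style="dashscope",
)
print(messages[0]["content"])
```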
OpenAI compatible
Python
import os
from openai import OpenAI
client = OpenAI(
# If not set in environment, replace the next line with: api_key="sk-xxx",
# API keys differ by region. Get your API key: https://help.aliyun.com/en/model-studio/get-api-key
api_key=os.getenv("DASHSCOPE_API_KEY"),
# Beijing region. For Singapore models use: https://{WorkspaceId}.ap-southeast-1.maas.aliyuncs.com/compatible-mode/v1
base_url="https://{WorkspaceId}.cn-beijing.maas.aliyuncs.com/compatible-mode/v1",
)
completion = client.chat.completions.create(
model="qwen3-vl-plus",
messages=[
{
"role": "user",
"content": [
{
"type": "image_url",
"image_url": {
"url": "https://img.alicdn.com/imgextra/i3/O1CN01zFX2Bs1Q0f9pESgPC_!!6000000001914-2-tps-450-450.png"
},
},
{"type": "text", "text": "I want to post this on social media. Help me write a caption."},
],
},
{
"role": "assistant",
"content": "Today I discovered a hidden-gem café",
"partial": True,
},
],
)
print(completion.choices[0].message.content)
Response
— the tiramisu here is pure bliss! Every bite delivers perfect harmony between coffee and cream. Pure joy! #FoodShare #Tiramisu #CoffeeTime
Hope you like this caption! Let me know if you need any changes.
Node.js
import OpenAI from "openai";
const openai = new OpenAI({
// If you have not set the environment variable, replace the next line with: apiKey: "sk-xxx"
// API keys differ by region. Get your API key: https://help.aliyun.com/en/model-studio/get-api-key
apiKey: process.env.DASHSCOPE_API_KEY,
// Beijing region. For Singapore models use: https://{WorkspaceId}.ap-southeast-1.maas.aliyuncs.com/compatible-mode/v1
baseURL: "https://{WorkspaceId}.cn-beijing.maas.aliyuncs.com/compatible-mode/v1"
});
async function main() {
const response = await openai.chat.completions.create({
model: "qwen3-vl-plus",
messages: [
{
role: "user",
content: [
{
type: "image_url",
image_url: {
"url": "https://img.alicdn.com/imgextra/i3/O1CN01zFX2Bs1Q0f9pESgPC_!!6000000001914-2-tps-450-450.png"
}
},
{
type: "text",
text: "I want to post this on social media. Help me write a caption."
}
]
},
{
role: "assistant",
content: "Today I discovered a hidden-gem café",
"partial": true
}
]
});
console.log(response.choices[0].message.content);
}
main();
curl
# ======= Important notice =======
# API keys differ by region. Get your API key: https://help.aliyun.com/en/model-studio/get-api-key
# This is the base URL for the Beijing region. For Singapore region models, replace base_url with: https://{WorkspaceId}.ap-southeast-1.maas.aliyuncs.com/compatible-mode/v1/chat/completions
# === Remove this comment before running ===
curl -X POST 'https://{WorkspaceId}.cn-beijing.maas.aliyuncs.com/compatible-mode/v1/chat/completions' \
-H "Authorization: Bearer $DASHSCOPE_API_KEY" \
-H 'Content-Type: application/json' \
-d '{
"model": "qwen3-vl-plus",
"messages": [
{
"role": "user",
"content": [
{
"type": "image_url",
"image_url": {
"url": "https://img.alicdn.com/imgextra/i3/O1CN01zFX2Bs1Q0f9pESgPC_!!6000000001914-2-tps-450-450.png"
}
},
{
"type": "text",
"text": "I want to post this on social media. Help me write a caption."
}
]
},
{
"role": "assistant",
"content": "Today I discovered a hidden-gem café",
"partial": true
}
]
}'
Response
{
"choices": [
{
"message": {
"content": "— the tiramisu here is pure bliss! Every bite delivers perfect harmony between coffee and cream. Pure joy! #FoodShare #Tiramisu #CoffeeTime\n\nHope you like this caption! Let me know if you need any changes.",
"role": "assistant"
},
"finish_reason": "stop",
"index": 0,
"logprobs": null
}
],
"object": "chat.completion",
"usage": {
"prompt_tokens": 282,
"completion_tokens": 56,
"total_tokens": 338,
"prompt_tokens_details": {
"cached_tokens": 0
}
},
"created": 1756802933,
"system_fingerprint": null,
"model": "qwen3-vl-plus",
"id": "chatcmpl-5780cbb7-ebae-9c63-b098-f8cc49e321f0"
}
DashScope
Python
import os
import dashscope
# The following URL is for the China (Beijing) region. Replace {WorkspaceId} with your actual workspace ID. URLs vary by region.
dashscope.base_http_api_url = "https://{WorkspaceId}.cn-beijing.maas.aliyuncs.com/api/v1"
messages = [
{
"role": "user",
"content": [
{
"image": "https://img.alicdn.com/imgextra/i3/O1CN01zFX2Bs1Q0f9pESgPC_!!6000000001914-2-tps-450-450.png"
},
{"text": "I want to post this on social media. Help me write a caption."},
],
},
{"role": "assistant", "content": "Today I discovered a hidden-gem café", "partial": True},
]
response = dashscope.MultiModalConversation.call(
# If you have not set the environment variable, replace the next line with: api_key ="sk-xxx"
api_key=os.getenv("DASHSCOPE_API_KEY"),
model="qwen3-vl-plus",
messages=messages
)
print(response.output.choices[0].message.content[0]["text"])
Response
— the tiramisu here is pure bliss! Every bite delivers perfect harmony between coffee and cream. Pure joy! #FoodShare #Tiramisu #CoffeeTime
Hope you like this caption! Let me know if you need any changes.
curl
# ======= Important notice =======
# API keys differ by region. Get your API key: https://help.aliyun.com/en/model-studio/get-api-key
# Beijing region. For Singapore models use: https://{WorkspaceId}.ap-southeast-1.maas.aliyuncs.com/api/v1/services/aigc/multimodal-generation/generation
# === Remove this comment before running ===
curl -X POST https://{WorkspaceId}.cn-beijing.maas.aliyuncs.com/api/v1/services/aigc/multimodal-generation/generation \
-H "Authorization: Bearer $DASHSCOPE_API_KEY" \
-H 'Content-Type: application/json' \
-d '{
"model": "qwen3-vl-plus",
"input":{
"messages":[
{"role": "user",
"content": [
{"image": "https://img.alicdn.com/imgextra/i3/O1CN01zFX2Bs1Q0f9pESgPC_!!6000000001914-2-tps-450-450.png"},
{"text": "I want to post this on social media. Help me write a caption."}]
},
{"role": "assistant",
"content": "Today I discovered a hidden-gem café",
"partial": true
}
]
}
}'
Response
{
"output": {
"choices": [
{
"message": {
"content": [
{
"text": "— the tiramisu here is pure bliss! Every bite delivers perfect harmony between coffee and cream. Pure joy! #FoodShare #Tiramisu #CoffeeTime\n\nHope you like this caption! Let me know if you need any changes."
}
],
"role": "assistant"
},
"finish_reason": "stop"
}
]
},
"usage": {
"total_tokens": 339,
"input_tokens_details": {
"image_tokens": 258,
"text_tokens": 24
},
"output_tokens": 57,
"input_tokens": 282,
"output_tokens_details": {
"text_tokens": 57
},
"image_tokens": 258
},
"request_id": "c741328c-23dc-9286-bfa7-626a4092ca09"
}
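In the samples above, the OpenAI-compatible response returns `content` as a string, while the DashScope multimodal response returns a list of parts with a `text` key. The sketch below normalizes both shapes before joining the result with the prefix. The name `extract_text` and the continuation text in the example are hypothetical.

```python
from typing import Any, Dict, List, Union


def extract_text(content: Union[str, List[Dict[str, Any]], None]) -> str:
    """Return the text of a message whose content is a string or a list of parts."""
    if content is None:
        return ""
    if isinstance(content, str):
        return content
    # List form: concatenate every part that carries a "text" key.
    return "".join(part.get("text", "") for part in content if isinstance(part, dict))


prefix = "Today I discovered a hidden-gem café"
# Made-up continuation in the list form used by the DashScope multimodal response.
list_content = [{"text": ", and the tiramisu is worth the trip."}]
print(prefix + extract_text(list_content))
```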
Continue from incomplete output
You can use Partial Mode to continue from incomplete LLM output and ensure semantically complete results. Incomplete output may occur for the following reasons:
- The max_tokens value is too small, so the model truncates its output.
- A non-streaming response times out.
A timeout no longer returns an error. Instead, the response contains the content generated up to that point. See How to handle model timeouts.
OpenAI compatible
Python
import os
from openai import OpenAI
# Initialize the client
client = OpenAI(
# If not set in environment, replace the next line with: api_key="sk-xxx",
# API keys differ by region. Get your API key: https://help.aliyun.com/en/model-studio/get-api-key
api_key=os.getenv("DASHSCOPE_API_KEY"),
# Beijing region. For Singapore models use: https://{WorkspaceId}.ap-southeast-1.maas.aliyuncs.com/compatible-mode/v1
base_url="https://{WorkspaceId}.cn-beijing.maas.aliyuncs.com/compatible-mode/v1",
)
def chat_completion(messages, max_tokens=None):
    response = client.chat.completions.create(
        model="qwen-plus",
        messages=messages,
        max_tokens=max_tokens,
    )
    print(f"### Reason generation stopped: {response.choices[0].finish_reason}")
    return response.choices[0].message.content
# Example usage
messages = [{"role": "user", "content": "Write a short sci-fi story"}]
# First call with max_tokens set to 40
first_content = chat_completion(messages, max_tokens=40)
print(first_content)
# Add the first response as an assistant message and set partial=True
messages.append({"role": "assistant", "content": first_content, "partial": True})
# Second call
second_content = chat_completion(messages)
print("### Complete content:")
print(first_content+second_content)
Response
The finish_reason value shows why generation stopped: length means the max_tokens limit was reached, and stop means the model finished naturally or hit a stop word set in the stop parameter.
### Reason generation stopped: length
**"The End of Memory"**
In the distant future, Earth is no longer fit for human life. The atmosphere is polluted, oceans are dry, and cities lie in ruins. Humans migrated to a habitable planet named "Eden," with blue skies, fresh air, and endless resources.
However, Eden is not a true paradise. It holds no human history, no past, and no memory.
...
**"If we forget who we are, are we still human?"**
— End —
Node.js
import OpenAI from "openai";
const openai = new OpenAI({
// If not set in environment, replace the next line with: apiKey: "sk-xxx"
// API keys differ by region. Get your API key: https://help.aliyun.com/en/model-studio/get-api-key
apiKey: process.env.DASHSCOPE_API_KEY,
// Beijing region. For Singapore models use: https://{WorkspaceId}.ap-southeast-1.maas.aliyuncs.com/compatible-mode/v1
baseURL: "https://{WorkspaceId}.cn-beijing.maas.aliyuncs.com/compatible-mode/v1"
});
async function chatCompletion(messages, maxTokens = null) {
const completion = await openai.chat.completions.create({
model: "qwen-plus",
messages: messages,
max_tokens: maxTokens
});
console.log(`### Reason generation stopped: ${completion.choices[0].finish_reason}`);
return completion.choices[0].message.content;
}
// Example usage
async function main() {
let messages = [{"role": "user", "content": "Write a short sci-fi story"}];
try {
// First call with max_tokens set to 40
const firstContent = await chatCompletion(messages, 40);
console.log(firstContent);
// Add the first response as an assistant message and set partial=true
messages.push({"role": "assistant", "content": firstContent, "partial": true});
// Second call
const secondContent = await chatCompletion(messages);
console.log("### Complete content:");
console.log(firstContent + secondContent);
} catch (error) {
console.error('Execution error:', error);
}
}
// Run the example
main();
DashScope
Python
import os
import dashscope
# The following URL is for the China (Beijing) region. Replace {WorkspaceId} with your actual workspace ID. URLs vary by region.
dashscope.base_http_api_url = "https://{WorkspaceId}.cn-beijing.maas.aliyuncs.com/api/v1"
def chat_completion(messages, max_tokens=None):
    response = dashscope.Generation.call(
        # If not set in environment, replace the next line with: api_key="sk-xxx",
        # API keys differ by region. Get your API key: https://help.aliyun.com/en/model-studio/get-api-key
        api_key=os.getenv("DASHSCOPE_API_KEY"),
        model='qwen-plus',
        messages=messages,
        max_tokens=max_tokens,
        result_format='message',
    )
    print(f"### Reason generation stopped: {response.output.choices[0].finish_reason}")
    return response.output.choices[0].message.content
# Example usage
messages = [{"role": "user", "content": "Write a short sci-fi story"}]
# First call with max_tokens set to 40
first_content = chat_completion(messages, max_tokens=40)
print(first_content)
# Add the first response as an assistant message and set partial=True
messages.append({"role": "assistant", "content": first_content, "partial": True})
# Second call
second_content = chat_completion(messages)
print("### Complete content:")
print(first_content + second_content)
Response
### Reason generation stopped: length
Title: **"Origami Time"**
---
In 2179, humanity finally mastered time travel. But this technology did not rely on massive machines or complex energy fields. It relied on paper.
A single sheet of paper.
It was called "Origami Time," made from an unknown alien material. Scientists could not explain how it worked. They only knew that drawing a scene on the paper and folding it in a specific way opened a door to the past or future.
...
"You are not the key to time. You are just a reminder that our future is always in our hands."
Then I tore it into pieces.
---
**(End)**
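The two-call flow in the samples above can be generalized into a loop that keeps requesting a continuation while finish_reason is length. The following is a sketch under stated assumptions: `continue_until_stop`, `call_model`, `max_rounds`, and `fake_model` are hypothetical names, and `fake_model` stands in for a real API call so that the example runs offline.

```python
from typing import Callable, Dict, List, Tuple

# A caller-supplied function that sends messages and returns (content, finish_reason).
CallModel = Callable[[List[Dict[str, object]]], Tuple[str, str]]


def continue_until_stop(
    call_model: CallModel, user_prompt: str, max_rounds: int = 5
) -> str:
    """Repeat Partial Mode calls while finish_reason is "length"."""
    base = [{"role": "user", "content": user_prompt}]
    text, reason = call_model(base)
    rounds = 1
    while reason == "length" and rounds < max_rounds:
        # Send everything generated so far as the assistant prefix.
        messages = base + [{"role": "assistant", "content": text, "partial": True}]
        more, reason = call_model(messages)
        text += more
        rounds += 1
    return text


FULL_TEXT = "The ship drifted past the last beacon and went dark."


def fake_model(messages: List[Dict[str, object]]) -> Tuple[str, str]:
    """Stand-in for a real API call: returns at most 20 more characters."""
    last = messages[-1]
    already = str(last["content"]) if last.get("partial") else ""
    chunk = FULL_TEXT[len(already):len(already) + 20]
    finished = len(already) + len(chunk) >= len(FULL_TEXT)
    return chunk, "stop" if finished else "length"


story = continue_until_stop(fake_model, "Write a short sci-fi story")
print(story)
```

With a real client, `call_model` would wrap a request like the `chat_completion` functions above and return both the content and the finish_reason. The `max_rounds` cap is a design choice that bounds cost if the model never reports stop.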
Billing
Billing covers input and output tokens. The prefix counts as part of the input tokens.
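Because the prefix counts as input, text produced by a first call and then sent back as a prefix is counted once as output and once as input. The sketch below sums usage across calls and accepts both key styles that appear in the sample responses (`prompt_tokens`/`completion_tokens` and `input_tokens`/`output_tokens`). The name `total_usage` is hypothetical and the token numbers are made up for illustration.

```python
from typing import Any, Dict, Iterable


def total_usage(usages: Iterable[Dict[str, Any]]) -> Dict[str, int]:
    """Sum input and output tokens over several calls, for either key style."""
    total = {"input_tokens": 0, "output_tokens": 0}
    for usage in usages:
        total["input_tokens"] += usage.get("prompt_tokens", usage.get("input_tokens", 0))
        total["output_tokens"] += usage.get("completion_tokens", usage.get("output_tokens", 0))
    return total


# Made-up numbers for a two-call continuation: the 40 tokens generated by the
# first call are sent back as the prefix, so they also appear in the second call's input.
calls = [
    {"prompt_tokens": 12, "completion_tokens": 40},
    {"prompt_tokens": 52, "completion_tokens": 300},
]
print(total_usage(calls))  # {'input_tokens': 64, 'output_tokens': 340}
```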
Error codes
If the model call fails and returns an error message, see Error codes for resolution.