Partial mode

For scenarios like code completion and text continuation, you can generate new content starting from an existing text fragment (prefix). Partial Mode ensures the model's output connects seamlessly with your prefix for improved accuracy and control.

How it works

To use Partial Mode, configure the messages array. In the last message of the array, set the role to assistant and provide the prefix in the content field. You must also set the "partial": true parameter in that message. The messages format is as follows:

[
    {
        "role": "user",
        "content": "Complete this Fibonacci function. Do not add anything else."
    },
    {
        "role": "assistant",
        "content": "def calculate_fibonacci(n):\n    if n <= 1:\n        return n\n    else:\n",
        "partial": true
    }
]

The model then starts generating text from the specified prefix.
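The message layout above can be wrapped in a small helper. This is a minimal sketch: `build_partial_messages` and `is_partial_request` are hypothetical names introduced here for illustration and are not part of any SDK. They only assemble and check the structure described in this section.

```python
from typing import Dict, List, Optional, Union

Message = Dict[str, Union[str, bool]]


def build_partial_messages(
    user_prompt: str, prefix: str, history: Optional[List[Message]] = None
) -> List[Message]:
    """Return a messages array whose last entry is the assistant prefix."""
    messages = list(history or [])
    messages.append({"role": "user", "content": user_prompt})
    # The prefix goes last, with role "assistant" and "partial": True.
    messages.append({"role": "assistant", "content": prefix, "partial": True})
    return messages


def is_partial_request(messages: List[Message]) -> bool:
    """Check that the last message is an assistant message flagged as partial."""
    if not messages:
        return False
    last = messages[-1]
    return last.get("role") == "assistant" and last.get("partial") is True
```

The returned list can be passed as the `messages` argument of a chat completion request.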

Supported models

  • Text generation models

    • Qwen-Max (non-thinking mode): Qwen3.7-Max series, Qwen3.6-Max series, Qwen3-Max series, Qwen-Max series

    • Qwen-Plus (non-thinking mode): Qwen3.7-Plus series, Qwen3.6-Plus series, Qwen3.5-Plus series, Qwen-Plus series

    • Qwen-Flash (non-thinking mode): Qwen3.6-Flash series, Qwen3.5-Flash series, Qwen-Flash series

    • Qwen-Coder: Qwen3-Coder series, Qwen2.5-Coder series, Qwen-Coder series

    • Qwen-Turbo (non-thinking mode): Qwen-Turbo series

    • Qwen3.6 open source series (non-thinking mode)

    • Qwen3.5 open source series (non-thinking mode)

    • Qwen3 open source series (non-thinking mode)

    • Qwen2.5 open source series

    • Qwen-Math: Qwen-Math series, Qwen2.5-Math series

    • DeepSeek (SiliconFlow deployment): siliconflow/deepseek-v3.2 (non-thinking mode), siliconflow/deepseek-v3.1-terminus (non-thinking mode), siliconflow/deepseek-v3-0324

    • DeepSeek (Vanchin deployment): vanchin/deepseek-v3.2-think, vanchin/deepseek-r1, vanchin/deepseek-v3

  • Multimodal models

    • Qwen-VL: Qwen3-VL-Plus series, Qwen3-VL-Flash series, Qwen-VL-Max series, Qwen-VL-Plus series

    • Qwen3-VL open source series (non-thinking mode)

    • Kimi (deployed on Moonshot AI): kimi/kimi-k2.6, kimi/kimi-k2.5

Getting started

Prerequisites

Before you begin, get an API key and set the API key as an environment variable. If you call the service using the OpenAI SDK or DashScope SDK, you must install the SDK. If you are a member of a sub-workspace, ensure that the super administrator has granted model access to your workspace.

Note

The DashScope Java SDK is not supported.
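Before running the samples, it can help to confirm that the API key is visible to your process. The following is a sketch under the assumption that the key is stored in `DASHSCOPE_API_KEY`, as in the samples in this topic. `get_api_key` is a hypothetical helper, not an SDK function.

```python
import os
from typing import Mapping, Optional


def get_api_key(env: Optional[Mapping[str, str]] = None) -> str:
    """Return the API key from DASHSCOPE_API_KEY, or raise a clear error."""
    source = os.environ if env is None else env
    key = source.get("DASHSCOPE_API_KEY", "").strip()
    if not key:
        raise RuntimeError(
            "DASHSCOPE_API_KEY is not set; export it before running the samples."
        )
    return key
```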

Sample code

Code completion is the core use case for Partial Mode. The following example shows how to complete a Python function.

OpenAI compatible

Python

import os
from openai import OpenAI

# 1. Initialize the client
client = OpenAI(
    # If not set in environment, replace the next line with: api_key="sk-xxx"
    # API keys differ by region. Get your API key: https://help.aliyun.com/en/model-studio/get-api-key
    api_key=os.getenv("DASHSCOPE_API_KEY"),
    # Beijing region. For Singapore models use: https://{WorkspaceId}.ap-southeast-1.maas.aliyuncs.com/compatible-mode/v1
    base_url="https://{WorkspaceId}.cn-beijing.maas.aliyuncs.com/compatible-mode/v1",
)

# 2. Define the code prefix to complete
prefix = """def calculate_fibonacci(n):
    if n <= 1:
        return n
    else:
"""

# 3. Make a Partial Mode request
# Note: The last message in the messages array must have role "assistant" and include "partial": True
completion = client.chat.completions.create(
    model="qwen3.7-max",
    messages=[
        {"role": "user", "content": "Complete this Fibonacci function. Do not add anything else."},
        {"role": "assistant", "content": prefix, "partial": True},
    ],
)

# 4. Manually join the prefix and the model's generated content
generated_code = completion.choices[0].message.content
complete_code = prefix + generated_code

print(complete_code)

Response

def calculate_fibonacci(n):
    if n <= 1:
        return n
    else:
        return calculate_fibonacci(n-1) + calculate_fibonacci(n-2)

Node.js

import OpenAI from "openai";

const openai = new OpenAI({
    // If not set in environment, replace the next line with: apiKey: "sk-xxx"
    // API keys differ by region. Get your API key: https://help.aliyun.com/en/model-studio/get-api-key
    apiKey: process.env.DASHSCOPE_API_KEY,
    // Beijing region. For Singapore models use: https://{WorkspaceId}.ap-southeast-1.maas.aliyuncs.com/compatible-mode/v1
    baseURL: "https://{WorkspaceId}.cn-beijing.maas.aliyuncs.com/compatible-mode/v1"
});

// Define the code prefix to complete
const prefix = `def calculate_fibonacci(n):
    if n <= 1:
        return n
    else:
`;

const completion = await openai.chat.completions.create({
    model: "qwen3.7-max",
    messages: [
        { role: "user", content: "Complete this Fibonacci function. Do not add anything else." },
        { role: "assistant", content: prefix, partial: true }
    ],
});

// Manually join the prefix and the model's generated content
const generatedCode = completion.choices[0].message.content;
const completeCode = prefix + generatedCode;

console.log(completeCode);

curl

# ======= Important notice =======
# API keys differ by region. Get your API key: https://help.aliyun.com/en/model-studio/get-api-key
# Beijing region. For Singapore models use: https://{WorkspaceId}.ap-southeast-1.maas.aliyuncs.com/compatible-mode/v1/chat/completions
# === Remove this comment before running ===
curl -X POST https://{WorkspaceId}.cn-beijing.maas.aliyuncs.com/compatible-mode/v1/chat/completions \
-H "Authorization: Bearer $DASHSCOPE_API_KEY" \
-H "Content-Type: application/json" \
-d '{
    "model": "qwen3.7-max",
    "messages": [
        {
            "role": "user", 
            "content": "Complete this Fibonacci function. Do not add anything else."
        },
        {
            "role": "assistant",
            "content": "def calculate_fibonacci(n):\n    if n <= 1:\n        return n\n    else:\n",
            "partial": true
        }
    ]
}'

Response

{
    "choices": [
        {
            "message": {
                "content": "        return calculate_fibonacci(n-1) + calculate_fibonacci(n-2)",
                "role": "assistant"
            },
            "finish_reason": "stop",
            "index": 0,
            "logprobs": null
        }
    ],
    "object": "chat.completion",
    "usage": {
        "prompt_tokens": 48,
        "completion_tokens": 19,
        "total_tokens": 67,
        "prompt_tokens_details": {
            "cache_type": "implicit",
            "cached_tokens": 0
        }
    },
    "created": 1756800231,
    "system_fingerprint": null,
    "model": "qwen3.7-max",
    "id": "chatcmpl-d103b1cf-4bda-942f-92d6-d7ecabfeeccb"
}
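The response contains only the continuation, so the caller has to join it with the prefix. The sketch below parses a trimmed copy of the sample response body above and checks that the joined result is valid Python. `join_completion` is a hypothetical helper name, and the syntax check with `ast.parse` is an optional extra, not something the service requires.

```python
import ast
import json

PREFIX = "def calculate_fibonacci(n):\n    if n <= 1:\n        return n\n    else:\n"

# Trimmed copy of the sample response body; only the fields used here are kept.
RAW_RESPONSE = """
{
    "choices": [
        {
            "message": {
                "content": "        return calculate_fibonacci(n-1) + calculate_fibonacci(n-2)",
                "role": "assistant"
            },
            "finish_reason": "stop"
        }
    ]
}
"""


def join_completion(prefix: str, raw_response: str) -> str:
    """Concatenate the prefix with the continuation from a JSON response body."""
    body = json.loads(raw_response)
    continuation = body["choices"][0]["message"]["content"]
    return prefix + continuation


complete_code = join_completion(PREFIX, RAW_RESPONSE)
ast.parse(complete_code)  # raises SyntaxError if the joined code is not valid Python
print(complete_code)
```

Because the prefix ends with a newline and the continuation starts with its own indentation, plain string concatenation is enough here.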

DashScope

Python

import os
import dashscope

# The following URL is for the China (Beijing) region. Replace {WorkspaceId} with your actual workspace ID. URLs vary by region.
dashscope.base_http_api_url = "https://{WorkspaceId}.cn-beijing.maas.aliyuncs.com/api/v1"
# Define the code prefix to complete
prefix = "def calculate_fibonacci(n):\n    if n <= 1:\n        return n\n    else:\n"

messages = [
    {
        "role": "user",
        "content": "Complete this Fibonacci function. Do not add anything else."
    },
    {
        "role": "assistant",
        "content": prefix,
        "partial": True
    }
]
response = dashscope.Generation.call(
    # If not set in environment, replace the next line with: api_key="sk-xxx",
    api_key=os.getenv("DASHSCOPE_API_KEY"),
    model='qwen3.7-max',
    messages=messages,
    result_format='message',
)

# Manually join the prefix and the model's generated content
generated_code = response.output.choices[0].message.content
complete_code = prefix + generated_code

print(complete_code)

Response

def calculate_fibonacci(n):
    if n <= 1:
        return n
    else:
        return calculate_fibonacci(n-1) + calculate_fibonacci(n-2)

curl

# ======= Important notice =======
# API keys differ by region. Get your API key: https://help.aliyun.com/en/model-studio/get-api-key
# Beijing region. For Singapore models use: https://{WorkspaceId}.ap-southeast-1.maas.aliyuncs.com/api/v1/services/aigc/text-generation/generation
# === Remove this comment before running ===
curl -X POST "https://{WorkspaceId}.cn-beijing.maas.aliyuncs.com/api/v1/services/aigc/text-generation/generation" \
-H "Authorization: Bearer $DASHSCOPE_API_KEY" \
-H "Content-Type: application/json" \
-d '{
    "model": "qwen3.7-max",
    "input":{
        "messages":[
            {
                "role": "user",
                "content": "Complete this Fibonacci function. Do not add anything else."
            },
            {
                "role": "assistant",
                "content": "def calculate_fibonacci(n):\n    if n <= 1:\n        return n\n    else:\n",
                "partial": true
            }
        ]
    },
    "parameters": {
        "result_format": "message"
    }
}'

Response

{
    "output": {
        "choices": [
            {
                "message": {
                    "content": "        return calculate_fibonacci(n-1) + calculate_fibonacci(n-2)",
                    "role": "assistant"
                },
                "finish_reason": "stop"
            }
        ]
    },
    "usage": {
        "total_tokens": 67,
        "output_tokens": 19,
        "input_tokens": 48,
        "prompt_tokens_details": {
            "cached_tokens": 0
        }
    },
    "request_id": "c61c62e5-cf97-90bc-a4ee-50e5e117b93f"
}
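The two interfaces return the same text in different shapes: the OpenAI compatible body has `choices` at the top level with `prompt_tokens` and `completion_tokens`, while the DashScope body nests `choices` under `output` and reports `input_tokens` and `output_tokens`. If your code handles both, a thin adapter such as the following sketch may be convenient. `extract_text` and `extract_usage` are hypothetical names, and the function operates on plain dictionaries parsed from the JSON bodies shown above.

```python
from typing import Any, Dict


def extract_text(response: Dict[str, Any]) -> str:
    """Return the assistant text from either response shape shown in this topic."""
    # DashScope nests choices under "output"; the OpenAI compatible body does not.
    container = response.get("output", response)
    return container["choices"][0]["message"]["content"]


def extract_usage(response: Dict[str, Any]) -> Dict[str, int]:
    """Map both usage layouts onto input, output, and total token counts."""
    usage = response["usage"]
    return {
        "input": usage.get("input_tokens", usage.get("prompt_tokens", 0)),
        "output": usage.get("output_tokens", usage.get("completion_tokens", 0)),
        "total": usage["total_tokens"],
    }
```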

Use cases

Pass images or videos

Qwen-VL models support Partial Mode with image or video data, which is useful for scenarios such as product descriptions, social posts, news articles, and creative copywriting.

OpenAI compatible

Python

import os
from openai import OpenAI

client = OpenAI(
    # If not set in environment, replace the next line with: api_key="sk-xxx",
    # API keys differ by region. Get your API key: https://help.aliyun.com/en/model-studio/get-api-key
    api_key=os.getenv("DASHSCOPE_API_KEY"),
    # Beijing region. For Singapore models use: https://{WorkspaceId}.ap-southeast-1.maas.aliyuncs.com/compatible-mode/v1
    base_url="https://{WorkspaceId}.cn-beijing.maas.aliyuncs.com/compatible-mode/v1",
)

completion = client.chat.completions.create(
    model="qwen3-vl-plus",
    messages=[
        {
            "role": "user",
            "content": [
                {
                    "type": "image_url",
                    "image_url": {
                        "url": "https://img.alicdn.com/imgextra/i3/O1CN01zFX2Bs1Q0f9pESgPC_!!6000000001914-2-tps-450-450.png"
                    },
                },
                {"type": "text", "text": "I want to post this on social media. Help me write a caption."},
            ],
        },
        {
            "role": "assistant",
            "content": "Today I discovered a hidden-gem café",
            "partial": True,
        },
    ],
)
print(completion.choices[0].message.content)

Response

— the tiramisu here is pure bliss! Every bite delivers perfect harmony between coffee and cream. Pure joy! #FoodShare #Tiramisu #CoffeeTime

Hope you like this caption! Let me know if you need any changes.

Node.js

import OpenAI from "openai";

const openai = new OpenAI({
    // If you have not set the environment variable, replace the next line with: apiKey: "sk-xxx"
    // API keys differ by region. Get your API key: https://help.aliyun.com/en/model-studio/get-api-key
    apiKey: process.env.DASHSCOPE_API_KEY,
    // Beijing region. For Singapore models use: https://{WorkspaceId}.ap-southeast-1.maas.aliyuncs.com/compatible-mode/v1
    baseURL: "https://{WorkspaceId}.cn-beijing.maas.aliyuncs.com/compatible-mode/v1"
});

async function main() {
    const response = await openai.chat.completions.create({
        model: "qwen3-vl-plus", 
        messages: [
            {
                role: "user",
                content: [
                    {
                        type: "image_url",
                        image_url: {
                            url: "https://img.alicdn.com/imgextra/i3/O1CN01zFX2Bs1Q0f9pESgPC_!!6000000001914-2-tps-450-450.png"
                        }
                    },
                    {
                        type: "text",
                        text: "I want to post this on social media. Help me write a caption."
                    }
                ]
            },
            {
                role: "assistant",
                content: "Today I discovered a hidden-gem café",
                partial: true
            }
        ]
    });
    console.log(response.choices[0].message.content);
}

main();

curl

# ======= Important notice =======
# API keys differ by region. Get your API key: https://help.aliyun.com/en/model-studio/get-api-key
# Beijing region. For Singapore models use: https://{WorkspaceId}.ap-southeast-1.maas.aliyuncs.com/compatible-mode/v1/chat/completions
# === Remove this comment before running ===
curl -X POST 'https://{WorkspaceId}.cn-beijing.maas.aliyuncs.com/compatible-mode/v1/chat/completions' \
-H "Authorization: Bearer $DASHSCOPE_API_KEY" \
-H 'Content-Type: application/json' \
-d '{
  "model": "qwen3-vl-plus",
  "messages": [
    {
      "role": "user",
      "content": [
        {
          "type": "image_url",
          "image_url": {
            "url": "https://img.alicdn.com/imgextra/i3/O1CN01zFX2Bs1Q0f9pESgPC_!!6000000001914-2-tps-450-450.png"
          }
        },
        {
          "type": "text",
          "text": "I want to post this on social media. Help me write a caption."
        }
      ]
    },
    {
      "role": "assistant",
      "content": "Today I discovered a hidden-gem café",
      "partial": true
    }
  ]
}'

Response

{
    "choices": [
        {
            "message": {
                "content": "— the tiramisu here is pure bliss! Every bite delivers perfect harmony between coffee and cream. Pure joy! #FoodShare #Tiramisu #CoffeeTime\n\nHope you like this caption! Let me know if you need any changes.",
                "role": "assistant"
            },
            "finish_reason": "stop",
            "index": 0,
            "logprobs": null
        }
    ],
    "object": "chat.completion",
    "usage": {
        "prompt_tokens": 282,
        "completion_tokens": 56,
        "total_tokens": 338,
        "prompt_tokens_details": {
            "cached_tokens": 0
        }
    },
    "created": 1756802933,
    "system_fingerprint": null,
    "model": "qwen3-vl-plus",
    "id": "chatcmpl-5780cbb7-ebae-9c63-b098-f8cc49e321f0"
}

DashScope

Python

import os
import dashscope

# The following URL is for the China (Beijing) region. Replace {WorkspaceId} with your actual workspace ID. URLs vary by region.
dashscope.base_http_api_url = "https://{WorkspaceId}.cn-beijing.maas.aliyuncs.com/api/v1"

messages = [
    {
        "role": "user",
        "content": [
            {
                "image": "https://img.alicdn.com/imgextra/i3/O1CN01zFX2Bs1Q0f9pESgPC_!!6000000001914-2-tps-450-450.png"
            },
            {"text": "I want to post this on social media. Help me write a caption."},
        ],
    },
    {"role": "assistant", "content": "Today I discovered a hidden-gem café", "partial": True},
]

response = dashscope.MultiModalConversation.call(
    # If you have not set the environment variable, replace the next line with: api_key ="sk-xxx"
    api_key=os.getenv("DASHSCOPE_API_KEY"), 
    model="qwen3-vl-plus", 
    messages=messages
)

print(response.output.choices[0].message.content[0]["text"])

Response

— the tiramisu here is pure bliss! Every bite delivers perfect harmony between coffee and cream. Pure joy! #FoodShare #Tiramisu #CoffeeTime

Hope you like this caption! Let me know if you need any changes.

curl

# ======= Important notice =======
# API keys differ by region. Get your API key: https://help.aliyun.com/en/model-studio/get-api-key
# Beijing region. For Singapore models use: https://{WorkspaceId}.ap-southeast-1.maas.aliyuncs.com/api/v1/services/aigc/multimodal-generation/generation
# === Remove this comment before running ===
curl -X POST https://{WorkspaceId}.cn-beijing.maas.aliyuncs.com/api/v1/services/aigc/multimodal-generation/generation \
-H "Authorization: Bearer $DASHSCOPE_API_KEY" \
-H 'Content-Type: application/json' \
-d '{
    "model": "qwen3-vl-plus",
    "input":{
        "messages":[
            {"role": "user",
             "content": [
               {"image": "https://img.alicdn.com/imgextra/i3/O1CN01zFX2Bs1Q0f9pESgPC_!!6000000001914-2-tps-450-450.png"},
               {"text": "I want to post this on social media. Help me write a caption."}]
            },
            {"role": "assistant",
             "content": "Today I discovered a hidden-gem café",
             "partial": true
            }
        ]
    }
}'

Response

{
    "output": {
        "choices": [
            {
                "message": {
                    "content": [
                        {
                            "text": "— the tiramisu here is pure bliss! Every bite delivers perfect harmony between coffee and cream. Pure joy! #FoodShare #Tiramisu #CoffeeTime\n\nHope you like this caption! Let me know if you need any changes."
                        }
                    ],
                    "role": "assistant"
                },
                "finish_reason": "stop"
            }
        ]
    },
    "usage": {
        "total_tokens": 339,
        "input_tokens_details": {
            "image_tokens": 258,
            "text_tokens": 24
        },
        "output_tokens": 57,
        "input_tokens": 282,
        "output_tokens_details": {
            "text_tokens": 57
        },
        "image_tokens": 258
    },
    "request_id": "c741328c-23dc-9286-bfa7-626a4092ca09"
}
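The multimodal samples print only the continuation, and the DashScope interface returns `content` as a list of `{"text": ...}` parts rather than a string. To obtain the full caption, flatten the content and prepend the prefix. The following is a sketch with hypothetical helper names (`flatten_content`, `full_caption`). It assumes that every part of interest carries a `text` key, as in the sample response above.

```python
from typing import Any, Dict, List, Union

Content = Union[str, List[Dict[str, Any]]]


def flatten_content(content: Content) -> str:
    """Return plain text from a string or from a list of {"text": ...} parts."""
    if isinstance(content, str):
        return content
    # Parts without a "text" key are skipped.
    return "".join(part.get("text", "") for part in content)


def full_caption(prefix: str, content: Content) -> str:
    """Join the caption prefix with the model's continuation."""
    return prefix + flatten_content(content)
```

For example, `full_caption("Today I discovered a hidden-gem café", response.output.choices[0].message.content)` would return the prefix followed by the generated text.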

Continue from incomplete output

You can use Partial Mode to continue from incomplete LLM output and ensure semantically complete results. Incomplete output may occur for the following reasons:

  • The value of the max_tokens parameter is too small, which causes the model to truncate its output.

  • A non-streaming response times out. Timeouts no longer cause an error; instead, the model returns the content that it has generated so far. See How to handle model timeouts.
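When output may be truncated more than once, the continuation step can be repeated in a loop. The sketch below is illustrative: `complete_with_continuation` and the `send` callback are hypothetical, and `send` stands in for whichever SDK call you use, returning the generated text and its `finish_reason`. How a timed-out response reports its finish reason is not covered in this topic, so the loop continues only on `length`.

```python
from typing import Callable, Dict, List, Tuple

# A caller-supplied function that sends `messages` and returns (content, finish_reason).
SendFn = Callable[[List[Dict[str, object]]], Tuple[str, str]]


def complete_with_continuation(
    send: SendFn, user_prompt: str, max_rounds: int = 5
) -> str:
    """Call `send` repeatedly, feeding truncated output back as a partial prefix."""
    base: List[Dict[str, object]] = [{"role": "user", "content": user_prompt}]
    text = ""
    for _ in range(max_rounds):
        messages = list(base)
        if text:
            # Everything generated so far becomes the assistant prefix.
            messages.append({"role": "assistant", "content": text, "partial": True})
        chunk, finish_reason = send(messages)
        text += chunk
        if finish_reason != "length":
            break  # any other reason means there is nothing left to continue
    return text
```

The `max_rounds` cap is a design choice of this sketch: it bounds the number of requests if the output keeps hitting the token limit.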

OpenAI compatible

Python

import os
from openai import OpenAI

# Initialize the client
client = OpenAI(
    # If not set in environment, replace the next line with: api_key="sk-xxx",
    # API keys differ by region. Get your API key: https://help.aliyun.com/en/model-studio/get-api-key
    api_key=os.getenv("DASHSCOPE_API_KEY"),
    # Beijing region. For Singapore models use: https://{WorkspaceId}.ap-southeast-1.maas.aliyuncs.com/compatible-mode/v1
    base_url="https://{WorkspaceId}.cn-beijing.maas.aliyuncs.com/compatible-mode/v1",
)

def chat_completion(messages, max_tokens=None):
    response = client.chat.completions.create(
        model="qwen-plus",
        messages=messages,
        max_tokens=max_tokens
    )
    print(f"### Reason generation stopped: {response.choices[0].finish_reason}")
    
    return response.choices[0].message.content

# Example usage
messages = [{"role": "user", "content": "Write a short sci-fi story"}]

# First call with max_tokens set to 40
first_content = chat_completion(messages, max_tokens=40)
print(first_content)
# Add the first response as an assistant message and set partial=True
messages.append({"role": "assistant", "content": first_content, "partial": True})

# Second call
second_content = chat_completion(messages)
print("### Complete content:")
print(first_content + second_content)

Response

The finish_reason field indicates why generation stopped: length means that the max_tokens limit was reached, and stop means that the model finished naturally or hit a stop word from the stop parameter.

### Reason generation stopped: length
**"The End of Memory"**

In the distant future, Earth is no longer fit for human life. The atmosphere is polluted, oceans are dry, and cities lie in ruins. Humans migrated to a habitable planet named "Eden," with blue skies, fresh air, and endless resources.

However, Eden is not a true paradise. It holds no human history, no past, and no memory.

...
**"If we forget who we are, are we still human?"**

— End —

Node.js

import OpenAI from "openai";

const openai = new OpenAI({
    // If not set in environment, replace the next line with: apiKey: "sk-xxx"
    // API keys differ by region. Get your API key: https://help.aliyun.com/en/model-studio/get-api-key
    apiKey: process.env.DASHSCOPE_API_KEY,
    // Beijing region. For Singapore models use: https://{WorkspaceId}.ap-southeast-1.maas.aliyuncs.com/compatible-mode/v1
    baseURL: "https://{WorkspaceId}.cn-beijing.maas.aliyuncs.com/compatible-mode/v1"
});

async function chatCompletion(messages, maxTokens = null) {
    const completion = await openai.chat.completions.create({
        model: "qwen-plus",
        messages: messages,
        max_tokens: maxTokens
    });
    
    console.log(`### Reason generation stopped: ${completion.choices[0].finish_reason}`);
    return completion.choices[0].message.content;
}

// Example usage
async function main() {
    let messages = [{"role": "user", "content": "Write a short sci-fi story"}];

    try {
        // First call with max_tokens set to 40
        const firstContent = await chatCompletion(messages, 40);
        console.log(firstContent);
        
        // Add the first response as an assistant message and set partial=true
        messages.push({"role": "assistant", "content": firstContent, "partial": true});

        // Second call
        const secondContent = await chatCompletion(messages);
        console.log("### Complete content:");
        console.log(firstContent + secondContent);
        
    } catch (error) {
        console.error('Execution error:', error);
    }
}

// Run the example
main();

DashScope

Python

import os
import dashscope

# The following URL is for the China (Beijing) region. Replace {WorkspaceId} with your actual workspace ID. URLs vary by region.
dashscope.base_http_api_url = "https://{WorkspaceId}.cn-beijing.maas.aliyuncs.com/api/v1"

def chat_completion(messages, max_tokens=None):
    response = dashscope.Generation.call(
        # If not set in environment, replace the next line with: api_key="sk-xxx",
        # API keys differ by region. Get your API key: https://help.aliyun.com/en/model-studio/get-api-key
        api_key=os.getenv("DASHSCOPE_API_KEY"),
        model='qwen-plus',
        messages=messages,
        max_tokens=max_tokens,
        result_format='message',  
    )
    
    print(f"### Reason generation stopped: {response.output.choices[0].finish_reason}")
    return response.output.choices[0].message.content

# Example usage
messages = [{"role": "user", "content": "Write a short sci-fi story"}]

# First call with max_tokens set to 40
first_content = chat_completion(messages, max_tokens=40)
print(first_content)

# Add the first response as an assistant message and set partial=True
messages.append({"role": "assistant", "content": first_content, "partial": True})

# Second call
second_content = chat_completion(messages)
print("### Complete content:")
print(first_content + second_content)

Response

### Reason generation stopped: length
Title: **"Origami Time"**

---

In 2179, humanity finally mastered time travel. But this technology did not rely on massive machines or complex energy fields. It relied on paper.

A single sheet of paper.

It was called "Origami Time," made from an unknown alien material. Scientists could not explain how it worked. They only knew that drawing a scene on the paper and folding it in a specific way opened a door to the past or future.

...

"You are not the key to time. You are just a reminder that our future is always in our hands."

Then I tore it into pieces.

---

**(End)**

Billing

Billing covers input and output tokens. The prefix counts as part of the input tokens.
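Because the prefix is billed as input, each continuation call pays again for the text generated so far. The arithmetic can be sketched as follows. The helper names and the per-1,000-token prices are hypothetical placeholders, not actual rates. Take the real token counts from the `usage` field of each response.

```python
from typing import List, Tuple


def billed_tokens(rounds: List[Tuple[int, int]]) -> Tuple[int, int]:
    """Sum the (input_tokens, output_tokens) pairs reported by each call."""
    total_in = sum(r[0] for r in rounds)
    total_out = sum(r[1] for r in rounds)
    return total_in, total_out


def estimate_cost(
    total_in: int, total_out: int, in_price: float, out_price: float
) -> float:
    """Cost for the given token counts; prices are per 1,000 tokens."""
    return total_in / 1000 * in_price + total_out / 1000 * out_price


# Illustrative numbers only: a first call truncated at 40 output tokens,
# then a continuation whose input is roughly the prompt plus the 40-token prefix.
rounds = [(20, 40), (60, 300)]
total_in, total_out = billed_tokens(rounds)
print(total_in, total_out, estimate_cost(total_in, total_out, 0.5, 2.0))
```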

Error codes

If the model call fails and returns an error message, see Error codes for resolution.