How to check model training and deployment pricing - Alibaba Cloud Model Studio - Alibaba Cloud Help Center

Training billing

Text generation models – Qwen

Note

For the training workflow, see Introduction to model fine-tuning. After training completes, deploy the new model before evaluating or calling it.

Method

Billed by training tokens

Formula

Model training fee = (Total tokens in training data + Total tokens in mixed training data) × Number of epochs × Training unit price (Minimum billing unit: 1 token)

View the estimated training fee at the bottom of the model training console, and click Computing Details to view the total number of training tokens, number of epochs, and training unit price.
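The training-fee formula above can be sketched as a small Python helper. This is an illustrative estimate, not the console's calculator. The function name is hypothetical, and it assumes you already know the token counts (for example, from Computing Details). It uses Decimal to avoid floating-point noise in currency amounts.

```python
from decimal import Decimal


def qwen_training_fee(train_tokens: int, mixed_tokens: int, epochs: int,
                      price_per_1k_tokens: str) -> Decimal:
    """Estimates the Qwen fine-tuning fee in CNY (hypothetical helper).

    fee = (training tokens + mixed training tokens) x epochs x unit price,
    where the unit price is quoted per 1,000 tokens.
    """
    total_tokens = (train_tokens + mixed_tokens) * epochs
    # Decimal keeps currency arithmetic free of binary floating-point noise.
    return total_tokens * Decimal(price_per_1k_tokens) / 1000


# Example: 1,000,000 training tokens, no mixed data, 3 epochs on qwen3-8b (CNY 0.006/1,000 tokens)
fee = qwen_training_fee(1_000_000, 0, 3, "0.006")
print(f"Estimated training fee: CNY {fee}")
```

In this example, 3,000,000 billed tokens at CNY 0.006 per 1,000 tokens come to CNY 18.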

Qwen

Model service	Model code	Price
Qwen3.6-Flash-2026-04-16	qwen3.6-flash-2026-04-16	CNY 0.05/1,000 tokens

Qwen3.5-27B	qwen3.5-27b	CNY 0.05/1,000 tokens
Qwen3.5-9B	qwen3.5-9b	CNY 0.02/1,000 tokens
Qwen3.5-Flash-2026-02-23	qwen3.5-flash-2026-02-23	CNY 0.05/1,000 tokens

Qwen3-32B	qwen3-32b	CNY 0.04/1,000 tokens
Qwen3-30B-A3B-Instruct-2507	qwen3-30b-a3b-instruct-2507	CNY 0.03/1,000 tokens
Qwen3-14B	qwen3-14b	CNY 0.03/1,000 tokens
Qwen3-8B	qwen3-8b	CNY 0.006/1,000 tokens
Qwen3-4B-Instruct-2507	qwen3-4b-instruct-2507	CNY 0.006/1,000 tokens
Qwen3-1.7B	qwen3-1.7b	CNY 0.0045/1,000 tokens
Qwen3-0.6B	qwen3-0.6b	CNY 0.003/1,000 tokens

Qwen2.5-72B-Instruct	qwen2.5-72b-instruct	CNY 0.15/1,000 tokens
Qwen2.5-32B-Instruct	qwen2.5-32b-instruct	CNY 0.03/1,000 tokens
Qwen2.5-14B-Instruct	qwen2.5-14b-instruct	CNY 0.03/1,000 tokens
Qwen2.5-7B-Instruct	qwen2.5-7b-instruct	CNY 0.006/1,000 tokens

Qwen-Plus-Character-2025-11-06	qwen-plus-character-2025-11-06	CNY 0.15/1,000 tokens

Qwen-VL

Model service	Model code	Price
Qwen3-VL-8B-Instruct	qwen3-vl-8b-instruct	CNY 0.012/1,000 tokens
Qwen3-VL-8B-Thinking	qwen3-vl-8b-thinking	CNY 0.012/1,000 tokens
Qwen3-VL-4B-Instruct	qwen3-vl-4b-instruct	CNY 0.006/1,000 tokens

Qwen2.5-VL-72B-Instruct	qwen2.5-vl-72b-instruct	CNY 0.05/1,000 tokens
Qwen2.5-VL-32B-Instruct	qwen2.5-vl-32b-instruct	CNY 0.02/1,000 tokens
Qwen2.5-VL-7B-Instruct	qwen2.5-vl-7b-instruct	CNY 0.01/1,000 tokens

Calculate tokens for images and videos

Images

Formula: Image Tokens = h_bar * w_bar / token_pixels + 2

h_bar, w_bar: The height and width of the scaled image. Before processing an image, the model resizes it so that each side is a multiple of the token side length (32 or 28 pixels, see token_pixels). It also keeps the total pixel count within an allowed range: large images are scaled down and very small images are scaled up. The upper limit depends on the values of the max_pixels and vl_high_resolution_images parameters. For more information, see Process high-resolution images.
token_pixels: The number of pixels covered by each visual token. This varies by model:
- Qwen3.7, Qwen3.6, and Qwen3.5 series, Qwen3-VL, qwen-vl-max, and qwen-vl-plus: Each token corresponds to 32x32 pixels.
- QVQ and other Qwen2.5-VL models: Each token corresponds to 28x28 pixels.

The following code demonstrates the approximate image scaling logic used by the model. Use it to estimate the tokens for an image. For actual billing, refer to the API response.

import math
from PIL import Image  # pip install Pillow

def smart_size(image_path, max_pixels, vl_high_resolution_images, factor=32):
    """Calculates the scaled dimensions of an image based on model parameters to estimate image tokens.

    factor: 32 for models such as Qwen3.6, Qwen3.5, and Qwen3-VL; 28 for other models.
    """
    # The context manager closes the image file after the size is read.
    with Image.open(image_path) as image:
        height, width = image.height, image.width

    # Round each side to the nearest multiple of factor.
    h_bar = round(height / factor) * factor
    w_bar = round(width / factor) * factor

    # Token lower limit: 4 tokens
    min_pixels = 4 * factor * factor

    # If vl_high_resolution_images=True, the token upper limit is fixed at 16384, and max_pixels is ignored.
    if vl_high_resolution_images:
        max_pixels = 16384 * factor * factor

    # Constrains the total number of pixels to the range [min_pixels, max_pixels].
    if h_bar * w_bar > max_pixels:
        beta = math.sqrt((height * width) / max_pixels)
        h_bar = math.floor(height / beta / factor) * factor
        w_bar = math.floor(width / beta / factor) * factor
    elif h_bar * w_bar < min_pixels:
        beta = math.sqrt(min_pixels / (height * width))
        h_bar = math.ceil(height * beta / factor) * factor
        w_bar = math.ceil(width * beta / factor) * factor

    return h_bar, w_bar

if __name__ == "__main__":
    # Note: The values of max_pixels and vl_high_resolution_images must match the parameters passed when calling the model.
    h_bar, w_bar = smart_size("xxx/test.jpg", max_pixels=2560 * 32 * 32, vl_high_resolution_images=False)
    print(f"Scaled image dimensions: Height {h_bar}, Width {w_bar}")

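    # Each image adds one <|vision_bos|> and one <|vision_eos|> token (1 token each).
    # The 32 * 32 divisor matches the default factor=32; use 28 * 28 for models whose factor is 28.
    token = int(h_bar * w_bar / (32 * 32)) + 2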
    print(f"Number of image tokens: {token}")
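If you only know an image's resolution, you can apply the same scaling logic directly to the original height and width without opening a file. This sketch makes the same assumptions as the code above. estimate_image_tokens is a hypothetical helper name, and the API response remains the billing reference.

```python
import math


def estimate_image_tokens(height, width, max_pixels=2560 * 32 * 32,
                          vl_high_resolution_images=False, factor=32):
    """Estimates image tokens from the original pixel dimensions (hypothetical helper).

    Mirrors smart_size: round each side to a multiple of factor, then rescale so
    the total pixel count stays within [min_pixels, max_pixels].
    """
    min_pixels = 4 * factor * factor  # lower limit: 4 tokens
    if vl_high_resolution_images:
        max_pixels = 16384 * factor * factor  # fixed upper limit; max_pixels is ignored
    h_bar = round(height / factor) * factor
    w_bar = round(width / factor) * factor
    if h_bar * w_bar > max_pixels:
        beta = math.sqrt((height * width) / max_pixels)
        h_bar = math.floor(height / beta / factor) * factor
        w_bar = math.floor(width / beta / factor) * factor
    elif h_bar * w_bar < min_pixels:
        beta = math.sqrt(min_pixels / (height * width))
        h_bar = math.ceil(height * beta / factor) * factor
        w_bar = math.ceil(width * beta / factor) * factor
    # One token per factor x factor block, plus <|vision_bos|> and <|vision_eos|>
    return (h_bar // factor) * (w_bar // factor) + 2


print(estimate_image_tokens(1080, 1920))  # 2042
```

For a 1920x1080 image with the default max_pixels, each side is rounded to a multiple of 32, giving 1088 x 1920. This stays under the limit, so the image uses 34 x 60 + 2 = 2042 tokens.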

Videos

Video files:

When processing a video file, the model first extracts frames and then calculates the total number of tokens for all video frames. Because this calculation is complex, you can use the following code to estimate the total token consumption for a video by providing its path:

# Before use, install: pip install opencv-python
import math
import os
import logging
import cv2

logger = logging.getLogger(__name__)

# Frames are sampled in multiples of 2; every 2 sampled frames share one set of visual tokens.
FRAME_FACTOR = 2

# For models such as Qwen3.6, Qwen3.5, Qwen3-VL, qwen-vl-max-0813, qwen-vl-plus-0815, and qwen-vl-plus-0710, the image scaling factor is 32.
IMAGE_FACTOR = 32

# For other models, the image scaling factor is 28.
# IMAGE_FACTOR = 28

# Maximum aspect ratio for video frames
MAX_RATIO = 200
# Pixel lower limit for video frames
VIDEO_MIN_PIXELS = 4 * 32 * 32
# Pixel upper limit for video frames. For the Qwen3-VL-Plus model, VIDEO_MAX_PIXELS is 640 * 32 * 32. For other models, it is 768 * 32 * 32.
VIDEO_MAX_PIXELS = 640 * 32 * 32

# Default sampling rate (frames per second) used when the fps parameter is not passed.
FPS = 2.0
# Minimum number of extracted frames
FPS_MIN_FRAMES = 4
# Maximum number of extracted frames (set based on the selected model)
FPS_MAX_FRAMES = 2000

# Total pixel budget for the whole video input, shared across frames (can be overridden by the VIDEO_MAX_PIXELS environment variable). For the Qwen3-VL-Plus model, set VIDEO_TOTAL_PIXELS to 131072 * 32 * 32. For other models, set it to 65536 * 32 * 32.
VIDEO_TOTAL_PIXELS = int(float(os.environ.get('VIDEO_MAX_PIXELS', 131072 * 32 * 32)))

def round_by_factor(number: int, factor: int) -> int:
    """Returns the integer closest to 'number' that is divisible by 'factor'."""
    return round(number / factor) * factor

def ceil_by_factor(number: int, factor: int) -> int:
    """Returns the smallest integer that is greater than or equal to 'number' and divisible by 'factor'."""
    return math.ceil(number / factor) * factor

def floor_by_factor(number: int, factor: int) -> int:
    """Returns the largest integer that is less than or equal to 'number' and divisible by 'factor'."""
    return math.floor(number / factor) * factor

def extract_vision_info(conversations):
    vision_infos = []
    if isinstance(conversations[0], dict):
        conversations = [conversations]
    for conversation in conversations:
        for message in conversation:
            if isinstance(message["content"], list):
                for ele in message["content"]:
                    if (
                        "image" in ele
                        or "image_url" in ele
                        or "video" in ele
                        or ele.get("type","") in ("image", "image_url", "video")
                    ):
                        vision_infos.append(ele)
    return vision_infos

def smart_nframes(ele,total_frames,video_fps):
    """Calculates the number of extracted video frames.

    Args:
        ele (dict): A dictionary containing the video configuration.
            - fps: Controls the number of input frames extracted for the model.
        total_frames (int): The original total number of frames in the video.
        video_fps (int | float): The original frame rate of the video.

    Raises:
        An error is reported if nframes is not within the interval [FRAME_FACTOR, total_frames].

    Returns:
        The number of video frames for model input.
    """
    assert not ("fps" in ele and "nframes" in ele), "Only accept either `fps` or `nframes`"
    fps = ele.get("fps", FPS)
    min_frames = ceil_by_factor(ele.get("min_frames", FPS_MIN_FRAMES), FRAME_FACTOR)
    max_frames = floor_by_factor(ele.get("max_frames", min(FPS_MAX_FRAMES, total_frames)), FRAME_FACTOR)
    duration = total_frames / video_fps if video_fps != 0 else 0
    if duration-int(duration)>(1/fps):
        total_frames = math.ceil(duration * video_fps)
    else:
        total_frames = math.ceil(int(duration)*video_fps)
    nframes = total_frames / video_fps * fps
    if nframes > total_frames:
        logger.warning(f"smart_nframes: nframes[{nframes}] > total_frames[{total_frames}]")
    nframes = int(min(min(max(nframes, min_frames), max_frames), total_frames))
    if not (FRAME_FACTOR <= nframes and nframes <= total_frames):
        raise ValueError(f"nframes should in interval [{FRAME_FACTOR}, {total_frames}], but got {nframes}.")

    return nframes

def get_video(video_path):
    """Returns (height, width, total_frames, fps) of a video file."""
    cap = cv2.VideoCapture(video_path)
    if not cap.isOpened():
        raise ValueError(f"Cannot open video file: {video_path}")

    frame_width = int(cap.get(cv2.CAP_PROP_FRAME_WIDTH))
    frame_height = int(cap.get(cv2.CAP_PROP_FRAME_HEIGHT))
    total_frames = int(cap.get(cv2.CAP_PROP_FRAME_COUNT))
    video_fps = cap.get(cv2.CAP_PROP_FPS)
    cap.release()  # Release the file handle once the metadata has been read
    return frame_height, frame_width, total_frames, video_fps

def smart_resize(ele, path, factor=IMAGE_FACTOR):
    # Get the original width and height of the video
    height, width, total_frames, video_fps = get_video(path)
    # Token lower limit for video frames
    min_pixels = VIDEO_MIN_PIXELS
    total_pixels = VIDEO_TOTAL_PIXELS
    # Number of extracted video frames
    nframes = smart_nframes(ele, total_frames, video_fps)
    max_pixels = max(min(VIDEO_MAX_PIXELS, total_pixels / nframes * FRAME_FACTOR),int(min_pixels * 1.05))

    # The aspect ratio of the video should not exceed 200:1 or 1:200.
    if max(height, width) / min(height, width) > MAX_RATIO:
        raise ValueError(
            f"absolute aspect ratio must be smaller than {MAX_RATIO}, got {max(height, width) / min(height, width)}"
        )

    h_bar = max(factor, round_by_factor(height, factor))
    w_bar = max(factor, round_by_factor(width, factor))
    if h_bar * w_bar > max_pixels:
        beta = math.sqrt((height * width) / max_pixels)
        h_bar = floor_by_factor(height / beta, factor)
        w_bar = floor_by_factor(width / beta, factor)
    elif h_bar * w_bar < min_pixels:
        beta = math.sqrt(min_pixels / (height * width))
        h_bar = ceil_by_factor(height * beta, factor)
        w_bar = ceil_by_factor(width * beta, factor)
    return h_bar, w_bar

def token_calculate(video_path, fps):
    # Pass the video path and the fps frame extraction parameter.
    messages = [{"content": [{"video": video_path, "fps": fps}]}]
    vision_infos = extract_vision_info(messages)[0]

    resized_height, resized_width = smart_resize(vision_infos, video_path)

    height, width, total_frames, video_fps = get_video(video_path)
    num_frames = smart_nframes(vision_infos, total_frames, video_fps)
    print(f"Original video dimensions: {height}*{width}, Model input dimensions: {resized_height}*{resized_width}, Total video frames: {total_frames}, Total frames extracted when fps is {fps}: {num_frames}", end=", ")
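    # Every FRAME_FACTOR (2) frames share one set of tokens; each token covers IMAGE_FACTOR x IMAGE_FACTOR pixels.
    video_token = int(math.ceil(num_frames / FRAME_FACTOR) * resized_height / IMAGE_FACTOR * resized_width / IMAGE_FACTOR)
    video_token += 2   # The system automatically adds <|vision_bos|> and <|vision_eos|> visual markers (1 token each).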
    return video_token

if __name__ == "__main__":
    video_token = token_calculate("xxx/test.mp4", 1)
    print("Video tokens:", video_token)

Image list:

When a video is passed as a list of images, frame extraction has already been performed. To estimate token consumption, use the following code and provide the path of one frame and the number of images. The code assumes that all frames have the same dimensions.

# Before use, install: pip install Pillow
import math
import os
import logging
from typing import Tuple
from PIL import Image

logger = logging.getLogger(__name__)

# ==================== Constant Definitions ====================
FRAME_FACTOR = 2
# For models such as Qwen3.6, Qwen3.5, Qwen3-VL, qwen-vl-max-0813, qwen-vl-plus-0815, and qwen-vl-plus-0710, the scaling factor is 32.
IMAGE_FACTOR = 32

# For other models, the scaling factor is 28.
# IMAGE_FACTOR = 28

# Constants for token calculation
TOKEN_DIVISOR = IMAGE_FACTOR  # Each token covers IMAGE_FACTOR x IMAGE_FACTOR pixels
VISION_SPECIAL_TOKENS = 2  # <|vision_bos|> and <|vision_eos|> markers

# Maximum aspect ratio for video frames
MAX_RATIO = 200
# Pixel lower limit for video frames
VIDEO_MIN_PIXELS = 4 * 32 * 32
# Pixel upper limit for video frames. For the Qwen3-VL-Plus model, VIDEO_MAX_PIXELS is 640 * 32 * 32. For other models, it is 768 * 32 * 32.
VIDEO_MAX_PIXELS = 640 * 32 * 32

# Total pixel budget for the whole video input, shared across frames (can be overridden by the VIDEO_MAX_PIXELS environment variable). For the Qwen3-VL-Plus model, set VIDEO_TOTAL_PIXELS to 131072 * 32 * 32. For other models, set it to 65536 * 32 * 32.
VIDEO_TOTAL_PIXELS = int(float(os.environ.get('VIDEO_MAX_PIXELS', 131072 * 32 * 32)))

def round_by_factor(number: int, factor: int) -> int:
    """Returns the integer closest to 'number' that is divisible by 'factor'."""
    return round(number / factor) * factor

def ceil_by_factor(number: int, factor: int) -> int:
    """Returns the smallest integer that is greater than or equal to 'number' and divisible by 'factor'."""
    return math.ceil(number / factor) * factor

def floor_by_factor(number: int, factor: int) -> int:
    """Returns the largest integer that is less than or equal to 'number' and divisible by 'factor'."""
    return math.floor(number / factor) * factor

def get_image_size(image_path: str) -> Tuple[int, int]:
    if not os.path.exists(image_path):
        raise FileNotFoundError(f"Image file not found: {image_path}")

    try:
        # The context manager closes the file promptly, even if reading fails
        with Image.open(image_path) as image:
            return image.height, image.width
    except Exception as e:
        raise ValueError(f"Cannot read image file {image_path}: {e}") from e

def smart_resize(height: int, width: int, nframes: int, factor: int = IMAGE_FACTOR) -> Tuple[int, int]:
    """
    Calculates the scaled dimensions of an image

    Args:
        height: Original image height
        width: Original image width
        nframes: Number of video frames
        factor: Scaling factor, defaults to IMAGE_FACTOR

    Returns:
        (resized_height, resized_width) The scaled height and width

    Raises:
        ValueError: Aspect ratio exceeds the limit
    """
    # Token lower limit for video frames
    min_pixels = VIDEO_MIN_PIXELS
    total_pixels = VIDEO_TOTAL_PIXELS
    # Per-frame pixel upper limit, reduced when many frames share the total pixel budget
    max_pixels = max(min(VIDEO_MAX_PIXELS, total_pixels / nframes * FRAME_FACTOR), int(min_pixels * 1.05))

    # The aspect ratio of the video should not exceed 200:1 or 1:200.
    aspect_ratio = max(height, width) / min(height, width)
    if aspect_ratio > MAX_RATIO:
        raise ValueError(
            f"Image aspect ratio must be less than {MAX_RATIO}:1, but is currently {aspect_ratio:.2f}:1"
        )

    h_bar = max(factor, round_by_factor(height, factor))
    w_bar = max(factor, round_by_factor(width, factor))
    if h_bar * w_bar > max_pixels:
        beta = math.sqrt((height * width) / max_pixels)
        h_bar = floor_by_factor(height / beta, factor)
        w_bar = floor_by_factor(width / beta, factor)
    elif h_bar * w_bar < min_pixels:
        beta = math.sqrt(min_pixels / (height * width))
        h_bar = ceil_by_factor(height * beta, factor)
        w_bar = ceil_by_factor(width * beta, factor)
    return h_bar, w_bar

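def calculate_video_tokens(image_path: str, nframes: int = 1, factor: int = IMAGE_FACTOR, verbose: bool = True) -> int:
    """
    Estimates the token consumption of a video passed as a list of images.

    Args:
        image_path: Path to one video frame file (all frames are assumed to share its dimensions)
        nframes: Number of video frames (images in the list)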
        factor: Scaling factor, defaults to IMAGE_FACTOR
        verbose: Whether to print detailed information

    Returns:
        The number of tokens consumed

    Raises:
        FileNotFoundError: The file does not exist
        ValueError: The file format is invalid or the aspect ratio exceeds the limit
    """
    # Get the original image dimensions (read only once)
    height, width = get_image_size(image_path)

    # Calculate the scaled dimensions
    resized_height, resized_width = smart_resize(height, width, nframes, factor)

    # Calculate the number of tokens
    # Formula: ceil(nframes/2) * (height/TOKEN_DIVISOR) * (width/TOKEN_DIVISOR) + VISION_SPECIAL_TOKENS
    video_token = int(
        math.ceil(nframes / 2) *
        (resized_height / TOKEN_DIVISOR) *
        (resized_width / TOKEN_DIVISOR)
    )
    # Add visual marker tokens (<|vision_bos|> and <|vision_eos|>)
    video_token += VISION_SPECIAL_TOKENS

    if verbose:
        print(f"Original video frame dimensions: {height}x{width}, Model input dimensions: {resized_height}x{resized_width}, ", end="")

    return video_token

if __name__ == "__main__":
    try:
        video_token = calculate_video_tokens("xxx/test.jpg", nframes=30)
        print(f"Video tokens: {video_token}\n")
    except Exception as e:
        print(f"Error: {str(e)}\n")

Image generation models – Wan

Note

For the training workflow, see Fine-tune image generation models. After training completes, deploy the new model before calling it.

Method	Billed by training tokens
Formula	Model training fee = Total training tokens × Training unit price (Billing unit: per 1,000 tokens)

Formula for total training tokens

$\text{TrainingTokens}_{\text{Total}} \approx \text{max\_steps} \times L_{\text{step}}$

Where:

max_steps: A hyperparameter specified during training, representing the maximum number of training steps (configured when creating a fine-tuning job).
L_step: The token consumption per step. The formula is:

$L_{\text{step}} = \sum_{i \in \text{batch}} L_{\text{item}}^{(i)} \leq L_{\max}$

L_step is approximately equal to L_max. L_max is determined by the max_token_length and generation_type, as shown below:

generation_type	max_token_length	L_max
t2i (text-to-image)	1k	12,800
t2i (text-to-image)	2k	23,220
i2i (image-to-image)	1k	23,220
i2i (image-to-image)	2k	32,000

Note

The above formula provides an approximation. Actual billing is based on the usage field returned by the system.

Model	Code	Training price (per 1K tokens)
Wan image generation	wan2.7-image-pro	CNY 0.08
Wan image generation	wan2.7-image	CNY 0.08

Billing example

Suppose you fine-tune the wan2.7-image-pro model for t2i. The parameters are: max_steps = 200, max_token_length = "1k", and the training price is CNY 0.08 per 1,000 tokens:

From the table: L_max = 12,800 (generation_type=t2i, max_token_length=1k), L_step ≈ L_max = 12,800
Total training tokens ≈ 200 × 12,800 = 2,560,000 = 2560 thousand tokens
Model training fee ≈ 2560 × 0.08 = CNY 204.8
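The approximation above can be written as a short Python function. The sketch copies the L_max table into a dictionary and, like the formula, assumes L_step ≈ L_max. Because L_step never exceeds L_max, the result leans toward the high end. The function name is hypothetical, and actual billing follows the usage field returned by the system.

```python
from decimal import Decimal

# L_max values from the table, keyed by (generation_type, max_token_length)
L_MAX = {
    ("t2i", "1k"): 12_800,
    ("t2i", "2k"): 23_220,
    ("i2i", "1k"): 23_220,
    ("i2i", "2k"): 32_000,
}


def wan_image_training_fee(generation_type: str, max_token_length: str,
                           max_steps: int, price_per_1k_tokens: str):
    """Returns (approximate total training tokens, approximate fee in CNY).

    Assumes L_step ~= L_max, as in the formula above.
    """
    l_step = L_MAX[(generation_type, max_token_length)]
    total_tokens = max_steps * l_step
    fee = Decimal(total_tokens) / 1000 * Decimal(price_per_1k_tokens)
    return total_tokens, fee


tokens, fee = wan_image_training_fee("t2i", "1k", 200, "0.08")
print(f"~{tokens:,} tokens, ~CNY {fee}")  # ~2,560,000 tokens, ~CNY 204.80
```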

Video generation models – Wan

Note

For the training workflow, see Fine-tuning video generation models. After training completes, deploy the new model before calling it.

Method	Billed by training tokens
Formula	Model training fee = Total training tokens × Training unit price (Billing unit: per 1,000 tokens)

Formula for total training tokens

$\text{TrainingTokens}_{\text{Total}} = \left(\sum_{i=1}^{N} \text{BillingDuration}_i\right) \times \frac{\text{max\_pixels}}{1024} \times \text{n\_epochs}$

Where:

N: Total number of videos in the training set.
max_pixels: A hyperparameter specified during training, representing the maximum number of pixels for a video (configured when creating a fine-tuning job).
n_epochs: A hyperparameter specified during training that sets the number of epochs, that is, full passes over the training set (configured when creating a fine-tuning job).
- The conversion between n_epochs and steps is: steps = n_epochs × ⌈dataset_size / batch_size⌉, i.e., n_epochs = steps / ⌈dataset_size / batch_size⌉.
- When the dataset contains only 1 sample and batch_size = 1, n_epochs = steps. We recommend a total of at least 800 steps.
Billing duration calculation rule for a single video: First, round the original video duration (in seconds) to the nearest integer, then determine the final value based on model limits.
- wan2.7 model: Billing duration=min(10, rounded duration), meaning a single video is billed for a maximum of 10 seconds.
- wan2.5 model: Billing duration=min(10, rounded duration), meaning a single video is billed for a maximum of 10 seconds.
- wan2.2 model: Billing duration=min(5, rounded duration), meaning a single video is billed for a maximum of 5 seconds.

Model	Code	Training price (per 1K tokens)
Wan image-to-video (first frame-based)	wan2.7-i2v	CNY 2
	wan2.2-i2v-flash	CNY 0.06
	wan2.5-i2v-preview	CNY 0.32
Image-to-video (first and last frame-based)	wan2.2-kf2v-flash	CNY 0.06

Billing examples

wan2.7-i2v cost estimation (single video)

Assume a training set contains 1 video with a duration of 10 seconds. With batch_size = 1 (recommended), n_epochs = steps / ⌈dataset_size / batch_size⌉ = steps / ⌈1 / 1⌉ = steps.

Training unit price = CNY 2 per 1,000 tokens. For example, with max_pixels = 36864 and n_epochs = 800:

Total training tokens = 10 × (36864 / 1024) × 800 = 288,000 = 288 thousand tokens
Model training fee = 288 × 2 = CNY 576

max_pixels	Common steps	n_epochs	Estimated tokens	Estimated cost (CNY)
36864	800	800	288,000	576
	1,000	1,000	360,000	720
	2,000	2,000	720,000	1,440
65536	800	800	512,000	1,024
	1,000	1,000	640,000	1,280
	2,000	2,000	1,280,000	2,560
102400	800	800	800,000	1,600
	1,000	1,000	1,000,000	2,000
	2,000	2,000	2,000,000	4,000

wan2.7-i2v cost estimation (multiple videos)

Assume a training set contains 2 videos with durations of 3.4 seconds and 11.5 seconds. Parameters: max_pixels = 36864, n_epochs = 800. Training unit price = CNY 2/thousand tokens:

Duration calculation:
- Video 1: 3.4 seconds is rounded to 3. Billable duration = min(10, 3) = 3.
- Video 2: 11.5 seconds is rounded to 11. Billable duration = min(10, 11) = 10.
- Total billable duration = 3 + 10 = 13 seconds.
Total training tokens = 13 × (36864/1024) × 800 = 374,400 = 374.4 thousand tokens.
Model training fee = 374.4 × 2 = CNY 748.8.
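The video-training rules can be combined into one estimator: rounding, per-model duration caps, the max_pixels / 1024 factor, and the steps-to-epochs conversion. This is a sketch with hypothetical function names. One assumption needs flagging: the rules say to round the duration to the nearest integer but do not say how to handle exact halves, so the code rounds halves up. In the multi-video example, the 11.5-second video is capped at 10 seconds either way, so the result does not depend on this choice.

```python
import math
from decimal import Decimal, ROUND_HALF_UP

# Per-video caps on billing duration (seconds), keyed by model family
DURATION_CAP = {"wan2.7": 10, "wan2.5": 10, "wan2.2": 5}


def billing_duration(seconds: float, model: str) -> int:
    """Rounds a video's duration to an integer, then applies the model's cap.

    Assumption: exact halves such as 2.5 round up (ROUND_HALF_UP).
    """
    cap = DURATION_CAP[model.split("-")[0]]  # "wan2.7-i2v" -> "wan2.7"
    rounded = int(Decimal(str(seconds)).quantize(Decimal("1"), rounding=ROUND_HALF_UP))
    return min(cap, rounded)


def epochs_from_steps(steps: int, dataset_size: int, batch_size: int = 1) -> float:
    """n_epochs = steps / ceil(dataset_size / batch_size)."""
    return steps / math.ceil(dataset_size / batch_size)


def wan_video_training_fee(durations, model, max_pixels, n_epochs, price_per_1k_tokens):
    """Returns (total billable seconds, total training tokens, fee in CNY)."""
    total_seconds = sum(billing_duration(d, model) for d in durations)
    total_tokens = total_seconds * Decimal(max_pixels) / 1024 * Decimal(str(n_epochs))
    fee = total_tokens / 1000 * Decimal(price_per_1k_tokens)
    return total_seconds, total_tokens, fee


# Single 10-second video, batch_size = 1, 800 steps -> n_epochs = 800
n_epochs = epochs_from_steps(800, dataset_size=1, batch_size=1)
seconds, tokens, fee = wan_video_training_fee([10], "wan2.7-i2v", 36864, n_epochs, "2")
print(f"{seconds} s billable, {tokens:,.0f} tokens, CNY {fee}")

# Two videos (3.4 s and 11.5 s) with n_epochs = 800
seconds, tokens, fee = wan_video_training_fee([3.4, 11.5], "wan2.7-i2v", 36864, 800, "2")
print(f"{seconds} s billable, {tokens:,.0f} tokens, CNY {fee}")
```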

Deployment billing

Text generation models: Qwen

Time-based billing (Provisioned Throughput)

Cost = Usage Duration × (Input TPM Unit Price × Input TPM + Output TPM Unit Price × Output TPM)

For the pay-as-you-go method, usage is billed hourly, and the unit price is based on the hourly rates in the table below. For the subscription method, usage is billed daily, and the unit price is based on the daily rates in the table below.

Subscription orders take effect immediately after payment. An N-day subscription is valid until 23:59 on the Nth day. If an order is placed after 22:00, the expiration date is automatically extended by one day.
After a subscription order expires, the service is stopped after a 2-hour grace period. After the service is stopped, the resources are retained for 14 hours and then released.
Subscription orders cannot be terminated early.
For the pay-as-you-go method, if your account has an overdue payment, the deployed resources are retained and continue to be billed for 24 hours, during which the service remains available. After 24 hours, the system stops billing, and the model deployment enters an overdue state. The underlying resources are deleted, but the model deployment task is retained. After you pay the overdue amount, the system reallocates resources, restores the service, and resumes billing. To stop incurring charges, you must delete the model deployment task. Billing stops after the task is successfully deleted.
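The subscription lifecycle rules translate into a simple timeline calculation. This sketch rests on stated assumptions: the payment date counts as day 1 of an N-day subscription, and "after 22:00" means strictly later than 22:00. All times are naive datetimes in one time zone, because the rules do not name a time zone. The function name is hypothetical.

```python
from datetime import datetime, time, timedelta

GRACE_PERIOD = timedelta(hours=2)       # service stops 2 hours after expiry
RETENTION_PERIOD = timedelta(hours=14)  # resources are released 14 hours after the stop


def subscription_timeline(paid_at: datetime, days: int):
    """Returns (expires_at, service_stops_at, resources_released_at) for an N-day subscription.

    Assumptions: day 1 is the payment date, "after 22:00" means strictly later
    than 22:00, and all times are naive datetimes in the same local time zone.
    """
    last_day = paid_at.date() + timedelta(days=days - 1)
    if paid_at.time() > time(22, 0):
        last_day += timedelta(days=1)  # late orders get one extra day
    expires_at = datetime.combine(last_day, time(23, 59))
    service_stops_at = expires_at + GRACE_PERIOD
    return expires_at, service_stops_at, service_stops_at + RETENTION_PERIOD


for label, ts in zip(("expires", "service stops", "resources released"),
                     subscription_timeline(datetime(2026, 5, 1, 10, 0), days=7)):
    print(f"{label}: {ts:%Y-%m-%d %H:%M}")
```

Under these assumptions, a 7-day subscription paid at 10:00 on 2026-05-01 expires at 23:59 on 2026-05-07. The service stops at 01:59 the next day, and the resources are released at 15:59.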

If the model input exceeds the maximum input tokens or the purchased TPM, the call automatically switches to the pay-as-you-go mode for the current model. Inference performance may then decrease, and the call is subject to the public traffic control of the current snapshot model in the workspace. The call is billed at the standard model invocation (pay-as-you-go) prices.

Such calls return the response header x-dashscope-ptu-overflow with the value true.
To view TPM statistics, go to Model Monitoring (Beijing).
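Client code can detect overflow calls by checking for the x-dashscope-ptu-overflow response header. This is a minimal sketch. How you obtain the headers depends on your HTTP client; for example, requests exposes them as Response.headers. Matching the value "true" case-insensitively is an assumption, and the helper name is hypothetical.

```python
from typing import Mapping


def is_ptu_overflow(headers: Mapping[str, str]) -> bool:
    """Returns True if a response carries x-dashscope-ptu-overflow with the value true.

    HTTP header names are case-insensitive, so names are compared in lowercase.
    """
    for name, value in headers.items():
        if name.lower() == "x-dashscope-ptu-overflow":
            return value.strip().lower() == "true"
    return False


# In practice, pass your HTTP client's response headers, e.g. response.headers
print(is_ptu_overflow({"X-DashScope-PTU-Overflow": "true"}))  # True
print(is_ptu_overflow({"Content-Type": "application/json"}))   # False
```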

For the specific refund rules for scale-in scenarios (downgrades), see Refund rules for downgrades.

Qwen

Model name	Model code	Max input tokens	Pay-as-you-go input Per 10k TPM/hour	Pay-as-you-go output Per 1k TPM/hour	Subscription input Per 10k TPM/day	Subscription output Per 1k TPM/day
Qwen3.7-Max-2026-05-20	qwen3.7-max-2026-05-20	256K	CNY 28.8	CNY 8.64	CNY 345.6	CNY 103.68
Qwen3.7-Plus-2026-05-26	qwen3.7-plus-2026-05-26	256K	CNY 4.8	CNY 1.92	CNY 57.6	CNY 23.04

Qwen3.6-Plus-2026-04-02	qwen3.6-plus-2026-04-02	128K	CNY 4.8	CNY 2.88	CNY 57.6	CNY 34.56

Qwen3.5-Plus-2026-04-20	qwen3.5-plus-2026-04-20	128K	CNY 1.92	CNY 1.15	CNY 23.04	CNY 13.82

Qwen3-Max-2025-09-23	qwen3-max-2025-09-23	128K	CNY 7.68	CNY 3.08	CNY 92.16	CNY 36.96

Qwen-Flash-2025-07-28	qwen-flash-2025-07-28	128K	CNY 0.36	CNY 0.36	CNY 4.32	CNY 4.32
Qwen-Plus-2025-12-01	qwen-plus-2025-12-01	128K	CNY 1.92	Non-thinking mode: CNY 0.48 Thinking mode: CNY 1.92	CNY 23.04	Non-thinking mode: CNY 5.76 Thinking mode: CNY 23.04
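The Provisioned Throughput formula can be sketched using the Qwen3.5-Plus-2026-04-20 prices from the Qwen table. The unit conversion is an assumption inferred from the column headers. Input TPM is priced per 10,000 TPM and output TPM per 1,000 TPM, so the code divides the raw TPM values accordingly. The duration is in hours for pay-as-you-go prices and in days for subscription prices. The function name is hypothetical.

```python
from decimal import Decimal


def provisioned_throughput_cost(duration, input_tpm: int, output_tpm: int,
                                input_price_per_10k: str, output_price_per_1k: str) -> Decimal:
    """Cost = duration x (input unit price x input TPM + output unit price x output TPM).

    duration: hours for pay-as-you-go prices, days for subscription prices.
    Assumption: input TPM is counted in units of 10,000 TPM and output TPM in
    units of 1,000 TPM, matching the "Per 10k TPM" and "Per 1k TPM" columns.
    """
    input_units = Decimal(input_tpm) / 10_000
    output_units = Decimal(output_tpm) / 1_000
    rate = Decimal(input_price_per_10k) * input_units + Decimal(output_price_per_1k) * output_units
    return Decimal(duration) * rate


# qwen3.5-plus-2026-04-20 with 100,000 input TPM and 10,000 output TPM
payg = provisioned_throughput_cost(24, 100_000, 10_000, "1.92", "1.15")  # 24 hours, pay-as-you-go
sub = provisioned_throughput_cost(1, 100_000, 10_000, "23.04", "13.82")  # 1 day, subscription
print(f"Pay-as-you-go for 24 hours: CNY {payg}; subscription for 1 day: CNY {sub}")
```

For this load, the sketch gives CNY 736.80 for 24 pay-as-you-go hours and CNY 368.60 for one subscription day.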

DeepSeek

Model name	Model code	Max input tokens	Pay-as-you-go input Per 10k TPM/hour	Pay-as-you-go output Per 1k TPM/hour	Subscription input Per 10k TPM/day	Subscription output Per 1k TPM/day
DeepSeek-v4-Flash	deepseek-v4-flash	256K	CNY 3.6	CNY 0.72	CNY 43.2	CNY 8.64
DeepSeek-v4-Pro	deepseek-v4-pro	256K	CNY 43.2	CNY 8.64	CNY 518.4	CNY 103.68
DeepSeek-v3.2	deepseek-v3.2	64K	CNY 7.2	CNY 1.08	CNY 86.4	CNY 12.96
DeepSeek-v3	deepseek-v3	64K	CNY 7.2	CNY 2.88	CNY 86.4	CNY 34.56

Qwen-VL

Model name	Model code	Max input tokens	Pay-as-you-go input Per 10k TPM/hour	Pay-as-you-go output Per 1k TPM/hour	Subscription input Per 10k TPM/day	Subscription output Per 1k TPM/day
Qwen3-VL-Plus-2025-09-23	qwen3-vl-plus-2025-09-23	128K	CNY 2.4		CNY 28.8	

More models

Model name	Model code	Max input tokens	Pay-as-you-go input Per 10k TPM/hour	Pay-as-you-go output Per 1k TPM/hour	Subscription input Per 10k TPM/day	Subscription output Per 1k TPM/day
GLM-5.1	glm-5.1	64K	CNY 21.6	CNY 8.64	CNY 259.2	CNY 103.68

Time-based billing (Model Unit)

Cost = Usage Duration (hours) × Number of Model Units × Model Unit Price

For the pay-as-you-go method, the "Model Unit Price" is the "Hourly Price" from the table below. For the monthly subscription method, the formula is: Number of Months × Number of Model Units × Monthly Price.

For subscriptions, if you unsubscribe within the first month, the daily unit price (≈ monthly unit price / 30) is charged at 1.2 times the standard rate. Usage for less than a day is billed as a full day.

Note

For the Model Unit pay-as-you-go method, computing power resources are allocated on a first-come, first-served basis. A full refund is issued if the purchase is unsuccessful.
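The Model Unit formulas and the early-unsubscribe rule can be sketched as three helpers. The example uses the Qwen3-8B prices for the MU5 x 1 specification (CNY 21/hour, CNY 10,139/month). The code comments label three assumptions. First, each listed price covers one instance of the listed specification, and units counts such instances. Second, pay-as-you-go time is passed in whole minutes. Third, the early-unsubscribe charge is rounded to two decimal places. Because the daily price is only approximately the monthly price / 30, treat the result as an estimate. The function names are hypothetical.

```python
import math
from decimal import Decimal, ROUND_HALF_UP


def mu_payg_cost(minutes: int, units: int, hourly_price: str) -> Decimal:
    """Pay-as-you-go: usage hours x number of units x hourly price (billed per minute)."""
    return Decimal(minutes) / 60 * units * Decimal(hourly_price)


def mu_subscription_cost(months: int, units: int, monthly_price: str) -> Decimal:
    """Subscription: number of months x number of units x monthly price."""
    return months * units * Decimal(monthly_price)


def mu_early_unsubscribe_charge(days_used: float, units: int, monthly_price: str) -> Decimal:
    """Charge for unsubscribing within the first month.

    Daily price ~= monthly price / 30, charged at 1.2x; a partial day counts as a full day.
    Assumption: the result is rounded to 0.01 CNY.
    """
    billable_days = math.ceil(days_used)
    charge = Decimal(monthly_price) * Decimal("1.2") * billable_days * units / 30
    return charge.quantize(Decimal("0.01"), rounding=ROUND_HALF_UP)


# Qwen3-8B on MU5 x 1 (one instance)
print(mu_payg_cost(90, 1, "21"))                     # 90 minutes of pay-as-you-go usage
print(mu_subscription_cost(1, 1, "10139"))           # one month
print(mu_early_unsubscribe_charge(5.5, 1, "10139"))  # unsubscribed after 5.5 days
```

For example, 90 minutes of pay-as-you-go usage costs CNY 31.5. Unsubscribing after 5.5 days counts as 6 days: 10,139 × 1.2 × 6 / 30 = CNY 2,433.36.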

Text generation

Qwen

Model name	Model code	Model unit specification	Hourly price (CNY) Minimum billing unit: minute	Monthly price (CNY) Minimum billing unit: day
Qwen3.7-Plus-2026-05-26	qwen3.7-plus-2026-05-26	MU3 x 8	CNY 1,096	CNY 527,752

Qwen3.6-35B-A3B	qwen3.6-35b-a3b	MU8 x 1	CNY 47	CNY 22,400
Qwen3.6-35B-A3B	qwen3.6-35b-a3b	MU9 x 1	CNY 51	CNY 24,600
Qwen3.6-27B	qwen3.6-27b	MU9 x 1	CNY 51	CNY 24,600
Qwen3.6-Flash-2026-04-16	qwen3.6-flash-2026-04-16	MU1 x 2	CNY 108	CNY 52,236
Qwen3.6-Plus-2026-04-02	qwen3.6-plus-2026-04-02	MU1 x 8 MU1 x 16 (PD separation mode)	CNY 432 PD separation mode: CNY 864	CNY 208,944 PD separation mode: CNY 417,888

Qwen3.5-397B-A17B	qwen3.5-397b-a17b	MU2 x 8	CNY 504	CNY 240,288
		MU3 x 8 MU3 x 16 (PD separation mode)	CNY 1,096 PD separation mode: CNY 2,192	CNY 527,752 PD separation mode: CNY 1,055,504
		MU6 x 16	CNY 400	CNY 193,424
Qwen3.5-122B-A10B	qwen3.5-122b-a10b	MU1 x 4	CNY 216	CNY 104,472
		MU2 x 8	CNY 504	CNY 240,288
		MU6 x 16	CNY 400	CNY 193,424
		MU9 x 2	CNY 102	CNY 49,200
Qwen3.5-35B-A3B	qwen3.5-35b-a3b	MU1 x 2	CNY 108	CNY 52,236
		MU2 x 8	CNY 504	CNY 240,288
		MU8 x 1	CNY 47	CNY 22,400
		MU9 x 1	CNY 51	CNY 24,600
Qwen3.5-27B	qwen3.5-27b	MU9 x 1	CNY 51	CNY 24,600
Qwen3.5-9B	qwen3.5-9b	MU8 x 1	CNY 47	CNY 22,400
Qwen3.5-9B	qwen3.5-9b	MU9 x 1	CNY 51	CNY 24,600
Qwen3.5-Flash-2026-02-23	qwen3.5-flash-2026-02-23	MU1 x 2	CNY 108	CNY 52,236
Qwen3.5-Plus-2026-02-15	qwen3.5-plus-2026-02-15	MU1 x 16 (PD separation mode)	PD separation mode: CNY 864	PD separation mode: CNY 417,888
Qwen3.5-Plus-2026-02-15	qwen3.5-plus-2026-02-15	MU3 x 8 MU3 x 16 (PD separation mode)	CNY 1,096 PD separation mode: CNY 2,192	CNY 527,752 PD separation mode: CNY 1,055,504

Qwen3-235B-A22B-Instruct-2507	qwen3-235b-a22b-instruct-2507	MU1 x 4	CNY 216	CNY 104,472
Qwen3-235B-A22B-Instruct-2507	qwen3-235b-a22b-instruct-2507	MU2 x 8	CNY 504	CNY 240,288
Qwen3-Next-80B-A3B-Instruct	qwen3-next-80b-a3b-instruct	MU1 x 2	CNY 108	CNY 52,236
Qwen3-32B	qwen3-32b	MU1 x 4	CNY 216	CNY 104,472
Qwen3-32B	qwen3-32b	MU6 x 4	CNY 100	CNY 48,356
Qwen3-30B-A3B	qwen3-30b-a3b	MU9 x 2	CNY 102	CNY 49,200
Qwen3-30B-A3B-Instruct-2507	qwen3-30b-a3b-instruct-2507	MU1 x 4	CNY 216	CNY 104,472
Qwen3-30B-A3B-Instruct-2507	qwen3-30b-a3b-instruct-2507	MU2 x 8	CNY 504	CNY 240,288
Qwen3-8B	qwen3-8b	MU1 x 2	CNY 108	CNY 52,236
		MU2 x 2	CNY 126	CNY 60,072
		MU5 x 1	CNY 21	CNY 10,139
Qwen3-4B	qwen3-4b	MU1 x 2	CNY 108	CNY 52,236
Qwen3-4B	qwen3-4b	MU5 x 1	CNY 21	CNY 10,139
Qwen3-1.7B	qwen3-1.7b	MU1 x 2	CNY 108	CNY 52,236
Qwen3-1.7B	qwen3-1.7b	MU5 x 1	CNY 21	CNY 10,139
Qwen3-Embedding-0.6B	qwen3-embedding-0.6b	MU5 x 1	CNY 21	CNY 10,139
Qwen3-Embedding-0.6B	qwen3-embedding-0.6b	MU6 x 1	CNY 25	CNY 12,089
Qwen3-MoE-Rerank-0.6B	qwen3-moe-rerank-0.6b	MU5 x 1	CNY 21	CNY 10,139
Qwen3-Rerank-0.6B	qwen3-rerank-0.6b	MU5 x 1	CNY 21	CNY 10,139
Qwen3-Rerank-0.6B	qwen3-rerank-0.6b	MU6 x 1	CNY 25	CNY 12,089
Qwen3-Max-2025-09-23	qwen3-max-2025-09-23	MU2 x 8	CNY 504	CNY 240,288
Qwen3-Max-2025-09-23	qwen3-max-2025-09-23	MU3 x 8	CNY 1,096	CNY 527,752
Qwen3-Rerank	qwen3-rerank	MU5 x 1	CNY 21	CNY 10,139

Qwen2.5-72B	qwen2.5-72b-instruct	MU1 x 4	CNY 216	CNY 104,472
Qwen2.5-Open-Source-32B	qwen2.5-32b-instruct	MU1 x 4	CNY 216	CNY 104,472
Qwen2.5-Open-Source-14B	qwen2.5-14b-instruct	MU1 x 2	CNY 108	CNY 52,236
Qwen2.5-7B	qwen2.5-7b-instruct	MU1 x 2	CNY 108	CNY 52,236
Qwen2.5-7B	qwen2.5-7b-instruct	MU5 x 1	CNY 21	CNY 10,139
Qwen2.5-3B	qwen2.5-3b-instruct	MU5 x 1	CNY 21	CNY 10,139

Qwen-Flash-2025-07-28	qwen-flash-2025-07-28	MU1 x 4	CNY 216	CNY 104,472
Qwen-Plus-2025-07-28	qwen-plus-2025-07-28	MU1 x 4 MU1 x 16 (PD separation mode)	CNY 216 PD separation mode: CNY 864	CNY 104,472 PD separation mode: CNY 417,888
Qwen-Plus-2025-12-01	qwen-plus-2025-12-01	MU1 x 4	CNY 216	CNY 104,472

GLM

Model name	Model code	Model unit specification	Hourly price (CNY) Minimum billing unit: minute	Monthly price (CNY) Minimum billing unit: day
GLM-5.1	glm-5.1	MU2 x 8 MU2 x 16 (PD separation mode)	CNY 504 PD separation mode: CNY 1,008	CNY 240,288 PD separation mode: CNY 480,576
		MU3 x 16 (PD separation mode)	PD separation mode: CNY 2,192	PD separation mode: CNY 1,055,504
		MU6 x 16	CNY 400	CNY 193,424
GLM-5	glm-5	MU3 x 16 (PD separation mode)	PD separation mode: CNY 2,192	PD separation mode: CNY 1,055,504
GLM-4.7	glm-4.7	MU6 x 32 (PD separation mode)	PD separation mode: CNY 800	PD separation mode: CNY 386,848

DeepSeek

Model name	Model code	Model unit specification	Hourly price (CNY) Minimum billing unit: minute	Monthly price (CNY) Minimum billing unit: day
DeepSeek-v4-Flash	deepseek-v4-flash	MU1 x 8	CNY 432	CNY 208,944
DeepSeek-v3.2	deepseek-v3.2	MU2 x 16 (PD separation mode)	PD separation mode: CNY 1,008	PD separation mode: CNY 480,576

More models

Model name	Model code	Model unit specification	Hourly price (CNY) Minimum billing unit: minute	Monthly price (CNY) Minimum billing unit: day
MiniMax-M2.5	MiniMax-M2.5	MU1 x 16 (PD separation mode)	PD separation mode: CNY 864	PD separation mode: CNY 417,888

Kimi-K2.5	kimi-k2.5	MU2 x 8	CNY 504	CNY 240,288

Model types:

Instruct - The deployed model performs inference in non-thinking mode.
Thinking - The deployed model performs inference in thinking mode.

Model deployment types:

PD (Prefill/Decode) separation mode - Reduces first-token latency and improves throughput.

In this deployment mode, inference is split into two stages that run on different compute nodes: Prefill, which processes the prompt and computes the first token, and Decode, which generates each subsequent token.
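The split can be pictured with a toy, single-process Python sketch: a hypothetical `PrefillNode` processes the whole prompt and produces the first token plus a context cache, then hands both to a separate `DecodeNode` that generates the remaining tokens one at a time. The class names and `toy_next_token` are illustrative only. In a real deployment the handoff is a transfer of the KV cache between machines, which this sketch does not model.

```python
from dataclasses import dataclass

VOCAB_SIZE = 1000  # toy vocabulary size

def toy_next_token(context: list[int]) -> int:
    """Hypothetical stand-in for a model forward pass over the cached context."""
    return sum(context) % VOCAB_SIZE

@dataclass
class PrefillNode:
    """Runs once per request: processes the whole prompt and emits the first token."""
    requests_served: int = 0

    def run(self, prompt_tokens: list[int]) -> tuple[list[int], int]:
        self.requests_served += 1
        cache = list(prompt_tokens)  # toy stand-in for the KV cache
        return cache, toy_next_token(cache)

@dataclass
class DecodeNode:
    """Receives the prefill output and generates the remaining tokens one at a time."""
    decode_steps: int = 0

    def run(self, cache: list[int], first_token: int, max_new_tokens: int) -> list[int]:
        context = cache + [first_token]  # copy, so the handed-off cache is not mutated
        output = [first_token]
        while len(output) < max_new_tokens:
            token = toy_next_token(context)
            output.append(token)
            context.append(token)
            self.decode_steps += 1
        return output

prefill_node, decode_node = PrefillNode(), DecodeNode()
cache, first_token = prefill_node.run([1, 2, 3])
tokens = decode_node.run(cache, first_token, max_new_tokens=4)
print(tokens)  # [6, 12, 24, 48]
```

Because the two stages live on separate objects, each pool could in principle be scaled independently, which is the motivation for running them on different compute nodes.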

Multimodal

Qwen-VL

Model name	Model code	Model unit specification	Hourly price (CNY) Minimum billing unit: minute	Monthly price (CNY) Minimum billing unit: day
Qwen3-VL-235B-A22B-Instruct	qwen3-vl-235b-a22b-instruct	MU1 x 4	CNY 216	CNY 104,472
Qwen3-VL-235B-A22B-Thinking	qwen3-vl-235b-a22b-thinking	MU1 x 4	CNY 216	CNY 104,472
Qwen3-VL-32B-Instruct	qwen3-vl-32b-instruct	MU2 x 8	CNY 504	CNY 240,288
Qwen3-VL-8B-Instruct	qwen3-vl-8b-instruct	MU1 x 2	CNY 108	CNY 52,236
Qwen3-VL-4B-Instruct	qwen3-vl-4b-instruct	MU1 x 2	CNY 108	CNY 52,236
Qwen3-VL-2B-Instruct	qwen3-vl-2b-instruct	MU5 x 1	CNY 21	CNY 10,139
Qwen3-VL-Embedding-2B	qwen3-vl-embedding-2b	MU5 x 1	CNY 21	CNY 10,139
Qwen3-VL-Flash-2025-10-15	qwen3-vl-flash-2025-10-15	MU1 x 4	CNY 216	CNY 104,472
Qwen3-VL-Plus-2025-09-23	qwen3-vl-plus-2025-09-23	MU1 x 4	CNY 216	CNY 104,472

Qwen-VL-Max-2025-08-13	qwen-vl-max-2025-08-13	MU6 x 4	CNY 100	CNY 48,356
Qwen-VL-OCR-2025-11-20	qwen-vl-ocr-2025-11-20	MU6 x 4	CNY 100	CNY 48,356

Qwen Omni

Model name	Model code	Model unit specification	Hourly price (CNY) Minimum billing unit: minute	Monthly price (CNY) Minimum billing unit: day
Qwen3.5-Omni-Flash	qwen3.5-omni-flash	MU8 x 1	CNY 47	CNY 22,400
Qwen3.5-Omni-Flash	qwen3.5-omni-flash	MU9 x 1	CNY 51	CNY 24,600
Qwen3.5-Omni-Plus	qwen3.5-omni-plus	MU9 x 8	CNY 408	CNY 196,800

Model types:

Instruct - The deployed model performs inference in non-thinking mode.
Thinking - The deployed model performs inference in thinking mode.
Instruct/Thinking - You can choose whether to enable thinking mode when deploying the model.

Speech synthesis

CosyVoice

Model name	Model code	Model unit specification	Hourly price (CNY)	Monthly price (CNY)
cosyvoice-v3-flash	cosyvoice-v3-flash	MU5	CNY 21	CNY 10,139

By model token usage

Cost = Number of Input Tokens × Input Unit Price + Number of Output Tokens × Output Unit Price (Minimum billing unit: 1 token)

Billing by model token usage is available only for custom models that you obtain by running Supervised Fine-Tuning (SFT) on one of the following foundation models.

Qwen

Foundation model	Model code	Input CNY/1k tokens	Output CNY/1k tokens
Qwen3-32B	qwen3-32b	CNY 0.002	Non-thinking mode: CNY 0.008 Thinking mode: CNY 0.02
Qwen3-14B	qwen3-14b	CNY 0.001	Non-thinking mode: CNY 0.004 Thinking mode: CNY 0.01
Qwen3-8B	qwen3-8b	CNY 0.0005	Non-thinking mode: CNY 0.002 Thinking mode: CNY 0.005

Qwen2.5-72B	qwen2.5-72b-instruct	CNY 0.004	CNY 0.012
Qwen2.5-32B	qwen2.5-32b-instruct	CNY 0.002	CNY 0.006
Qwen2.5-Open-Source-14B	qwen2.5-14b-instruct	CNY 0.001	CNY 0.003
Qwen2.5-Open-Source-7B	qwen2.5-7b-instruct	CNY 0.0005	CNY 0.001
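The unit prices in these tables are quoted per 1,000 tokens while the minimum billing unit is 1 token, so the Python sketch below divides by 1,000 and does not round up to whole 1,000-token blocks. It applies the cost formula to a custom model fine-tuned from qwen3-32b, using 120,000 input tokens and 30,000 output tokens as made-up example volumes, and shows how the output price depends on whether thinking mode is used. The function name is illustrative.

```python
from decimal import Decimal

# Per-1,000-token prices for custom models fine-tuned from qwen3-32b
QWEN3_32B_INPUT = Decimal("0.002")
QWEN3_32B_OUTPUT = {"non-thinking": Decimal("0.008"), "thinking": Decimal("0.02")}

def token_usage_cost(input_tokens: int, output_tokens: int,
                     input_price_per_1k: Decimal, output_price_per_1k: Decimal) -> Decimal:
    """Cost = input tokens x input unit price + output tokens x output unit price."""
    return (input_tokens * input_price_per_1k + output_tokens * output_price_per_1k) / 1000

for mode, output_price in QWEN3_32B_OUTPUT.items():
    print(mode, token_usage_cost(120_000, 30_000, QWEN3_32B_INPUT, output_price))
```

With these example volumes, the calls cost CNY 0.48 in non-thinking mode and CNY 0.84 in thinking mode.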

Qwen-VL

Foundation model	Model code	Input CNY/1k tokens	Output CNY/1k tokens
Qwen3-VL-8B-Instruct	qwen3-vl-8b-instruct	CNY 0.0005	CNY 0.002

Qwen2.5-VL-72B	qwen2.5-vl-72b-instruct	CNY 0.016	CNY 0.048
Qwen2.5-VL-32B	qwen2.5-vl-32b-instruct	CNY 0.008	CNY 0.024
Qwen2.5-VL-7B	qwen2.5-vl-7b-instruct	CNY 0.002	CNY 0.005

Image generation models – Wan

Deployment is free. Invocations are billed at the standard rate of the fine-tuned base model. For the training workflow, see Fine-tune image generation models.

Model ID	LoRA Deployment & Invocation Price
wan2.7-image-pro	CNY 0.50/image
wan2.7-image	CNY 0.20/image

FAQ

Q: When does billing for model deployment start?

A: Billing starts when the model status changes to Running. No charges apply while the status is Deploying, Overdue Payment, or Deployment Failed.

For monthly subscriptions, the billing period starts when the status changes to Running.
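For pay-as-you-go deployments, the rule that only Running time is billed can be sketched as a filter over a deployment's status history. The Python example below is a hypothetical illustration: it represents the history as (status, seconds) segments, counts only Running segments, and rounds the total up to whole minutes and the fee to 0.01 CNY. Both roundings are assumptions about how the per-minute billing unit is applied, so check your bill for the exact rule. The hourly price is CNY 216, as in the MU1 x 4 rows.

```python
import math
from decimal import Decimal

# Per the FAQ: Deploying, Overdue Payment, and Deployment Failed are not billed
BILLABLE_STATUSES = {"Running"}

def billable_minutes(history: list[tuple[str, int]]) -> int:
    """Sum Running time from (status, duration_seconds) segments, rounded up to whole minutes (assumed)."""
    running_seconds = sum(seconds for status, seconds in history if status in BILLABLE_STATUSES)
    return math.ceil(running_seconds / 60)

def pay_as_you_go_fee(history: list[tuple[str, int]], hourly_price_cny: Decimal) -> Decimal:
    """Hourly price prorated over the billable minutes, rounded to 0.01 CNY (assumed)."""
    return (hourly_price_cny * billable_minutes(history) / 60).quantize(Decimal("0.01"))

history = [
    ("Deploying", 25 * 60),         # not billed
    ("Running", 3 * 3600),
    ("Overdue Payment", 2 * 3600),  # not billed
    ("Running", 30 * 60 + 10),      # 10 extra seconds to show the assumed round-up
]
print(billable_minutes(history), pay_as_you_go_fee(history, Decimal("216")))  # 211 759.60
```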

Q: Am I charged if I cancel a training job?

A: Yes. If you cancel training manually, you are charged for all tokens processed before cancellation. Training jobs interrupted by system errors or other non-user causes are not charged.
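A cancelled job is charged for the tokens processed so far instead of the full (training-data tokens + mixed-data tokens) × epochs total. The Python sketch below contrasts the two using the Qwen3-8B training price of CNY 0.006/1,000 tokens, with a hypothetical 2,000,000-token dataset, no mixed training data, 3 epochs, and a cancellation after 1.5 epochs (3,000,000 tokens processed). It assumes the processed-token count already includes every epoch that was started; the function names are illustrative.

```python
from decimal import Decimal

def full_training_fee(dataset_tokens: int, mixed_tokens: int, epochs: int, price_per_1k: Decimal) -> Decimal:
    """(Training-data tokens + mixed-data tokens) x epochs x unit price per 1,000 tokens."""
    return (dataset_tokens + mixed_tokens) * epochs * price_per_1k / 1000

def cancelled_training_fee(tokens_processed: int, price_per_1k: Decimal, cancelled_by_user: bool) -> Decimal:
    """Charge only the tokens processed before a user cancellation; non-user interruptions are free."""
    if not cancelled_by_user:
        return Decimal(0)
    return tokens_processed * price_per_1k / 1000

QWEN3_8B_TRAINING_PRICE = Decimal("0.006")  # CNY per 1,000 training tokens

print(full_training_fee(2_000_000, 0, 3, QWEN3_8B_TRAINING_PRICE))        # planned 3 epochs
print(cancelled_training_fee(3_000_000, QWEN3_8B_TRAINING_PRICE, True))   # user cancels after 1.5 epochs
print(cancelled_training_fee(3_000_000, QWEN3_8B_TRAINING_PRICE, False))  # system error: not charged
```

Under these assumptions, the full job would cost CNY 36, the user-cancelled job costs CNY 18, and a job interrupted by a system error costs nothing.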

Q: How do I view invocation statistics for a deployed model?

A: Visit the Model Monitoring (Beijing), Model Monitoring (Virginia), or Model Monitoring (Singapore) page.