Training and deployment pricing


This topic describes the billing rules and pricing for model training and model deployment on Alibaba Cloud Model Studio.

Training billing

Text generation models – Qwen

Note

For the training workflow, see Model tuning. After training completes, deploy the new model before evaluating or calling it.

Method

Billed by training tokens

Formula

Model training fee = (Total tokens in training data + Total tokens in mixed training data) × Number of epochs × Training unit price (Minimum billing unit: 1 token)

View the estimated training fee at the bottom of the model training console, and click Computing Details to view the total number of training tokens, number of epochs, and training unit price.
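The formula above can be sketched in Python as follows. This is a minimal illustration, not an official calculator: the function name and the token counts in the example are hypothetical, and the unit price is the qwen3-8b training price from the table below. `Decimal` is used to avoid floating-point rounding in currency amounts.

```python
from decimal import Decimal


def qwen_training_fee(training_tokens: int, mixed_tokens: int,
                      epochs: int, price_per_1k_tokens: str) -> Decimal:
    """Estimate the training fee in CNY from the formula above."""
    billed_tokens = (training_tokens + mixed_tokens) * epochs
    # Prices are quoted per 1,000 tokens, but the minimum billing unit is 1 token.
    return Decimal(billed_tokens) * Decimal(price_per_1k_tokens) / Decimal(1000)


# Hypothetical job (token counts are made up): qwen3-8b at CNY 0.006 per 1,000 tokens.
example_fee = qwen_training_fee(2_000_000, 500_000, 3, "0.006")
print(f"Estimated training fee: CNY {example_fee}")
```

The estimate shown in the console remains the authoritative figure; a sketch like this is only useful for rough budgeting before you create a job.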

Qwen

| Model service | Model identifier | Training price |
| --- | --- | --- |
| Qwen3.6-Flash-2026-04-16 | qwen3.6-flash-2026-04-16 | CNY 0.05 per 1,000 tokens |
| Qwen3.5-27B | qwen3.5-27b | CNY 0.05 per 1,000 tokens |
| Qwen3.5-9B | qwen3.5-9b | CNY 0.02 per 1,000 tokens |
| Qwen3.5-Flash-2026-02-23 | qwen3.5-flash-2026-02-23 | CNY 0.05 per 1,000 tokens |
| Qwen3-32B | qwen3-32b | CNY 0.04 per 1,000 tokens |
| Qwen3-30B-A3B-Instruct-2507 | qwen3-30b-a3b-instruct-2507 | CNY 0.03 per 1,000 tokens |
| Qwen3-14B | qwen3-14b | CNY 0.03 per 1,000 tokens |
| Qwen3-8B | qwen3-8b | CNY 0.006 per 1,000 tokens |
| Qwen3-1.7B | qwen3-1.7b | CNY 0.0045 per 1,000 tokens |
| Qwen3-0.6B | qwen3-0.6b | CNY 0.003 per 1,000 tokens |
| Qwen2.5-72B-Instruct | qwen2.5-72b-instruct | CNY 0.15 per 1,000 tokens |
| Qwen2.5-32B-Instruct | qwen2.5-32b-instruct | CNY 0.03 per 1,000 tokens |
| Qwen2.5-14B-Instruct | qwen2.5-14b-instruct | CNY 0.03 per 1,000 tokens |
| Qwen2.5-7B-Instruct | qwen2.5-7b-instruct | CNY 0.006 per 1,000 tokens |
| Qwen-Plus-Character-2025-11-06 | qwen-plus-character-2025-11-06 | CNY 0.15 per 1,000 tokens |

Qwen-VL

| Model service | Model identifier | Training price |
| --- | --- | --- |
| Qwen3-VL-8B-Instruct | qwen3-vl-8b-instruct | CNY 0.012 per 1,000 tokens |
| Qwen3-VL-8B-Thinking | qwen3-vl-8b-thinking | CNY 0.012 per 1,000 tokens |
| Qwen3-VL-4B-Instruct | qwen3-vl-4b-instruct | CNY 0.006 per 1,000 tokens |
| Qwen2.5-VL-72B-Instruct | qwen2.5-vl-72b-instruct | CNY 0.05 per 1,000 tokens |
| Qwen2.5-VL-32B-Instruct | qwen2.5-vl-32b-instruct | CNY 0.02 per 1,000 tokens |
| Qwen2.5-VL-7B-Instruct | qwen2.5-vl-7b-instruct | CNY 0.01 per 1,000 tokens |

Image generation models – Wan

Note

For the training workflow, see Fine-tune image generation models. After training completes, deploy the new model before calling it.

Method

Billed by training tokens

Formula

Model training fee = Total training tokens × Training unit price (Billing unit: per 1,000 tokens)

Formula for total training tokens

Total training tokens = max_steps × Lstep

Where:

  • max_steps: A hyperparameter specified during training, representing the maximum number of training steps (configured when creating a fine-tuning job).

  • Lstep: The token consumption per step, which is approximately equal to Lmax. Lmax is determined by max_token_length and generation_type, as shown below:

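| generation_type | max_token_length | Lmax |
| --- | --- | --- |
| t2i (text-to-image) | 1k | 128,000 |
| t2i (text-to-image) | 2k | 232,200 |
| i2i (image-to-image) | 1k | 232,200 |
| i2i (image-to-image) | 2k | 320,000 |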

Note

The above formula provides an approximation. Actual billing is based on the usage field returned by the system.

| Model | Code | Training price (per 1K tokens) |
| --- | --- | --- |
| Wan image generation | wan2.7-image-pro | CNY 0.08 |
| Wan image generation | wan2.7-image | CNY 0.08 |

Billing example

Suppose you fine-tune the wan2.7-image-pro model for t2i. The parameters are: max_steps = 200, max_token_length = "1k", and the training price is CNY 0.08 per 1,000 tokens:

  • From the table: Lmax = 128,000 (generation_type=t2i, max_token_length=1k), Lstep ≈ Lmax = 128,000

  • Total training tokens ≈ 200 × 128,000 = 25,600,000 = 25,600 thousand tokens

  • Model training fee ≈ 25,600 × 0.08 = CNY 2,048
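The same estimate can be reproduced with a short Python sketch. The function and dictionary names are hypothetical; the Lmax values are copied from the table above, and the sketch assumes Lstep equals Lmax exactly, so the result is an approximation of the billed amount.

```python
from decimal import Decimal

# Lmax by (generation_type, max_token_length), copied from the table above.
L_MAX = {
    ("t2i", "1k"): 128_000,
    ("t2i", "2k"): 232_200,
    ("i2i", "1k"): 232_200,
    ("i2i", "2k"): 320_000,
}


def wan_image_training_estimate(generation_type: str, max_token_length: str,
                                max_steps: int, price_per_1k_tokens: str):
    """Return (approximate total training tokens, approximate fee in CNY)."""
    l_step = L_MAX[(generation_type, max_token_length)]  # assumes Lstep == Lmax
    total_tokens = max_steps * l_step
    fee = Decimal(total_tokens) / 1000 * Decimal(price_per_1k_tokens)
    return total_tokens, fee


# The billing example above: wan2.7-image-pro, t2i, 1k, 200 steps.
total_tokens, fee = wan_image_training_estimate("t2i", "1k", 200, "0.08")
print(f"{total_tokens} tokens, about CNY {fee}")
```

Because actual billing is based on the usage field returned by the system, treat the output as an upper-level budget figure rather than an invoice amount.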

Video generation models – Wan

Note

For the training workflow, see Fine-tuning video generation models. After training completes, deploy the new model before calling it.

Method

Billed by training tokens

Formula

Model training fee = Total training tokens × Training unit price (Billing unit: per 1,000 tokens)

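Formula for total training tokens

Total training tokens = (Sum of the billing durations of all N videos, in seconds) × (max_pixels / 1024) × n_epochs

Where: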

  • N: Total number of videos in the training set.

  • max_pixels: A hyperparameter specified during training, representing the maximum number of pixels for a video (configured when creating a fine-tuning job).

  • n_epochs: A hyperparameter specified during training, representing the number of training epochs, that is, full passes over the training set (configured when creating a fine-tuning job).

  • Billing duration calculation rule for a single video: First, round the original video duration (in seconds) to the nearest integer, then determine the final value based on model limits.

    • wan2.5 model: Billing duration=min(10, rounded duration), meaning a single video is billed for a maximum of 10 seconds.

    • wan2.2 model: Billing duration=min(5, rounded duration), meaning a single video is billed for a maximum of 5 seconds.

| Model | Code | Training price (per 1K tokens) |
| --- | --- | --- |
| Wan image-to-video (first frame-based) | wan2.2-i2v-flash | CNY 0.06 |
| Wan image-to-video (first frame-based) | wan2.5-i2v-preview | CNY 0.32 |
| Image-to-video (first and last frame-based) | wan2.2-kf2v-flash | CNY 0.06 |

Billing example

Assume you fine-tune a wan2.2 model (maximum billing duration of 5 seconds per video) on a training set that contains two videos with durations of 3.4 seconds and 6.5 seconds. The parameters are set as follows: max_pixels = 262,144, n_epochs = 400, and the training price is CNY 0.06 per 1,000 tokens:

  • Duration calculation:

    • Video 1: 3.4 seconds is rounded to 3 seconds. The billable duration is min(5, 3) = 3.

    • Video 2: 6.5 seconds is rounded to 7 seconds. The billable duration is min(5, 7) = 5.

    • Total billable duration = 3 + 5 = 8 seconds.

  • The total training tokens are calculated as 8 × (262144/1024) × 400, resulting in 819,200, or 819.2 thousand tokens.

  • Model training fee = 819.2 × 0.06 = CNY 49.152.
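The calculation above can be sketched in Python. The function names are hypothetical, and the sketch assumes that ties are rounded up (the example rounds 6.5 to 7); the source does not state the tie rule for other values. Note that Python's built-in `round()` rounds halves to the nearest even integer, so it would turn 6.5 into 6 and understate the billed duration.

```python
import math
from decimal import Decimal

# Per-video caps in seconds, taken from the rules above.
MAX_BILLED_SECONDS = {"wan2.5": 10, "wan2.2": 5}


def billed_seconds(duration_s: float, model_family: str) -> int:
    # Round half up (6.5 -> 7, as in the example), then apply the model cap.
    rounded = math.floor(duration_s + 0.5)
    return min(MAX_BILLED_SECONDS[model_family], rounded)


def video_training_estimate(durations_s, model_family: str, max_pixels: int,
                            n_epochs: int, price_per_1k_tokens: str):
    """Return (total training tokens, fee in CNY) for a list of video durations."""
    total_seconds = sum(billed_seconds(d, model_family) for d in durations_s)
    total_tokens = Decimal(total_seconds * max_pixels * n_epochs) / 1024
    fee = total_tokens / 1000 * Decimal(price_per_1k_tokens)
    return total_tokens, fee


# The billing example above: two videos, a wan2.2 model, CNY 0.06 per 1,000 tokens.
tokens, fee = video_training_estimate([3.4, 6.5], "wan2.2", 262_144, 400, "0.06")
print(f"{tokens} tokens, CNY {fee}")
```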

Deployment billing

Text generation models: Qwen

Pay-as-you-go

cost = usage duration × (input TPM price × input TPM + output TPM price × output TPM)

For pay-as-you-go billing, usage duration is measured in hours, and the pay-as-you-go (per hour) columns in the tables below list the unit prices. For subscription billing, usage duration is measured in days, and the provisioned (per day) columns list the unit prices.

  • A subscription activates upon payment and is valid for N days, expiring at 23:59 on the Nth day. For orders placed after 22:00, the expiration date is automatically extended by 1 day.

  • After a subscription expires, service is suspended following a 2-hour grace period. Resources are then retained for 14 hours before being released.

  • You cannot terminate a subscription early.

  • For pay-as-you-go, if an account has an overdue balance, its deployed resources are retained and billing continues for 24 hours before they are automatically released.
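As a rough illustration of the cost formula at the start of this section, the following Python sketch converts purchased TPM into the pricing units used by the tables below (input priced per 10K TPM, output priced per 1K TPM). The function name and the purchased TPM figures are hypothetical, and the sketch assumes TPM can be divided linearly into those units; check the console for the actual purchase increments.

```python
from decimal import Decimal


def tpm_cost(duration: int, input_tpm: int, output_tpm: int,
             input_price_per_10k_tpm: str, output_price_per_1k_tpm: str) -> Decimal:
    """duration is in hours for pay-as-you-go prices and in days for subscription prices."""
    input_units = Decimal(input_tpm) / 10_000   # input is priced per 10K TPM
    output_units = Decimal(output_tpm) / 1_000  # output is priced per 1K TPM
    per_period = (Decimal(input_price_per_10k_tpm) * input_units
                  + Decimal(output_price_per_1k_tpm) * output_units)
    return Decimal(duration) * per_period


# Hypothetical purchase for qwen-flash-2025-07-28: 100,000 input TPM and 10,000 output TPM.
payg_24h = tpm_cost(24, 100_000, 10_000, "0.36", "0.36")      # hourly prices, 24 hours
subscription_1d = tpm_cost(1, 100_000, 10_000, "4.32", "4.32")  # daily prices, 1 day
print(f"Pay-as-you-go for 24 hours: CNY {payg_24h}")
print(f"Subscription for 1 day: CNY {subscription_1d}")
```

Running both calls side by side makes it easy to compare a day of pay-as-you-go usage with a one-day subscription for the same capacity.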

When a model's input exceeds its maximum token limit or purchased TPM, calls to that model automatically switch to the pay-as-you-go mode. In this mode, inference performance may decrease, rate limiting is subject to the public traffic limits of the current snapshot model in your workspace, and fees are charged at the standard pay-as-you-go rate.

  • The API response headers will include x-dashscope-ptu-overflow:true.

  • To view TPM statistics, go to the Model Studio console.
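A client can detect this fallback by inspecting the response headers for the flag named above. The sketch below is a minimal, library-agnostic check: the function name is hypothetical, and it assumes the headers are available as a plain name-to-value mapping. HTTP header names are case-insensitive, so the comparison ignores case.

```python
from typing import Mapping


def is_ptu_overflow(headers: Mapping[str, str]) -> bool:
    """Return True if the response was served in overflow (pay-as-you-go) mode."""
    for name, value in headers.items():
        if name.lower() == "x-dashscope-ptu-overflow":
            return str(value).strip().lower() == "true"
    return False


# Hypothetical header sets for illustration.
print(is_ptu_overflow({"X-DashScope-PTU-Overflow": "true"}))
print(is_ptu_overflow({"Content-Type": "application/json"}))
```

Logging this flag per request is one way to see how often traffic exceeds the purchased TPM before deciding whether to scale out.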

See Configuration Downgrade Refund Rules for details on fee adjustments and refunds for a scale-in (configuration downgrade).

Qwen

| Model name | Model code | Maximum input tokens | Pay-as-you-go input (per 10K TPM/hour) | Pay-as-you-go output (per 1K TPM/hour) | Provisioned input (per 10K TPM/day) | Provisioned output (per 1K TPM/day) |
| --- | --- | --- | --- | --- | --- | --- |
| Qwen3.7-Max-2026-05-20 | qwen3.7-max-2026-05-20 | 128,000 | ¥28.8 | ¥8.64 | ¥345.6 | ¥103.68 |
| Qwen3.6-Flash-2026-04-16 | qwen3.6-flash-2026-04-16 | 128,000 | ¥2.88 | ¥1.73 | ¥34.56 | ¥20.74 |
| Qwen3.6-Plus-2026-04-02 | qwen3.6-plus-2026-04-02 | 128,000 | ¥4.8 | ¥2.88 | ¥57.6 | ¥34.56 |
| Qwen3.5-Plus-2026-04-20 | qwen3.5-plus-2026-04-20 | 128,000 | ¥1.92 | ¥1.15 | ¥23.04 | ¥13.82 |
| Qwen3-Max-2025-09-23 | qwen3-max-2025-09-23 | 128,000 | ¥7.68 | ¥3.08 | ¥92.16 | ¥36.96 |
| Qwen-Flash-2025-07-28 | qwen-flash-2025-07-28 | 128,000 | ¥0.36 | ¥0.36 | ¥4.32 | ¥4.32 |
| Qwen-Plus-2025-12-01 | qwen-plus-2025-12-01 | 128,000 | ¥1.92 | standard: ¥0.48; code interpreter: ¥1.92 | ¥23.04 | standard: ¥5.76; code interpreter: ¥23.04 |

DeepSeek

| Model name | Model code | Maximum input tokens | Pay-as-you-go input (per 10K TPM/hour) | Pay-as-you-go output (per 1K TPM/hour) | Provisioned input (per 10K TPM/day) | Provisioned output (per 1K TPM/day) |
| --- | --- | --- | --- | --- | --- | --- |
| DeepSeek-v4-Pro | deepseek-v4-pro | 64,000 | ¥43.2 | ¥8.64 | ¥518.4 | ¥103.68 |
| DeepSeek-v3.2 | deepseek-v3.2 | 64,000 | ¥7.2 | ¥1.08 | ¥86.4 | ¥12.96 |
| DeepSeek-v3 | deepseek-v3 | 64,000 | ¥7.2 | ¥2.88 | ¥86.4 | ¥34.56 |

Qwen-VL

| Model name | Model code | Maximum input tokens | Pay-as-you-go input (per 10K TPM/hour) | Pay-as-you-go output (per 1K TPM/hour) | Provisioned input (per 10K TPM/day) | Provisioned output (per 1K TPM/day) |
| --- | --- | --- | --- | --- | --- | --- |
| Qwen3-VL-Plus-2025-09-23 | qwen3-vl-plus-2025-09-23 | 128,000 | ¥2.4 | ¥2.4 | ¥28.8 | ¥28.8 |

Other models

| Model name | Model code | Maximum input tokens | Pay-as-you-go input (per 10K TPM/hour) | Pay-as-you-go output (per 1K TPM/hour) | Provisioned input (per 10K TPM/day) | Provisioned output (per 1K TPM/day) |
| --- | --- | --- | --- | --- | --- | --- |
| GLM-5.1 | glm-5.1 | 64,000 | ¥21.6 | ¥8.64 | ¥259.2 | ¥103.68 |

Billing by duration

fee = usage (hours) × number of model units × model unit price

For pay-as-you-go, the model unit price is the same as the hourly unit price in the table below. For monthly subscriptions, the formula is number of subscription months × number of model units × monthly unit price.

  • If you cancel a prepaid purchase within the first month, you will be billed at 1.2 times the daily rate (approximately the monthly rate / 30). Each partial day is billed as a full day.
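The duration-based formulas can be sketched in Python as follows. All function names are hypothetical, and the sketch rests on two assumptions that the source does not confirm: the hourly and monthly prices in the tables below are treated as the price of one listed unit configuration (for example, MU5 x 1), and the daily rate for early cancellation is taken as exactly the monthly rate divided by 30, which the rule above describes only as approximate.

```python
import math
from decimal import Decimal


def payg_fee(hours: int, units: int, hourly_price: str) -> Decimal:
    """Pay-as-you-go: usage (hours) x number of model units x hourly unit price."""
    return Decimal(hours) * units * Decimal(hourly_price)


def subscription_fee(months: int, units: int, monthly_price: str) -> Decimal:
    """Monthly subscription: months x number of model units x monthly unit price."""
    return Decimal(months) * units * Decimal(monthly_price)


def early_cancellation_fee(days_used: float, units: int, monthly_price: str) -> Decimal:
    """Cancellation within the first month: 1.2 x daily rate per started day (assumed monthly / 30)."""
    billed_days = math.ceil(days_used)  # each partial day is billed as a full day
    return billed_days * units * Decimal(monthly_price) * Decimal("1.2") / 30


# Hypothetical usage of one MU5 x 1 configuration (¥21 per hour, ¥10,139 per month).
print(payg_fee(72, 1, "21"))
print(subscription_fee(1, 1, "10139"))
print(early_cancellation_fee(10.5, 1, "10139"))
```

Comparing the first two results shows roughly how many hours of pay-as-you-go usage it takes before a monthly subscription becomes the cheaper option.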

Note

We allocate computing resources for pay-as-you-go model units on a first-come, first-served basis. If a purchase fails, we issue a full refund.

Text generation
Qwen

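| Model name | Model code | Unit | Hourly price (CNY) | Monthly price (CNY) |
| --- | --- | --- | --- | --- |
| Qwen3.6-35B-A3B | qwen3.6-35b-a3b | MU8 x 1 | ¥47 | ¥22,400 |
| Qwen3.6-35B-A3B | qwen3.6-35b-a3b | MU9 x 1 | ¥51 | ¥24,600 |
| Qwen3.6-27B | qwen3.6-27b | MU9 x 1 | ¥51 | ¥24,600 |
| Qwen3.6-Flash-2026-04-16 | qwen3.6-flash-2026-04-16 | MU1 x 2 | ¥108 | ¥52,236 |
| Qwen3.6-Plus-2026-04-02 | qwen3.6-plus-2026-04-02 | MU1 x 8 | ¥432 | ¥208,944 |
| Qwen3.6-Plus-2026-04-02 | qwen3.6-plus-2026-04-02 | MU1 x 16 (PD-separated mode) | ¥864 | ¥417,888 |
| Qwen3.5-397B-A17B | qwen3.5-397b-a17b | MU2 x 8 | ¥504 | ¥240,288 |
| Qwen3.5-397B-A17B | qwen3.5-397b-a17b | MU3 x 8 | ¥1,096 | ¥527,752 |
| Qwen3.5-397B-A17B | qwen3.5-397b-a17b | MU3 x 16 (PD-separated mode) | ¥2,192 | ¥1,055,504 |
| Qwen3.5-397B-A17B | qwen3.5-397b-a17b | MU6 x 16 | ¥400 | ¥193,424 |
| Qwen3.5-122B-A10B | qwen3.5-122b-a10b | MU1 x 4 | ¥216 | ¥104,472 |
| Qwen3.5-122B-A10B | qwen3.5-122b-a10b | MU2 x 8 | ¥504 | ¥240,288 |
| Qwen3.5-122B-A10B | qwen3.5-122b-a10b | MU6 x 16 | ¥400 | ¥193,424 |
| Qwen3.5-122B-A10B | qwen3.5-122b-a10b | MU9 x 2 | ¥102 | ¥49,200 |
| Qwen3.5-35B-A3B | qwen3.5-35b-a3b | MU1 x 2 | ¥108 | ¥52,236 |
| Qwen3.5-35B-A3B | qwen3.5-35b-a3b | MU2 x 8 | ¥504 | ¥240,288 |
| Qwen3.5-35B-A3B | qwen3.5-35b-a3b | MU8 x 1 | ¥47 | ¥22,400 |
| Qwen3.5-35B-A3B | qwen3.5-35b-a3b | MU9 x 1 | ¥51 | ¥24,600 |
| Qwen3.5-27B | qwen3.5-27b | MU1 x 2 | ¥108 | ¥52,236 |
| Qwen3.5-27B | qwen3.5-27b | MU9 x 1 | ¥51 | ¥24,600 |
| Qwen3.5-9B | qwen3.5-9b | MU1 x 2 | ¥108 | ¥52,236 |
| Qwen3.5-9B | qwen3.5-9b | MU8 x 1 | ¥47 | ¥22,400 |
| Qwen3.5-9B | qwen3.5-9b | MU9 x 1 | ¥51 | ¥24,600 |
| Qwen3.5-Flash-2026-02-23 | qwen3.5-flash-2026-02-23 | MU1 x 2 | ¥108 | ¥52,236 |
| Qwen3.5-Plus-2026-02-15 | qwen3.5-plus-2026-02-15 | MU1 x 16 (PD-separated mode) | ¥864 | ¥417,888 |
| Qwen3.5-Plus-2026-02-15 | qwen3.5-plus-2026-02-15 | MU3 x 8 | ¥1,096 | ¥527,752 |
| Qwen3.5-Plus-2026-02-15 | qwen3.5-plus-2026-02-15 | MU3 x 16 (PD-separated mode) | ¥2,192 | ¥1,055,504 |
| Qwen3-235B-A22B-Instruct-2507 | qwen3-235b-a22b-instruct-2507 | MU1 x 4 | ¥216 | ¥104,472 |
| Qwen3-235B-A22B-Instruct-2507 | qwen3-235b-a22b-instruct-2507 | MU2 x 8 | ¥504 | ¥240,288 |
| Qwen3-Next-80B-A3B-Instruct | qwen3-next-80b-a3b-instruct | MU1 x 2 | ¥108 | ¥52,236 |
| Qwen3-32B | qwen3-32b | MU1 x 4 | ¥216 | ¥104,472 |
| Qwen3-32B | qwen3-32b | MU6 x 4 | ¥100 | ¥48,356 |
| Qwen3-30B-A3B | qwen3-30b-a3b | MU9 x 2 | ¥102 | ¥49,200 |
| Qwen3-30B-A3B-Instruct-2507 | qwen3-30b-a3b-instruct-2507 | MU1 x 4 | ¥216 | ¥104,472 |
| Qwen3-30B-A3B-Instruct-2507 | qwen3-30b-a3b-instruct-2507 | MU2 x 8 | ¥504 | ¥240,288 |
| Qwen3-8B | qwen3-8b | MU1 x 2 | ¥108 | ¥52,236 |
| Qwen3-8B | qwen3-8b | MU2 x 2 | ¥126 | ¥60,072 |
| Qwen3-8B | qwen3-8b | MU5 x 1 | ¥21 | ¥10,139 |
| Qwen3-4B | qwen3-4b | MU1 x 2 | ¥108 | ¥52,236 |
| Qwen3-4B | qwen3-4b | MU5 x 1 | ¥21 | ¥10,139 |
| Qwen3-1.7B | qwen3-1.7b | MU1 x 2 | ¥108 | ¥52,236 |
| Qwen3-1.7B | qwen3-1.7b | MU5 x 1 | ¥21 | ¥10,139 |
| Qwen3-Embedding-0.6B | qwen3-embedding-0.6b | MU5 x 1 | ¥21 | ¥10,139 |
| Qwen3-Embedding-0.6B | qwen3-embedding-0.6b | MU6 x 1 | ¥25 | ¥12,089 |
| Qwen3-MoE-Rerank-0.6B | qwen3-moe-rerank-0.6b | MU5 x 1 | ¥21 | ¥10,139 |
| Qwen3-Rerank-0.6B | qwen3-rerank-0.6b | MU5 x 1 | ¥21 | ¥10,139 |
| Qwen3-Rerank-0.6B | qwen3-rerank-0.6b | MU6 x 1 | ¥25 | ¥12,089 |
| Qwen3-Max-2025-09-23 | qwen3-max-2025-09-23 | MU2 x 8 | ¥504 | ¥240,288 |
| Qwen3-Max-2025-09-23 | qwen3-max-2025-09-23 | MU3 x 8 | ¥1,096 | ¥527,752 |
| Qwen3-Rerank | qwen3-rerank | MU5 x 1 | ¥21 | ¥10,139 |
| Qwen2.5-72B-Instruct | qwen2.5-72b-instruct | MU1 x 4 | ¥216 | ¥104,472 |
| Qwen2.5-32B-Instruct | qwen2.5-32b-instruct | MU1 x 4 | ¥216 | ¥104,472 |
| Qwen2.5-14B-Instruct | qwen2.5-14b-instruct | MU1 x 2 | ¥108 | ¥52,236 |
| Qwen2.5-7B-Instruct | qwen2.5-7b-instruct | MU1 x 2 | ¥108 | ¥52,236 |
| Qwen2.5-7B-Instruct | qwen2.5-7b-instruct | MU5 x 1 | ¥21 | ¥10,139 |
| Qwen2.5-3B-Instruct | qwen2.5-3b-instruct | MU5 x 1 | ¥21 | ¥10,139 |
| Qwen-Flash-2025-07-28 | qwen-flash-2025-07-28 | MU1 x 4 | ¥216 | ¥104,472 |
| Qwen-Plus-2025-07-28 | qwen-plus-2025-07-28 | MU1 x 4 | ¥216 | ¥104,472 |
| Qwen-Plus-2025-07-28 | qwen-plus-2025-07-28 | MU1 x 16 (PD-separated mode) | ¥864 | ¥417,888 |
| Qwen-Plus-2025-12-01 | qwen-plus-2025-12-01 | MU1 x 4 | ¥216 | ¥104,472 |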

GLM

| Model name | Model code | Unit | Hourly price (CNY) | Monthly price (CNY) |
| --- | --- | --- | --- | --- |
| GLM-5 | glm-5 | MU3 x 16 (PD-separated mode) | ¥2,192 | ¥1,055,504 |
| GLM-4.7 | glm-4.7 | MU6 x 32 (PD-separated mode) | ¥800 | ¥386,848 |

DeepSeek

| Model name | Model code | Unit | Hourly price (CNY) | Monthly price (CNY) |
| --- | --- | --- | --- | --- |
| DeepSeek-v4-Flash | deepseek-v4-flash | MU1 x 8 | ¥432 | ¥208,944 |
| DeepSeek-v3.2 | deepseek-v3.2 | MU2 x 16 (PD-separated mode) | ¥1,008 | ¥480,576 |

More models

| Model name | Model code | Unit | Hourly price (CNY) | Monthly price (CNY) |
| --- | --- | --- | --- | --- |
| MiniMax-M2.5 | MiniMax-M2.5 | MU1 x 16 (PD-separated mode) | ¥864 | ¥417,888 |
| Kimi-K2.5 | Kimi-K2.5 | MU2 x 8 | ¥504 | ¥240,288 |

Model types:

  • Instruct - After deployment, the model runs inference in instruct mode.

  • Thinking - After deployment, the model runs inference in thinking mode.

Model deployment type:

  • PD-separated mode: Reduces first-token latency and improves throughput.

    This deployment mode splits model inference into two phases, prefill and decode, which run on separate compute nodes.

Multimodal
Qwen-VL

| Model name | Model code | Model unit | Hourly price | Monthly price |
| --- | --- | --- | --- | --- |
| Qwen3-VL-235B-A22B-Instruct | qwen3-vl-235b-a22b-instruct | MU1 x 4 | ¥216 | ¥104,472 |
| Qwen3-VL-235B-A22B-Thinking | qwen3-vl-235b-a22b-thinking | MU1 x 4 | ¥216 | ¥104,472 |
| Qwen3-VL-32B-Instruct | qwen3-vl-32b-instruct | MU2 x 8 | ¥504 | ¥240,288 |
| Qwen3-VL-8B-Instruct | qwen3-vl-8b-instruct | MU1 x 2 | ¥108 | ¥52,236 |
| Qwen3-VL-4B-Instruct | qwen3-vl-4b-instruct | MU1 x 2 | ¥108 | ¥52,236 |
| Qwen3-VL-2B-Instruct | qwen3-vl-2b-instruct | MU5 x 1 | ¥21 | ¥10,139 |
| Qwen3-VL-Embedding-2B | qwen3-vl-embedding-2b | MU5 x 1 | ¥21 | ¥10,139 |
| Qwen3-VL-Flash-2025-10-15 | qwen3-vl-flash-2025-10-15 | MU1 x 4 | ¥216 | ¥104,472 |
| Qwen3-VL-Plus-2025-09-23 | qwen3-vl-plus-2025-09-23 | MU1 x 4 | ¥216 | ¥104,472 |
| Qwen-VL-Max-2025-08-13 | qwen-vl-max-2025-08-13 | MU6 x 4 | ¥100 | ¥48,356 |
| Qwen-VL-OCR-2025-11-20 | qwen-vl-ocr-2025-11-20 | MU6 x 4 | ¥100 | ¥48,356 |

Qwen Omni

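| Model name | Model code | Model unit | Hourly price | Monthly price |
| --- | --- | --- | --- | --- |
| Qwen3.5-Omni-Flash | qwen3.5-omni-flash | MU8 x 1 | ¥47 | ¥22,400 |
| Qwen3.5-Omni-Flash | qwen3.5-omni-flash | MU9 x 1 | ¥51 | ¥24,600 |
| Qwen3.5-Omni-Plus | qwen3.5-omni-plus | MU9 x 8 | ¥408 | ¥196,800 |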

Model types:

  • Instruct - After deployment, the model runs in non-thinking mode.

  • Thinking - After deployment, the model runs in thinking mode.

  • Instruct/Thinking - You can enable or disable thinking mode during model deployment.

Speech synthesis

CosyVoice

| Model name | Model code | Model unit specification | Hourly price (CNY) | Monthly price (CNY) |
| --- | --- | --- | --- | --- |
| cosyvoice-v3-flash | cosyvoice-v3-flash | MU5 | ¥21 | ¥10,139 |

Token-based billing

cost = (number of input tokens × price per input token) + (number of output tokens × price per output token) (minimum billing unit: 1 token)

  • Token-based billing applies only to custom models fine-tuned from the following foundation models.
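The token-based formula can be sketched in Python as follows. The function name and the token counts are hypothetical; the prices are the qwen3-32b rates from the table below, quoted per 1,000 tokens, with the thinking-mode output price used in the example.

```python
from decimal import Decimal


def token_cost(input_tokens: int, output_tokens: int,
               input_price_per_1k: str, output_price_per_1k: str) -> Decimal:
    """Cost in CNY; prices are per 1,000 tokens, and billing is per token."""
    weighted = (Decimal(input_tokens) * Decimal(input_price_per_1k)
                + Decimal(output_tokens) * Decimal(output_price_per_1k))
    return weighted / 1000


# Hypothetical call to a model fine-tuned from qwen3-32b in thinking mode:
# input at ¥0.002 and output at ¥0.02 per 1,000 tokens.
call_cost = token_cost(1_200, 800, "0.002", "0.02")
print(f"Cost of this call: CNY {call_cost}")
```

For models with separate thinking and non-thinking output prices, pass the output price that matches the mode used for the call.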

Qwen

| Foundation model | Model code | Input (CNY/1,000 tokens) | Output (CNY/1,000 tokens) |
| --- | --- | --- | --- |
| Qwen3-32B | qwen3-32b | ¥0.002 | non-thinking mode: ¥0.008; thinking mode: ¥0.02 |
| Qwen3-14B | qwen3-14b | ¥0.001 | non-thinking mode: ¥0.004; thinking mode: ¥0.01 |
| Qwen3-8B | qwen3-8b | ¥0.0005 | non-thinking mode: ¥0.002; thinking mode: ¥0.005 |
| Qwen2.5-72B-Instruct | qwen2.5-72b-instruct | ¥0.004 | ¥0.012 |
| Qwen2.5-32B-Instruct | qwen2.5-32b-instruct | ¥0.002 | ¥0.006 |
| Qwen2.5-14B-Instruct | qwen2.5-14b-instruct | ¥0.001 | ¥0.003 |
| Qwen2.5-7B-Instruct | qwen2.5-7b-instruct | ¥0.0005 | ¥0.001 |

Qwen-VL

| Foundation model | Model code | Input (CNY/1,000 tokens) | Output (CNY/1,000 tokens) |
| --- | --- | --- | --- |
| Qwen3-VL-8B-Instruct | qwen3-vl-8b-instruct | ¥0.0005 | ¥0.002 |
| Qwen2.5-VL-72B-Instruct | qwen2.5-vl-72b-instruct | ¥0.016 | ¥0.048 |
| Qwen2.5-VL-32B-Instruct | qwen2.5-vl-32b-instruct | ¥0.008 | ¥0.024 |
| Qwen2.5-VL-7B-Instruct | qwen2.5-vl-7b-instruct | ¥0.002 | ¥0.005 |

Image generation models – Wan

Deployment is free. Invocations are billed at the standard rate of the fine-tuned base model. For the training workflow, see Fine-tune image generation models.

| Model ID | LoRA deployment and invocation price |
| --- | --- |
| wan2.7-image-pro | CNY 0.50/image |
| wan2.7-image | CNY 0.20/image |

FAQ

Q: When does billing for model deployment start?

A: Billing starts when the model status changes to Running. No charges apply while the status is Deploying, Overdue Payment, or Deployment Failed.

For monthly subscriptions, the billing period starts when the status changes to Running.

Q: Am I charged if I cancel a training job?

A: Yes. If you cancel training manually, you are charged for all tokens processed before cancellation. Training jobs interrupted by system errors or other non-user causes are not charged.

Q: How do I view invocation statistics for a deployed model?

A: Visit the Model Monitoring (Beijing), Model Monitoring (Virginia), or Model Monitoring (Singapore) page.
