This topic describes the billing rules and pricing for model training and model deployment on Alibaba Cloud Model Studio.
Training billing
Text generation models – Qwen
For the training workflow, see Model tuning. After training completes, deploy the new model before evaluating or calling it.
Method | Billed by training tokens |
Formula | Model training fee = (Total tokens in training data + Total tokens in mixed training data) × Number of epochs × Training unit price. The minimum billing unit is 1 token. The estimated training fee appears at the bottom of the model training console. Click Computing Details to view the total number of training tokens, the number of epochs, and the training unit price. |
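As an illustration of the formula, the following minimal Python sketch estimates a training fee. The function name `training_fee` and the job sizes are hypothetical. The unit price is the qwen3-8b rate from the Qwen price table in this section. Because the minimum billing unit is 1 token, the sketch assumes that the per-1,000-token price applies proportionally to each token.

```python
from decimal import Decimal


def training_fee(data_tokens: int, mixed_tokens: int, epochs: int,
                 price_per_1k: Decimal) -> Decimal:
    """Fee = (data tokens + mixed-data tokens) x epochs x unit price.

    price_per_1k is the listed CNY price per 1,000 tokens. Assumption:
    the price is applied per token, so it is divided by 1,000.
    """
    billed_tokens = (data_tokens + mixed_tokens) * epochs
    return Decimal(billed_tokens) * price_per_1k / Decimal(1000)


# Hypothetical job: 2,000,000 tokens of training data, 500,000 tokens of
# mixed training data, 3 epochs, on qwen3-8b (CNY 0.006 per 1,000 tokens).
fee = training_fee(2_000_000, 500_000, 3, Decimal("0.006"))
print(f"Estimated training fee: CNY {fee:.2f}")  # Estimated training fee: CNY 45.00
```

Treat the result as a rough check only. The estimate shown in the console is the reference figure.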
Qwen
Model service | Model identifier | Price |
Qwen3.6-Flash-2026-04-16 | qwen3.6-flash-2026-04-16 | CNY 0.05 per 1,000 tokens |
Qwen3.5-27B | qwen3.5-27b | CNY 0.05 per 1,000 tokens |
Qwen3.5-9B | qwen3.5-9b | CNY 0.02 per 1,000 tokens |
Qwen3.5-Flash-2026-02-23 | qwen3.5-flash-2026-02-23 | CNY 0.05 per 1,000 tokens |
Qwen3-32B | qwen3-32b | CNY 0.04 per 1,000 tokens |
Qwen3-30B-A3B-Instruct-2507 | qwen3-30b-a3b-instruct-2507 | CNY 0.03 per 1,000 tokens |
Qwen3-14B | qwen3-14b | CNY 0.03 per 1,000 tokens |
Qwen3-8B | qwen3-8b | CNY 0.006 per 1,000 tokens |
Qwen3-1.7B | qwen3-1.7b | CNY 0.0045 per 1,000 tokens |
Qwen3-0.6B | qwen3-0.6b | CNY 0.003 per 1,000 tokens |
Qwen2.5-72B-Instruct | qwen2.5-72b-instruct | CNY 0.15 per 1,000 tokens |
Qwen2.5-32B-Instruct | qwen2.5-32b-instruct | CNY 0.03 per 1,000 tokens |
Qwen2.5-14B-Instruct | qwen2.5-14b-instruct | CNY 0.03 per 1,000 tokens |
Qwen2.5-7B-Instruct | qwen2.5-7b-instruct | CNY 0.006 per 1,000 tokens |
Qwen-Plus-Character-2025-11-06 | qwen-plus-character-2025-11-06 | CNY 0.15 per 1,000 tokens |
Qwen-VL
Model service | Model identifier | Price |
Qwen3-VL-8B-Instruct | qwen3-vl-8b-instruct | CNY 0.012 per 1,000 tokens |
Qwen3-VL-8B-Thinking | qwen3-vl-8b-thinking | CNY 0.012 per 1,000 tokens |
Qwen3-VL-4B-Instruct | qwen3-vl-4b-instruct | CNY 0.006 per 1,000 tokens |
Qwen2.5-VL-72B-Instruct | qwen2.5-vl-72b-instruct | CNY 0.05 per 1,000 tokens |
Qwen2.5-VL-32B-Instruct | qwen2.5-vl-32b-instruct | CNY 0.02 per 1,000 tokens |
Qwen2.5-VL-7B-Instruct | qwen2.5-vl-7b-instruct | CNY 0.01 per 1,000 tokens |
Image generation models – Wan
For the training workflow, see Fine-tune image generation models. After training completes, deploy the new model before calling it.
Method | Billed by training tokens |
Formula | Model training fee = Total training tokens × Training unit price (Billing unit: per 1,000 tokens) |
Model | Code | Training price (per 1K tokens) |
Wan image generation | wan2.7-image-pro | CNY 0.08 |
Wan image generation | wan2.7-image | CNY 0.08 |
Video generation models – Wan
For the training workflow, see Fine-tuning video generation models. After training completes, deploy the new model before calling it.
Method | Billed by training tokens |
Formula | Model training fee = Total training tokens × Training unit price (Billing unit: per 1,000 tokens) |
Model | Code | Training price (per 1K tokens) |
Wan image-to-video (first frame-based) | wan2.2-i2v-flash | CNY 0.06 |
Wan image-to-video (first frame-based) | wan2.5-i2v-preview | CNY 0.32 |
Wan image-to-video (first and last frame-based) | wan2.2-kf2v-flash | CNY 0.06 |
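For Wan image and video models, the training fee has no epoch term. The sketch below looks up the training unit price by model code and applies the formula. The names `WAN_TRAINING_PRICE_PER_1K` and `wan_training_fee` are hypothetical, and the token count is an invented example. This topic does not say whether a partial block of 1,000 tokens is rounded up, so the sketch assumes proportional billing.

```python
from decimal import Decimal

# Training unit prices in CNY per 1,000 tokens, copied from the Wan
# training price tables in this topic.
WAN_TRAINING_PRICE_PER_1K = {
    "wan2.7-image-pro": Decimal("0.08"),
    "wan2.7-image": Decimal("0.08"),
    "wan2.2-i2v-flash": Decimal("0.06"),
    "wan2.5-i2v-preview": Decimal("0.32"),
    "wan2.2-kf2v-flash": Decimal("0.06"),
}


def wan_training_fee(model_code: str, total_training_tokens: int) -> Decimal:
    """Fee = total training tokens x training unit price (per 1,000 tokens).

    Assumption: partial thousands are billed proportionally.
    """
    price = WAN_TRAINING_PRICE_PER_1K[model_code]
    return Decimal(total_training_tokens) / Decimal(1000) * price


# Hypothetical job: 1,200,000 training tokens on wan2.5-i2v-preview.
print(wan_training_fee("wan2.5-i2v-preview", 1_200_000))
```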
Deployment billing
Text generation models: Qwen
Pay-as-you-go
cost = usage duration × (input TPM price × input TPM + output TPM price × output TPM)
For pay-as-you-go billing, usage duration is measured in hours, and the unit prices are listed in the pay-as-you-go (per TPM/hour) columns of the tables below. For subscription billing, usage duration is measured in days, and the unit prices are listed in the provisioned (per TPM/day) columns.
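The formula can be sketched in Python as follows. This is an illustration, not an official calculator. The function name `tpm_cost` is hypothetical. The sketch assumes that purchased TPM is converted to price units by dividing by the unit size shown in each column header (10K TPM for input, 1K TPM for output). The example uses the qwen-flash-2025-07-28 pay-as-you-go prices from the Qwen table in this section.

```python
from decimal import Decimal


def tpm_cost(duration: int, input_tpm: int, output_tpm: int,
             input_price: Decimal, output_price: Decimal,
             input_unit: int = 10_000, output_unit: int = 1_000) -> Decimal:
    """cost = duration x (input price x input TPM + output price x output TPM).

    duration is in hours for pay-as-you-go prices or in days for
    subscription prices. Assumption: prices are quoted per input_unit
    and per output_unit of TPM, as in the table column headers.
    """
    input_part = input_price * Decimal(input_tpm) / Decimal(input_unit)
    output_part = output_price * Decimal(output_tpm) / Decimal(output_unit)
    return Decimal(duration) * (input_part + output_part)


# Hypothetical purchase: 100,000 input TPM and 10,000 output TPM for
# 24 hours at CNY 0.36 per 10K input TPM/hour and CNY 0.36 per 1K output
# TPM/hour.
cost = tpm_cost(24, 100_000, 10_000, Decimal("0.36"), Decimal("0.36"))
print(f"CNY {cost:.2f}")  # CNY 172.80
```

The same function covers subscription billing if `duration` is given in days and the per-day prices are passed in.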
- A subscription activates upon payment and is valid for N days, expiring at 23:59 on the Nth day. For orders placed after 22:00, the expiration date is automatically extended by 1 day.
- After a subscription expires, service is suspended following a 2-hour grace period. Resources are then retained for 14 hours before being released.
- You cannot terminate a subscription early.
- For pay-as-you-go, if an account has an overdue balance, its deployed resources are retained and billing continues for 24 hours before they are automatically released.
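The subscription rules above imply a timeline from payment to resource release. The following sketch computes it under stated assumptions: the payment date counts as day 1, "after 22:00" means strictly later than 22:00:00, and all timestamps share one unspecified time zone (naive `datetime` values). The function name `subscription_timeline` and the sample dates are hypothetical.

```python
from datetime import datetime, time, timedelta


def subscription_timeline(paid_at: datetime, days: int) -> dict:
    """Return the expiry, suspension, and release times of a subscription."""
    # Assumption: the payment date is day 1, so day N is days - 1 later.
    last_day = paid_at.date() + timedelta(days=days - 1)
    # Orders placed after 22:00 get one extra day.
    if paid_at.time() > time(22, 0):
        last_day += timedelta(days=1)
    expires_at = datetime.combine(last_day, time(23, 59))
    # Service is suspended after a 2-hour grace period.
    suspended_at = expires_at + timedelta(hours=2)
    # Resources are retained for 14 more hours, then released.
    released_at = suspended_at + timedelta(hours=14)
    return {
        "expires_at": expires_at,
        "suspended_at": suspended_at,
        "released_at": released_at,
    }


# Hypothetical order: a 7-day subscription paid at 22:30 on 2026-03-01.
for name, moment in subscription_timeline(datetime(2026, 3, 1, 22, 30), 7).items():
    print(name, moment)
```

For this sample order, the sketch yields expiry at 23:59 on 2026-03-08, suspension at 01:59 on 2026-03-09, and release at 15:59 on 2026-03-09.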
When a model's input exceeds its maximum token limit or purchased TPM, calls to that model automatically switch to the pay-as-you-go mode. In this mode, inference performance may decrease, rate limiting is subject to the public traffic limits of the current snapshot model in your workspace, and fees are charged at the standard pay-as-you-go rate.
- The API response headers include x-dashscope-ptu-overflow:true.
- To view TPM statistics, go to the Model Studio console.
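A client can detect overflow calls by inspecting the response headers. The helper below is a minimal sketch: `is_ptu_overflow` is a hypothetical name, and it takes the headers as a plain mapping because the way headers are obtained depends on the HTTP client in use. Only the header name and value come from this topic.

```python
from typing import Mapping


def is_ptu_overflow(headers: Mapping[str, str]) -> bool:
    """Return True if the response carries x-dashscope-ptu-overflow: true."""
    # HTTP header names are case-insensitive, so normalize before lookup.
    normalized = {
        str(name).lower(): str(value).strip().lower()
        for name, value in headers.items()
    }
    return normalized.get("x-dashscope-ptu-overflow") == "true"


# Hypothetical header sets as a client library might expose them.
print(is_ptu_overflow({"X-DashScope-PTU-Overflow": "true"}))  # True
print(is_ptu_overflow({"Content-Type": "application/json"}))  # False
```

Logging or counting these responses is one way to notice when purchased TPM is regularly exceeded.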
See Configuration Downgrade Refund Rules for details on fee adjustments and refunds for a scale-in (configuration downgrade).
Qwen
Model name | Model code | Maximum input tokens | Pay-as-you-go input (per 10K TPM/hour) | Pay-as-you-go output (per 1K TPM/hour) | Provisioned input (per 10K TPM/day) | Provisioned output (per 1K TPM/day) |
Qwen3.7-Max-2026-05-20 | qwen3.7-max-2026-05-20 | 128,000 | ¥28.8 | ¥8.64 | ¥345.6 | ¥103.68 |
Qwen3.6-Flash-2026-04-16 | qwen3.6-flash-2026-04-16 | 128,000 | ¥2.88 | ¥1.73 | ¥34.56 | ¥20.74 |
Qwen3.6-Plus-2026-04-02 | qwen3.6-plus-2026-04-02 | 128,000 | ¥4.8 | ¥2.88 | ¥57.6 | ¥34.56 |
Qwen3.5-Plus-2026-04-20 | qwen3.5-plus-2026-04-20 | 128,000 | ¥1.92 | ¥1.15 | ¥23.04 | ¥13.82 |
Qwen3-Max-2025-09-23 | qwen3-max-2025-09-23 | 128,000 | ¥7.68 | ¥3.08 | ¥92.16 | ¥36.96 |
Qwen-Flash-2025-07-28 | qwen-flash-2025-07-28 | 128,000 | ¥0.36 | ¥0.36 | ¥4.32 | ¥4.32 |
Qwen-Plus-2025-12-01 | qwen-plus-2025-12-01 | 128,000 | ¥1.92 | standard: ¥0.48; code interpreter: ¥1.92 | ¥23.04 | standard: ¥5.76; code interpreter: ¥23.04 |
DeepSeek
Model name | Model code | Maximum input tokens | Pay-as-you-go input (per 10K TPM/hour) | Pay-as-you-go output (per 1K TPM/hour) | Provisioned input (per 10K TPM/day) | Provisioned output (per 1K TPM/day) |
DeepSeek-v4-Pro | deepseek-v4-pro | 64,000 | ¥43.2 | ¥8.64 | ¥518.4 | ¥103.68 |
DeepSeek-v3.2 | deepseek-v3.2 | 64,000 | ¥7.2 | ¥1.08 | ¥86.4 | ¥12.96 |
DeepSeek-v3 | deepseek-v3 | 64,000 | ¥7.2 | ¥2.88 | ¥86.4 | ¥34.56 |
Qwen-VL
Model name | Model code | Maximum input tokens | Pay-as-you-go input (per 10K TPM/hour) | Pay-as-you-go output (per 1K TPM/hour) | Provisioned input (per 10K TPM/day) | Provisioned output (per 1K TPM/day) |
Qwen3-VL-Plus-2025-09-23 | qwen3-vl-plus-2025-09-23 | 128,000 | ¥2.4 | ¥2.4 | ¥28.8 | ¥28.8 |
Other models
Model name | Model code | Maximum input tokens | Pay-as-you-go input (per 10K TPM/hour) | Pay-as-you-go output (per 1K TPM/hour) | Provisioned input (per 10K TPM/day) | Provisioned output (per 1K TPM/day) |
GLM-5.1 | glm-5.1 | 64,000 | ¥21.6 | ¥8.64 | ¥259.2 | ¥103.68 |
Billing by duration
fee = usage (hours) × number of model units × model unit price
For pay-as-you-go, the model unit price is the same as the hourly unit price in the table below. For monthly subscriptions, the formula is number of subscription months × number of model units × monthly unit price.
- If you cancel a prepaid (subscription) purchase within the first month, you are billed at 1.2 times the daily rate, where the daily rate is approximately the monthly rate divided by 30. Each partial day is billed as a full day.
We allocate computing resources for pay-as-you-go model units on a first-come, first-served basis. If a purchase fails, we issue a full refund.
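The duration-based rules can be sketched as three small functions. All function names are hypothetical. The sketch assumes that the prices in the tables below apply to one listed configuration (for example, MU5 x 1), that `units` counts how many such configurations are purchased, and that the early-cancellation charge scales with `units`. None of these points is stated explicitly in this topic. The daily rate is taken as exactly the monthly rate divided by 30, although the text calls this approximate.

```python
import math
from decimal import Decimal


def pay_as_you_go_fee(hours: Decimal, units: int, hourly_price: Decimal) -> Decimal:
    """fee = usage (hours) x number of model units x hourly unit price."""
    return hours * units * hourly_price


def subscription_fee(months: int, units: int, monthly_price: Decimal) -> Decimal:
    """fee = subscription months x number of model units x monthly unit price."""
    return Decimal(months) * units * monthly_price


def early_cancellation_fee(days_used: float, units: int,
                           monthly_price: Decimal) -> Decimal:
    """Charge for cancelling in the first month: 1.2 x daily rate per day.

    Each partial day is billed as a full day, hence math.ceil.
    """
    billed_days = math.ceil(days_used)
    # Divide last so that the result stays exact where possible.
    return Decimal("1.2") * monthly_price * billed_days * units / Decimal(30)


# Hypothetical usage with the MU5 x 1 prices (CNY 21/hour, CNY 10,139/month).
print(pay_as_you_go_fee(Decimal(10), 1, Decimal("21")))
print(subscription_fee(2, 1, Decimal("10139")))
print(early_cancellation_fee(3.2, 1, Decimal("10139")))
```

With these inputs, 10 hours of pay-as-you-go use comes to CNY 210, a two-month subscription to CNY 20,278, and cancelling after 3.2 days (billed as 4 days) to CNY 1,622.24.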
Text generation
Qwen
Model name | Model code | Model unit | Hourly price (CNY) | Monthly price (CNY) |
Qwen3.6-35B-A3B | qwen3.6-35b-a3b | MU8 x 1 | ¥47 | ¥22,400 |
Qwen3.6-35B-A3B | qwen3.6-35b-a3b | MU9 x 1 | ¥51 | ¥24,600 |
Qwen3.6-27B | qwen3.6-27b | MU9 x 1 | ¥51 | ¥24,600 |
Qwen3.6-Flash-2026-04-16 | qwen3.6-flash-2026-04-16 | MU1 x 2 | ¥108 | ¥52,236 |
Qwen3.6-Plus-2026-04-02 | qwen3.6-plus-2026-04-02 | MU1 x 8 | ¥432 | ¥208,944 |
Qwen3.6-Plus-2026-04-02 | qwen3.6-plus-2026-04-02 | MU1 x 16 (PD-separated mode) | ¥864 | ¥417,888 |
Qwen3.5-397B-A17B | qwen3.5-397b-a17b | MU2 x 8 | ¥504 | ¥240,288 |
Qwen3.5-397B-A17B | qwen3.5-397b-a17b | MU3 x 8 | ¥1,096 | ¥527,752 |
Qwen3.5-397B-A17B | qwen3.5-397b-a17b | MU3 x 16 (PD-separated mode) | ¥2,192 | ¥1,055,504 |
Qwen3.5-397B-A17B | qwen3.5-397b-a17b | MU6 x 16 | ¥400 | ¥193,424 |
Qwen3.5-122B-A10B | qwen3.5-122b-a10b | MU1 x 4 | ¥216 | ¥104,472 |
Qwen3.5-122B-A10B | qwen3.5-122b-a10b | MU2 x 8 | ¥504 | ¥240,288 |
Qwen3.5-122B-A10B | qwen3.5-122b-a10b | MU6 x 16 | ¥400 | ¥193,424 |
Qwen3.5-122B-A10B | qwen3.5-122b-a10b | MU9 x 2 | ¥102 | ¥49,200 |
Qwen3.5-35B-A3B | qwen3.5-35b-a3b | MU1 x 2 | ¥108 | ¥52,236 |
Qwen3.5-35B-A3B | qwen3.5-35b-a3b | MU2 x 8 | ¥504 | ¥240,288 |
Qwen3.5-35B-A3B | qwen3.5-35b-a3b | MU8 x 1 | ¥47 | ¥22,400 |
Qwen3.5-35B-A3B | qwen3.5-35b-a3b | MU9 x 1 | ¥51 | ¥24,600 |
Qwen3.5-27B | qwen3.5-27b | MU1 x 2 | ¥108 | ¥52,236 |
Qwen3.5-27B | qwen3.5-27b | MU9 x 1 | ¥51 | ¥24,600 |
Qwen3.5-9B | qwen3.5-9b | MU1 x 2 | ¥108 | ¥52,236 |
Qwen3.5-9B | qwen3.5-9b | MU8 x 1 | ¥47 | ¥22,400 |
Qwen3.5-9B | qwen3.5-9b | MU9 x 1 | ¥51 | ¥24,600 |
Qwen3.5-Flash-2026-02-23 | qwen3.5-flash-2026-02-23 | MU1 x 2 | ¥108 | ¥52,236 |
Qwen3.5-Plus-2026-02-15 | qwen3.5-plus-2026-02-15 | MU1 x 16 (PD-separated mode) | ¥864 | ¥417,888 |
Qwen3.5-Plus-2026-02-15 | qwen3.5-plus-2026-02-15 | MU3 x 8 | ¥1,096 | ¥527,752 |
Qwen3.5-Plus-2026-02-15 | qwen3.5-plus-2026-02-15 | MU3 x 16 (PD-separated mode) | ¥2,192 | ¥1,055,504 |
Qwen3-235B-A22B-Instruct-2507 | qwen3-235b-a22b-instruct-2507 | MU1 x 4 | ¥216 | ¥104,472 |
Qwen3-235B-A22B-Instruct-2507 | qwen3-235b-a22b-instruct-2507 | MU2 x 8 | ¥504 | ¥240,288 |
Qwen3-Next-80B-A3B-Instruct | qwen3-next-80b-a3b-instruct | MU1 x 2 | ¥108 | ¥52,236 |
Qwen3-32B | qwen3-32b | MU1 x 4 | ¥216 | ¥104,472 |
Qwen3-32B | qwen3-32b | MU6 x 4 | ¥100 | ¥48,356 |
Qwen3-30B-A3B | qwen3-30b-a3b | MU9 x 2 | ¥102 | ¥49,200 |
Qwen3-30B-A3B-Instruct-2507 | qwen3-30b-a3b-instruct-2507 | MU1 x 4 | ¥216 | ¥104,472 |
Qwen3-30B-A3B-Instruct-2507 | qwen3-30b-a3b-instruct-2507 | MU2 x 8 | ¥504 | ¥240,288 |
Qwen3-8B | qwen3-8b | MU1 x 2 | ¥108 | ¥52,236 |
Qwen3-8B | qwen3-8b | MU2 x 2 | ¥126 | ¥60,072 |
Qwen3-8B | qwen3-8b | MU5 x 1 | ¥21 | ¥10,139 |
Qwen3-4B | qwen3-4b | MU1 x 2 | ¥108 | ¥52,236 |
Qwen3-4B | qwen3-4b | MU5 x 1 | ¥21 | ¥10,139 |
Qwen3-1.7B | qwen3-1.7b | MU1 x 2 | ¥108 | ¥52,236 |
Qwen3-1.7B | qwen3-1.7b | MU5 x 1 | ¥21 | ¥10,139 |
Qwen3-Embedding-0.6B | qwen3-embedding-0.6b | MU5 x 1 | ¥21 | ¥10,139 |
Qwen3-Embedding-0.6B | qwen3-embedding-0.6b | MU6 x 1 | ¥25 | ¥12,089 |
Qwen3-MoE-Rerank-0.6B | qwen3-moe-rerank-0.6b | MU5 x 1 | ¥21 | ¥10,139 |
Qwen3-Rerank-0.6B | qwen3-rerank-0.6b | MU5 x 1 | ¥21 | ¥10,139 |
Qwen3-Rerank-0.6B | qwen3-rerank-0.6b | MU6 x 1 | ¥25 | ¥12,089 |
Qwen3-Max-2025-09-23 | qwen3-max-2025-09-23 | MU2 x 8 | ¥504 | ¥240,288 |
Qwen3-Max-2025-09-23 | qwen3-max-2025-09-23 | MU3 x 8 | ¥1,096 | ¥527,752 |
Qwen3-Rerank | qwen3-rerank | MU5 x 1 | ¥21 | ¥10,139 |
Qwen2.5-Instruct-72B | qwen2.5-72b-instruct | MU1 x 4 | ¥216 | ¥104,472 |
Qwen2.5-Instruct-32B | qwen2.5-32b-instruct | MU1 x 4 | ¥216 | ¥104,472 |
Qwen2.5-Instruct-14B | qwen2.5-14b-instruct | MU1 x 2 | ¥108 | ¥52,236 |
Qwen2.5-Instruct-7B | qwen2.5-7b-instruct | MU1 x 2 | ¥108 | ¥52,236 |
Qwen2.5-Instruct-7B | qwen2.5-7b-instruct | MU5 x 1 | ¥21 | ¥10,139 |
Qwen2.5-Instruct-3B | qwen2.5-3b-instruct | MU5 x 1 | ¥21 | ¥10,139 |
Qwen-Flash-2025-07-28 | qwen-flash-2025-07-28 | MU1 x 4 | ¥216 | ¥104,472 |
Qwen-Plus-2025-07-28 | qwen-plus-2025-07-28 | MU1 x 4 | ¥216 | ¥104,472 |
Qwen-Plus-2025-07-28 | qwen-plus-2025-07-28 | MU1 x 16 (PD-separated mode) | ¥864 | ¥417,888 |
Qwen-Plus-2025-12-01 | qwen-plus-2025-12-01 | MU1 x 4 | ¥216 | ¥104,472 |
GLM
Model name | Model code | Model unit | Hourly price (CNY) | Monthly price (CNY) |
GLM-5 | glm-5 | MU3 x 16 (PD-separated mode) | ¥2,192 | ¥1,055,504 |
GLM-4.7 | glm-4.7 | MU6 x 32 (PD-separated mode) | ¥800 | ¥386,848 |
DeepSeek
Model name | Model code | Model unit | Hourly price (CNY) | Monthly price (CNY) |
DeepSeek-v4-Flash | deepseek-v4-flash | MU1 x 8 | ¥432 | ¥208,944 |
DeepSeek-v3.2 | deepseek-v3.2 | MU2 x 16 (PD-separated mode) | ¥1,008 | ¥480,576 |
More models
Model name | Model code | Model unit | Hourly price (CNY) | Monthly price (CNY) |
MiniMax-M2.5 | MiniMax-M2.5 | MU1 x 16 (PD-separated mode) | ¥864 | ¥417,888 |
Kimi-K2.5 | Kimi-K2.5 | MU2 x 8 | ¥504 | ¥240,288 |
Model types:
- Instruct: After deployment, the model runs inference in instruct mode.
- Thinking: After deployment, the model runs inference in thinking mode.
Model deployment types:
- PD-separated mode: Splits model inference into two phases, prefill and decode, which run on separate compute nodes. This reduces first-token latency and improves throughput.
Multimodal
Qwen-VL
Model name | Model code | Model unit | Hourly price (CNY) | Monthly price (CNY) |
Qwen3-VL-235B-A22B-Instruct | qwen3-vl-235b-a22b-instruct | MU1 x 4 | ¥216 | ¥104,472 |
Qwen3-VL-235B-A22B-Thinking | qwen3-vl-235b-a22b-thinking | MU1 x 4 | ¥216 | ¥104,472 |
Qwen3-VL-32B-Instruct | qwen3-vl-32b-instruct | MU2 x 8 | ¥504 | ¥240,288 |
Qwen3-VL-8B-Instruct | qwen3-vl-8b-instruct | MU1 x 2 | ¥108 | ¥52,236 |
Qwen3-VL-4B-Instruct | qwen3-vl-4b-instruct | MU1 x 2 | ¥108 | ¥52,236 |
Qwen3-VL-2B-Instruct | qwen3-vl-2b-instruct | MU5 x 1 | ¥21 | ¥10,139 |
Qwen3-VL-Embedding-2B | qwen3-vl-embedding-2b | MU5 x 1 | ¥21 | ¥10,139 |
Qwen3-VL-Flash-2025-10-15 | qwen3-vl-flash-2025-10-15 | MU1 x 4 | ¥216 | ¥104,472 |
Qwen3-VL-Plus-2025-09-23 | qwen3-vl-plus-2025-09-23 | MU1 x 4 | ¥216 | ¥104,472 |
Qwen-VL-Max-2025-08-13 | qwen-vl-max-2025-08-13 | MU6 x 4 | ¥100 | ¥48,356 |
Qwen-VL-OCR-2025-11-20 | qwen-vl-ocr-2025-11-20 | MU6 x 4 | ¥100 | ¥48,356 |
Qwen Omni
Model name | Model code | Model unit | Hourly price (CNY) | Monthly price (CNY) |
Qwen3.5-Omni-Flash | qwen3.5-omni-flash | MU8 x 1 | ¥47 | ¥22,400 |
Qwen3.5-Omni-Flash | qwen3.5-omni-flash | MU9 x 1 | ¥51 | ¥24,600 |
Qwen3.5-Omni-Plus | qwen3.5-omni-plus | MU9 x 8 | ¥408 | ¥196,800 |
Model types:
- Instruct: After deployment, the model runs in non-thinking mode.
- Thinking: After deployment, the model runs in thinking mode.
- Instruct/Thinking: You can enable or disable thinking mode during model deployment.
Speech synthesis
CosyVoice
Model name | Model code | Model unit | Hourly price (CNY) | Monthly price (CNY) |
cosyvoice-v3-flash | cosyvoice-v3-flash | MU5 | ¥21 | ¥10,139 |
Token-based billing
cost = (number of input tokens × price per input token) + (number of output tokens × price per output token) (minimum billing unit: 1 token)
Token-based billing applies only to custom models fine-tuned from the following foundation models.
Qwen
Foundation model | Model code | Input (CNY per 1,000 tokens) | Output (CNY per 1,000 tokens) |
Qwen3-32B | qwen3-32b | ¥0.002 | non-thinking mode: ¥0.008; thinking mode: ¥0.02 |
Qwen3-14B | qwen3-14b | ¥0.001 | non-thinking mode: ¥0.004; thinking mode: ¥0.01 |
Qwen3-8B | qwen3-8b | ¥0.0005 | non-thinking mode: ¥0.002; thinking mode: ¥0.005 |
Qwen2.5-72B-Instruct | qwen2.5-72b-instruct | ¥0.004 | ¥0.012 |
Qwen2.5-32B-Instruct | qwen2.5-32b-instruct | ¥0.002 | ¥0.006 |
Qwen2.5-14B-Instruct | qwen2.5-14b-instruct | ¥0.001 | ¥0.003 |
Qwen2.5-7B-Instruct | qwen2.5-7b-instruct | ¥0.0005 | ¥0.001 |
Qwen-VL
Foundation model | Model code | Input (CNY per 1,000 tokens) | Output (CNY per 1,000 tokens) |
Qwen3-VL-8B-Instruct | qwen3-vl-8b-instruct | ¥0.0005 | ¥0.002 |
Qwen2.5-VL-72B-Instruct | qwen2.5-vl-72b-instruct | ¥0.016 | ¥0.048 |
Qwen2.5-VL-32B-Instruct | qwen2.5-vl-32b-instruct | ¥0.008 | ¥0.024 |
Qwen2.5-VL-7B-Instruct | qwen2.5-vl-7b-instruct | ¥0.002 | ¥0.005 |
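To illustrate the token-based formula with the prices above, the following sketch compares non-thinking and thinking mode for a custom model fine-tuned from qwen3-8b. The function name `inference_cost` and the token counts are hypothetical. Because the minimum billing unit is 1 token, the sketch assumes that the per-1,000-token prices apply proportionally.

```python
from decimal import Decimal


def inference_cost(input_tokens: int, output_tokens: int,
                   input_price_per_1k: Decimal,
                   output_price_per_1k: Decimal) -> Decimal:
    """cost = input tokens x input price + output tokens x output price.

    Prices are in CNY per 1,000 tokens, so the sum is divided by 1,000.
    """
    total = (Decimal(input_tokens) * input_price_per_1k
             + Decimal(output_tokens) * output_price_per_1k)
    return total / Decimal(1000)


# Hypothetical call: 12,000 input tokens and 3,000 output tokens.
# qwen3-8b prices: input 0.0005, output 0.002 (non-thinking) or 0.005 (thinking).
non_thinking = inference_cost(12_000, 3_000, Decimal("0.0005"), Decimal("0.002"))
thinking = inference_cost(12_000, 3_000, Decimal("0.0005"), Decimal("0.005"))
print(f"non-thinking: CNY {non_thinking:.4f}")  # non-thinking: CNY 0.0120
print(f"thinking: CNY {thinking:.4f}")  # thinking: CNY 0.0210
```

For Qwen3 foundation models, the mode affects only the output price. The input price is the same in both modes.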
Image generation models – Wan
Deployment is free. Invocations are billed at the standard rate of the fine-tuned base model. For the training workflow, see Fine-tune image generation models.
Model ID | LoRA Deployment & Invocation Price |
wan2.7-image-pro | CNY 0.50/image |
wan2.7-image | CNY 0.20/image |
FAQ
Q: When does billing for model deployment start?
A: Billing starts when the model status changes to Running. No charges apply while the status is Deploying, Overdue Payment, or Deployment Failed.
For monthly subscriptions, the billing period starts when the status changes to Running.
Q: Am I charged if I cancel a training job?
A: Yes. If you cancel training manually, you are charged for all tokens processed before cancellation. Training jobs interrupted by system errors or other non-user causes are not charged.
Q: How do I view invocation statistics for a deployed model?
A: Visit the Model Monitoring (Beijing), Model Monitoring (Virginia), or Model Monitoring (Singapore) page.
