AI PaaS billing rules
This document explains the billing methods for AI PaaS model services and cloud resources.
1. Model service pricing
1. Billing
Billing is based on the number of tokens consumed. Each model has separate rates for input and output tokens, and charges are calculated per call.
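As a rough illustration of the per-call calculation described above, the following sketch multiplies input and output token counts by their respective rates. The function name `call_cost` and the use of `Decimal` are illustrative choices, not part of the service; the rates are the qwen-max values from the price table.

```python
from decimal import Decimal


def call_cost(input_tokens: int, output_tokens: int,
              input_price: Decimal, output_price: Decimal) -> Decimal:
    """Return the cost of one call in CNY.

    Prices are quoted per 1,000 tokens, so each is divided by 1000
    to get a per-token rate. (Hypothetical helper for illustration.)
    """
    per_token_in = input_price / Decimal(1000)
    per_token_out = output_price / Decimal(1000)
    return input_tokens * per_token_in + output_tokens * per_token_out


# qwen-max: input 0.0024, output 0.0096 CNY per 1k tokens.
# 2,000 input tokens and 500 output tokens cost 0.0048 + 0.0048 = 0.0096 CNY.
cost = call_cost(2000, 500, Decimal("0.0024"), Decimal("0.0096"))
print(cost)
```

`Decimal` is used here only to avoid binary floating-point rounding on very small amounts; it does not reflect how the platform itself rounds charges.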
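Model | Pricing tier | Type | Price (CNY/1k tokens) |
qwen-flash | 0k-128k | input | 0.0002 |
qwen-flash | 0k-128k | input (thinking) | 0.0002 |
qwen-flash | 0k-128k | input (cache hit) | 0.0001 |
qwen-flash | 0k-128k | input (thinking mode cache hit) | 0.0001 |
qwen-flash | 0k-128k | output | 0.0015 |
qwen-flash | 0k-128k | output (thinking) | 0.0015 |
qwen-flash | 128k-256k | input | 0.0006 |
qwen-flash | 128k-256k | input (thinking) | 0.0006 |
qwen-flash | 128k-256k | input (cache hit) | 0.0002 |
qwen-flash | 128k-256k | input (thinking mode cache hit) | 0.0002 |
qwen-flash | 128k-256k | output | 0.006 |
qwen-flash | 128k-256k | output (thinking) | 0.006 |
qwen-flash | 256k-1m | input | 0.0012 |
qwen-flash | 256k-1m | input (thinking) | 0.0012 |
qwen-flash | 256k-1m | input (cache hit) | 0.0003 |
qwen-flash | 256k-1m | input (thinking mode cache hit) | 0.0003 |
qwen-flash | 256k-1m | output | 0.012 |
qwen-flash | 256k-1m | output (thinking) | 0.012 |
qwen3-30b-a3b-instruct-2507 | - | input | 0.0008 |
qwen3-30b-a3b-instruct-2507 | - | output | 0.003 |
qwen-plus | 0k-128k | input | 0.0008 |
qwen-plus | 0k-128k | input (thinking) | 0.0008 |
qwen-plus | 0k-128k | input (cache hit) | 0.0002 |
qwen-plus | 0k-128k | input (thinking mode cache hit) | 0.0002 |
qwen-plus | 0k-128k | output | 0.002 |
qwen-plus | 0k-128k | output (thinking) | 0.008 |
qwen-plus | 128k-256k | input | 0.0024 |
qwen-plus | 128k-256k | input (thinking) | 0.0024 |
qwen-plus | 128k-256k | input (cache hit) | 0.0005 |
qwen-plus | 128k-256k | input (thinking mode cache hit) | 0.0005 |
qwen-plus | 128k-256k | output | 0.02 |
qwen-plus | 128k-256k | output (thinking) | 0.024 |
qwen-plus | 256k-1m | input | 0.0048 |
qwen-plus | 256k-1m | input (thinking) | 0.0048 |
qwen-plus | 256k-1m | input (cache hit) | 0.001 |
qwen-plus | 256k-1m | input (thinking mode cache hit) | 0.001 |
qwen-plus | 256k-1m | output | 0.048 |
qwen-plus | 256k-1m | output (thinking) | 0.064 |
qwen3-235b-a22b | - | input | 0.002 |
qwen3-235b-a22b | - | output | 0.008 |
qwen3-32b | - | input | 0.002 |
qwen3-32b | - | input (thinking) | 0.002 |
qwen3-32b | - | output | 0.008 |
qwen3-32b | - | output (thinking) | 0.02 |
qwen3-30b-a3b-thinking-2507 | - | input (thinking) | 0.0008 |
qwen3-30b-a3b-thinking-2507 | - | output (thinking) | 0.0075 |
qwen-turbo | - | input | 0.0003 |
qwen-turbo | - | input (thinking) | 0.0003 |
qwen-turbo | - | input (cache hit) | 0.0001 |
qwen-turbo | - | input (thinking mode cache hit) | 0.0001 |
qwen-turbo | - | output | 0.0006 |
qwen-turbo | - | output (thinking) | 0.003 |
qwen-max-2025-01-25 | - | input | 0.0024 |
qwen-max-2025-01-25 | - | output | 0.0096 |
qwen-max | - | input | 0.0024 |
qwen-max | - | input (cache hit) | 0.0005 |
qwen-max | - | output | 0.0096 |
qwen-vl-plus | - | input | 0.0008 |
qwen-vl-plus | - | input (cache hit) | 0.0002 |
qwen-vl-plus | - | output | 0.002 |
qwen-vl-max | - | input | 0.0016 |
qwen-vl-max | - | input (cache hit) | 0.0004 |
qwen-vl-max | - | output | 0.004 |
qwen2.5-vl-72b-instruct | - | input | 0.016 |
qwen2.5-vl-72b-instruct | - | output | 0.048 |
qwen3-next-80b-a3b-instruct | - | input | 0.001 |
qwen3-next-80b-a3b-instruct | - | output | 0.004 |
qwen3-235b-a22b-instruct-2507 | - | input | 0.002 |
qwen3-235b-a22b-instruct-2507 | - | output | 0.008 |
qwen3-vl-235b-a22b-instruct | - | input | 0.002 |
qwen3-vl-235b-a22b-instruct | - | output | 0.008 |
qwen3-vl-flash | 0k-32k | input | 0.0002 |
qwen3-vl-flash | 0k-32k | input (cache hit) | 0.0001 |
qwen3-vl-flash | 0k-32k | output | 0.0015 |
qwen3-vl-flash | 32k-128k | input | 0.0003 |
qwen3-vl-flash | 32k-128k | input (cache hit) | 0.0001 |
qwen3-vl-flash | 32k-128k | output | 0.003 |
qwen3-vl-flash | 128k-256k | input | 0.0006 |
qwen3-vl-flash | 128k-256k | input (cache hit) | 0.0002 |
qwen3-vl-flash | 128k-256k | output | 0.006 |
qwen3-vl-plus | 0k-32k | input | 0.001 |
qwen3-vl-plus | 0k-32k | input (cache hit) | 0.0002 |
qwen3-vl-plus | 0k-32k | output | 0.01 |
qwen3-vl-plus | 32k-128k | input | 0.0015 |
qwen3-vl-plus | 32k-128k | input (cache hit) | 0.0003 |
qwen3-vl-plus | 32k-128k | output | 0.015 |
qwen3-vl-plus | 128k-256k | input | 0.003 |
qwen3-vl-plus | 128k-256k | input (cache hit) | 0.0006 |
qwen3-vl-plus | 128k-256k | output | 0.03 |
qwen-plus-2025-12-01 | 0k-128k | input | 0.0008 |
qwen-plus-2025-12-01 | 0k-128k | input (thinking) | 0.0008 |
qwen-plus-2025-12-01 | 0k-128k | output | 0.002 |
qwen-plus-2025-12-01 | 0k-128k | output (thinking) | 0.008 |
qwen-plus-2025-12-01 | 128k-256k | input | 0.0024 |
qwen-plus-2025-12-01 | 128k-256k | input (thinking) | 0.0024 |
qwen-plus-2025-12-01 | 128k-256k | output | 0.02 |
qwen-plus-2025-12-01 | 128k-256k | output (thinking) | 0.024 |
qwen-plus-2025-12-01 | 256k-1m | input | 0.0048 |
qwen-plus-2025-12-01 | 256k-1m | input (thinking) | 0.0048 |
qwen-plus-2025-12-01 | 256k-1m | output | 0.048 |
qwen-plus-2025-12-01 | 256k-1m | output (thinking) | 0.064 |
qwen3-coder-plus | 0k-32k | input | 0.004 |
qwen3-coder-plus | 0k-32k | input (cache hit) | 0.0008 |
qwen3-coder-plus | 0k-32k | output | 0.016 |
qwen3-coder-plus | 32k-128k | input | 0.006 |
qwen3-coder-plus | 32k-128k | input (cache hit) | 0.0012 |
qwen3-coder-plus | 32k-128k | output | 0.024 |
qwen3-coder-plus | 128k-256k | input | 0.01 |
qwen3-coder-plus | 128k-256k | input (cache hit) | 0.002 |
qwen3-coder-plus | 128k-256k | output | 0.04 |
qwen3-coder-plus | 256k-1m | input | 0.02 |
qwen3-coder-plus | 256k-1m | input (cache hit) | 0.004 |
qwen3-coder-plus | 256k-1m | output | 0.08 |
qwen3-max | 0k-32k | input | 0.0032 |
qwen3-max | 0k-32k | input (cache hit) | 0.0007 |
qwen3-max | 0k-32k | output | 0.0128 |
qwen3-max | 32k-128k | input | 0.0064 |
qwen3-max | 32k-128k | input (cache hit) | 0.0013 |
qwen3-max | 32k-128k | output | 0.0256 |
qwen3-max | 128k-256k | input | 0.0096 |
qwen3-max | 128k-256k | input (cache hit) | 0.002 |
qwen3-max | 128k-256k | output | 0.0384 |
qwen3-vl-plus-2025-12-19 | 0k-32k | input | 0.001 |
qwen3-vl-plus-2025-12-19 | 0k-32k | output | 0.01 |
qwen3-vl-plus-2025-12-19 | 32k-128k | input | 0.0015 |
qwen3-vl-plus-2025-12-19 | 32k-128k | output | 0.015 |
qwen3-vl-plus-2025-12-19 | 128k-256k | input | 0.003 |
qwen3-vl-plus-2025-12-19 | 128k-256k | output | 0.03 |
qwen-plus-2025-07-28 | 0k-128k | input | 0.0008 |
qwen-plus-2025-07-28 | 0k-128k | input (thinking) | 0.0008 |
qwen-plus-2025-07-28 | 0k-128k | output | 0.002 |
qwen-plus-2025-07-28 | 0k-128k | output (thinking) | 0.008 |
qwen-plus-2025-07-28 | 128k-256k | input | 0.0024 |
qwen-plus-2025-07-28 | 128k-256k | input (thinking) | 0.0024 |
qwen-plus-2025-07-28 | 128k-256k | output | 0.02 |
qwen-plus-2025-07-28 | 128k-256k | output (thinking) | 0.024 |
qwen-plus-2025-07-28 | 256k-1m | input | 0.0048 |
qwen-plus-2025-07-28 | 256k-1m | input (thinking) | 0.0048 |
qwen-plus-2025-07-28 | 256k-1m | output | 0.048 |
qwen-plus-2025-07-28 | 256k-1m | output (thinking) | 0.064 |
qwen3-0.6b | - | input | 0.0003 |
qwen3-0.6b | - | input (thinking) | 0.0003 |
qwen3-0.6b | - | output | 0.0012 |
qwen3-0.6b | - | output (thinking) | 0.003 |
qwen3-1.7b | - | input | 0.0003 |
qwen3-1.7b | - | input (thinking) | 0.0003 |
qwen3-1.7b | - | output | 0.0012 |
qwen3-1.7b | - | output (thinking) | 0.003 |
qwen3-4b | - | input | 0.0003 |
qwen3-4b | - | input (thinking) | 0.0003 |
qwen3-4b | - | output | 0.0012 |
qwen3-4b | - | output (thinking) | 0.003 |
qwen3-max-2026-01-23 | 0k-32k | input | 0.0025 |
qwen3-max-2026-01-23 | 0k-32k | output | 0.01 |
qwen3-max-2026-01-23 | 32k-128k | input | 0.004 |
qwen3-max-2026-01-23 | 32k-128k | output | 0.016 |
qwen3-max-2026-01-23 | 128k-256k | input | 0.007 |
qwen3-max-2026-01-23 | 128k-256k | output | 0.028 |
qwen3.5-plus | 0k-128k | input | 0.0008 |
qwen3.5-plus | 0k-128k | output | 0.0048 |
qwen3.5-plus | 128k-256k | input | 0.002 |
qwen3.5-plus | 128k-256k | output | 0.012 |
qwen3.5-plus | 256k-1m | input | 0.004 |
qwen3.5-plus | 256k-1m | output | 0.024 |
qwen3.5-flash | 0k-128k | input | 0.0002 |
qwen3.5-flash | 0k-128k | output | 0.002 |
qwen3.5-flash | 128k-256k | input | 0.0008 |
qwen3.5-flash | 128k-256k | output | 0.008 |
qwen3.5-flash | 256k-1m | input | 0.0012 |
qwen3.5-flash | 256k-1m | output | 0.012 |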
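For models with pricing tiers, the rate depends on which tier a call falls into. The sketch below looks up the tier for a qwen-plus call in non-thinking mode and prices the call at that tier's rates. It rests on assumptions this document does not state: that the tier is selected by the number of input tokens in the call, that "k" means 1,000 tokens, and that each upper bound is inclusive. The names `QWEN_PLUS_TIERS` and `tiered_call_cost` are hypothetical.

```python
from decimal import Decimal

# (upper bound in input tokens, input price, output price), CNY per 1k tokens.
# Rates are the qwen-plus "input" and "output" rows from the table above.
# ASSUMPTION: tiers are keyed on the call's input token count, with k = 1,000.
QWEN_PLUS_TIERS = [
    (128_000, Decimal("0.0008"), Decimal("0.002")),
    (256_000, Decimal("0.0024"), Decimal("0.02")),
    (1_000_000, Decimal("0.0048"), Decimal("0.048")),
]


def tiered_call_cost(input_tokens: int, output_tokens: int,
                     tiers=QWEN_PLUS_TIERS) -> Decimal:
    """Price one call at the rates of the first tier that fits its input size."""
    for upper_bound, input_price, output_price in tiers:
        if input_tokens <= upper_bound:
            total = input_tokens * input_price + output_tokens * output_price
            return total / Decimal(1000)  # prices are per 1,000 tokens
    raise ValueError("input exceeds the largest pricing tier")


# 100,000 input tokens fall in the 0k-128k tier: (80 + 2) / 1000 = 0.082 CNY.
small = tiered_call_cost(100_000, 1_000)
# 200,000 input tokens fall in the 128k-256k tier: (480 + 20) / 1000 = 0.5 CNY.
large = tiered_call_cost(200_000, 1_000)
```

Cache-hit and thinking-mode rows could be handled the same way by adding their rates to each tier entry; how cached and uncached input tokens are split within a call is not specified here, so the sketch leaves that out.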
2. Cloud resource billing methods
SAE
PAI
ApsaraMQ for RabbitMQ
PolarDB
ApsaraDB for ClickHouse
DTS
Alibaba Cloud DNS PrivateZone
Alibaba Cloud Elasticsearch
ApsaraDB for RDS
Tair
SLB
MSE
Log Service
ACK
OpenSearch
NAT Gateway
DMS
CloudMonitor
ECS
OSS