How to query model inference bills, allocate costs, settle overdue payments, and stop billing

This topic describes how to query billing details, allocate costs, handle overdue payments, and stop billing.

Query bills

Bills are generated only after a call is completed. Bills for model inference are generated with minute-level granularity, typically within 2 to 10 minutes of the call. Bills for services such as batch inference, model training, and knowledge base are generated hourly. During peak business hours, bill generation may take longer; in that case, the time displayed in the system is the final billing time.

Cost overview

Log on to the Model Studio console. Click the Model tab at the top. In the left-side navigation pane, choose Usage & Billing > Cost overview, then select a billing month.

This page displays costs for model inference only. To view costs for services such as model training and knowledge base, see Billing Details.

View total consumption and breakdown: The top of the page shows the Total amount for the month, broken down into Subscription (such as Token Plan, provisioned throughput, model units, and savings plan) and Bill (for pay-as-you-go model calls and training). Click View details on the Subscription or Bill card to view itemized costs. The Bill trend chart below displays only pay-as-you-go amounts and excludes prepaid subscription fees. When viewing data for the current month, the Bill amount is typically updated first, while the Bill trend chart may lag slightly.
Query costs by model or API Key: In the Bill trend section, select a target model from the Model drop-down list or filter by API Key ID, then switch to the List view. The Payable amount column in the List view shows the cumulative monthly cost for the selected item.
Compare spending trends: Set Grouping to Category and compare the spending trends for model deployment, inference, and training on a Daily or Monthly basis.
Set billing alerts: On the Bill card, click Edit next to Bill alert. In the Usage limit & alerts panel, enable a monthly limit, set a threshold, and configure email/SMS notifications. You are notified when your spending reaches the threshold. This helps you avoid service interruptions caused by overdue payments.

Billing details

Bills for large model inference, deployment, and training can be broken down for review by API Key ID, workspace ID, model name, input/output type, invocation channel, and instance tag.

1. Download the bill

On the Billing Details page, select a billing month.
Select Product Name as Alibaba Cloud Model Studio, and click Search.
In the upper-right corner of the bill list, click the export icon to download the bill.
Open the file, locate the Instance ID (Billing Granularity) column, and interpret it using the rules described in the next section.

2. Interpret key fields

The "Instance ID (Billing Granularity)" field uses a semicolon (;) as the delimiter. The full format is ApiKeyID;workspace ID;model name;input/output type;invocation channel;free quota exhaustion flag.

Format A: Standard call (with ApiKeyID)
- Example: 12xxx;llm-xxx;qwen-max;output_token;app;0
- This represents the following information in order: ApiKeyID;workspace ID;model name;input/output type;invocation channel;free quota exhaustion flag.
Format B: Console call (without ApiKeyID)
- Example: ;llm-xxx;qwen-max;output_token;app;0
- This represents the following information in order: ;workspace ID;model name;input/output type;invocation channel;free quota exhaustion flag.
- If the ApiKeyID is not included, the cost is typically from a call made in the Alibaba Cloud Model Studio console, not an API call.
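The two formats above can be handled by one small parser. The following is a minimal sketch, not an official tool: the `InstanceId` type and the `parse_instance_id` function are hypothetical names introduced here, and the sketch assumes the field always contains exactly six semicolon-separated parts, as in the examples above.

```python
from typing import NamedTuple, Optional


class InstanceId(NamedTuple):
    """Hypothetical container for the six fields of the Instance ID column."""
    api_key_id: Optional[str]  # None when the field is empty (Format B)
    workspace_id: str
    model_name: str
    io_type: str
    channel: str
    free_quota_flag: str       # kept as the raw string from the bill


def parse_instance_id(raw: str) -> InstanceId:
    """Split an Instance ID (Billing Granularity) value into its fields."""
    parts = [part.strip() for part in raw.split(";")]
    if len(parts) != 6:
        raise ValueError(f"expected 6 fields, got {len(parts)}: {raw!r}")
    api_key_id, workspace_id, model_name, io_type, channel, flag = parts
    # An empty first field means the bill line has no ApiKeyID.
    return InstanceId(api_key_id or None, workspace_id, model_name,
                      io_type, channel, flag)


standard = parse_instance_id("12xxx;llm-xxx;qwen-max;output_token;app;0")
console = parse_instance_id(";llm-xxx;qwen-max;output_token;app;0")
print(standard.api_key_id, standard.model_name)
print(console.api_key_id, console.model_name)
```

Returning `None` for a missing ApiKeyID makes console calls easy to separate from API calls when you filter parsed rows.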

"Instance Tag": If you use tags for cost allocation, this column uses the following format:

Example: key:test1 value:test1; key:test2 value:test2
key represents the tag key, and value represents the tag value.
Multiple tags are separated by semicolons (;).
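The tag format above can be read into a dictionary as follows. This is a sketch under stated assumptions: `parse_instance_tags` is a hypothetical helper name, and the pattern assumes that tag keys and values contain no whitespace or semicolons, which the example suggests but the format description does not guarantee.

```python
import re
from typing import Dict

# Matches one "key:<k> value:<v>" pair; assumes no spaces or semicolons
# inside the key or the value.
_TAG_PATTERN = re.compile(r"key:(?P<key>[^\s;]+)\s+value:(?P<value>[^\s;]*)")


def parse_instance_tags(raw: str) -> Dict[str, str]:
    """Turn an Instance Tag cell into a {tag key: tag value} mapping."""
    tags: Dict[str, str] = {}
    for chunk in raw.split(";"):
        match = _TAG_PATTERN.search(chunk)
        if match:
            tags[match.group("key")] = match.group("value")
    return tags


tags = parse_instance_tags("key:test1 value:test1; key:test2 value:test2")
print(tags)
```

An empty cell yields an empty dictionary, so untagged bill lines need no special handling.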

3. Data traceability and terms

Query API Key: Copy the ApiKeyID from your bill and go to the Model Studio API Key Management page to find the corresponding key name.
Query workspace: Copy the Workspace ID from your bill, go to the Model Studio console, click Default Workspace at the bottom of the left-side navigation pane, and then click Workspace Details to confirm the workspace ID. You can also switch to other workspaces.
Invocation channel descriptions:
- app: A call made from an application (via code).
- bmp: A call made from the Playground in the console.
- assistant-api: A call made via the Assistant API.

Cost allocation

Attach a tag to a workspace to allocate costs by department or project.

Get workspace information: In Workspaces, identify the Workspace ID of the workspace to tag (for example, llm-xxx). Then, in Billing Details, find the workspace's region.
Attach a tag:
1. On the Tag Management page, select Attach Tags to Resources.
2. Set Resource Selection Method to Enter Multiple Resource IDs. On the product tab, search for and select Alibaba Cloud Model Studio: Workspace, select the corresponding workspace region, and enter the Workspace ID in the resource ID field. Then, click the button to attach the tag.
3. On the attach tags page, create a tag key-value pair or use an existing tag to bind to the workspace. After you enter the key-value pair or select a preset tag, click OK.
4. Enable the tag. Navigate to Cost Allocation Tags. In the tag key search box, enter the tag's key, and click Search. Find your tag and click Enable in the Actions column.
Verify: The cost allocation takes effect the next day (T+1). You can verify that the tag is correctly attached to the workspace by checking the Instance Tag column on the Billing Details page.

Overdue payments

An account is considered overdue if its available balance is less than zero, which may lead to the suspension of services such as model calls. You can check your balance by hovering over the available balance area on the Billing Management homepage. The balance is calculated by using the following formula: Available Balance = (Cash Balance + Credit Limit) - (Unsettled Amount for Current Month + Unsettled Amount for Previous Months).
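As a worked example of the formula above, the following sketch computes the available balance and checks the overdue condition. The function names and the amounts are illustrative assumptions, not values from a real account.

```python
from decimal import Decimal


def available_balance(cash_balance: Decimal,
                      credit_limit: Decimal,
                      unsettled_current_month: Decimal,
                      unsettled_previous_months: Decimal) -> Decimal:
    """Available Balance = (Cash + Credit) - (Unsettled current + previous)."""
    return (cash_balance + credit_limit) - (
        unsettled_current_month + unsettled_previous_months)


def is_overdue(balance: Decimal) -> bool:
    """An account is overdue when its available balance is below zero."""
    return balance < 0


# Illustrative amounts only.
balance = available_balance(
    cash_balance=Decimal("50.00"),
    credit_limit=Decimal("0.00"),
    unsettled_current_month=Decimal("42.50"),
    unsettled_previous_months=Decimal("12.00"),
)
print(balance, is_overdue(balance))  # -4.50 True
```

`Decimal` is used instead of `float` so that currency amounts are not affected by binary rounding.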

Impact of overdue payments: An overdue account causes pay-as-you-go services such as model calls to be suspended. Whether you can continue to call models depends on your billing method.
- Free quota, savings plans, and resource plans: All three are used to offset pay-as-you-go charges. During an overdue period, you cannot call models even if the quota or plan still has a remaining balance. Service is restored after you settle the overdue payment.
- Coding Plan and Token Plan: The plan's quota is independent of your account balance and can be used during the overdue period. However, automatic renewal will fail, and you cannot renew the plan after it expires.
Settle overdue payments: On the Expenses and Costs page, click Recharge, enter an amount, and then complete the payment.
Prevent overdue payments: On the High consumption alerts page, set a spending threshold to be notified when the threshold is reached.

Stop billing

If you no longer use Model Studio, follow the instructions below to stop the related services and prevent further charges.

Stop model inference: Stop making API calls from your code and stop using the Playground in the console to prevent further charges. To prevent accidental calls, you can delete your keys on the API-KEY page.
Stop model training: You are not charged when no model training tasks are running.
Cancel a Coding Plan subscription: Coding Plan is a monthly subscription product that automatically stops at the end of the subscription period. Mid-term cancellation and refunds are not supported. If you have auto-renewal enabled, disable it on the Coding Plan page.
Unsubscribe from Token Plan Team Edition: On the My Subscriptions page of the Token Plan console, you can unsubscribe seats that have not been used, and a refund is issued to the original payment account. If you do not want to renew your subscription, disable auto-renewal.
Stop model deployment: The procedure depends on the billing method you chose for the deployment:
- Pay-as-you-go by model calls: Unpublish the deployed model, or delete the API Key to prevent accidental calls.
- Pay-as-you-go by computing resource usage: Unpublish the deployed model.
- Subscription (prepaid monthly): Unpublish the deployed model. Then, navigate to the Refund Management page to unsubscribe from the instance. Your refund is the original payment minus the cost of usage. For more information, see the refund policy.

FAQ

Why can't I find a bill after calling a model?

Cause:

Billing latency: Model inference bills are aggregated by the minute and typically appear 2 to 10 minutes after a call. Bills for batch inference, model training, and knowledge base are aggregated hourly. Bill generation may be further delayed during peak hours.
Use of a non-commercial model: Models in public preview or invite-only testing do not generate billing records.

Solution: Wait for the billing interval to pass and then check again.

Why does the same model have multiple entries in my bill?

Cause: The same model is billed separately by billing type (such as input tokens, output tokens, or cache hits) and by invocation channel (such as an API call or the Playground in the console). For example, a single API call to qwen3.6-plus generates two entries: one for "input tokens" and one for "output tokens".

Solution: Use the Instance ID field in Billing Details to understand the specifics of each line item.

Many bill entries are named "Large Model Text Consumption". How do I identify the model for each entry?

Cause: The "Billable Item" column on the bill is uniformly labeled "Large Model Text Consumption" and does not show the specific model name.

Solution: Check the Instance ID (Billing Granularity) column on the Billing Details page. This field is a semicolon-separated string. The part of the string that immediately follows the workspace ID (such as llm-xxx) is the model name. For example, in 12xxx;llm-xxx;qwen3.6-plus;context_0-128k_input_token;bmp;0, the model is qwen3.6-plus.
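If you have downloaded the bill, the same lookup can be applied to every row to total costs per model. The following is a sketch only: the amount column name `Payable Amount` and the sample rows are assumptions made for illustration, so adjust the column names to match your exported file.

```python
import csv
import io
from collections import defaultdict
from decimal import Decimal
from typing import Dict

INSTANCE_COLUMN = "Instance ID (Billing Granularity)"
AMOUNT_COLUMN = "Payable Amount"  # assumed column name; check your export

# Illustrative rows, not real billing data.
SAMPLE_CSV = """Instance ID (Billing Granularity),Payable Amount
12xxx;llm-xxx;qwen3.6-plus;context_0-128k_input_token;bmp;0,0.12
12xxx;llm-xxx;qwen3.6-plus;output_token;app;0,0.30
;llm-xxx;qwen-max;output_token;app;0,0.05
"""


def cost_by_model(csv_text: str) -> Dict[str, Decimal]:
    """Sum the amount column per model name (third semicolon field)."""
    totals: Dict[str, Decimal] = defaultdict(Decimal)
    for row in csv.DictReader(io.StringIO(csv_text)):
        fields = row[INSTANCE_COLUMN].split(";")
        model_name = fields[2]  # the field right after the workspace ID
        totals[model_name] += Decimal(row[AMOUNT_COLUMN])
    return dict(totals)


totals = cost_by_model(SAMPLE_CSV)
for model_name, amount in sorted(totals.items()):
    print(model_name, amount)
```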

Where can I view model call counts and statistics?

Go to the Alibaba Cloud Model Studio console and select the target region in the upper-right corner. Click the Model tab at the top. In the left-side navigation pane, choose Usage & Billing > Model usage.

Is pay-as-you-go billed in real time?

No. Alibaba Cloud uses a "reserve and settle monthly" model for pay-as-you-go billing. The system reserves an amount from your available balance to cover usage, and then generates a final bill and deducts the actual cost at the end of the billing cycle (early in the next month).

How do I export detailed bills for reimbursement?

See How to export detailed bills.

How do I top up my account?

See How to top up and make payments.

Why do I have an overdue payment even though I have barely used the service?

Cause: Additional features of Model Studio, such as web search, are billed separately (post-paid) based on the number of calls and are invoiced separately from model inference fees. Even if you have not used the console recently, web search fees are still incurred for each call made by an application or by code in which the enable_search parameter is enabled.

Solution:

In Billing Details, filter for Alibaba Cloud Model Studio and check the Instance ID (Billing Granularity) column to identify the model name and invocation channel that incurred the cost.
Check whether enable_search is enabled in your application code or Model Studio application configuration. If you no longer need web search, set this parameter to false or remove it.
If you have stopped all calls but are still being charged, check if other API keys or applications are still running. You can find and delete unused keys on the API Key Management page.
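For the second step, one way to make sure web search is off is to normalize the request parameters before each call. The following is a minimal sketch: the flat dictionary shape and the `disable_web_search` helper are hypothetical, and where enable_search sits in your request depends on the SDK or API you use.

```python
from typing import Any, Dict


def disable_web_search(params: Dict[str, Any]) -> Dict[str, Any]:
    """Return a copy of the request parameters with web search turned off."""
    cleaned = dict(params)             # leave the caller's dictionary untouched
    cleaned["enable_search"] = False   # explicit False instead of relying on a default
    return cleaned


# Hypothetical request parameters left over from older code.
old_params = {"model": "qwen-max", "enable_search": True}
new_params = disable_web_search(old_params)
print(new_params)
```

Setting the parameter explicitly to false, rather than only deleting it, makes the intent visible to anyone who reviews the code later.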

Why am I charged even without actively making API calls?

Cause: Model deployment in Model Studio is billed by duration. Billing starts as soon as the deployment status changes to Running and continues for as long as the status remains Running, even if you never call the model via API. By contrast, model inference is billed by token usage, so inference fees are incurred only when API calls are made.

Solution:

Go to the Model Deployment page to check whether any deployed models have the status Running. If you no longer need them, take them offline to stop billing.
Delete unused API keys to prevent accidental API calls from generating inference fees (note: deletion is irreversible, proceed with caution). You can manage your keys on the API Key Management page.

How do I determine if my account has been compromised?

If you suspect your account has been accessed by others and unexpected charges have been incurred, follow these steps to investigate:

In Billing Details, filter by Model Studio, and check the ApiKeyID in the Instance ID (Billing Granularity) column to identify the API Key generating the charges.
Go to the API Key Management page, review the creation time of each key, and confirm whether it was created by you. The API Key Management page only shows creation time, not invocation time.
Review the call time distribution to identify any abnormal invocation patterns that may not be your own: Go to Model Usage, filter by Model or API Key ID, and switch to the List view to see the invocation time distribution.
If unauthorized invocations are found, immediately go to the API Key Management page to delete the corresponding API Key and regenerate it. Update all legitimate callers to use the new key.
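The first two steps can be partly automated by comparing the ApiKeyIDs that appear in the bill against the keys you recognize. The following is a sketch under stated assumptions: `unrecognized_key_ids` is a hypothetical helper, and the sample values (including the key ID `99xxx`) are placeholders rather than real billing data.

```python
from collections import Counter
from typing import Dict, Iterable, Set


def unrecognized_key_ids(instance_ids: Iterable[str],
                         known_key_ids: Set[str]) -> Dict[str, int]:
    """Count bill lines per ApiKeyID that is not in the known set."""
    counts: Counter = Counter()
    for raw in instance_ids:
        api_key_id = raw.split(";", 1)[0].strip()  # first semicolon field
        # An empty ApiKeyID means there is no key to check for that line.
        if api_key_id and api_key_id not in known_key_ids:
            counts[api_key_id] += 1
    return dict(counts)


# Placeholder values from the Instance ID (Billing Granularity) column.
bill_lines = [
    "12xxx;llm-xxx;qwen-max;output_token;app;0",
    "99xxx;llm-xxx;qwen-max;context_0-128k_input_token;app;0",
    ";llm-xxx;qwen-max;output_token;bmp;0",
    "99xxx;llm-xxx;qwen-max;output_token;app;0",
]
suspicious = unrecognized_key_ids(bill_lines, known_key_ids={"12xxx"})
print(suspicious)  # {'99xxx': 2}
```

A key that appears in the result is not proof of compromise; it only tells you which keys to review on the API Key Management page.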

Why am I charged without purchasing a Model Studio plan?

Cause: Model Studio API calls use the pay-as-you-go billing method by default. After you activate Model Studio, you can call the API immediately and pay based on actual usage, without purchasing a resource plan or savings plan. The system automatically charges you based on the token usage and corresponding pricing of the model you call. For detailed pricing information, see Model inference pricing.