FAQ - Alibaba Cloud Model Studio - Alibaba Cloud Help Center

Billing

What are the unit prices for the models in Alibaba Cloud Model Studio?

For model descriptions, see the Model Studio console. For pricing information, see Model inference pricing.
How are model deployment costs calculated?

See Model deployment billing.
What is the unit price for training Qwen models?

See Model training pricing.
Are there any prepaid services available?

Yes, some models can be used with a prepaid service. For more information, see Savings plans.
Are pay-as-you-go bills settled monthly?

Bills are generated by the minute and settled monthly.
How can I view my charges and billing details?

Go to Expenses and Costs to view the details.
How can I request an invoice for my expenses?

Log on to the Expenses and Costs console, go to the Invoice Management page, and click the Issue Invoice tab to request an invoice.
When activating the service, I get a message: "Your account's available credit is less than 0. Please top up before trying to purchase." What should I do?

To activate the service, your Alibaba Cloud account must have a balance of at least 0 CNY.
Does Wan membership support Alibaba Cloud Model Studio API calls?

No. Wan membership benefits do not apply to Model Studio API calls because they use separate billing systems.

API/SDK

Why do I get a "Missing Parameters" error (code 100004) when I call the Completion API?

This error indicates that a required parameter is missing. If you have included all required parameters, verify that the parameter format is correct.

The following is a correct example:

curl --location 'https://bailian.aliyuncs.com/v2/app/completions' \
--header 'Content-Type: application/json' \
--header 'Authorization: Bearer 85763*************cf050f' \
--data '{
"RequestId":"B8265C3E-9248-56C0-8665-A37A12F06F6B",
"AppId":"3cc760a7ef5d47d09255dd28b06b94d8",
"Prompt":"What is the weather like in Shenzhen today?",
"User":"1",
"Bot":"1"
}'
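
Before sending a request, it can help to check the payload locally. The following Python sketch assumes that RequestId, AppId, and Prompt are the required fields. This is inferred only from the example above and is not confirmed by the API reference. find_missing_params is a hypothetical helper, not part of any SDK.

```python
import json

# Assumption: these fields are treated as required, based only on the curl example above.
ASSUMED_REQUIRED = ("RequestId", "AppId", "Prompt")


def find_missing_params(payload, required=ASSUMED_REQUIRED):
    """Return the required keys that are absent, empty, or not strings."""
    missing = []
    for key in required:
        value = payload.get(key)
        if not isinstance(value, str) or not value.strip():
            missing.append(key)
    return missing


payload = {
    "RequestId": "B8265C3E-9248-56C0-8665-A37A12F06F6B",
    "AppId": "3cc760a7ef5d47d09255dd28b06b94d8",
    "Prompt": "What is the weather like in Shenzhen today?",
}

missing = find_missing_params(payload)
body = json.dumps(payload)  # serialize to a JSON string for the request body
print("missing:", missing)
```

If the list is empty and the error persists, the format of a value (for example, a number sent where a string is expected) is the next thing to check.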

How can I find information about error codes?

API calls to Alibaba Cloud Model Studio return a status code that indicates the result of the call. For details about each code and its solution, see Error codes.
How do I install the SDK?

Alibaba Cloud Model Studio provides SDKs for Java and Python. For instructions, see Install the SDK.
When using a function call with the Assistant API, can I call two local functions in sequence?

a. Calling two separate functions sequentially is not currently supported.

b. As a workaround, you can create two separate Assistant APIs and handle the return value from each one.
Does the Assistant API have memory-related capabilities?

The memory configuration feature is not currently supported.
Why doesn't the doc_reference_type parameter take effect after being set?

The doc_reference_type parameter is effective only in older application versions. In newer versions, you can enable this feature directly on the application's operations page by turning on the Display answer source switch in the application configuration. If this switch is turned off, the doc_reference_type parameter has no effect.

Product questions

How do I activate the Alibaba Cloud Model Studio service?

Alibaba Cloud Model Studio must be activated on a per-region basis. Log on with your Alibaba Cloud account and go to the Alibaba Cloud Model Studio console (China (Beijing) or Singapore). Select the target region in the upper-right corner of the console. After you read and agree to the service agreement, Alibaba Cloud Model Studio is activated automatically. If the agreement does not appear, the service is already activated for that region.

If a message prompts you to complete real-name verification, you must complete real-name verification before proceeding.
How can I deactivate the Alibaba Cloud Model Studio service?

Currently, the Alibaba Cloud Model Studio service cannot be deactivated. If you use the API to call models or applications, you can prevent future calls by deleting your API key on the API-Key (Beijing) or API-Key (Singapore) page.
How can I try out the large model services?

You can go to the Playground (Beijing) or Playground (Singapore) page to try them out. For more details, see Introduction to the Playground.
What is the difference between Alibaba Cloud Model Studio and Qwen?

Alibaba Cloud Model Studio is a large language model service platform that provides a variety of large models, including the Qwen series.
My product integrates a Qwen model and needs to be listed on app stores, such as the WeChat mini program store. How can I apply for a cooperation agreement for the product listing?

a. For information about how to obtain a filing number, see Application compliance filing.

b. To apply for a cooperation agreement for the Qwen series models, submit an Alibaba Cloud ticket.
How can I implement business data isolation to ensure that data from different users is not associated?

You can use your Alibaba Cloud account to grant different workspace permissions to different RAM users. Data is isolated between workspaces. For more details, see Workspace permission management.
Does Alibaba Cloud Model Studio save data generated during model calls?

Alibaba Cloud strictly protects data privacy and never uses your data for model training. All data transmitted when you build applications or train large models is encrypted with AES-256 (Advanced Encryption Standard).

In accordance with relevant laws and regulations, Alibaba Cloud Model Studio will store data generated from model and application calls. For more information, see the terms regarding data processing, privacy, and security in the Alibaba Cloud Model Studio Service Agreement.
How long are conversation histories kept in the Playground, and is there a limit to the number of saved conversations?

The Model Studio console displays a maximum of 100 historical conversation records with no time limit. If you manually delete some records, the system automatically displays older ones. Conversations from trial sessions held while you are not logged in, and conversations that result in inference errors, are not saved.
Does Alibaba Cloud Model Studio support adding implicit identifiers to generated text?

No.
Does Alibaba Cloud Model Studio have a mobile app?

Alibaba Cloud Model Studio does not currently offer an official standalone mobile app. The service is primarily accessed through the web console.

Model center

Can cloze test (fill-in-the-blank) data be used for training?

Yes. When you upload a training set, you can specify the questions and answers to guide the large model's learning. For reference, see Custom model best practices.
Does Alibaba Cloud Model Studio currently only support text training, or can images be trained as well?

Image training is now supported. The qwen-vl-plus model supports training and fine-tuning.
Is fine-tuning a high-level model and then transferring its capabilities to a low-level model a form of distillation?

Yes, this technique is a form of model distillation. It involves fine-tuning a high-level model to acquire powerful knowledge and then transferring it to a low-level model. This achieves model compression and performance optimization, allowing the smaller, efficient model to achieve performance comparable to or exceeding the original high-level model.
How are the parameters of a large model stored?

You can download open source models from the ModelScope community. The model structure is typically defined in JSON files, and you usually need open source Python libraries to parse them. Inspecting these files and the vector information they contain can help you understand how the parameters are stored.
How is the diversity of a corpus dataset defined?

The diversity of a corpus dataset refers to its richness and variation across multiple dimensions, including linguistic features, content topics, text types, writing styles, language variants, author backgrounds, and time spans. The goal is to accurately reflect the actual use of language, enhance the generalization ability of NLP models, and improve their adaptability to diverse application scenarios.
When training a large model for personal use, how should I choose between qwen-turbo and qwen-max?

qwen-turbo emphasizes speed and resource efficiency, making it suitable for scenarios that require fast response times and easy deployment. In contrast, qwen-max focuses on top-tier performance and comprehensive knowledge, making it suitable for environments with strict requirements for model accuracy and the ability to handle complex tasks. The cost of qwen-turbo is lower than that of qwen-max. Select the model version based on your specific requirements. You can also see Introduction to Qwen to learn about the specific differences.
How do I upload a custom model for model training?

In model fine-tuning, a custom model refers to a model that you have already trained and now want to use for secondary training. Models that you have trained locally cannot be uploaded.
Can a trained open source model be exported?

This is not currently supported.
How many languages do the Qwen series models support?

They support 14 languages: Chinese, English, Arabic, Spanish, French, Portuguese, German, Italian, Russian, Japanese, Korean, Vietnamese, Thai, and Indonesian.
Can the current models connect to structured data, such as MySQL or Hive?

This is not currently supported. However, this feature is under development, with priority given to integrating with ApsaraDB RDS.
After Qwen is upgraded, do enterprise large models need to be retrained?

Not necessarily. You can decide whether and when to retrain an enterprise large model. If the trained model performs well in your scenarios, there is no need to retrain it. It is unpredictable how a base model upgrade will affect your application scenarios. You can use our evaluation tools to assess the impact if you deem it necessary.
I've noticed that the base models on Alibaba Cloud Model Studio sometimes repeat themselves, and this phenomenon seems to become more severe after fine-tuning. What is the reason for this?

This is a model hallucination issue. It can happen when the large model lacks the knowledge to answer your input prompt. If the problem worsens after fine-tuning a base model, it indicates that the training was not effective. The effectiveness of the training depends on the quality, diversity, and volume of the training data.
We are training a model for a vertical domain. If we only use domain data during the Supervised Fine-Tuning (SFT) phase and do not mix in the original Qwen SFT data, is it more likely to overfit and cause repetitive, verbose answers as we add more vertical domain data? Are there any best practice recommendations?

Using only vertical domain data for Supervised Fine-Tuning (SFT) can indeed cause the model to forget its original general knowledge.

To prepare high-quality domain-specific SFT data, consider the following:
- Clear task definition: Avoid having a single prompt map to ambiguous answers.
- High data quality: Answers should be accurate, concise, and directly address the question, avoiding redundant or irrelevant content.
- Data diversity: Express the same semantic meaning with a variety of different prompts to prevent the model from learning to respond to only a single pattern. High-quality training data often requires multiple iterations of optimization.
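
The first point above, avoiding one prompt that maps to conflicting answers, can be checked mechanically before training. The following Python sketch is only one possible check. The field names prompt and answer and the sample records are made up for illustration and do not reflect the required training file format.

```python
from collections import defaultdict


def find_conflicting_prompts(samples):
    """Group answers by normalized prompt; return prompts with more than one distinct answer."""
    answers = defaultdict(set)
    for sample in samples:
        # Normalize case and whitespace so trivially different prompts are grouped together.
        prompt = " ".join(sample["prompt"].lower().split())
        answers[prompt].add(sample["answer"].strip())
    return {p: sorted(a) for p, a in answers.items() if len(a) > 1}


# Hypothetical records, for illustration only.
samples = [
    {"prompt": "What is the return window?", "answer": "30 days."},
    {"prompt": "what is the  return window?", "answer": "14 days."},
    {"prompt": "How do I reset my password?", "answer": "Use the account settings page."},
]
conflicts = find_conflicting_prompts(samples)
print(conflicts)
```

Prompts reported by this check are candidates for manual review. The check does not catch prompts that are worded differently but mean the same thing.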
During training, I've found that with a small dataset (around 100 entries), more epochs lead to better results. However, with a larger dataset (over 1,000 entries), more epochs tend to cause overfitting. Are there any best practices for hyperparameter configuration and data ratios?

To achieve ideal model performance, training data should be not only high-quality but also as abundant as possible, especially for complex tasks. There are no fixed rules for hyperparameters like the number of epochs; they must be determined experimentally for each specific task. For example, when tackling a complex task with several thousand data entries, training typically requires around 20 epochs. Additionally, you should not evaluate a large model for overfitting based solely on its loss. Unlike with traditional models, the actual performance of a large model may improve even if the loss suggests overfitting. Therefore, you must use manual evaluation to judge the final effectiveness.
Is the text generation speed for models like Qwen-3 and Qwen-Max fixed for all users? Is there a way to adjust the speed?

The generation speed is not fixed. It varies based on factors like the current overall service load and the concurrency of your requests.
After model rate limiting is triggered, how long should I typically wait before trying again?

The waiting time depends on your specific rate limit value, such as requests per second (RPS) or requests per minute (RPM). For example, if your limit is 120 RPM (which is 2 requests per second) and you submit 2 requests consecutively within 0.2 seconds, the third request will be throttled. You will need to wait approximately 0.8 seconds before you can successfully submit another request.
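
The arithmetic in this example can be sketched in Python. The sketch assumes a sliding-window limit, which is an assumption made for illustration. The actual server-side throttling algorithm is not described here, and seconds_until_next_slot is a hypothetical helper.

```python
def seconds_until_next_slot(sent_at, now, limit, window):
    """Sliding-window estimate of how long to wait before the next request.

    sent_at: timestamps (in seconds) of requests already sent
    limit:   maximum number of requests allowed in any `window` seconds
    """
    recent = sorted(t for t in sent_at if now - t < window)
    if len(recent) < limit:
        return 0.0
    # The limit-th most recent request is the one whose expiry frees a slot.
    blocking = recent[-limit]
    return max(0.0, blocking + window - now)


# 120 RPM treated as 2 requests per 1-second window, as in the example above.
wait = seconds_until_next_slot(sent_at=[0.0, 0.2], now=0.2, limit=2, window=1.0)
print(round(wait, 3))
```

With requests sent at 0.0 and 0.2 seconds, the helper returns a wait of about 0.8 seconds, which matches the figure in the answer above.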
Which model series does qwen-plus-latest belong to? Is it Qwen3.7 or Qwen3.5?

qwen-plus-latest is the latest version of qwen-plus and belongs to the Qwen3 series, not the Qwen3.5 or Qwen3.7 series. Note that Qwen3.5, Qwen3.7, and similar designations are independent model series that run parallel to Qwen3, not sub-versions of it.

Model hallucination

What is model hallucination?

Model hallucination refers to the phenomenon where a large language model (LLM) generates content that is nonsensical, factually incorrect, distorted, or logically contradictory. The output may seem plausible and fluent but is inconsistent with the input prompt, real-world facts, or logical context. It is important to distinguish hallucination from factual errors (such as those caused by outdated training data), subjective opinions, or creative writing (like when a model is explicitly asked to write a story). The core of hallucination is a confident assertion without a factual basis.
How can I reduce model hallucination?

You can reduce model hallucination in the following ways:
1. Choose a more powerful model: Generally, selecting a larger and more advanced model can reduce hallucinations. For example, in the Qwen series, Max-level models perform better than Plus-level models, which in turn perform better than Turbo-level models.
2. Prompt engineering: Modifying the prompt is a simple and effective way to reduce model hallucination. For example, in a Retrieval-Augmented Generation (RAG) scenario, add instructions like, "Please answer based only on the provided documents. If the information is not available, say 'I don't know.'" You can also add, "Please cite specific data or reports to support your conclusion," use prompts to break down a task into multiple steps, or define a strict role for the model in the prompt.
3. Retrieval-Augmented Generation (RAG): With RAG, you can provide the model with reference materials for its responses and strictly limit its answers to the scope of the retrieved knowledge, significantly reducing hallucinations. When building a RAG system, ensure the retrieval system is high-quality, clearly labels information sources, and gracefully handles cases where no relevant information can be found.
4. Plugins/MCP: Use the capabilities of plugins or MCP to reduce model hallucination. For example, when using a large model to summarize data from a structured database, you can use plugins or MCP to call a database client to perform the calculations. The results can then be returned to the model for summarization, which avoids hallucinations that can occur when a model tries to perform numerical calculations directly.
5. Model parameter tuning: Lowering randomness parameters such as temperature, top_k, and top_p makes the output more deterministic and less likely to generate bizarre content, though it may sacrifice creativity. In some scenarios, reducing max_tokens can prevent the model from fabricating content after it has provided the key information.
6. Post-processing verification: After the model's inference is complete, use a subsequent step to verify the correctness of the response. This usually involves using another AI-driven process to check the response for hallucinations. This method increases costs and slows the overall response time.
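
Items 2, 3, and 5 above can be combined in a small Python sketch. build_grounded_prompt is a hypothetical helper, the document text is made up, and the parameter values are illustrative rather than recommended settings. How the prompt and parameters are passed to a model depends on the API or SDK you use.

```python
def build_grounded_prompt(question, documents):
    """Compose a prompt that restricts the model to the supplied documents."""
    numbered = "\n".join(f"[{i}] {doc}" for i, doc in enumerate(documents, start=1))
    return (
        "Please answer based only on the provided documents. "
        "If the information is not available, say 'I don't know.'\n\n"
        f"Documents:\n{numbered}\n\n"
        f"Question: {question}"
    )


# Illustrative values only; suitable settings depend on the model and the task.
generation_params = {"temperature": 0.1, "top_p": 0.5, "max_tokens": 256}

prompt = build_grounded_prompt(
    "What is the refund period?",
    ["Refunds are accepted within 30 days of purchase."],  # made-up document
)
print(prompt)
```

Numbering the documents gives the model something concrete to cite, which also makes a later verification step easier.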

Contact us

How can I contact you for business cooperation?

Please contact the official Alibaba Cloud service hotline at 4008013260 or get in touch through Official Website - Pre-sales Consultation.
How can I provide feedback on product usage issues?

Log on to the official Alibaba Cloud website and submit your feedback through Official website - Support and Services.