Model Gallery encapsulates PAI-DLC and PAI-EAS to support zero-code deployment and training of open-source large language models. This topic demonstrates how to deploy, fine-tune, and evaluate the Qwen3-0.6B model.
1. Prerequisites
To activate PAI and create a workspace, log in to the PAI console with your Alibaba Cloud account, select a region in the top-left corner, and activate the service with one-click authorization.
2. Billing
The examples in this topic use public resources to create PAI-DLC tasks and PAI-EAS services, which are billed on a pay-as-you-go basis. For details, see PAI-DLC billing and PAI-EAS billing.
3. Model deployment
3.1 Deploy the model
Log on to the PAI console. In the left-side navigation pane, click Model Gallery, search for the Qwen3-0.6B card, and then click Deploy.
The configuration page is pre-filled with default parameters. Click Deploy > Confirm. The deployment takes about five minutes. The deployment is successful when the status changes to In operation.
By default, the model is deployed using public resources and is billed on a pay-as-you-go basis.
The default deployment resource specification is
ecs.gn7i-c8g1.2xlarge(8 vCPU, 30 GiB, NVIDIA A10 × 1), which costs approximately CNY 10.5/hour. After reviewing the configuration, click Deploy at the bottom of the panel.
3.2 Invoke the model
View the invocation information. On the service details page, click View Call Information to get the Internet Endpoint and Token.
To view the deployment job details later, in the left-side navigation pane, click Model Gallery > Job Management > Deployment Jobs. Then, click the target Service name.
In the displayed invocation information dialog box, view the Internet Endpoint and VPC endpoint on the Shared Gateway and VPC High-Speed Direct Connection tabs, respectively.
Invoke the model by using one of the following methods.
Online debugging
Switch to the Online Debugging page. The large language model service supports Conversation Debugging and API Debugging.
Cherry Studio client
Cherry Studio is a popular client for interacting with large language models. It integrates the MCP feature, which allows you to easily chat with models.
Connect to the Qwen3 model deployed on PAI
Install the client
Download and install the client from Cherry Studio.
You can also go to
https://github.com/CherryHQ/cherry-studio/releasesto download the client.Add a provider.
Click the
Settings button in the lower-left corner. In the Model Provider section, click Add.In the Provider Name field, enter a custom name, such as Platform for AI (PAI), and set the provider type to OpenAI.
Click OK.
In the API Key field, enter your Token. In the API Host field, enter your endpoint.
Click Add. In the model ID field, enter
Qwen3-0.6B(case-sensitive) to add the model.You can click Check next to API Key to verify connectivity.
Click the
icon to return to the chat page. At the top of the window, switch to your newly added Qwen3-0.6B model to start the conversation.
Python SDK
from openai import OpenAI import os # If you have not set the environment variable, you can assign your service Token directly. For example: token = 'YTA1NTEzMzY3ZTY4Z******************' token = os.environ.get("Token") # Do not remove "/v1" from the end of the endpoint. client = OpenAI( api_key=token, base_url=f'<your_endpoint>/v1', ) if token is None: print("Please configure the Token environment variable, or assign the token value directly to the token variable.") exit() query = 'Hello, who are you?' messages = [{'role': 'user', 'content': query}] resp = client.chat.completions.create(model='Qwen3-0.6B', messages=messages, max_tokens=512, temperature=0) query = messages[0]['content'] response = resp.choices[0].message.content print(f'query: {query}') print(f'response: {response}')
3.3 Important reminder
This model service uses public resources and is billed on a pay-as-you-go basis. To avoid incurring unnecessary charges, stop or delete the service when you no longer need it.
You can do this on the Job Management > Deployment Jobs tab, in the Actions column of the target service.
4. Model fine-tuning
To improve a model's performance in a specific domain, you can fine-tune it on a domain-specific dataset. This section presents a scenario to demonstrate the purpose and steps of model fine-tuning.
4.1 Use case
In the logistics industry, you often need to extract structured information (such as recipient, address, and phone number) from natural language. Large-parameter models, such as Qwen3-235B-A22B, perform well on this task but are costly and have high latency. To balance performance and cost, you can first use a large-parameter model to label data, and then use that data to fine-tune a small-parameter model, such as Qwen3-0.6B, to deliver similar performance on the task. This process is also known as model distillation.
On this task, the original Qwen3-0.6B model has an accuracy of 14%. After fine-tuning, its accuracy can exceed 90%.
You can follow the steps for this use case in the solution 10-Minute Fine-Tuning: Making a 0.6B Model Comparable to a 235B Model.
Example recipient address information | Example structured information |
Room 1202, Block B, Runfeng Garden, 189 Taohualing Road, Yuelu District, Changsha | Phone: 021-17613435 | Contact: Jiang Yutong | |
4.2 Data preparation
This task involves performing model distillation from the teacher model (Qwen3-235B-A22B) to the Qwen3-0.6B model. First, you must use the teacher model's API to extract recipient address information into structured JSON data. Generating this JSON data can be time-consuming. Therefore, this article provides a sample training dataset train_qwen3.json and a validation set eval_qwen3.json that you can download and use directly.
In model distillation, the model with more parameters is called the teacher model. The data used in this article is synthetically generated by a large model and does not contain any sensitive user information.
Going live
4.3 Fine-tune the model
In the left-side navigation pane, click Model Gallery. Search for the Qwen3-0.6B card and click Fine-tune.
Configure the parameters for the training job. Configure only the following key parameters and keep the default values for the others.
Training Mode: The default selection is SFT (Supervised Fine-Tuning) using the LoRA method.
LoRA is an efficient fine-tuning technique that saves training resources by modifying only a subset of the model parameters.
Training dataset: First, download the sample training dataset train_qwen3.json. Then, on the configuration page, select OSS file or directory, click the
icon to select a bucket, click Upload File to upload the downloaded dataset to Object Storage Service (OSS), and then select the file.Validate dataset: First, download the validation dataset eval_qwen3.json. Then, click Add validation dataset and follow the same procedure as for the training dataset to upload and select the file.
The validation dataset evaluates the model's performance on unseen data during training.
Model output path: By default, the system saves the fine-tuned model to OSS. If the target OSS directory is empty, click Create folder and select the newly created directory.
Resource Group Type: Select Public Resource Group. This fine-tuning task requires approximately 5 GB of GPU memory. The console has already filtered the instance types that meet this requirement. Select an instance type, such as
ecs.gn7i-c16g1.4xlarge.When you deploy other models, you can refer to Estimate the GPU memory required for a large model to calculate the GPU memory needed for model training.
Hyperparameters:
learning_rate: Set to 0.0005num_train_epochs: Set to 4per_device_train_batch_size: Set to 8seq_length: Set to 512
The model performs well on the test data in this topic with this hyperparameter configuration. If you encounter low accuracy when fine-tuning a model for your business needs, try adjusting the hyperparameters. To learn more about what hyperparameters do and how to use the loss curve to guide adjustments, see the Alibaba Cloud Large Model ACP course.
Then, click Train > OK. The training job enters the Creating state. When the status changes to In operation, model fine-tuning starts.
View the training job until it completes. The fine-tuning process takes about 10 minutes. During this time, the job details page displays logs and metric curves. After the training job completes, the system saves the fine-tuned model to the specified OSS directory.
To view the training job details later, in the left-side navigation pane, click Model Gallery > Job Management > Training Jobs, and then click the job name.
4.4 Deploying the fine-tuned model
On the training job details page, click Deploy to open the deployment configuration page. For Resource Type, select Public Resources. Deploying the 0.6B model requires about 5 GB of GPU memory. The list under Instance Type automatically displays compatible specifications. Select one, such as ecs.gn7i-c8g1.2xlarge. Keep the other parameters at their default values, and then click Deploy > OK.
Deployment takes about 5 minutes and is complete when the status changes to Running.
To view the training job details, in the left-side navigation pane, click Model Gallery > Job Management > Training Jobs, and then click the job name.
If the Deploy button is disabled after the training job succeeds, it means the output model is still being registered. Wait about one minute for the button to be enabled.
The steps to invoke the model are the same as described in 3.2 Invoke the model.
4.5 Evaluate the fine-tuned model
Before deploying the fine-tuned model to a production environment, evaluate its performance to ensure it is stable and accurate. This evaluation helps prevent unexpected issues after deployment.
Prepare test data
Prepare a test dataset that does not overlap with your training data to evaluate the model's performance. The accuracy test code below automatically downloads a test set for this purpose.
Using a test dataset that is separate from the training data ensures an unbiased assessment of the model's generalization ability on unseen data. This practice prevents inflated scores that result from evaluating the model on data it has already seen.
Design evaluation metrics
Evaluation metrics should align closely with your business objectives. For this solution's use case, in addition to validating the generated JSON, you must also verify that the key-value pairs are correct.
Define the evaluation metrics programmatically. For the implementation in this example, refer to the compare_address_info method in the accuracy test code below.Validate the fine-tuned model
Run the following test code to output the model's accuracy on the test set.
Output:
All predictions are complete! Results have been saved to predicted_labels.jsonl
Number of samples: 400
Correct responses: 361
Incorrect responses: 39
Accuracy: 91.25 %Due to the random seed used in model fine-tuning and the stochastic nature of the large language model's output, the accuracy you achieve may differ from the results shown in this solution. This variance is normal.
As you can see, the accuracy is 91.25%, a significant improvement from the 14% accuracy of the original Qwen3-0.6B model. This indicates that fine-tuning substantially improved the model's performance on structured information extraction for logistics tasks.
To reduce training time, this guide uses only 4 training epochs, achieving an accuracy of 91.25%. You can further improve the accuracy by increasing the number of training epochs. For other scenarios, refer to the Alibaba Cloud large language model ACP course to learn how to adjust hyperparameters.
4.6 Important note
The model service in this topic uses public resources and is pay-as-you-go. When you no longer need the service, stop or delete it to avoid further charges.
Related documents
To learn more about Model Gallery features such as evaluation and compression, see Model Gallery.
To learn more about EAS features such as Auto Scaling, stress testing, and monitoring and alerting, see the EAS overview.

