Model Gallery quick start


Model Gallery encapsulates PAI-DLC and PAI-EAS to support zero-code deployment and training of open-source large language models. This topic demonstrates how to deploy, fine-tune, and evaluate the Qwen3-0.6B model.

1. Prerequisites

To activate PAI and create a workspace, log in to the PAI console with your Alibaba Cloud account, select a region in the top-left corner, and activate the service with one-click authorization.

2. Billing

The examples in this topic use public resources to create PAI-DLC tasks and PAI-EAS services, which are billed on a pay-as-you-go basis. For details, see PAI-DLC billing and PAI-EAS billing.

3. Model deployment

3.1 Deploy the model

  1. Log on to the PAI console. In the left-side navigation pane, click Model Gallery, search for the Qwen3-0.6B card, and then click Deploy.

  2. The configuration page is pre-filled with default parameters. Click Deploy > Confirm. Deployment takes about five minutes and succeeds when the status changes to Running.

    By default, the model is deployed using public resources and is billed on a pay-as-you-go basis.

    The default deployment resource specification is ecs.gn7i-c8g1.2xlarge (8 vCPU, 30 GiB, NVIDIA A10 × 1), which costs approximately CNY 10.5/hour. After reviewing the configuration, click Deploy at the bottom of the panel.
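To show how pay-as-you-go charges accumulate, the following sketch multiplies the approximate hourly price quoted above (CNY 10.5/hour for ecs.gn7i-c8g1.2xlarge) by running time. The price constant is the approximate figure from this topic, not a live quote. Actual prices vary by region and over time, so check PAI-EAS billing for current rates.

```python
# Approximate hourly price quoted in this topic for ecs.gn7i-c8g1.2xlarge (CNY).
# This is an assumption for illustration; real prices vary by region and over time.
HOURLY_PRICE_CNY = 10.5


def estimate_cost(hours: float, hourly_price: float = HOURLY_PRICE_CNY) -> float:
    """Return the estimated pay-as-you-go cost of a service that runs for `hours`."""
    if hours < 0:
        raise ValueError("hours must be non-negative")
    return round(hours * hourly_price, 2)


# A service left running for one day versus one week.
print(estimate_cost(24))      # 252.0
print(estimate_cost(24 * 7))  # 1764.0
```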

3.2 Invoke the model

  1. View the invocation information. On the service details page, click View Call Information to get the Internet Endpoint and Token.

    To view the deployment job details later, in the left-side navigation pane, click Model Gallery > Job Management > Deployment Jobs. Then, click the target Service name.

    In the displayed invocation information dialog box, view the Internet Endpoint and VPC endpoint on the Shared Gateway and VPC High-Speed Direct Connection tabs, respectively.

  2. Invoke the model by using one of the following methods.

    Online debugging

    Switch to the Online Debugging page. The large language model service supports Conversation Debugging and API Debugging.

    Cherry Studio client

    Cherry Studio is a popular client for chatting with large language models, and it also supports MCP (Model Context Protocol).

    Connect to the Qwen3 model deployed on PAI

    Python SDK

    import os
    import sys

    from openai import OpenAI

    # Read the service token from the "Token" environment variable. If it is not set,
    # you can assign the token directly, for example: token = 'YTA1NTEzMzY3ZTY4Z******************'
    token = os.environ.get("Token")
    # Check the token before creating the client: the client raises an error at
    # construction time if no API key is available.
    if token is None:
        print("Please configure the Token environment variable, or assign the token value directly to the token variable.")
        sys.exit(1)

    # Replace <your_endpoint> with your endpoint. Do not remove "/v1" from the end of the base URL.
    client = OpenAI(
        api_key=token,
        base_url='<your_endpoint>/v1',
    )

    query = 'Hello, who are you?'
    messages = [{'role': 'user', 'content': query}]
    resp = client.chat.completions.create(model='Qwen3-0.6B', messages=messages, max_tokens=512, temperature=0)
    response = resp.choices[0].message.content
    print(f'query: {query}')
    print(f'response: {response}')
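The OpenAI SDK sends the token as a bearer token to the chat completions route under the base URL. If you prefer not to depend on the SDK, the following sketch builds the equivalent HTTP request with only the Python standard library. It assumes the service exposes the OpenAI-compatible /v1/chat/completions route used by the SDK example above; build_chat_request and send_chat_request are hypothetical helper names, not part of any PAI SDK.

```python
import json
import urllib.request


def build_chat_request(endpoint: str, token: str, query: str,
                       model: str = "Qwen3-0.6B") -> urllib.request.Request:
    """Build (but do not send) an OpenAI-compatible chat completion request."""
    # Strip a trailing slash so the path is not doubled, then append the /v1 route.
    url = endpoint.rstrip("/") + "/v1/chat/completions"
    body = {
        "model": model,
        "messages": [{"role": "user", "content": query}],
        "max_tokens": 512,
        "temperature": 0,
    }
    return urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {token}",  # Same header the OpenAI SDK sends.
            "Content-Type": "application/json",
        },
        method="POST",
    )


def send_chat_request(req: urllib.request.Request) -> str:
    """Send the request and return the assistant's reply text."""
    with urllib.request.urlopen(req, timeout=60) as resp:
        payload = json.load(resp)
    return payload["choices"][0]["message"]["content"]
```

For example, print(send_chat_request(build_chat_request(endpoint, token, "Hello, who are you?"))). Note that urllib requires a full URL, so endpoint must include its http:// or https:// scheme.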

3.3 Important reminder

This model service uses public resources and is billed on a pay-as-you-go basis. To avoid incurring unnecessary charges, stop or delete the service when you no longer need it.

You can do this on the Job Management > Deployment Jobs tab, in the Actions column of the target service.

4. Model fine-tuning

To improve a model's performance in a specific domain, you can fine-tune it on a domain-specific dataset. This section presents a scenario to demonstrate the purpose and steps of model fine-tuning.

4.1 Use case

In the logistics industry, you often need to extract structured information (such as recipient, address, and phone number) from natural language. Large-parameter models, such as Qwen3-235B-A22B, perform well on this task but are costly and have high latency. To balance performance and cost, you can first use a large-parameter model to label data, and then use that data to fine-tune a small-parameter model, such as Qwen3-0.6B, to deliver similar performance on the task. This process is also known as model distillation.

On this task, the original Qwen3-0.6B model has an accuracy of 14%. After fine-tuning, its accuracy can exceed 90%.
You can follow the steps for this use case in the solution 10-Minute Fine-Tuning: Making a 0.6B Model Comparable to a 235B Model.

Example recipient address information:

Room 1202, Block B, Runfeng Garden, 189 Taohualing Road, Yuelu District, Changsha | Phone: 021-17613435 | Contact: Jiang Yutong

Example structured information:

{
    "province": "Hunan Province",
    "city": "Changsha City",
    "district": "Yuelu District",
    "specific_location": "Room 1202, Block B, Runfeng Garden, 189 Taohualing Road",
    "name": "Jiang Yutong",
    "phone": "021-17613435"
}

4.2 Data preparation

This task distills the teacher model (Qwen3-235B-A22B) into the Qwen3-0.6B model. First, use the teacher model's API to extract recipient address information into structured JSON data. Because generating this data can be time-consuming, this topic provides a sample training dataset, train_qwen3.json, and a validation dataset, eval_qwen3.json, that you can download and use directly.

In model distillation, the larger model that supplies the labels is called the teacher model, and the smaller model trained on those labels is called the student model. The data used in this topic is synthetically generated by a large model and does not contain any sensitive user information.

Going live

To apply this solution to your business, we recommend that you prepare data using the following methods:

Real business scenarios (recommended)

Real business data better reflects your business scenarios, so a model fine-tuned on it adapts better to your business. After you obtain the business data, convert it programmatically into a JSON file in the following format.

[
    {
        "instruction": "You are a professional information extraction assistant that specializes in extracting JSON information of recipients from Chinese text. The keys to include are province, city, district, specific_location (detailed information such as street, house number, residential area, and building), name (recipient's name), and phone (contact number). The recipient is Ouyang Wenbin; Tianjin Hexi District Zhujiang Road No. 21 Nankai University Science and Technology Park Building 3 Block B; Mobile number: 023-53932018",
        "output": "{\"province\": \"Tianjin City\", \"city\": \"Tianjin City\", \"district\": \"Hexi District\", \"specific_location\": \"Zhujiang Road No. 21 Nankai University Science and Technology Park Building 3 Block B\", \"name\": \"Ouyang Wenbin\", \"phone\": \"023-53932018\"}"
    },
    {
        "instruction": "You are a professional information extraction assistant that specializes in extracting JSON information of recipients from Chinese text. The keys to include are province, city, district, specific_location (detailed information such as street, house number, residential area, and building), name (recipient's name), and phone (contact number). Nanning City Qingxiu District Zhuxi Avenue No. 38 Jinyuan Chengwangfu Building 6: Contact: 23952529750: Recipient: Nong Lixia",
        "output": "{\"province\": \"Guangxi Zhuang Autonomous Region\", \"city\": \"Nanning City\", \"district\": \"Qingxiu District\", \"specific_location\": \"Zhuxi Avenue No. 38 Jinyuan Chengwangfu Building 6\", \"name\": \"Nong Lixia\", \"phone\": \"23952529750\"}"
    }
]

The JSON file contains multiple training samples. Each sample includes two fields: instruction and output.

  • instruction: Contains the prompt that guides the behavior of the large model, along with the input data.

  • output: The expected standard answer, usually generated by human experts or larger models such as qwen3-235b-a22b.
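A minimal sketch of that conversion follows. It assumes each business record is already available as raw text plus a dict of the six labeled fields. The helper names build_sample and validate_samples are hypothetical, and the instruction prefix simply reuses the wording of the examples above.

```python
import json

FIELDS = ["province", "city", "district", "specific_location", "name", "phone"]
INSTRUCTION_PREFIX = (
    "You are a professional information extraction assistant that specializes in "
    "extracting JSON information of recipients from Chinese text. The keys to include "
    "are province, city, district, specific_location (detailed information such as "
    "street, house number, residential area, and building), name (recipient's name), "
    "and phone (contact number). "
)


def build_sample(raw_text: str, fields: dict) -> dict:
    """Turn one raw record and its labeled fields into an instruction/output pair."""
    missing = [k for k in FIELDS if k not in fields]
    if missing:
        raise ValueError(f"missing fields: {missing}")
    # The output is a JSON *string*, matching the sample files above.
    output = json.dumps({k: fields[k] for k in FIELDS}, ensure_ascii=False)
    return {"instruction": INSTRUCTION_PREFIX + raw_text, "output": output}


def validate_samples(samples: list) -> list:
    """Return a list of problems; an empty list means the samples look well-formed."""
    problems = []
    for i, s in enumerate(samples):
        if set(s) != {"instruction", "output"}:
            problems.append(f"sample {i}: unexpected keys {sorted(s)}")
            continue
        try:
            parsed = json.loads(s["output"])
        except json.JSONDecodeError:
            problems.append(f"sample {i}: output is not valid JSON")
            continue
        if not isinstance(parsed, dict) or sorted(parsed) != sorted(FIELDS):
            problems.append(f"sample {i}: output keys do not match {FIELDS}")
    return problems
```

After building the list, write it with json.dump(samples, f, ensure_ascii=False, indent=2) so that Chinese characters are stored as-is rather than as escape sequences.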

Model generation

When business data is insufficient, consider using a model for data augmentation to improve the diversity and coverage of the data. To protect user privacy, this solution uses a model to generate a batch of virtual address data. The following generation code is for your reference.

Code for simulating business data generation

To run the following code, create an Alibaba Cloud Model Studio API key and set it as the DASHSCOPE_API_KEY environment variable. The code uses qwen-plus-latest to generate business data and qwen3-235b-a22b to label it.

# -*- coding: utf-8 -*-
import os
import asyncio
import random
import json
import sys
from typing import List, Dict
from openai import AsyncOpenAI
import platform

# Create an asynchronous client instance.
client = AsyncOpenAI(
    # If you have not configured environment variables, replace the following line with your Model Studio API key: api_key="sk-xxx",
    api_key=os.getenv("DASHSCOPE_API_KEY"),
    base_url="https://dashscope.aliyuncs.com/compatible-mode/v1"
)

# List of Chinese provinces.
provinces = [
    "Beijing", "Tianjin", "Hebei", "Shanxi", "Inner Mongolia", "Liaoning", "Jilin", "Heilongjiang",
    "Shanghai", "Jiangsu", "Zhejiang", "Anhui", "Fujian", "Jiangxi", "Shandong", "Henan",
    "Hubei", "Hunan", "Guangdong", "Guangxi", "Hainan", "Chongqing", "Sichuan", "Guizhou",
    "Yunnan", "Tibet", "Shaanxi", "Gansu", "Qinghai", "Ningxia", "Xinjiang"
]

# Recipient writing templates.
recipient_templates = [
    "Recipient {name}", "Recipient: {name}", "Recipient is {name}", "To: {name}",
    "Recipient is {name}", "{name}", "Name: {name}", "Name {name}",
    "Contact {name}", "Contact: {name}", "Receiver {name}", "Receiver: {name}",
    "Consignee {name}", "Consignee: {name}", "Send to {name}", "To {name}",
    "Recipient {name}", "Recipient: {name}", "Receiver {name}", "Receiver: {name}"
]

# Phone number writing templates.
phone_templates = [
    "tel: {phone}", "tel:{phone}", "mobile: {phone}", "mobile:{phone}",
    "Mobile number {phone}", "Mobile number: {phone}", "Mobile: {phone}", "Mobile {phone}",
    "Phone: {phone}", "Phone {phone}", "Contact number {phone}", "Contact number: {phone}",
    "Number: {phone}", "Number {phone}", "TEL: {phone}", "MOBILE: {phone}",
    "contact: {phone}", "phone: {phone}", "{phone}", "call: {phone}",
    "Contact method {phone}", "Contact method: {phone}", "Phone number {phone}", "Phone number: {phone}",
    "Mobile No. {phone}", "Mobile No.: {phone}", "Phone number is {phone}", "Contact number is {phone}"
]

# Generate a virtual mobile number that starts with 2 to avoid overlapping with real numbers.
def generate_mobile():
    prefixes = ['200', '201', '202', '203', '204', '205', '206', '207', '208', '209',
               '210', '211', '212', '213', '214', '215', '216', '217', '218', '219',
               '220', '221', '222', '223', '224', '225', '226', '227', '228', '229',
               '230', '231', '232', '233', '234', '235', '236', '237', '238', '239']
    return random.choice(prefixes) + ''.join([str(random.randint(0, 9)) for _ in range(8)])

# Generate a landline number.
def generate_landline():
    area_codes = ['010', '021', '022', '023', '024', '025', '027', '028', '029', '0311', '0351', '0431', '0451']
    area_code = random.choice(area_codes)
    number = ''.join([str(random.randint(0, 9)) for _ in range(random.choice([7, 8]))])
    return f"{area_code}-{number}"

# Use a large model to generate recipient and address information.
async def generate_recipient_and_address_by_llm(province: str):
    """Uses a large model to generate the recipient's name and address information for a specified province."""
    prompt = f"""Please generate recipient information for {province}, including the following:
1. A real Chinese name. It can be a common name or a less common one for diversity.
2. A city name within that province.
3. An administrative region name within that city, such as a district or county.
4. A specific street address, such as a road name and house number, residential area name and building number, or commercial building and floor, which should be realistic.

Please return the information in JSON format:
{{"name": "Recipient Name", "city": "City Name", "district": "Administrative Region Name", "specific_location": "Specific Address"}}

Do not include any other content. Return only the JSON. The names should be diverse, not just common ones like Zhang San or Li Si."""

    try:
        response = await client.chat.completions.create(
            messages=[{"role": "user", "content": prompt}],
            model="qwen-plus-latest",
            temperature=1.7,  # Increase the temperature to make the names more diverse.
        )
        
        result = response.choices[0].message.content.strip()
        # Clean up possible markdown code block markers.
        if result.startswith('```'):
            result = result.split('\n', 1)[1]
        if result.endswith('```'):
            result = result.rsplit('\n', 1)[0]
        
        # Try to parse the JSON.
        info = json.loads(result)
        return info
    except Exception as e:
        print(f"Failed to generate recipient and address: {e}. Using a backup plan.")
        # Backup plan.
        backup_names = ["Wang Jianjun", "Li Chunyan", "Zhang Zhihua", "Chen Meiling", "Liu Deqiang", "Zhao Minhui", "Sun Wenbo", "Zhou Xiaoli"]
        return {
            "name": random.choice(backup_names),
            "city": f"{province.replace('Province', '').replace('City', '').replace('Autonomous Region', '')} City",
            "district": "Municipal District", 
            "specific_location": f"Renmin Road No. {random.randint(1, 999)}"
        }

# Generate a record.
async def generate_record():
    # Randomly select a province.
    province = random.choice(provinces)
    
    # Use a large model to generate recipient and address information.
    info = await generate_recipient_and_address_by_llm(province)
    
    # Generate the recipient information format.
    recipient = random.choice(recipient_templates).format(name=info['name'])
    
    # Generate a phone number. There is a 70% chance of a mobile number and a 30% chance of a landline number.
    if random.random() < 0.7:
        phone = generate_mobile()
    else:
        phone = generate_landline()
    
    phone_info = random.choice(phone_templates).format(phone=phone)
    
    # Assemble the address.
    full_address = f"{info['city']}{info['district']}{info['specific_location']}"
    
    # Assemble the data.
    components = [recipient, phone_info, full_address]
    
    # Randomly shuffle the order.
    random.shuffle(components)
    
    # Randomly select a separator.
    separators = [' ', ',', ',', ';', ';', ':', ':', ',', '|', '\t', '', '  ', ' | ', ' , ', ' ; ', '/']
    separator = random.choice(separators)
    
    # Merge the data.
    if separator == '':
        # No separator.
        combined_data = ''.join(components)
    else:
        combined_data = separator.join(components)    
    return combined_data

# Generate data in batches.
async def generate_batch_data(count: int) -> List[str]:
    """Generates a specified amount of data."""
    print(f"Starting to generate {count} data entries...")
    data = []
    
    # Use a semaphore to control the number of concurrent requests. QPM=1500, set to 20 concurrent requests.
    semaphore = asyncio.Semaphore(20)
    
    async def generate_single_record(index):
        async with semaphore:
            record = await generate_record()
            print(f"Generating data entry {index+1}: {record}")
            return record
    
    # Generate data concurrently.
    tasks = [generate_single_record(i) for i in range(count)]
    data = await asyncio.gather(*tasks)
    
    return data

# Save data to a file.
def save_data(data: List[str], filename: str = "recipient_data.json"):
    """Saves data to a JSON file."""
    with open(filename, 'w', encoding='utf-8') as f:
        json.dump(data, f, ensure_ascii=False, indent=2)
    print(f"Data saved to {filename}")

# Data generation phase.
async def produce_data_phase():
    print("=== Phase 1: Start generating recipient data ===")
    
    # Generate 2,000 data entries.
    batch_size = 2000
    data = await generate_batch_data(batch_size)
    
    # Save the data.
    save_data(data)
    
    print(f"\nGenerated a total of {len(data)} data entries")
    print("\nSample data:")
    for i, record in enumerate(data[:3]):  # Display the first 3 entries as examples.
        print(f"{i+1}. Raw data: {record}")
        print()
    
    print("=== Phase 1 complete ===\n")
    return True

def get_system_prompt():
    """Returns the system prompt."""
    return """You are a professional information extraction assistant that specializes in extracting structured recipient information from Chinese text.

## Task description
Based on the input text, accurately extract and generate a JSON-formatted output that contains the following six fields:
- province: The province, municipality, or autonomous region. The full official name is required, such as "河南省", "上海市", or "新疆维吾尔自治区".
- city: The city name, which must include "市", such as "郑州市" or "西安市".
- district: The district or county name, which must include "区" or "县", such as "金水区" or "雁塔区".
- specific_location: The detailed address, including street, house number, residential area, and building.
- name: The full name of the recipient in Chinese.
- phone: The complete phone number, including the area code.

## Extraction rules
1. **Address information processing**:
   - Accurately identify the hierarchical relationship between the province, city, and district.
   - Use the full official name for the province, such as "河南省" instead of "河南".
   - For municipalities, the values of the province and city fields must be the same, such as "上海市".
   - The specific_location field must contain the detailed street address, residential area name, and building number.

2. **Name identification**:
   - Accurately extract the full Chinese name, including compound surnames.
   - Include names of ethnic minorities.

3. **Phone number processing**:
   - Extract the complete phone number and keep its original format.

## Output format
Strictly follow the JSON format below. Do not add any explanatory text.
{
  "province": "Province Name",
  "city": "City Name", 
  "district": "District Name",
  "specific_location": "Detailed Address",
  "name": "Recipient Name",
  "phone": "Phone Number"
}"""

# Use a large model to predict structured data.
async def predict_structured_data(raw_data: str):
    """Uses the qwen3-235b-a22b model to predict structured data."""
    system_prompt = get_system_prompt()
    
    try:
        response = await client.chat.completions.create(
            messages=[
                {"role": "system", "content": system_prompt},
                {"role": "user", "content": raw_data}
            ],
            model="qwen3-235b-a22b",
            temperature=0.1,  # Lower the temperature to improve prediction accuracy.
            response_format={"type": "json_object"},
            extra_body={"enable_thinking":False}
        )
        
        result = response.choices[0].message.content.strip()
        
        # Clean up possible markdown code block markers.
        if result.startswith('```'):
            lines = result.split('\n')
            for i, line in enumerate(lines):
                if line.strip().startswith('{'):
                    result = '\n'.join(lines[i:])
                    break
        if result.endswith('```'):
            result = result.rsplit('\n```', 1)[0]
        
        # Try to parse the JSON.
        structured_data = json.loads(result)
        return structured_data
        
    except Exception as e:
        print(f"Failed to predict structured data: {e}. Raw data: {raw_data}")
        # Return empty structured data as a backup.
        return {
            "province": "",
            "city": "",
            "district": "",
            "specific_location": "",
            "name": "",
            "phone": ""
        }

# Data conversion phase.
async def convert_data_phase():
    """Converts the data format and uses a large model to predict structured data."""
    print("=== Phase 2: Start converting data format ===")
    
    try:
        print("Start reading the recipient_data.json file...")
        
        # Read the raw data.
        with open('recipient_data.json', 'r', encoding='utf-8') as f:
            raw_data_list = json.load(f)
        
        print(f"Successfully read the data. A total of {len(raw_data_list)} records")
        print("Start using the qwen3-235b-a22b model to predict structured data...")
        # Using a simple and clear system message helps improve training and inference speed.
        system_prompt = "You are a professional information extraction assistant that specializes in extracting JSON information of recipients from Chinese text. The keys to include are province, city, district, specific_location (detailed information such as street, house number, residential area, and building), name (recipient's name), and phone (contact number). The input is as follows:" 
        output_file = 'recipient_sft_data.json'
        
        # Use a semaphore to control the number of concurrent requests.
        semaphore = asyncio.Semaphore(10)
        
        async def process_single_item(index, raw_data):
            async with semaphore:
                # Use a large model to predict structured data.
                structured_data = await predict_structured_data(raw_data)
                print(f"Processing data entry {index+1}: {raw_data}")

                conversation = {
                    "instruction": system_prompt + raw_data,
                    "output": json.dumps(structured_data, ensure_ascii=False)
                }
            
                return conversation
        
        print(f"Start converting data to {output_file}...")
        
        # Process all data concurrently.
        tasks = [process_single_item(i, raw_data) for i, raw_data in enumerate(raw_data_list)]
        conversations = await asyncio.gather(*tasks)

        with open(output_file, 'w', encoding='utf-8') as outfile:
            json.dump(conversations, outfile, ensure_ascii=False, indent=4)
        
        print(f"Conversion complete. A total of {len(raw_data_list)} records processed")
        print(f"Output file: {output_file}")
        print("=== Phase 2 complete ===")
        
    except FileNotFoundError:
        print("Error: The recipient_data.json file was not found.")
        sys.exit(1)
    except json.JSONDecodeError as e:
        print(f"JSON parsing error: {e}")
        sys.exit(1)
    except Exception as e:
        print(f"An error occurred during conversion: {e}")
        sys.exit(1)

# Main function.
async def main():
    print("Starting the merged data processing flow...")
    print("This program will execute two phases in sequence:")
    print("1. Generate raw recipient data.")
    print("2. Use the qwen3-235b-a22b model to predict structured data and convert it to SFT training format.")
    print("-" * 50)
    
    # Phase 1: Generate data.
    success = await produce_data_phase()
    
    if success:
        # Phase 2: Convert data.
        await convert_data_phase()
        
        print("\n" + "=" * 50)
        print("All processes are complete.")
        print("Generated files:")
        print("- recipient_data.json: Raw data list")
        print("- recipient_sft_data.json: SFT training format data")
        print("=" * 50)
    else:
        print("Data generation phase failed. Execution terminated.")

if __name__ == '__main__':
    # Set the event loop policy.
    if platform.system() == 'Windows':
        asyncio.set_event_loop_policy(asyncio.WindowsSelectorEventLoopPolicy())
    # Run the main coroutine.
    asyncio.run(main(), debug=False) 
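The script writes every labeled sample to recipient_sft_data.json, including the all-empty fallback records that predict_structured_data returns when labeling fails. Fine-tuning in section 4.3 takes a separate training dataset and validation dataset, so you need to filter and split this file. A minimal sketch follows; the 90/10 ratio, the fixed seed, and the output file names train.json and eval.json are assumptions, not values from this solution.

```python
import json
import random


def is_labeled(sample: dict) -> bool:
    """False for samples whose output is the all-empty fallback written on labeling failure."""
    try:
        fields = json.loads(sample["output"])
    except (KeyError, json.JSONDecodeError):
        return False
    return isinstance(fields, dict) and any(fields.values())


def split_dataset(samples: list, eval_ratio: float = 0.1, seed: int = 42):
    """Shuffle a copy of `samples` and split it into (train, eval) lists."""
    if not 0 < eval_ratio < 1:
        raise ValueError("eval_ratio must be between 0 and 1")
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)  # Fixed seed: the same split on every run.
    n_eval = max(1, int(len(shuffled) * eval_ratio))
    return shuffled[n_eval:], shuffled[:n_eval]


def split_file(src: str = "recipient_sft_data.json",
               train_path: str = "train.json", eval_path: str = "eval.json"):
    """Drop failed labels, split the rest, and write both parts to disk."""
    with open(src, encoding="utf-8") as f:
        samples = [s for s in json.load(f) if is_labeled(s)]
    train, evals = split_dataset(samples)
    for path, part in ((train_path, train), (eval_path, evals)):
        with open(path, "w", encoding="utf-8") as f:
            json.dump(part, f, ensure_ascii=False, indent=4)
    return len(train), len(evals)
```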

4.3 Fine-tune the model

  1. In the left-side navigation pane, click Model Gallery. Search for the Qwen3-0.6B card and click Fine-tune.

  2. Configure the parameters for the training job. Configure only the following key parameters and keep the default values for the others.

    • Training Mode: The default selection is SFT (Supervised Fine-Tuning) using the LoRA method.

      LoRA is a parameter-efficient fine-tuning technique: it keeps the original model weights frozen and trains only small low-rank matrices added to them, which saves training resources.
    • Training dataset: First, download the sample training dataset train_qwen3.json. Then, on the configuration page, select OSS file or directory, click the icon to select a bucket, click Upload File to upload the downloaded dataset to Object Storage Service (OSS), and then select the file.

    • Validation dataset: First, download the validation dataset eval_qwen3.json. Then, click Add validation dataset and follow the same procedure as for the training dataset to upload and select the file.

      The validation dataset evaluates the model's performance on unseen data during training.
    • Model output path: By default, the system saves the fine-tuned model to OSS. If no OSS directory is selected, click Create folder and then select the newly created directory.

    • Resource Group Type: Select Public Resource Group. This fine-tuning task requires approximately 5 GB of GPU memory. The console has already filtered the instance types that meet this requirement. Select an instance type, such as ecs.gn7i-c16g1.4xlarge.

      When you fine-tune other models, you can refer to Estimate the GPU memory required for a large model to calculate the GPU memory needed for training.
    • Hyperparameters:

      • learning_rate: Set to 0.0005

      • num_train_epochs: Set to 4

      • per_device_train_batch_size: Set to 8

      • seq_length: Set to 512

      The model performs well on the test data in this topic with this hyperparameter configuration. If you encounter low accuracy when fine-tuning a model for your business needs, try adjusting the hyperparameters. To learn more about what hyperparameters do and how to use the loss curve to guide adjustments, see the Alibaba Cloud Large Model ACP course.

      Then, click Train > OK. The training job enters the Creating state. Fine-tuning starts when the status changes to Running.

  3. Wait for the training job to complete. Fine-tuning takes about 10 minutes, during which the job details page displays logs and metric curves. When the job completes, the system saves the fine-tuned model to the specified OSS directory.

    To view the training job details later, in the left-side navigation pane, click Model Gallery > Job Management > Training Jobs, and then click the job name.

    (Optional) Adjust hyperparameters using loss curves

    On the job details page, you can view the train_loss curve (training set loss) and the eval_loss curve (validation set loss):


    You can use the trend of the loss values to assess the model's training effectiveness:

    • Underfitting: Both the train_loss and eval_loss curves are still decreasing when training ends.

      You can increase the num_train_epochs parameter (the number of training epochs, which is positively correlated with training depth) or the lora_rank value (the rank of the low-rank matrix; a larger rank allows the model to handle more complex tasks but increases the risk of overfitting). Then, retrain the model to better fit the training data.

    • Overfitting: The train_loss continues to decrease while the eval_loss starts to increase before training ends.

      You can decrease the num_train_epochs parameter or the lora_rank value, and then retrain the model to prevent overfitting.

    • Good fit: Both the train_loss and eval_loss curves stabilize before the training ends.

      When the model reaches this state, you can proceed to the next steps.

    Due to space limitations, this topic does not detail fine-tuning parameters. To learn about key parameters in fine-tuning commands and how to use loss curves to guide further fine-tuning, see the Alibaba Cloud Large Model ACP course.
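The three loss-curve cases described above can be expressed as a rough heuristic over the logged loss values. The following sketch illustrates that reasoning and is not how the console evaluates training. It compares the mean of the last few logged points with the mean of the window before them; the window size and tolerance are arbitrary assumptions that you would tune by eye against the actual curves.

```python
from statistics import mean


def _trend(values, window, tol):
    """Return 'down', 'up', or 'flat' by comparing the last two windows of a curve."""
    prev = mean(values[-2 * window:-window])
    last = mean(values[-window:])
    change = (last - prev) / max(abs(prev), 1e-12)  # Relative change; avoids division by zero.
    if change < -tol:
        return "down"
    if change > tol:
        return "up"
    return "flat"


def diagnose(train_loss, eval_loss, window=3, tol=0.01):
    """Classify the end of training as underfitting, overfitting, good fit, or unclear."""
    if min(len(train_loss), len(eval_loss)) < 2 * window:
        raise ValueError("each curve needs at least 2 * window points")
    t = _trend(train_loss, window, tol)
    e = _trend(eval_loss, window, tol)
    if t == "down" and e == "down":
        return "underfitting"  # Both curves are still decreasing when training ends.
    if t == "down" and e == "up":
        return "overfitting"   # Training loss falls while validation loss rises.
    if t == "flat" and e == "flat":
        return "good fit"      # Both curves have stabilized.
    return "unclear"           # Inspect the curves manually.
```

Feed it the values read from the train_loss and eval_loss curves on the job details page, and treat the result only as a prompt to look more closely at the curves.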

4.4 Deploy the fine-tuned model

On the training job details page, click Deploy to open the deployment configuration page. For Resource Type, select Public Resources. Deploying the 0.6B model requires about 5 GB of GPU memory. The list under Instance Type automatically displays compatible specifications. Select one, such as ecs.gn7i-c8g1.2xlarge. Keep the other parameters at their default values, and then click Deploy > OK.
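To see where part of the approximately 5 GB of GPU memory mentioned above for deploying the 0.6B model comes from, the following back-of-the-envelope sketch computes the memory needed just to hold the model weights. It assumes 16-bit (BF16 or FP16) weights at 2 bytes per parameter and uses the nominal 0.6B parameter count. It deliberately ignores the KV cache, activations, and framework overhead, which presumably make up the rest of the estimate.

```python
def weight_memory_gib(num_params: float, bytes_per_param: int = 2) -> float:
    """Return the memory (GiB) needed to store the model weights alone."""
    return num_params * bytes_per_param / 1024**3


# Nominal parameter count of Qwen3-0.6B; the exact count differs slightly.
print(f"{weight_memory_gib(0.6e9):.2f} GiB")     # 1.12 GiB with 16-bit weights
print(f"{weight_memory_gib(0.6e9, 4):.2f} GiB")  # 2.24 GiB with 32-bit weights, for comparison
```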

Deployment takes about 5 minutes and is complete when the status changes to Running.

If you have left the training job details page, return to it: in the left-side navigation pane, click Model Gallery > Job Management > Training Jobs, and then click the job name.
If the Deploy button is disabled after the training job succeeds, it means the output model is still being registered. Wait about one minute for the button to be enabled.

The steps to invoke the model are the same as described in 3.2 Invoke the model.

4.5 Evaluate the fine-tuned model

Before deploying the fine-tuned model to a production environment, evaluate its performance to ensure it is stable and accurate. This evaluation helps prevent unexpected issues after deployment.

Prepare test data

Prepare a test dataset that does not overlap with your training data to evaluate the model's performance. The accuracy test code below automatically downloads a test set for this purpose.

Using a test dataset that is separate from the training data ensures an unbiased assessment of the model's generalization ability on unseen data. This practice prevents inflated scores that result from evaluating the model on data it has already seen.
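One simple way to confirm that a test set does not overlap with the training data is to compare normalized instruction text. The sketch below assumes both files use the instruction/output format shown in section 4.2. Whitespace normalization is a simplifying assumption and does not catch near-duplicates that differ in other ways; find_overlap and check_files are hypothetical helper names.

```python
import json


def normalize(text: str) -> str:
    """Collapse whitespace so trivially different copies compare equal."""
    return " ".join(text.split())


def find_overlap(train_samples: list, test_samples: list) -> list:
    """Return the test instructions that also appear in the training data."""
    seen = {normalize(s["instruction"]) for s in train_samples}
    return [s["instruction"] for s in test_samples if normalize(s["instruction"]) in seen]


def check_files(train_path: str, test_path: str) -> int:
    """Load both JSON files and report how many test samples leak from training."""
    with open(train_path, encoding="utf-8") as f:
        train = json.load(f)
    with open(test_path, encoding="utf-8") as f:
        test = json.load(f)
    overlap = find_overlap(train, test)
    print(f"{len(overlap)} of {len(test)} test samples also appear in the training set")
    return len(overlap)
```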

Design evaluation metrics

Evaluation metrics should align closely with your business objectives. For this solution's use case, it is not enough to check that the model outputs valid JSON; you must also verify that every key-value pair is correct.

Define the evaluation metrics programmatically. For the implementation in this example, refer to the compare_address_info method in the accuracy test code below.
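compare_address_info counts a prediction as correct only when the whole JSON object matches. When accuracy is low, a per-field breakdown can show which keys the model gets wrong. The following sketch is an optional addition and is not part of this solution's test code. It takes parallel lists of already-parsed dicts and reports, for each of the six keys, how often the predicted value equals the expected one, along with the whole-object exact-match rate.

```python
FIELDS = ["province", "city", "district", "specific_location", "name", "phone"]


def field_accuracy(expected: list, predicted: list) -> dict:
    """Return per-field accuracy and exact-match accuracy over parallel lists of dicts."""
    if len(expected) != len(predicted):
        raise ValueError("expected and predicted must have the same length")
    if not expected:
        raise ValueError("no samples to score")
    field_hits = {k: 0 for k in FIELDS}
    exact = 0
    for exp, pred in zip(expected, predicted):
        pred = pred or {}  # A failed parse (None) counts as an empty prediction.
        for k in FIELDS:
            if exp.get(k) == pred.get(k):
                field_hits[k] += 1
        if exp == pred:  # Same whole-object criterion as compare_address_info.
            exact += 1
    n = len(expected)
    scores = {k: hits / n for k, hits in field_hits.items()}
    scores["exact_match"] = exact / n
    return scores
```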

Validate the fine-tuned model

Run the following test code to output the model's accuracy on the test set.

Test model accuracy

Note: Replace the Token and endpoint with the invocation details you obtained earlier.

from openai import AsyncOpenAI
import requests
import json
import asyncio
import os
import sys
# If the 'Token' environment variable is not set, replace the following line with your Token from Elastic Algorithm Service (EAS): token = 'YTA1NTEzMzY3ZTY4Z******************'
token = os.environ.get("Token")
# Check the token before creating the client: the client raises an error at construction time if no API key is available.
if token is None:
    print("Please set the 'Token' environment variable, or assign your token directly to the 'token' variable.")
    sys.exit(1)
# Replace YOUR_ENDPOINT with your endpoint. Do not remove the "/v1" suffix.
client = AsyncOpenAI(
    api_key=token,
    base_url='YOUR_ENDPOINT/v1',
)
system_prompt = """You are a professional information extraction assistant specializing in extracting structured recipient information from Chinese text.
## Task Description
Based on the given input text, accurately extract and generate a JSON output containing the following six fields:
- province: Province/Municipality/Autonomous Region (must be the full official name, e.g., "河南省", "上海市", "新疆维吾尔自治区")
- city: City name (including "市", e.g., "郑州市", "西安市")
- district: District/County name (including "区", "县", e.g., "金水区", "雁塔区")
- specific_location: Detailed address (street, building, or apartment number, etc.)
- name: Full name of the recipient (full Chinese name)
- phone: Contact phone number (full phone number, including area code)
## Extraction Rules
1. **Address Handling**:
   - Accurately identify the hierarchical relationship of province, city, and district.
   - The province name must be the official full name (e.g., "河南省" not "河南").
   - For municipalities, the province and city fields should be the same (e.g., both "上海市").
   - `specific_location` should contain the detailed street address, community name, building number, etc.
2. **Name Recognition**:
   - Accurately extract the full Chinese name, including compound surnames.
   - Include names of ethnic minorities.
3. **Phone Number Handling**:
   - Extract the full phone number, maintaining its original format.
## Output Format
Strictly follow the JSON format below and do not add any explanatory text:
{
  "province": "Province Name",
  "city": "City Name", 
  "district": "District Name",
  "specific_location": "Detailed Address",
  "name": "Recipient Name",
  "phone": "Contact Phone"
}"""
def compare_address_info(actual_address_str, predicted_address_str):
    """Compares two JSON strings representing address information to see if they are identical."""
    try:
        # Parse the actual address information
        if actual_address_str:
            actual_address_json = json.loads(actual_address_str)
        else:
            actual_address_json = {}
        # Parse the predicted address information
        if predicted_address_str:
            predicted_address_json = json.loads(predicted_address_str)
        else:
            predicted_address_json = {}
        # Directly compare if the two JSON objects are identical
        is_same = actual_address_json == predicted_address_json
        return {
            "is_same": is_same,
            "actual_address_parsed": actual_address_json,
            "predicted_address_parsed": predicted_address_json,
            "comparison_error": None
        }
    except json.JSONDecodeError as e:
        return {
            "is_same": False,
            "actual_address_parsed": None,
            "predicted_address_parsed": None,
            "comparison_error": f"JSON parsing error: {str(e)}"
        }
    except Exception as e:
        return {
            "is_same": False,
            "actual_address_parsed": None,
            "predicted_address_parsed": None,
            "comparison_error": f"Comparison error: {str(e)}"
        }
async def predict_single_conversation(conversation_data):
    """Predicts the label for a single conversation."""
    try:
        # Extract user content (excluding the assistant message)
        messages = conversation_data.get("messages", [])
        user_content = None
        for message in messages:
            if message.get("role") == "user":
                user_content = message.get("content", "")
                break
        if not user_content:
            return {"error": "User message not found"}
        response = await client.chat.completions.create(
            model="Qwen3-0.6B",
            messages=[
                {"role": "system", "content": system_prompt},
                {"role": "user", "content": user_content}
            ],
            response_format={"type": "json_object"},
            extra_body={
                "enable_thinking": False
            }
        )
        predicted_labels = response.choices[0].message.content.strip()
        return {"prediction": predicted_labels}
    except Exception as e:
        return {"error": f"Prediction failed: {str(e)}"}
async def process_batch(batch_data, batch_id):
    """Processes a batch of data."""
    print(f"Processing batch {batch_id}, containing {len(batch_data)} items...")
    tasks = []
    for i, conversation in enumerate(batch_data):
        task = predict_single_conversation(conversation)
        tasks.append(task)
    results = await asyncio.gather(*tasks, return_exceptions=True)
    batch_results = []
    for i, result in enumerate(results):
        if isinstance(result, Exception):
            batch_results.append({"error": f"Exception: {str(result)}"})
        else:
            batch_results.append(result)
    return batch_results
async def main():
    output_file = "predicted_labels.jsonl"
    batch_size = 20  # Number of items to process in each batch
    # Read the test data
    url = 'https://help-static-aliyun-doc.aliyuncs.com/file-manage-files/zh-CN/20250616/ssrgii/test.jsonl'
    conversations = []
    try:
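        response = requests.get(url, timeout=60)  # Avoid hanging indefinitely if the server does not respond
        response.raise_for_status()  # Check if the request was successful
        # Decode explicitly as UTF-8 so that the Chinese text is not garbled
        for line_num, line in enumerate(response.content.decode("utf-8").splitlines(), 1):
            if not line.strip():
                continue  # Skip blank lines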
            try:
                data = json.loads(line.strip())
                conversations.append(data)
            except json.JSONDecodeError as e:
                print(f"JSON parsing error on line {line_num}: {e}")
                continue
    except requests.exceptions.RequestException as e:
        print(f"Request error: {e}")
        return
    print(f"Successfully read {len(conversations)} conversations")
    # Process in batches
    all_results = []
    total_batches = (len(conversations) + batch_size - 1) // batch_size
    for batch_id in range(total_batches):
        start_idx = batch_id * batch_size
        end_idx = min((batch_id + 1) * batch_size, len(conversations))
        batch_data = conversations[start_idx:end_idx]
        batch_results = await process_batch(batch_data, batch_id + 1)
        all_results.extend(batch_results)
        print(f"Batch {batch_id + 1}/{total_batches} complete")
        # Add a short delay to avoid making requests too quickly
        if batch_id < total_batches - 1:
            await asyncio.sleep(1)
    # Save results
    same_count = 0
    different_count = 0
    error_count = 0
    with open(output_file, 'w', encoding='utf-8') as f:
        for i, (original_data, prediction_result) in enumerate(zip(conversations, all_results)):
            result_entry = {
                "index": i,
                "original_user_content": None,
                "actual_address": None,
                "predicted_address": None,
                "prediction_error": None,
                "address_comparison": None
            }
            # Extract original user content
            messages = original_data.get("messages", [])
            for message in messages:
                if message.get("role") == "user":
                    result_entry["original_user_content"] = message.get("content", "")
                    break
            # Extract actual address information (if an assistant message exists)
            for message in messages:
                if message.get("role") == "assistant":
                    result_entry["actual_address"] = message.get("content", "")
                    break
            # Save prediction result
            if "error" in prediction_result:
                result_entry["prediction_error"] = prediction_result["error"]
                error_count += 1
            else:
                result_entry["predicted_address"] = prediction_result.get("prediction", "")
                # Compare address information
                comparison_result = compare_address_info(
                    result_entry["actual_address"],
                    result_entry["predicted_address"]
                )
                result_entry["address_comparison"] = comparison_result
                # Tally comparison results
                if comparison_result["comparison_error"]:
                    error_count += 1
                elif comparison_result["is_same"]:
                    same_count += 1
                else:
                    different_count += 1
            f.write(json.dumps(result_entry, ensure_ascii=False) + '\n')
    print(f"All predictions are complete! Results have been saved to {output_file}")
    # Tally the results
    success_count = sum(1 for result in all_results if "error" not in result)
    print(f"Number of samples: {success_count}")
    print(f"Correct responses: {same_count}")
    print(f"Incorrect responses: {different_count}")
    if error_count:
        # Failed requests, plus responses that were not valid JSON
        print(f"Errors: {error_count}")
    if success_count:
        print(f"Accuracy: {same_count * 100 / success_count} %")
if __name__ == "__main__":
    asyncio.run(main())

Output:

All predictions are complete! Results have been saved to predicted_labels.jsonl
Number of samples: 400
Correct responses: 361
Incorrect responses: 39
Accuracy: 91.25 %

Because of the random seed used during fine-tuning and the stochastic nature of large language model output, the accuracy you obtain may differ from the result shown here. This variation is expected.
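The test code sends requests in fixed batches of 20 and pauses 1 second between batches, so each batch waits for its slowest request before the next one starts. An alternative, which is not the approach used in this solution, is to cap the number of in-flight requests with asyncio.Semaphore so that a new request starts as soon as another finishes. The sketch below uses a stand-in coroutine, `fake_predict`, in place of predict_single_conversation so that it runs without a model service. The helper name `run_bounded` is hypothetical, and the limit of 20 is an assumption carried over from batch_size.

```python
import asyncio
import random


async def fake_predict(item: int) -> dict:
    """Stand-in for predict_single_conversation: sleeps briefly, then echoes its input."""
    await asyncio.sleep(random.uniform(0.01, 0.05))
    return {"prediction": item}


async def run_bounded(items, predict, limit: int = 20):
    """Run predict(item) for every item with at most `limit` calls in flight.

    asyncio.gather returns results in input order, so results[i] still lines up
    with items[i] when they are zipped together afterwards.
    """
    semaphore = asyncio.Semaphore(limit)
    in_flight = 0
    peak = 0  # Highest number of concurrent calls observed

    async def worker(item):
        nonlocal in_flight, peak
        async with semaphore:
            in_flight += 1
            peak = max(peak, in_flight)
            try:
                return await predict(item)
            finally:
                in_flight -= 1

    results = await asyncio.gather(*(worker(item) for item in items), return_exceptions=True)
    return results, peak


results, peak = asyncio.run(run_bounded(list(range(100)), fake_predict, limit=20))
print(len(results), peak <= 20)  # 100 True
```

Inside main, you could call `await run_bounded(conversations, predict_single_conversation)` instead of the batch loop. Because return_exceptions=True returns exceptions as objects, convert them into {"error": ...} entries, as process_batch does, before saving the results.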

The fine-tuned model reaches an accuracy of 91.25%, a significant improvement over the 14% accuracy of the original Qwen3-0.6B model. This shows that fine-tuning substantially improved the model's ability to extract structured information for logistics tasks.
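To see where the fine-tuned model still makes mistakes, you can inspect predicted_labels.jsonl. Each line written by the test code contains an address_comparison object with actual_address_parsed and predicted_address_parsed, so you can count which fields differ in the incorrect responses. The sketch below is one way to do this. The function name `tally_field_errors` is hypothetical, and the demo writes made-up records to a temporary file instead of reading real results.

```python
import json
import tempfile
from collections import Counter
from pathlib import Path

FIELDS = ["province", "city", "district", "specific_location", "name", "phone"]


def tally_field_errors(path) -> Counter:
    """Count, per field, how many incorrect responses got that field wrong."""
    counts = Counter()
    with open(path, encoding="utf-8") as f:
        for line in f:
            if not line.strip():
                continue
            comparison = json.loads(line).get("address_comparison")
            # Skip failed predictions, invalid JSON, and correct responses.
            if not comparison or comparison["comparison_error"] or comparison["is_same"]:
                continue
            actual = comparison["actual_address_parsed"]
            predicted = comparison["predicted_address_parsed"]
            # Skip outputs that parsed to something other than a JSON object.
            if not isinstance(actual, dict) or not isinstance(predicted, dict):
                continue
            for field in FIELDS:
                if actual.get(field) != predicted.get(field):
                    counts[field] += 1
    return counts


# Made-up records in the same shape as the test code's output.
records = [
    {"index": 0, "address_comparison": {"is_same": True, "comparison_error": None,
        "actual_address_parsed": {"name": "张三"}, "predicted_address_parsed": {"name": "张三"}}},
    {"index": 1, "address_comparison": {"is_same": False, "comparison_error": None,
        "actual_address_parsed": {"name": "李四", "phone": "13800000000"},
        "predicted_address_parsed": {"name": "李四", "phone": "1380000000"}}},
    {"index": 2, "prediction_error": "Prediction failed: timeout", "address_comparison": None},
]
with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "predicted_labels.jsonl"
    path.write_text("\n".join(json.dumps(r, ensure_ascii=False) for r in records) + "\n",
                    encoding="utf-8")
    counts = tally_field_errors(path)
print(counts.most_common())  # [('phone', 1)]
```

Fields that fail often can guide which kinds of training samples to add before the next fine-tuning run.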

To keep training time short, this guide trains for only 4 epochs. Increasing the number of training epochs may improve accuracy further. For guidance on adjusting hyperparameters for other scenarios, see the Alibaba Cloud large language model ACP course.

4.6 Important note

The model service in this topic uses public resources and is pay-as-you-go. When you no longer need the service, stop or delete it to avoid further charges.

Related documents

  • To learn more about Model Gallery features such as evaluation and compression, see Model Gallery.

  • To learn more about EAS features such as Auto Scaling, stress testing, and monitoring and alerting, see the EAS overview.