Fine-tune image generation models

If prompt optimization alone cannot produce the specific style, IP character, or visual effect you need when generating images with Wan, fine-tune the model.

Scope

  • Supported deployment mode and region: This document applies only to the Beijing region under the China (Mainland) deployment mode. You must use an API key from this region.

  • Account permissions: If you use an Alibaba Cloud sub-account (RAM user), you must grant the sub-account permissions for model invocation, training, and deployment.

  • Supported fine-tuning method: SFT-LoRA efficient fine-tuning.

  • Supported models:

    • Image generation (text-to-image/image-to-image): wan2.7-image-pro.

How to fine-tune a model

Text-to-image

Fine-tuning objective: Train a character LoRA model.

Expected result: Given a text prompt, the model generates images of a specific character in the scene described by the prompt.

Input prompt

A person in a crowded morning rush hour subway car, holding onto the handrail, with blurred passengers in the background and tunnel lights visible through the windows, wearing an ordinary office worker white shirt and black trousers, standing facing the camera, half-body shot, realistic candid feel.

Output image (before fine-tuning - text-to-image)


Without a reference image, the model cannot generate a specific character.

Output image (after fine-tuning)


After fine-tuning, the model can reliably reproduce the specific character from the training set.

Image-to-image

Fine-tuning objective: Train a "post-apocalyptic red-black mech armor" LoRA model.

Expected result: Given a character image, the model generates a "post-apocalyptic red-black mech armor" stylized version of the character without requiring a text prompt.

Input image

29_0

Output image (before fine-tuning)


Text prompts alone cannot reliably produce the specific "post-apocalyptic red-black mech armor" effect every time.

Output image (after fine-tuning)


After fine-tuning, the model can reproduce the "post-apocalyptic red-black mech armor" effect from the training set without requiring a text prompt.

Fine-tuning objective: Train an "IP character stylization" LoRA model.

Expected result: Given a text description or a reference image, the model generates images that match a specific IP character style.

Before running the following code, activate Model Studio and configure your API key. The examples read the key from the DASHSCOPE_API_KEY environment variable.

Step 1: Upload the dataset

Upload your local dataset (in .zip format) to the Alibaba Cloud Model Studio platform and obtain the file ID (file_id).

Sample training data: For the format, see Training set.

Request example

This example uses text-to-image and uploads only the training set. The system automatically splits a portion of the training set as the validation set.
curl --location --request POST 'https://dashscope.aliyuncs.com/api/v1/files' \
--header "Authorization: Bearer $DASHSCOPE_API_KEY" \
--form 'files=@"./wan-image-t2i-training-dataset.zip"' \
--form 'purpose="fine-tune"' \
--form 'descriptions="a fine-tune training data file for wan"'

Response example

Save the file_id. This is the unique identifier for the uploaded dataset.

{
    "data": {
        "uploaded_files": [
            {
                "name": "wan-image-t2i-training-dataset.zip",
                "file_id": "3bff1ef7-f72d-4285-bb75-xxxxxx"
            }
        ],
        "failed_uploads": []
    },
    "request_id": "1f3f1c5b-7418-4976-aaea-xxxxxx"
}
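
The file_id can also be read from the response programmatically. The following Python sketch assumes the response body has exactly the shape shown above. The helper name extract_file_ids is hypothetical and is not part of any SDK.

```python
import json


def extract_file_ids(response_text: str) -> list[str]:
    """Return the file_id of every uploaded file, or raise if any upload failed."""
    body = json.loads(response_text)
    data = body["data"]
    # failed_uploads is an empty list when every file was accepted.
    if data["failed_uploads"]:
        raise RuntimeError(f"upload failed: {data['failed_uploads']}")
    return [item["file_id"] for item in data["uploaded_files"]]


sample = """
{
    "data": {
        "uploaded_files": [
            {"name": "wan-image-t2i-training-dataset.zip",
             "file_id": "3bff1ef7-f72d-4285-bb75-xxxxxx"}
        ],
        "failed_uploads": []
    },
    "request_id": "1f3f1c5b-7418-4976-aaea-xxxxxx"
}
"""
print(extract_file_ids(sample))
```

Raising on a non-empty failed_uploads list is a design choice for this sketch: it stops the workflow before a job is created with a missing dataset.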

Step 2: Fine-tune the model

Step 2.1: Create a fine-tuning job

Use the file ID from Step 1 to start a training job.

Request example

Replace <replace_with_training_dataset_file_id> with the file_id obtained in the previous step.

curl --location 'https://dashscope.aliyuncs.com/api/v1/fine-tunes' \
--header "Authorization: Bearer $DASHSCOPE_API_KEY" \
--header 'Content-Type: application/json' \
--data '{
    "model": "wan2.7-image-pro",
    "training_file_ids": ["<replace_with_training_dataset_file_id>"],
    "training_type": "efficient_sft",
    "hyper_parameters": {
        "learning_rate": 3e-5,
        "max_steps": 800,
        "eval_steps": 200,
        "max_token_length": "2k",
        "gradient_clip": 0.5,
        "weight_decay": 0.02,
        "max_pixels": "2k",
        "val_img_size": "2k",
        "generation_type": "t2i",
        "lora_rank": 32,
        "lora_alpha": 32,
        "save_total_limit": 10
    }
}'
Note

The hyperparameters for image generation models differ from those for video models (image models use max_steps/eval_steps instead of n_epochs/eval_epochs). For the complete parameter reference and format constraints, see Hyperparameters (hyper_parameters).

Training duration reference (2K resolution, for reference only; actual duration depends on training data size and cluster load):

  • Text-to-image (t2i): approximately 77 minutes for 300 steps.

  • Image-to-image (i2i): approximately 110 minutes for 300 steps.

Response example

Pay attention to three key parameters in the output field:

  • job_id: The job ID, used to query progress.

  • finetuned_output: The name of the fine-tuned model. You must use this name for subsequent deployment and invocation.

  • status: The training status. After creating a fine-tuning job, the initial status is PENDING, indicating that training has not yet started.

{
    ...
    "output": {
        "job_id": "ft-202511111122-xxxx",
        "status": "PENDING",
        "finetuned_output": "xxxx-ft-202511111122-xxxx",
        ...
    }
}
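
If you create jobs from a script, it can help to keep the hyperparameters in one place and override only what changes between runs. The sketch below builds the same request body as the curl example in Step 2.1. The function name build_fine_tune_body and its arguments are hypothetical; the default values are copied from the request example, and validation_file_ids is the optional field described under "Specify the validation set".

```python
import json

# Default hyperparameters copied from the Step 2.1 request example.
DEFAULT_HYPER_PARAMETERS = {
    "learning_rate": 3e-5,
    "max_steps": 800,
    "eval_steps": 200,
    "max_token_length": "2k",
    "gradient_clip": 0.5,
    "weight_decay": 0.02,
    "max_pixels": "2k",
    "val_img_size": "2k",
    "generation_type": "t2i",
    "lora_rank": 32,
    "lora_alpha": 32,
    "save_total_limit": 10,
}


def build_fine_tune_body(training_file_id, validation_file_id=None, **overrides):
    """Return the JSON body for the create-job request as a dict."""
    body = {
        "model": "wan2.7-image-pro",
        "training_file_ids": [training_file_id],
        "training_type": "efficient_sft",
        # Later keys win, so overrides replace the defaults.
        "hyper_parameters": {**DEFAULT_HYPER_PARAMETERS, **overrides},
    }
    if validation_file_id is not None:
        body["validation_file_ids"] = [validation_file_id]
    return body


body = build_fine_tune_body("<replace_with_training_dataset_file_id>",
                            max_steps=300, eval_steps=100)
print(json.dumps(body, indent=4))
```

The printed JSON can be passed to the --data option of the curl command in Step 2.1.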
Step 2.2: Query the fine-tuning job status

Use the job_id obtained in Step 2.1 to query the job progress. Poll the following API until the status changes to SUCCEEDED.

Request example

Replace <replace_with_fine_tuning_job_id> in the URL with the value of job_id.

curl --location 'https://dashscope.aliyuncs.com/api/v1/fine-tunes/<replace_with_fine_tuning_job_id>' \
--header "Authorization: Bearer $DASHSCOPE_API_KEY" \
--header 'Content-Type: application/json'

Response example

Pay attention to two parameters in the output field:

  • status: When its value changes to SUCCEEDED, the model training is complete and you can proceed with model deployment.

  • usage: The total number of tokens consumed during model training, used for billing purposes.

{
    ...
    "output": {
        "job_id": "ft-202511111122-xxxx",
        "status": "SUCCEEDED",
        "usage": 432000,
        ...
    }
}
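
The polling described in this step can be sketched as a small loop. This is a minimal illustration, not an official client: wait_for_status is a hypothetical helper, the fetch_status callable stands in for the HTTP request shown above, and the failure status names FAILED and CANCELED are assumptions that are not confirmed by this document.

```python
import time
from typing import Callable


def wait_for_status(fetch_status: Callable[[], str], target: str,
                    failure_states=("FAILED", "CANCELED"),  # assumed names
                    interval_s: float = 30.0, timeout_s: float = 6 * 3600,
                    sleep=time.sleep) -> str:
    """Call fetch_status repeatedly until it returns the target status."""
    waited = 0.0
    while True:
        status = fetch_status()
        if status == target:
            return status
        if status in failure_states:
            raise RuntimeError(f"job ended with status {status}")
        if waited >= timeout_s:
            raise TimeoutError(f"still {status} after {waited:.0f} seconds")
        sleep(interval_s)
        waited += interval_s


# Simulated status sequence so the example runs without network access.
fake_statuses = iter(["PENDING", "PENDING", "SUCCEEDED"])
result = wait_for_status(lambda: next(fake_statuses), "SUCCEEDED",
                         sleep=lambda seconds: None)
print(result)
```

In real use, fetch_status would send the GET request from this step and return output.status. The same loop with target "RUNNING" fits the deployment query in Step 3.2.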

Step 3: Deploy the fine-tuned model

Step 3.1: Deploy the model as an online service

After the fine-tuning job status changes to SUCCEEDED, deploy the model as an online service.

Request example

Replace <replace_with_model_name> with the finetuned_output value from the Step 2.1 response.

curl --location 'https://dashscope.aliyuncs.com/api/v1/deployments' \
--header "Authorization: Bearer $DASHSCOPE_API_KEY" \
--header 'Content-Type: application/json' \
--data '{
    "model_name": "<replace_with_model_name>",
    "capacity": 1,
    "plan": "lora"
}'

Response example

Pay attention to two parameters in the output field:

  • deployed_model: The deployed model name, used to query the deployment status and invoke the model.

  • status: The model deployment status. After deploying the fine-tuned model, the initial status is PENDING, indicating that deployment has not yet started.

{
    ...
    "output": {
        "deployed_model": "wan2.7-image-pro-xxxxxxxxxxxx",
        "status": "PENDING",
        ...
    }
}
Step 3.2: Query the deployment status

Query the deployment status. Poll the following API until the status changes to RUNNING.

Note

For the fine-tuned model in this example, the deployment process takes approximately 5-10 minutes.

Request example

Replace <replace_with_deployed_model> with the deployed_model value from the Step 3.1 output.

curl --location 'https://dashscope.aliyuncs.com/api/v1/deployments/<replace_with_deployed_model>' \
--header "Authorization: Bearer $DASHSCOPE_API_KEY" \
--header 'Content-Type: application/json' 

Response example

Pay attention to two parameters in the output field:

  • status: When the status changes to RUNNING, the model has been deployed successfully and you can start invoking it.

  • deployed_model: The deployed model name.

{
    ...
    "output": {
        "status": "RUNNING",
        "deployed_model": "wan2.7-image-pro-xxxxxxxxxxxx",
        ...
    }
}

Step 4: Invoke the model to generate images

After the model is deployed successfully (deployment status is RUNNING), you can start making invocations.

Step 4.1: Create an image generation task and obtain the task_id

Request example

Replace <replace_with_deployed_model> with the deployed_model value from the previous step.

Text-to-image

Provide a text description containing the trigger word. The model generates images matching the trained style.

curl --location 'https://dashscope.aliyuncs.com/api/v1/services/aigc/image-generation/generation' \
--header 'Content-Type: application/json' \
--header "Authorization: Bearer $DASHSCOPE_API_KEY" \
--header "X-DashScope-Async: enable" \
--data '{
    "model": "<replace_with_deployed_model>",
    "input": {
        "messages": [
            {
                "role": "user",
                "content": [
                    {"text": "s86b5p, A person in a crowded morning rush hour subway car, holding onto the handrail, with blurred passengers in the background and tunnel lights visible through the windows, wearing an ordinary office worker white shirt and black trousers, standing facing the camera, half-body shot, realistic candid feel."}
                ]
            }
        ]
    },
    "parameters": {
        "size": "2K",
        "n": 1
    }
}'

Image-to-image

Provide a reference image and editing instructions. The model generates images based on the reference image in the trained style.

curl --location 'https://dashscope.aliyuncs.com/api/v1/services/aigc/image-generation/generation' \
--header 'Content-Type: application/json' \
--header "Authorization: Bearer $DASHSCOPE_API_KEY" \
--header "X-DashScope-Async: enable" \
--data '{
    "model": "<replace_with_deployed_model>",
    "input": {
        "messages": [
            {
                "role": "user",
                "content": [
                    {"image": "<replace_with_reference_image_URL>"},
                    {"text": "s86b5p, Change the background to an elevator with red lighting. Change the character clothing to red tight-fitting mech armor with black stripe decorations."}
                ]
            }
        ]
    },
    "parameters": {
        "size": "2K",
        "n": 1
    }
}'

Response example

Copy and save the task_id for querying the result in the next step.

{
    "request_id": "4909100c-7b5a-9f92-bfe5-xxxxxx",
    "output": {
        "task_id": "0385dc79-5ff8-4d82-bcb6-xxxxxx",
        "task_status": "PENDING"
    }
}

Input parameters

Note

When invoking a fine-tuned LoRA model, the input parameters are the same as those for the Wan Image Generation and Editing 2.7 API.

The following table lists only the key parameters for LoRA model invocation.

Field

Type

Required

Description

Example

model

string

Yes

The model name. You must use a fine-tuned model that has been successfully deployed with a status of RUNNING.

wan2.7-image-pro-xxxxxxxxxxxx

input.messages[].content[].text

string

Yes

The text prompt. We recommend including the trigger word to activate the LoRA style.

s86b5p, A person in a quiet private library on a peaceful afternoon...

parameters.size

string

No

The output image resolution.

  • Option 1: Specify the output resolution (recommended)

    • Supports 1K, 2K (default), and 4K

    • Applicable modes:

      • Text-to-image: supports 1K, 2K, and 4K.

      • Image editing: supports 1K and 2K.

    • Total pixels per resolution: 1K: 1024*1024, 2K: 2048*2048, 4K: 4096*4096

  • Option 2: Specify width and height pixel values

    • Text-to-image: Total pixels must be in the range [768*768, 4096*4096], and the aspect ratio must be in the range [1:8, 8:1].

    • Image editing: Total pixels must be in the range [768*768, 2048*2048], and the aspect ratio must be in the range [1:8, 8:1].

2K

parameters.n

integer

No

The number of images to generate. Valid values: 1-4. Default: 1.

1
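
As an illustration of how these parameters fit together, the following Python sketch assembles the request body for both modes and applies the constraints listed above (n between 1 and 4, and 4K only for text-to-image). The function build_generation_body is a hypothetical helper, and prepending the trigger word is a convention assumed from the sample prompts in this document.

```python
def build_generation_body(deployed_model, prompt, trigger_word=None,
                          reference_image_url=None, size="2K", n=1):
    """Return the request body for text-to-image or image-to-image invocation."""
    if not 1 <= n <= 4:
        raise ValueError("n must be between 1 and 4")
    if reference_image_url is not None and size == "4K":
        raise ValueError("image editing supports 1K and 2K only")
    # Prepend the trigger word unless the prompt already starts with it.
    text = prompt
    if trigger_word and not prompt.startswith(trigger_word):
        text = f"{trigger_word}, {prompt}"
    content = []
    if reference_image_url is not None:
        # A reference image switches the request to image-to-image.
        content.append({"image": reference_image_url})
    content.append({"text": text})
    return {
        "model": deployed_model,
        "input": {"messages": [{"role": "user", "content": content}]},
        "parameters": {"size": size, "n": n},
    }


body = build_generation_body(
    "<replace_with_deployed_model>",
    "A person in a quiet private library on a peaceful afternoon",
    trigger_word="s86b5p",
)
print(body["input"]["messages"][0]["content"][0]["text"])
```

Custom width and height values for size are passed through unchecked in this sketch; only the 4K preset is rejected for image editing.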

Step 4.2: Query results by task_id

Poll the task status using the task_id until task_status changes to SUCCEEDED. Retrieve the image URL from output.choices[].message.content[].image.

Request example

Replace 86ecf553-d340-4e21-xxxxxxxxx with your actual task_id.
curl -X GET https://dashscope.aliyuncs.com/api/v1/tasks/86ecf553-d340-4e21-xxxxxxxxx \
--header "Authorization: Bearer $DASHSCOPE_API_KEY"

Response example

The image URL is valid for 24 hours. Download the image promptly.
{
    "request_id": "3f2ebb4e-3d47-97b5-xxxx-xxxxxx",
    "output": {
        "task_id": "aeea547c-e24e-4acb-xxxx-xxxxxx",
        "task_status": "SUCCEEDED",
        "submit_time": "2026-05-29 17:35:23.826",
        "scheduled_time": "2026-05-29 17:35:23.865",
        "end_time": "2026-05-29 17:36:32.498",
        "finished": true,
        "choices": [
            {
                "finish_reason": "stop",
                "message": {
                    "role": "assistant",
                    "content": [
                        {
                            "image": "https://dashscope-7c2c.oss-accelerate.aliyuncs.com/xxx.png?Expires=xxxxxx"
                        }
                    ]
                }
            }
        ]
    },
    "usage": {
        "size": "2048*2048",
        "total_tokens": 770,
        "image_count": 1,
        "output_tokens": 691,
        "input_tokens": 79
    }
}
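
Because the image URLs expire, a script will usually collect them as soon as the task succeeds. The sketch below walks output.choices[].message.content[].image in a response shaped like the example above. The function name extract_image_urls is hypothetical.

```python
def extract_image_urls(task_response: dict) -> list[str]:
    """Return every image URL in a task response, or [] if the task has not succeeded."""
    output = task_response["output"]
    if output.get("task_status") != "SUCCEEDED":
        return []
    urls = []
    for choice in output.get("choices", []):
        for part in choice["message"]["content"]:
            # Content items without an "image" key are skipped.
            if "image" in part:
                urls.append(part["image"])
    return urls


sample_response = {
    "output": {
        "task_status": "SUCCEEDED",
        "choices": [
            {
                "finish_reason": "stop",
                "message": {
                    "role": "assistant",
                    "content": [
                        {"image": "https://dashscope-7c2c.oss-accelerate.aliyuncs.com/xxx.png?Expires=xxxxxx"}
                    ],
                },
            }
        ],
    }
}
print(extract_image_urls(sample_response))
```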

Build custom datasets

Besides the sample data used in this document, you can build your own datasets for fine-tuning.

A dataset contains a training set (required) and a validation set (optional; if omitted, it can be split automatically from the training set). Package all files in .zip format. File names should contain only English letters, digits, underscores, or hyphens.

Dataset format

Training set: Required

Text-to-image

The training set includes training target images and an annotation file (data.jsonl).

  • Training set sample: wan-image-t2i-training.zip.

  • Zip package directory structure:

    wan-image-t2i-training-dataset.zip
    ├── data.jsonl      # Must be named data.jsonl, maximum 20MB
    ├── 1_0.png         # Training target image, max resolution 4096*4096, max 20MB per image, supports PNG/JPG/JPEG/WEBP/BMP
    ├── 1_1.png         # File names support only English characters, flat structure (no subdirectories)
    └── 1_2.png
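  • Annotation file (data.jsonl): Each line represents one training sample and must be a JSON object. The sample below is wrapped across lines for readability; in the file, each object must occupy a single line.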

    {
      "prompt": "s86b5p, A person in a quiet private library on a peaceful afternoon, with tall dark walnut bookshelves behind them, sunlight streaming through venetian blinds casting striped shadows, wearing a soft beige cable-knit sweater, standing facing the camera, half-body shot, the image has a delicate film grain texture.",
      "img_path": "./1_0.png"
    }

Image-to-image

The training set includes reference images (input), training target images (output), and an annotation file (data.jsonl).

  • Training set sample: wan-image-i2i-training.zip.

  • Zip package directory structure:

    wan-image-i2i-training-dataset.zip
    ├── data.jsonl      # Must be named data.jsonl, maximum 20MB
    ├── 1_0.jpg         # Training target image (output)
    ├── 1_1.jpg         # Reference image (input)
    ├── 6_0.jpg         # Training target image (output)
    └── 6_1.jpg         # Reference image (input)
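  • Annotation file (data.jsonl): Each line represents one training sample and must be a JSON object. The sample below is wrapped across lines for readability; in the file, each object must occupy a single line.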

    {
      "prompt": "s86b5p, Change the background to an elevator with red lighting, featuring large floor-to-ceiling windows. Change the character's clothing to red tight-fitting mech armor with black stripe decorations.",
      "input_img": "./1_1.jpg",
      "img_path": "./1_0.jpg"
    }
Note
  • data.jsonl must be in line-delimited JSONL format (one independent JSON object per line). Using JSON array format (where the first character of the file is [) is not allowed.

  • Files within the zip package must be placed in a flat structure. Subdirectories are not allowed. File names support only English characters (Chinese characters, spaces, and special characters are not allowed).
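
The format rules above can be checked locally before uploading. The following Python sketch is one possible pre-flight check, not an official validator: check_dataset_zip and make_zip are hypothetical helpers, and the file-name pattern (letters, digits, underscores, hyphens, and dots) is an assumption based on the naming rule stated earlier.

```python
import io
import json
import re
import zipfile

SAFE_NAME = re.compile(r"^[A-Za-z0-9_.-]+$")  # assumed reading of the naming rule


def check_dataset_zip(zip_source, image_to_image=False):
    """Return a list of problems found; an empty list means none were detected."""
    problems = []
    with zipfile.ZipFile(zip_source) as zf:
        names = zf.namelist()
        for name in names:
            if "/" in name:
                problems.append(f"{name}: subdirectories are not allowed")
            elif not SAFE_NAME.match(name):
                problems.append(f"{name}: unsupported characters in file name")
        if "data.jsonl" not in names:
            return problems + ["data.jsonl is missing"]
        text = zf.read("data.jsonl").decode("utf-8")
    if text.lstrip().startswith("["):
        return problems + ["data.jsonl is a JSON array, not line-delimited JSONL"]
    required = ["prompt", "img_path"] + (["input_img"] if image_to_image else [])
    for lineno, line in enumerate(text.splitlines(), start=1):
        if not line.strip():
            continue
        try:
            sample = json.loads(line)
        except json.JSONDecodeError:
            problems.append(f"line {lineno}: not valid JSON")
            continue
        if not isinstance(sample, dict):
            problems.append(f"line {lineno}: not a JSON object")
            continue
        for key in required:
            if key not in sample:
                problems.append(f"line {lineno}: missing {key}")
            elif key != "prompt" and str(sample[key]).removeprefix("./") not in names:
                problems.append(f"line {lineno}: {sample[key]} not found in zip")
    return problems


def make_zip(files: dict) -> io.BytesIO:
    """Build an in-memory zip from {name: bytes} so the example needs no disk files."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        for name, payload in files.items():
            zf.writestr(name, payload)
    buf.seek(0)
    return buf


line = json.dumps({"prompt": "s86b5p, A person in a quiet private library",
                   "img_path": "./1_0.png"})
good = make_zip({"data.jsonl": (line + "\n").encode(), "1_0.png": b"placeholder"})
print(check_dataset_zip(good))
```

The check does not open the images, so resolution and real file format are not verified here.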

Validation set: Optional

The validation set includes an annotation file (data.jsonl) and optional reference images (required for image-to-image mode). Target images are not needed. At each evaluation checkpoint, the training job automatically invokes the model service to generate preview images using the prompts (and reference images) from the validation set.

  • Validation set:

  • Zip package directory structure:

    wan-image-i2i-valid-dataset.zip
    ├── data.jsonl       # Must be named data.jsonl, maximum 20MB
    ├── input_001.png    # Optional, reference image for image-to-image mode
    └── input_002.png
  • Annotation file (data.jsonl): Each line represents one validation sample and must be a JSON object.

    Text-to-image

    {
        "prompt": "s86b5p, A person in a crowded morning rush hour subway car, holding onto the handrail, with blurred passengers in the background and tunnel lights visible through the windows, wearing an ordinary office worker white shirt and black trousers, standing facing the camera, half-body shot, realistic candid feel."
    }

    Image-to-image

    {
        "prompt": "s86b5p, Change the background to an elevator with red lighting, featuring large floor-to-ceiling windows. Change the character's clothing to red tight-fitting mech armor with black stripe decorations.",
        "input_img": "./input_001.png"
    }

Data scale and limits

  • Data volume: Provide at least 25 images; 50 or more typically gives better results. Use the same character or style across multiple scenes and angles, and keep the content descriptions consistent.

  • Zip package: When uploading via API, the total package size must be no larger than 1 GB.

  • Training image requirements:

    • Supported image formats: BMP, JPEG, PNG, and WEBP.

    • Image resolution must be no larger than 4096×4096.

    • Individual image file size must be no larger than 20 MB.
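
The size and count limits above can be checked from file metadata alone. The sketch below is illustrative: check_limits is a hypothetical helper, and interpreting 1 GB and 20 MB as decimal units (10^9 and 20×10^6 bytes) is a conservative assumption, since this document does not say whether the limits are decimal or binary. Image resolution is not checked because that requires decoding the images.

```python
import os

MAX_IMAGE_BYTES = 20 * 10**6   # 20 MB, decimal interpretation (assumption)
MAX_ZIP_BYTES = 10**9          # 1 GB, decimal interpretation (assumption)
MIN_IMAGES = 25                # recommended minimum, not a hard limit
IMAGE_EXTENSIONS = {".bmp", ".jpg", ".jpeg", ".png", ".webp"}


def check_limits(file_sizes: dict, zip_bytes: int) -> list:
    """Return a list of limit violations for {file name: size in bytes}."""
    problems = []
    images = {n: s for n, s in file_sizes.items() if n != "data.jsonl"}
    for name, size in images.items():
        ext = os.path.splitext(name)[1].lower()
        if ext not in IMAGE_EXTENSIONS:
            problems.append(f"{name}: unsupported image format")
        if size > MAX_IMAGE_BYTES:
            problems.append(f"{name}: larger than 20 MB")
    if len(images) < MIN_IMAGES:
        problems.append(f"only {len(images)} images; at least {MIN_IMAGES} recommended")
    if zip_bytes > MAX_ZIP_BYTES:
        problems.append("zip package larger than 1 GB")
    return problems


sizes = {"data.jsonl": 12_000}
sizes.update({f"{i}_0.png": 3_000_000 for i in range(25)})
print(check_limits(sizes, zip_bytes=75_012_000))
```

For image-to-image datasets the image count includes reference images, so the number of training samples is smaller than the number of files counted here.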

Data collection and cleaning

1. Determine the fine-tuning scenario

Wan supports the following fine-tuning scenarios for image generation:

  • IP character stylization: Train the model to learn the drawing style of a specific IP character, such as anime characters or mascot images.

  • Fixed visual style: Improve the model's ability to reproduce a specific art style, such as flat illustration, ink painting, or pixel art.

  • Specific scene generation: Replicate specific composition patterns or scene templates, such as product display images or poster layouts.

2. Obtain raw materials
  • AI generation and selection: Use the Wan base model to generate images in bulk, then manually select the high-quality samples that best match the target effect. This is the most commonly used method.

  • Real photography: If your goal is highly realistic output (such as real product photos or portrait photography), real photographs are the best source material.

  • 3D software rendering: For scenes that require fine detail control or 3D rendering styles, we recommend using 3D software (such as Blender or C4D) to create source materials.

3. Clean the data

| Dimension | Best practice | Anti-pattern |
|---|---|---|
| Consistency | Core features must be highly consistent. For example: When training a "flat illustration style", all images must share the same line thickness and color scheme. | Mixed styles. The dataset contains both impasto style and flat style images. The model cannot determine which style to learn. |
| Diversity | The more diverse the subjects and scenes, the better. Cover different subjects (men, women, elderly, children, cats, dogs, buildings) and different compositions (long shot, close-up, extreme close-up). Resolution and aspect ratios should also be as varied as possible. | Single scene or subject. All images show "a person in red clothes against a white wall". The model may mistakenly learn that "red clothes" and "white wall" are part of the style, and fail to generate correctly in different scenes. |
| Balance | Balanced proportions across data types. If multiple styles are included, the quantity should be roughly equal. | Severely imbalanced proportions. 90% are portrait images and 10% are landscape images. The model may perform poorly when generating landscape images. |
| Cleanliness | Clean and clear images. Use original materials without distractions. | Contains distracting elements. Images contain watermarks, obvious black borders, or noise. The model may learn the watermarks as part of the style. |
| Resolution | Moderate resolution. We recommend that training image resolution does not exceed 2048×2048. Excessively large images increase training time. | Resolution varies too widely. Having both 256×256 small images and 4096×4096 large images in the training set affects training stability. |

Image annotation: Writing prompts for images

In the dataset annotation file (data.jsonl), each image has a corresponding prompt. The prompt describes the content of the target image. The quality of the prompt directly determines what the model learns.

Prompt writing formula

Prompt = [Subject description] + [Background description] + [Trigger word] + [Style description]

| Prompt component | Description | Recommendation | Example |
|---|---|---|---|
| Subject description | Describes the people or objects in the image | Required | A young woman wearing a red Chinese-style long shirt... |
| Background description | Describes the environment where the subject is located | Required | The background is a brick wall covered with green vines... |
| Trigger word | A rare word with no actual meaning | Recommended | s86b5p or m01aa |
| Style description | Describes the art style and visual characteristics of the target image in detail | Recommended | Rendered in flat illustration style with clean flowing lines and vivid flat colors to emphasize three-dimensionality and modern design aesthetics. |
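
When annotating many images, composing prompts from the same parts keeps them consistent. The sketch below joins the four components in the order given by the formula. The helper compose_prompt is hypothetical, and the comma-separated joining style is an assumption; note that the sample prompts elsewhere in this document place the trigger word first.

```python
def compose_prompt(subject, background, trigger_word="", style=""):
    """Join the non-empty components in formula order into one prompt string."""
    parts = [subject, background, trigger_word, style]
    # Drop empty parts and trailing punctuation so the joins stay clean.
    cleaned = [p.strip().rstrip(".,") for p in parts if p and p.strip()]
    return ", ".join(cleaned) + "."


print(compose_prompt(
    "A young woman wearing a red Chinese-style long shirt",
    "the background is a brick wall covered with green vines",
    trigger_word="s86b5p",
    style="rendered in flat illustration style with clean flowing lines",
))
```

Keeping the trigger word and style description identical across samples, and varying only the subject and background, matches the guidance that the style description is usually consistent across samples.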

About trigger words
  • What is a trigger word?

    It serves as a "visual anchor". Because many complex visual styles (such as a unique image texture or specific color scheme) are difficult to describe precisely in text, a trigger word explicitly tells the model: when you see s86b5p, you must generate this specific visual style.

  • Why use it?

    Model fine-tuning establishes mappings between "text" and "image features". The trigger word binds an "indescribable style" to a unique word, enabling the model to lock onto the target.

  • If we already have a trigger word, why still describe the style in detail?

    The two serve different purposes and work better together.

    • Style description: Explains "what the image should look like". It tells the model the basic art style and visual characteristics. The style description is usually consistent across multiple samples.

    • Trigger word: Explains "what the style specifically looks like". It represents unique visual characteristics that cannot be precisely described in text.

Evaluate models with validation sets

Specify the validation set

A fine-tuning job must include a training set; a validation set is optional. You can either let the system split one from the training set automatically or upload your own, as described below.

Method 1: No validation set uploaded (system automatic split)

When creating a fine-tuning job, if no validation set is uploaded separately (that is, the validation_file_ids parameter is not provided), the system splits a validation set from the training set according to the split parameter, which defaults to 0.9. With the default, 90% of the data is used for training and 10% for validation.

Method 2: Manually upload a validation set (specified via validation_file_ids)

If you want to use your own prepared data to evaluate checkpoints instead of relying on system random splitting, you can upload a custom validation set.

Note: Once you choose to upload manually, the system completely ignores the automatic split rules above and uses only the data you uploaded for validation.

Procedure: Manually upload a validation set

  1. Prepare the validation set: Package the validation data into a separate .zip file. See Validation set format.

  2. Upload the validation set: Call the Upload dataset API to upload this validation set .zip file and obtain a dedicated file ID.

  3. Specify the validation set when creating the job: When calling the Create a fine-tuning job API, fill in this file ID in the validation_file_ids parameter.

    {
        "model":"wan2.7-image-pro",
        "training_file_ids":[ "<training_set_file_id>" ],
        "validation_file_ids": [ "<custom_validation_set_file_id>" ],
        ...
    }

Select the best checkpoint for deployment

During training, the system periodically saves model "snapshots" (i.e., checkpoints). By default, the system outputs the last checkpoint as the final fine-tuned model. However, checkpoints produced during intermediate stages may perform better than the final version. You can select the most satisfactory one for deployment.

The system runs checkpoints on the validation set and generates preview images at intervals set by the hyperparameter eval_steps.

  • How to evaluate: Judge the results by directly observing the generated preview images.

  • Selection criteria: Find the checkpoint with the best results and the most closely matching style.

Procedure

Step 1: View preview results generated by checkpoints
Step 1.1: Query the list of validated checkpoints

This API only returns checkpoints that have passed validation and successfully generated preview images. Checkpoints that failed validation are not listed.

Request example

curl --location 'https://dashscope.aliyuncs.com/api/v1/fine-tunes/<replace_with_fine_tuning_job_id>/validation-results' \
--header "Authorization: Bearer $DASHSCOPE_API_KEY" \
--header 'Content-Type: application/json' 

Response example

The output field lists the names of the checkpoints that passed validation.

{
    "request_id": "da1310f5-5a21-4e29-99d4-xxxxxx",
    "output": [
        {
            "checkpoint": "checkpoint-160"
        },
        ...
    ]
}

Step 1.2: Query the validation set results for a checkpoint

Select a checkpoint from the list returned in the previous step (for example, "checkpoint-160") and view the generated image results.

Request example

  • <replace_with_fine_tuning_job_id>: Replace entirely with the job_id value from the Create a fine-tuning job output.

  • <replace_with_checkpoint_to_export>: Replace entirely with the checkpoint value, for example "checkpoint-160".

curl --location 'https://dashscope.aliyuncs.com/api/v1/fine-tunes/<replace_with_fine_tuning_job_id>/validation-details/<replace_with_checkpoint_to_export>?page_no=1&page_size=10' \
--header "Authorization: Bearer $DASHSCOPE_API_KEY"

Response example

The preview image URL is in the img_path field and is valid for 24 hours. Download the images promptly to review the results. Repeat this step to compare the results of multiple checkpoints and find the most satisfactory one.

{
    "request_id": "375b3ad0-d3fa-451f-b629-xxxxxxx",
    "output": {
        "page_no": 1,
        "page_size": 10,
        "total": 5,
        "list": [
            {
                "img_path": "https://finetune-result.oss-cn-wulanchabu.aliyuncs.com/xxx.png?Expires=xxxxxx",
                "prompt": "s86b5p, Change the background to an elevator equipped with a white ceiling lighting, featuring large floor-to-ceiling windows. Change the character's clothing to red tight-fitting mech armor with black stripe decorations.",
                "input_img": "https://finetune-result.oss-cn-wulanchabu.aliyuncs.com/val_dataset/input_001.png?Expires=xxxxxx"
            },
            ...
        ]
    }
}
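
When a validation set has more samples than fit on one page, the request has to be repeated with increasing page_no. The sketch below builds the request URLs and collects the preview image URLs from a response shaped like the example above. The helper names are hypothetical, and treating total as the number of samples across all pages is an assumption.

```python
from urllib.parse import quote, urlencode

BASE_URL = "https://dashscope.aliyuncs.com/api/v1/fine-tunes"


def validation_details_url(job_id, checkpoint, page_no=1, page_size=10):
    """Build the validation-details URL for one page of one checkpoint."""
    query = urlencode({"page_no": page_no, "page_size": page_size})
    return (f"{BASE_URL}/{quote(job_id, safe='')}"
            f"/validation-details/{quote(checkpoint, safe='')}?{query}")


def page_count(total, page_size):
    """Number of pages needed to cover total items (ceiling division)."""
    return -(-total // page_size)


def preview_urls(response: dict) -> list:
    """Collect the preview image URLs from one page of results."""
    return [item["img_path"] for item in response["output"]["list"]]


print(validation_details_url("ft-202511111122-xxxx", "checkpoint-160"))
print(page_count(total=5, page_size=10))
```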

Step 2: Export the checkpoint and obtain the model name for deployment
Step 2.1: Export the model

Assuming "checkpoint-160" has the best results, the next step is to export it.

Request example

  • <replace_with_fine_tuning_job_id>: Replace entirely with the job_id value from the Create a fine-tuning job output.

  • <replace_with_checkpoint_to_export>: Replace entirely with the checkpoint value, for example "checkpoint-160".

  • <replace_with_exported_model_display_name>: Replace entirely with a custom model name used only for console display, for example "wan2.7-checkpoint-160". This name must be globally unique; exporting with a duplicate name is not supported. For parameter details, see Export Checkpoint.

curl --location 'https://dashscope.aliyuncs.com/api/v1/fine-tunes/<replace_with_fine_tuning_job_id>/export/<replace_with_checkpoint_to_export>?model_name=<replace_with_exported_model_display_name>' \
--header "Authorization: Bearer $DASHSCOPE_API_KEY"

Response example

The response parameter output=true indicates that the export request has been successfully created.

{
    "request_id": "0817d1ed-b6b6-4383-9650-xxxxx",
    "output": true
}
Step 2.2: Query the new model name for deployment

Query the status of all checkpoints, confirm that the export is complete, and obtain the dedicated new model name (model_name) for deployment.

Request example

curl --location 'https://dashscope.aliyuncs.com/api/v1/fine-tunes/<replace_with_fine_tuning_job_id>/checkpoints' \
--header "Authorization: Bearer $DASHSCOPE_API_KEY"

Response example

Locate the exported checkpoint (such as checkpoint-160) in the returned list. When its status changes to SUCCEEDED, the export is successful. The model_name field returned at this point is the new model name after export.

{
    "request_id": "b0e33c6e-404b-4524-87ac-xxxxxx",
    "output": [
         ...,
        {
            "create_time": "2025-11-11T13:27:29",
            "full_name": "ft-202511111122-496e:checkpoint-160",
            "job_id": "ft-202511111122-496e",
            "checkpoint": "checkpoint-160",
            "model_name": "xxxx-ft-202511111122-xxxx-c160", // Important field, used for model deployment and invocation
            "model_display_name": "xxxx-ft-202511111122-xxxx",
            "status": "SUCCEEDED" // Successfully exported checkpoint
        },
        ...

    ]
}
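
Looking up the exported model name can be sketched as a search over the returned list. This assumes the response has the shape shown above; exported_model_name is a hypothetical helper that returns None while the export is still in progress or when the checkpoint is not listed.

```python
def exported_model_name(checkpoints_response: dict, checkpoint: str):
    """Return model_name for an exported checkpoint, or None if not ready."""
    for entry in checkpoints_response["output"]:
        if entry.get("checkpoint") != checkpoint:
            continue
        # Only a SUCCEEDED entry carries a model name that can be deployed.
        if entry.get("status") == "SUCCEEDED":
            return entry["model_name"]
        return None
    return None


sample_checkpoints = {
    "output": [
        {
            "full_name": "ft-202511111122-496e:checkpoint-160",
            "job_id": "ft-202511111122-496e",
            "checkpoint": "checkpoint-160",
            "model_name": "xxxx-ft-202511111122-xxxx-c160",
            "status": "SUCCEEDED",
        }
    ]
}
print(exported_model_name(sample_checkpoints, "checkpoint-160"))
```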
Step 3: Deploy and invoke the model

After successfully exporting the checkpoint and obtaining the model_name, follow these steps for subsequent operations:

  • Model deployment: Fill in the model_name input parameter with the specific value obtained after export.

  • Model invocation: Follow the API documentation to invoke the deployed model.

Billing

  • Model training: Charged.

    The following table lists common training step counts and estimated costs for text-to-image (t2i) training. These figures are for reference only: actual consumption depends on the completed training job, and actual costs are those shown on your official bill. For detailed billing formulas, see Model training billing.

    | Image resolution | Common step count | Estimated token consumption | Estimated cost (CNY) |
    |---|---|---|---|
    | 1K | 500 | 64,000,000 | 5,120 |
    | 1K | 1,000 | 128,000,000 | 10,240 |
    | 1K | 2,000 | 256,000,000 | 20,480 |
    | 2K | 500 | 116,100,000 | 9,288 |
    | 2K | 1,000 | 232,200,000 | 18,576 |
    | 2K | 2,000 | 464,400,000 | 37,152 |

  • Model deployment: Free.

  • Model invocation: Charged at the standard invocation price of the base model used for fine-tuning.

    | Model ID | Price per output |
    |---|---|
    | wan2.7-image-pro | CNY 0.50/image |
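
For step counts not listed in the training table, a rough estimate can be extrapolated from it. The sketch below assumes that token consumption scales linearly with steps, using per-step and per-token rates inferred from the table rows (128,000 tokens per step at 1K, 232,200 at 2K, and CNY 0.08 per 1,000 tokens). These rates are derived values, not official prices; the billing formula in Model training billing and your official bill are authoritative.

```python
# Tokens per training step, inferred by dividing each table row's tokens by its steps.
TOKENS_PER_STEP = {"1K": 128_000, "2K": 232_200}
# CNY 0.08 per 1,000 tokens, inferred from the table (5,120 / 64,000,000 tokens).
CNY_PER_100K_TOKENS = 8


def estimate_training(resolution: str, steps: int):
    """Return (estimated tokens, estimated cost in CNY) for t2i training."""
    tokens = TOKENS_PER_STEP[resolution] * steps
    cost_cny = tokens * CNY_PER_100K_TOKENS / 100_000
    return tokens, cost_cny


def estimate_invocation(images: int) -> float:
    """Invocation cost at CNY 0.50 per generated image for wan2.7-image-pro."""
    return images * 50 / 100


for resolution in ("1K", "2K"):
    for steps in (500, 1000, 2000):
        tokens, cost = estimate_training(resolution, steps)
        print(f"{resolution} {steps:>5} steps: {tokens:>11,} tokens, CNY {cost:,.0f}")
```

The loop reproduces the six rows of the training table, which is a quick check that the inferred rates are consistent with it.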

API reference

Video and image generation model fine-tuning API

FAQ

Q: How do I design a good trigger word?

A: The rules are as follows:

  • Use a rare character combination with no semantic meaning, such as s86b5p, m01aa, or EVEAven638123. Make sure the combination carries no meaning in the base model's vocabulary.

  • Avoid using common English words (such as beautiful, fire, or dance), as this would pollute the model's original understanding of these words.
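
One way to get a candidate trigger word is to generate a short random mix of letters and digits. The sketch below is only a starting point: make_trigger_word is a hypothetical helper, and random generation cannot guarantee that the result has no meaning in the base model's vocabulary, so review the result before using it.

```python
import secrets
import string


def make_trigger_word(length: int = 6) -> str:
    """Return a random lowercase word that starts with a letter and contains a digit."""
    if length < 2:
        raise ValueError("length must be at least 2")
    alphabet = string.ascii_lowercase + string.digits
    while True:
        word = "".join(secrets.choice(alphabet) for _ in range(length))
        # Mixing letters and digits makes a dictionary word unlikely.
        if word[0].isalpha() and any(c.isdigit() for c in word):
            return word


print(make_trigger_word())
```

Use the same trigger word in every training prompt and in every invocation prompt.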