Best practices for model warm-up in Function Compute

This topic describes how to use a model warm-up script to reduce the high latency of the first request to an AI inference application. When you deploy an application in Function Compute, you can configure an Initializer hook with a warm-up script and set the minimum number of instances to 1 or more.

Background information

When an elastic instance starts, it performs initialization operations, such as model loading, before it can process its first request. This process increases the latency of the first request. To address this issue, Function Compute provides a model warm-up feature. After you set the minimum number of instances, a user-defined model warm-up action is performed before the instance accepts requests. This prevents high latency for the first request.

Initializer Hook

The Initializer hook executes a user-configured callback after the function runtime starts. There are two types of callbacks: Invoke Code and Execute Command. For more information, see Configure Instance Lifecycle. Both types of Initializer callbacks can be used to implement model warm-up. With the Invoke Code type, you must modify the runtime image to add a POST /initialize path that implements the model warm-up logic. With the Execute Command type, you do not need to modify the image. You can configure a warm-up script in the function lifecycle.

Model warm-up for a ComfyUI text-to-image service

Note

Alibaba Cloud Function Compute provides a graphical prefetch configuration feature through Function AI. If you deploy your application using Function AI, we recommend that you use its built-in prefetch feature because it is easier to configure. For more information, see "How to prefetch a model to avoid long wait times for the first image generation request?"

The following section uses a ComfyUI text-to-image service deployed in Function Compute as an example to show how to use an Execute Command Initializer hook to perform model warm-up without changing any code. This is a lower-level method for scenarios that do not involve Function AI or that require more customization.

1. Create a ComfyUI text-to-image service

Log on to the Function Compute console. In the navigation pane on the left, choose More Features > Applications, and then click Create Application.
On the Create Application page, select Create from Template. On the Artificial Intelligence tab, find the Workflow-based AI Image Generation ComfyUI card, move the pointer over the card, and then click Create Now.
On the Create Application page, select a region and a built-in model, and then click Create Application. In the dialog box that appears, read the notice about application creation. Select the check boxes for the billable items, and then select I have read and agree to the terms above. Then, click Agree and Deploy.
This topic uses the Clay Style built-in model as an example.

2. Create a warm-up script

After the application is deployed, on the application details page, select the Environment Details tab. Click the endpoint in the Environment Information section to open the ComfyUI application interface.
Click the settings icon. In the Settings dialog box, select the Enable Dev mode Options check box to enable developer mode.
Click Save (API Format) to export the workflow JSON file workflow_api.json.

Create the model warm-up script.

Important

The warm-up workflow must meet the following requirements:
- The workflow must run correctly. Ensure that all required models, plugins, and custom nodes are installed in the function instance.
- To warm up multiple models, you can add multiple model loaders to the workflow or extend it with sub-workflows.
The core of model warm-up is to load the model into GPU memory, not to generate a high-quality image. To shorten the warm-up time, do the following:
- Set the number of sampler iteration steps to 1.
- Set the image dimensions (width and height) to the minimum value of 16 × 16.
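
The two adjustments above can also be applied to the exported file with a small filter instead of by hand. The following is a minimal sketch, not part of Function Compute or ComfyUI: `shrink_workflow` is a hypothetical helper name, and the sketch assumes that the workflow uses the `KSampler` and `EmptyLatentImage` node types shown in this topic and that `python3` is available where you prepare the script. Workflows that use other sampler or latent nodes would need additional cases.

```shell
#!/bin/bash
# shrink_workflow (hypothetical helper): reads a ComfyUI workflow in API format
# on stdin and writes a copy to stdout with warm-up-friendly parameters.
shrink_workflow() {
    python3 -c '
import json, sys

workflow = json.load(sys.stdin)
for node in workflow.values():
    inputs = node.get("inputs", {})
    if node.get("class_type") == "KSampler":
        inputs["steps"] = 1        # one sampler iteration is enough to load the model
    elif node.get("class_type") == "EmptyLatentImage":
        inputs["width"] = 16       # smallest image size recommended in this topic
        inputs["height"] = 16
json.dump(workflow, sys.stdout, indent=2)
'
}
```

For example, `shrink_workflow < workflow_api.json > warmup_workflow.json` would produce a file whose content can be pasted into the warm-up script.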

The following script is an example. Before you use it, replace the content between the two `EOF` markers with the workflow JSON file that you exported in the previous step. To reduce the warm-up time, reduce the values of the `steps`, `width`, and `height` parameters.

#!/bin/bash

# --- User-modifiable section ---
# Paste the workflow JSON that you exported in the previous step. If necessary, you can modify the content to use a random seed for KSampler or to warm up other models.
# The quoted 'EOF' delimiter keeps the shell from expanding $ or backticks inside the JSON.
PROMPT=$(cat << 'EOF'
{
  "3": {
    "inputs": {
      "seed": 490449184065642,
      "steps": 1,
      "cfg": 8,
      "sampler_name": "euler",
      "scheduler": "normal",
      "denoise": 1,
      "model": [
        "4",
        0
      ],
      "positive": [
        "6",
        0
      ],
      "negative": [
        "7",
        0
      ],
      "latent_image": [
        "5",
        0
      ]
    },
    "class_type": "KSampler",
    "_meta": {
      "title": "KSampler"
    }
  },
  "4": {
    "inputs": {
      "ckpt_name": "AnimeSkyRealmSDXL.safetensors"
    },
    "class_type": "CheckpointLoaderSimple",
    "_meta": {
      "title": "Load Checkpoint"
    }
  },
  "5": {
    "inputs": {
      "width": 16,
      "height": 16,
      "batch_size": 1
    },
    "class_type": "EmptyLatentImage",
    "_meta": {
      "title": "Empty Latent Image"
    }
  },
  "6": {
    "inputs": {
      "text": "beautiful scenery nature glass bottle landscape, , purple galaxy bottle,",
      "clip": [
        "4",
        1
      ]
    },
    "class_type": "CLIPTextEncode",
    "_meta": {
      "title": "CLIP Text Encode (Prompt)"
    }
  },
  "7": {
    "inputs": {
      "text": "text, watermark",
      "clip": [
        "4",
        1
      ]
    },
    "class_type": "CLIPTextEncode",
    "_meta": {
      "title": "CLIP Text Encode (Prompt)"
    }
  },
  "8": {
    "inputs": {
      "samples": [
        "3",
        0
      ],
      "vae": [
        "4",
        2
      ]
    },
    "class_type": "VAEDecode",
    "_meta": {
      "title": "VAE Decode"
    }
  },
  "9": {
    "inputs": {
      "filename_prefix": "ComfyUI",
      "images": [
        "8",
        0
      ]
    },
    "class_type": "SaveImage",
    "_meta": {
      "title": "Save Image"
    }
  }
}
EOF
)


# --- Core script logic (do not modify) ---
# Parse JSON
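# $1: JSON text. $2: Python subscript expression, for example '["prompt_id"]'.
# Prints the selected value, or an empty string if parsing or lookup fails.
function parseJSON() {
    printf '%s' "$1" | python3 -c "import json, sys; print(json.load(sys.stdin)$2)" 2> /dev/null || echo ""
}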

# Print logs with timestamps
function Log() {
    echo "$(date '+%Y-%m-%d %H:%M:%S') - $1"
}

Log "Starting ComfyUI warm-up"
Log "Prompt: $PROMPT"

result=$(curl -XPOST http://127.0.0.1:9000/api/prompt -s -H "Content-Type: application/json" -d "{ \"client_id\": \"prewarm_client\", \"prompt\": ${PROMPT} }")
prompt_id=$(parseJSON "$result" '["prompt_id"]')
Log "Prompt ID: $prompt_id"

if [ -z "$prompt_id" ]; then
  Log "Failed to get Prompt ID. Warm-up was not performed."
else
  while true; do
    result=$(curl http://127.0.0.1:9000/api/history/${prompt_id} -s)
    Log "Polling result: $result"
    outputs=$(parseJSON "$result" "['$prompt_id']['outputs']")
    if [ -n "$outputs" ]; then
        Log "Image generation result: $outputs"
        break
    fi

    sleep 1
  done
  
  Log "ComfyUI warm-up finished."
fi

3. Configure the model warm-up feature

On the application details page, in the Resource Information section, click the ComfyUI function comfyui-**** to go to the function details page.
On the function details page, select the Configuration tab. In the Instance Configuration section, click Edit. In the Instance Configuration panel, enable the Initializer hook. Set the Timeout to 600 s, select Execute Command, and then paste the warm-up script you created into the code editor.

This script sends an HTTP request to generate an image. This action loads the model into GPU memory and completes the warm-up process.
Select the Elasticity Configuration tab. In the Elastic Policy section, click Configure. In the Configure Elastic Policy panel, set the Minimum Number of Instances to 1, and then click OK.
Wait for the value of Minimum Number of Instances to become 1. The model warm-up is now complete.
Return to the ComfyUI web UI. You can now experience fast image generation, even on the first request.
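
The polling loop in the warm-up script above has no deadline of its own, so a workflow that never completes keeps it running until the Initializer hook times out. The following is a minimal sketch of a bounded variant. `poll_until` and `history_ready` are hypothetical helper names, not part of Function Compute or ComfyUI, and the sketch assumes the same local ComfyUI address (`http://127.0.0.1:9000`) and `python3` availability as the script above.

```shell
#!/bin/bash
# poll_until (hypothetical helper): reruns a command until it succeeds or a deadline passes.
# Usage: poll_until <timeout_seconds> <interval_seconds> <command> [args...]
poll_until() {
    local timeout=$1 interval=$2
    shift 2
    local deadline=$(( $(date +%s) + timeout ))
    until "$@"; do
        if [ "$(date +%s)" -ge "$deadline" ]; then
            return 1    # deadline reached without success
        fi
        sleep "$interval"
    done
    return 0
}

# history_ready (hypothetical helper): succeeds once the history entry for the
# given prompt ID exists and contains an "outputs" field.
history_ready() {
    local prompt_id=$1
    curl -s "http://127.0.0.1:9000/api/history/${prompt_id}" \
        | python3 -c "import json, sys; json.load(sys.stdin)[sys.argv[1]]['outputs']" "$prompt_id" 2> /dev/null
}
```

In the warm-up script, the `while true` loop could then be replaced with a call such as `poll_until 540 1 history_ready "$prompt_id"`, where 540 is an assumed value chosen to stay below the 600 s hook timeout so that the script can log its own timeout message.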

Model warm-up for LLM services

This section describes how to use an Initializer hook in a large language model (LLM) service to perform model warm-up, so that each function instance has loaded the model and answered one inference request before it accepts user traffic.

Prerequisites

Ensure that you have stored the example model Qwen3-8B provided in this topic in an OSS bucket so that functions can use it through an OSS mount. For more information, see Configure Object Storage Service.
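
Before you rely on the mount, it can help to confirm that the model files are visible inside the instance. The following is a minimal sketch, assuming the model is mounted at `/mnt/model/Qwen3-8B` (the path used by the start commands in this topic) and that the model directory contains a `config.json` file, as Hugging Face format model directories do. `check_model_dir` is a hypothetical helper name.

```shell
#!/bin/sh
# check_model_dir (hypothetical helper): succeeds if the path is a directory
# that contains a config.json file; otherwise prints the reason to stderr.
check_model_dir() {
    model_dir=$1
    if [ ! -d "$model_dir" ]; then
        echo "Model directory not found: $model_dir" >&2
        return 1
    fi
    if [ ! -f "$model_dir/config.json" ]; then
        echo "config.json not found in: $model_dir" >&2
        return 1
    fi
    return 0
}
```

For example, placing `check_model_dir /mnt/model/Qwen3-8B || exit 1` at the top of an Initializer script would make a missing or empty mount fail with a clear message instead of an inference error.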

SGLang

Run the following command to pull the latest SGLang image from Docker Hub.
```
docker pull lmsysorg/sglang:latest
```
Note
Because of network access restrictions, users in the Chinese mainland might need to pull the image from a mirror source instead.
Push the sglang image that you obtained in the previous step to Alibaba Cloud Container Registry (ACR). For more information, see Push and pull images using a Personal Edition instance.
Use an image from ACR to create an image-based function.

Configure the start command. For more information, see Create a custom image function.

python3 -m sglang.launch_server --model-path /mnt/model/Qwen3-8B --host 0.0.0.0 --port 8000

Configure the Execute Command for the Initializer hook. For more information, see Step 2.

#!/bin/sh

# Define variables
BASE_URL="http://localhost:8000"
REQUEST_PATH="/v1/chat/completions"
JSON_DATA='{
    "model": "/mnt/model/Qwen3-8B",
    "messages": [
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Who won the world series in 2020?"}
    ]
}'

# Timeout in seconds
TIMEOUT=60
start_time=$(date +%s)

# Main loop
while true; do
    # Send an HTTP request. Capture the status code and write the response body to stderr.
    http_code=$(curl -s -w "%{http_code}" \
        -H "Content-Type: application/json" \
        -d "$JSON_DATA" \
        -X POST "${BASE_URL}${REQUEST_PATH}" \
        -o /dev/stderr)

    # Check the HTTP status code
    if [ "$http_code" -eq 200 ]; then
        echo "{\"status\": \"success\", \"message\": \"HTTP code $http_code\"}"
        exit 0
    elif [ "$http_code" -eq 404 ]; then
        echo "{\"status\": \"retrying\", \"message\": \"Received 404, retrying...\"}" >&2
        elapsed_time=$(( $(date +%s) - start_time ))
        if [ "$elapsed_time" -ge "$TIMEOUT" ]; then
            echo "{\"status\": \"timeout\", \"message\": \"Request timed out after $TIMEOUT seconds\"}"
            exit 1
        fi
        sleep 1
    else
        echo "{\"status\": \"http_error\", \"code\": $http_code, \"message\": \"Unexpected HTTP status\"}"
        exit 1
    fi
done
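
The script above retries only when it receives a 404 status code. If `curl` cannot connect at all, it reports the status code `000`, which the script treats as a fatal error. The following is a minimal sketch of a status classifier that could replace the `if`/`elif` chain. `classify_http_code` is a hypothetical helper name, and treating `000` and `503` as retryable is an assumption about how the server behaves while it is still starting, not documented Function Compute behavior.

```shell
#!/bin/sh
# classify_http_code (hypothetical helper): maps a curl status code to an action.
# Prints "success", "retry", or "fail".
classify_http_code() {
    case "$1" in
        200) echo "success" ;;
        # 000: curl got no HTTP response (for example, connection refused).
        # 404 and 503: assumed to mean that the server is not ready yet.
        000|404|503) echo "retry" ;;
        *) echo "fail" ;;
    esac
}
```

In the main loop, the result of `classify_http_code "$http_code"` could then drive a `case` statement that exits 0 on `success`, sleeps and checks the timeout on `retry`, and exits 1 on `fail`.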

vLLM

Run the following command to pull the latest vLLM image from Docker Hub.
```
docker pull vllm/vllm-openai:latest
```
Note
Because of network access restrictions, users in the Chinese mainland might need to pull the image from a mirror source instead.
Push the vLLM image that you obtained in the previous step to ACR. For more information, see Use a Personal Edition instance to push and pull images.
Use an image from ACR to create an image-based function.

Configure the start command. For more information, see Create a custom image function.

python3 -m vllm.entrypoints.openai.api_server --model /mnt/model/Qwen3-8B --served-model-name /mnt/model/Qwen3-8B --max-model-len 8192

Configure the Execute Command for the Initializer hook. For more information, see Configure instance lifecycle hook methods.

#!/bin/sh

# Define variables
BASE_URL="http://localhost:8000"
REQUEST_PATH="/v1/chat/completions"
JSON_DATA='{
    "model": "/mnt/model/Qwen3-8B",
    "messages": [
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Who won the world series in 2020?"}
    ]
}'

# Timeout in seconds
TIMEOUT=60
start_time=$(date +%s)

# Main loop
while true; do
    # Send an HTTP request. Capture the status code and write the response body to stderr.
    http_code=$(curl -s -w "%{http_code}" \
        -H "Content-Type: application/json" \
        -d "$JSON_DATA" \
        -X POST "${BASE_URL}${REQUEST_PATH}" \
        -o /dev/stderr)

    # Check the HTTP status code
    if [ "$http_code" -eq 200 ]; then
        echo "{\"status\": \"success\", \"message\": \"HTTP code $http_code\"}"
        exit 0
    elif [ "$http_code" -eq 404 ]; then
        echo "{\"status\": \"retrying\", \"message\": \"Received 404, retrying...\"}" >&2
        elapsed_time=$(( $(date +%s) - start_time ))
        if [ "$elapsed_time" -ge "$TIMEOUT" ]; then
            echo "{\"status\": \"timeout\", \"message\": \"Request timed out after $TIMEOUT seconds\"}"
            exit 1
        fi
        sleep 1
    else
        echo "{\"status\": \"http_error\", \"code\": $http_code, \"message\": \"Unexpected HTTP status\"}"
        exit 1
    fi
done
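
Both LLM warm-up scripts send a chat request without a limit on the response length, so the warm-up lasts as long as the model takes to finish its answer. Because the goal is only to exercise the loaded model once, a shorter request may be enough. The following is a minimal sketch that builds a smaller request body. `build_warmup_payload` is a hypothetical helper name, and the sketch assumes that the server accepts the `max_tokens` field of the OpenAI-compatible chat completions API.

```shell
#!/bin/sh
# build_warmup_payload (hypothetical helper): prints a small chat completions request body.
# $1: model name as served (in this topic, the model path)
# $2: maximum number of tokens to generate (defaults to 1)
build_warmup_payload() {
    model=$1
    max_tokens=${2:-1}
    cat <<EOF
{
    "model": "${model}",
    "messages": [
        {"role": "user", "content": "Hello"}
    ],
    "max_tokens": ${max_tokens}
}
EOF
}

# Build the payload for the example model used in this topic.
JSON_DATA=$(build_warmup_payload "/mnt/model/Qwen3-8B" 1)
```

The resulting `JSON_DATA` variable can be used in place of the literal request body in either warm-up script.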