Set the minimum number of instances for a function to a value greater than 0 to pre-allocate elastic resources. This helps prevent request latency caused by cold starts during peak hours. You can also configure policies to automatically scale the minimum number of instances based on a schedule or metric thresholds. This ensures high performance and improves instance utilization.
Setting the minimum number of instances to a value greater than 0 helps mitigate cold start issues and improves response times for latency-sensitive applications. You are charged for these pre-allocated instances regardless of their usage. When processing requests, they are billed as active elastic instances. When not processing requests, they are billed as idle elastic instances. For more information about how active elastic instances and idle elastic instances are billed, see Billing overview.
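The active/idle split described above can be illustrated with a small sketch. It is not the billing implementation: the function names, the per-second granularity, and the rate parameters are all hypothetical, and actual prices and billing units are defined in Billing overview.

```python
from typing import Tuple


def split_billable_seconds(
    min_instances: int, window_seconds: int, busy_instance_seconds: int
) -> Tuple[int, int]:
    """Split pre-allocated instance time into (active, idle) instance-seconds.

    busy_instance_seconds is the total time, summed over all pre-allocated
    instances, that was spent processing requests (hypothetical input).
    """
    total = min_instances * window_seconds
    if not 0 <= busy_instance_seconds <= total:
        raise ValueError("busy time must be between 0 and the total allocated time")
    return busy_instance_seconds, total - busy_instance_seconds


def estimate_cost(active_seconds: int, idle_seconds: int, active_rate, idle_rate):
    """Combine both parts; the rates are placeholders supplied by the caller."""
    return active_seconds * active_rate + idle_seconds * idle_rate
```

For example, 5 pre-allocated instances held for one hour with 6,000 busy instance-seconds yield 6,000 active and 12,000 idle instance-seconds. Both parts are charged, which is why the pre-allocated instances incur cost even when they receive no requests.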
You can configure elastic policies for the minimum number of instances only for a function alias or the LATEST version.
Set the minimum number of instances
Log on to the Function Compute console. In the left-side navigation pane, click Functions.
In the top navigation bar, select a region. On the Functions page, click Create Function.
On the Create Function page, in the Scaling Policy section, set Minimum Instances, configure the other required parameters, and then click Create.
Configure elastic policies
On the details page of the target function, click the Scaling Policy tab. In the Elastic policy section, find the target policy and click Modify in the Actions column.
In the Edit elastic policy panel, configure a dynamic elastic policy for the minimum number of instances.
Note: If you configure multiple elastic policies, the system calculates the Minimum Instances for each triggered policy. The system then uses the maximum value among the Minimum Instances of all currently active policies as the current minimum number of instances. For more information, see How is the current minimum number of instances calculated?.
While an elastic policy is active, the initial Minimum Instances setting is ignored. If no elastic policy is active, the current minimum number of instances reverts to the initial value that you set for Minimum Instances.
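The calculation rule above can be sketched as follows. This is a minimal illustration of the stated rule, not the service's code: the `PolicyState` class and the `current_min_instances` function are hypothetical names introduced here.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class PolicyState:
    """Hypothetical snapshot of one elastic policy at a point in time."""
    name: str
    active: bool        # whether the policy is currently in effect
    min_instances: int  # Minimum Instances calculated for this policy


def current_min_instances(initial_min: int, policies: Iterable[PolicyState]) -> int:
    """Return the effective minimum number of instances.

    While at least one policy is active, the initial setting is ignored and
    the maximum among the active policies wins. Otherwise the initial
    Minimum Instances value applies.
    """
    active_values = [p.min_instances for p in policies if p.active]
    if not active_values:
        return initial_min
    return max(active_values)
```

With an initial value of 5, one active policy asking for 3, and another active policy asking for 20, the result is 20. If only the policy asking for 3 is active, the result is 3, because the initial value is ignored while any policy is active.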
Configure a Scheduled Scaling or Threshold-based Scaling policy
Scheduled scaling
A scheduled scaling policy is suitable for functions with clear periodic patterns or predictable traffic peaks. When the number of concurrent function invocations exceeds the minimum number of instances, the excess requests are automatically handled by on-demand elastic instances. For more information, see Scheduled scaling.
This example sets the Time zone to Asia/Shanghai (UTC+8). The policy is long-term and scales out the minimum number of instances to 50 at 10:00 from Monday to Friday, and scales in to 5 at 22:00.
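To make the effect of this example schedule concrete, the following sketch computes the minimum number of instances the schedule implies at a given moment. It assumes that the 22:00 scale-in also runs Monday to Friday and that the last scheduled action stays in effect until the next one fires. The function name is hypothetical, and a fixed UTC+8 offset stands in for Asia/Shanghai, which has no daylight saving time.

```python
from datetime import datetime, timedelta, timezone

# Fixed UTC+8 offset used in place of the Asia/Shanghai time zone.
SHANGHAI = timezone(timedelta(hours=8), "Asia/Shanghai")


def scheduled_min_instances(moment: datetime) -> int:
    """Minimum instances implied by the example schedule at `moment`.

    `moment` must be timezone-aware; it is converted to UTC+8 first.
    """
    if moment.tzinfo is None:
        raise ValueError("moment must be timezone-aware")
    local = moment.astimezone(SHANGHAI)
    is_weekday = local.weekday() < 5   # Monday=0 ... Friday=4
    in_peak = 10 <= local.hour < 22    # from the 10:00 scale-out to the 22:00 scale-in
    return 50 if is_weekday and in_peak else 5
```

Because the comparison happens in the policy's time zone, a timestamp of 02:00 UTC on a Tuesday maps to 10:00 in UTC+8 and falls inside the scaled-out window.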
Threshold-based scaling
The system periodically collects metrics such as Provisioned concurrency utilization, Memory utilization, or resource utilization metrics for GPU instances. When a specified condition is met, the system scales the Minimum Instances accordingly. For more information, see Threshold-based scaling.
Set the Minimum Number of Instances to 1. Then, in the Minimum Number of Instances Dynamic Policy section, select the Threshold-based Scaling tab to configure the policy.
This example sets the Time zone to Asia/Shanghai (UTC+8). The policy is active from 00:00 on July 15, 2025, to 00:00 on July 31, 2025. It tracks the Provisioned concurrency utilization metric with a target value of 60%. When utilization exceeds 60%, the system scales out to a maximum of 100 instances. When utilization falls below 60%, it scales in to a minimum of 10 instances.
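The behavior in this example can be approximated with a generic target-tracking calculation. The proportional formula below is an assumption made for illustration; the source does not document the algorithm the service uses. The function name and the integer-percent inputs are also hypothetical.

```python
def target_tracking_min_instances(
    current_min: int,
    utilization_pct: int,
    target_pct: int = 60,
    lower: int = 10,
    upper: int = 100,
) -> int:
    """Suggest a new minimum so that utilization moves toward the target.

    Assumed formula: desired = ceil(current_min * utilization / target),
    clamped to the [lower, upper] range from the example policy.
    Integer percentages avoid floating-point rounding surprises.
    """
    if current_min <= 0 or target_pct <= 0 or utilization_pct < 0:
        raise ValueError("current_min and target_pct must be positive")
    numerator = current_min * utilization_pct
    desired = (numerator + target_pct - 1) // target_pct  # ceiling division
    return max(lower, min(upper, desired))
```

Under this assumption, 20 instances at 90% utilization suggest 30 instances, while 20 instances at 15% utilization suggest 5, which is raised to the floor of 10. Utilization exactly at the 60% target leaves the value unchanged.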
For CPU functions, threshold-based scaling monitors metrics such as Provisioned concurrency utilization and Memory utilization. For GPU functions, the policy supports monitoring Provisioned concurrency utilization and other GPU-related resource utilization metrics, as detailed below.
CPU functions: In the threshold-based scaling configuration, the Utilization Type drop-down list supports two utilization types: instance concurrency utilization and memory utilization. The Trigger Method tab also includes scheduled scaling.
GPU functions: In the threshold-based scaling configuration of a GPU function, Utilization Type supports five options: instance concurrency utilization, GPU SM utilization, GPU memory utilization, GPU hardware encoder utilization, and GPU hardware decoder utilization.
Configure periodic scaling by using a CRON Expression
If your application has predictable traffic patterns, you can also use a CRON expression to periodically scale the minimum number of instances. For example, you can set the Time zone to Asia/Shanghai (UTC+8) to scale out the minimum number of instances to 10 at 10:00 every Monday and scale in to 1 at 22:00 every Friday.
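As an illustration of the weekly pattern in this example, the sketch below expresses the two actions as standard five-field Unix cron strings and checks which one fires at a given moment. The CRON dialect accepted by the console may differ from this format, so treat the expressions as an assumption. The `SCHEDULE` mapping and both functions are hypothetical, and the matcher handles only plain numbers and `*`.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional

# Fixed UTC+8 offset used in place of the Asia/Shanghai time zone.
SHANGHAI = timezone(timedelta(hours=8), "Asia/Shanghai")

# Assumed format: minute hour day-of-month month day-of-week (0 = Sunday).
SCHEDULE = {
    "0 10 * * 1": 10,  # Monday 10:00: scale out to 10
    "0 22 * * 5": 1,   # Friday 22:00: scale in to 1
}


def matches(expr: str, moment: datetime) -> bool:
    """Return True if a timezone-aware `moment` matches the cron expression."""
    fields = expr.split()
    if len(fields) != 5:
        raise ValueError("expected five cron fields")
    local = moment.astimezone(SHANGHAI)
    actual = (local.minute, local.hour, local.day, local.month,
              local.isoweekday() % 7)  # isoweekday: Mon=1..Sun=7, so Sun -> 0
    return all(f == "*" or int(f) == v for f, v in zip(fields, actual))


def triggered_min_instances(moment: datetime) -> Optional[int]:
    """Return the target minimum if an action fires at `moment`, else None."""
    for expr, target in SCHEDULE.items():
        if matches(expr, moment):
            return target
    return None
```

Between triggers the function returns `None`, which reflects that a cron action only fires at its scheduled minute; the value it sets is assumed to persist until the next action fires.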
Modify or delete an elastic policy for the minimum number of instances
Log on to the Function Compute console. In the left-side navigation pane, choose . On the Elastic policy page, find the policy you want to manage. In the Actions column, click Modify or Delete to modify or delete the elastic policy for the minimum number of instances.
Deleting an elastic policy for the minimum number of instances of an alias releases all pre-allocated instances for that alias. The function then automatically switches to on-demand scaling, which may involve a cold start. For CPU-based services, the average cold start time is typically hundreds of milliseconds, depending on the application's startup speed. For GPU-based services, the average cold start time can be several minutes, depending on the model size and loading speed.
References
To limit the number of instances for a specific function, you can configure function quotas. If the total number of running instances for the function exceeds the configured limit, Function Compute returns a throttling error.