When a large number of invocations hit your function at the same time, for example when many objects are uploaded to Object Storage Service (OSS) at once and each upload fires an OSS trigger, Function Compute must scale out quickly enough to process all requests without dropping or delaying them.
Two features work together to handle this: instance concurrency and provisioned instances. Use them based on your traffic pattern and latency requirements.
Choose a solution
| | Instance concurrency | Provisioned instances |
|---|---|---|
| What it does | Allows a single instance to handle multiple concurrent requests | Pre-warms a fixed number of instances before traffic arrives |
| Cold start | Possible — new instances are created on demand | Eliminated — instances are ready before requests arrive |
| Best for | Workloads with traffic spikes | Latency-sensitive workloads or GPU-accelerated functions |
Both solutions increase the total number of requests Function Compute can process concurrently. For latency-sensitive functions or GPU-accelerated instances, use provisioned instances, because GPU-accelerated instances scale out more slowly than elastic instances.
Configure instance concurrency
Instance concurrency lets a single instance process multiple requests at the same time, reducing the total number of instances needed to serve your traffic.
For setup instructions, see Specify the maximum number of concurrent instances.
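As a rough illustration of the arithmetic, the number of instances needed is the peak number of simultaneous requests divided by the per-instance concurrency, rounded up. The function name and sample values below are hypothetical and are not part of any Function Compute API.

```python
import math


def instances_needed(peak_concurrent_requests: int, instance_concurrency: int) -> int:
    """Estimate the instances required to serve a number of simultaneous requests.

    Illustrative arithmetic only; this does not call Function Compute.
    """
    if instance_concurrency < 1:
        raise ValueError("instance_concurrency must be at least 1")
    if peak_concurrent_requests < 0:
        raise ValueError("peak_concurrent_requests must not be negative")
    # Round up: a partially filled instance is still a whole instance.
    return math.ceil(peak_concurrent_requests / instance_concurrency)


# 1,000 simultaneous requests with one request per instance
print(instances_needed(1000, 1))   # 1000
# The same load with each instance handling 20 requests at once
print(instances_needed(1000, 20))  # 50
```

Under these sample values, raising instance concurrency from 1 to 20 cuts the required instance count from 1,000 to 50, which means fewer instances have to be created during a spike.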
Configure provisioned instances
Provisioned instances are pre-warmed and ready to accept requests before traffic arrives, which eliminates cold start latency. Provisioned mode is recommended for GPU-accelerated instances because they scale out more slowly than elastic instances.
For setup instructions, see Configure provisioned instances.
Scale-out limits
By default, an Alibaba Cloud account can run at most 100 concurrent instances per region. The actual limit shown in Quota Center takes precedence. To request a higher limit, go to Quota Center and submit a quota increase request.
Even after raising your quota, the rate at which new instances can be added is subject to regional limits:
| Region | Burst instance limit | Instance growth rate |
|---|---|---|
| China (Hangzhou), China (Shanghai), China (Beijing), China (Zhangjiakou), and China (Shenzhen) | 300 | 300 per minute |
| Other regions | 100 | 100 per minute |
These limits apply equally to provisioned instances and on-demand instances in the same region.
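To get a feel for what these numbers mean during a large spike, the sketch below estimates how long a scale-out might take. It assumes one interpretation of the table that this section does not spell out: instances up to the burst limit can be created immediately, and beyond that the instance count grows by at most the growth rate each minute. The function name and sample targets are hypothetical.

```python
import math


def minutes_to_scale(target_instances: int, burst_limit: int, growth_per_minute: int) -> int:
    """Estimate the whole minutes needed to reach target_instances.

    ASSUMPTION: the first `burst_limit` instances are available immediately,
    then at most `growth_per_minute` instances are added per minute.
    """
    if growth_per_minute < 1:
        raise ValueError("growth_per_minute must be at least 1")
    if target_instances <= burst_limit:
        return 0  # the burst allowance alone covers the target
    remaining = target_instances - burst_limit
    return math.ceil(remaining / growth_per_minute)


# Target of 1,000 instances in China (Hangzhou): burst 300, growth 300 per minute
print(minutes_to_scale(1000, 300, 300))  # 3
# The same target in a region under "Other regions": burst 100, growth 100 per minute
print(minutes_to_scale(1000, 100, 100))  # 9
```

Under this assumption, the same 1,000-instance target takes about three times as long to reach in a region with the lower limits. Reaching either target also requires an account quota that allows that many instances.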
Note: If your workload requires a higher scale-out rate, join the DingTalk group 64970014484 for technical support.