Configure a function to handle a large number of concurrent trigger events

When a large number of invocations hit your function at the same time, for example when many objects are uploaded to Object Storage Service (OSS) simultaneously and each upload fires an OSS trigger, Function Compute must scale out quickly enough to process every request without dropping or delaying any of them.

Two features help with this: instance concurrency and provisioned instances. Use one or both, depending on your traffic pattern and latency requirements.

Choose a solution

Item	Instance concurrency	Provisioned instances
What it does	Allows a single instance to handle multiple concurrent requests	Pre-warms a fixed number of instances before traffic arrives
Cold start	Possible — new instances are created on demand	Eliminated — instances are ready before requests arrive
Best for	Workloads with traffic spikes	Latency-sensitive workloads or GPU-accelerated functions

Both solutions increase the total number of requests that Function Compute can process concurrently. For latency-sensitive functions or GPU-accelerated instances, use provisioned instances, because GPU-accelerated instances scale out more slowly than elastic instances.

Configure instance concurrency

Instance concurrency lets a single instance process multiple requests at the same time, reducing the total number of instances needed to serve your traffic.
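To make the effect concrete, the following is a minimal sketch of the arithmetic, assuming requests are spread evenly across instances. The function name `instances_needed` and the request counts are illustrative and are not part of any Function Compute API.

```python
def instances_needed(concurrent_requests: int, instance_concurrency: int) -> int:
    """Estimate how many instances are needed to serve a number of
    simultaneous requests when each instance handles up to
    `instance_concurrency` requests at once.

    Assumption: requests are distributed evenly across instances.
    """
    if concurrent_requests < 0:
        raise ValueError("concurrent_requests must be non-negative")
    if instance_concurrency < 1:
        raise ValueError("instance_concurrency must be at least 1")
    # Integer ceiling division: round up so that no request is left unserved.
    return (concurrent_requests + instance_concurrency - 1) // instance_concurrency


if __name__ == "__main__":
    # Illustrative burst of 1,000 simultaneous trigger events.
    for concurrency in (1, 10, 20):
        count = instances_needed(1000, concurrency)
        print(f"instance concurrency {concurrency}: {count} instances")
```

Under these assumptions, 1,000 simultaneous requests need 1,000 instances at an instance concurrency of 1, but only 100 instances at an instance concurrency of 10.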

For setup instructions, see Specify the maximum number of concurrent instances.

Configure provisioned instances

Provisioned instances are pre-warmed and ready to accept requests before traffic arrives, which eliminates cold start latency. For GPU-accelerated instances, we recommend provisioned instances, because GPU-accelerated instances scale out more slowly than elastic instances.

For setup instructions, see Configure provisioned instances.

Scale-out limits

By default, an Alibaba Cloud account can run at most 100 concurrent instances per region. The actual limit shown in Quota Center takes precedence. To request a higher limit, go to Quota Center and submit a quota increase request.

Even after raising your quota, the rate at which new instances can be added is subject to regional limits:

Region	Burst instance limit	Instance growth rate
China (Hangzhou), China (Shanghai), China (Beijing), China (Zhangjiakou), and China (Shenzhen)	300	300 per minute
Other regions	100	100 per minute

These limits apply equally to provisioned instances and on-demand instances in the same region.
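As a rough planning aid, the table values can be turned into an estimate of how long a scale-out takes. This sketch assumes a simple model that this page does not spell out: instances up to the burst instance limit can be created immediately, and beyond that the instance count grows linearly at the instance growth rate. It also assumes that your quota has already been raised to at least the target count. The function name `minutes_to_scale` and the target of 900 instances are hypothetical.

```python
def minutes_to_scale(target_instances: int, burst_limit: int, growth_per_minute: int) -> int:
    """Estimate the whole minutes needed to reach `target_instances`.

    Assumed model (not confirmed by the source): the first `burst_limit`
    instances are available immediately, and afterwards at most
    `growth_per_minute` instances are added each minute.
    """
    if burst_limit < 0 or growth_per_minute < 1:
        raise ValueError("burst_limit must be >= 0 and growth_per_minute >= 1")
    if target_instances <= burst_limit:
        return 0
    remaining = target_instances - burst_limit
    # Integer ceiling division: a partially used minute still counts as a minute.
    return (remaining + growth_per_minute - 1) // growth_per_minute


# Limits copied from the table above: (burst instance limit, growth per minute).
REGION_LIMITS = {
    "Hangzhou, Shanghai, Beijing, Zhangjiakou, Shenzhen": (300, 300),
    "Other regions": (100, 100),
}

if __name__ == "__main__":
    target = 900  # hypothetical target instance count
    for region, (burst, rate) in REGION_LIMITS.items():
        print(f"{region}: about {minutes_to_scale(target, burst, rate)} min to reach {target}")
```

Under this model, reaching 900 instances takes about 2 minutes in the five listed China regions and about 8 minutes elsewhere, which suggests why provisioning instances ahead of a known spike can matter.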

Note: If your workload requires a higher scale-out rate, join the DingTalk group 64970014484 for technical support.