How can I ensure a function responds normally when many events trigger it at the same time?

When a large number of concurrent invocations hit your function simultaneously — for example, when many objects are uploaded to Object Storage Service (OSS) at the same time and trigger an OSS trigger — Function Compute must scale out quickly enough to process all requests without dropping or delaying them.

Two features work together to handle this: instance concurrency and provisioned instances. Use them based on your traffic pattern and latency requirements.

Choose a solution

|  | Instance concurrency | Provisioned instances |
| --- | --- | --- |
| What it does | Allows a single instance to handle multiple concurrent requests | Pre-warms a fixed number of instances before traffic arrives |
| Cold start | Possible: new instances are created on demand | Eliminated: instances are ready before requests arrive |
| Best for | Workloads with traffic spikes | Latency-sensitive workloads or GPU-accelerated functions |

Both solutions increase the total number of requests Function Compute can process concurrently. For latency-sensitive or GPU-accelerated functions, use provisioned instances, because GPU instances scale more slowly than elastic instances.

Configure instance concurrency

Instance concurrency lets a single instance process multiple requests at the same time, reducing the total number of instances needed to serve your traffic.
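
To make the effect concrete, the following minimal sketch estimates how many instances a given load needs at different instance concurrency values. The function name `instances_needed` and the sample figures are illustrative assumptions, not part of Function Compute, and the estimate assumes requests are spread evenly across instances.

```python
def instances_needed(concurrent_requests: int, instance_concurrency: int) -> int:
    """Estimate the number of instances required to serve a load.

    Hypothetical helper for illustration only; it assumes requests are
    distributed evenly and every instance is filled to its concurrency.
    """
    if concurrent_requests < 0:
        raise ValueError("concurrent_requests must be non-negative")
    if instance_concurrency < 1:
        raise ValueError("instance_concurrency must be at least 1")
    # Ceiling division: a partially filled instance still counts as one.
    return -(-concurrent_requests // instance_concurrency)


if __name__ == "__main__":
    # Example load: 1,000 simultaneous events (an assumed figure).
    for concurrency in (1, 10, 20):
        count = instances_needed(1000, concurrency)
        print(f"concurrency={concurrency}: {count} instances")
```

With an instance concurrency of 1, the assumed load of 1,000 simultaneous events needs 1,000 instances. With a concurrency of 20, the same load fits in 50.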

For setup instructions, see Specify the maximum number of concurrent instances.

Configure provisioned instances

Provisioned instances are pre-warmed and ready to accept requests before traffic arrives, eliminating cold start latency. For GPU-accelerated instances, we recommend using provisioned mode — GPU instances scale more slowly than elastic instances.

For setup instructions, see Configure provisioned instances.

Scale-out limits

By default, an Alibaba Cloud account can run at most 100 concurrent instances per region. The actual limit shown in Quota Center takes precedence. To request a higher limit, go to Quota Center and submit a quota increase request.

Even after raising your quota, the rate at which new instances can be added is subject to regional limits:

| Region | Burst instance limit | Instance growth rate |
| --- | --- | --- |
| China (Hangzhou), China (Shanghai), China (Beijing), China (Zhangjiakou), and China (Shenzhen) | 300 | 300 per minute |
| Other regions | 100 | 100 per minute |

These limits apply equally to provisioned instances and on-demand instances in the same region.
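
As a rough planning aid, the sketch below estimates how long a scale-out to a target instance count might take under the limits above. It assumes a simplified model that this page does not spell out: instances up to the burst limit are available immediately, and further instances are added at the listed growth rate. The function name `minutes_to_reach` is hypothetical, and the target must also fit within your account's instance quota.

```python
import math


def minutes_to_reach(target_instances: int, burst_limit: int,
                     growth_per_minute: int) -> int:
    """Estimate the minutes needed to scale out to target_instances.

    Assumed model (not confirmed by this page): up to burst_limit
    instances start immediately, then growth_per_minute instances
    are added each minute.
    """
    if target_instances < 0 or burst_limit < 0:
        raise ValueError("instance counts must be non-negative")
    if growth_per_minute <= 0:
        raise ValueError("growth_per_minute must be positive")
    # Instances still missing after the initial burst is used up.
    remaining = max(0, target_instances - burst_limit)
    return math.ceil(remaining / growth_per_minute)


if __name__ == "__main__":
    # Target of 1,000 instances is an assumed figure for illustration.
    print("China (Hangzhou):", minutes_to_reach(1000, 300, 300), "min")
    print("Other regions:", minutes_to_reach(1000, 100, 100), "min")
```

Under this model, reaching 1,000 instances takes about 3 minutes in a region with a burst limit of 300, and about 9 minutes in a region with a burst limit of 100. Raising instance concurrency lowers the target instance count, which in turn shortens the estimate.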

Note: If your workload requires a higher scale-out rate, join the DingTalk group 64970014484 for technical support.