This topic describes how to optimize cold starts for elastic instances and improve function performance in Function Compute by setting a minimum number of instances.
What is a cold start?
Function Compute uses elastic instances by default. These instances scale automatically in response to requests. When a request arrives, the system creates an instance to process it. The instance is reclaimed after it is no longer processing requests. You are billed only for the time that instances spend processing requests. Although this elastic model simplifies resource management, it can introduce cold starts, which add latency to the requests that trigger them.
A cold start is the process of preparing the execution environment and your code. This process includes downloading code, starting the function instance container, initializing the runtime, and initializing your code. After the cold start is complete, the function instance can process requests.
Optimize cold starts
Optimizing cold starts is a shared responsibility between the user and the platform. Although Function Compute includes many system-level optimizations, you can use the following methods to further reduce cold start times:
Streamline code packages
You can keep code packages as small as possible by removing unnecessary dependencies. For example, in Node.js you can run the npm prune command to remove packages that are not listed as dependencies. In Python, you can run autoflake to strip unused imports, which helps you identify libraries that your code no longer needs. In addition, some third-party libraries may contain files that are not required for execution, such as test case source code, unused binary files, or data files. You can delete these unused files to reduce the time required to download and decompress your code.
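As a rough illustration of the cleanup described above, the following Python sketch lists files in a code package that are often unnecessary at run time, largest first. The directory names and file suffixes are assumptions for illustration only, not a list defined by Function Compute. The script only reports candidates, so you can review them before deleting anything.

```python
import os

# Hypothetical patterns: adjust them to what your dependencies actually ship.
PRUNE_DIR_NAMES = {"tests", "test", "examples", "docs"}
PRUNE_FILE_SUFFIXES = (".md", ".rst")


def find_prune_candidates(package_root):
    """Return (path, size_in_bytes) pairs for files that are likely unused at run time."""
    candidates = []
    for dirpath, _dirnames, filenames in os.walk(package_root):
        # Split the path relative to the package root into its directory parts.
        parts = os.path.relpath(dirpath, package_root).split(os.sep)
        in_pruned_dir = any(part in PRUNE_DIR_NAMES for part in parts)
        for name in filenames:
            if in_pruned_dir or name.endswith(PRUNE_FILE_SUFFIXES):
                path = os.path.join(dirpath, name)
                candidates.append((path, os.path.getsize(path)))
    # Largest files first, because they save the most download and decompression time.
    candidates.sort(key=lambda item: item[1], reverse=True)
    return candidates


if __name__ == "__main__":
    for path, size in find_prune_candidates("."):
        print(f"{size:>10} bytes  {path}")
```

Some libraries read bundled data files at run time, so test the function after you remove any of the reported files.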
Choose the right function language
The Java runtime typically has a longer cold start time than the runtimes of other languages because of differences in language design. For applications that are sensitive to cold start latency, you can use a lightweight language such as Python to significantly reduce long-tail latency. This is especially effective if there is little difference in warm start latency between the languages.
Choose a suitable memory size
A larger memory configuration allocates more CPU resources for the same level of concurrency. This results in better cold start performance.
Reduce the probability of cold starts
You can use an Initializer handler. Function Compute asynchronously calls the initialization interface, which separates the code initialization time from the request execution time. As a result, cold starts are not noticeable during Function Compute system upgrades or function updates.
Mixed mode
Some cold starts on the user side are difficult to eliminate. For example, deep learning inference requires loading large model files. Another example is a function that must interact with a legacy system using a client that has a long initialization time. In these scenarios, if your function is highly sensitive to latency, you can set the minimum number of instances to 1 or more. The system keeps these instances pre-warmed, and they hibernate while idle. When a request arrives, they can wake up quickly to process it.
When the minimum number of instances is set to 1 or more, the system prioritizes assigning requests to these pre-warmed instances. If the pre-warmed instances cannot handle the current workload, the system automatically creates more elastic instances, and the pre-warmed instances continue to process requests while the new instances start. Requests handled by the pre-warmed instances therefore incur no cold start latency. In this way, pre-allocating a minimum number of instances absorbs workload fluctuations while balancing performance and resource utilization.
For example, assume that the minimum number of instances for a function is set to 10. If the concurrent workload requires more than 10 instances, the system creates new elastic instances to handle the additional requests. Whether an instance is fully loaded depends on its concurrency configuration. The system tracks the number of requests being processed on each function instance. When the number of concurrent requests on an instance reaches its configured limit, the system routes new requests to another available instance. When all available instances reach their concurrency limits, a new instance is created.
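The arithmetic behind this example can be sketched in Python as follows. The function and parameter names are hypothetical, and the model is simplified: it assumes that requests are packed onto instances up to the configured concurrency limit, and it ignores timing effects such as instance startup.

```python
import math


def plan_instances(concurrent_requests, per_instance_concurrency, min_instances):
    """Estimate how many pre-warmed and additional elastic instances a workload needs."""
    if per_instance_concurrency < 1:
        raise ValueError("per_instance_concurrency must be at least 1")
    # Each instance serves up to per_instance_concurrency requests at a time.
    needed = math.ceil(concurrent_requests / per_instance_concurrency)
    return {
        "needed": needed,
        # Pre-warmed instances are used first.
        "prewarmed_in_use": min(needed, min_instances),
        # Extra elastic instances are created only when the pre-warmed ones are full.
        "elastic_created": max(0, needed - min_instances),
    }


if __name__ == "__main__":
    # 250 concurrent requests, 20 requests per instance, minimum of 10 instances.
    print(plan_instances(250, 20, 10))
```

Under these assumptions, 250 concurrent requests at a per-instance concurrency of 20 need 13 instances: the 10 pre-warmed instances plus 3 new elastic instances. A workload of 100 concurrent requests needs only 5 instances, so no new instance is created.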
If you set the minimum number of instances to 1 or more, you are charged for these instances even when they are not processing requests. The charge is based on the unit price for elastic instances in shallow hibernation. For more information about billing, see Billing overview. To ensure that resource usage stays within a desired range, you can configure a maximum number of instances.