Compute Resource Optimization
Operating System Optimization Based on Cloud Infrastructure
Alibaba Cloud Linux is an operating system distribution built on the OpenAnolis community's Anolis OS. Compatible with the RHEL/CentOS ecosystem, it provides a secure, stable, and high-performance runtime environment for cloud applications. The distribution is deeply optimized for Alibaba Cloud infrastructure to improve startup speed and runtime performance, and has been validated across a wide range of Alibaba Cloud products for stability. Key optimization areas include:
- Kernel: Alibaba Cloud Linux 2 is based on Linux kernel 4.19 LTS, and Alibaba Cloud Linux 3 is based on Linux kernel 5.10 LTS. Both kernels continuously gain cloud-specific features, performance improvements, and critical defect fixes. They also ship boot and system configuration parameters tuned for the ECS instance environment.
- Boot speed: Startup is optimized for the ECS instance environment. In actual tests, startup time was about 60% lower than with other operating systems.
- Runtime performance: The scheduling, memory, and I/O subsystems are optimized. In some open-source benchmark tests, this improves performance by approximately 10% to 30% over other operating systems.
ECS Instance Family Options Optimized for Specific Business Scenarios
ECS offers instance families tailored to different workload requirements. Product lines span general-purpose, heterogeneous, and high-performance computing. Enhanced instance types cover specific scenarios such as network-enhanced, storage-enhanced, memory-enhanced, security-enhanced, big data, and high-frequency workloads. Typical examples include:
- Elastic Bare Metal Instance: Elastic Bare Metal Instances are built on Alibaba Cloud's next-generation virtualization technology. They combine the elasticity of virtual machines with the performance and characteristics of physical machines, and they fully support nested virtualization. With Alibaba Cloud's proprietary virtualization 2.0 technology, applications access the processor and memory directly without virtualization overhead. Elastic Bare Metal Instances retain complete physical processor features (such as Intel VT-x) and physical-level resource isolation. This makes them well suited for moving traditional non-virtualized workloads to the cloud.
- GPU Cloud Server: GPU cloud servers combine GPU and CPU computing capabilities. GPUs excel at complex mathematical and geometric computation, especially floating-point and parallel operations. For suitable workloads, they can deliver hundreds of times the computing throughput of CPUs. A GPU has a large number of arithmetic logic units (ALUs) for massively concurrent computation, supports high-throughput multithreaded processing, and uses comparatively simple control logic. GPU cloud servers are suitable for video transcoding, image rendering, AI training, AI inference, and cloud-based graphics workstations.
- Super Computing Cluster (SCC): SCC extends Elastic Bare Metal Instances with a high-speed RDMA (Remote Direct Memory Access) interconnect. This significantly improves network performance and the acceleration achievable in large-scale clusters. SCC offers high-bandwidth, low-latency networking in addition to all the advantages of Elastic Bare Metal Instances. It is primarily used for high-performance computing, AI and machine learning, scientific and engineering computing, data analysis, and audio/video processing. Nodes within the cluster communicate over the RDMA network to meet the high parallelism requirements of these workloads. The RoCE (RDMA over Converged Ethernet) network delivers InfiniBand-class performance while supporting a broader range of Ethernet-based applications.
Making Effective Use of Elastic Resources
Cloud computing products provide flexible scaling features and policies to handle both unpredictable and periodic workload fluctuations. Elastic resources fall into the following main types:
Elastic Scaling
Elastic Scaling Service (ESS, also known as Auto Scaling) automatically adjusts computing capacity (the number of instances) according to business requirements and scaling policies. Instances can be ECS or ECI instances. ESS offers multiple scaling modes, including fixed-number, health, scheduled, custom, and dynamic modes. It also provides features such as lifecycle hooks and cooldown periods for fine-grained control.
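The interaction between a dynamic scaling rule and a cooldown period can be illustrated with a small simulation. This is a minimal sketch and not the ESS API. The `ScalingGroup` class, its fields, and the target-tracking formula are hypothetical assumptions chosen for illustration. The formula scales capacity in proportion to the ratio of the observed average CPU utilization to a target value. Actual ESS rule evaluation, alarm handling, and cooldown semantics may differ.

```python
import math
from dataclasses import dataclass
from typing import Optional


@dataclass
class ScalingGroup:
    """Hypothetical model of a scaling group (not the ESS API)."""
    min_size: int
    max_size: int
    capacity: int              # current number of instances
    cooldown_s: float          # seconds during which new scaling requests are ignored
    last_activity_at: Optional[float] = None

    def in_cooldown(self, now: float) -> bool:
        return (self.last_activity_at is not None
                and now - self.last_activity_at < self.cooldown_s)

    def apply_target_tracking(self, avg_cpu: float, target_cpu: float, now: float) -> int:
        """Scale capacity in proportion to observed/target CPU, respecting cooldown and bounds."""
        if self.in_cooldown(now):
            return self.capacity  # scaling requests during cooldown are ignored
        desired = math.ceil(self.capacity * avg_cpu / target_cpu)
        desired = max(self.min_size, min(self.max_size, desired))
        if desired != self.capacity:
            self.capacity = desired
            self.last_activity_at = now  # a scaling activity starts a new cooldown
        return self.capacity


group = ScalingGroup(min_size=2, max_size=10, capacity=4, cooldown_s=300)
print(group.apply_target_tracking(avg_cpu=90, target_cpu=60, now=0))    # 6
print(group.apply_target_tracking(avg_cpu=90, target_cpu=60, now=120))  # 6 (in cooldown)
print(group.apply_target_tracking(avg_cpu=30, target_cpu=60, now=400))  # 3
```

In this run, the second call is ignored because it falls inside the 300-second cooldown that began with the first scale-out. A cooldown period damps oscillation in exactly this way.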
Elastic Scaling for Container Service for Kubernetes (ACK)
Elasticity is a key feature of ACK. Typical scenarios include:
- scaling online services
- large-scale computing and training
- deep learning training and inference on GPUs or shared GPUs
- periodic load changes

Elastic scaling operates on two layers:
- Scheduling layer elasticity: Adjusts the scheduling capacity of workloads. Types include HPA, VPA, CronHPA, and Elastic-Workload. For example, HPA changes the number of application replicas. This changes the scheduling capacity the workload occupies, which achieves scaling at the scheduling layer.
- Resource layer elasticity: When the cluster's existing capacity cannot meet scheduling demand, additional ECS or ECI resources are provisioned. Types include cluster-autoscaler, virtual-node, and virtual-kubelet-autoscaler.
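Kubernetes documents HPA's core calculation as desiredReplicas = ceil(currentReplicas × currentMetricValue / desiredMetricValue). No scaling occurs when the ratio stays within a tolerance of 1.0, which is 0.1 by default. The sketch below reproduces only that formula plus min/max clamping. The function name is hypothetical. Other HPA behaviors, such as stabilization windows, scaling policies, and the handling of unready pods, are omitted.

```python
import math


def hpa_desired_replicas(current_replicas: int, current_metric: float, target_metric: float,
                         min_replicas: int, max_replicas: int, tolerance: float = 0.1) -> int:
    """Kubernetes HPA core formula: ceil(currentReplicas * currentMetric / targetMetric)."""
    ratio = current_metric / target_metric
    if abs(ratio - 1.0) <= tolerance:
        desired = current_replicas  # close enough to the target: skip scaling
    else:
        desired = math.ceil(current_replicas * ratio)
    # Respect the minReplicas/maxReplicas bounds configured on the HPA object.
    return max(min_replicas, min(max_replicas, desired))


# Average CPU utilization of 90% against a 50% target with 3 replicas:
print(hpa_desired_replicas(3, current_metric=90, target_metric=50, min_replicas=1, max_replicas=10))  # 6
```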
In practice, these two layers are often used together. High-performance scenarios have stricter requirements for container provisioning speed. For scheduling layer elasticity, the ack-autoscaling-placeholder component provides a buffer for automatic cluster expansion. For resource layer elasticity, Alibaba Cloud Image Builder can automatically build images that, combined with the custom image feature of ACK cluster node pools, enable rapid node expansion.
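The buffer provided by ack-autoscaling-placeholder is typically described as relying on pod preemption. Low-priority placeholder pods reserve spare capacity, and real workload pods preempt them. The evicted placeholders become pending, which prompts the cluster autoscaler to add nodes in the background. The following is a simplified, hypothetical model of that idea, not the component's implementation. It is CPU-only and covers a single node pool. The `NodePool` class and its methods are made-up names.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class NodePool:
    """Hypothetical CPU-only model of a node pool with low-priority placeholder pods."""
    capacity_cpu: float
    placeholders: List[float] = field(default_factory=list)          # placeholder pods (CPU each)
    workloads: List[float] = field(default_factory=list)             # real workload pods (CPU each)
    pending_placeholders: List[float] = field(default_factory=list)  # evicted placeholders

    def free_cpu(self) -> float:
        return self.capacity_cpu - sum(self.placeholders) - sum(self.workloads)

    def schedule(self, cpu: float) -> bool:
        """Place a workload pod, preempting placeholders only as far as needed."""
        while self.free_cpu() < cpu and self.placeholders:
            # Evicted placeholders go pending; in a real cluster, pending pods
            # are what prompt the autoscaler to add nodes.
            self.pending_placeholders.append(self.placeholders.pop())
        if self.free_cpu() < cpu:
            return False  # even with every placeholder evicted, the pod must wait for new nodes
        self.workloads.append(cpu)
        return True

    def needs_scale_out(self) -> bool:
        return bool(self.pending_placeholders)


pool = NodePool(capacity_cpu=8.0, placeholders=[2.0, 2.0], workloads=[4.0])
print(pool.schedule(3.0), pool.pending_placeholders, pool.needs_scale_out())  # True [2.0, 2.0] True
```

The new pod starts immediately on capacity that the placeholders were holding. The slower node scale-out happens afterward, on behalf of the displaced placeholders.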
Function Compute
Function Compute has built-in elasticity. Instances scale out automatically as invocations increase and scale in as requests decrease. An instance that processes no requests for a certain period is automatically destroyed. This on-demand model simplifies resource management but introduces cold start latency. A cold start involves four steps:
1. downloading the code
2. starting the container
3. initializing the runtime
4. initializing the user code

Once these steps complete, the function instance is ready and subsequent requests execute directly. In high-performance scenarios, you typically need strategies to mitigate the impact of cold starts. Function Compute already optimizes cold starts on the platform side. On the user side, consider the following optimizations:
- Slim down the code package: Minimize the code package by removing unnecessary dependencies. For example, run npm prune in Node.js to remove extraneous packages, or use a tool such as autoflake in Python to strip unused imports. Third-party libraries may also contain test sources, unused binaries, or data files. Removing them reduces code download and decompression time.
- Choose the right function language: Java cold starts typically take longer than those of other languages because of JVM startup and class loading. Suppose an application is sensitive to cold start latency and its warm invocation latency is comparable across languages. In that case, a lightweight language such as Python can significantly reduce tail latency.
- Choose the right memory size: Larger memory allocations come with more CPU resources. At a given concurrency level, a function with more memory therefore has better cold start performance.
- Reduce the probability of cold starts:
  - Preheat the function periodically using a timed trigger.
  - Use the Initializer callback. Function Compute calls the initializer asynchronously, which takes code initialization time out of the request path. During system upgrades or function updates, cold starts are transparent to you.
In practice, fully eliminating user-side cold starts is difficult. For example, deep learning inference may require loading large model files, or a function may depend on a legacy system client with a long initialization time. If your function is highly sensitive to latency, you can configure reserved mode instances for the function, or use reserved and on-demand mode instances together.
You manage the allocation and release of reserved mode instances and are billed for the time they run. When load exceeds reserved capacity, the system automatically provisions on-demand instances, balancing performance and resource utilization. Allocate reserved instances in advance according to expected load changes. Requests then continue to be served by reserved instances while on-demand instances scale out, so cold start delays are kept out of the request path.
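To make the spill-over behavior concrete, here is a simplified steady-state model. It rests on three assumptions:
- each instance handles a fixed number of concurrent requests (`per_instance_concurrency`, an assumed parameter)
- reserved instances are filled first
- only the overflow is routed to on-demand instances

`split_load` and `Allocation` are hypothetical names. The model does not capture the real scheduler's placement decisions or scaling timing.

```python
import math
from typing import NamedTuple


class Allocation(NamedTuple):
    reserved_requests: int      # concurrent requests served by reserved instances
    on_demand_requests: int     # overflow routed to on-demand instances
    on_demand_instances: int    # on-demand instances needed for the overflow


def split_load(concurrent_requests: int, reserved_instances: int,
               per_instance_concurrency: int) -> Allocation:
    """Fill reserved capacity first; only the overflow needs (possibly cold) on-demand instances."""
    if per_instance_concurrency <= 0:
        raise ValueError("per_instance_concurrency must be positive")
    reserved_capacity = reserved_instances * per_instance_concurrency
    served_by_reserved = min(concurrent_requests, reserved_capacity)
    overflow = concurrent_requests - served_by_reserved
    return Allocation(served_by_reserved, overflow, math.ceil(overflow / per_instance_concurrency))


print(split_load(95, reserved_instances=4, per_instance_concurrency=10))
# Allocation(reserved_requests=40, on_demand_requests=55, on_demand_instances=6)
```

Consider 4 reserved instances at a concurrency of 10 receiving a burst of 95 concurrent requests. Reserved instances absorb 40 of them, and 6 on-demand instances are needed for the remaining 55. Only those 55 requests are exposed to possible cold starts. Sizing the reserved pool to the expected baseline keeps most traffic on warm instances.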