Resource Scaling and System Expansion

Manual capacity management leads to over-provisioning during quiet periods and under-provisioning during spikes. This document describes how to use Alibaba Cloud's automated scaling models—cloud service auto scaling, container auto scaling, and Serverless—to match compute capacity to actual demand without manual intervention.

Before choosing a scaling approach, check whether any of the following describe your current setup:

  • You react to traffic spikes by manually adjusting capacity.

  • You apply the same static sizing guidelines from on-premises environments to the cloud.

  • You leave excess capacity running after a scaling event instead of scaling back down.

If any of these apply, the approaches below provide a path to fully automated, policy-driven scaling. Each of the three models suits different deployment patterns and operational requirements.

  • Cloud service auto scaling: Alibaba Cloud Auto Scaling (Elastic Scaling Service, ESS) automatically adjusts compute capacity—specifically, the number of Elastic Compute Service (ECS) or Elastic Container Instance (ECI) instances—based on your defined policies and real-time business demand. ESS suits both applications with variable traffic and those with predictable, stable workloads.

    Desired outcome: ESS provisions and releases instances automatically, without manual intervention. Scaling policies are defined in code and triggered by metrics or schedules, ensuring your compute capacity tracks actual demand in both directions—up and down.

  • Container auto scaling: As containerization becomes the standard compute model in cloud environments, more applications run on Container Service for Kubernetes (ACK). ACK elastic scaling covers key scenarios such as elastic online services, large-scale compute and training jobs, deep learning training and inference on dedicated or shared GPUs, and scheduled or periodic workloads.

    Container scaling operates on two independent dimensions:

      • Scheduler layer elasticity: Adjusts the scheduling capacity consumed by a workload. The Horizontal Pod Autoscaler (HPA) is the primary component at this layer: it increases or decreases the number of pod replicas, which changes how much of the cluster's scheduling capacity the workload occupies.

      • Resource layer elasticity: Activates when the cluster's existing capacity cannot satisfy the scheduler's demands. At this point, additional ECS or ECI instances are provisioned to expand available resources.

    Both layers can be used independently or together. They are decoupled through the scheduling layer's capacity status, so each can scale at its own rate without blocking the other.

  • Serverless: Both cloud service and container scaling still require some ongoing infrastructure management. Serverless removes that responsibility entirely: there is no capacity planning and no infrastructure to maintain. Alibaba Cloud Serverless products provide three core advantages: millisecond-scale elastic scaling, pay-per-use pricing, and high development efficiency, because you do not manage the underlying cloud resources. Alibaba Cloud offers more than 20 core Serverless products, ranging from Serverless compute to Serverless application development.

    Use Serverless when your workload has highly unpredictable traffic spikes, short-lived tasks, or when reducing operational overhead is the priority.
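
To make the two container scaling layers described above more concrete, here is a minimal Python sketch of how they can interact. The replica calculation follows the formula documented for the upstream Kubernetes Horizontal Pod Autoscaler (desired = ceil(current × currentMetric / targetMetric), with a default tolerance of 0.1). The node calculation is an illustrative assumption only: it treats every node as fitting the same number of pods, which is not how ACK's resource layer actually packs workloads. The function names, parameters, and default bounds are hypothetical and do not belong to any Alibaba Cloud API.

```python
import math


def hpa_desired_replicas(current_replicas: int, current_metric: float,
                         target_metric: float, min_replicas: int = 1,
                         max_replicas: int = 10, tolerance: float = 0.1) -> int:
    """Scheduler layer: replica count per the upstream Kubernetes HPA formula."""
    if target_metric <= 0:
        raise ValueError("target_metric must be positive")
    ratio = current_metric / target_metric
    # Inside the tolerance band the replica count is left unchanged,
    # which avoids flapping on small metric fluctuations.
    if abs(ratio - 1.0) <= tolerance:
        desired = current_replicas
    else:
        desired = math.ceil(current_replicas * ratio)
    # Clamp to the configured bounds (hypothetical defaults above).
    return max(min_replicas, min(max_replicas, desired))


def extra_nodes_needed(desired_replicas: int, pods_per_node: int,
                       current_nodes: int) -> int:
    """Resource layer: nodes to add when existing capacity cannot fit all pods.

    Assumption for illustration: every node fits exactly `pods_per_node` pods.
    """
    if pods_per_node <= 0:
        raise ValueError("pods_per_node must be positive")
    required_nodes = math.ceil(desired_replicas / pods_per_node)
    # Never negative: scale-in of nodes is out of scope for this sketch.
    return max(0, required_nodes - current_nodes)


if __name__ == "__main__":
    # Average CPU is 90% against a 60% target, with 4 replicas on 2 nodes.
    replicas = hpa_desired_replicas(current_replicas=4,
                                    current_metric=90.0,
                                    target_metric=60.0)
    nodes = extra_nodes_needed(replicas, pods_per_node=2, current_nodes=2)
    print(f"replicas={replicas}, extra_nodes={nodes}")
```

Run as a script, this prints `replicas=6, extra_nodes=1`: the scheduler layer asks for two more pods, and only because the two existing nodes cannot hold six pods does the resource layer add one node. The second function reads nothing but the scheduler's result, which mirrors the decoupling between the layers.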