Overview

System stability is the ability of a system to keep providing continuous, reliable service when unexpected events occur, such as hardware failures, software bugs, sudden traffic spikes, and infrastructure outages.

Failures are inevitable in distributed systems. Hardware breaks, configurations drift, and traffic surges without warning. In extreme cases, a single cut fiber optic cable or a natural disaster can take an entire data center offline. Building stable systems means assuming failures will happen and designing to contain, recover from, and prevent their impact.

As architectures grow more complex, stability becomes harder to maintain. The Alibaba Cloud Well-Architected Framework gives you a structured approach to evaluate and improve system stability across six dimensions:

  • Availability: the proportion of time a system can serve requests as intended

  • Reliability: the ability to perform the correct function under stated conditions

  • Observability: the ability to infer internal system state from external signals

  • Operability: the ease of running, maintaining, and evolving a system in production

  • Scalability: the ability to handle increased load without degrading service quality

  • Maintainability: the ease of modifying a system to fix issues or adapt to new requirements
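To make the availability definition concrete, the sketch below treats availability as uptime divided by total observed time, and converts an availability target into a downtime budget for a given period. This is an illustrative assumption, not part of the framework: the helper names `availability` and `downtime_budget` are hypothetical, and some teams measure availability by successful requests rather than by time.

```python
from datetime import timedelta


# Hypothetical helper (illustration only, not an Alibaba Cloud API):
# availability as the fraction of a period in which the system serves
# requests as intended, measured here by time.
def availability(uptime: timedelta, downtime: timedelta) -> float:
    """Return uptime / (uptime + downtime) as a fraction between 0 and 1."""
    total = uptime + downtime
    if total <= timedelta(0):
        raise ValueError("observation period must be positive")
    return uptime / total  # timedelta / timedelta yields a float


# Hypothetical helper: the maximum downtime that still meets a target
# availability over the given period.
def downtime_budget(target: float, period: timedelta) -> timedelta:
    """Return period * (1 - target), the allowed downtime for the target."""
    if not 0.0 <= target <= 1.0:
        raise ValueError("target must be between 0 and 1")
    return period * (1.0 - target)


if __name__ == "__main__":
    year = timedelta(days=365)
    for target in (0.99, 0.999, 0.9999):
        budget = downtime_budget(target, year)
        print(f"{target:.2%} -> {budget} of downtime per year")
```

Over a 365-day year, a 99.9% target leaves about 31,536 seconds (roughly 8.76 hours) of downtime, and each additional "nine" cuts that budget by a factor of ten.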

Cloud infrastructure addresses many stability challenges that are difficult or expensive to solve on-premises. Alibaba Cloud can dynamically allocate and release compute resources in response to real-time demand, which makes systems easier to scale. Redundant storage and backup capabilities reduce the risk of downtime or data loss caused by hardware failures or other incidents. Treat these platform capabilities as a foundation for stability design in your workloads, not a substitute for it.

Shared responsibility model

Alibaba Cloud and you share responsibility for stability. Understanding the boundary helps you focus your design and operational effort in the right places.

Alibaba Cloud is responsible for the stability of the underlying infrastructure: physical hardware, data centers, global networking, and the core services that run on this infrastructure. Alibaba Cloud provides high availability (HA) infrastructure and a suite of tools and services to help you build stable applications.

You are responsible for the stability of what you build on Alibaba Cloud: application architecture, failure handling in code, deployment practices, configuration management, and operational procedures. For infrastructure-level services such as Elastic Compute Service (ECS), you configure and manage the resiliency of your workload. For managed services, Alibaba Cloud handles more of the underlying reliability, but you still control how your application uses and depends on those services.

Follow the best practices in this framework to design, build, and operate stable workloads on Alibaba Cloud.