Cluster maintenance window

Configure a cluster maintenance window to schedule planned operations, such as automatic Kubernetes version upgrades and CVE fixes, during off-peak hours. This helps keep your core services stable during peak times and minimizes the potential impact of changes on your business.

Applies to: ACK managed clusters

How cluster maintenance windows work

A cluster maintenance window is a preset, recurring time period for an ACK cluster. ACK performs automated operations and maintenance (O&M) during this window, such as automatic Kubernetes version upgrades and CVE fixes.

ACK runs two types of maintenance:

  • ACK-initiated maintenance: ACK automatically plans the execution time and sequence of O&M tasks based on task type and impact. No manual configuration is required.

  • User-configured maintenance: Set a custom maintenance window to control when ACK performs O&M operations—for example, to limit planned changes to off-peak hours.

A maintenance window defines when ACK is *allowed* to perform O&M operations. It does not guarantee that a task runs in the next available window. The final execution time depends on ACK's overall task scheduling and its global grayscale (phased rollout) orchestration rules.
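The "allowed, not guaranteed" semantics can be pictured as a simple membership check: a time either falls inside an occurrence of the window or it does not. The following is a minimal sketch, not ACK's implementation. The `WeeklyWindow` class and `is_allowed` function are hypothetical names, and the sketch assumes a weekly window shorter than 24 hours, interpreted at a fixed UTC offset.

```python
from dataclasses import dataclass
from datetime import datetime, time, timedelta, timezone


@dataclass(frozen=True)
class WeeklyWindow:
    """Hypothetical model of a weekly maintenance window (not an ACK API object)."""
    weekdays: frozenset   # 0 = Monday ... 6 = Sunday, as in datetime.weekday()
    start: time           # local start time of each occurrence
    duration: timedelta   # window length; assumed to be at most 24 hours
    tz: timezone          # fixed UTC offset used to interpret `start`


def is_allowed(window: WeeklyWindow, moment: datetime) -> bool:
    """Return True if the timezone-aware `moment` falls inside the window."""
    local = moment.astimezone(window.tz)
    # Check today's occurrence and yesterday's, because a window may cross midnight.
    for days_back in (0, 1):
        day = (local - timedelta(days=days_back)).date()
        if day.weekday() not in window.weekdays:
            continue
        opens = datetime.combine(day, window.start, tzinfo=window.tz)
        if opens <= local < opens + window.duration:
            return True
    return False


utc8 = timezone(timedelta(hours=8))
# Tuesdays and Thursdays, 00:00 to 04:00 at UTC+8.
window = WeeklyWindow(frozenset({1, 3}), time(0, 0), timedelta(hours=4), utc8)
inside = is_allowed(window, datetime(2024, 1, 2, 1, 30, tzinfo=utc8))   # a Tuesday
outside = is_allowed(window, datetime(2024, 1, 2, 5, 0, tzinfo=utc8))
```

A `True` result only means that the time is eligible. Whether a task actually runs then still depends on ACK's scheduling.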

The following figures show examples of a weekly maintenance window.

[Figure: Default behavior]

[Figure: Custom maintenance window]

Configure a maintenance window

Configure a maintenance window when you create a cluster, or modify the maintenance window of an existing cluster.

Best practices

Setting | Recommendation
Period | You can choose a weekly or custom maintenance period. For production environments, use a fixed maintenance period and set the start time to off-peak hours, such as 00:00–04:00.
Duration | Set each window to at least 4 hours. Total available maintenance time per month must be at least 48 hours to prevent long-running tasks such as upgrades from failing due to an insufficient window.
Time zone | Select the time zone that matches your business location to make sure the window opens at the expected local time.
Application high availability | Use a multi-replica deployment and distribute workloads across multiple nodes. For critical services, configure a Pod Disruption Budget (PDB) to control the number of pods that can be disrupted simultaneously.
Multi-cluster setups | Stagger maintenance windows across production clusters to enable grayscale upgrades between clusters and improve overall service stability.
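The Duration recommendation is easy to miss with a once-a-week window: four occurrences of 4 hours add up to only 16 hours in a month. The following sketch checks a weekly schedule against both minimums. It assumes the monthly total is counted per calendar month, which is an interpretation and not something the recommendation states. The function names are hypothetical.

```python
import calendar

MIN_WINDOW_HOURS = 4     # per-window minimum from the Duration recommendation
MIN_MONTHLY_HOURS = 48   # monthly minimum from the Duration recommendation


def monthly_hours(year, month, weekdays, hours_per_window):
    """Total window hours in one calendar month for a weekly schedule.

    weekdays uses 0 = Monday ... 6 = Sunday, as in calendar.weekday().
    """
    _, days_in_month = calendar.monthrange(year, month)
    occurrences = sum(
        1
        for day in range(1, days_in_month + 1)
        if calendar.weekday(year, month, day) in weekdays
    )
    return occurrences * hours_per_window


def check_schedule(year, month, weekdays, hours_per_window):
    """Return a list of problems; an empty list means both minimums are met."""
    problems = []
    if hours_per_window < MIN_WINDOW_HOURS:
        problems.append(f"window is {hours_per_window} h, below {MIN_WINDOW_HOURS} h")
    total = monthly_hours(year, month, weekdays, hours_per_window)
    if total < MIN_MONTHLY_HOURS:
        problems.append(f"monthly total is {total} h, below {MIN_MONTHLY_HOURS} h")
    return problems


# February 2023 has 28 days, so every weekday occurs exactly four times.
once_a_week = check_schedule(2023, 2, {0}, 4)           # Mondays only
three_a_week = check_schedule(2023, 2, {0, 2, 4}, 4)    # Mon, Wed, Fri
```

Checking the shortest month is a conservative choice: a schedule that reaches 48 hours in a 28-day month reaches it in every other month too.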

Operations and maintenance windows

The following table lists ACK O&M operations and whether each one is restricted by the maintenance window.

Operation | Follows maintenance window | Notes
Automatic Kubernetes version upgrades for the cluster control plane | Yes | -
Automatic scans and fixes for CVE vulnerabilities in the node operating system | Yes | -
Automatic upgrades of critical system components in Auto Mode clusters | Yes | -
Automatic updates of the node pool image ID in Auto Mode clusters | Yes | Only newly added nodes use the new image. Existing nodes are not directly upgraded.
Automatic responses to ECS system events (e.g., SystemMaintenance.Reboot) | Yes (with fallback) | If a maintenance window is available before the ECS scheduled execution time, ACK performs the response during that window. Otherwise, ACK acts one hour before the ECS scheduled time.
Control plane repairs | No | Self-healing is triggered immediately to maintain control plane stability.
Node auto-healing | No | Triggered immediately when a node in a managed node pool fails.
Node scaling | No | Driven by real-time workload demand (CPU, memory). Independent of scheduled windows.
Critical security vulnerability patches | No | ACK reserves the right to bypass maintenance windows for emergency fixes to protect cluster and service security.
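The fallback rule for ECS system events can be read as a two-branch decision. The sketch below illustrates that reading only. It assumes that "available" means a window opens between now and the ECS scheduled time, and `plan_response` and its parameters are hypothetical names, not part of any ACK or ECS API.

```python
from datetime import datetime, timedelta

FALLBACK_LEAD = timedelta(hours=1)  # "one hour before the ECS scheduled time"


def plan_response(now, ecs_scheduled, upcoming_window_starts):
    """Pick when to respond to an ECS system event.

    upcoming_window_starts is a hypothetical list of datetimes at which the
    next maintenance windows open. Returns (response_time, reason).
    """
    # Assumption: a window counts as available if it opens before the ECS time.
    candidates = [s for s in upcoming_window_starts if now <= s < ecs_scheduled]
    if candidates:
        return min(candidates), "maintenance window"
    # No window opens in time, so act one hour before the ECS scheduled time.
    return ecs_scheduled - FALLBACK_LEAD, "fallback"


now = datetime(2024, 1, 1, 12, 0)
ecs_time = datetime(2024, 1, 3, 10, 0)
in_window = plan_response(now, ecs_time, [datetime(2024, 1, 2, 0, 0), datetime(2024, 1, 4, 0, 0)])
fallback = plan_response(now, ecs_time, [datetime(2024, 1, 4, 0, 0)])
```

The sketch ignores the case where a window is already open at `now`, and the case where the fallback time has already passed.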

FAQ

An O&M task failed. Will it be retried?

Yes. If an O&M task fails within the current maintenance window, ACK automatically retries it during the next available maintenance window.

An upgrade started during the maintenance window but didn't finish in time. What happens?

ACK handles unfinished O&M plans as follows:

  • Unstarted batches: Automatically canceled and postponed to the next maintenance window.

  • Started batches: Continue running until complete to maintain node status consistency. Once the current batch finishes, all remaining unstarted batches are canceled.
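The two rules above amount to partitioning the plan by batch state at the moment the window closes. The following is a minimal sketch of that partition. The `BatchState` enum, the `(name, state)` tuples, and `on_window_close` are hypothetical and only illustrate the described behavior.

```python
from enum import Enum


class BatchState(Enum):
    PENDING = "pending"   # not started yet
    RUNNING = "running"   # started and still in progress
    DONE = "done"         # finished inside the window


def on_window_close(batches):
    """Split a plan when the maintenance window closes.

    batches is an ordered list of (name, BatchState) tuples.
    Returns (keep_running, postponed): started batches run to completion,
    unstarted batches are canceled and deferred to the next window.
    """
    keep_running = [name for name, state in batches if state is BatchState.RUNNING]
    postponed = [name for name, state in batches if state is BatchState.PENDING]
    return keep_running, postponed


plan = [
    ("batch-1", BatchState.DONE),
    ("batch-2", BatchState.RUNNING),
    ("batch-3", BatchState.PENDING),
    ("batch-4", BatchState.PENDING),
]
keep_running, postponed = on_window_close(plan)
```

In this example `batch-2` keeps running past the end of the window, while `batch-3` and `batch-4` wait for the next one.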
