Create an ACK dedicated cluster


An ACK dedicated cluster requires at least three master nodes for high availability, plus multiple worker nodes, and gives you fine-grained control over the cluster infrastructure. However, you must plan, maintain, and upgrade the cluster yourself. This topic describes how to create an ACK dedicated cluster by using the console, an API, Terraform, an SDK, or a CLI.

Important

Container Service for Kubernetes no longer supports creating ACK dedicated clusters as of August 21, 2024. We recommend using ACK Pro clusters in production environments for higher reliability, security, and scheduling efficiency.

Preparations

Before you create a cluster, make sure that you have activated Container Service for Kubernetes (ACK), granted the ACK system service role to your Alibaba Cloud account or RAM user, and activated related cloud products such as VPC, Server Load Balancer (SLB), and NAT Gateway. ACK requires these permissions to call related services and perform cluster operations. For more information, see Quickly create an ACK managed cluster.

Note

The cluster creation process involves purchasing pay-as-you-go resources such as Server Load Balancer (SLB) instances. Make sure that your account has a sufficient balance to prevent service interruptions due to overdue payments.

Create a cluster

You can create an ACK cluster by using the console, API, SDK, Terraform, or CLI.

Console

Step 1: Log on to the ACK console

  1. Log on to the ACK console. In the left navigation pane, click Clusters.

  2. In the top-left corner of the page, select the resource group and region where your target resources reside.

  3. On the Clusters page, click Create Kubernetes Cluster.

Step 2: Configure the cluster

On the ACK Dedicated Cluster tab, configure the basic information, network, and advanced options for the cluster.

Basic information

Parameter

Description

Cluster Name

Enter a custom cluster name.

Region

The region where cluster resources (such as ECS instances and cloud disks) are located. The closer the region is to your location and where your resources are deployed, the lower the network latency.

Kubernetes Version

Only the latest three minor versions are supported. We recommend using the latest available version. For details about ACK version support, see ACK version support overview.

Network configuration

Parameter

Description

IPv6 Dual-stack

Supported only for Kubernetes 1.22 or later, only with Terway, and cannot be used together with eRDMA.

The cluster supports both IPv4 and IPv6 protocols, but communication between worker nodes and the control plane still uses IPv4 addresses. Ensure the following:

  • The cluster VPC supports IPv6 dual-stack.

  • When using Terway in shared ENI mode, the instance type of the node must support IPv6 and have the same number of assignable IPv4 and IPv6 addresses.

VPC

The VPC for the cluster. To ensure high availability, we recommend selecting two or more zones.

  • Auto-create: ACK creates a vSwitch in each selected zone.

  • Use existing: Select vSwitches to specify the zones of the cluster. You can create new vSwitches or use existing ones.

We recommend using a standard private CIDR block for the cluster VPC, such as 10.0.0.0/8, 172.16.0.0/12, or 192.168.0.0/16. To use a public CIDR block instead, apply in Quota Center. For details, see Create a cluster using a public CIDR block VPC.
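Before you enter a VPC CIDR block, you can check whether it falls inside one of the recommended private blocks. The following minimal sketch uses Python's standard ipaddress module; the helper name is hypothetical, and the console performs its own validation.

```python
import ipaddress

# Standard private IPv4 blocks recommended for the cluster VPC.
STANDARD_PRIVATE_BLOCKS = [
    ipaddress.ip_network("10.0.0.0/8"),
    ipaddress.ip_network("172.16.0.0/12"),
    ipaddress.ip_network("192.168.0.0/16"),
]


def is_standard_private(vpc_cidr: str) -> bool:
    """Return True if vpc_cidr lies entirely inside one standard private block."""
    net = ipaddress.ip_network(vpc_cidr)  # raises ValueError if host bits are set
    if net.version != 4:
        return False
    return any(net.subnet_of(block) for block in STANDARD_PRIVATE_BLOCKS)


print(is_standard_private("192.168.0.0/16"))  # True
print(is_standard_private("172.32.0.0/16"))   # False: outside 172.16.0.0/12
```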

Cloud resource and billing information: VPC

Configure SNAT for VPC

Do not select this option when using a shared VPC.

Select this option if nodes need public network access (to pull public images or access external services). ACK automatically configures a NAT Gateway and SNAT rules to enable public network access for cluster resources.

  • VPC has no NAT Gateway: ACK automatically creates a NAT Gateway, purchases a new EIP, and configures SNAT rules for the cluster's vSwitches.

  • VPC already has a NAT Gateway: ACK determines whether to purchase additional EIPs or configure SNAT rules. If no EIP is available, a new EIP is purchased. If no VPC-level SNAT rule exists, SNAT rules are configured for the cluster's vSwitches.

If you do not select this option, you can manually configure a NAT Gateway and SNAT rules after cluster creation. For details, see Public NAT Gateway.

Cloud resource and billing information: NAT Gateway, EIP

vSwitch

Select an existing vSwitch by zone from the list, or click Create vSwitch to create a new one. The control plane and default node pool use the specified vSwitch. For better high availability, we recommend selecting vSwitches in multiple zones.

Security Group

When you use an existing VPC, you can choose Select Existing Security Group.

This security group applies to the cluster control plane, default node pool, and any node pool without a custom security group.

Compared with basic security groups, enterprise security groups can accommodate a larger number of private IP addresses but do not support intra-group connectivity. For more information, see Security Group Classification.

  • Auto-create: All outbound traffic is allowed by default. Inbound rules follow recommended configurations. If you modify rules later, ensure inbound access to the 100.64.0.0/10 CIDR block is allowed.

    This CIDR block is used to access other Alibaba Cloud services for operations such as image pulling and querying ECS basic information.
  • Use existing: ACK does not add extra access rules to the security group. You must manage security group rules yourself to avoid access issues. For details, see Configure cluster security groups.
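If you manage rules yourself, a quick check is whether any allowed inbound source CIDR fully covers 100.64.0.0/10. The following sketch models the rules only as a list of allowed source CIDRs and ignores protocols and port ranges, which real security group rules also carry. The function name is hypothetical.

```python
import ipaddress
from typing import Iterable

# CIDR block used to reach other Alibaba Cloud services (image pulls, ECS information queries).
REQUIRED_BLOCK = ipaddress.ip_network("100.64.0.0/10")


def covers_required_block(allowed_source_cidrs: Iterable[str]) -> bool:
    """Return True if any allowed inbound source CIDR fully contains 100.64.0.0/10."""
    for cidr in allowed_source_cidrs:
        net = ipaddress.ip_network(cidr)
        # subnet_of() raises TypeError across IP versions, so skip IPv6 entries.
        if net.version == 4 and REQUIRED_BLOCK.subnet_of(net):
            return True
    return False


print(covers_required_block(["192.168.0.0/16", "100.64.0.0/10"]))  # True
print(covers_required_block(["100.64.0.0/16"]))                    # False: too narrow
```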

Access to API Server

ACK automatically creates a pay-as-you-go private CLB instance as the internal endpoint of the API Server. Do not reuse or delete this CLB instance. If it is deleted, the API Server becomes inaccessible and cannot be restored.

To use an existing CLB instance, submit a ticket. After selecting Use Existing Gateway for the VPC, you can set the SLB Source to Use Existing Gateway.

You can optionally enable Expose API server with EIP.

  • Enabled: Binds an EIP to the private CLB instance of the API Server, allowing public network access to manage the cluster.

    This does not grant public network access to resources inside the cluster. To allow cluster resources to access the public network, select Configure SNAT for VPC.
  • Disabled: Allows cluster connection and management via KubeConfig only from within the VPC.

To enable this later, see Enable public network access to API Server.

Starting December 1, 2024, newly created CLB instances no longer support subscription billing and incur instance fees. For details, see [Product Announcement] Discontinuation of subscription billing for new cluster API Server CLB instances, Adjustment announcement for Classic Load Balancer CLB billing items.

Cloud resource and billing information: CLB, EIP

Network Plug-in

The network plugin provides the foundation for pod-to-pod communication in the cluster.

For a detailed comparison, see Compare Terway and Flannel container network plugins.
  • Flannel: A lightweight, open-source community network plugin. In ACK, it integrates deeply with Alibaba Cloud VPC and uses direct VPC route table management for pod communication.

    • Use case: Simple configuration and low resource consumption. Suitable for small-scale clusters (limited by VPC route table quotas), simplified networking, and scenarios that do not require custom container network control.

  • Terway: A high-performance network plugin developed by Alibaba Cloud that uses Elastic Network Interfaces (ENIs) for pod communication.

    • Use case: Offers eBPF-based network acceleration, NetworkPolicy, and per-pod vSwitch and security group capabilities. Ideal for high-performance computing, gaming, microservices, and other scenarios requiring large-scale nodes, high network performance, and strong security.

    • Pod limit: Each pod consumes one secondary IP address from an ENI. The number of IPs per ENI is limited (depending on the instance type). Therefore, the maximum number of pods per node is constrained by ENI and secondary IP quotas.

      When using a shared VPC, only Terway is supported.

    Terway also provides the following capabilities.

    For details, see Use the Terway network plugin.
    • DataPathV2

      Configurable only during cluster creation.

      Enable DataPathV2 acceleration mode. Terway uses eBPF technology to optimize traffic forwarding paths, delivering lower latency and higher throughput for network-intensive applications.

      Supported only on Alibaba Cloud Linux 3 (all versions), ContainerOS, and Ubuntu with Linux kernel version 5.10 or later. For details, see Network acceleration.

    • NetworkPolicy support

      In public preview. Apply on the Quota Center console.

      Supports native Kubernetes NetworkPolicy to implement pod-level "firewalls" and fine-grained access control rules, enhancing cluster security.

    • Support for ENI Trunking

      Allows assigning dedicated IPs, vSwitches, and security groups to pods. Suitable for special business scenarios requiring fixed IPs or independent network policy management for specific pods. For details, see Assign fixed IPs, dedicated vSwitches, and security groups to pods.
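To estimate the Terway pod limit for an instance type, you can combine its ENI and per-ENI IP quotas, for example the EniQuantity and EniPrivateIpAddressQuantity fields that the ECS DescribeInstanceTypes API returns. This sketch assumes the shared ENI mode formula (ENIs minus the primary ENI, multiplied by IPs per ENI). Confirm the exact rule for your mode in Use the Terway network plugin. The quota values in the example are hypothetical.

```python
def terway_shared_eni_max_pods(eni_quantity: int, ips_per_eni: int) -> int:
    """Estimate the max Terway pods on one node in shared ENI mode.

    Assumption: the primary ENI is reserved for the node itself, and every
    IPv4 address on each remaining ENI can be assigned to one pod.
    """
    if eni_quantity < 1 or ips_per_eni < 1:
        raise ValueError("ENI quotas must be positive")
    return (eni_quantity - 1) * ips_per_eni


# Hypothetical instance type: 4 ENIs, 15 private IPv4 addresses per ENI.
print(terway_shared_eni_max_pods(4, 15))  # 45
```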

Container CIDR Block

Required only for Flannel.

The IP address pool for assigning pod IPs. This CIDR block must not overlap with the VPC or any existing ACK cluster CIDR blocks in the VPC, and must not overlap with the Service CIDR.

Number of Pods per Node

Required only for Flannel.

Defines the maximum number of pods allowed on a single node.
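The container CIDR block and the number of pods per node together bound how many nodes a Flannel cluster can hold. The following sketch assumes that each node receives its own pod subnet sized to the pods-per-node value, which must be a power of two. The CIDR block in the example is hypothetical.

```python
import ipaddress


def flannel_max_nodes(container_cidr: str, pods_per_node: int) -> int:
    """Estimate how many nodes a Flannel container CIDR block can serve.

    Assumption: every node gets a pod subnet of exactly pods_per_node addresses.
    """
    if pods_per_node <= 0 or pods_per_node & (pods_per_node - 1):
        raise ValueError("pods_per_node must be a power of two")
    total = ipaddress.ip_network(container_cidr).num_addresses
    return total // pods_per_node


print(flannel_max_nodes("172.20.0.0/16", 256))  # 256 nodes
print(flannel_max_nodes("172.20.0.0/16", 128))  # 512 nodes
```

Halving the pods per node doubles the node capacity of the same CIDR block, so size both values together.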

Pod vSwitch

Required only when using the Terway plugin.

The vSwitch used to assign IP addresses to pods. Each pod vSwitch corresponds to a worker node vSwitch, and both must be in the same zone.

Important

For the pod vSwitch, we recommend a subnet mask of /19 or shorter (a larger subnet). The longest allowed mask is /25. A longer mask (a smaller subnet) severely limits the number of pod IP addresses that the cluster can allocate, which affects normal cluster operation.
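The following sketch shows how quickly pod address capacity shrinks as the mask gets longer. It counts total addresses only. Usable counts are slightly lower because the platform reserves a few addresses in every vSwitch. The 10.0.0.0 base address is arbitrary.

```python
import ipaddress


def vswitch_size(prefix_len: int) -> int:
    """Total IPv4 addresses in a vSwitch CIDR block with this prefix length."""
    return ipaddress.ip_network(f"10.0.0.0/{prefix_len}").num_addresses


for p in (18, 19, 22, 25):
    print(f"/{p}: {vswitch_size(p)} addresses")
# /18: 16384, /19: 8192, /22: 1024, /25: 128
```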

Service CIDR

The IP address pool from which IP addresses are assigned to Services in the cluster. This CIDR block must not overlap with the VPC CIDR block, with the CIDR blocks of existing clusters in the VPC, or with the Container CIDR Block.
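Because the VPC, container, and Service CIDR blocks must be mutually disjoint, it can help to check a planned set before you create the cluster. The following is a minimal sketch; the names and CIDR values are hypothetical planning inputs. Add the CIDR blocks of existing clusters in the same VPC as extra entries.

```python
import ipaddress
from itertools import combinations
from typing import Dict, List, Tuple


def find_overlaps(cidrs: Dict[str, str]) -> List[Tuple[str, str]]:
    """Return every pair of named CIDR blocks that overlap."""
    nets = {name: ipaddress.ip_network(c) for name, c in cidrs.items()}
    return [(a, b) for a, b in combinations(nets, 2) if nets[a].overlaps(nets[b])]


plan = {
    "vpc": "192.168.0.0/16",
    "container": "172.20.0.0/16",
    "service": "172.21.0.0/20",
}
print(find_overlaps(plan))  # [] means the plan has no conflicts
```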

IPv6 Service CIDR Block

Requires IPv6 dual-stack to be enabled.

Configure an IPv6 address range for the Service CIDR block. Use a ULA address (within the fc00::/7 range) with a prefix length between /112 and /120. We recommend matching the number of available addresses to that of the Service CIDR.
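The two rules above (ULA range, prefix between /112 and /120) and the size comparison with the IPv4 Service CIDR can be checked as follows. This is a sketch with hypothetical CIDR values. For example, an IPv6 /112 has the same 65,536 addresses as an IPv4 /16.

```python
import ipaddress

ULA = ipaddress.ip_network("fc00::/7")


def check_ipv6_service_cidr(cidr: str) -> int:
    """Validate an IPv6 Service CIDR block against the stated rules; return its size."""
    net = ipaddress.ip_network(cidr)
    if net.version != 6 or not net.subnet_of(ULA):
        raise ValueError(f"{cidr} is not inside the ULA range fc00::/7")
    if not 112 <= net.prefixlen <= 120:
        raise ValueError(f"prefix /{net.prefixlen} is outside /112 to /120")
    return net.num_addresses


ipv6_size = check_ipv6_service_cidr("fd00:10::/112")
ipv4_size = ipaddress.ip_network("172.21.0.0/16").num_addresses
print(ipv6_size, ipv4_size, ipv6_size == ipv4_size)  # 65536 65536 True
```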

Advanced configuration

Expand Advanced Options (Optional) to configure the service forwarding mode for the cluster.

Parameter

Description

Forwarding Mode

Select the kube-proxy mode, which determines how Services distribute requests to backend pods.

  • iptables: Uses Linux iptables rules for traffic forwarding. It is mature and stable, but its performance is limited: the number of rules grows with the number of Services, and rules are matched sequentially, so request processing slows down as Services are added. Suitable for clusters with few Services.

  • IPVS: A high-performance traffic distribution solution that uses hash tables to locate backend pods quickly, delivering lower latency when there are many Services. Suitable for large-scale production clusters or scenarios requiring high network performance.
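The difference comes down to sequential matching versus hash lookup. The following toy model is only an illustration of that cost difference and not how kube-proxy is implemented. Strings such as "svc-1" stand in for Service virtual IPs, and the Service count is hypothetical.

```python
from typing import Dict, List, Optional, Tuple


def sequential_match(rules: List[Tuple[str, str]], vip: str) -> Tuple[Optional[str], int]:
    """Scan rules in order (iptables-style); return the backend and comparisons made."""
    for checked, (rule_vip, backend) in enumerate(rules, start=1):
        if rule_vip == vip:
            return backend, checked
    return None, len(rules)


def hashed_match(table: Dict[str, str], vip: str) -> Optional[str]:
    """Look up the virtual IP in a hash table (IPVS-style): constant time on average."""
    return table.get(vip)


n = 10_000  # hypothetical number of Services
rules = [(f"svc-{i}", f"pod-{i}") for i in range(n)]
table = dict(rules)
print(sequential_match(rules, f"svc-{n - 1}"))  # ('pod-9999', 10000): one check per rule
print(hashed_match(table, f"svc-{n - 1}"))      # 'pod-9999' after a single lookup
```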

Expand Advanced Options (Optional) to configure cluster deletion protection, the Resource Group, and other settings.

Advanced options

Parameter

Description

Cluster Deletion Protection

We recommend enabling this to prevent accidental cluster deletion via the console or OpenAPI.

Resource Group

Assign the cluster to the selected resource group for easier permission management and cost allocation.

A resource can belong to only one resource group.

Label

Bind key-value tags to the cluster as cloud resource identifiers.

Time Zone

The time zone used by the cluster. Defaults to the browser's configured time zone.

Cluster Domain

The top-level domain (standard suffix) used by Services in the cluster. Defaults to cluster.local but supports custom domains. For considerations when using a custom local domain, see What should I consider when configuring a custom cluster local domain (ClusterDomain)?.

For example, a Service named my-service in the default namespace has the DNS domain name my-service.default.svc.cluster.local.
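The naming pattern in the example can be written as a small helper. The following sketch uses the default cluster.local domain; the custom domain in the second call is a hypothetical value.

```python
def service_fqdn(service: str, namespace: str = "default",
                 cluster_domain: str = "cluster.local") -> str:
    """In-cluster DNS name of a Service: <service>.<namespace>.svc.<cluster domain>."""
    return f"{service}.{namespace}.svc.{cluster_domain}"


print(service_fqdn("my-service"))                        # my-service.default.svc.cluster.local
print(service_fqdn("api", "prod", "example.internal"))   # api.prod.svc.example.internal
```

If you change the Cluster Domain, any hard-coded cluster.local names in your applications must change with it.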

Custom Certificate SANs

By default, the SAN (Subject Alternative Name) field in the API Server certificate includes the cluster local domain, private IP, public EIP, and other fields. To access the cluster through a proxy server, custom domain, or special network environment, add those access addresses to the SAN field.

To enable this later, see Customize the cluster API Server certificate SAN.

Service Account Token Volume Projection

In the legacy mode, ServiceAccount tokens never expire and are shared by all pods that use the same ServiceAccount, which poses a security risk. When this feature is enabled, each pod receives its own short-lived token with a configurable validity period and audience.

To enable this later, see Use ServiceAccount Token volume projection.
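With this feature enabled, a pod can request a projected token through a projected volume with a serviceAccountToken source. The following sketch builds such a Pod manifest as a Python dict and prints it as JSON, which kubectl apply -f accepts. The pod name, image, mount path, audience, and expiration are illustrative assumptions.

```python
import json

# Hypothetical Pod that mounts a projected ServiceAccount token.
pod = {
    "apiVersion": "v1",
    "kind": "Pod",
    "metadata": {"name": "token-demo"},
    "spec": {
        "serviceAccountName": "default",
        "containers": [{
            "name": "app",
            "image": "nginx",
            "volumeMounts": [{"name": "sa-token", "mountPath": "/var/run/secrets/tokens",
                              "readOnly": True}],
        }],
        "volumes": [{
            "name": "sa-token",
            "projected": {"sources": [{
                "serviceAccountToken": {
                    "path": "sa-token",          # file name under the mount path
                    "expirationSeconds": 3600,   # token lifetime; Kubernetes minimum is 600
                    "audience": "my-audience",   # hypothetical intended recipient
                },
            }]},
        }],
    },
}

print(json.dumps(pod, indent=2))
```

The kubelet rotates the projected token before it expires, so applications should re-read the file rather than cache the token.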

Node Port Range

The available port range when creating NodePort-type Services.
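A port range is written as start-end, inclusive. The following sketch parses such a value and checks membership. 30000-32767 is the upstream Kubernetes default and is used here only as an example.

```python
def parse_port_range(spec: str) -> range:
    """Parse a NodePort range such as '30000-32767' into an inclusive range."""
    start_s, end_s = spec.split("-")
    start, end = int(start_s), int(end_s)
    if not 1 <= start <= end <= 65535:
        raise ValueError(f"invalid port range: {spec}")
    return range(start, end + 1)


ports = parse_port_range("30000-32767")
print(len(ports), 30080 in ports, 32768 in ports)  # 2768 True False
```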

Cluster CA

When enabled, you can add a CA certificate to the cluster to enhance security for information exchange between server and client.

Step 3: Configure master nodes

Click Next: Master Configuration to configure the master nodes.

Parameter

Description

Master Nodes

Specify the number of master nodes to deploy in the zone.

Billing Method

Supports Pay-As-You-Go and Subscription. When selecting Subscription, set the Validity Period and whether to enable Auto Renewal.

Instance Type

Select the instance family for master nodes. For configuration recommendations, see Select master node specifications.

System Disk

Select a cloud disk type based on your business needs, including ESSD AutoPL, ESSD, ESSD Entry, and previous-generation disks (SSD and ultra disk). Configure capacity, IOPS, and other parameters.

Available system disk types depend on the selected instance family. Disk types not displayed are unsupported.

ESSD custom capabilities

  • Supports custom performance levels. Larger disk capacity allows higher performance levels (PL2 for capacities over 460 GiB, PL3 for over 1260 GiB). For details, see ESSD.

  • Only ESSD system disks support Encrypted. By default, Alibaba Cloud uses the service key (Default Service CMK) for encryption. You can also select a custom key (BYOK) pre-created in KMS.
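The capacity thresholds above can be expressed as a lookup. This sketch follows only the two thresholds stated in the text. For capacities of 460 GiB or less it assumes PL1 as the ceiling; see ESSD for the minimum capacity of each level.

```python
def max_essd_performance_level(capacity_gib: int) -> str:
    """Highest ESSD performance level allowed for a disk of this capacity.

    Thresholds from the text: PL2 above 460 GiB, PL3 above 1260 GiB.
    Assumption: PL1 is the ceiling at 460 GiB or less.
    """
    if capacity_gib > 1260:
        return "PL3"
    if capacity_gib > 460:
        return "PL2"
    return "PL1"


for size in (40, 461, 1261):
    print(size, max_essd_performance_level(size))  # PL1, PL2, PL3
```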

You can select More Disk Categories to specify fallback disk types other than the System Disk type, which improves scale-out success rates. When creating nodes, ACK uses the first available disk type in the specified order.
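The fallback order works like a first-match search over your preference list. This sketch uses ECS disk category identifiers (cloud_auto for ESSD AutoPL, cloud_essd, cloud_efficiency for ultra disk) as labels. The set of categories that an instance type supports is a hypothetical input.

```python
from typing import Iterable, Optional, Set


def pick_disk_category(preferred: Iterable[str], supported: Set[str]) -> Optional[str]:
    """Return the first disk category, in preference order, that the instance type supports."""
    for category in preferred:
        if category in supported:
            return category
    return None


order = ["cloud_auto", "cloud_essd", "cloud_efficiency"]
print(pick_disk_category(order, {"cloud_essd", "cloud_efficiency"}))  # cloud_essd
```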

Cloud resource and billing information: ECS block storage

Deployment Set

After you create a deployment set in the ECS console (see Create deployment set), specify it here so that nodes are distributed across different physical servers, which improves high availability.

By default, a deployment set can hold up to 20 × (number of zones) instances, where the number of zones is determined by the selected vSwitches. This caps the maximum number of nodes in the node pool, so make sure that the deployment set has sufficient quota.
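The following sketch applies the default limit of 20 instances per zone to check whether a desired node count fits. Zones are counted from the distinct zones of the selected vSwitches. The zone IDs in the example are hypothetical.

```python
from typing import Iterable, Tuple


def deployment_set_capacity(vswitch_zones: Iterable[str], desired_nodes: int,
                            per_zone_limit: int = 20) -> Tuple[bool, int]:
    """Return (fits, cap), where cap = per_zone_limit x number of distinct zones."""
    zones = set(vswitch_zones)
    cap = per_zone_limit * len(zones)
    return desired_nodes <= cap, cap


# Three vSwitches, but only two distinct zones: the cap is 40 nodes.
print(deployment_set_capacity(["cn-hangzhou-h", "cn-hangzhou-i", "cn-hangzhou-h"], 45))
# (False, 40)
```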

To enable this later, see Node pool deployment set best practices.

Advanced options

Parameter

Description

Instance Metadata Access Mode

Supported only for clusters running Kubernetes 1.28 or later.

Configure the ECS instance metadata access mode. Inside the ECS instance, access the metadata service to obtain instance metadata, including instance ID, VPC information, NIC information, and other instance properties. For details, see Instance metadata.

  • Normal Mode and Security Hardening Mode: Supports accessing the metadata service using both normal and reinforced modes.

  • Security Hardening Mode: Supports accessing the metadata service using only reinforced mode. For details, see Use reinforced mode only to access ECS instance metadata.
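In reinforced (security hardening) mode, a process first obtains a token with a PUT request and then sends that token with each metadata request. The following sketch is based on the ECS token flow at 100.100.100.200. Verify the header names and TTL limits in Instance metadata before you rely on them. read_metadata only works from inside an ECS instance.

```python
import urllib.request

METADATA_ENDPOINT = "http://100.100.100.200"


def token_request(ttl_seconds: int = 21600) -> urllib.request.Request:
    """Build the PUT request that obtains a metadata access token."""
    return urllib.request.Request(
        f"{METADATA_ENDPOINT}/latest/api/token",
        method="PUT",
        headers={"X-aliyun-ecs-metadata-token-ttl-seconds": str(ttl_seconds)},
    )


def metadata_request(token: str, path: str = "instance-id") -> urllib.request.Request:
    """Build a GET request for one metadata item, authenticated with the token."""
    return urllib.request.Request(
        f"{METADATA_ENDPOINT}/latest/meta-data/{path}",
        headers={"X-aliyun-ecs-metadata-token": token},
    )


def read_metadata(path: str = "instance-id") -> str:
    """Fetch a token, then read one metadata item (run this on an ECS instance)."""
    with urllib.request.urlopen(token_request(), timeout=2) as resp:
        token = resp.read().decode()
    with urllib.request.urlopen(metadata_request(token, path), timeout=2) as resp:
        return resp.read().decode()
```

On a node, read_metadata("instance-id") returns the instance ID. A request without the token header is rejected when only reinforced mode is allowed.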

Step 4: Configure the node pool

Click Next: Node Pool Configuration and configure the basic and advanced options for the node pool.

Basic node pool configuration

Parameter

Description

Node Pool Name

Enter a custom node pool name.

Container Runtime

For selection guidance, see Compare containerd, sandboxed container, and Docker runtimes.

  • containerd (recommended): Community standard, supported for Kubernetes 1.20 and later.

  • Sandboxed container: Provides a strongly isolated environment based on lightweight virtualization technology. For procedures and limitations, see Create and manage sandboxed container node pools.

  • Docker (deprecated): Supported only for Kubernetes 1.22 and earlier. Creation is no longer supported.

Instance and image configuration

Parameter

Description

Billing Method

The default billing method used when scaling out nodes in the node pool.

  • Pay-As-You-Go: Can be enabled and released on demand.

  • Subscription: Requires configuring Duration and Auto Renewal.

  • Preemptible Instance: Currently, only spot instances with a protection period are supported. You must also configure the Instance Price Cap.

    The instance is created only if the real-time price of the specified instance type is below the price cap. After the 1-hour protection period ends, the system checks the real-time price and inventory every 5 minutes. If the market price exceeds the price cap or inventory is insufficient, the spot instance is released. For usage recommendations, see Spot instance node pool best practices.

To maintain node pool consistency, you cannot change a Pay-As-You-Go or Subscription node pool to a Preemptible Instance node pool, or vice versa.
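The release rule for protected spot instances can be simulated over time. The following sketch uses the 1-hour protection period and 5-minute check interval from the text. It assumes that the first check happens when protection ends. The price and inventory curves are hypothetical callables.

```python
from typing import Callable, Optional

PROTECTION_MINUTES = 60      # protection period described in the text
CHECK_INTERVAL_MINUTES = 5   # check interval after protection ends


def release_minute(price_cap: float,
                   market_price_at: Callable[[int], float],
                   in_stock_at: Callable[[int], bool],
                   horizon_minutes: int) -> Optional[int]:
    """Return the first check minute at which the spot instance is released, or None."""
    for minute in range(PROTECTION_MINUTES, horizon_minutes + 1, CHECK_INTERVAL_MINUTES):
        if market_price_at(minute) > price_cap or not in_stock_at(minute):
            return minute
    return None  # survived every check within the horizon


# A price spike at minute 30 is ignored because it falls inside the protection period.
print(release_minute(0.2, lambda m: 0.5 if 30 <= m < 40 else 0.1, lambda m: True, 120))  # None
print(release_minute(0.2, lambda m: 0.5 if m >= 70 else 0.1, lambda m: True, 120))      # 70
```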

Instance configuration

When scaling out, nodes are allocated from the configured ECS instance families. To improve scale-out success rates, select multiple instance types across multiple zones to avoid unavailability or insufficient inventory. The specific instance type used for scaling is determined by the configured Scaling Policy.

To ensure business stability and accurate resource scheduling, do not mix GPU and non-GPU instance types in the same node pool.

Configure instance types for scaling in one of two ways:

  • Specific types: Specify exact instance types based on vCPU, memory, family, architecture, and other dimensions.

  • Generalized configuration: Select instance types to use or exclude based on attributes (vCPU, memory, etc.) to further improve scale-out success rates. For details, see Configure node pools using specified instance attributes.

Refer to the console's elasticity strength recommendations for configuration, or view node pool elasticity strength after creation.

For ACK-unsupported instance types and node configuration recommendations, see ECS instance type configuration recommendations.

Cloud resource and billing information: ECS instance, GPU instance

Operating System

Marketplace Image is in phased release.

To upgrade or change the operating system later, see Change operating system.

Security Hardening

When creating nodes, ACK applies the selected security baseline policy.

  • Disable: No security hardening is applied to ECS instances.

  • MLPS Security Hardening: Alibaba Cloud provides baseline check standards and scanning tools for Alibaba Cloud Linux MLPS 2.0 Level 3 images that comply with classified protection requirements. While ensuring native image compatibility and performance, these images are adapted for MLPS compliance to meet "GB/T22239-2019 Information Security Technology—Cybersecurity Classified Protection Basic Requirements." For details, see ACK MLPS hardening usage guide.

    In this mode, the root user cannot log on remotely via SSH. Connect to the instance via VNC in the ECS console and create a regular user that supports SSH logon.

  • OS Security Hardening: Supported only for Alibaba Cloud Linux 2 or Alibaba Cloud Linux 3.

Logon Type

  • Key Pair: Alibaba Cloud SSH key pairs provide a secure and convenient logon authentication method comprising a public key and a private key. Supported only for Linux instances.

    Configure both the Username (root or ecs-user) and the required Key Pair.

  • Password: Configure the Username (root or ecs-user) and password.

Storage configuration

Parameter

Description

System Disk

Select a cloud disk type based on your business needs, including ESSD AutoPL, ESSD, ESSD Entry, and previous-generation disks (SSD and ultra disk). Configure capacity, IOPS, and other parameters.

Available system disk types depend on the selected instance family. Disk types not displayed are unsupported.

ESSD custom capabilities

  • Supports custom performance levels. Larger disk capacity allows higher performance levels (PL2 for capacities over 460 GiB, PL3 for over 1260 GiB). For details, see ESSD.

  • Only ESSD system disks support Encrypted. By default, Alibaba Cloud uses the service key (Default Service CMK) for encryption. You can also select a custom key (BYOK) pre-created in KMS.

You can select More Disk Categories to specify fallback disk types other than the System Disk type, which improves scale-out success rates. When creating nodes, ACK uses the first available disk type in the specified order.

Cloud resource and billing information: ECS block storage

Data Disk

Select a cloud disk type based on your business needs, including ESSD AutoPL, ESSD, ESSD Entry, and previous-generation disks (SSD and ultra disk). Configure capacity, IOPS, and other parameters.

Available data disk types depend on the selected instance family. Disk types not displayed are unsupported.

ESSD AutoPL support

  • Provisioned performance: Decouples disk capacity from performance, allowing flexible configuration of provisioned performance based on actual business needs without changing storage capacity.

  • Performance burst: Temporarily boosts performance to handle peak read/write demands until business stabilizes.

ESSD support

Supports custom performance levels. Larger disk capacity allows higher performance levels (PL2 for capacities over 460 GiB, PL3 for over 1260 GiB). For details, see ESSD.

  • When mounting data disks, all cloud disk types support Encrypted. By default, Alibaba Cloud uses the service key (Default Service CMK) for encryption. You can also select a custom key (BYOK) pre-created in KMS.

  • During node creation, the last data disk is automatically formatted, and /var/lib/container is mounted to this disk. /var/lib/kubelet and /var/lib/containerd are mounted to /var/lib/container.

    To customize mount directories, adjust the data disk initialization configuration. You can select only one data disk as the container runtime directory. For details, see Can I customize directory mounting for data disks in ACK node pools?
  • For scenarios requiring container image acceleration or rapid large model loading, use snapshots to create data disks, improving system response speed and processing capability.

Select Add Data Disk Type to configure disk types different from the primary Data Disk, improving scale-out success rates. When creating nodes, ACK selects the first matching disk type from the specified order.

An ECS instance can mount up to 64 data disks. The maximum number of disks supported varies by instance type. Query the disk quantity limit for an instance type using the DescribeInstanceTypes API (DiskQuantity).

Cloud resource and billing information: ECS block storage

Elastic Ephemeral Disk

This feature requires allowlist access. To apply, submit a ticket.

Elastic ephemeral disk provides high-performance, cost-effective temporary storage for ECS instances, suitable for temporary data (such as intermediate computation results, cached data, temporary files) and high-performance computing scenarios requiring high IOPS and throughput.

Supported only in specific regions and ECS instance types. For details, see Region limits, Instance type limits.

You can choose whether to configure initialization for the elastic ephemeral disk and customize its mount directory.

Cloud resource and billing information: ECS block storage

Instance quantity

Parameter

Description

Expected Number of Nodes

The total number of nodes the node pool should maintain. We recommend configuring at least two nodes to ensure normal operation of cluster components. Adjust the desired node count to scale the node pool in or out. For details, see Scale node pools.

If you do not need to create nodes, enter 0 and adjust manually later or add existing nodes.

Advanced configuration

Expand Advanced Options (Optional) to configure the node scaling policy.

Parameter

Description

Scaling Policy

Configure how the node pool selects instances during scaling.

  • Priority-based Policy: Scales based on the vSwitch priority configured in the cluster (vSwitch order from top to bottom indicates decreasing priority). If instances cannot be created in the higher-priority zone, the next priority vSwitch is used automatically.

  • Cost Optimization: Scales from lowest to highest vCPU unit price.

    When the node pool uses Preemptible Instance, spot instances are prioritized. You can configure the Percentage of pay-as-you-go instances (%) to automatically supplement with pay-as-you-go instances when spot instances cannot be created due to inventory or other reasons.

  • Distribution Balancing: Distributes ECS instances evenly across zones. This policy takes effect only when multiple zones are configured. If the distribution becomes unbalanced due to insufficient inventory, you can rebalance it.
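The three policies differ mainly in how they rank candidate instances. The following simplified model is not the actual Auto Scaling logic. Candidates are listed in vSwitch priority order, and the zones, instance types, and prices are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Candidate:
    zone: str            # zone of the vSwitch
    instance_type: str   # ECS instance type (hypothetical names below)
    hourly_price: float  # hypothetical price
    vcpus: int
    in_stock: bool


def priority_based(cands: List[Candidate]) -> Optional[Candidate]:
    """First in-stock candidate, in vSwitch priority order."""
    return next((c for c in cands if c.in_stock), None)


def cost_optimized(cands: List[Candidate]) -> Optional[Candidate]:
    """In-stock candidate with the lowest price per vCPU."""
    avail = [c for c in cands if c.in_stock]
    return min(avail, key=lambda c: c.hourly_price / c.vcpus, default=None)


def balanced(cands: List[Candidate], nodes_per_zone: Dict[str, int]) -> Optional[Candidate]:
    """In-stock candidate in the zone that currently has the fewest nodes."""
    avail = [c for c in cands if c.in_stock]
    return min(avail, key=lambda c: nodes_per_zone.get(c.zone, 0), default=None)


cands = [
    Candidate("zone-a", "type-a", 0.4, 2, False),  # highest priority, out of stock
    Candidate("zone-b", "type-b", 0.6, 4, True),   # 0.15 per vCPU
    Candidate("zone-c", "type-c", 1.0, 8, True),   # 0.125 per vCPU
]
print(priority_based(cands).zone)                            # zone-b
print(cost_optimized(cands).zone)                            # zone-c
print(balanced(cands, {"zone-b": 1, "zone-c": 3}).zone)      # zone-b
```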

Use Pay-as-you-go Instances When Spot Instances Are Insufficient

Requires selecting spot instances as the billing method.

When enabled, if sufficient spot instances cannot be created due to price or inventory reasons, ACK automatically attempts to create pay-as-you-go instances as a supplement.

Cloud resource and billing information: ECS instance

Enable Supplemental Spot Instance

Requires selecting spot instances as the billing method.

When enabled, upon receiving a system notification that a spot instance will be reclaimed (5 minutes before reclamation), ACK attempts to scale out new instances for compensation.

  • Compensation successful: ACK drains the old node and removes it from the cluster.

  • Compensation failed: ACK does not drain the old node, and the instance is reclaimed after 5 minutes. When inventory is restored or price conditions are met, ACK automatically purchases instances to maintain the desired node count. For details, see Spot instance node pool best practices.

Spot instance reclamation may disrupt your workloads. To improve the compensation success rate, we recommend also enabling Use Pay-as-you-go Instances When Spot Instances Are Insufficient.

Cloud resource and billing information: ECS instance

Expand Advanced Options (Optional) to configure ECS tags, taints, and other information.

Parameter

Description

ECS Tags

Add tags to ECS instances automatically created by ACK as cloud resource identifiers. Each ECS instance can have up to 20 tags. To increase this limit, apply on the Quota Platform. Because ACK and ESS occupy some tags, you can specify up to 17 custom tags per instance.

Expand to view tag usage details

  • ACK occupies two ECS tags by default.

    • ack.aliyun.com:<Your cluster ID>

    • ack.alibabacloud.com/nodepool-id:<Your node pool ID>

  • ESS occupies one ECS tag by default: acs:autoscaling:scalingGroupId:<Your node pool scaling group ID>.

  • After you enable node autoscaling, Auto Scaling uses two additional ECS tags on each node in the node pool: k8s.io/cluster-autoscaler:true and k8s.aliyun.com:true.

  • After enabling node autoscaling, components use ECS tags to record node labels and taints for pre-checking scheduling behavior of scaled-out nodes.

    • Each node label is converted to k8s.io/cluster-autoscaler/node-template/label/<Label key>:<Label value>.

    • Each node taint is converted to k8s.io/cluster-autoscaler/node-template/taint/<Taint key>/<Taint value>:<Taint effect>.
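The tag usage above leaves a fixed budget for your own ECS tags. The following sketch reproduces the 17-tag figure from the text and shows how autoscaling, labels, and taints reduce the budget further. It assumes that each node label and each taint consumes exactly one tag.

```python
from typing import Tuple

ECS_TAG_LIMIT = 20    # default tags per ECS instance (can be raised via quota)
ACK_TAGS = 2          # cluster ID tag and node pool ID tag
ESS_TAGS = 1          # scaling group ID tag
AUTOSCALER_TAGS = 2   # k8s.io/cluster-autoscaler and k8s.aliyun.com


def remaining_custom_tags(autoscaling: bool, labels: int = 0, taints: int = 0,
                          limit: int = ECS_TAG_LIMIT) -> int:
    """Tags left for your own ECS tags after ACK, ESS, and autoscaler usage."""
    used = ACK_TAGS + ESS_TAGS
    if autoscaling:
        # Assumption: each node label and each taint is stored as exactly one tag.
        used += AUTOSCALER_TAGS + labels + taints
    return limit - used


def label_template_tag(key: str, value: str) -> Tuple[str, str]:
    """ECS tag (key, value) that records a node label for the autoscaler."""
    return f"k8s.io/cluster-autoscaler/node-template/label/{key}", value


print(remaining_custom_tags(False))                     # 17
print(remaining_custom_tags(True, labels=2, taints=1))  # 12
```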

Taints

Add key-value taints to nodes. A valid taint key includes an optional prefix and a name. If a prefix is specified, separate it from the name with a forward slash (/).

Expand to view details

  • Key: The name must be 1–63 characters long, must start and end with an alphanumeric character ([a-z0-9A-Z]), and can contain letters, digits, hyphens (-), underscores (_), and periods (.).

    If a prefix is specified, it must be a DNS subdomain, meaning a series of DNS labels separated by periods (.), up to 253 characters long, ending with a forward slash (/).

  • Value: Can be empty or up to 63 characters long. A non-empty value must start and end with an alphanumeric character ([a-z0-9A-Z]) and can contain letters, digits, hyphens (-), underscores (_), and periods (.).

  • Effect:

    • NoSchedule: Prevents new pods that do not tolerate this taint from being scheduled to the node, but does not affect pods already running.

    • NoExecute: Prevents new pods that do not tolerate this taint from being scheduled to the node and evicts any running pods that do not tolerate this taint.

    • PreferNoSchedule: The scheduler tries to avoid placing pods on nodes with taints that they do not tolerate, but this is not strictly enforced.
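The key, value, and effect rules above can be checked together when taints are written in the kubectl style key=value:Effect (or key:Effect). The following sketch implements those rules with regular expressions. The per-label 63-character limit inside the DNS prefix is not checked, and the sample taints are hypothetical.

```python
import re
from typing import NamedTuple

# Name part of a key, and a non-empty value: 1-63 chars, alphanumeric at both
# ends, with letters, digits, '-', '_' and '.' allowed in between.
NAME_RE = re.compile(r"[A-Za-z0-9]([-A-Za-z0-9_.]{0,61}[A-Za-z0-9])?")
# Optional prefix: a DNS subdomain of lowercase labels separated by dots.
DNS_LABEL = r"[a-z0-9]([-a-z0-9]*[a-z0-9])?"
PREFIX_RE = re.compile(rf"{DNS_LABEL}(\.{DNS_LABEL})*")
EFFECTS = {"NoSchedule", "PreferNoSchedule", "NoExecute"}


class Taint(NamedTuple):
    key: str
    value: str
    effect: str


def valid_key(key: str) -> bool:
    """Check '[prefix/]name' against the key rules."""
    if "/" in key:
        prefix, name = key.split("/", 1)
        if len(prefix) > 253 or not PREFIX_RE.fullmatch(prefix):
            return False
    else:
        name = key
    return NAME_RE.fullmatch(name) is not None


def parse_taint(spec: str) -> Taint:
    """Parse 'key=value:Effect' or 'key:Effect' and validate every part."""
    body, sep, effect = spec.rpartition(":")
    if not sep or effect not in EFFECTS:
        raise ValueError(f"missing or unknown effect in {spec!r}")
    key, _, value = body.partition("=")
    if not valid_key(key):
        raise ValueError(f"invalid taint key {key!r}")
    if value and NAME_RE.fullmatch(value) is None:
        raise ValueError(f"invalid taint value {value!r}")
    return Taint(key, value, effect)


print(parse_taint("dedicated=gpu:NoSchedule"))
print(parse_taint("example.com/maintenance:NoExecute"))
```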

Node Labels

Add key-value labels to nodes. A valid key includes an optional prefix and a name. If a prefix is specified, separate it from the name with a forward slash (/).

Expand to view details

  • Key: The name must be 1–63 characters long, start and end with an alphanumeric character [a-z0-9A-Z], and can contain letters, digits, hyphens (-), underscores (_), and periods (.).

    If a prefix is specified, it must be a DNS subdomain, meaning a series of DNS labels separated by periods (.), up to 253 characters long, ending with a forward slash (/).

    The following prefixes are reserved by Kubernetes core components and cannot be specified:

    • kubernetes.io/

    • k8s.io/

    • Prefixes ending with kubernetes.io/ or k8s.io/. For example, test.kubernetes.io/.

      Exceptions:

      • kubelet.kubernetes.io/

      • node.kubernetes.io/

      • Prefixes ending with kubelet.kubernetes.io/.

      • Prefixes ending with node.kubernetes.io/.

  • Value: Can be empty, up to 63 characters long, must start and end with an alphanumeric character [a-z0-9A-Z], and can contain letters, digits, hyphens (-), underscores (_), and periods (.).
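The reserved-prefix rule and its exceptions can be checked as follows. This sketch checks only the prefix rule, not the name and value syntax. It assumes that "ending with" means matching at a DNS label boundary (for example, test.kubernetes.io), as upstream Kubernetes does for these domains. The sample keys are hypothetical.

```python
RESERVED = ("kubernetes.io", "k8s.io")
EXCEPTIONS = ("kubelet.kubernetes.io", "node.kubernetes.io")


def _in_domain(prefix: str, domain: str) -> bool:
    """True if prefix equals domain or is a subdomain of it."""
    return prefix == domain or prefix.endswith("." + domain)


def label_prefix_allowed(key: str) -> bool:
    """Apply the reserved-prefix rule to a label key of the form [prefix/]name."""
    if "/" not in key:
        return True  # no prefix, nothing reserved
    prefix = key.split("/", 1)[0]
    if any(_in_domain(prefix, d) for d in EXCEPTIONS):
        return True
    return not any(_in_domain(prefix, d) for d in RESERVED)


for k in ("example.com/env", "kubernetes.io/os", "test.kubernetes.io/x", "node.kubernetes.io/role"):
    print(k, label_prefix_allowed(k))  # True, False, False, True
```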

Set to Unschedulable

When enabled, newly added nodes are marked as unschedulable when they register with the cluster. You can change their scheduling status in the node list.

This setting applies only to clusters running Kubernetes versions earlier than 1.34. For details, see Kubernetes 1.34 version notes.

CPU Policy

Specify the kubelet CPU management policy for the nodes.

  • None: The default policy.

  • Static: Grants enhanced CPU affinity and exclusive CPUs to containers in Guaranteed pods that request an integer number of CPUs.

We recommend configuring this by using the custom node pool kubelet configuration feature.
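Under the Static policy, only some containers receive exclusive CPUs. The following sketch is a simplified model of the upstream Kubernetes eligibility rule: the pod must be Guaranteed (every container sets CPU and memory limits equal to requests), and the container must request a whole number of CPUs. It ignores init containers. Resources are plain numbers (CPU in cores, memory in arbitrary units) for illustration.

```python
from typing import Dict, List

Resources = Dict[str, Dict[str, float]]  # {"requests": {...}, "limits": {...}}


def is_guaranteed(containers: List[Resources]) -> bool:
    """Guaranteed QoS: every container sets CPU and memory limits equal to requests."""
    for c in containers:
        limits, requests = c.get("limits", {}), c.get("requests", {})
        for res in ("cpu", "memory"):
            if res not in limits:
                return False
            # Kubernetes defaults an unset request to the limit.
            if requests.get(res, limits[res]) != limits[res]:
                return False
    return True


def gets_exclusive_cpus(containers: List[Resources], index: int) -> bool:
    """Static policy: a container in a Guaranteed pod with an integer CPU request."""
    if not is_guaranteed(containers):
        return False
    return float(containers[index]["limits"]["cpu"]).is_integer()


whole = {"requests": {"cpu": 2, "memory": 1024}, "limits": {"cpu": 2, "memory": 1024}}
fractional = {"requests": {"cpu": 1.5, "memory": 1024}, "limits": {"cpu": 1.5, "memory": 1024}}
print(gets_exclusive_cpus([whole], 0), gets_exclusive_cpus([fractional], 0))  # True False
```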

Custom Node Name

Node names consist of a prefix, node IP address, and suffix. When enabled, node names, ECS instance names, and ECS instance hostnames change accordingly.

Example: Node IP address is 192.XX.YY.55, prefix is aliyun.com, suffix is test.

  • Linux node: Node name, ECS instance name, and ECS instance hostname are all aliyun.com192.XX.YY.55test.

  • Windows node: Hostname is fixed as the IP address, with - replacing . in the IP address, and no prefix or suffix included.

    Thus, the ECS instance hostname is 192-XX-YY-55, while the node name and ECS instance name are aliyun.com192.XX.YY.55test.
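The naming rules in the example can be sketched as follows, with a full, untruncated IP address. The IP address, prefix, and suffix are hypothetical values, and the function name is illustrative.

```python
from typing import Dict


def custom_node_names(ip: str, prefix: str, suffix: str) -> Dict[str, Dict[str, str]]:
    """Compose node name, ECS instance name, and hostname per the rules above."""
    name = f"{prefix}{ip}{suffix}"
    return {
        "linux": {"node_name": name, "instance_name": name, "hostname": name},
        # Windows hostnames ignore the prefix and suffix and replace '.' with '-'.
        "windows": {"node_name": name, "instance_name": name, "hostname": ip.replace(".", "-")},
    }


names = custom_node_names("192.168.1.55", "aliyun.com", "test")
print(names["linux"]["hostname"])    # aliyun.com192.168.1.55test
print(names["windows"]["hostname"])  # 192-168-1-55
```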

Important

If the custom node name includes only part of the IP address, and the truncation length (lenOfIP) is too short for a large VPC CIDR block, node names can conflict. This causes node scale-out failures in instant node elasticity scenarios.

Based on your VPC CIDR block, set the IP truncation length as follows:

  • For large CIDR blocks like 10.0.0.0/8 and 172.16.0.0/12, set lenOfIP to at least 9.

  • For the 192.168.0.0/16 CIDR block, set lenOfIP to at least 6.

Instance Metadata Access Mode

Supported only for clusters running Kubernetes 1.28 or later.

Specifies how ECS instances access the instance metadata service. From inside an ECS instance, you can query the metadata service for instance metadata such as the instance ID, VPC information, NIC information, and other instance properties. For details, see Instance metadata.

  • Normal Mode and Security Hardening Mode: The metadata service can be accessed in both normal mode and security hardening (reinforced) mode.

  • Security Hardening Mode: The metadata service can be accessed only in security hardening (reinforced) mode. For details, see Use reinforced mode only to access ECS instance metadata.
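In security hardening mode, each metadata request must carry a token obtained from the metadata service. The following Bash sketch assumes the token-based flow described in the ECS Instance metadata topic (a PUT request to /latest/api/token on 100.100.100.200, plus the X-aliyun-ecs-metadata-token-ttl-seconds and X-aliyun-ecs-metadata-token headers); confirm these names in that topic before relying on them. The helper name imds_get is hypothetical, and the function works only when run inside an ECS instance.

```shell
#!/usr/bin/env bash
# Sketch: read instance metadata in security hardening (token-based) mode.
# Assumption: endpoint and header names follow the ECS "Instance metadata" topic.
METADATA_HOST="http://100.100.100.200"

# Usage: imds_get PATH   (PATH is relative to /latest/, e.g. meta-data/instance-id)
imds_get() {
  local path="$1" token
  # Request a session token that is valid for 60 seconds.
  token=$(curl -sf -X PUT "$METADATA_HOST/latest/api/token" \
    -H "X-aliyun-ecs-metadata-token-ttl-seconds: 60") || return 1
  # Present the token with the actual metadata request.
  curl -sf -H "X-aliyun-ecs-metadata-token: $token" "$METADATA_HOST/latest/$path"
}
```

For example, running imds_get meta-data/instance-id on a node prints the ID of its ECS instance.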

Pre-defined Custom Data

Specifies an instance pre-user data script that runs before the node joins the cluster.

Example: If the pre-user data is touch /tmp/pre-script, the combined script runs on the node in the following order:

#!/bin/bash
# Input instance pre-user data executes here
touch /tmp/pre-script

# ACK node initialization script executes here

For the execution logic of this configuration during node initialization, see Node initialization process overview.

User Data

Specifies an instance user data script that runs after the node joins the cluster.

Example: If the instance user data is touch /tmp/post-script, the combined script runs on the node in the following order:

#!/bin/bash
# ACK node initialization script executes here

# Input instance user data executes here
touch /tmp/post-script

For the execution logic of this configuration during node initialization, see Node initialization process overview.

Successful cluster creation or node scale-out does not guarantee that the instance user data script ran successfully. To view the execution logs, log on to the node and run grep cloud-init /var/log/messages.
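Because a successful scale-out does not prove that the user data script succeeded, it helps to have the script record its own outcome on the node. The following is a minimal sketch of instance user data that wraps the touch /tmp/post-script example; the log path /tmp/post-script.log is an illustrative assumption. The script stops at the first failing command and, through an ERR trap, appends a timestamped failure line to the log.

```shell
#!/bin/bash
# Sketch of instance user data (runs after the ACK node initialization script).
# Assumption: /tmp/post-script.log is an illustrative log location.
set -euo pipefail

LOG_FILE=/tmp/post-script.log

# Append a UTC-timestamped line to the log file.
log() {
  printf '%s %s\n' "$(date -u +%Y-%m-%dT%H:%M:%SZ)" "$*" >> "$LOG_FILE"
}

# On any failing command, record the exit status before set -e stops the script.
trap 'log "user data failed (exit status $?)"' ERR

log "user data started"
touch /tmp/post-script        # the example command from this topic
log "user data finished"
```

After the node joins the cluster, check the output of grep cloud-init /var/log/messages together with the contents of /tmp/post-script.log.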

CloudMonitor Agent

Installs the CloudMonitor agent on nodes so that you can view and monitor node and application status in the CloudMonitor console.

This setting applies only to new nodes added to the node pool, not existing nodes.

To enable this for existing nodes, install it in the CloudMonitor console.

Cloud resource and billing information: CloudMonitor

Public IP

ACK assigns an IPv4 public IP address to nodes.

This setting applies only to new nodes added to the node pool, not existing nodes. To grant public network access to existing nodes, configure and bind an EIP. For details, see Bind EIP to cloud resources.

Cloud resource and billing information: ECS public network

Custom Security Group

Specify a basic or enterprise security group for the node pool. ACK does not add extra access rules to the security group. You must manage security group rules yourself to avoid access issues. For details, see Configure cluster security groups.

Each ECS instance can belong to only a limited number of security groups. Make sure that your security group quota is sufficient.

RDS Whitelist

Add node IPs to the RDS instance whitelist.

Step 5: Configure components

Click Next: Component Configurations to configure the components.

Parameter

Description

Ingress

Ingress manages how external traffic accesses services inside the cluster. Install it to expose cluster applications or APIs to the public network.

Three types of Ingress gateways are available for the cluster:

ALB Ingress

Routes traffic through Alibaba Cloud Application Load Balancer (ALB), offering rich routing policies, deep integration with cloud products like WAF, and elastic scaling. Suitable for large-scale, high-traffic production workloads or scenarios requiring enterprise-grade reliability.

You can create a new ALB instance or use an existing ALB instance in the current VPC that is not associated with another cluster (only when using an existing VPC).

To enable this later, see Create and use ALB Ingress to expose services externally.

Cloud resource and billing information: ALB billing overview

Nginx Ingress

An optimized version of the community Nginx Ingress Controller that remains compatible with it.

You can create a new CLB instance or use an existing CLB instance in the current VPC that is not associated with another cluster.

To enable this later, see Create and use Nginx Ingress to expose services externally.

Cloud resource and billing information: CLB

MSE Ingress

Based on the MSE cloud-native gateway, providing advanced capabilities such as service governance, authentication, and phased releases. Suitable for scenarios that require fine-grained traffic control for microservices.

You can create a new MSE cloud-native gateway instance or use an existing instance in the current VPC that is not associated with another cluster (only when using an existing VPC).

To enable this later, see Access Container Service through MSE Ingress.

Cloud resource and billing information: Standard instance billing overview

For a detailed comparison, see Ingress management.

Service Discovery

Installs NodeLocal DNSCache to cache DNS resolution results on nodes, improving DNS resolution performance and stability and accelerating internal service calls within the cluster.

Volume Plug-in

Provides persistent storage through the CSI storage plug-ins, which support disks, NAS, OSS, CPFS, and other types of storage volumes.

If you choose to create a default NAS file system and CNFS, ACK automatically creates a General-purpose NAS file system and manages it by using Container Network File System (CNFS).

To create CNFS later, see Manage NAS file systems through CNFS.

Cloud resource and billing information: NAS

Container Monitoring

You can use Alibaba Cloud Prometheus to view preconfigured monitoring dashboards and performance metrics for the cluster. For details, see Alibaba Cloud Prometheus monitoring.

Log Service

Use an existing SLS Project or create a new one to collect cluster application logs.

This option also enables the API server auditing feature of the cluster, which records requests to the Kubernetes API and their results.

To enable these features later, see Collect ACK cluster container logs and Use cluster API Server audit feature.

Cloud resource and billing information: SLS

Cluster Inspections

Enables the cluster inspection feature of artificial intelligence for IT operations (AIOps) to regularly scan quotas, resource usage, component versions, and other aspects within the cluster, ensuring configurations follow best practices and exposing potential risks early.

Step 6: Confirm configuration and billing

Click Next: Confirm.

On the Confirm page, review the cluster configuration, resource billing, and the cloud service dependency check. Then, read the terms of service.

A fee overview for the cluster appears at the bottom of the creation page. For more details on the billing for ACK and other cloud services, see Billing overview and Cloud service resource fees.

Note

Creating a cluster that contains multiple nodes takes about 10 minutes.

You can also click Equivalent Code in the upper-right corner of the Confirm Configuration page to generate Terraform or SDK example parameters for the current cluster configuration.

API

API Explorer

CreateCluster

Sample request

This is a sample request to create an ACK dedicated cluster. For a complete list of parameters, see CreateCluster.

POST /clusters 
<Common request headers>
{
    "cluster_type": "Kubernetes",    // The type of the cluster. To create an ACK dedicated cluster, you must set this parameter to `Kubernetes`. #required
    "name": "ACK-dedicated-cluster-example",
    "region_id": "cn-hongkong",      // The ID of the region for the cluster. This example uses `cn-hongkong`, which indicates the China (Hong Kong) region. #required
    "kubernetes_version": "1.32.1-aliyun.1",    // The Kubernetes version for the cluster. The latest version is recommended. 
    "snat_entry": true,                         // Whether to create an SNAT entry for the VPC to enable public network access.
    "endpoint_public_access": false,            // Whether to enable public network access to the API server.
    "cloud_monitor_flags": false,               // Whether to install the CloudMonitor agent on cluster nodes.
    "deletion_protection": false,               // Whether to enable cluster deletion protection.
    "proxy_mode": "ipvs",                       // The kube-proxy proxy mode. The `ipvs` mode provides high performance.
    "timezone": "Asia/Shanghai",
    "tags": [],
    "addons": [                                 // The components to install in the cluster.
        {
            "name": "terway-eniip",             // The network plugin for the cluster. This parameter cannot be modified after the cluster is created.
            "config": "{\"IPVlan\":\"false\",\"NetworkPolicy\":\"false\",\"ENITrunking\":\"false\"}"
        },
        {
            "name": "csi-plugin"
        },
        {
            "name": "csi-provisioner"
        },
        {
            "name": "storage-operator",
            "config": "{\"CnfsOssEnable\":\"false\",\"CnfsNasEnable\":\"false\"}"
        },
        {
            "name": "nginx-ingress-controller",
            "disabled": true
        }
    ],
    "node_port_range": "30000-32767",
    "pod_vswitch_ids": [                         // The vSwitches for pods. Required for Terway-based clusters, as each pod receives a dedicated IP address from a vSwitch.
        "vsw-j6cwz95vspl56gl******",
        "vsw-j6c1tgut51ude2v******"
    ],
    "login_password": "******",
    "charge_type": "PostPaid",
    "master_instance_charge_type": "PostPaid",
    "cpu_policy": "none",
    "service_account_issuer": "https://kubernetes.default.svc",
    "api_audiences": "https://kubernetes.default.svc",
    "master_count": 3,                         // The number of master nodes. Three master nodes are recommended for a high-availability cluster.
    "master_vswitch_ids": [                    // The vSwitches for the master nodes.
        "vsw-j6cwz95vspl56gl******",
        "vsw-j6c1tgut51ude2v******",
        "vsw-j6c1tgut51ude2v******"
    ],
    "master_instance_types": [                 // The instance types for the master nodes.
        "ecs.u1-c1m2.xlarge",
        "ecs.c7.xlarge",
        "ecs.c7.xlarge"
    ],
    "master_system_disk_category": "cloud_essd",      // The system disk type for the master nodes. This example uses ESSD.
    "master_system_disk_size": 120,                   // The size of the system disk, in GiB.
    "master_system_disk_performance_level": "PL1",    // The performance level of the ESSD system disk. `PL1` provides up to 50,000 IOPS per disk.
    "vpcid": "vpc-j6c6njo385se80n******",             // The ID of the VPC for the cluster. This must be determined during network planning and cannot be modified after the cluster is created. #required
    "worker_vswitch_ids": [
        "vsw-j6cwz95vspl56gl******",
        "vsw-j6c1tgut51ude2v******"
    ],
    "is_enterprise_security_group": true,
    "ip_stack": "ipv4",
    "service_cidr": "172.16.xx.xx/16",
    "nodepools": [                                                 
        {
            "nodepool_info": {
                "name": "default-nodepool"
            },
            "scaling_group": {
                "system_disk_category": "cloud_essd",
                "system_disk_size": 120,
                "system_disk_performance_level": "PL0",
                "system_disk_encrypted": false,
                "data_disks": [         
                    {
                        "category": "cloud_auto",
                        "size": 200,
                        "encrypted": "false",
                        "bursting_enabled": false
                    }
                ],
                "tags": [],
                "soc_enabled": false,
                "security_hardening_os": false,
                "vswitch_ids": [
                    "vsw-j6cwz95vspl56gl******",
                    "vsw-j6c1tgut51ude2v******"
                ],
                "instance_types": [
                    "ecs.g6.xlarge"
                ],
                "instance_patterns": [],
                "login_password": "******",
                "instance_charge_type": "PostPaid",
                "security_group_ids": [],
                "platform": "AliyunLinux",
                "image_id": "aliyun_3_x64_20G_alibase_20241218.vhd",
                "image_type": "AliyunLinux3",
                "desired_size": 3,               // The desired number of nodes in the node pool.
                "multi_az_policy": "BALANCE"
            },
            "kubernetes_config": {
                "cpu_policy": "none",
                "cms_enabled": false,
                "unschedulable": false,
                "runtime": "containerd",        // The container runtime. This parameter cannot be modified after the cluster is created.
                "runtime_version": "1.6.36"
            }
        }
    ]
}
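The // annotations and #required markers in the sample are explanatory only. JSON does not allow comments, so remove them and replace the masked values (******, xx) before you send the request. The following Bash sketch assumes that you saved the JSON body (from { to }) to a file such as create-cluster.json (a hypothetical name) and that the Alibaba Cloud CLI is installed and configured; the function names are also hypothetical. The sed expression relies on each annotation being preceded and followed by whitespace, as in this sample, so values such as https://kubernetes.default.svc are left intact.

```shell
#!/usr/bin/env bash
# Sketch: turn the annotated sample into valid JSON and submit it.

# Removes trailing " // comment" annotations from files or stdin.
# Assumption: each annotation is whitespace + "//" + whitespace, so
# "https://..." values (no whitespace before "//") are not touched.
strip_annotations() {
  sed -E 's#[[:space:]]+//[[:space:]].*$##' "$@"
}

# Sends the cleaned body to the CreateCluster API (POST /clusters)
# through the Alibaba Cloud CLI's ROA-style call.
create_cluster() {
  local file="$1"
  aliyun cs POST /clusters \
    --header "Content-Type=application/json" \
    --body "$(strip_annotations "$file")"
}
```

For example, run create_cluster create-cluster.json after you fill in real vSwitch, VPC, and password values.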

Key parameters

When using the CreateCluster API to create an ACK dedicated cluster, note the following parameters:

Parameter

Description

Sample configuration

cluster_type

The type of the cluster. To create an ACK dedicated cluster, you must set this parameter to Kubernetes.

"cluster_type": "Kubernetes"

Terraform

SDK

For examples, see Use the Java SDK.

Related operations

  • View basic cluster information

    On the Clusters page, find the target cluster and click Details in the Actions column. Then, click the Basic Information and Connection Information tabs to view this information.

    • API server public endpoint: The public-facing address and port of the Kubernetes API server. Use this endpoint to manage the cluster from your local terminal with tools like kubectl.

      Bind EIP and Unbind EIP:

      • Bind EIP: Bind an existing EIP or create and bind a new one.

        Binding an EIP causes the API server to briefly restart. Do not perform operations on the cluster during this time.

      • Unbind EIP: After you unbind the EIP, the API server is no longer publicly accessible.

        Unbinding an EIP causes the API server to briefly restart. Do not perform operations on the cluster during this time.

    • API server internal endpoint: The internal address and port of the Kubernetes API server. This endpoint is accessible only from within the cluster's VPC. The IP address is for an internal-facing Server Load Balancer (SLB) instance.

  • View cluster logs

    In the Actions column, select More > Operations > View Logs to view the cluster's logs on the Log Center page.

  • View node information

    To view node information, obtain the kubeconfig file of the cluster, use kubectl to connect to the cluster, and then run kubectl get nodes.
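For example, after you obtain the kubeconfig file, you can list the nodes with a small wrapper like the one below. The file path $HOME/.kube/ack-dedicated.yaml and the function name list_nodes are hypothetical; --kubeconfig and get nodes -o wide are standard kubectl options.

```shell
#!/usr/bin/env bash
# Sketch: list cluster nodes with a downloaded kubeconfig file.
# Assumption: the kubeconfig was saved to a hypothetical default path,
# which can be overridden through KUBECONFIG_FILE.
KUBECONFIG_FILE="${KUBECONFIG_FILE:-$HOME/.kube/ack-dedicated.yaml}"

# Prints each node with its status, roles, version, and IP addresses.
list_nodes() {
  kubectl --kubeconfig "$KUBECONFIG_FILE" get nodes -o wide
}
```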

Quotas and limits

If you have a large cluster or your account contains many resources, you must be aware of the quotas and limits for using ACK clusters. For more information, see Quotas and limits.

  • Limits: These include ACK configuration limits (such as account balance) and single-cluster capacity limits (the maximum capacity of different Kubernetes resources within a single cluster).

  • Quota limits and quota increase requests: This includes quota limits for ACK clusters and the cloud products on which ACK depends, such as ECS and VPC. To request a quota increase, follow the instructions in the relevant documentation.