Create an auto-scaling node pool with Terraform

Configure scaling_config in alicloud_cs_kubernetes_node_pool to auto-scale ACK nodes on demand.

Run these examples in .
Warning

Resources created here incur charges. Delete them with terraform destroy when done.

Prerequisites

Before you begin, make sure that the following requirements are met:

  • Auto Scaling is activated with the default role assigned.

    If you previously used the alicloud_cs_kubernetes_autoscaler component, Auto Scaling is already activated.
  • The AliyunOOSLifecycleHook4CSRole role is created for CloudOps Orchestration Service (OOS) access:

    1. Click AliyunOOSLifecycleHook4CSRole to open the RAM Quick Authorization page.

      Note
      • For Alibaba Cloud accounts: click the link directly.

      • For RAM users: your Alibaba Cloud account must have the AliyunOOSLifecycleHook4CSRole role assigned. Attach the AliyunRAMReadOnlyAccess policy to the RAM user. See Grant permissions to a RAM user.

    2. On the RAM Quick Authorization page, click Authorize.

  • A Terraform runtime environment configured with one of these options:

    Option         Best for
                   Quick testing with no local installation
    Cloud Shell    Terraform with pre-configured credentials
    Local machine  Unstable networks or custom development environments

Background

Alibaba Cloud Provider 1.111.0 introduced the alicloud_cs_kubernetes_node_pool resource to manage auto-scaling node pools. It replaces the older alicloud_cs_kubernetes_autoscaler component, which had three limitations:

  • Complex configuration with high operational overhead

  • Scaled nodes were added to the default node pool and could not be managed separately

  • Certain parameters could not be modified after creation

The alicloud_cs_kubernetes_node_pool resource addresses all three:

  • Auto scaling requires only two parameters: min_size and max_size

  • Each node pool is independently managed and visible in the ACK console

  • Optional parameters use safe defaults, preventing environment drift across nodes
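
Because the resource first appeared in provider 1.111.0, a configuration can declare a minimum provider version so that older installations fail early at terraform init. The following is a minimal sketch, assuming the provider is installed from the Terraform Registry under the aliyun/alicloud source address. The lower bound reflects only the version named above; arguments used later in this guide may require a newer release.

```hcl
terraform {
  required_providers {
    alicloud = {
      source = "aliyun/alicloud"
      # Assumption: lower bound taken from the release that introduced
      # alicloud_cs_kubernetes_node_pool. Raise it if your arguments need it.
      version = ">= 1.111.0"
    }
  }
}
```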

Terraform resources

Resource                          Function
alicloud_instance_types           Queries Elastic Compute Service (ECS) instance types that meet specified conditions
alicloud_vpc                      Creates a virtual private cloud (VPC)
alicloud_vswitch                  Creates vSwitches in a VPC
alicloud_cs_managed_kubernetes    Creates an ACK managed cluster
alicloud_cs_kubernetes_node_pool  Creates a node pool for an ACK managed cluster

Generate Terraform parameters from the ACK console

If these examples don't cover your configuration, generate the exact Terraform parameters from the ACK console.

  1. Log on to the ACK console. In the left navigation pane, click Clusters.

  2. Click the name of the target cluster. In the left navigation pane, choose Nodes > Node Pools.

  3. Click Create Node Pool, configure the parameters, and click Confirm. In the dialog box, click Console-to-Code.

  4. In the Console-to-Code panel, click the Terraform tab. The code block shows your node pool parameters. Copy the code to use it in your configuration.

Create a node pool with auto scaling enabled

Choose the procedure that matches your situation.

If you previously used alicloud_cs_kubernetes_autoscaler

Migrate to alicloud_cs_kubernetes_node_pool:

Step 1: Modify the autoscaler-meta ConfigMap

  1. Log on to the ACK console. In the left navigation pane, click Clusters.

  2. Click the name of the target cluster. In the left navigation pane, choose Configurations > ConfigMaps.

  3. On the ConfigMap page, select kube-system from the Namespace drop-down list. Find autoscaler-meta and click Edit in the Actions column.

  4. In the Edit panel, change the taints value from "taints":"" to "taints":[] in the Value text box.

  5. Click OK.

Step 2: Sync the node pool

  1. In the left navigation pane, choose Nodes > Node Pools.

  2. On the Node Pools page, click Sync Node Pool.

If you have not previously used alicloud_cs_kubernetes_autoscaler

All examples use scaling_config to define node count boundaries.

Add an auto-scaling node pool to an existing cluster

If you already have an ACK cluster, add a node pool with this minimal configuration:

provider "alicloud" {
}

# Add an auto-scaling node pool to an existing ACK cluster.
resource "alicloud_cs_kubernetes_node_pool" "at1" {
  # The ID of the existing ACK cluster.
  cluster_id     = ""
  name           = "np-test"
  # Specify at least one vSwitch for the node pool.
  vswitch_ids    = ["vsw-bp1mdigyhmilu2h4v****"]
  instance_types = ["ecs.e3.medium"]
  password       = "Hello1234"

  scaling_config {
    min_size = 1  # Minimum node count
    max_size = 5  # Maximum node count
  }
}
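
The example above stores the node password as a literal in the configuration file. One possible alternative, sketched here, is to pass it through a sensitive input variable so that Terraform redacts it in plan and apply output. The variable name node_password is hypothetical and not part of the original example, and the sensitive argument assumes Terraform 0.14 or later.

```hcl
# Hypothetical variable name; not part of the original example.
variable "node_password" {
  type        = string
  description = "Login password for nodes in the auto-scaling node pool."
  sensitive   = true # Redacts the value in plan and apply output.
}

# In the node pool resource, the literal would then become:
#   password = var.node_password
```

The value can be supplied at run time, for example through the TF_VAR_node_password environment variable, instead of being committed with the configuration. Note that the value is still written to the Terraform state file.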

Create a cluster and an auto-scaling node pool together

Provision a VPC, an ACK Pro cluster, and an auto-scaling node pool in a single terraform apply:

provider "alicloud" {
  region = var.region_id
}

variable "region_id" {
  type    = string
  default = "cn-shenzhen"
}

variable "cluster_spec" {
  type        = string
  description = "The specification of the Kubernetes cluster. Can be empty. Valid values: ack.standard (Standard managed clusters), ack.pro.small (Professional managed clusters)."
  default     = "ack.pro.small"
}

# Specify the zones of vSwitches.
variable "availability_zone" {
  description = "The availability zones of vswitches."
  default     = ["cn-shenzhen-c", "cn-shenzhen-e", "cn-shenzhen-f"]
}

# The CIDR blocks used to create vSwitches.
variable "node_vswitch_cidrs" {
  type        = list(string)
  default     = ["172.16.0.0/23", "172.16.2.0/23", "172.16.4.0/23"]
}

# This variable specifies the CIDR blocks in which Terway vSwitches are created.
variable "terway_vswitch_cidrs" {
  type        = list(string)
  default     = ["172.16.208.0/20", "172.16.224.0/20", "172.16.240.0/20"]
}

# Specify the ECS instance types of worker nodes.
variable "worker_instance_types" {
  description = "The ecs instance types used to launch worker nodes."
  default     = ["ecs.g6.2xlarge", "ecs.g6.xlarge"]
}

# Specify a password for the worker node.
variable "password" {
  description = "The password of ECS instance."
  default     = "Test123456"
}

# Specify the prefix of the name of the ACK managed cluster.
variable "k8s_name_prefix" {
  description = "The name prefix used to create managed kubernetes cluster."
  default     = "tf-ack-shenzhen"
}

# Specify the components to install in the ACK managed cluster.
# Components: Terway (network), csi-plugin (volume), csi-provisioner (volume),
# logtail-ds (logging), nginx-ingress-controller (Ingress), arms-prometheus (monitoring),
# ack-node-problem-detector (node diagnostics).
variable "cluster_addons" {
  type = list(object({
    name   = string
    config = string
  }))

  default = [
    {
      "name"   = "terway-eniip",
      "config" = "",
    },
    {
      "name"   = "logtail-ds",
      "config" = "{\"IngressDashboardEnabled\":\"true\"}",
    },
    {
      "name"   = "nginx-ingress-controller",
      "config" = "{\"IngressSlbNetworkType\":\"internet\"}",
    },
    {
      "name"   = "arms-prometheus",
      "config" = "",
    },
    {
      "name"   = "ack-node-problem-detector",
      "config" = "{\"sls_project_name\":\"\"}",
    },
    {
      "name"   = "csi-plugin",
      "config" = "",
    },
    {
      "name"   = "csi-provisioner",
      "config" = "",
    }
  ]
}

# The default resource names.
locals {
  k8s_name_terway         = "k8s_name_terway_${random_integer.default.result}"
  vpc_name                = "vpc_name_${random_integer.default.result}"
  autoscale_nodepool_name = "autoscale-node-pool-${random_integer.default.result}"
}

# Query ECS instance types with 8 vCPUs and 32 GiB memory to use as worker nodes.
data "alicloud_instance_types" "default" {
  cpu_core_count       = 8
  memory_size          = 32
  availability_zone    = var.availability_zone[0]
  kubernetes_node_role = "Worker"
}

resource "random_integer" "default" {
  min = 10000
  max = 99999
}

# The VPC.
resource "alicloud_vpc" "default" {
  vpc_name   = local.vpc_name
  cidr_block = "172.16.0.0/12"
}

# The node vSwitch.
resource "alicloud_vswitch" "vswitches" {
  count      = length(var.node_vswitch_cidrs)
  vpc_id     = alicloud_vpc.default.id
  cidr_block = element(var.node_vswitch_cidrs, count.index)
  zone_id    = element(var.availability_zone, count.index)
}

# The Pod vSwitch (used by Terway for Pod IP allocation).
resource "alicloud_vswitch" "terway_vswitches" {
  count      = length(var.terway_vswitch_cidrs)
  vpc_id     = alicloud_vpc.default.id
  cidr_block = element(var.terway_vswitch_cidrs, count.index)
  zone_id    = element(var.availability_zone, count.index)
}

# The ACK managed cluster.
resource "alicloud_cs_managed_kubernetes" "default" {
  name         = local.k8s_name_terway
  cluster_spec = var.cluster_spec  # ack.pro.small creates an ACK Pro cluster.
  # vSwitches must reside in the zones specified by availability_zone.
  worker_vswitch_ids           = alicloud_vswitch.vswitches[*].id
  pod_vswitch_ids              = alicloud_vswitch.terway_vswitches[*].id
  new_nat_gateway              = true   # Create a NAT gateway for the cluster.
  service_cidr                 = "10.11.0.0/16"  # Must not overlap with the VPC CIDR block.
  slb_internet_enabled         = true   # Create an Internet-facing SLB instance for the API server.
  enable_rrsa                  = true
  control_plane_log_components = ["apiserver", "kcm", "scheduler", "ccm"]
  dynamic "addons" {
    for_each = var.cluster_addons
    content {
      name   = addons.value.name
      config = addons.value.config
    }
  }
}

# Create a node pool with auto scaling enabled. Scales between 1 and 10 nodes.
resource "alicloud_cs_kubernetes_node_pool" "autoscale_node_pool" {
  cluster_id     = alicloud_cs_managed_kubernetes.default.id
  node_pool_name = local.autoscale_nodepool_name
  vswitch_ids    = alicloud_vswitch.vswitches[*].id

  scaling_config {
    min_size = 1
    max_size = 10
  }

  instance_types        = var.worker_instance_types
  password              = var.password  # SSH login password for worker nodes.
  install_cloud_monitor = true          # Install the CloudMonitor agent on nodes.
  system_disk_category  = "cloud_efficiency"
  system_disk_size      = 100
  image_type            = "AliyunLinux3"

  data_disks {
    category = "cloud_essd"  # Disk category.
    size     = 120            # Disk size in GiB.
  }
}
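
The configuration above declares an alicloud_instance_types query but passes var.worker_instance_types to the node pool. If you would rather use the query result, one option is sketched below. The local name queried_worker_types and the cap of three instance types are assumptions made for illustration, not part of the original example.

```hcl
locals {
  # Hypothetical local: up to three instance type IDs returned by the
  # alicloud_instance_types query declared in the configuration above.
  queried_worker_types = slice(
    data.alicloud_instance_types.default.ids,
    0,
    min(3, length(data.alicloud_instance_types.default.ids))
  )
}

# In the node pool resource, the variable reference could then be replaced:
#   instance_types = local.queried_worker_types
```

If the query matches no instance types in the selected zone, the list is empty and the node pool cannot be created, so keeping the variable as a fallback may be the safer choice.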

Step 1: Initialize Terraform

terraform init

A successful initialization returns:

Terraform has been successfully initialized!

You may now begin working with Terraform. Try running "terraform plan" to see
any changes that are required for your infrastructure. All Terraform commands
should now work.

If you ever set or change modules or backend configuration for Terraform,
rerun this command to reinitialize your working directory. If you forget, other
commands will detect it and remind you to do so if necessary.

Step 2: Apply the configuration

terraform apply

Step 3: Verify the result

After apply completes, go to the Node Pools page in the ACK console. The new node pool shows Auto Scaling Enabled below its name.
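
To check the result from the command line as well, output values could be added to the configuration. This sketch assumes the resource addresses from the combined cluster and node pool example; the output names are hypothetical. For the existing-cluster example, the node pool address would be alicloud_cs_kubernetes_node_pool.at1 instead.

```hcl
# Hypothetical output names; the resource addresses match the combined example.
output "cluster_id" {
  description = "ID of the ACK managed cluster."
  value       = alicloud_cs_managed_kubernetes.default.id
}

output "autoscale_node_pool_id" {
  description = "Terraform resource ID of the auto-scaling node pool."
  value       = alicloud_cs_kubernetes_node_pool.autoscale_node_pool.id
}
```

After the next terraform apply, running terraform output prints both values.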

Clean up resources

Delete all resources created by this configuration:

terraform destroy

For details about terraform destroy, see Common commands.
