Terraform is an open source tool from HashiCorp for previewing, provisioning, and managing cloud infrastructure and resources safely and efficiently. It helps developers automate the creation, updating, and version management of Alibaba Cloud infrastructure resources. This topic describes how to use Terraform to create an ACK managed cluster.

Prerequisites

  • Install Terraform.
    Note Make sure that your Terraform version is not earlier than v0.12.28. You can run the terraform --version command to check the installed version.
    • Cloud Shell is preinstalled with Terraform and preconfigured with your Alibaba Cloud account information, so no additional setup is required.
    • If you do not use Cloud Shell, see Install and configure Terraform locally for how to install Terraform.
  • Configure your Alibaba Cloud account information by using either of the following methods:
    • Create environment variables to store the credentials.
      export ALICLOUD_ACCESS_KEY="************"
      export ALICLOUD_SECRET_KEY="************"
      export ALICLOUD_REGION="cn-beijing"
    • Specify the credentials in the provider block of the configuration file.
      provider "alicloud" {
        access_key = "************"
        secret_key = "************"
        region     = "cn-beijing"
      }
    Note To make permission management more flexible and secure, we recommend that you create a RAM user named Terraform, then create an AccessKey pair for the RAM user and grant it the required permissions. For more information, see Create a RAM user and Grant permissions to a RAM user.
  • The variable.tf file used in this topic is as follows.
     variable "availability_zone" {
      description = "The availability zones of vswitches."
      default     = ["cn-shenzhen-d", "cn-shenzhen-e", "cn-shenzhen-f"]
    }
    
    variable "node_vswitch_ids" {
      description = "List of existing node vswitch ids for terway."
      type        = list(string)
      default     = []
    }
    
    variable "node_vswitch_cirds" {
      description = "List of cidr blocks used to create several new vswitches when 'node_vswitch_ids' is not specified."
      type        = list(string)
      default     = ["172.16.0.0/23", "172.16.2.0/23", "172.16.4.0/23"]
    }
    
    variable "terway_vswitch_ids" {
      description = "List of existing pod vswitch ids for terway."
      type        = list(string)
      default     = []
    }
    
    variable "terway_vswitch_cirds" {
      description = "List of cidr blocks used to create several new vswitches when 'terway_vswitch_ids' is not specified."
      type        = list(string)
      default     = ["172.16.208.0/20", "172.16.224.0/20", "172.16.240.0/20"]
    }
    
    # Node Pool worker_instance_types
    variable "worker_instance_types" {
      description = "The ecs instance types used to launch worker nodes."
      default     = ["ecs.g6.2xlarge", "ecs.g6.xlarge"]
    }
    
    # Password for Worker nodes
    variable "password" {
      description = "The password of ECS instance."
      default     = "yjh@ACK123"
    }
    
    # Cluster Addons
    variable "cluster_addons" {
      type = list(object({
        name      = string
        config    = string
      }))
    
      default = [
        {
          "name"     = "terway-eniip",
          "config"   = "",
        },
        {
          "name"     = "logtail-ds",
          "config"   = "{\"IngressDashboardEnabled\":\"true\"}",
        },
        {
          "name"     = "nginx-ingress-controller",
          "config"   = "{\"IngressSlbNetworkType\":\"internet\"}",
        },
        {
          "name"     = "arms-prometheus",
          "config"   = "",
        },
        {
          "name"     = "ack-node-problem-detector",
          "config"   = "{\"sls_project_name\":\"\"}",
        },
        {
          "name"     = "csi-plugin",
          "config"   = "",
        },
        {
          "name"     = "csi-provisioner",
          "config"   = "",
        }
      ]
    }
    
    
    
    # Cluster Addons for Flannel
    variable "cluster_addons_flannel" {
      type = list(object({
        name      = string
        config    = string
      }))
    
      default = [
        {
          "name"     = "flannel",
          "config"   = "",
        },
        {
          "name"     = "logtail-ds",
          "config"   = "{\"IngressDashboardEnabled\":\"true\"}",
        },
        {
          "name"     = "nginx-ingress-controller",
          "config"   = "{\"IngressSlbNetworkType\":\"internet\"}",
        },
        {
          "name"     = "arms-prometheus",
          "config"   = "",
        },
        {
          "name"     = "ack-node-problem-detector",
          "config"   = "{\"sls_project_name\":\"\"}",
        },
        {
          "name"     = "csi-plugin",
          "config"   = "",
        },
        {
          "name"     = "csi-provisioner",
          "config"   = "",
        }
      ]
    }
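If you want to change any of these defaults without editing variable.tf, Terraform also reads a terraform.tfvars file from the working directory. A minimal sketch; the zone, instance type, and password values below are placeholders, not required values:

```hcl
# terraform.tfvars -- illustrative overrides; adjust to your region and quota.
availability_zone     = ["cn-shenzhen-d", "cn-shenzhen-e"]
worker_instance_types = ["ecs.g6.xlarge"]
password              = "YourStrong@Passw0rd"
```

Values set in terraform.tfvars take precedence over the defaults declared in variable.tf.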
             
        

Create an ACK managed cluster by using Terraform (Flannel)

  1. Create a working directory and, in that directory, create the following configuration file named main.tf.
    The main.tf file describes the following Terraform configuration:
    • Create a new VPC and vSwitches in the VPC.
    • Create an ACK managed cluster.
    • Create a node pool that contains two nodes.
    # Provider: alicloud. Credentials can be set here or via environment variables.
    provider "alicloud" {
      #access_key = "************"
      #secret_key = "************"
      #region     = "cn-shenzhen"
    }
    variable "k8s_name_prefix" {
      description = "The name prefix used to create managed kubernetes cluster."
      default     = "tf-ack-shenzhen"
    }
    resource "random_uuid" "this" {}
    # Default resource names.
    locals {
      k8s_name_terway         = substr(join("-", [var.k8s_name_prefix,"terway"]), 0, 63)
      k8s_name_flannel        = substr(join("-", [var.k8s_name_prefix,"flannel"]), 0, 63)
      k8s_name_ask            = substr(join("-", [var.k8s_name_prefix,"ask"]), 0, 63)
      new_vpc_name            = "tf-vpc-172-16"
      new_vsw_name_azD        = "tf-vswitch-azD-172-16-0"
      new_vsw_name_azE        = "tf-vswitch-azE-172-16-2"
      new_vsw_name_azF        = "tf-vswitch-azF-172-16-4"
      nodepool_name           = "default-nodepool"
      log_project_name        = "log-for-${local.k8s_name_terway}"
    }
    # Node ECS instance configuration. Queries ECS instance types that meet the CPU and memory requirements.
    data "alicloud_instance_types" "default" {
      cpu_core_count       = 8
      memory_size          = 32
      availability_zone    = var.availability_zone[0]
      kubernetes_node_role = "Worker"
    }
    # Availability zones that provide the queried instance type.
    data "alicloud_zones" "default" {
      available_instance_type = data.alicloud_instance_types.default.instance_types[0].id
    }
    # VPC.
    resource "alicloud_vpc" "default" {
      vpc_name   = local.new_vpc_name
      cidr_block = "172.16.0.0/12"
    }
    # Node vSwitches.
    resource "alicloud_vswitch" "vswitches" {
      count             = length(var.node_vswitch_ids) > 0 ? 0 : length(var.node_vswitch_cirds)
      vpc_id            = alicloud_vpc.default.id
      cidr_block        = element(var.node_vswitch_cirds, count.index)
      availability_zone = element(var.availability_zone, count.index)
    }
    
    # ACK managed cluster.
    resource "alicloud_cs_managed_kubernetes" "flannel" {
      # The name of the cluster.
      name                      = local.k8s_name_flannel
      # Create a Pro cluster.
      cluster_spec              = "ack.pro.small"
      version                   = "1.22.10-aliyun.1"
      # The vSwitches of the new cluster. Specify the IDs of one or more vSwitches. The vSwitches must be in the zones specified by availability_zone.
      worker_vswitch_ids        = split(",", join(",", alicloud_vswitch.vswitches.*.id))
    
      # Whether to create a new NAT gateway while creating the cluster. Defaults to true.
      new_nat_gateway           = true
      # The CIDR block of pods. This parameter is required when the Flannel network plug-in is used. It cannot be the same as the VPC CIDR block or the CIDR blocks of other Kubernetes clusters in the VPC, and it cannot be modified after the cluster is created. Maximum number of hosts allowed in the cluster: 256.
      pod_cidr                  = "10.10.0.0/16"
      # The CIDR block of services. It cannot be the same as the VPC CIDR block or the CIDR blocks of other Kubernetes clusters in the VPC, and it cannot be modified after the cluster is created.
      service_cidr              = "10.12.0.0/16"
      # Whether to create an Internet-facing SLB instance for the API server. Defaults to false.
      slb_internet_enabled      = true
    
      # Enable RRSA (RAM Roles for Service Accounts).
      enable_rrsa = true
    
      # Control plane log collection.
      control_plane_log_components = ["apiserver", "kcm", "scheduler"]
    
      # Cluster add-ons.
      dynamic "addons" {
        for_each = var.cluster_addons_flannel
        content {
          name     = addons.value.name
          config   = addons.value.config
          # disabled = lookup(addons.value, "disabled", false)
        }
      }
    
      # Container runtime.
      runtime = {
        name    = "docker"
        version = "19.03.15"
      }
    }
    
    # Node pool.
    resource "alicloud_cs_kubernetes_node_pool" "flannel" {
      # The ID of the cluster.
      cluster_id            = alicloud_cs_managed_kubernetes.flannel.id
      # The name of the node pool.
      name                  = local.nodepool_name
      # The vSwitches of the nodes. Specify the IDs of one or more vSwitches. The vSwitches must be in the zones specified by availability_zone.
      vswitch_ids           = split(",", join(",", alicloud_vswitch.vswitches.*.id))
    
      # Worker ECS Type and ChargeType
      # instance_types      = [data.alicloud_instance_types.default.instance_types[0].id]
      instance_types        = var.worker_instance_types
      instance_charge_type  = "PrePaid"
      period                = 1
      period_unit           = "Month"
      auto_renew            = true
      auto_renew_period     = 1
    
      # customize worker instance name
      # node_name_mode      = "customized,ack-flannel-shenzhen,ip,default"
    
      #Container Runtime
      runtime_name          = "docker"
      runtime_version       = "19.03.15"
    
      # The expected number of worker nodes in the node pool. Defaults to 3. The maximum is 50.
      desired_size          = 2
      # The password for SSH logon to the nodes.
      password              = var.password

      # Whether to install CloudMonitor on the nodes.
      install_cloud_monitor = true
    
      # The system disk category of the nodes. Valid values: cloud_ssd and cloud_efficiency. Defaults to cloud_efficiency.
      system_disk_category  = "cloud_efficiency"
      system_disk_size      = 100
    
      # OS Type
      image_type            = "AliyunLinux"
    
      # Node data disk configuration.
      data_disks {
        # The category of the data disk.
        category = "cloud_essd"
        # The size of the data disk.
        size     = 120
      }
    }
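Optionally, you can append output blocks to main.tf so that terraform apply reports the IDs of the created resources. A sketch using the resource addresses defined in the main.tf above:

```hcl
# Optional outputs; the resource addresses match the main.tf above.
output "cluster_id" {
  description = "The ID of the ACK managed cluster."
  value       = alicloud_cs_managed_kubernetes.flannel.id
}

output "nodepool_id" {
  description = "The ID of the default node pool."
  value       = alicloud_cs_kubernetes_node_pool.flannel.id
}
```

After the cluster is created, you can print these values again at any time by running terraform output.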
  2. Run the following command to initialize the Terraform runtime environment:
    terraform init
    Expected output:
    Initializing the backend...
    
    Initializing provider plugins...
    - Checking for available provider plugins...
    - Downloading plugin for provider "alicloud" (hashicorp/alicloud) 1.90.1...
    ...
    
    You may now begin working with Terraform. Try running "terraform plan" to see
    any changes that are required for your infrastructure. All Terraform commands
    should now work.
    
    If you ever set or change modules or backend configuration for Terraform,
    rerun this command to reinitialize your working directory. If you forget, other
    commands will detect it and remind you to do so if necessary.
  3. Run the following command to generate an execution plan:
    terraform plan
    Expected output:
    Refreshing Terraform state in-memory prior to plan...
    The refreshed state will be used to calculate this plan, but will not be
    persisted to local or remote state storage.
    ...
    Plan: 5 to add, 0 to change, 0 to destroy.
    ...
  4. Run the following command to create the cluster:
    terraform apply
    Expected output:
    ...
    Do you want to perform these actions?
      Terraform will perform the actions described above.
      Only 'yes' will be accepted to approve.
    
      Enter a value: yes
    ...
    alicloud_cs_managed_kubernetes.flannel: Creation complete after 8m26s [id=************]
    
    Apply complete! Resources: 5 added, 0 changed, 0 destroyed.
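To obtain the API server endpoints of the new cluster, you can read the connections attribute exported by the alicloud_cs_managed_kubernetes resource. A hedged sketch (the exported attribute names depend on your alicloud provider version; the resource name flannel comes from the main.tf above):

```hcl
# Exposes the cluster's API server endpoints after apply.
output "api_server_internet" {
  value = alicloud_cs_managed_kubernetes.flannel.connections.api_server_internet
}

output "api_server_intranet" {
  value = alicloud_cs_managed_kubernetes.flannel.connections.api_server_intranet
}
```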

Create an ACK managed cluster by using Terraform (Terway)

  1. Create a working directory and, in that directory, create the following configuration file named main.tf.
    The main.tf file describes the following Terraform configuration:
    • Create a new VPC, plus node vSwitches and pod vSwitches in the VPC.
    • Create an ACK managed cluster.
    • Create a node pool that contains two nodes.
    • Create an auto-scaling node pool.
    • Create a managed node pool.
    # Provider: alicloud. Credentials can be set here or via environment variables.
    provider "alicloud" {
      #access_key = "************"
      #secret_key = "************"
      #region     = "cn-shenzhen"
    }
    variable "k8s_name_prefix" {
      description = "The name prefix used to create managed kubernetes cluster."
      default     = "tf-ack-shenzhen"
    }
    resource "random_uuid" "this" {}
    # Default resource names.
    locals {
      k8s_name_terway         = substr(join("-", [var.k8s_name_prefix,"terway"]), 0, 63)
      k8s_name_flannel        = substr(join("-", [var.k8s_name_prefix,"flannel"]), 0, 63)
      k8s_name_ask            = substr(join("-", [var.k8s_name_prefix,"ask"]), 0, 63)
      new_vpc_name            = "tf-vpc-172-16"
      new_vsw_name_azD        = "tf-vswitch-azD-172-16-0"
      new_vsw_name_azE        = "tf-vswitch-azE-172-16-2"
      new_vsw_name_azF        = "tf-vswitch-azF-172-16-4"
      nodepool_name           = "default-nodepool"
      managed_nodepool_name   = "managed-node-pool"
      autoscale_nodepool_name = "autoscale-node-pool"
      log_project_name        = "log-for-${local.k8s_name_terway}"
    }
    # Node ECS instance configuration. Queries ECS instance types that meet the CPU and memory requirements.
    data "alicloud_instance_types" "default" {
      cpu_core_count       = 8
      memory_size          = 32
      availability_zone    = var.availability_zone[0]
      kubernetes_node_role = "Worker"
    }
    # Availability zones that provide the queried instance type.
    data "alicloud_zones" "default" {
      available_instance_type = data.alicloud_instance_types.default.instance_types[0].id
    }
    # VPC.
    resource "alicloud_vpc" "default" {
      vpc_name   = local.new_vpc_name
      cidr_block = "172.16.0.0/12"
    }
    # Node vSwitches.
    resource "alicloud_vswitch" "vswitches" {
      count             = length(var.node_vswitch_ids) > 0 ? 0 : length(var.node_vswitch_cirds)
      vpc_id            = alicloud_vpc.default.id
      cidr_block        = element(var.node_vswitch_cirds, count.index)
      availability_zone = element(var.availability_zone, count.index)
    }
    # Pod vSwitches.
    resource "alicloud_vswitch" "terway_vswitches" {
      count             = length(var.terway_vswitch_ids) > 0 ? 0 : length(var.terway_vswitch_cirds)
      vpc_id            = alicloud_vpc.default.id
      cidr_block        = element(var.terway_vswitch_cirds, count.index)
      availability_zone = element(var.availability_zone, count.index)
    }
    # ACK managed cluster.
    resource "alicloud_cs_managed_kubernetes" "default" {
      # The name of the cluster.
      name                      = local.k8s_name_terway
      # Create a Pro cluster.
      cluster_spec              = "ack.pro.small"
      version                   = "1.22.10-aliyun.1"
      # The vSwitches of the new cluster. Specify the IDs of one or more vSwitches. The vSwitches must be in the zones specified by availability_zone.
      worker_vswitch_ids        = split(",", join(",", alicloud_vswitch.vswitches.*.id))
    
      # Pod vSwitches.
      pod_vswitch_ids           = split(",", join(",", alicloud_vswitch.terway_vswitches.*.id))
    
      # Whether to create a new NAT gateway while creating the cluster. Defaults to true.
      new_nat_gateway           = true
      # The CIDR block of pods. This parameter is required only when the Flannel network plug-in is used; it is not needed for Terway. It cannot be the same as the VPC CIDR block or the CIDR blocks of other Kubernetes clusters in the VPC, and it cannot be modified after the cluster is created.
      # pod_cidr                  = "10.10.0.0/16"
      # The CIDR block of services. It cannot be the same as the VPC CIDR block or the CIDR blocks of other Kubernetes clusters in the VPC, and it cannot be modified after the cluster is created.
      service_cidr              = "10.11.0.0/16"
      # Whether to create an Internet-facing SLB instance for the API server. Defaults to false.
      slb_internet_enabled      = true
    
      # Enable RRSA (RAM Roles for Service Accounts).
      enable_rrsa = true
    
      # Control plane log collection.
      control_plane_log_components = ["apiserver", "kcm", "scheduler"]
    
      # Cluster add-ons.
      dynamic "addons" {
        for_each = var.cluster_addons
        content {
          name     = addons.value.name
          config   = addons.value.config
          # disabled = lookup(addons.value, "disabled", false)
        }
      }
    
      runtime = {
        name    = "docker"
        version = "19.03.15"
      }
    }
    
    # Regular node pool.
    resource "alicloud_cs_kubernetes_node_pool" "default" {
      # The ID of the cluster.
      cluster_id            = alicloud_cs_managed_kubernetes.default.id
      # The name of the node pool.
      name                  = local.nodepool_name
      # The vSwitches of the nodes. Specify the IDs of one or more vSwitches. The vSwitches must be in the zones specified by availability_zone.
      vswitch_ids           = split(",", join(",", alicloud_vswitch.vswitches.*.id))
    
      # Worker ECS Type and ChargeType
      # instance_types      = [data.alicloud_instance_types.default.instance_types[0].id]
      instance_types        = var.worker_instance_types
      instance_charge_type  = "PrePaid"
      period                = 1
      period_unit           = "Month"
      auto_renew            = true
      auto_renew_period     = 1
    
      # customize worker instance name
      # node_name_mode      = "customized,ack-terway-shenzhen,ip,default"
    
      #Container Runtime
      runtime_name          = "docker"
      runtime_version       = "19.03.15"
    
      # The expected number of worker nodes in the node pool. Defaults to 3. The maximum is 50.
      desired_size          = 2
      # The password for SSH logon to the nodes.
      password              = var.password

      # Whether to install CloudMonitor on the nodes.
      install_cloud_monitor = true
    
      # The system disk category of the nodes. Valid values: cloud_ssd and cloud_efficiency. Defaults to cloud_efficiency.
      system_disk_category  = "cloud_efficiency"
      system_disk_size      = 100
    
      # OS Type
      image_type            = "AliyunLinux"
    
      # Node data disk configuration.
      data_disks {
        # The category of the data disk.
        category = "cloud_essd"
        # The size of the data disk.
        size     = 120
      }
    }
    
    # Managed node pool.
    resource "alicloud_cs_kubernetes_node_pool" "managed_node_pool" {
      # The ID of the cluster.
      cluster_id              = alicloud_cs_managed_kubernetes.default.id
      # The name of the node pool.
      name                    = local.managed_nodepool_name
      # The vSwitches of the nodes. Specify the IDs of one or more vSwitches. The vSwitches must be in the zones specified by availability_zone.
      vswitch_ids             = split(",", join(",", alicloud_vswitch.vswitches.*.id))
    
      # The expected number of worker nodes in the node pool. Defaults to 3. The maximum is 50.
      desired_size            = 2
    
      # Managed Node Pool
      management {
        auto_repair     = true
        auto_upgrade    = true
        surge           = 1
        max_unavailable = 1
      }
    
      # Worker ECS Type and ChargeType
      # instance_types      = [data.alicloud_instance_types.default.instance_types[0].id]
      instance_types        = var.worker_instance_types
      instance_charge_type  = "PrePaid"
      period                = 1
      period_unit           = "Month"
      auto_renew            = true
      auto_renew_period     = 1
    
      # customize worker instance name
      # node_name_mode      = "customized,ack-terway-shenzhen,ip,default"
    
      #Container Runtime
      runtime_name          = "containerd"
      runtime_version       = "1.5.10"
    
    
      # The password for SSH logon to the nodes.
      password              = var.password

      # Whether to install CloudMonitor on the nodes.
      install_cloud_monitor = true
    
      # The system disk category of the nodes. Valid values: cloud_ssd and cloud_efficiency. Defaults to cloud_efficiency.
      system_disk_category  = "cloud_efficiency"
      system_disk_size      = 100
    
      # OS Type
      image_type            = "AliyunLinux"
    
      # Node data disk configuration.
      data_disks {
        # The category of the data disk.
        category = "cloud_essd"
        # The size of the data disk.
        size     = 120
      }
    }
    
    # Auto-scaling node pool.
    resource "alicloud_cs_kubernetes_node_pool" "autoscale_node_pool" {
      # The ID of the cluster.
      cluster_id                      = alicloud_cs_managed_kubernetes.default.id
      # The name of the node pool.
      name                            = local.autoscale_nodepool_name
      # The vSwitches of the nodes. Specify the IDs of one or more vSwitches. The vSwitches must be in the zones specified by availability_zone.
      vswitch_ids                     = split(",", join(",", alicloud_vswitch.vswitches.*.id))
    
    
    
      # AutoScale Node Pool
      scaling_config {
        min_size = 1
        max_size = 10
      }
    
      # Worker ECS Type and ChargeType
      # instance_types      = [data.alicloud_instance_types.default.instance_types[0].id]
      instance_types        = var.worker_instance_types
    
    
      # customize worker instance name
      # node_name_mode      = "customized,ack-terway-shenzhen,ip,default"
    
      #Container Runtime
      runtime_name          = "containerd"
      runtime_version       = "1.5.10"
    
    
      # The password for SSH logon to the nodes.
      password              = var.password

      # Whether to install CloudMonitor on the nodes.
      install_cloud_monitor = true
    
      # The system disk category of the nodes. Valid values: cloud_ssd and cloud_efficiency. Defaults to cloud_efficiency.
      system_disk_category  = "cloud_efficiency"
      system_disk_size      = 100
    
      # OS Type
      image_type            = "AliyunLinux"
    
      # Node data disk configuration.
      data_disks {
        # The category of the data disk.
        category = "cloud_essd"
        # The size of the data disk.
        size     = 120
      }
    }
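If the auto-scaling nodes should only run specific workloads, the node pool resource also accepts labels and taints blocks. A hedged sketch of a variant pool; the key and value names below are placeholders, not values required by ACK:

```hcl
# Variant of the auto-scaling node pool above, with a node label and a taint.
resource "alicloud_cs_kubernetes_node_pool" "autoscale_tainted" {
  cluster_id     = alicloud_cs_managed_kubernetes.default.id
  name           = "autoscale-tainted-pool"
  vswitch_ids    = split(",", join(",", alicloud_vswitch.vswitches.*.id))
  instance_types = var.worker_instance_types
  password       = var.password

  scaling_config {
    min_size = 1
    max_size = 5
  }

  # Kubernetes label applied to every node in this pool (placeholder values).
  labels {
    key   = "workload-type"
    value = "batch"
  }

  # Taint so that only pods with a matching toleration are scheduled here.
  taints {
    key    = "dedicated"
    value  = "batch"
    effect = "NoSchedule"
  }
}
```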
  2. Run the following command to initialize the Terraform runtime environment:
    terraform init

    Expected output:

    Initializing the backend...
    
    Initializing provider plugins...
    - Checking for available provider plugins...
    - Downloading plugin for provider "alicloud" (hashicorp/alicloud) 1.90.1...
    ...
    
    You may now begin working with Terraform. Try running "terraform plan" to see
    any changes that are required for your infrastructure. All Terraform commands
    should now work.
    
    If you ever set or change modules or backend configuration for Terraform,
    rerun this command to reinitialize your working directory. If you forget, other
    commands will detect it and remind you to do so if necessary.
  3. Run the following command to generate an execution plan:
    terraform plan

    Expected output:

    Refreshing Terraform state in-memory prior to plan...
    The refreshed state will be used to calculate this plan, but will not be
    persisted to local or remote state storage.
    ...
    Plan: 8 to add, 0 to change, 0 to destroy.
    ...
  4. Run the following command to create the resources:
    terraform apply

    Expected output:

    ...
    Do you want to perform these actions?
      Terraform will perform the actions described above.
      Only 'yes' will be accepted to approve.
    
      Enter a value: yes
    ...
    alicloud_cs_managed_kubernetes.default: Creation complete after 8m26s [id=************]
    
    Apply complete! Resources: 8 added, 0 changed, 0 destroyed.

Delete an ACK managed cluster by using Terraform

You can run the following command to delete a cluster created by using Terraform:
terraform destroy
Expected output:
...
Do you really want to destroy all resources?
  Terraform will destroy all your managed infrastructure, as shown above.
  There is no undo. Only 'yes' will be accepted to confirm.

  Enter a value: yes
...
Destroy complete! Resources: 5 destroyed.