Create a managed Kubernetes cluster with Terraform

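The Container Service console provides a convenient visual interface that guides you step by step through creating this type of cluster. However, when you need to create managed clusters repeatedly or in large batches, working through the console becomes tedious. Terraform solves these problems. This topic describes how to use Terraform to quickly deploy a managed Kubernetes cluster.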

Note

The sample code in this tutorial supports one-click execution, so you can run the code directly.

Create a managed Kubernetes cluster

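The Terraform resource documentation for Alibaba Cloud managed Kubernetes, alicloud_cs_managed_kubernetes, lists the parameters that the resource provides. They are divided into input parameters (Argument Reference) and output parameters (Attributes Reference). The argument list contains both required and optional parameters. For example, name and name_prefix are a pair of optional parameters that are mutually exclusive, so you cannot specify both. If you specify name, the cluster name is the value of name. If you specify name_prefix, a cluster name that starts with name_prefix is generated automatically. Before you create a node pool with auto scaling, you must grant the required permissions to your account. For details, see Use Terraform to create a node pool with auto scaling enabled.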
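The difference between the two naming arguments can be sketched as follows. This is a minimal illustration and not part of the tutorial's template: the resource labels fixed_name and generated_name, the variable vswitch_id, and the CIDR values are assumptions chosen for the example, and a real cluster may need further arguments from the Argument Reference.

```hcl
# Hypothetical input: the ID of an existing vSwitch that hosts the clusters.
variable "vswitch_id" {
  type = string
}

# Uses name: the cluster is called exactly "my-cluster".
resource "alicloud_cs_managed_kubernetes" "fixed_name" {
  name               = "my-cluster"
  worker_vswitch_ids = [var.vswitch_id]
  pod_cidr           = "172.20.0.0/16"
  service_cidr       = "172.21.0.0/20"
}

# Uses name_prefix: a unique cluster name starting with "my-cluster-" is generated.
resource "alicloud_cs_managed_kubernetes" "generated_name" {
  name_prefix        = "my-cluster-"
  worker_vswitch_ids = [var.vswitch_id]
  pod_cidr           = "172.22.0.0/16"
  service_cidr       = "172.23.0.0/20"
}
```

Because the two arguments conflict, a resource that sets both name and name_prefix is expected to be rejected during validation.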

  1. Following the Argument Reference list in the documentation, first write a description of the cluster. The code is as follows:

    provider "alicloud" {
      region = var.region
    }
    
    variable "region" {
      default = "cn-zhangjiakou"
    }
    
    # Default resource name
    variable "name" {
      default = "my-first-kubernetes-demo"
    }
    # Log Service project name
    variable "log_project_name" {
      default = "my-first-kubernetes-sls-demo"
    }
    # Zones
    data "alicloud_zones" "default" {
      available_resource_creation = "VSwitch"
    }
    # ECS instance configuration for the nodes
    data "alicloud_instance_types" "default" {
      availability_zone    = data.alicloud_zones.default.zones[0].id
      cpu_core_count       = 2
      memory_size          = 4
      kubernetes_node_role = "Worker"
    }
    # VPC
    resource "alicloud_vpc" "default" {
      vpc_name   = var.name
      cidr_block = "10.1.0.0/21"
    }
    # vSwitch
    resource "alicloud_vswitch" "default" {
      vswitch_name = var.name
      vpc_id       = alicloud_vpc.default.id
      cidr_block   = "10.1.1.0/24"
      zone_id      = data.alicloud_zones.default.zones[0].id
    }
    
    # Managed Kubernetes cluster
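    resource "alicloud_cs_managed_kubernetes" "default" {
      worker_vswitch_ids = [alicloud_vswitch.default.id]
      # Prefix of the Kubernetes cluster name. Conflicts with name. If specified, Terraform uses it to build a unique cluster name. Defaults to "Terraform-Creation".
      name_prefix = var.name
      # Whether to create a new NAT gateway when the Kubernetes cluster is created. Defaults to true.
      new_nat_gateway = true
      # CIDR block of the pod network. Required when cluster_network_type is set to flannel. It cannot be the same as the VPC CIDR block or a CIDR block used by another Kubernetes cluster in the VPC, and it cannot be modified after creation. Maximum number of hosts allowed in the cluster: 256.
      pod_cidr = "172.20.0.0/16"
      # CIDR block of the service network. It cannot be the same as the VPC CIDR block or a CIDR block used by another Kubernetes cluster in the VPC, and it cannot be modified after creation.
      service_cidr = "172.21.0.0/20"
      # Whether to create an Internet-facing load balancer for the API server. Defaults to false.
      slb_internet_enabled = true
    }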
    
    resource "alicloud_cs_kubernetes_node_pool" "default" {
      node_pool_name = var.name
      cluster_id     = alicloud_cs_managed_kubernetes.default.id
      vswitch_ids    = [alicloud_vswitch.default.id]
      # Password for SSH logon to the cluster nodes. You must specify one of password, key_name, or kms_encrypted_password.
      password = "Yourpassword1234"
      # Desired number of worker nodes in the node pool.
      desired_size = 2
      # Whether to install CloudMonitor on the Kubernetes nodes.
      install_cloud_monitor = true
      # ECS instance types of the nodes. Specify one type for a single-AZ cluster and three types for a multi-AZ cluster. You can use the alicloud_instance_types data source to look up the available Kubernetes node instance types.
      instance_types        = ["ecs.n4.large"]
      # System disk category of the nodes. Valid values: cloud_ssd and cloud_efficiency. Defaults to cloud_efficiency.
      system_disk_category  = "cloud_efficiency"
      system_disk_size      = 40
      data_disks {
        category = "cloud_ssd"
        size     = "100"
      }
    }
  2. Save the preceding configuration as a description file named main.tf, and then run terraform init and terraform apply in the directory that contains the file.

    1. Run the terraform init command to initialize the working directory.

      $ terraform init                                                                    
      
      Initializing the backend...
      
      Initializing provider plugins...
      - Finding latest version of aliyun/alicloud...
      - Installing aliyun/alicloud v1.214.1...
      - Installed aliyun/alicloud v1.214.1 (verified checksum)
      
      Terraform has created a lock file .terraform.lock.hcl to record the provider
      selections it made above. Include this file in your version control repository
      so that Terraform can guarantee to make the same selections by default when
      you run "terraform init" in the future.
      
      ╷
      │ Warning: Incomplete lock file information for providers
      │ 
      │ Due to your customized provider installation methods, Terraform was forced to calculate lock file checksums locally for the following providers:
      │   - aliyun/alicloud
      │ 
      │ The current .terraform.lock.hcl file only includes checksums for darwin_amd64, so Terraform running on another platform will fail to install these providers.
      │ 
      │ To calculate additional checksums for another platform, run:
      │   terraform providers lock -platform=linux_amd64
      │ (where linux_amd64 is the platform to generate)
      ╵
      
      Terraform has been successfully initialized!
      
      You may now begin working with Terraform. Try running "terraform plan" to see
      any changes that are required for your infrastructure. All Terraform commands
      should now work.
      
      If you ever set or change modules or backend configuration for Terraform,
      rerun this command to reinitialize your working directory. If you forget, other
      commands will detect it and remind you to do so if necessary.
    2. Run the terraform apply command to create the resources.

      $ terraform apply  
      
      data.alicloud_zones.default: Reading...
      data.alicloud_zones.default: Read complete after 1s [id=2604238681]
      data.alicloud_instance_types.default: Reading...
      data.alicloud_instance_types.default: Read complete after 1s [id=1017980362]
      
      Terraform used the selected providers to generate the following execution plan. Resource actions are indicated with the following symbols:
        + create
      
      Terraform will perform the following actions:
      
        # alicloud_cs_kubernetes_node_pool.default will be created
        + resource "alicloud_cs_kubernetes_node_pool" "default" {
            + cluster_id                 = (known after apply)
            + deployment_set_id          = (known after apply)
            + desired_size               = 2
            + format_disk                = (known after apply)
            + id                         = (known after apply)
            + image_id                   = (known after apply)
            + image_type                 = (known after apply)
            + install_cloud_monitor      = true
            + instance_charge_type       = "PostPaid"
            + instance_types             = [
                + "ecs.n4.large",
              ]
            + internet_charge_type       = (known after apply)
            + internet_max_bandwidth_out = (known after apply)
            + keep_instance_name         = (known after apply)
            + name                       = "my-first-kubernetes-demo"
            + node_count                 = (known after apply)
            + node_name_mode             = (known after apply)
            + password                   = (sensitive value)
            + platform                   = (known after apply)
            + resource_group_id          = (known after apply)
            + runtime_name               = (known after apply)
            + runtime_version            = (known after apply)
            + scaling_group_id           = (known after apply)
            + scaling_policy             = (known after apply)
            + security_group_id          = (known after apply)
            + security_group_ids         = (known after apply)
            + spot_strategy              = (known after apply)
            + system_disk_category       = "cloud_efficiency"
            + system_disk_size           = 40
            + unschedulable              = false
            + vpc_id                     = (known after apply)
            + vswitch_ids                = (known after apply)
      
            + data_disks {
                + category = "cloud_ssd"
                + size     = 100
              }
          }
      
        # alicloud_cs_managed_kubernetes.default will be created
        + resource "alicloud_cs_managed_kubernetes" "default" {
            + availability_zone            = (known after apply)
            + certificate_authority        = (known after apply)
            + cluster_domain               = "cluster.local"
            + cluster_spec                 = (known after apply)
            + connections                  = (known after apply)
            + control_plane_log_project    = (known after apply)
            + control_plane_log_ttl        = (known after apply)
            + deletion_protection          = false
            + id                           = (known after apply)
            + install_cloud_monitor        = (known after apply)
            + is_enterprise_security_group = (known after apply)
            + load_balancer_spec           = "slb.s1.small"
            + name                         = (known after apply)
            + name_prefix                  = "my-first-kubernetes-demo"
            + nat_gateway_id               = (known after apply)
            + new_nat_gateway              = true
            + node_cidr_mask               = 24
            + node_port_range              = (known after apply)
            + os_type                      = "Linux"
            + platform                     = (known after apply)
            + pod_cidr                     = "172.20.0.0/16"
            + proxy_mode                   = "ipvs"
            + resource_group_id            = (known after apply)
            + rrsa_metadata                = (known after apply)
            + security_group_id            = (known after apply)
            + service_cidr                 = "172.21.0.0/20"
            + slb_id                       = (known after apply)
            + slb_internet                 = (known after apply)
            + slb_internet_enabled         = true
            + slb_intranet                 = (known after apply)
            + version                      = (known after apply)
            + vpc_id                       = (known after apply)
            + worker_auto_renew_period     = (known after apply)
            + worker_disk_size             = (known after apply)
            + worker_instance_charge_type  = (known after apply)
            + worker_period                = (known after apply)
            + worker_period_unit           = (known after apply)
            + worker_ram_role_name         = (known after apply)
            + worker_vswitch_ids           = (known after apply)
          }
      
        # alicloud_vpc.default will be created
        + resource "alicloud_vpc" "default" {
            + cidr_block            = "10.1.0.0/21"
            + create_time           = (known after apply)
            + id                    = (known after apply)
            + ipv6_cidr_block       = (known after apply)
            + ipv6_cidr_blocks      = (known after apply)
            + name                  = (known after apply)
            + resource_group_id     = (known after apply)
            + route_table_id        = (known after apply)
            + router_id             = (known after apply)
            + router_table_id       = (known after apply)
            + secondary_cidr_blocks = (known after apply)
            + status                = (known after apply)
            + user_cidrs            = (known after apply)
            + vpc_name              = "my-first-kubernetes-demo"
          }
      
        # alicloud_vswitch.default will be created
        + resource "alicloud_vswitch" "default" {
            + availability_zone    = (known after apply)
            + cidr_block           = "10.1.1.0/24"
            + create_time          = (known after apply)
            + id                   = (known after apply)
            + ipv6_cidr_block      = (known after apply)
            + ipv6_cidr_block_mask = (known after apply)
            + name                 = (known after apply)
            + status               = (known after apply)
            + vpc_id               = (known after apply)
            + vswitch_name         = "my-first-kubernetes-demo"
            + zone_id              = "cn-zhangjiakou-a"
          }
      
      Plan: 4 to add, 0 to change, 0 to destroy.
      
      Do you want to perform these actions?
        Terraform will perform the actions described above.
        Only 'yes' will be accepted to approve.
      
        Enter a value: 
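  3. The terraform init command downloads the provider plugins that the configuration uses, and the terraform apply command computes the operations to perform based on the main.tf description file. The preceding log shows that four resources will be created, including alicloud_cs_managed_kubernetes.default, and that you must enter yes to confirm. After you confirm, creation takes about five minutes, and Terraform prints a log similar to the following.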

    Do you want to perform these actions?
      Terraform will perform the actions described above.
      Only 'yes' will be accepted to approve.
    
      Enter a value: yes
    
    alicloud_vpc.default: Creating...
    alicloud_vpc.default: Creation complete after 4s [id=vpc-8vbkpc7n9gp5mft7kxh7t]
    alicloud_vswitch.default: Creating...
    alicloud_vswitch.default: Creation complete after 3s [id=vsw-8vbkdhovthzlwirs4et9c]
    alicloud_cs_managed_kubernetes.default: Creating...
    alicloud_cs_managed_kubernetes.default: Still creating... [10s elapsed]
    ......
    alicloud_cs_managed_kubernetes.default: Still creating... [3m40s elapsed]
    alicloud_cs_managed_kubernetes.default: Creation complete after 3m42s [id=cfd0a48c499804b94b59a4f6da963f6d5]
    alicloud_cs_kubernetes_node_pool.default: Creating...
    alicloud_cs_kubernetes_node_pool.default: Still creating... [10s elapsed]
    alicloud_cs_kubernetes_node_pool.default: Still creating... [20s elapsed]
    alicloud_cs_kubernetes_node_pool.default: Still creating... [30s elapsed]
    alicloud_cs_kubernetes_node_pool.default: Creation complete after 33s [id=cfd0a48c499804b94b59a4f6da963f6d5:np378764a2c81d4a8eb85bad53cf3ccf5c]
    
    Apply complete! Resources: 4 added, 0 changed, 0 destroyed.
  4. When the message Apply complete! Resources: 4 added appears, the cluster has been created. You can also log on to the console and view the cluster in the cluster list.

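One way to surface the result of step 4 without opening the console is to add output values to main.tf. The sketch below assumes the resource label default from the template in step 1; the output names cluster_id and cluster_name are illustrative. Both attributes (id and name) appear in the plan printed by terraform apply.

```hcl
# ID of the managed cluster, the same value that the apply log reports.
output "cluster_id" {
  value = alicloud_cs_managed_kubernetes.default.id
}

# Final cluster name that was generated from name_prefix.
output "cluster_name" {
  value = alicloud_cs_managed_kubernetes.default.name
}
```

After the next terraform apply, the values are listed under Outputs, and terraform output cluster_name prints a single value.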

Modify a managed Kubernetes cluster

The Terraform provider supports modifying some of the parameters. In general, every parameter that is not marked Force New Resource can be modified in place.
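Changing a Force New Resource parameter makes Terraform plan to destroy and re-create the resource. As a hedged sketch, Terraform's built-in lifecycle setting prevent_destroy can turn such a plan into an error instead of a replacement. The resource label guarded and the variable existing_vswitch_id are assumptions for illustration; in practice the lifecycle block would be added to the existing cluster resource. The template comments state that pod_cidr and service_cidr cannot be modified after creation, so they serve as the example here.

```hcl
# Hypothetical input: the ID of an existing vSwitch.
variable "existing_vswitch_id" {
  type = string
}

resource "alicloud_cs_managed_kubernetes" "guarded" {
  name_prefix        = "guarded-demo"
  worker_vswitch_ids = [var.existing_vswitch_id]

  # Per the template comments, these CIDR blocks cannot be changed after creation.
  pod_cidr     = "172.24.0.0/16"
  service_cidr = "172.25.0.0/20"

  lifecycle {
    # Any plan that would destroy this resource, including a
    # destroy-and-recreate replacement, fails with an error.
    prevent_destroy = true
  }
}
```

Note that prevent_destroy also blocks terraform destroy for this resource until the setting is removed.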

  1. Modify some of the parameters. The following is the modified template.

    provider "alicloud" {
      region = var.region
    }
    
    variable "region" {
      default = "cn-zhangjiakou"
    }
    
    # Default resource name
    variable "name" {
      default = "my-first-kubernetes-demo"
    }
    # Log Service project name
    variable "log_project_name" {
      default = "my-first-kubernetes-sls-demo"
    }
    # Zones
    data "alicloud_zones" "default" {
      available_resource_creation = "VSwitch"
    }
    # ECS instance configuration for the nodes
    data "alicloud_instance_types" "default" {
      availability_zone    = data.alicloud_zones.default.zones[0].id
      cpu_core_count       = 2
      memory_size          = 4
      kubernetes_node_role = "Worker"
    }
    # VPC
    resource "alicloud_vpc" "default" {
      vpc_name   = var.name
      cidr_block = "10.1.0.0/21"
    }
    # vSwitch
    resource "alicloud_vswitch" "default" {
      vswitch_name = var.name
      vpc_id       = alicloud_vpc.default.id
      cidr_block   = "10.1.1.0/24"
      zone_id      = data.alicloud_zones.default.zones[0].id
    }
    
    # Managed Kubernetes cluster
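    resource "alicloud_cs_managed_kubernetes" "default" {
      worker_vswitch_ids = [alicloud_vswitch.default.id]
      # Prefix of the Kubernetes cluster name. Conflicts with name. If specified, Terraform uses it to build a unique cluster name. Defaults to "Terraform-Creation".
      name_prefix = var.name
      # Whether to create a new NAT gateway when the Kubernetes cluster is created. Defaults to true.
      new_nat_gateway = true
      # CIDR block of the pod network. Required when cluster_network_type is set to flannel. It cannot be the same as the VPC CIDR block or a CIDR block used by another Kubernetes cluster in the VPC, and it cannot be modified after creation. Maximum number of hosts allowed in the cluster: 256.
      pod_cidr = "172.20.0.0/16"
      # CIDR block of the service network. It cannot be the same as the VPC CIDR block or a CIDR block used by another Kubernetes cluster in the VPC, and it cannot be modified after creation.
      service_cidr = "172.21.0.0/20"
      # Whether to create an Internet-facing load balancer for the API server. Defaults to false.
      slb_internet_enabled = true
      
      # Export the cluster certificate files to the /tmp directory. The same applies to the two settings that follow.
      client_cert     = "/tmp/client-cert.pem"
      client_key      = "/tmp/client-key.pem"
      cluster_ca_cert = "/tmp/cluster-ca-cert.pem"
    }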
    
    resource "alicloud_cs_kubernetes_node_pool" "default" {
      node_pool_name = var.name
      cluster_id     = alicloud_cs_managed_kubernetes.default.id
      vswitch_ids    = [alicloud_vswitch.default.id]
      # Password for SSH logon to the cluster nodes. You must specify one of password, key_name, or kms_encrypted_password.
      password = "Yourpassword1234"
      # Desired number of worker nodes in the node pool.
      desired_size = 3
      # Whether to install CloudMonitor on the Kubernetes nodes.
      install_cloud_monitor = true
      # ECS instance types of the nodes. Specify one type for a single-AZ cluster and three types for a multi-AZ cluster. You can use the alicloud_instance_types data source to look up the available Kubernetes node instance types.
      instance_types        = ["ecs.n4.large"]
      # System disk category of the nodes. Valid values: cloud_ssd and cloud_efficiency. Defaults to cloud_efficiency.
      system_disk_category  = "cloud_efficiency"
      system_disk_size      = 40
      data_disks {
        category = "cloud_ssd"
        size     = "100"
      }
    }
    
    data "alicloud_cs_cluster_credential" "auth" {
      cluster_id                 = alicloud_cs_managed_kubernetes.default.id
      temporary_duration_minutes = 60
      output_file                = "/tmp/config"
    }
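  2. As with creating a cluster, the command for modifying a cluster is also terraform apply. Running it produces the following log. Enter yes and press Enter to scale the worker nodes out to three and to export the certificates and the connection file to the /tmp directory on the local machine.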

    terraform apply
    data.alicloud_zones.default: Reading...
    alicloud_vpc.default: Refreshing state... [id=vpc-8vbr6t6i2xl49hjzald45]
    data.alicloud_zones.default: Read complete after 0s [id=2604238681]
    data.alicloud_instance_types.default: Reading...
    alicloud_vswitch.default: Refreshing state... [id=vsw-8vbkp6rcqkn4ljf1a7tb3]
    alicloud_cs_managed_kubernetes.default: Refreshing state... [id=cdfe383b2114c40f582270860c39cb3cb]
    data.alicloud_instance_types.default: Read complete after 1s [id=3527274229]
    alicloud_cs_kubernetes_node_pool.default: Refreshing state... [id=cdfe383b2114c40f582270860c39cb3cb:npf17c80f735d645e88b4ea61b689e15b8]
    
    Terraform used the selected providers to generate the following execution plan. Resource actions are indicated with the following symbols:
      ~ update in-place
     <= read (data resources)
    
    Terraform will perform the following actions:
    
      # data.alicloud_cs_cluster_credential.auth will be read during apply
      # (depends on a resource or a module with changes pending)
     <= data "alicloud_cs_cluster_credential" "auth" {
          + certificate_authority      = (known after apply)
          + cluster_id                 = "cdfe383b2114c40f582270860c39cb3cb"
          + cluster_name               = (known after apply)
          + expiration                 = (known after apply)
          + id                         = (known after apply)
          + kube_config                = (sensitive value)
          + output_file                = "/tmp/config"
          + temporary_duration_minutes = 60
        }
    
      # alicloud_cs_kubernetes_node_pool.default will be updated in-place
      ~ resource "alicloud_cs_kubernetes_node_pool" "default" {
          ~ desired_size               = 2 -> 3
            id                         = "cdfe383b2114c40f582270860c39cb3cb:npf17c80f735d645e88b4ea61b689e15b8"
          ~ instance_types             = [
              - "ecs.n1.medium",
              + "ecs.sn1.medium",
            ]
            name                       = "my-first-kubernetes-demo"
            tags                       = {}
            # (26 unchanged attributes hidden)
    
            # (1 unchanged block hidden)
        }
    
      # alicloud_cs_managed_kubernetes.default will be updated in-place
      ~ resource "alicloud_cs_managed_kubernetes" "default" {
          + client_cert                  = "/tmp/client-cert.pem"
          + client_key                   = "/tmp/client-key.pem"
          + cluster_ca_cert              = "/tmp/cluster-ca-cert.pem"
            id                           = "cdfe383b2114c40f582270860c39cb3cb"
            name                         = "my-first-kubernetes-demo20240116105632726000000002"
            tags                         = {}
            # (28 unchanged attributes hidden)
    
            # (1 unchanged block hidden)
        }
    
    Plan: 0 to add, 2 to change, 0 to destroy.
    
    Do you want to perform these actions?
      Terraform will perform the actions described above.
      Only 'yes' will be accepted to approve.
    
      Enter a value: yes
    
    alicloud_cs_managed_kubernetes.default: Modifying... [id=cdfe383b2114c40f582270860c39cb3cb]
    alicloud_cs_managed_kubernetes.default: Modifications complete after 3s [id=cdfe383b2114c40f582270860c39cb3cb]
    data.alicloud_cs_cluster_credential.auth: Reading...
    alicloud_cs_kubernetes_node_pool.default: Modifying... [id=cdfe383b2114c40f582270860c39cb3cb:npf17c80f735d645e88b4ea61b689e15b8]
    data.alicloud_cs_cluster_credential.auth: Read complete after 0s [id=87210520]
    alicloud_cs_kubernetes_node_pool.default: Still modifying... [id=cdfe383b2114c40f582270860c39cb3cb:npf17c80f735d645e88b4ea61b689e15b8, 10s elapsed]
    alicloud_cs_kubernetes_node_pool.default: Still modifying... [id=cdfe383b2114c40f582270860c39cb3cb:npf17c80f735d645e88b4ea61b689e15b8, 20s elapsed]
    alicloud_cs_kubernetes_node_pool.default: Still modifying... [id=cdfe383b2114c40f582270860c39cb3cb:npf17c80f735d645e88b4ea61b689e15b8, 30s elapsed]
    alicloud_cs_kubernetes_node_pool.default: Modifications complete after 35s [id=cdfe383b2114c40f582270860c39cb3cb:npf17c80f735d645e88b4ea61b689e15b8]
    
    Apply complete! Resources: 0 added, 2 changed, 0 destroyed.
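  3. After terraform apply succeeds, the cluster information shown in the console indicates that the cluster is now in the expected state. On the local machine, you can also use kubectl with the exported connection file to connect to the cluster.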

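The credential data source also exposes its results as attributes, two of which (expiration and kube_config) are listed in the plan above. The following is a minimal sketch of reading them through output values, assuming the data source label auth from the modified template; the output names are illustrative.

```hcl
# Time at which the temporary kubeconfig stops being valid.
output "kubeconfig_expiration" {
  value = data.alicloud_cs_cluster_credential.auth.expiration
}

# The kubeconfig content itself. The attribute is sensitive, so the output
# must be marked sensitive and Terraform redacts it in normal CLI output.
output "kubeconfig" {
  value     = data.alicloud_cs_cluster_credential.auth.kube_config
  sensitive = true
}
```

Because temporary_duration_minutes is set to 60, the file exported to /tmp/config should stop working about an hour after it is written. Running terraform apply again is expected to read the data source anew and refresh the file.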