How to create an elastic node pool with a custom image

Using a custom image with pre-installed software packages can significantly reduce the startup time for cloud nodes by minimizing package download time.

Prerequisites

Procedure

Note
  • This topic uses a CentOS 7.9 operating system and a Kubernetes v1.28.3 cluster connected using binaries as an example of how to create a custom image for an elastic node pool.

  • If you already have a custom image, you can skip to Step 3.

Step 1: Create a node pool and add a node

  1. Select an OSS bucket, create a file named join-ecs-node.sh with the following content, and upload the file to the bucket.

    echo "The node providerid is $ALIBABA_CLOUD_PROVIDER_ID"
    echo "The node name is $ALIBABA_CLOUD_NODE_NAME"
    echo "The node labels are $ALIBABA_CLOUD_LABELS"
    echo "The node taints are $ALIBABA_CLOUD_TAINTS"
  2. Obtain the URL of the join-ecs-node.sh file (you can use a signed URL), and then modify the custom script configuration in the cluster.

    1. Run the following command to edit the ack-agent-config ConfigMap:

      kubectl edit cm ack-agent-config -n kube-system
    2. Modify the addNodeScriptPath field. The updated configuration is as follows:

      apiVersion: v1
      data:
        addNodeScriptPath: https://kubelet-****.oss-cn-hangzhou-internal.aliyuncs.com/join-ecs-node.sh
      kind: ConfigMap
      metadata:
        name: ack-agent-config
        namespace: kube-system
  3. Create a node pool named cloud-test and set Expected Nodes to 1. For more information, see Create and manage a node pool.

    Important

    The new node enters the Failed state because it lacks the required software packages. You must log on to the node to initialize it, so make sure that the node is reachable over SSH.
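
Because the node can only be initialized after its parameters are known, it can help to make join-ecs-node.sh fail fast when a parameter is missing. The following is a minimal sketch and not part of the original procedure. The function name check_node_env is hypothetical, and the sketch assumes that the ALIBABA_CLOUD_* variables are available to the script as environment variables, as the echo statements in the script suggest. Taints are treated as optional.

```shell
#!/bin/sh
# Hypothetical helper (not provided by ACK): report which node pool
# parameters were not passed to the custom script.
check_node_env() {
  missing=0
  for name in ALIBABA_CLOUD_PROVIDER_ID ALIBABA_CLOUD_NODE_NAME ALIBABA_CLOUD_LABELS; do
    # Read the variable whose name is stored in $name (the names are fixed literals).
    eval "value=\${$name:-}"
    if [ -z "$value" ]; then
      echo "missing: $name" >&2
      missing=1
    fi
  done
  # Taints may legitimately be empty, so they are only printed.
  echo "taints: ${ALIBABA_CLOUD_TAINTS:-(none)}"
  return "$missing"
}
```

In join-ecs-node.sh, a line such as `check_node_env || exit 1` placed after the echo statements would stop the script before any later command runs with empty values.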

Step 2: Configure node and export custom image

  1. Log on to the node and run the following command to view the node information:

    cat /var/log/acs/init.log

    Expected output:

    The node providerid is cn-zhangjiakou.i-xxxxx
    The node name is cn-zhangjiakou.192.168.66.xx
    The node labels are alibabacloud.com/nodepool-id=npf9fbxxxxxx,ack.aliyun.com=c22b1a2e122ff4fde85117de4xxxxxx,alibabacloud.com/instance-id=i-8vb7m7nt3dxxxxxxx,alibabacloud.com/external=true
    The node taints are

    This output confirms that the custom script can obtain the Alibaba Cloud node information. Record this information for the kubelet startup parameters.

  2. Run the following commands to configure the base environment:

    # Install tool packages.
    yum update -y && yum -y install  wget psmisc vim net-tools nfs-utils telnet yum-utils device-mapper-persistent-data lvm2 git tar curl
    
    # Disable the firewall.
    systemctl disable --now firewalld
    
    # Disable SELinux.
    setenforce 0
    sed -i 's#SELINUX=enforcing#SELINUX=disabled#g' /etc/selinux/config
    
    # Disable swap partitions.
    sed -ri 's/.*swap.*/#&/' /etc/fstab
    swapoff -a && sysctl -w vm.swappiness=0
    
    # Configure the network.
    systemctl disable --now NetworkManager
    systemctl start network && systemctl enable network
    
    # Synchronize the time.
    ln -svf /usr/share/zoneinfo/Asia/Shanghai /etc/localtime
    yum install ntpdate -y
    ntpdate ntp.aliyun.com
    
    # Configure ulimit.
    ulimit -SHn 65535
    cat >> /etc/security/limits.conf <<EOF
    * soft nofile 131072
    * hard nofile 131072
    * soft nproc 655350
    * hard nproc 655350
    * soft memlock unlimited
    * hard memlock unlimited
    EOF
    Note

    After you complete the preceding environment configuration, upgrade the kernel to version 4.18 or later and install ipvsadm.

  3. Install containerd.

    1. Run the following commands to download the CNI plugins and containerd packages:

      wget https://github.com/containernetworking/plugins/releases/download/v1.3.0/cni-plugins-linux-amd64-v1.3.0.tgz
      mkdir -p /etc/cni/net.d /opt/cni/bin 
      # Decompress the CNI binary package.
      tar xf cni-plugins-linux-amd64-v*.tgz -C /opt/cni/bin/
      wget https://github.com/containerd/containerd/releases/download/v1.7.8/containerd-1.7.8-linux-amd64.tar.gz
      tar -xzf containerd-*-linux-amd64.tar.gz -C /
    2. Run the following command to create the service startup configuration:

      cat > /etc/systemd/system/containerd.service <<EOF
      [Unit]
      Description=containerd container runtime
      Documentation=https://containerd.io
      After=network.target local-fs.target
      
      [Service]
      ExecStartPre=-/sbin/modprobe overlay
      ExecStart=/usr/local/bin/containerd
      Type=notify
      Delegate=yes
      KillMode=process
      Restart=always
      RestartSec=5
      LimitNPROC=infinity
      LimitCORE=infinity
      LimitNOFILE=infinity
      TasksMax=infinity
      OOMScoreAdjust=-999
      
      [Install]
      WantedBy=multi-user.target
      EOF
    3. Run the following command to configure the modules required by containerd:

      cat <<EOF | sudo tee /etc/modules-load.d/containerd.conf
      overlay
      br_netfilter
      EOF
      systemctl restart systemd-modules-load.service
    4. Run the following command to configure the kernel parameters required by containerd:

      cat <<EOF | sudo tee /etc/sysctl.d/99-kubernetes-cri.conf
      net.bridge.bridge-nf-call-iptables  = 1
      net.ipv4.ip_forward                 = 1
      net.bridge.bridge-nf-call-ip6tables = 1
      EOF
      
      # Load the kernel parameters.
      sysctl --system
    5. Run the following commands to create a configuration file for containerd:

      mkdir -p /etc/containerd
      containerd config default | tee /etc/containerd/config.toml
      
      # Modify the containerd configuration file.
      sed -i "s#SystemdCgroup\ \=\ false#SystemdCgroup\ \=\ true#g" /etc/containerd/config.toml
      cat /etc/containerd/config.toml | grep SystemdCgroup
      sed -i "s#registry.k8s.io#m.daocloud.io/registry.k8s.io#g" /etc/containerd/config.toml
      cat /etc/containerd/config.toml | grep sandbox_image
      sed -i "s#config_path\ \=\ \"\"#config_path\ \=\ \"/etc/containerd/certs.d\"#g" /etc/containerd/config.toml
      cat /etc/containerd/config.toml | grep certs.d
      
      # Configure a registry mirror.
      mkdir /etc/containerd/certs.d/docker.io -pv
      cat > /etc/containerd/certs.d/docker.io/hosts.toml << EOF
      server = "https://docker.io"
      [host."https://hub-mirror.c.163.com"]
        capabilities = ["pull", "resolve"]
      EOF
    6. Run the following commands to enable containerd to start on system startup:

      systemctl daemon-reload
      # Reloads systemd units. Required after creating or modifying unit files (for example, .service or .socket).
      
      systemctl enable --now containerd.service
      systemctl start containerd.service
      systemctl status containerd.service
    7. Run the following commands to configure crictl:

      wget https://github.com/kubernetes-sigs/cri-tools/releases/download/v1.28.0/crictl-v1.28.0-linux-amd64.tar.gz
      tar xf crictl-v*-linux-amd64.tar.gz -C /usr/bin/
      # Generate the configuration file.
      cat > /etc/crictl.yaml <<EOF
      runtime-endpoint: unix:///run/containerd/containerd.sock
      image-endpoint: unix:///run/containerd/containerd.sock
      timeout: 10
      debug: false
      EOF
      
      # Test the configuration.
      systemctl restart containerd
      crictl info
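
The sed commands that modify /etc/containerd/config.toml change nothing if a pattern does not match, for example when a different containerd version generates a slightly different default configuration. The following sketch checks that the three edits took effect. The function name check_containerd_config is hypothetical, and the patterns assume the values used in this topic.

```shell
#!/bin/sh
# Hypothetical helper: confirm that the three sed edits are present in a
# containerd configuration file (default: /etc/containerd/config.toml).
check_containerd_config() {
  cfg="${1:-/etc/containerd/config.toml}"
  rc=0
  grep -Eq 'SystemdCgroup[[:space:]]*=[[:space:]]*true' "$cfg" \
    || { echo "SystemdCgroup is not set to true" >&2; rc=1; }
  grep -Eq 'sandbox_image[[:space:]]*=[[:space:]]*"m\.daocloud\.io/registry\.k8s\.io/' "$cfg" \
    || { echo "sandbox_image does not use the mirror" >&2; rc=1; }
  grep -Eq 'config_path[[:space:]]*=[[:space:]]*"/etc/containerd/certs\.d"' "$cfg" \
    || { echo "config_path does not point to certs.d" >&2; rc=1; }
  return "$rc"
}
```

Running `check_containerd_config` without arguments checks the default path and returns a non-zero status if any of the settings is missing.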
  4. Install kubelet and kube-proxy.

    1. Obtain the binary files. Log on to the master node and copy the binary files to the current node. In the following commands, $NODE is the address of the current node.

      scp /usr/local/bin/kube{let,-proxy} $NODE:/usr/local/bin/
    2. Obtain the certificates. Run the following command on the current node to create a directory for the certificates:

      mkdir -p /etc/kubernetes/pki

      Log on to the master node and copy the certificates to the current node.

      for FILE in pki/ca.pem pki/ca-key.pem pki/front-proxy-ca.pem bootstrap-kubelet.kubeconfig kube-proxy.kubeconfig; do
        scp /etc/kubernetes/$FILE $NODE:/etc/kubernetes/${FILE}
      done
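    3. Run the following command to configure the kubelet service. The heredoc delimiter is unquoted, so the shell expands the ${ALIBABA_CLOUD_*} variables when it writes the file. Before you run the command, export these variables with the values that you recorded from /var/log/acs/init.log earlier in this step.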

      mkdir -p /var/lib/kubelet /var/log/kubernetes /etc/systemd/system/kubelet.service.d /etc/kubernetes/manifests/
      
      # Configure the kubelet service on all Kubernetes nodes.
      cat > /usr/lib/systemd/system/kubelet.service << EOF
      
      [Unit]
      Description=Kubernetes Kubelet
      Documentation=https://github.com/kubernetes/kubernetes
      After=network-online.target firewalld.service containerd.service
      Wants=network-online.target
      Requires=containerd.service
      
      [Service]
      ExecStart=/usr/local/bin/kubelet \\
          --node-ip=${ALIBABA_CLOUD_NODE_NAME} \\
          --hostname-override=${ALIBABA_CLOUD_NODE_NAME} \\
          --node-labels=${ALIBABA_CLOUD_LABELS} \\
          --provider-id=${ALIBABA_CLOUD_PROVIDER_ID} \\
          --register-with-taints=${ALIBABA_CLOUD_TAINTS} \\
          --bootstrap-kubeconfig=/etc/kubernetes/bootstrap-kubelet.kubeconfig  \\
          --kubeconfig=/etc/kubernetes/kubelet.kubeconfig \\
          --config=/etc/kubernetes/kubelet-conf.yml \\
          --container-runtime-endpoint=unix:///run/containerd/containerd.sock
      
      [Install]
      WantedBy=multi-user.target
      EOF
    4. Run the following command to create the kubelet startup configuration file:

      cat > /etc/kubernetes/kubelet-conf.yml <<EOF
      apiVersion: kubelet.config.k8s.io/v1beta1
      kind: KubeletConfiguration
      address: 0.0.0.0
      port: 10250
      readOnlyPort: 10255
      authentication:
        anonymous:
          enabled: false
        webhook:
          cacheTTL: 2m0s
          enabled: true
        x509:
          clientCAFile: /etc/kubernetes/pki/ca.pem
      authorization:
        mode: Webhook
        webhook:
          cacheAuthorizedTTL: 5m0s
          cacheUnauthorizedTTL: 30s
      cgroupDriver: systemd
      cgroupsPerQOS: true
      clusterDNS:
      - 10.96.0.10
      clusterDomain: cluster.local
      containerLogMaxFiles: 5
      containerLogMaxSize: 10Mi
      contentType: application/vnd.kubernetes.protobuf
      cpuCFSQuota: true
      cpuManagerPolicy: none
      cpuManagerReconcilePeriod: 10s
      enableControllerAttachDetach: true
      enableDebuggingHandlers: true
      enforceNodeAllocatable:
      - pods
      eventBurst: 10
      eventRecordQPS: 5
      evictionHard:
        imagefs.available: 15%
        memory.available: 100Mi
        nodefs.available: 10%
        nodefs.inodesFree: 5%
      evictionPressureTransitionPeriod: 5m0s
      failSwapOn: true
      fileCheckFrequency: 20s
      hairpinMode: promiscuous-bridge
      healthzBindAddress: 127.0.0.1
      healthzPort: 10248
      httpCheckFrequency: 20s
      imageGCHighThresholdPercent: 85
      imageGCLowThresholdPercent: 80
      imageMinimumGCAge: 2m0s
      iptablesDropBit: 15
      iptablesMasqueradeBit: 14
      kubeAPIBurst: 10
      kubeAPIQPS: 5
      makeIPTablesUtilChains: true
      maxOpenFiles: 1000000
      maxPods: 110
      nodeStatusUpdateFrequency: 10s
      oomScoreAdj: -999
      podPidsLimit: -1
      registryBurst: 10
      registryPullQPS: 5
      resolvConf: /etc/resolv.conf
      rotateCertificates: true
      runtimeRequestTimeout: 2m0s
      serializeImagePulls: true
      staticPodPath: /etc/kubernetes/manifests
      streamingConnectionIdleTimeout: 4h0m0s
      syncFrequency: 1m0s
      volumeStatsAggPeriod: 1m0s
      EOF
    5. Run the following commands to start the kubelet:

      systemctl daemon-reload
      # Reloads systemd units. Required after creating or modifying unit files (for example, .service or .socket).
      
      systemctl enable --now kubelet.service
      systemctl start kubelet.service
      systemctl status kubelet.service
    6. Run the following command to view cluster information:

      kubectl get node
    7. Log on to the master node and get the kubeconfig file required by kube-proxy.

      scp /etc/kubernetes/kube-proxy.kubeconfig $NODE:/etc/kubernetes/kube-proxy.kubeconfig
    8. Run the following command to add the kube-proxy service configuration:

      cat >  /usr/lib/systemd/system/kube-proxy.service << EOF
      [Unit]
      Description=Kubernetes Kube Proxy
      Documentation=https://github.com/kubernetes/kubernetes
      After=network.target
      
      [Service]
      ExecStart=/usr/local/bin/kube-proxy \\
        --config=/etc/kubernetes/kube-proxy.yaml \\
        --v=2
      Restart=always
      RestartSec=10s
      
      [Install]
      WantedBy=multi-user.target
      
      EOF
    9. Run the following command to add the kube-proxy startup configuration:

      cat > /etc/kubernetes/kube-proxy.yaml << EOF
      apiVersion: kubeproxy.config.k8s.io/v1alpha1
      bindAddress: 0.0.0.0
      clientConnection:
        acceptContentTypes: ""
        burst: 10
        contentType: application/vnd.kubernetes.protobuf
        kubeconfig: /etc/kubernetes/kube-proxy.kubeconfig
        qps: 5
      clusterCIDR: 172.16.0.0/12,fc00:2222::/112
      configSyncPeriod: 15m0s
      conntrack:
        max: null
        maxPerCore: 32768
        min: 131072
        tcpCloseWaitTimeout: 1h0m0s
        tcpEstablishedTimeout: 24h0m0s
      enableProfiling: false
      healthzBindAddress: 0.0.0.0:10256
      hostnameOverride: ""
      iptables:
        masqueradeAll: false
        masqueradeBit: 14
        minSyncPeriod: 0s
        syncPeriod: 30s
      ipvs:
        masqueradeAll: true
        minSyncPeriod: 5s
        scheduler: "rr"
        syncPeriod: 30s
      kind: KubeProxyConfiguration
      metricsBindAddress: 127.0.0.1:10249
      mode: "ipvs"
      nodePortAddresses: null
      oomScoreAdj: -999
      portRange: ""
      udpIdleTimeout: 250ms
      EOF
    10. Run the following commands to start kube-proxy:

      systemctl daemon-reload
      # Reloads systemd units. Required after creating or modifying unit files (for example, .service or .socket).
      
      systemctl enable --now kube-proxy.service
      systemctl restart kube-proxy.service
      systemctl status kube-proxy.service
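
The kubelet.service and kube-proxy.service files above are written with an unquoted heredoc (<< EOF). In that form the shell substitutes ${...} variables and turns each \\ into a single backslash before the file is written. The following sketch reproduces this with a shortened ExecStart line written to a file of your choice, so the result can be inspected without touching the real unit file. The function name render_kubelet_flags is hypothetical.

```shell
#!/bin/sh
# Hypothetical demo: write a shortened ExecStart line to the file given as $1
# to show how an unquoted heredoc is expanded.
render_kubelet_flags() {
  out="$1"
  cat > "$out" << EOF
ExecStart=/usr/local/bin/kubelet \\
    --hostname-override=${ALIBABA_CLOUD_NODE_NAME} \\
    --node-labels=${ALIBABA_CLOUD_LABELS} \\
    --provider-id=${ALIBABA_CLOUD_PROVIDER_ID}
EOF
}

# Example (values are placeholders; use the ones recorded from init.log):
#   export ALIBABA_CLOUD_NODE_NAME=... ALIBABA_CLOUD_LABELS=... ALIBABA_CLOUD_PROVIDER_ID=...
#   render_kubelet_flags /tmp/kubelet-flags.preview && cat /tmp/kubelet-flags.preview
```

If a variable is unset when the heredoc runs, the corresponding flag is written with an empty value, which is easy to spot in the preview file.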
  5. Sync the node pool status.

    1. Log on to the ACK console. In the left navigation pane, click Clusters.

    2. On the Clusters page, click the name of your cluster. In the left navigation pane, click Nodes > Node Pools.

    3. On the Node Pools page, click Sync Node Pool. After the synchronization is complete, verify that no failure messages are displayed and the node pool is in a normal state.

  6. Export the custom image.

    1. Log on to the ECS console.

    2. In the left-side navigation pane, choose Instances & Images > Instances.

    3. Click the ID of the instance. On the Instance Details tab, click Create custom image.

    4. In the left-side navigation pane, choose Instances & Images > Images.

    5. On the Images page, verify that the custom image you created is listed and that its Status is Available.

Step 3: Modify or create node pool with custom image

Note

If you already have a custom image and skipped Step 1 and Step 2, you must create a node pool using the custom image. For more information, see How do I create a custom image from an existing ECS instance and use the image to create nodes?.

  1. On the Clusters page, click the name of your cluster. In the left navigation pane, click Nodes > Node Pools.

  2. On the Node Pools page, find the target node pool and click Edit in the Actions column. On the Advanced tab, set the node pool image to Custom Image. Then, click Select custom image and select the custom image that you created.

  3. After the update is complete, the Operating System column for the node pool on the Node Pools page displays Custom Image and the image ID. This indicates that the image has been replaced.

Step 4: Update node init script for cloud parameters

Note
  • You must remove the residual kubelet certificates from the custom image, as shown in the seventh line of the script.

  • For existing custom node pools, you must configure the download URL for the custom script as described in Step 1.

  1. Create or update the join-ecs-node.sh file with the following content. Because the custom image already contains the required tools and dependencies, the custom script only needs to receive and update the Alibaba Cloud node pool parameters.

    echo "The node providerid is $ALIBABA_CLOUD_PROVIDER_ID"
    echo "The node name is $ALIBABA_CLOUD_NODE_NAME"
    echo "The node labels are $ALIBABA_CLOUD_LABELS"
    echo "The node taints are $ALIBABA_CLOUD_TAINTS"
    systemctl stop kubelet.service
    echo "Delete old kubelet pki" # The old node certificates must be deleted.
    rm -rf /var/lib/kubelet/pki/*
    echo "Add kubelet service config"
    # Configure the kubelet service.
    cat > /usr/lib/systemd/system/kubelet.service << EOF
    
    [Unit]
    Description=Kubernetes Kubelet
    Documentation=https://github.com/kubernetes/kubernetes
    After=network-online.target firewalld.service containerd.service
    Wants=network-online.target
    Requires=containerd.service
    
    [Service]
    ExecStart=/usr/local/bin/kubelet \\
        --node-ip=${ALIBABA_CLOUD_NODE_NAME} \\
        --hostname-override=${ALIBABA_CLOUD_NODE_NAME} \\
        --node-labels=${ALIBABA_CLOUD_LABELS} \\
        --provider-id=${ALIBABA_CLOUD_PROVIDER_ID} \\
        --register-with-taints=${ALIBABA_CLOUD_TAINTS} \\
        --bootstrap-kubeconfig=/etc/kubernetes/bootstrap-kubelet.kubeconfig  \\
        --kubeconfig=/etc/kubernetes/kubelet.kubeconfig \\
        --config=/etc/kubernetes/kubelet-conf.yml \\
        --container-runtime-endpoint=unix:///run/containerd/containerd.sock
    
    [Install]
    WantedBy=multi-user.target
    EOF
    
    systemctl daemon-reload
    # Start the kubelet service.
    systemctl start kubelet.service
  2. Upload the updated join-ecs-node.sh script to OSS.
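
Before you upload the script, it can be checked locally. The following sketch parses the script without running it and confirms that the certificate cleanup line is present. The function name validate_join_script is hypothetical, and the check assumes that bash is installed on the machine where you edit the script.

```shell
#!/bin/sh
# Hypothetical pre-upload check for join-ecs-node.sh.
validate_join_script() {
  script="$1"
  # bash -n parses the script and reports syntax errors without executing it.
  bash -n "$script" || return 1
  # The old node certificates must be deleted, so require the cleanup line.
  if ! grep -q 'rm -rf /var/lib/kubelet/pki' "$script"; then
    echo "certificate cleanup is missing" >&2
    return 1
  fi
}
```

For example, `validate_join_script join-ecs-node.sh && echo "ready to upload"` prints the message only when both checks pass.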

Step 5: Scale out the node pool

  1. On the Clusters page, click the name of your cluster. In the left navigation pane, click Nodes > Node Pools.

  2. On the Node Pools page, find the target node pool. In the Actions column, choose More > Scale to add a new node.

    Verify that both nodes are in a normal state. This confirms the successful creation of the elastic node pool.

  3. You can configure an auto scaling policy for the node pool. For more information, see Configure auto scaling.
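
The node state can also be checked from the command line after the scale-out. The following sketch counts the nodes whose STATUS column does not start with Ready. The function name count_not_ready is hypothetical, and it reads the output of kubectl get nodes --no-headers from standard input.

```shell
#!/bin/sh
# Hypothetical helper: count nodes whose STATUS (second column) does not
# start with "Ready". Input: output of `kubectl get nodes --no-headers`.
count_not_ready() {
  awk '$2 !~ /^Ready/ { n++ } END { print n + 0 }'
}

# Example:
#   kubectl get nodes --no-headers | count_not_ready
```

A result of 0 means that every listed node reports Ready. Nodes that are cordoned (Ready,SchedulingDisabled) are still counted as Ready by this sketch.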