Use ACS computing power in an ACK Pro cluster


Alibaba Cloud Container Compute Service (ACS) is integrated with Container Service for Kubernetes. You can use an ACK Pro cluster to quickly access the container computing power provided by ACS. This topic describes how to use ACS computing power in your ACK cluster.

How it works

Container Compute Service (ACS) is a container service that uses Kubernetes as its user interface and provides computing resources that comply with container specifications. ACS uses a layered architecture that separates the Kubernetes control plane from the underlying container computing power. The ACS computing resource layer is responsible for scheduling and allocating resources for Pods, while Kubernetes manages application workloads such as Deployments, Services, StatefulSets, and CronJobs on top of this layer.

You can connect ACS container computing power to a Kubernetes cluster as a virtual node. This gives your cluster powerful elasticity, unconstrained by the computing capacity of its nodes. When ACS takes over the management of the underlying infrastructure for Pods, Kubernetes no longer needs to directly handle the placement and startup of individual Pods or monitor the resource status of underlying virtual machines. ACS ensures the required Pod resources are always available.

Container Service for Kubernetes (ACK) is one of the world's first certified Kubernetes platforms and provides a high-performance management service for containerized applications. It integrates with Alibaba Cloud's virtualization, storage, networking, and security capabilities to simplify cluster creation and scaling, allowing you to focus on developing and managing your containerized applications.

In an ACK Pro cluster, you must manually deploy a virtual node before creating ACS Pods. When your cluster needs to scale out, you can create ACS Pods on the virtual node on demand, without planning for node capacity. ACS Pods can communicate with Pods on regular cluster nodes. For workloads that are long-running and have fluctuating traffic, we recommend scheduling them to the virtual node. This approach maximizes resource utilization, shortens scale-out times, and reduces costs. When traffic decreases, you can quickly release these Pods to lower costs. Each Pod on a virtual node runs as an ACS instance within a secure and isolated container environment. For more information, see Overview of ACK.

Prerequisites

  • If this is your first time using ACS, activate ACS and grant the required permissions. The ACK console prompts you to do so when you install the ACK Virtual Node component.

  • An ACK Pro cluster that runs Kubernetes 1.26 or later is required. For more information, see Create an ACK Pro cluster. For information about how to upgrade a cluster, see Upgrade an ACK cluster.

  • For ACK Pro clusters, the virtual node component (ACK Virtual Node) must meet the version requirement that corresponds to the Kubernetes version.

    Kubernetes version    ACK Virtual Node component version
    1.26 or later         v2.13.0 or later
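
The Kubernetes version prerequisite can also be checked from a terminal. The following is a minimal sketch, not an official tool: the function name meets_min_k8s_version is hypothetical, and it assumes Bash and a version string in the usual vMAJOR.MINOR.PATCH form.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeeds when a Kubernetes version string
# (for example "v1.28.3-aliyun.1") is 1.26 or later.
meets_min_k8s_version() {
  local v="${1#v}"                 # drop a leading "v"
  local major="${v%%.*}"           # text before the first dot
  local rest="${v#*.}"             # text after the first dot
  local minor="${rest%%[!0-9]*}"   # leading digits of the minor version
  [ "$major" -gt 1 ] || { [ "$major" -eq 1 ] && [ "$minor" -ge 26 ]; }
}

# Example use (assumes kubectl is configured for the cluster and jq is installed):
#   meets_min_k8s_version "$(kubectl version -o json | jq -r '.serverVersion.gitVersion')" \
#     && echo "Kubernetes version meets the 1.26 minimum"
```

The helper checks only the Kubernetes version. The ACK Virtual Node component version is still easiest to confirm in the console, as described in the next section.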

Install the ACK Virtual Node component

Perform the following steps:

  1. Log on to the ACK console. In the left navigation pane, click Clusters.

  2. On the Clusters page, click the name of your cluster. In the left navigation pane, click Components and Add-ons.

  3. On the Core Components tab, find the ACK Virtual Node component and click Install or Upgrade to the required version.

    You can also navigate to the Component Management page by choosing Operations > Component Management in the left-side navigation pane of the cluster details page.

  4. If you are prompted to Activate and Authorize ACS before you install ACK Virtual Node, follow the on-screen instructions. After you activate ACS and grant the required permissions, click OK to proceed with the installation.

  5. After the installation is complete, choose Nodes > Nodes in the left-side navigation pane. The name of the new virtual node starts with virtual-kubelet- by default.
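
The same check can be done with kubectl. This is a small sketch that assumes kubectl is already configured for the cluster and that virtual nodes carry the type=virtual-kubelet label shown in the node output later in this topic. The function name list_virtual_nodes is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: list the nodes that carry the virtual node label.
list_virtual_nodes() {
  kubectl get nodes -l type=virtual-kubelet -o wide
}

# Example use:
#   list_virtual_nodes
```

If the command returns no nodes, the component is probably not installed yet or the virtual node has not registered.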

Example: Use ACS CPU computing power

After the ACK Virtual Node component is installed or upgraded to the version specified in Prerequisites, it supports both ACS and ECI computing power.

Note

When you schedule a Pod to a virtual node, Elastic Container Instance (ECI) computing power is used by default unless you specify ACS.

To use ACS CPU computing power in ACK, perform the following steps:

  1. Schedule Pods to the virtual node by using nodeSelector, node affinity, or ResourcePolicy, or by adding the alibabacloud.com/acs: "true" label to the Pods. For more information, see Node affinity.

    Note

    Scheduling by using the alibabacloud.com/acs: "true" label is not supported in ACK Serverless clusters. It is currently supported in ACK Pro clusters, ACK dedicated clusters, ACK One registered clusters, and ACK Edge clusters.

  2. Specify the compute class for the ACS Pod by using the label alibabacloud.com/compute-class: <compute-type>. For more information about ACS instance types, see ACS Pod instances.
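
Node affinity is listed above as a scheduling method. The following is a minimal sketch of such a manifest, using the standard Kubernetes nodeAffinity fields with the virtual node label and taint that appear in the nodeSelector example in this topic. The Deployment name nginx-affinity and the file name are hypothetical, and the manifest has not been validated against a live ACK cluster.

```shell
#!/usr/bin/env bash
# Write a Deployment manifest that uses required node affinity
# (instead of nodeSelector) to target virtual nodes.
cat > nginx-affinity.yaml <<'EOF'
apiVersion: apps/v1
kind: Deployment
metadata:
  name: nginx-affinity            # hypothetical name
  labels:
    app: nginx-affinity
spec:
  replicas: 2
  selector:
    matchLabels:
      app: nginx-affinity
  template:
    metadata:
      labels:
        app: nginx-affinity
        alibabacloud.com/compute-class: general-purpose
        alibabacloud.com/compute-qos: default
    spec:
      affinity:
        nodeAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
            nodeSelectorTerms:
            - matchExpressions:
              - key: type          # label carried by virtual nodes
                operator: In
                values:
                - virtual-kubelet
      tolerations:
      - key: "virtual-kubelet.io/provider"   # taint on the virtual node
        operator: "Exists"
        effect: "NoSchedule"
      containers:
      - name: nginx
        image: registry.openanolis.cn/openanolis/nginx:1.14.1-8.6
        resources:
          limits:
            cpu: 2
          requests:
            cpu: 2
EOF

# Apply it once kubectl is configured for the cluster:
#   kubectl apply -f nginx-affinity.yaml
```

Required affinity behaves like nodeSelector here. A preferred rule (preferredDuringSchedulingIgnoredDuringExecution) would instead let Pods fall back to regular nodes.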

The following steps provide a detailed example:

  1. Deploy a Deployment.

    Important

    If you schedule Pods by adding the alibabacloud.com/acs: "true" label, StorageClasses that use the WaitForFirstConsumer volume binding mode are not supported. If an ACS Pod in an ACK cluster needs to mount a cloud disk, schedule the Pod to a virtual node by using nodeSelector or ResourcePolicy instead. For more information about how to configure ResourcePolicy, see ACK Pro clusters support hybrid scheduling of ECS and ACS computing power.

    NodeSelector

    1. Run the following command to view the labels of the virtual node. Replace virtual-kubelet-cn-hangzhou-k with your virtual node's name.

      kubectl get node virtual-kubelet-cn-hangzhou-k -oyaml

      The following output is a snippet of the labels section:

      apiVersion: v1
      kind: Node
      metadata:
        labels:
          kubernetes.io/arch: amd64
          kubernetes.io/hostname: virtual-kubelet-cn-hangzhou-k
          kubernetes.io/os: linux
          kubernetes.io/role: agent
          service.alibabacloud.com/exclude-node: "true"
          topology.diskplugin.csi.alibabacloud.com/zone: cn-hangzhou-k
          topology.kubernetes.io/region: cn-hangzhou
          topology.kubernetes.io/zone: cn-hangzhou-k
          type: virtual-kubelet # Use this label to schedule Pods to virtual nodes.
        name: virtual-kubelet-cn-hangzhou-k
      spec:
        taints:
        - effect: NoSchedule
          key: virtual-kubelet.io/provider
          value: alibabacloud

    2. Create a file named nginx.yaml with the following content to deploy two Pods.

      apiVersion: apps/v1
      kind: Deployment
      metadata:
        name: nginx
        labels:
          app: nginx
      spec:
        replicas: 2
        selector:
          matchLabels:
            app: nginx
        template:
          metadata:
            name: nginx
            labels:
              app: nginx 
              alibabacloud.com/compute-class: general-purpose # Specify the compute class for the ACS Pod. Default: general-purpose.
              alibabacloud.com/compute-qos: default # Specify the QoS class for the ACS Pod. Default: default.
          spec:
            nodeSelector:
              type: virtual-kubelet # Schedule Pods to a virtual node.
            tolerations:
            - key: "virtual-kubelet.io/provider" # Tolerate the taint on the virtual node. 
              operator: "Exists"
              effect: "NoSchedule"
            containers:
            - name: nginx
              image: registry.openanolis.cn/openanolis/nginx:1.14.1-8.6
              resources:
                limits:
                  cpu: 2
                requests:
                  cpu: 2

    3. Create the NGINX application and check the deployment result.

      1. Run the following command to create the NGINX application.

        kubectl apply -f nginx.yaml

      2. Run the following command to check the deployment result.

        kubectl get pods -o wide

        Expected output:

        NAME                    READY   STATUS    RESTARTS   AGE   IP               NODE                            NOMINATED NODE   READINESS GATES
        nginx-9cdf7bbf9-s****   1/1     Running   0          36s   10.0.6.68        virtual-kubelet-cn-hangzhou-j   <none>           <none>
        nginx-9cdf7bbf9-v****   1/1     Running   0          36s   10.0.6.67        virtual-kubelet-cn-hangzhou-k   <none>           <none>

        The output shows that the nodeSelector scheduled the two Pods to nodes with the label type=virtual-kubelet.

    Pod label scheduling

    1. Create a file named nginx.yaml with the following content.

      apiVersion: apps/v1
      kind: Deployment
      metadata:
        name: nginx
        labels:
          app: nginx
      spec:
        replicas: 2
        selector:
          matchLabels:
            app: nginx
        template:
          metadata:
            labels:
              app: nginx 
              alibabacloud.com/acs: "true" # Configure the Pod to use ACS computing power.
              alibabacloud.com/compute-class: general-purpose # Specify the compute class for the ACS Pod. Default: general-purpose.
              alibabacloud.com/compute-qos: default # Specify the QoS class for the ACS Pod. Default: default.
          spec:
            containers:
            - name: nginx
              image: registry.openanolis.cn/openanolis/nginx:1.14.1-8.6
              resources:
                limits:
                  cpu: 2
                requests:
                  cpu: 2

    2. Create the NGINX application and check the deployment result.

      1. Run the following command to create the NGINX application.

        kubectl apply -f nginx.yaml

      2. Run the following command to check the deployment result.

        kubectl get pods -o wide

        Expected output:

        NAME                    READY   STATUS    RESTARTS   AGE   IP               NODE                            NOMINATED NODE   READINESS GATES
        nginx-9cdf7bbf9-s****   1/1     Running   0          36s   10.0.6.68        virtual-kubelet-cn-hangzhou-j   <none>           <none>
        nginx-9cdf7bbf9-v****   1/1     Running   0          36s   10.0.6.67        virtual-kubelet-cn-hangzhou-k   <none>           <none>

        The output shows that the Pods are scheduled to virtual nodes, as specified by the alibabacloud.com/acs: "true" label.

  2. Check the details of an NGINX Pod to confirm that it is an ACS Pod instance.

    1. Run the following command to view the details of an NGINX Pod.

      kubectl describe pod nginx-9cdf7bbf9-s**** 

      Expected output (key information):

      Annotations:      ProviderCreate: done
                        alibabacloud.com/client-token: edf29202-54ac-438e-9626-a1ca007xxxxx
                        alibabacloud.com/instance-id: acs-2ze008giupcyaqbxxxxx
                        alibabacloud.com/pod-ephemeral-storage: 30Gi
                        alibabacloud.com/pod-use-spec: 2-4Gi
                        alibabacloud.com/request-id: A0EF3BF3-37E7-5A07-AC2D-68A0CFCxxxxx
                        alibabacloud.com/schedule-result: finished
                        alibabacloud.com/user-id: 14889995898xxxxx
                        kubernetes.io/pod-stream-port: 10250
                        kubernetes.io/preferred-scheduling-node: virtual-kubelet-cn-hangzhou-j/1
                        kubernetes.io/resource-type: serverless

      The alibabacloud.com/instance-id: acs-2ze008giupcyaqbxxxxx annotation confirms that the Pod is an ACS Pod instance.
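
The annotation check above can be scripted with a kubectl JSONPath query. This is a sketch under stated assumptions: the function names are hypothetical, the Pod is in the current namespace, and an instance ID that starts with acs- is treated as the marker of an ACS Pod, based on the example output above.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the instance ID annotation of a Pod.
# Dots inside the annotation key must be escaped in kubectl JSONPath.
acs_instance_id() {
  kubectl get pod "$1" \
    -o jsonpath='{.metadata.annotations.alibabacloud\.com/instance-id}'
}

# Hypothetical helper: succeed only when the instance ID starts with "acs-".
is_acs_pod() {
  case "$(acs_instance_id "$1")" in
    acs-*) return 0 ;;
    *)     return 1 ;;
  esac
}

# Example use (replace the Pod name with your own):
#   is_acs_pod nginx-9cdf7bbf9-s**** && echo "ACS Pod"
```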

Example: Use ACS GPU computing power

The process for using ACS GPU computing power is similar to that for ACS CPU computing power, but it requires specific component versions and some additional configuration.

Component configuration

For ACK Pro clusters of different Kubernetes versions, the kube-scheduler component must meet the following version requirements.

Kubernetes version    kube-scheduler version
1.26 or later         • For 1.31 clusters, the scheduler version must be v1.31.0-aliyun.6.8.4.8f585f26 or later.
                      • For 1.30 clusters, the scheduler version must be v1.30.3-aliyun.6.8.4.946f90e8 or later.
                      • For 1.28 clusters, the scheduler version must be v1.28.12-aliyun-6.8.4.b27c0009 or later.
                      • For 1.26 clusters, the scheduler version must be v1.26.3-aliyun-6.8.4.4b180111 or later.

Usage

...
labels:
  # Declare the ACS GPU resource requirement in the labels.
  alibabacloud.com/compute-class: gpu     # For GPU types, use the fixed value 'gpu'.
  alibabacloud.com/compute-qos: default   # The QoS class. This has the same meaning as for regular ACS computing power.
  alibabacloud.com/gpu-model-series: example-model  # The GPU model series. Replace with your actual model, such as T4.
...

  1. The following examples show three different ways to configure GPU computing power.

    NodeSelector

    Use the following YAML to create a GPU workload.

    apiVersion: apps/v1
    kind: Deployment
    metadata:
      name: dep-node-selector-demo
      labels:
        app: node-selector-demo
    spec:
      replicas: 1
      selector:
        matchLabels:
          app: node-selector-demo
      template:
        metadata:
          labels:
            app: node-selector-demo
            # ACS attributes
            alibabacloud.com/compute-class: gpu
            alibabacloud.com/compute-qos: default
            alibabacloud.com/gpu-model-series: example-model  # The GPU model series. Replace with your actual model, such as T4.
        spec:
          # Specify the label for the virtual node.
          nodeSelector:
            type: virtual-kubelet
          # Tolerate the virtual node's taint.
          tolerations:
          - key: "virtual-kubelet.io/provider" # Tolerate the taint on the virtual node.
            operator: "Exists"
            effect: "NoSchedule"
          containers:
          - name: node-selector-demo
            image: registry-cn-hangzhou.ack.aliyuncs.com/acs/stress:v1.0.4
            command:
            - "sleep"
            - "1000h"
            resources:
              limits:
                cpu: 1
                memory: 1Gi
                nvidia.com/gpu: "1"
              requests:
                cpu: 1
                memory: 1Gi
                nvidia.com/gpu: "1"

    ResourcePolicy

    Use the following YAML to create a GPU workload.

    apiVersion: scheduling.alibabacloud.com/v1alpha1
    kind: ResourcePolicy
    metadata:
      name: dep-rp-demo
      namespace: default
    spec:
      selector:
        app: dep-rp-demo
      units:
      - resource: acs
        podLabels:
          alibabacloud.com/compute-class: gpu
          alibabacloud.com/compute-qos: default
          alibabacloud.com/gpu-model-series: example-model  # The GPU model series. Replace with your actual model, such as T4.
    ---
    apiVersion: apps/v1
    kind: Deployment
    metadata:
      name: dep-rp-demo
      labels:
        app: dep-rp-demo
      annotations:
        resourcePolicy: "dep-rp-demo"  # Reference the name of the ResourcePolicy.
    spec:
      replicas: 1
      selector:
        matchLabels:
          app: dep-rp-demo
      template:
        metadata:
          labels:
            app: dep-rp-demo
            alibabacloud.com/compute-class: gpu
            alibabacloud.com/compute-qos: default
            alibabacloud.com/gpu-model-series: example-model  # The GPU model series. Replace with your actual model, such as T4.
        spec:
          containers:
          - name: demo
            image: registry-cn-hangzhou.ack.aliyuncs.com/acs/stress:v1.0.4
            command:
            - "sleep"
            - "1000h"
            resources:
              limits:
                cpu: 1
                memory: 1Gi
                nvidia.com/gpu: "1"
              requests:
                cpu: 1
                memory: 1Gi
                nvidia.com/gpu: "1"

    For more information about using ResourcePolicy for resource scheduling, see Custom resource priority scheduling.

    Pod label scheduling

    Use the following YAML to create a GPU workload.

    apiVersion: apps/v1
    kind: Deployment
    metadata:
      name: dep-node-selector-demo
      labels:
        app: node-selector-demo
    spec:
      replicas: 1
      selector:
        matchLabels:
          app: node-selector-demo
      template:
        metadata:
          labels:
            app: node-selector-demo
            # ACS attributes
            alibabacloud.com/acs: "true" # Configure the Pod to use ACS computing power.
            alibabacloud.com/compute-class: gpu
            alibabacloud.com/compute-qos: default
            alibabacloud.com/gpu-model-series: example-model  # The GPU model series. Replace with your actual model, such as T4.
        spec:
          containers:
          - name: node-selector-demo
            image: registry-cn-hangzhou.ack.aliyuncs.com/acs/stress:v1.0.4
            command:
            - "sleep"
            - "1000h"
            resources:
              limits:
                cpu: 1
                memory: 1Gi
                nvidia.com/gpu: "1"
              requests:
                cpu: 1
                memory: 1Gi
                nvidia.com/gpu: "1"

  2. Run the following command to check the running status of the GPU workload. Replace the Pod name with the name of your own Pod.

    kubectl get pod dep-node-selector-demo-9cdf7bbf9-s**** -oyaml

    Expected output (key information):

        phase: Running
        resources:
          limits:
            #other resources
            nvidia.com/gpu: "1"
          requests:
            #other resources
            nvidia.com/gpu: "1"
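
To read only the two fields of interest instead of scanning the full YAML, a JSONPath query can be used. This is a sketch: the function name pod_phase_and_gpu is hypothetical, and it assumes the GPU limit is declared on the first container under the nvidia.com/gpu resource name, as in the examples above.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print "<phase> <gpu limit>" for a Pod in the
# current namespace, for example "Running 1".
pod_phase_and_gpu() {
  kubectl get pod "$1" \
    -o jsonpath='{.status.phase}{" "}{.spec.containers[0].resources.limits.nvidia\.com/gpu}{"\n"}'
}

# Example use (replace the Pod name with your own):
#   pod_phase_and_gpu dep-node-selector-demo-9cdf7bbf9-s****
```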

Example: Use ACS GPU HPN computing power

The process for using ACS GPU HPN computing power is similar to that for ACS CPU computing power, but with the following requirements:

  • This feature is supported only in ACK Pro clusters, ACK One registered clusters, and ACK One distributed workflow Argo clusters.

  • You must purchase GPU-HPN capacity reservations in advance and associate them with your cluster.

  • The kube-scheduler version must meet the following requirements:

    Kubernetes version    kube-scheduler version
    1.28                  v1.28.12-aliyun-6.9.3.cd73f3fe or later
    1.30                  v1.30.3-aliyun.6.9.3.ce7e2faf or later
    1.31                  v1.31.0-aliyun.6.9.3.051bb0e8 or later
    1.32                  v1.32.0-aliyun.6.9.3.515ac311 or later
    1.33                  v1.33.0-aliyun.6.9.4.8b58e6b4 or later

  • The ACK Virtual Node component must be v2.15.0 or later.

Usage

...     
labels:
  # Declare the ACS GPU resource requirement in the labels.
  alibabacloud.com/compute-class: gpu-hpn     # Must be set to gpu-hpn.
  alibabacloud.com/compute-qos: default   # The QoS class. This has the same meaning as for regular ACS computing power.
...

Note

  • For more information about ACS compute classes and QoS classes, see Relationship between compute classes and QoS classes.

  • For information about other parameters for ACS Pods, see Configure an ACS Pod.

  • An ACS GPU HPN node accepts only Pods of the gpu-hpn compute class. You do not need to specify GPU resource requirements in the Pod resource declaration for these Pods. Pods of other compute classes, and Pods that declare no compute class, cannot be scheduled to the node.

  1. You can use a Kubernetes nodeSelector to schedule Pods to GPU HPN nodes.

    Important

    When you configure an ACS GPU HPN Pod, note the following fields:

    • Specify the compute class: alibabacloud.com/compute-class: gpu-hpn.

    • Specify the reserved node label: alibabacloud.com/node-type: reserved.

    • In the requests and limits fields of the resource specification, use the device resource name that matches your actual accelerator type, for example nvidia.com/gpu for NVIDIA GPUs.

    apiVersion: apps/v1
    kind: Deployment
    metadata:
      name: dep-node-selector-demo
      labels:
        app: node-selector-demo
    spec:
      replicas: 1
      selector:
        matchLabels:
          app: node-selector-demo
      template:
        metadata:
          labels:
            app: node-selector-demo
            # ACS attributes
            alibabacloud.com/compute-class: gpu-hpn
            alibabacloud.com/compute-qos: default
        spec:
          # Specify the label for GPU HPN reserved nodes.
          nodeSelector:
            alibabacloud.com/node-type: reserved
          containers:
          - name: node-selector-demo
            image: registry-cn-hangzhou.ack.aliyuncs.com/acs/stress:v1.0.4
            command:
            - "sleep"
            - "1000h"
            resources:
              limits:
                cpu: 1
                memory: 1Gi
                nvidia.com/gpu: "1" # Use the resource name that matches your actual GPU model.
              requests:
                cpu: 1
                memory: 1Gi
                nvidia.com/gpu: "1" # Use the resource name that matches your actual GPU model.

  2. Run the following command to check the running status of the GPU workload. Replace the Pod name with the name of your own Pod.

    kubectl get pod dep-node-selector-demo-9cdf7bbf9-s**** -oyaml

    Expected output (key information):

        phase: Running
        resources:
          limits:
            #other resources
            nvidia.com/gpu: "1"
          requests:
            #other resources
            nvidia.com/gpu: "1"
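
To see where gpu-hpn Pods landed, the Pods can be listed by their compute class label together with the node each one runs on. This is a sketch that assumes kubectl is configured for the cluster and that the Pods are in the current namespace. The function name list_gpu_hpn_pods is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: list Pods labeled with the gpu-hpn compute class,
# showing each Pod's name, node, and phase.
list_gpu_hpn_pods() {
  kubectl get pods -l alibabacloud.com/compute-class=gpu-hpn \
    -o custom-columns=NAME:.metadata.name,NODE:.spec.nodeName,PHASE:.status.phase
}

# Example use:
#   list_gpu_hpn_pods
```

Each node in the NODE column should be a reserved GPU HPN node, that is, a node that carries the alibabacloud.com/node-type: reserved label.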