Before you use GPU topology-aware scheduling, you must install and configure the required components. This topic describes how to install these components and enable this feature in your cluster.
Prerequisites
- You have created an ACK managed cluster, and the cluster contains nodes that use GPU-accelerated instance types.
- You have obtained the kubeconfig file of the cluster and used kubectl to connect to the cluster.
- The components in your cluster meet the following version requirements:
Component        | Version requirement
---------------- | ------------------------------------------------------------
Kubernetes       | 1.18.8 or later
NVIDIA driver    | 418.87.01 or later
NCCL             | 2.7 or later
Operating system | CentOS 7.6, CentOS 7.7, Ubuntu 16.04, Ubuntu 18.04, Alibaba Cloud Linux 2, or Alibaba Cloud Linux 3
GPU              | V100
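The numeric minimums in the version requirements above can be checked with a short script before you deploy. The following is a minimal sketch, not an official tool: it assumes you collect the versions yourself, for example the server version from `kubectl version`, the driver version from `nvidia-smi --query-gpu=driver_version --format=csv,noheader` on a GPU node, and the NCCL version bundled in your training image. The dictionary keys (`kubernetes`, `nvidia_driver`, `nccl`) are hypothetical names chosen for this example. The operating system and GPU model are not numeric minimums, so the sketch does not check them.

```python
"""Compare observed component versions with the minimums in the prerequisites."""
from __future__ import annotations

# Minimum versions copied from the prerequisites table.
# The keys are hypothetical labels used only by this sketch.
MINIMUM_VERSIONS = {
    "kubernetes": "1.18.8",
    "nvidia_driver": "418.87.01",
    "nccl": "2.7",
}


def parse_version(text: str) -> tuple[int, ...]:
    """Turn '1.18.8' or 'v1.20.11-aliyun.1' into a tuple such as (1, 20, 11)."""
    text = text.strip().lstrip("v")
    # Drop vendor or build suffixes such as '-aliyun.1' or '+abc'.
    core = text.split("-", 1)[0].split("+", 1)[0]
    return tuple(int(part) for part in core.split("."))


def at_least(actual: str, minimum: str) -> bool:
    """Return True if `actual` is the same as or newer than `minimum`."""
    a, m = parse_version(actual), parse_version(minimum)
    # Pad the shorter tuple with zeros so that '2.7' compares equal to '2.7.0'.
    width = max(len(a), len(m))
    a += (0,) * (width - len(a))
    m += (0,) * (width - len(m))
    return a >= m


def check_versions(actual: dict[str, str]) -> dict[str, bool]:
    """Map each supplied component that has a known minimum to True (OK) or False (too old)."""
    return {
        name: at_least(actual[name], minimum)
        for name, minimum in MINIMUM_VERSIONS.items()
        if name in actual
    }


if __name__ == "__main__":
    # Example values only; replace them with the versions your cluster reports.
    observed = {
        "kubernetes": "v1.20.11-aliyun.1",
        "nvidia_driver": "470.82.01",
        "nccl": "2.7.8",
    }
    for component, ok in check_versions(observed).items():
        print(f"{component}: {'OK' if ok else 'too old'}")
```

Versions are compared numerically, component by component, rather than as strings, so that a driver version such as 470.82.01 is correctly treated as newer than 418.87.01.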
Procedure
1. Log on to the ACK console. In the left navigation pane, click Clusters.
2. On the Clusters page, click the name of your cluster. In the left navigation pane, click .
3. On the Cloud-native AI Suite page, click Deploy.
4. On the Deploy Cloud-native AI Suite page, in the Scheduling section, select Scheduling Policy Extension (Batch Task Scheduling, GPU Sharing, Topology-aware GPU Scheduling). Then, click Deploy Cloud-native AI Suite at the bottom of the page. For more information about the configuration items for deploying the Cloud-native AI Suite, see Install Cloud-native AI Suite.
   After the deployment is complete, the GPU topology-aware scheduling component ack-ai-installer appears in the Components section.
Note: If you have already deployed the Cloud-native AI Suite, find the scheduling component ack-ai-installer in the component list and click Deploy in the Actions column to install it.
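After the deployment completes, you may also want to confirm from the command line that the component's pods are running. The following is a minimal sketch under stated assumptions: it reads the JSON output of `kubectl get pods --all-namespaces -o json` and filters pods by a name fragment. This topic does not list the workload names that ack-ai-installer creates, so the fragment you pass in is an assumption you must supply yourself, for example from the component details shown in the console. The helper names `fetch_pod_list_json`, `pods_matching`, and `all_running` are hypothetical.

```python
"""Summarize pod phases from `kubectl get pods --all-namespaces -o json` output."""
from __future__ import annotations

import json
import subprocess


def fetch_pod_list_json() -> str:
    """Run kubectl and return its JSON output.

    Requires kubectl to be configured against the cluster (see Prerequisites).
    """
    return subprocess.run(
        ["kubectl", "get", "pods", "--all-namespaces", "-o", "json"],
        check=True,
        capture_output=True,
        text=True,
    ).stdout


def pods_matching(pod_list_json: str, name_fragment: str) -> dict[str, str]:
    """Map 'namespace/name' to status.phase for pods whose name contains name_fragment."""
    data = json.loads(pod_list_json)
    result: dict[str, str] = {}
    for pod in data.get("items", []):
        meta = pod.get("metadata", {})
        name = meta.get("name", "")
        if name_fragment in name:
            key = f"{meta.get('namespace', '')}/{name}"
            result[key] = pod.get("status", {}).get("phase", "Unknown")
    return result


def all_running(statuses: dict[str, str]) -> bool:
    """Return True only if at least one pod matched and every matched pod is Running."""
    return bool(statuses) and all(phase == "Running" for phase in statuses.values())
```

For example, `all_running(pods_matching(fetch_pod_list_json(), "<name-fragment>"))` returns True only when at least one matching pod exists and all matching pods are in the Running phase. Treating "no matching pods" as not ready avoids reporting success when the fragment is wrong or the component has not been scheduled yet.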