ACK-supported NVIDIA driver versions

When installing an NVIDIA driver on a node, ensure the driver version is supported by ACK.

CUDA

CUDA is a parallel computing platform and programming model introduced by NVIDIA in 2007. It uses a graphics processing unit (GPU) to significantly accelerate computing performance.

The following figure shows the CUDA architecture. The differences between the Driver API and the Runtime API in the CUDA software stack are as follows:

  • Driver API: Offers more comprehensive features but is more complex to use.

  • Runtime API: Wraps several Driver API functions, hiding certain initialization operations, which makes it easier to use.

The NVIDIA Driver package provides the CUDA Driver API, while the CUDA Toolkit package provides the CUDA Library and CUDA Runtime.

[Figure: CUDA architecture (cuda.png)]
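To make the split between the Driver API and the Runtime API concrete, the following is a minimal Python sketch that asks the Driver API for its version through cuDriverGetVersion. It assumes a Linux node where the NVIDIA driver has installed libcuda.so.1; the helper names decode_cuda_version and driver_api_version are illustrative and not part of any ACK or NVIDIA tooling. CUDA encodes versions as 1000 * major + 10 * minor, so 12060 means 12.6.

```python
import ctypes
from typing import Optional, Tuple


def decode_cuda_version(encoded: int) -> Tuple[int, int]:
    """Decode CUDA's integer version (1000 * major + 10 * minor)."""
    return encoded // 1000, (encoded % 1000) // 10


def driver_api_version() -> Optional[Tuple[int, int]]:
    """Return the CUDA Driver API version, or None if no driver library is found.

    Assumption: the driver library is loadable as libcuda.so.1 (Linux).
    """
    try:
        libcuda = ctypes.CDLL("libcuda.so.1")
    except OSError:
        return None  # No NVIDIA driver library on this machine.
    value = ctypes.c_int(0)
    # cuDriverGetVersion returns 0 (CUDA_SUCCESS) on success.
    if libcuda.cuDriverGetVersion(ctypes.byref(value)) != 0:
        return None
    return decode_cuda_version(value.value)


if __name__ == "__main__":
    print("Driver API version:", driver_api_version())
```

The Runtime API exposes the analogous cudaRuntimeGetVersion call, but it lives in the CUDA Toolkit's runtime library rather than in the driver package.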

Driver and cluster version compatibility

The following table lists the NVIDIA GPU driver versions supported by each ACK cluster version.

Important
  • For ACK Lingjun clusters and Lingjun nodes in ACK managed Pro clusters, GPU drivers are built into the OS image. You cannot use node labels to install a specific GPU driver version. This restriction also applies to edge node pools in ACK Edge clusters.

  • Driver versions 510 and later may occasionally cause XID 119 or XID 120 errors. If this issue occurs, see "How do I troubleshoot GPU card failures caused by XID 119 or XID 120 errors?"

  • Driver version 550 resolves issues such as frequent XID 119, 120, and 31 errors, and kernel panics in certain applications. We recommend that you upgrade existing GPU nodes to driver version 550.

  • ACK periodically updates the default driver version. As a result, newly scaled-out GPU nodes in your cluster might use a different driver version. To prevent this, we recommend that you specify a driver version for your node pool. For more information, see Customize the GPU driver version for a node by specifying a version number.

  • When you create a node pool, if the specified driver version is not listed in Driver and OS kernel version compatibility, ACK automatically installs the default driver version. If you specify a driver version that is incompatible with the latest operating system, the node addition may fail. In this case, you must select the latest supported driver version.

  • After you upgrade the OS kernel version, the GPU driver installed on the node may become unavailable. To resolve this, remove the node from the node pool and then re-add it, or manually upgrade the GPU driver on a node.

  • When using the monitoring add-on with a 570-series or later driver, ensure the ack-arms-prometheus version is 1.1.33 or later and the ack-gpu-exporter version is 2.3.0 or later.

  • If you customize the GPU driver version for your node pool by specifying a version number or by using an OSS URL, the OS and the driver may become incompatible after the OS image is updated. See Supported NVIDIA driver versions in ACK to select a compatible driver.

  • For the gn9t instance family, do not use driver versions earlier than 570.153.02. Earlier versions may frequently cause GPU device disconnections. Symptoms include:

    • Running the nvidia-smi command reports fewer GPUs than are physically installed, or outputs No devices were found.

    • Running the lspci | grep -i nvidia command still detects the device, but the device status shows [rev b0].
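The gn9t checks described above can be scripted. The following Python sketch is one possible approach, not an official ACK tool: it compares a driver version against 570.153.02 and compares the number of GPUs listed by nvidia-smi -L with the number of NVIDIA devices reported by lspci. The function names and the sample output strings are hypothetical.

```python
import re

# Minimum driver version stated above for the gn9t instance family.
MIN_GN9T_DRIVER = (570, 153, 2)


def version_tuple(version):
    """Turn '570.153.02' into (570, 153, 2) for numeric comparison."""
    return tuple(int(part) for part in version.split("."))


def driver_too_old_for_gn9t(driver_version):
    return version_tuple(driver_version) < MIN_GN9T_DRIVER


def count_visible_gpus(nvidia_smi_list_output):
    """Count 'GPU <n>: ...' lines as printed by `nvidia-smi -L`."""
    return len(re.findall(r"^GPU \d+:", nvidia_smi_list_output, flags=re.MULTILINE))


def count_pci_gpus(lspci_output):
    """Count lines that `lspci | grep -i nvidia` would keep."""
    return sum(1 for line in lspci_output.splitlines() if "nvidia" in line.lower())


def missing_gpus(nvidia_smi_list_output, lspci_output):
    """Number of PCI devices that the driver no longer reports."""
    return count_pci_gpus(lspci_output) - count_visible_gpus(nvidia_smi_list_output)


if __name__ == "__main__":
    # Hypothetical sample output: two devices on the bus, one seen by the driver.
    smi = "GPU 0: NVIDIA Example (UUID: GPU-0000)\n"
    pci = (
        "00:07.0 3D controller: NVIDIA Corporation Device 0001 (rev a1)\n"
        "00:08.0 3D controller: NVIDIA Corporation Device 0001 (rev a1)\n"
    )
    print(missing_gpus(smi, pci))
    print(driver_too_old_for_gn9t("570.133.20"))
```

A positive result from missing_gpus on a gn9t node that runs a driver older than 570.153.02 matches the symptom pattern listed above.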

Cluster version

Default driver version

Custom driver support

Supported driver versions

1.28 and later

535.161.07

570.169 (for ecs.gn9t and ecs.ebmgn9t instance types)

Yes

  • 580.126.09

  • 570.195.03

  • 570.169

  • 570.133.20

  • 550.163.01

  • 550.144.03

  • 550.90.07

  • 535.230.02

  • 535.161.07

The following driver versions are incompatible with the latest operating systems.

  • 535.129.03

  • 525.147.05

  • 515.105.01

  • 510.108.03

  • 535.54.03

  • 525.105.17

  • 515.86.01

  • 510.47.03

  • 470.161.03

  • 470.103.01

  • 470.82.01

  • 470.57.02

  • 460.91.03

1.26

Yes

1.24

Yes

1.22

Yes

1.20

Yes

  • 450.119.04

  • 450.102.04

  • 450.82.02

  • 450.51.06

  • 418.181.07

  • 418.87.01

1.18.8

418.181.07

Yes

1.16.9

418.181.07

Yes

1.16.6

418.87.01

No

1.14.8

418.181.07

Yes
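As an illustration of how the table might be used when pinning a driver version for a node pool, the following Python sketch classifies a requested version against the lists given for clusters of version 1.28 and later. The label key ack.aliyun.com/nvidia-driver-version is an assumption that does not appear in this topic; confirm it in "Customize the GPU driver version for a node by specifying a version number" before you rely on it. The function names are hypothetical.

```python
# Driver versions listed above for ACK clusters of version 1.28 and later.
SUPPORTED = [
    "580.126.09", "570.195.03", "570.169", "570.133.20", "550.163.01",
    "550.144.03", "550.90.07", "535.230.02", "535.161.07",
]
# Versions listed above as incompatible with the latest operating systems.
INCOMPATIBLE_WITH_LATEST_OS = [
    "535.129.03", "525.147.05", "515.105.01", "510.108.03", "535.54.03",
    "525.105.17", "515.86.01", "510.47.03", "470.161.03", "470.103.01",
    "470.82.01", "470.57.02", "460.91.03",
]
# Assumption: the node label key used to pin a driver version.
DRIVER_LABEL_KEY = "ack.aliyun.com/nvidia-driver-version"


def classify_driver(version):
    """Return how the table above categorizes a driver version."""
    if version in SUPPORTED:
        return "supported"
    if version in INCOMPATIBLE_WITH_LATEST_OS:
        return "incompatible-with-latest-os"
    return "unlisted"


def node_pool_labels(version):
    """Build the node label for a supported version, or raise ValueError."""
    status = classify_driver(version)
    if status != "supported":
        raise ValueError(f"driver {version} is {status}")
    return {DRIVER_LABEL_KEY: version}


if __name__ == "__main__":
    print(node_pool_labels("550.144.03"))
    print(classify_driver("470.82.01"))
```

Rejecting unlisted versions up front avoids the silent fallback to the default driver that is described in the Important section.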

Driver and GPU compatibility

Information

gn8v

gn8is

gn7e

gn7i

gn7

gn6e

gn6i

gn6v

gn5i

gn5

Product Type

Data Center / Tesla

Data Center / Tesla

Data Center / Tesla

Data Center / Tesla

Data Center / Tesla

Data Center / Tesla

Data Center / Tesla

Data Center / Tesla

Data Center / Tesla

Data Center / Tesla

Product series

H-Series

L-Series

A-Series

A-Series

A-Series

V-Series

T-Series

V-Series

P-Series

P-Series

Recommended Tesla driver version

570.133.20 or later

450.80.02 or later

460.73.01 or later

450.80.02 or later

410.79 or later

Recommended CUDA Toolkit version

CUDA Toolkit 12.4 Update 1

CUDA Toolkit 11.0 Update 1

CUDA Toolkit 11.2

CUDA Toolkit 11.0 Update 1

CUDA Toolkit 10.1 Update 2

Note
  • The table lists GPU information for only some common GPU-accelerated compute-optimized instance types. Instances that have the same GPU card share the same GPU information, such as product type, product series, and product family. For example, ebmgn7i and gn7i instances both use NVIDIA A10 GPUs. Therefore, they have the same product type, product series, and product family.

  • When you manually install a Tesla driver and a CUDA package, make sure that the driver version is compatible with the CUDA package version. For more information, see CUDA Compatibility.

Driver and OS kernel version compatibility

For the mapping between kernel versions and OS image IDs, see the kernel version and image ID mapping table.

| Driver version | Alibaba Cloud Linux 2 | Alibaba Cloud Linux 3 | CentOS 7 | Ubuntu 22.04 |
| --- | --- | --- | --- | --- |
| 580.126.09 | Supported range: [4.19.81-17.1.al7.x86_64, ∞) | Supported range: [5.10.23-5.al8.x86_64, ∞) | Unsupported range: [3.10.0-862.14.4.el7.x86_64, ∞) | Supported range: [5.15.0-40-generic, ∞) |
| 570.195.03 | Supported range: [4.19.81-17.1.al7.x86_64, ∞) | Supported range: [5.10.23-5.al8.x86_64, ∞) | Unsupported range: [3.10.0-862.14.4.el7.x86_64, ∞) | Supported range: [5.15.0-40-generic, ∞) |
| 570.169 | Supported range: [4.19.81-17.1.al7.x86_64, ∞) | Supported range: [5.10.23-5.al8.x86_64, ∞) | Unsupported range: [3.10.0-862.14.4.el7.x86_64, ∞) | Supported range: [5.15.0-40-generic, ∞) |
| 570.133.20 | Supported range: [4.19.81-17.1.al7.x86_64, ∞) | Supported range: [5.10.23-5.al8.x86_64, ∞) | Unsupported range: [3.10.0-862.14.4.el7.x86_64, ∞) | Supported range: [5.15.0-40-generic, ∞) |
| 550.163.01 | Supported range: [4.19.81-17.1.al7.x86_64, ∞) | Supported range: [5.10.23-5.al8.x86_64, ∞) | Supported range: [3.10.0-862.14.4.el7.x86_64, ∞) | Supported range: [5.15.0-40-generic, ∞) |
| 550.144.03 | Supported range: [4.19.81-17.1.al7.x86_64, ∞) | Supported range: [5.10.23-5.al8.x86_64, ∞) | Supported range: [3.10.0-862.14.4.el7.x86_64, ∞) | Supported range: [5.15.0-40-generic, ∞) |
| 550.90.07 | Supported range: [4.19.81-17.1.al7.x86_64, ∞) | Supported range: [5.10.23-5.al8.x86_64, ∞) | Supported range: [3.10.0-862.14.4.el7.x86_64, ∞) | Supported range: [5.15.0-40-generic, ∞) |
| 550.54.15 | Supported range: [4.19.81-17.1.al7.x86_64, ∞) | Supported range: [5.10.23-5.al8.x86_64, ∞) | Supported range: [3.10.0-862.14.4.el7.x86_64, ∞) | Supported range: [5.15.0-40-generic, ∞) |
| 550.54.14 | Supported range: [4.19.81-17.1.al7.x86_64, ∞) | Supported range: [5.10.23-5.al8.x86_64, ∞) | Supported range: [3.10.0-862.14.4.el7.x86_64, ∞) | Supported range: [5.15.0-40-generic, ∞) |
| 535.247.01 | Supported range: [4.19.81-17.1.al7.x86_64, ∞) | Supported range: [5.10.23-5.al8.x86_64, ∞) | Supported range: [3.10.0-862.14.4.el7.x86_64, ∞) | Supported range: [5.15.0-40-generic, ∞) |
| 535.230.02 | Supported range: [4.19.81-17.1.al7.x86_64, ∞) | Supported range: [5.10.23-5.al8.x86_64, ∞) | Supported range: [3.10.0-862.14.4.el7.x86_64, ∞) | Supported range: [5.15.0-40-generic, ∞) |
| 535.161.07 | Supported range: [4.19.81-17.1.al7.x86_64, ∞) | Supported range: [5.10.23-5.al8.x86_64, ∞) | Supported range: [3.10.0-862.14.4.el7.x86_64, ∞) | Supported range: [5.15.0-40-generic, ∞) |
| 535.129.03 | Supported range: [4.19.81-17.1.al7.x86_64, ∞) | Supported range: [5.10.23-5.al8.x86_64, 5.10.134-17.3.al8.x86_64]; Unsupported range: [5.10.134-18.al8.x86_64, ∞) | Supported range: [3.10.0-862.14.4.el7.x86_64, ∞) | Supported range: [5.15.0-40-generic, 5.15.0-101-generic]; Unsupported range: [5.15.0-106-generic, ∞) |
| 535.98 | Supported range: [4.19.81-17.1.al7.x86_64, ∞) | Supported range: [5.10.23-5.al8.x86_64, 5.10.134-17.3.al8.x86_64]; Unsupported range: [5.10.134-18.al8.x86_64, ∞) | Supported range: [3.10.0-862.14.4.el7.x86_64, ∞) | Supported range: [5.15.0-40-generic, 5.15.0-101-generic]; Unsupported range: [5.15.0-106-generic, ∞) |
| 535.54.03 | Supported range: [4.19.81-17.1.al7.x86_64, ∞) | Supported range: [5.10.23-5.al8.x86_64, 5.10.134-17.3.al8.x86_64]; Unsupported range: [5.10.134-18.al8.x86_64, ∞) | Supported range: [3.10.0-862.14.4.el7.x86_64, ∞) | Supported range: [5.15.0-40-generic, 5.15.0-101-generic]; Unsupported range: [5.15.0-106-generic, ∞) |
| 525.147.05 | Supported range: [4.19.81-17.1.al7.x86_64, ∞) | Supported range: [5.10.23-5.al8.x86_64, 5.10.134-17.3.al8.x86_64]; Unsupported range: [5.10.134-18.al8.x86_64, ∞) | Supported range: [3.10.0-862.14.4.el7.x86_64, ∞) | Supported range: [5.15.0-40-generic, 5.15.0-101-generic]; Unsupported range: [5.15.0-106-generic, ∞) |
| 525.105.17 | Supported range: [4.19.81-17.1.al7.x86_64, ∞) | Supported range: [5.10.23-5.al8.x86_64, 5.10.134-17.3.al8.x86_64]; Unsupported range: [5.10.134-18.al8.x86_64, ∞) | Supported range: [3.10.0-862.14.4.el7.x86_64, ∞) | Supported range: [5.15.0-40-generic, 5.15.0-101-generic]; Unsupported range: [5.15.0-106-generic, ∞) |
| 515.105.01 | Supported range: [4.19.81-17.1.al7.x86_64, ∞) | Supported range: [5.10.23-5.al8.x86_64, 5.10.134-17.3.al8.x86_64]; Unsupported range: [5.10.134-18.al8.x86_64, ∞) | Supported range: [3.10.0-862.14.4.el7.x86_64, ∞) | Supported range: [5.15.0-40-generic, 5.15.0-101-generic]; Unsupported range: [5.15.0-106-generic, ∞) |
| 515.86.01 | Supported range: [4.19.81-17.1.al7.x86_64, ∞) | Supported range: [5.10.23-5.al8.x86_64, 5.10.134-17.3.al8.x86_64]; Unsupported range: [5.10.134-18.al8.x86_64, ∞) | Supported range: [3.10.0-862.14.4.el7.x86_64, ∞) | Supported range: [5.15.0-40-generic, 5.15.0-101-generic]; Unsupported range: [5.15.0-106-generic, ∞) |
| 510.108.03 | Supported range: [4.19.81-17.1.al7.x86_64, ∞) | Supported range: [5.10.23-5.al8.x86_64, 5.10.134-17.3.al8.x86_64]; Unsupported range: [5.10.134-18.al8.x86_64, ∞) | Supported range: [3.10.0-862.14.4.el7.x86_64, ∞) | Supported range: [5.15.0-40-generic, 5.15.0-101-generic]; Unsupported range: [5.15.0-106-generic, ∞) |
| 510.54 | Supported range: [4.19.81-17.1.al7.x86_64, ∞) | Supported range: [5.10.23-5.al8.x86_64, 5.10.134-14.al8.x86_64]; Unsupported range: [5.10.134-15.al8.x86_64, ∞) | Supported range: [3.10.0-862.14.4.el7.x86_64, ∞) | Supported range: [5.15.0-40-generic, 5.15.0-101-generic]; Unsupported range: [5.15.0-106-generic, ∞) |
| 510.47.03 | Supported range: [4.19.81-17.1.al7.x86_64, ∞) | Supported range: [5.10.23-5.al8.x86_64, 5.10.134-14.al8.x86_64]; Unsupported range: [5.10.134-15.al8.x86_64, ∞) | Supported range: [3.10.0-862.14.4.el7.x86_64, ∞) | Supported range: [5.15.0-40-generic, 5.15.0-101-generic]; Unsupported range: [5.15.0-106-generic, ∞) |
| 470.256.02 | Supported range: [4.19.81-17.1.al7.x86_64, ∞) | Supported range: [5.10.23-5.al8.x86_64, ∞) | Supported range: [3.10.0-862.14.4.el7.x86_64, ∞) | Supported range: [5.15.0-40-generic, ∞) |
| 470.161.03 | Supported range: [4.19.81-17.1.al7.x86_64, ∞) | Supported range: [5.10.23-5.al8.x86_64, 5.10.134-17.3.al8.x86_64]; Unsupported range: [5.10.134-18.al8.x86_64, ∞) | Supported range: [3.10.0-862.14.4.el7.x86_64, ∞) | Supported range: [5.15.0-40-generic, 5.15.0-101-generic]; Unsupported range: [5.15.0-106-generic, ∞) |
| 470.103.01 | Supported range: [4.19.81-17.1.al7.x86_64, ∞) | Supported range: [5.10.23-5.al8.x86_64, 5.10.134-14.al8.x86_64]; Unsupported range: [5.10.134-15.al8.x86_64, ∞) | Supported range: [3.10.0-862.14.4.el7.x86_64, ∞) | Supported range: [5.15.0-40-generic, 5.15.0-101-generic]; Unsupported range: [5.15.0-106-generic, ∞) |
| 470.82.01 | Supported range: [4.19.81-17.1.al7.x86_64, ∞) | Supported range: [5.10.23-5.al8.x86_64, 5.10.134-14.al8.x86_64]; Unsupported range: [5.10.134-15.al8.x86_64, ∞) | Supported range: [3.10.0-862.14.4.el7.x86_64, ∞) | Supported range: [5.15.0-40-generic, 5.15.0-101-generic]; Unsupported range: [5.15.0-106-generic, ∞) |
| 470.57.02 | Supported range: [4.19.81-17.1.al7.x86_64, ∞) | Supported range: [5.10.23-5.al8.x86_64, 5.10.134-14.al8.x86_64]; Unsupported range: [5.10.134-15.al8.x86_64, ∞) | Supported range: [3.10.0-862.14.4.el7.x86_64, ∞) | Unsupported range: [5.15.0-40-generic, ∞) |
| 460.106.00 | Supported range: [4.19.81-17.1.al7.x86_64, ∞) | Supported range: [5.10.23-5.al8.x86_64, 5.10.134-14.al8.x86_64]; Unsupported range: [5.10.134-15.al8.x86_64, ∞) | Supported range: [3.10.0-862.14.4.el7.x86_64, ∞) | Supported range: [5.15.0-40-generic, 5.15.0-101-generic]; Unsupported range: [5.15.0-106-generic, ∞) |
| 460.91.03 | Supported range: [4.19.81-17.1.al7.x86_64, ∞) | Supported range: [5.10.23-5.al8.x86_64, 5.10.134-14.al8.x86_64]; Unsupported range: [5.10.134-15.al8.x86_64, ∞) | Supported range: [3.10.0-862.14.4.el7.x86_64, ∞) | Unsupported range: [5.15.0-40-generic, ∞) |
| 460.73.01 | Supported range: [4.19.81-17.1.al7.x86_64, ∞) | Supported range: [5.10.23-5.al8.x86_64, 5.10.134-14.al8.x86_64]; Unsupported range: [5.10.134-15.al8.x86_64, ∞) | Supported range: [3.10.0-862.14.4.el7.x86_64, ∞) | Unsupported range: [5.15.0-40-generic, ∞) |
| 460.32.03 | Supported range: [4.19.81-17.1.al7.x86_64, ∞) | Supported range: [5.10.23-5.al8.x86_64, 5.10.134-14.al8.x86_64]; Unsupported range: [5.10.134-15.al8.x86_64, ∞) | Supported range: [3.10.0-862.14.4.el7.x86_64, ∞) | Unsupported range: [5.15.0-40-generic, ∞) |
| 450.119.04 | Supported range: [4.19.81-17.1.al7.x86_64, ∞) | Supported range: [5.10.23-5.al8.x86_64, 5.10.134-14.al8.x86_64]; Unsupported range: [5.10.134-15.al8.x86_64, ∞) | Supported range: [3.10.0-862.14.4.el7.x86_64, ∞) | Unsupported range: [5.15.0-40-generic, ∞) |
| 450.102.04 | Supported range: [4.19.81-17.1.al7.x86_64, ∞) | Supported range: [5.10.23-5.al8.x86_64, 5.10.134-14.al8.x86_64]; Unsupported range: [5.10.134-15.al8.x86_64, ∞) | Supported range: [3.10.0-862.14.4.el7.x86_64, ∞) | Unsupported range: [5.15.0-40-generic, ∞) |
| 450.80.02 | Supported range: [4.19.81-17.1.al7.x86_64, ∞) | Unsupported range: [5.10.23-5.al8.x86_64, ∞) | Supported range: [3.10.0-862.14.4.el7.x86_64, ∞) | Unsupported range: [5.15.0-40-generic, ∞) |
| 440.33.01 | Supported range: [4.19.81-17.1.al7.x86_64, ∞) | Unsupported range: [5.10.23-5.al8.x86_64, ∞) | Supported range: [3.10.0-862.14.4.el7.x86_64, ∞) | Unsupported range: [5.15.0-40-generic, ∞) |
| 418.181.07 | Supported range: [4.19.81-17.1.al7.x86_64, ∞) | Unsupported range: [5.10.23-5.al8.x86_64, ∞) | Supported range: [3.10.0-862.14.4.el7.x86_64, ∞) | Unsupported range: [5.15.0-40-generic, ∞) |
| 418.113 | Supported range: [4.19.81-17.1.al7.x86_64, ∞) | Unsupported range: [5.10.23-5.al8.x86_64, ∞) | Supported range: [3.10.0-862.14.4.el7.x86_64, ∞) | Unsupported range: [5.15.0-40-generic, ∞) |
| 418.87.01 | Supported range: [4.19.81-17.1.al7.x86_64, ∞) | Unsupported range: [5.10.23-5.al8.x86_64, ∞) | Supported range: [3.10.0-862.14.4.el7.x86_64, ∞) | Unsupported range: [5.15.0-40-generic, ∞) |
| 410.93 | Supported range: [4.19.81-17.1.al7.x86_64, 4.19.91-18.al7.x86_64]; Unsupported range: [4.19.91-19.1.al7.x86_64, ∞) | Unsupported range: [5.10.23-5.al8.x86_64, ∞) | Supported range: [3.10.0-862.14.4.el7.x86_64, 3.10.0-957.21.3.el7.x86_64]; Unsupported range: [3.10.0-1062.9.1.el7.x86_64, ∞) | Unsupported range: [5.15.0-40-generic, ∞) |
| 410.79 | Supported range: [4.19.81-17.1.al7.x86_64, 4.19.91-18.al7.x86_64]; Unsupported range: [4.19.91-19.1.al7.x86_64, ∞) | Unsupported range: [5.10.23-5.al8.x86_64, ∞) | Supported range: [3.10.0-862.14.4.el7.x86_64, 3.10.0-957.21.3.el7.x86_64]; Unsupported range: [3.10.0-1062.9.1.el7.x86_64, ∞) | Unsupported range: [5.15.0-40-generic, ∞) |
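To check a node's kernel (for example, the output of uname -r) against one of the ranges above, the kernel release strings have to be compared numerically rather than as text. The following Python sketch shows one way to do that. The helper names are hypothetical, and the parsing assumes release strings shaped like the ones in this table: a dotted version, a hyphen, then a dotted build number.

```python
import re
from typing import Optional, Tuple


def kernel_key(release: str) -> Tuple[int, ...]:
    """Convert '5.10.134-17.3.al8.x86_64' into (5, 10, 134, 17, 3)."""
    match = re.match(r"(\d+(?:\.\d+)*)-(\d+(?:\.\d+)*)", release)
    if match is None:
        raise ValueError(f"unrecognized kernel release: {release}")
    numbers = match.group(1).split(".") + match.group(2).split(".")
    return tuple(int(n) for n in numbers)


def in_supported_range(release: str, low: str, high: Optional[str] = None) -> bool:
    """Return True if low <= release <= high; high=None means no upper bound (∞)."""
    key = kernel_key(release)
    if key < kernel_key(low):
        return False
    return high is None or key <= kernel_key(high)


if __name__ == "__main__":
    # Driver 535.129.03 on Alibaba Cloud Linux 3, taken from the table above.
    low, high = "5.10.23-5.al8.x86_64", "5.10.134-17.3.al8.x86_64"
    print(in_supported_range("5.10.134-16.1.al8.x86_64", low, high))
    print(in_supported_range("5.10.134-18.al8.x86_64", low, high))
```

Comparing tuples of integers avoids the trap of string comparison, where "5.10.134-9" would sort after "5.10.134-18".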

Kernel version and image ID mapping

| Kernel version | Image ID |
| --- | --- |
| 5.15.0-106-generic | ubuntu_22_04_x64_20G_alibase_20240508.vhd |
| 5.15.0-101-generic | ubuntu_22_04_x64_20G_alibase_20240322.vhd |
| 5.15.0-40-generic | ubuntu_22_04_x64_20G_alibase_20220628.vhd |
| 5.10.134-18.al8.x86_64 | aliyun_3_x64_20G_container_optimized_20250117.vhd |
| 5.10.134-17.3.al8.x86_64 | aliyun_3_x64_20G_alibase_20241103.vhd |
| 5.10.134-15.al8.x86_64 | aliyun_3_x64_20G_alibase_20230727.vhd |
| 5.10.134-14.al8.x86_64 | aliyun_3_x64_20G_alibase_20230516.vhd |
| 5.10.23-5.al8.x86_64 | aliyun_3_x64_20G_alibase_20210425.vhd |
| 4.19.91-19.1.al7.x86_64 | aliyun_2_1903_x64_20G_alibase_20200529.vhd |
| 4.19.91-18.al7.x86_64 | aliyun_2_1903_x64_20G_alibase_20200324.vhd |
| 4.19.81-17.1.al7.x86_64 | aliyun_2_1903_x64_20G_alibase_20200221.vhd |
| 3.10.0-1062.9.1.el7.x86_64 | centos_7_7_x64_20G_alibase_20191225.vhd |
| 3.10.0-957.21.3.el7.x86_64 | centos_7_6_x64_20G_alibase_20211130.vhd |
| 3.10.0-862.14.4.el7.x86_64 | centos_7_5_x64_20G_alibase_20211130.vhd |

Driver and CUDA Toolkit compatibility

Select an NVIDIA driver version compatible with the CUDA Toolkit version your application uses. For the CUDA Toolkit and driver compatibility matrix, see the CUDA Toolkit Release Notes.

Checking CUDA Driver API version

If an NVIDIA driver is installed on a node, run the nvidia-smi command to view the driver version and the CUDA Driver API version. In the following output, the Driver Version field is 550.144.03 and the CUDA Version field is 12.6, which means that the driver supports CUDA Runtime API versions up to 12.6.

Mon Mar 24 08:51:55 2025       
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 550.144.03             Driver Version: 550.144.03     CUDA Version: 12.6     |
|-----------------------------------------+------------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|=========================================+========================+======================|
|   0  Tesla P4                       On  |   00000000:00:07.0 Off |                    0 |
| N/A   33C    P8              7W /   75W |       0MiB /   7680MiB |      0%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+
                                                                                         
+-----------------------------------------------------------------------------------------+
| Processes:                                                                              |
|  GPU   GI   CI        PID   Type   Process name                              GPU Memory |
|        ID   ID                                                               Usage      |
|=========================================================================================|
|  No running processes found                                                             |
+-----------------------------------------------------------------------------------------+
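If you want to read these two values in a script, the header line of the output can be parsed with a regular expression. The following Python sketch assumes the header layout shown in the output above; the function name is hypothetical. On a live node, nvidia-smi --query-gpu=driver_version --format=csv,noheader prints the driver version directly, but the CUDA Version field is easiest to take from the header.

```python
import re

HEADER_PATTERN = re.compile(
    r"Driver Version:\s*(?P<driver>[\d.]+)\s+CUDA Version:\s*(?P<cuda>[\d.]+)"
)


def parse_nvidia_smi_header(output):
    """Return (driver_version, cuda_version) from nvidia-smi text output."""
    match = HEADER_PATTERN.search(output)
    if match is None:
        raise ValueError("no 'Driver Version / CUDA Version' header found")
    return match.group("driver"), match.group("cuda")


if __name__ == "__main__":
    # Header line copied from the sample output above.
    sample = (
        "| NVIDIA-SMI 550.144.03             Driver Version: 550.144.03"
        "     CUDA Version: 12.6     |"
    )
    print(parse_nvidia_smi_header(sample))
```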

Runtime API version

NVIDIA provides official CUDA base images with the CUDA Toolkit pre-installed. Build your application container image on a base image that has the toolkit version your application requires.

When you use a GPU in a container, the CUDA base image of the application's Docker image determines its CUDA Runtime API version. For example, if your application's Docker image is built on the CUDA base image nvidia/cuda:12.2.0-base-ubuntu20.04, the application uses CUDA Runtime API version 12.2.0.
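The relationship between the image tag and the driver can be checked before you deploy. The following Python sketch reads the CUDA version from an image tag such as nvidia/cuda:12.2.0-base-ubuntu20.04 and compares it with the CUDA Version that nvidia-smi reports for the node. It is a simplified check under stated assumptions: the tag starts with the toolkit version, as the official CUDA base images do, and NVIDIA's minor-version and forward-compatibility mechanisms are ignored. The function names are hypothetical.

```python
import re


def cuda_version_from_image(image_ref):
    """Extract (major, minor) from a tag such as '12.2.0-base-ubuntu20.04'."""
    if ":" not in image_ref:
        raise ValueError(f"image reference has no tag: {image_ref}")
    tag = image_ref.rsplit(":", 1)[1]
    match = re.match(r"(\d+)\.(\d+)", tag)
    if match is None:
        raise ValueError(f"tag does not start with a CUDA version: {tag}")
    return int(match.group(1)), int(match.group(2))


def driver_supports_image(driver_cuda_version, image_ref):
    """True if the image's CUDA version is not newer than the driver's."""
    major, minor = driver_cuda_version.split(".")[:2]
    return cuda_version_from_image(image_ref) <= (int(major), int(minor))


if __name__ == "__main__":
    # "12.6" is the CUDA Version reported by nvidia-smi on the node.
    print(driver_supports_image("12.6", "nvidia/cuda:12.2.0-base-ubuntu20.04"))
```

If the check fails, either choose a base image with an older toolkit or install a newer driver that is listed as supported for your cluster version.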

References