Manually install the Tesla driver (Linux)

A Tesla driver is required for a GPU to deliver high performance for computing workloads, such as deep learning and AI, and for graphics acceleration, such as OpenGL and cloud gaming. If you did not install a Tesla driver when you created your GPU-accelerated compute-optimized instance, you must install it manually.

Procedure

This topic applies to all GPU-accelerated compute-optimized instances that run a Linux operating system. For more information, see GPU-accelerated compute-optimized instance families (gn, ebm, and scc series). You must install a Tesla driver compatible with your instance's operating system.

Step 1: Download the NVIDIA Tesla driver

  1. Go to the NVIDIA driver download page.

    Note

    For more information about how to install and configure NVIDIA drivers, see the NVIDIA Driver Installation Quickstart Guide.

  2. Set the search conditions and click SEARCH to find a suitable driver.

    [Figure: Tesla driver search conditions]

    The following table describes the search conditions.

    | Setting | Description | Example |
    | --- | --- | --- |
    | Product type, Product series, Product family | Select the product type, product series, and product family that correspond to the GPU model of your instance type. | Data Center / Tesla, A-Series, NVIDIA A10 |
    | Operating system | Select the Linux operating system version that matches the image used by your instance. | Linux 64-bit |
    | CUDA Toolkit | Select a CUDA Toolkit version. | 11.4 |
    | Language | Select a language for the driver. | Chinese (Simplified) |

    Note

    For more information about how to view the details of a GPU instance, such as the instance ID, instance type, and operating system, see View instance information.

    GPU information, supported driver versions, and CUDA versions for some GPU-accelerated compute-optimized instance types

    | Item | gn8v | gn8is | gn7e | gn7i | gn7 | gn6e | gn6i | gn6v | gn5i | gn5 |
    | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
    | Product type | Data Center / Tesla | Data Center / Tesla | Data Center / Tesla | Data Center / Tesla | Data Center / Tesla | Data Center / Tesla | Data Center / Tesla | Data Center / Tesla | Data Center / Tesla | Data Center / Tesla |
    | Product series | H-Series | L-Series | A-Series | A-Series | A-Series | V-Series | T-Series | V-Series | P-Series | P-Series |

    Recommended Tesla driver version

    570.133.20 or later

    450.80.02 or later

    460.73.01 or later

    450.80.02 or later

    410.79 or later

    Recommended CUDA Toolkit version

    CUDA Toolkit 12.4 Update 1

    CUDA Toolkit 11.0 Update 1

    CUDA Toolkit 11.2

    CUDA Toolkit 11.0 Update 1

    CUDA Toolkit 10.1 Update 2

    Note
    • The preceding table lists GPU information for only some common GPU-accelerated compute-optimized instance types. Instances that have the same GPU card share the same GPU information (product type, product series, and product family). For example, both ebmgn7i and gn7i instances use the NVIDIA A10 GPU, so they share the same product type, product series, and product family.

    • When you manually install a Tesla driver and a CUDA package, ensure that the driver version is compatible with the CUDA package version. For more information, see CUDA Compatibility.

  3. On the search results page, click BETA, OLDER DRIVERS, AND MORE.

  4. Find the driver that you want to download and click View next to the driver.

    For example, select Data Center Driver for Linux x64 with driver version 470.161.03 and CUDA Toolkit version 11.4.

  5. On the driver details page, right-click Download and select Copy Link Address.

    驱动下载.jpg

  6. Connect to the GPU instance that runs Linux.

    For more information, see Connect to a Linux instance by using Workbench.

  7. Run the following wget command to download the driver installation package.

    Replace the URL in the following example with the link that you copied in step 5 of this procedure.

    wget --referer=https://www.nvidia.cn/ https://cn.download.nvidia.com/tesla/470.161.03/NVIDIA-Linux-x86_64-470.161.03.run
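As a minimal sketch of how the copied link relates to the version requirements in the table above, the following shell functions extract the driver version from the installer URL and compare it with a minimum version. The function names `driver_version_from_url` and `version_ge` are hypothetical helpers, not part of any NVIDIA tool, and the sketch assumes the `NVIDIA-Linux-x86_64-<version>.run` naming shown in this topic and a GNU `sort` that supports `-V`.

```shell
# Print the driver version embedded in a .run installer URL or file name.
# Assumes the NVIDIA-Linux-x86_64-<version>.run naming used in this topic.
driver_version_from_url() {
    file=${1##*/}                        # drop everything up to the last "/"
    case "$file" in
        NVIDIA-Linux-x86_64-*.run)
            version=${file#NVIDIA-Linux-x86_64-}
            printf '%s\n' "${version%.run}"
            ;;
        *)
            echo "unrecognized installer name: $file" >&2
            return 1
            ;;
    esac
}

# Succeed when version $1 is greater than or equal to version $2,
# using the version ordering of GNU "sort -V".
version_ge() {
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}
```

For example, `version_ge "$(driver_version_from_url "$url")" 460.73.01` could be used to confirm that a link stored in a hypothetical `url` variable points to a driver that meets a 460.73.01 minimum before you download it.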

Step 2: Install the NVIDIA Tesla driver

The installation method varies based on the operating system.

CentOS

  1. Run the following command to check whether the kernel-devel and kernel-headers packages are installed on the GPU instance.

    sudo rpm -qa | grep "$(uname -r)"
    • If output similar to the following is returned, the packages are installed.

      kernel-3.10.0-1062.18.1.el7.x86_64
      kernel-devel-3.10.0-1062.18.1.el7.x86_64
      kernel-headers-3.10.0-1062.18.1.el7.x86_64
    • If kernel-devel-* and kernel-headers-* are not included in the output, you must download and install the corresponding versions of the kernel-devel and kernel-headers packages.

      Important

      A mismatch between the kernel-devel and kernel versions causes a compilation error during the driver installation. Before you download the kernel-devel package, check the version number of kernel-* in the output to ensure version consistency. In the example output, the kernel version is 3.10.0-1062.18.1.el7.x86_64.
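The version check described in this step can be sketched as a small shell function. The name `kernel_build_packages_present` is a hypothetical helper. It assumes that the package list uses the `name-version-release.arch` form printed by `rpm -qa`, so that each required package name is exactly `kernel-devel-` or `kernel-headers-` followed by the output of `uname -r`.

```shell
# Read an installed-package list (one package per line, as printed by
# "rpm -qa") on stdin and check that kernel-devel and kernel-headers
# exist for the given kernel release. Missing packages go to stderr.
kernel_build_packages_present() {
    release=$1
    packages=$(cat)
    missing=0
    for name in kernel-devel kernel-headers; do
        # -x matches the whole line, -F treats the pattern as a fixed string.
        if ! printf '%s\n' "$packages" | grep -qxF "$name-$release"; then
            echo "missing: $name-$release" >&2
            missing=1
        fi
    done
    return "$missing"
}

# Possible usage on the instance:
#   rpm -qa | kernel_build_packages_present "$(uname -r)"
```

Matching the full package name rather than only the release string avoids treating a kernel-devel package for a different kernel build as sufficient.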

  2. Grant execute permission on the installer and install the Tesla driver.

    We recommend the .run format of the Tesla driver for 64-bit Linux operating systems. In the following commands, replace xxxx with the version number in the name of the file that you downloaded.

    Note

    If you use a Tesla driver in another format, such as .deb or .rpm, see the NVIDIA CUDA Installation Guide for Linux for installation instructions.

    sudo chmod +x NVIDIA-Linux-x86_64-xxxx.run
    sudo sh NVIDIA-Linux-x86_64-xxxx.run
  3. Run the following command to verify that the Tesla driver is installed.

    nvidia-smi

    If output similar to the following is returned, the Tesla driver is installed.

    [Figure: nvidia-smi output showing the driver version]
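For a scripted version of this check, nvidia-smi can print only the driver version, one line per GPU, with `--query-gpu=driver_version --format=csv,noheader`. The following sketch compares that output with the version you expect. The function name `all_gpus_report_version` is a hypothetical helper, and the sketch assumes that each output line contains only the version string.

```shell
# Read one driver version per line on stdin (one line per GPU). Succeed only
# if at least one line is present and every line equals the expected version.
all_gpus_report_version() {
    expected=$1
    seen=0
    while IFS= read -r line; do
        [ -n "$line" ] || continue       # ignore blank lines
        seen=1
        [ "$line" = "$expected" ] || return 1
    done
    [ "$seen" -eq 1 ]
}

# Possible usage on the instance:
#   nvidia-smi --query-gpu=driver_version --format=csv,noheader \
#       | all_gpus_report_version 470.161.03
```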

  4. (Optional) Enable Persistence Mode by using the NVIDIA Persistence Daemon.

    After the Tesla driver is installed, Persistence-M is disabled (off) by default. Because the Tesla driver performs more stably when Persistence-M is enabled, we recommend that you enable it by using the NVIDIA Persistence Daemon. For more information, see Persistence Daemon.

    Note
    • Persistence Mode is a user-configurable driver property that keeps the target GPU initialized even when no clients are connected to the GPU.

    • If you enable Persistence Mode by using the nvidia-smi -pm 1 command, the setting is lost after the instance is rebooted. For more information, see Persistence Mode is lost and ECC or MIG settings fail after a GPU instance is rebooted. We recommend that you enable Persistence Mode by using the NVIDIA Persistence Daemon instead.

    1. Run the following command to start the NVIDIA Persistence Daemon.

      sudo nvidia-persistenced --user username 
      # Replace username with your username.
    2. Run the following command to check the status of Persistence Mode.

      nvidia-smi

      If output similar to the following is returned, Persistence-M is enabled (on).

      [Figure: nvidia-smi output with Persistence-M enabled]
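On instances with several GPUs, the Persistence-M state can also be read per GPU with `nvidia-smi --query-gpu=index,persistence_mode --format=csv,noheader`. The following sketch lists the GPUs that are not yet in Persistence Mode. The function name `gpus_without_persistence` is a hypothetical helper, and the sketch assumes output lines of the form `0, Enabled`.

```shell
# Read "index, persistence_mode" lines on stdin and print the index of
# every GPU whose persistence mode is anything other than "Enabled".
gpus_without_persistence() {
    awk -F', *' '$2 != "Enabled" { print $1 }'
}

# Possible usage on the instance:
#   nvidia-smi --query-gpu=index,persistence_mode --format=csv,noheader \
#       | gpus_without_persistence
```

An empty result means that every GPU reported Persistence Mode as enabled.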

  5. (Optional) Configure Persistence Mode to start on boot.

    The enabled (on) state of Persistence-M is lost when the system restarts. To re-enable it automatically at boot, install the NVIDIA Persistence Daemon as a system service.

    The Tesla driver installation package places the NVIDIA-provided sample and installer scripts in the /usr/share/doc/NVIDIA_GLX-1.0/samples/nvidia-persistenced-init.tar.bz2 archive.

    1. Run the following commands to decompress and install the NVIDIA-provided scripts.

      cd /usr/share/doc/NVIDIA_GLX-1.0/samples/
      sudo tar xf nvidia-persistenced-init.tar.bz2
      cd nvidia-persistenced-init
      sudo sh install.sh
    2. Run the following command to check whether the NVIDIA Persistence Daemon is running as expected.

      sudo systemctl status nvidia-persistenced

      If output similar to the following is returned, the NVIDIA Persistence Daemon is running as expected.

      [Figure: systemctl status output for the NVIDIA Persistence Daemon]

      Note

      You can adapt the NVIDIA Persistence Daemon installation script to your operating system to ensure that the daemon works as expected.

    3. Run the following command to confirm that the Persistence-M property is enabled (on).

      nvidia-smi
    4. (Optional) Run the following commands to stop the NVIDIA Persistence Daemon.

      You can stop the NVIDIA Persistence Daemon if it is no longer required.

      sudo systemctl stop nvidia-persistenced
      sudo systemctl disable nvidia-persistenced
  6. (Conditionally required) If your GPU instance belongs to the ebmgn8v, ebmgn7, ebmgn7e, ebmgn7ex, or sccgn7ex instance family, install the nvidia-fabricmanager service that corresponds to your driver version.

    Important
    • For instances in the ebmgn8v, ebmgn7, ebmgn7e, ebmgn7ex, or sccgn7ex instance family, you cannot use the GPU unless the corresponding nvidia-fabricmanager service is installed.

    • If the GPU instance family is not ebmgn8v, ebmgn7, ebmgn7e, ebmgn7ex, or sccgn7ex, skip this step.

    1. Install the nvidia-fabricmanager service.

      You can install the nvidia-fabricmanager service from the NVIDIA package repository or from a downloaded installation package. The following examples are for CentOS 7.x and CentOS 8.x. In the commands, replace the value of driver_version with the version number of the driver that you downloaded in Step 1: Download the NVIDIA Tesla driver. The examples use 460.91.03.

      • package repository

        • CentOS 7.x

          driver_version=460.91.03
          sudo yum -y install yum-utils
          sudo yum-config-manager --add-repo https://developer.download.nvidia.com/compute/cuda/repos/rhel7/x86_64/cuda-rhel7.repo
          sudo yum install -y nvidia-fabric-manager-${driver_version}-1
        • CentOS 8.x

          driver_version=460.91.03
          driver_version_main=$(echo $driver_version | awk -F '.' '{print $1}')
          distribution=rhel8
          ARCH=$( /bin/arch )
          sudo dnf config-manager --add-repo https://developer.download.nvidia.com/compute/cuda/repos/$distribution/${ARCH}/cuda-$distribution.repo
          sudo dnf module enable -y nvidia-driver:${driver_version_main}
          sudo dnf install -y nvidia-fabric-manager-0:${driver_version}-1
      • installation package

        • CentOS 7.x

          driver_version=460.91.03
          sudo wget https://developer.download.nvidia.com/compute/cuda/repos/rhel7/x86_64/nvidia-fabric-manager-${driver_version}-1.x86_64.rpm
          sudo rpm -ivh nvidia-fabric-manager-${driver_version}-1.x86_64.rpm
        • CentOS 8.x

          driver_version=460.91.03
          sudo wget https://developer.download.nvidia.com/compute/cuda/repos/rhel8/x86_64/nvidia-fabric-manager-${driver_version}-1.x86_64.rpm
          sudo rpm -ivh nvidia-fabric-manager-${driver_version}-1.x86_64.rpm
    2. Run the following commands to start the nvidia-fabricmanager service.

      sudo systemctl enable nvidia-fabricmanager
      sudo systemctl start nvidia-fabricmanager
    3. Run the following command to check whether the nvidia-fabricmanager service is installed.

      systemctl status nvidia-fabricmanager

      If output similar to the following is returned, the nvidia-fabricmanager service is installed.

      [Figure: systemctl status output for the nvidia-fabricmanager service]
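The installation-package commands for CentOS differ only in the driver version and the RHEL major version in the URL. As a rough sketch of that pattern, the following function builds the package URL from both values. The function name `fabricmanager_rpm_url` is a hypothetical helper. It only reproduces the URL layout used in the preceding examples and does not check that a package exists for a given version.

```shell
# Print the nvidia-fabric-manager RPM URL for a driver version and a
# RHEL/CentOS major version (7 or 8), following the layout shown above.
fabricmanager_rpm_url() {
    driver_version=$1
    rhel_major=$2
    base=https://developer.download.nvidia.com/compute/cuda/repos
    printf '%s/rhel%s/x86_64/nvidia-fabric-manager-%s-1.x86_64.rpm\n' \
        "$base" "$rhel_major" "$driver_version"
}

# Possible usage on the instance:
#   wget "$(fabricmanager_rpm_url 460.91.03 7)"
```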

Ubuntu and other operating systems

  1. Grant execute permission on the installer and install the Tesla driver.

    We recommend the .run format of the Tesla driver for 64-bit Linux operating systems. In the following commands, replace xxxx with the version number in the name of the file that you downloaded.

    Note

    If you use a Tesla driver in another format, such as .deb or .rpm, see the NVIDIA CUDA Installation Guide for Linux for installation instructions.

    sudo chmod +x NVIDIA-Linux-x86_64-xxxx.run
    sudo sh NVIDIA-Linux-x86_64-xxxx.run
  2. Run the following command to verify that the Tesla driver is installed.

    nvidia-smi

    If output similar to the following is returned, the Tesla driver is installed.

    [Figure: nvidia-smi output showing the driver version]

  3. (Optional) Enable Persistence Mode by using the NVIDIA Persistence Daemon.

    After the Tesla driver is installed, Persistence-M is disabled (off) by default. Because the Tesla driver performs more stably when Persistence-M is enabled, we recommend that you enable it by using the NVIDIA Persistence Daemon. For more information, see Persistence Daemon.

    Note
    • Persistence Mode is a user-configurable driver property that keeps the target GPU initialized even when no clients are connected to the GPU.

    • If you enable Persistence Mode by using the nvidia-smi -pm 1 command, the setting is lost after the instance is rebooted. For more information, see Persistence Mode is lost and ECC or MIG settings fail after a GPU instance is rebooted. We recommend that you enable Persistence Mode by using the NVIDIA Persistence Daemon instead.

    1. Run the following command to start the NVIDIA Persistence Daemon.

      sudo nvidia-persistenced --user username 
      # Replace username with your username.
    2. Run the following command to check the status of Persistence Mode.

      nvidia-smi

      If output similar to the following is returned, Persistence-M is enabled (on).

      [Figure: nvidia-smi output with Persistence-M enabled]

  4. (Optional) Configure Persistence Mode to start on boot.

    The enabled (on) state of Persistence-M is lost when the system restarts. To re-enable it automatically at boot, install the NVIDIA Persistence Daemon as a system service.

    The Tesla driver installation package places the NVIDIA-provided sample and installer scripts in the /usr/share/doc/NVIDIA_GLX-1.0/samples/nvidia-persistenced-init.tar.bz2 archive.

    1. Run the following commands to decompress and install the NVIDIA-provided scripts.

      cd /usr/share/doc/NVIDIA_GLX-1.0/samples/
      sudo tar xf nvidia-persistenced-init.tar.bz2
      cd nvidia-persistenced-init
      sudo sh install.sh
    2. Run the following command to check whether the NVIDIA Persistence Daemon is running as expected.

      sudo systemctl status nvidia-persistenced

      If output similar to the following is returned, the NVIDIA Persistence Daemon is running as expected.

      [Figure: systemctl status output for the NVIDIA Persistence Daemon]

      Note

      You can adapt the NVIDIA Persistence Daemon installation script to your operating system to ensure that the daemon works as expected.

    3. Run the following command to confirm that the Persistence-M property is enabled (on).

      nvidia-smi
    4. (Optional) Run the following commands to stop the NVIDIA Persistence Daemon.

      You can stop the NVIDIA Persistence Daemon if it is no longer required.

      sudo systemctl stop nvidia-persistenced
      sudo systemctl disable nvidia-persistenced
  5. (Conditionally required) If your GPU instance belongs to the ebmgn8v, ebmgn7, ebmgn7e, ebmgn7ex, or sccgn7ex instance family, install the nvidia-fabricmanager service that corresponds to your driver version.

    Important
    • For instances in the ebmgn8v, ebmgn7, ebmgn7e, ebmgn7ex, or sccgn7ex instance family, you cannot use the GPU unless the corresponding nvidia-fabricmanager service is installed.

    • If the GPU instance family is not ebmgn8v, ebmgn7, ebmgn7e, ebmgn7ex, or sccgn7ex, skip this step.

    1. Install the nvidia-fabricmanager service.

      You can install the nvidia-fabricmanager service from the NVIDIA package repository or from a downloaded installation package. The following examples are for Ubuntu 16.04, Ubuntu 18.04, Ubuntu 20.04, Ubuntu 22.04, and Ubuntu 24.04. In the commands, replace the value of driver_version with the version number of the driver that you downloaded in Step 1: Download the NVIDIA Tesla driver.

      Important
      • On Ubuntu 22.04, the nvidia-fabricmanager service requires a Tesla driver version later than 515.48.07. The following example for Ubuntu 22.04 uses driver version 535.154.05.

      • On Ubuntu 24.04, the nvidia-fabricmanager service requires a Tesla driver version later than 550.90.07. The following example for Ubuntu 24.04 uses driver version 570.133.20.

      • package repository

        Ubuntu 16.04, Ubuntu 18.04, or Ubuntu 20.04

        driver_version=460.91.03
        driver_version_main=$(echo $driver_version | awk -F '.' '{print $1}')
        distribution=$(. /etc/os-release;echo $ID$VERSION_ID | sed -e 's/\.//g')
        sudo wget https://developer.download.nvidia.com/compute/cuda/repos/$distribution/x86_64/cuda-$distribution.pin
        sudo mv cuda-$distribution.pin /etc/apt/preferences.d/cuda-repository-pin-600
        sudo wget https://developer.download.nvidia.com/compute/cuda/repos/$distribution/x86_64/3bf863cc.pub
        sudo apt-key add 3bf863cc.pub
        sudo rm 3bf863cc.pub
        echo "deb https://developer.download.nvidia.com/compute/cuda/repos/$distribution/x86_64 /" | sudo tee /etc/apt/sources.list.d/cuda.list
        sudo apt-get update
        sudo apt-get -y install nvidia-fabricmanager-${driver_version_main}=${driver_version}-*

        Ubuntu 22.04

        driver_version=535.154.05
        driver_version_main=$(echo $driver_version | awk -F '.' '{print $1}')
        distribution=$(. /etc/os-release;echo $ID$VERSION_ID | sed -e 's/\.//g')
        sudo wget https://developer.download.nvidia.com/compute/cuda/repos/$distribution/x86_64/cuda-$distribution.pin
        sudo mv cuda-$distribution.pin /etc/apt/preferences.d/cuda-repository-pin-600
        sudo wget https://developer.download.nvidia.com/compute/cuda/repos/$distribution/x86_64/3bf863cc.pub
        sudo apt-key add 3bf863cc.pub
        sudo rm 3bf863cc.pub
        echo "deb https://developer.download.nvidia.com/compute/cuda/repos/$distribution/x86_64 /" | sudo tee /etc/apt/sources.list.d/cuda.list
        sudo apt-get update
        sudo apt-get -y install nvidia-fabricmanager-${driver_version_main}=${driver_version}-*

        Ubuntu 24.04

        driver_version=570.133.20
        driver_version_main=$(echo $driver_version | awk -F '.' '{print $1}')
        distribution=$(. /etc/os-release;echo $ID$VERSION_ID | sed -e 's/\.//g')
        sudo wget https://developer.download.nvidia.com/compute/cuda/repos/$distribution/x86_64/cuda-$distribution.pin
        sudo mv cuda-$distribution.pin /etc/apt/preferences.d/cuda-repository-pin-600
        sudo wget https://developer.download.nvidia.com/compute/cuda/repos/$distribution/x86_64/3bf863cc.pub
        sudo apt-key add 3bf863cc.pub
        sudo rm 3bf863cc.pub
        echo "deb https://developer.download.nvidia.com/compute/cuda/repos/$distribution/x86_64 /" | sudo tee /etc/apt/sources.list.d/cuda.list
        sudo apt-get update
        sudo apt-get -y install nvidia-fabricmanager-${driver_version_main}=${driver_version}-*
      • installation package

        • Ubuntu 16.04

          driver_version=460.91.03
          driver_version_main=$(echo $driver_version | awk -F '.' '{print $1}')
          sudo wget https://developer.download.nvidia.com/compute/cuda/repos/ubuntu1604/x86_64/nvidia-fabricmanager-${driver_version_main}_${driver_version}-1_amd64.deb
          sudo dpkg -i nvidia-fabricmanager-${driver_version_main}_${driver_version}-1_amd64.deb
        • Ubuntu 18.04

          driver_version=460.91.03
          driver_version_main=$(echo $driver_version | awk -F '.' '{print $1}')
          sudo wget https://developer.download.nvidia.com/compute/cuda/repos/ubuntu1804/x86_64/nvidia-fabricmanager-${driver_version_main}_${driver_version}-1_amd64.deb
          sudo dpkg -i nvidia-fabricmanager-${driver_version_main}_${driver_version}-1_amd64.deb
        • Ubuntu 20.04

          driver_version=460.91.03
          driver_version_main=$(echo $driver_version | awk -F '.' '{print $1}')
          sudo wget https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2004/x86_64/nvidia-fabricmanager-${driver_version_main}_${driver_version}-1_amd64.deb
          sudo dpkg -i nvidia-fabricmanager-${driver_version_main}_${driver_version}-1_amd64.deb
        • Ubuntu 22.04

          driver_version=535.154.05
          driver_version_main=$(echo $driver_version | awk -F '.' '{print $1}')
          sudo wget https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2204/x86_64/nvidia-fabricmanager-${driver_version_main}_${driver_version}-1_amd64.deb
          sudo dpkg -i nvidia-fabricmanager-${driver_version_main}_${driver_version}-1_amd64.deb
        • Ubuntu 24.04

          driver_version=570.133.20
          driver_version_main=$(echo $driver_version | awk -F '.' '{print $1}')
          sudo wget https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2404/x86_64/nvidia-fabricmanager-${driver_version_main}_${driver_version}-1_amd64.deb
          sudo dpkg -i nvidia-fabricmanager-${driver_version_main}_${driver_version}-1_amd64.deb
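The Ubuntu examples above derive two strings from the driver version and the operating system release: the repository directory name, such as ubuntu2204, and the .deb file name. The following sketch isolates both derivations so that they can be checked without downloading anything. The function names `cuda_repo_distribution` and `fabricmanager_deb_name` are hypothetical helpers that only mirror the naming pattern in the preceding commands.

```shell
# Turn an os-release ID and VERSION_ID into the repository directory name,
# for example "ubuntu" and "22.04" into "ubuntu2204".
cuda_repo_distribution() {
    printf '%s%s\n' "$1" "$2" | sed -e 's/\.//g'
}

# Build the .deb file name for a driver version. The major version is the
# part of the driver version before the first dot.
fabricmanager_deb_name() {
    driver_version=$1
    printf 'nvidia-fabricmanager-%s_%s-1_amd64.deb\n' \
        "${driver_version%%.*}" "$driver_version"
}

# Possible usage on the instance:
#   . /etc/os-release
#   cuda_repo_distribution "$ID" "$VERSION_ID"
```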
    2. Run the following commands to start the nvidia-fabricmanager service.

      sudo systemctl enable nvidia-fabricmanager
      sudo systemctl start nvidia-fabricmanager
    3. Run the following command to check whether the nvidia-fabricmanager service is installed.

      systemctl status nvidia-fabricmanager

      If output similar to the following is returned, the nvidia-fabricmanager service is installed.

      [Figure: systemctl status output for the nvidia-fabricmanager service]

      Note

      The nvidia-fabricmanager package version must match the Tesla driver version for the GPU to function correctly. On an Ubuntu system, if you install the nvidia-fabricmanager service from an installation package, the apt-daily service may automatically update the package. This can cause a version mismatch between the nvidia-fabricmanager package and the Tesla driver. As a result, the nvidia-fabricmanager service fails to start, making the GPU unavailable. For information about how to resolve this issue, see GPU becomes unavailable due to a version mismatch between nvidia-fabricmanager and the Tesla driver.
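The version mismatch described in the preceding note can be detected with a simple comparison. The following sketch is one possible check, not the resolution procedure from the referenced topic. The function name `fabricmanager_matches_driver` is a hypothetical helper, and the sketch assumes that the package version has the form `<driver version>-<revision>`, such as 535.154.05-1, as in the file names used in this topic.

```shell
# Succeed when a package version such as "535.154.05-1" belongs to the
# given driver version ("535.154.05"). The "-<revision>" suffix is ignored.
fabricmanager_matches_driver() {
    package_version=$1
    driver_version=$2
    [ "${package_version%%-*}" = "$driver_version" ]
}

# Possible usage on an Ubuntu instance:
#   driver=$(nvidia-smi --query-gpu=driver_version --format=csv,noheader | head -n 1)
#   pkg=$(dpkg-query -W -f='${Version}' "nvidia-fabricmanager-${driver%%.*}")
#   fabricmanager_matches_driver "$pkg" "$driver" || echo "version mismatch" >&2
```

Running such a check after package updates can reveal a mismatch before the nvidia-fabricmanager service fails to start.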

Related topics