How to Install DGX Tools and NVIDIA Drivers on Ubuntu

AI and deep learning workloads depend on having the right system tools and GPU drivers installed. NVIDIA's DGX systems pair multiple data-center GPUs with a tuned software stack. On a stock Ubuntu installation, that stack has to be added separately. This guide walks through installing the DGX tools and NVIDIA drivers on a DGX system running Ubuntu, so that the hardware can be used fully for AI tasks.

Introduction

NVIDIA DGX systems are built for demanding AI and deep learning workloads. Their performance depends on both the hardware and a matching software stack: system configuration packages, the data-center GPU driver, Fabric Manager, and management tools. This post walks step by step through installing those components on Ubuntu. The package names used below target the DGX A100 and Ubuntu 20.04 (focal).

Prerequisites

  • An NVIDIA DGX A100 system (the dgx-a100-* packages in step 4 are specific to this model)
  • Ubuntu 20.04 LTS (focal) installed
  • Internet connection
  • Root or sudo privileges
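Before starting, you may want to confirm the operating system version, because the repository URL in step 1 is for focal. The following is a minimal sketch. is_ubuntu_focal is a hypothetical helper name, and it relies only on the standard /etc/os-release file, which defines ID and VERSION_ID in shell syntax.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeed only if the host is Ubuntu 20.04.
# The os-release path is a parameter (default /etc/os-release) so it can be tested.
is_ubuntu_focal() {
    local file="${1:-/etc/os-release}"
    # Source the file in a subshell so its variables do not leak into the caller
    ( . "$file" && [ "$ID" = "ubuntu" ] && [ "$VERSION_ID" = "20.04" ] )
}

# Usage on the target system:
#   is_ubuntu_focal && echo "Ubuntu 20.04 detected"
```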

Installation Steps

1. Enable NVIDIA Repositories

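Start by adding NVIDIA's DGX package repositories so that APT can install software directly from NVIDIA. The command below downloads an archive of repository configuration files for Ubuntu 20.04 (focal). It then extracts the archive at the filesystem root (tar -C /), so the files land in their system locations. The -f flag makes curl fail on an HTTP error instead of piping an error page into tar.

curl -fsSL https://repo.download.nvidia.com/baseos/ubuntu/focal/dgx-repo-files.tgz | sudo tar xzf - -C /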
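To confirm that the repository files were extracted, you could use a check like the one below. This sketch assumes the archive places its APT source definitions as .list or .sources files under /etc/apt/sources.list.d. has_nvidia_repo is a hypothetical helper.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeed if any APT source file references NVIDIA's repo host.
# ASSUMPTION: the extracted sources live in *.list or *.sources files in this directory.
has_nvidia_repo() {
    local dir="${1:-/etc/apt/sources.list.d}" f
    for f in "$dir"/*.list "$dir"/*.sources; do
        # Unmatched globs stay literal; the -f test skips them
        [ -f "$f" ] && grep -qF 'repo.download.nvidia.com' "$f" && return 0
    done
    return 1
}

# Usage on the target system:
#   has_nvidia_repo && echo "NVIDIA repository configured"
```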

2. Update APT Database

With the NVIDIA repositories in place, update your APT package list to include the new sources.

sudo apt update

3. Upgrade Installed Packages

To maintain system security and stability, upgrade all installed packages to their latest versions.

sudo apt upgrade
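You may want to review the scope of the upgrade before running it. The following sketch counts the packages reported by apt list --upgradable. count_upgradable is a hypothetical helper, and it relies on each upgradable entry ending in "[upgradable from: <old-version>]".

```shell
#!/usr/bin/env bash
# Hypothetical helper: read `apt list --upgradable` output on stdin and print
# how many packages are upgradable. The "Listing..." header line is not counted.
count_upgradable() {
    # grep -c prints 0 and exits 1 when nothing matches; `|| true` keeps exit 0
    grep -c '\[upgradable from:' || true
}

# Usage (apt prints a CLI-stability warning on stderr when piped):
#   apt list --upgradable 2>/dev/null | count_upgradable
```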

4. Install DGX System Tools and Configurations

Install the DGX system-specific configurations and tools to optimize your system's performance for AI workloads.

sudo apt install -y dgx-a100-system-configurations dgx-a100-system-tools dgx-a100-system-tools-extra
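To verify the installation, you could ask dpkg which of the listed packages are not fully installed. This is a minimal sketch using dpkg-query's ${Status} field. missing_packages and the DPKG_QUERY override variable are hypothetical names; the override exists only so the logic can be tested with a stub.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print each given package that dpkg does not report as
# "install ok installed". DPKG_QUERY may be set to a stub command for testing.
missing_packages() {
    local pkg status
    for pkg in "$@"; do
        # ${Status} expands to e.g. "install ok installed"; unknown packages produce no output
        status=$(${DPKG_QUERY:-dpkg-query} -W -f='${Status}' "$pkg" 2>/dev/null)
        [ "$status" = "install ok installed" ] || echo "$pkg"
    done
}

# Usage (no output means all packages are installed):
#   missing_packages dgx-a100-system-configurations dgx-a100-system-tools dgx-a100-system-tools-extra
```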

5. Disable the Ondemand Governor

For optimal performance, disable Ubuntu's ondemand service. Shortly after boot, this service otherwise switches the CPU frequency scaling governor away from 'performance'.

sudo systemctl disable ondemand
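After the next reboot, you can check which governor the CPUs are actually using. This sketch reads the standard cpufreq sysfs files. current_governors is a hypothetical helper. On systems without cpufreq support it prints nothing.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the distinct CPU frequency governors in use.
# Reads cpuN/cpufreq/scaling_governor under a sysfs root (default: the real one);
# the root is a parameter so the function can be run against a test fixture.
current_governors() {
    local root="${1:-/sys/devices/system/cpu}" f
    for f in "$root"/cpu[0-9]*/cpufreq/scaling_governor; do
        # Skip the literal pattern left behind when nothing matches
        [ -r "$f" ] && cat "$f"
    done | sort -u
}

# Usage: current_governors   (ideally prints only "performance")
```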

6. Disable Unattended Upgrades

To prevent automatic updates from disrupting your system configuration, disable unattended upgrades by purging the unattended-upgrades package.

sudo apt purge -y unattended-upgrades
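You may also want to see whether APT still carries a periodic unattended-upgrade setting. This sketch parses apt-config dump output, in which each line has the form Key "value";. unattended_upgrade_setting is a hypothetical helper. It prints the configured value, or "unset" if the key is absent.

```shell
#!/usr/bin/env bash
# Hypothetical helper: read `apt-config dump` output on stdin and print the value
# of APT::Periodic::Unattended-Upgrade, or "unset" if the key does not appear.
unattended_upgrade_setting() {
    awk '$1 == "APT::Periodic::Unattended-Upgrade" { v = $2; gsub(/[";]/, "", v); found = 1 }
         END { print (found ? v : "unset") }'
}

# Usage: apt-config dump | unattended_upgrade_setting
```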

7. Install the Latest Kernel

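Install the linux-generic metapackage, which pulls in the current generic kernel for your Ubuntu release and keeps it updated. The precompiled NVIDIA kernel modules in the next step (linux-modules-nvidia-470-server-generic) are built for this generic kernel flavor.

sudo apt install -y linux-generic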
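A newly installed kernel only takes effect after a reboot. On Ubuntu, package hooks typically create /var/run/reboot-required when a reboot is needed. The sketch below checks for that flag file. reboot_required is a hypothetical helper, and the path is a parameter only for testing.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeed if Ubuntu has flagged that a reboot is required.
reboot_required() {
    local flag="${1:-/var/run/reboot-required}"
    [ -e "$flag" ]
}

# Usage:
#   reboot_required && echo "Reboot needed; running kernel is $(uname -r)"
```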

8. Install NVIDIA CUDA Driver

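Install the R470 data-center driver together with its supporting components. These are the precompiled kernel modules for the generic kernel, the NVSwitch configuration and query library (NSCQ), nvidia-modprobe, Fabric Manager (required on NVSwitch-based systems such as the DGX A100), the Data Center GPU Manager (DCGM), and the nv-persistence-mode package. The driver provides the CUDA support that GPU-accelerated applications rely on.

sudo apt install -y nvidia-driver-470-server linux-modules-nvidia-470-server-generic \
    libnvidia-nscq-470 nvidia-modprobe nvidia-fabricmanager-470 \
    datacenter-gpu-manager nv-persistence-mode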
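After rebooting, you can confirm that every GPU reports the expected driver branch. This sketch reads the output of nvidia-smi --query-gpu=driver_version --format=csv,noheader, which prints one version string per GPU, and compares the major component. check_driver_branch is a hypothetical helper, and 470 is its default only because this guide installs the R470 packages.

```shell
#!/usr/bin/env bash
# Hypothetical helper: read one driver version per line on stdin and fail if
# any GPU reports a different major version, or if no GPUs are listed at all.
check_driver_branch() {
    local want="${1:-470}" line seen=0
    while IFS= read -r line; do
        seen=1
        # Compare only the major component before the first dot
        if [ "${line%%.*}" != "$want" ]; then
            echo "unexpected driver version: $line" >&2
            return 1
        fi
    done
    [ "$seen" -eq 1 ]
}

# Usage:
#   nvidia-smi --query-gpu=driver_version --format=csv,noheader | check_driver_branch 470
```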

9. Enable Required Services

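Enable the Fabric Manager, persistence daemon, and DCGM services so that they start automatically at boot. systemctl enable does not start them immediately. Reboot afterwards so that the new kernel, the driver, and these services are all loaded.

sudo systemctl enable nvidia-fabricmanager nvidia-persistenced nvidia-dcgm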
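To double-check the result, you can query each unit with systemctl is-enabled. In this sketch, check_units_enabled and the SYSTEMCTL override variable are hypothetical names; the override exists only so the logic can be tested with a stub. Any state other than "enabled", including "static", is reported as a failure.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print "unit: state" for each unit and return non-zero
# if any unit is not "enabled". SYSTEMCTL may be set to a stub for testing.
check_units_enabled() {
    local unit state rc=0
    for unit in "$@"; do
        state=$(${SYSTEMCTL:-systemctl} is-enabled "$unit" 2>/dev/null)
        echo "$unit: ${state:-unknown}"
        [ "$state" = "enabled" ] || rc=1
    done
    return "$rc"
}

# Usage:
#   check_units_enabled nvidia-fabricmanager nvidia-persistenced nvidia-dcgm
```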

10. Install Additional Tools

Finally, install the Serial over LAN (SOL) configuration (nvidia-ipmisol) and NVIDIA System Management (NVSM). These provide remote console access and system health monitoring.

sudo apt install -y nvidia-ipmisol nvsm

Conclusion

Following these steps sets up your NVIDIA DGX system with the DGX system tools, the R470 data-center driver, and configurations tuned for AI and deep learning projects. Because unattended upgrades are disabled, check regularly for updates to the DGX system tools and NVIDIA drivers, and apply them deliberately to maintain performance and compatibility.