NVIDIA Driver and CUDA installation in Ubuntu
In this post, we will look at a way to install CUDA 12.3 on Ubuntu 22.04.4 LTS.
Problem
The underlying NVIDIA Driver kept throwing errors while installing CUDA 11.7, 11.8, 12.2, etc. The recommended NVIDIA Driver was installed on Ubuntu using the following command:
$ sudo ubuntu-drivers install
But upon installing CUDA, the installer throws the following error:
Loading new nvidia-520.61.05 DKMS files...
Building for 6.2.0-39-generic 6.5.0-18-generic
Building for architecture x86_64
Building initial module for 6.2.0-39-generic
ERROR: Cannot create report: [Errno 17] File exists: '/var/crash/nvidia-dkms-520.0.crash'
Error! Bad return status for module build on kernel: 6.2.0-39-generic (x86_64)
Consult /var/lib/dkms/nvidia/520.61.05/build/make.log for more information.
I checked the crash file and found the following errors:
/var/lib/dkms/nvidia/520.61.05/build/nvidia/nv-mmap.c: In function ‘nvidia_mmap_numa’:
/var/lib/dkms/nvidia/520.61.05/build/nvidia/nv-mmap.c:446:19: error: assignment of read-only member ‘vm_flags’
446 | vma->vm_flags |= VM_MIXEDMAP;
| ^~
/var/lib/dkms/nvidia/520.61.05/build/nvidia/nv-mmap.c: In function ‘nvidia_mmap_helper’:
Env Info
Below are the environment details:
- OS
$ lsb_release -a No LSB modules are available. Distributor ID: Ubuntu Description: Ubuntu 22.04.4 LTS Release: 22.04 Codename: jammy
- GCC
$ gcc --version gcc (Ubuntu 11.4.0-1ubuntu1~22.04) 11.4.0 Copyright (C) 2021 Free Software Foundation, Inc. This is free software; see the source for copying conditions. There is NO warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
- GLIBC
$ ldd --version ldd (Ubuntu GLIBC 2.35-0ubuntu3.6) 2.35 Copyright (C) 2022 Free Software Foundation, Inc. This is free software; see the source for copying conditions. There is NO warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. Written by Roland McGrath and Ulrich Drepper.
- Kernel
$ uname -a Linux asus 6.5.0-18-generic #18~22.04.1-Ubuntu SMP PREEMPT_DYNAMIC Wed Feb 7 11:40:03 UTC 2 x86_64 x86_64 x86_64 GNU/Linux
- Graphic Card
$ lspci | grep -i nvidia 01:00.0 VGA compatible controller: NVIDIA Corporation GA102 [GeForce RTX 3090] (rev a1) 01:00.1 Audio device: NVIDIA Corporation GA102 High Definition Audio Controller (rev a1)
- NVIDIA Detector
$ nvidia-detector nvidia-driver-545
- Desktop Session
$ echo $XDG_SESSION_TYPE x11
- NVIDIA System Management Interface
$ nvidia-smi Sat Feb 17 16:05:47 2024 +---------------------------------------------------------------------------------------+ | NVIDIA-SMI 545.23.08 Driver Version: 545.23.08 CUDA Version: 12.3 | |-----------------------------------------+----------------------+----------------------+ | GPU Name Persistence-M | Bus-Id Disp.A | Volatile Uncorr. ECC | | Fan Temp Perf Pwr:Usage/Cap | Memory-Usage | GPU-Util Compute M. | | | | MIG M. | |=========================================+======================+======================| | 0 NVIDIA GeForce RTX 3090 On | 00000000:01:00.0 On | N/A | | 44% 59C P2 116W / 350W | 1124MiB / 24576MiB | 12% Default | | | | N/A | +-----------------------------------------+----------------------+----------------------+ +---------------------------------------------------------------------------------------+ | Processes: | | GPU GI CI PID Type Process name GPU Memory | | ID ID Usage | |=======================================================================================| | 0 N/A N/A 2919 G /usr/lib/xorg/Xorg 212MiB | | 0 N/A N/A 3026 C ...libexec/gnome-remote-desktop-daemon 707MiB | | 0 N/A N/A 3060 G /usr/bin/gnome-shell 130MiB | | 0 N/A N/A 3160 G ...libexec/gnome-remote-desktop-daemon 0MiB | | 0 N/A N/A 6684 G ...seed-version=20240216-130127.682000 35MiB | +---------------------------------------------------------------------------------------+
Solution
Install CUDA 12.3 on Ubuntu 22.04.4 LTS as per the official documentation:
- Base Installer
$ wget https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2204/x86_64/cuda-ubuntu2204.pin $ sudo mv cuda-ubuntu2204.pin /etc/apt/preferences.d/cuda-repository-pin-600 $ wget https://developer.download.nvidia.com/compute/cuda/12.3.2/local_installers/cuda-repo-ubuntu2204-12-3-local_12.3.2-545.23.08-1_amd64.deb $ sudo dpkg -i cuda-repo-ubuntu2204-12-3-local_12.3.2-545.23.08-1_amd64.deb $ sudo cp /var/cuda-repo-ubuntu2204-12-3-local/cuda-*-keyring.gpg /usr/share/keyrings/ $ sudo apt update $ sudo apt -y install cuda-toolkit-12-3
More information can be found here
- Driver Installer
To install the open kernel module flavor:
$ sudo apt install -y nvidia-kernel-open-545 $ sudo apt install -y cuda-drivers-545
More information can be found here
Update (Install CUDA 11.7 on Ubuntu 22.04)
For CUDA 11.7, we need to install the nvidia-driver-515
. However, as of writing this post, the nvidia-driver-515
is unhappy with kernel 6.5 and reports the following error:
...
A new initrd image has also been created. To revert, please regenerate your
initrd by running the following command after deleting the modprobe.d file:
`/usr/sbin/initramfs -u`
*****************************************************************************
*** Reboot your computer and verify that the NVIDIA graphics driver can ***
*** be loaded. ***
*****************************************************************************
INFO:Enable nvidia
DEBUG:Parsing /usr/share/ubuntu-drivers-common/quirks/lenovo_thinkpad
DEBUG:Parsing /usr/share/ubuntu-drivers-common/quirks/put_your_quirks_here
DEBUG:Parsing /usr/share/ubuntu-drivers-common/quirks/dell_latitude
Loading new nvidia-515.65.01 DKMS files...
Building for 5.15.0-94-generic 6.5.0-18-generic
Building for architecture x86_64
Building initial module for 5.15.0-94-generic
Secure Boot not enabled on this system.
Done.
nvidia.ko:
Running module version sanity check.
- Original module
- No original module exists within this kernel
- Installation
- Installing to /lib/modules/5.15.0-94-generic/updates/dkms/
nvidia-modeset.ko:
Running module version sanity check.
- Original module
- No original module exists within this kernel
- Installation
- Installing to /lib/modules/5.15.0-94-generic/updates/dkms/
nvidia-drm.ko:
Running module version sanity check.
- Original module
- No original module exists within this kernel
- Installation
- Installing to /lib/modules/5.15.0-94-generic/updates/dkms/
nvidia-peermem.ko:
Running module version sanity check.
- Original module
- No original module exists within this kernel
- Installation
- Installing to /lib/modules/5.15.0-94-generic/updates/dkms/
nvidia-uvm.ko:
Running module version sanity check.
- Original module
- No original module exists within this kernel
- Installation
- Installing to /lib/modules/5.15.0-94-generic/updates/dkms/
depmod...
Building initial module for 6.5.0-18-generic
ERROR: Cannot create report: [Errno 17] File exists: '/var/crash/nvidia-kernel-source-515.0.crash'
Error! Bad return status for module build on kernel: 6.5.0-18-generic (x86_64)
Consult /var/lib/dkms/nvidia/515.65.01/build/make.log for more information.
dpkg: error processing package nvidia-dkms-515 (--configure):
installed nvidia-dkms-515 package post-installation script subprocess returned error exit status 10
Setting up nsight-systems-2022.1.3 (2022.1.3.3-1c7b5f7) ...
update-alternatives: using /opt/nvidia/nsight-systems/2022.1.3/target-linux-x64/nsys to provide /usr/local/bin/nsys (nsy
s) in auto mode
update-alternatives: using /opt/nvidia/nsight-systems/2022.1.3/host-linux-x64/nsys-ui to provide /usr/local/bin/nsys-ui
(nsys-ui) in auto mode
Setting up libxml2:i386 (2.9.13+dfsg-1ubuntu0.3) ...
Setting up libcufile-dev-11-7 (1.3.1.18-1) ...
Setting up cuda-libraries-11-7 (11.7.1-1) ...
Setting up libxcb-dri3-0:i386 (1.14-3ubuntu3) ...
dpkg: dependency problems prevent configuration of cuda-drivers-515:
cuda-drivers-515 depends on nvidia-dkms-515 (>= 515.65.01); however:
Package nvidia-dkms-515 is not configured yet.
The crash and log file contains the following error:
/var/lib/dkms/nvidia/515.65.01/build/common/inc/nv-mm.h: In function ‘NV_GET_USER_PAGES_REMOTE’:
/var/lib/dkms/nvidia/515.65.01/build/common/inc/nv-mm.h:164:45: error: passing argument 1 of ‘get_user_pages_remote’ from incompatible pointer type [-Werror=incompatible-pointer-types]
164 | return get_user_pages_remote(tsk, mm, start, nr_pages, flags,
| ^~~
| |
| struct task_struct *
In file included from /var/lib/dkms/nvidia/515.65.01/build/common/inc/nv-pgprot.h:30,
Therefore, we need to first remove kernel 6.5 using the following command:
$ sudo apt purge linux-image-6.5.0-18-generic linux-modules-6.5.0-18-generic linux-modules-extra-6.5.0-18-generic linux-headers-6.5.0-18-generic
Before, we proceed next, please make sure we have kernel 5.15 mentioned in the official documentation as shown below
$ uname -r
5.15.0-94-generic
Install CUDA 11.7.1 on Ubuntu 22.04.4 LTS as per the official documentation:
$ wget https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2204/x86_64/cuda-ubuntu2204.pin
$ sudo mv cuda-ubuntu2204.pin /etc/apt/preferences.d/cuda-repository-pin-600
$ wget https://developer.download.nvidia.com/compute/cuda/11.7.1/local_installers/cuda-repo-ubuntu2204-11-7-local_11.7.1-515.65.01-1_amd64.deb
$ sudo dpkg -i cuda-repo-ubuntu2204-11-7-local_11.7.1-515.65.01-1_amd64.deb
$ sudo cp /var/cuda-repo-ubuntu2204-11-7-local/cuda-F83D2C4C-keyring.gpg /usr/share/keyrings/
$ sudo apt update
$ sudo apt install cuda-11-7
Additional Info (Install kernel 6.5 on Ubuntu 22.04)
$ sudo apt install linux-headers-generic
$ sudo apt install linux-image-6.5.0-18-generic linux-modules-6.5.0-18-generic linux-modules-extra-6.5.0-18-generic linux-headers-6.5.0-18-generic
$ sudo update-grub
Install cuDNN
The cuDNN can be downloaded from cuDNN Archive. You need to create and sign in to NVIDIA to proceed further. Once downloaed, the following command can be used to install the deb file:
$ sudo dpkg -i cudnn-local-repo-ubuntu2204-8.9.7.29_1.0-1_amd64.deb
Remove NVIDIA Driver and CUDA
$ sudo apt --purge -y remove "*nvidia*"
$ sudo apt --purge -y remove "*cublas*" "cuda*" "nsight*"
$ sudo apt update -y && sudo apt upgrade -y && sudo apt autoremove -y && sudo apt autoclean -y