NVIDIA Driver and CUDA installation on Ubuntu

In this post, we will look at a way to install CUDA 12.3 on Ubuntu 22.04.4 LTS.

Problem

The NVIDIA driver's kernel module kept failing to build while installing CUDA 11.7, 11.8, 12.2, etc. The recommended NVIDIA driver had been installed on Ubuntu using the following command:

$ sudo ubuntu-drivers install

But upon installing CUDA, the installer throws the following error:

Loading new nvidia-520.61.05 DKMS files...
Building for 6.2.0-39-generic 6.5.0-18-generic
Building for architecture x86_64
Building initial module for 6.2.0-39-generic
ERROR: Cannot create report: [Errno 17] File exists: '/var/crash/nvidia-dkms-520.0.crash'
Error! Bad return status for module build on kernel: 6.2.0-39-generic (x86_64)
Consult /var/lib/dkms/nvidia/520.61.05/build/make.log for more information.

I checked the crash file and found the following errors:

 /var/lib/dkms/nvidia/520.61.05/build/nvidia/nv-mmap.c: In function ‘nvidia_mmap_numa’:
 /var/lib/dkms/nvidia/520.61.05/build/nvidia/nv-mmap.c:446:19: error: assignment of read-only member ‘vm_flags’
   446 |     vma->vm_flags |= VM_MIXEDMAP;
       |                   ^~
 /var/lib/dkms/nvidia/520.61.05/build/nvidia/nv-mmap.c: In function ‘nvidia_mmap_helper’:
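The compiler errors in the build log are usually the quickest pointer to the root cause. Below is a minimal sketch for pulling them out of the log, assuming the log path printed by DKMS above. The helper name dkms_build_errors is made up for this post and is not part of DKMS.

```shell
#!/usr/bin/env bash
# dkms_build_errors is a hypothetical helper name, not part of DKMS.
# It prints every compiler error line (prefixed with its line number)
# found in the given build log.
dkms_build_errors() {
    local log="$1"
    if [ ! -r "$log" ]; then
        echo "cannot read $log" >&2
        return 1
    fi
    grep -n 'error:' "$log"
}

# Usage (path taken from the DKMS message above):
#   dkms_build_errors /var/lib/dkms/nvidia/520.61.05/build/make.log
```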

Env Info

Below are the environment details:

  • OS
    $ lsb_release -a
    No LSB modules are available.
    Distributor ID:	Ubuntu
    Description:	Ubuntu 22.04.4 LTS
    Release:	22.04
    Codename:	jammy
    
  • GCC
    $ gcc --version
    gcc (Ubuntu 11.4.0-1ubuntu1~22.04) 11.4.0
    Copyright (C) 2021 Free Software Foundation, Inc.
    This is free software; see the source for copying conditions.  There is NO
    warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
    
  • GLIBC
    $ ldd --version
    ldd (Ubuntu GLIBC 2.35-0ubuntu3.6) 2.35
    Copyright (C) 2022 Free Software Foundation, Inc.
    This is free software; see the source for copying conditions.  There is NO
    warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
    Written by Roland McGrath and Ulrich Drepper.
    
  • Kernel
    $ uname -a
    Linux asus 6.5.0-18-generic #18~22.04.1-Ubuntu SMP PREEMPT_DYNAMIC Wed Feb  7 11:40:03 UTC 2 x86_64 x86_64 x86_64 GNU/Linux
    
  • Graphics Card
    $ lspci | grep -i nvidia
    01:00.0 VGA compatible controller: NVIDIA Corporation GA102 [GeForce RTX 3090] (rev a1)
    01:00.1 Audio device: NVIDIA Corporation GA102 High Definition Audio Controller (rev a1)
    
  • NVIDIA Detector
    $ nvidia-detector
    nvidia-driver-545
    
  • Desktop Session
    $ echo $XDG_SESSION_TYPE
    x11
    
  • NVIDIA System Management Interface
    $ nvidia-smi
    Sat Feb 17 16:05:47 2024
    +---------------------------------------------------------------------------------------+
    | NVIDIA-SMI 545.23.08              Driver Version: 545.23.08    CUDA Version: 12.3     |
    |-----------------------------------------+----------------------+----------------------+
    | GPU  Name                 Persistence-M | Bus-Id        Disp.A | Volatile Uncorr. ECC |
    | Fan  Temp   Perf          Pwr:Usage/Cap |         Memory-Usage | GPU-Util  Compute M. |
    |                                         |                      |               MIG M. |
    |=========================================+======================+======================|
    |   0  NVIDIA GeForce RTX 3090        On  | 00000000:01:00.0  On |                  N/A |
    | 44%   59C    P2             116W / 350W |   1124MiB / 24576MiB |     12%      Default |
    |                                         |                      |                  N/A |
    +-----------------------------------------+----------------------+----------------------+
                                                                                               
    +---------------------------------------------------------------------------------------+
    | Processes:                                                                            |
    |  GPU   GI   CI        PID   Type   Process name                            GPU Memory |
    |        ID   ID                                                             Usage      |
    |=======================================================================================|
    |    0   N/A  N/A      2919      G   /usr/lib/xorg/Xorg                          212MiB |
    |    0   N/A  N/A      3026      C   ...libexec/gnome-remote-desktop-daemon      707MiB |
    |    0   N/A  N/A      3060      G   /usr/bin/gnome-shell                        130MiB |
    |    0   N/A  N/A      3160      G   ...libexec/gnome-remote-desktop-daemon        0MiB |
    |    0   N/A  N/A      6684      G   ...seed-version=20240216-130127.682000       35MiB |
    +---------------------------------------------------------------------------------------+
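The details above can be gathered in one pass, which is handy when comparing machines or filing a bug report. This is only a sketch: the function names run_if_present and collect_env_info are invented here, and tools that are not installed are reported and skipped rather than treated as errors.

```shell
#!/usr/bin/env bash
# run_if_present and collect_env_info are hypothetical helper names.

# Print a section title, then run the command if its binary exists.
run_if_present() {
    local title="$1"; shift
    echo "== $title =="
    if command -v "$1" >/dev/null 2>&1; then
        "$@" 2>&1 || echo "($1 exited with status $?)"
    else
        echo "($1 not found)"
    fi
    echo
}

# Run the same commands used in the list above, one section each.
collect_env_info() {
    run_if_present "OS" lsb_release -a
    run_if_present "GCC" gcc --version
    run_if_present "GLIBC" ldd --version
    run_if_present "Kernel" uname -a
    echo "== Graphics Card =="
    if command -v lspci >/dev/null 2>&1; then
        lspci | grep -i nvidia || echo "(no NVIDIA device listed)"
    else
        echo "(lspci not found)"
    fi
    echo
    run_if_present "NVIDIA Detector" nvidia-detector
    echo "== Desktop Session =="
    echo "${XDG_SESSION_TYPE:-unset}"
    echo
    run_if_present "NVIDIA System Management Interface" nvidia-smi
}

# Usage:
#   collect_env_info > env-info.txt
```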
    

Solution

Install CUDA 12.3 on Ubuntu 22.04.4 LTS as per the official documentation:

  • Base Installer
    $ wget https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2204/x86_64/cuda-ubuntu2204.pin
    $ sudo mv cuda-ubuntu2204.pin /etc/apt/preferences.d/cuda-repository-pin-600
    $ wget https://developer.download.nvidia.com/compute/cuda/12.3.2/local_installers/cuda-repo-ubuntu2204-12-3-local_12.3.2-545.23.08-1_amd64.deb
    $ sudo dpkg -i cuda-repo-ubuntu2204-12-3-local_12.3.2-545.23.08-1_amd64.deb
    $ sudo cp /var/cuda-repo-ubuntu2204-12-3-local/cuda-*-keyring.gpg /usr/share/keyrings/
    $ sudo apt update
    $ sudo apt -y install cuda-toolkit-12-3
    

    More information can be found on the CUDA Toolkit 12.3 Update 2 Downloads page (see References).

  • Driver Installer
    To install the open kernel module flavor:
    $ sudo apt install -y nvidia-kernel-open-545
    $ sudo apt install -y cuda-drivers-545
    

    More information can be found under Switching between Driver Module Flavors (see References).
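Once the packages are installed, the toolkit's binaries still need to be on PATH before nvcc can be called by name. Below is a minimal sketch of the usual post-install environment setup. It assumes the default install prefix /usr/local/cuda-12.3, which this post does not state explicitly, so adjust CUDA_HOME if your prefix differs.

```shell
#!/usr/bin/env bash
# Assumption: the deb packages install the toolkit under /usr/local/cuda-12.3.
# Override by exporting CUDA_HOME before running this snippet.
CUDA_HOME="${CUDA_HOME:-/usr/local/cuda-12.3}"

# Prepend the toolkit's bin and lib64 directories, keeping any existing entries.
export PATH="${CUDA_HOME}/bin${PATH:+:${PATH}}"
export LD_LIBRARY_PATH="${CUDA_HOME}/lib64${LD_LIBRARY_PATH:+:${LD_LIBRARY_PATH}}"

# Report whether the compiler is now visible; this check is not fatal.
if command -v nvcc >/dev/null 2>&1; then
    nvcc --version
else
    echo "nvcc not found on PATH (looked in ${CUDA_HOME}/bin first)"
fi
```

Adding the two export lines to ~/.bashrc makes the setting persistent across shells.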

Update (Install CUDA 11.7 on Ubuntu 22.04)

For CUDA 11.7, we need to install nvidia-driver-515. However, at the time of writing, the nvidia-driver-515 DKMS module fails to build against kernel 6.5 and reports the following error:

...
A new initrd image has also been created. To revert, please regenerate your
initrd by running the following command after deleting the modprobe.d file:
`/usr/sbin/initramfs -u`

*****************************************************************************
*** Reboot your computer and verify that the NVIDIA graphics driver can   ***
*** be loaded.                                                            ***
*****************************************************************************

INFO:Enable nvidia
DEBUG:Parsing /usr/share/ubuntu-drivers-common/quirks/lenovo_thinkpad
DEBUG:Parsing /usr/share/ubuntu-drivers-common/quirks/put_your_quirks_here
DEBUG:Parsing /usr/share/ubuntu-drivers-common/quirks/dell_latitude
Loading new nvidia-515.65.01 DKMS files...
Building for 5.15.0-94-generic 6.5.0-18-generic
Building for architecture x86_64
Building initial module for 5.15.0-94-generic
Secure Boot not enabled on this system.
Done.

nvidia.ko:
Running module version sanity check.
 - Original module
   - No original module exists within this kernel
 - Installation
   - Installing to /lib/modules/5.15.0-94-generic/updates/dkms/

nvidia-modeset.ko:
Running module version sanity check.
 - Original module
   - No original module exists within this kernel
 - Installation
   - Installing to /lib/modules/5.15.0-94-generic/updates/dkms/

nvidia-drm.ko:
Running module version sanity check.
 - Original module
   - No original module exists within this kernel
 - Installation
   - Installing to /lib/modules/5.15.0-94-generic/updates/dkms/

nvidia-peermem.ko:
Running module version sanity check.
 - Original module
   - No original module exists within this kernel
 - Installation
   - Installing to /lib/modules/5.15.0-94-generic/updates/dkms/

nvidia-uvm.ko:
Running module version sanity check.
 - Original module
   - No original module exists within this kernel
 - Installation
   - Installing to /lib/modules/5.15.0-94-generic/updates/dkms/

depmod...
Building initial module for 6.5.0-18-generic
ERROR: Cannot create report: [Errno 17] File exists: '/var/crash/nvidia-kernel-source-515.0.crash'
Error! Bad return status for module build on kernel: 6.5.0-18-generic (x86_64)
Consult /var/lib/dkms/nvidia/515.65.01/build/make.log for more information.
dpkg: error processing package nvidia-dkms-515 (--configure):
 installed nvidia-dkms-515 package post-installation script subprocess returned error exit status 10
Setting up nsight-systems-2022.1.3 (2022.1.3.3-1c7b5f7) ...
update-alternatives: using /opt/nvidia/nsight-systems/2022.1.3/target-linux-x64/nsys to provide /usr/local/bin/nsys (nsys) in auto mode
update-alternatives: using /opt/nvidia/nsight-systems/2022.1.3/host-linux-x64/nsys-ui to provide /usr/local/bin/nsys-ui (nsys-ui) in auto mode
Setting up libxml2:i386 (2.9.13+dfsg-1ubuntu0.3) ...
Setting up libcufile-dev-11-7 (1.3.1.18-1) ...
Setting up cuda-libraries-11-7 (11.7.1-1) ...
Setting up libxcb-dri3-0:i386 (1.14-3ubuntu3) ...
dpkg: dependency problems prevent configuration of cuda-drivers-515:
 cuda-drivers-515 depends on nvidia-dkms-515 (>= 515.65.01); however:
  Package nvidia-dkms-515 is not configured yet.

The crash and log files contain the following error:

/var/lib/dkms/nvidia/515.65.01/build/common/inc/nv-mm.h: In function ‘NV_GET_USER_PAGES_REMOTE’:
/var/lib/dkms/nvidia/515.65.01/build/common/inc/nv-mm.h:164:45: error: passing argument 1 of ‘get_user_pages_remote’ from incompatible pointer type [-Werror=incompatible-pointer-types]
  164 |                return get_user_pages_remote(tsk, mm, start, nr_pages, flags,
      |                                             ^~~
      |                                             |
      |                                             struct task_struct *
In file included from /var/lib/dkms/nvidia/515.65.01/build/common/inc/nv-pgprot.h:30,

Therefore, we need to first remove kernel 6.5 using the following command:

$ sudo apt purge linux-image-6.5.0-18-generic linux-modules-6.5.0-18-generic linux-modules-extra-6.5.0-18-generic linux-headers-6.5.0-18-generic
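The same four package names appear whenever a specific kernel is removed or installed, so they can be generated from the kernel release string. Here is a small sketch; kernel_packages is a hypothetical helper name, and the four prefixes are simply the ones used in the command above.

```shell
#!/usr/bin/env bash
# kernel_packages is a hypothetical helper: given a kernel release such as
# 6.5.0-18-generic, it prints the four package names used in the purge command.
kernel_packages() {
    local release="$1"
    local prefix
    for prefix in linux-image linux-modules linux-modules-extra linux-headers; do
        printf '%s-%s\n' "$prefix" "$release"
    done
}

# Usage (check `uname -r` first so the running kernel is not the one removed):
#   sudo apt purge $(kernel_packages 6.5.0-18-generic)
```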

Before proceeding, make sure the running kernel is 5.15, the version listed in the official documentation (reboot into it if necessary), as shown below:

$ uname -r
5.15.0-94-generic
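This check can also be scripted so that an install script stops early on the wrong kernel. A minimal sketch follows; kernel_in_series is a hypothetical helper, and the 5.15 series is the one shown above.

```shell
#!/usr/bin/env bash
# kernel_in_series is a hypothetical helper: it succeeds when a kernel release
# string (as printed by `uname -r`) belongs to the given major.minor series.
kernel_in_series() {
    local release="$1" series="$2"
    case "$release" in
        "$series".*) return 0 ;;
        *) return 1 ;;
    esac
}

# Usage: stop before installing CUDA 11.7 unless a 5.15 kernel is running.
#   kernel_in_series "$(uname -r)" 5.15 || { echo "reboot into 5.15 first" >&2; exit 1; }
```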

Install CUDA 11.7.1 on Ubuntu 22.04.4 LTS as per the official documentation:

$ wget https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2204/x86_64/cuda-ubuntu2204.pin
$ sudo mv cuda-ubuntu2204.pin /etc/apt/preferences.d/cuda-repository-pin-600
$ wget https://developer.download.nvidia.com/compute/cuda/11.7.1/local_installers/cuda-repo-ubuntu2204-11-7-local_11.7.1-515.65.01-1_amd64.deb
$ sudo dpkg -i cuda-repo-ubuntu2204-11-7-local_11.7.1-515.65.01-1_amd64.deb 
$ sudo cp /var/cuda-repo-ubuntu2204-11-7-local/cuda-F83D2C4C-keyring.gpg /usr/share/keyrings/
$ sudo apt update
$ sudo apt install cuda-11-7
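After the install finishes, it is worth confirming which toolkit release nvcc reports. The sketch below extracts the release number from the nvcc --version output. It assumes the output contains a line of the form "Cuda compilation tools, release X.Y, ..." and that the toolkit lives under /usr/local/cuda-11.7; the helper name nvcc_release is invented for this post.

```shell
#!/usr/bin/env bash
# nvcc_release is a hypothetical helper: it reads `nvcc --version` output on
# stdin and prints the toolkit release number (for example 11.7).
# Assumption: the output has a line containing "release X.Y,".
nvcc_release() {
    sed -n 's/.*release \([0-9][0-9]*\.[0-9][0-9]*\),.*/\1/p'
}

# Usage (install prefix is an assumption):
#   /usr/local/cuda-11.7/bin/nvcc --version | nvcc_release
```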

Additional Info (Install kernel 6.5 on Ubuntu 22.04)

$ sudo apt install linux-headers-generic
$ sudo apt install linux-image-6.5.0-18-generic linux-modules-6.5.0-18-generic linux-modules-extra-6.5.0-18-generic linux-headers-6.5.0-18-generic
$ sudo update-grub

Install cuDNN

cuDNN can be downloaded from the cuDNN Archive. You need to create an NVIDIA account and sign in to proceed. Once downloaded, the following command can be used to install the deb file:

$ sudo dpkg -i cudnn-local-repo-ubuntu2204-8.9.7.29_1.0-1_amd64.deb
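The deb file above only registers a local apt repository; the cuDNN libraries themselves are installed from it in a second step. The sketch below strings the steps together. The keyring path pattern and the package names libcudnn8 and libcudnn8-dev are assumptions recalled from NVIDIA's cuDNN 8.x Debian instructions, not taken from this post, so verify them against the cuDNN documentation. Both helper names are hypothetical. Setting DRY_RUN=1 prints the commands without running them.

```shell
#!/usr/bin/env bash
# Assumptions (recalled from NVIDIA's cuDNN 8.x Debian instructions):
#   - dpkg unpacks the local repository under /var/cudnn-local-repo-*/
#   - the runtime and developer packages are named libcudnn8 and libcudnn8-dev
# Both helper names below are hypothetical.

# Run a command, or only print it when DRY_RUN=1.
run() {
    if [ "${DRY_RUN:-0}" = 1 ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# Register the local repo, trust its key, then install cuDNN from it.
install_cudnn_from_local_repo() {
    local deb="$1"
    run sudo dpkg -i "$deb" &&
    run sudo cp /var/cudnn-local-repo-*/cudnn-local-*-keyring.gpg /usr/share/keyrings/ &&
    run sudo apt update &&
    run sudo apt install -y libcudnn8 libcudnn8-dev
}

# Usage:
#   DRY_RUN=1 install_cudnn_from_local_repo cudnn-local-repo-ubuntu2204-8.9.7.29_1.0-1_amd64.deb
```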

Remove NVIDIA Driver and CUDA

$ sudo apt --purge -y remove "*nvidia*"
$ sudo apt --purge -y remove "*cublas*" "cuda*" "nsight*"
$ sudo apt update -y && sudo apt upgrade -y && sudo apt autoremove -y && sudo apt autoclean -y
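To confirm that the purge left nothing behind, the installed package list can be filtered with the same name patterns. This is a sketch only: leftover_gpu_packages is a hypothetical helper that reads dpkg -l output on stdin and prints packages still in the installed ("ii") state.

```shell
#!/usr/bin/env bash
# leftover_gpu_packages is a hypothetical helper: it reads `dpkg -l` output on
# stdin and prints installed ("ii") packages whose names still match the
# patterns purged above (*nvidia*, *cublas*, cuda*, nsight*).
leftover_gpu_packages() {
    awk '$1 == "ii" && $2 ~ /nvidia|cublas|^cuda|^nsight/ { print $2 }'
}

# Usage (empty output means nothing matching is still installed):
#   dpkg -l | leftover_gpu_packages
```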

References

  1. NVIDIA CUDA Installation Guide for Linux
  2. CUDA Toolkit 12.3 Update 2 Downloads
  3. Meta Packages
  4. Switching between Driver Module Flavors
  5. NVIDIA CUDA Installation Guide for Linux (CUDA 11.7.1)
  6. cuDNN Archive
Written on February 17, 2024