Installing CUDA, cuDNN, and nvidia-docker on Ubuntu


This post follows the article "Installing CUDA 10.1 and cuDNN v7.6.5 on Ubuntu 18.04".

Pre-installation checks

Run lspci | grep -i nvidia to list the available NVIDIA devices:
01:00.0 VGA compatible controller: NVIDIA Corporation GP106 [GeForce GTX 1060 6GB] (rev a1)
01:00.1 Audio device: NVIDIA Corporation GP106 High Definition Audio Controller (rev a1)
Run uname -m && cat /etc/*release to get the operating system details. Here it is 64-bit Ubuntu 20.04:

x86_64
DISTRIB_ID=Ubuntu
DISTRIB_RELEASE=20.04
DISTRIB_CODENAME=focal
DISTRIB_DESCRIPTION="Ubuntu 20.04.2 LTS"
NAME="Ubuntu"
VERSION="20.04.2 LTS (Focal Fossa)"
ID=ubuntu
ID_LIKE=debian
PRETTY_NAME="Ubuntu 20.04.2 LTS"
VERSION_ID="20.04"
HOME_URL="https://www.ubuntu.com/"
SUPPORT_URL="https://help.ubuntu.com/"
BUG_REPORT_URL="https://bugs.launchpad.net/ubuntu/"
PRIVACY_POLICY_URL="https://www.ubuntu.com/legal/terms-and-policies/privacy-policy"
VERSION_CODENAME=focal
UBUNTU_CODENAME=focal

Run gcc --version to check whether gcc is installed. Here the version is (Ubuntu 9.3.0-17ubuntu1~20.04) 9.3.0.
Run uname -r to get the Linux kernel version. Here it is 5.8.0-50-generic.
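The checks above can be gathered into one pass. The following is a minimal sketch, not an official NVIDIA script. The function name preflight_summary is made up for illustration, and each tool is guarded in case it is not installed.

```shell
# preflight_summary: hypothetical helper that prints the pre-installation facts in one go.
preflight_summary() {
    echo "arch: $(uname -m)"
    echo "kernel: $(uname -r)"
    # /etc/os-release defines PRETTY_NAME on Ubuntu; read it in a subshell.
    if [ -r /etc/os-release ]; then
        echo "os: $(. /etc/os-release; echo "$PRETTY_NAME")"
    fi
    # Count PCI devices whose description mentions NVIDIA.
    if command -v lspci >/dev/null 2>&1; then
        echo "nvidia devices: $(lspci | grep -ci nvidia)"
    else
        echo "nvidia devices: lspci not found"
    fi
    # Report the first line of the gcc banner, or note that gcc is missing.
    if command -v gcc >/dev/null 2>&1; then
        echo "gcc: $(gcc --version | head -n 1)"
    else
        echo "gcc: not installed"
    fi
}

preflight_summary
```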

Notes on the CUDA and cuDNN versions to install

Based on my trial and error on Windows, the GTX 1060 works with CUDA 10.1.105_418 and cuDNN v7.6.5.32 for CUDA 10.1.

Installing CUDA

Download CUDA 10.1.105_418. There is no package for Ubuntu 20.04, so I chose the 18.10 package. Then run the commands given on the download page:

sudo dpkg -i cuda-repo-ubuntu1810-10-1-local-10.1.105-418.39_1.0-1_amd64.deb
/* Output printed by the first command
Selecting previously unselected package cuda-repo-ubuntu1810-10-1-local-10.1.105-418.39.
(Reading database ... 186150 files and directories currently installed.)
Preparing to unpack cuda-repo-ubuntu1810-10-1-local-10.1.105-418.39_1.0-1_amd64.deb ...
Unpacking cuda-repo-ubuntu1810-10-1-local-10.1.105-418.39 (1.0-1) ...
Setting up cuda-repo-ubuntu1810-10-1-local-10.1.105-418.39 (1.0-1) ...

The public CUDA GPG key does not appear to be installed.
To install the key, run this command:
sudo apt-key add /var/cuda-repo-10-1-local-10.1.105-418.39/7fa2af80.pub
*/
sudo apt-key add /var/cuda-repo-10-1-local-10.1.105-418.39/7fa2af80.pub
sudo apt-get update
sudo apt-get install cuda

Then reboot.

Checking the CUDA installation

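After rebooting, run nvidia-smi to get the GPU information. If you run nvcc -V at this point, the shell suggests "sudo apt install nvidia-cuda-toolkit". Do not do this: the nvcc that matches the installed CUDA is already on disk, and installing nvidia-cuda-toolkit from the online repository can cause a version conflict between the toolkit and CUDA that breaks the CUDA environment. (I once installed nvidia-cuda-toolkit carelessly on a host. Afterwards nvidia-smi stopped working, the machine could not use the NVIDIA GPU at all, and I had to reinstall the CUDA environment.)
Instead, add nvcc to the PATH environment variable: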

vim ~/.bashrc
# Add this line: export PATH="/usr/local/cuda-10.1/bin:$PATH"
source ~/.bashrc
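Editing ~/.bashrc by hand works, but the same line can also be appended from the command line without creating duplicates on repeated runs. This is a minimal sketch. The helper name append_once is made up for illustration.

```shell
# append_once FILE LINE: hypothetical helper.
# Appends LINE to FILE only if FILE does not already contain that exact line.
append_once() {
    touch "$1"
    grep -qxF -- "$2" "$1" || printf '%s\n' "$2" >> "$1"
}

# Usage (left commented out so the sketch does not modify your shell config):
# append_once "$HOME/.bashrc" 'export PATH="/usr/local/cuda-10.1/bin:$PATH"'
# . "$HOME/.bashrc"
```

The single quotes keep $PATH unexpanded, so the literal text is written to the file and expanded each time the shell starts.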

Running nvcc -V afterwards gives:
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2019 NVIDIA Corporation
Built on Fri_Feb__8_19:08:17_PST_2019
Cuda compilation tools, release 10.1, V10.1.105
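To confirm that the nvcc on PATH is the one that belongs to CUDA 10.1 and not one from another package, the release number can be pulled out of the nvcc -V output and compared. This is a rough sketch. The helper name cuda_release and the variable expected are made up for illustration, and the parsing assumes the "release X.Y," wording shown above.

```shell
# cuda_release: hypothetical helper.
# Reads `nvcc -V` output on stdin and prints the release number (for example 10.1).
cuda_release() {
    sed -n 's/.*release \([0-9][0-9]*\.[0-9][0-9]*\),.*/\1/p'
}

expected="10.1"
if command -v nvcc >/dev/null 2>&1; then
    found="$(nvcc -V | cuda_release)"
    if [ "$found" = "$expected" ]; then
        echo "nvcc release $found matches"
    else
        echo "nvcc release is '$found', expected $expected"
    fi
else
    echo "nvcc is not on PATH"
fi
```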

Installing cuDNN

Download cudnn-10.1-linux-x64-v7.6.5.32.tgz (cuDNN for Linux) from the NVIDIA website.

tar -xzvf cudnn-10.1-linux-x64-v7.6.5.32.tgz
sudo cp cuda/include/cudnn*.h /usr/local/cuda/include
sudo cp -P cuda/lib64/libcudnn* /usr/local/cuda/lib64 # -P keeps the libcudnn symlinks as symlinks instead of copying the library several times
sudo chmod a+r /usr/local/cuda/include/cudnn*.h /usr/local/cuda/lib64/libcudnn* # give all users read permission
vim ~/.bashrc
# Add this line: export LD_LIBRARY_PATH=/usr/local/cuda/lib64:$LD_LIBRARY_PATH
source ~/.bashrc
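One way to check that the headers landed in the right place is to read the version macros from the copied cudnn.h. This is a minimal sketch, assuming the cuDNN 7.x layout in which CUDNN_MAJOR, CUDNN_MINOR, and CUDNN_PATCHLEVEL are defined in cudnn.h itself. The helper name cudnn_version is made up for illustration.

```shell
# cudnn_version HEADER: hypothetical helper.
# Prints MAJOR.MINOR.PATCHLEVEL from the #define lines of a cuDNN header.
cudnn_version() {
    awk '
        $1 == "#define" && $2 == "CUDNN_MAJOR"      { major = $3 }
        $1 == "#define" && $2 == "CUDNN_MINOR"      { minor = $3 }
        $1 == "#define" && $2 == "CUDNN_PATCHLEVEL" { patch = $3 }
        END { if (major != "") print major "." minor "." patch }
    ' "$1"
}

header=/usr/local/cuda/include/cudnn.h
if [ -r "$header" ]; then
    echo "cuDNN $(cudnn_version "$header")"
else
    echo "$header not found"
fi
```

For the archive used here, the printed version should be 7.6.5.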

Installing nvidia-docker

Following the "Docker - Getting Started - Installing on Ubuntu and Debian" documentation, run the following commands:

curl https://get.docker.com | sh \
  && sudo systemctl --now enable docker

distribution=$(. /etc/os-release;echo $ID$VERSION_ID) \
  && curl -s -L https://nvidia.github.io/nvidia-docker/gpgkey | sudo apt-key add - \
  && curl -s -L https://nvidia.github.io/nvidia-docker/$distribution/nvidia-docker.list | sudo tee /etc/apt/sources.list.d/nvidia-docker.list

sudo apt-get update
sudo apt-get install -y nvidia-docker2
sudo systemctl restart docker
sudo docker run --rm --gpus all nvidia/cuda:11.0-base nvidia-smi
sudo docker images
/*
REPOSITORY    TAG         IMAGE ID       CREATED        SIZE
nvidia/cuda   11.0-base   2ec708416bb8   8 months ago   122MB
*/
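The distribution variable in the commands above decides which repository list gets downloaded, so it can help to print it before adding the source. This is a small sketch. The helper name distribution_id is made up for illustration. It only reads the os-release file and does not touch apt.

```shell
# distribution_id FILE: hypothetical helper.
# Prints $ID$VERSION_ID from an os-release style file; the subshell body
# keeps the sourced variables out of the calling shell.
distribution_id() (
    . "$1"
    echo "$ID$VERSION_ID"
)

if [ -r /etc/os-release ]; then
    distribution="$(distribution_id /etc/os-release)"
    echo "repo list URL: https://nvidia.github.io/nvidia-docker/$distribution/nvidia-docker.list"
fi
```

On the Ubuntu 20.04 system described earlier, ID is ubuntu and VERSION_ID is 20.04, so the value is ubuntu20.04.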

Trying it on a RedmiBook 14

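I referred to "Win10+MX250+CUDA10.1+cuDNN+Pytorch1.4 installation and testing, the whole process (exhausting)". The CUDA and cuDNN packages were the same ones used in this post. Following the steps here gave the correct result, with one problem along the way: nvidia-smi failed with "NVIDIA-SMI has failed because it couldn't communicate with the NVIDIA driver. Make sure that the latest NVIDIA driver is installed and running." Setting an administrator password in the BIOS and then turning off Secure Boot solved it.
This post was created on Wednesday, May 5, 2021, 19:41:19 CST and last modified on July 19, 2021, 14:44.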
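When nvidia-smi cannot talk to the driver, it can be worth checking the Secure Boot state from inside Ubuntu before going into the BIOS. This is a rough sketch, assuming the mokutil tool is available. It may be missing on some systems, and on non-UEFI machines it only reports an error. The function name secure_boot_state is made up for illustration.

```shell
# secure_boot_state: hypothetical helper.
# Prints what mokutil reports about Secure Boot, or a fallback message.
secure_boot_state() {
    if command -v mokutil >/dev/null 2>&1; then
        state="$(mokutil --sb-state 2>&1)"
        # An empty answer means mokutil could not tell us anything useful.
        echo "${state:-unknown}"
    else
        echo "mokutil not found"
    fi
}

secure_boot_state
```

If Secure Boot is reported as enabled and the NVIDIA kernel module does not appear in lsmod | grep nvidia, that is consistent with the error described above.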

Original URL: https://www.cnblogs.com/tellw/p/14732368.html