Installing the CUDA driver on CentOS 7.0

00. CUDA overview

  CUDA uses the parallel processing power of GPUs to accelerate deep learning and other compute-intensive applications.

01. CPU+GPU cooperative architecture

02. Deployment environment

[docker@lab-250 ~]$ cat /etc/*release
NAME="Red Hat Enterprise Linux Server"
VERSION="7.0 (Maipo)"
ID="rhel"
ID_LIKE="fedora"
VERSION_ID="7.0"
PRETTY_NAME="Red Hat Enterprise Linux Server 7.0 (Maipo)"
ANSI_COLOR="0;31"
CPE_NAME="cpe:/o:redhat:enterprise_linux:7.0:GA:server"
HOME_URL="https://www.redhat.com/"
BUG_REPORT_URL="https://bugzilla.redhat.com/"

REDHAT_BUGZILLA_PRODUCT="Red Hat Enterprise Linux 7"
REDHAT_BUGZILLA_PRODUCT_VERSION=7.0
REDHAT_SUPPORT_PRODUCT="Red Hat Enterprise Linux"
REDHAT_SUPPORT_PRODUCT_VERSION=7.0
Red Hat Enterprise Linux Server release 7.0 (Maipo)
Red Hat Enterprise Linux Server release 7.0 (Maipo)

[docker@lab-250 ~]$ uname -r
3.10.0-123.el7.x86_64
[docker@lab-250 ~]$ uname -a
Linux lab-250 3.10.0-123.el7.x86_64 #1 SMP Mon May 5 11:16:57 EDT 2014 x86_64 x86_64 x86_64 GNU/Linux

Note: a GPU card must be installed in the server.
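
One way to confirm that a card is present before installing anything is to look for an NVIDIA device on the PCI bus. The following is a minimal sketch, assuming lspci (from the pciutils package) is available. has_nvidia_gpu is a hypothetical helper name, and it only filters lspci-style text on standard input.

```shell
# has_nvidia_gpu: succeed if stdin (lspci-style output) lists an NVIDIA
# VGA, 3D, or display controller. Hypothetical helper for illustration.
has_nvidia_gpu() {
    grep -Eiq '(vga|3d|display).*nvidia'
}

# Typical use on the target server (not executed here):
#   lspci | has_nvidia_gpu && echo "NVIDIA GPU found" || echo "no NVIDIA GPU detected"
```

Reading from standard input keeps the check easy to try against saved lspci output from another machine.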

03. Download the CUDA Toolkit

https://developer.nvidia.com/cuda-toolkit-archive

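CUDA Toolkit 9.0 (Sept 2017), Online Documentation   // This is the version used in this walkthrough. Download the installer that matches your system; the local (all-in-one) installer package is recommended.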

https://developer.nvidia.com/cuda-toolkit 

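Note: the system used here is actually RHEL 7.0 but was mistaken for CentOS 7.0. As a result, some rpm packages were not installed during the steps below and had to be downloaded separately. When the installer matches the system, no extra rpm downloads are normally needed.

 cuda-repo-rhel7-9-0-local-9.0.176-1.x86_64-rpm  # For CentOS 7. CentOS is an open-source distribution built from RHEL 7, so the package name says rhel7.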
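
Because the extra downloads in this walkthrough trace back to confusing RHEL with CentOS, it can help to read the distribution ID before choosing a package. The following is a minimal sketch, assuming the release file is shell-sourceable, as /etc/os-release is designed to be. distro_id and its optional file argument are hypothetical helpers, not part of any tool.

```shell
# distro_id [FILE]: print "<ID> <VERSION_ID>" from an os-release style file.
# FILE defaults to /etc/os-release; pass a path containing a slash.
# Hypothetical helper for illustration.
distro_id() {
    release_file="${1:-/etc/os-release}"
    # Source the file in a subshell so its variables do not leak into the caller.
    (
        . "$release_file" || exit 1
        printf '%s %s\n' "$ID" "$VERSION_ID"
    )
}

# Typical use (not executed here):
#   distro_id
```

On the machine shown in section 02, the ID field is "rhel" and VERSION_ID is "7.0", which is what distinguishes it from a CentOS install.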

04. Setup

Installation Instructions:
rpm -i cuda-repo-rhel7-9-0-local-9.0.176-1.x86_64.rpm
yum clean all && yum makecache
yum install cuda

Other installation options are available in the form of meta-packages.
For example, to install all the library packages, replace "cuda" with the "cuda-libraries-9-0" meta package
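
The meta-package name quoted above combines a component name with the toolkit version, using dashes instead of the dot. The following sketch builds such a name. cuda_meta_package is a hypothetical helper, and it is an assumption that components other than "libraries" follow the same pattern, so check the repository contents before relying on it.

```shell
# cuda_meta_package COMPONENT VERSION: print "cuda-<component>-<major>-<minor>".
# Hypothetical helper; only the "libraries" / "9.0" combination is confirmed
# by the installation instructions quoted in this post.
cuda_meta_package() {
    component="$1"   # e.g. "libraries"
    version="$2"     # e.g. "9.0"
    suffix=$(printf '%s' "$version" | tr '.' '-')
    printf 'cuda-%s-%s\n' "$component" "$suffix"
}

# Typical use as root (not executed here):
#   yum install "$(cuda_meta_package libraries 9.0)"
```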

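Note: the CUDA installation detects the NVIDIA card automatically; there is no need to set the NVIDIA card as the default display adapter beforehand.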

Troubleshooting (the following rpm packages were downloaded separately):

https://mirrors.aliyun.com/epel/7/aarch64/Packages/d/dkms-2.6.1-1.el7.noarch.rpm
https://mirrors.aliyun.com/centos/7.6.1810/os/x86_64/Packages/libvdpau-1.1.1-3.el7.x86_64.rpm

--> Finished Dependency Resolution
Error: Package: 1:nvidia-kmod-384.81-2.el7.x86_64 (cuda-9-0-local)
           Requires: dkms
 You could try using --skip-broken to work around the problem
 You could try running: rpm -Va --nofiles --nodigest
[root@lab-250 ~]# rz -E
rz waiting to receive.
[root@lab-250 ~]# rpm -ivh dkms-2.6.1-1.el7.noarch.rpm 
warning: dkms-2.6.1-1.el7.noarch.rpm: Header V3 RSA/SHA256 Signature, key ID 352c64e5: NOKEY
error: Failed dependencies:
	elfutils-libelf-devel is needed by dkms-2.6.1-1.el7.noarch
[root@lab-250 ~]# 
[root@lab-250 ~]# yum install -y elfutils-libelf-devel
Resolving Dependencies
--> Running transaction check
---> Package elfutils-libelf-devel.x86_64 0:0.158-3.el7 will be installed
--> Finished Dependency Resolution

Dependencies Resolved

[root@lab-250 ~]# rpm -ivh dkms-2.6.1-1.el7.noarch.rpm
warning: dkms-2.6.1-1.el7.noarch.rpm: Header V3 RSA/SHA256 Signature, key ID 352c64e5: NOKEY
Preparing... ################################# [100%]
Updating / installing...
1:dkms-2.6.1-1.el7 ################################# [100%]
[root@lab-250 ~]#
[root@lab-250 ~]# yum install -y cuda
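
The dependency errors in the log above follow two patterns: yum reports "Requires: <name>" and rpm reports "<name> is needed by <package>". The following sketch pulls the missing names out of such output. missing_deps is a hypothetical helper, and it assumes the messages keep the wording shown in this log.

```shell
# missing_deps: read yum/rpm error output on stdin and print the names of
# the missing dependencies, one per line, without duplicates.
# Hypothetical helper for illustration.
missing_deps() {
    # yum format:  "           Requires: dkms"
    # rpm format:  "<tab>elfutils-libelf-devel is needed by dkms-2.6.1-1.el7.noarch"
    sed -n \
        -e 's/^[[:space:]]*Requires:[[:space:]]*\([^[:space:]]*\).*$/\1/p' \
        -e 's/^[[:space:]]*\([^[:space:]]*\) is needed by .*$/\1/p' | sort -u
}

# Typical use as root (not executed here):
#   yum install -y cuda 2>&1 | missing_deps
```

Listing the names first makes it easier to fetch every missing package in one pass instead of discovering them one failed install at a time.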

05. Set environment variables

/usr/local/cuda-9.0   # default installation location

vim /etc/profile

export CUDA_HOME="/usr/local/cuda-9.0"
export PATH=$CUDA_HOME/bin:$PATH
export LD_LIBRARY_PATH=$CUDA_HOME/lib64:$LD_LIBRARY_PATH

source /etc/profile
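
The export lines above prepend unconditionally, so sourcing /etc/profile again duplicates the entries, and an initially empty LD_LIBRARY_PATH ends up with a trailing colon, which the dynamic loader treats as the current directory. The following is a sketch of a guarded variant. prepend_path is a hypothetical helper, and the CUDA path is the default location mentioned above.

```shell
# prepend_path VALUE DIR: print DIR prepended to the colon-separated VALUE.
# Leaves VALUE unchanged if DIR is already present and never emits a
# trailing colon. Hypothetical helper for illustration.
prepend_path() {
    current="$1"
    dir="$2"
    case ":$current:" in
        *":$dir:"*) printf '%s\n' "$current" ;;      # already listed
        "::")       printf '%s\n' "$dir" ;;          # VALUE was empty
        *)          printf '%s\n' "$dir:$current" ;;
    esac
}

CUDA_HOME="/usr/local/cuda-9.0"
PATH="$(prepend_path "$PATH" "$CUDA_HOME/bin")"
LD_LIBRARY_PATH="$(prepend_path "${LD_LIBRARY_PATH:-}" "$CUDA_HOME/lib64")"
export CUDA_HOME PATH LD_LIBRARY_PATH
```

The same lines could live in a separate file under /etc/profile.d/ instead of being edited into /etc/profile directly; that placement is a suggestion, not something the original setup used.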

[docker@lab-250 ~]$ nvcc -V  # verify the environment variables
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2017 NVIDIA Corporation
Built on Fri_Sep__1_21:08:03_CDT_2017
Cuda compilation tools, release 9.0, V9.0.176
[docker@lab-250 ~]$ nvidia-smi  # show GPU information for this machine; the failure below occurs because the test machine has no GPU card installed
NVIDIA-SMI has failed because it couldn't communicate with the NVIDIA driver. Make sure that the latest NVIDIA driver is installed and running.
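
To check the toolkit version from a script rather than by eye, the release number can be extracted from the nvcc -V output shown above. The following is a minimal sketch. nvcc_release is a hypothetical helper that only parses text on standard input and assumes the "release X.Y," wording seen in this output.

```shell
# nvcc_release: read `nvcc -V` output on stdin and print the release number
# (for example "9.0"). Prints nothing if no release line is found.
# Hypothetical helper for illustration.
nvcc_release() {
    sed -n 's/^.*release \([0-9][0-9.]*\),.*$/\1/p'
}

# Typical use after sourcing /etc/profile (not executed here):
#   [ "$(nvcc -V | nvcc_release)" = "9.0" ] && echo "CUDA 9.0 toolkit is on PATH"
```

This only confirms that the compiler is reachable through PATH; it says nothing about the driver, which is what nvidia-smi checks.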

References:

https://baijiahao.baidu.com/s?id=1610852365402771191&wfr=spider&for=pc

https://www.jianshu.com/p/34a504af8d51

Original post: https://www.cnblogs.com/xiaochina/p/10631522.html