Tutorial: Building the Distributed Parallel Version of Caffe (Open MPI)

Caffe version: https://github.com/yjxiong/caffe

Environment:

CentOS release 6.6 (Final)
CUDA 8.0
cuDNN 6.0
Open MPI 3.1.3
OpenCV 3.1.0

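CUDA 8.0, cuDNN 6.0, OpenCV 3.1.0, and the other dependencies Caffe needs are already installed, so only Open MPI 3.1.3 has to be installed here. The steps are as follows:

Installing Open MPI 3.1.3

1. Extract openmpi-3.1.3, enter the extracted directory (openmpi-3.1.3), and run the following commands in a terminal:

./configure --prefix=/storage/student5/usr/local/openmpi --with-cuda --enable-mpi-thread-multiple
# The path after --prefix is the Open MPI installation path.
sudo make all install
# Run make all install with sudo; otherwise the installation may run into problems.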
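Because Open MPI is installed under a custom prefix rather than a system directory, the shell usually has to be told where its binaries and libraries live before mpirun and mpicxx resolve to this build. The following is a minimal sketch, assuming the --prefix value from the configure command above; the helper name prepend_once and the OPENMPI_HOME variable are illustrative and not part of Open MPI.

```shell
# Prepend a directory to a colon-separated list unless it is already present.
prepend_once() {
    local dir="$1" list="$2"
    case ":${list}:" in
        *":${dir}:"*) printf '%s\n' "${list}" ;;
        *) printf '%s\n' "${dir}${list:+:${list}}" ;;
    esac
}

# Assumption: this matches the --prefix passed to ./configure.
OPENMPI_HOME="${OPENMPI_HOME:-/storage/student5/usr/local/openmpi}"

PATH="$(prepend_once "${OPENMPI_HOME}/bin" "${PATH}")"
LD_LIBRARY_PATH="$(prepend_once "${OPENMPI_HOME}/lib" "${LD_LIBRARY_PATH:-}")"
export PATH LD_LIBRARY_PATH
```

Placing these lines in ~/.bashrc would make the setting persistent. Running command -v mpirun afterwards should print a path under the chosen prefix.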

2. Test whether the installation succeeded:

cd openmpi-3.1.3/examples
make
mpirun -np 4 hello_c
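Since configure was run with --with-cuda, it can also be worth confirming that the resulting build reports CUDA support. The Open MPI documentation describes querying this through ompi_info. The sketch below wraps that query in a small helper that reads the ompi_info output on stdin; the function name is an assumption made for illustration.

```shell
# Read the output of `ompi_info --parsable --all` on stdin and print
# "true" if the build reports CUDA support, otherwise "false".
cuda_support_from_ompi_info() {
    awk -F: '
        /mpi_built_with_cuda_support:value/ { found = $NF }
        END {
            if (found == "true") print "true"
            else print "false"
        }
    '
}

# Usage on a machine where this Open MPI build is on PATH:
#   ompi_info --parsable --all | cuda_support_from_ompi_info
```

The helper only parses text, so it can be tried on saved ompi_info output as well as on a live installation.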

Installing Caffe

1. Download Caffe, save Makefile.config.example as Makefile.config, and edit it to look like this:

## Refer to http://caffe.berkeleyvision.org/installation.html
# Contributions simplifying and improving our build system are welcome!

# cuDNN acceleration switch (uncomment to build with cuDNN).
USE_CUDNN := 1

# CPU-only switch (uncomment to build without GPU support).
# CPU_ONLY := 1

# uncomment to disable IO dependencies and corresponding data layers
USE_OPENCV := 1
USE_LEVELDB := 1
USE_LMDB := 1

# Uncomment if you're using OpenCV 3
OPENCV_VERSION := 3

# To customize your choice of compiler, uncomment and set the following.
# N.B. the default for Linux is g++ and the default for OSX is clang++
# CUSTOM_CXX := g++

# CUDA directory contains bin/ and lib/ directories that we need.
CUDA_DIR := /usr/local/cuda
# On Ubuntu 14.04, if cuda tools are installed via
# "sudo apt-get install nvidia-cuda-toolkit" then use this instead:
# CUDA_DIR := /usr

# CUDA architecture setting: going with all of them.
# For CUDA < 6.0, comment the *_50 lines for compatibility.
CUDA_ARCH := -gencode arch=compute_30,code=sm_30 \
        -gencode arch=compute_35,code=sm_35 \
        -gencode arch=compute_50,code=sm_50 \
        -gencode arch=compute_50,code=compute_50

# BLAS choice:
# atlas for ATLAS (default)
# mkl for MKL
# open for OpenBlas
BLAS := atlas
# Custom (MKL/ATLAS/OpenBLAS) include and lib directories.
# Leave commented to accept the defaults for your choice of BLAS
# (which should work)!
BLAS_INCLUDE := /usr/include
BLAS_LIB := /usr/lib64/atlas

# Homebrew puts openblas in a directory that is not on the standard search path
# BLAS_INCLUDE := $(shell brew --prefix openblas)/include
# BLAS_LIB := $(shell brew --prefix openblas)/lib

# This is required only if you will compile the matlab interface.
# MATLAB directory should contain the mex binary in /bin.
MATLAB_DIR := /usr/local/MATLAB/R2014a
# MATLAB_DIR := /Applications/MATLAB_R2012b.app

# NOTE: this is required only if you will compile the python interface.
# We need to be able to find Python.h and numpy/arrayobject.h.
PYTHON_INCLUDE := /usr/include/python2.7 \
        /usr/lib/python2.7/dist-packages/numpy/core/include
# Anaconda Python distribution is quite popular. Include path:
# Verify anaconda location, sometimes it's in root.
# ANACONDA_HOME := $(HOME)/anaconda
# PYTHON_INCLUDE := $(ANACONDA_HOME)/include \
        # $(ANACONDA_HOME)/include/python2.7 \
        # $(ANACONDA_HOME)/lib/python2.7/site-packages/numpy/core/include \

# We need to be able to find libpythonX.X.so or .dylib.
PYTHON_LIB := /usr/lib
# PYTHON_LIB := $(ANACONDA_HOME)/lib

# Homebrew installs numpy in a non standard path (keg only)
# PYTHON_INCLUDE += $(dir $(shell python -c 'import numpy.core; print(numpy.core.__file__)'))/include
# PYTHON_LIB += $(shell brew --prefix numpy)/lib

# Uncomment to support layers written in Python (will link against Python libs)
WITH_PYTHON_LAYER := 1

# Whatever else you find you need goes here.
INCLUDE_DIRS := $(PYTHON_INCLUDE) /usr/local/include
LIBRARY_DIRS := $(PYTHON_LIB) /usr/local/lib /usr/lib

# If Homebrew is installed at a non standard location (for example your home directory) and you use it for general dependencies
# INCLUDE_DIRS += $(shell brew --prefix)/include
# LIBRARY_DIRS += $(shell brew --prefix)/lib

# Uncomment to use `pkg-config` to specify OpenCV library paths.
# (Usually not necessary -- OpenCV libraries are normally installed in one of the above $LIBRARY_DIRS.)
# USE_PKG_CONFIG := 1

BUILD_DIR := build
DISTRIBUTE_DIR := distribute

# Uncomment for debugging. Does not work on OSX due to https://github.com/BVLC/caffe/issues/171
# DEBUG := 1

# The ID of the GPU that 'make runtest' will use to run unit tests.
TEST_GPUID := 0

# enable pretty build (comment to see full commands)
Q ?= @

2. In the Caffe directory, run:

mkdir build && cd build

3. Build Caffe

  To enable the MATLAB interface, first edit line 24 of CMakeLists.txt in the Caffe root directory:

caffe_option(BUILD_matlab "Build Matlab wrapper" OFF IF UNIX OR APPLE)

  Change it to:

caffe_option(BUILD_matlab "Build Matlab wrapper" ON IF UNIX OR APPLE)

  Then (or directly, if the MATLAB interface is not needed) run the following in caffe/build:

cmake -DUSE_MPI=ON -DMPI_CXX_COMPILER=/path/to/your/openmpi/bin/mpicxx ..
# USE_MPI=ON enables Open MPI support.
# The path after -DMPI_CXX_COMPILER must be the mpicxx in the bin directory of your Open MPI installation. There is also an mpicxx under /usr/bin; do not use that one by mistake.
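Because a wrong mpicxx path is an easy mistake to make here, a small guard can help. The following is a minimal sketch, not part of Caffe or Open MPI: the hypothetical helper caffe_mpi_cmake_cmd checks that the given mpicxx exists and is executable, then prints the cmake command line from this step so it can be reviewed before running.

```shell
# Print the cmake command for an MPI-enabled build, or fail if the
# given mpicxx path does not point at an executable file.
caffe_mpi_cmake_cmd() {
    local mpicxx="$1"
    if [ ! -x "${mpicxx}" ]; then
        echo "mpicxx not found or not executable: ${mpicxx}" >&2
        return 1
    fi
    printf 'cmake -DUSE_MPI=ON -DMPI_CXX_COMPILER=%s ..\n' "${mpicxx}"
}

# Usage from caffe/build (the path is a placeholder, as in the text):
#   caffe_mpi_cmake_cmd /path/to/your/openmpi/bin/mpicxx
```

Printing the command rather than running it keeps the helper free of side effects; the printed line can be pasted into the terminal once the path looks right.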

4. Install Caffe. In the Caffe root directory, run:

make all -j8
make install
# In my case, make install was not needed after make all.
make runtest
# As in the reference tutorials, two tests failed.

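5. Build the Python interface:

  a. Add an environment variable:

gedit ~/.bashrc

  b. Add the following line to it:

export PYTHONPATH=$PYTHONPATH:/path/to/your/caffe/python

  c. Apply the change:

source ~/.bashrc

  d. In the Caffe root directory:

make pycaffe
# The reference tutorial uses sudo here, but it also worked for me without sudo.

  e. Test the Python interface by entering the following in a terminal:

python
import caffe
# If there is no error, the Python interface was built successfully.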
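Steps a to c can also be done without an editor, which is convenient on a cluster node with no desktop session. The following is a minimal sketch; the helper name append_line_once is illustrative, and the Caffe path is the same placeholder used in step b.

```shell
# Append a line to a file only if that exact line is not already there,
# so repeated runs do not pile up duplicate exports.
append_line_once() {
    local file="$1" line="$2"
    touch "${file}"
    grep -qxF -- "${line}" "${file}" || printf '%s\n' "${line}" >> "${file}"
}

# Usage (replace the placeholder path with your Caffe checkout):
#   append_line_once ~/.bashrc 'export PYTHONPATH=$PYTHONPATH:/path/to/your/caffe/python'
#   source ~/.bashrc
```

The single quotes in the usage line keep $PYTHONPATH unexpanded, so the variable is evaluated each time ~/.bashrc is sourced rather than once when the line is written.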

Problems encountered:

1. While building Caffe during installation, the following command failed:

cmake -DUSE_MPI=ON -DMPI_CXX_COMPILER=/path/to/your/openmpi/bin/mpicxx ..

  Problem 1:

CMake Warning at /usr/local/opencv-3.1.0/cmake/OpenCVConfig.cmake:166 (message):
  Found OpenCV Windows Pack but it has no binaries compatible with your
  configuration.

  You should manually point CMake variable OpenCV_DIR to your build of OpenCV
  library.
Call Stack (most recent call first):
  cmake/Dependencies.cmake:62 (find_package)
  CMakeLists.txt:31 (include)


CMake Error at cmake/Dependencies.cmake:62 (find_package):
  Found package configuration file:

    /usr/local/opencv-3.1.0/cmake/OpenCVConfig.cmake

  but it set OpenCV_FOUND to FALSE so package "OpenCV" is considered to be
  NOT FOUND.
Call Stack (most recent call first):
  CMakeLists.txt:31 (include)


-- Configuring incomplete, errors occurred!
See also "/storage/student5/usr/local/caffe/build/CMakeFiles/CMakeOutput.log".
See also "/storage/student5/usr/local/caffe/build/CMakeFiles/CMakeError.log".

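  Solution:

    Attempt 1: add set(OpenCV_DIR /path/to/your/OpenCV/build) to CMakeLists.txt. This did not work.

    Attempt 2: go back to the Caffe root directory, run make clean, temporarily set the following environment variable, and start again from mkdir build && cd build. This worked.

export OpenCV_DIR=/path/to/your/opencv/build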
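For background, CMake's find_package treats OpenCV_DIR as the directory that contains OpenCVConfig.cmake, so the value should point at the directory where your OpenCV build generated that file. The sketch below, with a hypothetical helper name and placeholder candidate paths, picks the first candidate directory that actually contains the file.

```shell
# Print the first directory among the arguments that contains
# OpenCVConfig.cmake; return non-zero if none of them does.
find_opencv_dir() {
    local dir
    for dir in "$@"; do
        if [ -f "${dir}/OpenCVConfig.cmake" ]; then
            printf '%s\n' "${dir}"
            return 0
        fi
    done
    return 1
}

# Usage (candidate paths are placeholders, not known locations):
#   OpenCV_DIR="$(find_opencv_dir /path/to/your/opencv/build /another/candidate)" \
#       && export OpenCV_DIR
```

Passing -DOpenCV_DIR=/path/to/your/opencv/build on the cmake command line is another common way to set the same variable.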

  Problem 2:

CMake Error at /usr/share/cmake/Modules/FindPackageHandleStandardArgs.cmake:108 (message):
  Could NOT find Atlas (missing: Atlas_LAPACK_LIBRARY)
Call Stack (most recent call first):
  /usr/share/cmake/Modules/FindPackageHandleStandardArgs.cmake:315 (_FPHSA_FAILURE_MESSAGE)
  cmake/Modules/FindAtlas.cmake:43 (find_package_handle_standard_args)
  cmake/Dependencies.cmake:74 (find_package)
  CMakeLists.txt:31 (include)

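  Solution:

    Attempt 1: specify the Atlas path. Go back to the Caffe root directory, run make clean, temporarily set the environment variable with export Atlas_ROOT_DIR=/your/Atlas/Root, and start again from mkdir build && cd build. This did not work.

    Attempt 2: go back to the Caffe root directory, run make clean, start again from mkdir build && cd build, enter the following command in a terminal, and then continue. This worked.

cmake -DBLAS=open ..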

2. When running make all -j8:

  Problem 1:

/usr/bin/ld: .build_release/examples/cpp_classification/classification.o: undefined reference to symbol '_ZN2cv6imreadERKNS_6StringEi'
/usr/local/lib/libopencv_imgcodecs.so.3.1: error adding symbols: DSO missing from command line
collect2: error: ld returned 1 exit status
make: *** [.build_release/examples/cpp_classification/classification.bin] Error 1
make: *** Waiting for unfinished jobs....

  Solution: because opencv-3.x is used, libopencv_imgcodecs.so must be linked as well. In the Makefile, modify line 172 as follows:

LIBRARIES += glog gflags protobuf leveldb snappy \
    lmdb boost_system hdf5_hl hdf5 m \
    opencv_core opencv_highgui opencv_imgproc

  Change it to:

LIBRARIES += glog gflags protobuf leveldb snappy \
    lmdb boost_system hdf5_hl hdf5 m \
    opencv_core opencv_highgui opencv_imgproc opencv_imgcodecs

  Problem 2:

nvcc warning : The 'compute_20', 'sm_20', and 'sm_21' architectures are deprecated, and may be
removed in a future release (Use -Wno-deprecated-gpu-targets to suppress warning).

  Solution: delete the following entries from Makefile.config:

-gencode arch=compute_20,code=sm_20
-gencode arch=compute_20,code=sm_21
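When deleting these entries by hand, take care not to remove the CUDA_ARCH := assignment or the line-continuation backslashes that may share a line with them. The following sketch shows one way to strip only the two compute_20 entries with sed, assuming they appear in the form shown above; the function name is illustrative.

```shell
# Remove only the compute_20 gencode entries from Makefile.config text
# read on stdin, leaving the rest of each line (assignment, trailing
# backslash) untouched.
strip_compute20_gencode() {
    sed -E 's/-gencode[[:space:]]+arch=compute_20,code=sm_2[01][[:space:]]*//'
}

# Usage (writes to a new file so the original can be compared first):
#   strip_compute20_gencode < Makefile.config > Makefile.config.new
```

A line that contained only one of these entries is left holding just its continuation backslash, which make accepts, so the remaining CUDA_ARCH entries still join into one assignment.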

References:

1. https://blog.csdn.net/whyerdiku/article/details/78842498 (Python and MATLAB interfaces)

2. http://www.cnblogs.com/beihaidao/p/6866342.html (Python and MATLAB interfaces)

3. https://blog.csdn.net/qq_21368481/article/details/81257265?tdsourcetag=s_pctim_aiomsg (MATLAB interface)

Original article: https://www.cnblogs.com/mantha/p/10278525.html