Using the Intel Neural Compute Stick (NCS) on the Jetson Platform

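Operating modes

The NCS is used in two ways. On a host machine, a trained model is compiled into a graph file that the NCS can execute; this file is what the stick uses for inference. On portable computers such as the Raspberry Pi or Jetson TX2, the NCS accelerates the inference itself.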

Using the NCS on the host

Installation

Plug the NCS into the host and run the following commands in a terminal:

git clone https://github.com/movidius/ncsdk
cd ncsdk
make install

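make install does the following:

Checks for TensorFlow and installs it if needed;
Checks for Caffe (SSD-Caffe) and installs it if needed;
Builds and installs ncsdk (without the inference module; only the mvNCCompile-related modules, which convert Caffe or TensorFlow models into NCS graph files).

Then run:

make examples

If the examples run without errors, the installation succeeded.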

Usage

To compile a trained model into a graph file that the NCS can execute, run the following command in a terminal:

mvNCCompile network.prototxt -w network.caffemodel -s MaxNumberOfShaves -in InputNodeName -on OutputNodeName -is InputWidth InputHeight -o OutputGraphFilename

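network.prototxt: path to the .prototxt file
-w network.caffemodel: path to the model weights file
-s MaxNumberOfShaves: 1, 2, 4, 8, or 12; the default is 12
-in InputNodeName: selects a specific input layer; the name must match one in the prototxt file (optional)
-on OutputNodeName: by default the network is processed through to its output tensor; this option selects an alternative end point in the network (optional)
-is InputWidth InputHeight: input dimensions, which must match the network
-o OutputGraphFilename: path where the generated graph file is stored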
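The option list above can also be assembled from a script. The following is a minimal sketch, assuming a hypothetical helper named build_compile_command (it is not part of the NCSDK) that only uses the flags listed above; the file names are placeholders.

```python
import shlex


def build_compile_command(prototxt, weights, output_graph,
                          shaves=12, input_size=None,
                          input_node=None, output_node=None):
    """Build an mvNCCompile argument list from the options described above.

    Hypothetical helper for illustration: it only assembles the command,
    it does not run it.
    """
    cmd = ["mvNCCompile", prototxt, "-w", weights, "-s", str(shaves)]
    if input_node is not None:       # optional: -in InputNodeName
        cmd += ["-in", input_node]
    if output_node is not None:      # optional: -on OutputNodeName
        cmd += ["-on", output_node]
    if input_size is not None:       # -is InputWidth InputHeight
        width, height = input_size
        cmd += ["-is", str(width), str(height)]
    cmd += ["-o", output_graph]
    return cmd


# Placeholder file names; substitute your own model files.
command = build_compile_command(
    "network.prototxt", "network.caffemodel", "graph",
    shaves=12, input_size=(224, 224),
)
print(" ".join(shlex.quote(arg) for arg in command))
```

On a host where the NCSDK is installed, the returned list could be passed to subprocess.run(command, check=True) to run the compiler.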

Installing the NCS on the Jetson TX2

The TX2 only runs inference, so the API-only installation is sufficient. Plug the NCS into the TX2.

Install dependencies

sudo apt-get install -y libusb-1.0-0-dev libprotobuf-dev libleveldb-dev libsnappy-dev libopencv-dev libhdf5-serial-dev protobuf-compiler libatlas-base-dev git automake byacc lsb-release cmake libgflags-dev libgoogle-glog-dev liblmdb-dev swig3.0 graphviz libxslt-dev libxml2-dev gfortran python3-dev python-pip python3-pip python3-setuptools python3-markdown python3-pillow python3-yaml python3-pygraphviz python3-h5py python3-nose python3-lxml python3-matplotlib python3-numpy python3-protobuf python3-dateutil python3-skimage python3-scipy python3-six python3-networkx python3-tk

Download the source code

mkdir ~/workspace
cd ~/workspace
git clone https://github.com/movidius/ncsdk

Build and install the NCSDK API

cd ~/workspace/ncsdk/api/src
make
sudo make install

Test

cd ~/workspace
git clone https://github.com/movidius/ncappzoo
cd ncappzoo/apps/hello_ncs_py
python3 hello_ncs.py

The program should print:

Hello NCS! Device opened normally.

Goodbye NCS! Device closed normally.

NCS device working.

This output confirms that the API-only installation succeeded.

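Using the NCS to accelerate inference on the TX2

Predefined parameters:

GRAPH_PATH: path to the graph file;
IMAGE_PATH: path to the image to classify;
IMAGE_DIM: image dimensions defined by the chosen network; e.g. GoogLeNet uses 224x224 pixels, AlexNet uses 227x227 pixels
IMAGE_STDDEV: standard deviation (scaling factor) defined by the chosen network; e.g. GoogLeNet uses no scaling factor, InceptionV3 uses 128 (stddev = 1/128)
IMAGE_MEAN: per-channel mean used for mean subtraction, a common deep-learning technique for centering the data; e.g. for the ILSVRC dataset the mean is Blue = 102, Green = 117, Red = 123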
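The parameters above can be grouped into one configuration object. This is a minimal sketch: the ClassifierConfig class is a hypothetical name introduced for illustration, the two paths are placeholders, and the numeric values are the GoogLeNet and InceptionV3 examples quoted in the list.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class ClassifierConfig:
    """Bundle of the predefined parameters listed above."""
    graph_path: str                          # GRAPH_PATH
    image_path: str                          # IMAGE_PATH
    image_dim: Tuple[int, int]               # IMAGE_DIM
    image_stddev: float                      # IMAGE_STDDEV (scaling factor)
    image_mean: Tuple[float, float, float]   # IMAGE_MEAN in B, G, R order


# GoogLeNet values from the examples above; both paths are placeholders
# (assumptions), not files shipped with the SDK.
GOOGLENET = ClassifierConfig(
    graph_path="graph",
    image_path="image.jpg",
    image_dim=(224, 224),
    image_stddev=1.0,                  # GoogLeNet uses no scaling factor
    image_mean=(102.0, 117.0, 123.0),  # ILSVRC mean: B, G, R
)

# InceptionV3 scales by 1/128, as noted in the list above.
INCEPTION_V3_STDDEV = 1 / 128

print(GOOGLENET.image_dim, GOOGLENET.image_stddev, INCEPTION_V3_STDDEV)
```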

Classifying an image with the NCS takes five steps.

First, import the mvncapi module from the mvnc library:

import mvnc.mvncapi as mvnc

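Step 1

When the NCS is plugged into a USB port of the application processor (an Ubuntu laptop or desktop), it enumerates as a USB device. Call the API to look for enumerated NCS devices: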

# Look for enumerated Intel Movidius NCS device(s); quit program if none found.
devices = mvnc.EnumerateDevices()
if len( devices ) == 0:
    print( 'No devices found' )
    quit()

Next, select an NCS and open it. If several sticks are plugged in, choose one by its index in the list:

# Get a handle to the first enumerated device and open it
device = mvnc.Device( devices[0] )
device.OpenDevice()

Step 2

Load the graph file onto the NCS

# Read the graph file into a buffer
with open( GRAPH_PATH, mode='rb' ) as f:
    blob = f.read()

# Load the graph buffer into the NCS
graph = device.AllocateGraph( blob )
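Reading the graph file is ordinary binary file I/O, so it can be checked before the buffer is handed to the device. A minimal sketch, assuming a hypothetical helper read_graph_blob (not part of the mvnc API); the temporary file merely stands in for a real compiled graph.

```python
import os
import tempfile


def read_graph_blob(graph_path):
    """Return the raw bytes of a compiled graph file.

    Hypothetical helper: fails early with a clear message instead of
    handing an empty buffer to the device.
    """
    with open(graph_path, mode="rb") as f:
        blob = f.read()
    if not blob:
        raise ValueError("graph file is empty: " + graph_path)
    return blob


# Demonstration with a throwaway file standing in for a real graph file.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"\x00\x01\x02\x03")
    demo_path = tmp.name

blob = read_graph_blob(demo_path)
print(len(blob), "bytes read")
os.remove(demo_path)
```

With a stick attached, the returned bytes would be passed to device.AllocateGraph( blob ) exactly as in the snippet above.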

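Step 3

Load an image onto the Intel Movidius NCS to run inference

Image preprocessing:
1. Resize or crop the image to match the dimensions defined by the pre-trained network; e.g. GoogLeNet uses 224x224 pixels, AlexNet uses 227x227 pixels.
2. Subtract the per-channel mean (blue, green, and red), computed over the whole dataset, from the image. This is a common deep-learning technique for centering the data.
3. Convert the image to an array of half-precision floating point numbers (fp16, the input format of the NCS) and load it onto the NCS with a LoadTensor call. With the skimage library, reading and resizing the image each take a single line of code.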

import numpy
import skimage.io
import skimage.transform

# Read & resize image (image size is defined during training)
img = print_img = skimage.io.imread( IMAGE_PATH )
img = skimage.transform.resize( img, IMAGE_DIM, preserve_range=True )

# Convert RGB to BGR [skimage reads image in RGB, but Caffe uses BGR]
img = img[:, :, ::-1]

# Mean subtraction & scaling [a common technique used to center the data]
img = img.astype( numpy.float32 )
img = ( img - IMAGE_MEAN ) * IMAGE_STDDEV

# Load the image as a half-precision floating point array
graph.LoadTensor( img.astype( numpy.float16 ), 'user object' )
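To make the preprocessing arithmetic concrete, the sketch below applies the same steps to a single made-up pixel. It uses the standard-library struct module in place of numpy.float16 so that it runs without extra packages; the helper names to_fp16 and preprocess_pixel are hypothetical, and the mean and scaling values are the GoogLeNet examples from the parameter list.

```python
import struct


def to_fp16(value):
    """Round a Python float to the nearest half-precision (fp16) value."""
    return struct.unpack("<e", struct.pack("<e", value))[0]


def preprocess_pixel(rgb, mean_bgr, scale):
    """Apply the steps above to one pixel: RGB->BGR, center, scale, fp16."""
    bgr = rgb[::-1]                                     # reverse channel order
    centered = [c - m for c, m in zip(bgr, mean_bgr)]   # mean subtraction
    return [to_fp16(v * scale) for v in centered]       # scaling + fp16 cast


IMAGE_MEAN = [102.0, 117.0, 123.0]   # B, G, R means from the parameter list
IMAGE_STDDEV = 1.0                   # GoogLeNet: no scaling

pixel_rgb = [200.0, 117.0, 2.0]      # made-up R, G, B values
print(preprocess_pixel(pixel_rgb, IMAGE_MEAN, IMAGE_STDDEV))

# fp16 keeps only about three decimal digits, so most values are rounded.
print(to_fp16(0.1))
```

The second print shows why the cast to fp16 is done last: values are rounded to the nearest representable half-precision number, so the mean subtraction and scaling are better carried out in float32 first, as in the snippet above.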

Step 4

Read and print the inference results from the NCS

# Get the results from NCS
output, userobj = graph.GetResult()

# Print the results
print( '\n------- predictions --------' )

# Load the class labels from a tab-delimited text file
labels = numpy.loadtxt( LABELS_FILE_PATH, str, delimiter = '\t' )

# Sort class indices by descending score and print the top five
order = output.argsort()[::-1][:6]
for i in range( 0, 5 ):
    print( 'prediction ' + str(i) + ' is ' + labels[order[i]] )

# Display the image on which inference was performed
skimage.io.imshow( IMAGE_PATH )
skimage.io.show( )
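The ranking step, output.argsort()[::-1], sorts the class indices by descending score. The sketch below reproduces that logic in plain Python on made-up scores and labels, which stand in for the NCS output and the label file; top_k is a hypothetical helper name.

```python
def top_k(scores, labels, k=5):
    """Return the k (label, score) pairs with the highest scores."""
    # Indices sorted by descending score, the same ordering that
    # output.argsort()[::-1] gives for a NumPy array (ties aside).
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return [(labels[i], scores[i]) for i in order[:k]]


# Made-up scores and labels standing in for the NCS output and label file.
scores = [0.01, 0.70, 0.05, 0.20, 0.03, 0.01]
labels = ["cat", "dog", "fox", "wolf", "bear", "owl"]

for rank, (label, score) in enumerate(top_k(scores, labels, k=3)):
    print("prediction " + str(rank) + " is " + label + " (" + str(score) + ")")
```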

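Step 5

Unload the graph and close the device

To avoid memory leaks and/or segmentation faults, close all open files and resources and free all memory in use.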

graph.DeallocateGraph()
device.CloseDevice()
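One way to make sure both cleanup calls run even when inference raises an error is to wrap them in a context manager. This is a sketch under stated assumptions: allocated_graph is a hypothetical helper, and FakeDevice and FakeGraph are stand-ins that only record calls so the example runs without hardware. They borrow the method names used in this article but are not the mvnc classes.

```python
from contextlib import contextmanager


@contextmanager
def allocated_graph(device, blob):
    """Open the device, allocate the graph, and always release both."""
    device.OpenDevice()
    try:
        graph = device.AllocateGraph(blob)
        try:
            yield graph
        finally:
            graph.DeallocateGraph()
    finally:
        device.CloseDevice()


# Stand-in classes (assumptions) that only record the calls made on them.
calls = []


class FakeGraph:
    def DeallocateGraph(self):
        calls.append("DeallocateGraph")


class FakeDevice:
    def OpenDevice(self):
        calls.append("OpenDevice")

    def AllocateGraph(self, blob):
        calls.append("AllocateGraph")
        return FakeGraph()

    def CloseDevice(self):
        calls.append("CloseDevice")


try:
    with allocated_graph(FakeDevice(), b"graph bytes") as graph:
        raise RuntimeError("inference failed")   # simulated error
except RuntimeError:
    pass

print(calls)
```

With a real stick, mvnc.Device( devices[0] ) would take the place of FakeDevice(); the graph is deallocated before the device is closed, matching the order of the two calls above.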

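Running the demo

The demo uses the program from Adrian Rosebrock's blog post "Real-time object detection on the Raspberry Pi with the Movidius NCS" on PyImageSearch (https://www.pyimagesearch.com/2018/02/19/real-time-object-detection-on-the-raspberry-pi-with-the-movidius-ncs/). The program runs real-time detection on a video stream with the MobileNet-SSD model, as shown in Figure 2; the demo reads live video from a USB camera. Figure 3 shows the detection results.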

References

https://github.com/movidius/ncsdk

https://movidius.github.io/blog/ncs-apps-on-rpi/

https://movidius.github.io/blog/ncs-image-classifier/

https://www.pyimagesearch.com/2018/02/19/real-time-object-detection-on-the-raspberry-pi-with-the-movidius-ncs/