TVM Cross Compilation and Remote RPC


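This tutorial introduces cross compilation and remote device execution with RPC in TVM.

With cross compilation and RPC, you can compile a program on your local machine and then run it on a remote device. This is useful when the remote device has limited resources, as with a Raspberry Pi or mobile platforms. This tutorial uses a Raspberry Pi as the CPU example and a Firefly-RK3399 as the OpenCL example.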

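Build TVM Runtime on Device

The first step is to build the TVM runtime on the remote device.

All instructions in this section and the next should be executed on the target device, e.g. a Raspberry Pi. We assume the target is running Linux.

Since compilation happens on the local machine, the remote device is only used to run the generated code, so only the TVM runtime needs to be built on it.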

git clone --recursive https://github.com/apache/tvm tvm
cd tvm
make runtime -j2

After the runtime builds successfully, set the environment variables in ~/.bashrc. Edit the file with vi ~/.bashrc and add the following line (assuming the TVM directory is ~/tvm):

export PYTHONPATH=$PYTHONPATH:~/tvm/python

To update the environment variables, run source ~/.bashrc.
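To confirm that the new PYTHONPATH entry is visible in a fresh shell, a small stdlib-only check such as the one below could be run on the device. It is a sketch: the helper name pythonpath_contains is hypothetical and not part of TVM, and it only checks the path entry, not whether the runtime library itself was built.

```python
import os


def pythonpath_contains(entry, pythonpath=None):
    """Return True if `entry` appears among the PYTHONPATH entries.

    Hypothetical helper (not a TVM API). `~` is expanded and paths are
    normalized on both sides, so "~/tvm/python" matches "/home/pi/tvm/python/".
    """
    if pythonpath is None:
        pythonpath = os.environ.get("PYTHONPATH", "")
    wanted = os.path.normpath(os.path.expanduser(entry))
    # Skip empty entries, e.g. the leading one produced by
    # `export PYTHONPATH=$PYTHONPATH:...` when PYTHONPATH was previously unset.
    entries = [p for p in pythonpath.split(os.pathsep) if p]
    return any(os.path.normpath(os.path.expanduser(p)) == wanted for p in entries)


# Usage on the device, after `source ~/.bashrc`:
print(pythonpath_contains("~/tvm/python"))
```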

Set Up RPC Server on Device

To start an RPC server, run the following command on the remote device (a Raspberry Pi in this example).

python -m tvm.exec.rpc_server --host 0.0.0.0 --port=9090

If you see the line below, the RPC server has started successfully on the device.

INFO:root:RPCServer: bind to 0.0.0.0:9090
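Before connecting from the local machine, it can help to confirm that the server's port is reachable over the network. The sketch below is a plain TCP check using only the Python standard library; the helper name rpc_port_open is hypothetical, and it does not perform TVM's RPC handshake, it only tells you whether something is listening on host:port.

```python
import socket


def rpc_port_open(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within `timeout` seconds.

    Hypothetical helper: this only checks that a listener exists on the port;
    it does not speak TVM's RPC protocol.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers connection refused, unreachable hosts and timeouts.
        return False


# Usage from the local machine (replace with your board's IP address):
# print(rpc_port_open("10.77.1.162", 9090))
```

If this returns False, check the device IP, the --port value, and any firewall between the two machines before debugging TVM itself.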

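Declare and Cross Compile Kernel on Local Machine

Now go back to the local machine, which has a full TVM installation (with LLVM).

Declare a simple kernel on the local machine: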

import numpy as np

import tvm
from tvm import te
from tvm import rpc
from tvm.contrib import utils

n = tvm.runtime.convert(1024)
A = te.placeholder((n,), name="A")
B = te.compute((n,), lambda i: A[i] + 1.0, name="B")
s = te.create_schedule(B.op)
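For reference, the compute declaration above describes an element-wise "add one"; with the default schedule from te.create_schedule it lowers to a single loop over i. The plain-Python loop below is only a sketch of those semantics (add_one_reference is a hypothetical name, and Python floats are float64 whereas te.placeholder defaults to float32).

```python
def add_one_reference(a):
    """Scalar-loop sketch of B[i] = A[i] + 1.0 for i in [0, len(a))."""
    b = [0.0] * len(a)
    for i in range(len(a)):
        b[i] = a[i] + 1.0
    return b


print(add_one_reference([0.0, 1.5, -2.0]))  # [1.0, 2.5, -1.0]
```

This is also the relation that the later check np.testing.assert_equal(b.asnumpy(), a.asnumpy() + 1) verifies on the device output.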

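Then cross-compile the kernel. For a Raspberry Pi 3B the target should be 'llvm -mtriple=armv7l-linux-gnueabihf', but here 'llvm' is used so that this tutorial can run on the documentation build server. See the detailed note after the following block.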

local_demo = True

if local_demo:
    target = "llvm"
else:
    target = "llvm -mtriple=armv7l-linux-gnueabihf"

func = tvm.build(s, [A, B], target=target, name="add_one")
# save the lib at a local temp folder
temp = utils.tempdir()
path = temp.relpath("lib.tar")
func.export_library(path)

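Note

To run this tutorial with a real remote device, change local_demo to False and replace target in build with the appropriate target triple for your device. The target triple may differ between devices: for example, it is 'llvm -mtriple=armv7l-linux-gnueabihf' for the Raspberry Pi 3B and 'llvm -mtriple=aarch64-linux-gnu' for the RK3399.

You can usually query the target by running gcc -v on the device and looking for the line that starts with Target: (though it may still be a loose configuration).

Besides -mtriple, you can also set other compilation options, for example:

-mcpu=<cpuname>

Specify a particular chip in the current architecture to generate code for. By default, this is inferred from the target triple and autodetected for the current architecture.

-mattr=a1,+a2,-a3,...

Override or control specific attributes of the target, such as whether SIMD operations are enabled. The default set of attributes is determined by the current CPU. To get the list of available attributes, run:

llc -mtriple=<your device target triple> -mattr=help

These options are consistent with llc. It is recommended to set the target triple and feature set to include the specific features available, so that you can take full advantage of the board. You can find more details about cross-compilation attributes in the LLVM guide to cross compilation.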
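The two steps in the note above (read the triple from gcc -v, then add optional -mcpu/-mattr flags) could be scripted roughly as follows. This is a sketch under assumptions: the helper names triple_from_gcc_v, query_gcc_target and llvm_target are hypothetical, the sample gcc output is illustrative, and the value cortex-a72 is only an example CPU name. The triple gcc reports may be looser than what LLVM needs (a 32-bit Raspberry Pi OS gcc may report arm-linux-gnueabihf rather than armv7l-linux-gnueabihf), so review the result before passing it to tvm.build.

```python
import subprocess


def triple_from_gcc_v(output):
    """Return the triple from the 'Target:' line of `gcc -v` output."""
    for line in output.splitlines():
        if line.startswith("Target:"):
            return line.split(":", 1)[1].strip()
    raise ValueError("no 'Target:' line found")


def query_gcc_target():
    """Run `gcc -v` on the current machine (run this on the device)."""
    # gcc -v writes its configuration to stderr, not stdout.
    proc = subprocess.run(["gcc", "-v"], capture_output=True, text=True, check=True)
    return triple_from_gcc_v(proc.stderr)


def llvm_target(triple, mcpu=None, mattr=None):
    """Compose an LLVM target string with optional -mcpu and -mattr flags."""
    parts = ["llvm", "-mtriple=" + triple]
    if mcpu:
        parts.append("-mcpu=" + mcpu)
    if mattr:
        # Multiple attributes are joined with commas, e.g. "+neon,+crc".
        parts.append("-mattr=" + ",".join(mattr))
    return " ".join(parts)


sample = "Using built-in specs.\nTarget: aarch64-linux-gnu\nThread model: posix\n"
print(llvm_target(triple_from_gcc_v(sample)))  # llvm -mtriple=aarch64-linux-gnu
```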

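Run CPU Kernel Remotely by RPC

This section shows how to run the generated CPU kernel on the remote device. First, obtain an RPC session from the remote device.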

if local_demo:
    remote = rpc.LocalSession()
else:
    # The following is my environment, change this to the IP address of your target device
    host = "10.77.1.162"
    port = 9090
    remote = rpc.connect(host, port)

Upload the library to the remote device, then invoke the device's local compiler to relink it. func is now a remote module object.

remote.upload(path)
func = remote.load_module("lib.tar")

# create arrays on the remote device
ctx = remote.cpu()
a = tvm.nd.array(np.random.uniform(size=1024).astype(A.dtype), ctx)
b = tvm.nd.array(np.zeros(1024, dtype=A.dtype), ctx)
# the function will run on the remote device
func(a, b)
np.testing.assert_equal(b.asnumpy(), a.asnumpy() + 1)

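When evaluating the performance of a kernel on the remote device, it is important to avoid network overhead. time_evaluator returns a remote function that runs the kernel multiple times, measures the cost of each run on the remote device, and returns the measured cost, excluding network overhead.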

time_f = func.time_evaluator(func.entry_name, ctx, number=10)
cost = time_f(a, b).mean
print("%g secs/op" % cost)

Output:

1.161e-07 secs/op
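The idea behind number (and the related repeat argument) can be illustrated with a local, pure-Python analogue: call the function many times back to back and divide the elapsed time by the call count, so that fixed per-measurement overhead is amortized. This is a sketch only; time_evaluator_sketch is a hypothetical name, it is not TVM's implementation, and no RPC is involved. In TVM the equivalent loop runs on the remote device, which is why network latency does not appear in the reported per-run cost.

```python
import statistics
import time


def time_evaluator_sketch(fn, *args, number=10, repeat=3):
    """Return `repeat` measurements of average seconds per call of fn(*args).

    Each measurement calls fn(*args) `number` times in a tight loop and
    divides the elapsed wall-clock time by `number`.
    """
    results = []
    for _ in range(repeat):
        start = time.perf_counter()
        for _ in range(number):
            fn(*args)
        results.append((time.perf_counter() - start) / number)
    return results


costs = time_evaluator_sketch(sum, list(range(1000)), number=10, repeat=3)
print("%g secs/op" % statistics.mean(costs))
```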

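Run OpenCL Kernel Remotely by RPC

For remote OpenCL devices, the workflow is almost the same as above: define the kernel, upload the files, and run it via RPC.

Raspberry Pi does not support OpenCL, so the following code was tested on a Firefly-RK3399. Set up the operating system and the OpenCL driver for the RK3399 first.

In addition, the runtime on the RK3399 board must be built with OpenCL enabled. In the TVM root directory, execute: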

cp cmake/config.cmake .
sed -i "s/USE_OPENCL OFF/USE_OPENCL ON/" config.cmake
make runtime -j4
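If you prefer to script the config.cmake edit in Python rather than sed, a sketch like the following rewrites a set(NAME value) line. The helper names set_cmake_option and edit_config_file are hypothetical; it assumes the option appears as an uncommented set(USE_OPENCL OFF) line, which is what the sed command above relies on.

```python
import re


def set_cmake_option(text, name, value):
    """Return `text` with the line `set(NAME <old>)` rewritten to `set(NAME value)`."""
    pattern = re.compile(
        r"^([ \t]*set\(" + re.escape(name) + r"[ \t]+)\S+?([ \t]*\))", re.MULTILINE
    )
    # A replacement function avoids backslash interpretation in `value`.
    new_text, count = pattern.subn(lambda m: m.group(1) + value + m.group(2), text)
    if count == 0:
        raise ValueError("option %s not found" % name)
    return new_text


def edit_config_file(path, name, value):
    """Apply set_cmake_option to a file in place."""
    with open(path) as f:
        text = f.read()
    with open(path, "w") as f:
        f.write(set_cmake_option(text, name, value))


# Equivalent of the sed line above, run from the TVM root:
# edit_config_file("config.cmake", "USE_OPENCL", "ON")
```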

The following function shows how to run an OpenCL kernel remotely:

def run_opencl():
    # NOTE: This is the setting for my rk3399 board. You need to modify
    # them according to your environment.
    target_host = "llvm -mtriple=aarch64-linux-gnu"
    opencl_device_host = "10.77.1.145"
    opencl_device_port = 9090

    # create schedule for the above "add one" compute declaration
    s = te.create_schedule(B.op)
    xo, xi = s[B].split(B.op.axis[0], factor=32)
    s[B].bind(xo, te.thread_axis("blockIdx.x"))
    s[B].bind(xi, te.thread_axis("threadIdx.x"))
    func = tvm.build(s, [A, B], "opencl", target_host=target_host)

    remote = rpc.connect(opencl_device_host, opencl_device_port)

    # export and upload
    path = temp.relpath("lib_cl.tar")
    func.export_library(path)
    remote.upload(path)
    func = remote.load_module("lib_cl.tar")

    # run
    ctx = remote.cl()
    a = tvm.nd.array(np.random.uniform(size=1024).astype(A.dtype), ctx)
    b = tvm.nd.array(np.zeros(1024, dtype=A.dtype), ctx)
    func(a, b)
    np.testing.assert_equal(b.asnumpy(), a.asnumpy() + 1)
    print("OpenCL test passed!")
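The split and bind calls in run_opencl determine which work-item computes which element: the loop over i is split into xo (outer) and xi (inner, extent 32), and xo is bound to blockIdx.x while xi is bound to threadIdx.x, so element i = xo * 32 + xi. In OpenCL terms these correspond to the work-group id and the local work-item id. The sketch below (simulate_split_bind is a hypothetical name) only enumerates that mapping sequentially; on the device the work-items run in parallel.

```python
def simulate_split_bind(n, factor):
    """Enumerate the index mapping of split(axis, factor) with
    xo -> blockIdx.x and xi -> threadIdx.x.

    Returns (number of blocks, list of element indices in visiting order).
    """
    num_blocks = (n + factor - 1) // factor  # extent of xo = ceil(n / factor)
    visited = []
    for bx in range(num_blocks):       # each value of xo -> one work-group (blockIdx.x)
        for tx in range(factor):       # each value of xi -> one work-item (threadIdx.x)
            i = bx * factor + tx       # original index: i = xo * factor + xi
            if i < n:                  # bounds check, needed only when factor does not divide n
                visited.append(i)
    return num_blocks, visited


blocks, visited = simulate_split_bind(1024, 32)
print(blocks, visited == list(range(1024)))  # 32 True
```

For n = 1024 and factor = 32 this gives 32 work-groups of 32 work-items each, covering every element exactly once.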

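Summary

This tutorial provides a walkthrough of the cross-compilation and RPC features in TVM.

Set up an RPC server on the remote device.
Set up the target device configuration to cross-compile kernels on the local machine.
Upload and run the kernels remotely via the RPC API.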
