LibTorch in Practice, Part 5: Model Serialization

1. Loading a TorchScript Model in C++

Official tutorial: https://pytorch.org/tutorials/advanced/cpp_export.html

As its name suggests, the primary interface to PyTorch is the Python programming language. While Python is a suitable and preferred language for many scenarios requiring dynamism and ease of iteration, there are equally many situations where precisely these properties of Python are unfavorable. One environment in which the latter often applies is production – the land of low latencies and strict deployment requirements. For production scenarios, C++ is very often the language of choice, even if only to bind it into another language like Java, Rust or Go. The following paragraphs will outline the path PyTorch provides to go from an existing Python model to a serialized representation that can be loaded and executed purely from C++, with no dependency on Python.

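In general, scripting languages such as Python are used to implement and validate algorithms quickly, whereas production code usually relies on the more efficient C++. The work below ports a model from the Python environment to a C++ environment.

Step 1: Convert the PyTorch model to a Torch Script model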

A PyTorch model’s journey from Python to C++ is enabled by Torch Script, a representation of a PyTorch model that can be understood, compiled and serialized by the Torch Script compiler. If you are starting out from an existing PyTorch model written in the vanilla “eager” API, you must first convert your model to Torch Script. In the most common cases, discussed below, this requires only little effort. If you already have a Torch Script module, you can skip to the next section of this tutorial.

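TorchScript is what lets us move a PyTorch model from Python to C++. So what is TorchScript? It is itself a representation of a PyTorch model, one that the TorchScript compiler can read, compile, and serialize. In practice, we usually convert a model to the TorchScript format first, for example: ".pt" -> "yolov5x.torchscript.pt"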

There exist two ways of converting a PyTorch model to Torch Script. The first is known as tracing, a mechanism in which the structure of the model is captured by evaluating it once using example inputs, and recording the flow of those inputs through the model. This is suitable for models that make limited use of control flow. The second approach is to add explicit annotations to your model that inform the Torch Script compiler that it may directly parse and compile your model code, subject to the constraints imposed by the Torch Script language.

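There are two ways to convert a model to the TorchScript format: the torch.jit.trace function and the torch.jit.script function.

How torch.jit.trace works: it relies on tracing, so it needs an example input (an image; an all-zeros matrix or any tensor of the right shape also works). The model processes the input tensor, every tensor operation along the way is recorded, and torch.jit.trace captures and stores the model's structure and parameters. Because tracing records only operations on tensors, it does not record control flow such as if statements or loops; only the path taken for the example input ends up in the trace.

How torch.jit.script works: the developer defines the network structure up front, that is, writes class MyModule(torch.nn.Module) in advance, and TorchScript parses the network structure, including its control flow, directly from that MyModule definition.

Converting to Torch Script via Tracing

In the code below, torch.jit.trace receives a ResNet18 model and a random tensor of a fixed size, and returns an object of type torch.jit.ScriptModule, namely traced_script_module: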
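The difference between the two mechanisms is easiest to see on a function with a data-dependent branch. The sketch below is only an illustration: the function name branchy and its arithmetic are made up for this example and do not come from the tutorial.

```python
import warnings

import torch


def branchy(x):
    # Hypothetical function with a data-dependent branch.
    if x.sum() > 0:
        return x * 2
    return x - 10


# Trace with a positive example input, so only the "x * 2" path is recorded.
with warnings.catch_warnings():
    # Tracing warns about converting a tensor to a Python bool; silence it here.
    warnings.simplefilter("ignore")
    traced = torch.jit.trace(branchy, torch.ones(3))

# Scripting compiles the Python source, so both branches are kept.
scripted = torch.jit.script(branchy)

neg = -torch.ones(3)
eager_out = branchy(neg)      # eager Python takes the "x - 10" branch
traced_out = traced(neg)      # the trace replays "x * 2" regardless of the input
scripted_out = scripted(neg)  # the scripted function re-evaluates the condition

print(eager_out.tolist())     # [-11.0, -11.0, -11.0]
print(traced_out.tolist())    # [-2.0, -2.0, -2.0]
print(scripted_out.tolist())  # [-11.0, -11.0, -11.0]
```

For an input that takes the other branch, the traced function silently returns a different result from the eager and scripted versions, which is why tracing is only safe when the control flow does not depend on the input.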

import torch
import torchvision

# An instance of your model.
model = torchvision.models.resnet18()

# An example input you would normally provide to your model's forward() method.
example = torch.rand(1, 3, 224, 224)

# Use torch.jit.trace to generate a torch.jit.ScriptModule via tracing.
traced_script_module = torch.jit.trace(model, example)

After this step, traced_script_module already contains the network's structure and parameters and can be used directly for inference, as shown below:

In[1]: output = traced_script_module(torch.ones(1, 3, 224, 224))
In[2]: output[0, :5]
Out[2]: tensor([-0.2698, -0.0381,  0.4023, -0.3010, -0.0448], grad_fn=<SliceBackward>)
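A quick sanity check before exporting is to compare the traced module against the original eager model on a fresh input. The sketch below uses a small stand-in network instead of ResNet18 so that it runs without torchvision; the layer sizes and the 32x32 input are arbitrary assumptions.

```python
import torch

torch.manual_seed(0)

# Hypothetical stand-in for a real classifier such as ResNet18.
model = torch.nn.Sequential(
    torch.nn.Conv2d(3, 8, kernel_size=3, padding=1),
    torch.nn.ReLU(),
    torch.nn.AdaptiveAvgPool2d(1),
    torch.nn.Flatten(),
    torch.nn.Linear(8, 5),
).eval()

example = torch.rand(1, 3, 32, 32)
traced = torch.jit.trace(model, example)

# Compare eager and traced outputs on an input that was not used for tracing.
x = torch.ones(1, 3, 32, 32)
with torch.no_grad():
    same = torch.allclose(model(x), traced(x), atol=1e-5)

print(same)
print(isinstance(traced, torch.jit.ScriptModule))
```

If the two outputs disagree, the model most likely contains input-dependent control flow or is being traced in training mode, and the export should be revisited before moving on to C++.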

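Converting to Torch Script via Annotation (Script)

If your model contains control flow (for example, an if statement or a for loop), the tracing approach above no longer applies, and this approach comes into play. The example below uses a vanilla model; note the if check in its forward method.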

# Define a vanilla model
import torch

class MyModule(torch.nn.Module):
    def __init__(self, N, M):
        super(MyModule, self).__init__()
        self.weight = torch.nn.Parameter(torch.rand(N, M))

    def forward(self, input):
        if input.sum() > 0:
            output = self.weight.mv(input)
        else:
            output = self.weight + input
        return output

Here torch.jit.script is called to obtain an object of type torch.jit.ScriptModule, namely sm:

import torch

class MyModule(torch.nn.Module):
    def __init__(self, N, M):
        super(MyModule, self).__init__()
        self.weight = torch.nn.Parameter(torch.rand(N, M))

    def forward(self, input):
        if input.sum() > 0:
            output = self.weight.mv(input)
        else:
            output = self.weight + input
        return output

my_module = MyModule(10, 20)
# Compile the module, including the if/else in forward(), to TorchScript.
sm = torch.jit.script(my_module)
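To confirm that scripting really preserved the branch, one can run the scripted module on inputs that hit each path and print the generated TorchScript source. This is a minimal sketch; the input vectors are arbitrary choices that match the (10, 20) weight shape used above.

```python
import torch


class MyModule(torch.nn.Module):
    def __init__(self, N, M):
        super().__init__()
        self.weight = torch.nn.Parameter(torch.rand(N, M))

    def forward(self, input):
        if input.sum() > 0:
            output = self.weight.mv(input)
        else:
            output = self.weight + input
        return output


sm = torch.jit.script(MyModule(10, 20))

# Positive sum: matrix-vector product, result has shape (10,).
out_pos = sm(torch.ones(20))
# Non-positive sum: broadcast addition, result has shape (10, 20).
out_neg = sm(-torch.ones(20))

print(tuple(out_pos.shape), tuple(out_neg.shape))
# TorchScript source of forward(); the if/else is still present.
print(sm.code)
```

The two calls return tensors of different shapes, which would be impossible if only one branch had been recorded.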

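Step 2: Serialize the torch.jit.ScriptModule object and save it to a file

Note: both the tracing and the script approach yield an object of type torch.jit.ScriptModule (ScriptModule for short), which is an ordinary forward-pass module. Whichever approach was used, all that remains is to serialize the ScriptModule and save it. What gets saved here is traced_script_module, the ResNet inference module obtained via tracing above.

traced_script_module.save("traced_resnet_model.pt")  # serialize and save
# The saved file can be visualized with https://netron.app/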

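Likewise, the line below saves the inference module obtained via annotation. Note that save() must be called on sm, the ScriptModule returned by torch.jit.script, and not on my_module, which is a plain torch.nn.Module and has no save() method. From then on, LibTorch only needs to load the saved model file and no longer depends on any Python package.

sm.save("my_module_model.pt")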
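Before switching to C++, it can be worth checking on the Python side that the saved file loads back and produces the same output. The sketch below assumes a tiny hypothetical module named Tiny and a temporary file name; neither comes from the tutorial.

```python
import os
import tempfile
import zipfile

import torch


class Tiny(torch.nn.Module):
    """Hypothetical stand-in for a real network."""

    def __init__(self):
        super().__init__()
        self.fc = torch.nn.Linear(4, 2)

    def forward(self, x):
        return torch.relu(self.fc(x))


sm = torch.jit.script(Tiny().eval())
x = torch.ones(1, 4)

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "tiny_model.pt")
    sm.save(path)                       # serialize the ScriptModule
    is_zip = zipfile.is_zipfile(path)   # the saved archive is a zip file
    loaded = torch.jit.load(path)       # Python counterpart of torch::jit::load
    round_trip_ok = torch.allclose(sm(x), loaded(x))

print(is_zip, round_trip_ok)
```

torch.jit.load reads the same archive that torch::jit::load consumes in LibTorch, so a failure here usually points at the export rather than at the C++ setup.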

Step 3: Load the ScriptModule model in LibTorch

How should LibTorch be configured? Here I only list the property sheet settings for a Visual Studio environment:

include:
D:\ThirdParty\libtorch-win-shared-with-deps-1.7.1+cu110\libtorch\include
D:\ThirdParty\libtorch-win-shared-with-deps-1.7.1+cu110\libtorch\include\torch\csrc\api\include

lib:
D:\ThirdParty\libtorch-win-shared-with-deps-1.7.1+cu110\libtorch\lib

Linker:
c10.lib
c10_cuda.lib
torch.lib
torch_cpu.lib
torch_cuda.lib

Environment variable:
D:\ThirdParty\libtorch-win-shared-with-deps-1.7.1+cu110\libtorch\lib

The following C++ code loads the model file saved above:

#include <torch/torch.h>
#include <torch/script.h>
#include <iostream>
#include <memory>
#include <string>
#include <vector>

int main()
{
    torch::jit::script::Module module;
    std::string str = "traced_resnet_model.pt";
    try
    {
        // Deserialize the ScriptModule that was saved from Python.
        module = torch::jit::load(str);
    }
    catch (const c10::Error& e)
    {
        std::cerr << "error loading the model: " << e.what() << '\n';
        return -1;
    }

    // Create an input
    std::vector<torch::jit::IValue> inputs;
    inputs.push_back(torch::ones({ 1, 3, 224, 224 }));
    // Run inference
    at::Tensor output = module.forward(inputs).toTensor();
    std::cout << output.slice(/*dim=*/1, /*start=*/0, /*end=*/5) << '\n';

    return 0;
}

To summarize:

Python code for serializing and saving the model:

import torchvision
import torch

model = torchvision.models.resnet18()

example = torch.rand(1, 3, 224, 224)

traced_script_module = torch.jit.trace(model, example)

output = traced_script_module(torch.ones(1, 3, 224, 224))

# traced_script_module.save("traced_resnet_model.pt")  # equivalent to the line below; only the file name differs, LibTorch treats both the same
traced_script_module.save("traced_resnet_model.torchscript.pt")
# Print the first five outputs for comparison with the LibTorch result.
print(output[0, :5])

LibTorch code for loading the model and running inference:

#include <torch/torch.h>
#include <torch/script.h>
#include <iostream>
#include <memory>
#include <string>
#include <vector>

int main()
{
    torch::jit::script::Module module;
    // The file name must match the one passed to save() on the Python side.
    std::string str = "traced_resnet_model.pt";
    //std::string str = "traced_resnet_model.torchscript.pt"; // equivalent to the line above; only the file name differs
    try
    {
        // Deserialize the ScriptModule that was saved from Python.
        module = torch::jit::load(str);
    }
    catch (const c10::Error& e)
    {
        std::cerr << "error loading the model: " << e.what() << '\n';
        return -1;
    }

    // Create an input
    std::vector<torch::jit::IValue> inputs;
    inputs.push_back(torch::ones({ 1, 3, 224, 224 }));
    // Run inference
    at::Tensor output = module.forward(inputs).toTensor();
    std::cout << output.slice(/*dim=*/1, /*start=*/0, /*end=*/5) << '\n';

    return 0;
}

References:

[1] https://pytorch.org/tutorials/advanced/cpp_export.html

[2] Recommended approaches for deploying PyTorch models: https://blog.csdn.net/zhoumoon/article/details/104850615

[3] torch.jit.trace & torch.jit.script: https://www.dazhuanlan.com/2020/02/18/5e4b46eb099e3/
