Using the Python interface; another option is TF-TRT, where the optimized model is still a pb file. The optimization mainly consists of layer fusion and similar graph rewrites, so the speedup is not dramatic: on the two networks I tested, it was roughly 10%.
Since the output is still a pb, it can continue to be served with TF Serving.
keras/tf model -> pb model -> (TRT-optimized model)
Alternatively, if the model is already a SavedModel, it can be converted directly with saved_model_cli and then used with TF Serving as before.
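A minimal conversion sketch using the TF 1.14+ Python API. The directory names (my_saved_model, my_saved_model_trt) are placeholders; running this requires a TensorFlow build with TensorRT support and a GPU:

```python
# TF-TRT conversion sketch (TF 1.x API, tensorflow>=1.14).
from tensorflow.python.compiler.tensorrt import trt_convert as trt

def optimize_saved_model(input_dir, output_dir, precision="FP16"):
    """Fuse supported subgraphs of a SavedModel into TensorRT ops.

    The result is still a SavedModel, so TF Serving can load it directly.
    """
    converter = trt.TrtGraphConverter(
        input_saved_model_dir=input_dir,
        precision_mode=precision)   # "FP32", "FP16", or "INT8"
    converter.convert()             # rewrites the graph, fusing layers into TRT engines
    converter.save(output_dir)

if __name__ == "__main__":
    # Placeholder paths -- replace with real SavedModel directories.
    optimize_saved_model("my_saved_model", "my_saved_model_trt")
```

The saved_model_cli route mentioned above is roughly equivalent: `saved_model_cli convert --dir my_saved_model --output_dir my_saved_model_trt --tag_set serve tensorrt` (TF 1.13+; flags may differ across versions).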
References:
https://docs.nvidia.com/deeplearning/frameworks/tf-trt-user-guide/index.html#usage-example
https://github.com/srihari-humbarwadi/TensorRT-for-keras
https://github.com/jeng1220/KerasToTensorRT
https://github.com/NVIDIA-AI-IOT/tf_trt_models
https://github.com/WeJay/TensorRTkeras
https://github.com/tensorflow/tensorrt/tree/master/tftrt/examples/image-classification
https://github.com/NVIDIA-AI-IOT/tf_trt_models/blob/master/examples/classification/classification.ipynb
https://developer.ibm.com/linuxonpower/2019/08/05/using-tensorrt-models-with-tensorflow-serving-on-wml-ce/
Discussion forum:
https://devtalk.nvidia.com/default/board/304/tensorrt/
There is also a C++ API, which I have not used yet:
https://zhuanlan.zhihu.com/p/85365075
https://zhuanlan.zhihu.com/p/86827710
http://manaai.cn/aicodes_detail3.html?id=48