[NeuralScale] 2020-CVPR-NeuralScale: Efficient Scaling of Neurons for Resource-Constrained Deep Neural Networks - Paper Reading

NeuralScale

2020-CVPR-NeuralScale: Efficient Scaling of Neurons for Resource-Constrained Deep Neural Networks

Source: ChenBong, cnblogs (博客园)

Institute: National Chiao Tung University
Author: Eugene Lee, Chen-Yi Lee (H40)
GitHub: https://github.com/eugenelet/NeuralScale
Citation: 3

Introduction

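The paper proposes a method that scales each layer according to its sensitivity (layer-wise scaling) until the network reaches a target parameter count, in contrast to uniform scaling.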

Motivation

Contribution

Method

Pre-train the model for P epochs, then start iterative pruning from the pre-trained model.

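After each pruning iteration, every layer yields one data point \(\xi_{l}=\left\{\tau, \phi_{l}\right\}\), where \(\tau\) is the total parameter count of the model and \(\phi_{l}\) is the number of filters in layer \(l\).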

After N iterations, every layer has N data points: \(\boldsymbol{\xi}_{l}=\left\{\left\{\tau^{(n)}, \phi_{l}^{(n)}\right\}_{n=1}^{N}\right\}\)

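Iterative filter pruning continues until the total number of filters falls below a fraction \(\epsilon=0.05\) of the original total, at which point pruning stops.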
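The data-collection loop described above can be sketched as follows. This is a toy illustration, not the authors' code: the helper names `param_count` and `collect_points`, the plain 3x3 convolution chain, the `per_step` parameter, and above all the random choice of which filter to remove are assumptions made only to keep the example runnable. A real implementation would rank filters with an importance criterion and prune the pre-trained network.

```python
import random

def param_count(widths, in_channels=3, kernel=3):
    """Weights of a plain conv chain (hypothetical stand-in for the real model)."""
    total, c_in = 0, in_channels
    for c_out in widths:
        total += kernel * kernel * c_in * c_out
        c_in = c_out
    return total

def collect_points(widths, epsilon=0.05, per_step=4, seed=0):
    """Prune until total filters < epsilon * original; record (tau, phi_l) per layer."""
    rng = random.Random(seed)
    widths = list(widths)
    original = sum(widths)
    points = [[] for _ in widths]
    while sum(widths) >= epsilon * original:
        for _ in range(per_step):
            # Assumption: random removal replaces a real importance ranking.
            candidates = [i for i, w in enumerate(widths) if w > 1]
            if not candidates:
                return points
            widths[rng.choice(candidates)] -= 1
        tau = param_count(widths)  # total parameters after this pruning iteration
        for layer, w in enumerate(widths):
            points[layer].append((tau, w))
    return points

points = collect_points([16, 32, 64])
```

Each entry `points[l]` then plays the role of \(\boldsymbol{\xi}_{l}\): a list of (total parameters, filters in layer l) pairs, one per pruning iteration.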

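Plotting the data points \(\boldsymbol{\xi}_{l}\) of each layer gives that layer's sensitivity curve: the number of filters in the layer as a function of the total parameter count.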

A function is then fitted to each curve:

\(\phi_{l}\left(\tau \mid \alpha_{l}, \beta_{l}\right)=\alpha_{l} \tau^{\beta_{l}}\),

\(\ln \phi_{l}\left(\tau \mid \alpha_{l}, \beta_{l}\right)=\ln \alpha_{l}+\beta_{l} \ln \tau\)
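Because the log form is linear in \(\ln \tau\), the two parameters of a layer can be estimated with a straight-line fit. The sketch below uses ordinary least squares in log space, which is an assumption about the fitting procedure; the function name `fit_power_law` and the sample data are hypothetical.

```python
import math

def fit_power_law(taus, phis):
    """Least-squares fit of ln(phi) = ln(alpha) + beta * ln(tau)."""
    xs = [math.log(t) for t in taus]
    ys = [math.log(p) for p in phis]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    beta = sxy / sxx                           # slope of the line
    alpha = math.exp(mean_y - beta * mean_x)   # intercept is ln(alpha)
    return alpha, beta

# Hypothetical data points for one layer, generated from phi = 2 * tau^0.5.
taus = [1e4, 4e4, 9e4, 1.6e5]
phis = [2.0 * t ** 0.5 for t in taus]
alpha, beta = fit_power_law(taus, phis)
```

On noise-free data such as this, the fit recovers the generating parameters up to floating-point error. On real pruning data the points scatter around the line, and the fit gives the best power-law approximation in the log-log sense.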

The layer-wise filter counts of all \(L\) layers are written as \(\Phi(\tau \mid \Theta)=\{\phi_1, \phi_2, \ldots, \phi_L\}\), with parameters \(\Theta=\{\alpha_1, \beta_1, \alpha_2, \beta_2, \ldots, \alpha_L, \beta_L\}\).

Once the fitted functions \(\Phi(\tau \mid \Theta)\) are available, the layer-wise filter counts for a target parameter count \(\hat\tau\) are obtained by evaluating \(\Phi(\hat\tau \mid \Theta)\).
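Evaluating the fitted functions at a target budget is a one-liner per layer. In this minimal sketch the name `layer_widths`, the rounding to the nearest integer, the floor of one filter per layer, and the parameter values are all assumptions for illustration.

```python
def layer_widths(tau, theta):
    """theta is a list of (alpha_l, beta_l); returns integer filter counts."""
    # Rounding and the minimum of one filter are assumptions of this sketch.
    return [max(1, round(alpha * tau ** beta)) for alpha, beta in theta]

# Hypothetical fitted parameters for a three-layer network.
theta = [(2.0, 0.5), (1.0, 0.5), (0.5, 0.5)]
widths = layer_widths(10000.0, theta)
```

With these values the three layers receive 200, 100, and 50 filters respectively, since \(10000^{0.5}=100\).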

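However, the actual total parameter count of the resulting model, \(h(f(\boldsymbol{x} \mid \boldsymbol{W}, \boldsymbol{\Phi}(\hat{\tau} \mid \boldsymbol{\Theta})))\), differs from the target \(\hat\tau\). The authors therefore initialize \(\tau=\hat\tau\) and run gradient descent on \(\tau\) to find a value for which the actual total parameter count \(h(f)\) exactly equals \(\hat\tau\). They call this process Architecture Descent.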
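The search for \(\tau\) can be illustrated with a small example. This is a sketch under several assumptions, not the paper's procedure: the network is a plain chain of 3x3 convolutions, widths are kept continuous (no rounding), the loss is the squared relative error between the actual parameter count and the target, the derivative is taken by central finite differences, and the descent variable is the ratio \(s=\tau/\hat\tau\) so that a step size of 0.5 is well scaled. All function names and parameter values are hypothetical.

```python
def widths(tau, theta):
    """Continuous layer widths phi_l(tau) = alpha_l * tau**beta_l."""
    return [a * tau ** b for a, b in theta]

def param_count(tau, theta, in_channels=3, kernel=3):
    """h(tau): weights of a plain conv chain built from the fitted widths."""
    total, c_in = 0.0, float(in_channels)
    for c_out in widths(tau, theta):
        total += kernel * kernel * c_in * c_out
        c_in = c_out
    return total

def match_budget(target, theta, lr=0.5, steps=200):
    """Gradient descent on s = tau / target for loss 0.5 * (h(tau)/target - 1)**2."""
    s = 1.0  # start from tau = target, as in the description above
    for _ in range(steps):
        tau = s * target
        delta = 1e-3 * tau
        # Central finite difference for dh/dtau (assumption: no autograd here).
        dh = (param_count(tau + delta, theta)
              - param_count(tau - delta, theta)) / (2 * delta)
        err = param_count(tau, theta) / target - 1.0
        s -= lr * err * dh  # d(loss)/ds = err * dh/dtau
    return s * target

theta = [(0.2, 0.5), (0.3, 0.5), (0.2, 0.5)]  # hypothetical fitted parameters
target = 100_000.0
tau = match_budget(target, theta)
```

In this toy setting plugging \(\tau=\hat\tau\) directly would overshoot the budget by roughly 10%, and the descent settles on a smaller \(\tau\) whose parameter count matches the target. The fixed step size works here because \(dh/d\tau\) is close to 1 for the chosen parameters; other networks would need a different step size. Once widths are rounded to integers, a small residual mismatch generally remains.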

Experiments

Setup

GPU: single 1080ti
CIFAR10 / CIFAR100
- pre-train: 10 epochs
- iterative pruning
- fine-tune?
  - 300 epochs
  - lr=0.1, divided by 10 at epochs 100, 200, and 250
  - weight decay=\(5^{-4}\), ≈0.0016
TinyImageNet
- pre-train: 10 epochs
- iterative pruning
- fine-tune?
  - 150 epochs
  - lr=0.1, divided by 10 at epochs 50 and 100
  - weight decay=\(5^{-4}\), ≈0.0016
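The step schedules listed above can be written as a small function. This is a framework-independent sketch: the name `step_lr` is hypothetical, and the convention that the drop takes effect from the milestone epoch onward (with epochs counted from 0) is an assumption.

```python
def step_lr(epoch, base_lr=0.1, milestones=(100, 200, 250), factor=0.1):
    """Learning rate at a given epoch for a step-decay schedule."""
    # Assumption: each milestone already reached multiplies the rate by `factor`.
    drops = sum(1 for m in milestones if epoch >= m)
    return base_lr * factor ** drops

# CIFAR10 / CIFAR100 schedule: 300 epochs, drops at 100, 200, 250.
cifar_lrs = [step_lr(e) for e in range(300)]

# TinyImageNet schedule: 150 epochs, drops at 50 and 100.
tiny_lrs = [step_lr(e, milestones=(50, 100)) for e in range(150)]

weight_decay = 5 ** -4  # value as written in the notes
```

Under this convention the CIFAR run ends at a learning rate of about 1e-4 and the TinyImageNet run at about 1e-3.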

Importance of Architecture Descent

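The horizontal axis is the number of SGD iterations on \(\tau\), the vertical axis is the layer index, and the color indicates the number of filters (convolution kernels) in that layer.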
