How to Estimate Model Training T(FL)OPS Efficiency

Naive Method

We use Torch Vision ResNet50-v1.5 as the example.

Step 1: Obtain the model's theoretical forward-pass MACs (Multiply-ACcumulate operations)
thop can report a model's forward-pass MACs. The code below gives 4.112G forward MACs for Torch Vision ResNet50-v1.5.

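import torch
from thop import clever_format, profile
from torchvision.models import resnet50

model = resnet50()  # randomly initialized weights are enough for counting MACs
dummy_input = torch.randn(1, 3, 224, 224)  # one 224x224 RGB image (batch size 1)
# profile() returns the forward-pass MACs and the parameter count
macs, params = profile(model, inputs=(dummy_input,))
print(clever_format([macs, params], "%.3f"))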
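
To make the MAC count less of a black box, the following is a minimal sketch of how the forward MACs of a single convolution layer can be counted by hand. The helper `conv2d_macs` is a hypothetical name introduced here for illustration and is not part of thop; bias terms are ignored, and thop's own per-layer counting rules may differ in such details. The example layer is the ResNet50 stem (7x7 convolution, 3 to 64 channels, 112x112 output for a 224x224 input).

```python
def conv2d_macs(in_channels, out_channels, kernel_size, out_height, out_width, groups=1):
    """Forward MACs of one 2-D convolution for a single example (bias ignored)."""
    # Each output element needs one MAC per input channel (within its group) per kernel position.
    macs_per_output_element = (in_channels // groups) * kernel_size * kernel_size
    output_elements = out_channels * out_height * out_width
    return macs_per_output_element * output_elements


# ResNet50 stem: 7x7 conv, 3 -> 64 channels, stride 2, 224x224 input -> 112x112 output
stem_macs = conv2d_macs(in_channels=3, out_channels=64, kernel_size=7,
                        out_height=112, out_width=112)
print(f"{stem_macs / 1e9:.3f} GMACs")  # prints 0.118 GMACs
```

Summing such per-layer counts over all layers is, in essence, what a profiler does to arrive at a whole-model figure like the 4.112G above.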

Step 2: Estimate the T(FL)OPS the model requires per second at a measured throughput
The estimate is based on the formula from OpenAI's "AI and Compute":

required_T(FL)OPS = (MACs per forward pass) * (2 (FL)OPs/MAC) * (3 for forward and backward pass) * (number of examples per second)

Next, take the measured throughput (bs is the batch size, IPS is images per second):

accelerator    data type    bs     IPS
V100           FP16         256    1325
V100           FP32         128    303.1

Taking V100 FP16 training as the example:
MACs per forward pass = 4.112G
number of examples per second = 1325
required_T(FL)OPS = 4.112G * 2 * 3 * 1325 = 32.69 T
Summary of results:

accelerator    data type    bs     IPS      required T(FL)OPS
V100           FP16         256    1325     32.69
V100           FP32         128    303.1    7.478
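
The Step 2 calculation can be sketched in a few lines of Python. This is a minimal illustration, not a library API: the function name `required_tflops` and its parameter names are chosen here for readability, and the inputs are the forward MACs from Step 1 and the measured IPS values from the table.

```python
def required_tflops(macs_per_forward, examples_per_second,
                    flops_per_mac=2, fwd_bwd_factor=3):
    """Estimate the T(FL)OPS needed to sustain a measured training throughput."""
    # MACs -> (FL)OPs, then forward -> forward + backward, then per example -> per second
    flops_per_second = (macs_per_forward * flops_per_mac
                        * fwd_bwd_factor * examples_per_second)
    return flops_per_second / 1e12  # (FL)OPS -> T(FL)OPS


RESNET50_FORWARD_MACS = 4.112e9  # from Step 1

fp16_tflops = required_tflops(RESNET50_FORWARD_MACS, 1325)   # V100 FP16, bs 256
fp32_tflops = required_tflops(RESNET50_FORWARD_MACS, 303.1)  # V100 FP32, bs 128
print(f"FP16: {fp16_tflops:.2f} T(FL)OPS")  # prints FP16: 32.69 T(FL)OPS
print(f"FP32: {fp32_tflops:.3f} T(FL)OPS")  # prints FP32: 7.478 T(FL)OPS
```

Keeping the factors 2 and 3 as keyword arguments makes it easy to see how much of the estimate rests on those two conventions.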
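Step 3: Estimate the utilization of theoretical peak compute
- Theoretical peak compute (not listed here; the ratios below back-compute to about 112 T(FL)OPS for FP16 and about 14 T(FL)OPS for FP32, which is in line with the V100 PCIe datasheet figures)
- Theoretical peak compute utilization
  
  peak utilization = required_T(FL)OPS / peak_T(FL)OPS
  
  accelerator    data type    bs     IPS      required T(FL)OPS    peak utilization
  V100           FP16         256    1325     32.69                29.2%
  V100           FP32         128    303.1    7.478                53%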
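
The ratio above can be reproduced with a short sketch. The peak values used here are assumptions, not figures stated in this article: 112 T(FL)OPS (FP16 Tensor Core) and 14 T(FL)OPS (FP32) are the values that the listed ratios imply and that match the V100 PCIe datasheet; a different V100 variant would need different peaks. The name `peak_utilization` is likewise illustrative.

```python
# ASSUMPTION: peak T(FL)OPS inferred from the ratios in the table (V100 PCIe figures).
ASSUMED_PEAK_TFLOPS = {"FP16": 112.0, "FP32": 14.0}

# Required T(FL)OPS from Step 2.
REQUIRED_TFLOPS = {"FP16": 32.69, "FP32": 7.478}


def peak_utilization(required_tflops, peak_tflops):
    """Fraction of the theoretical peak that the measured throughput accounts for."""
    return required_tflops / peak_tflops


utilization = {
    dtype: peak_utilization(REQUIRED_TFLOPS[dtype], ASSUMED_PEAK_TFLOPS[dtype])
    for dtype in REQUIRED_TFLOPS
}
for dtype, ratio in utilization.items():
    print(f"V100 {dtype}: {ratio:.1%}")
```

Under these assumed peaks the FP16 run comes out at about 29.2% and the FP32 run at about 53%, consistent with the table.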



References