How to Estimate a Model's Training T(FL)OPS Efficiency

A Naive Method

Torch Vision ResNet50-v1.5 serves as the example.

  • Step 1: Get the model's theoretical forward-pass MACs (Multiply-ACcumulate operations)
    thop can report a model's forward-pass MACs. The following code gives 4.112G forward MACs for Torch Vision ResNet50-v1.5.

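    import torch
    from thop import clever_format, profile
    from torchvision.models import resnet50

    model = resnet50()
    # One 224x224 RGB image, so the counts are per example (batch size 1).
    dummy_input = torch.randn(1, 3, 224, 224)
    # profile() returns the forward-pass MACs and the parameter count.
    macs, params = profile(model, inputs=(dummy_input,))
    print(clever_format([macs, params], "%.3f"))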
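To make the MAC count less of a black box, here is a minimal sketch that counts the MACs of a single convolution by hand and applies it to the first layer of ResNet50 (7x7 convolution, 3 to 64 channels, stride 2, padding 3). The helper names `conv_out_size` and `conv2d_macs` are hypothetical, and bias terms are ignored; thop's own counting rules may differ in such details.

```python
def conv_out_size(size, kernel, stride, padding):
    """Spatial output size of a convolution along one dimension."""
    return (size + 2 * padding - kernel) // stride + 1


def conv2d_macs(c_in, c_out, kernel, h_out, w_out, groups=1):
    """MACs of one 2D convolution (bias ignored).

    Each output element needs (c_in / groups) * kernel * kernel MACs,
    and there are c_out * h_out * w_out output elements.
    """
    return (c_in // groups) * kernel * kernel * c_out * h_out * w_out


# First layer of ResNet50 on a 224x224 input.
h_out = conv_out_size(224, kernel=7, stride=2, padding=3)
stem_macs = conv2d_macs(c_in=3, c_out=64, kernel=7, h_out=h_out, w_out=h_out)

print(f"output size: {h_out}x{h_out}")
print(f"stem conv MACs: {stem_macs / 1e9:.3f}G")
```

Under these assumptions the first layer alone accounts for about 0.118G of the 4.112G forward MACs; summing the same kind of count over every layer is, in essence, what a MAC profiler does.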
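  • Step 2: Estimate the T(FL)OPS the model needs per second at a measured throughput
    The estimate builds on the formula from OpenAI's AI and Compute, which counts 2 (FL)OPs per MAC and treats the backward pass as costing about twice the forward pass (hence the factor of 3):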

    required_T(FL)OPS = (MACs per forward pass) * (2 (FL)OPs/MAC) * (3 for forward and backward pass) * (number of examples per second)

    Then take the measured performance data:

    accelerator  data type  batch size  IPS (images/s)
    V100         FP16       256         1325
    V100         FP32       128         303.1


    Taking V100 FP16 training as the example:
    MACs per forward pass = 4.112G
    number of examples per second = 1325
    required_T(FL)OPS = 4.112G * 2 * 3 * 1325 = 32.69 T
    The results are summarized below:

    accelerator  data type  batch size  IPS (images/s)  required T(FL)OPS
    V100         FP16       256         1325            32.69
    V100         FP32       128         303.1           7.478
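The Step 2 arithmetic can be sketched as a small function. This is a minimal illustration rather than a definitive tool: the function name `required_tflops` and its parameter names are hypothetical, and the inputs are the MAC count and throughput figures quoted above.

```python
def required_tflops(macs_per_forward, examples_per_second,
                    flops_per_mac=2, fwd_bwd_factor=3):
    """Required T(FL)OPS per second, following the formula above.

    flops_per_mac: one multiply plus one add per MAC.
    fwd_bwd_factor: forward pass counted once, backward pass about twice.
    """
    flops = (macs_per_forward * flops_per_mac
             * fwd_bwd_factor * examples_per_second)
    return flops / 1e12


RESNET50_FORWARD_MACS = 4.112e9  # from Step 1

# (accelerator, data type, batch size, images per second) from the table above.
measurements = [
    ("V100", "FP16", 256, 1325.0),
    ("V100", "FP32", 128, 303.1),
]

results = {}
for accelerator, dtype, batch_size, ips in measurements:
    results[(accelerator, dtype)] = required_tflops(RESNET50_FORWARD_MACS, ips)
    print(f"{accelerator} {dtype} bs={batch_size}: "
          f"{results[(accelerator, dtype)]:.3f} T(FL)OPS")
```

The batch size does not enter the formula directly; it only matters through the throughput it produces.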
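  • Step 3: Estimate the model's utilization of theoretical peak compute

    • Theoretical peak compute

      The ratios in the table below correspond to peaks of roughly 112 T(FL)OPS for FP16 and 14 T(FL)OPS for FP32.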

    • Utilization of theoretical peak compute

      utilization = required_T(FL)OPS / peak_T(FL)OPS

      accelerator  data type  batch size  IPS (images/s)  required T(FL)OPS  utilization of peak
      V100         FP16       256         1325            32.69              29.2%
      V100         FP32       128         303.1           7.478              53%
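The utilization step can be sketched the same way. The peak figures here are an assumption, not values taken from the text: 112 T(FL)OPS (FP16 on Tensor Cores) and 14 T(FL)OPS (FP32) are the numbers NVIDIA publishes for the V100 PCIe board, and they happen to reproduce the ratios in the table. Other V100 variants have different peaks, so substitute the figures for the hardware actually measured. The names `assumed_peak` and `peak_utilization` are hypothetical.

```python
# Required T(FL)OPS copied from the Step 2 summary table.
required = {
    ("V100", "FP16"): 32.69,
    ("V100", "FP32"): 7.478,
}

# ASSUMPTION: published peak figures for the V100 PCIe board
# (FP16 on Tensor Cores, FP32 on CUDA cores); not stated in the text.
assumed_peak = {
    ("V100", "FP16"): 112.0,
    ("V100", "FP32"): 14.0,
}


def peak_utilization(required_tflops, peak_tflops):
    """Fraction of the theoretical peak that the workload needs."""
    if peak_tflops <= 0:
        raise ValueError("peak_tflops must be positive")
    return required_tflops / peak_tflops


utilization = {
    key: peak_utilization(required[key], assumed_peak[key]) for key in required
}
for (accelerator, dtype), value in utilization.items():
    print(f"{accelerator} {dtype}: {value:.1%}")
```

With these assumed peaks the script prints 29.2% for FP16 and 53.4% for FP32, in line with the table.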

References

  1. NV Training Performance Benchmark

  2. thop

  3. OpenAI AI and Compute

Original article: https://www.cnblogs.com/Matrix_Yao/p/15747398.html