A First Look at Caffe2

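Two phrases on the official website capture Caffe2's character well: "A New Lightweight, Modular, and Scalable Deep Learning Framework" and "Code once, run anywhere". The first question I cared about was: how do you use multiple GPUs to speed up training?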

Synchronous SGD

There are multiple ways to utilize multiple GPUs or machines to train models. Synchronous SGD, using Caffe2's data parallel model, is the simplest and easiest to understand: each GPU executes exactly the same code to run its share of the mini-batch. Between mini-batches, we average the gradients of each GPU, and each GPU executes the parameter update in exactly the same way. At any point in time the parameters have the same values on each GPU. Another way to understand Synchronous SGD is that it allows increasing the mini-batch size. Using 8 GPUs to run a mini-batch of 32 each is equivalent to one GPU running a mini-batch of 256.

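I had never really understood how using multiple GPUs speeds up training, but after reading the passage above, it suddenly clicked. I truly hope that one day I, too, can explain a complex concept in just a few words. Assuming we have eight GPUs, the key points of the passage can be summarized as follows: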

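Every GPU executes the same code;
Every GPU receives different input data;
Every GPU holds the same parameters. The gradient is the average of the gradients computed on the individual GPUs, and every GPU uses this averaged gradient to update its copy of the parameters.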
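The three points above can be sketched as a toy simulation in plain Python. This is not Caffe2's data parallel model. The "GPUs" are just list entries, and the model is a hypothetical one-variable linear regression y = w*x + b trained with mean squared error. The gradient averaging that real multi-GPU code performs with an all-reduce is done here with an ordinary Python mean. The helper names grad_mse, sync_sgd_step and train are illustrative and are not taken from any library.

```python
import random

def grad_mse(w, b, xs, ys):
    """Mean-squared-error gradient (d/dw, d/db) for y_hat = w*x + b on one shard."""
    n = len(xs)
    gw = sum(2 * (w * x + b - y) * x for x, y in zip(xs, ys)) / n
    gb = sum(2 * (w * x + b - y) for x, y in zip(xs, ys)) / n
    return gw, gb

def sync_sgd_step(replicas, shards, lr):
    """One synchronous SGD step across simulated GPUs.

    replicas: one [w, b] parameter copy per GPU.
    shards:   one (xs, ys) slice of the mini-batch per GPU.
    """
    # 1. Every GPU runs the same code, each on its own slice of data.
    grads = [grad_mse(w, b, xs, ys) for (w, b), (xs, ys) in zip(replicas, shards)]
    # 2. Average the gradients across GPUs (what an all-reduce would compute).
    k = len(grads)
    avg_gw = sum(g[0] for g in grads) / k
    avg_gb = sum(g[1] for g in grads) / k
    # 3. Every GPU applies the identical update, so the replicas stay in sync.
    return [[w - lr * avg_gw, b - lr * avg_gb] for w, b in replicas]

def train(num_gpus=8, per_gpu_batch=32, steps=300, lr=0.1, seed=0):
    rng = random.Random(seed)
    # All simulated GPUs start from the same initial parameters.
    replicas = [[0.0, 0.0] for _ in range(num_gpus)]
    for _ in range(steps):
        # One global mini-batch drawn from the toy target y = 3x + 1 + noise.
        xs = [rng.uniform(-1, 1) for _ in range(num_gpus * per_gpu_batch)]
        ys = [3 * x + 1 + rng.gauss(0, 0.1) for x in xs]
        # Each GPU gets a different, equally sized slice of that batch.
        shards = [(xs[i * per_gpu_batch:(i + 1) * per_gpu_batch],
                   ys[i * per_gpu_batch:(i + 1) * per_gpu_batch])
                  for i in range(num_gpus)]
        replicas = sync_sgd_step(replicas, shards, lr)
        # The parameters are identical on every GPU after every step.
        assert all(r == replicas[0] for r in replicas)
    return replicas

replicas = train()
print(replicas[0])
```

The assertion inside the loop checks the property the quoted passage emphasizes: the replicas start equal and receive the same averaged gradient, so they remain bit-for-bit identical. The printed w and b should approach the target values 3 and 1.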

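If we set the batch size to 256 on a single GPU, the equivalent setting for parallel training on these eight GPUs is a batch size of 32 per GPU. This can be summed up as "turning sequential execution into parallel execution": the 256 samples that one GPU would work through are split into eight shards of 32, and the shards are processed at the same time.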
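The equivalence can be checked directly. It holds when the loss is a per-sample mean and all shards are the same size. Let B be the full batch of 256 samples and B_1, ..., B_8 its eight shards of 32. Then (1/256) * sum over i in B of grad l_i = (1/8) * sum over k of [(1/32) * sum over i in B_k of grad l_i]. The sketch below confirms this numerically on the same hypothetical linear-regression model and random data used for illustration above. The two results agree up to floating-point rounding.

```python
import math
import random

def grad_mse(w, b, xs, ys):
    """Mean-squared-error gradient (d/dw, d/db) for y_hat = w*x + b."""
    n = len(xs)
    gw = sum(2 * (w * x + b - y) * x for x, y in zip(xs, ys)) / n
    gb = sum(2 * (w * x + b - y) for x, y in zip(xs, ys)) / n
    return gw, gb

rng = random.Random(42)
xs = [rng.uniform(-1, 1) for _ in range(256)]
ys = [3 * x + 1 + rng.gauss(0, 0.1) for x in xs]
w, b = 0.5, -0.2  # arbitrary current parameter values

# One GPU processing a mini-batch of 256.
single = grad_mse(w, b, xs, ys)

# Eight GPUs processing 32 samples each; their gradients are then averaged.
shard_grads = [grad_mse(w, b, xs[i * 32:(i + 1) * 32], ys[i * 32:(i + 1) * 32])
               for i in range(8)]
averaged = (sum(g[0] for g in shard_grads) / 8,
            sum(g[1] for g in shard_grads) / 8)

print(single)
print(averaged)
```

If the shards had unequal sizes, a plain average of the per-GPU gradients would over-weight the smaller shards. That is why equal per-GPU batch sizes matter for the equivalence.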