SGD in Caffe and Activation Functions

In Caffe, the form of the activation function directly affects training speed and the behavior of the SGD solver, since the activation's gradient enters every backward pass.

Different activation functions therefore give SGD different gradients to work with, so the activation layer's type is specified in the network configuration file. The most widely used activation function in Caffe at the moment is ReLU.

The activation functions currently implemented in Caffe are:

absval, bnll, power, relu, sigmoid, and tanh, each as its own layer. Their mathematical formulas are as follows.

Rather than explain each one here myself, I'll just quote the Caffe tutorial directly:

ReLU / Rectified-Linear and Leaky-ReLU

  • LayerType: RELU
  • CPU implementation: ./src/caffe/layers/relu_layer.cpp
  • CUDA GPU implementation: ./src/caffe/layers/relu_layer.cu
  • Parameters (ReLUParameter relu_param)
    • Optional
      • negative_slope [default 0]: specifies whether to leak the negative part by multiplying it with the slope value rather than setting it to 0.
  • Sample (as seen in ./examples/imagenet/imagenet_train_val.prototxt)

    layers {
      name: "relu1"
      type: RELU
      bottom: "conv1"
      top: "conv1"
    }
    

Given an input value x, the RELU layer computes the output as x if x > 0 and negative_slope * x if x <= 0. When the negative slope parameter is not set, it is equivalent to the standard ReLU function, i.e. taking max(x, 0). It also supports in-place computation, meaning that the bottom and the top blob can be the same, which reduces memory consumption.
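
To make the leaky behavior concrete, here is a minimal C++ sketch of the element-wise forward computation (my own illustrative function, not the actual code in relu_layer.cpp); with in-place computation, bottom and top would simply be the same buffer:

    #include <algorithm>
    #include <vector>

    // Sketch of the (leaky) ReLU forward pass:
    // y = x          for x > 0
    // y = slope * x  for x <= 0
    // which reduces to max(x, 0) when negative_slope is 0.
    void relu_forward(const std::vector<float>& bottom, std::vector<float>& top,
                      float negative_slope = 0.0f) {
      top.resize(bottom.size());  // no-op when called in place on the same vector
      for (size_t i = 0; i < bottom.size(); ++i) {
        const float x = bottom[i];
        top[i] = std::max(x, 0.0f) + negative_slope * std::min(x, 0.0f);
      }
    }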

Sigmoid

  • LayerType: SIGMOID
  • CPU implementation: ./src/caffe/layers/sigmoid_layer.cpp
  • CUDA GPU implementation: ./src/caffe/layers/sigmoid_layer.cu
  • Sample (as seen in ./examples/mnist/mnist_autoencoder.prototxt)

    layers {
      name: "encode1neuron"
      bottom: "encode1"
      top: "encode1neuron"
      type: SIGMOID
    }
    

The SIGMOID layer computes the output as sigmoid(x) for each input element x.
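
For reference, the sigmoid function itself (not spelled out in the tutorial text) is

$$\mathrm{sigmoid}(x) = \frac{1}{1 + e^{-x}},$$

which squashes each input into the range (0, 1).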

TanH / Hyperbolic Tangent

  • LayerType: TANH
  • CPU implementation: ./src/caffe/layers/tanh_layer.cpp
  • CUDA GPU implementation: ./src/caffe/layers/tanh_layer.cu
  • Sample

    layers {
      name: "layer"
      bottom: "in"
      top: "out"
      type: TANH
    }
    

The TANH layer computes the output as tanh(x) for each input element x.
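
For reference, the hyperbolic tangent is

$$\tanh(x) = \frac{e^{x} - e^{-x}}{e^{x} + e^{-x}} = 2\,\mathrm{sigmoid}(2x) - 1,$$

which maps each input into the range (-1, 1).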

Absolute Value

  • LayerType: ABSVAL
  • CPU implementation: ./src/caffe/layers/absval_layer.cpp
  • CUDA GPU implementation: ./src/caffe/layers/absval_layer.cu
  • Sample

    layers {
      name: "layer"
      bottom: "in"
      top: "out"
      type: ABSVAL
    }
    

The ABSVAL layer computes the output as abs(x) for each input element x.

Power

  • LayerType: POWER
  • CPU implementation: ./src/caffe/layers/power_layer.cpp
  • CUDA GPU implementation: ./src/caffe/layers/power_layer.cu
  • Parameters (PowerParameter power_param)
    • Optional
      • power [default 1]
      • scale [default 1]
      • shift [default 0]
  • Sample

    layers {
      name: "layer"
      bottom: "in"
      top: "out"
      type: POWER
      power_param {
        power: 1
        scale: 1
        shift: 0
      }
    }
    

The POWER layer computes the output as (shift + scale * x) ^ power for each input element x.
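
As a worked example of how the three parameters combine: with the defaults (power = 1, scale = 1, shift = 0) the layer is just the identity, y = x, while a setting such as shift = 1, scale = 2, power = 2 (values chosen here purely for illustration) would compute

$$y = (1 + 2x)^{2}.$$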

BNLL

  • LayerType: BNLL
  • CPU implementation: ./src/caffe/layers/bnll_layer.cpp
  • CUDA GPU implementation: ./src/caffe/layers/bnll_layer.cu
  • Sample

    layers {
      name: "layer"
      bottom: "in"
      top: "out"
      type: BNLL
    }
    

The BNLL (binomial normal log likelihood) layer computes the output as log(1 + exp(x)) for each input element x.
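
This is the function commonly known as softplus, a smooth approximation to ReLU. Note that, as a mathematical identity, it can be evaluated in the numerically safer piecewise form

$$\log(1 + e^{x}) = \begin{cases} x + \log(1 + e^{-x}) & x > 0 \\ \log(1 + e^{x}) & x \le 0, \end{cases}$$

so that the exponential never overflows for large positive x.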


Please credit the original source when reposting. Thanks.
Original post: https://www.cnblogs.com/jianyingzhou/p/4104977.html