Six Supervised Training Algorithms for Neural Networks

Neural networks can be trained in two ways: supervised and unsupervised. Propagation training is a very effective family of supervised training algorithms. The six propagation algorithms are listed below (a common training loop that works with all of them is sketched after the list):

1. Backpropagation Training
2. Manhattan Update Rule
3. Quick Propagation Training (QPROP)
4. Resilient Propagation Training (RPROP)
5. Scaled Conjugate Gradient (SCG)
6. Levenberg Marquardt (LMA)
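All six of these trainers plug into the same Encog training loop. The sketch below is modeled on the standard Encog 3 XOR example from the guide referenced at the end of this post; the package names should be checked against the Encog version you use. RPROP is used here only because it needs no parameters; any of the other trainers can be substituted on the marked line.

```java
import org.encog.engine.network.activation.ActivationSigmoid;
import org.encog.ml.data.MLDataSet;
import org.encog.ml.data.basic.BasicMLDataSet;
import org.encog.ml.train.MLTrain;
import org.encog.neural.networks.BasicNetwork;
import org.encog.neural.networks.layers.BasicLayer;
import org.encog.neural.networks.training.propagation.resilient.ResilientPropagation;

public class PropagationDemo {
    // XOR truth table used as a tiny supervised training set.
    public static final double[][] INPUT = { {0, 0}, {1, 0}, {0, 1}, {1, 1} };
    public static final double[][] IDEAL = { {0}, {1}, {1}, {0} };

    public static void main(String[] args) {
        // A small feedforward network: 2 inputs, 3 hidden sigmoid neurons, 1 output.
        BasicNetwork network = new BasicNetwork();
        network.addLayer(new BasicLayer(null, true, 2));
        network.addLayer(new BasicLayer(new ActivationSigmoid(), true, 3));
        network.addLayer(new BasicLayer(new ActivationSigmoid(), false, 1));
        network.getStructure().finalizeStructure();
        network.reset();

        MLDataSet trainingSet = new BasicMLDataSet(INPUT, IDEAL);

        // Swap in any of the six trainers here, e.g.
        // new Backpropagation(network, trainingSet, 0.7, 0.3) for learning rate + momentum.
        MLTrain train = new ResilientPropagation(network, trainingSet);

        int epoch = 1;
        do {
            train.iteration();
            System.out.println("Epoch #" + epoch + " Error: " + train.getError());
            epoch++;
        } while (train.getError() > 0.01);
        train.finishTraining();
    }
}
```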

1. Backpropagation Training

  Backpropagation is one of the oldest training methods for feedforward neural networks. Backpropagation uses two parameters in conjunction with gradient descent. The first parameter is the learning rate, which is essentially a percentage that determines how directly the gradient should be applied to the weight matrix. The gradient is multiplied by the learning rate and then added to the weight matrix. This slowly moves the weights toward values that produce a lower error.

  One of the problems with the backpropagation algorithm is that gradient descent will seek out local minima. These local minima are points of low error, but may not be the global minimum. The second parameter provided to the backpropagation algorithm helps it escape local minima. This parameter is called momentum. Momentum specifies to what degree the weight changes from the previous iteration should be applied to the current iteration.

  The momentum parameter is essentially a percentage, just like the learning rate. To use momentum, the backpropagation algorithm must keep track of the changes that were applied to the weight matrix in the previous iteration. These changes are reapplied in the current iteration, scaled by the momentum parameter. Usually the momentum parameter is less than one, so the weight changes from the previous training iteration count for less than the changes calculated for the current iteration. For example, setting the momentum to 0.5 causes 50% of the previous iteration's changes to be applied to the current weight matrix.
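A minimal sketch of this update rule in plain Java (not Encog's internal implementation; the array names, and the assumption that `gradient[i]` already points in the error-reducing direction so that it can be added, are mine):

```java
// Backpropagation weight update with a learning rate and momentum.
// gradient[i]       - error gradient for weight i, assumed already computed with the
//                     sign convention used above (the change is added to the weight).
// previousChange[i] - the change applied to weight i on the previous iteration.
static void updateWeights(double[] weights, double[] gradient,
                          double[] previousChange,
                          double learningRate, double momentum) {
    for (int i = 0; i < weights.length; i++) {
        // Scale the gradient by the learning rate, then add the momentum term.
        double change = (gradient[i] * learningRate)
                      + (previousChange[i] * momentum);
        weights[i] += change;
        previousChange[i] = change; // remembered for the next iteration's momentum
    }
}
```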

  Summary: the earliest of these methods; it requires both a learning rate and a momentum parameter.

 

2. Manhattan Update Rule

  One of the problems with the backpropagation training algorithm is the degree to which the weights are changed. Gradient descent can often apply too large a change to the weight matrix. The Manhattan Update Rule and resilient propagation training algorithms use only the sign of the gradient; the magnitude is discarded. This means it only matters whether the gradient is positive, negative, or near zero.

  For the Manhattan Update Rule, this sign is used to determine how to update the weight value. If the gradient is near zero, then no change is made to the weight value. If the gradient is positive, the weight value is increased by a specific amount; if it is negative, the weight value is decreased by a specific amount. The amount by which the weight value is changed is a constant that you must provide to the Manhattan Update Rule algorithm, such as 0.00001. Manhattan propagation generally requires a small learning rate.
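A minimal sketch of that rule for a single weight (plain Java, not Encog's API; the parameter names and the `zeroTolerance` threshold are illustrative assumptions):

```java
// Manhattan Update Rule: move each weight by a fixed constant,
// in the direction given by the sign of its gradient.
static double manhattanUpdate(double weight, double gradient,
                              double updateConstant, double zeroTolerance) {
    if (Math.abs(gradient) < zeroTolerance) {
        return weight;                  // gradient near zero: leave the weight unchanged
    } else if (gradient > 0) {
        return weight + updateConstant; // positive gradient: increase by the constant
    } else {
        return weight - updateConstant; // negative gradient: decrease by the constant
    }
}
```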

  Summary: requires a learning rate parameter; each weight is changed by a fixed amount, which addresses the problem that the weight changes computed by gradient descent are often too large.

 

3. Quick Propagation Training (QPROP)

  Quick propagation (QPROP) is another variant of propagation training. Quick propagation is based on Newton's Method, which is a means of finding a function's roots; this can be adapted to the task of minimizing the error of a neural network. Typically QPROP performs much better than backpropagation. The user must provide QPROP with a learning rate parameter, but there is no momentum parameter, as QPROP is typically more tolerant of higher learning rates. A learning rate of 2.0 is generally a good starting point.
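In Encog, QPROP is a drop-in replacement in the training loop sketched earlier, with only the learning rate supplied. The fragment below assumes Encog 3's QuickPropagation class and constructor; verify the package name and signature against your version.

```java
import org.encog.ml.train.MLTrain;
import org.encog.neural.networks.training.propagation.quick.QuickPropagation;

// network and trainingSet are assumed to be built as in the XOR sketch above.
MLTrain train = new QuickPropagation(network, trainingSet, 2.0); // 2.0 = learning rate
```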

  Summary: QPROP is based on Newton's Method; it requires a learning rate parameter but no momentum parameter.

 

4. Resilient Propagation Training (RPROP)

  The resilient propagation training (RPROP) algorithm is often the most efficient training algorithm for supervised feedforward neural networks. One particular advantage of the RPROP algorithm is that it requires no parameter settings before use. There are no learning rates, momentum values, or update constants that need to be determined. This is good because it can be difficult to determine the exact optimal learning rate.

    The RPROP algorithm works similarly to the Manhattan Update Rule in that only the sign of the gradient is used. However, rather than using a fixed constant to update the weight values, a much more granular approach is used: the step sizes (deltas) do not remain fixed as they do in the Manhattan Update Rule or the backpropagation algorithm. Rather, these delta values change as training progresses.

  The RPROP algorithm does not keep one global update value, or delta. Rather, an individual delta is kept for every weight matrix value. These deltas are first initialized to a very small number. Every iteration of the RPROP algorithm updates the weight values according to these delta values. However, as previously mentioned, the delta values do not remain fixed: the sign of the gradient, and whether it agrees with the sign from the previous iteration, determines how each delta is modified. This allows every individual weight matrix value to be trained individually, an advantage not provided by either the backpropagation algorithm or the Manhattan Update Rule.
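A minimal sketch of that per-weight delta adaptation (plain Java, not Encog's implementation; the growth/shrink factors 1.2 and 0.5 and the delta bounds are the values commonly published for RPROP and are used here as assumptions):

```java
// RPROP-style step for a single weight i.
// Grows delta[i] when the gradient keeps its sign, shrinks it when the sign flips,
// then returns the change to add to the weight (same sign convention as the
// backpropagation sketch above: the gradient points toward lower error).
static double rpropChange(double gradient, double lastGradient, double[] delta, int i) {
    if (gradient * lastGradient > 0) {
        delta[i] = Math.min(delta[i] * 1.2, 50.0); // same sign: take bigger steps
    } else if (gradient * lastGradient < 0) {
        delta[i] = Math.max(delta[i] * 0.5, 1e-6); // sign flipped: we overshot, shrink
    }
    return Math.signum(gradient) * delta[i];
}
```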

  Summary: often the most efficient supervised training algorithm for feedforward neural networks; no parameters need to be provided.

 

5. Scaled Conjugate Gradient (SCG)

  Scaled Conjugate Gradient (SCG) is a fast and efficient training method. SCG is based on a class of optimization algorithms called Conjugate Gradient Methods (CGM). SCG is not applicable to all data sets, but when it is used within its applicability, it is quite efficient. Like RPROP, SCG has the advantage that no parameters must be set.
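Because SCG takes no parameters, using it in Encog is a one-line swap in the training loop sketched earlier (a fragment assuming Encog 3's ScaledConjugateGradient class; check the package name against your version):

```java
import org.encog.ml.train.MLTrain;
import org.encog.neural.networks.training.propagation.scg.ScaledConjugateGradient;

// Same network and trainingSet as before; no learning rate or momentum to choose.
MLTrain train = new ScaledConjugateGradient(network, trainingSet);
```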

  Summary: requires no parameters, but is not applicable to all data sets.

6. Levenberg Marquardt (LMA)

  The Levenberg Marquardt algorithm (LMA) is a very efficient training method for neural networks. In many cases, LMA will outperform Resilient Propagation. LMA is a hybrid algorithm based on both the Gauss-Newton algorithm (GNA) and gradient descent (backpropagation), integrating the strengths of both. Gradient descent is guaranteed to converge to a local minimum, albeit slowly. GNA is quite fast but often fails to converge. By using a damping factor to interpolate between the two, a hybrid method is created.
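That interpolation is usually written as the damped normal equations below (the standard LMA formulation, not quoted from the Encog text), where J is the Jacobian of the network errors with respect to the weights, e is the error vector with its sign chosen so that adding Δw to the weights reduces the error, I is the identity matrix, and λ is the damping factor. A small λ makes the step behave like Gauss-Newton, while a large λ makes it resemble a short gradient-descent step.

$$(J^\top J + \lambda I)\,\Delta w = J^\top e$$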

  Summary: a highly effective training algorithm that requires no parameters and often outperforms RPROP. LMA is a hybrid of the Gauss-Newton algorithm and gradient descent: gradient descent is guaranteed to converge to a local minimum, but slowly, while Gauss-Newton is fast but often fails to converge. By using a damping factor to interpolate between the two, LMA combines the strengths of both.

 

-------- Reference: Encog3java-user (the Encog 3 for Java user guide)

Original post: https://www.cnblogs.com/hezhiyao/p/6872603.html