Neural Networks: The Backpropagation Algorithm

The cost function of a neural network is

\[J(\Theta) = -\frac{1}{m}\left[ \sum_{i=1}^{m} \sum_{k=1}^{K} y_k^{(i)} \log\left( h_\Theta(x^{(i)}) \right)_k + \left( 1 - y_k^{(i)} \right) \log\left( 1 - \left( h_\Theta(x^{(i)}) \right)_k \right) \right] + \frac{\lambda}{2m} \sum_{l=1}^{L-1} \sum_{i=1}^{s_l} \sum_{j=1}^{s_{l+1}} \left( \Theta_{ji}^{(l)} \right)^2\]
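As a concrete reference, here is a minimal NumPy sketch of this cost function. It is a sketch under assumptions, not code from the original post: sigmoid activations throughout, a one-hot label matrix Y, and a list of weight matrices Thetas whose first column holds the bias weights.

import numpy as np

def sigmoid(z):
    # Logistic activation g(z) = 1 / (1 + exp(-z))
    return 1.0 / (1.0 + np.exp(-z))

def cost(Thetas, X, Y, lam):
    # X: m x n examples, Y: m x K one-hot labels, lam: lambda
    m = X.shape[0]
    A = X
    for Theta in Thetas:
        A = np.hstack([np.ones((m, 1)), A])  # prepend the bias unit a_0 = 1
        A = sigmoid(A @ Theta.T)             # next layer's activations
    H = A                                    # m x K matrix of h_Theta(x^(i))
    # Cross-entropy part: double sum over examples i and output units k
    J = -np.sum(Y * np.log(H) + (1 - Y) * np.log(1 - H)) / m
    # Regularization: all weights except the bias column (j = 0)
    J += lam / (2.0 * m) * sum(np.sum(T[:, 1:] ** 2) for T in Thetas)
    return J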

We want to minimize \(J(\Theta)\):

\[\min_{\Theta} J(\Theta)\]

To do so, we need to compute

\[J(\Theta)\]

\[\frac{\partial}{\partial \Theta_{ij}^{(l)}} J(\Theta)\]


The crux of the problem is computing

\[\frac{\partial}{\partial \Theta_{ij}^{(l)}} J(\Theta)\]


Suppose we have the following four-layer neural network (the connecting lines between layers are omitted in the figure).

Take a single training example (x, y) as an example.

First, compute the forward propagation:

\[\begin{array}{l}
a^{(1)} = x\\
z^{(2)} = \Theta^{(1)} a^{(1)}\\
a^{(2)} = g(z^{(2)}) \quad (\text{add bias } a_0^{(2)})\\
z^{(3)} = \Theta^{(2)} a^{(2)}\\
a^{(3)} = g(z^{(3)}) \quad (\text{add bias } a_0^{(3)})\\
z^{(4)} = \Theta^{(3)} a^{(3)}\\
a^{(4)} = h_\Theta(x) = g(z^{(4)})
\end{array}\]
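These steps transcribe almost line for line into NumPy. A minimal sketch, reusing the illustrative sigmoid and Thetas from the cost sketch above (x is a single example without its bias unit):

def forward(Thetas, x):
    # Forward propagation through the 4-layer network above
    a1 = np.insert(x, 0, 1.0)            # a^(1) = x, with bias a_0 = 1
    z2 = Thetas[0] @ a1                  # z^(2) = Theta^(1) a^(1)
    a2 = np.insert(sigmoid(z2), 0, 1.0)  # a^(2) = g(z^(2)), plus a_0^(2)
    z3 = Thetas[1] @ a2                  # z^(3) = Theta^(2) a^(2)
    a3 = np.insert(sigmoid(z3), 0, 1.0)  # a^(3) = g(z^(3)), plus a_0^(3)
    z4 = Thetas[2] @ a3                  # z^(4) = Theta^(3) a^(3)
    a4 = sigmoid(z4)                     # a^(4) = h_Theta(x), no bias unit
    return a1, a2, a3, a4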

The backpropagation algorithm

Define

\[\delta_j^{(l)} = \text{“error” of node } j \text{ in layer } l\]

Thus, for each output-layer unit (here L = 4):

\[\delta_j^{(4)} = a_j^{(4)} - y_j\]

where \(y_j\) is the true label.

Next, compute the “errors” of the earlier layers:

\[\begin{array}{l}
\delta^{(3)} = (\Theta^{(3)})^T \delta^{(4)} .* \, g'(z^{(3)})\\
\delta^{(2)} = (\Theta^{(2)})^T \delta^{(3)} .* \, g'(z^{(2)})
\end{array}\]

The first layer has no “error” term (it is just the input).

It can also be shown (I have not worked through the proof) that, since the sigmoid derivative satisfies \(g'(z) = g(z)(1 - g(z))\), i.e. \(g'(z^{(l)}) = a^{(l)} .* (1 - a^{(l)})\):

\[\begin{array}{l}
\delta^{(3)} = (\Theta^{(3)})^T \delta^{(4)} .* \left( a^{(3)} .* (1 - a^{(3)}) \right)\\
\delta^{(2)} = (\Theta^{(2)})^T \delta^{(3)} .* \left( a^{(2)} .* (1 - a^{(2)}) \right)
\end{array}\]
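This second form is the convenient one in code, since the activations are already in hand from the forward pass. Here is a minimal sketch continuing the illustrative helpers above; one detail the formulas gloss over is that the bias component of \((\Theta^{(l)})^T \delta^{(l+1)}\) must be dropped before propagating further:

def backward_deltas(Thetas, a1, a2, a3, a4, y):
    # Output-layer error: delta^(4) = a^(4) - y
    d4 = a4 - y
    # Earlier layers: drop the bias entry ([1:]) of Theta^T delta,
    # then multiply elementwise by g'(z) = a .* (1 - a)
    d3 = (Thetas[2].T @ d4)[1:] * (a3[1:] * (1.0 - a3[1:]))
    d2 = (Thetas[1].T @ d3)[1:] * (a2[1:] * (1.0 - a2[1:]))
    return d2, d3, d4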

We also have (ignoring the regularization term):

\[\frac{\partial}{\partial \Theta_{ij}^{(l)}} J(\Theta) = a_j^{(l)} \delta_i^{(l+1)}\]
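For a single example (and still ignoring regularization), this says that each layer's whole matrix of partial derivatives is an outer product. Continuing the sketch's variable names:

# dJ/dTheta^(l) = delta^(l+1) (a^(l))^T, one outer product per layer
grad3 = np.outer(d4, a3)  # same shape as Thetas[2], i.e. Theta^(3)
grad2 = np.outer(d3, a2)  # same shape as Thetas[1]
grad1 = np.outer(d2, a1)  # same shape as Thetas[0]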


Summary of the backpropagation algorithm

Given a training set

\[\left\{ (x^{(1)}, y^{(1)}), \ldots, (x^{(m)}, y^{(m)}) \right\}\]

1. Set

\[\Delta_{ij}^{(l)} = 0 \quad (\text{for all } l, i, j)\]

(Here \(\Delta\) is the capital form of \(\delta\).)

2. Compute:

For i = 1 to m {

  Set a^(1) = x^(i)

  Perform forward propagation to compute a^(l) for l = 2, 3, ..., L

  Using y^(i), compute δ^(L) = a^(L) - y^(i)

  Compute δ^(L-1), δ^(L-2), ..., δ^(2)

  \[\Delta_{ij}^{(l)} := \Delta_{ij}^{(l)} + a_j^{(l)} \delta_i^{(l+1)}\]

}

3. Compute:

if j ≠ 0

\[D_{ij}^{(l)} := \frac{1}{m}\Delta_{ij}^{(l)} + \lambda \Theta_{ij}^{(l)}\]

if j = 0

\[D_{ij}^{(l)} := \frac{1}{m}\Delta_{ij}^{(l)}\]

Here,

\[D_{ij}^{(l)} = \frac{\partial}{\partial \Theta_{ij}^{(l)}} J(\Theta)\]
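Putting steps 1 through 3 together, here is a minimal sketch of the whole procedure, reusing the illustrative forward and backward_deltas helpers above:

def gradients(Thetas, X, Y, lam):
    m = X.shape[0]
    # Step 1: initialize every accumulator Delta^(l) to zero
    Deltas = [np.zeros_like(T) for T in Thetas]
    # Step 2: accumulate over the m training examples
    for i in range(m):
        a1, a2, a3, a4 = forward(Thetas, X[i])
        d2, d3, d4 = backward_deltas(Thetas, a1, a2, a3, a4, Y[i])
        Deltas[2] += np.outer(d4, a3)
        Deltas[1] += np.outer(d3, a2)
        Deltas[0] += np.outer(d2, a1)
    # Step 3: average, regularizing every column except j = 0 (the bias)
    Ds = []
    for T, Delta in zip(Thetas, Deltas):
        D = Delta / m
        D[:, 1:] += lam * T[:, 1:]
        Ds.append(D)
    return Ds

A gradient-descent step would then update each Theta^(l) by subtracting a learning rate times the corresponding D^(l).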

Original article: https://www.cnblogs.com/qkloveslife/p/9872785.html