Introduction to Deep Learning (1): Feedforward Networks and the Derivation of Backpropagation

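This post records, step by step, how I derived the forward pass and backpropagation for a feedforward network. For the basic building blocks of a neural network, see Poll's notes, [Deep Learning] Neural Network Basics (reference 1). Below, I work through the derivation using the example from Charlotte77's post at https://www.cnblogs.com/charlotte77/p/5629865.html.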

1. Problem Setup

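Figure 1-1 shows a simple neural network with only three layers of two neurons each. The first layer is the input layer, with two neurons $i_1$ and $i_2$ plus a bias (intercept) term $b_1$; the second is the hidden layer, with two neurons $h_1$ and $h_2$ plus a bias term $b_2$; the third is the output layer, with neurons $o_1$ and $o_2$. The label on each connection is the weight between the two layers, and every neuron uses the sigmoid activation $\sigma(x) = \frac{1}{1 + e^{-x}}$. Reading the equations in section 2, $w_1$ and $w_2$ connect $i_1$ and $i_2$ to $h_1$, $w_3$ and $w_4$ connect them to $h_2$, $w_5$ and $w_6$ connect $h_1$ and $h_2$ to $o_1$, and $w_7$ and $w_8$ connect them to $o_2$.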

Figure 1-1: The initial neural network and its weights

Assign initial values to the weights, as shown in Figure 1-2:

Figure 1-2: Initial values assigned to the network

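Goal: given the input data $i_1$ and $i_2$, train the network so that the outputs of $o_1$ and $o_2$ are as close as possible to the target outputs $target_{o_1}$ and $target_{o_2}$.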

2. Forward Propagation

2.1 Input Layer → Hidden Layer

The weighted sum (net input) of hidden neuron $h_1$: $net_{h_1} = i_1 \cdot w_1 + i_2 \cdot w_2 + b_1 \cdot 1 = 0.3775$

The weighted sum of hidden neuron $h_2$: $net_{h_2} = i_1 \cdot w_3 + i_2 \cdot w_4 + b_1 \cdot 1 = 0.3925$

The output of hidden neuron $h_1$: $out_{h_1} = \frac{1}{1 + e^{-net_{h_1}}} = 0.593269992107$

The output of hidden neuron $h_2$: $out_{h_2} = \frac{1}{1 + e^{-net_{h_2}}} = 0.59688437826$
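A minimal Python sketch of section 2.1, useful for checking the arithmetic. Figure 1-2 is not reproduced here, so the inputs ($i_1 = 0.05$, $i_2 = 0.10$), the weights $w_1$ to $w_4$ (0.15, 0.20, 0.25, 0.30) and the bias $b_1 = 0.35$ are assumed from Charlotte77's example; they reproduce the numbers above. The helper name `sigmoid` is my own.

```python
import math


def sigmoid(x: float) -> float:
    """Logistic activation: 1 / (1 + e^(-x))."""
    return 1.0 / (1.0 + math.exp(-x))


# Assumed values from Charlotte77's example (Figure 1-2 is not reproduced).
i1, i2 = 0.05, 0.10
w1, w2, w3, w4 = 0.15, 0.20, 0.25, 0.30
b1 = 0.35

# Weighted sums; the bias b1 is multiplied by a constant input of 1.
net_h1 = i1 * w1 + i2 * w2 + b1 * 1
net_h2 = i1 * w3 + i2 * w4 + b1 * 1

# Hidden-layer outputs after the sigmoid.
out_h1 = sigmoid(net_h1)
out_h2 = sigmoid(net_h2)

print(net_h1, net_h2)  # ≈ 0.3775 0.3925
print(out_h1, out_h2)  # ≈ 0.593269992 0.596884378
```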

2.2 Hidden Layer → Output Layer

The weighted sum of output neuron $o_1$: $net_{o_1} = out_{h_1} \cdot w_5 + out_{h_2} \cdot w_6 + b_2 \cdot 1 = 1.10590596706$

The weighted sum of output neuron $o_2$: $net_{o_2} = out_{h_1} \cdot w_7 + out_{h_2} \cdot w_8 + b_2 \cdot 1 = 1.2249214041$

The output of output neuron $o_1$: $out_{o_1} = \frac{1}{1 + e^{-net_{o_1}}} = 0.751365069552$

The output of output neuron $o_2$: $out_{o_2} = \frac{1}{1 + e^{-net_{o_2}}} = 0.772928465321$
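The same forward pass carried through the output layer, as a Python sketch. All parameter values are assumed from Charlotte77's example because Figure 1-2 is missing: $i_1 = 0.05$, $i_2 = 0.10$, $w_1$ to $w_8$ = 0.15, 0.20, 0.25, 0.30, 0.40, 0.45, 0.50, 0.55, $b_1 = 0.35$ and $b_2 = 0.60$. They are checked only in the sense that they reproduce the values above; `sigmoid` is my own helper name.

```python
import math


def sigmoid(x: float) -> float:
    """Logistic activation: 1 / (1 + e^(-x))."""
    return 1.0 / (1.0 + math.exp(-x))


# Assumed values from Charlotte77's example.
i1, i2 = 0.05, 0.10
w1, w2, w3, w4 = 0.15, 0.20, 0.25, 0.30
w5, w6, w7, w8 = 0.40, 0.45, 0.50, 0.55
b1, b2 = 0.35, 0.60

# Hidden layer (section 2.1).
out_h1 = sigmoid(i1 * w1 + i2 * w2 + b1)
out_h2 = sigmoid(i1 * w3 + i2 * w4 + b1)

# Output layer: weighted sums with bias b2, then the sigmoid.
net_o1 = out_h1 * w5 + out_h2 * w6 + b2 * 1
net_o2 = out_h1 * w7 + out_h2 * w8 + b2 * 1
out_o1 = sigmoid(net_o1)
out_o2 = sigmoid(net_o2)

print(net_o1, net_o2)  # ≈ 1.105905967 1.224921404
print(out_o1, out_o2)  # ≈ 0.751365070 0.772928465
```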

3. Backpropagation

3.1 Computing the Total Error

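The total error is half the sum of squared errors over the two outputs (the factor $\frac{1}{2}$ cancels the 2 produced when the square is differentiated): $E_{total} = \frac{1}{2}\sum_{i=1}^{2}(target_{o_i} - out_{o_i})^2 = 0.29837110876$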
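A small sketch of this loss in Python. The outputs are copied from section 2.2; the targets $target_{o_1} = 0.01$ and $target_{o_2} = 0.99$ are assumed from Charlotte77's example (the post's final paragraph quotes the same pair). The function name `total_error` is my own.

```python
# Forward-pass outputs copied from section 2.2.
out_o1, out_o2 = 0.751365069552, 0.772928465321
# Assumed targets from Charlotte77's example.
target_o1, target_o2 = 0.01, 0.99


def total_error(targets, outputs):
    """E_total = 1/2 * sum of (target - out)^2 over the output neurons."""
    return 0.5 * sum((t - o) ** 2 for t, o in zip(targets, outputs))


E_total = total_error([target_o1, target_o2], [out_o1, out_o2])
print(E_total)  # ≈ 0.298371109
```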

3.2 Weight Updates: Hidden Layer → Output Layer

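First, to measure how much $w_5$ contributes to the total error, take the partial derivative of $E_{total}$ with respect to $w_5$. From 2.2, $w_5$ affects only $net_{o_1}$, which in turn determines $out_{o_1}$, so by the chain rule:

$\frac{\partial E_{total}}{\partial w_5} = \frac{\partial E_{total}}{\partial out_{o_1}} \cdot \frac{\partial out_{o_1}}{\partial net_{o_1}} \cdot \frac{\partial net_{o_1}}{\partial w_5}$

$= -(target_{o_1} - out_{o_1}) \cdot out_{o_1}(1 - out_{o_1}) \cdot out_{h_1} = 0.0821670405642$

The first factor carries a minus sign because $out_{o_1}$ enters $E_{total}$ as $-out_{o_1}$; the second is the sigmoid derivative, which can be written in terms of the neuron's own output as $out_{o_1}(1 - out_{o_1})$; the third is $out_{h_1}$ because $net_{o_1}$ is linear in $w_5$.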
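To double-check the sign and value of $\partial E_{total}/\partial w_5$, the sketch below compares the chain-rule result with a central finite difference. It copies $out_{h_1}$ and $out_{h_2}$ from section 2.1 and assumes $w_5 = 0.40$, $w_6 = 0.45$, $b_2 = 0.60$ and $target_{o_1} = 0.01$ from Charlotte77's example. Only the $o_1$ term of $E_{total}$ is evaluated, since $w_5$ does not reach $o_2$; the step size `h` is an arbitrary choice and the names are my own.

```python
import math


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


# out_h1, out_h2 copied from section 2.1; the rest assumed from Charlotte77's example.
out_h1, out_h2 = 0.593269992107, 0.59688437826
w6, b2, target_o1 = 0.45, 0.60, 0.01


def error_o1(w5: float) -> float:
    """The o1 term of E_total; the o2 term does not depend on w5."""
    out_o1 = sigmoid(out_h1 * w5 + out_h2 * w6 + b2)
    return 0.5 * (target_o1 - out_o1) ** 2


w5 = 0.40
out_o1 = sigmoid(out_h1 * w5 + out_h2 * w6 + b2)

# Chain rule: dE/dout_o1 * dout_o1/dnet_o1 * dnet_o1/dw5.
dE_dout = -(target_o1 - out_o1)    # minus sign from differentiating -out_o1
dout_dnet = out_o1 * (1 - out_o1)  # sigmoid derivative
dnet_dw5 = out_h1
grad_w5 = dE_dout * dout_dnet * dnet_dw5

# Central finite difference as an independent check.
h = 1e-6
numeric = (error_o1(w5 + h) - error_o1(w5 - h)) / (2 * h)
print(grad_w5, numeric)  # both ≈ 0.0821670406
```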

Similarly:

$\frac{\partial E_{total}}{\partial w_6} = \frac{\partial E_{total}}{\partial out_{o_1}} \cdot \frac{\partial out_{o_1}}{\partial net_{o_1}} \cdot \frac{\partial net_{o_1}}{\partial w_6}$

$= -(target_{o_1} - out_{o_1}) \cdot out_{o_1}(1 - out_{o_1}) \cdot out_{h_2} = 0.0826676278475$

$\frac{\partial E_{total}}{\partial w_7} = \frac{\partial E_{total}}{\partial out_{o_2}} \cdot \frac{\partial out_{o_2}}{\partial net_{o_2}} \cdot \frac{\partial net_{o_2}}{\partial w_7}$

$= -(target_{o_2} - out_{o_2}) \cdot out_{o_2}(1 - out_{o_2}) \cdot out_{h_1} = -0.0226025404775$

$\frac{\partial E_{total}}{\partial w_8} = \frac{\partial E_{total}}{\partial out_{o_2}} \cdot \frac{\partial out_{o_2}}{\partial net_{o_2}} \cdot \frac{\partial net_{o_2}}{\partial w_8}$

$= -(target_{o_2} - out_{o_2}) \cdot out_{o_2}(1 - out_{o_2}) \cdot out_{h_2} = -0.022740242216$
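The four results share structure: each is the output neuron's error term $\frac{\partial E_{total}}{\partial net_{o}}$ times the hidden output at the other end of the weight. The sketch below names that shared factor `delta_o1` / `delta_o2` (my notation, not the post's) and computes all four gradients from the section 2 values, with the targets 0.01 and 0.99 assumed from Charlotte77's example.

```python
# Forward-pass values copied from section 2.
out_h1, out_h2 = 0.593269992107, 0.59688437826
out_o1, out_o2 = 0.751365069552, 0.772928465321
# Assumed targets from Charlotte77's example.
target_o1, target_o2 = 0.01, 0.99

# Shared factor dE_total/dnet_o = -(target - out) * out * (1 - out).
delta_o1 = -(target_o1 - out_o1) * out_o1 * (1 - out_o1)
delta_o2 = -(target_o2 - out_o2) * out_o2 * (1 - out_o2)

# dnet_o/dw is the hidden output at the other end of each weight.
grads = {
    "w5": delta_o1 * out_h1,
    "w6": delta_o1 * out_h2,
    "w7": delta_o2 * out_h1,
    "w8": delta_o2 * out_h2,
}
for name, g in grads.items():
    print(name, g)
# expected (rounded): w5 0.0821670406, w6 0.0826676278, w7 -0.0226025405, w8 -0.0227402422
```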

With the learning rate $\eta$ set to 0.5, the updated value of $w_5$ is

$w_5^+ = w_5 - \eta \frac{\partial E_{total}}{\partial w_5} = 0.358916479718$

Similarly:

$w_6^+ = w_6 - \eta \frac{\partial E_{total}}{\partial w_6} = 0.408666186076$

$w_7^+ = w_7 - \eta \frac{\partial E_{total}}{\partial w_7} = 0.511301270239$

$w_8^+ = w_8 - \eta \frac{\partial E_{total}}{\partial w_8} = 0.561370121108$
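The update rule $w^+ = w - \eta \frac{\partial E_{total}}{\partial w}$ as a short Python sketch. The initial weights $w_5$ to $w_8$ (0.40, 0.45, 0.50, 0.55) are assumed from Charlotte77's example; the gradients are the values derived above.

```python
eta = 0.5  # learning rate from the text

# Assumed initial weights (Charlotte77's example) and the gradients derived above.
weights = {"w5": 0.40, "w6": 0.45, "w7": 0.50, "w8": 0.55}
grads = {
    "w5": 0.0821670405642,
    "w6": 0.0826676278475,
    "w7": -0.0226025404775,
    "w8": -0.022740242216,
}

# Gradient descent: w+ = w - eta * dE_total/dw.
updated = {name: w - eta * grads[name] for name, w in weights.items()}
for name, w in updated.items():
    print(name, w)
```

Negative gradients ($w_7$, $w_8$) increase the weight, which pushes $out_{o_2}$ up toward its larger target.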

3.3 Weight Updates: Input Layer → Hidden Layer

The method is similar, but note that $w_1$ determines $net_{h_1}$, $net_{h_1}$ determines $out_{h_1}$, and $out_{h_1}$ feeds both $net_{o_1}$ and $net_{o_2}$, which determine $out_{o_1}$ and $out_{o_2}$ respectively. The derivative of the total error with respect to $w_1$ must therefore sum the contributions through both $out_{o_1}$ and $out_{o_2}$:

$\frac{\partial E_{total}}{\partial w_1} = \frac{\partial E_{total}}{\partial out_{o_1}} \cdot \frac{\partial out_{o_1}}{\partial net_{o_1}} \cdot \frac{\partial net_{o_1}}{\partial out_{h_1}} \cdot \frac{\partial out_{h_1}}{\partial net_{h_1}} \cdot \frac{\partial net_{h_1}}{\partial w_1}$

$+ \frac{\partial E_{total}}{\partial out_{o_2}} \cdot \frac{\partial out_{o_2}}{\partial net_{o_2}} \cdot \frac{\partial net_{o_2}}{\partial out_{h_1}} \cdot \frac{\partial out_{h_1}}{\partial net_{h_1}} \cdot \frac{\partial net_{h_1}}{\partial w_1}$

$= \left(\frac{\partial E_{total}}{\partial out_{o_1}} \cdot \frac{\partial out_{o_1}}{\partial net_{o_1}} \cdot \frac{\partial net_{o_1}}{\partial out_{h_1}} + \frac{\partial E_{total}}{\partial out_{o_2}} \cdot \frac{\partial out_{o_2}}{\partial net_{o_2}} \cdot \frac{\partial net_{o_2}}{\partial out_{h_1}}\right) \cdot \frac{\partial out_{h_1}}{\partial net_{h_1}} \cdot \frac{\partial net_{h_1}}{\partial w_1}$

$= \big(-(target_{o_1} - out_{o_1}) \cdot out_{o_1}(1 - out_{o_1}) \cdot w_5$

$- (target_{o_2} - out_{o_2}) \cdot out_{o_2}(1 - out_{o_2}) \cdot w_7\big) \cdot out_{h_1}(1 - out_{h_1}) \cdot i_1$

$= 0.000438567734474$
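A sketch that checks $\frac{\partial E_{total}}{\partial w_1}$, summing the two paths through $o_1$ and $o_2$, against a central finite difference of the full $E_{total}$. All parameter values are assumed from Charlotte77's example ($i_1 = 0.05$, $i_2 = 0.10$, $w_2 = 0.20$, $w_5$ to $w_8$ = 0.40, 0.45, 0.50, 0.55, $b_1 = 0.35$, $b_2 = 0.60$, targets 0.01 and 0.99); $out_{h_2}$ is copied from section 2.1 because it does not depend on $w_1$. The function names are my own.

```python
import math


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


# Assumed values from Charlotte77's example; out_h2 copied from section 2.1.
i1, i2 = 0.05, 0.10
w2, w5, w6, w7, w8 = 0.20, 0.40, 0.45, 0.50, 0.55
b1, b2 = 0.35, 0.60
out_h2 = 0.59688437826
target_o1, target_o2 = 0.01, 0.99


def forward(w1: float):
    """Forward pass as a function of w1 only."""
    out_h1 = sigmoid(i1 * w1 + i2 * w2 + b1)
    out_o1 = sigmoid(out_h1 * w5 + out_h2 * w6 + b2)
    out_o2 = sigmoid(out_h1 * w7 + out_h2 * w8 + b2)
    return out_h1, out_o1, out_o2


def total_error(w1: float) -> float:
    _, out_o1, out_o2 = forward(w1)
    return 0.5 * ((target_o1 - out_o1) ** 2 + (target_o2 - out_o2) ** 2)


w1 = 0.15
out_h1, out_o1, out_o2 = forward(w1)

# The error reaching out_h1 is the sum of the o1 path (via w5) and the o2 path (via w7).
delta_o1 = -(target_o1 - out_o1) * out_o1 * (1 - out_o1)
delta_o2 = -(target_o2 - out_o2) * out_o2 * (1 - out_o2)
dE_dout_h1 = delta_o1 * w5 + delta_o2 * w7
grad_w1 = dE_dout_h1 * out_h1 * (1 - out_h1) * i1

# Central finite difference of the full E_total.
h = 1e-6
numeric = (total_error(w1 + h) - total_error(w1 - h)) / (2 * h)
print(grad_w1, numeric)  # both ≈ 0.000438568
```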

Accordingly, $w_1$ is updated to

$w_1^+ = w_1 - \eta \frac{\partial E_{total}}{\partial w_1} = 0.149780716133$

Similarly:

$\frac{\partial E_{total}}{\partial w_2} = \frac{\partial E_{total}}{\partial out_{o_1}} \cdot \frac{\partial out_{o_1}}{\partial net_{o_1}} \cdot \frac{\partial net_{o_1}}{\partial out_{h_1}} \cdot \frac{\partial out_{h_1}}{\partial net_{h_1}} \cdot \frac{\partial net_{h_1}}{\partial w_2}$

$+ \frac{\partial E_{total}}{\partial out_{o_2}} \cdot \frac{\partial out_{o_2}}{\partial net_{o_2}} \cdot \frac{\partial net_{o_2}}{\partial out_{h_1}} \cdot \frac{\partial out_{h_1}}{\partial net_{h_1}} \cdot \frac{\partial net_{h_1}}{\partial w_2}$

$= \big(-(target_{o_1} - out_{o_1}) \cdot out_{o_1}(1 - out_{o_1}) \cdot w_5$

$- (target_{o_2} - out_{o_2}) \cdot out_{o_2}(1 - out_{o_2}) \cdot w_7\big) \cdot out_{h_1}(1 - out_{h_1}) \cdot i_2$

$= 0.000877135468948$

$\frac{\partial E_{total}}{\partial w_3} = \frac{\partial E_{total}}{\partial out_{o_1}} \cdot \frac{\partial out_{o_1}}{\partial net_{o_1}} \cdot \frac{\partial net_{o_1}}{\partial out_{h_2}} \cdot \frac{\partial out_{h_2}}{\partial net_{h_2}} \cdot \frac{\partial net_{h_2}}{\partial w_3}$

$+ \frac{\partial E_{total}}{\partial out_{o_2}} \cdot \frac{\partial out_{o_2}}{\partial net_{o_2}} \cdot \frac{\partial net_{o_2}}{\partial out_{h_2}} \cdot \frac{\partial out_{h_2}}{\partial net_{h_2}} \cdot \frac{\partial net_{h_2}}{\partial w_3}$

$= \left(\frac{\partial E_{total}}{\partial out_{o_1}} \cdot \frac{\partial out_{o_1}}{\partial net_{o_1}} \cdot \frac{\partial net_{o_1}}{\partial out_{h_2}} + \frac{\partial E_{total}}{\partial out_{o_2}} \cdot \frac{\partial out_{o_2}}{\partial net_{o_2}} \cdot \frac{\partial net_{o_2}}{\partial out_{h_2}}\right) \cdot \frac{\partial out_{h_2}}{\partial net_{h_2}} \cdot \frac{\partial net_{h_2}}{\partial w_3}$

$= \big(-(target_{o_1} - out_{o_1}) \cdot out_{o_1}(1 - out_{o_1}) \cdot w_6$

$- (target_{o_2} - out_{o_2}) \cdot out_{o_2}(1 - out_{o_2}) \cdot w_8\big) \cdot out_{h_2}(1 - out_{h_2}) \cdot i_1$

$= 0.000497712735261$

$\frac{\partial E_{total}}{\partial w_4} = \frac{\partial E_{total}}{\partial out_{o_1}} \cdot \frac{\partial out_{o_1}}{\partial net_{o_1}} \cdot \frac{\partial net_{o_1}}{\partial out_{h_2}} \cdot \frac{\partial out_{h_2}}{\partial net_{h_2}} \cdot \frac{\partial net_{h_2}}{\partial w_4}$

$+ \frac{\partial E_{total}}{\partial out_{o_2}} \cdot \frac{\partial out_{o_2}}{\partial net_{o_2}} \cdot \frac{\partial net_{o_2}}{\partial out_{h_2}} \cdot \frac{\partial out_{h_2}}{\partial net_{h_2}} \cdot \frac{\partial net_{h_2}}{\partial w_4}$

$= \left(\frac{\partial E_{total}}{\partial out_{o_1}} \cdot \frac{\partial out_{o_1}}{\partial net_{o_1}} \cdot \frac{\partial net_{o_1}}{\partial out_{h_2}} + \frac{\partial E_{total}}{\partial out_{o_2}} \cdot \frac{\partial out_{o_2}}{\partial net_{o_2}} \cdot \frac{\partial net_{o_2}}{\partial out_{h_2}}\right) \cdot \frac{\partial out_{h_2}}{\partial net_{h_2}} \cdot \frac{\partial net_{h_2}}{\partial w_4}$

$= \big(-(target_{o_1} - out_{o_1}) \cdot out_{o_1}(1 - out_{o_1}) \cdot w_6$

$- (target_{o_2} - out_{o_2}) \cdot out_{o_2}(1 - out_{o_2}) \cdot w_8\big) \cdot out_{h_2}(1 - out_{h_2}) \cdot i_2$

$= 0.000995425470522$
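The four input-to-hidden gradients follow one pattern: backpropagate the output error terms to each hidden neuron, multiply by that neuron's sigmoid derivative, then by the input on the weight. Below is a sketch of that pattern, with `delta_*` as my own shorthand, the section 2 values copied from the text, and $i_1$, $i_2$, $w_5$ to $w_8$ and the targets assumed from Charlotte77's example. All gradients use the original $w_5$ to $w_8$, not the values updated in 3.2.

```python
# Values copied from section 2; inputs, w5..w8 and targets assumed from Charlotte77's example.
i1, i2 = 0.05, 0.10
out_h1, out_h2 = 0.593269992107, 0.59688437826
out_o1, out_o2 = 0.751365069552, 0.772928465321
w5, w6, w7, w8 = 0.40, 0.45, 0.50, 0.55  # original (not yet updated) weights
target_o1, target_o2 = 0.01, 0.99

# Output error terms dE_total/dnet_o.
delta_o1 = -(target_o1 - out_o1) * out_o1 * (1 - out_o1)
delta_o2 = -(target_o2 - out_o2) * out_o2 * (1 - out_o2)

# Hidden error terms: sum over outgoing weights, times the hidden sigmoid derivative.
delta_h1 = (delta_o1 * w5 + delta_o2 * w7) * out_h1 * (1 - out_h1)
delta_h2 = (delta_o1 * w6 + delta_o2 * w8) * out_h2 * (1 - out_h2)

# Each gradient is the hidden error term times the input feeding that weight.
grads = {
    "w1": delta_h1 * i1,
    "w2": delta_h1 * i2,
    "w3": delta_h2 * i1,
    "w4": delta_h2 * i2,
}
for name, g in grads.items():
    print(name, g)
```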

This gives the updated values of $w_2$, $w_3$ and $w_4$:

$w_2^+ = w_2 - \eta \frac{\partial E_{total}}{\partial w_2} = 0.199561432266$

$w_3^+ = w_3 - \eta \frac{\partial E_{total}}{\partial w_3} = 0.249751143632$

$w_4^+ = w_4 - \eta \frac{\partial E_{total}}{\partial w_4} = 0.299502287265$
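Putting 3.2 and 3.3 together, one full backpropagation step can be sketched as a function that computes all eight gradients from the old weights and only then applies the updates. The function names, the dict-of-weights layout and the initial values (from Charlotte77's example) are my assumptions; the biases are left unchanged because the derivation above does not update them.

```python
import math


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def forward(w, i, b1, b2):
    """Return (out_h1, out_h2, out_o1, out_o2) for weights w[1]..w[8] and inputs i."""
    out_h1 = sigmoid(i[0] * w[1] + i[1] * w[2] + b1)
    out_h2 = sigmoid(i[0] * w[3] + i[1] * w[4] + b1)
    out_o1 = sigmoid(out_h1 * w[5] + out_h2 * w[6] + b2)
    out_o2 = sigmoid(out_h1 * w[7] + out_h2 * w[8] + b2)
    return out_h1, out_h2, out_o1, out_o2


def total_error(t, out_o1, out_o2):
    """E_total = 1/2 * [(t1 - out_o1)^2 + (t2 - out_o2)^2]."""
    return 0.5 * ((t[0] - out_o1) ** 2 + (t[1] - out_o2) ** 2)


def backprop_step(w, i, t, b1, b2, eta):
    """Return new weights; every gradient is computed from the OLD weights w."""
    out_h1, out_h2, out_o1, out_o2 = forward(w, i, b1, b2)
    d_o1 = -(t[0] - out_o1) * out_o1 * (1 - out_o1)
    d_o2 = -(t[1] - out_o2) * out_o2 * (1 - out_o2)
    d_h1 = (d_o1 * w[5] + d_o2 * w[7]) * out_h1 * (1 - out_h1)
    d_h2 = (d_o1 * w[6] + d_o2 * w[8]) * out_h2 * (1 - out_h2)
    grads = {
        1: d_h1 * i[0], 2: d_h1 * i[1], 3: d_h2 * i[0], 4: d_h2 * i[1],
        5: d_o1 * out_h1, 6: d_o1 * out_h2, 7: d_o2 * out_h1, 8: d_o2 * out_h2,
    }
    return {k: w[k] - eta * grads[k] for k in w}


# Assumed setup from Charlotte77's example; biases stay fixed.
w = {1: 0.15, 2: 0.20, 3: 0.25, 4: 0.30, 5: 0.40, 6: 0.45, 7: 0.50, 8: 0.55}
i, t, b1, b2 = (0.05, 0.10), (0.01, 0.99), 0.35, 0.60

before = total_error(t, *forward(w, i, b1, b2)[2:])
w_new = backprop_step(w, i, t, b1, b2, eta=0.5)
after = total_error(t, *forward(w_new, i, b1, b2)[2:])
print(before, after)  # ≈ 0.298371109 and ≈ 0.291028
```

Computing every gradient before touching any weight matters: the $w_1$ to $w_4$ gradients depend on $w_5$ to $w_8$, and the text uses their original values.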

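All weights have now been updated by backpropagation. In this example, after the first iteration the total error $E_{total}$ drops from 0.29837110876 to 0.291027773694. After 10,000 iterations the total error is 0.000035085 and the outputs are [0.015912196, 0.984065734] (the target outputs are [0.01, 0.99]), so the network fits the training example closely. The code for the first backpropagation pass is at: https://github.com/DowTowne/Deep/blob/master/Introduce_to_deep_learning/01_back_propagation/back_propagation_cal.py; the iterative training code is at: https://github.com/DowTowne/Deep/blob/master/Introduce_to_deep_learning/01_back_propagation/test_simple_neuralNetwork.py.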
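A sketch of the repeated training loop. This is not the author's linked code, which is not reproduced here; `train` is a hypothetical helper, and the setup (Charlotte77's initial values, $\eta = 0.5$, fixed biases, a single training example) is assumed. It records $E_{total}$ before the first update and after every update, so `errors[1]` is the error after the first iteration.

```python
import math


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def train(iterations: int, eta: float = 0.5):
    """Hypothetical helper: repeat forward pass + backprop on the single example.

    Returns (errors, outputs), where errors[k] is E_total after k updates.
    """
    # Assumed setup from Charlotte77's example; biases stay fixed.
    i1, i2, b1, b2, t1, t2 = 0.05, 0.10, 0.35, 0.60, 0.01, 0.99
    w1, w2, w3, w4, w5, w6, w7, w8 = 0.15, 0.20, 0.25, 0.30, 0.40, 0.45, 0.50, 0.55

    def forward():
        # Reads the current values of w1..w8 from train's scope.
        h1 = sigmoid(i1 * w1 + i2 * w2 + b1)
        h2 = sigmoid(i1 * w3 + i2 * w4 + b1)
        o1 = sigmoid(h1 * w5 + h2 * w6 + b2)
        o2 = sigmoid(h1 * w7 + h2 * w8 + b2)
        return h1, h2, o1, o2

    errors = []
    for _ in range(iterations):
        h1, h2, o1, o2 = forward()
        errors.append(0.5 * ((t1 - o1) ** 2 + (t2 - o2) ** 2))
        d_o1 = -(t1 - o1) * o1 * (1 - o1)
        d_o2 = -(t2 - o2) * o2 * (1 - o2)
        d_h1 = (d_o1 * w5 + d_o2 * w7) * h1 * (1 - h1)
        d_h2 = (d_o1 * w6 + d_o2 * w8) * h2 * (1 - h2)
        # All error terms above use the old weights; now update all eight.
        w1, w2 = w1 - eta * d_h1 * i1, w2 - eta * d_h1 * i2
        w3, w4 = w3 - eta * d_h2 * i1, w4 - eta * d_h2 * i2
        w5, w6 = w5 - eta * d_o1 * h1, w6 - eta * d_o1 * h2
        w7, w8 = w7 - eta * d_o2 * h1, w8 - eta * d_o2 * h2
    _, _, o1, o2 = forward()
    errors.append(0.5 * ((t1 - o1) ** 2 + (t2 - o2) ** 2))
    return errors, (o1, o2)


errors, outputs = train(10000)
print(errors[0], errors[1], errors[-1])
print(outputs)
```

Under these assumptions the error should fall to a small value and the outputs should approach [0.01, 0.99]; the exact digits may differ from the reported ones if the linked code differs in detail.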

Please credit the source when reposting. If you spot any errors, corrections are very welcome!

References:

1. Poll's notes: [Machine Learning & Algorithm] Neural Network Basics (http://www.cnblogs.com/maybe2030/p/5597716.html#3457159)

2. Charlotte77: Understanding Backpropagation in Neural Networks in One Article (BackPropagation) (https://www.cnblogs.com/charlotte77/p/5629865.html)