Computing the derivative of softmax

This post is based on https://zhuanlan.zhihu.com/p/105758059
See that link for a more detailed treatment.

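  • The input to softmax is
    \(z = [z_1, z_2, \dots, z_n]\)
  • After softmax,
    \(a_i = \frac{e^{z_i}}{\sum_{k=1}^{n} e^{z_k}}\)
    which gives the vector \(a = [\frac{e^{z_1}}{\sum_{k=1}^{n} e^{z_k}}, \frac{e^{z_2}}{\sum_{k=1}^{n} e^{z_k}}, \dots, \frac{e^{z_n}}{\sum_{k=1}^{n} e^{z_k}}]\)
  • The target vector is
    \(y = [0, 0, 0, \dots, 1, \dots, 0]\), where \(y_j = 1\) and every other entry is 0
  • The loss function is the cross-entropy loss
    \(L = -\sum_{i=1}^{n} y_i \ln a_i\). Since every \(y_i\) with \(i \neq j\) is 0, this simplifies to \(L = -y_j \ln a_j = -\ln a_j\)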
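The setup above can be sketched in plain Python as a forward pass. This is a minimal illustration, not the reference post's code: the function names `softmax` and `cross_entropy` and the example values are assumptions, and subtracting `max(z)` before exponentiating is an added numerical-stability step that does not change the result.

```python
import math

def softmax(z):
    """Return a_i = exp(z_i) / sum_k exp(z_k) for a list of floats."""
    # Assumption: shift by max(z) to avoid overflow; the ratio is unchanged.
    m = max(z)
    exps = [math.exp(v - m) for v in z]
    total = sum(exps)
    return [e / total for e in exps]

def cross_entropy(a, y):
    """Return L = -sum_i y_i * ln(a_i); terms with y_i == 0 are skipped."""
    return -sum(yi * math.log(ai) for ai, yi in zip(a, y) if yi != 0)

z = [1.0, 2.0, 3.0]   # example inputs (arbitrary values)
y = [0.0, 0.0, 1.0]   # one-hot target, j is index 2 (zero-based)
a = softmax(z)
loss = cross_entropy(a, y)
print(a, loss)
```

With a one-hot target, `loss` equals `-math.log(a[2])`, which is the simplified form \(L = -\ln a_j\).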

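The goal is the derivative of the scalar \(L\) with respect to the vector \(z\). By the chain rule, \(\frac{\partial L}{\partial z} = \frac{\partial L}{\partial a} \cdot \frac{\partial a}{\partial z}\)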

1 Computing \(\frac{\partial L}{\partial a}\)

Since \(L = -\ln a_j\), the loss depends only on \(a_j\), so
\(\frac{\partial L}{\partial a} = [0, 0, \dots, -\frac{1}{a_j}, \dots, 0]\)
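As a quick sanity check of this step, the sketch below builds the gradient vector and compares its single nonzero entry with a central finite difference. The helper name `loss_grad_wrt_a` and the sample values of `a` are hypothetical, chosen only for illustration.

```python
import math

def loss_grad_wrt_a(a, j):
    """Gradient of L = -ln(a_j) w.r.t. a: zero everywhere except -1/a_j at index j."""
    grad = [0.0] * len(a)
    grad[j] = -1.0 / a[j]
    return grad

a = [0.1, 0.2, 0.7]   # hypothetical softmax output
j = 2                 # zero-based index of the target class
grad_a = loss_grad_wrt_a(a, j)

# Central finite difference of L = -ln(a_j) with respect to a_j
eps = 1e-6
numeric = (-math.log(a[j] + eps) + math.log(a[j] - eps)) / (2 * eps)
print(grad_a, numeric)
```

Here \(a_j\) is perturbed on its own, treating the entries of \(a\) as independent variables, which is how the partial derivative in this step is defined.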

2 Computing \(\frac{\partial a}{\partial z}\)

Both \(a\) and \(z\) are vectors, so \(\frac{\partial a}{\partial z}\) is a matrix: \(\frac{\partial a}{\partial z} = \left[ \begin{matrix} \frac{\partial a_1}{\partial z_1} & \frac{\partial a_1}{\partial z_2} & \cdots & \frac{\partial a_1}{\partial z_n} \\ \frac{\partial a_2}{\partial z_1} & \frac{\partial a_2}{\partial z_2} & \cdots & \frac{\partial a_2}{\partial z_n} \\ \vdots & \vdots & \ddots & \vdots \\ \frac{\partial a_n}{\partial z_1} & \frac{\partial a_n}{\partial z_2} & \cdots & \frac{\partial a_n}{\partial z_n} \\ \end{matrix} \right]\)
Because only the \(j\)-th entry of \(\frac{\partial L}{\partial a}\) is nonzero, we only need the \(j\)-th row of \(\frac{\partial a}{\partial z}\), namely \(\frac{\partial a_j}{\partial z}\).
\(\frac{\partial L}{\partial z} = -\frac{1}{a_j} \cdot \frac{\partial a_j}{\partial z}\), where \(a_j = \frac{e^{z_j}}{\sum_{k=1}^{n} e^{z_k}}\)

  • \(i \neq j\)
    \(\frac{\partial a_j}{\partial z_i} = \frac{0 - e^{z_j} \cdot e^{z_i}}{(\sum_{k=1}^{n} e^{z_k})^2} = -a_j \cdot a_i\)
    \(\frac{\partial L}{\partial z_i} = -\frac{1}{a_j} \cdot \frac{\partial a_j}{\partial z_i} = -\frac{1}{a_j} \cdot (-a_j \cdot a_i) = a_i\)
  • \(i = j\)
    \(\frac{\partial a_j}{\partial z_j} = \frac{e^{z_j} \cdot \sum_{k=1}^{n} e^{z_k} - e^{z_j} \cdot e^{z_j}}{(\sum_{k=1}^{n} e^{z_k})^2} = a_j - a_j^2\)
    \(\frac{\partial L}{\partial z_j} = (a_j - a_j^2) \cdot (-\frac{1}{a_j}) = a_j - 1\)
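The two cases can be combined into a single expression, \(\frac{\partial a_r}{\partial z_c} = a_r(\delta_{rc} - a_c)\), where \(\delta_{rc}\) is 1 when \(r = c\) and 0 otherwise. The sketch below builds the full matrix from that expression and applies the chain rule using only row \(j\). The names `softmax_jacobian` and `grad_z` and the input values are assumptions made for illustration.

```python
import math

def softmax(z):
    # Assumption: shift by max(z) for numerical stability; the result is unchanged.
    m = max(z)
    exps = [math.exp(v - m) for v in z]
    total = sum(exps)
    return [e / total for e in exps]

def softmax_jacobian(a):
    """J[r][c] = da_r/dz_c: a_r * (1 - a_r) on the diagonal, -a_r * a_c elsewhere."""
    n = len(a)
    return [[a[r] * ((1.0 if r == c else 0.0) - a[c]) for c in range(n)]
            for r in range(n)]

z = [1.0, 2.0, 3.0]   # example inputs (arbitrary values)
j = 2                 # zero-based index of the target class
a = softmax(z)
J = softmax_jacobian(a)

# Chain rule: dL/da is nonzero only at index j, so only row j of J contributes.
grad_z = [(-1.0 / a[j]) * J[j][i] for i in range(len(z))]
print(grad_z)
```

Each entry of `grad_z` comes out as \(a_i\) for \(i \neq j\) and \(a_j - 1\) for \(i = j\), matching the two cases.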

Therefore \(\frac{\partial L}{\partial z} = [a_1, a_2, \dots, a_j - 1, \dots, a_n] = [a_1, a_2, \dots, a_j, \dots, a_n] - [0, 0, \dots, 1, \dots, 0] = a - y\)
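One way to gain confidence in the result \(a - y\) is a numerical gradient check. The sketch below compares the analytic gradient with central finite differences of the loss; the input values, the step size `eps`, and the helper names are assumptions chosen for illustration.

```python
import math

def softmax(z):
    # Assumption: shift by max(z) for numerical stability; the result is unchanged.
    m = max(z)
    exps = [math.exp(v - m) for v in z]
    total = sum(exps)
    return [e / total for e in exps]

def loss(z, j):
    """Cross-entropy loss for a one-hot target at index j: L = -ln(a_j)."""
    return -math.log(softmax(z)[j])

z = [0.5, -1.2, 2.0, 0.3]   # example inputs (arbitrary values)
j = 1                       # zero-based index of the target class
a = softmax(z)
y = [1.0 if i == j else 0.0 for i in range(len(z))]
analytic = [ai - yi for ai, yi in zip(a, y)]   # the derived gradient a - y

eps = 1e-6
numeric = []
for i in range(len(z)):
    z_plus = list(z)
    z_plus[i] += eps
    z_minus = list(z)
    z_minus[i] -= eps
    # Central difference approximation of dL/dz_i
    numeric.append((loss(z_plus, j) - loss(z_minus, j)) / (2 * eps))

max_err = max(abs(n - g) for n, g in zip(numeric, analytic))
print(analytic, numeric, max_err)
```

The entries of `analytic` sum to zero, since \(a\) and \(y\) each sum to 1, which is a cheap extra check on any implementation of this gradient.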

Original post: https://www.cnblogs.com/zhou-lin/p/15419679.html