Softmax Regression Derivation

The vector \(y\) is the true class label, one-hot encoded: exactly one entry is 1 and all the others are 0. Its dimension is \(m\), meaning there are \(m\) classes:

\[y=\begin{bmatrix}y_1\\ y_2\\ \vdots\\ y_m\end{bmatrix}\]

The vector \(z\) is the input to the softmax function; it has the same dimension \(m\) as the label vector \(y\):

\[z=\begin{bmatrix}z_1\\ z_2\\ \vdots\\ z_m\end{bmatrix}\]

The vector \(s\) is the output of the softmax function, again of dimension \(m\):

\[s=\begin{bmatrix}s_1\\ s_2\\ \vdots\\ s_m\end{bmatrix}\]

\[s_{i}=\frac{e^{z_{i}}}{\sum_{k=1}^{m}e^{z_{k}}}\]
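For concreteness, here is a minimal NumPy sketch of this definition (the shift by \(\max(z)\) is a standard numerical-stability trick, not part of the derivation; it cancels in the ratio and leaves the result unchanged):

```python
import numpy as np

def softmax(z):
    """s_i = exp(z_i) / sum_k exp(z_k)."""
    # Subtracting max(z) avoids overflow in exp(); the ratio is unchanged.
    e = np.exp(z - np.max(z))
    return e / e.sum()

z = np.array([2.0, 1.0, 0.1])
s = softmax(z)
print(s)        # approx [0.659 0.242 0.099]
print(s.sum())  # 1.0
```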

The cross-entropy loss function:

\[c=-\sum_{j=1}^{m}y_j\ln s_j\]
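A direct translation of this loss, reusing the softmax sketch and import above. Because \(y\) is one-hot, the sum collapses to \(-\ln s_t\), where \(t\) is the true class (the small `eps` is only a log(0) guard, an implementation detail rather than part of the formula):

```python
def cross_entropy(y, s, eps=1e-12):
    """c = -sum_j y_j * ln(s_j)."""
    return -np.sum(y * np.log(s + eps))

y = np.array([1.0, 0.0, 0.0])  # one-hot: true class is 0
print(cross_entropy(y, softmax(np.array([2.0, 1.0, 0.1]))))  # -ln(0.659) ≈ 0.417
```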

Take the partial derivative of the loss with respect to each \(z_i\) in the vector \(z\):

\[\frac{\partial c}{\partial z_i}=-\sum_{j=1}^{m}\frac{\partial (y_j\ln s_j)}{\partial s_j}\cdot\frac{\partial s_j}{\partial z_i} =-\sum_{j=1}^{m}\frac{y_j}{s_j}\cdot\frac{\partial s_j}{\partial z_i}\]

When \(j=i\):

\[\frac{\partial s_j}{\partial z_i}=\frac{\partial}{\partial z_i}\left(\frac{e^{z_{i}}}{\sum_{k=1}^{m}e^{z_{k}}}\right) =\frac{e^{z_i}\sum_{k=1}^{m}e^{z_k}-e^{z_i}e^{z_i}}{\left(\sum_{k=1}^{m}e^{z_k}\right)^2} =\frac{e^{z_i}}{\sum_{k=1}^{m}e^{z_k}}\cdot\frac{\sum_{k=1}^{m}e^{z_k}-e^{z_i}}{\sum_{k=1}^{m}e^{z_k}} =\frac{e^{z_i}}{\sum_{k=1}^{m}e^{z_k}}\cdot\left(1-\frac{e^{z_i}}{\sum_{k=1}^{m}e^{z_k}}\right) =s_i(1-s_i)\]

When \(j\neq i\):

\[\frac{\partial s_j}{\partial z_i}=\frac{\partial}{\partial z_i}\left(\frac{e^{z_{j}}}{\sum_{k=1}^{m}e^{z_{k}}}\right) =\frac{0\cdot\sum_{k=1}^{m}e^{z_k}-e^{z_j}e^{z_i}}{\left(\sum_{k=1}^{m}e^{z_k}\right)^2} =-\frac{e^{z_j}}{\sum_{k=1}^{m}e^{z_k}}\cdot\frac{e^{z_i}}{\sum_{k=1}^{m}e^{z_k}} =-s_js_i\]

Therefore:

\[\frac{\partial s_j}{\partial z_i}=\begin{cases}s_i(1-s_i) & j=i \\ -s_js_i & j\neq i\end{cases}\]
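Stacked into a matrix, this Jacobian is \(\operatorname{diag}(s)-ss^{\top}\). Here is a minimal check of both cases against central finite differences, reusing the `softmax` sketch above (`softmax_jacobian` is a hypothetical helper name, not from the original post):

```python
def softmax_jacobian(s):
    """J[j, i] = ds_j/dz_i: s_i*(1-s_i) on the diagonal, -s_j*s_i off it."""
    return np.diag(s) - np.outer(s, s)

z = np.array([2.0, 1.0, 0.1])
J = softmax_jacobian(softmax(z))

# Column i of the numerical Jacobian: central difference in z_i.
h = 1e-6
J_num = np.zeros((3, 3))
for i in range(3):
    dz = np.zeros(3)
    dz[i] = h
    J_num[:, i] = (softmax(z + dz) - softmax(z - dz)) / (2 * h)

print(np.allclose(J, J_num, atol=1e-8))  # True
```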

Substituting these two cases back into the partial derivative of the loss with respect to each \(z_i\):

\[\frac{\partial c}{\partial z_i} =-\sum_{j=1}^{m}\frac{y_j}{s_j}\cdot\frac{\partial s_j}{\partial z_i} =-\left(\frac{y_i}{s_i}\cdot\frac{\partial s_i}{\partial z_i}+\sum_{j\neq i}\frac{y_j}{s_j}\cdot\frac{\partial s_j}{\partial z_i}\right) =-\left(\frac{y_i}{s_i}\cdot s_i(1-s_i)+\sum_{j\neq i}\frac{y_j}{s_j}\cdot(-s_js_i)\right)\]

\[=-y_i(1-s_i)+\sum_{j\neq i}y_js_i =-y_i+s_iy_i+\sum_{j\neq i}y_js_i =-y_i+s_i\sum_{j=1}^{m}y_j =s_i-y_i\]

The last step uses the fact that \(y\) is one-hot, so \(\sum_{j=1}^{m}y_j=1\).
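The clean result \(\frac{\partial c}{\partial z_i}=s_i-y_i\) can also be verified end to end, reusing the hypothetical `softmax` and `cross_entropy` sketches above:

```python
z = np.array([2.0, 1.0, 0.1])
y = np.array([0.0, 1.0, 0.0])

grad_analytic = softmax(z) - y  # s - y

# Central finite differences of the loss c(z) for comparison.
h = 1e-6
grad_num = np.zeros_like(z)
for i in range(len(z)):
    dz = np.zeros_like(z)
    dz[i] = h
    grad_num[i] = (cross_entropy(y, softmax(z + dz))
                   - cross_entropy(y, softmax(z - dz))) / (2 * h)

print(np.allclose(grad_analytic, grad_num, atol=1e-8))  # True
```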

Original post: https://www.cnblogs.com/smallredness/p/11047718.html