Learning Logistic Regression

Logistic regression is a classification method for binary (two-class) problems. Its basic idea is:

  1. Find a suitable hypothesis function (the classification function) that predicts the outcome for a given input;
  2. Construct a loss function that measures the deviation between the predicted output and the actual class labels in the training data;
  3. Minimize the loss function to obtain the optimal model parameters.

First, let's look at the sigmoid function:

\(g(x)=\frac{1}{1+e^{-x}}\)

Its graph is the familiar S-shaped curve: monotonically increasing from 0 to 1, with \(g(0)=0.5\).
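
As a minimal sketch (in NumPy; the function name `sigmoid` is my own), the definition translates directly to code:

```python
import numpy as np

def sigmoid(x):
    """The sigmoid g(x) = 1 / (1 + e^(-x)), mapping any real number into (0, 1)."""
    return 1.0 / (1.0 + np.exp(-x))

print(sigmoid(0.0))    # 0.5
print(sigmoid(10.0))   # close to 1
print(sigmoid(-10.0))  # close to 0
```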

The hypothesis function (classification function) used in logistic regression is:

\(h_{\theta}(x)=g(\theta^{T}x)=\frac{1}{1+e^{-\theta^{T}x}}\)

Explanation:

\(\theta\): the parameters we will solve for later;

\(T\): the vector transpose; by default, all vectors are column vectors;

\(\theta^{T}x\): the column vector \(\theta\) is first transposed and then dotted with \(x\), for example:

\(\begin{bmatrix}1\\ -1\\ 3\end{bmatrix}^{T}\begin{bmatrix}1\\ 1\\ -1\end{bmatrix} = \begin{bmatrix}1 & -1 & 3\end{bmatrix}\begin{bmatrix}1\\ 1\\ -1\end{bmatrix}=1 \times 1+(-1) \times 1+3 \times (-1) = -3\)
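
This small example can be checked directly with NumPy (a sketch, not part of the original post):

```python
import numpy as np

theta = np.array([1, -1, 3])  # the column vector theta
x = np.array([1, 1, -1])      # the input vector x

# theta^T x is just the dot product of the two vectors
print(theta @ x)              # prints -3
```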

A logistic classifier can have either a linear or a non-linear decision boundary:

A linear boundary has the form: \(\theta_{0}+\theta_{1}x_{1}+\cdots+\theta_{n}x_{n}=\sum_{i=0}^{n}\theta_{i}x_{i}=\theta^{T}x\) (taking \(x_{0}=1\) by convention).

A non-linear boundary takes a form such as: \(\theta_{0}+\theta_{1}x_{1}+\theta_{2}x_{2}+\theta_{3}x_{1}^{2}+\theta_{4}x_{2}^{2}\)
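
One common way to obtain such a boundary is to expand the raw features first and then fit the same linear model on the expanded features; a sketch (the helper name `map_features` is my own):

```python
import numpy as np

def map_features(x1, x2):
    """Expand (x1, x2) into [1, x1, x2, x1^2, x2^2]; a model linear in
    these expanded features yields a quadratic decision boundary."""
    return np.array([1.0, x1, x2, x1 ** 2, x2 ** 2])

print(map_features(2.0, 3.0))  # [1. 2. 3. 4. 9.]
```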

Probabilistically, the probabilities that the input \(x\) is classified as 1 or 0 are, respectively:

\(P(y=1|x;\theta)=h_{\theta}(x)\)

\(P(y=0|x;\theta)=1-h_{\theta}(x)\)
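
In practice one predicts class 1 whenever \(h_{\theta}(x)\ge 0.5\); a minimal sketch (names are my own, reusing the sigmoid above):

```python
import numpy as np

def predict(theta, x):
    """Return (P(y=1 | x; theta), predicted class)."""
    p = 1.0 / (1.0 + np.exp(-(theta @ x)))  # h_theta(x)
    return p, int(p >= 0.5)

theta = np.array([0.5, -1.0])
x = np.array([1.0, 0.2])   # x[0] = 1 is the intercept term
print(predict(theta, x))   # (0.574..., 1)
```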

The loss function is defined as: \(J(\theta)=\frac{1}{m}\sum_{i=1}^{m}\mathrm{cost}(h_{\theta}(x^{(i)}),\, y^{(i)})\)

where:

\(m\) is the total number of training samples;

\(\mathrm{cost}(h_{\theta}(x), y)=\left\{\begin{matrix} -\log(h_{\theta}(x)) & \text{if } y=1\\ -\log(1-h_{\theta}(x)) & \text{if } y=0\end{matrix}\right.\)

An equivalent single-expression form of \(\mathrm{cost}\) is: \(\mathrm{cost}(h_{\theta}(x), y)=-y \times \log(h_{\theta}(x))-(1-y) \times \log(1-h_{\theta}(x))\)

Substituting \(\mathrm{cost}\) into \(J(\theta)\) gives the loss function:

\(J(\theta)=-\frac{1}{m}\left[\sum_{i=1}^{m}y^{(i)}\log h_{\theta}(x^{(i)})+(1-y^{(i)})\log(1-h_{\theta}(x^{(i)}))\right]\)
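
A vectorized sketch of this loss (NumPy; `compute_cost` is an invented name, and clipping of the log argument is omitted for brevity):

```python
import numpy as np

def compute_cost(theta, X, y):
    """J(theta) = -(1/m) * sum_i [ y_i log h_i + (1 - y_i) log(1 - h_i) ]."""
    m = len(y)
    h = 1.0 / (1.0 + np.exp(-(X @ theta)))  # h_theta(x^(i)) for every sample
    return -(1.0 / m) * np.sum(y * np.log(h) + (1 - y) * np.log(1 - h))
```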

Minimizing \(J(\theta)\) with gradient descent

The update rule for \(\theta\) is:

\(\theta_{j}:=\theta_{j}-\alpha\frac{\partial}{\partial\theta_{j}}J(\theta), \quad (j=0 \cdots n)\)

where \(\alpha\) is the learning rate (step size).

\(\begin{align*} \frac{\partial}{\partial\theta_{j}}J(\theta) &= -\frac{1}{m}\sum_{i=1}^{m}\left( y^{(i)}\frac{1}{h_{\theta}(x^{(i)})}\frac{\partial}{\partial\theta_{j}}h_{\theta}(x^{(i)})-(1-y^{(i)})\frac{1}{1-h_{\theta}(x^{(i)})}\frac{\partial}{\partial\theta_{j}}h_{\theta}(x^{(i)}) \right) \\ &= -\frac{1}{m}\sum_{i=1}^{m}\left( y^{(i)}\frac{1}{g(\theta^{T}x^{(i)})}-(1-y^{(i)})\frac{1}{1-g(\theta^{T}x^{(i)})} \right)\frac{\partial}{\partial\theta_{j}}g(\theta^{T}x^{(i)}) \\ &= -\frac{1}{m}\sum_{i=1}^{m}\left( y^{(i)}\frac{1}{g(\theta^{T}x^{(i)})}-(1-y^{(i)})\frac{1}{1-g(\theta^{T}x^{(i)})} \right)g(\theta^{T}x^{(i)})\left(1-g(\theta^{T}x^{(i)})\right)\frac{\partial}{\partial\theta_{j}}\theta^{T}x^{(i)} \end{align*}\)

The last step uses the sigmoid derivative \(g'(z)=g(z)(1-g(z))\); together with \(\frac{\partial}{\partial\theta_{j}}\theta^{T}x^{(i)}=x_{j}^{(i)}\), this simplifies to:

\(\begin{align*} \frac{\partial}{\partial\theta_{j}}J(\theta) &= -\frac{1}{m}\sum_{i=1}^{m}\left( y^{(i)}\left(1-g(\theta^{T}x^{(i)})\right)-(1-y^{(i)})g(\theta^{T}x^{(i)}) \right)x_{j}^{(i)} \\ &= -\frac{1}{m}\sum_{i=1}^{m}\left( y^{(i)}-g(\theta^{T}x^{(i)}) \right)x_{j}^{(i)} \\ &= -\frac{1}{m}\sum_{i=1}^{m}\left( y^{(i)}-h_{\theta}(x^{(i)}) \right)x_{j}^{(i)} \\ &= \frac{1}{m}\sum_{i=1}^{m}\left( h_{\theta}(x^{(i)})-y^{(i)} \right)x_{j}^{(i)} \end{align*}\)

Substituting this partial derivative into the update rule gives:

\(\theta_{j}:=\theta_{j}-\alpha\frac{1}{m}\sum_{i=1}^{m}\left( h_{\theta}(x^{(i)})-y^{(i)} \right)x_{j}^{(i)}\)
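
This closed-form gradient can be sanity-checked against a finite-difference estimate; a sketch under invented names:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def cost(theta, X, y):
    h = sigmoid(X @ theta)
    return -np.mean(y * np.log(h) + (1 - y) * np.log(1 - h))

def gradient(theta, X, y):
    """Analytic gradient: (1/m) * X^T (h - y), componentwise the formula above."""
    return X.T @ (sigmoid(X @ theta) - y) / len(y)

rng = np.random.default_rng(0)
X = np.c_[np.ones(5), rng.normal(size=(5, 2))]  # x_0 = 1 bias column
y = rng.integers(0, 2, size=5).astype(float)
theta = rng.normal(size=3)

# Central finite differences, one component at a time
eps = 1e-6
numeric = np.array([
    (cost(theta + eps * np.eye(3)[j], X, y) -
     cost(theta - eps * np.eye(3)[j], X, y)) / (2 * eps)
    for j in range(3)
])
print(np.allclose(numeric, gradient(theta, X, y)))  # True
```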

Since the learning rate \(\alpha\) is usually a constant, the factor \(\frac{1}{m}\) can be absorbed into it and omitted, giving the final update rule:

\(\theta_{j}:=\theta_{j}-\alpha\sum_{i=1}^{m}\left( h_{\theta}(x^{(i)})-y^{(i)} \right)x_{j}^{(i)}, \quad (j=0 \cdots n)\)
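
Written directly from this formula, one pass of batch gradient descent looks like the following sketch (names are my own):

```python
import numpy as np

def gd_epoch(theta, X, y, alpha):
    """One batch update: theta_j := theta_j - alpha * sum_i (h_i - y_i) x_j^(i)."""
    h = 1.0 / (1.0 + np.exp(-(X @ theta)))  # h_theta for every sample at once
    new_theta = theta.copy()
    for j in range(X.shape[1]):             # update every component j = 0..n
        new_theta[j] -= alpha * np.sum((h - y) * X[:, j])
    return new_theta
```

The inner loop over \(j\) is exactly what the vectorization below removes.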

Vectorizing the gradient

Described as matrices, the training samples are:

\(X=\begin{bmatrix} x^{(1)}\\ x^{(2)}\\ \cdots\\ x^{(m)}\end{bmatrix}=\begin{bmatrix} x_{0}^{(1)} & x_{1}^{(1)} & \cdots & x_{n}^{(1)}\\ x_{0}^{(2)} & x_{1}^{(2)} & \cdots & x_{n}^{(2)}\\ \cdots & \cdots & \cdots & \cdots\\ x_{0}^{(m)} & x_{1}^{(m)} & \cdots & x_{n}^{(m)} \end{bmatrix}, \quad Y=\begin{bmatrix} y^{(1)}\\ y^{(2)}\\ \cdots\\ y^{(m)}\end{bmatrix}\)

The parameter vector \(\theta\) in matrix form is:

\(\Theta=\begin{bmatrix} \theta_{0}\\ \theta_{1}\\ \cdots\\ \theta_{n}\end{bmatrix}\)

First compute \(X\cdot\Theta\) and denote the result by \(A\):

\(A=X\cdot\Theta\); this is just ordinary matrix multiplication.

Next, compute the vectorized error \(E\):

\(E=h_{\Theta}(X)-Y=\begin{bmatrix} g(A^{(1)})-y^{(1)}\\ g(A^{(2)})-y^{(2)}\\ \cdots\\ g(A^{(m)})-y^{(m)}\end{bmatrix}=\begin{bmatrix} e^{(1)}\\ e^{(2)}\\ \cdots\\ e^{(m)}\end{bmatrix}\)

For \(j=0\) the update becomes:

\(\begin{align*} \theta_{0}&:=\theta_{0}-\alpha\sum_{i=1}^{m}\left( h_{\theta}(x^{(i)})-y^{(i)} \right)x_{0}^{(i)} \\ &=\theta_{0}-\alpha\sum_{i=1}^{m}e^{(i)}x_{0}^{(i)} \\ &=\theta_{0}-\alpha\begin{bmatrix} x_{0}^{(1)} & x_{0}^{(2)} & \cdots & x_{0}^{(m)} \end{bmatrix}\cdot E \end{align*}\)

By the same reasoning, for any \(\theta_{j}\):

\(\theta_{j}:=\theta_{j}-\alpha\begin{bmatrix} x_{j}^{(1)} & x_{j}^{(2)} & \cdots & x_{j}^{(m)} \end{bmatrix}\cdot E\)

Expressed as a single matrix equation:

\(\begin{align*}\begin{bmatrix} \theta_{0}\\ \theta_{1}\\ \cdots\\ \theta_{n}\end{bmatrix} &:= \begin{bmatrix} \theta_{0}\\ \theta_{1}\\ \cdots\\ \theta_{n}\end{bmatrix} - \alpha\cdot\begin{bmatrix} x_{0}^{(1)} & x_{0}^{(2)} & \cdots & x_{0}^{(m)}\\ x_{1}^{(1)} & x_{1}^{(2)} & \cdots & x_{1}^{(m)}\\ \cdots & \cdots & \cdots & \cdots\\ x_{n}^{(1)} & x_{n}^{(2)} & \cdots & x_{n}^{(m)} \end{bmatrix}\cdot E \\ &= \Theta - \alpha\cdot X^{T}\cdot E \end{align*}\)

So the whole procedure is just three steps:

1. Compute the model output: \(A=X \cdot \Theta\)

2. Apply the sigmoid and compute the error: \(E=g(A)-Y\)

3. Update \(\Theta\) with the derived formula, \(\Theta:=\Theta-\alpha \cdot X^{T} \cdot E\), then go back to step 1 and repeat.
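
Putting the three steps together, a minimal vectorized training loop might look like this (a sketch with invented names and toy data; since the \(\frac{1}{m}\) factor was absorbed into \(\alpha\), the step size must be kept small):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def train(X, Y, alpha=0.01, iters=1000):
    """Batch gradient descent, exactly the three steps above."""
    Theta = np.zeros(X.shape[1])
    for _ in range(iters):
        A = X @ Theta                        # step 1: A = X . Theta
        E = sigmoid(A) - Y                   # step 2: E = g(A) - Y
        Theta = Theta - alpha * (X.T @ E)    # step 3: Theta := Theta - alpha X^T E
    return Theta

# Toy data: class 1 iff x1 + x2 > 1; the first column of X is the bias x_0 = 1
rng = np.random.default_rng(42)
pts = rng.uniform(0.0, 1.0, size=(100, 2))
X = np.c_[np.ones(100), pts]
Y = (pts[:, 0] + pts[:, 1] > 1.0).astype(float)

Theta = train(X, Y)
preds = (sigmoid(X @ Theta) >= 0.5).astype(float)
print("training accuracy:", (preds == Y).mean())
```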

Original post (in Chinese): https://www.cnblogs.com/tuhooo/p/9296915.html