Proving the equivalence of AdaBoost and forward stagewise additive modeling with the exponential loss function

Why is AdaBoost equivalent to forward stagewise additive modeling using the loss function \(L(y, f(x))=\exp (-y f(x))\)?

First, we consider forward stagewise additive modeling using the loss function \(L(y, f(x))=\exp (-y f(x))\).

Using the exponential loss function, at step \(m\) one must solve:

\[\left(\beta_{m}, G_{m}\right)=\arg \min _{\beta, G} \sum_{i=1}^{N} \exp \left[-y_{i}\left(f_{m-1}\left(x_{i}\right)+\beta G\left(x_{i}\right)\right)\right]\]

We denote \(w_{i}^{(m)}=\exp \left(-y_{i} f_{m-1}\left(x_{i}\right)\right)\). Since \(w_{i}^{(m)}\) depends on neither \(\beta\) nor \(G\), it acts as a fixed weight on each observation, and the problem becomes:

\[\left(\beta_{m}, G_{m}\right)=\arg \min _{\beta, G} \sum_{i=1}^{N} w_{i}^{(m)} \exp \left(-\beta y_{i} G\left(x_{i}\right)\right)\]

Since \(y_{i}, G\left(x_{i}\right) \in\{-1,+1\}\), we have \(y_{i} G\left(x_{i}\right)=1\) when \(y_{i}=G\left(x_{i}\right)\), and \(y_{i} G\left(x_{i}\right)=-1\) when \(y_{i} \neq G\left(x_{i}\right)\).

We can therefore rewrite \(\sum_{i=1}^{N} w_{i}^{(m)} \exp \left(-\beta y_{i} G\left(x_{i}\right)\right)\) as:

\[e^{-\beta} \cdot \sum_{y_{i}=G\left(x_{i}\right)} w_{i}^{(m)}+e^{\beta} \cdot \sum_{y_{i} \neq G\left(x_{i}\right)} w_{i}^{(m)}\]

Since the sum over correctly classified points equals the total sum minus the sum over misclassified points, this is equivalent to:

\[\left(e^{\beta}-e^{-\beta}\right) \cdot \sum_{i=1}^{N} w_{i}^{(m)} I\left(y_{i} \neq G\left(x_{i}\right)\right)+e^{-\beta} \cdot \sum_{i=1}^{N} w_{i}^{(m)}\]

Therefore, for any fixed \(\beta>0\), the minimizing \(G\) does not depend on \(\beta\):

\[G_{m}=\arg \min _{G} \sum_{i=1}^{N} w_{i}^{(m)} I\left(y_{i} \neq G\left(x_{i}\right)\right)\]
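In practice this step simply means fitting a weak classifier with the current \(w_{i}^{(m)}\) as sample weights. Below is a minimal sketch of that step, assuming a depth-1 decision tree (a decision stump) from scikit-learn as the weak learner; the function name `fit_weak_learner` and the choice of weak learner are illustrative, not part of the original derivation.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

def fit_weak_learner(X, y, w):
    """Fit G_m: a weak classifier that (approximately) minimizes the
    weighted misclassification error sum_i w_i * I(y_i != G(x_i)).
    Labels y are assumed to take values in {-1, +1}."""
    stump = DecisionTreeClassifier(max_depth=1)
    stump.fit(X, y, sample_weight=w)            # weights enter only through the fit
    pred = stump.predict(X)
    err = np.sum(w * (pred != y)) / np.sum(w)   # weighted error err_m
    return stump, err
```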

Plugging this \(G_{m}\) into the objective function and setting the derivative with respect to \(\beta\) to zero, we have:

\[\left(e^{\beta}+e^{-\beta}\right) \cdot \sum_{i=1}^{N} w_{i}^{(m)} I\left(y_{i} \neq G_{m}\left(x_{i}\right)\right)=e^{-\beta} \cdot \sum_{i=1}^{N} w_{i}^{(m)}\]

\[e^{2 \beta}+1=\frac{\sum_{i=1}^{N} w_{i}^{(m)}}{\sum_{i=1}^{N} w_{i}^{(m)} I\left(y_{i} \neq G_{m}\left(x_{i}\right)\right)}\]

Thus:

\[\beta_{m}=\frac{1}{2} \log \frac{1-\operatorname{err}_{m}}{\operatorname{err}_{m}}\]

where

\[\operatorname{err}_{m}=\frac{\sum_{i=1}^{N} w_{i}^{(m)} I\left(y_{i} \neq G_{m}\left(x_{i}\right)\right)}{\sum_{i=1}^{N} w_{i}^{(m)}}\]
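As a quick numerical illustration (the value of \(\operatorname{err}_{m}\) is chosen only for the example): a weighted error of \(0.25\) gives

\[\beta_{m}=\frac{1}{2} \log \frac{1-0.25}{0.25}=\frac{1}{2} \log 3 \approx 0.549\]

so a more accurate weak classifier receives a larger coefficient, while \(\operatorname{err}_{m}=0.5\) (no better than random guessing) gives \(\beta_{m}=0\).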

Since we use forward stagewise additive modeling,

\[f_{m}(x)=f_{m-1}(x)+\beta_{m} G_{m}(x)\]

we have:

\[\begin{aligned} w_{i}^{(m+1)}&=\exp \left(-y_{i} f_{m}\left(x_{i}\right)\right)\\&=\exp \left(-y_{i}\left(f_{m-1}\left(x_{i}\right)+\beta_{m} G_{m}\left(x_{i}\right)\right)\right)\\&=w_{i}^{(m)} \cdot e^{-\beta_{m} y_{i} G_{m}\left(x_{i}\right)} \end{aligned}\]

Since \(-y_{i} G_{m}\left(x_{i}\right)=2 \cdot I\left(y_{i} \neq G_{m}\left(x_{i}\right)\right)-1\), we can rewrite the above equation as

\[w_{i}^{(m+1)}=w_{i}^{(m)} \cdot e^{\alpha_{m} I\left(y_{i} \neq G_{m}\left(x_{i}\right)\right)} \cdot e^{-\beta_{m}}\]

where \(\alpha_{m}=2 \beta_{m}\).

We can ignore the factor \(e^{-\beta_{m}}\), since it multiplies all the weights by the same value:

\[w_{i}^{(m+1)}=w_{i}^{(m)} \cdot e^{\alpha_{m} I\left(y_{i} \neq G_{m}\left(x_{i}\right)\right)}\]

We also have:

\[\alpha_{m}=\log \frac{1-\operatorname{err}_{m}}{\operatorname{err}_{m}}\]

Comparing this with AdaBoost, we see that \(G_{m}\), \(\operatorname{err}_{m}\), \(\alpha_{m}\), and the weight update are exactly the quantities used by AdaBoost, so the two algorithms are the same.
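To make the correspondence concrete, here is a minimal sketch of AdaBoost built directly from the formulas above: the weighted weak learner \(G_{m}\), the weighted error \(\operatorname{err}_{m}\), the coefficient \(\alpha_{m}\), and the multiplicative weight update. It assumes binary labels in \(\{-1,+1\}\) and a scikit-learn decision stump as the weak learner; names such as `adaboost_fit` and `adaboost_predict` are illustrative, not from the original post.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

def adaboost_fit(X, y, M=50):
    """AdaBoost assembled from the quantities derived above.
    y must take values in {-1, +1}."""
    N = len(y)
    w = np.full(N, 1.0 / N)                     # w_i^(1): uniform initial weights
    learners, alphas = [], []
    for m in range(M):
        # G_m: weak classifier fit with the current sample weights
        G = DecisionTreeClassifier(max_depth=1)
        G.fit(X, y, sample_weight=w)
        miss = (G.predict(X) != y).astype(float)
        err = np.sum(w * miss) / np.sum(w)      # err_m
        if err <= 0.0 or err >= 0.5:            # perfect, or no better than chance
            break
        alpha = np.log((1.0 - err) / err)       # alpha_m = 2 * beta_m
        w = w * np.exp(alpha * miss)            # weight update (constant factor dropped)
        w = w / np.sum(w)                       # normalization does not affect G_m
        learners.append(G)
        alphas.append(alpha)
    return learners, alphas

def adaboost_predict(X, learners, alphas):
    """Final classifier: sign of the additive model sum_m beta_m * G_m(x).
    Using alpha_m = 2 * beta_m only rescales the sum, so the sign is unchanged."""
    agg = sum(a * G.predict(X) for a, G in zip(alphas, learners))
    return np.sign(agg)
```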

This post is an original work by 跑得飞快的凤凰花; please credit the source when reposting.
Original URL: https://www.cnblogs.com/zzqingwenn/p/12864245.html