Machine Learning Foundations: Nonlinear Transformation

Nonlinear Transformation

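The previous lectures covered many linear models. How should we handle data that is not linearly separable? The basic idea is to map the sample (feature) space \(\mathcal{X}\) into a space \(\mathcal{Z}\). If the data is linearly separable in \(\mathcal{Z}\), a linear model can then be used to analyze it there.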

This mapping is called the (nonlinear) feature transform \(\Phi\). It performs:

\[\mathbf{x} \in \mathcal{X} \overset{\mathbf{\Phi}}{\longmapsto} \mathbf{z} \in \mathcal{Z}\]

The basic learning procedure is:

transform the original data \(\left\{\left(\mathbf{x}_n, y_n\right)\right\}\) into \(\left\{\left(\mathbf{z}_n = \mathbf{\Phi}(\mathbf{x}_n), y_n\right)\right\}\)
get a good perceptron \(\tilde{\mathbf{w}}\) using \(\left\{\left(\mathbf{z}_n = \mathbf{\Phi}(\mathbf{x}_n), y_n\right)\right\}\) and your favorite linear classification algorithm \(\mathcal{A}\)
return \(g(\mathbf{x}) = \operatorname{sign}\left(\tilde{\mathbf{w}}^T \mathbf{\Phi}(\mathbf{x})\right)\)
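The three steps above can be sketched in Python with NumPy. This is a minimal illustration under stated assumptions, not the course's own code. The toy data generator `make_circle_data`, the helper names `phi2`, `pla`, and `g`, and the choice of the cyclic Perceptron Learning Algorithm as \(\mathcal{A}\) are all hypothetical. The points are labelled +1 where \(x_1^2 + x_2^2 < 0.4\) and -1 where \(x_1^2 + x_2^2 > 0.8\). No line separates them in \(\mathcal{X}\), but in \(\mathcal{Z}\) they are separated by \(\tilde{\mathbf{w}} \propto (0.6, 0, 0, -1, 0, -1)\).

```python
import numpy as np

def phi2(X):
    """Quadratic transform: (x1, x2) -> (1, x1, x2, x1^2, x1*x2, x2^2)."""
    x1, x2 = X[:, 0], X[:, 1]
    return np.column_stack([np.ones(len(X)), x1, x2, x1**2, x1 * x2, x2**2])

def pla(Z, y, max_updates=100_000):
    """Cyclic PLA: sweep the data, fix every mistake, stop after a clean pass."""
    w = np.zeros(Z.shape[1])
    updates = 0
    while updates < max_updates:
        mistakes = 0
        for z_n, y_n in zip(Z, y):
            if np.sign(z_n @ w) != y_n:      # sign(0) = 0 also counts as a mistake
                w += y_n * z_n
                updates += 1
                mistakes += 1
        if mistakes == 0:
            break
    return w

def make_circle_data(n=200, seed=0):
    """Hypothetical toy data: +1 inside x1^2+x2^2 < 0.4, -1 outside x1^2+x2^2 > 0.8."""
    rng = np.random.default_rng(seed)
    X = rng.uniform(-1, 1, size=(4 * n, 2))
    r2 = (X**2).sum(axis=1)
    X = X[(r2 < 0.4) | (r2 > 0.8)][:n]       # drop the ring in between to leave a margin
    y = np.where((X**2).sum(axis=1) < 0.4, 1.0, -1.0)
    return X, y

# Step 1: {(x_n, y_n)} -> {(z_n = Phi(x_n), y_n)}
X, y = make_circle_data()
Z = phi2(X)
# Step 2: learn w_tilde with a linear algorithm A (here: PLA) in Z-space
w_tilde = pla(Z, y)

# Step 3: g(x) = sign(w_tilde^T Phi(x))
def g(X_new):
    return np.sign(phi2(X_new) @ w_tilde)

print("training error:", np.mean(g(X) != y))   # expected: 0.0 (separable in Z-space)
```

The two classes are separated by a gap, so PLA is guaranteed to halt in \(\mathcal{Z}\). Running the same `pla` on the raw \(\mathbf{x}\) (plus a bias column) would instead exhaust `max_updates`, because no line separates a disk from the points surrounding it.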

General Nonlinear Transforms

General Quadratic Hypothesis Set

The basic form is:

\[\Phi_2(\mathbf{x}) = \left(1, x_1, x_2, x_1^2, x_1 x_2, x_2^2\right)\]

Its properties:

can implement all possible quadratic curve boundaries: circle, ellipse, rotated ellipse, hyperbola, parabola, …
include lines and constants as degenerate cases
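A single weight vector \(\tilde{\mathbf{w}} = (w_0, \ldots, w_5)\) on \(\Phi_2\) can produce each of these boundaries. The hypothetical helper `conic_type` below shows which one by classifying \(\tilde{\mathbf{w}}^T \Phi_2(\mathbf{x}) = 0\) with the standard conic discriminant \(B^2 - 4AC\), where \(A, B, C\) are the weights on \(x_1^2, x_1 x_2, x_2^2\). It is only a rough sketch: degenerate conics such as empty sets, single points, and pairs of lines are not detected.

```python
def conic_type(w):
    """Rough shape of the boundary w^T Phi_2(x) = 0, Phi_2(x) = (1, x1, x2, x1^2, x1*x2, x2^2).

    Uses the discriminant B^2 - 4AC of A*x1^2 + B*x1*x2 + C*x2^2.
    Degenerate conics (empty set, single point, pair of lines) are NOT detected.
    """
    w0, w1, w2, a, b, c = w
    if a == 0 and b == 0 and c == 0:
        # No quadratic terms: w0 + w1*x1 + w2*x2 = 0 is a line, or no boundary at all
        return "line" if (w1 != 0 or w2 != 0) else "constant"
    disc = b * b - 4 * a * c
    if disc < 0:
        if b == 0 and a == c:
            return "circle"
        return "rotated ellipse" if b != 0 else "ellipse"
    if disc == 0:
        return "parabola"
    return "hyperbola"

examples = {
    "x1^2 + x2^2 = 0.6":        (-0.6, 0, 0, 1, 0, 1),
    "x1^2 + 4x2^2 = 1":         (-1, 0, 0, 1, 0, 4),
    "2x1^2 + x1x2 + 3x2^2 = 1": (-1, 0, 0, 2, 1, 3),
    "x1^2 - x2^2 = 1":          (-1, 0, 0, 1, 0, -1),
    "x2 = x1^2":                (0, 0, -1, 1, 0, 0),
    "2x1 - x2 + 1 = 0":         (1, 2, -1, 0, 0, 0),
    "1 = 0 (always +1)":        (1, 0, 0, 0, 0, 0),
}
for name, w in examples.items():
    print(f"{name:28s} -> {conic_type(w)}")
```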

General Polynomial Hypothesis Set

The basic form is:

\[
\begin{aligned}
\Phi_0(\mathbf{x}) &= (1), \\
\Phi_1(\mathbf{x}) &= \left(\Phi_0(\mathbf{x}),\; x_1, x_2, \ldots, x_d\right), \\
\Phi_2(\mathbf{x}) &= \left(\Phi_1(\mathbf{x}),\; x_1^2, x_1 x_2, \ldots, x_d^2\right), \\
\Phi_3(\mathbf{x}) &= \left(\Phi_2(\mathbf{x}),\; x_1^3, x_1^2 x_2, \ldots, x_d^3\right), \\
&\;\;\vdots \\
\Phi_Q(\mathbf{x}) &= \left(\Phi_{Q-1}(\mathbf{x}),\; x_1^Q, x_1^{Q-1} x_2, \ldots, x_d^Q\right)
\end{aligned}
\]
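Below is a direct, hypothetical Python transcription of this recursion. `phi_q` builds \(\Phi_Q(\mathbf{x})\) by appending all degree-\(Q\) monomials to \(\Phi_{Q-1}(\mathbf{x})\). `itertools.combinations_with_replacement` enumerates those monomials in the order used above, \(x_1^Q, x_1^{Q-1}x_2, \ldots, x_d^Q\).

```python
import math
from itertools import combinations_with_replacement

def phi_q(x, Q):
    """Phi_Q(x) = (Phi_{Q-1}(x), all degree-Q monomials of x), with Phi_0(x) = (1)."""
    if Q == 0:
        return [1.0]
    d = len(x)
    # Each index tuple (i1 <= i2 <= ... <= iQ) is one monomial x_{i1} * x_{i2} * ... * x_{iQ}
    degree_q = [math.prod(x[i] for i in idx)
                for idx in combinations_with_replacement(range(d), Q)]
    return phi_q(x, Q - 1) + degree_q

print(phi_q([2.0, 3.0], 2))   # (1, x1, x2, x1^2, x1*x2, x2^2) at x = (2, 3)
```

For example, `phi_q([2.0, 3.0], 2)` returns `[1.0, 2.0, 3.0, 4.0, 6.0, 9.0]`, which matches \(\Phi_2\) from the quadratic case.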

The hypothesis sets obtained after these feature transforms can then be written as

\[
\begin{array}{ccccccccccc}
\mathcal{H}_{\Phi_0} & \subset & \mathcal{H}_{\Phi_1} & \subset & \mathcal{H}_{\Phi_2} & \subset & \mathcal{H}_{\Phi_3} & \subset & \cdots & \subset & \mathcal{H}_{\Phi_Q} \\
\| & & \| & & \| & & \| & & & & \| \\
\mathcal{H}_0 & & \mathcal{H}_1 & & \mathcal{H}_2 & & \mathcal{H}_3 & & \cdots & & \mathcal{H}_Q
\end{array}
\]

Drawn as a diagram, each hypothesis set contains all the previous ones, so this structure is called the nested \(\mathcal{H}_i\).
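The nesting follows directly from the recursive definition. \(\Phi_Q(\mathbf{x})\) begins with \(\Phi_{Q-1}(\mathbf{x})\), so padding a weight vector from the lower-order set with zeros reproduces the same hypothesis in \(\mathcal{H}_Q\):

```latex
\tilde{\mathbf{w}}^T \Phi_{Q-1}(\mathbf{x})
  = \tilde{\mathbf{w}}^T \Phi_{Q-1}(\mathbf{x})
    + \mathbf{0}^T \left(x_1^Q, x_1^{Q-1} x_2, \ldots, x_d^Q\right)
  = \left(\tilde{\mathbf{w}}, \mathbf{0}\right)^T \Phi_Q(\mathbf{x})
\quad\Longrightarrow\quad
\mathcal{H}_{Q-1} \subset \mathcal{H}_Q
```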

The Price of Nonlinear Transformation

For polynomial transforms, taking \(g_i = \operatorname{argmin}_{h \in \mathcal{H}_i} E_{\mathrm{in}}(h)\) gives the following results:

\[
\begin{array}{ccccccccc}
\mathcal{H}_0 & \subset & \mathcal{H}_1 & \subset & \mathcal{H}_2 & \subset & \mathcal{H}_3 & \subset & \cdots \\
d_{\mathrm{VC}}(\mathcal{H}_0) & \leq & d_{\mathrm{VC}}(\mathcal{H}_1) & \leq & d_{\mathrm{VC}}(\mathcal{H}_2) & \leq & d_{\mathrm{VC}}(\mathcal{H}_3) & \leq & \cdots \\
E_{\mathrm{in}}(g_0) & \geq & E_{\mathrm{in}}(g_1) & \geq & E_{\mathrm{in}}(g_2) & \geq & E_{\mathrm{in}}(g_3) & \geq & \cdots
\end{array}
\]
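A quick numerical check of the bottom row, which says \(E_{\mathrm{in}}\) never increases as \(Q\) grows. The sketch rests on several assumptions made only for illustration: one-dimensional inputs, a hypothetical noisy target \(\sin(3x)\), and squared-error linear regression in place of a 0/1-error classifier. Regression is used because its exact \(\operatorname{argmin}\) is easy to compute with `numpy.linalg.lstsq`. With \(d = 1\), `numpy.vander(x, Q + 1, increasing=True)` produces exactly \(\Phi_Q(x) = (1, x, \ldots, x^Q)\).

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.uniform(-1, 1, size=30)
y = np.sin(3 * x) + 0.1 * rng.standard_normal(30)   # hypothetical noisy target

def e_in(Q):
    """Squared in-sample error of the best hypothesis in H_Q (d = 1)."""
    Z = np.vander(x, Q + 1, increasing=True)        # rows are Phi_Q(x_n) = (1, x_n, ..., x_n^Q)
    w, *_ = np.linalg.lstsq(Z, y, rcond=None)       # exact least-squares argmin over H_Q
    return np.mean((Z @ w - y) ** 2)

errors = [e_in(Q) for Q in range(9)]
for Q, e in enumerate(errors):
    print(f"Q={Q}: E_in={e:.5f}")
```

Because \(\mathcal{H}_{Q-1} \subset \mathcal{H}_Q\), the best fit in the larger set can never be worse than the best fit in the smaller one. Any apparent increase could only come from floating-point rounding, which is why a comparison should allow a tiny tolerance.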

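From the formula derived earlier, the transformed space has \(\underbrace{1}_{w_0} + \underbrace{\tilde{d}}_{\text{others}} = \binom{Q+d}{d} = O(Q^d)\) dimensions. A perceptron in \(\mathcal{Z}\) has \(d_{\mathrm{VC}} \le 1 + \tilde{d}\), so a large \(Q\) means a large \(d_{\mathrm{VC}}\): the model becomes ever more powerful, and its complexity grows along with it.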
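The dimension count of \(\mathcal{Z}\), \(1 + \tilde{d} = \binom{Q+d}{d}\) (the number of monomials of degree at most \(Q\) in \(d\) variables), can be checked directly. The sketch below uses the hypothetical helpers `dim_z` and `dim_z_by_counting`. It compares the closed form with a brute-force count and prints how fast the count grows for \(d = 2\) and \(d = 10\).

```python
import math
from itertools import combinations_with_replacement

def dim_z(Q, d):
    """1 + d_tilde = C(Q + d, d): number of monomials of degree <= Q in d variables."""
    return math.comb(Q + d, d)

def dim_z_by_counting(Q, d):
    """Cross-check: enumerate the monomials of each degree k = 0..Q and count them."""
    return sum(sum(1 for _ in combinations_with_replacement(range(d), k))
               for k in range(Q + 1))

for Q in (1, 2, 3, 5, 10):
    print(f"Q={Q:2d}  d=2: {dim_z(Q, 2):6d}   d=10: {dim_z(Q, 10):7d}")
```

For \(d = 10\), going from \(Q = 2\) to \(Q = 10\) takes \(\mathcal{Z}\) from 66 to 184,756 dimensions.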

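The VC-dimension analysis showed how \(E_{\mathrm{in}}\), \(E_{\mathrm{out}}\), and model complexity change with \(d_{\mathrm{VC}}\). As \(d_{\mathrm{VC}}\) grows, \(E_{\mathrm{in}}\) goes down while model complexity goes up. As a result, \(E_{\mathrm{out}}\) first falls and then rises, and the best \(d_{\mathrm{VC}}^*\) lies somewhere in the middle.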
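To make the complexity side of that trade-off concrete, the sketch below evaluates the VC-bound penalty \(\Omega(N, \mathcal{H}, \delta) = \sqrt{\frac{8}{N} \ln \frac{4(2N)^{d_{\mathrm{VC}}}}{\delta}}\) for the \(\Phi_Q\) perceptron. It assumes \(d_{\mathrm{VC}} = 1 + \tilde{d} = \binom{Q+d}{d}\), which is really an upper bound. The values \(N = 10{,}000\), \(d = 2\), \(\delta = 0.1\) are hypothetical.

```python
import math

def vc_penalty(N, d_vc, delta=0.1):
    """Model-complexity penalty Omega = sqrt(8/N * ln(4 * (2N)^d_vc / delta)).

    The logarithm is expanded term by term so (2N)^d_vc is never formed explicitly.
    """
    log_term = math.log(4) + d_vc * math.log(2 * N) - math.log(delta)
    return math.sqrt(8 / N * log_term)

N, d = 10_000, 2                 # hypothetical sample size and input dimension
penalties = []
for Q in range(1, 6):
    d_vc = math.comb(Q + d, d)   # 1 + d_tilde, used as the d_VC of the Phi_Q perceptron
    penalties.append(vc_penalty(N, d_vc))
    print(f"Q={Q}  d_vc={d_vc:3d}  Omega={penalties[-1]:.3f}")
```

The penalty rises steadily with \(Q\), which is the "model complexity" curve. A lower \(E_{\mathrm{in}}\) from a bigger \(Q\) has to outweigh it before the bound on \(E_{\mathrm{out}}\) improves.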

So more power does not necessarily mean a better fit. In practice, go linear first and start with the simplest model. In many cases a linear model is simple, efficient, safe, and workable!
