Machine Learning--week4 Basic Concepts of Neural Networks

The methods learned so far cannot solve complex nonlinear problems.

Neural Networks

Sigmoid (logistic) activation function: "activation function" is another term for \(g(z) = \frac{1}{1+e^{-z}}\)
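To make the definition concrete, here is a minimal sketch of the sigmoid in plain Python. The function name `sigmoid` and the two-branch form (used only to avoid overflow for very negative `z`) are choices made for this illustration, not something the notes prescribe.

```python
import math

def sigmoid(z: float) -> float:
    """Logistic function g(z) = 1 / (1 + e^(-z))."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    # Algebraically identical form; avoids overflow in exp(-z) when z << 0.
    e = math.exp(z)
    return e / (1.0 + e)

# g(z) is close to 0 for very negative z, exactly 0.5 at z = 0,
# and close to 1 for very positive z.
for z in (-10.0, 0.0, 10.0):
    print(z, sigmoid(z))
```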

activation: the value that is computed and output by a specific neuron

weights = parameters = \(\theta\)

input units: \(x_1, x_2, x_3, \dots, x_n\)

bias unit / bias neuron: \(x_0\) and \(a_0^{(j)}\)

The layers between the input units and the hypothesis are made up of activations.

input wire / output wire: an input wire is an arrow pointing into the target neuron; an output wire is an arrow pointing out of the target neuron

\(a_i^{(j)}\): "activation" of neuron \(i\) (or unit \(i\)) in layer \(j\)

\(\Theta^{(j)}\): matrix of weights controlling the function mapping from layer \(j\) to layer \(j+1\)

(Note that \(\Theta\) is uppercase here because it now takes the form of a matrix.)

layer 1 == input layer

layer n == output layer (the last layer)

layer 2 ~ layer n-1 == hidden layers

for example:

\[\begin{align} \text{layer 2 (hidden layer)}&\begin{cases}a_1^{(2)} &= g(\Theta_{10}^{(1)}x_0 + \Theta_{11}^{(1)}x_1 + \Theta_{12}^{(1)}x_2 + \Theta_{13}^{(1)}x_3)\\ a_2^{(2)} &= g(\Theta_{20}^{(1)}x_0 + \Theta_{21}^{(1)}x_1 + \Theta_{22}^{(1)}x_2 + \Theta_{23}^{(1)}x_3)\\ a_3^{(2)} &= g(\Theta_{30}^{(1)}x_0 + \Theta_{31}^{(1)}x_1 + \Theta_{32}^{(1)}x_2 + \Theta_{33}^{(1)}x_3)\end{cases}\\ \text{output layer}&\begin{cases}h_\Theta(x) = a_1^{(3)} = g(\Theta_{10}^{(2)}a_0^{(2)} + \Theta_{11}^{(2)}a_1^{(2)} + \Theta_{12}^{(2)}a_2^{(2)} + \Theta_{13}^{(2)}a_3^{(2)})\end{cases} \end{align} \]

Put more compactly, writing \(\Theta_i^{(j)}\) for row \(i\) of \(\Theta^{(j)}\):

\[\begin{align} \text{layer 2 (hidden layer)} &\begin{cases} a_1^{(2)} &= g(\Theta_{1}^{(1)}a^{(1)})\\ a_2^{(2)} &= g(\Theta_{2}^{(1)}a^{(1)})\\ a_3^{(2)} &= g(\Theta_{3}^{(1)}a^{(1)}) \end{cases}\\ \text{output layer} &\begin{cases} h_\Theta(x) = a_1^{(3)} = g(\Theta_{1}^{(2)}a^{(2)}) \end{cases} \end{align} \]

Generally, \(\Theta^{(j)}\) will be of dimension \(s_{j+1} \times (s_j+1)\) if the network has \(s_j\) units in layer \(j\) and \(s_{j+1}\) units in layer \(j+1\). (The \(+1\) in \(s_j+1\) comes from the addition in \(\Theta^{(j)}\) of the "bias nodes," \(x_0\) and \(\Theta_0^{(j)}\). In other words, the output nodes will not include the bias nodes while the inputs will.)
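The dimension rule can be checked mechanically. The sketch below assumes the network is described by a list of unit counts per layer, excluding bias units; the helper name `theta_shapes` is hypothetical and introduced only for this example.

```python
def theta_shapes(layer_sizes):
    """Return the (rows, cols) shape of each Theta^(j).

    layer_sizes[j-1] is s_j, the number of units in layer j without the bias unit.
    Theta^(j) maps layer j to layer j+1, so it has s_{j+1} rows and s_j + 1 columns.
    """
    return [(s_next, s_cur + 1)
            for s_cur, s_next in zip(layer_sizes, layer_sizes[1:])]

# The example network above: 3 input units, 3 hidden units, 1 output unit.
print(theta_shapes([3, 3, 1]))  # [(3, 4), (1, 4)]
```

For the three-layer example, \(\Theta^{(1)}\) is \(3 \times 4\) and \(\Theta^{(2)}\) is \(1 \times 4\), matching the four terms inside each \(g(\cdot)\).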

Define \(a^{(1)} = x\)

\(z^{(j+1)} = \Theta^{(j)}a^{(j)}\)

\(z_k^{(j+1)} = \Theta_{k,0}^{(j)}a_0^{(j)} + \Theta_{k,1}^{(j)}a_1^{(j)} + \dots + \Theta_{k,n^{(j)}}^{(j)}a_{n^{(j)}}^{(j)}\quad (n^{(j)} \text{ means layer } j \text{ has } n^{(j)} \text{ activations})\)

\(a^{(j)} = g(z^{(j)}) = g(\Theta^{(j-1)}a^{(j-1)})\quad (j \ge 2)\)

Suppose the network has \(n\) layers. Then the last matrix \(\Theta^{(n-1)}\) will have only one row, which is multiplied by the single column \(a^{(n-1)}\), so that our result is a single number:

\(h_\Theta(x) = a^{(n)} = g(z^{(n)})\)

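Add the bias unit \(a_0^{(j)} = 1\) to each layer's activations.

Forward Propagation: computing the activations layer by layer, from the input layer to the output layer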
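The recurrence above can be sketched as a short loop. This is an illustration under stated assumptions, not a reference implementation: each weight matrix is a plain list of rows with the bias weight first, and the function name `forward_propagate` and the example weights are hypothetical.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def forward_propagate(thetas, x):
    """Return the activations of every layer, each without its bias unit.

    thetas : list of weight matrices [Theta^(1), Theta^(2), ...]
    x      : list of input features x_1..x_n (no bias term)
    """
    activations = [list(x)]                      # a^(1) = x
    for j, theta in enumerate(thetas, start=1):
        a_with_bias = [1.0] + activations[-1]    # prepend a_0^(j) = 1
        if any(len(row) != len(a_with_bias) for row in theta):
            raise ValueError(f"Theta^({j}) must have {len(a_with_bias)} columns")
        # z^(j+1) = Theta^(j) a^(j), one dot product per row
        z = [sum(w * a for w, a in zip(row, a_with_bias)) for row in theta]
        activations.append([sigmoid(z_k) for z_k in z])   # a^(j+1) = g(z^(j+1))
    return activations

# Hypothetical weights for a 3 -> 3 -> 1 network.
theta1 = [[0.1, 0.2, -0.3, 0.4],
          [-0.5, 0.6, 0.7, -0.8],
          [0.9, -1.0, 1.1, 1.2]]
theta2 = [[-1.0, 2.0, -2.0, 1.0]]

layers = forward_propagate([theta1, theta2], [1.0, 0.0, 1.0])
print("hidden activations:", layers[1])
print("h(x):", layers[-1][0])
```

The last entry of the returned list holds a single number, the hypothesis, because the final matrix has only one row.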

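In effect, a neural network uses the activations of layer \(n-1\), \(a^{(n-1)}\), rather than the input layer, as the features for training a logistic regression. Choosing different parameters in \(\Theta^{(1)}\) can produce fairly complex features and hence a better hypothesis, which works better than using \(x_1, x_2, \dots, x_n\) directly as the training features.

architecture: the way the units of a neural network are connected

Weights \(\Theta\) corresponding to logical expressions: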

\({\rm AND} = (x_1 \bigwedge x_2)\):
- \(\Theta = \begin{bmatrix}-30 & 20 & 20 \end{bmatrix}\)
\({\rm NOR} = (\lnot x_1 \bigwedge \lnot x_2)\):
- \(\Theta = \begin{bmatrix}10 & -20 & -20 \end{bmatrix}\)
\({\rm OR} = (x_1 \bigvee x_2)\):
- \(\Theta = \begin{bmatrix}-10 & 20 & 20 \end{bmatrix}\)
\({\rm NOT} = (\lnot x)\):
- \(\Theta = \begin{bmatrix}10 & -20\end{bmatrix}\)
\({\rm XNOR} = (\lnot x_1 \bigwedge \lnot x_2) \bigvee ( x_1 \bigwedge x_2)\)
- needs one hidden layer: \(a_1^{(2)} == (\lnot x_1 \bigwedge \lnot x_2),\quad a_2^{(2)} == (x_1 \bigwedge x_2)\)
- output layer: \(a^{(3)} == (a_1^{(2)} \bigvee a_2^{(2)})\)
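The single-unit weight vectors can be checked numerically by printing their truth tables. This is a sketch: the helper `unit` and the dictionary `GATES` are names made up for the example. For NOT it assumes \(\Theta = [10, -20]\), which gives \(g(10) \approx 1\) for \(x = 0\) and \(g(-10) \approx 0\) for \(x = 1\).

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def unit(theta, inputs):
    """Single sigmoid unit: g(theta . [1, x_1, ..., x_n]), bias weight first."""
    x = [1.0] + list(inputs)
    return sigmoid(sum(w * v for w, v in zip(theta, x)))

GATES = {
    "AND": [-30.0, 20.0, 20.0],
    "OR": [-10.0, 20.0, 20.0],
    "NOR": [10.0, -20.0, -20.0],
}
NOT = [10.0, -20.0]  # assumption stated above: bias +10, weight -20

for name, theta in GATES.items():
    for x1 in (0, 1):
        for x2 in (0, 1):
            # round() snaps the near-0 / near-1 activation to a truth value
            print(name, x1, x2, round(unit(theta, (x1, x2))))

for x1 in (0, 1):
    print("NOT", x1, round(unit(NOT, (x1,))))
```

The large magnitudes (10, 20, 30) push \(z\) far from zero, so the sigmoid output lands very close to 0 or 1 for every binary input.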

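Implementing logical expressions:

Let \(x=\begin{bmatrix}1 \\ x_1\\x_2 \end{bmatrix}\); then \(a_i = g(\Theta_i x)\) gives the result of applying the logical operator corresponding to \(\Theta_i\) to \(x_1, x_2\).

For example, if \(\Theta_i = \begin{bmatrix}-10 & 20 & 20 \end{bmatrix}\), then \(a_i == x_1 \bigvee x_2\).

Complex logical expressions such as \({\rm XNOR}\) can only be computed with the help of a hidden layer.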
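As an illustration of why the hidden layer is needed, the sketch below wires XNOR from the NOR, AND, and OR weights: the hidden layer computes NOR and AND of the inputs, and the output unit ORs the two hidden activations. The function names are hypothetical.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def unit(theta, inputs):
    """Single sigmoid unit: g(theta . [1, x_1, ..., x_n]), bias weight first."""
    x = [1.0] + list(inputs)
    return sigmoid(sum(w * v for w, v in zip(theta, x)))

def xnor(x1, x2):
    # Hidden layer (layer 2)
    a1 = unit([10.0, -20.0, -20.0], (x1, x2))   # (NOT x1) AND (NOT x2)
    a2 = unit([-30.0, 20.0, 20.0], (x1, x2))    # x1 AND x2
    # Output layer (layer 3): a1 OR a2
    return unit([-10.0, 20.0, 20.0], (a1, a2))

for x1 in (0, 1):
    for x2 in (0, 1):
        print(x1, x2, round(xnor(x1, x2)))
```

The output is close to 1 exactly when both inputs agree, which no single sigmoid unit over \(x_1, x_2\) can produce because XNOR is not linearly separable.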

For multiclass classification:

Use \(y = \begin{bmatrix}1\\0\\0\\0 \end{bmatrix}, \begin{bmatrix}0\\1\\0\\0 \end{bmatrix}, \begin{bmatrix}0\\0\\1\\0 \end{bmatrix}, \begin{bmatrix}0\\0\\0\\1 \end{bmatrix}, \begin{bmatrix}0\\0\\0\\0 \end{bmatrix}\) to represent the different classes,