Linear classification

Classification means finding out which side of a boundary a point lies on, or finding the boundary that separates the dataset. This article is mainly about linear classification, which uses a single hyperplane to separate the dataset.

As for the signed distance y of a point from the hyperplane:
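One common way to write it, assuming the hyperplane is described by a point c lying on it and a normal vector v (the names match the Vector C and Vector V used later in this article, but the exact form is an assumption):

```latex
y(x) = v \cdot (x - c) = \sum_i v_i \, (x_i - c_i)
```

With this form, y is the exact signed distance when v has unit length and is scaled by the length of v otherwise. Its sign tells which side of the hyperplane the point x is on.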

Here comes the neuron:
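A minimal sketch of such a neuron, assuming it computes z = f(y) with y = v . (x - c). The function names `step`, `linear_step` and `neuron` are made up for this illustration, and the output range [-1, 1] is an assumption.

```python
import math

def step(y):
    """Hard threshold: +1 on one side of the hyperplane, -1 on the other."""
    return 1.0 if y >= 0 else -1.0

def linear_step(y):
    """Linear close to the hyperplane, clipped to [-1, 1] far from it."""
    return max(-1.0, min(1.0, y))

def neuron(v, c, x, f=math.tanh):
    """Output z = f(y), with y = v . (x - c) (assumed form of the signed distance)."""
    y = sum(vi * (xi - ci) for vi, xi, ci in zip(v, x, c))
    return f(y)
```

Whatever f is chosen, a point very far from the hyperplane still produces an output bounded by 1 in absolute value.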

The function f is a transfer function, used to keep the output from changing too much because of outliers or noise in y. f can be a step function, a linear step (clipped linear) function, or tanh. The error is usually defined as:

E = 0.5 * (z - t)^2, where z is the output of the neuron and t is the class of the point, so that the first-order derivative is:
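A sketch of that derivative via the chain rule, under the assumption that z = f(y) and y = \sum_i v_i (x_i - c_i), with c held fixed:

```latex
\frac{\partial E}{\partial v_i}
  = (z - t)\, f'(y)\, \frac{\partial y}{\partial v_i}
  = (z - t)\, f'(y)\, (x_i - c_i)
```

For f = tanh, f'(y) = 1 - z^2, so the gradient becomes very small for points far from the hyperplane.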

Update each vi as vi(t + 1) = vi(t) + ∆vi(t), where for plain gradient descent ∆vi(t) = -eta * ∂E/∂vi and eta is the learning rate.
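The update rule can be sketched as an online gradient-descent loop. This is an illustration rather than the code behind the results below: it assumes f = tanh, targets of +1 and -1, a fixed point c, and a stop as soon as one full pass makes no classification error. The names `signed_distance` and `train` are hypothetical.

```python
import math

def signed_distance(v, c, x):
    """y = v . (x - c), the assumed form of the signed distance."""
    return sum(vi * (xi - ci) for vi, xi, ci in zip(v, x, c))

def train(points, targets, c, v, eta=0.1, max_iter=1000):
    """Gradient descent on E = 0.5 * (z - t)^2 with z = tanh(y).

    Returns the learned vector v and the number of passes used.
    """
    v = list(v)
    for iteration in range(1, max_iter + 1):
        errors = 0
        for x, t in zip(points, targets):
            z = math.tanh(signed_distance(v, c, x))
            if (z >= 0) != (t > 0):
                errors += 1
            # dE/dv_i = (z - t) * (1 - z^2) * (x_i - c_i)
            common = (z - t) * (1.0 - z * z)
            for i in range(len(v)):
                v[i] -= eta * common * (x[i] - c[i])
        if errors == 0:
            return v, iteration
    return v, max_iter

points = [(1.0, 1.0), (2.0, 1.0), (-1.0, -1.0), (-2.0, -1.0)]
targets = [1, 1, -1, -1]
v, iterations = train(points, targets, c=[0.0, 0.0], v=[-0.5, 0.0], eta=0.5)
print("iteration=", iterations, "v=", v)
```

Here v only is updated, as in the rule above; moving c during training would need one more gradient term.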

Reference:

http://www.marmakoide.org/download/teaching/dm/dm-perceptron-1.pdf

Note that linear classification cannot separate data with a non-linear frontier, and it can also fail on multidimensional datasets whose classes are not linearly separable. For some datasets a better initialization is needed: set the vector C to the median of the data and the vector V to a unit vector. For data that a single hyperplane cannot separate, use an MLP (multilayer perceptron) to get a better separation.
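The median initialization can be sketched as follows. The source does not say which unit vector to use for V, so the equal-component vector here is an assumption, and `init_hyperplane` is a hypothetical helper name.

```python
import math
import statistics

def init_hyperplane(points):
    """Start the hyperplane at the per-coordinate median of the data."""
    dim = len(points[0])
    # C: median of each coordinate, so the hyperplane starts inside the data.
    c = [statistics.median(p[i] for p in points) for i in range(dim)]
    # V: a unit vector with equal components (an arbitrary choice).
    v = [1.0 / math.sqrt(dim)] * dim
    return c, v

points = [(float(i), float(i)) for i in range(1, 201)]
c, v = init_hyperplane(points)
print(c, v)
```

Starting from the median keeps the signed distances small at the beginning, which matters with tanh because its gradient vanishes for points far from the hyperplane.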


result:

eta= 0.5 iteration=3
0.07076422200317212 0.08188117302023934
eta= 0.01 iteration=2
-0.6969402564832946 0.7171348353723
eta= 0.001 iteration=255
-0.6648463748271787 0.6683932978172582
test 2: ==========================
no Optimization:
too much time 。。。
use median to do the optimization
100.5 100.5
eta= 0.5 iteration=6
eta= 0.01 iteration=115
100.49434892035956 100.50567408374293
test 3: ==========================
seperate the dataset into 30% testing part and 70% training part,
using the percentage to determine when to stop the learning
iteration= 300
percentage: 0.88 test: 0.9083333333333333 train: 0.8678571428571429
-353.0333900061246 -763.5366403901546 -251.38913455734232
Vector write with cluster_id finished
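The procedure of test 3 could look roughly like the sketch below: split the data 70/30, train on the larger part, and stop once the training accuracy reaches a chosen percentage or the iteration limit is hit. The defaults `target=0.88` and `max_iter=300` echo the numbers printed above, but how the original program used them is an assumption. All function names are hypothetical.

```python
import math
import random

def split_dataset(samples, train_fraction=0.7, seed=0):
    """Shuffle a copy of the samples and cut it into train and test parts."""
    rng = random.Random(seed)
    shuffled = samples[:]
    rng.shuffle(shuffled)
    cut = int(round(train_fraction * len(shuffled)))
    return shuffled[:cut], shuffled[cut:]

def predict(v, c, x):
    """Class +1 or -1 from the sign of y = v . (x - c)."""
    y = sum(vi * (xi - ci) for vi, xi, ci in zip(v, x, c))
    return 1 if y >= 0 else -1

def accuracy(v, c, samples):
    """Fraction of (x, t) samples whose predicted class equals t."""
    hits = sum(1 for x, t in samples if predict(v, c, x) == t)
    return hits / len(samples)

def train_until(train, c, v, eta=0.01, target=0.88, max_iter=300):
    """Gradient descent with tanh; stop when train accuracy >= target."""
    v = list(v)
    for iteration in range(1, max_iter + 1):
        for x, t in train:
            z = math.tanh(sum(vi * (xi - ci) for vi, xi, ci in zip(v, x, c)))
            common = (z - t) * (1.0 - z * z)
            for i in range(len(v)):
                v[i] -= eta * common * (x[i] - c[i])
        if accuracy(v, c, train) >= target:
            return v, iteration
    return v, max_iter

# Toy data: two classes on opposite sides of the origin.
samples = [((float(i), float(i + 1)), 1) for i in range(1, 51)]
samples += [((float(-i), float(-i - 1)), -1) for i in range(1, 51)]
train_part, test_part = split_dataset(samples)
v, iterations = train_until(train_part, c=[0.0, 0.0], v=[1.0, 0.0])
print("iteration=", iterations,
      "test:", accuracy(v, [0.0, 0.0], test_part),
      "train:", accuracy(v, [0.0, 0.0], train_part))
```

Stopping on an accuracy threshold instead of zero errors lets the loop end on data that a hyperplane cannot separate perfectly.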

Some samples and their linear classification: