SVM

http://www.blogjava.net/zhenandaci/archive/2009/02/13/254519.html Introduction to SVM (Parts 1 to 3), Refresh

1. Non-numeric variables (categorical features)

　　Use m numbers to represent an m-category attribute. For example, a three-category attribute such as {red, green, blue} can be represented as (0,0,1), (0,1,0) and (1,0,0).
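
A minimal sketch of this encoding in plain Python. The helper name `one_hot` and the category order in `CATEGORIES` are assumptions made for illustration; the order is chosen so that red, green and blue map to the three vectors listed above.

```python
# Hypothetical category order, chosen so the output matches the
# (0,0,1), (0,1,0), (1,0,0) vectors given for red, green, blue.
CATEGORIES = ["blue", "green", "red"]


def one_hot(value, categories=CATEGORIES):
    """Encode one categorical value as a tuple of m numbers (m = len(categories)).

    Exactly one position is 1 (the position of `value`); the rest are 0.
    """
    if value not in categories:
        raise ValueError(f"unknown category: {value!r}")
    return tuple(1 if c == value else 0 for c in categories)


if __name__ == "__main__":
    for colour in ("red", "green", "blue"):
        print(colour, one_hot(colour))
```

Each encoded attribute then contributes m numeric features to the input vector instead of one non-numeric value.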

2. Scaling

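Some variables have a large range of values and others a small one. The large-range variables can then dominate the result, so every variable should be scaled to the range [-1,1].

In addition, the same scaling must be applied in training and in prediction: the scaling parameters are taken from the training data and reused unchanged on the data to be predicted.

For example, a variable ranges over [-10,10] in the training set and becomes [-1,1] after scaling. If its range in the prediction data set is [-11,8], it should be scaled to [-1.1,0.8].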
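
The example above can be reproduced with a short sketch. It assumes plain linear min-max scaling; the function names `fit_range` and `scale` are hypothetical and not taken from any particular SVM package.

```python
def fit_range(train_values):
    """Record the minimum and maximum of a variable on the TRAINING data only."""
    return min(train_values), max(train_values)


def scale(x, lo, hi, lower=-1.0, upper=1.0):
    """Map x linearly so that lo -> lower and hi -> upper.

    lo and hi must come from the training data. Values outside [lo, hi]
    are not clipped, so they land outside [lower, upper].
    """
    if hi == lo:
        raise ValueError("constant variable: cannot scale")
    return lower + (upper - lower) * (x - lo) / (hi - lo)


if __name__ == "__main__":
    lo, hi = fit_range([-10, -3, 0, 4, 10])   # training range is [-10, 10]
    print(scale(-10, lo, hi), scale(10, lo, hi))
    # Prediction data with range [-11, 8], scaled with the training parameters.
    print(scale(-11, lo, hi), scale(8, lo, hi))
```

Reusing `lo` and `hi` from training is what makes the prediction range come out as roughly [-1.1, 0.8] rather than being stretched to [-1, 1] again.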

3. What are precision and recall

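 precision = number of truly positive opinions that were identified / number of all items identified as positive opinions
 recall    = number of truly positive opinions that were identified / number of all truly positive opinions in the sample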
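
The two ratios can be computed as in the sketch below. The function name `precision_recall`, the use of 1 as the positive label, and the sample labels are all assumptions for illustration; returning 0.0 when a denominator is zero is also just one possible convention.

```python
def precision_recall(y_true, y_pred, positive=1):
    """Return (precision, recall) for the class labelled `positive`.

    y_true holds the real labels, y_pred the labels assigned by the classifier.
    """
    # Truly positive items that were also identified as positive.
    true_pos = sum(1 for t, p in zip(y_true, y_pred)
                   if t == positive and p == positive)
    # Everything the classifier identified as positive.
    predicted_pos = sum(1 for p in y_pred if p == positive)
    # Everything that really is positive in the sample.
    actual_pos = sum(1 for t in y_true if t == positive)

    precision = true_pos / predicted_pos if predicted_pos else 0.0
    recall = true_pos / actual_pos if actual_pos else 0.0
    return precision, recall


if __name__ == "__main__":
    y_true = [1, 1, 1, 1, 0, 0]   # made-up labels: 4 truly positive items
    y_pred = [1, 1, 0, 0, 1, 0]   # 3 identified as positive, 2 of them correct
    print(precision_recall(y_true, y_pred))
```

In this made-up sample, 2 of the 3 items identified as positive are correct (precision 2/3), and 2 of the 4 truly positive items are found (recall 1/2).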