Basic Data Processing Methods

Filling in Missing (NaN) Values

Imputer

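    # Imputer was removed in scikit-learn 0.22 (use sklearn.impute.SimpleImputer there).
    from sklearn.preprocessing import Imputer
    # Replace each NaN with the mean of its column (axis=0).
    imp = Imputer(missing_values='NaN', strategy='mean', axis=0)
    imp.fit(X_train)  # learn the column means from the training set only
    X_train = imp.transform(X_train)
    X_test = imp.transform(X_test)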
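
On scikit-learn 0.22 and later, `Imputer` no longer exists; the same column-mean fill can be sketched with `SimpleImputer`. This is a minimal sketch, not the original post's code: the toy arrays and the names `X_train_filled` and `X_test_filled` are made up for illustration.

```python
import numpy as np
from sklearn.impute import SimpleImputer

# Toy data (made up for illustration); np.nan marks the missing entries.
X_train = np.array([[1.0, 2.0], [np.nan, 4.0], [3.0, np.nan]])
X_test = np.array([[np.nan, 1.0], [5.0, np.nan]])

# SimpleImputer always works column-wise, so there is no axis argument.
imp = SimpleImputer(missing_values=np.nan, strategy="mean")

# Learn the column means from the training set only, then reuse them on the test set.
X_train_filled = imp.fit_transform(X_train)
X_test_filled = imp.transform(X_test)

print(imp.statistics_)  # the per-column means learned from X_train
print(X_test_filled)
```

Fitting on the training set only keeps information from the test set out of the fill values.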

Data Normalization

Normalize and min-max scaling:
http://www.tuicool.com/articles/JzMjeyi
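
As a rough sketch of the two ideas named above, assuming scikit-learn's `MinMaxScaler` and `normalize` are what is meant (the linked article may use different tools), with made-up toy data:

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler, normalize

# Toy data (made up for illustration).
X_train = np.array([[1.0, 10.0], [2.0, 20.0], [3.0, 30.0]])
X_test = np.array([[2.0, 40.0]])

# Min-max scaling: map each column of the training set onto [0, 1].
scaler = MinMaxScaler()
X_train_scaled = scaler.fit_transform(X_train)
# The test set reuses the training min/max, so its values can fall outside [0, 1].
X_test_scaled = scaler.transform(X_test)

# normalize works per row instead: each sample is rescaled to unit L2 norm.
X_train_unit = normalize(X_train, norm="l2")

print(X_train_scaled)
print(X_test_scaled)
print(X_train_unit)
```

Min-max scaling acts on columns (features), while `normalize` acts on rows (samples), so the two are not interchangeable.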

Splitting the Dataset

cross_validation & metrics AUC
http://blog.csdn.net/u010414589/article/details/51166798
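
A minimal sketch of a split plus AUC scoring. The `sklearn.cross_validation` module was removed in scikit-learn 0.20, so this sketch uses `sklearn.model_selection` instead. The synthetic dataset and the logistic regression model are assumptions chosen only for illustration.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import cross_val_score, train_test_split

# Synthetic binary classification data (made up for illustration).
X, y = make_classification(n_samples=200, n_features=10, random_state=0)

# Hold out 25% of the samples as a test set, keeping the class ratio similar.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

clf = LogisticRegression(max_iter=1000)
clf.fit(X_train, y_train)

# AUC is computed from predicted scores/probabilities, not from hard labels.
test_auc = roc_auc_score(y_test, clf.predict_proba(X_test)[:, 1])

# 5-fold cross-validated AUC on the training set.
cv_auc = cross_val_score(clf, X_train, y_train, cv=5, scoring="roc_auc")

print(test_auc)
print(cv_auc.mean())
```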

Dimensionality Reduction with PCA

http://doc.okbase.net/u012162613/archive/120946.html
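
A minimal sketch of PCA with scikit-learn, assuming `sklearn.decomposition.PCA` is the tool in question. The random data and the choice of 2 components are made up for illustration.

```python
import numpy as np
from sklearn.decomposition import PCA

# Random toy data (made up for illustration): 5 features per sample.
rng = np.random.RandomState(0)
X_train = rng.normal(size=(100, 5))
X_test = rng.normal(size=(20, 5))

# Keep the 2 directions of largest variance, learned from the training set.
pca = PCA(n_components=2)
X_train_2d = pca.fit_transform(X_train)
# Project the test set onto the same components.
X_test_2d = pca.transform(X_test)

# Fraction of the total variance captured by each kept component.
explained = pca.explained_variance_ratio_

print(X_train_2d.shape, X_test_2d.shape)
print(explained)
```

Checking `explained_variance_ratio_` is one way to judge how many components are worth keeping.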

A Few Lessons from Kaggle

This one is very good; it introduces many practical models:
http://www.cnblogs.com/DjangoBlog/p/6648035.html

Original article: https://www.cnblogs.com/shenbingyu/p/6659663.html