Introduction to scikit-learn

scikit-learn is a widely used machine learning library for Python, applied in data mining and data science. It is built on NumPy, SciPy, and matplotlib.

It provides six main areas of functionality:

1. Classification: Identifying which category an object belongs to.

Includes support vector machine classification (SVM), decision trees, nearest neighbors, naive Bayes, and more.
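A minimal sketch of how the classifiers listed above are used, assuming the bundled iris dataset and default or illustrative hyperparameters (the 70/30 split, `n_neighbors=5` and `random_state` values are arbitrary choices, not recommendations). Every scikit-learn estimator follows the same `fit` / `predict` / `score` pattern, so swapping one classifier for another only changes the constructor:

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

# Bundled example data: 150 iris flowers, 4 features, 3 classes
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)

# Illustrative settings; each estimator exposes the same fit/score interface
classifiers = {
    "SVM": SVC(),
    "decision tree": DecisionTreeClassifier(random_state=0),
    "nearest neighbors": KNeighborsClassifier(n_neighbors=5),
    "naive Bayes": GaussianNB(),
}

scores = {}
for name, clf in classifiers.items():
    clf.fit(X_train, y_train)                  # learn from the training split
    scores[name] = clf.score(X_test, y_test)   # accuracy on the held-out split
    print(f"{name}: {scores[name]:.3f}")
```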

2. Regression: Predicting a continuous-valued attribute associated with an object.

Includes linear regression, polynomial regression, and more.
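scikit-learn has no dedicated polynomial-regression estimator; a common approach is to expand the inputs with `PolynomialFeatures` and fit a `LinearRegression` on the expanded features. The sketch below assumes synthetic quadratic data (the coefficients and noise level are made up for illustration) and compares a plain linear fit with a degree-2 polynomial fit by R²:

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

# Synthetic data (assumption): y = 0.5*x^2 + x + 2 plus Gaussian noise
rng = np.random.default_rng(0)
X = np.linspace(-3, 3, 50).reshape(-1, 1)  # scikit-learn expects a 2-D feature matrix
y = 0.5 * X[:, 0] ** 2 + X[:, 0] + 2 + rng.normal(scale=0.3, size=50)

# Plain linear regression: fits a straight line
linear = LinearRegression().fit(X, y)

# Polynomial regression: expand x into [1, x, x^2], then fit a linear model on those features
poly = make_pipeline(PolynomialFeatures(degree=2), LinearRegression()).fit(X, y)

linear_r2 = linear.score(X, y)  # R^2 on the training data
poly_r2 = poly.score(X, y)
pred_at_1 = poly.predict(np.array([[1.0]]))[0]
print(f"linear R^2: {linear_r2:.3f}, polynomial R^2: {poly_r2:.3f}")
print("polynomial prediction at x=1:", pred_at_1)
```

The straight line cannot follow the curvature, so its R² is noticeably lower than the polynomial model's.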

3. Clustering: Automatic grouping of similar objects into sets.

Includes k-means, spectral clustering, and more.
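A sketch of k-means and spectral clustering on synthetic data, assuming three Gaussian blobs generated with `make_blobs` (sample count, spread and seeds are illustrative). Since clustering is unsupervised, `fit_predict` only receives `X`; the known generating labels are used afterwards solely to measure agreement with `adjusted_rand_score`:

```python
from sklearn.cluster import KMeans, SpectralClustering
from sklearn.datasets import make_blobs
from sklearn.metrics import adjusted_rand_score

# Synthetic data (assumption): 300 points drawn around 3 centers
X, y_true = make_blobs(n_samples=300, centers=3, cluster_std=0.6, random_state=0)

# k-means: iteratively assigns points to the nearest of k centroids
kmeans_labels = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(X)

# Spectral clustering: clusters an embedding of a pairwise-similarity graph (RBF kernel by default)
spectral_labels = SpectralClustering(n_clusters=3, random_state=0).fit_predict(X)

# Adjusted Rand index: 1.0 means identical partitions, regardless of label numbering
print("k-means ARI:", adjusted_rand_score(y_true, kmeans_labels))
print("spectral ARI:", adjusted_rand_score(y_true, spectral_labels))
```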

4. Dimensionality reduction: Reducing the number of random variables to consider.

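Includes principal component analysis (PCA), independent component analysis (ICA), topic models such as latent Dirichlet allocation (LDA), and more.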
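A minimal sketch of the three decomposition methods, assuming the iris features for PCA and ICA and a small random count matrix standing in for a document-term matrix for LDA. In scikit-learn the topic model is `LatentDirichletAllocation` in `sklearn.decomposition`, which is distinct from `LinearDiscriminantAnalysis` despite the shared abbreviation; the topic model expects non-negative counts:

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA, FastICA, LatentDirichletAllocation

X, _ = load_iris(return_X_y=True)  # 150 samples x 4 features

# PCA: project onto the 2 orthogonal directions of largest variance
pca = PCA(n_components=2)
X_pca = pca.fit_transform(X)
print("PCA explained variance ratio:", pca.explained_variance_ratio_)

# ICA: estimate 2 statistically independent components (FastICA algorithm)
X_ica = FastICA(n_components=2, max_iter=1000, random_state=0).fit_transform(X)

# LDA topic model: needs non-negative counts, e.g. a document-term matrix.
# Random counts stand in for 20 documents over a 30-word vocabulary (assumption).
rng = np.random.default_rng(0)
counts = rng.integers(0, 5, size=(20, 30))
lda = LatentDirichletAllocation(n_components=3, random_state=0)
doc_topics = lda.fit_transform(counts)  # each row: one document's topic proportions

print(X_pca.shape, X_ica.shape, doc_topics.shape)
```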

5. Model selection: Comparing, validating and choosing parameters and models.

Includes cross-validation, model evaluation, model selection, hyperparameter tuning, and more.
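A sketch of cross-validation and grid-search tuning, assuming an `SVC` on the iris data; the parameter grid is an illustrative choice. `cross_val_score` estimates generalization by scoring the model on several held-out folds, and `GridSearchCV` runs that same cross-validation for every parameter combination and keeps the best one:

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV, cross_val_score
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)

# Cross-validation: 5 train/validate splits, one accuracy score per fold
cv_scores = cross_val_score(SVC(), X, y, cv=5)
print("fold scores:", cv_scores, "mean:", cv_scores.mean())

# Hyperparameter tuning: evaluate every combination in the grid with 5-fold CV
param_grid = {"C": [0.1, 1, 10], "kernel": ["linear", "rbf"]}  # illustrative grid
search = GridSearchCV(SVC(), param_grid, cv=5)
search.fit(X, y)
print("best params:", search.best_params_)
print("best mean CV accuracy:", search.best_score_)
```

Because the default `SVC()` settings (`C=1`, `kernel="rbf"`) are part of the grid and both calls use the same folds, the tuned score can only match or beat the untuned one.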

6. Preprocessing: Feature extraction and normalization.

Used for data standardization, normalization, binarization, and more.
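A small sketch contrasting these transformers on a toy 3×3 matrix (the values are arbitrary). "Normalization" is used both for per-feature min-max scaling and for scikit-learn's per-sample `Normalizer`, so both appear: `StandardScaler` and `MinMaxScaler` work column by column (per feature), `Normalizer` rescales each row (per sample) to unit norm, and `Binarizer` maps values above the threshold to 1 and the rest to 0:

```python
import numpy as np
from sklearn.preprocessing import Binarizer, MinMaxScaler, Normalizer, StandardScaler

X = np.array([[1.0, -1.0, 2.0],
              [2.0, 0.0, 0.0],
              [0.0, 1.0, -1.0]])  # 3 samples (rows) x 3 features (columns)

# Standardization: each column rescaled to mean 0, standard deviation 1
X_std = StandardScaler().fit_transform(X)

# Min-max scaling: each column mapped linearly onto [0, 1]
X_minmax = MinMaxScaler().fit_transform(X)

# Normalizer: each row divided by its L2 norm, so every sample has length 1
X_norm = Normalizer(norm="l2").fit_transform(X)

# Binarization: values > threshold become 1, all others become 0
X_bin = Binarizer(threshold=0.0).fit_transform(X)

print(X_std, X_minmax, X_norm, X_bin, sep="\n\n")
```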