Making Predictions with a CART Regression Tree

Original article; when reposting, please credit the source: https://www.cnblogs.com/agilestyle/p/12719231.html

Preparing the data

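This example uses the Boston house-price dataset that ships with sklearn. For each house the dataset gives a number of indicators that affect its price, such as the per-capita crime rate and the property-tax rate, followed by the price itself (the median home value, in thousands of dollars). Using these indicators, we train a CART regression tree to predict Boston house prices.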
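For reference, here is the standard least-squares formulation of a CART regression split, written in textbook notation rather than taken from the original post. At each node the tree considers every feature j and threshold s, which divide the samples into R_1(j,s) = {x | x^(j) <= s} and R_2(j,s) = {x | x^(j) > s}, and keeps the pair that minimizes the total squared error of the two children:

$$
\min_{j,\,s}\Big[\min_{c_1}\sum_{x_i \in R_1(j,s)} (y_i - c_1)^2 \;+\; \min_{c_2}\sum_{x_i \in R_2(j,s)} (y_i - c_2)^2\Big],
\qquad
\hat{c}_m = \frac{1}{|R_m(j,s)|}\sum_{x_i \in R_m(j,s)} y_i
$$

The inner minimizers are the region means, which is also the value a leaf predicts. Minimizing this sum is equivalent to minimizing the sample-weighted variance of the children, which is what scikit-learn's default squared-error criterion for DecisionTreeRegressor does.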

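# Note: load_boston was deprecated in scikit-learn 1.0 and removed in 1.2,
# so this code requires scikit-learn < 1.2.
from sklearn.datasets import load_boston
from sklearn.metrics import mean_squared_error, mean_absolute_error
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeRegressor

boston = load_boston()

features = boston.data    # 13 indicator columns per house
labels = boston.target    # house prices (the regression target)

print(features.shape)  # (506, 13)
print(labels.shape)    # (506,)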

Splitting into training and test sets

# Hold out 33% of the samples as the test set; random_state=0 fixes the shuffle
X_train, X_test, y_train, y_test = train_test_split(features, labels, test_size=0.33, random_state=0)
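With test_size=0.33, scikit-learn rounds the test count up, so the 506 samples become 167 test rows (0.33 × 506 = 166.98) and 339 training rows. The sketch below reproduces that sizing rule with NumPy. manual_train_test_split is a hypothetical helper, and it shuffles with a different random generator than scikit-learn, so the rows it picks will not match train_test_split(..., random_state=0).

```python
import math

import numpy as np


def manual_train_test_split(X, y, test_size=0.33, seed=0):
    """Hypothetical helper: shuffle row indices, then slice off a test set.

    Uses the same sizing rule as sklearn's train_test_split for a float
    test_size (test count rounded up), but NOT the same shuffling.
    """
    n_samples = len(X)
    n_test = math.ceil(test_size * n_samples)
    rng = np.random.default_rng(seed)
    indices = rng.permutation(n_samples)
    test_idx, train_idx = indices[:n_test], indices[n_test:]
    return X[train_idx], X[test_idx], y[train_idx], y[test_idx]


# Placeholder arrays with the Boston dataset's shape
X = np.zeros((506, 13))
y = np.arange(506, dtype=float)
X_train, X_test, y_train, y_test = manual_train_test_split(X, y)
print(X_train.shape, X_test.shape)  # (339, 13) (167, 13)
```

Every row lands in exactly one of the two sets; only the order and the assignment depend on the seed.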

Building and training the model

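# Default parameters: squared-error splitting, tree grown until leaves are pure (max_depth=None)
dtr = DecisionTreeRegressor()
dtr.fit(X_train, y_train)
# fit() returns the estimator; older scikit-learn releases echo it in a notebook as:
# DecisionTreeRegressor(criterion='mse', max_depth=None, max_features=None,
#                       max_leaf_nodes=None, min_impurity_decrease=0.0,
#                       min_impurity_split=None, min_samples_leaf=1,
#                       min_samples_split=2, min_weight_fraction_leaf=0.0,
#                       presort=False, random_state=None, splitter='best')
# Newer releases call the criterion 'squared_error' and no longer have the
# presort and min_impurity_split parameters.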
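fit() grows the tree by repeatedly choosing the split that most reduces squared error, and with the default max_depth=None it keeps splitting until the leaves are pure or too small to split. The following is a minimal NumPy sketch of that idea, not scikit-learn's implementation: the names best_split, build_tree and predict are hypothetical, splits are found by brute force, and refinements such as random feature permutation, sample weights and pruning are left out.

```python
import numpy as np


def best_split(X, y, min_samples_leaf=1):
    """Return (feature, threshold, sse) for the split with the lowest children SSE."""
    best = (None, None, np.inf)
    n_samples, n_features = X.shape
    for j in range(n_features):
        order = np.argsort(X[:, j])
        xs, ys = X[order, j], y[order]
        for i in range(min_samples_leaf, n_samples - min_samples_leaf + 1):
            if xs[i - 1] == xs[i]:
                continue  # can only cut between two distinct feature values
            left, right = ys[:i], ys[i:]
            sse = ((left - left.mean()) ** 2).sum() + ((right - right.mean()) ** 2).sum()
            if sse < best[2]:
                best = (j, (xs[i - 1] + xs[i]) / 2.0, sse)
    return best


def build_tree(X, y, depth=0, max_depth=3, min_samples_split=2):
    """Recursively grow a regression tree; each leaf stores the mean of its targets."""
    if depth >= max_depth or len(y) < min_samples_split or np.all(y == y[0]):
        return {"value": float(y.mean())}
    feature, threshold, _ = best_split(X, y)
    if feature is None:  # every feature is constant here, so no split exists
        return {"value": float(y.mean())}
    mask = X[:, feature] <= threshold
    return {
        "feature": feature,
        "threshold": float(threshold),
        "left": build_tree(X[mask], y[mask], depth + 1, max_depth, min_samples_split),
        "right": build_tree(X[~mask], y[~mask], depth + 1, max_depth, min_samples_split),
    }


def predict(tree, X):
    """Route each row down the tree and return the leaf means."""
    out = []
    for x in X:
        node = tree
        while "value" not in node:
            node = node["left"] if x[node["feature"]] <= node["threshold"] else node["right"]
        out.append(node["value"])
    return np.array(out)


# Synthetic data: the target jumps from 10 to 20 when feature 0 crosses 5
rng = np.random.default_rng(0)
X_demo = rng.uniform(0, 10, size=(200, 2))
y_demo = np.where(X_demo[:, 0] < 5, 10.0, 20.0)
tree = build_tree(X_demo, y_demo, max_depth=2)
print(tree["feature"])  # 0
```

The root split lands on feature 0 near 5, both children are pure, and so the tree reproduces the training targets exactly. A fully grown tree fitting its training data this closely is also why an unpruned tree tends to overfit.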

Evaluating the model

predict_price = dtr.predict(X_test)

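print('Regression tree mean squared error:', mean_squared_error(y_test, predict_price))
print('Regression tree mean absolute error:', mean_absolute_error(y_test, predict_price))

Output (results may vary between runs: without a fixed random_state, DecisionTreeRegressor picks at random among splits that improve the criterion equally; pass random_state=0 for reproducible results):

Regression tree mean squared error: 24.67646706586826
Regression tree mean absolute error: 3.1670658682634736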
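Both metrics are averages of the test-set residuals (true price minus predicted price): MSE averages their squares and MAE their absolute values. Because the target is in thousands of dollars, an MAE of about 3.17 means the predictions are off by roughly 3,170 dollars on average, and the square root of the MSE (about 4.97) is also in thousands of dollars. A minimal NumPy sketch of the two formulas follows; the function names mse and mae are hypothetical, and for unweighted 1-D inputs sklearn's mean_squared_error and mean_absolute_error compute the same quantities.

```python
import numpy as np


def mse(y_true, y_pred):
    """Mean squared error: the average of the squared residuals."""
    y_true, y_pred = np.asarray(y_true, dtype=float), np.asarray(y_pred, dtype=float)
    return float(np.mean((y_true - y_pred) ** 2))


def mae(y_true, y_pred):
    """Mean absolute error: the average of the absolute residuals."""
    y_true, y_pred = np.asarray(y_true, dtype=float), np.asarray(y_pred, dtype=float)
    return float(np.mean(np.abs(y_true - y_pred)))


y_true = [3.0, -0.5, 2.0, 7.0]
y_pred = [2.5, 0.0, 2.0, 8.0]
print(mse(y_true, y_pred))  # 0.375
print(mae(y_true, y_pred))  # 0.5
```

Squaring makes MSE weight the single 1.0 miss four times as heavily as each 0.5 miss, which is why it is more sensitive to large errors than MAE.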

Visualizing the decision tree

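from sklearn.tree import export_graphviz

# Write the tree in Graphviz DOT format. With out_file set, export_graphviz
# writes to the open file and returns None, so its result is not assigned.
with open('boston.dot', 'w') as f:
    export_graphviz(dtr, out_file=f, feature_names=boston.feature_names)

# Render it with the Graphviz command-line tool, e.g.:
#   dot -Tpng boston.dot -o boston.png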
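The boston.dot file is plain text in Graphviz's DOT language: one statement per node (its split condition or leaf value) and one "parent -> child" statement per edge. To show the shape of that format, here is a simplified, hypothetical serializer for a tree stored as nested dicts. scikit-learn's real output puts more into each label (impurity, sample count, mean value) and adds edge attributes, so treat this only as an illustration.

```python
def tree_to_dot(tree):
    """Serialize a hypothetical nested-dict tree into Graphviz DOT text."""
    lines = ["digraph Tree {", "node [shape=box] ;"]
    counter = [0]  # next free node id, shared by the recursive calls

    def visit(node):
        node_id = counter[0]
        counter[0] += 1
        if "value" in node:  # leaf: show the predicted value
            lines.append(f'{node_id} [label="value = {node["value"]}"] ;')
            return node_id
        # internal node: show the split condition, then link both children
        lines.append(f'{node_id} [label="X[{node["feature"]}] <= {node["threshold"]}"] ;')
        left_id = visit(node["left"])
        lines.append(f"{node_id} -> {left_id} ;")
        right_id = visit(node["right"])
        lines.append(f"{node_id} -> {right_id} ;")
        return node_id

    visit(tree)
    lines.append("}")
    return "\n".join(lines)


tree = {"feature": 0, "threshold": 5.0,
        "left": {"value": 10.0}, "right": {"value": 20.0}}
print(tree_to_dot(tree))
```

Nodes are numbered in depth-first order, as in scikit-learn's export, and the resulting text can be rendered with the same dot command.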

Rendering the regression tree produces the figure below (the Boston dataset has quite a few indicators, so the tree is fairly large):

Reference

https://time.geekbang.org/column/article/78659
