Linear Regression: The Coefficient of Determination

1. Sum Of Squares Due To Error 

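For the i-th observation, the difference between the observed value $Y_i$ and the estimated value $\hat{Y}_i$ is called the i-th residual. SSE is the sum of the squared residuals over all observations:

$$\mathrm{SSE} = \sum_{i=1}^{n} (Y_i - \hat{Y}_i)^2$$

2. Total Sum Of Squares

SST measures the total variation of the observed values around their mean $\bar{Y}$:

$$\mathrm{SST} = \sum_{i=1}^{n} (Y_i - \bar{Y})^2$$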

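3. Sum Of Squares Due To Regression

SSR measures how far the estimated values lie from the mean $\bar{Y}$ of the observed values:

$$\mathrm{SSR} = \sum_{i=1}^{n} (\hat{Y}_i - \bar{Y})^2$$

From the definitions above, the three quantities are related as follows for a linear least-squares fit:

$$\mathrm{SST} = \mathrm{SSR} + \mathrm{SSE}$$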
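The relation SST = SSR + SSE can be checked numerically. The following is a minimal sketch, assuming a one-predictor ordinary least squares fit with an intercept; the helper names `fit_line` and `sums_of_squares` and the sample data are made up for illustration.

```python
def fit_line(x, y):
    """Ordinary least squares for y = b0 + b1 * x (with an intercept)."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar
    return b0, b1


def sums_of_squares(y, y_hat):
    """Return (SSE, SST, SSR) for observed y and fitted values y_hat."""
    y_bar = sum(y) / len(y)
    sse = sum((yi - fi) ** 2 for yi, fi in zip(y, y_hat))  # error
    sst = sum((yi - y_bar) ** 2 for yi in y)               # total
    ssr = sum((fi - y_bar) ** 2 for fi in y_hat)           # regression
    return sse, sst, ssr


# Made-up sample data, for illustration only.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]

b0, b1 = fit_line(x, y)
y_hat = [b0 + b1 * xi for xi in x]
sse, sst, ssr = sums_of_squares(y, y_hat)

print(f"SSE={sse:.4f}  SSR={ssr:.4f}  SST={sst:.4f}")
print("SSR + SSE equals SST:", abs((ssr + sse) - sst) < 1e-9)
```

The comparison uses a small tolerance because the sums are computed in floating point.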

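Coefficient of determination: measures how well the regression equation fits the data. With SSE, SST and SSR denoting the three sums of squares above:

$$R^2 = \frac{\mathrm{SSR}}{\mathrm{SST}} = 1 - \frac{\mathrm{SSE}}{\mathrm{SST}}$$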


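(Coefficient of determination) In other words, the coefficient of determination states what percentage of the variation in the dependent variable can be explained by the independent variable through the regression equation. It is a measure of goodness of fit.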
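As a rough illustration of the "percentage explained" reading, the sketch below computes R-squared as 1 - SSE / SST for three sets of predictions. The function name `r_squared` and the data are hypothetical; the `fitted` values are assumed to come from the least-squares line 0.05 + 1.99 x for x = 1..5.

```python
def r_squared(y, y_hat):
    """R^2 = 1 - SSE / SST for observed y and predicted y_hat."""
    y_bar = sum(y) / len(y)
    sse = sum((yi - fi) ** 2 for yi, fi in zip(y, y_hat))
    sst = sum((yi - y_bar) ** 2 for yi in y)
    return 1.0 - sse / sst


# Made-up observations, for illustration only.
y = [2.1, 3.9, 6.2, 7.8, 10.1]

perfect = list(y)                       # predictions equal the observations
mean_only = [sum(y) / len(y)] * len(y)  # always predict the mean of y
fitted = [0.05 + 1.99 * xi for xi in (1.0, 2.0, 3.0, 4.0, 5.0)]

r2_perfect = r_squared(y, perfect)
r2_mean = r_squared(y, mean_only)
r2_fitted = r_squared(y, fitted)

print(f"perfect predictions: {r2_perfect:.2%} of the variation explained")
print(f"mean-only predictions: {r2_mean:.2%} of the variation explained")
print(f"fitted line: {r2_fitted:.2%} of the variation explained")
```

Predicting the mean everywhere explains none of the variation, and predictions that reproduce the data explain all of it; a fitted line falls in between.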


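(Correlation coefficient) The correlation coefficient measures how strong the linear relationship between the dependent variable and the independent variable is. In other words, it measures how closely the dependent variable follows a straight line as the independent variable changes. It does not measure how large the change is; that is given by the slope of the regression line.

Its sign shows whether the correlation is positive or negative.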
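A small sketch of the Pearson correlation coefficient may help here. The function name `pearson_r` and the data are made up for illustration. The example shows that the sign follows the direction of the relationship, while the steepness of the line does not change the value.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    syy = sum((yi - y_bar) ** 2 for yi in y)
    return sxy / math.sqrt(sxx * syy)


x = [1.0, 2.0, 3.0, 4.0, 5.0]
steep_up = [10.0 * xi for xi in x]      # exact line, large positive slope
shallow_up = [0.1 * xi for xi in x]     # exact line, small positive slope
down = [-2.0 * xi + 1.0 for xi in x]    # exact line, negative slope
noisy_up = [2.1, 3.9, 6.2, 7.8, 10.1]   # roughly linear, made-up values

r_steep = pearson_r(x, steep_up)
r_shallow = pearson_r(x, shallow_up)
r_down = pearson_r(x, down)
r_noisy = pearson_r(x, noisy_up)

print(f"steep_up:   r = {r_steep:+.4f}")
print(f"shallow_up: r = {r_shallow:+.4f}")
print(f"down:       r = {r_down:+.4f}")
print(f"noisy_up:   r = {r_noisy:+.4f}")
```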

Reference: http://blog.csdn.net/ytdxyhz/article/details/51730995

Note that this coefficient of determination cannot be used to measure the goodness of fit of a nonlinear regression.

Why Is It Impossible to Calculate a Valid R-squared for Nonlinear Regression?

R-squared is based on the underlying assumption that you are fitting a linear model. If you aren’t fitting a linear model, you shouldn’t use it. The reason why is actually very easy to understand.

For linear models, the sums of the squared errors always add up in a specific manner: SS Regression + SS Error = SS Total.

This seems quite logical. The variance that the regression model accounts for plus the error variance adds up to equal the total variance. Further, R-squared equals SS Regression / SS Total, which mathematically must produce a value between 0 and 100%.

In nonlinear regression, SS Regression + SS Error do not equal SS Total! This completely invalidates R-squared for nonlinear models, and it no longer has to be between 0 and 100%.
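The broken decomposition can be seen on a toy example. The sketch below fits the one-parameter nonlinear model y = exp(b * x) by least squares and compares SSR + SSE with SST. Everything here is an assumption for illustration: the model, the made-up data, and the ternary search, which stands in for a real nonlinear least-squares solver and relies on the error being unimodal in b over the search interval.

```python
import math


def sse_for(b, x, y):
    """Sum of squared errors of the model y = exp(b * x)."""
    return sum((yi - math.exp(b * xi)) ** 2 for xi, yi in zip(x, y))


def fit_exponential(x, y, lo=0.0, hi=2.0, iters=200):
    """Least-squares estimate of b by ternary search on [lo, hi].

    Assumes sse_for is unimodal in b on the interval.
    """
    for _ in range(iters):
        m1 = lo + (hi - lo) / 3
        m2 = hi - (hi - lo) / 3
        if sse_for(m1, x, y) < sse_for(m2, x, y):
            hi = m2
        else:
            lo = m1
    return (lo + hi) / 2


# Made-up data that roughly follows exp(x), for illustration only.
x = [0.0, 1.0, 2.0, 3.0]
y = [1.2, 2.5, 7.0, 21.0]

b = fit_exponential(x, y)
y_hat = [math.exp(b * xi) for xi in x]
y_bar = sum(y) / len(y)

sse = sum((yi - fi) ** 2 for yi, fi in zip(y, y_hat))
sst = sum((yi - y_bar) ** 2 for yi in y)
ssr = sum((fi - y_bar) ** 2 for fi in y_hat)

print(f"b = {b:.4f}")
print(f"SSR + SSE = {ssr + sse:.4f}, SST = {sst:.4f}")
print(f"SSR / SST = {ssr / sst:.4f}, 1 - SSE / SST = {1 - sse / sst:.4f}")
```

Because SSR + SSE no longer equals SST, the two usual formulas for R-squared give different numbers for the same fit.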

Reference: http://blog.minitab.com/blog/adventures-in-statistics-2/why-is-there-no-r-squared-for-nonlinear-regression

Update:

For cases other than fitting by ordinary least squares, the R2 statistic can be calculated as above and may still be a useful measure. If fitting is by weighted least squares or generalized least squares, alternative versions of R2 can be calculated appropriate to those statistical frameworks, while the "raw" R2 may still be useful if it is more easily interpreted. Values for R2 can be calculated for any type of predictive model, which need not have a statistical basis.
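To illustrate the last sentence, the sketch below computes the "raw" value 1 - SSE / SST for predictions that come from hand-written rules rather than a fitted statistical model. The rules and data are hypothetical. Computed this way, the value is not restricted to the range 0 to 1: predictions worse than always guessing the mean give a negative result.

```python
def raw_r_squared(y, y_pred):
    """Raw R^2 = 1 - SSE / SST for any set of predictions."""
    y_bar = sum(y) / len(y)
    sse = sum((yi - pi) ** 2 for yi, pi in zip(y, y_pred))
    sst = sum((yi - y_bar) ** 2 for yi in y)
    return 1.0 - sse / sst


# Made-up data, for illustration only.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]

# Hypothetical rule of thumb "y is about twice x": no fitting involved.
rule_of_thumb = [2.0 * xi for xi in x]

# Hypothetical poor predictor: always answer 20, far from every observation.
always_twenty = [20.0] * len(y)

r2_rule = raw_r_squared(y, rule_of_thumb)
r2_bad = raw_r_squared(y, always_twenty)

print(f"rule of thumb: {r2_rule:.4f}")
print(f"always 20:     {r2_bad:.4f}")
```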

Reference: https://en.wikipedia.org/wiki/Coefficient_of_determination

Update:

https://stats.stackexchange.com/questions/7357/manually-calculated-r2-doesnt-match-up-with-randomforest-r2-for-testing

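This answer makes two points:

(1) For linear regression, R-squared equals the square of the correlation coefficient between the actual values and the predicted values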

(2)randomForest is reporting variation explained as opposed to variance explained.
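Point (1) can be checked numerically. This is a minimal sketch, assuming a one-predictor ordinary least squares fit with an intercept; the helper names and data are made up for illustration, and the check says nothing about point (2).

```python
import math


def fit_line(x, y):
    """Ordinary least squares for y = b0 + b1 * x (with an intercept)."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = sxy / sxx
    return y_bar - b1 * x_bar, b1


def pearson_r(a, b):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(a)
    a_bar = sum(a) / n
    b_bar = sum(b) / n
    sab = sum((ai - a_bar) * (bi - b_bar) for ai, bi in zip(a, b))
    saa = sum((ai - a_bar) ** 2 for ai in a)
    sbb = sum((bi - b_bar) ** 2 for bi in b)
    return sab / math.sqrt(saa * sbb)


# Made-up data, for illustration only.
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
y = [1.8, 4.4, 5.9, 8.3, 9.6, 12.5]

b0, b1 = fit_line(x, y)
y_hat = [b0 + b1 * xi for xi in x]

y_bar = sum(y) / len(y)
sse = sum((yi - fi) ** 2 for yi, fi in zip(y, y_hat))
sst = sum((yi - y_bar) ** 2 for yi in y)
r2 = 1.0 - sse / sst

r = pearson_r(y, y_hat)  # correlation between actual and predicted values

print(f"R^2              = {r2:.6f}")
print(f"corr(y, y_hat)^2 = {r * r:.6f}")
```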

Original post: https://www.cnblogs.com/guo-xiang/p/7295550.html