Hung-yi Lee's Machine Learning Course --- 3. Where does the error come from

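I. Summary

One-sentence summary: where does the error of a machine learning model come from?

bias: in target shooting, how far your aiming point is offset from the bullseye

variance: in target shooting, how far your actual shots scatter around your aiming point; this is variance in the statistical sense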
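
The shooting analogy matches the standard decomposition of expected squared error. As a sketch under stated assumptions: let \hat{f} be the true function (the bullseye), f^* the function learned from one training set D (one shot), and \bar{f} = E_D[f^*] the average learned function (the aiming point). These symbols are chosen here for illustration and do not appear in the notes; label noise is ignored.

```latex
\begin{aligned}
E_D\!\left[\big(f^*(x) - \hat{f}(x)\big)^2\right]
  &= \underbrace{\big(\bar{f}(x) - \hat{f}(x)\big)^2}_{\text{bias}^2}
   + \underbrace{E_D\!\left[\big(f^*(x) - \bar{f}(x)\big)^2\right]}_{\text{variance}},
\qquad \bar{f}(x) = E_D\!\left[f^*(x)\right].
\end{aligned}
```

The cross term vanishes because E_D[f^*(x) - \bar{f}(x)] = 0, so the two sources of error add.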

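1. Why do we need to identify the source of error in machine learning?

To improve the model in a targeted way: when the model makes errors you need to improve it, and you can only pick the right fix once you know where the error comes from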

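2. If we repeat the experiment many times, how do the fitted linear (degree-1) functions and the fitted higher-degree polynomial functions spread out on the plot?

The higher-degree functions spread out more: the curves fitted in different experiments differ widely from one another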

3. How do bias and variance compare for a simple model and a complex model?

Simple model: Large Bias, Small Variance

Complex model: Small Bias, Large Variance
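
The pattern above can be checked with a small simulation. The following is only a sketch under assumptions that are not in the notes: the true function is sin(2πx), the training inputs are fixed and only the noise changes between experiments, and polynomial degrees 1 and 5 stand in for the "simple" and "complex" models.

```python
import numpy as np

def true_fn(x):
    # Hypothetical ground-truth function, chosen only for illustration.
    return np.sin(2 * np.pi * x)

def bias_variance(degree, n_experiments=200, n_train=20, noise=0.3, seed=0):
    """Estimate squared bias and variance of a polynomial model of the given degree."""
    rng = np.random.default_rng(seed)
    x_train = np.linspace(0.0, 1.0, n_train)   # fixed inputs (assumption)
    x_grid = np.linspace(0.0, 1.0, 50)         # points where predictions are compared
    preds = np.empty((n_experiments, x_grid.size))
    for i in range(n_experiments):
        # Each experiment sees a fresh noisy training set.
        y_train = true_fn(x_train) + rng.normal(0.0, noise, n_train)
        coeffs = np.polyfit(x_train, y_train, degree)
        preds[i] = np.polyval(coeffs, x_grid)
    mean_pred = preds.mean(axis=0)              # the "aiming point"
    bias_sq = float(np.mean((mean_pred - true_fn(x_grid)) ** 2))
    variance = float(np.mean(preds.var(axis=0)))  # scatter around the aiming point
    return bias_sq, variance

simple_bias, simple_var = bias_variance(degree=1)
complex_bias, complex_var = bias_variance(degree=5)
print("degree 1: bias^2 = %.4f, variance = %.4f" % (simple_bias, simple_var))
print("degree 5: bias^2 = %.4f, variance = %.4f" % (complex_bias, complex_var))
```

In this setup the degree-1 model should show the larger squared bias and the degree-5 model the larger variance; the sum of the two terms is the quantity a balanced model tries to keep small.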

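4. What do we call the cases where bias or variance is large?

Underfitting: Large Bias: "under" means the model is too simple, so keep increasing model complexity

Overfitting: Large Variance: "over" means the model is overly complex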

5. How do I know that my model has large bias (Underfitting)?

It cannot fit the training data: If your model cannot even fit the training examples, then you have large bias

6. How do I know that my model has large variance (Overfitting)?

It fits the training data but not the testing data: If you can fit the training data, but large error on testing data, then you probably have large variance
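
The two checks in items 5 and 6 can be written as a rough decision rule. This is a minimal sketch: the function name `diagnose` and both thresholds (`acceptable_error`, `gap_factor`) are hypothetical, and sensible values depend on the task.

```python
def diagnose(train_error, test_error, acceptable_error=0.1, gap_factor=2.0):
    """Rule-of-thumb diagnosis from training and testing error.

    acceptable_error and gap_factor are illustrative thresholds, not standard values.
    """
    # Cannot even fit the training examples: large bias.
    if train_error > acceptable_error:
        return "large bias (underfitting)"
    # Fits the training data, but the testing error is much larger: large variance.
    if test_error > gap_factor * train_error:
        return "large variance (overfitting)"
    return "no obvious problem"

print(diagnose(0.50, 0.55))  # large bias (underfitting)
print(diagnose(0.02, 0.40))  # large variance (overfitting)
print(diagnose(0.05, 0.06))  # no obvious problem
```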

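7. What should I do if my model has large bias (Underfitting)?

Redesign the model: for example, consider more parameters

More data does not help: the model itself is inadequate, so collecting more data will not fix it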

For bias, redesign your model:
• Add more features as input
• A more complex model
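
To illustrate the two remedies, the sketch below treats higher powers of x as extra input features, so a higher degree means both more features and a more complex model. The sin(2πx) data and the chosen degrees are assumptions made for the example only.

```python
import numpy as np

rng = np.random.default_rng(0)
x = np.linspace(0.0, 1.0, 30)
# Hypothetical training data: a sine curve plus Gaussian noise.
y = np.sin(2 * np.pi * x) + rng.normal(0.0, 0.2, x.size)

def training_mse(degree):
    """Least-squares polynomial fit and its mean squared error on the training data."""
    coeffs = np.polyfit(x, y, degree)
    return float(np.mean((np.polyval(coeffs, x) - y) ** 2))

# Degree d uses the features x, x^2, ..., x^d.
errors = {d: training_mse(d) for d in (1, 3, 5)}
for d, err in errors.items():
    print("degree %d: training MSE = %.4f" % (d, err))
```

Because each lower-degree model is a special case of the higher-degree one, the training error cannot increase as features are added. A low training error alone does not say whether the testing error also improves.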

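8. What should I do if my model has large variance (Overfitting)?

More data: collect more training data: Very effective, but not always practical

Regularization: makes the learned function smoother: for when more data is not available: may hurt bias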
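
One common form of regularization is an L2 (ridge) penalty on the weights. The notes do not say which form is meant, so the following is only a sketch under that assumption; the helper `ridge_fit`, the data, and the two penalty strengths are illustrative. For brevity the penalty also applies to the constant term.

```python
import numpy as np

def ridge_fit(X, y, lam):
    """Closed-form ridge regression: solve (X^T X + lam * I) w = X^T y."""
    n_features = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(n_features), X.T @ y)

rng = np.random.default_rng(0)
x = np.linspace(0.0, 1.0, 12)
y = np.sin(2 * np.pi * x) + rng.normal(0.0, 0.2, x.size)
X = np.vander(x, N=8, increasing=True)  # features 1, x, ..., x^7

w_weak = ridge_fit(X, y, lam=1e-6)    # almost no regularization
w_strong = ridge_fit(X, y, lam=1.0)   # strong regularization

def train_mse(w):
    return float(np.mean((X @ w - y) ** 2))

print("weight norm:  weak %.3f, strong %.3f" % (np.linalg.norm(w_weak), np.linalg.norm(w_strong)))
print("training MSE: weak %.4f, strong %.4f" % (train_mse(w_weak), train_mse(w_strong)))
```

A larger penalty shrinks the weights, which gives a smoother function and lower variance, while the training error goes up. That rise is the sense in which regularization may hurt bias.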

9. How do we choose a model?

One trades against the other: There is usually a trade-off between bias and variance.

Make the sum smaller: Select a model that balances two kinds of error to minimize total error

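10. After we have selected a model and evaluated it on our own testing data, will the error on external testing data usually be larger than the error we measured ourselves?

Yes, it is usually larger than the error we measured on our own testing data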
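
One way to see why: choosing the model with the lowest error on our own testing data also selects for lucky noise on that data. The simulation below is a sketch under strong assumptions that are not in the notes: all candidate models have the same true error, and each measured error is the true error plus independent Gaussian noise.

```python
import numpy as np

rng = np.random.default_rng(0)
n_models, n_trials = 20, 2000
true_error = 0.5   # every candidate is equally good (assumption)
noise = 0.05       # sampling noise of a finite test set (assumption)

own_of_winner = np.empty(n_trials)
external_of_winner = np.empty(n_trials)
for t in range(n_trials):
    own = true_error + rng.normal(0.0, noise, n_models)       # our own testing data
    external = true_error + rng.normal(0.0, noise, n_models)  # external testing data
    winner = np.argmin(own)            # model selected on our own testing data
    own_of_winner[t] = own[winner]
    external_of_winner[t] = external[winner]

avg_own = float(own_of_winner.mean())
avg_external = float(external_of_winner.mean())
print("average error of the selected model on our own test data: %.3f" % avg_own)
print("average error of the selected model on external test data: %.3f" % avg_external)
```

On average the selected model looks better on the data used to select it than on fresh data, even though no model is actually better than the others here.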

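11. How should we validate our model?

Split the data into several parts: testing data 1 is used only after the model is finalized; testing data 2 is used when selecting the model (a validation set); data 3 is used to build (train) the model

Always hold out one part of the data as private data, to simulate what happens when real users use the model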
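
The three-way split can be sketched as follows. The 60/20/20 proportions, the synthetic sin(2πx) data, and the polynomial candidates are assumptions made for the example.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.uniform(0.0, 1.0, 300)
y = np.sin(2 * np.pi * x) + rng.normal(0.0, 0.2, x.size)  # hypothetical data

# Shuffle once, then split: training / validation / held-out test.
idx = rng.permutation(x.size)
train, val, test = idx[:180], idx[180:240], idx[240:]

def mse(coeffs, subset):
    return float(np.mean((np.polyval(coeffs, x[subset]) - y[subset]) ** 2))

# Build every candidate model on the training part only.
candidates = {d: np.polyfit(x[train], y[train], d) for d in range(1, 8)}

# Select the model on the validation part.
val_errors = {d: mse(c, val) for d, c in candidates.items()}
best_degree = min(val_errors, key=val_errors.get)

# Touch the held-out test part once, after the choice is final.
test_error = mse(candidates[best_degree], test)
print("selected degree:", best_degree)
print("validation MSE: %.4f, held-out test MSE: %.4f" % (val_errors[best_degree], test_error))
```

Since the held-out part plays no role in fitting or selection, its error is a fairer estimate of what real users would see. Going back to retune the model after looking at it would defeat that purpose.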

II. The content is covered in the summary above

Original article: https://www.cnblogs.com/Renyi-Fan/p/10965548.html