Reinforcement Learning

The differences between the three types of learning (supervised, unsupervised, and reinforcement)

Supervised learning takes the form of function approximation: you are given a set of labeled (x, y) pairs, and your goal is to find a function f that maps a new x to the proper y. In other words, f should fit the labeled data well and predict labels as accurately as possible for examples outside the training set.
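As a minimal sketch of function approximation from (x, y) pairs, the snippet below fits a straight line by ordinary least squares and applies it to a new x. The helper name `fit_line` and the toy data are assumptions made for illustration. A line is only one possible choice of f.

```python
def fit_line(xs, ys):
    """Fit y = slope * x + intercept by ordinary least squares."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Covariance of x and y, and variance of x (both unnormalized).
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var = sum((x - mean_x) ** 2 for x in xs)
    slope = cov / var
    intercept = mean_y - slope * mean_x
    # The learned function f maps a new x to a predicted y.
    return lambda x: slope * x + intercept

# Labeled training pairs (toy data generated from y = 2x + 1).
xs = [1.0, 2.0, 3.0, 4.0]
ys = [3.0, 5.0, 7.0, 9.0]

f = fit_line(xs, ys)
prediction = f(5.0)  # prediction for an x outside the training set
```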

Unsupervised learning is similar to supervised learning, except that you are given only a set of unlabeled x's, and your goal is to find some f that gives you a compact description of the x's you have seen. We call this clustering, or description, as opposed to function approximation.
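To make "compact description" concrete, here is a small sketch that summarizes unlabeled one-dimensional points by two cluster centers, using a simplified k-means loop. The function name `two_means_1d`, the fixed choice of two clusters, and the toy data are all assumptions made for illustration.

```python
def two_means_1d(xs, iterations=10):
    """Describe 1-D points by two cluster centers (simplified k-means)."""
    # Deterministic start: use the smallest and largest points as centers.
    centers = [min(xs), max(xs)]
    for _ in range(iterations):
        groups = [[], []]
        for x in xs:
            # Assign each point to its nearest center.
            nearest = 0 if abs(x - centers[0]) <= abs(x - centers[1]) else 1
            groups[nearest].append(x)
        # Move each center to the mean of its group (keep it if the group is empty).
        centers = [sum(g) / len(g) if g else c for g, c in zip(groups, centers)]
    return centers

# Unlabeled data: no y values, only x's.
xs = [1.0, 1.2, 0.8, 9.0, 9.5, 8.5]
centers = two_means_1d(xs)  # two numbers that summarize six points
```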

Reinforcement learning looks a lot like supervised learning, in that we are given a sequence of data pairs and try to learn some function. But in the function approximation (supervised learning) case, we are given x, y pairs and asked to learn f; in reinforcement learning, we are given something quite different: x's and z's. Reinforcement learning is one mechanism for decision making.

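The difference between supervised learning and RL is that supervised learning must be given very precise examples. To learn chess, for instance, an example must be provided for every move. To train a vocal-tract system to produce sound, examples of how each vocal muscle vibrates and contracts are needed. In practice, however, complete and precise examples are often hard to obtain (for example, the motion of every muscle in the body when hitting a ball). Often only the result of each attempt is available: the error of a particular hit, how similar the produced sound is to the target, or the final outcome of a chess game.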

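Moreover, the feedback an RL system receives is often delayed rather than immediate. In chess, you play N moves and only at the last move do you receive feedback on whether you won or lost, yet you must use that feedback to guide the decisions you made earlier. Such delayed feedback is hard for supervised learning to exploit; supervised learning is more about learning the correspondence between two things at the same point in time.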

A reinforcement learning model is built and improved through the program's repeated trial and interaction.
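The following is a rough sketch of learning from delayed feedback by trial and interaction. It uses tabular Q-learning, which is one common RL technique chosen here for illustration and is not named in these notes. The corridor task, `step`, `q_learning`, and all parameter values are hypothetical. The agent is rewarded only on the final step of an episode, and the update rule propagates that reward back to earlier decisions.

```python
import random

N_STATES = 5          # states 0..4; state 4 is the goal (hypothetical toy task)
GOAL = N_STATES - 1
ACTIONS = (0, 1)      # 0 = step left, 1 = step right

def step(state, action):
    """Return (next_state, reward); the reward is 1 only on reaching the goal."""
    next_state = max(state - 1, 0) if action == 0 else state + 1
    reward = 1.0 if next_state == GOAL else 0.0
    return next_state, reward

def q_learning(episodes=500, alpha=0.5, gamma=0.9, seed=0):
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(N_STATES)]  # q[state][action]
    for _ in range(episodes):
        state = 0
        for _ in range(200):  # cap on episode length
            # Explore by acting at random; Q-learning is off-policy,
            # so it can still learn the greedy policy's values.
            action = rng.choice(ACTIONS)
            next_state, reward = step(state, action)
            # Move the estimate toward reward plus discounted best future value.
            target = reward + gamma * max(q[next_state])
            q[state][action] += alpha * (target - q[state][action])
            state = next_state
            if state == GOAL:
                break
    return q

q = q_learning()
# Greedy action in each non-goal state after learning.
policy = [max(ACTIONS, key=lambda a: q[s][a]) for s in range(GOAL)]
```

No example ever tells the agent which move is correct in states 0 through 2; those preferences are inferred from a reward that arrives only at the end.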

References:

http://blog.csdn.net/ppn029012/article/details/8666328