Temporal Credit Assignment in Reinforcement Learning [Classic Reinforcement Learning Paper]

Sutton's publications page:

http://incompleteideas.net/publications.html

PhD thesis: Temporal Credit Assignment in Reinforcement Learning

http://incompleteideas.net/publications.html#PhDthesis

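I have recently been working on a reinforcement learning project, and I have come to appreciate how impressive Sutton, often called the father of reinforcement learning, really is. He introduced temporal-difference (TD) learning and was a lead author of the foundational work on policy gradient methods. Although the modern framework of reinforcement learning arguably took shape starting with Q-learning, Sutton was one of the earliest people working in the field and has probably contributed more to its classic ideas than anyone else. His undergraduate degree was in psychology, from Stanford, and he only moved to computer science for his MS and PhD at the University of Massachusetts Amherst, which makes his record all the more impressive.

His PhD thesis is one of the most classic works in reinforcement learning and is well worth reading. I searched for it for a long time. Probably because it was published so long ago, none of the databases I checked had indexed it. The only place that seemed to hold a copy was the university that granted his PhD, the University of Massachusetts, but access there is restricted to its own students, so after several days I still had not found it. Today it occurred to me to look on the author's personal homepage instead, and sure enough, there it was. I am noting it here for future reference.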

----------------------------------------------------------------------------------------------------------------

Appendix: (contents of the Publications section of Sutton's homepage)

Rich Sutton's Publications

First, a quick guide to the highlights, roughly in order of the work's popularity or potential current interest:

The 2nd edition of Reinforcement Learning: An Introduction
Emphatic TD(λ); Yu's convergence proof
Weighted importance sampling version of LSTD (λ), linear-complexity algorithms
True online TD(λ)
The predictive approach to knowledge representation; PEAK; Horde; nexting
Fast gradient-based TD algorithms, nonlinear case, GQ(lambda), control, Maei's thesis
RL book
Temporal-difference learning; TD(lambda) details
The TD model of Pavlovian conditioning; earlier Sutton-Barto model; more biological 1982 & 1986; and instrumental learning
Dyna; as an integrated architecture; with FA 1996, 2008
The options paper; UAV example; precursor not superseded;
Policy gradient methods; Incremental Natural Actor-Critic Algorithms
PhD thesis, introduced actor-critic architectures and "temporal credit assignment"
PSRs; the predictive representations hypothesis; TD networks; with options
RL for RoboCup soccer keepaway
RL with continuous state and action spaces
Step-size adaptation by meta-gradient descent; IDBD; improved; earliest pub; in classical conditioning; in human category learning, in tracking
Random representations; representation search; feature discovery; more
Pole-balancing; tracking nonstationarity
Exponentiated-gradient RL; fuller TR
A study in alpha and lambda
Two problems with backprop

Also, some RL pubs that aren't mine, available for researchers:

For any broken links, please send email to rich@richsutton.com.
