CS294-112 Deep Reinforcement Learning, Fall Semester (Berkeley), No. 3: Reinforcement learning introduction


First-order Markov chain: the next state depends only on the current state, not on the earlier history.
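
To make the first-order property concrete, here is a minimal sketch in Python. The three states and their transition probabilities are hypothetical, made up for illustration only; they are not from the lecture. The point is that the sampler reads nothing but the current state when drawing the next one.

```python
import random

# Hypothetical 3-state chain; states and probabilities are illustrative assumptions.
STATES = ["s0", "s1", "s2"]
TRANSITIONS = {
    "s0": [0.5, 0.5, 0.0],
    "s1": [0.1, 0.6, 0.3],
    "s2": [0.0, 0.2, 0.8],
}

def sample_chain(start, steps, rng):
    """Sample a trajectory where each next state depends only on the current one."""
    trajectory = [start]
    for _ in range(steps):
        current = trajectory[-1]
        # Only `current` is consulted; earlier history is never used (Markov property).
        next_state = rng.choices(STATES, weights=TRANSITIONS[current], k=1)[0]
        trajectory.append(next_state)
    return trajectory

trajectory = sample_chain("s0", 10, random.Random(0))
print(trajectory)
```

In a reinforcement learning setting the transition would also depend on the action taken, but the same idea applies: the current state and action are enough to determine the distribution over the next state.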


On-policy algorithms are easier to parallelize.
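
One way to read this note: on-policy samples all come from the same current policy, so independent workers can each run their own rollouts and the results can simply be merged into one batch. The sketch below assumes a toy 1-D random-walk environment and a fixed random policy; `policy`, `rollout`, and `collect_batch` are hypothetical names for illustration, not part of the course code.

```python
import random
from concurrent.futures import ThreadPoolExecutor

def policy(state, rng):
    """Hypothetical fixed stochastic policy: step +1 or -1 with equal probability."""
    return rng.choice([-1, 1])

def rollout(seed, horizon=5):
    """Run one episode of a toy 1-D walk with the current policy."""
    rng = random.Random(seed)  # each worker gets its own random stream
    state, transitions = 0, []
    for _ in range(horizon):
        action = policy(state, rng)
        next_state = state + action
        reward = -abs(next_state)  # toy reward: stay close to the origin
        transitions.append((state, action, reward, next_state))
        state = next_state
    return transitions

def collect_batch(num_workers=4):
    """Collect rollouts from independent workers that all share the same policy."""
    with ThreadPoolExecutor(max_workers=num_workers) as pool:
        episodes = list(pool.map(rollout, range(num_workers)))
    # Merge all episodes into a single batch for one policy update.
    return [t for episode in episodes for t in episode]

batch = collect_batch()
print(len(batch))  # 4 workers x 5 steps = 20 transitions
```

The workers never need to communicate during collection; they only have to agree on which policy they are running.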

Off-policy algorithms have to fit both a transition network and a policy network, which is much more computationally expensive.


Original post: https://www.cnblogs.com/ecoflex/p/9084345.html