CS294-112 Deep Reinforcement Learning, Fall Semester (Berkeley) NO.3 Reinforcement learning introduction

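First-order Markov chain: the next state depends only on the current state (and, in a decision process, the current action), not on earlier history.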
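A minimal sketch of the first-order property, not taken from the lecture: the state names and transition probabilities below are made up for illustration. The point is that `step` only ever looks at the current state.

```python
import random

# Hypothetical 3-state chain; names and probabilities are illustrative assumptions.
TRANSITIONS = {
    "sunny": {"sunny": 0.7, "cloudy": 0.2, "rainy": 0.1},
    "cloudy": {"sunny": 0.3, "cloudy": 0.4, "rainy": 0.3},
    "rainy": {"sunny": 0.2, "cloudy": 0.3, "rainy": 0.5},
}


def step(state, rng):
    """Sample the next state from the current state only (first-order)."""
    next_states = list(TRANSITIONS[state])
    weights = [TRANSITIONS[state][s] for s in next_states]
    return rng.choices(next_states, weights=weights, k=1)[0]


def rollout(start, length, rng):
    """Generate a state sequence of the given length, beginning at start."""
    states = [start]
    for _ in range(length - 1):
        # Only states[-1] is passed on; earlier history is never consulted.
        states.append(step(states[-1], rng))
    return states


trajectory = rollout("sunny", 10, random.Random(0))
print(trajectory)
```

A second-order chain would need `step` to take the previous two states instead of one.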

On-policy algorithms are easier to parallelize.
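One way to read this, as a rough sketch: every worker runs the same current policy in its own copy of the environment, so rollouts are independent and can be collected side by side before a single update. The toy environment, the `theta` parameter, and all function names here are assumptions for illustration, not anything from the course.

```python
import random
from concurrent.futures import ThreadPoolExecutor


def policy(theta, rng):
    """Hypothetical stochastic policy: step +1 with probability theta, else -1."""
    return 1 if rng.random() < theta else -1


def collect_rollout(job):
    """Run one episode in a toy 1-D environment and return its total reward."""
    theta, seed, horizon = job
    rng = random.Random(seed)  # each worker has its own random stream
    state, total_reward = 0, 0.0
    for _ in range(horizon):
        state += policy(theta, rng)
        total_reward += float(state)  # toy reward: current position
    return total_reward


def parallel_rollouts(theta, num_workers=4, horizon=20):
    """Collect one rollout per worker, all using the same policy parameter."""
    jobs = [(theta, seed, horizon) for seed in range(num_workers)]
    with ThreadPoolExecutor(max_workers=num_workers) as pool:
        return list(pool.map(collect_rollout, jobs))


returns = parallel_rollouts(theta=0.7)
print(returns)
```

Threads are used only to keep the sketch short; in CPython they do not give real CPU parallelism, so an actual setup would more likely use separate processes or machines.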

An off-policy algorithm has to fit both a transition network and a policy network, which makes it much more computationally expensive.

Original post: https://www.cnblogs.com/ecoflex/p/9084345.html