Deep RL Bootcamp Lecture 9 Model-based Reinforcement


So, is the process similar to a one-to-many RNN?
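One way to read the analogy: a learned dynamics model is applied recursively, so one starting observation plus a sequence of actions yields a whole sequence of predictions, much like a one-to-many RNN unrolled over time. This is a minimal sketch under stated assumptions: `predict_next` is a hypothetical hand-written stand-in for a learned model, and the state is a single float rather than an image.

```python
from typing import Callable, List


def predict_next(state: float, action: float) -> float:
    """Hypothetical stand-in for a learned dynamics model: s' = 0.9 * s + action."""
    return 0.9 * state + action


def rollout(model: Callable[[float, float], float],
            s0: float, actions: List[float]) -> List[float]:
    """Unroll the model from one initial state over many actions (one input -> many outputs)."""
    states = []
    s = s0
    for a in actions:
        s = model(s, a)      # the model's own prediction becomes its next input
        states.append(s)
    return states


if __name__ == "__main__":
    print(rollout(predict_next, 1.0, [0.0, 0.0, 1.0]))
```

As in an RNN, errors compound: each prediction is fed back in, so a small model error early in the rollout propagates to every later step.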

Model-based methods learn much more efficiently than model-free methods.

Iteratively gets better.
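The iterative loop (fit a model to all data collected so far, act with it, add the new data, refit) can be sketched on a toy problem. Everything here is an illustrative assumption, not the lecture's method: a hypothetical 1-D environment `true_step`, a linear model fitted by least squares, and a greedy one-step planner stand in for the neural models and planners used in practice.

```python
import random
from typing import List, Tuple

# (state, action, next_state)
Transition = Tuple[float, float, float]


def true_step(s: float, a: float, rng: random.Random) -> float:
    """Hypothetical toy environment, unknown to the agent: s' = s + 0.5 * a + small noise."""
    return s + 0.5 * a + rng.gauss(0.0, 0.01)


def fit_model(data: List[Transition]) -> Tuple[float, float]:
    """Least-squares fit of s' ~ w_s * s + w_a * a by solving the 2x2 normal equations."""
    sss = sum(s * s for s, _, _ in data)
    saa = sum(a * a for _, a, _ in data)
    ssa = sum(s * a for s, a, _ in data)
    sy = sum(s * y for s, _, y in data)
    ay = sum(a * y for _, a, y in data)
    det = sss * saa - ssa * ssa
    w_s = (saa * sy - ssa * ay) / det
    w_a = (sss * ay - ssa * sy) / det
    return w_s, w_a


def plan(s: float, goal: float, w_s: float, w_a: float) -> float:
    """Greedy one-step planner: the action in [-1, 1] whose predicted next state is closest to goal."""
    a = (goal - w_s * s) / w_a
    return max(-1.0, min(1.0, a))


def model_based_loop(iterations: int = 5, goal: float = 2.0,
                     steps: int = 10, seed: int = 0) -> List[float]:
    """Alternate fitting and acting; returns the final |s - goal| of each iteration."""
    rng = random.Random(seed)
    data: List[Transition] = []
    s = 0.0
    for _ in range(steps):                  # no model yet: collect data with random actions
        a = rng.uniform(-1.0, 1.0)
        s_next = true_step(s, a, rng)
        data.append((s, a, s_next))
        s = s_next
    errors = []
    for _ in range(iterations):
        w_s, w_a = fit_model(data)          # 1. fit dynamics to everything collected so far
        s = 0.0
        for _ in range(steps):
            a = plan(s, goal, w_s, w_a)     # 2. act with the model-based planner
            s_next = true_step(s, a, rng)
            data.append((s, a, s_next))     # 3. aggregate new data for the next fit
            s = s_next
        errors.append(abs(s - goal))
    return errors


if __name__ == "__main__":
    print(model_based_loop())
```

The data aggregation step matters: later fits see states the planner actually visits (near the goal), which is where the model needs to be accurate.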

Fewer than 300 trials (~25 min of robot time) per task.
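As a rough check on what this budget means per trial, assuming the ~25 min is spread evenly over about 300 trials:

```latex
\frac{25\,\text{min} \times 60\,\text{s/min}}{300\,\text{trials}}
  = \frac{1500\,\text{s}}{300\,\text{trials}}
  = 5\,\text{s per trial}
```

So each trial is on the order of a few seconds of real robot interaction.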

Visual prediction: predicting future frames from the current observation.

During model training there is no reward; the robot executes pre-programmed random motions. At task time there is a reward function, which basically tries to move a designated pixel to the goal position.
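The task-time step can be sketched as planning against a reward that only exists at task time, using a model that was trained without any reward. The planner below is random shooting (sample action sequences, roll each through the model, keep the best, execute its first action, replan), chosen here as a simple stand-in; the lecture's actual planner may differ. Assumptions: `predicted_pixel` is a hypothetical stand-in for the learned video-prediction model, the designated pixel is treated as a 2-D point, and actions are 2-D displacements in [-1, 1].

```python
import math
import random
from typing import List, Tuple

Pixel = Tuple[float, float]
Action = Tuple[float, float]


def predicted_pixel(p: Pixel, a: Action) -> Pixel:
    """Hypothetical stand-in for the reward-free learned model:
    where the designated pixel is predicted to move under action a."""
    return (p[0] + a[0], p[1] + a[1])


def reward(p: Pixel, goal: Pixel) -> float:
    """Task-time reward, defined only at task time: negative pixel-to-goal distance."""
    return -math.dist(p, goal)


def plan_first_action(p0: Pixel, goal: Pixel, rng: random.Random,
                      horizon: int = 3, n_samples: int = 300) -> Action:
    """Random shooting: sample action sequences, score each by the summed reward
    along its predicted trajectory, and return the best sequence's first action."""
    best_score, best_first = -math.inf, (0.0, 0.0)
    for _ in range(n_samples):
        seq: List[Action] = [(rng.uniform(-1.0, 1.0), rng.uniform(-1.0, 1.0))
                             for _ in range(horizon)]
        p, score = p0, 0.0
        for a in seq:
            p = predicted_pixel(p, a)   # imagined step: no real robot motion
            score += reward(p, goal)
        if score > best_score:
            best_score, best_first = score, seq[0]
    return best_first


if __name__ == "__main__":
    rng = random.Random(0)
    p, goal = (0.0, 0.0), (3.0, 0.0)
    for _ in range(10):                 # replan after every executed action
        a = plan_first_action(p, goal, rng)
        p = predicted_pixel(p, a)       # here the model doubles as the "real" system
    print(p, math.dist(p, goal))
```

Because the reward is only plugged in at planning time, the same reward-free model can be reused for a different goal simply by changing `goal`.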


Original post: https://www.cnblogs.com/ecoflex/p/8983080.html