Deep RL Bootcamp Lecture 7: SVG, DDPG, and Stochastic Computation Graphs

 

 

 

^ is the square root of epsilon

 

 

a simplified version of the hard version

a smoother way to find the correct solution

 

the first term is the REINFORCE term, and the second term is the gradient of the log probability of our loss
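
As a rough illustration of what a REINFORCE (score function) term looks like in practice, here is a minimal sketch on a toy problem that is not taken from the lecture. It assumes x ~ N(theta, 1) and a loss f(x) = x^2, so the true gradient of E[f(x)] with respect to theta is 2 * theta. The function names `score_function_grad` and `pathwise_grad` are made up for this example; the second one shows the pathwise (reparameterized) estimator for comparison.

```python
import random


def f(x):
    # Toy loss (an assumption for this sketch): f(x) = x^2
    return x * x


def score_function_grad(theta, n, rng):
    """REINFORCE-style estimate of d/dtheta E_{x ~ N(theta, 1)}[f(x)]."""
    total = 0.0
    for _ in range(n):
        x = rng.gauss(theta, 1.0)
        # For a unit-variance Gaussian, grad_theta log p(x; theta) = x - theta
        total += f(x) * (x - theta)
    return total / n


def pathwise_grad(theta, n, rng):
    """Pathwise estimate using the reparameterization x = theta + eps."""
    total = 0.0
    for _ in range(n):
        eps = rng.gauss(0.0, 1.0)
        x = theta + eps
        # df/dx * dx/dtheta = 2x * 1
        total += 2.0 * x
    return total / n


if __name__ == "__main__":
    rng = random.Random(0)
    theta = 1.5
    print("true gradient :", 2.0 * theta)
    print("score function:", score_function_grad(theta, 200_000, rng))
    print("pathwise      :", pathwise_grad(theta, 200_000, rng))
```

Both estimators are unbiased for this toy problem, but the score function estimate typically has noticeably higher variance for the same number of samples, which is one motivation for pathwise methods when the loss is differentiable.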

 

b is a stochastic node 

 

 

      

further formula derivations are omitted.

Original article: https://www.cnblogs.com/ecoflex/p/8977893.html