Closing the Sim-to-Real Loop: Adapting Simulation Randomization with Real World Experience

Yevgen Chebotar, Ankur Handa, Viktor Makoviychuk, Miles Macklin, Jan Issac, Nathan Ratliff, Dieter Fox

Abstract: We consider the problem of transferring policies to the real world by training on a distribution of simulated scenarios. Rather than manually tuning the randomization of simulations, we adapt the simulation parameter distribution using a few real world roll-outs interleaved with policy training. In doing so, we are able to change the distribution of simulations to improve the policy transfer by matching the policy behavior in simulation and the real world. We show that policies trained with our method are able to reliably transfer to different robots in two real world tasks: swing-peg-in-hole and opening a cabinet drawer. The video of our experiments can be found at https://sites.google.com/view/simopt.

Closing the simulation-to-reality transfer loop is an important component of the robust transfer of robotic policies.
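To make the idea in the abstract concrete, the following is a minimal sketch of adapting a simulation parameter distribution so that simulated roll-outs match a real one. Everything in it is an assumption made for illustration: the toy point-mass dynamics, the single `friction` parameter, and the names `rollout` and `adapt_sim_distribution` are hypothetical. The update rule is a simple cross-entropy-style refit to the best-matching samples, chosen for brevity and not necessarily the update used by the authors. Policy training is omitted, so a fixed action sequence stands in for executing the current policy.

```python
import numpy as np


def rollout(friction, actions, dt=0.05):
    """Toy 1-D point mass with viscous friction; returns observed positions."""
    pos, vel = 0.0, 0.0
    observations = []
    for a in actions:
        vel += (a - friction * vel) * dt
        pos += vel * dt
        observations.append(pos)
    return np.array(observations)


def adapt_sim_distribution(real_obs, actions, mean, std,
                           n_iters=20, n_samples=64, n_elite=8, seed=0):
    """Refit a Gaussian over friction so simulated roll-outs match real_obs."""
    rng = np.random.default_rng(seed)
    for _ in range(n_iters):
        # Sample candidate simulation parameters from the current distribution.
        samples = rng.normal(mean, std, size=n_samples)
        samples = np.clip(samples, 0.0, None)  # friction must be non-negative
        # Discrepancy between simulated and real observation trajectories.
        costs = np.array([
            np.linalg.norm(rollout(s, actions) - real_obs) for s in samples
        ])
        # Keep the best-matching samples and refit the Gaussian to them.
        elite = samples[np.argsort(costs)[:n_elite]]
        mean, std = elite.mean(), elite.std() + 1e-3
    return mean, std


# Hypothetical stand-ins: a fixed action sequence replaces the current policy,
# and a roll-out with friction 0.8 replaces the real-world observation.
actions = np.ones(100)
real_obs = rollout(0.8, actions)
mean, std = adapt_sim_distribution(real_obs, actions, mean=2.0, std=1.0)
```

In this sketch the distribution starts far from the value that generated the "real" trajectory and is pulled toward it by comparing observations only. In a full pipeline, each distribution update would alternate with further policy training on simulations drawn from the updated distribution.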