Attention Is All You Need: Some Good Resources

 

The encoders are all identical in structure (yet they do not share weights). Each one is broken down into two sub-layers: a self-attention layer followed by a position-wise feed-forward network.
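As a rough illustration of those two sub-layers, here is a minimal PyTorch sketch of one encoder layer (the dimensions, dropout rate, and class name are illustrative defaults, not taken from any of the linked resources):

```python
import torch
import torch.nn as nn

class EncoderLayer(nn.Module):
    """One Transformer encoder layer: self-attention plus a feed-forward
    network, each wrapped in a residual connection and layer norm."""
    def __init__(self, d_model=512, num_heads=8, d_ff=2048, dropout=0.1):
        super().__init__()
        # Sub-layer 1: multi-head self-attention (queries, keys, and
        # values all come from the same input sequence).
        self.self_attn = nn.MultiheadAttention(
            d_model, num_heads, dropout=dropout, batch_first=True)
        # Sub-layer 2: position-wise feed-forward network, applied
        # independently at every position.
        self.ffn = nn.Sequential(
            nn.Linear(d_model, d_ff),
            nn.ReLU(),
            nn.Linear(d_ff, d_model),
        )
        self.norm1 = nn.LayerNorm(d_model)
        self.norm2 = nn.LayerNorm(d_model)
        self.dropout = nn.Dropout(dropout)

    def forward(self, x):
        # x: (batch, seq_len, d_model)
        attn_out, _ = self.self_attn(x, x, x)
        x = self.norm1(x + self.dropout(attn_out))       # residual + norm
        x = self.norm2(x + self.dropout(self.ffn(x)))    # residual + norm
        return x

# Each encoder in the stack has the same structure but its own weights:
encoders = nn.ModuleList([EncoderLayer() for _ in range(6)])
x = torch.randn(2, 10, 512)  # (batch, seq_len, d_model)
for layer in encoders:
    x = layer(x)
print(x.shape)  # torch.Size([2, 10, 512])
```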

 

https://kexue.fm/archives/4765 (Su Jianlin's "Scientific Spaces" walkthrough of the paper, with code)

https://jalammar.github.io/illustrated-transformer/ (Jay Alammar, "The Illustrated Transformer")

http://nlp.seas.harvard.edu/2018/04/03/attention.html (Harvard NLP, "The Annotated Transformer")

https://colab.research.google.com/github/tensorflow/tensor2tensor/blob/master/tensor2tensor/notebooks/hello_t2t.ipynb#scrollTo=r6GPPFy1fL2N (the Tensor2Tensor "Hello T2T" introductory Colab notebook)

Original post: https://www.cnblogs.com/zle1992/p/10062804.html