Coursera, Deep Learning 5, Sequence Models, Week 4: Transformer Network, Self-Attention, Multi-Head Attention
Please credit the source when reposting: http://www.cnblogs.com/mashuai-191/