BERT-Related Variants

Some applications of BERT: https://github.com/Jiakui/awesome-bert

ALBERT: A Lite BERT for Self-supervised Learning of Language Representations

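Reduces the number of parameters in BERT by:
- factorizing the word-embedding matrix
- sharing parameters across layers
Replaces NSP (next sentence prediction) with SOP (sentence order prediction), because NSP can often be solved from topic cues alone, without modeling how the two sentences cohere.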
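
The parameter savings from the two techniques above can be checked with simple arithmetic. The sketch below is illustrative only: the vocabulary size, hidden size, embedding size, and per-layer parameter count are assumed values, not figures taken from these notes.

```python
def embedding_params(vocab_size: int, hidden_size: int) -> int:
    """Parameters of a single V x H embedding table."""
    return vocab_size * hidden_size


def factorized_embedding_params(vocab_size: int, embed_size: int, hidden_size: int) -> int:
    """Parameters of a V x E table followed by an E x H projection."""
    return vocab_size * embed_size + embed_size * hidden_size


def encoder_params(per_layer_params: int, num_layers: int, shared: bool) -> int:
    """With cross-layer sharing, one set of layer weights is reused by every layer."""
    return per_layer_params if shared else per_layer_params * num_layers


# Assumed sizes, chosen only for illustration.
V, H, E = 30000, 768, 128
full = embedding_params(V, H)
factorized = factorized_embedding_params(V, E, H)
print(f"full embedding: {full:,} parameters")
print(f"factorized embedding: {factorized:,} parameters")
```

The factorization pays off whenever E is much smaller than H, because the V x H term is replaced by V x E plus a small E x H projection.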

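XLNet: Generalized Autoregressive Pretraining for Language Understanding

Paper: https://arxiv.org/pdf/1906.08237.pdf
Pretrained models and code: https://github.com/zihangdai/xlnet
Explanation of the paper: "XLNet: How It Works and How It Compares with BERT" https://zhuanlan.zhihu.com/p/7025742
Combines the advantages of autoregressive and autoencoding models.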
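Problems addressed:

- BERT predicts the masked tokens independently of one another;
- BERT's pretraining and fine-tuning are mismatched, since the [MASK] token appears only during pretraining.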

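Solutions:

- Permutation language modeling
- Two-stream self-attention
- Borrows from Transformer-XL to capture longer-range dependencies
- Uses relative positional encoding
- Multi-segment modeling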
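
To make permutation language modeling more concrete, here is a minimal sketch of how a sampled factorization order can be turned into an attention mask. It is a simplification under stated assumptions: the function name `permutation_mask` is hypothetical, the mask corresponds to the query-stream view (a token cannot see its own content), and the actual XLNet implementation differs in many details.

```python
import random


def permutation_mask(seq_len, seed=0):
    """Sample a factorization order and build the matching attention mask.

    mask[i][j] is True when position i may attend to position j, that is,
    when j comes strictly earlier than i in the sampled order.
    """
    rng = random.Random(seed)
    order = list(range(seq_len))
    rng.shuffle(order)

    # rank[pos] = step at which position `pos` is predicted
    rank = [0] * seq_len
    for step, pos in enumerate(order):
        rank[pos] = step

    mask = [[rank[j] < rank[i] for j in range(seq_len)] for i in range(seq_len)]
    return order, mask


order, mask = permutation_mask(4)
print("factorization order:", order)
for row in mask:
    print(["x" if visible else "." for visible in row])
```

The tokens keep their original positions; only the visibility pattern changes, which is how the model can see context from both sides while remaining autoregressive.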

ELMo: Deep contextualized word representations

Two unidirectional LSTM language models, one forward and one backward.

GPT: Improving Language Understanding by Generative Pre-Training

A unidirectional Transformer language model, built from the Transformer decoder.
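
"Unidirectional" here means that each position attends only to itself and to earlier positions. A minimal sketch of such a causal mask follows; the helper name `causal_mask` is hypothetical and chosen only for illustration.

```python
def causal_mask(seq_len):
    """mask[i][j] is True when position i may attend to position j (j <= i)."""
    return [[j <= i for j in range(seq_len)] for i in range(seq_len)]


for row in causal_mask(4):
    print(["x" if visible else "." for visible in row])
```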

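GPT-2: Language Models are Unsupervised Multitask Learners

The architecture is largely the same as GPT, with these differences:

- a larger model;
- training data that is larger, broader in scope, and higher in quality;
- multitask learning.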

BERT-wwm: Pre-Training with Whole Word Masking for Chinese BERT

Uses whole word masking: when WordPiece splits a word into several pieces, all pieces of that word are masked together.
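
A minimal sketch of whole word masking is shown below. It assumes the English-style WordPiece convention in which continuation pieces start with "##"; for Chinese text, which is tokenized character by character, the word groups would instead have to come from a word segmenter. The function name and the masking probability are illustrative assumptions.

```python
import random

MASK = "[MASK]"


def whole_word_mask(tokens, mask_prob=0.15, seed=0):
    """Mask all WordPiece pieces of a word together, or none of them."""
    rng = random.Random(seed)

    # Group token indices into words: a "##" piece continues the previous word.
    groups = []
    for i, tok in enumerate(tokens):
        if tok.startswith("##") and groups:
            groups[-1].append(i)
        else:
            groups.append([i])

    masked = list(tokens)
    for group in groups:
        # One random draw per word, applied to every piece of that word.
        if rng.random() < mask_prob:
            for i in group:
                masked[i] = MASK
    return masked


tokens = ["the", "phil", "##har", "##monic", "played", "well"]
print(whole_word_mask(tokens, mask_prob=0.5, seed=1))
```

Because the random draw is made once per word rather than once per piece, a word is never left partially visible.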

RoBERTa: A Robustly Optimized BERT Pretraining Approach

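- Removes the NSP task;
- Uses dynamic masking, so the masked positions change each time a sequence is seen;
- Adjusts other aspects of the training setup (such as larger batches, more data, and longer training).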
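
The difference between static and dynamic masking can be sketched as follows. This is a simplified contrast under stated assumptions: every selected token is replaced by [MASK] (the 80/10/10 replacement rule is omitted), "static" is modeled as a single mask fixed during preprocessing, and the function names are hypothetical.

```python
import random

MASK = "[MASK]"


def mask_tokens(tokens, rng, mask_prob=0.15):
    """Replace each token by [MASK] independently with probability mask_prob."""
    return [MASK if rng.random() < mask_prob else tok for tok in tokens]


def static_masking(tokens, num_epochs, seed=0, mask_prob=0.15):
    """Mask once during preprocessing and reuse the same result every epoch."""
    masked = mask_tokens(tokens, random.Random(seed), mask_prob)
    return [list(masked) for _ in range(num_epochs)]


def dynamic_masking(tokens, num_epochs, seed=0, mask_prob=0.15):
    """Draw a fresh mask each time the sequence is fed to the model."""
    rng = random.Random(seed)
    return [mask_tokens(tokens, rng, mask_prob) for _ in range(num_epochs)]


tokens = "the quick brown fox jumps over the lazy dog".split()
for view in dynamic_masking(tokens, num_epochs=3, mask_prob=0.3):
    print(" ".join(view))
```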

ERNIE: Enhanced Representation through Knowledge Integration

Incorporates external knowledge into the pretrained model.

ERNIE 2.0: A Continual Pre-training Framework for Language Understanding

Multitask learning.

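ELECTRA: Efficiently Learning an Encoder that Classifies Token Replacements Accurately

Proposes a new pretraining task and framework that brings a GAN-like (generative adversarial network) generator-discriminator setup to NLP; unlike a true GAN, the generator is trained with maximum likelihood rather than adversarially.

The generative masked language model (MLM) pretraining task is replaced with the discriminative replaced token detection (RTD) task.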
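
To illustrate what the RTD discriminator is trained to predict, the sketch below builds per-token labels from an original sequence and a corrupted one. The generator is not modeled here: the corrupted sequence is supplied by hand, and the function name `rtd_labels` is a hypothetical helper.

```python
def rtd_labels(original, corrupted):
    """Return 1 for each replaced token and 0 for each token left unchanged.

    A position where the generator happens to sample the original token
    counts as unchanged, so it is labeled 0.
    """
    if len(original) != len(corrupted):
        raise ValueError("sequences must have the same length")
    return [int(o != c) for o, c in zip(original, corrupted)]


original = ["the", "chef", "cooked", "the", "meal"]
corrupted = ["the", "chef", "ate", "the", "meal"]  # hand-written stand-in for generator output
print(rtd_labels(original, corrupted))
```

Every position receives a label, so the discriminator learns from all tokens of the input rather than only from the masked subset.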

https://www.cnblogs.com/sandwichnlp/p/11947627.html

Original post: https://www.cnblogs.com/xiximayou/p/14437866.html