Pytorch Pretrained Bert Study Notes

NLP tasks come up often, and getting reasonably good accuracy on them usually requires a pretrained embedding model.

Reference: GitHub

Install

pip install pytorch-pretrained-bert

Usage

BertTokenizer

BertTokenizer splits the input sentence into tokens so that it can be embedded later.

import torch
from pytorch_pretrained_bert import BertTokenizer, BertModel, BertForMaskedLM

# Load pre-trained model tokenizer (vocabulary)
tokenizer = BertTokenizer.from_pretrained('bert-base-uncased')

# Tokenized input
text = "Who was Jim Henson ? Jim Henson was a puppeteer"
tokenized_text = tokenizer.tokenize(text)

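Words that are not found in the vocabulary are split into subword pieces by greedy longest-match (WordPiece): the tokenizer repeatedly takes the longest piece that is in the vocabulary.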
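
The longest-match splitting described above can be sketched in plain Python. This is only an illustration under stated assumptions: the function name wordpiece_split and the toy vocabulary are made up for this note and are not part of pytorch-pretrained-bert, and the real tokenizer also lowercases text and splits punctuation first.

```python
def wordpiece_split(word, vocab, unk_token="[UNK]", max_chars=100):
    """Greedy longest-match-first split of a single word into subword pieces."""
    if len(word) > max_chars:
        return [unk_token]  # overly long words are treated as unknown
    pieces = []
    start = 0
    while start < len(word):
        end = len(word)
        match = None
        # Shrink the candidate from the right until it is found in the vocabulary.
        while start < end:
            candidate = word[start:end]
            if start > 0:
                candidate = "##" + candidate  # continuation pieces carry a "##" prefix
            if candidate in vocab:
                match = candidate
                break
            end -= 1
        if match is None:
            return [unk_token]  # no piece matches: the whole word becomes unknown
        pieces.append(match)
        start = end
    return pieces


# Toy vocabulary, hypothetical (not the real bert-base-uncased vocabulary).
toy_vocab = {"puppet", "##eer", "jim", "henson", "was", "a"}
print(wordpiece_split("puppeteer", toy_vocab))  # ['puppet', '##eer']
print(wordpiece_split("xyz", toy_vocab))        # ['[UNK]']
```

With this toy vocabulary, "puppeteer" is not present as a whole word, so it falls back to the longest known prefix "puppet" followed by the continuation piece "##eer".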

BertModel

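# Map each token to its index in the vocabulary
indexed_tokens = tokenizer.convert_tokens_to_ids(tokenizer.tokenize(text))

Convert the list above to tensors and pass them to BertModel.

# Add a batch dimension; a single sentence uses segment id 0 for every token
tokens_tensor = torch.tensor([indexed_tokens])
segments_tensors = torch.zeros_like(tokens_tensor)

model = BertModel.from_pretrained('bert-base-uncased')
model.eval()

# Predict hidden states features for each layer
with torch.no_grad():
    encoded_layers, _ = model(tokens_tensor, segments_tensors)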
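
The second argument passed to the model is a tensor of segment ids, which tells BERT which sentence each token belongs to. Below is a minimal sketch of how those ids could be built for a sentence pair, assuming the usual BERT convention of wrapping the input in [CLS] and [SEP] markers. The helper build_pair_inputs is hypothetical and not part of the library, and the token lists are written by hand for illustration.

```python
def build_pair_inputs(tokens_a, tokens_b):
    """Join two token lists BERT-style and return (tokens, segment_ids)."""
    tokens = ["[CLS]"] + tokens_a + ["[SEP]"] + tokens_b + ["[SEP]"]
    # Segment 0 covers [CLS], sentence A and the first [SEP];
    # segment 1 covers sentence B and the final [SEP].
    segment_ids = [0] * (len(tokens_a) + 2) + [1] * (len(tokens_b) + 1)
    return tokens, segment_ids


# Hand-written token lists (assumed tokenizer output, for illustration only).
tokens_a = ["who", "was", "jim", "henson", "?"]
tokens_b = ["jim", "henson", "was", "a", "puppet", "##eer"]
tokens, segment_ids = build_pair_inputs(tokens_a, tokens_b)
print(tokens)
print(segment_ids)
```

The resulting tokens list would then go through tokenizer.convert_tokens_to_ids, and both lists would be wrapped with torch.tensor([...]) to add the batch dimension before calling the model.
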
Original article: https://www.cnblogs.com/TABball/p/13784558.html