学习如何使用GPT2进行文本生成(torch+transformers)

学习如何使用GPT2进行文本生成(torch+transformers)

GPT2是OPen AI发布的一个预训练语言模型,见论文《Language Models are Unsupervised Multitask Learners》,GPT-2利用单向Transformer的优势,做一些BERT使用的双向Transformer所做不到的事。那就是通过上文生成下文文本。
理论部分的文章有很多,这里不做深究,下面直接看代码吧

导入相关包

import torch
from transformers import GPT2Tokenizer, GPT2LMHeadModel

加载tokenizer

tokenizer = GPT2Tokenizer.from_pretrained('gpt2')

编码输入

对给出的文本进行编码,并转换为tensor

indexed_tokens = tokenizer.encode("Xiao Ming is a primary school student. He likes playing games")

print( tokenizer.decode(indexed_tokens))

tokens_tensor = torch.tensor([indexed_tokens])

Xiao Ming is a primary school student. He likes playing games

加载预训练模型(权重)

model = GPT2LMHeadModel.from_pretrained('gpt2')

将模型设置为评估模式

model.eval()
device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")

tokens_tensor = tokens_tensor.to(device)
model.to(device)

预测所有标记

with torch.no_grad():
    outputs = model(tokens_tensor)
    predictions = outputs[0]

得到预测的下一词

predicted_index = torch.argmax(predictions[0, -1, :]).item()
predicted_text = tokenizer.decode(indexed_tokens + [predicted_index])
print(predicted_text)

可以看到,GPT2预测的下一个词是and

Xiao Ming is a primary school student. He likes playing games and

生成一段完整的话

stopids = tokenizer.convert_tokens_to_ids(["."])[0] 
past = None
for i in range(100):
    with torch.no_grad():
        output, past = model(tokens_tensor, past_key_values=past, return_dict=False)

    token = torch.argmax(output[..., -1, :])

    indexed_tokens += [token.tolist()]

    if stopids== token.tolist():
        break
    tokens_tensor = token.unsqueeze(0)
    
sequence = tokenizer.decode(indexed_tokens)

print(sequence)

生成的文本为:and playing with his friends.与给出的句子构成了一段完整的话。

Xiao Ming is a primary school student. He likes playing games and playing with his friends.

试试其他语句

我们将上面的句子加上句号,gpt2会生成一个不一样的句子。
原:Xiao Ming is a primary school student. He likes playing games
现:Xiao Ming is a primary school student. He likes playing games.
生成为:

Xiao Ming is a primary school student. He likes playing games. He is also a member of the team that won the World Cup in 2010.

文章出处登录后可见!

已经登录?立即刷新

共计人评分,平均

到目前为止还没有投票!成为第一位评论此文章。

(0)
心中带点小风骚的头像心中带点小风骚普通用户
上一篇 2023年8月26日
下一篇 2023年8月26日

相关推荐