How to remove <pad> and </s> from JupyterLab print output

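I am using a JupyterLab notebook installed through Anaconda to run a machine learning application. When I run the application, JupyterLab automatically inserts <pad> and </s> tags at the beginning and end of each generated sentence.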

Here is an example:

import re
from transformers import T5Tokenizer, T5ForConditionalGeneration
tweet_data = ['the coming days and weeks especially, it is critical that social media platforms apply their standards in a mann',
 'With just 2 days to go, what does my timeline think about the #USElections2020', '..more data here']
model = T5ForConditionalGeneration.from_pretrained('t5-base')
tokenizer = T5Tokenizer.from_pretrained('t5-base')
text = " ".join(tweet_data)
TEXT_CLEANING_RE = r"@\S+|https?:\S+|http?:\S|[^A-Za-z0-9]+"
text = re.sub(TEXT_CLEANING_RE, ' ', str(text).lower()).strip()
Preprocessed_text = "summarize: "+text
tokens_input = tokenizer.encode(Preprocessed_text,return_tensors="pt", max_length=512, truncation=True)
summary_ids = model.generate(tokens_input, min_length=60, max_length=180, length_penalty=4.0)
summary = tokenizer.decode(summary_ids[0])
print(summary)

Here is the output:

<pad> srpoll: joebiden elections2020: joebiden of equality free.</s>

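How can I make sure <pad> and </s> do not appear in the printed output? The application is user-facing, so the tags could hurt the user experience if they show up.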

I tried removing them as strings, but without success.

Original link: https://stackoverflow.com//questions/71671769/how-to-remove-pad-and-s-on-jupyterlab-print-output

Answers

  • Sadra Naddaf commented:

    According to the documentation, you can skip special tokens by setting the flag skip_special_tokens=True (it defaults to False). So just change the decode line to:

    summary = tokenizer.decode(summary_ids[0],skip_special_tokens=True)
    

    Output:

    social media platforms should use their standards in mann with just 2 days to go. what does my timeline think about the uselections2020 more data here. the uselections2020 data is a mann with just 2 days to go. the uselections2020 data is a mann with just 2 days to go.
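
If the text has already been decoded with the special tokens in it (for example, it comes from a stored result), a plain string cleanup may work as a fallback. The sketch below is only an illustration: the helper name strip_special_tokens and the hard-coded token list are assumptions made here, not part of the transformers API. Passing skip_special_tokens=True to decode remains the simpler route.

```python
import re

# Assumed list of special tokens to strip. For T5 the pad, end-of-sequence
# and unknown tokens are "<pad>", "</s>" and "<unk>". With a real tokenizer,
# the list could instead be read from tokenizer.all_special_tokens.
SPECIAL_TOKENS = ["<pad>", "</s>", "<unk>"]

def strip_special_tokens(text, special_tokens=SPECIAL_TOKENS):
    """Hypothetical helper: remove literal special-token strings from decoded text."""
    # Escape each token so characters like "/" and "<" are matched literally.
    pattern = "|".join(re.escape(tok) for tok in special_tokens)
    cleaned = re.sub(pattern, " ", text)
    # Collapse the whitespace left behind by the removed tokens.
    return re.sub(r"\s+", " ", cleaned).strip()

raw = "<pad> srpoll: joebiden elections2020: joebiden of equality free.</s>"
print(strip_special_tokens(raw))
```

Escaping the tokens with re.escape keeps them from being read as regular-expression syntax, which is one common reason a hand-written pattern fails to match.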
    