How to remove <pad> and </s> on JupyterLab print output

青葱年少 · 2 years ago · machine-learning · 209


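I am using a JupyterLab notebook installed through Anaconda to run a machine learning application. When I run the application, JupyterLab automatically inserts <pad> and </s> tags at the start and end of every generated sentence.

Here is an example: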

import re
from transformers import T5Tokenizer, T5ForConditionalGeneration
tweet_data = ['the coming days and weeks especially, it is critical that social media platforms apply their standards in a mann',
 'With just 2 days to go, what does my timeline think about the #USElections2020', '..more data here']
model = T5ForConditionalGeneration.from_pretrained('t5-base')
tokenizer = T5Tokenizer.from_pretrained('t5-base')
text = " ".join(tweet_data)
TEXT_CLEANING_RE = r"@\S+|https?:\S+|http?:\S|[^A-Za-z0-9]+"
text = re.sub(TEXT_CLEANING_RE, ' ', str(text).lower()).strip()
Preprocessed_text = "summarize: "+text
tokens_input = tokenizer.encode(Preprocessed_text,return_tensors="pt", max_length=512, truncation=True)
summary_ids = model.generate(tokens_input, min_length=60, max_length=180, length_penalty=4.0)
summary = tokenizer.decode(summary_ids[0])
print(summary)

This is the output:

<pad> srpoll: joebiden elections2020: joebiden of equality free.</s>

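How can I make sure <pad> and </s> do not appear in the printed output? The application is user-facing, so the tags could hurt the user experience if they show up.

I tried removing them as strings, but I had no success.

Original link: https://stackoverflow.com//questions/71671769/how-to-remove-pad-and-s-on-jupyterlab-print-output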


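Answer by Sadra Naddaf:

These markers are special tokens produced by the T5 tokenizer when it decodes the generated IDs; JupyterLab does not insert them. According to the documentation, you can skip special tokens by setting the flag skip_special_tokens=True (it defaults to False). So just change the decode line to: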

summary = tokenizer.decode(summary_ids[0], skip_special_tokens=True)

Output:

social media platforms should use their standards in mann with just 2 days to go. what does my timeline think about the uselections2020 more data here. the uselections2020 data is a mann with just 2 days to go. the uselections2020 data is a mann with just 2 days to go.
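
If only the already decoded string is available (for example, it was saved before the decode call was changed), the markers can also be stripped as plain text. The following is a minimal sketch, not part of the transformers API: the helper name strip_special_tokens is hypothetical, and it assumes the only special tokens that can appear are <pad>, </s>, <unk>, and T5 sentinel tokens of the form <extra_id_N>.

```python
import re

# Assumption: these are the only special-token markers present in the text.
# The pattern is hand-written for illustration, not read from the tokenizer.
SPECIAL_TOKEN_RE = re.compile(r"<pad>|</s>|<unk>|<extra_id_\d+>")


def strip_special_tokens(decoded: str) -> str:
    """Remove special-token markers from an already decoded string.

    Hypothetical helper for illustration only.
    """
    without_tokens = SPECIAL_TOKEN_RE.sub(" ", decoded)
    # Collapse the whitespace left behind by the removed markers.
    return " ".join(without_tokens.split())


raw = "<pad> srpoll: joebiden elections2020: joebiden of equality free.</s>"
print(strip_special_tokens(raw))
# srpoll: joebiden elections2020: joebiden of equality free.
```

When the token IDs are still at hand, passing skip_special_tokens=True to decode is likely the simpler and more robust choice, because it relies on the tokenizer's own list of special tokens instead of a hand-written pattern.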
