How to remove <pad> and </s> from JupyterLab print output
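I'm using a JupyterLab notebook installed via Anaconda to run a machine-learning application. When I run the application, JupyterLab automatically inserts <pad> and </s> at the beginning and end of every generated sentence.
Here is an example: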
import re
from transformers import T5Tokenizer, T5ForConditionalGeneration

tweet_data = ['the coming days and weeks especially, it is critical that social media platforms apply their standards in a mann',
              'With just 2 days to go, what does my timeline think about the #USElections2020', '..more data here']

model = T5ForConditionalGeneration.from_pretrained('t5-base')
tokenizer = T5Tokenizer.from_pretrained('t5-base')

# Join the tweets and replace mentions, URLs and non-alphanumeric runs with spaces
text = " ".join(tweet_data)
TEXT_CLEANING_RE = r"@\S+|https?:\S+|http?:\S|[^A-Za-z0-9]+"  # raw string so "\S" is not a string escape
text = re.sub(TEXT_CLEANING_RE, ' ', str(text).lower()).strip()

# T5 selects the task from a text prefix
Preprocessed_text = "summarize: " + text
tokens_input = tokenizer.encode(Preprocessed_text, return_tensors="pt", max_length=512, truncation=True)
summary_ids = model.generate(tokens_input, min_length=60, max_length=180, length_penalty=4.0)
summary = tokenizer.decode(summary_ids[0])
print(summary)
This is the output:
<pad> srpoll: joebiden elections2020: joebiden of equality free.</s>
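The text-cleaning step in the code above can be checked on its own, without downloading the model. A minimal sketch using the question's pattern and sample tweets (the helper name clean_text is hypothetical) that prints the exact string passed to the tokenizer:

```python
import re

# Pattern from the question, written as a raw string so that "\S" reaches
# the regex engine unchanged instead of being parsed as a string escape.
TEXT_CLEANING_RE = r"@\S+|https?:\S+|http?:\S|[^A-Za-z0-9]+"

tweet_data = [
    "the coming days and weeks especially, it is critical that social media platforms apply their standards in a mann",
    "With just 2 days to go, what does my timeline think about the #USElections2020",
    "..more data here",
]

def clean_text(tweets):
    """Join the tweets, lowercase them and replace every match of the
    pattern (mentions, URLs, runs of non-alphanumeric characters) with a space."""
    text = " ".join(tweets)
    return re.sub(TEXT_CLEANING_RE, " ", text.lower()).strip()

print("summarize: " + clean_text(tweet_data))
```

Commas, dots and the # become single spaces, and no <pad> or </s> appears in this string, so the markers are introduced later, when the generated ids are decoded.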
How can I make sure that <pad> and </s> do not appear in the output?
I tried removing them as strings, but without success.
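For comparison, if only the decoded string is available, here is a minimal sketch of removing the two markers seen in the output above with plain string operations. It assumes <pad> and </s> are the only special tokens that occur; the helper name strip_markers is hypothetical:

```python
# Assumption: these are the only special-token strings in the decoded text.
SPECIAL_MARKERS = ("<pad>", "</s>")

def strip_markers(decoded: str) -> str:
    """Delete every occurrence of the marker strings, then collapse runs of
    whitespace (including newlines) into single spaces."""
    for marker in SPECIAL_MARKERS:
        decoded = decoded.replace(marker, "")
    return " ".join(decoded.split())

print(strip_markers("<pad> srpoll: joebiden elections2020: joebiden of equality free.</s>"))
```

With a real tokenizer, the marker list could come from tokenizer.all_special_tokens instead of being hard-coded, although skipping the tokens at decode time avoids this post-processing entirely.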
Answer from Sadra Naddaf:
According to the documentation, you can skip special tokens by setting the flag skip_special_tokens=True (it defaults to False). So just change the decode line to:
summary = tokenizer.decode(summary_ids[0], skip_special_tokens=True)
Output:
social media platforms should use their standards in mann with just 2 days to go. what does my timeline think about the uselections2020 more data here. the uselections2020 data is a mann with just 2 days to go. the uselections2020 data is a mann with just 2 days to go.
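Why the markers appear at all: for T5, generate() starts the decoder from the pad token (T5's decoder start token) and normally ends the sequence with the end-of-sequence token </s>, so both are part of summary_ids. The toy sketch below is not the transformers implementation; it only illustrates what skipping special tokens amounts to, namely dropping the ids of special tokens before turning the remaining ids back into text. Ids 0 and 1 match T5's pad and </s> ids; the other vocabulary entries and the toy_decode function are invented for illustration:

```python
# Toy vocabulary: 0 and 1 mirror T5's <pad> and </s> ids; the rest is made up.
VOCAB = {0: "<pad>", 1: "</s>", 10: "social", 11: "media", 12: "platforms"}
SPECIAL_IDS = {0, 1}

def toy_decode(ids, skip_special_tokens=False):
    """Map ids to tokens, optionally dropping special-token ids first."""
    tokens = [VOCAB[i] for i in ids
              if not (skip_special_tokens and i in SPECIAL_IDS)]
    return " ".join(tokens)

ids = [0, 10, 11, 12, 1]  # shaped like a generated sequence: <pad> ... </s>
print(toy_decode(ids))                            # <pad> social media platforms </s>
print(toy_decode(ids, skip_special_tokens=True))  # social media platforms
```

The real tokenizer also merges SentencePiece pieces without adding spaces (hence "free.</s>" in the question), which this toy join with spaces does not attempt to reproduce.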