Removing negation words from the list of default stop words – Spacy [duplicate]

心中带点小风骚 2 years ago nlp 577

Original title: Removing negation words from the list of default stop words – Spacy [duplicate]

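What is the best way to add or remove stop words with spaCy? I am using the token.is_stop function and would like to make some custom changes to the set. I looked through the documentation but could not find anything about stop words. Thanks!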

Original link: https://stackoverflow.com//questions/71673639/removing-negation-words-from-the-list-of-default-stop-words-spacy


Romain answered:
With spaCy 2.0.11, you can update its stop word set in one of the following ways:

To add a single stop word:
```
import spacy
# "en" was a shortcut link in spaCy v2; v3 removed shortcuts, so load the package name
nlp = spacy.load("en_core_web_sm")
nlp.Defaults.stop_words.add("my_new_stopword")
```
To add several stop words at once:
```
import spacy
nlp = spacy.load("en_core_web_sm")
nlp.Defaults.stop_words |= {"my_new_stopword1", "my_new_stopword2"}
```
To remove a single stop word:
```
import spacy
nlp = spacy.load("en_core_web_sm")
# set.remove raises KeyError if the word is not in the set
nlp.Defaults.stop_words.remove("whatever")
```
To remove several stop words at once:
```
import spacy
nlp = spacy.load("en_core_web_sm")
nlp.Defaults.stop_words -= {"whatever", "whenever"}
```
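In the versions these answers cover, nlp.Defaults.stop_words is a regular Python set, so the four snippets above follow ordinary set semantics. One practical difference: remove() raises KeyError when the word is missing, whereas discard() and -= silently skip absent words. This plain-Python sketch uses a small hypothetical set in place of spaCy's defaults, so it runs without spaCy or a model installed.

```python
# Hypothetical stand-in for nlp.Defaults.stop_words (a plain Python set)
stop_words = {"the", "a", "whatever", "whenever", "not"}

# Add one word / several words
stop_words.add("my_new_stopword")
stop_words |= {"my_new_stopword1", "my_new_stopword2"}

# Remove one word / several words
stop_words.remove("whatever")
stop_words -= {"whenever", "not_in_the_set"}  # -= ignores words that are absent

# remove() raises KeyError for a missing word; discard() does not
try:
    stop_words.remove("whatever")  # already removed above
except KeyError:
    print("remove() raised KeyError")
stop_words.discard("whatever")  # no error

print(sorted(stop_words))
```

If the same cell may run more than once (for example in a notebook), discard() is the safer choice for single-word removals.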
Note: to see the current set of stop words, use:
```
print(nlp.Defaults.stop_words)
```
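Update: as pointed out in the comments, this change only affects the current run. To update the model itself, you can use nlp.to_disk("/path") and nlp.from_disk("/path") (described further at https://spacy.io/usage/saving-loading).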
2 years ago 0 comments

dantiston answered:

This answer was accepted.

You can edit them before processing your text (see this post):

```
>>> import spacy
>>> nlp = spacy.load("en")
>>> nlp.vocab["the"].is_stop = False
>>> nlp.vocab["definitelynotastopword"].is_stop = True
>>> sentence = nlp("the word is definitelynotastopword")
>>> sentence[0].is_stop
False
>>> sentence[3].is_stop
True
```

Note: this seems to work for versions <= v1.8. For newer versions, see the other answers.


petezurich answered:

For version 2.0, I used this:

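```
import spacy
from spacy.lang.en.stop_words import STOP_WORDS

nlp = spacy.load("en_core_web_sm")

print(STOP_WORDS) # <- set of Spacy's default stop words

STOP_WORDS.add("your_additional_stop_word_here")

# Copy the (possibly edited) set onto each lexeme's is_stop flag
for word in STOP_WORDS:
    lexeme = nlp.vocab[word]
    lexeme.is_stop = True
```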

This loads all the default stop words into a set.

You can modify the stop words in STOP_WORDS, or use your own list from the start.
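For the goal in the question title (keeping negations), "use your own list" can be as simple as taking the default set and subtracting the negation words. The sketch below is plain Python under stated assumptions: DEFAULT_STOP_WORDS is a small hypothetical stand-in for spaCy's STOP_WORDS, the NEGATIONS list is illustrative rather than exhaustive, and the text is split on whitespace instead of going through spaCy's tokenizer.

```python
# Hypothetical stand-in for spaCy's STOP_WORDS; the real set is much larger
DEFAULT_STOP_WORDS = {"the", "is", "a", "not", "no", "never", "this", "n't"}

# Illustrative (not exhaustive) negation words to keep as content words
NEGATIONS = {"not", "no", "never", "n't", "nor", "none"}

# Set difference builds a new set, so the shared default set is left untouched
custom_stop_words = DEFAULT_STOP_WORDS - NEGATIONS


def remove_stop_words(text, stop_words):
    """Drop stop words from whitespace-separated tokens (case-insensitive)."""
    return [tok for tok in text.split() if tok.lower() not in stop_words]


print(remove_stop_words("This movie is not a good one", DEFAULT_STOP_WORDS))
print(remove_stop_words("This movie is not a good one", custom_stop_words))
```

With the default set, "not" is filtered out and the sentence loses its polarity; with the custom set it survives, which is usually what sentiment-style tasks need.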


harryhorn answered:

For 2.0, use the following:

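```
import spacy
nlp = spacy.load("en_core_web_sm")
# Sync each lexeme's is_stop flag with the default stop word set
for word in nlp.Defaults.stop_words:
    lex = nlp.vocab[word]
    lex.is_stop = True
```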
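Why loop over the vocabulary at all? A simplified mental model, assumed here rather than taken from spaCy's source: each lexeme computes its is_stop flag once, when it is created, from the stop word set at that moment. Editing the set afterwards then leaves existing lexemes with stale flags, which the loops above repair. The Lexeme and Vocab classes below are hypothetical toys that only illustrate this caching behaviour; they are not spaCy's classes.

```python
class Lexeme:
    """Toy lexeme: the stop flag is computed once, when the lexeme is created."""

    def __init__(self, text, stop_words):
        self.text = text
        self.is_stop = text.lower() in stop_words


class Vocab:
    """Toy vocabulary that creates and caches one Lexeme per string."""

    def __init__(self, stop_words):
        self.stop_words = stop_words  # reference to the set, not a copy
        self.lexemes = {}

    def __getitem__(self, text):
        if text not in self.lexemes:
            self.lexemes[text] = Lexeme(text, self.stop_words)
        return self.lexemes[text]


stop_words = {"the", "a"}
vocab = Vocab(stop_words)
print(vocab["the"].is_stop)  # True: lexeme created now and cached

stop_words.remove("the")  # edit the set after the lexeme already exists
print(vocab["the"].is_stop)  # True: the cached flag is now stale

# Re-sync every cached flag with the current set
for word in vocab.lexemes:
    vocab[word].is_stop = word.lower() in stop_words
print(vocab["the"].is_stop)  # False
```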


SolitaryReaper answered:

This also gets you the stop words :)

```
import spacy.lang.en.stop_words
spacy_stopwords = spacy.lang.en.stop_words.STOP_WORDS
```

Sezin answered:
In recent versions, the following removes a word from the list:
```
import spacy.lang.en.stop_words

spacy_stopwords = spacy.lang.en.stop_words.STOP_WORDS
spacy_stopwords.remove('not')  # raises KeyError if 'not' was already removed
```
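Sezin's snippet mutates the set in place. Assuming nlp.Defaults.stop_words for English refers to this same STOP_WORDS object (this is how spaCy's English defaults appear to be wired, but treat it as an assumption), the removal is visible to every English pipeline in the running process, and calling remove('not') a second time raises KeyError. The plain-Python sketch below models this with two names bound to one hypothetical set, then shows the copy-based alternative.

```python
# Hypothetical stand-ins: a module-level default set and a second name for it,
# mimicking a defaults object that references the very same set
STOP_WORDS = {"the", "not", "no", "is"}
defaults_stop_words = STOP_WORDS  # same object, not a copy

STOP_WORDS.remove("not")  # in-place mutation...
print("not" in defaults_stop_words)  # ...is visible through every alias: False

# Running the same removal twice (e.g. re-executing a notebook cell) fails
try:
    STOP_WORDS.remove("not")
except KeyError:
    print("second remove() raised KeyError")
STOP_WORDS.discard("not")  # idempotent alternative

# To keep the shared defaults intact, derive a new set instead
my_stop_words = STOP_WORDS - {"no"}  # new set; STOP_WORDS is unchanged
print("no" in STOP_WORDS, "no" in my_stop_words)  # True False
```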
Joe answered:
For version 2.3.0, if you want to replace the whole list rather than add or remove a few stop words, you can do this:
```
import spacy

custom_stop_words = set(['the', 'and', 'a'])

# First override the stop words set for the language
cls = spacy.util.get_lang_class('en')
cls.Defaults.stop_words = custom_stop_words

# Now load your model
nlp = spacy.load('en_core_web_md')
```
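The trick is to assign the stop word set to the language class before loading the model. This also ensures that any uppercase/lowercase variants of the stop words are treated as stop words.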
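The case-insensitivity mentioned above is consistent with a stop check that lowercases the token text before looking it up, so only lowercase entries need to be stored in the set. The sketch below assumes exactly that behaviour; is_stop_word is a hypothetical helper, not a spaCy function, and lowercasing the custom list up front is likewise an assumption about how you would build it.

```python
def is_stop_word(text, stop_words):
    """Case-insensitive membership test: lowercase the token, then look it up."""
    return text.lower() in stop_words


# Normalize a custom list to lowercase so the lookup above can match it
custom_stop_words = {w.lower() for w in ["The", "and", "A"]}

for token in ["the", "The", "THE", "And", "an"]:
    print(token, is_stop_word(token, custom_stop_words))
```

Storing mixed-case entries would break this scheme: a set containing only "The" would never match, because the lookup always uses the lowercased form.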