How do I count the nouns spaCy finds in a dataframe column?

Posted by 扎眼的阳光, 2 years ago, tag: nlp, 414 views

Original title: How to count the number of nouns from Spacy from a dataframe column?

I have a dataframe like this (as an example).

text
I left the country.
Andrew is from America and he loves apples.

I want to add a new column holding the number of nouns, where spaCy counts the tokens whose POS tag is NOUN. How can I do this in Python?

import pandas as pd
import spacy

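# the dataframe (built from the example rows above)
df = pd.DataFrame({"text": ["I left the country.",
                            "Andrew is from America and he loves apples."]})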

# NLP Spacy with POS tags
nlp = spacy.load("en_core_web_sm")

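My question is: how do I apply nlp to the "text" column, check whether each token's pos is NOUN, count those tokens, and store the count as a feature?

Thanks!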

Original link: https://stackoverflow.com//questions/71664985/how-to-count-the-number-of-nouns-from-spacy-from-a-dataframe-column


I'mahdi answered

You can use apply in pandas, like this:

import spacy
import pandas as pd
import collections

sp = spacy.load("en_core_web_sm")
df = pd.DataFrame({'text':['I left the country and city', 
                           'Andrew is from America and he loves apples and bananas']})

# >>> df
#     text
# 0   I left the country and city
# 1   Andrew is from America and he loves apples and bananas

def count_noun(x):
    res = [token.pos_ for token in sp(x)]
    return collections.Counter(res)['NOUN']

df['C_NOUN'] = df['text'].apply(count_noun)
print(df)

Output:

                                                     text     C_NOUN
0                             I left the country and city     2
1  Andrew is from America and he loves apples and bananas     2

If you want both the list of nouns and the count, you can try the following:

def count_noun(x):
    nouns = [token.text for token in sp(x) if token.pos_=='NOUN']
    return [nouns, len(nouns)]

df[['list_NOUN','C_NOUN']] = pd.DataFrame(df['text'].apply(count_noun).tolist())
print(df)

Output:

                             text          list_NOUN    C_NOUN
0     I left the country and city    [country, city]    2
1   Andrew ... apples and bananas  [apples, bananas]    2
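
For larger dataframes, calling the pipeline once per row through apply can be slow. The following is a minimal sketch, not part of the original answer, that streams the column through `nlp.pipe` instead. The names `count_pos` and `demo` are hypothetical helpers introduced for illustration, and `nlp` is assumed to be a loaded spaCy pipeline such as `en_core_web_sm`. Note that spaCy tags proper nouns such as "Andrew" and "America" as PROPN rather than NOUN, which is why they are not counted above; the `tags` parameter lets you include them if that is what you want.

```python
def count_pos(texts, nlp, tags=("NOUN",)):
    """Return, for each text, how many tokens carry one of the given coarse POS tags.

    `nlp` is assumed to be a loaded spaCy pipeline; only its `pipe` method is used.
    """
    wanted = set(tags)
    counts = []
    # nlp.pipe processes the texts as a stream instead of one call per row.
    for doc in nlp.pipe(texts):
        counts.append(sum(1 for token in doc if token.pos_ in wanted))
    return counts


def demo():
    # Imports live here so count_pos itself has no hard dependency on pandas.
    import pandas as pd
    import spacy

    nlp = spacy.load("en_core_web_sm")
    df = pd.DataFrame({"text": ["I left the country and city",
                                "Andrew is from America and he loves apples and bananas"]})
    # Common nouns only, matching the C_NOUN column above.
    df["C_NOUN"] = count_pos(df["text"], nlp)
    # Common and proper nouns together (hypothetical extra column).
    df["C_NOUN_PROPN"] = count_pos(df["text"], nlp, tags=("NOUN", "PROPN"))
    print(df)
```

Calling `demo()` requires pandas, spaCy, and the `en_core_web_sm` model to be installed. The counts are returned as a plain list in row order, so they can be assigned directly to a new column.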

2 years ago, 0 comments

Talha Tayyab answered

First, I create a demo dataframe:

import spacy
import pandas as pd
nlp = spacy.load("en_core_web_sm")
df = pd.DataFrame([["I left the country"],["Andrew is from America and he loves apples."]],columns=["text"])

It looks like this:

                                          text
0                           I left the country
1  Andrew is from America and he loves apples.

m = []   # collects the noun-chunk texts from every row
for text in df['text']:   # works for any number of rows in the dataframe
  doc = nlp(text)   # apply nlp to each row of the text column
  for n in doc.noun_chunks:   # noun_chunks yields noun phrases, not single NOUN tokens
    m.append(n.text)
print(m)
print(len(m))   # total number of noun chunks across all rows (not a per-row NOUN count)
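
Note that `doc.noun_chunks` yields base noun phrases, which can be headed by pronouns or proper nouns as well as common nouns, so this total need not match the number of tokens tagged NOUN. As a hedged sketch (the helper names `noun_stats` and `demo_chunks` are hypothetical and not from the original answer), the two counts can be put side by side per row:

```python
def noun_stats(doc):
    """Return (number of NOUN tokens, number of noun chunks) for one parsed doc.

    `doc` is assumed to be a spaCy Doc from a pipeline with a tagger and a parser.
    """
    n_tokens = sum(1 for token in doc if token.pos_ == "NOUN")
    n_chunks = sum(1 for _ in doc.noun_chunks)
    return n_tokens, n_chunks


def demo_chunks():
    # Imports live here so noun_stats itself has no hard dependency on pandas.
    import pandas as pd
    import spacy

    nlp = spacy.load("en_core_web_sm")
    df = pd.DataFrame([["I left the country"],
                       ["Andrew is from America and he loves apples."]],
                      columns=["text"])
    stats = [noun_stats(doc) for doc in nlp.pipe(df["text"])]
    # One row of stats per row of text, aligned on the original index.
    df = df.join(pd.DataFrame(stats, columns=["C_NOUN", "C_NOUN_CHUNKS"], index=df.index))
    print(df)
```

Running `demo_chunks()` needs pandas, spaCy, and the `en_core_web_sm` model. Keeping both columns makes it easy to see which rows differ and to pick the count that fits the feature you actually want.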


2 years ago, 0 comments
