Sentiment analysis of a dataframe using if else statements

xiaoxingxing nlp 207

Original title: sentiment analysis of a dataframe using if else statements

I extracted the adjectives with this function:

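from textblob import TextBlob

def getAdjectives(text):
    # Keep only the words that TextBlob's part-of-speech tagger labels "JJ" (adjective)
    blob = TextBlob(text)
    return [word for (word, tag) in blob.tags if tag == "JJ"]

dataset['adjectives'] = dataset['text'].apply(getAdjectives)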

I built the dataframe from a JSON file with this code:

import json
import pandas as pd

with open('reviews.json') as project_file:
    data = json.load(project_file)
dataset = pd.json_normalize(data)
print(dataset.head())

I have already run sentiment analysis on the dataframe with this code:

dataset[['polarity', 'subjectivity']] = dataset['text'].apply(lambda text: pd.Series(TextBlob(text).sentiment))
print(dataset[['adjectives', 'polarity']])

This is the output:


                                          adjectives  polarity
0                                                 []  0.333333
1  [right, mad, full, full, iPad, iPad, bad, diff...  0.209881
2                             [stop, great, awesome]  0.633333
3                                          [awesome]  0.437143
4                        [max, high, high, Gorgeous]  0.398333
5                                     [decent, easy]  0.466667
6  [it’s, bright, wonderful, amazing, full, few...  0.265146
7                                       [same, same]  0.000000
8         [old, little, Easy, daily, that’s, late]  0.161979
9                       [few, huge, storage.If, few]  0.084762

I am trying to filter the adjectives to find which have positive, neutral, and negative polarity, using this code:

if dataset['polarity'] > 0:
    print(dataset[['adjectives', 'polarity']], "Positive")
elif dataset['polarity'] == 0:
    print(dataset[['adjectives', 'polarity']], "Neutral")
else:
    print(dataset[['adjectives', 'polarity']], "Negative")

I get this error:

The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().

Please help.

Original link: https://stackoverflow.com//questions/71514314/sentiment-analysis-of-a-dataframe-using-if-else-statements

Answers

  • richardec commented:

    Try using np.select to determine the sentiment from the polarity:

    import numpy as np

    # Conditions are checked in order; rows matching neither get the default
    dataset['sentiment'] = np.select(
        [
            dataset['polarity'] > 0,
            dataset['polarity'] == 0
        ],
        [
            "Positive",
            "Neutral"
        ],
        default="Negative"
    )
    

    As a one-liner:

    dataset['sentiment'] = np.select([dataset['polarity'] > 0, dataset['polarity'] == 0], ["Positive", "Neutral"], "Negative")
    
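A minimal, self-contained sketch of the np.select approach, using made-up polarity values in place of the real TextBlob scores. The helper label_polarity is a hypothetical name introduced here only for comparison: it suggests that ordinary if/else does work once it is applied to one value at a time through Series.apply, rather than to the whole column.

```python
import numpy as np
import pandas as pd

# Made-up polarity scores standing in for the TextBlob output (assumption).
dataset = pd.DataFrame({"polarity": [0.333333, 0.0, -0.25, 0.084762]})

# np.select checks the conditions element-wise, in order, and uses the
# choice that belongs to the first condition that is True for each row.
conditions = [dataset["polarity"] > 0, dataset["polarity"] == 0]
choices = ["Positive", "Neutral"]
dataset["sentiment"] = np.select(conditions, choices, default="Negative")


def label_polarity(polarity):
    """Hypothetical helper: classify a single float with plain if/else."""
    if polarity > 0:
        return "Positive"
    elif polarity == 0:
        return "Neutral"
    return "Negative"


# Row-wise alternative: if/else is fine here because each call receives
# one scalar, not a whole Series.
dataset["sentiment_apply"] = dataset["polarity"].apply(label_polarity)

print(dataset)
```

Both columns should agree; np.select is usually the faster choice on large frames because it avoids a Python-level call per row.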
  • Timus commented:

    If you only want to print the corresponding parts of dataset:

    print('Positive:')
    print(dataset.loc[dataset['polarity'] > 0, ['adjectives', 'polarity']])
    print('Neutral:')
    print(dataset.loc[dataset['polarity'] == 0, ['adjectives', 'polarity']])
    print('Negative:')
    print(dataset.loc[dataset['polarity'] < 0, ['adjectives', 'polarity']])
    

    For details, see boolean indexing in the pandas documentation.

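To see why the original if statement fails while the boolean-indexing version works, the following sketch reproduces the error on a small made-up frame. The rows are illustrative and not taken from reviews.json, and the variable names is_positive, error_message, and positive_rows are assumptions introduced for this example.

```python
import pandas as pd

# Made-up rows standing in for the question's dataset (assumption).
dataset = pd.DataFrame({
    "adjectives": [["great", "awesome"], ["same", "same"], ["bad"]],
    "polarity": [0.633333, 0.0, -0.7],
})

# Comparing a column yields one True/False per row, not a single bool.
is_positive = dataset["polarity"] > 0
print(is_positive.tolist())

# `if` needs exactly one truth value, so handing it a multi-row Series
# raises the ValueError quoted in the question.
error_message = None
try:
    if is_positive:
        pass
except ValueError as err:
    error_message = str(err)
print("ValueError:", error_message)

# Used as a row selector in .loc, the same mask keeps only matching rows.
positive_rows = dataset.loc[is_positive, ["adjectives", "polarity"]]
print(positive_rows)
```

The mask does the job of the if test, but for every row at once, which is why no explicit branching is needed when printing each group.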