How to count word repetitions, assign a number, and append to a DataFrame

扎眼的阳光 python 250

Original title: how to count the number of repetation of words and assign a number and append into dataframe

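I have a dataset containing all the abstracts and the authors' genders. Now I want to count how often each word occurs for each gender, so that I can plot word counts against gender.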

    import pandas as pd

    data_path = '/content/digitalhumanities - forum-and-fiction.csv'

    def change_table(data_path):
        # load the CSV and drop every column except Gender and Abstract
        df = pd.read_csv(data_path)
        final = df.drop(["Title", "Author", "Season", "Year", "Keywords", "Issue No", "Volume"], axis=1)
        # use Gender as the index
        fin = final.set_index('Gender')
        return fin

    change_table(data_path).T

This is the output I got:
| Gender   | None                                              | Female                                            | Male                                              | None       | None                                  | Male                                              ,Female                                            |None                                              | Male                                             ,Female                                            |
|:----------|---------------------------------------------------|---------------------------------------------------|---------------------------------------------------|------------|---------------------------------------|---------------------------------------------------|---------------------------------------------------|---------------------------------------------------|---------------------------------------------------|---------------------------------------------------:|
| Abstract | This article describes Virginia Woolf's preocc... | The Amazonian region occupies a singular place... | This article examines Kipling's 1901 novel Kim... | Pamela; or | Virtue Rewarded uses a literary fo... | This article examines Nuruddin Farah's 1979 no... | Ecological catastrophe has challenged the cont... | British political fiction was a satirical genr... | The Lydgates have bought too much furniture an... 

Now, how can I get the count of each word in the abstracts per gender and append it to the DataFrame?

Example of the expected output:

| gender   | male | female | none |
|----------|------|--------|------|
| This     | 3    | 0      | 0    |
| occupies | 5    | 3      | 0    |
| examines | 6    | 0      | 0    |
| British  | 0    | 0      | 7    |

Original link: https://stackoverflow.com//questions/71508265/how-to-count-the-number-of-repetation-of-words-and-assign-a-number-and-append-in

Replies

  • jezrael commented:

    Use:

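    # the transpose (.T) is removed: keep Gender as the index
    df = change_table(data_path)
    
    # reshape to one row per word: stack the columns, split each abstract on whitespace, stack again
    # note: after the first stack(), index level 0 holds the gender labels and is named 'Type',
    # while level 1 holds the column name ('Abstract') and is named 'Gender'
    df = (df.stack()
            .rename_axis(('Type','Gender'))
            .str.split(expand=True)
            .stack()
            .reset_index(name='Word'))
    
    # get counts per Gender, Word and Type
    df1 = pd.crosstab([df['Gender'], df['Word']], df['Type']).reset_index()
    
    # or get counts per Word and Type
    df2 = pd.crosstab(df['Word'], df['Type'])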
    
    2 years ago, 0 comments
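
For readers without the original CSV, the word-per-gender count can be tried on a small made-up table. This is only a sketch under stated assumptions: the `toy` DataFrame and its three rows are invented for illustration, and it uses `Series.str.split` with `DataFrame.explode` rather than the double `stack()` from the answer above, so it is a different route to the same kind of Word-by-Gender table.

```python
import pandas as pd

# Hypothetical stand-in for the real dataset: one gender label and one abstract per row.
toy = pd.DataFrame({
    "Gender": ["Male", "Female", "Male"],
    "Abstract": [
        "This article examines Kim",
        "This region occupies a place",
        "This article examines maps",
    ],
})

# Split each abstract on whitespace, then give every word its own row.
words = (toy.assign(Word=toy["Abstract"].str.split())
            .explode("Word")
            .reset_index(drop=True))

# Rows are words, columns are genders, cells are occurrence counts.
counts = pd.crosstab(words["Word"], words["Gender"])
print(counts)
```

Rows whose gender is missing (NaN) are dropped by `pd.crosstab` by default, so filling them with a label such as `"None"` first, for example with `fillna`, would be needed to get a separate column for them.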