How to build a normalized tf dataframe?

xiaoxingxing 2 years ago nlp 186

Original title: How to build a normalized tf dataframe?

I want to apply this formula (shown as an image in the original post) in my tf function, but I can't build the function.
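
The formula image is not reproduced here. Judging from the attempted code and the accepted answer further down, it is presumably the maximum-normalised term frequency. The symbol names are assumptions, not taken from the image: $a$ is the smoothing constant and $f_{t,d}$ is the raw count of term $t$ in document $d$.

```latex
\mathrm{tf}(t, d) = a + (1 - a)\,\frac{f_{t,d}}{\max_{t'} f_{t',d}}
```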

My dataset looks like this (image in the original post).

I tried to build a function like this:

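import pandas as pd

def term_document_matrix(data, vocab_list=None, doc_index='ID', text='text'):
    # One row per vocabulary word, one column per document, initialised to 0.
    tf_matrix = pd.DataFrame(columns=data[doc_index], index=vocab_list).fillna(0)
    # a is a fraction between 0 and 1, so read it as a float rather than an int.
    a = float(input("enter the value"))
    for word in tf_matrix.index:
        for doc in data[doc_index]:
            X = ????????  # unknown: the normalising denominator
            result = a + (1 - a) * (data[data[doc_index] == doc][text].values[0].count(word) / X)
            tf_matrix.loc[word, doc] = result
    return tf_matrix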

But I can't get it fully working.

The parameters are described as follows:

    """
    Parameters:

    data: DataFrame.
    The data against which word frequencies are calculated.

    vocab_list: list of strings.
    Vocabulary of the documents.

    doc_index: str.
    Column name for the document index in the DataFrame passed.

    text: str.
    Column name containing the text of all documents in the DataFrame.

    Returns:

    tf_matrix: DataFrame.
    DataFrame containing the term-document matrix.
    """

My goal is to get a dataframe like this (image in the original post).

Original link: https://stackoverflow.com//questions/71876033/how-to-build-a-normalized-tf-dataframe


Silverstalon commented
This answer has been accepted!

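You can use CountVectorizer to obtain the tf dataframe. Then divide each value by the maximum of its column, and repeat this for every column in the dataframe. Here df is assumed to have terms as rows and documents as columns (the layout from the question), so each column maximum is the largest term count within one document.
```
df_1st = df.apply(lambda col: col / col.max())
```
Then multiply each element of the dataframe by (1 - a) and add the scaler a.
```
# `lambda` is a reserved word in Python and cannot be a variable name, so the scaler is called `a` here.
a = 0.5  # example value only
df_2nd = df_1st.apply(lambda col: a + col * (1 - a))
tf_matrix = df_2nd
```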
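
The two steps above can be put together into a function with the signature from the question. This is only a minimal sketch under stated assumptions, not the answerer's exact code: counting is done with `collections.Counter` on lowercased, whitespace-split tokens instead of CountVectorizer (to keep the example dependency-light), document IDs are assumed to be unique, and the `a` parameter with its default of 0.5 is a hypothetical addition that replaces the `input()` call.

```python
from collections import Counter

import pandas as pd


def term_document_matrix(data, vocab_list=None, doc_index="ID", text="text", a=0.5):
    """Return a terms-by-documents DataFrame of max-normalised term frequencies."""
    # Naive tokenisation (an assumption): lowercase, then split on whitespace.
    # Document IDs are assumed unique; duplicates would overwrite each other.
    counts = {
        doc: Counter(str(body).lower().split())
        for doc, body in zip(data[doc_index], data[text])
    }
    if vocab_list is None:
        # Fall back to every word seen in the corpus, in sorted order.
        vocab_list = sorted({word for c in counts.values() for word in c})

    # Raw counts: one row per term, one column per document.
    tf = pd.DataFrame(
        {doc: [c[word] for word in vocab_list] for doc, c in counts.items()},
        index=vocab_list,
        dtype=float,
    )
    # Step 1: divide every column by its largest count.
    # A document containing no vocabulary word gives 0/0 = NaN, reset to 0 here.
    tf = (tf / tf.max(axis=0)).fillna(0.0)
    # Step 2: rescale every element to a + (1 - a) * value.
    return a + (1 - a) * tf


docs = pd.DataFrame(
    {"ID": ["d1", "d2"], "text": ["apple apple banana", "banana cherry"]}
)
tf_matrix = term_document_matrix(docs, a=0.5)
print(tf_matrix)
```

For this toy input the `d1` column comes out as 1.0 for `apple`, 0.75 for `banana` and 0.5 for `cherry`. Note that, as in the answer, the scaler is added to every element, so a term that does not occur in a document still gets the value `a` rather than 0.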
2 years ago, 0 comments
