Extract multiple arrays from dataframe python


customer| current_state | year  | amount
ax111   |   A           |   3   |  300
ax112   |   D           |   4   |  4890
ax113   |   G           |   9   |  624

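I have a dataframe, and I need to extract the customer data into a list of arrays in which the current state determines the position where the amount is placed. There are 7 states (A-G).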

Sample output

([
    [300,0,0,0,0,0,0],
    [0,0,0,4890,0,0,0],
    [0,0,0,0,0,0,624],
])

I also need to extract the year into a one-dimensional array

year=[3,4,9]

Original link: https://stackoverflow.com//questions/71464135/extract-multiple-arrays-from-dataframe-python

Replies

  • jezrael commented:

    Create a MultiIndex with DataFrame.set_index, reshape with Series.unstack, then add the missing states with DataFrame.reindex:

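    # All seven states, in the order the output columns should appear
    states = list('ABCDEFG')
    # Keep the row index, spread current_state across columns, then add the missing states
    df1 = (df.set_index('current_state', append=True)['amount']
             .unstack(fill_value=0)
             .reindex(states, axis=1, fill_value=0))
    print(df1)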
    
    current_state    A  B  C     D  E  F    G
    0              300  0  0     0  0  0    0
    1                0  0  0  4890  0  0    0
    2                0  0  0     0  0  0  624
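
For anyone who wants to run the reshape above end to end, here is a minimal sketch. It assumes the dataframe is built by hand from the three rows in the question; the name `wide` is only illustrative.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "customer": ["ax111", "ax112", "ax113"],
    "current_state": ["A", "D", "G"],
    "year": [3, 4, 9],
    "amount": [300, 4890, 624],
})

states = list("ABCDEFG")

# Append current_state to the existing row index, move it into columns,
# then reindex so that all seven states appear as columns.
wide = (
    df.set_index("current_state", append=True)["amount"]
      .unstack(fill_value=0)
      .reindex(states, axis=1, fill_value=0)
)

print(wide.columns.tolist())
print(wide.to_numpy().tolist())
```

Because the original row index is kept (`append=True`), every input row stays a separate output row, even when the same customer appears more than once.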
    

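    If the logic is different and you need to pivot per customer, use DataFrame.pivot_table with the sum aggregation, which also handles possible duplicate customer, current_state rows: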

    df1 = (df.pivot_table(index='customer',
                          columns='current_state', 
                          values='amount',
                          fill_value=0, 
                          aggfunc='sum')
             .reindex(states, axis=1, fill_value=0))
    print (df1)
    current_state    A  B  C     D  E  F    G
    customer                                 
    ax111          300  0  0     0  0  0    0
    ax112            0  0  0  4890  0  0    0
    ax113            0  0  0     0  0  0  624
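
To see what the `sum` aggregation does with repeated rows, here is a small sketch. The data is hypothetical (it is not from the question): customer `ax111` appears twice in state `A`, and the names `df_dup` and `per_customer` are made up for the example.

```python
import pandas as pd

# Hypothetical data: ax111 has two rows in state A.
df_dup = pd.DataFrame({
    "customer": ["ax111", "ax111", "ax112"],
    "current_state": ["A", "A", "D"],
    "year": [3, 3, 4],
    "amount": [300, 200, 4890],
})

states = list("ABCDEFG")

# One row per customer; duplicate (customer, current_state) pairs are summed.
per_customer = (
    df_dup.pivot_table(index="customer",
                       columns="current_state",
                       values="amount",
                       fill_value=0,
                       aggfunc="sum")
          .reindex(states, axis=1, fill_value=0)
)

print(per_customer)
```

Here the two `ax111` rows collapse into a single row whose `A` column holds 300 + 200, so the result has two rows rather than three.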
    

    Then convert to lists:

    L = df1.to_numpy().tolist()
    print (L)
    [[300, 0, 0, 0, 0, 0, 0], [0, 0, 0, 4890, 0, 0, 0], [0, 0, 0, 0, 0, 0, 624]]
    
    year = df['year'].tolist()
    print (year)
    [3, 4, 9]
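
If NumPy arrays are wanted instead of lists, one alternative is to fill a zero matrix directly with integer indexing. This is a different technique from the pandas reshape in the answer, shown only as a sketch. It assumes every `current_state` value is one of A-G; the names `cols` and `amounts` are illustrative.

```python
import numpy as np
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "customer": ["ax111", "ax112", "ax113"],
    "current_state": ["A", "D", "G"],
    "year": [3, 4, 9],
    "amount": [300, 4890, 624],
})

states = list("ABCDEFG")

# Map each state letter to its column position (A -> 0, ..., G -> 6).
# Assumption: no state falls outside A-G (unknown values would map to -1).
cols = pd.Categorical(df["current_state"], categories=states).codes

# Start from zeros and write each amount into its (row, state) cell.
amounts = np.zeros((len(df), len(states)), dtype=df["amount"].dtype)
amounts[np.arange(len(df)), cols] = df["amount"].to_numpy()

# The years as a one-dimensional array.
year = df["year"].to_numpy()

print(amounts)
print(year)
```

Like the `set_index`/`unstack` version, this keeps one output row per input row and does not aggregate duplicates.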
    