Extract multiple arrays from dataframe python
customer| current_state | year | amount
ax111 | A | 3 | 300
ax112 | D | 4 | 4890
ax113 | G | 9 | 624
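I have a dataframe and need to extract the customer data into a list of arrays, where the current state determines the position at which the amount is placed. There are 7 states (A-G).
Example output: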
([
[300,0,0,0,0,0,0],
[0,0,0,4890,0,0,0],
[0,0,0,0,0,0,624],
])
I also need to extract the years into a one-dimensional array:
year=[3,4,9]
jezrael commented:
Create a MultiIndex with DataFrame.set_index (using append=True), reshape with Series.unstack, and then add the missing states with DataFrame.reindex:

    states = list('ABCDEFG')

    df1 = (df.set_index('current_state', append=True)['amount']
             .unstack(fill_value=0)
             .reindex(states, axis=1, fill_value=0))
    print(df1)
    current_state    A  B  C     D  E  F    G
    0              300  0  0     0  0  0    0
    1                0  0  0  4890  0  0    0
    2                0  0  0     0  0  0  624
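The set_index / unstack / reindex approach can be tried end to end on the sample data from the question. This is a minimal sketch; building the DataFrame by hand is an assumption about how the data is loaded, not part of the original answer.

```python
import pandas as pd

# Sample data from the question, constructed by hand (assumed construction).
df = pd.DataFrame({
    'customer': ['ax111', 'ax112', 'ax113'],
    'current_state': ['A', 'D', 'G'],
    'year': [3, 4, 9],
    'amount': [300, 4890, 624],
})

states = list('ABCDEFG')

# Append current_state to the row index, move it into the columns,
# then add the states that never occur as all-zero columns.
df1 = (df.set_index('current_state', append=True)['amount']
         .unstack(fill_value=0)
         .reindex(states, axis=1, fill_value=0))

print(df1.to_numpy().tolist())
```

Because the original row index is kept, each input row produces exactly one output row, in the same order as df.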
If the logic is different and (customer, current_state) pairs may be duplicated, pivot per customer with DataFrame.pivot_table and the sum aggregation instead:

    df1 = (df.pivot_table(index='customer',
                          columns='current_state',
                          values='amount',
                          fill_value=0,
                          aggfunc='sum')
             .reindex(states, axis=1, fill_value=0))
    print(df1)
    current_state    A  B  C     D  E  F    G
    customer
    ax111          300  0  0     0  0  0    0
    ax112            0  0  0  4890  0  0    0
    ax113            0  0  0     0  0  0  624
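To see what the sum aggregation does with duplicates, here is a small sketch on hypothetical data in which ax111 appears twice in state A. The extra row is invented for illustration and is not in the question.

```python
import pandas as pd

# Hypothetical data: ax111 has two rows in state A, so their amounts are summed.
df = pd.DataFrame({
    'customer': ['ax111', 'ax111', 'ax112', 'ax113'],
    'current_state': ['A', 'A', 'D', 'G'],
    'year': [3, 3, 4, 9],
    'amount': [300, 50, 4890, 624],
})

states = list('ABCDEFG')

# One output row per customer; duplicate (customer, state) pairs are added up.
df1 = (df.pivot_table(index='customer', columns='current_state',
                      values='amount', fill_value=0, aggfunc='sum')
         .reindex(states, axis=1, fill_value=0))

print(df1.loc['ax111'].tolist())
```

Here the two ax111 rows collapse into a single row holding 350 in column A, whereas the set_index approach would keep them as two separate rows.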
Then convert to lists:

    L = df1.to_numpy().tolist()
    print(L)
    [[300, 0, 0, 0, 0, 0, 0], [0, 0, 0, 4890, 0, 0, 0], [0, 0, 0, 0, 0, 0, 624]]

    year = df['year'].tolist()
    print(year)
    [3, 4, 9]
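One thing to watch with the pivot_table variant: its rows are ordered by customer, so df['year'].tolist() only lines up with L when df has exactly one row per customer in that same order. The sketch below is one possible way to keep the years aligned. It assumes each customer has a single year value (the first one is taken), and the unsorted, duplicated rows are hypothetical.

```python
import pandas as pd

# Hypothetical data: rows are not sorted by customer, and ax111 repeats.
df = pd.DataFrame({
    'customer': ['ax113', 'ax111', 'ax112', 'ax111'],
    'current_state': ['G', 'A', 'D', 'A'],
    'year': [9, 3, 4, 3],
    'amount': [624, 300, 4890, 50],
})

states = list('ABCDEFG')

df1 = (df.pivot_table(index='customer', columns='current_state',
                      values='amount', fill_value=0, aggfunc='sum')
         .reindex(states, axis=1, fill_value=0))

# Assumption: one year per customer, so the first value per group is used.
# Reindexing by df1.index puts the years in the same row order as df1.
year = (df.groupby('customer')['year'].first()
          .reindex(df1.index)
          .tolist())

L = df1.to_numpy().tolist()
print(year)
```

With the set_index / unstack approach this extra step is not needed, since df1 keeps the row order of df.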
2 years ago