问题描述:
I have a df as follows:
df_in
G1 G2 TPE QC
A S1 td 2
A S1 ts 4
A S2 td 6
A S2 ts 3
B S1 td 20
B S1 ts 40
B S2 td 60
B S2 ts 30
C S1 td 90
D S2 ts 7
So the output should be grouped by Columns G1 & G2 and for each such group, do a row wise ratio for the
column QC like (ts/td) where values are td and ts for the column TPE and rename the new variable in column TPE as
ratio. It should also contain the original rows as it is. Also it should be noted that for the TPE column some groups may not have
both ts and td values. In such cases there will be no ratio or the ratio should be kept as blank.
So the output should be this:
df_out
G1 G2 TPE QC
A S1 td 2
A S1 ts 4
A S2 td 6
A S2 ts 3
B S1 td 20
B S1 ts 40
B S2 td 60
B S2 ts 30
C S1 td 90
D S2 ts 7
A S1 ratio 2
A S2 ratio 0.5
B S1 ratio 2
B S2 ratio 0.5
C S1 ratio
D S2 ratio
I tried the following, but its omitting the blank values for group C & D with blank ratios:
def calculate_ratio(group):
td_row = group[group['TPE'] == 'td']
ts_row = group[group['TPE'] == 'ts']
if not td_row.empty and not ts_row.empty:
ratio = ts_row['QC'].values[0] / td_row['QC'].values[0]
return pd.DataFrame({'G1': [group['G1'].iloc[0]],
'G2': [group['G2'].iloc[0]],
'TPE': ['ratio'],
'QC': [ratio]})
return pd.DataFrame()
grouped = df_in.groupby(['G1', 'G2']).apply(calculate_ratio).reset_index(drop=True)
df_out = pd.concat([df_in, grouped], ignore_index=True)
Any help will be immensely appreciated.
解决方案 1:[1]
Code
tmp = df_in.set_index(['G1', 'G2', 'TPE']).unstack()['QC']
out = pd.concat([df_in, tmp['ts'].div(tmp['td']).reset_index(name='QC').assign(TPE='ratio')])
out:
G1 G2 TPE QC
0 A S1 td 2.0
1 A S1 ts 4.0
2 A S2 td 6.0
3 A S2 ts 3.0
4 B S1 td 20.0
5 B S1 ts 40.0
6 B S2 td 60.0
7 B S2 ts 30.0
8 C S1 td 90.0
9 D S2 ts 7.0
0 A S1 ratio 2.0
1 A S2 ratio 0.5
2 B S1 ratio 2.0
3 B S2 ratio 0.5
4 C S1 ratio NaN
5 D S2 ratio NaN
Intermedate
tmp:
TPE td ts
G1 G2
A S1 2.0 4.0
S2 6.0 3.0
B S1 20.0 40.0
S2 60.0 30.0
C S1 90.0 NaN
D S2 NaN 7.0
解决方案 2:[2]
Another possible solution, which uses multi-indexing, pandas.xs
to separate the ts
values from the td
ones, and finally uses pandas.concat
to concatenate the two dataframes:
s = df.set_index(['G1', 'G2', 'TPE'])
pd.concat([
df, s.xs('ts', level=2).div(s.xs('td', level=2))
.reset_index().assign(TPE='ratio')])
Output:
G1 G2 TPE QC
0 A S1 td 2.0
1 A S1 ts 4.0
2 A S2 td 6.0
3 A S2 ts 3.0
4 B S1 td 20.0
5 B S1 ts 40.0
6 B S2 td 60.0
7 B S2 ts 30.0
8 C S1 td 90.0
9 D S2 ts 7.0
0 A S1 ratio 2.0
1 A S2 ratio 0.5
2 B S1 ratio 2.0
3 B S2 ratio 0.5
4 C S1 ratio NaN
5 D S2 ratio NaN
参考链接:
Copyright Notice: This article follows StackOverflow’s copyright notice requirements and is licensed under CC BY-SA 3.0.
Article Source: StackOverflow
[1] Panda Kim
[2] PaulS