站点图标 AI技术聚合

Creating a ratio between variables in a Stacked dataframe grouped by some Columns

问题描述:

I have a df as follows:

df_in 
      G1    G2       TPE           QC
      
      A     S1       td            2
      A     S1       ts            4
      A     S2       td            6
      A     S2       ts            3
      B     S1       td            20
      B     S1       ts            40
      B     S2       td            60
      B     S2       ts            30
      C     S1       td            90
      D     S2       ts            7
      

So the output should be grouped by Columns G1 & G2 and for each such group, do a row wise ratio for the
column QC like (ts/td) where values are td and ts for the column TPE and rename the new variable in column TPE as
ratio. It should also contain the original rows as it is. Also it should be noted that for the TPE column some groups may not have
both ts and td values. In such cases there will be no ratio or the ratio should be kept as blank.

So the output should be this:

       df_out


         G1    G2       TPE           QC
      
         A     S1       td            2
         A     S1       ts            4
         A     S2       td            6
         A     S2       ts            3
         B     S1       td            20
         B     S1       ts            40
         B     S2       td            60
         B     S2       ts            30
         C     S1       td            90
         D     S2       ts            7
         A     S1       ratio         2
         A     S2       ratio         0.5
         B     S1       ratio         2
         B     S2       ratio        0.5
         C     S1       ratio         
         D     S2       ratio           
         
         

I tried the following, but its omitting the blank values for group C & D with blank ratios:

def calculate_ratio(group):
    td_row = group[group['TPE'] == 'td']
    ts_row = group[group['TPE'] == 'ts']
    if not td_row.empty and not ts_row.empty:
        ratio = ts_row['QC'].values[0] / td_row['QC'].values[0]
        return pd.DataFrame({'G1': [group['G1'].iloc[0]], 
                             'G2': [group['G2'].iloc[0]], 
                             'TPE': ['ratio'], 
                             'QC': [ratio]})
    return pd.DataFrame()


grouped = df_in.groupby(['G1', 'G2']).apply(calculate_ratio).reset_index(drop=True)

df_out = pd.concat([df_in, grouped], ignore_index=True)

Any help will be immensely appreciated.

解决方案 1:[1]

Code

tmp = df_in.set_index(['G1', 'G2', 'TPE']).unstack()['QC']
out = pd.concat([df_in, tmp['ts'].div(tmp['td']).reset_index(name='QC').assign(TPE='ratio')])

out:

    G1  G2  TPE     QC
0   A   S1  td      2.0
1   A   S1  ts      4.0
2   A   S2  td      6.0
3   A   S2  ts      3.0
4   B   S1  td      20.0
5   B   S1  ts      40.0
6   B   S2  td      60.0
7   B   S2  ts      30.0
8   C   S1  td      90.0
9   D   S2  ts      7.0
0   A   S1  ratio   2.0
1   A   S2  ratio   0.5
2   B   S1  ratio   2.0
3   B   S2  ratio   0.5
4   C   S1  ratio   NaN
5   D   S2  ratio   NaN

Intermedate

tmp:

    TPE td      ts
G1  G2      
A   S1  2.0     4.0
    S2  6.0     3.0
B   S1  20.0    40.0
    S2  60.0    30.0
C   S1  90.0    NaN
D   S2  NaN     7.0

解决方案 2:[2]

Another possible solution, which uses multi-indexing, pandas.xs
to separate the ts values from the td ones, and finally uses pandas.concat to concatenate the two dataframes:

s = df.set_index(['G1', 'G2', 'TPE'])

pd.concat([
    df, s.xs('ts', level=2).div(s.xs('td', level=2))
    .reset_index().assign(TPE='ratio')])

Output:

  G1  G2    TPE    QC
0  A  S1     td   2.0
1  A  S1     ts   4.0
2  A  S2     td   6.0
3  A  S2     ts   3.0
4  B  S1     td  20.0
5  B  S1     ts  40.0
6  B  S2     td  60.0
7  B  S2     ts  30.0
8  C  S1     td  90.0
9  D  S2     ts   7.0
0  A  S1  ratio   2.0
1  A  S2  ratio   0.5
2  B  S1  ratio   2.0
3  B  S2  ratio   0.5
4  C  S1  ratio   NaN
5  D  S2  ratio   NaN

参考链接:

Copyright Notice: This article follows StackOverflow’s copyright notice requirements and is licensed under CC BY-SA 3.0.

Article Source: StackOverflow

[1] Panda Kim

[2] PaulS

退出移动版