Create new column with subtotal for first instance of unique value in column, 0 otherwise

乘风 • 2023年12月22日下午6:17 • 问题解决 • 阅读 204

Table of Contents

问题描述:

I have a data set that is as follows:

  ClientID        Date  Total sales  Cumulative Sales
0        A  01-01-2000          100               300
1        A  01-02-2000          100                 0
2        A  01-03-2000          100                 0
3        B  01-01-2000           50               100
4        B  01-02-2000           50                 0
5        C  01-01-2000           70               210
6        C  01-02-2000           70                 0
7        C  01-03-2000           70                 0
8        D  01-01-2000           20                40
9        D  01-02-2000           20                 0

I want to figure out a way to create the Cumulative Sales column, that will be the sum of Total sales for the first row of each unique ClientID, and 0 for the rest.

I figured out the solution in excel by creating a new column R for counting first distinct value and then using the IF(SUMIF) function to calculate the sum:

=IF(COUNTIFS($F$2:$F8, $F8)=1, SUMIF($F$2:$F$1065, R8, $P$2:$P$1065), "0")

解决方案 1:^[1]

Use numpy.where with set 0 by mask for duplicated values by Series.duplicated and for first values set sum by GroupBy.transform:

df['Cumulative Sales'] = np.where(df['ClientID'].duplicated(), 
                                  0, 
                                  df.groupby('ClientID')['Total sales'].transform('sum'))
print (df)
  ClientID        Date  Total sales  Cumulative Sales
0        A  01-01-2000          100               300
1        A  01-02-2000          100                 0
2        A  01-03-2000          100                 0
3        B  01-01-2000           50               100
4        B  01-02-2000           50                 0
5        C  01-01-2000           70               210
6        C  01-02-2000           70                 0
7        C  01-03-2000           70                 0
8        D  01-01-2000           20                40
9        D  01-02-2000           20                 0

解决方案 2:^[2]

Use a groupby.transform and mask the non-first values based on the duplicated IDs:

df['Cumulative Sales'] = (df.groupby('Client ID')['Total sales'].transform('sum')
                            .mask(df['Client ID'].duplicated(), 0)
                         )

Output:

  Client ID        Date  Total sales  Cumulative Sales
0         A  01-01-2000          100               300
1         A  01-02-2000          100                 0
2         A  01-03-2000          100                 0
3         B  01-01-2000           50               100
4         B  01-02-2000           50                 0
5         C  01-01-2000           70               210
6         C  01-02-2000           70                 0
7         C  01-03-2000           70                 0
8         D  01-01-2000           20                40
9         D  01-02-2000           20                 0

参考链接:

Article Source: StackOverflow

[1] jezrael

[2] mozway

Create new column with subtotal for first instance of unique value in column, 0 otherwise

问题描述:

解决方案 1:[1]

解决方案 2:[2]

参考链接:

相关推荐

解决方案 1:^[1]

解决方案 2:^[2]