Problem description:
I have a DataFrame with the columns Customer-Equipment, Date and Closing Date. Here is an example:
Customer-Equipment | Date | Closing Date |
---|---|---|
Customer1 – Equipment A | 2023-01-01 | 2023-01-05 |
Customer1 – Equipment A | 2023-01-02 | NaN |
Customer1 – Equipment A | 2023-01-03 | NaN |
Customer1 – Equipment A | 2023-01-04 | NaN |
Customer1 – Equipment A | 2023-01-05 | NaN |
Customer1 – Equipment A | 2023-01-06 | NaN |
Customer2 – Equipment H | 2023-01-01 | 2023-01-02 |
Customer2 – Equipment H | 2023-01-02 | NaN |
Customer2 – Equipment H | 2023-01-03 | NaN |
Within each Customer-Equipment group, I need to fill in the closing date on every row until the Date equals the closing date; rows after that should stay empty. The expected result would be:
Customer-Equipment | Date | Closing Date |
---|---|---|
Customer1 – Equipment A | 2023-01-01 | 2023-01-05 |
Customer1 – Equipment A | 2023-01-02 | 2023-01-05 |
Customer1 – Equipment A | 2023-01-03 | 2023-01-05 |
Customer1 – Equipment A | 2023-01-04 | 2023-01-05 |
Customer1 – Equipment A | 2023-01-05 | 2023-01-05 |
Customer1 – Equipment A | 2023-01-06 | NaN |
Customer2 – Equipment H | 2023-01-01 | 2023-01-02 |
Customer2 – Equipment H | 2023-01-02 | 2023-01-02 |
Customer2 – Equipment H | 2023-01-03 | NaN |
I have tried code like this:
df['test'] = df.groupby('Customer-Equipment').apply(
lambda x: x['Closing date'] if x['date'] <= x.at[row.index -1 ,'closing date'] else pd.NaT).fillna(method = 'ffill').reset_index(drop=True)
How could this be done in Python?
Solution 1 [best answer] [1]
If your dates are in increasing order within each group, you could just use groupby.ffill and mask the result with where:
s = df.groupby('Customer-Equipment')['Closing Date'].ffill()
df['Closing Date'] = s.where(s.ge(df['Date']))
Output:
Customer-Equipment Date Closing Date
0 Customer1 - Equipment A 2023-01-01 2023-01-05
1 Customer1 - Equipment A 2023-01-02 2023-01-05
2 Customer1 - Equipment A 2023-01-03 2023-01-05
3 Customer1 - Equipment A 2023-01-04 2023-01-05
4 Customer1 - Equipment A 2023-01-05 2023-01-05
5 Customer1 - Equipment A 2023-01-06 NaN
6 Customer2 - Equipment H 2023-01-01 2023-01-02
7 Customer2 - Equipment H 2023-01-02 2023-01-02
8 Customer2 - Equipment H 2023-01-03 NaN
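For anyone who wants to reproduce this end to end, here is a minimal self-contained sketch of the same ffill-and-mask idea. The sample frame is rebuilt from the question's table, and the `sort_values` step is an added assumption (not part of the answer above) to guard against input that is not already in date order.

```python
import pandas as pd

# Sample data rebuilt from the question: only the first row of each group has a closing date
df = pd.DataFrame({
    "Customer-Equipment": ["Customer1 - Equipment A"] * 6 + ["Customer2 - Equipment H"] * 3,
    "Date": pd.to_datetime([
        "2023-01-01", "2023-01-02", "2023-01-03", "2023-01-04", "2023-01-05", "2023-01-06",
        "2023-01-01", "2023-01-02", "2023-01-03",
    ]),
    "Closing Date": pd.to_datetime(["2023-01-05"] + [None] * 5 + ["2023-01-02"] + [None] * 2),
})

# Assumption: the input may be unsorted, so order it by group and date first
df = df.sort_values(["Customer-Equipment", "Date"]).reset_index(drop=True)

# Forward-fill the closing date within each group
filled = df.groupby("Customer-Equipment")["Closing Date"].ffill()

# Keep the filled value only where it is on or after the row's Date; other rows become NaT
df["Closing Date"] = filled.where(filled.ge(df["Date"]))
print(df)
```

Because the columns are real datetimes here, the empty cells print as NaT rather than NaN.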
Solution 2: [2]
Try it like this:
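import pandas as pd

# Sample data matching the question: only the first row of each group has a closing date
data = {
    'Customer-Equipment': ['Customer1 - Equipment A'] * 6 + ['Customer2 - Equipment H'] * 3,
    'Date': pd.to_datetime(['2023-01-01', '2023-01-02', '2023-01-03', '2023-01-04', '2023-01-05', '2023-01-06',
                            '2023-01-01', '2023-01-02', '2023-01-03']),
    'Closing Date': [pd.to_datetime('2023-01-05')] + [pd.NaT] * 5 + [pd.to_datetime('2023-01-02')] + [pd.NaT] * 2
}
df = pd.DataFrame(data)

def fill_closing_dates(group):
    # Work on a copy so the original group is not modified in place
    group = group.copy()
    # Take the closing date from the first row of the group
    closing_date = group['Closing Date'].iloc[0]
    # Keep it only on rows whose Date is on or before the closing date
    group['Closing Date'] = group['Date'].apply(lambda x: closing_date if x <= closing_date else pd.NaT)
    return group

# group_keys=False keeps the original flat index instead of adding the group key as an index level
df = df.groupby('Customer-Equipment', group_keys=False).apply(fill_closing_dates)
print(df)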
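The same "take the group's closing date and compare it with each Date" logic can probably be written without a Python-level function. The sketch below is one possible vectorized variant, not part of the original answer. It assumes each group has a single closing date, and note that `transform("first")` picks the first non-null value in the group rather than strictly the value of the first row.

```python
import pandas as pd

# Sample data rebuilt from the question
df = pd.DataFrame({
    "Customer-Equipment": ["Customer1 - Equipment A"] * 6 + ["Customer2 - Equipment H"] * 3,
    "Date": pd.to_datetime([
        "2023-01-01", "2023-01-02", "2023-01-03", "2023-01-04", "2023-01-05", "2023-01-06",
        "2023-01-01", "2023-01-02", "2023-01-03",
    ]),
    "Closing Date": pd.to_datetime(["2023-01-05"] + [None] * 5 + ["2023-01-02"] + [None] * 2),
})

# Broadcast each group's first non-null closing date to every row of that group
first_close = df.groupby("Customer-Equipment")["Closing Date"].transform("first")

# Keep it only on rows dated on or before the closing date; later rows become NaT
df["Closing Date"] = first_close.where(df["Date"] <= first_close)
print(df)
```

Unlike the forward-fill approach, this does not depend on the rows being sorted by date.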
References:
Copyright Notice: This article follows StackOverflow’s copyright notice requirements and is licensed under CC BY-SA 3.0.
Article Source: StackOverflow
[1] mozway
[2] Mahboob Nur