How to find consecutive timestamps and calculate the sum in Python

社会演员多 · 2 years ago · python · 472

Original title: how to find consecutive timestamp and calculate sum in python

For each row, I want the sum of that customer's sales within the 24 hours starting at that row's timestamp.

For example,

id     timestamp            sales
123   2022-10-01 12:50:55   11
124   2022-10-01 22:50:55   11
123   2022-10-01 13:50:55   11
123   2022-10-02 12:50:55   11
123   2022-10-02 13:50:55   11

Then for `id = 123`, we select

1. 
id     timestamp            sales
123   2022-10-01 12:50:55   11
123   2022-10-01 13:50:55   11
123   2022-10-02 12:50:55   11

Sum = 11+11+11 = 33

2. 
id     timestamp            sales
123   2022-10-01 13:50:55   11
123   2022-10-02 12:50:55   11
123   2022-10-02 13:50:55   11

Sum = 11+11+11 = 33

3. 
id     timestamp            sales
123   2022-10-02 12:50:55   11
123   2022-10-02 13:50:55   11

Sum = 11+11 = 22

4.
id     timestamp            sales
123   2022-10-02 13:50:55   11

Sum = 11
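
The selection rule illustrated above can be written as a small function. This is only a sketch: `window_sum` is a hypothetical helper name, the sample frame is rebuilt from the table in the question, and the window is assumed to include both its start and the timestamp exactly 24 hours later, as the first example implies.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "id": [123, 124, 123, 123, 123],
    "timestamp": pd.to_datetime([
        "2022-10-01 12:50:55",
        "2022-10-01 22:50:55",
        "2022-10-01 13:50:55",
        "2022-10-02 12:50:55",
        "2022-10-02 13:50:55",
    ]),
    "sales": [11, 11, 11, 11, 11],
})


def window_sum(frame, cust_id, start, hours=24):
    """Sum sales of one customer in [start, start + hours], both ends included."""
    start = pd.Timestamp(start)
    end = start + pd.Timedelta(hours=hours)
    # Series.between is inclusive on both sides by default.
    mask = (frame["id"] == cust_id) & frame["timestamp"].between(start, end)
    return frame.loc[mask, "sales"].sum()


print(window_sum(df, 123, "2022-10-01 12:50:55"))  # 33
```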

The result for `id = 123` is

id     timestamp            sales   sum
123   2022-10-01 12:50:55   11      33
123   2022-10-01 13:50:55   11      33
123   2022-10-02 12:50:55   11      22
123   2022-10-02 13:50:55   11      11

For id = 124, we get

id     timestamp            sales   sum
124   2022-10-01 22:50:55   11      11

I know a cross join can solve this, but that approach is time-consuming for large datasets.
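
For comparison, the cross-join approach might look roughly like the sketch below. It pairs every row with every row of the same `id` and keeps the pairs that are at most 24 hours apart, so the intermediate table grows quadratically with the number of rows per customer. The column suffix `_other` and the inclusive 24-hour boundary are assumptions made for illustration.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "id": [123, 124, 123, 123, 123],
    "timestamp": pd.to_datetime([
        "2022-10-01 12:50:55",
        "2022-10-01 22:50:55",
        "2022-10-01 13:50:55",
        "2022-10-02 12:50:55",
        "2022-10-02 13:50:55",
    ]),
    "sales": [11, 11, 11, 11, 11],
})

# Pair each row with every row of the same id (a per-id cross join).
# reset_index() keeps the original row label in a column named "index".
pairs = df.reset_index().merge(df, on="id", suffixes=("", "_other"))

# Keep pairs whose second timestamp lies in [timestamp, timestamp + 24h].
delta = pairs["timestamp_other"] - pairs["timestamp"]
in_window = (delta >= pd.Timedelta(0)) & (delta <= pd.Timedelta(hours=24))

# Sum the partner rows' sales per original row; assignment aligns on the index.
df["sum"] = pairs[in_window].groupby("index")["sales_other"].sum()
print(df)
```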

Is there a better way to achieve this?

Thanks

Original link: https://stackoverflow.com//questions/71507997/how-to-find-consecutive-timestamp-and-calculate-sum-in-python


Amirhossein Kiani commented
You can use groupby, passing it pd.Grouper(freq="D") together with the id column:
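```
import pandas as pd

# df is the DataFrame from the question (columns: id, timestamp, sales)
df["timestamp"] = pd.to_datetime(df["timestamp"])
df.set_index("timestamp", inplace=True)
# Sum sales per calendar day and per id
newDf = df.groupby([pd.Grouper(freq="D"), "id"]).sum().reset_index()
newDf
```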
newDf will be:

   timestamp            id   sales
0  2022-10-01 00:00:00  123  22
1  2022-10-01 00:00:00  124  11
2  2022-10-02 00:00:00  123  22

So, by evaluating newDf[newDf["id"] == 124] you get:

   timestamp            id   sales
1  2022-10-01 00:00:00  124  11

Note that the times are not exactly the ones you mentioned, because daily grouping starts each day at 00:00 rather than at any other time.
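
Because daily grouping buckets by calendar day, it does not reproduce the per-row 24-hour sums shown in the question. One possible alternative, sketched here under the assumption that each window starts at the row's own timestamp and includes the timestamp exactly 24 hours later, sorts each customer's rows and combines a prefix sum with `numpy.searchsorted`. The helper name `forward_window_sum` is hypothetical.

```python
import numpy as np
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "id": [123, 124, 123, 123, 123],
    "timestamp": pd.to_datetime([
        "2022-10-01 12:50:55",
        "2022-10-01 22:50:55",
        "2022-10-01 13:50:55",
        "2022-10-02 12:50:55",
        "2022-10-02 13:50:55",
    ]),
    "sales": [11, 11, 11, 11, 11],
})


def forward_window_sum(group, window=np.timedelta64(24, "h")):
    """For each row of one customer, sum sales in [timestamp, timestamp + window]."""
    group = group.sort_values("timestamp")
    ts = group["timestamp"].to_numpy()
    # Prefix sums with a leading zero: csum[j] - csum[i] is the sum of rows i..j-1.
    csum = np.concatenate(([0], group["sales"].to_numpy().cumsum()))
    # First row with the same timestamp (handles duplicate timestamps).
    start = np.searchsorted(ts, ts, side="left")
    # One past the last row whose timestamp is <= ts + window.
    end = np.searchsorted(ts, ts + window, side="right")
    return pd.Series(csum[end] - csum[start], index=group.index)


# Compute the sums per customer; assignment aligns on the original row labels.
parts = [forward_window_sum(g) for _, g in df.groupby("id")]
df["sum"] = pd.concat(parts)
print(df)
```

On the sample data this yields the `sum` column from the question (33, 11, 33, 22, 11 in the original row order) without building a pairwise table; the work per customer is a sort plus two binary searches per row.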
2 years ago · 0 comments
