Pandas or other Python application to generate a column with 1 to n value based on other two columns with rules

I hope I can explain the problem correctly.

In basic terms, imagine the df as follows:

print(df)

year        id  
1           16100
1           150 
1           150
2           66
2           370
2           370 
2           530
3           41
3           43  
3           61

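While the year stays the same from row to row, df.seq needs to be a running 1 to n value, restarting at 1 when the year changes. df.seq2 follows the same pattern, except that it stays at n instead of moving to n+1 when the id is the same as in the row above.

So if we imagine it as an Excel-like formula, it would look like this: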

df.seq2 = IF(A2=A1,IF(B2=B1,F1,F1+1),1)

This would give the desired output for seq and seq2 as follows:

year        id      seq   seq2
1           16100    1     1
1           150      2     2
1           150      3     2
2           66       1     1
2           370      2     2
2           370      3     2
2           530      4     3
3           41       1     1
3           43       2     2
3           61       3     3
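The rule in the formula can be sketched as a plain Python loop. This is only a reference version for checking results, not a fast solution for millions of rows; the function name `sequence_columns` is made up for illustration, and it assumes the rows are already ordered so that equal years are adjacent.

```python
def sequence_columns(years, ids):
    """Return (seq, seq2) following the row-by-row rule described above."""
    seq, seq2 = [], []
    prev_year = prev_id = None
    for year, id_ in zip(years, ids):
        if not seq or year != prev_year:
            # first row overall, or the year changed: restart both counters
            seq.append(1)
            seq2.append(1)
        else:
            # same year: seq always advances, seq2 only when the id changes
            seq.append(seq[-1] + 1)
            seq2.append(seq2[-1] if id_ == prev_id else seq2[-1] + 1)
        prev_year, prev_id = year, id_
    return seq, seq2


years = [1, 1, 1, 2, 2, 2, 2, 3, 3, 3]
ids = [16100, 150, 150, 66, 370, 370, 530, 41, 43, 61]
seq, seq2 = sequence_columns(years, ids)
print(seq)   # [1, 2, 3, 1, 2, 3, 4, 1, 2, 3]
print(seq2)  # [1, 2, 2, 1, 2, 2, 3, 1, 2, 3]
```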

I did test a few things (assuming I have already generated df.seq):


comb_df['match'] = comb_df.year.eq(comb_df.year.shift())
comb_df['match2'] = comb_df.id.eq(comb_df.id.shift())


comb_df["seq2"] = np.where((comb_df["match"].shift(+1) == True) & (comb_df["match2"].shift(+1) == True), comb_df["seq"] - 1, comb_df["seq2"])

But the issue is that this does not really work when there are several duplicates in a row, etc.
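A small example may make the failure concrete. With a run of three equal ids, subtracting 1 once per duplicate row is not enough; what has to be subtracted is the number of duplicates seen so far within the year. The sketch below uses made-up data and the illustrative column name `one_off`, and assumes rows of the same year are adjacent.

```python
import pandas as pd

# hypothetical data: one year, with the id 9 repeated three times in a row
df = pd.DataFrame({"year": [1, 1, 1, 1], "id": [7, 9, 9, 9]})
df["seq"] = df.groupby("year").cumcount() + 1

# True where a row repeats both the year and the id of the row above
dup = df["year"].eq(df["year"].shift()) & df["id"].eq(df["id"].shift())

# one-off correction: only right for the first duplicate in a run
df["one_off"] = df["seq"] - dup.astype(int)

# cumulative correction: subtract the duplicates seen so far in the year
df["seq2"] = df["seq"] - dup.astype(int).groupby(df["year"]).cumsum()

print(df["one_off"].tolist())  # [1, 2, 2, 3]
print(df["seq2"].tolist())     # [1, 2, 2, 2]
```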

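Maybe the problem cannot be solved in a purely numpy way, and I have to loop over the rows?

There are 2-3 million rows, so performance will be an issue if the solution is very slow.

I need to generate both df.seq and df.seq2.

Any ideas would be extremely helpful!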

Original link: https://stackoverflow.com//questions/71476658/pandas-or-other-python-application-to-generate-a-column-with-1-to-n-value-based

Answer by BENY:

We can do this with groupby, using cumcount and factorize:

df['seq'] = df.groupby('year').cumcount()+1
df['seq2'] = df.groupby('year')['id'].transform(lambda x : x.factorize()[0]+1)
df
Out[852]: 
   year     id  seq  seq2
0     1  16100    1     1
1     1    150    2     2
2     1    150    3     2
3     2     66    1     1
4     2    370    2     2
5     2    370    3     2
6     2    530    4     3
7     3     41    1     1
8     3     43    2     2
9     3     61    3     3
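One thing that may be worth checking against real data: factorize numbers ids by value, so an id that reappears later in the same year gets its earlier number back, whereas the Excel-style rule in the question only compares each row with the row directly above it. On the sample data the two agree. The sketch below, using made-up data and the illustrative names `by_value` and `by_run`, shows where they differ and one possible vectorized way to get the row-above behaviour; it assumes rows of the same year are adjacent.

```python
import pandas as pd

# hypothetical data: id 150 reappears in year 1 after a different id,
# and year 2 starts with the same id that year 1 ended with
df = pd.DataFrame({"year": [1, 1, 1, 2, 2],
                   "id": [150, 66, 150, 150, 150]})

# value-based numbering, as in the answer above
by_value = df.groupby("year")["id"].transform(lambda s: s.factorize()[0] + 1)

# run-based numbering: a new number whenever the year or the id differs
# from the row above, accumulated within each year
new_year = df["year"].ne(df["year"].shift())
new_run = new_year | df["id"].ne(df["id"].shift())
by_run = new_run.astype(int).groupby(df["year"]).cumsum()

print(by_value.tolist())  # [1, 2, 1, 1, 1]
print(by_run.tolist())    # [1, 2, 3, 1, 1]
```

The run-based version avoids a Python-level lambda per group, which might matter with 2-3 million rows, though that would need to be measured rather than assumed.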
