Is there any method for labeling without iteration with pandas?

xiaoxingxing 2 years ago python 434

Original title: Is there any method for labeling without iteration with pandas?

I have two time-based datasets. One is accelerometer measurement data, and the other is label data. For example:

accelerometer.csv

timestamp,X,Y,Z
1.0,0.5,0.2,0.0
1.1,0.2,0.3,0.0
1.2,-0.1,0.5,0.0
...
2.0,0.9,0.8,0.5
2.1,0.4,0.1,0.0
2.2,0.3,0.2,0.3
...

label.csv

start,end,label
1.0,2.0,"running"
2.0,3.0,"exercising"

These values may be unrealistic, since they are only examples.

In this case, I want to merge these data as shown below, in merged.csv:

timestamp,X,Y,Z,label
1.0,0.5,0.2,0.0,"running"
1.1,0.2,0.3,0.0,"running"
1.2,-0.1,0.5,0.0,"running"
...
2.0,0.9,0.8,0.5,"exercising"
2.1,0.4,0.1,0.0,"exercising"
2.2,0.3,0.2,0.3,"exercising"
...

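I am currently using pandas' "iterrows". However, the real data has more than 10,000 rows, so the program takes a long time to run. I think there must be at least one way to do this without iteration.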

My code is as follows:

import pandas as pd

acc = pd.read_csv("./accelerometer.csv")
labeled = pd.read_csv("./label.csv")

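# For each labeled interval, tag the accelerometer rows whose timestamp
# falls in the half-open range [start, end).
for index, row in labeled.iterrows():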
    start = row["start"]
    end = row["end"]

    acc.loc[(start <= acc["timestamp"]) & (acc["timestamp"] < end), "label"] = row["label"]

How can I modify my code to get rid of the "for" loop?

Original link: https://stackoverflow.com//questions/71686221/is-there-any-method-for-labeling-without-iteration-with-pandas


Nick commented

If the timestamps in accelerometer do not fall outside the time ranges in label, you can use merge_asof:

accmerged = pd.merge_asof(acc, labeled, left_on='timestamp', right_on='start', direction='backward')

Output (for the sample data in your question):

   timestamp    X    Y    Z  start  end       label
0        1.0  0.5  0.2  0.0    1.0  2.0     running
1        1.1  0.2  0.3  0.0    1.0  2.0     running
2        1.2 -0.1  0.5  0.0    1.0  2.0     running
3        2.0  0.9  0.8  0.5    2.0  3.0  exercising
4        2.1  0.4  0.1  0.0    2.0  3.0  exercising
5        2.2  0.3  0.2  0.3    2.0  3.0  exercising
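If some timestamps might fall outside every label range, the end column carried along by the merge can be used to blank out those labels before it is dropped. The following is only a minimal sketch: the inline frames stand in for the CSV files, and the extra row at timestamp 3.5 is a hypothetical value added to show the out-of-range case.

```python
import pandas as pd

# Inline stand-ins for accelerometer.csv and label.csv. The 3.5 row is
# hypothetical, added to show a timestamp outside every label range.
acc = pd.DataFrame({
    "timestamp": [1.0, 1.1, 1.2, 2.0, 2.1, 2.2, 3.5],
    "X": [0.5, 0.2, -0.1, 0.9, 0.4, 0.3, 0.1],
    "Y": [0.2, 0.3, 0.5, 0.8, 0.1, 0.2, 0.1],
    "Z": [0.0, 0.0, 0.0, 0.5, 0.0, 0.3, 0.0],
})
labeled = pd.DataFrame({
    "start": [1.0, 2.0],
    "end": [2.0, 3.0],
    "label": ["running", "exercising"],
})

# merge_asof requires both frames to be sorted by their join keys.
acc = acc.sort_values("timestamp")
labeled = labeled.sort_values("start")

accmerged = pd.merge_asof(
    acc, labeled, left_on="timestamp", right_on="start", direction="backward"
)

# Keep a label only when the timestamp lies before the matched interval's
# end, i.e. inside the half-open range [start, end) used by the original loop.
# Rows that matched no interval have a missing end, so they stay unlabeled.
accmerged["label"] = accmerged["label"].where(
    accmerged["timestamp"] < accmerged["end"]
)
```

With this mask the row at 3.5 ends up with a missing label instead of inheriting "exercising" from the last interval that started before it.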

Note that you can drop the start and end columns if you want:

accmerged = accmerged.drop(['start', 'end'], axis=1)

Output:

   timestamp    X    Y    Z       label
0        1.0  0.5  0.2  0.0     running
1        1.1  0.2  0.3  0.0     running
2        1.2 -0.1  0.5  0.0     running
3        2.0  0.9  0.8  0.5  exercising
4        2.1  0.4  0.1  0.0  exercising
5        2.2  0.3  0.2  0.3  exercising
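A different technique from the merge_asof approach in this answer is to look each timestamp up in a pandas IntervalIndex built from the label ranges. This is only a sketch under two assumptions: the label intervals do not overlap, and the inline data (including the hypothetical 3.5 row) stands in for the real CSV files.

```python
import pandas as pd

# Inline stand-ins for the CSV files; the 3.5 timestamp is hypothetical
# and lies outside every label range.
acc = pd.DataFrame({"timestamp": [1.0, 1.1, 1.2, 2.0, 2.1, 2.2, 3.5]})
labeled = pd.DataFrame({
    "start": [1.0, 2.0],
    "end": [2.0, 3.0],
    "label": ["running", "exercising"],
})

# Half-open intervals [start, end), matching the comparison in the loop.
intervals = pd.IntervalIndex.from_arrays(
    labeled["start"], labeled["end"], closed="left"
)

# Position of the interval containing each timestamp, or -1 if none does.
# This lookup assumes the intervals do not overlap.
pos = intervals.get_indexer(acc["timestamp"].to_numpy())

# Map positions to labels, then blank out the rows that matched nothing
# (a position of -1 would otherwise pick the last label).
labels = labeled["label"].to_numpy()
acc["label"] = pd.Series(labels[pos], index=acc.index).where(pos >= 0)
```

Unlike merge_asof, this does not require either frame to be sorted, and unmatched timestamps are left without a label.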

2 years ago 0 comments
