Is there any method for labeling without iteration with pandas?


I have two time-based datasets. One is accelerometer measurement data, and the other is label data. For example,

accelerometer.csv

timestamp,X,Y,Z
1.0,0.5,0.2,0.0
1.1,0.2,0.3,0.0
1.2,-0.1,0.5,0.0
...
2.0,0.9,0.8,0.5
2.1,0.4,0.1,0.0
2.2,0.3,0.2,0.3
...

label.csv

start,end,label
1.0,2.0,"running"
2.0,3.0,"exercising"

These values may be unrealistic, since they are only examples.

In this case, I want to merge the two datasets into the following merged.csv:

timestamp,X,Y,Z,label
1.0,0.5,0.2,0.0,"running"
1.1,0.2,0.3,0.0,"running"
1.2,-0.1,0.5,0.0,"running"
...
2.0,0.9,0.8,0.5,"exercising"
2.1,0.4,0.1,0.0,"exercising"
2.2,0.3,0.2,0.3,"exercising"
...

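I am currently using pandas' "iterrows". However, the actual data has more than 10,000 rows, so the program takes a long time to run. I think there must be at least one way to do this without iteration.

My code is as follows: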

import pandas as pd

acc = pd.read_csv("./accelerometer.csv")
labeled = pd.read_csv("./label.csv")

for index, row in labeled.iterrows():
    start = row["start"]
    end = row["end"]

    acc.loc[(start <= acc["timestamp"]) & (acc["timestamp"] < end), "label"] = row["label"]

How can I modify my code to get rid of the "for" iteration?

Original link: https://stackoverflow.com//questions/71686221/is-there-any-method-for-labeling-without-iteration-with-pandas

Replies

  • Nick commented:

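    If the timestamps in accelerometer never fall outside the time ranges in label, you can use merge_asof (both frames must already be sorted by their key columns, timestamp and start):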

    accmerged = pd.merge_asof(acc, labeled, left_on='timestamp', right_on='start', direction='backward')
    

    Output (for the sample data in your question):

       timestamp    X    Y    Z  start  end       label
    0        1.0  0.5  0.2  0.0    1.0  2.0     running
    1        1.1  0.2  0.3  0.0    1.0  2.0     running
    2        1.2 -0.1  0.5  0.0    1.0  2.0     running
    3        2.0  0.9  0.8  0.5    2.0  3.0  exercising
    4        2.1  0.4  0.1  0.0    2.0  3.0  exercising
    5        2.2  0.3  0.2  0.3    2.0  3.0  exercising
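
If some timestamps might fall outside every labelled range, the end column that merge_asof carries along can be used to blank out those labels. The following is only a minimal sketch under that assumption: the data is built in memory instead of read from the CSV files, and the 3.5 row is a made-up timestamp added purely to show the effect.

```python
import pandas as pd

# In-memory stand-ins for accelerometer.csv and label.csv.
# The 3.5 row is hypothetical: it lies after the last labelled range.
acc = pd.DataFrame({
    "timestamp": [1.0, 1.1, 2.0, 2.2, 3.5],
    "X": [0.5, 0.2, 0.9, 0.3, 0.1],
})
labeled = pd.DataFrame({
    "start": [1.0, 2.0],
    "end": [2.0, 3.0],
    "label": ["running", "exercising"],
})

# merge_asof requires both frames to be sorted by their key columns.
acc = acc.sort_values("timestamp")
labeled = labeled.sort_values("start")

accmerged = pd.merge_asof(
    acc, labeled, left_on="timestamp", right_on="start", direction="backward"
)

# Keep a label only where timestamp < end, matching the half-open
# [start, end) rule of the original loop; other rows get NaN.
in_range = accmerged["timestamp"] < accmerged["end"]
accmerged["label"] = accmerged["label"].where(in_range)
```

Rows whose timestamp is earlier than the first start also end up with NaN, because merge_asof finds no match for them and the comparison against a missing end is False.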
    

    Note that you can drop the start and end columns if you want:

    accmerged = accmerged.drop(['start', 'end'], axis=1)
    

    Output:

       timestamp    X    Y    Z       label
    0        1.0  0.5  0.2  0.0     running
    1        1.1  0.2  0.3  0.0     running
    2        1.2 -0.1  0.5  0.0     running
    3        2.0  0.9  0.8  0.5  exercising
    4        2.1  0.4  0.1  0.0  exercising
    5        2.2  0.3  0.2  0.3  exercising
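
As a different technique from merge_asof, the label ranges can also be turned into a pandas IntervalIndex and looked up in one vectorized call. This is a sketch that assumes the ranges do not overlap (get_indexer raises an error on overlapping intervals); the in-memory data and the 3.5 timestamp are illustrative only.

```python
import pandas as pd

# In-memory stand-ins for the two CSV files; 3.5 is a hypothetical
# timestamp that no label range covers.
acc = pd.DataFrame({"timestamp": [1.0, 1.1, 2.0, 2.2, 3.5]})
labeled = pd.DataFrame({
    "start": [1.0, 2.0],
    "end": [2.0, 3.0],
    "label": ["running", "exercising"],
})

# Half-open intervals [start, end), the same rule as the original loop.
intervals = pd.IntervalIndex.from_arrays(
    labeled["start"], labeled["end"], closed="left"
)

# Position of the interval containing each timestamp, or -1 if none does.
pos = intervals.get_indexer(acc["timestamp"].to_numpy())

# Look up labels by position, then blank out the rows that matched nothing.
labels = labeled["label"].to_numpy()
acc["label"] = pd.Series(labels[pos], index=acc.index).where(pos >= 0)
```

Unlike merge_asof, this version does not need the frames to be sorted, and it leaves no start and end columns to drop afterwards.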
    