How to find the max value of every 8 rows in a CSV file

社会演员多 python 421

Original title: How to find max value per each 8 row in csv format

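I have two columns containing dates and values. I need to find the maximum value of every 8 rows (that is, every 8 days). Can anyone suggest which tool or library I should use?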

Sample data

timestamp  value
1/7/2017    0.4422709
2/7/2017    0.47979677
3/7/2017    0.48154536
4/7/2017    0.50247365
5/7/2017    0.45446774
6/7/2017    0.44231474
7/7/2017    0.48774317
8/7/2017    0.48993695
9/7/2017    0.48612505
10/7/2017   0.48970944
11/7/2017   0.46920314
12/7/2017   0.47724804
13/7/2017   0.4656107
14/7/2017   0.47519404
15/7/2017   0.44820467
16/7/2017   0.4583039
17/7/2017   0.44056067

Original link: https://stackoverflow.com//questions/71476062/how-to-find-max-value-per-each-8-row-in-csv-format

Answers

  • Vraj Shah answered

    Solution

    You can try this. Just change the filename and the delimiter to match the structure of your CSV.
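    import csv

    filename = "text.csv"
    days = 8
    count = 1
    maxVal = float('-inf')  # start below any real value (0 would be wrong for negative data)
    maxValues = []
    with open(filename, 'r', newline='') as csvfile:
        # skipinitialspace=True makes a run of spaces count as one delimiter
        reader = csv.DictReader(csvfile, delimiter=' ', skipinitialspace=True)
        for row in reader:
            maxVal = max(maxVal, float(row['value']))
            if count % days == 0:
                # a full group of `days` rows is complete: store its max and reset
                maxValues.append(maxVal)
                maxVal = float('-inf')
            count += 1
    # note: a trailing group with fewer than `days` rows is not reported
    print(maxValues)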
    

    Output

    [0.50247365, 0.48970944]
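
    If the trailing rows that do not fill a whole group of 8 should be reported as well, the same idea can be written around itertools.islice. This is only a sketch under assumptions: the helper name chunk_maxima is made up for illustration, and the sample is read from an inline string through io.StringIO instead of a real file.

```python
import io
from itertools import islice

# Inline sample standing in for the CSV file (an assumption for this sketch).
sample = """timestamp  value
1/7/2017    0.4422709
2/7/2017    0.47979677
3/7/2017    0.48154536
4/7/2017    0.50247365
5/7/2017    0.45446774
6/7/2017    0.44231474
7/7/2017    0.48774317
8/7/2017    0.48993695
9/7/2017    0.48612505
10/7/2017   0.48970944
11/7/2017   0.46920314
12/7/2017   0.47724804
13/7/2017   0.4656107
14/7/2017   0.47519404
15/7/2017   0.44820467
16/7/2017   0.4583039
17/7/2017   0.44056067
"""


def chunk_maxima(values, size):
    """Yield the max of each consecutive group of `size` values.

    The last group may be shorter than `size`; its max is still yielded.
    (Hypothetical helper, not part of any library.)
    """
    it = iter(values)
    while True:
        chunk = list(islice(it, size))
        if not chunk:
            return
        yield max(chunk)


lines = io.StringIO(sample)
next(lines)  # skip the header row
# split() with no argument handles any run of spaces between the columns
values = (float(line.split()[1]) for line in lines if line.strip())
result = list(chunk_maxima(values, 8))
print(result)  # [0.50247365, 0.48970944, 0.44056067]
```

    With a real file, io.StringIO(sample) would be replaced by open(filename).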
    
    2 years ago, 0 comments
  • Dev Parzival answered
    Use simple file operations to find the max values:
    
    with open("data.log", 'r') as file:
        lines = file.readlines()
        prevValues = []
        for index, line in enumerate(lines):
            if index == 0:  # skip the header row
                continue
            cols = line.split()  # split on any run of whitespace
            if len(cols) < 2:  # skip blank or malformed lines
                continue
            prevValues.append(float(cols[1]))
            if len(prevValues) == 8:  # if we have read 8 rows
                print(max(prevValues))
                prevValues.clear()  # clearing the list
        if len(prevValues) > 0:  # leftover rows that do not fill a group of 8
            print(max(prevValues))
    
            
    

    Output:

    0.50247365
    0.48970944
    0.44056067
    
    2 years ago, 0 comments
  • furas answered

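    I'm not sure whether you want the max for rows [0:8], [8:16], etc., or for rows [0:8], [1:9], [2:10], etc. (which seems more useful to me). I use pandas.DataFrame to show both versions.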

    I use io only to simulate a file, so that everyone can copy and test the code, but you should use your filename to load the file.

    import pandas as pd
    import io
    
    text  ='''timestamp value
    1/7/2017    0.4422709
    2/7/2017    0.47979677
    3/7/2017    0.48154536
    4/7/2017    0.50247365
    5/7/2017    0.45446774
    6/7/2017    0.44231474
    7/7/2017    0.48774317
    8/7/2017    0.48993695
    9/7/2017    0.48612505
    10/7/2017   0.48970944
    11/7/2017   0.46920314
    12/7/2017   0.47724804
    13/7/2017   0.4656107
    14/7/2017   0.47519404
    15/7/2017   0.44820467
    16/7/2017   0.4583039
    17/7/2017   0.44056067
    '''
    
    #df = pd.read_csv('filename', sep=r'\s+')
    df = pd.read_csv(io.StringIO(text), sep=r'\s+')  # raw string avoids an invalid-escape warning for '\s'
    

    If you need rows [0:8], [8:16], etc., then you can use

    df['part'] = df.index // 8
    

    to create a new column part

        timestamp     value  part
    0    1/7/2017  0.442271     0
    1    2/7/2017  0.479797     0
    2    3/7/2017  0.481545     0
    3    4/7/2017  0.502474     0
    4    5/7/2017  0.454468     0
    5    6/7/2017  0.442315     0
    6    7/7/2017  0.487743     0
    7    8/7/2017  0.489937     0
    8    9/7/2017  0.486125     1
    9   10/7/2017  0.489709     1
    10  11/7/2017  0.469203     1
    11  12/7/2017  0.477248     1
    12  13/7/2017  0.465611     1
    13  14/7/2017  0.475194     1
    14  15/7/2017  0.448205     1
    15  16/7/2017  0.458304     1
    16  17/7/2017  0.440561     2
    

    Then you can use it to group the rows and get max()

    result = df.groupby('part')['value'].max()
    print(result)
    

    Result

    0    0.502474
    1    0.489709
    2    0.440561
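
    Since the question also has a date column, it may help to see on which date each group's maximum occurs. The following is a sketch rather than part of the original answer: it uses groupby(...).idxmax() to get the index label of each group's largest value and then selects those rows. The variable name best_rows is chosen here only for illustration.

```python
import io
import pandas as pd

text = '''timestamp value
1/7/2017    0.4422709
2/7/2017    0.47979677
3/7/2017    0.48154536
4/7/2017    0.50247365
5/7/2017    0.45446774
6/7/2017    0.44231474
7/7/2017    0.48774317
8/7/2017    0.48993695
9/7/2017    0.48612505
10/7/2017   0.48970944
11/7/2017   0.46920314
12/7/2017   0.47724804
13/7/2017   0.4656107
14/7/2017   0.47519404
15/7/2017   0.44820467
16/7/2017   0.4583039
17/7/2017   0.44056067
'''

df = pd.read_csv(io.StringIO(text), sep=r'\s+')
df['part'] = df.index // 8  # rows 0-7 -> 0, rows 8-15 -> 1, row 16 -> 2

# idxmax() returns, for each group, the index label of the row holding the max;
# df.loc[...] then pulls out those complete rows, timestamp included.
best_rows = df.loc[df.groupby('part')['value'].idxmax()]
print(best_rows)
```

    For the sample data this selects the rows dated 4/7/2017, 10/7/2017 and 17/7/2017.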
    

    You can also do it without creating the column part

    result = df.groupby( df.index//8 )['value'].max()
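
    Grouping by index // 8 counts rows, which equals 8 days only while there is exactly one row per day. If days can be missing, grouping by calendar time may be closer to what the question asks. The sketch below assumes the dates are day/month/year (the sample contains 13/7/2017, so day-first looks likely) and uses resample('8D'); neither assumption comes from the original answer.

```python
import io
import pandas as pd

text = '''timestamp value
1/7/2017    0.4422709
2/7/2017    0.47979677
3/7/2017    0.48154536
4/7/2017    0.50247365
5/7/2017    0.45446774
6/7/2017    0.44231474
7/7/2017    0.48774317
8/7/2017    0.48993695
9/7/2017    0.48612505
10/7/2017   0.48970944
11/7/2017   0.46920314
12/7/2017   0.47724804
13/7/2017   0.4656107
14/7/2017   0.47519404
15/7/2017   0.44820467
16/7/2017   0.4583039
17/7/2017   0.44056067
'''

df = pd.read_csv(io.StringIO(text), sep=r'\s+')

# Assumption: dates are written day/month/year.
df['timestamp'] = pd.to_datetime(df['timestamp'], format='%d/%m/%Y')

# resample('8D') builds consecutive 8-day bins starting at the first date,
# so a missing day leaves a bin with fewer rows instead of shifting the groups.
result = df.set_index('timestamp')['value'].resample('8D').max()
print(result)
```

    On this sample, which has one row per day, the bins start on 2017-07-01, 2017-07-09 and 2017-07-17 and give the same maxima as the row-based grouping.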
    

    If you need rows [0:8], [1:9], [2:10], etc., then you can use a rolling window of size 8

    df['max'] = df['value'].rolling(window=8, min_periods=1).max()
    

    You get

        timestamp     value       max
    0    1/7/2017  0.442271  0.442271
    1    2/7/2017  0.479797  0.479797
    2    3/7/2017  0.481545  0.481545
    3    4/7/2017  0.502474  0.502474
    4    5/7/2017  0.454468  0.502474
    5    6/7/2017  0.442315  0.502474
    6    7/7/2017  0.487743  0.502474
    7    8/7/2017  0.489937  0.502474
    8    9/7/2017  0.486125  0.502474
    9   10/7/2017  0.489709  0.502474
    10  11/7/2017  0.469203  0.502474
    11  12/7/2017  0.477248  0.489937
    12  13/7/2017  0.465611  0.489937
    13  14/7/2017  0.475194  0.489937
    14  15/7/2017  0.448205  0.489937
    15  16/7/2017  0.458304  0.489709
    16  17/7/2017  0.440561  0.489709
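
    The first seven rows of the table already have a value because of min_periods=1, which lets a window produce a result before it is full. Without it, min_periods defaults to the window size for an integer window, and the incomplete windows give NaN. A small sketch with a window of 3 and made-up numbers (not the data from the question) shows the difference:

```python
import pandas as pd

# Made-up values, used only to illustrate min_periods.
s = pd.Series([1.0, 3.0, 2.0, 5.0, 4.0])

# min_periods=1: a result is produced as soon as one value is in the window
partial = s.rolling(window=3, min_periods=1).max()

# default min_periods (= window size): NaN until 3 values are available
strict = s.rolling(window=3).max()

print(pd.DataFrame({'value': s, 'partial': partial, 'strict': strict}))
```

    Here partial is 1, 3, 3, 5, 5, while strict starts with two NaN values followed by 3, 5, 5.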
    

    Full code:

    import pandas as pd
    import io
    
    text  ='''timestamp value
    1/7/2017    0.4422709
    2/7/2017    0.47979677
    3/7/2017    0.48154536
    4/7/2017    0.50247365
    5/7/2017    0.45446774
    6/7/2017    0.44231474
    7/7/2017    0.48774317
    8/7/2017    0.48993695
    9/7/2017    0.48612505
    10/7/2017   0.48970944
    11/7/2017   0.46920314
    12/7/2017   0.47724804
    13/7/2017   0.4656107
    14/7/2017   0.47519404
    15/7/2017   0.44820467
    16/7/2017   0.4583039
    17/7/2017   0.44056067
    '''
    
    #df = pd.read_csv('filename', sep=r'\s+')
    df = pd.read_csv(io.StringIO(text), sep=r'\s+')  # raw string avoids an invalid-escape warning for '\s'
    
    # --- 1a ---
    
    df['part'] = df.index // 8
    result = df.groupby('part')['value'].max()
    print(result)
    
    # --- 1b ---
    
    result = df.groupby(df.index//8)['value'].max()
    print(result)
    
    # --- 2 ---
    
    df['max'] = df['value'].rolling(window=8, min_periods=1).max()
    print(df)
    
    2 years ago, 0 comments