How to find the max value of every 8 rows in a CSV file

社会演员多 · 2 years ago · python · 421

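Original title: How to find max value per each 8 row in csv format

I have two columns containing dates and values. I need to find the max value of every 8 rows, that is, of every 8 days. Can anyone suggest which tool or library I should use?

Sample data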

timestamp  value
1/7/2017    0.4422709
2/7/2017    0.47979677
3/7/2017    0.48154536
4/7/2017    0.50247365
5/7/2017    0.45446774
6/7/2017    0.44231474
7/7/2017    0.48774317
8/7/2017    0.48993695
9/7/2017    0.48612505
10/7/2017   0.48970944
11/7/2017   0.46920314
12/7/2017   0.47724804
13/7/2017   0.4656107
14/7/2017   0.47519404
15/7/2017   0.44820467
16/7/2017   0.4583039
17/7/2017   0.44056067

Original link: https://stackoverflow.com//questions/71476062/how-to-find-max-value-per-each-8-row-in-csv-format

Vraj Shah commented

Solution

You can try this. Just change the file name and the delimiter to match the structure of your CSV.

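import csv

filename = "text.csv"
days = 8  # number of rows per group
max_val = float("-inf")  # also works if all values are negative
max_values = []

with open(filename, newline="") as csvfile:
    # skipinitialspace=True collapses the runs of spaces between the columns
    reader = csv.DictReader(csvfile, delimiter=" ", skipinitialspace=True)
    for count, row in enumerate(reader, start=1):
        max_val = max(max_val, float(row["value"]))
        if count % days == 0:
            # a full group of 8 rows is complete
            max_values.append(max_val)
            max_val = float("-inf")

# note: a trailing group of fewer than 8 rows is not reported
print(max_values)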

Output

[0.50247365, 0.48970944]

2 years ago, 0 comments

Dev Parzival commented

Use simple file operations to find the max values.

with open("data.log", 'r') as file:
    lines = file.readlines()
    prevValues = []
    for index, line in enumerate(lines):
        if index == 0:  # skip the header line
            continue
        cols = line.split()  # split on any run of whitespace
        prevValues.append(float(cols[1]))
        if index % 8 == 0:  # we have read 8 rows
            print(max(prevValues))
            prevValues.clear()  # start the next group
    if len(prevValues) > 0:  # leftover rows that do not fill a group of 8
        print(max(prevValues))

Output:

0.50247365
0.48970944
0.44056067
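
The same chunking idea can be wrapped in a small reusable helper. This is only a sketch: the function name max_per_chunk and the values list are made up for illustration, and the values are assumed to be already parsed into floats.

```python
from itertools import islice

def max_per_chunk(values, size=8):
    """Yield the max of each consecutive group of `size` items.

    The last group may be shorter than `size` (hypothetical helper).
    """
    it = iter(values)
    while True:
        chunk = list(islice(it, size))  # take up to `size` items
        if not chunk:                   # input exhausted
            return
        yield max(chunk)

# the 17 sample values from the question
values = [
    0.4422709, 0.47979677, 0.48154536, 0.50247365,
    0.45446774, 0.44231474, 0.48774317, 0.48993695,
    0.48612505, 0.48970944, 0.46920314, 0.47724804,
    0.4656107, 0.47519404, 0.44820467, 0.4583039,
    0.44056067,
]

print(list(max_per_chunk(values)))
# [0.50247365, 0.48970944, 0.44056067]
```

Because it works on any iterable, it could also be fed values read lazily from a file instead of a list.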

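2 years ago, 0 comments

furas commented

I'm not sure whether you want the max for rows [0:8], [8:16], etc., or for rows [0:8], [1:9], [2:10], etc. (which seems more useful to me), so I used pandas.DataFrame to show both versions.

I use io to simulate a file, so that everyone can copy and test the code, but you should use filename to load your file.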

import pandas as pd
import io

text  ='''timestamp value
1/7/2017    0.4422709
2/7/2017    0.47979677
3/7/2017    0.48154536
4/7/2017    0.50247365
5/7/2017    0.45446774
6/7/2017    0.44231474
7/7/2017    0.48774317
8/7/2017    0.48993695
9/7/2017    0.48612505
10/7/2017   0.48970944
11/7/2017   0.46920314
12/7/2017   0.47724804
13/7/2017   0.4656107
14/7/2017   0.47519404
15/7/2017   0.44820467
16/7/2017   0.4583039
17/7/2017   0.44056067
'''

#df = pd.read_csv('filename', sep=r'\s+')
df = pd.read_csv(io.StringIO(text), sep=r'\s+')

If you need rows [0:8], [8:16], etc., then you can use

df['part'] = df.index // 8

which creates a new column, part

    timestamp     value  part
0    1/7/2017  0.442271     0
1    2/7/2017  0.479797     0
2    3/7/2017  0.481545     0
3    4/7/2017  0.502474     0
4    5/7/2017  0.454468     0
5    6/7/2017  0.442315     0
6    7/7/2017  0.487743     0
7    8/7/2017  0.489937     0
8    9/7/2017  0.486125     1
9   10/7/2017  0.489709     1
10  11/7/2017  0.469203     1
11  12/7/2017  0.477248     1
12  13/7/2017  0.465611     1
13  14/7/2017  0.475194     1
14  15/7/2017  0.448205     1
15  16/7/2017  0.458304     1
16  17/7/2017  0.440561     2

Then you can use it to group the rows and get max()

result = df.groupby('part')['value'].max()
print(result)

Result

0    0.502474
1    0.489709
2    0.440561

You can also do this without creating the column part

result = df.groupby( df.index//8 )['value'].max()
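
If you also want to know on which date each maximum occurred, one possible variation (not part of the original answer) is to ask for the index of the max with idxmax() and then select those rows. The name best_rows is just illustrative.

```python
import io
import pandas as pd

text = '''timestamp value
1/7/2017    0.4422709
2/7/2017    0.47979677
3/7/2017    0.48154536
4/7/2017    0.50247365
5/7/2017    0.45446774
6/7/2017    0.44231474
7/7/2017    0.48774317
8/7/2017    0.48993695
9/7/2017    0.48612505
10/7/2017   0.48970944
11/7/2017   0.46920314
12/7/2017   0.47724804
13/7/2017   0.4656107
14/7/2017   0.47519404
15/7/2017   0.44820467
16/7/2017   0.4583039
17/7/2017   0.44056067
'''

df = pd.read_csv(io.StringIO(text), sep=r'\s+')

# idxmax() returns, for each group of 8 rows, the index label of the row
# that holds the maximum value
max_index = df.groupby(df.index // 8)['value'].idxmax()

# select the full rows (timestamp and value) at those labels
best_rows = df.loc[max_index]
print(best_rows)
```

For the sample data this selects the rows for 4/7/2017, 10/7/2017 and 17/7/2017.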

If you need rows [0:8], [1:9], [2:10], etc., then you can use a rolling window of size 8

df['max'] = df['value'].rolling(window=8, min_periods=1).max()

You get

    timestamp     value       max
0    1/7/2017  0.442271  0.442271
1    2/7/2017  0.479797  0.479797
2    3/7/2017  0.481545  0.481545
3    4/7/2017  0.502474  0.502474
4    5/7/2017  0.454468  0.502474
5    6/7/2017  0.442315  0.502474
6    7/7/2017  0.487743  0.502474
7    8/7/2017  0.489937  0.502474
8    9/7/2017  0.486125  0.502474
9   10/7/2017  0.489709  0.502474
10  11/7/2017  0.469203  0.502474
11  12/7/2017  0.477248  0.489937
12  13/7/2017  0.465611  0.489937
13  14/7/2017  0.475194  0.489937
14  15/7/2017  0.448205  0.489937
15  16/7/2017  0.458304  0.489709
16  17/7/2017  0.440561  0.489709
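
Note that with min_periods=1 the first seven rows report the max of a shorter window (1 to 7 rows), and each window ends at the current row. If only complete 8-row windows are wanted, a possible sketch is to leave min_periods at its default and drop the NaN rows. The names values and full_windows are made up for this example, and the values are typed in directly instead of being read from a file.

```python
import pandas as pd

# the 17 sample values from the question
values = pd.Series([
    0.4422709, 0.47979677, 0.48154536, 0.50247365,
    0.45446774, 0.44231474, 0.48774317, 0.48993695,
    0.48612505, 0.48970944, 0.46920314, 0.47724804,
    0.4656107, 0.47519404, 0.44820467, 0.4583039,
    0.44056067,
])

# without min_periods, a window with fewer than 8 rows yields NaN,
# so dropna() keeps only the complete windows
full_windows = values.rolling(window=8).max().dropna()

# index 7 is the max of rows [0:8], index 8 the max of rows [1:9], and so on
print(full_windows)
```

With 17 rows this leaves 10 values, at index 7 through 16.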

Full code:

import pandas as pd
import io

text  ='''timestamp value
1/7/2017    0.4422709
2/7/2017    0.47979677
3/7/2017    0.48154536
4/7/2017    0.50247365
5/7/2017    0.45446774
6/7/2017    0.44231474
7/7/2017    0.48774317
8/7/2017    0.48993695
9/7/2017    0.48612505
10/7/2017   0.48970944
11/7/2017   0.46920314
12/7/2017   0.47724804
13/7/2017   0.4656107
14/7/2017   0.47519404
15/7/2017   0.44820467
16/7/2017   0.4583039
17/7/2017   0.44056067
'''

#df = pd.read_csv('filename', sep=r'\s+')
df = pd.read_csv(io.StringIO(text), sep=r'\s+')

# --- 1a ---

df['part'] = df.index // 8
result = df.groupby('part')['value'].max()
print(result)

# --- 1b ---

result = df.groupby(df.index//8)['value'].max()
print(result)

# --- 2 ---

df['max'] = df['value'].rolling(window=8, min_periods=1).max()
print(df)
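
The question also mentions "8 days". If some dates could be missing, so that 8 rows are not always 8 days, a possible alternative is to group by calendar time with resample(). This is only a sketch, and it assumes the timestamps are in day/month/year order (the sample contains 17/7/2017, which suggests this).

```python
import io
import pandas as pd

text = '''timestamp value
1/7/2017    0.4422709
2/7/2017    0.47979677
3/7/2017    0.48154536
4/7/2017    0.50247365
5/7/2017    0.45446774
6/7/2017    0.44231474
7/7/2017    0.48774317
8/7/2017    0.48993695
9/7/2017    0.48612505
10/7/2017   0.48970944
11/7/2017   0.46920314
12/7/2017   0.47724804
13/7/2017   0.4656107
14/7/2017   0.47519404
15/7/2017   0.44820467
16/7/2017   0.4583039
17/7/2017   0.44056067
'''

df = pd.read_csv(io.StringIO(text), sep=r'\s+')

# assumption: dates are written as day/month/year
df['timestamp'] = pd.to_datetime(df['timestamp'], format='%d/%m/%Y')

# resample needs a datetime index; '8D' makes bins of 8 calendar days,
# starting at the first day present in the data
by_days = df.set_index('timestamp')['value'].resample('8D').max()
print(by_days)
```

For the sample data, which has one row per day, this gives the same three maxima as grouping by every 8 rows.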

2 years ago, 0 comments
