How to find the max value of every 8 rows in a CSV file
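I have two columns containing a date and a value. I need to find the max value in every 8 rows (that is, every 8 days). Could anyone suggest what tool or library I should use?
Sample data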
timestamp value
1/7/2017 0.4422709
2/7/2017 0.47979677
3/7/2017 0.48154536
4/7/2017 0.50247365
5/7/2017 0.45446774
6/7/2017 0.44231474
7/7/2017 0.48774317
8/7/2017 0.48993695
9/7/2017 0.48612505
10/7/2017 0.48970944
11/7/2017 0.46920314
12/7/2017 0.47724804
13/7/2017 0.4656107
14/7/2017 0.47519404
15/7/2017 0.44820467
16/7/2017 0.4583039
17/7/2017 0.44056067
Vraj Shah commented
Solution
You can try this. Just change the file name and the delimiter to match the structure of your csv.
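    import csv

    filename = "text.csv"
    days = 8
    max_values = []
    max_val = float("-inf")  # start below any real value so negative data also works

    with open(filename, "r", newline="") as csvfile:
        reader = csv.DictReader(csvfile, delimiter=" ")
        for count, row in enumerate(reader, start=1):
            max_val = max(max_val, float(row["value"]))
            if count % days == 0:
                # a full block of 8 rows has been read
                max_values.append(max_val)
                max_val = float("-inf")

    # note: an incomplete final block (fewer than 8 rows) is not reported
    print(max_values)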
Output
0.50247365 0.48970944
2 years ago
Dev Parzival commented
Use simple file operations to find the max values:

    with open("data.log", "r") as file:
        prev_values = []
        for index, line in enumerate(file):
            if index == 0:
                continue  # skip the header row
            cols = line.split()
            prev_values.append(float(cols[1]))
            if index % 8 == 0:  # we have read 8 rows
                print(max(prev_values))
                prev_values.clear()  # start a new block
        if prev_values:  # leftover rows that do not fill a block of 8
            print(max(prev_values))

Output:
0.50247365 0.48970944 0.44056067
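The two standard-library answers differ in one detail: whether the incomplete last block (here the single row for 17/7/2017) is reported. A minimal sketch that makes this choice explicit is shown below. The helper name `chunk_max` and its `keep_partial` flag are hypothetical, introduced only for illustration, and the values are typed in directly instead of being read from a file.

```python
from itertools import islice


def chunk_max(values, size=8, keep_partial=True):
    """Yield the maximum of each consecutive block of `size` values."""
    it = iter(values)
    while True:
        block = list(islice(it, size))  # take up to `size` items
        if not block:
            return  # input exhausted
        if len(block) < size and not keep_partial:
            return  # drop the incomplete final block
        yield max(block)


# The 17 sample values from the question, in order.
values = [
    0.4422709, 0.47979677, 0.48154536, 0.50247365,
    0.45446774, 0.44231474, 0.48774317, 0.48993695,
    0.48612505, 0.48970944, 0.46920314, 0.47724804,
    0.4656107, 0.47519404, 0.44820467, 0.4583039,
    0.44056067,
]

print(list(chunk_max(values)))                      # incomplete block kept
print(list(chunk_max(values, keep_partial=False)))  # incomplete block dropped
```

With `keep_partial=True` the result has three entries, matching the output just above; with `keep_partial=False` it has two, matching the first answer.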
2 years ago
furas commented
I'm not sure whether you want the max for rows [0:8], [8:16] or for rows [0:8], [1:9], [2:10], etc. (which seems more useful to me). I use pandas.DataFrame to show both versions.
I use io only to simulate a file, so everyone can copy and test the code, but you should use filename to load your own file.

    import pandas as pd
    import io

    text = '''timestamp value
    1/7/2017 0.4422709
    2/7/2017 0.47979677
    3/7/2017 0.48154536
    4/7/2017 0.50247365
    5/7/2017 0.45446774
    6/7/2017 0.44231474
    7/7/2017 0.48774317
    8/7/2017 0.48993695
    9/7/2017 0.48612505
    10/7/2017 0.48970944
    11/7/2017 0.46920314
    12/7/2017 0.47724804
    13/7/2017 0.4656107
    14/7/2017 0.47519404
    15/7/2017 0.44820467
    16/7/2017 0.4583039
    17/7/2017 0.44056067
    '''

    # raw string so that \s is passed to the regex engine unchanged
    #df = pd.read_csv('filename', sep=r'\s+')
    df = pd.read_csv(io.StringIO(text), sep=r'\s+')
If you need rows [0:8], [8:16] then you can use df['part'] = df.index // 8 to create a new column part:

        timestamp     value  part
    0    1/7/2017  0.442271     0
    1    2/7/2017  0.479797     0
    2    3/7/2017  0.481545     0
    3    4/7/2017  0.502474     0
    4    5/7/2017  0.454468     0
    5    6/7/2017  0.442315     0
    6    7/7/2017  0.487743     0
    7    8/7/2017  0.489937     0
    8    9/7/2017  0.486125     1
    9   10/7/2017  0.489709     1
    10  11/7/2017  0.469203     1
    11  12/7/2017  0.477248     1
    12  13/7/2017  0.465611     1
    13  14/7/2017  0.475194     1
    14  15/7/2017  0.448205     1
    15  16/7/2017  0.458304     1
    16  17/7/2017  0.440561     2
Then you can use it to group the rows and get max():

    result = df.groupby('part')['value'].max()
    print(result)
Result:

    0    0.502474
    1    0.489709
    2    0.440561
You can also do it without creating the column part:

    result = df.groupby(df.index // 8)['value'].max()
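Since the question has a date column, it may also be useful to know on which date each block's maximum occurred. A small sketch of one way to do that, assuming the default 0, 1, 2, ... index produced by read_csv: `idxmax()` returns the index label of the largest value in each group, and `df.loc` then fetches those rows. The names `idx` and `peaks` are only illustrative.

```python
import io

import pandas as pd

text = '''timestamp value
1/7/2017 0.4422709
2/7/2017 0.47979677
3/7/2017 0.48154536
4/7/2017 0.50247365
5/7/2017 0.45446774
6/7/2017 0.44231474
7/7/2017 0.48774317
8/7/2017 0.48993695
9/7/2017 0.48612505
10/7/2017 0.48970944
11/7/2017 0.46920314
12/7/2017 0.47724804
13/7/2017 0.4656107
14/7/2017 0.47519404
15/7/2017 0.44820467
16/7/2017 0.4583039
17/7/2017 0.44056067
'''

df = pd.read_csv(io.StringIO(text), sep=r'\s+')

# Index label of the row holding the max within each block of 8 rows.
idx = df.groupby(df.index // 8)['value'].idxmax()

# Full rows (timestamp and value) for those labels.
peaks = df.loc[idx]
print(peaks)
```

For the sample data this selects the rows for 4/7/2017, 10/7/2017 and 17/7/2017.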
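If you need rows [0:8], [1:9], [2:10], etc. then you can use a rolling window of size 8. With min_periods=1 the first rows, which have fewer than 8 values behind them, still get a result instead of NaN:

    df['max'] = df['value'].rolling(window=8, min_periods=1).max()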
You get:

        timestamp     value       max
    0    1/7/2017  0.442271  0.442271
    1    2/7/2017  0.479797  0.479797
    2    3/7/2017  0.481545  0.481545
    3    4/7/2017  0.502474  0.502474
    4    5/7/2017  0.454468  0.502474
    5    6/7/2017  0.442315  0.502474
    6    7/7/2017  0.487743  0.502474
    7    8/7/2017  0.489937  0.502474
    8    9/7/2017  0.486125  0.502474
    9   10/7/2017  0.489709  0.502474
    10  11/7/2017  0.469203  0.502474
    11  12/7/2017  0.477248  0.489937
    12  13/7/2017  0.465611  0.489937
    13  14/7/2017  0.475194  0.489937
    14  15/7/2017  0.448205  0.489937
    15  16/7/2017  0.458304  0.489709
    16  17/7/2017  0.440561  0.489709
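The question mentions "8 rows or 8 days". Those are the same thing only when no day is missing from the file. If the blocks should follow the calendar instead of the row count, one possible sketch is to parse the dates and use resample('8D'). This assumes the timestamps are day/month/year (the value 13/7/2017 suggests so); the column name `date` and the variable `by_period` are illustrative.

```python
import io

import pandas as pd

text = '''timestamp value
1/7/2017 0.4422709
2/7/2017 0.47979677
3/7/2017 0.48154536
4/7/2017 0.50247365
5/7/2017 0.45446774
6/7/2017 0.44231474
7/7/2017 0.48774317
8/7/2017 0.48993695
9/7/2017 0.48612505
10/7/2017 0.48970944
11/7/2017 0.46920314
12/7/2017 0.47724804
13/7/2017 0.4656107
14/7/2017 0.47519404
15/7/2017 0.44820467
16/7/2017 0.4583039
17/7/2017 0.44056067
'''

df = pd.read_csv(io.StringIO(text), sep=r'\s+')

# Assumption: dates are written as day/month/year.
df['date'] = pd.to_datetime(df['timestamp'], format='%d/%m/%Y')

# Max per 8-day calendar period, counted from the first day in the data.
by_period = df.set_index('date')['value'].resample('8D').max()
print(by_period)
```

On the sample data, which has one row per day, this gives the same three maxima as grouping by df.index // 8. The two approaches diverge as soon as a day is missing from the file.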
Full code:

    import pandas as pd
    import io

    text = '''timestamp value
    1/7/2017 0.4422709
    2/7/2017 0.47979677
    3/7/2017 0.48154536
    4/7/2017 0.50247365
    5/7/2017 0.45446774
    6/7/2017 0.44231474
    7/7/2017 0.48774317
    8/7/2017 0.48993695
    9/7/2017 0.48612505
    10/7/2017 0.48970944
    11/7/2017 0.46920314
    12/7/2017 0.47724804
    13/7/2017 0.4656107
    14/7/2017 0.47519404
    15/7/2017 0.44820467
    16/7/2017 0.4583039
    17/7/2017 0.44056067
    '''

    # raw string so that \s is passed to the regex engine unchanged
    #df = pd.read_csv('filename', sep=r'\s+')
    df = pd.read_csv(io.StringIO(text), sep=r'\s+')

    # --- 1a --- max per block of 8 rows, using a helper column

    df['part'] = df.index // 8
    result = df.groupby('part')['value'].max()
    print(result)

    # --- 1b --- the same, without the helper column

    result = df.groupby(df.index // 8)['value'].max()
    print(result)

    # --- 2 --- rolling max over the last 8 rows

    df['max'] = df['value'].rolling(window=8, min_periods=1).max()
    print(df)
2 years ago