Elegant data processing in Python


I don't know how to name this problem. I have a list of tuples in Python:

(int, str, datetime, float) 

There are a bunch of rows in the list, sorted by datetime. I'd like to count how many rows in each 5-minute time span have floats in a given range, i.e. 0 to 2, 2 to 5, 5 to 10, and so on. For example, given data like this (the date is not a string but a datetime.datetime):

    (1, 'abc', '2014-09-10 17:50:34', 5.5)
    (2, 'abc', '2014-09-10 17:51:34', 1.5)
    (3, 'abc', '2014-09-10 17:52:14', 7.1)
    (4, 'abc', '2014-09-10 17:59:34', 9.5)
    (5, 'abc', '2014-09-10 17:59:54', 9.2)
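For anyone who wants to experiment, here is the same sample written as a Python list with real `datetime.datetime` objects in the third field, as the question describes. The variable name `data` is just an assumption chosen for illustration.

```python
import datetime

# The five sample rows from the question, with datetime.datetime objects
# in the third field instead of strings.
data = [
    (1, 'abc', datetime.datetime(2014, 9, 10, 17, 50, 34), 5.5),
    (2, 'abc', datetime.datetime(2014, 9, 10, 17, 51, 34), 1.5),
    (3, 'abc', datetime.datetime(2014, 9, 10, 17, 52, 14), 7.1),
    (4, 'abc', datetime.datetime(2014, 9, 10, 17, 59, 34), 9.5),
    (5, 'abc', datetime.datetime(2014, 9, 10, 17, 59, 54), 9.2),
]

# The question states the rows are already sorted by datetime.
assert data == sorted(data, key=lambda row: row[2])
```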

I'd like to get this kind of dictionary:

    {
        the_end_of_time_interval1: {'0to2': int, '2to5': int, '5to10': int, ...},
        the_end_of_time_interval2: {'0to2': int, '2to5': int, '5to10': int, ...},
        ...
    }

For example:

    {
        '2014-09-10 17:52:34': {'0to2': 1, '2to5': 0, '5to10': 2, '10to15': 0},
        '2014-09-10 17:59:54': {'0to2': 0, '2to5': 0, '5to10': 2, '10to15': 0}
    }

My question: is there an elegant way to do that? I'd like to save the result to a file and send it to a database for monitoring purposes.

You can make use of itertools.groupby.
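The reason the sorting step matters: `itertools.groupby` only merges *consecutive* items that share a key, so the same key can show up in several groups if the input is not ordered by that key. A small demonstration on plain integers:

```python
import itertools

# groupby only merges consecutive items with equal keys,
# so unsorted input produces the same key more than once.
values = [1, 1, 2, 1]
unsorted_groups = [(k, len(list(g))) for k, g in itertools.groupby(values)]
print(unsorted_groups)  # [(1, 2), (2, 1), (1, 1)]

# After sorting, each key forms exactly one group.
sorted_groups = [(k, len(list(g))) for k, g in itertools.groupby(sorted(values))]
print(sorted_groups)  # [(1, 3), (2, 1)]
```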

You need functions for grouping/sorting.

One for the dates, which are already in order:

    import datetime

    the_date = None

    def split_5_min(data_row):
        global the_date
        # the question says the date is already a datetime.datetime,
        # so no strptime() parsing is needed
        row_date = data_row[2]
        if the_date is None or row_date - the_date > datetime.timedelta(minutes=5):
            # start a new 5-minute window at this row
            the_date = row_date
        return the_date
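The global `the_date` keeps its value between calls, so calling the key function a second time over the same data starts from a stale window. A minimal sketch that avoids the global by keeping the window start in a closure follows; `make_window_key` is a hypothetical name, and it assumes the same rule as above (a new window opens when a row is more than 5 minutes after the current window's start, and rows arrive in datetime order).

```python
import datetime

def make_window_key(minutes=5):
    """Return a key function mapping each row to the start of its time window.

    A new window opens when a row is more than `minutes` after the current
    window's start. Rows must be fed in datetime order.
    """
    window_start = None
    span = datetime.timedelta(minutes=minutes)

    def window_key(row):
        nonlocal window_start
        row_date = row[2]  # assumed to be a datetime.datetime
        if window_start is None or row_date - window_start > span:
            window_start = row_date
        return window_start

    return window_key

rows = [
    (1, 'abc', datetime.datetime(2014, 9, 10, 17, 50, 34), 5.5),
    (2, 'abc', datetime.datetime(2014, 9, 10, 17, 51, 34), 1.5),
    (3, 'abc', datetime.datetime(2014, 9, 10, 17, 52, 14), 7.1),
    (4, 'abc', datetime.datetime(2014, 9, 10, 17, 59, 34), 9.5),
    (5, 'abc', datetime.datetime(2014, 9, 10, 17, 59, 54), 9.2),
]

# A fresh key function per pass means no state leaks between runs.
key = make_window_key()
starts = [key(row) for row in rows]
```

On the sample, the first three rows share the window starting at 17:50:34, and the row at 17:59:34 (9 minutes later) opens a new window.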

Something along these lines should do it.

Then you need one to put the floats into buckets:

    def bucket_floats(data_row):
        float_data = data_row[3]
        if 0 < float_data <= 2:
            return 1
        elif 2 < float_data <= 5:
            return 2
        elif 5 < float_data <= 10:
            return 3
        # ... and so on for the remaining ranges
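Instead of a growing `if`/`elif` chain, the bucket can be found with a binary search over the range edges using the standard `bisect` module, which also yields the `'0to2'`-style labels the question asks for. This is a sketch: the edges are taken from the question's example keys, and `EDGES`, `LABELS` and `bucket_label` are hypothetical names. It keeps the same half-open ranges as `bucket_floats` (lower bound excluded, upper bound included).

```python
import bisect

# Range edges from the question's example keys: 0to2, 2to5, 5to10, 10to15.
EDGES = [0, 2, 5, 10, 15]
LABELS = ['%dto%d' % (lo, hi) for lo, hi in zip(EDGES, EDGES[1:])]

def bucket_label(value):
    """Return the label of the range (lo, hi] containing value, or None."""
    i = bisect.bisect_left(EDGES, value)
    if i == 0 or i == len(EDGES):
        return None  # value <= 0 or value > 15: outside every range
    return LABELS[i - 1]
```

Adding a range then only means appending an edge to `EDGES`.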

So that's the meat of things. You want to have your data in a list called data, and then:

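    import itertools

    final_dict = {}
    # groupby creates a new group each time the key value changes, so sort first;
    # sort on the datetime itself (sorting on split_5_min would consume its state)
    data.sort(key=lambda row: row[2])
    the_date = None  # reset the window state used by split_5_min
    for period, data_tuples in itertools.groupby(data, split_5_min):
        group_data = list(data_tuples)
        # sort again, this time by bucket
        group_data.sort(key=bucket_floats)
        final_dict[period] = {}
        # grouping within the 5 min group
        for bucket, float_tuples in itertools.groupby(group_data, bucket_floats):
            # pack the dict
            final_dict[period][bucket] = len(list(float_tuples))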
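Putting the pieces together, a self-contained sketch of the whole pipeline might look like this. It swaps the inner sort-and-groupby for `collections.Counter`, which counts each window's buckets in one pass without a second sort, and it fills in zero counts so every window has every label, matching the shape in the question. Assumptions to note: the helper names are hypothetical, rows must already be sorted by datetime, and the dictionary is keyed by the *start* of each window (like `split_5_min`), not the end of the interval the question mentions.

```python
import bisect
import datetime
import itertools
from collections import Counter

EDGES = [0, 2, 5, 10, 15]  # from the question's example ranges
LABELS = ['%dto%d' % (lo, hi) for lo, hi in zip(EDGES, EDGES[1:])]

def bucket_label(value):
    # Label of the range (lo, hi] containing value, or None if out of range.
    i = bisect.bisect_left(EDGES, value)
    return LABELS[i - 1] if 0 < i < len(EDGES) else None

def count_by_window(rows, minutes=5):
    """Count rows per (time window, float range); rows must be sorted by datetime."""
    span = datetime.timedelta(minutes=minutes)
    window_start = None

    def window_key(row):
        nonlocal window_start
        if window_start is None or row[2] - window_start > span:
            window_start = row[2]
        return window_start

    result = {}
    for start, group in itertools.groupby(rows, window_key):
        counts = Counter(bucket_label(row[3]) for row in group)
        # Report every range, including empty ones; out-of-range rows are dropped.
        result[start] = {label: counts[label] for label in LABELS}
    return result

rows = [
    (1, 'abc', datetime.datetime(2014, 9, 10, 17, 50, 34), 5.5),
    (2, 'abc', datetime.datetime(2014, 9, 10, 17, 51, 34), 1.5),
    (3, 'abc', datetime.datetime(2014, 9, 10, 17, 52, 14), 7.1),
    (4, 'abc', datetime.datetime(2014, 9, 10, 17, 59, 34), 9.5),
    (5, 'abc', datetime.datetime(2014, 9, 10, 17, 59, 54), 9.2),
]
result = count_by_window(rows)
```

If the end of the interval is wanted as the key instead, `start + datetime.timedelta(minutes=5)` is one option.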

I think the datetime bucketing may be off here; you may want something more complicated, or a straight sort on the dates before you run split_5_min on the data.
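For the "save it to a file" part of the question: `json.dumps` rejects `datetime` keys, so they need converting to strings first. A minimal sketch, assuming the result maps `datetime` window keys to `{label: count}` dicts; `to_json` is a hypothetical helper, and the resulting string can then be written to a file or handed to whatever database client is in use.

```python
import datetime
import json

def to_json(result):
    """Serialize {datetime: {label: count}} to JSON with 'YYYY-MM-DD HH:MM:SS' keys."""
    # JSON object keys must be strings, so convert each datetime key explicitly.
    return json.dumps(
        {start.isoformat(sep=' '): counts for start, counts in result.items()},
        sort_keys=True,
    )

result = {
    datetime.datetime(2014, 9, 10, 17, 50, 34): {'0to2': 1, '2to5': 0, '5to10': 2, '10to15': 0},
}
text = to_json(result)
```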

