Elegant data processing in Python
I don't know how to name this problem. I have a list of tuples in Python:
    (int, str, datetime, float)
There are a bunch of rows in the list, sorted by datetime, and I'd like to count how many rows in each 5-minute time span have floats in a given range, i.e. 0 to 2, 2 to 5, 5 to 10, and so on. I mean, given such data (the date is not a string, it is a datetime.datetime):
    (1, 'abc', '2014-09-10 17:50:34', 5.5)
    (2, 'abc', '2014-09-10 17:51:34', 1.5)
    (3, 'abc', '2014-09-10 17:52:14', 7.1)
    (4, 'abc', '2014-09-10 17:59:34', 9.5)
    (5, 'abc', '2014-09-10 17:59:54', 9.2)
I'd like to receive a kind of dictionary:
    {
        the_end_of_time_interval1: {'0to2': int, '2to5': int, '5to10': int, ... },
        the_end_of_time_interval2: {'0to2': int, '2to5': int, '5to10': int, ... },
        ...
    }
For example:
    {
        '2014-09-10 17:52:34': {'0to2': 1, '2to5': 0, '5to10': 2, '10to15': 0},
        '2014-09-10 17:59:54': {'0to2': 0, '2to5': 0, '5to10': 2, '10to15': 0}
    }
My question: is there an elegant way to do that? I'd like to save the result to a file or send it to a database for monitoring purposes.
You can make use of itertools.groupby.
You need a function for the grouping/sorting.
One for the dates, in order:
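    import datetime

    the_date = None

    def split_5_min(data_row):
        global the_date
        # Parse the timestamp; skip strptime if data_row[2] is already a datetime.
        row_date = datetime.datetime.strptime(data_row[2], '%Y-%m-%d %H:%M:%S')
        # Start a new group once a row is more than 5 minutes past the group start.
        if the_date is None or row_date - the_date > datetime.timedelta(minutes=5):
            the_date = row_date
        return the_date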
Something along those lines should do it.
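If the windows may be aligned to the clock (17:50, 17:55, and so on) instead of starting at the first row of each group, a stateless key function avoids the global variable. This is only a sketch of that alternative, not the same grouping as split_5_min: floor_to_5_min is a hypothetical helper name, and it assumes the rows already hold datetime.datetime values, as the question states.

```python
import datetime
import itertools

def floor_to_5_min(stamp):
    """Round a datetime down to the start of its clock-aligned 5-minute bin."""
    return stamp.replace(minute=stamp.minute - stamp.minute % 5,
                         second=0, microsecond=0)

# Timestamps taken from the sample rows in the question, already in order.
stamps = [
    datetime.datetime(2014, 9, 10, 17, 50, 34),
    datetime.datetime(2014, 9, 10, 17, 51, 34),
    datetime.datetime(2014, 9, 10, 17, 52, 14),
    datetime.datetime(2014, 9, 10, 17, 59, 34),
    datetime.datetime(2014, 9, 10, 17, 59, 54),
]

# groupby only merges adjacent items, so the input must be sorted by time.
rows_per_bin = {start: len(list(group))
                for start, group in itertools.groupby(stamps, key=floor_to_5_min)}
```

Because the key depends only on the row itself, it can be passed to sort and groupby in any order without resetting anything in between.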
Then you need one to put the floats into buckets:
    def bucket_floats(data_row):
        float_data = data_row[3]
        # Each bucket excludes its lower bound and includes its upper bound.
        if float_data > 0 and float_data <= 2:
            return 1
        elif float_data > 2 and float_data <= 5:
            return 2
        elif float_data > 5 and float_data <= 10:
            return 3
        ...
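The if/elif chain can also be written as a table lookup, which additionally yields the '0to2'-style labels the question asks for. A minimal sketch, assuming the bucket edges 0, 2, 5, 10, 15 (the question only says "and so on" past 10) and the same lower-exclusive, upper-inclusive ranges as bucket_floats; EDGES, LABELS and bucket_label are hypothetical names.

```python
import bisect

# Assumed bucket edges; extend both lists together to add more ranges.
EDGES = [0, 2, 5, 10, 15]
LABELS = ['0to2', '2to5', '5to10', '10to15']

def bucket_label(value):
    """Return the label of the range (lo, hi] that contains value, or None."""
    # bisect_left gives the index of the first edge >= value.
    i = bisect.bisect_left(EDGES, value)
    if i == 0 or i == len(EDGES):
        return None  # value <= 0 or value > 15: outside every bucket
    return LABELS[i - 1]
```

For example, bucket_label(5.5) gives '5to10', while bucket_label(2) gives '0to2' because upper bounds are inclusive.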
So now the meat of things. You want to have your data in a list,
then:
    import itertools

    final_dict = {}
    # Sort first: groupby creates a new group each time the key value changes.
    data.sort(key=split_5_min)
    # Reset the state left over from the sort before grouping.
    the_date = None
    for period, data_tuples in itertools.groupby(data, split_5_min):
        group_data = list(data_tuples)
        # Sort again, this time by bucket.
        group_data.sort(key=bucket_floats)
        final_dict[period] = {}
        # Grouping within the 5-minute group.
        for bucket, float_tuples in itertools.groupby(group_data, bucket_floats):
            # Pack the dict.
            final_dict[period][bucket] = len(list(float_tuples))
I think the datetime bucketing may be off here... you may want something more complicated. Or do a straight sort on the dates before you run the split_5_min function on it.
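For comparison, here is a single-pass sketch that skips groupby and the global variable altogether and builds the labelled dictionary directly. It makes several assumptions: rows are already sorted by time and hold datetime.datetime values, each window is keyed by the timestamp of its first row (the question asks for the end of the interval, which would need a different key), and the ranges stop at 15. The names summarize, label_for, UPPER and LABELS are hypothetical.

```python
import datetime

WINDOW = datetime.timedelta(minutes=5)
UPPER = [2, 5, 10, 15]                        # assumed upper bounds, inclusive
LABELS = ['0to2', '2to5', '5to10', '10to15']  # labels as used in the question

def label_for(value):
    """Return the label of the range (lo, hi] containing value, or None."""
    if value <= 0:
        return None
    for upper, label in zip(UPPER, LABELS):
        if value <= upper:
            return label
    return None

def summarize(rows):
    """Count rows per bucket within windows of at most 5 minutes."""
    result = {}
    window_start = None
    for _id, _name, stamp, value in rows:
        # Open a new window when the row is more than 5 minutes past the start.
        if window_start is None or stamp - window_start > WINDOW:
            window_start = stamp
            result[window_start] = dict.fromkeys(LABELS, 0)
        label = label_for(value)
        if label is not None:
            result[window_start][label] += 1
    return result

# Sample rows from the question, with real datetime objects.
rows = [
    (1, 'abc', datetime.datetime(2014, 9, 10, 17, 50, 34), 5.5),
    (2, 'abc', datetime.datetime(2014, 9, 10, 17, 51, 34), 1.5),
    (3, 'abc', datetime.datetime(2014, 9, 10, 17, 52, 14), 7.1),
    (4, 'abc', datetime.datetime(2014, 9, 10, 17, 59, 34), 9.5),
    (5, 'abc', datetime.datetime(2014, 9, 10, 17, 59, 54), 9.2),
]
summary = summarize(rows)
```

With the sample rows this yields two windows whose counts match the example in the question: one '0to2' and two '5to10' in the first, two '5to10' in the second.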