numpy - Organizing column and header data with pandas, Python
I'm having a go at using numpy instead of MATLAB, but I'm relatively new to Python.
My current challenge is importing the data from multiple files in a sensible way so that I can use it and plot it. The data is organized in columns (temperature, pressure, time, etc., each file being a measurement period), and I decided pandas was probably the best way to import the data. I was thinking of using a top-level descriptor for each file, and subdescriptors for each column. I thought of doing it something like this: "Reading multiple CSV files into Python pandas DataFrame".
The problem is that I'd like to retain and use some of the data in the header (for plotting, for instance). There are no column titles, just general info on the data measurements, something like this:
    flight id: xxxxxx
    date: 01-27-10 time: 5:25:19
    owner
    release point: xx.304n xx.060e 11 m
    serial number xxxxxx
    surface data: 985.1 mb 1.0 c 100% 1.0 m/s @ 308 deg.
I don't know how to extract and store the data in a way that makes sense when combined with the data frame. I thought of perhaps using a dictionary, but I'm not sure how to split the data efficiently since there's no consistent divider. Any ideas?
Looks like you are working with radiosondes...
When I pull in my radiosonde data I usually put it in a multi-level indexed DataFrame. The levels can take various forms and orders (flight_num, date, altitude, etc.), whatever makes sense for your data. Also, when working with sonde data I often want additional information that does not need to be stored within the DataFrame itself, so I store it as additional attributes. I parse the file and store it along the lines of the following (yes, there are modifications that can be made to "improve" this):
    import pandas as pd

    with open("filename.csv", 'r') as f:
        header = f.read().split('\n')[:5]  # change to match the number of header rows
    # read_csv opens the file by path itself, so the handle above can stay closed
    data = pd.read_csv("filename.csv", skiprows=6, skipinitialspace=True,
                       na_values=[-999, 'Infinity', '-Infinity'])
    # now you can parse the header to get out the necessary information;
    # continue until you have all the header info you want/need, e.g.
    flight = header[0].split(': ')[1]
    date = header[1].split(': ')[1].split()[0]  # split() with no argument splits on whitespace
    time = header[1].split(': ')[2]
    # A lot of the header information gets stored as metadata for me.
    # You may want more than the flight number and date in your metadata, but you get the point.
    data.metadata = {'flight': flight, 'date': date}
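Since the question points out that the header has no consistent divider, one alternative to chained split() calls is a small table of regular expressions, one per field. This is only a minimal sketch: `HEADER_PATTERNS`, `parse_header` and the field names are hypothetical, and the sample lines assume the header layout shown in the question.

```python
import re

# Header lines as shown in the question (layout is an assumption).
SAMPLE_HEADER = [
    "flight id: xxxxxx",
    "date: 01-27-10 time: 5:25:19",
    "owner",
    "release point: xx.304n xx.060e 11 m",
    "serial number xxxxxx",
    "surface data: 985.1 mb 1.0 c 100% 1.0 m/s @ 308 deg.",
]

# Hypothetical field names mapped to one capturing pattern each.
HEADER_PATTERNS = {
    "flight": r"flight id:\s*(\S+)",
    "date": r"date:\s*(\S+)",
    "time": r"time:\s*(\S+)",
    "serial": r"serial number\s+(\S+)",
    "surface_pressure_mb": r"surface data:\s*([-\d.]+)\s*mb",
    "surface_temp_c": r"surface data:.*?mb\s+([-\d.]+)\s*c\b",
}


def parse_header(lines):
    """Return a dict of the header fields that could be found."""
    # Join the lines so the patterns do not depend on where line breaks fall.
    text = " ".join(lines)
    meta = {}
    for key, pattern in HEADER_PATTERNS.items():
        match = re.search(pattern, text, flags=re.IGNORECASE)
        if match:
            meta[key] = match.group(1)
    return meta


meta = parse_header(SAMPLE_HEADER)
```

Fields that are missing from a given file are simply left out of the dictionary, so the same function can be reused on files whose headers differ slightly.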
I presume you have a date/time column (call it "dates" here) within your file, so you can use that to re-index your DataFrame. If you choose to use different variables within your multi-level index, the same method applies.
    new_index = [(data.metadata['flight'], r) for r in data.dates]
    data.index = pd.MultiIndex.from_tuples(new_index)
You now have a multi-level indexed DataFrame.
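To show what the multi-level index buys you once several files are loaded, here is a rough sketch that stacks two flights with pd.concat and then pulls one back out. The flight names, timestamps and temperatures are made up for illustration, and the level names "flight" and "dates" are assumptions.

```python
import pandas as pd

# Two hypothetical flights, each indexed by its own date/time column.
flights = {
    "flight_a": pd.DataFrame(
        {"temperature": [1.0, 0.5]},
        index=pd.to_datetime(["2010-01-27 05:25:19", "2010-01-27 05:25:21"]),
    ),
    "flight_b": pd.DataFrame(
        {"temperature": [-0.5, -1.0]},
        index=pd.to_datetime(["2010-01-28 05:30:00", "2010-01-28 05:30:02"]),
    ),
}

# The dict keys become the outer index level; names labels both levels.
combined = pd.concat(flights, names=["flight", "dates"])

# Select every row of one flight; the "flight" level is dropped from the result.
one_flight = combined.xs("flight_a", level="flight")
```

From here, one_flight["temperature"].plot() would plot a single flight against its time index, while combined keeps all flights in one object.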
Now, regarding the "metadata". EdChum makes an excellent point that if you copy "data" you will not copy over the metadata dictionary. Also, if you save the "data" DataFrame via data.to_pickle you will lose your metadata (more on this later). If you want to keep your metadata you have a couple of options.
Save the data on a flight-by-flight basis. This will allow you to store the metadata with each individual flight's file.
Assuming you want to have multiple flights within one saved file: you can add an additional column within your DataFrame that holds that information (i.e. a column for flight number, a column for surface temperature, etc.), though this will increase the size of your saved file.
Assuming you want to have multiple flights within one saved file (option 2): you can make your metadata dictionary "keyed" by flight number, e.g.
    data.metadata = {flight1: {'date': date}, flight2: {'date': date}}
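The last two options can be sketched side by side. This is an illustration under assumptions, not a fixed recipe: the flight names, dates and values are invented, the helpers `metadata_for` and `with_metadata_columns` are hypothetical, and the metadata is kept in a plain dictionary next to the DataFrame instead of as an attribute on it.

```python
import pandas as pd

# Hypothetical metadata, keyed by flight number.
metadata = {
    "flight_a": {"date": "01-27-10", "surface_temp_c": 1.0},
    "flight_b": {"date": "01-28-10", "surface_temp_c": -0.5},
}

index = pd.MultiIndex.from_tuples(
    [("flight_a", 0), ("flight_a", 1), ("flight_b", 0)],
    names=["flight", "sample"],
)
data = pd.DataFrame({"temperature": [1.0, 0.5, -0.5]}, index=index)


def metadata_for(df, metadata):
    """Keyed-dictionary option: keep only entries for flights present in df."""
    flights = df.index.get_level_values("flight").unique()
    return {flight: metadata[flight] for flight in flights}


def with_metadata_columns(df, metadata, fields):
    """Extra-column option: copy selected metadata fields onto every row."""
    out = df.copy()
    flights = out.index.get_level_values("flight")
    for field in fields:
        out[field] = [metadata[flight][field] for flight in flights]
    return out


subset = data.xs("flight_a", level="flight", drop_level=False)
subset_meta = metadata_for(subset, metadata)
wide = with_metadata_columns(data, metadata, ["date"])
```

The column version repeats each value on every row of a flight, which is where the extra file size comes from; the dictionary version stores each value once but has to be carried along and saved separately.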
Now, to store the metadata: check out the IO class for storing additional attributes within an h5 file, posted here.
Your question was quite broad, so you got a broad answer. I hope this was helpful.
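The IO class referred to above is not reproduced here. As a rough stand-in, the sketch below uses pandas' HDFStore and attaches the metadata dictionary as an attribute on the stored node. It assumes PyTables is installed; `save_with_metadata`, `load_with_metadata` and the attribute name "metadata" are hypothetical choices for this example.

```python
import pandas as pd


def save_with_metadata(path, key, df, metadata):
    """Write df to an HDF5 file and attach metadata to the same node."""
    # mode="w" creates the file, replacing any existing file at path.
    with pd.HDFStore(path, mode="w") as store:
        store.put(key, df)
        # Attributes live on the node that holds the DataFrame.
        store.get_storer(key).attrs.metadata = metadata


def load_with_metadata(path, key):
    """Read back both the DataFrame and the metadata attached to it."""
    with pd.HDFStore(path, mode="r") as store:
        df = store[key]
        metadata = store.get_storer(key).attrs.metadata
    return df, metadata
```

A call such as save_with_metadata("flights.h5", "flights", data, metadata) followed by load_with_metadata("flights.h5", "flights") should round-trip both objects, which avoids the loss described above for data.to_pickle.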