python - h5py: slicing a dataset without loading it into memory


Is it possible to slice an h5py dataset into two subsets without loading them into memory? E.g.:

dset = h5py.File("/2tbhd/tst.h5py", "r")
x_train = dset['x'][:n//2]
x_test = dset['x'][n//2:-1]

No.

You would need to implement your own class to act as a view on the dataset. An old thread on the h5py mailing list indicates that such a DatasetView class is theoretically possible to implement using HDF5 dataspaces, but it is probably not worth it for many use cases. Element-wise access would be very slow compared to a normal NumPy array (assuming you can fit your data into memory).

Edit: If you want to avoid messing with HDF5 dataspaces (whatever that means), you might settle for a simpler approach. Try this gist I wrote. Use it like this:

import h5py
import numpy
from simpleview import SimpleView

dset = h5py.File("/2tbhd/tst.h5py", "r")
x_view = SimpleView(dset['x'])

# Stores slices, but doesn't load the data into memory.
x_train = x_view[:n//2]
x_test = x_view[n//2:-1]

# These statements will load the data into memory.
print(numpy.sum(x_train))
print(numpy.array(x_test)[0])

Note that the slicing support in this simple example is limited. If you want full slicing and element-wise access, you'll have to copy the data into a real array:

x_train_copy = numpy.array(x_train) 
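
The gist itself is not reproduced here. To give a rough idea of how a view can store slices without reading any data, here is a minimal sketch. The names LazyView, load and FakeDataset are hypothetical and come neither from the gist nor from h5py. The sketch only handles unit-step slices along the first axis, and it assumes the wrapped object supports len() and slice indexing, as an h5py dataset does.

```python
class LazyView:
    """Hypothetical sketch: remembers a [start, stop) range and reads only on demand."""

    def __init__(self, dataset, start=0, stop=None):
        self._dataset = dataset
        self._start = start
        self._stop = len(dataset) if stop is None else stop

    def __len__(self):
        return self._stop - self._start

    def __getitem__(self, key):
        if isinstance(key, slice):
            # Resolve the slice relative to this view; no data is read here.
            start, stop, step = key.indices(len(self))
            if step != 1:
                raise ValueError("this sketch supports unit-step slices only")
            stop = max(stop, start)
            return LazyView(self._dataset, self._start + start, self._start + stop)
        # Integer index: read a single element from the underlying dataset.
        if key < 0:
            key += len(self)
        if not 0 <= key < len(self):
            raise IndexError("index out of range")
        return self._dataset[self._start + key]

    def load(self):
        """Read the selected range from the underlying dataset in one call."""
        return self._dataset[self._start:self._stop]


class FakeDataset:
    """Stand-in for an on-disk dataset that records every read (demo only)."""

    def __init__(self, values):
        self.values = list(values)
        self.reads = []

    def __len__(self):
        return len(self.values)

    def __getitem__(self, key):
        self.reads.append(key)
        return self.values[key]


dset = FakeDataset(range(10))
n = len(dset)
x_view = LazyView(dset)

# Slicing the view only stores offsets.
x_train = x_view[:n // 2]
x_test = x_view[n // 2:-1]
reads_after_slicing = list(dset.reads)

# Only load() (or integer indexing) touches the underlying data.
train_values = x_train.load()
test_values = x_test.load()
```

With a real file, passing dset['x'] instead of the FakeDataset should behave the same way, and load() would then return a NumPy array, since that is what slicing an h5py dataset produces.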
