Python - h5py: slicing a dataset without loading it into memory
Is it possible to slice an h5py dataset into two subsets without loading them into memory? E.g.:
import h5py

dset = h5py.File("/2tbhd/tst.h5py", "r")
x_train = dset['x'][:n // 2]
x_test = dset['x'][n // 2:-1]
No.
You would need to implement your own class that acts as a view on the dataset. An old thread on the h5py mailing list indicates that such a DatasetView class is theoretically possible to implement using HDF5 dataspaces, but that it is probably not worth it for many use cases. Element-wise access would also be slow compared to a normal NumPy array (assuming you can fit the data into memory).
Edit: If you want to avoid messing with HDF5 dataspaces (whatever that means), you might settle for a simpler approach. Try this gist I wrote. Use it like this:
import h5py
import numpy
from simpleview import SimpleView

dset = h5py.File("/2tbhd/tst.h5py", "r")
x_view = SimpleView(dset['x'])

# Stores the slices, but doesn't load anything into memory.
x_train = x_view[:n // 2]
x_test = x_view[n // 2:-1]

# These statements load the data into memory.
print(numpy.sum(x_train))
print(numpy.array(x_test)[0])
Note that slicing support in this simple example is limited. If you want full slicing and element-wise access, you'll have to copy it into a real array:
x_train_copy = numpy.array(x_train)
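To make the idea of a view that only remembers slice bounds more concrete, here is a minimal sketch. It is not the code from the gist: the class name LazySliceView and its load() method are hypothetical, and it assumes the source object supports len() and contiguous slicing along its first axis, which holds for h5py datasets, NumPy arrays, and plain lists. A list stands in for the dataset so the example runs without an HDF5 file.

```python
class LazySliceView:
    """Hypothetical helper: remembers a contiguous row range of `source`
    and reads nothing until load() is called."""

    def __init__(self, source, start=0, stop=None):
        self._source = source
        self._start = start
        self._stop = len(source) if stop is None else stop

    def __len__(self):
        return self._stop - self._start

    def __getitem__(self, key):
        if not isinstance(key, slice):
            raise TypeError("only slices are supported; call load() first")
        # Resolve the slice relative to this view (handles None and negatives).
        start, stop, step = key.indices(len(self))
        if step != 1:
            raise ValueError("only contiguous slices are supported")
        stop = max(stop, start)
        # Return a new view; still no data has been read.
        return LazySliceView(self._source, self._start + start, self._start + stop)

    def load(self):
        # The actual read happens here, and only for the selected range.
        return self._source[self._start:self._stop]


# Stand-in for dset['x']; with h5py you would pass the dataset object instead.
rows = list(range(10))
n = len(rows)

view = LazySliceView(rows)
x_train = view[:n // 2]    # no read yet
x_test = view[n // 2:-1]   # no read yet

print(x_train.load())
print(x_test.load())
```

With a real h5py dataset, load() would return a NumPy array holding just the selected rows, so memory use is bounded by the size of the slice you finally read rather than by the whole dataset.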