Python multiprocessing within MPI
I have a Python script that I've written using the multiprocessing module, for faster execution. The calculation is embarrassingly parallel, so the efficiency scales with the number of processors. Now, I'd like to use this within an MPI program, which manages an MCMC calculation across multiple computers. This code has a call to system() which invokes the Python script. However, I'm finding that when it's called this way, the efficiency gain from using Python multiprocessing vanishes.
How can I get the Python script to retain the speed gains from multiprocessing when called from MPI?
Here is a simple example, which is analogous to the much more complicated codes I want to use but displays the same general behavior. I write an executable Python script called junk.py.
#!/usr/bin/python
import multiprocessing
import numpy as np

nproc = 3
nlen = 100000

def f(x):
    # Embarrassingly parallel work: sum slices of a vector offset by x.
    print x
    v = np.arange(nlen)
    result = 0.
    for i, y in enumerate(v):
        result += (x + v[i:]).sum()
    return result

def foo():
    # Farm the work out to nproc worker processes.
    pool = multiprocessing.Pool(processes=nproc)
    xlist = range(2, 2 + nproc)
    print xlist
    result = pool.map(f, xlist)
    print result

if __name__ == '__main__':
    foo()
When I run this from the shell by itself, using "top" I can see three Python processes each taking 100% of a CPU on my 16-core machine.
node094:mpi[ 206 ] /usr/bin/time junk.py
[2, 3, 4]
2
3
4
[333343333400000.0, 333348333450000.0, 333353333500000.0]
62.68user 0.04system 0:21.11elapsed 297%CPU (0avgtext+0avgdata 16516maxresident)k
0inputs+0outputs (0major+11092minor)pagefaults 0swaps
However, if I invoke this with mpirun, each Python process takes only 33% of a CPU, and overall it takes about three times as long to run. Calling with -np 2 or more results in more processes, but doesn't speed up the computation any.
node094:mpi[ 208 ] /usr/bin/time mpirun -np 1 junk.py
[2, 3, 4]
2
3
4
[333343333400000.0, 333348333450000.0, 333353333500000.0]
61.63user 0.07system 1:01.91elapsed 99%CPU (0avgtext+0avgdata 16520maxresident)k
0inputs+8outputs (0major+13715minor)pagefaults 0swaps
(Additional notes: mpirun is version 1.8.1, Python is 2.7.3, on Linux Debian wheezy. I have heard that system() is not always allowed within MPI programs, but it's been working for me for the last 5 years on this computer. For example, I have called a pthread-based parallel code from system() within an MPI program, and it used 100% of a CPU for each thread, as desired. Also, in case you were going to suggest running the Python script in serial and just calling it on more nodes...the MCMC calculation involves a fixed number of chains which need to move in a synchronized way, so the computation unfortunately can't be reorganized that way.)
OpenMPI's mpirun, v1.7 and later, defaults to binding processes to cores - that is, when it launches the python junk.py process, it binds it to the core it will run on. That's fine, and the right default behaviour for most MPI use cases. But here each MPI task is forking more processes (through the multiprocessing package), and those forked processes inherit the binding state of their parent - so they're all bound to the same core, fighting amongst themselves. (The "P" column in top will show they're all on the same processor.)
If you mpirun -np 2, you'll find two sets of three processes, each set on a different core, each contending amongst themselves.
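You can see the binding inheritance directly with a small diagnostic of my own (a minimal sketch, not part of the original script; it assumes Linux, where /proc/self/status exposes the Cpus_allowed_list field) that has the parent and each worker print the affinity mask they ended up with:

#!/usr/bin/python
# Hypothetical diagnostic: report the CPU-affinity mask each process inherited.
import multiprocessing

def report_affinity(tag):
    # Cpus_allowed_list lists the cores this process may be scheduled on (Linux).
    with open('/proc/self/status') as fh:
        for line in fh:
            if line.startswith('Cpus_allowed_list'):
                print '%s %s' % (tag, line.strip())

def worker(i):
    report_affinity('worker %d:' % i)

if __name__ == '__main__':
    report_affinity('parent:')
    pool = multiprocessing.Pool(processes=3)
    pool.map(worker, range(3))

Run by itself, every process should report the machine's full core list; run under a core-binding mpirun, the parent and all of its workers report the same single core.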
With OpenMPI, you can avoid this by turning off binding,
mpirun -np 1 --bind-to none junk.py
or by choosing some other binding which makes sense given the final geometry of your run. MPICH has similar options with hydra.
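For instance (an illustrative invocation of my own, not from the original answer - check mpirun --help on your installation for the exact spelling it accepts), you could bind each rank to a whole socket rather than a single core, leaving that rank's multiprocessing workers free to spread across the socket's cores:

mpirun -np 2 --bind-to socket junk.py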
Note that the fork()ing of subprocesses with MPI isn't always safe or supported, particularly on clusters running with InfiniBand interconnects, but OpenMPI's mpirun/mpiexec will warn you if it isn't safe.