Python multiprocessing within MPI


I have a Python script that I've written using the multiprocessing module, for faster execution. The calculation is embarrassingly parallel, so the efficiency scales with the number of processors. Now, I'd like to use this within an MPI program, which manages an MCMC calculation across multiple computers. This code has a call to system() which invokes the Python script. However, I'm finding that when it is called this way, the efficiency gain from using Python multiprocessing vanishes.

How can I get my Python script to retain the speed gains from multiprocessing when called from MPI?

Here is a simple example, which is analogous to the more complicated codes I want to use but displays the same general behavior. I write an executable Python script called junk.py.

#!/usr/bin/python
import multiprocessing
import numpy as np

nproc = 3
nlen = 100000


def f(x):
    print x
    v = np.arange(nlen)
    result = 0.
    for i, y in enumerate(v):
        result += (x+v[i:]).sum()
    return result


def foo():
    pool = multiprocessing.Pool(processes=nproc)
    xlist = range(2, 2+nproc)
    print xlist
    result = pool.map(f, xlist)
    print result

if __name__ == '__main__':
    foo()

When I run this from the shell by itself, using "top" I can see three Python processes each taking 100% of a CPU on my 16-core machine.

node094:mpi[ 206 ] /usr/bin/time junk.py
[2, 3, 4]
2
3
4
[333343333400000.0, 333348333450000.0, 333353333500000.0]
62.68user 0.04system 0:21.11elapsed 297%CPU (0avgtext+0avgdata 16516maxresident)k
0inputs+0outputs (0major+11092minor)pagefaults 0swaps

However, if I invoke this with mpirun, each Python process takes only 33% of a CPU, and overall it takes about three times as long to run. Calling with -np 2 or more results in more processes, but doesn't speed up the computation any.

node094:mpi[ 208 ] /usr/bin/time mpirun -np 1 junk.py
[2, 3, 4]
2
3
4
[333343333400000.0, 333348333450000.0, 333353333500000.0]
61.63user 0.07system 1:01.91elapsed 99%CPU (0avgtext+0avgdata 16520maxresident)k
0inputs+8outputs (0major+13715minor)pagefaults 0swaps

(Additional notes: mpirun is version 1.8.1, Python is 2.7.3, on Linux Debian wheezy. I have heard that system() is not allowed within MPI programs, but it's been working for me for the last five years on this computer. For example, I have called a pthread-based parallel code via system() from within an MPI program, and it has used 100% of CPU for each thread, as desired. Also, in case you were going to suggest running the Python script in serial and just calling it on more nodes... the MCMC calculation involves a fixed number of chains which need to move in a synchronized way, so the computation unfortunately can't be reorganized that way.)

OpenMPI's mpirun, v1.7 and later, defaults to binding processes to cores - that is, when it launches the python junk.py process, it binds it to the core it will run on. That's fine, and it's the right default behaviour for most MPI use cases. But here each MPI task is forking more processes (through the multiprocessing package), and those forked processes inherit the binding state of their parent - so they're all bound to the same core, fighting amongst themselves. (The "P" column in top will show that they're all on the same processor.)
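
To see the inherited binding for yourself, you can have the parent and each pool worker report which CPUs the kernel allows them to run on. Here's a minimal sketch (Linux-specific, since it reads /proc/self/status; the script name and worker count are just for illustration) - run it both directly and under mpirun to compare:

#!/usr/bin/python
import multiprocessing
import os


def report_affinity(tag):
    # Print the kernel's view of which CPUs this process may run on.
    with open('/proc/self/status') as f:
        for line in f:
            if line.startswith('Cpus_allowed_list'):
                print tag, os.getpid(), line.strip()


def worker(x):
    report_affinity('worker')
    return x

if __name__ == '__main__':
    report_affinity('parent')
    pool = multiprocessing.Pool(processes=3)
    print pool.map(worker, range(3))

Run directly, each worker should report the full CPU list (e.g. 0-15); under a core-binding mpirun, the parent and all workers will report the same single core.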

If you mpirun -np 2, you'll find two sets of three processes, each set on a different core, each contending amongst themselves.

With OpenMPI, you can avoid this by turning off the binding,

mpirun -np 1 --bind-to none junk.py 

or by choosing some other binding which makes sense given the final geometry of your run. MPICH has similar options with hydra.
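
If changing the mpirun command line isn't an option, another workaround is to reset the affinity mask from inside the Python script itself, before the pool is created. This is a Linux-specific sketch using the taskset utility (the 0xffffffff mask, which allows CPUs 0-31, and the worker count are placeholders for your machine):

#!/usr/bin/python
import multiprocessing
import os

if __name__ == '__main__':
    # Clear the affinity mask inherited from mpirun so that the forked
    # workers can spread out over all cores, not just the parent's one.
    os.system("taskset -p 0xffffffff %d" % os.getpid())
    pool = multiprocessing.Pool(processes=3)
    print pool.map(abs, range(-3, 0))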

Note that fork()ing subprocesses with MPI isn't always safe or supported, particularly on clusters running with InfiniBand interconnects, but OpenMPI's mpirun/mpiexec will warn you if it isn't safe.

