[padb-users] Problems running padb on large processor counts

Ashley Pittman ashley at pittman.co.uk
Wed Sep 15 12:35:12 BST 2010


On 15 Sep 2010, at 11:47, Duncan Harris wrote:

> Hi.
> One of our users is trying to run padb to find out why his 576 PE job
> is hanging. However when he does he gets this:
> 
> [node195]> padb -x -t -a -Ormgr=mpirun

> 
> Any suggestions as to what the problem could be please?

Can you try setting the environment variable FANOUT to a value of 64, please?  When using the mpirun resource manager interface, padb uses pdsh to launch the backend, and by default pdsh is limited to 32 remote hosts concurrently.  If he is able to, using the "orte" resource manager interface would also solve this problem, as that does not use pdsh.
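
As a rough sketch of the first suggestion, assuming a Bourne-style shell on the node where padb is started, the variable can be exported before re-running the same command as in the report.  The "command -v" guard is only there so the snippet is harmless on a machine without padb, and the option spelling for the orte interface in the last comment is an assumption based on the -Ormgr=mpirun form used above, not something I have checked against the padb documentation.

```shell
# Raise pdsh's concurrency limit (default 32) for this shell session.
export FANOUT=64

# Re-run the original command with the same options as in the report.
# Guarded so that nothing happens if padb is not on the PATH.
if command -v padb >/dev/null 2>&1; then
    padb -x -t -a -Ormgr=mpirun
fi

# Alternative that avoids pdsh entirely (assumed option spelling):
#   padb -x -t -a -Ormgr=orte
```

The export only affects the current shell and its children, so it has to be set in the same session that launches padb.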

Ashley.

-- 

Ashley Pittman, Bath, UK.

Padb - A parallel job inspection tool for cluster computing
http://padb.pittman.org.uk




