[padb-users] Problems running padb on large processor counts

Duncan Harris harris.duncan at gmail.com
Wed Sep 15 16:14:53 BST 2010


Hi.
Thanks for the prompt reply, Ashley. That worked a treat.

Duncan

On Wed, Sep 15, 2010 at 12:35 PM, Ashley Pittman <ashley at pittman.co.uk> wrote:
>
> On 15 Sep 2010, at 11:47, Duncan Harris wrote:
>
>> Hi.
>> One of our users is trying to run padb to find out why his 576 PE job
>> is hanging. However, when he does, he gets this:
>>
>> [node195]> padb -x -t -a -Ormgr=mpirun
>
>>
>> Any suggestions as to what the problem could be please?
>
> Can you try setting the environment variable FANOUT to 64, please? When using the mpirun resource manager interface, padb uses pdsh to launch the backend, and by default pdsh is limited to 32 concurrent remote hosts. If he is able to, using the "orte" resource manager interface would also solve the problem, as that does not use pdsh.
>
> Ashley,
>
> --
>
> Ashley Pittman, Bath, UK.
>
> Padb - A parallel job inspection tool for cluster computing
> http://padb.pittman.org.uk
>
>
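
For anyone finding this thread later, the fix suggested above might look roughly like the sketch below. The padb command line is the one from the original post, and FANOUT=64 is the value Ashley proposed. The `command -v` guard and the fallback message are additions for illustration only. The commented-out `-Ormgr=orte` line is an assumption, formed by analogy with `-Ormgr=mpirun`, and has not been checked against a padb installation.

```shell
# Raise the pdsh fanout for this shell session. pdsh reads FANOUT from
# the environment; its default is 32 concurrent remote hosts.
export FANOUT=64

# Re-run the command from the original post, but only if padb is on PATH.
# (The guard is illustrative and not part of the suggested fix.)
if command -v padb >/dev/null 2>&1; then
    padb -x -t -a -Ormgr=mpirun
    # Assumed alternative that avoids pdsh entirely (unverified spelling):
    # padb -x -t -a -Ormgr=orte
else
    echo "padb not found in PATH; FANOUT is set to $FANOUT"
fi
```

Because the variable is exported rather than set for a single command, it stays in effect for later padb runs from the same shell.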



