[padb-users] padb stalls with no output.

Rahul Nabar rpnabar at gmail.com
Fri Aug 20 20:53:09 BST 2010


On Fri, Aug 20, 2010 at 1:46 PM, Ashley Pittman <ashley at pittman.co.uk> wrote:
> On 20 Aug 2010, at 19:38, Ashley Pittman wrote:
>
>> The problem is with OpenMPI, and with its state directories in /tmp in particular. It could either be that you've had a parallel job that crashed and has left files around, or it could be that you are running multiple versions of OMPI and they don't like talking to each other.
>
> Oh - I forgot to say how to "fix" this issue: wait until there are no jobs running and remove all OMPI related files and directories in /tmp on the node where the mpirun process was running.

Yup! Thanks! This fixed it.

It's funny that I had rebooted all nodes but that hadn't fixed it.
Somehow the bad state persisted through the reboot, maybe because
it lives in the files and dirs in /tmp.

Maybe I will change my reboot protocol to clean up /tmp.
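
For anyone who wants to script the cleanup described above, here is a rough
sketch in Python. The name patterns are an assumption on my part (Open MPI
1.x typically uses openmpi-sessions-<user>@<host>_0 under /tmp, and newer
releases use different names), so check what is actually in /tmp on your
nodes first. The function names and the dry_run flag are made up for this
example. Only run it when no MPI jobs are active on the node.

```python
import fnmatch
import os
import shutil

# ASSUMPTION: name patterns for Open MPI state under /tmp. These vary by
# Open MPI version; adjust them to match what you see on your nodes.
OMPI_PATTERNS = ("openmpi-sessions-*", "ompi.*")


def find_ompi_leftovers(tmp_dir="/tmp", patterns=OMPI_PATTERNS):
    """Return sorted paths directly under tmp_dir whose names match a pattern."""
    return sorted(
        os.path.join(tmp_dir, name)
        for name in os.listdir(tmp_dir)
        if any(fnmatch.fnmatch(name, pattern) for pattern in patterns)
    )


def clean_ompi_leftovers(tmp_dir="/tmp", patterns=OMPI_PATTERNS, dry_run=True):
    """Remove matching files and directories; return the matched paths.

    With dry_run=True (the default) nothing is deleted, so the returned
    list can be reviewed before doing the real cleanup.
    """
    paths = find_ompi_leftovers(tmp_dir, patterns)
    if dry_run:
        return paths
    for path in paths:
        if os.path.isdir(path) and not os.path.islink(path):
            shutil.rmtree(path)  # real directory: remove it recursively
        else:
            os.remove(path)  # plain file or symlink
    return paths
```

Calling clean_ompi_leftovers() with the defaults only lists what would be
removed; pass dry_run=False once the list looks right.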

-- 
Rahul



