[padb-users] padb stalls with no output.

Ashley Pittman ashley at pittman.co.uk
Fri Aug 20 19:38:26 BST 2010


On 20 Aug 2010, at 19:11, Rahul Nabar wrote:

> Any ideas what could be going on here? What's even more confusing is
> that ompi-ps produces  no output either (see below)! Have I broken my
> mpi install somehow? But that wouldn't make sense since the actual mpi
> tests are running fine. Again, the symptoms are so bizarre that I
> suspect I am the one doing something stupid. But can't figure out what
> it is!!


Padb simply calls ompi-ps to get the list of running jobs, so if ompi-ps hangs then padb will hang as well. I could make padb handle this case better, but only by detecting a timeout and giving the user an error message; it still wouldn't be able to attach to the job.
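
To illustrate the kind of timeout handling described above, here is a minimal sketch in Python. It is not padb's actual code: the function name `list_jobs`, the 10-second default and the wording of the message are all assumptions made for the example. It runs the job-listing command and gives up with an error message if the command does not return in time.

```python
import subprocess
import sys


def list_jobs(cmd=("ompi-ps",), timeout=10.0):
    """Run the job-listing command and return its stdout.

    Returns None if the command does not finish within `timeout` seconds.
    The name, default command and timeout are illustrative assumptions.
    """
    try:
        result = subprocess.run(
            list(cmd),
            stdout=subprocess.PIPE,
            stderr=subprocess.PIPE,
            universal_newlines=True,
            timeout=timeout,  # on expiry the child is killed and reaped
        )
    except subprocess.TimeoutExpired:
        # A hung listing command means no job list, so nothing to attach to.
        sys.stderr.write(
            "%s did not respond within %.1f seconds\n" % (cmd[0], timeout)
        )
        return None
    return result.stdout
```

A caller would treat a `None` result as "job list unavailable" and stop there. As noted above, this only turns a silent hang into a visible error; it does not make the job inspectable.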

The problem is with OpenMPI, and with its state directories in /tmp in particular. It could be that a parallel job crashed and left files behind, or it could be that you are running multiple versions of OMPI and they don't like talking to each other. Where you are running multiple versions of OMPI, I recommend keeping your PATH and the rest of your environment the same for padb as for the job you are trying to inspect. In most cases it makes no difference, but there tend to be corner cases where resource managers get upset otherwise.
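
Both checks above can be done by hand, but as a rough sketch they might look like the following Python. The directory pattern `openmpi-sessions-*` is an assumption about how this OMPI version names its state directories under /tmp, and the helper names `find_session_dirs` and `same_install` are made up for the example. Check the pattern against what is actually in your /tmp before relying on it.

```python
import glob
import os
import shutil
import tempfile
import time


def find_session_dirs(tmp_root=None, pattern="openmpi-sessions-*"):
    """Return (path, age_in_hours) for each matching directory.

    `pattern` is an assumed naming scheme; adjust it to match your install.
    """
    root = tmp_root or tempfile.gettempdir()
    now = time.time()
    found = []
    for path in sorted(glob.glob(os.path.join(root, pattern))):
        if os.path.isdir(path):
            age_hours = (now - os.path.getmtime(path)) / 3600.0
            found.append((path, age_hours))
    return found


def same_install(tools, path=None):
    """True if every named tool resolves to the same directory.

    `path` defaults to the current PATH, so run this in the environment
    padb will actually be started from.
    """
    dirs = set()
    for tool in tools:
        location = shutil.which(tool, path=path)
        if location is None:
            return False  # tool not found at all
        dirs.add(os.path.dirname(os.path.realpath(location)))
    return len(dirs) == 1
```

For example, `same_install(["ompi-ps", "mpirun"])` returning False would suggest the two commands come from different installs. The sketch only reports old directories and never deletes them, since a directory may belong to a job that is still running.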

Can you report this to OpenMPI, please? There used to be lots of problems of this kind, but this area underwent a big cleanup for 1.3 and I've not had a problem myself for a long time now.

Ashley.

-- 

Ashley Pittman, Bath, UK.

Padb - A parallel job inspection tool for cluster computing
http://padb.pittman.org.uk
