[padb-users] running with SGE/OMPI

Ashley Pittman ashley at pittman.co.uk
Mon Jul 12 14:01:14 BST 2010


On 9 Jul 2010, at 15:32, Dave Love wrote:

> Ashley Pittman <ashley at pittman.co.uk> writes:
> 
> I assumed Gridengine is relevant (a) in referring to `jobs', and (b) in
> that I think the OpenMPI tight integration is relevant, at least because
> it seems ompi-ps appears to be looking in the wrong place for files.

You are right: padb uses the "jobid" that orte allocated to the job rather than the id that Gridengine gave it, but the tight integration might have changed the orte behaviour.  I see this with mpd (MPICH2) and PBS as well, where PBS sets an environment variable which causes mpd to store its temporary files under a different filename.  Unfortunately this is very hard to get around.

> That's easy, but neither mpirun nor orte work.  With mpirun I get
> 
> Error, resource manager "mpirun" not supported

You need to use the 3.2 beta release for this; I keep forgetting it's not in 3.0.  When using this method of attaching to jobs you have to run padb on the host where the "mpirun" process is running, and the jobid will be the pid of that process.  Padb uses pdsh to launch itself on the nodes, so you'll need to have this installed if you haven't already.
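
As a rough sketch of finding that pid, something like the following could be run on the host where mpirun was started.  The function name find_launcher_pids is made up for illustration and is not part of padb; it only assumes the standard pgrep tool is installed.

```shell
#!/bin/sh
# Hypothetical helper (not part of padb): print the PIDs of processes
# owned by the current user whose name exactly matches the given
# launcher name, defaulting to "mpirun".
find_launcher_pids() {
    pgrep -u "$(id -u)" -x "${1:-mpirun}"
}
```

Calling find_launcher_pids with no arguments prints one pid per line, and each of those is a candidate jobid to hand to padb.  If nothing is printed, mpirun is not running on this host under your account.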

> and orte doesn't find any jobs because ompi-ps doesn't.  I'll try to
> figure out what's going on when I get some time.

Unfortunately, without a working ompi-ps padb has no way of collecting the information it needs, so the orte resource manager won't work for you in this case.  You could ask on the ompi-users list to see if there is anything they recommend.  As above, we managed to get this working on MPICH2 recently by asking users to unset PBS_JOBID in their job script.
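
For reference, the MPICH2/PBS workaround amounts to a fragment like this near the top of the job script.  This is only a sketch: the launch command is left as a comment because it depends on your site, and anything later in the script that reads PBS_JOBID will no longer see it.

```shell
#!/bin/sh
# Remove the variable that PBS sets, so that mpd stores its temporary
# files under the default filename that padb looks for.
unset PBS_JOBID

# ... start the MPICH2 job here as usual (site-specific, not shown) ...
```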

Ashley,

-- 

Ashley Pittman, Bath, UK.

Padb - A parallel job inspection tool for cluster computing
http://padb.pittman.org.uk




