[padb-users] running with SGE/OMPI

Ashley Pittman ashley at pittman.co.uk
Thu Jul 8 22:50:30 BST 2010


On 8 Jul 2010, at 16:38, Dave Love wrote:

> I'd like to use padb with OpenMPI jobs under Gridengine.

In padb parlance, Gridengine is the scheduler, which padb takes no interest in. Since you mention ompi-ps, the resource manager is presumably orte. Of the two, padb only interfaces with the resource manager.

orte is fully supported as a resource manager and has been since this project went public.  The resource manager should be detected automatically, based on which binaries padb can find on the PATH; you can also set it manually if you need to.
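
To see what such a PATH-based check might find on a given node, here is a minimal Python sketch. It is not padb's actual detection code: the mapping from resource manager name to a marker binary is an assumption made for illustration, and the function name is hypothetical.

```python
import shutil

# Assumed mapping from resource manager name to a binary whose presence
# suggests it is usable. padb's real detection logic may differ.
CANDIDATES = {
    "orte": "ompi-ps",
    "mpirun": "mpirun",
}

def detect_resource_managers(path=None):
    """Return the names whose marker binary is found on the search path.

    `path` is a PATH-style string; None means use the current PATH.
    """
    return [name for name, binary in CANDIDATES.items()
            if shutil.which(binary, path=path) is not None]

if __name__ == "__main__":
    print(detect_resource_managers())
```

Running this in the same shell you intend to run padb from shows which of the two candidates are visible there. An empty list suggests the Open MPI tools are not on the PATH in that shell.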

> Currently it
> sees no jobs.  What do I need to make this work?  Is a working ompi-ps
> what's required?  That currently isn't working, and I'm not sure how
> it's supposed to, but might be able to fix it.

You have two choices here.  You can use "orte" as the resource manager, in which case a working ompi-ps is required, or you can use "mpirun" as the resource manager, in which case padb attaches to the orterun (or mpirun) process with gdb and reads the data it needs directly.  In both cases the data is only available on the node where the orterun process is running, so that is where you'll need to run padb.  Given that you are using Gridengine, finding this node could be a non-trivial problem, but it depends on your setup.

orte would be the preferred resource manager to use.  You'll need to ensure that the environment you run padb under is the same as the one used for the parallel job (PATH, LD_LIBRARY_PATH), to avoid mismatches between different versions of the orte tools.  Be aware that using an incorrect ompi-ps version can cause orted to crash and running jobs to fail, so tread carefully.
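
One way to check the environment before running anything is to compare PATH and LD_LIBRARY_PATH in your shell with those of the running orterun process. The following is a minimal Python sketch and not part of padb. It assumes a Linux node, that you own the orterun process (so /proc/<pid>/environ is readable), and that you already know its pid. All function names are hypothetical.

```python
import os

def parse_environ(raw: bytes) -> dict:
    """Parse the NUL-separated KEY=VALUE format used by /proc/<pid>/environ."""
    env = {}
    for entry in raw.split(b"\0"):
        if b"=" in entry:
            key, _, value = entry.partition(b"=")
            env[key.decode(errors="replace")] = value.decode(errors="replace")
    return env

def environment_mismatches(job_env, my_env, keys=("PATH", "LD_LIBRARY_PATH")):
    """Return {key: (job_value, my_value)} for each key whose values differ."""
    return {k: (job_env.get(k), my_env.get(k))
            for k in keys if job_env.get(k) != my_env.get(k)}

def check_against_pid(pid: int) -> dict:
    """Compare this process's environment with that of process `pid` (Linux only)."""
    with open(f"/proc/{pid}/environ", "rb") as fh:
        job_env = parse_environ(fh.read())
    return environment_mismatches(job_env, dict(os.environ))
```

An empty result from check_against_pid means the two variables match. Note that /proc/<pid>/environ reflects the environment the process was started with, so this is only a sanity check, not a guarantee that the tool versions agree.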

Ashley.

-- 

Ashley Pittman, Bath, UK.

Padb - A parallel job inspection tool for cluster computing
http://padb.pittman.org.uk




