[padb] Patch of support of Slurm + Openmpi Orte manager

Ashley Pittman ashley at pittman.co.uk
Mon Nov 30 16:31:51 GMT 2009


On Mon, 2009-11-30 at 15:36 +0100, thipadin.seng-long at bull.net wrote:
> May I introduce my patch against padb r341 for supporting Slurm
> combined with the OpenMPI ORTE manager.
> The key is that we use salloc to get resources from Slurm and then use
> them to run OpenMPI's mpirun to start jobs.
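For readers who haven't tried this combination, the launch pattern described above can be sketched as a small wrapper. This assumes an OpenMPI build with Slurm support, so that mpirun picks up the nodes from the environment salloc sets up; the function name `launch` and its arguments are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper: ask Slurm for NODES nodes with salloc, then run
# OpenMPI's mpirun inside that allocation to start RANKS MPI processes.
# Usage: launch NODES RANKS PROGRAM [ARGS...]
launch() {
  local nodes=$1 ranks=$2
  shift 2
  salloc -N "$nodes" mpirun -np "$ranks" "$@"
}
```

For example, `launch 3 8 ./my_mpi_app` (the program name is a placeholder).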

I knew you had to do this when running OpenMPI with Slurm, however I'd
never done it myself.  My test cluster has both installed so I should be
able to try it.  Do you happen to know if you need any special configure
options for either to allow this?

I'll try to get this running myself in the next couple of days, but in
the meantime I've got some questions:

Does the mpirun job (i.e. the processes we want) have its own Slurm job
step, or does it share the job step with the allocation?
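One way to look at this on a running job is `squeue -s -j <jobid>`, which lists the job's steps. Assuming mpirun starts one orted daemon per node through srun (which is how OpenMPI's Slurm launcher normally works), the MPI ranks would be children of orted rather than tasks Slurm knows about directly. The hypothetical helper below picks those children out of a `ps` listing; the name `orted_children` is made up for illustration.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print PIDs of processes whose parent is an orted.
# Reads "pid ppid comm" lines on stdin, e.g. from: ps -eo pid=,ppid=,comm=
orted_children() {
  awk '
    { pid[NR] = $1; ppid[NR] = $2
      if ($3 == "orted") orted[$1] = 1 }
    END {
      for (i = 1; i <= NR; i++)
        if (ppid[i] in orted) print pid[i]
    }
  '
}
```

Run on each node as `ps -eo pid=,ppid=,comm= | orted_children`.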

I also notice /proc/version is referenced in the patch; does this mean the
patch works on an OS other than Linux?

What happens if you run salloc ... srun?  Does this work with the
existing support, and how should users know which resource manager plugin
to pick?  (Ideally padb would do the right thing automatically.)
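As a rough illustration of padb choosing for itself, a hypothetical heuristic could check whether any orted is running on a node and pick the plugin from that. `pick_rmgr` and the rule itself are assumptions, not padb's behaviour; "sl-orte" is the name used by the patch and "slurm" is assumed to be the plain Slurm plugin. It would misfire if an unrelated orted shared the node, so a real check would also need to confirm the orted belongs to the target job.

```shell
#!/usr/bin/env bash
# Hypothetical heuristic: read "pid ppid comm" lines on stdin (for example
# from `ps -eo pid=,ppid=,comm=`) and print "sl-orte" if any orted is
# present, otherwise "slurm".
pick_rmgr() {
  if awk '$3 == "orted" { found = 1 } END { exit !found }'; then
    echo sl-orte
  else
    echo slurm
  fi
}
```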

> [thipa at machu0 padb_open]$ ./padb -O rmgr="sl-orte" -O
> stack-shows-locals=no  -O stack-shows-params=no --debug=verbose=all
> -tx 8324 
> DEBUG (verbose):   0: There are 1 processes over 3 hosts 

This isn't great.  The expected number of processes is so far only used
to check for missing processes, but there are other potential uses for it,
so I'd rather it was correct.
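In the salloc + mpirun case, one hypothetical way to get a correct count is to total the orted children across all hosts. The sketch below assumes "host pid ppid comm" lines collected from every node, and keys each orted on host and PID together because PIDs are only unique within one node. `count_ranks` is an illustrative name.

```shell
#!/usr/bin/env bash
# Hypothetical: count MPI ranks as processes whose parent is an orted on
# the same host. Input: "host pid ppid comm" lines gathered from all nodes.
count_ranks() {
  awk '
    { host[NR] = $1; ppid[NR] = $3
      if ($4 == "orted") orted[$1, $2] = 1 }
    END {
      for (i = 1; i <= NR; i++)
        if ((host[i], ppid[i]) in orted) n++
      print n + 0
    }
  '
}
```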

> I don't use scontrol listpids because I found this command is not a
> universal method (some versions don't have it),
> and it may issue error messages such as:
> slurmd[machu139]: proctrack/pgid does not implement
> slurm_container_get_pids 

I'd prefer to use this if at all possible.  This option was added in
response to a request, maybe several years ago, so I'd have thought most
versions have it by now.  Can you be clearer about the versions where it
doesn't work?
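A sketch of using `scontrol listpids` with a fallback path might look like this. It assumes the output is a header line followed by whitespace-separated rows with the PID in the first column, and that scontrol exits non-zero or reports no PIDs when, as in the proctrack/pgid case quoted above, it cannot list the step's processes. Both function names are hypothetical.

```shell
#!/usr/bin/env bash
# Print the PID column of `scontrol listpids` output read on stdin.
# Only rows whose first field is numeric are kept, which drops the header.
listpids_to_pids() {
  awk '$1 ~ /^[0-9]+$/ { print $1 }'
}

# Hypothetical wrapper: list PIDs of one job step (a "JOBID.STEPID" string)
# on this node. Returns non-zero if scontrol fails or reports no PIDs, so
# the caller can fall back to another discovery method.
step_pids() {
  local out pids
  out=$(scontrol listpids "$1" 2>/dev/null) || return 1
  pids=$(printf '%s\n' "$out" | listpids_to_pids)
  [ -n "$pids" ] || return 1
  printf '%s\n' "$pids"
}
```

A caller would then try `step_pids` first and only use the patch's own method when it returns non-zero.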

Ashley,
-- 

Ashley Pittman, Bath, UK.

Padb - A parallel job inspection tool for cluster computing
http://padb.pittman.org.uk

More information about the padb-devel mailing list