[padb] Patch of support of Slurm + Openmpi Orte manager

Wed Dec 2 15:51:09 GMT 2009

On Tue, 2009-12-01 at 15:31 +0000, Ashley Pittman wrote:
> I'm away Thursday/Friday this week but should be able to take a closer
> look at the actual code the beginning of next week, as I said I've got a
> cluster I can run it on this time.

The code almost works for me. All I've changed is what I sent before:
using a configuration option to turn it on and adding a call to
target_key_pair($rank,"JOB_SIZE"...); see r344 for details.

[ashley@cloud0 src]$ ./padb -a --proc-summary -Oslurm_orte_alloc=true
Warning, failed to locate ranks [0,2]
rank  hostname  pid   vmsize    vmrss    S  uptime  %cpu  lcore  command 
   1    cloud1  1618  73504 kB  3928 kB  R    1.99    21      0  deadlock
   3    cloud1  1619  73504 kB  3932 kB  R    1.99    23      0  deadlock

As you can see, it's missing the processes from cloud0, which is where
mpirun is executing.  The same job shows up as expected using the orte
resource manager; the limitation there is that it only works from the
node where mpirun is running.

[ashley@cloud0 src]$ ./padb -a --proc-summary -Ormgr=orte
rank  hostname  pid   vmsize    vmrss    S  uptime  %cpu  lcore  command 
   0    cloud0  3199  73380 kB  3900 kB  R    2.00    21      0  deadlock
   1    cloud1  1618  73504 kB  3928 kB  R    1.99    21      0  deadlock
   2    cloud0  3200  73384 kB  3908 kB  R    2.00    18      0  deadlock
   3    cloud1  1619  73504 kB  3932 kB  R    1.99    20      0  deadlock

These are the relevant parts of the process tree from cloud0; you can
trace deadlock back to mpirun without any slurmstepd on this node at
all.

ps -o pid,ppid,user,cmd -xa
 2851  1219 ashley   salloc -N 2 -n3 -O
 2854  2851 ashley   /bin/bash
 3192  2854 ashley   mpirun -n 4 /home/ashley/general/mpi/deadlock
 3193  3192 ashley   srun --nodes=1 --ntasks=1 --kill-on-bad-exit --nodelist=cloud1 orted -mca ess slurm -mca orte_ess_jobid 258146304 -mca orte_es
 3199  3192 ashley   /home/ashley/general/mpi/deadlock
 3200  3192 ashley   /home/ashley/general/mpi/deadlock

I'm wondering if it might be better to simply walk all processes in a
very similar way to pbs_find_pids and check for OMPI_COMM_WORLD_RANK,
OMPI_COMM_WORLD_SIZE, SLURM_JOB_ID and SLURM_STEP_ID.  This code could
then be used as a fallback in case scontrol listpids fails to return any
pids, and hence wouldn't need any options twiddled to enable it.
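
A rough sketch of that fallback, written in Python rather than padb's
Perl purely for illustration.  The function names (parse_environ,
rank_info, find_ranks) are made up for this example and are not part of
padb, and it assumes a Linux /proc where each process's environment is
readable from /proc/<pid>/environ.  The SLURM_STEP_ID check is optional
here, on the assumption that it may not be set for every rank.

```python
import os


def parse_environ(raw):
    """Split the NUL-separated contents of /proc/<pid>/environ into a dict."""
    env = {}
    for item in raw.split(b"\0"):
        if b"=" in item:
            key, _, value = item.partition(b"=")
            env[key.decode(errors="replace")] = value.decode(errors="replace")
    return env


def rank_info(env, job_id, step_id=None):
    """Return (rank, size) if env looks like an Open MPI rank of the job."""
    if env.get("SLURM_JOB_ID") != str(job_id):
        return None
    # Assumption: only compare the step id when the caller supplies one.
    if step_id is not None and env.get("SLURM_STEP_ID") != str(step_id):
        return None
    try:
        rank = int(env["OMPI_COMM_WORLD_RANK"])
        size = int(env["OMPI_COMM_WORLD_SIZE"])
    except (KeyError, ValueError):
        return None
    return rank, size


def find_ranks(job_id, step_id=None, proc_root="/proc"):
    """Map rank -> pid for every readable matching process on this node."""
    ranks = {}
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue  # not a pid directory
        try:
            with open(os.path.join(proc_root, entry, "environ"), "rb") as fh:
                env = parse_environ(fh.read())
        except OSError:
            continue  # process has exited or is not readable by this user
        info = rank_info(env, job_id, step_id)
        if info is not None:
            ranks[info[0]] = int(entry)
    return ranks
```

Calling find_ranks(jobid) on each node would give the rank-to-pid map
for that node only, so the results would still need to be merged across
the job's nodes.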

Combined with some more intelligent setting of default values for
slurm_job_step, that could make this case fully automatic, with the
user just specifying the jobid and nothing else.

Attached is the patch as I've been using it.

Ashley,

-- 

Ashley Pittman, Bath, UK.

Padb - A parallel job inspection tool for cluster computing
http://padb.pittman.org.uk
-------------- next part --------------
A non-text attachment was scrubbed...
Name: padb-slurm-open-2.patch
Type: text/x-patch
Size: 5461 bytes
Desc: not available
URL: <http://pittman.org.uk/pipermail/padb-devel_pittman.org.uk/attachments/20091202/2af8b613/attachment.bin>