[padb] Patch of support of Slurm + Openmpi Orte manager
Ashley Pittman
ashley at pittman.co.uk
Wed Dec 2 15:51:09 GMT 2009
On Tue, 2009-12-01 at 15:31 +0000, Ashley Pittman wrote:
> I'm away Thursday/Friday this week but should be able to take a closer
> look at the actual code the beginning of next week, as I said I've got a
> cluster I can run it on this time.
The code almost works for me. The only changes I've made are the ones I
sent before: using a configuration option to turn it on, and adding a call
to target_key_pair($rank,"JOB_SIZE"...); see r344 for details of this.
[ashley@cloud0 src]$ ./padb -a --proc-summary -Oslurm_orte_alloc=true
Warning, failed to locate ranks [0,2]
rank hostname  pid   vmsize   vmrss S uptime %cpu lcore command
   1 cloud1   1618 73504 kB 3928 kB R   1.99   21     0 deadlock
   3 cloud1   1619 73504 kB 3932 kB R   1.99   23     0 deadlock
As you can see, it's missing the processes from cloud0, which is where
mpirun is executing. The same job shows up as expected using the orte
resource manager; the limitation there is that it only works from the
node where mpirun is running.
[ashley@cloud0 src]$ ./padb -a --proc-summary -Ormgr=orte
rank hostname  pid   vmsize   vmrss S uptime %cpu lcore command
   0 cloud0   3199 73380 kB 3900 kB R   2.00   21     0 deadlock
   1 cloud1   1618 73504 kB 3928 kB R   1.99   21     0 deadlock
   2 cloud0   3200 73384 kB 3908 kB R   2.00   18     0 deadlock
   3 cloud1   1619 73504 kB 3932 kB R   1.99   20     0 deadlock
These are the relevant parts of the process tree from cloud0; you can
trace deadlock back to mpirun without any slurmstepd on this node at
all.
ps -o pid,ppid,user,cmd -xa
2851 1219 ashley salloc -N 2 -n3 -O
2854 2851 ashley /bin/bash
3192 2854 ashley mpirun -n 4 /home/ashley/general/mpi/deadlock
3193 3192 ashley srun --nodes=1 --ntasks=1 --kill-on-bad-exit --nodelist=cloud1 orted -mca ess slurm -mca orte_ess_jobid 258146304 -mca orte_es
3199 3192 ashley /home/ashley/general/mpi/deadlock
3200 3192 ashley /home/ashley/general/mpi/deadlock
I'm wondering if it might be better to simply walk all processes, in
much the same way as pbs_find_pids, and check for OMPI_COMM_WORLD_RANK,
OMPI_COMM_WORLD_SIZE, SLURM_JOB_ID and SLURM_STEP_ID. This code could
then be used as a fallback in case scontrol listpids fails to return any
pids, and hence wouldn't need any options twiddled to enable it.
Combined with some more intelligent setting of default values for
slurm_job_step, that could make this case fully automatic, with the
user just specifying the jobid and nothing else.
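
To make the process-walking idea concrete, here is a rough sketch of what such a fallback might look like. It is written in Python purely for illustration and is not taken from padb or from the attached patch; the function names parse_environ and find_rank_pids, and the proc_root parameter, are made up for this example. It assumes a Linux-style /proc where each process's environment can be read from /proc/<pid>/environ, and that every rank has all four variables set.

```python
import os

# Variables named in the proposal; a process must have all of them to count.
REQUIRED_VARS = (
    "OMPI_COMM_WORLD_RANK",
    "OMPI_COMM_WORLD_SIZE",
    "SLURM_JOB_ID",
    "SLURM_STEP_ID",
)


def parse_environ(blob):
    """Turn the NUL-separated contents of an environ file into a dict."""
    env = {}
    for entry in blob.split(b"\0"):
        key, sep, value = entry.partition(b"=")
        if sep:  # skip empty or malformed entries that have no '='
            env[key.decode(errors="replace")] = value.decode(errors="replace")
    return env


def find_rank_pids(job_id, proc_root="/proc"):
    """Return {rank: pid} for local processes that look like ranks of job_id.

    proc_root is a hypothetical parameter so the walk can be pointed at a
    fake directory tree for testing.
    """
    ranks = {}
    for name in os.listdir(proc_root):
        if not name.isdigit():
            continue  # only numeric entries are processes
        try:
            with open(os.path.join(proc_root, name, "environ"), "rb") as fh:
                env = parse_environ(fh.read())
        except OSError:
            continue  # process has exited, or we may not read its environ
        if not all(var in env for var in REQUIRED_VARS):
            continue
        if env["SLURM_JOB_ID"] != str(job_id):
            continue
        try:
            ranks[int(env["OMPI_COMM_WORLD_RANK"])] = int(name)
        except ValueError:
            continue  # rank value was not an integer
    return ranks
```

One caveat with this approach: any child a rank forks inherits the same environment, so a real implementation would probably need some way to pick the right pid when several processes claim the same rank. The sketch simply keeps whichever it sees last.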
Attached is the patch as I've been using it.
Ashley,
--
Ashley Pittman, Bath, UK.
Padb - A parallel job inspection tool for cluster computing
http://padb.pittman.org.uk
-------------- next part --------------
A non-text attachment was scrubbed...
Name: padb-slurm-open-2.patch
Type: text/x-patch
Size: 5461 bytes
Desc: not available
URL: <http://pittman.org.uk/pipermail/padb-devel_pittman.org.uk/attachments/20091202/2af8b613/attachment.bin>