[padb] Ref.: Re: Patch of support of Slurm + Openmpi Orte manager

Ashley Pittman ashley at pittman.co.uk
Thu Dec 3 11:08:31 GMT 2009


I'm just running out of the door myself and will be away until Sunday
now.

On Thu, 2009-12-03 at 11:45 +0100, thipadin.seng-long at bull.net wrote:
> You have mpirun which has rank0, this shouldn't, and you miss 3,6.

Ranks 3 and 6 are on the same node as rank 0. Can you try the following
additional patch? It should cause padb to skip over the mpirun process
and look for local processes based on their environment.

If this patch doesn't work, take a look at the contents
of /proc/$pid/status for the process it's erroneously reporting as rank
0 to see what Name is set to.  In the example you sent it's pid 22210.
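
A minimal sketch of that check, for anyone who wants to script it rather
than read the file by hand. The helpers status_field and read_status are
hypothetical names used only for illustration; they are not padb's own
find_from_status, whose implementation is not shown here. The sketch
assumes a Linux /proc layout where each line of the status file has the
form "Key:<whitespace>value".

```perl
use strict;
use warnings;

# Hypothetical helper (not padb's find_from_status): return the value of
# one field from the text of a /proc/<pid>/status file, or undef if the
# field is not present.
sub status_field {
    my ( $text, $key ) = @_;
    foreach my $line ( split /\n/, $text ) {
        # Lines look like "Name:\tmpirun"
        if ( $line =~ /^\Q$key\E:\s*(.*)$/ ) {
            return $1;
        }
    }
    return;
}

# Hypothetical helper: read the whole status file for a pid.
# Returns undef if the file cannot be opened (e.g. the process has exited).
sub read_status {
    my ($pid) = @_;
    open( my $fh, '<', "/proc/$pid/status" ) or return;
    local $/;    # slurp mode: read the file in one go
    my $text = <$fh>;
    close($fh);
    return $text;
}

# Use the pid given on the command line, or this script's own pid.
my $pid  = shift @ARGV // $$;
my $text = read_status($pid);
if ( defined $text ) {
    my $name = status_field( $text, 'Name' );
    print "pid $pid Name: ", ( defined $name ? $name : '(not found)' ), "\n";
}
else {
    print "pid $pid: cannot read /proc/$pid/status\n";
}
```

Run it with the pid of the process being reported as rank 0 as the only
argument. If the Name printed is neither orted nor mpirun, that is the
value the filter in the patch below would need to match.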

--- padb-slurm-open-3	2009-12-03 11:03:08.500044734 +0000
+++ padb	2009-12-03 11:03:15.333036493 +0000
@@ -8187,6 +8187,7 @@
         next unless ( $job eq $jobid );
         next unless ( $step == $inner_conf{slurm_job_step} );
         next if( find_from_status( $pid, 'Name' ) eq 'orted');
+        next if( find_from_status( $pid, 'Name' ) eq 'mpirun');
         maybe_show_pid( $global, $pid );
         $found_target = 1;
     }


-- 

Ashley Pittman, Bath, UK.

Padb - A parallel job inspection tool for cluster computing
http://padb.pittman.org.uk




