[padb] Réf. : Re: Réf. : Re: Patch of support of Slurm + Openmpi Orte manager

Ashley Pittman ashley at pittman.co.uk
Fri Dec 4 10:05:37 GMT 2009


On Fri, 2009-12-04 at 10:35 +0100, Sylvain Jeaugey wrote:
> On Fri, 4 Dec 2009, thipadin.seng-long at bull.net wrote:
> 
> > But no one can prevent somebody to start jobs like this since the syntax 
> > is correct,
> Actually we can. The documentation says : use salloc ... mpirun, not 
> salloc srun -n 1 mpirun. And I wouldn't say that the syntax is correct. It 
> just *happens* to work. With this command, you're launching this chain :
>   salloc -> srun -> mpirun -> srun -> MPI processes
> We're lucky it works !
> 
> > So if some one start jobs like this, padb should be able to support.
> I disagree. Since this has no added value, I don't see why we should 
> support it. But if that's only one extra line of code, then let it be ...

Given that the failure mode was to report the wrong information, I'm
much happier having this code in than not.  I'll probably change the
check to "next if is_resmgr_process($pid);", which is a superset of what
the code does now and means this case is handled no differently from the
normal salloc/mpirun case.
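
As a rough illustration of that kind of check (not padb's actual code,
which is Perl), here is a minimal Python sketch.  The list of launcher
names, the pid-to-command-name input, and the body of
is_resmgr_process() are all assumptions made for the example; only the
function name comes from the line quoted above.

```python
# Hypothetical sketch: padb is written in Perl and its real
# is_resmgr_process() may be implemented quite differently.

# Assumed set of launcher/daemon command names, for illustration only.
RESMGR_NAMES = {"salloc", "srun", "mpirun", "orted"}


def is_resmgr_process(pid, procs):
    """Return True if pid's command name is a known launcher or daemon.

    procs maps pid -> command name (an illustrative input format).
    """
    return procs.get(pid) in RESMGR_NAMES


def application_pids(procs):
    """Return the pids that are not resource manager processes."""
    pids = []
    for pid in sorted(procs):
        if is_resmgr_process(pid, procs):
            continue  # same role as "next if is_resmgr_process($pid);"
        pids.append(pid)
    return pids


# The chain salloc -> srun -> mpirun -> srun -> MPI processes.
chain = {100: "salloc", 101: "srun", 102: "mpirun",
         103: "srun", 104: "a.out", 105: "a.out"}
print(application_pids(chain))
```

Because every launcher in the chain is skipped by the same test, the
extra srun at the front needs no special case of its own.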

The examples given nicely demonstrate the benefit of having the signon
check for different executable names.  I know some people do this on
purpose, but it's rare enough that it's worth warning about if padb
observes it.
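
A check of this sort could look roughly like the Python sketch below.
It is only an illustration under assumed inputs: the rank-to-name
mapping, the function name check_executable_names() and the warning
text are hypothetical and are not taken from padb.

```python
from collections import Counter


def check_executable_names(rank_to_exe):
    """Return a warning string if ranks report different executable names.

    rank_to_exe maps rank -> executable name (hypothetical input format).
    Returns None when all ranks agree, or when there are no ranks.
    """
    counts = Counter(rank_to_exe.values())
    if len(counts) <= 1:
        return None
    # List each name with the number of ranks reporting it, sorted by name.
    summary = ", ".join(f"{name} ({n})" for name, n in sorted(counts.items()))
    return f"Warning: processes report different executable names: {summary}"


# One rank resolves to the launcher instead of the application.
print(check_executable_names({0: "mpirun", 1: "a.out", 2: "a.out"}))
```

Returning a message rather than failing keeps the intentional
mixed-executable case usable while still flagging the unusual one.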

Ashley,

-- 

Ashley Pittman, Brighton, UK.

Padb - A parallel job inspection tool for cluster computing
http://padb.pittman.org.uk




