[padb] Re: Bull changes (with LSF -mpich2 wrapper and -openmpi_wrapper combined)

thipadin.seng-long at bull.net thipadin.seng-long at bull.net
Tue Feb 16 16:08:55 GMT 2010


On 02/15/2010 19:17 Ashley Pittman <ashley at pittman.co.uk> wrote:

>On 9 Feb 2010, at 16:03, thipadin.seng-long at bull.net wrote:
>
>> I've finally combined my previous code for the mpich2 and openmpi
>> wrappers on LSF, as we discussed.
>> I hope you haven't yet committed the previous submission.
>> On the "outer" side we can store different combined jobs (whether
>> mpich2 or openmpi) in the table.
>> Each job is tagged with jobid{lsf_mpi} = 1 for mpich2 and 2 for openmpi.
>> The flag is passed through inner_conf{lsf_mpi} to the inner processes
>> so they can handle each wrapper differently when finding the
>> processes.
>> The RMGR is 'lsf-mpiwr' (for "mpi wrapper"), as the job must be launched
>> by a wrapper. So it can be used for further mpi wrappers.
>
>I've renamed the rmgr as lsf rather than lsf-mpiwr as the -mpiwr only
>serves to add confusion.  If and when better LSF support comes along it
>can share the same rmgr setting.  I also changed lsf_mpi to lsf_mode and
>gave it string values instead of int values as well, as this should make
>the code easier to read.
>
>> I've enjoyed meeting you. Hoping you can come often to CEA.
>> I hope you'll commit it soon as we expect to deliver to CEA soon.
>
>Thank you very much for the patch, I'm back from holiday now so have some
>time to look at this again.
>
>I've committed a variant as r388.  I hope I haven't broken anything but
>can you test it please.  I'm interested to see the output if a valid LSF
>job is specified but it doesn't use a wrapper of the correct style: is a
>correct and clear error message given in this case?  As I said I don't
>have access to LSF myself so I've tried to keep any changes to a minimum.

>Ashley,

I tested the 3.2 beta0 release. There is one problem: line 919 calls an
undefined subroutine, slurm_cmd, as shown below:
[senglont at artemis1 lsf-ompi]$ ./padb -O rmgr=lsf -atx
Undefined subroutine &main::slurm_cmd called at ./padb line 919.
[senglont at artemis1 lsf-ompi]$ ./padb -V
padb version 3.2 (Revision 389)

Written by Ashley Pittman
http://padb.pittman.org.uk
[senglont at artemis1 lsf-ompi]$

The source at that line is:
sub slurp_remote_cmd {
    my ( $host, $cmd ) = @_;
    return slurm_cmd("ssh $host $cmd");
}

I guess it should have been 'slurp_cmd' instead of 'slurm_cmd'.
I'll fix it locally and retest.
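
For reference, a minimal sketch of the corrected routine. The slurp_cmd shown here is only a stand-in so the example runs on its own; it assumes slurp_cmd takes a command string and returns its output lines, which may not match the real one in padb:

```perl
use strict;
use warnings;

# Stand-in for padb's slurp_cmd (assumption: runs a command and returns
# the lines of its output with trailing newlines removed).
sub slurp_cmd {
    my ($cmd) = @_;
    open( my $fh, '-|', $cmd ) or return;
    my @out = <$fh>;
    close($fh);
    chomp(@out);
    return @out;
}

# Corrected wrapper: calls slurp_cmd rather than the undefined slurm_cmd.
sub slurp_remote_cmd {
    my ( $host, $cmd ) = @_;
    return slurp_cmd("ssh $host $cmd");
}
```

With only the one name changed, the "Undefined subroutine &main::slurm_cmd" error at line 919 should no longer occur.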

Thipadin.