[padb] Réf. : Re: Réf. : Re: Réf. : Bull changes ( with LSF -mpich2wrapper patch )

thipadin.seng-long at bull.net thipadin.seng-long at bull.net
Tue Feb 2 16:03:08 GMT 2010


On Jan 27 2010 at 18:30 Ashley Pittman <ashley at pittman.co.uk> wrote:
> As they are dependent on each other could you send them as a single,
> combined patch please.

I'm sending the combined patch against r386:



> I'm not sure that your loop over @chaps in lsfmpich2wr_get_mpiproc() is
> correct, should the if ($found_app != 0) test be outside of the main loop?
> Again a comment explaining what the code is trying to extract would be
> useful here.

This subroutine extracts the file path that comes immediately after the
--app parameter, e.g. in "--app foo ..." foo is the file. So the test
applies inside the loop over each field (word) of that line.
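
As a rough illustration of that extraction, here is a minimal sketch in Python (padb itself is Perl, and this is not the code from the patch). The function name find_app_path and the sample command line are hypothetical, made up for the example:

```python
def find_app_path(command_line):
    """Return the word that follows --app on a command line, or None."""
    found_app = False
    for word in command_line.split():
        if found_app:
            # The previous word was --app, so this word is the file path.
            return word
        if word == "--app":
            found_app = True
    # Either --app was absent or nothing followed it.
    return None


if __name__ == "__main__":
    # Hypothetical command line, only for illustration.
    print(find_app_path("launcher --app foo --other-option"))
```

Because the flag is checked at the top of each iteration, the test has to stay inside the loop over the words: it fires on the word right after --app.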

Feel free to optimize my code; I just needed to get it working.

Thipadin,
Regards.






Ashley Pittman <ashley at pittman.co.uk>
01/27/2010 06:30 PM

 
        To:      thipadin.seng-long at bull.net
        cc:      padb-devel at pittman.org.uk, Andry.Razafinjatovo at bull.net, florence.vallee at bull.net, Sylvain Jeaugey <sylvain.jeaugey at bull.net>
        Subject: Re: [padb] Réf. : Re: Réf. : Bull changes ( with LSF -mpich2wrapper patch )


On 21 Jan 2010, at 14:20, thipadin.seng-long at bull.net wrote:
> 
> I get back to you after a short break, as I've been doing some
> validation on an openmpi spawn functionality.
> Now I've finished what you've asked me above, I am just sending both
> patches.
> One for lsf-mpich2 wrapper, and the other one with lsf-openmpi wrapper.
> I did it against r386 version.
> Both are alike and have many common subroutines. As the patches are
> separated some routines are in both patches. I prefer you integrate
> once as you can factorize.
> If you need some 'ps' or 'bjobs' command layouts to understand the
> coding, please ask, I'll send you.

As they are dependent on each other could you send them as a single,
combined patch please.

I don't have systems I can test this on as I don't have lsf, but I would
like to understand the code. Could you put together a paragraph for each
rmgr describing how the underlying resource manager lays out processes and
how padb finds its information? I'm particularly interested in why it
has to ssh around to different nodes to see the information it needs.

With the ps command you can prevent the printing of headers by using the
option "-o pid=,ppid=,cmd=", which avoids the special case for removing
them later on. Stripping the leading spaces from ps output is already
done in get_extended_process_list(); can you use the same regexp in
get_line_ppid() for clarity please?
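
A minimal sketch of parsing that header-free output, written in Python for illustration only (padb itself is Perl). The function name parse_ps_line and the sample line are hypothetical, and the regexp shown is an assumption, not necessarily the one used in get_extended_process_list():

```python
import re


def parse_ps_line(line):
    """Split one line of 'ps -o pid=,ppid=,cmd=' output into (pid, ppid, cmd)."""
    # ps right-aligns the numeric columns, so strip the leading spaces first.
    line = re.sub(r"^\s+", "", line.rstrip("\n"))
    # Split on whitespace at most twice so the command keeps its arguments.
    fields = line.split(None, 2)
    if len(fields) < 2:
        return None
    cmd = fields[2] if len(fields) > 2 else ""
    return int(fields[0]), int(fields[1]), cmd


if __name__ == "__main__":
    # Hypothetical line of ps output, only for illustration.
    print(parse_ps_line("  4242     1 /bin/sh -c ./app arg"))
```

With the empty column titles there is no header line to skip, so every non-empty line can go through the same parsing path.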

I'm not sure that your loop over @chaps in lsfmpich2wr_get_mpiproc() is
correct; should the if ($found_app != 0) test be outside of the main loop?
Again, a comment explaining what the code is trying to extract would be
useful here.

Ashley,

-- 

Ashley Pittman, Bath, UK.

Padb - A parallel job inspection tool for cluster computing
http://padb.pittman.org.uk



-------------- next part --------------
A non-text attachment was scrubbed...
Name: lsf_mpich2_ompi.patch
Type: application/octet-stream
Size: 17901 bytes
Desc: not available
URL: <http://pittman.org.uk/pipermail/padb-devel_pittman.org.uk/attachments/20100202/a467d7ab/attachment.obj>

