[padb] Re: Bullchanges (with LSF -mpich2 wrapper and -openmpi_wrapper combined)

thipadin.seng-long at bull.net
Tue Feb 16 23:29:35 GMT 2010


On 16 Feb 2010, at 23:16 Ashley Pittman <ashley at pittman.co.uk> wrote:

>On 16 Feb 2010, at 16:08, thipadin.seng-long at bull.net wrote:
>> I guess it should have been 'slurp_cmd' instead of 'slurm_cmd'.
>> I'll make the change myself and retry.
>
>Fixed.  It shouldn't affect anyone other than you so I won't make another
>beta release at this stage if you're happy to make the change locally
>yourself.
>

I tested further and there is still another problem; I guess it comes
from the ps command you changed.

[senglont at artemis1 lsf-ompi]$ ./padb -O rmgr=lsf -tx 1516
Use of uninitialized value in numeric eq (==) at ./padb line 2896.
Use of uninitialized value in numeric eq (==) at ./padb line 2896.
Use of uninitialized value in numeric eq (==) at ./padb line 2896.
Use of uninitialized value in numeric eq (==) at ./padb line 2896.

Here is the result at a breakpoint set after the call to slurp_cmd:

[senglont at artemis1 lsf-ompi]$ perl -d ./padb -O rmgr=lsf -tx 1516

Loading DB routines from perl5db.pl version 1.28
Editor support available.
Enter h or `h h' for help, or `man perldebug' for more help.
main::(./padb:345):     my $svn_revision_string = '$Revision: 389 $';
  DB<1> b 2939
  DB<2> b 2942
  DB<3> c
main::lsfmpi_get_mpiproc(./padb:2939):
2939:       my @handle =
2940:         slurp_remote_cmd( $host, "ps -o pid=,ppid=,cmd= -u $target_user" );
  DB<3> c
main::lsfmpi_get_mpiproc(./padb:2942):
2942:       $count_line = @handle;
  DB<3> p @handle
,ppid=,cmd=
      16179
      16180
      16184
      16185
      16187
      16191
      16193
      16194
      16195
      16196
      16201
      16202
      16203
      16207
      16208
      16210
      16214
      16215
      16216
      16217
      16218
      16221
      21554
      21555
  DB<4> 
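The dump suggests where the warnings come from: @handle holds a header line ",ppid=,cmd=" followed by bare PIDs, so whatever padb compares with == on line 2896 is presumably undefined. Assuming the remote ps is procps (as on typical Linux hosts), this is consistent with how it appears to treat "=" inside an -o list: the text after "=" becomes the column header up to the end of that argument, so "-o pid=,ppid=,cmd=" asks for a single PID column titled ",ppid=,cmd=". Giving each column its own -o option should yield headerless three-column output instead. A minimal Perl sketch of that idea; build_ps_cmd and split_ps_line are hypothetical names, not functions in padb:

```perl
use strict;
use warnings;

# Hypothetical helpers, not padb's actual code.

# Build a ps command whose output has no header line. Each column gets
# its own -o option, so the "=" only blanks that one column's header
# instead of swallowing the remaining column names.
sub build_ps_cmd {
    my ($user) = @_;
    return "ps -o pid= -o ppid= -o cmd= -u $user";
}

# Split one headerless ps line into (pid, ppid, cmd). Lines that do not
# carry all three fields return an empty list, so the caller can skip
# them rather than compare an undefined ppid with ==.
sub split_ps_line {
    my ($line) = @_;
    return () unless defined $line
      && $line =~ /^\s*(\d+)\s+(\d+)\s+(.*?)\s*$/;
    return ( $1, $2, $3 );
}

# Usage: a PID-only line (like those in @handle above) is skipped.
foreach my $line ( "16180 16179 /bin/sh /home_nfs/x.1516\n", "      16179\n" ) {
    my ( $pid, $ppid, $cmd ) = split_ps_line($line) or next;
    print "$pid <- $ppid: $cmd\n";
}
```

Guarding the split this way turns a malformed line into a skipped line rather than a stream of "uninitialized value" warnings, whichever ps syntax is finally used.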


In my version the ps command was:

my $cmd = "ssh $host ps -o pid,ppid,cmd -u $target_user ";
which displays this on host 'artemis4':

[senglont at artemis1 lsf-ompi]$ ssh artemis4 ps -o pid,ppid,cmd -u senglont
  PID  PPID CMD
16179  2787 /usr/share/lsf/7.0/linux2.6-glibc2.3-x86_64/etc/res -d /usr/share/lsf/conf -m artemis1 /home_nfs/senglont/.lsbatch/1266322840.1516
16180 16179 /bin/sh /home_nfs/senglont/.lsbatch/1266322840.1516
16184 16180 /bin/bash /home_nfs/senglont/.lsbatch/1266322840.1516.shell
16185 16184 pam -g /usr/share/lsf/7.0/linux2.6-glibc2.3-x86_64/bin/openmpi_wrapper --prefix /home_nfs/senglont/ompi_inst/1.3.3/ ./pp_sndrcv_spbl
16187 16185 /bin/sh /usr/share/lsf/7.0/linux2.6-glibc2.3-x86_64/bin/openmpi_wrapper --prefix /home_nfs/senglont/ompi_inst/1.3.3/ ./pp_sndrcv_spbl
16191 16187 mpirun --app /home_nfs/senglont/.openmpi_appfile_1516 
...................................;
.........................................................................
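With the PPID column present, the process tree can be rebuilt from this listing: res (16179) is the parent of the job shell 16180, which leads through 16184, 16185 and 16187 down to mpirun (16191). A minimal Perl sketch of walking that tree, assuming the output has already been parsed into a pid => ppid hash; descendants_of is a hypothetical helper, not padb's implementation:

```perl
use strict;
use warnings;

# Hypothetical helper, not padb's code: return every descendant of
# $root (children, grandchildren, ...) sorted numerically, given a
# hash that maps each pid to its parent pid.
sub descendants_of {
    my ( $root, %parent_of ) = @_;

    # Invert the pid => ppid map into ppid => [child pids].
    my %children;
    while ( my ( $pid, $ppid ) = each %parent_of ) {
        push @{ $children{$ppid} }, $pid;
    }

    # Breadth-first walk; %seen guards against loops in bad input.
    my ( @found, %seen );
    my @todo = ($root);
    while (@todo) {
        my $pid = shift @todo;
        foreach my $child ( @{ $children{$pid} || [] } ) {
            next if $seen{$child}++;
            push @found, $child;
            push @todo,  $child;
        }
    }
    return sort { $a <=> $b } @found;
}

# Usage with the PIDs shown in the listing above.
my %parent_of = (
    16179 => 2787,  16180 => 16179, 16184 => 16180,
    16185 => 16184, 16187 => 16185, 16191 => 16187,
);
print join( ' ', descendants_of( 16180, %parent_of ) ), "\n";
```

Without the PPID field (as in the new command's output) there is nothing to invert, which is why the parent/child relationship is lost.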


So can you tell me what you intended the new ps command to do?

Thipadin.


More information about the padb-devel mailing list