<br><font size=2 face="sans-serif">Hi,</font>
<br><font size=2 face="sans-serif">As promised, I'm sending you the output of the 'bjobs' and 'ps' commands on the respective hosts as an example.</font>
<br><font size=2 face="sans-serif">This example is for the openmpi wrapper. It may also help in understanding my code.</font>
<br><font size=2 face="sans-serif">There are 2 jobs: one (1478) started with the wrapper and one (1477) started without it.</font>
<br><font size=2 face="sans-serif">With the wrapper, we can determine how many procs run on each host (2 on artemis3, 2 on artemis4).</font>
<br><font size=2 face="sans-serif">Without the wrapper, we can only see that the job has started on 'artemis3'; we don't know how many procs run on artemis3 and on artemis4</font>
<br><font size=2 face="sans-serif">(actually 2 on artemis3 and 2 on artemis4).</font>
<br><font size=2 face="sans-serif">So this job should not be taken into account.</font>
<br><font size=2 face="sans-serif">To distinguish between the two, I look at the bjobs output to find the master host (where the job starts), which should be the first entry of EXEC_HOST;</font>
<br><font size=2 face="sans-serif">in this case it is 'artemis3' for both jobs. On this host, the ps command shows mpirun --app 'path_to_app_file' for the wrapper job, while for the other job</font>
<br><font size=2 face="sans-serif">it shows mpirun without the --app parameter.</font>
<br><font size=2 face="sans-serif">The appfile contains the TaskStarter command with -p artemis3:37756, a port number that all subsequent processes of the job should carry on each</font>
<br><font size=2 face="sans-serif">remote host, while the job without the wrapper has none.</font>
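<br>
<br><font size=2 face="sans-serif">The check described above could be sketched roughly as follows in Python. This is only an illustration under my own assumptions, not the actual code: the function names master_host and wrapper_info are hypothetical, the input is assumed to be the EXEC_HOST entries of one job and the 'ps' command strings of that job's process tree on the master host, and the patterns are based only on the output shown in this mail.</font>

```python
import re


def master_host(exec_host_entries):
    """Return the master host: the first EXEC_HOST entry without its 'N*' prefix.

    Entries look like 'artemis3' or '2*artemis3' (assumed from the bjobs
    output in this mail).
    """
    first = exec_host_entries[0].strip()
    return first.split("*", 1)[-1]


def wrapper_info(ps_cmds):
    """Inspect the 'ps' command strings of one job on its master host.

    Returns (appfile, port_tag):
      appfile  -- path given to 'mpirun --app', or None if the job was
                  started without the wrapper
      port_tag -- the 'host:port' value passed to TaskStarter with -p,
                  or None if no TaskStarter was found
    """
    appfile = None
    port_tag = None
    for cmd in ps_cmds:
        # Wrapper jobs show 'mpirun --app <appfile>' on the master host.
        m = re.search(r"(?:^|/|\s)mpirun\s+--app\s+(\S+)", cmd)
        if m:
            appfile = m.group(1)
        # TaskStarter carries the same '-p host:port' on every host.
        m = re.search(r"TaskStarter\s+-p\s+(\S+:\d+)", cmd)
        if m:
            port_tag = m.group(1)
    return appfile, port_tag
```

<br><font size=2 face="sans-serif">A job for which wrapper_info returns no appfile would be treated like job 1477 here, i.e. not taken into account; the returned port value could then be used to match the TaskStarter processes on the remote hosts.</font>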
<br>
<br>
<br>
<br><font size=2 face="sans-serif">[senglont@artemis3 lsf-ompi]$ bjobs</font>
<br><font size=2 face="sans-serif">JOBID USER STAT QUEUE FROM_HOST EXEC_HOST JOB_NAME SUBMIT_TIME</font>
<br><font size=2 face="sans-serif">1477 senglon RUN normal artemis3 artemis3 PP_SLNOWR Feb 4 14:13</font>
<br><font size=2 face="sans-serif">1478 senglon RUN normal artemis3 2*artemis3 PP_SNDRCV Feb 4 14:21</font>
<br><font size=2 face="sans-serif"> 2*artemis4</font>
<br><font size=2 face="sans-serif">[senglont@artemis3 ]$ </font>
<br>
<br><font size=2 face="sans-serif">Bjobs of 1478</font>
<br>
<br><font size=2 face="sans-serif">bjobs -l 1478</font>
<br>
<br><font size=2 face="sans-serif">Job <1478>, Job Name <PP_SNDRCV>, User <senglont>, Project <default>, Status <R</font>
<br><font size=2 face="sans-serif"> UN>, Queue <normal>, Command <#! /bin/bash;# with mpirun </font>
<br><font size=2 face="sans-serif"> wrapper;# essai avec -R span a lancer deux fois lui-meme;</font>
<br><font size=2 face="sans-serif"> # Ok ce script est bon pour lancer 2 jobs;# avec chaque </font>
<br><font size=2 face="sans-serif"> 2proc sur artemis3 et 2proc sur artemis4;#BSUB -J "PP_SNDR</font>
<br><font size=2 face="sans-serif"> CV";#BSUB -m "artemis3 artemis4";#BSUB -o PP_SNDRCV.%J;#BS</font>
<br><font size=2 face="sans-serif"> UB -n 4;#BSUB -e PP_SNDRCVerr.%J;#BSUB -a openmpi;#BSUB -R</font>
<br><font size=2 face="sans-serif"> "span[ptile=2]";source ~/.bashrc_lompi;mpirun.lsf --prefi</font>
<br><font size=2 face="sans-serif"> x /home_nfs/senglont/ompi_inst/1.3.3/ ./pp_sndrcv_spbl></font>
<br><font size=2 face="sans-serif">Thu Feb 4 14:21:02: Submitted from host <artemis3>, CWD <$HOME/mympi/lsf-ompi></font>
<br><font size=2 face="sans-serif"> , Output File <PP_SNDRCV.%J>, Error File <PP_SNDRCVerr.%J></font>
<br><font size=2 face="sans-serif"> , 4 Processors Requested, Requested Resources <span[ptile=</font>
<br><font size=2 face="sans-serif"> 2]>, Specified Hosts <artemis3>, <artemis4>;</font>
<br><font size=2 face="sans-serif">Thu Feb 4 14:21:04: Started on 4 Hosts/Processors <2*artemis3> <2*artemis4>, E</font>
<br><font size=2 face="sans-serif"> xecution Home </home_nfs/senglont>, Execution CWD </home_n</font>
<br><font size=2 face="sans-serif"> fs/senglont/mympi/lsf-ompi>;</font>
<br><font size=2 face="sans-serif">Thu Feb 4 15:19:57: Resource usage collected.</font>
<br><font size=2 face="sans-serif"> The CPU time used is 3526 seconds.</font>
<br><font size=2 face="sans-serif"> MEM: 14 Mbytes; SWAP: 611 Mbytes; NTHREAD: 14</font>
<br><font size=2 face="sans-serif"> PGID: 13623; PIDs: 13631 13635 13637 13638 13623 13624 </font>
<br><font size=2 face="sans-serif"> 13628 13629 </font>
<br><font size=2 face="sans-serif"> PGID: 13639; PIDs: 13639 </font>
<br><font size=2 face="sans-serif"> PGID: 13640; PIDs: 13640 </font>
<br><font size=2 face="sans-serif"> PGID: 10491; PIDs: 10491 </font>
<br><font size=2 face="sans-serif"> PGID: 10492; PIDs: 10492 </font>
<br>
<br>
<br><font size=2 face="sans-serif"> SCHEDULING PARAMETERS:</font>
<br><font size=2 face="sans-serif"> r15s r1m r15m ut pg io ls it tmp swp mem</font>
<br><font size=2 face="sans-serif"> loadSched - - - - - - - - - - - </font>
<br><font size=2 face="sans-serif"> loadStop - - - - - - - - - - - </font>
<br>
<br><font size=2 face="sans-serif">[senglont@artemis3 lsf-ompi]$ </font>
<br><font size=2 face="sans-serif">PS from artemis3</font>
<br><font size=2 face="sans-serif"> </font>
<br><font size=2 face="sans-serif">[senglont@artemis3 ]$ psu</font>
<br><font size=2 face="sans-serif"> PID PPID CMD</font>
<br><font size=2 face="sans-serif">10222 10220 sshd: senglont@pts/5</font>
<br><font size=2 face="sans-serif">10223 10222 -bash</font>
<br><font size=2 face="sans-serif">13586 27520 /usr/share/lsf/7.0/linux2.6-glibc2.3-x86_64/etc/res -d /usr/share/lsf/conf -m</font>
<br><font size=2 face="sans-serif">13587 13586 /bin/sh /home_nfs/senglont/.lsbatch/1265289212.1477</font>
<br><font size=2 face="sans-serif">13591 13587 /bin/bash /home_nfs/senglont/.lsbatch/1265289212.1477.shell</font>
<br><font size=2 face="sans-serif">13592 13591 mpirun --prefix /home_nfs/senglont/ompi_inst/1.3.3 -H artemis3,artemis4 -n 4 .</font>
<br><font size=2 face="sans-serif">13594 13592 ./pp_sleep</font>
<br><font size=2 face="sans-serif">13595 13592 ./pp_sleep</font>
<br><font size=2 face="sans-serif">13623 27520 /usr/share/lsf/7.0/linux2.6-glibc2.3-x86_64/etc/res -d /usr/share/lsf/conf -m</font>
<br><font size=2 face="sans-serif">13624 13623 /bin/sh /home_nfs/senglont/.lsbatch/1265289662.1478</font>
<br><font size=2 face="sans-serif">13628 13624 /bin/bash /home_nfs/senglont/.lsbatch/1265289662.1478.shell</font>
<br><font size=2 face="sans-serif">13629 13628 pam -g /usr/share/lsf/7.0/linux2.6-glibc2.3-x86_64/bin/openmpi_wrapper --prefi</font>
<br><font size=2 face="sans-serif">13631 13629 /bin/sh /usr/share/lsf/7.0/linux2.6-glibc2.3-x86_64/bin/openmpi_wrapper --pref</font>
<br><font size=2 face="sans-serif">13635 13631 mpirun <b>--app /home_nfs/senglont/.openmpi_appfile_1478</b></font>
<br><font size=2 face="sans-serif">13637 13635 /usr/share/lsf/7.0/linux2.6-glibc2.3-x86_64/bin/TaskStarter <b>-p artemis3:37756</b></font>
<br><font size=2 face="sans-serif">13638 13635 /usr/share/lsf/7.0/linux2.6-glibc2.3-x86_64/bin/TaskStarter <b>-p artemis3:37756</b></font>
<br><font size=2 face="sans-serif">13639 13637 ./pp_sndrcv_spbl</font>
<br><font size=2 face="sans-serif">13640 13638 ./pp_sndrcv_spbl</font>
<br><font size=2 face="sans-serif">13645 27420 /usr/share/lsf/7.0/linux2.6-glibc2.3-x86_64/etc/res</font>
<br><font size=2 face="sans-serif">13699 10223 ps -o pid,ppid,cmd -u senglont</font>
<br><font size=2 face="sans-serif">[senglont@artemis3 lsf-ompi]$ </font>
<br><font size=2 face="sans-serif">[senglont@artemis3 lsf-ompi]$ <b>cat /home_nfs/senglont/.openmpi_appfile_1478</b></font>
<br><font size=2 face="sans-serif">-host artemis4 -n 2 --prefix /home_nfs/senglont/ompi_inst/1.3.3/ /usr/share/lsf/7.0/linux2.6-glibc2.3-x86_64/bin/TaskStarter -p artemis3:37756 -c /usr/share/lsf/conf -s /usr/share/lsf/7.0/linux2.6-glibc2.3-x86_64/etc -a X86_64 ./pp_sndrcv_spbl </font>
<br><font size=2 face="sans-serif">-host artemis3 -n 2 --prefix /home_nfs/senglont/ompi_inst/1.3.3/ /usr/share/lsf/7.0/linux2.6-glibc2.3-x86_64/bin/TaskStarter -p artemis3:37756 -c /usr/share/lsf/conf -s /usr/share/lsf/7.0/linux2.6-glibc2.3-x86_64/etc -a X86_64 ./pp_sndrcv_spbl </font>
<br><font size=2 face="sans-serif">[senglont@artemis3 lsf-ompi]$ </font>
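<br>
<br><font size=2 face="sans-serif">For illustration, the number of procs per host could be read from an appfile like the one above with a small Python sketch. The function name procs_per_host is hypothetical, and the parsing assumes only the '-host NAME -n COUNT' layout visible in this appfile; other appfile layouts are not handled.</font>

```python
import shlex


def procs_per_host(appfile_text):
    """Map each host named in the appfile to its requested process count.

    Only the first '-host' and the first '-n' on each line are used, so
    options of the launched command later on the line are ignored.
    """
    counts = {}
    for line in appfile_text.splitlines():
        tokens = shlex.split(line)
        host = None
        nprocs = None
        # Stop one token early so tokens[i + 1] always exists.
        for i, tok in enumerate(tokens[:-1]):
            if tok == "-host" and host is None:
                host = tokens[i + 1]
            elif tok == "-n" and nprocs is None:
                nprocs = int(tokens[i + 1])
        if host is not None and nprocs is not None:
            counts[host] = counts.get(host, 0) + nprocs
    return counts
```

<br><font size=2 face="sans-serif">Applied to the appfile of job 1478, this would give 2 procs for artemis4 and 2 for artemis3, which matches the EXEC_HOST column of bjobs.</font>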
<br>
<br><font size=2 face="sans-serif">PS from artemis4</font>
<br><font size=2 face="sans-serif">[senglont@artemis4 ~]$ psu</font>
<br><font size=2 face="sans-serif"> PID PPID CMD</font>
<br><font size=2 face="sans-serif">10478 1 /home_nfs/senglont/ompi_inst/1.3.3/bin/orted --daemonize -mca ess env -mca ort</font>
<br><font size=2 face="sans-serif">10479 10478 ./pp_sleep</font>
<br><font size=2 face="sans-serif">10480 10478 ./pp_sleep</font>
<br><font size=2 face="sans-serif">10488 1 /home_nfs/senglont/ompi_inst/1.3.3/bin/orted --daemonize -mca ess env -mca ort</font>
<br><font size=2 face="sans-serif">10489 10488 /usr/share/lsf/7.0/linux2.6-glibc2.3-x86_64/bin/TaskStarter <b>-p artemis3:37756</b></font>
<br><font size=2 face="sans-serif">10490 10488 /usr/share/lsf/7.0/linux2.6-glibc2.3-x86_64/bin/TaskStarter <b>-p artemis3:37756</b></font>
<br><font size=2 face="sans-serif">10491 10490 ./pp_sndrcv_spbl</font>
<br><font size=2 face="sans-serif">10492 10489 ./pp_sndrcv_spbl</font>
<br><font size=2 face="sans-serif">10493 18965 /usr/share/lsf/7.0/linux2.6-glibc2.3-x86_64/etc/res</font>
<br><font size=2 face="sans-serif">11019 11017 sshd: senglont@pts/8</font>
<br><font size=2 face="sans-serif">11020 11019 -bash</font>
<br><font size=2 face="sans-serif">11054 11020 ps -o pid,ppid,cmd -u senglont</font>
<br><font size=2 face="sans-serif">[senglont@artemis4 ~]$ </font>
<br>
<br><font size=2 face="sans-serif">As you said, we can work on optimizing the code so that there is just one (after the commit).</font>
<br>
<br><font size=2 face="sans-serif">Thipadin.</font>
<br><font size=2 face="sans-serif"> </font>
<br>
<br>