[padb-users] start using padb on TORQUE

Jie Cai Jie.Cai at anu.edu.au
Thu Nov 25 23:04:06 GMT 2010


On 26/11/10 07:01, David Singleton wrote:
> * Long ago, we changed the format of the PBS exechost string (qstat -n
>   output) to be more compact. The relevant parts are like pdsh hostlist
>   format, e.g.
>       v[5-6,15-18,30-31]/cpus=0-7/mems=0-1
>   We have avoided working out how to get padb to use this format by
>   adding an "old exechost format" option to our qstat.  For very large
>   jobs, I think our format makes more sense and should be easier to use
>   with pdsh.  We haven't looked at clustershell yet (we use c3 for
>   cluster management).
>
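If padb were taught to read the compact format directly, it would first need to separate the host list from the per-job attributes. The Python sketch below shows one way to do that. It assumes each exechost string is a single `prefix[ranges]` group followed by optional `/key=value` attributes, as in David's example. The post does not specify the full grammar of the ANU qstat format, and `hostlist_part` and `expand_hostlist` are hypothetical helpers, not padb functions. Since `pdsh -w` already accepts hostlist expressions such as `v[5-6,15-18]`, the stripped string could be passed to pdsh as-is. The expansion is only needed where padb wants one entry per host.

```python
import re


def hostlist_part(exechost: str) -> str:
    """Drop the '/cpus=...' and '/mems=...' attributes, keeping the pdsh-style hostlist."""
    return exechost.split("/", 1)[0]


def expand_hostlist(hostlist: str) -> list[str]:
    """Expand a single 'prefix[a-b,c,...]suffix' group into individual host names.

    Assumption: at most one bracket group per string; zero-padded ranges
    such as '[01-03]' keep their width, as pdsh does.
    """
    m = re.fullmatch(r"([^\[\]]*)\[([^\]]+)\]([^\[\]]*)", hostlist)
    if m is None:
        return [hostlist]  # plain host name, nothing to expand
    prefix, ranges, suffix = m.groups()
    hosts = []
    for part in ranges.split(","):
        lo, _, hi = part.partition("-")
        hi = hi or lo  # a single number like '7' is the range 7-7
        width = len(lo) if lo.startswith("0") else 0
        for n in range(int(lo), int(hi) + 1):
            hosts.append(f"{prefix}{str(n).zfill(width)}{suffix}")
    return hosts


if __name__ == "__main__":
    spec = "v[5-6,15-18,30-31]/cpus=0-7/mems=0-1"
    print(hostlist_part(spec))  # v[5-6,15-18,30-31]
    print(",".join(expand_hostlist(hostlist_part(spec))))  # v5,v6,v15,v16,v17,v18,v30,v31
```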
Here is a little more detail. With David's current changes on our PBS
system, the old exechost format looks like the following:

$ qstat -w -n -u gec651

xepbs:
                                                      Req'd  Req'd   Elap
Job ID          Username Queue    Jobname    NDS TSK Memory Time  S Time
--------------- -------- -------- ---------- --- --- ------ ----- - -----
98856.xepbs     gec651   normal   adfrun      16  16   20gb 12:00 R 01:10
x146/0+x146/1+x146/2+x146/3+x146/4+x146/5+x146/6+x146/7+x147/0+x147/1+x147/2+x147/3+x147/4+x147/5+x147/6+x147/7

padb actually pushes x146,x146,...,x146,x147,x147,...,x147 into
$pbs_tabjobs{$job}{hosts} (in the function pbs_get_lqsub()). It then
spawns remote processes with 'pdsh -w $pbs_tabjobs{$job}{hosts}', so each
host appears once per CPU slot. I have changed pbs_get_lqsub() to filter
out the redundant host names. I am not sure whether this is a common
problem at other sites.
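The filtering described above amounts to splitting the exechost string on `+`, taking the host part before each `/`, and keeping the first occurrence of each host. The sketch below shows that idea in Python. It is not the actual Perl patch to pbs_get_lqsub(), and `exechost_hosts` is a hypothetical name. The comma-joined result is the kind of list `pdsh -w` accepts, so each node is contacted once rather than once per CPU slot.

```python
def exechost_hosts(exec_host: str) -> list[str]:
    """Return each host in an old-style 'host/cpu+host/cpu+...' string once, in order."""
    seen = set()
    hosts = []
    for slot in exec_host.split("+"):
        host = slot.split("/", 1)[0]  # 'x146/3' -> 'x146'
        if host and host not in seen:
            seen.add(host)
            hosts.append(host)
    return hosts


if __name__ == "__main__":
    # The exechost line from the qstat output above: 8 CPU slots on each of 2 nodes.
    exec_host = "+".join(f"x146/{i}" for i in range(8)) + "+" + "+".join(
        f"x147/{i}" for i in range(8)
    )
    print(",".join(exechost_hosts(exec_host)))  # x146,x147
```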

Kind Regards,
Jie

-- 
Jie Cai                         Jie.Cai at anu.edu.au
ANU Supercomputer Facility      NCI National Facility
Leonard Huxley, Mills Road      Ph:  +61 2 6125 7965
Australian National University  Fax: +61 2 6125 8199
Canberra, ACT 0200, Australia   http://nf.nci.org.au
-----------------------------------------------------