[padb-users] start using padb on TORQUE
Jie Cai
Jie.Cai at anu.edu.au
Thu Nov 25 23:04:06 GMT 2010
On 26/11/10 07:01, David Singleton wrote:
> * Long ago, we changed the format of the PBS exechost string (qstat -n
> output) to be more compact. The relevant parts are like pdsh hostlist
> format, e.g.
> v[5-6,15-18,30-31]/cpus=0-7/mems=0-1
> We have avoided working out how to get padb to use this format by
> adding an "old exechost format" option to our qstat. For very large
> jobs, I think our format makes more sense and should be easier to use
> with pdsh. We haven't looked at clustershell yet (we use c3 for
> cluster management).
>
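For illustration, the compact format quoted above can be expanded back into
individual host names with a few lines of code. This is only a rough sketch in
Python (padb itself is Perl), and expand_hostlist() is a hypothetical helper,
not part of padb or PBS. It assumes a single prefix followed by one bracketed
group of ranges, with no zero-padded numbers, and it simply ignores the
/cpus=.../mems=... suffix.

```python
import re

def expand_hostlist(spec):
    """Expand a pdsh-style host list such as 'v[5-6,15-18]' into host names."""
    # Drop the resource suffixes ('/cpus=0-7/mems=0-1') that follow the hosts.
    hostpart = spec.split("/", 1)[0]
    match = re.fullmatch(r"([^\[\]]+)\[([^\[\]]+)\]", hostpart)
    if match is None:
        # No bracketed range: treat the string as a single host name.
        return [hostpart]
    prefix, ranges = match.groups()
    hosts = []
    for item in ranges.split(","):
        # 'low-high' is a range; a bare number is a range of one.
        low, _, high = item.partition("-")
        high = high or low
        for n in range(int(low), int(high) + 1):
            hosts.append(f"{prefix}{n}")
    return hosts

print(",".join(expand_hostlist("v[5-6,15-18,30-31]/cpus=0-7/mems=0-1")))
# prints: v5,v6,v15,v16,v17,v18,v30,v31
```

Since pdsh accepts the bracketed form directly with -w, expansion like this
would only be needed where individual host names are required.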
Here is a little extra detail. With David's current changes to our PBS
system, the old host format looks like the following.
$ qstat -w -n -u gec651
xepbs:
                                                     Req'd  Req'd   Elap
Job ID          Username Queue    Jobname    NDS TSK Memory Time  S Time
--------------- -------- -------- ---------- --- --- ------ ----- - -----
98856.xepbs     gec651   normal   adfrun      16  16   20gb 12:00 R 01:10
x146/0+x146/1+x146/2+x146/3+x146/4+x146/5+x146/6+x146/7+x147/0+x147/1+x147/2+x147/3+x147/4+x147/5+x147/6+x147/7
padb will actually push x146,x146,...,x146,x147,x147,...,x147 into
$pbs_tabjobs{$job}{hosts} (in the function pbs_get_lqsub()), and then
spawn remote processes with 'pdsh -w $pbs_tabjobs{$job}{hosts}'. I have
also changed pbs_get_lqsub() to filter out the redundant host names. I
am not sure whether this is a common problem at other sites.
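The filtering amounts to roughly the following. This is a Python sketch of the
idea only, not the actual change to pbs_get_lqsub() (which is Perl);
unique_hosts() is a hypothetical name, and the sketch assumes the old
"host/cpu+host/cpu+..." exechost format shown above.

```python
def unique_hosts(exechost):
    """Return host names from 'host/cpu+host/cpu+...' in first-seen order."""
    hosts = []
    seen = set()
    for entry in exechost.split("+"):
        # Keep only the host name in front of the '/cpu' index.
        host = entry.split("/", 1)[0].strip()
        if host and host not in seen:
            seen.add(host)
            hosts.append(host)
    return hosts

exechost = "x146/0+x146/1+x147/0+x147/1"
print(",".join(unique_hosts(exechost)))
# prints: x146,x147
```

The comma-separated result is the form that would be passed to 'pdsh -w', so
each node is named once rather than once per CPU.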
Kind Regards,
Jie
--
Jie Cai Jie.Cai at anu.edu.au
ANU Supercomputer Facility NCI National Facility
Leonard Huxley, Mills Road Ph: +61 2 6125 7965
Australian National University Fax: +61 2 6125 8199
Canberra, ACT 0200, Australia http://nf.nci.org.au