[padb] Réf. : Re: Réf. : Re: [padb-devel] Patch for Support of PBS Pro resource manager
Ashley Pittman
ashley at pittman.co.uk
Wed Nov 18 17:10:10 GMT 2009
On Wed, 2009-11-18 at 16:48 +0100, thipadin.seng-long at bull.net wrote:
> 1- Path of remote padb:
>
> [thipa at xn5 padb_open]$ ./padb -O rmgr=pbs -tx 27611
> einner: xn20: bash: ./padb: No such file or directory
> As a consequence, the path is not found,
> so the path to padb on the remote host must be a full path.
I know about this one and don't have a generic solution. Other resource
managers suffer from it as well; mpd springs to mind. It should only
occur when developing padb: if you aren't running it as ./padb, it's
probably installed somewhere and will also be installed on the remote nodes.
As a workaround I often type `pwd`/padb, which makes it work, although
it's not ideal.
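
For illustration only, the `pwd`/padb workaround could be done from inside
a Perl script along the lines below. This is a minimal sketch, not the
snipped patch and not code from padb: absolute_program_path() is a
hypothetical helper name, and it assumes the same directory is visible on
the remote nodes (for example through a shared filesystem).

```perl
use strict;
use warnings;
use File::Spec;

# Hypothetical helper (not part of padb): turn the path the script was
# invoked with into an absolute path based on the current directory.
# Paths that are already absolute are returned unchanged.
sub absolute_program_path {
    my ($invoked_as) = @_;
    return File::Spec->rel2abs($invoked_as);
}

# $0 holds the name the script was started with, e.g. "./padb".
print absolute_program_path($0), "\n";
```

File::Spec->rel2abs() only prepends the current directory and tidies up
the result; it does not resolve symlinks or check that the file exists on
the remote host.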
> I did the patch as follows:
> [snip]
> If you have another idea, I'll take it.
How does this work if you run, say, ./src/padb -axt? If it works in that
case then I'm happy with the code and I'll commit it. I hadn't added
anything before because I couldn't think of a generalised solution.
> 2- Use of uninitialized value in subtraction (-) at ./padb line 4077
>
>
> 4077 foreach my $proc ( 0 .. $comm_data->{nprocesses} - 1 ) {
Are you able to extract the process count from the job id and return it
as "nprocesses" in the hash returned by pbs_setup_job()? I'm not
familiar with qstat so I don't know how to find this information.
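
Until the count is available, the warning itself comes from subtracting 1
from an undefined value. A minimal sketch of a guard around that loop
range is shown below. process_indices() is a hypothetical helper, the hash
passed to it only stands in for $comm_data, and this does not solve the
underlying problem of getting the process count out of PBS.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of padb): return the list of process
# indices 0 .. nprocesses - 1, or an empty list when the "nprocesses"
# key is missing, undefined, or less than one.
sub process_indices {
    my ($comm_data) = @_;
    my $nprocesses = $comm_data->{nprocesses};
    return () if !defined $nprocesses || $nprocesses < 1;
    return ( 0 .. $nprocesses - 1 );
}

# With a count, the loop runs once per process.
foreach my $proc ( process_indices( { nprocesses => 3 } ) ) {
    print "process $proc\n";
}

# Without a count, the loop body is skipped and no warning is raised.
foreach my $proc ( process_indices( {} ) ) {
    print "process $proc\n";
}
```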
> 3- Question about starting inner padb:
>
> How can I start an inner padb by hand on a remote host to debug such
> as:
> perl -d ./padb --inner --jobid=27611.xn0 --stack-trace -O rmgr="pbs"
> --line-formatted
> like I did it before, because this command doesn't work anymore.
> You have changed it with "call back" and communication on ports.
You're right that debugging padb in the new model is a lot more
difficult. --debug full_duplex=all will show all comms between the
inner and the outer process, or you can use --debug all=all and padb
will spit out as much as it can.
I'm not familiar with perl -d, so I can't help you on that front.
> Here is the diff again r311 (diff r311 newone).
>
> So you can integrate my new patch, try to correct point 2,
> and send me back the new one; I will test it again.
I'll be able to take a closer look when I'm back from SC. I only have my
netbook with me and am not able to test anything from here; the patch
looks good so far, however.
Ashley,
--
Ashley Pittman, Bath, UK.
Padb - A parallel job inspection tool for cluster computing
http://padb.pittman.org.uk