[padb-users] start using padb on TORQUE

Thu Nov 25 20:53:41 GMT 2010

On 25 Nov 2010, at 20:01, David Singleton wrote:
> * Long ago, we changed the format of the PBS exechost string (qstat -n
>  output) to be more compact. The relevant parts are like pdsh hostlist
>  format, eg.
>      v[5-6,15-18,30-31]/cpus=0-7/mems=0-1
>  We have avoided working out how to get padb to use this format by
>  adding an "old exechost format" option to our qstat.  For very large
>  jobs, I think our format makes more sense and should be easier to use
>  with pdsh.  We haven't looked at clustershell yet (we use c3 for cluster
>  management).

I'm not sure whether padb already has code to extract hostlists from strings like that, but I suspect it does; it's a piece of code I've written at least four or five times over the years.  It sounds like this would be useful upstream in PBS, and if it were adopted there I'd gladly add the functionality to padb.  As you have something working, though, I'd be reluctant to add custom code for a single site.

Having checked, there is an implementation of this logic in rms_job_to_nhosts(), which could easily be factored out to support the above.
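
As a rough illustration of what such a factored-out helper would need to do, here is a minimal Python sketch that expands the compact exechost string quoted above into individual host names.  It is not padb's code: the function name expand_exechost() is hypothetical, and it assumes a single prefix[ranges] group followed by optional "/cpus=..." and "/mems=..." fields, which is all the example in this thread shows.

```python
import re


def expand_exechost(exechost):
    """Expand a compact exechost string into a list of host names.

    Assumes (hypothetically) one "prefix[ranges]suffix" group, optionally
    followed by "/key=value" fields that are ignored here.
    """
    # Keep only the host part; drop "/cpus=..." and "/mems=..." fields.
    hostpart = exechost.split("/", 1)[0]

    match = re.fullmatch(r"([^\[\]]*)\[([^\[\]]+)\]([^\[\]]*)", hostpart)
    if match is None:
        # No bracketed range: treat the whole thing as a single host name.
        return [hostpart]

    prefix, ranges, suffix = match.groups()
    hosts = []
    for item in ranges.split(","):
        low, _, high = item.partition("-")
        high = high or low  # a bare "7" means the range 7-7
        # Preserve zero padding such as "007" if the lower bound uses it.
        width = len(low) if low.startswith("0") else 0
        for n in range(int(low), int(high) + 1):
            hosts.append(prefix + str(n).zfill(width) + suffix)
    return hosts


if __name__ == "__main__":
    print(expand_exechost("v[5-6,15-18,30-31]/cpus=0-7/mems=0-1"))
```

The resulting list could then be handed to pdsh or ssh one host at a time; the cpus and mems fields are simply discarded in this sketch.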

> * We run all jobs under project groups which are not user login groups.
>  That causes grief for "rsh node gdb ..." type debugging because of
>  insufficient privileges.  Since this is a common problem for us, we
>  have a variant of newgrp that we can insert in remote commands to
>  overcome this, eg
>      rsh node nfnewgrp projgroup gdb ...
> 
>  Note that all variants of PBS support users nominating their job's
>  execution group (the group_list/egroup job attributes) but I don't know
>  how commonly this is exercised.

In this context I'm assuming that by group you mean a PBS concept and not a Linux group (from /etc/group); if it were the latter and usernames were the same, this would be a non-issue.  I'm curious to know how you've made this work with the current setup.  It would be possible for me to add this, at least for the pdsh and ssh launch-modes, and I assume that any rmgr launch-mode would handle it itself.  I assume there is a way of discovering which remote user a job is running as?

Currently I'm trying to stabilise a release but this is something I could look at once that process is complete if there is demand for it.

> * A common variant of MPI jobs are those launched like
>      mpirun wrapper_script mpi_executable
>  so that the parent of the MPI tasks is not orted/mpid/mpirun. We are
>  interested in ways to support such jobs.  Since job processes are
>  contained in cpusets (cgroups) on our system, we can easily get the
>  relevant process list and then use environment to find ranks. Will it
>  matter if a non-MPI process with OMPI_COMM_WORLD_RANK set is queried
>  for message queue info?  Does it matter that two processes have the same
>  rank?

Padb *should* handle this case.  It only allows one process per rank and, depending on the resource manager, it will either pick the direct child of the resource manager or, if that process is deemed to be a wrapper script and has any children, pick the first of those children.  The definition of a wrapper script is set by the "scripts" configuration option, which defaults to "bash,sh,dash,ash,perl,xterm" and so should cover most bases.

The code for this is in convert_pids_to_child_pids().  It is called once per node with a list of candidate processes that are direct descendants of the resource manager, and it decides which to use based on which processes are active.
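
The selection rule described above can be sketched in a few lines of Python.  This is only an illustration of the rule as stated in this message, not a transcription of convert_pids_to_child_pids(): the function name pick_rank_process() and the two dictionaries (pid to command name, pid to list of child pids) are hypothetical stand-ins for whatever process information is actually gathered on each node.

```python
# Default taken from the "scripts" configuration option mentioned above.
DEFAULT_SCRIPTS = "bash,sh,dash,ash,perl,xterm"


def pick_rank_process(pid, names, children, scripts=DEFAULT_SCRIPTS):
    """Return the pid to inspect for a rank.

    pid      -- direct child of the resource manager (candidate process)
    names    -- hypothetical dict mapping pid -> command name
    children -- hypothetical dict mapping pid -> list of child pids
    """
    wrappers = set(scripts.split(","))
    kids = children.get(pid, [])
    # A wrapper script with at least one child is replaced by its first child.
    if names.get(pid) in wrappers and kids:
        return kids[0]
    # Otherwise the direct child itself is used.
    return pid


if __name__ == "__main__":
    names = {100: "bash", 101: "mpi_executable", 102: "mpi_executable"}
    children = {100: [101]}
    print(pick_rank_process(100, names, children))  # wrapper, so its child is used
    print(pick_rank_process(102, names, children))  # not a wrapper, used directly
```

A site that launches through some other wrapper could extend the comma-separated list passed as scripts, which mirrors how the "scripts" option is described here.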

Ashley.

-- 

Ashley Pittman, Bath, UK.

Padb - A parallel job inspection tool for cluster computing
http://padb.pittman.org.uk