[padb] Ref: Re: Patch of support of Slurm + Openmpi Orte manager

thipadin.seng-long at bull.net thipadin.seng-long at bull.net
Tue Dec 1 15:30:22 GMT 2009


On Mon, 2009-11-30 at 17:31 <ashley at pittman.co.uk> wrote:

>I knew you had to do this when running OpenMPI with slurm however I'd
>never done it myself.  My test cluster has both installed so I should be
>able to try it, do you happen to know if you need any special configure
>options to either to allow this?

I used slurm 2.0.1 and openmpi_1.3.3; newer versions should work as well.
I don't know of any special configure options. The only thing I did was add
the location of my OpenMPI 1.3.3 binaries and libraries to my $PATH.
Check it with the "type mpirun" command; it should show the path to the
OpenMPI installation.
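For example, something like this (just a sketch; the install prefix
/opt/openmpi-1.3.3 is an assumption, adjust it to your own installation):

# Assumed OpenMPI install prefix -- replace with your own location
export PATH=/opt/openmpi-1.3.3/bin:$PATH
export LD_LIBRARY_PATH=/opt/openmpi-1.3.3/lib:$LD_LIBRARY_PATH

# Check that the mpirun found first in $PATH is the OpenMPI one
type mpirun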

>Does the mpirun job (i.e. the processes we want) have its own slurm job
>step or does it share the job step with the allocation?

Just after salloc, the step list is:
[thipa at vb0 openmpi]$ salloc.sh
salloc: Granted job allocation 27828
[thipa at vb0 openmpi]$
[thipa at vb0 openmpi]$ squeue -s
STEPID         NAME PARTITION     USER      TIME NODELIST
[thipa at vb0 openmpi]$ squeue
  JOBID PARTITION     NAME     USER  ST       TIME  NODES NODELIST(REASON)
  27828       jlg     bash    thipa   R       0:25      2 vb[8,10]

After mpirun has started, the step list is:
[thipa at vb0 openmpi]$ squeue -s
STEPID         NAME PARTITION     USER      TIME NODELIST
27828.0       orted       jlg    thipa      1:02 vb[8,10]
[thipa at vb0 openmpi]$

I believe it can't share a job step; the mpirun/orted launch gets its own.

>I also notice the /proc/version in the patch, does this mean the patch
>works on an OS other than Linux?

It is not complete. To run on an OS other than Linux there would need to
be two branches (see the sketch below):
1 - if /proc/version exists, scan /proc with readdir and read
/proc/$pid/cmdline
2 - otherwise, fall back to something like "ps -edf | grep slurmstepd".
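
A rough shell sketch of the idea (for illustration only, this is not the
code in the patch):

# Branch 1: Linux -- walk /proc and look for slurmstepd in each cmdline
if [ -r /proc/version ]; then
    for d in /proc/[0-9]*; do
        if grep -qa slurmstepd "$d/cmdline" 2>/dev/null; then
            echo "${d#/proc/}"
        fi
    done
else
    # Branch 2: other systems -- fall back to ps and print the PID column
    ps -edf | grep '[s]lurmstepd' | awk '{print $2}'
fi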

>What happens if you run salloc... srun?  Does this work with the
>existing support and how should users know which resource manager plugin
>to pick (Ideally padb could do the right thing).

Do you mean salloc ... srun ... mpirun prog?
Here is what I have experimented with:

[thipa at vb0 openmpi]$ salloc.sh
salloc: Granted job allocation 27830
[thipa at vb0 openmpi]$ 
[thipa at vb0 openmpi]$ srun -n1 mpirun -bynode -n 6 ./pp_sndrcv_spbl
srun: Warning: can't run 1 processes on 2 nodes, setting nnodes to 1
I am, process 0 starting on vb8, total by srun  6
Me, process 0, send  1000 to process 2
I am, process 2 starting on vb8, total by srun  6
I am, process 4 starting on vb8, total by srun  6
I am, process 1 starting on vb10, total by srun  6
I am, process 5 starting on vb10, total by srun  6
I am, process 3 starting on vb10, total by srun  6

There are 2 steps:
[thipa at vb0 openmpi]$ squeue -s
STEPID         NAME PARTITION     USER      TIME NODELIST
27830.0      mpirun       jlg    thipa      0:22 vb8
27830.1       orted       jlg    thipa      0:22 vb10
[thipa at vb0 openmpi]$

And rmgr=slurm (the existing support) doesn't work;
you just catch the stack of orted:

[thipa at vb0 openmpi]$ padb_r341  -O stack-shows-locals=no  -O 
stack-shows-params=no -O rmgr=slurm --verbose -tx 27830
Loading config from "/etc/padb.conf"
Loading config from "/home_nfs/thipa/.padbrc"
Loading config from environment
Loading config from command line
Setting 'rmgr' to 'slurm'
Setting 'stack_shows_locals' to 'no'
Setting 'stack_shows_params' to 'no'

Collecting information for job '27830'

Attaching to job 27830
Job has 1 process(es)
Job spans 2 host(s)
Mode 'stack' mode specific options:
     gdb_retry_count : '3'
 max_distinct_values : '3'
  stack_shows_locals : '0'
  stack_shows_params : '0'
   stack_strip_above : 
'elan_waitWord,elan_pollWord,elan_deviceCheck,opal_condition_wait,opal_progress'
   stack_strip_below : 'main,__libc_start_main,start_thread'
    strip_above_wait : '1'
    strip_below_main : '1'
-----------------
[0] (1 processes)
-----------------
main() at main.c:13
  orterun() at orterun.c:686
    opal_event_dispatch() at ?:?
      opal_event_base_loop() at ?:?
        poll_dispatch() at ?:?
          poll() at ?:?
            ??() at ?:?
result from parallel command is 0 (state=shutdown)
[thipa at vb0 openmpi]$

>> [thipa at machu0 padb_open]$ ./padb -O rmgr="sl-orte" -O
>> stack-shows-locals=no  -O stack-shows-params=no --debug=verbose=all
>> -tx 8324 
>> DEBUG (verbose):   0: There are 1 processes over 3 hosts 

>This isn't great, the number of processes expected is so far only used
>to check for missing processes but there are other potential uses for it
>so I'd rather it was correct.

I will dig into it more; I don't actually know what you currently do with
the process count.

>> I don't use scontrol listpids, because I found this command not a
>> universal method (some version doesn't have it), 
>> and may issued error message such as : 
>> slurmd[machu139]: proctrack/pgid does not implement
>> slurm_container_get_pids 

>I'd prefer to use this if at all possible, this option was added at a
>request by me several years ago so I'd have thought most versions have
>it by now, can you be clearer on the versions where it doesn't work?

It only works for slurm versions newer than 1.2; maybe some customers
still have older ones?
If you can get rid of the messages above (slurmd[hostnn]: proctrack/pgid
does not implement)
I am OK with it.
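
For reference, on a Slurm version that supports it, listing the PIDs of a
job step looks roughly like this (the job/step ID below is just the orted
step from the earlier example; whether it works depends on the proctrack
plugin in use):

# Ask Slurm for the PIDs it tracks for job 27830, step 1 (the orted step).
# With proctrack/pgid this fails with the error message quoted above.
scontrol listpids 27830.1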

Thipadin.
