[padb] Re: Better handling of threads in stack traces.

thipadin.seng-long at bull.net thipadin.seng-long at bull.net
Fri Dec 18 13:37:15 GMT 2009


On Nov 30th, 2009 <ashley at pittman.co.uk> wrote:

> I've been giving some thought to how padb can handle threaded
> applications better, as the current scheme isn't ideal.
> 
> 
> First would be to report extra threads in the same tree as the primary
> thread, some magic would have to be applied to cover the fact that the
> first thread in a process starts with main and subsequent ones start
> with pthread_create() but this wouldn't be an insurmountable problem.
> The big problem with this approach would be how to report thread
> identifiers in the same rank-spec as rank identifiers, I could
> revert to just using a list here but that doesn't work so well on big
> systems.

> The second option would be to treat each thread as a different entity
> within the rank/process and have a number of trees displayed per job,
> each dealing with a different thread, e.g. there would be a tree per
> main thread and another tree for each extra thread encountered.  From a
> technical perspective implementing this would require adding a namespace
> to the {target_output} as it's passed back up the comms tree so is the
> hardest to add but would probably lead to the best solution.

> Finally there is the option of not showing all threads but allowing
> users to select a single thread per invocation of padb.  This is the
> simple but functional option although might be best viewed as a step
> along the way to fully supporting multiple threads in future.  Here the
> options are to be able to select threads by id (1,2,...) or perhaps by
> having a white/black list of function names that should appear in the
> stack for a thread before a thread is shown.
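
To make the selection option concrete, here is a rough Python sketch of
filtering threads by id and by white/black lists of function names. It is
only an illustration, not padb code: the function name select_threads, its
parameters, and the rule that every whitelisted name must appear in the
stack are all assumptions.

```python
def select_threads(threads, ids=None, require=(), exclude=()):
    """Return the threads that pass the id and function-name filters.

    threads: dict mapping thread id -> list of function names in the stack.
    ids:     optional set of thread ids to keep; None keeps every id.
    require: names that must all appear in the stack (assumed whitelist rule).
    exclude: names that must not appear in the stack (blacklist).
    """
    selected = {}
    for tid, stack in threads.items():
        if ids is not None and tid not in ids:
            continue
        frames = set(stack)
        if any(name not in frames for name in require):
            continue  # a whitelisted function is missing from this stack
        if any(name in frames for name in exclude):
            continue  # a blacklisted function is present in this stack
        selected[tid] = stack
    return selected
```

For example, select_threads(threads, exclude=["epoll_wait"]) would hide
threads that are merely idling in the event loop.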

> I'd welcome ideas on which people would prefer or if anybody has any
> other thoughts on how to handle threads properly.


I have a support request from a Bull customer who would like padb to
produce a report sorted by thread, as below:
   Thread: 1
   --------------------------
   [0-1999] (2000 processes)
   ---------
   main()
    PMPI_Finalize()
     ompi_mpi_finalize()
      barrier()
       ---------------
       ..... (249 processes)
       ---------------
       orte_grpcomm_base_allgather()
        opal_progress()
         opal_event_loop()
          epoll_dispatch()
           epoll_wait()
       ---------------
       ..... (1751 processes)
       ---------------
       opal_progress()
        opal_event_loop()
         epoll_dispatch()
          epoll_wait()
   Thread: 2
   --------------------------
   [0-1999] (2000 processes)
   --------- 
      ....
   Thread: 3
   --------------------------
   [0-1999] (2000 processes)
   --------- 
      ....

The report would be produced per job. Would you accept such a change?
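
As a rough illustration of the grouping I have in mind, the Python sketch
below collects stacks per thread id, merges the stacks of all ranks into one
tree per thread, and prints the rank-spec wherever the ranks diverge. It is
not padb code: the names rank_spec, build_tree, format_tree and
report_by_thread, and the (rank, thread id, stack) input format, are
assumptions made for the example.

```python
from collections import defaultdict


def rank_spec(ranks):
    """Compress integer ranks into a spec such as "[0-3,7]"."""
    ordered = sorted(set(ranks))
    if not ordered:
        return "[]"
    parts = []
    start = prev = ordered[0]
    for r in ordered[1:]:
        if r == prev + 1:
            prev = r
            continue
        parts.append(str(start) if start == prev else f"{start}-{prev}")
        start = prev = r
    parts.append(str(start) if start == prev else f"{start}-{prev}")
    return "[" + ",".join(parts) + "]"


def build_tree(stacks_by_rank):
    """Merge per-rank stacks (outermost frame first) into a single tree."""
    root = {"ranks": set(), "children": {}}
    for rank, stack in stacks_by_rank.items():
        node = root
        node["ranks"].add(rank)
        for frame in stack:
            node = node["children"].setdefault(
                frame, {"ranks": set(), "children": {}})
            node["ranks"].add(rank)
    return root


def format_tree(node, depth=0, lines=None):
    """Render the tree, naming the ranks wherever a branch splits them."""
    if lines is None:
        lines = []
    pad = " " * depth
    for name, child in node["children"].items():
        if child["ranks"] != node["ranks"]:
            count = len(child["ranks"])
            lines.append(f"{pad}{rank_spec(child['ranks'])} ({count} processes)")
        lines.append(f"{pad}{name}()")
        format_tree(child, depth + 1, lines)
    return lines


def report_by_thread(samples):
    """samples: iterable of (rank, thread_id, stack) tuples for one job."""
    per_thread = defaultdict(dict)
    for rank, tid, stack in samples:
        per_thread[tid][rank] = stack
    out = []
    for tid in sorted(per_thread):
        stacks = per_thread[tid]
        out.append(f"Thread: {tid}")
        out.append(f"{rank_spec(stacks)} ({len(stacks)} processes)")
        out.extend(format_tree(build_tree(stacks)))
    return "\n".join(out)
```

Calling report_by_thread() on the samples gathered for a job gives one
"Thread: N" section per thread id, each with its own merged tree.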

Thipadin.

