[padb] Re: Better handling of threads in stack traces.
thipadin.seng-long at bull.net
Fri Dec 18 13:37:15 GMT 2009
On Nov 30th, 2009 <ashley at pittman.co.uk> wrote:
> I've been giving some thought to how padb can handle threaded
> applications better, as the current scheme isn't ideal.
>
> The first option would be to report extra threads in the same tree as
> the primary thread. Some magic would be needed to cover the fact that
> the first thread in a process starts with main() while subsequent ones
> start with pthread_create(), but this wouldn't be an insurmountable
> problem. The big problem with this approach would be how to report
> thread identifiers in the same rank-spec as rank identifiers. I could
> revert to just using a list here, but that doesn't work well on large
> systems.
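The following is a minimal sketch of the rank-spec problem described above. It is not padb's code. Plain ranks compress into the short bracketed range form seen in padb output, such as `[0-1999]`. A naive list of (rank, thread) pairs grows with ranks × threads. The names `rank_spec` and `pair_list`, the comma separator and the `rank.thread` notation are all hypothetical.

```python
from typing import Iterable, Tuple


def rank_spec(ranks: Iterable[int]) -> str:
    """Compress ranks into padb-like ranges, e.g. [0, 1, 2, 3, 7] -> "[0-3,7]"."""
    ordered = sorted(set(ranks))
    if not ordered:
        return "[]"
    parts = []
    start = prev = ordered[0]
    for r in ordered[1:]:
        if r == prev + 1:
            prev = r  # still inside a contiguous run of ranks
            continue
        parts.append(str(start) if start == prev else f"{start}-{prev}")
        start = prev = r
    parts.append(str(start) if start == prev else f"{start}-{prev}")
    return "[" + ",".join(parts) + "]"


def pair_list(pairs: Iterable[Tuple[int, int]]) -> str:
    """Hypothetical fallback for threads: spell out every pair as "rank.thread"."""
    return ",".join(f"{rank}.{thread}" for rank, thread in sorted(pairs))
```

For example, `rank_spec(range(2000))` gives `[0-1999]`. By contrast, the pair list for the same 2000 ranks with three threads each has 6000 comma-separated entries.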
> The second option would be to treat each thread as a separate entity
> within the rank/process and display a number of trees per job, one per
> thread: a tree for the main thread and another tree for each extra
> thread encountered. Technically, this would require adding a namespace
> to {target_output} as it is passed back up the comms tree, so it is the
> hardest option to add, but it would probably lead to the best solution.
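The per-thread namespace idea could look roughly like the sketch below. It assumes each process reports a mapping from thread id to stack. The `ThreadTable` layout and the `local_table` and `merge` helpers are hypothetical and do not reflect padb's actual {target_output} format. Partial results keyed by thread id are merged pairwise as they move up the comms tree, and stacks are only combined when the thread ids match.

```python
from typing import Dict, Set, Tuple

# A stack is a tuple of frame names so that it can be used as a dict key.
Stack = Tuple[str, ...]
# Hypothetical per-thread namespace: thread id -> stack -> ranks showing that stack.
ThreadTable = Dict[int, Dict[Stack, Set[int]]]


def local_table(rank: int, stacks_by_thread: Dict[int, Stack]) -> ThreadTable:
    """Wrap one process's stacks so that each thread lives in its own namespace."""
    return {tid: {stack: {rank}} for tid, stack in stacks_by_thread.items()}


def merge(a: ThreadTable, b: ThreadTable) -> ThreadTable:
    """Combine two partial results, e.g. at an inner node of the comms tree.

    Returns a new table; the inputs are left unchanged.
    """
    out: ThreadTable = {}
    for table in (a, b):
        for tid, stacks in table.items():
            per_thread = out.setdefault(tid, {})
            for stack, ranks in stacks.items():
                per_thread.setdefault(stack, set()).update(ranks)
    return out
```

Merging is a set union per (thread, stack) key, so the same function can be applied at every level of the tree. The final table then holds one tree's worth of data per thread.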
> Finally, there is the option of not showing all threads but allowing
> users to select a single thread per invocation of padb. This is the
> simplest functional option, although it might best be viewed as a step
> towards fully supporting multiple threads in future. Here the options
> would be to select threads by id (1, 2, ...) or perhaps to use a
> white/black list of function names that should (or should not) appear
> in a thread's stack before that thread is shown.
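The selection option might be sketched as follows. `select_threads` and its parameters are hypothetical. Two semantics are assumptions: a thread passes the white list if at least one listed function appears in its stack, and it fails the black list if any listed function appears.

```python
from typing import Dict, Iterable, Optional, Sequence


def select_threads(
    stacks_by_thread: Dict[int, Sequence[str]],
    thread_ids: Optional[Iterable[int]] = None,
    whitelist: Iterable[str] = (),
    blacklist: Iterable[str] = (),
) -> Dict[int, Sequence[str]]:
    """Keep only the threads a user asked for, by id and/or by function names."""
    wanted_ids = None if thread_ids is None else set(thread_ids)
    white = set(whitelist)
    black = set(blacklist)
    selected = {}
    for tid, stack in stacks_by_thread.items():
        frames = set(stack)
        if wanted_ids is not None and tid not in wanted_ids:
            continue  # not one of the requested thread ids
        if white and not (frames & white):
            continue  # none of the white-listed functions is on this stack
        if frames & black:
            continue  # a black-listed function is on this stack
        selected[tid] = stack
    return selected
```

With no arguments beyond the stacks, every thread is kept, so the filters only narrow the output.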
> I'd welcome ideas on which option people would prefer, or any other
> thoughts on how to handle threads properly.
I have a support request from a Bull customer who would like padb to
report stack traces grouped by thread, as below:
Thread: 1
--------------------------
[0-1999] (2000 processes)
---------
main()
PMPI_Finalize()
ompi_mpi_finalize()
barrier()
----------------
...... (249 processes)
---------------
orte_grpcomm_base_allgather()
opal_progress()
opal_event_loop()
epoll_dispatch()
epoll_wait()
---------------
..... (1751 processes)
----------------
opal_progress()
opal_event_loop()
epoll_dispatch()
epoll_wait()
Thread: 2
--------------------------
[0-1999] (2000 processes)
---------
....
Thread: 3
--------------------------
[0-1999] (2000 processes)
---------
....
This report should be produced per job. Would you accept it?
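The requested per-thread report could be produced roughly as sketched below. For each thread, the ranks' stacks are merged into a prefix tree, and a new rank-spec section is started wherever the ranks diverge. Several details are assumptions that only approximate the mock-up above: the input layout (thread id -> rank -> stack, outermost frame first), the helper names and the separator widths.

```python
from typing import Dict, Iterable, List, Sequence, Set


def rank_spec(ranks: Iterable[int]) -> str:
    """Compress ranks into ranges, e.g. {0, 1, 2, 5} -> "[0-2,5]"."""
    ordered = sorted(set(ranks))
    if not ordered:
        return "[]"
    parts = []
    start = prev = ordered[0]
    for r in ordered[1:]:
        if r == prev + 1:
            prev = r
            continue
        parts.append(str(start) if start == prev else f"{start}-{prev}")
        start = prev = r
    parts.append(str(start) if start == prev else f"{start}-{prev}")
    return "[" + ",".join(parts) + "]"


class Node:
    """One frame in a per-thread prefix tree, plus the ranks whose stacks pass through it."""

    def __init__(self) -> None:
        self.ranks: Set[int] = set()
        self.children: Dict[str, "Node"] = {}


def build_tree(stacks: Dict[int, Sequence[str]]) -> Node:
    """stacks maps rank -> frames, outermost (main) first."""
    root = Node()
    for rank, frames in stacks.items():
        node = root
        node.ranks.add(rank)
        for frame in frames:
            node = node.children.setdefault(frame, Node())
            node.ranks.add(rank)
    return root


def count(n: int) -> str:
    return f"({n} process{'' if n == 1 else 'es'})"


def render(node: Node, lines: List[str]) -> None:
    """Emit one section per branch, extending it while the rank set stays the same."""
    for frame, child in sorted(node.children.items(), key=lambda kv: min(kv[1].ranks)):
        lines.append(f"{rank_spec(child.ranks)} {count(len(child.ranks))}")
        lines.append("-" * 9)
        lines.append(f"{frame}()")
        while len(child.children) == 1:
            next_frame, next_child = next(iter(child.children.items()))
            if next_child.ranks != child.ranks:
                break  # some ranks stop here, so the rest get their own section
            lines.append(f"{next_frame}()")
            child = next_child
        render(child, lines)


def thread_report(stacks_by_thread: Dict[int, Dict[int, Sequence[str]]]) -> str:
    """stacks_by_thread maps thread id -> (rank -> stack); one block per thread."""
    lines: List[str] = []
    for tid in sorted(stacks_by_thread):
        lines.append(f"Thread: {tid}")
        lines.append("-" * 26)
        render(build_tree(stacks_by_thread[tid]), lines)
    return "\n".join(lines)
```

Sections at each branch point are ordered by their lowest rank. Sibling branches never share a rank, so this ordering is deterministic.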
Thipadin.
More information about the padb-devel mailing list