[padb-users] Fwd: Error message from /opt/sbin/libexec/minfo: No DLL to load

Ashley Pittman ashley at pittman.co.uk
Thu Aug 19 12:18:07 BST 2010


On 19 Aug 2010, at 10:51, Daniel Kidger wrote:

> >As a final point, debugging collectives can be hard: in a deadlock situation it can be hard to tell whether all ranks are on the same iteration or whether some are ahead of others and some are behind. I have a patch to Open-MPI that adds a counter to all collective calls to allow this situation to be detected and reported correctly; if you're still stuck even with the stack trace then you might find this of use.  It'll mean patching your MPI build and fixing the above problem with the DLL.
> 
> I would be particularly interested in this patch.
> It is further complicated, though, in that the code I am working on often calls collectives like MPI_Allgather from various subsets of MPI_COMM_WORLD, so I do not expect all processes to have called them the same number of times - does your patch allow for this?

Yes it does.  To be clear, the "collective debugger" functionality is a proposal for extending the specification between the tool (in this case padb) and the MPI library.  The patch is an implementation of the proposal for Open-MPI, so you will need to re-compile your MPI library to use it.  Unfortunately it's looking like the proposal might not be formally adopted, purely due to a lack of time on my part, but I'm hoping that it can be made to work somehow.
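
As a rough illustration of the idea only (this is not the patch, and the function name, communicator names and counts below are all made up), the check amounts to keeping one collective-call counter per communicator on each rank and comparing ranks only within the communicators they actually belong to. That is why calling collectives on subsets of MPI_COMM_WORLD is not a problem:

```python
from collections import defaultdict


def find_lagging_ranks(counters):
    """Report communicators whose member ranks disagree on the call count.

    counters maps rank -> {communicator name: collective calls made so far}.
    A rank is only compared against other ranks that list the same
    communicator, so subsets of MPI_COMM_WORLD are handled independently.
    """
    # Regroup the per-rank data by communicator.
    per_comm = defaultdict(dict)
    for rank, comms in counters.items():
        for comm, count in comms.items():
            per_comm[comm][rank] = count

    report = {}
    for comm, by_rank in per_comm.items():
        newest = max(by_rank.values())
        # Ranks with a lower count have not yet made the latest call.
        behind = {r: c for r, c in by_rank.items() if c < newest}
        if behind:
            report[comm] = {"max": newest, "behind": behind}
    return report


# Hypothetical snapshot of a hung four-rank job (assumed values).
snapshot = {
    0: {"MPI_COMM_WORLD": 12, "row_comm": 40},
    1: {"MPI_COMM_WORLD": 12, "row_comm": 39},
    2: {"MPI_COMM_WORLD": 12, "col_comm": 7},
    3: {"MPI_COMM_WORLD": 12, "col_comm": 7},
}
print(find_lagging_ranks(snapshot))
```

In this made-up snapshot rank 1 is one call behind rank 0 on row_comm, while ranks 2 and 3 are never compared on it because they are not members.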

The patch and its background are on-line, although unfortunately there is no sample output from when padb uses this.

http://padb.pittman.org.uk/extensions.html

Ashley.

-- 

Ashley Pittman, Bath, UK.

Padb - A parallel job inspection tool for cluster computing
http://padb.pittman.org.uk




