[padb-users] Error message from /opt/sbin/libexec/minfo: No DLL to load

Ashley Pittman ashley at pittman.co.uk
Wed Aug 18 19:15:24 BST 2010


On 18 Aug 2010, at 19:11, Rahul Nabar wrote:

> Any ideas what the following error message is indicative of? Did I
> miss a step in the padb installation?
> 
> Background: I'm trying to debug a 32 node (256 core) Open-MPI job using padb.
> 
> Warning: errors reported by some ranks
> ========
> [0-255]: Error message from /opt/sbin/libexec/minfo: No DLL to load
> ========
> Warning: errors reported by some ranks
> ========
> [0-255]: Error message from /opt/sbin/libexec/minfo: No DLL to load
> ========

This error means that padb is unable to find the name of the debugger DLL, which is supposed to be provided by the MPI library.

To give a bit of background here: the way message queues are implemented is that the MPI library provides a DLL which the debugger loads into its own memory space and uses to extract the message queues from the parallel MPI process.  This DLL is built and installed with the MPI library but loaded by and into the debugger.  To allow the debugger to find the library, its install location is built into the MPI process as a text variable.  A short aside here: because it's a text location it's defined at build time, so it's impossible for an MPI library to be relocated after build, which is unfortunate.
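To make that concrete, here is a rough sketch in C of the two halves of that handshake.  The symbol name follows the usual MPIR message-queue convention (MPIR_dll_name) and the path is only an example; neither is taken from your installation.

#include <dlfcn.h>
#include <stdio.h>

/* MPI library side: the DLL's install path is baked into the process as
 * a plain text variable at build time, which is why a relocated install
 * ends up advertising a path that no longer exists.  (Example path only.) */
char MPIR_dll_name[] = "/opt/openmpi/lib/libompi_dbg_msgq.so";

/* Debugger side: the debugger reads that string out of the target process
 * and then loads the DLL into its own address space, not the target's. */
static void *load_msgq_dll(const char *path_read_from_target)
{
    void *handle = dlopen(path_read_from_target, RTLD_NOW);
    if (handle == NULL)
        fprintf(stderr, "failed to load %s: %s\n",
                path_read_from_target, dlerror());
    return handle;
}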

The error you are getting means that the MPI library isn't exporting this text string giving the filesystem location of the DLL, which could be either because you aren't really looking at an MPI process or because Open-MPI wasn't built with debugger support.

With Open-MPI the debugger library is called $OPAL_PREFIX/lib/libompi_dbg_msgq.so IIRC, so you could check if this file exists; if it doesn't then you need to check with Open-MPI what steps are needed to ensure it is built.  I thought it was built automatically, but this is not the case with all MPIs, and it doesn't help matters that in some cases if the build of this DLL fails the build of the MPI library can still succeed - I fixed this in around the 1.4 timeframe.
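If you want a quick standalone test of the whole chain, the snippet below is just a sketch: both paths are guesses for a typical /opt install (substitute your own $OPAL_PREFIX), and depending on how your Open-MPI was built the string may be defined in a different library.  It asks libmpi.so for the exported string and then tries to load whatever DLL it names.

#include <dlfcn.h>
#include <stdio.h>

int main(void)
{
    const char *libmpi = "/opt/openmpi/lib/libmpi.so";   /* adjust for your install */
    void *mpi = dlopen(libmpi, RTLD_LAZY | RTLD_GLOBAL);
    if (mpi == NULL) {
        fprintf(stderr, "cannot open %s: %s\n", libmpi, dlerror());
        return 1;
    }

    /* Conventional MPIR symbol naming the debugger DLL - an assumption
     * about where and how this particular build defines it. */
    const char *dll = (const char *) dlsym(mpi, "MPIR_dll_name");
    if (dll == NULL) {
        printf("MPIR_dll_name not exported - no debugger support built in?\n");
        return 1;
    }
    printf("library advertises: %s\n", dll);

    /* Now see whether the advertised DLL actually exists and loads. */
    if (dlopen(dll, RTLD_NOW) == NULL) {
        printf("but it will not load: %s\n", dlerror());
        return 1;
    }
    printf("and it loads fine\n");
    return 0;
}

Compile with something like "gcc check_msgq.c -ldl" (the file name is mine) and run it on one of the compute nodes.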

Alternatively, as I say, it could be that padb isn't finding the correct processes: does the rest of the output look correct for what you are expecting, and are you using some kind of wrapper script between mpirun and your executable?  padb should detect this case and act correctly, but it is another possible cause.


Ashley,

-- 

Ashley Pittman, Bath, UK.

Padb - A parallel job inspection tool for cluster computing
http://padb.pittman.org.uk




