[padb-users] Error message from /opt/sbin/libexec/minfo: No DLL to load

Ashley Pittman ashley at pittman.co.uk
Wed Aug 18 20:10:43 BST 2010


On 18 Aug 2010, at 20:50, Rahul Nabar wrote:

> On Wed, Aug 18, 2010 at 1:15 PM, Ashley Pittman <ashley at pittman.co.uk> wrote:
> 
> To my naive eyes this doesn't mean much but maybe you have a clue? If
> not I'll post on the OpenMPI list (or read their make instructions) to
> see how the debugger support is built in.

I've checked with Jeff already and it's enabled by default.

>> With Open-MPI the debugger library is called $OPAL_PREFIX/lib/libompi_dbg_msgq.so IIRC, so you could check whether this file exists. If it doesn't, you need to check with Open-MPI what steps are needed to ensure it is built.  I thought it was built automatically, but this is not the case with all MPIs, and it doesn't help matters that in some cases the build of MPI can still succeed even if the build of this DLL fails - I fixed this in around the 1.4 timeframe.
>> 
> 
> I can't find that specific file in my MPI install.

Can you send the list of the files that you do have there?  I've just checked on a system here and the correct location is $OPAL_PREFIX/lib/openmpi/libompi_dbg_msgq.so (note the extra openmpi in there).  If you do have that file, could you take a peek inside a single MPI process and print out the value of "MPIR_dll_name"?  Just attach with gdb and type "p MPIR_dll_name".  I assume you are using a recent version of OpenMPI?
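
For what it's worth, the two checks could be scripted roughly as follows.  This is only a sketch: find_dbg_dll is a made-up helper name (not part of padb or Open-MPI), it assumes OPAL_PREFIX points at your Open-MPI install, and PID stands for the pid of one of your MPI ranks.

```shell
# Sketch: look for the Open-MPI message-queue debugger DLL under a prefix.
# find_dbg_dll is a hypothetical helper, not something padb or Open-MPI ships.
find_dbg_dll() {
    prefix="$1"
    # Check the lib/openmpi location first, then plain lib as a fallback.
    for dir in "$prefix/lib/openmpi" "$prefix/lib"; do
        if [ -f "$dir/libompi_dbg_msgq.so" ]; then
            echo "$dir/libompi_dbg_msgq.so"
            return 0
        fi
    done
    echo "libompi_dbg_msgq.so not found under $prefix" >&2
    return 1
}

# Usage, assuming OPAL_PREFIX is set in your environment:
#   find_dbg_dll "$OPAL_PREFIX"
# To see which DLL name the MPI process itself advertises (PID is a placeholder):
#   gdb -p PID -batch -ex 'p MPIR_dll_name'
```

If the function prints a path but gdb reports a different one (or nothing), that mismatch would be the thing to look at.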

>> Alternatively as I say it could be that padb isn't finding the correct processes, does the rest of the output look correct for what you are expecting and are you using some kind of wrapper script between mpirun and your executable?  padb should detect this case and act correctly but it is another possible cause.
> 
> This is the first time I'm using padb (or a stack debugger for that
> matter!) :) So, not sure what is the "correct" or "typical" output.
> I've pasted a snippet at the very bottom of this message, just in case
> there are any clues.

Mainly that the stack trace is from your application and not from, say, /bin/sh.  Some people like to run "mpirun sh -c /path/to/my-app", and whilst padb should cope with this, if you use a non-standard shell it might not.

> I found the process number like so:
> /opt/sbin/bin/padb --show-jobs --config-option rmgr=orte
> 25883
> /opt/sbin/bin/padb --full-report=25883 --config-option rmgr=orte  |
> tee padb.log.new.new
> 
> What is suspicious though is that this  number does not show up in the
> ps output. Does that imply padb is mis-discovering the process?

No.  25883 is the orte job number, not a pid; run the command ompi-ps and it'll all become clear.

> Stack trace(s) for thread: 1
> -----------------
> [0-23,25-153,155-165,167-204,206,208-252,254-255] (250 processes)
> -----------------
> main() at ?:?
>  IMB_init_buffers_iter() at ?:?
>    IMB_bcast() at ?:?
>      PMPI_Bcast() at pbcast.c:107
>            params
>              void *         buffer:

[snip]

This is absolutely correct.  The extended output with variables like you have can be overwhelming.  It's included in --full-report for off-line diagnostics, but if you have a reproducer and can experiment, it's often easier to see problems without it, so just specify -xt rather than --full-report.

Ashley,

-- 

Ashley Pittman, Bath, UK.

Padb - A parallel job inspection tool for cluster computing
http://padb.pittman.org.uk




