[padb-users] problem with minfo

Ashley Pittman ashley at pittman.co.uk
Mon Dec 2 23:10:28 GMT 2013


Dave,

Thank you for your query. I’ve just tried this with the current tip of openmpi and I’m seeing something very similar, although the output includes more helpful information.  This looks like an issue with openmpi itself. I know there have been problems with the topology code in the past, but I thought these had all been resolved by now.

[22:46] [0] ashley at cloud2:~/code/padb $ ./src/padb -aq -Ormgr=mpirun
Warning: errors reported by some ranks
========
[0-1]: Error message from /home/ashley/code/padb/./src/minfo: image_has_queues() failed
========
----------------
[0-1]: Error string from DLL
----------------
Failed to find some type
----------------
[0-1]: Message from DLL
----------------
mca_topo_base_module_2_1_0_t

I’ll try some more tests with openmpi 1.6.5 tomorrow, but it looks to me like openmpi is broken rather than anything in padb.

In detail, what I think is happening is that the MPIR_Ignore_queues symbol is not the cause: it isn’t being found, which is as it should be. Rather, the ompi_fill_in_type_info() function is failing; in particular, it is unable to find the mca_topo_base_module_2_1_0_t type, which it treats as a critical error.  I’ve grepped the source for this string and the only matches are in the debugger directory itself, which makes me think that either the code is plain wrong, or the “2_1_0” part of the type name is somehow auto-generated and the debugger code hasn’t been updated to reflect a change elsewhere.
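
For anyone wanting to repeat that source search, here is a minimal sketch in Python. The function name count_matches_by_top_dir and the restriction to .c and .h files are my own assumptions for illustration; neither is part of padb or openmpi. It counts matching lines per top-level directory of a source tree, so it is easy to see whether a name appears only under the debugger code.

```python
import os
from collections import Counter


def count_matches_by_top_dir(root, needle, exts=(".c", ".h")):
    """Count lines containing `needle`, grouped by top-level directory under `root`.

    Hypothetical helper for illustration; only files ending in `exts` are read.
    """
    counts = Counter()
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            if not name.endswith(exts):
                continue
            path = os.path.join(dirpath, name)
            rel = os.path.relpath(path, root)
            # Files directly in `root` are grouped under "."
            top = rel.split(os.sep)[0] if os.sep in rel else "."
            with open(path, errors="replace") as fh:
                hits = sum(1 for line in fh if needle in line)
            if hits:
                counts[top] += hits
    return dict(counts)
```

Calling it on an unpacked source tree with "mca_topo_base_module_2_1_0_t" and then with the shorter prefix "mca_topo_base_module_" should show whether a differently versioned type name exists outside the debugger directory, which is what the auto-generated-suffix theory would predict.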

Ashley.

On 2 Dec 2013, at 12:33, Dave Love <d.love at liverpool.ac.uk> wrote:

> I've been trying to run padb (current repo head) against openmpi 1.6.5
> and I see this:
> 
>  Warning: errors reported by some ranks
>  ========
>  [0-31]: Error message from /usr/libexec/minfo: image_has_queues() failed
>  ========
> 
> whereas it seems to be happy with openmpi 1.4.5 on another system.
> 
> I don't understand the debugging support, and there's a "Are we supposed
> to ignore this ?"  comment in ompi where the error return is done.  Any
> suggestions for debugging the debugging?
> 
> Thanks.
> 
> _______________________________________________
> padb-users mailing list
> padb-users at pittman.org.uk
> http://pittman.org.uk/mailman/listinfo/padb-users_pittman.org.uk




