[padb-users] Error message from /opt/sbin/libexec/minfo: No DLL to load
Rahul Nabar
rpnabar at gmail.com
Wed Aug 18 19:50:47 BST 2010
On Wed, Aug 18, 2010 at 1:15 PM, Ashley Pittman <ashley at pittman.co.uk> wrote:
>
> This error means that padb is unable to find the name of the debugger DLL which is supposed to be provided by the MPI library.
Thanks Ashley! That helps!
>
> The error you are getting means that the MPI library isn't exporting this text string for the filesystem location of the library, which could be either because you aren't really looking at an MPI process or because Open-MPI wasn't built with debugger support.
It could be either. The config.log from the OpenMPI build shows:
$ ./configure --prefix=/opt/ompi_new --with-tm=/opt/torque FC=ifort
CC=icc F77=ifort CXX=icpc CFLAGS=-g -O3 -mp FFLAGS=-mp -recursive -O3
CXXFLAGS=-g CPPFLAGS=-DPgiFortran --disable-shared --enable-static
--with-memory-manager --disable-dlopen --enable-openib-rdmacm
--with-openib=/usr
Some other relevant parts:
configure:4764: checking whether to debug memory usage
configure:4776: result: no
configure:4796: checking whether to profile memory usage
configure:4808: result: no
configure:4828: checking if want developer-level compiler pickyness
configure:4840: result: no
configure:4855: checking if want developer-level debugging code
configure:4867: result: no
configure:5381: checking if want trace file debugging
configure:5393: result: no
To my naive eyes this doesn't mean much but maybe you have a clue? If
not I'll post on the OpenMPI list (or read their make instructions) to
see how the debugger support is built in.
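Rather than picking these checks out by eye, every debug-related check could be pulled from config.log with something like the sketch below. The function name debug_checks is made up for illustration, and matching on the word "debug" is only a guess at what is relevant; configure may describe debugger support in other terms.

```sh
#!/bin/sh
# debug_checks FILE: print each configure "checking ..." line that mentions
# "debug" (case-insensitive), plus the line right after it, which in
# config.log is normally the matching "result:" line.
# The function name and the "debug" keyword are illustrative assumptions.
debug_checks() {
    awk 'tolower($0) ~ /checking.*debug/ { print; want = 1; next }
         want { print; want = 0 }' "$1"
}

# Usage, run from the Open MPI build directory:
#   debug_checks config.log
```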
> With Open-MPI the debugger library is called $OPAL_PREFIX/lib/libompi_dbg_msgq.so IIRC, so you could check if this file exists. If it doesn't, then you need to check with Open-MPI what steps are needed to ensure this is built. I thought it was built automatically, but this is not the case with all MPIs, and it doesn't help matters that in some cases if the build of this DLL fails then the build of MPI could still succeed - I fixed this in around the 1.4 timeframe.
>
I can't find that specific file in my MPI install.
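For reference, a minimal sketch of that check. It searches everything below lib/ instead of one exact path, since the file name above was given from memory ("IIRC") and the exact subdirectory is not confirmed here. has_msgq_dll is a hypothetical helper name, and /opt/ompi_new is simply the --prefix from the configure line.

```sh
#!/bin/sh
# has_msgq_dll PREFIX: succeed if any file named libompi_dbg_msgq* exists
# anywhere under PREFIX/lib. The wildcard also matches versioned names.
# The library name is taken from the quoted reply and is not verified here.
has_msgq_dll() {
    [ -n "$(find "$1/lib" -name 'libompi_dbg_msgq*' 2>/dev/null | head -n 1)" ]
}

# Usage (OPAL_PREFIX overrides the configure prefix if it is set):
#   if has_msgq_dll "${OPAL_PREFIX:-/opt/ompi_new}"; then
#       echo "debugger DLL found"
#   else
#       echo "no libompi_dbg_msgq* under lib/"
#   fi
```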
> Alternatively, as I say, it could be that padb isn't finding the correct processes. Does the rest of the output look correct for what you are expecting, and are you using some kind of wrapper script between mpirun and your executable? padb should detect this case and act correctly, but it is another possible cause.
This is the first time I'm using padb (or a stack debugger for that
matter!) :) So, not sure what is the "correct" or "typical" output.
I've pasted a snippet at the very bottom of this message, just in case
there are any clues.
I found the process number like so:
/opt/sbin/bin/padb --show-jobs --config-option rmgr=orte
25883
/opt/sbin/bin/padb --full-report=25883 --config-option rmgr=orte | tee padb.log.new.new
What is suspicious though is that this number does not show up in the
ps output. Does that imply padb is mis-discovering the process?
ps aux | grep mpi
rpnabar 17800 0.0 0.0 20660 2264 pts/0 S+ 13:46 0:00 mpirun -np 256 --host eu001,eu002,eu003,eu004,eu005,eu006,eu007,eu008,eu009,eu010,eu011,eu012,eu013,eu014,eu015,eu016,eu017,eu018,eu019,eu020,eu021,eu022,eu023,eu024,eu025,eu026,eu027,eu028,eu029,eu030,eu031,eu032 -mca btl openib,sm,self /opt/src/mpitests/imb/src/IMB-MPI1 -npmin 256 bcast
rpnabar 17832 95.3 25.4 4330112 4181284 pts/0 RLl+ 13:46 0:45 /opt/src/mpitests/imb/src/IMB-MPI1 -npmin 256 bcast
rpnabar 17833 95.5 1.0 270100 169748 pts/0 RLl+ 13:46 0:45 /opt/src/mpitests/imb/src/IMB-MPI1 -npmin 256 bcast
rpnabar 17834 95.6 1.0 335608 169784 pts/0 RLl+ 13:46 0:45 /opt/src/mpitests/imb/src/IMB-MPI1 -npmin 256 bcast
rpnabar 17835 95.3 1.0 335608 169788 pts/0 RLl+ 13:46 0:45 /opt/src/mpitests/imb/src/IMB-MPI1 -npmin 256 bcast
rpnabar 17836 95.6 1.0 335288 169812 pts/0 RLl+ 13:46 0:45 /opt/src/mpitests/imb/src/IMB-MPI1 -npmin 256 bcast
rpnabar 17837 95.6 0.5 191204 90996 pts/0 RLl+ 13:46 0:45 /opt/src/mpitests/imb/src/IMB-MPI1 -npmin 256 bcast
rpnabar 17838 95.6 0.5 256840 91000 pts/0 RLl+ 13:46 0:45 /opt/src/mpitests/imb/src/IMB-MPI1 -npmin 256 bcast
rpnabar 17839 95.6 0.5 256740 90984 pts/0 RLl+ 13:46 0:45 /opt/src/mpitests/imb/src/IMB-MPI1 -npmin 256 bcast
rpnabar 17889 0.0 0.0 61140 756 pts/10 S+ 13:47 0:00 grep mpi
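Since "ps aux | grep mpi" only shows lines containing "mpi", a more direct test of whether 25883 is a PID at all is sketched below. is_live_pid is a hypothetical helper. If the number padb prints is a resource-manager job id rather than a PID, it would not be expected to show up in ps; that is an assumption here, not something confirmed.

```sh
#!/bin/sh
# is_live_pid N: succeed if N is the PID of a running process that the
# current user is allowed to signal. "kill -0" delivers no signal; it only
# performs the existence and permission check.
# Caveat: it also fails for processes owned by other users.
is_live_pid() {
    kill -0 "$1" 2>/dev/null
}

# Usage with the numbers from this message:
#   is_live_pid 25883 || echo "25883 is not one of my PIDs"
#   is_live_pid 17800 && echo "17800 (mpirun) is running"
```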
############
Warning: errors reported by some ranks
========
[0-255]: Error message from /opt/sbin/libexec/minfo: No DLL to load
========
Warning: errors reported by some ranks
========
[0-255]: Error message from /opt/sbin/libexec/minfo: No DLL to load
========
Total: 0 communicators, no communication data recorded.
Stack trace(s) for thread: 1
-----------------
[0-23,25-153,155-165,167-204,206,208-252,254-255] (250 processes)
-----------------
main() at ?:?
IMB_init_buffers_iter() at ?:?
IMB_bcast() at ?:?
PMPI_Bcast() at pbcast.c:107
params
void * buffer:
'0x2 (Invalid pointer)'
[4,17,20,93,116,153,168,183,187-188,201-202,206,218,232]
'null pointer'
[0,3,5,7-9,11-12,14,16,18,21,23,25,29,31-32,34-35,37-38,40-44,47-48,51,55-56,59-60,70,72,74-77,79-81,83,86,89-92,95-96,98-102,104-105,108-111,114-115,119,124,126,131,134-137,140,144-145,147-150,160-161,163,169-172,174,179-180,182,184-186,190-191,194-196,198-200,203,208-211,213-216,219,221,224,226,228-229,231,234,236,238-241,243-245,247-248,250-251,254-255]
'valid pointer perm=rwxp'
[1-2,6,10,13,15,19,22,26-28,30,33,36,39,45-46,49-50,52-54,57-58,61-69,71,73,78,82,84-85,87-88,94,97,103,106-107,112-113,117-118,120-123,125,127-130,132-133,138-139,141-143,146,151-152,155-159,162,164-165,167,173,175-178,181,189,192-193,197,204,212,217,220,222-223,225,227,230,233,235,237,242,246,249,252]
int count: more than 3 distinct values
MPI_Datatype datatype: more than 3 distinct values
int root: more than 3 distinct values
MPI_Comm comm:
'0x0'
[2,4,6,10,13,15,17,19-20,22,30,33,46,49-50,53-54,62-64,66-67,69,73,78,82,84-85,87-88,93-94,97,112-113,116,118,120-123,125,128-130,133,138,143,146,151-153,156-159,164-165,167-168,173,175-176,181,183,187-188,192-193,197,201-202,204,206,217-218,220,222-223,225,230,232-233,235,237,242]
'0x1'
[0-1,3,5,7-9,11-12,14,16,18,21,23,25-29,31-32,34-45,47-48,51-52,55-61,65,68,70-72,74-77,79-81,83,86,89-92,95-96,98-111,114-115,117,119,124,126-127,131-132,134-137,139-142,144-145,147-150,155,160-163,169-172,174,177-180,182,184-186,189-191,194-196,198-200,203,208-216,219,221,224,226-229,231,234,236,238-241,243-252,254-255]
locals
int err = '1048576'
[0-23,25-153,155-165,167-204,206,208-252,254-255]
-----------------
[0-23,25-153,155-165,167-204,206,208-252,254-255] (250 processes)
-----------------
mca_coll_sync_bcast() at coll_sync_bcast.c:44
params
void * buff:
'0x2 (Invalid pointer)'
[4,17,20,93,116,153,168,183,187-188,201-202,206,218,232]
'null pointer'
[0,3,5,7-9,11-12,14,16,18,21,23,25,29,31-32,34-35,37-38,40-44,47-48,51,55-56,59-60,70,72,74-77,79-81,83,86,89-92,95-96,98-102,104-105,108-111,114-115,119,124,126,131,134-137,140,144-145,147-150,160-161,163,169-172,174,179-180,182,184-186,190-191,194-196,198-200,203,208-211,213-216,219,221,224,226,228-229,231,234,236,238-241,243-245,247-248,250-251,254-255]
'valid pointer perm=rwxp'
[1-2,6,10,13,15,19,22,26-28,30,33,36,39,45-46,49-50,52-54,57-58,61-69,71,73,78,82,84-85,87-88,94,97,103,106-107,112-113,117-118,120-123,125,127-130,132-133,138-139,141-143,146,151-152,155-159,162,164-165,167,173,175-178,181,189,192-193,197,204,212,217,220,222-223,225,227,230,233,235,237,242,246,249,252]
int count: more than 3 distinct values
struct ompi_datatype_t * datatype: more than 3 distinct values
int root: more than 3 distinct values
struct ompi_communicator_t * comm:
'0x1 (Invalid pointer)'
[0-1,3,5,7-9,11-12,14,16,18,21,23,25-29,31-32,34-45,47-48,51-52,55-61,65,68,70-72,74-77,79-81,83,86,89-92,95-96,98-111,114-115,117,119,124,126-127,131-132,134-137,139-142,144-145,147-150,155,160-163,169-172,174,177-180,182,184-186,189-191,194-196,198-200,203,208-216,219,221,224,226-229,231,234,236,238-241,243-252,254-255]
'null pointer'
[2,4,6,10,13,15,17,19-20,22,30,33,46,49-50,53-54,62-64,66-67,69,73,78,82,84-85,87-88,93-94,97,112-113,116,118,120-123,125,128-130,133,138,143,146,151-153,156-159,164-165,167-168,173,175-176,181,183,187-188,192-193,197,201-202,204,206,217-218,220,222-223,225,230,232-233,235,237,242]
mca_coll_base_module_t * module = 'null pointer'
[0-23,25-153,155-165,167-204,206,208-252,254-255]
-----------------
[0-23,25-153,155-165,167-204,206,208-252,254-255] (250 processes)
-----------------
ompi_coll_tuned_bcast_intra_dec_fixed() at coll_tuned_decision_fixed.c:301
params
void * buff:
'0x2 (Invalid pointer)'
[4,17,20,93,116,153,168,183,187-188,201-202,206,218,232]
'null pointer'
[0,3,5,7-9,11-12,14,16,18,21,23,25,29,31-32,34-35,37-38,40-44,47-48,51,55-56,59-60,70,72,74-77,79-81,83,86,89-92,95-96,98-102,104-105,108-111,114-115,119,124,126,131,134-137,140,144-145,147-150,160-161,163,169-172,174,179-180,182,184-186,190-191,194-196,198-200,203,208-211,213-216,219,221,224,226,228-229,231,234,236,238-241,243-245,247-248,250-251,254-255]
'valid pointer perm=rwxp'
[1-2,6,10,13,15,19,22,26-28,30,33,36,39,45-46,49-50,52-54,57-58,61-69,71,73,78,82,84-85,87-88,94,97,103,106-107,112-113,117-118,120-123,125,127-130,132-133,138-139,141-143,146,151-152,155-159,162,164-165,167,173,175-178,181,189,192-193,197,204,212,217,220,222-223,225,227,230,233,235,237,242,246,249,252]
int count: more than 3 distinct values
struct ompi_datatype_t * datatype: more than 3 distinct values
int root: more than 3 distinct values
struct ompi_communicator_t * comm:
'0x1 (Invalid pointer)'
[0-1,3,5,7-9,11-12,14,16,18,21,23,25-29,31-32,34-45,47-48,51-52,55-61,65,68,70-72,74-77,79-81,83,86,89-92,95-96,98-111,114-115,117,119,124,126-127,131-132,134-137,139-142,144-145,147-150,155,160-163,169-172,174,177-180,182,184-186,189-191,194-196,198-200,203,208-216,219,221,224,226-229,231,234,236,238-241,243-252,254-255]
'null pointer'
[2,4,6,10,13,15,17,19-20,22,30,33,46,49-50,53-54,62-64,66-67,69,73,78,82,84-85,87-88,93-94,97,112-113,116,118,120-123,125,128-130,133,138,143,146,151-153,156-159,164-165,167-168,173,175-176,181,183,187-188,192-193,197,201-202,204,206,217-218,220,222-223,225,230,232-233,235,237,242]
mca_coll_base_module_t * module = 'null pointer'
[0-23,25-153,155-165,167-204,206,208-252,254-255]
locals
size_t message_size: more than 3 distinct values
-----------------
[0-23,25-153,155-165,167-204,206,208-252,254-255] (250 processes)
-----------------
ompi_coll_tuned_bcast_intra_pipeline() at coll_tuned_bcast.c:310
params
void * buffer:
'0x2 (Invalid pointer)'
[4,17,20,93,116,153,168,183,187-188,201-202,206,218,232]
'null pointer'
[0,3,5,7-9,11-12,14,16,18,21,23,25,29,31-32,34-35,37-38,40-44,47-48,51,55-56,59-60,70,72,74-77,79-81,83,86,89-92,95-96,98-102,104-105,108-111,114-115,119,124,126,131,134-137,140,144-145,147-150,160-161,163,169-172,174,179-180,182,184-186,190-191,194-196,198-200,203,208-211,213-216,219,221,224,226,228-229,231,234,236,238-241,243-245,247-248,250-251,254-255]
'valid pointer perm=rwxp'
[1-2,6,10,13,15,19,22,26-28,30,33,36,39,45-46,49-50,52-54,57-58,61-69,71,73,78,82,84-85,87-88,94,97,103,106-107,112-113,117-118,120-123,125,127-130,132-133,138-139,141-143,146,151-152,155-159,162,164-165,167,173,175-178,181,189,192-193,197,204,212,217,220,222-223,225,227,230,233,235,237,242,246,249,252]
int count: more than 3 distinct values
struct ompi_datatype_t * datatype: more than 3 distinct values
int root: more than 3 distinct values
struct ompi_communicator_t * comm:
'0x1 (Invalid pointer)'
[0-1,3,5,7-9,11-12,14,16,18,21,23,25-29,31-32,34-45,47-48,51-52,55-61,65,68,70-72,74-77,79-81,83,86,89-92,95-96,98-111,114-115,117,119,124,126-127,131-132,134-137,139-142,144-145,147-150,155,160-163,169-172,174,177-180,182,184-186,189-191,194-196,198-200,203,208-216,219,221,224,226-229,231,234,236,238-241,243-252,254-255]
'null pointer'
[2,4,6,10,13,15,17,19-20,22,30,33,46,49-50,53-54,62-64,66-67,69,73,78,82,84-85,87-88,93-94,97,112-113,116,118,120-123,125,128-130,133,138,143,146,151-153,156-159,164-165,167-168,173,175-176,181,183,187-188,192-193,197,201-202,204,206,217-218,220,222-223,225,230,232-233,235,237,242]
mca_coll_base_module_t * module = 'null pointer'
[0-23,25-153,155-165,167-204,206,208-252,254-255]
uint32_t segsize: more than 3 distinct values
-----------------
[0-23] (24 processes)
-----------------
ompi_coll_tuned_bcast_intra_generic() at coll_tuned_bcast.c:232
params
void * buffer:
'0x2 (Invalid pointer)' [4,17,20]
'null pointer' [0,3,5,7-9,11-12,14,16,18,21,23]
'valid pointer perm=rwxp' [1-2,6,10,13,15,19,22]
int original_count: more than 3 distinct values
struct ompi_datatype_t * datatype: more than 3 distinct values
int root: more than 3 distinct values
struct ompi_communicator_t * comm:
'0x1 (Invalid pointer)'
[0-1,3,5,7-9,11-12,14,16,18,21,23]
'null pointer' [2,4,6,10,13,15,17,19-20,22]
mca_coll_base_module_t * module = 'null pointer' [0-23]
uint32_t count_by_segment = '8192' [0-23]
ompi_coll_tree_t * tree = 'valid pointer perm=rwxp' [0-23]
locals
ptrdiff_t extent: more than 3 distinct values
int num_segments = '1' [0-23]
size_t realsegsize: more than 3 distinct values
ompi_request_t *[2] recv_reqs = '{, }' [0-23]
int segindex = '1' [0-23]
char * tmpbuf: more than 3 distinct values
-----------------
[0-23] (24 processes)
-----------------
ompi_request_default_wait() at request/req_wait.c:37
params
ompi_request_t ** req_ptr: more than 3 distinct values
ompi_status_public_t * status: more than 3 distinct values
locals
ompi_request_t * req = 'valid pointer perm=rwxp' [0-23]
-----------------
[0-9,11-16,18-19,21-23] (21 processes)
-----------------
opal_progress() at runtime/opal_progress.c:207
-----------------
[0-3,5-7,9,11-13,15-16,18-19,22-23] (17 processes)
-----------------
btl_openib_component_progress() at btl_openib_component.c:3175
locals
mca_btl_openib_device_t * device = 'valid pointer perm=rwxp' [0-3,5-7,9,11-13,15-16,18-19,22-23]
-----------------
[1,13,19] (3 processes)
-----------------
t3b_poll_cq() at src/cq.c:406
params
struct ibv_cq * ibcq = 'valid pointer perm=rwxp' [1,13,19]
int num_entries = '1' [1,13,19]
struct ibv_wc * wc = 'valid pointer perm=rwxp ([stack])' [1,13,19]
locals
struct iwch_cq * chp = 'value optimized out' [1,13,19]
int err = 'value optimized out' [1,13,19]
int npolled = 'value optimized out' [1,13,19]
struct iwch_device * rhp = 'valid pointer perm=rwxp' [1,13,19]
-----------------
[1,13,19] (3 processes)
-----------------
pthread_spin_lock() at ?:?
-----------------
[2] (1 processes)
-----------------
t3b_poll_cq() at src/cq.c:407
params
struct ibv_cq * ibcq = 'valid pointer perm=rwxp' [2]
int num_entries = '1' [2]
struct ibv_wc * wc = 'valid pointer perm=rwxp ([stack])' [2]
locals
struct iwch_cq * chp = 'value optimized out' [2]
int err = 'value optimized out' [2]
int npolled = 'value optimized out' [2]
struct iwch_device * rhp = 'valid pointer perm=rwxp' [2]
-----------------
[15] (1 processes)
-----------------
t3b_poll_cq() at src/cq.c:415
params
struct ibv_cq * ibcq = 'valid pointer perm=rwxp' [15]
int num_entries = '1' [15]
struct ibv_wc * wc = 'valid pointer perm=rwxp ([stack])' [15]
locals
struct iwch_cq * chp = '0x11 (Invalid pointer)' [15]
int err = '650348576' [15]
int npolled = '0' [15]
struct iwch_device * rhp = 'valid pointer perm=rwxp' [15]
-----------------
[15] (1 processes)
-----------------
iwch_poll_cq_one() at src/cq.c:394
params
struct iwch_device * rhp = 'valid pointer perm=rwxp' [15]
struct iwch_cq * chp = 'value optimized out' [15]
struct ibv_wc * wc = 'valid pointer perm=rwxp ([stack])' [15]
locals
uint64_t cookie = '46212224' [15]
uint8_t cqe_flushed = '0 '\0'' [15]
struct t3_cqe * hw_cqe = 'null pointer' [15]
struct iwch_qp * qhp = 'valid pointer perm=rwxp' [15]
int ret = '0' [15]
struct t3_wq * wq = 'null pointer' [15]
int ret = '0' [15]
struct t3_wq * wq = 'null pointer' [15]
-----------------
#########################
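The first stack frame in the log covers 250 of the 256 ranks. To see which ranks are absent from a padb rank list, something like the sketch below could be used. missing_ranks is a hypothetical helper, not part of padb, and it assumes ranks are numbered from 0 up to one less than the total.

```sh
#!/bin/sh
# missing_ranks LIST TOTAL: print, comma-separated, the ranks in 0..TOTAL-1
# that do not appear in LIST, where LIST is a padb-style rank list such as
# "[0-23,25-153]". Hypothetical helper; assumes ranks start at 0.
missing_ranks() {
    printf '%s\n' "$1" | tr -d '[] ' | tr ',' '\n' |
    awk -F- -v total="$2" '
        NF {
            lo = $1 + 0                       # a single rank has no "-hi" part
            hi = (NF > 1 ? $2 : $1) + 0
            for (r = lo; r <= hi; r++) seen[r] = 1
        }
        END {
            for (r = 0; r < total; r++)
                if (!(r in seen)) printf "%s%d", (n++ ? "," : ""), r
            print ""
        }'
}

# Usage with the rank list from the first stack frame:
#   missing_ranks "[0-23,25-153,155-165,167-204,206,208-252,254-255]" 256
```

For that list the result is 24,154,166,205,207,253, i.e. the six ranks that do not appear in the main() frame.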
--
Rahul