[padb-users] Error message from /opt/sbin/libexec/minfo: No DLL to load

Rahul Nabar rpnabar at gmail.com
Wed Aug 18 19:50:47 BST 2010


On Wed, Aug 18, 2010 at 1:15 PM, Ashley Pittman <ashley at pittman.co.uk> wrote:
>
> This error means that padb is unable to find the name of the debugger DLL, which is supposed to be provided by the MPI library.

Thanks Ashley! That helps!

>
> The error you are getting means that the MPI library isn't exporting the text string that gives the filesystem location of the library. That could be either because you aren't really looking at an MPI process or because Open-MPI wasn't built with debugger support.

It could be either. The config.log from the OpenMPI build shows:

  $ ./configure --prefix=/opt/ompi_new --with-tm=/opt/torque FC=ifort
CC=icc F77=ifort CXX=icpc CFLAGS=-g -O3 -mp FFLAGS=-mp -recursive -O3
CXXFLAGS=-g CPPFLAGS=-DPgiFortran --disable-shared --enable-static
--with-memory-manager --disable-dlopen --enable-openib-rdmacm
--with-openib=/usr

Some other relevant parts:

configure:4764: checking whether to debug memory usage
configure:4776: result: no
configure:4796: checking whether to profile memory usage
configure:4808: result: no
configure:4828: checking if want developer-level compiler pickyness
configure:4840: result: no
configure:4855: checking if want developer-level debugging code
configure:4867: result: no

configure:5381: checking if want trace file debugging
configure:5393: result: no

To my naive eyes this doesn't mean much but maybe you have a clue? If
not I'll post on the OpenMPI list (or read their make instructions) to
see how the debugger support is built in.

> With Open-MPI the debugger library is called $OPAL_PREFIX/lib/libompi_dbg_msgq.so IIRC, so you could check whether this file exists. If it doesn't, you need to check with Open-MPI what steps are needed to ensure it is built.  I thought it was built automatically, but this is not the case with all MPIs, and it doesn't help matters that in some cases the build of MPI could still succeed even if the build of this DLL fails - I fixed this in around the 1.4 timeframe.
>

I can't find that specific file in my MPI install.
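
A minimal Python sketch of that check, in case the library landed
somewhere other than lib/ directly. The file name comes from the quoted
message (given there as "IIRC"), the default prefix is the --prefix from
the configure line above, and find_debugger_dll is a hypothetical
helper name, not part of padb or Open MPI.

```python
from pathlib import Path

# Name taken from the quoted message; treat it as an assumption.
DLL_STEM = "libompi_dbg_msgq"


def find_debugger_dll(prefix):
    """Return all files under `prefix` whose name starts with DLL_STEM."""
    root = Path(prefix)
    if not root.is_dir():
        return []
    # rglob walks the whole tree, so lib/, lib64/ and lib/openmpi/ are covered.
    return sorted(p for p in root.rglob(DLL_STEM + "*") if p.is_file())


if __name__ == "__main__":
    # /opt/ompi_new is the install prefix from the configure line.
    hits = find_debugger_dll("/opt/ompi_new")
    if hits:
        for hit in hits:
            print(hit)
    else:
        print("no", DLL_STEM, "file found under the prefix")
```

An empty result only says the file is absent from that prefix. It does
not say why the build skipped it.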

> Alternatively, as I say, it could be that padb isn't finding the correct processes. Does the rest of the output look correct for what you are expecting, and are you using some kind of wrapper script between mpirun and your executable?  padb should detect this case and act correctly but it is another possible cause.

This is the first time I'm using padb (or a stack debugger for that
matter!) :) So I'm not sure what "correct" or "typical" output looks
like. I've pasted a snippet at the very bottom of this message, just in
case there are any clues.

I found the process number like so:
/opt/sbin/bin/padb --show-jobs --config-option rmgr=orte
25883
/opt/sbin/bin/padb --full-report=25883 --config-option rmgr=orte | tee padb.log.new.new

What is suspicious, though, is that this number does not show up in the
ps output. Does that imply padb is mis-discovering the process?

ps aux | grep mpi
rpnabar  17800  0.0  0.0  20660  2264 pts/0    S+   13:46   0:00 mpirun -np 256 --host eu001,eu002,eu003,eu004,eu005,eu006,eu007,eu008,eu009,eu010,eu011,eu012,eu013,eu014,eu015,eu016,eu017,eu018,eu019,eu020,eu021,eu022,eu023,eu024,eu025,eu026,eu027,eu028,eu029,eu030,eu031,eu032 -mca btl openib,sm,self /opt/src/mpitests/imb/src/IMB-MPI1 -npmin 256 bcast
rpnabar  17832 95.3 25.4 4330112 4181284 pts/0 RLl+ 13:46   0:45 /opt/src/mpitests/imb/src/IMB-MPI1 -npmin 256 bcast
rpnabar  17833 95.5  1.0 270100 169748 pts/0   RLl+ 13:46   0:45 /opt/src/mpitests/imb/src/IMB-MPI1 -npmin 256 bcast
rpnabar  17834 95.6  1.0 335608 169784 pts/0   RLl+ 13:46   0:45 /opt/src/mpitests/imb/src/IMB-MPI1 -npmin 256 bcast
rpnabar  17835 95.3  1.0 335608 169788 pts/0   RLl+ 13:46   0:45 /opt/src/mpitests/imb/src/IMB-MPI1 -npmin 256 bcast
rpnabar  17836 95.6  1.0 335288 169812 pts/0   RLl+ 13:46   0:45 /opt/src/mpitests/imb/src/IMB-MPI1 -npmin 256 bcast
rpnabar  17837 95.6  0.5 191204 90996 pts/0    RLl+ 13:46   0:45 /opt/src/mpitests/imb/src/IMB-MPI1 -npmin 256 bcast
rpnabar  17838 95.6  0.5 256840 91000 pts/0    RLl+ 13:46   0:45 /opt/src/mpitests/imb/src/IMB-MPI1 -npmin 256 bcast
rpnabar  17839 95.6  0.5 256740 90984 pts/0    RLl+ 13:46   0:45 /opt/src/mpitests/imb/src/IMB-MPI1 -npmin 256 bcast
rpnabar  17889  0.0  0.0  61140   756 pts/10   S+   13:47   0:00 grep mpi
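
One way to test whether that number is a PID at all is to compare it
with the PID column of the ps listing. A minimal Python sketch follows;
pids_in_ps is a hypothetical helper, and whether a padb job number under
rmgr=orte is meant to match any PID is an open question that this does
not answer.

```python
def pids_in_ps(ps_text):
    """Collect the PID column (second field) from `ps aux` style lines."""
    pids = set()
    for line in ps_text.splitlines():
        fields = line.split()
        # Wrapped continuation lines (command arguments) normally lack a
        # numeric second field, so they are skipped here.
        if len(fields) >= 2 and fields[1].isdigit():
            pids.add(int(fields[1]))
    return pids


# A few lines copied from the listing above.
SAMPLE = """\
rpnabar  17800  0.0  0.0  20660  2264 pts/0    S+   13:46   0:00
mpirun -np 256 --host
rpnabar  17832 95.3 25.4 4330112 4181284 pts/0 RLl+ 13:46   0:45
/opt/src/mpitests/imb/src/IMB-MPI1 -npmin 256 bcast
rpnabar  17889  0.0  0.0  61140   756 pts/10   S+   13:47   0:00 grep mpi
"""

if __name__ == "__main__":
    job = 25883  # number reported by padb --show-jobs
    print(job, "is a listed PID:", job in pids_in_ps(SAMPLE))
```

For the sample lines this reports that 25883 is not among the PIDs,
which matches what the full listing shows.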


############
Warning: errors reported by some ranks
========
[0-255]: Error message from /opt/sbin/libexec/minfo: No DLL to load
========
Warning: errors reported by some ranks
========
[0-255]: Error message from /opt/sbin/libexec/minfo: No DLL to load
========
Total: 0 communicators, no communication data recorded.
Stack trace(s) for thread: 1
-----------------
[0-23,25-153,155-165,167-204,206,208-252,254-255] (250 processes)
-----------------
main() at ?:?
  IMB_init_buffers_iter() at ?:?
    IMB_bcast() at ?:?
      PMPI_Bcast() at pbcast.c:107
            params
              void *         buffer:
                  '0x2 (Invalid pointer)' [4,17,20,93,116,153,168,183,187-188,201-202,206,218,232]
                  'null pointer' [0,3,5,7-9,11-12,14,16,18,21,23,25,29,31-32,34-35,37-38,40-44,47-48,51,55-56,59-60,70,72,74-77,79-81,83,86,89-92,95-96,98-102,104-105,108-111,114-115,119,124,126,131,134-137,140,144-145,147-150,160-161,163,169-172,174,179-180,182,184-186,190-191,194-196,198-200,203,208-211,213-216,219,221,224,226,228-229,231,234,236,238-241,243-245,247-248,250-251,254-255]
                  'valid pointer perm=rwxp' [1-2,6,10,13,15,19,22,26-28,30,33,36,39,45-46,49-50,52-54,57-58,61-69,71,73,78,82,84-85,87-88,94,97,103,106-107,112-113,117-118,120-123,125,127-130,132-133,138-139,141-143,146,151-152,155-159,162,164-165,167,173,175-178,181,189,192-193,197,204,212,217,220,222-223,225,227,230,233,235,237,242,246,249,252]
              int             count: more than 3 distinct values
              MPI_Datatype datatype: more than 3 distinct values
              int              root: more than 3 distinct values
              MPI_Comm         comm:
                  '0x0' [2,4,6,10,13,15,17,19-20,22,30,33,46,49-50,53-54,62-64,66-67,69,73,78,82,84-85,87-88,93-94,97,112-113,116,118,120-123,125,128-130,133,138,143,146,151-153,156-159,164-165,167-168,173,175-176,181,183,187-188,192-193,197,201-202,204,206,217-218,220,222-223,225,230,232-233,235,237,242]
                  '0x1' [0-1,3,5,7-9,11-12,14,16,18,21,23,25-29,31-32,34-45,47-48,51-52,55-61,65,68,70-72,74-77,79-81,83,86,89-92,95-96,98-111,114-115,117,119,124,126-127,131-132,134-137,139-142,144-145,147-150,155,160-163,169-172,174,177-180,182,184-186,189-191,194-196,198-200,203,208-216,219,221,224,226-229,231,234,236,238-241,243-252,254-255]
            locals
              int err = '1048576' [0-23,25-153,155-165,167-204,206,208-252,254-255]
        -----------------
        [0-23,25-153,155-165,167-204,206,208-252,254-255] (250 processes)
        -----------------
        mca_coll_sync_bcast() at coll_sync_bcast.c:44
              params
                void *                       buff:
                    '0x2 (Invalid pointer)' [4,17,20,93,116,153,168,183,187-188,201-202,206,218,232]
                    'null pointer' [0,3,5,7-9,11-12,14,16,18,21,23,25,29,31-32,34-35,37-38,40-44,47-48,51,55-56,59-60,70,72,74-77,79-81,83,86,89-92,95-96,98-102,104-105,108-111,114-115,119,124,126,131,134-137,140,144-145,147-150,160-161,163,169-172,174,179-180,182,184-186,190-191,194-196,198-200,203,208-211,213-216,219,221,224,226,228-229,231,234,236,238-241,243-245,247-248,250-251,254-255]
                    'valid pointer perm=rwxp' [1-2,6,10,13,15,19,22,26-28,30,33,36,39,45-46,49-50,52-54,57-58,61-69,71,73,78,82,84-85,87-88,94,97,103,106-107,112-113,117-118,120-123,125,127-130,132-133,138-139,141-143,146,151-152,155-159,162,164-165,167,173,175-178,181,189,192-193,197,204,212,217,220,222-223,225,227,230,233,235,237,242,246,249,252]
                int                         count: more than 3 distinct values
                struct ompi_datatype_t * datatype: more than 3 distinct values
                int                          root: more than 3 distinct values
                struct ompi_communicator_t * comm:
                    '0x1 (Invalid pointer)' [0-1,3,5,7-9,11-12,14,16,18,21,23,25-29,31-32,34-45,47-48,51-52,55-61,65,68,70-72,74-77,79-81,83,86,89-92,95-96,98-111,114-115,117,119,124,126-127,131-132,134-137,139-142,144-145,147-150,155,160-163,169-172,174,177-180,182,184-186,189-191,194-196,198-200,203,208-216,219,221,224,226-229,231,234,236,238-241,243-252,254-255]
                    'null pointer' [2,4,6,10,13,15,17,19-20,22,30,33,46,49-50,53-54,62-64,66-67,69,73,78,82,84-85,87-88,93-94,97,112-113,116,118,120-123,125,128-130,133,138,143,146,151-153,156-159,164-165,167-168,173,175-176,181,183,187-188,192-193,197,201-202,204,206,217-218,220,222-223,225,230,232-233,235,237,242]
                mca_coll_base_module_t *   module = 'null pointer' [0-23,25-153,155-165,167-204,206,208-252,254-255]
          -----------------
          [0-23,25-153,155-165,167-204,206,208-252,254-255] (250 processes)
          -----------------
          ompi_coll_tuned_bcast_intra_dec_fixed() at coll_tuned_decision_fixed.c:301
                params
                  void *                       buff:
                      '0x2 (Invalid pointer)' [4,17,20,93,116,153,168,183,187-188,201-202,206,218,232]
                      'null pointer' [0,3,5,7-9,11-12,14,16,18,21,23,25,29,31-32,34-35,37-38,40-44,47-48,51,55-56,59-60,70,72,74-77,79-81,83,86,89-92,95-96,98-102,104-105,108-111,114-115,119,124,126,131,134-137,140,144-145,147-150,160-161,163,169-172,174,179-180,182,184-186,190-191,194-196,198-200,203,208-211,213-216,219,221,224,226,228-229,231,234,236,238-241,243-245,247-248,250-251,254-255]
                      'valid pointer perm=rwxp' [1-2,6,10,13,15,19,22,26-28,30,33,36,39,45-46,49-50,52-54,57-58,61-69,71,73,78,82,84-85,87-88,94,97,103,106-107,112-113,117-118,120-123,125,127-130,132-133,138-139,141-143,146,151-152,155-159,162,164-165,167,173,175-178,181,189,192-193,197,204,212,217,220,222-223,225,227,230,233,235,237,242,246,249,252]
                  int                         count: more than 3 distinct values
                  struct ompi_datatype_t * datatype: more than 3 distinct values
                  int                          root: more than 3 distinct values
                  struct ompi_communicator_t * comm:
                      '0x1 (Invalid pointer)' [0-1,3,5,7-9,11-12,14,16,18,21,23,25-29,31-32,34-45,47-48,51-52,55-61,65,68,70-72,74-77,79-81,83,86,89-92,95-96,98-111,114-115,117,119,124,126-127,131-132,134-137,139-142,144-145,147-150,155,160-163,169-172,174,177-180,182,184-186,189-191,194-196,198-200,203,208-216,219,221,224,226-229,231,234,236,238-241,243-252,254-255]
                      'null pointer' [2,4,6,10,13,15,17,19-20,22,30,33,46,49-50,53-54,62-64,66-67,69,73,78,82,84-85,87-88,93-94,97,112-113,116,118,120-123,125,128-130,133,138,143,146,151-153,156-159,164-165,167-168,173,175-176,181,183,187-188,192-193,197,201-202,204,206,217-218,220,222-223,225,230,232-233,235,237,242]
                  mca_coll_base_module_t *   module = 'null pointer' [0-23,25-153,155-165,167-204,206,208-252,254-255]
                locals
                  size_t message_size: more than 3 distinct values
            -----------------
            [0-23,25-153,155-165,167-204,206,208-252,254-255] (250 processes)
            -----------------
            ompi_coll_tuned_bcast_intra_pipeline() at coll_tuned_bcast.c:310
                  params
                    void *                     buffer:
                        '0x2 (Invalid pointer)' [4,17,20,93,116,153,168,183,187-188,201-202,206,218,232]
                        'null pointer' [0,3,5,7-9,11-12,14,16,18,21,23,25,29,31-32,34-35,37-38,40-44,47-48,51,55-56,59-60,70,72,74-77,79-81,83,86,89-92,95-96,98-102,104-105,108-111,114-115,119,124,126,131,134-137,140,144-145,147-150,160-161,163,169-172,174,179-180,182,184-186,190-191,194-196,198-200,203,208-211,213-216,219,221,224,226,228-229,231,234,236,238-241,243-245,247-248,250-251,254-255]
                        'valid pointer perm=rwxp' [1-2,6,10,13,15,19,22,26-28,30,33,36,39,45-46,49-50,52-54,57-58,61-69,71,73,78,82,84-85,87-88,94,97,103,106-107,112-113,117-118,120-123,125,127-130,132-133,138-139,141-143,146,151-152,155-159,162,164-165,167,173,175-178,181,189,192-193,197,204,212,217,220,222-223,225,227,230,233,235,237,242,246,249,252]
                    int                         count: more than 3 distinct values
                    struct ompi_datatype_t * datatype: more than 3 distinct values
                    int                          root: more than 3 distinct values
                    struct ompi_communicator_t * comm:
                        '0x1 (Invalid pointer)' [0-1,3,5,7-9,11-12,14,16,18,21,23,25-29,31-32,34-45,47-48,51-52,55-61,65,68,70-72,74-77,79-81,83,86,89-92,95-96,98-111,114-115,117,119,124,126-127,131-132,134-137,139-142,144-145,147-150,155,160-163,169-172,174,177-180,182,184-186,189-191,194-196,198-200,203,208-216,219,221,224,226-229,231,234,236,238-241,243-252,254-255]
                        'null pointer' [2,4,6,10,13,15,17,19-20,22,30,33,46,49-50,53-54,62-64,66-67,69,73,78,82,84-85,87-88,93-94,97,112-113,116,118,120-123,125,128-130,133,138,143,146,151-153,156-159,164-165,167-168,173,175-176,181,183,187-188,192-193,197,201-202,204,206,217-218,220,222-223,225,230,232-233,235,237,242]
                    mca_coll_base_module_t *   module = 'null pointer' [0-23,25-153,155-165,167-204,206,208-252,254-255]
                    uint32_t                  segsize: more than 3 distinct values
              -----------------
              [0-23] (24 processes)
              -----------------
              ompi_coll_tuned_bcast_intra_generic() at coll_tuned_bcast.c:232
                    params
                      void *                     buffer:
                          '0x2 (Invalid pointer)' [4,17,20]
                          'null pointer' [0,3,5,7-9,11-12,14,16,18,21,23]
                          'valid pointer perm=rwxp' [1-2,6,10,13,15,19,22]
                      int                original_count: more than 3 distinct values
                      struct ompi_datatype_t * datatype: more than 3 distinct values
                      int                          root: more than 3 distinct values
                      struct ompi_communicator_t * comm:
                          '0x1 (Invalid pointer)' [0-1,3,5,7-9,11-12,14,16,18,21,23]
                          'null pointer' [2,4,6,10,13,15,17,19-20,22]
                      mca_coll_base_module_t *   module = 'null pointer' [0-23]
                      uint32_t         count_by_segment = '8192' [0-23]
                      ompi_coll_tree_t *           tree = 'valid pointer perm=rwxp' [0-23]
                    locals
                      ptrdiff_t              extent: more than 3 distinct values
                      int              num_segments = '1' [0-23]
                      size_t            realsegsize: more than 3 distinct values
                      ompi_request_t *[2] recv_reqs = '{, }' [0-23]
                      int                  segindex = '1' [0-23]
                      char *                 tmpbuf: more than 3 distinct values
                -----------------
                [0-23] (24 processes)
                -----------------
                ompi_request_default_wait() at request/req_wait.c:37
                      params
                        ompi_request_t **     req_ptr: more than 3 distinct values
                        ompi_status_public_t * status: more than 3 distinct values
                      locals
                        ompi_request_t * req = 'valid pointer perm=rwxp' [0-23]
                  -----------------
                  [0-9,11-16,18-19,21-23] (21 processes)
                  -----------------
                  opal_progress() at runtime/opal_progress.c:207
                    -----------------
                    [0-3,5-7,9,11-13,15-16,18-19,22-23] (17 processes)
                    -----------------
                    btl_openib_component_progress() at btl_openib_component.c:3175
                          locals
                            mca_btl_openib_device_t * device = 'valid pointer perm=rwxp' [0-3,5-7,9,11-13,15-16,18-19,22-23]
                      -----------------
                      [1,13,19] (3 processes)
                      -----------------
                      t3b_poll_cq() at src/cq.c:406
                            params
                              struct ibv_cq * ibcq = 'valid pointer perm=rwxp' [1,13,19]
                              int      num_entries = '1' [1,13,19]
                              struct ibv_wc *   wc = 'valid pointer perm=rwxp ([stack])' [1,13,19]
                            locals
                              struct iwch_cq *     chp = 'value optimized out' [1,13,19]
                              int                  err = 'value optimized out' [1,13,19]
                              int              npolled = 'value optimized out' [1,13,19]
                              struct iwch_device * rhp = 'valid pointer perm=rwxp' [1,13,19]
                        -----------------
                        [1,13,19] (3 processes)
                        -----------------
                        pthread_spin_lock() at ?:?
                      -----------------
                      [2] (1 processes)
                      -----------------
                      t3b_poll_cq() at src/cq.c:407
                            params
                              struct ibv_cq * ibcq = 'valid pointer perm=rwxp' [2]
                              int      num_entries = '1' [2]
                              struct ibv_wc *   wc = 'valid pointer perm=rwxp ([stack])' [2]
                            locals
                              struct iwch_cq *     chp = 'value optimized out' [2]
                              int                  err = 'value optimized out' [2]
                              int              npolled = 'value optimized out' [2]
                              struct iwch_device * rhp = 'valid pointer perm=rwxp' [2]
                      -----------------
                      [15] (1 processes)
                      -----------------
                      t3b_poll_cq() at src/cq.c:415
                            params
                              struct ibv_cq * ibcq = 'valid pointer perm=rwxp' [15]
                              int      num_entries = '1' [15]
                              struct ibv_wc *   wc = 'valid pointer perm=rwxp ([stack])' [15]
                            locals
                              struct iwch_cq *     chp = '0x11 (Invalid pointer)' [15]
                              int                  err = '650348576' [15]
                              int              npolled = '0' [15]
                              struct iwch_device * rhp = 'valid pointer perm=rwxp' [15]
                        -----------------
                        [15] (1 processes)
                        -----------------
                        iwch_poll_cq_one() at src/cq.c:394
                              params
                                struct iwch_device * rhp = 'valid pointer perm=rwxp' [15]
                                struct iwch_cq *     chp = 'value optimized out' [15]
                                struct ibv_wc *       wc = 'valid pointer perm=rwxp ([stack])' [15]
                              locals
                                uint64_t        cookie = '46212224' [15]
                                uint8_t    cqe_flushed = '0 '\0'' [15]
                                struct t3_cqe * hw_cqe = 'null pointer' [15]
                                struct iwch_qp *   qhp = 'valid pointer perm=rwxp' [15]
                                int                ret = '0' [15]
                                struct t3_wq *      wq = 'null pointer' [15]
                                int                ret = '0' [15]
                                struct t3_wq *      wq = 'null pointer' [15]
                    -----------------
#########################
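
The bracketed rank lists in the output above are compressed ranges, so
it is hard to see by eye which ranks are absent from a group. A minimal
Python sketch for expanding one list and reporting the gaps follows. The
helper names expand_ranks and missing_ranks are hypothetical, and the
total of 256 is taken from the "mpirun -np 256" command line.

```python
def expand_ranks(spec):
    """Expand a rank list such as "[0-2,5]" into a sorted list of ints."""
    ranks = set()
    for part in spec.strip().strip("[]").split(","):
        part = part.strip()
        if not part:
            continue
        # "a-b" is an inclusive range; a bare "a" is a single rank.
        lo, _, hi = part.partition("-")
        ranks.update(range(int(lo), int(hi or lo) + 1))
    return sorted(ranks)


def missing_ranks(spec, nprocs):
    """Return the ranks in 0..nprocs-1 that do not appear in `spec`."""
    present = set(expand_ranks(spec))
    return [r for r in range(nprocs) if r not in present]


# Rank list of the outermost stack-trace group in the padb output.
MAIN_GROUP = "[0-23,25-153,155-165,167-204,206,208-252,254-255]"

if __name__ == "__main__":
    print(len(expand_ranks(MAIN_GROUP)))   # 250, matching "(250 processes)"
    print(missing_ranks(MAIN_GROUP, 256))  # [24, 154, 166, 205, 207, 253]
```

The six ranks it reports are simply not part of this snippet of the
log. Where they were at the time is not something this shows.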

-- 
Rahul



