[padb] [padb-devel] Simple Makefile patch

Ethan Mallove ethan.mallove at sun.com
Thu Nov 5 18:20:33 GMT 2009


On Mon, Nov/02/2009 10:06:52PM, Ashley Pittman wrote:
> On Mon, 2009-11-02 at 16:30 -0500, Ethan Mallove wrote:
> 
> > > Could you send me the log file for a run with that option set please.
> > 
> > The log files are the same:
> > 
> >   $ cat /tmp/padb-minfo-debug-log-0-IkyIH9
> >   req: sym MPIR_dll_name
> >   ok 0xff1f9824
> >   zzz: str:35 dmsg
> >   Could not find MPIR_dll_name symbol
> >   zzz: str:3 exit
> >   die
> 
> That would imply that it's found it but then not recognised it.
> 
> Could you add some printfs to the find_sym() and ask() functions to see
> what's going on?  It's the very first contact, so it's likely that one of
> the core functions is failing; perhaps the read() on fd 0 is the problem.
> The attached patch should tell us more.
> 
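For reference, the exchange in the log file above is a request line ("req: sym MPIR_dll_name") answered by a reply line ("ok 0xff1f9824"). The following is only a sketch of how such a reply could be validated before use; parse_reply() is a hypothetical helper, not a function from minfo.c, and the "ok <payload>" plus newline layout is assumed from that log. It also rejects a zero-byte read, which the nbytes < 0 check in the patch below does not catch.

```c
#include <stddef.h>
#include <string.h>
#include <sys/types.h>

/*
 * Hypothetical helper, not part of minfo.c: validate a reply of the form
 * "ok <payload>\n" held in buf (nbytes is the value read() returned) and
 * copy the payload into ans as a NUL-terminated string.
 * Returns 0 on success, -1 on any malformed or truncated reply.
 */
int parse_reply(const char *buf, ssize_t nbytes, char *ans, size_t ans_size)
{
    size_t len;

    if (nbytes <= 0)            /* read() error (-1) or end of file (0) */
        return -1;
    len = (size_t)nbytes;
    if (len < 4)                /* need "ok " plus at least one more byte */
        return -1;
    if (memcmp(buf, "ok ", 3) != 0)
        return -1;              /* reply does not start with "ok " */

    len -= 3;                   /* payload length, possibly including '\n' */
    if (buf[3 + len - 1] == '\n')
        len--;                  /* drop the trailing newline */
    if (len + 1 > ans_size)
        return -1;              /* payload would not fit in ans */

    memcpy(ans, buf + 3, len);
    ans[len] = '\0';
    return 0;
}
```

With the reply from the log, parse_reply("ok 0xff1f9824\n", 14, ans, sizeof ans) leaves "0xff1f9824" in ans.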

I don't think any of the printfs fired:

$ padb --debug=all --config-option rmgr=mpirun --full-report=15961
DEBUG (config):   0: Finished setting configuration options
padb version 3.n (Revision 312)
full job report for job 15961

DEBUG (pcmd):   1: Loaded pcmd data
DEBUG (verbose):   1: There are 1 processes over 1 hosts
DEBUG (verbose):   1: Remote process data available on frontend
DEBUG (show_cmd):   1:  /home/em162155/software/SunOS/sparc/padb/bin/padb --inner
DEBUG (signon):   2: Received last signon, connecting to inner
DEBUG (ctree):   2: connection tree
DEBUG (full_duplex):   2: Sending command to inner, 364 bytes
DEBUG (full_duplex):   2: Reply from inner, 316 bytes
DEBUG (full_duplex):   2: Sending command to inner, 64 bytes
DEBUG (full_duplex):   3: Reply from inner, 384 bytes
DEBUG (full_duplex):   3: Sending command to inner, 36 bytes
DEBUG (tdata):   3: Target data
Namespace: "ERROR"
    Error message from /home/em162155/software/SunOS/sparc/padb/bin/minfo.x: Could not find MPIR_dll_name symbol        [0]
Namespace: "FOUND"
    yes [0]

Warning: errors reported by some ranks
========
[0]: Error message from /home/em162155/software/SunOS/sparc/padb/bin/minfo.x: Could not find MPIR_dll_name symbol
========
DEBUG (full_duplex):   5: Reply from inner, 432 bytes
DEBUG (full_duplex):   5: Sending command to inner, 472 bytes
DEBUG (tdata):   5: Target data
Namespace: "ERROR"
    Error message from /home/em162155/software/SunOS/sparc/padb/bin/minfo.x: Could not find MPIR_dll_name symbol        [0]
Namespace: "FOUND"
    yes [0]

Warning: errors reported by some ranks
========
[0]: Error message from /home/em162155/software/SunOS/sparc/padb/bin/minfo.x: Could not find MPIR_dll_name symbol
========
Total: 0 communicators, no communication data recorded.
DEBUG (full_duplex):   6: Reply from inner, 1380 bytes
DEBUG (full_duplex):   6: Sending command to inner, 28 bytes
DEBUG (tdata):   6: Target data
Namespace: ",main() at hello_c.c:18|var|argc"
    1   [0]
Namespace: ",main() at hello_c.c:18|var|argv"
    0xffbfe304  [0]
Namespace: ",main() at hello_c.c:18|var|rank"
    0   [0]
Namespace: ",main() at hello_c.c:18|var|size"
    -4201424    [0]
Namespace: "FOUND"
    yes [0]
Namespace: "main() at hello_c.c:18|locals"
    rank,size   [0]
Namespace: "main() at hello_c.c:18|params"
    argc,argv   [0]
Namespace: "main() at hello_c.c:18|var_type|argc"
    int [0]
Namespace: "main() at hello_c.c:18|var_type|argv"
    char **     [0]
Namespace: "main() at hello_c.c:18|var_type|rank"
    int [0]
Namespace: "main() at hello_c.c:18|var_type|size"
    int [0]

DEBUG (tree):   6: Making the tree
DEBUG (tree):   6: Enhancing the tree
DEBUG (tree):   6: Formatting the tree
DEBUG (tree):   6: Displaying the tree
-----------------
[0] (1 processes)
-----------------
main() at hello_c.c:18
      params
        int     argc = '1' [0]
        char ** argv = '0xffbfe304' [0]
      locals
        int rank = '0' [0]
        int size = '-4201424' [0]
  -----------------
  [0] (1 processes)
  -----------------
  sleep() at ?:?
    ___nanosleep() at ?:?
DEBUG (tree):   6: Done
DEBUG (full_duplex):   6: Reply from inner, 84 bytes
DEBUG (verbose):   6: Completed command
em162155 $

> I assume it compiles without warnings?

Right, it compiles without warnings.

-Ethan


> 
> Ashley,
> 
> -- 
> 
> Ashley Pittman, Bath, UK.
> 
> Padb - A parallel job inspection tool for cluster computing
> http://padb.pittman.org.uk

> Index: minfo.c
> ===================================================================
> --- minfo.c	(revision 310)
> +++ minfo.c	(working copy)
> @@ -117,10 +117,14 @@
>      printf("req: %s\n",req);
>      fflush(NULL);    
>      nbytes = read(0,buff,QUERY_SIZE+3);
> -    if ( nbytes < 0 )
> +    if ( nbytes < 0 ) {
> +	printf("Read returned %d\n",nbytes);
>  	return -1;
> -    if ( memcmp(buff,"ok ",3 ) )
> +    }
> +    if ( memcmp(buff,"ok ",3 ) ) {
> +	show_warning("ask request got nack\n");
>  	return -1;
> +    }
>      buff[nbytes-1] = '\000';
>      memcpy(ans,&buff[3],nbytes -3);
>      return 0;
> @@ -134,8 +138,10 @@
>      sprintf(req,"%s %s",type,name);
>      
>      i = ask(req,ans);
> -    if ( i != 0 )
> +    if ( i != 0 ) {
> +	show_warning("ask failed\n");
>  	return NULL;
> +    }
>      
>      i = sscanf(ans, "%p",&addr);
>      if ( i != 1 ) {
> @@ -533,8 +539,8 @@
>      DLSYM(dll_ep,dlhandle,next_communicator);
>      DLSYM(dll_ep,dlhandle,setup_operation_iterator);
>      DLSYM(dll_ep,dlhandle,next_operation);
> -    DLSYM(dll_ep,dlhandle,get_comm_group);
>  
> +    DLSYM_LAX(dll_ep,dlhandle,get_comm_group);
>      DLSYM_LAX(dll_ep,dlhandle,get_global_rank);
>      DLSYM_LAX(dll_ep,dlhandle,get_comm_coll_state);
>      return 0;
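The log shows an "ok 0xff1f9824" reply followed by the "Could not find MPIR_dll_name symbol" error, and the patch context shows the reply being parsed with sscanf(ans, "%p", &addr). The C standard leaves the input text matched by %p implementation-defined, so one possible explanation, not confirmed anywhere in this thread, is that the platform's sscanf() does not accept this spelling. The sketch below parses the address with strtoull() instead; parse_addr() is a hypothetical name and is not part of minfo.c.

```c
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>

/*
 * Hypothetical alternative to the sscanf("%p") step: parse a hexadecimal
 * address such as "0xff1f9824" with strtoull(), whose accepted input is
 * specified by the C standard rather than left to the implementation.
 * Returns 0 and stores the value on success, -1 if the text is not a
 * complete hexadecimal number or does not fit in a pointer-sized integer.
 */
int parse_addr(const char *text, uintptr_t *out)
{
    char *end;
    unsigned long long v;

    errno = 0;
    v = strtoull(text, &end, 16);   /* base 16 allows an optional "0x" prefix */
    if (end == text || *end != '\0' || errno == ERANGE)
        return -1;                  /* nothing parsed, trailing junk, or overflow */
    if (v > UINTPTR_MAX)
        return -1;                  /* too wide for a pointer on this platform */
    *out = (uintptr_t)v;
    return 0;
}
```

If a change like this were tried, find_sym() could cast the resulting uintptr_t to a pointer where it currently relies on the value filled in by sscanf().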




