[padb] [padb-devel] Simple Makefile patch
Ethan Mallove
ethan.mallove at sun.com
Thu Nov 5 18:20:33 GMT 2009
On Mon, Nov/02/2009 10:06:52PM, Ashley Pittman wrote:
> On Mon, 2009-11-02 at 16:30 -0500, Ethan Mallove wrote:
>
> > > Could you send me the log file for a run with that option set please.
> >
> > The log files are the same:
> >
> > $ cat /tmp/padb-minfo-debug-log-0-IkyIH9
> > req: sym MPIR_dll_name
> > ok 0xff1f9824
> > zzz: str:35 dmsg
> > Could not find MPIR_dll_name symbol
> > zzz: str:3 exit
> > die
>
> That would imply that it's found it but then not recognised it.
>
> Could you add some printfs to the find_sym() and ask() functions to see
> what's going on? It's the very first contact, so it's likely that one of
> the core functions is failing; perhaps the read(0, ...) call is a problem.
> The attached patch should tell us more.
>
I don't think any of the printfs fired:
$ padb --debug=all --config-option rmgr=mpirun --full-report=15961
DEBUG (config): 0: Finished setting configuration options
padb version 3.n (Revision 312)
full job report for job 15961
DEBUG (pcmd): 1: Loaded pcmd data
DEBUG (verbose): 1: There are 1 processes over 1 hosts
DEBUG (verbose): 1: Remote process data available on frontend
DEBUG (show_cmd): 1: /home/em162155/software/SunOS/sparc/padb/bin/padb --inner
DEBUG (signon): 2: Received last signon, connecting to inner
DEBUG (ctree): 2: connection tree
DEBUG (full_duplex): 2: Sending command to inner, 364 bytes
DEBUG (full_duplex): 2: Reply from inner, 316 bytes
DEBUG (full_duplex): 2: Sending command to inner, 64 bytes
DEBUG (full_duplex): 3: Reply from inner, 384 bytes
DEBUG (full_duplex): 3: Sending command to inner, 36 bytes
DEBUG (tdata): 3: Target data
Namespace: "ERROR"
Error message from /home/em162155/software/SunOS/sparc/padb/bin/minfo.x: Could not find MPIR_dll_name symbol [0]
Namespace: "FOUND"
yes [0]
Warning: errors reported by some ranks
========
[0]: Error message from /home/em162155/software/SunOS/sparc/padb/bin/minfo.x: Could not find MPIR_dll_name symbol
========
DEBUG (full_duplex): 5: Reply from inner, 432 bytes
DEBUG (full_duplex): 5: Sending command to inner, 472 bytes
DEBUG (tdata): 5: Target data
Namespace: "ERROR"
Error message from /home/em162155/software/SunOS/sparc/padb/bin/minfo.x: Could not find MPIR_dll_name symbol [0]
Namespace: "FOUND"
yes [0]
Warning: errors reported by some ranks
========
[0]: Error message from /home/em162155/software/SunOS/sparc/padb/bin/minfo.x: Could not find MPIR_dll_name symbol
========
Total: 0 communicators, no communication data recorded.
DEBUG (full_duplex): 6: Reply from inner, 1380 bytes
DEBUG (full_duplex): 6: Sending command to inner, 28 bytes
DEBUG (tdata): 6: Target data
Namespace: ",main() at hello_c.c:18|var|argc"
1 [0]
Namespace: ",main() at hello_c.c:18|var|argv"
0xffbfe304 [0]
Namespace: ",main() at hello_c.c:18|var|rank"
0 [0]
Namespace: ",main() at hello_c.c:18|var|size"
-4201424 [0]
Namespace: "FOUND"
yes [0]
Namespace: "main() at hello_c.c:18|locals"
rank,size [0]
Namespace: "main() at hello_c.c:18|params"
argc,argv [0]
Namespace: "main() at hello_c.c:18|var_type|argc"
int [0]
Namespace: "main() at hello_c.c:18|var_type|argv"
char ** [0]
Namespace: "main() at hello_c.c:18|var_type|rank"
int [0]
Namespace: "main() at hello_c.c:18|var_type|size"
int [0]
DEBUG (tree): 6: Making the tree
DEBUG (tree): 6: Enhancing the tree
DEBUG (tree): 6: Formatting the tree
DEBUG (tree): 6: Displaying the tree
-----------------
[0] (1 processes)
-----------------
main() at hello_c.c:18
params
int argc = '1' [0]
char ** argv = '0xffbfe304' [0]
locals
int rank = '0' [0]
int size = '-4201424' [0]
-----------------
[0] (1 processes)
-----------------
sleep() at ?:?
___nanosleep() at ?:?
DEBUG (tree): 6: Done
DEBUG (full_duplex): 6: Reply from inner, 84 bytes
DEBUG (verbose): 6: Completed command
em162155 $
> I assume it compiles without warnings?
Right, it compiles without warnings.
-Ethan
>
> Ashley,
>
> --
>
> Ashley Pittman, Bath, UK.
>
> Padb - A parallel job inspection tool for cluster computing
> http://padb.pittman.org.uk
> Index: minfo.c
> ===================================================================
> --- minfo.c (revision 310)
> +++ minfo.c (working copy)
> @@ -117,10 +117,14 @@
> printf("req: %s\n",req);
> fflush(NULL);
> nbytes = read(0,buff,QUERY_SIZE+3);
> - if ( nbytes < 0 )
> + if ( nbytes < 0 ) {
> + printf("Read returned %d\n",nbytes);
> return -1;
> - if ( memcmp(buff,"ok ",3 ) )
> + }
> + if ( memcmp(buff,"ok ",3 ) ) {
> + show_warning("ask request got nack\n");
> return -1;
> + }
> buff[nbytes-1] = '\000';
> memcpy(ans,&buff[3],nbytes -3);
> return 0;
> @@ -134,8 +138,10 @@
> sprintf(req,"%s %s",type,name);
>
> i = ask(req,ans);
> - if ( i != 0 )
> + if ( i != 0 ) {
> + show_warning("ask failed\n");
> return NULL;
> + }
>
> i = sscanf(ans, "%p",&addr);
> if ( i != 1 ) {
> @@ -533,8 +539,8 @@
> DLSYM(dll_ep,dlhandle,next_communicator);
> DLSYM(dll_ep,dlhandle,setup_operation_iterator);
> DLSYM(dll_ep,dlhandle,next_operation);
> - DLSYM(dll_ep,dlhandle,get_comm_group);
>
> + DLSYM_LAX(dll_ep,dlhandle,get_comm_group);
> DLSYM_LAX(dll_ep,dlhandle,get_global_rank);
> DLSYM_LAX(dll_ep,dlhandle,get_comm_coll_state);
> return 0;
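
One thing that may be worth checking, though nothing in this thread confirms it as the cause: the log shows the reply "ok 0xff1f9824" arriving, and find_sym() then converts the address with sscanf(ans, "%p", &addr). The C standard leaves the text accepted by %p implementation-defined, so a reply that parses on one platform is not guaranteed to parse on another. Below is a minimal sketch of a stricter parse that uses strtoull() instead. The name parse_sym_reply() is hypothetical and is not part of minfo.c; it folds the "ok " check from ask() and the address conversion from find_sym() into one function purely for illustration.

```c
#include <assert.h>
#include <ctype.h>
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper, not part of minfo.c: parse a reply of the form
 * "ok 0xff1f9824" into an address without relying on sscanf("%p").
 * Returns 0 on success and -1 on any malformed reply. */
static int parse_sym_reply(const char *reply, uintptr_t *addr)
{
    char *end;
    unsigned long long value;

    assert(reply != NULL && addr != NULL);

    /* Same acknowledgement check as ask(): the reply must start with "ok ". */
    if (strncmp(reply, "ok ", 3) != 0)
        return -1;

    /* Require a hex digit right after "ok " so that strtoull() cannot
     * silently skip whitespace or accept a sign. "0x..." starts with '0'. */
    if (!isxdigit((unsigned char)reply[3]))
        return -1;

    /* Base 16 accepts the digits with or without a leading "0x". */
    errno = 0;
    value = strtoull(reply + 3, &end, 16);
    if (errno == ERANGE)
        return -1;

    /* Allow only trailing whitespace (such as the newline) after the number. */
    while (*end == ' ' || *end == '\t' || *end == '\r' || *end == '\n')
        end++;
    if (*end != '\0')
        return -1;

    /* Reject values that do not fit in a pointer on this platform. */
    if (value > UINTPTR_MAX)
        return -1;

    *addr = (uintptr_t)value;
    return 0;
}
```

With the reply from the log, parse_sym_reply("ok 0xff1f9824\n", &addr) returns 0 and stores 0xff1f9824 in addr. A reply that does not begin with "ok ", or that has no hex digits after it, returns -1, so the caller can report which of the two checks failed.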