From codesite-noreply at google.com Thu Oct 1 20:38:51 2009 From: codesite-noreply at google.com (codesite-noreply at google.com) Date: Thu, 01 Oct 2009 19:38:51 +0000 Subject: [padb-devel] [padb] r272 committed - Merged cleanup changes r169:r271 into the trunk Message-ID: <0016364c6f4508fda60474e4cc93@google.com> Revision: 272 Author: apittman Date: Thu Oct 1 12:38:38 2009 Log: Merged cleanup changes r169:r271 into the trunk http://code.google.com/p/padb/source/detail?r=272 Added: /trunk/src/report.pl Modified: /trunk/src /trunk/src/Makefile /trunk/src/padb ======================================= --- /dev/null +++ /trunk/src/report.pl Thu Oct 1 12:38:38 2009 @@ -0,0 +1,108 @@ +#!/usr/bin/perl + +use strict; +use warnings; + +# Takes the source code and the output of perlcritic and make a +# per-function report on the state of the source. Very +# rough-and-ready but it helps in it's own little way. + +my $line = 1; + +my %calls; +my %subs; + +my @builtin = + qw(basename Dumper qw Configure dclone join defined delete print shift close accept confess A new thaw add + remove count flush handles printf split dirname); + +my %builtin; + +foreach (@builtin) { + $builtin{$_}++; +} + +my $current_fn; + +my %fns; +my @fns; +my @calls; +my %refs; + +my $PH; + +open $PH, '<', 'padb'; + +while (<$PH>) { + if ( m{\A[^#]*?(\w+)\(}x and not defined $builtin{$1} ) { + push @{ $calls{$1} }, $line; + $calls[$line] = $1; + + } elsif (m{\Asub\s+(\w+)}) { + $current_fn = $1; + + $fns{$current_fn}{start} = $line; + push @fns, $current_fn; + } elsif ( m{\A\}} and defined $current_fn ) { + + $fns{$current_fn}{end} = $line; + $current_fn = undef; + } elsif (m{\\\&(\w+)}) { + $refs{$1} = 1; + } + + $line++; +} + +close $PH; + +my $PPH; +open $PPH, '<', 'pc'; + +my @pc_names = qw(brutal cruel harsh stern gentle); + +my @errors; +while (<$PPH>) { + if (m{\A(\d+)\:[ ]\((\d)\)[ ](.*)\Z}x) { + push @{ $errors[$1] }, "$pc_names[$2-1]($2): $3"; + } else { + printf "?? $_\n"; + } +} + +foreach my $fn (@fns) { + foreach my $line ( $fns{$fn}{start} .. $fns{$fn}{end} ) { + + if ( defined $errors[$line] ) { + @{ $fns{$fn}{errors}{ $line - $fns{$fn}{start} } } = + @{ $errors[$line] }; + } + next unless defined( $calls[$line] ); + $fns{ $calls[$line] }{called_by}{$fn}++; + $fns{$fn}{calls}{ $calls[$line] }++; + } +} + +foreach my $fn (@fns) { + printf("Function: $fn $fns{$fn}{start}\n"); + + if ( defined $refs{$fn} ) { + printf("\tIs dereferenced\n"); + } + foreach my $cf ( sort keys %{ $fns{$fn}{called_by} } ) { + printf("\tIs called by:\t$cf ($fns{$fn}{called_by}{$cf} times)\n"); + } + foreach my $cf ( sort keys %{ $fns{$fn}{calls} } ) { + printf("\tCalls:\t\t$cf ($fns{$fn}{calls}{$cf} times)\n"); + } + foreach my $el ( sort { $a <=> $b } keys %{ $fns{$fn}{errors} } ) { + foreach my $error ( @{ $fns{$fn}{errors}{$el} } ) { + + #printf("\tError:\t\t$fns{$fn}{errors}{$el} ($el)\n"); + printf("\tError:\t\t$error ($el)\n"); + } + } + printf("\n"); +} + +exit 0; ======================================= --- /trunk/src/Makefile Tue Sep 1 13:09:16 2009 +++ /trunk/src/Makefile Thu Oct 1 12:38:38 2009 @@ -27,3 +27,14 @@ /bin/cp ${FILES} padb-${VERSION} svnversion > padb-${VERSION}/svnversion tar -czf padb-${VERSION}.tgz padb-${VERSION} + +tidy: + perltidy -b -ce -w -se padb + +pc: padb + perlcritic --brutal --verbose "%l: (%s) %m\n" padb > .pc.tmp || true + /bin/mv .pc.tmp pc + +report: pc + ./report.pl pc | tee report + ======================================= --- /trunk/src/padb Tue Sep 1 13:07:04 2009 +++ /trunk/src/padb Thu Oct 1 12:38:38 2009 @@ -1,4 +1,4 @@ -#!/usr/bin/perl -w +#!/usr/bin/perl # padb. a simple parallel debugging aid. @@ -24,9 +24,9 @@ # Revision history # Version 3.0 -# * Full-duplex communication between inner and outer processes, padb -# no longer simply sends request on the command line and processes -# the response but is truly interactive between the inner and outer +# * Full-duplex communication between inner and outer processes, padb no +# longer simply sends request on the command line and processes the +# response but is truly interactive between the inner and outer # processes. This avails greater flexibility in what can be achieved # and hopefully helps with scalability as well. # * Enabled warnings (-w) by default. Fixed lots of warnings, mostly @@ -34,22 +34,23 @@ # * Much more complete separation into "modes" of operation, most options # are now mode specific rather than simply using global variables or # global conf options. -# * Overhaul of the allfns (mode) callbacks and in particular their parameters +# * Overhaul of the allfns (mode) callbacks and in particular their +# parameters # * Performance improvements. -# * Simplify the slurm_find_pids() function to just return the output -# of scontrol listpids +# * Simplify the slurm_find_pids() function to just return the output of +# scontrol listpids # * Take the old process-tree walking code from slurm_find_pids() and make -# it independant and call it for all resource managers. This allows scripts -# which call parallel applications to be bypassed and the applications -# themselves targetted. -# * Added "port-range" option to limit port usage in case people try and use -# padb with firewalls enabled. +# it independant and call it for all resource managers. This allows +# scripts which call parallel applications to be bypassed and the +# applications themselves targetted. +# * Added "port-range" option to limit port usage in case people try and +# use padb with firewalls enabled. # # Version 2.5 # * First Non-Quadrics version # * Various stability/bug fixes. -# * Deadlock detect at the MPI Layer rather than the Elan layer -# if running with a patched MPI (Work in progress) +# * Deadlock detect at the MPI Layer rather than the Elan layer if running +# with a patched MPI (Work in progress) # * Completely new build and packing procedure to go with the new # maintainer. # * Added "orte" to the list of resource managers supported @@ -57,97 +58,109 @@ # * inner_main() now uses callbacks for resource manager support. # * --signal now takes names rather than numbers. # * Check job is valid when using the --full-report option. -# * Add a --proc-summary option to replace --proc-info --proc-format -# This gives a very efficient "job top" program. - +# * Add a --proc-summary option to replace --proc-info --proc-format This +# gives a very efficient "job top" program. # Version 2.2 -# * Add a --core-stack option along with --core and --exe to extract stack -# traces from core files. +# * Add a --core-stack option along with --core and --exe to extract stack +# traces from core files. # # Version 2.1 -# * Add some magic to return complex perl data structures back from the inner -# callback functions to the output callback function. (nfreeze/base64_encode). -# * Add "MPI watch" functionality to allow viewing of MPI state in a vmstat -# like fashion. -# * Add a --list-rmgrs option to list active resource managers and their jobs. +# * Add some magic to return complex perl data structures back from the +# inner callback functions to the output callback +# function. (nfreeze/base64_encode). +# * Add "MPI watch" functionality to allow viewing of MPI state in a +# vmstat like fashion. +# * Add a --list-rmgrs option to list active resource managers and their +# jobs. # * Add support for "local-qsnet" as a way of launching jobs. # * Add support for "local-fd" as a way of launching jobs. # * Add support for "mpd" as a way of launching jobs. -# * Add support for "lsf-rms" as a way of launching jobs. Note the lsf/rms -# integration means this is highly unlikely to work for everyone. -# * Add a -Olsf-job-offset option for finding lsf jobs in the rms database. +# * Add support for "lsf-rms" as a way of launching jobs. Note the +# lsf/rms integration means this is highly unlikely to work for +# everyone. +# * Add a -Olsf-job-offset option for finding lsf jobs in the rms +# * database. # * Support for MPI message queues as well as libelan queues (-Q) # * Add a -Ominfo= option for finding the new minfo.x command. # * Add a -Ompi-dll= option for overriding the debugger dll. # * Extend the gdb handling code to allow further expansion in the future. # * Make the strip-below and strip-above functions configurable. -# * Add support for loading settings from the environment or a config file. -# * Add support for "local" as a resource manager to target hand-crafted capabilities. +# * Add support for loading settings from the environment or a config +# file. +# * Add support for "local" as a resource manager to target hand-crafted +# capabilities. # * Ignore case when matching stats names. # * Correct printing of debug information from the inner. # * Try and remove warnings when run with -w (still disabled) # * Un-break it on slurm systems without RMS installed. # * Preliminary threading support (courtesy of LLNL) # * Show per-rail sdram usage of processes. -# * Look at all descendant processes of slurmstepd rather than direct descendants -# and try and avoid scripts (perl/sh/bash/xterm) -# * Use the new scontrol listpids and %A options to squeue for finding processes -# on slurm systems (1.2.0 and above). +# * Look at all descendant processes of slurmstepd rather than direct +# descendants and try and avoid scripts (perl/sh/bash/xterm) +# * Use the new scontrol listpids and %A options to squeue for finding +# processes on slurm systems (1.2.0 and above). # * Don't show usage on command line errors. -# * Only pass command line options onto the inner if it is going to handle them +# * Only pass command line options onto the inner if it is going to handle +# * them # # Version 2.0 -# * Removed the -OscriptDir option as it's no longer used, use -Oedb instead. +# * Removed the -OscriptDir option as it's no longer used, use -Oedb +# instead. # * Corrected the way tally statistics were being added. # * Added a --show-jobs option to show possible jobs in a resource manager # independent way. -# * Added a --local-stats option to show message statistics for all processes -# on the local node. -# * Added a --proc-format option which allows specific entries from /proc to be -# reported on a per-vp basis. -# * Ported to slurm only systems where the RMS kernel module isn't present. -# * Removed the padb-helper.pl file and folded it's functionality into padb -# itself. Padb is now self-contained. +# * Added a --local-stats option to show message statistics for all +# processes on the local node. +# * Added a --proc-format option which allows specific entries from /proc +# to be reported on a per-vp basis. +# * Ported to slurm only systems where the RMS kernel module isn't +# present. +# * Removed the padb-helper.pl file and folded it's functionality into +# padb itself. Padb is now self-contained. # * Removed the padb.gdb file from the kit, it's generated at run-time if # needed. # * Various readability fixes and small performance improvements. -# * Added a --kill option along with --signal which can be used to send signals -# to any process in the parallel job. +# * Added a --kill option along with --signal which can be used to send +# signals to any process in the parallel job. # # Version 1.6 -# * Add a --proc-info option to show the contents of files from /proc for a -# given rank +# * Add a --proc-info option to show the contents of files from /proc for +# a given rank # * Increase the RMS_EXITTIMEOUT value from 10 to 30 seconds and make it a # configuration option. # # Version 1.5 -# * Try and load edb from where padb is installed. This should allow it to run -# on elan3 systems where the qsnetdefault link is set to elan3. -# * GNAT:8110 Proper use of English in the group deadlock detection report. -# * Target the correct process if there are multiple processes for each vp. -# Use the pid of the process which called elan_baseInit() -# * GNAT 7945: Fix messages on stderr about integer overflow on 32 bit machines +# * Try and load edb from where padb is installed. This should allow it to +# run on elan3 systems where the qsnetdefault link is set to elan3. +# * GNAT:8110 Proper use of English in the group deadlock detection +# report. +# * Target the correct process if there are multiple processes for each +# vp. Use the pid of the process which called elan_baseInit() +# * GNAT 7945: Fix messages on stderr about integer overflow on 32 bit +# machines # * Remove warnings when -w is turned on. -# * Re-work the stack trace tree generation code do work via a intermediate -# data structure to make the code easier to parse. +# * Re-work the stack trace tree generation code do work via a +# intermediate data structure to make the code easier to parse. # * Report errors loading stats from a running job cleanly. # * Better backwards compatibility with older RMS releases. -# * Add a padb-treeview script to the release, this takes most of it's code -# from padb and uses tk to provide the user with a X based view of the stack -# traces. +# * Add a padb-treeview script to the release, this takes most of it's +# code from padb and uses tk to provide the user with a X based view of +# the stack traces. # * Changes to edb so the stats loading code can run on elan3 systems. # # Version 1.4 # * Bumped version number to 1.4 -# * Change the format of tree based stack traces, it now uses a more logical -# indention style. +# * Change the format of tree based stack traces, it now uses a more +# logical indention style. # * Discover and report if application stats are incomplete. -# * Allow the use of -r with -s to view statistics from an individual process. -# Update -S (which does the same thing) to parse the stats in padb as well. +# * Allow the use of -r with -s to view statistics from an individual +# process. Update -S (which does the same thing) to parse the stats in +# padb as well. # * Improved error handling in the case where jobs complete whilst padb is -# running, sample the job state before and after going parallel and do the -# right thing accordingly. -# * Much improved error output, only report an error if something bad happened. +# running, sample the job state before and after going parallel and do +# the right thing accordingly. +# * Much improved error output, only report an error if something bad +# happened. # * Changes to the code as required to enable padb to run cleanly with # warnings (-w) enabled. # * Added a -Ostats-name= option to allow the extraction of one specific @@ -156,43 +169,47 @@ # parallel to make it more resilient. # # Version 1.3 -# * Strip stack traces below main when possible, add a --nostrip-below-main -# option to turn this off. +# * Strip stack traces below main when possible, add a +# --nostrip-below-main option to turn this off. # * Strip stack traces above elan_waitWord when possible, add a # --nostrip-above-wait option to turn this off. -# * Added a -Ogdb-retry-count=N option. Defaults to three as before but is now -# tunable. -# * Parse communication statistics in padb directly now rather then relying on -# edb to do it for us +# * Added a -Ogdb-retry-count=N option. Defaults to three as before but is +# now tunable. +# * Parse communication statistics in padb directly now rather then +# relying on edb to do it for us # * Allow reading of stats from file (-s -i ) # * Perform group deadlock detection in padb directly rather than in edb, # improved the output and handling of corner cases. -# * Initial version of a "one process per line" method of statistics reporting. -# * Better catching and reporting of errors when running parallel programs. +# * Initial version of a "one process per line" method of statistics +# reporting. +# * Better catching and reporting of errors when running parallel +# programs. # * Bumped the version number to 1.3 # # Version 1.2 -# * Converted padb to use long command line options. The short ones still work -# but now have long alternatives -# * Removed the need to set -OscriptDir= when running in non-default locations -# * Added a --full-report= option to gather all possible information +# * Converted padb to use long command line options. The short ones still +# work but now have long alternatives +# * Removed the need to set -OscriptDir= when running in non-default +# locations +# * Added a --full-report= option to gather all possible +# information # * General tidy ups in the stack trace tree generation code. -# * Now reports processes that aren't present when generating stack traces. -# * Now reports errors properly when there are errors launching the parallel -# job +# * Now reports processes that aren't present when generating stack +# traces. +# * Now reports errors properly when there are errors launching the +# parallel job # * Calls edb directly rather than using a helper script when possible # (statistics reports). # * Incremented version number from 1.0 to 1.2. # - # TODO: # # * More testing with -w turned on. # * Multi-pass argument handling, --kill also accepts --signal for example, -# this should really be done at the getopt layer. Also proper usage -# info for these secondary args. -# * Paramater checking of secondary args, signal has a hacky implementation and -# port-range doesn't have any checking currently. +# this should really be done at the getopt layer. Also proper usage info +# for these secondary args. +# * Paramater checking of secondary args, signal has a hacky implementation +# and port-range doesn't have any checking currently. # * libunwind support? lighter weight than gdb and possibly more reliable. # * Maybe PMI would help? # * POD? generated man page? @@ -204,6 +221,7 @@ ############################################################################### +use warnings; use strict; use Getopt::Long; use File::Basename; @@ -250,17 +268,18 @@ # Main. my $prog = basename $0; -my $version = "3.0-rc1"; +my $version = '3.0-rc1'; my %conf; my $secret; -# Config options the inner knows about, only forward options if they are in this list. -my @inner_conf = qw(edb edbopt minfo rmgr scripts slurm_job_step verbose); - -# More config options the inner knows about, these are forwarded on the command line -# rather than over the sockets. +# Config options the inner knows about, only forward options if they are in +# this list. +my @inner_conf = qw(edb edbopt minfo rmgr scripts slurm_job_step); + +# More config options the inner knows about, these are forwarded on the +# command line rather than over the sockets. my @inner_conf_cmd = qw(port_range outer); ############################################################################### @@ -270,8 +289,8 @@ ############################################################################### # A hash of supported resource managers, each of which provides a number of -# functions for querying the state of the machine. This keeps the core -# of the code tidy. Note that this is only for the "outer" instance of the +# functions for querying the state of the machine. This keeps the core of +# the code tidy. Note that this is only for the "outer" instance of the # program, the inner version does things differently. # Function Args Returns Required Description @@ -288,70 +307,70 @@ my %rmgr; $rmgr{rms} = { - 'is_installed' => \&rms_is_installed, - 'get_active_jobs' => \&rms_get_jobs, - 'job_is_running' => \&rms_job_is_running, - 'job_to_key' => \&rms_job_to_key, - 'setup_pcmd' => \&rms_setup_pcmd, - 'find_pids' => \&rms_find_pids, + is_installed => \&rms_is_installed, + get_active_jobs => \&rms_get_jobs, + job_is_running => \&rms_job_is_running, + job_to_key => \&rms_job_to_key, + setup_pcmd => \&rms_setup_pcmd, + find_pids => \&rms_find_pids, }; $rmgr{mpd} = { - 'is_installed' => \&mpd_is_installed, - 'get_active_jobs' => \&mpd_get_jobs, - 'setup_pcmd' => \&mpd_setup_pcmd, - 'cleanup_pcmd' => \&mpd_cleanup_pcmd, - 'find_pids' => \&mpd_find_pids, - 'require_inner_callback' => 1, + is_installed => \&mpd_is_installed, + get_active_jobs => \&mpd_get_jobs, + setup_pcmd => \&mpd_setup_pcmd, + cleanup_pcmd => \&mpd_cleanup_pcmd, + find_pids => \&mpd_find_pids, + require_inner_callback => 1, }; $rmgr{orte} = { - 'is_installed' => \&open_is_installed, - 'get_active_jobs' => \&open_get_jobs, - 'setup_pcmd' => \&open_setup_pcmd, - 'cleanup_pcmd' => \&open_cleanup_pcmd, - 'find_pids' => \&open_find_pids, + is_installed => \&open_is_installed, + get_active_jobs => \&open_get_jobs, + setup_pcmd => \&open_setup_pcmd, + cleanup_pcmd => \&open_cleanup_pcmd, + find_pids => \&open_find_pids, }; -$rmgr{"lsf-rms"} = { - 'is_installed' => \&lsf_is_installed, - 'get_active_jobs' => \&lsf_get_jobs, - 'setup_pcmd' => \&lsf_setup_pcmd, - 'inner_rmgr' => "rms", +$rmgr{'lsf-rms'} = { + is_installed => \&lsf_is_installed, + get_active_jobs => \&lsf_get_jobs, + setup_pcmd => \&lsf_setup_pcmd, + inner_rmgr => 'rms', }; $rmgr{slurm} = { - 'is_installed' => \&slurm_is_installed, - 'get_active_jobs' => \&slurm_get_jobs, - 'job_is_running' => \&slurm_job_is_running, - 'setup_pcmd' => \&slurm_setup_pcmd, - 'find_pids' => \&slurm_find_pids, - 'require_inner_callback' => 1, + is_installed => \&slurm_is_installed, + get_active_jobs => \&slurm_get_jobs, + job_is_running => \&slurm_job_is_running, + setup_pcmd => \&slurm_setup_pcmd, + find_pids => \&slurm_find_pids, + require_inner_callback => 1, }; $rmgr{local} = { - 'get_active_jobs' => \&local_get_jobs, - 'job_is_running' => \&local_job_is_running, - 'setup_pcmd' => \&local_setup_pcmd, - 'find_pids' => \&local_find_pids, - 'require_inner_callback' => 1, + get_active_jobs => \&local_get_jobs, + job_is_running => \&local_job_is_running, + setup_pcmd => \&local_setup_pcmd, + find_pids => \&local_find_pids, + require_inner_callback => 1, }; -$rmgr{"local-qsnet"} = { - 'is_installed' => \&local_q_is_installed, - 'get_active_jobs' => \&local_q_get_jobs, - 'job_is_running' => \&local_job_is_running, - 'setup_pcmd' => \&local_setup_pcmd, - 'inner_rmgr' => "local", - 'require_inner_callback' => 1, +$rmgr{'local-qsnet'} = { + is_installed => \&local_q_is_installed, + get_active_jobs => \&local_q_get_jobs, + job_is_running => \&local_job_is_running, + setup_pcmd => \&local_setup_pcmd, + inner_rmgr => 'local', + require_inner_callback => 1, }; -$rmgr{"local-fd"} = { - 'get_active_jobs' => \&local_fd_get_jobs, - 'job_is_running' => \&local_job_is_running, - 'setup_pcmd' => \&local_setup_pcmd, - 'inner_rmgr' => "local", - 'require_inner_callback' => 1, +$rmgr{'local-fd'} = { + get_active_jobs => \&local_fd_get_jobs, + job_is_running => \&local_job_is_running, + setup_pcmd => \&local_setup_pcmd, + inner_rmgr => 'local', + require_inner_callback => 1, }; ############################################################################### @@ -360,11 +379,11 @@ # ############################################################################### -# If changing any of these defaults also check the inner code as some -# of these settings affect that program as well and padb will only -# pass on settings on the command line, not the entire config hash. -# The reason they are listed here as well is so that padb -O help -# works and gives the correct defaults. +# If changing any of these defaults also check the inner code as some of +# these settings affect that program as well and padb will only pass on +# settings on the command line, not the entire config hash. The reason +# they are listed here as well is so that padb -O help works and gives the +# correct defaults. my %allfns; @@ -376,33 +395,32 @@ # Debug options. $conf{verbose} = 0; -# Valid values are "none" "missing" or "all". Anything not recognised -# is treated as "all". -$conf{check_signon} = "all"; +# Valid values are "none" "missing" or "all". Anything not recognised is +# treated as "all". +$conf{check_signon} = 'all'; # Output options. -$conf{interval} = 10; -$conf{watch_clears_screen} = 1; -$conf{scripts} = "bash,sh,dash,ash,perl,xterm"; +$conf{interval} = '10s'; +$conf{watch_clears_screen} = 'enabled'; +$conf{scripts} = 'bash,sh,dash,ash,perl,xterm'; $conf{lsf_job_offset} = 1; -$conf{local_fd_name} = "/dev/null"; -$conf{inner_callback} = 0; - -# These two are used by deadlock and QsNet group -# code, they need migrating in the group code -# when I have access to a test system again. +$conf{local_fd_name} = '/dev/null'; +$conf{inner_callback} = 'disabled'; + +# These two are used by deadlock and QsNet group code, they need migrating +# in the group code when I have access to a test system again. #$conf{"show-group-members"} = 0; #$conf{"show-all-groups"} = 0; # Tuning options. -$conf{prun_timeout} = 120; -$conf{prun_exittimeout} = 120; +$conf{prun_timeout} = '2m'; +$conf{prun_exittimeout} = '2m'; $conf{rmgr} = undef; $conf{slurm_job_step} = 0; # These settings are passed onto inner only. -$conf{edbopt} = ""; +$conf{edbopt} = undef; $conf{edb} = find_edb(); $conf{minfo} = find_minfo(); @@ -410,18 +428,104 @@ # Option to define a list of ports used by padb. $conf{port_range} = undef; -$conf{tree_width} = 4; +$conf{tree_width} = '4'; + +# Config options which take boolean values. +my @conf_bool = qw(watch_clears_screen inner_callback); + +# Config options which take a time value. +my @conf_time = qw(prun_exittimeout prun_timeout interval); + +# Config options which take an integer. +my @conf_int = qw(lsf_job_offset slurm_job_step tree_width); my $norc = 0; -my $configfile = "/etc/padb.conf"; +my $configfile = '/etc/padb.conf'; + +# Standard regexpes for splitting on comma, equals and spaces. Note the +# space regexp matches multiple whitespace characters. +my $COMMA = qr{,}x; +my $EQUALS = qr{=}x; +my $SPACE = qr{\s+}x; +my $COLON = qr{:}x; + +my $EMPTY_STRING = q{}; + +# Enable sorting in Data::Dumper for clarity. +$Data::Dumper::Sortkeys = 1; +$Data::Dumper::Quotekeys = 0; + +sub check_and_convert_bool { + my ($str) = @_; + my @yes = qw(1 yes on enabled); + my @no = qw(0 no off disabled); + my %bool_table; + map { $bool_table{$_} = 1 } @yes; + map { $bool_table{$_} = 0 } @no; + + if ( defined $bool_table{$str} ) { + return $bool_table{$str}; + } + printf {*STDERR} "Boolean value \"%s\" not recognised, aborting.\n", $str; + exit 1; +} + +sub check_and_convert_time { + my ($str) = @_; + if ( + $str =~ m{\A # Start of line + (\d+) # A number + (s|m)? # With an option s or m suffix. + \z}x + ) + { + if ( defined $2 and $2 eq 'm' ) { + return $1 * 60; + } + return $1; + } + printf {*STDERR} "Time value \"%s\" not recognised, aborting.\n", $str; + exit 1; +} + +sub check_int { + my ($str) = @_; + + return + if ( + $str =~ m{\A # Start of line + \d+ # A number + \z}x + ); + + printf {*STDERR} "Integer value \"%s\" not recognised, aborting.\n", $str; + exit 1; +} + +sub check_signal { + my ($signal) = @_; + + my $s = uc $signal; + my %sig_names; + foreach ( split $SPACE, $Config{sig_name} ) { + $sig_names{$_} = 1; + } + + if ( not defined $sig_names{$s} ) { + printf {*STDERR} "Error: signal \"%s\" is invalid, aborting.\n", + $signal; + exit 1; + } + return $s; +} # Look for edb in the default install location only. sub find_edb { - return "/usr/lib/qsnet/elan4/bin/" - if ( -d "/usr/lib/qsnet/elan4/bin/" ); - return "/usr/lib64/qsnet/elan4/bin/" - if ( -d "/usr/lib64/qsnet/elan4/bin/" ); - return "edb"; + return '/usr/lib/qsnet/elan4/bin/' + if ( -d '/usr/lib/qsnet/elan4/bin/' ); + return '/usr/lib64/qsnet/elan4/bin/' + if ( -d '/usr/lib64/qsnet/elan4/bin/' ); + return 'edb'; } # Look for minfo.x in the same directory as padb. @@ -437,13 +541,13 @@ ############################################################################### sub show_version { - printf("$prog version $version\n\n"); - printf("Written by Ashley Pittman\n"); - printf("http://padb.pittman.org.uk\n"); + print "$prog version $version\n\n"; + print "Written by Ashley Pittman\n"; + print "http://padb.pittman.org.uk\n"; exit 0; } -my $usage = <= [,=...]] [-i ] [-r ] [-u ] -a|-A| @@ -458,8 +562,8 @@ XXXX --full-report= Generate a full report of job state. - --nostrip-below-main Don\'t strip stack traces below main. - --nostrip-above-wait Don\'t strip stack traces about elan_waitWord. + --no-strip-below-main Don't strip stack traces below main. + --no-strip-above-wait Don't strip stack traces about elan_waitWord. --proc-format Specify information to show about processes. @@ -505,45 +609,45 @@ -v --verbose Verbose. -V --version Show version number and exit. -h --help print this usage message. + EOF sub usage { chomp $usage; - my $extra = ""; + my $extra = $EMPTY_STRING; $extra .= "Modes of operation\n"; - foreach my $arg ( sort( keys %allfns ) ) { + foreach my $arg ( sort keys %allfns ) { next unless ( defined $allfns{$arg}{help} ); next if ( defined $allfns{$arg}{qsnet} ); if ( defined $allfns{$arg}{arg_short} ) { $extra .= "-$allfns{$arg}{arg_short}"; } else { - $extra .= " "; - } - $extra .= sprintf( " --%-18s%s.\n", - $allfns{$arg}{arg_long}, - $allfns{$arg}{help} ); + $extra .= ' '; + } + $extra .= sprintf " --%-18s%s.\n", + $allfns{$arg}{arg_long}, + $allfns{$arg}{help}; } $extra .= "\nQsNet specific modes\n"; - foreach my $arg ( sort( keys %allfns ) ) { + foreach my $arg ( sort keys %allfns ) { next unless ( defined $allfns{$arg}{help} ); next unless ( defined $allfns{$arg}{qsnet} ); if ( defined $allfns{$arg}{arg_short} ) { $extra .= "-$allfns{$arg}{arg_short}"; } else { - $extra .= " "; - } - $extra .= sprintf( " --%-18s%s.\n", - $allfns{$arg}{arg_long}, - $allfns{$arg}{help} ); + $extra .= ' '; + } + $extra .= sprintf " --%-18s%s.\n", + $allfns{$arg}{arg_long}, + $allfns{$arg}{help}; } - $usage =~ s!XXXX!$extra!; - - print STDERR <Dump( [$handle], [$type] ); + return; } # Valid debug modes, a full list is maintained here so using unexpected # ones can generate warnings. -$debugModes{full_duplex} = undef; -$debugModes{show_cmd} = undef; -$debugModes{all} = undef; -$debugModes{tree} = undef; -$debugModes{verbose} = undef; -$debugModes{signon} = undef; -$debugModes{rmgr} = undef; -$debugModes{ctree} = undef; -$debugModes{tdata} = undef; +$debug_modes{full_duplex} = undef; +$debug_modes{show_cmd} = undef; +$debug_modes{all} = undef; +$debug_modes{tree} = undef; +$debug_modes{verbose} = undef; +$debug_modes{signon} = undef; +$debug_modes{rmgr} = undef; +$debug_modes{ctree} = undef; +$debug_modes{tdata} = undef; +$debug_modes{config} = undef; + +sub slurp_file { + my ($file) = @_; + open my $FD, '<', $file or return; + my @contents = <$FD>; + close $FD; + return @contents; +} + +sub slurp_cmd { + my ($cmd) = @_; + open my $CFD, '-|', "$cmd 2>/dev/null" or return; + my @out = <$CFD>; + close $CFD; + return @out; +} + +sub slurp_dir { + my ($dir) = @_; + opendir my $DIR, $dir or return; + my @files = readdir $DIR; + closedir $DIR; + return @files; +} + +sub get_process_list { + my ($user) = @_; + my $uid = getpwnam $user; + return unless defined $uid; + my @pids = slurp_dir('/proc'); + my @userpids; + foreach my $pid (@pids) { + next unless ( $pid =~ m{\A\d+\z}xms ); + my ( undef, undef, undef, undef, $owner ) = stat "/proc/$pid"; + + # Check the stat worked, it's possible for processes to dissapear + # Take care to check for defined rather than true as root has a uid + # of zero. + next unless defined $owner; + next unless $owner == $uid; + push @userpids, $pid; + } + return @userpids; +} sub parse_args_outer { - Getopt::Long::Configure("bundling"); + Getopt::Long::Configure( 'bundling', 'pass_through' ); my $debugflag; my @ranks; my %optionhash = ( - "verbose|v+" => \$conf{verbose}, - "user|u=s" => \$user, - "rank|r=s" => \@ranks, - "group-id=s" => \@target_groups, - "help|h" => \&usage, - "all|a" => \$all, - "any|A" => \$any, - "version|V" => \&show_version, - "compress|c" => \$compress, - "compress-long|C" => \$compress_C, - "tree|t" => \$tree, - "input-file|file|i=s" => \$input_file, - "config-option|O=s" => \@config_options, - "full-report=s" => \$full_report, - "core-stack" => \$core_stack, - "core=s" => \$core_name, - "exe=s" => \$exe_name, - "list-rmgrs" => \$list_rmgrs, - "watch" => \$watch, - "local-stats" => \$local_stats, - "show-jobs" => \$show_jobs, - "norc" => \$norc, - "config-file=s" => \$configfile, - "debug=s" => \$debugflag, + 'verbose|v+' => \$conf{verbose}, + 'user|u=s' => \$target_user, + 'rank|r=s' => \@ranks, + 'group-id=s' => \@target_groups, + 'help|h' => \&usage, + 'all|a' => \$all, + 'any|A' => \$any, + 'version|V' => \&show_version, + 'compress|c' => \$output_compress, + 'compress-long|C' => \$output_compress_long, + 'tree|t' => \$output_tree, + 'input-file|file|i=s' => \$input_file, + 'full-report=s' => \$full_report, + 'core-stack' => \$core_stack, + 'core=s' => \$core_name, + 'exe=s' => \$exe_name, + 'list-rmgrs' => \$list_rmgrs, + 'watch' => \$watch, + 'local-stats' => \$local_stats, + 'show-jobs' => \$show_jobs, + 'norc' => \$norc, + 'config-file=s' => \$configfile, + 'debug=s' => \$debugflag, ); + # The primary modes, one of these only must be set. my %config_hash; + foreach my $arg ( keys %allfns ) { - $optionhash{ $allfns{$arg}{arg} } = \$config_hash{$arg}; - if ( defined $allfns{$arg}{secondary} ) { - foreach my $sec ( @{ $allfns{$arg}{secondary} } ) { - $sec->{value} = $sec->{default}; - $optionhash{ $sec->{arg} } = \$sec->{value}; - } - } - if ( defined $allfns{$arg}{options_i} ) { - foreach my $o ( keys( %{ $allfns{$arg}{options_i} } ) ) { - $conf{mode_options}{$arg}{$o} = $allfns{$arg}{options_i}{$o}; - $conf{mode_options_reverse}{$o}{$arg} = 1; - } + + # Set the primary mode in the hash. + $optionhash{ to_arg( $allfns{$arg} ) } = \$config_hash{$arg}; + } + + # Parse the options once to pick up the mode and any single letter + # options which might be bundled with it. + GetOptions(%optionhash); + + Getopt::Long::Configure( 'default', 'bundling' ); + + my $mode; + + foreach my $arg ( keys %config_hash ) { + next unless defined $config_hash{$arg}; + $mode = $arg; + $have_allfns_option++; + } + + # The secondary args, specify all of them for now as we only call + # GetOptions once. + my %sec_args; + + # Set any extra options this mode may or may not accept. + if ( defined $mode and defined $allfns{$mode}{secondary} ) { + foreach my $sec ( @{ $allfns{$mode}{secondary} } ) { + $optionhash{ to_arg($sec) } = \$sec_args{ $sec->{arg_long} }; } } - GetOptions(%optionhash) or exit(1); + # Set this for the second iteration only so that GetOptions can abort + # correctly if they are called without a value. + $optionhash{'config-option|O=s'} = \%config_options; + + GetOptions(%optionhash) or exit 1; if ( defined $debugflag ) { - foreach my $f ( split( ",", $debugflag ) ) { - my ( $name, $v ) = split( "=", $f ); - if ( exists $debugModes{$name} ) { - $debugModes{$name} = defined($v) ? $v : "basic"; + foreach my $f ( split $COMMA, $debugflag ) { + my ( $name, $v ) = split $EQUALS, $f; + if ( exists $debug_modes{$name} ) { + $debug_modes{$name} = defined $v ? $v : 'basic'; } else { - printf("Attempt to set unknown debug flag \"$name\".\n"); + print "Attempt to set unknown debug flag \"$name\".\n"; } } - if ( $debugModes{all} ) { - foreach my $mode ( keys(%debugModes) ) { - if ( not defined $debugModes{$mode} ) { - $debugModes{$mode} = $debugModes{all}; + if ( $debug_modes{all} ) { + foreach my $mode ( keys %debug_modes ) { + if ( not defined $debug_modes{$mode} ) { + $debug_modes{$mode} = $debug_modes{all}; } } } } - - my $mode; - - foreach my $arg ( keys %config_hash ) { - next unless defined $config_hash{$arg}; - $mode = $arg; - $have_allfns_option++; - } if (@ranks) { - $rank_rng = rng_convert_from_user( shift(@ranks) ); + $rank_rng = rng_convert_from_user( shift @ranks ); foreach my $rank (@ranks) { rng_merge( $rank_rng, rng_convert_from_user($rank) ); } } ***The diff for this file has been truncated for email.*** From codesite-noreply at google.com Thu Oct 1 22:08:48 2009 From: codesite-noreply at google.com (codesite-noreply at google.com) Date: Thu, 01 Oct 2009 21:08:48 +0000 Subject: [padb-devel] [padb] r273 committed - Add "mpirun" as a valid resource manager. This works by reading... Message-ID: <001485f54712b8c0940474e60d7f@google.com> Revision: 273 Author: apittman Date: Thu Oct 1 14:07:46 2009 Log: Add "mpirun" as a valid resource manager. This works by reading the data in any mpirun process it finds and launching using pdsh. This should make padb available on all supercomputers with software stack conforming to the MPI standard. http://code.google.com/p/padb/source/detail?r=273 Modified: /trunk/src/padb ======================================= --- /trunk/src/padb Thu Oct 1 12:38:38 2009 +++ /trunk/src/padb Thu Oct 1 14:07:46 2009 @@ -23,6 +23,13 @@ # Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA # Revision history + +# Version 3.? +# * Add "mpirun" as a resource manager, this causes it walk the local +# process list looking for processes called mpirun and to get the pid +# and hostlist by reading data from Mpir_Proctable as specified in the +# origional paper. Padb then launches itself via pdsh. +# # Version 3.0 # * Full-duplex communication between inner and outer processes, padb no # longer simply sends request on the command line and processes the @@ -306,6 +313,13 @@ my %rmgr; +$rmgr{mpirun} = { + get_active_jobs => \&mpirun_get_jobs, + job_is_running => \&local_job_is_running, + setup_pcmd => \&mpirun_setup_pcmd, + require_inner_callback => 1, +}; + $rmgr{rms} = { is_installed => \&rms_is_installed, get_active_jobs => \&rms_get_jobs, @@ -329,7 +343,6 @@ get_active_jobs => \&open_get_jobs, setup_pcmd => \&open_setup_pcmd, cleanup_pcmd => \&open_cleanup_pcmd, - find_pids => \&open_find_pids, }; $rmgr{'lsf-rms'} = { @@ -403,6 +416,7 @@ $conf{interval} = '10s'; $conf{watch_clears_screen} = 'enabled'; $conf{scripts} = 'bash,sh,dash,ash,perl,xterm'; +$conf{mpirun} = 'mpirun,orterun,srun,mpdrun,prun'; $conf{lsf_job_offset} = 1; $conf{local_fd_name} = '/dev/null'; $conf{inner_callback} = 'disabled'; @@ -2723,7 +2737,7 @@ my $cmd = "orterun -machinefile $fn -np $i $prefix"; my $hosts = $#hosts + 1; - return ( $cmd, $open_jobs{$job}{nprocs}, $hosts ); + return ( $cmd, $open_jobs{$job}{nprocs}, $hosts, $open_jobs{$job}{ranks} ); } sub open_cleanup_pcmd { @@ -2793,6 +2807,77 @@ return ( $cmd, $ncpus ); } + +############################################################################### +# +# mpirun support. +# +############################################################################### + +sub mpirun_get_jobs { + my $user = shift; + + my @jobs; + + my %mpirun; + + map { $mpirun{$_}++ } split $COMMA, $conf{mpirun}; + + foreach my $pid ( get_process_list($user) ) { + my $name = find_from_status( $pid, "Name" ); + if ( defined $mpirun{$name} ) { + push @jobs, $pid; + next; + } + + my $link = readlink "/proc/$pid/exe"; + next unless defined $link; + if ( defined $mpirun{ basename($link) } ) { + push @jobs, $pid; + } + } + return @jobs; +} + +sub mpirun_setup_pcmd { + my ($job) = @_; + + if ( not find_exe('pdsh') ) { + printf("mpirun resource manager requires pdsh to be installed\n"); + return; + } + + my $gdb = gdb_start(); + if ( not gdb_attach( $gdb, $job ) ) { + return; + } + + my $nprocs = gdb_read_value( $gdb, "MPIR_proctable_size" ); + + my %pt; + foreach my $proc ( 0 .. ( $nprocs - 1 ) ) { + my $hostd = gdb_read_value( $gdb, "MPIR_proctable[$proc].host_name" ); + + if ( $hostd =~ m{\"(\w+)\"\z}x ) { + my $host = $1; + + my $pid = gdb_read_value( $gdb, "MPIR_proctable[$proc].pid" ); + + $pt{$host}{$proc} = $pid; + } + } + + gdb_detach($gdb); + gdb_quit($gdb); + + my @hosts = keys(%pt); + + my $hlist = join q{,}, @hosts; + + my $cmd = "pdsh -w $hlist"; + my $hc = @hosts; + return ( $cmd, $nprocs, $hc, \%pt ); +} ############################################################################### # @@ -3604,8 +3689,8 @@ # configuration options. # XXX: Need to send over scripts and other stuff here as well. - if ( $conf{rmgr} eq 'orte' ) { - $req->{orte_data} = $open_jobs{ $comm_data->{jobid} }{ranks}; + if ( defined $comm_data->{pd} ) { + $req->{pd} = $comm_data->{pd}; } $req->{cinner} = \%cinner; @@ -4208,10 +4293,7 @@ } sub go_parallel { - my $jobid = shift; - my $cmd = shift; - my $nprocesses = shift; - my $nhosts = shift; + my ( $jobid, $cmd, $nprocesses, $nhosts, $pd ) = @_; my $comm_data; @@ -4229,6 +4311,12 @@ $cdata{event_cb} = \&handle_event_from_port; $comm_data->{sockets}{$sl} = \%cdata; } + + if ( defined $pd ) { + debug_log( 'verbose', $pd, + 'Remote process data available on frontend' ); + $comm_data->{pd} = $pd; + } map { $cmd .= " --$_=\"$cinner_cmd{$_}\"" } keys %cinner_cmd; @@ -4375,7 +4463,7 @@ # Setup whatever is needed for running parallel commands, note this # might involve setting environment variables. - my ( $cmd, $ncpus, $hosts ) = setup_pcmd($jobid); + my ( $cmd, $ncpus, $hosts, $pd ) = setup_pcmd($jobid); $conf{verbose} && defined $ncpus && print "Job has $ncpus process(es)\n"; $conf{verbose} && defined $hosts && print "Job spans $hosts host(s)\n"; @@ -4386,11 +4474,10 @@ $cmd .= " $0 --inner"; if ( not defined $hosts ) { - print "Full duplex mode needs to know the host count\n"; - print "Which is doesn't for this resource manager: $conf{rmgr}\n"; + print "Fatal problem setting up the resource manager: $conf{rmgr}\n"; return 1; } - my $errors = go_parallel( $jobid, $cmd, $ncpus, $hosts ); + my $errors = go_parallel( $jobid, $cmd, $ncpus, $hosts, $pd ); debug_log( 'verbose', undef, 'Completed command' ); @@ -6991,17 +7078,6 @@ } return; } - -sub open_find_pids { - my $job = shift; - - my $hostname = $inner_conf{hostname}; - - foreach my $rank ( keys %{ $inner_conf{orte_data}{$hostname} } ) { - maybe_show_pid( $rank, $inner_conf{orte_data}{$hostname}{$rank} ); - } - return; -} sub rms_find_pids { my $jobid = shift; @@ -7289,13 +7365,17 @@ sub inner_find_pids { my ( $netdata, $cmd ) = @_; - if ( $inner_conf{rmgr} eq 'orte' ) { - $inner_conf{orte_data} = $cmd->{orte_data}; - } - - # Query the resource manager to find the pids, they'll be added to the - # "all_pids" array. - $rmgr{ $inner_conf{rmgr} }{find_pids}( $inner_conf{jobid} ); + if ( defined $cmd->{pd} ) { + my $hostname = $inner_conf{hostname}; + foreach my $rank ( keys %{ $cmd->{pd}{$hostname} } ) { + maybe_show_pid( $rank, $cmd->{pd}{$hostname}{$rank} ); + } + } else { + + # Query the resource manager to find the pids, they'll be added to + # the "all_pids" array. + $rmgr{ $inner_conf{rmgr} }{find_pids}( $inner_conf{jobid} ); + } convert_pids_to_child_pids(); From codesite-noreply at google.com Tue Oct 6 17:30:00 2009 From: codesite-noreply at google.com (codesite-noreply at google.com) Date: Tue, 06 Oct 2009 16:30:00 +0000 Subject: [padb-devel] [padb] r274 committed - Update the version number to 3.0 Message-ID: <0016363b7bdede0003047546bd7b@google.com> Revision: 274 Author: apittman Date: Tue Oct 6 09:29:10 2009 Log: Update the version number to 3.0 http://code.google.com/p/padb/source/detail?r=274 Modified: /branches/3.0/src/Makefile /branches/3.0/src/padb ======================================= --- /branches/3.0/src/Makefile Tue Sep 15 14:01:13 2009 +++ /branches/3.0/src/Makefile Tue Oct 6 09:29:10 2009 @@ -1,7 +1,7 @@ INSTALL_DIR=/usr/local/ CONFIG_DIR=/etc -VERSION=3.0-rc2 +VERSION=3.0 FILES = Makefile minfo.c mpi_interface.h padb ======================================= --- /branches/3.0/src/padb Tue Sep 15 14:01:13 2009 +++ /branches/3.0/src/padb Tue Oct 6 09:29:10 2009 @@ -250,7 +250,7 @@ # Main. my $prog = basename $0; -my $version = "3.0-rc2"; +my $version = "3.0"; my %conf; From codesite-noreply at google.com Tue Oct 6 17:39:19 2009 From: codesite-noreply at google.com (codesite-noreply at google.com) Date: Tue, 06 Oct 2009 16:39:19 +0000 Subject: [padb-devel] [padb] r275 committed - Update the web page to point to the new 3.0 release. Message-ID: <0016e64714d036945f047546dfbe@google.com> Revision: 275 Author: apittman Date: Tue Oct 6 09:38:43 2009 Log: Update the web page to point to the new 3.0 release. http://code.google.com/p/padb/source/detail?r=275 Modified: /trunk/doc/download.html /trunk/doc/index.html ======================================= --- /trunk/doc/download.html Tue Sep 15 14:06:16 2009 +++ /trunk/doc/download.html Tue Oct 6 09:38:43 2009 @@ -2,15 +2,13 @@

Beta release

-A 3.0 RC2 beta release is on-line at the google downloads section. The target date for this relese is 19-8-09. +No beta releases are available at this time.

Current stable release

-The latest stable release, 2.5 can be downloaded direct from the google downloads section. - +The latest stable release, 3.0 can be downloaded direct from the google downloads section.

-This is first "stable" release since the Quarics 2.2 version.

Source code download

======================================= --- /trunk/doc/index.html Tue Sep 15 14:06:16 2009 +++ /trunk/doc/index.html Tue Oct 6 09:38:43 2009 @@ -18,6 +18,7 @@

Recent News

    +
  • 06-10-09: Final 3.0 release avaliable for download
  • 15-09-09: 3.0-rc2 avaliable for download
  • 01-09-09: A 3.0-rc release is avaliable to download from the downloads page. From codesite-noreply at google.com Tue Oct 6 19:50:58 2009 From: codesite-noreply at google.com (codesite-noreply at google.com) Date: Tue, 06 Oct 2009 18:50:58 +0000 Subject: [padb-devel] [padb] r276 committed - Update the menu link to reflect the new release. Message-ID: <001636c5be01fe8f3c047548b5ec@google.com> Revision: 276 Author: apittman Date: Tue Oct 6 11:49:59 2009 Log: Update the menu link to reflect the new release. http://code.google.com/p/padb/source/detail?r=276 Modified: /trunk/doc/download.html /trunk/doc/header.html ======================================= --- /trunk/doc/download.html Tue Oct 6 09:38:43 2009 +++ /trunk/doc/download.html Tue Oct 6 11:49:59 2009 @@ -6,10 +6,9 @@

    Current stable release

    -The latest stable release, 3.0 can be downloaded direct from the google downloads section. +The latest stable release, 3.0 can be downloaded direct from the google downloads section. This release was made on 06-10-09.

    -

    Source code download

    Source code can be downloaded direct from google at the ======================================= --- /trunk/doc/header.html Mon Sep 14 01:56:33 2009 +++ /trunk/doc/header.html Tue Oct 6 11:49:59 2009 @@ -37,8 +37,7 @@
  • Download -
    Beta 3.0 -
    Stable 2.5 +
    3.0 Release
    Source code
  • Patches From codesite-noreply at google.com Tue Oct 6 20:10:13 2009 From: codesite-noreply at google.com (codesite-noreply at google.com) Date: Tue, 06 Oct 2009 19:10:13 +0000 Subject: [padb-devel] [padb] r277 committed - Add SVN keywords for the file, URL and date in the comments, revision... Message-ID: <001485f9f428dc9d4a047548fa51@google.com> Revision: 277 Author: apittman Date: Tue Oct 6 12:09:23 2009 Log: Add SVN keywords for the file, URL and date in the comments, revision in the comment and also in the version string reported to users. http://code.google.com/p/padb/source/detail?r=277 Modified: /trunk/src/padb ======================================= --- /trunk/src/padb Thu Oct 1 14:07:46 2009 +++ /trunk/src/padb Tue Oct 6 12:09:23 2009 @@ -2,6 +2,10 @@ # padb. a simple parallel debugging aid. +# $URL$ +# $Date$ +# $Revision$ + # For help and support visit http://padb.pittman.org.uk # or email padb-users at pittman.org.uk @@ -273,9 +277,16 @@ # Outer main # Inner # Main. + +my $svn_revision_string = '$Revision$'; +my $svn_revision = 'unknown'; + +if ( $svn_revision_string =~ m{(\d+)} ) { + $svn_revision = $1; +} my $prog = basename $0; -my $version = '3.0-rc1'; +my $version = "3.n (Revision $svn_revision)"; my %conf; From codesite-noreply at google.com Wed Oct 7 18:57:11 2009 From: codesite-noreply at google.com (codesite-noreply at google.com) Date: Wed, 07 Oct 2009 17:57:11 +0000 Subject: [padb-devel] [padb] r278 committed - Add locals and params to tree based stack traces! Pass back... Message-ID: <0016368e1fec7cd7fd04755c13b8@google.com> Revision: 278 Author: apittman Date: Wed Oct 7 10:56:40 2009 Log: Add locals and params to tree based stack traces! Pass back the tree as normal but add target_key_pairs() for the names and values of variables in the code. Reconstruct these in the display_tree() function and show each variable along with it's values and rank_specs which exehibit those values. Limit the maximun number of distinct values which are displayed using the max_distinct_values setting (default 3). This code is functional but isn't tidy and adds a good deal of mess which needs to be cleaned up. http://code.google.com/p/padb/source/detail?r=278 Modified: /trunk/src/padb ======================================= --- /trunk/src/padb Tue Oct 6 12:09:23 2009 +++ /trunk/src/padb Wed Oct 7 10:56:40 2009 @@ -2092,175 +2092,6 @@ return 1; } - -# This function returns an reference to an array of hashes, each hash -# containing the "txt" of the function name and a further array of hash -# references called "children". -sub _make_tree { - my ( $level, $lines, $trange ) = @_; - - my @peers; - my $prev; - my $tag = rng_shift($trange); - - if (0) { - debug_log( 'tree', undef, 'called tag:%s, level:%d tags %s', - $tag, $level, rng_convert_to_user($trange) ); - } - - return if ( not defined $tag ); - return if ( not defined $lines->{$tag} ); - - my @identical; - my $different_rng; - - my $endlevel = $level; - - # Populate the two lists, @identical and @different - my $line = $lines->{$tag}[$level]; - if ( defined $line ) { - $different_rng = rng_create_empty(); - while ( defined( my $tag2 = rng_shift($trange) ) ) { - if ( defined $lines->{$tag2}[$level] - and $line eq $lines->{$tag2}[$level] ) - { - push @identical, $tag2; - } else { - rng_add_value( $different_rng, $tag2 ); - } - } - } else { - $different_rng = $trange; - } - - # Move $endlevel on as far as we can... - if ( $#identical >= 0 ) { - my $nextidentical; - do { - $nextidentical = 0; - my $nextfound = 0; - $endlevel++; - if ( defined $lines->{$tag}[$endlevel] ) { - foreach my $tag2 (@identical) { - if ( defined $lines->{$tag2}[$endlevel] - and $lines->{$tag}[$endlevel] eq - $lines->{$tag2}[$endlevel] ) - { - $nextfound++; - } - } - } - if ( ( $#identical + 1 ) == $nextfound ) { - $nextidentical = 1; - } - } while $nextidentical; - $endlevel--; - } else { - $endlevel = ( $#{ $lines->{$tag} } ); - } - - if (0) { - debug_log( - 'tree', - undef, - "level $level, endlevel $endlevel, identical:%s different: %s", - rng_convert_to_user( rng_create_from_array(@identical) ), - rng_convert_to_user($different_rng) - ); - } - - for ( my $l = $level ; $l <= $endlevel ; $l++ ) { - - my %this; - $this{txt} = $lines->{$tag}[$l]; - - # @{ $this{vps} } = ( $tag, @identical ); - - # The plus two here is one for $tag and one to convert from array - # idx to number of entries. - $this{vpcount} = $#identical + 2; - $this{vpspec} = - rng_convert_to_user( rng_create_from_array( ( $tag, @identical ) ) ); - - if ( defined $prev ) { - push @{ $prev->{children} }, \%this; - } else { - push @peers, \%this; - } - - $prev = \%this; - - } - - if ( $#identical >= 0 ) { - - if ( $endlevel != $#{ $lines->{$tag} } + 1 ) { - unshift @identical, $tag; - } - - my $r = rng_create_from_array(@identical); - $prev->{children} = _make_tree( $endlevel + 1, $lines, $r ); - } - - if (0) { - debug_log( - 'tree', - undef, -"returning level:$level endlevel:$endlevel identical:%s different: %s", - rng_convert_to_user( rng_create_from_array(@identical) ), - rng_convert_to_user($different_rng) - ); - } - - if ( not rng_empty($different_rng) ) { - my $new = _make_tree( $level, $lines, $different_rng ); - push @peers, ( @{$new} ); - } - - return \@peers; -} - -# Convert the rank output into a tree based form suitable for use with show_tree. -sub make_tree { - my ($lines) = @_; - my $rng = rng_create_from_array( keys %{$lines} ); - return _make_tree( 0, $lines, $rng ); -} - -# Takes a ref to a array of hashes... -sub _show_tree { - - my ( $ref, $parent, $indent ) = @_; - - my $ret = $EMPTY_STRING; - - # Don't need to sort the peers here because make_tree ensures the are - # sorted. - # my @peers = sort { $a->{vps}[0] <=> $b->{vps}[0] } ( @{$ref} ); - - my @peers = @{$ref}; - foreach my $peer (@peers) { - - if ( $#peers != 0 or $parent ne $peer->{vpspec} ) { - $ret .= "$indent-----------------\n"; - $ret .= "$indent$peer->{vpspec} ($peer->{vpcount} processes)\n"; - $ret .= "$indent-----------------\n"; - } - - $ret .= "$indent$peer->{txt}\n"; - if ( defined $peer->{children} ) { - $ret .= - _show_tree( $peer->{children}, $peer->{vpspec}, "$indent " ); - } - } - return $ret; -} - -sub show_tree { - my $ref = shift; - debug_log( 'tree', $ref, 'Complete tree' ); - return _show_tree( $ref, "no-parent", $EMPTY_STRING ); -} ############################################################################### # @@ -3362,7 +3193,7 @@ my $mode = $req->{mode}; if ( defined $req->{out_format} ) { - complex_output_handler( $req->{out_format}, $lines ); + complex_output_handler( $req->{out_format}, $lines, $d ); } else { my $nprocesses = keys %{ $d->{target_output} }; foreach my $process ( sort { $a <=> $b } keys %{ $d->{target_output} } ) @@ -3402,7 +3233,7 @@ } sub _display_tree { - my ( $tree, $parent, $indent ) = @_; + my ( $tree, $d, $parent, $indent, $path, $enforce_spec ) = @_; my $ret = $EMPTY_STRING; @@ -3410,39 +3241,141 @@ my @peers = sort { $tree->{$a}->{min} <=> $tree->{$b}->{min} } keys %{$tree}; + # This is ugly, dip inside the mode_options for the only mode which + # sets this value. + my $max_show = $conf{mode_options}{stack}{max_distinct_values}; + + my $child_enforce_spec = 0; foreach my $peer (@peers) { my $vpspec = rng_convert_to_user( $tree->{$peer}->{range} ); - if ( $#peers != 0 or $parent ne $vpspec ) { + if ( $#peers != 0 or $parent ne $vpspec or $enforce_spec ) { $ret .= "$indent-----------------\n"; $ret .= "$indent$vpspec ($tree->{$peer}->{count} processes)\n"; $ret .= "$indent-----------------\n"; } $ret .= "$indent$peer\n"; + if ( defined $d->{target_data} ) { + my $l = "$path,$peer"; + + if ( defined $d->{target_data}{"$peer|params"} ) { + my @params_lists = + sort keys %{ $d->{target_data}{"$peer|params"} }; + + # It's not impossible that the same function on the same + # line might have different params or locals, for example + # it could be a different binary. It's probably rare + # enough that we can ignore it however. + my @params = split $COMMA, $params_lists[0]; + + $ret .= "$indent params\n" if ( @params > 0 ); + foreach my $var (@params) { + + my $key = "$l|var|$var"; + my @values = keys %{ $d->{target_data}{$key} }; + + my $type = ''; + + my @type_list = + sort keys %{ $d->{target_data}{"$peer|param_type| $var"} }; + $type = $type_list[0]; + + $child_enforce_spec = 1; + if ( @values == 1 ) { + foreach my $value ( sort @values ) { + $ret .= + "$indent $type $var = '$value' " + . rng_convert_to_user( + $d->{target_data}{$key}{$value} ) + . "\n"; + } + } elsif ( @values > $max_show ) { + $ret .= +"$indent $type $var: \n"; + } else { + $ret .= "$indent $type $var:\n"; + foreach my $value ( sort @values ) { + $ret .= + "$indent '$value' " + . rng_convert_to_user( + $d->{target_data}{$key}{$value} ) + . "\n"; + } + } + } + } + + if ( defined $d->{target_data}{"$peer|locals"} ) { + my @locals_lists = keys %{ $d->{target_data}{"$peer| locals"} }; + + # It's not impossible that the same function on the same + # line might have different params or locals, for example + # it could be a different binary. In the case of locals + # simply load all of them. + my @locals = split $COMMA, join( q{,}, @locals_lists ); + + $ret .= "$indent locals\n" if ( @locals > 0 ); + foreach my $var (@locals) { + + my $key = "$l|var|$var"; + my @values = keys %{ $d->{target_data}{$key} }; + + my $type = ''; + + my @type_list = + sort keys %{ $d->{target_data}{"$peer|var_type| $var"} }; + $type = $type_list[0]; + + $child_enforce_spec = 1; + if ( @values == 1 ) { + foreach my $value ( sort @values ) { + $ret .= + "$indent $type $var = '$value' " + . rng_convert_to_user( + $d->{target_data}{$key}{$value} ) + . "\n"; + } + } elsif ( @values > $max_show ) { + $ret .= +"$indent $type $var: \n"; + } else { + $ret .= "$indent $type $var:\n"; + foreach my $value ( sort @values ) { + $ret .= + "$indent '$value' " + . rng_convert_to_user( + $d->{target_data}{$key}{$value} ) + . "\n"; + } + } + } + } + } if ( defined $tree->{$peer}->{desc} ) { - $ret .= - _display_tree( $tree->{$peer}->{desc}, $vpspec, "$indent " ); + $ret .= _display_tree( $tree->{$peer}->{desc}, + $d, $vpspec, "$indent ", "$path,$peer", $child_enforce_spec ); } } return $ret; } sub display_tree { - my ($tree) = @_; - return _display_tree( $tree, "no-parent", $EMPTY_STRING ); + my ( $tree, $d ) = @_; + return _display_tree( $tree, $d, "no-parent", $EMPTY_STRING, $EMPTY_STRING, + 1 ); } # An experimental new tree format. sub new_tree { - my ($lines) = @_; + my ( $lines, $d ) = @_; my %tree; debug_log( 'tree', undef, 'Making the tree' ); foreach my $tag ( sort { $a <=> $b } keys %{$lines} ) { add_tag_to_tree( \%tree, $tag, $lines->{$tag} ); } debug_log( 'tree', undef, 'Formatting the tree' ); - my $t = display_tree( \%tree ); + my $t = display_tree( \%tree, $d ); debug_log( 'tree', undef, 'Displaying the tree' ); print $t; debug_log( 'tree', undef, 'Done' ); @@ -3450,12 +3383,10 @@ } sub complex_output_handler { - my ( $output, $lines ) = @_; + my ( $output, $lines, $d ) = @_; if ( $output eq 'tree' ) { - - #print show_tree( make_tree($lines) ); - new_tree($lines); + new_tree( $lines, $d ); } elsif ( $output eq 'compress' ) { foreach my $tag ( sort { $a <=> $b } ( keys %{$lines} ) ) { @@ -5166,7 +5097,7 @@ } } - croak("Failed to extrace square braces from $str"); + croak("Failed to extract square braces from $str"); } sub gdb_extract_value_braces { @@ -6708,6 +6639,7 @@ } } + my @fl = $EMPTY_STRING; foreach my $frame ( reverse @frames ) { target_error( $vp, "error from gdb: $frame->{error}" ) @@ -6724,17 +6656,67 @@ $strip_below = undef; - my $l = output( - $vp, sprintf "%s() at %s:%s", - $function, - ( $frame->{file} || '?' ), - ( $frame->{line} || '?' ) - ); - if ( $carg->{stack_shows_params} ) { - show_stack_vars( $vp, $frame, 'params' ); - } - if ( $carg->{stack_shows_locals} ) { - show_stack_vars( $vp, $frame, 'locals' ); + my $l = sprintf "%s() at %s:%s", + $function, + ( $frame->{file} || '?' ), + ( $frame->{line} || '?' ); + + output $vp, $l; + + if ( $carg->{out_format} eq 'tree' ) { + push @fl, $l; + my $fl = join( ",", @fl ); + if ( $carg->{stack_shows_locals} ) { + my @local_names; + foreach my $loc ( @{ $frame->{locals} } ) { + push @local_names, $loc->{name}; + target_key_pair( $vp, "$l|var_type| $loc->{name}", + $loc->{type} ); + + if ( length $loc->{value} > 70 ) { + target_key_pair( + $vp, + $fl . '|var|' . $loc->{name}, + '' + ); + } else { + target_key_pair( $vp, + $fl . '|var|' . $loc->{name}, + $loc->{value} ); + } + } + target_key_pair( $vp, "$l|locals", + join( q{,}, sort @local_names ) ); + } + if ( $carg->{stack_shows_params} ) { + + my @param_names; + foreach my $par ( @{ $frame->{params} } ) { + push @param_names, $par->{name}; + target_key_pair( $vp, "$l|param_type| $par->{name}", + $par->{type} ); + if ( length $par->{value} > 70 ) { + target_key_pair( + $vp, + $fl . '|var|' . $par->{name}, + '' + ); + } else { + target_key_pair( $vp, + $fl . '|var|' . $par->{name}, + $par->{value} ); + } + } + target_key_pair( $vp, "$l|params", + join( q{,}, @param_names ) ); + } + } else { + if ( $carg->{stack_shows_params} ) { + show_stack_vars( $vp, $frame, 'params' ); + } + if ( $carg->{stack_shows_locals} ) { + show_stack_vars( $vp, $frame, 'locals' ); + } } # Strip below this function if we need to. @@ -7513,8 +7495,18 @@ # data for this node/rank. if ( defined $allfns{ $cmd->{mode} }{handler_all} ) { eval { + + # Bit of a hack here until I can fix it properly, pass on the + # output format so that the stack trace code knows when to do + # clever things in tree mode. + my $cargs = $cmd->{cargs}; + if ( defined $cmd->{out_format} ) { + $cargs->{out_format} = $cmd->{out_format}; + } else { + $cargs->{out_format} = 'raw'; + } $netdata->{target_responce} = - $allfns{ $cmd->{mode} }{handler_all}( $cmd->{cargs}, $pid_list ); + $allfns{ $cmd->{mode} }{handler_all}( $cargs, $pid_list ); 1; } or do { my $error = $@; @@ -7920,14 +7912,15 @@ arg_short => 'x', help => 'Show stack trace (see also -t)', options_i => { - gdb_retry_count => 3, + gdb_retry_count => 3, + max_distinct_values => 3, stack_strip_above => 'elan_waitWord,elan_pollWord,elan_deviceCheck,opal_condition_wait,opal_progress', stack_strip_below => 'main,__libc_start_main,start_thread', }, options_bool => { - stack_shows_params => 'no', - stack_shows_locals => 'no', + stack_shows_params => 'yes', + stack_shows_locals => 'yes', }, secondary => [ { From codesite-noreply at google.com Wed Oct 7 21:00:25 2009 From: codesite-noreply at google.com (codesite-noreply at google.com) Date: Wed, 07 Oct 2009 20:00:25 +0000 Subject: [padb-devel] [padb] r279 committed - Nested quantifiers in regex, apparantly older perl versions don't like... Message-ID: <001485f9f4283ccb2404755dccfa@google.com> Revision: 279 Author: apittman Date: Wed Oct 7 12:59:15 2009 Log: Nested quantifiers in regex, apparantly older perl versions don't like them. http://code.google.com/p/padb/source/detail?r=279 Modified: /trunk/src/padb ======================================= --- /trunk/src/padb Wed Oct 7 10:56:40 2009 +++ /trunk/src/padb Wed Oct 7 12:59:15 2009 @@ -5036,7 +5036,7 @@ if ( $str =~ m{\A # Start of str. " # Quote - ((?:[^"\\]++|\\.)*+) # Anyting which isn't \" + ((?:[^"\\]+|\\.)*) # Anyting which isn't \" " # Close quote ,? # An optional comma. (.*) # Rest of line From codesite-noreply at google.com Thu Oct 8 11:49:20 2009 From: codesite-noreply at google.com (codesite-noreply at google.com) Date: Thu, 08 Oct 2009 10:49:20 +0000 Subject: [padb-devel] [padb] r280 committed - Add a --create-secret-file option to automatically generate a secret f... Message-ID: <0016e640d42e3d301104756a372b@google.com> Revision: 280 Author: apittman Date: Thu Oct 8 03:48:22 2009 Log: Add a --create-secret-file option to automatically generate a secret file. Also change the code to read them to accept digits and full stops as well as just letters (replace \w+ with [\w\d\.]+). http://code.google.com/p/padb/source/detail?r=280 Modified: /trunk/src/padb ======================================= --- /trunk/src/padb Wed Oct 7 12:59:15 2009 +++ /trunk/src/padb Thu Oct 8 03:48:22 2009 @@ -695,6 +695,7 @@ my $full_report; my $core_stack; my $list_rmgrs; +my $create_secret; my $watch; my $local_stats; my $show_jobs; @@ -821,6 +822,7 @@ 'norc' => \$norc, 'config-file=s' => \$configfile, 'debug=s' => \$debugflag, + 'create-secret-file' => \$create_secret, ); # The primary modes, one of these only must be set. @@ -4352,6 +4354,24 @@ return 0; } + +sub create_padb_secret { + my $filename = "$ENV{HOME}/.padb-secret"; + my $FD; + if ( not open $FD, '>', $filename ) { + printf("Failed to create secret file: $!\n"); + return; + } + if ( chmod( 0600, $FD ) != 1 ) { + printf("Failed to chmod secret file: $!\n"); + return; + } + my $s = rand; + print {$FD} "secret=$s\n"; + close $FD; + print("Sucessfully created secret file ($filename)\n"); + return; +} sub find_padb_secret { @@ -4375,9 +4395,11 @@ if ( $#l != 0 ) { return; } - if ( $l[0] =~ m{\Asecret=(\w+)\Z}x ) { + if ( $l[0] =~ m{\Asecret=([\w\d\.]+)\Z}x ) { return $1; } + print "Failed to load secret from file ($file)\n"; + exit 1; } sub go_job { @@ -4394,6 +4416,7 @@ if ( not defined $secret ) { print "Error: Could not load secret file on this node\n"; + print "Use --create-secret-file to create one\n"; exit 1; } @@ -4659,6 +4682,11 @@ debug_log( 'config', \%conf, 'Finished setting configuration options' ); + if ($create_secret) { + create_padb_secret(); + exit 0; + } + if ($list_rmgrs) { foreach my $res ( sort keys %rmgr ) { my $working = 'yes'; From codesite-noreply at google.com Thu Oct 8 12:04:42 2009 From: codesite-noreply at google.com (codesite-noreply at google.com) Date: Thu, 08 Oct 2009 11:04:42 +0000 Subject: [padb-devel] [padb] r281 committed - When viewing stack vars in tree mode only pass back the list of vars... Message-ID: <0016e640d47e2f6e2104756a6e32@google.com> Revision: 281 Author: apittman Date: Thu Oct 8 04:04:19 2009 Log: When viewing stack vars in tree mode only pass back the list of vars if it's non-empty. This saves a small amount of needless network traffic in some cases. http://code.google.com/p/padb/source/detail?r=281 Modified: /trunk/src/padb ======================================= --- /trunk/src/padb Thu Oct 8 03:48:22 2009 +++ /trunk/src/padb Thu Oct 8 04:04:19 2009 @@ -6713,8 +6713,10 @@ $loc->{value} ); } } - target_key_pair( $vp, "$l|locals", - join( q{,}, sort @local_names ) ); + if ( @local_names > 0 ) { + target_key_pair( $vp, "$l|locals", + join( q{,}, sort @local_names ) ); + } } if ( $carg->{stack_shows_params} ) { @@ -6735,8 +6737,10 @@ $par->{value} ); } } - target_key_pair( $vp, "$l|params", - join( q{,}, @param_names ) ); + if ( @param_names > 0 ) { + target_key_pair( $vp, "$l|params", + join( q{,}, @param_names ) ); + } } } else { if ( $carg->{stack_shows_params} ) { From codesite-noreply at google.com Thu Oct 8 12:26:23 2009 From: codesite-noreply at google.com (codesite-noreply at google.com) Date: Thu, 08 Oct 2009 11:26:23 +0000 Subject: [padb-devel] [padb] r282 committed - Backport of r280 to the 3.0 branch, add a --create-secret-file option. Message-ID: <001485f854bebe456a04756abbda@google.com> Revision: 282 Author: apittman Date: Thu Oct 8 04:25:23 2009 Log: Backport of r280 to the 3.0 branch, add a --create-secret-file option. http://code.google.com/p/padb/source/detail?r=282 Modified: /branches/3.0/src/padb ======================================= --- /branches/3.0/src/padb Tue Oct 6 09:29:10 2009 +++ /branches/3.0/src/padb Thu Oct 8 04:25:23 2009 @@ -23,6 +23,12 @@ # Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA # Revision history +# +# Version 3.1 +# * Added a --create-secret-file option to automatically create a secret +# file +# * Allow the secret file to contain digits and dots as well as letters +# # Version 3.0 # * Full-duplex communication between inner and outer processes, padb # no longer simply sends request on the command line and processes @@ -250,7 +256,7 @@ # Main. my $prog = basename $0; -my $version = "3.0"; +my $version = "3.1"; my %conf; @@ -566,6 +572,7 @@ my $full_report; my $core_stack; my $list_rmgrs; +my $create_secret; my $watch; my $local_stats; my $show_jobs; @@ -652,6 +659,7 @@ "norc" => \$norc, "config-file=s" => \$configfile, "debug=s" => \$debugflag, + 'create-secret-file' => \$create_secret, ); my %config_hash; @@ -4138,6 +4146,24 @@ return 0; } + +sub create_padb_secret { + my $filename = "$ENV{HOME}/.padb-secret"; + my $FD; + if ( not open $FD, '>', $filename ) { + printf("Failed to create secret file: $!\n"); + return; + } + if ( chmod( 0600, $FD ) != 1 ) { + printf("Failed to chmod secret file: $!\n"); + return; + } + my $s = rand; + print {$FD} "secret=$s\n"; + close $FD; + print("Sucessfully created secret file ($filename)\n"); + return; +} sub find_padb_secret { @@ -4163,9 +4189,11 @@ if ( $#l != 0 ) { return; } - if ( $l[0] =~ /^secret=(\w+)$/ ) { + if ( $l[0] =~ /^secret=([\d\w\.]+)$/ ) { return $1; } + print "Failed to load secret from file ($file)\n"; + exit 1; } sub go_job { @@ -4182,6 +4210,7 @@ if ( not defined $secret ) { printf("Error: Could not load secret file on this node\n"); + print "Use --create-secret-file to create one\n"; exit(1); } @@ -4401,6 +4430,11 @@ } config_set( $key, $val ); } + + if ($create_secret) { + create_padb_secret(); + exit 0; + } if ($list_rmgrs) { foreach my $res ( sort( keys %rmgr ) ) { From codesite-noreply at google.com Thu Oct 8 13:37:27 2009 From: codesite-noreply at google.com (codesite-noreply at google.com) Date: Thu, 08 Oct 2009 12:37:27 +0000 Subject: [padb-devel] [padb] r283 committed - Changes to the mpirun rmgr, allow hosts with digits in the name and... Message-ID: <005045017e32df8f3304756bb919@google.com> Revision: 283 Author: apittman Date: Thu Oct 8 05:36:53 2009 Log: Changes to the mpirun rmgr, allow hosts with digits in the name and secondly avoid the use of pdsh if there is only one active host and it's the machine padb is running on. http://code.google.com/p/padb/source/detail?r=283 Modified: /trunk/src/padb ======================================= --- /trunk/src/padb Thu Oct 8 04:04:19 2009 +++ /trunk/src/padb Thu Oct 8 05:36:53 2009 @@ -2702,7 +2702,7 @@ foreach my $proc ( 0 .. ( $nprocs - 1 ) ) { my $hostd = gdb_read_value( $gdb, "MPIR_proctable[$proc].host_name" ); - if ( $hostd =~ m{\"(\w+)\"\z}x ) { + if ( $hostd =~ m{\"([\w\d]+)\"\z}x ) { my $host = $1; my $pid = gdb_read_value( $gdb, "MPIR_proctable[$proc].pid" ); @@ -2716,10 +2716,13 @@ my @hosts = keys(%pt); - my $hlist = join q{,}, @hosts; - - my $cmd = "pdsh -w $hlist"; - my $hc = @hosts; + my $cmd = $EMPTY_STRING; + if ( $hosts[0] ne hostname() or @hosts > 1 ) { + my $hlist = join q{,}, @hosts; + $cmd = "pdsh -w $hlist"; + } + + my $hc = @hosts; return ( $cmd, $nprocs, $hc, \%pt ); } From codesite-noreply at google.com Thu Oct 8 15:57:54 2009 From: codesite-noreply at google.com (codesite-noreply at google.com) Date: Thu, 08 Oct 2009 14:57:54 +0000 Subject: [padb-devel] [padb] r284 committed - Also allow dashes in hostnames and warn if they hostnames aren't match... Message-ID: <001636283fda267e2b04756db0ad@google.com> Revision: 284 Author: apittman Date: Thu Oct 8 07:57:31 2009 Log: Also allow dashes in hostnames and warn if they hostnames aren't matched. http://code.google.com/p/padb/source/detail?r=284 Modified: /trunk/src/padb ======================================= --- /trunk/src/padb Thu Oct 8 05:36:53 2009 +++ /trunk/src/padb Thu Oct 8 07:57:31 2009 @@ -2702,12 +2702,12 @@ foreach my $proc ( 0 .. ( $nprocs - 1 ) ) { my $hostd = gdb_read_value( $gdb, "MPIR_proctable[$proc].host_name" ); - if ( $hostd =~ m{\"([\w\d]+)\"\z}x ) { + if ( $hostd =~ m{\"([\w\d\-\.]+)\"\z}x ) { my $host = $1; - my $pid = gdb_read_value( $gdb, "MPIR_proctable[$proc].pid" ); - $pt{$host}{$proc} = $pid; + } else { + print "Failed to extract hostname from $hostd\n"; } } From codesite-noreply at google.com Sun Oct 11 20:34:45 2009 From: codesite-noreply at google.com (codesite-noreply at google.com) Date: Sun, 11 Oct 2009 19:34:45 +0000 Subject: [padb-devel] [padb] r285 committed - Ensure calls to output() use brackets. Message-ID: <001485f7d6ceca6a400475ade778@google.com> Revision: 285 Author: apittman Date: Sun Oct 11 12:33:54 2009 Log: Ensure calls to output() use brackets. http://code.google.com/p/padb/source/detail?r=285 Modified: /trunk/src/padb ======================================= --- /trunk/src/padb Thu Oct 8 07:57:31 2009 +++ /trunk/src/padb Sun Oct 11 12:33:54 2009 @@ -5662,7 +5662,7 @@ $proc->{gdb} = $gdb; push @all, $proc; } else { - output $vp, 'Failed to attach to to process'; + output( $vp, 'Failed to attach to to process' ); } } @@ -6110,7 +6110,7 @@ or p_die( $vp, "cant start command $cmd" ); while (<$CMD>) { chomp $_; - output $vp, $_; + output( $vp, $_ ); $lines++; } send_cont_signal($pid); @@ -6123,7 +6123,7 @@ open my $CMDS, '-|', "$cmd" or p_die $vp, 'cant fork subcommand'; while (<$CMDS>) { chomp $_; - output $vp, $_; + output( $vp, $_ ); } close $CMDS; return; @@ -6692,7 +6692,7 @@ ( $frame->{file} || '?' ), ( $frame->{line} || '?' ); - output $vp, $l; + output( $vp, $l ); if ( $carg->{out_format} eq 'tree' ) { push @fl, $l; From codesite-noreply at google.com Sun Oct 11 21:18:30 2009 From: codesite-noreply at google.com (codesite-noreply at google.com) Date: Sun, 11 Oct 2009 20:18:30 +0000 Subject: [padb-devel] [padb] r286 committed - Replace lots of instances of $# ... +1 with a more simple... Message-ID: <001636ed6b413eefc80475ae844b@google.com> Revision: 286 Author: apittman Date: Sun Oct 11 13:17:51 2009 Log: Replace lots of instances of $# ... +1 with a more simple @... http://code.google.com/p/padb/source/detail?r=286 Modified: /trunk/src/padb ======================================= --- /trunk/src/padb Sun Oct 11 12:33:54 2009 +++ /trunk/src/padb Sun Oct 11 13:17:51 2009 @@ -2480,7 +2480,7 @@ my $cmd = "mpdrun -machinefile $fn -np $i"; - my $hosts = $#hosts + 1; + my $hosts = @hosts; return ( $cmd, $d->{$job}{lastproc} + 1, $hosts ); } @@ -2530,13 +2530,13 @@ my @elems = split qr{\s*\|\s*}, $l; - if ( $#elems == 3 ) { + if ( @elems == 4 ) { my $nprocs = $elems[3]; my $name = $elems[0]; if ( $name =~ m{\A\[(\d+)\,\d+]\z}x ) { $open_jobs{$1}{nprocs} = $nprocs; } - } elsif ( $#elems == 5 ) { + } elsif ( @elems == 6 ) { my $name = $elems[1]; if ( $name =~ m{\A\[\[(\d+)\,\d+\]\,(\d+)\]}x ) { my $job = $1; @@ -2579,7 +2579,7 @@ my $prefix = find_ompi_prefix(); my $cmd = "orterun -machinefile $fn -np $i $prefix"; - my $hosts = $#hosts + 1; + my $hosts = @hosts; return ( $cmd, $open_jobs{$job}{nprocs}, $hosts, $open_jobs{$job}{ranks} ); } @@ -2773,7 +2773,7 @@ push @ok, $res; } } - if ( $#ok != 0 ) { + if ( @ok != 1 ) { print "Error, multiple resource managers detected, use -Ormgr=\n"; push @ok, 'local-fd'; @@ -2823,13 +2823,13 @@ } # One resource manager is installed, good. - if ( $#installed == 0 ) { + if ( @installed == 1 ) { setup_rmgr( $installed[0] ); return; } # No resource managers are installed, bad. - if ( $#installed == -1 ) { + if ( @installed == 0 ) { print "Error, multiple resource managers detected, use -Ormgr=\n"; push @installed, 'local-fd'; @@ -2841,13 +2841,13 @@ my @active; foreach my $res (@installed) { my @jobs = $rmgr{$res}{get_active_jobs}($user); - if ( $#jobs != -1 ) { + if ( @jobs != 0 ) { push @active, $res; } } # Only one resource manager has active jobs, let's use it. - if ( $#active == 0 ) { + if ( @active == 1 ) { setup_rmgr( $active[0] ); return; } @@ -3230,8 +3230,7 @@ $tree->{$line}{min} = $tag; } $tree->{$line}{count}++; - my $rem = $#{$output}; - if ( $rem >= 0 ) { + if ( @{$output} > 0 ) { add_tag_to_tree( \%{ $tree->{$line}{desc} }, $tag, $output ); } return; @@ -3254,7 +3253,7 @@ foreach my $peer (@peers) { my $vpspec = rng_convert_to_user( $tree->{$peer}->{range} ); - if ( $#peers != 0 or $parent ne $vpspec or $enforce_spec ) { + if ( @peers != 1 or $parent ne $vpspec or $enforce_spec ) { $ret .= "$indent-----------------\n"; $ret .= "$indent$vpspec ($tree->{$peer}->{count} processes)\n"; $ret .= "$indent-----------------\n"; @@ -3562,7 +3561,7 @@ while ( @{$a} ) { foreach my $joint (@joints) { my @children = splice @{$a}, 0, $width; - if ( $#children > -1 ) { + if ( @children > 0 ) { push @leaves, @children; @{ $comm_tree{$joint}{children} } = @children; } @@ -3669,7 +3668,7 @@ sub next_command { my $comm_data = shift; - if ( $#commands == -1 ) { + if ( @commands == 0 ) { return last_command(); } @@ -3914,7 +3913,7 @@ # Children connect back with "Hello $outerkey $hostname $port $innernkey"; my @words = split $SPACE, $line; - if ( $#words != 4 or $words[0] ne 'Hello' or $words[1] ne $secret ) { + if ( @words != 5 or $words[0] ne 'Hello' or $words[1] ne $secret ) { print "Bad signon $line\n"; return 0; } @@ -3934,7 +3933,7 @@ sub inner_stdout_cb { my ( $comm_data, $cdata, $line ) = @_; my @words = split $SPACE, $line; - if ( $#words == 3 and $words[0] eq 'connect' ) { + if ( @words == 4 and $words[0] eq 'connect' ) { handle_signon( $comm_data, $words[1], $words[2], $words[3] ); return; @@ -4093,7 +4092,7 @@ my ($rg) = @_; # Return undef if this range is empty. - return if ( $#{$rg} == -1 ); + return if ( @{$rg} == 0 ); my $value = $rg->[0]->{l}; if ( $rg->[0]->{l} == $rg->[0]->{u} ) { @@ -4395,7 +4394,7 @@ } my @l = slurp_file($file); - if ( $#l != 0 ) { + if ( @l != 1 ) { return; } if ( $l[0] =~ m{\Asecret=([\w\d\.]+)\Z}x ) { @@ -4704,7 +4703,7 @@ if ( $working eq 'yes' ) { print "$r: "; my @jobs = $rmgr{$res}{get_active_jobs}($user); - if ( $#jobs > -1 ) { + if ( @jobs > 0 ) { my $j = join q{ }, sort { $a <=> $b } @jobs; print "jobs($j)\n"; } else { @@ -4782,12 +4781,12 @@ } if ( $all or $any ) { - if ( $#ARGV != -1 ) { + if ( @ARGV != 0 ) { cmdline_error( "$prog: Error: --all incompatible with specific ids\n"); } } elsif ( !$input_file ) { - if ( $#ARGV == -1 ) { + if ( @ARGV == 0 ) { cmdline_error( "$prog: Error: no jobs specified, use --all or jobids\n"); } @@ -4837,13 +4836,13 @@ find_any_rmgr($user); @jobids = get_all_jobids($user); - printf "Active jobs (%d) are @jobids\n", $#jobids + 1 + printf "Active jobs (%d) are @jobids\n", @jobids if $conf{verbose}; - if ( $#jobids == -1 ) { + if ( @jobids == 0 ) { printf "No active jobs could be found for user '$user'\n"; exit 1; } - if ( $any && $#jobids != 0 ) { + if ( $any && @jobids > 1 ) { printf "More than 1 active job (@jobids) for user '$user'\n"; exit 1; } @@ -4859,7 +4858,7 @@ } } - if ( $#jobids > 0 and $watch ) { + if ( @jobids > 1 and $watch ) { print "Cannot use --watch with more than one job\n"; exit 1; } @@ -4867,7 +4866,7 @@ foreach my $jobid (@jobids) { print "\nCollecting information for job '$jobid'\n\n" - if ( $conf{verbose} or ( $#jobids > 0 ) ); + if ( $conf{verbose} or ( @jobids > 1 ) ); my $of; $of = 'tree' if $output_tree; @@ -5700,7 +5699,7 @@ my $are = 'are'; my $have = 'have'; - if ( $#identical == 0 ) { + if ( @identical == 1 ) { $members = 'member'; $are = 'is'; $have = 'has'; @@ -5723,7 +5722,7 @@ my %tg; - if ( $#target_groups != -1 ) { + if ( @target_groups != 0 ) { foreach my $gid (@target_groups) { $tg{$gid}++; } @@ -5742,7 +5741,7 @@ $gid = "$gd->{id}($gd->{ranks}{0})"; } - if ( $#target_groups != -1 ) { + if ( @target_groups != 0 ) { next unless defined $tg{$gid}; } @@ -5769,7 +5768,7 @@ foreach my $gid ( sort keys %ad ) { - if ( $#target_groups != -1 ) { + if ( @target_groups != 0 ) { next unless defined $tg{$gid}; } @@ -5832,7 +5831,7 @@ push @inactive, $ident; } } - if ( $#inactive != -1 ) { + if ( @inactive != 0 ) { $ret .= $gstr . mpi_go_deadlock_detect_helper( 'not in a call to the collectives', @@ -6052,7 +6051,7 @@ foreach my $thread ( sort { $a->{id} <=> $b->{id} } @threads ) { my @frames = @{ $thread->{frames} }; - print "ThreadId: $thread->{id}\n" if ( $#threads != 0 ); + print "ThreadId: $thread->{id}\n" if ( @threads != 1 ); for ( my $i = $#frames ; $i >= 0 ; $i-- ) { my $frame = $frames[$i]; @@ -6653,7 +6652,7 @@ next unless defined $thread->{frames}; my @frames = @{ $thread->{frames} }; - output( $vp, "ThreadId: $thread->{id}" ) if ( $#threads != 0 ); + output( $vp, "ThreadId: $thread->{id}" ) if ( @threads != 1 ); my $strip_below; @@ -6967,7 +6966,7 @@ @mq = fetch_mpi_queue_gdb( $carg, $vp, $pid, $gdb ); - if ( $#mq == 0 ) { + if ( @mq == 0 ) { $good = ','; } else { foreach my $o (@mq) { @@ -7447,7 +7446,7 @@ my @children = @{ $cmd->{connection_tree}{ $inner_conf{hostname} }{children} }; - $netdata->{children} = $#children + 1; + $netdata->{children} = @children; # Only one child is tested so far. foreach my $chostname (@children) { @@ -8057,7 +8056,7 @@ common_main(); -if ( $#ARGV >= 0 and $ARGV[0] eq '--inner' ) { +if ( @ARGV > 0 and $ARGV[0] eq '--inner' ) { shift @ARGV; inner_main(); } else { From codesite-noreply at google.com Mon Oct 12 11:51:14 2009 From: codesite-noreply at google.com (codesite-noreply at google.com) Date: Mon, 12 Oct 2009 10:51:14 +0000 Subject: [padb-devel] [padb] r287 committed - Add braces to the last remaining local function call. Message-ID: <001636d339036078b50475bab547@google.com> Revision: 287 Author: apittman Date: Mon Oct 12 03:50:51 2009 Log: Add braces to the last remaining local function call. http://code.google.com/p/padb/source/detail?r=287 Modified: /trunk/src/padb ======================================= --- /trunk/src/padb Sun Oct 11 13:17:51 2009 +++ /trunk/src/padb Mon Oct 12 03:50:51 2009 @@ -6119,7 +6119,7 @@ sub run_command { my ( $vp, $cmd ) = @_; - open my $CMDS, '-|', "$cmd" or p_die $vp, 'cant fork subcommand'; + open my $CMDS, '-|', "$cmd" or p_die( $vp, 'cant fork subcommand' ); while (<$CMDS>) { chomp $_; output( $vp, $_ ); From codesite-noreply at google.com Mon Oct 12 12:02:31 2009 From: codesite-noreply at google.com (codesite-noreply at google.com) Date: Mon, 12 Oct 2009 11:02:31 +0000 Subject: [padb-devel] [padb] r288 committed - Remove function line numbers from the report so that reports ... Message-ID: <001636b2b3ffc4952c0475badd7e@google.com> Revision: 288 Author: apittman Date: Mon Oct 12 04:01:26 2009 Log: Remove function line numbers from the report so that reports from different revisions can be diffed and only show actual diffs. http://code.google.com/p/padb/source/detail?r=288 Modified: /trunk/src/report.pl ======================================= --- /trunk/src/report.pl Thu Oct 1 12:38:38 2009 +++ /trunk/src/report.pl Mon Oct 12 04:01:26 2009 @@ -84,7 +84,7 @@ } foreach my $fn (@fns) { - printf("Function: $fn $fns{$fn}{start}\n"); + printf("Function: $fn\n"); if ( defined $refs{$fn} ) { printf("\tIs dereferenced\n"); From codesite-noreply at google.com Thu Oct 15 14:31:26 2009 From: codesite-noreply at google.com (codesite-noreply at google.com) Date: Thu, 15 Oct 2009 13:31:26 +0000 Subject: [padb-devel] [padb] r289 committed - Tighten up integration between padb and minfo, the inner padb processe... Message-ID: <0016367b63e6d11a7d0475f94bd7@google.com> Revision: 289 Author: apittman Date: Thu Oct 15 06:30:47 2009 Log: Tighten up integration between padb and minfo, the inner padb processes no longer just forward all output to the outer process but rather parse it locally into a type they can understand. Forward and error or user messages from the dll or minfo back to the user and present them in a useful way. Once the inner process has parsed the information from minfo it re-formats it into the format it was before for the outer process, next on this list is to try and improve this formatting and to make the collective detection code use reduced key/pair values rather than arrays of strings. http://code.google.com/p/padb/source/detail?r=289 Modified: /trunk/src/minfo.c /trunk/src/padb ======================================= --- /trunk/src/minfo.c Thu Jun 4 11:43:09 2009 +++ /trunk/src/minfo.c Thu Oct 15 06:30:47 2009 @@ -26,6 +26,43 @@ char name[128]; int size; }; + +char *(*es)(int errorcode); + +void show_string (char *desc, char *str) +{ + printf("zzz: str:%d %s\n%s\n", + strlen(str), + desc, + str); +} + +void show_warning (const char *msg) +{ + show_string("warning",(char *)msg); +} + +void show_dll_error_code (int res) +{ + char *msg; + msg = es(res); + show_string("dllerror",msg); +} + +void die (char *msg) +{ + show_string("dmsg",msg); + show_string("exit","die"); + + fflush(NULL); + exit(0); +} + +void die_with_code (int res, char *msg) +{ + show_dll_error_code(res); + die(msg); +} #define QUERY_SIZE 1280 @@ -73,7 +110,7 @@ void show_msg (const char *msg) { - printf("message from DLL:%s\n",msg); + show_string("dlldebugmessage",(char *)msg); } char *get_msg (int msg ) @@ -196,7 +233,7 @@ { struct process *p = (struct process *)process; if ( p->rank == -1 ) - printf("Warning, DLL called find_rank before setup_process!\n"); + show_warning("DLL called find_rank before setup_process!"); return p->rank; } @@ -291,27 +328,31 @@ strncpy(local,ans,QUERY_SIZE); return 0; } - -void die (char *msg) -{ - printf("Error: %s\n",msg); - fflush(NULL); - exit(1); -} int msgid = 0; int show_comm (struct process *p,mqs_communicator *comm) { static int c = 0; - printf("comm%d: name: '%s'\n", - c,comm->name); - printf("comm%d: rank: '%d'\n", - c,(int)comm->local_rank); - printf("comm%d: size: '%d'\n", - c,(int)comm->size); - printf("comm%d: id: '%p'\n", - c,(void *)comm->unique_id); + if ( comm->local_rank >= 0 ) + printf("out: c:%d rank:%d\n", + c, + (int)comm->local_rank); + + printf("out: c:%d size:%d\n", + c, + (int)comm->size); + + printf("out: c:%d str:%d name\n%s\n", + c, + strlen(comm->name), + comm->name); + + printf("out: c:%d id:%ld\n", + c, + comm->unique_id); + + msgid=0; return c++; /* This is not a political statement although if it was I'd stand by it */ } @@ -399,7 +440,7 @@ int (*sp)(mqs_process *process,mqs_process_callbacks *pcb); int (*phq)(mqs_process *process, char **msg); void (*ucl)(mqs_process *process); - char *(*es)(int errorcode); + int (*sci)(mqs_process *process); int (*gc)(mqs_process *process, mqs_communicator *comm); int (*nc)(mqs_process *process); @@ -417,7 +458,7 @@ char dll[PATH_MAX]; void *base = find_sym("sym","MPIR_dll_name"); if ( ! base ) { - die("Could not find dll_name symbol"); + die("Could not find MPIR_dll_name symbol"); } fetch_string(NULL,&dll[0],(mqs_taddr_t)base,PATH_MAX); dlhandle = dlopen(dll,RTLD_NOW); @@ -466,25 +507,17 @@ res = si((mqs_image *)&i,&icb); if ( res != mqs_ok ) { - die("Failed mqs_setup_image"); + die_with_code(res,"setup_image() failed"); } { char *m = NULL; res = ihq((mqs_image *)&i,&m); if ( m ) { - char image[QUERY_SIZE]; - if ( fetch_image(image) == 0 ) { - printf(m,image); - printf("\n"); - } else - printf("%s\n",m); + show_string("ihqm",m); } if ( res != mqs_ok ) { - char *msg; - msg = es(res); - printf("message from DLL %d '%s'\n",res,msg); - die("Failed image_has_queues"); + die_with_code(res,"image_has_queues() failed"); } } @@ -498,7 +531,7 @@ res = sp((mqs_process *)&p,&pcb); if ( res != mqs_ok ) { - die("Failed mqs_setup_process"); + die_with_code(res,"mqs_setup_process() failed"); } if ( gr ) { @@ -512,12 +545,9 @@ char *m = NULL; res = phq((mqs_process *)&p,&m); if ( m ) - printf("%s\n",m); + show_string("phqm",m); if ( res != mqs_ok ) { - char *msg; - msg = es(res); - printf("%s\n",msg); - die("Failed process has_queue"); + die_with_code(res,"process_has_queue() failed"); } } @@ -525,7 +555,7 @@ res = sci((mqs_process *)&p); if ( res != mqs_ok ) { - die("Failed sci"); + die_with_code(res,"setup_communicator_iterator() failed"); } do { @@ -533,11 +563,7 @@ res = gc((mqs_process *)&p,&comm); if ( res != mqs_ok ) { - char *msg; - msg = es(res); - printf("gc returned %d\n",res); - printf("%s\n",msg); - die("gc"); + die_with_code(res,"get_communicator() failed"); } if ( res == mqs_ok ) { @@ -553,7 +579,7 @@ if ( r == mqs_ok ) { int i; for ( i = 0 ; i < comm.size ; i++ ) { - printf("comm%d: Rank: local %d global %d\n",c,i,ranks[i]); + printf("out: c:%d rt:%d\n",c,ranks[i]); } } free(ranks); @@ -583,12 +609,15 @@ load_ops((mqs_process *)&p,mqs_pending_sends); } + printf("done\n" + ); nres = nc((mqs_process *)&p); } } while ( res == mqs_ok && nres == mqs_ok ); - + + show_string("exit","ok"); return 0; } ======================================= --- /trunk/src/padb Mon Oct 12 03:50:51 2009 +++ /trunk/src/padb Thu Oct 15 06:30:47 2009 @@ -2669,7 +2669,7 @@ foreach my $pid ( get_process_list($user) ) { my $name = find_from_status( $pid, "Name" ); - if ( defined $mpirun{$name} ) { + if ( defined $name and defined $mpirun{$name} ) { push @jobs, $pid; next; } @@ -3880,7 +3880,8 @@ } if ( defined $allfns{$mode}{out_handler} ) { - $allfns{$mode}{out_handler}( $conf{mode_options}{$mode}, $d ); + $allfns{$mode}{out_handler}( $conf{mode_options}{$mode}, $d, + $comm_data->{current_req} ); } else { default_output_handler( $comm_data->{current_req}, $d ); } @@ -4836,7 +4837,7 @@ find_any_rmgr($user); @jobids = get_all_jobids($user); - printf "Active jobs (%d) are @jobids\n", @jobids + printf "Active jobs (%d) are @jobids\n", $#jobids + 1 if $conf{verbose}; if ( @jobids == 0 ) { printf "No active jobs could be found for user '$user'\n"; @@ -5468,7 +5469,6 @@ tracepid => -1, attached => 0, }; - my @mq; my $cmd = $inner_conf{minfo}; $h->{hpid} = open3( $h->{fd}{wtr}, $h->{fd}{rdr}, $h->{fd}{err}, $cmd ) @@ -5480,18 +5480,123 @@ my %stats; + # Communicator data. + my %cd; + + my %global; + + $global{exit} = 'unknown'; + + my @cd; + my $bytes_to_read; + my $str_name; + my $str_value = $EMPTY_STRING; + my $str_global; while (<$handle>) { my $r = $_; + + if ( defined $bytes_to_read ) { + $str_value .= $r; + if ( length $str_value eq $bytes_to_read + 1 ) { + chomp $str_value; + if ($str_global) { + $global{$str_name} = $str_value; + + if ( $str_name eq 'ihqm' ) { + my $image = readlink "/proc/$gdb->{tracepid}/exe"; + $str_value =~ s{%s}{$image}; + } + + if ( $str_name ne 'exit' and $str_name ne 'dmsg' ) { + + # Report the string back to the outer process, + # don't bother forwarding exit status as that's + # done below. + target_key_pair( $vp, "minfo_msg_$str_name", + $str_value ); + target_key_pair( $vp, "minfo_msg", $str_name ); + } + } else { + $cd{$str_name} = $str_value; + } + $bytes_to_read = undef; + $str_value = ""; + } + next; + } + chomp $r; - if ( $r =~ m{\Areq:}x ) { + my $cmd = substr $r, 0, 4; + if ( $cmd eq 'req:' ) { my $res = minfo_handle_query( $gdb, $vp, $r, \%stats ); # Some things *do* fail here, symbol lookups for example, # and we don't need to report it. print {$out} "$res\n"; + } elsif ( $cmd eq 'out:' ) { + if ( + $r =~ m{\A + out: + [ ] + c:(\d+) + [ ] + (\w+): + (\d+) + [ ]? + (\w+)? + \z + }x + ) + { + my $cid = $1; + my $key = $2; + my $value = $3; + my $name = $4; + + if ( $key eq 'str' ) { + $bytes_to_read = $value; + $str_name = $name; + $str_global = 0; + } elsif ( $key eq 'rt' ) { + push @{ $cd{rt} }, $value; + } else { + $cd{$key} = $value; + $cd{mid} = $cid; + } + } else { + target_key_pair( $vp, "UNPARSEABLE MINFO", $r ); + } + } elsif ( $cmd eq 'zzz:' ) { + if ( + $r =~ m{\A + zzz: + [ ] + (\w+): + (\d+) + [ ]? + (\w+)? + \z + }x + ) + { + my $key = $1; + my $length = $2; + my $name = $3; + + if ( $key eq 'str' ) { + $bytes_to_read = $length; + $str_name = $name; + $str_global = 1; + } + } else { + target_key_pair( $vp, "UNPARSEABLE MINFO", $r ); + } + } elsif ( $cmd eq 'done' ) { + push @cd, dclone( \%cd ); + undef %cd; } else { - push @mq, $r; + push @{ $cd{raw} }, $r; } } @@ -5509,13 +5614,44 @@ return; } - if ( $? != 0 ) { - - # Bad exit code but we did talk to it so run with what we have. - target_error( $vp, - "Error running $inner_conf{minfo}: Bad exit code $?" ); + if ( $global{exit} ne 'ok' ) { + if ( $global{exit} eq 'die' ) { + target_error( $vp, + "Error message from $inner_conf{minfo}: $global{dmsg}" ); + + } else { + target_error( $vp, + "Error running $inner_conf{minfo}: Bad exit code $?" ); + } } + return minfo_to_array( \@cd ); + +} + +sub minfo_to_array { + my ($cd) = @_; + + my @mq; + foreach my $comm ( @{$cd} ) { + + #print Dumper $comm; + push @mq, "comm$comm->{mid}: name: '$comm->{name}'"; + if ( defined $comm->{rank} ) { + push @mq, "comm$comm->{mid}: rank: '$comm->{rank}'"; + } + push @mq, "comm$comm->{mid}: size: '$comm->{size}'"; + my $id = sprintf( "%#Lx", $comm->{id} ); + push @mq, "comm$comm->{mid}: id: '$id'"; + + for my $i ( 0 .. $#{ $comm->{rt} } ) { + push @mq, "comm$comm->{mid}: Rank: local $i global $comm->{rt}[$i]"; + } + + foreach my $l ( @{ $comm->{raw} } ) { + push @mq, $l; + } + } return @mq; } @@ -5686,6 +5822,45 @@ } return $ret; } + +sub mpi_queue_output_handler { + my ( $carg, $lines, $three ) = @_; + + my %headers = ( + ihqm => 'Message from DLL', + phqm => 'Message from DLL', + dllerror => 'Error string from DLL', + warning => 'Warning message from minfo', + dlldebugmessage => 'Debug message from DLL', + ); + + if ( exists $lines->{target_data}{minfo_msg} ) { + my @keys = sort keys %{ $lines->{target_data}{minfo_msg} }; + + foreach my $key (@keys) { + my @values = keys %{ $lines->{target_data}{"minfo_msg_$key"} }; + my $head; + if ( defined $headers{$key} ) { + $head = $headers{$key}; + } else { + $head = "Message from minfo/dll using unknown key: '$key'"; + } + foreach my $value ( sort @values ) { + printf("----------------\n"); + printf( + + rng_convert_to_user( + $lines->{target_data}{"minfo_msg_$key"}{$value} + ) . + ": $head\n"); + printf("----------------\n"); + printf( "%s\n", $value ); + } + } + } + + default_output_handler( $three, $lines ); +} sub mpi_go_deadlock_detect_helper { my $str = shift; # tagged onto the end of the line. @@ -5727,6 +5902,8 @@ $tg{$gid}++; } } + + my $no_data = 0; foreach my $process ( keys %{$cd} ) { my $rd = $cd->{$process}; @@ -5750,6 +5927,9 @@ } $ad{$gid}{size} = $gd->{size}; $ad{$gid}{name} = $gd->{name}; + if ( not exists $gd->{coll} ) { + $no_data++; + } foreach my $coll ( keys %{ $gd->{coll} } ) { my $count = $gd->{coll}{$coll}{count}; if ( defined $gd->{coll}{$coll}{active} ) { @@ -5843,13 +6023,21 @@ my $count = keys %ad; + if ( $count eq $no_data ) { + $ret .= + "Total: $count communicators, no communication data recorded.\n"; + return $ret; + } + if ( $count == 1 ) { my $use_str = ( $i_count == 1 ) ? $EMPTY_STRING : ' not'; - $ret .= "Total: $count group which is$use_str in use.\n"; + $ret .= "Total: $count communicators which is$use_str in use.\n"; } else { my $i_str = ( $i_count == 1 ) ? 'is' : 'are'; - $ret .= "Total: $count groups of which $i_count $i_str in use.\n"; - } + $ret .= + "Total: $count communicators of which $i_count $i_str in use.\n"; + } + $ret .= "No data was recorded for $no_data communicators\n"; return $ret; } @@ -5887,7 +6075,7 @@ } elsif ( $line =~ /^msg\d+/ ) { ; # nop } else { - print "Failed to match minfo output: $line\n"; + #print "Failed to match minfo output: $line\n"; } } $coll_data{$rank} = \%lid; @@ -6784,8 +6972,12 @@ my %remote_env = get_remote_env($pid); if ( defined $remote_env{LD_LIBRARY_PATH} ) { - $ENV{LD_LIBRARY_PATH} = - "$remote_env{LD_LIBRARY_PATH}:$inner_conf{myld}"; + if ( defined $inner_conf{myld} ) { + $ENV{LD_LIBRARY_PATH} = + "$remote_env{LD_LIBRARY_PATH}:$inner_conf{myld}"; + } else { + $ENV{LD_LIBRARY_PATH} = "$remote_env{LD_LIBRARY_PATH}"; + } } my $cmd = "$inner_conf{edb} --queues --pid=$pid"; @@ -7852,12 +8044,13 @@ # Sort out secondary and options_i so they are handled in the same way. $allfns{queue} = { - arg_long => 'message-queue', - qsnet => 1, - arg_short => 'q', - handler => \&show_queue, - help => 'Show the message queues', - options_i => { mpi_dll => undef, } + out_handler => \&mpi_queue_output_handler, + arg_long => 'message-queue', + qsnet => 1, + arg_short => 'q', + handler => \&show_queue, + help => 'Show the message queues', + options_i => { mpi_dll => undef, } }; From codesite-noreply at google.com Thu Oct 15 17:39:15 2009 From: codesite-noreply at google.com (codesite-noreply at google.com) Date: Thu, 15 Oct 2009 16:39:15 +0000 Subject: [padb-devel] [padb] r290 committed - When displaying a list of variables with type names ensure that the va... Message-ID: <001485f7c3a487786e0475fbebb1@google.com> Revision: 290 Author: apittman Date: Thu Oct 15 09:38:57 2009 Log: When displaying a list of variables with type names ensure that the variable names are right justified and that they are no further to the right than they need to be. Apply this logic to both tree and normal stack traces, tree stacks were all over the place, normal ones were justified but always left a space between the longest type and the longest var, now we just ensure that there is at least one space between each type and it's matching var. http://code.google.com/p/padb/source/detail?r=290 Modified: /trunk/src/padb ======================================= --- /trunk/src/padb Thu Oct 15 06:30:47 2009 +++ /trunk/src/padb Thu Oct 15 09:38:57 2009 @@ -3235,6 +3235,35 @@ } return; } + +# Calculate the formatting needed for displaying a list of variables and +# their types allowing them to be neatly formatted on the screen. Take an +# array of hashes and combine {name} and {type} into a new value +# {type_name} which is the same length as {type_name} for all other entries +# in the array. +sub format_local_vars { + my ($list) = @_; + + my $max = 0; + foreach my $var ( @{$list} ) { + my $name_len = length $var->{name}; + my $type_len = length $var->{type}; + if ( $name_len + $type_len > $max ) { + $max = $name_len + $type_len; + } + } + + $max++; + + foreach my $var ( @{$list} ) { + my $name_len = length $var->{name}; + my $type_len = length $var->{type}; + my $pad = $max - ( $name_len + $type_len ); + $var->{type_name} = $var->{type} . " " x $pad . $var->{name}; + } + + return; +} sub _display_tree { my ( $tree, $d, $parent, $indent, $path, $enforce_spec ) = @_; @@ -3274,34 +3303,44 @@ my @params = split $COMMA, $params_lists[0]; $ret .= "$indent params\n" if ( @params > 0 ); + + my @all_vars; foreach my $var (@params) { - + my @type_list = + sort keys %{ $d->{target_data}{"$peer|param_type| $var"} }; + my $type = $type_list[0]; + push @all_vars, + { + name => $var, + type => $type, + }; + } + + format_local_vars( \@all_vars ); + + foreach my $vref (@all_vars) { + my $var = $vref->{name}; my $key = "$l|var|$var"; my @values = keys %{ $d->{target_data}{$key} }; - - my $type = ''; - - my @type_list = - sort keys %{ $d->{target_data}{"$peer|param_type| $var"} }; - $type = $type_list[0]; + my $type = $vref->{type}; $child_enforce_spec = 1; if ( @values == 1 ) { foreach my $value ( sort @values ) { $ret .= - "$indent $type $var = '$value' " + "$indent $vref->{type_name} = '$value' " . rng_convert_to_user( $d->{target_data}{$key}{$value} ) . "\n"; } } elsif ( @values > $max_show ) { $ret .= -"$indent $type $var: \n"; +"$indent $vref->{type_name}: \n"; } else { - $ret .= "$indent $type $var:\n"; + $ret .= "$indent $vref->{type_name}:\n"; foreach my $value ( sort @values ) { $ret .= - "$indent '$value' " + "$indent '$value' " . rng_convert_to_user( $d->{target_data}{$key}{$value} ) . "\n"; @@ -3320,34 +3359,44 @@ my @locals = split $COMMA, join( q{,}, @locals_lists ); $ret .= "$indent locals\n" if ( @locals > 0 ); + + my @all_vars; foreach my $var (@locals) { - + my @type_list = + sort keys %{ $d->{target_data}{"$peer|var_type| $var"} }; + my $type = $type_list[0]; + push @all_vars, + { + name => $var, + type => $type, + }; + } + + format_local_vars( \@all_vars ); + + foreach my $vref (@all_vars) { + my $var = $vref->{name}; my $key = "$l|var|$var"; my @values = keys %{ $d->{target_data}{$key} }; - - my $type = ''; - - my @type_list = - sort keys %{ $d->{target_data}{"$peer|var_type| $var"} }; - $type = $type_list[0]; + my $type = $vref->{type}; $child_enforce_spec = 1; if ( @values == 1 ) { foreach my $value ( sort @values ) { $ret .= - "$indent $type $var = '$value' " + "$indent $vref->{type_name} = '$value' " . rng_convert_to_user( $d->{target_data}{$key}{$value} ) . "\n"; } } elsif ( @values > $max_show ) { $ret .= -"$indent $type $var: \n"; +"$indent $vref->{type_name}: \n"; } else { - $ret .= "$indent $type $var:\n"; + $ret .= "$indent $vref->{type_name}:\n"; foreach my $value ( sort @values ) { $ret .= - "$indent '$value' " + "$indent '$value' " . rng_convert_to_user( $d->{target_data}{$key}{$value} ) . "\n"; @@ -5509,9 +5558,9 @@ if ( $str_name ne 'exit' and $str_name ne 'dmsg' ) { - # Report the string back to the outer process, - # don't bother forwarding exit status as that's - # done below. + # Report the string back to the outer process, + # don't bother forwarding exit status as that's + # done below. target_key_pair( $vp, "minfo_msg_$str_name", $str_value ); target_key_pair( $vp, "minfo_msg", $str_name ); @@ -5848,11 +5897,12 @@ foreach my $value ( sort @values ) { printf("----------------\n"); printf( - - rng_convert_to_user( + + rng_convert_to_user( $lines->{target_data}{"minfo_msg_$key"}{$value} - ) . - ": $head\n"); + ) + . ": $head\n" + ); printf("----------------\n"); printf( "%s\n", $value ); } @@ -6075,7 +6125,8 @@ } elsif ( $line =~ /^msg\d+/ ) { ; # nop } else { - #print "Failed to match minfo output: $line\n"; + + #print "Failed to match minfo output: $line\n"; } } $coll_data{$rank} = \%lid; @@ -6700,18 +6751,12 @@ my ( $vp, $frame, $type ) = @_; return unless defined $frame->{$type}; return if ( @{ $frame->{$type} } == 0 ); - my %l = ( t => 0, n => 0 ); - foreach my $arg ( @{ $frame->{$type} } ) { - $l{t} = length $arg->{type} if ( length $arg->{type} > $l{t} ); - $l{n} = length $arg->{name} if ( length $arg->{name} > $l{n} ); - } - my $header = " $type:"; - output( $vp, $header ); + + format_local_vars( $frame->{$type} ); + output( $vp, " $type:" ); foreach my $arg ( @{ $frame->{$type} } ) { my $value = ( defined $arg->{value} ? $arg->{value} : '??' ); - my $output = sprintf " %-$l{t}s %$l{n}s = %s", $arg->{type}, - $arg->{name}, $value; - output( $vp, $output ); + output( $vp, " $arg->{type_name} = $value" ); } return; } From codesite-noreply at google.com Fri Oct 16 11:32:16 2009 From: codesite-noreply at google.com (codesite-noreply at google.com) Date: Fri, 16 Oct 2009 10:32:16 +0000 Subject: [padb-devel] [padb] r291 committed - More code cleanups, mostly removing braces and replacing double quotes... Message-ID: <00504502eccaf14ebf04760ae86a@google.com> Revision: 291 Author: apittman Date: Fri Oct 16 03:31:48 2009 Log: More code cleanups, mostly removing braces and replacing double quotes with single quotes. http://code.google.com/p/padb/source/detail?r=291 Modified: /trunk/src/padb /trunk/src/report.pl ======================================= --- /trunk/src/padb Thu Oct 15 09:38:57 2009 +++ /trunk/src/padb Fri Oct 16 03:31:48 2009 @@ -481,12 +481,12 @@ $Data::Dumper::Quotekeys = 0; sub check_and_convert_bool { - my ($str) = @_; - my @yes = qw(1 yes on enabled); - my @no = qw(0 no off disabled); + my ($str) = @_; + my @enabled = qw(1 yes on true enabled); + my @disabled = qw(0 no off false disabled); my %bool_table; - map { $bool_table{$_} = 1 } @yes; - map { $bool_table{$_} = 0 } @no; + map { $bool_table{$_} = 1 } @enabled; + map { $bool_table{$_} = 0 } @disabled; if ( defined $bool_table{$str} ) { return $bool_table{$str}; @@ -2304,7 +2304,7 @@ # Query the nodecount for the "job" as that is what we shall be running on. sub slurm_job_to_nodecount { my $job = shift; - my @jobs = slurp_cmd("squeue -o %i,%D"); + my @jobs = slurp_cmd('squeue -o %i,%D'); chomp @jobs; foreach my $step (@jobs) { @@ -2497,9 +2497,8 @@ ############################################################################### sub find_ompi_prefix { - my $name = "ompi-ps"; foreach my $dir ( split $COLON, $ENV{PATH} ) { - next unless ( -x "$dir/$name" ); + next unless ( -x "$dir/ompi-ps" ); my @d = split "/", $dir; pop @d; my $prefix = join q{/}, @d; @@ -2668,7 +2667,7 @@ map { $mpirun{$_}++ } split $COMMA, $conf{mpirun}; foreach my $pid ( get_process_list($user) ) { - my $name = find_from_status( $pid, "Name" ); + my $name = find_from_status( $pid, 'Name' ); if ( defined $name and defined $mpirun{$name} ) { push @jobs, $pid; next; @@ -2687,7 +2686,7 @@ my ($job) = @_; if ( not find_exe('pdsh') ) { - printf("mpirun resource manager requires pdsh to be installed\n"); + print "mpirun resource manager requires pdsh to be installed\n"; return; } @@ -2696,7 +2695,7 @@ return; } - my $nprocs = gdb_read_value( $gdb, "MPIR_proctable_size" ); + my $nprocs = gdb_read_value( $gdb, 'MPIR_proctable_size' ); my %pt; foreach my $proc ( 0 .. ( $nprocs - 1 ) ) { @@ -3124,7 +3123,7 @@ foreach my $hash (@all) { my @res; foreach my $key (@proc_format_array) { - my $value = "??"; + my $value = '??'; if ( defined $hash->{$key} ) { $value = $hash->{$key}; } @@ -3218,7 +3217,7 @@ sub add_tag_to_tree { my ( $tree, $tag, $output ) = @_; - my $line = shift( @{$output} ); + my $line = shift @{$output}; if ( not defined $tree->{$line}{range} ) { $tree->{$line}{range} = rng_create_empty(); @@ -3259,7 +3258,7 @@ my $name_len = length $var->{name}; my $type_len = length $var->{type}; my $pad = $max - ( $name_len + $type_len ); - $var->{type_name} = $var->{type} . " " x $pad . $var->{name}; + $var->{type_name} = $var->{type} . q{ } x $pad . $var->{name}; } return; @@ -3457,7 +3456,7 @@ printf "%s\n", rng_convert_to_user($rng); print "----------------\n"; foreach my $data ( @{ $lines->{$tag} } ) { - print("$data\n"); + print "$data\n"; } } } elsif ( $output eq 'compress_c' ) { @@ -3534,7 +3533,7 @@ Proto => 'tcp', ) or die "Failed to connect to child ($host:$port)"; - print $socket "hello $word\n"; + print {$socket} "hello $word\n"; return $socket; } @@ -3780,7 +3779,7 @@ if ( not defined $data->{target_data}{FOUND}{yes} or rng_empty( $data->{target_data}{FOUND}{yes} ) ) { - printf "Warning, failed to locate any ranks\n"; + print "Warning, failed to locate any ranks\n"; return 1; } @@ -4411,17 +4410,17 @@ my $filename = "$ENV{HOME}/.padb-secret"; my $FD; if ( not open $FD, '>', $filename ) { - printf("Failed to create secret file: $!\n"); + print "Failed to create secret file: $!\n"; return; } if ( chmod( 0600, $FD ) != 1 ) { - printf("Failed to chmod secret file: $!\n"); + print "Failed to chmod secret file: $!\n"; return; } my $s = rand; print {$FD} "secret=$s\n"; close $FD; - print("Sucessfully created secret file ($filename)\n"); + print "Sucessfully created secret file ($filename)\n"; return; } @@ -4609,7 +4608,7 @@ my $max_len = 0; foreach my $key ( sort keys %{ $conf{mode_options}{$mode} } ) { - if ( length($key) > $max_len ) { + if ( length $key > $max_len ) { $max_len = length($key); } } @@ -4889,11 +4888,11 @@ printf "Active jobs (%d) are @jobids\n", $#jobids + 1 if $conf{verbose}; if ( @jobids == 0 ) { - printf "No active jobs could be found for user '$user'\n"; + print "No active jobs could be found for user '$user'\n"; exit 1; } if ( $any && @jobids > 1 ) { - printf "More than 1 active job (@jobids) for user '$user'\n"; + print "More than 1 active job (@jobids) for user '$user'\n"; exit 1; } } else { @@ -5450,9 +5449,8 @@ sub gdb_string { my ( $gdb, $len, $strp ) = @_; - my $offset = 0; - my $str = $EMPTY_STRING; - my @s = gdb_read_raw( $gdb, $strp, $len ); + my $str = $EMPTY_STRING; + my @s = gdb_read_raw( $gdb, $strp, $len ); return if ( not defined $s[0] ); foreach my $d (@s) { my $v = hex $d; @@ -5563,7 +5561,7 @@ # done below. target_key_pair( $vp, "minfo_msg_$str_name", $str_value ); - target_key_pair( $vp, "minfo_msg", $str_name ); + target_key_pair( $vp, 'minfo_msg', $str_name ); } } else { $cd{$str_name} = $str_value; @@ -5614,7 +5612,7 @@ $cd{mid} = $cid; } } else { - target_key_pair( $vp, "UNPARSEABLE MINFO", $r ); + target_key_pair( $vp, 'UNPARSEABLE MINFO', $r ); } } elsif ( $cmd eq 'zzz:' ) { if ( @@ -5639,7 +5637,7 @@ $str_global = 1; } } else { - target_key_pair( $vp, "UNPARSEABLE MINFO", $r ); + target_key_pair( $vp, 'UNPARSEABLE MINFO', $r ); } } elsif ( $cmd eq 'done' ) { push @cd, dclone( \%cd ); @@ -5690,7 +5688,7 @@ push @mq, "comm$comm->{mid}: rank: '$comm->{rank}'"; } push @mq, "comm$comm->{mid}: size: '$comm->{size}'"; - my $id = sprintf( "%#Lx", $comm->{id} ); + my $id = sprintf "%#Lx", $comm->{id}; push @mq, "comm$comm->{mid}: id: '$id'"; for my $i ( 0 .. $#{ $comm->{rt} } ) { @@ -5895,21 +5893,20 @@ $head = "Message from minfo/dll using unknown key: '$key'"; } foreach my $value ( sort @values ) { - printf("----------------\n"); - printf( - - rng_convert_to_user( - $lines->{target_data}{"minfo_msg_$key"}{$value} - ) - . ": $head\n" - ); - printf("----------------\n"); - printf( "%s\n", $value ); + print "----------------\n"; + printf + + rng_convert_to_user( + $lines->{target_data}{"minfo_msg_$key"}{$value} ) + . ": $head\n"; + print "----------------\n"; + printf "%s\n", $value; } } } default_output_handler( $three, $lines ); + return; } sub mpi_go_deadlock_detect_helper { ======================================= --- /trunk/src/report.pl Mon Oct 12 04:01:26 2009 +++ /trunk/src/report.pl Fri Oct 16 03:31:48 2009 @@ -14,7 +14,7 @@ my @builtin = qw(basename Dumper qw Configure dclone join defined delete print shift close accept confess A new thaw add - remove count flush handles printf split dirname); + remove count flush handles printf split dirname ref); my %builtin; From codesite-noreply at google.com Fri Oct 16 11:53:43 2009 From: codesite-noreply at google.com (codesite-noreply at google.com) Date: Fri, 16 Oct 2009 10:53:43 +0000 Subject: [padb-devel] [padb] r292 committed - Use gdb_string to read the hostname string in setup_mpirun_pcmd. Message-ID: <0016e68ee20fa96d5d04760b35d9@google.com> Revision: 292 Author: apittman Date: Fri Oct 16 03:52:39 2009 Log: Use gdb_string to read the hostname string in setup_mpirun_pcmd. http://code.google.com/p/padb/source/detail?r=292 Modified: /trunk/src/padb ======================================= --- /trunk/src/padb Fri Oct 16 03:31:48 2009 +++ /trunk/src/padb Fri Oct 16 03:52:39 2009 @@ -2699,14 +2699,14 @@ my %pt; foreach my $proc ( 0 .. ( $nprocs - 1 ) ) { - my $hostd = gdb_read_value( $gdb, "MPIR_proctable[$proc].host_name" ); - - if ( $hostd =~ m{\"([\w\d\-\.]+)\"\z}x ) { - my $host = $1; - my $pid = gdb_read_value( $gdb, "MPIR_proctable[$proc].pid" ); + my $hostp = + gdb_read_value( $gdb, "(void *)MPIR_proctable[$proc].host_name" ); + my $host = gdb_string( $gdb, 1024, $hostp ); + my $pid = gdb_read_value( $gdb, "MPIR_proctable[$proc].pid" ); + if ( defined $host and defined $pid ) { $pt{$host}{$proc} = $pid; } else { - print "Failed to extract hostname from $hostd\n"; + print "Failed to extract process info for rank $proc\n"; } } From padb at googlecode.com Sun Oct 18 11:48:12 2009 From: padb at googlecode.com (padb at googlecode.com) Date: Sun, 18 Oct 2009 10:48:12 +0000 Subject: [padb-devel] [padb] r293 committed - Don't pass error strings through printf so that value expansion isn't ... Message-ID: <0016e640d42e9d2cba0476335d6c@google.com> Revision: 293 Author: apittman Date: Sun Oct 18 03:47:57 2009 Log: Don't pass error strings through printf so that value expansion isn't attempted on them. This prevents errors when dealing with error string that contain % characters. http://code.google.com/p/padb/source/detail?r=293 Modified: /trunk/src/padb ======================================= --- /trunk/src/padb Fri Oct 16 03:52:39 2009 +++ /trunk/src/padb Sun Oct 18 03:47:57 2009 @@ -3921,8 +3921,8 @@ if ( defined $d->{target_data}{ERROR} ) { print "Warning: errors reported by some ranks\n========\n"; foreach my $error ( sort keys %{ $d->{target_data}{ERROR} } ) { - printf "%s: $error\n", - rng_convert_to_user( $d->{target_data}{ERROR}{$error} ); + printf "%s: %s\n", + rng_convert_to_user( $d->{target_data}{ERROR}{$error} ), $error; } print "========\n"; } From padb at googlecode.com Mon Oct 19 17:02:11 2009 From: padb at googlecode.com (padb at googlecode.com) Date: Mon, 19 Oct 2009 16:02:11 +0000 Subject: [padb-devel] [padb] r294 committed - Enhance the way padb interacts with gdb by trying to read variables be... Message-ID: <00163623aa374d074d04764bde2c@google.com> Revision: 294 Author: apittman Date: Mon Oct 19 09:01:06 2009 Log: Enhance the way padb interacts with gdb by trying to read variables better. Disable the printing of pointers by gdb unless specificaly requested. Load the value of char * variables Maintain a list of known types for a given set of runtimes (currently Open-mpi) and how to convert from types to more specific data (i.e. print datatype names for MPI functions.). http://code.google.com/p/padb/source/detail?r=294 Modified: /trunk/src/padb ======================================= --- /trunk/src/padb Sun Oct 18 03:47:57 2009 +++ /trunk/src/padb Mon Oct 19 09:01:06 2009 @@ -4999,6 +4999,7 @@ gdbpid => -1, tracepid => -1, attached => 0, + pa => 0, }; my $cmd = 'gdb --interpreter=mi -q'; @@ -5050,6 +5051,22 @@ $gdb->{attached} = 1; $gdb->{tracepid} = $pid; + $gdb->{maps} = read_maps($pid); + + my $open = gdb_read_value( $gdb, 'opal_version_string' ); + + if ( defined $open ) { + $gdb->{runtime}{ompi} = 1; + } + + my $mpich2 = gdb_read_value( $gdb, 'MPID_GROUP' ); + + if ( defined $mpich2 ) { + $gdb->{runtime}{mpich2} = 1; + } + + gdb_n_send( $gdb, 'set print address off' ); + return $pid; } @@ -5075,15 +5092,48 @@ return; } -sub gdb_n_send { +sub _gdb_send_real { my ( $gdb, $cmd ) = @_; gdb_wait_for_prompt($gdb); my $handle = $gdb->{wtr}; print {$handle} "$cmd\n"; my %r = gdb_n_next_result($gdb); $r{cmd} = $cmd; + if ( 0 and defined $r{status} and $r{status} ne 'done' ) { + print Dumper \%r; + } return %r; } + +sub _gdb_set_print_address { + my ( $gdb, $flag ) = @_; + + if ( $flag == $gdb->{pa} ) { + return; + } + + $gdb->{pa} = $flag; + + if ($flag) { + _gdb_send_real( $gdb, 'set print address on' ); + } else { + _gdb_send_real( $gdb, 'set print address off' ); + } + +} + +sub gdb_n_send { + my ( $gdb, $cmd ) = @_; + _gdb_set_print_address( $gdb, 0 ); + return _gdb_send_real( $gdb, $cmd ); +} + +# Send a command with print address enabled. +sub gdb_send_addr { + my ( $gdb, $cmd ) = @_; + _gdb_set_print_address( $gdb, 1 ); + return _gdb_send_real( $gdb, $cmd ); +} sub gdb_send { my ( $gdb, $cmd ) = @_; @@ -5091,7 +5141,7 @@ return $p{status}; } -sub strip_square { +sub gdb_strip_square { my $str = shift; if ( $str =~ m{\A\[(.*)\]\z}x ) { return $1; @@ -5168,9 +5218,9 @@ $indent--; if ( $indent == 0 ) { if ( $rem eq $EMPTY_STRING ) { - return ( strip_square($value) ); + return ( gdb_strip_square($value) ); } else { - return ( strip_square($value), $rem ); + return ( gdb_strip_square($value), $rem ); } } } @@ -5397,22 +5447,22 @@ sub gdb_type_size { my ( $gdb, $type ) = @_; - my %p = gdb_n_send( $gdb, "-data-evaluate-expression sizeof($type)" ); + my %p = gdb_n_send( $gdb, "-data-evaluate-expression \"sizeof($type)\"" ); return unless ( $p{status} eq 'done' ); return gdb_strip_value( $p{reason} ); } sub gdb_type_offset { my ( $gdb, $type, $field ) = @_; - my %p = - gdb_n_send( $gdb, "-data-evaluate-expression \"&(($type *)0)->$field\"" ); + my %p = gdb_send_addr( $gdb, + "-data-evaluate-expression \"&(($type *)0)->$field\"" ); return unless ( $p{status} eq 'done' ); return hex gdb_strip_value( $p{reason} ); } sub gdb_func_addr { my ( $gdb, $func ) = @_; - my %p = gdb_n_send( $gdb, "-data-evaluate-expression $func" ); + my %p = gdb_send_addr( $gdb, "-data-evaluate-expression $func" ); return unless ( $p{status} eq 'done' ); my $value = gdb_strip_value( $p{reason} ); my @a = split $SPACE, $value; @@ -5422,7 +5472,7 @@ sub gdb_var_addr { my ( $gdb, $var ) = @_; - my %p = gdb_n_send( $gdb, "-data-evaluate-expression &$var" ); + my %p = gdb_send_addr( $gdb, "-data-evaluate-expression \"&($var)\"" ); return unless ( $p{status} eq 'done' ); return gdb_strip_value( $p{reason} ); } @@ -5459,6 +5509,43 @@ } return $str; } + +sub gdb_read_pointer { + my ( $gdb, $addr ) = @_; + + # Quote the request in case it contains spaces. + my %t = + gdb_send_addr( $gdb, "-data-evaluate-expression \"*(void **)$addr\"" ); + if ( $t{status} eq 'done' ) { + my $v = gdb_parse_reason( $t{reason} ); + return $v->{value}; + } + return; +} + +sub gdb_read_value { + my ( $gdb, $name ) = @_; + + # Quote the request in case it contains spaces. + my %t = gdb_n_send( $gdb, "-data-evaluate-expression \"$name\"" ); + if ( $t{status} eq 'done' ) { + my $v = gdb_parse_reason( $t{reason} ); + return $v->{value}; + } + return; +} + +sub gdb_read_value_addr { + my ( $gdb, $name ) = @_; + + # Quote the request in case it contains spaces. + my %t = gdb_send_addr( $gdb, "-data-evaluate-expression \"$name\"" ); + if ( $t{status} eq 'done' ) { + my $v = gdb_parse_reason( $t{reason} ); + return $v->{value}; + } + return; +} sub minfo_handle_query { my ( $gdb, $vp, $query, $stats ) = @_; @@ -6134,46 +6221,189 @@ return; } -sub gdb_read_value { - my ( $gdb, $name ) = @_; - - # Quote the request in case it contains spaces. - my %t = gdb_n_send( $gdb, "-data-evaluate-expression \"$name\"" ); - if ( $t{status} eq 'done' ) { - my $v = gdb_parse_reason( $t{reason} ); - return $v->{value}; +sub read_maps { + my ($pid) = @_; + + my @regions; + foreach my $rgn ( slurp_file("/proc/$pid/maps") ) { + my ( $area, $perm, $offset, $time, $inode, $file ) = + split $SPACE, $rgn; + my ( $start, $end ) = split "-", $area; + + my %region = ( + start => _hex("0x$start"), + end => _hex("0x$end"), + perm => $perm + ); + + $region{file} = $file if ( defined $file and length $file ); + push @regions, \%region; + } + + return \@regions; +} + +sub describe_pointer { + my ( $gdb, $ptr ) = @_; + + my $pval = _hex($ptr); + + if ( $ptr eq '0x0' ) { + return ''; + } + + foreach my $rgn ( @{ $gdb->{maps} } ) { + if ( $pval >= $rgn->{start} and $pval <= $rgn->{end} ) { + if ( defined $rgn->{file} ) { + return "{perm} ($rgn->{file})>"; + } else { + return "{perm}>"; + } + } + } + return "<$ptr (Invalid pointer)>"; +} + +sub expand_var_hash { + my ( $gdb, $lookup, $type, $addr ) = @_; + if ( defined $lookup->{$type} ) { + my $fm = $lookup->{$type}; + my $r = gdb_read_pointer( $gdb, $addr ); + $fm =~ s{%s}{$r}; + my $str_loc = gdb_var_addr( $gdb, $fm ); + if ( defined $str_loc ) { + my $value = gdb_string( $gdb, 1024, $str_loc ); + if ( defined $value and length $value > 0 ) { + return $value; + } + } } return; } -sub gdb_expand_vars { - my ( $gdb, $frame, $type ) = @_; - - foreach my $arg ( @{ $frame->{$type} } ) { - - # Detect simple pointers and deferefence then to show the - # underlying struct. Works quite well but is a problem with very - # large or complex data structures. More work is required to make - # this feature viable so leave it disabled for now. Perhaps have - # an option for enabling it in a per-type basis? - - if ( ( $arg->{type} =~ m/ \*$/ ) and $arg->{value} ne "0x0" and 0 ) { - my $value = gdb_read_value( $gdb, "* $arg->{name}" ); - if ( defined $value ) { - $arg->{value} .= " ($value)"; +sub custom_expand_var { + my ( $gdb, $type, $name, $addr ) = @_; + + my %lookup_generic = ( 'char *' => "%s", ); + + my %lookup_open = ( + 'MPI_Comm' => "((struct ompi_communicator_t *)%s).c_name", + 'MPI_Datatype' => "((struct ompi_datatype_t *)%s).name", + 'MPI_Op' => "((struct ompi_op_t *)%s).o_name", + 'ompi_communicator_t *' => "((struct ompi_communicator_t *)%s).c_name", + 'ompi_datatype_t *' => "((struct ompi_datatype_t *)%s).name", + 'ompi_op_t *' => "((struct ompi_op_t *)%s).o_name", + ); + + my %lookup_mpich2 = ( 'MPID_Comm *' => "((MPID_Comm *)%s).name", ); + + my $var_desc = expand_var_hash( $gdb, \%lookup_generic, $type, $addr ); + return $var_desc if defined $var_desc; + + if ( defined $gdb->{runtime}{ompi} ) { + $var_desc = expand_var_hash( $gdb, \%lookup_open, $type, $addr ); + return $var_desc if defined $var_desc; + } + + if ( defined $gdb->{runtime}{mpich2} ) { + $var_desc = expand_var_hash( $gdb, \%lookup_mpich2, $type, $addr ); + return $var_desc if defined $var_desc; + } + + return; + +} + +sub gdb_expand_var { + my ( $gdb, $arg ) = @_; + + # If you try and read a value which claims to be optimized away it + # will return a value of zero, hard to know how to handle this but + # not reporting it is probably the better of the two options. + return + if ( defined $arg->{value} and $arg->{value} eq '' ); + + # Char * pointers are already correctly handled by gdb with set + # pointer off enabled so don't try and modify these. + return if ( $arg->{type} eq 'char *' ); + + my $type = $arg->{type}; + my $addr = gdb_var_addr( $gdb, $arg->{name} ); + + # Strip out and struct from the given type as it makes no + # difference to the code if it's there or not. + if ( substr( $type, 0, 7 ) eq 'struct ' ) { + $type = substr $type, 7; + } + + # Check for custom types, these are individual, often run-time + # specific types that have handlers defined for them. Basically + # we know what the type is so go in and extract the information we + # need. + if ( defined $addr ) { + my $expanded_var = + custom_expand_var( $gdb, $type, $arg->{name}, $addr ); + if ( defined $expanded_var ) { + return "<$expanded_var>"; + } + } + + # If it's a pointer, firstly load it's value, then try and print + # it, if that fails then describe it. + if ( defined $type and substr( $type, -2 ) eq ' *' ) { + my $pointer; + + if ( defined $addr ) { + $pointer = gdb_read_pointer( $gdb, $addr ); + } + + if ( not defined $pointer ) { + $pointer = gdb_read_value_addr( $gdb, $arg->{name} ); + } + + if ( $pointer eq '0x0' ) { + return ''; + } + + # Try and print the contents of the pointer, this works fine + # for types like 'double *' but produces lots of data for more + # complex types, check the length here and reject it now if we + # need to so that describe_later() can work. + if ( defined $addr and $type ne 'void *' ) { + my $value = gdb_read_value( $gdb, "*($type)$addr" ); + if ( defined $value and length $value <= 70 ) { + return $value; } } - # Some variables don't show up a value from list-locals, - # __FUNCION__ and array pointers are two examples. For vars where - # the value isn't given automatically read the value of them - # directly. - next if defined $arg->{value}; - my $value = gdb_read_value( $gdb, $arg->{name} ); + # Describe the pointer by where it points to and what perms it + # has. + return describe_pointer( $gdb, $pointer ); + } + + # Some variables don't show up a value from list-locals, + # __FUNCION__ and array pointers are two examples. For vars where + # the value isn't given automatically read the value of them + # directly. + return if defined $arg->{value}; + my $value = gdb_read_value( $gdb, $arg->{name} ); + if ( defined $value ) { + return $value; + } + + return; +} + +sub gdb_expand_vars { + my ( $gdb, $frame, $type ) = @_; + + foreach my $arg ( @{ $frame->{$type} } ) { + my $value = gdb_expand_var( $gdb, $arg ); if ( defined $value ) { $arg->{value} = $value; } } + return; } @@ -6188,7 +6418,7 @@ foreach my $frame ( @{ $data->{stack} } ) { gdb_send( $gdb, "-stack-select-frame $frame->{level}" ); - my %r = gdb_n_send( $gdb, + my %r = gdb_send_addr( $gdb, "-stack-list-arguments 2 $frame->{level} $frame->{level}" ); my $args = gdb_parse_reason( $r{reason} ); @@ -6217,11 +6447,11 @@ my @th = (); my %result = gdb_n_send( $gdb, '-thread-list-ids' ); if ( $result{status} ne 'done' ) { - return ('unknown error'); + return; } my $data = gdb_parse_reason( $result{reason}, 'thread-ids' ); if ( not defined $data->{'thread-ids'} ) { - return ( { error => $data->{msg} || 'unknown error' } ); + return; } if ( $data->{'number-of-threads'} == 0 ) { my %t; From padb at googlecode.com Tue Oct 20 16:50:54 2009 From: padb at googlecode.com (padb at googlecode.com) Date: Tue, 20 Oct 2009 15:50:54 +0000 Subject: [padb-devel] [padb] r295 committed - Changes to the gdb interface, use sequence numbers to identify... Message-ID: <001636c5be01d381b404765fd370@google.com> Revision: 295 Author: apittman Date: Tue Oct 20 08:50:32 2009 Log: Changes to the gdb interface, use sequence numbers to identify commands and ensure that what we send is what we get back. Add a debug mode to log all interaction to a file, this needs to be enabled by switching a flag in the code but is useful for development. Finally abort if we see an EOF from gdb unless we have already detached, it's still quite possible to crash gdb and this detects it and stops padb from reporting rubbish. http://code.google.com/p/padb/source/detail?r=295 Modified: /trunk/src/padb ======================================= --- /trunk/src/padb Mon Oct 19 09:01:06 2009 +++ /trunk/src/padb Tue Oct 20 08:50:32 2009 @@ -5000,6 +5000,8 @@ tracepid => -1, attached => 0, pa => 0, + debug => 0, + seq => 1, }; my $cmd = 'gdb --interpreter=mi -q'; @@ -5010,6 +5012,11 @@ $gdb->{gdbpid} = open3( $gdb->{wtr}, $gdb->{rdr}, $gdb->{err}, $cmd ) or croak "Unable to popen() gdb: $!"; + if ( $gdb->{debug} ) { + my ( $fh, $file ) = tempfile("/tmp/padb-gdb-debug-log-XXXXXX"); + $gdb->{debugfd} = $fh; + } + return $gdb; } @@ -5021,6 +5028,10 @@ next unless exists $gdb->{$fdname}; close $gdb->{$fdname}; } + if ( defined $gdb->{debugfd} ) { + close $gdb->{debugfd}; + } + return; } @@ -5065,7 +5076,7 @@ $gdb->{runtime}{mpich2} = 1; } - gdb_n_send( $gdb, 'set print address off' ); + gdb_n_send( $gdb, '-gdb-set print address off' ); return $pid; } @@ -5086,6 +5097,10 @@ my ($gdb) = shift; my $handle = $gdb->{rdr}; while (<$handle>) { + + if ( defined $gdb->{debugfd} ) { + print { $gdb->{debugfd} } $_; + } return if /^\(gdb\)/; } @@ -5096,10 +5111,19 @@ my ( $gdb, $cmd ) = @_; gdb_wait_for_prompt($gdb); my $handle = $gdb->{wtr}; - print {$handle} "$cmd\n"; - my %r = gdb_n_next_result($gdb); + my $seq = $gdb->{seq}++; + print {$handle} "$seq$cmd\n"; + if ( defined $gdb->{debugfd} ) { + print { $gdb->{debugfd} } "$seq$cmd\n"; + } + my %r = gdb_n_next_result( $gdb, $seq ); + if ( $gdb->{attached} and $r{seq} ne $seq ) { + croak( +"Invalid sequence number from gdb, expecting $seq got $r{seq} cmd=\"$cmd\"" + ); + } $r{cmd} = $cmd; - if ( 0 and defined $r{status} and $r{status} ne 'done' ) { + if ( $gdb->{debugfd} and defined $r{status} and $r{status} ne 'done' ) { print Dumper \%r; } return %r; @@ -5115,9 +5139,9 @@ $gdb->{pa} = $flag; if ($flag) { - _gdb_send_real( $gdb, 'set print address on' ); + _gdb_send_real( $gdb, '-gdb-set print address on' ); } else { - _gdb_send_real( $gdb, 'set print address off' ); + _gdb_send_real( $gdb, '-gdb-set print address off' ); } } @@ -5398,6 +5422,10 @@ while (<$handle>) { + if ( defined $gdb->{debugfd} ) { + print { $gdb->{debugfd} } $_; + } + #printf("Line $_\n"); return %res if /^\(gdb\)/; @@ -5407,10 +5435,14 @@ #if (/\&\"(.*)\"\n/) { #" # $res{debug} .= $1; #} - if (m{\A\^(done|error),?(.*)\Z}x) { - $res{status} = $1; - if ( defined $2 and $2 ne $EMPTY_STRING ) { - $res{reason} = $2; + if (m{\A(\d+)\^(done|error),?(.*)\Z}x) { + my $seq = $1; + my $status = $2; + my $reason = $3; + $res{status} = $status; + $res{seq} = $seq; + if ( defined $reason and $reason ne $EMPTY_STRING ) { + $res{reason} = $reason; } #if ( defined $res{raw} ) { @@ -5433,6 +5465,10 @@ # $res{debug} =~ s/\\n/\n/g; # chomp $res{debug}; #} + + if ( $gdb->{attached} ) { + croak("Unexpected EOF from gdb"); + } return %res; } From padb at googlecode.com Wed Oct 21 11:11:44 2009 From: padb at googlecode.com (padb at googlecode.com) Date: Wed, 21 Oct 2009 10:11:44 +0000 Subject: [padb-devel] [padb] r296 committed - Re-instate mpirun as a resource manager after I changed the ... Message-ID: <0016e640d40eba6a7804766f34e6@google.com> Revision: 296 Author: apittman Date: Wed Oct 21 03:10:58 2009 Log: Re-instate mpirun as a resource manager after I changed the way gdb handled pointers (and hence char * strings). http://code.google.com/p/padb/source/detail?r=296 Modified: /trunk/src/padb ======================================= --- /trunk/src/padb Tue Oct 20 08:50:32 2009 +++ /trunk/src/padb Wed Oct 21 03:10:58 2009 @@ -2700,7 +2700,7 @@ my %pt; foreach my $proc ( 0 .. ( $nprocs - 1 ) ) { my $hostp = - gdb_read_value( $gdb, "(void *)MPIR_proctable[$proc].host_name" ); + gdb_read_value_addr( $gdb, "(void *)MPIR_proctable[$proc].host_name" ); my $host = gdb_string( $gdb, 1024, $hostp ); my $pid = gdb_read_value( $gdb, "MPIR_proctable[$proc].pid" ); if ( defined $host and defined $pid ) { From padb at googlecode.com Thu Oct 22 15:09:47 2009 From: padb at googlecode.com (padb at googlecode.com) Date: Thu, 22 Oct 2009 14:09:47 +0000 Subject: [padb-devel] [padb] r297 committed - Add a new --lstopo option to wrap around the fantastic lstopo command... Message-ID: <0016368e1fece7026d047686a5e4@google.com> Revision: 297 Author: apittman Date: Thu Oct 22 07:08:40 2009 Log: Add a new --lstopo option to wrap around the fantastic lstopo command provided by the openmpi team. Currently reports per-system CPUS available rather than what the target process is bound to however. http://code.google.com/p/padb/source/detail?r=297 Modified: /trunk/src/padb ======================================= --- /trunk/src/padb Wed Oct 21 03:10:58 2009 +++ /trunk/src/padb Thu Oct 22 07:08:40 2009 @@ -2699,8 +2699,8 @@ my %pt; foreach my $proc ( 0 .. ( $nprocs - 1 ) ) { - my $hostp = - gdb_read_value_addr( $gdb, "(void *)MPIR_proctable[$proc].host_name" ); + my $hostp = gdb_read_value_addr( $gdb, + "(void *)MPIR_proctable[$proc].host_name" ); my $host = gdb_string( $gdb, 1024, $hostp ); my $pid = gdb_read_value( $gdb, "MPIR_proctable[$proc].pid" ); if ( defined $host and defined $pid ) { @@ -7265,6 +7265,38 @@ kill $signal, $pid; return; } + +# Experimental, currently reports on what's on the node rather than what +# the specific process is attached to, hopefully this functionality will be +# added in the future. + +# https://svn.open-mpi.org/trac/hwloc/ticket/21 +sub lstopo { + my ( $cargs, $vp, $pid ) = @_; + + target_error( $vp, "Reporting per node rather than per process" ); + + my @output = slurp_cmd("lstopo --whole-system"); + + # Check the return code, if it's not found then there won't be any + # output, if it was found but returned an error then do report the + # output as it might be useful. + my $rc = $?; + if ( $rc != 0 ) { + if ( not find_exe("lstopo") ) { + target_error( $vp, "Error running lstopo: command not found" ); + return; + } else { + target_error( $vp, "Error running lstopo" ); + } + } + + chomp @output; + foreach my $line (@output) { + output( $vp, $line ); + } + return; +} sub ping_rank { my ( $cargs, $vp, $pid ) = @_; @@ -8491,6 +8523,12 @@ } }; + $allfns{lstopo} = { + handler => \&lstopo, + arg_long => 'lstopo', + help => 'Show CPU topology', + }; + $allfns{ping} = { handler => \&ping_rank, arg_long => 'ping', From padb at googlecode.com Fri Oct 23 18:44:01 2009 From: padb at googlecode.com (padb at googlecode.com) Date: Fri, 23 Oct 2009 17:44:01 +0000 Subject: [padb-devel] [padb] r298 committed - Patch from Jeff Squyres, compile cleanly in 64 bit platforms. Message-ID: <001636c5bf57e7347704769dc1fb@google.com> Revision: 298 Author: apittman Date: Fri Oct 23 10:43:31 2009 Log: Patch from Jeff Squyres, compile cleanly in 64 bit platforms. http://code.google.com/p/padb/source/detail?r=298 Modified: /trunk/src/minfo.c ======================================= --- /trunk/src/minfo.c Thu Oct 15 06:30:47 2009 +++ /trunk/src/minfo.c Fri Oct 23 10:43:31 2009 @@ -32,7 +32,7 @@ void show_string (char *desc, char *str) { printf("zzz: str:%d %s\n%s\n", - strlen(str), + (int) strlen(str), desc, str); } @@ -345,7 +345,7 @@ printf("out: c:%d str:%d name\n%s\n", c, - strlen(comm->name), + (int) strlen(comm->name), comm->name); printf("out: c:%d id:%ld\n", From padb at googlecode.com Fri Oct 23 20:19:57 2009 From: padb at googlecode.com (padb at googlecode.com) Date: Fri, 23 Oct 2009 19:19:57 +0000 Subject: [padb-devel] [padb] r299 committed - Be rigorous when extracting process and rank information from a mpirun... Message-ID: <001636988b67fc3cc604769f1806@google.com> Revision: 299 Author: apittman Date: Fri Oct 23 12:19:26 2009 Log: Be rigorous when extracting process and rank information from a mpirun cmd, check all functions for errors and about with a suitable error message when a problem is found rather than wildly carrying on and failing in an obscure way later on. http://code.google.com/p/padb/source/detail?r=299 Modified: /trunk/src/padb ======================================= --- /trunk/src/padb Thu Oct 22 07:08:40 2009 +++ /trunk/src/padb Fri Oct 23 12:19:26 2009 @@ -2692,10 +2692,19 @@ my $gdb = gdb_start(); if ( not gdb_attach( $gdb, $job ) ) { + if ( defined $gdb->{error} ) { + print "Failed to attach to process: $gdb->{error}\n"; + } else { + print "Failed to attach to process\n"; + } return; } my $nprocs = gdb_read_value( $gdb, 'MPIR_proctable_size' ); + if ( not defined $nprocs ) { + print "No MPIR_proctable_size symbol found, cannot continue"; + return; + } my %pt; foreach my $proc ( 0 .. ( $nprocs - 1 ) ) { @@ -2715,6 +2724,11 @@ my @hosts = keys(%pt); + if ( @hosts == 0 ) { + print "No process data found"; + return; + } + my $cmd = $EMPTY_STRING; if ( $hosts[0] ne hostname() or @hosts > 1 ) { my $hlist = join q{,}, @hosts; From padb at googlecode.com Fri Oct 23 20:40:20 2009 From: padb at googlecode.com (padb at googlecode.com) Date: Fri, 23 Oct 2009 19:40:20 +0000 Subject: [padb-devel] [padb] r300 committed - Remove a un-used variable. Message-ID: <00163623aa37d8bc9b04769f6131@google.com> Revision: 300 Author: apittman Date: Fri Oct 23 12:40:06 2009 Log: Remove a un-used variable. http://code.google.com/p/padb/source/detail?r=300 Modified: /trunk/src/padb ======================================= --- /trunk/src/padb Fri Oct 23 12:19:26 2009 +++ /trunk/src/padb Fri Oct 23 12:40:06 2009 @@ -7838,7 +7838,6 @@ sub pid_to_name { my $pid = shift; my $exe = readlink "/proc/$pid/exe"; - my $cmd; if ( defined $exe ) { return basename($exe); } else { From padb at googlecode.com Sat Oct 24 12:51:32 2009 From: padb at googlecode.com (padb at googlecode.com) Date: Sat, 24 Oct 2009 11:51:32 +0000 Subject: [padb-devel] [padb] r301 committed - Add a get_extended_process_list() function, this is similar to... Message-ID: <0016e640d47e21430f0476acf364@google.com> Revision: 301 Author: apittman Date: Sat Oct 24 04:51:05 2009 Log: Add a get_extended_process_list() function, this is similar to get_process_list() but returns both the process list and the parent pid of each process in a hash. This allows the convert_pids_to_child_pids() function to load the process information once and do it's work without further calls to load process table state. http://code.google.com/p/padb/source/detail?r=301 Modified: /trunk/src/padb ======================================= --- /trunk/src/padb Fri Oct 23 12:40:06 2009 +++ /trunk/src/padb Sat Oct 24 04:51:05 2009 @@ -771,6 +771,7 @@ return @files; } +# Return an array of current processes for a given user. sub get_process_list { my ($user) = @_; my $uid = getpwnam $user; @@ -790,6 +791,32 @@ } return @userpids; } + +# Return the process list for a given user, return a hash indexed by pid +# and containing the parent pid for the given process. +sub get_extended_process_list { + my ($user) = @_; + my $uid = getpwnam $user; + return unless defined $uid; + my @pids = slurp_dir('/proc'); + my @userpids; + my %procs; + foreach my $pid (@pids) { + next unless ( $pid =~ m{\A\d+\z}xms ); + my ( undef, undef, undef, undef, $owner ) = stat "/proc/$pid"; + + # Check the stat worked, it's possible for processes to dissapear + # Take care to check for defined rather than true as root has a uid + # of zero. + next unless defined $owner; + next unless $owner == $uid; + my $ppid = find_from_status( $pid, 'PPid' ); + if ( defined $ppid ) { + $procs{$pid} = $ppid; + } + } + return %procs; +} sub parse_args_outer { @@ -7850,14 +7877,14 @@ # more interesting pids, in particular look for pids which appear to be # scripts and, if they have any children, look at the children instead. sub convert_pids_to_child_pids { - my @pids = get_process_list( getpwuid $< ); + my %process_data = get_extended_process_list( getpwuid $< ); my %scripts; map { $scripts{$_}++ } split $COMMA, $inner_conf{scripts}; my $ipids = $inner_conf{rmpids}; - foreach my $pid (@pids) { + foreach my $pid ( keys %process_data ) { # The resource manager pid this pid is associated with. my $rmpid; @@ -7865,14 +7892,14 @@ if ( defined $ipids->{$pid} ) { $rmpid = $pid; } else { - my $ppid = find_from_status( $pid, 'PPid' ); + my $ppid = $process_data{$pid}; while ( defined $ppid and $ppid != 1 ) { if ( defined $ipids->{$ppid} ) { $rmpid = $ppid; $ppid = undef; } else { - $ppid = find_from_status( $ppid, 'PPid' ); + $ppid = $process_data{$ppid}; } } } From padb at googlecode.com Tue Oct 27 21:00:44 2009 From: padb at googlecode.com (padb at googlecode.com) Date: Tue, 27 Oct 2009 21:00:44 +0000 Subject: [padb-devel] [padb] r302 committed - Add support for stack traces on the solaris operating system, most... Message-ID: <00504502cc36c52da90476f0f80d@google.com> Revision: 302 Author: apittman Date: Tue Oct 27 13:59:52 2009 Log: Add support for stack traces on the solaris operating system, most changes revolve around using ps rather than /proc for discovering proccess data and small changes to the way gdb attaches to processes. http://code.google.com/p/padb/source/detail?r=302 Modified: /trunk/src/padb ======================================= --- /trunk/src/padb Sat Oct 24 04:51:05 2009 +++ /trunk/src/padb Tue Oct 27 13:59:52 2009 @@ -291,6 +291,18 @@ my %conf; my $secret; + +# Solaris support! Tested for and works for stack traces, anything else +# should be considered a bonus at this stage. Reports are welcome, more so +# if they contain good news. +my $running_on_solaris = 0; + +# Test for solaris by checking for a file rather than running uname, it'll +# be quicker. Note that with this option enabled padb will still function +# correctly on Linux. +if ( -d '/proc/1/path' ) { + $running_on_solaris = 1; +} # Config options the inner knows about, only forward options if they are in # this list. @@ -774,6 +786,16 @@ # Return an array of current processes for a given user. sub get_process_list { my ($user) = @_; + if ($running_on_solaris) { + my @procs = slurp_cmd("ps -o pid= -u $user"); + chomp @procs; + my @clean_procs; + foreach my $proc (@procs) { + $proc =~ s{\s*}{}g; + push @clean_procs, $proc; + } + return @clean_procs; + } my $uid = getpwnam $user; return unless defined $uid; my @pids = slurp_dir('/proc'); @@ -796,6 +818,20 @@ # and containing the parent pid for the given process. sub get_extended_process_list { my ($user) = @_; + if ($running_on_solaris) { + my @procs = slurp_cmd("ps -o pid= -o ppid= -u $user"); + my %procs; + foreach my $proc (@procs) { + + # A little bit of magic here, ps left pads its output with a + # space which the regexp matches so $pid is the second entry + # returned from the split command. I need to verify that Linux + # ps and Solaris ps behave the same in this situation. + my ( undef, $pid, $ppid ) = split $SPACE, $proc; + $procs{$pid} = $ppid; + } + return %procs; + } my $uid = getpwnam $user; return unless defined $uid; my @pids = slurp_dir('/proc'); @@ -5079,6 +5115,15 @@ sub gdb_attach { my ( $gdb, $pid ) = @_; + if ($running_on_solaris) { + my $exe = readlink("/proc/$pid/path/a.out"); + my %cs = gdb_n_send( $gdb, "file $exe" ); + if ( $cs{status} ne 'done' ) { + croak("Gdb command file $exe failed"); + return; + } + } + send_cont_signal($pid); my %p = gdb_n_send( $gdb, "attach $pid" ); @@ -6531,16 +6576,25 @@ return; } if ( $data->{'number-of-threads'} == 0 ) { - my %t; - $t{id} = 0; + my %t = ( id => 0 ); @{ $t{frames} } = gdb_dump_frames( $gdb, $detail ); push @th, \%t; return @th; } + # Solaris has a extra "LWP" thread for every process which gdb reports + # as being a duplicate of the main thread for a process. Skip loading + # this thread as it doesn't add anything to the output rather muddies + # the water somewhat. What I don't know is if the LWP reports to be a + # duplicate of the first thread, I suspect it's the last one so drop + # the last thread from the list reported by gdb. This needs verifying + # and possibly changing by somebody who has access to both solaris and + # multi threaded processes. + if ($running_on_solaris) { + pop @{ $data->{'thread-ids'} }; + } foreach my $thread ( @{ $data->{'thread-ids'} } ) { my $id = $thread->{'thread-id'}; - my %t; - $t{id} = $id; + my %t = ( id => $id ); gdb_send( $gdb, "-thread-select $id" ); @{ $t{frames} } = gdb_dump_frames( $gdb, $detail ); push @th, \%t; @@ -7611,6 +7665,21 @@ sub find_from_status { my ( $pid, $key ) = @_; + if ($running_on_solaris) { + my %key_lookup = ( + PPid => 'ppid', + Name => 'comm', + State => 's', + ); + my $ps_key = $key_lookup{$key}; + my @res = slurp_cmd("ps -o $ps_key= -p $pid"); + return if ( @res == 0 ); + chomp @res; + my $proc = $res[0]; + $proc =~ s{\s*}{}g; + return $proc; + } + foreach my $l ( slurp_file("/proc/$pid/status") ) { if ( $l =~ m{\A(\w+):\s*(.+)}x ) { if ( $1 eq $key ) { return $2; } @@ -7894,7 +7963,7 @@ } else { my $ppid = $process_data{$pid}; - while ( defined $ppid and $ppid != 1 ) { + while ( defined $ppid and $ppid != 1 and $ppid != 0 ) { if ( defined $ipids->{$ppid} ) { $rmpid = $ppid; $ppid = undef; @@ -7982,8 +8051,8 @@ my $pid = $proc->{pid}; my $vp = $proc->{vp}; target_key_pair( $vp, 'FOUND', 'yes' ); - target_key_pair( $vp, 'NAME', $proc->{status}->{Name} ); - target_key_pair( $vp, 'STATE', $proc->{status}->{State} ); + target_key_pair( $vp, 'NAME', find_from_status( $pid, 'Name' ) ); + target_key_pair( $vp, 'STATE', find_from_status( $pid, 'State' ) ); } return; } From padb at googlecode.com Tue Oct 27 21:33:31 2009 From: padb at googlecode.com (padb at googlecode.com) Date: Tue, 27 Oct 2009 21:33:31 +0000 Subject: [padb-devel] [padb] r303 committed - Add an option to save a trace of the communication with minfo to a fil... Message-ID: <0016e64dc8ea05d2620476f16e54@google.com> Revision: 303 Author: apittman Date: Tue Oct 27 14:33:16 2009 Log: Add an option to save a trace of the communication with minfo to a file in /tmp. Useful for debugging problems with padb itself. http://code.google.com/p/padb/source/detail?r=303 Modified: /trunk/src/padb ======================================= --- /trunk/src/padb Tue Oct 27 13:59:52 2009 +++ /trunk/src/padb Tue Oct 27 14:33:16 2009 @@ -5724,12 +5724,18 @@ hpid => -1, tracepid => -1, attached => 0, + debug => 0, }; my $cmd = $inner_conf{minfo}; $h->{hpid} = open3( $h->{fd}{wtr}, $h->{fd}{rdr}, $h->{fd}{err}, $cmd ) or confess "Unable to popen() h: $!\n"; + if ( $h->{debug} ) { + my ( $fh, $file ) = tempfile("/tmp/padb-minfo-debug-log-$vp-XXXXXX"); + $h->{debugfd} = $fh; + } + my $handle = $h->{fd}{rdr}; my $out = $h->{fd}{wtr}; @@ -5751,6 +5757,10 @@ while (<$handle>) { my $r = $_; + if ( defined $h->{debugfd} ) { + print { $h->{debugfd} } $r; + } + if ( defined $bytes_to_read ) { $str_value .= $r; if ( length $str_value eq $bytes_to_read + 1 ) { @@ -5790,6 +5800,11 @@ # and we don't need to report it. print {$out} "$res\n"; + + if ( defined $h->{debugfd} ) { + print { $h->{debugfd} } "$res\n"; + } + } elsif ( $cmd eq 'out:' ) { if ( $r =~ m{\A @@ -6581,6 +6596,7 @@ push @th, \%t; return @th; } + # Solaris has a extra "LWP" thread for every process which gdb reports # as being a duplicate of the main thread for a process. Skip loading # this thread as it doesn't add anything to the output rather muddies From padb at googlecode.com Tue Oct 27 21:49:20 2009 From: padb at googlecode.com (padb at googlecode.com) Date: Tue, 27 Oct 2009 21:49:20 +0000 Subject: [padb-devel] [padb] r304 committed - Check for stderr from the minfo process and report it to the user as ... Message-ID: <0016e64b974c95eedb0476f1a687@google.com> Revision: 304 Author: apittman Date: Tue Oct 27 14:48:53 2009 Log: Check for stderr from the minfo process and report it to the user as a problem if there is any. http://code.google.com/p/padb/source/detail?r=304 Modified: /trunk/src/padb ======================================= --- /trunk/src/padb Tue Oct 27 14:33:16 2009 +++ /trunk/src/padb Tue Oct 27 14:48:53 2009 @@ -5727,8 +5727,10 @@ debug => 0, }; + $h->{fd}{err} = *M_ERROR; + my $cmd = $inner_conf{minfo}; - $h->{hpid} = open3( $h->{fd}{wtr}, $h->{fd}{rdr}, $h->{fd}{err}, $cmd ) + $h->{hpid} = open3( $h->{fd}{wtr}, $h->{fd}{rdr}, *M_ERROR, $cmd ) or confess "Unable to popen() h: $!\n"; if ( $h->{debug} ) { @@ -5870,6 +5872,20 @@ push @{ $cd{raw} }, $r; } } + + # Belt and braces approach, minfo shouldn't have any output on + # stderr which means that if it does then it's a bug. Check that + # and report it to the user as a problem. + my $eh = $h->{fd}{err}; + my $stderr = $EMPTY_STRING; + my $have_error_messages = 0; + while (<$eh>) { + $have_error_messages = 1; + $stderr .= $_; + } + if ($have_error_messages) { + target_error( $vp, "Stderr from minfo: $stderr" ); + } my $sc = keys %stats; From ethan.mallove at sun.com Wed Oct 28 17:32:52 2009 From: ethan.mallove at sun.com (Ethan Mallove) Date: Wed, 28 Oct 2009 12:32:52 -0500 Subject: [padb-devel] Simple Makefile patch Message-ID: <20091028173250.GO53330@sun.com> Little patch for Makefile to make compiling on Solaris easier. Solaris C compiler is also named "cc", and it does not support -Wall. -Ethan -------------- next part -------------- Index: Makefile =================================================================== --- Makefile (revision 304) +++ Makefile (working copy) @@ -2,11 +2,13 @@ INSTALL_DIR=/usr/local/ CONFIG_DIR=/etc VERSION=3.0-rc1 +CC=gcc +CFLAGS=-Wall -g FILES = Makefile minfo.c mpi_interface.h padb minfo.x: minfo.c mpi_interface.h - cc -g minfo.c -o minfo.x -ldl -Wall + $(CC) minfo.c -o minfo.x -ldl $(CFLAGS) install: minfo.x /bin/mkdir -p ${INSTALL_DIR}/bin From padb at googlecode.com Wed Oct 28 22:32:15 2009 From: padb at googlecode.com (padb at googlecode.com) Date: Wed, 28 Oct 2009 22:32:15 +0000 Subject: [padb-devel] [padb] r305 committed - Patch from Ethan Mallove ... Message-ID: <00504502c68addb7a40477065d8a@google.com> Revision: 305 Author: apittman Date: Wed Oct 28 15:31:42 2009 Log: Patch from Ethan Mallove Use gcc rather than cc to compile minfo as that also works for solaris where cc is the sun compiler. Factor out CC and CFLAGS into defines at the same time. http://code.google.com/p/padb/source/detail?r=305 Modified: /trunk/src/Makefile ======================================= --- /trunk/src/Makefile Thu Oct 1 12:38:38 2009 +++ /trunk/src/Makefile Wed Oct 28 15:31:42 2009 @@ -2,11 +2,13 @@ INSTALL_DIR=/usr/local/ CONFIG_DIR=/etc VERSION=3.0-rc1 +CC=gcc +CFLAGS=-Wall -g FILES = Makefile minfo.c mpi_interface.h padb minfo.x: minfo.c mpi_interface.h - cc -g minfo.c -o minfo.x -ldl -Wall + $(CC) minfo.c -o minfo.x -ldl $(CFLAGS) install: minfo.x /bin/mkdir -p ${INSTALL_DIR}/bin From padb at googlecode.com Thu Oct 29 22:52:58 2009 From: padb at googlecode.com (padb at googlecode.com) Date: Thu, 29 Oct 2009 22:52:58 +0000 Subject: [padb-devel] [padb] r306 committed - Tidy up the minfo.c code considerably to make it easier to read and... Message-ID: <000e0cd20a2cd0381304771ac52d@google.com> Revision: 306 Author: apittman Date: Thu Oct 29 15:52:20 2009 Log: Tidy up the minfo.c code considerably to make it easier to read and hopefully extend in future. Use a struct for the dll callbacks and give the struct entries meaningful names to help the readability of the code. Add a few helper functions to ease the control flow. Also update the copyright information from 2004 to 2009 http://code.google.com/p/padb/source/detail?r=306 Modified: /trunk/src/minfo.c ======================================= --- /trunk/src/minfo.c Fri Oct 23 10:43:31 2009 +++ /trunk/src/minfo.c Thu Oct 29 15:52:20 2009 @@ -1,5 +1,6 @@ /* * Copyright (c) 2003,2004 Quadrics Ltd + * Copyright (c) 2009, Ashley Pittman. */ #ident "elfN.c,v 1.14 2005-11-03 11:23:04 ashley Exp" @@ -12,6 +13,24 @@ #include #include "mpi_interface.h" +struct dll_entry_points { + char *(*dll_error_string) (int); + void (*setup_basic_callbacks) (mqs_basic_callbacks *); + int (*setup_image) (mqs_image *, mqs_image_callbacks *); + int (*image_has_queues) (mqs_image *, char **); + int (*setup_process) (mqs_process *, mqs_process_callbacks *); + int (*process_has_queues) (mqs_process *, char **); + void (*update_communicator_list) (mqs_process *); + int (*setup_communicator_iterator)(mqs_process *); + int (*get_communicator) (mqs_process *, mqs_communicator *); + int (*next_communicator) (mqs_process *); + int (*get_global_rank) (mqs_process *); + int (*get_comm_coll_state) (mqs_process *, int, int *, int *); + int (*get_comm_group) (mqs_process *, int *); + int (*setup_operation_iterator) (mqs_process *, int); + int (*next_operation) (mqs_process *, mqs_pending_operation *); +}; + struct image { mqs_image_info *blob; }; @@ -27,7 +46,30 @@ int size; }; -char *(*es)(int errorcode); +struct dll_entry_points dll_ep = {}; + +char *collective_names[] = { "Barrier", + "Bcast", + "Allgather", + "Allgatherv", + "Allreduce", + "Alltoall", + "Alltoallv", + "Reduce_Scatter", + "Reduce", + "Gather", + "Gatherv", + "Scan", + "Scatter", + "Scatterv" }; + +char *op_types[] = { "pending_send", + "pending_receive", + "unexpected_message" }; + +char *op_status[] = { "pending", + "matched", + "complete" }; void show_string (char *desc, char *str) { @@ -45,7 +87,7 @@ void show_dll_error_code (int res) { char *msg; - msg = es(res); + msg = dll_ep.dll_error_string(res); show_string("dllerror",msg); } @@ -160,7 +202,7 @@ *addr = (mqs_taddr_t)base; return mqs_ok; } - return -1; + return mqs_no_information; } int find_symbol (mqs_image *image, char *name, mqs_taddr_t *addr) @@ -172,7 +214,7 @@ } return mqs_ok; } - return 100; + return mqs_no_information; } int req_to_int (char *req,int *res) @@ -259,7 +301,7 @@ i = ask(req,ans); if ( i != 0 ) - return -1; + return mqs_no_information; for ( i = 0 ; i < size ; i++ ) { char *e; @@ -284,14 +326,14 @@ int res; // printf("Trying to read data for %d from %p\n",size,(void *)addr); if ( ! addr ) { - return 100; + return mqs_no_information; } do { if ( offset > size ) offset = size; res = _find_data(proc,addr,offset,local); if ( res != mqs_ok ) - return 100; + return mqs_no_information; addr += offset; local += offset; @@ -329,11 +371,10 @@ return 0; } -int msgid = 0; - -int show_comm (struct process *p,mqs_communicator *comm) -{ - static int c = 0; +int msg_id = 0; + +int show_comm (struct process *p, mqs_communicator *comm, int c) +{ if ( comm->local_rank >= 0 ) printf("out: c:%d rank:%d\n", c, @@ -352,15 +393,46 @@ c, comm->unique_id); - - msgid=0; return c++; /* This is not a political statement although if it was I'd stand by it */ } -void show_op (mqs_pending_operation *op, int type) -{ - static char *types[] = { "pending_send", "pending_receive", "unexpected_message" }; - static char *status[] = { "pending", "matched", "complete" }; +void show_comm_members (mqs_process *target_process, mqs_communicator *comm, int comm_id) +{ + int *ranks = malloc(comm->size*sizeof(int)); + int r = dll_ep.get_comm_group(target_process,ranks); + if ( r == mqs_ok ) { + int i; + for ( i = 0 ; i < comm->size ; i++ ) { + printf("out: c:%d rt:%d\n", + comm_id, + ranks[i]); + } + } + free(ranks); +} + +void show_comm_coll_state (mqs_process *target_process, mqs_communicator *comm, int comm_id) +{ + int i; + for ( i = 0 ; i < 14 ; i++ ) { + int seq = -1; + int active = -1; + int r = dll_ep.get_comm_coll_state(target_process,i,&seq,&active); + if ( r == mqs_ok ) { + if ( seq != 0 ) + printf("comm%d: Collective '%s': call count %d, %sactive\n", + comm_id, + collective_names[i], + seq, + active ? "" : "not "); + } else if ( r != mqs_no_information ) { + show_dll_error_code(r); + } + } +} + +void show_op (mqs_pending_operation *op, int msgid, int type) +{ int i; int all = 0; @@ -368,7 +440,7 @@ all = 1; printf("msg%d: Operation %d (%s) status %d (%s)\n", - msgid,type,types[type],op->status,status[op->status]); + msgid,type,op_types[type],op->status,op_status[op->status]); printf("msg%d: Rank local %d global %d\n", msgid,(int)op->desired_local_rank, (int)op->desired_global_rank); if ( all ) @@ -396,17 +468,12 @@ else i = 10; } while ( i++ < 5 ); - - msgid++; } -int (*soi)(mqs_process *process,int type); -int (*no)(mqs_process *process, mqs_pending_operation *op); - -void load_ops (mqs_process *p,int type) -{ - mqs_pending_operation op; - int res = soi((mqs_process *)p,type); +void load_ops (mqs_process *target_process,int type) +{ + + int res = dll_ep.setup_operation_iterator(target_process,type); if ( res != mqs_ok ) { if ( res != mqs_ok && res != mqs_no_information ) printf("Setup operation iterator failed %d for type %d\n",res,type); @@ -414,16 +481,64 @@ } do { - memset(&op,0,sizeof(mqs_pending_operation)); - res = no((mqs_process *)p,&op); + mqs_pending_operation op = {}; + res = dll_ep.next_operation(target_process,&op); if ( res == mqs_ok ) { - show_op(&op,type); + show_op(&op,msg_id,type); + msg_id++; } else if ( res != mqs_end_of_list ) { printf("Res from mqs_pending_operation is %d type %d\n",res,type); } } while ( res == mqs_ok ); } + +void load_all_ops (mqs_process *target_process) +{ + msg_id = 0; + load_ops(target_process,mqs_pending_receives); + load_ops(target_process,mqs_unexpected_messages); + load_ops(target_process,mqs_pending_sends); +} + +#define DLSYM_LAX(VAR,HANDLE,NAME) VAR.NAME = dlsym(HANDLE,"mqs_" #NAME) + +#define DLSYM(VAR,HANDLE,NAME) do { \ + if ( (DLSYM_LAX(VAR,HANDLE,NAME)) == NULL ) { \ + show_warning("Failed to load symbol mqs_" #NAME); \ + return -1; \ + } \ + } while (0) + +/* Try and load the dll from a given filename, returns true if successfull. + * populates the contents of dll_ep if true. + */ +int load_msgq_dll(char *filename) +{ + void *dlhandle; + + dlhandle = dlopen(filename,RTLD_NOW); + if ( ! dlhandle ) + return -1; + + DLSYM(dll_ep,dlhandle,setup_basic_callbacks); + DLSYM(dll_ep,dlhandle,setup_image); + DLSYM(dll_ep,dlhandle,image_has_queues); + DLSYM(dll_ep,dlhandle,setup_process); + DLSYM(dll_ep,dlhandle,process_has_queues); + DLSYM(dll_ep,dlhandle,dll_error_string); + DLSYM(dll_ep,dlhandle,update_communicator_list); + DLSYM(dll_ep,dlhandle,setup_communicator_iterator); + DLSYM(dll_ep,dlhandle,get_communicator); + DLSYM(dll_ep,dlhandle,next_communicator); + DLSYM(dll_ep,dlhandle,setup_operation_iterator); + DLSYM(dll_ep,dlhandle,next_operation); + DLSYM(dll_ep,dlhandle,get_comm_group); + + DLSYM_LAX(dll_ep,dlhandle,get_global_rank); + DLSYM_LAX(dll_ep,dlhandle,get_comm_coll_state); + return 0; +} #define PATH_MAX 1024 @@ -431,62 +546,32 @@ main () { int res; - int nres; char *dll_name; - void *dlhandle; - void (*b)(mqs_basic_callbacks *bcb); - int (*si)(mqs_image *image,mqs_image_callbacks *icb); - int (*ihq)(mqs_image *image, char **msg); - int (*sp)(mqs_process *process,mqs_process_callbacks *pcb); - int (*phq)(mqs_process *process, char **msg); - void (*ucl)(mqs_process *process); - - int (*sci)(mqs_process *process); - int (*gc)(mqs_process *process, mqs_communicator *comm); - int (*nc)(mqs_process *process); - int (*gr)(mqs_process *process); - int (*gcs)(mqs_process *, int, int *, int *); - int (*gcg)(mqs_process *, int *); - - struct image i; - struct process p; - + int comm_id = 0; + + struct image image; + struct process process; + + mqs_image *target_image = (mqs_image *)ℑ + mqs_process *target_process = (mqs_process *)&process; + dll_name = getenv("MPINFO_DLL"); - if ( dll_name ) { - dlhandle = dlopen(dll_name,RTLD_NOW); - } else { - char dll[PATH_MAX]; + if ( ! dll_name ) { + void *base = find_sym("sym","MPIR_dll_name"); if ( ! base ) { die("Could not find MPIR_dll_name symbol"); } - fetch_string(NULL,&dll[0],(mqs_taddr_t)base,PATH_MAX); - dlhandle = dlopen(dll,RTLD_NOW); - } - - if ( ! dlhandle ) { - die("Could not open dll"); - } - - b = dlsym(dlhandle,"mqs_setup_basic_callbacks"); - if ( ! b ) { + dll_name = malloc(PATH_MAX); + if ( fetch_string(NULL,dll_name,(mqs_taddr_t)base,PATH_MAX) != 0 ) { + die("Could not read value of MPIR_dll_name"); + } + } + + if ( load_msgq_dll(dll_name) != 0 ) { die("Could not load symbols from dll"); } - si = dlsym(dlhandle,"mqs_setup_image"); - ihq = dlsym(dlhandle,"mqs_image_has_queues"); - sp = dlsym(dlhandle,"mqs_setup_process"); - phq = dlsym(dlhandle,"mqs_process_has_queues"); - es = dlsym(dlhandle,"mqs_dll_error_string"); - ucl = dlsym(dlhandle,"mqs_update_communicator_list"); - sci = dlsym(dlhandle,"mqs_setup_communicator_iterator"); - gc = dlsym(dlhandle,"mqs_get_communicator"); - nc = dlsym(dlhandle,"mqs_next_communicator"); - gr = dlsym(dlhandle,"mqs_get_global_rank"); - soi = dlsym(dlhandle,"mqs_setup_operation_iterator"); - no = dlsym(dlhandle,"mqs_next_operation"); - gcs = dlsym(dlhandle,"mqs_get_comm_coll_state"); - gcg = dlsym(dlhandle,"mqs_get_comm_group"); - + bcb.mqs_malloc_fp = malloc; bcb.mqs_free_fp = free; bcb.mqs_dprints_fp = show_msg; @@ -496,7 +581,7 @@ bcb.mqs_put_process_info_fp = process_put; bcb.mqs_get_process_info_fp = process_get; - b(&bcb); + dll_ep.setup_basic_callbacks(&bcb); icb.mqs_get_type_sizes_fp = get_type_size; icb.mqs_find_function_fp = find_function; @@ -505,16 +590,16 @@ icb.mqs_field_offset_fp = find_offset; icb.mqs_sizeof_fp = find_sizeof; - res = si((mqs_image *)&i,&icb); + res = dll_ep.setup_image(target_image,&icb); if ( res != mqs_ok ) { die_with_code(res,"setup_image() failed"); } { - char *m = NULL; - res = ihq((mqs_image *)&i,&m); - if ( m ) { - show_string("ihqm",m); + char *user_message = NULL; + res = dll_ep.image_has_queues(target_image,&user_message); + if ( user_message ) { + show_string("ihqm",user_message); } if ( res != mqs_ok ) { die_with_code(res,"image_has_queues() failed"); @@ -526,96 +611,65 @@ pcb.mqs_fetch_data_fp = find_data; pcb.mqs_target_to_host_fp = convert_data; - p.rank = -1; - p.image = &i; - - res = sp((mqs_process *)&p,&pcb); + process.rank = -1; + process.image = ℑ + + res = dll_ep.setup_process(target_process,&pcb); if ( res != mqs_ok ) { die_with_code(res,"mqs_setup_process() failed"); } - if ( gr ) { - p.rank = gr((mqs_process *)&p); + if ( dll_ep.get_global_rank ) { + process.rank = dll_ep.get_global_rank(target_process); } else { /* Load the rank into p */ - req_to_int("rank", &p.rank); + req_to_int("rank", &process.rank); } { - char *m = NULL; - res = phq((mqs_process *)&p,&m); - if ( m ) - show_string("phqm",m); + char *user_message = NULL; + res = dll_ep.process_has_queues(target_process,&user_message); + if ( user_message ) + show_string("phqm",user_message); if ( res != mqs_ok ) { - die_with_code(res,"process_has_queue() failed"); + die_with_code(res,"process_has_queues() failed"); } } - ucl((mqs_process *)&p); - - res = sci((mqs_process *)&p); + dll_ep.update_communicator_list(target_process); + + res = dll_ep.setup_communicator_iterator(target_process); if ( res != mqs_ok ) { die_with_code(res,"setup_communicator_iterator() failed"); } do { - mqs_communicator comm; - - res = gc((mqs_process *)&p,&comm); + mqs_communicator comm = {}; + + res = dll_ep.get_communicator(target_process,&comm); if ( res != mqs_ok ) { die_with_code(res,"get_communicator() failed"); } - if ( res == mqs_ok ) { - /* Should check for comm.size here, open-mpi puts MPI_COMM_NULL in the list with a size of 0 */ - char *names[] = { "Barrier", "Bcast", "Allgather", "Allgatherv", "Allreduce", "Alltoall", "Alltoallv", - "Reduce_Scatter", "Reduce", "Gather", "Gatherv", "Scan", "Scatter", "Scatterv" }; - int c; - c = show_comm(&p,&comm); - if ( comm.size > 1 ) { - if ( gcg ) { - int *ranks = malloc(comm.size*sizeof(int)); - int r = gcg((mqs_process *)&p,ranks); - if ( r == mqs_ok ) { - int i; - for ( i = 0 ; i < comm.size ; i++ ) { - printf("out: c:%d rt:%d\n",c,ranks[i]); - } - } - free(ranks); - } - if ( gcs ) { - int seq; - int active; - int r; - int i = 0; - for ( i = 0 ; i<14 ; i++ ) { - seq = -1; - active = -1; - r = gcs((mqs_process *)&p,i,&seq,&active); - if ( r == mqs_ok ) { - if ( seq != 0 ) - printf("comm%d: Collective '%s': call count %d, %sactive\n",c,names[i],seq,active ? "" : "not "); - } else if ( r != mqs_no_information ) { - char *msg; - msg = es(r); - printf("Error: %s\n",msg); - } - } - } - - load_ops((mqs_process *)&p,mqs_pending_receives); - load_ops((mqs_process *)&p,mqs_unexpected_messages); - load_ops((mqs_process *)&p,mqs_pending_sends); - - } - printf("done\n" - ); - - nres = nc((mqs_process *)&p); + show_comm(&process,&comm,comm_id); + + if ( comm.size > 1 ) { + + if ( dll_ep.get_comm_group ) + show_comm_members(target_process,&comm,comm_id); + + if ( dll_ep.get_comm_coll_state ) + show_comm_coll_state(target_process,&comm,comm_id); + + load_all_ops(target_process); } - } while ( res == mqs_ok && nres == mqs_ok ); + printf("done\n"); + + res = dll_ep.next_communicator(target_process); + comm_id++; + + } while ( res == mqs_ok ); show_string("exit","ok"); return 0; From thipadin.seng-long at bull.net Fri Oct 30 16:06:03 2009 From: thipadin.seng-long at bull.net (thipadin.seng-long at bull.net) Date: Fri, 30 Oct 2009 17:06:03 +0100 Subject: [padb-devel] Patch for Support of PBS Pro resource manager Message-ID: Hi, I'm a Bull software engineer and been working for a while on padb to adapt it to PBS Pro environment. The modified padb has been working fine in our MPI_Bull2 / Pbs Pro environment. So I am sending you the patch for you to integrate in the common padb branch. I have started from padb 2.5 version, sorry. If you want to discuss about my coding I am ready, feel free to comment before you commit it. Many other resources managers (slurm-orte, lsf-mpd, lsf-orte) will follow. Thipadin. Thanks. -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- An embedded and charset-unspecified text was scrubbed... Name: Diff_25to26.txt URL: From ashley at pittman.co.uk Fri Oct 30 18:26:00 2009 From: ashley at pittman.co.uk (Ashley Pittman) Date: Fri, 30 Oct 2009 18:26:00 +0000 Subject: [padb-devel] Patch for Support of PBS Pro resource manager In-Reply-To: References: Message-ID: <1256927160.19953.110.camel@alpha> On Fri, 2009-10-30 at 17:06 +0100, thipadin.seng-long at bull.net wrote: > I'm a Bull software engineer and been working for a while on padb to > adapt it to PBS Pro environment. > The modified padb has been working fine in our MPI_Bull2 / Pbs Pro > environment. > So I am sending you the patch for you to integrate in the common padb > branch. > I have started from padb 2.5 version, sorry. > If you want to discuss about my coding I am ready, feel free to > comment before you commit it. > Many other resources managers (slurm-orte, lsf-mpd, lsf-orte) will > follow. First off that's great! I'm always keen to see padb better and available to a wider range of people. Secondly as you'll see if you look at the history I've done a lot of work since the 2.5 release so several aspects of the design are different. For the most part this should make things simpler to code for however it does mean changes to the code you have written. I'm happy to integrate patches which add new resource managers as the danger of breaking anything else is low, I'd like to see if we can get away without they lpbs_get_script_name() function though. Are you able to upgrade to the HEAD code and submit patches based on that? have a look at @inner_conf and in particular slurm_job_step for how to pass on $server and $port to the inner processes. Ashley, -- Ashley Pittman, Bath, UK. Padb - A parallel job inspection tool for cluster computing http://padb.pittman.org.uk From padb at googlecode.com Sat Oct 31 20:51:19 2009 From: padb at googlecode.com (padb at googlecode.com) Date: Sat, 31 Oct 2009 20:51:19 +0000 Subject: [padb-devel] [padb] r307 committed - Create a branch for the 2.5 release using r58. This code was released... Message-ID: <000e0cd2149e714e180477414e26@google.com> Revision: 307 Author: apittman Date: Sat Oct 31 13:50:49 2009 Log: Create a branch for the 2.5 release using r58. This code was released long ago now but it should have had a branch at the time and creating one allows people to create diffs against it. http://code.google.com/p/padb/source/detail?r=307 Added: /branches/2.5