From brockp at umich.edu Wed Dec 10 18:57:21 2014
From: brockp at umich.edu (Brock Palen)
Date: Wed, 10 Dec 2014 13:57:21 -0500
Subject: [padb-users] padb issues with openmpi 1.8
Message-ID:

When trying to use padb with openmpi 1.8.2 I'm getting errors:

Warning, failed to locate ranks [0-23,32-79,88-119]

I am invoking with:

#ran on the root node
padb -Ormgr=orte -a --stack-trace --tree

It's strange that it can't see local CPUs (ranks 0-15) but sees some remote ones.

orte-ps gives me a full list of ranks. So I'm confused.

Using padb 3.3

Brock Palen
www.umich.edu/~brockp
CAEN Advanced Computing
XSEDE Campus Champion
brockp at umich.edu
(734)936-1985

From ashley at pittman.co.uk Wed Dec 10 20:31:44 2014
From: ashley at pittman.co.uk (Ashley Pittman)
Date: Wed, 10 Dec 2014 20:31:44 +0000
Subject: [padb-users] padb issues with openmpi 1.8
In-Reply-To:
References:
Message-ID:

There's a couple of obvious things that spring to mind here. Is orte-ps reporting either a hostname or an FQDN for the local ranks? If you could send me the output of orte-ps I can take a look at this tomorrow.

Ashley.

> On 10 Dec 2014, at 18:57, Brock Palen wrote:
>
> When trying to use padb with openmpi 1.8.2 I'm getting errors:
>
> Warning, failed to locate ranks [0-23,32-79,88-119]
>
> I am invoking with:
>
> #ran on the root node
> padb -Ormgr=orte -a --stack-trace --tree
>
> It's strange that it can't see local CPUs (ranks 0-15) but sees some remote ones.
>
> orte-ps gives me a full list of ranks. So I'm confused.
>
> Using padb 3.3
>
> Brock Palen
> www.umich.edu/~brockp
> CAEN Advanced Computing
> XSEDE Campus Champion
> brockp at umich.edu
> (734)936-1985
>
> _______________________________________________
> padb-users mailing list
> padb-users at pittman.org.uk
> http://pittman.org.uk/mailman/listinfo/padb-users_pittman.org.uk

From brockp at umich.edu Wed Dec 10 20:36:43 2014
From: brockp at umich.edu (Brock Palen)
Date: Wed, 10 Dec 2014 15:36:43 -0500
Subject: [padb-users] padb issues with openmpi 1.8
In-Reply-To:
References:
Message-ID: <0A4568E7-E018-4B43-88CE-5002AC819137@umich.edu>

It appears not to be fully qualified.

I don't remember why (this was before I kept notes), but I did have to make an edit to make padb work for us in the past. Here is the diff:

[root at nyx bin]# diff padb padb.old
9141,9142c9141
< # if ( defined $fns->{fns}{ $frame->{func} } ) {
< if ( defined $frame->{func} and defined $fns->{fns}{ $frame->{func} } ) {
---
> if ( defined $fns->{fns}{ $frame->{func} } ) {

This edit is many months old, not sure why.

-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: orte-ps.txt
URL:
-------------- next part --------------

Brock Palen
www.umich.edu/~brockp
CAEN Advanced Computing
XSEDE Campus Champion
brockp at umich.edu
(734)936-1985

> On Dec 10, 2014, at 3:31 PM, Ashley Pittman wrote:
>
> There's a couple of obvious things that spring to mind here. Is orte-ps reporting either a hostname or an FQDN for the local ranks? If you could send me the output of orte-ps I can take a look at this tomorrow.
>
> Ashley.
>
>> On 10 Dec 2014, at 18:57, Brock Palen wrote:
>>
>> When trying to use padb with openmpi 1.8.2 I'm getting errors:
>>
>> Warning, failed to locate ranks [0-23,32-79,88-119]
>>
>> I am invoking with:
>>
>> #ran on the root node
>> padb -Ormgr=orte -a --stack-trace --tree
>>
>> It's strange that it can't see local CPUs (ranks 0-15) but sees some remote ones.
>>
>> orte-ps gives me a full list of ranks.
>> So I'm confused.
>>
>> Using padb 3.3
>>
>> Brock Palen
>> www.umich.edu/~brockp
>> CAEN Advanced Computing
>> XSEDE Campus Champion
>> brockp at umich.edu
>> (734)936-1985

From ashley at pittman.co.uk Wed Dec 10 20:39:24 2014
From: ashley at pittman.co.uk (Ashley Pittman)
Date: Wed, 10 Dec 2014 20:39:24 +0000
Subject: [padb-users] padb issues with openmpi 1.8
In-Reply-To: <0A4568E7-E018-4B43-88CE-5002AC819137@umich.edu>
References: <0A4568E7-E018-4B43-88CE-5002AC819137@umich.edu>
Message-ID: <1F6E8153-66D2-4B4E-90DB-606BC1EB2E01@pittman.co.uk>

This seems odd. I'll try a stock 1.8.2 install here tomorrow and let you know if I can see it at this end.

Ashley.

> On 10 Dec 2014, at 20:36, Brock Palen wrote:
>
> It appears not to be fully qualified.
>
> I don't remember why (this was before I kept notes), but I did have to make an edit to make padb work for us in the past. Here is the diff:
>
> [root at nyx bin]# diff padb padb.old
> 9141,9142c9141
> < # if ( defined $fns->{fns}{ $frame->{func} } ) {
> < if ( defined $frame->{func} and defined $fns->{fns}{ $frame->{func} } ) {
> ---
>> if ( defined $fns->{fns}{ $frame->{func} } ) {
>
> This edit is many months old, not sure why.
>
> Brock Palen
> www.umich.edu/~brockp
> CAEN Advanced Computing
> XSEDE Campus Champion
> brockp at umich.edu
> (734)936-1985
>
>> On Dec 10, 2014, at 3:31 PM, Ashley Pittman wrote:
>>
>> There's a couple of obvious things that spring to mind here. Is orte-ps reporting either a hostname or an FQDN for the local ranks? If you could send me the output of orte-ps I can take a look at this tomorrow.
>>
>> Ashley.
>>
>>> On 10 Dec 2014, at 18:57, Brock Palen wrote:
>>>
>>> When trying to use padb with openmpi 1.8.2 I'm getting errors:
>>>
>>> Warning, failed to locate ranks [0-23,32-79,88-119]
>>>
>>> I am invoking with:
>>>
>>> #ran on the root node
>>> padb -Ormgr=orte -a --stack-trace --tree
>>>
>>> It's strange that it can't see local CPUs (ranks 0-15) but sees some remote ones.
>>>
>>> orte-ps gives me a full list of ranks. So I'm confused.
>>>
>>> Using padb 3.3
>>>
>>> Brock Palen
>>> www.umich.edu/~brockp
>>> CAEN Advanced Computing
>>> XSEDE Campus Champion
>>> brockp at umich.edu
>>> (734)936-1985
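
Ashley's question about short hostnames versus FQDNs can be illustrated with a small sketch. It assumes, without confirmation from the thread or from padb's source, that a tool matching ranks to nodes compares host-name strings exactly, so `node1` and `node1.example.edu` would count as different hosts. The host names and the helpers `short_name`, `same_host`, and `unmatched_ranks` are hypothetical and are not part of padb or Open MPI.

```python
from typing import Dict, List, Set


def short_name(host: str) -> str:
    """Return the part of a host name before the first dot, lower-cased."""
    return host.split(".", 1)[0].lower()


def same_host(a: str, b: str) -> bool:
    """Compare two host names, ignoring any domain suffix and case."""
    return short_name(a) == short_name(b)


def unmatched_ranks(rank_hosts: Dict[int, str], known_hosts: Set[str]) -> List[int]:
    """List ranks whose host string has no exact match in known_hosts.

    This mimics the assumed failure mode: an exact string comparison
    treats a short name and its FQDN as two different hosts.
    """
    return sorted(r for r, h in rank_hosts.items() if h not in known_hosts)


if __name__ == "__main__":
    # Hypothetical data: the launcher reports an FQDN for rank 0 only.
    ranks = {0: "node1.example.edu", 1: "node2"}
    hosts = {"node1", "node2"}
    print(unmatched_ranks(ranks, hosts))
    print(same_host("node1", "node1.example.edu"))
```

If something like this were the cause, comparing the host names printed by orte-ps with the names the nodes report for themselves would show which ranks use a different form.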