From duncan.thomas at quadrics.com Tue Jun 9 12:54:06 2009
From: duncan.thomas at quadrics.com (Duncan Thomas)
Date: Tue, 9 Jun 2009 12:54:06 +0100
Subject: [padb-users] PADB and multiple resource managers
Message-ID: <1244548446.4901.26.camel@quadl003>

Hi

Running padb with multiple resource managers, is it possible for -a and
-A to walk all resource managers? Usually I've only got one job running,
the one I want to debug, so it should just be able to find it for me.

Cheers

--
Duncan Thomas

From ashley at pittman.co.uk Wed Jun 10 12:24:45 2009
From: ashley at pittman.co.uk (Ashley Pittman)
Date: Wed, 10 Jun 2009 12:24:45 +0100
Subject: [padb-users] PADB and multiple resource managers
In-Reply-To: <1244548446.4901.26.camel@quadl003>
References: <1244548446.4901.26.camel@quadl003>
Message-ID: <1244633085.8451.6.camel@localhost.localdomain>

On Tue, 2009-06-09 at 12:54 +0100, Duncan Thomas wrote:
> Hi
>
> Running padb with multiple resource managers, is it possible for -a and
> -A to walk all resource managers? Usually I've only got one job running,
> the one I want to debug, so it should just be able to find it for me.

That makes sense to me for the general case, do you really have multiple
independent resource managers on the same machine however? How about -a
and -A will work for "any" resource manager as long as only one of them
is reporting jobs? This would avoid reporting the same job multiple
times in the case where you have separate resource managers and job
launchers, say lsf and rms.

I guess the rmgr setting is the only piece of cluster configuration that
padb takes, I'd assumed that it would be correctly detected and if it
wasn't then the administrator would configure it.

Ashley,

--

Ashley Pittman, Bath, UK.
Padb - A parallel job inspection tool for cluster computing
http://padb.pittman.org.uk

From duncan.thomas at quadrics.com Wed Jun 10 12:39:46 2009
From: duncan.thomas at quadrics.com (Duncan Thomas)
Date: Wed, 10 Jun 2009 12:39:46 +0100
Subject: [padb-users] PADB and multiple resource managers
In-Reply-To: <1244633085.8451.6.camel@localhost.localdomain>
References: <1244548446.4901.26.camel@quadl003>
 <1244633085.8451.6.camel@localhost.localdomain>
Message-ID: <1244633986.6253.3.camel@quadl003>

On Wed, 2009-06-10 at 12:24 +0100, Ashley Pittman wrote:
> On Tue, 2009-06-09 at 12:54 +0100, Duncan Thomas wrote:
> > Hi
> >
> > Running padb with multiple resource managers, is it possible for -a and
> > -A to walk all resource managers? Usually I've only got one job running,
> > the one I want to debug, so it should just be able to find it for me.
>
> That makes sense to me for the general case, do you really have multiple
> independent resource managers on the same machine however? How about -a
> and -A will work for "any" resource manager as long as only one of them
> is reporting jobs? This would avoid reporting the same job multiple
> times in the case where you have separate resource managers and job
> launchers, say lsf and rms.

Independent ish. A job might exist in one, the other or both,
particularly when doing testing. Reporting on the first resource manager
that has any jobs probably makes sense.
Cheers

--
Duncan Thomas

From ashley at pittman.co.uk Wed Jun 10 13:52:35 2009
From: ashley at pittman.co.uk (Ashley Pittman)
Date: Wed, 10 Jun 2009 13:52:35 +0100
Subject: [padb-users] PADB and multiple resource managers
In-Reply-To: <1244633986.6253.3.camel@quadl003>
References: <1244548446.4901.26.camel@quadl003>
 <1244633085.8451.6.camel@localhost.localdomain>
 <1244633986.6253.3.camel@quadl003>
Message-ID: <1244638355.8451.10.camel@localhost.localdomain>

On Wed, 2009-06-10 at 12:39 +0100, Duncan Thomas wrote:
> On Wed, 2009-06-10 at 12:24 +0100, Ashley Pittman wrote:

> Independent ish. A job might exist in one, the other or both,
> particularly when doing testing. Reporting on the first resource manager
> that has any jobs probably makes sense.

Can you try the following patch and let me know if it meets your needs,
if there are multiple installed resource managers detected it'll check
how many of them have active jobs, if only one has active jobs it'll use
that one otherwise it'll bounce the decision back to the user.

If I don't hear I'll commit this sometime tomorrow.

Ashley,

--

Ashley Pittman, Bath, UK.

Padb - A parallel job inspection tool for cluster computing
http://padb.pittman.org.uk
-------------- next part --------------
A non-text attachment was scrubbed...
Name: padb-multiple-rmgrs.patch
Type: text/x-patch
Size: 4595 bytes
Desc: not available
URL:

From daniel.kidger at googlemail.com Wed Jun 10 17:47:48 2009
From: daniel.kidger at googlemail.com (Daniel Kidger)
Date: Wed, 10 Jun 2009 17:47:48 +0100
Subject: [padb-users] PADB and multiple resource managers
In-Reply-To: <1244638355.8451.10.camel@localhost.localdomain>
References: <1244548446.4901.26.camel@quadl003>
 <1244633085.8451.6.camel@localhost.localdomain>
 <1244633986.6253.3.camel@quadl003>
 <1244638355.8451.10.camel@localhost.localdomain>
Message-ID: <37e88ea60906100947s42ff9f0fu14825bc81c83234d@mail.gmail.com>

hmm

consider if you have LSF and RMS running below it.
Then the parallel job could be found from either.

Does this matter ? hopefully not - unless going by one route has any
extra functionality compared to the other

Also is always scanning all RMs potentially slow?

Daniel

2009/6/10 Ashley Pittman

> On Wed, 2009-06-10 at 12:39 +0100, Duncan Thomas wrote:
> > On Wed, 2009-06-10 at 12:24 +0100, Ashley Pittman wrote:
>
> > Independent ish. A job might exist in one, the other or both,
> > particularly when doing testing. Reporting on the first resource manager
> > that has any jobs probably makes sense.
>
> Can you try the following patch and let me know if it meets your needs,
> if there are multiple installed resource managers detected it'll check
> how many of them have active jobs, if only one has active jobs it'll use
> that one otherwise it'll bounce the decision back to the user.
>
> If I don't hear I'll commit this sometime tomorrow.
>
> Ashley,
>
> --
>
> Ashley Pittman, Bath, UK.
>
> Padb - A parallel job inspection tool for cluster computing
> http://padb.pittman.org.uk
>
> _______________________________________________
> padb-users mailing list
> padb-users at pittman.org.uk
> http://pittman.org.uk/mailman/listinfo/padb-users_pittman.org.uk
>
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From ashley at pittman.co.uk Wed Jun 10 20:14:58 2009
From: ashley at pittman.co.uk (Ashley Pittman)
Date: Wed, 10 Jun 2009 20:14:58 +0100
Subject: [padb-users] PADB and multiple resource managers
In-Reply-To: <37e88ea60906100947s42ff9f0fu14825bc81c83234d@mail.gmail.com>
References: <1244548446.4901.26.camel@quadl003>
 <1244633085.8451.6.camel@localhost.localdomain>
 <1244633986.6253.3.camel@quadl003>
 <1244638355.8451.10.camel@localhost.localdomain>
 <37e88ea60906100947s42ff9f0fu14825bc81c83234d@mail.gmail.com>
Message-ID: <1244661298.4183.50.camel@localhost.localdomain>

On Wed, 2009-06-10 at 17:47 +0100, Daniel Kidger wrote:
> hmm
>
> consider if you have LSF and RMS running below it.
> Then the parallel job could be found from either.

This is why I say padb should only automatically choose between multiple
resource managers if only one of them has active jobs, in the scenario
you describe it would refuse to run and require the user to specify a
resource manager.

> Does this matter ? hopefully not - unless going by one route has any
> extra functionality compared to the other

Yes, you don't want the same job targeted twice. I view this as
something of a special case, if a cluster does have multiple RM's and
padb doesn't know the relationship between them then I'd expect the
administrator to set which one to use in padb.conf at which point this
becomes a non-issue.

> Also is always scanning all RMs potentially slow?

Slower than not doing so although it already scans the selected RM to
query jobs or verify the selected one is indeed running so "not very" is
probably the correct answer here.

Ashley,

--

Ashley Pittman, Bath, UK.

Padb - A parallel job inspection tool for cluster computing
http://padb.pittman.org.uk

From ashley at pittman.co.uk Thu Jun 18 09:37:02 2009
From: ashley at pittman.co.uk (Ashley Pittman)
Date: Thu, 18 Jun 2009 09:37:02 +0100
Subject: [padb-users] 2.5 Release candidate available
Message-ID: <1245314222.4226.3.camel@localhost.localdomain>

All,

A 2.5-rc1 candidate is available for download. Assuming no problems are
reported with this I'll make it into an official release Monday or
Tuesday next week.

http://padb.googlecode.com/files/padb-2.5-rc1.tgz

Yours,

Ashley Pittman.

--

Ashley Pittman, Bath, UK.

Padb - A parallel job inspection tool for cluster computing
http://padb.pittman.org.uk

From ashley at pittman.co.uk Wed Jun 24 08:55:26 2009
From: ashley at pittman.co.uk (Ashley Pittman)
Date: Wed, 24 Jun 2009 08:55:26 +0100
Subject: [padb-users] 2.5 release avalaible for download.
Message-ID: <1245830126.3886.5.camel@localhost.localdomain>

All,

The new 2.5 release is available for download from the website.
This is the same code as was in the release candidate made last
Thursday, I've had a number of reports of intermittent errors on mpd
(MPICH2), this problem is being worked and will be addressed in later
versions.

Ashley Pittman,

--

Ashley Pittman, Bath, UK.

Padb - A parallel job inspection tool for cluster computing
http://padb.pittman.org.uk
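
For anyone reading the "PADB and multiple resource managers" thread
above: the selection rule Ashley describes (use the one resource manager
that reports active jobs, otherwise bounce the decision back to the
user) can be sketched roughly as below. This is an illustration in
Python only, not padb's code; the padb-multiple-rmgrs.patch attachment
was scrubbed from the archive. The function name, the exception and the
dictionary layout are all hypothetical, and "configured" merely stands
in for the rmgr setting mentioned in the thread.

```python
class AmbiguousResourceManager(Exception):
    """Raised when a resource manager cannot be chosen automatically."""


def choose_resource_manager(active_jobs, configured=None):
    """Pick the resource manager to query (hypothetical helper).

    active_jobs: dict mapping each detected resource manager name
                 (e.g. "rms", "lsf") to the list of job ids it reports.
                 This layout is an assumption made for the sketch.
    configured:  name set explicitly by the administrator or user,
                 standing in for the rmgr setting; None if unset.
    """
    # An explicit setting always wins, so no detection is needed.
    if configured is not None:
        return configured

    # With a single detected resource manager there is nothing to decide.
    if len(active_jobs) == 1:
        return next(iter(active_jobs))

    # Several are installed: keep only those reporting at least one job.
    with_jobs = [name for name, jobs in active_jobs.items() if jobs]
    if len(with_jobs) == 1:
        return with_jobs[0]

    # None or several have jobs: hand the decision back to the user.
    raise AmbiguousResourceManager(
        "cannot choose automatically between: "
        + ", ".join(sorted(active_jobs))
    )
```

With LSF running on top of RMS, as in Daniel's example, both would
report the job, so the sketch raises rather than risk targeting the
same job twice; setting "configured" avoids the question entirely.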