[padb] (version-dependent?) problem with ORTE

Ashley Pittman ashley at pittman.co.uk
Sun Dec 5 22:06:39 GMT 2010


On 5 Dec 2010, at 19:16, Dave Love wrote:

> I reported a while ago that I couldn't make ORTE work, and I've found
> out why.  With open-mpi 1.4.1 or 1.4.2, the problem is that the format
> padb expects from ompi-ps is wrong.  I assume it's changed at some
> stage.  Here's a (truncated) sample:

This looks to be a checkpoint-dependent change. I regularly update my open-mpi install and have never had this problem, but I don't have the Ckpt entries in the output.

> The following patch fixes it for me, but presumably it will break
> whatever version the support was done for originally, so I don't know
> what to do for a real patch.  Maybe you need to match patterns in the
> records, rather than just checking the number of fields?  Let me know if
> I can provide any more info to help disambiguate things.

Yes, this would break it for most other people, although I'm glad it works for you.  Interestingly, Perl seems to be ignoring the empty fields after splitting on |, so it's likely that if you were using checkpoint-restart it would also break for you.  The attached should fix it and still works for me.  I'm incredibly reluctant to apply this so close to release, though, because I've no idea what valid values the ompi_is_valid_state() function should check for.  I'll have another go tomorrow and try adding some code to keep track of how far through the output the parsing has progressed.  Alternatively, it's possible to make ompi_is_valid_state() always return 1 with this patch and have it still work, but then it's based on matching the jobid only, which doesn't strike me as a good idea.
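
To illustrate the point about empty fields, here is a rough sketch in Python rather than Perl. The two sample records and both helper names are made up for illustration and are not real ompi-ps output or padb code. It only assumes that the records are split on a bare | and that Perl's split, when given no limit, discards trailing empty fields (a negative limit keeps them).

```python
# Hypothetical '|'-separated records, NOT real ompi-ps output.
# The last two columns stand in for the checkpoint fields.
EMPTY_TAIL = "app|0|1234|node01|Running||"
FILLED_TAIL = "app|0|1234|node01|Running|ckpt-1|/tmp/snap"


def perl_default_split(line):
    # Mimics Perl's split with no LIMIT: trailing empty fields are dropped.
    fields = line.split("|")
    while fields and fields[-1] == "":
        fields.pop()
    return fields


def keep_all_split(line):
    # Mimics Perl's split with a negative LIMIT: every field is kept.
    return line.split("|")


for name, line in (("empty tail", EMPTY_TAIL), ("filled tail", FILLED_TAIL)):
    # Print the field count under each behaviour for comparison.
    print(name, len(perl_default_split(line)), len(keep_all_split(line)))
```

With the default behaviour the field count depends on whether the trailing columns happen to be filled in, which is why a check on the number of fields alone can pass on one setup and fail on another.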

Ashley,

-------------- next part --------------
A non-text attachment was scrubbed...
Name: ompi-ps-format.patch
Type: application/octet-stream
Size: 2043 bytes
Desc: not available
URL: <http://pittman.org.uk/pipermail/padb-devel_pittman.org.uk/attachments/20101205/da2010eb/attachment.obj>
-------------- next part --------------


-- 

Ashley Pittman, Bath, UK.

Padb - A parallel job inspection tool for cluster computing
http://padb.pittman.org.uk


