From harris.duncan at gmail.com Mon May 12 11:55:09 2014
From: harris.duncan at gmail.com (Duncan .H)
Date: Mon, 12 May 2014 11:55:09 +0100
Subject: [padb-users] Trouble running padb with intelmpi
Message-ID:

Hi,

We're having problems running padb with intelMPI (4.x) on our systems
and were hoping for some advice on tracking down the problem.

We keep getting these errors:

--------
host3:~> padb --show-jobs
33224
host3:~> padb -tx 33224
No MPIR_proctable_size symbol found, cannot continue
No suitable backend found (perhaps try installing pdsh or clush ?)!
Fatal problem setting up the resource manager: mpirun
--------

pdsh -is- installed and available to the user.
We have Hydra as our underlying process manager.
We have gdb version 7.2-48.e16
We've tried intelmpi 4.0.3 and 4.1.1 with the same results.

Explicitly setting the resource manager to mpirun doesn't help:

host3:~> padb --list-rmgrs
local: 33064 33135 33171 33172 33224 33229 33230 33234 33235 33236 33237 33238 33239 33240 33241 33242 33243 33244 33245 33246 33247 33248 33249 33350 33351 33797
local-fd: No active jobs.
local-qsnet: Not detected on system.
lsf: Not detected on system.
lsf-rms: Not detected on system.
mpd: No active jobs.
mpirun: 33224
orte: Not detected on system.
pbs: Warning, job is listed with unexpected server Warning, job is listed with unexpected server 794585 794586
rms: Not detected on system.
slurm: Not detected on system.
host3:~> export PADB_RMGR=mpirun
host3:~> padb -tx 33224
No MPIR_proctable_size symbol found, cannot continue
No suitable backend found (perhaps try installing pdsh or clush ?)!
Fatal problem setting up the resource manager: mpirun

Any suggestions?

Thanks,
Duncan

From ashley at pittman.co.uk Tue May 13 20:22:39 2014
From: ashley at pittman.co.uk (Ashley Pittman)
Date: Tue, 13 May 2014 20:22:39 +0100
Subject: [padb-users] Trouble running padb with intelmpi
In-Reply-To:
References:
Message-ID: <6D089CF7-568C-4F47-BBE9-9B3CB85147C3@pittman.co.uk>

On 12 May 2014, at 11:55, Duncan .H wrote:

> Hi,
> We're having problems running padb with intelMPI (4.x) on our systems
> and were hoping for some advice on tracking down the problem.
>
> We keep getting these errors:
>
> --------
> host3:~> padb --show-jobs
> 33224
> host3:~> padb -tx 33224
> No MPIR_proctable_size symbol found, cannot continue
> No suitable backend found (perhaps try installing pdsh or clush ?)!
> Fatal problem setting up the resource manager: mpirun
> --------

I think what is happening here is that the mpirun code is failing to
find the process table, and the backend code is then looking for a
backend that can run on zero hosts; as such, the pdsh message is being
erroneously reported. I can fix that specific issue, but obviously it
won't help you with the underlying problem here.

MPICH2 and Hydra need to be built with the --enable-debuginfo configure
flag to enable both the message queue support and the MPIR_proctable
interface, which allows debugger attach. I'm assuming the Hydra you're
using is provided as part of intelMPI, in which case it would be worth
asking them if they build with that option enabled.

Ashley,
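
One rough way to check whether a given launcher binary exposes the process table symbols mentioned above is to look through its symbol table. The sketch below is only an illustration, not part of padb: it assumes the standard binutils `nm` tool is on the PATH, the helper names `mpir_symbols_in_nm_output` and `check_launcher` are made up for this example, and the path to the Hydra launcher on your system is something you would have to supply yourself. A missing symbol in this output is a hint rather than proof, since symbols could in principle live elsewhere.

```python
import subprocess

# Symbols that the error message and the reply refer to. A debugger-style
# tool looks these up in the launcher process to find the process table.
MPIR_SYMBOLS = ("MPIR_proctable", "MPIR_proctable_size")


def mpir_symbols_in_nm_output(nm_output):
    """Return {symbol: bool} saying which MPIR symbols appear in nm output."""
    found = set()
    for line in nm_output.splitlines():
        fields = line.split()
        if fields:
            # The symbol name is the last field on each nm output line.
            found.add(fields[-1])
    return {name: name in found for name in MPIR_SYMBOLS}


def check_launcher(path):
    """Run `nm` on the launcher binary at `path` (hypothetical helper)."""
    result = subprocess.run(["nm", path], capture_output=True, text=True)
    # A stripped binary yields no symbol lines, so every entry will be False.
    return mpir_symbols_in_nm_output(result.stdout)
```

Calling `check_launcher()` with the full path to the launcher that started the job returns a small dictionary; if both entries are False, that is consistent with the "No MPIR_proctable_size symbol found" error and with a build that lacks the debugger interface.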