Ah ok, I put it there just because the user couldn't read that from my home space, and never even thought of that. gahhh.
Thanks, BTW I tried joining the padb mailing list.

Brock Palen
www.umich.edu/~brockp
Center for Advanced Computing
bro...@umich.edu
(734)936-1985

On Sep 1, 2010, at 6:11 PM, Ashley Pittman wrote:

> padb as a binary (it's a perl script) needs to exist on all nodes as it calls
> orterun on itself, try installing it to a shared directory or copying padb to
> /tmp on every node.
>
> To access the message queues padb needs a compiled helper program which is
> installed in $PREFIX/lib so I would recommend re-building padb giving it a
> prefix of a NFS shared directory. I can help you more with this if required.
>
> Ashley,
>
> On 1 Sep 2010, at 23:01, Brock Palen wrote:
>
>> We have ddt, but we do not have licenses to attach to the number of cores
>> these jobs run at.
>>
>> I tried padb, but it fails,
>>
>> Example:
>>
>> ssh to root node for running MPI job:
>> /tmp/padb -Q -a
>>
>> [nyx0862.engin.umich.edu:25054] [[22211,0],0]-[[25542,0],0] oob-tcp:
>> Communication retries exceeded. Can not communicate with peer
>> [nyx0862.engin.umich.edu:25054] [[22211,0],0] ORTE_ERROR_LOG: Unreachable in
>> file util/comm/comm.c at line 62
>> [nyx0862.engin.umich.edu:25054] [[22211,0],0] ORTE_ERROR_LOG: Unreachable in
>> file orte-ps.c at line 799
>> [nyx0862.engin.umich.edu:25054] [[22211,0],0]-[[25542,0],0] oob-tcp:
>> Communication retries exceeded. Can not communicate with peer
>> einner:
>> --------------------------------------------------------------------------
>> einner: orterun was unable to launch the specified application as it could
>> not access
>> einner: or execute an executable:
>> Unexpected EOF from Inner stdout (connecting)
>> Unexpected EOF from Inner stderr (connecting)
>> Unexpected exit from parallel command (state=connecting)
>> Bad exit code from parallel command (exit_code=131)
>
> --
>
> Ashley Pittman, Bath, UK.
>
> Padb - A parallel job inspection tool for cluster computing
> http://padb.pittman.org.uk
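
For anyone following the thread, the "copy padb to /tmp on every node" suggestion quoted above could be scripted roughly as below. This is only a sketch under assumptions, not anything shipped with padb: the function names and the node names are hypothetical, and it assumes passwordless scp from the launch node to each compute node. By default it does a dry run and only prints the commands it would execute.

```python
import subprocess


def copy_commands(nodes, src="/tmp/padb", dest="/tmp/padb"):
    """Build one scp command per node that copies src to dest on that node."""
    # -p asks scp to preserve the file mode, so the script stays executable.
    return [["scp", "-p", src, f"{node}:{dest}"] for node in nodes]


def distribute(nodes, src="/tmp/padb", dest="/tmp/padb", dry_run=True):
    """Print (dry run) or execute the scp commands; return those that failed."""
    failed = []
    for cmd in copy_commands(nodes, src, dest):
        if dry_run:
            print(" ".join(cmd))
            continue
        if subprocess.run(cmd).returncode != 0:
            failed.append(cmd)
    return failed


if __name__ == "__main__":
    # Hypothetical node names; substitute the hosts from the job's node list.
    distribute(["node001", "node002"])
```

Copying the script this way only covers the first point in the quoted reply. The compiled helper needed for the message queues still lives under $PREFIX/lib, so a rebuild with a prefix on a shared NFS directory remains the more complete fix.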