Ah OK, I put it there only because the user couldn't read it from my home
space, and I never even thought of that. Gahhh.

Thanks,

BTW I tried joining the padb mailing list.

Brock Palen
www.umich.edu/~brockp
Center for Advanced Computing
bro...@umich.edu
(734)936-1985



On Sep 1, 2010, at 6:11 PM, Ashley Pittman wrote:

> 
> The padb executable (it's a Perl script) needs to exist on all nodes, because
> it calls orterun on itself. Try installing it to a shared directory, or
> copying padb to /tmp on every node.
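A minimal sketch of the "copy padb to /tmp on every node" approach follows. It assumes passwordless ssh/scp to each compute node. It also assumes a node list with one hostname per line, such as a Torque/PBS `$PBS_NODEFILE`, which repeats a host once per allocated slot; whether this cluster uses PBS is an assumption. The helpers `unique_hosts`, `copy_commands` and `main` are hypothetical names, not part of padb. `scp -p` is used so the copy keeps its executable mode.

```python
import subprocess
import sys


def unique_hosts(lines):
    """Return distinct hostnames in first-seen order, skipping blank lines.

    Assumes one hostname per line (e.g. a Torque/PBS $PBS_NODEFILE,
    where each host is repeated once per allocated slot).
    """
    seen = set()
    hosts = []
    for line in lines:
        host = line.strip()
        if host and host not in seen:
            seen.add(host)
            hosts.append(host)
    return hosts


def copy_commands(hosts, src="padb", dest="/tmp/padb"):
    """Build (but do not run) one scp command per node.

    -p preserves the file mode, so the copy stays executable.
    """
    return [["scp", "-p", src, f"{host}:{dest}"] for host in hosts]


def main(nodefile, src="padb"):
    with open(nodefile) as fh:
        hosts = unique_hosts(fh)
    for cmd in copy_commands(hosts, src=src):
        # check=True stops at the first node where the copy fails
        subprocess.run(cmd, check=True)


if __name__ == "__main__" and len(sys.argv) >= 2:
    # hypothetical usage: python copy_padb.py "$PBS_NODEFILE" /path/to/padb
    main(*sys.argv[1:3])
```

Copying to /tmp sidesteps the home-directory permission problem, but the copy has to be repeated for every new allocation. A single install on shared storage avoids that.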
> 
> To access the message queues, padb needs a compiled helper program, which is
> installed in $PREFIX/lib, so I would recommend rebuilding padb with its
> prefix set to an NFS-shared directory. I can help you more with this if
> required.
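Before rebuilding padb with a shared prefix, it may help to confirm that the chosen directory really lives on NFS on the compute nodes. The following is a minimal sketch, assuming Linux nodes that list their mounts in `/proc/mounts` (whitespace-separated fields: device, mount point, filesystem type, options, ...). `fs_type_of` and `is_nfs` are hypothetical helpers. The sketch picks the longest mount point containing the path. It does not decode octal escapes such as `\040` in mount-point names.

```python
import os


def fs_type_of(path, mounts_text):
    """Return the filesystem type of the mount that contains `path`.

    `mounts_text` is in /proc/mounts format: one mount per line with
    whitespace-separated fields (device, mount point, fs type, ...).
    The longest matching mount point wins; on a tie the later line wins,
    since a later mount on the same point hides the earlier one.
    """
    best_point, best_type = "", None
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 3:
            continue
        point, fstype = fields[1], fields[2]
        # "/home" must not match "/homework", so compare with a trailing "/"
        inside = path == point or path.startswith(point.rstrip("/") + "/")
        if inside and len(point) >= len(best_point):
            best_point, best_type = point, fstype
    return best_type


def is_nfs(path, mounts_file="/proc/mounts"):
    """True if `path` resolves to an NFS (v2/v3 or v4) mount on this node."""
    with open(mounts_file) as fh:
        fstype = fs_type_of(os.path.realpath(path), fh.read())
    return fstype in ("nfs", "nfs4")
```

Running `is_nfs("/path/to/shared/prefix")` (a placeholder path) on a compute node should return True before that directory is used as padb's install prefix.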
> 
> Ashley,
> 
> On 1 Sep 2010, at 23:01, Brock Palen wrote:
> 
>> We have ddt, but we do not have licenses to attach to the number of cores 
>> these jobs run at.
>> 
>> I tried padb, but it fails:
>> 
>> Example:
>> 
>> ssh to the root node of the running MPI job:
>> /tmp/padb -Q -a
>> 
>> [nyx0862.engin.umich.edu:25054] [[22211,0],0]-[[25542,0],0] oob-tcp: Communication retries exceeded.  Can not communicate with peer
>> [nyx0862.engin.umich.edu:25054] [[22211,0],0] ORTE_ERROR_LOG: Unreachable in file util/comm/comm.c at line 62
>> [nyx0862.engin.umich.edu:25054] [[22211,0],0] ORTE_ERROR_LOG: Unreachable in file orte-ps.c at line 799
>> [nyx0862.engin.umich.edu:25054] [[22211,0],0]-[[25542,0],0] oob-tcp: Communication retries exceeded.  Can not communicate with peer
>> einner: --------------------------------------------------------------------------
>> einner: orterun was unable to launch the specified application as it could not access
>> einner: or execute an executable:
>> Unexpected EOF from Inner stdout (connecting)
>> Unexpected EOF from Inner stderr (connecting)
>> Unexpected exit from parallel command (state=connecting)
>> Bad exit code from parallel command (exit_code=131)
> 
> -- 
> 
> Ashley Pittman, Bath, UK.
> 
> Padb - A parallel job inspection tool for cluster computing
> http://padb.pittman.org.uk
> 
> 
> _______________________________________________
> users mailing list
> us...@open-mpi.org
> http://www.open-mpi.org/mailman/listinfo.cgi/users
> 
> 

