You could try configuring with --enable-debug and then setting -mca
dpm_base_verbose 5 on the cmd line of the two jobs that are trying to connect.
That should provide some useful debug info.
BTW: how did you configure OMPI?
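A rough sketch of the debug setup suggested above, assuming Open MPI is built from source and that the two jobs are started from executables named ./server and ./client. Both program names and the -np values are placeholders, not taken from the original report:

```shell
# Rebuild Open MPI with debugging support (run inside the Open MPI source tree).
./configure --enable-debug
make all install

# Start each of the two jobs with verbose output from the dpm framework.
# ./server and ./client are placeholder program names; -np 1 is illustrative.
mpirun -mca dpm_base_verbose 5 -np 1 ./server
mpirun -mca dpm_base_verbose 5 -np 1 ./client
```

Running each mpirun in its own terminal keeps the debug output of the two jobs separate; any other options the jobs already use would be added to these command lines unchanged.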
On Dec 21, 2010, at 7:33 AM, Suraj Prabhakaran wrote:
>
> On 12/21/2
On 12/21/2010 03:12 PM, Ralph Castain wrote:
Are you using ompi-server for pub/sub, or just letting it default to
mpirun?
You might want to output the return value from lookup_name and
publish_name to see if they match. If they are different, then you
will definitely hang.
I used ompi-serv
Are you using ompi-server for pub/sub, or just letting it default to mpirun?
You might want to output the return value from lookup_name and publish_name to
see if they match. If they are different, then you will definitely hang.
On Dec 21, 2010, at 6:41 AM, Suraj Prabhakaran wrote:
> Hello,
>
Hello,
This is basically a repost of my previous mail regarding problems with
connect/accept and disconnect (**this is not related to spawning,
parent/child**).
I *sometimes* find processes blocking indefinitely at Connect/Accept
calls or at Disconnect calls. I have an example below.
Process