Re: [OMPI devel] Connect/Accept and Disconnect

2010-12-21 Thread Ralph Castain
You could try configuring with --enable-debug and then setting -mca dpm_base_verbose 5 on the command line of the two jobs that are trying to connect. That should provide some useful debug info. BTW: how did you configure OMPI? On Dec 21, 2010, at 7:33 AM, Suraj Prabhakaran wrote: > > On 12/21/2

Re: [OMPI devel] Connect/Accept and Disconnect

2010-12-21 Thread Suraj Prabhakaran
On 12/21/2010 03:12 PM, Ralph Castain wrote: Are you using ompi-server for pub/sub, or just letting it default to mpirun? You might want to output the return value from lookup_name and publish_name to see if they match. If they are different, then you will definitely hang. I used ompi-serv

Re: [OMPI devel] Connect/Accept and Disconnect

2010-12-21 Thread Ralph Castain
Are you using ompi-server for pub/sub, or just letting it default to mpirun? You might want to output the return value from lookup_name and publish_name to see if they match. If they are different, then you will definitely hang. On Dec 21, 2010, at 6:41 AM, Suraj Prabhakaran wrote: > Hello, >
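The check suggested above can be sketched in plain C. This is only an illustration, and it assumes that "return value" means the port name string each side ends up with: the one the accepting job passes to MPI_Publish_name and the one the connecting job gets back from MPI_Lookup_name. The helpers format_port_line and ports_match are hypothetical names, not part of Open MPI, and the sketch leaves out the MPI calls themselves so that it builds without an MPI installation.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper (not an Open MPI function): formats one log line
 * for a port name so that each job can print what it published or looked up.
 * 'role' is a free-form label such as "publish" or "lookup".
 * Returns the value of snprintf, or -1 on bad arguments. */
int format_port_line(char *buf, size_t len, const char *role, const char *port_name)
{
    if (buf == NULL || len == 0 || role == NULL || port_name == NULL)
        return -1;
    return snprintf(buf, len, "[%s] port_name=\"%s\"", role, port_name);
}

/* Hypothetical helper: returns 1 when the two port strings are identical,
 * 0 otherwise (including when either pointer is NULL). */
int ports_match(const char *published, const char *looked_up)
{
    if (published == NULL || looked_up == NULL)
        return 0;
    return strcmp(published, looked_up) == 0;
}
```

In a real program each job would call format_port_line on its own string and write the result to stderr, since the two strings live in different processes. ports_match is only useful once both strings have been collected in one place, for example from the two logs.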

[OMPI devel] Connect/Accept and Disconnect

2010-12-21 Thread Suraj Prabhakaran
Hello, This is basically a repost of my previous mail regarding problems with connect/accept and disconnect (**this is not related to spawning, parent/child**). I *sometimes* find processes blocking indefinitely at Connect/Accept calls or at Disconnect calls. I have an example below. Process