You're calling MPI_Bcast with root=0, so whatever value rank 0 has for srv, 
every rank will have after the bcast.  Also, I didn't see anywhere in your 
code where *srv was ever initialized to 0.

In my runs, rank 0 is usually the one that publishes first.  Every other 
rank's lookup then succeeds, the bcast sends srv=1 to everyone, and they all 
then try to call MPI_Comm_accept.

Your code was incomplete, so I had to extend it; see attached.

Here's a sample output with 8 procs:

[7:12] svbu-mpi:~/mpi % mpicc lookup.c -o lookup -g && mpirun lookup
[0] Publish name
[0] service ocean available at 
3853516800.0;tcp://172.29.218.140:36685;tcp://10.10.10.140:36685;tcp://10.10.20.140:36685;tcp://10.10.30.140:36685;tcp://172.16.68.1:36685;tcp://172.16.29.1:36685+3853516801.0;tcp://172.29.218.150:34210;tcp://10.10.30.150:34210:300
Bcast
Bcast complete: srv=1
Server calling MPI_Comm_accept
[2] Lookup name
[6] Lookup name
[4] Lookup name
[3] Lookup name
MPI_Lookup_name succeeded
Bcast
Bcast complete: srv=1
Server calling MPI_Comm_accept
[1] Lookup name
[7] Lookup name
MPI_Lookup_name succeeded
Bcast
Bcast complete: srv=1
Server calling MPI_Comm_accept
MPI_Lookup_name succeeded
Bcast
Bcast complete: srv=1
Server calling MPI_Comm_accept
[5] Lookup name
MPI_Lookup_name succeeded
Bcast
Bcast complete: srv=1
Server calling MPI_Comm_accept
MPI_Lookup_name succeeded
Bcast
Bcast complete: srv=1
Server calling MPI_Comm_accept
MPI_Lookup_name succeeded
Bcast
MPI_Lookup_name succeeded
Bcast
Bcast complete: srv=1
Server calling MPI_Comm_accept
Bcast complete: srv=1
Server calling MPI_Comm_accept
[hang -- because everyone's in accept, not connect]



On Jan 7, 2011, at 4:17 AM, Bernard Secher - SFME/LGLS wrote:

> Jeff,
> 
> Only the processes of the program whose process 0 succeeded in publishing 
> the name have srv=1 and then call MPI_Comm_accept.
> The processes of the program whose process 0 failed to publish the name 
> have srv=0 and then call MPI_Comm_connect.
> 
> That's how it works with Open MPI 1.4.1.
> 
> Is it different with Open MPI 1.5.1?
> 
> Best
> Bernard
> 
> 
> Jeff Squyres wrote:
>> On Jan 5, 2011, at 10:36 AM, Bernard Secher - SFME/LGLS wrote:
>> 
>>   
>> 
>>> MPI_Comm remoteConnect(int myrank, int *srv, char *port_name, char *service)
>>> {
>>>   int clt = 0;
>>>   MPI_Request request; /* request for non-blocking communication */
>>>   MPI_Comm gcom;
>>>   MPI_Status status;
>>>   char port_name_clt[MPI_MAX_PORT_NAME];
>>> 
>>>   if (service == NULL) service = defaultService;
>>> 
>>>   /* only the process of rank 0 can publish the name */
>>>   MPI_Barrier(MPI_COMM_WORLD);
>>> 
>>>   /* A lookup for an unpublished service generates an error */
>>>   MPI_Errhandler_set(MPI_COMM_WORLD, MPI_ERRORS_RETURN);
>>>   if (myrank == 0) {
>>>     /* Try to be a server.  If the service is already published,
>>>        try to be a client. */
>>>     MPI_Open_port(MPI_INFO_NULL, port_name);
>>>     printf("[%d] Publish name\n", myrank);
>>>     if (MPI_Publish_name(service, MPI_INFO_NULL, port_name) == MPI_SUCCESS) {
>>>       *srv = 1;
>>>       printf("[%d] service %s available at %s\n", myrank, service, port_name);
>>>     }
>>>     else if (MPI_Lookup_name(service, MPI_INFO_NULL, port_name_clt) == MPI_SUCCESS) {
>>>       MPI_Close_port(port_name);
>>>       clt = 1;
>>>     }
>>>     else
>>>       /* Throw exception */
>>>       printf("[%d] Error\n", myrank);
>>>   }
>>>   else {
>>>     /* Wait for rank 0 to publish the name */
>>>     sleep(1);
>>>     printf("[%d] Lookup name\n", myrank);
>>>     if (MPI_Lookup_name(service, MPI_INFO_NULL, port_name_clt) == MPI_SUCCESS) {
>>>       clt = 1;
>>>     }
>>>     else
>>>       /* Throw exception */
>>>       ;
>>>   }
>>>   MPI_Errhandler_set(MPI_COMM_WORLD, MPI_ERRORS_ARE_FATAL);
>>> 
>>>   MPI_Bcast(srv, 1, MPI_INT, 0, MPI_COMM_WORLD);
>>> 
>> 
>> You're broadcasting srv here -- won't everyone now have *srv==1, such that 
>> they all call MPI_COMM_ACCEPT, below?
>> 
>>   
>> 
>>>   if (*srv)
>>>     /* I am the Master */
>>>     MPI_Comm_accept(port_name, MPI_INFO_NULL, 0, MPI_COMM_WORLD, &gcom);
>>>   else {
>>>     /* Connect to the SERVER service; get the server inter-communicator */
>>>     MPI_Errhandler_set(MPI_COMM_WORLD, MPI_ERRORS_RETURN);
>>>     if (MPI_Comm_connect(port_name_clt, MPI_INFO_NULL, 0, MPI_COMM_WORLD, &gcom) == MPI_SUCCESS)
>>>       printf("[%d] I got the connection with %s at %s!\n", myrank, service, port_name_clt);
>>>     MPI_Errhandler_set(MPI_COMM_WORLD, MPI_ERRORS_ARE_FATAL);
>>>   }
>>> 
>>>   if (myrank != 0) *srv = 0;
>>> 
>>>   return gcom;
>>> }


-- 
Jeff Squyres
jsquy...@cisco.com
For corporate legal information go to:
http://www.cisco.com/web/about/doing_business/legal/cri/

Attachment: lookup.c
Description: Binary data
