You're calling bcast with root=0, so whatever value rank 0 has for srv, everyone will have after the bcast. Plus, I didn't see in your code where *srv was ever set to 0.
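To make the bcast point concrete, here is a minimal sketch in plain C. It makes no MPI calls: the ranks of one job are modeled as an array, and the copy loop stands in for what MPI_Bcast(srv,1,MPI_INT,0,MPI_COMM_WORLD) does to each rank's srv. The names model_bcast, count_accepting and accepting_after_bcast are hypothetical helpers made up for this illustration, and the sketch assumes srv starts at 0 on every rank, which your code does not guarantee.

```c
#include <assert.h>

/* Hypothetical model, not MPI: srv[i] stands for rank i's *srv.
 * After a broadcast, every rank holds the root's value. */
static void model_bcast(int *srv, int nranks, int root)
{
    assert(root >= 0 && root < nranks);
    for (int i = 0; i < nranks; ++i)
        srv[i] = srv[root];
}

/* Number of ranks that would take the "if ( *srv )" branch,
 * i.e. call MPI_Comm_accept rather than MPI_Comm_connect. */
static int count_accepting(const int *srv, int nranks)
{
    int n = 0;
    for (int i = 0; i < nranks; ++i)
        if (srv[i])
            ++n;
    return n;
}

/* Model one job of up to 64 ranks. root_published says whether
 * rank 0's publish succeeded (nonzero) or failed (zero). */
static int accepting_after_bcast(int nranks, int root_published)
{
    int srv[64] = {0};  /* assumption: srv is initialized to 0 everywhere */
    assert(nranks > 0 && nranks <= 64);
    srv[0] = root_published ? 1 : 0;
    model_bcast(srv, nranks, 0);
    return count_accepting(srv, nranks);
}
```

In this model, accepting_after_bcast(8, 1) is 8: when rank 0 publishes, all 8 ranks of that job end up in accept. accepting_after_bcast(8, 0) is 0, but only because the model zeroes srv first.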
In my runs, rank 0 is usually the one that publishes first. Everyone then gets the lookup properly, and then the bcast sends srv=1 to everyone. They all then try to call MPI_Comm_accept.

Your code was incomplete, so I had to extend it; see attached. Here's a sample output with 8 procs:

[7:12] svbu-mpi:~/mpi % mpicc lookup.c -o lookup -g && mpirun lookup
[0] Publish name
[0] service ocean available at 3853516800.0;tcp://172.29.218.140:36685;tcp://10.10.10.140:36685;tcp://10.10.20.140:36685;tcp://10.10.30.140:36685;tcp://172.16.68.1:36685;tcp://172.16.29.1:36685+3853516801.0;tcp://172.29.218.150:34210;tcp://10.10.30.150:34210:300
Bcast
Bcast complete: srv=1
Server calling MPI_Comm_accept
[2] Lookup name
[6] Lookup name
[4] Lookup name
[3] Lookup name
MPI_Lookup_name succeeded
Bcast
Bcast complete: srv=1
Server calling MPI_Comm_accept
[1] Lookup name
[7] Lookup name
MPI_Lookup_name succeeded
Bcast
Bcast complete: srv=1
Server calling MPI_Comm_accept
MPI_Lookup_name succeeded
Bcast
Bcast complete: srv=1
Server calling MPI_Comm_accept
[5] Lookup name
MPI_Lookup_name succeeded
Bcast
Bcast complete: srv=1
Server calling MPI_Comm_accept
MPI_Lookup_name succeeded
Bcast
Bcast complete: srv=1
Server calling MPI_Comm_accept
MPI_Lookup_name succeeded
Bcast
MPI_Lookup_name succeeded
Bcast
Bcast complete: srv=1
Server calling MPI_Comm_accept
Bcast complete: srv=1
Server calling MPI_Comm_accept
[hang -- because everyone's in accept, not connect]


On Jan 7, 2011, at 4:17 AM, Bernard Secher - SFME/LGLS wrote:

> Jeff,
>
> Only the processes of the program where process 0 succeeded in publishing
> the name have srv=1 and then call MPI_Comm_accept.
> The processes of the program where process 0 failed to publish the name have
> srv=0 and then call MPI_Comm_connect.
>
> It worked like this with openmpi 1.4.1.
>
> Is it different with openmpi 1.5.1?
>
> Best
> Bernard
>
>
> Jeff Squyres wrote:
>> On Jan 5, 2011, at 10:36 AM, Bernard Secher - SFME/LGLS wrote:
>>
>>
>>> MPI_Comm remoteConnect(int myrank, int *srv, char *port_name, char* service)
>>> {
>>>   int clt=0;
>>>   MPI_Request request; /* request for non-blocking communication */
>>>   MPI_Comm gcom;
>>>   MPI_Status status;
>>>   char port_name_clt[MPI_MAX_PORT_NAME];
>>>
>>>   if( service == NULL ) service = defaultService;
>>>
>>>   /* only the process of rank 0 can publish the name */
>>>   MPI_Barrier(MPI_COMM_WORLD);
>>>
>>>   /* A lookup for an unpublished service generates an error */
>>>   MPI_Errhandler_set(MPI_COMM_WORLD, MPI_ERRORS_RETURN);
>>>   if( myrank == 0 ){
>>>     /* Try to be a server. If the service is already published, try to be
>>>        a client */
>>>     MPI_Open_port(MPI_INFO_NULL, port_name);
>>>     printf("[%d] Publish name\n",myrank);
>>>     if ( MPI_Publish_name(service, MPI_INFO_NULL, port_name) == MPI_SUCCESS ) {
>>>       *srv = 1;
>>>       printf("[%d] service %s available at %s\n",myrank,service,port_name);
>>>     }
>>>     else if ( MPI_Lookup_name(service, MPI_INFO_NULL, port_name_clt) == MPI_SUCCESS ){
>>>       MPI_Close_port( port_name );
>>>       clt = 1;
>>>     }
>>>     else
>>>       /* Throw exception */
>>>       printf("[%d] Error\n",myrank);
>>>   }
>>>   else{
>>>     /* Wait for rank 0 to publish the name */
>>>     sleep(1);
>>>     printf("[%d] Lookup name\n",myrank);
>>>     if ( MPI_Lookup_name(service, MPI_INFO_NULL, port_name_clt) == MPI_SUCCESS ){
>>>       clt = 1;
>>>     }
>>>     else
>>>       /* Throw exception */
>>>       ;
>>>   }
>>>   MPI_Errhandler_set(MPI_COMM_WORLD, MPI_ERRORS_ARE_FATAL);
>>>
>>>   MPI_Bcast(srv,1,MPI_INT,0,MPI_COMM_WORLD);
>>>
>>
>> You're broadcasting srv here -- won't everyone now have *srv==1, such that
>> they all call MPI_COMM_ACCEPT, below?
>>
>>
>>>   if ( *srv )
>>>     /* I am the Master */
>>>     MPI_Comm_accept( port_name, MPI_INFO_NULL, 0, MPI_COMM_WORLD, &gcom );
>>>   else{
>>>     /* Connect to service SERVER, get the inter-communicator server*/
>>>     MPI_Errhandler_set(MPI_COMM_WORLD, MPI_ERRORS_RETURN);
>>>     if ( MPI_Comm_connect(port_name_clt, MPI_INFO_NULL, 0, MPI_COMM_WORLD, &gcom ) == MPI_SUCCESS )
>>>       printf("[%d] I get the connection with %s at %s !\n",myrank, service, port_name_clt);
>>>     MPI_Errhandler_set(MPI_COMM_WORLD, MPI_ERRORS_ARE_FATAL);
>>>   }
>>>
>>>   if(myrank != 0) *srv = 0;
>>>
>>>   return gcom;
>>>
>>> }

-- 
Jeff Squyres
jsquy...@cisco.com
For corporate legal information go to:
http://www.cisco.com/web/about/doing_business/legal/cri/
lookup.c
Description: Binary data