I should've been clearer: I have observed the same behavior under both of those versions.
I was not using the two versions in the same cluster.

-- Mark


Jeff Squyres wrote:
Are you mixing both v1.2.4 and v1.2.5 in a single MPI job? That may have unintended side-effects -- we unfortunately do not guarantee binary compatibility between any of our releases.


On Jul 28, 2008, at 10:16 AM, Mark Borgerding wrote:

I am using versions 1.2.4 (Fedora 9) and 1.2.5 (CentOS 5.2).


A little clarification:
The children do not actually wake up when the parent *sends* data to them, but only after the parent tries to receive data from the merged intercomm.


Here is the timeline:

...
parent call to MPI_Comm_spawn returns
parent calls MPI_Intercomm_merge
children's calls to MPI_Init return
children call MPI_Intercomm_merge
parent MPI_Intercomm_merge returns
  (long pause inserted via parent sleep)
parent sends data to kid 1
  (long pause inserted via parent sleep)
parent starts to receive data from kid 1
all children's calls to MPI_Intercomm_merge return


-- Mark

Aurélien Bouteiller wrote:
Ok, I'll check to see what happens. Which version of Open MPI are you using?

Aurelien

On Jul 27, 2008, at 11:13 PM, Mark Borgerding wrote:

I got something working, but I'm not 100% sure why.

The children woke up and returned from their calls to MPI_Intercomm_merge only after the parent used the intercomm to send some data to the children via MPI_Send.



Mark Borgerding wrote:
Perhaps I am doing something wrong. The children's calls to MPI_Intercomm_merge never return.

Here's the chronology (with 2 children):

parent calls MPI_Init
parent calls MPI_Comm_spawn
child calls MPI_Init
child calls MPI_Init
parent call to MPI_Comm_spawn returns
(long pause inserted)
parent calls MPI_Intercomm_merge
child MPI_Init returns
child calls MPI_Intercomm_merge
child MPI_Init returns
child calls MPI_Intercomm_merge
parent MPI_Intercomm_merge returns
... but the child processes never return from the MPI_Intercomm_merge function.


Here are some code snippets:

############# parent:

 MPI_Init(NULL,NULL);

 const int nkids=2;
 int errs[nkids];
 MPI_Comm kid;
 cerr << "parent calls MPI_Comm_spawn" << endl;
 CHECK_MPI_CODE( MPI_Comm_spawn("test_mpi",NULL,nkids,MPI_INFO_NULL,0,MPI_COMM_WORLD,&kid,errs) );
 cerr << "parent call to MPI_Comm_spawn returns" << endl;
 for (int k=0;k<nkids;++k)
     CHECK_MPI_CODE( errs[k] );

 MPI_Comm allmpi;
 cerr << "(long pause)" << endl;
 sleep(3);
 cerr << "parent calls MPI_Intercomm_merge\n";
 CHECK_MPI_CODE( MPI_Intercomm_merge( kid, 0, &allmpi) );
 cerr << "parent MPI_Intercomm_merge returns\n";

############### child:

 fprintf(stderr,"child calls MPI_Init \n");
 CHECK_MPI_CODE( MPI_Init(NULL,NULL) );
 fprintf(stderr,"child MPI_Init returns\n");

 MPI_Comm parent;
 CHECK_MPI_CODE( MPI_Comm_get_parent(&parent) );

 fprintf(stderr,"child calls MPI_Intercomm_merge \n");
 MPI_Comm allmpi;
 CHECK_MPI_CODE( MPI_Intercomm_merge( parent, 1, &allmpi) );
 fprintf(stderr,"child call to MPI_Intercomm_merge returns\n");
(the above line never gets executed)



Aurélien Bouteiller wrote:
MPI_Intercomm_merge is what you are looking for.

Aurelien
On Jul 26, 2008, at 1:23 PM, Mark Borgerding wrote:

Okay, so I've gotten a little bit closer.

I'm using MPI_Comm_spawn to start several child processes. The problem is that the children are in their own group, separate from the parent (just like the documentation says). I want to merge the children's group with the parent group so I can efficiently Send/Recv data between them.

Is this possible?

Plan B: I guess if there is no elegant way to merge all those processes into one group, I can connect sockets and make intercomms to talk from the parent directly to each child.

-- Mark



Mark Borgerding wrote:
I am writing a code module that plugs into a larger application framework. That framework loads my code module as a shared object. So I do not control how the first process gets started, but I still want it to be able to start and participate in an MPI group.

Here's roughly what I want to happen (I think):

framework app running (not under my control)
    -> framework loads mycode.so shared object into its process
    -> mycode.so starts MPI programs on several hosts (e.g. via system call to mpiexec)
    -> initial mycode.so process participates in the group he just started (e.g. he shows up in MPI_Comm_group, can use MPI_Send, MPI_Recv, etc.)

Can this be done?
I am running under CentOS 5.2.

Thanks,
Mark

_______________________________________________
users mailing list
us...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/users

--
* Dr. Aurélien Bouteiller
* Sr. Research Associate at Innovative Computing Laboratory
* University of Tennessee
* 1122 Volunteer Boulevard, suite 350
* Knoxville, TN 37996
* 865 974 6321





--
Mark Borgerding
3dB Labs, Inc
Innovate.  Develop.  Deliver.
