On 6/7/2011 10:23 AM, George Bosilca wrote:
>
> On Jun 7, 2011, at 11:00 , Edgar Gabriel wrote:
>
>> George,
>>
>> I did not look over all the details of your test, but it looks to
>> me like you are violating one of the requirements of
>> intercomm_create, namely the requirement that the two groups have
>> to be disjoint. In your case the parent process(es) are part of
>> both local intra-communicators, aren't they?
>
> The two groups of the two local communicators are disjoint. One
> contains A,B while the other only C. The bridge communicator contains
> A,C.
>
> I'm confident my example is supposed to work. At least for Open MPI
> the error is under the hood, as the resulting inter-communicator is
> valid but contains NULL endpoints for the remote process.
I'll come back to that later, I am not yet convinced that your code is
correct :-) Your local groups might be disjoint, but I am worried about
the ranks of the remote leader in your example. They cannot be 0 from
both groups' perspective.

>
> Regarding the fact that the two leaders should be separate processes,
> you will not find any wording about this in the current version of
> the standard. In 1.1 there were two contradictory sentences about
> this, one stating that the two groups can be disjoint, while the
> other claimed that the two leaders can be the same process. After
> discussion, the agreement was that the two groups have to be
> disjoint, and the standard has been amended to match the agreement.

I realized that this is a non-issue. If the two local groups are
disjoint, there is no way that the two local leaders are the same
process.

Thanks
Edgar

>
> george.
>
>
>>
>> I just have MPI-1.1 at hand right now, but here is what it says:
>> ----
>>
>> Overlap of local and remote groups that are bound into an
>> inter-communicator is prohibited. If there is overlap, then the
>> program is erroneous and is likely to deadlock.
>>
>> ----
>> So the bottom line is that the two local intra-communicators that
>> are being used have to be disjoint, and the bridgecomm needs to be
>> a communicator where at least one process of each of the two
>> disjoint groups is able to talk to the other.
>> Interestingly, I did not find a sentence on whether it is allowed
>> to be the same process, or whether the two local leaders need to be
>> separate processes...
>>
>>
>> Thanks
>> Edgar
>>
>>
>> On 6/7/2011 12:57 AM, George Bosilca wrote:
>>> Frederic,
>>>
>>> Attached you will find an example that is supposed to work. The
>>> main difference with your code is on T3, T4 where you have
>>> inverted the local and remote comm.
>>> As depicted in the picture attached below, during the 3rd step you
>>> will create the intercomm between ab and c (no overlap) using ac
>>> as a bridge communicator (here the two roots, a and c, can
>>> exchange messages).
>>>
>>> Based on the MPI 2.2 standard, especially on the paragraph in
>>> PS:, the attached code should have been working. Unfortunately, I
>>> couldn't run it successfully with either Open MPI trunk or
>>> MPICH2 1.4rc1.
>>>
>>> george.
>>>
>>> PS: Here is what the MPI standard states about
>>> MPI_Intercomm_create:
>>>> The function MPI_INTERCOMM_CREATE can be used to create an
>>>> inter-communicator from two existing intra-communicators, in
>>>> the following situation: At least one selected member from each
>>>> group (the “group leader”) has the ability to communicate with
>>>> the selected member from the other group; that is, a “peer”
>>>> communicator exists to which both leaders belong, and each
>>>> leader knows the rank of the other leader in this peer
>>>> communicator. Furthermore, members of each group know the rank
>>>> of their leader.
>>>
>>>
>>> On Jun 1, 2011, at 05:00 , Frédéric Feyel wrote:
>>>
>>>> Hello,
>>>>
>>>> I have a problem using MPI_Intercomm_create.
>>>>
>>>> I have 5 tasks, let's say T0, T1, T2, T3, T4, resulting from two
>>>> spawn operations by T0.
>>>>
>>>> So I have two intra-communicators:
>>>>
>>>> intra0 contains : T0, T1, T2
>>>> intra1 contains : T0, T3, T4
>>>>
>>>> My goal is to make a collective loop to build a single
>>>> intra-communicator containing T0, T1, T2, T3, T4.
>>>>
>>>> I tried to do it using MPI_Intercomm_create and
>>>> MPI_Intercomm_merge calls, but without success (I always get MPI
>>>> internal errors).
>>>>
>>>> What I am doing :
>>>>
>>>> on T0 :
>>>> *******
>>>>
>>>> MPI_Intercomm_create(intra0,0,intra1,0,1,&new_com)
>>>>
>>>> on T1 and T2 :
>>>> **************
>>>>
>>>> MPI_Intercomm_create(intra0,0,MPI_COMM_WORLD,0,1,&new_com)
>>>>
>>>> on T3 and T4 :
>>>> **************
>>>>
>>>> MPI_Intercomm_create(intra1,0,MPI_COMM_WORLD,0,1,&new_com)
>>>>
>>>>
>>>> I'm certainly missing something. Could anybody help me to solve
>>>> this problem ?
>>>>
>>>> Best regards,
>>>>
>>>> Frédéric.
>>>>
>>>> PS : of course I did an extensive web search without finding
>>>> anything useful on my problem.
>>>>
>>>> _______________________________________________
>>>> users mailing list
>>>> us...@open-mpi.org
>>>> http://www.open-mpi.org/mailman/listinfo.cgi/users
>>
>> --
>> Edgar Gabriel
>> Assistant Professor
>> Parallel Software Technologies Lab      http://pstl.cs.uh.edu
>> Department of Computer Science          University of Houston
>> Philip G. Hoffman Hall, Room 524        Houston, TX-77204, USA
>> Tel: +1 (713) 743-3857                  Fax: +1 (713) 743-3335

--
Edgar Gabriel
Assistant Professor
Parallel Software Technologies Lab      http://pstl.cs.uh.edu
Department of Computer Science          University of Houston
Philip G. Hoffman Hall, Room 524        Houston, TX-77204, USA
Tel: +1 (713) 743-3857                  Fax: +1 (713) 743-3335
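
To make the disjointness requirement discussed in this thread concrete, here is a minimal sketch in plain C. It is only an illustration under stated assumptions: groups_overlap is a hypothetical helper, not an MPI call, and each task is modeled as a plain integer id (T0 -> 0, T1 -> 1, and so on). In a real program the same check could be done on MPI groups, for instance with MPI_Group_intersection.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper (not part of MPI): returns 1 if the two lists of
 * task ids share at least one member, 0 otherwise. Task ids are an
 * assumption made for illustration, e.g. T0 -> 0, T1 -> 1, ... */
int groups_overlap(const int *a, size_t na, const int *b, size_t nb)
{
    assert(a != NULL && b != NULL);
    for (size_t i = 0; i < na; ++i) {
        for (size_t j = 0; j < nb; ++j) {
            if (a[i] == b[j]) {
                return 1;   /* found a task that is in both groups */
            }
        }
    }
    return 0;               /* the two groups are disjoint */
}
```

Applied to the original question, intra0 = {0, 1, 2} and intra1 = {0, 3, 4} overlap because T0 is in both, so they cannot serve as the local and remote groups of one inter-communicator. The groups {A, B} and {C} from George's example do not overlap.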