On 6/7/2011 10:23 AM, George Bosilca wrote:
> On Jun 7, 2011, at 11:00 , Edgar Gabriel wrote:
>> George,
>> I did not look over all the details of your test, but it looks to
>> me like you are violating one of the requirements of
>> intercomm_create namely the request that the two groups have to be
>> disjoint. In your case the parent process(es) are part of both
>> local intra-communicators, aren't they?
> The two groups of the two local communicators are disjoint. One
> contains A,B while the other only C. The bridge communicator contains
> A,C.
> I'm confident my example is supposed to work. At least for Open MPI
> the error is under the hood, as the resulting inter-communicator is
> valid but contains NULL endpoints for the remote process.

I'll come back to that later; I am not yet convinced that your code is
correct :-) Your local groups might be disjoint, but I am worried about
the ranks of the remote leader in your example. They cannot be 0 from
both groups' perspectives.

> Regarding the fact that the two leaders should be separate processes,
> you will not find any wording about this in the current version of
> the standard. In MPI 1.1 there were two contradictory sentences about
> this, one stating that the two groups can be disjoint, while the other
> claimed that the two leaders can be the same process. After
> discussion, the agreement was that the two groups have to be
> disjoint, and the standard has been amended to match the agreement.

I realized that this is a non-issue. If the two local groups are
disjoint, there is no way that the two local leaders are the same process.
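
To make the two conditions discussed here concrete, the following is a
minimal sketch in plain C. It makes no MPI calls: a "group" is modeled as
an array of process IDs and a rank is the index in that array. The helper
names groups_overlap and rank_in_group and the integer process IDs are
assumptions made up for illustration; they are not part of MPI or of the
attached test.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model, not MPI: a group is an array of distinct process
 * IDs, and a process's rank is its index in that array. */

/* Returns 1 if the two groups share at least one process, 0 otherwise. */
int groups_overlap(const int *a, size_t na, const int *b, size_t nb)
{
    assert(a != NULL && b != NULL);
    for (size_t i = 0; i < na; i++)
        for (size_t j = 0; j < nb; j++)
            if (a[i] == b[j])
                return 1;
    return 0;
}

/* Returns the rank (index) of process `id` in `group`, or -1 if the
 * process is not a member. */
int rank_in_group(const int *group, size_t n, int id)
{
    assert(group != NULL);
    for (size_t i = 0; i < n; i++)
        if (group[i] == id)
            return (int)i;
    return -1;
}
```

With the layout from the original question, intra0 = {T0, T1, T2} and
intra1 = {T0, T3, T4}, groups_overlap reports an overlap because T0 is in
both. With local groups {A, B} and {C} it reports none. If the bridge
communicator is assumed to be ordered as {A, C}, rank_in_group gives A
rank 0 and C rank 1, so the two sides cannot both name rank 0 as their
remote leader.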


> george.
>> I just have MPI-1.1 at hand right now, but here is what it says:
>> ----
>> Overlap of local and remote groups that are bound into an 
>> inter-communicator is prohibited. If there is overlap, then the
>> program is erroneous and is likely to deadlock.
>> ----
>> So the bottom line is that the two local intra-communicators that
>> are being used have to be disjoint, and the bridgecomm needs to be
>> a communicator where at least one process of each of the two
>> disjoint groups needs to be able to talk to each other.
>> Interestingly, I did not find a sentence on whether it is allowed to be
>> the same process, or whether the two local leaders need to be
>> separate processes...
>> Thanks
>> Edgar
>> On 6/7/2011 12:57 AM, George Bosilca wrote:
>>> Frederic,
>>> Attached you will find an example that is supposed to work. The
>>> main difference from your code is on T3 and T4, where you have
>>> swapped the local and remote comm. As depicted in the picture
>>> attached below, during the 3rd step you create the intercomm
>>> between ab and c (no overlap) using ac as a bridge communicator
>>> (here the two roots, a and c, can exchange messages).
>>> Based on the MPI 2.2 standard, especially the paragraph quoted in
>>> the PS, the attached code should work. Unfortunately, I
>>> couldn't run it successfully with either Open MPI trunk or
>>> MPICH2 1.4rc1.
>>> george.
>>> PS: Here is what the MPI standard states about the
>>> MPI_Intercomm_create:
>>>> The function MPI_INTERCOMM_CREATE can be used to create an
>>>> inter-communicator from two existing intra-communicators, in
>>>> the following situation: At least one selected member from each
>>>> group (the “group leader”) has the ability to communicate with
>>>> the selected member from the other group; that is, a “peer”
>>>> communicator exists to which both leaders belong, and each
>>>> leader knows the rank of the other leader in this peer
>>>> communicator. Furthermore, members of each group know the rank
>>>> of their leader.
>>> On Jun 1, 2011, at 05:00 , Frédéric Feyel wrote:
>>>> Hello,
>>>> I have a problem using MPI_Intercomm_create.
>>>> I have 5 tasks, let's say T0, T1, T2, T3, T4, resulting from two
>>>> spawn operations by T0.
>>>> So I have two intra-communicators:
>>>> intra0 contains: T0, T1, T2
>>>> intra1 contains: T0, T3, T4
>>>> My goal is to make a collective loop to build a single
>>>> intra-communicator containing T0, T1, T2, T3, T4.
>>>> I tried to do it using MPI_Intercomm_create and
>>>> MPI_Intercomm_merge calls, but without success (I always get MPI
>>>> internal errors).
>>>> What I am doing :
>>>> on T0 : *******
>>>> MPI_Intercomm_create(intra0,0,intra1,0,1,&new_com)
>>>> on T1 and T2 : **************
>>>> MPI_Intercomm_create(intra0,0,MPI_COMM_WORLD,0,1,&new_com)
>>>> on T3 and T4 : **************
>>>> MPI_Intercomm_create(intra1,0,MPI_COMM_WORLD,0,1,&new_com)
>>>> I'm certainly missing something. Could anybody help me to solve
>>>> this problem ?
>>>> Best regards,
>>>> Frédéric.
>>>> PS: of course I did an extensive web search without finding
>>>> anything useful on my problem.
>>>> _______________________________________________ users mailing
>>>> list us...@open-mpi.org 
>>>> http://www.open-mpi.org/mailman/listinfo.cgi/users

Edgar Gabriel
Assistant Professor
Parallel Software Technologies Lab      http://pstl.cs.uh.edu
Department of Computer Science          University of Houston
Philip G. Hoffman Hall, Room 524        Houston, TX-77204, USA
Tel: +1 (713) 743-3857                  Fax: +1 (713) 743-3335
