I'm not sure what is going on here. No MPI collective is implemented as a multicast - it 
all flows over the MPI point-to-point system via one of several algorithms.

The best guess I can offer is that there is a race condition in your program that 
you trip when other procs sharing the node change the timing.

How did you configure OMPI when you built it?
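For illustration, here is the shape of one common point-to-point broadcast algorithm, the binomial tree. This is a sketch only - Open MPI selects among several algorithms at run time, and this is not its actual code:

```c
/* Parent of `rank` in a binomial broadcast tree rooted at rank 0:
 * clear the lowest set bit of the rank.  Each rank receives the data
 * once from its parent over an ordinary point-to-point message, then
 * forwards it on to its children - no multicast anywhere. */
int binomial_parent(int rank)
{
    if (rank == 0)
        return -1;                /* the root receives from nobody */
    return rank & (rank - 1);     /* drop the lowest set bit */
}
```

For 8 ranks this yields rank 0 feeding ranks 1, 2 and 4, rank 4 feeding 5 and 6, and so on - log2(N) rounds of plain sends and receives.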


On Aug 8, 2010, at 11:02 PM, Randolph Pullen wrote:

> The only MPI calls I am using are these (grep-ed from my code):
> 
> MPI_Abort(MPI_COMM_WORLD, 1);
> MPI_Barrier(MPI_COMM_WORLD);
> MPI_Bcast(&bufarray[0].hdr, sizeof(BD_CHDR), MPI_CHAR, 0, MPI_COMM_WORLD);
> MPI_Comm_rank(MPI_COMM_WORLD,&myid);
> MPI_Comm_size(MPI_COMM_WORLD,&numprocs); 
> MPI_Finalize();
> MPI_Init(&argc, &argv);
> MPI_Irecv(
> MPI_Isend(
> MPI_Recv(buff, BUFSIZE, MPI_CHAR, 0, TAG, MPI_COMM_WORLD, &stat);
> MPI_Send(buff, BUFSIZE, MPI_CHAR, 0, TAG, MPI_COMM_WORLD);
> MPI_Test(&request, &complete, &status);
> MPI_Wait(&request, &status);  
> 
> The big wait happens on receipt of a bcast call that would otherwise work.
> It's a bit mysterious, really...
> 
> I presume that bcast is implemented with multicast calls, but does it use any 
> actual broadcast calls at all?  
> I know I'm scraping the edges here looking for something, but I just can't get 
> my head around why it should fail where it has.
> 
> --- On Mon, 9/8/10, Ralph Castain <r...@open-mpi.org> wrote:
> 
> From: Ralph Castain <r...@open-mpi.org>
> Subject: Re: [OMPI users] MPI_Bcast issue
> To: "Open MPI Users" <us...@open-mpi.org>
> Received: Monday, 9 August, 2010, 1:32 PM
> 
> Hi Randolph
> 
> Unless your code is doing a connect/accept between the copies, there is no 
> way they can cross-communicate. As you note, mpirun instances are completely 
> isolated from each other - no process in one instance can possibly receive 
> information from a process in another instance, because it lacks all knowledge 
> of it - unless they wire up into a greater communicator by performing 
> connect/accept calls between them.
> 
> I suspect you are inadvertently doing just that - perhaps by doing 
> connect/accept in a tree-like manner, not realizing that the end result is 
> one giant communicator that now links together all the N servers.
> 
> Otherwise, there is no possible way an MPI_Bcast in one mpirun can collide or 
> otherwise communicate with an MPI_Bcast between processes started by another 
> mpirun.
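[The reason is MPI's matching rule: point-to-point traffic, and the collective traffic layered on top of it, is matched by communicator context, source rank and tag. Separate mpirun jobs never share a communicator context, so their messages cannot match each other's receives. A schematic sketch - the field names here are illustrative, not Open MPI internals:]

```c
#include <stdbool.h>

/* Schematic of the MPI matching rule.  A message is delivered to a
 * receive only if all three envelope fields match; two mpirun jobs
 * never share a context id, so cross-job delivery is impossible. */
struct envelope {
    int context;   /* identifies the communicator */
    int source;    /* sending rank within that communicator */
    int tag;       /* user-supplied tag */
};

enum { ANY = -1 };  /* stand-in for MPI_ANY_SOURCE / MPI_ANY_TAG */

bool matches(struct envelope msg, struct envelope recv)
{
    return msg.context == recv.context
        && (recv.source == ANY || msg.source == recv.source)
        && (recv.tag == ANY || msg.tag == recv.tag);
}
```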
> 
> 
> 
> On Aug 8, 2010, at 7:13 PM, Randolph Pullen wrote:
> 
>> Thanks - although “An intercommunicator cannot be used for collective 
>> communication.”, i.e., bcast calls, I can see how the MPI_Group_xx calls 
>> can be used to produce a useful group and then a communicator; thanks again, 
>> but this is really a side issue to my main question about MPI_Bcast.
>> 
>> I seem to have duplicate concurrent processes interfering with each other.  
>> This would appear to be a breach of the MPI safety dictum, i.e. MPI_COMM_WORLD 
>> is supposed to include only the processes started by a single mpirun command, 
>> isolating them safely from other similar groups of processes.
>> 
>> So, it would appear to be a bug.  If so, this has significant implications 
>> for environments such as mine, where the same program may often be run by 
>> different users simultaneously.  
>> 
>> It is really this issue that is concerning me; I can rewrite the code, but if 
>> it can crash when two copies run at the same time, I have a much bigger 
>> problem.
>> 
>> My suspicion is that within the MPI_Bcast handshaking, a synchronizing 
>> broadcast call may be colliding across the environments.  My only evidence 
>> is that an otherwise working program waits on broadcast reception forever when 
>> two or more copies are run at [exactly] the same time.
>> 
>> Has anyone else seen similar behavior in concurrently running programs that 
>> perform lots of broadcasts perhaps?
>> 
>> Randolph
>> 
>> 
>> --- On Sun, 8/8/10, David Zhang <solarbik...@gmail.com> wrote:
>> 
>> From: David Zhang <solarbik...@gmail.com>
>> Subject: Re: [OMPI users] MPI_Bcast issue
>> To: "Open MPI Users" <us...@open-mpi.org>
>> Received: Sunday, 8 August, 2010, 12:34 PM
>> 
>> In particular, intercommunicators
>> 
>> On 8/7/10, Aurélien Bouteiller <boute...@eecs.utk.edu> wrote:
>> > You should consider reading about communicators in MPI.
>> >
>> > Aurelien
>> > --
>> > Aurelien Bouteiller, Ph.D.
>> > Innovative Computing Laboratory, The University of Tennessee.
>> >
>> > Sent from my iPad
>> >
>> > On Aug 7, 2010, at 1:05, Randolph Pullen <randolph_pul...@yahoo.com.au>
>> > wrote:
>> >
>> >> I seem to be having a problem with MPI_Bcast.
>> >> My massive I/O-intensive data movement program must broadcast from n to n
>> >> nodes. My problem starts because I require 2 processes per node, a sender
>> >> and a receiver, and I have implemented these as separate MPI processes rather
>> >> than tackle the complexities of threads on MPI.
>> >>
>> >> Consequently, broadcast and calls like alltoall are not completely
>> >> helpful.  The dataset is huge and each node must end up with a complete
>> >> copy built by the large number of contributing broadcasts from the sending
>> >> nodes.  Network efficiency and run time are paramount.
>> >>
>> >> As I don’t want to needlessly broadcast all this data to the sending nodes
>> >> and I have a perfectly good MPI program that distributes globally from a
>> >> single node (1 to N), I took the unusual decision to start N copies of
>> >> this program by spawning the MPI system from the PVM system in an effort
>> >> to get my N to N concurrent transfers.
>> >>
>> >> It seems that the broadcasts running on concurrent MPI environments
>> >> collide and cause all but the first process to hang waiting for their
>> >> broadcasts.  This theory seems to be confirmed by introducing a sleep of
>> >> n-1 seconds before the first MPI_Bcast  call on each node, which results
>> >> in the code working perfectly.  (total run time 55 seconds, 3 nodes,
>> >> standard TCP stack)
>> >>
>> >> My guess is that unlike PVM, OpenMPI implements broadcasts with broadcasts
>> >> rather than multicasts.  Can someone confirm this?  Is this a bug?
>> >>
>> >> Is there any multicast or N to N broadcast where sender processes can
>> >> avoid participating when they don’t need to?
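[One standard way to keep senders out of collectives they don't need is to build sub-communicators with MPI_Comm_split - all inside one mpirun. A hypothetical sketch of the color computation, assuming a layout of 2*n ranks with even ranks as senders and odd ranks as receivers; that layout is an assumption for illustration, not taken from the original program:]

```c
/* For sender `sender`'s broadcast, return the MPI_Comm_split color a
 * rank should pass: the broadcasting sender and every receiver join
 * (color 0), while the other senders pass -1 (standing in for
 * MPI_UNDEFINED) and get no communicator, so they never participate
 * in that broadcast at all. */
int bcast_color(int rank, int sender)
{
    if (rank == sender)
        return 0;          /* the broadcasting sender joins */
    if (rank % 2 == 1)
        return 0;          /* every receiver joins */
    return -1;             /* the other senders opt out */
}
```

Each rank would call MPI_Comm_split(MPI_COMM_WORLD, bcast_color(rank, s), rank, &comm) once per sender s, then MPI_Bcast on the resulting communicator.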
>> >>
>> >> Thanks in advance
>> >> Randolph
>> >>
>> >>
>> >>
>> >> _______________________________________________
>> >> users mailing list
>> >> us...@open-mpi.org
>> >> http://www.open-mpi.org/mailman/listinfo.cgi/users
>> >
>> 
>> -- 
>> Sent from my mobile device
>> 
>> David Zhang
>> University of California, San Diego
>> 
>> 
> 
> 
> 
