Thanks Jeff, I'll try this flag. Regards.
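If I am reading the FAQ item right, the flag amounts to disabling early
completion on the mpirun command line, roughly like the line below. The
parameter name is my reading of the linked FAQ entry, and the process count
and executable are only placeholders for my job, so please correct me if I
have it wrong:

    mpirun --mca pml_ob1_use_early_completion 0 -np 512 ./my_application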
2009/1/23 Jeff Squyres <jsquy...@cisco.com>:
> This is with the 1.2 series, right?
>
> Have you tried using what is described here:
>
>     http://www.open-mpi.org/faq/?category=openfabrics#v1.2-use-early-completion
>
> I don't know if you can try OMPI v1.3 or not, but the issue described in
> the above FAQ item is fixed properly in the OMPI v1.3 series (i.e., that
> MCA parameter is unnecessary because we fixed it a different way).
>
> FWIW, if adding an MPI_Barrier is the difference between hanging and not
> hanging, it sounds like an Open MPI bug. You should never need to add an
> MPI_Barrier to make an MPI program correct.
>
> On Jan 23, 2009, at 8:09 AM, Gabriele Fatigati wrote:
>
>> Hi Igor,
>> my message size is 4096 KB and I have 4 procs per core.
>> There isn't any difference using different algorithms.
>>
>> 2009/1/23 Igor Kozin <i.n.ko...@googlemail.com>:
>>>
>>> What is your message size and the number of cores per node?
>>> Is there any difference using different algorithms?
>>>
>>> 2009/1/23 Gabriele Fatigati <g.fatig...@cineca.it>
>>>>
>>>> Hi Jeff,
>>>> I would like to understand why, if I run on 512 procs or more, my
>>>> code stops in an MPI collective, even with a small send buffer. All
>>>> processors are locked in the call, doing nothing. But if I add an
>>>> MPI_Barrier after the MPI collective, it works! I run over an
>>>> InfiniBand network.
>>>>
>>>> I know many people with this strange problem; I think there is a
>>>> strange interaction between InfiniBand and Open MPI that causes it.
>>>>
>>>> 2009/1/23 Jeff Squyres <jsquy...@cisco.com>:
>>>>>
>>>>> On Jan 23, 2009, at 6:32 AM, Gabriele Fatigati wrote:
>>>>>
>>>>>> I've noted that Open MPI has an asynchronous behaviour in the
>>>>>> collective calls. A processor doesn't wait for the other procs to
>>>>>> arrive in the call.
>>>>>
>>>>> That is correct.
>>>>>
>>>>>> This behaviour can sometimes cause problems with a lot of
>>>>>> processors in the job.
>>>>>
>>>>> Can you describe what exactly you mean? The MPI spec specifically
>>>>> allows this behavior; OMPI made specific design choices and
>>>>> optimizations to support this behavior. FWIW, I'd be pretty surprised
>>>>> if any optimized MPI implementation defaults to fully synchronous
>>>>> collective operations.
>>>>>
>>>>>> Is there an Open MPI parameter to lock all processes in the
>>>>>> collective call until it is finished? Otherwise I have to insert
>>>>>> many MPI_Barriers in my code, and that is very tedious and strange.
>>>>>
>>>>> As you have noted, MPI_Barrier is the *only* collective operation
>>>>> that MPI guarantees to have any synchronization properties (and it's
>>>>> a fairly weak guarantee at that: no process will exit the barrier
>>>>> until every process has entered the barrier -- but there's no
>>>>> guarantee that all processes leave the barrier at the same time).
>>>>>
>>>>> Why do you need your processes to exit collective operations at the
>>>>> same time?
>>>>>
>>>>> --
>>>>> Jeff Squyres
>>>>> Cisco Systems
>
> --
> Jeff Squyres
> Cisco Systems
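For the archives, the workaround I described above boils down to something
like the sketch below. MPI_Bcast is only a stand-in for whichever collective
actually hangs in my runs, and the buffer size, datatype and rank checks are
purely illustrative, not taken from my real code:

    /* Sketch of the "barrier after the collective" workaround.
       Build with something like: mpicc barrier_workaround.c -o barrier_workaround */
    #include <mpi.h>
    #include <stdlib.h>

    int main(int argc, char **argv)
    {
        MPI_Init(&argc, &argv);

        int rank;
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);

        /* 4096 KB payload, the message size mentioned above (illustrative). */
        size_t count = (4096 * 1024) / sizeof(double);
        double *buf = malloc(count * sizeof(double));
        if (rank == 0) {
            for (size_t i = 0; i < count; ++i) {
                buf[i] = (double)i;
            }
        }

        /* Stand-in for the collective that hangs at 512+ processes. */
        MPI_Bcast(buf, (int)count, MPI_DOUBLE, 0, MPI_COMM_WORLD);

        /* Workaround: force every rank to synchronize before continuing.
           As Jeff points out, a correct MPI program should never need this. */
        MPI_Barrier(MPI_COMM_WORLD);

        free(buf);
        MPI_Finalize();
        return 0;
    }

Obviously this only masks the symptom, so I will also try the early-completion
parameter and, when possible, OMPI v1.3, where the underlying issue is fixed.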
--
Ing. Gabriele Fatigati
Parallel programmer
CINECA Systems & Tecnologies Department
Supercomputing Group
Via Magnanelli 6/3, Casalecchio di Reno (BO) Italy
www.cineca.it    Tel: +39 051 6171722
g.fatigati [AT] cineca.it