Thanks, I thought of looking at ompi_info after I sent that note (sigh). SEND_INPLACE appears to help performance of larger messages in my synthetic benchmarks over regular SEND. Also, it appears that SEND_INPLACE still allows our code to run.
We are working on getting the developers access to our system and code.

Brock Palen
www.umich.edu/~brockp
Center for Advanced Computing
bro...@umich.edu
(734)936-1985

On May 16, 2011, at 11:49 AM, George Bosilca wrote:

> Here is the output of the "ompi_info --param btl openib":
>
> MCA btl: parameter "btl_openib_flags" (current value: <306>, data source: default value)
> BTL bit flags (general flags: SEND=1, PUT=2, GET=4, SEND_INPLACE=8, RDMA_MATCHED=64, HETEROGENEOUS_RDMA=256; flags only used by the "dr" PML (ignored by others): ACK=16, CHECKSUM=32, RDMA_COMPLETION=128; flags only used by the "bfo" PML (ignored by others): FAILOVER_SUPPORT=512)
>
> So the 305 flags means: HETEROGENEOUS_RDMA | CHECKSUM | ACK | SEND. Most of these flags are totally useless in the current version of Open MPI (DR is not supported), so the only value that really matter is SEND | HETEROGENEOUS_RDMA.
>
> If you want to enable the send protocol try first with SEND | SEND_INPLACE (9), if not downgrade to SEND (1)
>
> george.
>
> On May 16, 2011, at 11:33 , Samuel K. Gutierrez wrote:
>
>> On May 16, 2011, at 8:53 AM, Brock Palen wrote:
>>
>>> On May 16, 2011, at 10:23 AM, Samuel K. Gutierrez wrote:
>>>
>>>> Hi,
>>>>
>>>> Just out of curiosity - what happens when you add the following MCA option to your openib runs?
>>>>
>>>> -mca btl_openib_flags 305
>>>
>>> You Sir found the magic combination.
>>
>> :-) - cool.
>>
>> Developers - does this smell like a registered memory availability hang?
>>
>>> I verified this lets IMB and CRASH progress pass their lockup points,
>>> I will have a user test this,
>>
>> Please let us know what you find.
>>
>>> Is this an ok option to put in our environment? What does 305 mean?
>>
>> There may be a performance hit associated with this configuration, but if it lets your users run, then I don't see a problem with adding it to your environment.
>>
>> If I'm reading things correctly, 305 turns off RDMA PUT/GET and turns on SEND.
>>
>> OpenFabrics gurus - please correct me if I'm wrong :-).
>>
>> Samuel Gutierrez
>> Los Alamos National Laboratory
>>
>>> Brock Palen
>>> www.umich.edu/~brockp
>>> Center for Advanced Computing
>>> bro...@umich.edu
>>> (734)936-1985
>>>
>>>> Thanks,
>>>>
>>>> Samuel Gutierrez
>>>> Los Alamos National Laboratory
>>>>
>>>> On May 13, 2011, at 2:38 PM, Brock Palen wrote:
>>>>
>>>>> On May 13, 2011, at 4:09 PM, Dave Love wrote:
>>>>>
>>>>>> Jeff Squyres <jsquy...@cisco.com> writes:
>>>>>>
>>>>>>> On May 11, 2011, at 3:21 PM, Dave Love wrote:
>>>>>>>
>>>>>>>> We can reproduce it with IMB. We could provide access, but we'd have to negotiate with the owners of the relevant nodes to give you interactive access to them. Maybe Brock's would be more accessible? (If you contact me, I may not be able to respond for a few days.)
>>>>>>>
>>>>>>> Brock has replied off-list that he, too, is able to reliably reproduce the issue with IMB, and is working to get access for us. Many thanks for your offer; let's see where Brock's access takes us.
>>>>>>
>>>>>> Good. Let me know if we could be useful
>>>>>>
>>>>>>>>> -- we have not closed this issue,
>>>>>>>>
>>>>>>>> Which issue? I couldn't find a relevant-looking one.
>>>>>>>
>>>>>>> https://svn.open-mpi.org/trac/ompi/ticket/2714
>>>>>>
>>>>>> Thanks. In csse it's useful info, it hangs for me with 1.5.3 & np=32 on connectx with more than one collective I can't recall.
>>>>>
>>>>> Extra data point, that ticket said it ran with mpi_preconnect_mpi 1, well that doesn't help here, both my production code (crash) and IMB still hang.
>>>>>
>>>>> Brock Palen
>>>>> www.umich.edu/~brockp
>>>>> Center for Advanced Computing
>>>>> bro...@umich.edu
>>>>> (734)936-1985
>>>>>
>>>>>> --
>>>>>> Excuse the typping -- I have a broken wrist
>>>>>>
>>>>>> _______________________________________________
>>>>>> users mailing list
>>>>>> us...@open-mpi.org
>>>>>> http://www.open-mpi.org/mailman/listinfo.cgi/users
>
> George Bosilca
> Research Assistant Professor
> Innovative Computing Laboratory
> Department of Electrical Engineering and Computer Science
> University of Tennessee, Knoxville
> http://web.eecs.utk.edu/~bosilca/
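
For anyone following the flag arithmetic in this thread, here is a minimal Python sketch that decodes a btl_openib_flags value into the flag names listed in the quoted "ompi_info --param btl openib" output. The bit values are copied from that output; the BTL_FLAGS table and the decode_btl_flags function are hypothetical helpers written for illustration and are not part of Open MPI.

```python
# Bit values copied from the "ompi_info --param btl openib" output quoted
# in this thread. The table and function names are illustrative only.
BTL_FLAGS = {
    "SEND": 1,
    "PUT": 2,
    "GET": 4,
    "SEND_INPLACE": 8,
    "ACK": 16,                  # used only by the "dr" PML
    "CHECKSUM": 32,             # used only by the "dr" PML
    "RDMA_MATCHED": 64,
    "RDMA_COMPLETION": 128,     # used only by the "dr" PML
    "HETEROGENEOUS_RDMA": 256,
    "FAILOVER_SUPPORT": 512,    # used only by the "bfo" PML
}


def decode_btl_flags(value):
    """Return the names of the flags set in value, lowest bit first."""
    return [name for name, bit in BTL_FLAGS.items() if value & bit]


if __name__ == "__main__":
    # 306 is the default reported by ompi_info; 305, 9 and 1 are the
    # values discussed in the thread.
    for value in (306, 305, 9, 1):
        print(value, "=", " | ".join(decode_btl_flags(value)))
```

Under these assumptions, 306 and 305 differ only in the lowest two bits: the default has PUT set, while 305 has SEND set instead, which matches George's reading of 305 and his suggestion of 9 (SEND | SEND_INPLACE) or 1 (SEND).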