Thanks, I thought of looking at ompi_info after I sent that note (sigh).

SEND_INPLACE appears to help performance of larger messages in my synthetic 
benchmarks over regular SEND.  Also it appears that SEND_INPLACE still allows 
our code to run.
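
For anyone following along, the flag arithmetic discussed in this thread can be checked with a small script. This is only a sketch: the bit values are copied from the "ompi_info --param btl openib" output quoted further down, and the helper names decode_flags and encode_flags are made up for illustration and are not part of Open MPI.

```python
# Bit values as listed in the quoted "ompi_info --param btl openib" output.
# The table and helper names below are illustrative, not an Open MPI API.
BTL_FLAGS = {
    "SEND": 1,
    "PUT": 2,
    "GET": 4,
    "SEND_INPLACE": 8,
    "ACK": 16,
    "CHECKSUM": 32,
    "RDMA_MATCHED": 64,
    "RDMA_COMPLETION": 128,
    "HETEROGENEOUS_RDMA": 256,
    "FAILOVER_SUPPORT": 512,
}


def decode_flags(value):
    """Return the names of the flags set in value, lowest bit first."""
    ordered = sorted(BTL_FLAGS.items(), key=lambda item: item[1])
    return [name for name, bit in ordered if value & bit]


def encode_flags(*names):
    """OR together the bits for the given flag names."""
    value = 0
    for name in names:
        value |= BTL_FLAGS[name]
    return value


if __name__ == "__main__":
    # 305 and 306 appear in this thread; 9 and 1 are the suggested values.
    for value in (305, 306, 9, 1):
        print(value, "=", " | ".join(decode_flags(value)))
```

With this table, 305 decodes to SEND, ACK, CHECKSUM and HETEROGENEOUS_RDMA, and SEND | SEND_INPLACE encodes to 9, which would be passed the same way as before (-mca btl_openib_flags 9).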

We are working on getting devs access to our system and code.

Brock Palen
www.umich.edu/~brockp
Center for Advanced Computing
bro...@umich.edu
(734)936-1985



On May 16, 2011, at 11:49 AM, George Bosilca wrote:

> Here is the output of the "ompi_info --param btl openib":
> 
>                 MCA btl: parameter "btl_openib_flags" (current value: <306>, data
>                          source: default value)
>                          BTL bit flags (general flags: SEND=1, PUT=2, GET=4,
>                          SEND_INPLACE=8, RDMA_MATCHED=64, HETEROGENEOUS_RDMA=256; flags
>                          only used by the "dr" PML (ignored by others): ACK=16,
>                          CHECKSUM=32, RDMA_COMPLETION=128; flags only used by the "bfo"
>                          PML (ignored by others): FAILOVER_SUPPORT=512)
> 
> So a flags value of 305 means: HETEROGENEOUS_RDMA | CHECKSUM | ACK | SEND. Most of 
> these flags are totally useless in the current version of Open MPI (DR is not 
> supported), so the only values that really matter are SEND | HETEROGENEOUS_RDMA.
> 
> If you want to enable the send protocol, first try SEND | SEND_INPLACE 
> (9); if that does not work, downgrade to SEND (1).
> 
>  george.
> 
> On May 16, 2011, at 11:33 , Samuel K. Gutierrez wrote:
> 
>> 
>> On May 16, 2011, at 8:53 AM, Brock Palen wrote:
>> 
>>> 
>>> 
>>> 
>>> On May 16, 2011, at 10:23 AM, Samuel K. Gutierrez wrote:
>>> 
>>>> Hi,
>>>> 
>>>> Just out of curiosity - what happens when you add the following MCA option 
>>>> to your openib runs?
>>>> 
>>>> -mca btl_openib_flags 305
>>> 
>>> You Sir found the magic combination.
>> 
>> :-)  - cool.
>> 
>> Developers - does this smell like a registered memory availability hang?
>> 
>>> I verified this lets IMB and CRASH progress past their lockup points.
>>> I will have a user test this.
>> 
>> Please let us know what you find.
>> 
>>> Is this an ok option to put in our environment?  What does 305 mean?
>> 
>> There may be a performance hit associated with this configuration, but if it 
>> lets your users run, then I don't see a problem with adding it to your 
>> environment.
>> 
>> If I'm reading things correctly, 305 turns off RDMA PUT/GET and turns on 
>> SEND.
>> 
>> OpenFabrics gurus - please correct me if I'm wrong :-).
>> 
>> Samuel Gutierrez
>> Los Alamos National Laboratory
>> 
>> 
>>> 
>>> 
>>> Brock Palen
>>> www.umich.edu/~brockp
>>> Center for Advanced Computing
>>> bro...@umich.edu
>>> (734)936-1985
>>> 
>>>> 
>>>> Thanks,
>>>> 
>>>> Samuel Gutierrez
>>>> Los Alamos National Laboratory
>>>> 
>>>> On May 13, 2011, at 2:38 PM, Brock Palen wrote:
>>>> 
>>>>> On May 13, 2011, at 4:09 PM, Dave Love wrote:
>>>>> 
>>>>>> Jeff Squyres <jsquy...@cisco.com> writes:
>>>>>> 
>>>>>>> On May 11, 2011, at 3:21 PM, Dave Love wrote:
>>>>>>> 
>>>>>>>> We can reproduce it with IMB.  We could provide access, but we'd have 
>>>>>>>> to
>>>>>>>> negotiate with the owners of the relevant nodes to give you interactive
>>>>>>>> access to them.  Maybe Brock's would be more accessible?  (If you
>>>>>>>> contact me, I may not be able to respond for a few days.)
>>>>>>> 
>>>>>>> Brock has replied off-list that he, too, is able to reliably reproduce 
>>>>>>> the issue with IMB, and is working to get access for us.  Many thanks 
>>>>>>> for your offer; let's see where Brock's access takes us.
>>>>>> 
>>>>>> Good.  Let me know if we could be useful
>>>>>> 
>>>>>>>>> -- we have not closed this issue,
>>>>>>>> 
>>>>>>>> Which issue?   I couldn't find a relevant-looking one.
>>>>>>> 
>>>>>>> https://svn.open-mpi.org/trac/ompi/ticket/2714
>>>>>> 
>>>>>> Thanks.  In case it's useful info, it hangs for me with 1.5.3 & np=32 on
>>>>>> connectx with more than one collective I can't recall.
>>>>> 
>>>>> Extra data point: that ticket said it ran with mpi_preconnect_mpi 1.
>>>>> That doesn't help here; both my production code (CRASH) and IMB 
>>>>> still hang.
>>>>> 
>>>>> 
>>>>> Brock Palen
>>>>> www.umich.edu/~brockp
>>>>> Center for Advanced Computing
>>>>> bro...@umich.edu
>>>>> (734)936-1985
>>>>> 
>>>>>> 
>>>>>> -- 
>>>>>> Excuse the typping -- I have a broken wrist
>>>>>> 
>>>>>> _______________________________________________
>>>>>> users mailing list
>>>>>> us...@open-mpi.org
>>>>>> http://www.open-mpi.org/mailman/listinfo.cgi/users
>>>>>> 
>>>>>> 
>>>>> 
>>>>> 
>>>> 
>>>> 
>>>> 
>>>> 
>>> 
>>> 
>> 
>> 
> 
> George Bosilca
> Research Assistant Professor
> Innovative Computing Laboratory
> Department of Electrical Engineering and Computer Science
> University of Tennessee, Knoxville
> http://web.eecs.utk.edu/~bosilca/
> 
> 
> 
> 

