Argh, our messed-up environment with three generations of InfiniBand bit us.
Setting openib_cpc_include to rdmacm causes the openib BTL to be disabled on the
old DDR IB on some of our hosts.  Note that jobs will never run across both our
old DDR IB and our new QDR gear, where rdmacm does work.

I am doing some testing with:
export OMPI_MCA_btl_openib_cpc_include=rdmacm,oob,xoob

What I want to know is: is there a way to tell mpirun to 'dump all resolved MCA 
settings', or something similar?
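
In the meantime I have been poking at ompi_info and the mpi_show_mca_params
param.  I am not sure these are exactly the right knobs, and ./a.out below is
just a stand-in for a test binary, but something like:

  # list the registered openib BTL params and their current values
  ompi_info --param btl openib

  # or have rank 0 print the MCA params it resolved at MPI_Init
  # (assuming "all" is an accepted value for mpi_show_mca_params)
  mpirun --mca mpi_show_mca_params all -np 2 ./a.out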

The error we get, which I think is expected when we set only rdmacm, is:
--------------------------------------------------------------------------
No OpenFabrics connection schemes reported that they were able to be
used on a specific port.  As such, the openib BTL (OpenFabrics
support) will be disabled for this port.

  Local host:           nyx0665.engin.umich.edu
  Local device:         mthca0
  Local port:           1
  CPCs attempted:       rdmacm
--------------------------------------------------------------------------

Again, I think this is expected on this older hardware. 
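
To double-check which CPC each port actually ends up using once oob/xoob are
allowed as fallbacks, I have been running small tests with the BTL verbosity
turned up.  I am assuming btl_base_verbose is the right knob for the openib
connect messages, and the host list and ./a.out here are just placeholders:

  mpirun --mca btl openib,sm,self \
         --mca btl_openib_cpc_include rdmacm,oob,xoob \
         --mca btl_base_verbose 30 \
         -np 2 --host nyx0665,nyx0666 ./a.out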

Brock Palen
www.umich.edu/~brockp
Center for Advanced Computing
bro...@umich.edu
(734)936-1985



On Apr 22, 2011, at 10:23 AM, Brock Palen wrote:

> On Apr 21, 2011, at 6:49 PM, Ralph Castain wrote:
> 
>> 
>> On Apr 21, 2011, at 4:41 PM, Brock Palen wrote:
>> 
>>> Given that part of our cluster is TCP only, openib wouldn't even start up on 
>>> those hosts
>> 
>> That is correct - it would have no impact on those hosts
>> 
>>> and this would be ignored on hosts with IB adaptors?  
>> 
>> Ummm...not sure I understand this one. The param -will- be used on hosts 
>> with IB adaptors because that is what it is controlling.
>> 
>> However, it -won't- have any impact on hosts without IB adaptors, which is 
>> what I suspect you meant to ask?
> 
> Correct, that was a typo.  Thanks!  I am going to add the environment variable 
> to our OpenMPI modules so rdmacm is our default for now.
> 
>> 
>> 
>>> 
>>> Just checking thanks!
>>> 
>>> Brock Palen
>>> www.umich.edu/~brockp
>>> Center for Advanced Computing
>>> bro...@umich.edu
>>> (734)936-1985
>>> 
>>> 
>>> 
>>> On Apr 21, 2011, at 6:21 PM, Jeff Squyres wrote:
>>> 
>>>> Over IB, I'm not sure there is much of a drawback.  It might be slightly 
>>>> slower to establish QP's, but I don't think that matters much.
>>>> 
>>>> Over iWARP, rdmacm can cause connection storms as you scale to thousands 
>>>> of MPI processes.
>>>> 
>>>> 
>>>> On Apr 20, 2011, at 5:03 PM, Brock Palen wrote:
>>>> 
>>>>> We managed to have another user hit the bug that causes collectives (this 
>>>>> time MPI_Bcast()) to hang over IB, which was fixed by setting:
>>>>> 
>>>>> btl_openib_cpc_include rdmacm
>>>>> 
>>>>> My question is: if we set this as the default on our system with an 
>>>>> environment variable, does it introduce any performance or other issues we 
>>>>> should be aware of?
>>>>> 
>>>>> Is there a reason we should not use rdmacm?
>>>>> 
>>>>> Thanks!
>>>>> 
>>>>> Brock Palen
>>>>> www.umich.edu/~brockp
>>>>> Center for Advanced Computing
>>>>> bro...@umich.edu
>>>>> (734)936-1985
>>>>> 
>>>>> 
>>>>> 
>>>>> 
>>>> 
>>>> 
>>>> -- 
>>>> Jeff Squyres
>>>> jsquy...@cisco.com
>>>> For corporate legal information go to:
>>>> http://www.cisco.com/web/about/doing_business/legal/cri/