Tziporet Koren wrote:
> On 2/7/2010 6:39 PM, Steve Wise wrote:
>>
>> If ofed-1.5.1 is based on 2.6.33 then it will get this patch
>> automatically (assuming it goes upstream and makes 2.6.33). Or we can
>> pull it in as a kernel_patches/fixes/ patch.
>>
> OFED 1.5.1 is not based on 2.6.33, but
On 2/7/2010 6:39 PM, Steve Wise wrote:
>
> If ofed-1.5.1 is based on 2.6.33 then it will get this patch
> automatically (assuming it goes upstream and makes 2.6.33). Or we can
> pull it in as a kernel_patches/fixes/ patch.
>
OFED 1.5.1 is not based on 2.6.33, but on 2.6.30, so we need the patc
Tziporet Koren wrote:
> On 2/5/2010 6:52 PM, Sean Hefty wrote:
>>
>>> BTW: Was this change an artifact of rebasing ofed-1.5.1 on a new kernel
>>> version?
>>>
>> apparently
>>
>>
> Sorry to jump in late on this thread
> OFED 1.5.1 was not rebased on a new kernel - it's still based on 2.
>Can you identify the source of the regression? i.e. what was the change
>that broke things?
My understanding is that support for loopback addresses exposes an existing bug
in openmpi. It tries to bind to 127.0.0.1, which now succeeds. Openmpi passes
that address to a remote node for use in conne
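The application-side fix implied here is to never advertise a loopback address to a remote peer, even when a local bind to it succeeds. A minimal C sketch, assuming addresses are handled as dotted-quad IPv4 strings; is_ipv4_loopback() and filter_advertisable() are hypothetical helpers for illustration, not Open MPI code.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <stdbool.h>
#include <stddef.h>

/* True if addr parses as IPv4 and lies in 127.0.0.0/8. */
static bool is_ipv4_loopback(const char *addr)
{
    struct in_addr in;

    if (inet_pton(AF_INET, addr, &in) != 1)
        return false;               /* not a valid dotted-quad address */
    /* s_addr is in network byte order; the top octet identifies 127/8. */
    return (ntohl(in.s_addr) >> 24) == 127;
}

/*
 * Compact the array in place, keeping only addresses a remote peer
 * could actually use (i.e. dropping loopback), and return the new count.
 * Relative order of the kept addresses is preserved.
 */
static size_t filter_advertisable(const char **addrs, size_t n)
{
    size_t kept = 0;

    for (size_t i = 0; i < n; i++) {
        if (!is_ipv4_loopback(addrs[i]))
            addrs[kept++] = addrs[i];
    }
    return kept;
}
```

For example, filtering {"127.0.0.1", "192.168.1.10", "127.1.2.3", "10.0.0.5"} leaves the two non-loopback addresses in their original order. The sketch checks all of 127.0.0.0/8 rather than only 127.0.0.1, since the whole range is reserved for loopback.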
On 2/5/2010 6:52 PM, Sean Hefty wrote:
>
>> BTW: Was this change an artifact of rebasing ofed-1.5.1 on a new kernel
>> version?
>>
> apparently
>
>
Sorry to jump in late on this thread
OFED 1.5.1 was not rebased on a new kernel - it's still based on 2.6.30.
But many times we take patches that
Roland Dreier wrote:
> > My point, though, is that even with this patch in ofed-1.5.1, we still
> > have an openmpi/IB/rdmacm regression. The only way to avoid this
> > regression without changing openmpi is to disallow _all_ rdma binds to
> > 127.0.0.1.
>
> Can you identify the source of the
> My point, though, is that even with this patch in ofed-1.5.1, we still
> have an openmpi/IB/rdmacm regression. The only way to avoid this
> regression without changing openmpi is to disallow _all_ rdma binds to
> 127.0.0.1.
Can you identify the source of the regression? i.e. what was the cha
Tziporet Koren wrote:
> On 2/7/2010 3:22 AM, Steve Wise wrote:
>>
>>> Good catch, I'll update the patch and submit for 2.6.33 on Monday.
>> NOTE: This doesn't solve our IB/openmpi regression for ofed-1.5.1.
>>
>>
> If this patch is accepted into kernel 2.6
On 2/7/2010 3:22 AM, Steve Wise wrote:
>
>>>
>>> Good catch, I'll update the patch and submit for 2.6.33 on Monday.
>>>
>>>
>>>
> NOTE: This doesn't solve our IB/openmpi regression for ofed-1.5.1.
>
>
If this patch is accepted into kernel 2.6.33, we can take it too.
Tziporet
>>
>> Good catch, I'll update the patch and submit for 2.6.33 on Monday.
>>
>>
NOTE: This doesn't solve our IB/openmpi regression for ofed-1.5.1.
___
ewg mailing list
ewg@lists.openfabrics.org
http://lists.openfabrics.org/cgi-bi
>> -list_for_each_entry(cma_dev, &dev_list, list)
>> +list_for_each_entry(cma_dev, &dev_list, list) {
>> +if (rdma_node_get_transport(cma_dev->device->node_type) !=
>> +RDMA_TRANSPORT_IB)
>> +continue;
>> +
>> for (p = 1; p <= cma
Note, even though this patch resolved the openmpi failure on my iwarp
nodes, ucmatose -b 127.0.0.1 doesn't fail. I haven't looked at the src,
but something funny must be happening.
So we still have a regression issue with ofed-1.5.1/upstream kernels and
openmpi over IB with rdmacm.
Steve.
> rdma/cm: disallow loopback address for iwarp devices
>
> From: Sean Hefty
>
> The current RDMA iWarp devices cannot be used to establish
> connections using the loopback address. Prevent rdma_bind_addr
> from associating the loopback address with an iWarp device.
>
> This fixes an issue with o
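The hunk quoted earlier in the thread skips every device whose transport is not RDMA_TRANSPORT_IB before looking at its ports. A userspace model of that selection rule, under stated assumptions: enum transport, struct rdma_dev and pick_loopback_dev() are hypothetical simplifications, and the per-port loop of the real patch is omitted.

```c
#include <stddef.h>

/* Hypothetical, simplified stand-ins for the kernel's transport types. */
enum transport { TRANSPORT_IB, TRANSPORT_IWARP };

struct rdma_dev {
    const char *name;
    enum transport transport;
};

/*
 * Model of the loopback device selection in the patch: walk the device
 * list in order and return the first IB device, skipping iWARP devices.
 * Returns NULL when no eligible device exists.
 */
static const struct rdma_dev *pick_loopback_dev(const struct rdma_dev *devs,
                                                size_t n)
{
    for (size_t i = 0; i < n; i++) {
        if (devs[i].transport != TRANSPORT_IB)
            continue;               /* iWARP: no usable loopback */
        return &devs[i];
    }
    return NULL;
}
```

In this model, NULL stands for "no eligible device", which is the case where a loopback bind is meant to be refused.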
Sean Hefty wrote:
>> There is still some inconsistency here. Sean, you claimed binds to
>> 127.0.0.1 succeed in ofed-1.4 for IB devices. If so, then folks running
>> IB/openmpi/rdmacm should be seeing issues. We need to dig a little more...
>>
>
> You can verify this by running ucmatose -b
> > Well, I think you are right. This kind of change seems appropriate to
> > me for mainline, but OFED/RHEL should carry a responsibility to manage
> > an identified incompatibility, either patch their kernel, patch their
> > OMPI, or publish an errata. That is the role of a distribution.
>
Sean Hefty wrote:
>> There is still some inconsistency here. Sean, you claimed binds to
>> 127.0.0.1 succeed in ofed-1.4 for IB devices. If so, then folks running
>> IB/openmpi/rdmacm should be seeing issues. We need to dig a little more...
>>
>
> You can verify this by running ucmatose -b
On Feb 5, 2010, at 4:53 PM, Steve Wise wrote:
> There is still some inconsistency here. Sean, you claimed binds to
> 127.0.0.1 succeed in ofed-1.4 for IB devices. If so, then folks running
> IB/openmpi/rdmacm should be seeing issues. We need to dig a little more...
FWIW, I can run Open MPI v1
>There is still some inconsistency here. Sean, you claimed binds to
>127.0.0.1 succeed in ofed-1.4 for IB devices. If so, then folks running
>IB/openmpi/rdmacm should be seeing issues. We need to dig a little more...
You can verify this by running ucmatose -b 127.0.0.1 and see if the test ente
Jeff Squyres wrote:
> On Feb 5, 2010, at 4:14 PM, Jason Gunthorpe wrote:
>
>
>> Well, I think you are right. This kind of change seems appropriate to
>> me for mainline, but OFED/RHEL should carry a responsibility to manage
>> an identified incompatibility, either patch their kernel, patch their
On Feb 5, 2010, at 4:14 PM, Jason Gunthorpe wrote:
> Well, I think you are right. This kind of change seems appropriate to
> me for mainline, but OFED/RHEL should carry a responsibility to manage
> an identified incompatibility, either patch their kernel, patch their
> OMPI, or publish an errata.
On Fri, Feb 05, 2010 at 03:08:10PM -0500, Jeff Squyres wrote:
> On Feb 5, 2010, at 1:56 PM, Jason Gunthorpe wrote:
>
> > > I think we should remove the feature of allowing binds to 127.0.0.1
> > > altogether based on Jeff's arguments and my assertion that 127.0.0.1 is
> > > a sw-loopback mechani
>Ammasso and Chelsio T3 rnics do not support HW loopback.
It looks like the NES driver doesn't support 127.0.0.1, but does support
loopback connections (gurgle). Here's an untested patch for 2.6.33
(not even compile tested) for consideration then. I'll be testing
this shortly unless there's disa
On Feb 5, 2010, at 1:56 PM, Jason Gunthorpe wrote:
> > I think we should remove the feature of allowing binds to 127.0.0.1
> > altogether based on Jeff's arguments and my assertion that 127.0.0.1 is
> > a sw-loopback mechanism anyway...
>
> I don't agree, the kernel should be free to provide a
> > That should be the patch in question. I'm not sure about reaching
> > consensus. :)
> > If the other changes to the rdma_cm aren't closely tied to that change, we may
> > be able to back that one patch out until we can get whatever other fix may be
> > needed.
> I'd like to
Sean Hefty wrote:
>> Is the issue 6f8372b6 ("RDMA/cm: fix loopback address support")? This
>> just went in for 2.6.33, which is still at -rc6, so if we can quickly
>> reach a consensus, there is still time to get a fix in for 2.6.33.
>>
>
> That should be the patch in question. I'm not sure
On Fri, Feb 05, 2010 at 12:32:51PM -0600, Steve Wise wrote:
> I think we should remove the feature of allowing binds to 127.0.0.1
> altogether based on Jeff's arguments and my assertion that 127.0.0.1 is
> a sw-loopback mechanism anyway...
I don't agree, the kernel should be free to provide a
> I think we should remove the feature of allowing binds to 127.0.0.1
> altogether based on Jeff's arguments and my assertion that 127.0.0.1
> is a sw-loopback mechanism anyway...
Well, someone propose a patch please.
--
Roland Dreier
Cisco.com - http://www.cisco.com
For corporate legal info
>Is the issue 6f8372b6 ("RDMA/cm: fix loopback address support")? This
>just went in for 2.6.33, which is still at -rc6, so if we can quickly
>reach a consensus, there is still time to get a fix in for 2.6.33.
That should be the patch in question. I'm not sure about reaching consensus. :)
If the
Jeff Squyres wrote:
> On Feb 5, 2010, at 12:51 PM, Roland Dreier (rdreier) wrote:
>
>
>> > But Jeff, note that if someone uses the upstream kernel and OpenMPI,
>> > it's busted...
>>
>> Is the issue 6f8372b6 ("RDMA/cm: fix loopback address support")? This
>> just went in for 2.6.33, which is s
On Feb 5, 2010, at 12:51 PM, Roland Dreier (rdreier) wrote:
> > But Jeff, note that if someone uses the upstream kernel and OpenMPI,
> > it's busted...
>
> Is the issue 6f8372b6 ("RDMA/cm: fix loopback address support")? This
> just went in for 2.6.33, which is still at -rc6, so if we can quick
On Feb 5, 2010, at 11:16 AM, Steve Wise wrote:
> > Note that it is highly unlikely that we will release open mpi 1.4.2 in
> > time for ofed 1.5.1.
>
> Jeff, there is no way to handle high priority bug fixes in the current
> released stream?
We have 1.4.2 cooking, but it's not ready yet.
I'll
> But Jeff, note that if someone uses the upstream kernel and OpenMPI,
> it's busted...
Is the issue 6f8372b6 ("RDMA/cm: fix loopback address support")? This
just went in for 2.6.33, which is still at -rc6, so if we can quickly
reach a consensus, there is still time to get a fix in for 2.6.33.
Sean Hefty wrote:
>> My concern is breaking an existing working OpenMPI in a point release
>> because we changed semantics of the rdma-cm in an ofed point release...
>>
>
> OFED can call this release a point release, but in reality, the content makes it
> a major release...
>
>
>> BTW:
>My concern is breaking an existing working OpenMPI in a point release
>because we changed semantics of the rdma-cm in an ofed point release...
OFED can call this release a point release, but in reality, the content makes it
a major release...
>BTW: Was this change an artifact of rebasing ofed-1
> I agree that we should probably not allow 127.0.0.1 binds in
> ofed-1.5.1 at all because it regresses OpenMPI. Even with IB systems,
> if the bind to 127.0.0.1 succeeds, then OpenMPI assumes 127.0.0.1 is
> bound to that rdma interface and advertises this address to its peer
> as an address
Sean Hefty wrote:
>> Also note that trying to bind rdma cm to all interface ip addresses was the way
>> that we were advised by openfabrics to figure out which devices are
>> rdma-capable.
>>
>> As such, it is highly desirable to get the fix transparently in rdmacm and
>> preserve the old sema
>Also note that trying to bind rdma cm to all interface ip addresses was the way
>that we were advised by openfabrics to figure out which devices are
>rdma-capable.
>
>As such, it is highly desirable to get the fix transparently in rdmacm and
>preserve the old semantics. More specifically, it seems
Jeff Squyres (jsquyres) wrote:
>
> Note that it is highly unlikely that we will release open mpi 1.4.2 in
> time for ofed 1.5.1.
>
Jeff, there is no way to handle high priority bug fixes in the current
released stream?
> Also note that trying to bind rdma cm to all interface ip addresses was
Note that it is highly unlikely that we will release open mpi 1.4.2 in time for
ofed 1.5.1.
Also note that trying to bind rdma cm to all interface ip addresses was the way
that we were advised by openfabrics to figure out which devices are
rdma-capable.
As such, it is highly desirable to get
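The probing approach described here (try an rdma-cm bind on every interface address and keep the ones that succeed) can be sketched in C with getifaddrs(). Skipping interfaces flagged IFF_LOOPBACK keeps 127.0.0.1 out of the result regardless of how the rdma-cm treats loopback binds. The probe is a hypothetical callback; in real code it might create an rdma_cm_id with rdma_create_id() and call rdma_bind_addr() on the address, which is not shown here.

```c
#define _DEFAULT_SOURCE             /* expose IFF_* flags from <net/if.h> */
#include <arpa/inet.h>
#include <ifaddrs.h>
#include <net/if.h>
#include <netinet/in.h>
#include <stdbool.h>
#include <stddef.h>
#include <sys/socket.h>

/* Hypothetical probe callback: real code would attempt an rdma-cm bind
 * to sin and return true on success. */
typedef bool (*probe_fn)(const struct sockaddr_in *sin, void *ctx);

/*
 * Walk the IPv4 interface addresses, skip loopback interfaces, and
 * count the addresses accepted by probe. Returns -1 if the interface
 * list cannot be read.
 */
static int count_rdma_capable(probe_fn probe, void *ctx)
{
    struct ifaddrs *ifs;
    int count = 0;

    if (getifaddrs(&ifs) != 0)
        return -1;
    for (struct ifaddrs *ifa = ifs; ifa != NULL; ifa = ifa->ifa_next) {
        if (ifa->ifa_addr == NULL || ifa->ifa_addr->sa_family != AF_INET)
            continue;               /* IPv4 only in this sketch */
        if (ifa->ifa_flags & IFF_LOOPBACK)
            continue;               /* never probe 127.0.0.1 */
        if (probe((const struct sockaddr_in *)(void *)ifa->ifa_addr, ctx))
            count++;
    }
    freeifaddrs(ifs);
    return count;
}

/* Example probe for testing: accepts every address and records in ctx
 * whether any 127.0.0.0/8 address was offered to it. */
static bool accept_all(const struct sockaddr_in *sin, void *ctx)
{
    if ((ntohl(sin->sin_addr.s_addr) >> 24) == 127)
        *(bool *)ctx = true;
    return true;
}
```

Filtering on the interface flag rather than on the bind result means the application no longer depends on whether a given kernel lets the rdma-cm bind to 127.0.0.1.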
I can. Chapter 17 verse 3.1
17.3.1 Loopback
"An HCA shall be able to internally loopback a packet sent to itself. That
is, the verbs layer can specify a packet to be delivered to the same port
(possibly a different QP though). The packet shall be delivered without the
packet appearing on the port
> Is this only an iwarp issue? i.e. do all IB devices support hw
> loopback? And will all future devices support it (i.e. is it an IBTA
> requirement)?
I do think IBA requires loopback to work. Can't quote chapter & verse
off the top of my head.
--
Roland Dreier
Cisco.com - http://www.cisco.co
Roland Dreier wrote:
> > Hey Roland, are you ok with a device attribute to indicate hw-loopback
> > support?
>
> Sigh, I guess so. Can we have the rdma-cm handle this somewhat
> automagically, e.g. only choose devices that do handle loopback when
> binding/connecting to 127.0.0.1?
That's the pl
> Hey Roland, are you ok with a device attribute to indicate hw-loopback
> support?
Sigh, I guess so. Can we have the rdma-cm handle this somewhat
automagically, e.g. only choose devices that do handle loopback when
binding/connecting to 127.0.0.1? Or maybe can we put the handling of
this into t
Sean Hefty wrote:
>> Well then the rdma-cm needs to know which devices support hw loopback.
>> Cuz on a T3-only system, no hwloop...
>>
>
> The problem sounds like it's more than just whether 127.0.0.1 is usable. That
> check may fix openmpi, but it sounds more like the app needs to know whet
>This solution would work. Will you code it up?
I can do that. I just want to make sure that we address the full scope of the
problem.
- Sean
Sean Hefty wrote:
> At first thought, we can extend enum ib_device_cap_flags to indicate if a device
> supports loopback capabilities or not. The rdma_cm could then skip over such
> devices when dealing with a loopback address.
This solution would work. Will you code it up?
Stevo
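Sean's proposal can be modelled with a capability bit per device. A minimal sketch, assuming a hypothetical HYP_DEV_CAP_HW_LOOPBACK bit; no such member of enum ib_device_cap_flags is implied to exist, and struct dev_attr and select_for_loopback() are illustrative names only.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical capability bit; NOT a real ib_device_cap_flags member. */
#define HYP_DEV_CAP_HW_LOOPBACK (1u << 0)

struct dev_attr {
    const char *name;
    uint32_t cap_flags;             /* bitmask of HYP_DEV_CAP_* values */
};

/*
 * When binding to a loopback address, return the first device that
 * advertises hardware loopback support, or NULL if none does, so the
 * bind can fail instead of picking a device that cannot loop back.
 */
static const struct dev_attr *select_for_loopback(const struct dev_attr *devs,
                                                  size_t n)
{
    for (size_t i = 0; i < n; i++) {
        if (devs[i].cap_flags & HYP_DEV_CAP_HW_LOOPBACK)
            return &devs[i];
    }
    return NULL;
}
```

Compared with filtering on transport type, a capability bit lets each driver declare hardware loopback support itself, which addresses the T3-only case ("no hwloop") without assuming every device of a given transport behaves the same way.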
>Well then the rdma-cm needs to know which devices support hw loopback.
>Cuz on a T3-only system, no hwloop...
The problem sounds like it's more than just whether 127.0.0.1 is usable. That
check may fix openmpi, but it sounds more like the app needs to know whether the
device can actually support
Sean Hefty wrote:
>> But how can you determine _which_ rdma device should be used if an app
>> binds to 127.0.0.1? I think this is busted...
>>
>
> The code just picks the first rdma device available. To me, this is
> preferable to simply disallowing the loopback device from working at
>But how can you determine _which_ rdma device should be used if an app
>binds to 127.0.0.1? I think this is busted...
The code just picks the first rdma device available. To me, this is preferable
to simply disallowing the loopback device from working at all. I personally
use it all the tim
Sean Hefty wrote:
>> OpenMPI uses rdma_bind_addr() to figure out which ip addresses are valid
>> for which IB devices. This logic is now broken. Regardless of whether
>> OpenMPI should use another method for determining which IP addresses
>> belong to which interfaces, we should probably rethink w
The more I think about this, the more I conclude the rdma-cm is just
broken. There's no way to determine an RDMA device from 127.0.0.1, so
how can bind succeed?
Steve Wise wrote:
> I just opened 1918. The latest ofed-1.5.1 rdma-cm is allowing binds to
> 127.0.0.1. This is a no-no for devices
>OpenMPI uses rdma_bind_addr() to figure out which ip addresses are valid
>for which IB devices. This logic is now broken. Regardless of whether
>OpenMPI should use another method for determining which IP addresses
>belong to which interfaces, we should probably rethink whether we're
>breaking rdm
I just opened 1918. The latest ofed-1.5.1 rdma-cm is allowing binds to
127.0.0.1. This is a no-no for devices that don't support hw loopback...
OpenMPI uses rdma_bind_addr() to figure out which ip addresses are valid
for which IB devices. This logic is now broken. Regardless of whether
OpenM