Tziporet Koren wrote:
On 2/7/2010 6:39 PM, Steve Wise wrote:
If ofed-1.5.1 is based on 2.6.33 then it will get this patch
automatically (assuming it goes upstream and makes 2.6.33). Or we can
pull it in as a kernel_patches/fixes/ patch.
OFED 1.5.1 is not based on 2.6.33, but on 2.6.30, so
On 2/7/2010 6:39 PM, Steve Wise wrote:
If ofed-1.5.1 is based on 2.6.33 then it will get this patch
automatically (assuming it goes upstream and makes 2.6.33). Or we can
pull it in as a kernel_patches/fixes/ patch.
OFED 1.5.1 is not based on 2.6.33, but on 2.6.30, so we need the patch
unde
Tziporet Koren wrote:
On 2/5/2010 6:52 PM, Sean Hefty wrote:
BTW: Was this change an artifact of rebasing ofed-1.5.1 on a new kernel
version?
apparently
Sorry to jump late on this thread
OFED 1.5.1 was not rebased on a new kernel - it's still based on 2.6.30.
But many times we tak
>Can you identify the source of the regression? ie what was the change
>that broke things?
My understanding is that support for loopback addresses exposes an existing bug
in openmpi. It tries to bind to 127.0.0.1, which now succeeds. Openmpi passes
that address to a remote node for use in conne
On 2/5/2010 6:52 PM, Sean Hefty wrote:
BTW: Was this change an artifact of rebasing ofed-1.5.1 on a new kernel
version?
apparently
Sorry to jump late on this thread
OFED 1.5.1 was not rebased on a new kernel - it's still based on 2.6.30.
But many times we take patches that were acce
Roland Dreier wrote:
> My point, though, is that even with this patch in ofed-1.5.1, we still
> have an openmpi/IB/rdmacm regression. The only way to avoid this
> regression without changing openmpi is to disallow _all_ rdma binds to
> 127.0.0.1.
Can you identify the source of the regressio
> My point, though, is that even with this patch in ofed-1.5.1, we still
> have an openmpi/IB/rdmacm regression. The only way to avoid this
> regression without changing openmpi is to disallow _all_ rdma binds to
> 127.0.0.1.
Can you identify the source of the regression? ie what was the cha
Tziporet Koren wrote:
On 2/7/2010 3:22 AM, Steve Wise wrote:
Good catch, I'll update the patch and submit for 2.6.33 on Monday.
NOTE: This doesn't solve our IB/openmpi regression for ofed-1.5.1.
If this patch will be accepted to the kernel 2.6.33 we can take it too
If ofed-
On 2/7/2010 3:22 AM, Steve Wise wrote:
Good catch, I'll update the patch and submit for 2.6.33 on Monday.
NOTE: This doesn't solve our IB/openmpi regression for ofed-1.5.1.
If this patch will be accepted to the kernel 2.6.33 we can take it too
Tziporet
--
To unsubscribe fro
Good catch, I'll update the patch and submit for 2.6.33 on Monday.
NOTE: This doesn't solve our IB/openmpi regression for ofed-1.5.1.
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majord...@vger.kernel.org
More ma
>> -	list_for_each_entry(cma_dev, &dev_list, list)
>> +	list_for_each_entry(cma_dev, &dev_list, list) {
>> +		if (rdma_node_get_transport(cma_dev->device->node_type) !=
>> +		    RDMA_TRANSPORT_IB)
>> +			continue;
>> +
>>  		for (p = 1; p <= cma
Note, even though this patch resolved the openmpi failure on my iwarp
nodes, ucmatose -b 127.0.0.1 doesn't fail. I haven't looked at the src,
but something funny must be happening.
So we still have a regression issue with ofed-1.5.1/upstream kernels and
openmpi over IB with rdmacm.
Steve.
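
The effect of the quoted hunk can be sketched outside the kernel. This is only a simplified user-space model, assuming a flat array of devices; the `transport` enum, `struct device`, the device names, and `first_loopback_candidate()` are hypothetical stand-ins, not the kernel's rdma_cm definitions.

```c
#include <stddef.h>

/* Simplified stand-ins for illustration; not the kernel's definitions. */
enum transport { TRANSPORT_IB, TRANSPORT_IWARP };

struct device {
    const char *name;          /* made-up label, e.g. "ib0" */
    enum transport transport;
};

/*
 * Return the first IB device in the list, or NULL if there is none.
 * Non-IB (iWARP) devices are skipped, mirroring the `continue` that the
 * quoted hunk adds to the device loop.
 */
static const struct device *
first_loopback_candidate(const struct device *devs, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        if (devs[i].transport != TRANSPORT_IB)
            continue;
        return &devs[i];
    }
    return NULL;
}
```

In this model, a system with only iWARP devices yields NULL, so a bind to the loopback address would find no device to associate with.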
rdma/cm: disallow loopback address for iwarp devices
From: Sean Hefty
The current RDMA iWarp devices cannot be used to establish
connections using the loopback address. Prevent rdma_bind_addr
from associating the loopback address with an iWarp device.
This fixes an issue with openmpi, where
Sean Hefty wrote:
There is still some inconsistency here. Sean, you claimed binds to
127.0.0.1 succeed in ofed-1.4 for IB devices. If so, then folks running
IB/openmpi/rdmacm should be seeing issues. We need to dig a little more...
You can verify this by running ucmatose -b 127.0.0.1 a
> > Well, I think you are right. This kind of change seems appropriate to
> > me for mainline, but OFED/RHEL should carry a responsibility to manage
> > an identified incompatibility, either patch their kernel, patch their
> > OMPI, or publish an errata. That is the role of a distribution.
>
Sean Hefty wrote:
There is still some inconsistency here. Sean, you claimed binds to
127.0.0.1 succeed in ofed-1.4 for IB devices. If so, then folks running
IB/openmpi/rdmacm should be seeing issues. We need to dig a little more...
You can verify this by running ucmatose -b 127.0.0.1 a
On Feb 5, 2010, at 4:53 PM, Steve Wise wrote:
> There is still some inconsistency here. Sean, you claimed binds to
> 127.0.0.1 succeed in ofed-1.4 for IB devices. If so, then folks running
> IB/openmpi/rdmacm should be seeing issues. We need to dig a little more...
FWIW, I can run Open MPI v1
>There is still some inconsistency here. Sean, you claimed binds to
>127.0.0.1 succeed in ofed-1.4 for IB devices. If so, then folks running
>IB/openmpi/rdmacm should be seeing issues. We need to dig a little more...
You can verify this by running ucmatose -b 127.0.0.1 and see if the test ente
Jeff Squyres wrote:
On Feb 5, 2010, at 4:14 PM, Jason Gunthorpe wrote:
Well, I think you are right. This kind of change seems appropriate to
me for mainline, but OFED/RHEL should carry a responsibility to manage
an identified incompatibility, either patch their kernel, patch their
OMPI, or p
On Feb 5, 2010, at 4:14 PM, Jason Gunthorpe wrote:
> Well, I think you are right. This kind of change seems appropriate to
> me for mainline, but OFED/RHEL should carry a responsibility to manage
> an identified incompatibility, either patch their kernel, patch their
> OMPI, or publish an errata.
On Fri, Feb 05, 2010 at 03:08:10PM -0500, Jeff Squyres wrote:
> On Feb 5, 2010, at 1:56 PM, Jason Gunthorpe wrote:
>
> > > I think we should remove the feature of allowing binds to 127.0.0.1
> > > altogether based on Jeff's arguments and my assertion that 127.0.0.1 is
> > > a sw-loopback mechani
>Ammasso and Chelsio T3 rnics do not support HW loopback.
It looks like the NES driver doesn't support 127.0.0.1, but does support
loopback connections (gurgle). Here's an untested patch for 2.6.33
(not even compile tested) for consideration then. I'll be testing
this shortly unless there's disa
On Feb 5, 2010, at 1:56 PM, Jason Gunthorpe wrote:
> > I think we should remove the feature of allowing binds to 127.0.0.1
> > altogether based on Jeff's arguments and my assertion that 127.0.0.1 is
> > a sw-loopback mechanism anyway...
>
> I don't agree, the kernel should be free to provide a
> > That should be the patch in question. I'm not sure about reaching
> > consensus. :)
> > If the other changes to the rdma_cm aren't closely tied to that change, we may
> > be able to back that one patch out until we can get whatever other fix may be
> > needed.
> I'd like to
Sean Hefty wrote:
Is the issue 6f8372b6 ("RDMA/cm: fix loopback address support")? This
just went in for 2.6.33, which is still at -rc6, so if we can quickly
reach a consensus, there is still time to get a fix in for 2.6.33.
That should be the patch in question. I'm not sure about reachi
On Fri, Feb 05, 2010 at 12:32:51PM -0600, Steve Wise wrote:
> I think we should remove the feature of allowing binds to 127.0.0.1
> altogether based on Jeff's arguments and my assertion that 127.0.0.1 is
> a sw-loopback mechanism anyway...
I don't agree, the kernel should be free to provide a
> I think we should remove the feature of allowing binds to 127.0.0.1
> altogether based on Jeff's arguments and my assertion that 127.0.0.1
> is a sw-loopback mechanism anyway...
Well, someone propose a patch please.
--
Roland Dreier
Cisco.com - http://www.cisco.com
For corporate legal info
>Is the issue 6f8372b6 ("RDMA/cm: fix loopback address support")? This
>just went in for 2.6.33, which is still at -rc6, so if we can quickly
>reach a consensus, there is still time to get a fix in for 2.6.33.
That should be the patch in question. I'm not sure about reaching consensus. :)
If the
Jeff Squyres wrote:
On Feb 5, 2010, at 12:51 PM, Roland Dreier (rdreier) wrote:
> But Jeff, note that if someone uses the upstream kernel and OpenMPI,
> it's busted...
Is the issue 6f8372b6 ("RDMA/cm: fix loopback address support")? This
just went in for 2.6.33, which is still at -rc6, so
On Feb 5, 2010, at 12:51 PM, Roland Dreier (rdreier) wrote:
> > But Jeff, note that if someone uses the upstream kernel and OpenMPI,
> > it's busted...
>
> Is the issue 6f8372b6 ("RDMA/cm: fix loopback address support")? This
> just went in for 2.6.33, which is still at -rc6, so if we can quick
On Feb 5, 2010, at 11:16 AM, Steve Wise wrote:
> > Note that it is highly unlikely that we will release open mpi 1.4.2 in
> > time for ofed 1.5.1.
>
> Jeff, there is no way to handle high priority bug fixes in the current
> released stream?
We have 1.4.2 cooking, but it's not ready yet.
I'll
> But Jeff, note that if someone uses the upstream kernel and OpenMPI,
> it's busted...
Is the issue 6f8372b6 ("RDMA/cm: fix loopback address support")? This
just went in for 2.6.33, which is still at -rc6, so if we can quickly
reach a consensus, there is still time to get a fix in for 2.6.33.
Sean Hefty wrote:
My concern is breaking an existing working OpenMPI in a point release
because we changed semantics of the rdma-cm in an ofed point release...
OFED can call this release a point release, but in reality, the content makes it
a major release...
BTW: Was this change an
>My concern is breaking an existing working OpenMPI in a point release
>because we changed semantics of the rdma-cm in an ofed point release...
OFED can call this release a point release, but in reality, the content makes it
a major release...
>BTW: Was this change an artifact of rebasing ofed-1
I agree that we should probably not allow 127.0.0.1 binds in
ofed-1.5.1 at all because it regresses OpenMPI. Even with IB systems,
if the bind to 127.0.0.1 succeeds, then OpenMPI assumes 127.0.0.1 is
bound to that rdma interface and advertises this address to its peer
as an address to which
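
One application-side way to avoid advertising 127.0.0.1 to a peer is to filter loopback addresses out before probing them. The following is a minimal sketch under that assumption; `is_loopback_ipv4()` is a hypothetical helper, not part of Open MPI or librdmacm, and it covers IPv4 only.

```c
#include <arpa/inet.h>
#include <netinet/in.h>

/*
 * Hypothetical helper: returns 1 if the dotted-quad string lies in
 * 127.0.0.0/8, 0 if it is any other valid IPv4 address, and -1 if the
 * string does not parse as IPv4.
 */
static int is_loopback_ipv4(const char *text)
{
    struct in_addr addr;

    if (inet_pton(AF_INET, text, &addr) != 1)
        return -1;

    /* s_addr is in network byte order; convert before taking the top octet. */
    return (ntohl(addr.s_addr) >> 24) == 127 ? 1 : 0;
}
```

A caller could skip any address for which this returns 1, so that a successful bind to loopback is never treated as evidence that the address is reachable from a remote node.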
Sean Hefty wrote:
Also note that trying to bind rdma cm to all interface ip addresses was the way
that we were advised by openfabrics to figure out which devices are rdma-
capable.
As such, it is highly desirable to get the fix transparently in rdmacm and
preserve the old semantic. More specific
>Also note that trying to bind rdma cm to all interface ip addresses was the way
>that we were advised by openfabrics to figure out which devices are rdma-
>capable.
>
>As such, it is highly desirable to get the fix transparently in rdmacm and
>preserve the old semantic. More specifically, it seems
Jeff Squyres (jsquyres) wrote:
Note that it is highly unlikely that we will release open mpi 1.4.2 in
time for ofed 1.5.1.
Jeff, there is no way to handle high priority bug fixes in the current
released stream?
Also note that trying to bind rdma cm to all interface ip addresses
was the
-----Original Message-----
From: linux-rdma-ow...@vger.kernel.org
[mailto:linux-rdma-ow...@vger.kernel.org] On Behalf Of Roland Dreier
Sent: Thursday, February 04, 2010 3:51 PM
To: Steve Wise
Cc: Sean Hefty; linux-rdma; OpenFabrics EWG; Jeff Squyres
Subject: Re: bug 1918 - openmpi broken due to rdma-cm changes
>
> Is this only an iwarp issue? IE do all IB devices support hw
> loopback? And will all future devices support it (IE is it an IBTA
> requirement)?
I do think IBA requires loopback to work. Can't quote chapter & verse
off the top of my head.
--
Roland Dreier
Cisco.com - http://www.cisco.co
Roland Dreier wrote:
> Hey Roland, are you ok with a device attribute to indicate hw-loopback
> support?
Sigh, I guess so. Can we have the rdma-cm handle this somewhat
automagically, eg only choose devices that do handle loopback when
binding/connecting to 127.0.0.1?
That's the plan.
Or
> Hey Roland, are you ok with a device attribute to indicate hw-loopback
> support?
Sigh, I guess so. Can we have the rdma-cm handle this somewhat
automagically, eg only choose devices that do handle loopback when
binding/connecting to 127.0.0.1? Or maybe can we put the handling of
this into t
Sean Hefty wrote:
Well then the rdma-cm needs to know which devices support hw loopback.
Cuz on a T3-only system, no hwloop...
The problem sounds like it's more than just whether 127.0.0.1 is usable. That
check may fix openmpi, but it sounds more like the app needs to know whether the
dev
>This solution would work. Will you code it up?
I can do that. I just want to make sure that we address the full scope of the
problem.
- Sean
Sean Hefty wrote:
At first thought, we can extend enum ib_device_cap_flags to indicate if a device
supports loopback capabilities or not. The rdma_cm could then skip over such
devices when dealing with a loopback address.
This solution would work. Will you code it up?
Stevo
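
The capability-flag idea proposed above could look roughly like this. It is a sketch under stated assumptions: `DEV_CAP_HW_LOOPBACK`, `struct device`, the device names, and `pick_loopback_device()` are hypothetical names chosen for illustration and do not correspond to actual entries in enum ib_device_cap_flags or to rdma_cm code.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical capability bit; not a real ib_device_cap_flags value. */
#define DEV_CAP_HW_LOOPBACK (1u << 0)

struct device {
    const char *name;     /* made-up label for the example */
    uint32_t cap_flags;   /* bitmask of capabilities the device reports */
};

/*
 * Model of a bind to a loopback address: return the index of the first
 * device that advertises hardware loopback, or -1 if no device does.
 * Devices without the flag are skipped rather than chosen by position.
 */
static int pick_loopback_device(const struct device *devs, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        if (devs[i].cap_flags & DEV_CAP_HW_LOOPBACK)
            return (int)i;
    }
    return -1;
}
```

Compared with filtering on transport type, a per-device flag would let a loopback-capable device of any transport be selected, while a system whose only device lacks hardware loopback would see the bind fail.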
>Well then the rdma-cm needs to know which devices support hw loopback.
>Cuz on a T3-only system, no hwloop...
The problem sounds like it's more than just whether 127.0.0.1 is usable. That
check may fix openmpi, but it sounds more like the app needs to know whether the
device can actually support
Sean Hefty wrote:
But how can you determine _which_ rdma device should be used if an app
binds to 127.0.0.1? I think this is busted...
The code just picks the first rdma device available. To me, this is preferable
to simply disallowing the loopback device from working at all. I perso
>But how can you determine _which_ rdma device should be used if an app
>binds to 127.0.0.1? I think this is busted...
The code just picks the first rdma device available. To me, this is preferable
to simply disallowing the loopback device from working at all. I personally
use it all the tim
Sean Hefty wrote:
OpenMPI uses rdma_bind_addr() to figure out which ip addresses are valid
for which IB devices. This logic is now broken. Regardless of whether
OpenMPI should use another method for determining which IP addresses
belong to which interfaces, we should probably rethink whether we'
>OpenMPI uses rdma_bind_addr() to figure out which ip addresses are valid
>for which IB devices. This logic is now broken. Regardless of whether
>OpenMPI should use another method for determining which IP addresses
>belong to which interfaces, we should probably rethink whether we're
>breaking rdm