On Fri, 30 Mar 2012 11:33:56 -0700
Roland Dreier rol...@kernel.org wrote:
On Thu, Mar 29, 2012 at 4:41 AM, sebastien dugue
sebastien.du...@bull.net wrote:
So it looks like cma_process_remove() did all its job cleaning up
but is hung waiting for the client refcount to reach 0, which
Hi,
Apparently applications based on libumad can find local ports with
kernel 3.2.x but not with kernel 3.4-rc1.
# uname -r
3.4.0-rc1
# ls /sys/class/infiniband/mlx4_0/ports/1/rate
/sys/class/infiniband/mlx4_0/ports/1/rate
# cat /sys/class/infiniband/mlx4_0/ports/1/rate
cat:
On 04/02/12 10:33, Or Gerlitz wrote:
On 4/2/2012 10:42 AM, Bart Van Assche wrote:
# uname -r
3.4.0-rc1
# ls /sys/class/infiniband/mlx4_0/ports/1/rate
/sys/class/infiniband/mlx4_0/ports/1/rate
# cat /sys/class/infiniband/mlx4_0/ports/1/rate
cat: /sys/class/infiniband/mlx4_0/ports/1/rate:
On 4/2/2012 2:16 PM, Bart Van Assche wrote:
On 04/02/12 10:33, Or Gerlitz wrote:
As far as I can see the link layer value is fine:
$ cat /sys/class/infiniband/mlx4_0/ports/1/link_layer
InfiniBand
$ cat /sys/class/infiniband/mlx4_0/ports/2/link_layer
InfiniBand
So the two ports are actually
On 04/02/12 11:20, Or Gerlitz wrote:
On 4/2/2012 2:16 PM, Bart Van Assche wrote:
On 04/02/12 10:33, Or Gerlitz wrote:
As far as I can see the link layer value is fine:
$ cat /sys/class/infiniband/mlx4_0/ports/1/link_layer
InfiniBand
$ cat /sys/class/infiniband/mlx4_0/ports/2/link_layer
On 4/2/2012 2:48 PM, Bart Van Assche wrote:
The two ports are connected back-to-back to another mlx4 HCA. I
noticed this behavior change since opensm stopped working after
rebooting into 3.4-rc1.
can you add these prints and send me the output after attempting to cat
the rate file?
Or.
On 4/2/2012 3:51 PM, Or Gerlitz wrote:
can you add these prints and send me the output after attempting to
cat the rate file?
okay, on a system which has IB on port 1 and Ethernet on port 2, using
this patch
I get these prints:
ib_link_query_port active_speed 4
rate_show ret 0 for
On 4/2/2012 9:02 AM, Or Gerlitz wrote:
On 4/2/2012 3:51 PM, Or Gerlitz wrote:
can you add these prints and send me the output after attempting to
cat the rate file?
okay, on a system which has IB on port 1 and Ethernet on port 2, using
this patch
I get these prints:
ib_link_query_port
On 04/02/12 12:51, Or Gerlitz wrote:
On 4/2/2012 2:48 PM, Bart Van Assche wrote:
The two ports are connected back-to-back to another mlx4 HCA. I
noticed this behavior change since opensm stopped working after
rebooting into 3.4-rc1.
can you add these prints and send me the output after
On 4/2/2012 4:35 PM, Bart Van Assche wrote:
Some additional info:
- This issue only occurs if the back-to-back connected system is down,
not if it is running.
- The output I get with the other system down is:
# cat /sys/class/infiniband/mlx4_0/ports/1/link_layer
InfiniBand
# dmesg
Bart,
On 4/2/2012 9:25 AM, Hal Rosenstock wrote:
On 4/2/2012 9:02 AM, Or Gerlitz wrote:
On 4/2/2012 3:51 PM, Or Gerlitz wrote:
can you add these prints and send me the output after attempting to
cat the rate file?
okay, on a system which has IB on port 1 and Ethernet on port 2, using
this
When the IB port is down, the active_speed value returned by the MAD_IFC
command equals seven (7) which isn't among the IB speeds defined by the
ib_port_speed enum. This results in invalid speed value seen by higher
layers or applications who do port query. Fix that by setting the speed
to be SDR
Or Gerlitz (2):
IB/mlx4: fix the case of invalid speed value returned when the port is down
IB/core: add missing string for the display of SDR rates in sysfs
drivers/infiniband/core/sysfs.c |1 +
drivers/infiniband/hw/mlx4/main.c |4
2 files changed, 5 insertions(+), 0
commits 2e96691c IB: Use central enum for speed instead of hard-coded values
and e9319b0cb IB/core: Fix SDR rates in sysfs still didn't fill in the SDR
string in the SDR switch case, fix that.
---
drivers/infiniband/core/sysfs.c |1 +
1 files changed, 1 insertions(+), 0 deletions(-)
diff
On 4/2/2012 4:25 PM, Hal Rosenstock wrote:
I think there are 3 main issues here:
1. EINVAL can be returned from rate_show and hence Invalid argument
rate string should be handled in libibumad. I think this was Bart's original
point.
2. Why is rate_show returning EINVAL ? I think that's what
I like this idea - but it reminds me of a related issue I raised a while back:
nodes can often set the HCA description before they received a hostname from
DHCP - in which case you end up with saqueries full of localhost HCA-1.
At the time, QLogic's proposal was to modify the kernel stack so
On 04/01/12 19:09, Bart Van Assche wrote:
On 10/11/11 00:41, Roland Dreier wrote:
On Mon, Oct 10, 2011 at 10:47 AM, Bart Van Assche bvanass...@acm.org wrote:
- uint32_t hi = *(uint32_t *)(gid->raw);
- uint32_t lo = *(uint32_t *)(gid->raw + 4);
- if (hi == htonl(0xfe80)
On 4/2/2012 10:45 AM, Or Gerlitz wrote:
When the IB port is down, the active_speed value returned by the MAD_IFC
command equals seven (7) which isn't among the IB speeds defined by the
ib_port_speed enum. This results in invalid speed value seen by higher
layers or applications who do port
On 4/2/2012 7:35 PM, Hal Rosenstock wrote:
Rather than always overwriting active_speed in this case, wouldn't it
be better to only do that for invalid values?
Yes, I have thought about that, however, spotting invalid values would
make the code a bit ugly, so I took this approach, Roland?
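For reference, Hal's suggestion (overwrite active_speed only when the value is invalid) could look roughly like the sketch below. This is not the mlx4 patch; the enum is a local mirror of the ib_port_speed values, and speed_is_valid() and sanitize_speed() are hypothetical helper names introduced only for illustration.

```c
#include <stdbool.h>

/*
 * Local mirror of the kernel's ib_port_speed values, assumed here for
 * illustration; the authoritative definition lives in the kernel headers.
 */
enum port_speed {
	SPEED_SDR   = 1,
	SPEED_DDR   = 2,
	SPEED_QDR   = 4,
	SPEED_FDR10 = 8,
	SPEED_FDR   = 16,
	SPEED_EDR   = 32,
};

/* Hypothetical helper: true if 'speed' is one of the values listed above. */
static bool speed_is_valid(unsigned int speed)
{
	switch (speed) {
	case SPEED_SDR:
	case SPEED_DDR:
	case SPEED_QDR:
	case SPEED_FDR10:
	case SPEED_FDR:
	case SPEED_EDR:
		return true;
	default:
		return false;
	}
}

/* Hypothetical helper: keep valid speeds, map anything else to SDR. */
static unsigned int sanitize_speed(unsigned int speed)
{
	return speed_is_valid(speed) ? speed : SPEED_SDR;
}
```

With this, the value seven reported for a down port would become SDR, while a valid value such as QDR would pass through unchanged. The extra switch is the kind of code Or describes as "a bit ugly".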
On Mon, Apr 2, 2012 at 7:45 AM, Or Gerlitz ogerl...@mellanox.com wrote:
switch (attr.active_speed) {
case IB_SPEED_SDR:
+ speed = SDR;
rate = 25;
break;
case IB_SPEED_DDR:
I don't think we want this -- old kernels didn't
On 4/2/2012 12:47 PM, Or Gerlitz wrote:
On 4/2/2012 7:35 PM, Hal Rosenstock wrote:
Rather than always overwriting active_speed in this case, wouldn't it
be better to only do that for invalid values?
Yes, I have thought about that, however, spotting invalid values would
make the code a bit
On Mon, Apr 2, 2012 at 8:42 PM, Roland Dreier rol...@kernel.org wrote:
On Mon, Apr 2, 2012 at 7:45 AM, Or Gerlitz ogerl...@mellanox.com wrote:
switch (attr.active_speed) {
case IB_SPEED_SDR:
+ speed = SDR;
rate = 25;
break;
On 4/2/2012 7:35 PM, Hal Rosenstock wrote:
Rather than always overwriting active_speed in this case, wouldn't it
be better to only do that for invalid values?
Yes, I have thought about that, however, spotting invalid values would
make the code a bit ugly, so I took this approach, Roland?
On Mon, Apr 02, 2012 at 03:27:35PM +, Heinz, Michael William wrote:
Any ideas on how we could solve the hostname problem while we're
changing the description?
The node description needs to be set from the DHCP notifier script
chain (eg /etc/network/if-up.d/ on Debian) and also from a udev
On 4/2/2012 2:39 PM, Hefty, Sean wrote:
On 4/2/2012 7:35 PM, Hal Rosenstock wrote:
Rather than always overwriting active_speed in this case, wouldn't it
be better to only do that for invalid values?
Yes, I have thought about that, however, spotting invalid values would
make the code a bit
On Mon, Apr 02, 2012 at 06:39:36PM +, Hefty, Sean wrote:
On 4/2/2012 7:35 PM, Hal Rosenstock wrote:
Rather than always overwriting active_speed in this case, wouldn't it
be better to only do that for invalid values?
Yes, I have thought about that, however, spotting invalid values
On Mon, Apr 2, 2012 at 8:42 PM, Roland Dreier rol...@kernel.org wrote:
On Mon, Apr 2, 2012 at 7:45 AM, Or Gerlitz ogerl...@mellanox.com wrote:
switch (attr.active_speed) {
case IB_SPEED_SDR:
+ speed = SDR;
rate = 25;
break;
I'd agree but the network up/down functionality tends to vary significantly
from distro to distro. There's also the question of what package would we add
this functionality to? I mean, I assume it wouldn't be part of ofa_kernel.
Might actually be the installer script that has to patch the DHCP
We noticed the following interesting scenario:
One host sent a CM request to a remote host.
The remote host, which doesn't have any CM support, performed the following
steps:
* Replaced the sLID with the dLID
* Added an indication that this is a response MAD
* Set an
On Mon, Apr 2, 2012 at 7:45 AM, Or Gerlitz ogerl...@mellanox.com wrote:
When the IB port is down, the active_speed value returned by the MAD_IFC
command equals seven (7) which isn't among the IB speeds defined by the
ib_port_speed enum. This results in invalid speed value seen by higher
layers
On Mon, Apr 2, 2012 at 11:50 AM, Or Gerlitz or.gerl...@gmail.com wrote:
Oh, I see what you mean - and I don't think we need to consider this
as interface change - there is an interface - namely nGbs (mX DDD)
where n is number m is number and DDD is string (SDR, DDR, QDR,
FDR10, etc) and this
From: Roland Dreier rol...@purestorage.com
Commit e9319b0cb00d (IB/core: Fix SDR rates in sysfs) changed our
sysfs rate attribute to return EINVAL to userspace if the underlying
device driver returns an invalid rate. Apparently some drivers do this
when the link is down and some userspace pukes
On 4/2/2012 3:27 PM, Roland Dreier wrote:
On Mon, Apr 2, 2012 at 7:45 AM, Or Gerlitz ogerl...@mellanox.com wrote:
When the IB port is down, the active_speed value returned by the MAD_IFC
command equals seven (7) which isn't among the IB speeds defined by the
ib_port_speed enum. This results in
On Mon, Apr 2, 2012 at 12:39 PM, Hal Rosenstock h...@dev.mellanox.co.il wrote:
How about validating the speed in mlx4 before overwriting it ? Would you
take such a patch ?
I don't think so... what does speed even mean when we're reporting the
link is down?
Do we gain anything from that check?
On Mon, Apr 2, 2012 at 10:35 PM, Roland Dreier rol...@kernel.org wrote:
I think I'd rather just add this, to get back closer to the original
behavior even for non-fixed drivers (but I'll still merge the mlx4
patch, since that makes sense too).
okay, makes sense
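
A rough userspace sketch of the fallback idea discussed here: treat an unknown speed as SDR instead of failing with EINVAL. This is not the kernel's rate_show(); format_rate() is a hypothetical name, the numeric speed values and the exact output format are assumptions, and only the per-lane SDR figure (rate = 25, i.e. 2.5 Gb/sec) is taken from the quoted patch.

```c
#include <stdio.h>

/*
 * Hypothetical helper, not the kernel's rate_show().
 * 'speed' is assumed to use 1 = SDR, 2 = DDR, 4 = QDR;
 * 'width' is the number of lanes (1, 4, 8, 12).
 */
static int format_rate(char *buf, size_t len, unsigned int speed, int width)
{
	const char *name = "";
	int rate;	/* per-lane rate in units of 0.1 Gb/sec */

	switch (speed) {
	case 2:
		name = " DDR";
		rate = 50;
		break;
	case 4:
		name = " QDR";
		rate = 100;
		break;
	case 1:
	default:	/* unknown speed: fall back to SDR, no suffix */
		rate = 25;
		break;
	}

	rate *= width;

	/* Assumed output format, e.g. "10 Gb/sec (4X)". */
	return snprintf(buf, len, "%d%s Gb/sec (%dX%s)",
			rate / 10, rate % 10 ? ".5" : "", width, name);
}
```

With these assumptions, an invalid speed of seven on a 4X port yields the same string as SDR, so a reader of the rate file gets a parseable value rather than an error.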
On 4/2/2012 3:41 PM, Roland Dreier wrote:
On Mon, Apr 2, 2012 at 12:39 PM, Hal Rosenstock h...@dev.mellanox.co.il
wrote:
How about validating the speed in mlx4 before overwriting it ? Would you
take such a patch ?
I don't think so... what does speed even mean when we're reporting the
link
On 4/2/2012 3:35 PM, Roland Dreier wrote:
From: Roland Dreier rol...@purestorage.com
Commit e9319b0cb00d (IB/core: Fix SDR rates in sysfs) changed our
sysfs rate attribute to return EINVAL to userspace if the underlying
device driver returns an invalid rate. Apparently some drivers do this
By default ibacm expects to find its configuration files in /etc/ibacm.
This adds to the proliferation of directories in /etc/ needlessly. We
already have a number of RDMA related directories to choose from
depending on your install (OFED == /etc/ofed or /etc/openib in the old
days, RHEL5
On Mon, 2 Apr 2012 15:27:35 +
Heinz, Michael William michael.william.he...@intel.com wrote:
I like this idea - but it reminds me of a related issue I raised a while
back: nodes can often set the HCA description before they received a hostname
from DHCP - in which case you end up with
On Sun, Apr 1, 2012 at 2:48 PM, David Miller da...@davemloft.net wrote:
No problem, feel free to add:
Acked-by: David S. Miller da...@davemloft.net
Great, I pulled in these two patches, and I'll add the full ocrdma
driver soon and push it out for -next coverage (with the plan being
a merge to