Re: mlx4 catas_reset hangs when using the CM

2012-04-02 Thread sebastien dugue
On Fri, 30 Mar 2012 11:33:56 -0700 Roland Dreier rol...@kernel.org wrote: On Thu, Mar 29, 2012 at 4:41 AM, sebastien dugue sebastien.du...@bull.net wrote:  So it looks like cma_process_remove() did all its job cleaning up but is hung waiting for the client refcount to reach 0, which

mlx4: kernel 3.4-rc1 breaks libumad

2012-04-02 Thread Bart Van Assche
Hi, Apparently applications based on libumad can find local ports with kernel 3.2.x but not with kernel 3.4-rc1. # uname -r 3.4.0-rc1 # ls /sys/class/infiniband/mlx4_0/ports/1/rate /sys/class/infiniband/mlx4_0/ports/1/rate # cat /sys/class/infiniband/mlx4_0/ports/1/rate cat:

Re: mlx4: kernel 3.4-rc1 breaks libumad

2012-04-02 Thread Bart Van Assche
On 04/02/12 10:33, Or Gerlitz wrote: On 4/2/2012 10:42 AM, Bart Van Assche wrote: # uname -r 3.4.0-rc1 # ls /sys/class/infiniband/mlx4_0/ports/1/rate /sys/class/infiniband/mlx4_0/ports/1/rate # cat /sys/class/infiniband/mlx4_0/ports/1/rate cat: /sys/class/infiniband/mlx4_0/ports/1/rate:

Re: mlx4: kernel 3.4-rc1 breaks libumad

2012-04-02 Thread Or Gerlitz
On 4/2/2012 2:16 PM, Bart Van Assche wrote: On 04/02/12 10:33, Or Gerlitz wrote: As far as I can see the link layer value is fine: $ cat /sys/class/infiniband/mlx4_0/ports/1/link_layer InfiniBand $ cat /sys/class/infiniband/mlx4_0/ports/2/link_layer InfiniBand So the two ports are actually

Re: mlx4: kernel 3.4-rc1 breaks libumad

2012-04-02 Thread Bart Van Assche
On 04/02/12 11:20, Or Gerlitz wrote: On 4/2/2012 2:16 PM, Bart Van Assche wrote: On 04/02/12 10:33, Or Gerlitz wrote: As far as I can see the link layer value is fine: $ cat /sys/class/infiniband/mlx4_0/ports/1/link_layer InfiniBand $ cat /sys/class/infiniband/mlx4_0/ports/2/link_layer

Re: mlx4: kernel 3.4-rc1 breaks libumad

2012-04-02 Thread Or Gerlitz
On 4/2/2012 2:48 PM, Bart Van Assche wrote: The two ports are connected back-to-back to another mlx4 HCA. I noticed this behavior change since opensm stopped working after rebooting into 3.4-rc1. can you add these prints and send me the output after attempting to cat the rate file? Or.

Re: mlx4: kernel 3.4-rc1 breaks libumad

2012-04-02 Thread Or Gerlitz
On 4/2/2012 3:51 PM, Or Gerlitz wrote: can you add these prints and send me the output after attempting to cat the rate file? okay, on a system which has IB on port 1 and Ethernet on port 2, using this patch I get these prints: ib_link_query_port active_speed 4 rate_show ret 0 for

Re: mlx4: kernel 3.4-rc1 breaks libumad

2012-04-02 Thread Hal Rosenstock
On 4/2/2012 9:02 AM, Or Gerlitz wrote: On 4/2/2012 3:51 PM, Or Gerlitz wrote: can you add these prints and send me the output after attempting to cat the rate file? okay, on a system which has IB on port 1 and Ethernet on port 2, using this patch I get these prints: ib_link_query_port

Re: mlx4: kernel 3.4-rc1 breaks libumad

2012-04-02 Thread Bart Van Assche
On 04/02/12 12:51, Or Gerlitz wrote: On 4/2/2012 2:48 PM, Bart Van Assche wrote: The two ports are connected back-to-back to another mlx4 HCA. I noticed this behavior change since opensm stopped working after rebooting into 3.4-rc1. can you add these prints and send me the output after

Re: mlx4: kernel 3.4-rc1 breaks libumad

2012-04-02 Thread Or Gerlitz
On 4/2/2012 4:35 PM, Bart Van Assche wrote: Some additional info: - This issue only occurs if the back-to-back connected system is down, not if it is running. - The output I get with the other system down is: # cat /sys/class/infiniband/mlx4_0/ports/1/link_layer InfiniBand # dmesg

Re: mlx4: kernel 3.4-rc1 breaks libumad

2012-04-02 Thread Hal Rosenstock
Bart, On 4/2/2012 9:25 AM, Hal Rosenstock wrote: On 4/2/2012 9:02 AM, Or Gerlitz wrote: On 4/2/2012 3:51 PM, Or Gerlitz wrote: can you add these prints and send me the output after attempting to cat the rate file? okay, on a system which has IB on port 1 and Ethernet on port 2, using this

[PATCH 1/2] IB/mlx4: fix the case of invalid speed value returned when the port is down

2012-04-02 Thread Or Gerlitz
When the IB port is down, the active_speed value returned by the MAD_IFC command equals seven (7) which isn't among the IB speeds defined by the ib_port_speed enum. This results in invalid speed value seen by higher layers or applications who do port query. Fix that by setting the speed to be SDR
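The behaviour described in this patch summary can be sketched in standalone C. This is only an illustration under stated assumptions, not the mlx4 patch itself: the sketch_* names are hypothetical, and the numeric values are copied from what I understand the kernel's ib_port_speed and ib_port_state enums to be. It shows the "always overwrite when the port is down" approach, plus a validity check of the kind raised later in the thread.

```c
#include <stdbool.h>

/* Assumed to mirror the kernel's enum ib_port_speed; copied here so the
 * sketch compiles outside the kernel tree. */
enum sketch_port_speed {
	SKETCH_SPEED_SDR   = 1,
	SKETCH_SPEED_DDR   = 2,
	SKETCH_SPEED_QDR   = 4,
	SKETCH_SPEED_FDR10 = 8,
	SKETCH_SPEED_FDR   = 16,
	SKETCH_SPEED_EDR   = 32
};

/* Assumed to mirror two values of the kernel's enum ib_port_state. */
enum sketch_port_state {
	SKETCH_PORT_DOWN   = 1,
	SKETCH_PORT_ACTIVE = 4
};

/* Hypothetical, trimmed-down stand-in for the port attributes. */
struct sketch_port_attr {
	enum sketch_port_state state;
	unsigned char active_speed;
};

/* Returns true only for speeds that appear in the enum above. */
static bool sketch_speed_is_valid(unsigned char speed)
{
	switch (speed) {
	case SKETCH_SPEED_SDR:
	case SKETCH_SPEED_DDR:
	case SKETCH_SPEED_QDR:
	case SKETCH_SPEED_FDR10:
	case SKETCH_SPEED_FDR:
	case SKETCH_SPEED_EDR:
		return true;
	default:
		return false;
	}
}

/* Unconditionally report SDR while the port is down, so that callers
 * never see a value such as 7 that is outside the enum. */
static void sketch_fix_speed_when_down(struct sketch_port_attr *attr)
{
	if (attr->state == SKETCH_PORT_DOWN)
		attr->active_speed = SKETCH_SPEED_SDR;
}
```

The alternative discussed in the follow-ups would call sketch_speed_is_valid() first and overwrite only when it returns false.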

[PATCH 0/2]: fixes to port query and sysfs in 3.4-rc1

2012-04-02 Thread Or Gerlitz
Or Gerlitz (2): IB/mlx4: fix the case of invalid speed value returned when the port is down IB/core: add missing string for the display of SDR rates in sysfs drivers/infiniband/core/sysfs.c |1 + drivers/infiniband/hw/mlx4/main.c |4 2 files changed, 5 insertions(+), 0

[PATCH 2/2] IB/core: add missing string for the display of SDR rates in sysfs

2012-04-02 Thread Or Gerlitz
commits 2e96691c IB: Use central enum for speed instead of hard-coded values and e9319b0cb IB/core: Fix SDR rates in sysfs still didn't fill in the SDR string in the SDR switch case, fix that. --- drivers/infiniband/core/sysfs.c |1 + 1 files changed, 1 insertions(+), 0 deletions(-) diff

Re: mlx4: kernel 3.4-rc1 breaks libumad

2012-04-02 Thread Or Gerlitz
On 4/2/2012 4:25 PM, Hal Rosenstock wrote: I think there are 3 main issues here: 1. EINVAL can be returned from rate_show and hence Invalid argument rate string should be handled in libibumad. I think this was Bart's original point. 2. Why is rate_show returning EINVAL ? I think that's what

RE: [RFC] Proposal to change Node Description naming scheme for HCA's

2012-04-02 Thread Heinz, Michael William
I like this idea - but it reminds me of a related issue I raised a while back: nodes can often set the HCA description before they received a hostname from DHCP - in which case you end up with saqueries full of localhost HCA-1. At the time, QLogic's proposal was to modify the kernel stack so

Re: [PATCH] libmlx4: Fix a compiler warning

2012-04-02 Thread Bart Van Assche
On 04/01/12 19:09, Bart Van Assche wrote: On 10/11/11 00:41, Roland Dreier wrote: On Mon, Oct 10, 2011 at 10:47 AM, Bart Van Assche bvanass...@acm.org wrote: - uint32_t hi = *(uint32_t *)(gid->raw); - uint32_t lo = *(uint32_t *)(gid->raw + 4); - if (hi == htonl(0xfe80)

Re: [PATCH 1/2] IB/mlx4: fix the case of invalid speed value returned when the port is down

2012-04-02 Thread Hal Rosenstock
On 4/2/2012 10:45 AM, Or Gerlitz wrote: When the IB port is down, the active_speed value returned by the MAD_IFC command equals seven (7) which isn't among the IB speeds defined by the ib_port_speed enum. This results in invalid speed value seen by higher layers or applications who do port

Re: [PATCH 1/2] IB/mlx4: fix the case of invalid speed value returned when the port is down

2012-04-02 Thread Or Gerlitz
On 4/2/2012 7:35 PM, Hal Rosenstock wrote: Rather than always overwriting active_speed in this case, wouldn't it be better to only do that for invalid values? Yes, I have thought about that, however, spotting invalid values would make the code a bit ugly, so I took this approach, Roland?

Re: [PATCH 2/2] IB/core: add missing string for the display of SDR rates in sysfs

2012-04-02 Thread Roland Dreier
On Mon, Apr 2, 2012 at 7:45 AM, Or Gerlitz ogerl...@mellanox.com wrote:        switch (attr.active_speed) {        case IB_SPEED_SDR: +               speed = SDR;                rate = 25;                break;        case IB_SPEED_DDR: I don't think we want this -- old kernels didn't

Re: [PATCH 1/2] IB/mlx4: fix the case of invalid speed value returned when the port is down

2012-04-02 Thread Hal Rosenstock
On 4/2/2012 12:47 PM, Or Gerlitz wrote: On 4/2/2012 7:35 PM, Hal Rosenstock wrote: Rather than always overwriting active_speed in this case, wouldn't it be better to only do that for invalid values? Yes, I have thought about that, however, spotting invalid values would make the code a bit

Re: [PATCH 2/2] IB/core: add missing string for the display of SDR rates in sysfs

2012-04-02 Thread Or Gerlitz
On Mon, Apr 2, 2012 at 8:42 PM, Roland Dreier rol...@kernel.org wrote: On Mon, Apr 2, 2012 at 7:45 AM, Or Gerlitz ogerl...@mellanox.com wrote:        switch (attr.active_speed) {        case IB_SPEED_SDR: +               speed = SDR;                rate = 25;                break;  

RE: [PATCH 1/2] IB/mlx4: fix the case of invalid speed value returned when the port is down

2012-04-02 Thread Hefty, Sean
On 4/2/2012 7:35 PM, Hal Rosenstock wrote: Rather than always overwriting active_speed in this case, wouldn't it be better to only do that for invalid values? Yes, I have thought about that, however, spotting invalid values would make the code a bit ugly, so I took this approach, Roland?

Re: [RFC] Proposal to change Node Description naming scheme for HCA's

2012-04-02 Thread Jason Gunthorpe
On Mon, Apr 02, 2012 at 03:27:35PM +, Heinz, Michael William wrote: Any ideas on how we could solve the hostname problem while we're changing the description? The node description needs to be set from the DHCP notifier script chain (e.g. /etc/network/if-up.d/ on Debian) and also from a udev

Re: [PATCH 1/2] IB/mlx4: fix the case of invalid speed value returned when the port is down

2012-04-02 Thread Hal Rosenstock
On 4/2/2012 2:39 PM, Hefty, Sean wrote: On 4/2/2012 7:35 PM, Hal Rosenstock wrote: Rather than always overwriting active_speed in this case, wouldn't it be better to only do that for invalid values? Yes, I have thought about that, however, spotting invalid values would make the code a bit

Re: [PATCH 1/2] IB/mlx4: fix the case of invalid speed value returned when the port is down

2012-04-02 Thread Jason Gunthorpe
On Mon, Apr 02, 2012 at 06:39:36PM +, Hefty, Sean wrote: On 4/2/2012 7:35 PM, Hal Rosenstock wrote: Rather than always overwriting active_speed in this case, wouldn't it be better to only do that for invalid values? Yes, I have thought about that, however, spotting invalid values

Re: [PATCH 2/2] IB/core: add missing string for the display of SDR rates in sysfs

2012-04-02 Thread Or Gerlitz
On Mon, Apr 2, 2012 at 8:42 PM, Roland Dreier rol...@kernel.org wrote: On Mon, Apr 2, 2012 at 7:45 AM, Or Gerlitz ogerl...@mellanox.com wrote:        switch (attr.active_speed) {        case IB_SPEED_SDR: +               speed = SDR;                rate = 25;                break;    

RE: [RFC] Proposal to change Node Description naming scheme for HCA's

2012-04-02 Thread Heinz, Michael William
I'd agree but the network up/down functionality tends to vary significantly from distro to distro. There's also the question of what package would we add this functionality to? I mean, I assume it wouldn't be part of ofa_kernel. Might actually be the installer script that has to patch the DHCP

RE: Does the CM know how to handle bad packets?

2012-04-02 Thread Hefty, Sean
We noticed the following interesting scenario: One host sent a CM request to a remote host. The remote host, which doesn't have any CM support, performed the following steps: * Replaced the sLID with the dLID * Added an indication that this is a response MAD * Set an

Re: [PATCH 1/2] IB/mlx4: fix the case of invalid speed value returned when the port is down

2012-04-02 Thread Roland Dreier
On Mon, Apr 2, 2012 at 7:45 AM, Or Gerlitz ogerl...@mellanox.com wrote: When the IB port is down, the active_speed value returned by the MAD_IFC command equals seven (7) which isn't among the IB speeds defined by the ib_port_speed enum. This results in invalid speed value seen by higher layers

Re: [PATCH 2/2] IB/core: add missing string for the display of SDR rates in sysfs

2012-04-02 Thread Roland Dreier
On Mon, Apr 2, 2012 at 11:50 AM, Or Gerlitz or.gerl...@gmail.com wrote: Oh, I see what you mean - and I don't think we need to consider this as interface change - there is an interface - namely nGbs (mX DDD) where n is a number, m is a number and DDD is a string (SDR, DDR, QDR, FDR10, etc) and this

[PATCH] IB/core: Don't return EINVAL from sysfs rate attribute for invalid speeds

2012-04-02 Thread Roland Dreier
From: Roland Dreier rol...@purestorage.com Commit e9319b0cb00d (IB/core: Fix SDR rates in sysfs) changed our sysfs rate attribute to return EINVAL to userspace if the underlying device driver returns an invalid rate. Apparently some drivers do this when the link is down and some userspace pukes
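A minimal userspace sketch of the idea in this patch summary: format the rate string, and treat any unrecognised speed as SDR instead of returning an error. Everything here is an assumption made for illustration: sketch_rate_show() and the SK_* constants are hypothetical names, the per-lane rates (in units of 0.1 Gb/sec) are what I understand the speeds to be, and the link width is passed as a plain integer. The SDR case prints no speed suffix, matching the concern raised in the 2/2 thread that old kernels did not print one.

```c
#include <stdio.h>

/* Assumed to mirror the kernel's enum ib_port_speed values. */
enum {
	SK_SPEED_SDR   = 1,
	SK_SPEED_DDR   = 2,
	SK_SPEED_QDR   = 4,
	SK_SPEED_FDR10 = 8,
	SK_SPEED_FDR   = 16,
	SK_SPEED_EDR   = 32
};

/* Writes a string such as "40 Gb/sec (4X QDR)" into buf.
 * Unknown speeds fall through to the SDR case rather than failing. */
static int sketch_rate_show(unsigned char active_speed, int width,
			    char *buf, size_t len)
{
	const char *speed = "";	/* SDR is printed without a suffix */
	int rate;		/* per-lane rate in units of 0.1 Gb/sec */

	switch (active_speed) {
	case SK_SPEED_DDR:
		speed = " DDR";
		rate = 50;
		break;
	case SK_SPEED_QDR:
		speed = " QDR";
		rate = 100;
		break;
	case SK_SPEED_FDR10:
		speed = " FDR10";
		rate = 100;
		break;
	case SK_SPEED_FDR:
		speed = " FDR";
		rate = 140;
		break;
	case SK_SPEED_EDR:
		speed = " EDR";
		rate = 250;
		break;
	case SK_SPEED_SDR:
	default:		/* invalid values are reported as SDR */
		rate = 25;
		break;
	}

	rate *= width;

	return snprintf(buf, len, "%d%s Gb/sec (%dX%s)",
			rate / 10, rate % 10 ? ".5" : "", width, speed);
}
```

With this shape, a driver that reports an out-of-range speed while the link is down still yields a readable rate string, so userspace that reads the attribute does not see a failed read.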

Re: [PATCH 1/2] IB/mlx4: fix the case of invalid speed value returned when the port is down

2012-04-02 Thread Hal Rosenstock
On 4/2/2012 3:27 PM, Roland Dreier wrote: On Mon, Apr 2, 2012 at 7:45 AM, Or Gerlitz ogerl...@mellanox.com wrote: When the IB port is down, the active_speed value returned by the MAD_IFC command equals seven (7) which isn't among the IB speeds defined by the ib_port_speed enum. This results in

Re: [PATCH 1/2] IB/mlx4: fix the case of invalid speed value returned when the port is down

2012-04-02 Thread Roland Dreier
On Mon, Apr 2, 2012 at 12:39 PM, Hal Rosenstock h...@dev.mellanox.co.il wrote: How about validating the speed in mlx4 before overwriting it ? Would you take such a patch ? I don't think so... what does speed even mean when we're reporting the link is down? Do we gain anything from that check?

Re: [PATCH] IB/core: Don't return EINVAL from sysfs rate attribute for invalid speeds

2012-04-02 Thread Or Gerlitz
On Mon, Apr 2, 2012 at 10:35 PM, Roland Dreier rol...@kernel.org wrote: I think I'd rather just add this, to get back closer to the original behavior even for non-fixed drivers (but I'll still merge the mlx4 patch, since that makes sense too). okay, makes sense

Re: [PATCH 1/2] IB/mlx4: fix the case of invalid speed value returned when the port is down

2012-04-02 Thread Hal Rosenstock
On 4/2/2012 3:41 PM, Roland Dreier wrote: On Mon, Apr 2, 2012 at 12:39 PM, Hal Rosenstock h...@dev.mellanox.co.il wrote: How about validating the speed in mlx4 before overwriting it ? Would you take such a patch ? I don't think so... what does speed even mean when we're reporting the link

Re: [PATCH] IB/core: Don't return EINVAL from sysfs rate attribute for invalid speeds

2012-04-02 Thread Hal Rosenstock
On 4/2/2012 3:35 PM, Roland Dreier wrote: From: Roland Dreier rol...@purestorage.com Commit e9319b0cb00d (IB/core: Fix SDR rates in sysfs) changed our sysfs rate attribute to return EINVAL to userspace if the underlying device driver returns an invalid rate. Apparently some drivers do this

RE: ibacm fixes/updates

2012-04-02 Thread Hefty, Sean
By default ibacm expects to find its configuration files in /etc/ibacm. This adds to the proliferation of directories in /etc/ needlessly. We already have a number of RDMA related directories to choose from depending on your install (OFED == /etc/ofed or /etc/openib in the old days, RHEL5

Re: [RFC] Proposal to change Node Description naming scheme for HCA's

2012-04-02 Thread Ira Weiny
On Mon, 2 Apr 2012 15:27:35 + Heinz, Michael William michael.william.he...@intel.com wrote: I like this idea - but it reminds me of a related issue I raised a while back: nodes can often set the HCA description before they received a hostname from DHCP - in which case you end up with

Re: be2net: when can I expect roce support patch will be merged?

2012-04-02 Thread Roland Dreier
On Sun, Apr 1, 2012 at 2:48 PM, David Miller da...@davemloft.net wrote: No problem, feel free to add: Acked-by: David S. Miller da...@davemloft.net Great, I pulled in these two patches, and I'll add the full ocrdma driver soon and push it out for -next coverage (with the plan being a merge to