[PATCH 3/3] mlx4: add support for reading performance counters

2010-11-10 Thread Eli Cohen
This patch uses basic or extended counters, which can be read through a command interface, to report counters for all the QPs that work on an rdmaoe port. This effectively allows implementing performance counters à la IB. Signed-off-by: Eli Cohen e...@mellanox.co.il --- drivers/infiniband/hw/mlx4/mad.c

Re: ib0: failed send event

2010-11-10 Thread Tziporet Koren
On 11/4/2010 10:45 PM, Craig Prescott wrote: Hi, We have a new-ish node whose IPoIB access was interrupted last night; last message in the syslog was: ib0: failed send event (status=2, wrid=74 vend_err 67) The node has a Mellanox ConnectX HCA (MHQH19-XTC). I understand the status

RE: ib receive completion error

2010-11-10 Thread Tziporet Koren
On 11/9/2010 9:22 PM, Usha Srinivasan wrote: Hello, Can someone from Mellanox tell me what the vendor error 0x32 means? I am getting this error for wc.opcode 128 (IB_WC_RECV) wc.status 4 (IB_WC_LOC_PROT_ERR). This error means corrupted MPT or PD violation Tziporet -- To

Re: ib receive completion error

2010-11-10 Thread Or Gerlitz
Usha Srinivasan wrote: Can someone from Mellanox tell me what the vendor error 0x32 means? I am getting this error for wc.opcode 128 (IB_WC_RECV) wc.status 4 (IB_WC_LOC_PROT_ERR). I am running ofed 1.5.2 and am getting it on both rhel5 and sles11 You can't count on the wc.opcode when the
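The reply above is cut off in this listing. As general background, the ibv_poll_cq man page states that when a completion's status is not success, only wr_id, status, qp_num and vendor_err are valid, so the opcode should not be interpreted. A minimal sketch of that check follows. It uses local stand-in types (every `sim_` name is hypothetical and not part of libibverbs); the numeric values 4, 128 and 0x32 are the ones quoted in the thread.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-ins for the verbs enums; values as quoted in the thread. */
enum sim_wc_status { SIM_WC_SUCCESS = 0, SIM_WC_LOC_PROT_ERR = 4 };
enum sim_wc_opcode { SIM_WC_RECV = 128 };

/* Hypothetical, reduced mirror of a work completion. */
struct sim_wc {
    uint64_t wr_id;      /* valid on success and on error */
    int      status;     /* valid on success and on error */
    int      opcode;     /* meaningful only when status is success */
    uint32_t vendor_err; /* vendor-specific code, valid on error */
};

/* Return the opcode when it can be trusted, or -1 for a failed completion. */
static int sim_wc_opcode_or_unknown(const struct sim_wc *wc)
{
    return wc->status == SIM_WC_SUCCESS ? wc->opcode : -1;
}
```

With a real `struct ibv_wc`, the same pattern would identify a failed request through `wr_id` rather than through `opcode`.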

Re: asynchronous operation with poll()

2010-11-10 Thread Jonathan Rosser
On 11/09/10 20:44, Jason Gunthorpe wrote: Broadly it looks to me like your actions are in the wrong order. A poll based RDMA loop should look like this: - exit poll - Check poll bit - call ibv_get_cq_event - call ibv_req_notify_cq - repeatedly call ibv_poll_cq (while rc == num requested) -
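The order of steps listed above (get the event, re-arm, then drain) can be sketched with a small in-memory model. This is an illustration only, not libibverbs code: every `sim_` name is a hypothetical stand-in for the verbs call named in its comment, and the model assumes one event is raised per arming. Real code would also acknowledge events with ibv_ack_cq_events, which the model leaves out.

```c
#include <assert.h>
#include <stddef.h>

#define SIM_CQ_CAP 64 /* capacity of the simulated CQ (assumption) */
#define SIM_BATCH  4  /* entries requested per poll call (assumption) */

/* Hypothetical in-memory stand-in for a CQ plus its completion channel. */
struct sim_cq {
    int    entries[SIM_CQ_CAP]; /* queued completion ids, FIFO */
    size_t head, tail;          /* consume at head, produce at tail */
    int    armed;               /* 1 after a notification was requested */
    int    pending_events;      /* events waiting on the channel */
};

/* Producer side: queue a completion and raise one event if the CQ is armed. */
static int sim_complete(struct sim_cq *cq, int id)
{
    if (cq->tail == SIM_CQ_CAP)
        return -1;
    cq->entries[cq->tail++] = id;
    if (cq->armed) {
        cq->armed = 0;
        cq->pending_events++;
    }
    return 0;
}

/* Stand-in for ibv_get_cq_event(): consume one pending event, -1 if none. */
static int sim_get_cq_event(struct sim_cq *cq)
{
    if (cq->pending_events == 0)
        return -1;
    cq->pending_events--;
    return 0;
}

/* Stand-in for ibv_req_notify_cq(): arm the CQ for the next completion. */
static void sim_req_notify_cq(struct sim_cq *cq)
{
    cq->armed = 1;
}

/* Stand-in for ibv_poll_cq(): copy out up to max entries, return the count. */
static int sim_poll_cq(struct sim_cq *cq, int max, int *out)
{
    int n = 0;
    while (n < max && cq->head < cq->tail)
        out[n++] = cq->entries[cq->head++];
    return n;
}

/* One wakeup: get the event, re-arm, then poll until a short batch returns. */
static int sim_handle_wakeup(struct sim_cq *cq)
{
    int batch[SIM_BATCH];
    int rc, total = 0;

    if (sim_get_cq_event(cq) != 0)
        return -1;
    sim_req_notify_cq(cq); /* re-arm before draining */
    do {
        rc = sim_poll_cq(cq, SIM_BATCH, batch);
        total += rc;
    } while (rc == SIM_BATCH); /* a full batch means more may be queued */
    return total;
}
```

Re-arming before the drain means a completion that arrives while polling either gets picked up by the loop or raises a fresh event, so in this model it is not silently missed.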

Re: asynchronous operation with poll()

2010-11-10 Thread Jonathan Rosser
On 11/10/10 10:30, Andrea Gozzelino wrote: Hi Jonathan, I wrote a test (latency and transfer speed) with RDMA. Server and client run the same code and exchange buffers of a defined size n times (in a loop). In makefile.txt you can find help on using the code. I tested Intel

Pro/Cons of send/receive/read/write operations?

2010-11-10 Thread ib
It seems there are several ways to transfer data via RDMA: Send/Send w/immediate, Receive, RDMA Write/RDMA Write w/immediate, the Atomic extensions, and then the various transport mode variations with some of these operations. As an RDMA/IB newbie, it is somewhat confusing when

Re: asynchronous operation with poll()

2010-11-10 Thread Roland Dreier
Could I get some clarification on where there is no ordering guarantee? The WC's do not necessarily come back in the order that the sends were posted? For a given queue, completions are always returned in the order that work requests were posted. However there is no ordering between

Re: asynchronous operation with poll()

2010-11-10 Thread Jason Gunthorpe
On Wed, Nov 10, 2010 at 09:43:12AM -0800, Roland Dreier wrote: Could I get some clarification on where there is no ordering guarantee? The WC's do not necessarily come back in the order that the sends were posted? For a given queue, completions are always returned in the order that

Re: asynchronous operation with poll()

2010-11-10 Thread Jason Gunthorpe
On Wed, Nov 10, 2010 at 02:39:03PM +, Jonathan Rosser wrote: Continually posting sends and recvs will get you into trouble, you will run out of recvs and get RNR's. These days the wisdom for implementing RDMA is that you should have explicit message flow OK - I appreciate that a real

multicast sends not received by sending QP

2010-11-10 Thread Hefty, Sean
On one of our test clusters, I'm seeing a situation where multicast traffic is not received on the QP which sends a message. All nodes that I actually tested on the cluster have this problem (5 of 15). The devinfo from one of these is below: hca_id: mlx4_0 transport:

RE: multicast sends not received by sending QP

2010-11-10 Thread Hefty, Sean
The cluster is running OFED [...] I do not see this problem on 2.6.36 I think you hit this ofed only patch http://git.openfabrics.org/git?p=ofed_1_5/linux-2.6.git;a=blob;f=kernel_patches/fixes/mlx4_0290_mcast_loopback.patch;h=786a3926529befac2c2d1fa6d8c36bada79d61a7;hb=HEAD Yes - this

Re: [PATCH] infiniband: core: fix information leak to userland

2010-11-10 Thread Roland Dreier
Structure ib_uverbs_qp_attr is copied to userland with almost all fields uninitialized (140 bytes on x86). It leads to leaking of contents of kernel stack memory. I don't think most of the fields are uninitialized... we have: memset(qp_attr, 0, sizeof qp_attr); and then later

Re: Pro/Cons of send/receive/read/write operations?

2010-11-10 Thread Jason Gunthorpe
On Wed, Nov 10, 2010 at 10:14:16AM -0700, i...@celticblues.com wrote: It seems there are several ways to transfer data via RDMA: Send/Send w/immediate, Receive, RDMA Write/RDMA Write w/immediate, the Atomic extensions, and then the various transport mode variations with some of these

RE: [PATCH] infiniband: core: fix information leak to userland

2010-11-10 Thread Hefty, Sean
Sean, what is intended for qp_state handling here? It seems ib_copy_qp_attr_to_user() should either clear it or set it to something sensible. I'm not sure what the original intent was, but both libibcm and librdmacm provide the qp_state as input to the init_qp_attr calls. It doesn't end up