Re: list corruption in IPOIB

2013-05-20 Thread Jinpu Wang
which list_del do you mean? in ipoib_cm_tx_start? On Mon, May 20, 2013 at 11:05 AM, Or Gerlitz ogerl...@mellanox.com wrote: On 19/05/2013 12:17, Jack Wang wrote: we added inject_bug sysfs node to make function run into error case, like something below. Yes, you are right, we want to speedup

Re: list corruption in IPOIB

2013-05-20 Thread Or Gerlitz
On 20/05/2013 12:10, Jinpu Wang wrote: which list_del do you mean? in ipoib_cm_tx_start? yes, but not only, you can start with 5KG hammer and convert all these hits to list_del_init linux-2.6]# grep list_del drivers/infiniband/ulp/ipoib/*.c | grep neigh drivers/infiniband/ulp/ipoib/ipoib_cm.c:

libibverbs / libmlx4 release

2013-05-20 Thread Or Gerlitz
Hi Roland, Following what we discussed last week during the Linux Foundation EU summit, I think it would be good to follow what you said and have a point release for libibverbs and libmlx4 before we pull in the verbs extensions framework and features that use it (XRC, Flow-Steering, etc more

Re: list corruption in IPOIB

2013-05-20 Thread Jinpu Wang
A quick test shows the list_corruption warning is gone after I converted all list_del(&neigh->list) to list_del_init(&neigh->list). Test is still running, will update status if anything goes wrong. Thanks Or. On Mon, May 20, 2013 at 12:58 PM, Or Gerlitz ogerl...@mellanox.com wrote: On 20/05/2013

Re: list corruption in IPOIB

2013-05-20 Thread Or Gerlitz
On 20/05/2013 15:46, Jinpu Wang wrote: A quick test shows the list_corruption warning is gone after I converted all list_del(&neigh->list) to list_del_init(&neigh->list). yes, but this wasn't your original problem or was it? -- To unsubscribe from this list: send the line unsubscribe linux-rdma

Re: list corruption in IPOIB

2013-05-20 Thread Shlomo Pongratz
On 5/20/2013 3:58 PM, Jack Wang wrote: I haven't reproduced the original bug we saw in our production environment BUG: unable to handle kernel at 0008 IP: [a0206c30] ipoib_cm_tx_reap+0xe0/0x5a0 [ib_ipoib] ... RIP: 0010:[a0206c30] [a0206c30]

Re: list corruption in IPOIB

2013-05-20 Thread Jack Wang
Hi Jack, I don't understand what the current status is, that is, what do you see now after applying the patches. If you don't get the original bug, why did you give the trace of it? Or is it a new trace? It is not clear from your mail. Please add only the trace of the current issue.

Re: MLX4 Cq Question

2013-05-20 Thread Jack Morgenstein
On Saturday 18 May 2013 00:37, Roland Dreier wrote: On Fri, May 17, 2013 at 12:25 PM, Tom Tucker t...@opengridcomputing.com wrote: I'm looking at the Linux MLX4 net driver and found something that confuses me mightily. In particular in the file net/ethernet/mellanox/mlx4/cq.c, the

Re: MLX4 Cq Question

2013-05-20 Thread Roland Dreier
On Mon, May 20, 2013 at 7:53 AM, Jack Morgenstein ja...@dev.mellanox.co.il wrote: This is racy and can cause use-after-free, null pointer dereference, etc, which result in kernel crashes. Sounds fine and I'd be happy to apply your final patch, but I'd be curious to know what the race is in

Re: libibverbs / libmlx4 release

2013-05-20 Thread Roland Dreier
On Mon, May 20, 2013 at 5:37 AM, Or Gerlitz ogerl...@mellanox.com wrote: Following what we discussed last week during the Linux Foundation EU summit, I think it would be good to follow what you said and have a point release for libibverbs and libmlx4 before we pull in the verbs extensions

Re: [PATCH libibverbs v5 5/7] libibverbs: Add ibv_open_qp

2013-05-20 Thread Steve Wise
On 3/26/2013 4:14 PM, sean.he...@intel.com wrote: From: Sean Hefty sean.he...@intel.com XRC receive QPs are shareable across multiple processes. Allow any process with access to the xrc domain to open an existing QP. After opening the QP, the process will receive events related to the QP and

RE: [PATCH libibverbs v5 5/7] libibverbs: Add ibv_open_qp

2013-05-20 Thread Hefty, Sean
So is it intended that cooperating processes sharing an xrc domain will choose one process to create the xrc qp, and the rest will open it? yes - The QPN of the shared QP must be known between the cooperating processes.

Re: libibverbs / libmlx4 release

2013-05-20 Thread Doug Ledford
On 05/20/2013 12:49 PM, Roland Dreier wrote: On Mon, May 20, 2013 at 5:37 AM, Or Gerlitz ogerl...@mellanox.com wrote: Following what we discussed last week during the Linux Foundation EU summit, I think it would be good to follow what you said and have a point release for libibverbs and

Re: list corruption in IPOIB

2013-05-20 Thread Or Gerlitz
On Mon, May 20, 2013 at 5:36 PM, Jack Wang jinpu.w...@profitbricks.com wrote: Sorry for confusion. Current list corruption is gone in my preliminary test, after I changed list_del to list_del_init as Or suggested. As Or asked for the original bug, so I just want to show him the whole story.
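
The difference between list_del and list_del_init that this thread turns on can be illustrated with a small userspace sketch. It is not the kernel's list.h and not the IPoIB code; the struct and every function name ending in _sketch are stand-ins made up for illustration, and the poison value is arbitrary. The idea it shows: a plain delete leaves the entry's pointers poisoned, so deleting the same entry a second time touches bogus memory, while a delete-and-reinit leaves the entry pointing at itself, so a second delete is harmless.

```c
#include <assert.h>
#include <stddef.h>

/* Userspace stand-in for the kernel's struct list_head. */
struct list_head {
    struct list_head *next, *prev;
};

/* Arbitrary poison value for this sketch; the kernel defines its own. */
#define SKETCH_POISON ((struct list_head *)0x100)

static void init_list_head_sketch(struct list_head *h)
{
    h->next = h;
    h->prev = h;
}

/* Insert entry right after head. */
static void list_add_sketch(struct list_head *entry, struct list_head *head)
{
    entry->next = head->next;
    entry->prev = head;
    head->next->prev = entry;
    head->next = entry;
}

/* Unlink entry from its neighbours without touching its own pointers. */
static void unlink_sketch(struct list_head *entry)
{
    entry->prev->next = entry->next;
    entry->next->prev = entry->prev;
}

/* Behaves like list_del(): unlink, then poison the entry's pointers. */
static void list_del_sketch(struct list_head *entry)
{
    unlink_sketch(entry);
    entry->next = SKETCH_POISON;
    entry->prev = SKETCH_POISON;
}

/* Behaves like list_del_init(): unlink, then make the entry an empty list. */
static void list_del_init_sketch(struct list_head *entry)
{
    unlink_sketch(entry);
    init_list_head_sketch(entry);
}

/* Returns 1 if deleting the same entry twice left the list intact. */
static int double_del_init_is_safe(void)
{
    struct list_head head, a, b;

    init_list_head_sketch(&head);
    list_add_sketch(&a, &head);
    list_add_sketch(&b, &head);      /* list is now head -> b -> a */

    list_del_init_sketch(&a);
    list_del_init_sketch(&a);        /* a points at itself: no effect on the list */

    return head.next == &b && head.prev == &b &&
           b.next == &head && b.prev == &head &&
           a.next == &a && a.prev == &a;
}

/* Returns 1 if a plain delete leaves the entry poisoned. */
static int del_leaves_poison(void)
{
    struct list_head head, a;

    init_list_head_sketch(&head);
    list_add_sketch(&a, &head);
    list_del_sketch(&a);

    return a.next == SKETCH_POISON && a.prev == SKETCH_POISON &&
           head.next == &head && head.prev == &head;
}
```

Calling list_del_sketch twice on the same entry would dereference the poison value, which is the class of error that a list corruption warning reports. Switching to the reinitialising variant hides a double delete rather than explaining it, which may be why Or asks whether this was the original problem.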

rockets feedbacks?

2013-05-20 Thread Or Gerlitz
Hi Sean, Do we have some public quoted usages/feedback for rsockets? I think you've mentioned something during the panel at the Linux EU summit last week but I am not sure... Or.

Re: list corruption in IPOIB

2013-05-20 Thread Jack Wang
On 2013年05月20日 21:00, Or Gerlitz wrote: On Mon, May 20, 2013 at 5:36 PM, Jack Wang jinpu.w...@profitbricks.com wrote: Sorry for confusion. Current list corruption is gone in my preliminary test, after I changed list_del to list_del_init as Or suggested. As Or asked for the original bug, so

Re: list corruption in IPOIB

2013-05-20 Thread Or Gerlitz
On Mon, May 20, 2013 at 10:38 PM, Jack Wang jinpu.w...@profitbricks.com wrote: The bug in our production environment is introduced in our backport about ipoib fixes from mainline, and when we hit that bug we reverted back to old kernel without the backport patch, and the bug didn't happen for

Re: MLX4 Cq Question

2013-05-20 Thread Tom Tucker
Hi Guys, One other quick one. I've received conflicting claims on the validity of the wc.opcode when wc.status != 0 for mlx4 hardware. My reading of the code (i.e. hw/mlx4/cq.c) is that the hardware cqe owner_sr_opcode field contains MLX4_CQE_OPCODE_ERROR when there is an error and

RE: rockets feedbacks?

2013-05-20 Thread Hefty, Sean
Do we have some public quoted usages/feedback for rsockets? I think you've mentioned something during the panel at the Linux EU summit last week but I am not sure... Most feedback I can think of has come via private emails or personal interactions, especially specific details of various usage

Re: list corruption in IPOIB

2013-05-20 Thread Jack Wang
On 2013年05月20日 21:50, Or Gerlitz wrote: On Mon, May 20, 2013 at 10:38 PM, Jack Wang jinpu.w...@profitbricks.com wrote: The bug in our production environment is introduced in our backport about ipoib fixes from mainline, and when we hit that bug we reverted back to old kernel without the

RE: MLX4 Cq Question

2013-05-20 Thread Hefty, Sean
My reading of the code (i.e. hw/mlx4/cq.c) is that the hardware cqe owner_sr_opcode field contains MLX4_CQE_OPCODE_ERROR when there is an error and therefore, the only way to recover what the opcode was is through the wr_id you used when submitting the WR. Is my reading of the code correct?
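
If Tom's reading holds, an application that needs to know which operation failed has to carry that information in the wr_id itself. The following is a minimal sketch of one way to do that, assuming the application can spare the top 8 bits of the 64-bit wr_id; the enum and the wr_id_* helper names are hypothetical and are not part of libibverbs or the mlx4 driver.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical application-side operation tags (not verbs opcodes). */
enum app_op {
    APP_OP_SEND = 1,
    APP_OP_RDMA_WRITE = 2,
    APP_OP_RDMA_READ = 3
};

#define WR_ID_COOKIE_MASK 0x00ffffffffffffffULL

/* Pack the operation into the top 8 bits and a cookie into the low 56 bits. */
static uint64_t wr_id_pack(enum app_op op, uint64_t cookie)
{
    return ((uint64_t)op << 56) | (cookie & WR_ID_COOKIE_MASK);
}

/* Recover the operation from a completion's wr_id. */
static enum app_op wr_id_op(uint64_t wr_id)
{
    return (enum app_op)(wr_id >> 56);
}

/* Recover the application cookie from a completion's wr_id. */
static uint64_t wr_id_cookie(uint64_t wr_id)
{
    return wr_id & WR_ID_COOKIE_MASK;
}
```

The packed value would be stored in the wr_id of the work request when it is posted and decoded from the wr_id of the work completion, so the operation type is available whatever the completion status is. Applications that already store a pointer in wr_id can keep the operation in the structure that pointer refers to instead.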

Re: MLX4 Cq Question

2013-05-20 Thread Tom Tucker
On 5/20/13 2:58 PM, Hefty, Sean wrote: My reading of the code (i.e. hw/mlx4/cq.c) is that the hardware cqe owner_sr_opcode field contains MLX4_CQE_OPCODE_ERROR when there is an error and therefore, the only way to recover what the opcode was is through the wr_id you used when submitting the WR.

Re: rockets feedbacks?

2013-05-20 Thread Or Gerlitz
On Mon, May 20, 2013 at 10:52 PM, Hefty, Sean sean.he...@intel.com wrote: Do we have some public quoted usages/feedback for rsockets? I think you've mentioned something during the panel at the Linux EU summit last week but I am not sure... Most feedback I can think of has come via private

RE: rockets feedbacks?

2013-05-20 Thread Hefty, Sean
So if you were pushing these private conversations to linux-rdma, more have been known on rsockets for the benefit of all... oh well. I think you mentioned something re Intel HPC group, or I am wrong? rsockets will continue to be supported by myself and Intel going forward. The rsocket work

[PATCH 0/3] make read_config() more robust

2013-05-20 Thread Yann Droneaud
Hi, Please find three patches to protect libibverbs from using invalid, insecure configuration files. Those configuration files are usually located in /etc/libibverbs.d/ and contain the name of a shared library to dlopen(). Only legitimate shared libraries should be loaded by libibverbs, so

[PATCH 1/3] read_config: ignore files beginning with '.'

2013-05-20 Thread Yann Droneaud
Files beginning with a dot are mostly current and parent directories or, by convention, hidden files. Those paths are skipped in find_sysfs_dev(). Signed-off-by: Yann Droneaud ydrone...@opteya.com --- src/init.c | 3 +++ 1 file changed, 3 insertions(+) diff --git a/src/init.c b/src/init.c index

[PATCH 2/3] read_config: ignore directory entry with backup suffix (~)

2013-05-20 Thread Yann Droneaud
Try to protect libibverbs from hand modified configuration files. Signed-off-by: Yann Droneaud ydrone...@opteya.com --- src/init.c | 3 +++ 1 file changed, 3 insertions(+) diff --git a/src/init.c b/src/init.c index c880b68..1981da7 100644 --- a/src/init.c +++ b/src/init.c @@ -311,6 +311,9 @@
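
The two name-based filters described in patches 1/3 and 2/3 can be sketched as a single predicate. This is not the code from the patches, whose hunks are cut off in this listing; config_name_is_acceptable is a hypothetical helper name, and treating an empty name as unacceptable is an assumption.

```c
#include <assert.h>
#include <string.h>

/*
 * Hypothetical helper: return 1 if a directory entry name should be
 * considered as a configuration file, 0 if it should be skipped.
 */
static int config_name_is_acceptable(const char *name)
{
    size_t len = strlen(name);

    if (len == 0)
        return 0;               /* assumption: nothing to load */
    if (name[0] == '.')
        return 0;               /* ".", ".." and hidden files */
    if (name[len - 1] == '~')
        return 0;               /* editor backup files */
    return 1;
}
```

A directory scan would call this on each entry name before opening the file, so that a leftover backup copy of a hand-edited file is never parsed.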

[PATCH 3/3] read_config: skip file/directory with unsecure permissions

2013-05-20 Thread Yann Droneaud
libibverbs must refuse to load arbitrary shared objects. This patch checks the configuration directory and files for - being owned by root; - not being writable by others. Signed-off-by: Yann Droneaud ydrone...@opteya.com --- src/init.c | 23 +-- 1 file changed, 21
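
The two conditions listed in patch 3/3 (owned by root, not writable by others) map onto the st_uid and st_mode fields returned by stat(). The sketch below shows that mapping only; the function names are hypothetical, and whether the actual patch uses stat(), lstat() or fstat() is not visible in this listing.

```c
#include <assert.h>
#include <sys/types.h>
#include <sys/stat.h>

/* Hypothetical helper: the two checks named in the patch description. */
static int owner_and_mode_are_secure(uid_t uid, mode_t mode)
{
    if (uid != 0)
        return 0;               /* not owned by root */
    if (mode & S_IWOTH)
        return 0;               /* writable by others */
    return 1;
}

/* Hypothetical wrapper: apply the checks to a path; fail closed on error. */
static int path_is_secure(const char *path)
{
    struct stat st;

    if (stat(path, &st) != 0)
        return 0;
    return owner_and_mode_are_secure(st.st_uid, st.st_mode);
}
```

Both the configuration directory and each file inside it would need to pass, since a writable directory lets another user replace a file that is itself well protected.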

Re: [PATCH 0/4] add RAW Packet QP type

2013-05-20 Thread Shawn Bohrer
On Tue, Jan 17, 2012 at 10:21:28AM -0600, Steve Wise wrote: On 01/17/2012 09:59 AM, Or Gerlitz wrote: On 1/17/2012 5:08 PM, Steve Wise wrote: I think this series should add some new send flags for HW that does checksum offload [...] also, on ingress, most hardware can do INET checksum

[PATCH 2/2] configure: use automake's option subdir-objects

2013-05-20 Thread Yann Droneaud
Following advice in Autotool Mythbuster [1], option subdir-objects can be used to have Makefiles create object files in the same directory as their source files. It reduces clobbering in the build directory. [1] Autotool Mythbuster, by Diego Elio Flameeyes Pettenò

[PATCH 1/2] configure: apply updates proposed by autoupdate

2013-05-20 Thread Yann Droneaud
'autoupdate' is a tool to help developers update configure.ac. This patch applies a few fixes as suggested by autoupdate. It was tested on Debian 6.0.7 (Squeeze) and Fedora 17 (Beefy Miracle). Signed-off-by: Yann Droneaud ydrone...@opteya.com --- configure.ac | 17 - 1 file

[PATCH] open files with close on exec flag

2013-05-20 Thread Yann Droneaud
Files opened by libibverbs are not supposed to be inherited across exec*(); most of the files are of no use to another program, and others cannot be used without the associated memory mapping. This patch changes open() and fopen() to always set the close-on-exec flag. This patch also adds checks to

Re: [PATCH for-next 0/9] Add receive Flow Steering support

2013-05-20 Thread Shawn Bohrer
On Wed, Apr 24, 2013 at 04:58:43PM +0300, Or Gerlitz wrote: Hi Roland, all The first five patches in the series are mlx4 DMFS (Device Managed Flow Steering) pre-patches needed for flow steering access from the mlx4 IB driver. net/mlx4_core: Move DMFS HW structs to common header file