Re: MLX4 Cq Question

2013-05-21 Thread Or Gerlitz
On 20/05/2013 17:53, Jack Morgenstein wrote: === net/mlx4_core: Fix racy flow in the driver CQ completion handler The mlx4 CQ completion handler, mlx4_cq_completion, doesn't bother to lock the radix tree which is used to manage the table of CQs,

Re: MLX4 Cq Question

2013-05-21 Thread Bart Van Assche
On 05/21/13 11:40, Or Gerlitz wrote: 2. Is it possible in the Linux kernel for one hard irq callback to flash on CPU X while another hard irq callback is running on the same CPU? I think that from kernel 2.6.35 on, MSI IRQs are no longer nested. See also

Re: MLX4 Cq Question

2013-05-21 Thread Or Gerlitz
On 21/05/2013 13:42, Bart Van Assche wrote: On 05/21/13 11:40, Or Gerlitz wrote: 2. Is it possible in the Linux kernel for one hard irq callback to flash on CPU X while another hard irq callback is running on the same CPU? I think that from kernel 2.6.35 on, MSI IRQs are no longer nested. See

Re: MLX4 Cq Question

2013-05-21 Thread Jack Morgenstein
On Tuesday 21 May 2013 13:43, Or Gerlitz wrote: On 21/05/2013 13:42, Bart Van Assche wrote: On 05/21/13 11:40, Or Gerlitz wrote: 2. Is it possible in the Linux kernel for one hard irq callback to flash on CPU X while another hard irq callback is running on the same CPU? I think that from

Re: MLX4 Cq Question

2013-05-21 Thread Or Gerlitz
On 21/05/2013 17:13, Jack Morgenstein wrote: I just need to verify that the patch can be applied correctly on the upstream kernel. The use of RCU (and not spinlock) makes sense from a performance standpoint in any case. We do NOT want to force mlx4_cq_completion to have a spinlock which is

Re: MLX4 Cq Question

2013-05-20 Thread Jack Morgenstein
On Saturday 18 May 2013 00:37, Roland Dreier wrote: On Fri, May 17, 2013 at 12:25 PM, Tom Tucker t...@opengridcomputing.com wrote: I'm looking at the Linux MLX4 net driver and found something that confuses me mightily. In particular in the file net/ethernet/mellanox/mlx4/cq.c, the

Re: MLX4 Cq Question

2013-05-20 Thread Roland Dreier
On Mon, May 20, 2013 at 7:53 AM, Jack Morgenstein ja...@dev.mellanox.co.il wrote: This is racy and can cause use-after-free, null pointer dereference, etc, which result in kernel crashes. Sounds fine and I'd be happy to apply your final patch, but I'd be curious to know what the race is in

Re: MLX4 Cq Question

2013-05-20 Thread Tom Tucker
Hi Guys, One other quick one. I've received conflicting claims on the validity of the wc.opcode when wc.status != 0 for mlx4 hardware. My reading of the code (i.e. hw/mlx4/cq.c) is that the hardware cqe owner_sr_opcode field contains MLX4_CQE_OPCODE_ERROR when there is an error and

RE: MLX4 Cq Question

2013-05-20 Thread Hefty, Sean
My reading of the code (i.e. hw/mlx4/cq.c) is that the hardware cqe owner_sr_opcode field contains MLX4_CQE_OPCODE_ERROR when there is an error and therefore, the only way to recover what the opcode was is through the wr_id you used when submitting the WR. Is my reading of the code correct?
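
If the reading quoted above holds, a consumer cannot rely on the opcode reported in an errored completion and has to recover it from its own bookkeeping. The sketch below illustrates that idea only; it is not taken from the mlx4 driver or from libibverbs. All demo_* types and functions are hypothetical stand-ins, and it assumes the application stores a pointer to a per-request context in wr_id when posting the work request.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins; NOT the real verbs definitions. */
enum demo_wr_opcode { DEMO_WR_SEND, DEMO_WR_RDMA_WRITE, DEMO_WR_RDMA_READ };

struct demo_wc {
    uint64_t wr_id;  /* echoed back from the posted work request */
    int status;      /* 0 means success, anything else is an error */
    int opcode;      /* treated as undefined when status != 0 */
};

/* Per-request context owned by the application (assumed design). */
struct demo_wr_ctx {
    enum demo_wr_opcode opcode;  /* recorded at post time */
    void *buf;                   /* whatever else the request needs */
};

/* Pack the context pointer into the 64-bit wr_id when posting. */
static uint64_t demo_ctx_to_wr_id(struct demo_wr_ctx *ctx)
{
    return (uint64_t)(uintptr_t)ctx;
}

/* Recover the context pointer from a completion's wr_id. */
static struct demo_wr_ctx *demo_wr_id_to_ctx(uint64_t wr_id)
{
    assert(wr_id != 0);
    return (struct demo_wr_ctx *)(uintptr_t)wr_id;
}

/* Return the opcode that was posted, never reading wc->opcode. */
static enum demo_wr_opcode demo_completed_opcode(const struct demo_wc *wc)
{
    return demo_wr_id_to_ctx(wc->wr_id)->opcode;
}

/* Simulate an errored completion whose opcode field is garbage.
 * Returns 1 if the posted opcode was recovered through wr_id. */
static int demo_run(void)
{
    struct demo_wr_ctx ctx = { .opcode = DEMO_WR_RDMA_READ, .buf = NULL };
    struct demo_wc wc = {
        .wr_id = demo_ctx_to_wr_id(&ctx),
        .status = 5,    /* arbitrary non-zero error status */
        .opcode = -1,   /* deliberately meaningless */
    };
    return demo_completed_opcode(&wc) == DEMO_WR_RDMA_READ;
}
```

Keeping the opcode in the per-request context means the error path and the success path can share one lookup, at the cost of keeping the context alive until its completion has been reaped.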

Re: MLX4 Cq Question

2013-05-20 Thread Tom Tucker
On 5/20/13 2:58 PM, Hefty, Sean wrote: My reading of the code (i.e. hw/mlx4/cq.c) is that the hardware cqe owner_sr_opcode field contains MLX4_CQE_OPCODE_ERROR when there is an error and therefore, the only way to recover what the opcode was is through the wr_id you used when submitting the WR.

Re: MLX4 Cq Question

2013-05-19 Thread Or Gerlitz
On 18/05/2013 00:37, Roland Dreier wrote: you see that when freeing a CQ, we first do the HW2SW_CQ firmware command; once this command completes, no more events will be generated for that CQ. Then we do synchronize_irq for the CQ's interrupt vector. Once that completes, no more completion

MLX4 Cq Question

2013-05-17 Thread Tom Tucker
Hi Roland, I'm looking at the Linux MLX4 net driver and found something that confuses me mightily. In particular in the file net/ethernet/mellanox/mlx4/cq.c, the mlx4_ib_completion function does not take any kind of lock when looking up the SW CQ in the radix tree, however, the mlx4_cq_event

Re: MLX4 Cq Question

2013-05-17 Thread Roland Dreier
On Fri, May 17, 2013 at 12:25 PM, Tom Tucker t...@opengridcomputing.com wrote: I'm looking at the Linux MLX4 net driver and found something that confuses me mightily. In particular in the file net/ethernet/mellanox/mlx4/cq.c, the mlx4_ib_completion function does not take any kind of lock when