Re: Mlx4: BUG: unable to handle kernel at ffffffffa02be210

2015-07-09 Thread Or Gerlitz
On 7/9/2015 4:35 PM, Jack Wang wrote:
> We have other kernel modules together also the autotest infrastructure.
> It's not that easy to install a 3.18.14 kernel.
you said you are running on 3.18.14 and just replaced their stock RDMA stack with MLNX OFED
> I look into the code a little bit. I thi

Re: Mlx4: BUG: unable to handle kernel at ffffffffa02be210

2015-07-09 Thread Jack Wang
2015-07-09 13:21 GMT+02:00 Or Gerlitz:
> On 7/9/2015 2:14 PM, Jack Wang wrote:
>> I managed to update the kernel to OFED 3.0 to verify the bug, but I
>> can still reproduce the bug, maybe some synchronize_irq
>> is still missing?
>
> Again, even if you don't use the upstream kernel for

Re: Mlx4: BUG: unable to handle kernel at ffffffffa02be210

2015-07-09 Thread Or Gerlitz
On 7/9/2015 2:14 PM, Jack Wang wrote:
> I managed to update the kernel to OFED 3.0 to verify the bug, but I can still reproduce the bug, maybe some synchronize_irq is still missing?
Again, even if you don't use the upstream kernel for production, I suggest you try to reproduce the bug the

Re: Mlx4: BUG: unable to handle kernel at ffffffffa02be210

2015-07-09 Thread Jack Wang
Hi Or,
I managed to update the kernel to OFED 3.0 to verify the bug, but I can still reproduce the bug, maybe some synchronize_irq is still missing?
Thanks
Jack
2015-07-08 16:07 GMT+02:00 Jack Wang:
> Thanks for your time.
>
> Looks the last one is missing in OFED 2.4 driver, I just chec

Re: Mlx4: BUG: unable to handle kernel at ffffffffa02be210

2015-07-08 Thread Or Gerlitz
On Wed, Jul 8, 2015 at 5:07 PM, Jack Wang wrote:
> Looks the last one is missing in OFED 2.4 driver, I just checked the
> history of mainline
>
> commit bf1bac5b7882daa41249f85fbc97828f0597de5c
> Author: Eli Cohen
> Date: Thu Oct 23 15:57:27 2014 +0300
>
> net/mlx4_core: Call synchronize_irq(

Re: Mlx4: BUG: unable to handle kernel at ffffffffa02be210

2015-07-08 Thread Jack Wang
Thanks for your time.
It looks like the last one is missing in the OFED 2.4 driver; I just checked the history of mainline:
commit bf1bac5b7882daa41249f85fbc97828f0597de5c
Author: Eli Cohen
Date: Thu Oct 23 15:57:27 2014 +0300
    net/mlx4_core: Call synchronize_irq() before freeing EQ buffer
    After m

Re: Mlx4: BUG: unable to handle kernel at ffffffffa02be210

2015-07-08 Thread Or Gerlitz
On 7/8/2015 3:47 PM, Jack Wang wrote:
> static void mlx4_ib_cq_comp(struct mlx4_cq *cq)
> {
>         struct ib_cq *ibcq = &to_mibcq(cq)->ibcq;
>         ibcq->comp_handler(ibcq, ibcq->cq_context);
> }
> Looks like cq use-after-free? I have no idea where.
see if you have in the code base you're using (why not

Re: Mlx4: BUG: unable to handle kernel at ffffffffa02be210

2015-07-08 Thread Jack Wang
Hi Or,
We're testing our RDMA kernel module. The test loads the module, creates an RDMA connection, runs some traffic, and unloads the module. mlx4_en is not involved; in fact we disable mlx4_en in the kernel build because we don't need it.
I did some debugging with gdb:
(gdb) list *mlx4_test_interrupts+0x84a 0xb0ea

Re: Mlx4: BUG: unable to handle kernel at ffffffffa02be210

2015-07-08 Thread Or Gerlitz
On 7/8/2015 12:42 PM, Jack Wang wrote:
> We're using MLX OFED 2.4-1.0.4 together on top of 3.18.14.
So this list is for upstream things.. still, let's see
> We hit bug below spontaneously, our test trigger this bug around 1 in 5 times.
and what is your test if I may ask?!
> HCA 'mlx4_0' CA t

Mlx4: BUG: unable to handle kernel at ffffffffa02be210

2015-07-08 Thread Jack Wang
Hello Or, Jack and Moni,
We hit the bug below spontaneously; our test triggers it about 1 time in 5.
We're using MLX OFED 2.4-1.0.4 on top of 3.18.14.
HCA 'mlx4_0'
    CA type: MT26428
    Number of ports: 2
    Firmware version: 2.9.1000
    Hardware version: b0
Could you offer some insight, could