Re: list corruption in IPOIB

2013-05-20 Thread Jinpu Wang
Which list_del do you mean? The one in ipoib_cm_tx_start? On Mon, May 20, 2013 at 11:05 AM, Or Gerlitz ogerl...@mellanox.com wrote: On 19/05/2013 12:17, Jack Wang wrote: we added an inject_bug sysfs node to make the function run into the error case, like below. Yes, you are right, we want to speed up

Re: list corruption in IPOIB

2013-05-20 Thread Or Gerlitz
On 20/05/2013 12:10, Jinpu Wang wrote: Which list_del do you mean? The one in ipoib_cm_tx_start?
Yes, but not only there. You can start with a 5KG hammer and convert all of these hits to list_del_init:
linux-2.6]# grep list_del drivers/infiniband/ulp/ipoib/*.c | grep neigh
drivers/infiniband/ulp/ipoib/ipoib_cm.c:
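The difference between the two calls can be sketched in userspace C. This is a simplified re-implementation of the kernel's list helpers for illustration, not the kernel code itself: POISON1/POISON2 are stand-ins for the kernel's LIST_POISON1/LIST_POISON2, and is_poisoned() is a hypothetical helper. list_del() leaves the unlinked entry pointing at poison values, so a second deletion would dereference invalid pointers, while list_del_init() re-points the entry at itself, so deleting it twice is harmless and list_empty() on the entry reports that it is no longer linked.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Userspace sketch of a kernel-style circular doubly linked list.
 * POISON1/POISON2 only imitate the kernel's LIST_POISON1/LIST_POISON2. */
struct list_head {
    struct list_head *next, *prev;
};

#define POISON1 ((struct list_head *)0x100)
#define POISON2 ((struct list_head *)0x122)

static void INIT_LIST_HEAD(struct list_head *h)
{
    h->next = h;
    h->prev = h;
}

/* Insert n right after head. */
static void list_add(struct list_head *n, struct list_head *head)
{
    n->next = head->next;
    n->prev = head;
    head->next->prev = n;
    head->next = n;
}

/* Make prev and next point at each other, dropping whatever was between. */
static void __list_del(struct list_head *prev, struct list_head *next)
{
    next->prev = prev;
    prev->next = next;
}

/* Unlink, then poison: a second list_del() on e would follow bad pointers. */
static void list_del(struct list_head *e)
{
    __list_del(e->prev, e->next);
    e->next = POISON1;
    e->prev = POISON2;
}

/* Unlink, then self-link: calling this again on e is a harmless no-op. */
static void list_del_init(struct list_head *e)
{
    __list_del(e->prev, e->next);
    INIT_LIST_HEAD(e);
}

static bool list_empty(const struct list_head *h)
{
    return h->next == h;
}

/* Hypothetical helper: true if e was left poisoned by list_del(). */
static bool is_poisoned(const struct list_head *e)
{
    return e->next == POISON1 && e->prev == POISON2;
}
```

For example, linking two entries onto a head, calling list_del_init() on the first one twice, then list_del() on the second leaves the head empty, the first entry self-linked, and the second entry poisoned.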

Re: list corruption in IPOIB

2013-05-20 Thread Jinpu Wang
A quick test shows the list corruption warning is gone after I converted all list_del(neigh->list) calls to list_del_init(neigh->list). The test is still running; I will update the status if anything goes wrong. Thanks, Or. On Mon, May 20, 2013 at 12:58 PM, Or Gerlitz ogerl...@mellanox.com wrote: On 20/05/2013

Re: list corruption in IPOIB

2013-05-20 Thread Or Gerlitz
On 20/05/2013 15:46, Jinpu Wang wrote: A quick test shows the list corruption warning is gone after I converted all list_del(neigh->list) calls to list_del_init(neigh->list). Yes, but this wasn't your original problem, or was it?

Re: list corruption in IPOIB

2013-05-20 Thread Shlomo Pongratz
On 5/20/2013 3:58 PM, Jack Wang wrote: I haven't reproduced the original bug we saw in our production environment:
BUG: unable to handle kernel at 0008
IP: [a0206c30] ipoib_cm_tx_reap+0xe0/0x5a0 [ib_ipoib]
...
RIP: 0010:[a0206c30] [a0206c30]

Re: list corruption in IPOIB

2013-05-20 Thread Jack Wang
Hi Jack, I don't understand what the current status is, that is, what do you see now after applying the patches? If you don't get the original bug, why did you give its trace? Or is it a new trace? It is not clear from your mail. Please add only the trace of the current issue.

Re: list corruption in IPOIB

2013-05-20 Thread Or Gerlitz
On Mon, May 20, 2013 at 5:36 PM, Jack Wang jinpu.w...@profitbricks.com wrote: Sorry for the confusion. The current list corruption is gone in my preliminary test after I changed list_del to list_del_init as Or suggested. Since Or asked about the original bug, I just wanted to show him the whole story.

Re: list corruption in IPOIB

2013-05-20 Thread Jack Wang
On 2013-05-20 21:00, Or Gerlitz wrote: On Mon, May 20, 2013 at 5:36 PM, Jack Wang jinpu.w...@profitbricks.com wrote: Sorry for the confusion. The current list corruption is gone in my preliminary test after I changed list_del to list_del_init as Or suggested. Since Or asked about the original bug,

Re: list corruption in IPOIB

2013-05-20 Thread Or Gerlitz
On Mon, May 20, 2013 at 10:38 PM, Jack Wang jinpu.w...@profitbricks.com wrote: The bug in our production environment was introduced by our backport of IPoIB fixes from mainline. When we hit that bug, we reverted to the old kernel without the backported patches, and the bug didn't happen for

Re: list corruption in IPOIB

2013-05-20 Thread Jack Wang
On 2013-05-20 21:50, Or Gerlitz wrote: On Mon, May 20, 2013 at 10:38 PM, Jack Wang jinpu.w...@profitbricks.com wrote: The bug in our production environment was introduced by our backport of IPoIB fixes from mainline. When we hit that bug, we reverted to the old kernel without the

Re: list corruption in IPOIB

2013-05-19 Thread Or Gerlitz
On 19/05/2013 00:36, Jack Wang wrote: I tried 3.4.23 and the mainline kernel from Roland's rdma-for-linus; we added a bug injection interface, ran multithreaded iperf, and switched the IB mode between connected and datagram in sync on each side, as Shlomo suggested. Can you be more specific re the bug

Re: list corruption in IPOIB

2013-05-18 Thread Or Gerlitz
On Fri, May 17, 2013 at 10:36 PM, Jack Wang jinpu.w...@profitbricks.com wrote: We've seen the neigh->list list corruption warning below during testing. So how about a little heads-up on what kernel you are using? What's the way to trigger this warning? In Dongsu's and my opinion, several places also

Re: list corruption in IPOIB

2013-05-18 Thread Jack Wang
On 2013-05-18 21:37, Or Gerlitz wrote: On Fri, May 17, 2013 at 10:36 PM, Jack Wang jinpu.w...@profitbricks.com wrote: We've seen the neigh->list list corruption warning below during testing. So how about a little heads-up on what kernel you are using? What's the way to trigger this warning? Hi Or,

list corruption in IPOIB

2013-05-17 Thread Jack Wang
Hi Shlomo, Or, We've seen the neigh->list list corruption warning below during testing. In Dongsu's and my opinion, several places also need netif_tx_lock(_bh)/netif_tx_unlock(_bh) pairs around neigh->list. I tried adding netif_tx_lock/netif_tx_unlock to ipoib_cm_destroy_tx; it improved the
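The locking idea can be sketched in userspace C. Everything here is an illustrative assumption, not IPoIB code: a pthread mutex stands in for netif_tx_lock_bh()/netif_tx_unlock_bh(), struct neigh is a toy stand-in for the IPoIB neighbour entry, and neigh_link(), neigh_unlink() and race_unlink() are hypothetical helpers. The point is only that every path touching neigh->list takes the same lock, so two threads racing to unlink the same entries cannot interleave their pointer updates, and list_del_init() makes the losing thread's deletion a no-op.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

/* Minimal kernel-style list, re-implemented for illustration. */
struct list_head { struct list_head *next, *prev; };

static void INIT_LIST_HEAD(struct list_head *h) { h->next = h; h->prev = h; }
static bool list_empty(const struct list_head *h) { return h->next == h; }

static void list_add(struct list_head *n, struct list_head *head)
{
    n->next = head->next;
    n->prev = head;
    head->next->prev = n;
    head->next = n;
}

static void list_del_init(struct list_head *e)
{
    e->next->prev = e->prev;
    e->prev->next = e->next;
    INIT_LIST_HEAD(e);
}

/* Hypothetical toy neighbour entry. */
struct neigh { struct list_head list; };

/* Stand-in for the netdev TX lock guarding neigh->list. */
static pthread_mutex_t tx_lock = PTHREAD_MUTEX_INITIALIZER;
static struct list_head neigh_list = { &neigh_list, &neigh_list };

static void neigh_link(struct neigh *n)
{
    pthread_mutex_lock(&tx_lock);
    list_add(&n->list, &neigh_list);
    pthread_mutex_unlock(&tx_lock);
}

/* Check and unlink under the same lock; a second caller sees a
 * self-linked entry and skips it. */
static void neigh_unlink(struct neigh *n)
{
    pthread_mutex_lock(&tx_lock);
    if (!list_empty(&n->list))
        list_del_init(&n->list);
    pthread_mutex_unlock(&tx_lock);
}

#define NNEIGH 64
static struct neigh neighs[NNEIGH];

static void *unlink_all(void *arg)
{
    (void)arg;
    for (int i = 0; i < NNEIGH; i++)
        neigh_unlink(&neighs[i]);
    return NULL;
}

/* Links every entry, races two threads unlinking all of them, and
 * returns true if every entry and the list head end up empty. */
static bool race_unlink(void)
{
    pthread_t t1, t2;

    for (int i = 0; i < NNEIGH; i++) {
        INIT_LIST_HEAD(&neighs[i].list);
        neigh_link(&neighs[i]);
    }
    pthread_create(&t1, NULL, unlink_all, NULL);
    pthread_create(&t2, NULL, unlink_all, NULL);
    pthread_join(t1, NULL);
    pthread_join(t2, NULL);

    for (int i = 0; i < NNEIGH; i++)
        if (!list_empty(&neighs[i].list))
            return false;
    return list_empty(&neigh_list);
}
```

Build with -pthread. A single successful run does not prove the absence of races. It only shows that with one lock around both the check and the unlink, the double deletion that plain list_del() would turn into corruption is harmless here.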