Re: [PATCH RFC for-next] net/mlx4_core: Fix racy flow in the driver CQ completion handler

2012-09-11 Thread Or Gerlitz
On Tue, Sep 11, 2012 at 9:03 AM, Jack Morgenstein ja...@dev.mellanox.co.il wrote: On Monday 10 September 2012 16:27, Or Gerlitz wrote: I took a look on the practice/wrapping used over the mm subsystem for radix_tree_lookup calls, whose maintainer, Andrew Morton is signed on the patch Roland

Re: [PATCH RFC for-next] net/mlx4_core: Fix racy flow in the driver CQ completion handler

2012-09-10 Thread Jack Morgenstein
On Sunday 09 September 2012 18:10, Roland Dreier wrote: Please look at commit 7cf9c2c76c1a ([PATCH] radix-tree: RCU lockless readside) Roland, What about the following note (from the commit diff mentioned above): +/** + * Radix-tree synchronization + * + * The radix-tree API requires that

Re: [PATCH RFC for-next] net/mlx4_core: Fix racy flow in the driver CQ completion handler

2012-09-10 Thread Or Gerlitz
On 10/09/2012 09:57, Jack Morgenstein wrote: + * + * For API usage, in general, + * - any function _modifying_ the the tree or tags (inserting or deleting + * items, setting or clearing tags must exclude other modifications, and + * exclude any functions reading the tree. + * - any function

Re: [PATCH RFC for-next] net/mlx4_core: Fix racy flow in the driver CQ completion handler

2012-09-10 Thread Jack Morgenstein
On Monday 10 September 2012 13:35, Or Gerlitz wrote: Jack,  Max Actually, can't we do well with rcu_read_lock() in mlx4_cq_completion() as that commit documentation suggests? I don't know. I do notice (in file include/linux/rcupdate.h) that rcu_read_lock/unlock is meant to be used in the

Re: [PATCH RFC for-next] net/mlx4_core: Fix racy flow in the driver CQ completion handler

2012-09-10 Thread Or Gerlitz
On 10/09/2012 16:17, Jack Morgenstein wrote: I don't know. I do notice (in file include/linux/rcupdate.h) that rcu_read_lock/unlock is meant to be used in the interrupt context. Would it be sufficient (besides rcu_read_lock/unlock calls) to add a call rcu_synchronize() in mlx4_cq_free (after

Re: [PATCH RFC for-next] net/mlx4_core: Fix racy flow in the driver CQ completion handler

2012-09-10 Thread Jack Morgenstein
On Monday 10 September 2012 16:27, Or Gerlitz wrote: I  took a look on the practice/wrapping used over the mm subsystem for radix_tree_lookup calls, whose maintainer, Andrew Morton is signed on the patch Roland pointed to, its just rcu_read_lock/unlock, seems this is what to do as well. In

Re: [PATCH RFC for-next] net/mlx4_core: Fix racy flow in the driver CQ completion handler

2012-09-09 Thread Or Gerlitz
On Tue, Sep 4, 2012 at 11:12 AM, Max Matveev m...@gmx.co.uk wrote: On Thu, Aug 30, 2012 at 22:35 PM Roland Dreier wrote: On Thu, Aug 30, 2012 at 3:17 PM, Or Gerlitz or.gerl...@gmail.com wrote: Roland Dreier rol...@kernel.org wrote: Can you be explicit about the race you're worried about?

Re: [PATCH RFC for-next] net/mlx4_core: Fix racy flow in the driver CQ completion handler

2012-09-09 Thread Or Gerlitz
On Wed, Sep 5, 2012 at 4:25 PM, Max Matveev m...@gmx.co.uk wrote: On Tue, 4 Sep 2012 11:02:32 -0700, Roland Dreier wrote: roland On Tue, Sep 4, 2012 at 1:12 AM, Max Matveev m...@gmx.co.uk wrote: What about races between radix_tree_extend and radix_tree_lookup? As far as I can see

Re: [PATCH RFC for-next] net/mlx4_core: Fix racy flow in the driver CQ completion handler

2012-09-09 Thread Roland Dreier
On Sun, Sep 9, 2012 at 7:09 AM, Or Gerlitz or.gerl...@gmail.com wrote: I honestly do see how - the height of the root node is updated indepedently of the slots, so if someone managed to get the updated height there is nothing from stoping radix_tree_lookup from going too deep into the chain of

Re: [PATCH RFC for-next] net/mlx4_core: Fix racy flow in the driver CQ completion handler

2012-08-31 Thread Or Gerlitz
On Fri, Aug 31, 2012 at 1:35 AM, Roland Dreier rol...@kernel.org wrote: On Thu, Aug 30, 2012 at 3:17 PM, Or Gerlitz or.gerl...@gmail.com wrote: 1. on the time CQ A is deleted an interrupt that relates to CQ B takes place and a radix tree lookup is running while an element is being deleted

Re: [PATCH RFC for-next] net/mlx4_core: Fix racy flow in the driver CQ completion handler

2012-08-31 Thread Or Gerlitz
Roland Dreier ‎rol...@kernel.org wrote: Can you be explicit about the race you're worried about? Roland, This patch was made after we got the below report from the field. Dotan, can we get Roland access to the full vmcore image? Here's the report I got: [...] box panic'ed when trying to

Re: [PATCH RFC for-next] net/mlx4_core: Fix racy flow in the driver CQ completion handler

2012-08-31 Thread Or Gerlitz
On Fri, Aug 31, 2012 at 1:35 AM, Roland Dreier rol...@kernel.org wrote: I don't think this is a real problem; the radix tree code is [...] So maybe this patch wouldn't land in 3.7, we'll see, however, so far no other patch sits in the for-next branch of your tree for the next window... any plan

[PATCH RFC for-next] net/mlx4_core: Fix racy flow in the driver CQ completion handler

2012-08-30 Thread Or Gerlitz
From: Yishai Hadas yish...@mellanox.com The mlx4 CQ completion handler, mlx4_cq_completion doesn't bother to lock the radix tree which is used to manage the table of CQs, nor to increase the reference count of the CQ before invoking the user provided callback. This is racy and can cause use

Re: [PATCH RFC for-next] net/mlx4_core: Fix racy flow in the driver CQ completion handler

2012-08-30 Thread Or Gerlitz
Roland Dreier ‎rol...@kernel.org wrote: Can you be explicit about the race you're worried about? few 1. on the time CQ A is deleted an interrupt that relates to CQ B takes place and a radix tree lookup is running while an element is being deleted from the tree, looking on the radix tree API,

Re: [PATCH RFC for-next] net/mlx4_core: Fix racy flow in the driver CQ completion handler

2012-08-30 Thread Roland Dreier
On Thu, Aug 30, 2012 at 3:17 PM, Or Gerlitz or.gerl...@gmail.com wrote: Roland Dreier ‎rol...@kernel.org wrote: Can you be explicit about the race you're worried about? few 1. on the time CQ A is deleted an interrupt that relates to CQ B takes place and a radix tree lookup is running