On 02/19/2016 09:20 AM, Xue, Ying wrote:
> Hi Jon,
>
> I understood our scenario. Whatever the patch and another patch(v2 tipc: fix 
> crash during node removal) seem all incorrect. I will post a patch to fix the 
> issue soon.

Ok.
Actually, I think it is ok to use del_timer_sync(), which makes this 
very easy to fix. I was misled by our bad experience from the link 
timer, where we ran into deadlocks.
This cannot happen with the node timer.

///jon

>
> Regards,
> Ying
>
> -----Original Message-----
> From: Jon Maloy [mailto:[email protected]]
> Sent: 2016年2月19日 15:38
> To: Xue, Ying; [email protected]; Parthasarathy 
> Bhuvaragan; Richard Alpe; [email protected]
> Cc: [email protected]
> Subject: RE: [PATCH net-next 1/1] tipc: eliminate risk of finding 
> to-be-deleted node instance
>
>
>
>> -----Original Message-----
>> From: Xue, Ying [mailto:[email protected]]
>> Sent: Thursday, 18 February, 2016 08:50
>> To: Jon Maloy; [email protected]; Parthasarathy
>> Bhuvaragan; Richard Alpe; [email protected]
>> Cc: [email protected]
>> Subject: RE: [PATCH net-next 1/1] tipc: eliminate risk of finding
>> to-be-deleted node instance
>>
>> @@ -395,21 +396,19 @@ static void tipc_node_delete(struct tipc_node *node)  {
>>      list_del_rcu(&node->list);
>>      hlist_del_rcu(&node->hash);
>> -    kfree(node->bc_entry.link);
>> -    kfree_rcu(node, rcu);
>> +    tipc_node_put(node);
>>
>> We cannot call the first tipc_node_put(node) before
>> del_timer(&node->timer)), otherwise, this may cause a panic. For
>> example, after the first
>> tipc_node_put(node) is performed, its refcnt is zero.
> It will never be zero here. During normal runtime, the counter is 2 or more 
> (depending on how many tipc_node_find()'ers are active at the moment). This 
> decrementation is done because we just removed the node from the  lookup 
> table, but we still haven't tried deactivating the timer, so the value is  
> guaranteed 1 or more.
> (see one more comment further down)
>
>> When the node is accessed
>> in del_tmer(), the node instance might be freed by another CPU. Thus,
>> panic happens at the moment.
>>
>> Regards,
>> Ying
>>
>> +    if (del_timer(&node->timer))
>> +            tipc_node_put(node);
> This is where we have the real problem. If we are unsuccessful in deleting 
> the timer here because the timer handler is running (and the timer therefore 
> already inactive), there is no way the handler can know that it must not 
> reactivate it before it returns. After  mod_timer() returns, the timer is 
> always active,  irrespective of its previous state and return value. Any 
> suggestion for how to fix this? Even v2 of my patch "tipc: fix crash during 
> node removal" seems to be wrong. All other solutions I have found suggest 
> del_timer_sync(), but we cannot use that, since we know it may cause deadlock.
>
> ///jon
>
>>   }
>>
>>   void tipc_node_stop(struct net *net)
>>   {
>> -    struct tipc_net *tn = net_generic(net, tipc_net_id);
>> +    struct tipc_net *tn = tipc_net(net);
>>      struct tipc_node *node, *t_node;
>>
>>      spin_lock_bh(&tn->node_list_lock);
>> -    list_for_each_entry_safe(node, t_node, &tn->node_list, list) {
>> -            if (del_timer(&node->timer))
>> -                    tipc_node_put(node);
>> -            tipc_node_put(node);
>> -    }
>> +    list_for_each_entry_safe(node, t_node, &tn->node_list, list)
>> +            tipc_node_delete(node);
>>      spin_unlock_bh(&tn->node_list_lock);
>>   }
>>
>> --
>> 1.9.1


------------------------------------------------------------------------------
Site24x7 APM Insight: Get Deep Visibility into Application Performance
APM + Mobile APM + RUM: Monitor 3 App instances at just $35/Month
Monitor end-to-end web transactions and take corrective actions now
Troubleshoot faster and improve end-user experience. Signup Now!
http://pubads.g.doubleclick.net/gampad/clk?id=272487151&iu=/4140
_______________________________________________
tipc-discussion mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/tipc-discussion

Reply via email to