Hi Amit,
Your solution seems ok to me.

BR
///jon


> -----Original Message-----
> From: Amit Jain <[email protected]>
> Sent: 8-Feb-19 00:04
> To: [email protected]
> Subject: [tipc-discussion] Race condition in link_start and tipc_link_delete
> function - RHEL 7.3
> 
> Hi,
> 
> We are using TIPC on RHEL 7.3 (kernel 3.10.0-862).
> 
> Scenario:
> Kernel crash in a 2-node cluster with no traffic, while bearers are being
> enabled/disabled. We can consistently reproduce the crash on a 2-node
> setup by enabling/disabling bearers in a loop (on node 1 every 0.5
> seconds, on node 2 every 4 seconds).
> 
> Root Cause Analysis of the issue :
> 
> Disabling the bearer invokes tipc_link_delete(), which frees the struct
> tipc_link *l_ptr. This is done under tipc_net_lock and the bearer lock.
> 
> Enabling the bearer sends discovery messages; on reception from the peer,
> tipc_link_create() is invoked, which allocates a new l_ptr and associates
> it with the bearer. All of this is done with the correct locks held.
> However, it schedules link_start as a tasklet.
> 
> Inside link_start, the bearer lock is taken and work is performed (sending
> a protocol message). If this tasklet runs before tipc_link_delete(),
> everything is fine. However, if tipc_link_delete() runs first, the tasklet
> is invoked with a dangling pointer.
> 
> Proposed Solution:
> Ideally we would cancel the tasklet when deleting the link, but since it
> is a work queue item inside the tasklet, it is not possible to cancel one
> particular work item. So within link_start we sanity-check whether l_ptr
> is still valid: if it is invalid (it does not match the l_ptr stored in
> any n_ptr on the node list), we simply return.
> 
> Following is the diff which seems to solve the issue:
> 
> # git diff
> diff --git a/tipc/link.c b/tipc/link.c
> index 349dbfd..f52dca9 100644
> --- a/tipc/link.c
> +++ b/tipc/link.c
> @@ -393,9 +393,24 @@ void tipc_link_delete(struct tipc_link *l_ptr)
> 
>  static void link_start(struct tipc_link *l_ptr)  {
> -       tipc_node_lock(l_ptr->owner);
> +       struct tipc_node *n_ptr;
> +       read_lock_bh(&tipc_net_lock);
> +       list_for_each_entry(n_ptr, &tipc_node_list, list) {
> +               u32 i;
> +               tipc_node_lock(n_ptr);
> +               for (i = 0; i < MAX_BEARERS; i++) {
> +                       if (n_ptr->links[i] == l_ptr) {
> +                               goto LINK_START;
> +                       }
> +               }
> +               tipc_node_unlock(n_ptr);
> +       }
> +       read_unlock_bh(&tipc_net_lock);
> +       return;
> +LINK_START:
>         link_state_event(l_ptr, STARTING_EVT);
>         tipc_node_unlock(l_ptr->owner);
> +       read_unlock_bh(&tipc_net_lock);
>  }
> 
> We would like to know if there are any obvious issues with the above
> change. Any feedback is appreciated.
> 
> Regards,
> Amit
> 
> _______________________________________________
> tipc-discussion mailing list
> [email protected]
> https://lists.sourceforge.net/lists/listinfo/tipc-discussion
