Hi Ying,
My reasoning was exactly like yours. I also thought about going back to
del_timer(), but then I remembered all the problems we had with that, and how
well it is working now, so I opted for the more cowardly (and uglier) solution.
But I'll give this a second glance. I don't like to use work queues unless
absolutely necessary.

///jon


> -----Original Message-----
> From: Ying Xue <[email protected]>
> Sent: 13-Nov-18 07:50
> To: Jon Maloy <[email protected]>; tipc-
> [email protected]
> Cc: Tung Quang Nguyen <[email protected]>; Hoang Huu Le
> <[email protected]>; Jon Maloy <[email protected]>;
> [email protected]; Canh Duc Luu <[email protected]>; Gordan
> Mihaljevic <[email protected]>;
> [email protected]
> Subject: Re: [net-next 2/2] tipc: fix lockdep warning during node delete
> 
> Hi Jon,
> 
> I believe the real deadlock scenario is as below:
> 
> CPU0                                    CPU1
> ====                                    ====
> spin_lock_bh(&tn->node_list_lock);
>                                      <SOFTIRQ>
>                                      call_timer_fn()
>                                      <IRQ IN>
> 
>                                      irq_fn()
>                                      <IRQ_OUT>
> del_timer_sync()
> //wait until tipc_node_timeout() is completed
>                                      <SOFTIRQ>
> 
>                                      tipc_node_timeout()
>                                      tipc_node_cleanup()
>                                      spin_lock_bh(&tn->node_list_lock);
> 
> tn->node_list_lock is already held, but del_timer_sync() must wait until
> tipc_node_timeout() has completed on the other CPU, otherwise it will never
> return. Strictly speaking, as long as tipc_node_timeout() has not finished,
> del_timer_sync() cannot exit from the loop below:
> 
> int del_timer_sync(struct timer_list *timer)
> {
>         for (;;) {
>                 int ret = try_to_del_timer_sync(timer);
> 
>                 if (ret >= 0)
>                         return ret;
>                 cpu_relax();
>         }
> }
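> 
> (For reference, try_to_del_timer_sync() returns -1 while the timer
> handler is running on another CPU, so the loop above keeps spinning
> until the handler returns.)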
> 
> However, tipc_node_timeout() is trying to grab tn->node_list_lock on
> CPU1, while that lock was taken before del_timer_sync() on CPU0. As a
> result, the system is locked up forever.
> 
> Regarding the root cause: as long as we don't synchronously wait for
> tipc_node_timeout() to complete on other CPUs, the deadlock can never
> happen. This means that if we replace del_timer_sync() with del_timer() in
> tipc_node_delete(), the deadlock is eliminated. But we then need to ensure
> that the node instance is still valid for tipc_node_timeout() even after
> tipc_node_put() has been called in tipc_node_delete(), so we need to
> increase and decrease the node reference count carefully.
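> 
> Just to illustrate what I mean (an untested sketch; it assumes that
> tipc_node_delete_from_list() drops the list's reference itself, and the
> remaining get/put points are exactly what we would have to review
> carefully):
> 
> static void tipc_node_delete(struct tipc_node *node)
> {
>         /* Drops the list's reference to the node internally */
>         tipc_node_delete_from_list(node);
> 
>         /* del_timer() does not wait for a running handler, so it is
>          * safe even while tipc_node_timeout() is spinning on
>          * node_list_lock on another CPU. If the timer was still
>          * pending we drop its reference here; otherwise the running
>          * handler must detect the deletion and drop it itself.
>          */
>         if (del_timer(&node->timer))
>                 tipc_node_put(node);
> }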
> 
> In your solution, tn->node_list_lock is no longer taken in
> tipc_node_timeout(); instead, grabbing the lock is moved to a work queue,
> which eliminates the deadlock as well.
> 
> If we use the solution I suggested above, it would be a bit more natural,
> and we wouldn't need to introduce an additional work queue. But as I said,
> we would have to check carefully that we get/put the node reference count
> properly.
> 
> Actually, we originally used del_timer() to delete the node timer
> asynchronously, but after some issues were found, we later changed the
> deletion to the synchronous variant.
> 
> Anyway, your solution is also fine with me, so you can decide which
> approach you prefer.
> 
> Thanks,
> Ying
> 
> On 11/11/2018 03:05 AM, Jon Maloy wrote:
> > We see the following lockdep warning:
> >
> > [ 2284.078521] ======================================================
> > [ 2284.078604] WARNING: possible circular locking dependency detected
> > [ 2284.078604] 4.19.0+ #42 Tainted: G            E
> > [ 2284.078604] ------------------------------------------------------
> > [ 2284.078604] rmmod/254 is trying to acquire lock:
> > [ 2284.078604] 00000000acd94e28 ((&n->timer)#2){+.-.}, at: del_timer_sync+0x5/0xa0
> > [ 2284.078604]
> > [ 2284.078604] but task is already holding lock:
> > [ 2284.078604] 00000000f997afc0 (&(&tn->node_list_lock)->rlock){+.-.}, at: tipc_node_stop+0xac/0x190 [tipc]
> > [ 2284.078604]
> > [ 2284.078604] which lock already depends on the new lock.
> > [ 2284.078604]
> > [ 2284.078604]
> > [ 2284.078604] the existing dependency chain (in reverse order) is:
> > [ 2284.078604]
> > [ 2284.078604] -> #1 (&(&tn->node_list_lock)->rlock){+.-.}:
> > [ 2284.078604]        tipc_node_timeout+0x20a/0x330 [tipc]
> > [ 2284.078604]        call_timer_fn+0xa1/0x280
> > [ 2284.078604]        run_timer_softirq+0x1f2/0x4d0
> > [ 2284.078604]        __do_softirq+0xfc/0x413
> > [ 2284.078604]        irq_exit+0xb5/0xc0
> > [ 2284.078604]        smp_apic_timer_interrupt+0xac/0x210
> > [ 2284.078604]        apic_timer_interrupt+0xf/0x20
> > [ 2284.078604]        default_idle+0x1c/0x140
> > [ 2284.078604]        do_idle+0x1bc/0x280
> > [ 2284.078604]        cpu_startup_entry+0x19/0x20
> > [ 2284.078604]        start_secondary+0x187/0x1c0
> > [ 2284.078604]        secondary_startup_64+0xa4/0xb0
> > [ 2284.078604]
> > [ 2284.078604] -> #0 ((&n->timer)#2){+.-.}:
> > [ 2284.078604]        del_timer_sync+0x34/0xa0
> > [ 2284.078604]        tipc_node_delete+0x1a/0x40 [tipc]
> > [ 2284.078604]        tipc_node_stop+0xcb/0x190 [tipc]
> > [ 2284.078604]        tipc_net_stop+0x154/0x170 [tipc]
> > [ 2284.078604]        tipc_exit_net+0x16/0x30 [tipc]
> > [ 2284.078604]        ops_exit_list.isra.8+0x36/0x70
> > [ 2284.078604]        unregister_pernet_operations+0x87/0xd0
> > [ 2284.078604]        unregister_pernet_subsys+0x1d/0x30
> > [ 2284.078604]        tipc_exit+0x11/0x6f2 [tipc]
> > [ 2284.078604]        __x64_sys_delete_module+0x1df/0x240
> > [ 2284.078604]        do_syscall_64+0x66/0x460
> > [ 2284.078604]        entry_SYSCALL_64_after_hwframe+0x49/0xbe
> > [ 2284.078604]
> > [ 2284.078604] other info that might help us debug this:
> > [ 2284.078604]
> > [ 2284.078604]  Possible unsafe locking scenario:
> > [ 2284.078604]
> > [ 2284.078604]        CPU0                    CPU1
> > [ 2284.078604]        ----                    ----
> > [ 2284.078604]   lock(&(&tn->node_list_lock)->rlock);
> > [ 2284.078604]                                lock((&n->timer)#2);
> > [ 2284.078604]                                lock(&(&tn->node_list_lock)->rlock);
> > [ 2284.078604]   lock((&n->timer)#2);
> > [ 2284.078604]
> > [ 2284.078604]  *** DEADLOCK ***
> > [ 2284.078604]
> > [ 2284.078604] 3 locks held by rmmod/254:
> > [ 2284.078604]  #0: 000000003368be9b (pernet_ops_rwsem){+.+.}, at: unregister_pernet_subsys+0x15/0x30
> > [ 2284.078604]  #1: 0000000046ed9c86 (rtnl_mutex){+.+.}, at: tipc_net_stop+0x144/0x170 [tipc]
> > [ 2284.078604]  #2: 00000000f997afc0 (&(&tn->node_list_lock)->rlock){+.-.}, at: tipc_node_stop+0xac/0x19
> > [...]
> >
> > The reason is that the node timer handler sometimes needs to delete a
> > node which has been disconnected for too long. To do this, it grabs
> > the lock 'node_list_lock', which may at the same time be held by the
> > generic node cleanup function, tipc_node_stop(), during module removal.
> > Since the latter calls del_timer_sync() while holding the same lock, we
> > have a potential deadlock.
> >
> > We fix this by moving the timer-triggered delete function to work queue
> > context.
> >
> > Fixes: 6a939f365bdb ("tipc: Auto removal of peer down node instance")
> > Signed-off-by: Jon Maloy <[email protected]>
> > ---
> >  net/tipc/node.c | 74 ++++++++++++++++++++++++++++++++++++++++++++-------------
> >  1 file changed, 58 insertions(+), 16 deletions(-)
> >
> > diff --git a/net/tipc/node.c b/net/tipc/node.c
> > index 2afc4f8..40527e6 100644
> > --- a/net/tipc/node.c
> > +++ b/net/tipc/node.c
> > @@ -75,6 +75,12 @@ struct tipc_bclink_entry {
> >     struct sk_buff_head namedq;
> >  };
> >
> > +struct tipc_del_node_work {
> > +   struct work_struct work;
> > +   struct net *net;
> > +   u32 addr;
> > +};
> > +
> >  /**
> >   * struct tipc_node - TIPC node structure
> >   * @addr: network address of node
> > @@ -581,25 +587,61 @@ static void  tipc_node_clear_links(struct tipc_node *node)
> >     }
> >  }
> >
> > -/* tipc_node_cleanup - delete nodes that does not
> > - * have active links for NODE_CLEANUP_AFTER time
> > +static void tipc_exec_node_delete(struct work_struct *work)
> > +{
> > +   struct tipc_del_node_work *dwork;
> > +   struct tipc_net *tn;
> > +   struct tipc_node *n;
> > +   struct net *net;
> > +   u32 addr;
> > +
> > +   dwork = container_of(work, struct tipc_del_node_work, work);
> > +   addr = dwork->addr;
> > +   net = dwork->net;
> > +   kfree(dwork);
> > +
> > +   /* In case the module has been removed: */
> > +   tn = tipc_net(net);
> > +   if (!tn)
> > +           return;
> > +   n = tipc_node_find(net, addr);
> > +   if (!n || node_is_up(n))
> > +           return;
> > +
> > +   spin_lock_bh(&tn->node_list_lock);
> > +   tipc_node_write_lock(n);
> > +   tipc_node_clear_links(n);
> > +   tipc_node_delete_from_list(n);
> > +   tipc_node_write_unlock(n);
> > +   spin_unlock_bh(&tn->node_list_lock);
> > +
> > +   /* For work queue item */
> > +   tipc_node_put(n);
> > +}
> > +
> > +/* tipc_node_cleanup - delete node if it didn't have active
> > + * links for NODE_CLEANUP_AFTER milliseconds
> >   */
> >  static int tipc_node_cleanup(struct tipc_node *peer)
> >  {
> > -   struct tipc_net *tn = tipc_net(peer->net);
> > -   bool deleted = false;
> > +   struct tipc_del_node_work *dwork;
> >
> > -   spin_lock_bh(&tn->node_list_lock);
> > -   tipc_node_write_lock(peer);
> > +   if (node_is_up(peer))
> > +           return false;
> > +   if (!time_after(jiffies, peer->delete_at))
> > +           return false;
> >
> > -   if (!node_is_up(peer) && time_after(jiffies, peer->delete_at)) {
> > -           tipc_node_clear_links(peer);
> > -           tipc_node_delete_from_list(peer);
> > -           deleted = true;
> > -   }
> > -   tipc_node_write_unlock(peer);
> > -   spin_unlock_bh(&tn->node_list_lock);
> > -   return deleted;
> > +   /* Delete in work queue context to avoid potential deadlock */
> > +   dwork = kzalloc(sizeof(*dwork), GFP_ATOMIC);
> > +   if (!dwork)
> > +           return false;
> > +
> > +   INIT_WORK(&dwork->work, tipc_exec_node_delete);
> > +   dwork->addr = peer->addr;
> > +   dwork->net = peer->net;
> > +   tipc_node_get(peer);
> > +   schedule_work(&dwork->work);
> > +   return true;
> >  }
> >
> >  /* tipc_node_timeout - handle expiration of node timer
> > @@ -613,8 +655,8 @@ static void tipc_node_timeout(struct timer_list *t)
> >     int bearer_id;
> >     int rc = 0;
> >
> > -   if (!node_is_up(n) && tipc_node_cleanup(n)) {
> > -           /*Removing the reference of Timer*/
> > +   if (tipc_node_cleanup(n)) {
> > +           /* Remove reference to this timer */
> >             tipc_node_put(n);
> >             return;
> >     }
> >

_______________________________________________
tipc-discussion mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/tipc-discussion
