Re: Crash due to mutex genl_lock called from RCU context

2016-11-29 Thread David Miller
From: Herbert Xu
Date: Mon, 28 Nov 2016 19:22:12 +0800

> netlink: Call cb->done from a worker thread
>
> The cb->done interface expects to be called in process context.
> This was broken by the netlink RCU conversion. This patch fixes
> it by adding a worker struct

Re: Crash due to mutex genl_lock called from RCU context

2016-11-28 Thread Cong Wang
On Mon, Nov 28, 2016 at 3:22 AM, Herbert Xu wrote:
> netlink: Call cb->done from a worker thread
>
> The cb->done interface expects to be called in process context.
> This was broken by the netlink RCU conversion. This patch fixes
> it by adding a worker struct to

Re: Crash due to mutex genl_lock called from RCU context

2016-11-28 Thread Herbert Xu
On Sun, Nov 27, 2016 at 10:53:21PM -0800, Cong Wang wrote:
>
> I just took a deeper look, some user calls rhashtable_destroy() in ->done(),
> so even removing that genl lock is not enough, perhaps we should just
> move it to a work struct like what Daniel does for the tcf_proto, but that is
>

Re: Crash due to mutex genl_lock called from RCU context

2016-11-27 Thread Cong Wang
On Sun, Nov 27, 2016 at 8:23 AM, Eric Dumazet wrote:
> On Sat, 2016-11-26 at 22:28 -0800, Cong Wang wrote:
>> On Sat, Nov 26, 2016 at 6:26 PM, Eric Dumazet wrote:
>> >
>> > Are you telling me inet_release() is called when we close() the first
>> >

Re: Crash due to mutex genl_lock called from RCU context

2016-11-27 Thread Eric Dumazet
On Sat, 2016-11-26 at 22:28 -0800, Cong Wang wrote:
> On Sat, Nov 26, 2016 at 6:26 PM, Eric Dumazet wrote:
> >
> > Are you telling me inet_release() is called when we close() the first
> > file descriptor ?
> >
> > fd1 = socket()
> > fd2 = dup(fd1);
> > close(fd2) ->

Re: Crash due to mutex genl_lock called from RCU context

2016-11-26 Thread Cong Wang
On Sat, Nov 26, 2016 at 6:26 PM, Eric Dumazet wrote:
>
> Are you telling me inet_release() is called when we close() the first
> file descriptor ?
>
> fd1 = socket()
> fd2 = dup(fd1);
> close(fd2) -> release() ???

Sorry, I didn't express myself clearly, I meant your

Re: Crash due to mutex genl_lock called from RCU context

2016-11-26 Thread Eric Dumazet
On Sat, 2016-11-26 at 18:08 -0800, Cong Wang wrote:
> On Fri, Nov 25, 2016 at 8:54 PM, Eric Dumazet wrote:
> >
> > Oh well, this won't work, since sk->sk_destruct will be called from RCU
> > callback.
> >
> > Grabbing the mutex should not be done from

Re: Crash due to mutex genl_lock called from RCU context

2016-11-26 Thread Cong Wang
On Fri, Nov 25, 2016 at 8:54 PM, Eric Dumazet wrote:
>
> Oh well, this won't work, since sk->sk_destruct will be called from RCU
> callback.
>
> Grabbing the mutex should not be done from netlink_sock_destruct() but
> from netlink_release()

But you also change the behavior

Re: Crash due to mutex genl_lock called from RCU context

2016-11-25 Thread subashab
Oh well, this won't work, since sk->sk_destruct will be called from RCU callback.

Grabbing the mutex should not be done from netlink_sock_destruct() but from netlink_release()

Maybe this patch would be better :

diff --git a/net/netlink/af_netlink.c b/net/netlink/af_netlink.c
index

Re: Crash due to mutex genl_lock called from RCU context

2016-11-25 Thread Eric Dumazet
On Fri, 2016-11-25 at 20:11 -0800, Eric Dumazet wrote:
> On Fri, 2016-11-25 at 19:15 -0700, subas...@codeaurora.org wrote:
> > We are seeing a crash due to genl_lock mutex being acquired in RCU
> > context.
> > Crash is seen on a 4.4 based kernel ARM64 device. This occurred in a
> > regression

Re: Crash due to mutex genl_lock called from RCU context

2016-11-25 Thread Eric Dumazet
On Fri, 2016-11-25 at 19:15 -0700, subas...@codeaurora.org wrote:
> We are seeing a crash due to genl_lock mutex being acquired in RCU
> context.
> Crash is seen on a 4.4 based kernel ARM64 device. This occurred in a
> regression rack, so unfortunately I don't have steps for a reproducer.
>
> It

Crash due to mutex genl_lock called from RCU context

2016-11-25 Thread subashab
We are seeing a crash due to genl_lock mutex being acquired in RCU context.
Crash is seen on a 4.4 based kernel ARM64 device. This occurred in a regression rack, so unfortunately I don't have steps for a reproducer.

It looks like freeing socket in RCU was brought in through commit