Re: printk() + memory offline deadlock (WAS Re: page_alloc.shuffle=1 + CONFIG_PROVE_LOCKING=y = arm64 hang)

2019-09-23 Thread Sergey Senozhatsky
On (09/23/19 14:58), Petr Mladek wrote:
>
> If I understand it correctly then this is the re-appearing problem.
> The only systematic solution with the current approach is to
> take port->lock in printk_safe/printk_deferred context.

It probably is. We have a number of reverse paths. TTY invokes
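
For readers unfamiliar with the idea of a deferred printing context, the following is a rough userspace sketch in Python, not kernel code and not the approach the thread settled on. It only models the general principle: a message produced while some lock is held is buffered, and the console lock is taken later, after the first lock has been released. The class `DeferredLog`, its methods `emit` and `flush`, and the stand-in locks `port_lock` and `zone_lock` are all hypothetical names chosen for illustration.

```python
import threading


class DeferredLog:
    """Toy model: buffer messages now, write them to the 'console' later."""

    def __init__(self, console_lock):
        self._console_lock = console_lock      # stands in for port->lock
        self._pending = []                     # messages not yet written
        self._pending_lock = threading.Lock()  # private lock for the buffer
        self.written = []                      # what reached the 'console'

    def emit(self, msg):
        # Only the private buffer lock is taken here, never the console lock,
        # so calling emit() adds no ordering edge towards the console lock.
        with self._pending_lock:
            self._pending.append(msg)

    def flush(self):
        # Called from a context where the caller's own locks are released.
        with self._pending_lock:
            batch, self._pending = self._pending, []
        with self._console_lock:
            self.written.extend(batch)
        return len(batch)


port_lock = threading.Lock()   # hypothetical stand-in for port->lock
zone_lock = threading.Lock()   # hypothetical stand-in for zone->lock
log = DeferredLog(console_lock=port_lock)

with zone_lock:
    log.emit("offlining pages")
    # The console lock is not touched while zone_lock is held.
    held_during_emit = port_lock.locked()

flushed = log.flush()          # console lock is taken only here
```

In this model the message is written only by `flush()`, which runs after `zone_lock` is dropped, so the "zone lock, then console lock" ordering never occurs.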

Re: printk() + memory offline deadlock (WAS Re: page_alloc.shuffle=1 + CONFIG_PROVE_LOCKING=y = arm64 hang)

2019-09-23 Thread Qian Cai
On Mon, 2019-09-23 at 14:58 +0200, Petr Mladek wrote:
> On Mon 2019-09-23 19:21:00, Sergey Senozhatsky wrote:
> > So we have
> >
> > port->lock -> MM -> zone->lock
> > // from pty_write()->__tty_buffer_request_room()->kmalloc()
> >
> > vs
> >
> > zone->lock -> printk() -> port->lock
> >

Re: printk() + memory offline deadlock (WAS Re: page_alloc.shuffle=1 + CONFIG_PROVE_LOCKING=y = arm64 hang)

2019-09-23 Thread Qian Cai
On Mon, 2019-09-23 at 19:21 +0900, Sergey Senozhatsky wrote:
> On (09/18/19 12:10), Qian Cai wrote:
> [..]
> > > So you have debug objects enabled. Right? This thing does not behave
> > > when it comes to printing. debug_objects are slightly problematic.
> >
> > Yes, but there is an also a

Re: printk() + memory offline deadlock (WAS Re: page_alloc.shuffle=1 + CONFIG_PROVE_LOCKING=y = arm64 hang)

2019-09-23 Thread Petr Mladek
On Mon 2019-09-23 19:21:00, Sergey Senozhatsky wrote:
> So we have
>
> port->lock -> MM -> zone->lock
> // from pty_write()->__tty_buffer_request_room()->kmalloc()
>
> vs
>
> zone->lock -> printk() -> port->lock
> // from __offline_pages()->__offline_isolated_pages()->printk()

If I
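
The two chains quoted in this message form a lock-order inversion: one path takes port->lock and then reaches zone->lock, the other takes zone->lock and then reaches port->lock. As a rough illustration only, the sketch below models this in userspace Python as a dependency graph in the spirit of what lockdep tracks, and searches it for a cycle. It is not kernel code; the helper names `add_chain` and `find_cycle` are hypothetical, and "MM" and "printk()" are treated as plain graph nodes exactly as they appear in the quoted chains.

```python
from collections import defaultdict


def add_chain(graph, chain):
    """Record 'a is held while b is taken' for each adjacent pair in chain."""
    for held, taken in zip(chain, chain[1:]):
        graph[held].add(taken)


def find_cycle(graph):
    """Return one cycle as a list of nodes (first == last), or None."""
    WHITE, GREY, BLACK = 0, 1, 2       # unvisited, on current path, finished
    color = defaultdict(int)
    path = []

    def visit(node):
        color[node] = GREY
        path.append(node)
        for nxt in sorted(graph.get(node, ())):
            if color[nxt] == GREY:
                # nxt is already on the current path: the ordering loops back.
                return path[path.index(nxt):] + [nxt]
            if color[nxt] == WHITE:
                found = visit(nxt)
                if found:
                    return found
        path.pop()
        color[node] = BLACK
        return None

    for start in sorted(graph):
        if color[start] == WHITE:
            found = visit(start)
            if found:
                return found
    return None


graph = defaultdict(set)
# pty_write() path quoted above
add_chain(graph, ["port->lock", "MM", "zone->lock"])
# __offline_pages() path quoted above
add_chain(graph, ["zone->lock", "printk()", "port->lock"])

cycle = find_cycle(graph)
print(" -> ".join(cycle))
```

With only the first chain recorded, `find_cycle` returns `None`; the cycle appears once the second chain adds the edge back towards port->lock, which is the situation the thread describes.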

Re: printk() + memory offline deadlock (WAS Re: page_alloc.shuffle=1 + CONFIG_PROVE_LOCKING=y = arm64 hang)

2019-09-23 Thread Sergey Senozhatsky
On (09/18/19 12:10), Qian Cai wrote:
[..]
> > So you have debug objects enabled. Right? This thing does not behave
> > when it comes to printing. debug_objects are slightly problematic.
>
> Yes, but there is an also a similar splat without the debug_objects. It looks
> like anything try to

Re: printk() + memory offline deadlock (WAS Re: page_alloc.shuffle=1 + CONFIG_PROVE_LOCKING=y = arm64 hang)

2019-09-18 Thread Qian Cai
On Thu, 2019-09-19 at 00:50 +0900, Sergey Senozhatsky wrote:
> On (09/18/19 10:39), Qian Cai wrote:
> > > Perhaps for a quick fix (and a comment that says this needs to be fixed
> > > properly). I think the changes to printk() that was discussed at
> > > Plumbers may also solve this properly.
> >

Re: printk() + memory offline deadlock (WAS Re: page_alloc.shuffle=1 + CONFIG_PROVE_LOCKING=y = arm64 hang)

2019-09-18 Thread Sergey Senozhatsky
A correction:

On (09/19/19 00:51), Sergey Senozhatsky wrote:
[..]
>
> zone->lock --> console_sem->lock
>
> So then we have
>
> zone->lock --> console_sem->lock --> pi_lock --> rq->lock
>
> vs. the reverse chain
>
> rq->lock --> console_sem->lock

^^^

Re: printk() + memory offline deadlock (WAS Re: page_alloc.shuffle=1 + CONFIG_PROVE_LOCKING=y = arm64 hang)

2019-09-18 Thread Sergey Senozhatsky
On (09/18/19 10:39), Qian Cai wrote:
> > Perhaps for a quick fix (and a comment that says this needs to be fixed
> > properly). I think the changes to printk() that was discussed at
> > Plumbers may also solve this properly.
>
> I assume that the new printk() stuff will also fix this deadlock

printk() + memory offline deadlock (WAS Re: page_alloc.shuffle=1 + CONFIG_PROVE_LOCKING=y = arm64 hang)

2019-09-18 Thread Qian Cai
On Mon, 2019-09-16 at 10:42 -0400, Steven Rostedt wrote:
> On Thu, 12 Sep 2019 08:05:41 -0400
> Qian Cai wrote:
>
> > > drivers/char/random.c | 7 ++++---
> > > 1 file changed, 4 insertions(+), 3 deletions(-)
> > >
> > > diff --git a/drivers/char/random.c b/drivers/char/random.c
> > > index

Re: page_alloc.shuffle=1 + CONFIG_PROVE_LOCKING=y = arm64 hang

2019-09-16 Thread Sergey Senozhatsky
On (09/16/19 10:42), Steven Rostedt wrote:
[..]
> >
> > This will also fix the hang.
> >
> > Sergey, do you plan to submit this Ted?
>
> Perhaps for a quick fix (and a comment that says this needs to be fixed
> properly).

I guess it would make sense, since LTS and -stable kernels won't get new

Re: page_alloc.shuffle=1 + CONFIG_PROVE_LOCKING=y = arm64 hang

2019-09-16 Thread Steven Rostedt
On Thu, 12 Sep 2019 08:05:41 -0400
Qian Cai wrote:

> > drivers/char/random.c | 7 ++++---
> > 1 file changed, 4 insertions(+), 3 deletions(-)
> >
> > diff --git a/drivers/char/random.c b/drivers/char/random.c
> > index 9b54cdb301d3..975015857200 100644
> > --- a/drivers/char/random.c
> > +++

Re: page_alloc.shuffle=1 + CONFIG_PROVE_LOCKING=y = arm64 hang

2019-09-12 Thread Qian Cai
On Wed, 2019-09-11 at 10:10 +0900, Sergey Senozhatsky wrote:
> Cc-ing Ted, Arnd, Greg
>
> On (09/10/19 11:22), Qian Cai wrote:
> > [ 1078.283869][T43784] -> #3 (&(>lock)->rlock){-.-.}:
> > [ 1078.291350][T43784]        __lock_acquire+0x5c8/0xbb0
> > [ 1078.296394][T43784]

CONFIG_SHUFFLE_PAGE_ALLOCATOR=y lockdep splat (WAS Re: page_alloc.shuffle=1 + CONFIG_PROVE_LOCKING=y = arm64 hang)

2019-09-11 Thread Qian Cai
Adjusted Cc a bit as this looks like more of the scheduler territory.

> On Sep 10, 2019, at 3:49 PM, Qian Cai wrote:
>
> Hmm, it feels like that CONFIG_SHUFFLE_PAGE_ALLOCATOR=y introduces some unique
> locking patterns that the lockdep does not like via,
>
> allocate_slab
> shuffle_freelist
>

Re: page_alloc.shuffle=1 + CONFIG_PROVE_LOCKING=y = arm64 hang

2019-09-10 Thread Sergey Senozhatsky
Cc-ing Ted, Arnd, Greg

On (09/10/19 11:22), Qian Cai wrote:
> [ 1078.283869][T43784] -> #3 (&(>lock)->rlock){-.-.}:
> [ 1078.291350][T43784]        __lock_acquire+0x5c8/0xbb0
> [ 1078.296394][T43784]        lock_acquire+0x154/0x428
> [ 1078.301266][T43784]        _raw_spin_lock_irqsave+0x80/0xa0

Re: page_alloc.shuffle=1 + CONFIG_PROVE_LOCKING=y = arm64 hang

2019-09-10 Thread Qian Cai
On Tue, 2019-09-10 at 11:22 -0400, Qian Cai wrote:
> On Thu, 2019-09-05 at 17:08 -0400, Qian Cai wrote:
> > Another data point is if change CONFIG_DEBUG_OBJECTS_TIMERS from =y to =n,
> > it
> > will also fix it.
> >
> > On Thu, 2019-08-22 at 17:33 -0400, Qian Cai wrote:
> > >

Re: page_alloc.shuffle=1 + CONFIG_PROVE_LOCKING=y = arm64 hang

2019-09-10 Thread Qian Cai
On Tue, 2019-09-10 at 11:22 -0400, Qian Cai wrote:
> On Thu, 2019-09-05 at 17:08 -0400, Qian Cai wrote:
> > Another data point is if change CONFIG_DEBUG_OBJECTS_TIMERS from =y to =n,
> > it
> > will also fix it.
> >
> > On Thu, 2019-08-22 at 17:33 -0400, Qian Cai wrote:
> > >

Re: page_alloc.shuffle=1 + CONFIG_PROVE_LOCKING=y = arm64 hang

2019-09-10 Thread Qian Cai
On Thu, 2019-09-05 at 17:08 -0400, Qian Cai wrote:
> Another data point is if change CONFIG_DEBUG_OBJECTS_TIMERS from =y to =n, it
> will also fix it.
>
> On Thu, 2019-08-22 at 17:33 -0400, Qian Cai wrote:
> > https://raw.githubusercontent.com/cailca/linux-mm/master/arm64.config
> >
> > Booting

Re: page_alloc.shuffle=1 + CONFIG_PROVE_LOCKING=y = arm64 hang

2019-09-05 Thread Qian Cai
Another data point is if change CONFIG_DEBUG_OBJECTS_TIMERS from =y to =n, it
will also fix it.

On Thu, 2019-08-22 at 17:33 -0400, Qian Cai wrote:
> https://raw.githubusercontent.com/cailca/linux-mm/master/arm64.config
>
> Booting an arm64 ThunderX2 server with page_alloc.shuffle=1 [1] +
>

Re: page_alloc.shuffle=1 + CONFIG_PROVE_LOCKING=y = arm64 hang

2019-08-23 Thread Qian Cai
On Fri, 2019-08-23 at 12:37 +0100, Will Deacon wrote:
> On Thu, Aug 22, 2019 at 05:33:23PM -0400, Qian Cai wrote:
> > https://raw.githubusercontent.com/cailca/linux-mm/master/arm64.config
> >
> > Booting an arm64 ThunderX2 server with page_alloc.shuffle=1 [1] +
> > CONFIG_PROVE_LOCKING=y results

Re: page_alloc.shuffle=1 + CONFIG_PROVE_LOCKING=y = arm64 hang

2019-08-23 Thread Will Deacon
On Thu, Aug 22, 2019 at 05:33:23PM -0400, Qian Cai wrote:
> https://raw.githubusercontent.com/cailca/linux-mm/master/arm64.config
>
> Booting an arm64 ThunderX2 server with page_alloc.shuffle=1 [1] +
> CONFIG_PROVE_LOCKING=y results in hanging.

Hmm, but the config you link to above has:

#

page_alloc.shuffle=1 + CONFIG_PROVE_LOCKING=y = arm64 hang

2019-08-22 Thread Qian Cai
https://raw.githubusercontent.com/cailca/linux-mm/master/arm64.config

Booting an arm64 ThunderX2 server with page_alloc.shuffle=1 [1] +
CONFIG_PROVE_LOCKING=y results in hanging.

[1] https://lore.kernel.org/linux-mm/154899811208.3165233.17623209031065121886.s