Re: [RFC][PATCH] make global bitlock waitqueues per-node

2016-12-22 Thread Hugh Dickins
On Wed, 21 Dec 2016, Linus Torvalds wrote: > On Wed, Dec 21, 2016 at 12:32 AM, Peter Zijlstra wrote: > > > > FWIW, here's mine.. compiles and boots on a NUMA x86_64 machine. > > So I like how your patch is smaller, but your patch is also broken. > > First off, the whole contention bit is *not*

Re: [RFC][PATCH] make global bitlock waitqueues per-node

2016-12-21 Thread Nicholas Piggin
On Wed, 21 Dec 2016 11:50:49 -0800 Linus Torvalds wrote: > On Wed, Dec 21, 2016 at 11:01 AM, Nicholas Piggin wrote: > > Peter's patch is less code and in that regard a bit nicer. I tried > > going that way once, but I just thought it was a bit too sloppy to > > do nicely with wait bit APIs. >

Re: [RFC][PATCH] make global bitlock waitqueues per-node

2016-12-21 Thread Linus Torvalds
On Wed, Dec 21, 2016 at 11:01 AM, Nicholas Piggin wrote: > Peter's patch is less code and in that regard a bit nicer. I tried > going that way once, but I just thought it was a bit too sloppy to > do nicely with wait bit APIs. So I have to admit that when I read through your and PeterZ's patches

Re: [RFC][PATCH] make global bitlock waitqueues per-node

2016-12-21 Thread Nicholas Piggin
On Thu, 22 Dec 2016 04:33:31 +1000 Nicholas Piggin wrote: > On Wed, 21 Dec 2016 10:02:27 -0800 > Linus Torvalds wrote: > > > I do think your approach of just re-using the existing bit waiting > > with just a page-specific waiting function is nicer than Nick's "let's > > just roll new waiting

Re: [RFC][PATCH] make global bitlock waitqueues per-node

2016-12-21 Thread Nicholas Piggin
On Wed, 21 Dec 2016 10:12:36 -0800 Linus Torvalds wrote: > On Wed, Dec 21, 2016 at 4:30 AM, Nicholas Piggin wrote: > > > > I've been doing a bit of testing, and I don't know why you're seeing > > this. > > > > I don't think I've been able to trigger any actual page lock contention > > so

Re: [RFC][PATCH] make global bitlock waitqueues per-node

2016-12-21 Thread Nicholas Piggin
On Wed, 21 Dec 2016 10:02:27 -0800 Linus Torvalds wrote: > On Wed, Dec 21, 2016 at 12:32 AM, Peter Zijlstra wrote: > > > > FWIW, here's mine.. compiles and boots on a NUMA x86_64 machine. > > So I like how your patch is smaller, but your patch is also broken. > > First off, the whole

Re: [RFC][PATCH] make global bitlock waitqueues per-node

2016-12-21 Thread Linus Torvalds
On Wed, Dec 21, 2016 at 4:30 AM, Nicholas Piggin wrote: > > I've been doing a bit of testing, and I don't know why you're seeing > this. > > I don't think I've been able to trigger any actual page lock contention > so nothing gets put on the waitqueue to really bounce cache lines around > that I

Re: [RFC][PATCH] make global bitlock waitqueues per-node

2016-12-21 Thread Linus Torvalds
On Wed, Dec 21, 2016 at 12:32 AM, Peter Zijlstra wrote: > > FWIW, here's mine.. compiles and boots on a NUMA x86_64 machine. So I like how your patch is smaller, but your patch is also broken. First off, the whole contention bit is *not* NUMA-specific. It should help non-NUMA too, by avoiding

Re: [RFC][PATCH] make global bitlock waitqueues per-node

2016-12-21 Thread Nicholas Piggin
On Mon, 19 Dec 2016 14:58:26 -0800 Dave Hansen wrote: > I saw a 4.8->4.9 regression (details below) that I attributed to: > > 9dcb8b685f mm: remove per-zone hashtable of bitlock waitqueues > > That commit took the bitlock waitqueues from being dynamically-allocated > per-zone to being

Re: [RFC][PATCH] make global bitlock waitqueues per-node

2016-12-21 Thread Nicholas Piggin
On Wed, 21 Dec 2016 09:09:31 +0100 Peter Zijlstra wrote: > On Tue, Dec 20, 2016 at 10:02:46AM -0800, Linus Torvalds wrote: > > On Tue, Dec 20, 2016 at 9:31 AM, Linus Torvalds > > wrote: > > > > > > I'll go back and try to see why the page flag contention patch didn't > > > get applied. > >

Re: [RFC][PATCH] make global bitlock waitqueues per-node

2016-12-21 Thread Peter Zijlstra
On Wed, Dec 21, 2016 at 09:09:31AM +0100, Peter Zijlstra wrote: > On Tue, Dec 20, 2016 at 10:02:46AM -0800, Linus Torvalds wrote: > > On Tue, Dec 20, 2016 at 9:31 AM, Linus Torvalds > > wrote: > > > > > > I'll go back and try to see why the page flag contention patch didn't > > > get applied. > >

Re: [RFC][PATCH] make global bitlock waitqueues per-node

2016-12-21 Thread Peter Zijlstra
On Tue, Dec 20, 2016 at 10:02:46AM -0800, Linus Torvalds wrote: > On Tue, Dec 20, 2016 at 9:31 AM, Linus Torvalds > wrote: > > > > I'll go back and try to see why the page flag contention patch didn't > > get applied. > > Ahh, a combination of warring patches by Nick and PeterZ, and worry >

Re: [RFC][PATCH] make global bitlock waitqueues per-node

2016-12-20 Thread Linus Torvalds
On Tue, Dec 20, 2016 at 9:31 AM, Linus Torvalds wrote: > > I'll go back and try to see why the page flag contention patch didn't > get applied. Ahh, a combination of warring patches by Nick and PeterZ, and worry about the page flag bits. Damn. I had mentally marked this whole issue as "solved".

Re: [RFC][PATCH] make global bitlock waitqueues per-node

2016-12-20 Thread Linus Torvalds
On Mon, Dec 19, 2016 at 4:20 PM, Dave Hansen wrote: > On 12/19/2016 03:07 PM, Linus Torvalds wrote: >> +wait_queue_head_t *bit_waitqueue(void *word, int bit) >> +{ >> + const int __maybe_unused nid = page_to_nid(virt_to_page(word)); >> + >> + return

Re: [RFC][PATCH] make global bitlock waitqueues per-node

2016-12-20 Thread Nicholas Piggin
On Tue, 20 Dec 2016 12:58:25 + Mel Gorman wrote: > On Tue, Dec 20, 2016 at 12:31:13PM +1000, Nicholas Piggin wrote: > > On Mon, 19 Dec 2016 16:20:05 -0800 > > Dave Hansen wrote: > > > > > On 12/19/2016 03:07 PM, Linus Torvalds wrote: > > > > +wait_queue_head_t *bit_waitqueue(void

Re: [RFC][PATCH] make global bitlock waitqueues per-node

2016-12-20 Thread Mel Gorman
On Tue, Dec 20, 2016 at 12:31:13PM +1000, Nicholas Piggin wrote: > On Mon, 19 Dec 2016 16:20:05 -0800 > Dave Hansen wrote: > > > On 12/19/2016 03:07 PM, Linus Torvalds wrote: > > > +wait_queue_head_t *bit_waitqueue(void *word, int bit) > > > +{ > > > + const int __maybe_unused

Re: [RFC][PATCH] make global bitlock waitqueues per-node

2016-12-19 Thread Nicholas Piggin
On Mon, 19 Dec 2016 16:20:05 -0800 Dave Hansen wrote: > On 12/19/2016 03:07 PM, Linus Torvalds wrote: > > +wait_queue_head_t *bit_waitqueue(void *word, int bit) > > +{ > > + const int __maybe_unused nid = page_to_nid(virt_to_page(word)); > > + > > + return

Re: [RFC][PATCH] make global bitlock waitqueues per-node

2016-12-19 Thread Nicholas Piggin
On Mon, 19 Dec 2016 14:58:26 -0800 Dave Hansen wrote: > I saw a 4.8->4.9 regression (details below) that I attributed to: > > 9dcb8b685f mm: remove per-zone hashtable of bitlock waitqueues > > That commit took the bitlock waitqueues from being dynamically-allocated > per-zone to being

Re: [RFC][PATCH] make global bitlock waitqueues per-node

2016-12-19 Thread Dave Hansen
On 12/19/2016 03:07 PM, Linus Torvalds wrote: > +wait_queue_head_t *bit_waitqueue(void *word, int bit) > +{ > + const int __maybe_unused nid = page_to_nid(virt_to_page(word)); > + > + return __bit_waitqueue(word, bit, nid); > > No can do. Part of the problem with

[RFC][PATCH] make global bitlock waitqueues per-node

2016-12-19 Thread Dave Hansen
I saw a 4.8->4.9 regression (details below) that I attributed to: 9dcb8b685f mm: remove per-zone hashtable of bitlock waitqueues That commit took the bitlock waitqueues from being dynamically-allocated per-zone to being statically allocated and global. As suggested by Linus, this makes