Re: in_irq_or_nmi() and RFC patch

2017-04-05 Thread Mel Gorman
On Mon, Apr 03, 2017 at 01:05:06PM +0100, Mel Gorman wrote:
> > Started performance benchmarking:
> >  163 cycles = current state
> >  183 cycles = with BH disable + in_irq
> >  218 cycles = with BH disable + in_irq + irqs_disabled
> > 
> > Thus, the performance numbers unfortunately look bad once we add the
> > test for irqs_disabled().  The slowdown from replacing preempt_disable
> > with BH-disable is still a win (we saved 29 cycles before, and lose
> > 20; I was expecting the regression to be only 10 cycles).
> > 
> 
> This surprises me because I'm not seeing the same severity of problems
> with irqs_disabled. Your patch is slower than what's currently upstream
> but it's still far better than a revert. The softirq column in the
> middle is your patch versus a full revert, which is the last column.
> 

Any objection to resending the local_bh_enable/disable patch with the
in_interrupt() check based on this data or should I post the revert and
go back to the drawing board?

-- 
Mel Gorman
SUSE Labs


Re: in_irq_or_nmi() and RFC patch

2017-04-03 Thread Mel Gorman
On Thu, Mar 30, 2017 at 05:07:08PM +0200, Jesper Dangaard Brouer wrote:
> On Thu, 30 Mar 2017 14:04:36 +0100
> Mel Gorman  wrote:
> 
> > On Wed, Mar 29, 2017 at 09:44:41PM +0200, Jesper Dangaard Brouer wrote:
> > > > Regardless of using in_irq() (or in combination with in_nmi()) I get the
> > > > following warning below:
> > > > 
> > > > [0.00] Kernel command line: 
> > > > BOOT_IMAGE=/vmlinuz-4.11.0-rc3-net-next-page-alloc-softirq+ 
> > > > root=UUID=2e8451ff-6797-49b5-8d3a-eed5a42d7dc9 ro rhgb quiet 
> > > > LANG=en_DK.UTF
> > > > -8
> > > > [0.00] PID hash table entries: 4096 (order: 3, 32768 bytes)
> > > > [0.00] [ cut here ]
> > > > [0.00] WARNING: CPU: 0 PID: 0 at kernel/softirq.c:161 
> > > > __local_bh_enable_ip+0x70/0x90
> > > > [0.00] Modules linked in:
> > > > [0.00] CPU: 0 PID: 0 Comm: swapper Not tainted 
> > > > 4.11.0-rc3-net-next-page-alloc-softirq+ #235
> > > > [0.00] Hardware name: MSI MS-7984/Z170A GAMING PRO (MS-7984), 
> > > > BIOS 1.60 12/16/2015
> > > > [0.00] Call Trace:
> > > > [0.00]  dump_stack+0x4f/0x73
> > > > [0.00]  __warn+0xcb/0xf0
> > > > [0.00]  warn_slowpath_null+0x1d/0x20
> > > > [0.00]  __local_bh_enable_ip+0x70/0x90
> > > > [0.00]  free_hot_cold_page+0x1a4/0x2f0
> > > > [0.00]  __free_pages+0x1f/0x30
> > > > [0.00]  __free_pages_bootmem+0xab/0xb8
> > > > [0.00]  __free_memory_core+0x79/0x91
> > > > [0.00]  free_all_bootmem+0xaa/0x122
> > > > [0.00]  mem_init+0x71/0xa4
> > > > [0.00]  start_kernel+0x1e5/0x3f1
> > > > [0.00]  x86_64_start_reservations+0x2a/0x2c
> > > > [0.00]  x86_64_start_kernel+0x178/0x18b
> > > > [0.00]  start_cpu+0x14/0x14
> > > > [0.00]  ? start_cpu+0x14/0x14
> > > > [0.00] ---[ end trace a57944bec8fc985c ]---
> > > > [0.00] Memory: 32739472K/33439416K available (7624K kernel 
> > > > code, 1528K rwdata, 3168K rodata, 1860K init, 2260K bss, 699944K 
> > > > reserved, 0K cma-reserved)
> > > > 
> > > > And kernel/softirq.c:161 contains:
> > > > 
> > > >  WARN_ON_ONCE(in_irq() || irqs_disabled());
> > > > 
> > > > Thus, I don't think the change in my RFC patch[1] is safe, i.e.
> > > > changing[2] to support softirq allocations by replacing
> > > > preempt_disable() with local_bh_disable().
> > > > 
> > > > [1] http://lkml.kernel.org/r/20170327143947.4c237...@redhat.com
> > > > 
> > > > [2] commit 374ad05ab64d ("mm, page_alloc: only use per-cpu allocator 
> > > > for irq-safe requests")
> > > >  https://git.kernel.org/torvalds/c/374ad05ab64d  
> > > 
> > > A patch that avoids the above warning is inlined below, but I'm not
> > > sure if this is the best direction.  Or should we rather revert
> > > part of commit 374ad05ab64d to avoid the softirq performance regression?
> > >
> > 
> > At the moment, I'm not seeing a better alternative. If this works, I
> > think it would still be far superior in performance to a
> > revert. 
> 
> Started performance benchmarking:
>  163 cycles = current state
>  183 cycles = with BH disable + in_irq
>  218 cycles = with BH disable + in_irq + irqs_disabled
> 
> Thus, the performance numbers unfortunately look bad once we add the
> test for irqs_disabled().  The slowdown from replacing preempt_disable
> with BH-disable is still a win (we saved 29 cycles before, and lose
> 20; I was expecting the regression to be only 10 cycles).
> 

This surprises me because I'm not seeing the same severity of problems
with irqs_disabled. Your patch is slower than what's currently upstream
but it's still far better than a revert. The softirq column in the
middle is your patch versus a full revert, which is the last column.

                                4.11.0-rc5            4.11.0-rc5            4.11.0-rc5
                                   vanilla          softirq-v2r1           revert-v2r1
Amean    alloc-odr0-1      217.00 (  0.00%)     223.00 ( -2.76%)    280.54 ( -29.28%)
Amean    alloc-odr0-2      162.23 (  0.00%)     174.46 ( -7.54%)    210.54 ( -29.78%)
Amean    alloc-odr0-4      144.15 (  0.00%)     150.38 ( -4.32%)    182.38 ( -26.52%)
Amean    alloc-odr0-8      126.00 (  0.00%)     132.15 ( -4.88%)    282.08 (-123.87%)
Amean    alloc-odr0-16     117.00 (  0.00%)     122.00 ( -4.27%)    253.00 (-116.24%)
Amean    alloc-odr0-32     113.00 (  0.00%)     118.00 ( -4.42%)    145.00 ( -28.32%)
Amean    alloc-odr0-64     110.77 (  0.00%)     114.31 ( -3.19%)    143.00 ( -29.10%)
Amean    alloc-odr0-128    109.00 (  0.00%)     107.69 (  1.20%)    179.54 ( -64.71%)
Amean    alloc-odr0-256    121.00 (  0.00%)     125.00 ( -3.31%)    232.23 ( -91.93%)

Re: in_irq_or_nmi() and RFC patch

2017-03-30 Thread Jesper Dangaard Brouer
On Thu, 30 Mar 2017 14:04:36 +0100
Mel Gorman  wrote:

> On Wed, Mar 29, 2017 at 09:44:41PM +0200, Jesper Dangaard Brouer wrote:
> > > Regardless of using in_irq() (or in combination with in_nmi()) I get the
> > > following warning below:
> > > 
> > > [0.00] Kernel command line: 
> > > BOOT_IMAGE=/vmlinuz-4.11.0-rc3-net-next-page-alloc-softirq+ 
> > > root=UUID=2e8451ff-6797-49b5-8d3a-eed5a42d7dc9 ro rhgb quiet 
> > > LANG=en_DK.UTF
> > > -8
> > > [0.00] PID hash table entries: 4096 (order: 3, 32768 bytes)
> > > [0.00] [ cut here ]
> > > [0.00] WARNING: CPU: 0 PID: 0 at kernel/softirq.c:161 
> > > __local_bh_enable_ip+0x70/0x90
> > > [0.00] Modules linked in:
> > > [0.00] CPU: 0 PID: 0 Comm: swapper Not tainted 
> > > 4.11.0-rc3-net-next-page-alloc-softirq+ #235
> > > [0.00] Hardware name: MSI MS-7984/Z170A GAMING PRO (MS-7984), 
> > > BIOS 1.60 12/16/2015
> > > [0.00] Call Trace:
> > > [0.00]  dump_stack+0x4f/0x73
> > > [0.00]  __warn+0xcb/0xf0
> > > [0.00]  warn_slowpath_null+0x1d/0x20
> > > [0.00]  __local_bh_enable_ip+0x70/0x90
> > > [0.00]  free_hot_cold_page+0x1a4/0x2f0
> > > [0.00]  __free_pages+0x1f/0x30
> > > [0.00]  __free_pages_bootmem+0xab/0xb8
> > > [0.00]  __free_memory_core+0x79/0x91
> > > [0.00]  free_all_bootmem+0xaa/0x122
> > > [0.00]  mem_init+0x71/0xa4
> > > [0.00]  start_kernel+0x1e5/0x3f1
> > > [0.00]  x86_64_start_reservations+0x2a/0x2c
> > > [0.00]  x86_64_start_kernel+0x178/0x18b
> > > [0.00]  start_cpu+0x14/0x14
> > > [0.00]  ? start_cpu+0x14/0x14
> > > [0.00] ---[ end trace a57944bec8fc985c ]---
> > > [0.00] Memory: 32739472K/33439416K available (7624K kernel code, 
> > > 1528K rwdata, 3168K rodata, 1860K init, 2260K bss, 699944K reserved, 0K 
> > > cma-reserved)
> > > 
> > > And kernel/softirq.c:161 contains:
> > > 
> > >  WARN_ON_ONCE(in_irq() || irqs_disabled());
> > > 
> > > Thus, I don't think the change in my RFC patch[1] is safe, i.e.
> > > changing[2] to support softirq allocations by replacing
> > > preempt_disable() with local_bh_disable().
> > > 
> > > [1] http://lkml.kernel.org/r/20170327143947.4c237...@redhat.com
> > > 
> > > [2] commit 374ad05ab64d ("mm, page_alloc: only use per-cpu allocator for 
> > > irq-safe requests")
> > >  https://git.kernel.org/torvalds/c/374ad05ab64d  
> > 
> > A patch that avoids the above warning is inlined below, but I'm not
> > sure if this is the best direction.  Or should we rather revert
> > part of commit 374ad05ab64d to avoid the softirq performance regression?
> >
> 
> At the moment, I'm not seeing a better alternative. If this works, I
> think it would still be far superior in performance to a
> revert. 

Started performance benchmarking:
 163 cycles = current state
 183 cycles = with BH disable + in_irq
 218 cycles = with BH disable + in_irq + irqs_disabled

Thus, the performance numbers unfortunately look bad once we add the
test for irqs_disabled().  The slowdown from replacing preempt_disable
with BH-disable is still a win (we saved 29 cycles before, and lose
20; I was expecting the regression to be only 10 cycles).

Bad things happen when adding the test for irqs_disabled().  This is
likely because it uses "pushfq + pop" to read the CPU flags.  I wonder
if x86 experts know whether e.g. "lahf" would be faster (and whether it
also loads the interrupt flag X86_EFLAGS_IF)?

We basically lost more (163-218=-55) than we gained (29) :-(


> As before, if there are bad consequences to adding a BH
> rescheduling point then we'll have to revert. However, I don't like a
> revert being the first option as it'll keep encouraging drivers to build
> sub-allocators to avoid the page allocator.

I'm also motivated by speeding up the page allocator to avoid this
happening in all the drivers.

> > [PATCH] mm, page_alloc: re-enable softirq use of per-cpu page allocator
> > 
> > From: Jesper Dangaard Brouer 
> >   
> 
> Other than the slightly misleading comments about NMI which could
> explain "this potentially misses an NMI but an NMI allocating pages is
> brain damaged", I don't see a problem. The irqs_disabled() check is
> subtle, but it's not earth shattering and it still helps the 100GiB cases
> with the limited cycle budget to process packets.


-- 
Best regards,
  Jesper Dangaard Brouer
  MSc.CS, Principal Kernel Engineer at Red Hat
  LinkedIn: http://www.linkedin.com/in/brouer


Re: in_irq_or_nmi() and RFC patch

2017-03-30 Thread Mel Gorman
On Wed, Mar 29, 2017 at 09:44:41PM +0200, Jesper Dangaard Brouer wrote:
> > Regardless of using in_irq() (or in combination with in_nmi()) I get the
> > following warning below:
> > 
> > [0.00] Kernel command line: 
> > BOOT_IMAGE=/vmlinuz-4.11.0-rc3-net-next-page-alloc-softirq+ 
> > root=UUID=2e8451ff-6797-49b5-8d3a-eed5a42d7dc9 ro rhgb quiet LANG=en_DK.UTF
> > -8
> > [0.00] PID hash table entries: 4096 (order: 3, 32768 bytes)
> > [0.00] [ cut here ]
> > [0.00] WARNING: CPU: 0 PID: 0 at kernel/softirq.c:161 
> > __local_bh_enable_ip+0x70/0x90
> > [0.00] Modules linked in:
> > [0.00] CPU: 0 PID: 0 Comm: swapper Not tainted 
> > 4.11.0-rc3-net-next-page-alloc-softirq+ #235
> > [0.00] Hardware name: MSI MS-7984/Z170A GAMING PRO (MS-7984), BIOS 
> > 1.60 12/16/2015
> > [0.00] Call Trace:
> > [0.00]  dump_stack+0x4f/0x73
> > [0.00]  __warn+0xcb/0xf0
> > [0.00]  warn_slowpath_null+0x1d/0x20
> > [0.00]  __local_bh_enable_ip+0x70/0x90
> > [0.00]  free_hot_cold_page+0x1a4/0x2f0
> > [0.00]  __free_pages+0x1f/0x30
> > [0.00]  __free_pages_bootmem+0xab/0xb8
> > [0.00]  __free_memory_core+0x79/0x91
> > [0.00]  free_all_bootmem+0xaa/0x122
> > [0.00]  mem_init+0x71/0xa4
> > [0.00]  start_kernel+0x1e5/0x3f1
> > [0.00]  x86_64_start_reservations+0x2a/0x2c
> > [0.00]  x86_64_start_kernel+0x178/0x18b
> > [0.00]  start_cpu+0x14/0x14
> > [0.00]  ? start_cpu+0x14/0x14
> > [0.00] ---[ end trace a57944bec8fc985c ]---
> > [0.00] Memory: 32739472K/33439416K available (7624K kernel code, 
> > 1528K rwdata, 3168K rodata, 1860K init, 2260K bss, 699944K reserved, 0K 
> > cma-reserved)
> > 
> > And kernel/softirq.c:161 contains:
> > 
> >  WARN_ON_ONCE(in_irq() || irqs_disabled());
> > 
> > Thus, I don't think the change in my RFC patch[1] is safe, i.e.
> > changing[2] to support softirq allocations by replacing
> > preempt_disable() with local_bh_disable().
> > 
> > [1] http://lkml.kernel.org/r/20170327143947.4c237...@redhat.com
> > 
> > [2] commit 374ad05ab64d ("mm, page_alloc: only use per-cpu allocator for 
> > irq-safe requests")
> >  https://git.kernel.org/torvalds/c/374ad05ab64d
> 
> A patch that avoids the above warning is inlined below, but I'm not
> sure if this is the best direction.  Or should we rather revert
> part of commit 374ad05ab64d to avoid the softirq performance regression?
>  

At the moment, I'm not seeing a better alternative. If this works, I
think it would still be far superior in performance to a
revert. As before, if there are bad consequences to adding a BH
rescheduling point then we'll have to revert. However, I don't like a
revert being the first option as it'll keep encouraging drivers to build
sub-allocators to avoid the page allocator.

> [PATCH] mm, page_alloc: re-enable softirq use of per-cpu page allocator
> 
> From: Jesper Dangaard Brouer 
> 

Other than the slightly misleading comments about NMI which could
explain "this potentially misses an NMI but an NMI allocating pages is
brain damaged", I don't see a problem. The irqs_disabled() check is
subtle, but it's not earth shattering and it still helps the 100GiB cases
with the limited cycle budget to process packets.

-- 
Mel Gorman
SUSE Labs


Re: in_irq_or_nmi() and RFC patch

2017-03-30 Thread Jesper Dangaard Brouer
On Thu, 30 Mar 2017 09:35:02 +0200
Peter Zijlstra  wrote:

> On Thu, Mar 30, 2017 at 09:12:23AM +0200, Jesper Dangaard Brouer wrote:
> > On Thu, 30 Mar 2017 08:49:58 +0200
> > Peter Zijlstra  wrote:
> >   
> > > On Wed, Mar 29, 2017 at 09:44:41PM +0200, Jesper Dangaard Brouer wrote:  
> > > > @@ -2481,7 +2481,11 @@ void free_hot_cold_page(struct page *page, bool 
> > > > cold)
> > > > unsigned long pfn = page_to_pfn(page);
> > > > int migratetype;
> > > >  
> > > > -   if (in_interrupt()) {
> > > > +   /*
> > > > +* Exclude (hard) IRQ and NMI context from using the pcplists.
> > > > +* But allow softirq context, via disabling BH.
> > > > +*/
> > > > +   if (in_irq() || irqs_disabled()) {
> > > 
> > > Why do you need irqs_disabled() ?   
> > 
> > Because further down I call local_bh_enable(), which calls
> > __local_bh_enable_ip() which triggers a warning during early boot on:
> > 
> >   WARN_ON_ONCE(in_irq() || irqs_disabled());
> > 
> > It looks like it is for supporting CONFIG_TRACE_IRQFLAGS.  
> 
> Ah, no. It's because when you do things like:
> 
>   local_irq_disable();
>   local_bh_enable();
>   local_irq_enable();
> 
> you can lose a pending softirq.
> 
> Bugger.. that irqs_disabled() is something we could do without.

Yes, I really don't like adding this irqs_disabled() check here.

> I'm thinking that when tglx finishes his soft irq disable patches for
> x86 (same thing ppc also does) we can go revert all these patches.
> 
> Thomas, see:
> 
>   https://lkml.kernel.org/r/20170301144845.783f8...@redhat.com

The summary is that Mel and I found a way to optimize the page allocator
by avoiding a local_irq_{save,restore} operation, see commit
374ad05ab64d ("mm, page_alloc: only use per-cpu allocator for irq-safe
requests")  [1] https://git.kernel.org/davem/net-next/c/374ad05ab64d696

But Tariq discovered that this caused a regression for 100Gbit/s NICs,
as the patch excluded softirq from using the per-cpu-page (PCP) lists,
and DMA RX page-refill happens in softirq context.

Now we are trying to re-allow softirq to use the PCP lists.
My proposal is: https://lkml.kernel.org/r/20170329214441.08332...@redhat.com
The alternative is to revert this optimization.

-- 
Best regards,
  Jesper Dangaard Brouer
  MSc.CS, Principal Kernel Engineer at Red Hat
  LinkedIn: http://www.linkedin.com/in/brouer


Re: in_irq_or_nmi() and RFC patch

2017-03-30 Thread Peter Zijlstra
On Thu, Mar 30, 2017 at 09:12:23AM +0200, Jesper Dangaard Brouer wrote:
> On Thu, 30 Mar 2017 08:49:58 +0200
> Peter Zijlstra  wrote:
> 
> > On Wed, Mar 29, 2017 at 09:44:41PM +0200, Jesper Dangaard Brouer wrote:
> > > @@ -2481,7 +2481,11 @@ void free_hot_cold_page(struct page *page, bool 
> > > cold)
> > >   unsigned long pfn = page_to_pfn(page);
> > >   int migratetype;
> > >  
> > > - if (in_interrupt()) {
> > > + /*
> > > +  * Exclude (hard) IRQ and NMI context from using the pcplists.
> > > +  * But allow softirq context, via disabling BH.
> > > +  */
> > > + if (in_irq() || irqs_disabled()) {  
> > 
> > Why do you need irqs_disabled() ? 
> 
> Because further down I call local_bh_enable(), which calls
> __local_bh_enable_ip() which triggers a warning during early boot on:
> 
>   WARN_ON_ONCE(in_irq() || irqs_disabled());
> 
> It looks like it is for supporting CONFIG_TRACE_IRQFLAGS.

Ah, no. It's because when you do things like:

local_irq_disable();
local_bh_enable();
local_irq_enable();

you can lose a pending softirq.

Bugger.. that irqs_disabled() is something we could do without.

I'm thinking that when tglx finishes his soft irq disable patches for
x86 (same thing ppc also does) we can go revert all these patches.

Thomas, see:

  https://lkml.kernel.org/r/20170301144845.783f8...@redhat.com


Re: in_irq_or_nmi() and RFC patch

2017-03-30 Thread Jesper Dangaard Brouer
On Thu, 30 Mar 2017 08:49:58 +0200
Peter Zijlstra  wrote:

> On Wed, Mar 29, 2017 at 09:44:41PM +0200, Jesper Dangaard Brouer wrote:
> > @@ -2481,7 +2481,11 @@ void free_hot_cold_page(struct page *page, bool cold)
> > unsigned long pfn = page_to_pfn(page);
> > int migratetype;
> >  
> > -   if (in_interrupt()) {
> > +   /*
> > +* Exclude (hard) IRQ and NMI context from using the pcplists.
> > +* But allow softirq context, via disabling BH.
> > +*/
> > +   if (in_irq() || irqs_disabled()) {  
> 
> Why do you need irqs_disabled() ? 

Because further down I call local_bh_enable(), which calls
__local_bh_enable_ip() which triggers a warning during early boot on:

  WARN_ON_ONCE(in_irq() || irqs_disabled());

It looks like it is for supporting CONFIG_TRACE_IRQFLAGS.


> Also, your comment is stale, it still refers to NMI context.

True; as you told me, NMI is implicit, since it cannot occur here.

> > __free_pages_ok(page, 0);
> > return;
> > }  

-- 
Best regards,
  Jesper Dangaard Brouer
  MSc.CS, Principal Kernel Engineer at Red Hat
  LinkedIn: http://www.linkedin.com/in/brouer


Re: in_irq_or_nmi() and RFC patch

2017-03-30 Thread Peter Zijlstra
On Wed, Mar 29, 2017 at 09:44:41PM +0200, Jesper Dangaard Brouer wrote:
> @@ -2481,7 +2481,11 @@ void free_hot_cold_page(struct page *page, bool cold)
>   unsigned long pfn = page_to_pfn(page);
>   int migratetype;
>  
> - if (in_interrupt()) {
> + /*
> +  * Exclude (hard) IRQ and NMI context from using the pcplists.
> +  * But allow softirq context, via disabling BH.
> +  */
> + if (in_irq() || irqs_disabled()) {

Why do you need irqs_disabled() ? Also, your comment is stale, it still
refers to NMI context.

>   __free_pages_ok(page, 0);
>   return;
>   }


Re: in_irq_or_nmi() and RFC patch

2017-03-29 Thread Jesper Dangaard Brouer
On Wed, 29 Mar 2017 21:11:44 +0200
Jesper Dangaard Brouer  wrote:

> On Wed, 29 Mar 2017 11:12:26 -0700 Matthew Wilcox  wrote:
> 
> > On Wed, Mar 29, 2017 at 11:19:49AM +0200, Peter Zijlstra wrote:  
> > > On Wed, Mar 29, 2017 at 10:59:28AM +0200, Jesper Dangaard Brouer wrote:   
> > >  
> > > > On Wed, 29 Mar 2017 10:12:19 +0200
> > > > Peter Zijlstra  wrote:
> > > > > No, that's horrible. Also, wth is this about? A memory allocator that
> > > > > needs in_nmi()? That sounds beyond broken.
> > > > 
> > > > It is the other way around. We want to exclude NMI and HARDIRQ from
> > > > using the per-cpu-pages (pcp) lists "order-0 cache" (they will
> > > > fall-through using the normal buddy allocator path).
> > > 
> > > Any in_nmi() code arriving at the allocator is broken. No need to fix
> > > the allocator.
> > 
> > That's demonstrably true.  You can't grab a spinlock in NMI code and
> > the first thing that happens if this in_irq_or_nmi() check fails is ...
> > spin_lock_irqsave(&zone->lock, flags);
> > so this patch should just use in_irq().
> > 
> > (the concept of NMI code needing to allocate memory was blowing my mind
> > a little bit)  
> 
> Regardless of using in_irq() (or in combination with in_nmi()) I get the
> following warning below:
> 
> [0.00] Kernel command line: 
> BOOT_IMAGE=/vmlinuz-4.11.0-rc3-net-next-page-alloc-softirq+ 
> root=UUID=2e8451ff-6797-49b5-8d3a-eed5a42d7dc9 ro rhgb quiet LANG=en_DK.UTF
> -8
> [0.00] PID hash table entries: 4096 (order: 3, 32768 bytes)
> [0.00] [ cut here ]
> [0.00] WARNING: CPU: 0 PID: 0 at kernel/softirq.c:161 
> __local_bh_enable_ip+0x70/0x90
> [0.00] Modules linked in:
> [0.00] CPU: 0 PID: 0 Comm: swapper Not tainted 
> 4.11.0-rc3-net-next-page-alloc-softirq+ #235
> [0.00] Hardware name: MSI MS-7984/Z170A GAMING PRO (MS-7984), BIOS 
> 1.60 12/16/2015
> [0.00] Call Trace:
> [0.00]  dump_stack+0x4f/0x73
> [0.00]  __warn+0xcb/0xf0
> [0.00]  warn_slowpath_null+0x1d/0x20
> [0.00]  __local_bh_enable_ip+0x70/0x90
> [0.00]  free_hot_cold_page+0x1a4/0x2f0
> [0.00]  __free_pages+0x1f/0x30
> [0.00]  __free_pages_bootmem+0xab/0xb8
> [0.00]  __free_memory_core+0x79/0x91
> [0.00]  free_all_bootmem+0xaa/0x122
> [0.00]  mem_init+0x71/0xa4
> [0.00]  start_kernel+0x1e5/0x3f1
> [0.00]  x86_64_start_reservations+0x2a/0x2c
> [0.00]  x86_64_start_kernel+0x178/0x18b
> [0.00]  start_cpu+0x14/0x14
> [0.00]  ? start_cpu+0x14/0x14
> [0.00] ---[ end trace a57944bec8fc985c ]---
> [0.00] Memory: 32739472K/33439416K available (7624K kernel code, 
> 1528K rwdata, 3168K rodata, 1860K init, 2260K bss, 699944K reserved, 0K 
> cma-reserved)
> 
> And kernel/softirq.c:161 contains:
> 
>  WARN_ON_ONCE(in_irq() || irqs_disabled());
> 
> Thus, I don't think the change in my RFC-patch[1] is safe,
> i.e. changing[2] to support softirq allocations by replacing
> preempt_disable() with local_bh_disable().
> 
> [1] http://lkml.kernel.org/r/20170327143947.4c237...@redhat.com
> 
> [2] commit 374ad05ab64d ("mm, page_alloc: only use per-cpu allocator for 
> irq-safe requests")
>  https://git.kernel.org/torvalds/c/374ad05ab64d

A patch that avoids the above warning is inlined below, but I'm not
sure if this is the best direction.  Or should we rather consider reverting
part of commit 374ad05ab64d to avoid the softirq performance regression?
 

[PATCH] mm, page_alloc: re-enable softirq use of per-cpu page allocator

From: Jesper Dangaard Brouer 

IRQ context were excluded from using the Per-Cpu-Pages (PCP) lists
caching of order-0 pages in commit 374ad05ab64d ("mm, page_alloc: only
use per-cpu allocator for irq-safe requests").

This unfortunately also excluded SoftIRQ.  This hurt performance for
the use-case of refilling DMA RX rings in softirq context.

This patch re-allows softirq context, which should be safe, by
disabling BH/softirq while accessing the PCP lists.  PCP-list access
from both hard-IRQ and NMI context must still not be allowed.  Peter
Zijlstra says in_nmi() code never accesses the page allocator, thus it
should be sufficient to only test for !in_irq().

One concern with this change is adding a BH (enable) scheduling point
at both PCP alloc and free.

Fixes: 374ad05ab64d ("mm, page_alloc: only use per-cpu allocator for irq-safe 
requests")
Signed-off-by: Jesper Dangaard Brouer 
---
 mm/page_alloc.c |   26 +++++++++++++++++---------
 1 file changed, 17 insertions(+), 9 deletions(-)

diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 6cbde310abed..d7e986967910 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -2351,9 +2351,9 @@ static void drain_local_pages_wq(struct work_struct *work)
 * cpu which is allright but we also have to make sure to not move to
 * a different one.
 */
-   preempt_disable();
+   local_bh_disable();