[PATCH 6/7] x86,mm: always use lazy TLB mode

2018-07-10 Thread Rik van Riel
Now that CPUs in lazy TLB mode no longer receive TLB shootdown IPIs, except at page table freeing time, and idle CPUs will no longer get shootdown IPIs for things like mprotect and madvise, we can always use lazy TLB mode. Signed-off-by: Rik van Riel Acked-by: Dave Hansen Tested-by: Song Liu

[PATCH 7/7] x86,switch_mm: skip atomic operations for init_mm

2018-07-10 Thread Rik van Riel
. Signed-off-by: Rik van Riel Acked-by: Dave Hansen Reported-and-tested-by: Song Liu --- arch/x86/mm/tlb.c | 17 - 1 file changed, 12 insertions(+), 5 deletions(-) diff --git a/arch/x86/mm/tlb.c b/arch/x86/mm/tlb.c index 493559cae2d5..ac86c5010472 100644 --- a/arch/x86/mm/tlb.c

[PATCH 4/7] x86,tlb: make lazy TLB mode lazier

2018-07-10 Thread Rik van Riel
oad on two socket systems, and by about 1% for a heavily multi-process netperf between two systems. Signed-off-by: Rik van Riel Acked-by: Dave Hansen Tested-by: Song Liu --- arch/x86/mm/tlb.c | 68 +++ 1 file changed, 59 insertions(+), 9 deleti

[PATCH 2/7] x86,tlb: leave lazy TLB mode at page table free time

2018-07-10 Thread Rik van Riel
workloads, but do not involve page table freeing. Also, on munmap, batching of page table freeing covers much larger ranges of virtual memory than the batching of unmapped user pages. Signed-off-by: Rik van Riel Acked-by: Dave Hansen Tested-by: Song Liu --- arch/x86/include/asm/tlbflush.h | 5

[PATCH 5/7] x86,tlb: only send page table free TLB flush to lazy TLB CPUs

2018-07-10 Thread Rik van Riel
be up to date yet. Signed-off-by: Rik van Riel Acked-by: Dave Hansen Tested-by: Song Liu --- arch/x86/mm/tlb.c | 43 +++ 1 file changed, 39 insertions(+), 4 deletions(-) diff --git a/arch/x86/mm/tlb.c b/arch/x86/mm/tlb.c index 26542cc17043..e4156e37aa71

[PATCH v5 0/7] x86,tlb,mm: make lazy TLB mode even lazier

2018-07-10 Thread Rik van Riel
Song noticed switch_mm_irqs_off taking a lot of CPU time in recent kernels, using 1.9% of a 48 CPU system during a netperf run. Digging into the profile, the atomic operations in cpumask_clear_cpu and cpumask_set_cpu are responsible for about half of that CPU use. However, the CPUs running

Re: [PATCH 1/7] mm: allocate mm_cpumask dynamically based on nr_cpu_ids

2018-07-09 Thread Rik van Riel
On Sun, 2018-07-08 at 16:13 +0200, Mike Galbraith wrote: > On Sat, 2018-07-07 at 17:25 -0400, Rik van Riel wrote: > > > > > ./include/linux/bitmap.h:208:3: warning: ‘memset’ writing 64 > > > bytes > > > into a region of size 0 overflows the des

Re: [PATCH 1/7] mm: allocate mm_cpumask dynamically based on nr_cpu_ids

2018-07-07 Thread Rik van Riel
I. On Sat, 2018-07-07 at 10:23 +0200, Mike Galbraith wrote: > On Fri, 2018-07-06 at 17:56 -0400, Rik van Riel wrote: > > The mm_struct always contains a cpumask bitmap, regardless of > > CONFIG_CPUMASK_OFFSTACK. That means the first step can be to > > simplify things

Re: [PATCH 5/7] x86,tlb: only send page table free TLB flush to lazy TLB CPUs

2018-07-07 Thread Rik van Riel
On Sat, 2018-07-07 at 14:26 +0200, Mike Galbraith wrote: > On Fri, 2018-06-29 at 10:29 -0400, Rik van Riel wrote: > > diff --git a/arch/x86/mm/tlb.c b/arch/x86/mm/tlb.c > > index e59214ec52b1..c4073367219d 100644 > > --- a/arch/x86/mm/tlb.c > > +++ b/arch/x86/mm/t

[PATCH 6/7] x86,mm: always use lazy TLB mode

2018-07-06 Thread Rik van Riel
Now that CPUs in lazy TLB mode no longer receive TLB shootdown IPIs, except at page table freeing time, and idle CPUs will no longer get shootdown IPIs for things like mprotect and madvise, we can always use lazy TLB mode. Signed-off-by: Rik van Riel Acked-by: Dave Hansen Tested-by: Song Liu

[PATCH 7/7] x86,switch_mm: skip atomic operations for init_mm

2018-07-06 Thread Rik van Riel
. Signed-off-by: Rik van Riel Acked-by: Dave Hansen Reported-and-tested-by: Song Liu --- arch/x86/mm/tlb.c | 17 - 1 file changed, 12 insertions(+), 5 deletions(-) diff --git a/arch/x86/mm/tlb.c b/arch/x86/mm/tlb.c index e7c6de7eb903..c8644aa12abd 100644 --- a/arch/x86/mm/tlb.c

[PATCH 3/7] x86,mm: restructure switch_mm_irqs_off

2018-07-06 Thread Rik van Riel
Move some code that will be needed for the lazy -> !lazy state transition when a lazy TLB CPU has gotten out of date. No functional changes, since the if (real_prev == next) branch always returns. Signed-off-by: Rik van Riel Acked-by: Dave Hansen Suggested-by: Andy Lutomirski --- arch/x86

[PATCH 5/7] x86,tlb: only send page table free TLB flush to lazy TLB CPUs

2018-07-06 Thread Rik van Riel
be up to date yet. Signed-off-by: Rik van Riel Acked-by: Dave Hansen Tested-by: Song Liu --- arch/x86/mm/tlb.c | 39 --- 1 file changed, 36 insertions(+), 3 deletions(-) diff --git a/arch/x86/mm/tlb.c b/arch/x86/mm/tlb.c index 26542cc17043..8aa20d217603

[PATCH v4 0/7] x86,tlb,mm: make lazy TLB mode even lazier

2018-07-06 Thread Rik van Riel
Song noticed switch_mm_irqs_off taking a lot of CPU time in recent kernels, using 1.9% of a 48 CPU system during a netperf run. Digging into the profile, the atomic operations in cpumask_clear_cpu and cpumask_set_cpu are responsible for about half of that CPU use. However, the CPUs running

[PATCH 1/7] mm: allocate mm_cpumask dynamically based on nr_cpu_ids

2018-07-06 Thread Rik van Riel
is compiled for, since we only have one init_mm in the system, anyway. Signed-off-by: Rik van Riel Acked-by: Dave Hansen Tested-by: Song Liu --- include/linux/mm_types.h | 237 --- kernel/fork.c| 15 +-- mm/init-mm.c | 11

[PATCH 2/7] x86,tlb: leave lazy TLB mode at page table free time

2018-07-06 Thread Rik van Riel
workloads, but do not involve page table freeing. Also, on munmap, batching of page table freeing covers much larger ranges of virtual memory than the batching of unmapped user pages. Signed-off-by: Rik van Riel Acked-by: Dave Hansen Tested-by: Song Liu --- arch/x86/include/asm/tlbflush.h | 5

[PATCH 4/7] x86,tlb: make lazy TLB mode lazier

2018-07-06 Thread Rik van Riel
oad on two socket systems, and by about 1% for a heavily multi-process netperf between two systems. Signed-off-by: Rik van Riel Acked-by: Dave Hansen Tested-by: Song Liu --- arch/x86/mm/tlb.c | 68 +++ 1 file changed, 59 insertions(+), 9 deleti

[PATCH] Revert "mm: always flush VMA ranges affected by zap_page_range"

2018-07-06 Thread Rik van Riel
There was a bug in Linux that could cause madvise (and mprotect?) system calls to return to userspace without the TLB having been flushed for all the pages involved. This could happen when multiple threads of a process made simultaneous madvise and/or mprotect calls. This was noticed in the

mm,tlb: revert 4647706ebeee?

2018-07-06 Thread Rik van Riel
Hello, It looks like last summer, there were 2 sets of patches in flight to fix the issue of simultaneous mprotect/madvise calls unmapping PTEs, and some pages not being flushed from the TLB before returning to userspace. Minchan posted these patches: 56236a59556c ("mm: refactor TLB gathering

Re: [PATCH 4/7] x86,tlb: make lazy TLB mode lazier

2018-06-29 Thread Rik van Riel
On Fri, 2018-06-29 at 10:05 -0700, Dave Hansen wrote: > On 06/29/2018 07:29 AM, Rik van Riel wrote: > > + /* > > +* If the CPU is not in lazy TLB mode, we are just > > switching > > +* from one thread in a process to ano

Re: [PATCH 2/7] x86,tlb: leave lazy TLB mode at page table free time

2018-06-29 Thread Rik van Riel
On Fri, 2018-06-29 at 09:39 -0700, Dave Hansen wrote: > On 06/29/2018 07:29 AM, Rik van Riel wrote: > > The latter problem can be prevented in two ways. The first is to > > always send a TLB shootdown IPI to CPUs in lazy TLB mode, while > > the second one is to only send the

[PATCH 1/7] mm: allocate mm_cpumask dynamically based on nr_cpu_ids

2018-06-29 Thread Rik van Riel
is compiled for, since we only have one init_mm in the system, anyway. Signed-off-by: Rik van Riel Tested-by: Song Liu --- include/linux/mm_types.h | 237 --- kernel/fork.c| 15 +-- mm/init-mm.c | 11 +++ 3 files changed, 140

[PATCH 5/7] x86,tlb: only send page table free TLB flush to lazy TLB CPUs

2018-06-29 Thread Rik van Riel
be up to date yet. Signed-off-by: Rik van Riel Tested-by: Song Liu --- arch/x86/mm/tlb.c | 39 --- 1 file changed, 36 insertions(+), 3 deletions(-) diff --git a/arch/x86/mm/tlb.c b/arch/x86/mm/tlb.c index e59214ec52b1..c4073367219d 100644 --- a/arch/x86/mm

[PATCH 3/7] x86,mm: restructure switch_mm_irqs_off

2018-06-29 Thread Rik van Riel
Move some code that will be needed for the lazy -> !lazy state transition when a lazy TLB CPU has gotten out of date. No functional changes, since the if (real_prev == next) branch always returns. Signed-off-by: Rik van Riel Suggested-by: Andy Lutomirski --- arch/x86/mm/tlb.c |

[PATCH 6/7] x86,mm: always use lazy TLB mode

2018-06-29 Thread Rik van Riel
Now that CPUs in lazy TLB mode no longer receive TLB shootdown IPIs, except at page table freeing time, and idle CPUs will no longer get shootdown IPIs for things like mprotect and madvise, we can always use lazy TLB mode. Signed-off-by: Rik van Riel Tested-by: Song Liu --- arch/x86/include

[PATCH 4/7] x86,tlb: make lazy TLB mode lazier

2018-06-29 Thread Rik van Riel
oad on two socket systems, and by about 1% for a heavily multi-process netperf between two systems. Signed-off-by: Rik van Riel Tested-by: Song Liu --- arch/x86/include/asm/uv/uv.h | 6 ++-- arch/x86/mm/tlb.c | 82 --- arch/x86/platform/uv/tlb_u

[PATCH 7/7] x86,switch_mm: skip atomic operations for init_mm

2018-06-29 Thread Rik van Riel
. Signed-off-by: Rik van Riel Reported-and-tested-by: Song Liu --- arch/x86/mm/tlb.c | 11 +++ 1 file changed, 7 insertions(+), 4 deletions(-) diff --git a/arch/x86/mm/tlb.c b/arch/x86/mm/tlb.c index 5a01fcb22a7e..b55e6b7df7c9 100644 --- a/arch/x86/mm/tlb.c +++ b/arch/x86/mm/tlb.c

[PATCH 2/7] x86,tlb: leave lazy TLB mode at page table free time

2018-06-29 Thread Rik van Riel
workloads, but do not involve page table freeing. Also, on munmap, batching of page table freeing covers much larger ranges of virtual memory than the batching of unmapped user pages. Signed-off-by: Rik van Riel Tested-by: Song Liu --- arch/x86/include/asm/tlbflush.h | 5 + arch/x86/mm

[PATCH v3 0/7] x86,tlb,mm: make lazy TLB mode even lazier

2018-06-29 Thread Rik van Riel
Song noticed switch_mm_irqs_off taking a lot of CPU time in recent kernels, using 1.9% of a 48 CPU system during a netperf run. Digging into the profile, the atomic operations in cpumask_clear_cpu and cpumask_set_cpu are responsible for about half of that CPU use. However, the CPUs running

Re: [PATCH 3/6] x86,tlb: make lazy TLB mode lazier

2018-06-28 Thread Rik van Riel
On Wed, 2018-06-27 at 11:10 -0700, Andy Lutomirski wrote: > > You left this comment: > > /* > * We don't currently support having a real mm loaded > without > * our cpu set in mm_cpumask(). We have all the > bookkeeping > * in

Re: [PATCH 3/6] x86,tlb: make lazy TLB mode lazier

2018-06-27 Thread Rik van Riel
On Wed, 2018-06-27 at 11:10 -0700, Andy Lutomirski wrote: > On Tue, Jun 26, 2018 at 10:31 AM Rik van Riel > wrote: > In general, the changes to this function are very hard to review > because you're mixing semantic changes and restructuring the > function. > Is there any w

[PATCH 1/6] mm: allocate mm_cpumask dynamically based on nr_cpu_ids

2018-06-26 Thread Rik van Riel
is compiled for, since we only have one init_mm in the system, anyway. Signed-off-by: Rik van Riel Tested-by: Song Liu --- include/linux/mm_types.h | 237 --- kernel/fork.c| 15 +-- mm/init-mm.c | 11 +++ 3 files changed, 140

[PATCH 5/6] x86,mm: always use lazy TLB mode

2018-06-26 Thread Rik van Riel
Now that CPUs in lazy TLB mode no longer receive TLB shootdown IPIs, except at page table freeing time, and idle CPUs will no longer get shootdown IPIs for things like mprotect and madvise, we can always use lazy TLB mode. Signed-off-by: Rik van Riel Tested-by: Song Liu --- arch/x86/include

[PATCH 6/6] x86,switch_mm: skip atomic operations for init_mm

2018-06-26 Thread Rik van Riel
of the savings from switch_mm_irqs_off seem to go towards higher netperf throughput. Signed-off-by: Rik van Riel Reported-and-tested-by: Song Liu --- arch/x86/mm/tlb.c | 11 +++ 1 file changed, 7 insertions(+), 4 deletions(-) diff --git a/arch/x86/mm/tlb.c b/arch/x86/mm/tlb.c index

[PATCH 2/6] x86,tlb: leave lazy TLB mode at page table free time

2018-06-26 Thread Rik van Riel
workloads, but do not involve page table freeing. Also, on munmap, batching of page table freeing covers much larger ranges of virtual memory than the batching of unmapped user pages. Signed-off-by: Rik van Riel Tested-by: Song Liu --- arch/x86/include/asm/tlbflush.h | 5 + arch/x86/mm

[PATCH 3/6] x86,tlb: make lazy TLB mode lazier

2018-06-26 Thread Rik van Riel
016 version of this patch, CPUs with cpu_tlbstate.is_lazy set are not removed from the mm_cpumask(mm), since that would prevent the TLB flush IPIs at page table free time from being sent to all the CPUs that need them. Signed-off-by: Rik van Riel Tested-by: Song Liu --- arch/x86/include/asm/uv/uv.h |

[PATCH 4/6] x86,tlb: only send page table free TLB flush to lazy TLB CPUs

2018-06-26 Thread Rik van Riel
be up to date yet. Signed-off-by: Rik van Riel Tested-by: Song Liu --- arch/x86/mm/tlb.c | 38 +++--- 1 file changed, 35 insertions(+), 3 deletions(-) diff --git a/arch/x86/mm/tlb.c b/arch/x86/mm/tlb.c index 137a2c62c75b..03512772395f 100644 --- a/arch/x86/mm

[PATCH v2 0/7] x86,tlb,mm: make lazy TLB mode even lazier

2018-06-26 Thread Rik van Riel
Song noticed switch_mm_irqs_off taking a lot of CPU time in recent kernels, using 1.9% of a 48 CPU system during a netperf run. Digging into the profile, the atomic operations in cpumask_clear_cpu and cpumask_set_cpu are responsible for about half of that CPU use. However, the CPUs running

Re: [PATCH 7/7] x86,idle: do not leave mm in idle state

2018-06-22 Thread Rik van Riel
On Fri, 2018-06-22 at 15:05 -0700, Andy Lutomirski wrote: > I think the right solution if you want that last little bit of > performance is to get rid of the code in intel_idle and to add it in > the core idle code. We have fancy scheduler code to estimate the > idle > time, and we should use it

Re: [PATCH 7/7] x86,idle: do not leave mm in idle state

2018-06-22 Thread Rik van Riel
On Fri, 2018-06-22 at 09:01 -0700, Andy Lutomirski wrote: > Hmm, fair enough. I think a better heuristic would be if the > estimated idle duration is more than, say, 10ms. I *think* the code > has been cleaned up enough that this is easy now. (Using time > instead > of C6 will make it a lot

Re: [PATCH 1/7] mm: allocate mm_cpumask dynamically based on nr_cpu_ids

2018-06-22 Thread Rik van Riel
On Fri, 2018-06-22 at 08:10 -0700, Dave Hansen wrote: > On 06/20/2018 12:56 PM, Rik van Riel wrote: > > /* > > -* FIXME! The "sizeof(struct mm_struct)" currently > > includes the > > -* whole struct cpumask for the OFFSTACK case. We could > >

Re: [PATCH 4/7] x86,tlb: make lazy TLB mode lazier

2018-06-22 Thread Rik van Riel
On Fri, 2018-06-22 at 10:05 -0700, Dave Hansen wrote: > On 06/20/2018 12:56 PM, Rik van Riel wrote: > > This patch deals with that issue by introducing a third TLB state, > > TLBSTATE_FLUSH, which causes %CR3 to be reloaded at the next > > context > > switch.

Re: [PATCH 3/7] x86,tlb: change tlbstate.is_lazy to tlbstate.state

2018-06-22 Thread Rik van Riel
On Fri, 2018-06-22 at 10:01 -0700, Dave Hansen wrote: > On 06/20/2018 12:56 PM, Rik van Riel wrote: > > +#define TLBSTATE_OK0 > > +#define TLBSTATE_LAZY 1 > > Could we spell out a bit more about what "OK" means in comments? It > obviously means &qu

Re: [PATCH 7/7] x86,idle: do not leave mm in idle state

2018-06-22 Thread Rik van Riel
On Fri, 2018-06-22 at 08:36 -0700, Andy Lutomirski wrote: > On Wed, Jun 20, 2018 at 12:57 PM Rik van Riel > wrote: > > > > Do not call leave_mm when going into a cstate. Now that mprotect > > and > > madvise no longer send IPIs for TLB shootdowns to idle CPUs, t

Re: [PATCH 2/7] x86,tlb: leave lazy TLB mode at page table free time

2018-06-22 Thread Rik van Riel
On Fri, 2018-06-22 at 07:58 -0700, Andy Lutomirski wrote: > On Wed, Jun 20, 2018 at 12:57 PM Rik van Riel > wrote: > > > > +++ b/arch/x86/mm/tlb.c > > @@ -646,6 +646,30 @@ void flush_tlb_mm_range(struct mm_struct *mm, > > unsigned long start, > >

Re: [PATCH 4/7] x86,tlb: make lazy TLB mode lazier

2018-06-22 Thread Rik van Riel
On Fri, 2018-06-22 at 08:04 -0700, Andy Lutomirski wrote: > On Wed, Jun 20, 2018 at 12:57 PM Rik van Riel > wrote: > > > > Lazy TLB mode can result in an idle CPU being woken up by a TLB > > flush, > > when all it really needs to do is reload %CR3 at the next co

Re: [PATCH 1/7] mm: allocate mm_cpumask dynamically based on nr_cpu_ids

2018-06-21 Thread Rik van Riel
rc1] > [if your patch is applied to the wrong git tree, please drop us a > note to help improve the system] > > url:https://github.com/0day-ci/linux/commits/Rik-van-Riel/x86-tlb > -mm-make-lazy-TLB-mode-even-lazier/20180621-045620 > config: x86_64-randconfig-x016-201824 (attache

Re: [PATCH 7/7] x86,idle: do not leave mm in idle state

2018-06-20 Thread Rik van Riel
On Thu, 2018-06-21 at 06:20 +0800, kbuild test robot wrote: > All error/warnings (new ones prefixed by >>): > >arch/x86/xen/mmu_pv.c: In function 'drop_mm_ref_this_cpu': > > > arch/x86/xen/mmu_pv.c:987:3: error: implicit declaration of > > > function 'leave_mm'

Re: [PATCH 2/7] x86,tlb: leave lazy TLB mode at page table free time

2018-06-20 Thread Rik van Riel
On Wed, 2018-06-20 at 15:56 -0400, Rik van Riel wrote: > > +void tlb_flush_remove_tables(struct mm_struct *mm) > +{ > + int cpu = get_cpu(); > + /* > + * XXX: this really only needs to be called for CPUs in lazy > TLB mode. > + */ > + if (cpumask_

[PATCH 4/7] x86,tlb: make lazy TLB mode lazier

2018-06-20 Thread Rik van Riel
off-by: Rik van Riel Tested-by: Song Liu --- arch/x86/include/asm/tlbflush.h | 5 ++ arch/x86/include/asm/uv/uv.h| 6 +-- arch/x86/mm/tlb.c | 108 arch/x86/platform/uv/tlb_uv.c | 2 +- 4 files changed, 107 insertions(+), 14 deleti
