> On 18 Apr 2024, at 13:20, Mike Rapoport wrote:
>
> On Tue, Apr 16, 2024 at 12:36:08PM +0300, Nadav Amit wrote:
>>
>>
>>
>> I might be missing something, but it seems a bit racy.
>>
>> IIUC, module_finalize() calls alternatives_smp_module_add(
> On 11 Apr 2024, at 19:05, Mike Rapoport wrote:
>
> @@ -2440,7 +2479,24 @@ static int post_relocation(struct module *mod, const
> struct load_info *info)
> add_kallsyms(mod, info);
>
> /* Arch-specific module finalizing. */
> - return module_finalize(info->hdr, info->sechdrs
> On Apr 8, 2021, at 12:18 AM, Joerg Roedel wrote:
>
> Hi Nadav,
>
> On Wed, Apr 07, 2021 at 05:57:31PM +0000, Nadav Amit wrote:
>> I tested it on real bare-metal hardware. I ran some basic I/O workloads
>> with the IOMMU enabled, checkers enabled/disabled, and so
> On Apr 7, 2021, at 3:01 AM, Joerg Roedel wrote:
>
> On Tue, Mar 23, 2021 at 02:06:19PM -0700, Nadav Amit wrote:
>> From: Nadav Amit
>>
>> Currently, IOMMU invalidations and device-IOTLB invalidations using
>> AMD IOMMU fall back to full address-space inva
> On Apr 1, 2021, at 1:38 AM, Mel Gorman wrote:
>
> On Wed, Mar 31, 2021 at 09:36:04AM -0700, Nadav Amit wrote:
>>
>>
>>> On Mar 31, 2021, at 6:16 AM, Mel Gorman wrote:
>>>
>>> On Wed, Mar 31, 2021 at 07:20:09PM +0800, Huang, Ying wrote:
>
> On Mar 31, 2021, at 6:16 AM, Mel Gorman wrote:
>
> On Wed, Mar 31, 2021 at 07:20:09PM +0800, Huang, Ying wrote:
>> Mel Gorman writes:
>>
>>> On Mon, Mar 29, 2021 at 02:26:51PM +0800, Huang Ying wrote:
For NUMA balancing, in the hint page fault handler, the faulting page will
be migrat
> On Mar 26, 2021, at 7:31 PM, Lu Baolu wrote:
>
> Hi Nadav,
>
> On 3/19/21 12:46 AM, Nadav Amit wrote:
>> So here is my guess:
>> Intel probably used an implementation of some other (regular) TLB
>> design as the basis for the IOTLB.
>> Intel SDM say
From: Nadav Amit
Currently, IOMMU invalidations and device-IOTLB invalidations using
AMD IOMMU fall back to full address-space invalidation if more than a
single page needs to be flushed.
Full flushes are especially inefficient when the IOMMU is virtualized by
a hypervisor, since it requires the
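For readers skimming the thread, here is a tiny self-contained sketch of the fallback pattern the patch description refers to; the function names are purely illustrative and are not the AMD IOMMU driver's:

#include <stdio.h>

/* Hypothetical illustration only -- not the driver code.  It shows the
 * behaviour described above: invalidating more than a single page
 * degenerates into a full address-space invalidation. */
static void flush_one_page(unsigned long addr)
{
	printf("invalidate page at 0x%lx\n", addr);
}

static void flush_full_address_space(void)
{
	printf("invalidate entire address space\n");
}

static void flush_range(unsigned long start, unsigned long npages)
{
	if (npages == 1)
		flush_one_page(start);		/* targeted, cheap invalidation */
	else
		flush_full_address_space();	/* costly, especially under virtualization */
}

int main(void)
{
	flush_range(0x1000, 1);	/* single page: targeted invalidation */
	flush_range(0x1000, 8);	/* multiple pages: falls back to a full flush */
	return 0;
}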
e, Cloud Infrastructure Service Product Dept.)
>> ; Nadav Amit
>> Cc: chenjiashang ; David Woodhouse
>> ; io...@lists.linux-foundation.org; LKML
>> ; alex.william...@redhat.com; Gonglei (Arei)
>> ; w...@kernel.org
>> Subject: RE: A problem of Intel IOMMU hardwar
> On Mar 17, 2021, at 9:46 PM, Longpeng (Mike, Cloud Infrastructure Service
> Product Dept.) wrote:
>
[Snip]
>
> NOTE, the magical thing happens... (*Operation-4*) we write the PTE
> of Operation-1 from 0 to 0x3, which means Read/Write is allowed, and then
> we trigger the DMA read again; it succeeds and r
> On Mar 17, 2021, at 2:35 AM, Longpeng (Mike, Cloud Infrastructure Service
> Product Dept.) wrote:
>
> Hi Nadav,
>
>> -Original Message-
>> From: Nadav Amit [mailto:nadav.a...@gmail.com]
>>> reproduce the problem with high probability (~50%).
>
> On Mar 16, 2021, at 8:16 PM, Longpeng (Mike, Cloud Infrastructure Service
> Product Dept.) wrote:
>
> Hi guys,
>
> We found that the Intel IOMMU cache (i.e. the IOTLB) may misbehave in a special
> situation, causing DMA failures or wrong data being returned.
>
> The reproducer (based on Alex's vfio tes
The following commit has been merged into the x86/mm branch of tip:
Commit-ID: 6035152d8eebe16a5bb60398d3e05dc7799067b0
Gitweb:
https://git.kernel.org/tip/6035152d8eebe16a5bb60398d3e05dc7799067b0
Author: Nadav Amit
AuthorDate: Sat, 20 Feb 2021 15:17:06 -08:00
Committer
The following commit has been merged into the x86/mm branch of tip:
Commit-ID: a32a4d8a815c4eb6dc64b8962dc13a9dfae70868
Gitweb:
https://git.kernel.org/tip/a32a4d8a815c4eb6dc64b8962dc13a9dfae70868
Author: Nadav Amit
AuthorDate: Sat, 20 Feb 2021 15:17:04 -08:00
Committer
The following commit has been merged into the x86/mm branch of tip:
Commit-ID: 4c1ba3923e6c8aa736e40f481a278c21b956c072
Gitweb:
https://git.kernel.org/tip/4c1ba3923e6c8aa736e40f481a278c21b956c072
Author: Nadav Amit
AuthorDate: Sat, 20 Feb 2021 15:17:05 -08:00
Committer
The following commit has been merged into the x86/mm branch of tip:
Commit-ID: 2f4305b19fe6a2a261d76c21856c5598f7d878fe
Gitweb:
https://git.kernel.org/tip/2f4305b19fe6a2a261d76c21856c5598f7d878fe
Author: Nadav Amit
AuthorDate: Sat, 20 Feb 2021 15:17:08 -08:00
Committer
The following commit has been merged into the x86/mm branch of tip:
Commit-ID: 4ce94eabac16b1d2c95762b40f49e5654ab288d7
Gitweb:
https://git.kernel.org/tip/4ce94eabac16b1d2c95762b40f49e5654ab288d7
Author: Nadav Amit
AuthorDate: Sat, 20 Feb 2021 15:17:07 -08:00
Committer
The following commit has been merged into the x86/mm branch of tip:
Commit-ID: 1608e4cf31b88c8c448ce13aa1d77969dda6bdb7
Gitweb:
https://git.kernel.org/tip/1608e4cf31b88c8c448ce13aa1d77969dda6bdb7
Author: Nadav Amit
AuthorDate: Sat, 20 Feb 2021 15:17:11 -08:00
Committer
The following commit has been merged into the x86/mm branch of tip:
Commit-ID: 291c4011dd7ac0cd0cebb727a75ee5a50d16dcf7
Gitweb:
https://git.kernel.org/tip/291c4011dd7ac0cd0cebb727a75ee5a50d16dcf7
Author: Nadav Amit
AuthorDate: Sat, 20 Feb 2021 15:17:10 -08:00
Committer
The following commit has been merged into the x86/mm branch of tip:
Commit-ID: a5aa5ce300597224ec76dacc8e63ba3ad7a18bbd
Gitweb:
https://git.kernel.org/tip/a5aa5ce300597224ec76dacc8e63ba3ad7a18bbd
Author: Nadav Amit
AuthorDate: Sat, 20 Feb 2021 15:17:12 -08:00
Committer
The following commit has been merged into the x86/mm branch of tip:
Commit-ID: 09c5272e48614a30598e759c3c7bed126d22037d
Gitweb:
https://git.kernel.org/tip/09c5272e48614a30598e759c3c7bed126d22037d
Author: Nadav Amit
AuthorDate: Sat, 20 Feb 2021 15:17:09 -08:00
Committer
From: Nadav Amit
Userfaultfd self-test fails occasionally, indicating a memory
corruption.
Analyzing this problem indicates that there is a real bug since
mmap_lock is only taken for read in mwriteprotect_range() and defers
flushes, and since there is insufficient consideration of concurrent
> On Mar 3, 2021, at 11:03 AM, Peter Xu wrote:
>
> On Wed, Mar 03, 2021 at 01:57:02AM -0800, Nadav Amit wrote:
>> From: Nadav Amit
>>
>> Userfaultfd self-test fails occasionally, indicating a memory
>> corruption.
>
> It's failing very constantly
From: Nadav Amit
Userfaultfd self-test fails occasionally, indicating a memory
corruption.
Analyzing this problem indicates that there is a real bug since
mmap_lock is only taken for read in mwriteprotect_range() and defers
flushes, and since there is insufficient consideration of concurrent
From: Nadav Amit
Userfaultfd self-test fails occasionally, indicating a memory
corruption.
Analyzing this problem indicates that there is a real bug since
mmap_lock is only taken for read in mwriteprotect_range() and defers
flushes, and since there is insufficient consideration of concurrent
> On Mar 3, 2021, at 1:51 AM, Nadav Amit wrote:
>
> From: Nadav Amit
>
> Userfaultfd self-test fails occasionally, indicating a memory
> corruption.
Please ignore - I will resend.
> On Mar 2, 2021, at 2:13 PM, Peter Xu wrote:
>
> On Fri, Dec 25, 2020 at 01:25:27AM -0800, Nadav Amit wrote:
>> From: Nadav Amit
>>
>> This patch-set went from v1 to RFCv2, as there is still an ongoing
>> discussion regarding the way of solving the recent
The following commit has been merged into the x86/mm branch of tip:
Commit-ID: b54d50640ca698383fc5b711487f303c17f4b47f
Gitweb:
https://git.kernel.org/tip/b54d50640ca698383fc5b711487f303c17f4b47f
Author: Nadav Amit
AuthorDate: Sat, 20 Feb 2021 15:17:04 -08:00
Committer
The following commit has been merged into the x86/mm branch of tip:
Commit-ID: bc51e8e6f9c387d8dda1d8dea2b8856d0ade4101
Gitweb:
https://git.kernel.org/tip/bc51e8e6f9c387d8dda1d8dea2b8856d0ade4101
Author: Nadav Amit
AuthorDate: Sat, 20 Feb 2021 15:17:06 -08:00
Committer
The following commit has been merged into the x86/mm branch of tip:
Commit-ID: f4f14f7c20440a442b4eaeb7b6f25cd0fc437e36
Gitweb:
https://git.kernel.org/tip/f4f14f7c20440a442b4eaeb7b6f25cd0fc437e36
Author: Nadav Amit
AuthorDate: Sat, 20 Feb 2021 15:17:05 -08:00
Committer
The following commit has been merged into the x86/mm branch of tip:
Commit-ID: efa72447b0b95cd5e8b2bd7cf55ae23c716f8702
Gitweb:
https://git.kernel.org/tip/efa72447b0b95cd5e8b2bd7cf55ae23c716f8702
Author: Nadav Amit
AuthorDate: Sat, 20 Feb 2021 15:17:07 -08:00
Committer
The following commit has been merged into the x86/mm branch of tip:
Commit-ID: fe978069739b59804c911fc9e9645ce768ec5b9e
Gitweb:
https://git.kernel.org/tip/fe978069739b59804c911fc9e9645ce768ec5b9e
Author: Nadav Amit
AuthorDate: Sat, 20 Feb 2021 15:17:08 -08:00
Committer
The following commit has been merged into the x86/mm branch of tip:
Commit-ID: 28344ab0a282a5ab5e4d56bfbcb2b363f4c15447
Gitweb:
https://git.kernel.org/tip/28344ab0a282a5ab5e4d56bfbcb2b363f4c15447
Author: Nadav Amit
AuthorDate: Sat, 20 Feb 2021 15:17:12 -08:00
Committer
The following commit has been merged into the x86/mm branch of tip:
Commit-ID: db73f8099a502be8ed46f6332c91754c74ac76c2
Gitweb:
https://git.kernel.org/tip/db73f8099a502be8ed46f6332c91754c74ac76c2
Author: Nadav Amit
AuthorDate: Sat, 20 Feb 2021 15:17:09 -08:00
Committer
The following commit has been merged into the x86/mm branch of tip:
Commit-ID: 1028a5918cbaae6b9d7f0a04b6a200b9e67aec14
Gitweb:
https://git.kernel.org/tip/1028a5918cbaae6b9d7f0a04b6a200b9e67aec14
Author: Nadav Amit
AuthorDate: Sat, 20 Feb 2021 15:17:10 -08:00
Committer
The following commit has been merged into the x86/mm branch of tip:
Commit-ID: 327db7a160b33865e086f7fff73e08f6d8d47005
Gitweb:
https://git.kernel.org/tip/327db7a160b33865e086f7fff73e08f6d8d47005
Author: Nadav Amit
AuthorDate: Sat, 20 Feb 2021 15:17:11 -08:00
Committer
> On Mar 1, 2021, at 9:10 AM, Peter Zijlstra wrote:
>
> On Sat, Feb 20, 2021 at 03:17:04PM -0800, Nadav Amit wrote:
>> +/*
>> + * Choose the most efficient way to send an IPI. Note that the
>> + * number of CPUs might be zero due t
> On Feb 26, 2021, at 9:47 AM, Sean Christopherson wrote:
>
> On Fri, Feb 26, 2021, Nadav Amit wrote:
>>
>>> On Feb 25, 2021, at 1:16 PM, Sean Christopherson wrote:
>>> It's been literally years since I wrote this code, but I distinctly
>>> r
> On Feb 25, 2021, at 1:16 PM, Sean Christopherson wrote:
>
> On Wed, Feb 24, 2021, Nadav Amit wrote:
>> From: Nadav Amit
>>
>> Apparently, the assembly considers __ex_table as the location when the
>> pushsection directive was issued. Therefore when there
> On Feb 25, 2021, at 9:32 AM, Matthew Wilcox wrote:
>
> On Thu, Feb 25, 2021 at 04:56:50PM +0000, Nadav Amit wrote:
>>
>>> On Feb 25, 2021, at 4:16 AM, Matthew Wilcox wrote:
>>>
>>> On Wed, Feb 24, 2021 at 11:29:04PM -0800, Nadav Amit wrote:
> On Feb 25, 2021, at 4:16 AM, Matthew Wilcox wrote:
>
> On Wed, Feb 24, 2021 at 11:29:04PM -0800, Nadav Amit wrote:
>> Just as applications can use prefetch instructions to overlap
>> computations and memory accesses, applications may want to overlap the
>> page-fa
> On Feb 25, 2021, at 12:52 AM, Nadav Amit wrote:
>
>
>
>> On Feb 25, 2021, at 12:40 AM, Peter Zijlstra wrote:
>>
>> On Wed, Feb 24, 2021 at 11:29:04PM -0800, Nadav Amit wrote:
>>> From: Nadav Amit
>>>
>>> Just as applications can
> On Feb 25, 2021, at 12:40 AM, Peter Zijlstra wrote:
>
> On Wed, Feb 24, 2021 at 11:29:04PM -0800, Nadav Amit wrote:
>> From: Nadav Amit
>>
>> Just as applications can use prefetch instructions to overlap
>> computations and memory accesses, application
From: Nadav Amit
When FAULT_FLAG_RETRY_NOWAIT is set, the caller arguably wants only a
lightweight reclaim to avoid a long reclamation, which would not respect
the "NOWAIT" semantic. Regard the request in swap and file-backed
page-faults accordingly during the first try.
Cc: Andy Luto
From: Nadav Amit
Test prefetch_page() in cases of invalid pointer, file-mmap and
anonymous memory. Partial checks are also done with mincore syscall to
ensure the output of prefetch_page() is consistent with mincore (taking
into account the different semantics of the two).
The tests are not
From: Nadav Amit
Certain use-cases (e.g., prefetch_page()) may want to avoid polling
while a page is brought from the swap. Yet, swap_cluster_readahead()
and swap_vma_readahead() do not respect FAULT_FLAG_RETRY_NOWAIT.
Add support to respect FAULT_FLAG_RETRY_NOWAIT by not polling in these
cases
From: Nadav Amit
Introduce a new vDSO function, page_prefetch(), which is to be used when
certain memory, which might be paged out, is expected to be used soon.
The function prefetches the page if needed. The function returns zero if
the page is accessible after the call and -1 otherwise
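A hedged usage sketch, assuming the proposed call behaves as described above; the function-pointer type below is an assumption, and the actual vDSO symbol name and binding are not shown in this excerpt:

#include <stdio.h>

/* Assumed signature of the proposed vDSO helper: returns 0 if the page
 * backing 'addr' is accessible after the call, and -1 otherwise. */
typedef int (*page_prefetch_fn)(const void *addr);

static void do_other_work(void)
{
	/* Overlap computation or other I/O here while the page is brought in. */
}

static char read_first_byte(page_prefetch_fn page_prefetch, const char *buf)
{
	if (page_prefetch(buf) != 0) {
		/* Page not yet accessible: do useful work instead of blocking. */
		do_other_work();
	}
	return buf[0];	/* may still fault, but the fault is likely cheaper now */
}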
From: Nadav Amit
Apparently, the assembler treats __ex_table as referring to the location at
which the pushsection directive was issued. Therefore, when there is more
than a single entry in the vDSO exception table, the calculations of the
base and fixup are wrong.
Fix the calculations of the expected fault IP
From: Nadav Amit
Add a "mask" field to vDSO exception tables that says which exceptions
should be handled.
Add a "flags" field to vDSO as well to provide additional information
about the exception.
The existing preprocessor macro _ASM_VDSO_EXTABLE_HANDLE for assembly is
n
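A rough sketch of what an entry with the added fields could look like; the field names, types and ordering are assumptions for illustration, since the excerpt does not show the actual layout:

/* Hypothetical vDSO exception-table entry after the change described above.
 * 'insn' and 'fixup' are the usual relative offsets; 'mask' selects which
 * exceptions should be handled and 'flags' carries extra information. */
struct vdso_extable_entry {
	int insn;		/* offset of the potentially faulting instruction */
	int fixup;		/* offset of the fixup code */
	unsigned int mask;	/* which exceptions should be handled */
	unsigned int flags;	/* additional information about the exception */
};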
From: Nadav Amit
Just as applications can use prefetch instructions to overlap
computations and memory accesses, applications may want to overlap the
page-faults and compute or overlap the I/O accesses that are required
for page-faults of different pages.
Applications can use multiple threads
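For context, a minimal sketch of the thread-based workaround alluded to above (plain POSIX threads; nothing here is kernel code): a helper thread touches pages so that their faults overlap with the main thread's computation.

#include <pthread.h>
#include <stddef.h>

struct prefetch_arg {
	const volatile char *base;	/* start of the region to fault in */
	size_t len;			/* length of the region in bytes */
	size_t page_size;		/* system page size */
};

/* Helper thread: touch each page once so the page faults (and any I/O they
 * trigger) are taken concurrently with the main thread's work. */
static void *prefetch_worker(void *p)
{
	struct prefetch_arg *a = p;
	size_t off;

	for (off = 0; off < a->len; off += a->page_size)
		(void)a->base[off];
	return NULL;
}

The main thread would start this worker with pthread_create() before its compute loop over the buffer and join it (or simply let it run) afterwards.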
From: Nadav Amit
Simplify the code and avoid having an additional function on the stack
by inlining on_each_cpu_cond() and on_each_cpu().
Cc: Andy Lutomirski
Cc: Thomas Gleixner
Suggested-by: Peter Zijlstra
Signed-off-by: Nadav Amit
---
include/linux/smp.h | 50
From: Nadav Amit
The compiler is smart enough without these hints.
Cc: Andy Lutomirski
Cc: Peter Zijlstra
Suggested-by: Dave Hansen
Reviewed-by: Dave Hansen
Signed-off-by: Nadav Amit
---
arch/x86/mm/tlb.c | 6 +++---
1 file changed, 3 insertions(+), 3 deletions(-)
diff --git a/arch/x86
From: Nadav Amit
cpumask_next_and() and cpumask_any_but() are pure, and marking them as
such seems to generate different and presumably better code for
native_flush_tlb_multi().
Reviewed-by: Dave Hansen
Signed-off-by: Nadav Amit
---
include/linux/cpumask.h | 6 +++---
1 file changed, 3
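For context, marking a function pure is a one-line annotation on its declaration; a minimal sketch with a made-up function name (the kernel wraps the attribute as __pure):

/* 'pure' tells the compiler the result depends only on the arguments and on
 * reads of global memory, so repeated identical calls may be merged or
 * hoisted out of loops. */
__attribute__((pure)) int next_set_bit(const unsigned long *mask,
				       int size, int start);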
From: Nadav Amit
Blindly writing to is_lazy when the written value is identical to the
old value makes the cacheline dirty for no reason.
Avoid such writes to prevent needless cache coherency traffic.
Cc: Andy Lutomirski
Cc: Peter Zijlstra
Suggested-by: Dave Hansen
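The pattern amounts to a read-before-write guard; an illustrative sketch, not the exact arch/x86/mm/tlb.c code:

/* Only write when the value actually changes: an unchanged value then never
 * dirties the cacheline and never generates coherency traffic towards the
 * remote CPUs that read it. */
static inline void set_is_lazy(unsigned char *is_lazy, unsigned char val)
{
	if (*is_lazy != val)
		*is_lazy = val;
}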
From: Nadav Amit
cpu_tlbstate is mostly private and only the variable is_lazy is shared.
This causes some false-sharing when TLB flushes are performed.
Break cpu_tlbstate into cpu_tlbstate and cpu_tlbstate_shared, and mark
each one accordingly.
Cc: Andy Lutomirski
Cc: Peter Zijlstra
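A simplified sketch of the idea; the structures and field names are abbreviated, and the 64-byte alignment below stands in for the kernel's cacheline-aligned per-CPU placement:

/* Private per-CPU TLB state: only ever touched by the owning CPU. */
struct tlb_state {
	unsigned int loaded_mm_asid;
	/* ... other CPU-local fields ... */
};

/* Shared part: read by remote CPUs during TLB shootdowns, so it is placed on
 * its own cacheline to avoid false sharing with the private fields above. */
struct tlb_state_shared {
	unsigned char is_lazy;
} __attribute__((aligned(64)));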
From: Nadav Amit
To improve TLB shootdown performance, flush the remote and local TLBs
concurrently. Introduce flush_tlb_multi() that does so. Introduce
paravirtual versions of flush_tlb_multi() for KVM, Xen and hyper-v (Xen
and hyper-v are only compile-tested).
While the updated smp
From: Nadav Amit
Open-code on_each_cpu_cond_mask() in native_flush_tlb_others() to
optimize the code. Open-coding eliminates the need for the indirect branch
that is used to call is_lazy(), and on CPUs that are vulnerable to
Spectre v2, it eliminates the retpoline. In addition, it allows to use
From: Nadav Amit
The unification of these two functions allows them to be used in the
updated SMP infrastructure.
To do so, remove the reason argument from flush_tlb_func_local(), add
a member to struct tlb_flush_info that says which CPU initiated the
flush and act accordingly. Optimize the size of
From: Nadav Amit
Currently, on_each_cpu() and similar functions do not exploit the
potential of concurrency: the function is first executed remotely and
only then it is executed locally. Functions such as TLB flush can take
considerable time, so this provides an opportunity for performance
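A userspace analogy of the ordering change, with pthreads standing in for IPIs: kick off the remote executions first, run the local one while they are in flight, and only then wait.

#include <pthread.h>
#include <stdio.h>

#define NR_REMOTE 3

static void flush_func(long cpu)
{
	printf("executing on cpu %ld\n", cpu);
}

static void *remote_fn(void *arg)
{
	flush_func((long)arg);
	return NULL;
}

int main(void)
{
	pthread_t remote[NR_REMOTE];
	long i;

	for (i = 0; i < NR_REMOTE; i++)		/* "send IPIs": asynchronous */
		pthread_create(&remote[i], NULL, remote_fn, (void *)(i + 1));

	flush_func(0);				/* local execution overlaps */

	for (i = 0; i < NR_REMOTE; i++)		/* wait for remote completion */
		pthread_join(remote[i], NULL);
	return 0;
}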
From: Nadav Amit
The series improves TLB shootdown by flushing the local TLB concurrently
with remote TLBs, overlapping the IPI delivery time with the local
flush. Performance numbers can be found in the previous version [1].
v5 was rebased on 5.11 (a long time after v4) and had some bugs and
> On Feb 18, 2021, at 12:09 AM, Christoph Hellwig wrote:
>
> On Tue, Feb 09, 2021 at 02:16:46PM -0800, Nadav Amit wrote:
>> +/*
>> + * Flags to be used as scf_flags argument of smp_call_function_many_cond().
>> + */
>> +#define SCF_WAIT		(1U << 0)
> On Feb 18, 2021, at 12:16 AM, Christoph Hellwig wrote:
>
> On Tue, Feb 09, 2021 at 02:16:48PM -0800, Nadav Amit wrote:
>> +/*
>> + * Although we could have used on_each_cpu_cond_mask(),
>> + * open-coding it has performance a
> On Feb 16, 2021, at 10:59 AM, Peter Zijlstra wrote:
>
> On Tue, Feb 16, 2021 at 06:53:09PM +0000, Nadav Amit wrote:
>>> On Feb 16, 2021, at 8:32 AM, Peter Zijlstra wrote:
>
>>> I'm not sure I can explain it yet. It did get me looking at
>>> o
Hello Mathieu,
While trying to track down an unrelated bug, something in
sync_runqueues_membarrier_state() caught my eye:
static int sync_runqueues_membarrier_state(struct mm_struct *mm)
{
if (atomic_read(&mm->mm_users) == 1 || num_online_cpus() == 1) {
this_cpu_write(runq
> storage and make things a bit simpler.
>
> Cc: Nadav Amit
> Cc: "VMware, Inc."
> Cc: Arnd Bergmann
> Cc: linux-kernel@vger.kernel.org
> Signed-off-by: Greg Kroah-Hartman
> ---
Thanks for the cleanup.
Acked-by: Nadav Amit
> On Feb 16, 2021, at 4:10 AM, Peter Zijlstra wrote:
>
> On Tue, Feb 09, 2021 at 02:16:49PM -0800, Nadav Amit wrote:
>> @@ -816,8 +821,8 @@ STATIC_NOPV void native_flush_tlb_others(const struct
>> cpumask *cpumask,
>> * doing a speculative memory access.
&
> On Feb 16, 2021, at 10:59 AM, Peter Zijlstra wrote:
>
> On Tue, Feb 16, 2021 at 06:53:09PM +0000, Nadav Amit wrote:
>>> On Feb 16, 2021, at 8:32 AM, Peter Zijlstra wrote:
>
>>> I'm not sure I can explain it yet. It did get me looking at
>>> o
> On Feb 16, 2021, at 8:32 AM, Peter Zijlstra wrote:
>
> On Tue, Feb 09, 2021 at 02:16:46PM -0800, Nadav Amit wrote:
>> From: Nadav Amit
>>
>> Currently, on_each_cpu() and similar functions do not exploit the
>> potential of concurrency: the function is first e
> On Feb 16, 2021, at 4:04 AM, Peter Zijlstra wrote:
>
> On Tue, Feb 09, 2021 at 02:16:46PM -0800, Nadav Amit wrote:
>> @@ -894,17 +911,12 @@ EXPORT_SYMBOL(on_each_cpu_mask);
>> void on_each_cpu_cond_mask(smp_cond_func_t cond_func, smp_call_func_t func,
>>
From: Nadav Amit
Open-code on_each_cpu_cond_mask() in native_flush_tlb_others() to
optimize the code. Open-coding eliminates the need for the indirect branch
that is used to call is_lazy(), and on CPUs that are vulnerable to
Spectre v2, it eliminates the retpoline. In addition, it allows to use
From: Nadav Amit
Blindly writing to is_lazy when the written value is identical to the
old value makes the cacheline dirty for no reason.
Avoid such writes to prevent needless cache coherency traffic.
Cc: Andy Lutomirski
Cc: Peter Zijlstra
Suggested-by: Dave Hansen
From: Nadav Amit
cpumask_next_and() and cpumask_any_but() are pure, and marking them as
such seems to generate different and presumably better code for
native_flush_tlb_multi().
Reviewed-by: Dave Hansen
Signed-off-by: Nadav Amit
---
include/linux/cpumask.h | 6 +++---
1 file changed, 3
From: Nadav Amit
The compiler is smart enough without these hints.
Cc: Andy Lutomirski
Cc: Peter Zijlstra
Suggested-by: Dave Hansen
Reviewed-by: Dave Hansen
Signed-off-by: Nadav Amit
---
arch/x86/mm/tlb.c | 6 +++---
1 file changed, 3 insertions(+), 3 deletions(-)
diff --git a/arch/x86
From: Nadav Amit
cpu_tlbstate is mostly private and only the variable is_lazy is shared.
This causes some false-sharing when TLB flushes are performed.
Break cpu_tlbstate into cpu_tlbstate and cpu_tlbstate_shared, and mark
each one accordingly.
Cc: Andy Lutomirski
Cc: Peter Zijlstra
From: Nadav Amit
To improve TLB shootdown performance, flush the remote and local TLBs
concurrently. Introduce flush_tlb_multi() that does so. Introduce
paravirtual versions of flush_tlb_multi() for KVM, Xen and hyper-v (Xen
and hyper-v are only compile-tested).
While the updated smp
From: Nadav Amit
This is a respin of a rebased version of an old series, which I did not
follow up on, as I was preoccupied with personal issues (sorry).
The series improves TLB shootdown by flushing the local TLB concurrently
with remote TLBs, overlapping the IPI delivery time with the local
flush
From: Nadav Amit
Currently, on_each_cpu() and similar functions do not exploit the
potential of concurrency: the function is first executed remotely and
only then it is executed locally. Functions such as TLB flush can take
considerable time, so this provides an opportunity for performance
From: Nadav Amit
The unification of these two functions allows them to be used in the
updated SMP infrastructure.
To do so, remove the reason argument from flush_tlb_func_local(), add
a member to struct tlb_flush_info that says which CPU initiated the
flush and act accordingly. Optimize the size of
> On Feb 3, 2021, at 1:44 AM, Will Deacon wrote:
>
> On Tue, Feb 02, 2021 at 01:35:38PM -0800, Nadav Amit wrote:
>>> On Feb 2, 2021, at 3:00 AM, Peter Zijlstra wrote:
>>>
>>> On Tue, Feb 02, 2021 at 01:32:36AM -0800, Nadav Amit wrote:
>>>>>
> On Feb 2, 2021, at 3:00 AM, Peter Zijlstra wrote:
>
> On Tue, Feb 02, 2021 at 01:32:36AM -0800, Nadav Amit wrote:
>>> On Feb 1, 2021, at 3:36 AM, Peter Zijlstra wrote:
>>>
>>>
>>> https://lkml.kernel.org/r/20210127235347.1402-1-w...@kernel.org
&
> On Feb 1, 2021, at 4:14 PM, Andy Lutomirski wrote:
>
>
>> On Feb 1, 2021, at 2:04 PM, Nadav Amit wrote:
>>
>> Andy’s comments managed to make me realize this code is wrong. We must
>> call inc_mm_tlb_gen(mm) every time.
>>
>> Otherwise, a CPU t
> On Feb 2, 2021, at 1:31 AM, Peter Zijlstra wrote:
>
> On Tue, Feb 02, 2021 at 07:20:55AM +0000, Nadav Amit wrote:
>> Arm does not define tlb_end_vma, and consequently it flushes the TLB after
>> each VMA. I suspect it is not intentional.
>
> ARM is one of those that
> On Feb 1, 2021, at 3:36 AM, Peter Zijlstra wrote:
>
>
> https://lkml.kernel.org/r/20210127235347.1402-1-w...@kernel.org
I have seen this series, and applied my patches on it.
Despite Will’s patches, there were still inconsistencies between fullmm
and need_flush_all.
Am I missing something?
> On Feb 1, 2021, at 10:41 PM, Nicholas Piggin wrote:
>
> Excerpts from Peter Zijlstra's message of February 1, 2021 10:09 pm:
>> I also don't think AGRESSIVE_FLUSH_BATCHING quite captures what it does.
>> How about:
>>
>> CONFIG_MMU_GATHER_NO_PER_VMA_FLUSH
>
> Yes please, have to have des
> On Feb 1, 2021, at 5:19 AM, Peter Zijlstra wrote:
>
> On Sat, Jan 30, 2021 at 04:11:25PM -0800, Nadav Amit wrote:
>> +#define tlb_start_ptes(tlb) \
>> +do {\
&
> On Jan 30, 2021, at 4:11 PM, Nadav Amit wrote:
>
> From: Nadav Amit
>
> Currently, deferred TLB flushes are detected in the mm granularity: if
> there is any deferred TLB flush in the entire address space due to NUMA
> migration, pte_accessible() in x86 w
> On Jan 30, 2021, at 6:57 PM, Andy Lutomirski wrote:
>
> On Sat, Jan 30, 2021 at 5:19 PM Nadav Amit wrote:
>>> On Jan 30, 2021, at 5:02 PM, Andy Lutomirski wrote:
>>>
>>> On Sat, Jan 30, 2021 at 4:16 PM Nadav Amit wrote:
>>>> From: Nadav Amit
> On Jan 31, 2021, at 2:07 AM, Damian Tometzki wrote:
>
> On Sat, 30. Jan 16:11, Nadav Amit wrote:
>> From: Nadav Amit
>>
>> Introduce tlb_start_ptes() and tlb_end_ptes() which would be called
>> before and after PTEs are updated and TLB flushes are deferred.
> On Jan 31, 2021, at 12:32 PM, Andy Lutomirski wrote:
>
> On Sat, Jan 30, 2021 at 4:16 PM Nadav Amit wrote:
>> From: Nadav Amit
>>
>> To detect deferred TLB flushes in fine granularity, we need to keep
>> track on the completed TLB flush generation for each
> On Sat, Jan 30, 2021 at 4:16 PM Nadav Amit wrote:
>>> diff --git a/mm/mprotect.c b/mm/mprotect.c
>>> index 632d5a677d3f..b7473d2c9a1f 100644
>>> --- a/mm/mprotect.c
>>> +++ b/mm/mprotect.c
>>> @@ -139,7 +139,8 @@ static unsigned long chang
> On Jan 30, 2021, at 11:57 PM, Nadav Amit wrote:
>
>> On Jan 30, 2021, at 7:30 PM, Nicholas Piggin wrote:
>>
>> Excerpts from Nadav Amit's message of January 31, 2021 10:11 am:
>>> From: Nadav Amit
>>>
>>> There are currently (at le
> On Jan 30, 2021, at 7:30 PM, Nicholas Piggin wrote:
>
> Excerpts from Nadav Amit's message of January 31, 2021 10:11 am:
>> From: Nadav Amit
>>
>> There are currently (at least?) 5 different TLB batching schemes in the
>> kernel:
>>
>> 1. U
> On Jan 30, 2021, at 5:02 PM, Andy Lutomirski wrote:
>
> On Sat, Jan 30, 2021 at 4:16 PM Nadav Amit wrote:
>> From: Nadav Amit
>>
>> fullmm in mmu_gather is supposed to indicate that the mm is torn-down
>> (e.g., on process exit) and can therefore allow
> On Jan 30, 2021, at 5:07 PM, Andy Lutomirski wrote:
>
> Adding Andrew Cooper, who has a distressingly extensive understanding
> of the x86 PTE magic.
>
> On Sat, Jan 30, 2021 at 4:16 PM Nadav Amit wrote:
>> From: Nadav Amit
>>
>> Currently, using mprote
> On Jan 30, 2021, at 4:39 PM, Andy Lutomirski wrote:
>
> On Sat, Jan 30, 2021 at 4:16 PM Nadav Amit wrote:
>> From: Nadav Amit
>>
>> There are currently (at least?) 5 different TLB batching schemes in the
>> kernel:
>>
>> 1. Using mmu_gather (
From: Nadav Amit
mm_cpumask() is volatile: a bit might be turned on or off at any given
moment, and it is not protected by any lock. While the kernel coding
guidelines strongly discourage the use of volatile, not marking
mm_cpumask() as volatile seems wrong.
Cpumask and bitmap
From: Nadav Amit
Detecting deferred TLB flushes per-VMA has two drawbacks:
1. It requires an atomic cmpxchg to record mm's TLB generation at the
time of the last TLB flush, as two deferred TLB flushes on the same VMA
can race.
2. It might be too coarse-grained for large VMAs.
On 6
From: Nadav Amit
Introduce cpumask_atomic_or() and bitmask_atomic_or() to allow OR
operations to be performed atomically on cpumasks. This will be used
by the next patch.
To be more efficient, skip atomic operations when no changes are needed.
Signed-off-by: Nadav Amit
Cc: Mel Gorman
Cc
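A rough userspace sketch of the proposed semantics, using C11 atomics in place of the kernel's bitmap/atomic helpers; the early "skip if unchanged" check mirrors the optimization mentioned above:

#include <stdatomic.h>
#include <stddef.h>

/* OR 'src' into 'dst' word by word, atomically.  Words whose bits are already
 * present in 'dst' are skipped so no atomic read-modify-write is issued. */
static void bitmap_atomic_or(_Atomic unsigned long *dst,
			     const unsigned long *src, size_t nwords)
{
	size_t i;

	for (i = 0; i < nwords; i++) {
		unsigned long new_bits = src[i];

		if (new_bits == 0)
			continue;
		if ((atomic_load_explicit(&dst[i], memory_order_relaxed) &
		     new_bits) == new_bits)
			continue;	/* nothing would change: skip the atomic op */
		atomic_fetch_or_explicit(&dst[i], new_bits, memory_order_relaxed);
	}
}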
From: Nadav Amit
flush_tlb_batched_pending() appears to have a theoretical race:
tlb_flush_batched is being cleared after the TLB flush, and if in
between another core calls set_tlb_ubc_flush_pending() and sets the
pending TLB flush indication, this indication might be lost. Holding the
page
From: Nadav Amit
If all the deferred TLB flushes were completed, there is no need to
update the completed TLB flush generation. This update requires an atomic cmpxchg,
so we would like to skip it.
To do so, save for each mm the last TLB generation in which TLB flushes
were deferred. While saving this
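A minimal sketch of the check being described, with hypothetical field names; the idea is only to pay for the cmpxchg when a flush was actually deferred since the last recorded completion:

#include <stdatomic.h>

struct mm_tlb_gen {
	_Atomic unsigned long completed_gen;	/* last completed flush generation */
	unsigned long deferred_gen;		/* last generation with a deferred flush */
};

static void note_flush_completed(struct mm_tlb_gen *g, unsigned long gen)
{
	unsigned long cur = atomic_load_explicit(&g->completed_gen,
						 memory_order_relaxed);

	/* No flush was deferred since 'cur' completed: skip the cmpxchg. */
	if (g->deferred_gen <= cur)
		return;

	/* Advance completed_gen to 'gen' unless another CPU got there first. */
	while (cur < gen &&
	       !atomic_compare_exchange_weak_explicit(&g->completed_gen, &cur,
						      gen, memory_order_relaxed,
						      memory_order_relaxed))
		;
}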