On x86, software page-table walkers depend on the fact that a remote TLB
flush does an IPI: the walk is performed locklessly but with interrupts
disabled, and in case the page tables are freed the freeing CPU will get
blocked, as a remote TLB flush is required. On other architectures which
don't require an IPI to do a remote TLB flush we have an RCU-based
mechanism (see include/asm-generic/tlb.h for more details).
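For illustration, the pattern these walkers rely on looks roughly like
this (a simplified sketch modelled on the fast-GUP-style walkers this
paragraph refers to; walk_page_tables_locklessly() is a hypothetical
stand-in, not a real kernel function):

	unsigned long flags;

	local_irq_save(flags);
	/*
	 * With IPI-based shootdown, a CPU about to free a page table
	 * must first complete a remote TLB flush, i.e. deliver an IPI
	 * to this CPU. That IPI cannot be serviced while interrupts
	 * are off here, so the tables dereferenced by the walk cannot
	 * be freed under us.
	 */
	walk_page_tables_locklessly(mm, addr);	/* hypothetical walk */
	local_irq_restore(flags);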
In virtualized environments we may want to override the
.flush_tlb_others hook in pv_mmu_ops and use a hypercall asking the
hypervisor to do the remote TLB flush for us. This breaks the IPI
assumption. Xen PV has been doing this for years, and the upcoming
remote TLB flush for Hyper-V will do it too. This is not safe, as
software page-table walkers may step on an already-freed page.

Solve the issue by enabling the RCU-based table free mechanism. Testing
with kernbench and mmap/munmap microbenchmarks didn't show any notable
performance impact.

Suggested-by: Peter Zijlstra <pet...@infradead.org>
Signed-off-by: Vitaly Kuznetsov <vkuzn...@redhat.com>
Acked-by: Juergen Gross <jgr...@suse.com>
Acked-by: Kirill A. Shutemov <kirill.shute...@linux.intel.com>
---
Changes since v2:
- Add Kirill's Acked-by.
- Add a comment to __tlb_remove_table [Peter Zijlstra]
---
 arch/x86/Kconfig           |  1 +
 arch/x86/include/asm/tlb.h | 14 ++++++++++++++
 arch/x86/mm/pgtable.c      |  8 ++++----
 3 files changed, 19 insertions(+), 4 deletions(-)

diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index 323cb065be5e..b0bfc27d05a2 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -167,6 +167,7 @@ config X86
 	select HAVE_HARDLOCKUP_DETECTOR_PERF if PERF_EVENTS && HAVE_PERF_EVENTS_NMI
 	select HAVE_PERF_REGS
 	select HAVE_PERF_USER_STACK_DUMP
+	select HAVE_RCU_TABLE_FREE
 	select HAVE_REGS_AND_STACK_ACCESS_API
 	select HAVE_RELIABLE_STACKTRACE if X86_64 && FRAME_POINTER && STACK_VALIDATION
 	select HAVE_STACK_VALIDATION if X86_64
diff --git a/arch/x86/include/asm/tlb.h b/arch/x86/include/asm/tlb.h
index c7797307fc2b..79a4ca6a9606 100644
--- a/arch/x86/include/asm/tlb.h
+++ b/arch/x86/include/asm/tlb.h
@@ -15,4 +15,18 @@

 #include <asm-generic/tlb.h>

+/*
+ * While x86 architecture in general requires an IPI to perform TLB
+ * shootdown, enablement code for several hypervisors overrides
+ * .flush_tlb_others hook in pv_mmu_ops and implements it by issuing
+ * a hypercall. To keep software pagetable walkers safe in this case we
+ * switch to RCU based table free (HAVE_RCU_TABLE_FREE). See the comment
+ * below 'ifdef CONFIG_HAVE_RCU_TABLE_FREE' in include/asm-generic/tlb.h
+ * for more details.
+ */
+static inline void __tlb_remove_table(void *table)
+{
+	free_page_and_swap_cache(table);
+}
+
 #endif /* _ASM_X86_TLB_H */
diff --git a/arch/x86/mm/pgtable.c b/arch/x86/mm/pgtable.c
index 508a708eb9a6..218834a3e9ad 100644
--- a/arch/x86/mm/pgtable.c
+++ b/arch/x86/mm/pgtable.c
@@ -56,7 +56,7 @@ void ___pte_free_tlb(struct mmu_gather *tlb, struct page *pte)
 {
 	pgtable_page_dtor(pte);
 	paravirt_release_pte(page_to_pfn(pte));
-	tlb_remove_page(tlb, pte);
+	tlb_remove_table(tlb, pte);
 }

 #if CONFIG_PGTABLE_LEVELS > 2
@@ -72,21 +72,21 @@ void ___pmd_free_tlb(struct mmu_gather *tlb, pmd_t *pmd)
 	tlb->need_flush_all = 1;
 #endif
 	pgtable_pmd_page_dtor(page);
-	tlb_remove_page(tlb, page);
+	tlb_remove_table(tlb, page);
 }

 #if CONFIG_PGTABLE_LEVELS > 3
 void ___pud_free_tlb(struct mmu_gather *tlb, pud_t *pud)
 {
 	paravirt_release_pud(__pa(pud) >> PAGE_SHIFT);
-	tlb_remove_page(tlb, virt_to_page(pud));
+	tlb_remove_table(tlb, virt_to_page(pud));
 }

 #if CONFIG_PGTABLE_LEVELS > 4
 void ___p4d_free_tlb(struct mmu_gather *tlb, p4d_t *p4d)
 {
 	paravirt_release_p4d(__pa(p4d) >> PAGE_SHIFT);
-	tlb_remove_page(tlb, virt_to_page(p4d));
+	tlb_remove_table(tlb, virt_to_page(p4d));
 }
 #endif /* CONFIG_PGTABLE_LEVELS > 4 */
 #endif /* CONFIG_PGTABLE_LEVELS > 3 */
--
2.13.5
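For reference, tlb_remove_table() on the HAVE_RCU_TABLE_FREE path defers
the actual free until after an RCU-sched grace period; a simplified
sketch of the generic mechanism in mm/memory.c (paraphrased for
illustration, not a verbatim copy):

	/*
	 * Runs only after an RCU-sched grace period, i.e. after every
	 * CPU has been observed outside an interrupts-disabled region,
	 * so no lockless walker can still hold a pointer to the tables.
	 */
	static void tlb_remove_table_rcu(struct rcu_head *head)
	{
		struct mmu_table_batch *batch;
		int i;

		batch = container_of(head, struct mmu_table_batch, rcu);

		for (i = 0; i < batch->nr; i++)
			__tlb_remove_table(batch->tables[i]);	/* hook added above */

		free_page((unsigned long)batch);
	}

	/* queued via call_rcu_sched(&batch->rcu, tlb_remove_table_rcu) */

This is why defining __tlb_remove_table() in asm/tlb.h is all the
arch-specific glue x86 needs here.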