as well.
>
> Cc:
> Cc: Rafael J. Wysocki
> Cc: Liu Shixin
> Cc: Dan Williams
> Cc: Kirill A. Shutemov
> Reported-by: Chris Piper
> Signed-off-by: Vishal Verma
Acked-by: Kirill A. Shutemov
--
Kiryl Shutsemau / Kirill A. Shutemov
truct memory_initiator, node);
> +
> + set_bit(initiator->processor_pxm, p_nodes);
> + return 0;
> + }
> +
> + list_sort(p_nodes, , initiator_cmp);
> + return 0;
> +}
> +
Hm. I think it indicates that these set_bit()s do not belong to
initiator_cmp().
Maybe remove both set_bit() from the compare helper and walk the list
separately to initialize the node mask? I think it will be easier to
follow.
--
Kiryl Shutsemau / Kirill A. Shutemov
> Cc: Rafael J. Wysocki
> Cc: Liu Shixin
> Cc: Dan Williams
> Signed-off-by: Vishal Verma
Acked-by: Kirill A. Shutemov
--
Kiryl Shutsemau / Kirill A. Shutemov
On Mon, Apr 19, 2021 at 08:09:13PM +, Sean Christopherson wrote:
> On Mon, Apr 19, 2021, Kirill A. Shutemov wrote:
> > On Mon, Apr 19, 2021 at 06:09:29PM +, Sean Christopherson wrote:
> > > On Mon, Apr 19, 2021, Kirill A. Shutemov wrote:
> > > > On Mon, Ap
On Mon, Apr 19, 2021 at 06:09:29PM +, Sean Christopherson wrote:
> On Mon, Apr 19, 2021, Kirill A. Shutemov wrote:
> > On Mon, Apr 19, 2021 at 04:01:46PM +, Sean Christopherson wrote:
> > > But fundamentally the private pages, are well, private. They can't be
> &g
On Mon, Apr 19, 2021 at 04:01:46PM +, Sean Christopherson wrote:
> On Mon, Apr 19, 2021, Kirill A. Shutemov wrote:
> > On Fri, Apr 16, 2021 at 05:30:30PM +, Sean Christopherson wrote:
> > > I like the idea of using "special" PTE value to denote guest priv
On Fri, Apr 16, 2021 at 05:30:30PM +, Sean Christopherson wrote:
> On Fri, Apr 16, 2021, Kirill A. Shutemov wrote:
> > diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> > index 1b404e4d7dd8..f8183386abe7 100644
> > --- a/arch/x86/kvm/x86.c
> > +++ b/arch/x
On Fri, Apr 16, 2021 at 06:10:30PM +0200, Borislav Petkov wrote:
> On Fri, Apr 16, 2021 at 06:40:55PM +0300, Kirill A. Shutemov wrote:
> > Provide basic helpers, KVM_FEATURE, CPUID flag and a hypercall.
> >
> > Host side doesn't provide the feature yet, so it i
inline struct folio *folio_next(struct folio *folio)
> +{
> + return (struct folio *)folio_page(folio, folio_nr_pages(folio));
> +}
>
> (it occurs to me this should also be const-preserving, but it's not clear
> that's needed yet)
Are we risking that we would need to replace inline functions with macros
all the way down? Not sure const-preserving is worth it.
--
Kirill A. Shutemov
://software.intel.com/content/dam/develop/external/us/en/documents/intel-tdx-module-1eas.pdf
Not-signed-off-by: Kirill A. Shutemov
---
arch/x86/kvm/Kconfig | 1 +
arch/x86/kvm/cpuid.c | 3 +-
arch/x86/kvm/mmu/mmu.c | 12 ++-
arch/x86/kvm/mmu/paging_tmpl.h | 10 +-
arch/x86
Make struct kvm pointer available within hva_to_pfn_slow(). It is
preparation for the next patch.
Signed-off-by: Kirill A. Shutemov
---
arch/powerpc/kvm/book3s_64_mmu_hv.c| 2 +-
arch/powerpc/kvm/book3s_64_mmu_radix.c | 2 +-
arch/x86/kvm/mmu/mmu.c | 8 +++--
include/linux
hvclock is shared between the guest and the hypervisor. It has to be
accessible by the host.
Signed-off-by: Kirill A. Shutemov
---
arch/x86/kernel/kvmclock.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/arch/x86/kernel/kvmclock.c b/arch/x86/kernel/kvmclock.c
index
with hwpoison entries.
Keep the page referenced on setting up hwpoison entries, copy the reference
on fork() and return it on zap().
Signed-off-by: Kirill A. Shutemov
---
mm/memory.c | 6 ++
mm/rmap.c | 2 +-
2 files changed, 7 insertions(+), 1 deletion(-)
diff --git a/mm/memory.c b/mm/memory.c
index
If the page got unpoisoned, we can replace the hwpoison entry with a present
PTE on page fault instead of delivering SIGBUS.
Signed-off-by: Kirill A. Shutemov
---
mm/memory.c | 38 +-
1 file changed, 37 insertions(+), 1 deletion(-)
diff --git a/mm/memory.c b/mm
Forbid access to poisoned pages.
TODO: Probably a more fine-grained approach is needed. It should be
allowed to fault in these pages as hwpoison entries.
Not-Signed-off-by: Kirill A. Shutemov
---
mm/shmem.c | 7 +++
1 file changed, 7 insertions(+)
diff --git a/mm/shmem.c b/mm/shmem.c
index
If KVM memory protection is active, the trampoline area will need to be
in shared memory.
Signed-off-by: Kirill A. Shutemov
---
arch/x86/realmode/init.c | 7 ---
1 file changed, 4 insertions(+), 3 deletions(-)
diff --git a/arch/x86/realmode/init.c b/arch/x86/realmode/init.c
index
Add helpers to convert hwpoison swap entry to pfn and page.
Signed-off-by: Kirill A. Shutemov
---
include/linux/swapops.h | 20
1 file changed, 20 insertions(+)
diff --git a/include/linux/swapops.h b/include/linux/swapops.h
index d9b7c9132c2f..520589b12fb3 100644
The new flag allows bypassing the check of whether the page is poisoned and
getting a reference on it.
Signed-off-by: Kirill A. Shutemov
---
include/linux/mm.h | 1 +
mm/gup.c | 29 ++---
2 files changed, 19 insertions(+), 11 deletions(-)
diff --git a/include/linux/mm.h b
Mirror SEV, use SWIOTLB always if KVM memory protection is enabled.
Signed-off-by: Kirill A. Shutemov
---
arch/x86/Kconfig | 1 +
arch/x86/include/asm/mem_encrypt.h | 7 +++--
arch/x86/kernel/kvm.c | 2 ++
arch/x86/kernel/pci-swiotlb.c | 3 +-
arch/x86/mm
Provide basic helpers, KVM_FEATURE, CPUID flag and a hypercall.
Host side doesn't provide the feature yet, so it is dead code for now.
Signed-off-by: Kirill A. Shutemov
---
arch/x86/include/asm/cpufeatures.h | 1 +
arch/x86/include/asm/kvm_para.h | 5 +
arch/x86/include/uapi/asm
the page
allows to touch it
- FOLL_ALLOW_POISONED is implemented
The patchset can also be found here:
git://git.kernel.org/pub/scm/linux/kernel/git/kas/linux.git kvm-unmapped-poison
Kirill A. Shutemov (13):
x86/mm: Move force_dma_unencrypted() to common code
x86/kvm: Introduce KVM memory
-by: Kirill A. Shutemov
---
arch/x86/Kconfig | 7 +-
arch/x86/include/asm/io.h| 4 +++-
arch/x86/mm/Makefile | 2 ++
arch/x86/mm/mem_encrypt.c| 30 -
arch/x86/mm/mem_encrypt_common.c | 38
5
Make force_dma_unencrypted() return true for KVM to get DMA pages mapped
as shared.
__set_memory_enc_dec() now informs the host via hypercall if the state
of the page has changed from shared to private or back.
Signed-off-by: Kirill A. Shutemov
---
arch/x86/Kconfig | 1 +
arch
looked at the rest of the patches yet, but why do you need a
special free path for shadow stack? Why doesn't the normal unmap route
work for you?
> + if (r == -EINTR) {
> + cond_resched();
> + continue;
> + }
> + break;
> + }
> +
> + cet->shstk_base = 0;
> + cet->shstk_size = 0;
> +}
> +
> +void shstk_disable(void)
> +{
> + struct cet_status *cet = &current->thread.cet;
> + u64 msr_val;
> +
> + if (!cpu_feature_enabled(X86_FEATURE_SHSTK) ||
> + !cet->shstk_size ||
> + !cet->shstk_base)
> + return;
> +
> + start_update_msrs();
> + rdmsrl(MSR_IA32_U_CET, msr_val);
> + wrmsrl(MSR_IA32_U_CET, msr_val & ~CET_SHSTK_EN);
> + wrmsrl(MSR_IA32_PL3_SSP, 0);
> + end_update_msrs();
> +
> + shstk_free(current);
> +}
> --
> 2.21.0
>
>
--
Kirill A. Shutemov
k
> (RW=0, Dirty=1) PTEs, but the latter does not have _PAGE_RW and has no need
> to preserve it.
>
> Exclude shadow stack from preserve_write test, and apply the same change to
> change_huge_pmd().
>
> Signed-off-by: Yu-cheng Yu
> Cc: Kirill A. Shutemov
> ---
> v2
ff-by: Yu-cheng Yu
> Reviewed-by: Kees Cook
> Cc: Kirill A. Shutemov
> ---
> v24:
> - Change arch_shadow_stack_mapping() to is_shadow_stack_mapping().
>
> mm/gup.c | 8 +---
> mm/huge_memory.c | 8 +---
> 2 files changed, 10 insertions(+), 6 deletions(-)
&
On Thu, Apr 01, 2021 at 03:10:52PM -0700, Yu-cheng Yu wrote:
> Account shadow stack pages to stack memory.
>
> Signed-off-by: Yu-cheng Yu
> Cc: Kees Cook
> Cc: Kirill A. Shutemov
> ---
> v24:
> - Change arch_shadow_stack_mapping() to is_shadow_stack_mappin
bytes and
> 255 * 4 = 1020 bytes by INCSSPD. Both ranges are far from PAGE_SIZE.
> Thus, putting a gap page on both ends of a shadow stack prevents INCSSP,
> CALL, and RET from going beyond.
>
> Signed-off-by: Yu-cheng Yu
> Cc: Kees Cook
> Cc: Kirill A. Shutemov
> -
> - In change_pte_range(), pte_mkwrite() is called directly. Replace it with
> maybe_mkwrite().
>
> A shadow stack vma is writable but has different vma
> flags, and handled accordingly in maybe_mkwrite().
>
Have you checked THP side? Looks like at least do_huge_pmd_numa_page()
needs adjustment, no?
--
Kirill A. Shutemov
e().
>
> Signed-off-by: Yu-cheng Yu
> Cc: Kees Cook
> Cc: Kirill A. Shutemov
> ---
> v24:
> - Instead of doing arch_maybe_mkwrite(), overwrite maybe*_mkwrite() with x86
> versions.
> - Change VM_SHSTK to VM_SHADOW_STACK.
>
> arch/x86/inclu
ed and both are
> handled as a write access.
>
> Signed-off-by: Yu-cheng Yu
> Reviewed-by: Kees Cook
Reviewed-by: Kirill A. Shutemov
--
Kirill A. Shutemov
SHADOW_STACK to track shadow stack VMAs.
>
> Signed-off-by: Yu-cheng Yu
> Cc: Kees Cook
Reviewed-by: Kirill A. Shutemov
--
Kirill A. Shutemov
r Zijlstra provided many
> insights to the issue. Jann Horn provided the cmpxchg solution.
>
> Signed-off-by: Yu-cheng Yu
> Reviewed-by: Kees Cook
> Cc: Kirill A. Shutemov
Reviewed-by: Kirill A. Shutemov
--
Kirill A. Shutemov
es() works - trigger a fault first, then
> lookup the PTE in the page tables).
For now, the patch has two-step poisoning: first fault in; then, on the add
to the shadow PTE, poison. By the time the VM has a chance to use the page
it's poisoned and unmapped from the host userspace.
--
Kirill A. Shutemov
On Thu, Apr 08, 2021 at 11:52:35AM +0200, Borislav Petkov wrote:
> On Fri, Apr 02, 2021 at 06:26:40PM +0300, Kirill A. Shutemov wrote:
> > Provide basic helpers, KVM_FEATURE, CPUID flag and a hypercall.
> >
> > Host side doesn't provide the feature yet, so it i
On Wed, Apr 07, 2021 at 04:55:54PM +0200, David Hildenbrand wrote:
> On 02.04.21 17:26, Kirill A. Shutemov wrote:
> > TDX architecture aims to provide resiliency against confidentiality and
> > integrity attacks. Towards this goal, the TDX architecture helps enforce
> > t
On Wed, Apr 07, 2021 at 04:09:35PM +0200, David Hildenbrand wrote:
> On 07.04.21 15:16, Kirill A. Shutemov wrote:
> > On Tue, Apr 06, 2021 at 04:57:46PM +0200, David Hildenbrand wrote:
> > > On 06.04.21 16:33, Dave Hansen wrote:
> > > > On 4/6/21 12:
On Tue, Apr 06, 2021 at 04:57:46PM +0200, David Hildenbrand wrote:
> On 06.04.21 16:33, Dave Hansen wrote:
> > On 4/6/21 12:44 AM, David Hildenbrand wrote:
> > > On 02.04.21 17:26, Kirill A. Shutemov wrote:
> > > > TDX architecture aims to provide resi
On Tue, Apr 06, 2021 at 09:11:25AM -0700, Dave Hansen wrote:
> On 4/6/21 8:37 AM, Kirill A. Shutemov wrote:
> > On Thu, Apr 01, 2021 at 01:06:29PM -0700, Dave Hansen wrote:
> >> On 2/5/21 3:38 PM, Kuppuswamy Sathyanarayanan wrote:
> >>> From: "Kirill A. Shute
ENT));
>
> That "!enc" looks wrong to me. Caches would need to be flushed whenever
> encryption attributes *change*, not just when they are set.
>
> Also, cpa_flush() flushes caches *AND* the TLB. How does TDX manage to
> not need TLB flushes?
I will double-check everything, but I think we can skip *both* cpa_flush()
calls for the private->shared conversion. The VMM and the TDX module will
take care of TLB and cache flushes in response to the MapGPA TDVMCALL.
> > ret = __change_page_attr_set_clr(&cpa, 1);
> >
> > @@ -2012,6 +2020,11 @@ static int __set_memory_enc_dec(unsigned long addr,
> > int numpages, bool enc)
> > */
> > cpa_flush(&cpa, 0);
> >
> > + if (!ret && is_tdx_guest()) {
> > + ret = tdx_map_gpa(__pa(addr), numpages, enc);
> > + // XXX: need to undo on error?
> > + }
>
> Time to fix this stuff up if you want folks to take this series more
> seriously.
My bad, will fix it.
--
Kirill A. Shutemov
On Thu, Apr 01, 2021 at 01:26:23PM -0700, Dave Hansen wrote:
> On 2/5/21 3:38 PM, Kuppuswamy Sathyanarayanan wrote:
> > From: "Kirill A. Shutemov"
> >
> > All ioremap()ed paged that are not backed by normal memory (NONE or
> > RESERVED) have to be
here.
I would rather keep it as is. We should be fine as long as we only allow
clearing bits from the mask.
--
Kirill A. Shutemov
On Thu, Apr 01, 2021 at 01:06:29PM -0700, Dave Hansen wrote:
> On 2/5/21 3:38 PM, Kuppuswamy Sathyanarayanan wrote:
> > From: "Kirill A. Shutemov"
> >
> > Intel TDX doesn't allow VMM to access guest memory. Any memory that is
> > required for commun
hristoph, I'm not a fan of this :/
>
> What would you prefer?
I liked the earlier approach with only struct page here. Once we know a field
should never be referenced from raw struct page, we can move it here.
But feel free to ignore my suggestion. It's not a show-stopper for me and
reverting is bac
atomic_t _mapcount;
> + atomic_t _refcount;
> +#ifdef CONFIG_MEMCG
> + unsigned long memcg_data;
> +#endif
As Christoph, I'm not a fan of this :/
> + /* private: the union with struct page is transitional */
> + };
> + struct page page;
> + };
> +};
--
Kirill A. Shutemov
On Tue, Apr 06, 2021 at 09:44:07AM +0200, David Hildenbrand wrote:
> On 02.04.21 17:26, Kirill A. Shutemov wrote:
> > TDX architecture aims to provide resiliency against confidentiality and
> > integrity attacks. Towards this goal, the TDX architecture helps enforce
> > t
hvclock is shared between the guest and the hypervisor. It has to be
accessible by the host.
Signed-off-by: Kirill A. Shutemov
---
arch/x86/kernel/kvmclock.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/arch/x86/kernel/kvmclock.c b/arch/x86/kernel/kvmclock.c
index
Mirror SEV, use SWIOTLB always if KVM memory protection is enabled.
Signed-off-by: Kirill A. Shutemov
---
arch/x86/Kconfig | 1 +
arch/x86/include/asm/mem_encrypt.h | 7 +++--
arch/x86/kernel/kvm.c | 2 ++
arch/x86/kernel/pci-swiotlb.c | 3 +-
arch/x86/mm
Provide basic helpers, KVM_FEATURE, CPUID flag and a hypercall.
Host side doesn't provide the feature yet, so it is dead code for now.
Signed-off-by: Kirill A. Shutemov
---
arch/x86/include/asm/cpufeatures.h | 1 +
arch/x86/include/asm/kvm_para.h | 5 +
arch/x86/include/uapi/asm
-by: Kirill A. Shutemov
---
arch/x86/Kconfig | 7 +-
arch/x86/include/asm/io.h| 4 +++-
arch/x86/mm/Makefile | 2 ++
arch/x86/mm/mem_encrypt.c| 30 -
arch/x86/mm/mem_encrypt_common.c | 38
5
Make force_dma_unencrypted() return true for KVM to get DMA pages mapped
as shared.
__set_memory_enc_dec() now informs the host via hypercall if the state
of the page has changed from shared to private or back.
Signed-off-by: Kirill A. Shutemov
---
arch/x86/Kconfig | 1 +
arch
the issue.
The core of the change is in the last patch. Please see a more detailed
description of the issue and a proposal of the solution there.
The patchset can also be found here:
git://git.kernel.org/pub/scm/linux/kernel/git/kas/linux.git kvm-unmapped-poison
Kirill A. Shutemov (7):
x86/mm: Move
be tied to a KVM instance and another KVM must not
be able to map the page into a guest.
[1]
https://software.intel.com/content/dam/develop/external/us/en/documents/intel-tdx-module-1eas.pdf
Not-signed-off-by: Kirill A. Shutemov
---
arch/x86/kvm/Kconfig | 1 +
arch/x86/kvm/cpuid.c
If KVM memory protection is active, the trampoline area will need to be
in shared memory.
Signed-off-by: Kirill A. Shutemov
---
arch/x86/realmode/init.c | 7 ---
1 file changed, 4 insertions(+), 3 deletions(-)
diff --git a/arch/x86/realmode/init.c b/arch/x86/realmode/init.c
index
On Fri, Mar 26, 2021 at 08:46:30AM -0700, Yu, Yu-cheng wrote:
> On 3/22/2021 3:57 AM, Kirill A. Shutemov wrote:
> > On Tue, Mar 16, 2021 at 08:10:44AM -0700, Yu-cheng Yu wrote:
> > > Account shadow stack pages to stack memory.
> > >
> > > Signed-off-by: Yu-ch
[0.145129] R10: 990b80013c78 R11: 990b80013c7d R12:
> > > 972dc74ada80
> > > > [0.145129] R13: 972d474038c0 R14: R15:
> > >
> > > > [0.145129] FS: () GS:972d47a0()
> > > knlGS:
> > > > [0.145129] CS: 0010 DS: ES: CR0: 80050033
> > > > [0.145129] CR2: CR3: 0660a000 CR4:
> > > 003406f0
> > > > [0.145129] Call Trace:
> > > > [0.145129] acpi_os_release_object+0x5/0x10
> > > > [0.145129] acpi_ns_delete_children+0x46/0x59
> > > > [0.145129] acpi_ns_delete_namespace_subtree+0x5c/0x79
> > > > [0.145129] ? acpi_sleep_proc_init+0x1f/0x1f
> > > > [0.145129] acpi_ns_terminate+0xc/0x31
> > > > [0.145129] acpi_ut_subsystem_shutdown+0x45/0xa3
> > > > [0.145129] ? acpi_sleep_proc_init+0x1f/0x1f
> > > > [0.145129] acpi_terminate+0x5/0xf
> > > > [0.145129] acpi_init+0x27b/0x308
> > > > [0.145129] ? video_setup+0x79/0x79
> > > > [0.145129] do_one_initcall+0x7b/0x160
> > > > [0.145129] kernel_init_freeable+0x190/0x1f2
> > > > [0.145129] ? rest_init+0x9a/0x9a
> > > > [0.145129] kernel_init+0x5/0xf6
> > > > [0.145129] ret_from_fork+0x22/0x30
> > > > [0.145129] ---[ end trace 574554fca7bd06bb ]---
> > > > [0.145133] INFO: Allocated in acpi_ns_root_initialize+0xb6/0x2d1
> > > > age=58
> > > cpu=0 pid=0
> > > > [0.145881] kmem_cache_alloc_trace+0x1a9/0x1c0
> > > > [0.146132] acpi_ns_root_initialize+0xb6/0x2d1
> > > > [0.146578] acpi_initialize_subsystem+0x65/0xa8
> > > > [0.147024] acpi_early_init+0x5d/0xd1
> > > > [0.147132] start_kernel+0x45b/0x518
> > > > [0.147491] secondary_startup_64+0xb6/0xc0
> > > > [0.147897] [ cut here ]
> > > >
> > > > And it seems ACPI is allocating an object via kmalloc() and then
> > > > freeing it via kmem_cache_free(<"Acpi-Namespace" kmem_cache>) which
> > > is wrong.
> > > > > ./scripts/faddr2line vmlinux 'acpi_ns_root_initialize+0xb6'
> > > > acpi_ns_root_initialize+0xb6/0x2d1:
> > > > kmalloc at include/linux/slab.h:555
> > > > (inlined by) kzalloc at include/linux/slab.h:669 (inlined by)
> > > > acpi_os_allocate_zeroed at include/acpi/platform/aclinuxex.h:57
> > > > (inlined by) acpi_ns_root_initialize at
> > > > drivers/acpi/acpica/nsaccess.c:102
> > > >
> > Hi Vegard,
> >
> > > That's it :-) This fixes it for me:
> > We'll take this patch for ACPICA and it will be in the next release.
> >
> > Rafael, do you want to take this as a part of the next rc?
>
> Yes, I do.
Folks, what happened to the patch? I don't see it in current upstream.
Looks like it got reported again:
https://lore.kernel.org/r/a1461e21-c744-767d-6dfc-6641fd3e3...@siemens.com
--
Kirill A. Shutemov
ned-off-by: Yanfei Xu
Acked-by: Kirill A. Shutemov
--
Kirill A. Shutemov
On Mon, Mar 22, 2021 at 11:46:21AM +0100, Peter Zijlstra wrote:
> On Mon, Mar 22, 2021 at 01:15:02PM +0300, Kirill A. Shutemov wrote:
> > On Tue, Mar 16, 2021 at 08:10:38AM -0700, Yu-cheng Yu wrote:
>
> > > + pte_t old_pte, new_pte;
> > > +
> > &
On Tue, Mar 16, 2021 at 08:10:39AM -0700, Yu-cheng Yu wrote:
> +#ifdef CONFIG_X86_CET
> +# define VM_SHSTKVM_HIGH_ARCH_5
> +#else
> +# define VM_SHSTKVM_NONE
> +#endif
> +
Why not VM_SHADOW_STACK? A random reader may think SH stands for SHARED or
something.
--
Kirill A. Shutemov
GE_COW.
>
> Apply the same changes to pmd_modify().
>
> Signed-off-by: Yu-cheng Yu
Reviewed-by: Kirill A. Shutemov
--
Kirill A. Shutemov
rlie
> Cc: Joonas Lahtinen
> Cc: Jani Nikula
> Cc: Daniel Vetter
> Cc: Rodrigo Vivi
> Cc: Zhenyu Wang
> Cc: Zhi Wang
Reviewed-by: Kirill A. Shutemov
--
Kirill A. Shutemov
pte_*() helpers and apply the same changes to
> pmd and pud.
>
> After this, there are six free bits left in the 64-bit PTE, and no more
> free bits in the 32-bit PTE (except for PAE) and Shadow Stack is not
> implemented for the 32-bit kernel.
>
> Signed-off-by: Yu-cheng Yu
Reviewed-by: Kirill A. Shutemov
--
Kirill A. Shutemov
On Tue, Mar 16, 2021 at 08:10:34AM -0700, Yu-cheng Yu wrote:
> To prepare the introduction of _PAGE_COW, move pmd_write() and
> pud_write() up in the file, so that they can be used by other
> helpers below.
>
> Signed-off-by: Yu-cheng Yu
Reviewed-by: Kirill A. Shutemov
--
Kirill A. Shutemov
return -EINVAL;
> +
> if (!file_mmap_ok(file, inode, pgoff, len))
> return -EOVERFLOW;
>
> @@ -1545,7 +1551,7 @@ unsigned long do_mmap(struct file *file, unsigned long
> addr,
> } else {
> switch (flags & MAP_TYPE) {
> case MAP_SHARED:
> - if (vm_flags & (VM_GROWSDOWN|VM_GROWSUP))
> + if (vm_flags & (VM_GROWSDOWN|VM_GROWSUP|VM_SHSTK))
> return -EINVAL;
> /*
>* Ignore pgoff.
> --
> 2.21.0
>
--
Kirill A. Shutemov
> mm->stack_vm += npages;
> else if (is_data_mapping(flags))
> mm->data_vm += npages;
> + else if (arch_shadow_stack_mapping(flags))
> + mm->stack_vm += npages;
Ditto.
> }
>
> static vm_fault_t special_mapping_fault(struct vm_fault *vmf);
> --
> 2.21.0
>
--
Kirill A. Shutemov
truct vm_area_struct *vma)
> {
> unsigned long vm_end = vma->vm_end;
> + unsigned long gap = 0;
> +
> + if (vma->vm_flags & VM_GROWSUP)
> + gap = stack_guard_gap;
> + else if (vma->vm_flags & VM_SHSTK)
> + gap = ARCH_SHADOW_STACK_GUARD_GAP;
>
> - if (vma->vm_flags & VM_GROWSUP) {
> - vm_end += stack_guard_gap;
> + if (gap != 0) {
> + vm_end += gap;
> if (vm_end < vma->vm_end)
> vm_end = -PAGE_SIZE;
> }
> --
> 2.21.0
>
--
Kirill A. Shutemov
override maybe_mkwrite()
and maybe_pmd_mkwrite() altogether. Wrap it into #ifndef maybe_mkwrite
here and provide VM_SHSTK-aware version from .
--
Kirill A. Shutemov
* This method cannot distinguish shadow stack read vs. write.
> + * For valid shadow stack accesses, set FAULT_FLAG_WRITE to effect
> + * copy-on-write.
> + */
> + if (error_code & X86_PF_SHSTK)
> + flags |= FAULT_FLAG_WRITE;
> if (error_code & X86_PF_WRITE)
> flags |= FAULT_FLAG_WRITE;
> if (error_code & X86_PF_INSTR)
> --
> 2.21.0
>
--
Kirill A. Shutemov
> + new_pmd = pmd_wrprotect(old_pmd);
> + } while (!try_cmpxchg((pmdval_t *)pmdp, (pmdval_t *)&old_pmd,
> pmd_val(new_pmd)));
> +
> + return;
> + }
> clear_bit(_PAGE_BIT_RW, (unsigned long *)pmdp);
> }
>
> --
> 2.21.0
>
--
Kirill A. Shutemov
Anvin"
> Cc: Kees Cook
> Cc: Thomas Gleixner
> Cc: Dave Hansen
> Cc: Christoph Hellwig
> Cc: Andy Lutomirski
> Cc: Ingo Molnar
> Cc: Borislav Petkov
> Cc: Peter Zijlstra
Looks good to me.
Reviewed-by: Kirill A. Shutemov
--
Kirill A. Shutemov
4148] RSP: :a18cb0af7a40 EFLAGS: 00010246
> > [16247.544153] RAX: 0036 RBX: 000d RCX:
> > 8ef13fc9a748
> > [16247.544158] RDX: RSI: 0027 RDI:
> > 8ef13fc9a740
> > [16247.544162] RBP: 8eb2f9a02ef8 R08: 8ef23ffb48a8 R09:
> > 0004fffb
> > [16247.544166] R10: R11: 3fff R12:
> > 1400
> > [16247.544170] R13: 8eb2f9a02f00 R14: R15:
> > d651b1978000
> > [16247.544175] FS: 7f97c1717740() GS:8ef13fc8()
> > knlGS:
> > [16247.544180] CS: 0010 DS: ES: CR0: 80050033
> > [16247.544184] CR2: 7f97c0efec0d CR3: 0040aa3ac006 CR4:
> > 007706e0
> > [16247.544188] DR0: DR1: DR2:
> >
> > [16247.544191] DR3: DR6: fffe0ff0 DR7:
> > 0400
> > [16247.544194] PKRU: 5554
> > [16247.546763] BUG: Bad rss-counter state mm:060c94f4
> > type:MM_ANONPAGES val:8
> >
> >
--
Kirill A. Shutemov
On Mon, Mar 15, 2021 at 01:25:40PM +0100, David Hildenbrand wrote:
> On 15.03.21 13:22, Kirill A. Shutemov wrote:
> > On Mon, Mar 08, 2021 at 05:45:20PM +0100, David Hildenbrand wrote:
> > > + case -EHWPOISON: /* Skip over
ous to me.
--
Kirill A. Shutemov
ogram is added to
> tools/testing/selftests/vm to utilize the interface by splitting
> PMD THPs and PTE-mapped THPs.
>
Okay, makes sense.
But it doesn't cover non-mapped THPs. tmpfs may have a file backed by a THP
that is mapped nowhere. Do we want to cover this case too?
Maybe have PID:,, and
FILE:,, ?
--
Kirill A. Shutemov
e, it's distraction, churn and friction, ongoing for years; but
> that's just me, and I'm resigned to the possibility that it will go in.
> Matthew is not alone in wanting to pursue it: let others speak.
I'm with Matthew on this. I would really want to drop the number of places
where we call compound_head(). I hope we can get rid of the page flag
policy hack I made.
--
Kirill A. Shutemov
> 1 file changed, 20 insertions(+), 27 deletions(-)
Apart from patch 4/5, looks fine. For the rest, you can use:
Acked-by: Kirill A. Shutemov
--
Kirill A. Shutemov
d_do_scan() and with the change mem_cgroup_charge() may get
called twice for two different mm_structs.
Is it safe?
--
Kirill A. Shutemov
age flags will only complicate the picture.
--
Kirill A. Shutemov
On Sun, Feb 07, 2021 at 08:01:50AM -0800, Dave Hansen wrote:
> On 2/7/21 6:13 AM, Kirill A. Shutemov wrote:
> >>> + /* Allow to pass R10, R11, R12, R13, R14 and R15 down to the VMM
> >>> */
> >>> + rcx = BIT(10) | BIT(11)
On Fri, Feb 05, 2021 at 05:06:20PM -0600, Seth Forshee wrote:
> This feature requires ino_t be 64-bits, which is true for every
> 64-bit architecture but s390, so prevent this option from being
> selected there.
Quick grep suggests the same for alpha. Am I wrong?
--
Kirill A. Shutemov
On Fri, Feb 05, 2021 at 03:42:01PM -0800, Andy Lutomirski wrote:
> On Fri, Feb 5, 2021 at 3:39 PM Kuppuswamy Sathyanarayanan
> wrote:
> >
> > From: "Kirill A. Shutemov"
> >
> > TDX has three classes of CPUID leaves: some CPUID leaves
> > are
On Sun, Feb 07, 2021 at 09:24:23AM +0100, Dmitry Vyukov wrote:
> On Fri, Feb 5, 2021 at 4:16 PM Kirill A. Shutemov
> wrote:
> >
> > Linear Address Masking[1] (LAM) modifies the checking that is applied to
> > 64-bit linear addresses, allowing software to use of the untra
On Sun, Feb 07, 2021 at 09:07:02AM +0100, Dmitry Vyukov wrote:
> On Fri, Feb 5, 2021 at 4:43 PM H.J. Lu wrote:
> >
> > On Fri, Feb 5, 2021 at 7:16 AM Kirill A. Shutemov
> > wrote:
> > >
> > > Provide prctl() interface to enabled LAM for user addresses. Depe
strips address from the metadata bits and gets it to canonical
shape before handling memory access. It has to be done very early before
TLB lookup.
Signed-off-by: Kirill A. Shutemov
---
accel/tcg/cputlb.c| 54 +++
include/hw/core/cpu.h | 1
LAM_U48 steals bits above 47-bit for tags and makes it impossible for
userspace to use full address space on 5-level paging machine.
Make these features mutually exclusive: whichever gets enabled first
blocks the other one.
Signed-off-by: Kirill A. Shutemov
---
arch/x86/include/asm/elf.h
On Fri, Feb 05, 2021 at 04:49:05PM +0100, Peter Zijlstra wrote:
> On Fri, Feb 05, 2021 at 06:16:20PM +0300, Kirill A. Shutemov wrote:
> > The feature competes for bits with 5-level paging: LAM_U48 makes it
> > impossible to map anything about 47-bits. The patchset made these
The new thread flags indicate that the thread has Linear Address Masking
enabled.
switch_mm_irqs_off() now respects these flags and set CR3 accordingly.
The active LAM mode gets recorded in the tlb_state.
Signed-off-by: Kirill A. Shutemov
---
arch/x86/include/asm/thread_info.h | 9 ++-
arch
The mask must not include bits above physical address mask. These bits
are reserved and can be used for other things. Bits 61 and 62 are used
for Linear Address Masking.
Signed-off-by: Kirill A. Shutemov
---
arch/x86/include/asm/processor-flags.h | 2 +-
1 file changed, 1 insertion(+), 1
, before calling into
the assembly helper.
Signed-off-by: Kirill A. Shutemov
---
arch/x86/include/asm/uaccess.h | 16 ++--
1 file changed, 14 insertions(+), 2 deletions(-)
diff --git a/arch/x86/include/asm/uaccess.h b/arch/x86/include/asm/uaccess.h
index c9fa7be3df82..ee0a482b2f1f
ntroduce a totally new API and
leave the legacy one as is. But it would slow down adoption: new
prctl(2) flag would need to propagate to the userspace headers.
Signed-off-by: Kirill A. Shutemov
---
arch/arm64/include/asm/processor.h| 12 ++--
arch/arm64/kernel/process.c
strips address from the metadata bits and gets it to canonical
shape before handling memory access. It has to be done very early before
TLB lookup.
Signed-off-by: Kirill A. Shutemov
---
accel/tcg/cputlb.c| 54 +++
include/hw/core/cpu.h | 1
Enumerate Linear Address Masking and provide defines for CR3 and CR4
flags.
Signed-off-by: Kirill A. Shutemov
---
arch/x86/include/asm/cpufeatures.h | 1 +
arch/x86/include/uapi/asm/processor-flags.h | 6 ++
2 files changed, 7 insertions(+)
diff --git a/arch/x86/include/asm
The helper is used by the core-mm to strip tag bits and get the address into
canonical shape. It only handles userspace addresses.
For LAM, the address gets sanitized according to the thread flags.
Signed-off-by: Kirill A. Shutemov
---
arch/x86/include/asm/page_32.h | 3 +++
arch/x86/include
-by: Kirill A. Shutemov
---
arch/x86/include/asm/processor.h | 10 +++
arch/x86/kernel/process_64.c | 145 +++
2 files changed, 155 insertions(+)
diff --git a/arch/x86/include/asm/processor.h b/arch/x86/include/asm/processor.h
index 82a08b585818..49fac2cc4329 100644
o work correctly.
Signed-off-by: Kirill A. Shutemov
---
arch/x86/include/asm/mmu.h | 1 +
arch/x86/mm/tlb.c | 28
2 files changed, 29 insertions(+)
diff --git a/arch/x86/include/asm/mmu.h b/arch/x86/include/asm/mmu.h
index 9257667d13c5..fb05d6a11538 100644
---
/e85fa032e5b276ddf17edd056f92f599db9e8369
Kirill A. Shutemov (9):
mm, arm64: Update PR_SET/GET_TAGGED_ADDR_CTRL interface
x86/mm: Fix CR3_ADDR_MASK
x86: CPUID and CR3/CR4 flags for Linear Address Masking
x86/mm: Introduce TIF_LAM_U57 and TIF_LAM_U48
x86/mm: Provide untagged_addr() helper
On Mon, Jan 18, 2021 at 04:42:20PM -0500, Arvind Sankar wrote:
> AFAICT, MODULES_END is only relevant as being something that happens to
> be in the top 512GiB, and -1ul would be clearer.
I think you are right. But -1UL is not very self-descriptive. :/
--
Kirill A. Shutemov
ter-procedural analysis either?
>
> For the second assertion there, everything is always constant AFAICT:
> EFI_VA_START, EFI_VA_END and P4D_MASK are all constants regardless of
> CONFIG_5LEVEL.
Back in the day, BUILD_BUG_ON() false-triggered on GCC 4.8 as well.
--
Kirill A. Shutemov
if CPU1 hold reference to it?
>
> .. and then re-check ..
> if (unlikely(page != xas_reload())) {
> put_page(page);
> goto repeat;
> }
>
--
Kirill A. Shutemov
On Mon, Jan 11, 2021 at 02:37:42PM +, Will Deacon wrote:
> On Mon, Jan 11, 2021 at 05:25:33PM +0300, Kirill A. Shutemov wrote:
> > On Fri, Jan 08, 2021 at 05:15:16PM +, Will Deacon wrote:
> > > diff --git a/mm/filemap.c b/mm/filemap.c
> > > index c1f2dc