Re: [PATCH v3 2/2] x86/sgx: Resolve EREMOVE page vs EAUG page data race

2024-05-28 Thread Dave Hansen
On 5/17/24 04:06, Dmitrii Kuvaiskii wrote: ... First, why is SGX so special here? How is the SGX problem different than what the core mm code does? > --- a/arch/x86/kernel/cpu/sgx/encl.h > +++ b/arch/x86/kernel/cpu/sgx/encl.h > @@ -25,6 +25,9 @@ > /* 'desc' bit marking that the page is being

Re: [PATCH v3 0/2] x86/sgx: Fix two data races in EAUG/EREMOVE flows

2024-05-28 Thread Dave Hansen
On 5/17/24 04:06, Dmitrii Kuvaiskii wrote: > We wrote a trivial stress test to reproduce the hangs observed in > real-world applications. The test stresses #PF-based page allocation and > SGX_IOC_ENCLAVE_REMOVE_PAGES flows in the SGX driver: This seems like something we'd want in the kernel SGX

Re: [PATCH] x86/paravirt: Disable virt spinlock when CONFIG_PARAVIRT_SPINLOCKS disabled

2024-05-23 Thread Dave Hansen
On 5/23/24 11:39, Jürgen Groß wrote: >> >> Let's just keep it simple. How about the attached patch? > > Simple indeed. The attachment is empty. Let's try this again. diff --git a/arch/x86/kernel/paravirt.c b/arch/x86/kernel/paravirt.c index 5358d43886ad..c193c9e60a1b 100644 ---

Re: [PATCH] x86/paravirt: Disable virt spinlock when CONFIG_PARAVIRT_SPINLOCKS disabled

2024-05-23 Thread Dave Hansen
On 5/16/24 06:02, Chen Yu wrote: > Performance drop is reported when running encode/decode workload and > BenchSEE cache sub-workload. > Bisect points to commit ce0a1b608bfc ("x86/paravirt: Silence unused > native_pv_lock_init() function warning"). When CONFIG_PARAVIRT_SPINLOCKS > is disabled the

Re: [PATCH v2 1/2] x86/sgx: Resolve EAUG race where losing thread returns SIGBUS

2024-05-15 Thread Dave Hansen
On 5/15/24 06:54, Jarkko Sakkinen wrote: > I'd cut out 90% of the description out and just make the argument of > the wrong error code, and done. The sequence is great for showing > how this could happen. The prose makes my head hurt tbh. The changelog is too long, but not fatally so. I'd much

Re: [PATCH 1/1] x86/vector: Fix vector leak during CPU offline

2024-05-10 Thread Dave Hansen
On 5/10/24 12:06, Dongli Zhang wrote: > } else { > + /* > + * This call borrows from the comments and implementation > + * of apic_update_vector(): "If the target CPU is offline > + * then the regular release mechanism via the cleanup > +

Re: [RFC PATCH v2 1/1] x86/sgx: Explicitly give up the CPU in EDMM's ioctl() to avoid softlockup

2024-04-26 Thread Dave Hansen
On 4/26/24 07:18, Bojun Zhu wrote: > for (c = 0 ; c < modp->length; c += PAGE_SIZE) { > + if (sgx_check_signal_and_resched()) { > + if (!c) > + ret = -ERESTARTSYS; > + > + goto out; > + } This

Re: [PATCH v12 14/14] selftests/sgx: Add scripts for EPC cgroup testing

2024-04-26 Thread Dave Hansen
On 4/16/24 07:15, Jarkko Sakkinen wrote: > On Tue Apr 16, 2024 at 8:42 AM EEST, Huang, Kai wrote: > Yes, exactly. I'd take one week break and cycle the kselftest part > internally a bit as I said in my previous response. I'm sure that there > is expertise inside Intel how to implement it properly.

Re: [PATCH v9 15/15] selftests/sgx: Add scripts for EPC cgroup testing

2024-04-02 Thread Dave Hansen
On 3/30/24 04:23, Jarkko Sakkinen wrote: >>> I also wonder is cgroup-tools dependency absolutely required or could >>> you just have a function that would interact with sysfs? >> I should have checked email before hit the send button for v10 . >> >> It'd be more complicated and less readable to

Re: [PATCH v9 10/15] x86/sgx: Add EPC reclamation in cgroup try_charge()

2024-02-26 Thread Dave Hansen
On 2/26/24 14:34, Huang, Kai wrote: > So I am trying to get the actual downside of doing per-cgroup reclaim or > the full reason that we choose global reclaim. Take the most extreme example: while (hit_global_sgx_limit()) reclaim_from_this(cgroup); You eventually end up
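The loop in the extreme example above can be tried out with a minimal userspace sketch in C. This is not kernel code: `toy_epc`, `toy_cgroup`, and `reclaim_only_from()` are hypothetical names, and the single global counter is an assumption made for illustration. Only the shape of the loop comes from the message.

```c
#include <assert.h>

/* Hypothetical stand-in for an EPC cgroup: it only tracks its page count. */
struct toy_cgroup {
	unsigned long pages;
};

/* Hypothetical global state: total pages in use and the global limit. */
struct toy_epc {
	unsigned long used;
	unsigned long limit;
};

/*
 * Mirrors "while (hit_global_sgx_limit()) reclaim_from_this(cgroup);".
 * Takes one page at a time from a single cgroup until the global limit
 * is no longer exceeded or that cgroup has nothing left to give.
 * Returns the number of pages reclaimed.
 */
static unsigned long reclaim_only_from(struct toy_epc *epc,
				       struct toy_cgroup *cg)
{
	unsigned long reclaimed = 0;

	assert(epc && cg);
	while (epc->used > epc->limit && cg->pages > 0) {
		cg->pages--;
		epc->used--;
		reclaimed++;
	}
	return reclaimed;
}
```

In this toy setup, with `used = 100`, `limit = 95`, and a cgroup holding 4 pages, the loop takes all 4 pages from that one cgroup and the system is still over the limit. Pages held by any other cgroup are never considered.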

Re: [PATCH v9 10/15] x86/sgx: Add EPC reclamation in cgroup try_charge()

2024-02-26 Thread Dave Hansen
On 2/26/24 14:24, Huang, Kai wrote: > What is the downside of doing per-group reclaim when try_charge() > succeeds for the enclave but failed to allocate EPC page? > > Could you give a complete answer why you choose to use global reclaim > for the above case? There are literally two different

Re: [PATCH v9 10/15] x86/sgx: Add EPC reclamation in cgroup try_charge()

2024-02-26 Thread Dave Hansen
On 2/26/24 13:48, Haitao Huang wrote: > In case of overcommitting, i.e., sum of limits greater than the EPC > capacity, if one group has a fault, and its usage is not above its own > limit (try_charge() passes), yet total usage of the system has exceeded > the capacity, whether we do global reclaim

Re: [PATCH v9 10/15] x86/sgx: Add EPC reclamation in cgroup try_charge()

2024-02-26 Thread Dave Hansen
On 2/26/24 03:36, Huang, Kai wrote: >> In case of overcommitting, even if we always reclaim from the same cgroup >> for each fault, one group may still interfere with the other: e.g., consider an >> extreme case in that group A used up almost all EPC at the time group B >> has a fault, B has to

Re: [RFC PATCH] x86/sgx: Remove 'reclaim' boolean parameters

2024-02-19 Thread Dave Hansen
On 2/19/24 07:39, Haitao Huang wrote: > Remove all boolean parameters for 'reclaim' from the function > sgx_alloc_epc_page() and its callers by making two versions of each > function. > > Also opportunistically remove non-static declaration of > __sgx_alloc_epc_page() and a typo > >

Re: [PATCH v9 09/15] x86/sgx: Charge mem_cgroup for per-cgroup reclamation

2024-02-16 Thread Dave Hansen
On 2/16/24 13:38, Haitao Huang wrote: > On Fri, 16 Feb 2024 09:15:59 -0600, Dave Hansen > wrote: ... >> Does this 'indirect' change any behavior other than whether it does a >> search for an mm to find a place to charge the backing storage? > > No. > >> Inst

Re: [PATCH v9 09/15] x86/sgx: Charge mem_cgroup for per-cgroup reclamation

2024-02-16 Thread Dave Hansen
On 2/5/24 13:06, Haitao Huang wrote: > @@ -414,7 +416,7 @@ static void sgx_reclaim_pages_global(void) > void sgx_reclaim_direct(void) > { > if (sgx_should_reclaim(SGX_NR_LOW_PAGES)) > - sgx_reclaim_pages_global(); > + sgx_reclaim_pages_global(false); > } > >

Re: [PATCH v9 09/15] x86/sgx: Charge mem_cgroup for per-cgroup reclamation

2024-02-15 Thread Dave Hansen
On 2/5/24 13:06, Haitao Huang wrote: > static struct mem_cgroup *sgx_encl_get_mem_cgroup(struct sgx_encl *encl) > { > @@ -1003,14 +1001,6 @@ static struct mem_cgroup > *sgx_encl_get_mem_cgroup(struct sgx_encl *encl) > struct sgx_encl_mm *encl_mm; > int idx; > > - /* > - *

Re: [PATCH v6 00/12] Add Cgroup support for SGX EPC memory

2024-01-05 Thread Dave Hansen
There's very little about how the LRU design came to be in this cover letter. Let's add some details. How's this? Writing this up, I'm a lot more convinced that this series is, in general, taking the right approach. I honestly don't see any other alternatives. As much as I'd love to do

Re: [PATCH v6 07/12] x86/sgx: Introduce EPC page states

2024-01-05 Thread Dave Hansen
On 10/30/23 11:20, Haitao Huang wrote: > @@ -527,16 +530,13 @@ void sgx_mark_page_reclaimable(struct sgx_epc_page > *page) > int sgx_unmark_page_reclaimable(struct sgx_epc_page *page) > { > spin_lock(&sgx_global_lru.lock); > - if (page->flags & SGX_EPC_PAGE_RECLAIMER_TRACKED) { > -

Re: [PATCH v6 09/12] x86/sgx: Restructure top-level EPC reclaim function

2024-01-04 Thread Dave Hansen
On 1/4/24 11:11, Haitao Huang wrote: > If those are OK with users and also make it acceptable for merge > quickly, I'm happy to do that  How about we put some actual numbers behind this? How much complexity are we talking about here? What's the diffstat for the utterly bare-bones

Re: [PATCH v6 09/12] x86/sgx: Restructure top-level EPC reclaim function

2024-01-03 Thread Dave Hansen
On 12/18/23 13:24, Haitao Huang wrote: > @Dave and @Michal, Your thoughts? Or could you confirm we should not > do reclaim per cgroup at all? What's the benefit of doing reclaim per cgroup? Is that worth the extra complexity? The key question here is whether we want the SGX VM to be complex and

Re: [PATCH v7 00/13] selftests/sgx: Fix compilation errors

2023-11-08 Thread Dave Hansen
On 11/8/23 12:31, Jo Van Bulck wrote: > Just a kind follow-up: from what I can see, this series has not been > merged into the x86/sgx branch of tip yet (assuming that's where it > should go next)? > > Apologies if I've overlooked anything, and please let me know if there's > something on my end

Re: [PATCH v5 12/18] x86/sgx: Add EPC OOM path to forcefully reclaim EPC

2023-10-18 Thread Dave Hansen
On 10/18/23 08:26, Haitao Huang wrote: > Maybe not in sense of killing something. My understanding memory.reclaim > does not necessarily invoke the OOM killer. But what I really intend to > say is we can have a separate knob for user to express the need for > reducing the current usage explicitly

Re: [PATCH v5 12/18] x86/sgx: Add EPC OOM path to forcefully reclaim EPC

2023-10-18 Thread Dave Hansen
On 10/17/23 21:37, Haitao Huang wrote: > Yes we can introduce misc.reclaim to give user a knob to forcefully > reducing usage if that is really needed in real usage. The semantics > would make force-kill VMs explicit to user. Do any other controllers do something like this? It seems odd.

Re: [PATCH v6] x86/sgx: Resolves SECS reclaim vs. page fault for EAUG race

2023-09-28 Thread Dave Hansen
On 9/28/23 16:08, Reinette Chatre wrote: > I'd like to check in on the status of this patch. This two month old > patch looks to be a needed fix and has Jarkko and Kai's review tags, > but I am not able to find it queued or merged in tip or upstream. > Apologies if I did not look in the right

Re: [PATCH v4 03/18] x86/sgx: Add sgx_epc_lru_lists to encapsulate LRU lists

2023-09-14 Thread Dave Hansen
On 9/14/23 03:31, Huang, Kai wrote: >> Signed-off-by: Haitao Huang >> Cc: Sean Christopherson > You don't need 'Cc:' Sean if the patch has Sean's SoB. It is a SoB for Sean's @intel address and cc's his @google address. It is fine.

Re: [PATCH] x86/tdx: refactor deprecated strncpy

2023-09-11 Thread Dave Hansen
On 9/11/23 11:27, Justin Stitt wrote: > `strncpy` is deprecated and we should prefer more robust string apis. I dunno. It actually seems like a pretty good fit here. > In this case, `message.str` is not expected to be NUL-terminated as it > is simply a buffer of characters residing in a union

Re: [PATCH v4] memregion: Add cpu_cache_invalidate_memregion() interface

2022-10-27 Thread Dave Hansen
s or negative error code on a failure to perform > + * the cache maintenance. > + */ WBINVD is a scary beast. But, there's also no better alternative in the architecture. I don't think any of my comments above are deal breakers, so from the x86 side: Acked-by: Dave Hansen

Re: [PATCH v2 1/1] x86/tdx: Add __tdcall() and __tdvmcall() helper functions

2021-04-20 Thread Dave Hansen
On 4/20/21 4:12 PM, Kuppuswamy, Sathyanarayanan wrote: > On 4/20/21 12:59 PM, Dave Hansen wrote: >> On 4/20/21 12:20 PM, Kuppuswamy, Sathyanarayanan wrote: >>>>> approach is, it adds a few extra instructions for every >>>>> TDCALL use case when compared to

Re: [PATCH v2 1/1] x86/tdx: Add __tdcall() and __tdvmcall() helper functions

2021-04-20 Thread Dave Hansen
On 4/20/21 12:20 PM, Kuppuswamy, Sathyanarayanan wrote: >>> approach is, it adds a few extra instructions for every >>> TDCALL use case when compared to distributed checks. Although >>> it's a bit less efficient, it's worth it to make the code more >>> readable. >> >> What's a "distributed check"?

Re: [PATCH v2 1/1] x86/tdx: Add __tdcall() and __tdvmcall() helper functions

2021-04-20 Thread Dave Hansen
On 3/26/21 4:38 PM, Kuppuswamy Sathyanarayanan wrote: > Implement common helper functions to communicate with > the TDX Module and VMM (using TDCALL instruction). This is missing any kind of background. I'd say: Guests communicate with VMMs with hypercalls. Historically, these are implemented

Re: [RFC Part2 PATCH 04/30] x86/mm: split the physmap when adding the page in RMP table

2021-04-19 Thread Dave Hansen
On 4/19/21 11:10 AM, Andy Lutomirski wrote: > I’m confused by this scenario. This should only affect physical pages > that are in the 2M area that contains guest memory. But, if we have a > 2M direct map PMD entry that contains kernel data and guest private > memory, we’re already in a situation

Re: [RFC Part2 PATCH 04/30] x86/mm: split the physmap when adding the page in RMP table

2021-04-19 Thread Dave Hansen
On 4/19/21 10:46 AM, Brijesh Singh wrote: > - guest wants to make gpa 0x1000 as a shared page. To support this, we > need to psmash the large RMP entry into 512 4K entries. The psmash > instruction breaks the large RMP entry into 512 4K entries without > affecting the previous validation. Now the
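The 512-way split described above follows from 2M / 4K = 512. A minimal userspace sketch in C, assuming a heavily simplified RMP entry that carries only a guest physical address and a validated flag, shows the split preserving validation state. `toy_rmp_entry` and `toy_psmash()` are hypothetical names, not the hardware layout or a kernel API.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_PAGE_SIZE_4K	4096UL
#define TOY_PAGE_SIZE_2M	(2UL * 1024 * 1024)
#define TOY_ENTRIES_PER_2M	(TOY_PAGE_SIZE_2M / TOY_PAGE_SIZE_4K)

/* Hypothetical, simplified RMP entry: a GPA plus a validated flag. */
struct toy_rmp_entry {
	unsigned long gpa;
	int validated;
};

/*
 * Split one 2M entry into 4K entries. Each small entry inherits the
 * validated state of the large one, so earlier validation is kept.
 * Returns the number of entries written, or 0 if 'out' is too small.
 */
static size_t toy_psmash(const struct toy_rmp_entry *large,
			 struct toy_rmp_entry *out, size_t out_len)
{
	size_t i;

	assert(large && out);
	if (out_len < TOY_ENTRIES_PER_2M)
		return 0;

	for (i = 0; i < TOY_ENTRIES_PER_2M; i++) {
		out[i].gpa = large->gpa + i * TOY_PAGE_SIZE_4K;
		out[i].validated = large->validated;
	}
	return TOY_ENTRIES_PER_2M;
}
```

After the split, the entry for gpa 0x1000 is `out[1]`, which can then be changed on its own without touching the other 511 entries.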

Re: [RFCv2 06/13] x86/realmode: Share trampoline area if KVM memory protection enabled

2021-04-19 Thread Dave Hansen
On 4/16/21 8:40 AM, Kirill A. Shutemov wrote: > /* > - * If SME is active, the trampoline area will need to be in > - * decrypted memory in order to bring up other processors > + * If SME or KVM memory protection is active, the trampoline area will > + * need to be in

Re: [RFCv2 04/13] x86/kvm: Use bounce buffers for KVM memory protection

2021-04-16 Thread Dave Hansen
On 4/16/21 8:40 AM, Kirill A. Shutemov wrote: > Mirror SEV, use SWIOTLB always if KVM memory protection is enabled. ... > arch/x86/mm/mem_encrypt.c | 44 --- > arch/x86/mm/mem_encrypt_common.c | 48 ++ The changelog needs to at least

Re: [PATCH 00/10] [v7][RESEND] Migrate Pages in lieu of discard

2021-04-16 Thread Dave Hansen
On 4/16/21 5:35 AM, Michal Hocko wrote: > I have to confess that I haven't grasped the initialization > completely. There is a nice comment explaining a 2 socket system with > 3 different NUMA nodes attached to it with one node being terminal. > This is OK if the terminal node is PMEM but

Re: Candidate Linux ABI for Intel AMX and hypothetical new related features

2021-04-15 Thread Dave Hansen
On 4/15/21 9:24 AM, Andy Lutomirski wrote: > In the patches, *as submitted*, if you trip the XFD #NM *once* and you > are the only thread on the system to do so, you will eat the cost of a > WRMSR on every subsequent context switch. I think you're saying: If a thread trips XFD #NM *once*, every

Re: [PATCH 02/10] mm/numa: automatically generate node migration order

2021-04-15 Thread Dave Hansen
On 4/14/21 9:07 PM, Wei Xu wrote: > On Wed, Apr 14, 2021 at 1:08 AM Oscar Salvador wrote: >> Fast class/memory are pictured as those nodes with CPUs, while Slow >> class/memory >> are PMEM, right? >> Then, what stands for medium class/memory? > > That is Dave's example. I think David's guess

Re: [PATCH v8] x86/sgx: Maintain encl->refcount for each encl->mm_list entry

2021-04-14 Thread Dave Hansen
On 4/14/21 8:51 AM, Sean Christopherson wrote: >> Could this access to and kfree of encl_mm possibly be after the >> kfree(encl_mm) noted above? > No, the mmu_notifier_unregister() ensures that all in-progress notifiers > complete > before it returns, i.e. SGX's notifier call back is not

Re: [PATCH v2 2/3] soundwire: Intel: introduce DMI quirks for HP Spectre x360 Convertible

2021-04-12 Thread Dave Hansen
On 3/1/21 11:51 PM, Bard Liao wrote: > +++ b/drivers/soundwire/dmi-quirks.c > @@ -0,0 +1,66 @@ > +// SPDX-License-Identifier: (GPL-2.0 OR BSD-3-Clause) > +// Copyright(c) 2021 Intel Corporation. It looks like this is already in intel-next, so this may be moot. But, is there a specific reason

Re: [PATCH v6 19/34] xlink-core: Add xlink core device tree bindings

2021-04-12 Thread Dave Hansen
On 2/12/21 2:22 PM, mgr...@linux.intel.com wrote: > +# SPDX-License-Identifier: (GPL-2.0 OR BSD-2-Clause) > +# Copyright (c) Intel Corporation. All rights reserved. > +%YAML 1.2 > +--- > +$id: "http://devicetree.org/schemas/misc/intel,keembay-xlink.yaml#" > +$schema:

Re: [PATCH v2 0/3] x86/sgx: eextend ioctl

2021-04-12 Thread Dave Hansen
On 4/12/21 8:58 AM, Jethro Beekman wrote: > On 2021-04-12 17:36, Dave Hansen wrote: >> On 4/12/21 1:59 AM, Raoul Strackx wrote: >>> This patch set adds a new ioctl to enable userspace to execute EEXTEND >>> leaf functions per 256 bytes of enclave memory. With thi

Re: [PATCH v2 0/3] x86/sgx: eextend ioctl

2021-04-12 Thread Dave Hansen
On 4/12/21 9:41 AM, Jethro Beekman wrote: > Yes this still doesn't let one execute all possible ECREATE, EADD, EEXTEND, > EINIT sequences. OK, so we're going in circles now. I don't believe we necessarily *WANT* or need Linux to support "all possible ECREATE, EADD, EEXTEND, EINIT sequences".

Re: [PATCH v2 0/3] x86/sgx: eextend ioctl

2021-04-12 Thread Dave Hansen
On 4/12/21 1:59 AM, Raoul Strackx wrote: > This patch set adds a new ioctl to enable userspace to execute EEXTEND > leaf functions per 256 bytes of enclave memory. With this patch in place, > Linux will be able to build all valid SGXv1 enclaves. This didn't cover why we need a *NEW* ABI for this
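For a sense of the 256-byte granularity mentioned above, here is a small userspace sketch in C that counts how many 256-byte-aligned chunks overlap a byte range. `toy_eextend_chunks()` is a hypothetical helper for illustration only; it is not part of the proposed ioctl or of any kernel interface.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_EEXTEND_CHUNK	256UL

/* Number of 256-byte-aligned chunks that overlap [offset, offset + len). */
static size_t toy_eextend_chunks(size_t offset, size_t len)
{
	size_t first, last;

	if (len == 0)
		return 0;

	assert(offset + len > offset);	/* the range must not wrap around */

	first = offset / TOY_EEXTEND_CHUNK;
	last = (offset + len - 1) / TOY_EEXTEND_CHUNK;
	return last - first + 1;
}
```

Under this model, measuring a whole 4096-byte page takes 16 chunks, while a 200-byte range starting at offset 100 straddles a chunk boundary and touches 2.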

Re: [PATCH 04/10] mm/migrate: make migrate_pages() return nr_succeeded

2021-04-09 Thread Dave Hansen
On 4/8/21 11:17 AM, Oscar Salvador wrote: > --- a/mm/page_alloc.c > +++ b/mm/page_alloc.c > @@ -8490,7 +8490,8 @@ static int __alloc_contig_migrate_range(struct > compact_control *cc, > cc->nr_migratepages -= nr_reclaimed; > > ret = migrate_pages(&cc->migratepages,

Re: [PATCH 02/10] mm/numa: automatically generate node migration order

2021-04-08 Thread Dave Hansen
On 4/8/21 1:26 AM, Oscar Salvador wrote: > On Thu, Apr 01, 2021 at 11:32:19AM -0700, Dave Hansen wrote: >> The protocol for node_demotion[] access and writing is not >> standard. It has no specific locking and is intended to be read >> locklessly. Readers must take ca

Re: [PATCH 01/10] mm/numa: node demotion data structure and lookup

2021-04-08 Thread Dave Hansen
On 4/8/21 1:03 AM, Oscar Salvador wrote: > I think this patch and patch#2 could be squashed > > Reviewed-by: Oscar Salvador Yeah, that makes a lot of sense. I'll do that for the next version.

Re: [PATCH RESEND 0/3] x86/sgx: eextend ioctl

2021-04-08 Thread Dave Hansen
On 4/8/21 8:27 AM, Jethro Beekman wrote: > But the native “executable format” for SGX is very clearly defined in > the Intel SDM as a specific sequence of ECREATE, EADD, EEXTEND and > EINIT calls. It's that sequence that's used for loading the enclave > and it's that sequence that's used for

Re: [RFC v1 25/26] x86/tdx: Make DMA pages shared

2021-04-06 Thread Dave Hansen
On 4/6/21 9:31 AM, Kirill A. Shutemov wrote: > On Thu, Apr 01, 2021 at 02:01:15PM -0700, Dave Hansen wrote: >>> @@ -1977,8 +1978,8 @@ static int __set_memory_enc_dec(unsigned long addr, >>> int numpages, bool enc) >>> struct cpa_data cpa; >>>

Re: [RFC v1 23/26] x86/tdx: Make pages shared in ioremap()

2021-04-06 Thread Dave Hansen
On 4/6/21 9:00 AM, Kirill A. Shutemov wrote: >>> --- a/arch/x86/mm/ioremap.c >>> +++ b/arch/x86/mm/ioremap.c >>> @@ -87,12 +87,12 @@ static unsigned int __ioremap_check_ram(struct resource >>> *res) >>> } >>> >>> /* >>> - * In a SEV guest, NONE and RESERVED should not be mapped encrypted

Re: [RFC v1 22/26] x86/tdx: Exclude Shared bit from __PHYSICAL_MASK

2021-04-06 Thread Dave Hansen
On 4/6/21 8:54 AM, Kirill A. Shutemov wrote: > On Thu, Apr 01, 2021 at 01:13:16PM -0700, Dave Hansen wrote: >>> @@ -56,6 +61,9 @@ static void tdx_get_info(void) >>> >>> td_info.gpa_width = rcx & GENMASK(5, 0); >>> td_info.attributes = rd

Re: [RFC v1 21/26] x86/mm: Move force_dma_unencrypted() to common code

2021-04-06 Thread Dave Hansen
On 4/6/21 8:37 AM, Kirill A. Shutemov wrote: > On Thu, Apr 01, 2021 at 01:06:29PM -0700, Dave Hansen wrote: >> On 2/5/21 3:38 PM, Kuppuswamy Sathyanarayanan wrote: >>> From: "Kirill A. Shutemov" >>> >>> Intel TDX doesn't allow VMM to access

Re: [RFCv1 7/7] KVM: unmap guest memory using poisoned pages

2021-04-06 Thread Dave Hansen
On 4/6/21 12:44 AM, David Hildenbrand wrote: > On 02.04.21 17:26, Kirill A. Shutemov wrote: >> TDX architecture aims to provide resiliency against confidentiality and >> integrity attacks. Towards this goal, the TDX architecture helps enforce >> the enabling of memory integrity for all TD-private

Re: [RFC 2/3] vmalloc: Support grouped page allocations

2021-04-05 Thread Dave Hansen
On 4/5/21 1:37 PM, Rick Edgecombe wrote: > +static void __dispose_pages(struct list_head *head) > +{ > + struct list_head *cur, *next; > + > + list_for_each_safe(cur, next, head) { > + list_del(cur); > + > + /* The list head is stored at the start of the page */ > +

Re: [RFC v1 00/26] Add TDX Guest Support

2021-04-04 Thread Dave Hansen
It occurred to me that I've been doing a lot of digging in the TDX spec lately. I think we can all agree that the "Architecture Specification" is not the world's easiest, most digestible reading. It's hard to figure out what the Linux relation to the spec is. One bit of Documentation we need

Re: [RFC v1 00/26] Add TDX Guest Support

2021-04-03 Thread Dave Hansen
On 4/2/21 2:32 PM, Andi Kleen wrote: >> If we go this route, what are the rules and restrictions? Do we have to >> say "no MMIO in #VE"? > > All we have to say is "No MMIO in #VE before getting the TDVEINFO arguments" > After that it can nest without problems. Well, not exactly. You still

Re: [PATCH RESEND 0/3] x86/sgx: eextend ioctl

2021-04-02 Thread Dave Hansen
On 4/2/21 1:20 PM, Jethro Beekman wrote: > On 2021-04-02 21:50, Dave Hansen wrote: >> Again, how does this save space? >> >> Are you literally talking about the temporary cost of allocating *one* page? > > No I'm talking about the amount of disk space/network tra

Re: [PATCH RESEND 0/3] x86/sgx: eextend ioctl

2021-04-02 Thread Dave Hansen
On 4/2/21 12:38 PM, Jethro Beekman wrote: > On 2021-04-02 20:42, Dave Hansen wrote: >> On 4/2/21 11:31 AM, Jethro Beekman wrote: >>> On 2021-04-02 17:53, Dave Hansen wrote: >>>> But, why would an enclave loader application ever do this? >>> >>> e.

Re: [PATCH RESEND 0/3] x86/sgx: eextend ioctl

2021-04-02 Thread Dave Hansen
On 4/2/21 11:31 AM, Jethro Beekman wrote: > On 2021-04-02 17:53, Dave Hansen wrote: >> On 4/2/21 1:38 AM, Jethro Beekman wrote: >>>> So, we're talking here about pages that have been EEADDED, but for >>>> which we do not want to include the entire contents of

Re: [PATCH RESEND 0/3] x86/sgx: eextend ioctl

2021-04-02 Thread Dave Hansen
On 4/2/21 1:38 AM, Jethro Beekman wrote: >> So, we're talking here about pages that have been EEADDED, but for >> which we do not want to include the entire contents of the page? >> Do these contents always include the beginning of the page, or can >> the holes be anywhere? > Holes can be

Re: [RFC v1 00/26] Add TDX Guest Support

2021-04-02 Thread Dave Hansen
On 4/1/21 7:48 PM, Andi Kleen wrote: >> I've heard things like "we need to harden the drivers" or "we need to do >> audits" and that drivers might be "whitelisted". > > The basic driver allow listing patches are already in the repository, > but not currently posted or complete: > >

Re: [RFC v1 00/26] Add TDX Guest Support

2021-04-01 Thread Dave Hansen
On 2/5/21 3:38 PM, Kuppuswamy Sathyanarayanan wrote: > Intel's Trust Domain Extensions (TDX) protect guest VMs from malicious > hosts and some physical attacks. This series adds the bare-minimum > support to run a TDX guest. The host-side support will be submitted > separately. Also support for

Re: [PATCH 04/10] mm/migrate: make migrate_pages() return nr_succeeded

2021-04-01 Thread Dave Hansen
On 4/1/21 3:35 PM, Wei Xu wrote: > A small suggestion: Given that migrate_pages() requires that > *nr_succeeded should be initialized to 0 when it is called due to its > use of *nr_succeeded in count_vm_events() and trace_mm_migrate_pages(), > it would be less error-prone if migrate_pages()

Re: [PATCH 05/10] mm/migrate: demote pages during reclaim

2021-04-01 Thread Dave Hansen
On 4/1/21 1:01 PM, Yang Shi wrote: > On Thu, Apr 1, 2021 at 11:35 AM Dave Hansen > wrote: >> >> >> From: Dave Hansen >> >> This is mostly derived from a patch from Yang Shi: >> >> >> https://lore.kernel.org/linux-mm/15604685

Re: [RFC v1 12/26] x86/tdx: Handle in-kernel MMIO

2021-04-01 Thread Dave Hansen
On 4/1/21 3:26 PM, Sean Christopherson wrote: > On Thu, Apr 01, 2021, Dave Hansen wrote: >> On 2/5/21 3:38 PM, Kuppuswamy Sathyanarayanan wrote: >>> From: "Kirill A. Shutemov" >>> >>> Handle #VE due to MMIO operations. MMIO triggers #VE with EPT_VIOL

Re: [RFC v1 03/26] x86/cpufeatures: Add is_tdx_guest() interface

2021-04-01 Thread Dave Hansen
On 4/1/21 2:15 PM, Kuppuswamy, Sathyanarayanan wrote: > On 4/1/21 2:08 PM, Dave Hansen wrote: >> On 2/5/21 3:38 PM, Kuppuswamy Sathyanarayanan wrote: >>> +bool is_tdx_guest(void) >>> +{ >>> + return static_cpu_has(X86_FEATURE_TDX_GUEST); >>> +}

Re: [RFC v1 26/26] x86/kvm: Use bounce buffers for TD guest

2021-04-01 Thread Dave Hansen
On 2/5/21 3:38 PM, Kuppuswamy Sathyanarayanan wrote: > From: "Kirill A. Shutemov" > > TDX doesn't allow to perform DMA access to guest private memory. > In order for DMA to work properly in TD guest, user SWIOTLB bounce > buffers. > > Move AMD SEV initialization into common code and adopt for

Re: [RFC v1 03/26] x86/cpufeatures: Add is_tdx_guest() interface

2021-04-01 Thread Dave Hansen
On 2/5/21 3:38 PM, Kuppuswamy Sathyanarayanan wrote: > +bool is_tdx_guest(void) > +{ > + return static_cpu_has(X86_FEATURE_TDX_GUEST); > +} Why do you need is_tdx_guest() as opposed to calling cpu_feature_enabled(X86_FEATURE_TDX_GUEST) everywhere?

Re: [RFC v1 25/26] x86/tdx: Make DMA pages shared

2021-04-01 Thread Dave Hansen
> +int tdx_map_gpa(phys_addr_t gpa, int numpages, bool private) > +{ > + int ret, i; > + > + ret = __tdx_map_gpa(gpa, numpages, private); > + if (ret || !private) > + return ret; > + > + for (i = 0; i < numpages; i++) > + tdx_accept_page(gpa + i*PAGE_SIZE);

Re: [RFC v1 23/26] x86/tdx: Make pages shared in ioremap()

2021-04-01 Thread Dave Hansen
On 2/5/21 3:38 PM, Kuppuswamy Sathyanarayanan wrote: > From: "Kirill A. Shutemov" > > All ioremap()ed paged that are not backed by normal memory (NONE or > RESERVED) have to be mapped as shared. s/paged/pages/ > +/* Make the page accesable by VMM */ > +#define pgprot_tdx_shared(prot)

Re: [RFC v1 22/26] x86/tdx: Exclude Shared bit from __PHYSICAL_MASK

2021-04-01 Thread Dave Hansen
On 2/5/21 3:38 PM, Kuppuswamy Sathyanarayanan wrote: > From: "Kirill A. Shutemov" > > tdx_shared_mask() returns the mask that has to be set in page table > entry to make page shared with VMM. Needs to be either: has to be set in a page table entry or has to be set in page table

Re: [RFC v1 21/26] x86/mm: Move force_dma_unencrypted() to common code

2021-04-01 Thread Dave Hansen
On 2/5/21 3:38 PM, Kuppuswamy Sathyanarayanan wrote: > From: "Kirill A. Shutemov" > > Intel TDX doesn't allow VMM to access guest memory. Any memory that is > required for communication with VMM suppose to be shared explicitly by s/suppose to/must/ > setting the bit in page table entry. The

Re: [RFC v1 12/26] x86/tdx: Handle in-kernel MMIO

2021-04-01 Thread Dave Hansen
On 2/5/21 3:38 PM, Kuppuswamy Sathyanarayanan wrote: > From: "Kirill A. Shutemov" > > Handle #VE due to MMIO operations. MMIO triggers #VE with EPT_VIOLATION > exit reason. > > For now we only handle subset of instruction that kernel uses for MMIO > oerations. User-space access triggers SIGBUS.

Re: [PATCH 2/2] x86/sgx: Add sgx_nr_{all, free}_pages to the debugfs

2021-04-01 Thread Dave Hansen
On 3/31/21 10:21 PM, Jarkko Sakkinen wrote: > +#ifdef CONFIG_DEBUG_FS > + debugfs_create_file("sgx_nr_all_pages", 0400, arch_debugfs_dir, NULL, > + &sgx_nr_all_pages_fops); > + debugfs_create_file("sgx_nr_free_pages", 0400, arch_debugfs_dir, NULL, > +

[PATCH 02/10] mm/numa: automatically generate node migration order

2021-04-01 Thread Dave Hansen
From: Dave Hansen When memory fills up on a node, memory contents can be automatically migrated to another node. The biggest problems are knowing when to migrate and to where the migration should be targeted. The most straightforward way to generate the "to where" list would be

[PATCH 03/10] mm/migrate: update node demotion order during on hotplug events

2021-04-01 Thread Dave Hansen
From: Dave Hansen Reclaim-based migration is attempting to optimize data placement in memory based on the system topology. If the system changes, so must the migration ordering. The implementation is conceptually simple and entirely unoptimized. On any memory or CPU hotplug events, assume

[PATCH 00/10] [v7][RESEND] Migrate Pages in lieu of discard

2021-04-01 Thread Dave Hansen
I'm resending this because I forgot to cc the mailing lists on the post yesterday. Sorry for the noise. Please reply to this series. The full series is also available here: https://github.com/hansendc/linux/tree/automigrate-20210331 which also includes some vm.zone_reclaim_mode sysctl

[PATCH 10/10] mm/migrate: new zone_reclaim_mode to enable reclaim migration

2021-04-01 Thread Dave Hansen
From: Dave Hansen Some method is obviously needed to enable reclaim-based migration. Just like traditional autonuma, there will be some workloads that will benefit like workloads with more "static" configurations where hot pages stay hot and cold pages stay cold. If pages come a

[PATCH 06/10] mm/vmscan: add page demotion counter

2021-04-01 Thread Dave Hansen
ges() ] Signed-off-by: Yang Shi Signed-off-by: Dave Hansen Reviewed-by: Yang Shi Cc: Wei Xu Cc: David Rientjes Cc: Huang Ying Cc: Dan Williams Cc: David Hildenbrand Cc: osalvador -- Changes since 202010: * remove unused scan-control 'demoted' field --- b/include/linux/vm_event_ite

[PATCH 04/10] mm/migrate: make migrate_pages() return nr_succeeded

2021-04-01 Thread Dave Hansen
account how many pages are reclaimed (demoted) since page reclaim behavior depends on this. Add *nr_succeeded parameter to make migrate_pages() return how many pages are demoted successfully for all cases. Signed-off-by: Yang Shi Signed-off-by: Dave Hansen Reviewed-by: Yang Shi Cc: Wei Xu Cc

Re: [PATCH RESEND 0/3] x86/sgx: eextend ioctl

2021-04-01 Thread Dave Hansen
On 4/1/21 10:49 AM, Raoul Strackx wrote: > On 4/1/21 6:11 PM, Dave Hansen wrote: >> On 4/1/21 7:56 AM, Raoul Strackx wrote: >>> SOLUTION OF THIS PATCH >>> This patch adds a new ioctl to enable userspace to execute EEXTEND leaf >>> functions per 256 bytes of e

[PATCH 09/10] mm/vmscan: never demote for memcg reclaim

2021-04-01 Thread Dave Hansen
From: Dave Hansen Global reclaim aims to reduce the amount of memory used on a given node or set of nodes. Migrating pages to another node serves this purpose. memcg reclaim is different. Its goal is to reduce the total memory consumption of the entire memcg, across all nodes. Migration
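The distinction drawn above can be illustrated with a minimal userspace sketch in C. It assumes a toy memcg that only counts pages charged per NUMA node; `toy_memcg`, `toy_demote()`, and `toy_reclaim()` are hypothetical names and do not correspond to kernel functions.

```c
#include <assert.h>

#define TOY_NR_NODES	2

/* Hypothetical per-memcg accounting: pages charged on each NUMA node. */
struct toy_memcg {
	unsigned long pages[TOY_NR_NODES];
};

/* Total pages charged to the memcg across all nodes. */
static unsigned long toy_memcg_total(const struct toy_memcg *cg)
{
	unsigned long total = 0;
	int nid;

	for (nid = 0; nid < TOY_NR_NODES; nid++)
		total += cg->pages[nid];
	return total;
}

/* Demotion: move up to 'nr' pages between nodes; the charge stays. */
static unsigned long toy_demote(struct toy_memcg *cg, int from, int to,
				unsigned long nr)
{
	assert(from != to);
	if (nr > cg->pages[from])
		nr = cg->pages[from];
	cg->pages[from] -= nr;
	cg->pages[to] += nr;
	return nr;
}

/* Reclaim by discard: free up to 'nr' pages on a node; the charge drops. */
static unsigned long toy_reclaim(struct toy_memcg *cg, int node,
				 unsigned long nr)
{
	if (nr > cg->pages[node])
		nr = cg->pages[node];
	cg->pages[node] -= nr;
	return nr;
}
```

In this model, demoting 30 pages from node 0 to node 1 relieves node 0 but leaves `toy_memcg_total()` unchanged, whereas reclaiming 30 pages lowers the total. Only the second helps when the goal is to shrink the memcg as a whole.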

[PATCH 01/10] mm/numa: node demotion data structure and lookup

2021-04-01 Thread Dave Hansen
From: Dave Hansen Prepare for the kernel to auto-migrate pages to other memory nodes with a user-defined node migration table. This allows creating a single migration target for each NUMA node to enable the kernel to do NUMA page migrations instead of simply reclaiming colder pages. A node

[PATCH 07/10] mm/vmscan: add helper for querying ability to age anonymous pages

2021-04-01 Thread Dave Hansen
From: Dave Hansen Anonymous pages are kept on their own LRU(s). These lists could theoretically always be scanned and maintained. But, without swap, there is currently nothing the kernel can *do* with the results of a scanned, sorted LRU for anonymous pages. A check for '!total_swap_pages

[PATCH 05/10] mm/migrate: demote pages during reclaim

2021-04-01 Thread Dave Hansen
From: Dave Hansen This is mostly derived from a patch from Yang Shi: https://lore.kernel.org/linux-mm/1560468577-101178-10-git-send-email-yang@linux.alibaba.com/ Add code to the reclaim path (shrink_page_list()) to "demote" data to another NUMA node instead of

[PATCH 08/10] mm/vmscan: Consider anonymous pages without swap

2021-04-01 Thread Dave Hansen
context *can* actually be reclaimed, given current swap space and cgroup limits anon_should_be_aged() is a much simpler and more preliminary check which just says whether there is a possibility of future reclaim. #Signed-off-by: Keith Busch Cc: Keith Busch Signed-off-by: Dave Hansen Reviewed

Re: [PATCH RESEND 0/3] x86/sgx: eextend ioctl

2021-04-01 Thread Dave Hansen
On 4/1/21 7:56 AM, Raoul Strackx wrote: > > SOLUTION OF THIS PATCH > This patch adds a new ioctl to enable userspace to execute EEXTEND leaf > functions per 256 bytes of enclave memory. This enables enclaves to be > build as specified by enclave providers. I think tying the user ABI to the SGX

Re: [PATCH v4 1/1] x86/tdx: Handle MWAIT, MONITOR and WBINVD

2021-03-31 Thread Dave Hansen
On 3/31/21 8:28 PM, Andi Kleen wrote: >> The hardware (and VMMs and SEAM) have ways of telling the guest kernel >> what is supported: CPUID. If it screws up, and the guest gets an >> unexpected #VE, so be it. > The main reason for disabling stuff is actually that we don't need > to harden it. All

Re: [PATCH v4 1/1] x86/tdx: Handle MWAIT, MONITOR and WBINVD

2021-03-31 Thread Dave Hansen
On 3/31/21 3:28 PM, Kuppuswamy, Sathyanarayanan wrote: > > On 3/31/21 3:11 PM, Dave Hansen wrote: >> On 3/31/21 3:06 PM, Sean Christopherson wrote: >>> I've no objection to a nice message in the #VE handler.  What I'm >>> objecting to >>> is sanity check

Re: [PATCH v4 1/1] x86/tdx: Handle MWAIT, MONITOR and WBINVD

2021-03-31 Thread Dave Hansen
On 3/31/21 3:06 PM, Sean Christopherson wrote: > I've no objection to a nice message in the #VE handler. What I'm objecting to > is sanity checking the CPUID model provided by the TDX module. If we don't > trust the TDX module to honor the spec, then there are a huge pile of things > that are

Re: [PATCH v4 1/1] x86/tdx: Handle MWAIT, MONITOR and WBINVD

2021-03-31 Thread Dave Hansen
On 3/31/21 2:53 PM, Sean Christopherson wrote: > On Wed, Mar 31, 2021, Kuppuswamy Sathyanarayanan wrote: >> Changes since v3: >> * WARN user if SEAM does not disable MONITOR/MWAIT instruction. > Why bother? There are a whole pile of features that are dictated by the TDX > module spec.

Re: [PATCH v4 1/1] x86/tdx: Handle MWAIT, MONITOR and WBINVD

2021-03-31 Thread Dave Hansen
On 3/31/21 2:09 PM, Kuppuswamy Sathyanarayanan wrote: > As per Guest-Host Communication Interface (GHCI) Specification > for Intel TDX, sec 2.4.1, TDX architecture does not support > MWAIT, MONITOR and WBINVD instructions. So in non-root TDX mode, > if MWAIT/MONITOR instructions are executed with

Re: [PATCH V5 08/10] x86/entry: Preserve PKRS MSR across exceptions

2021-03-31 Thread Dave Hansen
On 3/31/21 12:14 PM, ira.we...@intel.com wrote: > + * To protect against exceptions having access to this memory we save the > + * current running value and sets the PKRS value to be used during the > + * exception. This series seems to have grown some "we's". The preexisting pkey code was not

Re: [PATCH RESEND 0/3] x86/sgx: eextend ioctl

2021-03-31 Thread Dave Hansen
On 3/31/21 5:50 AM, Raoul Strackx wrote: > The sgx driver can only load enclaves whose pages are fully measured. > This may exclude existing enclaves from running. This patch adds a > new ioctl to measure 256 byte chunks at a time. The changelogs here are pretty sparse. Could you explain in a

Re: Candidate Linux ABI for Intel AMX and hypothetical new related features

2021-03-30 Thread Dave Hansen
On 3/30/21 10:56 AM, Len Brown wrote: > On Tue, Mar 30, 2021 at 1:06 PM Andy Lutomirski wrote: >>> On Mar 30, 2021, at 10:01 AM, Len Brown wrote: >>> Is it required (by the "ABI") that a user program has everything >>> on the stack for user-space XSAVE/XRESTOR to get back >>> to the state of the

Re: [PATCH v1 1/1] x86/tdx: Handle MWAIT, MONITOR and WBINVD

2021-03-30 Thread Dave Hansen
On 3/30/21 8:00 AM, Andi Kleen wrote: >>> + /* MWAIT is not supported in TDX platform, so suppress it */ >>> + setup_clear_cpu_cap(X86_FEATURE_MWAIT); >> In fact, MWAIT bit returned by CPUID instruction is zero for TD guest. This >> is enforced by SEAM module. > Good point. >> Do we still need

Re: [PATCH v3 1/1] x86/tdx: Handle MWAIT, MONITOR and WBINVD

2021-03-29 Thread Dave Hansen
On 3/29/21 4:16 PM, Kuppuswamy Sathyanarayanan wrote: > In non-root TDX guest mode, MWAIT, MONITOR and WBINVD instructions > are not supported. So handle #VE due to these instructions > appropriately. This misses a key detail: "are not supported" ... and other patches have prevented a

Re: [PATCH v2 1/1] x86/tdx: Handle MWAIT, MONITOR and WBINVD

2021-03-29 Thread Dave Hansen
On 3/29/21 3:09 PM, Kuppuswamy, Sathyanarayanan wrote: > + case EXIT_REASON_MWAIT_INSTRUCTION: > + /* MWAIT is suppressed, not supposed to reach here. */ > + WARN(1, "MWAIT unexpected #VE Exception\n"); > + return -EFAULT; How is MWAIT

Re: [PATCH v2 1/1] x86/tdx: Handle MWAIT, MONITOR and WBINVD

2021-03-29 Thread Dave Hansen
On 3/29/21 2:55 PM, Kuppuswamy, Sathyanarayanan wrote: >> >> MONITOR is a privileged instruction, right? So we can only end up in >> here if the kernel screws up and isn't reading CPUID correctly, right? >> >> That doesn't seem to me like something we want to suppress. This needs >> a warning,

Re: I915 CI-run with kfence enabled, issues found

2021-03-29 Thread Dave Hansen
On 3/29/21 10:45 AM, Marco Elver wrote: > On Mon, 29 Mar 2021 at 19:32, Dave Hansen wrote: > Doing it to all CPUs is too expensive, and we can tolerate this being > approximate (nothing bad will happen, KFENCE might just miss a bug and > that's ok). ... >> BTW,
