On 5/17/24 04:06, Dmitrii Kuvaiskii wrote:
...
First, why is SGX so special here? How is the SGX problem different
than what the core mm code does?
> --- a/arch/x86/kernel/cpu/sgx/encl.h
> +++ b/arch/x86/kernel/cpu/sgx/encl.h
> @@ -25,6 +25,9 @@
> /* 'desc' bit marking that the page is being
On 5/17/24 04:06, Dmitrii Kuvaiskii wrote:
> We wrote a trivial stress test to reproduce the hangs observed in
> real-world applications. The test stresses #PF-based page allocation and
> SGX_IOC_ENCLAVE_REMOVE_PAGES flows in the SGX driver:
This seems like something we'd want in the kernel SGX
On 5/23/24 11:39, Jürgen Groß wrote:
>>
>> Let's just keep it simple. How about the attached patch?
>
> Simple indeed. The attachment is empty.
Let's try this again.
diff --git a/arch/x86/kernel/paravirt.c b/arch/x86/kernel/paravirt.c
index 5358d43886ad..c193c9e60a1b 100644
---
On 5/16/24 06:02, Chen Yu wrote:
> Performance drop is reported when running encode/decode workload and
> BenchSEE cache sub-workload.
> Bisect points to commit ce0a1b608bfc ("x86/paravirt: Silence unused
> native_pv_lock_init() function warning"). When CONFIG_PARAVIRT_SPINLOCKS
> is disabled the
On 5/15/24 06:54, Jarkko Sakkinen wrote:
> I'd cut out 90% of the description out and just make the argument of
> the wrong error code, and done. The sequence is great for showing
> how this could happen. The prose makes my head hurt tbh.
The changelog is too long, but not fatally so. I'd much
On 5/10/24 12:06, Dongli Zhang wrote:
> } else {
> + /*
> + * This call borrows from the comments and implementation
> + * of apic_update_vector(): "If the target CPU is offline
> + * then the regular release mechanism via the cleanup
> +
On 4/26/24 07:18, Bojun Zhu wrote:
> for (c = 0 ; c < modp->length; c += PAGE_SIZE) {
> + if (sgx_check_signal_and_resched()) {
> + if (!c)
> + ret = -ERESTARTSYS;
> +
> + goto out;
> + }
This
On 4/16/24 07:15, Jarkko Sakkinen wrote:
> On Tue Apr 16, 2024 at 8:42 AM EEST, Huang, Kai wrote:
> Yes, exactly. I'd take one week break and cycle the kselftest part
> internally a bit as I said my previous response. I'm sure that there
> is experise inside Intel how to implement it properly.
On 3/30/24 04:23, Jarkko Sakkinen wrote:
>>> I also wonder is cgroup-tools dependency absolutely required or could
>>> you just have a function that would interact with sysfs?
>> I should have checked email before hit the send button for v10 .
>>
>> It'd be more complicated and less readable to
On 2/26/24 14:34, Huang, Kai wrote:
> So I am trying to get the actual downside of doing per-cgroup reclaim or
> the full reason that we choose global reclaim.
Take the most extreme example:
	while (hit_global_sgx_limit())
		reclaim_from_this(cgroup);
You eventually end up
On 2/26/24 14:24, Huang, Kai wrote:
> What is the downside of doing per-group reclaim when try_charge()
> succeeds for the enclave but failed to allocate EPC page?
>
> Could you give an complete answer why you choose to use global reclaim
> for the above case?
There are literally two different
On 2/26/24 13:48, Haitao Huang wrote:
> In case of overcomitting, i.e., sum of limits greater than the EPC
> capacity, if one group has a fault, and its usage is not above its own
> limit (try_charge() passes), yet total usage of the system has exceeded
> the capacity, whether we do global reclaim
On 2/26/24 03:36, Huang, Kai wrote:
>> In case of overcomitting, even if we always reclaim from the same cgroup
>> for each fault, one group may still interfere the other: e.g., consider an
>> extreme case in that group A used up almost all EPC at the time group B
>> has a fault, B has to
On 2/19/24 07:39, Haitao Huang wrote:
> Remove all boolean parameters for 'reclaim' from the function
> sgx_alloc_epc_page() and its callers by making two versions of each
> function.
>
> Also opportunistically remove non-static declaration of
> __sgx_alloc_epc_page() and a typo
>
>
On 2/16/24 13:38, Haitao Huang wrote:
> On Fri, 16 Feb 2024 09:15:59 -0600, Dave Hansen
> wrote:
...
>> Does this 'indirect' change any behavior other than whether it does a
>> search for an mm to find a place to charge the backing storage?
>
> No.
>
>> Inst
On 2/5/24 13:06, Haitao Huang wrote:
> @@ -414,7 +416,7 @@ static void sgx_reclaim_pages_global(void)
> void sgx_reclaim_direct(void)
> {
> if (sgx_should_reclaim(SGX_NR_LOW_PAGES))
> - sgx_reclaim_pages_global();
> + sgx_reclaim_pages_global(false);
> }
>
>
On 2/5/24 13:06, Haitao Huang wrote:
> static struct mem_cgroup *sgx_encl_get_mem_cgroup(struct sgx_encl *encl)
> {
> @@ -1003,14 +1001,6 @@ static struct mem_cgroup
> *sgx_encl_get_mem_cgroup(struct sgx_encl *encl)
> struct sgx_encl_mm *encl_mm;
> int idx;
>
> - /*
> - *
There's very little about how the LRU design came to be in this cover
letter. Let's add some details.
How's this?
Writing this up, I'm a lot more convinced that this series is, in
general, taking the right approach. I honestly don't see any other
alternatives. As much as I'd love to do
On 10/30/23 11:20, Haitao Huang wrote:
> @@ -527,16 +530,13 @@ void sgx_mark_page_reclaimable(struct sgx_epc_page
> *page)
> int sgx_unmark_page_reclaimable(struct sgx_epc_page *page)
> {
> spin_lock(&sgx_global_lru.lock);
> - if (page->flags & SGX_EPC_PAGE_RECLAIMER_TRACKED) {
> -
On 1/4/24 11:11, Haitao Huang wrote:
> If those are OK with users and also make it acceptable for merge
> quickly, I'm happy to do that
How about we put some actual numbers behind this? How much complexity
are we talking about here? What's the diffstat for the utterly
bare-bones
On 12/18/23 13:24, Haitao Huang wrote:
> @Dave and @Michal, Your thoughts? Or could you confirm we should not
> do reclaim per cgroup at all?
What's the benefit of doing reclaim per cgroup? Is that worth the extra
complexity?
The key question here is whether we want the SGX VM to be complex and
On 11/8/23 12:31, Jo Van Bulck wrote:
> Just a kind follow-up: from what I can see, this series has not been
> merged into the x86/sgx branch of tip yet (assuming that's where it
> should go next)?
>
> Apologies if I've overlooked anything, and please let me know if there's
> something on my end
On 10/18/23 08:26, Haitao Huang wrote:
> Maybe not in sense of killing something. My understanding memory.reclaim
> does not necessarily invoke the OOM killer. But what I really intend to
> say is we can have a separate knob for user to express the need for
> reducing the current usage explicitly
On 10/17/23 21:37, Haitao Huang wrote:
> Yes we can introduce misc.reclaim to give user a knob to forcefully
> reducing usage if that is really needed in real usage. The semantics
> would make force-kill VMs explicit to user.
Do any other controllers do something like this? It seems odd.
On 9/28/23 16:08, Reinette Chatre wrote:
> I'd like to check in on the status of this patch. This two month old
> patch looks to be a needed fix and has Jarkko and Kai's review tags,
> but I am not able to find it queued or merged in tip or upstream.
> Apologies if I did not look in the right
On 9/14/23 03:31, Huang, Kai wrote:
>> Signed-off-by: Haitao Huang
>> Cc: Sean Christopherson
> You don't need 'Cc:' Sean if the patch has Sean's SoB.
It is a SoB for Sean's @intel address and cc's his @google address.
It is fine.
On 9/11/23 11:27, Justin Stitt wrote:
> `strncpy` is deprecated and we should prefer more robust string apis.
I dunno. It actually seems like a pretty good fit here.
> In this case, `message.str` is not expected to be NUL-terminated as it
> is simply a buffer of characters residing in a union
s or negative error code on a failure to perform
> + * the cache maintenance.
> + */
WBINVD is a scary beast. But, there's also no better alternative in the
architecture. I don't think any of my comments above are deal breakers,
so from the x86 side:
Acked-by: Dave Hansen
On 4/20/21 4:12 PM, Kuppuswamy, Sathyanarayanan wrote:
> On 4/20/21 12:59 PM, Dave Hansen wrote:
>> On 4/20/21 12:20 PM, Kuppuswamy, Sathyanarayanan wrote:
>>>>> approach is, it adds a few extra instructions for every
>>>>> TDCALL use case when compared to
On 4/20/21 12:20 PM, Kuppuswamy, Sathyanarayanan wrote:
>>> approach is, it adds a few extra instructions for every
>>> TDCALL use case when compared to distributed checks. Although
>>> it's a bit less efficient, it's worth it to make the code more
>>> readable.
>>
>> What's a "distributed check"?
On 3/26/21 4:38 PM, Kuppuswamy Sathyanarayanan wrote:
> Implement common helper functions to communicate with
> the TDX Module and VMM (using TDCALL instruction).
This is missing any kind of background. I'd say:
Guests communicate with VMMs with hypercalls. Historically, these are
implemented
On 4/19/21 11:10 AM, Andy Lutomirski wrote:
> I’m confused by this scenario. This should only affect physical pages
> that are in the 2M area that contains guest memory. But, if we have a
> 2M direct map PMD entry that contains kernel data and guest private
> memory, we’re already in a situation
On 4/19/21 10:46 AM, Brijesh Singh wrote:
> - guest wants to make gpa 0x1000 as a shared page. To support this, we
> need to psmash the large RMP entry into 512 4K entries. The psmash
> instruction breaks the large RMP entry into 512 4K entries without
> affecting the previous validation. Now the
On 4/16/21 8:40 AM, Kirill A. Shutemov wrote:
> /*
> - * If SME is active, the trampoline area will need to be in
> - * decrypted memory in order to bring up other processors
> + * If SME or KVM memory protection is active, the trampoline area will
> + * need to be in
On 4/16/21 8:40 AM, Kirill A. Shutemov wrote:
> Mirror SEV, use SWIOTLB always if KVM memory protection is enabled.
...
> arch/x86/mm/mem_encrypt.c | 44 ---
> arch/x86/mm/mem_encrypt_common.c | 48 ++
The changelog needs to at least
On 4/16/21 5:35 AM, Michal Hocko wrote:
> I have to confess that I haven't grasped the initialization
> completely. There is a nice comment explaining a 2 socket system with
> 3 different NUMA nodes attached to it with one node being terminal.
> This is OK if the terminal node is PMEM but
On 4/15/21 9:24 AM, Andy Lutomirski wrote:
> In the patches, *as submitted*, if you trip the XFD #NM *once* and you
> are the only thread on the system to do so, you will eat the cost of a
> WRMSR on every subsequent context switch.
I think you're saying: If a thread trips XFD #NM *once*, every
On 4/14/21 9:07 PM, Wei Xu wrote:
> On Wed, Apr 14, 2021 at 1:08 AM Oscar Salvador wrote:
>> Fast class/memory are pictured as those nodes with CPUs, while Slow
>> class/memory
>> are PMEM, right?
>> Then, what stands for medium class/memory?
>
> That is Dave's example. I think David's guess
On 4/14/21 8:51 AM, Sean Christopherson wrote:
>> Could this access to and kfree of encl_mm possibly be after the
>> kfree(encl_mm) noted above?
> No, the mmu_notifier_unregister() ensures that all in-progress notifiers
> complete
> before it returns, i.e. SGX's notifier call back is not
On 3/1/21 11:51 PM, Bard Liao wrote:
> +++ b/drivers/soundwire/dmi-quirks.c
> @@ -0,0 +1,66 @@
> +// SPDX-License-Identifier: (GPL-2.0 OR BSD-3-Clause)
> +// Copyright(c) 2021 Intel Corporation.
It looks like this is already in intel-next, so this may be moot. But,
is there a specific reason
On 2/12/21 2:22 PM, mgr...@linux.intel.com wrote:
> +# SPDX-License-Identifier: (GPL-2.0 OR BSD-2-Clause)
> +# Copyright (c) Intel Corporation. All rights reserved.
> +%YAML 1.2
> +---
> +$id: "http://devicetree.org/schemas/misc/intel,keembay-xlink.yaml#"
> +$schema:
On 4/12/21 8:58 AM, Jethro Beekman wrote:
> On 2021-04-12 17:36, Dave Hansen wrote:
>> On 4/12/21 1:59 AM, Raoul Strackx wrote:
>>> This patch set adds a new ioctl to enable userspace to execute EEXTEND
>>> leaf functions per 256 bytes of enclave memory. With thi
On 4/12/21 9:41 AM, Jethro Beekman wrote:
> Yes this still doesn't let one execute all possible ECREATE, EADD, EEXTEND,
> EINIT sequences.
OK, so we're going in circles now.
I don't believe we necessarily *WANT* or need Linux to support "all
possible ECREATE, EADD, EEXTEND, EINIT sequences".
On 4/12/21 1:59 AM, Raoul Strackx wrote:
> This patch set adds a new ioctl to enable userspace to execute EEXTEND
> leaf functions per 256 bytes of enclave memory. With this patch in place,
> Linux will be able to build all valid SGXv1 enclaves.
This didn't cover why we need a *NEW* ABI for this
On 4/8/21 11:17 AM, Oscar Salvador wrote:
> --- a/mm/page_alloc.c
> +++ b/mm/page_alloc.c
> @@ -8490,7 +8490,8 @@ static int __alloc_contig_migrate_range(struct
> compact_control *cc,
> cc->nr_migratepages -= nr_reclaimed;
>
> ret = migrate_pages(&cc->migratepages,
On 4/8/21 1:26 AM, Oscar Salvador wrote:
> On Thu, Apr 01, 2021 at 11:32:19AM -0700, Dave Hansen wrote:
>> The protocol for node_demotion[] access and writing is not
>> standard. It has no specific locking and is intended to be read
>> locklessly. Readers must take ca
On 4/8/21 1:03 AM, Oscar Salvador wrote:
> I think this patch and patch#2 could be squashed
>
> Reviewed-by: Oscar Salvador
Yeah, that makes a lot of sense. I'll do that for the next version.
On 4/8/21 8:27 AM, Jethro Beekman wrote:
> But the native “executable format” for SGX is very clearly defined in
> the Intel SDM as a specific sequence of ECREATE, EADD, EEXTEND and
> EINIT calls. It's that sequence that's used for loading the enclave
> and it's that sequence that's used for
On 4/6/21 9:31 AM, Kirill A. Shutemov wrote:
> On Thu, Apr 01, 2021 at 02:01:15PM -0700, Dave Hansen wrote:
>>> @@ -1977,8 +1978,8 @@ static int __set_memory_enc_dec(unsigned long addr,
>>> int numpages, bool enc)
>>> struct cpa_data cpa;
>>>
On 4/6/21 9:00 AM, Kirill A. Shutemov wrote:
>>> --- a/arch/x86/mm/ioremap.c
>>> +++ b/arch/x86/mm/ioremap.c
>>> @@ -87,12 +87,12 @@ static unsigned int __ioremap_check_ram(struct resource
>>> *res)
>>> }
>>>
>>> /*
>>> - * In a SEV guest, NONE and RESERVED should not be mapped encrypted
On 4/6/21 8:54 AM, Kirill A. Shutemov wrote:
> On Thu, Apr 01, 2021 at 01:13:16PM -0700, Dave Hansen wrote:
>>> @@ -56,6 +61,9 @@ static void tdx_get_info(void)
>>>
>>> td_info.gpa_width = rcx & GENMASK(5, 0);
>>> td_info.attributes = rd
On 4/6/21 8:37 AM, Kirill A. Shutemov wrote:
> On Thu, Apr 01, 2021 at 01:06:29PM -0700, Dave Hansen wrote:
>> On 2/5/21 3:38 PM, Kuppuswamy Sathyanarayanan wrote:
>>> From: "Kirill A. Shutemov"
>>>
>>> Intel TDX doesn't allow VMM to access
On 4/6/21 12:44 AM, David Hildenbrand wrote:
> On 02.04.21 17:26, Kirill A. Shutemov wrote:
>> TDX architecture aims to provide resiliency against confidentiality and
>> integrity attacks. Towards this goal, the TDX architecture helps enforce
>> the enabling of memory integrity for all TD-private
On 4/5/21 1:37 PM, Rick Edgecombe wrote:
> +static void __dispose_pages(struct list_head *head)
> +{
> + struct list_head *cur, *next;
> +
> + list_for_each_safe(cur, next, head) {
> + list_del(cur);
> +
> + /* The list head is stored at the start of the page */
> +
It occurred to me that I've been doing a lot of digging in the TDX spec
lately. I think we can all agree that the "Architecture Specification"
is not the world's easiest, most digestible reading. It's hard to
figure out what the Linux relation to the spec is.
One bit of Documentation we need
On 4/2/21 2:32 PM, Andi Kleen wrote:
>> If we go this route, what are the rules and restrictions? Do we have to
>> say "no MMIO in #VE"?
>
> All we have to say is "No MMIO in #VE before getting thd TDVEINFO arguments"
> After that it can nest without problems.
Well, not exactly. You still
On 4/2/21 1:20 PM, Jethro Beekman wrote:
> On 2021-04-02 21:50, Dave Hansen wrote:
>> Again, how does this save space?
>>
>> Are you literally talking about the temporary cost of allocating *one* page?
>
> No I'm talking about the amount of disk space/network tra
On 4/2/21 12:38 PM, Jethro Beekman wrote:
> On 2021-04-02 20:42, Dave Hansen wrote:
>> On 4/2/21 11:31 AM, Jethro Beekman wrote:
>>> On 2021-04-02 17:53, Dave Hansen wrote:
>>>> But, why would an enclave loader application ever do this?
>>>
>>> e.
On 4/2/21 11:31 AM, Jethro Beekman wrote:
> On 2021-04-02 17:53, Dave Hansen wrote:
>> On 4/2/21 1:38 AM, Jethro Beekman wrote:
>>>> So, we're talking here about pages that have been EEADDED, but for
>>>> which we do not want to include the entire contents of
On 4/2/21 1:38 AM, Jethro Beekman wrote:
>> So, we're talking here about pages that have been EEADDED, but for
>> which we do not want to include the entire contents of the page?
>> Do these contents always include the beginning of the page, or can
>> the holes be anywhere?
> Holes can be
On 4/1/21 7:48 PM, Andi Kleen wrote:
>> I've heard things like "we need to harden the drivers" or "we need to do
>> audits" and that drivers might be "whitelisted".
>
> The basic driver allow listing patches are already in the repository,
> but not currently posted or complete:
>
>
On 2/5/21 3:38 PM, Kuppuswamy Sathyanarayanan wrote:
> Intel's Trust Domain Extensions (TDX) protect guest VMs from malicious
> hosts and some physical attacks. This series adds the bare-minimum
> support to run a TDX guest. The host-side support will be submitted
> separately. Also support for
On 4/1/21 3:35 PM, Wei Xu wrote:
> A small suggestion: Given that migrate_pages() requires that
> *nr_succeeded should be initialized to 0 when it is called due to its
> use of *nr_succeeded in count_vm_events() and trace_mm_migrate_pages(),
> it would be less error-prone if migrate_pages()
On 4/1/21 1:01 PM, Yang Shi wrote:
> On Thu, Apr 1, 2021 at 11:35 AM Dave Hansen
> wrote:
>>
>>
>> From: Dave Hansen
>>
>> This is mostly derived from a patch from Yang Shi:
>>
>>
>> https://lore.kernel.org/linux-mm/15604685
On 4/1/21 3:26 PM, Sean Christopherson wrote:
> On Thu, Apr 01, 2021, Dave Hansen wrote:
>> On 2/5/21 3:38 PM, Kuppuswamy Sathyanarayanan wrote:
>>> From: "Kirill A. Shutemov"
>>>
>>> Handle #VE due to MMIO operations. MMIO triggers #VE with EPT_VIOL
On 4/1/21 2:15 PM, Kuppuswamy, Sathyanarayanan wrote:
> On 4/1/21 2:08 PM, Dave Hansen wrote:
>> On 2/5/21 3:38 PM, Kuppuswamy Sathyanarayanan wrote:
>>> +bool is_tdx_guest(void)
>>> +{
>>> + return static_cpu_has(X86_FEATURE_TDX_GUEST);
>>> +}
>
On 2/5/21 3:38 PM, Kuppuswamy Sathyanarayanan wrote:
> From: "Kirill A. Shutemov"
>
> TDX doesn't allow to perform DMA access to guest private memory.
> In order for DMA to work properly in TD guest, user SWIOTLB bounce
> buffers.
>
> Move AMD SEV initialization into common code and adopt for
On 2/5/21 3:38 PM, Kuppuswamy Sathyanarayanan wrote:
> +bool is_tdx_guest(void)
> +{
> + return static_cpu_has(X86_FEATURE_TDX_GUEST);
> +}
Why do you need is_tdx_guest() as opposed to calling
cpu_feature_enabled(X86_FEATURE_TDX_GUEST) everywhere?
> +int tdx_map_gpa(phys_addr_t gpa, int numpages, bool private)
> +{
> + int ret, i;
> +
> + ret = __tdx_map_gpa(gpa, numpages, private);
> + if (ret || !private)
> + return ret;
> +
> + for (i = 0; i < numpages; i++)
> + tdx_accept_page(gpa + i*PAGE_SIZE);
On 2/5/21 3:38 PM, Kuppuswamy Sathyanarayanan wrote:
> From: "Kirill A. Shutemov"
>
> All ioremap()ed paged that are not backed by normal memory (NONE or
> RESERVED) have to be mapped as shared.
s/paged/pages/
> +/* Make the page accesable by VMM */
> +#define pgprot_tdx_shared(prot)
On 2/5/21 3:38 PM, Kuppuswamy Sathyanarayanan wrote:
> From: "Kirill A. Shutemov"
>
> tdx_shared_mask() returns the mask that has to be set in page table
> entry to make page shared with VMM.
Needs to be either:
has to be set in a page table entry
or
has to be set in page table
On 2/5/21 3:38 PM, Kuppuswamy Sathyanarayanan wrote:
> From: "Kirill A. Shutemov"
>
> Intel TDX doesn't allow VMM to access guest memory. Any memory that is
> required for communication with VMM suppose to be shared explicitly by
s/suppose to/must/
> setting the bit in page table entry. The
On 2/5/21 3:38 PM, Kuppuswamy Sathyanarayanan wrote:
> From: "Kirill A. Shutemov"
>
> Handle #VE due to MMIO operations. MMIO triggers #VE with EPT_VIOLATION
> exit reason.
>
> For now we only handle subset of instruction that kernel uses for MMIO
> oerations. User-space access triggers SIGBUS.
On 3/31/21 10:21 PM, Jarkko Sakkinen wrote:
> +#ifdef CONFIG_DEBUG_FS
> + debugfs_create_file("sgx_nr_all_pages", 0400, arch_debugfs_dir, NULL,
> + &sgx_nr_all_pages_fops);
> + debugfs_create_file("sgx_nr_free_pages", 0400, arch_debugfs_dir, NULL,
> +
From: Dave Hansen
When memory fills up on a node, memory contents can be
automatically migrated to another node. The biggest problems are
knowing when to migrate and to where the migration should be
targeted.
The most straightforward way to generate the "to where" list would
be
From: Dave Hansen
Reclaim-based migration is attempting to optimize data placement in
memory based on the system topology. If the system changes, so must
the migration ordering.
The implementation is conceptually simple and entirely unoptimized.
On any memory or CPU hotplug events, assume
I'm resending this because I forgot to cc the mailing lists on the
post yesterday. Sorry for the noise. Please reply to this series.
The full series is also available here:
https://github.com/hansendc/linux/tree/automigrate-20210331
which also includes some vm.zone_reclaim_mode sysctl
From: Dave Hansen
Some method is obviously needed to enable reclaim-based migration.
Just like traditional autonuma, there will be some workloads that
will benefit like workloads with more "static" configurations where
hot pages stay hot and cold pages stay cold. If pages come a
ges()
]
Signed-off-by: Yang Shi
Signed-off-by: Dave Hansen
Reviewed-by: Yang Shi
Cc: Wei Xu
Cc: David Rientjes
Cc: Huang Ying
Cc: Dan Williams
Cc: David Hildenbrand
Cc: osalvador
--
Changes since 202010:
* remove unused scan-control 'demoted' field
---
b/include/linux/vm_event_ite
account how many pages are reclaimed (demoted) since page
reclaim behavior depends on this. Add *nr_succeeded parameter to make
migrate_pages() return how many pages are demoted successfully for all
cases.
Signed-off-by: Yang Shi
Signed-off-by: Dave Hansen
Reviewed-by: Yang Shi
Cc: Wei Xu
Cc
On 4/1/21 10:49 AM, Raoul Strackx wrote:
> On 4/1/21 6:11 PM, Dave Hansen wrote:
>> On 4/1/21 7:56 AM, Raoul Strackx wrote:
>>> SOLUTION OF THIS PATCH
>>> This patch adds a new ioctl to enable userspace to execute EEXTEND leaf
>>> functions per 256 bytes of e
From: Dave Hansen
Global reclaim aims to reduce the amount of memory used on
a given node or set of nodes. Migrating pages to another
node serves this purpose.
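To make that concrete, here is a toy userspace model of the accounting. It is only a sketch, not kernel code: `struct memcg_usage`, `memcg_total()`, `migrate()` and the two-node layout are all hypothetical names made up for illustration. It shows that moving pages off a node lowers that node's usage, while the sum across all nodes stays the same.

```c
#include <assert.h>
#include <stddef.h>

#define NR_NODES 2	/* hypothetical two-node system */

/* Pages used by one hypothetical group, broken down per node. */
struct memcg_usage {
	unsigned long pages[NR_NODES];
};

/* Total consumption across all nodes. */
static unsigned long memcg_total(const struct memcg_usage *u)
{
	unsigned long total = 0;

	for (size_t n = 0; n < NR_NODES; n++)
		total += u->pages[n];
	return total;
}

/*
 * Move up to 'nr' pages from node 'from' to node 'to'.
 * Returns the number of pages actually moved.
 */
static unsigned long migrate(struct memcg_usage *u, int from, int to,
			     unsigned long nr)
{
	if (nr > u->pages[from])
		nr = u->pages[from];
	u->pages[from] -= nr;
	u->pages[to] += nr;
	return nr;
}
```

Starting from 100 pages on node 0 and 20 on node 1, `migrate(&u, 0, 1, 30)` leaves 70 and 50: node 0 has been relieved, but `memcg_total()` is still 120.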
memcg reclaim is different. Its goal is to reduce the
total memory consumption of the entire memcg, across all
nodes. Migration
From: Dave Hansen
Prepare for the kernel to auto-migrate pages to other memory nodes
with a user defined node migration table. This allows creating single
migration target for each NUMA node to enable the kernel to do NUMA
page migrations instead of simply reclaiming colder pages. A node
From: Dave Hansen
Anonymous pages are kept on their own LRU(s). These lists could
theoretically always be scanned and maintained. But, without swap,
there is currently nothing the kernel can *do* with the results of a
scanned, sorted LRU for anonymous pages.
A check for '!total_swap_pages
From: Dave Hansen
This is mostly derived from a patch from Yang Shi:
https://lore.kernel.org/linux-mm/1560468577-101178-10-git-send-email-yang@linux.alibaba.com/
Add code to the reclaim path (shrink_page_list()) to "demote" data
to another NUMA node instead of
context *can* actually be reclaimed, given
current swap space and cgroup limits
anon_should_be_aged() is a much simpler and more preliminary check
which just says whether there is a possibility of future reclaim.
#Signed-off-by: Keith Busch
Cc: Keith Busch
Signed-off-by: Dave Hansen
Reviewed
On 4/1/21 7:56 AM, Raoul Strackx wrote:
>
> SOLUTION OF THIS PATCH
> This patch adds a new ioctl to enable userspace to execute EEXTEND leaf
> functions per 256 bytes of enclave memory. This enables enclaves to be
> build as specified by enclave providers.
I think tying the user ABI to the SGX
On 3/31/21 8:28 PM, Andi Kleen wrote:
>> The hardware (and VMMs and SEAM) have ways of telling the guest kernel
>> what is supported: CPUID. If it screws up, and the guest gets an
>> unexpected #VE, so be it.
> The main reason for disabling stuff is actually that we don't need
> to harden it. All
On 3/31/21 3:28 PM, Kuppuswamy, Sathyanarayanan wrote:
>
> On 3/31/21 3:11 PM, Dave Hansen wrote:
>> On 3/31/21 3:06 PM, Sean Christopherson wrote:
>>> I've no objection to a nice message in the #VE handler. What I'm
>>> objecting to
>>> is sanity check
On 3/31/21 3:06 PM, Sean Christopherson wrote:
> I've no objection to a nice message in the #VE handler. What I'm objecting to
> is sanity checking the CPUID model provided by the TDX module. If we don't
> trust the TDX module to honor the spec, then there are a huge pile of things
> that are
On 3/31/21 2:53 PM, Sean Christopherson wrote:
> On Wed, Mar 31, 2021, Kuppuswamy Sathyanarayanan wrote:
>> Changes since v3:
>> * WARN user if SEAM does not disable MONITOR/MWAIT instruction.
> Why bother? There are a whole pile of features that are dictated by the TDX
> module spec.
On 3/31/21 2:09 PM, Kuppuswamy Sathyanarayanan wrote:
> As per Guest-Host Communication Interface (GHCI) Specification
> for Intel TDX, sec 2.4.1, TDX architecture does not support
> MWAIT, MONITOR and WBINVD instructions. So in non-root TDX mode,
> if MWAIT/MONITOR instructions are executed with
On 3/31/21 12:14 PM, ira.we...@intel.com wrote:
> + * To protect against exceptions having access to this memory we save the
> + * current running value and sets the PKRS value to be used during the
> + * exception.
This series seems to have grown some "we's".
The preexisting pkey code was not
On 3/31/21 5:50 AM, Raoul Strackx wrote:
> The sgx driver can only load enclaves whose pages are fully measured.
> This may exclude existing enclaves from running. This patch adds a
> new ioctl to measure 256 byte chunks at a time.
The changelogs here are pretty sparse. Could you explain in a
On 3/30/21 10:56 AM, Len Brown wrote:
> On Tue, Mar 30, 2021 at 1:06 PM Andy Lutomirski wrote:
>>> On Mar 30, 2021, at 10:01 AM, Len Brown wrote:
>>> Is it required (by the "ABI") that a user program has everything
>>> on the stack for user-space XSAVE/XRESTOR to get back
>>> to the state of the
On 3/30/21 8:00 AM, Andi Kleen wrote:
>>> + /* MWAIT is not supported in TDX platform, so suppress it */
>>> + setup_clear_cpu_cap(X86_FEATURE_MWAIT);
>> In fact, MWAIT bit returned by CPUID instruction is zero for TD guest. This
>> is enforced by SEAM module.
> Good point.
>> Do we still need
On 3/29/21 4:16 PM, Kuppuswamy Sathyanarayanan wrote:
> In non-root TDX guest mode, MWAIT, MONITOR and WBINVD instructions
> are not supported. So handle #VE due to these instructions
> appropriately.
This misses a key detail:
"are not supported" ... and other patches have prevented a
On 3/29/21 3:09 PM, Kuppuswamy, Sathyanarayanan wrote:
> + case EXIT_REASON_MWAIT_INSTRUCTION:
> + /* MWAIT is supressed, not supposed to reach here. */
> + WARN(1, "MWAIT unexpected #VE Exception\n");
> + return -EFAULT;
How is MWAIT
On 3/29/21 2:55 PM, Kuppuswamy, Sathyanarayanan wrote:
>>
>> MONITOR is a privileged instruction, right? So we can only end up in
>> here if the kernel screws up and isn't reading CPUID correctly, right?
>>
>> That dosen't seem to me like something we want to suppress. This needs
>> a warning,
On 3/29/21 10:45 AM, Marco Elver wrote:
> On Mon, 29 Mar 2021 at 19:32, Dave Hansen wrote:
> Doing it to all CPUs is too expensive, and we can tolerate this being
> approximate (nothing bad will happen, KFENCE might just miss a bug and
> that's ok).
...
>> BTW,