Re: [PATCH v3 2/2] x86/sgx: Implement EUPDATESVN and opportunistically call it during first EPC page alloc

2025-05-06 Thread Nataliia Bondarevska
Tested-by: Nataliia Bondarevska 




On Fri, May 2, 2025 at 1:56 AM Jarkko Sakkinen  wrote:
>
> On Fri, 2025-05-02 at 07:22 +, Reshetova, Elena wrote:
> >
> > >
> > > On Wed, Apr 30, 2025 at 06:53:32AM +, Reshetova, Elena wrote:
> > > > 2. Switch to Sean's approach to execute EUPDATESVN during the
> > > sgx_open().
> > > > Btw, Sean do you agree that we don't gain much doing it second
> > > > time during
> > > > release() given the microcode flow?
> > > > I would rather leave only one invocation of eupdatesvn during
> > > sgx_inc_usage_count().
> > > >
> > > > Proc: No new uABI. More predictable on svn change compared to
> > > > option 1.
> > >
> > > > Cons: Two explicit paths to hook: sgx_open() and sgx_vepc_open().
> > >
> > > Why this is a con?
> >
> > Well, just from the pov of not having a single path to enable.
> > Are you ok with option 2?
> >
>
> Yep, as SGX is anyway very much run-time managed feature and these
> hooks fit better on how it is used.
>
> > Best Regards,
> > Elena.
>
> BR, Jarkko
>



Re: [PATCH v3 2/2] x86/sgx: Implement EUPDATESVN and opportunistically call it during first EPC page alloc

2025-05-02 Thread Jarkko Sakkinen
On Fri, 2025-05-02 at 07:22 +, Reshetova, Elena wrote:
> 
> > 
> > On Wed, Apr 30, 2025 at 06:53:32AM +, Reshetova, Elena wrote:
> > > 2. Switch to Sean's approach to execute EUPDATESVN during the
> > sgx_open().
> > > Btw, Sean do you agree that we don't gain much doing it second
> > > time during
> > > release() given the microcode flow?
> > > I would rather leave only one invocation of eupdatesvn during
> > sgx_inc_usage_count().
> > > 
> > > Proc: No new uABI. More predictable on svn change compared to
> > > option 1.
> > 
> > > Cons: Two explicit paths to hook: sgx_open() and sgx_vepc_open().
> > 
> > Why this is a con?
> 
> Well, just from the pov of not having a single path to enable. 
> Are you ok with option 2? 
> 

Yep, as SGX is anyway very much run-time managed feature and these
hooks fit better on how it is used.

> Best Regards,
> Elena.

BR, Jarkko




RE: [PATCH v3 2/2] x86/sgx: Implement EUPDATESVN and opportunistically call it during first EPC page alloc

2025-05-02 Thread Reshetova, Elena

> 
> On Wed, Apr 30, 2025 at 06:53:32AM +, Reshetova, Elena wrote:
> > 2. Switch to Sean's approach to execute EUPDATESVN during the
> sgx_open().
> > Btw, Sean do you agree that we don't gain much doing it second time during
> > release() given the microcode flow?
> > I would rather leave only one invocation of eupdatesvn during
> sgx_inc_usage_count().
> >
> > Proc: No new uABI. More predictable on svn change compared to option 1.
> 
> > Cons: Two explicit paths to hook: sgx_open() and sgx_vepc_open().
> 
> Why this is a con?

Well, just from the pov of not having a single path to enable. 
Are you ok with option 2? 

Best Regards,
Elena.


Re: [PATCH v3 2/2] x86/sgx: Implement EUPDATESVN and opportunistically call it during first EPC page alloc

2025-04-30 Thread Jarkko Sakkinen
On Wed, Apr 30, 2025 at 06:53:32AM +, Reshetova, Elena wrote:
> 2. Switch to Sean's approach to execute EUPDATESVN during the sgx_open().
> Btw, Sean do you agree that we don't gain much doing it second time during
> release() given the microcode flow?
> I would rather leave only one invocation of eupdatesvn during 
> sgx_inc_usage_count().
> 
> Proc: No new uABI. More predictable on svn change compared to option 1.

> Cons: Two explicit paths to hook: sgx_open() and sgx_vepc_open(). 

Why this is a con?

BR, Jarkko



RE: [PATCH v3 2/2] x86/sgx: Implement EUPDATESVN and opportunistically call it during first EPC page alloc

2025-04-29 Thread Reshetova, Elena
> On 4/29/25 04:44, Reshetova, Elena wrote:
> >> On 4/25/25 14:58, Sean Christopherson wrote:
> >>> On Fri, Apr 25, 2025, Dave Hansen wrote:
>  On 4/25/25 14:04, Sean Christopherson wrote:
> > Userspace is going to be waiting on ->release() no matter what.
>  Unless it isn't even involved and it happens automatically.
> >>> With my Google hat on: no thanks.
> >> I'm completely open to the idea of adding transparency so that folks can
> >> tell when the SVN increments. I'm mostly open to having a new ABI to do
> >> it explicitly in addition to doing it implicitly.]
> > Could you please clarify here Dave what ABI do you have in mind?
> 
> I should have said I'm open to *eventually* adding features and new ABI
> to prod at the SVN state.
> 
> Not now.
> 

OK, so in what direction should I prepare and send a proper v4?
Here are the options as I understand them:

1. Keep the current approach (with all suggested fixes) to execute EUPDATESVN
during first EPC page creation. 

Proc: Single flow inside EPC page allocation. No new uABI. 
Const: Rejecting EPC allocation if EUPDATESVN fails breaks existing behaviour
(note this can be change to original version of the patch where eupdatesvn
failures are ignored and EPC allocation can proceed normally)
More unpredictability to svn change compared to option 2. 

2. Switch to Sean's approach to execute EUPDATESVN during the sgx_open().
Btw, Sean do you agree that we don't gain much doing it second time during
release() given the microcode flow?
I would rather leave only one invocation of eupdatesvn during 
sgx_inc_usage_count().

Proc: No new uABI. More predictable on svn change compared to option 1.
Cons: Two explicit paths to hook: sgx_open() and sgx_vepc_open(). 

3. explicit uAPI to execute this

Proc: Allows clear explicit, targeted action to execute eupdatesvn, aligned with
instruction definition. 
Cons: deemed not acceptable to create new uABI for this usecase.  

I am personally learning towards option 2 (but without release flow).
Opinions? Sean? Dave? Jarkko? 

Best Regards,
Elena.


Re: [PATCH v3 2/2] x86/sgx: Implement EUPDATESVN and opportunistically call it during first EPC page alloc

2025-04-29 Thread Dave Hansen
On 4/29/25 04:44, Reshetova, Elena wrote:
>> On 4/25/25 14:58, Sean Christopherson wrote:
>>> On Fri, Apr 25, 2025, Dave Hansen wrote:
 On 4/25/25 14:04, Sean Christopherson wrote:
> Userspace is going to be waiting on ->release() no matter what.
 Unless it isn't even involved and it happens automatically.
>>> With my Google hat on: no thanks.
>> I'm completely open to the idea of adding transparency so that folks can
>> tell when the SVN increments. I'm mostly open to having a new ABI to do
>> it explicitly in addition to doing it implicitly.]
> Could you please clarify here Dave what ABI do you have in mind? 

I should have said I'm open to *eventually* adding features and new ABI
to prod at the SVN state.

Not now.





RE: [PATCH v3 2/2] x86/sgx: Implement EUPDATESVN and opportunistically call it during first EPC page alloc

2025-04-29 Thread Reshetova, Elena
 
> On 4/25/25 14:58, Sean Christopherson wrote:
> > On Fri, Apr 25, 2025, Dave Hansen wrote:
> >> On 4/25/25 14:04, Sean Christopherson wrote:
> >>> Userspace is going to be waiting on ->release() no matter what.
> >> Unless it isn't even involved and it happens automatically.
> > With my Google hat on: no thanks.
> 
> I'm completely open to the idea of adding transparency so that folks can
> tell when the SVN increments. I'm mostly open to having a new ABI to do
> it explicitly in addition to doing it implicitly.]

Could you please clarify here Dave what ABI do you have in mind? 





RE: [PATCH v3 2/2] x86/sgx: Implement EUPDATESVN and opportunistically call it during first EPC page alloc

2025-04-27 Thread Reshetova, Elena
> So then why on earth is the kernel implementing automatic updates?  I read
> back
> through most of the cover letters, and IIUC, we went straight from "destroy 
> all
> enclaves and force an update" to "blindly try to do EUPDATESVN every time
> the
> number of enclaves goes from 0=>1".  Those are essentially the two most
> extreme
> options.
> 
> Even worse, rejecting enclave creation on EUPDATESVN failure represents an
> ABI
> change, i.e. could break existing setups.
> 
> Why not simply add an ioctl() or sysfs knob to let userspace trigger
> EUPDATESVN?

Just for the record, this was my initial proposal on how to do this :)
So, I personally agree with this line of thinking fully. 

Best Regards,
Elena.


Re: [PATCH v3 2/2] x86/sgx: Implement EUPDATESVN and opportunistically call it during first EPC page alloc

2025-04-25 Thread Dave Hansen
On 4/25/25 14:58, Sean Christopherson wrote:
> On Fri, Apr 25, 2025, Dave Hansen wrote:
>> On 4/25/25 14:04, Sean Christopherson wrote:
>>> Userspace is going to be waiting on ->release() no matter what.
>> Unless it isn't even involved and it happens automatically.
> With my Google hat on: no thanks.

I'm completely open to the idea of adding transparency so that folks can
tell when the SVN increments. I'm mostly open to having a new ABI to do
it explicitly in addition to doing it implicitly.]

> Coupled with adding latency to launching the 0=>1 enclave, just to
> handle something that happens a few times per year, and I don't see
> any value in automatic updates. Maybe it sounds nice on paper, but
> from my perspective, I see nothing but pain.
I'm not worried about the latency until I see numbers. If the number
show it's horrible, then we either change course or go yell at the folks
who wrote the xucode.



Re: [PATCH v3 2/2] x86/sgx: Implement EUPDATESVN and opportunistically call it during first EPC page alloc

2025-04-25 Thread Sean Christopherson
On Fri, Apr 25, 2025, Dave Hansen wrote:
> On 4/25/25 14:04, Sean Christopherson wrote:
> > Userspace is going to be waiting on ->release() no matter what.
> 
> Unless it isn't even involved and it happens automatically.

With my Google hat on: no thanks.

Customer:   Hey Google, why haven't you applied security update XYZ?
Support:We have.
Customer:   The SVN in my attestation report says otherwise.
Support:Let me check with engineering.
TDX team:   We applied the ucode update provided by platforms.  Platforms, 
what's up?
Platforms:  That's the right ucode patch.
TDX team:   Hmm, the kernel is supposed to update the SVN.  Let's bug the 
kernel team.
Me: Have you guaranteed there are no active enclaves after the update?
TDX team:   Yep.
Me: 
Me: Our theory is that enclaves haven't been fully destroyed when the
hold is lifted.  Try adding a delay?  Maybe 1s?
TDX team:   That helped, but we still have intermittent failures.
Me: How about 5 seconds?
TDX team:   Great, that worked!
Support:Sorry for the delay, we're rolling out a fix, you should see the 
correct
SVN shortly.

Customer:   Hey Google, my TDX VMs are stalled for 5 seconds during boot.
Support:Let me check with engineering...


Is that likely to happen?  No.  Is a delay of multiple seconds likely?  Also no.
But it's not that far fetched.  And if something does go sideways, e.g. an EPC
page gets leaked, or enclave FD gets orphaned and left opened, etc., then I 
would
much, much prefer that the issue be visible to userspace.  Things going sideways
is inevitable; being able to take action when badness happens makes a world of
difference.

Coupled with adding latency to launching the 0=>1 enclave, just to handle 
something
that happens a few times per year, and I don't see any value in automatic 
updates.
Maybe it sounds nice on paper, but from my perspective, I see nothing but pain.



Re: [PATCH v3 2/2] x86/sgx: Implement EUPDATESVN and opportunistically call it during first EPC page alloc

2025-04-25 Thread Dave Hansen
On 4/25/25 14:04, Sean Christopherson wrote:
> Userspace is going to be waiting on ->release() no matter what.

Unless it isn't even involved and it happens automatically.





Re: [PATCH v3 2/2] x86/sgx: Implement EUPDATESVN and opportunistically call it during first EPC page alloc

2025-04-25 Thread Sean Christopherson
On Fri, Apr 25, 2025, Dave Hansen wrote:
> On 4/25/25 12:29, Sean Christopherson wrote:
> > --- a/arch/x86/kernel/cpu/sgx/virt.c
> > +++ b/arch/x86/kernel/cpu/sgx/virt.c
> > @@ -255,6 +255,7 @@ static int sgx_vepc_release(struct inode *inode, struct 
> > file *file)
> > xa_destroy(&vepc->page_array);
> > kfree(vepc);
> >  
> > +   sgx_dec_usage_count();
> > return 0;
> >  }
> 
> ->release() is not close(). Userspace doesn't have control over when
> release() gets called, so it's a poor thing to say: "wait until all SGX
> struct files have been released, then do EUPDATESVN". At least that's
> what folks have always told me when I went poking around the VFS.
> 
> That alone would make this a non-starter.

And what frees all the EPC pages?  Hint: it's starts with an 'r'.

Userspace is going to be waiting on ->release() no matter what.  There's no 
getting
around that, all enclaves and virtual EPC instances must be fully destroyed to
ensure the EPC is empty.

At least with this approach, userspace can know that the EPC is "busy", whereas
with automatic updates, userspace has zero visibility.  The only breadcrumb it
would get is the SVN, which presumably can only be retrieved from within an 
encave.
But even if userspace can easily read the SVN, unless userspace has a priori
knowledge of the SVN it expects, userspace has no way of knowing if the SVN 
isn't
changing because no update was required, or if the SVN isn't changing because
the kernel can't find a window where there's no active EPC page.

Exposing -EBUSY to userspace gives it a variety of options.  E.g. retry N times,
with M delay, and then force a reboot if all else fails.

If we wanted to be more explicit, it would be trivial to add a getter, but IMO
that's completely unnecessary, as it's fully redundant with the errno from the
setter.

E.g.

diff --git a/arch/x86/kernel/cpu/sgx/main.c b/arch/x86/kernel/cpu/sgx/main.c
index e8fcf032e842..4dc95d59c11f 100644
--- a/arch/x86/kernel/cpu/sgx/main.c
+++ b/arch/x86/kernel/cpu/sgx/main.c
@@ -958,14 +958,26 @@ static int sgx_update_svn(const char *buffer, const 
struct kernel_param *kp)
}
 }
 
+static int sgx_can_update_svn(char *buffer, const struct kernel_param *kp)
+{
+   if (!sgx_has_eupdatesvn)
+   return sysfs_emit(buffer, "unsupported\n");
+
+   if (atomic_read(&sgx_usage_count))
+   return sysfs_emit(buffer, "busy\n");
+
+   return sysfs_emit(buffer, "ready\n");
+}
+
 static const struct kernel_param_ops sgx_update_svn_ops = {
.set = sgx_update_svn,
+   .get = sgx_can_update_svn,
 };
 
 #undef MODULE_PARAM_PREFIX
 #define MODULE_PARAM_PREFIX "sgx."
-device_param_cb(update_svn, &sgx_update_svn_ops, NULL, 0200);
-__MODULE_PARM_TYPE(update_svn, "bool");
+device_param_cb(update_svn, &sgx_update_svn_ops, NULL, 0644);
+__MODULE_PARM_TYPE(update_svn, "int");
 
 static int __init sgx_init(void)
 {




Re: [PATCH v3 2/2] x86/sgx: Implement EUPDATESVN and opportunistically call it during first EPC page alloc

2025-04-25 Thread Dave Hansen
On 4/25/25 12:29, Sean Christopherson wrote:
> --- a/arch/x86/kernel/cpu/sgx/virt.c
> +++ b/arch/x86/kernel/cpu/sgx/virt.c
> @@ -255,6 +255,7 @@ static int sgx_vepc_release(struct inode *inode, struct 
> file *file)
>   xa_destroy(&vepc->page_array);
>   kfree(vepc);
>  
> + sgx_dec_usage_count();
>   return 0;
>  }

->release() is not close(). Userspace doesn't have control over when
release() gets called, so it's a poor thing to say: "wait until all SGX
struct files have been released, then do EUPDATESVN". At least that's
what folks have always told me when I went poking around the VFS.

That alone would make this a non-starter.



Re: [PATCH v3 2/2] x86/sgx: Implement EUPDATESVN and opportunistically call it during first EPC page alloc

2025-04-25 Thread Sean Christopherson
On Fri, Apr 25, 2025, Dave Hansen wrote:
> On 4/25/25 10:40, Sean Christopherson wrote:
> > So then why on earth is the kernel implementing automatic updates?
> 
> Because it's literally the least amount of code

It's literally not.

This series:

 4 files changed, 104 insertions(+), 20 deletions(-)

The below:

 7 files changed, 94 insertions(+), 15 deletions(-)

And that's not counting the extra code need to return something other than 
-ENOMEM
when EUPDATESVN fails.

> and doesn't create any new ABI.

It doesn't create new uAPI, but it certainly creates new ABI.  The SVN is 
visible
to both userspace, and when VMs are in play, the guest and the end customer.
Before this, userspace must explicitly reboot for the SVN to change.  After 
this,
the SVN will change at a completely opaque time.  Userspace can try to coerce
the kernel into performing an update, but it's not guaranteed to happen.

Even worse, rejecting EPC allocation if EUPDATESVN fails risks breaking existing
setups.

I verified writing the param works as expected, but otherwise untested.

--
From: Sean Christopherson 
Date: Fri, 25 Apr 2025 11:56:08 -0700
Subject: [PATCH] x86/sgx: Implement EUPDATESVN via "module" param

Add support for a new SGX ENCLS instruction, EUPDATESVN, which updates
the Security Version Number (SVN) that's capture in attestation reports.
To succeed, EUPDATESVN requires that there be no active pages in any EPC
section, i.e. there cannot be active enclaves.

Give userspace full control over when updates are attempted, as updates
are typically required only after a corresponding microcode update, e.g.
on the order of every few months or so, and EUPDATESVN is quite slow even
when the SVN isn't changing, i.e. triggering updates automatically isn't
desirable as doing so would negatively impact workloads that spawn enclaves
on-demand.

To ensure no EPC pages are active without introducing overhead in EPC
allocation, which is a hot path, count the number of active enclaves or
virtual EPC instances, and only attempt an update if there are no such
users.  Alternatively, the kernel could track the number of EPC pages that
are allocated,

Note, if the kernel or hardware is buggy and leaks or fails to reap an
EPC page, EUPDATESVN will unexpectedly fail.  But there's nothing that
the update flow can do to resolve such problems.

Signed-off-by: Sean Christopherson 
---
 arch/x86/include/asm/sgx.h   | 41 +++-
 arch/x86/kernel/cpu/sgx/driver.c |  1 +
 arch/x86/kernel/cpu/sgx/encl.c   |  2 ++
 arch/x86/kernel/cpu/sgx/encls.h  |  6 
 arch/x86/kernel/cpu/sgx/main.c   | 54 
 arch/x86/kernel/cpu/sgx/sgx.h|  3 ++
 arch/x86/kernel/cpu/sgx/virt.c   |  2 ++
 7 files changed, 94 insertions(+), 15 deletions(-)

diff --git a/arch/x86/include/asm/sgx.h b/arch/x86/include/asm/sgx.h
index 6a0069761508..4329187e085f 100644
--- a/arch/x86/include/asm/sgx.h
+++ b/arch/x86/include/asm/sgx.h
@@ -26,23 +26,26 @@
 #define SGX_CPUID_EPC_SECTION  0x1
 /* The bitmask for the EPC section type. */
 #define SGX_CPUID_EPC_MASK GENMASK(3, 0)
+/* EUPDATESVN support in CPUID.0x12.0.EAX */
+#define SGX_CPUID_EUPDATESVN   BIT(10)
 
 enum sgx_encls_function {
-   ECREATE = 0x00,
-   EADD= 0x01,
-   EINIT   = 0x02,
-   EREMOVE = 0x03,
-   EDGBRD  = 0x04,
-   EDGBWR  = 0x05,
-   EEXTEND = 0x06,
-   ELDU= 0x08,
-   EBLOCK  = 0x09,
-   EPA = 0x0A,
-   EWB = 0x0B,
-   ETRACK  = 0x0C,
-   EAUG= 0x0D,
-   EMODPR  = 0x0E,
-   EMODT   = 0x0F,
+   ECREATE = 0x00,
+   EADD= 0x01,
+   EINIT   = 0x02,
+   EREMOVE = 0x03,
+   EDGBRD  = 0x04,
+   EDGBWR  = 0x05,
+   EEXTEND = 0x06,
+   ELDU= 0x08,
+   EBLOCK  = 0x09,
+   EPA = 0x0A,
+   EWB = 0x0B,
+   ETRACK  = 0x0C,
+   EAUG= 0x0D,
+   EMODPR  = 0x0E,
+   EMODT   = 0x0F,
+   EUPDATESVN  = 0x18,
 };
 
 /**
@@ -73,6 +76,11 @@ enum sgx_encls_function {
  * public key does not match IA32_SGXLEPUBKEYHASH.
  * %SGX_PAGE_NOT_MODIFIABLE:   The EPC page cannot be modified because it
  * is in the PENDING or MODIFIED state.
+ * %SGX_INSUFFICIENT_ENTROPY:  Insufficient entropy in RNG.
+ * %SGX_EPC_NOT_READY: EPC is not ready for SVN update.
+ * %SGX_NO_UPDATE: EUPDATESVN was successful, but CPUSVN was not
+ * updated because current SVN was not newer than
+ * CPUSVN.
  * %SGX_UNMASKED_EVENT:An unmasked event, e.g. INTR, was 
received
  */
 enum sgx_return_code {
@@ -81,6 +89,9 @@ enum sgx_return_code {
SGX_CHILD_PRESENT   = 13,
SGX_INVALID_EINITTOKEN  = 16,
SGX_PAGE_NOT_MODIFIABLE = 20,
+   SGX

Re: [PATCH v3 2/2] x86/sgx: Implement EUPDATESVN and opportunistically call it during first EPC page alloc

2025-04-25 Thread Dave Hansen
On 4/25/25 10:40, Sean Christopherson wrote:
> So then why on earth is the kernel implementing automatic updates?

Because it's literally the least amount of code and doesn't create any
new ABI.

> I read back through most of the cover letters, and IIUC, we went
> straight from "destroy all enclaves and force an update" to "blindly
> try to do EUPDATESVN every time the number of enclaves goes from
> 0=>1".  Those are essentially the two most extreme options.
I'm sure we can think of a bunch more extreme things. How about after
every ENCLS? ;)



Re: [PATCH v3 2/2] x86/sgx: Implement EUPDATESVN and opportunistically call it during first EPC page alloc

2025-04-25 Thread Sean Christopherson
On Fri, Apr 25, 2025, Elena Reshetova wrote:
> > On Thu, Apr 24, 2025, Elena Reshetova wrote:
> > Userspace generally won't care about a 10us delay when destroying a
> > process, but a 10us delay to launch an enclave could be quite problematic,
> > e.g. in the TDX use case where enclaves may be launched on-demand in
> > response to a guest attestation request.
> 
> Ok, but in this case, you are hooking both: create and release. 
> I guess your line of thinking is that in most of the cases it will be handled 
> by
> a release flow when destroying enclaves, but in cases we happen to need
> to update a machine which doesn’t have a single enclave yet, the create flow
> will be used. The problem is that if you look at the instruction definition, 
> we won't save too much when executing this in release flow (Kai I think 
> already pointed
> this out):
> 
> IF (Other instruction is accessing EPC) THEN
>  RFLAGS.ZF := 1
>  RAX := SGX_LOCKFAIL;
>  GOTO ERROR_EXIT;
> FI
> (* Verify EPC is ready *)
> IF (the EPC contains any valid pages) THEN
>  RFLAGS.ZF := 1;
>  RAX := SGX_EPC_NOT_READY;
>  GOTO ERROR_EXIT;
> FI
> (* Refresh paging key *)  
> IF (NOT RDSEED(&TMP_KEY, 16)) THEN
>  RFLAGS.ZF := 1;
>  RAX := SGX_INSUFFICIENT_ENTROPY;
> GOTO ERROR_EXIT;
> FI
> (* Commit *)
> CR_BASE_KEY := TMP_KEY;
> TMP_CPUSVN := CR_CPUSVN;
> (* Update CPUSVN to current minimum patch even if locked *)
> (* Determine if info status is needed *)
> 
> 
> All above would be done anyhow on create even if we successfully
> executed it on release previously (( And then finally we go into:

Ah, so the slow part happens no matter what.

> IF (TMP_CPUSVN = CR_CPUSVN) THEN
>  RFLAGS.CF := 1;
>  RAX := SGX_NO_UPDATE;
> FI
> ERROR_EXIT:
> 
> > 
> > If the update time is tiny, then I agree that hooking release would 
> > probably do
> > more harm than good.
> 
> So, it is not that the time is tiny, it is like we will do it twice,
> unnecessary and potentially quite long in both cases (taking RDSEED step into
> account).
> 
> The reason why the instruction is defined this way is that it was not
> intended originally to be inserted into some existing SGX flows, but was
> envisioned to be standalone cmd for the host orchestrator to execute once it
> thinks system is ready. 

So then why on earth is the kernel implementing automatic updates?  I read back
through most of the cover letters, and IIUC, we went straight from "destroy all
enclaves and force an update" to "blindly try to do EUPDATESVN every time the
number of enclaves goes from 0=>1".  Those are essentially the two most extreme
options.

Even worse, rejecting enclave creation on EUPDATESVN failure represents an ABI
change, i.e. could break existing setups.

Why not simply add an ioctl() or sysfs knob to let userspace trigger EUPDATESVN?
If there are pages in the EPC, return -EBUSY.  That would give userspace full
control of the update policy, and would allow for a super simple implementation
in the kernel.  Userspace should darn well know when it's an appropriate time to
do an SVN update, e.g. after userspace has initiated a ucode update, 
periodically
if it wants to, after the last TDX VM is destroyed, etc.

Assuming TDX module updates are coming down the pipe, with my KVM maintainer hat
on, that is exactly how I will "request" the module be updated.  Userspace 
initiates
the update and is therefore responsible for knowing when to do (or not do) an 
update.
The kernel's job is purely to do the actual update and ensure 
correctness/safety,
e.g. reject the module update if there are active TDX VMs.

That is also how SEV firmware updates are being implemented.  Userspace is
responsible for stopping any VM types that aren't compatible with conccurent
updates.

I see no reason to do something different for SGX.



RE: [PATCH v3 2/2] x86/sgx: Implement EUPDATESVN and opportunistically call it during first EPC page alloc

2025-04-24 Thread Reshetova, Elena
> On Thu, Apr 24, 2025, Elena Reshetova wrote:
> > > On Thu, Apr 24, 2025, Elena Reshetova wrote:
> > > +void sgx_dec_usage_count(void)
> > > +{
> > > + if (atomic_dec_return(&sgx_usage_count))
> > > + return;
> > > +
> > > + guard(mutex)(&sgx_svn_lock);
> > > +
> > > + if (atomic_read(&sgx_usage_count))
> > > + return;
> > > +
> > > + sgx_update_svn();
> >
> > Why do we want to try to execute this on release also?  I would think that
> > doing this in sgx_inc_usage_count() is enough.
> 
> I assume an actual SVN update takes some amount of time?

Yes, correct, it is not a fast action and can be even slower if we are
running out of entropy and have to retry.

> 
> If that's correct, then doing the work upon destroying the last enclave is
> desirable,
> as it's less likely to introduce delay that negatively affects userspace.
> Userspace
> generally won't care about a 10us delay when destroying a process, but a
> 10us delay
> to launch an enclave could be quite problematic, e.g. in the TDX use case
> where
> enclaves may be launched on-demand in response to a guest attestation
> request.

Ok, but in this case, you are hooking both: create and release. 
I guess your line of thinking is that in most of the cases it will be handled by
a release flow when destroying enclaves, but in cases we happen to need
to update a machine which doesn’t have a single enclave yet, the create flow
will be used. The problem is that if you look at the instruction definition, 
we won't save too much when executing this in release flow (Kai I think already 
pointed
this out):

IF (Other instruction is accessing EPC) THEN
 RFLAGS.ZF := 1
 RAX := SGX_LOCKFAIL;
 GOTO ERROR_EXIT;
FI
(* Verify EPC is ready *)
IF (the EPC contains any valid pages) THEN
 RFLAGS.ZF := 1;
 RAX := SGX_EPC_NOT_READY;
 GOTO ERROR_EXIT;
FI
(* Refresh paging key *)  
IF (NOT RDSEED(&TMP_KEY, 16)) THEN
 RFLAGS.ZF := 1;
 RAX := SGX_INSUFFICIENT_ENTROPY;
GOTO ERROR_EXIT;
FI
(* Commit *)
CR_BASE_KEY := TMP_KEY;
TMP_CPUSVN := CR_CPUSVN;
(* Update CPUSVN to current minimum patch even if locked *)
(* Determine if info status is needed *)


All above would be done anyhow on create even if we successfully
executed it on release previously (( And then finally we go into:

IF (TMP_CPUSVN = CR_CPUSVN) THEN
 RFLAGS.CF := 1;
 RAX := SGX_NO_UPDATE;
FI
ERROR_EXIT:

> 
> If the update time is tiny, then I agree that hooking release would probably 
> do
> more harm than good.

So, it is not that the time is tiny, it is like we will do it twice, unnecessary
and potentially quite long in both cases (taking RDSEED step into account).

The reason why the instruction is defined this way is that it was not intended 
originally
to be inserted into some existing SGX flows, but was envisioned to be 
standalone cmd
for the host orchestrator to execute once it thinks system is ready. 

Best Regards,
Elena.


Re: [PATCH v3 2/2] x86/sgx: Implement EUPDATESVN and opportunistically call it during first EPC page alloc

2025-04-24 Thread Sean Christopherson
On Thu, Apr 24, 2025, Elena Reshetova wrote:
> > On Thu, Apr 24, 2025, Elena Reshetova wrote:
> > +void sgx_dec_usage_count(void)
> > +{
> > +   if (atomic_dec_return(&sgx_usage_count))
> > +   return;
> > +
> > +   guard(mutex)(&sgx_svn_lock);
> > +
> > +   if (atomic_read(&sgx_usage_count))
> > +   return;
> > +
> > +   sgx_update_svn();
> 
> Why do we want to try to execute this on release also?  I would think that
> doing this in sgx_inc_usage_count() is enough.

I assume an actual SVN update takes some amount of time?

If that's correct, then doing the work upon destroying the last enclave is 
desirable,
as it's less likely to introduce delay that negatively affects userspace.  
Userspace
generally won't care about a 10us delay when destroying a process, but a 10us 
delay
to launch an enclave could be quite problematic, e.g. in the TDX use case where
enclaves may be launched on-demand in response to a guest attestation request.

If the update time is tiny, then I agree that hooking release would probably do
more harm than good.



RE: [PATCH v3 2/2] x86/sgx: Implement EUPDATESVN and opportunistically call it during first EPC page alloc

2025-04-24 Thread Reshetova, Elena
> On Thu, Apr 24, 2025, Elena Reshetova wrote:
> > > On Tue, Apr 22, 2025, Kai Huang wrote:
> > > > On Fri, 2025-04-18 at 07:55 -0700, Sean Christopherson wrote:
> > > > > On Tue, Apr 15, 2025, Elena Reshetova wrote:
> > > > > That said, handling this deep in the bowels of EPC page allocation
> seems
> > > > > unnecessary.  The only way for there to be no active EPC pages is if
> > > > > there are no enclaves.  As above, virtual EPC usage is already all but
> > > > > guaranteed to hit false positives, so I don't think there's anything
> > > > > gained by trying to update the SVN based on EPC allocations.
> > > > >
> > > > > So rather than react to EPC allocations, why not hook sgx_open() and
> > > sgx_vepc_open()?
> > > >
> > > > The only thing I don't like about this is we need to hook both of them.
> > >
> > > And having to maintain a separate counter.
> 
> ...
> 
> > If we follow the approach of trying to execute EUPDATESVN via
> > sgx_open() and sgx_vepc_open() paths, it adds more complexity to kernel
> > code
> 
> This is where I disagree.  I don't see how it's more complex even on the
> surface,
> and when you start considering the implications of "randomly" inserting a
> non-
> trivial operation into EPC allocation, IMO it's far less complex overall.

Your code below looks clean enough, so I agree now. I was afraid it would
turn into more complexity.
 
> 
> > and imo it still doesn’t remove the complexity from userspace
> > orchestration sw. I.e. userspace still has to get rid of host enclaves and
> > SGX enabled VMs (because syncing with VMs owners to make sure their
> > encalves are destroyed and these VMs are ready for EUDPATESVN seems
> > like a big organizational complexity and error prone).
> 
> Yeah, I don't see a way around that.
> 
> > So, the only thing this approach would address would be an EPC
> > pre-allocation done by qemu? Wouldn't it be more reasonable
> > (here I am purely speculating, I dont know qemu code) to implement
> > in qemu the code to drop EPC pre-allocation if no VMs with SGX are
> > running? That would be a good overall policy imo not to waste EPC
> > space when not needed in practice.
> 
> QEMU only preallocates when the VM is being created, i.e. QEMU doesn't
> maintain
> a persistent pool.  All I was saying is that userspace needs to shut down SGX
> capable VMs, even if the VM isn't actively running enclaves, which is a shame.

OK, now we are on the same page then. Sorry I misunderstood your comment
about qemu preallocation. 

> 
> Untested sketch of hooking create/delete to do SVN updates.

Thank you very much, I can give this a try. 
Jarkko does this new approach looks good to you on the high level? 

One question though on details, see below inline. 

> 
> ---
>  arch/x86/kernel/cpu/sgx/driver.c |  4 
>  arch/x86/kernel/cpu/sgx/encl.c   |  2 ++
>  arch/x86/kernel/cpu/sgx/main.c   | 34 
>  arch/x86/kernel/cpu/sgx/sgx.h|  3 +++
>  arch/x86/kernel/cpu/sgx/virt.c   |  6 ++
>  5 files changed, 49 insertions(+)
> 
> diff --git a/arch/x86/kernel/cpu/sgx/driver.c
> b/arch/x86/kernel/cpu/sgx/driver.c
> index 7f8d1e11dbee..669e44d61f9f 100644
> --- a/arch/x86/kernel/cpu/sgx/driver.c
> +++ b/arch/x86/kernel/cpu/sgx/driver.c
> @@ -19,6 +19,10 @@ static int sgx_open(struct inode *inode, struct file *file)
>   struct sgx_encl *encl;
>   int ret;
> 
> + ret = sgx_inc_usage_count();
> + if (ret)
> + return ret;
> +
>   encl = kzalloc(sizeof(*encl), GFP_KERNEL);
>   if (!encl)
>   return -ENOMEM;
> diff --git a/arch/x86/kernel/cpu/sgx/encl.c b/arch/x86/kernel/cpu/sgx/encl.c
> index 279148e72459..84ca78627e55 100644
> --- a/arch/x86/kernel/cpu/sgx/encl.c
> +++ b/arch/x86/kernel/cpu/sgx/encl.c
> @@ -765,6 +765,8 @@ void sgx_encl_release(struct kref *ref)
>   WARN_ON_ONCE(encl->secs.epc_page);
> 
>   kfree(encl);
> +
> + sgx_dec_usage_count();
>  }
> 
>  /*
> diff --git a/arch/x86/kernel/cpu/sgx/main.c b/arch/x86/kernel/cpu/sgx/main.c
> index 8ce352fc72ac..ca74c91d4291 100644
> --- a/arch/x86/kernel/cpu/sgx/main.c
> +++ b/arch/x86/kernel/cpu/sgx/main.c
> @@ -914,6 +914,40 @@ int sgx_set_attribute(unsigned long
> *allowed_attributes,
>  }
>  EXPORT_SYMBOL_GPL(sgx_set_attribute);
> 
> +static atomic_t sgx_usage_count;
> +static DEFINE_MUTEX(sgx_svn_lock);
> +
> +static int sgx_update_svn(void)
> +{
> + // blah blah blah
> +}
> +
> +int sgx_inc_usage_count(void)
> +{
> + if (atomic_inc_not_zero(&sgx_usage_count))
> + return 0;
> +
> + guard(mutex)(&sgx_svn_lock);
> +
> + if (atomic_inc_not_zero(&sgx_usage_count))
> + return 0;
> +
> + return sgx_update_svn();
> +}
> +
> +void sgx_dec_usage_count(void)
> +{
> + if (atomic_dec_return(&sgx_usage_count))
> + return;
> +
> + guard(mutex)(&sgx_svn_lock);
> +
> + if (atomic_read(&sgx_usage_count))
> + return;
> +
> + sgx_update_svn();

Why do we want to try to

Re: [PATCH v3 2/2] x86/sgx: Implement EUPDATESVN and opportunistically call it during first EPC page alloc

2025-04-24 Thread Sean Christopherson
On Thu, Apr 24, 2025, Elena Reshetova wrote:
> > On Tue, Apr 22, 2025, Kai Huang wrote:
> > > On Fri, 2025-04-18 at 07:55 -0700, Sean Christopherson wrote:
> > > > On Tue, Apr 15, 2025, Elena Reshetova wrote:
> > > > That said, handling this deep in the bowels of EPC page allocation seems
> > > > unnecessary.  The only way for there to be no active EPC pages is if
> > > > there are no enclaves.  As above, virtual EPC usage is already all but
> > > > guaranteed to hit false positives, so I don't think there's anything
> > > > gained by trying to update the SVN based on EPC allocations.
> > > >
> > > > So rather than react to EPC allocations, why not hook sgx_open() and
> > sgx_vepc_open()?
> > >
> > > The only thing I don't like about this is we need to hook both of them.
> > 
> > And having to maintain a separate counter.

...

> If we follow the approach of trying to execute EUPDATESVN via 
> sgx_open() and sgx_vepc_open() paths, it adds more complexity to kernel
> code 

This is where I disagree.  I don't see how it's more complex even on the 
surface,
and when you start considering the implications of "randomly" inserting a non-
trivial operation into EPC allocation, IMO it's far less complex overall.

> and imo it still doesn’t remove the complexity from userspace
> orchestration sw. I.e. userspace still has to get rid of host enclaves and 
> SGX enabled VMs (because syncing with VMs owners to make sure their
> encalves are destroyed and these VMs are ready for EUDPATESVN seems
> like a big organizational complexity and error prone).

Yeah, I don't see a way around that.

> So, the only thing this approach would address would be an EPC
> pre-allocation done by qemu? Wouldn't it be more reasonable
> (here I am purely speculating, I dont know qemu code) to implement
> in qemu the code to drop EPC pre-allocation if no VMs with SGX are
> running? That would be a good overall policy imo not to waste EPC
> space when not needed in practice. 

QEMU only preallocates when the VM is being created, i.e. QEMU doesn't maintain
a persistent pool.  All I was saying is that userspace needs to shut down SGX
capable VMs, even if the VM isn't actively running enclaves, which is a shame.

Untested sketch of hooking create/delete to do SVN updates.

---
 arch/x86/kernel/cpu/sgx/driver.c |  4 
 arch/x86/kernel/cpu/sgx/encl.c   |  2 ++
 arch/x86/kernel/cpu/sgx/main.c   | 34 
 arch/x86/kernel/cpu/sgx/sgx.h|  3 +++
 arch/x86/kernel/cpu/sgx/virt.c   |  6 ++
 5 files changed, 49 insertions(+)

diff --git a/arch/x86/kernel/cpu/sgx/driver.c b/arch/x86/kernel/cpu/sgx/driver.c
index 7f8d1e11dbee..669e44d61f9f 100644
--- a/arch/x86/kernel/cpu/sgx/driver.c
+++ b/arch/x86/kernel/cpu/sgx/driver.c
@@ -19,6 +19,10 @@ static int sgx_open(struct inode *inode, struct file *file)
struct sgx_encl *encl;
int ret;
 
+   ret = sgx_inc_usage_count();
+   if (ret)
+   return ret;
+
encl = kzalloc(sizeof(*encl), GFP_KERNEL);
if (!encl)
return -ENOMEM;
diff --git a/arch/x86/kernel/cpu/sgx/encl.c b/arch/x86/kernel/cpu/sgx/encl.c
index 279148e72459..84ca78627e55 100644
--- a/arch/x86/kernel/cpu/sgx/encl.c
+++ b/arch/x86/kernel/cpu/sgx/encl.c
@@ -765,6 +765,8 @@ void sgx_encl_release(struct kref *ref)
WARN_ON_ONCE(encl->secs.epc_page);
 
kfree(encl);
+
+   sgx_dec_usage_count();
 }
 
 /*
diff --git a/arch/x86/kernel/cpu/sgx/main.c b/arch/x86/kernel/cpu/sgx/main.c
index 8ce352fc72ac..ca74c91d4291 100644
--- a/arch/x86/kernel/cpu/sgx/main.c
+++ b/arch/x86/kernel/cpu/sgx/main.c
@@ -914,6 +914,40 @@ int sgx_set_attribute(unsigned long *allowed_attributes,
 }
 EXPORT_SYMBOL_GPL(sgx_set_attribute);
 
+static atomic_t sgx_usage_count;
+static DEFINE_MUTEX(sgx_svn_lock);
+
+static int sgx_update_svn(void)
+{
+   // blah blah blah
+}
+
+int sgx_inc_usage_count(void)
+{
+   if (atomic_inc_not_zero(&sgx_usage_count))
+   return 0;
+
+   guard(mutex)(&sgx_svn_lock);
+
+   if (atomic_inc_not_zero(&sgx_usage_count))
+   return 0;
+
+   return sgx_update_svn();
+}
+
+void sgx_dec_usage_count(void)
+{
+   if (atomic_dec_return(&sgx_usage_count))
+   return;
+
+   guard(mutex)(&sgx_svn_lock);
+
+   if (atomic_read(&sgx_usage_count))
+   return;
+
+   sgx_update_svn();
+}
+
 static int __init sgx_init(void)
 {
int ret;
diff --git a/arch/x86/kernel/cpu/sgx/sgx.h b/arch/x86/kernel/cpu/sgx/sgx.h
index d2dad21259a8..f5940393d9bd 100644
--- a/arch/x86/kernel/cpu/sgx/sgx.h
+++ b/arch/x86/kernel/cpu/sgx/sgx.h
@@ -102,6 +102,9 @@ static inline int __init sgx_vepc_init(void)
 }
 #endif
 
+int sgx_inc_usage_count(void);
+void sgx_dec_usage_count(void);
+
 void sgx_update_lepubkeyhash(u64 *lepubkeyhash);
 
 #endif /* _X86_SGX_H */
diff --git a/arch/x86/kernel/cpu/sgx/virt.c b/arch/x86/kernel/cpu/sgx/virt.c
index 7aaa3652e31d..e548de340c2e 100644
--- a/arch/x86/

RE: [PATCH v3 2/2] x86/sgx: Implement EUPDATESVN and opportunistically call it during first EPC page alloc

2025-04-24 Thread Reshetova, Elena

> On Tue, Apr 22, 2025, Kai Huang wrote:
> > On Fri, 2025-04-18 at 07:55 -0700, Sean Christopherson wrote:
> > > On Tue, Apr 15, 2025, Elena Reshetova wrote:
> > > That said, handling this deep in the bowels of EPC page allocation seems
> > > unnecessary.  The only way for there to be no active EPC pages is if there
> are
> > > no enclaves.  As above, virtual EPC usage is already all but guaranteed to
> hit
> > > false positives, so I don't think there's anything gained by trying to 
> > > update
> > > the SVN based on EPC allocations.
> > >
> > > So rather than react to EPC allocations, why not hook sgx_open() and
> sgx_vepc_open()?
> >
> > The only thing I don't like about this is we need to hook both of them.
> 
> And having to maintain a separate counter.

Sorry for the delayed response, I was away for easter holidays. 
Thank you very much for your review, Sean!

I see your point about the issues with SGX-enabled VMs. My overall
understanding was that this would have to be up to the userspace to
prepare the system for EUPDATESVN, which *must* include shutting down
all host enclaves, and also migrating the SGX enabled VMs out of this
host or destroying them (likely the former). Is this an unacceptable assumption?
Because I would rather prefer to keep the kernel code clean and simple
and ask userspace to handle this (note: EUPDATESVN is not a common
action, we expect it to be needed only a couple of times per year max). 

If we follow the approach of trying to execute EUPDATESVN via 
sgx_open() and sgx_vepc_open() paths, it adds more complexity to kernel
code and imo it still doesn’t remove the complexity from userspace
 orchestration sw. I.e. userspace still has to get rid of host enclaves and 
SGX enabled VMs (because syncing with VMs owners to make sure their
encalves are destroyed and these VMs are ready for EUDPATESVN seems
like a big organizational complexity and error prone).
So, the only thing this approach would address would be an EPC
pre-allocation done by qemu? Wouldn't it be more reasonable
(here I am purely speculating, I dont know qemu code) to implement
in qemu the code to drop EPC pre-allocation if no VMs with SGX are
running? That would be a good overall policy imo not to waste EPC
space when not needed in practice. 

Best Regards,
Elena. 








Re: [PATCH v3 2/2] x86/sgx: Implement EUPDATESVN and opportunistically call it during first EPC page alloc

2025-04-22 Thread Huang, Kai
On Tue, 2025-04-22 at 06:54 -0700, Sean Christopherson wrote:
> On Tue, Apr 22, 2025, Kai Huang wrote:
> > On Fri, 2025-04-18 at 07:55 -0700, Sean Christopherson wrote:
> > > On Tue, Apr 15, 2025, Elena Reshetova wrote:
> > > That said, handling this deep in the bowels of EPC page allocation seems
> > > unnecessary.  The only way for there to be no active EPC pages is if 
> > > there are
> > > no enclaves.  As above, virtual EPC usage is already all but guaranteed 
> > > to hit
> > > false positives, so I don't think there's anything gained by trying to 
> > > update
> > > the SVN based on EPC allocations.
> > > 
> > > So rather than react to EPC allocations, why not hook sgx_open() and 
> > > sgx_vepc_open()?
> > 
> > The only thing I don't like about this is we need to hook both of them.
> 
> And having to maintain a separate counter.
> 
> > > It's the hypervisor's responsibility to enumerate SGX_CPUID_EUPDATESVN or 
> > > not.
> > > E.g. if the kernel is running in a "passthrough" type setup, it would be 
> > > perfectly
> > > fine to do EUPDATESVN from a guest kernel.  EUPDATESVN could even be 
> > > proxied,
> > > e.g. by intercepting ECREATE to track EPC usage
> > 
> > I am open to this HYPERVISOR check, because I don't think we currently 
> > support
> > any kind of "passthrough" setup?
> 
> Who is "we"?  KVM?  From the SGX driver's perspective, it doesn't know on 
> which
> hypervisor it's running, or the intent of that hypervisor.  I.e. whether or 
> not
> KVM or any other hypervisor supports something is irrelevant.

I was thinking about KVM, but right we cannot assume what will the underneath 
hypervisor do.

> 
> > And depending on what kinda of "passthrough" types, the things that the
> > hypervisor traps can vary.
> > 
> > Anyway, I agree it's not necessary to have this check, because currently it 
> > is
> > not possible for a guest to see the EUPDATESVN in CPUID.
> 
> Why is that?
> 

I was thinking about KVM :-( You are right it's up to the hypervisor to decide
whether to advertise EUPDATESVN in CPUID.

Given we don't want to rule out there might be some hypervisor out there which
may just passthrough everything, I agree we should remove this HYPERVISOR check.

Thanks :-)


Re: [PATCH v3 2/2] x86/sgx: Implement EUPDATESVN and opportunistically call it during first EPC page alloc

2025-04-22 Thread Sean Christopherson
On Tue, Apr 22, 2025, Kai Huang wrote:
> On Fri, 2025-04-18 at 07:55 -0700, Sean Christopherson wrote:
> > On Tue, Apr 15, 2025, Elena Reshetova wrote:
> > That said, handling this deep in the bowels of EPC page allocation seems
> > unnecessary.  The only way for there to be no active EPC pages is if there 
> > are
> > no enclaves.  As above, virtual EPC usage is already all but guaranteed to 
> > hit
> > false positives, so I don't think there's anything gained by trying to 
> > update
> > the SVN based on EPC allocations.
> > 
> > So rather than react to EPC allocations, why not hook sgx_open() and 
> > sgx_vepc_open()?
> 
> The only thing I don't like about this is we need to hook both of them.

And having to maintain a separate counter.

> > It's the hypervisor's responsibility to enumerate SGX_CPUID_EUPDATESVN or 
> > not.
> > E.g. if the kernel is running in a "passthrough" type setup, it would be 
> > perfectly
> > fine to do EUPDATESVN from a guest kernel.  EUPDATESVN could even be 
> > proxied,
> > e.g. by intercepting ECREATE to track EPC usage
> 
> I am open to this HYPERVISOR check, because I don't think we currently support
> any kind of "passthrough" setup?

Who is "we"?  KVM?  From the SGX driver's perspective, it doesn't know on which
hypervisor it's running, or the intent of that hypervisor.  I.e. whether or not
KVM or any other hypervisor supports something is irrelevant.

> And depending on what kinda of "passthrough" types, the things that the
> hypervisor traps can vary.
> 
> Anyway, I agree it's not necessary to have this check, because currently it is
> not possible for a guest to see the EUPDATESVN in CPUID.

Why is that?



Re: [PATCH v3 2/2] x86/sgx: Implement EUPDATESVN and opportunistically call it during first EPC page alloc

2025-04-22 Thread Huang, Kai
On Fri, 2025-04-18 at 07:55 -0700, Sean Christopherson wrote:
> On Tue, Apr 15, 2025, Elena Reshetova wrote:
> > +/* This lock is held to prevent new EPC pages from being created
> > + * during the execution of ENCLS[EUPDATESVN].
> > + */
> > +static DEFINE_SPINLOCK(sgx_epc_eupdatesvn_lock);
> > +
> >  static atomic_long_t sgx_nr_used_pages = ATOMIC_LONG_INIT(0);
> >  static unsigned long sgx_nr_total_pages;
> >  
> > @@ -444,6 +449,7 @@ static struct sgx_epc_page 
> > *__sgx_alloc_epc_page_from_node(int nid)
> >  {
> > struct sgx_numa_node *node = &sgx_numa_nodes[nid];
> > struct sgx_epc_page *page = NULL;
> > +   bool ret;
> >  
> > spin_lock(&node->lock);
> >  
> > @@ -452,12 +458,33 @@ static struct sgx_epc_page 
> > *__sgx_alloc_epc_page_from_node(int nid)
> > return NULL;
> > }
> >  
> > +   if (!atomic_long_inc_not_zero(&sgx_nr_used_pages)) {
> 
> FWIW, the entire automatic scheme has terrible behavior when KVM is running 
> SGX
> capable guests.  KVM will allocate EPC when the virtual EPC is first touched 
> by
> the guest, and won't free the EPC pages until the VM is terminated.  And IIRC,
> userspace (QEMU) typically preallocates the virtual EPC to ensure the VM 
> doesn't
> need to be killed later on due to lack of EPC.
> 
> I.e. running an SGX capable VM, even if there are no active enclaves in said 
> VM,
> will prevent SVN updates.
> 
> Unfortunately, I can't think of a sane way around that, because while it would
> be easy enough to force interception of ECREATE, there's no definitive ENCLS 
> leaf
> that KVM can intercept to detect when an enclave is destroyed (interception
> EREMOVE would have terrible performance implications).
> 
> That said, handling this deep in the bowels of EPC page allocation seems
> unnecessary.  The only way for there to be no active EPC pages is if there are
> no enclaves.  As above, virtual EPC usage is already all but guaranteed to hit
> false positives, so I don't think there's anything gained by trying to update
> the SVN based on EPC allocations.
> 
> So rather than react to EPC allocations, why not hook sgx_open() and 
> sgx_vepc_open()?

The only thing I don't like about this is we need to hook both of them.

> Then you're hooking a relative slow path (I assume EUPDATESVN is not fast), 
> 

I was expecting the SGX_NO_UPDATE case should be fast, i.e., some sort of simple
compare and return, but looking at the pseudo code it still re-generates the
CR_BASE_KEY etc.

> error
> codes can be returned to userspace (if you really want the kernel to freak 
> out if
> EUPDATESVN unexpected fails), and you don't _have_ to use a spinlock, e.g. if
> EUPDATESVN takes a long time, using a mutex might be desirable.

Seems reasonable to me.

> 
> > +bool sgx_updatesvn(void)
> > +{
> > +   int retry = 10;
> > +   int ret;
> > +
> > +   lockdep_assert_held(&sgx_epc_eupdatesvn_lock);
> > +
> > +   if (!(cpuid_eax(SGX_CPUID) & SGX_CPUID_EUPDATESVN))
> > +   return true;
> 
> Checking CPUID features inside the spinlock is asinine.  They can't change, 
> barring
> a misbehaving hypervisor.  This should be a "cache once during boot" sorta 
> thing.

Agreed.

> 
> > +
> > +   /*
> > +* Do not execute ENCLS[EUPDATESVN] if running in a VM since
> > +* microcode updates are only meaningful to be applied on the host.
> 
> I don't think the kernel should check HYPERVISOR.  The SDM doesn't actually
> say anything about EUPDATESVN, but unless it's explicitly special cased, I 
> doubt
> XuCode cares whether ENCLS was executed in Root vs. Non-Root.

The spec is here:

[1] https://cdrdv2.intel.com/v1/dl/getContent/648682?explicitVersion=true

And from the pseudo code it doesn't distinguish root vs non-root.

> 
> It's the hypervisor's responsibility to enumerate SGX_CPUID_EUPDATESVN or not.
> E.g. if the kernel is running in a "passthrough" type setup, it would be 
> perfectly
> fine to do EUPDATESVN from a guest kernel.  EUPDATESVN could even be proxied,
> e.g. by intercepting ECREATE to track EPC usage

I am open to this HYPERVISOR check, because I don't think we currently support
any kind of "passthrough" setup? And depending on what kinda of "passthrough"
types, the things that the hypervisor traps can vary.

Anyway, I agree it's not necessary to have this check, because currently it is
not possible for a guest to see the EUPDATESVN in CPUID.




Re: [PATCH v3 2/2] x86/sgx: Implement EUPDATESVN and opportunistically call it during first EPC page alloc

2025-04-21 Thread Huang, Kai
On Fri, 2025-04-18 at 07:18 -0700, Sean Christopherson wrote:
> On Thu, Apr 17, 2025, Kai Huang wrote:
> > I think the sgx_updatesvn() should just return true when EUPDATESVN returns 
> > 0 or
> > SGX_NO_UPDATE, and return false for all other error codes.  And it should
> > ENCLS_WARN() for all other error codes, except SGX_INSUFFICIENT_ENTROPY 
> > because
> > it can still legally happen.
> > 
> > Something like:
> > 
> > do {
> > ret = __eupdatesvn();
> > if (ret != SGX_INSUFFICIENT_ENTROPY)
> > break;
> > } while (--retry);
> 
> This can be:
> 
>   do {
>   ret = __eupdatesvn();
>   } while (ret == SGX_INSUFFICIENT_ENTROPY && --retry)
> 
> To make it super obvious that retry is only relevant to lack of entropy.

Yep looks better.

> 
> > if (!ret || ret == SGX_NO_UPDATE) {
> > /*
> >  * SVN successfully updated, or it was already up-to-date.
> >  * Let users know when the update was successful.
> >  */
> > if (!ret)
> > pr_info("SVN updated successfully\n");
> > return true;
> 
> Returning true/false is confusing since the vast majority of the SGX code uses
> '0' for success.  A lot of cleverness went into splicing SGX's error codes 
> into
> the kernel's -ernno; it would be a shame to ignore that :-)

Agreed :-)

> 
> E.g. this looks wrong at first (and second glance)
> 
>   ret = sgx_updatesvn();
>   if (!ret) {
>   /*
>* sgx_updatesvn() returned unknown error, smth
>* must be broken, do not allocate a page from 
> EPC
>*/
>   spin_unlock(&sgx_epc_eupdatesvn_lock);
>   spin_unlock(&node->lock);
>   return NULL;
>   }

Yep.

> 
> > }
> > 
> > /*
> >  * EUPDATESVN was called when EPC is empty, all other error
> >  * codes are unexcepted except running out of entropy.
> >  */
> > if (ret != SGX_INSUFFICIENT_ENTROPY)
> > ENCLS_WARN(ret, "EUPDATESVN");
> > 
> > return false;
> > 
> > 
> > In __sgx_alloc_epc_page_from_node(), it should fail to allocate EPC page and
> > return -ENOMEM when sgx_updatesvn() returns false.  We should only allow 
> > EPC to
> 
> No, it should return a meaningful error code, not -ENOMEM.  
> 

Oh I was actually thinking to keep __sgx_alloc_epc_page_from_node() returning
NULL if sgx_updatesvn() fails (given it returns true/false) and make
sgx_alloc_epc_page() return -ENOMEM.

Sorry for didn't say fully. :-(

> And if that's the
> behavior you want, then __sgx_alloc_epc_page() should be updated to bail 
> immediately.
> The current code assuming -ENOMEM is the only failure scenario:
> 
>   do {
>   page = __sgx_alloc_epc_page_from_node(nid);
>   if (page)
>   return page;
> 
>   nid = next_node_in(nid, sgx_numa_mask);
>   } while (nid != nid_start);
> 
> That should be something like:
> 
>   do {
>   page = __sgx_alloc_epc_page_from_node(nid);
>   if (!IS_ERR(page) || PTR_ERR(page) != -ENOMEM)
>   return page;
> 
>   nid = next_node_in(nid, sgx_numa_mask);
>   } while (nid != nid_start);

If we want to bail out immediately, then perhaps -EIO is a better option instead
of -ENOMEM.  And in this case sgx_alloc_epc_page() can also bail out early:

--- a/arch/x86/kernel/cpu/sgx/main.c
+++ b/arch/x86/kernel/cpu/sgx/main.c
@@ -569,6 +569,9 @@ struct sgx_epc_page *sgx_alloc_epc_page(void *owner, bool
reclaim)
break;
}
 
+   if (PTR_ERR(page) == -EIO)
+   return page;
+
if (list_empty(&sgx_active_page_list))
return ERR_PTR(-ENOMEM);

> 
> > be allocated when we know the SVN is already up-to-date.
> > 
> > Any further call of EPC allocation will trigger sgx_updatesvn() again.  If 
> > it
> > was failed due to unexpected error, then it should continue to fail,
> > guaranteeing "sgx_alloc_epc_page() return consistently -ENOMEM, if the
> > unexpected happens".  If it was failed due to running out of entropy, it 
> > then
> > may fail again, or it will just succeed and then SGX can continue to work.
> 
> 
> Side topic, the function comment for __sgx_alloc_epc_page() is stale/wrong.  
> It
> returns ERR_PTR(-ENOMEM), not NULL, on failure.

Right :-)

Thanks for spending time on review!

> 
> /**
>  * __sgx_alloc_epc_page() - Allocate an EPC page
>  *
>  * Iterate through NUMA nodes and reserve ia free EPC page to the caller. 
> Start
>  * from the NUMA node, where the caller is executing.
>  *
>  * Return:
>  * - an EPC page: A borrowed EPC pages were available.
>  * - NULL:

Re: [PATCH v3 2/2] x86/sgx: Implement EUPDATESVN and opportunistically call it during first EPC page alloc

2025-04-18 Thread Sean Christopherson
On Tue, Apr 15, 2025, Elena Reshetova wrote:
> +/* This lock is held to prevent new EPC pages from being created
> + * during the execution of ENCLS[EUPDATESVN].
> + */
> +static DEFINE_SPINLOCK(sgx_epc_eupdatesvn_lock);
> +
>  static atomic_long_t sgx_nr_used_pages = ATOMIC_LONG_INIT(0);
>  static unsigned long sgx_nr_total_pages;
>  
> @@ -444,6 +449,7 @@ static struct sgx_epc_page 
> *__sgx_alloc_epc_page_from_node(int nid)
>  {
>   struct sgx_numa_node *node = &sgx_numa_nodes[nid];
>   struct sgx_epc_page *page = NULL;
> + bool ret;
>  
>   spin_lock(&node->lock);
>  
> @@ -452,12 +458,33 @@ static struct sgx_epc_page 
> *__sgx_alloc_epc_page_from_node(int nid)
>   return NULL;
>   }
>  
> + if (!atomic_long_inc_not_zero(&sgx_nr_used_pages)) {

FWIW, the entire automatic scheme has terrible behavior when KVM is running SGX
capable guests.  KVM will allocate EPC when the virtual EPC is first touched by
the guest, and won't free the EPC pages until the VM is terminated.  And IIRC,
userspace (QEMU) typically preallocates the virtual EPC to ensure the VM doesn't
need to be killed later on due to lack of EPC.

I.e. running an SGX capable VM, even if there are no active enclaves in said VM,
will prevent SVN updates.

Unfortunately, I can't think of a sane way around that, because while it would
be easy enough to force interception of ECREATE, there's no definitive ENCLS 
leaf
that KVM can intercept to detect when an enclave is destroyed (interception
EREMOVE would have terrible performance implications).

That said, handling this deep in the bowels of EPC page allocation seems
unnecessary.  The only way for there to be no active EPC pages is if there are
no enclaves.  As above, virtual EPC usage is already all but guaranteed to hit
false positives, so I don't think there's anything gained by trying to update
the SVN based on EPC allocations.

So rather than react to EPC allocations, why not hook sgx_open() and 
sgx_vepc_open()?
Then you're hooking a relative slow path (I assume EUPDATESVN is not fast), 
error
codes can be returned to userspace (if you really want the kernel to freak out 
if
EUPDATESVN unexpected fails), and you don't _have_ to use a spinlock, e.g. if
EUPDATESVN takes a long time, using a mutex might be desirable.

> +bool sgx_updatesvn(void)
> +{
> + int retry = 10;
> + int ret;
> +
> + lockdep_assert_held(&sgx_epc_eupdatesvn_lock);
> +
> + if (!(cpuid_eax(SGX_CPUID) & SGX_CPUID_EUPDATESVN))
> + return true;

Checking CPUID features inside the spinlock is asinine.  They can't change, 
barring
a misbehaving hypervisor.  This should be a "cache once during boot" sorta 
thing.

> +
> + /*
> +  * Do not execute ENCLS[EUPDATESVN] if running in a VM since
> +  * microcode updates are only meaningful to be applied on the host.

I don't think the kernel should check HYPERVISOR.  The SDM doesn't actually
say anything about EUPDATESVN, but unless it's explicitly special cased, I doubt
XuCode cares whether ENCLS was executed in Root vs. Non-Root.

It's the hypervisor's responsibility to enumerate SGX_CPUID_EUPDATESVN or not.
E.g. if the kernel is running in a "passthrough" type setup, it would be 
perfectly
fine to do EUPDATESVN from a guest kernel.  EUPDATESVN could even be proxied,
e.g. by intercepting ECREATE to track EPC usage



Re: [PATCH v3 2/2] x86/sgx: Implement EUPDATESVN and opportunistically call it during first EPC page alloc

2025-04-18 Thread Sean Christopherson
On Thu, Apr 17, 2025, Kai Huang wrote:
> I think the sgx_updatesvn() should just return true when EUPDATESVN returns 0 
> or
> SGX_NO_UPDATE, and return false for all other error codes.  And it should
> ENCLS_WARN() for all other error codes, except SGX_INSUFFICIENT_ENTROPY 
> because
> it can still legally happen.
> 
> Something like:
> 
>   do {
>   ret = __eupdatesvn();
>   if (ret != SGX_INSUFFICIENT_ENTROPY)
>   break;
>   } while (--retry);

This can be:

do {
ret = __eupdatesvn();
} while (ret == SGX_INSUFFICIENT_ENTROPY && --retry)

To make it super obvious that retry is only relevant to lack of entropy.

>   if (!ret || ret == SGX_NO_UPDATE) {
>   /*
>* SVN successfully updated, or it was already up-to-date.
>* Let users know when the update was successful.
>*/
>   if (!ret)
>   pr_info("SVN updated successfully\n");
>   return true;

Returning true/false is confusing since the vast majority of the SGX code uses
'0' for success.  A lot of cleverness went into splicing SGX's error codes into
the kernel's -ernno; it would be a shame to ignore that :-)

E.g. this looks wrong at first (and second glance)

ret = sgx_updatesvn();
if (!ret) {
/*
 * sgx_updatesvn() returned unknown error, smth
 * must be broken, do not allocate a page from 
EPC
 */
spin_unlock(&sgx_epc_eupdatesvn_lock);
spin_unlock(&node->lock);
return NULL;
}

>   }
> 
>   /*
>* EUPDATESVN was called when EPC is empty, all other error
>* codes are unexcepted except running out of entropy.
>*/
>   if (ret != SGX_INSUFFICIENT_ENTROPY)
>   ENCLS_WARN(ret, "EUPDATESVN");
> 
>   return false;
>   
> 
> In __sgx_alloc_epc_page_from_node(), it should fail to allocate EPC page and
> return -ENOMEM when sgx_updatesvn() returns false.  We should only allow EPC 
> to

No, it should return a meaningful error code, not -ENOMEM.  And if that's the
behavior you want, then __sgx_alloc_epc_page() should be updated to bail 
immediately.
The current code assuming -ENOMEM is the only failure scenario:

do {
page = __sgx_alloc_epc_page_from_node(nid);
if (page)
return page;

nid = next_node_in(nid, sgx_numa_mask);
} while (nid != nid_start);

That should be something like:

do {
page = __sgx_alloc_epc_page_from_node(nid);
if (!IS_ERR(page) || PTR_ERR(page) != -ENOMEM)
return page;

nid = next_node_in(nid, sgx_numa_mask);
} while (nid != nid_start);

> be allocated when we know the SVN is already up-to-date.
> 
> Any further call of EPC allocation will trigger sgx_updatesvn() again.  If it
> was failed due to unexpected error, then it should continue to fail,
> guaranteeing "sgx_alloc_epc_page() return consistently -ENOMEM, if the
> unexpected happens".  If it was failed due to running out of entropy, it then
> may fail again, or it will just succeed and then SGX can continue to work.


Side topic, the function comment for __sgx_alloc_epc_page() is stale/wrong.  It
returns ERR_PTR(-ENOMEM), not NULL, on failure.

/**
 * __sgx_alloc_epc_page() - Allocate an EPC page
 *
 * Iterate through NUMA nodes and reserve ia free EPC page to the caller. Start
 * from the NUMA node, where the caller is executing.
 *
 * Return:
 * - an EPC page:   A borrowed EPC pages were available.
 * - NULL:  Out of EPC pages.
 */



Re: [PATCH v3 2/2] x86/sgx: Implement EUPDATESVN and opportunistically call it during first EPC page alloc

2025-04-17 Thread Huang, Kai

> 
> > 
> > Reading below, it could also fail due to running out of entropy, which is 
> > still
> > legally possible to happen IMHO.
> 
> Actually, no in this case, we only return false from sgx_updatesvn in case 
> unknown
> error happens as agreed previously. In case we run out of entropy it should 
> be safe
> to retry later and we dont prevent per current code EPC page allocation. 
> 
> > 
> > Maybe just:
> > /*
> >  * Updating SVN failed.  SGX might be broken,
> >  * or running out of entropy happened.  Do
> > not
> >  * allocate EPC page since it is not safe to
> > use
> >  * SGX anymore if it was the former.  If it was
> >  * due to running out of entropy, the further
> >  * call of EPC allocation will try
> >  * sgx_updatesvn() again.
> >  */
> 
> I agree with this except that the current code doesn’t prevent EPC allocation 
> on any
> other error return than unknown error. The question is whenever we want to
> change the behaviour to require it? 

[...]

> > > + * Return:
> > > + * True: success or EUPDATESVN can be safely retried next time
> > > + * False: Unexpected error occurred
> > 
> > Hmm.. IIUC it could fail with running out of entropy but this is still 
> > legally
> > possible to happen.  And it is safe to retry.
> 
> Yes, in this case we get back SGX_INSUFFICIENT_ENTROPY currently we
> return "true" here and do not prevent EPC allocations of the page in this
> case, which means we will start populate EPC and can next time retry only
> when EPC is empty again. 

[...]

> > > + switch (ret) {
> > > + case 0:
> > > + pr_info("EUPDATESVN: success\n");
> > > + break;
> > > + case SGX_EPC_NOT_READY:
> > > + case SGX_INSUFFICIENT_ENTROPY:
> > > + case SGX_EPC_PAGE_CONFLICT:
> > > + ENCLS_WARN(ret, "EUPDATESVN");
> > > + break;
> > 
> > I don't think we should use ENCLS_WARN() for SGX_INSUFFICIENT_ENTROPY,
> > since
> > IIUC it is still legally possible to happen after the above retry.
> > 
> > Also, it doesn't seem SGX_EPC_NOT_READY and SGX_EPC_PAGE_CONFLICT
> > are needed
> > since IIUC the only possible error is out of entropy.
> 
> Well, in case we have a kernel bug, and we think EPC is empty and it is safe
> to execute EUPDATESVN, while it is not the case, we can still get back the 
> SGX_EPC_NOT_READY and SGX_EPC_PAGE_CONFLICT from HW, right? 

Right, but as you said, in case we have a kernel bug.

Which means it is not expected and we should just use ENCLS_WARN() for them.

IMHO we can even add

WARN_ON_ONCE(atomic_long_read(&sgx_nr_used_pages) != 0);

to sgx_updatesvn(), e.g., right after
 
lockdep_assert_held(&sgx_epc_eupdatesvn_lock);

to assert sgx_updatesvn() is called when EPC is empty, thus SGX_EPC_NOT_READY
and SGX_EPC_PAGE_CONFLICT are not possible to happen.

> Which means we probably should warn about such buggy cases. 

Yes.

> And maybe we should also prevent page allocation from EPC in this case also
> similarly as for unknown error?

Yes.

I don't see any reason we should continue to allow SGX to work in case of
SGX_EPC_NOT_READY or SGX_EPC_PAGE_CONFLICT.

IIUC, we also agreed in the last round discussion that we should:

 "I guess the best action would be make sgx_alloc_epc_page() return
 consistently -ENOMEM, if the unexpected happens."

SGX_EPC_NOT_READY and SGX_EPC_PAGE_CONFLICT are indeed unexpected to me.

So my suggestion would be:

I think the sgx_updatesvn() should just return true when EUPDATESVN returns 0 or
SGX_NO_UPDATE, and return false for all other error codes.  And it should
ENCLS_WARN() for all other error codes, except SGX_INSUFFICIENT_ENTROPY because
it can still legally happen.

Something like:

do {
ret = __eupdatesvn();
if (ret != SGX_INSUFFICIENT_ENTROPY)
break;
} while (--retry);

if (!ret || ret == SGX_NO_UPDATE) {
/*
 * SVN successfully updated, or it was already up-to-date.
 * Let users know when the update was successful.
 */
if (!ret)
pr_info("SVN updated successfully\n");
return true;
}

/*
 * EUPDATESVN was called when EPC is empty, all other error
 * codes are unexcepted except running out of entropy.
 */
if (ret != SGX_INSUFFICIENT_ENTROPY)
ENCLS_WARN(ret, "EUPDATESVN");

return false;


In __sgx_alloc_epc_page_from_node(), it should fail to allocate EPC page and
return -ENOMEM when sgx_updatesvn() returns false.  We should only allow EPC to
be allocated when we know the SVN is already up-to-date.

Any further call of EPC allocation will trigger sgx_updatesvn()

Re: [PATCH v3 2/2] x86/sgx: Implement EUPDATESVN and opportunistically call it during first EPC page alloc

2025-04-16 Thread Jarkko Sakkinen
On Tue, Apr 15, 2025 at 02:51:22PM +0300, Elena Reshetova wrote:
> SGX architecture introduced a new instruction called EUPDATESVN
> to Ice Lake. It allows updating security SVN version, given that EPC
> is completely empty. The latter is required for security reasons
> in order to reason that enclave security posture is as secure as the
> security SVN version of the TCB that created it.

"Ice Lake introduced ENCLS[EUPDATESVN], which allows to to update the
microcode version for an online system" ?

Now you are saying it allows updating SVN which is null information.

> 
> Additionally it is important to ensure that while ENCLS[EUPDATESVN]
> runs, no concurrent page creation happens in EPC, because it might
> result in #GP delivered to the creator. Legacy SW might not be prepared
> to handle such unexpected #GPs and therefore this patch introduces
> a locking mechanism to ensure no concurrent EPC allocations can happen.
> 
> It is also ensured that ENCLS[EUPDATESVN] is not called when running
> in a VM since it does not have a meaning in this context (microcode
> updates application is limited to the host OS) and will create
> unnecessary load.
> 
> This patch is based on previous submision by Cathy Zhang
> https://lore.kernel.org/all/20220520103904.1216-1-cathy.zh...@intel.com/
> 
> Signed-off-by: Elena Reshetova 
> ---
>  arch/x86/include/asm/sgx.h  | 41 +++--
>  arch/x86/kernel/cpu/sgx/encls.h |  6 +++
>  arch/x86/kernel/cpu/sgx/main.c  | 82 -
>  arch/x86/kernel/cpu/sgx/sgx.h   |  1 +
>  4 files changed, 114 insertions(+), 16 deletions(-)
> 
> diff --git a/arch/x86/include/asm/sgx.h b/arch/x86/include/asm/sgx.h
> index 6a0069761508..5caf5c31ebc6 100644
> --- a/arch/x86/include/asm/sgx.h
> +++ b/arch/x86/include/asm/sgx.h
> @@ -26,23 +26,26 @@
>  #define SGX_CPUID_EPC_SECTION0x1
>  /* The bitmask for the EPC section type. */
>  #define SGX_CPUID_EPC_MASK   GENMASK(3, 0)
> +/* EUPDATESVN presence indication */
> +#define SGX_CPUID_EUPDATESVN BIT(10)
>  
>  enum sgx_encls_function {
> - ECREATE = 0x00,
> - EADD= 0x01,
> - EINIT   = 0x02,
> - EREMOVE = 0x03,
> - EDGBRD  = 0x04,
> - EDGBWR  = 0x05,
> - EEXTEND = 0x06,
> - ELDU= 0x08,
> - EBLOCK  = 0x09,
> - EPA = 0x0A,
> - EWB = 0x0B,
> - ETRACK  = 0x0C,
> - EAUG= 0x0D,
> - EMODPR  = 0x0E,
> - EMODT   = 0x0F,
> + ECREATE = 0x00,
> + EADD= 0x01,
> + EINIT   = 0x02,
> + EREMOVE = 0x03,
> + EDGBRD  = 0x04,
> + EDGBWR  = 0x05,
> + EEXTEND = 0x06,
> + ELDU= 0x08,
> + EBLOCK  = 0x09,
> + EPA = 0x0A,
> + EWB = 0x0B,
> + ETRACK  = 0x0C,
> + EAUG= 0x0D,
> + EMODPR  = 0x0E,
> + EMODT   = 0x0F,
> + EUPDATESVN  = 0x18,
>  };
>  
>  /**
> @@ -73,6 +76,11 @@ enum sgx_encls_function {
>   *   public key does not match IA32_SGXLEPUBKEYHASH.
>   * %SGX_PAGE_NOT_MODIFIABLE: The EPC page cannot be modified because it
>   *   is in the PENDING or MODIFIED state.
> + * %SGX_INSUFFICIENT_ENTROPY:Insufficient entropy in RNG.
> + * %SGX_EPC_NOT_READY:   EPC is not ready for SVN update.
> + * %SGX_NO_UPDATE:   EUPDATESVN was successful, but CPUSVN was not
> + *   updated because current SVN was not newer than
> + *   CPUSVN.
>   * %SGX_UNMASKED_EVENT:  An unmasked event, e.g. INTR, was 
> received
>   */
>  enum sgx_return_code {
> @@ -81,6 +89,9 @@ enum sgx_return_code {
>   SGX_CHILD_PRESENT   = 13,
>   SGX_INVALID_EINITTOKEN  = 16,
>   SGX_PAGE_NOT_MODIFIABLE = 20,
> + SGX_INSUFFICIENT_ENTROPY= 29,
> + SGX_EPC_NOT_READY   = 30,
> + SGX_NO_UPDATE   = 31,
>   SGX_UNMASKED_EVENT  = 128,
>  };
>  
> diff --git a/arch/x86/kernel/cpu/sgx/encls.h b/arch/x86/kernel/cpu/sgx/encls.h
> index 99004b02e2ed..3d83c76dc91f 100644
> --- a/arch/x86/kernel/cpu/sgx/encls.h
> +++ b/arch/x86/kernel/cpu/sgx/encls.h
> @@ -233,4 +233,10 @@ static inline int __eaug(struct sgx_pageinfo *pginfo, 
> void *addr)
>   return __encls_2(EAUG, pginfo, addr);
>  }
>  
> +/* Update CPUSVN at runtime. */
> +static inline int __eupdatesvn(void)
> +{
> + return __encls_ret_1(EUPDATESVN, "");
> +}
> +
>  #endif /* _X86_ENCLS_H */
> diff --git a/arch/x86/kernel/cpu/sgx/main.c b/arch/x86/kernel/cpu/sgx/main.c
> index b61d3bad0446..c21f4f6226b0 100644
> --- a/arch/x86/kernel/cpu/sgx/main.c
> +++ b/arch/x86/kernel/cpu/sgx/main.c
> @@ -32,6 +32,11 @@ static DEFINE_XARRAY(sgx_epc_address_space);
>  static LIST_HEAD(sgx_active_page_list);
>  static DEFINE_SPINLOCK(sgx_reclaimer_lock);
>  
> +/* This lock is held to prevent new EPC pages from bein

RE: [PATCH v3 2/2] x86/sgx: Implement EUPDATESVN and opportunistically call it during first EPC page alloc

2025-04-16 Thread Reshetova, Elena

> On Tue, 2025-04-15 at 14:51 +0300, Elena Reshetova wrote:
> > SGX architecture introduced a new instruction called EUPDATESVN
> 
> "a new ENCLS leaf function EUPDATESVN"?

Yes, you are right, better wording, will fix. 

> 
> > to Ice Lake. It allows updating security SVN version, given that EPC
> > is completely empty. The latter is required for security reasons
> > in order to reason that enclave security posture is as secure as the
> > security SVN version of the TCB that created it.
> >
> > Additionally it is important to ensure that while ENCLS[EUPDATESVN]
> > runs, no concurrent page creation happens in EPC, because it might
> > result in #GP delivered to the creator. Legacy SW might not be prepared
> > to handle such unexpected #GPs and therefore this patch introduces
> > a locking mechanism to ensure no concurrent EPC allocations can happen.
> >
> > It is also ensured that ENCLS[EUPDATESVN] is not called when running
> > in a VM since it does not have a meaning in this context (microcode
> > updates application is limited to the host OS) and will create
> > unnecessary load.
> >
> > This patch is based on previous submision by Cathy Zhang
> > https://lore.kernel.org/all/20220520103904.1216-1-cathy.zh...@intel.com/
> 
> When Jarkko suggested to use "Link" here:
> 
> https://lore.kernel.org/lkml/z983zatawnqfu...@kernel.org/
> 
> I think he meant something like below:
> 
> This patch is based on ... [1]
> 
> Link: https://lore.kernel.org/all/20220520103904.1216-1-
> cathy.zh...@intel.com/
> [1]

Oh, I see, I didn’t understand this. Can fix in the next version. 

> 
> >
> > Signed-off-by: Elena Reshetova 
> > ---
> >  arch/x86/include/asm/sgx.h  | 41 +++--
> >  arch/x86/kernel/cpu/sgx/encls.h |  6 +++
> >  arch/x86/kernel/cpu/sgx/main.c  | 82
> -
> >  arch/x86/kernel/cpu/sgx/sgx.h   |  1 +
> >  4 files changed, 114 insertions(+), 16 deletions(-)
> >
> > diff --git a/arch/x86/include/asm/sgx.h b/arch/x86/include/asm/sgx.h
> > index 6a0069761508..5caf5c31ebc6 100644
> > --- a/arch/x86/include/asm/sgx.h
> > +++ b/arch/x86/include/asm/sgx.h
> > @@ -26,23 +26,26 @@
> >  #define SGX_CPUID_EPC_SECTION  0x1
> >  /* The bitmask for the EPC section type. */
> >  #define SGX_CPUID_EPC_MASK GENMASK(3, 0)
> > +/* EUPDATESVN presence indication */
> > +#define SGX_CPUID_EUPDATESVN   BIT(10)
> 
> I am not sure whether this should be a new X86_FEATURE bit, just like other
> SGX
> bits:
> 
>   X86_FEATURE_SGX1
>   X86_FEATURE_SGX2
>   X86_FEATURE_SGX_EDECCSSA
> 
> .. reported in CPUID.0x12.0x0:EAX.
> 
> But this is never going to be exposed to VMs, i.e., KVM should never need to
> use
> it, so perhaps fine to just put it here.

I am fine either way. I can change this if people consider it is a better fit. 

> 
> [...]
> 
> 
> >
> > +/* This lock is held to prevent new EPC pages from being created
> > + * during the execution of ENCLS[EUPDATESVN].
> > + */
> 
> /*
>  * This lock ...
>  * ...
>  */

Thank you for catching this! I thought that I fixed all comments to this
Format but obvious I missed this one. Sad that checkpatch doesn’t
complain on this (

> 
> > +static DEFINE_SPINLOCK(sgx_epc_eupdatesvn_lock);
> > +
> >  static atomic_long_t sgx_nr_used_pages = ATOMIC_LONG_INIT(0);
> >  static unsigned long sgx_nr_total_pages;
> >
> > @@ -444,6 +449,7 @@ static struct sgx_epc_page
> *__sgx_alloc_epc_page_from_node(int nid)
> >  {
> > struct sgx_numa_node *node = &sgx_numa_nodes[nid];
> > struct sgx_epc_page *page = NULL;
> > +   bool ret;
> >
> > spin_lock(&node->lock);
> >
> > @@ -452,12 +458,33 @@ static struct sgx_epc_page
> *__sgx_alloc_epc_page_from_node(int nid)
> > return NULL;
> > }
> >
> > +   if (!atomic_long_inc_not_zero(&sgx_nr_used_pages)) {
> > +   spin_lock(&sgx_epc_eupdatesvn_lock);
> > +   /*
> > +* Only call sgx_updatesvn() once the first enclave's
> > +* page is allocated from EPC
> > +*/
> 
> The VMs can pre-populate EPC w/o having any enclave being created inside.
> How
> about just:
> 
>   /*
>* Make sure SVN is up-to-date before any EPC page is
>* allocated for any enclave.
>*/
> 

Right, agree with this wording change, good point!

> > +   if (atomic_long_read(&sgx_nr_used_pages) == 0) {
> > +   ret = sgx_updatesvn();
> > +   if (!ret) {
> > +   /*
> > +* sgx_updatesvn() returned unknown error,
> smth
> > +* must be broken, do not allocate a page
> from EPC
> > +*/
> 
> Strictly speaking, sgx_updatesvn() returns true or false, but not "unknown
> error".

Yes, will fix the wording. 

> 
> Reading below, it could also fail due to running out of entropy, which is 
> still
> legally possible to happen IMHO.

Actually, no in this case, we only return false from 

Re: [PATCH v3 2/2] x86/sgx: Implement EUPDATESVN and opportunistically call it during first EPC page alloc

2025-04-16 Thread Huang, Kai
On Tue, 2025-04-15 at 14:51 +0300, Elena Reshetova wrote:
> SGX architecture introduced a new instruction called EUPDATESVN

"a new ENCLS leaf function EUPDATESVN"?

> to Ice Lake. It allows updating security SVN version, given that EPC
> is completely empty. The latter is required for security reasons
> in order to reason that enclave security posture is as secure as the
> security SVN version of the TCB that created it.
> 
> Additionally it is important to ensure that while ENCLS[EUPDATESVN]
> runs, no concurrent page creation happens in EPC, because it might
> result in #GP delivered to the creator. Legacy SW might not be prepared
> to handle such unexpected #GPs and therefore this patch introduces
> a locking mechanism to ensure no concurrent EPC allocations can happen.
> 
> It is also ensured that ENCLS[EUPDATESVN] is not called when running
> in a VM since it does not have a meaning in this context (microcode
> updates application is limited to the host OS) and will create
> unnecessary load.
> 
> This patch is based on previous submision by Cathy Zhang
> https://lore.kernel.org/all/20220520103904.1216-1-cathy.zh...@intel.com/

When Jarkko suggested to use "Link" here:

https://lore.kernel.org/lkml/z983zatawnqfu...@kernel.org/

I think he meant something like below:

This patch is based on ... [1]

Link: https://lore.kernel.org/all/20220520103904.1216-1-cathy.zh...@intel.com/
[1]

> 
> Signed-off-by: Elena Reshetova 
> ---
>  arch/x86/include/asm/sgx.h  | 41 +++--
>  arch/x86/kernel/cpu/sgx/encls.h |  6 +++
>  arch/x86/kernel/cpu/sgx/main.c  | 82 -
>  arch/x86/kernel/cpu/sgx/sgx.h   |  1 +
>  4 files changed, 114 insertions(+), 16 deletions(-)
> 
> diff --git a/arch/x86/include/asm/sgx.h b/arch/x86/include/asm/sgx.h
> index 6a0069761508..5caf5c31ebc6 100644
> --- a/arch/x86/include/asm/sgx.h
> +++ b/arch/x86/include/asm/sgx.h
> @@ -26,23 +26,26 @@
>  #define SGX_CPUID_EPC_SECTION0x1
>  /* The bitmask for the EPC section type. */
>  #define SGX_CPUID_EPC_MASK   GENMASK(3, 0)
> +/* EUPDATESVN presence indication */
> +#define SGX_CPUID_EUPDATESVN BIT(10)

I am not sure whether this should be a new X86_FEATURE bit, just like other SGX
bits:

  X86_FEATURE_SGX1
  X86_FEATURE_SGX2
  X86_FEATURE_SGX_EDECCSSA

.. reported in CPUID.0x12.0x0:EAX.

But this is never going to be exposed to VMs, i.e., KVM should never need to use
it, so perhaps fine to just put it here.

[...]


>  
> +/* This lock is held to prevent new EPC pages from being created
> + * during the execution of ENCLS[EUPDATESVN].
> + */

/*
 * This lock ...
 * ...
 */

> +static DEFINE_SPINLOCK(sgx_epc_eupdatesvn_lock);
> +
>  static atomic_long_t sgx_nr_used_pages = ATOMIC_LONG_INIT(0);
>  static unsigned long sgx_nr_total_pages;
>  
> @@ -444,6 +449,7 @@ static struct sgx_epc_page 
> *__sgx_alloc_epc_page_from_node(int nid)
>  {
>   struct sgx_numa_node *node = &sgx_numa_nodes[nid];
>   struct sgx_epc_page *page = NULL;
> + bool ret;
>  
>   spin_lock(&node->lock);
>  
> @@ -452,12 +458,33 @@ static struct sgx_epc_page 
> *__sgx_alloc_epc_page_from_node(int nid)
>   return NULL;
>   }
>  
> + if (!atomic_long_inc_not_zero(&sgx_nr_used_pages)) {
> + spin_lock(&sgx_epc_eupdatesvn_lock);
> + /*
> +  * Only call sgx_updatesvn() once the first enclave's
> +  * page is allocated from EPC
> +  */

The VMs can pre-populate EPC w/o having any enclave being created inside.  How
about just:

/*
 * Make sure SVN is up-to-date before any EPC page is
 * allocated for any enclave.
 */

> + if (atomic_long_read(&sgx_nr_used_pages) == 0) {
> + ret = sgx_updatesvn();
> + if (!ret) {
> + /*
> +  * sgx_updatesvn() returned unknown error, smth
> +  * must be broken, do not allocate a page from 
> EPC
> +  */

Strictly speaking, sgx_updatesvn() returns true or false, but not "unknown
error".

Reading below, it could also fail due to running out of entropy, which is still
legally possible to happen IMHO.

Maybe just:
/*
 * Updating SVN failed.  SGX might be broken,
 * or running out of entropy happened.  Do not
 * allocate EPC page since it is not safe to
use
 * SGX anymore if it was the former.  If it was
 * due to running out of entropy, the further
 * call of EPC allocation will try
 * sgx_updatesvn() again.
 */

> + spin_unlock(&sgx_epc_eupdatesvn_lock);
> +