Re: [PATCH 4/5] x86/kernel: Move page table macros to new header

2024-05-23 Thread Borislav Petkov
On Thu, May 23, 2024 at 03:59:43PM +0200, Thomas Gleixner wrote:
> On Wed, Apr 10 2024 at 15:48, Jason Andryuk wrote:
> > ---
> >  arch/x86/kernel/head_64.S            | 22 ++
> >  arch/x86/kernel/pgtable_64_helpers.h | 28 
> 
> That's the wrong place as you want to include it from arch/x86/platform.
> 
> arch/x86/include/asm/

... and there already is a header waiting:

arch/x86/include/asm/pgtable_64.h

so no need for a new one.

Thx.

-- 
Regards/Gruss,
Boris.

https://people.kernel.org/tglx/notes-about-netiquette



Re: [PATCH v13 08/35] x86/fred: Disable FRED by default in its early stage

2024-01-22 Thread Borislav Petkov
On Tue, Dec 05, 2023 at 02:49:57AM -0800, Xin Li wrote:
>   Warning: use of this parameter will taint the kernel
>   and may cause unknown problems.
>  
> +	fred		[X86-64]
> +			Enable flexible return and event delivery

Let's make it accept multiple options from the get-go:

fred=on,disable-when,foo,bar,bla...

in case we need to tweak its behavior.

If it is only "fred" it will propagate this way downstream and it'll
lead to confusion later when people have to update their scripts and
config files when "fred" alone doesn't do what they're expecting
anymore.
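
IOW, something along these lines - a rough sketch only, with made-up
option names and a made-up handler:

	static int __init fred_setup(char *str)
	{
		char *s;

		while ((s = strsep(&str, ",")) != NULL) {
			if (!strcmp(s, "on"))
				; /* enable FRED */
			/* future options get parsed here */
		}

		return 0;
	}
	early_param("fred", fred_setup);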

Thx.

-- 
Regards/Gruss,
Boris.

https://people.kernel.org/tglx/notes-about-netiquette



Re: [PATCH v13 07/35] x86/fred: Disable FRED support if CONFIG_X86_FRED is disabled

2024-01-22 Thread Borislav Petkov
On Tue, Dec 05, 2023 at 02:49:56AM -0800, Xin Li wrote:
> From: "H. Peter Anvin (Intel)" 
> 
> Add CONFIG_X86_FRED to <asm/disabled-features.h> to make
> cpu_feature_enabled() work correctly with FRED.
> 
> Originally-by: Megha Dey 
> Signed-off-by: H. Peter Anvin (Intel) 
> Tested-by: Shan Kang 
> Signed-off-by: Xin Li 
> ---
> 
> Changes since v10:
> * FRED feature is defined in cpuid word 12, not 13 (Nikolay Borisov).
> ---
>  arch/x86/include/asm/disabled-features.h   | 8 +++-
>  tools/arch/x86/include/asm/disabled-features.h | 8 +++-
>  2 files changed, 14 insertions(+), 2 deletions(-)

Whoever applies this: this one and the previous one can be merged into
one patch.

Thx.

-- 
Regards/Gruss,
Boris.

https://people.kernel.org/tglx/notes-about-netiquette



Re: [PATCH v13 01/35] x86/cpufeatures,opcode,msr: Add the WRMSRNS instruction support

2024-01-03 Thread Borislav Petkov
On Tue, Jan 02, 2024 at 10:06:27PM +, Li, Xin3 wrote:
> Do I need to send an updated patch?

> Or just leave it to the maintainer who is going to take care of it?

While waiting, please take a look at this:

https://kernel.org/doc/html/latest/process/submitting-patches.html#don-t-get-discouraged-or-impatient

Might want to read the whole doc too.

But to answer your question: you wait a few weeks and collect all
comments and review feedback that you've received and incorporate them
into the patchset.

Then, after the time passes you send a new revision and explain in the
0th message what has changed.

Ok?

Thx.

-- 
Regards/Gruss,
Boris.

https://people.kernel.org/tglx/notes-about-netiquette



Re: [PATCH v13 01/35] x86/cpufeatures,opcode,msr: Add the WRMSRNS instruction support

2024-01-02 Thread Borislav Petkov
On Tue, Dec 05, 2023 at 02:49:50AM -0800, Xin Li wrote:

> Subject: Re: [PATCH v13 01/35] x86/cpufeatures,opcode,msr: Add the WRMSRNS 
> instruction support

Or simply "x86/fred: Add ... "

Other than that,

Acked-by: Borislav Petkov (AMD) 

-- 
Regards/Gruss,
Boris.

https://people.kernel.org/tglx/notes-about-netiquette



Re: [linus:master] [x86/entry] be5341eb0d: WARNING:CPU:#PID:#at_int80_emulation

2023-12-19 Thread Borislav Petkov
On Tue, Dec 19, 2023 at 04:49:14PM +0800, kernel test robot wrote:
> [ 13.481107][ T48] WARNING: CPU: 0 PID: 48 at int80_emulation 
> (arch/x86/entry/common.c:164) 
> [   13.481454][   T48] Modules linked in:
> [   13.481655][   T48] CPU: 0 PID: 48 Comm: init Tainted: G N 
> 6.7.0-rc4-2-gbe5341eb0d43 #1
> [ 13.482162][ T48] RIP: 0010:int80_emulation (arch/x86/entry/common.c:164) 

Looking at the dmesg, I think you missed the most important part - the
preceding line:

[   13.480504][   T48] CFI failure at int80_emulation+0x67/0xb0 (target: 
sys_ni_posix_timers+0x0/0x70; expected type: 0xb02b34d9)
^^^

[   13.481107][   T48] WARNING: CPU: 0 PID: 48 at int80_emulation+0x67/0xb0
[   13.481454][   T48] Modules linked in:
[   13.481655][   T48] CPU: 0 PID: 48 Comm: init Tainted: G N 
6.7.0-rc4-2-gbe5341eb0d43 #1

The CFI bla is also in the stack trace.

Now, decode_cfi_insn() has a comment there which says what the compiler
generates about indirect call checks:

 *   movl    -<id>, %r10d       ; 6 bytes
 *   addl    -4(%reg), %r10d    ; 4 bytes
 *   je      .Ltmp1              ; 2 bytes
 *   ud2                         ; <- regs->ip
 *   .Ltmp1:


and the opcodes you decoded...

> [ 13.482437][ T48] Code: 01 00 00 77 43 89 c1 48 81 f9 c9 01 00 00 48 19 c9 
> 21 c1 48 89 df 4c 8b 1c cd 90 12 20 9a 41 ba 27 cb d4 4f 45 03 53 fc 74 02 
> <0f> 0b 41 ff d3 48 89 c1 48 89 4b 50 90 48 89 df 5b 41 5e 31 c0 31
> All code
> 
>    0: 01 00                   add    %eax,(%rax)
>    2: 00 77 43                add    %dh,0x43(%rdi)
>    5: 89 c1                   mov    %eax,%ecx
>    7: 48 81 f9 c9 01 00 00    cmp    $0x1c9,%rcx
>    e: 48 19 c9                sbb    %rcx,%rcx
>   11: 21 c1                   and    %eax,%ecx
>   13: 48 89 df                mov    %rbx,%rdi
>   16: 4c 8b 1c cd 90 12 20    mov    -0x65dfed70(,%rcx,8),%r11
>   1d: 9a 
>   1e: 41 ba 27 cb d4 4f       mov    $0x4fd4cb27,%r10d
>   24: 45 03 53 fc             add    -0x4(%r11),%r10d
>   28: 74 02                   je     0x2c
>   2a:* 0f 0b                  ud2    <-- trapping instruction

... these guys here, look exactly like what the compiler did issue.

This is the first time I'm looking at this CFI bla but it sounds like it
is trying to compare the syscall target's address of
sys_ni_posix_timers with something it is expecting to call and the
comparison doesn't work out (%r10 is not 0).
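
IOW, roughly this check, in C - an illustration only, not kernel code.
Note that 0x4fd4cb27 is simply the two's complement of the expected
type hash 0xb02b34d9 from the warning above:

	/* sketch: "target" is the indirect call destination,
	 * here sys_ni_posix_timers */
	void cfi_check(void *target)
	{
		u32 r10 = 0x4fd4cb27;			/* movl $0x4fd4cb27, %r10d */

		r10 += *(u32 *)((char *)target - 4);	/* addl -0x4(%r11), %r10d */
		if (r10)				/* je skips the ud2 on a match */
			__builtin_trap();		/* ud2 */
	}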

There's that special symbol __cfi_sys_ni_posix_timers which also gets
generated...

Someone would need to dig into that whole CFI gunk to figure out why
this is not happy.

Oh well.

-- 
Regards/Gruss,
Boris.

https://people.kernel.org/tglx/notes-about-netiquette



Re: [PATCH v12 24/37] x86/idtentry: Incorporate definitions/declarations of the FRED entries

2023-11-28 Thread Borislav Petkov
On Tue, Nov 28, 2023 at 10:58:50AM -0800, H. Peter Anvin wrote:
> >You have a very good sense 

I've been reading code of a couple of people for a couple of years. :-)

> Remember that Signed-off-by: relates to the *patch flow*.

Yes, you should take the time to read Documentation/process/ and
especially submitting-patches.rst.

We have it all written down and I point new(-er) people at that
directory at least once a week. :-\

-- 
Regards/Gruss,
Boris.

https://people.kernel.org/tglx/notes-about-netiquette



Re: [PATCH v12 24/37] x86/idtentry: Incorporate definitions/declarations of the FRED entries

2023-11-28 Thread Borislav Petkov
On Mon, Oct 02, 2023 at 11:24:45PM -0700, Xin Li wrote:
> FRED and IDT can share most of the definitions and declarations so
> that in the majority of cases the actual handler implementation is the
> same.
> 
> The differences are the exceptions where FRED stores exception related
> information on the stack and the sysvec implementations as FRED can
> handle irqentry/exit() in the dispatcher instead of having it in each
> handler.
> 
> Also add stub defines for vectors which are not used due to Kconfig
> decisions to spare the ifdeffery in the actual FRED dispatch code.
> 
> Tested-by: Shan Kang 
> Signed-off-by: Thomas Gleixner 
> Signed-off-by: Xin Li 

This makes me wonder too who the author is. The commit message text
sounds like tglx. :)

> @@ -137,6 +141,17 @@ static __always_inline void __##func(struct pt_regs 
> *regs,   \
>  #define DEFINE_IDTENTRY_RAW(func)\
>  __visible noinstr void func(struct pt_regs *regs)
>  
> +/**
> + * DEFINE_FREDENTRY_RAW - Emit code for raw FRED entry points

LOL, "FREDENTRY"

...

-- 
Regards/Gruss,
Boris.

https://people.kernel.org/tglx/notes-about-netiquette



Re: [PATCH v12 23/37] x86/fred: Make exc_page_fault() work for FRED

2023-11-28 Thread Borislav Petkov
On Mon, Oct 02, 2023 at 11:24:44PM -0700, Xin Li wrote:
> From: "H. Peter Anvin (Intel)" 
> 
> On a FRED system, the faulting address (CR2) is passed on the stack,
> to avoid the problem of transient state. Thus we get the page fault
^^

Please use passive voice in your commit message: no "we" or "I", etc,
and describe your changes in imperative mood.
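
For example, instead of

  "Thus we get the page fault address from the stack"

write

  "Thus, the faulting address is read from the stack."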

-- 
Regards/Gruss,
Boris.

https://people.kernel.org/tglx/notes-about-netiquette



Re: [PATCH v12 20/37] x86/fred: Disallow the swapgs instruction when FRED is enabled

2023-11-28 Thread Borislav Petkov
On Mon, Oct 02, 2023 at 11:24:41PM -0700, Xin Li wrote:
> +  * Note, LKGS loads the GS base address into the IA32_KERNEL_GS_BASE
> +  * MSR instead of the GS segment’s descriptor cache. As such, the

:verify_diff: WARNING: Unicode char [’] (0x8217 in line: + * MSR instead of 
the GS segment’s descriptor cache. As such, the

Just do a normal ' - char number 0x27.

-- 
Regards/Gruss,
Boris.

https://people.kernel.org/tglx/notes-about-netiquette



Re: [PATCH v12 16/37] x86/ptrace: Add FRED additional information to the pt_regs structure

2023-11-28 Thread Borislav Petkov
On Mon, Oct 02, 2023 at 11:24:37PM -0700, Xin Li wrote:
> FRED defines additional information in the upper 48 bits of cs/ss
> fields. Therefore add the information definitions into the pt_regs
> structure.
> 
> Specially introduce a new structure fred_ss to denote the FRED flags
> above SS selector, which avoids FRED_SSX_ macros and makes the code
> simpler and easier to read.
> 
> Signed-off-by: H. Peter Anvin (Intel) 

You and hpa need to go through all the patches and figure out who's the
author that's going to land in git.

Because this and others have hpa's SOB first, suggesting he's the
author. However, the mail doesn't start with

From: H. Peter Anvin (Intel) 

and then git will make *you* the author.

> Tested-by: Shan Kang 
> Signed-off-by: Thomas Gleixner 
> Signed-off-by: Xin Li 

...

>   union {
> -	u64 ssx;	// The full 64-bit data slot containing SS
> -	u16 ss;		// SS selector
> + /* SS selector */
> + u16 ss;
> + /* The extended 64-bit data slot containing SS */
> + u64 ssx;
> + /* The FRED SS extension */
> + struct fred_ss  fred_ss;

Aha, sanity about the right comments has come to your mind in this next
patch. :-P

Just do them right in the previous one.

>   /*
> -  * Top of stack on IDT systems.
> +  * Top of stack on IDT systems, while FRED systems have extra fields
> +  * defined above for storing exception related information, e.g. CR2 or
> +  * DR6.

Btw, I really appreciate the good commenting - thanks for that!

-- 
Regards/Gruss,
Boris.

https://people.kernel.org/tglx/notes-about-netiquette



Re: [PATCH v12 15/37] x86/ptrace: Cleanup the definition of the pt_regs structure

2023-11-28 Thread Borislav Petkov
On Mon, Oct 02, 2023 at 11:24:36PM -0700, Xin Li wrote:
> -/* Return frame for iretq */
> +
> + /* The IRETQ return frame starts here */
>   unsigned long ip;
> - unsigned long cs;
> +
> + union {
> +	u64 csx;	// The full 64-bit data slot containing CS
> +	u16 cs;		// CS selector

I know people want to start using those // one-line comments now but
this struct already uses the /* multiline ones. Can we stick to one type
pls, otherwise it'll turn into a mess.

And pls put those comments ontop, not on the side.
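
I.e., something like this - matching the style the next patch in the
series already uses for SS:

	union {
		/* The full 64-bit data slot containing CS */
		u64	csx;
		/* CS selector */
		u16	cs;
	};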

Thx.

-- 
Regards/Gruss,
Boris.

https://people.kernel.org/tglx/notes-about-netiquette



Re: [PATCH v12 01/37] x86/cpufeatures: Add the cpu feature bit for WRMSRNS

2023-11-13 Thread Borislav Petkov
On Tue, Nov 14, 2023 at 12:43:38AM +, Li, Xin3 wrote:
> No.  tglx asked for it:
> https://lkml.kernel.org/kvm/87y1h81ht4.ffs@tglx/

Aha

"According to the CPU folks FRED systems are guaranteed to have WRMSRNS -
I asked for that :). It's just not yet documented."

so I'm going to expect that to appear in the next FRED spec revision...

> Because we are doing 
>   wrmsrns(MSR_IA32_FRED_RSP0, ...)
> here, and X86_FEATURE_WRMSRNS doesn't guarantee MSR_IA32_FRED_RSP0 exists.
> 
> Or I missed something?

Well, according to what I'm hearing and reading so far:

FRED means WRMSRNS
FRED means MSR_IA32_FRED_RSP0

and if you had to be precise, the code should do:

	if (cpu_feature_enabled(X86_FEATURE_FRED)) {
		if (cpu_feature_enabled(X86_FEATURE_WRMSRNS))
			wrmsrns(MSR_IA32_FRED_RSP0, (unsigned long)task_stack_page(task) + THREAD_SIZE);
		else
			wrmsr(MSR_IA32_FRED_RSP0, (unsigned long)task_stack_page(task) + THREAD_SIZE);
	}

but apparently FRED implies WRMSRNS - not documented anywhere currently
- so you can save yourself one check.

But your version checks FRED if it can do WRMSRNS while there's
a separate WRMSRNS flag and that made me wonder...

> Another patch set should replace WRMSR with WRMSRNS, with SERIALIZE added
> when needed.

I sense someone wants to optimize MSR writes ... :-)

Thx.

-- 
Regards/Gruss,
Boris.

https://people.kernel.org/tglx/notes-about-netiquette



Re: [PATCH v12 19/37] x86/fred: Update MSR_IA32_FRED_RSP0 during task switch

2023-11-13 Thread Borislav Petkov
On Mon, Nov 13, 2023 at 12:36:04PM -0500, H. Peter Anvin wrote:
> A resource cannot be consumed after the value has been written; this
> is the only necessary level of serialization, equivalent to, say, RAX.

Lemme see if I understand this correctly using this context as an
example: after this MSR_IA32_FRED_RSP0 write, any FRED events determined
to be delivered to level 0 will use this new task stack ptr?

And since the new task is not running yet and the old one isn't running
either, we're fine here. So the "serialization point" I was talking
about above is bollocks.

Close? :)

> A serializing instruction stops the entire pipeline until everything
> has retired and any stores have become globally visible.

Right, we don't need that here.

Thx.

-- 
Regards/Gruss,
Boris.

https://people.kernel.org/tglx/notes-about-netiquette



Re: [PATCH v12 19/37] x86/fred: Update MSR_IA32_FRED_RSP0 during task switch

2023-11-13 Thread Borislav Petkov
On Mon, Oct 02, 2023 at 11:24:40PM -0700, Xin Li wrote:
> From: "H. Peter Anvin (Intel)" 
> 
> MSR_IA32_FRED_RSP0 is used during ring 3 event delivery, and needs to
> be updated to point to the top of next task stack during task switch.
> 
> Signed-off-by: H. Peter Anvin (Intel) 
> Tested-by: Shan Kang 
> Signed-off-by: Xin Li 
> ---
>  arch/x86/include/asm/switch_to.h | 8 ++--
>  1 file changed, 6 insertions(+), 2 deletions(-)
> 
> diff --git a/arch/x86/include/asm/switch_to.h b/arch/x86/include/asm/switch_to.h
> index f42dbf17f52b..c3bd0c0758c9 100644
> --- a/arch/x86/include/asm/switch_to.h
> +++ b/arch/x86/include/asm/switch_to.h
> @@ -70,9 +70,13 @@ static inline void update_task_stack(struct task_struct *task)
>  #ifdef CONFIG_X86_32
>   this_cpu_write(cpu_tss_rw.x86_tss.sp1, task->thread.sp0);
>  #else
> - /* Xen PV enters the kernel on the thread stack. */
> - if (cpu_feature_enabled(X86_FEATURE_XENPV))
> + if (cpu_feature_enabled(X86_FEATURE_FRED)) {
> + /* WRMSRNS is a baseline feature for FRED. */
> +		wrmsrns(MSR_IA32_FRED_RSP0, (unsigned long)task_stack_page(task) + THREAD_SIZE);

If this non-serializing write happens now and, AFAICT, the CR3 write
during the task switch has already happened in switch_mm* earlier, what
is the serialization point that's going to make sure that write is
committed before the new task starts executing?

Thx.

-- 
Regards/Gruss,
Boris.

https://people.kernel.org/tglx/notes-about-netiquette



Re: [PATCH v12 01/37] x86/cpufeatures: Add the cpu feature bit for WRMSRNS

2023-11-08 Thread Borislav Petkov
On Mon, Oct 02, 2023 at 11:24:22PM -0700, Xin Li wrote:
> Subject: Re: [PATCH v12 01/37] x86/cpufeatures: Add the cpu feature bit for 
> WRMSRNS
  

For all your text:

s/cpu/CPU/g

> WRMSRNS is an instruction that behaves exactly like WRMSR, with
> the only difference being that it is not a serializing instruction
> by default. Under certain conditions, WRMSRNS may replace WRMSR to
> improve performance.
> 
> Add the CPU feature bit for WRMSRNS.
> 
> Tested-by: Shan Kang 
> Signed-off-by: Xin Li 
> ---
>  arch/x86/include/asm/cpufeatures.h   | 1 +
>  tools/arch/x86/include/asm/cpufeatures.h | 1 +
>  2 files changed, 2 insertions(+)

It looks to me like you can merge the first three patches into one as
all they do is add that insn support.

Then, further down in the patchset, it says:

+   if (cpu_feature_enabled(X86_FEATURE_FRED)) {
+   /* WRMSRNS is a baseline feature for FRED. */

but WRMSRNS is not mentioned in the FRED spec "Document Number:
346446-005US, Revision: 5.0" which, according to

https://www.intel.com/content/www/us/en/content-details/780121/flexible-return-and-event-delivery-fred-specification.html

is the latest.

Am I looking at the wrong one?

> diff --git a/arch/x86/include/asm/cpufeatures.h 
> b/arch/x86/include/asm/cpufeatures.h
> index 58cb9495e40f..330876d34b68 100644
> --- a/arch/x86/include/asm/cpufeatures.h
> +++ b/arch/x86/include/asm/cpufeatures.h
> @@ -322,6 +322,7 @@
>  #define X86_FEATURE_FSRS		(12*32+11) /* "" Fast short REP STOSB */
>  #define X86_FEATURE_FSRC		(12*32+12) /* "" Fast short REP {CMPSB,SCASB} */
>  #define X86_FEATURE_LKGS		(12*32+18) /* "" Load "kernel" (userspace) GS */
> +#define X86_FEATURE_WRMSRNS		(12*32+19) /* "" Non-Serializing Write to Model Specific Register instruction */

  /* "" Non-serializing WRMSR */

is more than enough.

And now I'm wondering: when you're adding a separate CPUID bit, then the
above should be

+   if (cpu_feature_enabled(X86_FEATURE_WRMSRNS)) {
+   /* WRMSRNS is a baseline feature for FRED. */

I see that you're adding a dependency:

+   { X86_FEATURE_FRED, X86_FEATURE_WRMSRNS   },

which then means you don't need the X86_FEATURE_WRMSRNS definition at
all and can use X86_FEATURE_FRED only.
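
I.e., the call site could then simply be - a sketch, assuming the
dependency above is in place:

	if (cpu_feature_enabled(X86_FEATURE_FRED))
		wrmsrns(MSR_IA32_FRED_RSP0, (unsigned long)task_stack_page(task) + THREAD_SIZE);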

So, what's up?

Thx.

-- 
Regards/Gruss,
Boris.

https://people.kernel.org/tglx/notes-about-netiquette



Re: [PATCH v3 1/5] x86/paravirt: move some functions and defines to alternative

2023-10-25 Thread Borislav Petkov
On Wed, Oct 25, 2023 at 03:31:07PM +0200, Juergen Gross wrote:
> There is
> 
> #define nop() asm volatile ("nop")
> 
> in arch/x86/include/asm/special_insns.h already.

Then call it "nop_func" or so.

> It might not be needed now, but are you sure we won't need it in future?

No, I'm not.

What I'm sure of is: stuff should be added to the kernel only when
really needed. Not in the expectation that it might potentially be
needed at some point.

Thx.

-- 
Regards/Gruss,
Boris.

https://people.kernel.org/tglx/notes-about-netiquette



Re: [PATCH v3 1/5] x86/paravirt: move some functions and defines to alternative

2023-10-25 Thread Borislav Petkov
On Thu, Oct 19, 2023 at 11:15:16AM +0200, Juergen Gross wrote:
> +/* Low-level backend functions usable from alternative code replacements. */
> +DEFINE_ASM_FUNC(x86_nop, "", .entry.text);
> +EXPORT_SYMBOL_GPL(x86_nop);

This is all x86 code so you don't really need the "x86_" prefix - "nop"
is perfectly fine.

> +noinstr void x86_BUG(void)
> +{
> + BUG();
> +}
> +EXPORT_SYMBOL_GPL(x86_BUG);

That export is needed for?

Paravirt stuff in modules?

It builds here without it - I guess I need to do an allmodconfig.

-- 
Regards/Gruss,
Boris.

https://people.kernel.org/tglx/notes-about-netiquette



Re: [PATCH] Revert "EDAC/mce_amd: Do not load edac_mce_amd module on guests"

2023-09-15 Thread Borislav Petkov
On Thu, Sep 14, 2023 at 10:02:05AM -0700, Elliott Mitchell wrote:
> Indeed.  At what point is the lack of information and response long
> enough to simply commit a revert due to those lacks?

At no point.

> Even with the commit message having been rewritten and the link to:
> https://lkml.kernel.org/r/20210628172740.245689-1-smita.koralahallichannabasa...@amd.com
> added, this still reads as roughly:
> 
> "A hypothetical bug on a hypothetivisor"

If "Hypervisors likely do not expose the SMCA feature to the guest"
doesn't explain to you what the problem is this commit is fixing, then
I can't help you.

-- 
Regards/Gruss,
Boris.

https://people.kernel.org/tglx/notes-about-netiquette



Re: [PATCH] Revert "EDAC/mce_amd: Do not load edac_mce_amd module on guests"

2023-09-07 Thread Borislav Petkov
On Thu, Sep 07, 2023 at 08:08:00PM -0700, Elliott Mitchell wrote:
> This reverts commit 767f4b620edadac579c9b8b6660761d4285fa6f9.
> 
> There are at least 3 valid reasons why a VM may see MCE events/registers.

Hmm, so they all read like a bunch of handwaving to me, with those
probable hypothetical "may" formulations.

How about we cut to the chase and you explain what exactly is the
concrete issue you're encountering and trying to solve?

Thx.

-- 
Regards/Gruss,
Boris.

https://people.kernel.org/tglx/notes-about-netiquette



Re: [PATCH v6 00/16] x86/mtrr: fix handling with PAT but without MTRR

2023-06-01 Thread Borislav Petkov
On Thu, Jun 01, 2023 at 03:22:33PM +0200, Borislav Petkov wrote:
> Now lemme restart testing.

This is from another box, with the latest changes incorporated:

https://git.kernel.org/pub/scm/linux/kernel/git/bp/bp.git/log/?h=rc1-mtrr

--- proc-mtrr.before	2011-03-04 01:03:35.243994733 +0100
+++ proc-mtrr.after 2023-06-01 16:28:54.95456 +0200
@@ -1,3 +1,3 @@
 reg00: base=0x0 (0MB), size= 2048MB, count=1: write-back
 reg01: base=0x08000 ( 2048MB), size= 1024MB, count=1: write-back
-reg02: base=0x0c000 ( 3072MB), size=  256MB, count=1: write-back
+reg02: base=0x0c000 ( 3072MB), size=  128MB, count=1: write-back

Want mtrr=debug output again?

-- 
Regards/Gruss,
Boris.

https://people.kernel.org/tglx/notes-about-netiquette



Re: [PATCH v6 00/16] x86/mtrr: fix handling with PAT but without MTRR

2023-06-01 Thread Borislav Petkov
On Thu, Jun 01, 2023 at 10:19:01AM +0200, Juergen Gross wrote:
> Patch 2 wants this diff on top:

Obviously. :-)

That fixes it, thx.

Now lemme restart testing.

-- 
Regards/Gruss,
Boris.

https://people.kernel.org/tglx/notes-about-netiquette



Re: [PATCH v6 00/16] x86/mtrr: fix handling with PAT but without MTRR

2023-06-01 Thread Borislav Petkov
On Thu, Jun 01, 2023 at 08:39:17AM +0200, Juergen Gross wrote:
> Does this translate to: "we should remove that cleanup crap"? I'd be
> positive to that. :-)

Why, what's wrong with that thing?

-- 
Regards/Gruss,
Boris.

https://people.kernel.org/tglx/notes-about-netiquette



Re: [PATCH v6 00/16] x86/mtrr: fix handling with PAT but without MTRR

2023-05-31 Thread Borislav Petkov
On Wed, May 31, 2023 at 11:31:37AM +0200, Juergen Gross wrote:
> What it did would have been printed if pr_debug() would have been
> active. :-(

Lemme turn those into pr_info(). pr_debug() is nuts.

> Did you check whether CONFIG_MTRR_SANITIZER_ENABLE_DEFAULT was the same in 
> both
> kernels you've tested?

Yes, it is enabled.

-- 
Regards/Gruss,
Boris.

https://people.kernel.org/tglx/notes-about-netiquette



Re: [PATCH v6 00/16] x86/mtrr: fix handling with PAT but without MTRR

2023-05-31 Thread Borislav Petkov
On Wed, May 31, 2023 at 09:28:57AM +0200, Juergen Gross wrote:
> Can you please boot the system with the MTRR patches and specify "mtrr=debug"
> on the command line? I'd be interested in the raw register values being read
> and the resulting memory type map.

This is exactly why I wanted this option. And you're already putting it
to good use. :-P

Full dmesg below.

[0.00] microcode: updated early: 0x710 -> 0x718, date = 2019-05-21
[0.00] Linux version 6.4.0-rc1+ (boris@zn) (gcc (Debian 12.2.0-9) 
12.2.0, GNU ld (GNU Binutils for Debian) 2.40) #1 SMP PREEMPT_DYNAMIC Tue May 
30 15:54:17 CEST 2023
[0.00] Command line: BOOT_IMAGE=/boot/vmlinuz-6.4.0-rc1+ root=/dev/sda7 
ro earlyprintk=ttyS0,115200 console=ttyS0,115200 console=tty0 ras=cec_disable 
root=/dev/sda7 log_buf_len=10M resume=/dev/sda5 no_console_suspend 
ignore_loglevel mtrr=debug
[0.00] KERNEL supported cpus:
[0.00]   Intel GenuineIntel
[0.00]   AMD AuthenticAMD
[0.00]   Centaur CentaurHauls
[0.00] x86/fpu: Supporting XSAVE feature 0x001: 'x87 floating point 
registers'
[0.00] x86/fpu: Supporting XSAVE feature 0x002: 'SSE registers'
[0.00] x86/fpu: Supporting XSAVE feature 0x004: 'AVX registers'
[0.00] x86/fpu: xstate_offset[2]:  576, xstate_sizes[2]:  256
[0.00] x86/fpu: Enabled xstate features 0x7, context size is 832 bytes, 
using 'standard' format.
[0.00] signal: max sigframe size: 1776
[0.00] BIOS-provided physical RAM map:
[0.00] BIOS-e820: [mem 0x-0x0009] usable
[0.00] BIOS-e820: [mem 0x0010-0x18ebafff] usable
[0.00] BIOS-e820: [mem 0x18ebb000-0x18fe7fff] ACPI NVS
[0.00] BIOS-e820: [mem 0x18fe8000-0x18fe8fff] usable
[0.00] BIOS-e820: [mem 0x18fe9000-0x18ff] ACPI NVS
[0.00] BIOS-e820: [mem 0x1900-0x1dffcfff] usable
[0.00] BIOS-e820: [mem 0x1dffd000-0x1dff] ACPI data
[0.00] BIOS-e820: [mem 0x1e00-0xac77cfff] usable
[0.00] BIOS-e820: [mem 0xac77d000-0xac77] type 20
[0.00] BIOS-e820: [mem 0xac78-0xac780fff] reserved
[0.00] BIOS-e820: [mem 0xac781000-0xac782fff] type 20
[0.00] BIOS-e820: [mem 0xac783000-0xac7d9fff] reserved
[0.00] BIOS-e820: [mem 0xac7da000-0xac7dafff] type 20
[0.00] BIOS-e820: [mem 0xac7db000-0xac7dcfff] reserved
[0.00] BIOS-e820: [mem 0xac7dd000-0xac7e7fff] type 20
[0.00] BIOS-e820: [mem 0xac7e8000-0xac7f1fff] reserved
[0.00] BIOS-e820: [mem 0xac7f2000-0xac7f5fff] type 20
[0.00] BIOS-e820: [mem 0xac7f6000-0xac7f9fff] reserved
[0.00] BIOS-e820: [mem 0xac7fa000-0xac7fafff] type 20
[0.00] BIOS-e820: [mem 0xac7fb000-0xac803fff] reserved
[0.00] BIOS-e820: [mem 0xac804000-0xac810fff] type 20
[0.00] BIOS-e820: [mem 0xac811000-0xac813fff] reserved
[0.00] BIOS-e820: [mem 0xac814000-0xad7f] usable
[0.00] BIOS-e820: [mem 0xb000-0xb3ff] reserved
[0.00] BIOS-e820: [mem 0xfed2-0xfed3] reserved
[0.00] BIOS-e820: [mem 0xfed5-0xfed8] reserved
[0.00] BIOS-e820: [mem 0xffa0-0xffa3] reserved
[0.00] BIOS-e820: [mem 0x0001-0x00044fff] usable
[0.00] printk: bootconsole [earlyser0] enabled
[0.00] printk: debug: ignoring loglevel setting.
[0.00] NX (Execute Disable) protection: active
[0.00] efi: EFI v2.0 by American Megatrends
[0.00] efi: ACPI 2.0=0x1d98 SMBIOS=0xac811018 
[0.00] efi: Remove mem57: MMIO range=[0xb000-0xb3ff] (64MB) 
from e820 map
[0.00] e820: remove [mem 0xb000-0xb3ff] reserved
[0.00] efi: Not removing mem58: MMIO range=[0xfed2-0xfed3] 
(128KB) from e820 map
[0.00] efi: Remove mem59: MMIO range=[0xfed5-0xfed8] (0MB) from 
e820 map
[0.00] e820: remove [mem 0xfed5-0xfed8] reserved
[0.00] efi: Remove mem60: MMIO range=[0xffa0-0xffa3] (0MB) from 
e820 map
[0.00] e820: remove [mem 0xffa0-0xffa3] reserved
[0.00] SMBIOS 2.6 present.
[0.00] DMI: Dell Inc. Precision T3600/0PTTT9, BIOS A13 05/11/2014
[0.00] tsc: Fast TSC calibration using PIT
[0.00] tsc: Detected 3591.377 MHz processor
[0.000767] e820: update [mem 0x-0x0fff] usable ==> reserved
[0.007307] e820: remove [mem 0x000a-0x000f] usable
[0.012878] last_pfn = 0x45 max_arch_pfn = 0x4
[0.018357] MTRR default 

Re: [PATCH v6 00/16] x86/mtrr: fix handling with PAT but without MTRR

2023-05-30 Thread Borislav Petkov
On Mon, May 22, 2023 at 04:17:50PM +0200, Juergen Gross wrote:
> The attached diff is for patch 13.

Merged and pushed out into same branch.

Next issue. Diffing /proc/mtrr shows:

--- proc-mtrr.6.3   2023-05-30 17:00:13.215999483 +0200
+++ proc-mtrr.after 2023-05-30 16:01:38.281997816 +0200
@@ -1,8 +1,8 @@
 reg00: base=0x0 (0MB), size= 2048MB, count=1: write-back
-reg01: base=0x08000 ( 2048MB), size=  512MB, count=1: write-back
+reg01: base=0x08000 ( 2048MB), size= 1024MB, count=1: write-back
 reg02: base=0x0a000 ( 2560MB), size=  256MB, count=1: write-back
 reg03: base=0x0ae00 ( 2784MB), size=   32MB, count=1: uncachable
-reg04: base=0x1 ( 4096MB), size= 4096MB, count=1: write-back
+reg04: base=0x1 ( 4096MB), size=  256MB, count=1: write-back
 reg05: base=0x2 ( 8192MB), size= 8192MB, count=1: write-back
 reg06: base=0x4 (16384MB), size= 1024MB, count=1: write-back
 reg07: base=0x44000 (17408MB), size=  256MB, count=1: write-back

-- 
Regards/Gruss,
Boris.

https://people.kernel.org/tglx/notes-about-netiquette



Re: [PATCH v6 00/16] x86/mtrr: fix handling with PAT but without MTRR

2023-05-11 Thread Borislav Petkov
On Wed, May 10, 2023 at 05:53:15PM +0200, Juergen Gross wrote:
> Urgh, yes, there is something missing:
> 
> diff --git a/arch/x86/kernel/cpu/mtrr/generic.c 
> b/arch/x86/kernel/cpu/mtrr/generic.c
> index 031f7ea8e72b..9544e7d13bb3 100644
> --- a/arch/x86/kernel/cpu/mtrr/generic.c
> +++ b/arch/x86/kernel/cpu/mtrr/generic.c
> @@ -521,8 +521,12 @@ u8 mtrr_type_lookup(u64 start, u64 end, u8 *uniform)
> for (i = 0; i < cache_map_n && start < end; i++) {
> if (start >= cache_map[i].end)
> continue;

So the loop will go through the map until...

> -   if (start < cache_map[i].start)
> +   if (start < cache_map[i].start) {

... it reaches the first entry where that is true.

> type = type_merge(type, mtrr_state.def_type, uniform);

the @type argument is MTRR_TYPE_INVALID, def_type is WRBACK so what
this'll do is simply get you the default WRBACK type:

type_merge:
	if (type == MTRR_TYPE_INVALID)
		return new_type;

> +   start = cache_map[i].start;
> +   if (end <= start)
> +   break;

Now you break here because end <= start. Why?

You can just as well do:

	if (start < cache_map[i].start) {
		/* region non-overlapping with the region in the map */
		if (end <= cache_map[i].start)
			return type_merge(type, mtrr_state.def_type, uniform);

	... rest of the processing ...

In general, I get it that your code is slick but I want it to be
maintainable - not slick. I'd like for people looking at this to not
have to add a bunch of debugging output in order to swap the whole
thing back into their brains.

So mtrr_type_lookup() definitely needs comments explaining what goes
where.

You can send it as a diff ontop - I'll merge it.

Thx.

-- 
Regards/Gruss,
Boris.

https://people.kernel.org/tglx/notes-about-netiquette



Re: [PATCH v6 00/16] x86/mtrr: fix handling with PAT but without MTRR

2023-05-10 Thread Borislav Petkov
On Wed, May 10, 2023 at 03:30:24PM +0200, Borislav Petkov wrote:
> So this map lookup thing is wrong in this case.

Btw, current tree is:

https://git.kernel.org/pub/scm/linux/kernel/git/bp/bp.git/log/?h=rc1-mtrr

-- 
Regards/Gruss,
Boris.

https://people.kernel.org/tglx/notes-about-netiquette



Re: [PATCH v6 00/16] x86/mtrr: fix handling with PAT but without MTRR

2023-05-10 Thread Borislav Petkov
On Wed, May 10, 2023 at 01:36:41AM +0200, Borislav Petkov wrote:
> More staring at this tomorrow, on a clear head.

Yeah, I'm going to leave it as is. Tried doing a union with bitfields
but doesn't get any prettier.

Next crapola:

The Intel box says now:

[8.138683] sgx: EPC section 0x8020-0x85ff
[8.204838] pmd_set_huge: Cannot satisfy [mem 0x8020-0x8040] with a 
huge-page mapping due to MTRR override, uniform: 0

(I've extended the debug output).

and that happens because

[8.174229] mtrr_type_lookup: mtrr_state_set: 1
[8.178909] mtrr_type_lookup: start: 0x8020, cache_map[3].start: 
0x8880

that's

 if (start < cache_map[i].start) {

in mtrr_type_lookup(). I fail to see how that check would work for the
range 0x8020-0x8040 and the MTRR map is:

[0.000587] MTRR map: 4 entries (3 fixed + 1 variable; max 23), built from 
10 variable MTRRs
[0.000588]   0: -0009 write-back
[0.000589]   1: 000a-000b uncachable
[0.000590]   2: 000c-000f write-protect
[0.000591]   3: 8880- uncachable

so the UC range comes after this one we request.

[8.186372] mtrr_type_lookup: type: 0x6, cache_map[3].type: 0x0

now the next type merging happens and the 3rd region's type is UC, ofc.

[8.192433] type_merge: type: 0x6, new_type: 0x0, effective_type: 0x0, clear 
uniform

we clear uniform and we fail:

[8.200331] mtrr_type_lookup: ret, uniform: 0

So this map lookup thing is wrong in this case.

-- 
Regards/Gruss,
Boris.

https://people.kernel.org/tglx/notes-about-netiquette



Re: [PATCH v6 00/16] x86/mtrr: fix handling with PAT but without MTRR

2023-05-09 Thread Borislav Petkov
On Tue, May 09, 2023 at 10:14:37PM +0200, Borislav Petkov wrote:
> On Tue, May 02, 2023 at 02:09:15PM +0200, Juergen Gross wrote:
> > This series tries to fix the rather special case of PAT being available
> > without having MTRRs (either due to CONFIG_MTRR being not set, or
> > because the feature has been disabled e.g. by a hypervisor).
> 
> More weird stuff. With the series:

Yah, that was me.

That ->enabled thing is *two* bits. FFS.

More staring at this tomorrow, on a clear head.

diff --git a/arch/x86/include/uapi/asm/mtrr.h b/arch/x86/include/uapi/asm/mtrr.h
index a28e6bbd8f21..f476a1355182 100644
--- a/arch/x86/include/uapi/asm/mtrr.h
+++ b/arch/x86/include/uapi/asm/mtrr.h
@@ -84,7 +84,7 @@ typedef __u8 mtrr_type;
 struct mtrr_state_type {
struct mtrr_var_range var_ranges[MTRR_MAX_VAR_RANGES];
mtrr_type fixed_ranges[MTRR_NUM_FIXED_RANGES];
-   bool enabled;
+   unsigned char enabled;
bool have_fixed;
mtrr_type def_type;
 };

-- 
Regards/Gruss,
Boris.

https://people.kernel.org/tglx/notes-about-netiquette



Re: [PATCH v6 00/16] x86/mtrr: fix handling with PAT but without MTRR

2023-05-09 Thread Borislav Petkov
On Tue, May 02, 2023 at 02:09:15PM +0200, Juergen Gross wrote:
> This series tries to fix the rather special case of PAT being available
> without having MTRRs (either due to CONFIG_MTRR being not set, or
> because the feature has been disabled e.g. by a hypervisor).

More weird stuff. With the series:

[root@vh: ~> cat /proc/mtrr 
cat: /proc/mtrr: Input/output error

before:

[root@vh: ~> cat /proc/mtrr 
reg00: base=0x0 (0MB), size= 2048MB, count=1: write-back
reg01: base=0x08000 ( 2048MB), size= 1024MB, count=1: write-back
reg02: base=0x0c000 ( 3072MB), size=  256MB, count=1: write-back
reg03: base=0x0ff00 ( 4080MB), size=   16MB, count=1: write-protect

I think it wrongly determines that MTRRs are disabled by BIOS:

MTRRs disabled by BIOS
x86/PAT: Configuration [0-7]: WB  WC  UC- UC  WB  WP  UC- WT

which is obviously wrong.

But more debugging later.

-- 
Regards/Gruss,
Boris.

https://people.kernel.org/tglx/notes-about-netiquette



Re: [PATCH v6 00/16] x86/mtrr: fix handling with PAT but without MTRR

2023-05-03 Thread Borislav Petkov
On Wed, May 03, 2023 at 11:14:25AM -0700, Sohil Mehta wrote:
> A Nit -> Documentation/process/maintainer-tip.rst suggests:
> "The condensed patch description in the subject line should start with a
> uppercase letter and ..."

Yeah, good point.

But my patch massaging script does that automatically so no need to
resend.

Thx.

-- 
Regards/Gruss,
Boris.

https://people.kernel.org/tglx/notes-about-netiquette



Re: [PATCH v4 05/12] x86/xen: set MTRR state when running as Xen PV initial domain

2023-03-23 Thread Borislav Petkov
On Mon, Mar 06, 2023 at 05:34:18PM +0100, Juergen Gross wrote:
> + for (reg = 0; reg < MTRR_MAX_VAR_RANGES; reg++) {
> + op.u.read_memtype.reg = reg;
> + if (HYPERVISOR_platform_op())
> + break;
> +
> + /*
> +  * Only called in dom0, which has all RAM PFNs mapped at
> +  * RAM MFNs, and all PCI space etc. is identity mapped.
> +  * This means we can treat MFN == PFN regarding MTTR settings.


"MTRR"

-- 
Regards/Gruss,
Boris.

https://people.kernel.org/tglx/notes-about-netiquette



Re: [PATCH v2] x86/paravirt: merge activate_mm and dup_mmap callbacks

2023-03-06 Thread Borislav Petkov
On Thu, Feb 23, 2023 at 04:05:51PM +0100, Juergen Gross wrote:
> x86 maintainers, I think this patch should be carried via the tip tree.

You missed a spot. I'll whack it.

diff --git a/arch/x86/include/asm/mmu_context.h 
b/arch/x86/include/asm/mmu_context.h
index a8b323266179..c3ad8a526378 100644
--- a/arch/x86/include/asm/mmu_context.h
+++ b/arch/x86/include/asm/mmu_context.h
@@ -16,13 +16,6 @@
 
 extern atomic64_t last_mm_ctx_id;
 
-#ifndef CONFIG_PARAVIRT_XXL
-static inline void paravirt_activate_mm(struct mm_struct *prev,
-   struct mm_struct *next)
-{
-}
-#endif /* !CONFIG_PARAVIRT_XXL */
-
 #ifdef CONFIG_PERF_EVENTS
 DECLARE_STATIC_KEY_FALSE(rdpmc_never_available_key);
 DECLARE_STATIC_KEY_FALSE(rdpmc_always_available_key);

-- 
Regards/Gruss,
Boris.

https://people.kernel.org/tglx/notes-about-netiquette



Re: [PATCH v7 13/41] mm: Make pte_mkwrite() take a VMA

2023-03-02 Thread Borislav Petkov
On Mon, Feb 27, 2023 at 02:29:29PM -0800, Rick Edgecombe wrote:
> [0] 
> https://lore.kernel.org/lkml/0e29a2d0-08d8-bcd6-ff26-4bea0e403...@redhat.com/#t

I guess that sub-thread about how you arrived at this "pass a VMA"
decision should be in the Link tag. But that's for the committer, I'd
say.

Thx.

-- 
Regards/Gruss,
Boris.

https://people.kernel.org/tglx/notes-about-netiquette



Re: [PATCH v2 4/7] x86/power: Inline write_cr[04]()

2023-01-20 Thread Borislav Petkov
On Mon, Jan 16, 2023 at 03:25:37PM +0100, Peter Zijlstra wrote:
> Since we can't do CALL/RET until GS is restored and CR[04] pinning is
^^

Ditto like for the previous one.

-- 
Regards/Gruss,
Boris.

https://people.kernel.org/tglx/notes-about-netiquette



Re: [PATCH v2 3/7] x86/power: De-paravirt restore_processor_state()

2023-01-20 Thread Borislav Petkov
On Mon, Jan 16, 2023 at 03:25:36PM +0100, Peter Zijlstra wrote:
> Since Xen PV doesn't use restore_processor_state(), and we're going to
> have to avoid CALL/RET until at least GS is restored,

Drop the "we":

"..., and CALL/RET cannot happen before GS has been restored, ..."

-- 
Regards/Gruss,
Boris.

https://people.kernel.org/tglx/notes-about-netiquette



Re: [PATCH v2 2/7] x86/boot: Delay sev_verify_cbit() a bit

2023-01-19 Thread Borislav Petkov
On Mon, Jan 16, 2023 at 03:25:35PM +0100, Peter Zijlstra wrote:
> Per the comment it is important to call sev_verify_cbit() before the
> first RET instruction, this means we can delay calling this until more

Make that "... this means that this can be delayed until... "

And I believe this is not about the first RET insn but about the *next* RET
which will pop poisoned crap from the unencrypted stack and do shits with it.

Also, there's this over sev_verify_cbit():

 * sev_verify_cbit() is called before switching to a new long-mode page-table
 * at boot.

so you can't move it under the

	movq	%rax, %cr3

Looking at this more, there's a sme_enable() call on the BSP which is already in
C.

So, can we do that C-bit verification once on the BSP, *in C* which would be a
lot easier, and be done with it?

Once it is verified there, the bit is the same on all APs so all good.

Right?

joro?

-- 
Regards/Gruss,
Boris.

https://people.kernel.org/tglx/notes-about-netiquette



Re: [PATCH v2 1/2] x86: Check return values from early_memremap calls

2023-01-04 Thread Borislav Petkov
On Thu, Nov 10, 2022 at 08:07:53AM -0800, Dave Hansen wrote:
> On 11/10/22 07:45, Ross Philipson wrote:
> > dt = early_memremap(initial_dtb, map_len);
> > +   if (!dt) {
> > +   pr_warn("failed to memremap initial dtb\n");
> > +   return;
> > +   }
> 
> Are all of these new pr_warn/err()'s really adding much value?  They all
> look pretty generic.  It makes me wonder if we should just spit out a
> generic message in early_memremap() and save all the callers the trouble.

Well, let's see.

early_memremap() calls __early_ioremap() and that one already warns before each
NULL return. So I guess we don't need the error messages as we will know where
it starts failing.

I guess we still need the error handling though.

I.e., this above should be:

dt = early_memremap(initial_dtb, map_len);
+   if (!dt)
+   return;

so that we don't go off into the weeds with a NULL ptr.

Or?

-- 
Regards/Gruss,
Boris.

https://people.kernel.org/tglx/notes-about-netiquette



Re: [PATCH v3] x86/boot: skip realmode init code when running as Xen PV guest

2022-11-24 Thread Borislav Petkov
On Thu, Nov 24, 2022 at 02:30:39PM +0100, Juergen Gross wrote:
> Looking at the date when 084ee1c641a0 went in I don't think it _needs_
> to go in now, but I wouldn't complain ...

So if you don't have a particular and specific reason, I won't queue it
for stable at all.

-- 
Regards/Gruss,
Boris.

https://people.kernel.org/tglx/notes-about-netiquette



Re: [PATCH v3] x86/boot: skip realmode init code when running as Xen PV guest

2022-11-24 Thread Borislav Petkov
On Wed, Nov 23, 2022 at 12:45:23PM +0100, Juergen Gross wrote:
> When running as a Xen PV guest there is no need for setting up the
> realmode trampoline, as realmode isn't supported in this environment.
> 
> Trying to setup the trampoline has been proven to be problematic in
> some cases, especially when trying to debug early boot problems with
> Xen requiring to keep the EFI boot-services memory mapped (some
> firmware variants seem to claim basically all memory below 1M for boot
> services).
> 
> Introduce new x86_platform_ops operations for that purpose, which can
> be set to a nop by the Xen PV specific kernel boot code.
> 
> Fixes: 084ee1c641a0 ("x86, realmode: Relocator for realmode code")

This text and Fixes: tag sounds like this needs to go to Linus and
stable now?

> diff --git a/arch/x86/realmode/init.c b/arch/x86/realmode/init.c
> index 41d7669a97ad..247aca9f8ed1 100644
> --- a/arch/x86/realmode/init.c
> +++ b/arch/x86/realmode/init.c
> @@ -200,14 +200,18 @@ static void __init set_real_mode_permissions(void)
>   set_memory_x((unsigned long) text_start, text_size >> PAGE_SHIFT);
>  }
>  
> -static int __init init_real_mode(void)
> +void __init init_real_mode(void)
>  {
>   if (!real_mode_header)
>   panic("Real mode trampoline was not allocated");
>  
>   setup_real_mode();
>   set_real_mode_permissions();
> +}
>  
> +static int __init call_init_real_mode(void)
> +{
> + x86_platform.realmode_init();
>   return 0;
>  }
> -early_initcall(init_real_mode);
> +early_initcall(call_init_real_mode);

I'll name that one "do_init_real_mode" as "call init" sounds weird.

Otherwise, it is as straightforward as it gets.

Thx.

-- 
Regards/Gruss,
Boris.

https://people.kernel.org/tglx/notes-about-netiquette



Re: [PATCH] x86/xen: simplify sysenter and syscall setup

2022-10-20 Thread Borislav Petkov
On Thu, Oct 20, 2022 at 01:36:19PM +0200, Juergen Gross wrote:
> xen_enable_sysenter() and xen_enable_syscall() can be simplified a lot.
> 
> Signed-off-by: Juergen Gross 
> ---
>  arch/x86/xen/setup.c | 23 ++-
>  1 file changed, 6 insertions(+), 17 deletions(-)
> 
> diff --git a/arch/x86/xen/setup.c b/arch/x86/xen/setup.c
> index cfa99e8f054b..0f33ed6d3a7b 100644
> --- a/arch/x86/xen/setup.c
> +++ b/arch/x86/xen/setup.c
> @@ -910,17 +910,9 @@ static int register_callback(unsigned type, const void *func)
>  
>  void xen_enable_sysenter(void)
>  {
> - int ret;
> - unsigned sysenter_feature;
> -
> - sysenter_feature = X86_FEATURE_SYSENTER32;
> -
> - if (!boot_cpu_has(sysenter_feature))
> - return;
> -
> -	ret = register_callback(CALLBACKTYPE_sysenter, xen_entry_SYSENTER_compat);
> - if(ret != 0)
> - setup_clear_cpu_cap(sysenter_feature);
> + if (boot_cpu_has(X86_FEATURE_SYSENTER32) &&

Can you switch that and below to cpu_feature_enabled() while at it, pls?

> + if (boot_cpu_has(X86_FEATURE_SYSCALL32) &&
^^
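
I.e., roughly this shape after the conversion - an untested sketch:

	void xen_enable_sysenter(void)
	{
		if (cpu_feature_enabled(X86_FEATURE_SYSENTER32) &&
		    register_callback(CALLBACKTYPE_sysenter, xen_entry_SYSENTER_compat))
			setup_clear_cpu_cap(X86_FEATURE_SYSENTER32);
	}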

Thx.

-- 
Regards/Gruss,
Boris.

https://people.kernel.org/tglx/notes-about-netiquette



Re: [PATCH 1/2] x86: Check return values from early_memremap calls

2022-10-12 Thread Borislav Petkov
On Wed, Oct 12, 2022 at 11:13:20AM -0400, Ross Philipson wrote:
> I am already using the pr_* macros in the patches. Are you asking me to do
> something

Yes. You do

pr_X("prefix_string: msg"

while you should set the prefix string and do

pr_X("msg".

As said, grep the tree for examples where pr_fmt() is set.
arch/x86/kernel/cpu/bugs.c is a good example.
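
I.e., the usual pattern looks like this - the prefix string here is
made up:

	/* must come before the printk.h/kernel.h includes */
	#define pr_fmt(fmt) "slaunch: " fmt

	...

	pr_warn("failed to memremap initial dtb\n");
	/* prints: "slaunch: failed to memremap initial dtb" */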

> When I was working on the patches I asked Andrew Cooper at Citrix what
> action I should take if any of the calls in the Xen code failed. I
> believe he told me it was basically fatal and that panic() would be
> fine there.

Yes, that should be confirmed by people who know the code and you should
mention in the commit message at least that panicking is the only viable
thing to do in the respective case. If, as Jürgen says, it is actually
better to panic() in those cases, especially if it otherwise would make
debugging harder, then sure, by all means.

Thx.

-- 
Regards/Gruss,
Boris.

https://people.kernel.org/tglx/notes-about-netiquette



Re: [PATCH 1/2] x86: Check return values from early_memremap calls

2022-10-08 Thread Borislav Petkov
Adding Xen and Jailhouse people and MLs to Cc.

Folks, thread starts here:

https://lore.kernel.org/r/1650035401-22855-1-git-send-email-ross.philip...@oracle.com

On Fri, Apr 15, 2022 at 11:10:00AM -0400, Ross Philipson wrote:
> There are a number of places where early_memremap is called
> but the return pointer is not checked for NULL. The call
> can result in a NULL being returned so the checks must
> be added.
> 
> Signed-off-by: Ross Philipson 
> ---
>  arch/x86/kernel/devicetree.c | 10 ++
>  arch/x86/kernel/e820.c   |  5 +
>  arch/x86/kernel/jailhouse.c  |  6 ++
>  arch/x86/kernel/mpparse.c| 23 +++
>  arch/x86/kernel/setup.c  |  5 +
>  arch/x86/xen/enlighten_hvm.c |  2 ++
>  arch/x86/xen/mmu_pv.c|  8 
>  arch/x86/xen/setup.c |  2 ++
>  8 files changed, 61 insertions(+)

Ok, a couple of notes:

1. the pr_*("<prefix>: " ...) thing is done using pr_fmt() - grep the
tree for examples.

2. I think you should not panic() the machine but issue the
warning/error and let the machine die a painful death anyway. But Xen
folks will know better what would be the optimal thing to do.

Thx.

-- 
Regards/Gruss,
Boris.

https://people.kernel.org/tglx/notes-about-netiquette



Re: [PATCH v3 08/10] x86/mtrr: let cache_aps_delayed_init replace mtrr_aps_delayed_init

2022-09-30 Thread Borislav Petkov
On Fri, Sep 30, 2022 at 03:11:07PM +0200, Juergen Gross wrote:
> Yes, this can be done. It would practically have to be the first one just
> after CPUHP_BRINGUP_CPU.

Right.

> The question is whether we really want to call the MTRR/PAT initialization
> on hotplugged cpus only after enabling interrupts. Note that the callbacks
> are activated only at the end of start_secondary(), while today MTRR/PAT
> initialization is called some time earlier by:
> 
>   start_secondary()
> smp_callin()
>   smp_store_cpu_info()
> identify_secondary_cpu()
>   mtrr_ap_init()
> 
> I don't think this is a real problem, but I wanted to mention it.

Yep, I saw that too but I don't think there will be a problem either.
I mean, it should be early enough as you point out not to need proper
MTRR/PAT settings yet.

But we'll make sure we test this real good too.

> The next question would be, why MTRR/PAT init should be special
> (meaning: why are all the other functions called that early not
> realized via callbacks)?

Well, our init code is crazy. Frankly, I don't see why not more of the
"init stuff on the freshly hotplugged CPU" work is done there...

> Is it just because of the special handling during boot/resume?

... unless this is the case, ofc. Right.

> It might be worth a discussion whether there shouldn't be a special group
> of callbacks activated BEFORE interrupts are being enabled.

That's a good question. /me writes it down to ask tglx when he gets back.

I mean, that early I don't think it matters whether IRQs are enabled
or not. But this'll need to be audited on a case by case basis. As I
said, our boot code is nuts with stuff bolted on everywhere for whatever
reasons.

> Thanks. I'll write a patch for that.

Thanks too.

-- 
Regards/Gruss,
Boris.

https://people.kernel.org/tglx/notes-about-netiquette



Re: [PATCH v3 08/10] x86/mtrr: let cache_aps_delayed_init replace mtrr_aps_delayed_init

2022-09-30 Thread Borislav Petkov
On Thu, Sep 29, 2022 at 10:26:59AM +0200, Juergen Gross wrote:
> So right now I'm inclined to be better on the safe side by not adding any
> cpu hotplug hook, but to use just the same "delayed AP init" flag as today,
> just renaming it. This would leave the delayed MTRR/PAT init in place for
> resume and kexec cases, but deferring the MTRR/PAT cleanup due to this
> potential issue seems not appropriate, as the cleanup isn't changing the
> behavior here.

Ok, what's wrong with adding a special hotplug level just for that thing
and running it very early? Practically pretty much where it was in time,
in identify_secondary_cpu()?

Having a special one is warranted, as you explain, I'd say.
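
I.e., something along these lines - a sketch only, using the state
name suggested elsewhere in this thread:

	cpuhp_setup_state_nocalls(CPUHP_AP_CACHECTRL_ONLINE, "x86/mtrr:online",
				  mtrr_ap_init, NULL);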

Thx.

-- 
Regards/Gruss,
Boris.

https://people.kernel.org/tglx/notes-about-netiquette



Re: [PATCH v3 08/10] x86/mtrr: let cache_aps_delayed_init replace mtrr_aps_delayed_init

2022-09-28 Thread Borislav Petkov
On Wed, Sep 28, 2022 at 06:32:20PM +0200, Juergen Gross wrote:
> I can do that.

Thx!

-- 
Regards/Gruss,
Boris.

https://people.kernel.org/tglx/notes-about-netiquette



Re: [PATCH v3 08/10] x86/mtrr: let cache_aps_delayed_init replace mtrr_aps_delayed_init

2022-09-28 Thread Borislav Petkov
On Wed, Sep 28, 2022 at 03:43:56PM +0200, Juergen Gross wrote:
> Would you feel better with adding a new enum member CPUHP_AP_CACHECTRL_ONLINE?
> 
> This would avoid a possible source of failure during resume in case no slot
> for CPUHP_AP_ONLINE_DYN is found (quite improbable, but in theory possible).

Let's keep that in the bag for the time when we get to cross that bridge.

> You wouldn't want to do that there, as there are multiple places where
> pm_sleep_enable_secondary_cpus() is being called.

We want all of them, I'd say. They're all some sort of suspend AFAICT.
But yes, if we get to do it, that would need a proper audit.

> Additionally not all cases are coming in via
> pm_sleep_enable_secondary_cpus(), as there is e.g. a call of
> suspend_enable_secondary_cpus() from kernel_kexec(), which wants to
> have the same handling.

Which means, more hairy.

> arch_thaw_secondary_cpus_begin() and arch_thaw_secondary_cpus_end() are
> the functions to mark start and end of the special region where the
> delayed MTRR setup should happen.

Yap, it seems like the best solution at the moment. Want me to do a
proper patch and test it on real hw?

:-)

Thx.

-- 
Regards/Gruss,
Boris.

https://people.kernel.org/tglx/notes-about-netiquette



Re: [PATCH v3 08/10] x86/mtrr: let cache_aps_delayed_init replace mtrr_aps_delayed_init

2022-09-28 Thread Borislav Petkov
On Wed, Sep 28, 2022 at 01:14:11PM +0200, Juergen Gross wrote:
> No, we don't.
> 
> Using basically your patch, but with
> 
> + mtrr_online = cpuhp_setup_state_nocalls(CPUHP_AP_ONLINE_DYN,
> + "x86/mtrr:online",
> + mtrr_ap_init, NULL);
> 
> moved to the end of mtrr_aps_init(), and:
> 
> +void mtrr_aps_thaw(void)
> +{
> + cpuhp_remove_state_nocalls(mtrr_online);
> +}

Yes, so you said. I'm not sure I like this toggling of notifier
registration like that.

Optimally, I'd like to be able to query the suspend code whether it is
in the process of resuming.

This here:


static int resume_target_kernel(bool platform_mode)
{
	...

 Enable_irqs:
	system_state = SYSTEM_RUNNING;
	local_irq_enable();

 Enable_cpus:
	pm_sleep_enable_secondary_cpus();


but being able to do:

	pm_sleep_enable_secondary_cpus();
	system_state = SYSTEM_RUNNING | SYSTEM_RUNNING_APS_UP;

which can't work, obviously. But something like that.

-- 
Regards/Gruss,
Boris.

https://people.kernel.org/tglx/notes-about-netiquette



Re: [PATCH v3 08/10] x86/mtrr: let cache_aps_delayed_init replace mtrr_aps_delayed_init

2022-09-28 Thread Borislav Petkov
On Wed, Sep 28, 2022 at 08:16:53AM +0200, Juergen Gross wrote:
> > Are sure the hotplug notifier doesn't get called in the boot and in the

It doesn't because it gets registered after smp_init()...

> > resume cases?

... but it gets called during resume because by that time the notifier
has been registered already. See my notes at the end of this mail of
what the code does currently.

> In case my suspicion is correct: this can still be solved by adding the
> hotplug notifier only in mtrr_aps_init(), and removing it again in
> arch_thaw_secondary_cpus_begin().

Pretty much. Yeah, we still need a bool. ;-(

But that bool has a much smaller scope and it is perfectly clear what it
does. And I've added a comment. Could've used comments for the delayed
init thing.

Anyway, it gets set in a thaw callback (I mean, might as well, since
we call into the MTRR code anyway). I probably can make this even
cleaner and not do any bool if I could query in the notifier whether I'm
resuming...

Thx.

---
diff --git a/arch/x86/include/asm/mtrr.h b/arch/x86/include/asm/mtrr.h
index 76d726074c16..86b8009d2429 100644
--- a/arch/x86/include/asm/mtrr.h
+++ b/arch/x86/include/asm/mtrr.h
@@ -42,9 +42,8 @@ extern int mtrr_add_page(unsigned long base, unsigned long size,
 extern int mtrr_del(int reg, unsigned long base, unsigned long size);
 extern int mtrr_del_page(int reg, unsigned long base, unsigned long size);
 extern void mtrr_centaur_report_mcr(int mcr, u32 lo, u32 hi);
-extern void mtrr_ap_init(void);
-extern void set_mtrr_aps_delayed_init(void);
 extern void mtrr_aps_init(void);
+extern void mtrr_aps_thaw(void);
 extern void mtrr_bp_restore(void);
 extern int mtrr_trim_uncached_memory(unsigned long end_pfn);
 extern int amd_special_default_mtrr(void);
@@ -83,9 +82,8 @@ static inline int mtrr_trim_uncached_memory(unsigned long end_pfn)
 static inline void mtrr_centaur_report_mcr(int mcr, u32 lo, u32 hi)
 {
 }
-#define mtrr_ap_init() do {} while (0)
-#define set_mtrr_aps_delayed_init() do {} while (0)
 #define mtrr_aps_init() do {} while (0)
+#define mtrr_aps_thaw() do {} while (0)
 #define mtrr_bp_restore() do {} while (0)
 #  endif
 
diff --git a/arch/x86/kernel/cpu/common.c b/arch/x86/kernel/cpu/common.c
index 3e508f239098..deef1b5b27cc 100644
--- a/arch/x86/kernel/cpu/common.c
+++ b/arch/x86/kernel/cpu/common.c
@@ -1948,7 +1948,6 @@ void identify_secondary_cpu(struct cpuinfo_x86 *c)
 #ifdef CONFIG_X86_32
enable_sep_cpu();
 #endif
-   mtrr_ap_init();
validate_apic_and_package_id(c);
x86_spec_ctrl_setup_ap();
update_srbds_msr();
diff --git a/arch/x86/kernel/cpu/mtrr/mtrr.c b/arch/x86/kernel/cpu/mtrr/mtrr.c
index 2746cac9d8a9..c4089fd2b477 100644
--- a/arch/x86/kernel/cpu/mtrr/mtrr.c
+++ b/arch/x86/kernel/cpu/mtrr/mtrr.c
@@ -69,7 +69,7 @@ unsigned int mtrr_usage_table[MTRR_MAX_VAR_RANGES];
 static DEFINE_MUTEX(mtrr_mutex);
 
 u64 size_or_mask, size_and_mask;
-static bool mtrr_aps_delayed_init;
+static bool ap_notifier_disabled;
 
 static const struct mtrr_ops *mtrr_ops[X86_VENDOR_NUM] __ro_after_init;
 
@@ -176,7 +176,7 @@ static int mtrr_rendezvous_handler(void *info)
if (data->smp_reg != ~0U) {
mtrr_if->set(data->smp_reg, data->smp_base,
 data->smp_size, data->smp_type);
-   } else if (mtrr_aps_delayed_init || !cpu_online(smp_processor_id())) {
+   } else if (!cpu_online(smp_processor_id())) {
mtrr_if->set_all();
}
return 0;
@@ -784,13 +784,16 @@ void __init mtrr_bp_init(void)
}
 }
 
-void mtrr_ap_init(void)
+static int mtrr_ap_init(unsigned int cpu)
 {
if (!mtrr_enabled())
-   return;
+   return 1;
 
-   if (!use_intel() || mtrr_aps_delayed_init)
-   return;
+   if (!use_intel())
+   return 1;
+
+   if (ap_notifier_disabled)
+   return 0;
 
/*
 * Ideally we should hold mtrr_mutex here to avoid mtrr entries
@@ -806,6 +809,8 @@ void mtrr_ap_init(void)
 *  lock to prevent mtrr entry changes
 */
set_mtrr_from_inactive_cpu(~0U, 0, 0, 0);
+
+   return 0;
 }
 
 /**
@@ -823,34 +828,26 @@ void mtrr_save_state(void)
smp_call_function_single(first_cpu, mtrr_save_fixed_ranges, NULL, 1);
 }
 
-void set_mtrr_aps_delayed_init(void)
-{
-   if (!mtrr_enabled())
-   return;
-   if (!use_intel())
-   return;
-
-   mtrr_aps_delayed_init = true;
-}
-
 /*
- * Delayed MTRR initialization for all AP's
+ * Delayed MTRR initialization for all APs
  */
 void mtrr_aps_init(void)
 {
if (!use_intel() || !mtrr_enabled())
return;
 
-   /*
-* Check if someone has requested the delay of AP MTRR initialization,
-* by doing set_mtrr_aps_delayed_init(), prior to this point. If not,
-* then we are done.
-*/
-   if (!mtrr_aps_delayed_init)
-   return;
-
set_mtrr(~0U, 0, 0, 

Re: [PATCH v3 08/10] x86/mtrr: let cache_aps_delayed_init replace mtrr_aps_delayed_init

2022-09-27 Thread Borislav Petkov
On Tue, Sep 27, 2022 at 02:21:17PM +0200, Juergen Gross wrote:
> So replacing the bool with "(system_state != SYSTEM_RUNNING)" is fine
> with you right now? We can later switch that to the "more elegant"
> solution when it shows up.

Ok, I think I have something. And it was staring me straight in the
face but I didn't see it: the MTRR code needs a hotplug notifier. In
that notifier it can do the immediate, i.e., non-delayed init while the
delayed init becomes the default, see below.

And ignore the pr_info debugging gunk pls.

mtrr_ap_init() becomes the notifier callback. It doesn't need to be
called in identify_secondary_cpu() anymore as in the init case that
function doesn't do anything - delayed=true - and in the hotplug case
the notifier runs.

mtrr_aps_init() - "aps" in plural - does the delayed init after all CPUs
have been brought online after the box has booted. That might need some
renaming.

And yes, there's a lot more to cleanup after this. This code has grown
wart after wart over the years...

Fun.

---
diff --git a/arch/x86/include/asm/mtrr.h b/arch/x86/include/asm/mtrr.h
index 76d726074c16..1a3dad244bba 100644
--- a/arch/x86/include/asm/mtrr.h
+++ b/arch/x86/include/asm/mtrr.h
@@ -42,8 +42,6 @@ extern int mtrr_add_page(unsigned long base, unsigned long size,
 extern int mtrr_del(int reg, unsigned long base, unsigned long size);
 extern int mtrr_del_page(int reg, unsigned long base, unsigned long size);
 extern void mtrr_centaur_report_mcr(int mcr, u32 lo, u32 hi);
-extern void mtrr_ap_init(void);
-extern void set_mtrr_aps_delayed_init(void);
 extern void mtrr_aps_init(void);
 extern void mtrr_bp_restore(void);
 extern int mtrr_trim_uncached_memory(unsigned long end_pfn);
@@ -83,8 +81,6 @@ static inline int mtrr_trim_uncached_memory(unsigned long end_pfn)
 static inline void mtrr_centaur_report_mcr(int mcr, u32 lo, u32 hi)
 {
 }
-#define mtrr_ap_init() do {} while (0)
-#define set_mtrr_aps_delayed_init() do {} while (0)
 #define mtrr_aps_init() do {} while (0)
 #define mtrr_bp_restore() do {} while (0)
 #  endif
diff --git a/arch/x86/kernel/cpu/common.c b/arch/x86/kernel/cpu/common.c
index 3e508f239098..deef1b5b27cc 100644
--- a/arch/x86/kernel/cpu/common.c
+++ b/arch/x86/kernel/cpu/common.c
@@ -1948,7 +1948,6 @@ void identify_secondary_cpu(struct cpuinfo_x86 *c)
 #ifdef CONFIG_X86_32
enable_sep_cpu();
 #endif
-   mtrr_ap_init();
validate_apic_and_package_id(c);
x86_spec_ctrl_setup_ap();
update_srbds_msr();
diff --git a/arch/x86/kernel/cpu/mtrr/mtrr.c b/arch/x86/kernel/cpu/mtrr/mtrr.c
index 2746cac9d8a9..abbf7cb8a430 100644
--- a/arch/x86/kernel/cpu/mtrr/mtrr.c
+++ b/arch/x86/kernel/cpu/mtrr/mtrr.c
@@ -69,7 +69,6 @@ unsigned int mtrr_usage_table[MTRR_MAX_VAR_RANGES];
 static DEFINE_MUTEX(mtrr_mutex);
 
 u64 size_or_mask, size_and_mask;
-static bool mtrr_aps_delayed_init;
 
 static const struct mtrr_ops *mtrr_ops[X86_VENDOR_NUM] __ro_after_init;
 
@@ -176,7 +175,7 @@ static int mtrr_rendezvous_handler(void *info)
if (data->smp_reg != ~0U) {
mtrr_if->set(data->smp_reg, data->smp_base,
 data->smp_size, data->smp_type);
-   } else if (mtrr_aps_delayed_init || !cpu_online(smp_processor_id())) {
+   } else if (!cpu_online(smp_processor_id())) {
mtrr_if->set_all();
}
return 0;
@@ -784,13 +783,16 @@ void __init mtrr_bp_init(void)
}
 }
 
-void mtrr_ap_init(void)
+static int mtrr_ap_init(unsigned int cpu)
 {
+   pr_info("%s: single AP entry, use_intel: %d, mtrr_enabled: %d, 
mtrr_aps_delayed_init\n",
+   __func__, use_intel(), mtrr_enabled());
+
if (!mtrr_enabled())
-   return;
+   return 1;
 
-   if (!use_intel() || mtrr_aps_delayed_init)
-   return;
+   if (!use_intel())
+   return 1;
 
/*
 * Ideally we should hold mtrr_mutex here to avoid mtrr entries
@@ -806,6 +808,8 @@ void mtrr_ap_init(void)
 *  lock to prevent mtrr entry changes
 */
set_mtrr_from_inactive_cpu(~0U, 0, 0, 0);
+
+   return 0;
 }
 
 /**
@@ -820,37 +824,24 @@ void mtrr_save_state(void)
return;
 
first_cpu = cpumask_first(cpu_online_mask);
-   smp_call_function_single(first_cpu, mtrr_save_fixed_ranges, NULL, 1);
-}
 
-void set_mtrr_aps_delayed_init(void)
-{
-   if (!mtrr_enabled())
-   return;
-   if (!use_intel())
-   return;
+   pr_info("%s: first_cpu: %d\n", __func__, first_cpu);
 
-   mtrr_aps_delayed_init = true;
+   smp_call_function_single(first_cpu, mtrr_save_fixed_ranges, NULL, 1);
 }
 
 /*
- * Delayed MTRR initialization for all AP's
+ * Delayed MTRR initialization for all APs
  */
 void mtrr_aps_init(void)
 {
-   if (!use_intel() || !mtrr_enabled())
-   return;
+   pr_info("%s: entry, use_intel: %d, mtrr_enabled: %d, 
mtrr_aps_delayed_init\n",
+   __func__, 

Re: [PATCH v3 08/10] x86/mtrr: let cache_aps_delayed_init replace mtrr_aps_delayed_init

2022-09-27 Thread Borislav Petkov
On Tue, Sep 27, 2022 at 01:25:47PM +0200, Juergen Gross wrote:
> You mean by replacing it with "(system_state != SYSTEM_RUNNING)" ?

Right, or maybe even something more elegant. I've been meaning to ask
tglx about it as I needed it for the microcode loader too.

-- 
Regards/Gruss,
Boris.

https://people.kernel.org/tglx/notes-about-netiquette



Re: [PATCH v3 08/10] x86/mtrr: let cache_aps_delayed_init replace mtrr_aps_delayed_init

2022-09-27 Thread Borislav Petkov
On Tue, Sep 27, 2022 at 12:14:42PM +0200, Juergen Gross wrote:
> Yes: cpu hotplug.

You might need to elaborate here.

Because I see mtrr_ap_init() on the AP hotplug path:

native_cpu_up->
do_boot_cpu->
start_secondary->
smp_callin->
smp_store_cpu_info->
identify_secondary_cpu->
mtrr_ap_init

Which then means that we could check in mtrr_ap_init() if we're on the
hotplug path and still get rid of that stupid bool...

Close?

-- 
Regards/Gruss,
Boris.

https://people.kernel.org/tglx/notes-about-netiquette



Re: [PATCH v3 08/10] x86/mtrr: let cache_aps_delayed_init replace mtrr_aps_delayed_init

2022-09-27 Thread Borislav Petkov
On Tue, Sep 27, 2022 at 10:57:37AM +0200, Juergen Gross wrote:
> TBH I don't see the point of having an accessor which is just setting a
> variable to "true". But if you like it better, I can keep it.

Accessors are always better, no matter how silly. :)

But, in trying to grok your next patch - you really should split those
more complex ones because they're a pain to review - I'm starting to
wonder whether we could even remove mtrr_aps_delayed_init and make the
delayed init the default.

Because, AFAICT, set_mtrr_aps_delayed_init() is called by default
by native_smp_prepare_cpus(). Which is called by hyperv and
arch/x86/xen/smp_hvm.c.

The only one that's not calling it is arch/x86/xen/smp_pv.c but that
thing doesn't support MTRRs in the first place, right?

Which means, it doesn't need delayed MTRR init anyway.

Which would then mean that this would simplify this ugly logic even more.

Or am I missing an angle?

It is possible in this nuts code.

Thx.

-- 
Regards/Gruss,
Boris.

https://people.kernel.org/tglx/notes-about-netiquette



Re: [PATCH v3 08/10] x86/mtrr: let cache_aps_delayed_init replace mtrr_aps_delayed_init

2022-09-26 Thread Borislav Petkov
On Thu, Sep 08, 2022 at 10:49:12AM +0200, Juergen Gross wrote:
> -void set_mtrr_aps_delayed_init(void)
> -{
> - if (!cache_generic)
> - return;
> -
> - mtrr_aps_delayed_init = true;
> -}
> -

Except that you've removed the accessors and made that bool global.
Which is less pretty than it was before...

-- 
Regards/Gruss,
Boris.

https://people.kernel.org/tglx/notes-about-netiquette



Re: [PATCH v3 07/10] x86/mtrr: simplify mtrr_bp_init()

2022-09-26 Thread Borislav Petkov
On Thu, Sep 08, 2022 at 10:49:11AM +0200, Juergen Gross wrote:
> diff --git a/arch/x86/kernel/cpu/mtrr/mtrr.c b/arch/x86/kernel/cpu/mtrr/mtrr.c
> index 9609a0d235f8..956838bb4481 100644
> --- a/arch/x86/kernel/cpu/mtrr/mtrr.c
> +++ b/arch/x86/kernel/cpu/mtrr/mtrr.c
> @@ -761,13 +761,10 @@ void __init mtrr_bp_init(void)
>   mtrr_enabled = get_mtrr_state();
>  
>   if (mtrr_enabled) {
> - mtrr_bp_pat_init();
>   cache_generic |= CACHE_GENERIC_MTRR |
> +  CACHE_GENERIC_PAT;
> - }
> -
> - if (mtrr_cleanup(phys_addr)) {
> - changed_by_mtrr_cleanup = 1;
> + changed_by_mtrr_cleanup =
> + mtrr_cleanup(phys_addr);

Just let those lines stick out.

-- 
Regards/Gruss,
Boris.

https://people.kernel.org/tglx/notes-about-netiquette



Re: [PATCH v3 05/10] x86/mtrr: split generic_set_all()

2022-09-19 Thread Borislav Petkov
On Thu, Sep 08, 2022 at 10:49:09AM +0200, Juergen Gross wrote:
> Split generic_set_all() into multiple parts, while moving the main
> function body into cacheinfo.c.
> 
> This prepares the support of PAT without needing MTRR support by

"Prepare PAT support without ... "

> moving the main function body of generic_set_all() into cacheinfo.c
> while renaming it to cache_cpu_init(). The MTRR specific parts are
> moved into a dedicated small function called by cache_cpu_init().
> The PAT and MTRR specific functions are called conditionally based
> on the cache_generic bit settings.
> 
> The setting of smp_changes_mask is merged into the (new) function

... and so on. So the commit message should not say what you're doing -
that should be visible from the diff itself. It should talk more about
the *why* you're doing it.

...

> diff --git a/arch/x86/kernel/cpu/cacheinfo.c b/arch/x86/kernel/cpu/cacheinfo.c
> index 47e2c72fa8a4..36378604ec61 100644
> --- a/arch/x86/kernel/cpu/cacheinfo.c
> +++ b/arch/x86/kernel/cpu/cacheinfo.c
> @@ -1120,3 +1120,22 @@ void cache_enable(void) __releases(cache_disable_lock)
>  
>   raw_spin_unlock(&cache_disable_lock);
>  }
> +
> +void cache_cpu_init(void)
> +{
> + unsigned long flags;
> +
> + local_irq_save(flags);
> + cache_disable();
> +
> + /* Set MTRR state. */

Yeah, and when you name the functions properly as you've done, you don't
really need those comments anymore either.

> + if (cache_generic & CACHE_GENERIC_MTRR)
> + mtrr_generic_set_state();
> +
> + /* Set PAT. */

And this one.

> + if (cache_generic & CACHE_GENERIC_PAT)
> + pat_init();
> +
> + cache_enable();
> + local_irq_restore(flags);
> +}
> diff --git a/arch/x86/kernel/cpu/mtrr/generic.c b/arch/x86/kernel/cpu/mtrr/generic.c
> index 5ed397f03a87..fc7b2d952737 100644
> --- a/arch/x86/kernel/cpu/mtrr/generic.c
> +++ b/arch/x86/kernel/cpu/mtrr/generic.c
> @@ -731,30 +731,19 @@ void mtrr_enable(void)
>   mtrr_wrmsr(MSR_MTRRdefType, deftype_lo, deftype_hi);
>  }
>  
> -static void generic_set_all(void)
> +void mtrr_generic_set_state(void)
>  {
>   unsigned long mask, count;
> - unsigned long flags;
> -
> - local_irq_save(flags);
> - cache_disable();
>  
>   /* Actually set the state */
>   mask = set_mtrr_state();
>  
> - /* also set PAT */
> - pat_init();
> -
> - cache_enable();
> - local_irq_restore(flags);
> -
>   /* Use the atomic bitops to update the global mask */
>   for (count = 0; count < sizeof(mask) * 8; ++count) {
>   if (mask & 0x01)
>   set_bit(count, &smp_changes_mask);
>   mask >>= 1;
>   }
> -
>  }
>  
>  /**
> @@ -854,7 +843,7 @@ int positive_have_wrcomb(void)
>   * Generic structure...
>   */
>  const struct mtrr_ops generic_mtrr_ops = {
> - .set_all= generic_set_all,
> + .set_all= cache_cpu_init,

I was gonna say that this looks weird - a set_all function pointer
assigned to a init function but that changes in the next patch.

But yeah, I like where this is going.

Thx.

-- 
Regards/Gruss,
Boris.

https://people.kernel.org/tglx/notes-about-netiquette



Re: [PATCH v3 03/10] x86/mtrr: replace use_intel() with a local flag

2022-09-19 Thread Borislav Petkov
On Mon, Sep 12, 2022 at 11:10:29AM +0200, Juergen Gross wrote:
> In the end this variable doesn't specify which caching types are available,
> but the ways to select/control the caching types.
> 
> So what about "memory_caching_select" or "memory_caching_control" instead?

_control sounds like the right thing. As in, which of the memory caching
things the kernel controls. Along with a comment above it what this
exactly is. Yap.

Thx.

-- 
Regards/Gruss,
Boris.

https://people.kernel.org/tglx/notes-about-netiquette



Re: [PATCH v2 02/10] x86/mtrr: remove unused cyrix_set_all() function

2022-08-25 Thread Borislav Petkov
On Thu, Aug 25, 2022 at 12:41:05PM +0200, Juergen Gross wrote:
> Maybe the alternative reasoning is much faster to understand: if the
> Cyrix set_all() could be called, the AMD and Centaur ones would be callable,
> too.

Right.

> Those being called would result in a NULL deref, so why should we keep
> the Cyrix one?

I know you're eager to remove dead code - I'd love that too. But before
we do that, we need to find out whether some Cyrix hw out there would
not need this.

I know, I know, they should've complained by now ... maybe they have but
we haven't heard about it.

What it most likely looks like is that those machines - a commit from
before git

commit 8fbdcb188e31ac901e216b466b97e90e8b057daa
Author: Dave Jones 
Date:   Wed Aug 14 21:14:22 2002 -0700

[PATCH] Modular x86 MTRR driver.

talks about

+/*
+ * On Cyrix 6x86(MX) and M II the ARR3 is special: it has connection
+ * with the SMM (System Management Mode) mode. So we need the following:
+ * Check whether SMI_LOCK (CCR3 bit 0) is set
+ *   if it is set, write a warning message: ARR3 cannot be changed!
+ * (it cannot be changed until the next processor reset)

which sounds like old rust. And which no one uses or such machines are
long dead already.

Wikipedia says:

https://en.wikipedia.org/wiki/Cyrix_6x86

"The Cyrix 6x86 is a line of sixth-generation, 32-bit x86
microprocessors designed and released by Cyrix in 1995..."

So I'm thinking removing it would be ok...

-- 
Regards/Gruss,
Boris.

https://people.kernel.org/tglx/notes-about-netiquette



Re: [PATCH v2 01/10] x86/mtrr: fix MTRR fixup on APs

2022-08-22 Thread Borislav Petkov
On Mon, Aug 22, 2022 at 07:17:40AM +0200, Juergen Gross wrote:
> And then there is mtrr_state_warn() in arch/x86/kernel/cpu/mtrr/generic.c
> which has a comment saying:
> 
> /* Some BIOS's are messed up and don't set all MTRRs the same! */

That thing also says:

pr_info("mtrr: probably your BIOS does not setup all CPUs.\n");
pr_info("mtrr: corrected configuration.\n");

because it'll go and force on all CPUs the MTRR state it read from the
BSP in mtrr_bp_init->get_mtrr_state.

> Yes, the chances are slim to hit such a box,

Well, my workstation says:

$ dmesg | grep -i mtrr
[0.391514] mtrr: your CPUs had inconsistent variable MTRR settings
[0.395199] mtrr: probably your BIOS does not setup all CPUs.
[0.399199] mtrr: corrected configuration.

but that's the variable MTRRs.

> but your reasoning suggests I should remove the related code?

My reasoning says you should not do anything at all here - works as
advertized. :-)

-- 
Regards/Gruss,
Boris.

https://people.kernel.org/tglx/notes-about-netiquette



Re: [PATCH v2 01/10] x86/mtrr: fix MTRR fixup on APs

2022-08-21 Thread Borislav Petkov
On Sun, Aug 21, 2022 at 02:25:59PM +0200, Borislav Petkov wrote:
> > Fix that by using percpu variables for saving the MSR contents.
> > 
> > Cc: sta...@vger.kernel.org
> > Signed-off-by: Juergen Gross 
> > ---
> > I thought adding a "Fixes:" tag for the kernel's initial git commit
> > would maybe be entertaining, but without being really helpful.
> > The percpu variables were preferred over on-stack ones in order to
> > avoid more code churn in followup patches decoupling PAT from MTRR
> > support.
> 
> So if that thing has been broken for so long and no one noticed, we
> could just as well not backport to stable at all...

Yeah, you can't do that.

The whole day today I kept thinking that something's wrong with this
here. As in, why hasn't it been reported until now.

You say above:

"... for all cpus is racy in case the MSR contents differ across cpus."

But they don't differ:

"7.7.5 MTRRs in Multi-Processing Environments

In multi-processing environments, the MTRRs located in all processors
must characterize memory in the same way. Generally, this means that
identical values are written to the MTRRs used by the processors. This
also means that values CR0.CD and the PAT must be consistent across
processors. Failure to do so may result in coherency violations or loss
of atomicity. Processor implementations do not check the MTRR settings
in other processors to ensure consistency. It is the responsibility of
system software to initialize and maintain MTRR consistency across all
processors."

And you can't have different fixed MTRR type on each CPU - that would
lead to all kinds of nasty bugs.

And here's from a big fat box:

$ rdmsr -a 0x2ff | uniq -c
256 c00

All 256 CPUs have the def type set to the same thing.

Now, if all CPUs go write that same deftype_lo variable in the
rendezvous handler, the only issue that could happen is if a read
sees a partial write. BUT, AFAIK, x86 doesn't tear 32-bit writes so I
*think* all CPUs see the same value being corrected by using mtrr_state
previously saved on the BSP.
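
If we wanted to make that no-tearing assumption explicit instead of
relying on it, the usual idiom would be - illustrative fragment only,
with new_lo standing in for the corrected value:

	/* Writer side, in the rendezvous handler: */
	WRITE_ONCE(deftype_lo, new_lo);

	/* Reader side - a single, non-torn 32-bit load: */
	u32 lo = READ_ONCE(deftype_lo);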

As I said, we should've seen this exploding left and right otherwise...

Thx.

-- 
Regards/Gruss,
Boris.

https://people.kernel.org/tglx/notes-about-netiquette



Re: [PATCH v2 01/10] x86/mtrr: fix MTRR fixup on APs

2022-08-21 Thread Borislav Petkov
On Sat, Aug 20, 2022 at 11:25:24AM +0200, Juergen Gross wrote:
> When booting or resuming the system MTRR state is saved on the boot
> processor and then this state is loaded into MTRRs of all other cpus.

s/cpu/CPU/g

Pls check all commit messages.

> During update of the MTRRs the MTRR mechanism needs to be disabled by
> writing the related MSR. The old contents of this MSR are saved in a
> set of static variables and later those static variables are used to
> restore the MSR.
> 
> In case the MSR contents need to be modified on a cpu due to the MSR
> not having been initialized properly by the BIOS, the related update
> function is modifying the static variables accordingly.
> 
> Unfortunately the MTRR state update is usually running on all cpus
> at the same time, so using just one set of static variables for all
> cpus is racy in case the MSR contents differ across cpus.
> 
> Fix that by using percpu variables for saving the MSR contents.
> 
> Cc: sta...@vger.kernel.org
> Signed-off-by: Juergen Gross 
> ---
> I thought adding a "Fixes:" tag for the kernel's initial git commit
> would maybe be entertaining, but without being really helpful.
> The percpu variables were preferred over on-stack ones in order to
> avoid more code churn in followup patches decoupling PAT from MTRR
> support.

So if that thing has been broken for so long and no one noticed, we
could just as well not backport to stable at all...

> V2:
> - new patch
> ---
>  arch/x86/kernel/cpu/mtrr/generic.c | 21 ++---
>  1 file changed, 14 insertions(+), 7 deletions(-)
> 
> diff --git a/arch/x86/kernel/cpu/mtrr/generic.c b/arch/x86/kernel/cpu/mtrr/generic.c
> index 558108296f3c..3d185fcf08ca 100644
> --- a/arch/x86/kernel/cpu/mtrr/generic.c
> +++ b/arch/x86/kernel/cpu/mtrr/generic.c
> @@ -679,7 +679,8 @@ static bool set_mtrr_var_ranges(unsigned int index, struct mtrr_var_range *vr)
>   return changed;
>  }
>  
> -static u32 deftype_lo, deftype_hi;
> +static DEFINE_PER_CPU(u32, deftype_lo);
> +static DEFINE_PER_CPU(u32, deftype_hi);

My APM says that the high 32 bits of the MTRRdefType MSR are reserved
and MBZ. So you can drop the _hi thing and use 0 everywhere. Or rather a
dummy = 0 var because of that damn rdmsr() macro.

Or simply use a

u64 deftype;

use the rdmsrl/wrmsrl() variants and split it when passing to
mtrr_wrmsr() because that thing wants 2 32s.
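
IOW, something like this - untested sketch:

	u64 deftype;

	rdmsrl(MSR_MTRRdefType, deftype);

	/* Disable MTRRs and set the default type to uncached: */
	mtrr_wrmsr(MSR_MTRRdefType, (u32)(deftype & ~0xcffULL),
		   (u32)(deftype >> 32));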

>  /**
>   * set_mtrr_state - Set the MTRR state for this CPU.
> @@ -691,6 +692,7 @@ static unsigned long set_mtrr_state(void)
>  {
>   unsigned long change_mask = 0;
>   unsigned int i;
> + u32 *lo = this_cpu_ptr(&deftype_lo);

The tip-tree preferred ordering of variable declarations at the
beginning of a function is reverse fir tree order::

struct long_struct_name *descriptive_name;
unsigned long foo, bar;
unsigned int tmp;
int ret;

The above is faster to parse than the reverse ordering::

int ret;
unsigned int tmp;
unsigned long foo, bar;
struct long_struct_name *descriptive_name;

And even more so than random ordering::

unsigned long foo, bar;
int ret;
struct long_struct_name *descriptive_name;
unsigned int tmp;

Please check all your patches.

>   for (i = 0; i < num_var_ranges; i++) {
>   if (set_mtrr_var_ranges(i, &mtrr_state.var_ranges[i]))
> @@ -704,10 +706,10 @@ static unsigned long set_mtrr_state(void)
>* Set_mtrr_restore restores the old value of MTRRdefType,
>* so to set it we fiddle with the saved value:
>*/
> - if ((deftype_lo & 0xff) != mtrr_state.def_type
> - || ((deftype_lo & 0xc00) >> 10) != mtrr_state.enabled) {
> + if ((*lo & 0xff) != mtrr_state.def_type
> + || ((*lo & 0xc00) >> 10) != mtrr_state.enabled) {
>  
> - deftype_lo = (deftype_lo & ~0xcff) | mtrr_state.def_type |
> + *lo = (*lo & ~0xcff) | mtrr_state.def_type |
>(mtrr_state.enabled << 10);
>   change_mask |= MTRR_CHANGE_MASK_DEFTYPE;
>   }
> @@ -729,6 +731,8 @@ static DEFINE_RAW_SPINLOCK(set_atomicity_lock);
>  static void prepare_set(void) __acquires(set_atomicity_lock)
>  {
>   unsigned long cr0;
> + u32 *lo = this_cpu_ptr(&deftype_lo);
> + u32 *hi = this_cpu_ptr(&deftype_hi);

You don't need the pointers here - this_cpu_read() should be enough.
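
I.e., roughly:

	u32 lo, hi;

	/* Save the MTRR state into the per-CPU variables ... */
	rdmsr(MSR_MTRRdefType, lo, hi);
	this_cpu_write(deftype_lo, lo);
	this_cpu_write(deftype_hi, hi);

	/* ... and read them back without taking pointers: */
	mtrr_wrmsr(MSR_MTRRdefType, this_cpu_read(deftype_lo) & ~0xcff,
		   this_cpu_read(deftype_hi));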

>   /*
>* Note that this is not ideal
> @@ -763,10 +767,10 @@ static void prepare_set(void) __acquires(set_atomicity_lock)
>   flush_tlb_local();
>  
>   /* Save MTRR state */
> - rdmsr(MSR_MTRRdefType, deftype_lo, deftype_hi);
> + rdmsr(MSR_MTRRdefType, *lo, *hi);
>  
>   /* Disable MTRRs, and set the default type to uncached */
> - mtrr_wrmsr(MSR_MTRRdefType, deftype_lo & ~0xcff, deftype_hi);
> + mtrr_wrmsr(MSR_MTRRdefType, *lo & ~0xcff, *hi);
>  
>   /* Again, only flush caches if we have to. */
>   if (!static_cpu_has(X86_FEATURE_SELFSNOOP))
> @@ -775,12 

Re: PING [PATCH 3/3] x86: decouple pat and mtrr handling

2022-08-13 Thread Borislav Petkov
On Sat, Aug 13, 2022 at 05:40:34PM -0400, Chuck Zmudzinski wrote:
> I did a search for Juergen Gross on lkml and he is active submitting and
> reviewing patches during the past few weeks. However, he is ignoring
> comments on his patch to fix this regression.

Please stop this nonsense and be patient. We will fix this soon. For
the time being you can use Jan's patch locally.

-- 
Regards/Gruss,
Boris.

https://people.kernel.org/tglx/notes-about-netiquette



Re: PING [PATCH 3/3] x86: decouple pat and mtrr handling

2022-08-13 Thread Borislav Petkov
On Sat, Aug 13, 2022 at 12:56:44PM -0400, Chuck Zmudzinski wrote:
> Why has Juergen not at least responded in some way to the
> comments that Boris has made here? Why has Boris not
> pinged Juergen by now,

How about: it is summer here and people usually take their vacations
during that time and everything takes a bit longer than usual?

-- 
Regards/Gruss,
Boris.

https://people.kernel.org/tglx/notes-about-netiquette



Re: [PATCH 3/3] x86: decouple pat and mtrr handling

2022-07-19 Thread Borislav Petkov
On Fri, Jul 15, 2022 at 04:25:49PM +0200, Juergen Gross wrote:
> Today PAT is usable only with MTRR being active, with some nasty tweaks
> to make PAT usable when running as Xen PV guest, which doesn't support
> MTRR.
> 
> The reason for this coupling is, that both, PAT MSR changes and MTRR
> changes, require a similar sequence and so full PAT support was added
> using the already available MTRR handling.
> 
> Xen PV PAT handling can work without MTRR, as it just needs to consume
> the PAT MSR setting done by the hypervisor without the ability and need
> to change it. This in turn has resulted in a convoluted initialization
> sequence and wrong decisions regarding cache mode availability due to
> misguiding PAT availability flags.
> 
> Fix all of that by allowing to use PAT without MTRR and by adding an
> environment dependent PAT init function.

Aha, there's the explanation I was looking for.

> diff --git a/arch/x86/kernel/cpu/common.c b/arch/x86/kernel/cpu/common.c
> index 0a1bd14f7966..3edfb779dab5 100644
> --- a/arch/x86/kernel/cpu/common.c
> +++ b/arch/x86/kernel/cpu/common.c
> @@ -2408,8 +2408,8 @@ void __init cache_bp_init(void)
>  {
>   if (IS_ENABLED(CONFIG_MTRR))
>   mtrr_bp_init();
> - else
> - pat_disable("PAT support disabled because CONFIG_MTRR is disabled in the kernel.");
> +
> + pat_cpu_init();
>  }
>  
>  void cache_ap_init(void)
> @@ -2417,7 +2417,8 @@ void cache_ap_init(void)
>   if (cache_aps_delayed_init)
>   return;
>  
> - mtrr_ap_init();
> + if (!mtrr_ap_init())
> + pat_ap_init_nomtrr();
>  }

So I'm reading this as: if it couldn't init AP's MTRRs, init its PAT.

But currently, the code sets the MTRRs for the delayed case or when the
CPU is not online by doing ->set_all and in there it sets first MTRRs
and then PAT.

I think the code above should simply try the two things, one after the
other, independently from one another.

And I see you've added another stomp machine call for PAT only.

Now, what I think the design of all this should be, is:

you have a bunch of things you need to do at each point:

* cache_ap_init

* cache_aps_init

* ...

Now, in each those, you look at whether PAT or MTRR is supported and you
do only those which are supported.

Also, the rendezvous handler should do:

if MTRR:
do MTRR specific stuff

if PAT:
do PAT specific stuff

This way you have clean definitions of what needs to happen when and you
also do *only* the things that the platform supports, by keeping the
proper order of operations - I believe MTRRs first and then PAT.
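
I.e., each sync point would do only what the platform supports, in that
order - a rough sketch, the function names are placeholders:

void cache_ap_init(void)
{
	if (!cache_generic)
		return;

	/* MTRRs first ... */
	if (cache_generic & CACHE_GENERIC_MTRR)
		mtrr_ap_init();

	/* ... then PAT. */
	if (cache_generic & CACHE_GENERIC_PAT)
		pat_ap_init();
}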

This way we'll get rid of that crazy maze of who calls what and when.

But first we need to define those points where stuff needs to happen and
then for each point define what stuff needs to happen.

How does that sound?

Thx.

-- 
Regards/Gruss,
Boris.

https://people.kernel.org/tglx/notes-about-netiquette



Re: [PATCH 1/3] x86: move some code out of arch/x86/kernel/cpu/mtrr

2022-07-18 Thread Borislav Petkov
On Fri, Jul 15, 2022 at 04:25:47PM +0200, Juergen Gross wrote:
> diff --git a/arch/x86/kernel/cpu/common.c b/arch/x86/kernel/cpu/common.c
> index 736262a76a12..e43322f8a4ef 100644
> --- a/arch/x86/kernel/cpu/common.c
> +++ b/arch/x86/kernel/cpu/common.c

I guess the move's ok but not into cpu/common.c pls. That thing is
huuuge and is a dumping ground for everything.

arch/x86/kernel/cpu/cacheinfo.c looks like a more viable candidate for
all things cache.

Rest looks trivial.

Thx.

-- 
Regards/Gruss,
Boris.

https://people.kernel.org/tglx/notes-about-netiquette



Re: [PATCH 0/3] x86: make pat and mtrr independent from each other

2022-07-16 Thread Borislav Petkov
On Sat, Jul 16, 2022 at 07:32:46AM -0400, Chuck Zmudzinski wrote:
> Can you confirm that with this patch series you are trying
> to fix that regression?

Yes, this patchset is aimed to fix the whole situation but please don't
do anything yet - I need to find time and look at the whole approach
before you can test it. Just be patient and we'll ping you when the time
comes.

Thx.

-- 
Regards/Gruss,
Boris.

https://people.kernel.org/tglx/notes-about-netiquette



Re: [PATCH] x86/PAT: have pat_enabled() properly reflect state when running on e.g. Xen

2022-07-11 Thread Borislav Petkov
On Thu, Jul 07, 2022 at 08:38:44AM +0200, Jan Beulich wrote:
> Well, right now the pvops hook for Xen swallows #GP anyway (wrongly
> so imo, but any of my earlier pointing out of that has been left
> unheard, despite even the code comments there saying "It may be worth
> changing that").

Oh great. ;-\

> The point is therefore that after writing PAT, it would need reading
> back. In which case it feels (slightly) more clean to me to avoid the
> write attempt in the first place, when we know it's not going to work.

X86_FEATURE_XENPV check then.

> If I may ask - doesn't this mean this patch, in its current shape, is
> already a (small) step in that direction? In any event what you say
> doesn't sound to me like a viable (backportable) route to addressing
> the regression at hand.

Backportable to where? To whatever tree has

bdd8b6c98239 ("drm/i915: replace X86_FEATURE_PAT with pat_enabled()")

? If it is that, then 5.17 and newer.

Anyway, I don't mind it as long as you put the proper splitting out
ontop and it all goes as a single patchset, with the minimal fix
CC:stable and queued first.

Thx.

-- 
Regards/Gruss,
Boris.

https://people.kernel.org/tglx/notes-about-netiquette



Re: [PATCH] x86/PAT: have pat_enabled() properly reflect state when running on e.g. Xen

2022-07-05 Thread Borislav Petkov
On Tue, Jul 05, 2022 at 05:56:36PM +0200, Jan Beulich wrote:
> Re-using pat_disabled like you do in your suggestion below won't
> work, because mtrr_bp_init() calls pat_disable() when MTRRs
> appear to be disabled (from the kernel's view). The goal is to
> honor "nopat" without honoring any other calls to pat_disable().

Actually, the current goal is to adjust Xen dom0 because:

1. it uses the PAT code

2. but then it does something special and hides the MTRRs

which is not something real hardware does.

So this one-off thing should be prominent, visible and not get in the
way.

As to mtrr_bp_init(), can you use X86_FEATURE_XENPV there to detect this
special case when the kernel is running as dom0 and set stuff there
accordingly so that it doesn't disable PAT?
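
I.e., something like this at the top of mtrr_bp_init() - untested
sketch:

	if (boot_cpu_has(X86_FEATURE_XENPV)) {
		/*
		 * Xen PV hides the MTRRs but the hypervisor has already
		 * set up the PAT MSR for the guest - don't take the
		 * "no MTRRs, disable PAT" path here.
		 */
		return;
	}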

Then you don't have to touch pat_disabled() either but integrate the
Xen variant properly...

> I can probably fiddle with pat_enabled() instead of with
> init_cache_modes(), but when making the change I had the feeling
> this might be less liked (as looking more hacky, at least to me).

Why would that be more hacky?

I'd much rather check upfront what the kernel is running on and act
accordingly instead of hooking into random functions and then years
later wonder why was it done in the first place.

> But besides the "where" the other question is: Do you really want
> me to limit this to Xen/PV, rather than - as I have it now -
> extending it to any hypervisor, which may behave in similar ways?

Well, do you know of some other HV which hides MTRRs from the guest?

I haven't heard of any...

-- 
Regards/Gruss,
Boris.

https://people.kernel.org/tglx/notes-about-netiquette



Re: [PATCH V1 3/6] xen/virtio: Add option to restrict memory access under Xen

2022-04-26 Thread Borislav Petkov
On Tue, Apr 26, 2022 at 11:36:40AM +0200, Juergen Gross wrote:
> As the suggestion was to add another flag this wouldn't be a problem IMO.

We had a problem already with adding one flag would break the same flag
on the other guest type. That's why we added cc_vendor too. So it can be
tricky.

> platform_has() doesn't seem too bad IMO.
> 
> I will write a patch for starting the discussion.

Yeah, I guess such a proposal would need a wider audience - maybe CC
linux-arch...

Thx.

-- 
Regards/Gruss,
Boris.

https://people.kernel.org/tglx/notes-about-netiquette



Re: [PATCH V1 3/6] xen/virtio: Add option to restrict memory access under Xen

2022-04-25 Thread Borislav Petkov
On Mon, Apr 25, 2022 at 11:38:36PM +0300, Oleksandr wrote:
> diff --git a/include/linux/cc_platform.h b/include/linux/cc_platform.h
> index efd8205..d06bc7a 100644
> --- a/include/linux/cc_platform.h
> +++ b/include/linux/cc_platform.h
> @@ -72,6 +72,19 @@ enum cc_attr {
>  * Examples include TDX guest & SEV.
>  */
>     CC_ATTR_GUEST_UNROLL_STRING_IO,
> +
> +   /**
> +    * @CC_ATTR_GUEST_MEM_ACCESS_RESTRICTED: Restricted memory access to
> +    *   Guest memory is active
> +    *
> +    * The platform/OS is running as a guest/virtual machine and uses
> +    * the restricted access to its memory. This attribute is set if either
> +    * Guest memory encryption or restricted memory access using Xen grant
> +    * mappings is active.
> +    *
> +    * Examples include Xen guest and SEV.

Wait, whaaat?

The cc_platform* stuff is for *confidential computing* guests to check
different platform aspects.

From quickly skimming over this, this looks like a misuse to me.

Why can't you query this from the hypervisor just like you do your other
querying about what is supported, etc? Hypercalls, CPUID, whatever...

-- 
Regards/Gruss,
Boris.

https://people.kernel.org/tglx/notes-about-netiquette



Re: [PATCH 3/5] hyperv/IOMMU: Enable swiotlb bounce buffer for Isolation VM

2021-11-16 Thread Borislav Petkov
On Tue, Nov 16, 2021 at 10:39:21AM -0500, Tianyu Lan wrote:
> diff --git a/arch/x86/mm/mem_encrypt.c b/arch/x86/mm/mem_encrypt.c
> index 35487305d8af..65bc385ae07a 100644
> --- a/arch/x86/mm/mem_encrypt.c
> +++ b/arch/x86/mm/mem_encrypt.c
> @@ -31,6 +31,7 @@
>  #include 
>  #include 
>  #include 
> +#include 
>  
>  #include "mm_internal.h"
>  
> @@ -203,7 +204,8 @@ void __init sev_setup_arch(void)
>   phys_addr_t total_mem = memblock_phys_mem_size();
>   unsigned long size;
>  
> - if (!cc_platform_has(CC_ATTR_GUEST_MEM_ENCRYPT))
> + if (!cc_platform_has(CC_ATTR_GUEST_MEM_ENCRYPT)
> + && !hv_is_isolation_supported())

Are we gonna start sprinkling this hv_is_isolation_supported() check
everywhere now?

Are those isolation VMs SEV-like guests? Is CC_ATTR_GUEST_MEM_ENCRYPT
set on them?

What you should do, instead, is add an isol. VM specific
hv_cc_platform_has() just like amd_cc_platform_has() and handle
the cc_attrs there for your platform, like return false for
CC_ATTR_GUEST_MEM_ENCRYPT and then you won't need to add that hv_* thing
everywhere.

And then fix it up in __set_memory_enc_dec() too.
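
I.e., something along those lines - sketch only, the actual attribute
decisions for isolation VMs are yours to make:

static bool hyperv_cc_platform_has(enum cc_attr attr)
{
	switch (attr) {
	case CC_ATTR_GUEST_MEM_ENCRYPT:
		return false;	/* or whatever isolation VMs should report */
	default:
		return false;
	}
}

bool cc_platform_has(enum cc_attr attr)
{
	if (hv_is_isolation_supported())
		return hyperv_cc_platform_has(attr);

	if (sme_me_mask)
		return amd_cc_platform_has(attr);

	return false;
}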

-- 
Regards/Gruss,
Boris.

https://people.kernel.org/tglx/notes-about-netiquette



Re: [PATCH v0 42/42] notifier: Return an error when callback is already registered

2021-11-08 Thread Borislav Petkov
On Mon, Nov 08, 2021 at 03:59:26PM -0500, Alan Stern wrote:
> Is there really any reason for returning an error code?  For example, is 
> it anticipated that at some point in the future these registration calls 
> might fail?
> 
> Currently, the only reason for failing...

Right, I believe with not making it return void we're leaving the door
open for some, *hypothetical* future return values if we decide we need
to return them too, at some point.

Yes, I can't think of another fact to state besides that the callback
was already registered or return success but who knows what we wanna do
in the future...

And so if we change them all to void now, I think it'll be a lot more
churn to switch back to returning a non-void value and having the
callers who choose to handle that value, do so again.

So, long story short, keeping the retval - albeit not very useful right
now - is probably easier.

I hope I'm making some sense here.

-- 
Regards/Gruss,
Boris.

https://people.kernel.org/tglx/notes-about-netiquette



Re: [PATCH v0 00/42] notifiers: Return an error when callback is already registered

2021-11-08 Thread Borislav Petkov
On Mon, Nov 08, 2021 at 11:23:13AM -0500, Steven Rostedt wrote:
> Question, how often does this warning trigger? Is it common to see in
> development?

Yeah, haven't seen it myself yet.

But we hashed it out over IRC. :-)

-- 
Regards/Gruss,
Boris.

https://people.kernel.org/tglx/notes-about-netiquette



Re: [PATCH v0 42/42] notifier: Return an error when callback is already registered

2021-11-08 Thread Borislav Petkov
On Mon, Nov 08, 2021 at 05:12:16PM +0100, Geert Uytterhoeven wrote:
> Returning void is the other extreme ;-)
> 
> There are 3 levels (ignoring BUG_ON()/panic () inside the callee):
>   1. Return void: no one can check success or failure,
>   2. Return an error code: up to the caller to decide,
>   3. Return a __must_check error code: every caller must check.
> 
> I'm in favor of 2, as there are several places where it cannot fail.

Makes sense to me. I'll do that in the next iteration.

Thx.

-- 
Regards/Gruss,
Boris.

https://people.kernel.org/tglx/notes-about-netiquette



Re: [PATCH v0 42/42] notifier: Return an error when callback is already registered

2021-11-08 Thread Borislav Petkov
On Mon, Nov 08, 2021 at 04:25:47PM +0100, Geert Uytterhoeven wrote:
> I'm not against returning proper errors codes.  I'm against forcing
> callers to check things that cannot fail and to add individual error
> printing to each and every caller.

If you're against checking things at the callers, then the registration
function should be void. IOW, those APIs are not optimally designed atm.

> Note that in other areas, we are moving in the other direction,
> to a centralized printing of error messages, cfr. e.g. commit
> 7723f4c5ecdb8d83 ("driver core: platform: Add an error message to
> platform_get_irq*()").

Yes, thus my other idea to add a lower level __notifier_chain_register()
to do the checking.

I'll see if I can convert those notifier registration functions to
return void, in the process. But let's see what the others think first.

Thanks for taking the time.

-- 
Regards/Gruss,
Boris.

https://people.kernel.org/tglx/notes-about-netiquette



Re: [PATCH v0 00/42] notifiers: Return an error when callback is already registered

2021-11-08 Thread Borislav Petkov
On Mon, Nov 08, 2021 at 03:24:39PM +0100, Borislav Petkov wrote:
> I guess I can add another indirection to notifier_chain_register() and
> avoid touching all the call sites.

IOW, something like this below.

This way I won't have to touch all the callsites and the registration
routines would still return a proper value instead of returning 0
unconditionally.

---
diff --git a/kernel/notifier.c b/kernel/notifier.c
index b8251dc0bc0f..04f08b2ef17f 100644
--- a/kernel/notifier.c
+++ b/kernel/notifier.c
@@ -19,14 +19,12 @@ BLOCKING_NOTIFIER_HEAD(reboot_notifier_list);
  * are layered on top of these, with appropriate locking added.
  */
 
-static int notifier_chain_register(struct notifier_block **nl,
-   struct notifier_block *n)
+static int __notifier_chain_register(struct notifier_block **nl,
+struct notifier_block *n)
 {
while ((*nl) != NULL) {
-   if (unlikely((*nl) == n)) {
-   WARN(1, "double register detected");
-   return 0;
-   }
+   if (unlikely((*nl) == n))
+   return -EEXIST;
if (n->priority > (*nl)->priority)
break;
nl = &((*nl)->next);
@@ -36,6 +34,18 @@ static int notifier_chain_register(struct notifier_block **nl,
return 0;
 }
 
+static int notifier_chain_register(struct notifier_block **nl,
+  struct notifier_block *n)
+{
+   int ret = __notifier_chain_register(nl, n);
+
+   if (ret == -EEXIST)
+   WARN(1, "double register of notifier callback %ps detected",
+   n->notifier_call);
+
+   return ret;
+}
+
 static int notifier_chain_unregister(struct notifier_block **nl,
struct notifier_block *n)
 {


-- 
Regards/Gruss,
Boris.

https://people.kernel.org/tglx/notes-about-netiquette



Re: [PATCH v0 00/42] notifiers: Return an error when callback is already registered

2021-11-08 Thread Borislav Petkov
On Mon, Nov 08, 2021 at 09:17:03AM -0500, Alan Stern wrote:
> What reason is there for moving the check into the callers?  It seems 
> like pointless churn.  Why not add the error return code, change the 
> WARN to pr_warn, and leave the callers as they are?  Wouldn't that end 
> up having exactly the same effect?
> 
> For that matter, what sort of remedial action can a caller take if the 
> return code is -EEXIST?  Is there any point in forcing callers to check 
> the return code if they can't do anything about it?

See my reply to Geert from just now:

https://lore.kernel.org/r/yykyueqcsowqm...@zn.tnic

I guess I can add another indirection to notifier_chain_register() and
avoid touching all the call sites.

-- 
Regards/Gruss,
Boris.

https://people.kernel.org/tglx/notes-about-netiquette



Re: [PATCH v0 42/42] notifier: Return an error when callback is already registered

2021-11-08 Thread Borislav Petkov
On Mon, Nov 08, 2021 at 03:07:03PM +0100, Geert Uytterhoeven wrote:
> I think the addition of __must_check is overkill, leading to the
> addition of useless error checks and message printing.

See the WARN in notifier_chain_register() - it will already do "message
printing".

> Many callers call this where it cannot fail, and where nothing can
> be done in the very unlikely event that the call would ever start to
> fail.

This is an attempt to remove this WARN() hack in
notifier_chain_register() and have the function return a proper error
value instead of this "Currently always returns zero." which is bad
design.

Some of the registration functions around the tree check that retval and
some don't. So if "it cannot fail" those registration either should not
return a value or callers should check that return value - what we have
now doesn't make a whole lot of sense.

Oh, and then fixing this should avoid stuff like:

+   if (notifier_registered == false) {
+   mce_register_decode_chain(&amdgpu_bad_page_nb);
+   notifier_registered = true;
+   }

from propagating in the code.

The other idea I have is to add another indirection in
notifier_chain_register() which will check the *proper* return value of
a lower level __notifier_chain_register() and then issue that warning.
That would definitely avoid touch so many call sites.

Bottom line is: what we have now needs cleaning up.

Thx.

-- 
Regards/Gruss,
Boris.

https://people.kernel.org/tglx/notes-about-netiquette



[PATCH v0 00/42] notifiers: Return an error when callback is already registered

2021-11-08 Thread Borislav Petkov
From: Borislav Petkov 

Hi all,

this is a huge patchset for something which is really trivial - it
changes the notifier registration routines to return an error value
if a notifier callback is already present on the respective list of
callbacks. For more details scroll to the last patch.

Everything before it is converting the callers to check the return value
of the registration routines and issue a warning, instead of the WARN()
notifier_chain_register() does now.

Before the last patch has been applied, though, that checking is a
NOP which would make the application of those patches trivial - every
maintainer can pick a patch at her/his discretion - only the last one
enables the build warnings and that one will be queued only after the
preceding patches have all been merged so that there are no build
warnings.

Due to the sheer volume of the patches, I have addressed the respective
patch and the last one, which enables the warning, with addressees for
each maintained area so as not to spam people unnecessarily.

If people prefer I carry some through tip, instead, I'll gladly do so -
your call.

And, if you think the warning messages need to be more precise, feel
free to adjust them before committing.

Thanks!

Cc: Arnd Bergmann 
Cc: Ayush Sawal 
Cc: Greg Kroah-Hartman 
Cc: Rohit Maheshwari 
Cc: Steven Rostedt 
Cc: Vinay Kumar Yadav  
Cc: alsa-de...@alsa-project.org
Cc: bcm-kernel-feedback-l...@broadcom.com
Cc: intel-...@lists.freedesktop.org
Cc: intel-gvt-...@lists.freedesktop.org
Cc: linux-al...@vger.kernel.org
Cc: linux-arm-ker...@lists.infradead.org
Cc: linux-arm-ker...@lists.infradead.org
Cc: linux-...@vger.kernel.org
Cc: linux-cry...@vger.kernel.org
Cc: linux-e...@vger.kernel.org
Cc: linux-fb...@vger.kernel.org
Cc: linux-hyp...@vger.kernel.org
Cc: linux-...@vger.kernel.org
Cc: linux-l...@vger.kernel.org
Cc: linux-m...@vger.kernel.org
Cc: linux-par...@vger.kernel.org
Cc: linux...@vger.kernel.org
Cc: linuxppc-...@lists.ozlabs.org
Cc: linux-remotep...@vger.kernel.org
Cc: linux-renesas-...@vger.kernel.org
Cc: linux-s...@vger.kernel.org
Cc: linux-s...@vger.kernel.org
Cc: linux...@vger.kernel.org
Cc: linux-stag...@lists.linux.dev
Cc: linux-te...@vger.kernel.org
Cc: linux...@lists.infradead.org
Cc: linux-...@vger.kernel.org
Cc: linux-xte...@linux-xtensa.org
Cc: net...@vger.kernel.org
Cc: openipmi-develo...@lists.sourceforge.net
Cc: r...@vger.kernel.org
Cc: sparcli...@vger.kernel.org
Cc: x...@kernel.org
Cc: xen-devel@lists.xenproject.org

Borislav Petkov (42):
  x86: Check notifier registration return value
  xen/x86: Check notifier registration return value
  impi: Check notifier registration return value
  clk: renesas: Check notifier registration return value
  dca: Check notifier registration return value
  firmware: Check notifier registration return value
  drm/i915: Check notifier registration return value
  Drivers: hv: vmbus: Check notifier registration return value
  iio: proximity: cros_ec: Check notifier registration return value
  leds: trigger: Check notifier registration return value
  misc: Check notifier registration return value
  ethernet: chelsio: Check notifier registration return value
  power: reset: Check notifier registration return value
  remoteproc: Check notifier registration return value
  scsi: target: Check notifier registration return value
  USB: Check notifier registration return value
  drivers: video: Check notifier registration return value
  drivers/xen: Check notifier registration return value
  kernel/hung_task: Check notifier registration return value
  rcu: Check notifier registration return value
  tracing: Check notifier registration return value
  net: fib_notifier: Check notifier registration return value
  ASoC: soc-jack: Check notifier registration return value
  staging: olpc_dcon: Check notifier registration return value
  arch/um: Check notifier registration return value
  alpha: Check notifier registration return value
  bus: brcmstb_gisb: Check notifier registration return value
  soc: bcm: brcmstb: pm: pm-arm: Check notifier registration return
value
  arm64: Check notifier registration return value
  soc/tegra: Check notifier registration return value
  parisc: Check notifier registration return value
  macintosh/adb: Check notifier registration return value
  mips: Check notifier registration return value
  powerpc: Check notifier registration return value
  sh: Check notifier registration return value
  s390: Check notifier registration return value
  sparc: Check notifier registration return value
  xtensa: Check notifier registration return value
  crypto: ccree - check notifier registration return value
  EDAC/altera: Check notifier registration return value
  power: supply: ab8500: Check notifier registration return value
  notifier: Return an error when callback is already registered

 arch/alpha/kernel/setup.c |  5 +--
 arch/arm64/kernel/setup.c |  6 ++--
 arch/mips/kernel/relocate.c

[PATCH v0 42/42] notifier: Return an error when callback is already registered

2021-11-08 Thread Borislav Petkov
From: Borislav Petkov 

The notifier registration routine doesn't return a proper error value
when a callback has already been registered, leading people to track
whether that registration has happened at the call site:

  https://lore.kernel.org/amd-gfx/20210512013058.6827-1-mukul.jo...@amd.com/

Which is unnecessary.

Return -EEXIST to signal that case so that callers can act accordingly.
Enforce callers to check the return value, leading to loud screaming
during build:

  arch/x86/kernel/cpu/mce/core.c: In function ‘mce_register_decode_chain’:
  arch/x86/kernel/cpu/mce/core.c:167:2: error: ignoring return value of \
   ‘blocking_notifier_chain_register’, declared with attribute warn_unused_result [-Werror=unused-result]
    blocking_notifier_chain_register(&x86_mce_decoder_chain, nb);
  ^~~~

Drop the WARN too, while at it.

Suggested-by: Thomas Gleixner 
Signed-off-by: Borislav Petkov 
Cc: Arnd Bergmann 
Cc: Ayush Sawal 
Cc: Greg Kroah-Hartman 
Cc: Rohit Maheshwari 
Cc: Steven Rostedt 
Cc: Vinay Kumar Yadav 
Cc: alsa-de...@alsa-project.org
Cc: bcm-kernel-feedback-l...@broadcom.com
Cc: intel-...@lists.freedesktop.org
Cc: intel-gvt-...@lists.freedesktop.org
Cc: linux-al...@vger.kernel.org
Cc: linux-arm-ker...@lists.infradead.org
Cc: linux-arm-ker...@lists.infradead.org
Cc: linux-...@vger.kernel.org
Cc: linux-cry...@vger.kernel.org
Cc: linux-e...@vger.kernel.org
Cc: linux-fb...@vger.kernel.org
Cc: linux-hyp...@vger.kernel.org
Cc: linux-...@vger.kernel.org
Cc: linux-l...@vger.kernel.org
Cc: linux-m...@vger.kernel.org
Cc: linux-par...@vger.kernel.org
Cc: linux...@vger.kernel.org
Cc: linuxppc-...@lists.ozlabs.org
Cc: linux-remotep...@vger.kernel.org
Cc: linux-renesas-...@vger.kernel.org
Cc: linux-s...@vger.kernel.org
Cc: linux-s...@vger.kernel.org
Cc: linux...@vger.kernel.org
Cc: linux-stag...@lists.linux.dev
Cc: linux-te...@vger.kernel.org
Cc: linux...@lists.infradead.org
Cc: linux-...@vger.kernel.org
Cc: linux-xte...@linux-xtensa.org
Cc: net...@vger.kernel.org
Cc: openipmi-develo...@lists.sourceforge.net
Cc: r...@vger.kernel.org
Cc: sparcli...@vger.kernel.org
Cc: x...@kernel.org
Cc: xen-devel@lists.xenproject.org
---
 include/linux/notifier.h |  8 
 kernel/notifier.c| 36 +++-
 2 files changed, 23 insertions(+), 21 deletions(-)

diff --git a/include/linux/notifier.h b/include/linux/notifier.h
index 87069b8459af..45cc5a8d0fd8 100644
--- a/include/linux/notifier.h
+++ b/include/linux/notifier.h
@@ -141,13 +141,13 @@ extern void srcu_init_notifier_head(struct srcu_notifier_head *nh);
 
 #ifdef __KERNEL__
 
-extern int atomic_notifier_chain_register(struct atomic_notifier_head *nh,
+extern int __must_check atomic_notifier_chain_register(struct atomic_notifier_head *nh,
struct notifier_block *nb);
-extern int blocking_notifier_chain_register(struct blocking_notifier_head *nh,
+extern int __must_check blocking_notifier_chain_register(struct blocking_notifier_head *nh,
struct notifier_block *nb);
-extern int raw_notifier_chain_register(struct raw_notifier_head *nh,
+extern int __must_check raw_notifier_chain_register(struct raw_notifier_head *nh,
struct notifier_block *nb);
-extern int srcu_notifier_chain_register(struct srcu_notifier_head *nh,
+extern int __must_check srcu_notifier_chain_register(struct srcu_notifier_head *nh,
struct notifier_block *nb);
 
 extern int atomic_notifier_chain_unregister(struct atomic_notifier_head *nh,
diff --git a/kernel/notifier.c b/kernel/notifier.c
index b8251dc0bc0f..451ef3f73ad2 100644
--- a/kernel/notifier.c
+++ b/kernel/notifier.c
@@ -20,13 +20,11 @@ BLOCKING_NOTIFIER_HEAD(reboot_notifier_list);
  */
 
 static int notifier_chain_register(struct notifier_block **nl,
-   struct notifier_block *n)
+  struct notifier_block *n)
 {
while ((*nl) != NULL) {
-   if (unlikely((*nl) == n)) {
-   WARN(1, "double register detected");
-   return 0;
-   }
+   if (unlikely((*nl) == n))
+   return -EEXIST;
if (n->priority > (*nl)->priority)
break;
nl = &((*nl)->next);
@@ -134,10 +132,11 @@ static int notifier_call_chain_robust(struct notifier_block **nl,
  *
  * Adds a notifier to an atomic notifier chain.
  *
- * Currently always returns zero.
+ * Returns 0 on success, %-EEXIST on error.
  */
-int atomic_notifier_chain_register(struct atomic_notifier_head *nh,
-   struct notifier_block *n)
+int __must_check
+atomic_notifier_chain_register(struct atomic_notifier_head *nh,
+  struct notifier_block *n)
 {
unsigned long flags;
int ret;
@@ -216,10 +215,11 @@ NOKPROBE_SYMBOL(atomic_notifier_call_chain);
  * Adds a notifier to

[PATCH v0 18/42] drivers/xen: Check notifier registration return value

2021-11-08 Thread Borislav Petkov
From: Borislav Petkov 

Avoid homegrown notifier registration checks.

No functional changes.

Signed-off-by: Borislav Petkov 
Cc: xen-devel@lists.xenproject.org
---
 drivers/xen/manage.c  | 3 ++-
 drivers/xen/xenbus/xenbus_probe.c | 8 +---
 2 files changed, 7 insertions(+), 4 deletions(-)

diff --git a/drivers/xen/manage.c b/drivers/xen/manage.c
index 374d36de7f5a..f3c5cef0995f 100644
--- a/drivers/xen/manage.c
+++ b/drivers/xen/manage.c
@@ -48,7 +48,8 @@ static RAW_NOTIFIER_HEAD(xen_resume_notifier);
 
 void xen_resume_notifier_register(struct notifier_block *nb)
 {
-   raw_notifier_chain_register(&xen_resume_notifier, nb);
+   if (raw_notifier_chain_register(&xen_resume_notifier, nb))
+   pr_warn("Xen resume notifier already registered\n");
 }
 EXPORT_SYMBOL_GPL(xen_resume_notifier_register);
 
diff --git a/drivers/xen/xenbus/xenbus_probe.c b/drivers/xen/xenbus/xenbus_probe.c
index bd003ca8acbe..4e83ce95acd1 100644
--- a/drivers/xen/xenbus/xenbus_probe.c
+++ b/drivers/xen/xenbus/xenbus_probe.c
@@ -731,10 +731,12 @@ int register_xenstore_notifier(struct notifier_block *nb)
 {
int ret = 0;
 
-   if (xenstored_ready > 0)
+   if (xenstored_ready > 0) {
ret = nb->notifier_call(nb, 0, NULL);
-   else
-   blocking_notifier_chain_register(&xenstore_chain, nb);
+   } else {
+   if (blocking_notifier_chain_register(&xenstore_chain, nb))
+   pr_warn("Xenstore notifier already registered\n");
+   }
 
return ret;
 }
-- 
2.29.2




[PATCH v0 02/42] xen/x86: Check notifier registration return value

2021-11-08 Thread Borislav Petkov
From: Borislav Petkov 

Avoid homegrown notifier registration checks.

No functional changes.

Signed-off-by: Borislav Petkov 
Cc: xen-devel@lists.xenproject.org
---
 arch/x86/xen/enlighten.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/arch/x86/xen/enlighten.c b/arch/x86/xen/enlighten.c
index 95d970359e17..2264dd6e157f 100644
--- a/arch/x86/xen/enlighten.c
+++ b/arch/x86/xen/enlighten.c
@@ -354,7 +354,9 @@ static struct notifier_block xen_panic_block = {
 
 int xen_panic_handler_init(void)
 {
-   atomic_notifier_chain_register(&panic_notifier_list, &xen_panic_block);
+   if (atomic_notifier_chain_register(&panic_notifier_list, &xen_panic_block))
+   pr_warn("Xen panic notifier already registered\n");
+
return 0;
 }
 
-- 
2.29.2




Re: [PATCH V4 04/50] x86/xen: Add xenpv_restore_regs_and_return_to_usermode()

2021-11-02 Thread Borislav Petkov
On Tue, Nov 02, 2021 at 12:22:50PM +0100, H. Peter Anvin wrote:
> It would be interesting to have an "override function with jmp"
> alternatives macro. It doesn't require any changes to the alternatives
> mechanism proper (but possibly to objtool): it would just insert an
> alternatives entry without adding any code including nops to the main
> path. It would of course only be applicable to a jmp, so a syntax like
> OVERRIDE_JMP feature, target rather than open-coding the instruction
> would probably be a good idea.

I think you wanna say ALTERNATIVE_JMP here seeing how we have
ALTERNATIVE_CALL already :)

As to marking it properly, we can finally add that struct
alt_instr.flags thing we have been trying to add for years now.
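
Something like this, hypothetically - existing fields abbreviated from
memory, so take the exact layout with a grain of salt:

struct alt_instr {
	s32 instr_offset;	/* original instruction */
	s32 repl_offset;	/* offset to replacement instruction */
	u16 cpuid;		/* cpuid bit set for replacement */
	u8  instrlen;		/* length of original instruction */
	u8  replacementlen;	/* length of new instruction */
	u16 flags;		/* new: ALT_FLAG_* modifiers, e.g. a jmp override */
} __packed;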

/me adds it to his evergrowing todo.

If anyone beats /me to it, /me will gladly have a look at it.

-- 
Regards/Gruss,
Boris.

https://people.kernel.org/tglx/notes-about-netiquette



Re: [PATCH V4 04/50] x86/xen: Add xenpv_restore_regs_and_return_to_usermode()

2021-11-02 Thread Borislav Petkov
On Tue, Nov 02, 2021 at 05:19:46PM +0800, Lai Jiangshan wrote:
> It will add a 5-byte NOP at the beginning of the native
> swapgs_restore_regs_and_return_to_usermode.

So?

> I avoided adding unneeded code in the native code even if it is NOPs
> and avoided melting xenpv-one into the native one which will reduce
> the code readability.

How does this reduce code readability?!

diff --git a/arch/x86/entry/entry_64.S b/arch/x86/entry/entry_64.S
index e38a4cf795d9..bf1de54a1fca 100644
--- a/arch/x86/entry/entry_64.S
+++ b/arch/x86/entry/entry_64.S
@@ -567,6 +567,10 @@ __irqentry_text_end:
 
 SYM_CODE_START_LOCAL(common_interrupt_return)
 SYM_INNER_LABEL(swapgs_restore_regs_and_return_to_usermode, SYM_L_GLOBAL)
+
+   ALTERNATIVE "", "jmp xenpv_restore_regs_and_return_to_usermode", \
+X86_FEATURE_XENPV
+
 #ifdef CONFIG_DEBUG_ENTRY
/* Assert that pt_regs indicates user mode. */
testb   $3, CS(%rsp)

> I will follow your preference since a 5-byte NOP is so negligible in the slow
> path with an iret instruction.

Yes, we do already gazillion things on those entry and exit paths.

> Or other option that adds macros to wrap the ALTERNATIVE.
> RESTORE_REGS_AND_RETURN_TO_USERMODE and
> COND_RESTORE_REGS_AND_RETURN_TO_USERMODE (test %eax before jmp in native case)

No, the main goal is to keep the asm code as readable and as simple as
possible.

If macros or whatever need to be added, there better be a good reason
for them. Saving a NOP is not one of them.

Thx.

-- 
Regards/Gruss,
Boris.

https://people.kernel.org/tglx/notes-about-netiquette



Re: [PATCH V4 04/50] x86/xen: Add xenpv_restore_regs_and_return_to_usermode()

2021-11-02 Thread Borislav Petkov
On Tue, Oct 26, 2021 at 10:13:34PM +0800, Lai Jiangshan wrote:
> From: Lai Jiangshan 
> 
> While in the native case, PER_CPU_VAR(cpu_tss_rw + TSS_sp0) is the
> trampoline stack.  But XEN pv doesn't use trampoline stack, so
> PER_CPU_VAR(cpu_tss_rw + TSS_sp0) is also the kernel stack.  Hence source
> and destination stacks are identical in that case, which means reusing
> swapgs_restore_regs_and_return_to_usermode() in XEN pv would cause %rsp
> to move up to the top of the kernel stack and leave the IRET frame below
> %rsp, which is dangerous to be corrupted if #NMI / #MC hit as either of
> these events occurring in the middle of the stack pushing would clobber
> data on the (original) stack.
> 
> And swapgs_restore_regs_and_return_to_usermode() pushing the IRET frame
> on to the original address is useless and error-prone when there is any
> future attempt to modify the code.
> 
> Fixes: 7f2590a110b8 ("x86/entry/64: Use a per-CPU trampoline stack for IDT 
> entries")
> Cc: Jan Beulich 
> Cc: Thomas Gleixner 
> Cc: Juergen Gross 
> Cc: Peter Anvin 
> Cc: xen-devel@lists.xenproject.org
> Reviewed-by: Boris Ostrovsky 
> Signed-off-by: Lai Jiangshan 
> ---
>  arch/x86/entry/entry_64.S|  9 ++---
>  arch/x86/entry/entry_64_compat.S |  7 ---
>  arch/x86/xen/xen-asm.S   | 27 +++
>  3 files changed, 37 insertions(+), 6 deletions(-)
> 
> diff --git a/arch/x86/entry/entry_64.S b/arch/x86/entry/entry_64.S
> index 9d468c8877e2..0dde5a253dda 100644
> --- a/arch/x86/entry/entry_64.S
> +++ b/arch/x86/entry/entry_64.S
> @@ -119,7 +119,7 @@ SYM_INNER_LABEL(entry_SYSCALL_64_after_hwframe, SYM_L_GLOBAL)
>* In the Xen PV case we must use iret anyway.
>*/
>  
> - ALTERNATIVE "", "jmpswapgs_restore_regs_and_return_to_usermode", \
> + ALTERNATIVE "", "jmp xenpv_restore_regs_and_return_to_usermode", \

Instead of sprinkling all those ALTERNATIVE calls everywhere,
why don't you simply jump to the xenpv-one at the
swapgs_restore_regs_and_return_to_usermode label itself and have a
single ALTERNATIVE there?

-- 
Regards/Gruss,
Boris.

https://people.kernel.org/tglx/notes-about-netiquette



Re: [PATCH] x86/setup: call early_reserve_memory() earlier

2021-09-16 Thread Borislav Petkov
On Thu, Sep 16, 2021 at 12:09:27PM +0300, Mike Rapoport wrote:
> I think the first sentence about reserving memory before memblock
> allocations are possible is important and I think we should keep it.

I expanded that comment this way:

/*
 * Do some memory reservations *before* memory is added to memblock, so
 * memblock allocations won't overwrite it.
 *
 * After this point, everything still needed from the boot loader or
 * firmware or kernel text should be early reserved or marked not RAM in
 * e820. All other memory is free game.
 *
 * This call needs to happen before e820__memory_setup() which calls the
 * xen_memory_setup() on Xen dom0 which relies on the fact that those
 * early reservations have happened already.
 */
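
For illustration only, the resulting ordering in setup_arch() would look
roughly like this (abridged sketch, not the literal upstream code):

void __init setup_arch(char **cmdline_p)
{
	/* Must run before memblock sees any memory. */
	early_reserve_memory();

	/* ... early param parsing etc. elided ... */

	/*
	 * May end up in xen_memory_setup() on Xen dom0, which relies on
	 * the reservations above having happened already.
	 */
	e820__memory_setup();
}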

-- 
Regards/Gruss,
Boris.

https://people.kernel.org/tglx/notes-about-netiquette



Re: [PATCH] x86/setup: call early_reserve_memory() earlier

2021-09-15 Thread Borislav Petkov
You forgot to Cc Mike, lemme add him.

And drop stable@ too.

On Tue, Sep 14, 2021 at 01:06:22PM +0200, Juergen Gross wrote:
> On 14.09.21 12:03, Jan Beulich wrote:
> > On 14.09.2021 11:41, Juergen Gross wrote:
> > > Commit a799c2bd29d19c565 ("x86/setup: Consolidate early memory
> > > reservations") introduced early_reserve_memory() to do all needed
> > > initial memblock_reserve() calls in one function. Unfortunately the
> > > call of early_reserve_memory() is done too late for Xen dom0, as in
> > > some cases a Xen hook called by e820__memory_setup() will need those
> > > memory reservations to have happened already.
> > > 
> > > Move the call of early_reserve_memory() to the beginning of
> > > setup_arch() in order to avoid such problems.
> > > 
> > > Cc: stable@vger.kernel.org
> > > Fixes: a799c2bd29d19c565 ("x86/setup: Consolidate early memory reservations")
> > > Reported-by: Marek Marczykowski-Górecki 
> > > Signed-off-by: Juergen Gross 
> > > ---
> > >   arch/x86/kernel/setup.c | 24 
> > >   1 file changed, 12 insertions(+), 12 deletions(-)
> > > 
> > > diff --git a/arch/x86/kernel/setup.c b/arch/x86/kernel/setup.c
> > > index 79f164141116..f369c51ec580 100644
> > > --- a/arch/x86/kernel/setup.c
> > > +++ b/arch/x86/kernel/setup.c
> > > @@ -757,6 +757,18 @@ dump_kernel_offset(struct notifier_block *self, unsigned long v, void *p)
> > >   void __init setup_arch(char **cmdline_p)
> > >   {
> > > + /*
> > > +  * Do some memory reservations *before* memory is added to
> > > +  * memblock, so memblock allocations won't overwrite it.
> > > +  * Do it after early param, so we could get (unlikely) panic from
> > > +  * serial.
> > 
> > Hmm, this part of the comment is not only stale now, but gets actively
> > undermined. No idea how likely such a panic() would be, and hence how
> > relevant it is to retain this particular property.
> 
> Ah, right.
> 
> The alternative would be to split it up again. Let's let the x86
> maintainers decide which way is the better one.
> 
> 
> Juergen
> 
> > 
> > Jan
> > 
> > > +  * After this point everything still needed from the boot loader or
> > > +  * firmware or kernel text should be early reserved or marked not
> > > +  * RAM in e820. All other memory is free game.
> > > +  */
> > > + early_reserve_memory();
> > > +
> > >   #ifdef CONFIG_X86_32
> > >   memcpy(&new_cpu_data, &boot_cpu_data, sizeof(new_cpu_data));
> > > @@ -876,18 +888,6 @@ void __init setup_arch(char **cmdline_p)
> > >   parse_early_param();
> > > - /*
> > > -  * Do some memory reservations *before* memory is added to
> > > -  * memblock, so memblock allocations won't overwrite it.
> > > -  * Do it after early param, so we could get (unlikely) panic from
> > > -  * serial.
> > > -  *
> > > -  * After this point everything still needed from the boot loader or
> > > -  * firmware or kernel text should be early reserved or marked not
> > > -  * RAM in e820. All other memory is free game.
> > > -  */
> > > - early_reserve_memory();
> > > -
> > >   #ifdef CONFIG_MEMORY_HOTPLUG
> > >   /*
> > >* Memory used by the kernel cannot be hot-removed because Linux
> > > 
> > 
> 

-- 
Regards/Gruss,
Boris.

https://people.kernel.org/tglx/notes-about-netiquette



Re: [RFC PATCH V3 03/11] x86/Hyper-V: Add new hvcall guest address host visibility support

2021-05-30 Thread Borislav Petkov
On Sun, May 30, 2021 at 11:06:20AM -0400, Tianyu Lan wrote:
> diff --git a/arch/x86/mm/pat/set_memory.c b/arch/x86/mm/pat/set_memory.c
> index 156cd235659f..a82975600107 100644
> --- a/arch/x86/mm/pat/set_memory.c
> +++ b/arch/x86/mm/pat/set_memory.c
> @@ -29,6 +29,8 @@
>  #include 
>  #include 
>  #include 
> +#include 
> +#include 
>  
>  #include "../mm_internal.h"
>  
> @@ -1986,8 +1988,14 @@ static int __set_memory_enc_dec(unsigned long addr, int numpages, bool enc)
>   int ret;
>  
>   /* Nothing to do if memory encryption is not active */
> - if (!mem_encrypt_active())
> + if (hv_is_isolation_supported()) {
> + return hv_set_mem_host_visibility((void *)addr,
> + numpages * HV_HYP_PAGE_SIZE,
> + enc ? VMBUS_PAGE_NOT_VISIBLE
> + : VMBUS_PAGE_VISIBLE_READ_WRITE);

Put all this gunk in a hv-specific function somewhere in hv-land which
you only call from here. This way you probably won't even need to export
hv_set_mem_host_visibility() and so on...
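
I.e., roughly something like this (a sketch only; the wrapper name and
its location are made up for illustration):

	/* in hv-land, e.g. arch/x86/hyperv/ivm.c (hypothetical) */
	int hv_set_mem_enc(unsigned long addr, int numpages, bool enc)
	{
		/* Isolation VMs: flip host visibility instead of the C-bit. */
		return hv_set_mem_host_visibility((void *)addr,
				numpages * HV_HYP_PAGE_SIZE,
				enc ? VMBUS_PAGE_NOT_VISIBLE
				    : VMBUS_PAGE_VISIBLE_READ_WRITE);
	}

so that __set_memory_enc_dec() is reduced to:

	if (hv_is_isolation_supported())
		return hv_set_mem_enc(addr, numpages, enc);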

Thx.

-- 
Regards/Gruss,
Boris.

https://people.kernel.org/tglx/notes-about-netiquette



Re: [PATCH v6 00/12] x86: major paravirt cleanup

2021-03-11 Thread Borislav Petkov
On Thu, Mar 11, 2021 at 01:50:26PM +0100, Borislav Petkov wrote:
> and move the cleanups patches 13 and 14 to the beginning of the set?

Yeah, 14 needs ALTERNATIVE_TERNARY so I guess after patch 5, that is.

Thx.

-- 
Regards/Gruss,
Boris.

https://people.kernel.org/tglx/notes-about-netiquette



Re: [PATCH v6 00/12] x86: major paravirt cleanup

2021-03-11 Thread Borislav Petkov
On Tue, Mar 09, 2021 at 02:48:01PM +0100, Juergen Gross wrote:
> This is a major cleanup of the paravirt infrastructure aiming at
> eliminating all custom code patching via paravirt patching.
> 
> This is achieved by using ALTERNATIVE instead, leading to the ability
> to give objtool access to the patched in instructions.
> 
> In order to remove most of the 32-bit special handling from pvops the
> time related operations are switched to use static_call() instead.
> 
> At the end of this series all paravirt patching has to do is to
> replace indirect calls with direct ones. In a further step this could
> be switched to static_call(), too.
> 
> Changes in V6:
> - switched back to "not" bit in feature value for "not feature"
> - other minor comments addressed

Ok, looks real good, the simplification is amazing! :)

Can you please redo with the minor nits addressed ontop of the
tip:x86/alternatives branch:

https://git.kernel.org/pub/scm/linux/kernel/git/tip/tip.git/log/?h=x86/alternatives

and move the cleanups patches 13 and 14 to the beginning of the set?

Thx.

-- 
Regards/Gruss,
Boris.

https://people.kernel.org/tglx/notes-about-netiquette



Re: [PATCH v6 05/12] x86/alternative: support ALTERNATIVE_TERNARY

2021-03-10 Thread Borislav Petkov
On Tue, Mar 09, 2021 at 02:48:06PM +0100, Juergen Gross wrote:
> diff --git a/arch/x86/include/asm/alternative.h b/arch/x86/include/asm/alternative.h
> index 89889618ae01..4fb844e29d26 100644
> --- a/arch/x86/include/asm/alternative.h
> +++ b/arch/x86/include/asm/alternative.h
> @@ -178,6 +178,9 @@ static inline int alternatives_text_reserved(void *start, void *end)
>   ALTINSTR_REPLACEMENT(newinstr2, 2)  \
>   ".popsection\n"
>  
> +#define ALTERNATIVE_TERNARY(oldinstr, feature, newinstr1, newinstr2) \
> +	ALTERNATIVE_2(oldinstr, newinstr2, X86_FEATURE_ALWAYS, newinstr1, feature)

Make that:

/*
 * If @feature is set, patch @newinstr_yes, else @newinstr_no
 */
#define ALTERNATIVE_TERNARY(oldinstr, feature, newinstr_yes, newinstr_no) \
	ALTERNATIVE_2(oldinstr, newinstr_no, X86_FEATURE_ALWAYS, newinstr_yes, feature)

and in alternative-asm.h too pls.
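
For illustration, a hypothetical call site (not from this patch; assumes
u32 lo, hi in scope):

	/* Prefer RDTSCP when available, else a fenced RDTSC. */
	asm volatile(ALTERNATIVE_TERNARY("rdtsc", X86_FEATURE_RDTSCP,
					 "rdtscp", "lfence; rdtsc")
		     : "=a" (lo), "=d" (hi) : : "ecx");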

Regardless, this looks nice! :)

Thx.

-- 
Regards/Gruss,
Boris.

https://people.kernel.org/tglx/notes-about-netiquette



Re: [PATCH v6 04/12] x86/alternative: support not-feature

2021-03-10 Thread Borislav Petkov
On Wed, Mar 10, 2021 at 08:52:40AM +0100, Jürgen Groß wrote:
> Did you look at patch 13? :-)

Well, I usually review in increasing patch order. :-P

But make that change here pls because otherwise unnecessary churn.

Thx.

-- 
Regards/Gruss,
Boris.

https://people.kernel.org/tglx/notes-about-netiquette



Re: [PATCH v6 02/12] x86/paravirt: switch time pvops functions to use static_call()

2021-03-10 Thread Borislav Petkov
On Wed, Mar 10, 2021 at 08:51:22AM +0100, Jürgen Groß wrote:
> It is combining the two needed actions: update the static call and
> set the paravirt_using_native_sched_clock boolean.

I actually meant to ask what the point of using_native_sched_clock() is,
but put this comment at the wrong place, sorry.

> Just had another idea: I could add a function to static_call.h for
> querying the current function. This would avoid the double book keeping
> and could probably be used later when switching other pv_ops calls to
> static_call, too (e.g. pv_is_native_spin_unlock()).
> 
> What do you think?

Yap, that makes sense. Alternatively, you could even add a bitfield to
pv_ops which carries that info and since pv_ops is global...

But yeah, the less bookkeeping the better.
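
The querying idea could look roughly like this (a sketch; such a helper
does not exist at this point and the name is made up):

	/* include/linux/static_call.h: where does a static call point now? */
	#define static_call_query(name) \
		(READ_ONCE(STATIC_CALL_KEY(name).func))

	/* user side, e.g. in tsc.c: */
	bool native = static_call_query(pv_sched_clock) == native_sched_clock;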

Thx.

-- 
Regards/Gruss,
Boris.

https://people.kernel.org/tglx/notes-about-netiquette



Re: [PATCH v6 04/12] x86/alternative: support not-feature

2021-03-09 Thread Borislav Petkov
On Tue, Mar 09, 2021 at 02:48:05PM +0100, Juergen Gross wrote:
> Add support for alternative patching for the case where a feature is
> not present on the current CPU.
> 
> For users of ALTERNATIVE() and friends an inverted feature is specified
> by applying the ALT_NOT() macro to it, e.g.:
> 
> ALTERNATIVE(old, new, ALT_NOT(feature))
> 
> Signed-off-by: Juergen Gross 
> ---
> V5:
> - split off from next patch
> - reworked to use flag byte (Boris Petkov)
> V6:
> - rework again to not use flag byte (Boris Petkov)
> ---
>  arch/x86/include/asm/alternative-asm.h |  3 +++
>  arch/x86/include/asm/alternative.h |  3 +++
>  arch/x86/kernel/alternative.c  | 19 ++-
>  3 files changed, 20 insertions(+), 5 deletions(-)

LGTM, minor touchups I'd do ontop:

---

diff --git a/arch/x86/include/asm/alternative.h b/arch/x86/include/asm/alternative.h
index 89889618ae01..fd205cdcfbad 100644
--- a/arch/x86/include/asm/alternative.h
+++ b/arch/x86/include/asm/alternative.h
@@ -55,18 +55,18 @@
".long 999b - .\n\t"\
".popsection\n\t"
 
+#define ALTINSTR_FLAG_INV  (1 << 15)
+#define ALT_NOT(feat)  ((feat) | ALTINSTR_FLAG_INV)
+
 struct alt_instr {
s32 instr_offset;   /* original instruction */
s32 repl_offset;/* offset to replacement instruction */
u16 cpuid;  /* cpuid bit set for replacement */
-#define ALTINSTR_FLAG_INV (1 << 15)
u8  instrlen;   /* length of original instruction */
u8  replacementlen; /* length of new instruction */
u8  padlen; /* length of build-time padding */
 } __packed;
 
-#define ALT_NOT(feat)  ((feat) | ALTINSTR_FLAG_INV)
-
 /*
  * Debug flag that can be tested to see whether alternative
  * instructions were patched in already:
diff --git a/arch/x86/kernel/alternative.c b/arch/x86/kernel/alternative.c
index d8e669a1546f..133b549dc091 100644
--- a/arch/x86/kernel/alternative.c
+++ b/arch/x86/kernel/alternative.c
@@ -397,9 +397,10 @@ void __init_or_module noinline apply_alternatives(struct alt_instr *start,
BUG_ON(feature >= (NCAPINTS + NBUGINTS) * 32);
 
/*
-* Drop out if either:
-* - feature not available, but required, or
-* - feature available, but NOT required
+* Patch if either:
+* - feature is present
+* - feature not present but ALTINSTR_FLAG_INV is set to mean,
+*   patch if feature is *NOT* present.
 */
if (!boot_cpu_has(feature) == !(a->cpuid & ALTINSTR_FLAG_INV)) {
if (a->padlen > 1)
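
Spelled out, the decision that condition encodes (illustration derived
from the hunk above):

	boot_cpu_has(feature) | ALTINSTR_FLAG_INV | action
	----------------------+-------------------+---------------------
	set                   | clear             | patch
	set                   | set               | skip (optimize NOPs)
	clear                 | clear             | skip (optimize NOPs)
	clear                 | set               | patch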

-- 
Regards/Gruss,
Boris.

https://people.kernel.org/tglx/notes-about-netiquette



Re: [PATCH v6 02/12] x86/paravirt: switch time pvops functions to use static_call()

2021-03-09 Thread Borislav Petkov
On Tue, Mar 09, 2021 at 02:48:03PM +0100, Juergen Gross wrote:
> @@ -167,6 +168,17 @@ static u64 native_steal_clock(int cpu)
>   return 0;
>  }
>  
> +DEFINE_STATIC_CALL(pv_steal_clock, native_steal_clock);
> +DEFINE_STATIC_CALL(pv_sched_clock, native_sched_clock);
> +
> +bool paravirt_using_native_sched_clock = true;
> +
> +void paravirt_set_sched_clock(u64 (*func)(void))
> +{
> + static_call_update(pv_sched_clock, func);
> + paravirt_using_native_sched_clock = (func == native_sched_clock);
> +}

What's the point of this function if there's a global
paravirt_using_native_sched_clock variable now?

Looking at how the information whether native_sched_clock is in use is
needed in tsc.c, it would probably be cleaner if you add a

set_sched_clock_native(void);

or so, to tsc.c instead, call that here, and make that long variable
name a shorter and static one in tsc.c.
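
I.e. something along these lines (a sketch; names made up, and a bool
parameter is one way to let the paravirt side clear it again):

	/* arch/x86/kernel/tsc.c */
	static bool sched_clock_native = true;

	void set_sched_clock_native(bool native)
	{
		sched_clock_native = native;
	}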

Hmm?

-- 
Regards/Gruss,
Boris.

https://people.kernel.org/tglx/notes-about-netiquette



Re: [PATCH v5 11/12] x86/paravirt: switch functions with custom code to ALTERNATIVE

2021-03-08 Thread Borislav Petkov
On Mon, Mar 08, 2021 at 01:28:43PM +0100, Juergen Gross wrote:
> diff --git a/arch/x86/include/asm/paravirt.h b/arch/x86/include/asm/paravirt.h
> index 36cd71fa097f..04b3067f31b5 100644
> --- a/arch/x86/include/asm/paravirt.h
> +++ b/arch/x86/include/asm/paravirt.h
> @@ -137,7 +137,8 @@ static inline void write_cr0(unsigned long x)
>  
>  static inline unsigned long read_cr2(void)
>  {
> - return PVOP_CALLEE0(unsigned long, mmu.read_cr2);
> + return PVOP_ALT_CALLEE0(unsigned long, mmu.read_cr2,
> + "mov %%cr2, %%rax;", ~X86_FEATURE_XENPV);

Just some cursory poking first - indepth review later.

Do I see this correctly that the negated feature can be expressed with, to use
this example here:

	ALTERNATIVE_TERNARY(mmu.read_cr2, X86_FEATURE_XENPV, "", "mov %%cr2, %%rax;");

?

And then you don't need to touch the patching code for ~feature handling
and the flags byte.

If you want it syntactically sugared, you can define a separate
ALTERNATIVE_NOT macro using ALTERNATIVE_TERNARY...
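
E.g. (a sketch only, building on the ternary semantics above):

#define ALTERNATIVE_NOT(oldinstr, feature, newinstr) \
	ALTERNATIVE_TERNARY(oldinstr, feature, oldinstr, newinstr)

i.e. re-emit oldinstr when the feature is set and patch in newinstr when
it is not, with no ~feature handling needed in the patching code.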

Hmmm.

-- 
Regards/Gruss,
Boris.

https://people.kernel.org/tglx/notes-about-netiquette



Re: [PATCH v4 07/15] x86/paravirt: switch time pvops functions to use static_call()

2021-02-01 Thread Borislav Petkov
On Wed, Jan 20, 2021 at 02:55:47PM +0100, Juergen Gross wrote:
> The time pvops functions are the only ones left which might be
> used in 32-bit mode and which return a 64-bit value.
> 
> Switch them to use the static_call() mechanism instead of pvops, as
> this allows quite some simplification of the pvops implementation.
> 
> Signed-off-by: Juergen Gross 
> ---
> V4:
> - drop paravirt_time.h again
> - don't move Hyper-V code (Michael Kelley)
> ---
>  arch/x86/Kconfig  |  1 +
>  arch/x86/include/asm/mshyperv.h   |  2 +-
>  arch/x86/include/asm/paravirt.h   | 17 ++---
>  arch/x86/include/asm/paravirt_types.h |  6 --
>  arch/x86/kernel/cpu/vmware.c  |  5 +++--
>  arch/x86/kernel/kvm.c |  2 +-
>  arch/x86/kernel/kvmclock.c|  2 +-
>  arch/x86/kernel/paravirt.c| 16 
>  arch/x86/kernel/tsc.c |  2 +-
>  arch/x86/xen/time.c   | 11 ---
>  drivers/clocksource/hyperv_timer.c|  5 +++--
>  drivers/xen/time.c|  2 +-
>  12 files changed, 42 insertions(+), 29 deletions(-)
> 
> diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
> index 21f851179ff0..7ccd4a80788c 100644
> --- a/arch/x86/Kconfig
> +++ b/arch/x86/Kconfig
> @@ -771,6 +771,7 @@ if HYPERVISOR_GUEST
>  
>  config PARAVIRT
>   bool "Enable paravirtualization code"
> + depends on HAVE_STATIC_CALL
>   help
> This changes the kernel so it can modify itself when it is run
> under a hypervisor, potentially improving performance significantly
> diff --git a/arch/x86/include/asm/mshyperv.h b/arch/x86/include/asm/mshyperv.h
> index 30f76b966857..b4ee331d29a7 100644
> --- a/arch/x86/include/asm/mshyperv.h
> +++ b/arch/x86/include/asm/mshyperv.h
> @@ -63,7 +63,7 @@ typedef int (*hyperv_fill_flush_list_func)(
>  static __always_inline void hv_setup_sched_clock(void *sched_clock)
>  {
>  #ifdef CONFIG_PARAVIRT
> - pv_ops.time.sched_clock = sched_clock;
> + paravirt_set_sched_clock(sched_clock);
>  #endif
>  }
>  
> diff --git a/arch/x86/include/asm/paravirt.h b/arch/x86/include/asm/paravirt.h
> index 4abf110e2243..1e45b46fae84 100644
> --- a/arch/x86/include/asm/paravirt.h
> +++ b/arch/x86/include/asm/paravirt.h
> @@ -15,11 +15,22 @@
>  #include 
>  #include 
>  #include 
> +#include 
>  #include 
>  
> -static inline unsigned long long paravirt_sched_clock(void)
> +u64 dummy_steal_clock(int cpu);
> +u64 dummy_sched_clock(void);
> +
> +DECLARE_STATIC_CALL(pv_steal_clock, dummy_steal_clock);
> +DECLARE_STATIC_CALL(pv_sched_clock, dummy_sched_clock);

Did you build this before sending?

I'm test-applying this on rc6 + tip/master, so I'm probably using a
different tree; it looks like something has changed in the meantime.
-rc6 has a couple of Xen changes which required some wiggling to get
these applied...

Maybe you should redo them on top of tip/master. That is, *if* they're
going to eventually go through tip. The diffstat has Xen stuff too, so
we might need some synchronization here about what goes where and how...

./arch/x86/include/asm/paravirt.h:24:1: warning: data definition has no type or storage class
   24 | DECLARE_STATIC_CALL(pv_steal_clock, dummy_steal_clock);
      | ^~~
./arch/x86/include/asm/paravirt.h:24:1: error: type defaults to ‘int’ in declaration of ‘DECLARE_STATIC_CALL’ [-Werror=implicit-int]
./arch/x86/include/asm/paravirt.h:24:1: warning: parameter names (without types) in function declaration
./arch/x86/include/asm/paravirt.h:25:1: warning: data definition has no type or storage class
   25 | DECLARE_STATIC_CALL(pv_sched_clock, dummy_sched_clock);
      | ^~~
./arch/x86/include/asm/paravirt.h:25:1: error: type defaults to ‘int’ in declaration of ‘DECLARE_STATIC_CALL’ [-Werror=implicit-int]
./arch/x86/include/asm/paravirt.h:25:1: warning: parameter names (without types) in function declaration
./arch/x86/include/asm/paravirt.h: In function ‘paravirt_sched_clock’:
./arch/x86/include/asm/paravirt.h:33:9: error: implicit declaration of function ‘static_call’ [-Werror=implicit-function-declaration]
   33 |  return static_call(pv_sched_clock)();
      | ^~~
./arch/x86/include/asm/paravirt.h:33:21: error: ‘pv_sched_clock’ undeclared (first use in this function); did you mean ‘dummy_sched_clock’?
   33 |  return static_call(pv_sched_clock)();
      | ^~
      | dummy_sched_clock
./arch/x86/include/asm/paravirt.h:33:21: note: each undeclared identifier is reported only once for each function it appears in
./arch/x86/include/asm/paravirt.h: In function ‘paravirt_steal_clock’:
./arch/x86/include/asm/paravirt.h:47:21: error: ‘pv_steal_clock’ undeclared (first use in this function); did you mean ‘dummy_steal_clock’?
   47 |  return static_call(pv_steal_clock)(cpu);
      | ^~
      | dummy_steal_clock
cc1: some 

Re: [PATCH v3 07/15] x86/alternative: support "not feature" and ALTERNATIVE_TERNARY

2021-01-19 Thread Borislav Petkov
On Tue, Jan 19, 2021 at 12:35:42PM +0100, Jürgen Groß wrote:
> In fact this should rather be named "X86_FEATURE_TRUE", as this is its
> semantics.
>
> And I think I can define it to the value 0x instead of using a
> "real" bit for it.

A real bit is cheap - a special value to pay attention to in the future
not so much. Also we do have X86_FEATURE_ALWAYS already which has a
similar purpose...
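
(For reference, quoted from memory so double-check the exact form:

#define X86_FEATURE_ALWAYS	( 3*32+21) /* "" Always-on feature */

in arch/x86/include/asm/cpufeatures.h, i.e. it occupies a real bit in
one of the Linux-defined capability words.)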

-- 
Regards/Gruss,
Boris.

https://people.kernel.org/tglx/notes-about-netiquette



Re: [PATCH v3 08/15] x86: add new features for paravirt patching

2021-01-14 Thread Borislav Petkov
On Thu, Jan 14, 2021 at 07:30:24PM +0100, Borislav Petkov wrote:
> On Thu, Dec 17, 2020 at 09:12:57PM +0800, kernel test robot wrote:
> >ld: arch/x86/kernel/alternative.o: in function `paravirt_set_cap':
> > >> arch/x86/kernel/alternative.c:605: undefined reference to `pv_is_native_spin_unlock'
> > >> ld: arch/x86/kernel/alternative.c:608: undefined reference to `pv_is_native_vcpu_is_preempted'
> 
> Looks like alternative.c needs #include .

Nah, that needs paravirt_set_cap() to be inside
CONFIG_PARAVIRT_SPINLOCKS-enabled ifdeffery.

Jürgen, why don't you move it to arch/x86/kernel/paravirt.c? That
should keep the ifdeffery low.
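
I.e. roughly (a sketch, assuming the feature bits added earlier in this
series):

/* arch/x86/kernel/paravirt.c */
#ifdef CONFIG_PARAVIRT_SPINLOCKS
void __init paravirt_set_cap(void)
{
	if (!pv_is_native_spin_unlock())
		setup_force_cpu_cap(X86_FEATURE_PVUNLOCK);

	if (!pv_is_native_vcpu_is_preempted())
		setup_force_cpu_cap(X86_FEATURE_VCPUPREEMPT);
}
#endif

so that !CONFIG_PARAVIRT_SPINLOCKS builds never reference the
pv_is_native_*() helpers.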

Thx.

-- 
Regards/Gruss,
Boris.

https://people.kernel.org/tglx/notes-about-netiquette


