from:"Roger Pau Monné"

Re: [XEN PATCH v1 2/3] x86/hvm: introduce config option for stdvga emulation

2024-10-04 Thread Roger Pau Monné

On Fri, Oct 04, 2024 at 12:33:53PM +0300, Sergiy Kibrik wrote:
> Introduce config option X86_STDVGA so that stdvga emulation driver can later 
> be
> made configurable and be disabled on systems that don't need it.
> 
> As a first step the option is hidden from user. No functional changes 
> intended.
> 
> Signed-off-by: Sergiy Kibrik 
> CC: Jan Beulich 
> ---
>  xen/arch/x86/Kconfig  | 3 +++
>  xen/arch/x86/hvm/Makefile | 2 +-
>  xen/arch/x86/include/asm/domain.h | 3 ++-
>  xen/arch/x86/include/asm/hvm/io.h | 4 
>  4 files changed, 10 insertions(+), 2 deletions(-)
> 
> diff --git a/xen/arch/x86/Kconfig b/xen/arch/x86/Kconfig
> index 95275dc17e..89c42ff6da 100644
> --- a/xen/arch/x86/Kconfig
> +++ b/xen/arch/x86/Kconfig
> @@ -147,6 +147,9 @@ config INTEL_VMX
>  config X86_PMTIMER
>   def_bool HVM
>  
> +config X86_STDVGA
> + def_bool HVM

Same as previous patch, the content of patch 3 needs to be moved here.

> +
>  config XEN_SHSTK
>   bool "Supervisor Shadow Stacks"
>   depends on HAS_AS_CET_SS
> diff --git a/xen/arch/x86/hvm/Makefile b/xen/arch/x86/hvm/Makefile
> index 321241f0bf..b7741b0f60 100644
> --- a/xen/arch/x86/hvm/Makefile
> +++ b/xen/arch/x86/hvm/Makefile
> @@ -22,7 +22,7 @@ obj-$(CONFIG_X86_PMTIMER) += pmtimer.o
>  obj-y += quirks.o
>  obj-y += rtc.o
>  obj-y += save.o
> -obj-y += stdvga.o
> +obj-$(CONFIG_X86_STDVGA) += stdvga.o
>  obj-y += vioapic.o
>  obj-y += vlapic.o
>  obj-y += vm_event.o
> diff --git a/xen/arch/x86/include/asm/domain.h 
> b/xen/arch/x86/include/asm/domain.h
> index 3f65bfd190..675a13d917 100644
> --- a/xen/arch/x86/include/asm/domain.h
> +++ b/xen/arch/x86/include/asm/domain.h
> @@ -501,7 +501,8 @@ struct arch_domain
>  #define has_vrtc(d)(!!((d)->arch.emulation_flags & X86_EMU_RTC))
>  #define has_vioapic(d) (!!((d)->arch.emulation_flags & X86_EMU_IOAPIC))
>  #define has_vpic(d)(!!((d)->arch.emulation_flags & X86_EMU_PIC))
> -#define has_vvga(d)(!!((d)->arch.emulation_flags & X86_EMU_VGA))
> +#define has_vvga(d)(IS_ENABLED(CONFIG_X86_STDVGA) && \

You don't need the IS_ENABLED() if emulation_flags_ok() is adjusted
accordingly.

Thanks, Roger.

Re: [XEN PATCH v1 1/3] x86/hvm: introduce config option for ACPI PM timer

2024-10-04 Thread Roger Pau Monné

On Fri, Oct 04, 2024 at 12:31:50PM +0300, Sergiy Kibrik wrote:
> Introduce config option X86_PMTIMER so that pmtimer emulation driver can later
> be made configurable and be disabled on systems that don't need it.
> 
> As a first step the option is hidden from user, thus not making any functional
> changes here.
> 
> Signed-off-by: Sergiy Kibrik 
> CC: Jan Beulich 
> ---
>  xen/arch/x86/Kconfig   |  3 +++
>  xen/arch/x86/hvm/Makefile  |  2 +-
>  xen/arch/x86/include/asm/acpi.h|  5 +
>  xen/arch/x86/include/asm/domain.h  |  3 ++-
>  xen/arch/x86/include/asm/hvm/vpt.h | 10 ++
>  5 files changed, 21 insertions(+), 2 deletions(-)
> 
> diff --git a/xen/arch/x86/Kconfig b/xen/arch/x86/Kconfig
> index 9cdd04721a..95275dc17e 100644
> --- a/xen/arch/x86/Kconfig
> +++ b/xen/arch/x86/Kconfig
> @@ -144,6 +144,9 @@ config INTEL_VMX
> If your system includes a processor with Intel VT-x support, say Y.
> If in doubt, say Y.
>  
> +config X86_PMTIMER
> + def_bool HVM

The chunk in patch 3 that fill this option needs to be moved here,
together with the updated checks in emulation_flags_ok().

>  config XEN_SHSTK
>   bool "Supervisor Shadow Stacks"
>   depends on HAS_AS_CET_SS
> diff --git a/xen/arch/x86/hvm/Makefile b/xen/arch/x86/hvm/Makefile
> index 4c1fa5c6c2..321241f0bf 100644
> --- a/xen/arch/x86/hvm/Makefile
> +++ b/xen/arch/x86/hvm/Makefile
> @@ -18,7 +18,7 @@ obj-y += irq.o
>  obj-y += monitor.o
>  obj-y += mtrr.o
>  obj-y += nestedhvm.o
> -obj-y += pmtimer.o
> +obj-$(CONFIG_X86_PMTIMER) += pmtimer.o

I think you can also make the hvm_hw_acpi field in struct hvm_domain
presence dependent on CONFIG_X86_PMTIMER being enabled.

>  obj-y += quirks.o
>  obj-y += rtc.o
>  obj-y += save.o
> diff --git a/xen/arch/x86/include/asm/acpi.h b/xen/arch/x86/include/asm/acpi.h
> index 217819dd61..8d92014ae9 100644
> --- a/xen/arch/x86/include/asm/acpi.h
> +++ b/xen/arch/x86/include/asm/acpi.h
> @@ -150,8 +150,13 @@ void acpi_mmcfg_init(void);
>  /* Incremented whenever we transition through S3. Value is 1 during boot. */
>  extern uint32_t system_reset_counter;
>  
> +#ifdef CONFIG_X86_PMTIMER
>  void hvm_acpi_power_button(struct domain *d);
>  void hvm_acpi_sleep_button(struct domain *d);
> +#else
> +static inline void hvm_acpi_power_button(struct domain *d) {}
> +static inline void hvm_acpi_sleep_button(struct domain *d) {}
> +#endif

It would be best if those functions returned -ENODEV when the
interface is not available, but that's an existing issue, so won't
insist in you fixing it here.

>  /* suspend/resume */
>  void save_rest_processor_state(void);
> diff --git a/xen/arch/x86/include/asm/domain.h 
> b/xen/arch/x86/include/asm/domain.h
> index bdcdb8de09..3f65bfd190 100644
> --- a/xen/arch/x86/include/asm/domain.h
> +++ b/xen/arch/x86/include/asm/domain.h
> @@ -496,7 +496,8 @@ struct arch_domain
>  
>  #define has_vlapic(d)  (!!((d)->arch.emulation_flags & X86_EMU_LAPIC))
>  #define has_vhpet(d)   (!!((d)->arch.emulation_flags & X86_EMU_HPET))
> -#define has_vpm(d) (!!((d)->arch.emulation_flags & X86_EMU_PM))
> +#define has_vpm(d) (IS_ENABLED(CONFIG_X86_PMTIMER) && \
> +!!((d)->arch.emulation_flags & X86_EMU_PM))

Do you really need the IS_ENABLED() here?  If you modify
emulation_flags_ok() to reject the flag if not available it won't be
possible for any domain to have it set.

Thanks, Roger.

Re: [PATCH 1/3] x86/APIC: Switch flat driver to use phys dst for ext ints

2024-10-04 Thread Roger Pau Monné

On Fri, Oct 04, 2024 at 08:48:05AM +0200, Jan Beulich wrote:
> On 03.10.2024 12:51, Roger Pau Monné wrote:
> > On Wed, Oct 02, 2024 at 04:17:24PM +0100, Matthew Barnes wrote:
> > 
> > The commit needs a log, doesn't need to be extremely long, but it's
> > important to note the reasoning behind using physical delivery for
> > external interrupts vs logial mode.
> 
> Furthermore I question that the naming can remain as is - the driver
> is no longer uniformly "flat" then.

Yeah, that's done in a later patch.  I wouldn't mind if it was all
folded into the same patch TBH.

Thanks, Roger.

Re: [PATCH] x86/boot: Further simplify CR4 handling in dom0_construct_pv()

2024-10-04 Thread Roger Pau Monné

On Fri, Oct 04, 2024 at 08:52:52AM +0200, Jan Beulich wrote:
> On 03.10.2024 01:20, Andrew Cooper wrote:
> > The logic would be more robust disabling SMAP based on its precense in CR4,
> > rather than SMAP's accociation with a synthetic feature.
> 
> It's hard to tell what's more robust without knowing what future changes
> there might be. In particular ...
> 
> > @@ -1064,19 +1065,19 @@ int __init dom0_construct_pv(struct domain *d,
> >   * prevents us needing to write construct_dom0() in terms of
> >   * copy_{to,from}_user().
> >   */
> > -if ( boot_cpu_has(X86_FEATURE_XEN_SMAP) )
> > +if ( cr4 & X86_CR4_SMAP )
> 
> ... with this adjustment ...
> 
> >  {
> >  if ( IS_ENABLED(CONFIG_PV32) )
> >  cr4_pv32_mask &= ~X86_CR4_SMAP;
> 
> ... this update of a global no longer occurs. Playing games with CR4
> elsewhere might run into issues with this lack of updating.

Maybe we should assert the state of cr4 is as expected?

ASSERT(!boot_cpu_has(X86_FEATURE_XEN_SMAP) || (cr4 & X86_CR4_SMAP));

Thanks, Roger.

Re: [PATCH] x86/boot: Don't use INC to set defaults

2024-10-03 Thread Roger Pau Monné

On Thu, Oct 03, 2024 at 04:42:23PM +0100, Frediano Ziglio wrote:
> On Thu, Oct 3, 2024 at 3:58 PM Andrew Cooper  
> wrote:
> >
> > __efi64_mb2_start() makes some bold assumptions about the efi_platform and
> > skip_realmode booleans.  Set them to 1 explicitly, which is more robust.
> >
> > Make the comment a little more consice.
> >
> > No practical change.
> >
> > Signed-off-by: Andrew Cooper 
> > ---
> > CC: Jan Beulich 
> > CC: Roger Pau Monné 
> > CC: Marek Marczykowski-Górecki 
> > CC: Daniel P. Smith 
> > CC: Frediano Ziglio 
> > ---
> >  xen/arch/x86/boot/head.S | 10 --
> >  1 file changed, 4 insertions(+), 6 deletions(-)
> >
> > diff --git a/xen/arch/x86/boot/head.S b/xen/arch/x86/boot/head.S
> > index d1856d8012c9..af776c201a15 100644
> > --- a/xen/arch/x86/boot/head.S
> > +++ b/xen/arch/x86/boot/head.S
> > @@ -279,14 +279,12 @@ __efi64_mb2_start:
> >  pop %rbx
> >  pop %rax
> >
> > -/* We are on EFI platform and EFI boot services are available. */
> > -incbefi_platform(%rip)
> > -
> >  /*
> > - * Disable real mode and other legacy stuff which should not
> > - * be run on EFI platforms.
> > + * efi_multiboot2_prelude() is happy that we're on EFI platform.  
> > Skip
> > + * the BIOS initialisation path.
> >   */
> > -incbskip_realmode(%rip)
> > +movb$1, efi_platform(%rip)
> > +movb$1, skip_realmode(%rip)
> >
> >  /* Jump to trampoline_setup after switching CPU to x86_32 mode. */
> >  lea trampoline_setup(%rip),%r15
> >
> > base-commit: eb21ce14d709ef0c0030d0625028a4868c81126f
> 
> Reviewed-by: Frediano Ziglio 

Acked-by: Roger Pau Monné 

Thanks, Roger.

Re: [PATCH 2/3] x86/APIC: Remove unnecessary logical flat functions

2024-10-03 Thread Roger Pau Monné

On Wed, Oct 02, 2024 at 04:17:25PM +0100, Matthew Barnes wrote:
> Now that GENAPIC_FLAT uses physical destination for external interrupts,
> these functions implementing logical flat for external interrupts are no
> longer used.

I think it's fine to merge this with the previous commit, so that the
functions are removed at the point there are no longer used.

Thanks, Roger.

Re: [PATCH 1/3] x86/APIC: Switch flat driver to use phys dst for ext ints

2024-10-03 Thread Roger Pau Monné

On Wed, Oct 02, 2024 at 04:17:24PM +0100, Matthew Barnes wrote:

The commit needs a log, doesn't need to be extremely long, but it's
important to note the reasoning behind using physical delivery for
external interrupts vs logial mode.

Take a look at the x2APIC one for inspiration, the motivation should
be similar.

Thanks, Roger.

Re: [PATCH 0/3] Switch flat driver to use phys dst for ext ints

2024-10-03 Thread Roger Pau Monné

On Wed, Oct 02, 2024 at 06:45:51PM +0100, Andrew Cooper wrote:
> On 02/10/2024 4:17 pm, Matthew Barnes wrote:
> > This patch series switches the apic_default APIC driver from using
> > logical flat destination mode for external interrupts, to using
> > physical destination mode for external interrupts.
> >
> > This is followed up by two non-functional cleanup commits.
> >
> > Matthew Barnes (3):
> >   x86/APIC: Switch flat driver to use phys dst for ext ints
> >   x86/APIC: Remove unnecessary logical flat functions
> >   x86/APIC: Refactor GENAPIC_FLAT -> GENAPIC_MIXED
> 
> Patches 1 and 2 look fine.
> 
> For patch 3, can't we just delete the macro and expand it in it's single
> location?
> 
> It's a bigger patch, but a better improvement in genapic.h

I agree, but if we go that route we might as well do the conversion of
GENAPIC_PHYS in the same patch, and deal with both at the same time
(GENAPIC_PHYS is also used in a single place).

Thanks, Roger.

Re: [PATCH] iommu/amd-vi: make IOMMU list ro after init

2024-09-30 Thread Roger Pau Monné

On Mon, Sep 30, 2024 at 12:33:56PM +0200, Jan Beulich wrote:
> On 30.09.2024 12:28, Roger Pau Monne wrote:
> > The only functions to modify the list, amd_iommu_detect_one_acpi() and
> > amd_iommu_init_cleanup(), are already init.
> > 
> > Signed-off-by: Roger Pau Monné 
> 
> Acked-by: Jan Beulich 
> 
> Surely the same can be said for VT-d's acpi_*_units? And likely other
> globals there and here?

Possibly, I wasn't explicitly looking for stuff in IOMMU code to
convert from read_mostly to ro_after_init, just came across this one
while looking at something in the area.

Thanks, Roger.

Re: [PATCH 22/22] x86/mm: zero stack on stack switch or reset

2024-09-27 Thread Roger Pau Monné

On Tue, Aug 13, 2024 at 03:16:42PM +0200, Jan Beulich wrote:
> On 26.07.2024 17:22, Roger Pau Monne wrote:
> > With the stack mapped on a per-CPU basis there's no risk of other CPUs being
> > able to read the stack contents, but vCPUs running on the current pCPU could
> > read stack rubble from operations of previous vCPUs.
> > 
> > The #DF stack is not zeroed because handling of #DF results in a panic.
> > 
> > Signed-off-by: Roger Pau Monné 
> > ---
> >  xen/arch/x86/include/asm/current.h | 30 +-
> >  1 file changed, 29 insertions(+), 1 deletion(-)
> > 
> > diff --git a/xen/arch/x86/include/asm/current.h 
> > b/xen/arch/x86/include/asm/current.h
> > index 75b9a341f814..02b4118b03ef 100644
> > --- a/xen/arch/x86/include/asm/current.h
> > +++ b/xen/arch/x86/include/asm/current.h
> > @@ -177,6 +177,14 @@ unsigned long get_stack_dump_bottom (unsigned long sp);
> >  # define SHADOW_STACK_WORK ""
> >  #endif
> >  
> > +#define ZERO_STACK  \
> > +"test %[stk_size], %[stk_size];"\
> > +"jz .L_skip_zeroing.%=;"\
> > +"std;"  \
> > +"rep stosb;"\
> > +"cld;"  \
> 
> Is ERMS actually helping with backwards copies? I didn't think so, and hence
> it may be that REP STOSQ might be more efficient here?

Possibly, Intel optimization guide says:

"However, setting the DF to force REP MOVSB to copy bytes from high
towards low addresses will experience significant performance
degradation."

I will see what I can do.

Thanks, Roger.

Re: [PATCH 16/22] x86/mm: introduce a per-CPU L3 table for the per-domain slot

2024-09-27 Thread Roger Pau Monné

On Fri, Aug 16, 2024 at 07:40:40PM +0100, Alejandro Vallejo wrote:
> On Fri Jul 26, 2024 at 4:22 PM BST, Roger Pau Monne wrote:
> > So far L4 slot 260 has always been per-domain, in other words: all vCPUs of 
> > a
> > domain share the same L3 entry.  Currently only 3 slots are used in that L3
> > table, which leaves plenty of room.
> >
> > Introduce a per-CPU L3 that's used the the domain has Address Space 
> > Isolation
> > enabled.  Such per-CPU L3 gets currently populated using the same L3 entries
> > present on the per-domain L3 (d->arch.perdomain_l3_pg).
> >
> > No functional change expected, as the per-CPU L3 is always a copy of the
> > contents of d->arch.perdomain_l3_pg.
> >
> > Note that all the per-domain L3 entries are populated at domain create, and
> > hence there's no need to sync the state of the per-CPU L3 as the domain 
> > won't
> > yet be running when the L3 is modified.
> >
> > Signed-off-by: Roger Pau Monné 
> 
> Still scratching my head with the details on this, but in general I'm utterly
> confused whenever I read per-CPU in the series because it's not obvious which
> CPU (p or v) I should be thinking about. A general change that would help a 
> lot
> is to replace every instance of per-CPU with per-vCPU or per-pCPU as needed.

per-CPU is always per-pCPU, as CPU without any prefix should always
refer to a physical CPU.  I think it's only recently that we have
started using pCPU vs vCPU, in the past it always was CPU vs vCPU.

I will attempt to be better at explicitly using pCPU instead of CPU in
the commit messages, sorry.

Thanks, Roger.

Re: [PATCH v4 1/6] xen: introduce SIMPLE_DECL_SECTION

2024-09-27 Thread Roger Pau Monné

On Fri, Sep 27, 2024 at 11:07:58AM +0200, oleksii.kuroc...@gmail.com wrote:
> On Fri, 2024-09-27 at 09:58 +0200, Roger Pau Monné wrote:
> > On Thu, Sep 26, 2024 at 06:54:20PM +0200, Oleksii Kurochko wrote:
> > > Introduce SIMPLE_DECL_SECTION to cover the case when
> > > an architecture wants to declare a section without specifying
> > > of load address for the section.
> > > 
> > > Update x86/xen.lds.S to use SIMPLE_DECL_SECTION.
> > 
> > No strong opinion, but I feel SIMPLE is not very descriptive.  It
> > might be better to do it the other way around: introduce a define for
> > when the DECL_SECTION macro should specify a load address:
> > DECL_SECTION_WITH_LADDR for example.
> In the next patch, two sections are introduced: dt_dev_info and
> acpi_dev_info. The definition of these sections has been made common
> and moved to xen.lds.h, and it looks like this:
>+#define DT_DEV_INFO(secname)\
>+  . = ALIGN(POINTER_ALIGN); \
>+  DECL_SECTION(secname) {   \
>+   _sdevice = .;   \
>+   *(secname)  \
>+   _edevice = .;   \
>+  } :text
> (A similar approach is used for ACPI, please refer to the next patch in
> this series.)
> 
> For PPC, DECL_SECTION should specify a load address, whereas for Arm
> and RISC-V, it should not.
> 
> With this generalization, the name of DECL_SECTION should have the same
> name in both cases, whether a load address needs to be specified or not

Oh, sorry, I think you misunderstood my suggestion.

I'm not suggesting to introduce a new macro named
DECL_SECTION_WITH_LADDR(), but rather to use DECL_SECTION_WITH_LADDR
instead of SIMPLE_DECL_SECTION in order to signal whether
DECL_SECTION() should specify a load address or not, iow:

#ifdef DECL_SECTION_WITH_LADDR
# define DECL_SECTION(x) x : AT(ADDR(x) - __XEN_VIRT_START)
#else
# define DECL_SECTION(x) x :
#endif

Thanks, Roger.

Re: [PATCH 15/22] x86/idle: allow using a per-pCPU L4

2024-09-27 Thread Roger Pau Monné

On Wed, Aug 21, 2024 at 05:42:26PM +0100, Alejandro Vallejo wrote:
> On Fri Jul 26, 2024 at 4:21 PM BST, Roger Pau Monne wrote:
> > diff --git a/xen/arch/x86/domain.c b/xen/arch/x86/domain.c
> > index 9cfcf0dc63f3..b62c4311da6c 100644
> > --- a/xen/arch/x86/domain.c
> > +++ b/xen/arch/x86/domain.c
> > @@ -555,6 +555,7 @@ void arch_vcpu_regs_init(struct vcpu *v)
> >  int arch_vcpu_create(struct vcpu *v)
> >  {
> >  struct domain *d = v->domain;
> > +root_pgentry_t *pgt = NULL;
> >  int rc;
> >  
> >  v->arch.flags = TF_kernel_mode;
> > @@ -589,7 +590,23 @@ int arch_vcpu_create(struct vcpu *v)
> >  else
> >  {
> >  /* Idle domain */
> > -v->arch.cr3 = __pa(idle_pg_table);
> > +if ( (opt_asi_pv || opt_asi_hvm) && v->vcpu_id )
> > +{
> > +pgt = alloc_xenheap_page();
> > +
> > +/*
> > + * For the idle vCPU 0 (the BSP idle vCPU) use idle_pg_table
> > + * directly, there's no need to create yet another copy.
> > + */
> 
> Shouldn't this comment be in the else branch instead? Or reworded to refer to
> non-0 vCPUs.

Sure, moved to the else branch.

> > +rc = -ENOMEM;
> 
> While it's true rc is overriden later, I feel uneasy leaving it with -ENOMEM
> after the check. Could we have it immediately before "goto fail"?

I have to admit I found this coding style weird at first, but it's
used all over Xen.  I don't mind setting rc ahead of the goto, AFAICT
the only benefit of the current style is that we can avoid the braces
around the if code block for it being a single statement.

> > +if ( !pgt )
> > +goto fail;
> > +
> > +copy_page(pgt, idle_pg_table);
> > +v->arch.cr3 = __pa(pgt);
> > +}
> > +else
> > +v->arch.cr3 = __pa(idle_pg_table);
> >  rc = 0;
> >  v->arch.msrs = ZERO_BLOCK_PTR; /* Catch stray misuses */
> >  }
> > @@ -611,6 +628,7 @@ int arch_vcpu_create(struct vcpu *v)
> >  vcpu_destroy_fpu(v);
> >  xfree(v->arch.msrs);
> >  v->arch.msrs = NULL;
> > +free_xenheap_page(pgt);
> >  
> >  return rc;
> >  }
> 
> I guess the idle domain has a forever lifetime and its vCPUs are kept around
> forever too, right?; otherwise we'd need extra logic in the the vcpu_destroy()
> to free the page table copies should they exist too.

Indeed, vcpus are only destroyed when destroying domains, and system
domains are never destroyed.

Thanks, Roger.

Re: [PATCH v4 1/6] xen: introduce SIMPLE_DECL_SECTION

2024-09-27 Thread Roger Pau Monné

On Thu, Sep 26, 2024 at 06:54:20PM +0200, Oleksii Kurochko wrote:
> Introduce SIMPLE_DECL_SECTION to cover the case when
> an architecture wants to declare a section without specifying
> of load address for the section.
> 
> Update x86/xen.lds.S to use SIMPLE_DECL_SECTION.

No strong opinion, but I feel SIMPLE is not very descriptive.  It
might be better to do it the other way around: introduce a define for
when the DECL_SECTION macro should specify a load address:
DECL_SECTION_WITH_LADDR for example.

> 
> Signed-off-by: Oleksii Kurochko 
> ---
> Changes in V4:
>  - new patch
> ---
>  xen/arch/x86/xen.lds.S| 6 --
>  xen/include/xen/xen.lds.h | 6 ++
>  2 files changed, 10 insertions(+), 2 deletions(-)
> 
> diff --git a/xen/arch/x86/xen.lds.S b/xen/arch/x86/xen.lds.S
> index b60d2f0d82..9275a566e1 100644
> --- a/xen/arch/x86/xen.lds.S
> +++ b/xen/arch/x86/xen.lds.S
> @@ -3,6 +3,10 @@
>  
>  #include 
>  #include 
> +
> +#ifdef EFI
> +#define SIMPLE_DECL_SECTION
> +#endif

A nit, but we have been trying to add some indentation to make the
ifdef blocks easier to read, so this would become:

#ifdef EFI
# define SIMPLE_DECL_SECTION
#endif

If it's not too much fuzz to adjust here and below.

Thanks, Roger.

Re: [PATCH v3 3/5] xen/livepatch: do Xen build-id check earlier

2024-09-26 Thread Roger Pau Monné

On Thu, Sep 26, 2024 at 05:11:19PM +0100, Ross Lagerwall wrote:
> On Thu, Sep 26, 2024 at 11:21 AM Roger Pau Monne  wrote:
> >
> > The check against the expected Xen build ID should be done ahead of 
> > attempting
> > to apply the alternatives contained in the livepatch.
> >
> > If the CPUID in the alternatives patching data is out of the scope of the
> > running Xen featureset the BUG() in _apply_alternatives() will trigger thus
> > bringing the system down.  Note the layout of struct alt_instr could also
> > change between versions.  It's also possible for struct 
> > exception_table_entry
> > to have changed format, hence leading to other kind of errors if parsing of 
> > the
> > payload is done ahead of checking if the Xen build-id matches.
> >
> > Move the Xen build ID check as early as possible.  To do so introduce a new
> > check_xen_buildid() function that parses and checks the Xen build-id before
> > moving the payload.  Since the expected Xen build-id is used early to
> > detect whether the livepatch payload could be loaded, there's no reason to
> > store it in the payload struct, as a non-matching Xen build-id won't get the
> > payload populated in the first place.
> >
> > Note printing the expected Xen build ID has part of dumping the payload
> > information is no longer done: all loaded payloads would have Xen build IDs
> > matching the running Xen, otherwise they would have failed to load.
> >
> > Fixes: 879615f5db1d ('livepatch: Always check hypervisor build ID upon 
> > livepatch upload')
> > Signed-off-by: Roger Pau Monné 
> 
> Reviewed-by: Ross Lagerwall 
> 
> Should the ELF_LIVEPATCH_XEN_DEPENDS check also be dropped from
> check_special_sections() since it is no longer necessary?

It's dropped from check_special_sections() in this patch, just not
mentioned in the commit message I'm afraid.

Thanks, Roger.

Re: [XEN PATCH 0/2] blkif: Fix discard req align requirement and various typos

2024-09-26 Thread Roger Pau Monné

On Thu, Sep 26, 2024 at 12:53:49PM +, Anthony PERARD wrote:
> Patch series available in this git branch:
> https://xenbits.xen.org/git-http/people/aperard/xen-unstable.git 
> br.blkif-wording-fix-v1

That was fast, thanks:

Reviewed-by: Roger Pau Monné 

Regards, Roger.

Re: [PATCH v3 4/5] x86/alternatives: do not BUG during apply

2024-09-26 Thread Roger Pau Monné

On Thu, Sep 26, 2024 at 12:09:05PM +0100, Andrew Cooper wrote:
> On 26/09/2024 11:14 am, Roger Pau Monne wrote:
> > diff --git a/xen/arch/x86/alternative.c b/xen/arch/x86/alternative.c
> > index 7824053c9d33..990b7c600932 100644
> > --- a/xen/arch/x86/alternative.c
> > +++ b/xen/arch/x86/alternative.c
> > @@ -198,9 +198,29 @@ static void init_or_livepatch 
> > _apply_alternatives(struct alt_instr *start,
> >  uint8_t buf[MAX_PATCH_LEN];
> >  unsigned int total_len = a->orig_len + a->pad_len;
> >  
> > -BUG_ON(a->repl_len > total_len);
> > -BUG_ON(total_len > sizeof(buf));
> > -BUG_ON(a->cpuid >= NCAPINTS * 32);
> > +if ( a->repl_len > total_len )
> > +{
> > +printk(XENLOG_ERR
> > +   "alt for %ps, replacement size %#x larger than origin 
> > %#x\n",
> 
> "Alt" I think.  It's both an abbreviation, and the start of the sentence
> here.

I'm always unsure whether to use uppercase at the beginning of log
messages, because those are preceded by the (XEN) prefix.

> Can fix on commit.
> 
> Reviewed-by: Andrew Cooper 

Thanks, Roger.

Re: [PATCH v3 1/5] xen/livepatch: drop load_addr Elf section field

2024-09-26 Thread Roger Pau Monné

On Thu, Sep 26, 2024 at 12:04:06PM +0100, Andrew Cooper wrote:
> On 26/09/2024 11:14 am, Roger Pau Monne wrote:
> > The Elf loading logic will initially use the `data` section field to stash a
> > pointer to the temporary loaded data (from the buffer allocated in
> > livepatch_upload(), which is later relocated and the new pointer stashed in
> > `load_addr`.
> >
> > Remove this dual field usage and use an `addr` uniformly.  Initially data 
> > will
> > point to the temporary buffer, until relocation happens, at which point the
> > pointer will be updated to the relocated address.
> >
> > This avoids leaking a dangling pointer in the `data` field once the 
> > temporary
> > buffer is freed by livepatch_upload().
> >
> > Note the `addr` field cannot retain the const attribute from the previous
> > `data`field, as there's logic that performs manipulations against the loaded
> > sections, like applying relocations or sorting the exception table.
> >
> > No functional change intended.
> >
> > Signed-off-by: Roger Pau Monné 
> 
> Reviewed-by: Andrew Cooper 
> 
> > diff --git a/xen/common/livepatch.c b/xen/common/livepatch.c
> > index df41dcce970a..7e6bf58f4408 100644
> > --- a/xen/common/livepatch.c
> > +++ b/xen/common/livepatch.c
> > @@ -371,18 +371,21 @@ static int move_payload(struct payload *payload, 
> > struct livepatch_elf *elf)
> >  
> >  ASSERT(offset[i] != UINT_MAX);
> >  
> > -elf->sec[i].load_addr = buf + offset[i];
> > +buf += offset[i];
> >  
> >  /* Don't copy NOBITS - such as BSS. */
> >  if ( elf->sec[i].sec->sh_type != SHT_NOBITS )
> >  {
> > -memcpy(elf->sec[i].load_addr, elf->sec[i].data,
> > +memcpy(buf, elf->sec[i].addr,
> > elf->sec[i].sec->sh_size);
> >  dprintk(XENLOG_DEBUG, LIVEPATCH "%s: Loaded %s at %p\n",
> > -elf->name, elf->sec[i].name, 
> > elf->sec[i].load_addr);
> > +elf->name, elf->sec[i].name, buf);
> >  }
> >  else
> > -memset(elf->sec[i].load_addr, 0, elf->sec[i].sec->sh_size);
> > +memset(buf, 0, elf->sec[i].sec->sh_size);
> > +
> > +/* Replace the temporary buffer with the relocated one. */
> > +elf->sec[i].addr = buf;
> 
> I'd suggest /* Update sec[] to refer to its final location. */
> 
> Replace is technically the memcpy() above, and "relocate" means
> something else in ELF terms.

Sure, no strong opinion.

> Can fix on commit.

Thanks!

Re: [PATCH v3] blkif: reconcile protocol specification with in-use implementations

2024-09-26 Thread Roger Pau Monné

On Thu, Sep 26, 2024 at 09:46:43AM +, Anthony PERARD wrote:
> On Thu, Sep 12, 2024 at 11:57:29AM +0200, Roger Pau Monne wrote:
> >  /*
> >   * Cast to this structure when blkif_request.operation == BLKIF_OP_DISCARD
> >   * sizeof(struct blkif_request_discard) <= sizeof(struct blkif_request)
> > + *
> > + * The 'sector_number' field is in units of 512b, despite the value of the
> > + * 'sector-size' xenstore node.  Note however that the offset in
> > + * 'sector_number' must be aligned to 'sector-size'.
> 
> For discard request, there's "discard-granularity", and I think
> `sector_number` should be aligned to it. See "discard-granularity" and
> note 4.

Indeed, the wording here would be better as:

"Note however that the offset in 'sector_number' must be aligned to
'sector-size' or 'discard-alignment' if present."

Would you mind sending a patch to fix this?

Thanks, Roger.

Re: [PATCH v2 2/6] xen/livepatch: zero pointer to temporary load buffer

2024-09-26 Thread Roger Pau Monné

On Thu, Sep 26, 2024 at 09:24:56AM +0100, Andrew Cooper wrote:
> On 26/09/2024 9:22 am, Roger Pau Monné wrote:
> > On Wed, Sep 25, 2024 at 07:28:42PM +0100, Andrew Cooper wrote:
> >> On 25/09/2024 11:00 am, Roger Pau Monné wrote:
> >>> On Wed, Sep 25, 2024 at 10:33:39AM +0100, Andrew Cooper wrote:
> >>>> On 25/09/2024 9:42 am, Roger Pau Monne wrote:
> >>>>> The livepatch_elf_sec data field points to the temporary load buffer, 
> >>>>> it's the
> >>>>> load_addr field that points to the stable loaded section data.  Zero 
> >>>>> the data
> >>>>> field once load_addr is set, as it would otherwise become a dangling 
> >>>>> pointer
> >>>>> once the load buffer is freed.
> >>>>>
> >>>>> No functional change intended.
> >>>>>
> >>>>> Signed-off-by: Roger Pau Monné 
> >>>>> ---
> >>>>> Changes since v1:
> >>>>>  - New in this version.
> >>>>> ---
> >>>>>  xen/common/livepatch.c | 3 +++
> >>>>>  1 file changed, 3 insertions(+)
> >>>>>
> >>>>> diff --git a/xen/common/livepatch.c b/xen/common/livepatch.c
> >>>>> index df41dcce970a..87b3db03e26d 100644
> >>>>> --- a/xen/common/livepatch.c
> >>>>> +++ b/xen/common/livepatch.c
> >>>>> @@ -383,6 +383,9 @@ static int move_payload(struct payload *payload, 
> >>>>> struct livepatch_elf *elf)
> >>>>>  }
> >>>>>  else
> >>>>>  memset(elf->sec[i].load_addr, 0, 
> >>>>> elf->sec[i].sec->sh_size);
> >>>>> +
> >>>>> +/* Avoid leaking pointers to temporary load buffers. */
> >>>>> +elf->sec[i].data = NULL;
> >>>>>  }
> >>>>>  }
> >>>>>  
> >>>> Where is the data allocated and freed?
> >>>>
> >>>> I don't see it being freed in this loop, so how is freed subsequently?
> >>> It's allocated and freed by livepatch_upload(), it's the raw_data
> >>> buffer that's allocated in the context of that function.
> >> Well, this is a mess isn't it.
> >>
> >> Ok, so livepatch_upload() makes a temporary buffer to copy everything into.
> >>
> >> In elf_resolve_sections(), we set up sec[i].data pointing into this
> >> temporary buffer.
> >>
> >> And here, we copy the data from the temporary buffer, into the final
> >> destination in the Xen .text/data/rodata region.
> >>
> >> So yes, this does end up being a dangling pointer, and clobbering it is
> >> good.
> >>
> >>
> >> But, seeing the `n = sec->load_addr ?: sec->data` in patch 4, wouldn't
> >> it be better to drop this second pointer and just have move_payload()
> >> update it here?
> >>
> >> I can't see anything good which can come from having two addresses, nor
> >> a reason why we'd need both.
> >>
> >> Then again, if this is too hard to arrange, it probably can be left as
> >> an exercise to a future developer.
> > So this is how it ends up looking,  there's a bit of churn due to
> > having to drop const on some function parameters.
> 
> Looks fine to me.
> 
> I'd be tempted to name the final field 'addr' because for most of its
> life it is the load address.

I've changed to `addr`.  I however feel it's kind of redundant to name
a pointer field `addr`, as by the type itself (being a pointer) it's
clear it's an address.

> For the comment on the field, I'd say "this is first a temporary buffer,
> then later the load address".

Adjusted.

Thanks, Roger.

Re: [PATCH v2 2/6] xen/livepatch: zero pointer to temporary load buffer

2024-09-26 Thread Roger Pau Monné

On Wed, Sep 25, 2024 at 07:28:42PM +0100, Andrew Cooper wrote:
> On 25/09/2024 11:00 am, Roger Pau Monné wrote:
> > On Wed, Sep 25, 2024 at 10:33:39AM +0100, Andrew Cooper wrote:
> >> On 25/09/2024 9:42 am, Roger Pau Monne wrote:
> >>> The livepatch_elf_sec data field points to the temporary load buffer, 
> >>> it's the
> >>> load_addr field that points to the stable loaded section data.  Zero the 
> >>> data
> >>> field once load_addr is set, as it would otherwise become a dangling 
> >>> pointer
> >>> once the load buffer is freed.
> >>>
> >>> No functional change intended.
> >>>
> >>> Signed-off-by: Roger Pau Monné 
> >>> ---
> >>> Changes since v1:
> >>>  - New in this version.
> >>> ---
> >>>  xen/common/livepatch.c | 3 +++
> >>>  1 file changed, 3 insertions(+)
> >>>
> >>> diff --git a/xen/common/livepatch.c b/xen/common/livepatch.c
> >>> index df41dcce970a..87b3db03e26d 100644
> >>> --- a/xen/common/livepatch.c
> >>> +++ b/xen/common/livepatch.c
> >>> @@ -383,6 +383,9 @@ static int move_payload(struct payload *payload, 
> >>> struct livepatch_elf *elf)
> >>>  }
> >>>  else
> >>>  memset(elf->sec[i].load_addr, 0, 
> >>> elf->sec[i].sec->sh_size);
> >>> +
> >>> +/* Avoid leaking pointers to temporary load buffers. */
> >>> +elf->sec[i].data = NULL;
> >>>  }
> >>>  }
> >>>  
> >> Where is the data allocated and freed?
> >>
> >> I don't see it being freed in this loop, so how is freed subsequently?
> > It's allocated and freed by livepatch_upload(), it's the raw_data
> > buffer that's allocated in the context of that function.
> 
> Well, this is a mess isn't it.
> 
> Ok, so livepatch_upload() makes a temporary buffer to copy everything into.
> 
> In elf_resolve_sections(), we set up sec[i].data pointing into this
> temporary buffer.
> 
> And here, we copy the data from the temporary buffer, into the final
> destination in the Xen .text/data/rodata region.
> 
> So yes, this does end up being a dangling pointer, and clobbering it is
> good.
> 
> 
> But, seeing the `n = sec->load_addr ?: sec->data` in patch 4, wouldn't
> it be better to drop this second pointer and just have move_payload()
> update it here?
> 
> I can't see anything good which can come from having two addresses, nor
> a reason why we'd need both.
> 
> Then again, if this is too hard to arrange, it probably can be left as
> an exercise to a future developer.

So this is how it ends up looking,  there's a bit of churn due to
having to drop const on some function parameters.

diff --git a/xen/arch/arm/arm32/livepatch.c b/xen/arch/arm/arm32/livepatch.c
index d50066564666..d1a89dde6ea3 100644
--- a/xen/arch/arm/arm32/livepatch.c
+++ b/xen/arch/arm/arm32/livepatch.c
@@ -243,7 +243,7 @@ int arch_livepatch_perform(struct livepatch_elf *elf,
 
 symndx = ELF32_R_SYM(r_a->r_info);
 type = ELF32_R_TYPE(r_a->r_info);
-dest = base->load_addr + r_a->r_offset; /* P */
+dest = base->data + r_a->r_offset; /* P */
 addend = r_a->r_addend;
 }
 else
@@ -252,7 +252,7 @@ int arch_livepatch_perform(struct livepatch_elf *elf,
 
 symndx = ELF32_R_SYM(r->r_info);
 type = ELF32_R_TYPE(r->r_info);
-dest = base->load_addr + r->r_offset; /* P */
+dest = base->data + r->r_offset; /* P */
 addend = get_addend(type, dest);
 }
 
diff --git a/xen/arch/arm/arm64/livepatch.c b/xen/arch/arm/arm64/livepatch.c
index dfb72be44fb8..46a390da42a3 100644
--- a/xen/arch/arm/arm64/livepatch.c
+++ b/xen/arch/arm/arm64/livepatch.c
@@ -248,7 +248,7 @@ int arch_livepatch_perform_rela(struct livepatch_elf *elf,
 {
 const Elf_RelA *r = rela->data + i * rela->sec->sh_entsize;
 unsigned int symndx = ELF64_R_SYM(r->r_info);
-void *dest = base->load_addr + r->r_offset; /* P */
+void *dest = base->data + r->r_offset; /* P */
 bool overflow_check = true;
 int ovf = 0;
 uint64_t val;
diff --git a/xen/arch/x86/livepatch.c b/xen/arch/x86/livepatch.c
index 4f76127e1f77..89a62b33cd9c 100644
--- a/xen/arch/x86/livepatch.c
+++ b/xen/arch/x86/livepatch.c
@@ -260,7 +260,7 @@ int arch_livepatch_perform_rela(struct livepatch_elf *elf,
 {
 con

Re: [PATCH v2 2/6] xen/livepatch: zero pointer to temporary load buffer

2024-09-26 Thread Roger Pau Monné

On Wed, Sep 25, 2024 at 07:28:42PM +0100, Andrew Cooper wrote:
> On 25/09/2024 11:00 am, Roger Pau Monné wrote:
> > On Wed, Sep 25, 2024 at 10:33:39AM +0100, Andrew Cooper wrote:
> >> On 25/09/2024 9:42 am, Roger Pau Monne wrote:
> >>> The livepatch_elf_sec data field points to the temporary load buffer, 
> >>> it's the
> >>> load_addr field that points to the stable loaded section data.  Zero the 
> >>> data
> >>> field once load_addr is set, as it would otherwise become a dangling 
> >>> pointer
> >>> once the load buffer is freed.
> >>>
> >>> No functional change intended.
> >>>
> >>> Signed-off-by: Roger Pau Monné 
> >>> ---
> >>> Changes since v1:
> >>>  - New in this version.
> >>> ---
> >>>  xen/common/livepatch.c | 3 +++
> >>>  1 file changed, 3 insertions(+)
> >>>
> >>> diff --git a/xen/common/livepatch.c b/xen/common/livepatch.c
> >>> index df41dcce970a..87b3db03e26d 100644
> >>> --- a/xen/common/livepatch.c
> >>> +++ b/xen/common/livepatch.c
> >>> @@ -383,6 +383,9 @@ static int move_payload(struct payload *payload, 
> >>> struct livepatch_elf *elf)
> >>>  }
> >>>  else
> >>>  memset(elf->sec[i].load_addr, 0, 
> >>> elf->sec[i].sec->sh_size);
> >>> +
> >>> +/* Avoid leaking pointers to temporary load buffers. */
> >>> +elf->sec[i].data = NULL;
> >>>  }
> >>>  }
> >>>  
> >> Where is the data allocated and freed?
> >>
> >> I don't see it being freed in this loop, so how is freed subsequently?
> > It's allocated and freed by livepatch_upload(), it's the raw_data
> > buffer that's allocated in the context of that function.
> 
> Well, this is a mess isn't it.
> 
> Ok, so livepatch_upload() makes a temporary buffer to copy everything into.
> 
> In elf_resolve_sections(), we set up sec[i].data pointing into this
> temporary buffer.
> 
> And here, we copy the data from the temporary buffer, into the final
> destination in the Xen .text/data/rodata region.
> 
> So yes, this does end up being a dangling pointer, and clobbering it is
> good.
> 
> 
> But, seeing the `n = sec->load_addr ?: sec->data` in patch 4, wouldn't
> it be better to drop this second pointer and just have move_payload()
> update it here?

I didn't specially like the usage of either load_addr or data in patch
4.

I see, so move_payload() would replace ->data with the relocated
pointer.  One slightly nice thing about the current arrangement is
that data is const, with that change it should become non-const, as we
do modify the contents of load_addr in some case (like sorting the
exception table).  I don't think the constness of data was that
useful, as it's just the temporary buffer.

> I can't see anything good which can come from having two addresses, nor
> a reason why we'd need both.
> 
> Then again, if this is too hard to arrange, it probably can be left as
> an exercise to a future developer.

I can see if it's mostly a trivial change.

Thanks, Roger.

Re: [PATCH 13/22] x86/hvm: use a per-pCPU monitor table in HAP mode

2024-09-25 Thread Roger Pau Monné

On Fri, Aug 16, 2024 at 07:02:54PM +0100, Alejandro Vallejo wrote:
> On Fri Jul 26, 2024 at 4:21 PM BST, Roger Pau Monne wrote:
> > Instead of allocating a monitor table for each vCPU when running in HVM HAP
> > mode, use a per-pCPU monitor table, which gets the per-domain slot updated 
> > on
> > guest context switch.
> >
> > This limits the amount of memory used for HVM HAP monitor tables to the 
> > amount
> > of active pCPUs, rather than to the number of vCPUs.  It also simplifies 
> > vCPU
> > allocation and teardown, since the monitor table handling is removed from
> > there.
> >
> > Note the switch to using a per-CPU monitor table is done regardless of 
> > whether
> 
> s/per-CPU/per-pCPU/

Sorry, I might not has been as consistent as I wanted with using pCPU
everywhere.

> > Address Space Isolation is enabled or not.  Partly for the memory usage
> > reduction, and also because it allows to simplify the VM tear down path by 
> > not
> > having to cleanup the per-vCPU monitor tables.
> >
> > Signed-off-by: Roger Pau Monné 
> > ---
> > Note the monitor table is not made static because uses outside of the file
> > where it's defined will be added by further patches.
> > ---
> >  xen/arch/x86/hvm/hvm.c | 60 
> >  xen/arch/x86/hvm/svm/svm.c |  5 ++
> >  xen/arch/x86/hvm/vmx/vmcs.c|  1 +
> >  xen/arch/x86/hvm/vmx/vmx.c |  4 ++
> >  xen/arch/x86/include/asm/hap.h |  1 -
> >  xen/arch/x86/include/asm/hvm/hvm.h |  8 
> >  xen/arch/x86/mm.c  |  8 
> >  xen/arch/x86/mm/hap/hap.c  | 75 --
> >  xen/arch/x86/mm/paging.c   |  4 +-
> >  9 files changed, 87 insertions(+), 79 deletions(-)
> >
> > diff --git a/xen/arch/x86/hvm/hvm.c b/xen/arch/x86/hvm/hvm.c
> > index 7f4b627b1f5f..3f771bc65677 100644
> > --- a/xen/arch/x86/hvm/hvm.c
> > +++ b/xen/arch/x86/hvm/hvm.c
> > @@ -104,6 +104,54 @@ static const char __initconst warning_hvm_fep[] =
> >  static bool __initdata opt_altp2m_enabled;
> >  boolean_param("altp2m", opt_altp2m_enabled);
> >  
> > +DEFINE_PER_CPU(root_pgentry_t *, monitor_pgt);
> > +
> > +static int allocate_cpu_monitor_table(unsigned int cpu)
> 
> To avoid ambiguity, could we call these *_pcpu_*() instead?

As replied by Jan, plain 'cpu' is physical CPU on hypervisor code
function names usually.  '_pcpu_' here would IMO imply per-CPU, which
it also is, but likely doesn't need spelling in the function name.

> > +{
> > +root_pgentry_t *pgt = alloc_xenheap_page();
> > +
> > +if ( !pgt )
> > +return -ENOMEM;
> > +
> > +clear_page(pgt);
> > +
> > +init_xen_l4_slots(pgt, _mfn(virt_to_mfn(pgt)), INVALID_MFN, NULL,
> > +  false, true, false);
> > +
> > +ASSERT(!per_cpu(monitor_pgt, cpu));
> > +per_cpu(monitor_pgt, cpu) = pgt;
> > +
> > +return 0;
> > +}
> > +
> > +static void free_cpu_monitor_table(unsigned int cpu)
> > +{
> > +root_pgentry_t *pgt = per_cpu(monitor_pgt, cpu);
> > +
> > +if ( !pgt )
> > +return;
> > +
> > +per_cpu(monitor_pgt, cpu) = NULL;
> > +free_xenheap_page(pgt);
> > +}
> > +
> > +void hvm_set_cpu_monitor_table(struct vcpu *v)
> > +{
> > +root_pgentry_t *pgt = this_cpu(monitor_pgt);
> > +
> > +ASSERT(pgt);
> > +
> > +setup_perdomain_slot(v, pgt);
> 
> Why not modify them as part of write_ptbase() instead? As it stands, it 
> appears
> to be modifying the PTEs of what may very well be our current PT, which makes
> the perdomain slot be in a $DEITY-knows-what state until the next flush
> (presumably the write to cr3 in write_ptbase()?; assuming no PCIDs).
> 
> Setting the slot up right before the cr3 change should reduce the potential 
> for
> misuse.

The reasoning for doing it here it that the per-domain slot only needs
setting on context switch.  In the PV case write_ptbase() will be
called each time the guest switches %cr3, but setting the per-domain
slot is not required for each call if the vCPU hasn't changed.

Let me see if I can arrange for the current contents of
setup_perdomain_slot() to be merged into write_ptbase(). Note
setup_perdomain_slot() started as a wrapper to extract XPTI specific
code from paravirt_ctxt_switch_to().

> > +
> > +make_cr3(v, _mfn(virt_to_mfn(pgt)));
> > +}
> > +
> > +void hvm_clear_cpu_monitor_table(struct vcpu *v)
> > +{
>

Re: [PATCH 12/22] x86/spec-ctrl: introduce Address Space Isolation command line option

2024-09-25 Thread Roger Pau Monné

On Wed, Sep 25, 2024 at 04:03:04PM +0200, Jan Beulich wrote:
> On 25.09.2024 15:31, Roger Pau Monné wrote:
> > On Wed, Aug 14, 2024 at 12:10:56PM +0200, Jan Beulich wrote:
> >> On 26.07.2024 17:21, Roger Pau Monne wrote:
> >>> --- a/docs/misc/xen-command-line.pandoc
> >>> +++ b/docs/misc/xen-command-line.pandoc
> >>> @@ -2387,7 +2387,7 @@ By default SSBD will be mitigated at runtime (i.e 
> >>> `ssbd=runtime`).
> >>>  
> >>>  ### spec-ctrl (x86)
> >>>  > `= List of [ , xen=, {pv,hvm}=,
> >>> ->  {msr-sc,rsb,verw,{ibpb,bhb}-entry}=|{pv,hvm}=,
> >>> +>  
> >>> {msr-sc,rsb,verw,{ibpb,bhb}-entry,asi}=|{pv,hvm}=,
> >>
> >> Is it really appropriate to hide this underneath an x86-only option? Even
> >> of other architectures won't support it right away, they surely will want
> >> to down the road? In which case making as much of this common right away
> >> is probably the best we can do. This goes along with the question whether,
> >> like e.g. "xpti", this should be a top-level option.
> > 
> > I think it's better placed in spec-ctrl as it's a speculation
> > mitigation.
> 
> As is XPTI.

But XPTI predates the introduction of spec-ctrl option, I assumed
that's why xpti is not part of spec-ctrl.

> >  I can see your point about sharing with other arches,
> > maybe when that's needed we can introduce a generic parser of
> > spec-ctrl options?
> 
> Not sure how much could be generalized there.

Oh, so your point was not about sharing the parsing code, but sharing
the command line documentation about it, sorry, I missed that.

Along the lines of:

asi= boolean | { pv, hvm, hwdom }

Or similar?

Even then sub-options would likely be different between architectures.

> >>> @@ -143,6 +148,10 @@ static int __init cf_check parse_spec_ctrl(const 
> >>> char *s)
> >>>  opt_unpriv_mmio = false;
> >>>  opt_gds_mit = 0;
> >>>  opt_div_scrub = 0;
> >>> +
> >>> +opt_asi_pv = 0;
> >>> +opt_asi_hwdom = 0;
> >>> +opt_asi_hvm = 0;
> >>>  }
> >>>  else if ( val > 0 )
> >>>  rc = -EINVAL;
> >>
> >> I'm frequently in trouble when deciding where the split between "=no" and
> >> "=xen" should be. opt_xpti_* are cleared ahead of the disable_common label;
> >> considering the similarity I wonder whether the same should be true for ASI
> >> (as this is also or even mainly about protecting guests from one another),
> >> or whether the XPTI placement is actually wrong.
> > 
> > Hm, that's a difficult one.  ASI is a Xen implemented mitigation, so
> > it should be turned off when spec-ctrl=no-xen is used according to the
> > description of the option:
> > 
> > "spec-ctrl=no-xen can be used to turn off all of Xen’s mitigations"
> 
> Meaning (aiui) mitigations to protect Xen itself.

So that would speculation attacks that take place in Xen context,
which is what ASI would protect against?

I don't have a strong opinion, but I also have a hard time seeing what
should `no-xen` disable.

> >>> @@ -378,6 +410,13 @@ int8_t __ro_after_init opt_xpti_domu = -1;
> >>>  
> >>>  static __init void xpti_init_default(void)
> >>>  {
> >>> +ASSERT(opt_asi_pv >= 0 && opt_asi_hwdom >= 0);
> >>> +if ( (opt_xpti_hwdom == 1 || opt_xpti_domu == 1) && opt_asi_pv == 1 )
> >>
> >> There is a separate opt_asi_hwdom which isn't used here, but only ...
> > 
> > opt_asi_pv (and opt_asi_hvm) must be set for opt_asi_hwdom to also be
> > set.  XPTI is sligtly different, in that XPTI could be set only for
> > the hwdom by using `xpti=dom0`.
> 
> Hmm, I didn't even notice this oddity (as it feels to me) in parsing.
> From the doc provided it wouldn't occur to me that e.g. "asi=pv" won't
> affect a PV Dom0. That's (iirc) specifically why "xpti=" has a "hwdom"
> sub-option.

It seems to be like that for all spec-ctrl options, see `bhb-entry`
for example.

> >>> @@ -643,22 +683,24 @@ static void __init print_details(enum ind_thunk 
> >>> thunk)
> >>> opt_eager_fpu ? " EAGER_FPU" 
> >>> : "",
> >>> opt_verw_hvm  ? " VER

Re: [PATCH v2 5/6] x86/alternatives: do not BUG during apply

2024-09-25 Thread Roger Pau Monné

On Wed, Sep 25, 2024 at 11:53:30AM +0100, Andrew Cooper wrote:
> On 25/09/2024 9:42 am, Roger Pau Monne wrote:
> > alternatives is used both at boot time, and when loading livepatch payloads.
> > While for the former it makes sense to panic, it's not useful for the 
> > later, as
> > for livepatches it's possible to fail to load the livepatch if alternatives
> > cannot be resolved and continue operating normally.
> >
> > Relax the BUGs in _apply_alternatives() to instead return an error code.  
> > The
> > caller will figure out whether the failures are fatal and panic.
> >
> > Print an error message to provide some user-readable information about what
> > went wrong.
> >
> > Signed-off-by: Roger Pau Monné 
> 
> Much nicer.  A few more diagnostic comments.
> 
> > diff --git a/xen/arch/x86/alternative.c b/xen/arch/x86/alternative.c
> > index 7824053c9d33..c8848ba6006e 100644
> > --- a/xen/arch/x86/alternative.c
> > +++ b/xen/arch/x86/alternative.c
> > @@ -198,9 +198,29 @@ static void init_or_livepatch 
> > _apply_alternatives(struct alt_instr *start,
> >  uint8_t buf[MAX_PATCH_LEN];
> >  unsigned int total_len = a->orig_len + a->pad_len;
> >  
> > -BUG_ON(a->repl_len > total_len);
> > -BUG_ON(total_len > sizeof(buf));
> > -BUG_ON(a->cpuid >= NCAPINTS * 32);
> > +if ( a->repl_len > total_len )
> > +{
> > +printk(XENLOG_ERR
> > +   "alt replacement size (%#x) bigger than destination 
> > (%#x)\n",
> 
> These all say "some alternative went wrong", without identifying which. 
> For x86_decode_lite(), debugging was far easier when using:
> 
> "Alternative for %ps ...", ALT_ORIG_PTR(a)
> 
> If we get the order of loading correct, then %ps will even work for a
> livepatch, but that's secondary - even the raw number is slightly useful
> given the livepatch load address.

I don't think this will work as-is for livepatches.  The call to
register the virtual region is currently done in livepatch_upload(),
after load_payload_data() has completed.

We could see about registering the virtual region earlier (no
volunteering to do that work right now).

> I could be talked down to just "Alt for %ps" as you've got it here.  I
> think it's clear enough in context.  So, I'd recommend:
> 
> "Alt for %ps, replacement size %#x larger than origin %#x\n".
> 
> Here, I think origin is better than destination, when discussing
> alternatives.

Sure.

> I can adjust on commit.  Everything else is fine.

If you are comfortable with doing the adjustments on commit, please go
ahead.

Thanks, Roger.

Re: [PATCH v2 4/6] xen/livepatch: do Xen build-id check earlier

2024-09-25 Thread Roger Pau Monné

On Wed, Sep 25, 2024 at 11:21:01AM +0100, Andrew Cooper wrote:
> On 25/09/2024 9:42 am, Roger Pau Monne wrote:
> > The check against the expected Xen build ID should be done ahead of 
> > attempting
> > to apply the alternatives contained in the livepatch.
> >
> > If the CPUID in the alternatives patching data is out of the scope of the
> > running Xen featureset the BUG() in _apply_alternatives() will trigger thus
> > bringing the system down.  Note the layout of struct alt_instr could also
> > change between versions.  It's also possible for struct 
> > exception_table_entry
> > to have changed format, hence leading to other kind of errors if parsing of 
> > the
> > payload is done ahead of checking if the Xen build-id matches.
> >
> > Move the Xen build ID check as early as possible.  To do so introduce a new
> > check_xen_buildid() function that parses and checks the Xen build-id before
> > moving the payload.  Since the expected Xen build-id is used early to
> > detect whether the livepatch payload could be loaded, there's no reason to
> > store it in the payload struct, as a non-matching Xen build-id won't get the
> > payload populated in the first place.
> >
> > Note parse_buildid() has to be slightly adjusted to allow fetching the 
> > section
> > data from the 'data' field instead of the 'load_addr' one: with the Xen 
> > build
> > ID moved ahead of move_payload() 'load_addr' is not yet set when the Xen 
> > build
> > ID check is performed.  Also printing the expected Xen build ID has part of
> > dumping the payload is no longer done, as all loaded payloads would have Xen
> > build IDs matching the running Xen, otherwise they would have failed to 
> > load.
> >
> > Fixes: 879615f5db1d ('livepatch: Always check hypervisor build ID upon 
> > livepatch upload')
> > Signed-off-by: Roger Pau Monné 
> 
> Much nicer.  A couple of suggestions.
> 
> > ---
> > Changes since v1:
> >  - Do the Xen build-id check even earlier.
> > ---
> >  xen/common/livepatch.c  | 66 +++--
> >  xen/include/xen/livepatch_payload.h |  1 -
> >  2 files changed, 44 insertions(+), 23 deletions(-)
> >
> > diff --git a/xen/common/livepatch.c b/xen/common/livepatch.c
> > index 8e61083f23a7..895c425cd5ea 100644
> > --- a/xen/common/livepatch.c
> > +++ b/xen/common/livepatch.c
> > @@ -448,24 +448,21 @@ static bool section_ok(const struct livepatch_elf 
> > *elf,
> >  return true;
> >  }
> >  
> > -static int xen_build_id_dep(const struct payload *payload)
> > +static int xen_build_id_dep(const struct livepatch_build_id *expected)
> >  {
> >  const void *id = NULL;
> >  unsigned int len = 0;
> >  int rc;
> >  
> > -ASSERT(payload->xen_dep.len);
> > -ASSERT(payload->xen_dep.p);
> > +ASSERT(expected->len);
> > +ASSERT(expected->p);
> >  
> >  rc = xen_build_id(&id, &len);
> >  if ( rc )
> >  return rc;
> >  
> > -if ( payload->xen_dep.len != len || memcmp(id, payload->xen_dep.p, 
> > len) ) {
> > -printk(XENLOG_ERR LIVEPATCH "%s: check against hypervisor build-id 
> > failed\n",
> > -   payload->name);
> > +if ( expected->len != len || memcmp(id, expected->p, len) )
> >  return -EINVAL;
> > -}
> 
> I'd suggest merging this into check_xen_buildid(), which is the single
> caller and serves the same singular purpose.
> 
> It removes the ASSERT() (expected is now a local variable), and it helps
> with some diagnostic improvements.

Sure.

> >  
> >  return 0;
> >  }
> > @@ -495,11 +493,44 @@ static int parse_buildid(const struct 
> > livepatch_elf_sec *sec,
> > return 0;
> >  }
> >  
> > +static int check_xen_buildid(const struct livepatch_elf *elf)
> > +{
> > +struct livepatch_build_id id;
> > +const struct livepatch_elf_sec *sec =
> > +livepatch_elf_sec_by_name(elf, ELF_LIVEPATCH_XEN_DEPENDS);
> > +int rc;
> > +
> > +if ( !sec )
> > +{
> > +printk(XENLOG_ERR LIVEPATCH "%s: %s is missing\n",
> 
> "%s: Section: '%s' missing\n".
> 
> (Maybe no single quotes around the section as we know it's non-empty.)
> 
> > +   elf->name, ELF_LIVEPATCH_XEN_DEPENDS);
> > +return -EINVAL;
> > +}
> > +
> > +

Re: [PATCH] changelog: add note about blkif protocol fixes

2024-09-25 Thread Roger Pau Monné

On Wed, Sep 25, 2024 at 12:18:59PM +0100, Andrew Cooper wrote:
> On 12/09/2024 2:23 pm, Roger Pau Monne wrote:
> > Reported-by: Andrew Cooper 
> > Signed-off-by: Roger Pau Monné 
> > ---
> >  CHANGELOG.md | 1 +
> >  1 file changed, 1 insertion(+)
> >
> > diff --git a/CHANGELOG.md b/CHANGELOG.md
> > index 26e7d8dd2ac4..8864ea7eafad 100644
> > --- a/CHANGELOG.md
> > +++ b/CHANGELOG.md
> > @@ -9,6 +9,7 @@ The format is based on [Keep a 
> > Changelog](https://keepachangelog.com/en/1.0.0/)
> >  ### Changed
> >   - On x86:
> > - Prefer ACPI reboot over UEFI ResetSystem() run time service call.
> > + - Fixed blkif protocol specification for sector sizes different than 512b.
> 
> It's minor, but blkif is common, so should go ahead of the "On x86"
> subsection.

Oh, it's in common (indentation wise) it's just after the x86 section.
For other release notes we have the x86/ARM/... sections first, and
afterwards the common changes.

I don't mind putting the common changes first and the arch-specific
ones later, but it should likely be documented somewhere.

Thanks, Roger.

Re: [PATCH 12/22] x86/spec-ctrl: introduce Address Space Isolation command line option

2024-09-25 Thread Roger Pau Monné

On Wed, Aug 14, 2024 at 12:10:56PM +0200, Jan Beulich wrote:
> On 26.07.2024 17:21, Roger Pau Monne wrote:
> > --- a/docs/misc/xen-command-line.pandoc
> > +++ b/docs/misc/xen-command-line.pandoc
> > @@ -2387,7 +2387,7 @@ By default SSBD will be mitigated at runtime (i.e 
> > `ssbd=runtime`).
> >  
> >  ### spec-ctrl (x86)
> >  > `= List of [ , xen=, {pv,hvm}=,
> > ->  {msr-sc,rsb,verw,{ibpb,bhb}-entry}=|{pv,hvm}=,
> > +>  
> > {msr-sc,rsb,verw,{ibpb,bhb}-entry,asi}=|{pv,hvm}=,
> 
> Is it really appropriate to hide this underneath an x86-only option? Even
> of other architectures won't support it right away, they surely will want
> to down the road? In which case making as much of this common right away
> is probably the best we can do. This goes along with the question whether,
> like e.g. "xpti", this should be a top-level option.

I think it's better placed in spec-ctrl as it's a speculation
mitigation.  I can see your point about sharing with other arches,
maybe when that's needed we can introduce a generic parser of
spec-ctrl options?

It might end up needing slightly different processing for arches
different than x86, as for x86 it should be possible to enable the
option only for PV or HVM domains, while for other arches this might
make no sense for not having PV support.

> > @@ -2414,10 +2414,10 @@ in place for guests to use.
> >  
> >  Use of a positive boolean value for either of these options is invalid.
> >  
> > -The `pv=`, `hvm=`, `msr-sc=`, `rsb=`, `verw=`, `ibpb-entry=` and 
> > `bhb-entry=`
> > -options offer fine grained control over the primitives by Xen.  These 
> > impact
> > -Xen's ability to protect itself, and/or Xen's ability to virtualise support
> > -for guests to use.
> > +The `pv=`, `hvm=`, `msr-sc=`, `rsb=`, `verw=`, `ibpb-entry=`, `bhb-entry=` 
> > and
> > +`asi=` options offer fine grained control over the primitives by Xen.  
> > These
> 
> Here, ahead of "by Xen", it looks like "used" was missing. Maybe a good
> opportunity to add it?

Oh, yes.

> > @@ -2449,6 +2449,11 @@ for guests to use.
> >is not available (see `bhi-dis-s`).  The choice of scrubbing sequence 
> > can be
> >selected using the `bhb-seq=` option.  If it is necessary to protect dom0
> >too, boot with `spec-ctrl=bhb-entry`.
> > +* `asi=` offers control over whether the hypervisor will engage in Address
> > +  Space Isolation, by not having sensitive information mapped in the VMM
> > +  page-tables.  Not having sensitive information on the page-tables avoids
> > +  having to perform some mitigations for speculative attacks when
> > +  context-switching to the hypervisor.
> 
> Is "not having" and ...
> 
> > --- a/xen/arch/x86/include/asm/domain.h
> > +++ b/xen/arch/x86/include/asm/domain.h
> > @@ -458,6 +458,9 @@ struct arch_domain
> >  /* Don't unconditionally inject #GP for unhandled MSRs. */
> >  bool msr_relaxed;
> >  
> > +/* Run the guest without sensitive information in the VMM page-tables. 
> > */
> > +bool asi;
> 
> ... "without" really going to be fully true? Wouldn't we better say "as little
> as possible" or alike?

Maybe better use:

"...by not having sensitive information permanently mapped..."

And a similar adjustment to the comment?

The key point is that we would only map sensitive information
transiently IMO.

> > @@ -143,6 +148,10 @@ static int __init cf_check parse_spec_ctrl(const char 
> > *s)
> >  opt_unpriv_mmio = false;
> >  opt_gds_mit = 0;
> >  opt_div_scrub = 0;
> > +
> > +opt_asi_pv = 0;
> > +opt_asi_hwdom = 0;
> > +opt_asi_hvm = 0;
> >  }
> >  else if ( val > 0 )
> >  rc = -EINVAL;
> 
> I'm frequently in trouble when deciding where the split between "=no" and
> "=xen" should be. opt_xpti_* are cleared ahead of the disable_common label;
> considering the similarity I wonder whether the same should be true for ASI
> (as this is also or even mainly about protecting guests from one another),
> or whether the XPTI placement is actually wrong.

Hm, that's a difficult one.  ASI is a Xen implemented mitigation, so
it should be turned off when spec-ctrl=no-xen is used according to the
description of the option:

"spec-ctrl=no-xen can be used to turn off all of Xen’s mitigations"

OTOH, there's no "virtualisation support in place for guests to use"
when no-xen is used.

I have to admin the description for that option is not obviously clear
to me, so 

> > @@ -378,6 +410,13 @@ int8_t __ro_after_init opt_xpti_domu = -1;
> >  
> >  static __init void xpti_init_default(void)
> >  {
> > +ASSERT(opt_asi_pv >= 0 && opt_asi_hwdom >= 0);
> > +if ( (opt_xpti_hwdom == 1 || opt_xpti_domu == 1) && opt_asi_pv == 1 )
> 
> There is a separate opt_asi_hwdom which isn't used here, but only ...

opt_asi_pv (and opt_asi_hvm) must be set for opt_asi_hwdom to also be
set.  XPTI is sligtly different, in that XPTI could be set only for
the hwdom by using `x

Re: [PATCH] changelog: add note about blkif protocol fixes

2024-09-25 Thread Roger Pau Monné

On Fri, Sep 13, 2024 at 04:19:32PM +0200, oleksii.kuroc...@gmail.com wrote:
> On Thu, 2024-09-12 at 15:23 +0200, Roger Pau Monne wrote:
> > Reported-by: Andrew Cooper 
> > Signed-off-by: Roger Pau Monné 
> LGTM: Oleksii Kurochko 

Hello Oleksii,

Could you formalize the tag into either a Reviewed-by or an Acked-by
so the patch can be committed?

Thanks, Roger.

Re: [PATCH v2 3/6] xen/livepatch: simplify and unify logic in prepare_payload()

2024-09-25 Thread Roger Pau Monné

On Wed, Sep 25, 2024 at 10:37:56AM +0100, Andrew Cooper wrote:
> On 25/09/2024 9:42 am, Roger Pau Monne wrote:
> > diff --git a/xen/common/livepatch.c b/xen/common/livepatch.c
> > index 87b3db03e26d..8e61083f23a7 100644
> > --- a/xen/common/livepatch.c
> > +++ b/xen/common/livepatch.c
> > @@ -470,6 +470,31 @@ static int xen_build_id_dep(const struct payload 
> > *payload)
> >  return 0;
> >  }
> >  
> > +/* Parses build-id sections into the given destination. */
> > +static int parse_buildid(const struct livepatch_elf_sec *sec,
> > + struct livepatch_build_id *id)
> > +{
> > +const Elf_Note *n;
> > +int rc;
> > +
> > +/* Presence of the sections is ensured by check_special_sections(). */
> > +ASSERT(sec);
> > +
> > +n = sec->load_addr;
> > +
> > +if ( sec->sec->sh_size <= sizeof(*n) )
> > +return -EINVAL;
> > +
> > +rc = xen_build_id_check(n, sec->sec->sh_size, &id->p, &id->len);
> 
> I've just realised what is so confusing.
> 
> This function is not a Xen buildid check, it's an ELF buildid note check.
> 
> I'll do a followup patch after yours goes in renaming it.
> 
> Reviewed-by: Andrew Cooper 

Yeah, the naming of xen_build_id_check is confusing, as it's not just
a check, it also populates livepatch_build_id fields.  Thought about
renaming it, but the series was already long enough...

Thanks, Roger.

Re: [PATCH v2 2/6] xen/livepatch: zero pointer to temporary load buffer

2024-09-25 Thread Roger Pau Monné

On Wed, Sep 25, 2024 at 10:33:39AM +0100, Andrew Cooper wrote:
> On 25/09/2024 9:42 am, Roger Pau Monne wrote:
> > The livepatch_elf_sec data field points to the temporary load buffer, it's 
> > the
> > load_addr field that points to the stable loaded section data.  Zero the 
> > data
> > field once load_addr is set, as it would otherwise become a dangling pointer
> > once the load buffer is freed.
> >
> > No functional change intended.
> >
> > Signed-off-by: Roger Pau Monné 
> > ---
> > Changes since v1:
> >  - New in this version.
> > ---
> >  xen/common/livepatch.c | 3 +++
> >  1 file changed, 3 insertions(+)
> >
> > diff --git a/xen/common/livepatch.c b/xen/common/livepatch.c
> > index df41dcce970a..87b3db03e26d 100644
> > --- a/xen/common/livepatch.c
> > +++ b/xen/common/livepatch.c
> > @@ -383,6 +383,9 @@ static int move_payload(struct payload *payload, struct 
> > livepatch_elf *elf)
> >  }
> >  else
> >  memset(elf->sec[i].load_addr, 0, elf->sec[i].sec->sh_size);
> > +
> > +/* Avoid leaking pointers to temporary load buffers. */
> > +elf->sec[i].data = NULL;
> >  }
> >  }
> >  
> 
> Where is the data allocated and freed?
> 
> I don't see it being freed in this loop, so how is freed subsequently?

It's allocated and freed by livepatch_upload(), it's the raw_data
buffer that's allocated in the context of that function.

Thanks, Roger.

Re: [PATCH v2 5/6] x86/alternatives: do not BUG during apply

2024-09-25 Thread Roger Pau Monné

On Wed, Sep 25, 2024 at 11:01:08AM +0200, Jan Beulich wrote:
> On 25.09.2024 10:42, Roger Pau Monne wrote:
> > alternatives is used both at boot time, and when loading livepatch payloads.
> > While for the former it makes sense to panic, it's not useful for the 
> > later, as
> > for livepatches it's possible to fail to load the livepatch if alternatives
> > cannot be resolved and continue operating normally.
> > 
> > Relax the BUGs in _apply_alternatives() to instead return an error code.  
> > The
> > caller will figure out whether the failures are fatal and panic.
> > 
> > Print an error message to provide some user-readable information about what
> > went wrong.
> > 
> > Signed-off-by: Roger Pau Monné 
> 
> Reviewed-by: Jan Beulich 
> 
> Albeit ...
> 
> > @@ -394,8 +418,10 @@ static int __init cf_check nmi_apply_alternatives(
> >   PAGE_HYPERVISOR_RWX);
> >  flush_local(FLUSH_TLB_GLOBAL);
> >  
> > -_apply_alternatives(__alt_instructions, __alt_instructions_end,
> > -alt_done);
> > +rc = _apply_alternatives(__alt_instructions, 
> > __alt_instructions_end,
> > + alt_done);
> > +if ( rc )
> > +panic("Unable to apply alternatives: %d\n", rc);
> 
> ... I see little value in logging rc here - the other log message will
> provide better detail anyway.

Current log messages do indeed provide better detail, but maybe we end
up adding new return error paths to _apply_alternatives() in the
future.  I see no harm in printing the error code if there's one.

> > --- a/xen/common/livepatch.c
> > +++ b/xen/common/livepatch.c
> > @@ -896,7 +896,15 @@ static int prepare_payload(struct payload *payload,
> >  return -EINVAL;
> >  }
> >  }
> > -apply_alternatives(start, end);
> > +
> > +rc = apply_alternatives(start, end);
> > +if ( rc )
> > +{
> > +printk(XENLOG_ERR LIVEPATCH "%s applying alternatives failed: 
> > %d\n",
> > +   elf->name, rc);
> > +return rc;
> > +}
> 
> Whereas here it may indeed make sense to keep things as you have them, as the
> error code is propagated onwards, and matching it with other error messages
> coming from elsewhere may help in understanding the situation.
> 
> As to possible applying: It looks as if this was independent of the earlier 4
> patches?

Yes, I think patches 5 and 6 can be applied ahead of the preceding
livepatch fixes.

Thanks, Roger.

Re: [PATCH v2 6/6] x86/alternative: build time check feature is in range

2024-09-25 Thread Roger Pau Monné

On Wed, Sep 25, 2024 at 09:51:36AM +0100, Andrew Cooper wrote:
> On 25/09/2024 9:42 am, Roger Pau Monne wrote:
> > diff --git a/xen/arch/x86/include/asm/alternative.h 
> > b/xen/arch/x86/include/asm/alternative.h
> > index 69555d781ef9..b7f155994b2c 100644
> > --- a/xen/arch/x86/include/asm/alternative.h
> > +++ b/xen/arch/x86/include/asm/alternative.h
> > @@ -7,6 +7,7 @@
> >  #include 
> >  #include 
> >  #include 
> > +#include 
> >  
> >  struct __packed alt_instr {
> >  int32_t  orig_offset;   /* original instruction */
> > @@ -59,6 +60,9 @@ extern void alternative_branches(void);
> >  alt_repl_len(n2)) "-" alt_orig_len)
> >  
> >  #define ALTINSTR_ENTRY(feature, num)\
> > +" .if " __stringify(feature) " >= " __stringify(NCAPINTS * 32) 
> > "\n"\
> 
> Please use STR() from macros.h.  It's shorter, and we're trying to
> retire the use of __stringify().

Oh, yes, indeed.  I just copied from the surrounding context and
forgot about STR().

> Happy to fix on commit.  Reviewed-by: Andrew Cooper
> 

Sure, thanks.

Roger.

Re: [PATCH v2 6/6] x86/alternative: build time check feature is in range

2024-09-25 Thread Roger Pau Monné

On Wed, Sep 25, 2024 at 11:04:45AM +0200, Jan Beulich wrote:
> On 25.09.2024 10:42, Roger Pau Monne wrote:
> > Ensure at build time the feature(s) used for the alternative blocks are in
> > range of the featureset.
> > 
> > No functional change intended, as all current usages are correct.
> > 
> > Signed-off-by: Roger Pau Monné 
> 
> I'm struggling with what this is meant to guard against. All validly usable
> constants are within range. Any unsuitable constant can of course have any
> value, yet you'd then refuse only those which are out of bounds.

It's IMO better than nothing, and it's a build-time check.

Thanks, Roger.

Re: [PATCH v2 1/6] xen/livepatch: remove useless check for duplicated sections

2024-09-25 Thread Roger Pau Monné

On Wed, Sep 25, 2024 at 10:52:06AM +0200, Jan Beulich wrote:
> On 25.09.2024 10:42, Roger Pau Monne wrote:
> > The current check for duplicated sections in a payload is not effective.  
> > Such
> > check is done inside a loop that iterates over the sections names, it's
> > logically impossible for the bitmap to be set more than once.
> > 
> > The usage of a bitmap in check_patching_sections() has been replaced with a
> > boolean, since the function just cares that at least one of the special
> > sections is present.
> > 
> > No functional change intended, as the check was useless.
> > 
> > Fixes: 29f4ab0b0a4f ('xsplice: Implement support for 
> > applying/reverting/replacing patches.')
> > Fixes: 76b3d4098a92 ('livepatch: Do not enforce ELF_LIVEPATCH_FUNC section 
> > presence')
> > Signed-off-by: Roger Pau Monné 
> 
> Just to clarify for my eventual picking up for backporting: Despite
> the Fixes: tags there's no actual bug being fixed; it's merely code
> simplification. So no need to backport.

Indeed, no strict bug, just useless code (unless my analysis is wrong).

Thanks, Roger.

Re: [PATCH] x86/APIC: Remove x2APIC pure cluster mode

2024-09-25 Thread Roger Pau Monné

On Tue, Sep 24, 2024 at 07:20:44PM +0100, Andrew Cooper wrote:
> On 24/09/2024 6:10 pm, Roger Pau Monné wrote:
> > On Tue, Sep 24, 2024 at 06:21:47PM +0200, Jan Beulich wrote:
> >> On 24.09.2024 18:14, Roger Pau Monné wrote:
> >>> On Tue, Sep 24, 2024 at 04:27:36PM +0100, Andrew Cooper wrote:
> >>>> On 24/09/2024 4:10 pm, Roger Pau Monné wrote:
> >>>>> On Mon, Sep 23, 2024 at 03:35:59PM +0100, Matthew Barnes wrote:
> >>>>>> With the introduction of mixed x2APIC mode (using cluster addressing 
> >>>>>> for
> >>>>>> IPIs and physical for external interrupts) the use of pure cluster mode
> >>>>>> doesn't have any benefit.
> >>>>>>
> >>>>>> Remove the mode itself, leaving only the code required for logical
> >>>>>> addressing when sending IPIs.
> >>>>>>
> >>>>>> Implements: https://gitlab.com/xen-project/xen/-/issues/189
> >>>> We use the Resolves: tag for this.  Can fix on commit.
> >>>>
> >>>>> There's at least one extra bit which I would also like to see removed,
> >>>>> either in this patch, or as following patch.
> >>>>>
> >>>>> In struct arch_irq_desc we have 3 cpumasks: cpu_mask, old_cpu_mask and
> >>>>> pending_mask.  After dropping cluster mode for external interrupts,
> >>>>> those fields could become integers AFACT, as now interrupts can only
> >>>>> target a single CPU opposed to a logical CPU set.
> >>>> A separate patch for sure, but that sounds like a great improvement.
> >>> Oh, there are quite some fields of struct genapic that are not needed
> >>> anymore, since both implementations set it to the same function.  It
> >>> would be helpful to prune them.
> >> Pruning where possible - yes. But "both" won't cover it, as we have 4
> >> instances of the struct (not just the two x2apic ones).
> > Yeah, realized that afterwards, we still have the xAPIC flat mode,
> > which is using logical delivery mode, but target a single CPU.  So
> > getting rid of the cpumask in arch_irq_desc seem possible, however
> > there might be nothing to prune in struct genapic.
> 
> Logical delivery mode for external interrupts in xAPIC is just as
> broken/useless as the code we've just deleted.
> 
> If that's the only thing in the way of more cleanup, we delete it too.

Bah, xAPIC flat delivery mode needs to be adjusted to use physical
delivery for external interrupts, as the vector space is shared
between all CPUs in that mode.  This must be done ahead of turning the
arch_irq_desc cpumasks into integers.

Sorry, this is turning into a more work that I originally expected,
mostly because I wasn't taking into account that xAPIC was still using
logical mode for external interrupts.

Thanks, Roger.

Re: [PATCH] x86/APIC: Remove x2APIC pure cluster mode

2024-09-24 Thread Roger Pau Monné

On Tue, Sep 24, 2024 at 06:21:47PM +0200, Jan Beulich wrote:
> On 24.09.2024 18:14, Roger Pau Monné wrote:
> > On Tue, Sep 24, 2024 at 04:27:36PM +0100, Andrew Cooper wrote:
> >> On 24/09/2024 4:10 pm, Roger Pau Monné wrote:
> >>> On Mon, Sep 23, 2024 at 03:35:59PM +0100, Matthew Barnes wrote:
> >>>> With the introduction of mixed x2APIC mode (using cluster addressing for
> >>>> IPIs and physical for external interrupts) the use of pure cluster mode
> >>>> doesn't have any benefit.
> >>>>
> >>>> Remove the mode itself, leaving only the code required for logical
> >>>> addressing when sending IPIs.
> >>>>
> >>>> Implements: https://gitlab.com/xen-project/xen/-/issues/189
> >>
> >> We use the Resolves: tag for this.  Can fix on commit.
> >>
> >>> There's at least one extra bit which I would also like to see removed,
> >>> either in this patch, or as following patch.
> >>>
> >>> In struct arch_irq_desc we have 3 cpumasks: cpu_mask, old_cpu_mask and
> >>> pending_mask.  After dropping cluster mode for external interrupts,
> >>> those fields could become integers AFACT, as now interrupts can only
> >>> target a single CPU opposed to a logical CPU set.
> >>
> >> A separate patch for sure, but that sounds like a great improvement.
> > 
> > Oh, there are quite some fields of struct genapic that are not needed
> > anymore, since both implementations set it to the same function.  It
> > would be helpful to prune them.
> 
> Pruning where possible - yes. But "both" won't cover it, as we have 4
> instances of the struct (not just the two x2apic ones).

Yeah, realized that afterwards, we still have the xAPIC flat mode,
which is using logical delivery mode, but target a single CPU.  So
getting rid of the cpumask in arch_irq_desc seem possible, however
there might be nothing to prune in struct genapic.

Regards, Roger.

Re: [PATCH] x86/APIC: Remove x2APIC pure cluster mode

2024-09-24 Thread Roger Pau Monné

On Tue, Sep 24, 2024 at 04:27:36PM +0100, Andrew Cooper wrote:
> On 24/09/2024 4:10 pm, Roger Pau Monné wrote:
> > On Mon, Sep 23, 2024 at 03:35:59PM +0100, Matthew Barnes wrote:
> >> With the introduction of mixed x2APIC mode (using cluster addressing for
> >> IPIs and physical for external interrupts) the use of pure cluster mode
> >> doesn't have any benefit.
> >>
> >> Remove the mode itself, leaving only the code required for logical
> >> addressing when sending IPIs.
> >>
> >> Implements: https://gitlab.com/xen-project/xen/-/issues/189
> 
> We use the Resolves: tag for this.  Can fix on commit.
> 
> > There's at least one extra bit which I would also like to see removed,
> > either in this patch, or as following patch.
> >
> > In struct arch_irq_desc we have 3 cpumasks: cpu_mask, old_cpu_mask and
> > pending_mask.  After dropping cluster mode for external interrupts,
> > those fields could become integers AFACT, as now interrupts can only
> > target a single CPU opposed to a logical CPU set.
> 
> A separate patch for sure, but that sounds like a great improvement.

Oh, there are quite some fields of struct genapic that are not needed
anymore, since both implementations set it to the same function.  It
would be helpful to prune them.

Thanks, Roger.

Re: [PATCH] x86/APIC: Remove x2APIC pure cluster mode

2024-09-24 Thread Roger Pau Monné

On Mon, Sep 23, 2024 at 03:35:59PM +0100, Matthew Barnes wrote:
> With the introduction of mixed x2APIC mode (using cluster addressing for
> IPIs and physical for external interrupts) the use of pure cluster mode
> doesn't have any benefit.
> 
> Remove the mode itself, leaving only the code required for logical
> addressing when sending IPIs.
> 
> Implements: https://gitlab.com/xen-project/xen/-/issues/189

There's at least one extra bit which I would also like to see removed,
either in this patch, or as following patch.

In struct arch_irq_desc we have 3 cpumasks: cpu_mask, old_cpu_mask and
pending_mask.  After dropping cluster mode for external interrupts,
those fields could become integers AFACT, as now interrupts can only
target a single CPU opposed to a logical CPU set.

Thanks, Roger.

Re: [PATCH] x86/boot: Further simplify CR4 handling in dom0_construct_pv()

2024-09-24 Thread Roger Pau Monné

On Tue, Sep 24, 2024 at 03:44:07PM +0200, Jan Beulich wrote:
> On 24.09.2024 13:23, Andrew Cooper wrote:
> > --- a/xen/arch/x86/pv/dom0_build.c
> > +++ b/xen/arch/x86/pv/dom0_build.c
> > @@ -1057,29 +1057,31 @@ int __init dom0_construct_pv(struct domain *d,
> >   module_t *initrd,
> >   const char *cmdline)
> >  {
> > +unsigned long cr4 = read_cr4();
> > +unsigned long mask = X86_CR4_SMAP | X86_CR4_LASS;
> >  int rc;
> >  
> >  /*
> > - * Clear SMAP in CR4 to allow user-accesses in construct_dom0().  This
> > - * prevents us needing to write construct_dom0() in terms of
> > + * Clear SMAP/LASS in CR4 to allow user-accesses in construct_dom0().
> > + * This prevents us needing to write construct_dom0() in terms of
> >   * copy_{to,from}_user().
> >   */
> > -if ( boot_cpu_has(X86_FEATURE_XEN_SMAP) )
> > +if ( cr4 & mask )
> >  {
> >  if ( IS_ENABLED(CONFIG_PV32) )
> > -cr4_pv32_mask &= ~X86_CR4_SMAP;
> > +cr4_pv32_mask &= ~mask;
> >  
> > -write_cr4(read_cr4() & ~X86_CR4_SMAP);
> > +write_cr4(cr4 & ~mask);
> >  }
> >  
> >  rc = dom0_construct(d, image, image_headroom, initrd, cmdline);
> >  
> > -if ( boot_cpu_has(X86_FEATURE_XEN_SMAP) )
> > +if ( cr4 & mask )
> >  {
> > -write_cr4(read_cr4() | X86_CR4_SMAP);
> > +write_cr4(cr4);
> >  
> >  if ( IS_ENABLED(CONFIG_PV32) )
> > -cr4_pv32_mask |= X86_CR4_SMAP;
> > +cr4_pv32_mask |= mask;
> 
> You may end up setting a bit here which wasn't previously set, and which
> might then fault when cr4_pv32_restore tries to OR this into %cr4. Aiui
> you must have tested this on LASS-capable hardware, for it to have worked.

Possibly also needs X86_CR4_LASS adding to the XEN_CR4_PV32_BITS
define, as otherwise it won't end up in cr4_pv32_mask in the first
place AFAICT.

Thanks, Roger.

Re: [PATCH] x86/boot: Further simplify CR4 handling in dom0_construct_pv()

2024-09-24 Thread Roger Pau Monné

On Tue, Sep 24, 2024 at 12:23:43PM +0100, Andrew Cooper wrote:
> The logic would be more robust disabling SMAP based on its precense in CR4,
> rather than on certain features.
> 
> A forthcoming feature, LASS, needs the same treatment here.  Introduce minimum
> enumeration information, although it will take a bit more work to get LASS
> fully usable in guests.

Reading the ISA, doesn't LASS require SMAP to be enabled in %cr4, and
hence disabling SMAP already disables LASS? (without having to toggle
the LASS %cr4 bit)

"A supervisor-mode data access causes a LASS violation only if
supervisor-mode access protection is enabled (because CR4.SMAP = 1)
and either RFLAGS.AC = 0 or the access implicitly accesses a system
data structure."

We can consider also disabling it, but I think it would need to be
noted that such disabling is not strictly necessary, as disabling SMAP
already disables LASS.

> 
> Signed-off-by: Andrew Cooper 
> ---
> CC: Jan Beulich 
> CC: Roger Pau Monné 
> 
> I know LASS can't be used with traditional PV guests, but I have some PV-lite
> plans.  The problem is the PV kernel, in CPL3, accessing addresses in the high
> canonincal half.
> ---
>  xen/arch/x86/include/asm/x86-defns.h|  1 +
>  xen/arch/x86/pv/dom0_build.c| 18 ++
>  xen/include/public/arch-x86/cpufeatureset.h |  1 +
>  3 files changed, 12 insertions(+), 8 deletions(-)
> 
> diff --git a/xen/arch/x86/include/asm/x86-defns.h 
> b/xen/arch/x86/include/asm/x86-defns.h
> index caa92829eaa9..8f97fb1e6a12 100644
> --- a/xen/arch/x86/include/asm/x86-defns.h
> +++ b/xen/arch/x86/include/asm/x86-defns.h
> @@ -75,6 +75,7 @@
>  #define X86_CR4_PKE0x0040 /* enable PKE */
>  #define X86_CR4_CET0x0080 /* Control-flow Enforcement Technology 
> */
>  #define X86_CR4_PKS0x0100 /* Protection Key Supervisor */
> +#define X86_CR4_LASS   0x0800 /* Linear Address Space Separation */
>  
>  /*
>   * XSTATE component flags in XCR0 | MSR_XSS
> diff --git a/xen/arch/x86/pv/dom0_build.c b/xen/arch/x86/pv/dom0_build.c
> index 262edb6bf2f0..f5c868df384f 100644
> --- a/xen/arch/x86/pv/dom0_build.c
> +++ b/xen/arch/x86/pv/dom0_build.c
> @@ -1057,29 +1057,31 @@ int __init dom0_construct_pv(struct domain *d,
>   module_t *initrd,
>   const char *cmdline)
>  {
> +unsigned long cr4 = read_cr4();
> +unsigned long mask = X86_CR4_SMAP | X86_CR4_LASS;

const maybe?  Seeing as it is read-only.

Thanks, Roger.

Re: [PATCH 2/3] xen/livepatch: do Xen build-id check earlier

2024-09-23 Thread Roger Pau Monné

On Mon, Sep 23, 2024 at 12:04:30PM +0100, Andrew Cooper wrote:
> On 20/09/2024 10:36 am, Roger Pau Monne wrote:
> > The check against the expected Xen build ID should be done ahead of 
> > attempting
> > to apply the alternatives contained in the livepatch.
> >
> > If the CPUID in the alternatives patching data is out of the scope of the
> > running Xen featureset the BUG() in _apply_alternatives() will trigger thus
> > bringing the system down.  Note the layout of struct alt_instr could also
> > change between versions.  It's also possible for struct 
> > exception_table_entry
> > to have changed format, hence possibly leading to other errors.
> >
> > Move the Xen build ID check to be done ahead of any processing of the 
> > livepatch
> > payload sections.
> >
> > Signed-off-by: Roger Pau Monné 
> > ---
> >  xen/common/livepatch.c | 9 +
> >  1 file changed, 5 insertions(+), 4 deletions(-)
> >
> > diff --git a/xen/common/livepatch.c b/xen/common/livepatch.c
> > index cea47ffe4c84..3e4fce036a1c 100644
> > --- a/xen/common/livepatch.c
> > +++ b/xen/common/livepatch.c
> > @@ -767,6 +767,11 @@ static int prepare_payload(struct payload *payload,
> >  if ( rc )
> >  return rc;
> >  
> > +/* Perform the Xen build-id check ahead of doing any more processing. 
> > */
> > +rc = xen_build_id_dep(payload);
> > +if ( rc )
> > +return rc;
> > +
> 
> While a step in the right direction, I think this needs to be moved far
> earlier.  Even here, it's behind the processing of the livepatch func
> state, which is something that can also change like alt_instr.
> 
> The buildid checks need to be as early as possible.  Looking through the
> logic (which doesn't have great names/splits), I'd say the buildid
> checks want to be between livepatch_elf_load() (which parses the
> structure of the ELF), and move_payload() (which starts copying it into
> place).
> 
> That would involve moving check_special_sections() too, but I think it's
> the right thing to do.

My plan would be to move check_special_sections() ahead and expand its
logic to also check that the expected buildid matches the running
hypervisor one.

Thanks, Roger.

Re: [PATCH 1/3] xen/livepatch: simplify and unify logic in prepare_payload()

2024-09-23 Thread Roger Pau Monné

On Mon, Sep 23, 2024 at 01:01:57PM +0200, Jan Beulich wrote:
> On 20.09.2024 11:36, Roger Pau Monne wrote:
> > The following sections: .note.gnu.build-id, .livepatch.xen_depends and
> > .livepatch.depends are mandatory and ensured to be present by
> > check_special_sections() before prepare_payload() is called.
> > 
> > Simplify the logic in prepare_payload() by introducing a generic function to
> > parse the sections that contain a buildid.  Note the function assumes the
> > buildid related section to always be present.
> > 
> > No functional change intended.
> > 
> > Signed-off-by: Roger Pau Monné 
> > ---
> >  xen/common/livepatch.c | 106 ++---
> >  1 file changed, 46 insertions(+), 60 deletions(-)
> > 
> > diff --git a/xen/common/livepatch.c b/xen/common/livepatch.c
> > index d93a556bcda2..cea47ffe4c84 100644
> > --- a/xen/common/livepatch.c
> > +++ b/xen/common/livepatch.c
> > @@ -647,15 +647,37 @@ static inline int livepatch_check_expectations(const 
> > struct payload *payload)
> >  nhooks = __sec->sec->sh_size / sizeof(*hook);  
> >\
> >  } while (0)
> >  
> > +static int fetch_buildid(const struct livepatch_elf_sec *sec,
> > + struct livepatch_build_id *id)
> > +{
> > +const Elf_Note *n = sec->load_addr;
> > +int rc;
> > +
> > +ASSERT(sec);
> > +
> > +if ( sec->sec->sh_size <= sizeof(*n) )
> > +return -EINVAL;
> 
> Oh, after my reply to Andrew's reply, now looking at the actual change -
> is it perhaps ASSERT(sec->sec) that was meant?

The original check before moving the code was against 'sec', not
'sec->sec'.  That's what I intending to retain with the assert.

I can do the `n = sec->load_addr` assign after the assert if that's
better analysis wise.

Thanks, Roger.

Re: [PATCH 3/3] x86/alternatives: relax apply BUGs during runtime

2024-09-23 Thread Roger Pau Monné

On Mon, Sep 23, 2024 at 12:17:51PM +0100, Andrew Cooper wrote:
> On 20/09/2024 10:36 am, Roger Pau Monne wrote:
> > alternatives is used both at boot time, and when loading livepatch payloads.
> > While for the former it makes sense to panic, it's not useful for the 
> > later, as
> > for livepatches it's possible to fail to load the livepatch if alternatives
> > cannot be resolved and continue operating normally.
> >
> > Relax the BUGs in _apply_alternatives() to instead return an error code if
> > alternatives are applied after boot.
> >
> > Add a dummy livepatch test so the new logic can be assessed, with the change
> > applied the loading now fails with:
> >
> > alt table 82d040602024 -> 82d040602032
> > livepatch: xen_alternatives_fail applying alternatives failed: -22
> >
> > Signed-off-by: Roger Pau Monné 
> > ---
> >  xen/arch/x86/alternative.c | 29 --
> >  xen/arch/x86/include/asm/alternative.h |  3 ++-
> >  xen/common/livepatch.c | 10 +++-
> >  xen/test/livepatch/Makefile|  5 
> >  xen/test/livepatch/xen_alternatives_fail.c | 29 ++
> >  5 files changed, 66 insertions(+), 10 deletions(-)
> >  create mode 100644 xen/test/livepatch/xen_alternatives_fail.c
> >
> > diff --git a/xen/arch/x86/alternative.c b/xen/arch/x86/alternative.c
> > index 7824053c9d33..c0912cb12ea5 100644
> > --- a/xen/arch/x86/alternative.c
> > +++ b/xen/arch/x86/alternative.c
> > @@ -174,10 +174,13 @@ extern void *const __initdata_cf_clobber_end[];
> >   * The caller will set the "force" argument to true for the final
> >   * invocation, such that no CALLs/JMPs to NULL pointers will be left
> >   * around. See also the further comment below.
> > + *
> > + * Note the function cannot fail if system_state < SYS_STATE_active, it 
> > would
> > + * panic instead.  The return value is only meaningful for runtime usage.
> >   */
> > -static void init_or_livepatch _apply_alternatives(struct alt_instr *start,
> > -  struct alt_instr *end,
> > -  bool force)
> > +static int init_or_livepatch _apply_alternatives(struct alt_instr *start,
> > + struct alt_instr *end,
> > + bool force)
> >  {
> >  struct alt_instr *a, *base;
> >  
> > @@ -198,9 +201,17 @@ static void init_or_livepatch 
> > _apply_alternatives(struct alt_instr *start,
> >  uint8_t buf[MAX_PATCH_LEN];
> >  unsigned int total_len = a->orig_len + a->pad_len;
> >  
> > -BUG_ON(a->repl_len > total_len);
> > -BUG_ON(total_len > sizeof(buf));
> > -BUG_ON(a->cpuid >= NCAPINTS * 32);
> > +#define BUG_ON_BOOT(cond)   \
> > +if ( system_state < SYS_STATE_active )  \
> > +BUG_ON(cond);   \
> > +else if ( cond )\
> > +return -EINVAL;
> > +
> > +BUG_ON_BOOT(a->repl_len > total_len);
> > +BUG_ON_BOOT(total_len > sizeof(buf));
> > +BUG_ON_BOOT(a->cpuid >= NCAPINTS * 32);
> > +
> > +#undef BUG_ON_BOOT
> 
> The "error handling" before was horrible and this isn't really any better.
> 
> This function should always return int, printing more helpful info than
> that (a printk() says a thousand things better than a BUG()), and
> nmi_apply_alternatives can panic() rather than leaving these BUG()s here.

OK, will rework the logic here so it's the caller that panics (or not)
as necessary, and _apply_alternatives() always prints some error
message.

> As a tangent, the __must_check doesn't seem to have been applied to
> nmi_apply_alternatives(), but I'd suggest dropping the attribute; I
> don't think it adds much.

I didn't see the value in adding the attribute to
nmi_apply_alternatives(), as in that context _apply_alternatives()
would unconditionally panic instead of returning an error code.

> > diff --git a/xen/test/livepatch/xen_alternatives_fail.c 
> > b/xen/test/livepatch/xen_alternatives_fail.c
> > new file mode 100644
> > index ..01d289095a31
> > --- /dev/null
> > +++ b/xen/test/livepatch/xen_alternatives_fail.c
> > @@ -0,0 +1,29 @@
> > +/*
> > + * Copyright (c) 2024 Cloud Soft

Re: [PATCH 1/3] xen/livepatch: simplify and unify logic in prepare_payload()

2024-09-23 Thread Roger Pau Monné

On Sun, Sep 22, 2024 at 11:19:01AM +0200, Andrew Cooper wrote:
> On 20/09/2024 11:36 am, Roger Pau Monne wrote:
> > diff --git a/xen/common/livepatch.c b/xen/common/livepatch.c
> > index d93a556bcda2..cea47ffe4c84 100644
> > --- a/xen/common/livepatch.c
> > +++ b/xen/common/livepatch.c
> > @@ -647,15 +647,37 @@ static inline int livepatch_check_expectations(const 
> > struct payload *payload)
> >  nhooks = __sec->sec->sh_size / sizeof(*hook);  
> >\
> >  } while (0)
> >  
> > +static int fetch_buildid(const struct livepatch_elf_sec *sec,
> > + struct livepatch_build_id *id)
> 
> Is this really fetch?  I'd describe it as parse, more than fetch.

I can indeed change the naming.  I've used fetch because it 'fetches'
the contents of livepatch_build_id.

> > +{
> > +const Elf_Note *n = sec->load_addr;
> > +int rc;
> > +
> > +ASSERT(sec);
> 
> This needs to turn back into a runtime check.  Now, if a livepatch is
> missing one of the sections, we'll dereference NULL below, rather than
> leaving no data in the struct livepatch_build_id.

Loading should never get here without those sections being present,
check_special_sections() called earlier will return error if any of
the sections is not present, hence the ASSERT() is fine IMO.

I could do `if ( !sec ) { ASSERT_UNREACHABLE(); return -ENOENT; }`,
but given the code in check_special_sections() that checks the section
presence just ahead it seemed unnecessary convoluted.

Thanks, Roger.

Re: [PATCH v2] x86/traps: Re-enable interrupts after reading cr2 in the #PF handler

2024-09-20 Thread Roger Pau Monné

On Wed, Sep 18, 2024 at 02:05:54PM +0100, Alejandro Vallejo wrote:
> Moves sti directly after the cr2 read and immediately after the #PF
> handler.

I think you need to add some context about why this is needed, iow:
avoid corrupting %cr2 if a nested 3PF happens.

> While in the area, remove redundant q suffix to a movq in entry.S
> 
> Signed-off-by: Alejandro Vallejo 

Acked-by: Roger Pau Monné 

One nit below.

> ---
> Got lost alongside other patches. Here's the promised v2.
> 
> pipeline: 
> https://gitlab.com/xen-project/people/agvallejo/xen/-/pipelines/1458699639
> v1: 
> https://lore.kernel.org/xen-devel/20240911145823.12066-1-alejandro.vall...@cloud.com/
> 
> v2:
>   * (cosmetic), add whitespace after comma
>   * Added ASSERT(local_irq_is_enabled()) to do_page_fault()
>   * Only re-enable interrupts if they were enabled in the interrupted
> context.
> ---
>  xen/arch/x86/traps.c|  8 
>  xen/arch/x86/x86_64/entry.S | 20 
>  2 files changed, 24 insertions(+), 4 deletions(-)
> 
> diff --git a/xen/arch/x86/traps.c b/xen/arch/x86/traps.c
> index 708136f62558..a9c2c607eb08 100644
> --- a/xen/arch/x86/traps.c
> +++ b/xen/arch/x86/traps.c
> @@ -1600,6 +1600,14 @@ void asmlinkage do_page_fault(struct cpu_user_regs 
> *regs)
>  
>  addr = read_cr2();
>  
> +/*
> + * Don't re-enable interrupts if we were running an IRQ-off region when
> + * we hit the page fault, or we'll break that code.
> + */
> +ASSERT(!local_irq_is_enabled());
> +if ( regs->flags & X86_EFLAGS_IF )
> +local_irq_enable();
> +
>  /* fixup_page_fault() might change regs->error_code, so cache it here. */
>  error_code = regs->error_code;
>  
> diff --git a/xen/arch/x86/x86_64/entry.S b/xen/arch/x86/x86_64/entry.S
> index b8482de8ee5b..218e5ea85efb 100644
> --- a/xen/arch/x86/x86_64/entry.S
> +++ b/xen/arch/x86/x86_64/entry.S
> @@ -844,9 +844,9 @@ handle_exception_saved:
>  #elif !defined(CONFIG_PV)
>  ASSERT_CONTEXT_IS_XEN
>  #endif /* CONFIG_PV */
> -sti
> -1:  movq  %rsp,%rdi
> -movzbl UREGS_entry_vector(%rsp),%eax
> +.Ldispatch_handlers:

Maybe 'dispatch_exception', since it's only exceptions that are
handled here? dispatch_handlers seems a bit too generic, but no strong
opinion.

Thanks, Roger.

Re: [PATCH 2/3] xen/livepatch: do build-id check earlier

2024-09-20 Thread Roger Pau Monné

Ignore this one, I forgot to cleanup my output folder before
re-generating the patch series.  The correct 2/3 is:

"xen/livepatch: do Xen build-id check earlier"

Thanks, Roger.

Re: [PATCH] x86/shutdown: mask LVTERR ahead of offlining CPUs

2024-09-20 Thread Roger Pau Monné

On Thu, Sep 19, 2024 at 10:19:49PM +0200, Andrew Cooper wrote:
> On 19/09/2024 4:27 pm, Roger Pau Monne wrote:
> > Leaving active interrupt sources targeting APIC IDs that are offline can be
> > problematic on AMD machines during shutdown.
> 
> What exactly qualifies as "offline" here?
> 
> We don't self-INIT, so I'm guessing we leave the APIC in some kind of
> disabled state, especially given ...

I would think it's the APIC software disabling done in the SVR
register.  Otherwise it might be such disabling, plus putting the CPU
in the hlt loop with interrupts disabled.

> >   This is due to AMD local APICs
> > reporting Receive Accept Errors when a message is not handled by any APIC on
> > the system.
> 
> ... this.
> 
> 
> >   Note Intel SDM states that Receive Accept Errors are only reported
> > on P6 family and Pentium processors.
> >
> > If at shutdown an active interrupt source is left targeting an offline APIC 
> > ID,
> > the following can be seen on AMD boxes:
> >
> > Hardware Dom0 shutdown: rebooting machine
> > APIC error on CPU0: 00(08), Receive accept error
> > APIC error on CPU0: 08(08), Receive accept error
> > APIC error on CPU0: 08(08), Receive accept error
> > APIC error on CPU0: 08(08), Receive accept error
> > APIC error on CPU0: 08(08), Receive accept error
> > APIC error on CPU0: 08(08), Receive accept error
> > APIC error on CPU0: 08(08), Receive accept error
> > APIC error on CPU0: 08(08), Receive accept error
> > APIC error on CPU0: 08(08), Receive accept error
> > APIC error on CPU0: 08(08), Receive accept error
> > APIC error on CPU0: 08(08), Receive accept error
> > APIC error on CPU0: 08(08), Receive accept error
> > [...]
> >
> > Thus preventing the shutdown.  In the above case the interrupt source that 
> > was
> > left targeting an offline APIC ID was the serial console one
> 
> While masking LVTERR might allow more progress, it's not a wise approach.
> 
> The real issue here is that the UART driver is still active as we're
> trying to tear the system down.  If nothing else, it's rude to leave an
> active interrupt source for the kexec kernel to deal with.

While we could attempt to shutdown interrupts on the clean
shutdown/reboot paths, do we want to attempt doing the same on the
crash path?

There's an increased risk of further fallout and inability to jump
into the crash kernel as more logic is added to the hand-over path.

> IMO, we should shut the UART down like other devices, and move it back
> into polled mode.

Yeah, dealing with the UART should be doable, but we have no guarantee
that dom0 will have unmapped all interrupts it owns before shutdown,
much less when crashing.  So we could attempt to mitigate, but it's
possibly a non-trivial amount of logic to be added.

> > , so printing of
> > the local APIC ESR lead to more unhandled messages on the APIC bus, leaving 
> > the
> > host unable to make progress.
> 
> Minor note, but there's not been an APIC bus in decades.  Here, I'd
> simply say "lead to more console IRQs, and more errors".

I think some of the manuals that I have still mention the "APIC bus".

> >
> > Mask LVTERR ahead of bringing any CPU offline, thus avoiding receiving
> > interrupts for any APIC reported errors.  Note that other local APIC errors
> > will also be ignored.  At the point where the masking is done it's unlikely 
> > for
> > any reported APIC errors to be meaningful anyway, the system is about to 
> > reboot
> > or power off.
> >
> > The LVTERR masking could be limited to AMD, but there's no guarantee that in
> > the future Intel parts also start reporting the error, thus hitting the same
> > issue.  Unifying behavior across vendors when possible seems more desirable.
> > The local APIC gets wholesale disabled as part of offlining the CPUs and
> > bringing the system down in __stop_this_cpu().
> >
> > Signed-off-by: Roger Pau Monné 
> > ---
> > Note a similar issue possibly exists in the nmi_shootdown_cpus() path, 
> > however
> > that being a crash path it is more complicated to uniformly mask the LVTERR
> > strictly ahead of offlining CPUs.  That path is also more resilient AFAICT, 
> > as
> > nmi_shootdown_cpus() disables interrupts at the start (so no LVTERR 
> > interrupt
> > will be handled) and the CPUs are stopped using an NMI, which would bypass 
> > any
> > LVTERR processing.
> > ---
> >  xen/arch/x86/smp.c | 17 +
> >  1 file changed, 17 insertions(+)
> >
> > diff -

Re: [XEN PATCH] x86/hvm: make ACPI PM timer support optional

2024-09-18 Thread Roger Pau Monné

On Wed, Sep 18, 2024 at 04:35:21PM +0300, Sergiy Kibrik wrote:
> 18.09.24 12:41, Roger Pau Monné:
> > On Wed, Sep 18, 2024 at 12:29:39PM +0300, Sergiy Kibrik wrote:
> > > 16.09.24 22:57, Stefano Stabellini:
> > > > On Mon, 16 Sep 2024, Roger Pau Monné wrote:
> > > > > On Mon, Sep 16, 2024 at 09:37:57AM +0300, Sergiy Kibrik wrote:
> > > > > > Introduce config option X86_PMTIMER so that pmtimer driver can be 
> > > > > > disabled on
> > > > > > systems that don't need it.
> > > > > 
> > > > > Same comment as in the VGA patch, you need to handle the user passing
> > > > > X86_EMU_PM.  It's not OK to just ignore the flag if the hypervisor is
> > > > > built without ACPI PM timer support.
> > > > 
> > > > I also think that the flag should not be ignored. I think that Xen
> > > > should return error if a user is passing a domain feature not supported
> > > > by a particular version of the Xen build. I don't think that libxl needs
> > > > to be changed as part of this patch necessarily.
> > > 
> > > It looks like toolstack always leaves X86_EMU_PM bit enabled, so that part
> > > may also require changes.
> > 
> > I think you will be unable to create HVM guests, but that's kind of
> > expected if ACPI PM emulation is removed from the hypervisor (it won't
> > be an HVM guest anymore if it doesn't have ACPI PM).
> > 
> > PVH guest don't set X86_EMU_PM so you should be able to create those
> > fine.
> > 
> 
> would the check like this be enough?:
> 
> --- a/xen/arch/x86/domain.c
> +++ b/xen/arch/x86/domain.c
> @@ -758,6 +758,9 @@ static bool emulation_flags_ok(const struct domain *d,
> uint32_t emflags)
>   (X86_EMU_ALL & ~(X86_EMU_VPCI | X86_EMU_USE_PIRQ)) &&
>   emflags != X86_EMU_LAPIC )
>  return false;
> +if ( !is_hardware_domain(d) &&

I don't think you need to gate this to the hardware domain?  IOW: if
it's build time disabled, it's not available for the hardware domain
either.

Seeing as there are several options you want to disable at build time, it
might be best to keep a mask, something like:

const uint32_t disabled_mask =
(!IS_ENABLED(CONFIG_X86_PMTIMER) ? X86_EMU_PM  : 0) |
(!IS_ENABLED(CONFIG_X86_STDVGA)  ? X86_EMU_VGA : 0);

And then:

if ( emflags & disabled_mask )
return false;

You also want to adjust the has_foo() macros to short-circuit them:

#define has_vpm(d) (IS_ENABLED(CONFIG_X86_PMTIMER) &&
!!((d)->arch.emulation_flags & X86_EMU_PM))

Also all those Kconfig options likely want to be inside of a separate
Kconfig section, rather than mixed with the rest of generic x86 arch
options.  It would nice to have all the options grouped inside of a
"Emulated device support" sub section or similar.

Thanks, Roger.

Re: [XEN PATCH] x86/hvm: make ACPI PM timer support optional

2024-09-18 Thread Roger Pau Monné

On Wed, Sep 18, 2024 at 12:29:39PM +0300, Sergiy Kibrik wrote:
> 16.09.24 22:57, Stefano Stabellini:
> > On Mon, 16 Sep 2024, Roger Pau Monné wrote:
> > > On Mon, Sep 16, 2024 at 09:37:57AM +0300, Sergiy Kibrik wrote:
> > > > Introduce config option X86_PMTIMER so that pmtimer driver can be 
> > > > disabled on
> > > > systems that don't need it.
> > > 
> > > Same comment as in the VGA patch, you need to handle the user passing
> > > X86_EMU_PM.  It's not OK to just ignore the flag if the hypervisor is
> > > built without ACPI PM timer support.
> > 
> > I also think that the flag should not be ignored. I think that Xen
> > should return error if a user is passing a domain feature not supported
> > by a particular version of the Xen build. I don't think that libxl needs
> > to be changed as part of this patch necessarily.
> 
> It looks like toolstack always leaves X86_EMU_PM bit enabled, so that part
> may also require changes.

I think you will be unable to create HVM guests, but that's kind of
expected if ACPI PM emulation is removed from the hypervisor (it won't
be an HVM guest anymore if it doesn't have ACPI PM).

PVH guest don't set X86_EMU_PM so you should be able to create those
fine.

Thanks, Roger.

Re: [PATCH v7 1/2] x86/time: introduce command line option to select wallclock

2024-09-16 Thread Roger Pau Monné

On Mon, Sep 16, 2024 at 03:20:56PM +0200, Marek Marczykowski-Górecki wrote:
> On Mon, Sep 16, 2024 at 02:11:08PM +0100, Andrew Cooper wrote:
> > On 13/09/2024 8:59 am, Roger Pau Monne wrote:
> > > diff --git a/docs/misc/xen-command-line.pandoc 
> > > b/docs/misc/xen-command-line.pandoc
> > > index 959cf45b55d9..2a9b3b9b8975 100644
> > > --- a/docs/misc/xen-command-line.pandoc
> > > +++ b/docs/misc/xen-command-line.pandoc
> > > @@ -2816,6 +2816,27 @@ vwfi to `native` reduces irq latency 
> > > significantly. It can also lead to
> > >  suboptimal scheduling decisions, but only when the system is
> > >  oversubscribed (i.e., in total there are more vCPUs than pCPUs).
> > >  
> > > +### wallclock (x86)
> > > +> `= auto | xen | cmos | efi`
> > > +
> > > +> Default: `auto`
> > > +
> > > +Allow forcing the usage of a specific wallclock source.
> > > +
> > > + * `auto` let the hypervisor select the clocksource based on internal
> > > +   heuristics.
> > > +
> > > + * `xen` force usage of the Xen shared_info wallclock when booted as a 
> > > Xen
> > > +   guest.  This option is only available if the hypervisor was compiled 
> > > with
> > > +   `CONFIG_XEN_GUEST` enabled.
> > > +
> > > + * `cmos` force usage of the CMOS RTC wallclock.
> > > +
> > > + * `efi` force usage of the EFI_GET_TIME run-time method when booted 
> > > from EFI
> > > +   firmware.
> > 
> > For both `xen` and `efi`, something should be said about "if selected
> > and not satisfied, Xen falls back to other heuristics".

There's a line just below that notes this: "If the selected option is
invalid or not available Xen will default to `auto`."  I think it's
clearer than having to comment on every specific option.

> > > +
> > > +If the selected option is invalid or not available Xen will default to 
> > > `auto`.
> > 
> > I'm afraid that I'm firmly of the opinion that "auto" on the cmdline is
> > unnecessary complexity.  Auto is the default, and doesn't need
> > specifying explicitly.  That in turn simplifies ...
> 
> What about overriding earlier choice? For example overriding a built-in
> cmdline? That said, with the current code, the same can be achieved with
> wallclock=foo, and living with the warning in boot log...

It's IMO weird to ask the users to use wallclock=foo to override a
previous command line wallclock option and fallback to the default
behavior.

> > > +
> > >  ### watchdog (x86)
> > >  > `= force | `
> > >  
> > > diff --git a/xen/arch/x86/time.c b/xen/arch/x86/time.c
> > > index 29b026735e5d..e4751684951e 100644
> > > --- a/xen/arch/x86/time.c
> > > +++ b/xen/arch/x86/time.c
> > > @@ -1552,6 +1552,37 @@ static const char *__init 
> > > wallclock_type_to_string(void)
> > >  return "";
> > >  }
> > >  
> > > +static int __init cf_check parse_wallclock(const char *arg)
> > > +{
> > > +wallclock_source = WALLCLOCK_UNSET;
> > > +
> > > +if ( !arg )
> > > +return -EINVAL;
> > > +
> > > +if ( !strcmp("auto", arg) )
> > > +ASSERT(wallclock_source == WALLCLOCK_UNSET);
> > 
> > ... this.
> > 
> > Hitting this assert will manifest as a system reboot/hang with no
> > information on serial/VGA, because all of this runs prior to getting up
> > the console.  You only get to see it on a PVH boot because we bodge the
> > Xen console into default-existence.
> 
> This assert is no-op as wallclock_source is unconditionally set to 
> WALLCLOCK_UNSET few lines above.

As mentioned to Jan - I find it nicer than adding an empty statement.

> > So, ASSERT()/etc really need avoiding wherever possible in cmdline parsing.
> > 
> > In this case, all it serves to do is break examples like `wallclock=xen
> > wallclock=auto` case, which is unlike our latest-takes-precedence
> > behaviour everywhere else.
> > 
> > > +else if ( !strcmp("xen", arg) )
> > > +{
> > > +if ( !xen_guest )
> > 
> > We don't normally treat this path as an error when parsing (we know what
> > it is, even if we can't action it).  Instead, there's no_config_param()
> > to be more friendly (for PVH at least).
> > 
> > It's a bit awkward, but this should do:
> > 
> >     {
> > #ifdef CONFIG_XEN_GUEST
> >         wallclock_source = WALLCLOCK_XEN;
> > #else
> >         no_config_param("XEN_GUEST", "wallclock", s, ss);
> > #endif
> >     }
> 
> Can you boot the binary build with CONFIG_XEN_GUEST=y as native? If so,
> the above will not be enough, a runtime check is needed anyway.
> 
> > There probably wants to be something similar for EFI, although it's not
> > a plain CONFIG so it might be more tricky.
> 
> It needs to be runtime check here even more. Not only because of
> different boot modes, but due to interaction with efi=no-rs (or any
> other reason for not having runtime services). See the comment there.

I agree with Marek, no_config_param() is not enough here: Xen needs to
ensure it has been booted as a Xen guest, or that EFI run-time
services are enabled in order to ensure the selected clock source is
available.

Thanks, Roger.

Re: [PATCH v7 1/2] x86/time: introduce command line option to select wallclock

2024-09-16 Thread Roger Pau Monné

On Fri, Sep 13, 2024 at 02:38:14PM +0200, Jan Beulich wrote:
> On 13.09.2024 09:59, Roger Pau Monne wrote:
> > --- a/xen/arch/x86/time.c
> > +++ b/xen/arch/x86/time.c
> > @@ -1552,6 +1552,37 @@ static const char *__init 
> > wallclock_type_to_string(void)
> >  return "";
> >  }
> >  
> > +static int __init cf_check parse_wallclock(const char *arg)
> > +{
> > +wallclock_source = WALLCLOCK_UNSET;
> 
> With this ...
> 
> > +if ( !arg )
> > +return -EINVAL;
> > +
> > +if ( !strcmp("auto", arg) )
> > +ASSERT(wallclock_source == WALLCLOCK_UNSET);
> 
> ... I'm not convinced this is (still) needed.

It reduces to an empty statement in release builds, and is IMO clearer
code wise.  I could replace the assert with a comment, but IMO the
assert conveys the same information in a more compact format and
provides extra safety in case the code is changed and wallclock_source
is no longer initialized to the expected value.

Thanks, Roger.

Re: [XEN PATCH] x86/hvm: make ACPI PM timer support optional

2024-09-16 Thread Roger Pau Monné

On Mon, Sep 16, 2024 at 09:37:57AM +0300, Sergiy Kibrik wrote:
> Introduce config option X86_PMTIMER so that pmtimer driver can be disabled on
> systems that don't need it.

Same comment as in the VGA patch, you need to handle the user passing
X86_EMU_PM.  It's not OK to just ignore the flag if the hypervisor is
built without ACPI PM timer support.

Regards, Roger.

Re: [XEN PATCH] x86/hvm: make stdvga support optional

2024-09-16 Thread Roger Pau Monné

On Mon, Sep 16, 2024 at 09:37:16AM +0300, Sergiy Kibrik wrote:
> 12.09.24 12:14, Roger Pau Monné:
> > Shouldn't Xen report an error if a user attempts to create a domain
> > with X86_EMU_VGA set in emulation_flags, but stdvga has been built
> > time disabled?
> 
> I'm afraid this can accidentally render the system unbootable, because it
> looks like toolstack always sets X86_EMU_VGA flag.

Not for PV or PVH guests.  It won't render the system unbootable, it
would just leave it unable to create HVM guests.  dom0 however, and PV
or PVH guests don't use the X86_EMU_VGA flag.

As pointed out by Jan, we need slightly better integration with the
toolstack.  IMO if we want to pursue this route we need a way for Xen
to advertise which X86_EMU_* are supported at least.

Thanks, Roger.

Re: [PATCH v2] Fix two problems in the microcode parsers

2024-09-13 Thread Roger Pau Monné

On Thu, Sep 12, 2024 at 05:11:32PM -0400, Demi Marie Obenour wrote:
> The microcode might come from a questionable source, so it is necessary
> for the parsers to treat it as untrusted.  The CPU will validate the
> microcode before applying it, so loading microcode from unofficial
> sources is actually a legitimate thing to do in some cases.

As said by Jan I think this needs expanding as to what's actually
being fixed, to give readers context.

Additionally you want to add one or more "Fixes" tags if this is a
bugfix.

Thanks, Roger.

Re: [PATCH] x86/mm: undo type change of partial_flags

2024-09-12 Thread Roger Pau Monné

On Thu, Sep 12, 2024 at 05:38:17PM +0200, Jan Beulich wrote:
> Clang dislikes the boolean type combined with the field being set using
> PTF_partial_set.
> 
> Fixes: 5ffe6d4a02e0 ("types: replace remaining uses of s16")
> Signed-off-by: Jan Beulich 

Acked-by: Roger Pau Monné 

Thanks, Roger.

Re: xen | Failed pipeline for staging | 221f2748

2024-09-12 Thread Roger Pau Monné

On Thu, Sep 12, 2024 at 05:13:04PM +0200, Jan Beulich wrote:
> On 12.09.2024 17:08, Roger Pau Monné wrote:
> > On Thu, Sep 12, 2024 at 04:30:29PM +0200, Jan Beulich wrote:
> >> On 12.09.2024 14:52, GitLab wrote:
> >>>
> >>>
> >>> Pipeline #1450753094 has failed!
> >>>
> >>> Project: xen ( https://gitlab.com/xen-project/hardware/xen )
> >>> Branch: staging ( 
> >>> https://gitlab.com/xen-project/hardware/xen/-/commits/staging )
> >>>
> >>> Commit: 221f2748 ( 
> >>> https://gitlab.com/xen-project/hardware/xen/-/commit/221f2748e8dabe8361b8cdfcffbeab9102c4c899
> >>>  )
> >>> Commit Message: blkif: reconcile protocol specification with in...
> >>> Commit Author: Roger Pau Monné
> >>> Committed by: Jan Beulich ( https://gitlab.com/jbeulich )
> >>>
> >>>
> >>> Pipeline #1450753094 ( 
> >>> https://gitlab.com/xen-project/hardware/xen/-/pipelines/1450753094 ) 
> >>> triggered by Jan Beulich ( https://gitlab.com/jbeulich )
> >>> had 13 failed jobs.
> >>>
> >>> Job #7809027717 ( 
> >>> https://gitlab.com/xen-project/hardware/xen/-/jobs/7809027717/raw )
> >>>
> >>> Stage: build
> >>> Name: ubuntu-24.04-x86_64-clang
> >>> Job #7809027747 ( 
> >>> https://gitlab.com/xen-project/hardware/xen/-/jobs/7809027747/raw )
> >>>
> >>> Stage: build
> >>> Name: opensuse-tumbleweed-clang
> >>> Job #7809027539 ( 
> >>> https://gitlab.com/xen-project/hardware/xen/-/jobs/7809027539/raw )
> >>>
> >>> Stage: build
> >>> Name: debian-bookworm-clang-debug
> >>
> >> I picked this one as example - Clang dislikes the switch to bool in
> >>
> >> --- a/xen/arch/x86/include/asm/mm.h
> >> +++ b/xen/arch/x86/include/asm/mm.h
> >> @@ -286,8 +286,8 @@ struct page_info
> >>  struct {
> >>  u16 nr_validated_ptes:PAGETABLE_ORDER + 1;
> >>  u16 :16 - PAGETABLE_ORDER - 1 - 1;
> >> -u16 partial_flags:1;
> >> -s16 linear_pt_count;
> >> +bool partial_flags:1;
> >> +int16_t linear_pt_count;
> >>  };
> >>  
> >>  /*
> >>
> >> for places where that field's set using PTF_partial_set:
> >>
> >> arch/x86/mm.c:1582:35: error: converting the result of '<<' to a boolean 
> >> always evaluates to true [-Werror,-Wtautological-constant-compare]
> >> page->partial_flags = PTF_partial_set;
> >>   ^
> >> I wonder why we're not using plain "true" there. Alternatively the change 
> >> to
> >> bool would need undoing.
> > 
> > I'm hitting this locally when building with clang.
> > 
> > I find it confusing that the partial_flags field cannot possibly be a
> > flags field, for it having a width of 1 bit.
> 
> I meanwhile guess the idea may have been that the field could be widened
> if needed, and the low bit would then retain its present meaning. How
> well that would work out if a need for that arose I can't tell of course.
> 
> > I think my proposal would be to rename to partially_validated (or
> > similar) and set it using `true`, but that would also imply re-writing
> > part of the comment in struct page_info definition.
> 
> This may have been named just "partial" originally. Yet with the above
> maybe we really should switch back to uint16_t (for the time being; I'm
> unconvinced of the use of fixed-width types here, or in general when it
> comes to bitfields).

Seeing it's not straightforward how to fix this, I think it's best if
for the time being we revert this part of the change, going back to
use uint16_t for the field.

Thanks, Roger.

Re: xen | Failed pipeline for staging | 221f2748

2024-09-12 Thread Roger Pau Monné

On Thu, Sep 12, 2024 at 04:30:29PM +0200, Jan Beulich wrote:
> On 12.09.2024 14:52, GitLab wrote:
> > 
> > 
> > Pipeline #1450753094 has failed!
> > 
> > Project: xen ( https://gitlab.com/xen-project/hardware/xen )
> > Branch: staging ( 
> > https://gitlab.com/xen-project/hardware/xen/-/commits/staging )
> > 
> > Commit: 221f2748 ( 
> > https://gitlab.com/xen-project/hardware/xen/-/commit/221f2748e8dabe8361b8cdfcffbeab9102c4c899
> >  )
> > Commit Message: blkif: reconcile protocol specification with in...
> > Commit Author: Roger Pau Monné
> > Committed by: Jan Beulich ( https://gitlab.com/jbeulich )
> > 
> > 
> > Pipeline #1450753094 ( 
> > https://gitlab.com/xen-project/hardware/xen/-/pipelines/1450753094 ) 
> > triggered by Jan Beulich ( https://gitlab.com/jbeulich )
> > had 13 failed jobs.
> > 
> > Job #7809027717 ( 
> > https://gitlab.com/xen-project/hardware/xen/-/jobs/7809027717/raw )
> > 
> > Stage: build
> > Name: ubuntu-24.04-x86_64-clang
> > Job #7809027747 ( 
> > https://gitlab.com/xen-project/hardware/xen/-/jobs/7809027747/raw )
> > 
> > Stage: build
> > Name: opensuse-tumbleweed-clang
> > Job #7809027539 ( 
> > https://gitlab.com/xen-project/hardware/xen/-/jobs/7809027539/raw )
> > 
> > Stage: build
> > Name: debian-bookworm-clang-debug
> 
> I picked this one as example - Clang dislikes the switch to bool in
> 
> --- a/xen/arch/x86/include/asm/mm.h
> +++ b/xen/arch/x86/include/asm/mm.h
> @@ -286,8 +286,8 @@ struct page_info
>  struct {
>  u16 nr_validated_ptes:PAGETABLE_ORDER + 1;
>  u16 :16 - PAGETABLE_ORDER - 1 - 1;
> -u16 partial_flags:1;
> -s16 linear_pt_count;
> +bool partial_flags:1;
> +int16_t linear_pt_count;
>  };
>  
>  /*
> 
> for places where that field's set using PTF_partial_set:
> 
> arch/x86/mm.c:1582:35: error: converting the result of '<<' to a boolean 
> always evaluates to true [-Werror,-Wtautological-constant-compare]
> page->partial_flags = PTF_partial_set;
>   ^
> I wonder why we're not using plain "true" there. Alternatively the change to
> bool would need undoing.

I'm hitting this locally when building with clang.

I find it confusing that the partial_flags field cannot possibly be a
flags field, for it having a width of 1 bit.

I think my proposal would be to rename to partially_validated (or
similar) and set it using `true`, but that would also imply re-writing
part of the comment in struct page_info definition.

I can prepare a patch if this is deemed appropriate.

Thanks, Roger.

Re: [PATCH v6 1/2] x86/time: introduce command line option to select wallclock

2024-09-12 Thread Roger Pau Monné

On Thu, Sep 12, 2024 at 03:30:29PM +0200, Marek Marczykowski-Górecki wrote:
> On Thu, Sep 12, 2024 at 02:56:55PM +0200, Roger Pau Monné wrote:
> > On Thu, Sep 12, 2024 at 01:57:00PM +0200, Jan Beulich wrote:
> > > On 12.09.2024 13:15, Roger Pau Monne wrote:
> > > > --- a/xen/arch/x86/time.c
> > > > +++ b/xen/arch/x86/time.c
> > > > @@ -1552,6 +1552,35 @@ static const char *__init 
> > > > wallclock_type_to_string(void)
> > > >  return "";
> > > >  }
> > > >  
> > > > +static int __init cf_check parse_wallclock(const char *arg)
> > > > +{
> > > > +if ( !arg )
> > > > +return -EINVAL;
> > > > +
> > > > +if ( !strcmp("auto", arg) )
> > > > +wallclock_source = WALLCLOCK_UNSET;
> > > > +else if ( !strcmp("xen", arg) )
> > > > +{
> > > > +if ( !xen_guest )
> > > > +return -EINVAL;
> > > > +
> > > > +wallclock_source = WALLCLOCK_XEN;
> > > > +}
> > > > +else if ( !strcmp("cmos", arg) )
> > > > +wallclock_source = WALLCLOCK_CMOS;
> > > > +else if ( !strcmp("efi", arg) )
> > > > +/*
> > > > + * Checking if run-time services are available must be done 
> > > > after
> > > > + * command line parsing.
> > > > + */
> > > > +wallclock_source = WALLCLOCK_EFI;
> > > > +else
> > > > +return -EINVAL;
> > > > +
> > > > +return 0;
> > > > +}
> > > > +custom_param("wallclock", parse_wallclock);
> > > > +
> > > >  static void __init probe_wallclock(void)
> > > >  {
> > > >  ASSERT(wallclock_source == WALLCLOCK_UNSET);
> > > > @@ -2527,7 +2556,15 @@ int __init init_xen_time(void)
> > > >  
> > > >  open_softirq(TIME_CALIBRATE_SOFTIRQ, local_time_calibration);
> > > >  
> > > > -probe_wallclock();
> > > > +/*
> > > > + * EFI run time services can be disabled from the command line, 
> > > > hence the
> > > > + * check for them cannot be done as part of the wallclock option 
> > > > parsing.
> > > > + */
> > > > +if ( wallclock_source == WALLCLOCK_EFI && !efi_enabled(EFI_RS) )
> > > > +wallclock_source = WALLCLOCK_UNSET;
> > > > +
> > > > +if ( wallclock_source == WALLCLOCK_UNSET )
> > > > +probe_wallclock();
> > > 
> > > I don't want to stand in the way, and I can live with this form, so I'd 
> > > like to
> > > ask that EFI folks or Andrew provide the necessary A-b / R-b. I'd like to 
> > > note
> > > though that there continue to be quirks here. They may not be affecting 
> > > the
> > > overall result as long as we have only three possible wallclocks, but 
> > > there
> > > might be problems if a 4th one appeared.
> > > 
> > > With the EFI_RS check in the command line handler and assuming the 
> > > default case
> > > of no "efi=no-rs" on the command line, EFI_RS may still end up being off 
> > > by the
> > > time the command line is being parsed. With "wallclock=cmos 
> > > wallclock=efi" this
> > > would result in no probing and "cmos" chosen if there was that check in 
> > > place.
> > > With the logic as added here there will be probing in that case. Replace 
> > > "cmos"
> > > by a hypothetical non-default 4th wallclock type (i.e. probing picking 
> > > "cmos"),
> > > and I expect you can see how the result would then not necessarily be as
> > > expected.
> > 
> > My expectation would be that if "wallclock=cmos wallclock=efi" is used
> > the last option overrides any previous one, and hence if that last
> > option is not valid the logic will fallback to the default selection
> > (in this case to probing).
> 
> That would be my expectation too. If some kind of preference would be
> expected, it should looks like wallclock=efi,cmos, but I don't think we
> need that.
> 
> > Thinking about this, it might make sense to unconditionally set
> > wallclock_source = WALLCLOCK_UNSET at the start of parse_wallclock()
> > to avoid previous instances carrying over if later ones are not valid.
> 
> This may be a good idea. But more importantly, the behavior should be
> included in the option documentation (that if a selected value is not
> available, it fallback to auto). And maybe a log message when that
> happens (but I'm okay with skipping this one, as selected wallclock
> source is logged already)?

Thanks, would you be fine with:

### wallclock (x86)
> `= auto | xen | cmos | efi`

> Default: `auto`

Allow forcing the usage of a specific wallclock source.

 * `auto` let the hypervisor select the clocksource based on internal
   heuristics.

 * `xen` force usage of the Xen shared_info wallclock when booted as a Xen
   guest.  This option is only available if the hypervisor was compiled with
   `CONFIG_XEN_GUEST` enabled.

 * `cmos` force usage of the CMOS RTC wallclock.

 * `efi` force usage of the EFI_GET_TIME run-time method when booted from EFI
   firmware.

If the selected option is not available Xen will default to `auto`.

Regards, Roger.

Re: [PATCH] xen/keyhandler: Move key_table[] into __ro_after_init

2024-09-12 Thread Roger Pau Monné

On Thu, Sep 12, 2024 at 01:51:54PM +0100, Andrew Cooper wrote:
> All registration is done at boot.  Almost...
> 
> iommu_dump_page_tables() is registered in iommu_hwdom_init(), which is called
> twice when LATE_HWDOM is in use.
> 
> register_irq_keyhandler() has an ASSERT() guarding againt multiple
> registration attempts, and the absence of bug reports hints at how many
> configurations use LATE_HWDOM in practice.
> 
> Move the registration into iommu_setup() just after printing the overall
> status of the IOMMU.  For starters, the hardware domain is specifically
> excluded by iommu_dump_page_tables().
> 
> ept_dump_p2m_table is registered in setup_ept_dump() which is non-__init, but
> whose sole caller, start_vmx(), is __init.  Move setup_ept_dump() to match.
> 
> With these two tweeks, all keyhandler reigstration is from __init functions,
> so register_{,irq_}keyhandler() can move, and key_table[] can become
> __ro_after_init.
> 
> No practical change.
> 
> Signed-off-by: Andrew Cooper 

Acked-by: Roger Pau Monné 

> ---
> CC: Jan Beulich 
> CC: Roger Pau Monné 
> CC: Daniel Smith 
> CC: Jason Andryuk 
> 
> CC'ing some OpenXT folks just FYI.
> ---
>  xen/arch/x86/mm/p2m-ept.c   |  2 +-
>  xen/common/keyhandler.c | 10 +-
>  xen/drivers/passthrough/iommu.c |  4 ++--
>  3 files changed, 8 insertions(+), 8 deletions(-)
> 
> diff --git a/xen/arch/x86/mm/p2m-ept.c b/xen/arch/x86/mm/p2m-ept.c
> index 2ea574ca6aef..21728397f9ac 100644
> --- a/xen/arch/x86/mm/p2m-ept.c
> +++ b/xen/arch/x86/mm/p2m-ept.c
> @@ -1497,7 +1497,7 @@ static void cf_check ept_dump_p2m_table(unsigned char 
> key)
>  rcu_read_unlock(&domlist_read_lock);
>  }
>  
> -void setup_ept_dump(void)
> +void __init setup_ept_dump(void)

I would be tempted to just drop this function altogether and open-code
the call to register_keyhandler().

Thanks, Roger.

Re: [PATCH v6 1/2] x86/time: introduce command line option to select wallclock

2024-09-12 Thread Roger Pau Monné

On Thu, Sep 12, 2024 at 01:57:00PM +0200, Jan Beulich wrote:
> On 12.09.2024 13:15, Roger Pau Monne wrote:
> > --- a/xen/arch/x86/time.c
> > +++ b/xen/arch/x86/time.c
> > @@ -1552,6 +1552,35 @@ static const char *__init 
> > wallclock_type_to_string(void)
> >  return "";
> >  }
> >  
> > +static int __init cf_check parse_wallclock(const char *arg)
> > +{
> > +if ( !arg )
> > +return -EINVAL;
> > +
> > +if ( !strcmp("auto", arg) )
> > +wallclock_source = WALLCLOCK_UNSET;
> > +else if ( !strcmp("xen", arg) )
> > +{
> > +if ( !xen_guest )
> > +return -EINVAL;
> > +
> > +wallclock_source = WALLCLOCK_XEN;
> > +}
> > +else if ( !strcmp("cmos", arg) )
> > +wallclock_source = WALLCLOCK_CMOS;
> > +else if ( !strcmp("efi", arg) )
> > +/*
> > + * Checking if run-time services are available must be done after
> > + * command line parsing.
> > + */
> > +wallclock_source = WALLCLOCK_EFI;
> > +else
> > +return -EINVAL;
> > +
> > +return 0;
> > +}
> > +custom_param("wallclock", parse_wallclock);
> > +
> >  static void __init probe_wallclock(void)
> >  {
> >  ASSERT(wallclock_source == WALLCLOCK_UNSET);
> > @@ -2527,7 +2556,15 @@ int __init init_xen_time(void)
> >  
> >  open_softirq(TIME_CALIBRATE_SOFTIRQ, local_time_calibration);
> >  
> > -probe_wallclock();
> > +/*
> > + * EFI run time services can be disabled from the command line, hence 
> > the
> > + * check for them cannot be done as part of the wallclock option 
> > parsing.
> > + */
> > +if ( wallclock_source == WALLCLOCK_EFI && !efi_enabled(EFI_RS) )
> > +wallclock_source = WALLCLOCK_UNSET;
> > +
> > +if ( wallclock_source == WALLCLOCK_UNSET )
> > +probe_wallclock();
> 
> I don't want to stand in the way, and I can live with this form, so I'd like 
> to
> ask that EFI folks or Andrew provide the necessary A-b / R-b. I'd like to note
> though that there continue to be quirks here. They may not be affecting the
> overall result as long as we have only three possible wallclocks, but there
> might be problems if a 4th one appeared.
> 
> With the EFI_RS check in the command line handler and assuming the default 
> case
> of no "efi=no-rs" on the command line, EFI_RS may still end up being off by 
> the
> time the command line is being parsed. With "wallclock=cmos wallclock=efi" 
> this
> would result in no probing and "cmos" chosen if there was that check in place.
> With the logic as added here there will be probing in that case. Replace 
> "cmos"
> by a hypothetical non-default 4th wallclock type (i.e. probing picking 
> "cmos"),
> and I expect you can see how the result would then not necessarily be as
> expected.

My expectation would be that if "wallclock=cmos wallclock=efi" is used
the last option overrides any previous one, and hence if that last
option is not valid the logic will fallback to the default selection
(in this case to probing).

Thinking about this, it might make sense to unconditionally set
wallclock_source = WALLCLOCK_UNSET at the start of parse_wallclock()
to avoid previous instances carrying over if later ones are not valid.

Thanks, Roger.

Re: [PATCH v3] blkif: reconcile protocol specification with in-use implementations

2024-09-12 Thread Roger Pau Monné

On Thu, Sep 12, 2024 at 02:06:23PM +0200, Jan Beulich wrote:
> On 12.09.2024 11:57, Roger Pau Monne wrote:
> > Current blkif implementations (both backends and frontends) have all slight
> > differences about how they handle the 'sector-size' xenstore node, and how
> > other fields are derived from this value or hardcoded to be expressed in 
> > units
> > of 512 bytes.
> > 
> > To give some context, this is an excerpt of how different implementations 
> > use
> > the value in 'sector-size' as the base unit for to other fields rather than
> > just to set the logical sector size of the block device:
> > 
> > │ sectors xenbus node │ requests sector_number │ 
> > requests {first,last}_sect
> > ┼─┼┼───
> > FreeBSD blk{front,back} │ sector-size │  sector-size   │
> >512
> > ┼─┼┼───
> > Linux blk{front,back}   │ 512 │  512   │
> >512
> > ┼─┼┼───
> > QEMU blkback│ sector-size │  sector-size   │
> >sector-size
> > ┼─┼┼───
> > Windows blkfront│ sector-size │  sector-size   │
> >sector-size
> > ┼─┼┼───
> > MiniOS  │ sector-size │  512   │
> >512
> > 
> > An attempt was made by 67e1c050e36b in order to change the base units of the
> > request fields and the xenstore 'sectors' node.  That however only lead to 
> > more
> > confusion, as the specification now clearly diverged from the reference
> > implementation in Linux.  Such change was only implemented for QEMU Qdisk
> > and Windows PV blkfront.
> > 
> > Partially revert to the state before 67e1c050e36b while adjusting the
> > documentation for 'sectors' to match what it used to be previous to
> > 2fa701e5346d:
> > 
> >  * Declare 'feature-large-sector-size' deprecated.  Frontends should not 
> > expose
> >the node, backends should not make decisions based on its presence.
> > 
> >  * Clarify that 'sectors' xenstore node and the requests fields are always 
> > in
> >512-byte units, like it was previous to 2fa701e5346d and 67e1c050e36b.
> > 
> > All base units for the fields used in the protocol are 512-byte based, the
> > xenbus 'sector-size' field is only used to signal the logic block size.  
> > When
> > 'sector-size' is greater than 512, blkfront implementations must make sure 
> > that
> > the offsets and sizes (despite being expressed in 512-byte units) are 
> > aligned
> > to the logical block size specified in 'sector-size', otherwise the backend
> > will fail to process the requests.
> > 
> > This will require changes to some of the frontends and backends in order to
> > properly support 'sector-size' nodes greater than 512.
> > 
> > Fixes: 2fa701e5346d ('blkif.h: Provide more complete documentation of the 
> > blkif interface')
> > Fixes: 67e1c050e36b ('public/io/blkif.h: try to fix the semantics of sector 
> > based quantities')
> > Signed-off-by: Roger Pau Monné 
> > Reviewed-by: Juergen Gross 
> > Reviewed-by: Anthony PERARD 
> 
> The Fixes: tags generally suggest this wants backporting. I'm a little 
> uncertain
> here though, as it won't really affect anything that is built. Opinions?

I would suggest to backport to open release branches, as people
working on protocol implementations might not use the headers from
unstable, but rather from the last release.

Thanks, Roger.

Re: [PATCH 4/5] types: replace remaining uses of s32

2024-09-12 Thread Roger Pau Monné

On Thu, Sep 12, 2024 at 12:05:23PM +0200, Jan Beulich wrote:
> On 12.09.2024 11:52, Roger Pau Monné wrote:
> > On Thu, Aug 29, 2024 at 02:01:16PM +0200, Jan Beulich wrote:
> >> ... and move the type itself to linux-compat.h.
> >>
> >> While doing so switch a few adjacent types as well, for (a little bit
> >> of) consistency.
> >>
> >> Signed-off-by: Jan Beulich 
> > 
> > Reviewed-by: Roger Pau Monné 
> 
> Thanks. Andrew asked for a style adjustment, which I wasn't sure about.
> I'd like to follow whatever maintainers prefer, so could you clarify
> that please?

Oh, sorry, I would prefer with the alignment added, as suggested by
Andrew.

Thanks, Roger.

Re: [PATCH 3/5] types: replace remaining uses of s16

2024-09-12 Thread Roger Pau Monné

On Thu, Aug 29, 2024 at 02:00:20PM +0200, Jan Beulich wrote:
> ... and move the type itself to linux-compat.h.
> 
> While doing so switch an adjacent x86 struct page_info field to bool.
> 
> Signed-off-by: Jan Beulich 

Reviewed-by: Roger Pau Monné 

Thanks, Roger.

Re: [PATCH 4/5] types: replace remaining uses of s32

2024-09-12 Thread Roger Pau Monné

On Thu, Aug 29, 2024 at 02:01:16PM +0200, Jan Beulich wrote:
> ... and move the type itself to linux-compat.h.
> 
> While doing so switch a few adjacent types as well, for (a little bit
> of) consistency.
> 
> Signed-off-by: Jan Beulich 

Reviewed-by: Roger Pau Monné 

Thanks, Roger.

Re: [PATCH] x86/traps: Re-enable IRQs after reading cr2 in the #PF handler

2024-09-12 Thread Roger Pau Monné

On Wed, Sep 11, 2024 at 03:58:23PM +0100, Alejandro Vallejo wrote:
> Moves sti directly after the cr2 read and immediately after the #PF
> handler.
> 
> While in the area, remove redundant q suffix to a movq in entry.S
> 
> Signed-off-by: Alejandro Vallejo 
> ---
> I don't think this is a bug as much as an accident about to happen. Even if
> there's no cases at the moment in which the IRQ handler may page fault, that
> might change in the future.
> 
> Note: I haven't tested it extensively beyond running it on GitLab.
> 
> pipeline:
> https://gitlab.com/xen-project/people/agvallejo/xen/-/pipelines/1449182525
> 
> ---
>  xen/arch/x86/traps.c|  2 ++
>  xen/arch/x86/x86_64/entry.S | 11 +--
>  2 files changed, 11 insertions(+), 2 deletions(-)
> 
> diff --git a/xen/arch/x86/traps.c b/xen/arch/x86/traps.c
> index 708136f625..1c04c03d9f 100644
> --- a/xen/arch/x86/traps.c
> +++ b/xen/arch/x86/traps.c
> @@ -1600,6 +1600,8 @@ void asmlinkage do_page_fault(struct cpu_user_regs 
> *regs)
>  
>  addr = read_cr2();
>  
> +local_irq_enable();

I would maybe add an ASSERT(!local_irq_is_enabled()); at the top of the
function, just to make sure the context is as expected.

> +
>  /* fixup_page_fault() might change regs->error_code, so cache it here. */
>  error_code = regs->error_code;
>  
> diff --git a/xen/arch/x86/x86_64/entry.S b/xen/arch/x86/x86_64/entry.S
> index b8482de8ee..ef803f6288 100644
> --- a/xen/arch/x86/x86_64/entry.S
> +++ b/xen/arch/x86/x86_64/entry.S
> @@ -844,8 +844,7 @@ handle_exception_saved:
>  #elif !defined(CONFIG_PV)
>  ASSERT_CONTEXT_IS_XEN
>  #endif /* CONFIG_PV */
> -sti
> -1:  movq  %rsp,%rdi
> +1:  mov   %rsp,%rdi

Since you are modifying this already - we usually add a space between
the comma and the next operand.

Thanks, Roger.

Re: [XEN PATCH] x86/hvm: make stdvga support optional

2024-09-12 Thread Roger Pau Monné

On Thu, Sep 12, 2024 at 11:57:09AM +0300, Sergiy Kibrik wrote:
> Introduce config option X86_STDVGA so that stdvga driver can be disabled on
> systems that don't need it.
> 
> Signed-off-by: Sergiy Kibrik 
> ---
>  xen/arch/x86/Kconfig  | 10 ++
>  xen/arch/x86/hvm/Makefile |  2 +-
>  xen/arch/x86/include/asm/hvm/io.h |  5 +
>  3 files changed, 16 insertions(+), 1 deletion(-)
> 
> diff --git a/xen/arch/x86/Kconfig b/xen/arch/x86/Kconfig
> index 62f0b5e0f4..2ba25e6906 100644
> --- a/xen/arch/x86/Kconfig
> +++ b/xen/arch/x86/Kconfig
> @@ -385,6 +385,16 @@ config ALTP2M
>  
> If unsure, stay with defaults.
>  
> +config X86_STDVGA
> + bool "Standard VGA card emulation support" if EXPERT
> + default y
> + depends on HVM
> + help
> +   Build stdvga driver that emulates standard VGA card with VESA BIOS
> +  Extensions for HVM guests.
> +
> +   If unsure, say Y.
> +
>  endmenu
>  
>  source "common/Kconfig"
> diff --git a/xen/arch/x86/hvm/Makefile b/xen/arch/x86/hvm/Makefile
> index 4c1fa5c6c2..4d1f8e00eb 100644
> --- a/xen/arch/x86/hvm/Makefile
> +++ b/xen/arch/x86/hvm/Makefile
> @@ -22,7 +22,7 @@ obj-y += pmtimer.o
>  obj-y += quirks.o
>  obj-y += rtc.o
>  obj-y += save.o
> -obj-y += stdvga.o
> +obj-$(CONFIG_X86_STDVGA) += stdvga.o
>  obj-y += vioapic.o
>  obj-y += vlapic.o
>  obj-y += vm_event.o
> diff --git a/xen/arch/x86/include/asm/hvm/io.h 
> b/xen/arch/x86/include/asm/hvm/io.h
> index 24d1b6134f..9b8d4f6b7a 100644
> --- a/xen/arch/x86/include/asm/hvm/io.h
> +++ b/xen/arch/x86/include/asm/hvm/io.h
> @@ -128,8 +128,13 @@ struct hvm_hw_stdvga {
>  spinlock_t lock;
>  };
>  
> +#ifdef CONFIG_X86_STDVGA
>  void stdvga_init(struct domain *d);
>  void stdvga_deinit(struct domain *d);
> +#else
> +static inline void stdvga_init(struct domain *d) {}
> +static inline void stdvga_deinit(struct domain *d) {}
> +#endif

Shouldn't Xen report an error if a user attempts to create a domain
with X86_EMU_VGA set in emulation_flags, but stdvga has been built
time disabled?

Thanks, Roger.

Re: [PATCH v5 3/4] x86/time: introduce command line option to select wallclock

2024-09-10 Thread Roger Pau Monné

On Tue, Sep 10, 2024 at 04:29:43PM +0200, Jan Beulich wrote:
> On 10.09.2024 16:24, Roger Pau Monné wrote:
> > On Tue, Sep 10, 2024 at 03:49:52PM +0200, Jan Beulich wrote:
> >> On 10.09.2024 15:10, Roger Pau Monné wrote:
> >>>  Would you be fine with
> >>> adding the following in init_xen_time():
> >>>
> >>> /*
> >>>  * EFI run time services can be disabled form the command line, hence 
> >>> the
> >>>  * check for them cannot be done as part of the wallclock option 
> >>> parsing.
> >>>  */
> >>> if ( wallclock_source == WALLCLOCK_EFI && !efi_enabled(EFI_RS) )
> >>> wallclock_source = WALLCLOCK_UNSET;
> >>>
> >>> if ( wallclock_source == WALLCLOCK_UNSET )
> >>> probe_wallclock();
> >>
> >> ... this is probably the best we can do (nit: s/form/from/ in the comment;
> >> maybe also "..., hence the check done as part of option parsing may not
> >> suffice" or some such).
> > 
> > I didn't put in my previous reply, but I've removed the efi_enabled()
> > check from the option parsing and instead added this comment:
> > 
> > /*
> >  * Checking if run-time services are available must be done after
> >  * command line parsing.
> >  */
> > 
> > I don't think there's much point in doing the check in
> > parse_wallclock() if it's not reliable, so your reference in the
> > comment to "the check done as part of option parsing" is no longer
> > valid.
> 
> Hmm. Rejecting the option if we can is reasonable imo. "efi=rs" can imo only
> sensibly be used to override an earlier "efi=no-rs". Hence what we see while
> parsing the wallclock option gives us at least reasonable grounds to reject
> the option if EFI_RS is already clear. We then merely fail to reject the
> option if the flag is cleared later.

I won't strongly argue about it, but I think having a non-reliable
check in parse_wallclock() is confusing.  I would have to add a
comment there anyway to note that depending on the position of the efi
and wallclock parameters the check for EFI_RS might not be effective -
at which point I think it's best to unify the check so it's uniformly
performed in init_xen_time().

> Yet in the end I'd be happy to leave this particular aspect to you and the
> EFI maintainers.

Thanks again for the feedback.

Roger.

Re: [PATCH v5 3/4] x86/time: introduce command line option to select wallclock

2024-09-10 Thread Roger Pau Monné

On Tue, Sep 10, 2024 at 03:49:52PM +0200, Jan Beulich wrote:
> On 10.09.2024 15:10, Roger Pau Monné wrote:
> > On Tue, Sep 10, 2024 at 11:32:05AM +0200, Jan Beulich wrote:
> >> On 09.09.2024 16:54, Roger Pau Monne wrote:
> >>> --- a/xen/arch/x86/time.c
> >>> +++ b/xen/arch/x86/time.c
> >>> @@ -1550,6 +1550,36 @@ static const char *__init 
> >>> wallclock_type_to_string(void)
> >>>  return "";
> >>>  }
> >>>  
> >>> +static int __init cf_check parse_wallclock(const char *arg)
> >>> +{
> >>> +if ( !arg )
> >>> +return -EINVAL;
> >>> +
> >>> +if ( !strcmp("auto", arg) )
> >>> +wallclock_source = WALLCLOCK_UNSET;
> >>> +else if ( !strcmp("xen", arg) )
> >>> +{
> >>> +if ( !xen_guest )
> >>> +return -EINVAL;
> >>> +
> >>> +wallclock_source = WALLCLOCK_XEN;
> >>> +}
> >>> +else if ( !strcmp("cmos", arg) )
> >>> +wallclock_source = WALLCLOCK_CMOS;
> >>> +else if ( !strcmp("efi", arg) )
> >>> +{
> >>> +if ( !efi_enabled(EFI_RS) )
> >>> +return -EINVAL;
> >>
> >> I'm afraid there's a problem here, and I'm sorry for not paying attention
> >> earlier: EFI_RS is possibly affected by "efi=" (and hence may change after
> >> this code ran). (It can also be cleared if ->SetVirtualAddressMap() fails,
> >> but I think that's strictly ahead of command line parsing.)
> > 
> > Hm, I see, thanks for noticing.  Anyone using 'efi=no-rs
> > wallclock=efi' likely deserves to be punished.
> 
> Well, if you don't want to actually do this ;-) then ...

It's not too complicated to attempt to arrange for something half sane
even if the user provided options are nonsense.  I've seen people
accumulate all kind of crap on the command line "just because I've
read it online".

> >  Would you be fine with
> > adding the following in init_xen_time():
> > 
> > /*
> >  * EFI run time services can be disabled form the command line, hence 
> > the
> >  * check for them cannot be done as part of the wallclock option 
> > parsing.
> >  */
> > if ( wallclock_source == WALLCLOCK_EFI && !efi_enabled(EFI_RS) )
> > wallclock_source = WALLCLOCK_UNSET;
> > 
> > if ( wallclock_source == WALLCLOCK_UNSET )
> > probe_wallclock();
> 
> ... this is probably the best we can do (nit: s/form/from/ in the comment;
> maybe also "..., hence the check done as part of option parsing may not
> suffice" or some such).

I didn't put in my previous reply, but I've removed the efi_enabled()
check from the option parsing and instead added this comment:

/*
 * Checking if run-time services are available must be done after
 * command line parsing.
 */

I don't think there's much point in doing the check in
parse_wallclock() if it's not reliable, so your reference in the
comment to "the check done as part of option parsing" is no longer
valid.

Thanks, Roger.

Re: [PATCH v2] blkif: reconcile protocol specification with in-use implementations

2024-09-10 Thread Roger Pau Monné

On Tue, Sep 10, 2024 at 03:46:00PM +0200, Jürgen Groß wrote:
> On 10.09.24 13:46, Roger Pau Monne wrote:
> > Current blkif implementations (both backends and frontends) have all slight
> > differences about how they handle the 'sector-size' xenstore node, and how
> > other fields are derived from this value or hardcoded to be expressed in 
> > units
> > of 512 bytes.
> > 
> > To give some context, this is an excerpt of how different implementations 
> > use
> > the value in 'sector-size' as the base unit for to other fields rather than
> > just to set the logical sector size of the block device:
> > 
> >  │ sectors xenbus node │ requests sector_number │ 
> > requests {first,last}_sect
> > ┼─┼┼───
> > FreeBSD blk{front,back} │ sector-size │  sector-size   │
> >512
> > ┼─┼┼───
> > Linux blk{front,back}   │ 512 │  512   │
> >512
> > ┼─┼┼───
> > QEMU blkback│ sector-size │  sector-size   │
> >sector-size
> > ┼─┼┼───
> > Windows blkfront│ sector-size │  sector-size   │
> >sector-size
> > ┼─┼┼───
> > MiniOS  │ sector-size │  512   │
> >512
> > 
> > An attempt was made by 67e1c050e36b in order to change the base units of the
> > request fields and the xenstore 'sectors' node.  That however only lead to 
> > more
> > confusion, as the specification now clearly diverged from the reference
> > implementation in Linux.  Such change was only implemented for QEMU Qdisk
> > and Windows PV blkfront.
> > 
> > Partially revert to the state before 67e1c050e36b while adjusting the
> > documentation for 'sectors' to match what it used to be previous to
> > 2fa701e5346d:
> > 
> >   * Declare 'feature-large-sector-size' deprecated.  Frontends should not 
> > expose
> > the node, backends should not make decisions based on its presence.
> > 
> >   * Clarify that 'sectors' xenstore node and the requests fields are always 
> > in
> > 512-byte units, like it was previous to 2fa701e5346d and 67e1c050e36b.
> > 
> > All base units for the fields used in the protocol are 512-byte based, the
> > xenbus 'sector-size' field is only used to signal the logic block size.  
> > When
> > 'sector-size' is greater than 512, blkfront implementations must make sure 
> > that
> > the offsets and sizes (despite being expressed in 512-byte units) are 
> > aligned
> > to the logical block size specified in 'sector-size', otherwise the backend
> > will fail to process the requests.
> > 
> > This will require changes to some of the frontends and backends in order to
> > properly support 'sector-size' nodes greater than 512.
> > 
> > Fixes: 2fa701e5346d ('blkif.h: Provide more complete documentation of the 
> > blkif interface')
> > Fixes: 67e1c050e36b ('public/io/blkif.h: try to fix the semantics of sector 
> > based quantities')
> > Signed-off-by: Roger Pau Monné 
> > ---
> > Changes since v1:
> >   - Update commit message.
> >   - Expand comments.
> > ---
> >   xen/include/public/io/blkif.h | 50 ++-
> >   1 file changed, 38 insertions(+), 12 deletions(-)
> > 
> > diff --git a/xen/include/public/io/blkif.h b/xen/include/public/io/blkif.h
> > index 22f1eef0c0ca..da893eb379db 100644
> > --- a/xen/include/public/io/blkif.h
> > +++ b/xen/include/public/io/blkif.h
> > @@ -237,12 +237,16 @@
> >* sector-size
> >*  Values: 
> >*
> > - *  The logical block size, in bytes, of the underlying storage. This
> > - *  must be a power of two with a minimum value of 512.
> > + *  The logical block size, in bytes, of the underlying storage. This 
> > must
> > + *  be a power of two with a minimum value of 512.  The sector size 
> > should
> > + *  only be used for request seg

Re: [PATCH v3] xen/x86/pvh: handle ACPI RSDT table in PVH Dom0 build

2024-09-10 Thread Roger Pau Monné

Ping?

I think this is a useful change, could we please have a new version
with the proposed adjustments?

Thanks, Roger.

On Wed, Apr 24, 2024 at 03:18:26PM -0400, Daniel P. Smith wrote:
> From: Stefano Stabellini 
> 
> Xen always generates as XSDT table even if the firmware provided an RSDT 
> table.
> Copy the RSDT header from the firmware table, adjusting the signature, for the
> XSDT table when not provided by the firmware.
> 
> Fixes: 1d74282c455f ('x86: setup PVHv2 Dom0 ACPI tables')
> Suggested-by: Roger Pau Monné 
> Signed-off-by: Stefano Stabellini 
> Signed-off-by: Daniel P. Smith 
> ---
> 
> This patch is used for development and testing of hyperlaunch using the QEMU
> emulator. After dicussiong with Stefano, he was okay with me addressing Jan's
> comment regarding the table signature and reposting for acceptance.
> 
> Changes in v3:
> - ensure the constructed XSDT table always has the correct signature
> 
> ---
>  xen/arch/x86/hvm/dom0_build.c | 14 +-
>  1 file changed, 13 insertions(+), 1 deletion(-)
> 
> diff --git a/xen/arch/x86/hvm/dom0_build.c b/xen/arch/x86/hvm/dom0_build.c
> index ac71d43a6b3f..781aac00fe72 100644
> --- a/xen/arch/x86/hvm/dom0_build.c
> +++ b/xen/arch/x86/hvm/dom0_build.c
> @@ -1077,7 +1077,16 @@ static int __init pvh_setup_acpi_xsdt(struct domain 
> *d, paddr_t madt_addr,
>  rc = -EINVAL;
>  goto out;
>  }
> -xsdt_paddr = rsdp->xsdt_physical_address;
> +/*
> + * Note the header is the same for both RSDT and XSDT, so it's fine to
> + * copy the native RSDT header to the Xen crafted XSDT if no native
> + * XSDT is available.
> + */
> +if ( rsdp->revision > 1 && rsdp->xsdt_physical_address )
> +xsdt_paddr = rsdp->xsdt_physical_address;
> +else
> +xsdt_paddr = rsdp->rsdt_physical_address;
> +
>  acpi_os_unmap_memory(rsdp, sizeof(*rsdp));
>  table = acpi_os_map_memory(xsdt_paddr, sizeof(*table));
>  if ( !table )
> @@ -1089,6 +1098,9 @@ static int __init pvh_setup_acpi_xsdt(struct domain *d, 
> paddr_t madt_addr,
>  xsdt->header = *table;
>  acpi_os_unmap_memory(table, sizeof(*table));
>  
> +/* In case the header is an RSDT copy, blindly ensure it has an XSDT sig 
> */
> +xsdt->header.signature[0] = 'X';
> +
>  /* Add the custom MADT. */
>  xsdt->table_offset_entry[0] = madt_addr;
>  
> -- 
> 2.30.2
>

Re: [PATCH v5 3/4] x86/time: introduce command line option to select wallclock

2024-09-10 Thread Roger Pau Monné

On Tue, Sep 10, 2024 at 11:32:05AM +0200, Jan Beulich wrote:
> On 09.09.2024 16:54, Roger Pau Monne wrote:
> > Allow setting the used wallclock from the command line.  When the option is 
> > set
> > to a value different than `auto` the probing is bypassed and the selected
> > implementation is used (as long as it's available).
> > 
> > The `xen` and `efi` options require being booted as a Xen guest (with Xen 
> > guest
> > supported built-in) or from UEFI firmware.
> 
> Perhaps add "respectively"?

Sure.

> 
> > --- a/xen/arch/x86/time.c
> > +++ b/xen/arch/x86/time.c
> > @@ -1550,6 +1550,36 @@ static const char *__init 
> > wallclock_type_to_string(void)
> >  return "";
> >  }
> >  
> > +static int __init cf_check parse_wallclock(const char *arg)
> > +{
> > +if ( !arg )
> > +return -EINVAL;
> > +
> > +if ( !strcmp("auto", arg) )
> > +wallclock_source = WALLCLOCK_UNSET;
> > +else if ( !strcmp("xen", arg) )
> > +{
> > +if ( !xen_guest )
> > +return -EINVAL;
> > +
> > +wallclock_source = WALLCLOCK_XEN;
> > +}
> > +else if ( !strcmp("cmos", arg) )
> > +wallclock_source = WALLCLOCK_CMOS;
> > +else if ( !strcmp("efi", arg) )
> > +{
> > +if ( !efi_enabled(EFI_RS) )
> > +return -EINVAL;
> 
> I'm afraid there's a problem here, and I'm sorry for not paying attention
> earlier: EFI_RS is possibly affected by "efi=" (and hence may change after
> this code ran). (It can also be cleared if ->SetVirtualAddressMap() fails,
> but I think that's strictly ahead of command line parsing.)

Hm, I see, thanks for noticing.  Anyone using 'efi=no-rs
wallclock=efi' likely deserves to be punished.  Would you be fine with
adding the following in init_xen_time():

/*
 * EFI run time services can be disabled form the command line, hence the
 * check for them cannot be done as part of the wallclock option parsing.
 */
if ( wallclock_source == WALLCLOCK_EFI && !efi_enabled(EFI_RS) )
wallclock_source = WALLCLOCK_UNSET;

if ( wallclock_source == WALLCLOCK_UNSET )
probe_wallclock();

Thanks, Roger.

Re: [PATCH 04/22] x86/mm: ensure L4 idle_pg_table is not modified past boot

2024-09-10 Thread Roger Pau Monné

On Tue, Sep 10, 2024 at 11:00:27AM +0200, Jan Beulich wrote:
> On 10.09.2024 10:54, Roger Pau Monné wrote:
> > On Tue, Aug 13, 2024 at 05:54:54PM +0200, Jan Beulich wrote:
> >> On 26.07.2024 17:21, Roger Pau Monne wrote:
> >>> The idle_pg_table L4 is cloned to create all the other L4 Xen uses, and 
> >>> hence
> >>> it shouldn't be modified once further L4 are created.
> >>
> >> Yes, but the window between moving into SYS_STATE_smp_boot and Dom0 having
> >> its initial page tables created is quite large. If the justification was
> >> relative to AP bringup, that may be all fine. But when related to cloning,
> >> I think that would then truly want keying to there being any non-system
> >> domain(s).
> > 
> > Further changes in this series will add a per-CPU idle page table, and
> > hence we need to ensure that by the time APs are started the BSP L4 idle
> > page directory is not changed, as otherwise the copies in the APs
> > would get out of sync.
> > 
> > The idle system domain is indeed tied to the idle page talbes, but the
> > idle vCPU0 (the BSP) directly uses idle_pg_table (no copying), and
> > hence it's fine to allow modifications of the L4 idle page table
> > directory up to when APs are started (those will indeed make copies of
> > the idle L4.
> 
> Which may want at least mentioning in the description then. I take it
> that ...
> 
> >>> --- a/xen/arch/x86/mm.c
> >>> +++ b/xen/arch/x86/mm.c
> >>> @@ -5023,6 +5023,12 @@ static l3_pgentry_t *virt_to_xen_l3e(unsigned long 
> >>> v)
> >>>  mfn_t l3mfn;
> >>>  l3_pgentry_t *l3t = alloc_mapped_pagetable(&l3mfn);
> >>>  
> >>> +/*
> >>> + * dom0 is build at smp_boot, at which point we already create 
> >>> new L4s
> >>> + * based on idle_pg_table.
> >>> + */
> 
> ... this comment is then refined by the later patches you refer to?

Hm, I would have to double check, not sure I've updated it once the
idle_pg_table is cloned for AP bringup.

Will expand commit message and update the comment here if not done
already by later patches.

Thanks, Roger.

Re: [PATCH] blkif: reconcile protocol specification with in-use implementations

2024-09-10 Thread Roger Pau Monné

On Wed, Sep 04, 2024 at 01:25:46PM +, Anthony PERARD wrote:
> On Tue, Sep 03, 2024 at 04:19:23PM +0200, Roger Pau Monne wrote:
> > Current blkif implementations (both backends and frontends) have all slight
> > differences about how they handle the 'sector-size' xenstore node, and how
> > other fields are derived from this value or hardcoded to be expressed in 
> > units
> > of 512 bytes.
> >
> > To give some context, this is an excerpt of how different implementations 
> > use
> > the value in 'sector-size' as the base unit for to other fields rather than
> > just to set the logical sector size of the block device:
> >
> > │ sectors xenbus node │ requests sector_number │ 
> > requests {first,last}_sect
> > ┼─┼┼───
> > FreeBSD blk{front,back} │ sector-size │  sector-size   │
> >512
> > ┼─┼┼───
> > Linux blk{front,back}   │ 512 │  512   │
> >512
> > ┼─┼┼───
> > QEMU blkback│ sector-size │  sector-size   │
> >sector-size
> > ┼─┼┼───
> > Windows blkfront│ sector-size │  sector-size   │
> >sector-size
> > ┼─┼┼───
> > MiniOS  │ sector-size │  512   │
> >512
> >
> > An attempt was made by 67e1c050e36b in order to change the base units of the
> > request fields and the xenstore 'sectors' node.  That however only lead to 
> > more
> > confusion, as the specification now clearly diverged from the reference
> > implementation in Linux.  Such change was only implemented for QEMU Qdisk
> > and Windows PV blkfront.
> >
> > Partially revert to the state before 67e1c050e36b:
> >
> >  * Declare 'feature-large-sector-size' deprecated.  Frontends should not 
> > expose
> >the node, backends should not make decisions based on its presence.
> >
> >  * Clarify that 'sectors' xenstore node and the requests fields are always 
> > in
> >512-byte units, like it was previous to 67e1c050e36b.
> >
> > All base units for the fields used in the protocol are 512-byte based, the
> > xenbus 'sector-size' field is only used to signal the logic block size.  
> > When
> > 'sector-size' is greater than 512, blkfront implementations must make sure 
> > that
> > the offsets and sizes (even when expressed in 512-byte units) are aligned to
> > the logical block size specified in 'sector-size', otherwise the backend 
> > will
> > fail to process the requests.
> >
> > This will require changes to some of the frontends and backends in order to
> > properly support 'sector-size' nodes greater than 512.
> >
> > Fixes: 67e1c050e36b ('public/io/blkif.h: try to fix the semantics of sector 
> > based quantities')
> 
> Probably want to add:
> Fixes: 2fa701e5346d ("blkif.h: Provide more complete documentation of the 
> blkif interface")
> 
> > Signed-off-by: Roger Pau Monné 
> > ---
> >  xen/include/public/io/blkif.h | 23 ++-
> >  1 file changed, 14 insertions(+), 9 deletions(-)
> >
> > diff --git a/xen/include/public/io/blkif.h b/xen/include/public/io/blkif.h
> > index 22f1eef0c0ca..07708f4d08eb 100644
> > --- a/xen/include/public/io/blkif.h
> > +++ b/xen/include/public/io/blkif.h
> > @@ -240,10 +240,6 @@
> >   *  The logical block size, in bytes, of the underlying storage. This
> >   *  must be a power of two with a minimum value of 512.
> 
> Should we add that "sector-size" is to be used only for request length?

Yes, that would be fine.

What about:

The logical block size, in bytes, of the underlying storage. This must
be a power of two with a minimum value of 512.  The sector size should
only be used for request segment length and alignment.

> 
> > - *  NOTE: Because of implementation bugs in some frontends this must be
> > - *set to 512, unless the frontend advertizes a non-zero value
> > - *in its "feature-large-secto

Re: [PATCH 04/22] x86/mm: ensure L4 idle_pg_table is not modified past boot

2024-09-10 Thread Roger Pau Monné

On Tue, Aug 13, 2024 at 05:54:54PM +0200, Jan Beulich wrote:
> On 26.07.2024 17:21, Roger Pau Monne wrote:
> > The idle_pg_table L4 is cloned to create all the other L4 Xen uses, and 
> > hence
> > it shouldn't be modified once further L4 are created.
> 
> Yes, but the window between moving into SYS_STATE_smp_boot and Dom0 having
> its initial page tables created is quite large. If the justification was
> relative to AP bringup, that may be all fine. But when related to cloning,
> I think that would then truly want keying to there being any non-system
> domain(s).

Further changes in this series will add a per-CPU idle page table, and
hence we need to ensure that by the time APs are started the BSP L4 idle
page directory is not changed, as otherwise the copies in the APs
would get out of sync.

The idle system domain is indeed tied to the idle page talbes, but the
idle vCPU0 (the BSP) directly uses idle_pg_table (no copying), and
hence it's fine to allow modifications of the L4 idle page table
directory up to when APs are started (those will indeed make copies of
the idle L4.

> > --- a/xen/arch/x86/mm.c
> > +++ b/xen/arch/x86/mm.c
> > @@ -5023,6 +5023,12 @@ static l3_pgentry_t *virt_to_xen_l3e(unsigned long v)
> >  mfn_t l3mfn;
> >  l3_pgentry_t *l3t = alloc_mapped_pagetable(&l3mfn);
> >  
> > +/*
> > + * dom0 is build at smp_boot, at which point we already create new 
> > L4s
> > + * based on idle_pg_table.
> > + */
> > +BUG_ON(system_state >= SYS_STATE_smp_boot);
> 
> Which effectively means most of this function could become __init (e.g. by
> moving into a helper). We'd then hit the BUG_ON() prior to init_done()
> destroying the .init.* mappings, and we'd simply #PF afterwards. That's
> not so much for the space savings in .text, but to document the limited
> lifetime of the (helper) function directly in its function head.

IMO the BUG_ON() is clearer to debug, but I won't mind splitting the
logic inside the if body into a separate helper.

> I further wonder whether in such a case the enclosing if() wouldn't want
> to gain unlikely() at the same time.

Yes, I can certainly add that.

Thanks, Roger.

Re: [PATCH] fbdev/xen-fbfront: Assign fb_info->device

2024-09-10 Thread Roger Pau Monné

On Tue, Sep 10, 2024 at 09:29:30AM +0200, Thomas Zimmermann wrote:
> Hi
> 
> Am 10.09.24 um 09:22 schrieb Roger Pau Monné:
> > On Mon, Sep 09, 2024 at 10:09:16PM -0400, Jason Andryuk wrote:
> > > From: Jason Andryuk 
> > > 
> > > Probing xen-fbfront faults in video_is_primary_device().  The passed-in
> > > struct device is NULL since xen-fbfront doesn't assign it and the
> > > memory is kzalloc()-ed.  Assign fb_info->device to avoid this.
> > > 
> > > This was exposed by the conversion of fb_is_primary_device() to
> > > video_is_primary_device() which dropped a NULL check for struct device.
> > > 
> > > Fixes: f178e96de7f0 ("arch: Remove struct fb_info from video helpers")
> > > Reported-by: Arthur Borsboom 
> > > Closes: 
> > > https://lore.kernel.org/xen-devel/CALUcmUncX=lkxweisitksdy-coe8qkswhfvccneokfrkd0z...@mail.gmail.com/
> > > Tested-by: Arthur Borsboom 
> > > CC: sta...@vger.kernel.org
> > > Signed-off-by: Jason Andryuk 
> > Reviewed-by: Roger Pau Monné 
> > 
> > > ---
> > > The other option would be to re-instate the NULL check in
> > > video_is_primary_device()
> > I do think this is needed, or at least an explanation.  The commit
> > message in f178e96de7f0 doesn't mention anything about
> > video_is_primary_device() not allowing being passed a NULL device
> > (like it was possible with fb_is_primary_device()).
> > 
> > Otherwise callers of video_is_primary_device() would need to be
> > adjusted to check for device != NULL.
> 
> The helper expects a non-NULL pointer. We might want to document this.

A BUG_ON(!dev); might be enough documentation that the function
expected a non-NULL dev IMO.

Thanks, Roger.

Re: [PATCH] fbdev/xen-fbfront: Assign fb_info->device

2024-09-10 Thread Roger Pau Monné

On Mon, Sep 09, 2024 at 10:09:16PM -0400, Jason Andryuk wrote:
> From: Jason Andryuk 
> 
> Probing xen-fbfront faults in video_is_primary_device().  The passed-in
> struct device is NULL since xen-fbfront doesn't assign it and the
> memory is kzalloc()-ed.  Assign fb_info->device to avoid this.
> 
> This was exposed by the conversion of fb_is_primary_device() to
> video_is_primary_device() which dropped a NULL check for struct device.
> 
> Fixes: f178e96de7f0 ("arch: Remove struct fb_info from video helpers")
> Reported-by: Arthur Borsboom 
> Closes: 
> https://lore.kernel.org/xen-devel/CALUcmUncX=lkxweisitksdy-coe8qkswhfvccneokfrkd0z...@mail.gmail.com/
> Tested-by: Arthur Borsboom 
> CC: sta...@vger.kernel.org
> Signed-off-by: Jason Andryuk 

Reviewed-by: Roger Pau Monné 

> ---
> The other option would be to re-instate the NULL check in
> video_is_primary_device()

I do think this is needed, or at least an explanation.  The commit
message in f178e96de7f0 doesn't mention anything about
video_is_primary_device() not allowing being passed a NULL device
(like it was possible with fb_is_primary_device()).

Otherwise callers of video_is_primary_device() would need to be
adjusted to check for device != NULL.

Thanks, Roger.

Re: [PATCH v4 4/6] x86/time: introduce probing logic for the wallclock

2024-09-09 Thread Roger Pau Monné

On Thu, Sep 05, 2024 at 05:58:47PM +0200, Jan Beulich wrote:
> On 04.09.2024 17:31, Roger Pau Monne wrote:
> > --- a/xen/arch/x86/time.c
> > +++ b/xen/arch/x86/time.c
> > @@ -1291,14 +1291,23 @@ static bool __get_cmos_time(struct rtc_time *rtc)
> >  return t1 <= SECONDS(1) && t2 < MILLISECS(3);
> >  }
> >  
> > -static bool cmos_probe(struct rtc_time *rtc_p, bool cmos_rtc_probe)
> > +static bool __initdata cmos_rtc_probe;
> > +boolean_param("cmos-rtc-probe", cmos_rtc_probe);
> > +
> > +static bool __init cmos_probe(void)
> 
> I'm sorry for not paying attention to this earlier, but "cmos" alone
> is misleading here and perhaps even more so for cmos_read(). These
> aren't about the CMOS (storage) but the CMOS RTC. May I suggest
> cmos_rtc_{probe,read}() instead?

I've assumed that those living in time.c would be clear enough it's
the CMOS RTC, but I'm fine with renaming to cmos_rtc_{probe,read}().

> 
> >  {
> >  unsigned int seconds = 60;
> >  
> > +if ( !(acpi_gbl_FADT.boot_flags & ACPI_FADT_NO_CMOS_RTC) )
> > +return true;
> > +
> > +if ( !cmos_rtc_probe )
> > +return false;
> 
> With this I think ...
> 
> >  for ( ; ; )
> >  {
> > -bool success = __get_cmos_time(rtc_p);
> > -struct rtc_time rtc = *rtc_p;
> > +struct rtc_time rtc;
> > +bool success = __get_cmos_time(&rtc);
> >  
> >  if ( likely(!cmos_rtc_probe) )
> >  return true;
> 
> ... this ends up being dead code.

Indeed, I've missed to remove that one when moving the check outside
of the for loop.

Thanks, Roger.

Re: [PATCH v4 1/6] x86/time: introduce helper to fetch Xen wallclock when running as a guest

2024-09-09 Thread Roger Pau Monné

On Thu, Sep 05, 2024 at 05:45:20PM +0200, Jan Beulich wrote:
> On 04.09.2024 17:31, Roger Pau Monne wrote:
> > Move the current code in get_wallclock_time() to fetch the Xen wallclock
> > information from the shared page when running as a guest into a separate
> > helper.
> > 
> > No functional change intended.
> > 
> > Signed-off-by: Roger Pau Monné 
> 
> Acked-by: Jan Beulich 
> 
> Still ...
> 
> > --- a/xen/arch/x86/time.c
> > +++ b/xen/arch/x86/time.c
> > @@ -787,6 +787,30 @@ static struct platform_timesource 
> > __initdata_cf_clobber plt_xen_timer =
> >  };
> >  #endif
> >  
> > +static unsigned long read_xen_wallclock(void)
> > +{
> > +#ifdef CONFIG_XEN_GUEST
> > +struct shared_info *sh_info = XEN_shared_info;
> 
> ... I'd like to switch this to pointer-to-const while committing, if okay
> with you.

Oh, sure.

Thanks, Roger.

Re: Setting up the Xen Communications Team

2024-09-09 Thread Roger Pau Monné

On Fri, Sep 06, 2024 at 01:54:13PM +0100, Kelly Choi wrote:
> Hi all,
> 
> Thank you for your interest, we'll look into setting up a new Xen
> PR/communications group.
> If anyone else in the community would like to join, please reply directly
> to this email.

I would be interested in joining.

Thanks, Roger.

Re: [XEN PATCH v14 2/5] x86/pvh: Allow (un)map_pirq when dom0 is PVH

2024-09-09 Thread Roger Pau Monné

On Mon, Sep 09, 2024 at 10:56:07AM +0200, Jan Beulich wrote:
> On 07.09.2024 01:34, Stefano Stabellini wrote:
> > On Fri, 6 Sep 2024, Jan Beulich wrote:
> >> On 06.09.2024 00:51, Stefano Stabellini wrote:
> >>> On Thu, 5 Sep 2024, Jan Beulich wrote:
>  On 05.09.2024 08:45, Chen, Jiqian wrote:
> > HI,
> >
> > On 2024/9/4 14:04, Jan Beulich wrote:
> >> On 04.09.2024 03:43, Stefano Stabellini wrote:
> >>> On Tue, 3 Sep 2024, Jan Beulich wrote:
>  On 03.09.2024 12:53, Chen, Jiqian wrote:
> > On 2024/9/3 17:25, Jan Beulich wrote:
> >> On 03.09.2024 09:58, Chen, Jiqian wrote:
> >>> On 2024/9/3 15:42, Jan Beulich wrote:
>  On 03.09.2024 09:04, Jiqian Chen wrote:
> > When dom0 is PVH type and passthrough a device to HVM domU, 
> > Qemu code
> > xen_pt_realize->xc_physdev_map_pirq and libxl code 
> > pci_add_dm_done->
> > xc_physdev_map_pirq map a pirq for passthrough devices.
> > In xc_physdev_map_pirq call stack, function hvm_physdev_op has 
> > a check
> > has_pirq(currd), but currd is PVH dom0, PVH has no 
> > X86_EMU_USE_PIRQ flag,
> > so it fails, PHYSDEVOP_map_pirq is not allowed for PVH dom0 in 
> > current
> > codes.
> >
> > But it is fine to map interrupts through pirq to a HVM domain 
> > whose
> > XENFEAT_hvm_pirqs is not enabled. Because pirq field is used as 
> > a way to
> > reference interrupts and it is just the way for the device 
> > model to
> > identify which interrupt should be mapped to which domain, 
> > however
> > has_pirq() is just to check if HVM domains route interrupts from
> > devices(emulated or passthrough) through event channel, so, the 
> > has_pirq()
> > check should not be applied to the PHYSDEVOP_map_pirq issued by 
> > dom0.
> >
> > So, allow PHYSDEVOP_map_pirq when dom0 is PVH and also allow
> > PHYSDEVOP_unmap_pirq for the removal device path to unmap pirq. 
> > Then the
> > interrupt of a passthrough device can be successfully mapped to 
> > pirq for domU.
> 
>  As before: When you talk about just Dom0, ...
> 
> > --- a/xen/arch/x86/hvm/hypercall.c
> > +++ b/xen/arch/x86/hvm/hypercall.c
> > @@ -73,6 +73,8 @@ long hvm_physdev_op(int cmd, 
> > XEN_GUEST_HANDLE_PARAM(void) arg)
> >  {
> >  case PHYSDEVOP_map_pirq:
> >  case PHYSDEVOP_unmap_pirq:
> > +break;
> > +
> >  case PHYSDEVOP_eoi:
> >  case PHYSDEVOP_irq_status_query:
> >  case PHYSDEVOP_get_free_pirq:
> 
>  ... that ought to match the code. IOW you've again lost why it 
>  is okay(ish)
>  (or even necessary) to also permit the op for non-Dom0 (e.g. a 
>  PVH stubdom).
>  Similarly imo Dom0 using this on itself wants discussing.
> >>> Do you mean I need to talk about why permit this op for all HVM
> >>
> >> You don't need to invent reasons, but it needs making clear that 
> >> wider than
> >> necessary (for your purpose) exposure is at least not going to be 
> >> a problem.
> >>
> >>> and  where can prevent PVH domain calling this for self-mapping, 
> >>> not only dom0?
> >>
> >> Aiui use on itself is limited to Dom0, so only that would need 
> >> clarifying
> >> (along the lines of the above, i.e. that/why it is not a problem). 
> >> For
> >> has_pirq() domains use on oneself was already permitted before.
> >
> > Changed commit message to below. Please check and that will be 
> > great helpful if you would show me how to modify it.
> > {
> > x86/pvh: Allow (un)map_pirq when dom0 is PVH
> >
> > Problem: when dom0 is PVH type and passthrough a device to HVM 
> > domU, Qemu
> > code xen_pt_realize->xc_physdev_map_pirq and libxl code 
> > pci_add_dm_done->
> > xc_physdev_map_pirq map a pirq for passthrough devices.
> > In xc_physdev_map_pirq call stack, function hvm_physdev_op has a 
> > check
> > has_pirq(currd), but currd is PVH dom0, PVH has no X86_EMU_USE_PIRQ 
> > flag,
> > so it fails, PHYSDEVOP_map_pirq is not allowed for PVH dom0 in 
> > current
> > codes.
> >
> > To solve above problem, need to remove the chack has_pirq() for that
> > situation(PHYSDEVOP_map_pirq is issued by dom0 for domUs). But 
> >

Re: [XEN PATCH v14 1/5] xen/pci: Add hypercall to support reset of pcidev

2024-09-09 Thread Roger Pau Monné

On Tue, Sep 03, 2024 at 03:04:20PM +0800, Jiqian Chen wrote:
> When a device has been reset on dom0 side, the Xen hypervisor
> doesn't get notification, so the cached state in vpci is all
> out of date compare with the real device state.
> 
> To solve that problem, add a new hypercall to support the reset
> of pcidev and clear the vpci state of device. So that once the
> state of device is reset on dom0 side, dom0 can call this
> hypercall to notify hypervisor.
> 
> The behavior of different reset types may be different in the
> future, so divide them now so that they can be easily modified
> in the future without affecting the hypercall interface.
> 
> Signed-off-by: Jiqian Chen 
> Signed-off-by: Huang Rui 
> Signed-off-by: Jiqian Chen 

Reviewed-by: Roger Pau Monné 

Thanks, Roger.

Re: [XEN PATCH v14 3/5] x86/domctl: Add hypercall to set the access of x86 gsi

2024-09-09 Thread Roger Pau Monné

On Tue, Sep 03, 2024 at 03:04:22PM +0800, Jiqian Chen wrote:
> Some type of domains don't have PIRQs, like PVH, it doesn't do
> PHYSDEVOP_map_pirq for each gsi. When passthrough a device
> to guest base on PVH dom0, callstack
> pci_add_dm_done->XEN_DOMCTL_irq_permission will fail at function
> domain_pirq_to_irq, because PVH has no mapping of gsi, pirq and
> irq on Xen side.
> What's more, current hypercall XEN_DOMCTL_irq_permission requires
> passing in pirq to set the access of irq, it is not suitable for
> dom0 that doesn't have PIRQs.

I think the above commit message is a bit confusing, I would rather
write it as:

x86/irq: allow setting IRQ permissions from GSI instead of pIRQ

Some domains are not aware of the pIRQ abstraction layer that maps
interrupt sources into Xen space interrupt numbers.  pIRQs values are
only exposed to domains that have the option to route physical
interrupts over event channels.

This creates issues for PCI-passthrough from a PVH domain, as some of
the passthrough related hypercalls use pIRQ as references to physical
interrupts on the system.  One of such interfaces is
XEN_DOMCTL_irq_permission, used to grant or revoke access to
interrupts, takes a pIRQ as the reference to the interrupt to be
adjusted.

Since PVH doesn't manage interrupts in terms of pIRQs, introduce a new
hypercall that allows setting interrupt permissions based on GSI value
rather than pIRQ.

Note the GSI hypercall parameters is translated to an IRQ value (in
case there are ACPI overrides) before doing the checks.

> So, add a new hypercall XEN_DOMCTL_gsi_permission to grant/revoke
> the permission of irq (translated from x86 gsi) to dumU when dom0
> has no PIRQs.
> 
> Regarding the translation from gsi to irq, it is that if there are
> ACPI overrides entries then get translation from them, if not gsi
> are identity mapped into irq.
> 
> Signed-off-by: Jiqian Chen 
> Signed-off-by: Huang Rui 
> Signed-off-by: Jiqian Chen 
> ---
> CC: Daniel P . Smith 
> Remaining unsolved comment @Daniel P . Smith:
> +ret = -EPERM;
> +if ( !irq_access_permitted(currd, irq) ||
> + xsm_irq_permission(XSM_HOOK, d, irq, flags) )
> +break;
> Is it okay to issue the XSM check using the translated value(irq),
> not the one(gsi) that was originally passed into the hypercall?
> ---
> v13->v14 changes:
> No.
> 
> v12->v13 changes:
> For struct xen_domctl_gsi_permission, rename "access_flag" to "flags", change 
> its type from uint8_t to uint32_t, delete "pad", add XEN_DOMCTL_GSI_REVOKE 
> and XEN_DOMCTL_GSI_GRANT macros.
> Move "gsi > highest_gsi()" into function gsi_2_irq.
> Modify parameter gsi in function gsi_2_irq and mp_find_ioapic to unsigned int 
> type.
> Delete unnecessary spaces and brackets around "~XEN_DOMCTL_GSI_ACTION_MASK".
> Delete unnecessary goto statements and change to direct break.
> Add description in commit message to explain how gsi to irq isconverted.
> 
> v11->v12 changes:
> Change nr_irqs_gsi to highest_gsi() to check gsi boundary, then need to 
> remove "__init" of highest_gsi function.
> Change the check of irq boundary from <0 to <=0, and remove unnecessary space.
> Add #define XEN_DOMCTL_GSI_PERMISSION_MASK 1 to get lowest bit.
> 
> v10->v11 changes:
> Extracted from patch#5 of v10 into a separate patch.
> Add non-zero judgment for other bits of allow_access.
> Delete unnecessary judgment "if ( is_pv_domain(currd) || has_pirq(currd) )".
> Change the error exit path identifier "out" to "gsi_permission_out".
> Use ARRAY_SIZE() instead of open coed.
> 
> v9->v10 changes:
> Modified the commit message to further describe the purpose of adding 
> XEN_DOMCTL_gsi_permission.
> Added a check for all zeros in the padding field in 
> XEN_DOMCTL_gsi_permission, and used currd instead of current->domain.
> In the function gsi_2_irq, apic_pin_2_gsi_irq was used instead of the 
> original new code, and error handling for irq0 was added.
> Deleted the extra spaces in the upper and lower lines of the struct 
> xen_domctl_gsi_permission definition.
> 
> v8->v9 changes:
> Change the commit message to describe more why we need this new hypercall.
> Add comment above "if ( is_pv_domain(current->domain) || 
> has_pirq(current->domain) )" to explain why we need this check.
> Add gsi_2_irq to transform gsi to irq, instead of considering gsi == irq.
> Add explicit padding to struct xen_domctl_gsi_permission.
> 
> v5->v8 changes:
> Nothing.
> 
> v4->v5 changes:
> New implementation to add new hypercall XEN_DOMCTL_gsi_permission to grant 
> gsi.
> ---
>  xen/arch/x86/domctl.c  | 29 +
>  xen/arch/x86/include/asm/io_apic.h |  2 ++
>  xen/arch/x86/io_apic.c | 21 +
>  xen/arch/x86/mpparse.c |  7 +++
>  xen/include/public/domctl.h| 10 ++
>  xen/xsm/flask/hooks.c  |  1 +
>  6 files changed, 66 insertions(+), 4 deletions(-)
> 
> diff --git a/xen/arch/x86/domctl.c b/xen/arch/x86/do

Re: [PATCH for-4.19 v2] x86/spec-ctrl: Support for SRSO_US_NO and SRSO_MSR_FIX

2024-09-04 Thread Roger Pau Monné

On Wed, Jun 19, 2024 at 08:10:57PM +0100, Andrew Cooper wrote:
> AMD have updated the SRSO whitepaper[1] with further information.
> 
> There's a new SRSO_U/S_NO enumeration saying that SRSO attacks can't cross the
> user/supervisor boundary.  i.e. Xen don't need to use IBPB-on-entry for PV.
> 
> There's also a new SRSO_MSR_FIX identifying that the BP_SPEC_REDUCE bit is
> available in MSR_BP_CFG.  When set, SRSO attacks can't cross the host/guest
> boundary.  i.e. Xen don't need to use IBPB-on-entry for HVM.
> 
> Extend ibpb_calculations() to account for these when calculating
> opt_ibpb_entry_{pv,hvm} defaults.  Add a bp-spec-reduce option to control the
> use of BP_SPEC_REDUCE, but activate it by default.
> 
> Because MSR_BP_CFG is core-scoped with a race condition updating it, repurpose
> amd_check_erratum_1485() into amd_check_bp_cfg() and calculate all updates at
> once.
> 
> Advertise SRSO_U/S_NO to guests whenever possible, as it allows the guest
> kernel to skip SRSO protections too.  This is easy for HVM guests, but hard
> for PV guests, as both the guest userspace and kernel operate in CPL3.  After
> discussing with AMD, it is believed to be safe to advertise SRSO_U/S_NO to PV
> guests when BP_SPEC_REDUCE is active.
> 
> Fix a typo in the SRSO_NO's comment.
> 
> [1] 
> https://www.amd.com/content/dam/amd/en/documents/corporate/cr/speculative-return-stack-overflow-whitepaper.pdf
> Signed-off-by: Andrew Cooper 
> ---
> CC: Jan Beulich 
> CC: Roger Pau Monné 
> CC: Oleksii Kurochko 
> 
> v2:
>  * Add "for HVM guests" to xen-command-line.pandoc
>  * Print details on boot
>  * Don't advertise SRSO_US_NO to PV guests if BP_SPEC_REDUCE isn't active.
> 
> For 4.19.  This should be no functional change on current hardware.  On
> forthcoming hardware, it avoids the substantial perf hits which were necessary
> to protect against the SRSO speculative vulnerability.
> ---
>  docs/misc/xen-command-line.pandoc   |  9 +++-
>  xen/arch/x86/cpu-policy.c   | 19 
>  xen/arch/x86/cpu/amd.c  | 29 +---
>  xen/arch/x86/include/asm/msr-index.h|  1 +
>  xen/arch/x86/include/asm/spec_ctrl.h|  1 +
>  xen/arch/x86/spec_ctrl.c| 49 -
>  xen/include/public/arch-x86/cpufeatureset.h |  4 +-
>  7 files changed, 92 insertions(+), 20 deletions(-)
> 
> diff --git a/docs/misc/xen-command-line.pandoc 
> b/docs/misc/xen-command-line.pandoc
> index 1dea7431fab6..88beb64525d5 100644
> --- a/docs/misc/xen-command-line.pandoc
> +++ b/docs/misc/xen-command-line.pandoc
> @@ -2390,7 +2390,7 @@ By default SSBD will be mitigated at runtime (i.e 
> `ssbd=runtime`).
>  >  {ibrs,ibpb,ssbd,psfd,
>  >  eager-fpu,l1d-flush,branch-harden,srb-lock,
>  >  unpriv-mmio,gds-mit,div-scrub,lock-harden,
> ->  bhi-dis-s}= ]`
> +>  bhi-dis-s,bp-spec-reduce}= ]`
>  
>  Controls for speculative execution sidechannel mitigations.  By default, Xen
>  will pick the most appropriate mitigations based on compiled in support,
> @@ -2539,6 +2539,13 @@ boolean can be used to force or prevent Xen from using 
> speculation barriers to
>  protect lock critical regions.  This mitigation won't be engaged by default,
>  and needs to be explicitly enabled on the command line.
>  
> +On hardware supporting SRSO_MSR_FIX, the `bp-spec-reduce=` option can be used
> +to force or prevent Xen from using MSR_BP_CFG.BP_SPEC_REDUCE to mitigate the
> +SRSO (Speculative Return Stack Overflow) vulnerability.  Xen will use
> +bp-spec-reduce when available, as it is preferable to using `ibpb-entry=hvm`
> +to mitigate SRSO for HVM guests, and because it is a necessary prerequisite 
> in
> +order to advertise SRSO_U/S_NO to PV guests.
> +
>  ### sync_console
>  > `= `
>  
> diff --git a/xen/arch/x86/cpu-policy.c b/xen/arch/x86/cpu-policy.c
> index 304dc20cfab8..fd32fe84 100644
> --- a/xen/arch/x86/cpu-policy.c
> +++ b/xen/arch/x86/cpu-policy.c
> @@ -14,6 +14,7 @@
>  #include 
>  #include 
>  #include 
> +#include 
>  #include 
>  
>  struct cpu_policy __read_mostly   raw_cpu_policy;
> @@ -605,6 +606,24 @@ static void __init calculate_pv_max_policy(void)
>  __clear_bit(X86_FEATURE_IBRS, fs);
>  }
>  
> +/*
> + * SRSO_U/S_NO means that the CPU is not vulnerable to SRSO attacks 
> across
> + * the User (CPL3)/Supervisor (CPL<3) boundary.  However the PV64
> + * user/kernel boundary is CPL3 on both sides, so it won't convey the
> + * meaning that a PV kernel expects.
> + *
> + * PV3

Re: [PATCH v3 7/7] x86/time: probe the CMOS RTC by default

2024-09-04 Thread Roger Pau Monné

On Tue, Sep 03, 2024 at 05:48:09PM +0200, Jan Beulich wrote:
> On 03.09.2024 15:03, Roger Pau Monne wrote:
> > Probing for the CMOS RTC registers consist in reading IO ports, and we 
> > expect
> > those reads to have no side effects even when the CMOS RTC is not present.
> 
> But what do we gain from this besides possible being slower to boot?

The intent is that Xen can successfully boot on more systems without
having to pass specific command line options.

> > --- a/docs/misc/xen-command-line.pandoc
> > +++ b/docs/misc/xen-command-line.pandoc
> > @@ -326,11 +326,14 @@ Interrupts.  Specifying zero disables CMCI handling.
> >  ### cmos-rtc-probe (x86)
> >  > `= `
> >  
> > -> Default: `false`
> > +> Default: `true`
> >  
> >  Flag to indicate whether to probe for a CMOS Real Time Clock irrespective 
> > of
> >  ACPI indicating none to be there.
> >  
> > +**WARNING: The `cmos-rtc-probe` option is deprecated and superseded by
> > +_wallclock=no-cmos-probe_ using both options in combination is undefined.**
> 
> Hmm, but then ...
> 
> > @@ -2822,7 +2825,7 @@ suboptimal scheduling decisions, but only when the 
> > system is
> >  oversubscribed (i.e., in total there are more vCPUs than pCPUs).
> >  
> >  ### wallclock (x86)
> > -> `= auto | xen | cmos | efi`
> > +> `= auto | xen | cmos | no-cmos-probe | efi`
> 
> ... this wants to be a boolean sub-option "cmos-probe", such that the flag
> can still be set both ways (in particular for a later command line option
> to override an earlier one).

What's the point in overriding?  Either the users selects a specific
wallclock to use, or it's left for Xen to decide which wallclock to
pick, either with (auto) or without (no-cmos-probe) possibly probing
the CMOS RTC.

Multiple different wallclock options being passed on the command line
will result in just the last one taking effect.

> > @@ -2836,6 +2839,11 @@ Allow forcing the usage of a specific wallclock 
> > source.
> >  
> >   * `cmos` force usage of the CMOS RTC wallclock.
> >  
> > + * `no-cmos-probe` do not probe for the CMOS RTC presence if the ACPI FADT
> > +   table signals there's no CMOS RTC.  Implies using the same heuristics as
> > +   the `auto` option.  By default Xen will probe for the CMOS RTC presence
> > +   even when ACPI FADT signals no CMOS RTC available.
> 
> "By default ..." reads as if this would always occur, which I don't think
> is the case.

Hm, not when using the Xen timer source indeed, there's no probing
then.

> > @@ -1560,6 +1560,8 @@ static int __init cf_check parse_wallclock(const char 
> > *arg)
> >  if ( !arg )
> >  return -EINVAL;
> >  
> > +cmos_rtc_probe = true;
> > +
> >  if ( !strcmp("auto", arg) )
> >  wallclock_source = WALLCLOCK_UNSET;
> >  else if ( !strcmp("xen", arg) )
> > @@ -1571,6 +1573,8 @@ static int __init cf_check parse_wallclock(const char 
> > *arg)
> >  }
> >  else if ( !strcmp("cmos", arg) )
> >  wallclock_source = WALLCLOCK_CMOS;
> > +else if ( !strcmp("no-cmos-probe", arg) )
> > +cmos_rtc_probe = false;
> >  else if ( !strcmp("efi", arg) )
> >  {
> >  if ( !efi_enabled(EFI_RS) )
> 
> And to request a particular wallclock _and_ control the probing one then
> needs two wallclock= on the command line? And - because of the forcing to
> true of cmos_rtc_probe - even in a particular order. Not very nice from a
> usability pov.

If you request a specific wallclock then there's no probing, so
nothing to control.  I agree the interface is not great, but I
couldn't come up with anything better.

I'm kind of fine with not introducing an extra option to wallclock= to
control the CMOS RTC probing, but would you agree to switching
cmos-rtc-probe to true by default?

Thanks, Roger.

Re: [PATCH v3 4/7] x86/time: introduce probing logic for the wallclock

2024-09-04 Thread Roger Pau Monné

On Wed, Sep 04, 2024 at 01:49:36PM +0200, Jan Beulich wrote:
> On 04.09.2024 12:58, Roger Pau Monné wrote:
> > I had it that way originally, but then it seemed the extra
> > indentation made it less readable.  Will see how can I adjust it, my
> > preference would be for:
> > 
> > panic("No usable wallclock found, probed:%s%s%s\n%s",
> >   !cmos_rtc_probe && !efi_enabled(EFI_RS) ? " None" : "",
> >   cmos_rtc_probe ? " CMOS" : "",
> >   efi_enabled(EFI_RS) ? " EFI" : "",
> >   !cmos_rtc_probe ? "Try with command line option 
> > \"cmos-rtc-probe\"\n"
> >   : !efi_enabled(EFI_RS) ? "System must be booted 
> > from EFI\n"
> >  : "");
> > 
> > But that exceeds the 80 columns limit.
> 
> Right, formally the above would be my preference, too. Here two shorter-
> lines alternatives:
> 
> panic("No usable wallclock found, probed:%s%s%s\n%s",
>   !cmos_rtc_probe && !efi_enabled(EFI_RS) ? " None" : "",
>   cmos_rtc_probe ? " CMOS" : "",
>   efi_enabled(EFI_RS) ? " EFI" : "",
>   !cmos_rtc_probe
>   ? "Try with command line option \"cmos-rtc-probe\"\n"
>   : !efi_enabled(EFI_RS) ? "System must be booted from EFI\n"
>  : "");
> 
> panic("No usable wallclock found, probed:%s%s%s\n%s",
>   !cmos_rtc_probe && !efi_enabled(EFI_RS) ? " None" : "",
>   cmos_rtc_probe ? " CMOS" : "",
>   efi_enabled(EFI_RS) ? " EFI" : "",
>   !cmos_rtc_probe
>   ? "Try with command line option \"cmos-rtc-probe\"\n"
>   : !efi_enabled(EFI_RS)
>   ? "System must be booted from EFI\n"
>   : "");
> 
> Either of these or anything more or less similar will do imo, just as
> long as the ? vs : alignment is there.

I think I prefer the second variant, as indentation is clearer there.

> 
> One thing I notice only now: The trailing %s will be a little odd if
> the "" variant is used in the last argument. That'll produce "(XEN) "
> with nothing following in the log. Which usually is a sign of some
> strange breakage.

I've tested this and it doesn't produce an extra newline if the string
parameter is "".  IOW:

printk("FOO\n%s", "");

Results in:

(XEN) [2.230603] TSC deadline timer enabled
(XEN) [2.235654] FOO
(XEN) [2.238682] Wallclock source: EFI

Thanks, Roger.

Re: [PATCH v3 4/7] x86/time: introduce probing logic for the wallclock

2024-09-04 Thread Roger Pau Monné

On Tue, Sep 03, 2024 at 05:32:27PM +0200, Jan Beulich wrote:
> On 03.09.2024 15:03, Roger Pau Monne wrote:
> > Adding such probing allows to clearly separate init vs runtime code, and to
> > place the probing logic into the init section for the CMOS case.  Note both
> > the Xen shared_info page wallclock, and the EFI wallclock don't really have 
> > any
> > probing-specific logic.  The shared_info wallclock will always be there if
> > booted as a Xen guest, while the EFI_GET_TIME method probing relies on 
> > checking
> > if it returns a value different than 0.
> > 
> > The panic message printed when Xen is unable to find a viable wallclock 
> > source
> > has been adjusted slightly, I believe the printed guidance still provides 
> > the
> > same amount of information to the user.
> > 
> > Signed-off-by: Roger Pau Monné 
> 
> Looks a little involved, but I'm largely fine with it; just a couple of
> more or less cosmetic remarks:
> 
> > @@ -1329,28 +1338,13 @@ static bool cmos_probe(struct rtc_time *rtc_p, bool 
> > cmos_rtc_probe)
> >  return false;
> >  }
> >  
> > -static unsigned long get_cmos_time(void)
> > +
> > +static unsigned long cmos_read(void)
> >  {
> > -unsigned long res;
> >  struct rtc_time rtc;
> > -static bool __read_mostly cmos_rtc_probe;
> > -boolean_param("cmos-rtc-probe", cmos_rtc_probe);
> > +bool success = __get_cmos_time(&rtc);
> >  
> > -if ( efi_enabled(EFI_RS) )
> > -{
> > -res = efi_get_time();
> > -if ( res )
> > -return res;
> > -}
> > -
> > -if ( likely(!(acpi_gbl_FADT.boot_flags & ACPI_FADT_NO_CMOS_RTC)) )
> > -cmos_rtc_probe = false;
> > -else if ( system_state < SYS_STATE_smp_boot && !cmos_rtc_probe )
> > -panic("System with no CMOS RTC advertised must be booted from EFI"
> > -  " (or with command line option \"cmos-rtc-probe\")\n");
> > -
> > -if ( !cmos_probe(&rtc, cmos_rtc_probe) )
> > -panic("No CMOS RTC found - system must be booted from EFI\n");
> > +ASSERT(success);
> 
> I'm not convinced of this assertion: It's either too much (compared to
> what we had so far) or not enough, considering the behavior ...
> 
> >  return mktime(rtc.year, rtc.mon, rtc.day, rtc.hour, rtc.min, rtc.sec);
> >  }
> 
> ... with a release build.

My reasoning was that on a debug build we want to spot any such
issues (as it's likely a symptom the RTC is misbehaving?) but on a release
build we should rather return an incorrect wallclock time rather than
panicking.  I can remove the ASSERT and local variable altogether if
you prefer.

> 
> > @@ -1533,12 +1527,82 @@ void rtc_guest_write(unsigned int port, unsigned 
> > int data)
> >  }
> >  }
> >  
> > -static unsigned long get_wallclock_time(void)
> > +static enum {
> > +WALLCLOCK_UNSET,
> > +WALLCLOCK_XEN,
> > +WALLCLOCK_CMOS,
> > +WALLCLOCK_EFI,
> > +} wallclock_source __ro_after_init;
> > +
> > +static const char *wallclock_type_to_string(void)
> 
> __init ?
> 
> >  {
> > +switch ( wallclock_source )
> > +{
> > +case WALLCLOCK_XEN:
> > +return "XEN";
> > +
> > +case WALLCLOCK_CMOS:
> > +return "CMOS RTC";
> > +
> > +case WALLCLOCK_EFI:
> > +return "EFI";
> > +
> > +case WALLCLOCK_UNSET:
> > +return "UNSET";
> > +}
> > +
> > +ASSERT_UNREACHABLE();
> > +return "";
> > +}
> > +
> > +static void __init probe_wallclock(void)
> > +{
> > +ASSERT(wallclock_source == WALLCLOCK_UNSET);
> > +
> >  if ( xen_guest )
> > +{
> > +wallclock_source = WALLCLOCK_XEN;
> > +return;
> > +}
> > +if ( efi_enabled(EFI_RS) && efi_get_time() )
> > +{
> > +wallclock_source = WALLCLOCK_EFI;
> > +return;
> > +}
> > +if ( cmos_probe() )
> > +{
> > +wallclock_source = WALLCLOCK_CMOS;
> > +return;
> > +}
> > +
> > +panic("No usable wallclock found, probed:%s%s%s\n%s",
> > +  !cmos_rtc_probe && !efi_enabled(EFI_RS) ? " None" : "",
> > +  cmos_rtc_probe ? " CMOS" : "

Re: [PATCH v3 3/7] x86/time: split CMOS read and probe logic into function

2024-09-04 Thread Roger Pau Monné

On Tue, Sep 03, 2024 at 05:16:44PM +0200, Jan Beulich wrote:
> On 03.09.2024 15:02, Roger Pau Monne wrote:
> > The current logic to probe for the CMOS RTC is open-coded in 
> > get_cmos_time(),
> > move it to a separate function that both serves the purpose of testing for 
> > the
> > CMOS RTC existence and returning its value.
> > 
> > The goal is to be able to split the probing and the reading logic into 
> > separate
> > helpers, and putting the current logic in a separate function helps 
> > simplifying
> > further changes.
> > 
> > A transient *rtc_p variable is introduced as a parameter to the function, 
> > that
> > will be removed by further changes.
> > 
> > No functional change intended.
> > 
> > Signed-off-by: Roger Pau Monné 
> 
> This looks like a straight transformation, except - as noted before - for ...
> 
> > --- a/xen/arch/x86/time.c
> > +++ b/xen/arch/x86/time.c
> > @@ -1292,45 +1292,32 @@ static bool __get_cmos_time(struct rtc_time *rtc)
> >  return t1 <= SECONDS(1) && t2 < MILLISECS(3);
> >  }
> >  
> > -static unsigned long get_cmos_time(void)
> > +static bool cmos_probe(struct rtc_time *rtc_p, bool cmos_rtc_probe)
> >  {
> > -unsigned long res;
> > -struct rtc_time rtc;
> >  unsigned int seconds = 60;
> > -static bool __read_mostly cmos_rtc_probe;
> > -boolean_param("cmos-rtc-probe", cmos_rtc_probe);
> > -
> > -if ( efi_enabled(EFI_RS) )
> > -{
> > -res = efi_get_time();
> > -if ( res )
> > -return res;
> > -}
> > -
> > -if ( likely(!(acpi_gbl_FADT.boot_flags & ACPI_FADT_NO_CMOS_RTC)) )
> > -cmos_rtc_probe = false;
> > -else if ( system_state < SYS_STATE_smp_boot && !cmos_rtc_probe )
> > -panic("System with no CMOS RTC advertised must be booted from EFI"
> > -  " (or with command line option \"cmos-rtc-probe\")\n");
> >  
> >  for ( ; ; )
> >  {
> > -bool success = __get_cmos_time(&rtc);
> > +bool success = __get_cmos_time(rtc_p);
> > +struct rtc_time rtc = *rtc_p;
> >  
> > -if ( likely(!cmos_rtc_probe) || !success ||
> > +if ( likely(!cmos_rtc_probe) )
> > +return true;
> > +
> > +if ( !success ||
> >   rtc.sec >= 60 || rtc.min >= 60 || rtc.hour >= 24 ||
> >   !rtc.day || rtc.day > 31 ||
> >   !rtc.mon || rtc.mon > 12 )
> > -break;
> > +return false;
> >  
> >  if ( seconds < 60 )
> >  {
> >  if ( rtc.sec != seconds )
> >  {
> > -cmos_rtc_probe = false;
> 
> ... the removal of this line. As asked for before, can the somewhat 
> sub-optimal
> new behavior (with the static, which now lives in another function, being
> cleared only the next time round) please be at least mentioned in the
> description?

Sure, sorry I've failed to do this here.  Such weird behavior is
transient, and will go away wit the next patch IIRC, once the probing
is properly split from the run-time reading of the CMOS RTC.

> >  acpi_gbl_FADT.boot_flags &= ~ACPI_FADT_NO_CMOS_RTC;
> > +return true;
> >  }
> > -break;
> > +return false;
> >  }
> >  
> >  process_pending_softirqs();
> > @@ -1338,7 +1325,31 @@ static unsigned long get_cmos_time(void)
> >  seconds = rtc.sec;
> >  }
> >  
> > -if ( unlikely(cmos_rtc_probe) )
> > +ASSERT_UNREACHABLE();
> > +return false;
> > +}
> > +
> > +static unsigned long get_cmos_time(void)
> > +{
> > +unsigned long res;
> > +struct rtc_time rtc;
> > +static bool __read_mostly cmos_rtc_probe;
> > +boolean_param("cmos-rtc-probe", cmos_rtc_probe);
> > +
> > +if ( efi_enabled(EFI_RS) )
> > +{
> > +res = efi_get_time();
> > +if ( res )
> > +return res;
> > +}
> > +
> > +if ( likely(!(acpi_gbl_FADT.boot_flags & ACPI_FADT_NO_CMOS_RTC)) )
> > +cmos_rtc_probe = false;
> > +else if ( system_state < SYS_STATE_smp_boot && !cmos_rtc_probe )
> > +panic("System with no CMOS RTC advertised must be booted from EFI"
> > +  " (or with

Re: [PATCH] blkif: reconcile protocol specification with in-use implementations

2024-09-04 Thread Roger Pau Monné

On Wed, Sep 04, 2024 at 09:35:40AM +, Anthony PERARD wrote:
> On Wed, Sep 04, 2024 at 11:11:51AM +0200, Roger Pau Monné wrote:
> > On Wed, Sep 04, 2024 at 09:39:17AM +0100, Paul Durrant wrote:
> > > On 04/09/2024 09:21, Roger Pau Monné wrote:
> > > > > In the absence of that I'm afraid it is a little harder to
> > > > > judge whether the proposal here is the best we can do at this point.
> > > >
> > > > While I don't mind looking at what we can do to better handle 4K
> > > > sector disks, we need IMO to revert to the specification before
> > > > 67e1c050e36b, as that change switched the hardcoded sector based units
> > > > from 512 to 'sector-size', thus breaking the existing ABI.
> > > >
> > >
> > > But that's the crux of the problem. What *is* is the ABI? We apparently
> > > don't have one that all OS subscribe to.
> >
> > At least prior to 67e1c050e36b the specification in blkif.h and (what
> > I consider) the reference implementation in Linux blk{front,back}
> > matched.  Previous to 67e1c050e36b blkif.h stated:
> >
> > /*
> >  * NB. first_sect and last_sect in blkif_request_segment, as well as
> >  * sector_number in blkif_request, are always expressed in 512-byte units.
> >  * However they must be properly aligned to the real sector size of the
> >  * physical disk, which is reported in the "physical-sector-size" node in
> >  * the backend xenbus info. Also the xenbus "sectors" node is expressed in
> >  * 512-byte units.
> >  */
> >
> > I think it was quite clear, and does in fact match the implementation
> > in Linux.
> 
> That's wrong, Linux doesn't match the specification before 67e1c050e36b,
> in particular for "sectors":
> 
> sectors
>  Values: 
> 
>  The size of the backend device, expressed in units of its logical
>  sector size ("sector-size").

This was a bug introduced in 2fa701e5346d.  The 'random' comment that
you mention notes that 'sectors' is unconditionally expressed in
512-byte units was added way before, in d05ae13188231.  The improved
documentation added by 2fa701e5346d missed to correctly reflect the
units of the 'sectors' node.

> 
> The only implementation that matches this specification is MiniOS (and
> OMVF).
> 
> Oh, I didn't notice that that random comment you quoted that comes from
> the middle of the header have a different definition for "sectors" ...
> 
> Well, the specification doesn't match with the specification ... and the
> only possible way to implement the specification is to only ever set
> "sector-size" to 512...
> 
> No wonder that they are so many different interpretation of the
> protocol.

My opinion is that there was a bug introduced in the specification in
2fa701e5346d, and that bug was extended by 67e1c050e36b to even more
fields.

Implementations should be fixed to adhere to the specification as it
was pre 2fa701e5346d, because that works correctly with 'sector-size'
!= 512, and is the one implemented in Linux blkfront and blkback.

There's no need to make this more complicated than it is.  We
introduced bugs in blkif.h, and those need to be fixed.  It's sad that
those bugs propagated into implementations, or that bugs from
implementations propagated into blkif.h.

I don't see an option where we get to keep our current diverging
implementations and still support 4K logical sector disks without
specification and code changes.  We could introduce a new way to
signal 4K logical sector sizes, but as that will require modifications
to every frontends and backend we might as well just fix the existing
mess and modify the implementations as required.

Thanks, Roger.

Re: [PATCH] blkif: reconcile protocol specification with in-use implementations

2024-09-04 Thread Roger Pau Monné

On Wed, Sep 04, 2024 at 11:31:08AM +0200, Jan Beulich wrote:
> On 04.09.2024 10:21, Roger Pau Monné wrote:
> > On Tue, Sep 03, 2024 at 04:36:37PM +0200, Jan Beulich wrote:
> >> On 03.09.2024 16:19, Roger Pau Monne wrote:
> >>> Current blkif implementations (both backends and frontends) have all 
> >>> slight
> >>> differences about how they handle the 'sector-size' xenstore node, and how
> >>> other fields are derived from this value or hardcoded to be expressed in 
> >>> units
> >>> of 512 bytes.
> >>>
> >>> To give some context, this is an excerpt of how different implementations 
> >>> use
> >>> the value in 'sector-size' as the base unit for to other fields rather 
> >>> than
> >>> just to set the logical sector size of the block device:
> >>>
> >>> │ sectors xenbus node │ requests sector_number │ 
> >>> requests {first,last}_sect
> >>> ┼─┼┼───
> >>> FreeBSD blk{front,back} │ sector-size │  sector-size   │  
> >>>  512
> >>> ┼─┼┼───
> >>> Linux blk{front,back}   │ 512 │  512   │  
> >>>  512
> >>> ┼─┼┼───
> >>> QEMU blkback│ sector-size │  sector-size   │  
> >>>  sector-size
> >>> ┼─┼┼───
> >>> Windows blkfront│ sector-size │  sector-size   │  
> >>>  sector-size
> >>> ┼─┼┼───
> >>> MiniOS  │ sector-size │  512   │  
> >>>  512
> >>>
> >>> An attempt was made by 67e1c050e36b in order to change the base units of 
> >>> the
> >>> request fields and the xenstore 'sectors' node.  That however only lead 
> >>> to more
> >>> confusion, as the specification now clearly diverged from the reference
> >>> implementation in Linux.  Such change was only implemented for QEMU Qdisk
> >>> and Windows PV blkfront.
> >>>
> >>> Partially revert to the state before 67e1c050e36b:
> >>>
> >>>  * Declare 'feature-large-sector-size' deprecated.  Frontends should not 
> >>> expose
> >>>the node, backends should not make decisions based on its presence.
> >>>
> >>>  * Clarify that 'sectors' xenstore node and the requests fields are 
> >>> always in
> >>>512-byte units, like it was previous to 67e1c050e36b.
> >>>
> >>> All base units for the fields used in the protocol are 512-byte based, the
> >>> xenbus 'sector-size' field is only used to signal the logic block size.  
> >>> When
> >>> 'sector-size' is greater than 512, blkfront implementations must make 
> >>> sure that
> >>> the offsets and sizes (even when expressed in 512-byte units) are aligned 
> >>> to
> >>> the logical block size specified in 'sector-size', otherwise the backend 
> >>> will
> >>> fail to process the requests.
> >>>
> >>> This will require changes to some of the frontends and backends in order 
> >>> to
> >>> properly support 'sector-size' nodes greater than 512.
> >>>
> >>> Fixes: 67e1c050e36b ('public/io/blkif.h: try to fix the semantics of 
> >>> sector based quantities')
> >>> Signed-off-by: Roger Pau Monné 
> >>
> >> Following the earlier discussion, I was kind of hoping that there would be
> >> at least an outline of some plan here as to (efficiently) dealing with 4k-
> >> sector disks.
> > 
> > What do you mean with efficiently?
> > 
> > 4K disks will set 'sector-size' to 4096, so the segments setup by the
> > frontends in the requests will all be 4K aligned (both address and
> > size).
> 
> Will they, despite granularity then being 512b?

The added text to blkif.h states:

"However the value in

Re: [PATCH v3 2/7] x86/time: move CMOS edge detection into read helper

2024-09-04 Thread Roger Pau Monné

On Tue, Sep 03, 2024 at 05:02:21PM +0200, Jan Beulich wrote:
> On 03.09.2024 15:02, Roger Pau Monne wrote:
> > Move the logic that ensures the CMOS RTC data is read just after it's been
> > updated into the __get_cmos_time() function that does the register reads.  
> > This
> > requires returning a boolean from __get_cmos_time() to signal whether the 
> > read
> > has been successfully performed after an update.
> 
> Considering the limited use of both function, that's probably fine a change
> to make, despite me otherwise thinking that this is the slightly wrong move.
> I'd generally expect __get_cmos_time() to be usable without that checking,
> so long as the results are interpreted appropriately.

I've expanded the commit message a bit to reflect what you mention
here.

> > --- a/xen/arch/x86/time.c
> > +++ b/xen/arch/x86/time.c
> > @@ -1247,8 +1247,26 @@ struct rtc_time {
> >  unsigned int year, mon, day, hour, min, sec;
> >  };
> >  
> > -static void __get_cmos_time(struct rtc_time *rtc)
> > +static bool __get_cmos_time(struct rtc_time *rtc)
> >  {
> > +s_time_t start, t1, t2;
> > +unsigned long flags;
> > +
> > +spin_lock_irqsave(&rtc_lock, flags);
> > +
> > +/* read RTC exactly on falling edge of update flag */
> > +start = NOW();
> > +do { /* may take up to 1 second... */
> > +t1 = NOW() - start;
> > +} while ( !(CMOS_READ(RTC_FREQ_SELECT) & RTC_UIP) &&
> > +  t1 <= SECONDS(1) );
> > +
> > +start = NOW();
> > +do { /* must try at least 2.228 ms */
> > +t2 = NOW() - start;
> > +} while ( (CMOS_READ(RTC_FREQ_SELECT) & RTC_UIP) &&
> > +  t2 < MILLISECS(3) );
> > +
> >  rtc->sec  = CMOS_READ(RTC_SECONDS);
> >  rtc->min  = CMOS_READ(RTC_MINUTES);
> >  rtc->hour = CMOS_READ(RTC_HOURS);
> > @@ -1268,11 +1286,15 @@ static void __get_cmos_time(struct rtc_time *rtc)
> >  
> >  if ( (rtc->year += 1900) < 1970 )
> >  rtc->year += 100;
> > +
> > +spin_unlock_irqrestore(&rtc_lock, flags);
> 
> Imo this unlock wants placing at least ahead of the if() in context. The
> lock needs to protect only the port accesses, not any of the calculations.

I could even cache the value of CMOS_READ(RTC_CONTROL) ahead of the
check, so that the drop could be dropped earlier, but I'm not going to
do it here.

Thanks, Roger.

Re: [PATCH v3 1/7] x86/time: introduce helper to fetch Xen wallclock when running as a guest

2024-09-04 Thread Roger Pau Monné

On Tue, Sep 03, 2024 at 04:48:41PM +0200, Jan Beulich wrote:
> On 03.09.2024 15:02, Roger Pau Monne wrote:
> > --- a/xen/arch/x86/time.c
> > +++ b/xen/arch/x86/time.c
> > @@ -785,6 +785,31 @@ static struct platform_timesource 
> > __initdata_cf_clobber plt_xen_timer =
> >  .resume = resume_xen_timer,
> >  .counter_bits = 63,
> >  };
> > +
> > +static unsigned long read_xen_wallclock(void)
> > +{
> > +struct shared_info *sh_info = XEN_shared_info;
> > +uint32_t wc_version;
> > +uint64_t wc_sec;
> > +
> > +ASSERT(xen_guest);
> > +
> > +do {
> > +wc_version = sh_info->wc_version & ~1;
> > +smp_rmb();
> > +
> > +wc_sec  = sh_info->wc_sec;
> > +smp_rmb();
> > +} while ( wc_version != sh_info->wc_version );
> > +
> > +return wc_sec + read_xen_timer() / 10;
> > +}
> > +#else
> > +static unsigned long read_xen_wallclock(void)
> > +{
> > +ASSERT_UNREACHABLE();
> > +return 0;
> > +}
> >  #endif
> 
> I understand you try to re-use an existing #ifdef here, but I wonder if I
> could talk you into not doing so and instead placing the #ifdef inside the
> (then single) function body. Less redundancy, less room for mistakes /
> oversights.

I don't mind much, I've assumed reducing the number of #ifdefs blocks
was better, but I don't have a strong opinion.  Initially I just kept
the #ifdef block in get_wallclock_time().

Thanks, Roger.

Re: [PATCH v3] SUPPORT.md: split XSM from Flask

2024-09-04 Thread Roger Pau Monné

On Wed, Aug 14, 2024 at 09:44:11AM +0200, Jan Beulich wrote:
> XSM is a generic framework, which in particular is also used by SILO.
> With this it can't really be experimental: Arm mandates SILO for having
> a security supported configuration.
> 
> Signed-off-by: Jan Beulich 

Reviewed-by: Roger Pau Monné 

> ---
> v3: Add explanations. Another terminology adjustment.
> v2: Terminology adjustments. Stronger description.
> 
> --- a/SUPPORT.md
> +++ b/SUPPORT.md
> @@ -769,13 +769,21 @@ Compile time disabled for ARM by default
>  
>  Status, x86: Supported, not security supported
>  
> -### XSM & FLASK
> +### XSM (Xen Security Module) Framework
> +
> +XSM is a security policy framework.  The dummy implementation is covered by 
> this
> +statement, and implements a policy whereby dom0 is all powerful.  See below 
> for
> +alternative modules (FLASK, SILO).
> +
> +Status: Supported
> +
> +### FLASK XSM Module
>  
>  Status: Experimental
>  
>  Compile time disabled by default.
>  
> -Also note that using XSM
> +Also note that using FLASK
>  to delegate various domain control hypercalls
>  to particular other domains, rather than only permitting use by dom0,
>  is also specifically excluded from security support for many hypercalls.
> @@ -788,6 +796,13 @@ Please see XSA-77 for more details.
>  The default policy includes FLASK labels and roles for a "typical" Xen-based 
> system
>  with dom0, driver domains, stub domains, domUs, and so on.
>  
> +### SILO XSM Module
> +
> +SILO implements a policy whereby DomU-s can only communicate with Dom0, yet 
> not
> +with each other.

Might be good to clarify SILO is just like the dummy XSM
implementation without allowing inter-domain communication, ie:

"SILO extends the dummy XSM policy by enforcing that DomU-s can only
communicate with Dom0, yet not with each other."

Or similar.

Thanks, Roger.

Re: [PATCH] blkif: reconcile protocol specification with in-use implementations

2024-09-04 Thread Roger Pau Monné

On Wed, Sep 04, 2024 at 09:39:17AM +0100, Paul Durrant wrote:
> On 04/09/2024 09:21, Roger Pau Monné wrote:
> > > In the absence of that I'm afraid it is a little harder to
> > > judge whether the proposal here is the best we can do at this point.
> > 
> > While I don't mind looking at what we can do to better handle 4K
> > sector disks, we need IMO to revert to the specification before
> > 67e1c050e36b, as that change switched the hardcoded sector based units
> > from 512 to 'sector-size', thus breaking the existing ABI.
> > 
> 
> But that's the crux of the problem. What *is* is the ABI? We apparently
> don't have one that all OS subscribe to.

At least prior to 67e1c050e36b the specification in blkif.h and (what
I consider) the reference implementation in Linux blk{front,back}
matched.  Previous to 67e1c050e36b blkif.h stated:

/*
 * NB. first_sect and last_sect in blkif_request_segment, as well as
 * sector_number in blkif_request, are always expressed in 512-byte units.
 * However they must be properly aligned to the real sector size of the
 * physical disk, which is reported in the "physical-sector-size" node in
 * the backend xenbus info. Also the xenbus "sectors" node is expressed in
 * 512-byte units.
 */

I think it was quite clear, and does in fact match the implementation
in Linux.

> From your findings it appears that the only thing that will work globally is
> to define 'sector-size' is strictly 512 and deprecate any large sector size
> support of any kind.

As said to Anthony, how do we deal with disks with 4K logical sectors?
I'm not really up for implementing read-modify-write in blkback on
Linux, as it would be complex, slow, and likely prone to errors.

We could introduce a new feature (`logical-sector-size` or some such?)
to expose a sector size != 4K, but we might as well just fix existing
implementations to match with the specification, as that's overall
less changes.

In kernel Linux blk{front,back} have always worked fine with 4K sector
disks, and did match the specification prior to 67e1c050e36b.  Let's
clarify the specification as required and fix the remaining front and
backends to adhere to it.

Thanks, Roger.

Re: [PATCH] blkif: reconcile protocol specification with in-use implementations

2024-09-04 Thread Roger Pau Monné

On Tue, Sep 03, 2024 at 04:36:37PM +0200, Jan Beulich wrote:
> On 03.09.2024 16:19, Roger Pau Monne wrote:
> > Current blkif implementations (both backends and frontends) have all slight
> > differences about how they handle the 'sector-size' xenstore node, and how
> > other fields are derived from this value or hardcoded to be expressed in 
> > units
> > of 512 bytes.
> > 
> > To give some context, this is an excerpt of how different implementations 
> > use
> > the value in 'sector-size' as the base unit for to other fields rather than
> > just to set the logical sector size of the block device:
> > 
> > │ sectors xenbus node │ requests sector_number │ 
> > requests {first,last}_sect
> > ┼─┼┼───
> > FreeBSD blk{front,back} │ sector-size │  sector-size   │
> >512
> > ┼─┼┼───
> > Linux blk{front,back}   │ 512 │  512   │
> >512
> > ┼─┼┼───
> > QEMU blkback│ sector-size │  sector-size   │
> >sector-size
> > ┼─┼┼───
> > Windows blkfront│ sector-size │  sector-size   │
> >sector-size
> > ┼─┼┼───
> > MiniOS  │ sector-size │  512   │
> >512
> > 
> > An attempt was made by 67e1c050e36b in order to change the base units of the
> > request fields and the xenstore 'sectors' node.  That however only lead to 
> > more
> > confusion, as the specification now clearly diverged from the reference
> > implementation in Linux.  Such change was only implemented for QEMU Qdisk
> > and Windows PV blkfront.
> > 
> > Partially revert to the state before 67e1c050e36b:
> > 
> >  * Declare 'feature-large-sector-size' deprecated.  Frontends should not 
> > expose
> >the node, backends should not make decisions based on its presence.
> > 
> >  * Clarify that 'sectors' xenstore node and the requests fields are always 
> > in
> >512-byte units, like it was previous to 67e1c050e36b.
> > 
> > All base units for the fields used in the protocol are 512-byte based, the
> > xenbus 'sector-size' field is only used to signal the logic block size.  
> > When
> > 'sector-size' is greater than 512, blkfront implementations must make sure 
> > that
> > the offsets and sizes (even when expressed in 512-byte units) are aligned to
> > the logical block size specified in 'sector-size', otherwise the backend 
> > will
> > fail to process the requests.
> > 
> > This will require changes to some of the frontends and backends in order to
> > properly support 'sector-size' nodes greater than 512.
> > 
> > Fixes: 67e1c050e36b ('public/io/blkif.h: try to fix the semantics of sector 
> > based quantities')
> > Signed-off-by: Roger Pau Monné 
> 
> Following the earlier discussion, I was kind of hoping that there would be
> at least an outline of some plan here as to (efficiently) dealing with 4k-
> sector disks.

What do you mean with efficiently?

4K disks will set 'sector-size' to 4096, so the segments setup by the
frontends in the requests will all be 4K aligned (both address and
size).  Every segment will fill a full blkif_request_segment (making
the last_sect field kind of pointless).

> In the absence of that I'm afraid it is a little harder to
> judge whether the proposal here is the best we can do at this point.

While I don't mind looking at what we can do to better handle 4K
sector disks, we need IMO to revert to the specification before
67e1c050e36b, as that change switched the hardcoded sector based units
from 512 to 'sector-size', thus breaking the existing ABI.

Thanks, Roger.

Re: [PATCH v2] x86/boot: Fix include paths for 32bit objects

2024-09-03 Thread Roger Pau Monné

On Tue, Sep 03, 2024 at 02:01:45PM +0100, Andrew Cooper wrote:
> On 03/09/2024 1:43 pm, Roger Pau Monné wrote:
> > On Tue, Sep 03, 2024 at 12:53:34PM +0100, Andrew Cooper wrote:
> >> Most of Xen is build using -nostdinc and a fully specified include path.
> >> However, the makefile line:
> >>
> >>   $(head-bin-objs): XEN_CFLAGS := $(CFLAGS_x86_32) -fpic
> >>
> >> discards XEN_CFLAGS and replaces them with CFLAGS_x86_32.
> >>
> >> Reinstate -nostdinc, and copy all -I (and -Wa,-I) arguments from 
> >> XEN_CFLAGS.
> >>
> >> Signed-off-by: Andrew Cooper 
> > I wouldn't mind if you also open-coded the config.h -include addition
> > to CFLAGS_x86_32, regardless:
> >
> > Reviewed-by: Roger Pau Monné 
> 
> Thanks.
> 
> TBH, I'm going to put it in as is and unblock the fixes behind it.
> 
> We can adjust the others in due course.

Sure, if that allows you to unblock the rest.

> Given the other shuffling of headers we've done recently, I'm starting
> to think that the -include isn't really as needed as it might once have
> been.
> 
> > I do wonder however whether the explicit assembler includes parameters
> > (-Wa,-I) are actually required, seeing as we only provide include/ to
> > the assembler, but not the arch-specific include paths.
> >
> > This is from XSA-254, which used the '.include' asm directive, but
> > that was ultimately removed by:
> >
> > 762c3890c89f x86: fold indirect_thunk_asm.h into asm-defns.h
> >
> > So maybe the -Wa,-I is no longer needed?
> 
> Perhaps, although I'm struggling to find where it's declared today.

It's in xen/arch/x86/arch.mk.

Thanks, Roger.

Re: [PATCH v2] x86/boot: Fix include paths for 32bit objects

2024-09-03 Thread Roger Pau Monné

On Tue, Sep 03, 2024 at 12:53:34PM +0100, Andrew Cooper wrote:
> Most of Xen is build using -nostdinc and a fully specified include path.
> However, the makefile line:
> 
>   $(head-bin-objs): XEN_CFLAGS := $(CFLAGS_x86_32) -fpic
> 
> discards XEN_CFLAGS and replaces them with CFLAGS_x86_32.
> 
> Reinstate -nostdinc, and copy all -I (and -Wa,-I) arguments from XEN_CFLAGS.
> 
> Signed-off-by: Andrew Cooper 

I wouldn't mind if you also open-coded the config.h -include addition
to CFLAGS_x86_32, regardless:

Reviewed-by: Roger Pau Monné 

I do wonder however whether the explicit assembler includes parameters
(-Wa,-I) are actually required, seeing as we only provide include/ to
the assembler, but not the arch-specific include paths.

This is from XSA-254, which used the '.include' asm directive, but
that was ultimately removed by:

762c3890c89f x86: fold indirect_thunk_asm.h into asm-defns.h

So maybe the -Wa,-I is no longer needed?

Thanks, Roger.

Re: [PATCH] x86/boot: Reinstate -nostdinc for CFLAGS_x86_32

2024-09-03 Thread Roger Pau Monné

On Tue, Sep 03, 2024 at 12:51:28PM +0100, Andrew Cooper wrote:
> On 03/09/2024 12:20 pm, Anthony PERARD wrote:
> > On Tue, Sep 03, 2024 at 11:49:40AM +0100, Andrew Cooper wrote:
> >> Most of Xen is build using -nostdinc and a fully specified include path.
> >> However, the makefile line:
> >>
> >>   $(head-bin-objs): XEN_CFLAGS := $(CFLAGS_x86_32) -fpic
> >>
> >> discards XEN_CFLAGS and replaces them with CFLAGS_x86_32.
> >>
> >> Reinstate -nostdinc, and add the arch include path to the command line.  
> >> This
> >> is a latent bug for now, but it breaks properly with subsequent include
> >> changes.
> >>
> >> Fixes: d58a509e01c4 ("build,x86: remove the need for build32.mk")
> > I disagree with this. I'm pretty sure I've check that no command line
> > have changed.
> 
> Fine, I'll drop it.
> 
> >
> > Also, this commit shows that CFLAGS was only coming from root's
> > Config.mk:
> >> diff --git a/xen/arch/x86/boot/build32.mk b/xen/arch/x86/boot
> >> deleted file mode 100644
> >> index e90680cd9f..00
> >> --- a/xen/arch/x86/boot/build32.mk
> >> +++ /dev/null
> >> @@ -1,40 +0,0 @@
> >> -override XEN_TARGET_ARCH=x86_32
> >> -CFLAGS =
> >> -include $(XEN_ROOT)/Config.mk
> > So, I'm pretty sure, -nostdinc was never used before. But happy to be
> > told that I've come to the wrong conclusion. (We would need to check by
> > building with an old version without this commit to be sure.)
> 
> -nostdinc was added after the fact.  Which is fine.
> 
> But as a consequence of these being unlike the rest of Xen, somehow (and
> only on FreeBSD it seems), the regular build of Xen is fine, but
> tools/firmware/xen-dir for the shim is subject to -nostdinc in a way
> that breaks cmdline.c

FWIW, it did break for me on a normal build in xen/ on FreeBSD, no
need to run it from tools/firmware/xen-dir.

> > "xen/arch/x86/boot" as it's own sets of CFLAGS, which is annoying and I
> > haven't really tried to change that. This is also why XEN_CFLAGS is been
> > discarded.
> 
> This is necessary.  We're swapping -m64 for -m32 to build these two
> objects, and that invalidates a whole bunch of other CFLAGS.

See my previous reply, I guess figuring out which of those options also
need to be dropped as a result of dropping -m64 is likely too
complicated and fragile as we add new options.

Thanks, Roger.

Re: [PATCH] x86/boot: Reinstate -nostdinc for CFLAGS_x86_32

2024-09-03 Thread Roger Pau Monné

On Tue, Sep 03, 2024 at 12:16:33PM +0100, Andrew Cooper wrote:
> On 03/09/2024 12:08 pm, Andrew Cooper wrote:
> > On 03/09/2024 11:54 am, Roger Pau Monné wrote:
> >> On Tue, Sep 03, 2024 at 11:49:40AM +0100, Andrew Cooper wrote:
> >>> Most of Xen is build using -nostdinc and a fully specified include path.
> >>> However, the makefile line:
> >>>
> >>>   $(head-bin-objs): XEN_CFLAGS := $(CFLAGS_x86_32) -fpic
> >>>
> >>> discards XEN_CFLAGS and replaces them with CFLAGS_x86_32.
> >>>
> >>> Reinstate -nostdinc, and add the arch include path to the command line.  
> >>> This
> >>> is a latent bug for now, but it breaks properly with subsequent include
> >>> changes.
> >>>
> >>> Fixes: d58a509e01c4 ("build,x86: remove the need for build32.mk")
> >>> Signed-off-by: Andrew Cooper 
> >>> ---
> >>> CC: Jan Beulich 
> >>> CC: Roger Pau Monné 
> >>> CC: Anthony PERARD 
> >>> ---
> >>>  xen/arch/x86/boot/Makefile | 6 +++---
> >>>  1 file changed, 3 insertions(+), 3 deletions(-)
> >>>
> >>> diff --git a/xen/arch/x86/boot/Makefile b/xen/arch/x86/boot/Makefile
> >>> index 03d8ce3a9e48..19eec01e176e 100644
> >>> --- a/xen/arch/x86/boot/Makefile
> >>> +++ b/xen/arch/x86/boot/Makefile
> >>> @@ -13,11 +13,11 @@ $(obj)/head.o: $(head-bin-objs:.o=.bin)
> >>>  
> >>>  CFLAGS_x86_32 := $(subst -m64,-m32 -march=i686,$(XEN_TREEWIDE_CFLAGS))
> >>>  $(call cc-options-add,CFLAGS_x86_32,CC,$(EMBEDDED_EXTRA_CFLAGS))
> >>> -CFLAGS_x86_32 += -Werror -fno-builtin -g0 -msoft-float
> >>> +CFLAGS_x86_32 += -Werror -nostdinc -fno-builtin -g0 -msoft-float
> >>>  ifneq ($(abs_objtree),$(abs_srctree))
> >>> -CFLAGS_x86_32 += -I$(objtree)/include
> >>> +CFLAGS_x86_32 += -I$(objtree)/include 
> >>> -I$(objtree)/arch/$(SRCARCH)/include
> >>>  endif
> >>> -CFLAGS_x86_32 += -I$(srctree)/include
> >>> +CFLAGS_x86_32 += -I$(srctree)/include 
> >>> -I$(srctree)/arch/$(SRCARCH)/include
> >> I think it might be best to just filter out the include paths from
> >> XEN_CFLAGS, iow:
> >>
> >> CFLAGS_x86_32 += $(filter -I%,$(XEN_CFLAGS))
> >>
> >> Instead of having to open-code the paths for the include search paths
> >> here again.  Using the filter leads to the following CC command line:
> >>
> >> clang -MMD -MP -MF arch/x86/boot/.cmdline.o.d -m32 -march=i686 -DBUILD_ID 
> >> -fno-strict-aliasing -std=gnu99 -Wall -Wstrict-prototypes 
> >> -Wno-unused-but-set-variable -Wno-unused-local-typedefs   -fno-pie 
> >> -fno-stack-protector -fno-exceptions -fno-asynchronous-unwind-tables 
> >> -Werror -fno-builtin -g0 -msoft-float -mregparm=3 -I./include 
> >> -I./arch/x86/include -I./arch/x86/include/generated 
> >> -I./arch/x86/include/asm/mach-generic 
> >> -I./arch/x86/include/asm/mach-default -fpic 
> >> '-D__OBJECT_LABEL__=arch/x86/boot/cmdline.o'-c arch/x86/boot/cmdline.c 
> >> -o arch/x86/boot/.cmdline.o.tmp -MQ arch/x86/boot/cmdline.o
> > FWIW, https://cirrus-ci.com/build/5930269490806784 shows a good build on
> > FreeBSD with this patch in place.
> >
> > I'd be happy with that approach.  It's probably less fragile, although
> > I'll probably go with:
> >
> > CFLAGS_x86_32 += -nostdinc $(filter -I%,$(XEN_CFLAGS))
> >
> > to handle all the include changes together.  Lemme spin a v2.
> 
> Actually, it's not quite that easy.  From a regular Xen object file, we
> have:
> 
>  * -Wa,-I,./include (twice, for some reason).

There's a comment next to where this is added:

# Set up the assembler include path properly for older toolchains.

But won't we also need extra -Wa,-I for the other include paths that
are passed on the command line? (./arch/x86/include for example)

>  * -include ./include/xen/config.h

Hm, we could manually add that include option.

> 
> The former can be added to the filter reasonably easily, but the latter
> cannot.  I guess we've gone this long without it...

I've been also thinking, another approach we could take is filtering
out what we don't want, but I think that might end up being more
fragile.

Thanks, Roger.

Re: [PATCH] x86/boot: Reinstate -nostdinc for CFLAGS_x86_32

2024-09-03 Thread Roger Pau Monné

On Tue, Sep 03, 2024 at 11:49:40AM +0100, Andrew Cooper wrote:
> Most of Xen is build using -nostdinc and a fully specified include path.
> However, the makefile line:
> 
>   $(head-bin-objs): XEN_CFLAGS := $(CFLAGS_x86_32) -fpic
> 
> discards XEN_CFLAGS and replaces them with CFLAGS_x86_32.
> 
> Reinstate -nostdinc, and add the arch include path to the command line.  This
> is a latent bug for now, but it breaks properly with subsequent include
> changes.
> 
> Fixes: d58a509e01c4 ("build,x86: remove the need for build32.mk")
> Signed-off-by: Andrew Cooper 
> ---
> CC: Jan Beulich 
> CC: Roger Pau Monné 
> CC: Anthony PERARD 
> ---
>  xen/arch/x86/boot/Makefile | 6 +++---
>  1 file changed, 3 insertions(+), 3 deletions(-)
> 
> diff --git a/xen/arch/x86/boot/Makefile b/xen/arch/x86/boot/Makefile
> index 03d8ce3a9e48..19eec01e176e 100644
> --- a/xen/arch/x86/boot/Makefile
> +++ b/xen/arch/x86/boot/Makefile
> @@ -13,11 +13,11 @@ $(obj)/head.o: $(head-bin-objs:.o=.bin)
>  
>  CFLAGS_x86_32 := $(subst -m64,-m32 -march=i686,$(XEN_TREEWIDE_CFLAGS))
>  $(call cc-options-add,CFLAGS_x86_32,CC,$(EMBEDDED_EXTRA_CFLAGS))
> -CFLAGS_x86_32 += -Werror -fno-builtin -g0 -msoft-float
> +CFLAGS_x86_32 += -Werror -nostdinc -fno-builtin -g0 -msoft-float
>  ifneq ($(abs_objtree),$(abs_srctree))
> -CFLAGS_x86_32 += -I$(objtree)/include
> +CFLAGS_x86_32 += -I$(objtree)/include -I$(objtree)/arch/$(SRCARCH)/include
>  endif
> -CFLAGS_x86_32 += -I$(srctree)/include
> +CFLAGS_x86_32 += -I$(srctree)/include -I$(srctree)/arch/$(SRCARCH)/include

I think it might be best to just filter out the include paths from
XEN_CFLAGS, iow:

CFLAGS_x86_32 += $(filter -I%,$(XEN_CFLAGS))

Instead of having to open-code the paths for the include search paths
here again.  Using the filter leads to the following CC command line:

clang -MMD -MP -MF arch/x86/boot/.cmdline.o.d -m32 -march=i686 -DBUILD_ID 
-fno-strict-aliasing -std=gnu99 -Wall -Wstrict-prototypes 
-Wno-unused-but-set-variable -Wno-unused-local-typedefs   -fno-pie 
-fno-stack-protector -fno-exceptions -fno-asynchronous-unwind-tables -Werror 
-fno-builtin -g0 -msoft-float -mregparm=3 -I./include -I./arch/x86/include 
-I./arch/x86/include/generated -I./arch/x86/include/asm/mach-generic 
-I./arch/x86/include/asm/mach-default -fpic 
'-D__OBJECT_LABEL__=arch/x86/boot/cmdline.o'-c arch/x86/boot/cmdline.c -o 
arch/x86/boot/.cmdline.o.tmp -MQ arch/x86/boot/cmdline.o

Thanks, Roger.

1 2 3 4 5 6 7 8 9 10 >

1 - 100 of 3923 matches

Mail list logo