Re: [PATCH 1/7] mm: remove arch_flush_lazy_mmu_mode()
On Thu, Sep 04, 2025 at 01:57:30PM +0100, Kevin Brodsky wrote: > This function has only ever been used in arch/x86, so there is no > need for other architectures to implement it. Remove it from > linux/pgtable.h and all architectures besides x86. > > The arm64 implementation is not empty but it is only called from > arch_leave_lazy_mmu_mode(), so we can simply fold it there. > > Signed-off-by: Kevin Brodsky Acked-by: Mike Rapoport (Microsoft) > --- > arch/arm64/include/asm/pgtable.h | 9 + > arch/powerpc/include/asm/book3s/64/tlbflush-hash.h | 2 -- > arch/sparc/include/asm/tlbflush_64.h | 1 - > arch/x86/include/asm/pgtable.h | 3 ++- > include/linux/pgtable.h| 1 - > 5 files changed, 3 insertions(+), 13 deletions(-) > > diff --git a/arch/arm64/include/asm/pgtable.h > b/arch/arm64/include/asm/pgtable.h > index abd2dee416b3..728d7b6ed20a 100644 > --- a/arch/arm64/include/asm/pgtable.h > +++ b/arch/arm64/include/asm/pgtable.h > @@ -101,21 +101,14 @@ static inline void arch_enter_lazy_mmu_mode(void) > set_thread_flag(TIF_LAZY_MMU); > } > > -static inline void arch_flush_lazy_mmu_mode(void) > +static inline void arch_leave_lazy_mmu_mode(void) > { > if (in_interrupt()) > return; > > if (test_and_clear_thread_flag(TIF_LAZY_MMU_PENDING)) > emit_pte_barriers(); > -} > - > -static inline void arch_leave_lazy_mmu_mode(void) > -{ > - if (in_interrupt()) > - return; > > - arch_flush_lazy_mmu_mode(); > clear_thread_flag(TIF_LAZY_MMU); > } > > diff --git a/arch/powerpc/include/asm/book3s/64/tlbflush-hash.h > b/arch/powerpc/include/asm/book3s/64/tlbflush-hash.h > index 146287d9580f..176d7fd79eeb 100644 > --- a/arch/powerpc/include/asm/book3s/64/tlbflush-hash.h > +++ b/arch/powerpc/include/asm/book3s/64/tlbflush-hash.h > @@ -55,8 +55,6 @@ static inline void arch_leave_lazy_mmu_mode(void) > preempt_enable(); > } > > -#define arch_flush_lazy_mmu_mode() do {} while (0) > - > extern void hash__tlbiel_all(unsigned int action); > > extern void flush_hash_page(unsigned long vpn, real_pte_t pte, int psize, > diff --git a/arch/sparc/include/asm/tlbflush_64.h > b/arch/sparc/include/asm/tlbflush_64.h > index 8b8cdaa69272..cd144eb31bdd 100644 > --- a/arch/sparc/include/asm/tlbflush_64.h > +++ b/arch/sparc/include/asm/tlbflush_64.h > @@ -44,7 +44,6 @@ void flush_tlb_kernel_range(unsigned long start, unsigned > long end); > void flush_tlb_pending(void); > void arch_enter_lazy_mmu_mode(void); > void arch_leave_lazy_mmu_mode(void); > -#define arch_flush_lazy_mmu_mode() do {} while (0) > > /* Local cpu only. */ > void __flush_tlb_all(void); > diff --git a/arch/x86/include/asm/pgtable.h b/arch/x86/include/asm/pgtable.h > index e33df3da6980..14fd672bc9b2 100644 > --- a/arch/x86/include/asm/pgtable.h > +++ b/arch/x86/include/asm/pgtable.h > @@ -117,7 +117,8 @@ extern pmdval_t early_pmd_flags; > #define pte_val(x) native_pte_val(x) > #define __pte(x) native_make_pte(x) > > -#define arch_end_context_switch(prev)do {} while(0) > +#define arch_end_context_switch(prev)do {} while (0) > +#define arch_flush_lazy_mmu_mode() do {} while (0) > #endif /* CONFIG_PARAVIRT_XXL */ > > static inline pmd_t pmd_set_flags(pmd_t pmd, pmdval_t set) > diff --git a/include/linux/pgtable.h b/include/linux/pgtable.h > index 4c035637eeb7..8848e132a6be 100644 > --- a/include/linux/pgtable.h > +++ b/include/linux/pgtable.h > @@ -234,7 +234,6 @@ static inline int pmd_dirty(pmd_t pmd) > #ifndef __HAVE_ARCH_ENTER_LAZY_MMU_MODE > #define arch_enter_lazy_mmu_mode() do {} while (0) > #define arch_leave_lazy_mmu_mode() do {} while (0) > -#define arch_flush_lazy_mmu_mode() do {} while (0) > #endif > > #ifndef pte_batch_hint > -- > 2.47.0 > -- Sincerely yours, Mike.
Re: [PATCH 2/7] mm: introduce local state for lazy_mmu sections
On Fri, Sep 05, 2025 at 12:14:39AM +0200, Kevin Brodsky wrote: > On 04/09/2025 19:28, Lorenzo Stoakes wrote: > > Hi Kevin, > > > > This is causing a build failure: > > > > In file included from ./include/linux/mm.h:31, > > from mm/userfaultfd.c:8: > > mm/userfaultfd.c: In function ‘move_present_ptes’: > > ./include/linux/pgtable.h:247:41: error: statement with no effect > > [-Werror=unused-value] > > 247 | #define arch_enter_lazy_mmu_mode() (LAZY_MMU_DEFAULT) > > | ^ > > mm/userfaultfd.c:1103:9: note: in expansion of macro > > ‘arch_enter_lazy_mmu_mode’ > > 1103 | arch_enter_lazy_mmu_mode(); > > | ^~~~ > > ./include/linux/pgtable.h:248:54: error: expected expression before ‘)’ > > token > > 248 | #define arch_leave_lazy_mmu_mode(state) ((void)(state)) > > | ^ > > mm/userfaultfd.c:1141:9: note: in expansion of macro > > ‘arch_leave_lazy_mmu_mode’ > > 1141 | arch_leave_lazy_mmu_mode(); > > | ^~~~ > > > > It seems you haven't carefully checked call sites here, please do very > > carefully recheck these - I see Yeoreum reported a mising kasan case, so I > > suggest you just aggressively grep this + make sure you've covered all > > bases :) > > I did check all call sites pretty carefully and of course build-tested, > but my series is based on v6.17-rc4 - just like the calls Yeoreum > mentioned, the issue is that those calls are in mm-stable but not in > mainline :/ I suppose I should post a v2 rebased on mm-stable ASAP then? You should really base on mm-new. You need to account for everything that is potentially going to go upstream. mm-stable is generally not actually populated all too well until shortly before merge window anyway. > > - Kevin Thanks, Lorenzo
[PATCH v7 03/12] xen/arm: vgic: implement helper functions for virq checks
Introduced two new helper functions for vGIC: vgic_is_valid_line and vgic_is_spi. The functions are similar to the newly introduced gic_is_valid_line and gic_is_spi, but they verify whether a vIRQ is available for a specific domain, while GIC-specific functions validate INTIDs for the real GIC hardware. For example, the GIC may support all 992 SPI lines, but the domain may use only some part of them (e.g., 640), depending on the highest IRQ number defined in the domain configuration. Therefore, for vGIC-related code and checks, the appropriate functions should be used. Also, updated the appropriate checks to use these new helper functions. The purpose of introducing new helper functions for vGIC is essentially the same as for GIC: to avoid potential confusion with GIC-related checks and to consolidate similar code into separate functions, which can be more easily extended by additional conditions, e.g., when implementing extended SPI interrupts. Only the validation change in vgic_inject_irq may affect existing functionality, as it currently checks whether the vIRQ is less than or equal to vgic_num_irqs. Since IRQ indexes start from 0 (where 32 is the first SPI), the check should behave consistently with similar logic in other places and should check if the vIRQ number is less than vgic_num_irqs. The remaining changes, which replace open-coded checks with the use of these new helper functions, do not introduce any functional changes, as the helper functions follow the current vIRQ index verification logic. Signed-off-by: Leonid Komarianskyi Reviewed-by: Oleksandr Tyshchenko Reviewed-by: Volodymyr Babchuk Acked-by: Julien Grall --- Changes in V7: - no changes Changes in V6: - no changes Changes in V5: - added reviewed-by from Oleksandr Tyshchenko and from Volodymyr Babchuk - added acked-by from Julien Grall Changes in V4: - removed redundant parentheses Changes in V3: - renamed vgic_is_valid_irq to vgic_is_valid_line and vgic_is_shared_irq to vgic_is_spi - added vgic_is_valid_line implementation for new-vgic, because vgic_is_valid_line is called from generic code. It is necessary to fix the build for new-vgic. - updated commit message Changes in V2: - introduced this patch --- xen/arch/arm/gic.c | 3 +-- xen/arch/arm/include/asm/vgic.h | 7 +++ xen/arch/arm/irq.c | 4 ++-- xen/arch/arm/vgic.c | 10 -- xen/arch/arm/vgic/vgic.c| 5 + 5 files changed, 23 insertions(+), 6 deletions(-) diff --git a/xen/arch/arm/gic.c b/xen/arch/arm/gic.c index 4bb11960ee..9469c9d08c 100644 --- a/xen/arch/arm/gic.c +++ b/xen/arch/arm/gic.c @@ -134,8 +134,7 @@ int gic_route_irq_to_guest(struct domain *d, unsigned int virq, ASSERT(spin_is_locked(&desc->lock)); /* Caller has already checked that the IRQ is an SPI */ -ASSERT(virq >= 32); -ASSERT(virq < vgic_num_irqs(d)); +ASSERT(vgic_is_spi(d, virq)); ASSERT(!is_lpi(virq)); ret = vgic_connect_hw_irq(d, NULL, virq, desc, true); diff --git a/xen/arch/arm/include/asm/vgic.h b/xen/arch/arm/include/asm/vgic.h index 35c0c6a8b0..3e7cbbb196 100644 --- a/xen/arch/arm/include/asm/vgic.h +++ b/xen/arch/arm/include/asm/vgic.h @@ -335,6 +335,13 @@ extern void vgic_check_inflight_irqs_pending(struct vcpu *v, /* Default number of vGIC SPIs. 32 are substracted to cover local IRQs. */ #define VGIC_DEF_NR_SPIS (min(gic_number_lines(), VGIC_MAX_IRQS) - 32) +extern bool vgic_is_valid_line(struct domain *d, unsigned int virq); + +static inline bool vgic_is_spi(struct domain *d, unsigned int virq) +{ +return virq >= NR_LOCAL_IRQS && vgic_is_valid_line(d, virq); +} + /* * Allocate a guest VIRQ * - spi == 0 => allocate a PPI. It will be the same on every vCPU diff --git a/xen/arch/arm/irq.c b/xen/arch/arm/irq.c index 7dd5a2a453..b8eccfc924 100644 --- a/xen/arch/arm/irq.c +++ b/xen/arch/arm/irq.c @@ -442,7 +442,7 @@ int route_irq_to_guest(struct domain *d, unsigned int virq, unsigned long flags; int retval = 0; -if ( virq >= vgic_num_irqs(d) ) +if ( !vgic_is_valid_line(d, virq) ) { printk(XENLOG_G_ERR "the vIRQ number %u is too high for domain %u (max = %u)\n", @@ -560,7 +560,7 @@ int release_guest_irq(struct domain *d, unsigned int virq) int ret; /* Only SPIs are supported */ -if ( virq < NR_LOCAL_IRQS || virq >= vgic_num_irqs(d) ) +if ( !vgic_is_spi(d, virq) ) return -EINVAL; desc = vgic_get_hw_irq_desc(d, NULL, virq); diff --git a/xen/arch/arm/vgic.c b/xen/arch/arm/vgic.c index c563ba93af..2bbf4d99aa 100644 --- a/xen/arch/arm/vgic.c +++ b/xen/arch/arm/vgic.c @@ -24,6 +24,12 @@ #include #include + +bool vgic_is_valid_line(struct domain *d, unsigned int virq) +{ +return virq < vgic_num_irqs(d); +} + static inline struct vgic_irq_rank *vgic_get_rank(struct vcpu *v, unsigned int rank) { @@ -582,7 +588,7 @@ void vgic_inject_irq(
Re: [PATCH v6 02/15] xen/8250-uart: update definitions
Hi Denis, On Sat, Sep 6, 2025 at 2:27 AM wrote: > > From: Denis Mukhin > > Added missing definitions needed for NS16550 UART emulator. > > Newly introduced MSR definitions re-used in the existing ns16550 driver. > > Also, corrected FCR DMA definition bit#3 (0x08) as per: > https://www.ti.com/lit/ds/symlink/tl16c550c.pdf > See "7.7.2 FIFO Control Register (FCR)". > > Signed-off-by: Denis Mukhin > --- > Changes since v5: > - fixed commentaries > - Link to v5: > https://lore.kernel.org/xen-devel/20250828235409.2835815-3-dmuk...@ford.com/ > --- > xen/drivers/char/ns16550.c | 16 ++-- > xen/include/xen/8250-uart.h | 50 ++--- > 2 files changed, 49 insertions(+), 17 deletions(-) > > diff --git a/xen/drivers/char/ns16550.c b/xen/drivers/char/ns16550.c > index df7fff7f81df..0e80fadbb894 100644 > --- a/xen/drivers/char/ns16550.c > +++ b/xen/drivers/char/ns16550.c > @@ -388,7 +388,7 @@ static void __init cf_check ns16550_init_preirq(struct > serial_port *port) > > /* Check this really is a 16550+. Otherwise we have no FIFOs. */ > if ( uart->fifo_size <= 1 && > - ((ns_read_reg(uart, UART_IIR) & 0xc0) == 0xc0) && > + ((ns_read_reg(uart, UART_IIR) & UART_IIR_FE) == UART_IIR_FE) && > ((ns_read_reg(uart, UART_FCR) & UART_FCR_TRG14) == UART_FCR_TRG14) ) > uart->fifo_size = 16; > } > @@ -728,20 +728,20 @@ static int __init check_existence(struct ns16550 *uart) > * Mask out IER[7:4] bits for test as some UARTs (e.g. TL > * 16C754B) allow only to modify them if an EFR bit is set. > */ > -scratch2 = ns_read_reg(uart, UART_IER) & 0x0f; > -ns_write_reg(uart,UART_IER, 0x0F); > -scratch3 = ns_read_reg(uart, UART_IER) & 0x0f; > +scratch2 = ns_read_reg(uart, UART_IER) & UART_IER_MASK; > +ns_write_reg(uart, UART_IER, UART_IER_MASK); > +scratch3 = ns_read_reg(uart, UART_IER) & UART_IER_MASK; > ns_write_reg(uart, UART_IER, scratch); > -if ( (scratch2 != 0) || (scratch3 != 0x0F) ) > +if ( (scratch2 != 0) || (scratch3 != UART_IER_MASK) ) > return 0; > > /* > * Check to see if a UART is really there. > * Use loopback test mode. > */ > -ns_write_reg(uart, UART_MCR, UART_MCR_LOOP | 0x0A); > -status = ns_read_reg(uart, UART_MSR) & 0xF0; > -return (status == 0x90); > +ns_write_reg(uart, UART_MCR, UART_MCR_LOOP | UART_MCR_RTS | > UART_MCR_OUT2); > +status = ns_read_reg(uart, UART_MSR) & UART_MSR_STATUS; > +return (status == (UART_MSR_CTS | UART_MSR_DCD)); > } > > #ifdef CONFIG_HAS_PCI > diff --git a/xen/include/xen/8250-uart.h b/xen/include/xen/8250-uart.h > index d13352940c13..bbbffb14d320 100644 > --- a/xen/include/xen/8250-uart.h > +++ b/xen/include/xen/8250-uart.h > @@ -32,6 +32,7 @@ > #define UART_MCR 0x04/* Modem control*/ > #define UART_LSR 0x05/* line status */ > #define UART_MSR 0x06/* Modem status */ > +#define UART_SCR 0x07/* Scratch pad */ > #define UART_USR 0x1f/* Status register (DW) */ > #define UART_DLL 0x00/* divisor latch (ls) (DLAB=1) */ > #define UART_DLM 0x01/* divisor latch (ms) (DLAB=1) */ > @@ -42,6 +43,8 @@ > #define UART_IER_ETHREI 0x02/* tx reg. empty*/ > #define UART_IER_ELSI 0x04/* rx line status */ > #define UART_IER_EMSI 0x08/* MODEM status */ > +#define UART_IER_MASK \ > +(UART_IER_ERDAI | UART_IER_ETHREI | UART_IER_ELSI | UART_IER_EMSI) > > /* Interrupt Identification Register */ > #define UART_IIR_NOINT0x01/* no interrupt pending */ > @@ -51,12 +54,19 @@ > #define UART_IIR_THR 0x02/* - tx reg. empty */ > #define UART_IIR_MSI 0x00/* - MODEM status */ > #define UART_IIR_BSY 0x07/* - busy detect (DW) */ > +#define UART_IIR_FE 0xc0/* FIFO enabled (2 bits) */ > > /* FIFO Control Register */ > -#define UART_FCR_ENABLE 0x01/* enable FIFO */ > -#define UART_FCR_CLRX 0x02/* clear Rx FIFO*/ > -#define UART_FCR_CLTX 0x04/* clear Tx FIFO*/ > -#define UART_FCR_DMA 0x10/* enter DMA mode */ > +#define UART_FCR_ENABLE BIT(0, U) /* enable FIFO */ > +#define UART_FCR_CLRX BIT(1, U) /* clear Rx FIFO*/ > +#define UART_FCR_CLTX BIT(2, U) /* clear Tx FIFO*/ > +#define UART_FCR_DMABIT(3, U) /* enter DMA mode */ > +#define UART_FCR_RESERVED0 BIT(4, U) /* reserved; always 0 */ > +#define UART_FCR_RESERVED1 BIT(5, U) /* reserved; always 0 */ > +#define UART_FCR_RTB0 BIT(6, U) /* receiver trigger bit #0 */ > +#define UART_FCR_RTB1 BIT(7, U) /* receiver trigger bit #1 */ > +#define UART_FCR_TRG_MASK (UART_FCR_RTB0 | UART_FCR_RTB1) Thanks for the patch. I noticed that in this changeset some bit definitions (e.g. UART_FCR_*) were rewritten u
Re: [PATCH v6 01/15] emul/vuart: introduce framework for UART emulators
Hi Denis, On Sat, Sep 6, 2025 at 2:27 AM wrote: > > From: Denis Mukhin > > Introduce a driver framework to abstract UART emulators in the hypervisor. > > That allows for architecture-independent handling of virtual UARTs in the > console driver and simplifies enabling new UART emulators. > > The framework is built under CONFIG_VUART_FRAMEWORK, which will be > automatically enabled once the user enables any UART emulator. > > Current implementation supports maximum of one vUART of each kind per domain. > > Use new domain_has_vuart() in the console driver code to check whether to > forward console input to the domain using vUART. > > Enable console forwarding over vUART for hardware domains with a vUART. That > enables console forwarding to dom0 on x86, since console can be forwarded only > to Xen, dom0 and pvshim on x86 as of now. > > Note: existing vUARTs are deliberately *not* hooked to the new framework to > minimize the scope of the patch: vpl011 (i.e. SBSA) emulator and "vuart" (i.e. > minimalistic MMIO-mapped dtuart for hwdoms on Arm) are kept unmodified. > > No functional changes for non-x86 architectures. > > Signed-off-by: Denis Mukhin > --- > Changes since v5: > - addressed v5 feedback > - Link to v5: > https://lore.kernel.org/xen-devel/20250828235409.2835815-2-dmuk...@ford.com/ > --- > xen/arch/arm/xen.lds.S | 1 + > xen/arch/ppc/xen.lds.S | 1 + > xen/arch/riscv/xen.lds.S | 1 + > xen/arch/x86/xen.lds.S | 1 + > xen/common/Kconfig | 2 + > xen/common/Makefile| 1 + > xen/common/emul/Kconfig| 6 ++ > xen/common/emul/Makefile | 1 + > xen/common/emul/vuart/Kconfig | 6 ++ > xen/common/emul/vuart/Makefile | 1 + > xen/common/emul/vuart/vuart.c | 157 + > xen/common/keyhandler.c| 3 + > xen/drivers/char/console.c | 6 +- > xen/include/xen/sched.h| 4 + > xen/include/xen/serial.h | 3 + > xen/include/xen/vuart.h| 116 > xen/include/xen/xen.lds.h | 10 +++ > 17 files changed, 319 insertions(+), 1 deletion(-) > create mode 100644 xen/common/emul/Kconfig > create mode 100644 xen/common/emul/Makefile > create mode 100644 xen/common/emul/vuart/Kconfig > create mode 100644 xen/common/emul/vuart/Makefile > create mode 100644 xen/common/emul/vuart/vuart.c > create mode 100644 xen/include/xen/vuart.h > > diff --git a/xen/arch/arm/xen.lds.S b/xen/arch/arm/xen.lds.S > index db17ff1efa98..cd05b18770f4 100644 > --- a/xen/arch/arm/xen.lds.S > +++ b/xen/arch/arm/xen.lds.S > @@ -58,6 +58,7 @@ SECTIONS > *(.rodata) > *(.rodata.*) > VPCI_ARRAY > + VUART_ARRAY > *(.data.rel.ro) > *(.data.rel.ro.*) > > diff --git a/xen/arch/ppc/xen.lds.S b/xen/arch/ppc/xen.lds.S > index 1de0b77fc6b9..f9d4e5b0dcd8 100644 > --- a/xen/arch/ppc/xen.lds.S > +++ b/xen/arch/ppc/xen.lds.S > @@ -52,6 +52,7 @@ SECTIONS > *(.rodata) > *(.rodata.*) > VPCI_ARRAY > +VUART_ARRAY > *(.data.rel.ro) > *(.data.rel.ro.*) > > diff --git a/xen/arch/riscv/xen.lds.S b/xen/arch/riscv/xen.lds.S > index edcadff90bfe..59dcaa5fef9a 100644 > --- a/xen/arch/riscv/xen.lds.S > +++ b/xen/arch/riscv/xen.lds.S > @@ -47,6 +47,7 @@ SECTIONS > *(.rodata) > *(.rodata.*) > VPCI_ARRAY > +VUART_ARRAY > *(.data.rel.ro) > *(.data.rel.ro.*) > > diff --git a/xen/arch/x86/xen.lds.S b/xen/arch/x86/xen.lds.S > index 966e514f2034..d877b93a6964 100644 > --- a/xen/arch/x86/xen.lds.S > +++ b/xen/arch/x86/xen.lds.S > @@ -132,6 +132,7 @@ SECTIONS > *(.rodata) > *(.rodata.*) > VPCI_ARRAY > + VUART_ARRAY > *(.data.rel.ro) > *(.data.rel.ro.*) > > diff --git a/xen/common/Kconfig b/xen/common/Kconfig > index 76f9ce705f7a..78a32b69e2b2 100644 > --- a/xen/common/Kconfig > +++ b/xen/common/Kconfig > @@ -676,4 +676,6 @@ config PM_STATS > Enable collection of performance management statistics to aid in > analyzing and tuning power/performance characteristics of the system > > +source "common/emul/Kconfig" > + > endmenu > diff --git a/xen/common/Makefile b/xen/common/Makefile > index 0c7d0f5d46e1..8c8462565050 100644 > --- a/xen/common/Makefile > +++ b/xen/common/Makefile > @@ -12,6 +12,7 @@ obj-$(CONFIG_DEVICE_TREE_PARSE) += device-tree/ > obj-$(CONFIG_IOREQ_SERVER) += dm.o > obj-y += domain.o > obj-y += domid.o > +obj-y += emul/ > obj-y += event_2l.o > obj-y += event_channel.o > obj-$(CONFIG_EVTCHN_FIFO) += event_fifo.o > diff --git a/xen/common/emul/Kconfig b/xen/common/emul/Kconfig > new file mode 100644 > index ..7c6764d1756b > --- /dev/null > +++ b/xen/common/emul/Kconfig > @@ -0,0 +1,6 @@ > +menu "Domain Emulation Features" > + visible if EXPERT > + > +source "common/emul/vuart/Kconfig" > + > +endmenu > diff --git a/xen/common/emul/Makefile b/xen/common/
Re: [PATCH 7/7] mm: update lazy_mmu documentation
On Thu, Sep 04, 2025 at 01:57:36PM +0100, Kevin Brodsky wrote: > We now support nested lazy_mmu sections on all architectures > implementing the API. Update the API comment accordingly. > > Signed-off-by: Kevin Brodsky Acked-by: Mike Rapoport (Microsoft) > --- > include/linux/pgtable.h | 14 -- > 1 file changed, 12 insertions(+), 2 deletions(-) > > diff --git a/include/linux/pgtable.h b/include/linux/pgtable.h > index 6932c8e344ab..be0f059beb4d 100644 > --- a/include/linux/pgtable.h > +++ b/include/linux/pgtable.h > @@ -228,8 +228,18 @@ static inline int pmd_dirty(pmd_t pmd) > * of the lazy mode. So the implementation must assume preemption may be > enabled > * and cpu migration is possible; it must take steps to be robust against > this. > * (In practice, for user PTE updates, the appropriate page table lock(s) are > - * held, but for kernel PTE updates, no lock is held). Nesting is not > permitted > - * and the mode cannot be used in interrupt context. > + * held, but for kernel PTE updates, no lock is held). The mode cannot be > used > + * in interrupt context. > + * > + * Calls may be nested: an arch_{enter,leave}_lazy_mmu_mode() pair may be > called > + * while the lazy MMU mode has already been enabled. An implementation should > + * handle this using the state returned by enter() and taken by the matching > + * leave() call; the LAZY_MMU_{DEFAULT,NESTED} flags can be used to indicate > + * whether this enter/leave pair is nested inside another or not. (It is up > to > + * the implementation to track whether the lazy MMU mode is enabled at any > point > + * in time.) The expectation is that leave() will flush any batched state > + * unconditionally, but only leave the lazy MMU mode if the passed state is > not > + * LAZY_MMU_NESTED. > */ > #ifndef __HAVE_ARCH_ENTER_LAZY_MMU_MODE > typedef int lazy_mmu_state_t; > -- > 2.47.0 > -- Sincerely yours, Mike.
[PATCH v3 2/2] efi: Support using Shim's LoadImage protocol
The existing Verify functionality of the Shim lock protocol is deprecated and will be removed, the alternative it to use the LoadImage interface to perform the verification. When the loading is successful we won't be using the newly loaded image (as of yet) so we must then immediately unload the image to clean up. If the LoadImage protocol isn't available then fall back to the Shim Lock (Verify) interface. Log when the kernel is not verified and fail if this occurs when secure boot mode is enabled. Signed-off-by: Gerald Elder-Vass Signed-off-by: Kevin Lampis --- CC: Marek Marczykowski-Górecki CC: "Daniel P. Smith" CC: Jan Beulich CC: Andrew Cooper CC: Anthony PERARD CC: Michal Orzel CC: Julien Grall CC: "Roger Pau Monné" CC: Stefano Stabellini v3: - Use Shim Image by default, fall back to Shim Lock --- xen/common/efi/boot.c | 59 +-- 1 file changed, 51 insertions(+), 8 deletions(-) diff --git a/xen/common/efi/boot.c b/xen/common/efi/boot.c index e7e3dffa7ddc..1f63473d264d 100644 --- a/xen/common/efi/boot.c +++ b/xen/common/efi/boot.c @@ -38,6 +38,8 @@ { 0xf2fd1544U, 0x9794, 0x4a2c, {0x99, 0x2e, 0xe5, 0xbb, 0xcf, 0x20, 0xe3, 0x94} } #define SHIM_LOCK_PROTOCOL_GUID \ { 0x605dab50U, 0xe046, 0x4300, {0xab, 0xb6, 0x3d, 0xd8, 0x10, 0xdd, 0x8b, 0x23} } +#define SHIM_IMAGE_LOADER_GUID \ + { 0x1f492041U, 0xfadb, 0x4e59, {0x9e, 0x57, 0x7c, 0xaf, 0xe7, 0x3a, 0x55, 0xab} } #define APPLE_PROPERTIES_PROTOCOL_GUID \ { 0x91bd12feU, 0xf6c3, 0x44fb, {0xa5, 0xb7, 0x51, 0x22, 0xab, 0x30, 0x3a, 0xe0} } #define EFI_SYSTEM_RESOURCE_TABLE_GUID\ @@ -70,6 +72,13 @@ typedef struct { EFI_SHIM_LOCK_VERIFY Verify; } EFI_SHIM_LOCK_PROTOCOL; +typedef struct _SHIM_IMAGE_LOADER { +EFI_IMAGE_LOAD LoadImage; +EFI_IMAGE_START StartImage; +EFI_EXIT Exit; +EFI_IMAGE_UNLOAD UnloadImage; +} SHIM_IMAGE_LOADER; + struct _EFI_APPLE_PROPERTIES; typedef EFI_STATUS @@ -1047,6 +1056,46 @@ static UINTN __init efi_find_gop_mode(EFI_GRAPHICS_OUTPUT_PROTOCOL *gop, return gop_mode; } +static void __init efi_verify_kernel(EFI_HANDLE ImageHandle) +{ +static EFI_GUID __initdata shim_image_guid = SHIM_IMAGE_LOADER_GUID; +static EFI_GUID __initdata shim_lock_guid = SHIM_LOCK_PROTOCOL_GUID; +SHIM_IMAGE_LOADER *shim_loader; +EFI_HANDLE loaded_kernel; +EFI_SHIM_LOCK_PROTOCOL *shim_lock; +EFI_STATUS status; +bool verified = false; + +/* Look for LoadImage first */ +if ( !EFI_ERROR(efi_bs->LocateProtocol(&shim_image_guid, NULL, + (void **)&shim_loader)) ) +{ +status = shim_loader->LoadImage(false, ImageHandle, NULL, +(void *)kernel.ptr, kernel.size, +&loaded_kernel); +if ( !EFI_ERROR(status) ) +verified = true; + +/* LoadImage performed verification, now clean up with UnloadImage */ +shim_loader->UnloadImage(loaded_kernel); +} + +/* else fall back to Shim Lock */ +if ( !verified && + !EFI_ERROR(efi_bs->LocateProtocol(&shim_lock_guid, NULL, + (void **)&shim_lock)) && + !EFI_ERROR(shim_lock->Verify(kernel.ptr, kernel.size)) ) +verified = true; + +if ( !verified ) +{ +PrintStr(L"Kernel was not verified\n"); + +if ( efi_secure_boot ) +blexit(L"Failed to verify kernel"); +} +} + static void __init efi_tables(void) { unsigned int i; @@ -1334,13 +1383,11 @@ void EFIAPI __init noreturn efi_start(EFI_HANDLE ImageHandle, EFI_SYSTEM_TABLE *SystemTable) { static EFI_GUID __initdata loaded_image_guid = LOADED_IMAGE_PROTOCOL; -static EFI_GUID __initdata shim_lock_guid = SHIM_LOCK_PROTOCOL_GUID; EFI_LOADED_IMAGE *loaded_image; EFI_STATUS status; unsigned int i; CHAR16 *file_name, *cfg_file_name = NULL, *options = NULL; UINTN gop_mode = ~0; -EFI_SHIM_LOCK_PROTOCOL *shim_lock; EFI_GRAPHICS_OUTPUT_PROTOCOL *gop = NULL; union string section = { NULL }, name; bool base_video = false; @@ -1591,12 +1638,8 @@ void EFIAPI __init noreturn efi_start(EFI_HANDLE ImageHandle, * device tree through the efi_check_dt_boot function, in this stage * verify it. */ -if ( kernel.ptr && - !kernel_verified && - !EFI_ERROR(efi_bs->LocateProtocol(&shim_lock_guid, NULL, - (void **)&shim_lock)) && - (status = shim_lock->Verify(kernel.ptr, kernel.size)) != EFI_SUCCESS ) -PrintErrMesg(L"Dom0 kernel image could not be verified", status); +if ( kernel.ptr && !kernel_verified ) +efi_verify_kernel(ImageHandle); efi_arch_edd(); -- 2.47.3
Re: [PATCH v2 3/4] xen/arm64: allow to make aarch32 support optional
On 05/09/2025 13:15, Volodymyr Babchuk wrote: Hi, Grygorii Strashko writes: On 27.08.25 03:16, Volodymyr Babchuk wrote: Hi Grygorii, Grygorii Strashko writes: From: Grygorii Strashko Now Arm64 AArch32 guest support is always enabled and built-in while not all Arm64 platforms supports AArch32 (for exmaple on Armv9A) or this support might not be needed (Arm64 AArch32 is used quite rarely in embedded systems). More over, when focusing on safety certification, AArch32 related code in Xen leaves a gap in terms of coverage that cannot really be justified in words. This leaves two options: either support it (lots of additional testing, requirements and documents would be needed) or compile it out. Hence, this patch introduces basic support to allow make Arm64 AArch32 guest support optional. The following changes are introduced: - Introduce Kconfig option CONFIG_ARM64_AARCH32 to allow enable/disable Arm64 AArch32 guest support (default y) - Introduce is_aarch32_enabled() helper which accounts Arm64 HW capability and CONFIG_ARM64_AARCH32 setting - Introduce arm64_set_domain_type() to configure Arm64 domain type in unified way instead of open coding (d)->arch.type, and account CONFIG_ARM64_AARCH32 configuration. - toolstack: do not advertise "xen-3.0-armv7l " capability if AArch32 is disabled. - do not expose EL1 AArch32 support to guest in ID_AA64PFR0_EL1 reg if AArch32 is disabled. - Set Arm64 domain type to DOMAIN_64BIT by default. - the Arm Xen boot code is handling this case properly already; - for toolstack case the XEN_DOMCTL_set_address_size hypercall handling updated to forcibly configure domain type regardless of current domain type configuration, so toolstack behavior is unchanged. With CONFIG_ARM64_AARCH32=n the Xen will reject AArch32 guests (kernels) if configured by user in the following way: - Xen boot will fail with panic during dom0 or dom0less domains creation - toolstack domain creation will be rejected due to xc_dom_compat_check() failure. Making Arm64 AArch32 guest support open further possibilities for build optimizations of Arm64 AArch32 guest support code. Signed-off-by: Grygorii Strashko --- changes in v2: - use Arm64 "cpu_has_el1_32" in all places to check if HW has AArch32 support - rework Arm64 XEN_DOMCTL_set_address_size hypercall handling to work with any initial domain type set (32bit or 64 bit) - fix comments related to macro parameters evaluation issues - do not expose EL1 AArch32 support to guest in ID_AA64PFR0_EL1 reg if AArch32 is disabled xen/arch/arm/Kconfig| 7 xen/arch/arm/arm64/domain-build.c | 46 +++-- xen/arch/arm/arm64/domctl.c | 16 + xen/arch/arm/arm64/vsysreg.c| 9 + xen/arch/arm/domain.c | 9 + xen/arch/arm/domain_build.c | 21 +++ xen/arch/arm/include/asm/arm32/domain.h | 12 +++ xen/arch/arm/include/asm/arm64/domain.h | 23 + xen/arch/arm/setup.c| 2 +- 9 files changed, 119 insertions(+), 26 deletions(-) diff --git a/xen/arch/arm/Kconfig b/xen/arch/arm/Kconfig index a0c816047427..bf6dd73caf73 100644 --- a/xen/arch/arm/Kconfig +++ b/xen/arch/arm/Kconfig @@ -266,6 +266,13 @@ config PCI_PASSTHROUGH help This option enables PCI device passthrough +config ARM64_AARCH32 + bool "AArch32 Guests support on on ARM64 (UNSUPPORTED)" if UNSUPPORTED But aarch32 guests are supported... I understand that you wanted to say "Disabling aarch32 support is unsupported". But currently this entry reads backwards. I think it should be reworded better. But I have no idea - how to do this. I think "(UNSUPPORTED)" can be just dropped. Is it ok? As I understand, If you want this feature to be eventually certified, it should not be UNSUPPORTED nor EXPERIMENTAL. The certification is somewhat irrelevant to the decision of the state of the feature. Instead, the decision should be based on the criteria based in SUPPORT.MD (see "Status"). If it is experimental/unsupported, then what's missing to make it supported? In addition to that, there is the "EXPERT" mode. This was introduced mainly to allow the user to tailor the Kconfig but also limit to what we security support. This is to reduce the amount of workload on the security team when it comes to decide on whether we need to issue an XSA (the more possibility, the more difficult it becomes). There has been discussion on providing a small set of config (one could be for certification purpose) that would be security supported. But I don't think we come to a conclusion yet. Cheers, -- Julien Grall
Re: [PATCH v2] Strip build path directories in tools and hypervisor
On Fri, Sep 05, 2025 at 10:15:12AM +0200, Jan Beulich wrote: > On 04.09.2025 16:27, Marek Marczykowski-Górecki wrote: > > On Thu, Sep 04, 2025 at 02:58:20PM +0200, Jan Beulich wrote: > >> On 04.09.2025 13:41, Marek Marczykowski-Górecki wrote: > >>> Use -fdebug-prefix-map in preference to -ffile-prefix-map, as it's > >>> available in earlier toolchain versions. But use it together with > >>> -fmacro-prefix-map (if available) for hypervisor build, otherwise it > >>> still contains some paths in out-of-tree builds. > >> > >> I consider it wrong not to use -ffile-prefix-map when available. That > >> already covers more than "debug" and "macro", and it may gain further > >> functionality. > > > > I asked about that on v1 and got ambiguous answer suggesting the opposite: > > https://lore.kernel.org/xen-devel/0370c0eb1fd9ac00acab016792132fa0b943d384.1742317309.git-series.marma...@invisiblethingslab.com/T/#m74a8883835e30fb74a85b07a7b14507ee52e7c65 > > Ambiguous answer(s)? There's no reply to that mail of yours, I mean your email to which I responded. > and I don't > see how the conclusion drawn fits my earlier comment. That was more > towards what I did in v1 of my patch - fall back to the more widely > supported option when the less widely available one can't be used. > > >>> --- a/tools/Makefile > >>> +++ b/tools/Makefile > >>> @@ -1,4 +1,4 @@ > >>> -XEN_ROOT = $(CURDIR)/.. > >>> +XEN_ROOT = $(realpath $(CURDIR)/..) > >>> > >>> export PKG_CONFIG_DIR = $(CURDIR)/pkg-config > >>> > >>> diff --git a/tools/Rules.mk b/tools/Rules.mk > >>> index 725c3c32e9a2..428fce094819 100644 > >>> --- a/tools/Rules.mk > >>> +++ b/tools/Rules.mk > >>> @@ -166,6 +166,8 @@ endif > >>> CFLAGS-$(CONFIG_X86_32) += $(call > >>> cc-option,$(CC),-mno-tls-direct-seg-refs) > >>> CFLAGS += $(CFLAGS-y) > >>> > >>> +$(call cc-option-add,CFLAGS,CC,-fdebug-prefix-map=$(realpath > >>> $(XEN_ROOT))=.) > >> > >> Here and below - no need to use cc-option-add for -fdebug-prefix-map, > >> which all permissible compilers support. > > > > Ok. > > > >> Further, again as per reply to Andrew on the thread hanging off of my > >> patch - I don't view it as desirable to leave the tools/ prefix in > >> place, or e.g. for libraries, the entire tools/libs// part. > >> Imo every binary should have only the path (if any) from its own source > >> root left. (And yes, how to deal with e.g. shared include files isn't > >> quite clear to me, yet. Maybe we actually need to pass two options.) > > > > I don't think it's valid to strip arbitrary prefixes from debug symbols, > > especially in tools. This will break some automated tools that try to match > > coredumps (and similar) to source code and sometimes even debug symbols > > too. But even for manual usage, having to jump between directories (I'm > > not sure if gdb supports multiple source dirs at once?) > > Pretty necessarily: When debugging you might easily cross project boundaries. > > > just because you > > happen to debug a binary that use more of libraries isn't exactly > > desirable. > > I think the paths in debug symbols and similar should match the layout > > in the source repository, not a subset of it. > > Well, okay, we disagree here. To me, xen.git really is an agglomeration of > too many things in a single repo. If things were properly split, you'd end > up with what I'm asking for anyway. To give specific example: Fedora installs source files in /usr/src/debug/(package name) and then does debuginfo postprocessing to point at that path. Debian does pretty much the same, and I'm sure many other distributions too. Now, if you strip part of the path from debug symbols, they will not point at the correct source location. Of course Fedora/Debian/etc package can apply a patch to adjust it (as it's currently supplying -fdebug-prefix-map via CFLAGS), but IMO forcing every distribution to basically undo upstream change is a wrong move. > > Theoretically this doesn't apply to the hypervisor yet, as I'm not aware > > of any tool processing xen memory dumps automatically (and those for > > manual usage are quite unstable, to say the least...). But I don't think > > it's an excuse to have incomplete paths in there, just to save few > > bytes? > > The only case where I can see it would make some sense is out of tree > > build, where indeed it's about just the hypervisor, not the toolstack > > (IMO due to the build system limitation, but well...). But at the same > > time, having different path variant depending on it-tree/out-of-tree > > build feels weird. > > Which is why I'm arguing for the dropping of the xen/ prefixes, as that's > how things come out in in-tree builds. > > Jan -- Best Regards, Marek Marczykowski-Górecki Invisible Things Lab signature.asc Description: PGP signature
[PATCH 4/7] x86/xen: support nested lazy_mmu sections (again)
Commit 49147beb0ccb ("x86/xen: allow nesting of same lazy mode") originally introduced support for nested lazy sections (LAZY_MMU and LAZY_CPU). It later got reverted by commit c36549ff8d84 as its implementation turned out to be intolerant to preemption. Now that the lazy_mmu API allows enter() to pass through a state to the matching leave() call, we can support nesting again for the LAZY_MMU mode in a preemption-safe manner. If xen_enter_lazy_mmu() is called inside an active lazy_mmu section, xen_lazy_mode will already be set to XEN_LAZY_MMU and we can then return LAZY_MMU_NESTED to instruct the matching xen_leave_lazy_mmu() call to leave xen_lazy_mode unchanged. The only effect of this patch is to ensure that xen_lazy_mode remains set to XEN_LAZY_MMU until the outermost lazy_mmu section ends. xen_leave_lazy_mmu() still calls xen_mc_flush() unconditionally. Signed-off-by: Kevin Brodsky --- arch/x86/include/asm/paravirt.h | 6 ++ arch/x86/include/asm/paravirt_types.h | 4 ++-- arch/x86/xen/mmu_pv.c | 11 --- 3 files changed, 12 insertions(+), 9 deletions(-) diff --git a/arch/x86/include/asm/paravirt.h b/arch/x86/include/asm/paravirt.h index 65a0d394fba1..4ecd3a6b1dea 100644 --- a/arch/x86/include/asm/paravirt.h +++ b/arch/x86/include/asm/paravirt.h @@ -529,14 +529,12 @@ static inline void arch_end_context_switch(struct task_struct *next) #define __HAVE_ARCH_ENTER_LAZY_MMU_MODE static inline lazy_mmu_state_t arch_enter_lazy_mmu_mode(void) { - PVOP_VCALL0(mmu.lazy_mode.enter); - - return LAZY_MMU_DEFAULT; + return PVOP_CALL0(lazy_mmu_state_t, mmu.lazy_mode.enter); } static inline void arch_leave_lazy_mmu_mode(lazy_mmu_state_t state) { - PVOP_VCALL0(mmu.lazy_mode.leave); + PVOP_VCALL1(mmu.lazy_mode.leave, state); } static inline void arch_flush_lazy_mmu_mode(void) diff --git a/arch/x86/include/asm/paravirt_types.h b/arch/x86/include/asm/paravirt_types.h index bc1af86868a3..b7c567ccbf32 100644 --- a/arch/x86/include/asm/paravirt_types.h +++ b/arch/x86/include/asm/paravirt_types.h @@ -45,8 +45,8 @@ typedef int lazy_mmu_state_t; struct pv_lazy_ops { /* Set deferred update mode, used for batching operations. */ - void (*enter)(void); - void (*leave)(void); + lazy_mmu_state_t (*enter)(void); + void (*leave)(lazy_mmu_state_t); void (*flush)(void); } __no_randomize_layout; #endif diff --git a/arch/x86/xen/mmu_pv.c b/arch/x86/xen/mmu_pv.c index 2039d5132ca3..6e5390ff06a5 100644 --- a/arch/x86/xen/mmu_pv.c +++ b/arch/x86/xen/mmu_pv.c @@ -2130,9 +2130,13 @@ static void xen_set_fixmap(unsigned idx, phys_addr_t phys, pgprot_t prot) #endif } -static void xen_enter_lazy_mmu(void) +static lazy_mmu_state_t xen_enter_lazy_mmu(void) { + if (this_cpu_read(xen_lazy_mode) == XEN_LAZY_MMU) + return LAZY_MMU_NESTED; + enter_lazy(XEN_LAZY_MMU); + return LAZY_MMU_DEFAULT; } static void xen_flush_lazy_mmu(void) @@ -2167,11 +2171,12 @@ static void __init xen_post_allocator_init(void) pv_ops.mmu.write_cr3 = &xen_write_cr3; } -static void xen_leave_lazy_mmu(void) +static void xen_leave_lazy_mmu(lazy_mmu_state_t state) { preempt_disable(); xen_mc_flush(); - leave_lazy(XEN_LAZY_MMU); + if (state != LAZY_MMU_NESTED) + leave_lazy(XEN_LAZY_MMU); preempt_enable(); } -- 2.47.0
[PATCH 5/7] powerpc/mm: support nested lazy_mmu sections
The lazy_mmu API now allows nested sections to be handled by arch code: enter() can return a flag if called inside another lazy_mmu section, so that the matching call to leave() leaves any optimisation enabled. This patch implements that new logic for powerpc: if there is an active batch, then enter() returns LAZY_MMU_NESTED and the matching leave() leaves batch->active set. The preempt_{enable,disable} calls are left untouched as they already handle nesting themselves. TLB flushing is still done in leave() regardless of the nesting level, as the caller may rely on it whether nesting is occurring or not. Signed-off-by: Kevin Brodsky --- .../powerpc/include/asm/book3s/64/tlbflush-hash.h | 15 --- 1 file changed, 12 insertions(+), 3 deletions(-) diff --git a/arch/powerpc/include/asm/book3s/64/tlbflush-hash.h b/arch/powerpc/include/asm/book3s/64/tlbflush-hash.h index c9f1e819e567..001c474da1fe 100644 --- a/arch/powerpc/include/asm/book3s/64/tlbflush-hash.h +++ b/arch/powerpc/include/asm/book3s/64/tlbflush-hash.h @@ -30,6 +30,7 @@ typedef int lazy_mmu_state_t; static inline lazy_mmu_state_t arch_enter_lazy_mmu_mode(void) { struct ppc64_tlb_batch *batch; + int lazy_mmu_nested; if (radix_enabled()) return LAZY_MMU_DEFAULT; @@ -39,9 +40,14 @@ static inline lazy_mmu_state_t arch_enter_lazy_mmu_mode(void) */ preempt_disable(); batch = this_cpu_ptr(&ppc64_tlb_batch); - batch->active = 1; + lazy_mmu_nested = batch->active; - return LAZY_MMU_DEFAULT; + if (!lazy_mmu_nested) { + batch->active = 1; + return LAZY_MMU_DEFAULT; + } else { + return LAZY_MMU_NESTED; + } } static inline void arch_leave_lazy_mmu_mode(lazy_mmu_state_t state) @@ -54,7 +60,10 @@ static inline void arch_leave_lazy_mmu_mode(lazy_mmu_state_t state) if (batch->index) __flush_tlb_pending(batch); - batch->active = 0; + + if (state != LAZY_MMU_NESTED) + batch->active = 0; + preempt_enable(); } -- 2.47.0
[PATCH 6/7] sparc/mm: support nested lazy_mmu sections
The lazy_mmu API now allows nested sections to be handled by arch code: enter() can return a flag if called inside another lazy_mmu section, so that the matching call to leave() leaves any optimisation enabled. This patch implements that new logic for sparc: if there is an active batch, then enter() returns LAZY_MMU_NESTED and the matching leave() leaves batch->active set. The preempt_{enable,disable} calls are left untouched as they already handle nesting themselves. TLB flushing is still done in leave() regardless of the nesting level, as the caller may rely on it whether nesting is occurring or not. Signed-off-by: Kevin Brodsky --- arch/sparc/mm/tlb.c | 15 --- 1 file changed, 12 insertions(+), 3 deletions(-) diff --git a/arch/sparc/mm/tlb.c b/arch/sparc/mm/tlb.c index bf5094b770af..42de93d74d0e 100644 --- a/arch/sparc/mm/tlb.c +++ b/arch/sparc/mm/tlb.c @@ -53,12 +53,18 @@ void flush_tlb_pending(void) lazy_mmu_state_t arch_enter_lazy_mmu_mode(void) { struct tlb_batch *tb; + int lazy_mmu_nested; preempt_disable(); tb = this_cpu_ptr(&tlb_batch); - tb->active = 1; + lazy_mmu_nested = tb->active; - return LAZY_MMU_DEFAULT; + if (!lazy_mmu_nested) { + tb->active = 1; + return LAZY_MMU_DEFAULT; + } else { + return LAZY_MMU_NESTED; + } } void arch_leave_lazy_mmu_mode(lazy_mmu_state_t state) @@ -67,7 +73,10 @@ void arch_leave_lazy_mmu_mode(lazy_mmu_state_t state) if (tb->tlb_nr) flush_tlb_pending(); - tb->active = 0; + + if (state != LAZY_MMU_NESTED) + tb->active = 0; + preempt_enable(); } -- 2.47.0
Re: [PATCH v6 00/15] x86: introduce NS16550-compatible UART emulator
Oleksii and all, I would like to consider patches 1-12 of this patch series for 4.21, pending the few minor comments I made addressed. On Fri, 5 Sep 2025, dmuk...@xen.org wrote: > x86 port of Xen lacks vUART facility similar to Arm's vpl011 to support x86 > guest OS bring up in the embedded setups. > > This patch series introduces initial in-hypervisor emulator for > NS8250/NS16x50-compatible UARTs under CONFIG_VUART_NS16X50. > > In parallel domain creation scenario (hyperlaunch), NS16550 emulator helps > early guest firmware and OS bringup debugging, because it eliminates > dependency on the external emulator (qemu) being operational by the time > domains are created. > > The emulator also allows to forward the physical console input to the x86 > domain which is useful when a system has only one physical UART for early > debugging and this UART is owned by Xen. > > By default, CONFIG_VUART_NS16X50 enables emulation of NS16550 at I/O port > 0x2f8, IRQ#3 in guest OS (legacy COM2). Legacy COM resources cannot be > selected at built-time or via per-domain xl configuration in this initial > submission. > > CONFIG_VUART_NS16X50_DEBUG enables some extra debugging facilities useful > for NS16550 emulator development/debugging (disabled by default). > > The NS16550 emulator is disabled in default x86 configuration and goes under > CONFIG_EXPERT in Kconfig. > > Limitations > === > - Only x86; > - Only legacy COM2 resources, custom I/O ports/IRQs are not supported; > - Only Xen console as a backend, no inter-domain communication (similar to > vpl011 on Arm); > - Only 8n1 emulation (8-bit data, no parity, 1 stop bit); > - No toolstack integration; > - No baud rate emulation (reports 115200 baud to the guest OS); > - No FIFO-less mode emulation; > - No RX FIFO interrupt moderation (FCR) emulation; > - No integration w/ VM snapshotting (HVM_REGISTER_SAVE_RESTORE() and > friends); > - No MMIO-based UART emulation. > > Series > == > > Patch 1 introduces the new vUART framework, that is the code originally > posted here: > > https://lore.kernel.org/xen-devel/20250624035443.344099-16-dmuk...@ford.com/ > Required for emulator. > > Patch 2 adds missing NS16550 definitions, required for emulator. > > Patch 3 introduces the basic emulator skeleton - state machine > initialization stubs, I/O port handler stub, logging, etc. > > Patches 4-11 incrementally populate the minimal NS16550 register emulation. > > Patch 12 hooks vUART state debugging (disabled by default). > > Pathes 13-15 introduce necessary changes to enable NS16550 on dom0 (and > PVH). > > Link to CI: > https://gitlab.com/xen-project/people/dmukhin/xen/-/pipelines/2024756493 > Link to branch: > https://gitlab.com/xen-project/people/dmukhin/xen/-/tree/vuart-ns8250-v6?ref_type=heads > > Testing > === > > ```shell > echo CONFIG_EXPERT=y >> .config > echo CONFIG_VUART_NS16X50=y >> .config > make olddefconfig > ``` > COM2 (0x2f8) resources are used by default. > > To test w/ virtual COM2, the guest kernel parameters should contain > something like the following: > earlycon=uart,io,0x2f8,115200n8 console=uart,io,0x2f8,115200n8 > > HVM > --- > Tested only boot of HVM linux guest with OVMF as the virtual firmware. > SeaBIOS as a virtual firmware is not tested. > > PVH (dom0) > -- > Xen is able to forward physical console input to the domain with virtual > NS16550. To switch the console focus press Ctrl+aaa. > Console switch is limited on x86 to dom0 and Xen (fixes pending). > > Changes since v5: > - Split THR/RBR into two separate patches. > - Addressed feedback from v5. > - Link to v5: > https://lore.kernel.org/xen-devel/20250828235409.2835815-1-dmuk...@ford.com/ > > Changes since v4: > - Split the series to make it simpler to review. > - Addressed feedback from v4. > - Dropped xl changes, which I will submit separately. > - Link to v4: > https://lore.kernel.org/xen-devel/20250731192130.3948419-1-dmuk...@ford.com/ > > Changes since v3: > - Reduced the blast radius of the series, thanks to reviews, individual > aspects (like console focus) touched in v3 moved to separate threads. > - Kept the UART emulator framework since I need to redo some of emulator code > and there's more-or-less agreement on it (where to place, naming, scope). > - Applied the feedback from > > https://lore.kernel.org/xen-devel/20250624035443.344099-1-dmuk...@ford.com/ > - Link to v3: > https://lore.kernel.org/xen-devel/20250103-vuart-ns8250-v3-v1-0-c5d36b31d...@ford.com/ > > Changes since v2: > - renamed emulator s/NS8250/NS16550/g > - reduced the patch series after addressing v2 feedback > - introduced driver framework for UART emulators > - unified guest OS printouts across all available UART emulators > - Link to v2: > https://lore.kernel.org/xen-devel/20241205-vuart-ns8250-v1-0-e9aa92312...@ford.com/ > > Changes since v1: > - dropped kmalloc/kfree aliases > - fixed E
Re: [PATCH v6 03/15] emul/ns16x50: implement emulator stub
Hi Denis, On Sat, Sep 6, 2025 at 2:27 AM wrote: > > From: Denis Mukhin > > The change is the first on the way on introducing minimally functional > NS16550-compatible UART emulator. > > Define UART state and a set of emulated registers. > > Implement alloc/free vUART hooks. > > Stub out I/O port handler. > > Add initialization of the NS16x50-compatible UART emulator state machine. > > Plumb debug logging. > > Signed-off-by: Denis Mukhin > --- > Changes since v5: > - v5 feedback > - Link to v5: > https://lore.kernel.org/xen-devel/20250828235409.2835815-4-dmuk...@ford.com/ > --- > xen/arch/x86/hvm/hvm.c | 21 ++ > xen/common/emul/vuart/Kconfig | 19 ++ > xen/common/emul/vuart/Makefile | 1 + > xen/common/emul/vuart/ns16x50.c | 366 > 4 files changed, 407 insertions(+) > create mode 100644 xen/common/emul/vuart/ns16x50.c > > diff --git a/xen/arch/x86/hvm/hvm.c b/xen/arch/x86/hvm/hvm.c > index 23bd7f078a1d..91c971f11e14 100644 > --- a/xen/arch/x86/hvm/hvm.c > +++ b/xen/arch/x86/hvm/hvm.c > @@ -28,6 +28,7 @@ > #include > #include > #include I noticed that this include ... > +#include ... is out of alphabetical order. All other includes in this block are correctly sorted. > #include > #include > #include > @@ -689,6 +690,22 @@ int hvm_domain_initialise(struct domain *d, > if ( rc != 0 ) > goto fail1; > > +/* Limit NS16550 emulator for dom0 only for now. */ > +if ( IS_ENABLED(CONFIG_VUART_NS16X50) && is_hardware_domain(d) ) Currently, the Xen glossary defines a "hardware domain" as: A domain, commonly dom0, which shares responsibility with Xen about the system as a whole. It has been historically correct to treat is_hardware_domain(d) as equivalent to dom0. However, according to patch series [1], this is no longer guaranteed: "Setting hardware domain as domid 0 is removed." After these changes, the assumption that hardware domain always equals dom0 may not hold. As a result, the above comment in the code could become misleading. Consider updating it to something like: /* Limit NS16550 emulator to the hardware domain only for now */ to reflect the new semantics. > +{ > +struct vuart_info info = { > +.name = "COM2", > +.compatible = "ns16550", > +.base_addr = 0x2f8, > +.size = 8, > +.irq= 3, > +}; Consider defining COM2 base address and IRQ in a shared header (e.g., VUART_COM2_BASE and VUART_COM2_IRQ) rather than using the magic numbers 0x2f8 and 3 here. This would allow reuse in `__start_xen` and other places, and makes the code clearer and easier to maintain. > + > +rc = vuart_init(d, &info); > +if ( rc ) > +goto out_vioapic_deinit; > +} > + > stdvga_init(d); > > rtc_init(d); > @@ -712,6 +729,8 @@ int hvm_domain_initialise(struct domain *d, > return 0; > > fail2: > +vuart_deinit(d); > + out_vioapic_deinit: > vioapic_deinit(d); > fail1: > if ( is_hardware_domain(d) ) > @@ -774,6 +793,8 @@ void hvm_domain_destroy(struct domain *d) > if ( hvm_funcs.domain_destroy ) > alternative_vcall(hvm_funcs.domain_destroy, d); > > +vuart_deinit(d); > + > vioapic_deinit(d); > > XFREE(d->arch.hvm.pl_time); > diff --git a/xen/common/emul/vuart/Kconfig b/xen/common/emul/vuart/Kconfig > index ce1b976b7da7..a27d7ca135af 100644 > --- a/xen/common/emul/vuart/Kconfig > +++ b/xen/common/emul/vuart/Kconfig > @@ -3,4 +3,23 @@ config VUART_FRAMEWORK > > menu "UART Emulation" > > +config VUART_NS16X50 > + bool "NS16550-compatible UART Emulator" if EXPERT > + depends on X86 && HVM > + select VUART_FRAMEWORK > + default n > + help > + In-hypervisor NS16x50 UART emulation. > + > + Only legacy PC COM2 port is emulated. > + > + This is strictly for testing purposes (such as early HVM guest > console), > + and not appropriate for use in production. > + > +config VUART_NS16X50_DEBUG > + bool "NS16550-compatible UART Emulator Debugging" > + depends on VUART_NS16X50 && DEBUG > + help > + Enable development debugging. > + > endmenu > diff --git a/xen/common/emul/vuart/Makefile b/xen/common/emul/vuart/Makefile > index 97f792dc6641..fe904f6cb65d 100644 > --- a/xen/common/emul/vuart/Makefile > +++ b/xen/common/emul/vuart/Makefile > @@ -1 +1,2 @@ > obj-y += vuart.o > +obj-$(CONFIG_VUART_NS16X50) += ns16x50.o > diff --git a/xen/common/emul/vuart/ns16x50.c b/xen/common/emul/vuart/ns16x50.c > new file mode 100644 > index ..0462a961e785 > --- /dev/null > +++ b/xen/common/emul/vuart/ns16x50.c > @@ -0,0 +1,366 @@ > +/* SPDX-License-Identifier: GPL-2.0-only */ > +/* > + * NS16550-compatible UART Emulator. > + * > + * See: > + * - Serial and UART Tutorial: > + * > https://download.freebsd.org/doc/en/articles/serial-uart/serial-uart_en.pdf > + * - U
Re: [PATCH v6 05/15] emul/ns16x50: implement SCR register
Hi Denis, On Sat, Sep 6, 2025 at 2:27 AM wrote: > > From: Denis Mukhin > > Add SCR register emulation to the I/O port handler. > Firmware (e.g. OVMF) may use SCR during the guest OS boot. > > Signed-off-by: Denis Mukhin > --- > Changes since v5: > - moved earlier in the series to simplify I/O handler population in > the follow on patches > - Link to v5: > https://lore.kernel.org/xen-devel/20250828235409.2835815-11-dmuk...@ford.com/ > --- > xen/common/emul/vuart/ns16x50.c | 27 +++ > 1 file changed, 27 insertions(+) > > diff --git a/xen/common/emul/vuart/ns16x50.c b/xen/common/emul/vuart/ns16x50.c > index 7f479a5be4a2..51ec85e57627 100644 > --- a/xen/common/emul/vuart/ns16x50.c > +++ b/xen/common/emul/vuart/ns16x50.c > @@ -103,6 +103,20 @@ static int ns16x50_io_write8( > > if ( ns16x50_dlab_get(vdev) && (reg == UART_DLL || reg == UART_DLM) ) > regs[NS16X50_REGS_NUM + reg] = val; > +else > +{ > +switch ( reg ) > +{ > +/* NB: Firmware (e.g. OVMF) may rely on SCR presence. */ > +case UART_SCR: > +regs[UART_SCR] = val; > +break; > + > +default: > +rc = -EINVAL; In the previous commit, when ns16x50_dlab_get() was zero and UART_DLL or UART_DLM was accessed, the function returned 0. With this commit, the behavior changes: now an -EINVAL error is returned for both DLL and DLM when ns16x50_dlab_get() is zero. Should this be fixed in the previous commit, or is this change intentional in this one? Note that for 16-bit accesses you already return an error when ns16x50_dlab_get() is zero, so the behavior is inconsistent for 8-bit accesses to DLL/DLM. > +break; > +} > +} > > return rc; > } > @@ -165,6 +179,19 @@ static int ns16x50_io_read8( > > if ( ns16x50_dlab_get(vdev) && (reg == UART_DLL || reg == UART_DLM) ) > val = regs[NS16X50_REGS_NUM + reg]; > +else > +{ > +switch ( reg ) > +{ > +case UART_SCR: > +val = regs[UART_SCR]; > +break; > + > +default: > +rc = -EINVAL; > +break; > +} > +} > > *data = val; > > -- > 2.51.0 > > Best regards, Mykola