Re: [XEN PATCH v2 1/4] build,include: rework shell script for headers++.chk
On 14.06.2022 18:22, Anthony PERARD wrote: > The command line generated for headers++.chk by make is quite long, > and in some environment it is too long. This issue have been seen in > Yocto build environment. > > Error messages: > make[9]: execvp: /bin/sh: Argument list too long > make[9]: *** [include/Makefile:181: include/headers++.chk] Error 127 > > Rework so that we do the foreach loop in shell rather that make to > reduce the command line size by a lot. We also need a way to get > headers prerequisite for some public headers so we use a switch "case" > in shell to be able to do some simple pattern matching. Variables > alone in POSIX shell don't allow to work with associative array or > variables with "/". > > Reported-by: Bertrand Marquis > Fixes: 28e13c7f43 ("build: xen/include: use if_changed") > Signed-off-by: Anthony PERARD > Reviewed-by: Bertrand Marquis > --- > > Notes: > v2: > - fix typo in commit message > - fix out-of-tree build > > v1: > - was sent as a reply to v1 of the series > > xen/include/Makefile | 17 + > 1 file changed, 13 insertions(+), 4 deletions(-) > > diff --git a/xen/include/Makefile b/xen/include/Makefile > index 6d9bcc19b0..49c75a78f9 100644 > --- a/xen/include/Makefile > +++ b/xen/include/Makefile > @@ -158,13 +158,22 @@ define cmd_headerscxx_chk > touch $@.new; \ > exit 0; \ > fi; \ > - $(foreach i, $(filter %.h,$^),\ > - echo "#include "\"$(i)\" \ > + get_prereq() {\ > + case $$1 in \ > + $(foreach i, $(filter %.h,$^),\ > + $(if $($(patsubst $(srctree)/%,%,$(i))-prereq), \ > + $(i)$(close) \ > + echo "$(foreach j, $($(patsubst $(srctree)/%,%,$(i))-prereq), \ > + -include c$(j))";;)) \ > + esac; \ If I'm reading this right (indentation looks to be a little misleading and hence one needs to count parentheses) the case statement could (in theory) remain without any "body". As per the command language spec I'm looking at this (if it works in the first place) is an extension, and formally there's always at least one label required. Since we aim to be portable in such regards, I'd like to request that there be a final otherwise empty *) line. > + };\ > + for i in $(filter %.h,$^); do \ > + echo "#include "\"$$i\" \ > | $(CXX) -x c++ -std=gnu++98 -Wall -Werror -D__XEN_TOOLS__\ > -include stdint.h -include $(srcdir)/public/xen.h \ > - $(foreach j, $($(patsubst $(srctree)/%,%,$i)-prereq), -include > c$(j)) \ > + `get_prereq $$i`\ While I know we use back-ticked quoting elsewhere, I'd generally recommend to use $() for readability. But maybe others view this exactly the other way around ... And a question without good context to put at: Isn't headers99.chk in similar need of converting? It looks only slightly less involved than the C++ one. Jan > -S -o /dev/null - \ > - || exit $$?; echo $(i) >> $@.new;) \ > + || exit $$?; echo $$i >> $@.new; done;\ > mv $@.new $@ > endef >
Re: [PATCH V4 0/8] virtio: Solution to restrict memory access under Xen using xen-grant DMA-mapping layer
Hi Oleksandr, On Mon, Jun 6, 2022 at 10:16 AM Oleksandr Tyshchenko wrote: > The high level idea is to create new Xen’s grant table based DMA-mapping > layer for the guest Linux whose main > purpose is to provide a special 64-bit DMA address which is formed by using > the grant reference (for a page > to be shared with the backend) with offset and setting the highest address > bit (this is for the backend to > be able to distinguish grant ref based DMA address from normal GPA). For this > to work we need the ability > to allocate contiguous (consecutive) grant references for multi-page > allocations. And the backend then needs > to offer VIRTIO_F_ACCESS_PLATFORM and VIRTIO_F_VERSION_1 feature bits (it > must support virtio-mmio modern > transport for 64-bit addresses in the virtqueue). I was trying your series, from Linus's tree now and started seeing boot failures, failed to mount rootfs. And the reason probably is these messages: [ 1.222498] virtio_scsi virtio1: device must provide VIRTIO_F_ACCESS_PLATFORM [ 1.316334] virtio_net virtio0: device must provide VIRTIO_F_ACCESS_PLATFORM I understand from your email that the backends need to offer VIRTIO_F_ACCESS_PLATFORM flag now, but should this requirement be a bit soft ? I mean shouldn't we allow both types of backends to run with the same kernel, ones that offer this feature and others that don't ? The ones that don't offer the feature, should continue to work like they used to, i.e. without the restricted memory access feature. I am testing Xen currently with help of Qemu over my x86 desktop and these backends (scsi and net) are part of QEMU itself I think, and I don't really want to go and make the change there. Thanks. -- Viresh
Re: [XEN PATCH v2 3/4] build: set PERL
On 14.06.2022 18:22, Anthony PERARD wrote: > --- a/xen/Makefile > +++ b/xen/Makefile > @@ -22,6 +22,7 @@ PYTHON_INTERPRETER := $(word 1,$(shell which python3 > python python2 2>/dev/null) > export PYTHON?= $(PYTHON_INTERPRETER) > > export CHECKPOLICY ?= checkpolicy > +export PERL ?= perl > > $(if $(filter __%, $(MAKECMDGOALS)), \ > $(error targets prefixed with '__' are only for internal use)) Considering my patch yesterday that moved the exporting of CC etc, I wonder if - at the very least for consistency - this and the neighbouring pre-existing exports shouldn't all be moved below the inclusion of ./Config.mk as well. After all ./config might override any of those. (Yes, the ones here don't prevent such overriding, but only as long as there aren't any further make quirks.) Since this may want doing in a separate patch, this one is Acked-by: Jan Beulich Jan
Re: [PATCH 23/36] arm64,smp: Remove trace_.*_rcuidle() usage
On Tue, 14 Jun 2022 17:24:48 +0100, Mark Rutland wrote: > > On Wed, Jun 08, 2022 at 04:27:46PM +0200, Peter Zijlstra wrote: > > Ever since commit d3afc7f12987 ("arm64: Allow IPIs to be handled as > > normal interrupts") this function is called in regular IRQ context. > > > > Signed-off-by: Peter Zijlstra (Intel) > > [adding Marc since he authored that commit] > > Makes sense to me: > > Acked-by: Mark Rutland > > Mark. > > > --- > > arch/arm64/kernel/smp.c |4 ++-- > > 1 file changed, 2 insertions(+), 2 deletions(-) > > > > --- a/arch/arm64/kernel/smp.c > > +++ b/arch/arm64/kernel/smp.c > > @@ -865,7 +865,7 @@ static void do_handle_IPI(int ipinr) > > unsigned int cpu = smp_processor_id(); > > > > if ((unsigned)ipinr < NR_IPI) > > - trace_ipi_entry_rcuidle(ipi_types[ipinr]); > > + trace_ipi_entry(ipi_types[ipinr]); > > > > switch (ipinr) { > > case IPI_RESCHEDULE: > > @@ -914,7 +914,7 @@ static void do_handle_IPI(int ipinr) > > } > > > > if ((unsigned)ipinr < NR_IPI) > > - trace_ipi_exit_rcuidle(ipi_types[ipinr]); > > + trace_ipi_exit(ipi_types[ipinr]); > > } > > > > static irqreturn_t ipi_handler(int irq, void *data) Acked-by: Marc Zyngier M. -- Without deviation from the norm, progress is not possible.
Re: [PATCH 34.5/36] cpuidle,omap4: Push RCU-idle into omap4_enter_lowpower()
Hi, Adding Aaro Koskinen and Peter Vasil for pm24xx for n800 and n810 related idle. * Peter Zijlstra [220614 22:07]: > On Mon, Jun 13, 2022 at 03:39:05PM +0300, Tony Lindgren wrote: > > OMAP4 uses full SoC suspend modes as idle states, as such it needs the > > whole power-domain and clock-domain code from the idle path. > > > > All that code is not suitable to run with RCU disabled, as such push > > RCU-idle deeper still. > > > > Signed-off-by: Tony Lindgren > > --- > > > > Peter here's one more for your series, looks like this is needed to avoid > > warnings similar to what you did for omap3. > > Thanks Tony! > > I've had a brief look at omap2_pm_idle() and do I understand it right > that something like the below patch would reduce it to a simple 'WFI'? Yes that should do for omap2_do_wfi(). > What do I do with the rest of that code, because I don't think this > thing has a cpuidle driver to take over, effectively turning it into > dead code. As we are establishing a policy where deeper idle states must be handled by cpuidle, and for most part that has been the case for at least 10 years, I'd just drop the unused functions with an explanation in the patch why we're doing it. Or the functions could be tagged with __maybe_unused if folks prefer that. In the pm24xx case we are not really causing a regression for users as there are still pending patches to make n800 and n810 truly usable with the mainline kernel. At least the PMIC and LCD related patches need some work [0]. The deeper idle states can be added back later using cpuidle as needed so we have a clear path. Aaro & Peter V, do you have any better suggestions here as this will mostly affect you guys currently? Regards, Tony [0] https://lore.kernel.org/linux-omap/20211224214512.1583430-1-peter.va...@gmail.com/ > --- a/arch/arm/mach-omap2/pm24xx.c > +++ b/arch/arm/mach-omap2/pm24xx.c > @@ -126,10 +126,20 @@ static int omap2_allow_mpu_retention(voi > return 1; > } > > -static void omap2_enter_mpu_retention(void) > +static void omap2_do_wfi(void) > { > const int zero = 0; > > + /* WFI */ > + asm("mcr p15, 0, %0, c7, c0, 4" : : "r" (zero) : "memory", "cc"); > +} > + > +#if 0 > +/* > + * possible cpuidle implementation between WFI and full_retention above > + */ > +static void omap2_enter_mpu_retention(void) > +{ > /* The peripherals seem not to be able to wake up the MPU when >* it is in retention mode. */ > if (omap2_allow_mpu_retention()) { > @@ -146,8 +157,7 @@ static void omap2_enter_mpu_retention(vo > pwrdm_set_next_pwrst(mpu_pwrdm, PWRDM_POWER_ON); > } > > - /* WFI */ > - asm("mcr p15, 0, %0, c7, c0, 4" : : "r" (zero) : "memory", "cc"); > + omap2_do_wfi(); > > pwrdm_set_next_pwrst(mpu_pwrdm, PWRDM_POWER_ON); > } > @@ -161,6 +171,7 @@ static int omap2_can_sleep(void) > > return 1; > } > +#endif > > static void omap2_pm_idle(void) > { > @@ -169,6 +180,7 @@ static void omap2_pm_idle(void) > if (omap_irq_pending()) > return; > > +#if 0 > error = cpu_cluster_pm_enter(); > if (error || !omap2_can_sleep()) { > omap2_enter_mpu_retention(); > @@ -179,6 +191,9 @@ static void omap2_pm_idle(void) > > out_cpu_cluster_pm: > cpu_cluster_pm_exit(); > +#else > + omap2_do_wfi(); > +#endif > } > > static void __init prcm_setup_regs(void)
Re: [Pv-drivers] [PATCH 29/36] cpuidle, xenpv: Make more PARAVIRT_XXL noinstr clean
On Mon, Jun 13, 2022 at 07:23:13PM +, Nadav Amit wrote: > On Jun 13, 2022, at 11:48 AM, Srivatsa S. Bhat wrote: > > > ⚠ External Email > > > > On 6/8/22 4:27 PM, Peter Zijlstra wrote: > >> vmlinux.o: warning: objtool: acpi_idle_enter_s2idle+0xde: call to wbinvd() > >> leaves .noinstr.text section > >> vmlinux.o: warning: objtool: default_idle+0x4: call to arch_safe_halt() > >> leaves .noinstr.text section > >> vmlinux.o: warning: objtool: xen_safe_halt+0xa: call to > >> HYPERVISOR_sched_op.constprop.0() leaves .noinstr.text section > >> > >> Signed-off-by: Peter Zijlstra (Intel) > > > > Reviewed-by: Srivatsa S. Bhat (VMware) > > > >> > >> -static inline void wbinvd(void) > >> +extern noinstr void pv_native_wbinvd(void); > >> + > >> +static __always_inline void wbinvd(void) > >> { > >> PVOP_ALT_VCALL0(cpu.wbinvd, "wbinvd", ALT_NOT(X86_FEATURE_XENPV)); > >> } > > I guess it is yet another instance of wrong accounting of GCC for > the assembly blocks’ weight. I guess it is not a solution for older > GCCs, but presumably PVOP_ALT_CALL() and friends should have > used asm_inline or some new “asm_volatile_inline” variant. Partially, some of the *SAN options also generate a metric ton of nonsense when enabled and skew the compilers towards not inlining things.
Re: [PATCH 15/36] cpuidle,cpu_pm: Remove RCU fiddling from cpu_pm_{enter,exit}()
On Tue, Jun 14, 2022 at 05:13:16PM +0100, Mark Rutland wrote: > On Wed, Jun 08, 2022 at 04:27:38PM +0200, Peter Zijlstra wrote: > > All callers should still have RCU enabled. > > IIUC with that true we should be able to drop the RCU_NONIDLE() from > drivers/perf/arm_pmu.c, as we only needed that for an invocation via a pm > notifier. > > I should be able to give that a spin on some hardware. > > > > > Signed-off-by: Peter Zijlstra (Intel) > > --- > > kernel/cpu_pm.c |9 - > > 1 file changed, 9 deletions(-) > > > > --- a/kernel/cpu_pm.c > > +++ b/kernel/cpu_pm.c > > @@ -30,16 +30,9 @@ static int cpu_pm_notify(enum cpu_pm_eve > > { > > int ret; > > > > - /* > > -* This introduces a RCU read critical section, which could be > > -* disfunctional in cpu idle. Copy RCU_NONIDLE code to let RCU know > > -* this. > > -*/ > > - rcu_irq_enter_irqson(); > > rcu_read_lock(); > > ret = raw_notifier_call_chain(&cpu_pm_notifier.chain, event, NULL); > > rcu_read_unlock(); > > - rcu_irq_exit_irqson(); > > To make this easier to debug, is it worth adding an assertion that RCU is > watching here? e.g. > > RCU_LOCKDEP_WARN(!rcu_is_watching(), >"cpu_pm_notify() used illegally from EQS"); > My understanding is that rcu_read_lock() implies something along those lines when PROVE_RCU.
Re: [PATCH 14/36] cpuidle: Fix rcu_idle_*() usage
On Tue, Jun 14, 2022 at 01:41:13PM +0100, Mark Rutland wrote: > On Wed, Jun 08, 2022 at 04:27:37PM +0200, Peter Zijlstra wrote: > > --- a/kernel/time/tick-broadcast.c > > +++ b/kernel/time/tick-broadcast.c > > @@ -622,9 +622,13 @@ struct cpumask *tick_get_broadcast_onesh > > * to avoid a deep idle transition as we are about to get the > > * broadcast IPI right away. > > */ > > -int tick_check_broadcast_expired(void) > > +noinstr int tick_check_broadcast_expired(void) > > { > > +#ifdef _ASM_GENERIC_BITOPS_INSTRUMENTED_NON_ATOMIC_H > > + return arch_test_bit(smp_processor_id(), > > cpumask_bits(tick_broadcast_force_mask)); > > +#else > > return cpumask_test_cpu(smp_processor_id(), tick_broadcast_force_mask); > > +#endif > > } > > This is somewhat not-ideal. :/ I'll say. > Could we unconditionally do the arch_test_bit() variant, with a comment, or > does that not exist in some cases? Loads of build errors ensued, which is how I ended up with this mess ...
Re: [PATCH 00/36] cpuidle,rcu: Cleanup the mess
On Tue, Jun 14, 2022 at 12:19:29PM +0100, Mark Rutland wrote: > On Wed, Jun 08, 2022 at 04:27:23PM +0200, Peter Zijlstra wrote: > > Hi All! (omg so many) > > Hi Peter, > > Sorry for the delay; my plate has also been rather full recently. I'm > beginning > to page this in now. No worries; we all have too much to do ;-) > > These here few patches mostly clear out the utter mess that is cpuidle vs > > rcuidle. > > > > At the end of the ride there's only 2 real RCU_NONIDLE() users left > > > > arch/arm64/kernel/suspend.c:RCU_NONIDLE(__cpu_suspend_exit()); > > drivers/perf/arm_pmu.c: RCU_NONIDLE(armpmu_start(event, > > PERF_EF_RELOAD)); > > The latter of these is necessary because apparently PM notifiers are called > with RCU not watching. Is that still the case today (or at the end of this > series)? If so, that feels like fertile land for more issues (yaey...). If > not, > we should be able to drop this. That should be fixed; fingers crossed :-) > > kernel/cfi.c: RCU_NONIDLE({ > > > > (the CFI one is likely dead in the kCFI rewrite) and there's only a hand > > full > > of trace_.*_rcuidle() left: > > > > kernel/trace/trace_preemptirq.c: > > trace_irq_enable_rcuidle(CALLER_ADDR0, CALLER_ADDR1); > > kernel/trace/trace_preemptirq.c: > > trace_irq_disable_rcuidle(CALLER_ADDR0, CALLER_ADDR1); > > kernel/trace/trace_preemptirq.c: > > trace_irq_enable_rcuidle(CALLER_ADDR0, caller_addr); > > kernel/trace/trace_preemptirq.c: > > trace_irq_disable_rcuidle(CALLER_ADDR0, caller_addr); > > kernel/trace/trace_preemptirq.c: > > trace_preempt_enable_rcuidle(a0, a1); > > kernel/trace/trace_preemptirq.c: > > trace_preempt_disable_rcuidle(a0, a1); > > > > All of them are in 'deprecated' code that is unused for GENERIC_ENTRY. > > I think those are also unused on arm64 too? > > If not, I can go attack that. My grep spots: arch/arm64/kernel/entry-common.c: trace_hardirqs_on(); arch/arm64/include/asm/daifflags.h: trace_hardirqs_off(); arch/arm64/include/asm/daifflags.h: trace_hardirqs_off(); The _on thing should be replaced with something like: trace_hardirqs_on_prepare(); lockdep_hardirqs_on_prepare(); instrumentation_end(); rcu_irq_exit(); lockdep_hardirqs_on(CALLER_ADDR0); (as I think you know, since you have some of that already). And something similar for the _off thing, but with _off_finish(). > > I've touched a _lot_ of code that I can't test and likely broken some of it > > :/ > > In particular, the whole ARM cpuidle stuff was quite involved with OMAP > > being > > the absolute 'winner'. > > > > I'm hoping Mark can help me sort the remaining ARM64 bits as he moves that > > to > > GENERIC_ENTRY. > > Moving to GENERIC_ENTRY as a whole is going to take a tonne of work > (refactoring both arm64 and the generic portion to be more amenable to each > other), but we can certainly move closer to that for the bits that matter > here. I know ... been there etc.. :-) > Maybe we want a STRICT_ENTRY option to get rid of all the deprecated stuff > that > we can select regardless of GENERIC_ENTRY to make that easier. Possible yeah. > > I've also got a note that says ARM64 can probably do a WFE based > > idle state and employ TIF_POLLING_NRFLAG to avoid some IPIs. > > Possibly; I'm not sure how much of a win that'll be given that by default > we'll > have a ~10KHz WFE wakeup from the timer, but we could take a peek. Ohh.. I didn't know it woke up *that* often. I just know Will made use of it in things like smp_cond_load_relaxed() which would be somewhat similar to a very shallow idle state that looks at the TIF word.
Re: [PATCH 34.5/36] cpuidle,omap4: Push RCU-idle into omap4_enter_lowpower()
On Mon, Jun 13, 2022 at 03:39:05PM +0300, Tony Lindgren wrote: > OMAP4 uses full SoC suspend modes as idle states, as such it needs the > whole power-domain and clock-domain code from the idle path. > > All that code is not suitable to run with RCU disabled, as such push > RCU-idle deeper still. > > Signed-off-by: Tony Lindgren > --- > > Peter here's one more for your series, looks like this is needed to avoid > warnings similar to what you did for omap3. Thanks Tony! I've had a brief look at omap2_pm_idle() and do I understand it right that something like the below patch would reduce it to a simple 'WFI'? What do I do with the rest of that code, because I don't think this thing has a cpuidle driver to take over, effectively turning it into dead code. --- a/arch/arm/mach-omap2/pm24xx.c +++ b/arch/arm/mach-omap2/pm24xx.c @@ -126,10 +126,20 @@ static int omap2_allow_mpu_retention(voi return 1; } -static void omap2_enter_mpu_retention(void) +static void omap2_do_wfi(void) { const int zero = 0; + /* WFI */ + asm("mcr p15, 0, %0, c7, c0, 4" : : "r" (zero) : "memory", "cc"); +} + +#if 0 +/* + * possible cpuidle implementation between WFI and full_retention above + */ +static void omap2_enter_mpu_retention(void) +{ /* The peripherals seem not to be able to wake up the MPU when * it is in retention mode. */ if (omap2_allow_mpu_retention()) { @@ -146,8 +157,7 @@ static void omap2_enter_mpu_retention(vo pwrdm_set_next_pwrst(mpu_pwrdm, PWRDM_POWER_ON); } - /* WFI */ - asm("mcr p15, 0, %0, c7, c0, 4" : : "r" (zero) : "memory", "cc"); + omap2_do_wfi(); pwrdm_set_next_pwrst(mpu_pwrdm, PWRDM_POWER_ON); } @@ -161,6 +171,7 @@ static int omap2_can_sleep(void) return 1; } +#endif static void omap2_pm_idle(void) { @@ -169,6 +180,7 @@ static void omap2_pm_idle(void) if (omap_irq_pending()) return; +#if 0 error = cpu_cluster_pm_enter(); if (error || !omap2_can_sleep()) { omap2_enter_mpu_retention(); @@ -179,6 +191,9 @@ static void omap2_pm_idle(void) out_cpu_cluster_pm: cpu_cluster_pm_exit(); +#else + omap2_do_wfi(); +#endif } static void __init prcm_setup_regs(void)
Re: [PATCH 16/36] rcu: Fix rcu_idle_exit()
On Wed, Jun 08, 2022 at 04:27:39PM +0200, Peter Zijlstra wrote: > Current rcu_idle_exit() is terminally broken because it uses > local_irq_{save,restore}(), which are traced which uses RCU. > > However, now that all the callers are sure to have IRQs disabled, we > can remove these calls. > > Signed-off-by: Peter Zijlstra (Intel) > Acked-by: Paul E. McKenney We have some fun conflicts between this series and Frederic's context-tracking series. But it looks like these can be resolved by: 1. A patch on top of Frederic's series that provides the old rcu_*() names for the functions now prefixed with ct_*() such as ct_idle_exit(). 2. Another patch on top of Frederic's series that takes the changes remaining from this patch, shown below. Frederic's series uses raw_local_irq_save() and raw_local_irq_restore(), which can then be removed. Or is there a better way to do this? Thanx, Paul commit f64cee8c159e9863a74594efe3d33fb513a6a7b5 Author: Peter Zijlstra Date: Tue Jun 14 17:24:43 2022 -0700 context_tracking: Interrupts always disabled for ct_idle_exit() Now that the idle-loop cleanups have ensured that rcu_idle_exit() is always invoked with interrupts disabled, remove the interrupt disabling in favor of a debug check. Signed-off-by: Peter Zijlstra Cc: Frederic Weisbecker Signed-off-by: Paul E. McKenney diff --git a/kernel/context_tracking.c b/kernel/context_tracking.c index 1da44803fd319..99310cf5b0254 100644 --- a/kernel/context_tracking.c +++ b/kernel/context_tracking.c @@ -332,11 +332,8 @@ EXPORT_SYMBOL_GPL(ct_idle_enter); */ void noinstr ct_idle_exit(void) { - unsigned long flags; - - raw_local_irq_save(flags); + WARN_ON_ONCE(IS_ENABLED(CONFIG_RCU_EQS_DEBUG) && !raw_irqs_disabled()); ct_kernel_enter(false, RCU_DYNTICKS_IDX - CONTEXT_IDLE); - raw_local_irq_restore(flags); } EXPORT_SYMBOL_GPL(ct_idle_exit);
Re: [PATCH 00/36] cpuidle,rcu: Cleanup the mess
On Tue, Jun 14, 2022 at 06:58:30PM +0200, Peter Zijlstra wrote: > On Tue, Jun 14, 2022 at 12:19:29PM +0100, Mark Rutland wrote: > > On Wed, Jun 08, 2022 at 04:27:23PM +0200, Peter Zijlstra wrote: > > > Hi All! (omg so many) > > > > Hi Peter, > > > > Sorry for the delay; my plate has also been rather full recently. I'm > > beginning > > to page this in now. > > No worries; we all have too much to do ;-) > > > > These here few patches mostly clear out the utter mess that is cpuidle vs > > > rcuidle. > > > > > > At the end of the ride there's only 2 real RCU_NONIDLE() users left > > > > > > arch/arm64/kernel/suspend.c: > > > RCU_NONIDLE(__cpu_suspend_exit()); > > > drivers/perf/arm_pmu.c: RCU_NONIDLE(armpmu_start(event, > > > PERF_EF_RELOAD)); > > > > The latter of these is necessary because apparently PM notifiers are called > > with RCU not watching. Is that still the case today (or at the end of this > > series)? If so, that feels like fertile land for more issues (yaey...). If > > not, > > we should be able to drop this. > > That should be fixed; fingers crossed :-) Cool; I'll try to give that a spin when I'm sat next to some relevant hardware. :) > > > kernel/cfi.c: RCU_NONIDLE({ > > > > > > (the CFI one is likely dead in the kCFI rewrite) and there's only a hand > > > full > > > of trace_.*_rcuidle() left: > > > > > > kernel/trace/trace_preemptirq.c: > > > trace_irq_enable_rcuidle(CALLER_ADDR0, CALLER_ADDR1); > > > kernel/trace/trace_preemptirq.c: > > > trace_irq_disable_rcuidle(CALLER_ADDR0, CALLER_ADDR1); > > > kernel/trace/trace_preemptirq.c: > > > trace_irq_enable_rcuidle(CALLER_ADDR0, caller_addr); > > > kernel/trace/trace_preemptirq.c: > > > trace_irq_disable_rcuidle(CALLER_ADDR0, caller_addr); > > > kernel/trace/trace_preemptirq.c: > > > trace_preempt_enable_rcuidle(a0, a1); > > > kernel/trace/trace_preemptirq.c: > > > trace_preempt_disable_rcuidle(a0, a1); > > > > > > All of them are in 'deprecated' code that is unused for GENERIC_ENTRY. > > I think those are also unused on arm64 too? > > > > If not, I can go attack that. > > My grep spots: > > arch/arm64/kernel/entry-common.c: trace_hardirqs_on(); > arch/arm64/include/asm/daifflags.h: trace_hardirqs_off(); > arch/arm64/include/asm/daifflags.h: trace_hardirqs_off(); Ah; I hadn't realised those used trace_.*_rcuidle() behind the scenes. That affects local_irq_{enable,disable,restore}() too (which is what the daifflags.h bits are emulating), and also the generic entry code's irqentry_exit(). So it feels to me like we should be fixing those more generally? e.g. say that with a new STRICT_ENTRY[_RCU], we can only call trace_hardirqs_{on,off}() with RCU watching, and alter the definition of those? > The _on thing should be replaced with something like: > > trace_hardirqs_on_prepare(); > lockdep_hardirqs_on_prepare(); > instrumentation_end(); > rcu_irq_exit(); > lockdep_hardirqs_on(CALLER_ADDR0); > > (as I think you know, since you have some of that already). And > something similar for the _off thing, but with _off_finish(). Sure; I knew that was necessary for the outermost parts of entry (and I think that's all handled), I just hadn't realised that trace_hardirqs_{on,off} did the rcuidle thing in the middle. It'd be nice to not have to open-code the whole sequence everywhere for the portions which run after entry and are instrumentable, so (as above) I reckon we want to make trace_hardirqs_{on,off}() not do the rcuidle part unnecessarily (which IIUC is an end-goal anyway)? > > > I've touched a _lot_ of code that I can't test and likely broken some of > > > it :/ > > > In particular, the whole ARM cpuidle stuff was quite involved with OMAP > > > being > > > the absolute 'winner'. > > > > > > I'm hoping Mark can help me sort the remaining ARM64 bits as he moves > > > that to > > > GENERIC_ENTRY. > > > > Moving to GENERIC_ENTRY as a whole is going to take a tonne of work > > (refactoring both arm64 and the generic portion to be more amenable to each > > other), but we can certainly move closer to that for the bits that matter > > here. > > I know ... been there etc.. :-) > > > Maybe we want a STRICT_ENTRY option to get rid of all the deprecated stuff > > that > > we can select regardless of GENERIC_ENTRY to make that easier. > > Possible yeah. > > > > I've also got a note that says ARM64 can probably do a WFE based > > > idle state and employ TIF_POLLING_NRFLAG to avoid some IPIs. > > > > Possibly; I'm not sure how much of a win that'll be given that by default > > we'll > > have a ~10KHz WFE wakeup from the timer, but we could take a peek. > > Ohh.. I didn't know it woke up *that* often. I just know Will made use > of it in things like smp_cond_load_relaxed() which would be s
Re: [PATCH 15/36] cpuidle,cpu_pm: Remove RCU fiddling from cpu_pm_{enter,exit}()
On Tue, Jun 14, 2022 at 06:42:14PM +0200, Peter Zijlstra wrote: > On Tue, Jun 14, 2022 at 05:13:16PM +0100, Mark Rutland wrote: > > On Wed, Jun 08, 2022 at 04:27:38PM +0200, Peter Zijlstra wrote: > > > All callers should still have RCU enabled. > > > > IIUC with that true we should be able to drop the RCU_NONIDLE() from > > drivers/perf/arm_pmu.c, as we only needed that for an invocation via a pm > > notifier. > > > > I should be able to give that a spin on some hardware. > > > > > > > > Signed-off-by: Peter Zijlstra (Intel) > > > --- > > > kernel/cpu_pm.c |9 - > > > 1 file changed, 9 deletions(-) > > > > > > --- a/kernel/cpu_pm.c > > > +++ b/kernel/cpu_pm.c > > > @@ -30,16 +30,9 @@ static int cpu_pm_notify(enum cpu_pm_eve > > > { > > > int ret; > > > > > > - /* > > > - * This introduces a RCU read critical section, which could be > > > - * disfunctional in cpu idle. Copy RCU_NONIDLE code to let RCU know > > > - * this. > > > - */ > > > - rcu_irq_enter_irqson(); > > > rcu_read_lock(); > > > ret = raw_notifier_call_chain(&cpu_pm_notifier.chain, event, NULL); > > > rcu_read_unlock(); > > > - rcu_irq_exit_irqson(); > > > > To make this easier to debug, is it worth adding an assertion that RCU is > > watching here? e.g. > > > > RCU_LOCKDEP_WARN(!rcu_is_watching(), > > "cpu_pm_notify() used illegally from EQS"); > > > > My understanding is that rcu_read_lock() implies something along those > lines when PROVE_RCU. Ah, duh. Given that: Acked-by: Mark Rutland Mark.
Re: [PATCH 14/36] cpuidle: Fix rcu_idle_*() usage
On Tue, Jun 14, 2022 at 06:40:53PM +0200, Peter Zijlstra wrote: > On Tue, Jun 14, 2022 at 01:41:13PM +0100, Mark Rutland wrote: > > On Wed, Jun 08, 2022 at 04:27:37PM +0200, Peter Zijlstra wrote: > > > --- a/kernel/time/tick-broadcast.c > > > +++ b/kernel/time/tick-broadcast.c > > > @@ -622,9 +622,13 @@ struct cpumask *tick_get_broadcast_onesh > > > * to avoid a deep idle transition as we are about to get the > > > * broadcast IPI right away. > > > */ > > > -int tick_check_broadcast_expired(void) > > > +noinstr int tick_check_broadcast_expired(void) > > > { > > > +#ifdef _ASM_GENERIC_BITOPS_INSTRUMENTED_NON_ATOMIC_H > > > + return arch_test_bit(smp_processor_id(), > > > cpumask_bits(tick_broadcast_force_mask)); > > > +#else > > > return cpumask_test_cpu(smp_processor_id(), tick_broadcast_force_mask); > > > +#endif > > > } > > > > This is somewhat not-ideal. :/ > > I'll say. > > > Could we unconditionally do the arch_test_bit() variant, with a comment, or > > does that not exist in some cases? > > Loads of build errors ensued, which is how I ended up with this mess ... Yaey :( I see the same is true for the thread flag manipulation too. I'll take a look and see if we can layer things so that we can use the arch_*() helpers and wrap those consistently so that we don't have to check the CPP guard. Ideally we'd have a a better language that allows us to make some context-senstive decisions, then we could hide all this gunk in the lower levels with somethin like: if (!THIS_IS_A_NOINSTR_FUNCTION()) { explicit_instrumentation(...); } ... ho hum. Mark.
Re: [PATCH 16/36] rcu: Fix rcu_idle_exit()
On Wed, Jun 08, 2022 at 04:27:39PM +0200, Peter Zijlstra wrote: > Current rcu_idle_exit() is terminally broken because it uses > local_irq_{save,restore}(), which are traced which uses RCU. > > However, now that all the callers are sure to have IRQs disabled, we > can remove these calls. > > Signed-off-by: Peter Zijlstra (Intel) > Acked-by: Paul E. McKenney Acked-by: Mark Rutland Mark. > --- > kernel/rcu/tree.c |9 +++-- > 1 file changed, 3 insertions(+), 6 deletions(-) > > --- a/kernel/rcu/tree.c > +++ b/kernel/rcu/tree.c > @@ -659,7 +659,7 @@ static noinstr void rcu_eqs_enter(bool u > * If you add or remove a call to rcu_idle_enter(), be sure to test with > * CONFIG_RCU_EQS_DEBUG=y. > */ > -void rcu_idle_enter(void) > +void noinstr rcu_idle_enter(void) > { > lockdep_assert_irqs_disabled(); > rcu_eqs_enter(false); > @@ -896,13 +896,10 @@ static void noinstr rcu_eqs_exit(bool us > * If you add or remove a call to rcu_idle_exit(), be sure to test with > * CONFIG_RCU_EQS_DEBUG=y. > */ > -void rcu_idle_exit(void) > +void noinstr rcu_idle_exit(void) > { > - unsigned long flags; > - > - local_irq_save(flags); > + lockdep_assert_irqs_disabled(); > rcu_eqs_exit(false); > - local_irq_restore(flags); > } > EXPORT_SYMBOL_GPL(rcu_idle_exit); > > >
Re: [PATCH 20/36] arch/idle: Change arch_cpu_idle() IRQ behaviour
On Wed, Jun 08, 2022 at 04:27:43PM +0200, Peter Zijlstra wrote: > Current arch_cpu_idle() is called with IRQs disabled, but will return > with IRQs enabled. > > However, the very first thing the generic code does after calling > arch_cpu_idle() is raw_local_irq_disable(). This means that > architectures that can idle with IRQs disabled end up doing a > pointless 'enable-disable' dance. > > Therefore, push this IRQ disabling into the idle function, meaning > that those architectures can avoid the pointless IRQ state flipping. > > Signed-off-by: Peter Zijlstra (Intel) Nice! Acked-by: Mark Rutland [arm64] Mark. > --- > arch/alpha/kernel/process.c |1 - > arch/arc/kernel/process.c|3 +++ > arch/arm/kernel/process.c|1 - > arch/arm/mach-gemini/board-dt.c |3 ++- > arch/arm64/kernel/idle.c |1 - > arch/csky/kernel/process.c |1 - > arch/csky/kernel/smp.c |2 +- > arch/hexagon/kernel/process.c|1 - > arch/ia64/kernel/process.c |1 + > arch/microblaze/kernel/process.c |1 - > arch/mips/kernel/idle.c |8 +++- > arch/nios2/kernel/process.c |1 - > arch/openrisc/kernel/process.c |1 + > arch/parisc/kernel/process.c |2 -- > arch/powerpc/kernel/idle.c |5 ++--- > arch/riscv/kernel/process.c |1 - > arch/s390/kernel/idle.c |1 - > arch/sh/kernel/idle.c|1 + > arch/sparc/kernel/leon_pmc.c |4 > arch/sparc/kernel/process_32.c |1 - > arch/sparc/kernel/process_64.c |3 ++- > arch/um/kernel/process.c |1 - > arch/x86/coco/tdx/tdx.c |3 +++ > arch/x86/kernel/process.c| 15 --- > arch/xtensa/kernel/process.c |1 + > kernel/sched/idle.c |2 -- > 26 files changed, 28 insertions(+), 37 deletions(-) > > --- a/arch/alpha/kernel/process.c > +++ b/arch/alpha/kernel/process.c > @@ -57,7 +57,6 @@ EXPORT_SYMBOL(pm_power_off); > void arch_cpu_idle(void) > { > wtint(0); > - raw_local_irq_enable(); > } > > void arch_cpu_idle_dead(void) > --- a/arch/arc/kernel/process.c > +++ b/arch/arc/kernel/process.c > @@ -114,6 +114,8 @@ void arch_cpu_idle(void) > "sleep %0 \n" > : > :"I"(arg)); /* can't be "r" has to be embedded const */ > + > + raw_local_irq_disable(); > } > > #else/* ARC700 */ > @@ -122,6 +124,7 @@ void arch_cpu_idle(void) > { > /* sleep, but enable both set E1/E2 (levels of interrupts) before > committing */ > __asm__ __volatile__("sleep 0x3 \n"); > + raw_local_irq_disable(); > } > > #endif > --- a/arch/arm/kernel/process.c > +++ b/arch/arm/kernel/process.c > @@ -78,7 +78,6 @@ void arch_cpu_idle(void) > arm_pm_idle(); > else > cpu_do_idle(); > - raw_local_irq_enable(); > } > > void arch_cpu_idle_prepare(void) > --- a/arch/arm/mach-gemini/board-dt.c > +++ b/arch/arm/mach-gemini/board-dt.c > @@ -42,8 +42,9 @@ static void gemini_idle(void) >*/ > > /* FIXME: Enabling interrupts here is racy! */ > - local_irq_enable(); > + raw_local_irq_enable(); > cpu_do_idle(); > + raw_local_irq_disable(); > } > > static void __init gemini_init_machine(void) > --- a/arch/arm64/kernel/idle.c > +++ b/arch/arm64/kernel/idle.c > @@ -42,5 +42,4 @@ void noinstr arch_cpu_idle(void) >* tricks >*/ > cpu_do_idle(); > - raw_local_irq_enable(); > } > --- a/arch/csky/kernel/process.c > +++ b/arch/csky/kernel/process.c > @@ -101,6 +101,5 @@ void arch_cpu_idle(void) > #ifdef CONFIG_CPU_PM_STOP > asm volatile("stop\n"); > #endif > - raw_local_irq_enable(); > } > #endif > --- a/arch/csky/kernel/smp.c > +++ b/arch/csky/kernel/smp.c > @@ -314,7 +314,7 @@ void arch_cpu_idle_dead(void) > while (!secondary_stack) > arch_cpu_idle(); > > - local_irq_disable(); > + raw_local_irq_disable(); > > asm volatile( > "movsp, %0\n" > --- a/arch/hexagon/kernel/process.c > +++ b/arch/hexagon/kernel/process.c > @@ -44,7 +44,6 @@ void arch_cpu_idle(void) > { > __vmwait(); > /* interrupts wake us up, but irqs are still disabled */ > - raw_local_irq_enable(); > } > > /* > --- a/arch/ia64/kernel/process.c > +++ b/arch/ia64/kernel/process.c > @@ -241,6 +241,7 @@ void arch_cpu_idle(void) > (*mark_idle)(1); > > raw_safe_halt(); > + raw_local_irq_disable(); > > if (mark_idle) > (*mark_idle)(0); > --- a/arch/microblaze/kernel/process.c > +++ b/arch/microblaze/kernel/process.c > @@ -138,5 +138,4 @@ int dump_fpu(struct pt_regs *regs, elf_f > > void arch_cpu_idle(void) > { > - raw_local_irq_enable(); > } > --- a/arch/mips/kernel/idle.c > +++ b/arch/mips/kernel/idle.c > @@ -33,13 +33,13 @@ static void __cpuidle r3081_wait(void) > { > unsigned long cfg
Re: [PATCH 25/36] time/tick-broadcast: Remove RCU_NONIDLE usage
On Wed, Jun 08, 2022 at 04:27:48PM +0200, Peter Zijlstra wrote: > No callers left that have already disabled RCU. > > Signed-off-by: Peter Zijlstra (Intel) Acked-by: Mark Rutland Mark. > --- > kernel/time/tick-broadcast-hrtimer.c | 29 - > 1 file changed, 12 insertions(+), 17 deletions(-) > > --- a/kernel/time/tick-broadcast-hrtimer.c > +++ b/kernel/time/tick-broadcast-hrtimer.c > @@ -56,25 +56,20 @@ static int bc_set_next(ktime_t expires, >* hrtimer callback function is currently running, then >* hrtimer_start() cannot move it and the timer stays on the CPU on >* which it is assigned at the moment. > + */ > + hrtimer_start(&bctimer, expires, HRTIMER_MODE_ABS_PINNED_HARD); > + /* > + * The core tick broadcast mode expects bc->bound_on to be set > + * correctly to prevent a CPU which has the broadcast hrtimer > + * armed from going deep idle. >* > - * As this can be called from idle code, the hrtimer_start() > - * invocation has to be wrapped with RCU_NONIDLE() as > - * hrtimer_start() can call into tracing. > + * As tick_broadcast_lock is held, nothing can change the cpu > + * base which was just established in hrtimer_start() above. So > + * the below access is safe even without holding the hrtimer > + * base lock. >*/ > - RCU_NONIDLE( { > - hrtimer_start(&bctimer, expires, HRTIMER_MODE_ABS_PINNED_HARD); > - /* > - * The core tick broadcast mode expects bc->bound_on to be set > - * correctly to prevent a CPU which has the broadcast hrtimer > - * armed from going deep idle. > - * > - * As tick_broadcast_lock is held, nothing can change the cpu > - * base which was just established in hrtimer_start() above. So > - * the below access is safe even without holding the hrtimer > - * base lock. > - */ > - bc->bound_on = bctimer.base->cpu_base->cpu; > - } ); > + bc->bound_on = bctimer.base->cpu_base->cpu; > + > return 0; > } > > >
Re: [PATCH 23/36] arm64,smp: Remove trace_.*_rcuidle() usage
On Wed, Jun 08, 2022 at 04:27:46PM +0200, Peter Zijlstra wrote: > Ever since commit d3afc7f12987 ("arm64: Allow IPIs to be handled as > normal interrupts") this function is called in regular IRQ context. > > Signed-off-by: Peter Zijlstra (Intel) [adding Marc since he authored that commit] Makes sense to me: Acked-by: Mark Rutland Mark. > --- > arch/arm64/kernel/smp.c |4 ++-- > 1 file changed, 2 insertions(+), 2 deletions(-) > > --- a/arch/arm64/kernel/smp.c > +++ b/arch/arm64/kernel/smp.c > @@ -865,7 +865,7 @@ static void do_handle_IPI(int ipinr) > unsigned int cpu = smp_processor_id(); > > if ((unsigned)ipinr < NR_IPI) > - trace_ipi_entry_rcuidle(ipi_types[ipinr]); > + trace_ipi_entry(ipi_types[ipinr]); > > switch (ipinr) { > case IPI_RESCHEDULE: > @@ -914,7 +914,7 @@ static void do_handle_IPI(int ipinr) > } > > if ((unsigned)ipinr < NR_IPI) > - trace_ipi_exit_rcuidle(ipi_types[ipinr]); > + trace_ipi_exit(ipi_types[ipinr]); > } > > static irqreturn_t ipi_handler(int irq, void *data) > >
Re: [PATCH 15/36] cpuidle,cpu_pm: Remove RCU fiddling from cpu_pm_{enter,exit}()
On Wed, Jun 08, 2022 at 04:27:38PM +0200, Peter Zijlstra wrote: > All callers should still have RCU enabled. IIUC with that true we should be able to drop the RCU_NONIDLE() from drivers/perf/arm_pmu.c, as we only needed that for an invocation via a pm notifier. I should be able to give that a spin on some hardware. > > Signed-off-by: Peter Zijlstra (Intel) > --- > kernel/cpu_pm.c |9 - > 1 file changed, 9 deletions(-) > > --- a/kernel/cpu_pm.c > +++ b/kernel/cpu_pm.c > @@ -30,16 +30,9 @@ static int cpu_pm_notify(enum cpu_pm_eve > { > int ret; > > - /* > - * This introduces a RCU read critical section, which could be > - * disfunctional in cpu idle. Copy RCU_NONIDLE code to let RCU know > - * this. > - */ > - rcu_irq_enter_irqson(); > rcu_read_lock(); > ret = raw_notifier_call_chain(&cpu_pm_notifier.chain, event, NULL); > rcu_read_unlock(); > - rcu_irq_exit_irqson(); To make this easier to debug, is it worth adding an assertion that RCU is watching here? e.g. RCU_LOCKDEP_WARN(!rcu_is_watching(), "cpu_pm_notify() used illegally from EQS"); > > return notifier_to_errno(ret); > } > @@ -49,11 +42,9 @@ static int cpu_pm_notify_robust(enum cpu > unsigned long flags; > int ret; > > - rcu_irq_enter_irqson(); > raw_spin_lock_irqsave(&cpu_pm_notifier.lock, flags); > ret = raw_notifier_call_chain_robust(&cpu_pm_notifier.chain, event_up, > event_down, NULL); > raw_spin_unlock_irqrestore(&cpu_pm_notifier.lock, flags); > - rcu_irq_exit_irqson(); ... and likewise here? Thanks, Mark. > > return notifier_to_errno(ret); > } > >
[xen-unstable-smoke test] 171170: tolerable all pass - PUSHED
flight 171170 xen-unstable-smoke real [real] http://logs.test-lab.xenproject.org/osstest/logs/171170/ Failures :-/ but no regressions. Tests which did not succeed, but are not blocking: test-amd64-amd64-libvirt 15 migrate-support-checkfail never pass test-arm64-arm64-xl-xsm 15 migrate-support-checkfail never pass test-arm64-arm64-xl-xsm 16 saverestore-support-checkfail never pass test-armhf-armhf-xl 15 migrate-support-checkfail never pass test-armhf-armhf-xl 16 saverestore-support-checkfail never pass version targeted for testing: xen 162dea4e768b835114c736cfd3fa1fc3742d39c5 baseline version: xen c9a707df83aad17a6fcf2e8330ab3b5bead6fb8b Last test of basis 171079 2022-06-11 12:00:26 Z3 days Testing same since 171170 2022-06-15 00:02:01 Z0 days1 attempts People who touched revisions under test: Julien Grall Roger Pau Monné Stefano Stabellini Stefano Stabellini jobs: build-arm64-xsm pass build-amd64 pass build-armhf pass build-amd64-libvirt pass test-armhf-armhf-xl pass test-arm64-arm64-xl-xsm pass test-amd64-amd64-xl-qemuu-debianhvm-amd64pass test-amd64-amd64-libvirt pass sg-report-flight on osstest.test-lab.xenproject.org logs: /home/logs/logs images: /home/logs/images Logs, config files, etc. are available at http://logs.test-lab.xenproject.org/osstest/logs Explanation of these reports, and of osstest in general, is at http://xenbits.xen.org/gitweb/?p=osstest.git;a=blob;f=README.email;hb=master http://xenbits.xen.org/gitweb/?p=osstest.git;a=blob;f=README;hb=master Test harness code can be found at http://xenbits.xen.org/gitweb?p=osstest.git;a=summary Pushing revision : To xenbits.xen.org:/home/xen/git/xen.git c9a707df83..162dea4e76 162dea4e768b835114c736cfd3fa1fc3742d39c5 -> smoke
[PATCH] xen/arm: avoid vtimer flip-flop transition in context switch
virt_vtimer_save is calculating the new time for the vtimer and has a potential risk of timer flip in: "v->arch.virt_timer.cval + v->domain->arch.virt_timer_base.offset - boot_count". In this formula, "cval + offset" could make uint64_t overflow. Generally speaking, this is difficult to trigger. But unfortunately the problem was encountered with a platform where the timer started with a very huge initial value, like 0xF3338991. On this platform cval + offset is overflowing after running for a while. So in this patch, we adjust the formula to use "offset - boot_count" first, and then use the result to plus cval. This will avoid the uint64_t overflow. Signed-off-by: Wei Chen --- xen/arch/arm/vtimer.c | 5 +++-- 1 file changed, 3 insertions(+), 2 deletions(-) diff --git a/xen/arch/arm/vtimer.c b/xen/arch/arm/vtimer.c index 5bb5970f58..86e63303c8 100644 --- a/xen/arch/arm/vtimer.c +++ b/xen/arch/arm/vtimer.c @@ -144,8 +144,9 @@ void virt_timer_save(struct vcpu *v) if ( (v->arch.virt_timer.ctl & CNTx_CTL_ENABLE) && !(v->arch.virt_timer.ctl & CNTx_CTL_MASK)) { -set_timer(&v->arch.virt_timer.timer, ticks_to_ns(v->arch.virt_timer.cval + - v->domain->arch.virt_timer_base.offset - boot_count)); +set_timer(&v->arch.virt_timer.timer, + ticks_to_ns(v->domain->arch.virt_timer_base.offset - + boot_count + v->arch.virt_timer.cval)); } } -- 2.25.1
[linux-5.4 test] 171167: regressions - FAIL
flight 171167 linux-5.4 real [real] flight 171169 linux-5.4 real-retest [real] http://logs.test-lab.xenproject.org/osstest/logs/171167/ http://logs.test-lab.xenproject.org/osstest/logs/171169/ Regressions :-( Tests which did not succeed and are blocking, including tests which could not be run: test-armhf-armhf-libvirt-raw 17 guest-start/debian.repeat fail REGR. vs. 170895 Tests which are failing intermittently (not blocking): test-armhf-armhf-libvirt-raw 13 guest-start fail in 171169 pass in 171167 test-amd64-i386-xl-qemut-debianhvm-amd64 20 guest-start/debianhvm.repeat fail pass in 171169-retest Tests which did not succeed, but are not blocking: test-amd64-amd64-xl-qemuu-win7-amd64 19 guest-stopfail like 170895 test-amd64-amd64-xl-qemut-win7-amd64 19 guest-stopfail like 170895 test-amd64-i386-xl-qemut-win7-amd64 19 guest-stop fail like 170895 test-amd64-i386-xl-qemuu-win7-amd64 19 guest-stop fail like 170895 test-armhf-armhf-libvirt 16 saverestore-support-checkfail like 170895 test-amd64-amd64-qemuu-nested-amd 20 debian-hvm-install/l1/l2 fail like 170895 test-amd64-amd64-xl-qemuu-ws16-amd64 19 guest-stopfail like 170895 test-amd64-i386-xl-qemut-ws16-amd64 19 guest-stop fail like 170895 test-armhf-armhf-libvirt-raw 15 saverestore-support-checkfail like 170895 test-armhf-armhf-xl-credit2 14 guest-start fail like 170895 test-armhf-armhf-libvirt-qcow2 15 saverestore-support-check fail like 170895 test-amd64-amd64-xl-qemut-ws16-amd64 19 guest-stopfail like 170895 test-amd64-i386-xl-qemuu-ws16-amd64 19 guest-stop fail like 170895 test-amd64-i386-xl-pvshim14 guest-start fail never pass test-amd64-amd64-libvirt 15 migrate-support-checkfail never pass test-amd64-i386-libvirt-xsm 15 migrate-support-checkfail never pass test-arm64-arm64-xl-seattle 15 migrate-support-checkfail never pass test-arm64-arm64-xl-seattle 16 saverestore-support-checkfail never pass test-amd64-i386-libvirt 15 migrate-support-checkfail never pass test-arm64-arm64-xl-credit2 15 migrate-support-checkfail never pass test-arm64-arm64-xl-credit2 16 saverestore-support-checkfail never pass test-arm64-arm64-xl 15 migrate-support-checkfail never pass test-arm64-arm64-xl 16 saverestore-support-checkfail never pass test-arm64-arm64-xl-thunderx 15 migrate-support-checkfail never pass test-arm64-arm64-xl-xsm 15 migrate-support-checkfail never pass test-arm64-arm64-xl-thunderx 16 saverestore-support-checkfail never pass test-arm64-arm64-xl-xsm 16 saverestore-support-checkfail never pass test-arm64-arm64-xl-credit1 15 migrate-support-checkfail never pass test-arm64-arm64-xl-credit1 16 saverestore-support-checkfail never pass test-amd64-i386-libvirt-qemuu-debianhvm-amd64-xsm 13 migrate-support-check fail never pass test-armhf-armhf-xl-arndale 15 migrate-support-checkfail never pass test-armhf-armhf-xl-arndale 16 saverestore-support-checkfail never pass test-amd64-amd64-libvirt-vhd 14 migrate-support-checkfail never pass test-amd64-amd64-libvirt-xsm 15 migrate-support-checkfail never pass test-amd64-amd64-libvirt-qemuu-debianhvm-amd64-xsm 13 migrate-support-check fail never pass test-arm64-arm64-libvirt-xsm 15 migrate-support-checkfail never pass test-arm64-arm64-libvirt-xsm 16 saverestore-support-checkfail never pass test-amd64-i386-libvirt-raw 14 migrate-support-checkfail never pass test-arm64-arm64-libvirt-raw 14 migrate-support-checkfail never pass test-arm64-arm64-libvirt-raw 15 saverestore-support-checkfail never pass test-armhf-armhf-xl-credit1 15 migrate-support-checkfail never pass test-armhf-armhf-xl-credit1 16 saverestore-support-checkfail never pass test-armhf-armhf-xl-cubietruck 15 migrate-support-checkfail never pass test-armhf-armhf-xl-cubietruck 16 saverestore-support-checkfail never pass test-armhf-armhf-xl 15 migrate-support-checkfail never pass test-armhf-armhf-xl 16 saverestore-support-checkfail never pass test-armhf-armhf-libvirt 15 migrate-support-checkfail never pass test-arm64-arm64-xl-vhd 14 migrate-support-checkfail never pass test-arm64-arm64-xl-vhd 15 saverestore-support-checkfail never pass test-armhf-armhf-xl-rtds 15 migrate-support-checkfail never pass test-armhf-armhf-xl-rtds 16 saverestore-support-checkfail never pass test-armhf-armhf-xl-multivcpu 15 migrate-support-checkfail never pass test-armhf-armhf-xl-multivcpu 16 saverestore-support-checkfail never pass test-armhf-armhf-xl-vhd 14 migrate-support-checkfa
Re: [RFC PATCH 2/2] xen/grant-table: Use unpopulated DMAable pages instead of real RAM ones
On Tue, 14 Jun 2022, Oleksandr wrote: > On 11.06.22 02:55, Stefano Stabellini wrote: > > Hello Stefano > > > On Thu, 9 Jun 2022, Oleksandr wrote: > > > On 04.06.22 00:19, Stefano Stabellini wrote: > > > Hello Stefano > > > > > > Thank you for having a look and sorry for the late response. > > > > > > > On Tue, 17 May 2022, Oleksandr Tyshchenko wrote: > > > > > From: Oleksandr Tyshchenko > > > > > > > > > > Depends on CONFIG_XEN_UNPOPULATED_ALLOC. If enabled then unpopulated > > > > > DMAable (contiguous) pages will be allocated for grant mapping into > > > > > instead of ballooning out real RAM pages. > > > > > > > > > > TODO: Fallback to real RAM pages if xen_alloc_unpopulated_dma_pages() > > > > > fails. > > > > > > > > > > Signed-off-by: Oleksandr Tyshchenko > > > > > --- > > > > >drivers/xen/grant-table.c | 27 +++ > > > > >1 file changed, 27 insertions(+) > > > > > > > > > > diff --git a/drivers/xen/grant-table.c b/drivers/xen/grant-table.c > > > > > index 8ac..2bb4392 100644 > > > > > --- a/drivers/xen/grant-table.c > > > > > +++ b/drivers/xen/grant-table.c > > > > > @@ -864,6 +864,25 @@ EXPORT_SYMBOL_GPL(gnttab_free_pages); > > > > > */ > > > > >int gnttab_dma_alloc_pages(struct gnttab_dma_alloc_args *args) > > > > >{ > > > > > +#ifdef CONFIG_XEN_UNPOPULATED_ALLOC > > > > > + int ret; > > > > This is an alternative implementation of the same function. > > > Currently, yes. > > > > > > > > > >If we are > > > > going to use #ifdef, then I would #ifdef the entire function, rather > > > > than just the body. Otherwise within the function body we can use > > > > IS_ENABLED. > > > > > > Good point. Note, there is one missing thing in current patch which is > > > described in TODO. > > > > > > "Fallback to real RAM pages if xen_alloc_unpopulated_dma_pages() fails." > > > So I > > > will likely use IS_ENABLED within the function body. > > > > > > If CONFIG_XEN_UNPOPULATED_ALLOC is enabled then gnttab_dma_alloc_pages() > > > will > > > try to call xen_alloc_unpopulated_dma_pages() the first and if fails then > > > fallback to allocate RAM pages and balloon them out. > > > > > > One moment is not entirely clear to me. If we use fallback in > > > gnttab_dma_alloc_pages() then we must use fallback in > > > gnttab_dma_free_pages() > > > as well, we cannot use xen_free_unpopulated_dma_pages() for real RAM > > > pages. > > > The question is how to pass this information to the > > > gnttab_dma_free_pages()? > > > The first idea which comes to mind is to add a flag to struct > > > gnttab_dma_alloc_args... > > You can check if the page is within the mhp_range range or part of > > iomem_resource? If not, you can free it as a normal page. > > > > If we do this, then the fallback is better implemented in > > unpopulated-alloc.c because that is the one that is aware about > > page addresses. > > > I got your idea and agree this can work technically. Or if we finally decide > to use the second option (use "dma_pool" for all) in the first patch > "[RFC PATCH 1/2] xen/unpopulated-alloc: Introduce helpers for DMA allocations" > then we will likely be able to check whether a page in question > is within a "dma_pool" using gen_pool_has_addr(). > > I am still wondering, can we avoid the fallback implementation in > unpopulated-alloc.c. > Because for that purpose we will need to pull more code into > unpopulated-alloc.c (to be more precise, almost everything which > gnttab_dma_free_pages() already has except gnttab_pages_clear_private()) and > pass more arguments to xen_free_unpopulated_dma_pages(). Also I might mistake, > but having a fallback split between grant-table.c (to allocate RAM pages in > gnttab_dma_alloc_pages()) and unpopulated-alloc.c (to free RAM pages in > xen_free_unpopulated_dma_pages()) would look a bit weird. > > I see two possible options for the fallback implementation in grant-table.c: > 1. (less preferable) by introducing new flag in struct gnttab_dma_alloc_args > 2. (more preferable) by guessing unpopulated (non real RAM) page using > is_zone_device_page(), etc > > > For example, with the second option the resulting code will look quite simple > (only build tested): > > diff --git a/drivers/xen/grant-table.c b/drivers/xen/grant-table.c > index 738029d..3bda71f 100644 > --- a/drivers/xen/grant-table.c > +++ b/drivers/xen/grant-table.c > @@ -1047,6 +1047,23 @@ int gnttab_dma_alloc_pages(struct gnttab_dma_alloc_args > *args) > size_t size; > int i, ret; > > + if (IS_ENABLED(CONFIG_XEN_UNPOPULATED_ALLOC)) { > + ret = xen_alloc_unpopulated_dma_pages(args->dev, > args->nr_pages, > + args->pages); > + if (ret < 0) > + goto fallback; > + > + ret = gnttab_pages_set_private(args->nr_pages, args->pages); > + if (ret < 0) > + goto fail; > + > + args->vaddr = page_to_virt(
Re: [RFC PATCH 1/2] xen/unpopulated-alloc: Introduce helpers for DMA allocations
On Tue, 14 Jun 2022, Oleksandr wrote: > On 11.06.22 03:12, Stefano Stabellini wrote: > > On Wed, 8 Jun 2022, Oleksandr wrote: > > > 2. Drop the "page_list" entirely and use "dma_pool" for all (contiguous > > > and > > > non-contiguous) allocations. After all, all pages are initially contiguous > > > in > > > fill_list() as they are built from the resource. This changes behavior for > > > all > > > users of xen_alloc_unpopulated_pages() > > > > > > Below the diff for unpopulated-alloc.c. The patch is also available at: > > > > > > https://github.com/otyshchenko1/linux/commit/7be569f113a4acbdc4bcb9b20cb3995b3151387a > > > > > > > > > diff --git a/drivers/xen/unpopulated-alloc.c > > > b/drivers/xen/unpopulated-alloc.c > > > index a39f2d3..ab5c7bd 100644 > > > --- a/drivers/xen/unpopulated-alloc.c > > > +++ b/drivers/xen/unpopulated-alloc.c > > > @@ -1,5 +1,7 @@ > > > // SPDX-License-Identifier: GPL-2.0 > > > +#include > > > #include > > > +#include > > > #include > > > #include > > > #include > > > @@ -13,8 +15,8 @@ > > > #include > > > > > > static DEFINE_MUTEX(list_lock); > > > -static struct page *page_list; > > > -static unsigned int list_count; > > > + > > > +static struct gen_pool *dma_pool; > > > > > > static struct resource *target_resource; > > > > > > @@ -36,7 +38,7 @@ static int fill_list(unsigned int nr_pages) > > > struct dev_pagemap *pgmap; > > > struct resource *res, *tmp_res = NULL; > > > void *vaddr; > > > - unsigned int i, alloc_pages = round_up(nr_pages, > > > PAGES_PER_SECTION); > > > + unsigned int alloc_pages = round_up(nr_pages, PAGES_PER_SECTION); > > > struct range mhp_range; > > > int ret; > > > > > > @@ -106,6 +108,7 @@ static int fill_list(unsigned int nr_pages) > > > * conflict with any devices. > > > */ > > > if (!xen_feature(XENFEAT_auto_translated_physmap)) { > > > + unsigned int i; > > > xen_pfn_t pfn = PFN_DOWN(res->start); > > > > > > for (i = 0; i < alloc_pages; i++) { > > > @@ -125,16 +128,17 @@ static int fill_list(unsigned int nr_pages) > > > goto err_memremap; > > > } > > > > > > - for (i = 0; i < alloc_pages; i++) { > > > - struct page *pg = virt_to_page(vaddr + PAGE_SIZE * i); > > > - > > > - pg->zone_device_data = page_list; > > > - page_list = pg; > > > - list_count++; > > > + ret = gen_pool_add_virt(dma_pool, (unsigned long)vaddr, > > > res->start, > > > + alloc_pages * PAGE_SIZE, NUMA_NO_NODE); > > > + if (ret) { > > > + pr_err("Cannot add memory range to the pool\n"); > > > + goto err_pool; > > > } > > > > > > return 0; > > > > > > +err_pool: > > > + memunmap_pages(pgmap); > > > err_memremap: > > > kfree(pgmap); > > > err_pgmap: > > > @@ -149,51 +153,49 @@ static int fill_list(unsigned int nr_pages) > > > return ret; > > > } > > > > > > -/** > > > - * xen_alloc_unpopulated_pages - alloc unpopulated pages > > > - * @nr_pages: Number of pages > > > - * @pages: pages returned > > > - * @return 0 on success, error otherwise > > > - */ > > > -int xen_alloc_unpopulated_pages(unsigned int nr_pages, struct page > > > **pages) > > > +static int alloc_unpopulated_pages(unsigned int nr_pages, struct page > > > **pages, > > > + bool contiguous) > > > { > > > unsigned int i; > > > int ret = 0; > > > + void *vaddr; > > > + bool filled = false; > > > > > > /* > > > * Fallback to default behavior if we do not have any suitable > > > resource > > > * to allocate required region from and as the result we won't be > > > able > > > to > > > * construct pages. > > > */ > > > - if (!target_resource) > > > + if (!target_resource) { > > > + if (contiguous) > > > + return -ENODEV; > > > + > > > return xen_alloc_ballooned_pages(nr_pages, pages); > > > + } > > > > > > mutex_lock(&list_lock); > > > - if (list_count < nr_pages) { > > > - ret = fill_list(nr_pages - list_count); > > > + > > > + while (!(vaddr = (void *)gen_pool_alloc(dma_pool, nr_pages * > > > PAGE_SIZE))) { > > > + if (filled) > > > + ret = -ENOMEM; > > > + else { > > > + ret = fill_list(nr_pages); > > > + filled = true; > > > + } > > > if (ret) > > > goto out; > > > } > > > > > > for (i = 0; i < nr_pages; i++) { > > > - struct page *pg = page_list; > > > - > > > - BUG_ON(!pg); > > > - page_list = pg->zone_device_data; > > > - list_count--; >
Re: [PATCH] xen/arm: irq: Initialize the per-CPU IRQs while preparing the CPU
On Tue, 14 Jun 2022, Julien Grall wrote: > From: Julien Grall > > Commit 5047cd1d5dea "xen/common: Use enhanced ASSERT_ALLOC_CONTEXT in > xmalloc()" extended the checks in _xmalloc() to catch any use of the > helpers from context with interrupts disabled. > > Unfortunately, the rule is not followed when initializing the per-CPU > IRQs: > > (XEN) Xen call trace: > (XEN)[<002389f4>] _xmalloc+0xfc/0x314 (PC) > (XEN)[<>] (LR) > (XEN)[<0021a7c4>] init_one_irq_desc+0x48/0xd0 > (XEN)[<002807a8>] irq.c#init_local_irq_data+0x48/0xa4 > (XEN)[<00280834>] init_secondary_IRQ+0x10/0x2c > (XEN)[<00288fa4>] start_secondary+0x194/0x274 > (XEN)[<40010170>] 40010170 > (XEN) > (XEN) > (XEN) > (XEN) Panic on CPU 2: > (XEN) Assertion '!in_irq() && (local_irq_is_enabled() || num_online_cpus() <= > 1)' failed at common/xmalloc_tlsf.c:601 > (XEN) > > This is happening because zalloc_cpumask_var() may allocate memory > if NR_CPUS is > 2 * sizeof(unsigned long). > > Avoid the problem by allocate the per-CPU IRQs while preparing the > CPU. > > This also has the benefit to remove a BUG_ON() in the secondary CPU > code. > > Signed-off-by: Julien Grall > --- > xen/arch/arm/include/asm/irq.h | 1 - > xen/arch/arm/irq.c | 35 +++--- > xen/arch/arm/smpboot.c | 2 -- > 3 files changed, 28 insertions(+), 10 deletions(-) > > diff --git a/xen/arch/arm/include/asm/irq.h b/xen/arch/arm/include/asm/irq.h > index e45d57459899..245f49dcbac5 100644 > --- a/xen/arch/arm/include/asm/irq.h > +++ b/xen/arch/arm/include/asm/irq.h > @@ -73,7 +73,6 @@ static inline bool is_lpi(unsigned int irq) > bool is_assignable_irq(unsigned int irq); > > void init_IRQ(void); > -void init_secondary_IRQ(void); > > int route_irq_to_guest(struct domain *d, unsigned int virq, > unsigned int irq, const char *devname); > diff --git a/xen/arch/arm/irq.c b/xen/arch/arm/irq.c > index b761d90c4063..56bdcb95335d 100644 > --- a/xen/arch/arm/irq.c > +++ b/xen/arch/arm/irq.c > @@ -17,6 +17,7 @@ > * GNU General Public License for more details. > */ > > +#include > #include > #include > #include > @@ -100,7 +101,7 @@ static int __init init_irq_data(void) > return 0; > } > > -static int init_local_irq_data(void) > +static int init_local_irq_data(unsigned int cpu) > { > int irq; > > @@ -108,7 +109,7 @@ static int init_local_irq_data(void) > > for ( irq = 0; irq < NR_LOCAL_IRQS; irq++ ) > { > -struct irq_desc *desc = irq_to_desc(irq); > +struct irq_desc *desc = &per_cpu(local_irq_desc, cpu)[irq]; > int rc = init_one_irq_desc(desc); > > if ( rc ) > @@ -131,6 +132,29 @@ static int init_local_irq_data(void) > return 0; > } > > +static int cpu_callback(struct notifier_block *nfb, unsigned long action, > +void *hcpu) > +{ > +unsigned long cpu = (unsigned long)hcpu; unsigned int cpu ? The rest looks good > +int rc = 0; > + > +switch ( action ) > +{ > +case CPU_UP_PREPARE: > +rc = init_local_irq_data(cpu); > +if ( rc ) > +printk(XENLOG_ERR "Unable to allocate local IRQ for CPU%lu\n", > + cpu); > +break; > +} > + > +return !rc ? NOTIFY_DONE : notifier_from_errno(rc); > +} > + > +static struct notifier_block cpu_nfb = { > +.notifier_call = cpu_callback, > +}; > + > void __init init_IRQ(void) > { > int irq; > @@ -140,13 +164,10 @@ void __init init_IRQ(void) > local_irqs_type[irq] = IRQ_TYPE_INVALID; > spin_unlock(&local_irqs_type_lock); > > -BUG_ON(init_local_irq_data() < 0); > +BUG_ON(init_local_irq_data(smp_processor_id()) < 0); > BUG_ON(init_irq_data() < 0); > -} > > -void init_secondary_IRQ(void) > -{ > -BUG_ON(init_local_irq_data() < 0); > +register_cpu_notifier(&cpu_nfb); > } > > static inline struct irq_guest *irq_get_guest_info(struct irq_desc *desc) > diff --git a/xen/arch/arm/smpboot.c b/xen/arch/arm/smpboot.c > index 9bb32a301a70..4888bcd78a5a 100644 > --- a/xen/arch/arm/smpboot.c > +++ b/xen/arch/arm/smpboot.c > @@ -359,8 +359,6 @@ void start_secondary(void) > > gic_init_secondary_cpu(); > > -init_secondary_IRQ(); > - > set_current(idle_vcpu[cpuid]); > > setup_cpu_sibling_map(cpuid); > -- > 2.32.0 >
Re: [PATCH] xen/arm: smpboot: Allocate the CPU sibling/core maps while preparing the CPU
On Tue, 14 Jun 2022, Michal Orzel wrote: > On 14.06.2022 13:08, Julien Grall wrote: > >>> + unsigned int rc = 0; > >> ... here you are setting rc to 0 even though it will be reassigned. > >> Furthermore, if rc is used only in case of CPU_UP_PREPARE, why not moving > >> the definition there? > > > > Because I forgot to replace "return NOTIFY_DONE;" with : > > > > return !rc ? NOTIFY_DONE : notifier_from_errno(rc); > That is what I thought. > With these fixes you can add my Rb. And also my Acked-by: Stefano Stabellini
[qemu-mainline test] 171165: tolerable FAIL - PUSHED
flight 171165 qemu-mainline real [real] http://logs.test-lab.xenproject.org/osstest/logs/171165/ Failures :-/ but no regressions. Tests which did not succeed, but are not blocking: test-amd64-amd64-xl-qemuu-win7-amd64 19 guest-stopfail like 171149 test-armhf-armhf-libvirt 16 saverestore-support-checkfail like 171149 test-amd64-amd64-qemuu-nested-amd 20 debian-hvm-install/l1/l2 fail like 171149 test-amd64-i386-xl-qemuu-win7-amd64 19 guest-stop fail like 171149 test-armhf-armhf-libvirt-raw 15 saverestore-support-checkfail like 171149 test-armhf-armhf-libvirt-qcow2 15 saverestore-support-check fail like 171149 test-amd64-i386-xl-qemuu-ws16-amd64 19 guest-stop fail like 171149 test-amd64-amd64-xl-qemuu-ws16-amd64 19 guest-stopfail like 171149 test-arm64-arm64-xl-seattle 15 migrate-support-checkfail never pass test-arm64-arm64-xl-seattle 16 saverestore-support-checkfail never pass test-amd64-i386-xl-pvshim14 guest-start fail never pass test-amd64-i386-libvirt-xsm 15 migrate-support-checkfail never pass test-amd64-i386-libvirt 15 migrate-support-checkfail never pass test-arm64-arm64-xl-thunderx 15 migrate-support-checkfail never pass test-arm64-arm64-xl-thunderx 16 saverestore-support-checkfail never pass test-arm64-arm64-xl-xsm 15 migrate-support-checkfail never pass test-arm64-arm64-xl-xsm 16 saverestore-support-checkfail never pass test-arm64-arm64-libvirt-xsm 15 migrate-support-checkfail never pass test-arm64-arm64-libvirt-xsm 16 saverestore-support-checkfail never pass test-arm64-arm64-xl-credit1 15 migrate-support-checkfail never pass test-arm64-arm64-xl-credit1 16 saverestore-support-checkfail never pass test-arm64-arm64-xl 15 migrate-support-checkfail never pass test-arm64-arm64-xl 16 saverestore-support-checkfail never pass test-amd64-i386-libvirt-qemuu-debianhvm-amd64-xsm 13 migrate-support-check fail never pass test-amd64-amd64-libvirt-xsm 15 migrate-support-checkfail never pass test-arm64-arm64-xl-credit2 15 migrate-support-checkfail never pass test-arm64-arm64-xl-credit2 16 saverestore-support-checkfail never pass test-amd64-i386-libvirt-raw 14 migrate-support-checkfail never pass test-amd64-amd64-libvirt-vhd 14 migrate-support-checkfail never pass test-arm64-arm64-libvirt-raw 14 migrate-support-checkfail never pass test-arm64-arm64-libvirt-raw 15 saverestore-support-checkfail never pass test-armhf-armhf-xl-credit1 15 migrate-support-checkfail never pass test-armhf-armhf-xl-credit1 16 saverestore-support-checkfail never pass test-arm64-arm64-xl-vhd 14 migrate-support-checkfail never pass test-arm64-arm64-xl-vhd 15 saverestore-support-checkfail never pass test-armhf-armhf-xl-credit2 15 migrate-support-checkfail never pass test-armhf-armhf-xl-credit2 16 saverestore-support-checkfail never pass test-armhf-armhf-xl-multivcpu 15 migrate-support-checkfail never pass test-armhf-armhf-xl-multivcpu 16 saverestore-support-checkfail never pass test-armhf-armhf-xl-rtds 15 migrate-support-checkfail never pass test-armhf-armhf-xl-rtds 16 saverestore-support-checkfail never pass test-armhf-armhf-xl-cubietruck 15 migrate-support-checkfail never pass test-armhf-armhf-xl-cubietruck 16 saverestore-support-checkfail never pass test-armhf-armhf-libvirt 15 migrate-support-checkfail never pass test-amd64-amd64-libvirt 15 migrate-support-checkfail never pass test-amd64-amd64-libvirt-qemuu-debianhvm-amd64-xsm 13 migrate-support-check fail never pass test-armhf-armhf-libvirt-raw 14 migrate-support-checkfail never pass test-armhf-armhf-xl-vhd 14 migrate-support-checkfail never pass test-armhf-armhf-xl-vhd 15 saverestore-support-checkfail never pass test-armhf-armhf-xl-arndale 15 migrate-support-checkfail never pass test-armhf-armhf-xl-arndale 16 saverestore-support-checkfail never pass test-armhf-armhf-xl 15 migrate-support-checkfail never pass test-armhf-armhf-xl 16 saverestore-support-checkfail never pass test-armhf-armhf-libvirt-qcow2 14 migrate-support-checkfail never pass version targeted for testing: qemuudebd0753663bc89c86f5462a53268f2e3f680f60 baseline version: qemuudcb40541ebca7ec98a14d461593b3cd7282b4fac Last test of basis 171149 2022-06-13 00:38:22 Z1 days Testing same since 171160 2022-06-14 06:39:46 Z0 days2 attempts People who touched revisions under test: Alex Bennée Daniel P. Berrangé Paolo Bonzini
[linux-linus test] 171161: regressions - FAIL
flight 171161 linux-linus real [real] http://logs.test-lab.xenproject.org/osstest/logs/171161/ Regressions :-( Tests which did not succeed and are blocking, including tests which could not be run: test-arm64-arm64-examine 8 reboot fail REGR. vs. 170714 test-amd64-amd64-xl-qemuu-debianhvm-amd64 8 xen-bootfail REGR. vs. 170714 test-amd64-amd64-xl-qemut-stubdom-debianhvm-amd64-xsm 8 xen-boot fail REGR. vs. 170714 test-amd64-amd64-libvirt 8 xen-boot fail REGR. vs. 170714 test-amd64-amd64-xl-credit2 8 xen-boot fail REGR. vs. 170714 test-amd64-amd64-xl-qemuu-debianhvm-amd64-shadow 8 xen-boot fail REGR. vs. 170714 test-amd64-amd64-xl-multivcpu 8 xen-bootfail REGR. vs. 170714 test-amd64-amd64-libvirt-pair 12 xen-boot/src_host fail REGR. vs. 170714 test-amd64-amd64-libvirt-pair 13 xen-boot/dst_host fail REGR. vs. 170714 test-amd64-amd64-xl-pvhv2-intel 8 xen-boot fail REGR. vs. 170714 test-amd64-amd64-freebsd12-amd64 8 xen-boot fail REGR. vs. 170714 test-amd64-amd64-qemuu-nested-intel 8 xen-boot fail REGR. vs. 170714 test-arm64-arm64-xl-seattle 8 xen-boot fail REGR. vs. 170714 test-arm64-arm64-xl-credit2 8 xen-boot fail REGR. vs. 170714 test-amd64-amd64-libvirt-qcow2 8 xen-boot fail REGR. vs. 170714 test-amd64-amd64-libvirt-raw 8 xen-boot fail REGR. vs. 170714 test-arm64-arm64-xl-credit1 8 xen-boot fail REGR. vs. 170714 test-arm64-arm64-xl-vhd 8 xen-boot fail REGR. vs. 170714 test-amd64-amd64-examine 8 reboot fail REGR. vs. 170714 test-amd64-amd64-examine-bios 8 reboot fail REGR. vs. 170714 test-arm64-arm64-libvirt-raw 8 xen-boot fail REGR. vs. 170714 test-amd64-amd64-examine-uefi 8 reboot fail REGR. vs. 170714 test-amd64-amd64-xl-pvshim8 xen-boot fail REGR. vs. 170714 test-arm64-arm64-xl-xsm 8 xen-boot fail REGR. vs. 170714 test-arm64-arm64-xl 8 xen-boot fail REGR. vs. 170714 test-arm64-arm64-libvirt-xsm 8 xen-boot fail REGR. vs. 170714 Tests which are failing intermittently (not blocking): test-amd64-amd64-xl 8 xen-boot fail pass in 171154 test-amd64-amd64-xl-vhd 8 xen-boot fail pass in 171154 test-amd64-amd64-xl-qemuu-ovmf-amd64 8 xen-boot fail pass in 171154 test-amd64-amd64-xl-qemuu-debianhvm-i386-xsm 8 xen-boot fail pass in 171155 test-amd64-amd64-pair12 xen-boot/src_host fail pass in 171155 test-amd64-amd64-pair13 xen-boot/dst_host fail pass in 171155 test-amd64-amd64-xl-credit1 8 xen-boot fail pass in 171155 test-amd64-amd64-xl-xsm 8 xen-boot fail pass in 171155 test-amd64-amd64-xl-rtds 20 guest-localmigrate/x10 fail pass in 171156 Tests which did not succeed, but are not blocking: test-amd64-amd64-xl-qemut-win7-amd64 19 guest-stopfail like 170714 test-amd64-amd64-qemuu-nested-amd 20 debian-hvm-install/l1/l2 fail like 170714 test-amd64-amd64-xl-qemuu-ws16-amd64 19 guest-stopfail like 170714 test-amd64-amd64-xl-qemut-ws16-amd64 19 guest-stopfail like 170714 test-amd64-amd64-xl-qemuu-win7-amd64 19 guest-stopfail like 170714 test-armhf-armhf-libvirt-qcow2 15 saverestore-support-check fail like 170714 test-armhf-armhf-libvirt 16 saverestore-support-checkfail like 170714 test-armhf-armhf-libvirt-raw 15 saverestore-support-checkfail like 170714 test-amd64-amd64-libvirt-xsm 15 migrate-support-checkfail never pass test-arm64-arm64-xl-thunderx 15 migrate-support-checkfail never pass test-arm64-arm64-xl-thunderx 16 saverestore-support-checkfail never pass test-amd64-amd64-libvirt-qemuu-debianhvm-amd64-xsm 13 migrate-support-check fail never pass test-armhf-armhf-xl 15 migrate-support-checkfail never pass test-armhf-armhf-xl 16 saverestore-support-checkfail never pass test-armhf-armhf-xl-rtds 15 migrate-support-checkfail never pass test-armhf-armhf-xl-rtds 16 saverestore-support-checkfail never pass test-armhf-armhf-xl-credit2 15 migrate-support-checkfail never pass test-armhf-armhf-xl-credit2 16 saverestore-support-checkfail never pass test-armhf-armhf-xl-cubietruck 15 migrate-support-checkfail never pass test-armhf-armhf-xl-cubietruck 16 saverestore-support-checkfail never pass test-armhf-armhf-xl-multivcpu 15 migrate-support-checkfail never pass test-armhf-armhf-xl-multivcpu 16 saverestore-support-checkfail never pass test-armhf-armhf-libvirt-qcow2 14 migrate-support-checkfail never pass test-armhf-armhf-xl-cred
Re: [PATCH v2 2/2] xen/arm: add FF-A mediator
Hello Jens, Sorry for late review, I was busy with internal projects. This is preliminary review. I gave up at scatter-gather operations. Need more time to review them properly. One thing that bothers me is that Xen is non-preemptive and there are plenty potentially long-running operations. Jens Wiklander writes: > Adds a FF-A version 1.1 [1] mediator to communicate with a Secure > Partition in secure world. > > The implementation is the bare minimum to be able to communicate with > OP-TEE running as an SPMC at S-EL1. > > This is loosely based on the TEE mediator framework and the OP-TEE > mediator. > > [1] > https://urldefense.com/v3/__https://developer.arm.com/documentation/den0077/latest__;!!GF_29dbcQIUBPA!1rn9xKdmcgMXOyZ_CvNIVq-wAS1ZI_Ews1w-Gqt0YPwSXyyTJedeFQgD65WhhOwIf_-cIa4EINzmwM4o62XPcMt1cTLcMZ7d$ > [developer[.]arm[.]com] > Signed-off-by: Jens Wiklander > --- > xen/arch/arm/Kconfig | 11 + > xen/arch/arm/Makefile |1 + > xen/arch/arm/domain.c | 10 + > xen/arch/arm/ffa.c| 1624 + > xen/arch/arm/include/asm/domain.h |4 + > xen/arch/arm/include/asm/ffa.h| 71 ++ > xen/arch/arm/vsmc.c | 17 +- > 7 files changed, 1735 insertions(+), 3 deletions(-) > create mode 100644 xen/arch/arm/ffa.c > create mode 100644 xen/arch/arm/include/asm/ffa.h > > diff --git a/xen/arch/arm/Kconfig b/xen/arch/arm/Kconfig > index ecfa6822e4d3..5b75067e2745 100644 > --- a/xen/arch/arm/Kconfig > +++ b/xen/arch/arm/Kconfig > @@ -106,6 +106,17 @@ config TEE > > source "arch/arm/tee/Kconfig" > > +config FFA > + bool "Enable FF-A mediator support" if EXPERT > + default n > + depends on ARM_64 > + help > + This option enables a minamal FF-A mediator. The mediator is > + generic as it follows the FF-A specification [1], but it only > + implements a small substet of the specification. > + > + [1] > https://urldefense.com/v3/__https://developer.arm.com/documentation/den0077/latest__;!!GF_29dbcQIUBPA!1rn9xKdmcgMXOyZ_CvNIVq-wAS1ZI_Ews1w-Gqt0YPwSXyyTJedeFQgD65WhhOwIf_-cIa4EINzmwM4o62XPcMt1cTLcMZ7d$ > [developer[.]arm[.]com] > + > endmenu > > menu "ARM errata workaround via the alternative framework" > diff --git a/xen/arch/arm/Makefile b/xen/arch/arm/Makefile > index 1d862351d111..dbf5e593a069 100644 > --- a/xen/arch/arm/Makefile > +++ b/xen/arch/arm/Makefile > @@ -20,6 +20,7 @@ obj-y += domain.o > obj-y += domain_build.init.o > obj-y += domctl.o > obj-$(CONFIG_EARLY_PRINTK) += early_printk.o > +obj-$(CONFIG_FFA) += ffa.o > obj-y += gic.o > obj-y += gic-v2.o > obj-$(CONFIG_GICV3) += gic-v3.o > diff --git a/xen/arch/arm/domain.c b/xen/arch/arm/domain.c > index 8110c1df8638..a93e6a9c4aef 100644 > --- a/xen/arch/arm/domain.c > +++ b/xen/arch/arm/domain.c > @@ -27,6 +27,7 @@ > #include > #include > #include > +#include > #include > #include > #include > @@ -756,6 +757,9 @@ int arch_domain_create(struct domain *d, > if ( (rc = tee_domain_init(d, config->arch.tee_type)) != 0 ) > goto fail; > > +if ( (rc = ffa_domain_init(d)) != 0 ) So, FFA support will be enabled for each domain? I think that this is fine for experimental feature, but I want to hear maintainer's opinion. > +goto fail; > + > update_domain_wallclock_time(d); > > /* > @@ -998,6 +1002,7 @@ static int relinquish_memory(struct domain *d, struct > page_list_head *list) > enum { > PROG_pci = 1, > PROG_tee, > +PROG_ffa, > PROG_xen, > PROG_page, > PROG_mapping, > @@ -1046,6 +1051,11 @@ int domain_relinquish_resources(struct domain *d) > if (ret ) > return ret; > > +PROGRESS(ffa): > +ret = ffa_relinquish_resources(d); > +if (ret ) Coding style: if ( ret ) > +return ret; > + > PROGRESS(xen): > ret = relinquish_memory(d, &d->xenpage_list); > if ( ret ) > diff --git a/xen/arch/arm/ffa.c b/xen/arch/arm/ffa.c > new file mode 100644 > index ..9063b7f2b59e > --- /dev/null > +++ b/xen/arch/arm/ffa.c > @@ -0,0 +1,1624 @@ > +/* > + * xen/arch/arm/ffa.c > + * > + * Arm Firmware Framework for ARMv8-A(FFA) mediator > + * > + * Copyright (C) 2021 Linaro Limited It is 2022 already :) > + * > + * Permission is hereby granted, free of charge, to any person > + * obtaining a copy of this software and associated documentation > + * files (the "Software"), to deal in the Software without restriction, > + * including without limitation the rights to use, copy, modify, merge, > + * publish, distribute, sublicense, and/or sell copies of the Software, > + * and to permit persons to whom the Software is furnished to do so, > + * subject to the following conditions: > + * > + * The above copyright notice and this permission notice shall be > + * included in all copies or substantial portions of the Software. > + * > + * THE SOFTWARE IS PROVIDED "AS IS"
Xen Security Advisory 404 v1 (CVE-2022-21123,CVE-2022-21124,CVE-2022-21166) - x86: MMIO Stale Data vulnerabilities
-BEGIN PGP SIGNED MESSAGE- Hash: SHA256 Xen Security Advisory CVE-2022-21123,CVE-2022-21124,CVE-2022-21166 / XSA-404 x86: MMIO Stale Data vulnerabilities ISSUE DESCRIPTION = This issue is related to the SRBDS, TAA and MDS vulnerabilities. Please see: https://xenbits.xen.org/xsa/advisory-320.html (SRBDS) https://xenbits.xen.org/xsa/advisory-305.html (TAA) https://xenbits.xen.org/xsa/advisory-297.html (MDS) Please see Intel's whitepaper: https://www.intel.com/content/www/us/en/developer/articles/technical/software-security-guidance/technical-documentation/processor-mmio-stale-data-vulnerabilities.html IMPACT == An attacker might be able to directly read or infer data from other security contexts in the system. This can include data belonging to other VMs, or to Xen itself. The degree to which an attacker can obtain data depends on the CPU, and the system configuration. VULNERABLE SYSTEMS == Systems running all versions of Xen are affected. Only x86 processors are vulnerable. Processors from other manufacturers (e.g. ARM) are not believed to be vulnerable. Only Intel based processors are affected. Processors from other x86 manufacturers (e.g. AMD) are not believed to be vulnerable. Please consult the Intel Security Advisory for details on the affected processors and configurations. Per Xen's support statement, PCI passthrough should be to trusted domains because the overall system security depends on factors outside of Xen's control. As such, Xen, in a supported configuration, is not vulnerable to DRPW/SBDR. MITIGATION == All mitigations depend on functionality added in the IPU 2022.1 (May 2022) microcode release from Intel. Consult your dom0 OS vendor. To the best of the security team's understanding, the summary is as follows: Server CPUs (Xeon EP/EX, Scalable, and some Atom servers), excluding Xeon E3 (which use the client CPU design), are potentially vulnerable to DRPW (CVE-2022-21166). Client CPUs (inc Xeon E3) are, furthermore, potentially vulnerable to SBDR (CVE-2022-21123) and SBDS (CVE-2022-21125). SBDS only affects CPUs vulnerable to MDS. On these CPUs, there are previously undiscovered leakage channels. There is no change to the existing MDS mitigations. DRPW and SBDR only affects configurations where less privileged domains have MMIO mappings of buggy endpoints. Consult your hardware vendor. In configurations where less privileged domains have MMIO access to buggy endpoints, `spec-ctrl=unpriv-mmio` can be enabled which will cause Xen to mitigate cross-domain fill buffer leakage, and extend SRBDS protections to protect RNG data from leakage. RESOLUTION == Applying the appropriate attached patch resolves this issue. Note that patches for released versions are generally prepared to apply to the stable branches, and may not apply cleanly to the most recent release tarball. Downstreams are encouraged to update to the tip of the stable branch before applying these patches. The patches are still under review. An update will be sent once they are reviewed and the backports are done. xsa404/xsa404-?.patch xen-unstable $ sha256sum xsa404*/* 18b307c2cbbd08d568e9dcb2447901d94e22ff1e3945c3436173aa693f6456fb xsa404/xsa404-1.patch d6f193ad963396285e983aa1c18539f67222582711fc62105c21b71b3b53a97d xsa404/xsa404-2.patch d2c123ccdf5eb9f862d6e9cb0e59045ae18799a07db149c7d90e301ca20436aa xsa404/xsa404-3.patch $ NOTE CONCERNING CVE-2022-21127 / Update to SRBDS An issue was discovered with the SRBDS microcode mitigation. A microcode update was released as part of Intel's IPU 2022.1 in May 2022. Updating microcode is sufficient to fix the issue, with no extra actions required on Xen's behalf. Consult your dom0 OS vendor or OEM for updated microcode. NOTE CONCERNING CVE-2022-21180 / Undefined MMIO Hang A related issue was discovered. See: https://www.intel.com/content/www/us/en/developer/articles/technical/software-security-guidance/advisory-guidance/undefined-mmio-hang.html Xen is not vulnerable to UMH in supported configurations. The only mitigation to is avoid passing impacted devices through to untrusted guests. -BEGIN PGP SIGNATURE- iQFABAEBCAAqFiEEI+MiLBRfRHX6gGCng/4UyVfoK9kFAmKo0Z0MHHBncEB4ZW4u b3JnAAoJEIP+FMlX6CvZc8cH/RFgxQ4L8OewWMxsuowpgLg8NVyYGFMBgttscBh+ ANpjRTnV4yQGpt9nNFDAcXT1c/fvWhypOiwadEtczRl5k/Q96JOKFdiAc1QR35Oj vmbCLgO20jQ/GdTzaqKUaGBwi8GLShJvH1zMPJ2KuXk5w5uFDhj2gEiB6Kdv9+9O 4FBxQkpDzll0gs5v16ien8btKhEuZj9lNtzXZw5j4+DJD69MvQqsRPVdEt+M17Ox XGYcpfpLeGUaIUPFTPZDcFIJnMvqPBQyt+2eaeR2ezW2ouNpxepCSPsEDlAmSZ/K uZA0ShyJD3pfCxjc8eztyF/4zajY5EvuEtWdUZC/3zVaUec= =4EdA -END PGP SIGNATURE- xsa404/xsa404-1.patch Description: Binary data xsa404/xsa404-2.patch Description: Binary data xsa404/xsa404-3.patch Description: Binary data
Re: [RFC PATCH 2/2] xen/grant-table: Use unpopulated DMAable pages instead of real RAM ones
On 11.06.22 02:55, Stefano Stabellini wrote: Hello Stefano On Thu, 9 Jun 2022, Oleksandr wrote: On 04.06.22 00:19, Stefano Stabellini wrote: Hello Stefano Thank you for having a look and sorry for the late response. On Tue, 17 May 2022, Oleksandr Tyshchenko wrote: From: Oleksandr Tyshchenko Depends on CONFIG_XEN_UNPOPULATED_ALLOC. If enabled then unpopulated DMAable (contiguous) pages will be allocated for grant mapping into instead of ballooning out real RAM pages. TODO: Fallback to real RAM pages if xen_alloc_unpopulated_dma_pages() fails. Signed-off-by: Oleksandr Tyshchenko --- drivers/xen/grant-table.c | 27 +++ 1 file changed, 27 insertions(+) diff --git a/drivers/xen/grant-table.c b/drivers/xen/grant-table.c index 8ac..2bb4392 100644 --- a/drivers/xen/grant-table.c +++ b/drivers/xen/grant-table.c @@ -864,6 +864,25 @@ EXPORT_SYMBOL_GPL(gnttab_free_pages); */ int gnttab_dma_alloc_pages(struct gnttab_dma_alloc_args *args) { +#ifdef CONFIG_XEN_UNPOPULATED_ALLOC + int ret; This is an alternative implementation of the same function. Currently, yes. If we are going to use #ifdef, then I would #ifdef the entire function, rather than just the body. Otherwise within the function body we can use IS_ENABLED. Good point. Note, there is one missing thing in current patch which is described in TODO. "Fallback to real RAM pages if xen_alloc_unpopulated_dma_pages() fails." So I will likely use IS_ENABLED within the function body. If CONFIG_XEN_UNPOPULATED_ALLOC is enabled then gnttab_dma_alloc_pages() will try to call xen_alloc_unpopulated_dma_pages() the first and if fails then fallback to allocate RAM pages and balloon them out. One moment is not entirely clear to me. If we use fallback in gnttab_dma_alloc_pages() then we must use fallback in gnttab_dma_free_pages() as well, we cannot use xen_free_unpopulated_dma_pages() for real RAM pages. The question is how to pass this information to the gnttab_dma_free_pages()? The first idea which comes to mind is to add a flag to struct gnttab_dma_alloc_args... You can check if the page is within the mhp_range range or part of iomem_resource? If not, you can free it as a normal page. If we do this, then the fallback is better implemented in unpopulated-alloc.c because that is the one that is aware about page addresses. I got your idea and agree this can work technically. Or if we finally decide to use the second option (use "dma_pool" for all) in the first patch "[RFC PATCH 1/2] xen/unpopulated-alloc: Introduce helpers for DMA allocations" then we will likely be able to check whether a page in question is within a "dma_pool" using gen_pool_has_addr(). I am still wondering, can we avoid the fallback implementation in unpopulated-alloc.c. Because for that purpose we will need to pull more code into unpopulated-alloc.c (to be more precise, almost everything which gnttab_dma_free_pages() already has except gnttab_pages_clear_private()) and pass more arguments to xen_free_unpopulated_dma_pages(). Also I might mistake, but having a fallback split between grant-table.c (to allocate RAM pages in gnttab_dma_alloc_pages()) and unpopulated-alloc.c (to free RAM pages in xen_free_unpopulated_dma_pages()) would look a bit weird. I see two possible options for the fallback implementation in grant-table.c: 1. (less preferable) by introducing new flag in struct gnttab_dma_alloc_args 2. (more preferable) by guessing unpopulated (non real RAM) page using is_zone_device_page(), etc For example, with the second option the resulting code will look quite simple (only build tested): diff --git a/drivers/xen/grant-table.c b/drivers/xen/grant-table.c index 738029d..3bda71f 100644 --- a/drivers/xen/grant-table.c +++ b/drivers/xen/grant-table.c @@ -1047,6 +1047,23 @@ int gnttab_dma_alloc_pages(struct gnttab_dma_alloc_args *args) size_t size; int i, ret; + if (IS_ENABLED(CONFIG_XEN_UNPOPULATED_ALLOC)) { + ret = xen_alloc_unpopulated_dma_pages(args->dev, args->nr_pages, + args->pages); + if (ret < 0) + goto fallback; + + ret = gnttab_pages_set_private(args->nr_pages, args->pages); + if (ret < 0) + goto fail; + + args->vaddr = page_to_virt(args->pages[0]); + args->dev_bus_addr = page_to_phys(args->pages[0]); + + return ret; + } + +fallback: size = args->nr_pages << PAGE_SHIFT; if (args->coherent) args->vaddr = dma_alloc_coherent(args->dev, size, @@ -1103,6 +1120,12 @@ int gnttab_dma_free_pages(struct gnttab_dma_alloc_args *args) gnttab_pages_clear_private(args->nr_pages, args->pages); + if (IS_ENABLED(CONFIG_XEN_UNPOPULATED_ALLOC) && + is_zone_device_page(args->pages[0])) { + xen_free_unpopulated_dma_pages(args->
Re: [RFC PATCH 1/2] xen/unpopulated-alloc: Introduce helpers for DMA allocations
On 11.06.22 03:12, Stefano Stabellini wrote: Hello Stefano On Wed, 8 Jun 2022, Oleksandr wrote: 2. Drop the "page_list" entirely and use "dma_pool" for all (contiguous and non-contiguous) allocations. After all, all pages are initially contiguous in fill_list() as they are built from the resource. This changes behavior for all users of xen_alloc_unpopulated_pages() Below the diff for unpopulated-alloc.c. The patch is also available at: https://github.com/otyshchenko1/linux/commit/7be569f113a4acbdc4bcb9b20cb3995b3151387a diff --git a/drivers/xen/unpopulated-alloc.c b/drivers/xen/unpopulated-alloc.c index a39f2d3..ab5c7bd 100644 --- a/drivers/xen/unpopulated-alloc.c +++ b/drivers/xen/unpopulated-alloc.c @@ -1,5 +1,7 @@ // SPDX-License-Identifier: GPL-2.0 +#include #include +#include #include #include #include @@ -13,8 +15,8 @@ #include static DEFINE_MUTEX(list_lock); -static struct page *page_list; -static unsigned int list_count; + +static struct gen_pool *dma_pool; static struct resource *target_resource; @@ -36,7 +38,7 @@ static int fill_list(unsigned int nr_pages) struct dev_pagemap *pgmap; struct resource *res, *tmp_res = NULL; void *vaddr; - unsigned int i, alloc_pages = round_up(nr_pages, PAGES_PER_SECTION); + unsigned int alloc_pages = round_up(nr_pages, PAGES_PER_SECTION); struct range mhp_range; int ret; @@ -106,6 +108,7 @@ static int fill_list(unsigned int nr_pages) * conflict with any devices. */ if (!xen_feature(XENFEAT_auto_translated_physmap)) { + unsigned int i; xen_pfn_t pfn = PFN_DOWN(res->start); for (i = 0; i < alloc_pages; i++) { @@ -125,16 +128,17 @@ static int fill_list(unsigned int nr_pages) goto err_memremap; } - for (i = 0; i < alloc_pages; i++) { - struct page *pg = virt_to_page(vaddr + PAGE_SIZE * i); - - pg->zone_device_data = page_list; - page_list = pg; - list_count++; + ret = gen_pool_add_virt(dma_pool, (unsigned long)vaddr, res->start, + alloc_pages * PAGE_SIZE, NUMA_NO_NODE); + if (ret) { + pr_err("Cannot add memory range to the pool\n"); + goto err_pool; } return 0; +err_pool: + memunmap_pages(pgmap); err_memremap: kfree(pgmap); err_pgmap: @@ -149,51 +153,49 @@ static int fill_list(unsigned int nr_pages) return ret; } -/** - * xen_alloc_unpopulated_pages - alloc unpopulated pages - * @nr_pages: Number of pages - * @pages: pages returned - * @return 0 on success, error otherwise - */ -int xen_alloc_unpopulated_pages(unsigned int nr_pages, struct page **pages) +static int alloc_unpopulated_pages(unsigned int nr_pages, struct page **pages, + bool contiguous) { unsigned int i; int ret = 0; + void *vaddr; + bool filled = false; /* * Fallback to default behavior if we do not have any suitable resource * to allocate required region from and as the result we won't be able to * construct pages. */ - if (!target_resource) + if (!target_resource) { + if (contiguous) + return -ENODEV; + return xen_alloc_ballooned_pages(nr_pages, pages); + } mutex_lock(&list_lock); - if (list_count < nr_pages) { - ret = fill_list(nr_pages - list_count); + + while (!(vaddr = (void *)gen_pool_alloc(dma_pool, nr_pages * PAGE_SIZE))) { + if (filled) + ret = -ENOMEM; + else { + ret = fill_list(nr_pages); + filled = true; + } if (ret) goto out; } for (i = 0; i < nr_pages; i++) { - struct page *pg = page_list; - - BUG_ON(!pg); - page_list = pg->zone_device_data; - list_count--; - pages[i] = pg; + pages[i] = virt_to_page(vaddr + PAGE_SIZE * i); #ifdef CONFIG_XEN_HAVE_PVMMU if (!xen_feature(XENFEAT_auto_translated_physmap)) { - ret = xen_alloc_p2m_entry(page_to_pfn(pg)); + ret = xen_alloc_p2m_entry(page_to_pfn(pages[i])); if (ret < 0) { - unsigned int j; - - for (j = 0; j <= i; j++) { - pages[j]->zone_device_data = page_list; - page_list = pages[j]; - list_count++; - } + /* XXX Do we need to zeroed pages[i]? */ + gen_pool_free(dma_pool, (unsigned long)vaddr, +
[XEN PATCH v2 1/4] build,include: rework shell script for headers++.chk
The command line generated for headers++.chk by make is quite long, and in some environment it is too long. This issue have been seen in Yocto build environment. Error messages: make[9]: execvp: /bin/sh: Argument list too long make[9]: *** [include/Makefile:181: include/headers++.chk] Error 127 Rework so that we do the foreach loop in shell rather that make to reduce the command line size by a lot. We also need a way to get headers prerequisite for some public headers so we use a switch "case" in shell to be able to do some simple pattern matching. Variables alone in POSIX shell don't allow to work with associative array or variables with "/". Reported-by: Bertrand Marquis Fixes: 28e13c7f43 ("build: xen/include: use if_changed") Signed-off-by: Anthony PERARD Reviewed-by: Bertrand Marquis --- Notes: v2: - fix typo in commit message - fix out-of-tree build v1: - was sent as a reply to v1 of the series xen/include/Makefile | 17 + 1 file changed, 13 insertions(+), 4 deletions(-) diff --git a/xen/include/Makefile b/xen/include/Makefile index 6d9bcc19b0..49c75a78f9 100644 --- a/xen/include/Makefile +++ b/xen/include/Makefile @@ -158,13 +158,22 @@ define cmd_headerscxx_chk touch $@.new; \ exit 0; \ fi; \ - $(foreach i, $(filter %.h,$^),\ - echo "#include "\"$(i)\" \ + get_prereq() {\ + case $$1 in \ + $(foreach i, $(filter %.h,$^),\ + $(if $($(patsubst $(srctree)/%,%,$(i))-prereq), \ + $(i)$(close) \ + echo "$(foreach j, $($(patsubst $(srctree)/%,%,$(i))-prereq), \ + -include c$(j))";;)) \ + esac; \ + };\ + for i in $(filter %.h,$^); do \ + echo "#include "\"$$i\" \ | $(CXX) -x c++ -std=gnu++98 -Wall -Werror -D__XEN_TOOLS__\ -include stdint.h -include $(srcdir)/public/xen.h \ - $(foreach j, $($(patsubst $(srctree)/%,%,$i)-prereq), -include c$(j)) \ + `get_prereq $$i`\ -S -o /dev/null - \ - || exit $$?; echo $(i) >> $@.new;) \ + || exit $$?; echo $$i >> $@.new; done;\ mv $@.new $@ endef -- Anthony PERARD
[XEN PATCH v2 4/4] build: replace get-fields.sh by a perl script
The get-fields.sh which generate all the include/compat/.xlat/*.h headers is quite slow. It takes for example nearly 3 seconds to generate platform.h on a recent machine, or 2.3 seconds for memory.h. Since it's only text processing, rewriting the mix of shell/sed/python into a single perl script make the generation of those file a lot faster. I tried to keep a similar look for the code, to keep the code similar between the shell and perl, and to ease review. So some code in perl might look weird or could be written better. No functional change, the headers generated are identical. Signed-off-by: Anthony PERARD --- Notes: v2: - Add .pl extension to the perl script - remove "-w" from the shebang as it is duplicate of "use warning;" - Add a note in the commit message that the "headers generated are identical". xen/include/Makefile| 6 +- xen/tools/compat-xlat-header.pl | 539 xen/tools/get-fields.sh | 528 --- 3 files changed, 541 insertions(+), 532 deletions(-) create mode 100755 xen/tools/compat-xlat-header.pl delete mode 100644 xen/tools/get-fields.sh diff --git a/xen/include/Makefile b/xen/include/Makefile index 0d3e3d66e0..31b0be14bc 100644 --- a/xen/include/Makefile +++ b/xen/include/Makefile @@ -60,9 +60,7 @@ cmd_compat_c = \ quiet_cmd_xlat_headers = GEN $@ cmd_xlat_headers = \ -while read what name; do \ -$(SHELL) $(srctree)/tools/get-fields.sh "$$what" compat_$$name $< || exit $$?; \ -done <$(patsubst $(obj)/compat/%,$(obj)/compat/.xlat/%,$(basename $<)).lst >$@.new; \ +$(PERL) $(srctree)/tools/compat-xlat-header.pl $< $(patsubst $(obj)/compat/%,$(obj)/compat/.xlat/%,$(basename $<)).lst > $@.new; \ mv -f $@.new $@ targets += $(headers-y) @@ -80,7 +78,7 @@ $(obj)/compat/%.c: $(src)/public/%.h $(srcdir)/xlat.lst $(srctree)/tools/compat- $(call if_changed,compat_c) targets += $(patsubst compat/%, compat/.xlat/%, $(headers-y)) -$(obj)/compat/.xlat/%.h: $(obj)/compat/%.h $(obj)/compat/.xlat/%.lst $(srctree)/tools/get-fields.sh FORCE +$(obj)/compat/.xlat/%.h: $(obj)/compat/%.h $(obj)/compat/.xlat/%.lst $(srctree)/tools/compat-xlat-header.pl FORCE $(call if_changed,xlat_headers) quiet_cmd_xlat_lst = GEN $@ diff --git a/xen/tools/compat-xlat-header.pl b/xen/tools/compat-xlat-header.pl new file mode 100755 index 00..791230591c --- /dev/null +++ b/xen/tools/compat-xlat-header.pl @@ -0,0 +1,539 @@ +#!/usr/bin/perl + +use strict; +use warnings; + +open COMPAT_LIST, "<$ARGV[1]" or die "can't open $ARGV[1], $!"; +open HEADER, "<$ARGV[0]" or die "can't open $ARGV[0], $!"; + +my @typedefs; + +my @header_tokens; +while () { +next if m/^\s*#.*/; +s/([\]\[,;:{}])/ $1 /g; +s/^\s+//; +push(@header_tokens, split(/\s+/)); +} + +sub get_fields { +my ($looking_for) = @_; +my $level = 1; +my $aggr = 0; +my ($name, @fields); + +foreach (@header_tokens) { +if (/^(struct|union)$/) { +unless ($level != 1) { +$aggr = 1; +@fields = (); +$name = ''; +} +} elsif ($_ eq '{') { +$level++; +} elsif ($_ eq '}') { +$level--; +if ($level == 1 and $name eq $looking_for) { +push (@fields, $_); +return @fields; +} +} elsif (/^[a-zA-Z_].*/) { +unless ($aggr == 0 or $name ne "") { +$name = $_; +} +} +unless ($aggr == 0) { +push (@fields, $_); +} +} +return (); +} + +sub get_typedefs { +my $level = 1; +my $state = 0; +my @typedefs; +foreach (@_) { +if ($_ eq 'typedef') { +unless ($level != 1) { +$state = 1; +} +} elsif (m/^COMPAT_HANDLE\(.*\)$/) { +unless ($level != 1 or $state != 1) { +$state = 2; +} +} elsif (m/^[\[\{]$/) { +$level++; +} elsif (m/^[\]\}]$/) { +$level--; +} elsif ($_ eq ';') { +unless ($level != 1) { +$state = 0; +} +} elsif (m/^[a-zA-Z_].*$/) { +unless ($level != 1 or $state != 2) { +push (@typedefs, $_); +} +} +} +return @typedefs; +} + +sub build_enums { +my ($name, @tokens) = @_; + +my $level = 1; +my $kind = ''; +my $named = ''; +my (@fields, @members, $id); + +foreach (@tokens) { +if (m/^(struct|union)$/) { +unless ($level != 2) { +@fields = (''); +} +$kind="$_;$kind"; +} elsif ($_ eq '{') { +$level++; +} elsif ($_ eq '}') { +$level--; +if ($level == 1) { +my $subkind = $kind; +$subkind =~ s/;.*//; +
[XEN PATCH v2 3/4] build: set PERL
We are going to use it in a moment. Also update README about Perl requirement. Signed-off-by: Anthony PERARD --- Notes: v2: - update ./README xen/Makefile | 1 + README | 1 + 2 files changed, 2 insertions(+) diff --git a/xen/Makefile b/xen/Makefile index 82f5310b12..a6650a2acc 100644 --- a/xen/Makefile +++ b/xen/Makefile @@ -22,6 +22,7 @@ PYTHON_INTERPRETER:= $(word 1,$(shell which python3 python python2 2>/dev/null) export PYTHON ?= $(PYTHON_INTERPRETER) export CHECKPOLICY ?= checkpolicy +export PERL?= perl $(if $(filter __%, $(MAKECMDGOALS)), \ $(error targets prefixed with '__' are only for internal use)) diff --git a/README b/README index 5e55047ffd..c1c18de7e0 100644 --- a/README +++ b/README @@ -64,6 +64,7 @@ provided by your OS distributor: * iproute package (/sbin/ip) * GNU bison and GNU flex * ACPI ASL compiler (iasl) +* Perl In addition to the above there are a number of optional build prerequisites. Omitting these will cause the related features to be -- Anthony PERARD
[XEN PATCH v2 2/4] build: remove auto.conf prerequisite from compat/xlat.h target
Now that the command line generating "xlat.h" is check on rebuild, the header will be regenerated whenever the list of xlat headers changes due to change in ".config". We don't need to force a regeneration for every changes in ".config". Signed-off-by: Anthony PERARD Acked-by: Jan Beulich --- xen/include/Makefile | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/xen/include/Makefile b/xen/include/Makefile index 49c75a78f9..0d3e3d66e0 100644 --- a/xen/include/Makefile +++ b/xen/include/Makefile @@ -101,7 +101,7 @@ cmd_xlat_h = \ cat $(filter %.h,$^) >$@.new; \ mv -f $@.new $@ -$(obj)/compat/xlat.h: $(addprefix $(obj)/compat/.xlat/,$(xlat-y)) $(obj)/config/auto.conf FORCE +$(obj)/compat/xlat.h: $(addprefix $(obj)/compat/.xlat/,$(xlat-y)) FORCE $(call if_changed,xlat_h) ifeq ($(XEN_TARGET_ARCH),$(XEN_COMPILE_ARCH)) -- Anthony PERARD
[XEN PATCH v2 0/4] xen: rework compat headers generation
Patch series available in this git branch: https://xenbits.xen.org/git-http/people/aperard/xen-unstable.git br.build-system-xen-include-rework-v2 v2: - new patch [1/4] to fix issue with command line that can be way too long - other small changes, and reorder patches Hi, This patch series is about 2 improvement. First one is to use $(if_changed, ) in "include/Makefile" to make the generation of the compat headers less verbose and to have the command line part of the decision to rebuild the headers. Second one is to replace one slow script by a much faster one, and save time when generating the headers. There's some number here: https://lore.kernel.org/xen-devel/yp3%2f%2fc%2fcacwlh...@perard.uk.xensource.com/ On my machine when doing a full build [of the hypervisor], with `ccache` enabled, it saves about 1.17 seconds (out of ~17s), and without ccache, it saves about 2.0 seconds (out of ~37s). Thanks. Anthony PERARD (4): build,include: rework shell script for headers++.chk build: remove auto.conf prerequisite from compat/xlat.h target build: set PERL build: replace get-fields.sh by a perl script xen/Makefile| 1 + xen/include/Makefile| 25 +- README | 1 + xen/tools/compat-xlat-header.pl | 539 xen/tools/get-fields.sh | 528 --- 5 files changed, 557 insertions(+), 537 deletions(-) create mode 100755 xen/tools/compat-xlat-header.pl delete mode 100644 xen/tools/get-fields.sh -- Anthony PERARD
Re: [PATCH] build: fix exporting for make 3.82
On 14.06.2022 18:00, Anthony PERARD wrote: > On Tue, Jun 14, 2022 at 05:40:27PM +0200, Jan Beulich wrote: >> GNU make 3.82 apparently has a quirk where exporting an undefined >> variable prevents its value from subsequently being updated. This >> situation can arise due to our adding of -rR to MAKEFLAGS, which takes >> effect also on make simply re-invoking itself. Once these flags are in >> effect, CC (in particular) is empty (undefined), and would be defined >> only via Config.mk including StdGNU.mk or alike. With the quirk, CC >> remains empty, yet with an empty CC the compiler minimum version check >> fails, breaking the build. >> >> Move the exporting of the various tool stack component variables past >> where they gain their (final) values. >> >> Signed-off-by: Jan Beulich >> --- >> There may be further factors playing into the described quirk, as I've >> also observed that simply running make as 2nd time would lead to >> successful building of xen/. >> >> While this wasn't a problem until several weeks back, I've not been able >> to identify which exact commit would have caused the breakage. Hence no >> Fixes: tag. > > Reviewed-by: Anthony PERARD Thanks. > Looks like this happened before: be63d9d47f ("build: tweak variable exporting > for make 3.82") Ah yes. I did think I had to deal with that before, but I did check patches only back to early 2021. But it's somewhat worse than described there: It's not just the origin which changes, but (as explained) it actually prevents the variable to further change its value. > So, maybe the issue is started again with 15a0578ca4 ("build: shuffle > main Makefile"), which move the include of Config.mk even later. Yes, that's certainly it. Thanks for spotting - I'll add a Fixes: tag. Jan
[PATCH] x86emul/test: improve failure location identification for FMA sub-test
When some FMA set of insns is included in the base instruction set (XOP, AVX512F, and AVX512-FP16 at present), simd_test() simply invokes fma_test(), negating its return value. In case of a failure this would yield a value close to 4G, which doesn't lend itself to easy identification of the failing test case. Recognize the case in simd_check_regs() and emit alternative output identifying FMA. Signed-off-by: Jan Beulich --- a/tools/tests/x86_emulator/test_x86_emulator.c +++ b/tools/tests/x86_emulator/test_x86_emulator.c @@ -259,7 +259,10 @@ static bool simd_check_regs(const struct { if ( !regs->eax ) return true; -printf("[line %u] ", (unsigned int)regs->eax); +if ( (int)regs->eax > 0 ) +printf("[line %u] ", (unsigned int)regs->eax); +else +printf("[FMA line %u] ", (unsigned int)-regs->eax); return false; }
Re: [PATCH] build: fix exporting for make 3.82
On Tue, Jun 14, 2022 at 05:40:27PM +0200, Jan Beulich wrote: > GNU make 3.82 apparently has a quirk where exporting an undefined > variable prevents its value from subsequently being updated. This > situation can arise due to our adding of -rR to MAKEFLAGS, which takes > effect also on make simply re-invoking itself. Once these flags are in > effect, CC (in particular) is empty (undefined), and would be defined > only via Config.mk including StdGNU.mk or alike. With the quirk, CC > remains empty, yet with an empty CC the compiler minimum version check > fails, breaking the build. > > Move the exporting of the various tool stack component variables past > where they gain their (final) values. > > Signed-off-by: Jan Beulich > --- > There may be further factors playing into the described quirk, as I've > also observed that simply running make as 2nd time would lead to > successful building of xen/. > > While this wasn't a problem until several weeks back, I've not been able > to identify which exact commit would have caused the breakage. Hence no > Fixes: tag. Reviewed-by: Anthony PERARD Looks like this happened before: be63d9d47f ("build: tweak variable exporting for make 3.82") So, maybe the issue is started again with 15a0578ca4 ("build: shuffle main Makefile"), which move the include of Config.mk even later. Thanks, -- Anthony PERARD
Re: [PATCH] build: fix exporting for make 3.82
On 14.06.2022 17:40, Jan Beulich wrote: > GNU make 3.82 apparently has a quirk where exporting an undefined > variable prevents its value from subsequently being updated. This > situation can arise due to our adding of -rR to MAKEFLAGS, which takes > effect also on make simply re-invoking itself. Once these flags are in > effect, CC (in particular) is empty (undefined), and would be defined > only via Config.mk including StdGNU.mk or alike. With the quirk, CC > remains empty, yet with an empty CC the compiler minimum version check > fails, breaking the build. > > Move the exporting of the various tool stack component variables past > where they gain their (final) values. > > Signed-off-by: Jan Beulich > --- > There may be further factors playing into the described quirk, as I've > also observed that simply running make as 2nd time would lead to > successful building of xen/. Albeit perhaps that's simply because then no re-invocation of make is involved, as auto.conf and auto.conf.cmd then already exist (and are up-to-date). Jan > While this wasn't a problem until several weeks back, I've not been able > to identify which exact commit would have caused the breakage. Hence no > Fixes: tag. > > --- a/xen/Makefile > +++ b/xen/Makefile > @@ -44,8 +44,6 @@ export ARCH SRCARCH > # Allow someone to change their config file > export KCONFIG_CONFIG ?= .config > > -export CC CXX LD NM OBJCOPY OBJDUMP ADDR2LINE > - > export TARGET := xen > > .PHONY: dist > @@ -244,6 +242,7 @@ export TARGET_ARCH := $(shell echo $ > -e s'/riscv.*/riscv/g') > > export CONFIG_SHELL := $(SHELL) > +export CC CXX LD NM OBJCOPY OBJDUMP ADDR2LINE > export YACC = $(if $(BISON),$(BISON),bison) > export LEX = $(if $(FLEX),$(FLEX),flex) > >
[PATCH] build: fix exporting for make 3.82
GNU make 3.82 apparently has a quirk where exporting an undefined variable prevents its value from subsequently being updated. This situation can arise due to our adding of -rR to MAKEFLAGS, which takes effect also on make simply re-invoking itself. Once these flags are in effect, CC (in particular) is empty (undefined), and would be defined only via Config.mk including StdGNU.mk or alike. With the quirk, CC remains empty, yet with an empty CC the compiler minimum version check fails, breaking the build. Move the exporting of the various tool stack component variables past where they gain their (final) values. Signed-off-by: Jan Beulich --- There may be further factors playing into the described quirk, as I've also observed that simply running make as 2nd time would lead to successful building of xen/. While this wasn't a problem until several weeks back, I've not been able to identify which exact commit would have caused the breakage. Hence no Fixes: tag. --- a/xen/Makefile +++ b/xen/Makefile @@ -44,8 +44,6 @@ export ARCH SRCARCH # Allow someone to change their config file export KCONFIG_CONFIG ?= .config -export CC CXX LD NM OBJCOPY OBJDUMP ADDR2LINE - export TARGET := xen .PHONY: dist @@ -244,6 +242,7 @@ export TARGET_ARCH := $(shell echo $ -e s'/riscv.*/riscv/g') export CONFIG_SHELL := $(SHELL) +export CC CXX LD NM OBJCOPY OBJDUMP ADDR2LINE export YACC = $(if $(BISON),$(BISON),bison) export LEX = $(if $(FLEX),$(FLEX),flex)
Re: [PATCH] tools/xenstore: simplify loop handling connection I/O
Hi Juergen, On 14/06/2022 16:31, Juergen Gross wrote: The loop handling input and output of connections of xenstored is open coding list_for_each_entry_safe() in an incredibly complicated way. Use list_for_each_entry_safe() instead, making it much more clear how the code is working. Signed-off-by: Juergen Gross Reviewed-by: Julien Grall Cheers, -- Julien Grall
[PATCH] tools/xenstore: simplify loop handling connection I/O
The loop handling input and output of connections of xenstored is open coding list_for_each_entry_safe() in an incredibly complicated way. Use list_for_each_entry_safe() instead, making it much more clear how the code is working. Signed-off-by: Juergen Gross --- tools/xenstore/xenstored_core.c | 12 ++-- 1 file changed, 2 insertions(+), 10 deletions(-) diff --git a/tools/xenstore/xenstored_core.c b/tools/xenstore/xenstored_core.c index 6e4022e5da..fa733e714e 100644 --- a/tools/xenstore/xenstored_core.c +++ b/tools/xenstore/xenstored_core.c @@ -2368,16 +2368,8 @@ int main(int argc, char *argv[]) } } - next = list_entry(connections.next, typeof(*conn), list); - if (&next->list != &connections) - talloc_increase_ref_count(next); - while (&next->list != &connections) { - conn = next; - - next = list_entry(conn->list.next, - typeof(*conn), list); - if (&next->list != &connections) - talloc_increase_ref_count(next); + list_for_each_entry_safe(conn, next, &connections, list) { + talloc_increase_ref_count(conn); if (conn_can_read(conn)) handle_input(conn); -- 2.35.3
Re: [PULL 00/15] Kraxel 20220614 patches
On 6/14/22 05:15, Gerd Hoffmann wrote: The following changes since commit debd0753663bc89c86f5462a53268f2e3f680f60: Merge tag 'pull-testing-next-140622-1' of https://github.com/stsquad/qemu into staging (2022-06-13 21:10:57 -0700) are available in the Git repository at: git://git.kraxel.org/qemu tags/kraxel-20220614-pull-request for you to fetch changes up to b95b56311a0890da0c9f7fc624529c3d7f8dbce0: virtio-gpu: Respect UI refresh rate for EDID (2022-06-14 10:34:37 +0200) usb: add CanoKey device, fixes for ehci + redir ui: fixes for gtk and cocoa, rework refresh rate virtio-gpu: scanout flush fix Applied, thanks. Please update https://wiki.qemu.org/ChangeLog/7.1 as appropriate. r~ Akihiko Odaki (4): ui/cocoa: Fix poweroff request code ui/console: Do not return a value with ui_info ui: Deliver refresh rate via QemuUIInfo virtio-gpu: Respect UI refresh rate for EDID Arnout Engelen (1): hw/usb/hcd-ehci: fix writeback order Dongwon Kim (1): virtio-gpu: update done only on the scanout associated with rect Hongren (Zenithal) Zheng (6): hw/usb: Add CanoKey Implementation hw/usb/canokey: Add trace events meson: Add CanoKey docs: Add CanoKey documentation docs/system/devices/usb: Add CanoKey to USB devices examples MAINTAINERS: add myself as CanoKey maintainer Joelle van Dyne (1): usbredir: avoid queuing hello packet on snapshot restore Volker Rümelin (2): ui/gtk-gl-area: implement GL context destruction ui/gtk-gl-area: create the requested GL context version meson_options.txt| 2 + hw/usb/canokey.h | 69 +++ include/hw/virtio/virtio-gpu.h | 1 + include/ui/console.h | 4 +- include/ui/gtk.h | 2 +- hw/display/virtio-gpu-base.c | 7 +- hw/display/virtio-gpu.c | 4 + hw/display/virtio-vga.c | 5 +- hw/display/xenfb.c | 14 +- hw/usb/canokey.c | 313 +++ hw/usb/hcd-ehci.c| 5 +- hw/usb/redirect.c| 3 +- hw/vfio/display.c| 8 +- ui/console.c | 6 - ui/gtk-egl.c | 4 +- ui/gtk-gl-area.c | 42 - ui/gtk.c | 45 +++-- MAINTAINERS | 8 + docs/system/device-emulation.rst | 1 + docs/system/devices/canokey.rst | 168 + docs/system/devices/usb.rst | 4 + hw/usb/Kconfig | 5 + hw/usb/meson.build | 5 + hw/usb/trace-events | 16 ++ meson.build | 6 + scripts/meson-buildoptions.sh| 3 + ui/cocoa.m | 6 +- ui/trace-events | 2 + 28 files changed, 707 insertions(+), 51 deletions(-) create mode 100644 hw/usb/canokey.h create mode 100644 hw/usb/canokey.c create mode 100644 docs/system/devices/canokey.rst
Re: [PATCH 1/2] x86/pat: fix x86_has_pat_wp()
On 03.05.22 15:22, Juergen Gross wrote: x86_has_pat_wp() is using a wrong test, as it relies on the normal PAT configuration used by the kernel. In case the PAT MSR has been setup by another entity (e.g. BIOS or Xen hypervisor) it might return false even if the PAT configuration is allowing WP mappings. Fixes: 1f6f655e01ad ("x86/mm: Add a x86_has_pat_wp() helper") Signed-off-by: Juergen Gross --- arch/x86/mm/init.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/arch/x86/mm/init.c b/arch/x86/mm/init.c index d8cfce221275..71e182ebced3 100644 --- a/arch/x86/mm/init.c +++ b/arch/x86/mm/init.c @@ -80,7 +80,8 @@ static uint8_t __pte2cachemode_tbl[8] = { /* Check that the write-protect PAT entry is set for write-protect */ bool x86_has_pat_wp(void) { - return __pte2cachemode_tbl[_PAGE_CACHE_MODE_WP] == _PAGE_CACHE_MODE_WP; + return __pte2cachemode_tbl[__cachemode2pte_tbl[_PAGE_CACHE_MODE_WP]] == + _PAGE_CACHE_MODE_WP; } enum page_cache_mode pgprot2cachemode(pgprot_t pgprot) x86 maintainers, please consider taking this patch, as it is fixing a real bug. Patch 2 of this series can be dropped IMO. Juergen OpenPGP_0xB0DE9DD628BF132F.asc Description: OpenPGP public key OpenPGP_signature Description: OpenPGP digital signature
Re: [PATCH 24/36] printk: Remove trace_.*_rcuidle() usage
On Thu, 9 Jun 2022 15:02:20 +0200 Petr Mladek wrote: > > I'm somewhat curious whether we can actually remove that trace event. > > Good question. > > Well, I think that it might be useful. It allows to see trace and > printk messages together. Yes people still use it. I was just asked about it at Kernel Recipes. That is, someone wanted printk mixed in with the tracing, and I told them about this event (which they didn't know about but was happy to hear that it existed). -- Steve
[qemu-mainline test] 171160: regressions - FAIL
flight 171160 qemu-mainline real [real] flight 171163 qemu-mainline real-retest [real] http://logs.test-lab.xenproject.org/osstest/logs/171160/ http://logs.test-lab.xenproject.org/osstest/logs/171163/ Regressions :-( Tests which did not succeed and are blocking, including tests which could not be run: test-amd64-amd64-xl-qcow222 guest-start.2fail REGR. vs. 171149 Tests which are failing intermittently (not blocking): test-amd64-amd64-xl-qcow2 21 guest-start/debian.repeat fail in 171163 pass in 171160 test-amd64-i386-xl-qemuu-debianhvm-amd64 7 xen-install fail pass in 171163-retest Tests which did not succeed, but are not blocking: test-armhf-armhf-xl-rtds 19 guest-start.2 fail blocked in 171149 test-amd64-amd64-xl-qemuu-win7-amd64 19 guest-stopfail like 171149 test-armhf-armhf-libvirt 16 saverestore-support-checkfail like 171149 test-amd64-amd64-qemuu-nested-amd 20 debian-hvm-install/l1/l2 fail like 171149 test-amd64-i386-xl-qemuu-win7-amd64 19 guest-stop fail like 171149 test-armhf-armhf-libvirt-raw 15 saverestore-support-checkfail like 171149 test-armhf-armhf-libvirt-qcow2 15 saverestore-support-check fail like 171149 test-amd64-i386-xl-qemuu-ws16-amd64 19 guest-stop fail like 171149 test-amd64-amd64-xl-qemuu-ws16-amd64 19 guest-stopfail like 171149 test-arm64-arm64-xl-seattle 15 migrate-support-checkfail never pass test-arm64-arm64-xl-seattle 16 saverestore-support-checkfail never pass test-amd64-i386-xl-pvshim14 guest-start fail never pass test-amd64-i386-libvirt-xsm 15 migrate-support-checkfail never pass test-amd64-i386-libvirt 15 migrate-support-checkfail never pass test-arm64-arm64-xl-thunderx 15 migrate-support-checkfail never pass test-arm64-arm64-xl-thunderx 16 saverestore-support-checkfail never pass test-arm64-arm64-xl-xsm 15 migrate-support-checkfail never pass test-arm64-arm64-xl-xsm 16 saverestore-support-checkfail never pass test-arm64-arm64-libvirt-xsm 15 migrate-support-checkfail never pass test-arm64-arm64-libvirt-xsm 16 saverestore-support-checkfail never pass test-arm64-arm64-xl-credit1 15 migrate-support-checkfail never pass test-arm64-arm64-xl-credit1 16 saverestore-support-checkfail never pass test-arm64-arm64-xl 15 migrate-support-checkfail never pass test-arm64-arm64-xl 16 saverestore-support-checkfail never pass test-amd64-i386-libvirt-qemuu-debianhvm-amd64-xsm 13 migrate-support-check fail never pass test-amd64-amd64-libvirt-xsm 15 migrate-support-checkfail never pass test-arm64-arm64-xl-credit2 15 migrate-support-checkfail never pass test-arm64-arm64-xl-credit2 16 saverestore-support-checkfail never pass test-amd64-i386-libvirt-raw 14 migrate-support-checkfail never pass test-amd64-amd64-libvirt-vhd 14 migrate-support-checkfail never pass test-arm64-arm64-libvirt-raw 14 migrate-support-checkfail never pass test-arm64-arm64-libvirt-raw 15 saverestore-support-checkfail never pass test-armhf-armhf-xl-credit1 15 migrate-support-checkfail never pass test-armhf-armhf-xl-credit1 16 saverestore-support-checkfail never pass test-arm64-arm64-xl-vhd 14 migrate-support-checkfail never pass test-arm64-arm64-xl-vhd 15 saverestore-support-checkfail never pass test-armhf-armhf-xl-credit2 15 migrate-support-checkfail never pass test-armhf-armhf-xl-credit2 16 saverestore-support-checkfail never pass test-armhf-armhf-xl-rtds 15 migrate-support-checkfail never pass test-armhf-armhf-xl-rtds 16 saverestore-support-checkfail never pass test-armhf-armhf-xl-multivcpu 15 migrate-support-checkfail never pass test-armhf-armhf-xl-multivcpu 16 saverestore-support-checkfail never pass test-armhf-armhf-xl-cubietruck 15 migrate-support-checkfail never pass test-armhf-armhf-xl-cubietruck 16 saverestore-support-checkfail never pass test-armhf-armhf-libvirt 15 migrate-support-checkfail never pass test-amd64-amd64-libvirt 15 migrate-support-checkfail never pass test-amd64-amd64-libvirt-qemuu-debianhvm-amd64-xsm 13 migrate-support-check fail never pass test-armhf-armhf-libvirt-raw 14 migrate-support-checkfail never pass test-armhf-armhf-xl-vhd 14 migrate-support-checkfail never pass test-armhf-armhf-xl-vhd 15 saverestore-support-checkfail never pass test-armhf-armhf-xl-arndale 15 migrate-support-checkfail never pass test-armhf-armhf-xl-arndale 16 saverestore-support-checkfail never pass test-armhf-armhf-xl 15 migrate-support-checkfail never pass test-armhf-armhf-xl 16 saverestore-support-checkfail
Re: [PATCH 24/30] panic: Refactor the panic path
On Thu 2022-05-26 13:25:57, Guilherme G. Piccoli wrote: > OK, so it seems we have some points in which agreement exists, and some > points that there is no agreement and instead, we have antagonistic / > opposite views and needs. Let's start with the easier part heh > > It seems everybody agrees that *we shouldn't over-engineer things*, and > as per Eric good words: making the panic path more feature-full or > increasing flexibility isn't a good idea. So, as a "corollary": the > panic level approach I'm proposing is not a good fit, I'll drop it and > let's go with something simpler. Makes sense. > Another point of agreement seems to be that _notifier lists in the panic > path are dangerous_, for *2 different reasons*: > > (a) We cannot guarantee that people won't add crazy callbacks there, we > can plan and document things the best as possible - it'll never be > enough, somebody eventually would slip a nonsense callback that would > break things and defeat the planned purpose of such a list; It is true that notifier lists might allow to add crazy stuff without proper review more easily. Things added into the core code would most likely get better review. But nothing is error-proof. And bugs will happen with any approach. > (b) As per Eric point, in a panic/crash situation we might have memory > corruption exactly in the list code / pointers, etc, so the notifier > lists are, by nature, a bit fragile. But I think we shouldn't consider > it completely "bollocks", since this approach has been used for a while > with a good success rate. So, lists aren't perfect at all, but at the > same time, they aren't completely useless. I am not able to judge this. Of course, any extra step increases the risk. I am just not sure how much more complicated it would be to hardcode the calls. Most of them are architecture and/or feature specific. And such code is often hard to review and maintain. > To avoid using a 4th list, 4th or 5th? We already have "hypervisor", "info", "pre-reboot", and "pre-loop". The 5th might be pre-crash-exec. > especially given the list nature is a bit > fragile, I'd suggest one of the 3 following approaches - I *really > appreciate feedbacks* on that so I can implement the best solution and > avoid wasting time in some poor/disliked solution: Honestly, I am not able to decide what might be better without seeing the code. Most things fits pretty well into the 4 proposed lists: "hypervisor", "info", "pre-reboot", and "pre-loop". IMHO, the only question is the code that needs to be always called even before crash_dump. I suggest that you solve the crash_dump callbacks the way that looks best to you. Ideally do it in a separate patch so it can be reviewed and reworked more easily. I believe that a fresh code with an updated split and simplified logic would help us to move forward. Best Regards, Petr
Re: [PATCH 14/36] cpuidle: Fix rcu_idle_*() usage
On Wed, Jun 08, 2022 at 04:27:37PM +0200, Peter Zijlstra wrote: > --- a/kernel/time/tick-broadcast.c > +++ b/kernel/time/tick-broadcast.c > @@ -622,9 +622,13 @@ struct cpumask *tick_get_broadcast_onesh > * to avoid a deep idle transition as we are about to get the > * broadcast IPI right away. > */ > -int tick_check_broadcast_expired(void) > +noinstr int tick_check_broadcast_expired(void) > { > +#ifdef _ASM_GENERIC_BITOPS_INSTRUMENTED_NON_ATOMIC_H > + return arch_test_bit(smp_processor_id(), > cpumask_bits(tick_broadcast_force_mask)); > +#else > return cpumask_test_cpu(smp_processor_id(), tick_broadcast_force_mask); > +#endif > } This is somewhat not-ideal. :/ Could we unconditionally do the arch_test_bit() variant, with a comment, or does that not exist in some cases? Thanks, Mark.
[PULL 12/15] virtio-gpu: update done only on the scanout associated with rect
From: Dongwon Kim It only needs to update the scanouts containing the rect area coming with the resource-flush request from the guest. Cc: Gerd Hoffmann Cc: Vivek Kasireddy Signed-off-by: Dongwon Kim Message-Id: <20220505214030.4261-1-dongwon@intel.com> Signed-off-by: Gerd Hoffmann --- hw/display/virtio-gpu.c | 3 +++ 1 file changed, 3 insertions(+) diff --git a/hw/display/virtio-gpu.c b/hw/display/virtio-gpu.c index cd4a56056fd9..55c6dd576318 100644 --- a/hw/display/virtio-gpu.c +++ b/hw/display/virtio-gpu.c @@ -514,6 +514,9 @@ static void virtio_gpu_resource_flush(VirtIOGPU *g, for (i = 0; i < g->parent_obj.conf.max_outputs; i++) { scanout = &g->parent_obj.scanout[i]; if (scanout->resource_id == res->resource_id && +rf.r.x >= scanout->x && rf.r.y >= scanout->y && +rf.r.x + rf.r.width <= scanout->x + scanout->width && +rf.r.y + rf.r.height <= scanout->y + scanout->height && console_has_gl(scanout->con)) { dpy_gl_update(scanout->con, 0, 0, scanout->width, scanout->height); -- 2.36.1
[PULL 11/15] usbredir: avoid queuing hello packet on snapshot restore
From: Joelle van Dyne When launching QEMU with "-loadvm", usbredir_create_parser() should avoid setting up the hello packet (just as with "-incoming". On the latest version of libusbredir, usbredirparser_unserialize() will return error if the parser is not "pristine." Signed-off-by: Joelle van Dyne Message-Id: <20220507041850.98716-...@getutm.app> Signed-off-by: Gerd Hoffmann --- hw/usb/redirect.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/hw/usb/redirect.c b/hw/usb/redirect.c index fd7df599bc0b..1bd30efc3ef0 100644 --- a/hw/usb/redirect.c +++ b/hw/usb/redirect.c @@ -1280,7 +1280,8 @@ static void usbredir_create_parser(USBRedirDevice *dev) } #endif -if (runstate_check(RUN_STATE_INMIGRATE)) { +if (runstate_check(RUN_STATE_INMIGRATE) || +runstate_check(RUN_STATE_PRELAUNCH)) { flags |= usbredirparser_fl_no_hello; } usbredirparser_init(dev->parser, VERSION, caps, USB_REDIR_CAPS_SIZE, -- 2.36.1
Re: [PULL 00/16] Kraxel 20220613 patches
On Mon, Jun 13, 2022 at 08:52:21AM -0700, Richard Henderson wrote: > On 6/13/22 04:36, Gerd Hoffmann wrote: > > The following changes since commit dcb40541ebca7ec98a14d461593b3cd7282b4fac: > > > >Merge tag 'mips-20220611' of https://github.com/philmd/qemu into staging > > (2022-06-11 21:13:27 -0700) > > > > are available in the Git repository at: > > > >git://git.kraxel.org/qemu tags/kraxel-20220613-pull-request > > > > for you to fetch changes up to 23b87f7a3a13e93e248eef8a4b7257548855a620: > > > >ui: move 'pc-bios/keymaps' to 'ui/keymaps' (2022-06-13 10:59:25 +0200) > > > > > > usb: add CanoKey device, fixes for ehci + redir > > ui: fixes for gtk and cocoa, move keymaps (v2), rework refresh rate > > virtio-gpu: scanout flush fix > > This doesn't even configure: > > ../src/ui/keymaps/meson.build:55:4: ERROR: File ar does not exist. dropped keymaps patch for now, new version sent. take care, Gerd
[PULL 14/15] ui: Deliver refresh rate via QemuUIInfo
From: Akihiko Odaki This change adds a new member, refresh_rate to QemuUIInfo in include/ui/console.h. It represents the refresh rate of the physical display backend, and it is more appropriate than GUI update interval as the refresh rate which the emulated device reports: - sdl may set GUI update interval shorter than the refresh rate of the physical display to respond to user-generated events. - sdl and vnc aggressively changes GUI update interval, but a guests is typically not designed to respond to frequent refresh rate changes, or frequent "display mode" changes in general. The frequency of refresh rate changes of the physical display backend matches better to the guest's expectation. QemuUIInfo also has other members representing "display mode", which makes it suitable for refresh rate representation. It has a throttling of update notifications, and prevents frequent changes of the display mode. Signed-off-by: Akihiko Odaki Message-Id: <20220226115516.59830-3-akihiko.od...@gmail.com> Signed-off-by: Gerd Hoffmann --- include/ui/console.h | 2 +- include/ui/gtk.h | 2 +- hw/display/xenfb.c | 14 +++--- ui/console.c | 6 -- ui/gtk-egl.c | 4 ++-- ui/gtk-gl-area.c | 3 +-- ui/gtk.c | 45 +--- 7 files changed, 42 insertions(+), 34 deletions(-) diff --git a/include/ui/console.h b/include/ui/console.h index 642d6f5248cf..b64d82436097 100644 --- a/include/ui/console.h +++ b/include/ui/console.h @@ -139,6 +139,7 @@ typedef struct QemuUIInfo { int yoff; uint32_t width; uint32_t height; +uint32_t refresh_rate; } QemuUIInfo; /* cursor data format is 32bit RGBA */ @@ -431,7 +432,6 @@ typedef struct GraphicHwOps { void (*gfx_update)(void *opaque); bool gfx_update_async; /* if true, calls graphic_hw_update_done() */ void (*text_update)(void *opaque, console_ch_t *text); -void (*update_interval)(void *opaque, uint64_t interval); void (*ui_info)(void *opaque, uint32_t head, QemuUIInfo *info); void (*gl_block)(void *opaque, bool block); } GraphicHwOps; diff --git a/include/ui/gtk.h b/include/ui/gtk.h index 101b147d1b98..ae0f53740d19 100644 --- a/include/ui/gtk.h +++ b/include/ui/gtk.h @@ -155,7 +155,7 @@ extern bool gtk_use_gl_area; /* ui/gtk.c */ void gd_update_windowsize(VirtualConsole *vc); -int gd_monitor_update_interval(GtkWidget *widget); +void gd_update_monitor_refresh_rate(VirtualConsole *vc, GtkWidget *widget); void gd_hw_gl_flushed(void *vc); /* ui/gtk-egl.c */ diff --git a/hw/display/xenfb.c b/hw/display/xenfb.c index cea10fe3c780..50857cd97a0b 100644 --- a/hw/display/xenfb.c +++ b/hw/display/xenfb.c @@ -777,16 +777,24 @@ static void xenfb_update(void *opaque) xenfb->up_fullscreen = 0; } -static void xenfb_update_interval(void *opaque, uint64_t interval) +static void xenfb_ui_info(void *opaque, uint32_t idx, QemuUIInfo *info) { struct XenFB *xenfb = opaque; +uint32_t refresh_rate; if (xenfb->feature_update) { #ifdef XENFB_TYPE_REFRESH_PERIOD if (xenfb_queue_full(xenfb)) { return; } -xenfb_send_refresh_period(xenfb, interval); + +refresh_rate = info->refresh_rate; +if (!refresh_rate) { +refresh_rate = 75; +} + +/* T = 1 / f = 1 [s*Hz] / f = 1000*1000 [ms*mHz] / f */ +xenfb_send_refresh_period(xenfb, 1000 * 1000 / refresh_rate); #endif } } @@ -983,5 +991,5 @@ struct XenDevOps xen_framebuffer_ops = { static const GraphicHwOps xenfb_ops = { .invalidate = xenfb_invalidate, .gfx_update = xenfb_update, -.update_interval = xenfb_update_interval, +.ui_info = xenfb_ui_info, }; diff --git a/ui/console.c b/ui/console.c index 36c80cd1de85..9331b85203a0 100644 --- a/ui/console.c +++ b/ui/console.c @@ -160,7 +160,6 @@ static void gui_update(void *opaque) uint64_t dcl_interval; DisplayState *ds = opaque; DisplayChangeListener *dcl; -QemuConsole *con; ds->refreshing = true; dpy_refresh(ds); @@ -175,11 +174,6 @@ static void gui_update(void *opaque) } if (ds->update_interval != interval) { ds->update_interval = interval; -QTAILQ_FOREACH(con, &consoles, next) { -if (con->hw_ops->update_interval) { -con->hw_ops->update_interval(con->hw, interval); -} -} trace_console_refresh(interval); } ds->last_update = qemu_clock_get_ms(QEMU_CLOCK_REALTIME); diff --git a/ui/gtk-egl.c b/ui/gtk-egl.c index e3bd4bc27431..b5bffbab2522 100644 --- a/ui/gtk-egl.c +++ b/ui/gtk-egl.c @@ -140,8 +140,8 @@ void gd_egl_refresh(DisplayChangeListener *dcl) { VirtualConsole *vc = container_of(dcl, VirtualConsole, gfx.dcl); -vc->gfx.dcl.update_interval = gd_monitor_update_interval( -vc->window ? vc->window : vc->gfx.drawing_area); +gd_update_monitor_refresh_rate( +vc, v
[PULL 15/15] virtio-gpu: Respect UI refresh rate for EDID
From: Akihiko Odaki Signed-off-by: Akihiko Odaki Message-Id: <20220226115516.59830-4-akihiko.od...@gmail.com> Signed-off-by: Gerd Hoffmann --- include/hw/virtio/virtio-gpu.h | 1 + hw/display/virtio-gpu-base.c | 1 + hw/display/virtio-gpu.c| 1 + 3 files changed, 3 insertions(+) diff --git a/include/hw/virtio/virtio-gpu.h b/include/hw/virtio/virtio-gpu.h index afff9e158e31..2e28507efe21 100644 --- a/include/hw/virtio/virtio-gpu.h +++ b/include/hw/virtio/virtio-gpu.h @@ -80,6 +80,7 @@ struct virtio_gpu_scanout { struct virtio_gpu_requested_state { uint16_t width_mm, height_mm; uint32_t width, height; +uint32_t refresh_rate; int x, y; }; diff --git a/hw/display/virtio-gpu-base.c b/hw/display/virtio-gpu-base.c index b21d6e5b0be8..a29f191aa82e 100644 --- a/hw/display/virtio-gpu-base.c +++ b/hw/display/virtio-gpu-base.c @@ -79,6 +79,7 @@ static void virtio_gpu_ui_info(void *opaque, uint32_t idx, QemuUIInfo *info) g->req_state[idx].x = info->xoff; g->req_state[idx].y = info->yoff; +g->req_state[idx].refresh_rate = info->refresh_rate; g->req_state[idx].width = info->width; g->req_state[idx].height = info->height; g->req_state[idx].width_mm = info->width_mm; diff --git a/hw/display/virtio-gpu.c b/hw/display/virtio-gpu.c index 55c6dd576318..20cc703dcc6e 100644 --- a/hw/display/virtio-gpu.c +++ b/hw/display/virtio-gpu.c @@ -217,6 +217,7 @@ virtio_gpu_generate_edid(VirtIOGPU *g, int scanout, .height_mm = b->req_state[scanout].height_mm, .prefx = b->req_state[scanout].width, .prefy = b->req_state[scanout].height, +.refresh_rate = b->req_state[scanout].refresh_rate, }; edid->size = cpu_to_le32(sizeof(edid->edid)); -- 2.36.1
[PULL 10/15] hw/usb/hcd-ehci: fix writeback order
From: Arnout Engelen The 'active' bit passes control over a qTD between the guest and the controller: set to 1 by guest to enable execution by the controller, and the controller sets it to '0' to hand back control to the guest. ehci_state_writeback write two dwords to main memory using DMA: the third dword of the qTD (containing dt, total bytes to transfer, cpage, cerr and status) and the fourth dword of the qTD (containing the offset). This commit makes sure the fourth dword is written before the third, avoiding a race condition where a new offset written into the qTD by the guest after it observed the status going to go to '0' gets overwritten by a 'late' DMA writeback of the previous offset. This race condition could lead to 'cpage out of range (5)' errors, and reproduced by: ./qemu-system-x86_64 -enable-kvm -bios $SEABIOS/bios.bin -m 4096 -device usb-ehci -blockdev driver=file,read-only=on,filename=/home/aengelen/Downloads/openSUSE-Tumbleweed-DVD-i586-Snapshot20220428-Media.iso,node-name=iso -device usb-storage,drive=iso,bootindex=0 -chardev pipe,id=shell,path=/tmp/pipe -device virtio-serial -device virtconsole,chardev=shell -device virtio-rng-pci -serial mon:stdio -nographic (press a key, select 'Installation' (2), and accept the default values. On my machine the 'cpage out of range' is reproduced while loading the Linux Kernel about once per 7 attempts. With the fix in this commit it no longer fails) This problem was previously reported as a seabios problem in https://mail.coreboot.org/hyperkitty/list/seab...@seabios.org/thread/OUTHT5ISSQJGXPNTUPY3O5E5EPZJCHM3/ and as a nixos CI build failure in https://github.com/NixOS/nixpkgs/issues/170803 Signed-off-by: Arnout Engelen Signed-off-by: Gerd Hoffmann --- hw/usb/hcd-ehci.c | 5 - 1 file changed, 4 insertions(+), 1 deletion(-) diff --git a/hw/usb/hcd-ehci.c b/hw/usb/hcd-ehci.c index 33a8a377bd95..d4da8dcb8d15 100644 --- a/hw/usb/hcd-ehci.c +++ b/hw/usb/hcd-ehci.c @@ -2011,7 +2011,10 @@ static int ehci_state_writeback(EHCIQueue *q) ehci_trace_qtd(q, NLPTR_GET(p->qtdaddr), (EHCIqtd *) &q->qh.next_qtd); qtd = (uint32_t *) &q->qh.next_qtd; addr = NLPTR_GET(p->qtdaddr); -put_dwords(q->ehci, addr + 2 * sizeof(uint32_t), qtd + 2, 2); +/* First write back the offset */ +put_dwords(q->ehci, addr + 3 * sizeof(uint32_t), qtd + 3, 1); +/* Then write back the token, clearing the 'active' bit */ +put_dwords(q->ehci, addr + 2 * sizeof(uint32_t), qtd + 2, 1); ehci_free_packet(p); /* -- 2.36.1
[PULL 13/15] ui/console: Do not return a value with ui_info
From: Akihiko Odaki The returned value is not used and misleading. Signed-off-by: Akihiko Odaki Message-Id: <20220226115516.59830-2-akihiko.od...@gmail.com> Signed-off-by: Gerd Hoffmann --- include/ui/console.h | 2 +- hw/display/virtio-gpu-base.c | 6 +++--- hw/display/virtio-vga.c | 5 ++--- hw/vfio/display.c| 8 +++- 4 files changed, 9 insertions(+), 12 deletions(-) diff --git a/include/ui/console.h b/include/ui/console.h index c44b28a972ca..642d6f5248cf 100644 --- a/include/ui/console.h +++ b/include/ui/console.h @@ -432,7 +432,7 @@ typedef struct GraphicHwOps { bool gfx_update_async; /* if true, calls graphic_hw_update_done() */ void (*text_update)(void *opaque, console_ch_t *text); void (*update_interval)(void *opaque, uint64_t interval); -int (*ui_info)(void *opaque, uint32_t head, QemuUIInfo *info); +void (*ui_info)(void *opaque, uint32_t head, QemuUIInfo *info); void (*gl_block)(void *opaque, bool block); } GraphicHwOps; diff --git a/hw/display/virtio-gpu-base.c b/hw/display/virtio-gpu-base.c index 790cec333c8c..b21d6e5b0be8 100644 --- a/hw/display/virtio-gpu-base.c +++ b/hw/display/virtio-gpu-base.c @@ -69,12 +69,12 @@ static void virtio_gpu_notify_event(VirtIOGPUBase *g, uint32_t event_type) virtio_notify_config(&g->parent_obj); } -static int virtio_gpu_ui_info(void *opaque, uint32_t idx, QemuUIInfo *info) +static void virtio_gpu_ui_info(void *opaque, uint32_t idx, QemuUIInfo *info) { VirtIOGPUBase *g = opaque; if (idx >= g->conf.max_outputs) { -return -1; +return; } g->req_state[idx].x = info->xoff; @@ -92,7 +92,7 @@ static int virtio_gpu_ui_info(void *opaque, uint32_t idx, QemuUIInfo *info) /* send event to guest */ virtio_gpu_notify_event(g, VIRTIO_GPU_EVENT_DISPLAY); -return 0; +return; } static void diff --git a/hw/display/virtio-vga.c b/hw/display/virtio-vga.c index c206b5da384b..4dcb34c4a740 100644 --- a/hw/display/virtio-vga.c +++ b/hw/display/virtio-vga.c @@ -47,15 +47,14 @@ static void virtio_vga_base_text_update(void *opaque, console_ch_t *chardata) } } -static int virtio_vga_base_ui_info(void *opaque, uint32_t idx, QemuUIInfo *info) +static void virtio_vga_base_ui_info(void *opaque, uint32_t idx, QemuUIInfo *info) { VirtIOVGABase *vvga = opaque; VirtIOGPUBase *g = vvga->vgpu; if (g->hw_ops->ui_info) { -return g->hw_ops->ui_info(g, idx, info); +g->hw_ops->ui_info(g, idx, info); } -return -1; } static void virtio_vga_base_gl_block(void *opaque, bool block) diff --git a/hw/vfio/display.c b/hw/vfio/display.c index 89bc90508fb8..78f4d82c1c35 100644 --- a/hw/vfio/display.c +++ b/hw/vfio/display.c @@ -106,14 +106,14 @@ err: return; } -static int vfio_display_edid_ui_info(void *opaque, uint32_t idx, - QemuUIInfo *info) +static void vfio_display_edid_ui_info(void *opaque, uint32_t idx, + QemuUIInfo *info) { VFIOPCIDevice *vdev = opaque; VFIODisplay *dpy = vdev->dpy; if (!dpy->edid_regs) { -return 0; +return; } if (info->width && info->height) { @@ -121,8 +121,6 @@ static int vfio_display_edid_ui_info(void *opaque, uint32_t idx, } else { vfio_display_edid_update(vdev, false, 0, 0); } - -return 0; } static void vfio_display_edid_init(VFIOPCIDevice *vdev) -- 2.36.1
[PULL 09/15] MAINTAINERS: add myself as CanoKey maintainer
From: "Hongren (Zenithal) Zheng" Signed-off-by: Hongren (Zenithal) Zheng Message-Id: Signed-off-by: Gerd Hoffmann --- MAINTAINERS | 8 1 file changed, 8 insertions(+) diff --git a/MAINTAINERS b/MAINTAINERS index 0df25ed4b0a3..4cf6174f9f37 100644 --- a/MAINTAINERS +++ b/MAINTAINERS @@ -2427,6 +2427,14 @@ F: hw/intc/s390_flic*.c F: include/hw/s390x/s390_flic.h L: qemu-s3...@nongnu.org +CanoKey +M: Hongren (Zenithal) Zheng +S: Maintained +R: Canokeys.org +F: hw/usb/canokey.c +F: hw/usb/canokey.h +F: docs/system/devices/canokey.rst + Subsystems -- Overall Audio backends -- 2.36.1
[PULL 05/15] hw/usb/canokey: Add trace events
From: "Hongren (Zenithal) Zheng" Signed-off-by: Hongren (Zenithal) Zheng Message-Id: Signed-off-by: Gerd Hoffmann --- hw/usb/canokey.c| 13 + hw/usb/trace-events | 16 2 files changed, 29 insertions(+) diff --git a/hw/usb/canokey.c b/hw/usb/canokey.c index 6cb8b7cdb089..4a08b1cbd776 100644 --- a/hw/usb/canokey.c +++ b/hw/usb/canokey.c @@ -14,6 +14,7 @@ #include "qapi/error.h" #include "hw/usb.h" #include "hw/qdev-properties.h" +#include "trace.h" #include "desc.h" #include "canokey.h" @@ -66,6 +67,7 @@ static const USBDesc desc_canokey = { */ int canokey_emu_stall_ep(void *base, uint8_t ep) { +trace_canokey_emu_stall_ep(ep); CanoKeyState *key = base; uint8_t ep_in = CANOKEY_EP_IN(ep); /* INTR IN has ep 129 */ key->ep_in_size[ep_in] = 0; @@ -75,6 +77,7 @@ int canokey_emu_stall_ep(void *base, uint8_t ep) int canokey_emu_set_address(void *base, uint8_t addr) { +trace_canokey_emu_set_address(addr); CanoKeyState *key = base; key->dev.addr = addr; return 0; @@ -83,6 +86,7 @@ int canokey_emu_set_address(void *base, uint8_t addr) int canokey_emu_prepare_receive( void *base, uint8_t ep, uint8_t *pbuf, uint16_t size) { +trace_canokey_emu_prepare_receive(ep, size); CanoKeyState *key = base; key->ep_out[ep] = pbuf; key->ep_out_size[ep] = size; @@ -92,6 +96,7 @@ int canokey_emu_prepare_receive( int canokey_emu_transmit( void *base, uint8_t ep, const uint8_t *pbuf, uint16_t size) { +trace_canokey_emu_transmit(ep, size); CanoKeyState *key = base; uint8_t ep_in = CANOKEY_EP_IN(ep); /* INTR IN has ep 129 */ memcpy(key->ep_in[ep_in] + key->ep_in_size[ep_in], @@ -125,6 +130,7 @@ uint32_t canokey_emu_get_rx_data_size(void *base, uint8_t ep) */ static void canokey_handle_reset(USBDevice *dev) { +trace_canokey_handle_reset(); CanoKeyState *key = CANOKEY(dev); for (int i = 0; i != CANOKEY_EP_NUM; ++i) { key->ep_in_state[i] = CANOKEY_EP_IN_WAIT; @@ -137,6 +143,7 @@ static void canokey_handle_reset(USBDevice *dev) static void canokey_handle_control(USBDevice *dev, USBPacket *p, int request, int value, int index, int length, uint8_t *data) { +trace_canokey_handle_control_setup(request, value, index, length); CanoKeyState *key = CANOKEY(dev); canokey_emu_setup(request, value, index, length); @@ -144,6 +151,7 @@ static void canokey_handle_control(USBDevice *dev, USBPacket *p, uint32_t dir_in = request & DeviceRequest; if (!dir_in) { /* OUT */ +trace_canokey_handle_control_out(); if (key->ep_out[0] != NULL) { memcpy(key->ep_out[0], data, length); } @@ -163,6 +171,7 @@ static void canokey_handle_control(USBDevice *dev, USBPacket *p, case CANOKEY_EP_IN_READY: memcpy(data, key->ep_in[0], key->ep_in_size[0]); p->actual_length = key->ep_in_size[0]; +trace_canokey_handle_control_in(p->actual_length); /* reset state */ key->ep_in_state[0] = CANOKEY_EP_IN_WAIT; key->ep_in_size[0] = 0; @@ -182,6 +191,7 @@ static void canokey_handle_data(USBDevice *dev, USBPacket *p) uint32_t out_len; switch (p->pid) { case USB_TOKEN_OUT: +trace_canokey_handle_data_out(ep_out, p->iov.size); usb_packet_copy(p, key->ep_out_buffer[ep_out], p->iov.size); out_pos = 0; while (out_pos != p->iov.size) { @@ -226,6 +236,7 @@ static void canokey_handle_data(USBDevice *dev, USBPacket *p) key->ep_in_size[ep_in] = 0; key->ep_in_pos[ep_in] = 0; } +trace_canokey_handle_data_in(ep_in, in_len); break; } break; @@ -237,6 +248,7 @@ static void canokey_handle_data(USBDevice *dev, USBPacket *p) static void canokey_realize(USBDevice *base, Error **errp) { +trace_canokey_realize(); CanoKeyState *key = CANOKEY(base); if (key->file == NULL) { @@ -260,6 +272,7 @@ static void canokey_realize(USBDevice *base, Error **errp) static void canokey_unrealize(USBDevice *base) { +trace_canokey_unrealize(); } static Property canokey_properties[] = { diff --git a/hw/usb/trace-events b/hw/usb/trace-events index 9773cb53300d..914ca7166829 100644 --- a/hw/usb/trace-events +++ b/hw/usb/trace-events @@ -345,3 +345,19 @@ usb_serial_set_baud(int bus, int addr, int baud) "dev %d:%u baud rate %d" usb_serial_set_data(int bus, int addr, int parity, int data, int stop) "dev %d:%u parity %c, data bits %d, stop bits %d" usb_serial_set_flow_control(int bus, int addr, int index) "dev %d:%u flow control %d" usb_serial_set_xonxoff(int bus, int addr, uint8_t xon, uint8_t xoff) "dev %d:%u xon 0x%x xoff 0x%x" + +# canokey.c +canokey_emu_stall_ep(uint8_t ep) "ep %d" +canokey_emu_set_address(uint8_t addr) "addr %d" +canokey_emu_prepare_receive(uint8_t ep, uint16_t size) "ep %d size %d" +canokey_emu_transmit(uint8_t e
[PULL 07/15] docs: Add CanoKey documentation
From: "Hongren (Zenithal) Zheng" Signed-off-by: Hongren (Zenithal) Zheng Message-Id: Signed-off-by: Gerd Hoffmann --- docs/system/device-emulation.rst | 1 + docs/system/devices/canokey.rst | 168 +++ 2 files changed, 169 insertions(+) create mode 100644 docs/system/devices/canokey.rst diff --git a/docs/system/device-emulation.rst b/docs/system/device-emulation.rst index 3b729b920d7c..05060060563f 100644 --- a/docs/system/device-emulation.rst +++ b/docs/system/device-emulation.rst @@ -92,3 +92,4 @@ Emulated Devices devices/vhost-user.rst devices/virtio-pmem.rst devices/vhost-user-rng.rst + devices/canokey.rst diff --git a/docs/system/devices/canokey.rst b/docs/system/devices/canokey.rst new file mode 100644 index ..169f99b8eb82 --- /dev/null +++ b/docs/system/devices/canokey.rst @@ -0,0 +1,168 @@ +.. _canokey: + +CanoKey QEMU + + +CanoKey [1]_ is an open-source secure key with supports of + +* U2F / FIDO2 with Ed25519 and HMAC-secret +* OpenPGP Card V3.4 with RSA4096, Ed25519 and more [2]_ +* PIV (NIST SP 800-73-4) +* HOTP / TOTP +* NDEF + +All these platform-independent features are in canokey-core [3]_. + +For different platforms, CanoKey has different implementations, +including both hardware implementions and virtual cards: + +* CanoKey STM32 [4]_ +* CanoKey Pigeon [5]_ +* (virt-card) CanoKey USB/IP +* (virt-card) CanoKey FunctionFS + +In QEMU, yet another CanoKey virt-card is implemented. +CanoKey QEMU exposes itself as a USB device to the guest OS. + +With the same software configuration as a hardware key, +the guest OS can use all the functionalities of a secure key as if +there was actually an hardware key plugged in. + +CanoKey QEMU provides much convenience for debuging: + +* libcanokey-qemu supports debuging output thus developers can + inspect what happens inside a secure key +* CanoKey QEMU supports trace event thus event +* QEMU USB stack supports pcap thus USB packet between the guest + and key can be captured and analysed + +Then for developers: + +* For developers on software with secure key support (e.g. FIDO2, OpenPGP), + they can see what happens inside the secure key +* For secure key developers, USB packets between guest OS and CanoKey + can be easily captured and analysed + +Also since this is a virtual card, it can be easily used in CI for testing +on code coping with secure key. + +Building + + +libcanokey-qemu is required to use CanoKey QEMU. + +.. code-block:: shell + +git clone https://github.com/canokeys/canokey-qemu +mkdir canokey-qemu/build +pushd canokey-qemu/build + +If you want to install libcanokey-qemu in a different place, +add ``-DCMAKE_INSTALL_PREFIX=/path/to/your/place`` to cmake below. + +.. code-block:: shell + +cmake .. +make +make install # may need sudo +popd + +Then configuring and building: + +.. code-block:: shell + +# depending on your env, lib/pkgconfig can be lib64/pkgconfig +export PKG_CONFIG_PATH=/path/to/your/place/lib/pkgconfig:$PKG_CONFIG_PATH +./configure --enable-canokey && make + +Using CanoKey QEMU +== + +CanoKey QEMU stores all its data on a file of the host specified by the argument +when invoking qemu. + +.. parsed-literal:: + +|qemu_system| -usb -device canokey,file=$HOME/.canokey-file + +Note: you should keep this file carefully as it may contain your private key! + +The first time when the file is used, it is created and initialized by CanoKey, +afterwards CanoKey QEMU would just read this file. + +After the guest OS boots, you can check that there is a USB device. + +For example, If the guest OS is an Linux machine. You may invoke lsusb +and find CanoKey QEMU there: + +.. code-block:: shell + +$ lsusb +Bus 001 Device 002: ID 20a0:42d4 Clay Logic CanoKey QEMU + +You may setup the key as guided in [6]_. The console for the key is at [7]_. + +Debuging + + +CanoKey QEMU consists of two parts, ``libcanokey-qemu.so`` and ``canokey.c``, +the latter of which resides in QEMU. The former provides core functionality +of a secure key while the latter provides platform-dependent functions: +USB packet handling. + +If you want to trace what happens inside the secure key, when compiling +libcanokey-qemu, you should add ``-DQEMU_DEBUG_OUTPUT=ON`` in cmake command +line: + +.. code-block:: shell + +cmake .. -DQEMU_DEBUG_OUTPUT=ON + +If you want to trace events happened in canokey.c, use + +.. parsed-literal:: + +|qemu_system| --trace "canokey_*" \\ +-usb -device canokey,file=$HOME/.canokey-file + +If you want to capture USB packets between the guest and the host, you can: + +.. parsed-literal:: + +|qemu_system| -usb -device canokey,file=$HOME/.canokey-file,pcap=key.pcap + +Limitations +=== + +Currently libcanokey-qemu.so has dozens of global variables as it was originally +designed for embedded systems. Thus one qemu instance can not have +multiple CanoKey QEMU runn
[PULL 06/15] meson: Add CanoKey
From: "Hongren (Zenithal) Zheng" Signed-off-by: Hongren (Zenithal) Zheng Message-Id: Signed-off-by: Gerd Hoffmann --- meson_options.txt | 2 ++ hw/usb/Kconfig| 5 + hw/usb/meson.build| 5 + meson.build | 6 ++ scripts/meson-buildoptions.sh | 3 +++ 5 files changed, 21 insertions(+) diff --git a/meson_options.txt b/meson_options.txt index 2de94af03712..0e8197386b99 100644 --- a/meson_options.txt +++ b/meson_options.txt @@ -189,6 +189,8 @@ option('spice_protocol', type : 'feature', value : 'auto', description: 'Spice protocol support') option('u2f', type : 'feature', value : 'auto', description: 'U2F emulation support') +option('canokey', type : 'feature', value : 'auto', + description: 'CanoKey support') option('usb_redir', type : 'feature', value : 'auto', description: 'libusbredir support') option('l2tpv3', type : 'feature', value : 'auto', diff --git a/hw/usb/Kconfig b/hw/usb/Kconfig index 53f8283ffdc1..ce4f4339763e 100644 --- a/hw/usb/Kconfig +++ b/hw/usb/Kconfig @@ -119,6 +119,11 @@ config USB_U2F default y depends on USB +config USB_CANOKEY +bool +default y +depends on USB + config IMX_USBPHY bool default y diff --git a/hw/usb/meson.build b/hw/usb/meson.build index de853d780dd8..793df42e2127 100644 --- a/hw/usb/meson.build +++ b/hw/usb/meson.build @@ -63,6 +63,11 @@ if u2f.found() softmmu_ss.add(when: 'CONFIG_USB_U2F', if_true: [u2f, files('u2f-emulated.c')]) endif +# CanoKey +if canokey.found() + softmmu_ss.add(when: 'CONFIG_USB_CANOKEY', if_true: [canokey, files('canokey.c')]) +endif + # usb redirect if usbredir.found() usbredir_ss = ss.source_set() diff --git a/meson.build b/meson.build index 21cd949082dc..0c2e11ff0715 100644 --- a/meson.build +++ b/meson.build @@ -1408,6 +1408,12 @@ if have_system method: 'pkg-config', kwargs: static_kwargs) endif +canokey = not_found +if have_system + canokey = dependency('canokey-qemu', required: get_option('canokey'), + method: 'pkg-config', + kwargs: static_kwargs) +endif usbredir = not_found if not get_option('usb_redir').auto() or have_system usbredir = dependency('libusbredirparser-0.5', required: get_option('usb_redir'), diff --git a/scripts/meson-buildoptions.sh b/scripts/meson-buildoptions.sh index 00ea4d8cd169..1fc1d2e2c362 100644 --- a/scripts/meson-buildoptions.sh +++ b/scripts/meson-buildoptions.sh @@ -73,6 +73,7 @@ meson_options_help() { printf "%s\n" ' bpf eBPF support' printf "%s\n" ' brlapi brlapi character device driver' printf "%s\n" ' bzip2 bzip2 support for DMG images' + printf "%s\n" ' canokey CanoKey support' printf "%s\n" ' cap-ng cap_ng support' printf "%s\n" ' capstoneWhether and how to find the capstone library' printf "%s\n" ' cloop cloop image format support' @@ -204,6 +205,8 @@ _meson_option_parse() { --disable-brlapi) printf "%s" -Dbrlapi=disabled ;; --enable-bzip2) printf "%s" -Dbzip2=enabled ;; --disable-bzip2) printf "%s" -Dbzip2=disabled ;; +--enable-canokey) printf "%s" -Dcanokey=enabled ;; +--disable-canokey) printf "%s" -Dcanokey=disabled ;; --enable-cap-ng) printf "%s" -Dcap_ng=enabled ;; --disable-cap-ng) printf "%s" -Dcap_ng=disabled ;; --enable-capstone) printf "%s" -Dcapstone=enabled ;; -- 2.36.1
[PULL 04/15] hw/usb: Add CanoKey Implementation
From: "Hongren (Zenithal) Zheng" This commit added a new emulated device called CanoKey to QEMU. CanoKey implements platform independent features in canokey-core https://github.com/canokeys/canokey-core, and leaves the USB implementation to the platform. In this commit the USB part was implemented in QEMU using QEMU's USB APIs, therefore the emulated CanoKey can communicate with the guest OS using USB. Signed-off-by: Hongren (Zenithal) Zheng Message-Id: Signed-off-by: Gerd Hoffmann --- hw/usb/canokey.h | 69 +++ hw/usb/canokey.c | 300 +++ 2 files changed, 369 insertions(+) create mode 100644 hw/usb/canokey.h create mode 100644 hw/usb/canokey.c diff --git a/hw/usb/canokey.h b/hw/usb/canokey.h new file mode 100644 index ..24cf30420346 --- /dev/null +++ b/hw/usb/canokey.h @@ -0,0 +1,69 @@ +/* + * CanoKey QEMU device header. + * + * Copyright (c) 2021-2022 Canokeys.org + * Written by Hongren (Zenithal) Zheng + * + * This code is licensed under the Apache-2.0. + */ + +#ifndef CANOKEY_H +#define CANOKEY_H + +#include "hw/qdev-core.h" + +#define TYPE_CANOKEY "canokey" +#define CANOKEY(obj) \ +OBJECT_CHECK(CanoKeyState, (obj), TYPE_CANOKEY) + +/* + * State of Canokey (i.e. hw/canokey.c) + */ + +/* CTRL INTR BULK */ +#define CANOKEY_EP_NUM 3 +/* BULK/INTR IN can be up to 1352 bytes, e.g. get key info */ +#define CANOKEY_EP_IN_BUFFER_SIZE 2048 +/* BULK OUT can be up to 270 bytes, e.g. PIV import cert */ +#define CANOKEY_EP_OUT_BUFFER_SIZE 512 + +typedef enum { +CANOKEY_EP_IN_WAIT, +CANOKEY_EP_IN_READY, +CANOKEY_EP_IN_STALL +} CanoKeyEPState; + +typedef struct CanoKeyState { +USBDevice dev; + +/* IN packets from canokey device loop */ +uint8_t ep_in[CANOKEY_EP_NUM][CANOKEY_EP_IN_BUFFER_SIZE]; +/* + * See canokey_emu_transmit + * + * For large INTR IN, receive multiple data from canokey device loop + * in this case ep_in_size would increase with every call + */ +uint32_t ep_in_size[CANOKEY_EP_NUM]; +/* + * Used in canokey_handle_data + * for IN larger than p->iov.size, we would do multiple handle_data() + * + * The difference between ep_in_pos and ep_in_size: + * We first increase ep_in_size to fill ep_in buffer in device_loop, + * then use ep_in_pos to submit data from ep_in buffer in handle_data + */ +uint32_t ep_in_pos[CANOKEY_EP_NUM]; +CanoKeyEPState ep_in_state[CANOKEY_EP_NUM]; + +/* OUT pointer to canokey recv buffer */ +uint8_t *ep_out[CANOKEY_EP_NUM]; +uint32_t ep_out_size[CANOKEY_EP_NUM]; +/* For large BULK OUT, multiple write to ep_out is needed */ +uint8_t ep_out_buffer[CANOKEY_EP_NUM][CANOKEY_EP_OUT_BUFFER_SIZE]; + +/* Properties */ +char *file; /* canokey-file */ +} CanoKeyState; + +#endif /* CANOKEY_H */ diff --git a/hw/usb/canokey.c b/hw/usb/canokey.c new file mode 100644 index ..6cb8b7cdb089 --- /dev/null +++ b/hw/usb/canokey.c @@ -0,0 +1,300 @@ +/* + * CanoKey QEMU device implementation. + * + * Copyright (c) 2021-2022 Canokeys.org + * Written by Hongren (Zenithal) Zheng + * + * This code is licensed under the Apache-2.0. + */ + +#include "qemu/osdep.h" +#include + +#include "qemu/module.h" +#include "qapi/error.h" +#include "hw/usb.h" +#include "hw/qdev-properties.h" +#include "desc.h" +#include "canokey.h" + +#define CANOKEY_EP_IN(ep) ((ep) & 0x7F) + +#define CANOKEY_VENDOR_NUM 0x20a0 +#define CANOKEY_PRODUCT_NUM0x42d2 + +/* + * placeholder, canokey-qemu implements its own usb desc + * Namely we do not use usb_desc_handle_contorl + */ +enum { +STR_MANUFACTURER = 1, +STR_PRODUCT, +STR_SERIALNUMBER +}; + +static const USBDescStrings desc_strings = { +[STR_MANUFACTURER] = "canokeys.org", +[STR_PRODUCT] = "CanoKey QEMU", +[STR_SERIALNUMBER] = "0" +}; + +static const USBDescDevice desc_device_canokey = { +.bcdUSB= 0x0, +.bMaxPacketSize0 = 16, +.bNumConfigurations= 0, +.confs = NULL, +}; + +static const USBDesc desc_canokey = { +.id = { +.idVendor = CANOKEY_VENDOR_NUM, +.idProduct = CANOKEY_PRODUCT_NUM, +.bcdDevice = 0x0100, +.iManufacturer = STR_MANUFACTURER, +.iProduct = STR_PRODUCT, +.iSerialNumber = STR_SERIALNUMBER, +}, +.full = &desc_device_canokey, +.high = &desc_device_canokey, +.str = desc_strings, +}; + + +/* + * libcanokey-qemu.so side functions + * All functions are called from canokey_emu_device_loop + */ +int canokey_emu_stall_ep(void *base, uint8_t ep) +{ +CanoKeyState *key = base; +uint8_t ep_in = CANOKEY_EP_IN(ep); /* INTR IN has ep 129 */ +key->ep_in_size[ep_in] = 0; +key->ep_in_state[ep_in] = CANOKEY_EP_IN_STALL; +return 0; +} + +int canokey_emu_set_address(void *base, uint8_t addr) +{ +CanoKeyState *key = base; +key->dev.
[PULL 08/15] docs/system/devices/usb: Add CanoKey to USB devices examples
From: "Hongren (Zenithal) Zheng" Signed-off-by: Hongren (Zenithal) Zheng Message-Id: Signed-off-by: Gerd Hoffmann --- docs/system/devices/usb.rst | 4 1 file changed, 4 insertions(+) diff --git a/docs/system/devices/usb.rst b/docs/system/devices/usb.rst index afb7d6c2268d..872d9167589b 100644 --- a/docs/system/devices/usb.rst +++ b/docs/system/devices/usb.rst @@ -199,6 +199,10 @@ option or the ``device_add`` monitor command. Available devices are: ``u2f-{emulated,passthru}`` Universal Second Factor device +``canokey`` + An Open-source Secure Key implementing FIDO2, OpenPGP, PIV and more. + For more information, see :ref:`canokey`. + Physical port addressing -- 2.36.1
[PULL 01/15] ui/gtk-gl-area: implement GL context destruction
From: Volker Rümelin The counterpart function for gd_gl_area_create_context() is currently empty. Implement the gd_gl_area_destroy_context() function to avoid GL context leaks. Signed-off-by: Volker Rümelin Message-Id: <20220605085131.7711-1-vr_q...@t-online.de> Signed-off-by: Gerd Hoffmann --- ui/gtk-gl-area.c | 8 +++- ui/trace-events | 1 + 2 files changed, 8 insertions(+), 1 deletion(-) diff --git a/ui/gtk-gl-area.c b/ui/gtk-gl-area.c index fc5a082eb846..0e20ea031d34 100644 --- a/ui/gtk-gl-area.c +++ b/ui/gtk-gl-area.c @@ -201,7 +201,13 @@ QEMUGLContext gd_gl_area_create_context(DisplayGLCtx *dgc, void gd_gl_area_destroy_context(DisplayGLCtx *dgc, QEMUGLContext ctx) { -/* FIXME */ +GdkGLContext *current_ctx = gdk_gl_context_get_current(); + +trace_gd_gl_area_destroy_context(ctx, current_ctx); +if (ctx == current_ctx) { +gdk_gl_context_clear_current(); +} +g_clear_object(&ctx); } void gd_gl_area_scanout_texture(DisplayChangeListener *dcl, diff --git a/ui/trace-events b/ui/trace-events index f78b5e66061f..1040ba0f88c7 100644 --- a/ui/trace-events +++ b/ui/trace-events @@ -26,6 +26,7 @@ gd_key_event(const char *tab, int gdk_keycode, int qkeycode, const char *action) gd_grab(const char *tab, const char *device, const char *reason) "tab=%s, dev=%s, reason=%s" gd_ungrab(const char *tab, const char *device) "tab=%s, dev=%s" gd_keymap_windowing(const char *name) "backend=%s" +gd_gl_area_destroy_context(void *ctx, void *current_ctx) "ctx=%p, current_ctx=%p" # vnc-auth-sasl.c # vnc-auth-vencrypt.c -- 2.36.1
[PULL 02/15] ui/gtk-gl-area: create the requested GL context version
From: Volker Rümelin Since about 2018 virglrenderer (commit fa835b0f88 "vrend: don't hardcode context version") tries to open the highest available GL context version. This is done by creating the known GL context versions from the highest to the lowest until (*create_gl_context) returns a context != NULL. This does not work properly with the current QEMU gd_gl_area_create_context() function, because gdk_gl_context_realize() on Wayland creates a version 3.0 legacy context if the requested GL context version can't be created. In order for virglrenderer to find the highest available GL context version, return NULL if the created context version is lower than the requested version. This fixes the following error: QEMU started with -device virtio-vga-gl -display gtk,gl=on. Under Wayland, the guest window remains black and the following information can be seen on the host. gl_version 30 - compat profile (qemu:5978): Gdk-WARNING **: 16:19:01.533: gdk_gl_context_set_required_version - GL context versions less than 3.2 are not supported. (qemu:5978): Gdk-WARNING **: 16:19:01.537: gdk_gl_context_set_required_version - GL context versions less than 3.2 are not supported. (qemu:5978): Gdk-WARNING **: 16:19:01.554: gdk_gl_context_set_required_version - GL context versions less than 3.2 are not supported. vrend_renderer_fill_caps: Entering with stale GL error: 1282 To reproduce this error, an OpenGL driver is required on the host that doesn't have the latest OpenGL extensions fully implemented. An example for this is the Intel i965 driver on a Haswell processor. Signed-off-by: Volker Rümelin Message-Id: <20220605085131.7711-2-vr_q...@t-online.de> Signed-off-by: Gerd Hoffmann --- ui/gtk-gl-area.c | 31 ++- ui/trace-events | 1 + 2 files changed, 31 insertions(+), 1 deletion(-) diff --git a/ui/gtk-gl-area.c b/ui/gtk-gl-area.c index 0e20ea031d34..2e0129c28cd4 100644 --- a/ui/gtk-gl-area.c +++ b/ui/gtk-gl-area.c @@ -170,6 +170,23 @@ void gd_gl_area_switch(DisplayChangeListener *dcl, } } +static int gd_cmp_gl_context_version(int major, int minor, QEMUGLParams *params) +{ +if (major > params->major_ver) { +return 1; +} +if (major < params->major_ver) { +return -1; +} +if (minor > params->minor_ver) { +return 1; +} +if (minor < params->minor_ver) { +return -1; +} +return 0; +} + QEMUGLContext gd_gl_area_create_context(DisplayGLCtx *dgc, QEMUGLParams *params) { @@ -177,8 +194,8 @@ QEMUGLContext gd_gl_area_create_context(DisplayGLCtx *dgc, GdkWindow *window; GdkGLContext *ctx; GError *err = NULL; +int major, minor; -gtk_gl_area_make_current(GTK_GL_AREA(vc->gfx.drawing_area)); window = gtk_widget_get_window(vc->gfx.drawing_area); ctx = gdk_window_create_gl_context(window, &err); if (err) { @@ -196,6 +213,18 @@ QEMUGLContext gd_gl_area_create_context(DisplayGLCtx *dgc, g_clear_object(&ctx); return NULL; } + +gdk_gl_context_make_current(ctx); +gdk_gl_context_get_version(ctx, &major, &minor); +gdk_gl_context_clear_current(); +gtk_gl_area_make_current(GTK_GL_AREA(vc->gfx.drawing_area)); + +if (gd_cmp_gl_context_version(major, minor, params) == -1) { +/* created ctx version < requested version */ +g_clear_object(&ctx); +} + +trace_gd_gl_area_create_context(ctx, params->major_ver, params->minor_ver); return ctx; } diff --git a/ui/trace-events b/ui/trace-events index 1040ba0f88c7..a922f00e10b4 100644 --- a/ui/trace-events +++ b/ui/trace-events @@ -26,6 +26,7 @@ gd_key_event(const char *tab, int gdk_keycode, int qkeycode, const char *action) gd_grab(const char *tab, const char *device, const char *reason) "tab=%s, dev=%s, reason=%s" gd_ungrab(const char *tab, const char *device) "tab=%s, dev=%s" gd_keymap_windowing(const char *name) "backend=%s" +gd_gl_area_create_context(void *ctx, int major, int minor) "ctx=%p, major=%d, minor=%d" gd_gl_area_destroy_context(void *ctx, void *current_ctx) "ctx=%p, current_ctx=%p" # vnc-auth-sasl.c -- 2.36.1
[PULL 03/15] ui/cocoa: Fix poweroff request code
From: Akihiko Odaki Signed-off-by: Akihiko Odaki Reviewed-by: Philippe Mathieu-Daudé Message-Id: <20220529082508.89097-1-akihiko.od...@gmail.com> Signed-off-by: Gerd Hoffmann --- ui/cocoa.m | 6 +- 1 file changed, 5 insertions(+), 1 deletion(-) diff --git a/ui/cocoa.m b/ui/cocoa.m index 09a62817f2a9..84c84e98fc5e 100644 --- a/ui/cocoa.m +++ b/ui/cocoa.m @@ -35,6 +35,7 @@ #include "ui/kbd-state.h" #include "sysemu/sysemu.h" #include "sysemu/runstate.h" +#include "sysemu/runstate-action.h" #include "sysemu/cpu-throttle.h" #include "qapi/error.h" #include "qapi/qapi-commands-block.h" @@ -1290,7 +1291,10 @@ - (void)applicationWillTerminate:(NSNotification *)aNotification { COCOA_DEBUG("QemuCocoaAppController: applicationWillTerminate\n"); -qemu_system_shutdown_request(SHUTDOWN_CAUSE_HOST_UI); +with_iothread_lock(^{ +shutdown_action = SHUTDOWN_ACTION_POWEROFF; +qemu_system_shutdown_request(SHUTDOWN_CAUSE_HOST_UI); +}); /* * Sleep here, because returning will cause OSX to kill us -- 2.36.1
[PULL 00/15] Kraxel 20220614 patches
The following changes since commit debd0753663bc89c86f5462a53268f2e3f680f60: Merge tag 'pull-testing-next-140622-1' of https://github.com/stsquad/qemu into staging (2022-06-13 21:10:57 -0700) are available in the Git repository at: git://git.kraxel.org/qemu tags/kraxel-20220614-pull-request for you to fetch changes up to b95b56311a0890da0c9f7fc624529c3d7f8dbce0: virtio-gpu: Respect UI refresh rate for EDID (2022-06-14 10:34:37 +0200) usb: add CanoKey device, fixes for ehci + redir ui: fixes for gtk and cocoa, rework refresh rate virtio-gpu: scanout flush fix Akihiko Odaki (4): ui/cocoa: Fix poweroff request code ui/console: Do not return a value with ui_info ui: Deliver refresh rate via QemuUIInfo virtio-gpu: Respect UI refresh rate for EDID Arnout Engelen (1): hw/usb/hcd-ehci: fix writeback order Dongwon Kim (1): virtio-gpu: update done only on the scanout associated with rect Hongren (Zenithal) Zheng (6): hw/usb: Add CanoKey Implementation hw/usb/canokey: Add trace events meson: Add CanoKey docs: Add CanoKey documentation docs/system/devices/usb: Add CanoKey to USB devices examples MAINTAINERS: add myself as CanoKey maintainer Joelle van Dyne (1): usbredir: avoid queuing hello packet on snapshot restore Volker Rümelin (2): ui/gtk-gl-area: implement GL context destruction ui/gtk-gl-area: create the requested GL context version meson_options.txt| 2 + hw/usb/canokey.h | 69 +++ include/hw/virtio/virtio-gpu.h | 1 + include/ui/console.h | 4 +- include/ui/gtk.h | 2 +- hw/display/virtio-gpu-base.c | 7 +- hw/display/virtio-gpu.c | 4 + hw/display/virtio-vga.c | 5 +- hw/display/xenfb.c | 14 +- hw/usb/canokey.c | 313 +++ hw/usb/hcd-ehci.c| 5 +- hw/usb/redirect.c| 3 +- hw/vfio/display.c| 8 +- ui/console.c | 6 - ui/gtk-egl.c | 4 +- ui/gtk-gl-area.c | 42 - ui/gtk.c | 45 +++-- MAINTAINERS | 8 + docs/system/device-emulation.rst | 1 + docs/system/devices/canokey.rst | 168 + docs/system/devices/usb.rst | 4 + hw/usb/Kconfig | 5 + hw/usb/meson.build | 5 + hw/usb/trace-events | 16 ++ meson.build | 6 + scripts/meson-buildoptions.sh| 3 + ui/cocoa.m | 6 +- ui/trace-events | 2 + 28 files changed, 707 insertions(+), 51 deletions(-) create mode 100644 hw/usb/canokey.h create mode 100644 hw/usb/canokey.c create mode 100644 docs/system/devices/canokey.rst -- 2.36.1
Re: [PULL 00/16] Kraxel 20220613 patches
> > Hmm, build worked here and CI passed too. > > > > I think this is one of those cases where the build directory must be > > deleted because one subdirectory is replaced by a compatibility > > symlink. > > Except 'configure' deals with that, as it explicitly rm -rf's the > symlink target: > > symlink() { > rm -rf "$2" > mkdir -p "$(dirname "$2")" > ln -s "$1" "$2" > } > > so i'm pretty confused as to what's going wrong here still 'git rebase -x ./make.sh master queue/kraxel' not working (where make.sh is a script effectively doing 'make -C build/$name' for multiple build trees with different configurations). 'git status' lists ui/keymaps/* as deleted. 'git reset --hard' fixes it. take care, Gerd
Re: [PATCH 00/36] cpuidle,rcu: Cleanup the mess
On Wed, Jun 08, 2022 at 04:27:23PM +0200, Peter Zijlstra wrote: > Hi All! (omg so many) Hi Peter, Sorry for the delay; my plate has also been rather full recently. I'm beginning to page this in now. > These here few patches mostly clear out the utter mess that is cpuidle vs > rcuidle. > > At the end of the ride there's only 2 real RCU_NONIDLE() users left > > arch/arm64/kernel/suspend.c:RCU_NONIDLE(__cpu_suspend_exit()); > drivers/perf/arm_pmu.c: RCU_NONIDLE(armpmu_start(event, > PERF_EF_RELOAD)); The latter of these is necessary because apparently PM notifiers are called with RCU not watching. Is that still the case today (or at the end of this series)? If so, that feels like fertile land for more issues (yaey...). If not, we should be able to drop this. I can go dig into that some more. > kernel/cfi.c: RCU_NONIDLE({ > > (the CFI one is likely dead in the kCFI rewrite) and there's only a hand full > of trace_.*_rcuidle() left: > > kernel/trace/trace_preemptirq.c: > trace_irq_enable_rcuidle(CALLER_ADDR0, CALLER_ADDR1); > kernel/trace/trace_preemptirq.c: > trace_irq_disable_rcuidle(CALLER_ADDR0, CALLER_ADDR1); > kernel/trace/trace_preemptirq.c: > trace_irq_enable_rcuidle(CALLER_ADDR0, caller_addr); > kernel/trace/trace_preemptirq.c: > trace_irq_disable_rcuidle(CALLER_ADDR0, caller_addr); > kernel/trace/trace_preemptirq.c: > trace_preempt_enable_rcuidle(a0, a1); > kernel/trace/trace_preemptirq.c: > trace_preempt_disable_rcuidle(a0, a1); > > All of them are in 'deprecated' code that is unused for GENERIC_ENTRY. I think those are also unused on arm64 too? If not, I can go attack that. > I've touched a _lot_ of code that I can't test and likely broken some of it :/ > In particular, the whole ARM cpuidle stuff was quite involved with OMAP being > the absolute 'winner'. > > I'm hoping Mark can help me sort the remaining ARM64 bits as he moves that to > GENERIC_ENTRY. Moving to GENERIC_ENTRY as a whole is going to take a tonne of work (refactoring both arm64 and the generic portion to be more amenable to each other), but we can certainly move closer to that for the bits that matter here. Maybe we want a STRICT_ENTRY option to get rid of all the deprecated stuff that we can select regardless of GENERIC_ENTRY to make that easier. > I've also got a note that says ARM64 can probably do a WFE based > idle state and employ TIF_POLLING_NRFLAG to avoid some IPIs. Possibly; I'm not sure how much of a win that'll be given that by default we'll have a ~10KHz WFE wakeup from the timer, but we could take a peek. Thanks, Mark.
[xen-unstable test] 171157: regressions - FAIL
flight 171157 xen-unstable real [real] http://logs.test-lab.xenproject.org/osstest/logs/171157/ Regressions :-( Tests which did not succeed and are blocking, including tests which could not be run: build-arm64-pvops 6 kernel-build fail REGR. vs. 171151 Tests which did not succeed, but are not blocking: test-arm64-arm64-examine 1 build-check(1) blocked n/a test-arm64-arm64-libvirt-raw 1 build-check(1) blocked n/a test-arm64-arm64-libvirt-xsm 1 build-check(1) blocked n/a test-arm64-arm64-xl 1 build-check(1) blocked n/a test-arm64-arm64-xl-credit1 1 build-check(1) blocked n/a test-arm64-arm64-xl-credit2 1 build-check(1) blocked n/a test-arm64-arm64-xl-seattle 1 build-check(1) blocked n/a test-arm64-arm64-xl-thunderx 1 build-check(1) blocked n/a test-arm64-arm64-xl-vhd 1 build-check(1) blocked n/a test-arm64-arm64-xl-xsm 1 build-check(1) blocked n/a test-amd64-i386-libvirt-raw 7 xen-install fail like 171151 test-amd64-amd64-xl-qemut-win7-amd64 19 guest-stopfail like 171151 test-armhf-armhf-libvirt 16 saverestore-support-checkfail like 171151 test-amd64-amd64-qemuu-nested-amd 20 debian-hvm-install/l1/l2 fail like 171151 test-amd64-amd64-xl-qemuu-ws16-amd64 19 guest-stopfail like 171151 test-amd64-i386-xl-qemut-ws16-amd64 19 guest-stop fail like 171151 test-amd64-i386-xl-qemut-win7-amd64 19 guest-stop fail like 171151 test-armhf-armhf-libvirt-qcow2 15 saverestore-support-check fail like 171151 test-armhf-armhf-libvirt-raw 15 saverestore-support-checkfail like 171151 test-amd64-i386-xl-qemuu-win7-amd64 19 guest-stop fail like 171151 test-amd64-amd64-xl-qemut-ws16-amd64 19 guest-stopfail like 171151 test-amd64-i386-xl-qemuu-ws16-amd64 19 guest-stop fail like 171151 test-amd64-amd64-xl-qemuu-win7-amd64 19 guest-stopfail like 171151 test-amd64-i386-xl-pvshim14 guest-start fail never pass test-amd64-amd64-libvirt 15 migrate-support-checkfail never pass test-amd64-i386-libvirt-xsm 15 migrate-support-checkfail never pass test-amd64-i386-libvirt 15 migrate-support-checkfail never pass test-amd64-i386-libvirt-qemuu-debianhvm-amd64-xsm 13 migrate-support-check fail never pass test-amd64-amd64-libvirt-qemuu-debianhvm-amd64-xsm 13 migrate-support-check fail never pass test-armhf-armhf-xl-arndale 15 migrate-support-checkfail never pass test-armhf-armhf-xl-arndale 16 saverestore-support-checkfail never pass test-amd64-amd64-libvirt-vhd 14 migrate-support-checkfail never pass test-armhf-armhf-xl-credit2 15 migrate-support-checkfail never pass test-armhf-armhf-xl-credit2 16 saverestore-support-checkfail never pass test-armhf-armhf-xl-rtds 15 migrate-support-checkfail never pass test-armhf-armhf-xl-rtds 16 saverestore-support-checkfail never pass test-armhf-armhf-libvirt 15 migrate-support-checkfail never pass test-armhf-armhf-xl-cubietruck 15 migrate-support-checkfail never pass test-armhf-armhf-xl-cubietruck 16 saverestore-support-checkfail never pass test-armhf-armhf-xl-multivcpu 15 migrate-support-checkfail never pass test-armhf-armhf-xl-multivcpu 16 saverestore-support-checkfail never pass test-armhf-armhf-xl-credit1 15 migrate-support-checkfail never pass test-armhf-armhf-xl-credit1 16 saverestore-support-checkfail never pass test-amd64-amd64-libvirt-xsm 15 migrate-support-checkfail never pass test-armhf-armhf-libvirt-qcow2 14 migrate-support-checkfail never pass test-armhf-armhf-xl-vhd 14 migrate-support-checkfail never pass test-armhf-armhf-xl-vhd 15 saverestore-support-checkfail never pass test-armhf-armhf-libvirt-raw 14 migrate-support-checkfail never pass test-armhf-armhf-xl 15 migrate-support-checkfail never pass test-armhf-armhf-xl 16 saverestore-support-checkfail never pass version targeted for testing: xen c9a707df83aad17a6fcf2e8330ab3b5bead6fb8b baseline version: xen c9a707df83aad17a6fcf2e8330ab3b5bead6fb8b Last test of basis 171157 2022-06-14 01:53:24 Z0 days Testing same since (not found) 0 attempts jobs: build-amd64-xsm pass build-arm64-xsm pass build-i386-xsm pass build-amd64-xtf pass build-amd64 pass build-arm64
Re: [PATCH] xen/arm: smpboot: Allocate the CPU sibling/core maps while preparing the CPU
On 14.06.2022 13:08, Julien Grall wrote: >>> + unsigned int rc = 0; >> ... here you are setting rc to 0 even though it will be reassigned. >> Furthermore, if rc is used only in case of CPU_UP_PREPARE, why not moving >> the definition there? > > Because I forgot to replace "return NOTIFY_DONE;" with : > > return !rc ? NOTIFY_DONE : notifier_from_errno(rc); That is what I thought. With these fixes you can add my Rb. > > In this case, we would need to initialize rc to 0 so it doesn't get used > initialized. > > Cheers, > Cheers, Michal
Re: [PATCH] xen/arm: irq: Initialize the per-CPU IRQs while preparing the CPU
On 14/06/2022 12:05, Michal Orzel wrote: Hi Julien, On 14.06.2022 11:41, Julien Grall wrote: From: Julien Grall Commit 5047cd1d5dea "xen/common: Use enhanced ASSERT_ALLOC_CONTEXT in xmalloc()" extended the checks in _xmalloc() to catch any use of the helpers from context with interrupts disabled. Unfortunately, the rule is not followed when initializing the per-CPU IRQs: (XEN) Xen call trace: (XEN)[<002389f4>] _xmalloc+0xfc/0x314 (PC) (XEN)[<>] (LR) (XEN)[<0021a7c4>] init_one_irq_desc+0x48/0xd0 (XEN)[<002807a8>] irq.c#init_local_irq_data+0x48/0xa4 (XEN)[<00280834>] init_secondary_IRQ+0x10/0x2c (XEN)[<00288fa4>] start_secondary+0x194/0x274 (XEN)[<40010170>] 40010170 (XEN) (XEN) (XEN) (XEN) Panic on CPU 2: (XEN) Assertion '!in_irq() && (local_irq_is_enabled() || num_online_cpus() <= 1)' failed at common/xmalloc_tlsf.c:601 (XEN) This is happening because zalloc_cpumask_var() may allocate memory if NR_CPUS is > 2 * sizeof(unsigned long). Avoid the problem by allocate the per-CPU IRQs while preparing the CPU. Shouldn't this be" by initializing the per-CPU IRQs while ..." ? I am fine with using "initializing" rather than "allocating". Either way this text is the same like in the previous patch so I think this is not correct. I can't quite parse this. Other than that: Reviewed-by: Michal Orzel Thanks! Cheers, -- Julien Grall
Re: [PATCH] xen/arm: smpboot: Allocate the CPU sibling/core maps while preparing the CPU
On 14/06/2022 12:02, Michal Orzel wrote: Hi Julien, Hi Michal, On 14.06.2022 11:41, Julien Grall wrote: From: Julien Grall Commit 5047cd1d5dea "xen/common: Use enhanced ASSERT_ALLOC_CONTEXT in xmalloc()" extended the checks in _xmalloc() to catch any use of the helpers from context with interrupts disabled. Unfortunately, the rule is not followed when allocating the CPU sibling/core maps. (XEN) Xen call trace: (XEN)[<00238a5c>] _xmalloc+0xfc/0x314 (PC) (XEN)[<>] (LR) (XEN)[<00238c8c>] _xzalloc+0x18/0x4c (XEN)[<00288cb4>] smpboot.c#setup_cpu_sibling_map+0x38/0x138 (XEN)[<00289024>] start_secondary+0x1b4/0x270 (XEN)[<40010170>] 40010170 (XEN) (XEN) (XEN) (XEN) Panic on CPU 2: (XEN) Assertion '!in_irq() && (local_irq_is_enabled() || num_online_cpus() <= 1)' failed at common/xmalloc_tlsf.c:601 (XEN) This is happening because zalloc_cpumask_var() may allocate memory if NR_CPUS is > 2 * sizeof(unsigned long). Avoid the problem by allocate the per-CPU IRQs while preparing the CPU. Shouldn't this be "by allocating the CPU sibling/core maps while ..." to reflect the commit title and to distinguish between this change and the IRQ one? Yes. I will update it. [...] static void remove_cpu_sibling_map(int cpu) @@ -292,9 +294,14 @@ smp_get_max_cpus (void) void __init smp_prepare_cpus(void) { +int rc; Here you are leaving rc uninitialized (which is ok) but ... + cpumask_copy(&cpu_present_map, &cpu_possible_map); -setup_cpu_sibling_map(0); +rc = setup_cpu_sibling_map(0); +if ( rc ) +panic("Unable to allocate CPU sibling/core maps\n"); + } /* Boot the current CPU */ @@ -361,8 +368,6 @@ void start_secondary(void) set_current(idle_vcpu[cpuid]); -setup_cpu_sibling_map(cpuid); - /* Run local notifiers */ notify_cpu_starting(cpuid); /* @@ -530,9 +535,19 @@ static int cpu_smpboot_callback(struct notifier_block *nfb, void *hcpu) { unsigned int cpu = (unsigned long)hcpu; +unsigned int rc = 0; ... here you are setting rc to 0 even though it will be reassigned. Furthermore, if rc is used only in case of CPU_UP_PREPARE, why not moving the definition there? Because I forgot to replace "return NOTIFY_DONE;" with : return !rc ? NOTIFY_DONE : notifier_from_errno(rc); In this case, we would need to initialize rc to 0 so it doesn't get used initialized. Cheers, -- Julien Grall
Re: [PATCH] xen/arm: irq: Initialize the per-CPU IRQs while preparing the CPU
Hi Julien, On 14.06.2022 11:41, Julien Grall wrote: > From: Julien Grall > > Commit 5047cd1d5dea "xen/common: Use enhanced ASSERT_ALLOC_CONTEXT in > xmalloc()" extended the checks in _xmalloc() to catch any use of the > helpers from context with interrupts disabled. > > Unfortunately, the rule is not followed when initializing the per-CPU > IRQs: > > (XEN) Xen call trace: > (XEN)[<002389f4>] _xmalloc+0xfc/0x314 (PC) > (XEN)[<>] (LR) > (XEN)[<0021a7c4>] init_one_irq_desc+0x48/0xd0 > (XEN)[<002807a8>] irq.c#init_local_irq_data+0x48/0xa4 > (XEN)[<00280834>] init_secondary_IRQ+0x10/0x2c > (XEN)[<00288fa4>] start_secondary+0x194/0x274 > (XEN)[<40010170>] 40010170 > (XEN) > (XEN) > (XEN) > (XEN) Panic on CPU 2: > (XEN) Assertion '!in_irq() && (local_irq_is_enabled() || num_online_cpus() <= > 1)' failed at common/xmalloc_tlsf.c:601 > (XEN) > > This is happening because zalloc_cpumask_var() may allocate memory > if NR_CPUS is > 2 * sizeof(unsigned long). > > Avoid the problem by allocate the per-CPU IRQs while preparing the > CPU. Shouldn't this be" by initializing the per-CPU IRQs while ..." ? Either way this text is the same like in the previous patch so I think this is not correct. Other than that: Reviewed-by: Michal Orzel
Re: [PATCH] xen/arm: smpboot: Allocate the CPU sibling/core maps while preparing the CPU
Hi Julien, On 14.06.2022 11:41, Julien Grall wrote: > From: Julien Grall > > Commit 5047cd1d5dea "xen/common: Use enhanced ASSERT_ALLOC_CONTEXT in > xmalloc()" extended the checks in _xmalloc() to catch any use of the > helpers from context with interrupts disabled. > > Unfortunately, the rule is not followed when allocating the CPU > sibling/core maps. > > (XEN) Xen call trace: > (XEN)[<00238a5c>] _xmalloc+0xfc/0x314 (PC) > (XEN)[<>] (LR) > (XEN)[<00238c8c>] _xzalloc+0x18/0x4c > (XEN)[<00288cb4>] smpboot.c#setup_cpu_sibling_map+0x38/0x138 > (XEN)[<00289024>] start_secondary+0x1b4/0x270 > (XEN)[<40010170>] 40010170 > (XEN) > (XEN) > (XEN) > (XEN) Panic on CPU 2: > (XEN) Assertion '!in_irq() && (local_irq_is_enabled() || num_online_cpus() <= > 1)' failed at common/xmalloc_tlsf.c:601 > (XEN) > > This is happening because zalloc_cpumask_var() may allocate memory > if NR_CPUS is > 2 * sizeof(unsigned long). > > Avoid the problem by allocate the per-CPU IRQs while preparing the > CPU. Shouldn't this be "by allocating the CPU sibling/core maps while ..." to reflect the commit title and to distinguish between this change and the IRQ one? > > This also has the benefit to remove a panic() in the secondary CPU > code. > > Signed-off-by: Julien Grall > --- > xen/arch/arm/smpboot.c | 25 - > 1 file changed, 20 insertions(+), 5 deletions(-) > > diff --git a/xen/arch/arm/smpboot.c b/xen/arch/arm/smpboot.c > index 4888bcd78a5a..2b0c92cd369b 100644 > --- a/xen/arch/arm/smpboot.c > +++ b/xen/arch/arm/smpboot.c > @@ -79,15 +79,17 @@ DEFINE_PER_CPU_READ_MOSTLY(cpumask_var_t, cpu_core_mask); > static bool __read_mostly opt_hmp_unsafe = false; > boolean_param("hmp-unsafe", opt_hmp_unsafe); > > -static void setup_cpu_sibling_map(int cpu) > +static int setup_cpu_sibling_map(int cpu) > { > if ( !zalloc_cpumask_var(&per_cpu(cpu_sibling_mask, cpu)) || > !zalloc_cpumask_var(&per_cpu(cpu_core_mask, cpu)) ) > -panic("No memory for CPU sibling/core maps\n"); > +return -ENOMEM; > > /* A CPU is a sibling with itself and is always on its own core. */ > cpumask_set_cpu(cpu, per_cpu(cpu_sibling_mask, cpu)); > cpumask_set_cpu(cpu, per_cpu(cpu_core_mask, cpu)); > + > +return 0; > } > > static void remove_cpu_sibling_map(int cpu) > @@ -292,9 +294,14 @@ smp_get_max_cpus (void) > void __init > smp_prepare_cpus(void) > { > +int rc; Here you are leaving rc uninitialized (which is ok) but ... > + > cpumask_copy(&cpu_present_map, &cpu_possible_map); > > -setup_cpu_sibling_map(0); > +rc = setup_cpu_sibling_map(0); > +if ( rc ) > +panic("Unable to allocate CPU sibling/core maps\n"); > + > } > > /* Boot the current CPU */ > @@ -361,8 +368,6 @@ void start_secondary(void) > > set_current(idle_vcpu[cpuid]); > > -setup_cpu_sibling_map(cpuid); > - > /* Run local notifiers */ > notify_cpu_starting(cpuid); > /* > @@ -530,9 +535,19 @@ static int cpu_smpboot_callback(struct notifier_block > *nfb, > void *hcpu) > { > unsigned int cpu = (unsigned long)hcpu; > +unsigned int rc = 0; ... here you are setting rc to 0 even though it will be reassigned. Furthermore, if rc is used only in case of CPU_UP_PREPARE, why not moving the definition there? > > switch ( action ) > { > +case CPU_UP_PREPARE: > +rc = setup_cpu_sibling_map(cpu); > +if ( rc ) > +printk(XENLOG_ERR > + "Unable to allocate CPU sibling/core map for CPU%u\n", Too many spaces between 'map' and 'for'. > + cpu); > + > +break; > + > case CPU_DEAD: > remove_cpu_sibling_map(cpu); > break; Cheers, Michal
Re: [PULL 00/16] Kraxel 20220613 patches
On Tue, Jun 14, 2022 at 11:40:38AM +0200, Gerd Hoffmann wrote: > On Mon, Jun 13, 2022 at 08:52:21AM -0700, Richard Henderson wrote: > > On 6/13/22 04:36, Gerd Hoffmann wrote: > > > The following changes since commit > > > dcb40541ebca7ec98a14d461593b3cd7282b4fac: > > > > > >Merge tag 'mips-20220611' of https://github.com/philmd/qemu into > > > staging (2022-06-11 21:13:27 -0700) > > > > > > are available in the Git repository at: > > > > > >git://git.kraxel.org/qemu tags/kraxel-20220613-pull-request > > > > > > for you to fetch changes up to 23b87f7a3a13e93e248eef8a4b7257548855a620: > > > > > >ui: move 'pc-bios/keymaps' to 'ui/keymaps' (2022-06-13 10:59:25 +0200) > > > > > > > > > usb: add CanoKey device, fixes for ehci + redir > > > ui: fixes for gtk and cocoa, move keymaps (v2), rework refresh rate > > > virtio-gpu: scanout flush fix > > > > This doesn't even configure: > > > > ../src/ui/keymaps/meson.build:55:4: ERROR: File ar does not exist. > > Hmm, build worked here and CI passed too. > > I think this is one of those cases where the build directory must be > deleted because one subdirectory is replaced by a compatibility > symlink. Except 'configure' deals with that, as it explicitly rm -rf's the symlink target: symlink() { rm -rf "$2" mkdir -p "$(dirname "$2")" ln -s "$1" "$2" } so i'm pretty confused as to what's going wrong here still With regards, Daniel -- |: https://berrange.com -o-https://www.flickr.com/photos/dberrange :| |: https://libvirt.org -o-https://fstop138.berrange.com :| |: https://entangle-photo.org-o-https://www.instagram.com/dberrange :|
Re: [PATCH] xen/console: do not drop serial output from the hardware domain
On 14.06.2022 11:38, Roger Pau Monné wrote: > On Tue, Jun 14, 2022 at 11:13:07AM +0200, Jan Beulich wrote: >> On 14.06.2022 10:32, Roger Pau Monné wrote: >>> On Tue, Jun 14, 2022 at 10:10:03AM +0200, Jan Beulich wrote: On 14.06.2022 08:52, Roger Pau Monné wrote: > On Mon, Jun 13, 2022 at 03:56:54PM +0200, Jan Beulich wrote: >> On 13.06.2022 14:32, Roger Pau Monné wrote: >>> On Mon, Jun 13, 2022 at 11:18:49AM +0200, Jan Beulich wrote: On 13.06.2022 11:04, Roger Pau Monné wrote: > On Mon, Jun 13, 2022 at 10:29:43AM +0200, Jan Beulich wrote: >> On 13.06.2022 10:21, Roger Pau Monné wrote: >>> On Mon, Jun 13, 2022 at 09:30:06AM +0200, Jan Beulich wrote: On 10.06.2022 17:06, Roger Pau Monne wrote: > Prevent dropping console output from the hardware domain, since > it's > likely important to have all the output if the boot fails without > having to resort to sync_console (which also affects the output > from > other guests). > > Do so by pairing the console_serial_puts() with > serial_{start,end}_log_everything(), so that no output is dropped. While I can see the goal, why would Dom0 output be (effectively) more important than Xen's own one (which isn't "forced")? And with this aiming at boot output only, wouldn't you want to stop the overriding once boot has completed (of which, if I'm not mistaken, we don't really have any signal coming from Dom0)? And even during boot I'm not convinced we'd want to let through everything, but perhaps just Dom0's kernel messages? >>> >>> I normally use sync_console on all the boxes I'm doing dev work, so >>> this request is something that come up internally. >>> >>> Didn't realize Xen output wasn't forced, since we already have rate >>> limiting based on log levels I was assuming that non-ratelimited >>> messages wouldn't be dropped. But yes, I agree that Xen (non-guest >>> triggered) output shouldn't be rate limited either. >> >> Which would raise the question of why we have log levels for >> non-guest >> messages. > > Hm, maybe I'm confused, but I don't see a direct relation between log > levels and rate limiting. If I set log level to WARNING I would > expect to not loose _any_ non-guest log messages with level WARNING or > above. It's still useful to have log levels for non-guest messages, > since user might want to filter out DEBUG non-guest messages for > example. It was me who was confused, because of the two log-everything variants we have (console and serial). You're right that your change is unrelated to log levels. However, when there are e.g. many warnings or when an admin has lowered the log level, what you (would) do is effectively force sync_console mode transiently (for a subset of messages, but that's secondary, especially because the "forced" output would still be waiting for earlier output to make it out). >>> >>> Right, it would have to wait for any previous output on the buffer to >>> go out first. In any case we can guarantee that no more output will >>> be added to the buffer while Xen waits for it to be flushed. >>> >>> So for the hardware domain it might make sense to wait for the TX >>> buffers to be half empty (the current tx_quench logic) by preempting >>> the hypercall. That however could cause issues if guests manage to >>> keep filling the buffer while the hardware domain is being preempted. >>> >>> Alternatively we could always reserve half of the buffer for the >>> hardware domain, and allow it to be preempted while waiting for space >>> (since it's guaranteed non hardware domains won't be able to steal the >>> allocation from the hardware domain). >> >> Getting complicated it seems. I have to admit that I wonder whether we >> wouldn't be better off leaving the current logic as is. > > Another possible solution (more like a band aid) is to increase the > buffer size from 4 pages to 8 or 16. That would likely allow to cope > fine with the high throughput of boot messages. You mean the buffer whose size is controlled by serial_tx_buffer? >>> >>> Yes. >>> On large systems one may want to simply make use of the command line option then; I don't think the built-in default needs changing. Or if so, then perhaps not statically at build time, but taking into account system properties (like CPU count). >>> >>> So how about we use: >>> >>> min(16384, ROUNDUP(1024 * num_possible_cpus(), 4096)) >> >> That
[PATCH] xen/arm: irq: Initialize the per-CPU IRQs while preparing the CPU
From: Julien Grall Commit 5047cd1d5dea "xen/common: Use enhanced ASSERT_ALLOC_CONTEXT in xmalloc()" extended the checks in _xmalloc() to catch any use of the helpers from context with interrupts disabled. Unfortunately, the rule is not followed when initializing the per-CPU IRQs: (XEN) Xen call trace: (XEN)[<002389f4>] _xmalloc+0xfc/0x314 (PC) (XEN)[<>] (LR) (XEN)[<0021a7c4>] init_one_irq_desc+0x48/0xd0 (XEN)[<002807a8>] irq.c#init_local_irq_data+0x48/0xa4 (XEN)[<00280834>] init_secondary_IRQ+0x10/0x2c (XEN)[<00288fa4>] start_secondary+0x194/0x274 (XEN)[<40010170>] 40010170 (XEN) (XEN) (XEN) (XEN) Panic on CPU 2: (XEN) Assertion '!in_irq() && (local_irq_is_enabled() || num_online_cpus() <= 1)' failed at common/xmalloc_tlsf.c:601 (XEN) This is happening because zalloc_cpumask_var() may allocate memory if NR_CPUS is > 2 * sizeof(unsigned long). Avoid the problem by allocate the per-CPU IRQs while preparing the CPU. This also has the benefit to remove a BUG_ON() in the secondary CPU code. Signed-off-by: Julien Grall --- xen/arch/arm/include/asm/irq.h | 1 - xen/arch/arm/irq.c | 35 +++--- xen/arch/arm/smpboot.c | 2 -- 3 files changed, 28 insertions(+), 10 deletions(-) diff --git a/xen/arch/arm/include/asm/irq.h b/xen/arch/arm/include/asm/irq.h index e45d57459899..245f49dcbac5 100644 --- a/xen/arch/arm/include/asm/irq.h +++ b/xen/arch/arm/include/asm/irq.h @@ -73,7 +73,6 @@ static inline bool is_lpi(unsigned int irq) bool is_assignable_irq(unsigned int irq); void init_IRQ(void); -void init_secondary_IRQ(void); int route_irq_to_guest(struct domain *d, unsigned int virq, unsigned int irq, const char *devname); diff --git a/xen/arch/arm/irq.c b/xen/arch/arm/irq.c index b761d90c4063..56bdcb95335d 100644 --- a/xen/arch/arm/irq.c +++ b/xen/arch/arm/irq.c @@ -17,6 +17,7 @@ * GNU General Public License for more details. */ +#include #include #include #include @@ -100,7 +101,7 @@ static int __init init_irq_data(void) return 0; } -static int init_local_irq_data(void) +static int init_local_irq_data(unsigned int cpu) { int irq; @@ -108,7 +109,7 @@ static int init_local_irq_data(void) for ( irq = 0; irq < NR_LOCAL_IRQS; irq++ ) { -struct irq_desc *desc = irq_to_desc(irq); +struct irq_desc *desc = &per_cpu(local_irq_desc, cpu)[irq]; int rc = init_one_irq_desc(desc); if ( rc ) @@ -131,6 +132,29 @@ static int init_local_irq_data(void) return 0; } +static int cpu_callback(struct notifier_block *nfb, unsigned long action, +void *hcpu) +{ +unsigned long cpu = (unsigned long)hcpu; +int rc = 0; + +switch ( action ) +{ +case CPU_UP_PREPARE: +rc = init_local_irq_data(cpu); +if ( rc ) +printk(XENLOG_ERR "Unable to allocate local IRQ for CPU%lu\n", + cpu); +break; +} + +return !rc ? NOTIFY_DONE : notifier_from_errno(rc); +} + +static struct notifier_block cpu_nfb = { +.notifier_call = cpu_callback, +}; + void __init init_IRQ(void) { int irq; @@ -140,13 +164,10 @@ void __init init_IRQ(void) local_irqs_type[irq] = IRQ_TYPE_INVALID; spin_unlock(&local_irqs_type_lock); -BUG_ON(init_local_irq_data() < 0); +BUG_ON(init_local_irq_data(smp_processor_id()) < 0); BUG_ON(init_irq_data() < 0); -} -void init_secondary_IRQ(void) -{ -BUG_ON(init_local_irq_data() < 0); +register_cpu_notifier(&cpu_nfb); } static inline struct irq_guest *irq_get_guest_info(struct irq_desc *desc) diff --git a/xen/arch/arm/smpboot.c b/xen/arch/arm/smpboot.c index 9bb32a301a70..4888bcd78a5a 100644 --- a/xen/arch/arm/smpboot.c +++ b/xen/arch/arm/smpboot.c @@ -359,8 +359,6 @@ void start_secondary(void) gic_init_secondary_cpu(); -init_secondary_IRQ(); - set_current(idle_vcpu[cpuid]); setup_cpu_sibling_map(cpuid); -- 2.32.0
[PATCH] xen/arm: smpboot: Allocate the CPU sibling/core maps while preparing the CPU
From: Julien Grall Commit 5047cd1d5dea "xen/common: Use enhanced ASSERT_ALLOC_CONTEXT in xmalloc()" extended the checks in _xmalloc() to catch any use of the helpers from context with interrupts disabled. Unfortunately, the rule is not followed when allocating the CPU sibling/core maps. (XEN) Xen call trace: (XEN)[<00238a5c>] _xmalloc+0xfc/0x314 (PC) (XEN)[<>] (LR) (XEN)[<00238c8c>] _xzalloc+0x18/0x4c (XEN)[<00288cb4>] smpboot.c#setup_cpu_sibling_map+0x38/0x138 (XEN)[<00289024>] start_secondary+0x1b4/0x270 (XEN)[<40010170>] 40010170 (XEN) (XEN) (XEN) (XEN) Panic on CPU 2: (XEN) Assertion '!in_irq() && (local_irq_is_enabled() || num_online_cpus() <= 1)' failed at common/xmalloc_tlsf.c:601 (XEN) This is happening because zalloc_cpumask_var() may allocate memory if NR_CPUS is > 2 * sizeof(unsigned long). Avoid the problem by allocate the per-CPU IRQs while preparing the CPU. This also has the benefit to remove a panic() in the secondary CPU code. Signed-off-by: Julien Grall --- xen/arch/arm/smpboot.c | 25 - 1 file changed, 20 insertions(+), 5 deletions(-) diff --git a/xen/arch/arm/smpboot.c b/xen/arch/arm/smpboot.c index 4888bcd78a5a..2b0c92cd369b 100644 --- a/xen/arch/arm/smpboot.c +++ b/xen/arch/arm/smpboot.c @@ -79,15 +79,17 @@ DEFINE_PER_CPU_READ_MOSTLY(cpumask_var_t, cpu_core_mask); static bool __read_mostly opt_hmp_unsafe = false; boolean_param("hmp-unsafe", opt_hmp_unsafe); -static void setup_cpu_sibling_map(int cpu) +static int setup_cpu_sibling_map(int cpu) { if ( !zalloc_cpumask_var(&per_cpu(cpu_sibling_mask, cpu)) || !zalloc_cpumask_var(&per_cpu(cpu_core_mask, cpu)) ) -panic("No memory for CPU sibling/core maps\n"); +return -ENOMEM; /* A CPU is a sibling with itself and is always on its own core. */ cpumask_set_cpu(cpu, per_cpu(cpu_sibling_mask, cpu)); cpumask_set_cpu(cpu, per_cpu(cpu_core_mask, cpu)); + +return 0; } static void remove_cpu_sibling_map(int cpu) @@ -292,9 +294,14 @@ smp_get_max_cpus (void) void __init smp_prepare_cpus(void) { +int rc; + cpumask_copy(&cpu_present_map, &cpu_possible_map); -setup_cpu_sibling_map(0); +rc = setup_cpu_sibling_map(0); +if ( rc ) +panic("Unable to allocate CPU sibling/core maps\n"); + } /* Boot the current CPU */ @@ -361,8 +368,6 @@ void start_secondary(void) set_current(idle_vcpu[cpuid]); -setup_cpu_sibling_map(cpuid); - /* Run local notifiers */ notify_cpu_starting(cpuid); /* @@ -530,9 +535,19 @@ static int cpu_smpboot_callback(struct notifier_block *nfb, void *hcpu) { unsigned int cpu = (unsigned long)hcpu; +unsigned int rc = 0; switch ( action ) { +case CPU_UP_PREPARE: +rc = setup_cpu_sibling_map(cpu); +if ( rc ) +printk(XENLOG_ERR + "Unable to allocate CPU sibling/core map for CPU%u\n", + cpu); + +break; + case CPU_DEAD: remove_cpu_sibling_map(cpu); break; -- 2.32.0
Re: [PULL 00/16] Kraxel 20220613 patches
On Mon, Jun 13, 2022 at 08:52:21AM -0700, Richard Henderson wrote: > On 6/13/22 04:36, Gerd Hoffmann wrote: > > The following changes since commit dcb40541ebca7ec98a14d461593b3cd7282b4fac: > > > >Merge tag 'mips-20220611' of https://github.com/philmd/qemu into staging > > (2022-06-11 21:13:27 -0700) > > > > are available in the Git repository at: > > > >git://git.kraxel.org/qemu tags/kraxel-20220613-pull-request > > > > for you to fetch changes up to 23b87f7a3a13e93e248eef8a4b7257548855a620: > > > >ui: move 'pc-bios/keymaps' to 'ui/keymaps' (2022-06-13 10:59:25 +0200) > > > > > > usb: add CanoKey device, fixes for ehci + redir > > ui: fixes for gtk and cocoa, move keymaps (v2), rework refresh rate > > virtio-gpu: scanout flush fix > > This doesn't even configure: > > ../src/ui/keymaps/meson.build:55:4: ERROR: File ar does not exist. Hmm, build worked here and CI passed too. I think this is one of those cases where the build directory must be deleted because one subdirectory is replaced by a compatibility symlink. Or we drop the symlink idea and update the keymap loading code to check both old and new location. Daniel? take care, Gerd
Re: [PATCH] xen/console: do not drop serial output from the hardware domain
On Tue, Jun 14, 2022 at 11:13:07AM +0200, Jan Beulich wrote: > On 14.06.2022 10:32, Roger Pau Monné wrote: > > On Tue, Jun 14, 2022 at 10:10:03AM +0200, Jan Beulich wrote: > >> On 14.06.2022 08:52, Roger Pau Monné wrote: > >>> On Mon, Jun 13, 2022 at 03:56:54PM +0200, Jan Beulich wrote: > On 13.06.2022 14:32, Roger Pau Monné wrote: > > On Mon, Jun 13, 2022 at 11:18:49AM +0200, Jan Beulich wrote: > >> On 13.06.2022 11:04, Roger Pau Monné wrote: > >>> On Mon, Jun 13, 2022 at 10:29:43AM +0200, Jan Beulich wrote: > On 13.06.2022 10:21, Roger Pau Monné wrote: > > On Mon, Jun 13, 2022 at 09:30:06AM +0200, Jan Beulich wrote: > >> On 10.06.2022 17:06, Roger Pau Monne wrote: > >>> Prevent dropping console output from the hardware domain, since > >>> it's > >>> likely important to have all the output if the boot fails without > >>> having to resort to sync_console (which also affects the output > >>> from > >>> other guests). > >>> > >>> Do so by pairing the console_serial_puts() with > >>> serial_{start,end}_log_everything(), so that no output is dropped. > >> > >> While I can see the goal, why would Dom0 output be (effectively) > >> more > >> important than Xen's own one (which isn't "forced")? And with this > >> aiming at boot output only, wouldn't you want to stop the > >> overriding > >> once boot has completed (of which, if I'm not mistaken, we don't > >> really have any signal coming from Dom0)? And even during boot I'm > >> not convinced we'd want to let through everything, but perhaps just > >> Dom0's kernel messages? > > > > I normally use sync_console on all the boxes I'm doing dev work, so > > this request is something that come up internally. > > > > Didn't realize Xen output wasn't forced, since we already have rate > > limiting based on log levels I was assuming that non-ratelimited > > messages wouldn't be dropped. But yes, I agree that Xen (non-guest > > triggered) output shouldn't be rate limited either. > > Which would raise the question of why we have log levels for > non-guest > messages. > >>> > >>> Hm, maybe I'm confused, but I don't see a direct relation between log > >>> levels and rate limiting. If I set log level to WARNING I would > >>> expect to not loose _any_ non-guest log messages with level WARNING or > >>> above. It's still useful to have log levels for non-guest messages, > >>> since user might want to filter out DEBUG non-guest messages for > >>> example. > >> > >> It was me who was confused, because of the two log-everything variants > >> we have (console and serial). You're right that your change is > >> unrelated > >> to log levels. However, when there are e.g. many warnings or when an > >> admin has lowered the log level, what you (would) do is effectively > >> force sync_console mode transiently (for a subset of messages, but > >> that's secondary, especially because the "forced" output would still > >> be waiting for earlier output to make it out). > > > > Right, it would have to wait for any previous output on the buffer to > > go out first. In any case we can guarantee that no more output will > > be added to the buffer while Xen waits for it to be flushed. > > > > So for the hardware domain it might make sense to wait for the TX > > buffers to be half empty (the current tx_quench logic) by preempting > > the hypercall. That however could cause issues if guests manage to > > keep filling the buffer while the hardware domain is being preempted. > > > > Alternatively we could always reserve half of the buffer for the > > hardware domain, and allow it to be preempted while waiting for space > > (since it's guaranteed non hardware domains won't be able to steal the > > allocation from the hardware domain). > > Getting complicated it seems. I have to admit that I wonder whether we > wouldn't be better off leaving the current logic as is. > >>> > >>> Another possible solution (more like a band aid) is to increase the > >>> buffer size from 4 pages to 8 or 16. That would likely allow to cope > >>> fine with the high throughput of boot messages. > >> > >> You mean the buffer whose size is controlled by serial_tx_buffer? > > > > Yes. > > > >> On > >> large systems one may want to simply make use of the command line > >> option then; I don't think the built-in default needs changing. Or > >> if so, then perhaps not statically at build time, but taking into > >> account system properties (like CPU count). > > > > So how about we use: > > > > min(16384, ROUNDUP(1024 * num_possible_cpus(), 4096)) > > That would _reduce_ size on small systems, would
Re: [PATCH] xen/console: do not drop serial output from the hardware domain
On 14.06.2022 10:32, Roger Pau Monné wrote: > On Tue, Jun 14, 2022 at 10:10:03AM +0200, Jan Beulich wrote: >> On 14.06.2022 08:52, Roger Pau Monné wrote: >>> On Mon, Jun 13, 2022 at 03:56:54PM +0200, Jan Beulich wrote: On 13.06.2022 14:32, Roger Pau Monné wrote: > On Mon, Jun 13, 2022 at 11:18:49AM +0200, Jan Beulich wrote: >> On 13.06.2022 11:04, Roger Pau Monné wrote: >>> On Mon, Jun 13, 2022 at 10:29:43AM +0200, Jan Beulich wrote: On 13.06.2022 10:21, Roger Pau Monné wrote: > On Mon, Jun 13, 2022 at 09:30:06AM +0200, Jan Beulich wrote: >> On 10.06.2022 17:06, Roger Pau Monne wrote: >>> Prevent dropping console output from the hardware domain, since it's >>> likely important to have all the output if the boot fails without >>> having to resort to sync_console (which also affects the output from >>> other guests). >>> >>> Do so by pairing the console_serial_puts() with >>> serial_{start,end}_log_everything(), so that no output is dropped. >> >> While I can see the goal, why would Dom0 output be (effectively) more >> important than Xen's own one (which isn't "forced")? And with this >> aiming at boot output only, wouldn't you want to stop the overriding >> once boot has completed (of which, if I'm not mistaken, we don't >> really have any signal coming from Dom0)? And even during boot I'm >> not convinced we'd want to let through everything, but perhaps just >> Dom0's kernel messages? > > I normally use sync_console on all the boxes I'm doing dev work, so > this request is something that come up internally. > > Didn't realize Xen output wasn't forced, since we already have rate > limiting based on log levels I was assuming that non-ratelimited > messages wouldn't be dropped. But yes, I agree that Xen (non-guest > triggered) output shouldn't be rate limited either. Which would raise the question of why we have log levels for non-guest messages. >>> >>> Hm, maybe I'm confused, but I don't see a direct relation between log >>> levels and rate limiting. If I set log level to WARNING I would >>> expect to not loose _any_ non-guest log messages with level WARNING or >>> above. It's still useful to have log levels for non-guest messages, >>> since user might want to filter out DEBUG non-guest messages for >>> example. >> >> It was me who was confused, because of the two log-everything variants >> we have (console and serial). You're right that your change is unrelated >> to log levels. However, when there are e.g. many warnings or when an >> admin has lowered the log level, what you (would) do is effectively >> force sync_console mode transiently (for a subset of messages, but >> that's secondary, especially because the "forced" output would still >> be waiting for earlier output to make it out). > > Right, it would have to wait for any previous output on the buffer to > go out first. In any case we can guarantee that no more output will > be added to the buffer while Xen waits for it to be flushed. > > So for the hardware domain it might make sense to wait for the TX > buffers to be half empty (the current tx_quench logic) by preempting > the hypercall. That however could cause issues if guests manage to > keep filling the buffer while the hardware domain is being preempted. > > Alternatively we could always reserve half of the buffer for the > hardware domain, and allow it to be preempted while waiting for space > (since it's guaranteed non hardware domains won't be able to steal the > allocation from the hardware domain). Getting complicated it seems. I have to admit that I wonder whether we wouldn't be better off leaving the current logic as is. >>> >>> Another possible solution (more like a band aid) is to increase the >>> buffer size from 4 pages to 8 or 16. That would likely allow to cope >>> fine with the high throughput of boot messages. >> >> You mean the buffer whose size is controlled by serial_tx_buffer? > > Yes. > >> On >> large systems one may want to simply make use of the command line >> option then; I don't think the built-in default needs changing. Or >> if so, then perhaps not statically at build time, but taking into >> account system properties (like CPU count). > > So how about we use: > > min(16384, ROUNDUP(1024 * num_possible_cpus(), 4096)) That would _reduce_ size on small systems, wouldn't it? Originally you were after increasing the default size. But if you had meant max(), then I'd fear on very large systems this may grow a little too large. > Maybe we should also take CPU frequency into account, but that seems > too complex for the purpose. Why would frequency matter? Other aspects I could see
[linux-linus test] 171156: regressions - FAIL
flight 171156 linux-linus real [real] http://logs.test-lab.xenproject.org/osstest/logs/171156/ Regressions :-( Tests which did not succeed and are blocking, including tests which could not be run: test-arm64-arm64-examine 8 reboot fail REGR. vs. 170714 test-amd64-amd64-xl-credit2 8 xen-boot fail REGR. vs. 170714 test-amd64-amd64-xl-qemuu-debianhvm-amd64 8 xen-bootfail REGR. vs. 170714 test-amd64-amd64-xl-qemut-stubdom-debianhvm-amd64-xsm 8 xen-boot fail REGR. vs. 170714 test-amd64-amd64-libvirt 8 xen-boot fail REGR. vs. 170714 test-amd64-amd64-xl-qemuu-debianhvm-amd64-shadow 8 xen-boot fail REGR. vs. 170714 test-amd64-amd64-xl-multivcpu 8 xen-bootfail REGR. vs. 170714 test-amd64-amd64-libvirt-pair 12 xen-boot/src_host fail REGR. vs. 170714 test-amd64-amd64-libvirt-pair 13 xen-boot/dst_host fail REGR. vs. 170714 test-amd64-amd64-xl-pvhv2-intel 8 xen-boot fail REGR. vs. 170714 test-amd64-amd64-freebsd12-amd64 8 xen-boot fail REGR. vs. 170714 test-amd64-amd64-qemuu-nested-intel 8 xen-boot fail REGR. vs. 170714 test-arm64-arm64-xl-seattle 8 xen-boot fail REGR. vs. 170714 test-arm64-arm64-xl-credit2 8 xen-boot fail REGR. vs. 170714 test-amd64-amd64-libvirt-qcow2 8 xen-boot fail REGR. vs. 170714 test-amd64-amd64-libvirt-raw 8 xen-boot fail REGR. vs. 170714 test-arm64-arm64-xl 8 xen-boot fail REGR. vs. 170714 test-arm64-arm64-xl-credit1 8 xen-boot fail REGR. vs. 170714 test-arm64-arm64-xl-vhd 8 xen-boot fail REGR. vs. 170714 test-amd64-amd64-examine 8 reboot fail REGR. vs. 170714 test-amd64-amd64-examine-bios 8 reboot fail REGR. vs. 170714 test-arm64-arm64-libvirt-raw 8 xen-boot fail REGR. vs. 170714 test-amd64-amd64-examine-uefi 8 reboot fail REGR. vs. 170714 test-amd64-amd64-xl-pvshim8 xen-boot fail REGR. vs. 170714 test-arm64-arm64-xl-xsm 8 xen-boot fail REGR. vs. 170714 test-arm64-arm64-libvirt-xsm 8 xen-boot fail REGR. vs. 170714 Tests which are failing intermittently (not blocking): test-amd64-amd64-xl 8 xen-boot fail pass in 171154 test-amd64-amd64-xl-vhd 8 xen-boot fail pass in 171154 test-amd64-amd64-xl-qemuu-ovmf-amd64 8 xen-boot fail pass in 171154 test-amd64-amd64-xl-qemuu-debianhvm-i386-xsm 8 xen-boot fail pass in 171155 test-amd64-amd64-pair12 xen-boot/src_host fail pass in 171155 test-amd64-amd64-pair13 xen-boot/dst_host fail pass in 171155 test-amd64-amd64-xl-credit1 8 xen-boot fail pass in 171155 test-amd64-amd64-xl-xsm 8 xen-boot fail pass in 171155 Tests which did not succeed, but are not blocking: test-amd64-amd64-xl-qemut-win7-amd64 19 guest-stopfail like 170714 test-amd64-amd64-qemuu-nested-amd 20 debian-hvm-install/l1/l2 fail like 170714 test-amd64-amd64-xl-qemuu-ws16-amd64 19 guest-stopfail like 170714 test-amd64-amd64-xl-qemut-ws16-amd64 19 guest-stopfail like 170714 test-amd64-amd64-xl-qemuu-win7-amd64 19 guest-stopfail like 170714 test-armhf-armhf-libvirt-qcow2 15 saverestore-support-check fail like 170714 test-armhf-armhf-libvirt 16 saverestore-support-checkfail like 170714 test-armhf-armhf-libvirt-raw 15 saverestore-support-checkfail like 170714 test-amd64-amd64-libvirt-xsm 15 migrate-support-checkfail never pass test-arm64-arm64-xl-thunderx 15 migrate-support-checkfail never pass test-arm64-arm64-xl-thunderx 16 saverestore-support-checkfail never pass test-amd64-amd64-libvirt-qemuu-debianhvm-amd64-xsm 13 migrate-support-check fail never pass test-armhf-armhf-xl 15 migrate-support-checkfail never pass test-armhf-armhf-xl 16 saverestore-support-checkfail never pass test-armhf-armhf-xl-rtds 15 migrate-support-checkfail never pass test-armhf-armhf-xl-rtds 16 saverestore-support-checkfail never pass test-armhf-armhf-xl-credit2 15 migrate-support-checkfail never pass test-armhf-armhf-xl-credit2 16 saverestore-support-checkfail never pass test-armhf-armhf-xl-cubietruck 15 migrate-support-checkfail never pass test-armhf-armhf-xl-cubietruck 16 saverestore-support-checkfail never pass test-armhf-armhf-xl-multivcpu 15 migrate-support-checkfail never pass test-armhf-armhf-xl-multivcpu 16 saverestore-support-checkfail never pass test-armhf-armhf-libvirt-qcow2 14 migrate-support-checkfail never pass test-armhf-armhf-xl-credit1 15 migrate-support-checkfail never pass test-armhf-armhf-xl-cred
[libvirt test] 171159: regressions - FAIL
flight 171159 libvirt real [real] http://logs.test-lab.xenproject.org/osstest/logs/171159/ Regressions :-( Tests which did not succeed and are blocking, including tests which could not be run: build-armhf-libvirt 6 libvirt-buildfail REGR. vs. 151777 build-amd64-libvirt 6 libvirt-buildfail REGR. vs. 151777 build-i386-libvirt6 libvirt-buildfail REGR. vs. 151777 build-arm64-libvirt 6 libvirt-buildfail REGR. vs. 151777 Tests which did not succeed, but are not blocking: test-amd64-amd64-libvirt 1 build-check(1) blocked n/a test-amd64-amd64-libvirt-pair 1 build-check(1) blocked n/a test-amd64-amd64-libvirt-qemuu-debianhvm-amd64-xsm 1 build-check(1) blocked n/a test-amd64-amd64-libvirt-vhd 1 build-check(1) blocked n/a test-amd64-amd64-libvirt-xsm 1 build-check(1) blocked n/a test-amd64-i386-libvirt 1 build-check(1) blocked n/a test-amd64-i386-libvirt-pair 1 build-check(1) blocked n/a test-amd64-i386-libvirt-qemuu-debianhvm-amd64-xsm 1 build-check(1) blocked n/a test-amd64-i386-libvirt-raw 1 build-check(1) blocked n/a test-amd64-i386-libvirt-xsm 1 build-check(1) blocked n/a test-arm64-arm64-libvirt 1 build-check(1) blocked n/a test-arm64-arm64-libvirt-qcow2 1 build-check(1) blocked n/a test-arm64-arm64-libvirt-raw 1 build-check(1) blocked n/a test-armhf-armhf-libvirt-raw 1 build-check(1) blocked n/a test-arm64-arm64-libvirt-xsm 1 build-check(1) blocked n/a test-armhf-armhf-libvirt 1 build-check(1) blocked n/a test-armhf-armhf-libvirt-qcow2 1 build-check(1) blocked n/a version targeted for testing: libvirt 31b5ad06e315a8e9625a7fec56dc14af15895c89 baseline version: libvirt 2c846fa6bcc11929c9fb857a22430fb9945654ad Last test of basis 151777 2020-07-10 04:19:19 Z 704 days Failing since151818 2020-07-11 04:18:52 Z 703 days 685 attempts Testing same since 171159 2022-06-14 04:18:58 Z0 days1 attempts People who touched revisions under test: Adolfo Jayme Barrientos Aleksandr Alekseev Aleksei Zakharov Amneesh Singh Andika Triwidada Andrea Bolognani Andrew Melnychenko Ani Sinha Balázs Meskó Barrett Schonefeld Bastian Germann Bastien Orivel BiaoXiang Ye Bihong Yu Binfeng Wu Bjoern Walk Boris Fiuczynski Brad Laue Brian Turek Bruno Haible Chris Mayo Christian Borntraeger Christian Ehrhardt Christian Kirbach Christian Schoenebeck Christophe Fergeau Claudio Fontana Cole Robinson Collin Walling Cornelia Huck Cédric Bosdonnat Côme Borsoi Daniel Henrique Barboza Daniel Letai Daniel P. Berrange Daniel P. Berrangé Didik Supriadi dinglimin Divya Garg Dmitrii Shcherbakov Dmytro Linkin Eiichi Tsukata Emilio Herrera Eric Farman Erik Skultety Fabian Affolter Fabian Freyer Fabiano Fidêncio Fangge Jin Farhan Ali Fedora Weblate Translation Franck Ridel Gavi Teitz gongwei Guoyi Tu Göran Uddeborg Halil Pasic Han Han Hao Wang Haonan Wang Hela Basa Helmut Grohne Hiroki Narukawa Hyman Huang(黄勇) Ian Wienand Ioanna Alifieraki Ivan Teterevkov Jakob Meng Jamie Strandboge Jamie Strandboge Jan Kuparinen jason lee Jean-Baptiste Holcroft Jia Zhou Jianan Gao Jim Fehlig Jin Yan Jing Qi Jinsheng Zhang Jiri Denemark Joachim Falk John Ferlan John Levon John Levon Jonathan Watt Jonathon Jongsma Julio Faracco Justin Gatzen Ján Tomko Kashyap Chamarthy Kevin Locke Kim InSoo Koichi Murase Kristina Hanicova Laine Stump Laszlo Ersek Lee Yarwood Lei Yang Lena Voytek Liang Yan Liang Yan Liao Pingfang Lin Ma Lin Ma Lin Ma Liu Yiding Lubomir Rintel Luke Yue Luyao Zhong luzhipeng Marc Hartmayer Marc-André Lureau Marek Marczykowski-Górecki Mark Mielke Markus Schade Martin Kletzander Martin Pitt Masayoshi Mizuma Matej Cepl Matt Coleman Matt Coleman Mauro Matteo Cascella Max Goodhart Maxim Nestratov Meina Li Michal Privoznik Michał Smyk Milo Casagrande Moshe Levi Moteen Shah Moteen Shah Muha Aliss Nathan Neal Gompa Nick Chevsky Nick Shyrokovskiy Nickys Music Group Nico Pache Nicolas Lécureuil Nicolas Lécureuil Nikolay Shirokovskiy Nikolay Shirokovskiy Nikolay Shirokovskiy Niteesh Dubey Olaf Hering Olesya Gerasimenko Or Ozeri Orion Poplawski Pany Paolo Bonzini Patrick Magauran Paulo de Rezende Pinatti Pavel Hrdina Peng Liang Peng Liang
Re: [PATCH] xen/console: do not drop serial output from the hardware domain
On Tue, Jun 14, 2022 at 10:32:53AM +0200, Roger Pau Monné wrote: > On Tue, Jun 14, 2022 at 10:10:03AM +0200, Jan Beulich wrote: > > On 14.06.2022 08:52, Roger Pau Monné wrote: > > > On Mon, Jun 13, 2022 at 03:56:54PM +0200, Jan Beulich wrote: > > >> On 13.06.2022 14:32, Roger Pau Monné wrote: > > >>> On Mon, Jun 13, 2022 at 11:18:49AM +0200, Jan Beulich wrote: > > On 13.06.2022 11:04, Roger Pau Monné wrote: > > > On Mon, Jun 13, 2022 at 10:29:43AM +0200, Jan Beulich wrote: > > >> On 13.06.2022 10:21, Roger Pau Monné wrote: > > >>> On Mon, Jun 13, 2022 at 09:30:06AM +0200, Jan Beulich wrote: > > On 10.06.2022 17:06, Roger Pau Monne wrote: > > > Prevent dropping console output from the hardware domain, since > > > it's > > > likely important to have all the output if the boot fails without > > > having to resort to sync_console (which also affects the output > > > from > > > other guests). > > > > > > Do so by pairing the console_serial_puts() with > > > serial_{start,end}_log_everything(), so that no output is dropped. > > > > While I can see the goal, why would Dom0 output be (effectively) > > more > > important than Xen's own one (which isn't "forced")? And with this > > aiming at boot output only, wouldn't you want to stop the > > overriding > > once boot has completed (of which, if I'm not mistaken, we don't > > really have any signal coming from Dom0)? And even during boot I'm > > not convinced we'd want to let through everything, but perhaps just > > Dom0's kernel messages? > > >>> > > >>> I normally use sync_console on all the boxes I'm doing dev work, so > > >>> this request is something that come up internally. > > >>> > > >>> Didn't realize Xen output wasn't forced, since we already have rate > > >>> limiting based on log levels I was assuming that non-ratelimited > > >>> messages wouldn't be dropped. But yes, I agree that Xen (non-guest > > >>> triggered) output shouldn't be rate limited either. > > >> > > >> Which would raise the question of why we have log levels for > > >> non-guest > > >> messages. > > > > > > Hm, maybe I'm confused, but I don't see a direct relation between log > > > levels and rate limiting. If I set log level to WARNING I would > > > expect to not loose _any_ non-guest log messages with level WARNING or > > > above. It's still useful to have log levels for non-guest messages, > > > since user might want to filter out DEBUG non-guest messages for > > > example. > > > > It was me who was confused, because of the two log-everything variants > > we have (console and serial). You're right that your change is > > unrelated > > to log levels. However, when there are e.g. many warnings or when an > > admin has lowered the log level, what you (would) do is effectively > > force sync_console mode transiently (for a subset of messages, but > > that's secondary, especially because the "forced" output would still > > be waiting for earlier output to make it out). > > >>> > > >>> Right, it would have to wait for any previous output on the buffer to > > >>> go out first. In any case we can guarantee that no more output will > > >>> be added to the buffer while Xen waits for it to be flushed. > > >>> > > >>> So for the hardware domain it might make sense to wait for the TX > > >>> buffers to be half empty (the current tx_quench logic) by preempting > > >>> the hypercall. That however could cause issues if guests manage to > > >>> keep filling the buffer while the hardware domain is being preempted. > > >>> > > >>> Alternatively we could always reserve half of the buffer for the > > >>> hardware domain, and allow it to be preempted while waiting for space > > >>> (since it's guaranteed non hardware domains won't be able to steal the > > >>> allocation from the hardware domain). > > >> > > >> Getting complicated it seems. I have to admit that I wonder whether we > > >> wouldn't be better off leaving the current logic as is. > > > > > > Another possible solution (more like a band aid) is to increase the > > > buffer size from 4 pages to 8 or 16. That would likely allow to cope > > > fine with the high throughput of boot messages. > > > > You mean the buffer whose size is controlled by serial_tx_buffer? > > Yes. > > > On > > large systems one may want to simply make use of the command line > > option then; I don't think the built-in default needs changing. Or > > if so, then perhaps not statically at build time, but taking into > > account system properties (like CPU count). > > So how about we use: > > min(16384, ROUNDUP(1024 * num_possible_cpus(), 4096)) Er, sorry, that should be max(...) instead. Thanks, Roger.
Re: [PATCH] xen/console: do not drop serial output from the hardware domain
On Tue, Jun 14, 2022 at 10:10:03AM +0200, Jan Beulich wrote: > On 14.06.2022 08:52, Roger Pau Monné wrote: > > On Mon, Jun 13, 2022 at 03:56:54PM +0200, Jan Beulich wrote: > >> On 13.06.2022 14:32, Roger Pau Monné wrote: > >>> On Mon, Jun 13, 2022 at 11:18:49AM +0200, Jan Beulich wrote: > On 13.06.2022 11:04, Roger Pau Monné wrote: > > On Mon, Jun 13, 2022 at 10:29:43AM +0200, Jan Beulich wrote: > >> On 13.06.2022 10:21, Roger Pau Monné wrote: > >>> On Mon, Jun 13, 2022 at 09:30:06AM +0200, Jan Beulich wrote: > On 10.06.2022 17:06, Roger Pau Monne wrote: > > Prevent dropping console output from the hardware domain, since it's > > likely important to have all the output if the boot fails without > > having to resort to sync_console (which also affects the output from > > other guests). > > > > Do so by pairing the console_serial_puts() with > > serial_{start,end}_log_everything(), so that no output is dropped. > > While I can see the goal, why would Dom0 output be (effectively) more > important than Xen's own one (which isn't "forced")? And with this > aiming at boot output only, wouldn't you want to stop the overriding > once boot has completed (of which, if I'm not mistaken, we don't > really have any signal coming from Dom0)? And even during boot I'm > not convinced we'd want to let through everything, but perhaps just > Dom0's kernel messages? > >>> > >>> I normally use sync_console on all the boxes I'm doing dev work, so > >>> this request is something that come up internally. > >>> > >>> Didn't realize Xen output wasn't forced, since we already have rate > >>> limiting based on log levels I was assuming that non-ratelimited > >>> messages wouldn't be dropped. But yes, I agree that Xen (non-guest > >>> triggered) output shouldn't be rate limited either. > >> > >> Which would raise the question of why we have log levels for non-guest > >> messages. > > > > Hm, maybe I'm confused, but I don't see a direct relation between log > > levels and rate limiting. If I set log level to WARNING I would > > expect to not loose _any_ non-guest log messages with level WARNING or > > above. It's still useful to have log levels for non-guest messages, > > since user might want to filter out DEBUG non-guest messages for > > example. > > It was me who was confused, because of the two log-everything variants > we have (console and serial). You're right that your change is unrelated > to log levels. However, when there are e.g. many warnings or when an > admin has lowered the log level, what you (would) do is effectively > force sync_console mode transiently (for a subset of messages, but > that's secondary, especially because the "forced" output would still > be waiting for earlier output to make it out). > >>> > >>> Right, it would have to wait for any previous output on the buffer to > >>> go out first. In any case we can guarantee that no more output will > >>> be added to the buffer while Xen waits for it to be flushed. > >>> > >>> So for the hardware domain it might make sense to wait for the TX > >>> buffers to be half empty (the current tx_quench logic) by preempting > >>> the hypercall. That however could cause issues if guests manage to > >>> keep filling the buffer while the hardware domain is being preempted. > >>> > >>> Alternatively we could always reserve half of the buffer for the > >>> hardware domain, and allow it to be preempted while waiting for space > >>> (since it's guaranteed non hardware domains won't be able to steal the > >>> allocation from the hardware domain). > >> > >> Getting complicated it seems. I have to admit that I wonder whether we > >> wouldn't be better off leaving the current logic as is. > > > > Another possible solution (more like a band aid) is to increase the > > buffer size from 4 pages to 8 or 16. That would likely allow to cope > > fine with the high throughput of boot messages. > > You mean the buffer whose size is controlled by serial_tx_buffer? Yes. > On > large systems one may want to simply make use of the command line > option then; I don't think the built-in default needs changing. Or > if so, then perhaps not statically at build time, but taking into > account system properties (like CPU count). So how about we use: min(16384, ROUNDUP(1024 * num_possible_cpus(), 4096)) Maybe we should also take CPU frequency into account, but that seems too complex for the purpose. Thanks, Roger.
Re: [PATCH] xen/console: do not drop serial output from the hardware domain
On 14.06.2022 08:52, Roger Pau Monné wrote: > On Mon, Jun 13, 2022 at 03:56:54PM +0200, Jan Beulich wrote: >> On 13.06.2022 14:32, Roger Pau Monné wrote: >>> On Mon, Jun 13, 2022 at 11:18:49AM +0200, Jan Beulich wrote: On 13.06.2022 11:04, Roger Pau Monné wrote: > On Mon, Jun 13, 2022 at 10:29:43AM +0200, Jan Beulich wrote: >> On 13.06.2022 10:21, Roger Pau Monné wrote: >>> On Mon, Jun 13, 2022 at 09:30:06AM +0200, Jan Beulich wrote: On 10.06.2022 17:06, Roger Pau Monne wrote: > Prevent dropping console output from the hardware domain, since it's > likely important to have all the output if the boot fails without > having to resort to sync_console (which also affects the output from > other guests). > > Do so by pairing the console_serial_puts() with > serial_{start,end}_log_everything(), so that no output is dropped. While I can see the goal, why would Dom0 output be (effectively) more important than Xen's own one (which isn't "forced")? And with this aiming at boot output only, wouldn't you want to stop the overriding once boot has completed (of which, if I'm not mistaken, we don't really have any signal coming from Dom0)? And even during boot I'm not convinced we'd want to let through everything, but perhaps just Dom0's kernel messages? >>> >>> I normally use sync_console on all the boxes I'm doing dev work, so >>> this request is something that come up internally. >>> >>> Didn't realize Xen output wasn't forced, since we already have rate >>> limiting based on log levels I was assuming that non-ratelimited >>> messages wouldn't be dropped. But yes, I agree that Xen (non-guest >>> triggered) output shouldn't be rate limited either. >> >> Which would raise the question of why we have log levels for non-guest >> messages. > > Hm, maybe I'm confused, but I don't see a direct relation between log > levels and rate limiting. If I set log level to WARNING I would > expect to not loose _any_ non-guest log messages with level WARNING or > above. It's still useful to have log levels for non-guest messages, > since user might want to filter out DEBUG non-guest messages for > example. It was me who was confused, because of the two log-everything variants we have (console and serial). You're right that your change is unrelated to log levels. However, when there are e.g. many warnings or when an admin has lowered the log level, what you (would) do is effectively force sync_console mode transiently (for a subset of messages, but that's secondary, especially because the "forced" output would still be waiting for earlier output to make it out). >>> >>> Right, it would have to wait for any previous output on the buffer to >>> go out first. In any case we can guarantee that no more output will >>> be added to the buffer while Xen waits for it to be flushed. >>> >>> So for the hardware domain it might make sense to wait for the TX >>> buffers to be half empty (the current tx_quench logic) by preempting >>> the hypercall. That however could cause issues if guests manage to >>> keep filling the buffer while the hardware domain is being preempted. >>> >>> Alternatively we could always reserve half of the buffer for the >>> hardware domain, and allow it to be preempted while waiting for space >>> (since it's guaranteed non hardware domains won't be able to steal the >>> allocation from the hardware domain). >> >> Getting complicated it seems. I have to admit that I wonder whether we >> wouldn't be better off leaving the current logic as is. > > Another possible solution (more like a band aid) is to increase the > buffer size from 4 pages to 8 or 16. That would likely allow to cope > fine with the high throughput of boot messages. You mean the buffer whose size is controlled by serial_tx_buffer? On large systems one may want to simply make use of the command line option then; I don't think the built-in default needs changing. Or if so, then perhaps not statically at build time, but taking into account system properties (like CPU count). Jan