Re: [XEN PATCH v2 1/4] build,include: rework shell script for headers++.chk

2022-06-14 Thread Jan Beulich
On 14.06.2022 18:22, Anthony PERARD wrote:
> The command line generated for headers++.chk by make is quite long,
> and in some environment it is too long. This issue have been seen in
> Yocto build environment.
> 
> Error messages:
> make[9]: execvp: /bin/sh: Argument list too long
> make[9]: *** [include/Makefile:181: include/headers++.chk] Error 127
> 
> Rework so that we do the foreach loop in shell rather that make to
> reduce the command line size by a lot. We also need a way to get
> headers prerequisite for some public headers so we use a switch "case"
> in shell to be able to do some simple pattern matching. Variables
> alone in POSIX shell don't allow to work with associative array or
> variables with "/".
> 
> Reported-by: Bertrand Marquis 
> Fixes: 28e13c7f43 ("build: xen/include: use if_changed")
> Signed-off-by: Anthony PERARD 
> Reviewed-by: Bertrand Marquis 
> ---
> 
> Notes:
> v2:
> - fix typo in commit message
> - fix out-of-tree build
> 
> v1:
> - was sent as a reply to v1 of the series
> 
>  xen/include/Makefile | 17 +
>  1 file changed, 13 insertions(+), 4 deletions(-)
> 
> diff --git a/xen/include/Makefile b/xen/include/Makefile
> index 6d9bcc19b0..49c75a78f9 100644
> --- a/xen/include/Makefile
> +++ b/xen/include/Makefile
> @@ -158,13 +158,22 @@ define cmd_headerscxx_chk
>   touch $@.new; \
>   exit 0;   \
>   fi;   \
> - $(foreach i, $(filter %.h,$^),\
> - echo "#include "\"$(i)\"  \
> + get_prereq() {\
> + case $$1 in   \
> + $(foreach i, $(filter %.h,$^),\
> + $(if $($(patsubst $(srctree)/%,%,$(i))-prereq),   \
> + $(i)$(close)  \
> + echo "$(foreach j, $($(patsubst $(srctree)/%,%,$(i))-prereq), \
> + -include c$(j))";;))  \
> + esac; \

If I'm reading this right (indentation looks to be a little misleading
and hence one needs to count parentheses) the case statement could (in
theory) remain without any "body". As per the command language spec I'm
looking at this (if it works in the first place) is an extension, and
formally there's always at least one label required. Since we aim to be
portable in such regards, I'd like to request that there be a final
otherwise empty *) line.

> + };\
> + for i in $(filter %.h,$^); do \
> + echo "#include "\"$$i\"   \
>   | $(CXX) -x c++ -std=gnu++98 -Wall -Werror -D__XEN_TOOLS__\
> -include stdint.h -include $(srcdir)/public/xen.h   \
> -   $(foreach j, $($(patsubst $(srctree)/%,%,$i)-prereq), -include 
> c$(j)) \
> +   `get_prereq $$i`\

While I know we use back-ticked quoting elsewhere, I'd generally
recommend to use $() for readability. But maybe others view this
exactly the other way around ...

And a question without good context to put at: Isn't headers99.chk in
similar need of converting? It looks only slightly less involved than
the C++ one.

Jan

> -S -o /dev/null -   \
> - || exit $$?; echo $(i) >> $@.new;) \
> + || exit $$?; echo $$i >> $@.new; done;\
>   mv $@.new $@
>  endef
>  




Re: [PATCH V4 0/8] virtio: Solution to restrict memory access under Xen using xen-grant DMA-mapping layer

2022-06-14 Thread Viresh Kumar
Hi Oleksandr,

On Mon, Jun 6, 2022 at 10:16 AM Oleksandr Tyshchenko
 wrote:
> The high level idea is to create new Xen’s grant table based DMA-mapping 
> layer for the guest Linux whose main
> purpose is to provide a special 64-bit DMA address which is formed by using 
> the grant reference (for a page
> to be shared with the backend) with offset and setting the highest address 
> bit (this is for the backend to
> be able to distinguish grant ref based DMA address from normal GPA). For this 
> to work we need the ability
> to allocate contiguous (consecutive) grant references for multi-page 
> allocations. And the backend then needs
> to offer VIRTIO_F_ACCESS_PLATFORM and VIRTIO_F_VERSION_1 feature bits (it 
> must support virtio-mmio modern
> transport for 64-bit addresses in the virtqueue).

I was trying your series, from Linus's tree now and started seeing
boot failures,
failed to mount rootfs. And the reason probably is these messages:

[ 1.222498] virtio_scsi virtio1: device must provide VIRTIO_F_ACCESS_PLATFORM
[ 1.316334] virtio_net virtio0: device must provide VIRTIO_F_ACCESS_PLATFORM

I understand from your email that the backends need to offer
VIRTIO_F_ACCESS_PLATFORM flag now, but should this requirement be a
bit soft ? I mean shouldn't we allow both types of backends to run with the same
kernel, ones that offer this feature and others that don't ? The ones that don't
offer the feature, should continue to work like they used to, i.e.
without the restricted
memory access feature.

I am testing Xen currently with help of Qemu  over my x86 desktop and
these backends
(scsi and net) are part of QEMU itself I think, and I don't really
want to go and make the
change there.

Thanks.

--
Viresh



Re: [XEN PATCH v2 3/4] build: set PERL

2022-06-14 Thread Jan Beulich
On 14.06.2022 18:22, Anthony PERARD wrote:
> --- a/xen/Makefile
> +++ b/xen/Makefile
> @@ -22,6 +22,7 @@ PYTHON_INTERPRETER  := $(word 1,$(shell which python3 
> python python2 2>/dev/null)
>  export PYTHON?= $(PYTHON_INTERPRETER)
>  
>  export CHECKPOLICY   ?= checkpolicy
> +export PERL  ?= perl
>  
>  $(if $(filter __%, $(MAKECMDGOALS)), \
>  $(error targets prefixed with '__' are only for internal use))

Considering my patch yesterday that moved the exporting of CC etc, I
wonder if - at the very least for consistency - this and the neighbouring
pre-existing exports shouldn't all be moved below the inclusion of
./Config.mk as well. After all ./config might override any of those.
(Yes, the ones here don't prevent such overriding, but only as long as
there aren't any further make quirks.)

Since this may want doing in a separate patch, this one is
Acked-by: Jan Beulich 

Jan



Re: [PATCH 23/36] arm64,smp: Remove trace_.*_rcuidle() usage

2022-06-14 Thread Marc Zyngier
On Tue, 14 Jun 2022 17:24:48 +0100,
Mark Rutland  wrote:
> 
> On Wed, Jun 08, 2022 at 04:27:46PM +0200, Peter Zijlstra wrote:
> > Ever since commit d3afc7f12987 ("arm64: Allow IPIs to be handled as
> > normal interrupts") this function is called in regular IRQ context.
> > 
> > Signed-off-by: Peter Zijlstra (Intel) 
> 
> [adding Marc since he authored that commit]
> 
> Makes sense to me:
> 
>   Acked-by: Mark Rutland 
> 
> Mark.
> 
> > ---
> >  arch/arm64/kernel/smp.c |4 ++--
> >  1 file changed, 2 insertions(+), 2 deletions(-)
> > 
> > --- a/arch/arm64/kernel/smp.c
> > +++ b/arch/arm64/kernel/smp.c
> > @@ -865,7 +865,7 @@ static void do_handle_IPI(int ipinr)
> > unsigned int cpu = smp_processor_id();
> >  
> > if ((unsigned)ipinr < NR_IPI)
> > -   trace_ipi_entry_rcuidle(ipi_types[ipinr]);
> > +   trace_ipi_entry(ipi_types[ipinr]);
> >  
> > switch (ipinr) {
> > case IPI_RESCHEDULE:
> > @@ -914,7 +914,7 @@ static void do_handle_IPI(int ipinr)
> > }
> >  
> > if ((unsigned)ipinr < NR_IPI)
> > -   trace_ipi_exit_rcuidle(ipi_types[ipinr]);
> > +   trace_ipi_exit(ipi_types[ipinr]);
> >  }
> >  
> >  static irqreturn_t ipi_handler(int irq, void *data)

Acked-by: Marc Zyngier 

M.

-- 
Without deviation from the norm, progress is not possible.



Re: [PATCH 34.5/36] cpuidle,omap4: Push RCU-idle into omap4_enter_lowpower()

2022-06-14 Thread Tony Lindgren
Hi,

Adding Aaro Koskinen and Peter Vasil for pm24xx for n800 and n810 related
idle.

* Peter Zijlstra  [220614 22:07]:
> On Mon, Jun 13, 2022 at 03:39:05PM +0300, Tony Lindgren wrote:
> > OMAP4 uses full SoC suspend modes as idle states, as such it needs the
> > whole power-domain and clock-domain code from the idle path.
> > 
> > All that code is not suitable to run with RCU disabled, as such push
> > RCU-idle deeper still.
> > 
> > Signed-off-by: Tony Lindgren 
> > ---
> > 
> > Peter here's one more for your series, looks like this is needed to avoid
> > warnings similar to what you did for omap3.
> 
> Thanks Tony!
> 
> I've had a brief look at omap2_pm_idle() and do I understand it right
> that something like the below patch would reduce it to a simple 'WFI'?

Yes that should do for omap2_do_wfi().

> What do I do with the rest of that code, because I don't think this
> thing has a cpuidle driver to take over, effectively turning it into
> dead code.

As we are establishing a policy where deeper idle states must be
handled by cpuidle, and for most part that has been the case for at least
10 years, I'd just drop the unused functions with an explanation in the
patch why we're doing it. Or the functions could be tagged with
__maybe_unused if folks prefer that.

In the pm24xx case we are not really causing a regression for users as
there are still pending patches to make n800 and n810 truly usable with
the mainline kernel. At least the PMIC and LCD related patches need some
work [0]. The deeper idle states can be added back later using cpuidle
as needed so we have a clear path.

Aaro & Peter V, do you have any better suggestions here as this will
mostly affect you guys currently?

Regards,

Tony

[0] 
https://lore.kernel.org/linux-omap/20211224214512.1583430-1-peter.va...@gmail.com/


> --- a/arch/arm/mach-omap2/pm24xx.c
> +++ b/arch/arm/mach-omap2/pm24xx.c
> @@ -126,10 +126,20 @@ static int omap2_allow_mpu_retention(voi
>   return 1;
>  }
>  
> -static void omap2_enter_mpu_retention(void)
> +static void omap2_do_wfi(void)
>  {
>   const int zero = 0;
>  
> + /* WFI */
> + asm("mcr p15, 0, %0, c7, c0, 4" : : "r" (zero) : "memory", "cc");
> +}
> +
> +#if 0
> +/*
> + * possible cpuidle implementation between WFI and full_retention above
> + */
> +static void omap2_enter_mpu_retention(void)
> +{
>   /* The peripherals seem not to be able to wake up the MPU when
>* it is in retention mode. */
>   if (omap2_allow_mpu_retention()) {
> @@ -146,8 +157,7 @@ static void omap2_enter_mpu_retention(vo
>   pwrdm_set_next_pwrst(mpu_pwrdm, PWRDM_POWER_ON);
>   }
>  
> - /* WFI */
> - asm("mcr p15, 0, %0, c7, c0, 4" : : "r" (zero) : "memory", "cc");
> + omap2_do_wfi();
>  
>   pwrdm_set_next_pwrst(mpu_pwrdm, PWRDM_POWER_ON);
>  }
> @@ -161,6 +171,7 @@ static int omap2_can_sleep(void)
>  
>   return 1;
>  }
> +#endif
>  
>  static void omap2_pm_idle(void)
>  {
> @@ -169,6 +180,7 @@ static void omap2_pm_idle(void)
>   if (omap_irq_pending())
>   return;
>  
> +#if 0
>   error = cpu_cluster_pm_enter();
>   if (error || !omap2_can_sleep()) {
>   omap2_enter_mpu_retention();
> @@ -179,6 +191,9 @@ static void omap2_pm_idle(void)
>  
>  out_cpu_cluster_pm:
>   cpu_cluster_pm_exit();
> +#else
> + omap2_do_wfi();
> +#endif
>  }
>  
>  static void __init prcm_setup_regs(void)



Re: [Pv-drivers] [PATCH 29/36] cpuidle, xenpv: Make more PARAVIRT_XXL noinstr clean

2022-06-14 Thread Peter Zijlstra
On Mon, Jun 13, 2022 at 07:23:13PM +, Nadav Amit wrote:
> On Jun 13, 2022, at 11:48 AM, Srivatsa S. Bhat  wrote:
> 
> > ⚠ External Email
> > 
> > On 6/8/22 4:27 PM, Peter Zijlstra wrote:
> >> vmlinux.o: warning: objtool: acpi_idle_enter_s2idle+0xde: call to wbinvd() 
> >> leaves .noinstr.text section
> >> vmlinux.o: warning: objtool: default_idle+0x4: call to arch_safe_halt() 
> >> leaves .noinstr.text section
> >> vmlinux.o: warning: objtool: xen_safe_halt+0xa: call to 
> >> HYPERVISOR_sched_op.constprop.0() leaves .noinstr.text section
> >> 
> >> Signed-off-by: Peter Zijlstra (Intel) 
> > 
> > Reviewed-by: Srivatsa S. Bhat (VMware) 
> > 
> >> 
> >> -static inline void wbinvd(void)
> >> +extern noinstr void pv_native_wbinvd(void);
> >> +
> >> +static __always_inline void wbinvd(void)
> >> {
> >>  PVOP_ALT_VCALL0(cpu.wbinvd, "wbinvd", ALT_NOT(X86_FEATURE_XENPV));
> >> }
> 
> I guess it is yet another instance of wrong accounting of GCC for
> the assembly blocks’ weight. I guess it is not a solution for older
> GCCs, but presumably PVOP_ALT_CALL() and friends should have
> used asm_inline or some new “asm_volatile_inline” variant.

Partially, some of the *SAN options also generate a metric ton of
nonsense when enabled and skew the compilers towards not inlining
things.



Re: [PATCH 15/36] cpuidle,cpu_pm: Remove RCU fiddling from cpu_pm_{enter,exit}()

2022-06-14 Thread Peter Zijlstra
On Tue, Jun 14, 2022 at 05:13:16PM +0100, Mark Rutland wrote:
> On Wed, Jun 08, 2022 at 04:27:38PM +0200, Peter Zijlstra wrote:
> > All callers should still have RCU enabled.
> 
> IIUC with that true we should be able to drop the RCU_NONIDLE() from
> drivers/perf/arm_pmu.c, as we only needed that for an invocation via a pm
> notifier.
> 
> I should be able to give that a spin on some hardware.
> 
> > 
> > Signed-off-by: Peter Zijlstra (Intel) 
> > ---
> >  kernel/cpu_pm.c |9 -
> >  1 file changed, 9 deletions(-)
> > 
> > --- a/kernel/cpu_pm.c
> > +++ b/kernel/cpu_pm.c
> > @@ -30,16 +30,9 @@ static int cpu_pm_notify(enum cpu_pm_eve
> >  {
> > int ret;
> >  
> > -   /*
> > -* This introduces a RCU read critical section, which could be
> > -* disfunctional in cpu idle. Copy RCU_NONIDLE code to let RCU know
> > -* this.
> > -*/
> > -   rcu_irq_enter_irqson();
> > rcu_read_lock();
> > ret = raw_notifier_call_chain(&cpu_pm_notifier.chain, event, NULL);
> > rcu_read_unlock();
> > -   rcu_irq_exit_irqson();
> 
> To make this easier to debug, is it worth adding an assertion that RCU is
> watching here? e.g.
> 
>   RCU_LOCKDEP_WARN(!rcu_is_watching(),
>"cpu_pm_notify() used illegally from EQS");
> 

My understanding is that rcu_read_lock() implies something along those
lines when PROVE_RCU.



Re: [PATCH 14/36] cpuidle: Fix rcu_idle_*() usage

2022-06-14 Thread Peter Zijlstra
On Tue, Jun 14, 2022 at 01:41:13PM +0100, Mark Rutland wrote:
> On Wed, Jun 08, 2022 at 04:27:37PM +0200, Peter Zijlstra wrote:
> > --- a/kernel/time/tick-broadcast.c
> > +++ b/kernel/time/tick-broadcast.c
> > @@ -622,9 +622,13 @@ struct cpumask *tick_get_broadcast_onesh
> >   * to avoid a deep idle transition as we are about to get the
> >   * broadcast IPI right away.
> >   */
> > -int tick_check_broadcast_expired(void)
> > +noinstr int tick_check_broadcast_expired(void)
> >  {
> > +#ifdef _ASM_GENERIC_BITOPS_INSTRUMENTED_NON_ATOMIC_H
> > +   return arch_test_bit(smp_processor_id(), 
> > cpumask_bits(tick_broadcast_force_mask));
> > +#else
> > return cpumask_test_cpu(smp_processor_id(), tick_broadcast_force_mask);
> > +#endif
> >  }
> 
> This is somewhat not-ideal. :/

I'll say.

> Could we unconditionally do the arch_test_bit() variant, with a comment, or
> does that not exist in some cases?

Loads of build errors ensued, which is how I ended up with this mess ...



Re: [PATCH 00/36] cpuidle,rcu: Cleanup the mess

2022-06-14 Thread Peter Zijlstra
On Tue, Jun 14, 2022 at 12:19:29PM +0100, Mark Rutland wrote:
> On Wed, Jun 08, 2022 at 04:27:23PM +0200, Peter Zijlstra wrote:
> > Hi All! (omg so many)
> 
> Hi Peter,
> 
> Sorry for the delay; my plate has also been rather full recently. I'm 
> beginning
> to page this in now.

No worries; we all have too much to do ;-)

> > These here few patches mostly clear out the utter mess that is cpuidle vs 
> > rcuidle.
> > 
> > At the end of the ride there's only 2 real RCU_NONIDLE() users left
> > 
> >   arch/arm64/kernel/suspend.c:RCU_NONIDLE(__cpu_suspend_exit());
> >   drivers/perf/arm_pmu.c: RCU_NONIDLE(armpmu_start(event, 
> > PERF_EF_RELOAD));
> 
> The latter of these is necessary because apparently PM notifiers are called
> with RCU not watching. Is that still the case today (or at the end of this
> series)? If so, that feels like fertile land for more issues (yaey...). If 
> not,
> we should be able to drop this.

That should be fixed; fingers crossed :-)

> >   kernel/cfi.c:   RCU_NONIDLE({
> > 
> > (the CFI one is likely dead in the kCFI rewrite) and there's only a hand 
> > full
> > of trace_.*_rcuidle() left:
> > 
> >   kernel/trace/trace_preemptirq.c:
> > trace_irq_enable_rcuidle(CALLER_ADDR0, CALLER_ADDR1);
> >   kernel/trace/trace_preemptirq.c:
> > trace_irq_disable_rcuidle(CALLER_ADDR0, CALLER_ADDR1);
> >   kernel/trace/trace_preemptirq.c:
> > trace_irq_enable_rcuidle(CALLER_ADDR0, caller_addr);
> >   kernel/trace/trace_preemptirq.c:
> > trace_irq_disable_rcuidle(CALLER_ADDR0, caller_addr);
> >   kernel/trace/trace_preemptirq.c:
> > trace_preempt_enable_rcuidle(a0, a1);
> >   kernel/trace/trace_preemptirq.c:
> > trace_preempt_disable_rcuidle(a0, a1);
> > 
> > All of them are in 'deprecated' code that is unused for GENERIC_ENTRY.
> 
> I think those are also unused on arm64 too?
> 
> If not, I can go attack that.

My grep spots:

arch/arm64/kernel/entry-common.c:   trace_hardirqs_on();
arch/arm64/include/asm/daifflags.h: trace_hardirqs_off();
arch/arm64/include/asm/daifflags.h: trace_hardirqs_off();

The _on thing should be replaced with something like:

trace_hardirqs_on_prepare();
lockdep_hardirqs_on_prepare();
instrumentation_end();
rcu_irq_exit();
lockdep_hardirqs_on(CALLER_ADDR0);

(as I think you know, since you have some of that already). And
something similar for the _off thing, but with _off_finish().

> > I've touched a _lot_ of code that I can't test and likely broken some of it 
> > :/
> > In particular, the whole ARM cpuidle stuff was quite involved with OMAP 
> > being
> > the absolute 'winner'.
> > 
> > I'm hoping Mark can help me sort the remaining ARM64 bits as he moves that 
> > to
> > GENERIC_ENTRY.
> 
> Moving to GENERIC_ENTRY as a whole is going to take a tonne of work
> (refactoring both arm64 and the generic portion to be more amenable to each
> other), but we can certainly move closer to that for the bits that matter 
> here.

I know ... been there etc.. :-)

> Maybe we want a STRICT_ENTRY option to get rid of all the deprecated stuff 
> that
> we can select regardless of GENERIC_ENTRY to make that easier.

Possible yeah.

> > I've also got a note that says ARM64 can probably do a WFE based
> > idle state and employ TIF_POLLING_NRFLAG to avoid some IPIs.
> 
> Possibly; I'm not sure how much of a win that'll be given that by default 
> we'll
> have a ~10KHz WFE wakeup from the timer, but we could take a peek.

Ohh.. I didn't know it woke up *that* often. I just know Will made use
of it in things like smp_cond_load_relaxed() which would be somewhat
similar to a very shallow idle state that looks at the TIF word.



Re: [PATCH 34.5/36] cpuidle,omap4: Push RCU-idle into omap4_enter_lowpower()

2022-06-14 Thread Peter Zijlstra
On Mon, Jun 13, 2022 at 03:39:05PM +0300, Tony Lindgren wrote:
> OMAP4 uses full SoC suspend modes as idle states, as such it needs the
> whole power-domain and clock-domain code from the idle path.
> 
> All that code is not suitable to run with RCU disabled, as such push
> RCU-idle deeper still.
> 
> Signed-off-by: Tony Lindgren 
> ---
> 
> Peter here's one more for your series, looks like this is needed to avoid
> warnings similar to what you did for omap3.

Thanks Tony!

I've had a brief look at omap2_pm_idle() and do I understand it right
that something like the below patch would reduce it to a simple 'WFI'?

What do I do with the rest of that code, because I don't think this
thing has a cpuidle driver to take over, effectively turning it into
dead code.

--- a/arch/arm/mach-omap2/pm24xx.c
+++ b/arch/arm/mach-omap2/pm24xx.c
@@ -126,10 +126,20 @@ static int omap2_allow_mpu_retention(voi
return 1;
 }
 
-static void omap2_enter_mpu_retention(void)
+static void omap2_do_wfi(void)
 {
const int zero = 0;
 
+   /* WFI */
+   asm("mcr p15, 0, %0, c7, c0, 4" : : "r" (zero) : "memory", "cc");
+}
+
+#if 0
+/*
+ * possible cpuidle implementation between WFI and full_retention above
+ */
+static void omap2_enter_mpu_retention(void)
+{
/* The peripherals seem not to be able to wake up the MPU when
 * it is in retention mode. */
if (omap2_allow_mpu_retention()) {
@@ -146,8 +157,7 @@ static void omap2_enter_mpu_retention(vo
pwrdm_set_next_pwrst(mpu_pwrdm, PWRDM_POWER_ON);
}
 
-   /* WFI */
-   asm("mcr p15, 0, %0, c7, c0, 4" : : "r" (zero) : "memory", "cc");
+   omap2_do_wfi();
 
pwrdm_set_next_pwrst(mpu_pwrdm, PWRDM_POWER_ON);
 }
@@ -161,6 +171,7 @@ static int omap2_can_sleep(void)
 
return 1;
 }
+#endif
 
 static void omap2_pm_idle(void)
 {
@@ -169,6 +180,7 @@ static void omap2_pm_idle(void)
if (omap_irq_pending())
return;
 
+#if 0
error = cpu_cluster_pm_enter();
if (error || !omap2_can_sleep()) {
omap2_enter_mpu_retention();
@@ -179,6 +191,9 @@ static void omap2_pm_idle(void)
 
 out_cpu_cluster_pm:
cpu_cluster_pm_exit();
+#else
+   omap2_do_wfi();
+#endif
 }
 
 static void __init prcm_setup_regs(void)



Re: [PATCH 16/36] rcu: Fix rcu_idle_exit()

2022-06-14 Thread Paul E. McKenney
On Wed, Jun 08, 2022 at 04:27:39PM +0200, Peter Zijlstra wrote:
> Current rcu_idle_exit() is terminally broken because it uses
> local_irq_{save,restore}(), which are traced which uses RCU.
> 
> However, now that all the callers are sure to have IRQs disabled, we
> can remove these calls.
> 
> Signed-off-by: Peter Zijlstra (Intel) 
> Acked-by: Paul E. McKenney 

We have some fun conflicts between this series and Frederic's context-tracking
series.  But it looks like these can be resolved by:

1.  A patch on top of Frederic's series that provides the old rcu_*()
names for the functions now prefixed with ct_*() such as
ct_idle_exit().

2.  Another patch on top of Frederic's series that takes the
changes remaining from this patch, shown below.  Frederic's
series uses raw_local_irq_save() and raw_local_irq_restore(),
which can then be removed.

Or is there a better way to do this?

Thanx, Paul



commit f64cee8c159e9863a74594efe3d33fb513a6a7b5
Author: Peter Zijlstra 
Date:   Tue Jun 14 17:24:43 2022 -0700

context_tracking: Interrupts always disabled for ct_idle_exit()

Now that the idle-loop cleanups have ensured that rcu_idle_exit() is
always invoked with interrupts disabled, remove the interrupt disabling
in favor of a debug check.

Signed-off-by: Peter Zijlstra 
Cc: Frederic Weisbecker 
Signed-off-by: Paul E. McKenney 

diff --git a/kernel/context_tracking.c b/kernel/context_tracking.c
index 1da44803fd319..99310cf5b0254 100644
--- a/kernel/context_tracking.c
+++ b/kernel/context_tracking.c
@@ -332,11 +332,8 @@ EXPORT_SYMBOL_GPL(ct_idle_enter);
  */
 void noinstr ct_idle_exit(void)
 {
-   unsigned long flags;
-
-   raw_local_irq_save(flags);
+   WARN_ON_ONCE(IS_ENABLED(CONFIG_RCU_EQS_DEBUG) && !raw_irqs_disabled());
ct_kernel_enter(false, RCU_DYNTICKS_IDX - CONTEXT_IDLE);
-   raw_local_irq_restore(flags);
 }
 EXPORT_SYMBOL_GPL(ct_idle_exit);
 



Re: [PATCH 00/36] cpuidle,rcu: Cleanup the mess

2022-06-14 Thread Mark Rutland
On Tue, Jun 14, 2022 at 06:58:30PM +0200, Peter Zijlstra wrote:
> On Tue, Jun 14, 2022 at 12:19:29PM +0100, Mark Rutland wrote:
> > On Wed, Jun 08, 2022 at 04:27:23PM +0200, Peter Zijlstra wrote:
> > > Hi All! (omg so many)
> > 
> > Hi Peter,
> > 
> > Sorry for the delay; my plate has also been rather full recently. I'm 
> > beginning
> > to page this in now.
> 
> No worries; we all have too much to do ;-)
> 
> > > These here few patches mostly clear out the utter mess that is cpuidle vs 
> > > rcuidle.
> > > 
> > > At the end of the ride there's only 2 real RCU_NONIDLE() users left
> > > 
> > >   arch/arm64/kernel/suspend.c:
> > > RCU_NONIDLE(__cpu_suspend_exit());
> > >   drivers/perf/arm_pmu.c: RCU_NONIDLE(armpmu_start(event, 
> > > PERF_EF_RELOAD));
> > 
> > The latter of these is necessary because apparently PM notifiers are called
> > with RCU not watching. Is that still the case today (or at the end of this
> > series)? If so, that feels like fertile land for more issues (yaey...). If 
> > not,
> > we should be able to drop this.
> 
> That should be fixed; fingers crossed :-)

Cool; I'll try to give that a spin when I'm sat next to some relevant hardware. 
:)

> > >   kernel/cfi.c:   RCU_NONIDLE({
> > > 
> > > (the CFI one is likely dead in the kCFI rewrite) and there's only a hand 
> > > full
> > > of trace_.*_rcuidle() left:
> > > 
> > >   kernel/trace/trace_preemptirq.c:
> > > trace_irq_enable_rcuidle(CALLER_ADDR0, CALLER_ADDR1);
> > >   kernel/trace/trace_preemptirq.c:
> > > trace_irq_disable_rcuidle(CALLER_ADDR0, CALLER_ADDR1);
> > >   kernel/trace/trace_preemptirq.c:
> > > trace_irq_enable_rcuidle(CALLER_ADDR0, caller_addr);
> > >   kernel/trace/trace_preemptirq.c:
> > > trace_irq_disable_rcuidle(CALLER_ADDR0, caller_addr);
> > >   kernel/trace/trace_preemptirq.c:
> > > trace_preempt_enable_rcuidle(a0, a1);
> > >   kernel/trace/trace_preemptirq.c:
> > > trace_preempt_disable_rcuidle(a0, a1);
> > > 
> > > All of them are in 'deprecated' code that is unused for GENERIC_ENTRY.
> > I think those are also unused on arm64 too?
> > 
> > If not, I can go attack that.
> 
> My grep spots:
> 
> arch/arm64/kernel/entry-common.c:   trace_hardirqs_on();
> arch/arm64/include/asm/daifflags.h: trace_hardirqs_off();
> arch/arm64/include/asm/daifflags.h: trace_hardirqs_off();

Ah; I hadn't realised those used trace_.*_rcuidle() behind the scenes.

That affects local_irq_{enable,disable,restore}() too (which is what the
daifflags.h bits are emulating), and also the generic entry code's
irqentry_exit().

So it feels to me like we should be fixing those more generally? e.g. say that
with a new STRICT_ENTRY[_RCU], we can only call trace_hardirqs_{on,off}() with
RCU watching, and alter the definition of those?

> The _on thing should be replaced with something like:
> 
>   trace_hardirqs_on_prepare();
>   lockdep_hardirqs_on_prepare();
>   instrumentation_end();
>   rcu_irq_exit();
>   lockdep_hardirqs_on(CALLER_ADDR0);
> 
> (as I think you know, since you have some of that already). And
> something similar for the _off thing, but with _off_finish().

Sure; I knew that was necessary for the outermost parts of entry (and I think
that's all handled), I just hadn't realised that trace_hardirqs_{on,off} did
the rcuidle thing in the middle.

It'd be nice to not have to open-code the whole sequence everywhere for the
portions which run after entry and are instrumentable, so (as above) I reckon
we want to make trace_hardirqs_{on,off}() not do the rcuidle part
unnecessarily (which IIUC is an end-goal anyway)?

> > > I've touched a _lot_ of code that I can't test and likely broken some of 
> > > it :/
> > > In particular, the whole ARM cpuidle stuff was quite involved with OMAP 
> > > being
> > > the absolute 'winner'.
> > > 
> > > I'm hoping Mark can help me sort the remaining ARM64 bits as he moves 
> > > that to
> > > GENERIC_ENTRY.
> > 
> > Moving to GENERIC_ENTRY as a whole is going to take a tonne of work
> > (refactoring both arm64 and the generic portion to be more amenable to each
> > other), but we can certainly move closer to that for the bits that matter 
> > here.
> 
> I know ... been there etc.. :-)
> 
> > Maybe we want a STRICT_ENTRY option to get rid of all the deprecated stuff 
> > that
> > we can select regardless of GENERIC_ENTRY to make that easier.
> 
> Possible yeah.
> 
> > > I've also got a note that says ARM64 can probably do a WFE based
> > > idle state and employ TIF_POLLING_NRFLAG to avoid some IPIs.
> > 
> > Possibly; I'm not sure how much of a win that'll be given that by default 
> > we'll
> > have a ~10KHz WFE wakeup from the timer, but we could take a peek.
> 
> Ohh.. I didn't know it woke up *that* often. I just know Will made use
> of it in things like smp_cond_load_relaxed() which would be s

Re: [PATCH 15/36] cpuidle,cpu_pm: Remove RCU fiddling from cpu_pm_{enter,exit}()

2022-06-14 Thread Mark Rutland
On Tue, Jun 14, 2022 at 06:42:14PM +0200, Peter Zijlstra wrote:
> On Tue, Jun 14, 2022 at 05:13:16PM +0100, Mark Rutland wrote:
> > On Wed, Jun 08, 2022 at 04:27:38PM +0200, Peter Zijlstra wrote:
> > > All callers should still have RCU enabled.
> > 
> > IIUC with that true we should be able to drop the RCU_NONIDLE() from
> > drivers/perf/arm_pmu.c, as we only needed that for an invocation via a pm
> > notifier.
> > 
> > I should be able to give that a spin on some hardware.
> > 
> > > 
> > > Signed-off-by: Peter Zijlstra (Intel) 
> > > ---
> > >  kernel/cpu_pm.c |9 -
> > >  1 file changed, 9 deletions(-)
> > > 
> > > --- a/kernel/cpu_pm.c
> > > +++ b/kernel/cpu_pm.c
> > > @@ -30,16 +30,9 @@ static int cpu_pm_notify(enum cpu_pm_eve
> > >  {
> > >   int ret;
> > >  
> > > - /*
> > > -  * This introduces a RCU read critical section, which could be
> > > -  * disfunctional in cpu idle. Copy RCU_NONIDLE code to let RCU know
> > > -  * this.
> > > -  */
> > > - rcu_irq_enter_irqson();
> > >   rcu_read_lock();
> > >   ret = raw_notifier_call_chain(&cpu_pm_notifier.chain, event, NULL);
> > >   rcu_read_unlock();
> > > - rcu_irq_exit_irqson();
> > 
> > To make this easier to debug, is it worth adding an assertion that RCU is
> > watching here? e.g.
> > 
> > RCU_LOCKDEP_WARN(!rcu_is_watching(),
> >  "cpu_pm_notify() used illegally from EQS");
> > 
> 
> My understanding is that rcu_read_lock() implies something along those
> lines when PROVE_RCU.

Ah, duh. Given that:

Acked-by: Mark Rutland 

Mark.



Re: [PATCH 14/36] cpuidle: Fix rcu_idle_*() usage

2022-06-14 Thread Mark Rutland
On Tue, Jun 14, 2022 at 06:40:53PM +0200, Peter Zijlstra wrote:
> On Tue, Jun 14, 2022 at 01:41:13PM +0100, Mark Rutland wrote:
> > On Wed, Jun 08, 2022 at 04:27:37PM +0200, Peter Zijlstra wrote:
> > > --- a/kernel/time/tick-broadcast.c
> > > +++ b/kernel/time/tick-broadcast.c
> > > @@ -622,9 +622,13 @@ struct cpumask *tick_get_broadcast_onesh
> > >   * to avoid a deep idle transition as we are about to get the
> > >   * broadcast IPI right away.
> > >   */
> > > -int tick_check_broadcast_expired(void)
> > > +noinstr int tick_check_broadcast_expired(void)
> > >  {
> > > +#ifdef _ASM_GENERIC_BITOPS_INSTRUMENTED_NON_ATOMIC_H
> > > + return arch_test_bit(smp_processor_id(), 
> > > cpumask_bits(tick_broadcast_force_mask));
> > > +#else
> > >   return cpumask_test_cpu(smp_processor_id(), tick_broadcast_force_mask);
> > > +#endif
> > >  }
> > 
> > This is somewhat not-ideal. :/
> 
> I'll say.
> 
> > Could we unconditionally do the arch_test_bit() variant, with a comment, or
> > does that not exist in some cases?
> 
> Loads of build errors ensued, which is how I ended up with this mess ...

Yaey :(

I see the same is true for the thread flag manipulation too.

I'll take a look and see if we can layer things so that we can use the arch_*()
helpers and wrap those consistently so that we don't have to check the CPP
guard.

Ideally we'd have a a better language that allows us to make some
context-senstive decisions, then we could hide all this gunk in the lower
levels with somethin like:

if (!THIS_IS_A_NOINSTR_FUNCTION()) {
explicit_instrumentation(...);
}

... ho hum.

Mark.



Re: [PATCH 16/36] rcu: Fix rcu_idle_exit()

2022-06-14 Thread Mark Rutland
On Wed, Jun 08, 2022 at 04:27:39PM +0200, Peter Zijlstra wrote:
> Current rcu_idle_exit() is terminally broken because it uses
> local_irq_{save,restore}(), which are traced which uses RCU.
> 
> However, now that all the callers are sure to have IRQs disabled, we
> can remove these calls.
> 
> Signed-off-by: Peter Zijlstra (Intel) 
> Acked-by: Paul E. McKenney 

Acked-by: Mark Rutland 

Mark.

> ---
>  kernel/rcu/tree.c |9 +++--
>  1 file changed, 3 insertions(+), 6 deletions(-)
> 
> --- a/kernel/rcu/tree.c
> +++ b/kernel/rcu/tree.c
> @@ -659,7 +659,7 @@ static noinstr void rcu_eqs_enter(bool u
>   * If you add or remove a call to rcu_idle_enter(), be sure to test with
>   * CONFIG_RCU_EQS_DEBUG=y.
>   */
> -void rcu_idle_enter(void)
> +void noinstr rcu_idle_enter(void)
>  {
>   lockdep_assert_irqs_disabled();
>   rcu_eqs_enter(false);
> @@ -896,13 +896,10 @@ static void noinstr rcu_eqs_exit(bool us
>   * If you add or remove a call to rcu_idle_exit(), be sure to test with
>   * CONFIG_RCU_EQS_DEBUG=y.
>   */
> -void rcu_idle_exit(void)
> +void noinstr rcu_idle_exit(void)
>  {
> - unsigned long flags;
> -
> - local_irq_save(flags);
> + lockdep_assert_irqs_disabled();
>   rcu_eqs_exit(false);
> - local_irq_restore(flags);
>  }
>  EXPORT_SYMBOL_GPL(rcu_idle_exit);
>  
> 
> 



Re: [PATCH 20/36] arch/idle: Change arch_cpu_idle() IRQ behaviour

2022-06-14 Thread Mark Rutland
On Wed, Jun 08, 2022 at 04:27:43PM +0200, Peter Zijlstra wrote:
> Current arch_cpu_idle() is called with IRQs disabled, but will return
> with IRQs enabled.
> 
> However, the very first thing the generic code does after calling
> arch_cpu_idle() is raw_local_irq_disable(). This means that
> architectures that can idle with IRQs disabled end up doing a
> pointless 'enable-disable' dance.
> 
> Therefore, push this IRQ disabling into the idle function, meaning
> that those architectures can avoid the pointless IRQ state flipping.
> 
> Signed-off-by: Peter Zijlstra (Intel) 

Nice!

  Acked-by: Mark Rutland  [arm64]

Mark.

> ---
>  arch/alpha/kernel/process.c  |1 -
>  arch/arc/kernel/process.c|3 +++
>  arch/arm/kernel/process.c|1 -
>  arch/arm/mach-gemini/board-dt.c  |3 ++-
>  arch/arm64/kernel/idle.c |1 -
>  arch/csky/kernel/process.c   |1 -
>  arch/csky/kernel/smp.c   |2 +-
>  arch/hexagon/kernel/process.c|1 -
>  arch/ia64/kernel/process.c   |1 +
>  arch/microblaze/kernel/process.c |1 -
>  arch/mips/kernel/idle.c  |8 +++-
>  arch/nios2/kernel/process.c  |1 -
>  arch/openrisc/kernel/process.c   |1 +
>  arch/parisc/kernel/process.c |2 --
>  arch/powerpc/kernel/idle.c   |5 ++---
>  arch/riscv/kernel/process.c  |1 -
>  arch/s390/kernel/idle.c  |1 -
>  arch/sh/kernel/idle.c|1 +
>  arch/sparc/kernel/leon_pmc.c |4 
>  arch/sparc/kernel/process_32.c   |1 -
>  arch/sparc/kernel/process_64.c   |3 ++-
>  arch/um/kernel/process.c |1 -
>  arch/x86/coco/tdx/tdx.c  |3 +++
>  arch/x86/kernel/process.c|   15 ---
>  arch/xtensa/kernel/process.c |1 +
>  kernel/sched/idle.c  |2 --
>  26 files changed, 28 insertions(+), 37 deletions(-)
> 
> --- a/arch/alpha/kernel/process.c
> +++ b/arch/alpha/kernel/process.c
> @@ -57,7 +57,6 @@ EXPORT_SYMBOL(pm_power_off);
>  void arch_cpu_idle(void)
>  {
>   wtint(0);
> - raw_local_irq_enable();
>  }
>  
>  void arch_cpu_idle_dead(void)
> --- a/arch/arc/kernel/process.c
> +++ b/arch/arc/kernel/process.c
> @@ -114,6 +114,8 @@ void arch_cpu_idle(void)
>   "sleep %0   \n"
>   :
>   :"I"(arg)); /* can't be "r" has to be embedded const */
> +
> + raw_local_irq_disable();
>  }
>  
>  #else/* ARC700 */
> @@ -122,6 +124,7 @@ void arch_cpu_idle(void)
>  {
>   /* sleep, but enable both set E1/E2 (levels of interrupts) before 
> committing */
>   __asm__ __volatile__("sleep 0x3 \n");
> + raw_local_irq_disable();
>  }
>  
>  #endif
> --- a/arch/arm/kernel/process.c
> +++ b/arch/arm/kernel/process.c
> @@ -78,7 +78,6 @@ void arch_cpu_idle(void)
>   arm_pm_idle();
>   else
>   cpu_do_idle();
> - raw_local_irq_enable();
>  }
>  
>  void arch_cpu_idle_prepare(void)
> --- a/arch/arm/mach-gemini/board-dt.c
> +++ b/arch/arm/mach-gemini/board-dt.c
> @@ -42,8 +42,9 @@ static void gemini_idle(void)
>*/
>  
>   /* FIXME: Enabling interrupts here is racy! */
> - local_irq_enable();
> + raw_local_irq_enable();
>   cpu_do_idle();
> + raw_local_irq_disable();
>  }
>  
>  static void __init gemini_init_machine(void)
> --- a/arch/arm64/kernel/idle.c
> +++ b/arch/arm64/kernel/idle.c
> @@ -42,5 +42,4 @@ void noinstr arch_cpu_idle(void)
>* tricks
>*/
>   cpu_do_idle();
> - raw_local_irq_enable();
>  }
> --- a/arch/csky/kernel/process.c
> +++ b/arch/csky/kernel/process.c
> @@ -101,6 +101,5 @@ void arch_cpu_idle(void)
>  #ifdef CONFIG_CPU_PM_STOP
>   asm volatile("stop\n");
>  #endif
> - raw_local_irq_enable();
>  }
>  #endif
> --- a/arch/csky/kernel/smp.c
> +++ b/arch/csky/kernel/smp.c
> @@ -314,7 +314,7 @@ void arch_cpu_idle_dead(void)
>   while (!secondary_stack)
>   arch_cpu_idle();
>  
> - local_irq_disable();
> + raw_local_irq_disable();
>  
>   asm volatile(
>   "movsp, %0\n"
> --- a/arch/hexagon/kernel/process.c
> +++ b/arch/hexagon/kernel/process.c
> @@ -44,7 +44,6 @@ void arch_cpu_idle(void)
>  {
>   __vmwait();
>   /*  interrupts wake us up, but irqs are still disabled */
> - raw_local_irq_enable();
>  }
>  
>  /*
> --- a/arch/ia64/kernel/process.c
> +++ b/arch/ia64/kernel/process.c
> @@ -241,6 +241,7 @@ void arch_cpu_idle(void)
>   (*mark_idle)(1);
>  
>   raw_safe_halt();
> + raw_local_irq_disable();
>  
>   if (mark_idle)
>   (*mark_idle)(0);
> --- a/arch/microblaze/kernel/process.c
> +++ b/arch/microblaze/kernel/process.c
> @@ -138,5 +138,4 @@ int dump_fpu(struct pt_regs *regs, elf_f
>  
>  void arch_cpu_idle(void)
>  {
> -   raw_local_irq_enable();
>  }
> --- a/arch/mips/kernel/idle.c
> +++ b/arch/mips/kernel/idle.c
> @@ -33,13 +33,13 @@ static void __cpuidle r3081_wait(void)
>  {
>   unsigned long cfg

Re: [PATCH 25/36] time/tick-broadcast: Remove RCU_NONIDLE usage

2022-06-14 Thread Mark Rutland
On Wed, Jun 08, 2022 at 04:27:48PM +0200, Peter Zijlstra wrote:
> No callers left that have already disabled RCU.
> 
> Signed-off-by: Peter Zijlstra (Intel) 

Acked-by: Mark Rutland 

Mark.

> ---
>  kernel/time/tick-broadcast-hrtimer.c |   29 -
>  1 file changed, 12 insertions(+), 17 deletions(-)
> 
> --- a/kernel/time/tick-broadcast-hrtimer.c
> +++ b/kernel/time/tick-broadcast-hrtimer.c
> @@ -56,25 +56,20 @@ static int bc_set_next(ktime_t expires,
>* hrtimer callback function is currently running, then
>* hrtimer_start() cannot move it and the timer stays on the CPU on
>* which it is assigned at the moment.
> +  */
> + hrtimer_start(&bctimer, expires, HRTIMER_MODE_ABS_PINNED_HARD);
> + /*
> +  * The core tick broadcast mode expects bc->bound_on to be set
> +  * correctly to prevent a CPU which has the broadcast hrtimer
> +  * armed from going deep idle.
>*
> -  * As this can be called from idle code, the hrtimer_start()
> -  * invocation has to be wrapped with RCU_NONIDLE() as
> -  * hrtimer_start() can call into tracing.
> +  * As tick_broadcast_lock is held, nothing can change the cpu
> +  * base which was just established in hrtimer_start() above. So
> +  * the below access is safe even without holding the hrtimer
> +  * base lock.
>*/
> - RCU_NONIDLE( {
> - hrtimer_start(&bctimer, expires, HRTIMER_MODE_ABS_PINNED_HARD);
> - /*
> -  * The core tick broadcast mode expects bc->bound_on to be set
> -  * correctly to prevent a CPU which has the broadcast hrtimer
> -  * armed from going deep idle.
> -  *
> -  * As tick_broadcast_lock is held, nothing can change the cpu
> -  * base which was just established in hrtimer_start() above. So
> -  * the below access is safe even without holding the hrtimer
> -  * base lock.
> -  */
> - bc->bound_on = bctimer.base->cpu_base->cpu;
> - } );
> + bc->bound_on = bctimer.base->cpu_base->cpu;
> +
>   return 0;
>  }
>  
> 
> 



Re: [PATCH 23/36] arm64,smp: Remove trace_.*_rcuidle() usage

2022-06-14 Thread Mark Rutland
On Wed, Jun 08, 2022 at 04:27:46PM +0200, Peter Zijlstra wrote:
> Ever since commit d3afc7f12987 ("arm64: Allow IPIs to be handled as
> normal interrupts") this function is called in regular IRQ context.
> 
> Signed-off-by: Peter Zijlstra (Intel) 

[adding Marc since he authored that commit]

Makes sense to me:

  Acked-by: Mark Rutland 

Mark.

> ---
>  arch/arm64/kernel/smp.c |4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
> 
> --- a/arch/arm64/kernel/smp.c
> +++ b/arch/arm64/kernel/smp.c
> @@ -865,7 +865,7 @@ static void do_handle_IPI(int ipinr)
>   unsigned int cpu = smp_processor_id();
>  
>   if ((unsigned)ipinr < NR_IPI)
> - trace_ipi_entry_rcuidle(ipi_types[ipinr]);
> + trace_ipi_entry(ipi_types[ipinr]);
>  
>   switch (ipinr) {
>   case IPI_RESCHEDULE:
> @@ -914,7 +914,7 @@ static void do_handle_IPI(int ipinr)
>   }
>  
>   if ((unsigned)ipinr < NR_IPI)
> - trace_ipi_exit_rcuidle(ipi_types[ipinr]);
> + trace_ipi_exit(ipi_types[ipinr]);
>  }
>  
>  static irqreturn_t ipi_handler(int irq, void *data)
> 
> 



Re: [PATCH 15/36] cpuidle,cpu_pm: Remove RCU fiddling from cpu_pm_{enter,exit}()

2022-06-14 Thread Mark Rutland
On Wed, Jun 08, 2022 at 04:27:38PM +0200, Peter Zijlstra wrote:
> All callers should still have RCU enabled.

IIUC with that true we should be able to drop the RCU_NONIDLE() from
drivers/perf/arm_pmu.c, as we only needed that for an invocation via a pm
notifier.

I should be able to give that a spin on some hardware.

> 
> Signed-off-by: Peter Zijlstra (Intel) 
> ---
>  kernel/cpu_pm.c |9 -
>  1 file changed, 9 deletions(-)
> 
> --- a/kernel/cpu_pm.c
> +++ b/kernel/cpu_pm.c
> @@ -30,16 +30,9 @@ static int cpu_pm_notify(enum cpu_pm_eve
>  {
>   int ret;
>  
> - /*
> -  * This introduces a RCU read critical section, which could be
> -  * disfunctional in cpu idle. Copy RCU_NONIDLE code to let RCU know
> -  * this.
> -  */
> - rcu_irq_enter_irqson();
>   rcu_read_lock();
>   ret = raw_notifier_call_chain(&cpu_pm_notifier.chain, event, NULL);
>   rcu_read_unlock();
> - rcu_irq_exit_irqson();

To make this easier to debug, is it worth adding an assertion that RCU is
watching here? e.g.

RCU_LOCKDEP_WARN(!rcu_is_watching(),
 "cpu_pm_notify() used illegally from EQS");

>  
>   return notifier_to_errno(ret);
>  }
> @@ -49,11 +42,9 @@ static int cpu_pm_notify_robust(enum cpu
>   unsigned long flags;
>   int ret;
>  
> - rcu_irq_enter_irqson();
>   raw_spin_lock_irqsave(&cpu_pm_notifier.lock, flags);
>   ret = raw_notifier_call_chain_robust(&cpu_pm_notifier.chain, event_up, 
> event_down, NULL);
>   raw_spin_unlock_irqrestore(&cpu_pm_notifier.lock, flags);
> - rcu_irq_exit_irqson();


... and likewise here?

Thanks,
Mark.

>  
>   return notifier_to_errno(ret);
>  }
> 
> 



[xen-unstable-smoke test] 171170: tolerable all pass - PUSHED

2022-06-14 Thread osstest service owner
flight 171170 xen-unstable-smoke real [real]
http://logs.test-lab.xenproject.org/osstest/logs/171170/

Failures :-/ but no regressions.

Tests which did not succeed, but are not blocking:
 test-amd64-amd64-libvirt 15 migrate-support-checkfail   never pass
 test-arm64-arm64-xl-xsm  15 migrate-support-checkfail   never pass
 test-arm64-arm64-xl-xsm  16 saverestore-support-checkfail   never pass
 test-armhf-armhf-xl  15 migrate-support-checkfail   never pass
 test-armhf-armhf-xl  16 saverestore-support-checkfail   never pass

version targeted for testing:
 xen  162dea4e768b835114c736cfd3fa1fc3742d39c5
baseline version:
 xen  c9a707df83aad17a6fcf2e8330ab3b5bead6fb8b

Last test of basis   171079  2022-06-11 12:00:26 Z3 days
Testing same since   171170  2022-06-15 00:02:01 Z0 days1 attempts


People who touched revisions under test:
  Julien Grall 
  Roger Pau Monné 
  Stefano Stabellini 
  Stefano Stabellini 

jobs:
 build-arm64-xsm  pass
 build-amd64  pass
 build-armhf  pass
 build-amd64-libvirt  pass
 test-armhf-armhf-xl  pass
 test-arm64-arm64-xl-xsm  pass
 test-amd64-amd64-xl-qemuu-debianhvm-amd64pass
 test-amd64-amd64-libvirt pass



sg-report-flight on osstest.test-lab.xenproject.org
logs: /home/logs/logs
images: /home/logs/images

Logs, config files, etc. are available at
http://logs.test-lab.xenproject.org/osstest/logs

Explanation of these reports, and of osstest in general, is at
http://xenbits.xen.org/gitweb/?p=osstest.git;a=blob;f=README.email;hb=master
http://xenbits.xen.org/gitweb/?p=osstest.git;a=blob;f=README;hb=master

Test harness code can be found at
http://xenbits.xen.org/gitweb?p=osstest.git;a=summary


Pushing revision :

To xenbits.xen.org:/home/xen/git/xen.git
   c9a707df83..162dea4e76  162dea4e768b835114c736cfd3fa1fc3742d39c5 -> smoke



[PATCH] xen/arm: avoid vtimer flip-flop transition in context switch

2022-06-14 Thread Wei Chen
virt_vtimer_save is calculating the new time for the vtimer and
has a potential risk of timer flip in:
"v->arch.virt_timer.cval + v->domain->arch.virt_timer_base.offset
- boot_count".
In this formula, "cval + offset" could make uint64_t overflow.
Generally speaking, this is difficult to trigger. But unfortunately
the problem was encountered with a platform where the timer started
with a very huge initial value, like 0xF3338991. On this
platform cval + offset is overflowing after running for a while.

So in this patch, we adjust the formula to use "offset - boot_count"
first, and then use the result to plus cval. This will avoid the
uint64_t overflow.

Signed-off-by: Wei Chen 
---
 xen/arch/arm/vtimer.c | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/xen/arch/arm/vtimer.c b/xen/arch/arm/vtimer.c
index 5bb5970f58..86e63303c8 100644
--- a/xen/arch/arm/vtimer.c
+++ b/xen/arch/arm/vtimer.c
@@ -144,8 +144,9 @@ void virt_timer_save(struct vcpu *v)
 if ( (v->arch.virt_timer.ctl & CNTx_CTL_ENABLE) &&
  !(v->arch.virt_timer.ctl & CNTx_CTL_MASK))
 {
-set_timer(&v->arch.virt_timer.timer, 
ticks_to_ns(v->arch.virt_timer.cval +
-  v->domain->arch.virt_timer_base.offset - boot_count));
+set_timer(&v->arch.virt_timer.timer,
+  ticks_to_ns(v->domain->arch.virt_timer_base.offset -
+  boot_count + v->arch.virt_timer.cval));
 }
 }
 
-- 
2.25.1




[linux-5.4 test] 171167: regressions - FAIL

2022-06-14 Thread osstest service owner
flight 171167 linux-5.4 real [real]
flight 171169 linux-5.4 real-retest [real]
http://logs.test-lab.xenproject.org/osstest/logs/171167/
http://logs.test-lab.xenproject.org/osstest/logs/171169/

Regressions :-(

Tests which did not succeed and are blocking,
including tests which could not be run:
 test-armhf-armhf-libvirt-raw 17 guest-start/debian.repeat fail REGR. vs. 170895

Tests which are failing intermittently (not blocking):
 test-armhf-armhf-libvirt-raw 13 guest-start  fail in 171169 pass in 171167
 test-amd64-i386-xl-qemut-debianhvm-amd64 20 guest-start/debianhvm.repeat fail 
pass in 171169-retest

Tests which did not succeed, but are not blocking:
 test-amd64-amd64-xl-qemuu-win7-amd64 19 guest-stopfail like 170895
 test-amd64-amd64-xl-qemut-win7-amd64 19 guest-stopfail like 170895
 test-amd64-i386-xl-qemut-win7-amd64 19 guest-stop fail like 170895
 test-amd64-i386-xl-qemuu-win7-amd64 19 guest-stop fail like 170895
 test-armhf-armhf-libvirt 16 saverestore-support-checkfail  like 170895
 test-amd64-amd64-qemuu-nested-amd 20 debian-hvm-install/l1/l2 fail like 170895
 test-amd64-amd64-xl-qemuu-ws16-amd64 19 guest-stopfail like 170895
 test-amd64-i386-xl-qemut-ws16-amd64 19 guest-stop fail like 170895
 test-armhf-armhf-libvirt-raw 15 saverestore-support-checkfail  like 170895
 test-armhf-armhf-xl-credit2  14 guest-start  fail  like 170895
 test-armhf-armhf-libvirt-qcow2 15 saverestore-support-check   fail like 170895
 test-amd64-amd64-xl-qemut-ws16-amd64 19 guest-stopfail like 170895
 test-amd64-i386-xl-qemuu-ws16-amd64 19 guest-stop fail like 170895
 test-amd64-i386-xl-pvshim14 guest-start  fail   never pass
 test-amd64-amd64-libvirt 15 migrate-support-checkfail   never pass
 test-amd64-i386-libvirt-xsm  15 migrate-support-checkfail   never pass
 test-arm64-arm64-xl-seattle  15 migrate-support-checkfail   never pass
 test-arm64-arm64-xl-seattle  16 saverestore-support-checkfail   never pass
 test-amd64-i386-libvirt  15 migrate-support-checkfail   never pass
 test-arm64-arm64-xl-credit2  15 migrate-support-checkfail   never pass
 test-arm64-arm64-xl-credit2  16 saverestore-support-checkfail   never pass
 test-arm64-arm64-xl  15 migrate-support-checkfail   never pass
 test-arm64-arm64-xl  16 saverestore-support-checkfail   never pass
 test-arm64-arm64-xl-thunderx 15 migrate-support-checkfail   never pass
 test-arm64-arm64-xl-xsm  15 migrate-support-checkfail   never pass
 test-arm64-arm64-xl-thunderx 16 saverestore-support-checkfail   never pass
 test-arm64-arm64-xl-xsm  16 saverestore-support-checkfail   never pass
 test-arm64-arm64-xl-credit1  15 migrate-support-checkfail   never pass
 test-arm64-arm64-xl-credit1  16 saverestore-support-checkfail   never pass
 test-amd64-i386-libvirt-qemuu-debianhvm-amd64-xsm 13 migrate-support-check 
fail never pass
 test-armhf-armhf-xl-arndale  15 migrate-support-checkfail   never pass
 test-armhf-armhf-xl-arndale  16 saverestore-support-checkfail   never pass
 test-amd64-amd64-libvirt-vhd 14 migrate-support-checkfail   never pass
 test-amd64-amd64-libvirt-xsm 15 migrate-support-checkfail   never pass
 test-amd64-amd64-libvirt-qemuu-debianhvm-amd64-xsm 13 migrate-support-check 
fail never pass
 test-arm64-arm64-libvirt-xsm 15 migrate-support-checkfail   never pass
 test-arm64-arm64-libvirt-xsm 16 saverestore-support-checkfail   never pass
 test-amd64-i386-libvirt-raw  14 migrate-support-checkfail   never pass
 test-arm64-arm64-libvirt-raw 14 migrate-support-checkfail   never pass
 test-arm64-arm64-libvirt-raw 15 saverestore-support-checkfail   never pass
 test-armhf-armhf-xl-credit1  15 migrate-support-checkfail   never pass
 test-armhf-armhf-xl-credit1  16 saverestore-support-checkfail   never pass
 test-armhf-armhf-xl-cubietruck 15 migrate-support-checkfail never pass
 test-armhf-armhf-xl-cubietruck 16 saverestore-support-checkfail never pass
 test-armhf-armhf-xl  15 migrate-support-checkfail   never pass
 test-armhf-armhf-xl  16 saverestore-support-checkfail   never pass
 test-armhf-armhf-libvirt 15 migrate-support-checkfail   never pass
 test-arm64-arm64-xl-vhd  14 migrate-support-checkfail   never pass
 test-arm64-arm64-xl-vhd  15 saverestore-support-checkfail   never pass
 test-armhf-armhf-xl-rtds 15 migrate-support-checkfail   never pass
 test-armhf-armhf-xl-rtds 16 saverestore-support-checkfail   never pass
 test-armhf-armhf-xl-multivcpu 15 migrate-support-checkfail  never pass
 test-armhf-armhf-xl-multivcpu 16 saverestore-support-checkfail  never pass
 test-armhf-armhf-xl-vhd  14 migrate-support-checkfa

Re: [RFC PATCH 2/2] xen/grant-table: Use unpopulated DMAable pages instead of real RAM ones

2022-06-14 Thread Stefano Stabellini
On Tue, 14 Jun 2022, Oleksandr wrote:
> On 11.06.22 02:55, Stefano Stabellini wrote:
> 
> Hello Stefano
> 
> > On Thu, 9 Jun 2022, Oleksandr wrote:
> > > On 04.06.22 00:19, Stefano Stabellini wrote:
> > > Hello Stefano
> > > 
> > > Thank you for having a look and sorry for the late response.
> > > 
> > > > On Tue, 17 May 2022, Oleksandr Tyshchenko wrote:
> > > > > From: Oleksandr Tyshchenko 
> > > > > 
> > > > > Depends on CONFIG_XEN_UNPOPULATED_ALLOC. If enabled then unpopulated
> > > > > DMAable (contiguous) pages will be allocated for grant mapping into
> > > > > instead of ballooning out real RAM pages.
> > > > > 
> > > > > TODO: Fallback to real RAM pages if xen_alloc_unpopulated_dma_pages()
> > > > > fails.
> > > > > 
> > > > > Signed-off-by: Oleksandr Tyshchenko 
> > > > > ---
> > > > >drivers/xen/grant-table.c | 27 +++
> > > > >1 file changed, 27 insertions(+)
> > > > > 
> > > > > diff --git a/drivers/xen/grant-table.c b/drivers/xen/grant-table.c
> > > > > index 8ac..2bb4392 100644
> > > > > --- a/drivers/xen/grant-table.c
> > > > > +++ b/drivers/xen/grant-table.c
> > > > > @@ -864,6 +864,25 @@ EXPORT_SYMBOL_GPL(gnttab_free_pages);
> > > > > */
> > > > >int gnttab_dma_alloc_pages(struct gnttab_dma_alloc_args *args)
> > > > >{
> > > > > +#ifdef CONFIG_XEN_UNPOPULATED_ALLOC
> > > > > + int ret;
> > > > This is an alternative implementation of the same function.
> > > Currently, yes.
> > > 
> > > 
> > > >If we are
> > > > going to use #ifdef, then I would #ifdef the entire function, rather
> > > > than just the body. Otherwise within the function body we can use
> > > > IS_ENABLED.
> > > 
> > > Good point. Note, there is one missing thing in current patch which is
> > > described in TODO.
> > > 
> > > "Fallback to real RAM pages if xen_alloc_unpopulated_dma_pages() fails." 
> > > So I
> > > will likely use IS_ENABLED within the function body.
> > > 
> > > If CONFIG_XEN_UNPOPULATED_ALLOC is enabled then gnttab_dma_alloc_pages()
> > > will
> > > try to call xen_alloc_unpopulated_dma_pages() the first and if fails then
> > > fallback to allocate RAM pages and balloon them out.
> > > 
> > > One moment is not entirely clear to me. If we use fallback in
> > > gnttab_dma_alloc_pages() then we must use fallback in
> > > gnttab_dma_free_pages()
> > > as well, we cannot use xen_free_unpopulated_dma_pages() for real RAM
> > > pages.
> > > The question is how to pass this information to the
> > > gnttab_dma_free_pages()?
> > > The first idea which comes to mind is to add a flag to struct
> > > gnttab_dma_alloc_args...
> >   You can check if the page is within the mhp_range range or part of
> > iomem_resource? If not, you can free it as a normal page.
> > 
> > If we do this, then the fallback is better implemented in
> > unpopulated-alloc.c because that is the one that is aware about
> > page addresses.
> 
> 
> I got your idea and agree this can work technically. Or if we finally decide
> to use the second option (use "dma_pool" for all) in the first patch
> "[RFC PATCH 1/2] xen/unpopulated-alloc: Introduce helpers for DMA allocations"
> then we will likely be able to check whether a page in question
> is within a "dma_pool" using gen_pool_has_addr().
> 
> I am still wondering, can we avoid the fallback implementation in
> unpopulated-alloc.c.
> Because for that purpose we will need to pull more code into
> unpopulated-alloc.c (to be more precise, almost everything which
> gnttab_dma_free_pages() already has except gnttab_pages_clear_private()) and
> pass more arguments to xen_free_unpopulated_dma_pages(). Also I might mistake,
> but having a fallback split between grant-table.c (to allocate RAM pages in
> gnttab_dma_alloc_pages()) and unpopulated-alloc.c (to free RAM pages in
> xen_free_unpopulated_dma_pages()) would look a bit weird.
> 
> I see two possible options for the fallback implementation in grant-table.c:
> 1. (less preferable) by introducing new flag in struct gnttab_dma_alloc_args
> 2. (more preferable) by guessing unpopulated (non real RAM) page using
> is_zone_device_page(), etc
> 
> 
> For example, with the second option the resulting code will look quite simple
> (only build tested):
> 
> diff --git a/drivers/xen/grant-table.c b/drivers/xen/grant-table.c
> index 738029d..3bda71f 100644
> --- a/drivers/xen/grant-table.c
> +++ b/drivers/xen/grant-table.c
> @@ -1047,6 +1047,23 @@ int gnttab_dma_alloc_pages(struct gnttab_dma_alloc_args
> *args)
>     size_t size;
>     int i, ret;
> 
> +   if (IS_ENABLED(CONFIG_XEN_UNPOPULATED_ALLOC)) {
> +   ret = xen_alloc_unpopulated_dma_pages(args->dev,
> args->nr_pages,
> +   args->pages);
> +   if (ret < 0)
> +   goto fallback;
> +
> +   ret = gnttab_pages_set_private(args->nr_pages, args->pages);
> +   if (ret < 0)
> +   goto fail;
> +
> +   args->vaddr = page_to_virt(

Re: [RFC PATCH 1/2] xen/unpopulated-alloc: Introduce helpers for DMA allocations

2022-06-14 Thread Stefano Stabellini
On Tue, 14 Jun 2022, Oleksandr wrote:
> On 11.06.22 03:12, Stefano Stabellini wrote:
> > On Wed, 8 Jun 2022, Oleksandr wrote:
> > > 2. Drop the "page_list" entirely and use "dma_pool" for all (contiguous
> > > and
> > > non-contiguous) allocations. After all, all pages are initially contiguous
> > > in
> > > fill_list() as they are built from the resource. This changes behavior for
> > > all
> > > users of xen_alloc_unpopulated_pages()
> > > 
> > > Below the diff for unpopulated-alloc.c. The patch is also available at:
> > > 
> > > https://github.com/otyshchenko1/linux/commit/7be569f113a4acbdc4bcb9b20cb3995b3151387a
> > > 
> > > 
> > > diff --git a/drivers/xen/unpopulated-alloc.c
> > > b/drivers/xen/unpopulated-alloc.c
> > > index a39f2d3..ab5c7bd 100644
> > > --- a/drivers/xen/unpopulated-alloc.c
> > > +++ b/drivers/xen/unpopulated-alloc.c
> > > @@ -1,5 +1,7 @@
> > >   // SPDX-License-Identifier: GPL-2.0
> > > +#include 
> > >   #include 
> > > +#include 
> > >   #include 
> > >   #include 
> > >   #include 
> > > @@ -13,8 +15,8 @@
> > >   #include 
> > > 
> > >   static DEFINE_MUTEX(list_lock);
> > > -static struct page *page_list;
> > > -static unsigned int list_count;
> > > +
> > > +static struct gen_pool *dma_pool;
> > > 
> > >   static struct resource *target_resource;
> > > 
> > > @@ -36,7 +38,7 @@ static int fill_list(unsigned int nr_pages)
> > >      struct dev_pagemap *pgmap;
> > >      struct resource *res, *tmp_res = NULL;
> > >      void *vaddr;
> > > -   unsigned int i, alloc_pages = round_up(nr_pages,
> > > PAGES_PER_SECTION);
> > > +   unsigned int alloc_pages = round_up(nr_pages, PAGES_PER_SECTION);
> > >      struct range mhp_range;
> > >      int ret;
> > > 
> > > @@ -106,6 +108,7 @@ static int fill_list(unsigned int nr_pages)
> > >    * conflict with any devices.
> > >    */
> > >      if (!xen_feature(XENFEAT_auto_translated_physmap)) {
> > > +   unsigned int i;
> > >      xen_pfn_t pfn = PFN_DOWN(res->start);
> > > 
> > >      for (i = 0; i < alloc_pages; i++) {
> > > @@ -125,16 +128,17 @@ static int fill_list(unsigned int nr_pages)
> > >      goto err_memremap;
> > >      }
> > > 
> > > -   for (i = 0; i < alloc_pages; i++) {
> > > -   struct page *pg = virt_to_page(vaddr + PAGE_SIZE * i);
> > > -
> > > -   pg->zone_device_data = page_list;
> > > -   page_list = pg;
> > > -   list_count++;
> > > +   ret = gen_pool_add_virt(dma_pool, (unsigned long)vaddr,
> > > res->start,
> > > +   alloc_pages * PAGE_SIZE, NUMA_NO_NODE);
> > > +   if (ret) {
> > > +   pr_err("Cannot add memory range to the pool\n");
> > > +   goto err_pool;
> > >      }
> > > 
> > >      return 0;
> > > 
> > > +err_pool:
> > > +   memunmap_pages(pgmap);
> > >   err_memremap:
> > >      kfree(pgmap);
> > >   err_pgmap:
> > > @@ -149,51 +153,49 @@ static int fill_list(unsigned int nr_pages)
> > >      return ret;
> > >   }
> > > 
> > > -/**
> > > - * xen_alloc_unpopulated_pages - alloc unpopulated pages
> > > - * @nr_pages: Number of pages
> > > - * @pages: pages returned
> > > - * @return 0 on success, error otherwise
> > > - */
> > > -int xen_alloc_unpopulated_pages(unsigned int nr_pages, struct page
> > > **pages)
> > > +static int alloc_unpopulated_pages(unsigned int nr_pages, struct page
> > > **pages,
> > > +   bool contiguous)
> > >   {
> > >      unsigned int i;
> > >      int ret = 0;
> > > +   void *vaddr;
> > > +   bool filled = false;
> > > 
> > >      /*
> > >   * Fallback to default behavior if we do not have any suitable
> > > resource
> > >   * to allocate required region from and as the result we won't be
> > > able
> > > to
> > >   * construct pages.
> > >   */
> > > -   if (!target_resource)
> > > +   if (!target_resource) {
> > > +   if (contiguous)
> > > +   return -ENODEV;
> > > +
> > >      return xen_alloc_ballooned_pages(nr_pages, pages);
> > > +   }
> > > 
> > >      mutex_lock(&list_lock);
> > > -   if (list_count < nr_pages) {
> > > -   ret = fill_list(nr_pages - list_count);
> > > +
> > > +   while (!(vaddr = (void *)gen_pool_alloc(dma_pool, nr_pages *
> > > PAGE_SIZE))) {
> > > +   if (filled)
> > > +   ret = -ENOMEM;
> > > +   else {
> > > +   ret = fill_list(nr_pages);
> > > +   filled = true;
> > > +   }
> > >      if (ret)
> > >      goto out;
> > >      }
> > > 
> > >      for (i = 0; i < nr_pages; i++) {
> > > -   struct page *pg = page_list;
> > > -
> > > -   BUG_ON(!pg);
> > > -   page_list = pg->zone_device_data;
> > > -   list_count--;
> 

Re: [PATCH] xen/arm: irq: Initialize the per-CPU IRQs while preparing the CPU

2022-06-14 Thread Stefano Stabellini
On Tue, 14 Jun 2022, Julien Grall wrote:
> From: Julien Grall 
> 
> Commit 5047cd1d5dea "xen/common: Use enhanced ASSERT_ALLOC_CONTEXT in
> xmalloc()" extended the checks in _xmalloc() to catch any use of the
> helpers from context with interrupts disabled.
> 
> Unfortunately, the rule is not followed when initializing the per-CPU
> IRQs:
> 
> (XEN) Xen call trace:
> (XEN)[<002389f4>] _xmalloc+0xfc/0x314 (PC)
> (XEN)[<>]  (LR)
> (XEN)[<0021a7c4>] init_one_irq_desc+0x48/0xd0
> (XEN)[<002807a8>] irq.c#init_local_irq_data+0x48/0xa4
> (XEN)[<00280834>] init_secondary_IRQ+0x10/0x2c
> (XEN)[<00288fa4>] start_secondary+0x194/0x274
> (XEN)[<40010170>] 40010170
> (XEN)
> (XEN)
> (XEN) 
> (XEN) Panic on CPU 2:
> (XEN) Assertion '!in_irq() && (local_irq_is_enabled() || num_online_cpus() <= 
> 1)' failed at common/xmalloc_tlsf.c:601
> (XEN) 
> 
> This is happening because zalloc_cpumask_var() may allocate memory
> if NR_CPUS is > 2 * sizeof(unsigned long).
> 
> Avoid the problem by allocate the per-CPU IRQs while preparing the
> CPU.
> 
> This also has the benefit to remove a BUG_ON() in the secondary CPU
> code.
> 
> Signed-off-by: Julien Grall 
> ---
>  xen/arch/arm/include/asm/irq.h |  1 -
>  xen/arch/arm/irq.c | 35 +++---
>  xen/arch/arm/smpboot.c |  2 --
>  3 files changed, 28 insertions(+), 10 deletions(-)
> 
> diff --git a/xen/arch/arm/include/asm/irq.h b/xen/arch/arm/include/asm/irq.h
> index e45d57459899..245f49dcbac5 100644
> --- a/xen/arch/arm/include/asm/irq.h
> +++ b/xen/arch/arm/include/asm/irq.h
> @@ -73,7 +73,6 @@ static inline bool is_lpi(unsigned int irq)
>  bool is_assignable_irq(unsigned int irq);
>  
>  void init_IRQ(void);
> -void init_secondary_IRQ(void);
>  
>  int route_irq_to_guest(struct domain *d, unsigned int virq,
> unsigned int irq, const char *devname);
> diff --git a/xen/arch/arm/irq.c b/xen/arch/arm/irq.c
> index b761d90c4063..56bdcb95335d 100644
> --- a/xen/arch/arm/irq.c
> +++ b/xen/arch/arm/irq.c
> @@ -17,6 +17,7 @@
>   * GNU General Public License for more details.
>   */
>  
> +#include 
>  #include 
>  #include 
>  #include 
> @@ -100,7 +101,7 @@ static int __init init_irq_data(void)
>  return 0;
>  }
>  
> -static int init_local_irq_data(void)
> +static int init_local_irq_data(unsigned int cpu)
>  {
>  int irq;
>  
> @@ -108,7 +109,7 @@ static int init_local_irq_data(void)
>  
>  for ( irq = 0; irq < NR_LOCAL_IRQS; irq++ )
>  {
> -struct irq_desc *desc = irq_to_desc(irq);
> +struct irq_desc *desc = &per_cpu(local_irq_desc, cpu)[irq];
>  int rc = init_one_irq_desc(desc);
>  
>  if ( rc )
> @@ -131,6 +132,29 @@ static int init_local_irq_data(void)
>  return 0;
>  }
>  
> +static int cpu_callback(struct notifier_block *nfb, unsigned long action,
> +void *hcpu)
> +{
> +unsigned long cpu = (unsigned long)hcpu;

unsigned int cpu ?

The rest looks good


> +int rc = 0;
> +
> +switch ( action )
> +{
> +case CPU_UP_PREPARE:
> +rc = init_local_irq_data(cpu);
> +if ( rc )
> +printk(XENLOG_ERR "Unable to allocate local IRQ for CPU%lu\n",
> +   cpu);
> +break;
> +}
> +
> +return !rc ? NOTIFY_DONE : notifier_from_errno(rc);
> +}
> +
> +static struct notifier_block cpu_nfb = {
> +.notifier_call = cpu_callback,
> +};
> +
>  void __init init_IRQ(void)
>  {
>  int irq;
> @@ -140,13 +164,10 @@ void __init init_IRQ(void)
>  local_irqs_type[irq] = IRQ_TYPE_INVALID;
>  spin_unlock(&local_irqs_type_lock);
>  
> -BUG_ON(init_local_irq_data() < 0);
> +BUG_ON(init_local_irq_data(smp_processor_id()) < 0);
>  BUG_ON(init_irq_data() < 0);
> -}
>  
> -void init_secondary_IRQ(void)
> -{
> -BUG_ON(init_local_irq_data() < 0);
> +register_cpu_notifier(&cpu_nfb);
>  }
>  
>  static inline struct irq_guest *irq_get_guest_info(struct irq_desc *desc)
> diff --git a/xen/arch/arm/smpboot.c b/xen/arch/arm/smpboot.c
> index 9bb32a301a70..4888bcd78a5a 100644
> --- a/xen/arch/arm/smpboot.c
> +++ b/xen/arch/arm/smpboot.c
> @@ -359,8 +359,6 @@ void start_secondary(void)
>  
>  gic_init_secondary_cpu();
>  
> -init_secondary_IRQ();
> -
>  set_current(idle_vcpu[cpuid]);
>  
>  setup_cpu_sibling_map(cpuid);
> -- 
> 2.32.0
> 



Re: [PATCH] xen/arm: smpboot: Allocate the CPU sibling/core maps while preparing the CPU

2022-06-14 Thread Stefano Stabellini
On Tue, 14 Jun 2022, Michal Orzel wrote:
> On 14.06.2022 13:08, Julien Grall wrote:
> >>> +    unsigned int rc = 0;
> >> ... here you are setting rc to 0 even though it will be reassigned.
> >> Furthermore, if rc is used only in case of CPU_UP_PREPARE, why not moving 
> >> the definition there?
> > 
> > Because I forgot to replace "return NOTIFY_DONE;" with :
> > 
> > return !rc ? NOTIFY_DONE : notifier_from_errno(rc);
> That is what I thought.
> With these fixes you can add my Rb.

And also my

Acked-by: Stefano Stabellini 

[qemu-mainline test] 171165: tolerable FAIL - PUSHED

2022-06-14 Thread osstest service owner
flight 171165 qemu-mainline real [real]
http://logs.test-lab.xenproject.org/osstest/logs/171165/

Failures :-/ but no regressions.

Tests which did not succeed, but are not blocking:
 test-amd64-amd64-xl-qemuu-win7-amd64 19 guest-stopfail like 171149
 test-armhf-armhf-libvirt 16 saverestore-support-checkfail  like 171149
 test-amd64-amd64-qemuu-nested-amd 20 debian-hvm-install/l1/l2 fail like 171149
 test-amd64-i386-xl-qemuu-win7-amd64 19 guest-stop fail like 171149
 test-armhf-armhf-libvirt-raw 15 saverestore-support-checkfail  like 171149
 test-armhf-armhf-libvirt-qcow2 15 saverestore-support-check   fail like 171149
 test-amd64-i386-xl-qemuu-ws16-amd64 19 guest-stop fail like 171149
 test-amd64-amd64-xl-qemuu-ws16-amd64 19 guest-stopfail like 171149
 test-arm64-arm64-xl-seattle  15 migrate-support-checkfail   never pass
 test-arm64-arm64-xl-seattle  16 saverestore-support-checkfail   never pass
 test-amd64-i386-xl-pvshim14 guest-start  fail   never pass
 test-amd64-i386-libvirt-xsm  15 migrate-support-checkfail   never pass
 test-amd64-i386-libvirt  15 migrate-support-checkfail   never pass
 test-arm64-arm64-xl-thunderx 15 migrate-support-checkfail   never pass
 test-arm64-arm64-xl-thunderx 16 saverestore-support-checkfail   never pass
 test-arm64-arm64-xl-xsm  15 migrate-support-checkfail   never pass
 test-arm64-arm64-xl-xsm  16 saverestore-support-checkfail   never pass
 test-arm64-arm64-libvirt-xsm 15 migrate-support-checkfail   never pass
 test-arm64-arm64-libvirt-xsm 16 saverestore-support-checkfail   never pass
 test-arm64-arm64-xl-credit1  15 migrate-support-checkfail   never pass
 test-arm64-arm64-xl-credit1  16 saverestore-support-checkfail   never pass
 test-arm64-arm64-xl  15 migrate-support-checkfail   never pass
 test-arm64-arm64-xl  16 saverestore-support-checkfail   never pass
 test-amd64-i386-libvirt-qemuu-debianhvm-amd64-xsm 13 migrate-support-check 
fail never pass
 test-amd64-amd64-libvirt-xsm 15 migrate-support-checkfail   never pass
 test-arm64-arm64-xl-credit2  15 migrate-support-checkfail   never pass
 test-arm64-arm64-xl-credit2  16 saverestore-support-checkfail   never pass
 test-amd64-i386-libvirt-raw  14 migrate-support-checkfail   never pass
 test-amd64-amd64-libvirt-vhd 14 migrate-support-checkfail   never pass
 test-arm64-arm64-libvirt-raw 14 migrate-support-checkfail   never pass
 test-arm64-arm64-libvirt-raw 15 saverestore-support-checkfail   never pass
 test-armhf-armhf-xl-credit1  15 migrate-support-checkfail   never pass
 test-armhf-armhf-xl-credit1  16 saverestore-support-checkfail   never pass
 test-arm64-arm64-xl-vhd  14 migrate-support-checkfail   never pass
 test-arm64-arm64-xl-vhd  15 saverestore-support-checkfail   never pass
 test-armhf-armhf-xl-credit2  15 migrate-support-checkfail   never pass
 test-armhf-armhf-xl-credit2  16 saverestore-support-checkfail   never pass
 test-armhf-armhf-xl-multivcpu 15 migrate-support-checkfail  never pass
 test-armhf-armhf-xl-multivcpu 16 saverestore-support-checkfail  never pass
 test-armhf-armhf-xl-rtds 15 migrate-support-checkfail   never pass
 test-armhf-armhf-xl-rtds 16 saverestore-support-checkfail   never pass
 test-armhf-armhf-xl-cubietruck 15 migrate-support-checkfail never pass
 test-armhf-armhf-xl-cubietruck 16 saverestore-support-checkfail never pass
 test-armhf-armhf-libvirt 15 migrate-support-checkfail   never pass
 test-amd64-amd64-libvirt 15 migrate-support-checkfail   never pass
 test-amd64-amd64-libvirt-qemuu-debianhvm-amd64-xsm 13 migrate-support-check 
fail never pass
 test-armhf-armhf-libvirt-raw 14 migrate-support-checkfail   never pass
 test-armhf-armhf-xl-vhd  14 migrate-support-checkfail   never pass
 test-armhf-armhf-xl-vhd  15 saverestore-support-checkfail   never pass
 test-armhf-armhf-xl-arndale  15 migrate-support-checkfail   never pass
 test-armhf-armhf-xl-arndale  16 saverestore-support-checkfail   never pass
 test-armhf-armhf-xl  15 migrate-support-checkfail   never pass
 test-armhf-armhf-xl  16 saverestore-support-checkfail   never pass
 test-armhf-armhf-libvirt-qcow2 14 migrate-support-checkfail never pass

version targeted for testing:
 qemuudebd0753663bc89c86f5462a53268f2e3f680f60
baseline version:
 qemuudcb40541ebca7ec98a14d461593b3cd7282b4fac

Last test of basis   171149  2022-06-13 00:38:22 Z1 days
Testing same since   171160  2022-06-14 06:39:46 Z0 days2 attempts


People who touched revisions under test:
  Alex Bennée 
  Daniel P. Berrangé 
  Paolo Bonzini 

[linux-linus test] 171161: regressions - FAIL

2022-06-14 Thread osstest service owner
flight 171161 linux-linus real [real]
http://logs.test-lab.xenproject.org/osstest/logs/171161/

Regressions :-(

Tests which did not succeed and are blocking,
including tests which could not be run:
 test-arm64-arm64-examine  8 reboot   fail REGR. vs. 170714
 test-amd64-amd64-xl-qemuu-debianhvm-amd64  8 xen-bootfail REGR. vs. 170714
 test-amd64-amd64-xl-qemut-stubdom-debianhvm-amd64-xsm 8 xen-boot fail REGR. 
vs. 170714
 test-amd64-amd64-libvirt  8 xen-boot fail REGR. vs. 170714
 test-amd64-amd64-xl-credit2   8 xen-boot fail REGR. vs. 170714
 test-amd64-amd64-xl-qemuu-debianhvm-amd64-shadow 8 xen-boot fail REGR. vs. 
170714
 test-amd64-amd64-xl-multivcpu  8 xen-bootfail REGR. vs. 170714
 test-amd64-amd64-libvirt-pair 12 xen-boot/src_host   fail REGR. vs. 170714
 test-amd64-amd64-libvirt-pair 13 xen-boot/dst_host   fail REGR. vs. 170714
 test-amd64-amd64-xl-pvhv2-intel  8 xen-boot  fail REGR. vs. 170714
 test-amd64-amd64-freebsd12-amd64  8 xen-boot fail REGR. vs. 170714
 test-amd64-amd64-qemuu-nested-intel  8 xen-boot  fail REGR. vs. 170714
 test-arm64-arm64-xl-seattle   8 xen-boot fail REGR. vs. 170714
 test-arm64-arm64-xl-credit2   8 xen-boot fail REGR. vs. 170714
 test-amd64-amd64-libvirt-qcow2  8 xen-boot   fail REGR. vs. 170714
 test-amd64-amd64-libvirt-raw  8 xen-boot fail REGR. vs. 170714
 test-arm64-arm64-xl-credit1   8 xen-boot fail REGR. vs. 170714
 test-arm64-arm64-xl-vhd   8 xen-boot fail REGR. vs. 170714
 test-amd64-amd64-examine  8 reboot   fail REGR. vs. 170714
 test-amd64-amd64-examine-bios  8 reboot  fail REGR. vs. 170714
 test-arm64-arm64-libvirt-raw  8 xen-boot fail REGR. vs. 170714
 test-amd64-amd64-examine-uefi  8 reboot  fail REGR. vs. 170714
 test-amd64-amd64-xl-pvshim8 xen-boot fail REGR. vs. 170714
 test-arm64-arm64-xl-xsm   8 xen-boot fail REGR. vs. 170714
 test-arm64-arm64-xl   8 xen-boot fail REGR. vs. 170714
 test-arm64-arm64-libvirt-xsm  8 xen-boot fail REGR. vs. 170714

Tests which are failing intermittently (not blocking):
 test-amd64-amd64-xl   8 xen-boot   fail pass in 171154
 test-amd64-amd64-xl-vhd   8 xen-boot   fail pass in 171154
 test-amd64-amd64-xl-qemuu-ovmf-amd64  8 xen-boot   fail pass in 171154
 test-amd64-amd64-xl-qemuu-debianhvm-i386-xsm  8 xen-boot   fail pass in 171155
 test-amd64-amd64-pair12 xen-boot/src_host  fail pass in 171155
 test-amd64-amd64-pair13 xen-boot/dst_host  fail pass in 171155
 test-amd64-amd64-xl-credit1   8 xen-boot   fail pass in 171155
 test-amd64-amd64-xl-xsm   8 xen-boot   fail pass in 171155
 test-amd64-amd64-xl-rtds 20 guest-localmigrate/x10 fail pass in 171156

Tests which did not succeed, but are not blocking:
 test-amd64-amd64-xl-qemut-win7-amd64 19 guest-stopfail like 170714
 test-amd64-amd64-qemuu-nested-amd 20 debian-hvm-install/l1/l2 fail like 170714
 test-amd64-amd64-xl-qemuu-ws16-amd64 19 guest-stopfail like 170714
 test-amd64-amd64-xl-qemut-ws16-amd64 19 guest-stopfail like 170714
 test-amd64-amd64-xl-qemuu-win7-amd64 19 guest-stopfail like 170714
 test-armhf-armhf-libvirt-qcow2 15 saverestore-support-check   fail like 170714
 test-armhf-armhf-libvirt 16 saverestore-support-checkfail  like 170714
 test-armhf-armhf-libvirt-raw 15 saverestore-support-checkfail  like 170714
 test-amd64-amd64-libvirt-xsm 15 migrate-support-checkfail   never pass
 test-arm64-arm64-xl-thunderx 15 migrate-support-checkfail   never pass
 test-arm64-arm64-xl-thunderx 16 saverestore-support-checkfail   never pass
 test-amd64-amd64-libvirt-qemuu-debianhvm-amd64-xsm 13 migrate-support-check 
fail never pass
 test-armhf-armhf-xl  15 migrate-support-checkfail   never pass
 test-armhf-armhf-xl  16 saverestore-support-checkfail   never pass
 test-armhf-armhf-xl-rtds 15 migrate-support-checkfail   never pass
 test-armhf-armhf-xl-rtds 16 saverestore-support-checkfail   never pass
 test-armhf-armhf-xl-credit2  15 migrate-support-checkfail   never pass
 test-armhf-armhf-xl-credit2  16 saverestore-support-checkfail   never pass
 test-armhf-armhf-xl-cubietruck 15 migrate-support-checkfail never pass
 test-armhf-armhf-xl-cubietruck 16 saverestore-support-checkfail never pass
 test-armhf-armhf-xl-multivcpu 15 migrate-support-checkfail  never pass
 test-armhf-armhf-xl-multivcpu 16 saverestore-support-checkfail  never pass
 test-armhf-armhf-libvirt-qcow2 14 migrate-support-checkfail never pass
 test-armhf-armhf-xl-cred

Re: [PATCH v2 2/2] xen/arm: add FF-A mediator

2022-06-14 Thread Volodymyr Babchuk


Hello Jens,

Sorry for late review, I was busy with internal projects.

This is preliminary review. I gave up at scatter-gather operations. Need
more time to review them properly.

One thing that bothers me is that Xen is non-preemptive and there are
plenty potentially long-running operations.

Jens Wiklander  writes:

> Adds a FF-A version 1.1 [1] mediator to communicate with a Secure
> Partition in secure world.
>
> The implementation is the bare minimum to be able to communicate with
> OP-TEE running as an SPMC at S-EL1.
>
> This is loosely based on the TEE mediator framework and the OP-TEE
> mediator.
>
> [1] 
> https://urldefense.com/v3/__https://developer.arm.com/documentation/den0077/latest__;!!GF_29dbcQIUBPA!1rn9xKdmcgMXOyZ_CvNIVq-wAS1ZI_Ews1w-Gqt0YPwSXyyTJedeFQgD65WhhOwIf_-cIa4EINzmwM4o62XPcMt1cTLcMZ7d$
>  [developer[.]arm[.]com]
> Signed-off-by: Jens Wiklander 
> ---
>  xen/arch/arm/Kconfig  |   11 +
>  xen/arch/arm/Makefile |1 +
>  xen/arch/arm/domain.c |   10 +
>  xen/arch/arm/ffa.c| 1624 +
>  xen/arch/arm/include/asm/domain.h |4 +
>  xen/arch/arm/include/asm/ffa.h|   71 ++
>  xen/arch/arm/vsmc.c   |   17 +-
>  7 files changed, 1735 insertions(+), 3 deletions(-)
>  create mode 100644 xen/arch/arm/ffa.c
>  create mode 100644 xen/arch/arm/include/asm/ffa.h
>
> diff --git a/xen/arch/arm/Kconfig b/xen/arch/arm/Kconfig
> index ecfa6822e4d3..5b75067e2745 100644
> --- a/xen/arch/arm/Kconfig
> +++ b/xen/arch/arm/Kconfig
> @@ -106,6 +106,17 @@ config TEE
>  
>  source "arch/arm/tee/Kconfig"
>  
> +config FFA
> + bool "Enable FF-A mediator support" if EXPERT
> + default n
> + depends on ARM_64
> + help
> +   This option enables a minamal FF-A mediator. The mediator is
> +   generic as it follows the FF-A specification [1], but it only
> +   implements a small substet of the specification.
> +
> +   [1] 
> https://urldefense.com/v3/__https://developer.arm.com/documentation/den0077/latest__;!!GF_29dbcQIUBPA!1rn9xKdmcgMXOyZ_CvNIVq-wAS1ZI_Ews1w-Gqt0YPwSXyyTJedeFQgD65WhhOwIf_-cIa4EINzmwM4o62XPcMt1cTLcMZ7d$
>  [developer[.]arm[.]com]
> +
>  endmenu
>  
>  menu "ARM errata workaround via the alternative framework"
> diff --git a/xen/arch/arm/Makefile b/xen/arch/arm/Makefile
> index 1d862351d111..dbf5e593a069 100644
> --- a/xen/arch/arm/Makefile
> +++ b/xen/arch/arm/Makefile
> @@ -20,6 +20,7 @@ obj-y += domain.o
>  obj-y += domain_build.init.o
>  obj-y += domctl.o
>  obj-$(CONFIG_EARLY_PRINTK) += early_printk.o
> +obj-$(CONFIG_FFA) += ffa.o
>  obj-y += gic.o
>  obj-y += gic-v2.o
>  obj-$(CONFIG_GICV3) += gic-v3.o
> diff --git a/xen/arch/arm/domain.c b/xen/arch/arm/domain.c
> index 8110c1df8638..a93e6a9c4aef 100644
> --- a/xen/arch/arm/domain.c
> +++ b/xen/arch/arm/domain.c
> @@ -27,6 +27,7 @@
>  #include 
>  #include 
>  #include 
> +#include 
>  #include 
>  #include 
>  #include 
> @@ -756,6 +757,9 @@ int arch_domain_create(struct domain *d,
>  if ( (rc = tee_domain_init(d, config->arch.tee_type)) != 0 )
>  goto fail;
>  
> +if ( (rc = ffa_domain_init(d)) != 0 )

So, FFA support will be enabled for each domain? I think that this is
fine for experimental feature, but I want to hear maintainer's opinion.

> +goto fail;
> +
>  update_domain_wallclock_time(d);
>  
>  /*
> @@ -998,6 +1002,7 @@ static int relinquish_memory(struct domain *d, struct 
> page_list_head *list)
>  enum {
>  PROG_pci = 1,
>  PROG_tee,
> +PROG_ffa,
>  PROG_xen,
>  PROG_page,
>  PROG_mapping,
> @@ -1046,6 +1051,11 @@ int domain_relinquish_resources(struct domain *d)
>  if (ret )
>  return ret;
>  
> +PROGRESS(ffa):
> +ret = ffa_relinquish_resources(d);
> +if (ret )

Coding style: if ( ret )

> +return ret;
> +
>  PROGRESS(xen):
>  ret = relinquish_memory(d, &d->xenpage_list);
>  if ( ret )
> diff --git a/xen/arch/arm/ffa.c b/xen/arch/arm/ffa.c
> new file mode 100644
> index ..9063b7f2b59e
> --- /dev/null
> +++ b/xen/arch/arm/ffa.c
> @@ -0,0 +1,1624 @@
> +/*
> + * xen/arch/arm/ffa.c
> + *
> + * Arm Firmware Framework for ARMv8-A(FFA) mediator
> + *
> + * Copyright (C) 2021  Linaro Limited

It is 2022 already :)

> + *
> + * Permission is hereby granted, free of charge, to any person
> + * obtaining a copy of this software and associated documentation
> + * files (the "Software"), to deal in the Software without restriction,
> + * including without limitation the rights to use, copy, modify, merge,
> + * publish, distribute, sublicense, and/or sell copies of the Software,
> + * and to permit persons to whom the Software is furnished to do so,
> + * subject to the following conditions:
> + *
> + * The above copyright notice and this permission notice shall be
> + * included in all copies or substantial portions of the Software.
> + *
> + * THE SOFTWARE IS PROVIDED "AS IS"

Xen Security Advisory 404 v1 (CVE-2022-21123,CVE-2022-21124,CVE-2022-21166) - x86: MMIO Stale Data vulnerabilities

2022-06-14 Thread Xen . org security team
-BEGIN PGP SIGNED MESSAGE-
Hash: SHA256

 Xen Security Advisory CVE-2022-21123,CVE-2022-21124,CVE-2022-21166 / XSA-404

 x86: MMIO Stale Data vulnerabilities

ISSUE DESCRIPTION
=

This issue is related to the SRBDS, TAA and MDS vulnerabilities.  Please
see:

  https://xenbits.xen.org/xsa/advisory-320.html (SRBDS)
  https://xenbits.xen.org/xsa/advisory-305.html (TAA)
  https://xenbits.xen.org/xsa/advisory-297.html (MDS)

Please see Intel's whitepaper:

  
https://www.intel.com/content/www/us/en/developer/articles/technical/software-security-guidance/technical-documentation/processor-mmio-stale-data-vulnerabilities.html

IMPACT
==

An attacker might be able to directly read or infer data from other
security contexts in the system.  This can include data belonging to
other VMs, or to Xen itself.  The degree to which an attacker can obtain
data depends on the CPU, and the system configuration.

VULNERABLE SYSTEMS
==

Systems running all versions of Xen are affected.

Only x86 processors are vulnerable.  Processors from other manufacturers
(e.g. ARM) are not believed to be vulnerable.

Only Intel based processors are affected.  Processors from other x86
manufacturers (e.g. AMD) are not believed to be vulnerable.

Please consult the Intel Security Advisory for details on the affected
processors and configurations.

Per Xen's support statement, PCI passthrough should be to trusted
domains because the overall system security depends on factors outside
of Xen's control.

As such, Xen, in a supported configuration, is not vulnerable to
DRPW/SBDR.

MITIGATION
==

All mitigations depend on functionality added in the IPU 2022.1 (May
2022) microcode release from Intel.  Consult your dom0 OS vendor.

To the best of the security team's understanding, the summary is as
follows:

Server CPUs (Xeon EP/EX, Scalable, and some Atom servers), excluding
Xeon E3 (which use the client CPU design), are potentially vulnerable to
DRPW (CVE-2022-21166).

Client CPUs (inc Xeon E3) are, furthermore, potentially vulnerable to
SBDR (CVE-2022-21123) and SBDS (CVE-2022-21125).

SBDS only affects CPUs vulnerable to MDS.  On these CPUs, there are
previously undiscovered leakage channels.  There is no change to the
existing MDS mitigations.

DRPW and SBDR only affects configurations where less privileged domains
have MMIO mappings of buggy endpoints.  Consult your hardware vendor.

In configurations where less privileged domains have MMIO access to
buggy endpoints, `spec-ctrl=unpriv-mmio` can be enabled which will cause
Xen to mitigate cross-domain fill buffer leakage, and extend SRBDS
protections to protect RNG data from leakage.

RESOLUTION
==

Applying the appropriate attached patch resolves this issue.

Note that patches for released versions are generally prepared to
apply to the stable branches, and may not apply cleanly to the most
recent release tarball.  Downstreams are encouraged to update to the
tip of the stable branch before applying these patches.

The patches are still under review.  An update will be sent once they
are reviewed and the backports are done.

xsa404/xsa404-?.patch   xen-unstable

$ sha256sum xsa404*/*
18b307c2cbbd08d568e9dcb2447901d94e22ff1e3945c3436173aa693f6456fb  
xsa404/xsa404-1.patch
d6f193ad963396285e983aa1c18539f67222582711fc62105c21b71b3b53a97d  
xsa404/xsa404-2.patch
d2c123ccdf5eb9f862d6e9cb0e59045ae18799a07db149c7d90e301ca20436aa  
xsa404/xsa404-3.patch
$

NOTE CONCERNING CVE-2022-21127 / Update to SRBDS


An issue was discovered with the SRBDS microcode mitigation.  A
microcode update was released as part of Intel's IPU 2022.1 in May 2022.

Updating microcode is sufficient to fix the issue, with no extra actions
required on Xen's behalf.  Consult your dom0 OS vendor or OEM for
updated microcode.

NOTE CONCERNING CVE-2022-21180 / Undefined MMIO Hang


A related issue was discovered.  See:

  
https://www.intel.com/content/www/us/en/developer/articles/technical/software-security-guidance/advisory-guidance/undefined-mmio-hang.html

Xen is not vulnerable to UMH in supported configurations.

The only mitigation to is avoid passing impacted devices through to
untrusted guests.
-BEGIN PGP SIGNATURE-

iQFABAEBCAAqFiEEI+MiLBRfRHX6gGCng/4UyVfoK9kFAmKo0Z0MHHBncEB4ZW4u
b3JnAAoJEIP+FMlX6CvZc8cH/RFgxQ4L8OewWMxsuowpgLg8NVyYGFMBgttscBh+
ANpjRTnV4yQGpt9nNFDAcXT1c/fvWhypOiwadEtczRl5k/Q96JOKFdiAc1QR35Oj
vmbCLgO20jQ/GdTzaqKUaGBwi8GLShJvH1zMPJ2KuXk5w5uFDhj2gEiB6Kdv9+9O
4FBxQkpDzll0gs5v16ien8btKhEuZj9lNtzXZw5j4+DJD69MvQqsRPVdEt+M17Ox
XGYcpfpLeGUaIUPFTPZDcFIJnMvqPBQyt+2eaeR2ezW2ouNpxepCSPsEDlAmSZ/K
uZA0ShyJD3pfCxjc8eztyF/4zajY5EvuEtWdUZC/3zVaUec=
=4EdA
-END PGP SIGNATURE-


xsa404/xsa404-1.patch
Description: Binary data


xsa404/xsa404-2.patch
Description: Binary data


xsa404/xsa404-3.patch
Description: Binary data


Re: [RFC PATCH 2/2] xen/grant-table: Use unpopulated DMAable pages instead of real RAM ones

2022-06-14 Thread Oleksandr



On 11.06.22 02:55, Stefano Stabellini wrote:

Hello Stefano


On Thu, 9 Jun 2022, Oleksandr wrote:

On 04.06.22 00:19, Stefano Stabellini wrote:
Hello Stefano

Thank you for having a look and sorry for the late response.


On Tue, 17 May 2022, Oleksandr Tyshchenko wrote:

From: Oleksandr Tyshchenko 

Depends on CONFIG_XEN_UNPOPULATED_ALLOC. If enabled then unpopulated
DMAable (contiguous) pages will be allocated for grant mapping into
instead of ballooning out real RAM pages.

TODO: Fallback to real RAM pages if xen_alloc_unpopulated_dma_pages()
fails.

Signed-off-by: Oleksandr Tyshchenko 
---
   drivers/xen/grant-table.c | 27 +++
   1 file changed, 27 insertions(+)

diff --git a/drivers/xen/grant-table.c b/drivers/xen/grant-table.c
index 8ac..2bb4392 100644
--- a/drivers/xen/grant-table.c
+++ b/drivers/xen/grant-table.c
@@ -864,6 +864,25 @@ EXPORT_SYMBOL_GPL(gnttab_free_pages);
*/
   int gnttab_dma_alloc_pages(struct gnttab_dma_alloc_args *args)
   {
+#ifdef CONFIG_XEN_UNPOPULATED_ALLOC
+   int ret;

This is an alternative implementation of the same function.

Currently, yes.



   If we are
going to use #ifdef, then I would #ifdef the entire function, rather
than just the body. Otherwise within the function body we can use
IS_ENABLED.


Good point. Note, there is one missing thing in current patch which is
described in TODO.

"Fallback to real RAM pages if xen_alloc_unpopulated_dma_pages() fails."  So I
will likely use IS_ENABLED within the function body.

If CONFIG_XEN_UNPOPULATED_ALLOC is enabled then gnttab_dma_alloc_pages() will
try to call xen_alloc_unpopulated_dma_pages() the first and if fails then
fallback to allocate RAM pages and balloon them out.

One moment is not entirely clear to me. If we use fallback in
gnttab_dma_alloc_pages() then we must use fallback in gnttab_dma_free_pages()
as well, we cannot use xen_free_unpopulated_dma_pages() for real RAM pages.
The question is how to pass this information to the gnttab_dma_free_pages()?
The first idea which comes to mind is to add a flag to struct
gnttab_dma_alloc_args...
  
You can check if the page is within the mhp_range range or part of

iomem_resource? If not, you can free it as a normal page.

If we do this, then the fallback is better implemented in
unpopulated-alloc.c because that is the one that is aware about
page addresses.



I got your idea and agree this can work technically. Or if we finally 
decide to use the second option (use "dma_pool" for all) in the first patch
"[RFC PATCH 1/2] xen/unpopulated-alloc: Introduce helpers for DMA 
allocations" then we will likely be able to check whether a page in question

is within a "dma_pool" using gen_pool_has_addr().

I am still wondering, can we avoid the fallback implementation in 
unpopulated-alloc.c.
Because for that purpose we will need to pull more code into 
unpopulated-alloc.c (to be more precise, almost everything which 
gnttab_dma_free_pages() already has except gnttab_pages_clear_private()) 
and pass more arguments to xen_free_unpopulated_dma_pages(). Also I 
might mistake, but having a fallback split between grant-table.c (to 
allocate RAM pages in gnttab_dma_alloc_pages()) and unpopulated-alloc.c 
(to free RAM pages in xen_free_unpopulated_dma_pages()) would look a bit 
weird.


I see two possible options for the fallback implementation in grant-table.c:
1. (less preferable) by introducing new flag in struct gnttab_dma_alloc_args
2. (more preferable) by guessing unpopulated (non real RAM) page using 
is_zone_device_page(), etc



For example, with the second option the resulting code will look quite 
simple (only build tested):


diff --git a/drivers/xen/grant-table.c b/drivers/xen/grant-table.c
index 738029d..3bda71f 100644
--- a/drivers/xen/grant-table.c
+++ b/drivers/xen/grant-table.c
@@ -1047,6 +1047,23 @@ int gnttab_dma_alloc_pages(struct 
gnttab_dma_alloc_args *args)

    size_t size;
    int i, ret;

+   if (IS_ENABLED(CONFIG_XEN_UNPOPULATED_ALLOC)) {
+   ret = xen_alloc_unpopulated_dma_pages(args->dev, 
args->nr_pages,

+   args->pages);
+   if (ret < 0)
+   goto fallback;
+
+   ret = gnttab_pages_set_private(args->nr_pages, args->pages);
+   if (ret < 0)
+   goto fail;
+
+   args->vaddr = page_to_virt(args->pages[0]);
+   args->dev_bus_addr = page_to_phys(args->pages[0]);
+
+   return ret;
+   }
+
+fallback:
    size = args->nr_pages << PAGE_SHIFT;
    if (args->coherent)
    args->vaddr = dma_alloc_coherent(args->dev, size,
@@ -1103,6 +1120,12 @@ int gnttab_dma_free_pages(struct 
gnttab_dma_alloc_args *args)


    gnttab_pages_clear_private(args->nr_pages, args->pages);

+   if (IS_ENABLED(CONFIG_XEN_UNPOPULATED_ALLOC) &&
+   is_zone_device_page(args->pages[0])) {
+   xen_free_unpopulated_dma_pages(args->

Re: [RFC PATCH 1/2] xen/unpopulated-alloc: Introduce helpers for DMA allocations

2022-06-14 Thread Oleksandr



On 11.06.22 03:12, Stefano Stabellini wrote:


Hello Stefano



On Wed, 8 Jun 2022, Oleksandr wrote:

2. Drop the "page_list" entirely and use "dma_pool" for all (contiguous and
non-contiguous) allocations. After all, all pages are initially contiguous in
fill_list() as they are built from the resource. This changes behavior for all
users of xen_alloc_unpopulated_pages()

Below the diff for unpopulated-alloc.c. The patch is also available at:

https://github.com/otyshchenko1/linux/commit/7be569f113a4acbdc4bcb9b20cb3995b3151387a


diff --git a/drivers/xen/unpopulated-alloc.c b/drivers/xen/unpopulated-alloc.c
index a39f2d3..ab5c7bd 100644
--- a/drivers/xen/unpopulated-alloc.c
+++ b/drivers/xen/unpopulated-alloc.c
@@ -1,5 +1,7 @@
  // SPDX-License-Identifier: GPL-2.0
+#include 
  #include 
+#include 
  #include 
  #include 
  #include 
@@ -13,8 +15,8 @@
  #include 

  static DEFINE_MUTEX(list_lock);
-static struct page *page_list;
-static unsigned int list_count;
+
+static struct gen_pool *dma_pool;

  static struct resource *target_resource;

@@ -36,7 +38,7 @@ static int fill_list(unsigned int nr_pages)
     struct dev_pagemap *pgmap;
     struct resource *res, *tmp_res = NULL;
     void *vaddr;
-   unsigned int i, alloc_pages = round_up(nr_pages, PAGES_PER_SECTION);
+   unsigned int alloc_pages = round_up(nr_pages, PAGES_PER_SECTION);
     struct range mhp_range;
     int ret;

@@ -106,6 +108,7 @@ static int fill_list(unsigned int nr_pages)
   * conflict with any devices.
   */
     if (!xen_feature(XENFEAT_auto_translated_physmap)) {
+   unsigned int i;
     xen_pfn_t pfn = PFN_DOWN(res->start);

     for (i = 0; i < alloc_pages; i++) {
@@ -125,16 +128,17 @@ static int fill_list(unsigned int nr_pages)
     goto err_memremap;
     }

-   for (i = 0; i < alloc_pages; i++) {
-   struct page *pg = virt_to_page(vaddr + PAGE_SIZE * i);
-
-   pg->zone_device_data = page_list;
-   page_list = pg;
-   list_count++;
+   ret = gen_pool_add_virt(dma_pool, (unsigned long)vaddr, res->start,
+   alloc_pages * PAGE_SIZE, NUMA_NO_NODE);
+   if (ret) {
+   pr_err("Cannot add memory range to the pool\n");
+   goto err_pool;
     }

     return 0;

+err_pool:
+   memunmap_pages(pgmap);
  err_memremap:
     kfree(pgmap);
  err_pgmap:
@@ -149,51 +153,49 @@ static int fill_list(unsigned int nr_pages)
     return ret;
  }

-/**
- * xen_alloc_unpopulated_pages - alloc unpopulated pages
- * @nr_pages: Number of pages
- * @pages: pages returned
- * @return 0 on success, error otherwise
- */
-int xen_alloc_unpopulated_pages(unsigned int nr_pages, struct page **pages)
+static int alloc_unpopulated_pages(unsigned int nr_pages, struct page
**pages,
+   bool contiguous)
  {
     unsigned int i;
     int ret = 0;
+   void *vaddr;
+   bool filled = false;

     /*
  * Fallback to default behavior if we do not have any suitable
resource
  * to allocate required region from and as the result we won't be able
to
  * construct pages.
  */
-   if (!target_resource)
+   if (!target_resource) {
+   if (contiguous)
+   return -ENODEV;
+
     return xen_alloc_ballooned_pages(nr_pages, pages);
+   }

     mutex_lock(&list_lock);
-   if (list_count < nr_pages) {
-   ret = fill_list(nr_pages - list_count);
+
+   while (!(vaddr = (void *)gen_pool_alloc(dma_pool, nr_pages *
PAGE_SIZE))) {
+   if (filled)
+   ret = -ENOMEM;
+   else {
+   ret = fill_list(nr_pages);
+   filled = true;
+   }
     if (ret)
     goto out;
     }

     for (i = 0; i < nr_pages; i++) {
-   struct page *pg = page_list;
-
-   BUG_ON(!pg);
-   page_list = pg->zone_device_data;
-   list_count--;
-   pages[i] = pg;
+   pages[i] = virt_to_page(vaddr + PAGE_SIZE * i);

  #ifdef CONFIG_XEN_HAVE_PVMMU
     if (!xen_feature(XENFEAT_auto_translated_physmap)) {
-   ret = xen_alloc_p2m_entry(page_to_pfn(pg));
+   ret = xen_alloc_p2m_entry(page_to_pfn(pages[i]));
     if (ret < 0) {
-   unsigned int j;
-
-   for (j = 0; j <= i; j++) {
- pages[j]->zone_device_data = page_list;
-   page_list = pages[j];
-   list_count++;
-   }
+   /* XXX Do we need to zeroed pages[i]? */
+   gen_pool_free(dma_pool, (unsigned long)vaddr,
+ 

[XEN PATCH v2 1/4] build,include: rework shell script for headers++.chk

2022-06-14 Thread Anthony PERARD
The command line generated for headers++.chk by make is quite long,
and in some environment it is too long. This issue have been seen in
Yocto build environment.

Error messages:
make[9]: execvp: /bin/sh: Argument list too long
make[9]: *** [include/Makefile:181: include/headers++.chk] Error 127

Rework so that we do the foreach loop in shell rather that make to
reduce the command line size by a lot. We also need a way to get
headers prerequisite for some public headers so we use a switch "case"
in shell to be able to do some simple pattern matching. Variables
alone in POSIX shell don't allow to work with associative array or
variables with "/".

Reported-by: Bertrand Marquis 
Fixes: 28e13c7f43 ("build: xen/include: use if_changed")
Signed-off-by: Anthony PERARD 
Reviewed-by: Bertrand Marquis 
---

Notes:
v2:
- fix typo in commit message
- fix out-of-tree build

v1:
- was sent as a reply to v1 of the series

 xen/include/Makefile | 17 +
 1 file changed, 13 insertions(+), 4 deletions(-)

diff --git a/xen/include/Makefile b/xen/include/Makefile
index 6d9bcc19b0..49c75a78f9 100644
--- a/xen/include/Makefile
+++ b/xen/include/Makefile
@@ -158,13 +158,22 @@ define cmd_headerscxx_chk
touch $@.new; \
exit 0;   \
fi;   \
-   $(foreach i, $(filter %.h,$^),\
-   echo "#include "\"$(i)\"  \
+   get_prereq() {\
+   case $$1 in   \
+   $(foreach i, $(filter %.h,$^),\
+   $(if $($(patsubst $(srctree)/%,%,$(i))-prereq),   \
+   $(i)$(close)  \
+   echo "$(foreach j, $($(patsubst $(srctree)/%,%,$(i))-prereq), \
+   -include c$(j))";;))  \
+   esac; \
+   };\
+   for i in $(filter %.h,$^); do \
+   echo "#include "\"$$i\"   \
| $(CXX) -x c++ -std=gnu++98 -Wall -Werror -D__XEN_TOOLS__\
  -include stdint.h -include $(srcdir)/public/xen.h   \
- $(foreach j, $($(patsubst $(srctree)/%,%,$i)-prereq), -include 
c$(j)) \
+ `get_prereq $$i`\
  -S -o /dev/null -   \
-   || exit $$?; echo $(i) >> $@.new;) \
+   || exit $$?; echo $$i >> $@.new; done;\
mv $@.new $@
 endef
 
-- 
Anthony PERARD




[XEN PATCH v2 4/4] build: replace get-fields.sh by a perl script

2022-06-14 Thread Anthony PERARD
The get-fields.sh which generate all the include/compat/.xlat/*.h
headers is quite slow. It takes for example nearly 3 seconds to
generate platform.h on a recent machine, or 2.3 seconds for memory.h.

Since it's only text processing, rewriting the mix of shell/sed/python
into a single perl script make the generation of those file a lot
faster.

I tried to keep a similar look for the code, to keep the code similar
between the shell and perl, and to ease review. So some code in perl
might look weird or could be written better.

No functional change, the headers generated are identical.

Signed-off-by: Anthony PERARD 
---

Notes:
v2:
- Add .pl extension to the perl script
- remove "-w" from the shebang as it is duplicate of "use warning;"
- Add a note in the commit message that the "headers generated are 
identical".

 xen/include/Makefile|   6 +-
 xen/tools/compat-xlat-header.pl | 539 
 xen/tools/get-fields.sh | 528 ---
 3 files changed, 541 insertions(+), 532 deletions(-)
 create mode 100755 xen/tools/compat-xlat-header.pl
 delete mode 100644 xen/tools/get-fields.sh

diff --git a/xen/include/Makefile b/xen/include/Makefile
index 0d3e3d66e0..31b0be14bc 100644
--- a/xen/include/Makefile
+++ b/xen/include/Makefile
@@ -60,9 +60,7 @@ cmd_compat_c = \
 
 quiet_cmd_xlat_headers = GEN $@
 cmd_xlat_headers = \
-while read what name; do \
-$(SHELL) $(srctree)/tools/get-fields.sh "$$what" compat_$$name $< || 
exit $$?; \
-done <$(patsubst $(obj)/compat/%,$(obj)/compat/.xlat/%,$(basename $<)).lst 
>$@.new; \
+$(PERL) $(srctree)/tools/compat-xlat-header.pl $< $(patsubst 
$(obj)/compat/%,$(obj)/compat/.xlat/%,$(basename $<)).lst > $@.new; \
 mv -f $@.new $@
 
 targets += $(headers-y)
@@ -80,7 +78,7 @@ $(obj)/compat/%.c: $(src)/public/%.h $(srcdir)/xlat.lst 
$(srctree)/tools/compat-
$(call if_changed,compat_c)
 
 targets += $(patsubst compat/%, compat/.xlat/%, $(headers-y))
-$(obj)/compat/.xlat/%.h: $(obj)/compat/%.h $(obj)/compat/.xlat/%.lst 
$(srctree)/tools/get-fields.sh FORCE
+$(obj)/compat/.xlat/%.h: $(obj)/compat/%.h $(obj)/compat/.xlat/%.lst 
$(srctree)/tools/compat-xlat-header.pl FORCE
$(call if_changed,xlat_headers)
 
 quiet_cmd_xlat_lst = GEN $@
diff --git a/xen/tools/compat-xlat-header.pl b/xen/tools/compat-xlat-header.pl
new file mode 100755
index 00..791230591c
--- /dev/null
+++ b/xen/tools/compat-xlat-header.pl
@@ -0,0 +1,539 @@
+#!/usr/bin/perl
+
+use strict;
+use warnings;
+
+open COMPAT_LIST, "<$ARGV[1]" or die "can't open $ARGV[1], $!";
+open HEADER, "<$ARGV[0]" or die "can't open $ARGV[0], $!";
+
+my @typedefs;
+
+my @header_tokens;
+while () {
+next if m/^\s*#.*/;
+s/([\]\[,;:{}])/ $1 /g;
+s/^\s+//;
+push(@header_tokens, split(/\s+/));
+}
+
+sub get_fields {
+my ($looking_for) = @_;
+my $level = 1;
+my $aggr = 0;
+my ($name, @fields);
+
+foreach (@header_tokens) {
+if (/^(struct|union)$/) {
+unless ($level != 1) {
+$aggr = 1;
+@fields = ();
+$name = '';
+}
+} elsif ($_ eq '{') {
+$level++;
+} elsif ($_ eq '}') {
+$level--;
+if ($level == 1 and $name eq $looking_for) {
+push (@fields, $_);
+return @fields;
+}
+} elsif (/^[a-zA-Z_].*/) {
+unless ($aggr == 0 or $name ne "") {
+$name = $_;
+}
+}
+unless ($aggr == 0) {
+push (@fields, $_);
+}
+}
+return ();
+}
+
+sub get_typedefs {
+my $level = 1;
+my $state = 0;
+my @typedefs;
+foreach (@_) {
+if ($_ eq 'typedef') {
+unless ($level != 1) {
+$state = 1;
+}
+} elsif (m/^COMPAT_HANDLE\(.*\)$/) {
+unless ($level != 1 or $state != 1) {
+$state = 2;
+}
+} elsif (m/^[\[\{]$/) {
+$level++;
+} elsif (m/^[\]\}]$/) {
+$level--;
+} elsif ($_ eq ';') {
+unless  ($level != 1) {
+$state = 0;
+}
+} elsif (m/^[a-zA-Z_].*$/) {
+unless ($level != 1 or $state != 2) {
+push (@typedefs, $_);
+}
+}
+}
+return @typedefs;
+}
+
+sub build_enums {
+my ($name, @tokens) = @_;
+
+my $level = 1;
+my $kind = '';
+my $named = '';
+my (@fields, @members, $id);
+
+foreach (@tokens) {
+if (m/^(struct|union)$/) {
+unless ($level != 2) {
+@fields = ('');
+}
+$kind="$_;$kind";
+} elsif ($_ eq '{') {
+$level++;
+} elsif ($_ eq '}') {
+$level--;
+if ($level == 1) {
+my $subkind = $kind;
+$subkind =~ s/;.*//;
+

[XEN PATCH v2 3/4] build: set PERL

2022-06-14 Thread Anthony PERARD
We are going to use it in a moment.

Also update README about Perl requirement.

Signed-off-by: Anthony PERARD 
---

Notes:
v2:
- update ./README

 xen/Makefile | 1 +
 README   | 1 +
 2 files changed, 2 insertions(+)

diff --git a/xen/Makefile b/xen/Makefile
index 82f5310b12..a6650a2acc 100644
--- a/xen/Makefile
+++ b/xen/Makefile
@@ -22,6 +22,7 @@ PYTHON_INTERPRETER:= $(word 1,$(shell which python3 
python python2 2>/dev/null)
 export PYTHON  ?= $(PYTHON_INTERPRETER)
 
 export CHECKPOLICY ?= checkpolicy
+export PERL?= perl
 
 $(if $(filter __%, $(MAKECMDGOALS)), \
 $(error targets prefixed with '__' are only for internal use))
diff --git a/README b/README
index 5e55047ffd..c1c18de7e0 100644
--- a/README
+++ b/README
@@ -64,6 +64,7 @@ provided by your OS distributor:
 * iproute package (/sbin/ip)
 * GNU bison and GNU flex
 * ACPI ASL compiler (iasl)
+* Perl
 
 In addition to the above there are a number of optional build
 prerequisites. Omitting these will cause the related features to be
-- 
Anthony PERARD




[XEN PATCH v2 2/4] build: remove auto.conf prerequisite from compat/xlat.h target

2022-06-14 Thread Anthony PERARD
Now that the command line generating "xlat.h" is check on rebuild, the
header will be regenerated whenever the list of xlat headers changes
due to change in ".config". We don't need to force a regeneration for
every changes in ".config".

Signed-off-by: Anthony PERARD 
Acked-by: Jan Beulich 
---
 xen/include/Makefile | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/xen/include/Makefile b/xen/include/Makefile
index 49c75a78f9..0d3e3d66e0 100644
--- a/xen/include/Makefile
+++ b/xen/include/Makefile
@@ -101,7 +101,7 @@ cmd_xlat_h = \
cat $(filter %.h,$^) >$@.new; \
mv -f $@.new $@
 
-$(obj)/compat/xlat.h: $(addprefix $(obj)/compat/.xlat/,$(xlat-y)) 
$(obj)/config/auto.conf FORCE
+$(obj)/compat/xlat.h: $(addprefix $(obj)/compat/.xlat/,$(xlat-y)) FORCE
$(call if_changed,xlat_h)
 
 ifeq ($(XEN_TARGET_ARCH),$(XEN_COMPILE_ARCH))
-- 
Anthony PERARD




[XEN PATCH v2 0/4] xen: rework compat headers generation

2022-06-14 Thread Anthony PERARD
Patch series available in this git branch:
https://xenbits.xen.org/git-http/people/aperard/xen-unstable.git 
br.build-system-xen-include-rework-v2

v2:
- new patch [1/4] to fix issue with command line that can be way too long
- other small changes, and reorder patches

Hi,

This patch series is about 2 improvement. First one is to use $(if_changed, )
in "include/Makefile" to make the generation of the compat headers less verbose
and to have the command line part of the decision to rebuild the headers.
Second one is to replace one slow script by a much faster one, and save time
when generating the headers.

There's some number here:

https://lore.kernel.org/xen-devel/yp3%2f%2fc%2fcacwlh...@perard.uk.xensource.com/

On my machine when doing a full build [of the hypervisor], with `ccache`
enabled, it saves about 1.17 seconds (out of ~17s), and without ccache, it
saves about 2.0 seconds (out of ~37s).

Thanks.

Anthony PERARD (4):
  build,include: rework shell script for headers++.chk
  build: remove auto.conf prerequisite from compat/xlat.h target
  build: set PERL
  build: replace get-fields.sh by a perl script

 xen/Makefile|   1 +
 xen/include/Makefile|  25 +-
 README  |   1 +
 xen/tools/compat-xlat-header.pl | 539 
 xen/tools/get-fields.sh | 528 ---
 5 files changed, 557 insertions(+), 537 deletions(-)
 create mode 100755 xen/tools/compat-xlat-header.pl
 delete mode 100644 xen/tools/get-fields.sh

-- 
Anthony PERARD




Re: [PATCH] build: fix exporting for make 3.82

2022-06-14 Thread Jan Beulich
On 14.06.2022 18:00, Anthony PERARD wrote:
> On Tue, Jun 14, 2022 at 05:40:27PM +0200, Jan Beulich wrote:
>> GNU make 3.82 apparently has a quirk where exporting an undefined
>> variable prevents its value from subsequently being updated. This
>> situation can arise due to our adding of -rR to MAKEFLAGS, which takes
>> effect also on make simply re-invoking itself. Once these flags are in
>> effect, CC (in particular) is empty (undefined), and would be defined
>> only via Config.mk including StdGNU.mk or alike. With the quirk, CC
>> remains empty, yet with an empty CC the compiler minimum version check
>> fails, breaking the build.
>>
>> Move the exporting of the various tool stack component variables past
>> where they gain their (final) values.
>>
>> Signed-off-by: Jan Beulich 
>> ---
>> There may be further factors playing into the described quirk, as I've
>> also observed that simply running make as 2nd time would lead to
>> successful building of xen/.
>>
>> While this wasn't a problem until several weeks back, I've not been able
>> to identify which exact commit would have caused the breakage. Hence no
>> Fixes: tag.
> 
> Reviewed-by: Anthony PERARD 

Thanks.

> Looks like this happened before: be63d9d47f ("build: tweak variable exporting 
> for make 3.82")

Ah yes. I did think I had to deal with that before, but I did check patches
only back to early 2021. But it's somewhat worse than described there: It's
not just the origin which changes, but (as explained) it actually prevents
the variable to further change its value.

> So, maybe the issue is started again with 15a0578ca4 ("build: shuffle
> main Makefile"), which move the include of Config.mk even later.

Yes, that's certainly it. Thanks for spotting - I'll add a Fixes: tag.

Jan



[PATCH] x86emul/test: improve failure location identification for FMA sub-test

2022-06-14 Thread Jan Beulich
When some FMA set of insns is included in the base instruction set (XOP,
AVX512F, and AVX512-FP16 at present), simd_test() simply invokes
fma_test(), negating its return value. In case of a failure this would
yield a value close to 4G, which doesn't lend itself to easy
identification of the failing test case. Recognize the case in
simd_check_regs() and emit alternative output identifying FMA.

Signed-off-by: Jan Beulich 

--- a/tools/tests/x86_emulator/test_x86_emulator.c
+++ b/tools/tests/x86_emulator/test_x86_emulator.c
@@ -259,7 +259,10 @@ static bool simd_check_regs(const struct
 {
 if ( !regs->eax )
 return true;
-printf("[line %u] ", (unsigned int)regs->eax);
+if ( (int)regs->eax > 0 )
+printf("[line %u] ", (unsigned int)regs->eax);
+else
+printf("[FMA line %u] ", (unsigned int)-regs->eax);
 return false;
 }
 



Re: [PATCH] build: fix exporting for make 3.82

2022-06-14 Thread Anthony PERARD
On Tue, Jun 14, 2022 at 05:40:27PM +0200, Jan Beulich wrote:
> GNU make 3.82 apparently has a quirk where exporting an undefined
> variable prevents its value from subsequently being updated. This
> situation can arise due to our adding of -rR to MAKEFLAGS, which takes
> effect also on make simply re-invoking itself. Once these flags are in
> effect, CC (in particular) is empty (undefined), and would be defined
> only via Config.mk including StdGNU.mk or alike. With the quirk, CC
> remains empty, yet with an empty CC the compiler minimum version check
> fails, breaking the build.
> 
> Move the exporting of the various tool stack component variables past
> where they gain their (final) values.
> 
> Signed-off-by: Jan Beulich 
> ---
> There may be further factors playing into the described quirk, as I've
> also observed that simply running make as 2nd time would lead to
> successful building of xen/.
> 
> While this wasn't a problem until several weeks back, I've not been able
> to identify which exact commit would have caused the breakage. Hence no
> Fixes: tag.

Reviewed-by: Anthony PERARD 

Looks like this happened before: be63d9d47f ("build: tweak variable exporting 
for make 3.82")
So, maybe the issue is started again with 15a0578ca4 ("build: shuffle
main Makefile"), which move the include of Config.mk even later.

Thanks,

-- 
Anthony PERARD



Re: [PATCH] build: fix exporting for make 3.82

2022-06-14 Thread Jan Beulich
On 14.06.2022 17:40, Jan Beulich wrote:
> GNU make 3.82 apparently has a quirk where exporting an undefined
> variable prevents its value from subsequently being updated. This
> situation can arise due to our adding of -rR to MAKEFLAGS, which takes
> effect also on make simply re-invoking itself. Once these flags are in
> effect, CC (in particular) is empty (undefined), and would be defined
> only via Config.mk including StdGNU.mk or alike. With the quirk, CC
> remains empty, yet with an empty CC the compiler minimum version check
> fails, breaking the build.
> 
> Move the exporting of the various tool stack component variables past
> where they gain their (final) values.
> 
> Signed-off-by: Jan Beulich 
> ---
> There may be further factors playing into the described quirk, as I've
> also observed that simply running make as 2nd time would lead to
> successful building of xen/.

Albeit perhaps that's simply because then no re-invocation of make is
involved, as auto.conf and auto.conf.cmd then already exist (and are
up-to-date).

Jan

> While this wasn't a problem until several weeks back, I've not been able
> to identify which exact commit would have caused the breakage. Hence no
> Fixes: tag.
> 
> --- a/xen/Makefile
> +++ b/xen/Makefile
> @@ -44,8 +44,6 @@ export ARCH SRCARCH
>  # Allow someone to change their config file
>  export KCONFIG_CONFIG ?= .config
>  
> -export CC CXX LD NM OBJCOPY OBJDUMP ADDR2LINE
> -
>  export TARGET := xen
>  
>  .PHONY: dist
> @@ -244,6 +242,7 @@ export TARGET_ARCH := $(shell echo $
>  -e s'/riscv.*/riscv/g')
>  
>  export CONFIG_SHELL := $(SHELL)
> +export CC CXX LD NM OBJCOPY OBJDUMP ADDR2LINE
>  export YACC = $(if $(BISON),$(BISON),bison)
>  export LEX = $(if $(FLEX),$(FLEX),flex)
>  
> 




[PATCH] build: fix exporting for make 3.82

2022-06-14 Thread Jan Beulich
GNU make 3.82 apparently has a quirk where exporting an undefined
variable prevents its value from subsequently being updated. This
situation can arise due to our adding of -rR to MAKEFLAGS, which takes
effect also on make simply re-invoking itself. Once these flags are in
effect, CC (in particular) is empty (undefined), and would be defined
only via Config.mk including StdGNU.mk or alike. With the quirk, CC
remains empty, yet with an empty CC the compiler minimum version check
fails, breaking the build.

Move the exporting of the various tool stack component variables past
where they gain their (final) values.

Signed-off-by: Jan Beulich 
---
There may be further factors playing into the described quirk, as I've
also observed that simply running make as 2nd time would lead to
successful building of xen/.

While this wasn't a problem until several weeks back, I've not been able
to identify which exact commit would have caused the breakage. Hence no
Fixes: tag.

--- a/xen/Makefile
+++ b/xen/Makefile
@@ -44,8 +44,6 @@ export ARCH SRCARCH
 # Allow someone to change their config file
 export KCONFIG_CONFIG ?= .config
 
-export CC CXX LD NM OBJCOPY OBJDUMP ADDR2LINE
-
 export TARGET := xen
 
 .PHONY: dist
@@ -244,6 +242,7 @@ export TARGET_ARCH := $(shell echo $
 -e s'/riscv.*/riscv/g')
 
 export CONFIG_SHELL := $(SHELL)
+export CC CXX LD NM OBJCOPY OBJDUMP ADDR2LINE
 export YACC = $(if $(BISON),$(BISON),bison)
 export LEX = $(if $(FLEX),$(FLEX),flex)
 



Re: [PATCH] tools/xenstore: simplify loop handling connection I/O

2022-06-14 Thread Julien Grall

Hi Juergen,

On 14/06/2022 16:31, Juergen Gross wrote:

The loop handling input and output of connections of xenstored is
open coding list_for_each_entry_safe() in an incredibly complicated
way.

Use list_for_each_entry_safe() instead, making it much more clear how
the code is working.

Signed-off-by: Juergen Gross 


Reviewed-by: Julien Grall 

Cheers,

--
Julien Grall



[PATCH] tools/xenstore: simplify loop handling connection I/O

2022-06-14 Thread Juergen Gross
The loop handling input and output of connections of xenstored is
open coding list_for_each_entry_safe() in an incredibly complicated
way.

Use list_for_each_entry_safe() instead, making it much more clear how
the code is working.

Signed-off-by: Juergen Gross 
---
 tools/xenstore/xenstored_core.c | 12 ++--
 1 file changed, 2 insertions(+), 10 deletions(-)

diff --git a/tools/xenstore/xenstored_core.c b/tools/xenstore/xenstored_core.c
index 6e4022e5da..fa733e714e 100644
--- a/tools/xenstore/xenstored_core.c
+++ b/tools/xenstore/xenstored_core.c
@@ -2368,16 +2368,8 @@ int main(int argc, char *argv[])
}
}
 
-   next = list_entry(connections.next, typeof(*conn), list);
-   if (&next->list != &connections)
-   talloc_increase_ref_count(next);
-   while (&next->list != &connections) {
-   conn = next;
-
-   next = list_entry(conn->list.next,
- typeof(*conn), list);
-   if (&next->list != &connections)
-   talloc_increase_ref_count(next);
+   list_for_each_entry_safe(conn, next, &connections, list) {
+   talloc_increase_ref_count(conn);
 
if (conn_can_read(conn))
handle_input(conn);
-- 
2.35.3




Re: [PULL 00/15] Kraxel 20220614 patches

2022-06-14 Thread Richard Henderson

On 6/14/22 05:15, Gerd Hoffmann wrote:

The following changes since commit debd0753663bc89c86f5462a53268f2e3f680f60:

   Merge tag 'pull-testing-next-140622-1' of https://github.com/stsquad/qemu 
into staging (2022-06-13 21:10:57 -0700)

are available in the Git repository at:

   git://git.kraxel.org/qemu tags/kraxel-20220614-pull-request

for you to fetch changes up to b95b56311a0890da0c9f7fc624529c3d7f8dbce0:

   virtio-gpu: Respect UI refresh rate for EDID (2022-06-14 10:34:37 +0200)


usb: add CanoKey device, fixes for ehci + redir
ui: fixes for gtk and cocoa, rework refresh rate
virtio-gpu: scanout flush fix


Applied, thanks.  Please update https://wiki.qemu.org/ChangeLog/7.1 as 
appropriate.


r~






Akihiko Odaki (4):
   ui/cocoa: Fix poweroff request code
   ui/console: Do not return a value with ui_info
   ui: Deliver refresh rate via QemuUIInfo
   virtio-gpu: Respect UI refresh rate for EDID

Arnout Engelen (1):
   hw/usb/hcd-ehci: fix writeback order

Dongwon Kim (1):
   virtio-gpu: update done only on the scanout associated with rect

Hongren (Zenithal) Zheng (6):
   hw/usb: Add CanoKey Implementation
   hw/usb/canokey: Add trace events
   meson: Add CanoKey
   docs: Add CanoKey documentation
   docs/system/devices/usb: Add CanoKey to USB devices examples
   MAINTAINERS: add myself as CanoKey maintainer

Joelle van Dyne (1):
   usbredir: avoid queuing hello packet on snapshot restore

Volker Rümelin (2):
   ui/gtk-gl-area: implement GL context destruction
   ui/gtk-gl-area: create the requested GL context version

  meson_options.txt|   2 +
  hw/usb/canokey.h |  69 +++
  include/hw/virtio/virtio-gpu.h   |   1 +
  include/ui/console.h |   4 +-
  include/ui/gtk.h |   2 +-
  hw/display/virtio-gpu-base.c |   7 +-
  hw/display/virtio-gpu.c  |   4 +
  hw/display/virtio-vga.c  |   5 +-
  hw/display/xenfb.c   |  14 +-
  hw/usb/canokey.c | 313 +++
  hw/usb/hcd-ehci.c|   5 +-
  hw/usb/redirect.c|   3 +-
  hw/vfio/display.c|   8 +-
  ui/console.c |   6 -
  ui/gtk-egl.c |   4 +-
  ui/gtk-gl-area.c |  42 -
  ui/gtk.c |  45 +++--
  MAINTAINERS  |   8 +
  docs/system/device-emulation.rst |   1 +
  docs/system/devices/canokey.rst  | 168 +
  docs/system/devices/usb.rst  |   4 +
  hw/usb/Kconfig   |   5 +
  hw/usb/meson.build   |   5 +
  hw/usb/trace-events  |  16 ++
  meson.build  |   6 +
  scripts/meson-buildoptions.sh|   3 +
  ui/cocoa.m   |   6 +-
  ui/trace-events  |   2 +
  28 files changed, 707 insertions(+), 51 deletions(-)
  create mode 100644 hw/usb/canokey.h
  create mode 100644 hw/usb/canokey.c
  create mode 100644 docs/system/devices/canokey.rst






Re: [PATCH 1/2] x86/pat: fix x86_has_pat_wp()

2022-06-14 Thread Juergen Gross

On 03.05.22 15:22, Juergen Gross wrote:

x86_has_pat_wp() is using a wrong test, as it relies on the normal
PAT configuration used by the kernel. In case the PAT MSR has been
setup by another entity (e.g. BIOS or Xen hypervisor) it might return
false even if the PAT configuration is allowing WP mappings.

Fixes: 1f6f655e01ad ("x86/mm: Add a x86_has_pat_wp() helper")
Signed-off-by: Juergen Gross 
---
  arch/x86/mm/init.c | 3 ++-
  1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/arch/x86/mm/init.c b/arch/x86/mm/init.c
index d8cfce221275..71e182ebced3 100644
--- a/arch/x86/mm/init.c
+++ b/arch/x86/mm/init.c
@@ -80,7 +80,8 @@ static uint8_t __pte2cachemode_tbl[8] = {
  /* Check that the write-protect PAT entry is set for write-protect */
  bool x86_has_pat_wp(void)
  {
-   return __pte2cachemode_tbl[_PAGE_CACHE_MODE_WP] == _PAGE_CACHE_MODE_WP;
+   return __pte2cachemode_tbl[__cachemode2pte_tbl[_PAGE_CACHE_MODE_WP]] ==
+  _PAGE_CACHE_MODE_WP;
  }
  
  enum page_cache_mode pgprot2cachemode(pgprot_t pgprot)


x86 maintainers, please consider taking this patch, as it is fixing
a real bug. Patch 2 of this series can be dropped IMO.


Juergen


OpenPGP_0xB0DE9DD628BF132F.asc
Description: OpenPGP public key


OpenPGP_signature
Description: OpenPGP digital signature


Re: [PATCH 24/36] printk: Remove trace_.*_rcuidle() usage

2022-06-14 Thread Steven Rostedt
On Thu, 9 Jun 2022 15:02:20 +0200
Petr Mladek  wrote:

> > I'm somewhat curious whether we can actually remove that trace event.  
> 
> Good question.
> 
> Well, I think that it might be useful. It allows to see trace and
> printk messages together.

Yes people still use it. I was just asked about it at Kernel Recipes. That
is, someone wanted printk mixed in with the tracing, and I told them about
this event (which they didn't know about but was happy to hear that it
existed).

-- Steve



[qemu-mainline test] 171160: regressions - FAIL

2022-06-14 Thread osstest service owner
flight 171160 qemu-mainline real [real]
flight 171163 qemu-mainline real-retest [real]
http://logs.test-lab.xenproject.org/osstest/logs/171160/
http://logs.test-lab.xenproject.org/osstest/logs/171163/

Regressions :-(

Tests which did not succeed and are blocking,
including tests which could not be run:
 test-amd64-amd64-xl-qcow222 guest-start.2fail REGR. vs. 171149

Tests which are failing intermittently (not blocking):
 test-amd64-amd64-xl-qcow2 21 guest-start/debian.repeat fail in 171163 pass in 
171160
 test-amd64-i386-xl-qemuu-debianhvm-amd64 7 xen-install fail pass in 
171163-retest

Tests which did not succeed, but are not blocking:
 test-armhf-armhf-xl-rtds 19 guest-start.2   fail blocked in 171149
 test-amd64-amd64-xl-qemuu-win7-amd64 19 guest-stopfail like 171149
 test-armhf-armhf-libvirt 16 saverestore-support-checkfail  like 171149
 test-amd64-amd64-qemuu-nested-amd 20 debian-hvm-install/l1/l2 fail like 171149
 test-amd64-i386-xl-qemuu-win7-amd64 19 guest-stop fail like 171149
 test-armhf-armhf-libvirt-raw 15 saverestore-support-checkfail  like 171149
 test-armhf-armhf-libvirt-qcow2 15 saverestore-support-check   fail like 171149
 test-amd64-i386-xl-qemuu-ws16-amd64 19 guest-stop fail like 171149
 test-amd64-amd64-xl-qemuu-ws16-amd64 19 guest-stopfail like 171149
 test-arm64-arm64-xl-seattle  15 migrate-support-checkfail   never pass
 test-arm64-arm64-xl-seattle  16 saverestore-support-checkfail   never pass
 test-amd64-i386-xl-pvshim14 guest-start  fail   never pass
 test-amd64-i386-libvirt-xsm  15 migrate-support-checkfail   never pass
 test-amd64-i386-libvirt  15 migrate-support-checkfail   never pass
 test-arm64-arm64-xl-thunderx 15 migrate-support-checkfail   never pass
 test-arm64-arm64-xl-thunderx 16 saverestore-support-checkfail   never pass
 test-arm64-arm64-xl-xsm  15 migrate-support-checkfail   never pass
 test-arm64-arm64-xl-xsm  16 saverestore-support-checkfail   never pass
 test-arm64-arm64-libvirt-xsm 15 migrate-support-checkfail   never pass
 test-arm64-arm64-libvirt-xsm 16 saverestore-support-checkfail   never pass
 test-arm64-arm64-xl-credit1  15 migrate-support-checkfail   never pass
 test-arm64-arm64-xl-credit1  16 saverestore-support-checkfail   never pass
 test-arm64-arm64-xl  15 migrate-support-checkfail   never pass
 test-arm64-arm64-xl  16 saverestore-support-checkfail   never pass
 test-amd64-i386-libvirt-qemuu-debianhvm-amd64-xsm 13 migrate-support-check 
fail never pass
 test-amd64-amd64-libvirt-xsm 15 migrate-support-checkfail   never pass
 test-arm64-arm64-xl-credit2  15 migrate-support-checkfail   never pass
 test-arm64-arm64-xl-credit2  16 saverestore-support-checkfail   never pass
 test-amd64-i386-libvirt-raw  14 migrate-support-checkfail   never pass
 test-amd64-amd64-libvirt-vhd 14 migrate-support-checkfail   never pass
 test-arm64-arm64-libvirt-raw 14 migrate-support-checkfail   never pass
 test-arm64-arm64-libvirt-raw 15 saverestore-support-checkfail   never pass
 test-armhf-armhf-xl-credit1  15 migrate-support-checkfail   never pass
 test-armhf-armhf-xl-credit1  16 saverestore-support-checkfail   never pass
 test-arm64-arm64-xl-vhd  14 migrate-support-checkfail   never pass
 test-arm64-arm64-xl-vhd  15 saverestore-support-checkfail   never pass
 test-armhf-armhf-xl-credit2  15 migrate-support-checkfail   never pass
 test-armhf-armhf-xl-credit2  16 saverestore-support-checkfail   never pass
 test-armhf-armhf-xl-rtds 15 migrate-support-checkfail   never pass
 test-armhf-armhf-xl-rtds 16 saverestore-support-checkfail   never pass
 test-armhf-armhf-xl-multivcpu 15 migrate-support-checkfail  never pass
 test-armhf-armhf-xl-multivcpu 16 saverestore-support-checkfail  never pass
 test-armhf-armhf-xl-cubietruck 15 migrate-support-checkfail never pass
 test-armhf-armhf-xl-cubietruck 16 saverestore-support-checkfail never pass
 test-armhf-armhf-libvirt 15 migrate-support-checkfail   never pass
 test-amd64-amd64-libvirt 15 migrate-support-checkfail   never pass
 test-amd64-amd64-libvirt-qemuu-debianhvm-amd64-xsm 13 migrate-support-check 
fail never pass
 test-armhf-armhf-libvirt-raw 14 migrate-support-checkfail   never pass
 test-armhf-armhf-xl-vhd  14 migrate-support-checkfail   never pass
 test-armhf-armhf-xl-vhd  15 saverestore-support-checkfail   never pass
 test-armhf-armhf-xl-arndale  15 migrate-support-checkfail   never pass
 test-armhf-armhf-xl-arndale  16 saverestore-support-checkfail   never pass
 test-armhf-armhf-xl  15 migrate-support-checkfail   never pass
 test-armhf-armhf-xl  16 saverestore-support-checkfail  

Re: [PATCH 24/30] panic: Refactor the panic path

2022-06-14 Thread Petr Mladek
On Thu 2022-05-26 13:25:57, Guilherme G. Piccoli wrote:
> OK, so it seems we have some points in which agreement exists, and some
> points that there is no agreement and instead, we have antagonistic /
> opposite views and needs. Let's start with the easier part heh
>
> It seems everybody agrees that *we shouldn't over-engineer things*, and
> as per Eric good words: making the panic path more feature-full or
> increasing flexibility isn't a good idea. So, as a "corollary": the
> panic level approach I'm proposing is not a good fit, I'll drop it and
> let's go with something simpler.

Makes sense.

> Another point of agreement seems to be that _notifier lists in the panic
> path are dangerous_, for *2 different reasons*:
> 
> (a) We cannot guarantee that people won't add crazy callbacks there, we
> can plan and document things the best as possible - it'll never be
> enough, somebody eventually would slip a nonsense callback that would
> break things and defeat the planned purpose of such a list;

It is true that notifier lists might allow to add crazy stuff
without proper review more easily. Things added into the core
code would most likely get better review.

But nothing is error-proof. And bugs will happen with any approach.


> (b) As per Eric point, in a panic/crash situation we might have memory
> corruption exactly in the list code / pointers, etc, so the notifier
> lists are, by nature, a bit fragile. But I think we shouldn't consider
> it completely "bollocks", since this approach has been used for a while
> with a good success rate. So, lists aren't perfect at all, but at the
> same time, they aren't completely useless.

I am not able to judge this. Of course, any extra step increases
the risk. I am just not sure how much more complicated it would
be to hardcode the calls. Most of them are architecture
and/or feature specific. And such code is often hard to
review and maintain.

> To avoid using a 4th list,

4th or 5th? We already have "hypervisor", "info", "pre-reboot", and "pre-loop".
The 5th might be pre-crash-exec.

> especially given the list nature is a bit
> fragile, I'd suggest one of the 3 following approaches - I *really
> appreciate feedbacks* on that so I can implement the best solution and
> avoid wasting time in some poor/disliked solution:

Honestly, I am not able to decide what might be better without seeing
the code.

Most things fits pretty well into the 4 proposed lists:
"hypervisor", "info", "pre-reboot", and "pre-loop". IMHO, the
only question is the code that needs to be always called
even before crash_dump.

I suggest that you solve the crash_dump callbacks the way that
looks best to you. Ideally do it in a separate patch so it can be
reviewed and reworked more easily.

I believe that a fresh code with an updated split and simplified
logic would help us to move forward.

Best Regards,
Petr



Re: [PATCH 14/36] cpuidle: Fix rcu_idle_*() usage

2022-06-14 Thread Mark Rutland
On Wed, Jun 08, 2022 at 04:27:37PM +0200, Peter Zijlstra wrote:
> --- a/kernel/time/tick-broadcast.c
> +++ b/kernel/time/tick-broadcast.c
> @@ -622,9 +622,13 @@ struct cpumask *tick_get_broadcast_onesh
>   * to avoid a deep idle transition as we are about to get the
>   * broadcast IPI right away.
>   */
> -int tick_check_broadcast_expired(void)
> +noinstr int tick_check_broadcast_expired(void)
>  {
> +#ifdef _ASM_GENERIC_BITOPS_INSTRUMENTED_NON_ATOMIC_H
> + return arch_test_bit(smp_processor_id(), 
> cpumask_bits(tick_broadcast_force_mask));
> +#else
>   return cpumask_test_cpu(smp_processor_id(), tick_broadcast_force_mask);
> +#endif
>  }

This is somewhat not-ideal. :/

Could we unconditionally do the arch_test_bit() variant, with a comment, or
does that not exist in some cases?

Thanks,
Mark.



[PULL 12/15] virtio-gpu: update done only on the scanout associated with rect

2022-06-14 Thread Gerd Hoffmann
From: Dongwon Kim 

It only needs to update the scanouts containing the rect area
coming with the resource-flush request from the guest.

Cc: Gerd Hoffmann 
Cc: Vivek Kasireddy 
Signed-off-by: Dongwon Kim 
Message-Id: <20220505214030.4261-1-dongwon@intel.com>
Signed-off-by: Gerd Hoffmann 
---
 hw/display/virtio-gpu.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/hw/display/virtio-gpu.c b/hw/display/virtio-gpu.c
index cd4a56056fd9..55c6dd576318 100644
--- a/hw/display/virtio-gpu.c
+++ b/hw/display/virtio-gpu.c
@@ -514,6 +514,9 @@ static void virtio_gpu_resource_flush(VirtIOGPU *g,
 for (i = 0; i < g->parent_obj.conf.max_outputs; i++) {
 scanout = &g->parent_obj.scanout[i];
 if (scanout->resource_id == res->resource_id &&
+rf.r.x >= scanout->x && rf.r.y >= scanout->y &&
+rf.r.x + rf.r.width <= scanout->x + scanout->width &&
+rf.r.y + rf.r.height <= scanout->y + scanout->height &&
 console_has_gl(scanout->con)) {
 dpy_gl_update(scanout->con, 0, 0, scanout->width,
   scanout->height);
-- 
2.36.1




[PULL 11/15] usbredir: avoid queuing hello packet on snapshot restore

2022-06-14 Thread Gerd Hoffmann
From: Joelle van Dyne 

When launching QEMU with "-loadvm", usbredir_create_parser() should avoid
setting up the hello packet (just as with "-incoming". On the latest version
of libusbredir, usbredirparser_unserialize() will return error if the parser
is not "pristine."

Signed-off-by: Joelle van Dyne 
Message-Id: <20220507041850.98716-...@getutm.app>
Signed-off-by: Gerd Hoffmann 
---
 hw/usb/redirect.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/hw/usb/redirect.c b/hw/usb/redirect.c
index fd7df599bc0b..1bd30efc3ef0 100644
--- a/hw/usb/redirect.c
+++ b/hw/usb/redirect.c
@@ -1280,7 +1280,8 @@ static void usbredir_create_parser(USBRedirDevice *dev)
 }
 #endif
 
-if (runstate_check(RUN_STATE_INMIGRATE)) {
+if (runstate_check(RUN_STATE_INMIGRATE) ||
+runstate_check(RUN_STATE_PRELAUNCH)) {
 flags |= usbredirparser_fl_no_hello;
 }
 usbredirparser_init(dev->parser, VERSION, caps, USB_REDIR_CAPS_SIZE,
-- 
2.36.1




Re: [PULL 00/16] Kraxel 20220613 patches

2022-06-14 Thread Gerd Hoffmann
On Mon, Jun 13, 2022 at 08:52:21AM -0700, Richard Henderson wrote:
> On 6/13/22 04:36, Gerd Hoffmann wrote:
> > The following changes since commit dcb40541ebca7ec98a14d461593b3cd7282b4fac:
> > 
> >Merge tag 'mips-20220611' of https://github.com/philmd/qemu into staging 
> > (2022-06-11 21:13:27 -0700)
> > 
> > are available in the Git repository at:
> > 
> >git://git.kraxel.org/qemu tags/kraxel-20220613-pull-request
> > 
> > for you to fetch changes up to 23b87f7a3a13e93e248eef8a4b7257548855a620:
> > 
> >ui: move 'pc-bios/keymaps' to 'ui/keymaps' (2022-06-13 10:59:25 +0200)
> > 
> > 
> > usb: add CanoKey device, fixes for ehci + redir
> > ui: fixes for gtk and cocoa, move keymaps (v2), rework refresh rate
> > virtio-gpu: scanout flush fix
> 
> This doesn't even configure:
> 
> ../src/ui/keymaps/meson.build:55:4: ERROR: File ar does not exist.

dropped keymaps patch for now, new version sent.

take care,
  Gerd




[PULL 14/15] ui: Deliver refresh rate via QemuUIInfo

2022-06-14 Thread Gerd Hoffmann
From: Akihiko Odaki 

This change adds a new member, refresh_rate to QemuUIInfo in
include/ui/console.h. It represents the refresh rate of the
physical display backend, and it is more appropriate than
GUI update interval as the refresh rate which the emulated device
reports:
- sdl may set GUI update interval shorter than the refresh rate
  of the physical display to respond to user-generated events.
- sdl and vnc aggressively changes GUI update interval, but
  a guests is typically not designed to respond to frequent
  refresh rate changes, or frequent "display mode" changes in
  general. The frequency of refresh rate changes of the physical
  display backend matches better to the guest's expectation.

QemuUIInfo also has other members representing "display mode",
which makes it suitable for refresh rate representation. It has
a throttling of update notifications, and prevents frequent changes
of the display mode.

Signed-off-by: Akihiko Odaki 
Message-Id: <20220226115516.59830-3-akihiko.od...@gmail.com>
Signed-off-by: Gerd Hoffmann 
---
 include/ui/console.h |  2 +-
 include/ui/gtk.h |  2 +-
 hw/display/xenfb.c   | 14 +++---
 ui/console.c |  6 --
 ui/gtk-egl.c |  4 ++--
 ui/gtk-gl-area.c |  3 +--
 ui/gtk.c | 45 +---
 7 files changed, 42 insertions(+), 34 deletions(-)

diff --git a/include/ui/console.h b/include/ui/console.h
index 642d6f5248cf..b64d82436097 100644
--- a/include/ui/console.h
+++ b/include/ui/console.h
@@ -139,6 +139,7 @@ typedef struct QemuUIInfo {
 int   yoff;
 uint32_t  width;
 uint32_t  height;
+uint32_t  refresh_rate;
 } QemuUIInfo;
 
 /* cursor data format is 32bit RGBA */
@@ -431,7 +432,6 @@ typedef struct GraphicHwOps {
 void (*gfx_update)(void *opaque);
 bool gfx_update_async; /* if true, calls graphic_hw_update_done() */
 void (*text_update)(void *opaque, console_ch_t *text);
-void (*update_interval)(void *opaque, uint64_t interval);
 void (*ui_info)(void *opaque, uint32_t head, QemuUIInfo *info);
 void (*gl_block)(void *opaque, bool block);
 } GraphicHwOps;
diff --git a/include/ui/gtk.h b/include/ui/gtk.h
index 101b147d1b98..ae0f53740d19 100644
--- a/include/ui/gtk.h
+++ b/include/ui/gtk.h
@@ -155,7 +155,7 @@ extern bool gtk_use_gl_area;
 
 /* ui/gtk.c */
 void gd_update_windowsize(VirtualConsole *vc);
-int gd_monitor_update_interval(GtkWidget *widget);
+void gd_update_monitor_refresh_rate(VirtualConsole *vc, GtkWidget *widget);
 void gd_hw_gl_flushed(void *vc);
 
 /* ui/gtk-egl.c */
diff --git a/hw/display/xenfb.c b/hw/display/xenfb.c
index cea10fe3c780..50857cd97a0b 100644
--- a/hw/display/xenfb.c
+++ b/hw/display/xenfb.c
@@ -777,16 +777,24 @@ static void xenfb_update(void *opaque)
 xenfb->up_fullscreen = 0;
 }
 
-static void xenfb_update_interval(void *opaque, uint64_t interval)
+static void xenfb_ui_info(void *opaque, uint32_t idx, QemuUIInfo *info)
 {
 struct XenFB *xenfb = opaque;
+uint32_t refresh_rate;
 
 if (xenfb->feature_update) {
 #ifdef XENFB_TYPE_REFRESH_PERIOD
 if (xenfb_queue_full(xenfb)) {
 return;
 }
-xenfb_send_refresh_period(xenfb, interval);
+
+refresh_rate = info->refresh_rate;
+if (!refresh_rate) {
+refresh_rate = 75;
+}
+
+/* T = 1 / f = 1 [s*Hz] / f = 1000*1000 [ms*mHz] / f */
+xenfb_send_refresh_period(xenfb, 1000 * 1000 / refresh_rate);
 #endif
 }
 }
@@ -983,5 +991,5 @@ struct XenDevOps xen_framebuffer_ops = {
 static const GraphicHwOps xenfb_ops = {
 .invalidate  = xenfb_invalidate,
 .gfx_update  = xenfb_update,
-.update_interval = xenfb_update_interval,
+.ui_info = xenfb_ui_info,
 };
diff --git a/ui/console.c b/ui/console.c
index 36c80cd1de85..9331b85203a0 100644
--- a/ui/console.c
+++ b/ui/console.c
@@ -160,7 +160,6 @@ static void gui_update(void *opaque)
 uint64_t dcl_interval;
 DisplayState *ds = opaque;
 DisplayChangeListener *dcl;
-QemuConsole *con;
 
 ds->refreshing = true;
 dpy_refresh(ds);
@@ -175,11 +174,6 @@ static void gui_update(void *opaque)
 }
 if (ds->update_interval != interval) {
 ds->update_interval = interval;
-QTAILQ_FOREACH(con, &consoles, next) {
-if (con->hw_ops->update_interval) {
-con->hw_ops->update_interval(con->hw, interval);
-}
-}
 trace_console_refresh(interval);
 }
 ds->last_update = qemu_clock_get_ms(QEMU_CLOCK_REALTIME);
diff --git a/ui/gtk-egl.c b/ui/gtk-egl.c
index e3bd4bc27431..b5bffbab2522 100644
--- a/ui/gtk-egl.c
+++ b/ui/gtk-egl.c
@@ -140,8 +140,8 @@ void gd_egl_refresh(DisplayChangeListener *dcl)
 {
 VirtualConsole *vc = container_of(dcl, VirtualConsole, gfx.dcl);
 
-vc->gfx.dcl.update_interval = gd_monitor_update_interval(
-vc->window ? vc->window : vc->gfx.drawing_area);
+gd_update_monitor_refresh_rate(
+vc, v

[PULL 15/15] virtio-gpu: Respect UI refresh rate for EDID

2022-06-14 Thread Gerd Hoffmann
From: Akihiko Odaki 

Signed-off-by: Akihiko Odaki 
Message-Id: <20220226115516.59830-4-akihiko.od...@gmail.com>
Signed-off-by: Gerd Hoffmann 
---
 include/hw/virtio/virtio-gpu.h | 1 +
 hw/display/virtio-gpu-base.c   | 1 +
 hw/display/virtio-gpu.c| 1 +
 3 files changed, 3 insertions(+)

diff --git a/include/hw/virtio/virtio-gpu.h b/include/hw/virtio/virtio-gpu.h
index afff9e158e31..2e28507efe21 100644
--- a/include/hw/virtio/virtio-gpu.h
+++ b/include/hw/virtio/virtio-gpu.h
@@ -80,6 +80,7 @@ struct virtio_gpu_scanout {
 struct virtio_gpu_requested_state {
 uint16_t width_mm, height_mm;
 uint32_t width, height;
+uint32_t refresh_rate;
 int x, y;
 };
 
diff --git a/hw/display/virtio-gpu-base.c b/hw/display/virtio-gpu-base.c
index b21d6e5b0be8..a29f191aa82e 100644
--- a/hw/display/virtio-gpu-base.c
+++ b/hw/display/virtio-gpu-base.c
@@ -79,6 +79,7 @@ static void virtio_gpu_ui_info(void *opaque, uint32_t idx, 
QemuUIInfo *info)
 
 g->req_state[idx].x = info->xoff;
 g->req_state[idx].y = info->yoff;
+g->req_state[idx].refresh_rate = info->refresh_rate;
 g->req_state[idx].width = info->width;
 g->req_state[idx].height = info->height;
 g->req_state[idx].width_mm = info->width_mm;
diff --git a/hw/display/virtio-gpu.c b/hw/display/virtio-gpu.c
index 55c6dd576318..20cc703dcc6e 100644
--- a/hw/display/virtio-gpu.c
+++ b/hw/display/virtio-gpu.c
@@ -217,6 +217,7 @@ virtio_gpu_generate_edid(VirtIOGPU *g, int scanout,
 .height_mm = b->req_state[scanout].height_mm,
 .prefx = b->req_state[scanout].width,
 .prefy = b->req_state[scanout].height,
+.refresh_rate = b->req_state[scanout].refresh_rate,
 };
 
 edid->size = cpu_to_le32(sizeof(edid->edid));
-- 
2.36.1




[PULL 10/15] hw/usb/hcd-ehci: fix writeback order

2022-06-14 Thread Gerd Hoffmann
From: Arnout Engelen 

The 'active' bit passes control over a qTD between the guest and the
controller: set to 1 by guest to enable execution by the controller,
and the controller sets it to '0' to hand back control to the guest.

ehci_state_writeback write two dwords to main memory using DMA:
the third dword of the qTD (containing dt, total bytes to transfer,
cpage, cerr and status) and the fourth dword of the qTD (containing
the offset).

This commit makes sure the fourth dword is written before the third,
avoiding a race condition where a new offset written into the qTD
by the guest after it observed the status going to go to '0' gets
overwritten by a 'late' DMA writeback of the previous offset.

This race condition could lead to 'cpage out of range (5)' errors,
and reproduced by:

./qemu-system-x86_64 -enable-kvm -bios $SEABIOS/bios.bin -m 4096 -device 
usb-ehci -blockdev 
driver=file,read-only=on,filename=/home/aengelen/Downloads/openSUSE-Tumbleweed-DVD-i586-Snapshot20220428-Media.iso,node-name=iso
 -device usb-storage,drive=iso,bootindex=0 -chardev 
pipe,id=shell,path=/tmp/pipe -device virtio-serial -device 
virtconsole,chardev=shell -device virtio-rng-pci -serial mon:stdio -nographic

(press a key, select 'Installation' (2), and accept the default
values. On my machine the 'cpage out of range' is reproduced while
loading the Linux Kernel about once per 7 attempts. With the fix in
this commit it no longer fails)

This problem was previously reported as a seabios problem in
https://mail.coreboot.org/hyperkitty/list/seab...@seabios.org/thread/OUTHT5ISSQJGXPNTUPY3O5E5EPZJCHM3/
and as a nixos CI build failure in
https://github.com/NixOS/nixpkgs/issues/170803

Signed-off-by: Arnout Engelen 
Signed-off-by: Gerd Hoffmann 
---
 hw/usb/hcd-ehci.c | 5 -
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/hw/usb/hcd-ehci.c b/hw/usb/hcd-ehci.c
index 33a8a377bd95..d4da8dcb8d15 100644
--- a/hw/usb/hcd-ehci.c
+++ b/hw/usb/hcd-ehci.c
@@ -2011,7 +2011,10 @@ static int ehci_state_writeback(EHCIQueue *q)
 ehci_trace_qtd(q, NLPTR_GET(p->qtdaddr), (EHCIqtd *) &q->qh.next_qtd);
 qtd = (uint32_t *) &q->qh.next_qtd;
 addr = NLPTR_GET(p->qtdaddr);
-put_dwords(q->ehci, addr + 2 * sizeof(uint32_t), qtd + 2, 2);
+/* First write back the offset */
+put_dwords(q->ehci, addr + 3 * sizeof(uint32_t), qtd + 3, 1);
+/* Then write back the token, clearing the 'active' bit */
+put_dwords(q->ehci, addr + 2 * sizeof(uint32_t), qtd + 2, 1);
 ehci_free_packet(p);
 
 /*
-- 
2.36.1




[PULL 13/15] ui/console: Do not return a value with ui_info

2022-06-14 Thread Gerd Hoffmann
From: Akihiko Odaki 

The returned value is not used and misleading.

Signed-off-by: Akihiko Odaki 
Message-Id: <20220226115516.59830-2-akihiko.od...@gmail.com>
Signed-off-by: Gerd Hoffmann 
---
 include/ui/console.h | 2 +-
 hw/display/virtio-gpu-base.c | 6 +++---
 hw/display/virtio-vga.c  | 5 ++---
 hw/vfio/display.c| 8 +++-
 4 files changed, 9 insertions(+), 12 deletions(-)

diff --git a/include/ui/console.h b/include/ui/console.h
index c44b28a972ca..642d6f5248cf 100644
--- a/include/ui/console.h
+++ b/include/ui/console.h
@@ -432,7 +432,7 @@ typedef struct GraphicHwOps {
 bool gfx_update_async; /* if true, calls graphic_hw_update_done() */
 void (*text_update)(void *opaque, console_ch_t *text);
 void (*update_interval)(void *opaque, uint64_t interval);
-int (*ui_info)(void *opaque, uint32_t head, QemuUIInfo *info);
+void (*ui_info)(void *opaque, uint32_t head, QemuUIInfo *info);
 void (*gl_block)(void *opaque, bool block);
 } GraphicHwOps;
 
diff --git a/hw/display/virtio-gpu-base.c b/hw/display/virtio-gpu-base.c
index 790cec333c8c..b21d6e5b0be8 100644
--- a/hw/display/virtio-gpu-base.c
+++ b/hw/display/virtio-gpu-base.c
@@ -69,12 +69,12 @@ static void virtio_gpu_notify_event(VirtIOGPUBase *g, 
uint32_t event_type)
 virtio_notify_config(&g->parent_obj);
 }
 
-static int virtio_gpu_ui_info(void *opaque, uint32_t idx, QemuUIInfo *info)
+static void virtio_gpu_ui_info(void *opaque, uint32_t idx, QemuUIInfo *info)
 {
 VirtIOGPUBase *g = opaque;
 
 if (idx >= g->conf.max_outputs) {
-return -1;
+return;
 }
 
 g->req_state[idx].x = info->xoff;
@@ -92,7 +92,7 @@ static int virtio_gpu_ui_info(void *opaque, uint32_t idx, 
QemuUIInfo *info)
 
 /* send event to guest */
 virtio_gpu_notify_event(g, VIRTIO_GPU_EVENT_DISPLAY);
-return 0;
+return;
 }
 
 static void
diff --git a/hw/display/virtio-vga.c b/hw/display/virtio-vga.c
index c206b5da384b..4dcb34c4a740 100644
--- a/hw/display/virtio-vga.c
+++ b/hw/display/virtio-vga.c
@@ -47,15 +47,14 @@ static void virtio_vga_base_text_update(void *opaque, 
console_ch_t *chardata)
 }
 }
 
-static int virtio_vga_base_ui_info(void *opaque, uint32_t idx, QemuUIInfo 
*info)
+static void virtio_vga_base_ui_info(void *opaque, uint32_t idx, QemuUIInfo 
*info)
 {
 VirtIOVGABase *vvga = opaque;
 VirtIOGPUBase *g = vvga->vgpu;
 
 if (g->hw_ops->ui_info) {
-return g->hw_ops->ui_info(g, idx, info);
+g->hw_ops->ui_info(g, idx, info);
 }
-return -1;
 }
 
 static void virtio_vga_base_gl_block(void *opaque, bool block)
diff --git a/hw/vfio/display.c b/hw/vfio/display.c
index 89bc90508fb8..78f4d82c1c35 100644
--- a/hw/vfio/display.c
+++ b/hw/vfio/display.c
@@ -106,14 +106,14 @@ err:
 return;
 }
 
-static int vfio_display_edid_ui_info(void *opaque, uint32_t idx,
- QemuUIInfo *info)
+static void vfio_display_edid_ui_info(void *opaque, uint32_t idx,
+  QemuUIInfo *info)
 {
 VFIOPCIDevice *vdev = opaque;
 VFIODisplay *dpy = vdev->dpy;
 
 if (!dpy->edid_regs) {
-return 0;
+return;
 }
 
 if (info->width && info->height) {
@@ -121,8 +121,6 @@ static int vfio_display_edid_ui_info(void *opaque, uint32_t 
idx,
 } else {
 vfio_display_edid_update(vdev, false, 0, 0);
 }
-
-return 0;
 }
 
 static void vfio_display_edid_init(VFIOPCIDevice *vdev)
-- 
2.36.1




[PULL 09/15] MAINTAINERS: add myself as CanoKey maintainer

2022-06-14 Thread Gerd Hoffmann
From: "Hongren (Zenithal) Zheng" 

Signed-off-by: Hongren (Zenithal) Zheng 
Message-Id: 
Signed-off-by: Gerd Hoffmann 
---
 MAINTAINERS | 8 
 1 file changed, 8 insertions(+)

diff --git a/MAINTAINERS b/MAINTAINERS
index 0df25ed4b0a3..4cf6174f9f37 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -2427,6 +2427,14 @@ F: hw/intc/s390_flic*.c
 F: include/hw/s390x/s390_flic.h
 L: qemu-s3...@nongnu.org
 
+CanoKey
+M: Hongren (Zenithal) Zheng 
+S: Maintained
+R: Canokeys.org 
+F: hw/usb/canokey.c
+F: hw/usb/canokey.h
+F: docs/system/devices/canokey.rst
+
 Subsystems
 --
 Overall Audio backends
-- 
2.36.1




[PULL 05/15] hw/usb/canokey: Add trace events

2022-06-14 Thread Gerd Hoffmann
From: "Hongren (Zenithal) Zheng" 

Signed-off-by: Hongren (Zenithal) Zheng 
Message-Id: 
Signed-off-by: Gerd Hoffmann 
---
 hw/usb/canokey.c| 13 +
 hw/usb/trace-events | 16 
 2 files changed, 29 insertions(+)

diff --git a/hw/usb/canokey.c b/hw/usb/canokey.c
index 6cb8b7cdb089..4a08b1cbd776 100644
--- a/hw/usb/canokey.c
+++ b/hw/usb/canokey.c
@@ -14,6 +14,7 @@
 #include "qapi/error.h"
 #include "hw/usb.h"
 #include "hw/qdev-properties.h"
+#include "trace.h"
 #include "desc.h"
 #include "canokey.h"
 
@@ -66,6 +67,7 @@ static const USBDesc desc_canokey = {
  */
 int canokey_emu_stall_ep(void *base, uint8_t ep)
 {
+trace_canokey_emu_stall_ep(ep);
 CanoKeyState *key = base;
 uint8_t ep_in = CANOKEY_EP_IN(ep); /* INTR IN has ep 129 */
 key->ep_in_size[ep_in] = 0;
@@ -75,6 +77,7 @@ int canokey_emu_stall_ep(void *base, uint8_t ep)
 
 int canokey_emu_set_address(void *base, uint8_t addr)
 {
+trace_canokey_emu_set_address(addr);
 CanoKeyState *key = base;
 key->dev.addr = addr;
 return 0;
@@ -83,6 +86,7 @@ int canokey_emu_set_address(void *base, uint8_t addr)
 int canokey_emu_prepare_receive(
 void *base, uint8_t ep, uint8_t *pbuf, uint16_t size)
 {
+trace_canokey_emu_prepare_receive(ep, size);
 CanoKeyState *key = base;
 key->ep_out[ep] = pbuf;
 key->ep_out_size[ep] = size;
@@ -92,6 +96,7 @@ int canokey_emu_prepare_receive(
 int canokey_emu_transmit(
 void *base, uint8_t ep, const uint8_t *pbuf, uint16_t size)
 {
+trace_canokey_emu_transmit(ep, size);
 CanoKeyState *key = base;
 uint8_t ep_in = CANOKEY_EP_IN(ep); /* INTR IN has ep 129 */
 memcpy(key->ep_in[ep_in] + key->ep_in_size[ep_in],
@@ -125,6 +130,7 @@ uint32_t canokey_emu_get_rx_data_size(void *base, uint8_t 
ep)
  */
 static void canokey_handle_reset(USBDevice *dev)
 {
+trace_canokey_handle_reset();
 CanoKeyState *key = CANOKEY(dev);
 for (int i = 0; i != CANOKEY_EP_NUM; ++i) {
 key->ep_in_state[i] = CANOKEY_EP_IN_WAIT;
@@ -137,6 +143,7 @@ static void canokey_handle_reset(USBDevice *dev)
 static void canokey_handle_control(USBDevice *dev, USBPacket *p,
int request, int value, int index, int length, uint8_t *data)
 {
+trace_canokey_handle_control_setup(request, value, index, length);
 CanoKeyState *key = CANOKEY(dev);
 
 canokey_emu_setup(request, value, index, length);
@@ -144,6 +151,7 @@ static void canokey_handle_control(USBDevice *dev, 
USBPacket *p,
 uint32_t dir_in = request & DeviceRequest;
 if (!dir_in) {
 /* OUT */
+trace_canokey_handle_control_out();
 if (key->ep_out[0] != NULL) {
 memcpy(key->ep_out[0], data, length);
 }
@@ -163,6 +171,7 @@ static void canokey_handle_control(USBDevice *dev, 
USBPacket *p,
 case CANOKEY_EP_IN_READY:
 memcpy(data, key->ep_in[0], key->ep_in_size[0]);
 p->actual_length = key->ep_in_size[0];
+trace_canokey_handle_control_in(p->actual_length);
 /* reset state */
 key->ep_in_state[0] = CANOKEY_EP_IN_WAIT;
 key->ep_in_size[0] = 0;
@@ -182,6 +191,7 @@ static void canokey_handle_data(USBDevice *dev, USBPacket 
*p)
 uint32_t out_len;
 switch (p->pid) {
 case USB_TOKEN_OUT:
+trace_canokey_handle_data_out(ep_out, p->iov.size);
 usb_packet_copy(p, key->ep_out_buffer[ep_out], p->iov.size);
 out_pos = 0;
 while (out_pos != p->iov.size) {
@@ -226,6 +236,7 @@ static void canokey_handle_data(USBDevice *dev, USBPacket 
*p)
 key->ep_in_size[ep_in] = 0;
 key->ep_in_pos[ep_in] = 0;
 }
+trace_canokey_handle_data_in(ep_in, in_len);
 break;
 }
 break;
@@ -237,6 +248,7 @@ static void canokey_handle_data(USBDevice *dev, USBPacket 
*p)
 
 static void canokey_realize(USBDevice *base, Error **errp)
 {
+trace_canokey_realize();
 CanoKeyState *key = CANOKEY(base);
 
 if (key->file == NULL) {
@@ -260,6 +272,7 @@ static void canokey_realize(USBDevice *base, Error **errp)
 
 static void canokey_unrealize(USBDevice *base)
 {
+trace_canokey_unrealize();
 }
 
 static Property canokey_properties[] = {
diff --git a/hw/usb/trace-events b/hw/usb/trace-events
index 9773cb53300d..914ca7166829 100644
--- a/hw/usb/trace-events
+++ b/hw/usb/trace-events
@@ -345,3 +345,19 @@ usb_serial_set_baud(int bus, int addr, int baud) "dev 
%d:%u baud rate %d"
 usb_serial_set_data(int bus, int addr, int parity, int data, int stop) "dev 
%d:%u parity %c, data bits %d, stop bits %d"
 usb_serial_set_flow_control(int bus, int addr, int index) "dev %d:%u flow 
control %d"
 usb_serial_set_xonxoff(int bus, int addr, uint8_t xon, uint8_t xoff) "dev 
%d:%u xon 0x%x xoff 0x%x"
+
+# canokey.c
+canokey_emu_stall_ep(uint8_t ep) "ep %d"
+canokey_emu_set_address(uint8_t addr) "addr %d"
+canokey_emu_prepare_receive(uint8_t ep, uint16_t size) "ep %d size %d"
+canokey_emu_transmit(uint8_t e

[PULL 07/15] docs: Add CanoKey documentation

2022-06-14 Thread Gerd Hoffmann
From: "Hongren (Zenithal) Zheng" 

Signed-off-by: Hongren (Zenithal) Zheng 
Message-Id: 
Signed-off-by: Gerd Hoffmann 
---
 docs/system/device-emulation.rst |   1 +
 docs/system/devices/canokey.rst  | 168 +++
 2 files changed, 169 insertions(+)
 create mode 100644 docs/system/devices/canokey.rst

diff --git a/docs/system/device-emulation.rst b/docs/system/device-emulation.rst
index 3b729b920d7c..05060060563f 100644
--- a/docs/system/device-emulation.rst
+++ b/docs/system/device-emulation.rst
@@ -92,3 +92,4 @@ Emulated Devices
devices/vhost-user.rst
devices/virtio-pmem.rst
devices/vhost-user-rng.rst
+   devices/canokey.rst
diff --git a/docs/system/devices/canokey.rst b/docs/system/devices/canokey.rst
new file mode 100644
index ..169f99b8eb82
--- /dev/null
+++ b/docs/system/devices/canokey.rst
@@ -0,0 +1,168 @@
+.. _canokey:
+
+CanoKey QEMU
+
+
+CanoKey [1]_ is an open-source secure key with supports of
+
+* U2F / FIDO2 with Ed25519 and HMAC-secret
+* OpenPGP Card V3.4 with RSA4096, Ed25519 and more [2]_
+* PIV (NIST SP 800-73-4)
+* HOTP / TOTP
+* NDEF
+
+All these platform-independent features are in canokey-core [3]_.
+
+For different platforms, CanoKey has different implementations,
+including both hardware implementions and virtual cards:
+
+* CanoKey STM32 [4]_
+* CanoKey Pigeon [5]_
+* (virt-card) CanoKey USB/IP
+* (virt-card) CanoKey FunctionFS
+
+In QEMU, yet another CanoKey virt-card is implemented.
+CanoKey QEMU exposes itself as a USB device to the guest OS.
+
+With the same software configuration as a hardware key,
+the guest OS can use all the functionalities of a secure key as if
+there was actually an hardware key plugged in.
+
+CanoKey QEMU provides much convenience for debuging:
+
+* libcanokey-qemu supports debuging output thus developers can
+  inspect what happens inside a secure key
+* CanoKey QEMU supports trace event thus event
+* QEMU USB stack supports pcap thus USB packet between the guest
+  and key can be captured and analysed
+
+Then for developers:
+
+* For developers on software with secure key support (e.g. FIDO2, OpenPGP),
+  they can see what happens inside the secure key
+* For secure key developers, USB packets between guest OS and CanoKey
+  can be easily captured and analysed
+
+Also since this is a virtual card, it can be easily used in CI for testing
+on code coping with secure key.
+
+Building
+
+
+libcanokey-qemu is required to use CanoKey QEMU.
+
+.. code-block:: shell
+
+git clone https://github.com/canokeys/canokey-qemu
+mkdir canokey-qemu/build
+pushd canokey-qemu/build
+
+If you want to install libcanokey-qemu in a different place,
+add ``-DCMAKE_INSTALL_PREFIX=/path/to/your/place`` to cmake below.
+
+.. code-block:: shell
+
+cmake ..
+make
+make install # may need sudo
+popd
+
+Then configuring and building:
+
+.. code-block:: shell
+
+# depending on your env, lib/pkgconfig can be lib64/pkgconfig
+export PKG_CONFIG_PATH=/path/to/your/place/lib/pkgconfig:$PKG_CONFIG_PATH
+./configure --enable-canokey && make
+
+Using CanoKey QEMU
+==
+
+CanoKey QEMU stores all its data on a file of the host specified by the 
argument
+when invoking qemu.
+
+.. parsed-literal::
+
+|qemu_system| -usb -device canokey,file=$HOME/.canokey-file
+
+Note: you should keep this file carefully as it may contain your private key!
+
+The first time when the file is used, it is created and initialized by CanoKey,
+afterwards CanoKey QEMU would just read this file.
+
+After the guest OS boots, you can check that there is a USB device.
+
+For example, If the guest OS is an Linux machine. You may invoke lsusb
+and find CanoKey QEMU there:
+
+.. code-block:: shell
+
+$ lsusb
+Bus 001 Device 002: ID 20a0:42d4 Clay Logic CanoKey QEMU
+
+You may setup the key as guided in [6]_. The console for the key is at [7]_.
+
+Debuging
+
+
+CanoKey QEMU consists of two parts, ``libcanokey-qemu.so`` and ``canokey.c``,
+the latter of which resides in QEMU. The former provides core functionality
+of a secure key while the latter provides platform-dependent functions:
+USB packet handling.
+
+If you want to trace what happens inside the secure key, when compiling
+libcanokey-qemu, you should add ``-DQEMU_DEBUG_OUTPUT=ON`` in cmake command
+line:
+
+.. code-block:: shell
+
+cmake .. -DQEMU_DEBUG_OUTPUT=ON
+
+If you want to trace events happened in canokey.c, use
+
+.. parsed-literal::
+
+|qemu_system| --trace "canokey_*" \\
+-usb -device canokey,file=$HOME/.canokey-file
+
+If you want to capture USB packets between the guest and the host, you can:
+
+.. parsed-literal::
+
+|qemu_system| -usb -device canokey,file=$HOME/.canokey-file,pcap=key.pcap
+
+Limitations
+===
+
+Currently libcanokey-qemu.so has dozens of global variables as it was 
originally
+designed for embedded systems. Thus one qemu instance can not have
+multiple CanoKey QEMU runn

[PULL 06/15] meson: Add CanoKey

2022-06-14 Thread Gerd Hoffmann
From: "Hongren (Zenithal) Zheng" 

Signed-off-by: Hongren (Zenithal) Zheng 
Message-Id: 
Signed-off-by: Gerd Hoffmann 
---
 meson_options.txt | 2 ++
 hw/usb/Kconfig| 5 +
 hw/usb/meson.build| 5 +
 meson.build   | 6 ++
 scripts/meson-buildoptions.sh | 3 +++
 5 files changed, 21 insertions(+)

diff --git a/meson_options.txt b/meson_options.txt
index 2de94af03712..0e8197386b99 100644
--- a/meson_options.txt
+++ b/meson_options.txt
@@ -189,6 +189,8 @@ option('spice_protocol', type : 'feature', value : 'auto',
description: 'Spice protocol support')
 option('u2f', type : 'feature', value : 'auto',
description: 'U2F emulation support')
+option('canokey', type : 'feature', value : 'auto',
+   description: 'CanoKey support')
 option('usb_redir', type : 'feature', value : 'auto',
description: 'libusbredir support')
 option('l2tpv3', type : 'feature', value : 'auto',
diff --git a/hw/usb/Kconfig b/hw/usb/Kconfig
index 53f8283ffdc1..ce4f4339763e 100644
--- a/hw/usb/Kconfig
+++ b/hw/usb/Kconfig
@@ -119,6 +119,11 @@ config USB_U2F
 default y
 depends on USB
 
+config USB_CANOKEY
+bool
+default y
+depends on USB
+
 config IMX_USBPHY
 bool
 default y
diff --git a/hw/usb/meson.build b/hw/usb/meson.build
index de853d780dd8..793df42e2127 100644
--- a/hw/usb/meson.build
+++ b/hw/usb/meson.build
@@ -63,6 +63,11 @@ if u2f.found()
   softmmu_ss.add(when: 'CONFIG_USB_U2F', if_true: [u2f, 
files('u2f-emulated.c')])
 endif
 
+# CanoKey
+if canokey.found()
+  softmmu_ss.add(when: 'CONFIG_USB_CANOKEY', if_true: [canokey, 
files('canokey.c')])
+endif
+
 # usb redirect
 if usbredir.found()
   usbredir_ss = ss.source_set()
diff --git a/meson.build b/meson.build
index 21cd949082dc..0c2e11ff0715 100644
--- a/meson.build
+++ b/meson.build
@@ -1408,6 +1408,12 @@ if have_system
method: 'pkg-config',
kwargs: static_kwargs)
 endif
+canokey = not_found
+if have_system
+  canokey = dependency('canokey-qemu', required: get_option('canokey'),
+   method: 'pkg-config',
+   kwargs: static_kwargs)
+endif
 usbredir = not_found
 if not get_option('usb_redir').auto() or have_system
   usbredir = dependency('libusbredirparser-0.5', required: 
get_option('usb_redir'),
diff --git a/scripts/meson-buildoptions.sh b/scripts/meson-buildoptions.sh
index 00ea4d8cd169..1fc1d2e2c362 100644
--- a/scripts/meson-buildoptions.sh
+++ b/scripts/meson-buildoptions.sh
@@ -73,6 +73,7 @@ meson_options_help() {
   printf "%s\n" '  bpf eBPF support'
   printf "%s\n" '  brlapi  brlapi character device driver'
   printf "%s\n" '  bzip2   bzip2 support for DMG images'
+  printf "%s\n" '  canokey CanoKey support'
   printf "%s\n" '  cap-ng  cap_ng support'
   printf "%s\n" '  capstoneWhether and how to find the capstone 
library'
   printf "%s\n" '  cloop   cloop image format support'
@@ -204,6 +205,8 @@ _meson_option_parse() {
 --disable-brlapi) printf "%s" -Dbrlapi=disabled ;;
 --enable-bzip2) printf "%s" -Dbzip2=enabled ;;
 --disable-bzip2) printf "%s" -Dbzip2=disabled ;;
+--enable-canokey) printf "%s" -Dcanokey=enabled ;;
+--disable-canokey) printf "%s" -Dcanokey=disabled ;;
 --enable-cap-ng) printf "%s" -Dcap_ng=enabled ;;
 --disable-cap-ng) printf "%s" -Dcap_ng=disabled ;;
 --enable-capstone) printf "%s" -Dcapstone=enabled ;;
-- 
2.36.1




[PULL 04/15] hw/usb: Add CanoKey Implementation

2022-06-14 Thread Gerd Hoffmann
From: "Hongren (Zenithal) Zheng" 

This commit added a new emulated device called CanoKey to QEMU.

CanoKey implements platform independent features in canokey-core
https://github.com/canokeys/canokey-core, and leaves the USB implementation
to the platform.

In this commit the USB part was implemented in QEMU using QEMU's USB APIs,
therefore the emulated CanoKey can communicate with the guest OS using USB.

Signed-off-by: Hongren (Zenithal) Zheng 
Message-Id: 
Signed-off-by: Gerd Hoffmann 
---
 hw/usb/canokey.h |  69 +++
 hw/usb/canokey.c | 300 +++
 2 files changed, 369 insertions(+)
 create mode 100644 hw/usb/canokey.h
 create mode 100644 hw/usb/canokey.c

diff --git a/hw/usb/canokey.h b/hw/usb/canokey.h
new file mode 100644
index ..24cf30420346
--- /dev/null
+++ b/hw/usb/canokey.h
@@ -0,0 +1,69 @@
+/*
+ * CanoKey QEMU device header.
+ *
+ * Copyright (c) 2021-2022 Canokeys.org 
+ * Written by Hongren (Zenithal) Zheng 
+ *
+ * This code is licensed under the Apache-2.0.
+ */
+
+#ifndef CANOKEY_H
+#define CANOKEY_H
+
+#include "hw/qdev-core.h"
+
+#define TYPE_CANOKEY "canokey"
+#define CANOKEY(obj) \
+OBJECT_CHECK(CanoKeyState, (obj), TYPE_CANOKEY)
+
+/*
+ * State of Canokey (i.e. hw/canokey.c)
+ */
+
+/* CTRL INTR BULK */
+#define CANOKEY_EP_NUM 3
+/* BULK/INTR IN can be up to 1352 bytes, e.g. get key info */
+#define CANOKEY_EP_IN_BUFFER_SIZE 2048
+/* BULK OUT can be up to 270 bytes, e.g. PIV import cert */
+#define CANOKEY_EP_OUT_BUFFER_SIZE 512
+
+typedef enum {
+CANOKEY_EP_IN_WAIT,
+CANOKEY_EP_IN_READY,
+CANOKEY_EP_IN_STALL
+} CanoKeyEPState;
+
+typedef struct CanoKeyState {
+USBDevice dev;
+
+/* IN packets from canokey device loop */
+uint8_t ep_in[CANOKEY_EP_NUM][CANOKEY_EP_IN_BUFFER_SIZE];
+/*
+ * See canokey_emu_transmit
+ *
+ * For large INTR IN, receive multiple data from canokey device loop
+ * in this case ep_in_size would increase with every call
+ */
+uint32_t ep_in_size[CANOKEY_EP_NUM];
+/*
+ * Used in canokey_handle_data
+ * for IN larger than p->iov.size, we would do multiple handle_data()
+ *
+ * The difference between ep_in_pos and ep_in_size:
+ * We first increase ep_in_size to fill ep_in buffer in device_loop,
+ * then use ep_in_pos to submit data from ep_in buffer in handle_data
+ */
+uint32_t ep_in_pos[CANOKEY_EP_NUM];
+CanoKeyEPState ep_in_state[CANOKEY_EP_NUM];
+
+/* OUT pointer to canokey recv buffer */
+uint8_t *ep_out[CANOKEY_EP_NUM];
+uint32_t ep_out_size[CANOKEY_EP_NUM];
+/* For large BULK OUT, multiple write to ep_out is needed */
+uint8_t ep_out_buffer[CANOKEY_EP_NUM][CANOKEY_EP_OUT_BUFFER_SIZE];
+
+/* Properties */
+char *file; /* canokey-file */
+} CanoKeyState;
+
+#endif /* CANOKEY_H */
diff --git a/hw/usb/canokey.c b/hw/usb/canokey.c
new file mode 100644
index ..6cb8b7cdb089
--- /dev/null
+++ b/hw/usb/canokey.c
@@ -0,0 +1,300 @@
+/*
+ * CanoKey QEMU device implementation.
+ *
+ * Copyright (c) 2021-2022 Canokeys.org 
+ * Written by Hongren (Zenithal) Zheng 
+ *
+ * This code is licensed under the Apache-2.0.
+ */
+
+#include "qemu/osdep.h"
+#include 
+
+#include "qemu/module.h"
+#include "qapi/error.h"
+#include "hw/usb.h"
+#include "hw/qdev-properties.h"
+#include "desc.h"
+#include "canokey.h"
+
+#define CANOKEY_EP_IN(ep) ((ep) & 0x7F)
+
+#define CANOKEY_VENDOR_NUM 0x20a0
+#define CANOKEY_PRODUCT_NUM0x42d2
+
+/*
+ * placeholder, canokey-qemu implements its own usb desc
+ * Namely we do not use usb_desc_handle_contorl
+ */
+enum {
+STR_MANUFACTURER = 1,
+STR_PRODUCT,
+STR_SERIALNUMBER
+};
+
+static const USBDescStrings desc_strings = {
+[STR_MANUFACTURER] = "canokeys.org",
+[STR_PRODUCT]  = "CanoKey QEMU",
+[STR_SERIALNUMBER] = "0"
+};
+
+static const USBDescDevice desc_device_canokey = {
+.bcdUSB= 0x0,
+.bMaxPacketSize0   = 16,
+.bNumConfigurations= 0,
+.confs = NULL,
+};
+
+static const USBDesc desc_canokey = {
+.id = {
+.idVendor  = CANOKEY_VENDOR_NUM,
+.idProduct = CANOKEY_PRODUCT_NUM,
+.bcdDevice = 0x0100,
+.iManufacturer = STR_MANUFACTURER,
+.iProduct  = STR_PRODUCT,
+.iSerialNumber = STR_SERIALNUMBER,
+},
+.full = &desc_device_canokey,
+.high = &desc_device_canokey,
+.str  = desc_strings,
+};
+
+
+/*
+ * libcanokey-qemu.so side functions
+ * All functions are called from canokey_emu_device_loop
+ */
+int canokey_emu_stall_ep(void *base, uint8_t ep)
+{
+CanoKeyState *key = base;
+uint8_t ep_in = CANOKEY_EP_IN(ep); /* INTR IN has ep 129 */
+key->ep_in_size[ep_in] = 0;
+key->ep_in_state[ep_in] = CANOKEY_EP_IN_STALL;
+return 0;
+}
+
+int canokey_emu_set_address(void *base, uint8_t addr)
+{
+CanoKeyState *key = base;
+key->dev.

[PULL 08/15] docs/system/devices/usb: Add CanoKey to USB devices examples

2022-06-14 Thread Gerd Hoffmann
From: "Hongren (Zenithal) Zheng" 

Signed-off-by: Hongren (Zenithal) Zheng 
Message-Id: 
Signed-off-by: Gerd Hoffmann 
---
 docs/system/devices/usb.rst | 4 
 1 file changed, 4 insertions(+)

diff --git a/docs/system/devices/usb.rst b/docs/system/devices/usb.rst
index afb7d6c2268d..872d9167589b 100644
--- a/docs/system/devices/usb.rst
+++ b/docs/system/devices/usb.rst
@@ -199,6 +199,10 @@ option or the ``device_add`` monitor command. Available 
devices are:
 ``u2f-{emulated,passthru}``
Universal Second Factor device
 
+``canokey``
+   An Open-source Secure Key implementing FIDO2, OpenPGP, PIV and more.
+   For more information, see :ref:`canokey`.
+
 Physical port addressing
 
 
-- 
2.36.1




[PULL 01/15] ui/gtk-gl-area: implement GL context destruction

2022-06-14 Thread Gerd Hoffmann
From: Volker Rümelin 

The counterpart function for gd_gl_area_create_context() is
currently empty. Implement the gd_gl_area_destroy_context()
function to avoid GL context leaks.

Signed-off-by: Volker Rümelin 
Message-Id: <20220605085131.7711-1-vr_q...@t-online.de>
Signed-off-by: Gerd Hoffmann 
---
 ui/gtk-gl-area.c | 8 +++-
 ui/trace-events  | 1 +
 2 files changed, 8 insertions(+), 1 deletion(-)

diff --git a/ui/gtk-gl-area.c b/ui/gtk-gl-area.c
index fc5a082eb846..0e20ea031d34 100644
--- a/ui/gtk-gl-area.c
+++ b/ui/gtk-gl-area.c
@@ -201,7 +201,13 @@ QEMUGLContext gd_gl_area_create_context(DisplayGLCtx *dgc,
 
 void gd_gl_area_destroy_context(DisplayGLCtx *dgc, QEMUGLContext ctx)
 {
-/* FIXME */
+GdkGLContext *current_ctx = gdk_gl_context_get_current();
+
+trace_gd_gl_area_destroy_context(ctx, current_ctx);
+if (ctx == current_ctx) {
+gdk_gl_context_clear_current();
+}
+g_clear_object(&ctx);
 }
 
 void gd_gl_area_scanout_texture(DisplayChangeListener *dcl,
diff --git a/ui/trace-events b/ui/trace-events
index f78b5e66061f..1040ba0f88c7 100644
--- a/ui/trace-events
+++ b/ui/trace-events
@@ -26,6 +26,7 @@ gd_key_event(const char *tab, int gdk_keycode, int qkeycode, 
const char *action)
 gd_grab(const char *tab, const char *device, const char *reason) "tab=%s, 
dev=%s, reason=%s"
 gd_ungrab(const char *tab, const char *device) "tab=%s, dev=%s"
 gd_keymap_windowing(const char *name) "backend=%s"
+gd_gl_area_destroy_context(void *ctx, void *current_ctx) "ctx=%p, 
current_ctx=%p"
 
 # vnc-auth-sasl.c
 # vnc-auth-vencrypt.c
-- 
2.36.1




[PULL 02/15] ui/gtk-gl-area: create the requested GL context version

2022-06-14 Thread Gerd Hoffmann
From: Volker Rümelin 

Since about 2018 virglrenderer (commit fa835b0f88 "vrend: don't
hardcode context version") tries to open the highest available GL
context version. This is done by creating the known GL context
versions from the highest to the lowest until (*create_gl_context)
returns a context != NULL.

This does not work properly with
the current QEMU gd_gl_area_create_context() function, because
gdk_gl_context_realize() on Wayland creates a version 3.0 legacy
context if the requested GL context version can't be created.

In order for virglrenderer to find the highest available GL
context version, return NULL if the created context version is
lower than the requested version.

This fixes the following error:
QEMU started with -device virtio-vga-gl -display gtk,gl=on.
Under Wayland, the guest window remains black and the following
information can be seen on the host.

gl_version 30 - compat profile
(qemu:5978): Gdk-WARNING **: 16:19:01.533:
  gdk_gl_context_set_required_version
  - GL context versions less than 3.2 are not supported.

(qemu:5978): Gdk-WARNING **: 16:19:01.537:
  gdk_gl_context_set_required_version -
  GL context versions less than 3.2 are not supported.

(qemu:5978): Gdk-WARNING **: 16:19:01.554:
  gdk_gl_context_set_required_version -
  GL context versions less than 3.2 are not supported.
vrend_renderer_fill_caps: Entering with stale GL error: 1282

To reproduce this error, an OpenGL driver is required on the host
that doesn't have the latest OpenGL extensions fully implemented.
An example for this is the Intel i965 driver on a Haswell processor.

Signed-off-by: Volker Rümelin 
Message-Id: <20220605085131.7711-2-vr_q...@t-online.de>
Signed-off-by: Gerd Hoffmann 
---
 ui/gtk-gl-area.c | 31 ++-
 ui/trace-events  |  1 +
 2 files changed, 31 insertions(+), 1 deletion(-)

diff --git a/ui/gtk-gl-area.c b/ui/gtk-gl-area.c
index 0e20ea031d34..2e0129c28cd4 100644
--- a/ui/gtk-gl-area.c
+++ b/ui/gtk-gl-area.c
@@ -170,6 +170,23 @@ void gd_gl_area_switch(DisplayChangeListener *dcl,
 }
 }
 
+static int gd_cmp_gl_context_version(int major, int minor, QEMUGLParams 
*params)
+{
+if (major > params->major_ver) {
+return 1;
+}
+if (major < params->major_ver) {
+return -1;
+}
+if (minor > params->minor_ver) {
+return 1;
+}
+if (minor < params->minor_ver) {
+return -1;
+}
+return 0;
+}
+
 QEMUGLContext gd_gl_area_create_context(DisplayGLCtx *dgc,
 QEMUGLParams *params)
 {
@@ -177,8 +194,8 @@ QEMUGLContext gd_gl_area_create_context(DisplayGLCtx *dgc,
 GdkWindow *window;
 GdkGLContext *ctx;
 GError *err = NULL;
+int major, minor;
 
-gtk_gl_area_make_current(GTK_GL_AREA(vc->gfx.drawing_area));
 window = gtk_widget_get_window(vc->gfx.drawing_area);
 ctx = gdk_window_create_gl_context(window, &err);
 if (err) {
@@ -196,6 +213,18 @@ QEMUGLContext gd_gl_area_create_context(DisplayGLCtx *dgc,
 g_clear_object(&ctx);
 return NULL;
 }
+
+gdk_gl_context_make_current(ctx);
+gdk_gl_context_get_version(ctx, &major, &minor);
+gdk_gl_context_clear_current();
+gtk_gl_area_make_current(GTK_GL_AREA(vc->gfx.drawing_area));
+
+if (gd_cmp_gl_context_version(major, minor, params) == -1) {
+/* created ctx version < requested version */
+g_clear_object(&ctx);
+}
+
+trace_gd_gl_area_create_context(ctx, params->major_ver, params->minor_ver);
 return ctx;
 }
 
diff --git a/ui/trace-events b/ui/trace-events
index 1040ba0f88c7..a922f00e10b4 100644
--- a/ui/trace-events
+++ b/ui/trace-events
@@ -26,6 +26,7 @@ gd_key_event(const char *tab, int gdk_keycode, int qkeycode, 
const char *action)
 gd_grab(const char *tab, const char *device, const char *reason) "tab=%s, 
dev=%s, reason=%s"
 gd_ungrab(const char *tab, const char *device) "tab=%s, dev=%s"
 gd_keymap_windowing(const char *name) "backend=%s"
+gd_gl_area_create_context(void *ctx, int major, int minor) "ctx=%p, major=%d, 
minor=%d"
 gd_gl_area_destroy_context(void *ctx, void *current_ctx) "ctx=%p, 
current_ctx=%p"
 
 # vnc-auth-sasl.c
-- 
2.36.1




[PULL 03/15] ui/cocoa: Fix poweroff request code

2022-06-14 Thread Gerd Hoffmann
From: Akihiko Odaki 

Signed-off-by: Akihiko Odaki 
Reviewed-by: Philippe Mathieu-Daudé 
Message-Id: <20220529082508.89097-1-akihiko.od...@gmail.com>
Signed-off-by: Gerd Hoffmann 
---
 ui/cocoa.m | 6 +-
 1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/ui/cocoa.m b/ui/cocoa.m
index 09a62817f2a9..84c84e98fc5e 100644
--- a/ui/cocoa.m
+++ b/ui/cocoa.m
@@ -35,6 +35,7 @@
 #include "ui/kbd-state.h"
 #include "sysemu/sysemu.h"
 #include "sysemu/runstate.h"
+#include "sysemu/runstate-action.h"
 #include "sysemu/cpu-throttle.h"
 #include "qapi/error.h"
 #include "qapi/qapi-commands-block.h"
@@ -1290,7 +1291,10 @@ - (void)applicationWillTerminate:(NSNotification 
*)aNotification
 {
 COCOA_DEBUG("QemuCocoaAppController: applicationWillTerminate\n");
 
-qemu_system_shutdown_request(SHUTDOWN_CAUSE_HOST_UI);
+with_iothread_lock(^{
+shutdown_action = SHUTDOWN_ACTION_POWEROFF;
+qemu_system_shutdown_request(SHUTDOWN_CAUSE_HOST_UI);
+});
 
 /*
  * Sleep here, because returning will cause OSX to kill us
-- 
2.36.1




[PULL 00/15] Kraxel 20220614 patches

2022-06-14 Thread Gerd Hoffmann
The following changes since commit debd0753663bc89c86f5462a53268f2e3f680f60:

  Merge tag 'pull-testing-next-140622-1' of https://github.com/stsquad/qemu 
into staging (2022-06-13 21:10:57 -0700)

are available in the Git repository at:

  git://git.kraxel.org/qemu tags/kraxel-20220614-pull-request

for you to fetch changes up to b95b56311a0890da0c9f7fc624529c3d7f8dbce0:

  virtio-gpu: Respect UI refresh rate for EDID (2022-06-14 10:34:37 +0200)


usb: add CanoKey device, fixes for ehci + redir
ui: fixes for gtk and cocoa, rework refresh rate
virtio-gpu: scanout flush fix



Akihiko Odaki (4):
  ui/cocoa: Fix poweroff request code
  ui/console: Do not return a value with ui_info
  ui: Deliver refresh rate via QemuUIInfo
  virtio-gpu: Respect UI refresh rate for EDID

Arnout Engelen (1):
  hw/usb/hcd-ehci: fix writeback order

Dongwon Kim (1):
  virtio-gpu: update done only on the scanout associated with rect

Hongren (Zenithal) Zheng (6):
  hw/usb: Add CanoKey Implementation
  hw/usb/canokey: Add trace events
  meson: Add CanoKey
  docs: Add CanoKey documentation
  docs/system/devices/usb: Add CanoKey to USB devices examples
  MAINTAINERS: add myself as CanoKey maintainer

Joelle van Dyne (1):
  usbredir: avoid queuing hello packet on snapshot restore

Volker Rümelin (2):
  ui/gtk-gl-area: implement GL context destruction
  ui/gtk-gl-area: create the requested GL context version

 meson_options.txt|   2 +
 hw/usb/canokey.h |  69 +++
 include/hw/virtio/virtio-gpu.h   |   1 +
 include/ui/console.h |   4 +-
 include/ui/gtk.h |   2 +-
 hw/display/virtio-gpu-base.c |   7 +-
 hw/display/virtio-gpu.c  |   4 +
 hw/display/virtio-vga.c  |   5 +-
 hw/display/xenfb.c   |  14 +-
 hw/usb/canokey.c | 313 +++
 hw/usb/hcd-ehci.c|   5 +-
 hw/usb/redirect.c|   3 +-
 hw/vfio/display.c|   8 +-
 ui/console.c |   6 -
 ui/gtk-egl.c |   4 +-
 ui/gtk-gl-area.c |  42 -
 ui/gtk.c |  45 +++--
 MAINTAINERS  |   8 +
 docs/system/device-emulation.rst |   1 +
 docs/system/devices/canokey.rst  | 168 +
 docs/system/devices/usb.rst  |   4 +
 hw/usb/Kconfig   |   5 +
 hw/usb/meson.build   |   5 +
 hw/usb/trace-events  |  16 ++
 meson.build  |   6 +
 scripts/meson-buildoptions.sh|   3 +
 ui/cocoa.m   |   6 +-
 ui/trace-events  |   2 +
 28 files changed, 707 insertions(+), 51 deletions(-)
 create mode 100644 hw/usb/canokey.h
 create mode 100644 hw/usb/canokey.c
 create mode 100644 docs/system/devices/canokey.rst

-- 
2.36.1




Re: [PULL 00/16] Kraxel 20220613 patches

2022-06-14 Thread Gerd Hoffmann
> > Hmm, build worked here and CI passed too.
> > 
> > I think this is one of those cases where the build directory must be
> > deleted because one subdirectory is replaced by a compatibility
> > symlink.
> 
> Except 'configure' deals with that, as it explicitly rm -rf's the
> symlink target:
> 
> symlink() {
>   rm -rf "$2"
>   mkdir -p "$(dirname "$2")"
>   ln -s "$1" "$2"
> }
> 
> so i'm pretty confused as to what's going wrong here still

'git rebase -x ./make.sh master queue/kraxel' not working (where make.sh
is a script effectively doing 'make -C build/$name' for multiple build
trees with different configurations).

'git status' lists ui/keymaps/* as deleted.
'git reset --hard' fixes it.

take care,
  Gerd




Re: [PATCH 00/36] cpuidle,rcu: Cleanup the mess

2022-06-14 Thread Mark Rutland
On Wed, Jun 08, 2022 at 04:27:23PM +0200, Peter Zijlstra wrote:
> Hi All! (omg so many)

Hi Peter,

Sorry for the delay; my plate has also been rather full recently. I'm beginning
to page this in now.

> These here few patches mostly clear out the utter mess that is cpuidle vs 
> rcuidle.
> 
> At the end of the ride there's only 2 real RCU_NONIDLE() users left
> 
>   arch/arm64/kernel/suspend.c:RCU_NONIDLE(__cpu_suspend_exit());
>   drivers/perf/arm_pmu.c: RCU_NONIDLE(armpmu_start(event, 
> PERF_EF_RELOAD));

The latter of these is necessary because apparently PM notifiers are called
with RCU not watching. Is that still the case today (or at the end of this
series)? If so, that feels like fertile land for more issues (yaey...). If not,
we should be able to drop this.

I can go dig into that some more.

>   kernel/cfi.c:   RCU_NONIDLE({
> 
> (the CFI one is likely dead in the kCFI rewrite) and there's only a hand full
> of trace_.*_rcuidle() left:
> 
>   kernel/trace/trace_preemptirq.c:
> trace_irq_enable_rcuidle(CALLER_ADDR0, CALLER_ADDR1);
>   kernel/trace/trace_preemptirq.c:
> trace_irq_disable_rcuidle(CALLER_ADDR0, CALLER_ADDR1);
>   kernel/trace/trace_preemptirq.c:
> trace_irq_enable_rcuidle(CALLER_ADDR0, caller_addr);
>   kernel/trace/trace_preemptirq.c:
> trace_irq_disable_rcuidle(CALLER_ADDR0, caller_addr);
>   kernel/trace/trace_preemptirq.c:
> trace_preempt_enable_rcuidle(a0, a1);
>   kernel/trace/trace_preemptirq.c:
> trace_preempt_disable_rcuidle(a0, a1);
> 
> All of them are in 'deprecated' code that is unused for GENERIC_ENTRY.

I think those are also unused on arm64 too?

If not, I can go attack that.

> I've touched a _lot_ of code that I can't test and likely broken some of it :/
> In particular, the whole ARM cpuidle stuff was quite involved with OMAP being
> the absolute 'winner'.
> 
> I'm hoping Mark can help me sort the remaining ARM64 bits as he moves that to
> GENERIC_ENTRY.

Moving to GENERIC_ENTRY as a whole is going to take a tonne of work
(refactoring both arm64 and the generic portion to be more amenable to each
other), but we can certainly move closer to that for the bits that matter here.

Maybe we want a STRICT_ENTRY option to get rid of all the deprecated stuff that
we can select regardless of GENERIC_ENTRY to make that easier.

> I've also got a note that says ARM64 can probably do a WFE based
> idle state and employ TIF_POLLING_NRFLAG to avoid some IPIs.

Possibly; I'm not sure how much of a win that'll be given that by default we'll
have a ~10KHz WFE wakeup from the timer, but we could take a peek.

Thanks,
Mark.



[xen-unstable test] 171157: regressions - FAIL

2022-06-14 Thread osstest service owner
flight 171157 xen-unstable real [real]
http://logs.test-lab.xenproject.org/osstest/logs/171157/

Regressions :-(

Tests which did not succeed and are blocking,
including tests which could not be run:
 build-arm64-pvops 6 kernel-build fail REGR. vs. 171151

Tests which did not succeed, but are not blocking:
 test-arm64-arm64-examine  1 build-check(1)   blocked  n/a
 test-arm64-arm64-libvirt-raw  1 build-check(1)   blocked  n/a
 test-arm64-arm64-libvirt-xsm  1 build-check(1)   blocked  n/a
 test-arm64-arm64-xl   1 build-check(1)   blocked  n/a
 test-arm64-arm64-xl-credit1   1 build-check(1)   blocked  n/a
 test-arm64-arm64-xl-credit2   1 build-check(1)   blocked  n/a
 test-arm64-arm64-xl-seattle   1 build-check(1)   blocked  n/a
 test-arm64-arm64-xl-thunderx  1 build-check(1)   blocked  n/a
 test-arm64-arm64-xl-vhd   1 build-check(1)   blocked  n/a
 test-arm64-arm64-xl-xsm   1 build-check(1)   blocked  n/a
 test-amd64-i386-libvirt-raw   7 xen-install  fail  like 171151
 test-amd64-amd64-xl-qemut-win7-amd64 19 guest-stopfail like 171151
 test-armhf-armhf-libvirt 16 saverestore-support-checkfail  like 171151
 test-amd64-amd64-qemuu-nested-amd 20 debian-hvm-install/l1/l2 fail like 171151
 test-amd64-amd64-xl-qemuu-ws16-amd64 19 guest-stopfail like 171151
 test-amd64-i386-xl-qemut-ws16-amd64 19 guest-stop fail like 171151
 test-amd64-i386-xl-qemut-win7-amd64 19 guest-stop fail like 171151
 test-armhf-armhf-libvirt-qcow2 15 saverestore-support-check   fail like 171151
 test-armhf-armhf-libvirt-raw 15 saverestore-support-checkfail  like 171151
 test-amd64-i386-xl-qemuu-win7-amd64 19 guest-stop fail like 171151
 test-amd64-amd64-xl-qemut-ws16-amd64 19 guest-stopfail like 171151
 test-amd64-i386-xl-qemuu-ws16-amd64 19 guest-stop fail like 171151
 test-amd64-amd64-xl-qemuu-win7-amd64 19 guest-stopfail like 171151
 test-amd64-i386-xl-pvshim14 guest-start  fail   never pass
 test-amd64-amd64-libvirt 15 migrate-support-checkfail   never pass
 test-amd64-i386-libvirt-xsm  15 migrate-support-checkfail   never pass
 test-amd64-i386-libvirt  15 migrate-support-checkfail   never pass
 test-amd64-i386-libvirt-qemuu-debianhvm-amd64-xsm 13 migrate-support-check 
fail never pass
 test-amd64-amd64-libvirt-qemuu-debianhvm-amd64-xsm 13 migrate-support-check 
fail never pass
 test-armhf-armhf-xl-arndale  15 migrate-support-checkfail   never pass
 test-armhf-armhf-xl-arndale  16 saverestore-support-checkfail   never pass
 test-amd64-amd64-libvirt-vhd 14 migrate-support-checkfail   never pass
 test-armhf-armhf-xl-credit2  15 migrate-support-checkfail   never pass
 test-armhf-armhf-xl-credit2  16 saverestore-support-checkfail   never pass
 test-armhf-armhf-xl-rtds 15 migrate-support-checkfail   never pass
 test-armhf-armhf-xl-rtds 16 saverestore-support-checkfail   never pass
 test-armhf-armhf-libvirt 15 migrate-support-checkfail   never pass
 test-armhf-armhf-xl-cubietruck 15 migrate-support-checkfail never pass
 test-armhf-armhf-xl-cubietruck 16 saverestore-support-checkfail never pass
 test-armhf-armhf-xl-multivcpu 15 migrate-support-checkfail  never pass
 test-armhf-armhf-xl-multivcpu 16 saverestore-support-checkfail  never pass
 test-armhf-armhf-xl-credit1  15 migrate-support-checkfail   never pass
 test-armhf-armhf-xl-credit1  16 saverestore-support-checkfail   never pass
 test-amd64-amd64-libvirt-xsm 15 migrate-support-checkfail   never pass
 test-armhf-armhf-libvirt-qcow2 14 migrate-support-checkfail never pass
 test-armhf-armhf-xl-vhd  14 migrate-support-checkfail   never pass
 test-armhf-armhf-xl-vhd  15 saverestore-support-checkfail   never pass
 test-armhf-armhf-libvirt-raw 14 migrate-support-checkfail   never pass
 test-armhf-armhf-xl  15 migrate-support-checkfail   never pass
 test-armhf-armhf-xl  16 saverestore-support-checkfail   never pass

version targeted for testing:
 xen  c9a707df83aad17a6fcf2e8330ab3b5bead6fb8b
baseline version:
 xen  c9a707df83aad17a6fcf2e8330ab3b5bead6fb8b

Last test of basis   171157  2022-06-14 01:53:24 Z0 days
Testing same since  (not found) 0 attempts

jobs:
 build-amd64-xsm  pass
 build-arm64-xsm  pass
 build-i386-xsm   pass
 build-amd64-xtf  pass
 build-amd64  pass
 build-arm64

Re: [PATCH] xen/arm: smpboot: Allocate the CPU sibling/core maps while preparing the CPU

2022-06-14 Thread Michal Orzel



On 14.06.2022 13:08, Julien Grall wrote:
>>> +    unsigned int rc = 0;
>> ... here you are setting rc to 0 even though it will be reassigned.
>> Furthermore, if rc is used only in case of CPU_UP_PREPARE, why not moving 
>> the definition there?
> 
> Because I forgot to replace "return NOTIFY_DONE;" with :
> 
> return !rc ? NOTIFY_DONE : notifier_from_errno(rc);
That is what I thought.
With these fixes you can add my Rb.

> 
> In this case, we would need to initialize rc to 0 so it doesn't get used 
> initialized.
> 
> Cheers,
> 

Cheers,
Michal



Re: [PATCH] xen/arm: irq: Initialize the per-CPU IRQs while preparing the CPU

2022-06-14 Thread Julien Grall




On 14/06/2022 12:05, Michal Orzel wrote:

Hi Julien,

On 14.06.2022 11:41, Julien Grall wrote:

From: Julien Grall 

Commit 5047cd1d5dea "xen/common: Use enhanced ASSERT_ALLOC_CONTEXT in
xmalloc()" extended the checks in _xmalloc() to catch any use of the
helpers from context with interrupts disabled.

Unfortunately, the rule is not followed when initializing the per-CPU
IRQs:

(XEN) Xen call trace:
(XEN)[<002389f4>] _xmalloc+0xfc/0x314 (PC)
(XEN)[<>]  (LR)
(XEN)[<0021a7c4>] init_one_irq_desc+0x48/0xd0
(XEN)[<002807a8>] irq.c#init_local_irq_data+0x48/0xa4
(XEN)[<00280834>] init_secondary_IRQ+0x10/0x2c
(XEN)[<00288fa4>] start_secondary+0x194/0x274
(XEN)[<40010170>] 40010170
(XEN)
(XEN)
(XEN) 
(XEN) Panic on CPU 2:
(XEN) Assertion '!in_irq() && (local_irq_is_enabled() || num_online_cpus() <= 
1)' failed at common/xmalloc_tlsf.c:601
(XEN) 

This is happening because zalloc_cpumask_var() may allocate memory
if NR_CPUS is > 2 * sizeof(unsigned long).

Avoid the problem by allocate the per-CPU IRQs while preparing the
CPU.

Shouldn't this be" by initializing the per-CPU IRQs while ..." ?


I am fine with using "initializing" rather than "allocating".


Either way this text is the same like in the previous patch so I think this is 
not correct.


I can't quite parse this.


Other than that:
Reviewed-by: Michal Orzel 


Thanks!

Cheers,

--
Julien Grall



Re: [PATCH] xen/arm: smpboot: Allocate the CPU sibling/core maps while preparing the CPU

2022-06-14 Thread Julien Grall




On 14/06/2022 12:02, Michal Orzel wrote:

Hi Julien,


Hi Michal,


On 14.06.2022 11:41, Julien Grall wrote:

From: Julien Grall 

Commit 5047cd1d5dea "xen/common: Use enhanced ASSERT_ALLOC_CONTEXT in
xmalloc()" extended the checks in _xmalloc() to catch any use of the
helpers from context with interrupts disabled.

Unfortunately, the rule is not followed when allocating the CPU
sibling/core maps.

(XEN) Xen call trace:
(XEN)[<00238a5c>] _xmalloc+0xfc/0x314 (PC)
(XEN)[<>]  (LR)
(XEN)[<00238c8c>] _xzalloc+0x18/0x4c
(XEN)[<00288cb4>] smpboot.c#setup_cpu_sibling_map+0x38/0x138
(XEN)[<00289024>] start_secondary+0x1b4/0x270
(XEN)[<40010170>] 40010170
(XEN)
(XEN)
(XEN) 
(XEN) Panic on CPU 2:
(XEN) Assertion '!in_irq() && (local_irq_is_enabled() || num_online_cpus() <= 
1)' failed at common/xmalloc_tlsf.c:601
(XEN) 

This is happening because zalloc_cpumask_var() may allocate memory
if NR_CPUS is > 2 * sizeof(unsigned long).

Avoid the problem by allocate the per-CPU IRQs while preparing the
CPU.

Shouldn't this be "by allocating the CPU sibling/core maps while ..."
to reflect the commit title and to distinguish between this change and the IRQ 
one?


Yes. I will update it.

[...]


  static void remove_cpu_sibling_map(int cpu)
@@ -292,9 +294,14 @@ smp_get_max_cpus (void)
  void __init
  smp_prepare_cpus(void)
  {
+int rc;

Here you are leaving rc uninitialized (which is ok) but ...


+
  cpumask_copy(&cpu_present_map, &cpu_possible_map);
  
-setup_cpu_sibling_map(0);

+rc = setup_cpu_sibling_map(0);
+if ( rc )
+panic("Unable to allocate CPU sibling/core maps\n");
+
  }
  
  /* Boot the current CPU */

@@ -361,8 +368,6 @@ void start_secondary(void)
  
  set_current(idle_vcpu[cpuid]);
  
-setup_cpu_sibling_map(cpuid);

-
  /* Run local notifiers */
  notify_cpu_starting(cpuid);
  /*
@@ -530,9 +535,19 @@ static int cpu_smpboot_callback(struct notifier_block *nfb,
  void *hcpu)
  {
  unsigned int cpu = (unsigned long)hcpu;
+unsigned int rc = 0;

... here you are setting rc to 0 even though it will be reassigned.
Furthermore, if rc is used only in case of CPU_UP_PREPARE, why not moving the 
definition there?


Because I forgot to replace "return NOTIFY_DONE;" with :

return !rc ? NOTIFY_DONE : notifier_from_errno(rc);

In this case, we would need to initialize rc to 0 so it doesn't get used 
initialized.


Cheers,

--
Julien Grall



Re: [PATCH] xen/arm: irq: Initialize the per-CPU IRQs while preparing the CPU

2022-06-14 Thread Michal Orzel
Hi Julien,

On 14.06.2022 11:41, Julien Grall wrote:
> From: Julien Grall 
> 
> Commit 5047cd1d5dea "xen/common: Use enhanced ASSERT_ALLOC_CONTEXT in
> xmalloc()" extended the checks in _xmalloc() to catch any use of the
> helpers from context with interrupts disabled.
> 
> Unfortunately, the rule is not followed when initializing the per-CPU
> IRQs:
> 
> (XEN) Xen call trace:
> (XEN)[<002389f4>] _xmalloc+0xfc/0x314 (PC)
> (XEN)[<>]  (LR)
> (XEN)[<0021a7c4>] init_one_irq_desc+0x48/0xd0
> (XEN)[<002807a8>] irq.c#init_local_irq_data+0x48/0xa4
> (XEN)[<00280834>] init_secondary_IRQ+0x10/0x2c
> (XEN)[<00288fa4>] start_secondary+0x194/0x274
> (XEN)[<40010170>] 40010170
> (XEN)
> (XEN)
> (XEN) 
> (XEN) Panic on CPU 2:
> (XEN) Assertion '!in_irq() && (local_irq_is_enabled() || num_online_cpus() <= 
> 1)' failed at common/xmalloc_tlsf.c:601
> (XEN) 
> 
> This is happening because zalloc_cpumask_var() may allocate memory
> if NR_CPUS is > 2 * sizeof(unsigned long).
> 
> Avoid the problem by allocate the per-CPU IRQs while preparing the
> CPU.
Shouldn't this be" by initializing the per-CPU IRQs while ..." ?
Either way this text is the same like in the previous patch so I think this is 
not correct.

Other than that:
Reviewed-by: Michal Orzel 




Re: [PATCH] xen/arm: smpboot: Allocate the CPU sibling/core maps while preparing the CPU

2022-06-14 Thread Michal Orzel
Hi Julien,

On 14.06.2022 11:41, Julien Grall wrote:
> From: Julien Grall 
> 
> Commit 5047cd1d5dea "xen/common: Use enhanced ASSERT_ALLOC_CONTEXT in
> xmalloc()" extended the checks in _xmalloc() to catch any use of the
> helpers from context with interrupts disabled.
> 
> Unfortunately, the rule is not followed when allocating the CPU
> sibling/core maps.
> 
> (XEN) Xen call trace:
> (XEN)[<00238a5c>] _xmalloc+0xfc/0x314 (PC)
> (XEN)[<>]  (LR)
> (XEN)[<00238c8c>] _xzalloc+0x18/0x4c
> (XEN)[<00288cb4>] smpboot.c#setup_cpu_sibling_map+0x38/0x138
> (XEN)[<00289024>] start_secondary+0x1b4/0x270
> (XEN)[<40010170>] 40010170
> (XEN)
> (XEN)
> (XEN) 
> (XEN) Panic on CPU 2:
> (XEN) Assertion '!in_irq() && (local_irq_is_enabled() || num_online_cpus() <= 
> 1)' failed at common/xmalloc_tlsf.c:601
> (XEN) 
> 
> This is happening because zalloc_cpumask_var() may allocate memory
> if NR_CPUS is > 2 * sizeof(unsigned long).
> 
> Avoid the problem by allocate the per-CPU IRQs while preparing the
> CPU.
Shouldn't this be "by allocating the CPU sibling/core maps while ..."
to reflect the commit title and to distinguish between this change and the IRQ 
one?

> 
> This also has the benefit to remove a panic() in the secondary CPU
> code.
> 
> Signed-off-by: Julien Grall 
> ---
>  xen/arch/arm/smpboot.c | 25 -
>  1 file changed, 20 insertions(+), 5 deletions(-)
> 
> diff --git a/xen/arch/arm/smpboot.c b/xen/arch/arm/smpboot.c
> index 4888bcd78a5a..2b0c92cd369b 100644
> --- a/xen/arch/arm/smpboot.c
> +++ b/xen/arch/arm/smpboot.c
> @@ -79,15 +79,17 @@ DEFINE_PER_CPU_READ_MOSTLY(cpumask_var_t, cpu_core_mask);
>  static bool __read_mostly opt_hmp_unsafe = false;
>  boolean_param("hmp-unsafe", opt_hmp_unsafe);
>  
> -static void setup_cpu_sibling_map(int cpu)
> +static int setup_cpu_sibling_map(int cpu)
>  {
>  if ( !zalloc_cpumask_var(&per_cpu(cpu_sibling_mask, cpu)) ||
>   !zalloc_cpumask_var(&per_cpu(cpu_core_mask, cpu)) )
> -panic("No memory for CPU sibling/core maps\n");
> +return -ENOMEM;
>  
>  /* A CPU is a sibling with itself and is always on its own core. */
>  cpumask_set_cpu(cpu, per_cpu(cpu_sibling_mask, cpu));
>  cpumask_set_cpu(cpu, per_cpu(cpu_core_mask, cpu));
> +
> +return 0;
>  }
>  
>  static void remove_cpu_sibling_map(int cpu)
> @@ -292,9 +294,14 @@ smp_get_max_cpus (void)
>  void __init
>  smp_prepare_cpus(void)
>  {
> +int rc;
Here you are leaving rc uninitialized (which is ok) but ...

> +
>  cpumask_copy(&cpu_present_map, &cpu_possible_map);
>  
> -setup_cpu_sibling_map(0);
> +rc = setup_cpu_sibling_map(0);
> +if ( rc )
> +panic("Unable to allocate CPU sibling/core maps\n");
> +
>  }
>  
>  /* Boot the current CPU */
> @@ -361,8 +368,6 @@ void start_secondary(void)
>  
>  set_current(idle_vcpu[cpuid]);
>  
> -setup_cpu_sibling_map(cpuid);
> -
>  /* Run local notifiers */
>  notify_cpu_starting(cpuid);
>  /*
> @@ -530,9 +535,19 @@ static int cpu_smpboot_callback(struct notifier_block 
> *nfb,
>  void *hcpu)
>  {
>  unsigned int cpu = (unsigned long)hcpu;
> +unsigned int rc = 0;
... here you are setting rc to 0 even though it will be reassigned.
Furthermore, if rc is used only in case of CPU_UP_PREPARE, why not moving the 
definition there?

>  
>  switch ( action )
>  {
> +case CPU_UP_PREPARE:
> +rc = setup_cpu_sibling_map(cpu);
> +if ( rc )
> +printk(XENLOG_ERR
> +   "Unable to allocate CPU sibling/core map  for CPU%u\n",
Too many spaces between 'map' and 'for'.

> +   cpu);
> +
> +break;
> +
>  case CPU_DEAD:
>  remove_cpu_sibling_map(cpu);
>  break;

Cheers,
Michal



Re: [PULL 00/16] Kraxel 20220613 patches

2022-06-14 Thread Daniel P . Berrangé
On Tue, Jun 14, 2022 at 11:40:38AM +0200, Gerd Hoffmann wrote:
> On Mon, Jun 13, 2022 at 08:52:21AM -0700, Richard Henderson wrote:
> > On 6/13/22 04:36, Gerd Hoffmann wrote:
> > > The following changes since commit 
> > > dcb40541ebca7ec98a14d461593b3cd7282b4fac:
> > > 
> > >Merge tag 'mips-20220611' of https://github.com/philmd/qemu into 
> > > staging (2022-06-11 21:13:27 -0700)
> > > 
> > > are available in the Git repository at:
> > > 
> > >git://git.kraxel.org/qemu tags/kraxel-20220613-pull-request
> > > 
> > > for you to fetch changes up to 23b87f7a3a13e93e248eef8a4b7257548855a620:
> > > 
> > >ui: move 'pc-bios/keymaps' to 'ui/keymaps' (2022-06-13 10:59:25 +0200)
> > > 
> > > 
> > > usb: add CanoKey device, fixes for ehci + redir
> > > ui: fixes for gtk and cocoa, move keymaps (v2), rework refresh rate
> > > virtio-gpu: scanout flush fix
> > 
> > This doesn't even configure:
> > 
> > ../src/ui/keymaps/meson.build:55:4: ERROR: File ar does not exist.
> 
> Hmm, build worked here and CI passed too.
> 
> I think this is one of those cases where the build directory must be
> deleted because one subdirectory is replaced by a compatibility
> symlink.

Except 'configure' deals with that, as it explicitly rm -rf's the
symlink target:

symlink() {
  rm -rf "$2"
  mkdir -p "$(dirname "$2")"
  ln -s "$1" "$2"
}


so i'm pretty confused as to what's going wrong here still


With regards,
Daniel
-- 
|: https://berrange.com  -o-https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org -o-https://fstop138.berrange.com :|
|: https://entangle-photo.org-o-https://www.instagram.com/dberrange :|




Re: [PATCH] xen/console: do not drop serial output from the hardware domain

2022-06-14 Thread Jan Beulich
On 14.06.2022 11:38, Roger Pau Monné wrote:
> On Tue, Jun 14, 2022 at 11:13:07AM +0200, Jan Beulich wrote:
>> On 14.06.2022 10:32, Roger Pau Monné wrote:
>>> On Tue, Jun 14, 2022 at 10:10:03AM +0200, Jan Beulich wrote:
 On 14.06.2022 08:52, Roger Pau Monné wrote:
> On Mon, Jun 13, 2022 at 03:56:54PM +0200, Jan Beulich wrote:
>> On 13.06.2022 14:32, Roger Pau Monné wrote:
>>> On Mon, Jun 13, 2022 at 11:18:49AM +0200, Jan Beulich wrote:
 On 13.06.2022 11:04, Roger Pau Monné wrote:
> On Mon, Jun 13, 2022 at 10:29:43AM +0200, Jan Beulich wrote:
>> On 13.06.2022 10:21, Roger Pau Monné wrote:
>>> On Mon, Jun 13, 2022 at 09:30:06AM +0200, Jan Beulich wrote:
 On 10.06.2022 17:06, Roger Pau Monne wrote:
> Prevent dropping console output from the hardware domain, since 
> it's
> likely important to have all the output if the boot fails without
> having to resort to sync_console (which also affects the output 
> from
> other guests).
>
> Do so by pairing the console_serial_puts() with
> serial_{start,end}_log_everything(), so that no output is dropped.

 While I can see the goal, why would Dom0 output be (effectively) 
 more
 important than Xen's own one (which isn't "forced")? And with this
 aiming at boot output only, wouldn't you want to stop the 
 overriding
 once boot has completed (of which, if I'm not mistaken, we don't
 really have any signal coming from Dom0)? And even during boot I'm
 not convinced we'd want to let through everything, but perhaps just
 Dom0's kernel messages?
>>>
>>> I normally use sync_console on all the boxes I'm doing dev work, so
>>> this request is something that come up internally.
>>>
>>> Didn't realize Xen output wasn't forced, since we already have rate
>>> limiting based on log levels I was assuming that non-ratelimited
>>> messages wouldn't be dropped.  But yes, I agree that Xen (non-guest
>>> triggered) output shouldn't be rate limited either.
>>
>> Which would raise the question of why we have log levels for 
>> non-guest
>> messages.
>
> Hm, maybe I'm confused, but I don't see a direct relation between log
> levels and rate limiting.  If I set log level to WARNING I would
> expect to not loose _any_ non-guest log messages with level WARNING or
> above.  It's still useful to have log levels for non-guest messages,
> since user might want to filter out DEBUG non-guest messages for
> example.

 It was me who was confused, because of the two log-everything variants
 we have (console and serial). You're right that your change is 
 unrelated
 to log levels. However, when there are e.g. many warnings or when an
 admin has lowered the log level, what you (would) do is effectively
 force sync_console mode transiently (for a subset of messages, but
 that's secondary, especially because the "forced" output would still
 be waiting for earlier output to make it out).
>>>
>>> Right, it would have to wait for any previous output on the buffer to
>>> go out first.  In any case we can guarantee that no more output will
>>> be added to the buffer while Xen waits for it to be flushed.
>>>
>>> So for the hardware domain it might make sense to wait for the TX
>>> buffers to be half empty (the current tx_quench logic) by preempting
>>> the hypercall.  That however could cause issues if guests manage to
>>> keep filling the buffer while the hardware domain is being preempted.
>>>
>>> Alternatively we could always reserve half of the buffer for the
>>> hardware domain, and allow it to be preempted while waiting for space
>>> (since it's guaranteed non hardware domains won't be able to steal the
>>> allocation from the hardware domain).
>>
>> Getting complicated it seems. I have to admit that I wonder whether we
>> wouldn't be better off leaving the current logic as is.
>
> Another possible solution (more like a band aid) is to increase the
> buffer size from 4 pages to 8 or 16.  That would likely allow to cope
> fine with the high throughput of boot messages.

 You mean the buffer whose size is controlled by serial_tx_buffer?
>>>
>>> Yes.
>>>
 On
 large systems one may want to simply make use of the command line
 option then; I don't think the built-in default needs changing. Or
 if so, then perhaps not statically at build time, but taking into
 account system properties (like CPU count).
>>>
>>> So how about we use:
>>>
>>> min(16384, ROUNDUP(1024 * num_possible_cpus(), 4096))
>>
>> That

[PATCH] xen/arm: irq: Initialize the per-CPU IRQs while preparing the CPU

2022-06-14 Thread Julien Grall
From: Julien Grall 

Commit 5047cd1d5dea "xen/common: Use enhanced ASSERT_ALLOC_CONTEXT in
xmalloc()" extended the checks in _xmalloc() to catch any use of the
helpers from context with interrupts disabled.

Unfortunately, the rule is not followed when initializing the per-CPU
IRQs:

(XEN) Xen call trace:
(XEN)[<002389f4>] _xmalloc+0xfc/0x314 (PC)
(XEN)[<>]  (LR)
(XEN)[<0021a7c4>] init_one_irq_desc+0x48/0xd0
(XEN)[<002807a8>] irq.c#init_local_irq_data+0x48/0xa4
(XEN)[<00280834>] init_secondary_IRQ+0x10/0x2c
(XEN)[<00288fa4>] start_secondary+0x194/0x274
(XEN)[<40010170>] 40010170
(XEN)
(XEN)
(XEN) 
(XEN) Panic on CPU 2:
(XEN) Assertion '!in_irq() && (local_irq_is_enabled() || num_online_cpus() <= 
1)' failed at common/xmalloc_tlsf.c:601
(XEN) 

This is happening because zalloc_cpumask_var() may allocate memory
if NR_CPUS is > 2 * sizeof(unsigned long).

Avoid the problem by allocate the per-CPU IRQs while preparing the
CPU.

This also has the benefit to remove a BUG_ON() in the secondary CPU
code.

Signed-off-by: Julien Grall 
---
 xen/arch/arm/include/asm/irq.h |  1 -
 xen/arch/arm/irq.c | 35 +++---
 xen/arch/arm/smpboot.c |  2 --
 3 files changed, 28 insertions(+), 10 deletions(-)

diff --git a/xen/arch/arm/include/asm/irq.h b/xen/arch/arm/include/asm/irq.h
index e45d57459899..245f49dcbac5 100644
--- a/xen/arch/arm/include/asm/irq.h
+++ b/xen/arch/arm/include/asm/irq.h
@@ -73,7 +73,6 @@ static inline bool is_lpi(unsigned int irq)
 bool is_assignable_irq(unsigned int irq);
 
 void init_IRQ(void);
-void init_secondary_IRQ(void);
 
 int route_irq_to_guest(struct domain *d, unsigned int virq,
unsigned int irq, const char *devname);
diff --git a/xen/arch/arm/irq.c b/xen/arch/arm/irq.c
index b761d90c4063..56bdcb95335d 100644
--- a/xen/arch/arm/irq.c
+++ b/xen/arch/arm/irq.c
@@ -17,6 +17,7 @@
  * GNU General Public License for more details.
  */
 
+#include 
 #include 
 #include 
 #include 
@@ -100,7 +101,7 @@ static int __init init_irq_data(void)
 return 0;
 }
 
-static int init_local_irq_data(void)
+static int init_local_irq_data(unsigned int cpu)
 {
 int irq;
 
@@ -108,7 +109,7 @@ static int init_local_irq_data(void)
 
 for ( irq = 0; irq < NR_LOCAL_IRQS; irq++ )
 {
-struct irq_desc *desc = irq_to_desc(irq);
+struct irq_desc *desc = &per_cpu(local_irq_desc, cpu)[irq];
 int rc = init_one_irq_desc(desc);
 
 if ( rc )
@@ -131,6 +132,29 @@ static int init_local_irq_data(void)
 return 0;
 }
 
+static int cpu_callback(struct notifier_block *nfb, unsigned long action,
+void *hcpu)
+{
+unsigned long cpu = (unsigned long)hcpu;
+int rc = 0;
+
+switch ( action )
+{
+case CPU_UP_PREPARE:
+rc = init_local_irq_data(cpu);
+if ( rc )
+printk(XENLOG_ERR "Unable to allocate local IRQ for CPU%lu\n",
+   cpu);
+break;
+}
+
+return !rc ? NOTIFY_DONE : notifier_from_errno(rc);
+}
+
+static struct notifier_block cpu_nfb = {
+.notifier_call = cpu_callback,
+};
+
 void __init init_IRQ(void)
 {
 int irq;
@@ -140,13 +164,10 @@ void __init init_IRQ(void)
 local_irqs_type[irq] = IRQ_TYPE_INVALID;
 spin_unlock(&local_irqs_type_lock);
 
-BUG_ON(init_local_irq_data() < 0);
+BUG_ON(init_local_irq_data(smp_processor_id()) < 0);
 BUG_ON(init_irq_data() < 0);
-}
 
-void init_secondary_IRQ(void)
-{
-BUG_ON(init_local_irq_data() < 0);
+register_cpu_notifier(&cpu_nfb);
 }
 
 static inline struct irq_guest *irq_get_guest_info(struct irq_desc *desc)
diff --git a/xen/arch/arm/smpboot.c b/xen/arch/arm/smpboot.c
index 9bb32a301a70..4888bcd78a5a 100644
--- a/xen/arch/arm/smpboot.c
+++ b/xen/arch/arm/smpboot.c
@@ -359,8 +359,6 @@ void start_secondary(void)
 
 gic_init_secondary_cpu();
 
-init_secondary_IRQ();
-
 set_current(idle_vcpu[cpuid]);
 
 setup_cpu_sibling_map(cpuid);
-- 
2.32.0




[PATCH] xen/arm: smpboot: Allocate the CPU sibling/core maps while preparing the CPU

2022-06-14 Thread Julien Grall
From: Julien Grall 

Commit 5047cd1d5dea "xen/common: Use enhanced ASSERT_ALLOC_CONTEXT in
xmalloc()" extended the checks in _xmalloc() to catch any use of the
helpers from context with interrupts disabled.

Unfortunately, the rule is not followed when allocating the CPU
sibling/core maps.

(XEN) Xen call trace:
(XEN)[<00238a5c>] _xmalloc+0xfc/0x314 (PC)
(XEN)[<>]  (LR)
(XEN)[<00238c8c>] _xzalloc+0x18/0x4c
(XEN)[<00288cb4>] smpboot.c#setup_cpu_sibling_map+0x38/0x138
(XEN)[<00289024>] start_secondary+0x1b4/0x270
(XEN)[<40010170>] 40010170
(XEN)
(XEN)
(XEN) 
(XEN) Panic on CPU 2:
(XEN) Assertion '!in_irq() && (local_irq_is_enabled() || num_online_cpus() <= 
1)' failed at common/xmalloc_tlsf.c:601
(XEN) 

This is happening because zalloc_cpumask_var() may allocate memory
if NR_CPUS is > 2 * sizeof(unsigned long).

Avoid the problem by allocate the per-CPU IRQs while preparing the
CPU.

This also has the benefit to remove a panic() in the secondary CPU
code.

Signed-off-by: Julien Grall 
---
 xen/arch/arm/smpboot.c | 25 -
 1 file changed, 20 insertions(+), 5 deletions(-)

diff --git a/xen/arch/arm/smpboot.c b/xen/arch/arm/smpboot.c
index 4888bcd78a5a..2b0c92cd369b 100644
--- a/xen/arch/arm/smpboot.c
+++ b/xen/arch/arm/smpboot.c
@@ -79,15 +79,17 @@ DEFINE_PER_CPU_READ_MOSTLY(cpumask_var_t, cpu_core_mask);
 static bool __read_mostly opt_hmp_unsafe = false;
 boolean_param("hmp-unsafe", opt_hmp_unsafe);
 
-static void setup_cpu_sibling_map(int cpu)
+static int setup_cpu_sibling_map(int cpu)
 {
 if ( !zalloc_cpumask_var(&per_cpu(cpu_sibling_mask, cpu)) ||
  !zalloc_cpumask_var(&per_cpu(cpu_core_mask, cpu)) )
-panic("No memory for CPU sibling/core maps\n");
+return -ENOMEM;
 
 /* A CPU is a sibling with itself and is always on its own core. */
 cpumask_set_cpu(cpu, per_cpu(cpu_sibling_mask, cpu));
 cpumask_set_cpu(cpu, per_cpu(cpu_core_mask, cpu));
+
+return 0;
 }
 
 static void remove_cpu_sibling_map(int cpu)
@@ -292,9 +294,14 @@ smp_get_max_cpus (void)
 void __init
 smp_prepare_cpus(void)
 {
+int rc;
+
 cpumask_copy(&cpu_present_map, &cpu_possible_map);
 
-setup_cpu_sibling_map(0);
+rc = setup_cpu_sibling_map(0);
+if ( rc )
+panic("Unable to allocate CPU sibling/core maps\n");
+
 }
 
 /* Boot the current CPU */
@@ -361,8 +368,6 @@ void start_secondary(void)
 
 set_current(idle_vcpu[cpuid]);
 
-setup_cpu_sibling_map(cpuid);
-
 /* Run local notifiers */
 notify_cpu_starting(cpuid);
 /*
@@ -530,9 +535,19 @@ static int cpu_smpboot_callback(struct notifier_block *nfb,
 void *hcpu)
 {
 unsigned int cpu = (unsigned long)hcpu;
+unsigned int rc = 0;
 
 switch ( action )
 {
+case CPU_UP_PREPARE:
+rc = setup_cpu_sibling_map(cpu);
+if ( rc )
+printk(XENLOG_ERR
+   "Unable to allocate CPU sibling/core map  for CPU%u\n",
+   cpu);
+
+break;
+
 case CPU_DEAD:
 remove_cpu_sibling_map(cpu);
 break;
-- 
2.32.0




Re: [PULL 00/16] Kraxel 20220613 patches

2022-06-14 Thread Gerd Hoffmann
On Mon, Jun 13, 2022 at 08:52:21AM -0700, Richard Henderson wrote:
> On 6/13/22 04:36, Gerd Hoffmann wrote:
> > The following changes since commit dcb40541ebca7ec98a14d461593b3cd7282b4fac:
> > 
> >Merge tag 'mips-20220611' of https://github.com/philmd/qemu into staging 
> > (2022-06-11 21:13:27 -0700)
> > 
> > are available in the Git repository at:
> > 
> >git://git.kraxel.org/qemu tags/kraxel-20220613-pull-request
> > 
> > for you to fetch changes up to 23b87f7a3a13e93e248eef8a4b7257548855a620:
> > 
> >ui: move 'pc-bios/keymaps' to 'ui/keymaps' (2022-06-13 10:59:25 +0200)
> > 
> > 
> > usb: add CanoKey device, fixes for ehci + redir
> > ui: fixes for gtk and cocoa, move keymaps (v2), rework refresh rate
> > virtio-gpu: scanout flush fix
> 
> This doesn't even configure:
> 
> ../src/ui/keymaps/meson.build:55:4: ERROR: File ar does not exist.

Hmm, build worked here and CI passed too.

I think this is one of those cases where the build directory must be
deleted because one subdirectory is replaced by a compatibility
symlink.

Or we drop the symlink idea and update the keymap loading code to check
both old and new location.  Daniel?

take care,
  Gerd




Re: [PATCH] xen/console: do not drop serial output from the hardware domain

2022-06-14 Thread Roger Pau Monné
On Tue, Jun 14, 2022 at 11:13:07AM +0200, Jan Beulich wrote:
> On 14.06.2022 10:32, Roger Pau Monné wrote:
> > On Tue, Jun 14, 2022 at 10:10:03AM +0200, Jan Beulich wrote:
> >> On 14.06.2022 08:52, Roger Pau Monné wrote:
> >>> On Mon, Jun 13, 2022 at 03:56:54PM +0200, Jan Beulich wrote:
>  On 13.06.2022 14:32, Roger Pau Monné wrote:
> > On Mon, Jun 13, 2022 at 11:18:49AM +0200, Jan Beulich wrote:
> >> On 13.06.2022 11:04, Roger Pau Monné wrote:
> >>> On Mon, Jun 13, 2022 at 10:29:43AM +0200, Jan Beulich wrote:
>  On 13.06.2022 10:21, Roger Pau Monné wrote:
> > On Mon, Jun 13, 2022 at 09:30:06AM +0200, Jan Beulich wrote:
> >> On 10.06.2022 17:06, Roger Pau Monne wrote:
> >>> Prevent dropping console output from the hardware domain, since 
> >>> it's
> >>> likely important to have all the output if the boot fails without
> >>> having to resort to sync_console (which also affects the output 
> >>> from
> >>> other guests).
> >>>
> >>> Do so by pairing the console_serial_puts() with
> >>> serial_{start,end}_log_everything(), so that no output is dropped.
> >>
> >> While I can see the goal, why would Dom0 output be (effectively) 
> >> more
> >> important than Xen's own one (which isn't "forced")? And with this
> >> aiming at boot output only, wouldn't you want to stop the 
> >> overriding
> >> once boot has completed (of which, if I'm not mistaken, we don't
> >> really have any signal coming from Dom0)? And even during boot I'm
> >> not convinced we'd want to let through everything, but perhaps just
> >> Dom0's kernel messages?
> >
> > I normally use sync_console on all the boxes I'm doing dev work, so
> > this request is something that come up internally.
> >
> > Didn't realize Xen output wasn't forced, since we already have rate
> > limiting based on log levels I was assuming that non-ratelimited
> > messages wouldn't be dropped.  But yes, I agree that Xen (non-guest
> > triggered) output shouldn't be rate limited either.
> 
>  Which would raise the question of why we have log levels for 
>  non-guest
>  messages.
> >>>
> >>> Hm, maybe I'm confused, but I don't see a direct relation between log
> >>> levels and rate limiting.  If I set log level to WARNING I would
> >>> expect to not loose _any_ non-guest log messages with level WARNING or
> >>> above.  It's still useful to have log levels for non-guest messages,
> >>> since user might want to filter out DEBUG non-guest messages for
> >>> example.
> >>
> >> It was me who was confused, because of the two log-everything variants
> >> we have (console and serial). You're right that your change is 
> >> unrelated
> >> to log levels. However, when there are e.g. many warnings or when an
> >> admin has lowered the log level, what you (would) do is effectively
> >> force sync_console mode transiently (for a subset of messages, but
> >> that's secondary, especially because the "forced" output would still
> >> be waiting for earlier output to make it out).
> >
> > Right, it would have to wait for any previous output on the buffer to
> > go out first.  In any case we can guarantee that no more output will
> > be added to the buffer while Xen waits for it to be flushed.
> >
> > So for the hardware domain it might make sense to wait for the TX
> > buffers to be half empty (the current tx_quench logic) by preempting
> > the hypercall.  That however could cause issues if guests manage to
> > keep filling the buffer while the hardware domain is being preempted.
> >
> > Alternatively we could always reserve half of the buffer for the
> > hardware domain, and allow it to be preempted while waiting for space
> > (since it's guaranteed non hardware domains won't be able to steal the
> > allocation from the hardware domain).
> 
>  Getting complicated it seems. I have to admit that I wonder whether we
>  wouldn't be better off leaving the current logic as is.
> >>>
> >>> Another possible solution (more like a band aid) is to increase the
> >>> buffer size from 4 pages to 8 or 16.  That would likely allow to cope
> >>> fine with the high throughput of boot messages.
> >>
> >> You mean the buffer whose size is controlled by serial_tx_buffer?
> > 
> > Yes.
> > 
> >> On
> >> large systems one may want to simply make use of the command line
> >> option then; I don't think the built-in default needs changing. Or
> >> if so, then perhaps not statically at build time, but taking into
> >> account system properties (like CPU count).
> > 
> > So how about we use:
> > 
> > min(16384, ROUNDUP(1024 * num_possible_cpus(), 4096))
> 
> That would _reduce_ size on small systems, would

Re: [PATCH] xen/console: do not drop serial output from the hardware domain

2022-06-14 Thread Jan Beulich
On 14.06.2022 10:32, Roger Pau Monné wrote:
> On Tue, Jun 14, 2022 at 10:10:03AM +0200, Jan Beulich wrote:
>> On 14.06.2022 08:52, Roger Pau Monné wrote:
>>> On Mon, Jun 13, 2022 at 03:56:54PM +0200, Jan Beulich wrote:
 On 13.06.2022 14:32, Roger Pau Monné wrote:
> On Mon, Jun 13, 2022 at 11:18:49AM +0200, Jan Beulich wrote:
>> On 13.06.2022 11:04, Roger Pau Monné wrote:
>>> On Mon, Jun 13, 2022 at 10:29:43AM +0200, Jan Beulich wrote:
 On 13.06.2022 10:21, Roger Pau Monné wrote:
> On Mon, Jun 13, 2022 at 09:30:06AM +0200, Jan Beulich wrote:
>> On 10.06.2022 17:06, Roger Pau Monne wrote:
>>> Prevent dropping console output from the hardware domain, since it's
>>> likely important to have all the output if the boot fails without
>>> having to resort to sync_console (which also affects the output from
>>> other guests).
>>>
>>> Do so by pairing the console_serial_puts() with
>>> serial_{start,end}_log_everything(), so that no output is dropped.
>>
>> While I can see the goal, why would Dom0 output be (effectively) more
>> important than Xen's own one (which isn't "forced")? And with this
>> aiming at boot output only, wouldn't you want to stop the overriding
>> once boot has completed (of which, if I'm not mistaken, we don't
>> really have any signal coming from Dom0)? And even during boot I'm
>> not convinced we'd want to let through everything, but perhaps just
>> Dom0's kernel messages?
>
> I normally use sync_console on all the boxes I'm doing dev work, so
> this request is something that come up internally.
>
> Didn't realize Xen output wasn't forced, since we already have rate
> limiting based on log levels I was assuming that non-ratelimited
> messages wouldn't be dropped.  But yes, I agree that Xen (non-guest
> triggered) output shouldn't be rate limited either.

 Which would raise the question of why we have log levels for non-guest
 messages.
>>>
>>> Hm, maybe I'm confused, but I don't see a direct relation between log
>>> levels and rate limiting.  If I set log level to WARNING I would
>>> expect to not loose _any_ non-guest log messages with level WARNING or
>>> above.  It's still useful to have log levels for non-guest messages,
>>> since user might want to filter out DEBUG non-guest messages for
>>> example.
>>
>> It was me who was confused, because of the two log-everything variants
>> we have (console and serial). You're right that your change is unrelated
>> to log levels. However, when there are e.g. many warnings or when an
>> admin has lowered the log level, what you (would) do is effectively
>> force sync_console mode transiently (for a subset of messages, but
>> that's secondary, especially because the "forced" output would still
>> be waiting for earlier output to make it out).
>
> Right, it would have to wait for any previous output on the buffer to
> go out first.  In any case we can guarantee that no more output will
> be added to the buffer while Xen waits for it to be flushed.
>
> So for the hardware domain it might make sense to wait for the TX
> buffers to be half empty (the current tx_quench logic) by preempting
> the hypercall.  That however could cause issues if guests manage to
> keep filling the buffer while the hardware domain is being preempted.
>
> Alternatively we could always reserve half of the buffer for the
> hardware domain, and allow it to be preempted while waiting for space
> (since it's guaranteed non hardware domains won't be able to steal the
> allocation from the hardware domain).

 Getting complicated it seems. I have to admit that I wonder whether we
 wouldn't be better off leaving the current logic as is.
>>>
>>> Another possible solution (more like a band aid) is to increase the
>>> buffer size from 4 pages to 8 or 16.  That would likely allow to cope
>>> fine with the high throughput of boot messages.
>>
>> You mean the buffer whose size is controlled by serial_tx_buffer?
> 
> Yes.
> 
>> On
>> large systems one may want to simply make use of the command line
>> option then; I don't think the built-in default needs changing. Or
>> if so, then perhaps not statically at build time, but taking into
>> account system properties (like CPU count).
> 
> So how about we use:
> 
> min(16384, ROUNDUP(1024 * num_possible_cpus(), 4096))

That would _reduce_ size on small systems, wouldn't it? Originally
you were after increasing the default size. But if you had meant
max(), then I'd fear on very large systems this may grow a little
too large.

> Maybe we should also take CPU frequency into account, but that seems
> too complex for the purpose.

Why would frequency matter? Other aspects I could see 

[linux-linus test] 171156: regressions - FAIL

2022-06-14 Thread osstest service owner
flight 171156 linux-linus real [real]
http://logs.test-lab.xenproject.org/osstest/logs/171156/

Regressions :-(

Tests which did not succeed and are blocking,
including tests which could not be run:
 test-arm64-arm64-examine  8 reboot   fail REGR. vs. 170714
 test-amd64-amd64-xl-credit2   8 xen-boot fail REGR. vs. 170714
 test-amd64-amd64-xl-qemuu-debianhvm-amd64  8 xen-bootfail REGR. vs. 170714
 test-amd64-amd64-xl-qemut-stubdom-debianhvm-amd64-xsm 8 xen-boot fail REGR. 
vs. 170714
 test-amd64-amd64-libvirt  8 xen-boot fail REGR. vs. 170714
 test-amd64-amd64-xl-qemuu-debianhvm-amd64-shadow 8 xen-boot fail REGR. vs. 
170714
 test-amd64-amd64-xl-multivcpu  8 xen-bootfail REGR. vs. 170714
 test-amd64-amd64-libvirt-pair 12 xen-boot/src_host   fail REGR. vs. 170714
 test-amd64-amd64-libvirt-pair 13 xen-boot/dst_host   fail REGR. vs. 170714
 test-amd64-amd64-xl-pvhv2-intel  8 xen-boot  fail REGR. vs. 170714
 test-amd64-amd64-freebsd12-amd64  8 xen-boot fail REGR. vs. 170714
 test-amd64-amd64-qemuu-nested-intel  8 xen-boot  fail REGR. vs. 170714
 test-arm64-arm64-xl-seattle   8 xen-boot fail REGR. vs. 170714
 test-arm64-arm64-xl-credit2   8 xen-boot fail REGR. vs. 170714
 test-amd64-amd64-libvirt-qcow2  8 xen-boot   fail REGR. vs. 170714
 test-amd64-amd64-libvirt-raw  8 xen-boot fail REGR. vs. 170714
 test-arm64-arm64-xl   8 xen-boot fail REGR. vs. 170714
 test-arm64-arm64-xl-credit1   8 xen-boot fail REGR. vs. 170714
 test-arm64-arm64-xl-vhd   8 xen-boot fail REGR. vs. 170714
 test-amd64-amd64-examine  8 reboot   fail REGR. vs. 170714
 test-amd64-amd64-examine-bios  8 reboot  fail REGR. vs. 170714
 test-arm64-arm64-libvirt-raw  8 xen-boot fail REGR. vs. 170714
 test-amd64-amd64-examine-uefi  8 reboot  fail REGR. vs. 170714
 test-amd64-amd64-xl-pvshim8 xen-boot fail REGR. vs. 170714
 test-arm64-arm64-xl-xsm   8 xen-boot fail REGR. vs. 170714
 test-arm64-arm64-libvirt-xsm  8 xen-boot fail REGR. vs. 170714

Tests which are failing intermittently (not blocking):
 test-amd64-amd64-xl   8 xen-boot   fail pass in 171154
 test-amd64-amd64-xl-vhd   8 xen-boot   fail pass in 171154
 test-amd64-amd64-xl-qemuu-ovmf-amd64  8 xen-boot   fail pass in 171154
 test-amd64-amd64-xl-qemuu-debianhvm-i386-xsm  8 xen-boot   fail pass in 171155
 test-amd64-amd64-pair12 xen-boot/src_host  fail pass in 171155
 test-amd64-amd64-pair13 xen-boot/dst_host  fail pass in 171155
 test-amd64-amd64-xl-credit1   8 xen-boot   fail pass in 171155
 test-amd64-amd64-xl-xsm   8 xen-boot   fail pass in 171155

Tests which did not succeed, but are not blocking:
 test-amd64-amd64-xl-qemut-win7-amd64 19 guest-stopfail like 170714
 test-amd64-amd64-qemuu-nested-amd 20 debian-hvm-install/l1/l2 fail like 170714
 test-amd64-amd64-xl-qemuu-ws16-amd64 19 guest-stopfail like 170714
 test-amd64-amd64-xl-qemut-ws16-amd64 19 guest-stopfail like 170714
 test-amd64-amd64-xl-qemuu-win7-amd64 19 guest-stopfail like 170714
 test-armhf-armhf-libvirt-qcow2 15 saverestore-support-check   fail like 170714
 test-armhf-armhf-libvirt 16 saverestore-support-checkfail  like 170714
 test-armhf-armhf-libvirt-raw 15 saverestore-support-checkfail  like 170714
 test-amd64-amd64-libvirt-xsm 15 migrate-support-checkfail   never pass
 test-arm64-arm64-xl-thunderx 15 migrate-support-checkfail   never pass
 test-arm64-arm64-xl-thunderx 16 saverestore-support-checkfail   never pass
 test-amd64-amd64-libvirt-qemuu-debianhvm-amd64-xsm 13 migrate-support-check 
fail never pass
 test-armhf-armhf-xl  15 migrate-support-checkfail   never pass
 test-armhf-armhf-xl  16 saverestore-support-checkfail   never pass
 test-armhf-armhf-xl-rtds 15 migrate-support-checkfail   never pass
 test-armhf-armhf-xl-rtds 16 saverestore-support-checkfail   never pass
 test-armhf-armhf-xl-credit2  15 migrate-support-checkfail   never pass
 test-armhf-armhf-xl-credit2  16 saverestore-support-checkfail   never pass
 test-armhf-armhf-xl-cubietruck 15 migrate-support-checkfail never pass
 test-armhf-armhf-xl-cubietruck 16 saverestore-support-checkfail never pass
 test-armhf-armhf-xl-multivcpu 15 migrate-support-checkfail  never pass
 test-armhf-armhf-xl-multivcpu 16 saverestore-support-checkfail  never pass
 test-armhf-armhf-libvirt-qcow2 14 migrate-support-checkfail never pass
 test-armhf-armhf-xl-credit1  15 migrate-support-checkfail   never pass
 test-armhf-armhf-xl-cred

[libvirt test] 171159: regressions - FAIL

2022-06-14 Thread osstest service owner
flight 171159 libvirt real [real]
http://logs.test-lab.xenproject.org/osstest/logs/171159/

Regressions :-(

Tests which did not succeed and are blocking,
including tests which could not be run:
 build-armhf-libvirt   6 libvirt-buildfail REGR. vs. 151777
 build-amd64-libvirt   6 libvirt-buildfail REGR. vs. 151777
 build-i386-libvirt6 libvirt-buildfail REGR. vs. 151777
 build-arm64-libvirt   6 libvirt-buildfail REGR. vs. 151777

Tests which did not succeed, but are not blocking:
 test-amd64-amd64-libvirt  1 build-check(1)   blocked  n/a
 test-amd64-amd64-libvirt-pair  1 build-check(1)   blocked  n/a
 test-amd64-amd64-libvirt-qemuu-debianhvm-amd64-xsm 1 build-check(1) blocked n/a
 test-amd64-amd64-libvirt-vhd  1 build-check(1)   blocked  n/a
 test-amd64-amd64-libvirt-xsm  1 build-check(1)   blocked  n/a
 test-amd64-i386-libvirt   1 build-check(1)   blocked  n/a
 test-amd64-i386-libvirt-pair  1 build-check(1)   blocked  n/a
 test-amd64-i386-libvirt-qemuu-debianhvm-amd64-xsm 1 build-check(1) blocked n/a
 test-amd64-i386-libvirt-raw   1 build-check(1)   blocked  n/a
 test-amd64-i386-libvirt-xsm   1 build-check(1)   blocked  n/a
 test-arm64-arm64-libvirt  1 build-check(1)   blocked  n/a
 test-arm64-arm64-libvirt-qcow2  1 build-check(1)   blocked  n/a
 test-arm64-arm64-libvirt-raw  1 build-check(1)   blocked  n/a
 test-armhf-armhf-libvirt-raw  1 build-check(1)   blocked  n/a
 test-arm64-arm64-libvirt-xsm  1 build-check(1)   blocked  n/a
 test-armhf-armhf-libvirt  1 build-check(1)   blocked  n/a
 test-armhf-armhf-libvirt-qcow2  1 build-check(1)   blocked  n/a

version targeted for testing:
 libvirt  31b5ad06e315a8e9625a7fec56dc14af15895c89
baseline version:
 libvirt  2c846fa6bcc11929c9fb857a22430fb9945654ad

Last test of basis   151777  2020-07-10 04:19:19 Z  704 days
Failing since151818  2020-07-11 04:18:52 Z  703 days  685 attempts
Testing same since   171159  2022-06-14 04:18:58 Z0 days1 attempts


People who touched revisions under test:
Adolfo Jayme Barrientos 
  Aleksandr Alekseev 
  Aleksei Zakharov 
  Amneesh Singh 
  Andika Triwidada 
  Andrea Bolognani 
  Andrew Melnychenko 
  Ani Sinha 
  Balázs Meskó 
  Barrett Schonefeld 
  Bastian Germann 
  Bastien Orivel 
  BiaoXiang Ye 
  Bihong Yu 
  Binfeng Wu 
  Bjoern Walk 
  Boris Fiuczynski 
  Brad Laue 
  Brian Turek 
  Bruno Haible 
  Chris Mayo 
  Christian Borntraeger 
  Christian Ehrhardt 
  Christian Kirbach 
  Christian Schoenebeck 
  Christophe Fergeau 
  Claudio Fontana 
  Cole Robinson 
  Collin Walling 
  Cornelia Huck 
  Cédric Bosdonnat 
  Côme Borsoi 
  Daniel Henrique Barboza 
  Daniel Letai 
  Daniel P. Berrange 
  Daniel P. Berrangé 
  Didik Supriadi 
  dinglimin 
  Divya Garg 
  Dmitrii Shcherbakov 
  Dmytro Linkin 
  Eiichi Tsukata 
  Emilio Herrera 
  Eric Farman 
  Erik Skultety 
  Fabian Affolter 
  Fabian Freyer 
  Fabiano Fidêncio 
  Fangge Jin 
  Farhan Ali 
  Fedora Weblate Translation 
  Franck Ridel 
  Gavi Teitz 
  gongwei 
  Guoyi Tu
  Göran Uddeborg 
  Halil Pasic 
  Han Han 
  Hao Wang 
  Haonan Wang 
  Hela Basa 
  Helmut Grohne 
  Hiroki Narukawa 
  Hyman Huang(黄勇) 
  Ian Wienand 
  Ioanna Alifieraki 
  Ivan Teterevkov 
  Jakob Meng 
  Jamie Strandboge 
  Jamie Strandboge 
  Jan Kuparinen 
  jason lee 
  Jean-Baptiste Holcroft 
  Jia Zhou 
  Jianan Gao 
  Jim Fehlig 
  Jin Yan 
  Jing Qi 
  Jinsheng Zhang 
  Jiri Denemark 
  Joachim Falk 
  John Ferlan 
  John Levon 
  John Levon 
  Jonathan Watt 
  Jonathon Jongsma 
  Julio Faracco 
  Justin Gatzen 
  Ján Tomko 
  Kashyap Chamarthy 
  Kevin Locke 
  Kim InSoo 
  Koichi Murase 
  Kristina Hanicova 
  Laine Stump 
  Laszlo Ersek 
  Lee Yarwood 
  Lei Yang 
  Lena Voytek 
  Liang Yan 
  Liang Yan 
  Liao Pingfang 
  Lin Ma 
  Lin Ma 
  Lin Ma 
  Liu Yiding 
  Lubomir Rintel 
  Luke Yue 
  Luyao Zhong 
  luzhipeng 
  Marc Hartmayer 
  Marc-André Lureau 
  Marek Marczykowski-Górecki 
  Mark Mielke 
  Markus Schade 
  Martin Kletzander 
  Martin Pitt 
  Masayoshi Mizuma 
  Matej Cepl 
  Matt Coleman 
  Matt Coleman 
  Mauro Matteo Cascella 
  Max Goodhart 
  Maxim Nestratov 
  Meina Li 
  Michal Privoznik 
  Michał Smyk 
  Milo Casagrande 
  Moshe Levi 
  Moteen Shah 
  Moteen Shah 
  Muha Aliss 
  Nathan 
  Neal Gompa 
  Nick Chevsky 
  Nick Shyrokovskiy 
  Nickys Music Group 
  Nico Pache 
  Nicolas Lécureuil 
  Nicolas Lécureuil 
  Nikolay Shirokovskiy 
  Nikolay Shirokovskiy 
  Nikolay Shirokovskiy 
  Niteesh Dubey 
  Olaf Hering 
  Olesya Gerasimenko 
  Or Ozeri 
  Orion Poplawski 
  Pany 
  Paolo Bonzini 
  Patrick Magauran 
  Paulo de Rezende Pinatti 
  Pavel Hrdina 
  Peng Liang 
  Peng Liang 
 

Re: [PATCH] xen/console: do not drop serial output from the hardware domain

2022-06-14 Thread Roger Pau Monné
On Tue, Jun 14, 2022 at 10:32:53AM +0200, Roger Pau Monné wrote:
> On Tue, Jun 14, 2022 at 10:10:03AM +0200, Jan Beulich wrote:
> > On 14.06.2022 08:52, Roger Pau Monné wrote:
> > > On Mon, Jun 13, 2022 at 03:56:54PM +0200, Jan Beulich wrote:
> > >> On 13.06.2022 14:32, Roger Pau Monné wrote:
> > >>> On Mon, Jun 13, 2022 at 11:18:49AM +0200, Jan Beulich wrote:
> >  On 13.06.2022 11:04, Roger Pau Monné wrote:
> > > On Mon, Jun 13, 2022 at 10:29:43AM +0200, Jan Beulich wrote:
> > >> On 13.06.2022 10:21, Roger Pau Monné wrote:
> > >>> On Mon, Jun 13, 2022 at 09:30:06AM +0200, Jan Beulich wrote:
> >  On 10.06.2022 17:06, Roger Pau Monne wrote:
> > > Prevent dropping console output from the hardware domain, since 
> > > it's
> > > likely important to have all the output if the boot fails without
> > > having to resort to sync_console (which also affects the output 
> > > from
> > > other guests).
> > >
> > > Do so by pairing the console_serial_puts() with
> > > serial_{start,end}_log_everything(), so that no output is dropped.
> > 
> >  While I can see the goal, why would Dom0 output be (effectively) 
> >  more
> >  important than Xen's own one (which isn't "forced")? And with this
> >  aiming at boot output only, wouldn't you want to stop the 
> >  overriding
> >  once boot has completed (of which, if I'm not mistaken, we don't
> >  really have any signal coming from Dom0)? And even during boot I'm
> >  not convinced we'd want to let through everything, but perhaps just
> >  Dom0's kernel messages?
> > >>>
> > >>> I normally use sync_console on all the boxes I'm doing dev work, so
> > >>> this request is something that come up internally.
> > >>>
> > >>> Didn't realize Xen output wasn't forced, since we already have rate
> > >>> limiting based on log levels I was assuming that non-ratelimited
> > >>> messages wouldn't be dropped.  But yes, I agree that Xen (non-guest
> > >>> triggered) output shouldn't be rate limited either.
> > >>
> > >> Which would raise the question of why we have log levels for 
> > >> non-guest
> > >> messages.
> > >
> > > Hm, maybe I'm confused, but I don't see a direct relation between log
> > > levels and rate limiting.  If I set log level to WARNING I would
> > > expect to not loose _any_ non-guest log messages with level WARNING or
> > > above.  It's still useful to have log levels for non-guest messages,
> > > since user might want to filter out DEBUG non-guest messages for
> > > example.
> > 
> >  It was me who was confused, because of the two log-everything variants
> >  we have (console and serial). You're right that your change is 
> >  unrelated
> >  to log levels. However, when there are e.g. many warnings or when an
> >  admin has lowered the log level, what you (would) do is effectively
> >  force sync_console mode transiently (for a subset of messages, but
> >  that's secondary, especially because the "forced" output would still
> >  be waiting for earlier output to make it out).
> > >>>
> > >>> Right, it would have to wait for any previous output on the buffer to
> > >>> go out first.  In any case we can guarantee that no more output will
> > >>> be added to the buffer while Xen waits for it to be flushed.
> > >>>
> > >>> So for the hardware domain it might make sense to wait for the TX
> > >>> buffers to be half empty (the current tx_quench logic) by preempting
> > >>> the hypercall.  That however could cause issues if guests manage to
> > >>> keep filling the buffer while the hardware domain is being preempted.
> > >>>
> > >>> Alternatively we could always reserve half of the buffer for the
> > >>> hardware domain, and allow it to be preempted while waiting for space
> > >>> (since it's guaranteed non hardware domains won't be able to steal the
> > >>> allocation from the hardware domain).
> > >>
> > >> Getting complicated it seems. I have to admit that I wonder whether we
> > >> wouldn't be better off leaving the current logic as is.
> > > 
> > > Another possible solution (more like a band aid) is to increase the
> > > buffer size from 4 pages to 8 or 16.  That would likely allow to cope
> > > fine with the high throughput of boot messages.
> > 
> > You mean the buffer whose size is controlled by serial_tx_buffer?
> 
> Yes.
> 
> > On
> > large systems one may want to simply make use of the command line
> > option then; I don't think the built-in default needs changing. Or
> > if so, then perhaps not statically at build time, but taking into
> > account system properties (like CPU count).
> 
> So how about we use:
> 
> min(16384, ROUNDUP(1024 * num_possible_cpus(), 4096))

Er, sorry, that should be max(...) instead.

Thanks, Roger.



Re: [PATCH] xen/console: do not drop serial output from the hardware domain

2022-06-14 Thread Roger Pau Monné
On Tue, Jun 14, 2022 at 10:10:03AM +0200, Jan Beulich wrote:
> On 14.06.2022 08:52, Roger Pau Monné wrote:
> > On Mon, Jun 13, 2022 at 03:56:54PM +0200, Jan Beulich wrote:
> >> On 13.06.2022 14:32, Roger Pau Monné wrote:
> >>> On Mon, Jun 13, 2022 at 11:18:49AM +0200, Jan Beulich wrote:
>  On 13.06.2022 11:04, Roger Pau Monné wrote:
> > On Mon, Jun 13, 2022 at 10:29:43AM +0200, Jan Beulich wrote:
> >> On 13.06.2022 10:21, Roger Pau Monné wrote:
> >>> On Mon, Jun 13, 2022 at 09:30:06AM +0200, Jan Beulich wrote:
>  On 10.06.2022 17:06, Roger Pau Monne wrote:
> > Prevent dropping console output from the hardware domain, since it's
> > likely important to have all the output if the boot fails without
> > having to resort to sync_console (which also affects the output from
> > other guests).
> >
> > Do so by pairing the console_serial_puts() with
> > serial_{start,end}_log_everything(), so that no output is dropped.
> 
>  While I can see the goal, why would Dom0 output be (effectively) more
>  important than Xen's own one (which isn't "forced")? And with this
>  aiming at boot output only, wouldn't you want to stop the overriding
>  once boot has completed (of which, if I'm not mistaken, we don't
>  really have any signal coming from Dom0)? And even during boot I'm
>  not convinced we'd want to let through everything, but perhaps just
>  Dom0's kernel messages?
> >>>
> >>> I normally use sync_console on all the boxes I'm doing dev work, so
> >>> this request is something that come up internally.
> >>>
> >>> Didn't realize Xen output wasn't forced, since we already have rate
> >>> limiting based on log levels I was assuming that non-ratelimited
> >>> messages wouldn't be dropped.  But yes, I agree that Xen (non-guest
> >>> triggered) output shouldn't be rate limited either.
> >>
> >> Which would raise the question of why we have log levels for non-guest
> >> messages.
> >
> > Hm, maybe I'm confused, but I don't see a direct relation between log
> > levels and rate limiting.  If I set log level to WARNING I would
> > expect to not loose _any_ non-guest log messages with level WARNING or
> > above.  It's still useful to have log levels for non-guest messages,
> > since user might want to filter out DEBUG non-guest messages for
> > example.
> 
>  It was me who was confused, because of the two log-everything variants
>  we have (console and serial). You're right that your change is unrelated
>  to log levels. However, when there are e.g. many warnings or when an
>  admin has lowered the log level, what you (would) do is effectively
>  force sync_console mode transiently (for a subset of messages, but
>  that's secondary, especially because the "forced" output would still
>  be waiting for earlier output to make it out).
> >>>
> >>> Right, it would have to wait for any previous output on the buffer to
> >>> go out first.  In any case we can guarantee that no more output will
> >>> be added to the buffer while Xen waits for it to be flushed.
> >>>
> >>> So for the hardware domain it might make sense to wait for the TX
> >>> buffers to be half empty (the current tx_quench logic) by preempting
> >>> the hypercall.  That however could cause issues if guests manage to
> >>> keep filling the buffer while the hardware domain is being preempted.
> >>>
> >>> Alternatively we could always reserve half of the buffer for the
> >>> hardware domain, and allow it to be preempted while waiting for space
> >>> (since it's guaranteed non hardware domains won't be able to steal the
> >>> allocation from the hardware domain).
> >>
> >> Getting complicated it seems. I have to admit that I wonder whether we
> >> wouldn't be better off leaving the current logic as is.
> > 
> > Another possible solution (more like a band aid) is to increase the
> > buffer size from 4 pages to 8 or 16.  That would likely allow to cope
> > fine with the high throughput of boot messages.
> 
> You mean the buffer whose size is controlled by serial_tx_buffer?

Yes.

> On
> large systems one may want to simply make use of the command line
> option then; I don't think the built-in default needs changing. Or
> if so, then perhaps not statically at build time, but taking into
> account system properties (like CPU count).

So how about we use:

min(16384, ROUNDUP(1024 * num_possible_cpus(), 4096))

Maybe we should also take CPU frequency into account, but that seems
too complex for the purpose.

Thanks, Roger.



Re: [PATCH] xen/console: do not drop serial output from the hardware domain

2022-06-14 Thread Jan Beulich
On 14.06.2022 08:52, Roger Pau Monné wrote:
> On Mon, Jun 13, 2022 at 03:56:54PM +0200, Jan Beulich wrote:
>> On 13.06.2022 14:32, Roger Pau Monné wrote:
>>> On Mon, Jun 13, 2022 at 11:18:49AM +0200, Jan Beulich wrote:
 On 13.06.2022 11:04, Roger Pau Monné wrote:
> On Mon, Jun 13, 2022 at 10:29:43AM +0200, Jan Beulich wrote:
>> On 13.06.2022 10:21, Roger Pau Monné wrote:
>>> On Mon, Jun 13, 2022 at 09:30:06AM +0200, Jan Beulich wrote:
 On 10.06.2022 17:06, Roger Pau Monne wrote:
> Prevent dropping console output from the hardware domain, since it's
> likely important to have all the output if the boot fails without
> having to resort to sync_console (which also affects the output from
> other guests).
>
> Do so by pairing the console_serial_puts() with
> serial_{start,end}_log_everything(), so that no output is dropped.

 While I can see the goal, why would Dom0 output be (effectively) more
 important than Xen's own one (which isn't "forced")? And with this
 aiming at boot output only, wouldn't you want to stop the overriding
 once boot has completed (of which, if I'm not mistaken, we don't
 really have any signal coming from Dom0)? And even during boot I'm
 not convinced we'd want to let through everything, but perhaps just
 Dom0's kernel messages?
>>>
>>> I normally use sync_console on all the boxes I'm doing dev work, so
>>> this request is something that come up internally.
>>>
>>> Didn't realize Xen output wasn't forced, since we already have rate
>>> limiting based on log levels I was assuming that non-ratelimited
>>> messages wouldn't be dropped.  But yes, I agree that Xen (non-guest
>>> triggered) output shouldn't be rate limited either.
>>
>> Which would raise the question of why we have log levels for non-guest
>> messages.
>
> Hm, maybe I'm confused, but I don't see a direct relation between log
> levels and rate limiting.  If I set log level to WARNING I would
> expect to not loose _any_ non-guest log messages with level WARNING or
> above.  It's still useful to have log levels for non-guest messages,
> since user might want to filter out DEBUG non-guest messages for
> example.

 It was me who was confused, because of the two log-everything variants
 we have (console and serial). You're right that your change is unrelated
 to log levels. However, when there are e.g. many warnings or when an
 admin has lowered the log level, what you (would) do is effectively
 force sync_console mode transiently (for a subset of messages, but
 that's secondary, especially because the "forced" output would still
 be waiting for earlier output to make it out).
>>>
>>> Right, it would have to wait for any previous output on the buffer to
>>> go out first.  In any case we can guarantee that no more output will
>>> be added to the buffer while Xen waits for it to be flushed.
>>>
>>> So for the hardware domain it might make sense to wait for the TX
>>> buffers to be half empty (the current tx_quench logic) by preempting
>>> the hypercall.  That however could cause issues if guests manage to
>>> keep filling the buffer while the hardware domain is being preempted.
>>>
>>> Alternatively we could always reserve half of the buffer for the
>>> hardware domain, and allow it to be preempted while waiting for space
>>> (since it's guaranteed non hardware domains won't be able to steal the
>>> allocation from the hardware domain).
>>
>> Getting complicated it seems. I have to admit that I wonder whether we
>> wouldn't be better off leaving the current logic as is.
> 
> Another possible solution (more like a band aid) is to increase the
> buffer size from 4 pages to 8 or 16.  That would likely allow to cope
> fine with the high throughput of boot messages.

You mean the buffer whose size is controlled by serial_tx_buffer? On
large systems one may want to simply make use of the command line
option then; I don't think the built-in default needs changing. Or
if so, then perhaps not statically at build time, but taking into
account system properties (like CPU count).

Jan