Re: [PATCH] x86/x2apic: introduce a mixed physical/cluster mode

2023-12-04 Thread Jan Beulich
On 04.12.2023 02:02, Elliott Mitchell wrote:
> On Mon, Nov 27, 2023 at 09:27:18AM +0100, Roger Pau Monn
>> On Fri, Nov 24, 2023 at 05:15:34PM -0800, Elliott Mitchell wrote:
>>> On Thu, Nov 23, 2023 at 10:39:37AM +0100, Roger Pau Monn
 On Tue, Nov 21, 2023 at 04:56:47PM -0800, Elliott Mitchell wrote:
> It was insisted that full logs be sent to xen-devel.  Perhaps I am
> paranoid, but I doubt I would have been successful at scrubbing all
> hardware serial numbers.  As such, I was willing to post substantial
> sections, but not everything.

 Can you please point out which hardware serial numbers are you
 referring to, and where are they printed in Xen dmesg?
>>>
>>> I didn't confirm the presence of hardware serial numbers getting into
>>> Xen's dmesg, but there was a lot of data and many hexadecimal numbers.
>>> With 4000 lines of output, checking for concerning data is troublesome.
>>
>> 4000 lines of output from Xen dmesg seems quite suspicious.  Xen dmesg
>> on a fully booted box is ~400 lines on my local box.  That might get
>> quite longer if you have a box with a lot of memory region, or a lot
>> of CPUs.
> 
> That was from 4 boots with differing settings.  Since it was full console
> log, it also had the initial Linux kernel boot messages.  Each log was
> ~1000 lines.
> 
>>> There was enough for alarms to trigger.
>>
>> That's fine, but without pointing out which information is sensitive,
>> it's very unlikely such concerns will ever get fixed.
>>
>> Could you at least go over the log and point out the first instance of
>> such line that you consider contains sensitive information?
> 
> I would have been more comfortable with getting some guidance on which
> portions were desired or which could be left out.  No need for Linux's
> boot messages, what would cut things down by half.  No need for the
> memory map, lots more goes.

I didn't think "xl dmesg" spit out any Dom0 kernel messages by default.

Jan

> It is easier to be comfortable with 40 line sections than 1000 line
> sections.
> 
> 




Re: INFORMAL VOTE REQUIRED - DOCUMENTATION WORDING

2023-12-04 Thread Jan Beulich
On 04.12.2023 06:02, Kelly Choi wrote:
> In the specific example above, it's difficult in the sense that informal
> voting wasn't officially in the governance yet when the feedback was
> raised. What I would recommend in this instance is that if George and
> others feel very strongly about removing that term and have given a proper
> explanation, then I'd advise calling an informal vote within the thread and
> following the decision. Alternatively if after this conversation, members
> understand Andy's point of view and the term doesn't have serious
> consequences - let's agree with what Andy inputted in the first place and
> move this project ahead.  In an ideal world, we wouldn't require voting,
> but rather a discussion. However, if there are strong opinions for/against
> a specific decision that is causing us to be at a standstill, this is where
> informal voting helps.

I have some trouble with what the above expresses (and what was an issue
already before): First, the subject of this thread says "informal", yet it
invokes more what I'd call a formal vote. Then above you again say "calling
an informal vote", which isn't my understanding from what an informal vote
is. That's primarily based on my experience with the earlier informal vote
that was taken on another subject: It was simply assumed that supplied tags
had already cast a vote. Similarly here I'd expect that the opinions already
expressed simply constitute the "informal" in the entire process. The
difference merely is that in the other case a majority was in favor (which
can be expressed by A-b tags), while here a majority is against (which
cannot sensibly be expressed by tags, as a NAK would imo be pretty
inappropriate).

Jan



Re: [RFC KERNEL PATCH v2 2/3] xen/pvh: Unmask irq for passthrough device in PVH dom0

2023-12-04 Thread Thomas Gleixner
On Fri, Nov 24 2023 at 18:31, Jiqian Chen wrote:
> diff --git a/drivers/xen/xen-pciback/pci_stub.c 
> b/drivers/xen/xen-pciback/pci_stub.c
> index 5a96b6c66c07..b83d02bcc76c 100644
> --- a/drivers/xen/xen-pciback/pci_stub.c
> +++ b/drivers/xen/xen-pciback/pci_stub.c
> @@ -357,6 +357,7 @@ static int pcistub_match(struct pci_dev *dev)
>  static int pcistub_init_device(struct pci_dev *dev)
>  {
>   struct xen_pcibk_dev_data *dev_data;
> + struct irq_desc *desc = NULL;
>   int err = 0;
>  
>   dev_dbg(&dev->dev, "initializing...\n");
> @@ -399,6 +400,12 @@ static int pcistub_init_device(struct pci_dev *dev)
>   if (err)
>   goto config_release;
>  
> + if (xen_initial_domain() && xen_pvh_domain()) {
> + if (dev->irq <= 0 || !(desc = irq_to_desc(dev->irq)))

Driver code has absolutely no business to access irq_desc.

> + goto config_release;
> + unmask_irq(desc);

Or to invoke any internal function.

> --- a/kernel/irq/chip.c
> +++ b/kernel/irq/chip.c
> @@ -439,6 +439,7 @@ void unmask_irq(struct irq_desc *desc)
>   irq_state_clr_masked(desc);
>   }
>  }
> +EXPORT_SYMBOL_GPL(unmask_irq);

Not going to happen.

> --- a/kernel/irq/irqdesc.c
> +++ b/kernel/irq/irqdesc.c
> @@ -380,7 +380,7 @@ struct irq_desc *irq_to_desc(unsigned int irq)
>  {
>   return mtree_load(&sparse_irqs, irq);
>  }
> -#ifdef CONFIG_KVM_BOOK3S_64_HV_MODULE
> +#if defined CONFIG_KVM_BOOK3S_64_HV_MODULE || defined CONFIG_XEN_PVH

Neither that.

This all smells badly like a XEN internal issue and we are not going to
hack around it by exposing interrupt internals.

Thanks,

tglx



Re: INFORMAL VOTE REQUIRED - DOCUMENTATION WORDING

2023-12-04 Thread Jan Beulich
On 01.12.2023 22:44, Stefano Stabellini wrote:
> Replying here on a couple of different people on this thread.
> 
> 
> On Thu, 30 Nov 2023, Tamas K Lengyel wrote:
>> I think this form is bad and is not helpful. 
> 
> I agree with Tamas and (also Jan) that this form is not helpful.
> 
> 
> On Fri, 1 Dec 2023, George Dunlap wrote:
>> If most people in the community really do think that "broken" is
>> suitable for the documentation in our project, then of course the
>> maintainers should stop objecting to that kind of language.  If most
>> of the people in the community think that "broken" is *not* suitable
>> for technical documentation, then of course this isn't an example of
>> unreasonable review (although other instances may be).
> 
> I think there was a misconception when Kelly created this form that the
> push back was on the usage of the word "broken" globally in Xen Project.
> It is not the case.
> 
> I for example agree that "broken" can be used in Xen Project, but I
> don't think that it is a good idea to use it in that specific instance.
> 
> 
> On Fri, 1 Dec 2023, George Dunlap wrote:
>> [Andy] removing "broken" is a completely unreasonable request
> 
> I am in favor on moving faster and nitpicking less. Also, Andy put the
> effort to produce the patch so he should have the default choice in the
> wording. If the choice is taking the patch as is or rejecting it, I
> would take it as is.

I'm afraid there's also disagreement about where nitpicking starts. To me
"broken" in the context here is technically incorrect. Hence asking for
the wording to be changed wouldn't count as nitpicking to me. When badly
worded comments of mine are commented upon, I also don't count this as
nitpicking (there's always a grey area, of course).

Jan

> I might have a preference on a different wording and I voiced that
> preference. We could say that my request should have been optional, not
> mandatory. But when the majority of reviewers request the same thing,
> which wording choice should apply?
> 
> If we decide to ignore the feedback as unresonable or because it should
> have been all optional and commit the patch, what would stop anyone from
> sending a patch to "fix" the wording in the comments to use "deprecated"
> instead? And if someone pushes back on the second patch, would that be
> nitpicking? If we commit the second patch, what if someone send a third
> patch changing the wording back to "broken"? We risk getting into
> "commit wars".
> 
> To avoid this, I think we should go with the majority, whatever that is,
> and the decision has to stick. We have just introduced informal votes.
> We should have just used that in the original thread. By the informal
> voting, we have 3 against "broken" and 2 in favor (not 1 as George wrote
> as Andrew's vote counts too). 
> 
> The easiest would have been to go with the majority, resend the patch,
> get it committed. If Andrew feels strongly that the "broken" is the best
> wording, then a proper voting form is a good idea, like Kelly did (which
> I think is a full formal vote, not an informal vote). Except that the
> form Kelly created is too generic and has too few options.
> 
> In conclusion, I don't care about the wording. I do care that we reach
> alignment and move forward quicker. I think the informal voting
> mechanism is the best way to do it.




Re: [PATCH v6 4/5] [FUTURE] xen/arm: enable vPCI for domUs

2023-12-04 Thread Jan Beulich
On 02.12.2023 03:56, Stefano Stabellini wrote:
> On Fri, 1 Dec 2023, Roger Pau Monné wrote:
>> On Mon, Nov 13, 2023 at 05:21:13PM -0500, Stewart Hildebrand wrote:
>>> @@ -1618,6 +1630,14 @@ int iommu_do_pci_domctl(
>>>  bus = PCI_BUS(machine_sbdf);
>>>  devfn = PCI_DEVFN(machine_sbdf);
>>>  
>>> +if ( needs_vpci(d) && !has_vpci(d) )
>>> +{
>>> +printk(XENLOG_G_WARNING "Cannot assign %pp to %pd: vPCI 
>>> support not enabled\n",
>>> +   &PCI_SBDF(seg, bus, devfn), d);
>>> +ret = -EPERM;
>>> +break;
>>
>> I think this is likely too restrictive going forward.  The current
>> approach is indeed to enable vPCI on a per-domain basis because that's
>> how PVH dom0 uses it, due to being unable to use ioreq servers.
>>
>> If we start to expose vPCI suport to guests the interface should be on
>> a per-device basis, so that vPCI could be enabled for some devices,
>> while others could still be handled by ioreq servers.
>>
>> We might want to add a new flag to xen_domctl_assign_device (used by
>> XEN_DOMCTL_assign_device) in order to signal whether the device will
>> use vPCI.
> 
> Actually I don't think this is a good idea. I am all for flexibility but
> supporting multiple different configurations comes at an extra cost for
> both maintainers and contributors. I think we should try to reduce the
> amount of configurations we support rather than increasing them
> (especially on x86 where we have PV, PVH, HVM).
> 
> I don't think we should enable IOREQ servers to handle PCI passthrough
> for PVH guests and/or guests with vPCI. If the domain has vPCI, PCI
> Passthrough can be handled by vPCI just fine. I think this should be a
> good anti-feature to have (a goal to explicitly not add this feature) to
> reduce complexity. Unless you see a specific usecase to add support for
> it?

I could see very special devices to want accompanying by a special ioreq
server. I could also see tandem pass-through/emulated device pairs to
potentially be of use and require coordination in the same ioreq server.

That said, I wouldn't insist on allowing a mix of vPCI and ioreq servers
to be used until an actual use case (with code supporting it) actually
appears.

Jan



Re: [PATCH v11 16/17] xen/arm: vpci: permit access to guest vpci space

2023-12-04 Thread Jan Beulich
On 02.12.2023 02:27, Volodymyr Babchuk wrote:
> --- a/xen/common/domain.c
> +++ b/xen/common/domain.c
> @@ -695,6 +695,9 @@ struct domain *domain_create(domid_t domid,
>  radix_tree_init(&d->pirq_tree);
>  }
>  
> +if ( !is_idle_domain(d) )
> +d->iomem_caps = rangeset_new(d, "I/O Memory", 
> RANGESETF_prettyprint_hex);
> +
>  if ( (err = arch_domain_create(d, config, flags)) != 0 )
>  goto fail;
>  init_status |= INIT_arch;
> @@ -711,7 +714,6 @@ struct domain *domain_create(domid_t domid,
>  watchdog_domain_init(d);
>  init_status |= INIT_watchdog;
>  
> -d->iomem_caps = rangeset_new(d, "I/O Memory", 
> RANGESETF_prettyprint_hex);
>  d->irq_caps   = rangeset_new(d, "Interrupts", 0);
>  if ( !d->iomem_caps || !d->irq_caps )
>  goto fail;

I'm pretty sure I asked before why I/O mem caps' init would be moved, but
IRQ caps' would remain where it is.

Jan



Re: [PATCH v11 06/17] vpci/header: rework exit path in init_bars

2023-12-04 Thread Jan Beulich
On 02.12.2023 02:27, Volodymyr Babchuk wrote:
> Introduce "fail" label in init_bars() function to have the centralized
> error return path. This is the pre-requirement for the future changes
> in this function.
> 
> This patch does not introduce functional changes.
> 
> Signed-off-by: Volodymyr Babchuk 
> Suggested-by: Roger Pau Monné 
> Acked-by: Roger Pau Monné 

Nit: Tags in chronological order please.

Jan



Re: [XEN PATCH v2 1/3] automation/eclair: tag function calls to address violations of MISRA C:2012 Rule 13.1

2023-12-04 Thread Simone Ballarin

On 02/12/23 04:19, Stefano Stabellini wrote:

On Fri, 24 Nov 2023, Simone Ballarin wrote:

Rule 13.1: Initializer lists shall not contain persistent side effects

Invocations of functions in initializer lists cause violations of rule
13.1 if the called functions are not tagged with __attribute_pure__ or
__attribute_const__ as they can produce persistent side effects.

Handling these violations with  attributes is not always possible: the
pure and const attributes may cause unwanted and potentially dangerous
optimisations.

To avoid this problem ECLAIR allows using the same attributes in the
-call_properties setting. Additionally, it adds the noeffect attribute
with the following definition:
"like pure but can also read volatile variable not triggering side effects"

These patch tags some functions used in initializer lists to address
violations of Rule 13.1.

No functional changes.

Signed-off-by: Simone Ballarin 


Ideally we should also list them somewhere in a document, maybe
docs/misra/deviations.rst? Or a new doc? It would be best if this info
wouldn't only exist in call_properties.ecl.


They are not actually deviations, but information that can help
document the code: I suggest creating a new document.

Then, ECLAIR or any other tool will be able to retrieve these properties
directly from this new file.

If you agree I will do it in a separate patch.



But give that the below is OK:
Acked-by: Stefano Stabellini 



---
Changes in v2:
New patch partly based on "xen/arm: address violations of MISRA C:2012 Rule 
13.1"
and "xen/include: add pure and const attributes". This new patch uses
ECL tagging instead of compiler attributes.
---
  .../ECLAIR/call_properties.ecl| 22 +++
  1 file changed, 22 insertions(+)

diff --git a/automation/eclair_analysis/ECLAIR/call_properties.ecl 
b/automation/eclair_analysis/ECLAIR/call_properties.ecl
index 3f7794bf8b..c2b2a6182e 100644
--- a/automation/eclair_analysis/ECLAIR/call_properties.ecl
+++ b/automation/eclair_analysis/ECLAIR/call_properties.ecl
@@ -73,6 +73,17 @@
  -call_properties+={"macro(^va_start$)", {"pointee_write(1=always)", 
"pointee_read(1=never)", "taken()"}}
  -call_properties+={"macro(^memcmp$)", {"pointee_write(1..2=never)", 
"taken()"}}
  -call_properties+={"macro(^memcpy$)", {"pointee_write(1=always&&2..=never)", 
"pointee_read(1=never&&2..=always)", "taken()"}}
+-call_properties+={"name(get_cpu_info)",{pure}}
+-call_properties+={"name(pdx_to_pfn)",{pure}}
+-call_properties+={"name(is_pci_passthrough_enabled)",{const}}
+-call_properties+={"name(get_cycles)", {"noeffect"}}
+-call_properties+={"name(msi_gflags)",{const}}
+-call_properties+={"name(hvm_save_size)",{pure}}
+-call_properties+={"name(cpu_has)",{pure}}
+-call_properties+={"name(boot_cpu_has)",{pure}}
+-call_properties+={"name(get_cpu_info)",{pure}}
+-call_properties+={"name(put_pte_flags)",{const}}
+-call_properties+={"name(is_pv_vcpu)",{pure}}
  
  -doc_begin="Property inferred as a consequence of the semantics of device_tree_get_reg"

  -call_properties+={"name(acquire_static_memory_bank)", {"pointee_write(4..=always)", 
"pointee_read(4..=never)", "taken()"}}
@@ -104,3 +115,14 @@ Furthermore, their uses do initialize the involved 
variables as needed by futher
  -call_properties+={"macro(^(__)?(raw_)?copy_from_(paddr|guest|compat)(_offset)?$)", 
{"pointee_write(1=always)", "pointee_read(1=never)", "taken()"}}
  -call_properties+={"macro(^(__)?copy_to_(guest|compat)(_offset)?$)", {"pointee_write(2=always)", 
"pointee_read(2=never)", "taken()"}}
  -doc_end
+
+-doc_begin="Functions generated by build_atomic_read cannot be considered pure
+since the input pointer is volatile, but they do not produce any persistent 
side
+effect."
+-call_properties+={"^read_u(8|16|32|64|int)_atomic.*$", {noeffect}}
+-doc_end
+
+-doc_begin="Functions generated by TYPE_SAFE are const."
+-call_properties+={"^(mfn|gfn|pfn)_x\\(.*$",{const}}
+-call_properties+={"^_(mfn|gfn|pfn)\\(.*$",{const}}
+-doc_end
--
2.34.1



--
Simone Ballarin, M.Sc.

Field Application Engineer, BUGSENG (https://bugseng.com)




Re: [PATCH v11 09/17] rangeset: add rangeset_empty() function

2023-12-04 Thread Jan Beulich
On 02.12.2023 02:27, Volodymyr Babchuk wrote:
> This function can be used when user wants to remove all rangeset
> entries but do not want to destroy rangeset itself.

I have to admit that I'm not happy with the name: We're not consistently
naming all predicate-like helpers is_...() (see e.g. cpumask_empty()).
May I suggest to use something which unambiguously expresses an action to
be taken, e.g. rangeset_purge(), rangeset_reset(), or (less suitable as
some ambiguity would remain, yet it would be in line with e.g.
cpumask_clear()) rangeset_clear()?

Jan



Re: [RFC PATCH] xen/arm: Add emulation of Debug Data Transfer Registers

2023-12-04 Thread Michal Orzel
Hi Ayan,

On 01/12/2023 19:50, Ayan Kumar Halder wrote:
> Currently if user enables HVC_DCC config option in Linux, it invokes
> access to debug data transfer registers (ie DBGDTRTX_EL0 on arm64,
> DBGDTRTXINT on arm32). As these registers are not emulated, Xen injects
> an undefined exception to the guest. And Linux crashes.
> 
> We wish to avoid this crash by adding a "partial" emulation. DBGDTR_EL0
> is emulated as TXfull | RXfull.
TX/RX are status bits of MDCCSR_EL0, not DBGDTR_EL0. This applies here and 
elsewhere.

> Refer ARM DDI 0487I.a ID081822, D17.3.8, DBGDTRTX_EL0
> "If TXfull is set to 1, set DTRRX and DTRTX to UNKNOWN"
> Also D17.3.7 DBGDTRRX_EL0,
> " If RXfull is set to 1, return the last value written to DTRRX."
> 
> Thus, any OS is expected to read DBGDTR_EL0 and check for TXfull
> before using DBGDTRTX_EL0. Linux does it via hvc_dcc_init() --->
> hvc_dcc_check() , it returns -ENODEV. In this way, we are preventing
> the guest to be aborted.
> 
> We also emulate DBGDTRTX_EL0 as RAZ/WI.
> 
> We have added emulation for AArch32 variant of these registers as well.
> Also, we have added handle_read_val_wi() to emulate DBGDSCREXT register
Emulating DBGDSCREXT is not needed. See below.

> to return a specific value (ie TXfull | RXfull) and ignore any writes
> to this register.
> 
> Signed-off-by: Michal Orzel 
> Signed-off-by: Ayan Kumar Halder 
> ---
>  xen/arch/arm/arm64/vsysreg.c | 21 ++
>  xen/arch/arm/include/asm/arm64/hsr.h |  3 +++
>  xen/arch/arm/include/asm/cpregs.h|  2 ++
>  xen/arch/arm/include/asm/traps.h |  4 
>  xen/arch/arm/traps.c | 18 +++
>  xen/arch/arm/vcpreg.c| 33 +---
>  6 files changed, 69 insertions(+), 12 deletions(-)
> 
> diff --git a/xen/arch/arm/arm64/vsysreg.c b/xen/arch/arm/arm64/vsysreg.c
> index b5d54c569b..5082dfb02e 100644
> --- a/xen/arch/arm/arm64/vsysreg.c
> +++ b/xen/arch/arm/arm64/vsysreg.c
> @@ -159,9 +159,6 @@ void do_sysreg(struct cpu_user_regs *regs,
>   *
>   * Unhandled:
>   *MDCCINT_EL1
> - *DBGDTR_EL0
> - *DBGDTRRX_EL0
> - *DBGDTRTX_EL0
>   *OSDTRRX_EL1
>   *OSDTRTX_EL1
>   *OSECCR_EL1
> @@ -172,11 +169,27 @@ void do_sysreg(struct cpu_user_regs *regs,
>  case HSR_SYSREG_MDSCR_EL1:
>  return handle_raz_wi(regs, regidx, hsr.sysreg.read, hsr, 1);
>  case HSR_SYSREG_MDCCSR_EL0:
> +{
> +/*
> + * Bit 29: TX full, bit 30: RX full
> + * Given that we emulate DCC registers as RAZ/WI, doing the same for
> + * MDCCSR_EL0 would cause a guest to continue using the DCC despite 
> no
> + * real effect. Setting the TX/RX status bits should result in a 
> probe
> + * fail (based on Linux behavior).
> + */
> +register_t guest_reg_value = (1U << 29) | (1U << 30);
> +
>  /*
>   * Accessible at EL0 only if MDSCR_EL1.TDCC is set to 0. We emulate 
> that
>   * register as RAZ/WI above. So RO at both EL0 and EL1.
>   */
> -return handle_ro_raz(regs, regidx, hsr.sysreg.read, hsr, 0);
> +return handle_ro_read_val(regs, regidx, hsr.sysreg.read, hsr, 0,
> +  guest_reg_value);
> +}
> +case HSR_SYSREG_DBGDTR_EL0:
> +/* DBGDTR[TR]X_EL0 share the same encoding */
> +case HSR_SYSREG_DBGDTRTX_EL0:
> +return handle_raz_wi(regs, regidx, hsr.sysreg.read, hsr, 0);
>  HSR_SYSREG_DBG_CASES(DBGBVR):
>  HSR_SYSREG_DBG_CASES(DBGBCR):
>  HSR_SYSREG_DBG_CASES(DBGWVR):
> diff --git a/xen/arch/arm/include/asm/arm64/hsr.h 
> b/xen/arch/arm/include/asm/arm64/hsr.h
> index e691d41c17..1495ccddea 100644
> --- a/xen/arch/arm/include/asm/arm64/hsr.h
> +++ b/xen/arch/arm/include/asm/arm64/hsr.h
> @@ -47,6 +47,9 @@
>  #define HSR_SYSREG_OSDLR_EL1  HSR_SYSREG(2,0,c1,c3,4)
>  #define HSR_SYSREG_DBGPRCR_EL1HSR_SYSREG(2,0,c1,c4,4)
>  #define HSR_SYSREG_MDCCSR_EL0 HSR_SYSREG(2,3,c0,c1,0)
> +#define HSR_SYSREG_DBGDTR_EL0 HSR_SYSREG(2,3,c0,c4,0)
> +#define HSR_SYSREG_DBGDTRTX_EL0   HSR_SYSREG(2,3,c0,c5,0)
> +#define HSR_SYSREG_DBGDTRRX_EL0   HSR_SYSREG(2,3,c0,c5,0)
>  
>  #define HSR_SYSREG_DBGBVRn_EL1(n) HSR_SYSREG(2,0,c0,c##n,4)
>  #define HSR_SYSREG_DBGBCRn_EL1(n) HSR_SYSREG(2,0,c0,c##n,5)
> diff --git a/xen/arch/arm/include/asm/cpregs.h 
> b/xen/arch/arm/include/asm/cpregs.h
> index 6b083de204..aec9e8f329 100644
> --- a/xen/arch/arm/include/asm/cpregs.h
> +++ b/xen/arch/arm/include/asm/cpregs.h
> @@ -75,6 +75,8 @@
>  #define DBGDIDR p14,0,c0,c0,0   /* Debug ID Register */
>  #define DBGDSCRINT  p14,0,c0,c1,0   /* Debug Status and Control Internal 
> */
>  #define DBGDSCREXT  p14,0,c0,c2,2   /* Debug Status and Control External 
> */
> +#define DBGDTRRXINT p14,0,c0,c5,0   /* Debug Data Transfer Register, 
> Receive */
> +#define DBGDTRTXINT p14,0,c0,c5,0   /* Debug Data Transfer Register, 
> Transmit */
>  #

Re: [PATCH v5 5/7] xen: ifdef inclusion of in

2023-12-04 Thread Jan Beulich
On 01.12.2023 21:48, Oleksii Kurochko wrote:
> Ifdef-ing inclusion of  allows to avoid
> generation of empty  for cases when
> CONFIG_GRANT_TABLE is not enabled.
> 
> The following changes were done for Arm:
>  should be included directly because it contains
> gnttab_dom0_frames() macros which is unique for Arm and is used in
> arch/arm/domain_build.c.
>  is #ifdef-ed with CONFIG_GRANT_TABLE in
>  so in case of !CONFIG_GRANT_TABLE gnttab_dom0_frames
> won't be available for use in arch/arm/domain_build.c.
> 
> Suggested-by: Jan Beulich 

Not really, no: In particular ...

> --- a/xen/common/Kconfig
> +++ b/xen/common/Kconfig
> @@ -15,6 +15,7 @@ config CORE_PARKING
>  config GRANT_TABLE
>   bool "Grant table support" if EXPERT
>   default y
> + depends on ARM || X86

... this I explicitly said I consider wrong to add.

Jan



Re: [PATCH v5 6/7] xen/asm-generic: ifdef inclusion of

2023-12-04 Thread Jan Beulich
On 01.12.2023 21:48, Oleksii Kurochko wrote:
> --- a/xen/common/Kconfig
> +++ b/xen/common/Kconfig
> @@ -87,7 +87,7 @@ config MEM_ACCESS_ALWAYS_ON
>  config MEM_ACCESS
>   def_bool MEM_ACCESS_ALWAYS_ON
>   prompt "Memory Access and VM events" if !MEM_ACCESS_ALWAYS_ON
> - depends on HVM
> + depends on HVM && (ARM || X86)

While - unlike for GRANT_TABLE - I view going this route as an option, I
still think it wants doing without explicitly naming architectures here.
There wants to be a HAS_MEM_ACCESS, selected by Arm's Kconfig and by
MEM_ACCESS_ALWAYS_ON (which in turn x86 selects). This could then also
replace the HVM dependency here - x86 ought to select MEM_ACCESS_ALWAYS_ON
only when HVM is enabled.

Jan



[PATCH 1/2] x86/xen: add CPU dependencies for 32-bit build

2023-12-04 Thread Arnd Bergmann
From: Arnd Bergmann 

Xen only supports modern CPUs even when running a 32-bit kernel, and it now
requires a kernel built for a 64 byte (or larger) cache line:

In file included from :
In function 'xen_vcpu_setup',
inlined from 'xen_vcpu_setup_restore' at arch/x86/xen/enlighten.c:111:3,
inlined from 'xen_vcpu_restore' at arch/x86/xen/enlighten.c:141:3:
include/linux/compiler_types.h:435:45: error: call to 
'__compiletime_assert_287' declared with attribute error: BUILD_BUG_ON failed: 
sizeof(*vcpup) > SMP_CACHE_BYTES
arch/x86/xen/enlighten.c:166:9: note: in expansion of macro 'BUILD_BUG_ON'
  166 | BUILD_BUG_ON(sizeof(*vcpup) > SMP_CACHE_BYTES);
  | ^~~~

Enforce the dependency with a whitelist of CPU configurations. In normal
distro kernels, CONFIG_X86_GENERIC is enabled, and this works fine. When this
is not set, still allow Xen to be built on kernels that target a 64-bit
capable CPU.

Fixes: db2832309a82 ("x86/xen: fix percpu vcpu_info allocation")
Signed-off-by: Arnd Bergmann 
---
 arch/x86/xen/Kconfig | 1 +
 1 file changed, 1 insertion(+)

diff --git a/arch/x86/xen/Kconfig b/arch/x86/xen/Kconfig
index 9b1ec5d8c99c..a65fc2ae15b4 100644
--- a/arch/x86/xen/Kconfig
+++ b/arch/x86/xen/Kconfig
@@ -9,6 +9,7 @@ config XEN
select PARAVIRT_CLOCK
select X86_HV_CALLBACK_VECTOR
depends on X86_64 || (X86_32 && X86_PAE)
+   depends on X86_64 || (X86_GENERIC || MPENTIUM4 || MCORE2 || MATOM || 
MK8)
depends on X86_LOCAL_APIC && X86_TSC
help
  This is the Linux Xen port.  Enabling this will allow the
-- 
2.39.2




Re: [RFC KERNEL PATCH v2 1/3] xen/pci: Add xen_reset_device_state function

2023-12-04 Thread Chen, Jiqian

On 2023/12/4 15:58, Thomas Gleixner wrote:
> On Fri, Nov 24 2023 at 18:31, Jiqian Chen wrote:
>> When device on dom0 side has been reset, the vpci on Xen side
>> won't get notification, so that the cached state in vpci is
>> all out of date with the real device state.
>> To solve that problem, this patch add a function to clear all
> 
> Please get rid of 'this patch' all over the series.
> 
> # grep -r 'This patch' Documentation/process/
Thanks. I will remove "this patch" or "I/we" in next version.

> 
>> vpci device state when device is reset on dom0 side.
>>
>> And call that function in pcistub_init_device. Because when
>> we use "pci-assignable-add" to assign a passthrough device in
>> Xen, it will reset passthrough device and the vpci state will
>> out of date, and then device will fail to restore bar state.
>>
>> Signed-off-by: Jiqian Chen 
>> Signed-off-by: Huang Rui 
> 
> This Signed-off-by chain is incorrect.
> 
> Documentation/process/submitting-patches.rst has a full chapter about
> S-O-B and the correct usage.
I am the author of this series of patches, and Huang Rui transported the v1 to 
upstream. And now I transport v2. I am not aware that the SOB chain is 
incorrect.
Do you have any suggestions?

> 
> Thanks,
> 
> tglx

-- 
Best regards,
Jiqian Chen.


Re: xen | Failed pipeline for staging | 525c7c09

2023-12-04 Thread Jan Beulich
On 02.12.2023 22:00, GitLab wrote:
> 
> 
> Pipeline #1092667236 has failed!
> 
> Project: xen ( https://gitlab.com/xen-project/xen )
> Branch: staging ( https://gitlab.com/xen-project/xen/-/commits/staging )
> 
> Commit: 525c7c09 ( 
> https://gitlab.com/xen-project/xen/-/commit/525c7c094b258e8a46b494488eef96f5670eb352
>  )
> Commit Message: xen/arm: Move static event channel feature to a...
> Commit Author: Michal Orzel ( https://gitlab.com/orzelmichal )
> Committed by: Julien Grall
> 
> 
> Pipeline #1092667236 ( 
> https://gitlab.com/xen-project/xen/-/pipelines/1092667236 ) triggered by 
> Ganis ( https://gitlab.com/ganis )
> had 3 failed jobs.
> 
> Job #5664669062 ( https://gitlab.com/xen-project/xen/-/jobs/5664669062/raw )
> 
> Stage: test
> Name: zen3p-smoke-x86-64-gcc-debug
> Job #5664669074 ( https://gitlab.com/xen-project/xen/-/jobs/5664669074/raw )
> 
> Stage: test
> Name: zen3p-smoke-x86-64-dom0pvh-gcc-debug
> Job #5664669076 ( https://gitlab.com/xen-project/xen/-/jobs/5664669076/raw )
> 
> Stage: test
> Name: zen3p-pci-hvm-x86-64-gcc-debug

None of the referenced "raw" files look to have any contents at all. What does
this mean?

Jan



Re: [BUG]i2c_hid_acpi broken with 4.17.2 on Framework Laptop 13 AMD

2023-12-04 Thread Jan Beulich
On 03.12.2023 10:56, Sébastien Chaumat wrote:
> Hello,
> 
>  Trying to get the Framework Laptop 13 AMD to work with QubesOS I hit the
> following Xen issue :
> 
> Xen version : 4.17.2
> Kernel : 6.5.12-300.fc39.x86_64
> CPU  model name : AMD Ryzen 7 7840U w/ Radeon  780M Graphics
> 
> The touchpad is not working (not detected by evtest) because ( see below
> for XXX values) :
> [   10.215870] i2c_hid_acpi i2c-FRMXXX: failed to reset device: -61
> 
> which is maybe related to the previous messages :
> 
> [2.065750] [Firmware Bug]: ACPI MWAIT C-state 0x0 not supported by HW
> (0x0)

Not very likely to be connected to this. Afaict you'll see these on all
systems.

> ...
> [2.464598] amd_gpio AMDI0030:00: failed to enable wake-up interrupt

Possibly releated to this. You'll want to obtain a full-verbosity hypervisor
log with a debug hypervisor, as there may be hypervisor debug messages
telling us what Xen may not like.

Jan



Re: xen | Failed pipeline for staging | 525c7c09

2023-12-04 Thread Michal Orzel



On 04/12/2023 09:56, Jan Beulich wrote:
> 
> 
> On 02.12.2023 22:00, GitLab wrote:
>>
>>
>> Pipeline #1092667236 has failed!
>>
>> Project: xen ( https://gitlab.com/xen-project/xen )
>> Branch: staging ( https://gitlab.com/xen-project/xen/-/commits/staging )
>>
>> Commit: 525c7c09 ( 
>> https://gitlab.com/xen-project/xen/-/commit/525c7c094b258e8a46b494488eef96f5670eb352
>>  )
>> Commit Message: xen/arm: Move static event channel feature to a...
>> Commit Author: Michal Orzel ( https://gitlab.com/orzelmichal )
>> Committed by: Julien Grall
>>
>>
>> Pipeline #1092667236 ( 
>> https://gitlab.com/xen-project/xen/-/pipelines/1092667236 ) triggered by 
>> Ganis ( https://gitlab.com/ganis )
>> had 3 failed jobs.
>>
>> Job #5664669062 ( https://gitlab.com/xen-project/xen/-/jobs/5664669062/raw )
>>
>> Stage: test
>> Name: zen3p-smoke-x86-64-gcc-debug
>> Job #5664669074 ( https://gitlab.com/xen-project/xen/-/jobs/5664669074/raw )
>>
>> Stage: test
>> Name: zen3p-smoke-x86-64-dom0pvh-gcc-debug
>> Job #5664669076 ( https://gitlab.com/xen-project/xen/-/jobs/5664669076/raw )
>>
>> Stage: test
>> Name: zen3p-pci-hvm-x86-64-gcc-debug
> 
> None of the referenced "raw" files look to have any contents at all. What does
> this mean?
It means that jobs got stuck and we should ping Marek.

~Michal



Re: Trying add smt=off disabled cores to cpupool crash xen

2023-12-04 Thread Jan Beulich
On 01.12.2023 21:12, Andrew Cooper wrote:
> On 01/12/2023 7:59 pm, René Winther Højgaard wrote:
>> If I set smt=off and try to configure cpupools with credit(1) as if
>> all cores are available, I get the following crash.  
>>
>> The crash happens when I try to use xl cpupool-add-cpu on the disabled
>> HT sibling cores.
>>
>> Hyper-threading is enabled in the firmware, and only disabled with
>> smt=off.
> 
> CC'ing some maintainers.
> 
> I expect this will also explode when a CPU is runtime offlined with
> `xen-hptool cpu-offline` and then added to a cpupool.
> 
> Interestingly, the crash is mov (%rdx,%rax,1),%r13, and I think that's
> the percpu posion value in %rdx.
> 
> I expect cpupools want to reject parked/offline CPUs.

While the only explicit check there is

if ( cpu >= nr_cpu_ids )
goto addcpu_out;

I would have expected this

if ( !cpumask_subset(cpus, &cpupool_free_cpus) ||
 cpumask_intersects(cpus, &cpupool_locked_cpus) )
goto addcpu_out;

to deal with the situation, as parked/offline CPUs shouldn't be "free".
Jürgen?

Jan



Improving conflict resolution inside the Xen Project

2023-12-04 Thread Charles-H. Schulz
Hello,

Following up on the discussions on the use of certain terms and the way to
handle disagreements that ultimately lead to conflicts, I would like to
suggest certain mecanisms to more effectively handle such conflicts. This
does not mean I'm against any informal or formal voting - I think it was the
right thing to do at the right time. But looking forward we need to handle
conflicts resolution in a way that inflicts the minimum possible tension and
paralysis on the code contributions and bug fixing.

This means that we need to move to a model that's already quite common
in other projects, namely, a model where conflicts resolution is dealt with
processes implemented by one of the more or less formal committees chartered
or existing in the community governance.

What this would imply: in the absence of a more complex governance within the
Xen Project I suggest the Advisory Board be put in charge of conflicts
resolution. With a simple process that I (and others hopefully) will clarify
soon, the Advisory Board would within a specific timeframe collect and
analyze the conflict either on its own or called upon by one or more
community members. It could effectively talk in more detail and off lists to
the relevant persons and then suggest a way out of the conflict.

Ideally - it's Monday and Christmas is not far so let's dream a bit -  this
would lift the burden of the conflict management away from the development
list and allow it to be focused on development. The Advisory Board should in
these cases work in harmony with the informally proposed Technical Advisoy
Board in order to ensure its decisions are not just the consensus of its own
members but are well understood throughout the project.

Hope this helps,
-- 
Charles-H. Schulz
Vates SAS.



signature.asc
Description: PGP signature


Re: Moving domain from credit2 to credit cpupool crash xen

2023-12-04 Thread Jan Beulich
On 01.12.2023 21:13, René Winther Højgaard wrote:
> When I move a domain from pool0 with credit2 to any pool with credit(1) I get 
> the following crash.
> 
> 
> Software: Xen-4.17.3 / Qubes OS 4.2.0-RC5
> Firmware: Dasharo 0.9.0 - Z790P
> Hardware: 13900K
> (XEN) Xen BUG at common/sched/credit.c:1051(XEN) [ Xen-4.17.3-pre  x86_64 
>  debug=y  Not tainted ]
> (XEN) CPU:    2
> (XEN) RIP:    e008:[] credit.c#csched_free_udata+0x12/0x14
> (XEN) RFLAGS: 00010202   CONTEXT: hypervisor (d0v2)
> (XEN) rax: 82d040237ceb   rbx: 0014   rcx: 0013
> (XEN) rdx: 831087d7   rsi: 830ad80e8da0   rdi: 830ad80e8da0
> (XEN) rbp:    rsp: 831087d7fc90   r8:  830e2d6a49b0
> (XEN) r9:  831087d7fbe0   r10: 83107c481068   r11: 002cfd1c274a
> (XEN) r12: 830ad80e8c80   r13: 83107c45bee0   r14: 
> (XEN) r15: 82d0405a9288   cr0: 80050033   cr4: 00b526e0
> (XEN) cr3: 0009284d8000   cr2: 7fb535181240
> (XEN) fsb: 7fb534c5f380   gsb: 8881b9d0   gss: 
> (XEN) ds:    es:    fs:    gs:    ss: e010   cs: e008
> (XEN) Xen code around  
> (credit.c#csched_free_udata+0x12/0x14):
> (XEN)  75 06 e8 19 74 ff ff c3 <0f> 0b f3 0f 1e fa 53 48 8b 5f 18 48 85 db 74 
> 2b
> (XEN) Xen stack trace from rsp=831087d7fc90:
> (XEN)    82d040247503 00132030 830ad80e8bf0 82d0405a9288
> (XEN)    83107f59aa80 830ad80e8c80 83107c45bee0 830ad80e8bf0
> (XEN)    831000af1010 83107c45bee0 830ad80ed000 83107c45bee0
> (XEN)     82d04045d5d8 82d0405ae680 82d040235303
> (XEN)    831087d7fe20 fffe 82d040236ec3 830ad80ed000
> (XEN)     7fb535230010 831087d7 
> (XEN)    82d04045d5d8 82d040234763 c102 
> (XEN)     c102 000d 8101ede6
> (XEN)    e033 00011082 c90046ebba90 e02b
> (XEN)    5a33a1a65352beef feadf9effdf1beef 122ae2fa736bbeef 46023e9af174beef
> (XEN)    82d040227cc6 831087d7fe48  00011082
> (XEN)     831087d7  8101ede4
> (XEN)    82d0403495d0 00150012 00010006 000d
> (XEN)    7ffdf93fb3fc 00431042 0043d990 0043d9b0
> (XEN)    7fb534eb8434 7ffdf93fb400 0013 02361838
> (XEN)    04457fe81f7cf300 02360870 ff80 
> (XEN)    7ffdf93fc652 0043d980 831087d7fef8 0023
> (XEN)    83107f544000   
> (XEN)    82d0402dd07f 83107f544000  
> (XEN)    82d0402012b7  88811abbc100 7ffdf93fb2c0
> (XEN) Xen call trace:
> (XEN)    [] R credit.c#csched_free_udata+0x12/0x14
> (XEN)    [] S sched_move_domain+0x5b0/0x5cc

Hmm, looks like sched_move_domain()'s calling of sched_free_udata() uses the
new pool's scheduler, not that of the original pool. I'm puzzled though that
there's no sign at all in the function of it caring about what the original
pool was. IOW I'm not sure that the simple and obvious change to latch the
original pool into a local and then use it on the "out_free" path is going
to be enough. Jürgen (sorry, again you)?

Jan



Re: [PATCH] Config.mk: drop -Wdeclaration-after-statement

2023-12-04 Thread Alexander Kanavin

On 12/1/23 20:14, Julien Grall wrote:


So I agree that if we were to remove -Wdeclaration-after-statement 
then we should also update the CODING_STYLE. However, I am not 
entirely sure I would want to mix code and declaration in the hypervisor.


Anyway, I think this is a separate discussion from resolving the 
immediate problem (i.e. building the python bindings).


So for now, I think it would make sense to push the 
-Wdeclaration-after-statement to the tools.


@Alexander, are you going to send a new version? If not, I would be 
happy to do it.


Please do it, as in the meantime, my attention has focused entirely 
elsewhere, so I'd have to switch context and find time to study the xen 
source. I don't have specific interest in xen, the reason I looked into 
it is that we're updating python to 3.12 in yocto and this one was one 
of the many issues that came up all over the userspace stack.



--
Alexander Kanavin
Linutronix GmbH | Bahnhofstrasse 3 | D-88690 Uhldingen-Mühlhofen
Phone: +49 7556 25 999 39; Fax.: +49 7556 25 999 99

Hinweise zum Datenschutz finden Sie hier (Informations on data privacy
can be found here): https://linutronix.de/legal/data-protection.php

Linutronix GmbH | Firmensitz (Registered Office): Uhldingen-Mühlhofen |
Registergericht (Registration Court): Amtsgericht Freiburg i.Br., HRB700
806 | Geschäftsführer (Managing Directors): Heinz Egger, Thomas Gleixner,
Sharon Heck, Yulia Beck, Tiffany Silva




Re: Moving domain from credit2 to credit cpupool crash xen

2023-12-04 Thread Jan Beulich
On 04.12.2023 10:23, Jan Beulich wrote:
> On 01.12.2023 21:13, René Winther Højgaard wrote:
>> When I move a domain from pool0 with credit2 to any pool with credit(1) I 
>> get the following crash.
>>
>>
>> Software: Xen-4.17.3 / Qubes OS 4.2.0-RC5
>> Firmware: Dasharo 0.9.0 - Z790P
>> Hardware: 13900K
>> (XEN) Xen BUG at common/sched/credit.c:1051(XEN) [ Xen-4.17.3-pre  
>> x86_64  debug=y  Not tainted ]
>> (XEN) CPU:    2
>> (XEN) RIP:    e008:[] credit.c#csched_free_udata+0x12/0x14
>> (XEN) RFLAGS: 00010202   CONTEXT: hypervisor (d0v2)
>> (XEN) rax: 82d040237ceb   rbx: 0014   rcx: 0013
>> (XEN) rdx: 831087d7   rsi: 830ad80e8da0   rdi: 830ad80e8da0
>> (XEN) rbp:    rsp: 831087d7fc90   r8:  830e2d6a49b0
>> (XEN) r9:  831087d7fbe0   r10: 83107c481068   r11: 002cfd1c274a
>> (XEN) r12: 830ad80e8c80   r13: 83107c45bee0   r14: 
>> (XEN) r15: 82d0405a9288   cr0: 80050033   cr4: 00b526e0
>> (XEN) cr3: 0009284d8000   cr2: 7fb535181240
>> (XEN) fsb: 7fb534c5f380   gsb: 8881b9d0   gss: 
>> (XEN) ds:    es:    fs:    gs:    ss: e010   cs: e008
>> (XEN) Xen code around  
>> (credit.c#csched_free_udata+0x12/0x14):
>> (XEN)  75 06 e8 19 74 ff ff c3 <0f> 0b f3 0f 1e fa 53 48 8b 5f 18 48 85 db 
>> 74 2b
>> (XEN) Xen stack trace from rsp=831087d7fc90:
>> (XEN)    82d040247503 00132030 830ad80e8bf0 82d0405a9288
>> (XEN)    83107f59aa80 830ad80e8c80 83107c45bee0 830ad80e8bf0
>> (XEN)    831000af1010 83107c45bee0 830ad80ed000 83107c45bee0
>> (XEN)     82d04045d5d8 82d0405ae680 82d040235303
>> (XEN)    831087d7fe20 fffe 82d040236ec3 830ad80ed000
>> (XEN)     7fb535230010 831087d7 
>> (XEN)    82d04045d5d8 82d040234763 c102 
>> (XEN)     c102 000d 8101ede6
>> (XEN)    e033 00011082 c90046ebba90 e02b
>> (XEN)    5a33a1a65352beef feadf9effdf1beef 122ae2fa736bbeef 46023e9af174beef
>> (XEN)    82d040227cc6 831087d7fe48  00011082
>> (XEN)     831087d7  8101ede4
>> (XEN)    82d0403495d0 00150012 00010006 000d
>> (XEN)    7ffdf93fb3fc 00431042 0043d990 0043d9b0
>> (XEN)    7fb534eb8434 7ffdf93fb400 0013 02361838
>> (XEN)    04457fe81f7cf300 02360870 ff80 
>> (XEN)    7ffdf93fc652 0043d980 831087d7fef8 0023
>> (XEN)    83107f544000   
>> (XEN)    82d0402dd07f 83107f544000  
>> (XEN)    82d0402012b7  88811abbc100 7ffdf93fb2c0
>> (XEN) Xen call trace:
>> (XEN)    [] R credit.c#csched_free_udata+0x12/0x14
>> (XEN)    [] S sched_move_domain+0x5b0/0x5cc
> 
> Hmm, looks like sched_move_domain()'s calling of sched_free_udata() uses the
> new pool's scheduler, not that of the original pool. I'm puzzled though that
> there's no sign at all in the function of it caring about what the original
> pool was. IOW I'm not sure that the simple and obvious change to latch the
> original pool into a local and then use it on the "out_free" path is going
> to be enough. Jürgen (sorry, again you)?

Hmm, should have added "in the error case". Seeing there is old_ops, perhaps
simply

--- a/xen/common/sched/core.c
+++ b/xen/common/sched/core.c
@@ -810,7 +810,7 @@
 for ( unit = old_units; unit; )
 {
 if ( unit->priv )
-sched_free_udata(c->sched, unit->priv);
+sched_free_udata(ret ? c->sched : old_ops, unit->priv);
 old_unit = unit;
 unit = unit->next_in_list;
 xfree(old_unit);


(not even compile tested)?

Jan



Re: [PATCH v5 6/7] xen/asm-generic: ifdef inclusion of

2023-12-04 Thread Oleksii
On Mon, 2023-12-04 at 09:46 +0100, Jan Beulich wrote:
> On 01.12.2023 21:48, Oleksii Kurochko wrote:
> > --- a/xen/common/Kconfig
> > +++ b/xen/common/Kconfig
> > @@ -87,7 +87,7 @@ config MEM_ACCESS_ALWAYS_ON
> >  config MEM_ACCESS
> >     def_bool MEM_ACCESS_ALWAYS_ON
> >     prompt "Memory Access and VM events" if
> > !MEM_ACCESS_ALWAYS_ON
> > -   depends on HVM
> > +   depends on HVM && (ARM || X86)
> 
> While - unlike for GRANT_TABLE - I view going this route as an
> option, I
> still think it wants doing without explicitly naming architectures
> here.
> There wants to be a HAS_MEM_ACCESS, selected by Arm's Kconfig and by
> MEM_ACCESS_ALWAYS_ON (which in turn x86 selects). This could then
> also
> replace the HVM dependency here - x86 ought to select
> MEM_ACCESS_ALWAYS_ON
> only when HVM is enabled.
Thanks.

It will be definetely better to introduce HAS_MEM_ACCESS.


~ Oleksii



Re: [PATCH] CODING_STYLE: Add a section of the naming convention

2023-12-04 Thread Jan Beulich
On 01.12.2023 19:49, Julien Grall wrote:
> +Naming convention
> +-
> +
> +'-' should be used to separate words in commandline options and filenames.
> +E.g. timer-works.
> +
> +Note that some of the options and filenames are using '_'. This is now
> +deprecated.

I certainly appreciate and second the intent, yet I'm afraid "Naming convention"
in the doc would (to me at least) first and foremost talk about identifiers used
in the various source files. If this really is to be about only file names and
command line options, then I think the heading would better say so. 
Alternatively
a clear indication would want adding that text about identifiers is supposed to
be here, but is yet to be written. (The text itself, for the intended purpose,
reads fine to me, fwiw.)

Jan



Re: [PATCH v5 5/7] xen: ifdef inclusion of in

2023-12-04 Thread Oleksii
On Mon, 2023-12-04 at 09:41 +0100, Jan Beulich wrote:
> On 01.12.2023 21:48, Oleksii Kurochko wrote:
> > Ifdef-ing inclusion of  allows to avoid
> > generation of empty  for cases when
> > CONFIG_GRANT_TABLE is not enabled.
> > 
> > The following changes were done for Arm:
> >  should be included directly because it contains
> > gnttab_dom0_frames() macros which is unique for Arm and is used in
> > arch/arm/domain_build.c.
> >  is #ifdef-ed with CONFIG_GRANT_TABLE in
> >  so in case of !CONFIG_GRANT_TABLE
> > gnttab_dom0_frames
> > won't be available for use in arch/arm/domain_build.c.
> > 
> > Suggested-by: Jan Beulich 
> 
> Not really, no: In particular ...
> 
> > --- a/xen/common/Kconfig
> > +++ b/xen/common/Kconfig
> > @@ -15,6 +15,7 @@ config CORE_PARKING
> >  config GRANT_TABLE
> >     bool "Grant table support" if EXPERT
> >     default y
> > +   depends on ARM || X86
> 
> ... this I explicitly said I consider wrong to add.
Then I misunderstood you.

What about to do the same as with MEM_ACCESS config and introduce
HAS_GRANT_TABLE?

Or would it be better just update "depends on" to !RISCV && !PPC?

~ Oleksii



[PATCH v2 2/6] amd-vi: set IOMMU page table levels based on guest reported paddr width

2023-12-04 Thread Roger Pau Monne
However take into account the minimum number of levels required by unity maps
when setting the page table levels.

The previous setting of the page table levels for PV guests based on the
highest RAM address was bogus, as there can be other non-RAM regions past the
highest RAM address that need to be mapped, for example device MMIO.

For HVM we also take amd_iommu_min_paging_mode into account, however if unity
maps require more than 4 levels attempting to add those will currently fail at
the p2m level, as 4 levels is the maximum supported.

Fixes: 0700c962ac2d ('Add AMD IOMMU support into hypervisor')
Signed-off-by: Roger Pau Monné 
---
changes since v1:
 - Use paging_max_paddr_bits() instead of hardcoding
   DEFAULT_DOMAIN_ADDRESS_WIDTH.
---
 xen/drivers/passthrough/amd/pci_amd_iommu.c | 20 
 1 file changed, 8 insertions(+), 12 deletions(-)

diff --git a/xen/drivers/passthrough/amd/pci_amd_iommu.c 
b/xen/drivers/passthrough/amd/pci_amd_iommu.c
index 6bc73dc21052..00a25e649f22 100644
--- a/xen/drivers/passthrough/amd/pci_amd_iommu.c
+++ b/xen/drivers/passthrough/amd/pci_amd_iommu.c
@@ -359,21 +359,17 @@ int __read_mostly amd_iommu_min_paging_mode = 1;
 static int cf_check amd_iommu_domain_init(struct domain *d)
 {
 struct domain_iommu *hd = dom_iommu(d);
+int pglvl = amd_iommu_get_paging_mode(
+PFN_DOWN(1UL << paging_max_paddr_bits(d)));
+
+if ( pglvl < 0 )
+return pglvl;
 
 /*
- * Choose the number of levels for the IOMMU page tables.
- * - PV needs 3 or 4, depending on whether there is RAM (including hotplug
- *   RAM) above the 512G boundary.
- * - HVM could in principle use 3 or 4 depending on how much guest
- *   physical address space we give it, but this isn't known yet so use 4
- *   unilaterally.
- * - Unity maps may require an even higher number.
+ * Choose the number of levels for the IOMMU page tables, taking into
+ * account unity maps.
  */
-hd->arch.amd.paging_mode = max(amd_iommu_get_paging_mode(
-is_hvm_domain(d)
-? 1UL << (DEFAULT_DOMAIN_ADDRESS_WIDTH - PAGE_SHIFT)
-: get_upper_mfn_bound() + 1),
-amd_iommu_min_paging_mode);
+hd->arch.amd.paging_mode = max(pglvl, amd_iommu_min_paging_mode);
 
 return 0;
 }
-- 
2.43.0




[PATCH v2 5/6] x86/iommu: switch hwdom IOMMU to use a rangeset

2023-12-04 Thread Roger Pau Monne
The current loop that iterates from 0 to the maximum RAM address in order to
setup the IOMMU mappings is highly inefficient, and it will get worse as the
amount of RAM increases.  It's also not accounting for any reserved regions
past the last RAM address.

Instead of iterating over memory addresses, iterate over the memory map regions
and use a rangeset in order to keep track of which ranges need to be identity
mapped in the hardware domain physical address space.

On an AMD EPYC 7452 with 512GiB of RAM, the time to execute
arch_iommu_hwdom_init() in nanoseconds is:

x old
+ new
N   Min   MaxMedian   AvgStddev
x   5 2.2364154e+10  2.338244e+10 2.2474685e+10 2.2622409e+10 4.2949869e+08
+   5   1025012   1033036   1026188 1028276.2 3623.1194
Difference at 95.0% confidence
-2.26214e+10 +/- 4.42931e+08
-99.9955% +/- 9.05152e-05%
(Student's t, pooled s = 3.03701e+08)

Execution time of arch_iommu_hwdom_init() goes down from ~22s to ~0.001s.

Note there's a change for HVM domains (ie: PVH dom0) that get switched to
create the p2m mappings using map_mmio_regions() instead of
p2m_add_identity_entry(), so that ranges can be mapped with a single function
call if possible.  Note that the interface of map_mmio_regions() doesn't
allow creating read-only mappings, but so far there are no such mappings
created for PVH dom0 in arch_iommu_hwdom_init().

No change intended in the resulting mappings that a hardware domain gets.

Signed-off-by: Roger Pau Monné 
---
Changes since v1:
 - Split from bigger patch.
 - Remove unneeded default case.
---
 xen/drivers/passthrough/x86/iommu.c | 157 
 1 file changed, 42 insertions(+), 115 deletions(-)

diff --git a/xen/drivers/passthrough/x86/iommu.c 
b/xen/drivers/passthrough/x86/iommu.c
index 7e97805fccec..81d6129189d0 100644
--- a/xen/drivers/passthrough/x86/iommu.c
+++ b/xen/drivers/passthrough/x86/iommu.c
@@ -300,76 +300,6 @@ void iommu_identity_map_teardown(struct domain *d)
 }
 }
 
-static unsigned int __hwdom_init hwdom_iommu_map(const struct domain *d,
- unsigned long pfn,
- unsigned long max_pfn)
-{
-mfn_t mfn = _mfn(pfn);
-unsigned int i, type, perms = IOMMUF_readable | IOMMUF_writable;
-
-/*
- * Set up 1:1 mapping for dom0. Default to include only conventional RAM
- * areas and let RMRRs include needed reserved regions. When set, the
- * inclusive mapping additionally maps in every pfn up to 4GB except those
- * that fall in unusable ranges for PV Dom0.
- */
-if ( (pfn > max_pfn && !mfn_valid(mfn)) || xen_in_range(pfn) )
-return 0;
-
-switch ( type = page_get_ram_type(mfn) )
-{
-case RAM_TYPE_UNUSABLE:
-return 0;
-
-case RAM_TYPE_CONVENTIONAL:
-if ( iommu_hwdom_strict )
-return 0;
-break;
-
-default:
-if ( type & RAM_TYPE_RESERVED )
-{
-if ( !iommu_hwdom_inclusive && !iommu_hwdom_reserved )
-perms = 0;
-}
-else if ( is_hvm_domain(d) )
-return 0;
-else if ( !iommu_hwdom_inclusive || pfn > max_pfn )
-perms = 0;
-}
-
-/* Check that it doesn't overlap with the Interrupt Address Range. */
-if ( pfn >= 0xfee00 && pfn <= 0xfeeff )
-return 0;
-/* ... or the IO-APIC */
-if ( has_vioapic(d) )
-{
-for ( i = 0; i < d->arch.hvm.nr_vioapics; i++ )
-if ( pfn == PFN_DOWN(domain_vioapic(d, i)->base_address) )
-return 0;
-}
-else if ( is_pv_domain(d) )
-{
-/*
- * Be consistent with CPU mappings: Dom0 is permitted to establish r/o
- * ones there (also for e.g. HPET in certain cases), so it should also
- * have such established for IOMMUs.
- */
-if ( iomem_access_permitted(d, pfn, pfn) &&
- rangeset_contains_singleton(mmio_ro_ranges, pfn) )
-perms = IOMMUF_readable;
-}
-/*
- * ... or the PCIe MCFG regions.
- * TODO: runtime added MMCFG regions are not checked to make sure they
- * don't overlap with already mapped regions, thus preventing trapping.
- */
-if ( has_vpci(d) && vpci_is_mmcfg_address(d, pfn_to_paddr(pfn)) )
-return 0;
-
-return perms;
-}
-
 static int __hwdom_init cf_check map_subtract(unsigned long s, unsigned long e,
   void *data)
 {
@@ -444,8 +374,8 @@ static int __hwdom_init cf_check identity_map(unsigned long 
s, unsigned long e,
 
 void __hwdom_init arch_iommu_hwdom_init(struct domain *d)
 {
-unsigned long i, top, max_pfn, start, count;
-unsigned int flush_flags = 0, start_perms = 0;
+const unsigned long max_pfn = PFN_DOWN(GB(4)) - 1;
+unsigned int i;
 struct rangeset *map;
 struct map_data map_data = { .d = d };
 i

[PATCH v2 4/6] x86/iommu: remove regions not to be mapped

2023-12-04 Thread Roger Pau Monne
Introduce the code to remove regions not to be mapped from the rangeset
that will be used to setup the IOMMU page tables for the hardware domain.

This change also introduces two new functions: remove_xen_ranges() and
vpci_subtract_mmcfg() that copy the logic in xen_in_range() and
vpci_is_mmcfg_address() respectively and remove the ranges that would otherwise
be intercepted by the original functions.

Note that the rangeset is still not populated.

Signed-off-by: Roger Pau Monné 
---
Changes since v1:
 - Split from bigger patch.
---
 xen/arch/x86/hvm/io.c   | 16 
 xen/arch/x86/include/asm/hvm/io.h   |  3 ++
 xen/arch/x86/include/asm/setup.h|  1 +
 xen/arch/x86/setup.c| 48 ++
 xen/drivers/passthrough/x86/iommu.c | 64 +
 5 files changed, 132 insertions(+)

diff --git a/xen/arch/x86/hvm/io.c b/xen/arch/x86/hvm/io.c
index d75af83ad01f..a42854c52b65 100644
--- a/xen/arch/x86/hvm/io.c
+++ b/xen/arch/x86/hvm/io.c
@@ -369,6 +369,22 @@ bool vpci_is_mmcfg_address(const struct domain *d, paddr_t 
addr)
 return vpci_mmcfg_find(d, addr);
 }
 
+int __hwdom_init vpci_subtract_mmcfg(const struct domain *d, struct rangeset 
*r)
+{
+const struct hvm_mmcfg *mmcfg;
+
+list_for_each_entry ( mmcfg, &d->arch.hvm.mmcfg_regions, next )
+{
+int rc = rangeset_remove_range(r, PFN_DOWN(mmcfg->addr),
+   PFN_DOWN(mmcfg->addr + mmcfg->size - 
1));
+
+if ( rc )
+return rc;
+}
+
+return 0;
+}
+
 static unsigned int vpci_mmcfg_decode_addr(const struct hvm_mmcfg *mmcfg,
paddr_t addr, pci_sbdf_t *sbdf)
 {
diff --git a/xen/arch/x86/include/asm/hvm/io.h 
b/xen/arch/x86/include/asm/hvm/io.h
index a97731657801..e1e5e6fe7491 100644
--- a/xen/arch/x86/include/asm/hvm/io.h
+++ b/xen/arch/x86/include/asm/hvm/io.h
@@ -156,6 +156,9 @@ void destroy_vpci_mmcfg(struct domain *d);
 /* Check if an address is between a MMCFG region for a domain. */
 bool vpci_is_mmcfg_address(const struct domain *d, paddr_t addr);
 
+/* Remove MMCFG regions from a given rangeset. */
+int vpci_subtract_mmcfg(const struct domain *d, struct rangeset *r);
+
 #endif /* __ASM_X86_HVM_IO_H__ */
 
 
diff --git a/xen/arch/x86/include/asm/setup.h b/xen/arch/x86/include/asm/setup.h
index 9a460e4db8f4..cd07d98101d8 100644
--- a/xen/arch/x86/include/asm/setup.h
+++ b/xen/arch/x86/include/asm/setup.h
@@ -37,6 +37,7 @@ void discard_initial_images(void);
 void *bootstrap_map(const module_t *mod);
 
 int xen_in_range(unsigned long mfn);
+int remove_xen_ranges(struct rangeset *r);
 
 extern uint8_t kbd_shift_flags;
 
diff --git a/xen/arch/x86/setup.c b/xen/arch/x86/setup.c
index 3cba2be0af6c..71fa0b46f181 100644
--- a/xen/arch/x86/setup.c
+++ b/xen/arch/x86/setup.c
@@ -2136,6 +2136,54 @@ int __hwdom_init xen_in_range(unsigned long mfn)
 return 0;
 }
 
+int __hwdom_init remove_xen_ranges(struct rangeset *r)
+{
+paddr_t start, end;
+int rc;
+
+/* S3 resume code (and other real mode trampoline code) */
+rc = rangeset_remove_range(r, PFN_DOWN(bootsym_phys(trampoline_start)),
+   PFN_DOWN(bootsym_phys(trampoline_end)));
+if ( rc )
+return rc;
+
+/*
+ * This needs to remain in sync with the uses of the same symbols in
+ * - __start_xen()
+ * - is_xen_fixed_mfn()
+ * - tboot_shutdown()
+ */
+/* hypervisor .text + .rodata */
+rc = rangeset_remove_range(r, PFN_DOWN(__pa(&_stext)),
+   PFN_DOWN(__pa(&__2M_rodata_end)));
+if ( rc )
+return rc;
+
+/* hypervisor .data + .bss */
+if ( efi_boot_mem_unused(&start, &end) )
+{
+ASSERT(__pa(start) >= __pa(&__2M_rwdata_start));
+rc = rangeset_remove_range(r, PFN_DOWN(__pa(&__2M_rwdata_start)),
+   PFN_DOWN(__pa(start)));
+if ( rc )
+return rc;
+ASSERT(__pa(end) <= __pa(&__2M_rwdata_end));
+rc = rangeset_remove_range(r, PFN_DOWN(__pa(end)),
+   PFN_DOWN(__pa(&__2M_rwdata_end)));
+if ( rc )
+return rc;
+}
+else
+{
+rc = rangeset_remove_range(r, PFN_DOWN(__pa(&__2M_rwdata_start)),
+   PFN_DOWN(__pa(&__2M_rwdata_end)));
+if ( rc )
+return rc;
+}
+
+return 0;
+}
+
 static int __hwdom_init cf_check io_bitmap_cb(
 unsigned long s, unsigned long e, void *ctx)
 {
diff --git a/xen/drivers/passthrough/x86/iommu.c 
b/xen/drivers/passthrough/x86/iommu.c
index 531a428f6496..7e97805fccec 100644
--- a/xen/drivers/passthrough/x86/iommu.c
+++ b/xen/drivers/passthrough/x86/iommu.c
@@ -370,6 +370,14 @@ static unsigned int __hwdom_init hwdom_iommu_map(const 
struct domain *d,
 return perms;
 }
 
+static int __hwdom_init cf_check map_subtract(unsigned long s, unsigned long e,
+ 

[PATCH v2 1/6] iommu/vt-d: do not assume page table levels for quarantine domain

2023-12-04 Thread Roger Pau Monne
Like XSA-445, do not assume IOMMU page table levels on VT-d are always set
based on DEFAULT_DOMAIN_ADDRESS_WIDTH and instead fetch the value set by
intel_iommu_hwdom_init() from the domain iommu structure.  This prevents
changes to intel_iommu_hwdom_init() possibly getting the levels out of sync
with what intel_iommu_quarantine_init() expects.

No functional change, since on Intel domains are hardcoded to use
DEFAULT_DOMAIN_ADDRESS_WIDTH.

Signed-off-by: Roger Pau Monné 
---
Changes since v1:
 - New in this version.
---
 xen/drivers/passthrough/vtd/iommu.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/xen/drivers/passthrough/vtd/iommu.c 
b/xen/drivers/passthrough/vtd/iommu.c
index e13b7d99db40..bc6181c9f911 100644
--- a/xen/drivers/passthrough/vtd/iommu.c
+++ b/xen/drivers/passthrough/vtd/iommu.c
@@ -3162,7 +3162,7 @@ static int cf_check intel_iommu_quarantine_init(struct 
pci_dev *pdev,
 {
 struct domain_iommu *hd = dom_iommu(dom_io);
 struct page_info *pg;
-unsigned int agaw = width_to_agaw(DEFAULT_DOMAIN_ADDRESS_WIDTH);
+unsigned int agaw = hd->arch.vtd.agaw;
 unsigned int level = agaw_to_level(agaw);
 const struct acpi_drhd_unit *drhd;
 const struct acpi_rmrr_unit *rmrr;
-- 
2.43.0




[PATCH v2 0/6] x86/iommu: improve setup time of hwdom IOMMU

2023-12-04 Thread Roger Pau Monne
Hello,

The aim of the series is to reduce boot time setup of IOMMU page tables
for dom0.

The first patch is completely unrelated leftover work from XSA-445, just
included in the series because it's IOMMU code.

Second patch is a pre-req, as further patches can end up attempting to
create maps above the max RAM address, and hence without properly
setting the IOMMU page tables levels those attempts to map would fail.

Last 4 patches rework the hardware domain IOMMU setup to use a rangeset
instead of iterating over all addresses up to the max RAM page.  See
patch 5/6 for performance figures.

Thanks, Roger.

Roger Pau Monne (6):
  iommu/vt-d: do not assume page table levels for quarantine domain
  amd-vi: set IOMMU page table levels based on guest reported paddr
width
  x86/iommu: introduce a rangeset to perform hwdom IOMMU setup
  x86/iommu: remove regions not to be mapped
  x86/iommu: switch hwdom IOMMU to use a rangeset
  x86/iommu: cleanup unused functions

 xen/arch/x86/hvm/io.c   |  16 ++
 xen/arch/x86/include/asm/hvm/io.h   |   4 +-
 xen/arch/x86/include/asm/setup.h|   2 +-
 xen/arch/x86/setup.c|  81 +++---
 xen/arch/x86/tboot.c|   2 +-
 xen/drivers/passthrough/amd/pci_amd_iommu.c |  20 +-
 xen/drivers/passthrough/vtd/iommu.c |   2 +-
 xen/drivers/passthrough/x86/iommu.c | 279 +---
 8 files changed, 248 insertions(+), 158 deletions(-)

-- 
2.43.0




[PATCH v2 3/6] x86/iommu: introduce a rangeset to perform hwdom IOMMU setup

2023-12-04 Thread Roger Pau Monne
This change just introduces the boilerplate code in order to use a rangeset
when setting up the hardware domain IOMMU mappings.  The rangeset is never
populated in this patch, so it's a non-functional change as far as the mappings
the domain gets established.

Note there's a change for HVM domains (ie: PVH dom0) that will get switched to
create the p2m mappings using map_mmio_regions() instead of
p2m_add_identity_entry(), so that ranges can be mapped with a single function
call if possible.  Note that the interface of map_mmio_regions() doesn't allow
creating read-only mappings, but so far there are no such mappings created for
PVH dom0 in arch_iommu_hwdom_init().

Signed-off-by: Roger Pau Monné 
---
Changes since v1:
 - Split from bigger patch.
---
 xen/drivers/passthrough/x86/iommu.c | 89 +
 1 file changed, 89 insertions(+)

diff --git a/xen/drivers/passthrough/x86/iommu.c 
b/xen/drivers/passthrough/x86/iommu.c
index 857dccb6a465..531a428f6496 100644
--- a/xen/drivers/passthrough/x86/iommu.c
+++ b/xen/drivers/passthrough/x86/iommu.c
@@ -370,10 +370,77 @@ static unsigned int __hwdom_init hwdom_iommu_map(const 
struct domain *d,
 return perms;
 }
 
+struct map_data {
+struct domain *d;
+unsigned int flush_flags;
+bool ro;
+};
+
+static int __hwdom_init cf_check identity_map(unsigned long s, unsigned long e,
+  void *data)
+{
+struct map_data *info = data;
+struct domain *d = info->d;
+long rc;
+
+if ( iommu_verbose )
+printk(XENLOG_INFO " [%010lx, %010lx] R%c\n",
+   s, e, info->ro ? 'O' : 'W');
+
+if ( paging_mode_translate(d) )
+{
+if ( info->ro )
+{
+ASSERT_UNREACHABLE();
+return 0;
+}
+while ( (rc = map_mmio_regions(d, _gfn(s), e - s + 1, _mfn(s))) > 0 )
+{
+s += rc;
+process_pending_softirqs();
+}
+}
+else
+{
+const unsigned int perms = IOMMUF_readable | IOMMUF_preempt |
+   (info->ro ? 0 : IOMMUF_writable);
+
+if ( info->ro && !iomem_access_permitted(d, s, e) )
+{
+/*
+ * Should be more fine grained in order to not map the forbidden
+ * frame instead of rejecting the region as a whole, but it's only
+ * for read-only MMIO regions, which are very limited.
+ */
+printk(XENLOG_DEBUG
+   "IOMMU read-only mapping of region [%lx, %lx] forbidden\n",
+   s, e);
+return 0;
+}
+while ( (rc = iommu_map(d, _dfn(s), _mfn(s), e - s + 1,
+perms, &info->flush_flags)) > 0 )
+{
+s += rc;
+process_pending_softirqs();
+}
+}
+ASSERT(rc <= 0);
+if ( rc )
+printk(XENLOG_WARNING
+   "IOMMU identity mapping of [%lx, %lx] failed: %ld\n",
+   s, e, rc);
+
+/* Ignore errors and attempt to map the remaining regions. */
+return 0;
+}
+
 void __hwdom_init arch_iommu_hwdom_init(struct domain *d)
 {
 unsigned long i, top, max_pfn, start, count;
 unsigned int flush_flags = 0, start_perms = 0;
+struct rangeset *map;
+struct map_data map_data = { .d = d };
+int rc;
 
 BUG_ON(!is_hardware_domain(d));
 
@@ -397,6 +464,10 @@ void __hwdom_init arch_iommu_hwdom_init(struct domain *d)
 if ( iommu_hwdom_passthrough )
 return;
 
+map = rangeset_new(NULL, NULL, 0);
+if ( !map )
+panic("IOMMU init: unable to allocate rangeset\n");
+
 max_pfn = (GB(4) >> PAGE_SHIFT) - 1;
 top = max(max_pdx, pfn_to_pdx(max_pfn) + 1);
 
@@ -451,6 +522,24 @@ void __hwdom_init arch_iommu_hwdom_init(struct domain *d)
 goto commit;
 }
 
+if ( iommu_verbose )
+printk(XENLOG_INFO "d%u: identity mappings for IOMMU:\n",
+   d->domain_id);
+
+rc = rangeset_report_ranges(map, 0, ~0UL, identity_map, &map_data);
+if ( rc )
+panic("IOMMU unable to create mappings: %d\n", rc);
+if ( is_pv_domain(d) )
+{
+map_data.ro = true;
+rc = rangeset_report_ranges(mmio_ro_ranges, 0, ~0UL, identity_map,
+&map_data);
+if ( rc )
+panic("IOMMU unable to create read-only mappings: %d\n", rc);
+}
+
+rangeset_destroy(map);
+
 /* Use if to avoid compiler warning */
 if ( iommu_iotlb_flush_all(d, flush_flags) )
 return;
-- 
2.43.0




[PATCH v2 6/6] x86/iommu: cleanup unused functions

2023-12-04 Thread Roger Pau Monne
Remove xen_in_range() and vpci_is_mmcfg_address() now that hey are unused.

Adjust comments to point to the new functions that replace the existing ones.

No functional change.

Signed-off-by: Roger Pau Monné 
---
Can be squashed with the previous patch if desired, split as a separate patch
for clarity.
---
 xen/arch/x86/include/asm/hvm/io.h |  3 --
 xen/arch/x86/include/asm/setup.h  |  1 -
 xen/arch/x86/setup.c  | 53 ++-
 xen/arch/x86/tboot.c  |  2 +-
 4 files changed, 3 insertions(+), 56 deletions(-)

diff --git a/xen/arch/x86/include/asm/hvm/io.h 
b/xen/arch/x86/include/asm/hvm/io.h
index e1e5e6fe7491..24d1b6134f02 100644
--- a/xen/arch/x86/include/asm/hvm/io.h
+++ b/xen/arch/x86/include/asm/hvm/io.h
@@ -153,9 +153,6 @@ int register_vpci_mmcfg_handler(struct domain *d, paddr_t 
addr,
 /* Destroy tracked MMCFG areas. */
 void destroy_vpci_mmcfg(struct domain *d);
 
-/* Check if an address is between a MMCFG region for a domain. */
-bool vpci_is_mmcfg_address(const struct domain *d, paddr_t addr);
-
 /* Remove MMCFG regions from a given rangeset. */
 int vpci_subtract_mmcfg(const struct domain *d, struct rangeset *r);
 
diff --git a/xen/arch/x86/include/asm/setup.h b/xen/arch/x86/include/asm/setup.h
index cd07d98101d8..1ced1299c77b 100644
--- a/xen/arch/x86/include/asm/setup.h
+++ b/xen/arch/x86/include/asm/setup.h
@@ -36,7 +36,6 @@ unsigned long initial_images_nrpages(nodeid_t node);
 void discard_initial_images(void);
 void *bootstrap_map(const module_t *mod);
 
-int xen_in_range(unsigned long mfn);
 int remove_xen_ranges(struct rangeset *r);
 
 extern uint8_t kbd_shift_flags;
diff --git a/xen/arch/x86/setup.c b/xen/arch/x86/setup.c
index 71fa0b46f181..7d2cb61a2a4a 100644
--- a/xen/arch/x86/setup.c
+++ b/xen/arch/x86/setup.c
@@ -1343,7 +1343,7 @@ void asmlinkage __init noreturn __start_xen(unsigned long 
mbi_p)
 relocated = true;
 
 /*
- * This needs to remain in sync with xen_in_range() and the
+ * This needs to remain in sync with remove_xen_ranges() and the
  * respective reserve_e820_ram() invocation below. No need to
  * query efi_boot_mem_unused() here, though.
  */
@@ -1495,7 +1495,7 @@ void asmlinkage __init noreturn __start_xen(unsigned long 
mbi_p)
 if ( using_2M_mapping() )
 efi_boot_mem_unused(NULL, NULL);
 
-/* This needs to remain in sync with xen_in_range(). */
+/* This needs to remain in sync with remove_xen_ranges(). */
 if ( efi_boot_mem_unused(&eb_start, &eb_end) )
 {
 reserve_e820_ram(&boot_e820, __pa(_stext), __pa(eb_start));
@@ -2087,55 +2087,6 @@ void arch_get_xen_caps(xen_capabilities_info_t *info)
 }
 }
 
-int __hwdom_init xen_in_range(unsigned long mfn)
-{
-paddr_t start, end;
-int i;
-
-enum { region_s3, region_ro, region_rw, region_bss, nr_regions };
-static struct {
-paddr_t s, e;
-} xen_regions[nr_regions] __hwdom_initdata;
-
-/* initialize first time */
-if ( !xen_regions[0].s )
-{
-/* S3 resume code (and other real mode trampoline code) */
-xen_regions[region_s3].s = bootsym_phys(trampoline_start);
-xen_regions[region_s3].e = bootsym_phys(trampoline_end);
-
-/*
- * This needs to remain in sync with the uses of the same symbols in
- * - __start_xen() (above)
- * - is_xen_fixed_mfn()
- * - tboot_shutdown()
- */
-
-/* hypervisor .text + .rodata */
-xen_regions[region_ro].s = __pa(&_stext);
-xen_regions[region_ro].e = __pa(&__2M_rodata_end);
-/* hypervisor .data + .bss */
-xen_regions[region_rw].s = __pa(&__2M_rwdata_start);
-xen_regions[region_rw].e = __pa(&__2M_rwdata_end);
-if ( efi_boot_mem_unused(&start, &end) )
-{
-ASSERT(__pa(start) >= xen_regions[region_rw].s);
-ASSERT(__pa(end) <= xen_regions[region_rw].e);
-xen_regions[region_rw].e = __pa(start);
-xen_regions[region_bss].s = __pa(end);
-xen_regions[region_bss].e = __pa(&__2M_rwdata_end);
-}
-}
-
-start = (paddr_t)mfn << PAGE_SHIFT;
-end = start + PAGE_SIZE;
-for ( i = 0; i < nr_regions; i++ )
-if ( (start < xen_regions[i].e) && (end > xen_regions[i].s) )
-return 1;
-
-return 0;
-}
-
 int __hwdom_init remove_xen_ranges(struct rangeset *r)
 {
 paddr_t start, end;
diff --git a/xen/arch/x86/tboot.c b/xen/arch/x86/tboot.c
index 86c4c22cacb8..4c254b4e34b4 100644
--- a/xen/arch/x86/tboot.c
+++ b/xen/arch/x86/tboot.c
@@ -321,7 +321,7 @@ void tboot_shutdown(uint32_t shutdown_type)
 
 /*
  * Xen regions for tboot to MAC. This needs to remain in sync with
- * xen_in_range().
+ * remove_xen_ranges().
  */
 g_tboot_shared->num_mac_regions = 3;
 /* S3 resume code (and other real mode trampoline code) */
-- 
2.43.0




Re: [PATCH v5 5/7] xen: ifdef inclusion of in

2023-12-04 Thread Jan Beulich
On 04.12.2023 10:39, Oleksii wrote:
> On Mon, 2023-12-04 at 09:41 +0100, Jan Beulich wrote:
>> On 01.12.2023 21:48, Oleksii Kurochko wrote:
>>> Ifdef-ing inclusion of  allows to avoid
>>> generation of empty  for cases when
>>> CONFIG_GRANT_TABLE is not enabled.
>>>
>>> The following changes were done for Arm:
>>>  should be included directly because it contains
>>> gnttab_dom0_frames() macros which is unique for Arm and is used in
>>> arch/arm/domain_build.c.
>>>  is #ifdef-ed with CONFIG_GRANT_TABLE in
>>>  so in case of !CONFIG_GRANT_TABLE
>>> gnttab_dom0_frames
>>> won't be available for use in arch/arm/domain_build.c.
>>>
>>> Suggested-by: Jan Beulich 
>>
>> Not really, no: In particular ...
>>
>>> --- a/xen/common/Kconfig
>>> +++ b/xen/common/Kconfig
>>> @@ -15,6 +15,7 @@ config CORE_PARKING
>>>  config GRANT_TABLE
>>>     bool "Grant table support" if EXPERT
>>>     default y
>>> +   depends on ARM || X86
>>
>> ... this I explicitly said I consider wrong to add.
> Then I misunderstood you.
> 
> What about to do the same as with MEM_ACCESS config and introduce
> HAS_GRANT_TABLE?

That's an option, provided (and I put that under question before) there
realistically can be ports which don't mean to support grant tables.
You mentioned that things are fine for the dom0less setup you're testing,
but I don't think a fully-functional Xen port makes sense to only support
dom0less. But of course I'm willing to hear arguments to the contrary.

> Or would it be better just update "depends on" to !RISCV && !PPC?

Definitely not.

Jan



Re: [PATCH 1/3] xen/ppc: Enable Boot Allocator

2023-12-04 Thread Oleksii
Hi Julien and Shawn,

On Fri, 2023-12-01 at 20:56 +, Julien Grall wrote:
> (+ Arm and RISC-V folks)
> 
> Hi Shawn,
> 
> On 01/12/2023 20:59, Shawn Anastasio wrote:
> > Adapt arm's earlyfdt parsing code to ppc64 and enable Xen's early
> > boot
> > allocator. Routines for parsing arm-specific devicetree nodes (e.g.
> > multiboot) were excluded, reducing the overall footprint of code
> > that
> > was copied.
> 
> I expect RISC-V to want similar code. I am not really thrilled in the
> idea of having 3 similar copy of the parsing. So can we extract the 
> common bits (or harmonize it) so it can be shared?
> 
> Maybe Oleksii has already a version doing that.
On RISC-V side I introduced the common bootfdt but probably it should
be aligned with the latest version ( my plan was to do that before
sending a patch to upstream ) of ARM's bootfdt.c.

https://gitlab.com/xen-project/people/olkur/xen/-/commit/ad2c8766a7d0886df84f6c60823275816c2115f5

I can re-work it this week and send it to the ML but I'll be happy if
Shawn can do it or his version as common so I'll be able to reuse it
soon.

~ Oleksii



Re: Trying add smt=off disabled cores to cpupool crash xen

2023-12-04 Thread Juergen Gross

On 04.12.23 10:15, Jan Beulich wrote:

On 01.12.2023 21:12, Andrew Cooper wrote:

On 01/12/2023 7:59 pm, René Winther Højgaard wrote:

If I set smt=off and try to configure cpupools with credit(1) as if
all cores are available, I get the following crash.

The crash happens when I try to use xl cpupool-add-cpu on the disabled
HT sibling cores.

Hyper-threading is enabled in the firmware, and only disabled with
smt=off.


CC'ing some maintainers.

I expect this will also explode when a CPU is runtime offlined with
`xen-hptool cpu-offline` and then added to a cpupool.

Interestingly, the crash is mov (%rdx,%rax,1),%r13, and I think that's
the percpu posion value in %rdx.

I expect cpupools want to reject parked/offline CPUs.


While the only explicit check there is

 if ( cpu >= nr_cpu_ids )
 goto addcpu_out;

I would have expected this

 if ( !cpumask_subset(cpus, &cpupool_free_cpus) ||
  cpumask_intersects(cpus, &cpupool_locked_cpus) )
 goto addcpu_out;

to deal with the situation, as parked/offline CPUs shouldn't be "free".
Jürgen?


The problem is the call of sched_get_opt_cpumask() to need the percpu area
of the cpu in question.


Juergen


OpenPGP_0xB0DE9DD628BF132F.asc
Description: OpenPGP public key


OpenPGP_signature.asc
Description: OpenPGP digital signature


xen | Failed pipeline for staging | 525c7c09

2023-12-04 Thread GitLab


Pipeline #1093622286 has failed!

Project: xen ( https://gitlab.com/xen-project/xen )
Branch: staging ( https://gitlab.com/xen-project/xen/-/commits/staging )

Commit: 525c7c09 ( 
https://gitlab.com/xen-project/xen/-/commit/525c7c094b258e8a46b494488eef96f5670eb352
 )
Commit Message: xen/arm: Move static event channel feature to a...
Commit Author: Michal Orzel ( https://gitlab.com/orzelmichal )
Committed by: Julien Grall


Pipeline #1093622286 ( 
https://gitlab.com/xen-project/xen/-/pipelines/1093622286 ) triggered by Andrew 
Cooper ( https://gitlab.com/andyhhp )
had 3 failed jobs.

Job #5669490041 ( https://gitlab.com/xen-project/xen/-/jobs/5669490041/raw )

Stage: test
Name: zen3p-pci-hvm-x86-64-gcc-debug
Job #5669490022 ( https://gitlab.com/xen-project/xen/-/jobs/5669490022/raw )

Stage: test
Name: zen3p-smoke-x86-64-dom0pvh-gcc-debug
Job #5669490010 ( https://gitlab.com/xen-project/xen/-/jobs/5669490010/raw )

Stage: test
Name: zen3p-smoke-x86-64-gcc-debug

-- 
You're receiving this email because of your account on gitlab.com.





Re: Trying add smt=off disabled cores to cpupool crash xen

2023-12-04 Thread Jan Beulich
On 04.12.2023 11:02, Juergen Gross wrote:
> On 04.12.23 10:15, Jan Beulich wrote:
>> On 01.12.2023 21:12, Andrew Cooper wrote:
>>> On 01/12/2023 7:59 pm, René Winther Højgaard wrote:
 If I set smt=off and try to configure cpupools with credit(1) as if
 all cores are available, I get the following crash.

 The crash happens when I try to use xl cpupool-add-cpu on the disabled
 HT sibling cores.

 Hyper-threading is enabled in the firmware, and only disabled with
 smt=off.
>>>
>>> CC'ing some maintainers.
>>>
>>> I expect this will also explode when a CPU is runtime offlined with
>>> `xen-hptool cpu-offline` and then added to a cpupool.
>>>
>>> Interestingly, the crash is mov (%rdx,%rax,1),%r13, and I think that's
>>> the percpu posion value in %rdx.
>>>
>>> I expect cpupools want to reject parked/offline CPUs.
>>
>> While the only explicit check there is
>>
>>  if ( cpu >= nr_cpu_ids )
>>  goto addcpu_out;
>>
>> I would have expected this
>>
>>  if ( !cpumask_subset(cpus, &cpupool_free_cpus) ||
>>   cpumask_intersects(cpus, &cpupool_locked_cpus) )
>>  goto addcpu_out;
>>
>> to deal with the situation, as parked/offline CPUs shouldn't be "free".
>> Jürgen?
> 
> The problem is the call of sched_get_opt_cpumask() to need the percpu area
> of the cpu in question.

That was my first thought, too, but then I saw cpupool_assign_cpu_locked() on
the call trace, which is called only afterwards. Plus sched_get_opt_cpumask()
needs the per-CPU area only when granularity was switched from its default of
SCHED_GRAN_cpu afaics.

Jan



Re: Moving domain from credit2 to credit cpupool crash xen

2023-12-04 Thread Juergen Gross

On 04.12.23 10:31, Jan Beulich wrote:

On 04.12.2023 10:23, Jan Beulich wrote:

On 01.12.2023 21:13, René Winther Højgaard wrote:

When I move a domain from pool0 with credit2 to any pool with credit(1) I get 
the following crash.


Software: Xen-4.17.3 / Qubes OS 4.2.0-RC5
Firmware: Dasharo 0.9.0 - Z790P
Hardware: 13900K
(XEN) Xen BUG at common/sched/credit.c:1051(XEN) [ Xen-4.17.3-pre  x86_64  
debug=y  Not tainted ]
(XEN) CPU:    2
(XEN) RIP:    e008:[] credit.c#csched_free_udata+0x12/0x14
(XEN) RFLAGS: 00010202   CONTEXT: hypervisor (d0v2)
(XEN) rax: 82d040237ceb   rbx: 0014   rcx: 0013
(XEN) rdx: 831087d7   rsi: 830ad80e8da0   rdi: 830ad80e8da0
(XEN) rbp:    rsp: 831087d7fc90   r8:  830e2d6a49b0
(XEN) r9:  831087d7fbe0   r10: 83107c481068   r11: 002cfd1c274a
(XEN) r12: 830ad80e8c80   r13: 83107c45bee0   r14: 
(XEN) r15: 82d0405a9288   cr0: 80050033   cr4: 00b526e0
(XEN) cr3: 0009284d8000   cr2: 7fb535181240
(XEN) fsb: 7fb534c5f380   gsb: 8881b9d0   gss: 
(XEN) ds:    es:    fs:    gs:    ss: e010   cs: e008
(XEN) Xen code around  (credit.c#csched_free_udata+0x12/0x14):
(XEN)  75 06 e8 19 74 ff ff c3 <0f> 0b f3 0f 1e fa 53 48 8b 5f 18 48 85 db 74 2b
(XEN) Xen stack trace from rsp=831087d7fc90:
(XEN)    82d040247503 00132030 830ad80e8bf0 82d0405a9288
(XEN)    83107f59aa80 830ad80e8c80 83107c45bee0 830ad80e8bf0
(XEN)    831000af1010 83107c45bee0 830ad80ed000 83107c45bee0
(XEN)     82d04045d5d8 82d0405ae680 82d040235303
(XEN)    831087d7fe20 fffe 82d040236ec3 830ad80ed000
(XEN)     7fb535230010 831087d7 
(XEN)    82d04045d5d8 82d040234763 c102 
(XEN)     c102 000d 8101ede6
(XEN)    e033 00011082 c90046ebba90 e02b
(XEN)    5a33a1a65352beef feadf9effdf1beef 122ae2fa736bbeef 46023e9af174beef
(XEN)    82d040227cc6 831087d7fe48  00011082
(XEN)     831087d7  8101ede4
(XEN)    82d0403495d0 00150012 00010006 000d
(XEN)    7ffdf93fb3fc 00431042 0043d990 0043d9b0
(XEN)    7fb534eb8434 7ffdf93fb400 0013 02361838
(XEN)    04457fe81f7cf300 02360870 ff80 
(XEN)    7ffdf93fc652 0043d980 831087d7fef8 0023
(XEN)    83107f544000   
(XEN)    82d0402dd07f 83107f544000  
(XEN)    82d0402012b7  88811abbc100 7ffdf93fb2c0
(XEN) Xen call trace:
(XEN)    [] R credit.c#csched_free_udata+0x12/0x14
(XEN)    [] S sched_move_domain+0x5b0/0x5cc


Hmm, looks like sched_move_domain()'s calling of sched_free_udata() uses the
new pool's scheduler, not that of the original pool. I'm puzzled though that
there's no sign at all in the function of it caring about what the original
pool was. IOW I'm not sure that the simple and obvious change to latch the
original pool into a local and then use it on the "out_free" path is going
to be enough. Jürgen (sorry, again you)?


Hmm, should have added "in the error case". Seeing there is old_ops, perhaps
simply

--- a/xen/common/sched/core.c
+++ b/xen/common/sched/core.c
@@ -810,7 +810,7 @@
  for ( unit = old_units; unit; )
  {
  if ( unit->priv )
-sched_free_udata(c->sched, unit->priv);
+sched_free_udata(ret ? c->sched : old_ops, unit->priv);
  old_unit = unit;
  unit = unit->next_in_list;
  xfree(old_unit);



Yes, this looks fine.

In case you want to send a proper patch, you can add my

Reviewed-by: Juergen Gross 


Juergen


OpenPGP_0xB0DE9DD628BF132F.asc
Description: OpenPGP public key


OpenPGP_signature.asc
Description: OpenPGP digital signature


Re: [RFC KERNEL PATCH v2 2/3] xen/pvh: Unmask irq for passthrough device in PVH dom0

2023-12-04 Thread Roger Pau Monné
On Fri, Dec 01, 2023 at 07:37:55PM -0800, Stefano Stabellini wrote:
> On Fri, 1 Dec 2023, Roger Pau Monné wrote:
> > On Thu, Nov 30, 2023 at 07:15:17PM -0800, Stefano Stabellini wrote:
> > > On Thu, 30 Nov 2023, Roger Pau Monné wrote:
> > > > On Wed, Nov 29, 2023 at 07:53:59PM -0800, Stefano Stabellini wrote:
> > > > > On Fri, 24 Nov 2023, Jiqian Chen wrote:
> > > > > > This patch is to solve two problems we encountered when we try to
> > > > > > passthrough a device to hvm domU base on Xen PVH dom0.
> > > > > > 
> > > > > > First, hvm guest will alloc a pirq and irq for a passthrough device
> > > > > > by using gsi, before that, the gsi must first has a mapping in dom0,
> > > > > > see Xen code pci_add_dm_done->xc_domain_irq_permission, it will call
> > > > > > into Xen and check whether dom0 has the mapping. See
> > > > > > XEN_DOMCTL_irq_permission->pirq_access_permitted, "current" is PVH
> > > > > > dom0 and it return irq is 0, and then return -EPERM.
> > > > > > This is because the passthrough device doesn't do PHYSDEVOP_map_pirq
> > > > > > when thay are enabled.
> > > > > > 
> > > > > > Second, in PVH dom0, the gsi of a passthrough device doesn't get
> > > > > > registered, but gsi must be configured for it to be able to be
> > > > > > mapped into a domU.
> > > > > > 
> > > > > > After searching codes, we can find map_pirq and register_gsi will be
> > > > > > done in function vioapic_write_redirent->vioapic_hwdom_map_gsi when
> > > > > > the gsi(aka ioapic's pin) is unmasked in PVH dom0. So the problems
> > > > > > can be conclude to that the gsi of a passthrough device doesn't be
> > > > > > unmasked.
> > > > > > 
> > > > > > To solve the unmaske problem, this patch call the unmask_irq when we
> > > > > > assign a device to be passthrough. So that the gsi can get 
> > > > > > registered
> > > > > > and mapped in PVH dom0.
> > > > > 
> > > > > 
> > > > > Roger, this seems to be more of a Xen issue than a Linux issue. Why do
> > > > > we need the unmask check in Xen? Couldn't we just do:
> > > > > 
> > > > > 
> > > > > diff --git a/xen/arch/x86/hvm/vioapic.c b/xen/arch/x86/hvm/vioapic.c
> > > > > index 4e40d3609a..df262a4a18 100644
> > > > > --- a/xen/arch/x86/hvm/vioapic.c
> > > > > +++ b/xen/arch/x86/hvm/vioapic.c
> > > > > @@ -287,7 +287,7 @@ static void vioapic_write_redirent(
> > > > >  hvm_dpci_eoi(d, gsi);
> > > > >  }
> > > > >  
> > > > > -if ( is_hardware_domain(d) && unmasked )
> > > > > +if ( is_hardware_domain(d) )
> > > > >  {
> > > > >  /*
> > > > >   * NB: don't call vioapic_hwdom_map_gsi while holding 
> > > > > hvm.irq_lock
> > > > 
> > > > There are some issues with this approach.
> > > > 
> > > > mp_register_gsi() will only setup the trigger and polarity of the
> > > > IO-APIC pin once, so we do so once the guest unmask the pin in order
> > > > to assert that the configuration is the intended one.  A guest is
> > > > allowed to write all kind of nonsense stuff to the IO-APIC RTE, but
> > > > that doesn't take effect unless the pin is unmasked.
> > > > 
> > > > Overall the question would be whether we have any guarantees that
> > > > the hardware domain has properly configured the pin, even if it's not
> > > > using it itself (as it hasn't been unmasked).
> > > > 
> > > > IIRC PCI legacy interrupts are level triggered and low polarity, so we
> > > > could configure any pins that are not setup at bind time?
> > > 
> > > That could work.
> > > 
> > > Another idea is to move only the call to allocate_and_map_gsi_pirq at
> > > bind time? That might be enough to pass a pirq_access_permitted check.
> > 
> > Maybe, albeit that would change the behavior of XEN_DOMCTL_bind_pt_irq
> > just for PT_IRQ_TYPE_PCI and only when called from a PVH dom0 (as the
> > parameter would be a GSI instead of a previously mapped IRQ).  Such
> > difference just for PT_IRQ_TYPE_PCI is slightly weird - if we go that
> > route I would recommend that we instead introduce a new dmop that has
> > this syntax regardless of the domain type it's called from.
> 
> Looking at the code it is certainly a bit confusing. My point was that
> we don't need to wait until polarity and trigger are set appropriately
> to allow Dom0 to pass successfully a pirq_access_permitted() check. Xen
> should be able to figure out that Dom0 is permitted pirq access.

The logic is certainly not straightforward, and it could benefit from
some comments.

The irq permissions are a bit special, in that they get setup when the
IRQ is mapped.

The problem however is not so much with IRQ permissions, that we can
indeed sort out internally in Xen.  Such check in dom0 has the side
effect of preventing the IRQ from being assigned to a domU without the
hardware source being properly configured AFAICT.

> 
> So the idea was to move the call to allocate_and_map_gsi_pirq() earlier
> somewhere because allocate_and_map_gsi_pirq doesn't require trigger or
> polarity to be configured to work. But the suggestion of do

Re: [RFC PATCH] xen/arm: Add emulation of Debug Data Transfer Registers

2023-12-04 Thread Julien Grall

Hi Ayan,

On 01/12/2023 18:50, Ayan Kumar Halder wrote:

Currently if user enables HVC_DCC config option in Linux, it invokes
access to debug data transfer registers (ie DBGDTRTX_EL0 on arm64,
DBGDTRTXINT on arm32). As these registers are not emulated, Xen injects
an undefined exception to the guest. And Linux crashes.


I am missing some data points here to be able to say whether I would be 
ok with emulating the registers. So some questions:
  * As you wrote below, HVC_DCC will return -ENODEV after this 
emulation. So may I ask what's the use case to enable it? (e.g. is there 
a distro turning this on?)
 * Linux is writing to the registers unconditionally, but is the spec 
mandating the implementation of the registers? (I couldn't find either way)

 * When was this check introduced in Linux? Did it ever changed?



We wish to avoid this crash by adding a "partial" emulation. DBGDTR_EL0
is emulated as TXfull | RXfull.


Skimming through the Arm Arm, I see that TXfull and Rxfull indicates 
that both buffers are full but it doesn't explicitly say this means the 
feature is not available.


I understand this is what Linux checks, but if we want to partially 
emulate the registers in Xen, then I'd like us to make sure this is 
backed by the Arm Arm rather than based on Linux implementation (which 
can change at any point).



Refer ARM DDI 0487I.a ID081822, D17.3.8, DBGDTRTX_EL0
"If TXfull is set to 1, set DTRRX and DTRTX to UNKNOWN"
Also D17.3.7 DBGDTRRX_EL0,
" If RXfull is set to 1, return the last value written to DTRRX."

Thus, any OS is expected to read DBGDTR_EL0 and check for TXfull
before using DBGDTRTX_EL0. Linux does it via hvc_dcc_init() --->
hvc_dcc_check() , it returns -ENODEV. In this way, we are preventing
the guest to be aborted.


See above, what guarantees us that Linux will not change this behavior 
in the future?




We also emulate DBGDTRTX_EL0 as RAZ/WI.

We have added emulation for AArch32 variant of these registers as well.
Also, we have added handle_read_val_wi() to emulate DBGDSCREXT register
to return a specific value (ie TXfull | RXfull) and ignore any writes
to this register.

Signed-off-by: Michal Orzel 
Signed-off-by: Ayan Kumar Halder 


We usually expect the first Signed-off-by to also be the author. So 
should Michal be the author of this patch?



---
  xen/arch/arm/arm64/vsysreg.c | 21 ++
  xen/arch/arm/include/asm/arm64/hsr.h |  3 +++
  xen/arch/arm/include/asm/cpregs.h|  2 ++
  xen/arch/arm/include/asm/traps.h |  4 
  xen/arch/arm/traps.c | 18 +++
  xen/arch/arm/vcpreg.c| 33 +---
  6 files changed, 69 insertions(+), 12 deletions(-)

diff --git a/xen/arch/arm/arm64/vsysreg.c b/xen/arch/arm/arm64/vsysreg.c
index b5d54c569b..5082dfb02e 100644
--- a/xen/arch/arm/arm64/vsysreg.c
+++ b/xen/arch/arm/arm64/vsysreg.c
@@ -159,9 +159,6 @@ void do_sysreg(struct cpu_user_regs *regs,
   *
   * Unhandled:
   *MDCCINT_EL1
- *DBGDTR_EL0
- *DBGDTRRX_EL0
- *DBGDTRTX_EL0
   *OSDTRRX_EL1
   *OSDTRTX_EL1
   *OSECCR_EL1
@@ -172,11 +169,27 @@ void do_sysreg(struct cpu_user_regs *regs,
  case HSR_SYSREG_MDSCR_EL1:
  return handle_raz_wi(regs, regidx, hsr.sysreg.read, hsr, 1);
  case HSR_SYSREG_MDCCSR_EL0:
+{
+/*
+ * Bit 29: TX full, bit 30: RX full
+ * Given that we emulate DCC registers as RAZ/WI, doing the same for
+ * MDCCSR_EL0 would cause a guest to continue using the DCC despite no
+ * real effect. Setting the TX/RX status bits should result in a probe
+ * fail (based on Linux behavior).
+ */
+register_t guest_reg_value = (1U << 29) | (1U << 30);
+
  /*
   * Accessible at EL0 only if MDSCR_EL1.TDCC is set to 0. We emulate 
that
   * register as RAZ/WI above. So RO at both EL0 and EL1.
   */
-return handle_ro_raz(regs, regidx, hsr.sysreg.read, hsr, 0);
+return handle_ro_read_val(regs, regidx, hsr.sysreg.read, hsr, 0,
+  guest_reg_value);
+}
+case HSR_SYSREG_DBGDTR_EL0:
+/* DBGDTR[TR]X_EL0 share the same encoding */
+case HSR_SYSREG_DBGDTRTX_EL0:
+return handle_raz_wi(regs, regidx, hsr.sysreg.read, hsr, 0);
  HSR_SYSREG_DBG_CASES(DBGBVR):
  HSR_SYSREG_DBG_CASES(DBGBCR):
  HSR_SYSREG_DBG_CASES(DBGWVR):
diff --git a/xen/arch/arm/include/asm/arm64/hsr.h 
b/xen/arch/arm/include/asm/arm64/hsr.h
index e691d41c17..1495ccddea 100644
--- a/xen/arch/arm/include/asm/arm64/hsr.h
+++ b/xen/arch/arm/include/asm/arm64/hsr.h
@@ -47,6 +47,9 @@
  #define HSR_SYSREG_OSDLR_EL1  HSR_SYSREG(2,0,c1,c3,4)
  #define HSR_SYSREG_DBGPRCR_EL1HSR_SYSREG(2,0,c1,c4,4)
  #define HSR_SYSREG_MDCCSR_EL0 HSR_SYSREG(2,3,c0,c1,0)
+#define HSR_SYSREG_DBGDTR_EL0 HSR_SYSREG(2,3,c0,c4,0)
+#define HSR_SYSREG_DBGDTRTX_EL0   HSR_SYSREG(2,3,c0,c5,0)
+#define HS

Re: [PATCH v5 5/7] xen: ifdef inclusion of in

2023-12-04 Thread Oleksii
On Mon, 2023-12-04 at 10:46 +0100, Jan Beulich wrote:
> On 04.12.2023 10:39, Oleksii wrote:
> > On Mon, 2023-12-04 at 09:41 +0100, Jan Beulich wrote:
> > > On 01.12.2023 21:48, Oleksii Kurochko wrote:
> > > > Ifdef-ing inclusion of  allows to avoid
> > > > generation of empty  for cases when
> > > > CONFIG_GRANT_TABLE is not enabled.
> > > > 
> > > > The following changes were done for Arm:
> > > >  should be included directly because it
> > > > contains
> > > > gnttab_dom0_frames() macros which is unique for Arm and is used
> > > > in
> > > > arch/arm/domain_build.c.
> > > >  is #ifdef-ed with CONFIG_GRANT_TABLE in
> > > >  so in case of !CONFIG_GRANT_TABLE
> > > > gnttab_dom0_frames
> > > > won't be available for use in arch/arm/domain_build.c.
> > > > 
> > > > Suggested-by: Jan Beulich 
> > > 
> > > Not really, no: In particular ...
If it is comment was addressed to Suggested-by. Then it was added when
we didn't have a discussion about config GRANT_TABLE and depends on
things.
Probably I had to remove it after I started to update Kconfig things.

I am really sorry if I had to remove that before send this patch
version.

> > > 
> > > > --- a/xen/common/Kconfig
> > > > +++ b/xen/common/Kconfig
> > > > @@ -15,6 +15,7 @@ config CORE_PARKING
> > > >  config GRANT_TABLE
> > > >     bool "Grant table support" if EXPERT
> > > >     default y
> > > > +   depends on ARM || X86
> > > 
> > > ... this I explicitly said I consider wrong to add.
> > Then I misunderstood you.
> > 
> > What about to do the same as with MEM_ACCESS config and introduce
> > HAS_GRANT_TABLE?
> 
> That's an option, provided (and I put that under question before)
> there
> realistically can be ports which don't mean to support grant tables.
> You mentioned that things are fine for the dom0less setup you're
> testing,
> but I don't think a fully-functional Xen port makes sense to only
> support
> dom0less. But of course I'm willing to hear arguments to the
> contrary.
Unfortunately, I am not experienced in the depths of Xen, but I used
grant tables in Arm to share frames between dom0 and guest in PV
drivers. It should be another usage of grant tables.

I assume it is possible not to use grant tables and come up with
another solution, but it isn't the best idea, and I don't have any
reason not to use what already exists and works. Are there any cases
where something else, instead of grant tables, is used?

Therefore, I agree that a fully functional Xen port will support only
dom0less, but it can live for a long time only with dom0less. And in
non-dom0less grant tables will be used somewhere sooner or later.

> 
> > Or would it be better just update "depends on" to !RISCV && !PPC?
> 
> Definitely not.
So we have a "weird" situation.

Considering the message above, grant tables are likely to be used
anyway. From this point of view, there is not a lot of sense in
introducing "temporary" HAS_GRANT_TABLE as, at some point, every
architecture will use grant tables ( the same is with "depends on
!RISCV && &!PPC or any other combination ), and HAS_GRANT_TABLE won't
be needed any more except the time when support for new architecture
will be started and it will live without grant tables for some time.

But an introduction of HAS_GRANT_TABLE makes sense ( at least, when the
work on support for new architectures will be started ) for me.

If you ( or anyone else ) don't mind, I'll update the patch with an
introduction of HAS_GRANT_TABLE.

~ Oleksii



Re: [XEN PATCH v2 2/3] xen/arm: add SAF deviation for debugging and logging effects

2023-12-04 Thread Simone Ballarin

On 28/11/23 09:42, Jan Beulich wrote:

On 27.11.2023 18:34, Simone Ballarin wrote:

On 27/11/23 16:09, Jan Beulich wrote:

On 27.11.2023 15:35, Simone Ballarin wrote:

On 27/11/23 11:46, Jan Beulich wrote:

On 24.11.2023 18:29, Simone Ballarin wrote:

--- a/docs/misra/safe.json
+++ b/docs/misra/safe.json
@@ -28,6 +28,22 @@
},
{
"id": "SAF-3-safe",
+"analyser": {
+"eclair": "MC3R1.R13.1"
+},
+"name": "MC3R1.R13.1: effects for debugging and logging",
+"text": "Effects for debugging and loggings reasons that crash 
execution or produce logs are allowed in initializer lists. The evaluation order in abnormal 
conditions is not relevant."
+},


I'm wary of this statement. Order may not matter much anymore _after_ an
abnormal condition was encountered, but in the course of determining whether
an abnormal condition might have been reached it may very well still matter.


Do you object to the deviation in general? Or just to the wording?


Primarily the wording. Yet the need to adjust the wording also hints at there
needing to be care when actually making use of this deviation. Which it turn
I'm not convinced is in the spirit of Misra


The rule is really strict, but imho the only real dangerous situation is
when an entry performs a persistent side effect that can change the
behavior of another entry. I.e.:

int y = 0;
int x[2] =
{
y=1, /* first element will be always 1 */
y/* second element can be either 0 or 1 */
};

Above we have a dependency between the first entry and the second.

Now let's consider logging effects:

#define LOG(x) printf("LOG: %s", x);

int x[2] =
{
({ LOG("1"); 1; }),
({ LOG("2"); 2; })
};


Here the execution can print:
"LOG: 1LOG: 2" or
"LOG: 2LOG: 1".

Do we agree that the evaluation order of prints caused by logging
functions/macros do not normally cause dependencies between the
entries? The execution is still non-deterministic, but does it really
matter?.

In the case of function that crash execution, no dependencies can exist
since no further entries will be evaluated.

In conclusion, I propose the following rewording:

"text": "Effects that crash execution or produce logs are allowed in
initializer lists. Logging effects do not affect the evaluation of
subsequent entries. Crash effects are allowed as they represent the
end of the execution.


Let's assume we have a BUG_ON() (as the "crash effect") the condition of
which depends on where in the sequence of operations it actually executes,
i.e. (potentially) dependent upon another part of the initializer. I hope
we agree that's not something that should be deviated? Yet even the re-
worded statement would - according to my reading - permit doing so.

I guess I should try to remember to bring this up on this afternoon's call.


I known you had a conversation about that during the call, but from what
I known, a clear decision for crash effects wasn't taken.

If the community does not want SAF deviations for such cases, we should 
consider moving them to external variables.





--- a/xen/arch/arm/guestcopy.c
+++ b/xen/arch/arm/guestcopy.c
@@ -110,18 +110,21 @@ static unsigned long copy_guest(void *buf, uint64_t addr, 
unsigned int len,
unsigned long raw_copy_to_guest(void *to, const void *from, unsigned int 
len)
{
return copy_guest((void *)from, (vaddr_t)to, len,
+  /* SAF-4-safe No persistent side effects */
  GVA_INFO(current), COPY_to_guest | COPY_linear);
}

unsigned long raw_copy_to_guest_flush_dcache(void *to, const void *from,

 unsigned int len)
{
+/* SAF-4-safe No persistent side effects */
return copy_guest((void *)from, (vaddr_t)to, len, GVA_INFO(current),
  COPY_to_guest | COPY_flush_dcache | COPY_linear);
}

unsigned long raw_clear_guest(void *to, unsigned int len)

{
+/* SAF-4-safe No persistent side effects */
return copy_guest(NULL, (vaddr_t)to, len, GVA_INFO(current),
  COPY_to_guest | COPY_linear);
}
@@ -129,6 +132,7 @@ unsigned long raw_clear_guest(void *to, unsigned int len)
unsigned long raw_copy_from_guest(void *to, const void __user *from,
  unsigned int len)
{
+/* SAF-4-safe No persistent side effects */
return copy_guest(to, (vaddr_t)from, len, GVA_INFO(current),
  COPY_from_guest | COPY_linear);
}


I can only guess that in all four of these it's the use of "current" which
requires the comment. Yet imo that either needs making explicit, or such a
comment shouldn't go on use sites of "current", but on its definition site.



"current" does not contain any violation of R13.1. Its expansion
produces a side-effect, but this is not a problem in itself. The real
problem is that GVA_INFO expand

Re: [PATCH v5 5/7] xen: ifdef inclusion of in

2023-12-04 Thread Jan Beulich
On 04.12.2023 11:34, Oleksii wrote:
> If you ( or anyone else ) don't mind, I'll update the patch with an
> introduction of HAS_GRANT_TABLE.

I won't NAK such a patch, but unless convincing arguments appear I also
won't ACK it.

Jan



Re: INFORMAL VOTE REQUIRED - DOCUMENTATION WORDING

2023-12-04 Thread George Dunlap
On Mon, Dec 4, 2023 at 8:16 AM Jan Beulich  wrote:
> > I am in favor on moving faster and nitpicking less. Also, Andy put the
> > effort to produce the patch so he should have the default choice in the
> > wording. If the choice is taking the patch as is or rejecting it, I
> > would take it as is.
>
> I'm afraid there's also disagreement about where nitpicking starts. To me
> "broken" in the context here is technically incorrect. Hence asking for
> the wording to be changed wouldn't count as nitpicking to me. When badly
> worded comments of mine are commented upon, I also don't count this as
> nitpicking (there's always a grey area, of course).

Whether something is nitpicking or not is a value judgement; different
people come to different conclusions.  I'm afraid even the argument
about whether "broken" is appropriate to use in this context is a
matter of judgement: there are arguments both ways, and different
people have come to different conclusions.  The problem here is that
some people definitely think "broken" is the wrong word; and others
definitely think "broken" is the right word (some +2 and some -2).

Given that we're always going to have differences of judgement, we
need ways to move forward in spite of that.  Ideally the maintainers
should be implementing things according to the value of the community
as a whole: we should not be quibbling over things that the community
as a whole doesn't want quibbled over.

The basic mechanisms we have, voting (with the ability to appeal to
the committers) is meant to be a quick way to approximate that.  The
assumption is that with 6 active people in the leadership team ("the
committers" at the moment), from different companies and backgrounds,
the chance of a vote of the committers being out of sync with the
community is fairly small.

But of course, small is not impossible.  The list of committers hasn't
changed significantly in a while; it's entirely possible in this sort
of case for the values of committers and the values of the community
as a whole to drift.  How do we determine whether that's the case or
not?  Hence the community-wide survey.

The problem with "nitpicking" goes a bit deeper.  By its nature,
nitpicking doesn't really rise to the level of "something obviously
wrong"; it's more "too much time spent asking for changes to code
which is not obviously wrong".  How much is "too much"?  Again, it's a
value judgment.  If someone is nitpicking your patch, it's not really
that obvious what to do about it -- to ask for a formal vote of the
committers about a tiny change request is just as "nitpicky" or
"petty" as the tiny change request itself; it's not this or that
change which causes the problem with nitpicks, but the cumulative
effect.  How can this be "calibrated", so that we can ensure that
maintainers are just the right level if picky -- neither letting
sloppy / ugly code get checked in, nor wasting people's time and
emotional effort asking for changes which aren't worth the effort?
And how do we give people practical options to respond to a maintainer
who they think is being "picky", such that they can either be
convinced that he code changes are worth asking, or that the
maintainer can stop asking for minor changes that aren't worth the
benefits?

The last one I don't really have an answer for.

 -George



[PATCH] sched: correct sched_move_domain()'s cleanup path

2023-12-04 Thread Jan Beulich
It is only in the error case that we want to clean up the new pool's
scheduler data; in the success case it's rather the old scheduler's
data which needs cleaning up.

Reported-by: René Winther Højgaard 
Signed-off-by: Jan Beulich 
Reviewed-by: Juergen Gross 

--- a/xen/common/sched/core.c
+++ b/xen/common/sched/core.c
@@ -810,7 +810,7 @@ int sched_move_domain(struct domain *d,
 for ( unit = old_units; unit; )
 {
 if ( unit->priv )
-sched_free_udata(c->sched, unit->priv);
+sched_free_udata(ret ? c->sched : old_ops, unit->priv);
 old_unit = unit;
 unit = unit->next_in_list;
 xfree(old_unit);



Re: [PATCH v6 4/5] [FUTURE] xen/arm: enable vPCI for domUs

2023-12-04 Thread Roger Pau Monné
On Fri, Dec 01, 2023 at 06:56:32PM -0800, Stefano Stabellini wrote:
> On Fri, 1 Dec 2023, Roger Pau Monné wrote:
> > On Mon, Nov 13, 2023 at 05:21:13PM -0500, Stewart Hildebrand wrote:
> > > @@ -1618,6 +1630,14 @@ int iommu_do_pci_domctl(
> > >  bus = PCI_BUS(machine_sbdf);
> > >  devfn = PCI_DEVFN(machine_sbdf);
> > >  
> > > +if ( needs_vpci(d) && !has_vpci(d) )
> > > +{
> > > +printk(XENLOG_G_WARNING "Cannot assign %pp to %pd: vPCI 
> > > support not enabled\n",
> > > +   &PCI_SBDF(seg, bus, devfn), d);
> > > +ret = -EPERM;
> > > +break;
> > 
> > I think this is likely too restrictive going forward.  The current
> > approach is indeed to enable vPCI on a per-domain basis because that's
> > how PVH dom0 uses it, due to being unable to use ioreq servers.
> > 
> > If we start to expose vPCI suport to guests the interface should be on
> > a per-device basis, so that vPCI could be enabled for some devices,
> > while others could still be handled by ioreq servers.
> > 
> > We might want to add a new flag to xen_domctl_assign_device (used by
> > XEN_DOMCTL_assign_device) in order to signal whether the device will
> > use vPCI.
> 
> Actually I don't think this is a good idea. I am all for flexibility but
> supporting multiple different configurations comes at an extra cost for
> both maintainers and contributors. I think we should try to reduce the
> amount of configurations we support rather than increasing them
> (especially on x86 where we have PV, PVH, HVM).

I think it's perfectly fine to initially require a domain to have all
its devices either passed through using vPCI or ireqs, but the
interface IMO should allow for such differentiation in the future.
That's why I think introducing a domain wide vPCI flag might not be
the best option going forward.

It would be perfectly fine for XEN_DOMCTL_assign_device to set a
domain wide vPCI flag, iow:

if ( HYPERCALL_VPCI_FLAG_SET && !has_vpci(d) )
{
if ( has_arch_pdevs(d) )
{
printk("All passthrough devices must use the same backend\n");
return -EINVAL;
}

/* Set vPCI domain flag */
}

We have already agreed that we want to aim for a setup where ioreqs
and vPCI could be used for the same domain, but I guess you assumed
that ioreqs would be used for emulated devices exclusively and vPCI
for passthrough devices?

Overall if we agree that ioreqs and vPCI should co-exist for a domain,
I'm not sure there's much reason to limit ioreqs to only handle
emulated devices, seems like an arbitrary limitation.

> I don't think we should enable IOREQ servers to handle PCI passthrough
> for PVH guests and/or guests with vPCI. If the domain has vPCI, PCI
> Passthrough can be handled by vPCI just fine. I think this should be a
> good anti-feature to have (a goal to explicitly not add this feature) to
> reduce complexity. Unless you see a specific usecase to add support for
> it?

There are passthrough devices (GPUs) that might require some extra
mediation on dom0 (like the Intel GVT-g thing) that would force the
usage of ioreqs to passthrough.

It's important that the interfaces we introduce are correct IMO,
because that ends up reflecting on the configuration options that we
expose to users on xl/libxl.  While both XEN_DOMCTL_createdomain and
XEN_DOMCTL_assign_device are unstable interfaces, how the vPCI option
gets placed there will ultimately influence how the option gets
exposed in xl/libxl, and the interface there is relevant to keep
stable for end user sanity.

Thanks, Roger.



Re: [PATCH] x86/x2apic: introduce a mixed physical/cluster mode

2023-12-04 Thread Roger Pau Monné
On Sun, Dec 03, 2023 at 05:02:04PM -0800, Elliott Mitchell wrote:
> On Mon, Nov 27, 2023 at 09:27:18AM +0100, Roger Pau Monn
> > On Fri, Nov 24, 2023 at 05:15:34PM -0800, Elliott Mitchell wrote:
> > > On Thu, Nov 23, 2023 at 10:39:37AM +0100, Roger Pau Monn
> > > > On Tue, Nov 21, 2023 at 04:56:47PM -0800, Elliott Mitchell wrote:
> > > > > It was insisted that full logs be sent to xen-devel.  Perhaps I am
> > > > > paranoid, but I doubt I would have been successful at scrubbing all
> > > > > hardware serial numbers.  As such, I was willing to post substantial
> > > > > sections, but not everything.
> > > > 
> > > > Can you please point out which hardware serial numbers are you
> > > > referring to, and where are they printed in Xen dmesg?
> > > 
> > > I didn't confirm the presence of hardware serial numbers getting into
> > > Xen's dmesg, but there was a lot of data and many hexadecimal numbers.
> > > With 4000 lines of output, checking for concerning data is troublesome.
> > 
> > 4000 lines of output from Xen dmesg seems quite suspicious.  Xen dmesg
> > on a fully booted box is ~400 lines on my local box.  That might get
> > quite longer if you have a box with a lot of memory region, or a lot
> > of CPUs.
> 
> That was from 4 boots with differing settings.  Since it was full console
> log, it also had the initial Linux kernel boot messages.  Each log was
> ~1000 lines.

I think this is unfair.  People on the list was willing to go over
your 1000 lines of log in order to help solve your issue, and you
couldn't go over them to assert your claims about it containing
sensitive information?

> > > There was enough for alarms to trigger.
> > 
> > That's fine, but without pointing out which information is sensitive,
> > it's very unlikely such concerns will ever get fixed.
> > 
> > Could you at least go over the log and point out the first instance of
> > such line that you consider contains sensitive information?
> 
> I would have been more comfortable with getting some guidance on which
> portions were desired or which could be left out.  No need for Linux's
> boot messages, what would cut things down by half.

IIRC the specific request was for Xen logs, so yes, you could have
left out the Linux side, at least until asked for.

> memory map, lots more goes.
> 
> It is easier to be comfortable with 40 line sections than 1000 line
> sections.

I don't think that's really feasible, it's unclear how much you would
trim, and whether such trimming could remove important information to
help debug the issue.

Again, it would be very helpful for us to get pointed out at what
sensitive information is being printed, so that it can be removed.

Thanks, Roger.



Re: [RFC XEN PATCH v2 1/3] xen/vpci: Clear all vpci status of device

2023-12-04 Thread Roger Pau Monné
On Mon, Dec 04, 2023 at 06:57:03AM +, Chen, Jiqian wrote:
> Hi Daniel P. Smith,
> 
> On 2023/11/30 22:52, Roger Pau Monné wrote:
> > On Thu, Nov 30, 2023 at 07:39:38AM -0500, Daniel P. Smith wrote:
> >> On 11/30/23 07:25, Daniel P. Smith wrote:
> >>> On 11/30/23 01:22, Chen, Jiqian wrote:
>  Hi Roger and Daniel P. Smith,
> 
>  On 2023/11/28 22:08, Roger Pau Monné wrote:
> > On Fri, Nov 24, 2023 at 06:41:34PM +0800, Jiqian Chen wrote:
> >> When a device has been reset on dom0 side, the vpci on Xen
> >> side won't get notification, so the cached state in vpci is
> >> all out of date compare with the real device state.
> >> To solve that problem, this patch add new hypercall to clear
> >> all vpci device state. And when reset device happens on dom0
> >> side, dom0 can call this hypercall to notify vpci.
> >>
> >> Signed-off-by: Jiqian Chen 
> >> Signed-off-by: Huang Rui 
> >> ---
> >>   xen/arch/x86/hvm/hypercall.c  |  1 +
> >>   xen/drivers/passthrough/pci.c | 21 +
> >>   xen/drivers/pci/physdev.c | 14 ++
> >>   xen/drivers/vpci/vpci.c   |  9 +
> >>   xen/include/public/physdev.h  |  2 ++
> >>   xen/include/xen/pci.h |  1 +
> >>   xen/include/xen/vpci.h    |  6 ++
> >>   7 files changed, 54 insertions(+)
> >>
> >> diff --git a/xen/arch/x86/hvm/hypercall.c
> >> b/xen/arch/x86/hvm/hypercall.c
> >> index eeb73e1aa5..6ad5b4d5f1 100644
> >> --- a/xen/arch/x86/hvm/hypercall.c
> >> +++ b/xen/arch/x86/hvm/hypercall.c
> >> @@ -84,6 +84,7 @@ long hvm_physdev_op(int cmd,
> >> XEN_GUEST_HANDLE_PARAM(void) arg)
> >>   case PHYSDEVOP_pci_mmcfg_reserved:
> >>   case PHYSDEVOP_pci_device_add:
> >>   case PHYSDEVOP_pci_device_remove:
> >> +    case PHYSDEVOP_pci_device_state_reset:
> >>   case PHYSDEVOP_dbgp_op:
> >>   if ( !is_hardware_domain(currd) )
> >>   return -ENOSYS;
> >> diff --git a/xen/drivers/passthrough/pci.c
> >> b/xen/drivers/passthrough/pci.c
> >> index 04d00c7c37..f871715585 100644
> >> --- a/xen/drivers/passthrough/pci.c
> >> +++ b/xen/drivers/passthrough/pci.c
> >> @@ -824,6 +824,27 @@ int pci_remove_device(u16 seg, u8 bus, u8 devfn)
> >>   return ret;
> >>   }
> >> +int pci_reset_device_state(u16 seg, u8 bus, u8 devfn)
> >
> > You could use pci_sbdf_t here instead of 3 parameters.
>  Will change in next version, thank you.
> 
> >
> > I'm however unsure whether we really need this helper just to fetch
> > the pdev and call vpci_reset_device_state(), I think you could place
> > this logic directly in pci_physdev_op().  Unless there are plans to
> > use such logic outside of pci_physdev_op().
>  If I place the logic of pci_reset_device_state directly in
>  pci_physdev_op. I think I need to declare vpci_reset_device_state in
>  pci.h? If it is suitable, I will change in next version.
> 
> >
> >> +{
> >> +    struct pci_dev *pdev;
> >> +    int ret = -ENODEV;
> >
> > Some XSM check should be introduced fro this operation, as none of the
> > existing ones is suitable.  See xsm_resource_unplug_pci() for example.
> >
> > xsm_resource_reset_state_pci() or some such I would assume.
>  I don't know what I should do in XSM function(assume it is
>  xsm_resource_reset_state_pci).
>  Hi Daniel P. Smith, could you please give me some suggestions?
>  Just to check the XSM_PRIV action?
> 
> >>>
> >>> Roger, thank you for seeing this and adding me in!
> >>>
> >>> Jiqian, I just wanted to let you know I have seen your post but I have
> >>> been a little tied up this week. Just with the cursory look, I think
> >>> Roger is suggesting a new XSM check/hook is warranted.
> >>>
> >>> If you would like to attempt at writing the dummy policy side, mimic
> >>> xsm_resource_plug_pci in xen/include/xsm/dummy.h and
> >>> xen/include/xsm/xsm.h, then I can look at handling the FLASK portion
> >>> next week and provide it to you for inclusion into the series. If you
> >>> are not comfortable with it, I can look at the whole thing next week.
> >>> Just let me know what you would be comfortable with.
> >>
> >> Apologies, thinking about for a moment and was thinking the hook should be
> >> called xsm_resource_config_pci. I would reset as a config operation and
> >> there might be new ones in the future. I do not believe there is a need to
> >> have fine grain access control down to individual config operation, thus
> >> they could all be captured under this one hook. Roger, what are your
> >> thoughts about this, in particular how you see vpci evolving?
> > 
> > So the configuration space reset should only be done by the domain
> > that's also capable of triggering the physical device reset, usually
> > the hardware domain.  I don't think it's possi

Re: [RFC XEN PATCH v2 3/3] tools: Add new function to get gsi from irq

2023-12-04 Thread Roger Pau Monné
On Mon, Dec 04, 2023 at 05:38:06AM +, Chen, Jiqian wrote:
> On 2023/12/1 17:03, Roger Pau Monné wrote:
> > On Thu, Nov 30, 2023 at 07:09:12PM -0800, Stefano Stabellini wrote:
> >> On Thu, 30 Nov 2023, Roger Pau Monné wrote:
> >>> On Wed, Nov 29, 2023 at 08:02:40PM -0800, Stefano Stabellini wrote:
>  n Wed, 29 Nov 2023, Stefano Stabellini wrote:
> > On Tue, 28 Nov 2023, Roger Pau Monné wrote:
> >> On Fri, Nov 24, 2023 at 06:41:36PM +0800, Jiqian Chen wrote:
> >>> In PVH dom0, it uses the linux local interrupt mechanism,
> >>> when it allocs irq for a gsi, it is dynamic, and follow
> >>> the principle of applying first, distributing first. And
> >>> if you debug the kernel codes, you will find the irq
> >>> number is alloced from small to large, but the applying
> >>> gsi number is not, may gsi 38 comes before gsi 28, that
> >>> causes the irq number is not equal with the gsi number.
> >>> And when we passthrough a device, QEMU will use its gsi
> >>> number to do mapping actions, see xen_pt_realize->
> >>> xc_physdev_map_pirq, but the gsi number is got from file
> >>> /sys/bus/pci/devices/:xx:xx.x/irq in current code,
> >>> so it will fail when mapping.
> >>> And in current codes, there is no method to translate
> >>> irq to gsi for userspace.
> >>
> >> I think it would be cleaner to just introduce a new sysfs node that
> >> contains the gsi if a device is using one (much like the irq sysfs
> >> node)?
> >>
> >> Such ioctl to translate from IRQ to GSI has nothing to do with Xen, so
> >> placing it in privcmd does seem quite strange to me.  I understand
> >> that for passthrough we need the GSI, but such translation layer from
> >> IRQ to GSI is all Linux internal, and it would be much simpler to just
> >> expose the GSI in sysfs IMO.
> >
> > Maybe something to add to drivers/xen/sys-hypervisor.c in Linux.
> > Juergen what do you think?
> 
>  Let me also add that privcmd.c is already a Linux specific interface.
>  Although it was born to be a Xen hypercall "proxy" in reality today we
>  have a few privcmd ioctls that don't translate into hypercalls. So I
>  don't think that adding IOCTL_PRIVCMD_GSI_FROM_IRQ would be a problem.
> >>>
> >>> Maybe not all ioctls translate to hypercalls (I guess you are
> >>> referring to the IOCTL_PRIVCMD_RESTRICT ioctl), but they are specific
> >>> Xen actions.  Getting the GSI used by a device has nothing do to with
> >>> Xen.
> >>>
> >>> IMO drivers/xen/sys-hypervisor.c is also not appropriate, but I'm not
> >>> the maintainer of any of those components.
> >>>
> >>> There's nothing Xen specific about fetching the GSI associated with a
> >>> PCI device.  The fact that Xen needs it for passthrough is just a red
> >>> herring, further cases where the GSI is needed might arise outside of
> >>> Xen, and hence such node would better be placed in a generic
> >>> location.  The right location should be /sys/bus/pci/devices//gsi.
> >>
> >> That might be true but /sys/bus/pci/devices//gsi is a non-Xen
> >> generic interface and the maintainers of that portion of Linux code
> >> might have a different opinion. We'll have to see.
> > 
> > Right, but before resorting to implement a Xen specific workaround
> > let's attempt to do it the proper way :).
> > 
> > I cannot see why exposing the gsi on sysfs like that would be an
> > issue.  There's a lot of resource information exposed on sysfs
> > already, and it's a trivial node to implement.
> Thanks for both of you' s suggestions. At present, it seems the result of 
> discussion is that it needs to add a gsi sysfs. I will modify it in the next 
> version and then add the corresponding maintainer to the review list.

Thanks, please keep xen-devel on Cc if possible.  Maybe if the
suggested path is not suitable maintainers can recommend another path
where the gsi (or equivalent) node could live.

Thanks, Roger.



[PATCH] bump default SeaBIOS version to 1.16.3

2023-12-04 Thread Jan Beulich
Signed-off-by: Jan Beulich 

--- a/Config.mk
+++ b/Config.mk
@@ -229,7 +229,7 @@ MINIOS_UPSTREAM_URL ?= https://xenbits.x
 MINIOS_UPSTREAM_REVISION ?= b08019f0b2fbc30c75169a160acb9fd9af5d68f4
 
 SEABIOS_UPSTREAM_URL ?= https://xenbits.xen.org/git-http/seabios.git
-SEABIOS_UPSTREAM_REVISION ?= rel-1.16.2
+SEABIOS_UPSTREAM_REVISION ?= rel-1.16.3
 
 ETHERBOOT_NICS ?= rtl8139 8086100e
 



Re: [PATCH] CODING_STYLE: Add a section of the naming convention

2023-12-04 Thread Luca Fancellu


> On 1 Dec 2023, at 18:49, Julien Grall  wrote:
> 
> 
> 
> On 01/12/2023 18:47, Julien Grall wrote:
>> From: Julien Grall 
>> Several maintainers have expressed a stronger preference
>> to use '-' when in filename and option that contains multiple
>> words.
>> So document it in CODING_STYLE.
>> Signed-off-by: Julien Grall 
>> ---
>>  CODING_STYLE | 9 +
>>  1 file changed, 9 insertions(+)
>> diff --git a/CODING_STYLE b/CODING_STYLE
>> index ced3ade5a6fb..afd09177745b 100644
>> --- a/CODING_STYLE
>> +++ b/CODING_STYLE
>> @@ -144,6 +144,15 @@ separate lines and each line should begin with a 
>> leading '*'.
>>   * Note beginning and end markers on separate lines and leading '*'.
>>   */
>>  +Naming convention
>> +-
>> +
>> +When command line option or filename contain multiple words, a '-'
>> +should be to separate them. E.g. 'timer-works'.
>> +
>> +Note that some of the option and filename are using '_'. This is now
>> +deprecated.
> 
> Urgh, I sent the wrong draft :(. This is the wording I wanted to propose:
> 
> +Naming convention
> +-
> +
> +'-' should be used to separate words in commandline options and filenames.
> +E.g. timer-works.
> +
> +Note that some of the options and filenames are using '_'. This is now
> +deprecated.
> +
> 

Hi Julien,

Can we make an exception for python files that are meant to be used as module?
Because modules containing ‘-‘ cannot be imported using ‘import’ keyword and
needs another way to do them which is not conventional

Cheers,
Luca


> Cheers,
> 
> -- 
> Julien Grall
> 



Re: [PATCH] CODING_STYLE: Add a section of the naming convention

2023-12-04 Thread Julien Grall

Hi Luca,

On 04/12/2023 11:20, Luca Fancellu wrote:




On 1 Dec 2023, at 18:49, Julien Grall  wrote:



On 01/12/2023 18:47, Julien Grall wrote:

From: Julien Grall 
Several maintainers have expressed a stronger preference
to use '-' when in filename and option that contains multiple
words.
So document it in CODING_STYLE.
Signed-off-by: Julien Grall 
---
  CODING_STYLE | 9 +
  1 file changed, 9 insertions(+)
diff --git a/CODING_STYLE b/CODING_STYLE
index ced3ade5a6fb..afd09177745b 100644
--- a/CODING_STYLE
+++ b/CODING_STYLE
@@ -144,6 +144,15 @@ separate lines and each line should begin with a leading 
'*'.
   * Note beginning and end markers on separate lines and leading '*'.
   */
  +Naming convention
+-
+
+When command line option or filename contain multiple words, a '-'
+should be to separate them. E.g. 'timer-works'.
+
+Note that some of the option and filename are using '_'. This is now
+deprecated.


Urgh, I sent the wrong draft :(. This is the wording I wanted to propose:

+Naming convention
+-
+
+'-' should be used to separate words in commandline options and filenames.
+E.g. timer-works.
+
+Note that some of the options and filenames are using '_'. This is now
+deprecated.
+



Hi Julien,

Can we make an exception for python files that are meant to be used as module?
Because modules containing ‘-‘ cannot be imported using ‘import’ keyword and
needs another way to do them which is not conventional


I am not sure this needs to be written down explicitely. At the top of 
the file we have:


"The Xen coding style described below is the coding style used by the
Xen hypervisor itself (xen/*) as well as various associated low-level
libraries (e.g. tools/libxc/*).

An exception is made for files which are imported from an external
source. In these cases the prevailing coding style of the upstream
source is generally used (commonly the Linux coding style).

Other parts of the code base may use other coding styles, sometimes
explicitly (e.g. tools/libxl/CODING_STYLE) but often implicitly (Linux
coding style is fairly common). In general you should copy the style
of the surrounding code. If you are unsure please ask."

and I would not describe Python as low-level.

Cheers,

--
Julien Grall



Re: [PATCH] CODING_STYLE: Add a section of the naming convention

2023-12-04 Thread Luca Fancellu


> On 4 Dec 2023, at 11:31, Julien Grall  wrote:
> 
> Hi Luca,
> 
> On 04/12/2023 11:20, Luca Fancellu wrote:
>>> On 1 Dec 2023, at 18:49, Julien Grall  wrote:
>>> 
>>> 
>>> 
>>> On 01/12/2023 18:47, Julien Grall wrote:
 From: Julien Grall 
 Several maintainers have expressed a stronger preference
 to use '-' when in filename and option that contains multiple
 words.
 So document it in CODING_STYLE.
 Signed-off-by: Julien Grall 
 ---
  CODING_STYLE | 9 +
  1 file changed, 9 insertions(+)
 diff --git a/CODING_STYLE b/CODING_STYLE
 index ced3ade5a6fb..afd09177745b 100644
 --- a/CODING_STYLE
 +++ b/CODING_STYLE
 @@ -144,6 +144,15 @@ separate lines and each line should begin with a 
 leading '*'.
   * Note beginning and end markers on separate lines and leading '*'.
   */
  +Naming convention
 +-
 +
 +When command line option or filename contain multiple words, a '-'
 +should be to separate them. E.g. 'timer-works'.
 +
 +Note that some of the option and filename are using '_'. This is now
 +deprecated.
>>> 
>>> Urgh, I sent the wrong draft :(. This is the wording I wanted to propose:
>>> 
>>> +Naming convention
>>> +-
>>> +
>>> +'-' should be used to separate words in commandline options and filenames.
>>> +E.g. timer-works.
>>> +
>>> +Note that some of the options and filenames are using '_'. This is now
>>> +deprecated.
>>> +
>>> 
>> Hi Julien,
>> Can we make an exception for python files that are meant to be used as 
>> module?
>> Because modules containing ‘-‘ cannot be imported using ‘import’ keyword and
>> needs another way to do them which is not conventional
> 
> I am not sure this needs to be written down explicitely. At the top of the 
> file we have:
> 
> "The Xen coding style described below is the coding style used by the
> Xen hypervisor itself (xen/*) as well as various associated low-level
> libraries (e.g. tools/libxc/*).
> 
> An exception is made for files which are imported from an external
> source. In these cases the prevailing coding style of the upstream
> source is generally used (commonly the Linux coding style).
> 
> Other parts of the code base may use other coding styles, sometimes
> explicitly (e.g. tools/libxl/CODING_STYLE) but often implicitly (Linux
> coding style is fairly common). In general you should copy the style
> of the surrounding code. If you are unsure please ask."
> 
> and I would not describe Python as low-level.

Ok makes sense to me! Thanks


> 
> Cheers,
> 
> -- 
> Julien Grall



Re: [PATCH v9 2/2] xen/vpci: header: filter PCI capabilities

2023-12-04 Thread Jan Beulich
On 01.12.2023 16:45, Stewart Hildebrand wrote:
> --- a/xen/drivers/pci/pci.c
> +++ b/xen/drivers/pci/pci.c
> @@ -39,31 +39,42 @@ unsigned int pci_find_cap_offset(pci_sbdf_t sbdf, 
> unsigned int cap)
>  return 0;
>  }
>  
> -unsigned int pci_find_next_cap(pci_sbdf_t sbdf, unsigned int pos,
> -   unsigned int cap)
> +unsigned int pci_find_next_cap_ttl(pci_sbdf_t sbdf, unsigned int pos,
> +   const unsigned int *cap, unsigned int n,

Imo this would better be written as cap[] (or even caps[]).

> @@ -545,6 +546,68 @@ static int cf_check init_bars(struct pci_dev *pdev)
>  if ( rc )
>  return rc;
>  
> +if ( !is_hardware_domain(pdev->domain) )
> +{
> +if ( pci_conf_read16(pdev->sbdf, PCI_STATUS) & PCI_STATUS_CAP_LIST )
> +{
> +/* Only expose capabilities to the guest that vPCI can handle. */
> +unsigned int next, ttl = 48;
> +const unsigned int supported_caps[] = {
> +PCI_CAP_ID_MSI,
> +PCI_CAP_ID_MSIX,
> +};

static?

With the two adjustments
Reviewed-by: Jan Beulich 

Jan



Re: [PATCH] x86/DMI: adjustments to comply with Misra C:2012 Rule 9.3

2023-12-04 Thread Nicola Vetrini

On 2023-11-30 15:56, Jan Beulich wrote:

On 30.11.2023 08:55, Jan Beulich wrote:

The rule demands that all array elements be initialized (or dedicated
initializers be used). Introduce a small set of macros to allow doing 
so

without unduly affecting use sites (in particular in terms of how many
elements .matches[] actually has; right now there's no use of
DMI_MATCH4(), so we could even consider reducing the array size to 3).

Note that DMI_MATCH() needs adjustment because of the comma included 
in

its expansion, which - due to being unparenthesized - would otherwise
cause macro arguments in the "further replacement" step to be wrong.


Sadly this doesn't work with older gcc (4.8.5 is what I had an issue 
with,

complaining "initializer element is not constant").

Jan


Hi,

I tried plugging the relevant code into godbolt.org to try gcc-4.8.5, 
but I'm not able to reproduce the error (see 
https://godbolt.org/z/cP88YeWhh). Can you please provide some more 
details on where the issue is?


--
Nicola Vetrini, BSc
Software Engineer, BUGSENG srl (https://bugseng.com)



Re: [XEN PATCH 7/7] xen/page_alloc: deviate first_valid_mfn for MISRA C Rule 8.4

2023-12-04 Thread Nicola Vetrini

On 2023-12-04 08:44, Jan Beulich wrote:

On 02.12.2023 04:03, Stefano Stabellini wrote:

On Fri, 1 Dec 2023, Jan Beulich wrote:

On 01.12.2023 03:47, Stefano Stabellini wrote:

On Wed, 29 Nov 2023, Nicola Vetrini wrote:

No functional change.

Signed-off-by: Nicola Vetrini 
---
The preferred way to deviate is to use asmlinkage, but this 
modification is only
the consequence of NUMA on ARM (and possibly PPC) being a work in 
progress.
As stated in the comment above the textual deviation, 
first_valid_mfn will
likely then become static and there would be no need for the 
comment anymore.
This works towards having the analysis for this rule clean (i.e. no 
violations);
the interest in having a clean rule is that then it could be used 
to signal

newly introduced violations by making the analysis job fail.


Please add this text as part of the commit message. It can be done 
on

commit.


I assume you saw my reply on another of the patches in this series as 
to
asmlinkage use on variables? IOW I think this paragraph would also 
need

adjustment to account for that.


I was going to ask you about that: reading your reply
https://marc.info/?l=xen-devel&m=170142048615336 it is not clear to me
what you are asking or suggesting as next step in regard to asmlinkage
use on variables.


Either we need a separate attribute, or we need affirmation that 
calling

convention attributes are ignored (and going to be going forward) for
variables, or we need to resort to SAF-* comments. I'm not sure what's
best (assuming the "affirm" wouldn't really be possible).



Well, gcc does warn on unsupported attributes for the entity which are 
being dropped. This appears to be the case for calling convention 
attributes, as they are not listed in their documentation for variable 
attributes, but some more digging would be required to determine whether 
that's always the case.


--
Nicola Vetrini, BSc
Software Engineer, BUGSENG srl (https://bugseng.com)



[xen-unstable test] 183983: tolerable FAIL

2023-12-04 Thread osstest service owner
flight 183983 xen-unstable real [real]
http://logs.test-lab.xenproject.org/osstest/logs/183983/

Failures :-/ but no regressions.

Tests which did not succeed, but are not blocking:
 test-amd64-amd64-xl-qemuu-win7-amd64 19 guest-stopfail like 183971
 test-amd64-amd64-xl-qemut-win7-amd64 19 guest-stopfail like 183978
 test-amd64-i386-xl-qemuu-win7-amd64 19 guest-stop fail like 183978
 test-amd64-amd64-xl-qemuu-ws16-amd64 19 guest-stopfail like 183978
 test-amd64-i386-xl-qemut-ws16-amd64 19 guest-stop fail like 183978
 test-armhf-armhf-libvirt-raw 15 saverestore-support-checkfail  like 183978
 test-amd64-i386-xl-qemut-win7-amd64 19 guest-stop fail like 183978
 test-armhf-armhf-libvirt 16 saverestore-support-checkfail  like 183978
 test-amd64-amd64-xl-qemut-ws16-amd64 19 guest-stopfail like 183978
 test-armhf-armhf-libvirt-qcow2 15 saverestore-support-check   fail like 183978
 test-amd64-amd64-qemuu-nested-amd 20 debian-hvm-install/l1/l2 fail like 183978
 test-amd64-i386-xl-qemuu-ws16-amd64 19 guest-stop fail like 183978
 test-amd64-i386-libvirt  15 migrate-support-checkfail   never pass
 test-amd64-i386-libvirt-xsm  15 migrate-support-checkfail   never pass
 test-amd64-i386-xl-pvshim14 guest-start  fail   never pass
 test-amd64-amd64-libvirt-xsm 15 migrate-support-checkfail   never pass
 test-arm64-arm64-xl  15 migrate-support-checkfail   never pass
 test-arm64-arm64-xl  16 saverestore-support-checkfail   never pass
 test-arm64-arm64-xl-credit1  15 migrate-support-checkfail   never pass
 test-arm64-arm64-xl-credit1  16 saverestore-support-checkfail   never pass
 test-arm64-arm64-xl-xsm  15 migrate-support-checkfail   never pass
 test-arm64-arm64-xl-xsm  16 saverestore-support-checkfail   never pass
 test-arm64-arm64-libvirt-xsm 15 migrate-support-checkfail   never pass
 test-arm64-arm64-libvirt-xsm 16 saverestore-support-checkfail   never pass
 test-arm64-arm64-xl-credit2  15 migrate-support-checkfail   never pass
 test-arm64-arm64-xl-credit2  16 saverestore-support-checkfail   never pass
 test-arm64-arm64-xl-thunderx 15 migrate-support-checkfail   never pass
 test-arm64-arm64-xl-thunderx 16 saverestore-support-checkfail   never pass
 test-amd64-amd64-libvirt-qemuu-debianhvm-amd64-xsm 13 migrate-support-check 
fail never pass
 test-armhf-armhf-xl-multivcpu 15 migrate-support-checkfail  never pass
 test-armhf-armhf-xl-multivcpu 16 saverestore-support-checkfail  never pass
 test-armhf-armhf-xl-arndale  15 migrate-support-checkfail   never pass
 test-armhf-armhf-xl-arndale  16 saverestore-support-checkfail   never pass
 test-amd64-i386-libvirt-qemuu-debianhvm-amd64-xsm 13 migrate-support-check 
fail never pass
 test-armhf-armhf-xl-credit1  15 migrate-support-checkfail   never pass
 test-armhf-armhf-xl-credit1  16 saverestore-support-checkfail   never pass
 test-armhf-armhf-xl-rtds 15 migrate-support-checkfail   never pass
 test-armhf-armhf-xl-rtds 16 saverestore-support-checkfail   never pass
 test-amd64-amd64-libvirt 15 migrate-support-checkfail   never pass
 test-arm64-arm64-libvirt-raw 14 migrate-support-checkfail   never pass
 test-arm64-arm64-libvirt-raw 15 saverestore-support-checkfail   never pass
 test-amd64-amd64-libvirt-vhd 14 migrate-support-checkfail   never pass
 test-arm64-arm64-xl-vhd  14 migrate-support-checkfail   never pass
 test-arm64-arm64-xl-vhd  15 saverestore-support-checkfail   never pass
 test-armhf-armhf-libvirt-raw 14 migrate-support-checkfail   never pass
 test-amd64-i386-libvirt-raw  14 migrate-support-checkfail   never pass
 test-armhf-armhf-xl  15 migrate-support-checkfail   never pass
 test-armhf-armhf-xl  16 saverestore-support-checkfail   never pass
 test-armhf-armhf-xl-credit2  15 migrate-support-checkfail   never pass
 test-armhf-armhf-xl-credit2  16 saverestore-support-checkfail   never pass
 test-armhf-armhf-libvirt 15 migrate-support-checkfail   never pass
 test-armhf-armhf-libvirt-qcow2 14 migrate-support-checkfail never pass
 test-armhf-armhf-xl-vhd  14 migrate-support-checkfail   never pass
 test-armhf-armhf-xl-vhd  15 saverestore-support-checkfail   never pass

version targeted for testing:
 xen  525c7c094b258e8a46b494488eef96f5670eb352
baseline version:
 xen  525c7c094b258e8a46b494488eef96f5670eb352

Last test of basis   183983  2023-12-04 01:54:08 Z0 days
Testing same since  (not found) 0 attempts

jobs:
 build-amd64-xsm  pass
 build-arm64-xsm  pass
 build-i3

Re: Trying add smt=off disabled cores to cpupool crash xen

2023-12-04 Thread Juergen Gross

On 04.12.23 11:13, Jan Beulich wrote:

On 04.12.2023 11:02, Juergen Gross wrote:

On 04.12.23 10:15, Jan Beulich wrote:

On 01.12.2023 21:12, Andrew Cooper wrote:

On 01/12/2023 7:59 pm, René Winther Højgaard wrote:

If I set smt=off and try to configure cpupools with credit(1) as if
all cores are available, I get the following crash.

The crash happens when I try to use xl cpupool-add-cpu on the disabled
HT sibling cores.

Hyper-threading is enabled in the firmware, and only disabled with
smt=off.


CC'ing some maintainers.

I expect this will also explode when a CPU is runtime offlined with
`xen-hptool cpu-offline` and then added to a cpupool.

Interestingly, the crash is mov (%rdx,%rax,1),%r13, and I think that's
the percpu posion value in %rdx.

I expect cpupools want to reject parked/offline CPUs.


While the only explicit check there is

  if ( cpu >= nr_cpu_ids )
  goto addcpu_out;

I would have expected this

  if ( !cpumask_subset(cpus, &cpupool_free_cpus) ||
   cpumask_intersects(cpus, &cpupool_locked_cpus) )
  goto addcpu_out;

to deal with the situation, as parked/offline CPUs shouldn't be "free".
Jürgen?


The problem is the call of sched_get_opt_cpumask() to need the percpu area
of the cpu in question.


That was my first thought, too, but then I saw cpupool_assign_cpu_locked() on
the call trace, which is called only afterwards. Plus sched_get_opt_cpumask()
needs the per-CPU area only when granularity was switched from its default of
SCHED_GRAN_cpu afaics.


Oh right you are.

My patch is needed for larger granularities, though.

I've tried to hit the same problem as René, but everything works as intended (no
crash, but adding an offline cpu is being rejected).

René, could you please tell us what exactly you've been doing? This would be:

- Xen command line parameters
- Output of "xl info"
- Output of "xl cpupool-list" before starting to manipulate cpupools
- Output of "xl cpupool-list -c" before starting to manipulate cpupools
- Cpupool config file used to create new cpupool
- xl commands you've used to setup the cpupool and adding the cpu(s) to it

Thanks,


Juergen


OpenPGP_0xB0DE9DD628BF132F.asc
Description: OpenPGP public key


OpenPGP_signature.asc
Description: OpenPGP digital signature


Re: [PATCH] x86/DMI: adjustments to comply with Misra C:2012 Rule 9.3

2023-12-04 Thread Jan Beulich
On 04.12.2023 12:52, Nicola Vetrini wrote:
> On 2023-11-30 15:56, Jan Beulich wrote:
>> On 30.11.2023 08:55, Jan Beulich wrote:
>>> The rule demands that all array elements be initialized (or dedicated
>>> initializers be used). Introduce a small set of macros to allow doing 
>>> so
>>> without unduly affecting use sites (in particular in terms of how many
>>> elements .matches[] actually has; right now there's no use of
>>> DMI_MATCH4(), so we could even consider reducing the array size to 3).
>>>
>>> Note that DMI_MATCH() needs adjustment because of the comma included 
>>> in
>>> its expansion, which - due to being unparenthesized - would otherwise
>>> cause macro arguments in the "further replacement" step to be wrong.
>>
>> Sadly this doesn't work with older gcc (4.8.5 is what I had an issue 
>> with,
>> complaining "initializer element is not constant").
> 
> I tried plugging the relevant code into godbolt.org to try gcc-4.8.5, 
> but I'm not able to reproduce the error (see 
> https://godbolt.org/z/cP88YeWhh). Can you please provide some more 
> details on where the issue is?

It apparently doesn't like the compound literal as initializer. I need
to re-submit the adjusted patch, though.

Jan



Re: [PATCH 06/12] scsi: remove AioContext locking

2023-12-04 Thread Kevin Wolf
Am 29.11.2023 um 20:55 hat Stefan Hajnoczi geschrieben:
> The AioContext lock no longer has any effect. Remove it.
> 
> Signed-off-by: Stefan Hajnoczi 
> ---
>  include/hw/virtio/virtio-scsi.h | 14 --
>  hw/scsi/scsi-bus.c  |  2 --
>  hw/scsi/scsi-disk.c | 28 
>  hw/scsi/virtio-scsi.c   | 18 --
>  4 files changed, 4 insertions(+), 58 deletions(-)

> @@ -2531,13 +2527,11 @@ static void scsi_unrealize(SCSIDevice *dev)
>  static void scsi_hd_realize(SCSIDevice *dev, Error **errp)
>  {
>  SCSIDiskState *s = DO_UPCAST(SCSIDiskState, qdev, dev);
> -AioContext *ctx = NULL;
> +
>  /* can happen for devices without drive. The error message for missing
>   * backend will be issued in scsi_realize
>   */
>  if (s->qdev.conf.blk) {
> -ctx = blk_get_aio_context(s->qdev.conf.blk);
> -aio_context_acquire(ctx);
>  if (!blkconf_blocksizes(&s->qdev.conf, errp)) {
>  goto out;
>  }
> @@ -2549,15 +2543,11 @@ static void scsi_hd_realize(SCSIDevice *dev, Error 
> **errp)
>  }
>  scsi_realize(&s->qdev, errp);
>  out:
> -if (ctx) {
> -aio_context_release(ctx);
> -}
>  }

This doesn't build for me:

../hw/scsi/scsi-disk.c:2545:1: error: label at end of compound statement is a 
C2x extension [-Werror,-Wc2x-extensions]
}
^
1 error generated.

Kevin




Re: [PATCH 01/12] virtio-scsi: replace AioContext lock with tmf_bh_lock

2023-12-04 Thread Kevin Wolf
Am 29.11.2023 um 20:55 hat Stefan Hajnoczi geschrieben:
> Protect the Task Management Function BH state with a lock. The TMF BH
> runs in the main loop thread. An IOThread might process a TMF at the
> same time as the TMF BH is running. Therefore tmf_bh_list and tmf_bh
> must be protected by a lock.
> 
> Run TMF request completion in the IOThread using aio_wait_bh_oneshot().
> This avoids more locking to protect the virtqueue and SCSI layer state.
> 
> Signed-off-by: Stefan Hajnoczi 

Reviewed-by: Kevin Wolf 




Re: [PATCH 01/12] virtio-scsi: replace AioContext lock with tmf_bh_lock

2023-12-04 Thread Kevin Wolf
Am 29.11.2023 um 20:55 hat Stefan Hajnoczi geschrieben:
> Protect the Task Management Function BH state with a lock. The TMF BH
> runs in the main loop thread. An IOThread might process a TMF at the
> same time as the TMF BH is running. Therefore tmf_bh_list and tmf_bh
> must be protected by a lock.
> 
> Run TMF request completion in the IOThread using aio_wait_bh_oneshot().
> This avoids more locking to protect the virtqueue and SCSI layer state.
> 
> Signed-off-by: Stefan Hajnoczi 

The second part reminds me that the implicit protection of the virtqueue
and SCSI data structures by having all accesses in a single thread is
hard to review and I think we wanted to put some assertions there to
check that we're really running in the right thread. I don't think we
have done that so far, so I suppose after this patch would be the place
in the series to add them, before we remove the protection by the
AioContext lock?

Kevin




Re: [PATCH 02/12] tests: remove aio_context_acquire() tests

2023-12-04 Thread Kevin Wolf
Am 29.11.2023 um 20:55 hat Stefan Hajnoczi geschrieben:
> The aio_context_acquire() API is being removed. Drop the test case that
> calls the API.
> 
> Signed-off-by: Stefan Hajnoczi 

Reviewed-by: Kevin Wolf 




Re: [PATCH 1/2] x86/xen: add CPU dependencies for 32-bit build

2023-12-04 Thread Juergen Gross

On 04.12.23 09:47, Arnd Bergmann wrote:

From: Arnd Bergmann 

Xen only supports modern CPUs even when running a 32-bit kernel, and it now
requires a kernel built for a 64 byte (or larger) cache line:

In file included from :
In function 'xen_vcpu_setup',
 inlined from 'xen_vcpu_setup_restore' at arch/x86/xen/enlighten.c:111:3,
 inlined from 'xen_vcpu_restore' at arch/x86/xen/enlighten.c:141:3:
include/linux/compiler_types.h:435:45: error: call to '__compiletime_assert_287' 
declared with attribute error: BUILD_BUG_ON failed: sizeof(*vcpup) > 
SMP_CACHE_BYTES
arch/x86/xen/enlighten.c:166:9: note: in expansion of macro 'BUILD_BUG_ON'
   166 | BUILD_BUG_ON(sizeof(*vcpup) > SMP_CACHE_BYTES);
   | ^~~~

Enforce the dependency with a whitelist of CPU configurations. In normal
distro kernels, CONFIG_X86_GENERIC is enabled, and this works fine. When this
is not set, still allow Xen to be built on kernels that target a 64-bit
capable CPU.

Fixes: db2832309a82 ("x86/xen: fix percpu vcpu_info allocation")
Signed-off-by: Arnd Bergmann 


Reviewed-by: Juergen Gross 


Juergen



OpenPGP_0xB0DE9DD628BF132F.asc
Description: OpenPGP public key


OpenPGP_signature.asc
Description: OpenPGP digital signature


Re: [PATCH] sched: correct sched_move_domain()'s cleanup path

2023-12-04 Thread George Dunlap
On Mon, Dec 4, 2023 at 10:57 AM Jan Beulich  wrote:
>
> It is only in the error case that we want to clean up the new pool's
> scheduler data; in the success case it's rather the old scheduler's
> data which needs cleaning up.
>
> Reported-by: René Winther Højgaard 
> Signed-off-by: Jan Beulich 
> Reviewed-by: Juergen Gross 
>
> --- a/xen/common/sched/core.c
> +++ b/xen/common/sched/core.c
> @@ -810,7 +810,7 @@ int sched_move_domain(struct domain *d,
>  for ( unit = old_units; unit; )
>  {
>  if ( unit->priv )
> -sched_free_udata(c->sched, unit->priv);
> +sched_free_udata(ret ? c->sched : old_ops, unit->priv);
>  old_unit = unit;
>  unit = unit->next_in_list;
>  xfree(old_unit);

This code is unfortunately written in a "clever" way which seems to
have introduced some confusion.  The one place which calls "goto
out_free" goes through and replaces *most* of the "old_*" variables
with the "new" equivalents.  That's why we're iterating over
`old_units` even on the failure path.

The result is that this change doesn't catch another bug on the
following line, in the error case:

sched_free_domdata(old_ops, old_domdata);

At this point, old_ops is still the old ops, but old_domdata is the
*new* domdata.

A patch like the following (compile tested only) would fix it along
the lines of the original intent:
8<---
diff --git a/xen/common/sched/core.c b/xen/common/sched/core.c
index eba0cea4bb..78f21839d3 100644
--- a/xen/common/sched/core.c
+++ b/xen/common/sched/core.c
@@ -720,6 +720,7 @@ int sched_move_domain(struct domain *d, struct cpupool *c)
 {
 old_units = new_units;
 old_domdata = domdata;
+old_ops = c->sched;
 ret = -ENOMEM;
 goto out_free;
 }
@@ -809,10 +810,15 @@ int sched_move_domain(struct domain *d, struct cpupool *c)
 domain_unpause(d);

  out_free:
+/*
+ * NB if we've jumped here, "old_units", "old_ops", and so on will
+ * actually be pointing to the new ops, since when aborting it's
+ * the new ops we want to free.
+ */
 for ( unit = old_units; unit; )
 {
 if ( unit->priv )
-sched_free_udata(c->sched, unit->priv);
+sched_free_udata(old_ops, unit->priv);
 old_unit = unit;
 unit = unit->next_in_list;
 xfree(old_unit);
>8

But given that this kind of cleverness has already fooled two of our
most senior developers, I'd suggest making the whole thing more
explicit; something like the attached (again compile-tested only)?

 -George


0001-sched-clarify-and-correct-sched_move_domain-s-cleanu.patch
Description: Binary data


Re: [RFC PATCH] xen/arm: Add emulation of Debug Data Transfer Registers

2023-12-04 Thread Ayan Kumar Halder



On 04/12/2023 10:31, Julien Grall wrote:

Hi Ayan,

Hi Julien,


On 01/12/2023 18:50, Ayan Kumar Halder wrote:

Currently if user enables HVC_DCC config option in Linux, it invokes
access to debug data transfer registers (ie DBGDTRTX_EL0 on arm64,
DBGDTRTXINT on arm32). As these registers are not emulated, Xen injects
an undefined exception to the guest. And Linux crashes.


I am missing some data points here to be able to say whether I would 
be ok with emulating the registers. So some questions:
  * As you wrote below, HVC_DCC will return -ENODEV after this 
emulation. So may I ask what's the use case to enable it? (e.g. is 
there a distro turning this on?)


I am not aware of any distro using (or not using) this feature. This 
issue came to light during our internal testing, when HVC_DCC was 
enabled to use the debug console. When Linux runs without Xen, then we 
could observe the logs on the debug console. When Linux was running as a 
VM, it crashed.


My intention here was to do the bare minimum emulation so that the crash 
could be avoided.


 * Linux is writing to the registers unconditionally, but is the spec 
mandating the implementation of the registers? (I couldn't find either 
way)


From ARM DDI 0487I.a ID081822, H1.2, External debug,

"The Debug Communications Channel, DCC, passes data between the PE and 
the debugger:


— The DCC includes the data transfer registers, DTRRX and DTRTX, and 
associated flow-control flags.


— Although the DCC is an essential part of Debug state operation, it can 
also be used in Non-debug state."


From this I infer that the spec mandates the implementation of these 
registers. IOW, these registers should always be present in any Arm 
compliant SoC.



 * When was this check introduced in Linux? Did it ever changed?


This check was introduced in the following commit

"commit f35dc083506e2fd7739d8615971c46b5246e
Author: Rob Herring 
Date:   Tue Sep 24 21:05:58 2013 -0500

    TTY: hvc_dcc: probe for a JTAG connection before registering

    Enabling the ARM DCC console and using without a JTAG connection will
    simply hang the system. Since distros like to turn on all options, this
    is a reoccurring problem to debug. We can do better by checking if
    anything is attached and handling characters. There is no way to probe
    this, so send a newline and check that it is handled.
"

As of today, this check hasn't changed.



We wish to avoid this crash by adding a "partial" emulation. DBGDTR_EL0
is emulated as TXfull | RXfull.


Skimming through the Arm Arm, I see that TXfull and Rxfull indicates 
that both buffers are full but it doesn't explicitly say this means 
the feature is not available.

We are not saying that the feature is not available. Rather ...


I understand this is what Linux checks, but if we want to partially 
emulate the registers in Xen, then I'd like us to make sure this is 
backed by the Arm Arm rather than based on Linux implementation (which 
can change at any point).



Refer ARM DDI 0487I.a ID081822, D17.3.8, DBGDTRTX_EL0
"If TXfull is set to 1, set DTRRX and DTRTX to UNKNOWN"
Also D17.3.7 DBGDTRRX_EL0,
" If RXfull is set to 1, return the last value written to DTRRX."

Thus, any OS is expected to read DBGDTR_EL0 and check for TXfull
before using DBGDTRTX_EL0. Linux does it via hvc_dcc_init() --->
hvc_dcc_check() , it returns -ENODEV. In this way, we are preventing
the guest to be aborted.


See above, what guarantees us that Linux will not change this behavior 
in the future?


If I understand "If TXfull is set to 1, set DTRRX and DTRTX to UNKNOWN" 
correctly, it seems that Arm ARM expects OS to check for TXfull.


If the condition is true, it should return some error.

Let me know if I misunderstood this.





We also emulate DBGDTRTX_EL0 as RAZ/WI.

We have added emulation for AArch32 variant of these registers as well.
Also, we have added handle_read_val_wi() to emulate DBGDSCREXT register
to return a specific value (ie TXfull | RXfull) and ignore any writes
to this register.

Signed-off-by: Michal Orzel 
Signed-off-by: Ayan Kumar Halder 


We usually expect the first Signed-off-by to also be the author. So 
should Michal be the author of this patch?

Yes, I will make Michal as the author.



---
  xen/arch/arm/arm64/vsysreg.c | 21 ++
  xen/arch/arm/include/asm/arm64/hsr.h |  3 +++
  xen/arch/arm/include/asm/cpregs.h    |  2 ++
  xen/arch/arm/include/asm/traps.h |  4 
  xen/arch/arm/traps.c | 18 +++
  xen/arch/arm/vcpreg.c    | 33 +---
  6 files changed, 69 insertions(+), 12 deletions(-)

diff --git a/xen/arch/arm/arm64/vsysreg.c b/xen/arch/arm/arm64/vsysreg.c
index b5d54c569b..5082dfb02e 100644
--- a/xen/arch/arm/arm64/vsysreg.c
+++ b/xen/arch/arm/arm64/vsysreg.c
@@ -159,9 +159,6 @@ void do_sysreg(struct cpu_user_regs *regs,
   *
   * Unhandled:
   *    MDCCINT_EL1
- *    DBGDTR_EL0
- *    DBGDTRRX_EL0
-   

Re: [PATCH] bump default SeaBIOS version to 1.16.3

2023-12-04 Thread Roger Pau Monné
On Mon, Dec 04, 2023 at 12:12:58PM +0100, Jan Beulich wrote:
> Signed-off-by: Jan Beulich 

Reviewed-by: Roger Pau Monné 

Thanks, Roger.



Re: [PATCH 03/12] aio: make aio_context_acquire()/aio_context_release() a no-op

2023-12-04 Thread Kevin Wolf
Am 29.11.2023 um 20:55 hat Stefan Hajnoczi geschrieben:
> aio_context_acquire()/aio_context_release() has been replaced by
> fine-grained locking to protect state shared by multiple threads. The
> AioContext lock still plays the role of balancing locking in
> AIO_WAIT_WHILE() and many functions in QEMU either require that the
> AioContext lock is held or not held for this reason. In other words, the
> AioContext lock is purely there for consistency with itself and serves
> no real purpose anymore.
> 
> Stop actually acquiring/releasing the lock in
> aio_context_acquire()/aio_context_release() so that subsequent patches
> can remove callers across the codebase incrementally.
> 
> I have performed "make check" and qemu-iotests stress tests across
> x86-64, ppc64le, and aarch64 to confirm that there are no failures as a
> result of eliminating the lock.
> 
> Signed-off-by: Stefan Hajnoczi 

YOLO.

Acked-by: Kevin Wolf 




Re: [PATCH v2] x86emul: avoid triggering event related assertions

2023-12-04 Thread Andrew Cooper
On 17/04/2023 1:23 pm, Jan Beulich wrote:
> The assertion at the end of x86_emulate_wrapper() as well as the ones
> in x86_emul_{hw_exception,pagefault}() can trigger if we ignore
> X86EMUL_EXCEPTION coming back from certain hook functions. Squash
> exceptions when merely probing MSRs, plus on SWAPGS'es "best effort"
> error handling path.
>
> In adjust_bnd() add another assertion after the read_xcr(0, ...)
> invocation, paralleling the one in x86emul_get_fpu() - XCR0 reads should
> never fault when XSAVE is (implicitly) known to be available.
>
> Also update the respective comment in x86_emulate_wrapper().
>
> Fixes: 14a6be89ec04 ("x86emul: correct EFLAGS.TF handling")
> Fixes: cb2626c75813 ("x86emul: conditionally clear BNDn for branches")
> Fixes: 6eb43fcf8a0b ("x86emul: support SWAPGS")
> Reported-by: AFL
> Signed-off-by: Jan Beulich 

Acked-by: Andrew Cooper 



Re: [PATCH 04/12] graph-lock: remove AioContext locking

2023-12-04 Thread Kevin Wolf
Am 29.11.2023 um 20:55 hat Stefan Hajnoczi geschrieben:
> Stop acquiring/releasing the AioContext lock in
> bdrv_graph_wrlock()/bdrv_graph_unlock() since the lock no longer has any
> effect.
> 
> The distinction between bdrv_graph_wrunlock() and
> bdrv_graph_wrunlock_ctx() becomes meaningless and they can be collapsed
> into one function.
> 
> Signed-off-by: Stefan Hajnoczi 

Reviewed-by: Kevin Wolf 




Re: [PATCH] sched: correct sched_move_domain()'s cleanup path

2023-12-04 Thread Juergen Gross

On 04.12.23 14:00, George Dunlap wrote:

On Mon, Dec 4, 2023 at 10:57 AM Jan Beulich  wrote:


It is only in the error case that we want to clean up the new pool's
scheduler data; in the success case it's rather the old scheduler's
data which needs cleaning up.

Reported-by: René Winther Højgaard 
Signed-off-by: Jan Beulich 
Reviewed-by: Juergen Gross 

--- a/xen/common/sched/core.c
+++ b/xen/common/sched/core.c
@@ -810,7 +810,7 @@ int sched_move_domain(struct domain *d,
  for ( unit = old_units; unit; )
  {
  if ( unit->priv )
-sched_free_udata(c->sched, unit->priv);
+sched_free_udata(ret ? c->sched : old_ops, unit->priv);
  old_unit = unit;
  unit = unit->next_in_list;
  xfree(old_unit);


This code is unfortunately written in a "clever" way which seems to
have introduced some confusion.  The one place which calls "goto
out_free" goes through and replaces *most* of the "old_*" variables
with the "new" equivalents.  That's why we're iterating over
`old_units` even on the failure path.

The result is that this change doesn't catch another bug on the
following line, in the error case:

sched_free_domdata(old_ops, old_domdata);

At this point, old_ops is still the old ops, but old_domdata is the
*new* domdata.

A patch like the following (compile tested only) would fix it along
the lines of the original intent:
8<---
diff --git a/xen/common/sched/core.c b/xen/common/sched/core.c
index eba0cea4bb..78f21839d3 100644
--- a/xen/common/sched/core.c
+++ b/xen/common/sched/core.c
@@ -720,6 +720,7 @@ int sched_move_domain(struct domain *d, struct cpupool *c)
  {
  old_units = new_units;
  old_domdata = domdata;
+old_ops = c->sched;
  ret = -ENOMEM;
  goto out_free;
  }
@@ -809,10 +810,15 @@ int sched_move_domain(struct domain *d, struct cpupool *c)
  domain_unpause(d);

   out_free:
+/*
+ * NB if we've jumped here, "old_units", "old_ops", and so on will
+ * actually be pointing to the new ops, since when aborting it's
+ * the new ops we want to free.
+ */
  for ( unit = old_units; unit; )
  {
  if ( unit->priv )
-sched_free_udata(c->sched, unit->priv);
+sched_free_udata(old_ops, unit->priv);
  old_unit = unit;
  unit = unit->next_in_list;
  xfree(old_unit);
>8

But given that this kind of cleverness has already fooled two of our
most senior developers, I'd suggest making the whole thing more
explicit; something like the attached (again compile-tested only)?


And I have again a third approach, making it crystal clear what is happening
with which data. No need to explain what is freed via which variables. See
attached patch.

Thoughts?


Juergen


OpenPGP_0xB0DE9DD628BF132F.asc
Description: OpenPGP public key


OpenPGP_signature.asc
Description: OpenPGP digital signature


xen | Successful pipeline for staging | 525c7c09

2023-12-04 Thread GitLab


Pipeline #1093622286 has passed!

Project: xen ( https://gitlab.com/xen-project/xen )
Branch: staging ( https://gitlab.com/xen-project/xen/-/commits/staging )

Commit: 525c7c09 ( 
https://gitlab.com/xen-project/xen/-/commit/525c7c094b258e8a46b494488eef96f5670eb352
 )
Commit Message: xen/arm: Move static event channel feature to a...
Commit Author: Michal Orzel ( https://gitlab.com/orzelmichal )
Committed by: Julien Grall



Pipeline #1093622286 ( 
https://gitlab.com/xen-project/xen/-/pipelines/1093622286 ) triggered by Andrew 
Cooper ( https://gitlab.com/andyhhp )
successfully completed 136 jobs in 3 stages.

-- 
You're receiving this email because of your account on gitlab.com.





Re: [PATCH] sched: correct sched_move_domain()'s cleanup path

2023-12-04 Thread George Dunlap
On Mon, Dec 4, 2023 at 1:47 PM Juergen Gross  wrote:
>
> On 04.12.23 14:00, George Dunlap wrote:
> > On Mon, Dec 4, 2023 at 10:57 AM Jan Beulich  wrote:
> >>
> >> It is only in the error case that we want to clean up the new pool's
> >> scheduler data; in the success case it's rather the old scheduler's
> >> data which needs cleaning up.
> >>
> >> Reported-by: René Winther Højgaard 
> >> Signed-off-by: Jan Beulich 
> >> Reviewed-by: Juergen Gross 
> >>
> >> --- a/xen/common/sched/core.c
> >> +++ b/xen/common/sched/core.c
> >> @@ -810,7 +810,7 @@ int sched_move_domain(struct domain *d,
> >>   for ( unit = old_units; unit; )
> >>   {
> >>   if ( unit->priv )
> >> -sched_free_udata(c->sched, unit->priv);
> >> +sched_free_udata(ret ? c->sched : old_ops, unit->priv);
> >>   old_unit = unit;
> >>   unit = unit->next_in_list;
> >>   xfree(old_unit);
> >
> > This code is unfortunately written in a "clever" way which seems to
> > have introduced some confusion.  The one place which calls "goto
> > out_free" goes through and replaces *most* of the "old_*" variables
> > with the "new" equivalents.  That's why we're iterating over
> > `old_units` even on the failure path.
> >
> > The result is that this change doesn't catch another bug on the
> > following line, in the error case:
> >
> > sched_free_domdata(old_ops, old_domdata);
> >
> > At this point, old_ops is still the old ops, but old_domdata is the
> > *new* domdata.
> >
> > A patch like the following (compile tested only) would fix it along
> > the lines of the original intent:
> > 8<---
> > diff --git a/xen/common/sched/core.c b/xen/common/sched/core.c
> > index eba0cea4bb..78f21839d3 100644
> > --- a/xen/common/sched/core.c
> > +++ b/xen/common/sched/core.c
> > @@ -720,6 +720,7 @@ int sched_move_domain(struct domain *d, struct cpupool 
> > *c)
> >   {
> >   old_units = new_units;
> >   old_domdata = domdata;
> > +old_ops = c->sched;
> >   ret = -ENOMEM;
> >   goto out_free;
> >   }
> > @@ -809,10 +810,15 @@ int sched_move_domain(struct domain *d, struct 
> > cpupool *c)
> >   domain_unpause(d);
> >
> >out_free:
> > +/*
> > + * NB if we've jumped here, "old_units", "old_ops", and so on will
> > + * actually be pointing to the new ops, since when aborting it's
> > + * the new ops we want to free.
> > + */
> >   for ( unit = old_units; unit; )
> >   {
> >   if ( unit->priv )
> > -sched_free_udata(c->sched, unit->priv);
> > +sched_free_udata(old_ops, unit->priv);
> >   old_unit = unit;
> >   unit = unit->next_in_list;
> >   xfree(old_unit);
> > >8
> >
> > But given that this kind of cleverness has already fooled two of our
> > most senior developers, I'd suggest making the whole thing more
> > explicit; something like the attached (again compile-tested only)?
>
> And I have again a third approach, making it crystal clear what is happening
> with which data. No need to explain what is freed via which variables. See
> attached patch.
>
> Thoughts?

I only see a PGP key and signature.  Did you forget to attach the patch?

 -George



Re: [PATCH v9 1/2] xen/vpci: header: status register handler

2023-12-04 Thread Roger Pau Monné
On Fri, Dec 01, 2023 at 10:45:49AM -0500, Stewart Hildebrand wrote:
> Introduce a handler for the PCI status register, with ability to mask
> the capabilities bit. The status register contains RsvdZ bits,
> read-only bits, and write-1-to-clear bits. Additionally, we use RsvdP to
> mask the capabilities bit. Introduce bitmasks to handle these in vPCI.
> If a bit in the bitmask is set, then the special meaning applies:
> 
>   ro_mask: read normal, guest write ignore (preserve on write to hardware)
>   rw1c_mask: read normal, write 1 to clear
>   rsvdp_mask: read as zero, guest write ignore (preserve on write to hardware)
>   rsvdz_mask: read as zero, guest write ignore (write zero to hardware)
> 
> The RO/RW1C/RsvdP/RsvdZ naming and definitions were borrowed from the
> PCI Express Base 6.1 specification. RsvdP/RsvdZ bits help Xen enforce
> our view of the world. Xen preserves the value of read-only bits on
> write to hardware, discarding the guests write value. This is done in
> case hardware wrongly implements R/O bits as R/W.
> 
> The mask_cap_list flag will be set in a follow-on patch.
^ s/patch/change/

> 
> Signed-off-by: Stewart Hildebrand 

Reviewed-by: Roger Pau Monné 

Thanks, Roger.



Re: [PATCH v9 2/2] xen/vpci: header: filter PCI capabilities

2023-12-04 Thread Roger Pau Monné
On Mon, Dec 04, 2023 at 12:48:00PM +0100, Jan Beulich wrote:
> On 01.12.2023 16:45, Stewart Hildebrand wrote:
> > --- a/xen/drivers/pci/pci.c
> > +++ b/xen/drivers/pci/pci.c
> > @@ -39,31 +39,42 @@ unsigned int pci_find_cap_offset(pci_sbdf_t sbdf, 
> > unsigned int cap)
> >  return 0;
> >  }
> >  
> > -unsigned int pci_find_next_cap(pci_sbdf_t sbdf, unsigned int pos,
> > -   unsigned int cap)
> > +unsigned int pci_find_next_cap_ttl(pci_sbdf_t sbdf, unsigned int pos,
> > +   const unsigned int *cap, unsigned int n,
> 
> Imo this would better be written as cap[] (or even caps[]).
> 
> > @@ -545,6 +546,68 @@ static int cf_check init_bars(struct pci_dev *pdev)
> >  if ( rc )
> >  return rc;
> >  
> > +if ( !is_hardware_domain(pdev->domain) )
> > +{
> > +if ( pci_conf_read16(pdev->sbdf, PCI_STATUS) & PCI_STATUS_CAP_LIST 
> > )
> > +{
> > +/* Only expose capabilities to the guest that vPCI can handle. 
> > */
> > +unsigned int next, ttl = 48;
> > +const unsigned int supported_caps[] = {
> > +PCI_CAP_ID_MSI,
> > +PCI_CAP_ID_MSIX,
> > +};
> 
> static?
> 
> With the two adjustments
> Reviewed-by: Jan Beulich 

FTAOD: please also keep my RB when doing those adjustments.

Thanks, Roger.



Re: [PATCH] sched: correct sched_move_domain()'s cleanup path

2023-12-04 Thread Juergen Gross

On 04.12.23 14:00, George Dunlap wrote:

On Mon, Dec 4, 2023 at 10:57 AM Jan Beulich  wrote:


It is only in the error case that we want to clean up the new pool's
scheduler data; in the success case it's rather the old scheduler's
data which needs cleaning up.

Reported-by: René Winther Højgaard 
Signed-off-by: Jan Beulich 
Reviewed-by: Juergen Gross 

--- a/xen/common/sched/core.c
+++ b/xen/common/sched/core.c
@@ -810,7 +810,7 @@ int sched_move_domain(struct domain *d,
  for ( unit = old_units; unit; )
  {
  if ( unit->priv )
-sched_free_udata(c->sched, unit->priv);
+sched_free_udata(ret ? c->sched : old_ops, unit->priv);
  old_unit = unit;
  unit = unit->next_in_list;
  xfree(old_unit);


This code is unfortunately written in a "clever" way which seems to
have introduced some confusion.  The one place which calls "goto
out_free" goes through and replaces *most* of the "old_*" variables
with the "new" equivalents.  That's why we're iterating over
`old_units` even on the failure path.

The result is that this change doesn't catch another bug on the
following line, in the error case:

sched_free_domdata(old_ops, old_domdata);

At this point, old_ops is still the old ops, but old_domdata is the
*new* domdata.

A patch like the following (compile tested only) would fix it along
the lines of the original intent:
8<---
diff --git a/xen/common/sched/core.c b/xen/common/sched/core.c
index eba0cea4bb..78f21839d3 100644
--- a/xen/common/sched/core.c
+++ b/xen/common/sched/core.c
@@ -720,6 +720,7 @@ int sched_move_domain(struct domain *d, struct cpupool *c)
  {
  old_units = new_units;
  old_domdata = domdata;
+old_ops = c->sched;
  ret = -ENOMEM;
  goto out_free;
  }
@@ -809,10 +810,15 @@ int sched_move_domain(struct domain *d, struct cpupool *c)
  domain_unpause(d);

   out_free:
+/*
+ * NB if we've jumped here, "old_units", "old_ops", and so on will
+ * actually be pointing to the new ops, since when aborting it's
+ * the new ops we want to free.
+ */
  for ( unit = old_units; unit; )
  {
  if ( unit->priv )
-sched_free_udata(c->sched, unit->priv);
+sched_free_udata(old_ops, unit->priv);
  old_unit = unit;
  unit = unit->next_in_list;
  xfree(old_unit);
>8

But given that this kind of cleverness has already fooled two of our
most senior developers, I'd suggest making the whole thing more
explicit; something like the attached (again compile-tested only)?


And I have again a third approach, making it crystal clear what is happening
with which data. No need to explain what is freed via which variables. See
attached patch (this time it should be really there).

Thoughts?


Juergen


OpenPGP_0xB0DE9DD628BF132F.asc
Description: OpenPGP public key
From b8313d7060e9499fa7580265bda703cce05fe46a Mon Sep 17 00:00:00 2001
From: Juergen Gross 
Date: Mon, 4 Dec 2023 14:15:45 +0100
Subject: [PATCH] xen/sched: fix sched_move_domain()
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Do cleanup in sched_move_domain() in a dedicated service function,
which is called either in error case with newly allocated data, or in
success case with the old data to be freed.

This will at once fix some subtle bugs which sneaked in due to
forgetting to overwrite some pointers in the error case.

Fixes: 70fadc41635b ("xen/cpupool: support moving domain between cpupools with different granularity")
Reported-by: René Winther Højgaard 
Initial-fix-by: Jan Beulich 
Initial-fix-by: George Dunlap 
Signed-off-by: Juergen Gross 
---
 xen/common/sched/core.c | 47 +++--
 1 file changed, 27 insertions(+), 20 deletions(-)

diff --git a/xen/common/sched/core.c b/xen/common/sched/core.c
index eba0cea4bb..fd92e9b13a 100644
--- a/xen/common/sched/core.c
+++ b/xen/common/sched/core.c
@@ -647,6 +647,24 @@ static void sched_move_irqs(const struct sched_unit *unit)
 vcpu_move_irqs(v);
 }
 
+static void sched_move_domain_cleanup(struct scheduler *ops,
+  struct sched_unit *units,
+  void *domdata)
+{
+struct sched_unit *unit, *old_unit;
+
+for ( unit = units; unit; )
+{
+if ( unit->priv )
+sched_free_udata(ops, unit->priv);
+old_unit = unit;
+unit = unit->next_in_list;
+xfree(old_unit);
+}
+
+sched_free_domdata(ops, domdata);
+}
+
 /*
  * Move a domain from one cpupool to another.
  *
@@ -686,7 +704,6 @@ int sched_move_domain(struct domain *d, struct cpupool *c)
 void *old_domdata;
 unsigned int gran = cpupool_get_granularity(c);
 unsigned int n_units = d->vcpu[0] ? DIV_ROUND_UP(d->max_vcpus, gran) : 0;
-int ret = 0;
 
 for_each_vcpu ( d, v )
 {
@@ -699,8 +716,9 @@ int sched_move_domain(stru

Re: [PATCH] sched: correct sched_move_domain()'s cleanup path

2023-12-04 Thread George Dunlap
On Mon, Dec 4, 2023 at 2:10 PM Juergen Gross  wrote:
>
> On 04.12.23 14:00, George Dunlap wrote:
> > On Mon, Dec 4, 2023 at 10:57 AM Jan Beulich  wrote:
> >>
> >> It is only in the error case that we want to clean up the new pool's
> >> scheduler data; in the success case it's rather the old scheduler's
> >> data which needs cleaning up.
> >>
> >> Reported-by: René Winther Højgaard 
> >> Signed-off-by: Jan Beulich 
> >> Reviewed-by: Juergen Gross 
> >>
> >> --- a/xen/common/sched/core.c
> >> +++ b/xen/common/sched/core.c
> >> @@ -810,7 +810,7 @@ int sched_move_domain(struct domain *d,
> >>   for ( unit = old_units; unit; )
> >>   {
> >>   if ( unit->priv )
> >> -sched_free_udata(c->sched, unit->priv);
> >> +sched_free_udata(ret ? c->sched : old_ops, unit->priv);
> >>   old_unit = unit;
> >>   unit = unit->next_in_list;
> >>   xfree(old_unit);
> >
> > This code is unfortunately written in a "clever" way which seems to
> > have introduced some confusion.  The one place which calls "goto
> > out_free" goes through and replaces *most* of the "old_*" variables
> > with the "new" equivalents.  That's why we're iterating over
> > `old_units` even on the failure path.
> >
> > The result is that this change doesn't catch another bug on the
> > following line, in the error case:
> >
> > sched_free_domdata(old_ops, old_domdata);
> >
> > At this point, old_ops is still the old ops, but old_domdata is the
> > *new* domdata.
> >
> > A patch like the following (compile tested only) would fix it along
> > the lines of the original intent:
> > 8<---
> > diff --git a/xen/common/sched/core.c b/xen/common/sched/core.c
> > index eba0cea4bb..78f21839d3 100644
> > --- a/xen/common/sched/core.c
> > +++ b/xen/common/sched/core.c
> > @@ -720,6 +720,7 @@ int sched_move_domain(struct domain *d, struct cpupool 
> > *c)
> >   {
> >   old_units = new_units;
> >   old_domdata = domdata;
> > +old_ops = c->sched;
> >   ret = -ENOMEM;
> >   goto out_free;
> >   }
> > @@ -809,10 +810,15 @@ int sched_move_domain(struct domain *d, struct 
> > cpupool *c)
> >   domain_unpause(d);
> >
> >out_free:
> > +/*
> > + * NB if we've jumped here, "old_units", "old_ops", and so on will
> > + * actually be pointing to the new ops, since when aborting it's
> > + * the new ops we want to free.
> > + */
> >   for ( unit = old_units; unit; )
> >   {
> >   if ( unit->priv )
> > -sched_free_udata(c->sched, unit->priv);
> > +sched_free_udata(old_ops, unit->priv);
> >   old_unit = unit;
> >   unit = unit->next_in_list;
> >   xfree(old_unit);
> > >8
> >
> > But given that this kind of cleverness has already fooled two of our
> > most senior developers, I'd suggest making the whole thing more
> > explicit; something like the attached (again compile-tested only)?
>
> And I have again a third approach, making it crystal clear what is happening
> with which data. No need to explain what is freed via which variables. See
> attached patch (this time it should be really there).

Yes, I thought about making a function as well -- that works for me too.

Personally I prefer to keep the "goto out", rather than duplicating
the rcu_read_unlock().  I'd yield if Jan said he preferred
duplication, however.

 -George



Re: [PATCH 05/12] block: remove AioContext locking

2023-12-04 Thread Kevin Wolf
Am 29.11.2023 um 20:55 hat Stefan Hajnoczi geschrieben:
> This is the big patch that removes
> aio_context_acquire()/aio_context_release() from the block layer and
> affected block layer users.
> 
> There isn't a clean way to split this patch and the reviewers are likely
> the same group of people, so I decided to do it in one patch.
> 
> Signed-off-by: Stefan Hajnoczi 

> @@ -7585,29 +7433,12 @@ void coroutine_fn bdrv_co_leave(BlockDriverState *bs, 
> AioContext *old_ctx)
>  
>  void coroutine_fn bdrv_co_lock(BlockDriverState *bs)
>  {
> -AioContext *ctx = bdrv_get_aio_context(bs);
> -
> -/* In the main thread, bs->aio_context won't change concurrently */
> -assert(qemu_get_current_aio_context() == qemu_get_aio_context());
> -
> -/*
> - * We're in coroutine context, so we already hold the lock of the main
> - * loop AioContext. Don't lock it twice to avoid deadlocks.
> - */
> -assert(qemu_in_coroutine());
> -if (ctx != qemu_get_aio_context()) {
> -aio_context_acquire(ctx);
> -}
> +/* TODO removed in next patch */
>  }

It's still there at the end of the series.

>  void coroutine_fn bdrv_co_unlock(BlockDriverState *bs)
>  {
> -AioContext *ctx = bdrv_get_aio_context(bs);
> -
> -assert(qemu_in_coroutine());
> -if (ctx != qemu_get_aio_context()) {
> -aio_context_release(ctx);
> -}
> +/* TODO removed in next patch */
>  }

This one, too.

Reviewed-by: Kevin Wolf 




Re: [PATCH] sched: correct sched_move_domain()'s cleanup path

2023-12-04 Thread Jan Beulich
On 04.12.2023 15:10, Juergen Gross wrote:
> And I have again a third approach, making it crystal clear what is happening
> with which data. No need to explain what is freed via which variables. See
> attached patch (this time it should be really there).

Looks more neat to me than George's. Just one minor thing: Please can the
first parameter of sched_move_domain_cleanup() be constified?

Jan



Re: [PATCH] sched: correct sched_move_domain()'s cleanup path

2023-12-04 Thread Jan Beulich
On 04.12.2023 15:18, George Dunlap wrote:
> On Mon, Dec 4, 2023 at 2:10 PM Juergen Gross  wrote:
>>
>> On 04.12.23 14:00, George Dunlap wrote:
>>> On Mon, Dec 4, 2023 at 10:57 AM Jan Beulich  wrote:

 It is only in the error case that we want to clean up the new pool's
 scheduler data; in the success case it's rather the old scheduler's
 data which needs cleaning up.

 Reported-by: René Winther Højgaard 
 Signed-off-by: Jan Beulich 
 Reviewed-by: Juergen Gross 

 --- a/xen/common/sched/core.c
 +++ b/xen/common/sched/core.c
 @@ -810,7 +810,7 @@ int sched_move_domain(struct domain *d,
   for ( unit = old_units; unit; )
   {
   if ( unit->priv )
 -sched_free_udata(c->sched, unit->priv);
 +sched_free_udata(ret ? c->sched : old_ops, unit->priv);
   old_unit = unit;
   unit = unit->next_in_list;
   xfree(old_unit);
>>>
>>> This code is unfortunately written in a "clever" way which seems to
>>> have introduced some confusion.  The one place which calls "goto
>>> out_free" goes through and replaces *most* of the "old_*" variables
>>> with the "new" equivalents.  That's why we're iterating over
>>> `old_units` even on the failure path.
>>>
>>> The result is that this change doesn't catch another bug on the
>>> following line, in the error case:
>>>
>>> sched_free_domdata(old_ops, old_domdata);
>>>
>>> At this point, old_ops is still the old ops, but old_domdata is the
>>> *new* domdata.
>>>
>>> A patch like the following (compile tested only) would fix it along
>>> the lines of the original intent:
>>> 8<---
>>> diff --git a/xen/common/sched/core.c b/xen/common/sched/core.c
>>> index eba0cea4bb..78f21839d3 100644
>>> --- a/xen/common/sched/core.c
>>> +++ b/xen/common/sched/core.c
>>> @@ -720,6 +720,7 @@ int sched_move_domain(struct domain *d, struct cpupool 
>>> *c)
>>>   {
>>>   old_units = new_units;
>>>   old_domdata = domdata;
>>> +old_ops = c->sched;
>>>   ret = -ENOMEM;
>>>   goto out_free;
>>>   }
>>> @@ -809,10 +810,15 @@ int sched_move_domain(struct domain *d, struct 
>>> cpupool *c)
>>>   domain_unpause(d);
>>>
>>>out_free:
>>> +/*
>>> + * NB if we've jumped here, "old_units", "old_ops", and so on will
>>> + * actually be pointing to the new ops, since when aborting it's
>>> + * the new ops we want to free.
>>> + */
>>>   for ( unit = old_units; unit; )
>>>   {
>>>   if ( unit->priv )
>>> -sched_free_udata(c->sched, unit->priv);
>>> +sched_free_udata(old_ops, unit->priv);
>>>   old_unit = unit;
>>>   unit = unit->next_in_list;
>>>   xfree(old_unit);
>>> >8
>>>
>>> But given that this kind of cleverness has already fooled two of our
>>> most senior developers, I'd suggest making the whole thing more
>>> explicit; something like the attached (again compile-tested only)?
>>
>> And I have again a third approach, making it crystal clear what is happening
>> with which data. No need to explain what is freed via which variables. See
>> attached patch (this time it should be really there).
> 
> Yes, I thought about making a function as well -- that works for me too.
> 
> Personally I prefer to keep the "goto out", rather than duplicating
> the rcu_read_unlock().  I'd yield if Jan said he preferred
> duplication, however.

I'm on the edge there actually.

Jan



Re: [PATCH] sched: correct sched_move_domain()'s cleanup path

2023-12-04 Thread Juergen Gross

On 04.12.23 15:38, Jan Beulich wrote:

On 04.12.2023 15:10, Juergen Gross wrote:

And I have again a third approach, making it crystal clear what is happening
with which data. No need to explain what is freed via which variables. See
attached patch (this time it should be really there).


Looks more neat to me than George's. Just one minor thing: Please can the
first parameter of sched_move_domain_cleanup() be constified?


Yes, will do that.

I'll send out V2 soon together with the other fix (this probably wants
an update of the commit message) and a small cleanup patch I have.


Juergen


OpenPGP_0xB0DE9DD628BF132F.asc
Description: OpenPGP public key


OpenPGP_signature.asc
Description: OpenPGP digital signature


Re: [linux-linus test] 183981: regressions - FAIL

2023-12-04 Thread Anthony PERARD
On Mon, Dec 04, 2023 at 06:56:50AM +0100, Juergen Gross wrote:
> On 04.12.23 00:51, osstest service owner wrote:
> > flight 183981 linux-linus real [real]
> > http://logs.test-lab.xenproject.org/osstest/logs/183981/
> > 
> > Regressions :-(
> > 
> > Tests which did not succeed and are blocking,
> > including tests which could not be run:
> >   build-i386-pvops  6 kernel-build fail REGR. vs. 
> > 183973
> 
> This test is nonsense. We don't support running on a Pentium-Pro, which is the
> configured processor for the kernel.
> 
> We should do a 32-bit build using a more recent processor model.

That's Linux doing, that's the default model. osstest doesn't change it
or select anything in particular.

Cheers,

-- 
Anthony PERARD



Re: [PATCH] sched: correct sched_move_domain()'s cleanup path

2023-12-04 Thread Juergen Gross

On 04.12.23 15:39, Jan Beulich wrote:

On 04.12.2023 15:18, George Dunlap wrote:

On Mon, Dec 4, 2023 at 2:10 PM Juergen Gross  wrote:


On 04.12.23 14:00, George Dunlap wrote:

On Mon, Dec 4, 2023 at 10:57 AM Jan Beulich  wrote:


It is only in the error case that we want to clean up the new pool's
scheduler data; in the success case it's rather the old scheduler's
data which needs cleaning up.

Reported-by: René Winther Højgaard 
Signed-off-by: Jan Beulich 
Reviewed-by: Juergen Gross 

--- a/xen/common/sched/core.c
+++ b/xen/common/sched/core.c
@@ -810,7 +810,7 @@ int sched_move_domain(struct domain *d,
   for ( unit = old_units; unit; )
   {
   if ( unit->priv )
-sched_free_udata(c->sched, unit->priv);
+sched_free_udata(ret ? c->sched : old_ops, unit->priv);
   old_unit = unit;
   unit = unit->next_in_list;
   xfree(old_unit);


This code is unfortunately written in a "clever" way which seems to
have introduced some confusion.  The one place which calls "goto
out_free" goes through and replaces *most* of the "old_*" variables
with the "new" equivalents.  That's why we're iterating over
`old_units` even on the failure path.

The result is that this change doesn't catch another bug on the
following line, in the error case:

sched_free_domdata(old_ops, old_domdata);

At this point, old_ops is still the old ops, but old_domdata is the
*new* domdata.

A patch like the following (compile tested only) would fix it along
the lines of the original intent:
8<---
diff --git a/xen/common/sched/core.c b/xen/common/sched/core.c
index eba0cea4bb..78f21839d3 100644
--- a/xen/common/sched/core.c
+++ b/xen/common/sched/core.c
@@ -720,6 +720,7 @@ int sched_move_domain(struct domain *d, struct cpupool *c)
   {
   old_units = new_units;
   old_domdata = domdata;
+old_ops = c->sched;
   ret = -ENOMEM;
   goto out_free;
   }
@@ -809,10 +810,15 @@ int sched_move_domain(struct domain *d, struct cpupool *c)
   domain_unpause(d);

out_free:
+/*
+ * NB if we've jumped here, "old_units", "old_ops", and so on will
+ * actually be pointing to the new ops, since when aborting it's
+ * the new ops we want to free.
+ */
   for ( unit = old_units; unit; )
   {
   if ( unit->priv )
-sched_free_udata(c->sched, unit->priv);
+sched_free_udata(old_ops, unit->priv);
   old_unit = unit;
   unit = unit->next_in_list;
   xfree(old_unit);
>8

But given that this kind of cleverness has already fooled two of our
most senior developers, I'd suggest making the whole thing more
explicit; something like the attached (again compile-tested only)?


And I have again a third approach, making it crystal clear what is happening
with which data. No need to explain what is freed via which variables. See
attached patch (this time it should be really there).


Yes, I thought about making a function as well -- that works for me too.

Personally I prefer to keep the "goto out", rather than duplicating
the rcu_read_unlock().  I'd yield if Jan said he preferred
duplication, however.


I'm on the edge there actually.


In this case I'd prefer it my way, as it avoids having to scroll down to the
out: label to see what is happening there. Additionally it enables to get rid
of the ret variable.

In the end the main part of the patch is the new function, so I'm not really
feeling strong regarding the dropping of "goto out".


Juergen


OpenPGP_0xB0DE9DD628BF132F.asc
Description: OpenPGP public key


OpenPGP_signature.asc
Description: OpenPGP digital signature


Re: [linux-linus test] 183981: regressions - FAIL

2023-12-04 Thread Juergen Gross

On 04.12.23 15:42, Anthony PERARD wrote:

On Mon, Dec 04, 2023 at 06:56:50AM +0100, Juergen Gross wrote:

On 04.12.23 00:51, osstest service owner wrote:

flight 183981 linux-linus real [real]
http://logs.test-lab.xenproject.org/osstest/logs/183981/

Regressions :-(

Tests which did not succeed and are blocking,
including tests which could not be run:
   build-i386-pvops  6 kernel-build fail REGR. vs. 
183973


This test is nonsense. We don't support running on a Pentium-Pro, which is the
configured processor for the kernel.

We should do a 32-bit build using a more recent processor model.


That's Linux doing, that's the default model. osstest doesn't change it
or select anything in particular.


I think X86_GENERIC would be an appropriate selection.


Juergen


OpenPGP_0xB0DE9DD628BF132F.asc
Description: OpenPGP public key


OpenPGP_signature.asc
Description: OpenPGP digital signature


Re: [PATCH 01/12] virtio-scsi: replace AioContext lock with tmf_bh_lock

2023-12-04 Thread Stefan Hajnoczi
On Thu, Nov 30, 2023 at 09:25:52AM -0600, Eric Blake wrote:
> On Wed, Nov 29, 2023 at 02:55:42PM -0500, Stefan Hajnoczi wrote:
> > Protect the Task Management Function BH state with a lock. The TMF BH
> > runs in the main loop thread. An IOThread might process a TMF at the
> > same time as the TMF BH is running. Therefore tmf_bh_list and tmf_bh
> > must be protected by a lock.
> > 
> > Run TMF request completion in the IOThread using aio_wait_bh_oneshot().
> > This avoids more locking to protect the virtqueue and SCSI layer state.
> 
> Are we trying to get this into 8.2?

No. 8.2 still has the AioContext lock is therefore safe without this
patch.

Stefan

> 
> > 
> > Signed-off-by: Stefan Hajnoczi 
> > ---
> >  include/hw/virtio/virtio-scsi.h |  3 +-
> >  hw/scsi/virtio-scsi.c   | 62 ++---
> >  2 files changed, 43 insertions(+), 22 deletions(-)
> >
> 
> Reviewed-by: Eric Blake 
> 
> -- 
> Eric Blake, Principal Software Engineer
> Red Hat, Inc.
> Virtualization:  qemu.org | libguestfs.org
> 


signature.asc
Description: PGP signature


Re: [PATCH v4 3/6] xen: decouple generic xen code from legacy backends codebase

2023-12-04 Thread Paul Durrant

On 02/12/2023 01:41, Volodymyr Babchuk wrote:

In xen-all.c there are unneeded dependencies on xen-legacy-backend.c:

  - xen_init() uses xen_pv_printf() to report errors, but it does not
  provide a pointer to the struct XenLegacyDevice, so it is kind of
  useless, we can use standard error_report() instead.

  - xen-all.c has function xenstore_record_dm_state() which uses global
  variable "xenstore" defined and initialized in xen-legacy-backend.c
  It is used exactly once, so we can just open a new connection to the
  xenstore, update DM state and close connection back.

Those two changes allows us to remove xen-legacy-backend.c at all,
what should be done in the future anyways. But right now this patch
moves us one step close to have QEMU build without legacy Xen
backends.

Signed-off-by: Volodymyr Babchuk 

---

In v4:

  - New in v4, previous was part of "xen: add option to disable legacy
  backends"
  - Do not move xenstore global variable from xen-legacy-backend.c,
instead use a local variable.
---
  accel/xen/xen-all.c | 16 
  1 file changed, 12 insertions(+), 4 deletions(-)



Reviewed-by: Paul Durrant 




Re: [PATCH 01/12] virtio-scsi: replace AioContext lock with tmf_bh_lock

2023-12-04 Thread Stefan Hajnoczi
On Mon, Dec 04, 2023 at 01:46:13PM +0100, Kevin Wolf wrote:
> Am 29.11.2023 um 20:55 hat Stefan Hajnoczi geschrieben:
> > Protect the Task Management Function BH state with a lock. The TMF BH
> > runs in the main loop thread. An IOThread might process a TMF at the
> > same time as the TMF BH is running. Therefore tmf_bh_list and tmf_bh
> > must be protected by a lock.
> > 
> > Run TMF request completion in the IOThread using aio_wait_bh_oneshot().
> > This avoids more locking to protect the virtqueue and SCSI layer state.
> > 
> > Signed-off-by: Stefan Hajnoczi 
> 
> The second part reminds me that the implicit protection of the virtqueue
> and SCSI data structures by having all accesses in a single thread is
> hard to review and I think we wanted to put some assertions there to
> check that we're really running in the right thread. I don't think we
> have done that so far, so I suppose after this patch would be the place
> in the series to add them, before we remove the protection by the
> AioContext lock?

Thanks for reminding me. I will add assertions in the next revision of
this series.

Stefan


signature.asc
Description: PGP signature


Re: [XEN PATCH 1/2] x86/p2m: preparation work for xenmem_add_to_physmap_one()

2023-12-04 Thread Jan Beulich
On 30.11.2023 16:48, Federico Serafini wrote:
> The objective is to use parameter name "gfn" for
> xenmem_add_to_physmap_one().
> Since the name "gfn" is currently used as identifier for a local
> variable, bad things could happen if new uses of such variable are
> committed while a renaming patch is waiting for the approval.
> To avoid such danger, as first thing rename the local variable from
> "gfn" to "gmfn".

"..., in line with XENMAPSPACE_gmfn which is the only case it is used
with."

This is to justify the name not matching our generally aimed at "gfn"
and "mfn" scheme.

> No functional change.
> 
> Signed-off-by: Federico Serafini 

Reviewed-by: Jan Beulich 

Jan



Re: [XEN PATCH 2/2] x86/p2m: address a violation of MISRA C:2012 Rule 8.3

2023-12-04 Thread Jan Beulich
On 30.11.2023 16:48, Federico Serafini wrote:
> Make function declaration and definition consistent changing
> parameter name from "gpfn" to "gfn".
> For consistency, rename also "old_gpfn" to "old_gfn".
> No functional change.
> 
> Signed-off-by: Federico Serafini 

Reviewed-by: Jan Beulich 

> ---
> This patch depends on patch 1/2 of the same series.

There's no need to state this, btw. Within a series later patches depending
on earlier ones if the default. There instead it can help committers if it
is made clear when patches do not depend on one another (and hence can be
committed in a order different from the submission's).

Jan



Re: [PATCH 05/12] block: remove AioContext locking

2023-12-04 Thread Stefan Hajnoczi
On Mon, Dec 04, 2023 at 03:33:57PM +0100, Kevin Wolf wrote:
> Am 29.11.2023 um 20:55 hat Stefan Hajnoczi geschrieben:
> > This is the big patch that removes
> > aio_context_acquire()/aio_context_release() from the block layer and
> > affected block layer users.
> > 
> > There isn't a clean way to split this patch and the reviewers are likely
> > the same group of people, so I decided to do it in one patch.
> > 
> > Signed-off-by: Stefan Hajnoczi 
> 
> > @@ -7585,29 +7433,12 @@ void coroutine_fn bdrv_co_leave(BlockDriverState 
> > *bs, AioContext *old_ctx)
> >  
> >  void coroutine_fn bdrv_co_lock(BlockDriverState *bs)
> >  {
> > -AioContext *ctx = bdrv_get_aio_context(bs);
> > -
> > -/* In the main thread, bs->aio_context won't change concurrently */
> > -assert(qemu_get_current_aio_context() == qemu_get_aio_context());
> > -
> > -/*
> > - * We're in coroutine context, so we already hold the lock of the main
> > - * loop AioContext. Don't lock it twice to avoid deadlocks.
> > - */
> > -assert(qemu_in_coroutine());
> > -if (ctx != qemu_get_aio_context()) {
> > -aio_context_acquire(ctx);
> > -}
> > +/* TODO removed in next patch */
> >  }
> 
> It's still there at the end of the series.

Will fix in v2. Thanks!


signature.asc
Description: PGP signature


Re: [XEN PATCH 2/2] x86/p2m: address a violation of MISRA C:2012 Rule 8.3

2023-12-04 Thread Federico Serafini

On 04/12/23 15:54, Jan Beulich wrote:

On 30.11.2023 16:48, Federico Serafini wrote:

Make function declaration and definition consistent changing
parameter name from "gpfn" to "gfn".
For consistency, rename also "old_gpfn" to "old_gfn".
No functional change.

Signed-off-by: Federico Serafini 


Reviewed-by: Jan Beulich 


---
This patch depends on patch 1/2 of the same series.


There's no need to state this, btw. Within a series later patches depending
on earlier ones if the default. There instead it can help committers if it
is made clear when patches do not depend on one another (and hence can be
committed in a order different from the submission's).


Thanks for the information.

--
Federico Serafini, M.Sc.

Software Engineer, BUGSENG (http://bugseng.com)



[PATCH v2 1/3] xen/sched: fix adding offline cpu to cpupool

2023-12-04 Thread Juergen Gross
Trying to add an offline cpu to a cpupool can crash the hypervisor,
as the probably non-existing percpu area of the cpu is accessed before
the availability of the cpu is being tested. This can happen in case
the cpupool's granularity is "core" or "socket".

Fix that by testing the cpu to be online.

Fixes: cb563d7665f2 ("xen/sched: support core scheduling for moving cpus 
to/from cpupools")
Reported-by: René Winther Højgaard 
Signed-off-by: Juergen Gross 
---
V2:
- enhance commit message
---
 xen/common/sched/cpupool.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/xen/common/sched/cpupool.c b/xen/common/sched/cpupool.c
index 2e094b0cfa..ad8f608462 100644
--- a/xen/common/sched/cpupool.c
+++ b/xen/common/sched/cpupool.c
@@ -892,6 +892,8 @@ int cpupool_do_sysctl(struct xen_sysctl_cpupool_op *op)
 if ( cpu >= nr_cpu_ids )
 goto addcpu_out;
 ret = -ENODEV;
+if ( !cpu_online(cpu) )
+goto addcpu_out;
 cpus = sched_get_opt_cpumask(c->gran, cpu);
 if ( !cpumask_subset(cpus, &cpupool_free_cpus) ||
  cpumask_intersects(cpus, &cpupool_locked_cpus) )
-- 
2.35.3




[PATCH v2 3/3] xen/sched: do some minor cleanup of sched_move_domain()

2023-12-04 Thread Juergen Gross
Do some minor cleanups:

- Move setting of old_domdata and old_units next to each other
- Drop incrementing unit_idx in the final loop of sched_move_domain()
  as it isn't used afterwards
- Rename new_p to new_cpu and unit_p to unit_cpu

Signed-off-by: Juergen Gross 
---
V2:
- new patch
---
 xen/common/sched/core.c | 27 ---
 1 file changed, 12 insertions(+), 15 deletions(-)

diff --git a/xen/common/sched/core.c b/xen/common/sched/core.c
index 901782bbb4..f6ac1e5af8 100644
--- a/xen/common/sched/core.c
+++ b/xen/common/sched/core.c
@@ -698,7 +698,7 @@ int sched_move_domain(struct domain *d, struct cpupool *c)
 struct sched_unit *unit, *old_unit;
 struct sched_unit *new_units = NULL, *old_units;
 struct sched_unit **unit_ptr = &new_units;
-unsigned int new_p, unit_idx;
+unsigned int new_cpu, unit_idx;
 void *domdata;
 struct scheduler *old_ops = dom_scheduler(d);
 void *old_domdata;
@@ -748,13 +748,14 @@ int sched_move_domain(struct domain *d, struct cpupool *c)
 domain_pause(d);
 
 old_domdata = d->sched_priv;
+old_units = d->sched_unit_list;
 
 /*
  * Remove all units from the old scheduler, and temporarily move them to
  * the same processor to make locking easier when moving the new units to
  * new processors.
  */
-new_p = cpumask_first(d->cpupool->cpu_valid);
+new_cpu = cpumask_first(d->cpupool->cpu_valid);
 for_each_sched_unit ( d, unit )
 {
 spinlock_t *lock;
@@ -762,12 +763,10 @@ int sched_move_domain(struct domain *d, struct cpupool *c)
 sched_remove_unit(old_ops, unit);
 
 lock = unit_schedule_lock_irq(unit);
-sched_set_res(unit, get_sched_res(new_p));
+sched_set_res(unit, get_sched_res(new_cpu));
 spin_unlock_irq(lock);
 }
 
-old_units = d->sched_unit_list;
-
 d->cpupool = c;
 d->sched_priv = domdata;
 
@@ -781,32 +780,32 @@ int sched_move_domain(struct domain *d, struct cpupool *c)
 unit->state_entry_time = old_unit->state_entry_time;
 unit->runstate_cnt[v->runstate.state]++;
 /* Temporarily use old resource assignment */
-unit->res = get_sched_res(new_p);
+unit->res = get_sched_res(new_cpu);
 
 v->sched_unit = unit;
 }
 
 d->sched_unit_list = new_units;
 
-new_p = cpumask_first(c->cpu_valid);
+new_cpu = cpumask_first(c->cpu_valid);
 for_each_sched_unit ( d, unit )
 {
 spinlock_t *lock;
-unsigned int unit_p = new_p;
+unsigned int unit_cpu = new_cpu;
 
 for_each_sched_unit_vcpu ( unit, v )
 {
-migrate_timer(&v->periodic_timer, new_p);
-migrate_timer(&v->singleshot_timer, new_p);
-migrate_timer(&v->poll_timer, new_p);
-new_p = cpumask_cycle(new_p, c->cpu_valid);
+migrate_timer(&v->periodic_timer, new_cpu);
+migrate_timer(&v->singleshot_timer, new_cpu);
+migrate_timer(&v->poll_timer, new_cpu);
+new_cpu = cpumask_cycle(new_cpu, c->cpu_valid);
 }
 
 lock = unit_schedule_lock_irq(unit);
 
 sched_set_affinity(unit, &cpumask_all, &cpumask_all);
 
-sched_set_res(unit, get_sched_res(unit_p));
+sched_set_res(unit, get_sched_res(unit_cpu));
 /*
  * With v->processor modified we must not
  * - make any further changes assuming we hold the scheduler lock,
@@ -818,8 +817,6 @@ int sched_move_domain(struct domain *d, struct cpupool *c)
 sched_move_irqs(unit);
 
 sched_insert_unit(c->sched, unit);
-
-unit_idx++;
 }
 
 domain_update_node_affinity(d);
-- 
2.35.3




[PATCH v2 2/3] xen/sched: fix sched_move_domain()

2023-12-04 Thread Juergen Gross
Do cleanup in sched_move_domain() in a dedicated service function,
which is called either in error case with newly allocated data, or in
success case with the old data to be freed.

This will at once fix some subtle bugs which sneaked in due to
forgetting to overwrite some pointers in the error case.

Fixes: 70fadc41635b ("xen/cpupool: support moving domain between cpupools with 
different granularity")
Reported-by: René Winther Højgaard 
Initial-fix-by: Jan Beulich 
Initial-fix-by: George Dunlap 
Signed-off-by: Juergen Gross 
---
V2:
- make ops parameter of new function const (Jan Beulich)
---
 xen/common/sched/core.c | 47 +++--
 1 file changed, 27 insertions(+), 20 deletions(-)

diff --git a/xen/common/sched/core.c b/xen/common/sched/core.c
index eba0cea4bb..901782bbb4 100644
--- a/xen/common/sched/core.c
+++ b/xen/common/sched/core.c
@@ -647,6 +647,24 @@ static void sched_move_irqs(const struct sched_unit *unit)
 vcpu_move_irqs(v);
 }
 
+static void sched_move_domain_cleanup(const struct scheduler *ops,
+  struct sched_unit *units,
+  void *domdata)
+{
+struct sched_unit *unit, *old_unit;
+
+for ( unit = units; unit; )
+{
+if ( unit->priv )
+sched_free_udata(ops, unit->priv);
+old_unit = unit;
+unit = unit->next_in_list;
+xfree(old_unit);
+}
+
+sched_free_domdata(ops, domdata);
+}
+
 /*
  * Move a domain from one cpupool to another.
  *
@@ -686,7 +704,6 @@ int sched_move_domain(struct domain *d, struct cpupool *c)
 void *old_domdata;
 unsigned int gran = cpupool_get_granularity(c);
 unsigned int n_units = d->vcpu[0] ? DIV_ROUND_UP(d->max_vcpus, gran) : 0;
-int ret = 0;
 
 for_each_vcpu ( d, v )
 {
@@ -699,8 +716,9 @@ int sched_move_domain(struct domain *d, struct cpupool *c)
 domdata = sched_alloc_domdata(c->sched, d);
 if ( IS_ERR(domdata) )
 {
-ret = PTR_ERR(domdata);
-goto out;
+rcu_read_unlock(&sched_res_rculock);
+
+return PTR_ERR(domdata);
 }
 
 for ( unit_idx = 0; unit_idx < n_units; unit_idx++ )
@@ -718,10 +736,10 @@ int sched_move_domain(struct domain *d, struct cpupool *c)
 
 if ( !unit || !unit->priv )
 {
-old_units = new_units;
-old_domdata = domdata;
-ret = -ENOMEM;
-goto out_free;
+sched_move_domain_cleanup(c->sched, new_units, domdata);
+rcu_read_unlock(&sched_res_rculock);
+
+return -ENOMEM;
 }
 
 unit_ptr = &unit->next_in_list;
@@ -808,22 +826,11 @@ int sched_move_domain(struct domain *d, struct cpupool *c)
 
 domain_unpause(d);
 
- out_free:
-for ( unit = old_units; unit; )
-{
-if ( unit->priv )
-sched_free_udata(c->sched, unit->priv);
-old_unit = unit;
-unit = unit->next_in_list;
-xfree(old_unit);
-}
-
-sched_free_domdata(old_ops, old_domdata);
+sched_move_domain_cleanup(old_ops, old_units, old_domdata);
 
- out:
 rcu_read_unlock(&sched_res_rculock);
 
-return ret;
+return 0;
 }
 
 void sched_destroy_vcpu(struct vcpu *v)
-- 
2.35.3




[PATCH v2 0/3] xen/sched: fixes and cleanup related to cpupools

2023-12-04 Thread Juergen Gross
Fix 2 bugs related to cpupool handling and do some related cleanups.

V2:
- send out the fixes tagged properly as "PATCH"
- add cleanup patch
 
Juergen Gross (3):
  xen/sched: fix adding offline cpu to cpupool
  xen/sched: fix sched_move_domain()
  xen/sched: do some minor cleanup of sched_move_domain()

 xen/common/sched/core.c| 74 --
 xen/common/sched/cpupool.c |  2 ++
 2 files changed, 41 insertions(+), 35 deletions(-)

-- 
2.35.3




Re: [PATCH 06/12] scsi: remove AioContext locking

2023-12-04 Thread Stefan Hajnoczi
On Mon, Dec 04, 2023 at 01:23:09PM +0100, Kevin Wolf wrote:
> Am 29.11.2023 um 20:55 hat Stefan Hajnoczi geschrieben:
> > The AioContext lock no longer has any effect. Remove it.
> > 
> > Signed-off-by: Stefan Hajnoczi 
> > ---
> >  include/hw/virtio/virtio-scsi.h | 14 --
> >  hw/scsi/scsi-bus.c  |  2 --
> >  hw/scsi/scsi-disk.c | 28 
> >  hw/scsi/virtio-scsi.c   | 18 --
> >  4 files changed, 4 insertions(+), 58 deletions(-)
> 
> > @@ -2531,13 +2527,11 @@ static void scsi_unrealize(SCSIDevice *dev)
> >  static void scsi_hd_realize(SCSIDevice *dev, Error **errp)
> >  {
> >  SCSIDiskState *s = DO_UPCAST(SCSIDiskState, qdev, dev);
> > -AioContext *ctx = NULL;
> > +
> >  /* can happen for devices without drive. The error message for missing
> >   * backend will be issued in scsi_realize
> >   */
> >  if (s->qdev.conf.blk) {
> > -ctx = blk_get_aio_context(s->qdev.conf.blk);
> > -aio_context_acquire(ctx);
> >  if (!blkconf_blocksizes(&s->qdev.conf, errp)) {
> >  goto out;
> >  }
> > @@ -2549,15 +2543,11 @@ static void scsi_hd_realize(SCSIDevice *dev, Error 
> > **errp)
> >  }
> >  scsi_realize(&s->qdev, errp);
> >  out:
> > -if (ctx) {
> > -aio_context_release(ctx);
> > -}
> >  }
> 
> This doesn't build for me:
> 
> ../hw/scsi/scsi-disk.c:2545:1: error: label at end of compound statement is a 
> C2x extension [-Werror,-Wc2x-extensions]
> }
> ^
> 1 error generated.

Will fix in v2. Thanks!

Stefan


signature.asc
Description: PGP signature


Re: [PATCH 05/12] block: remove AioContext locking

2023-12-04 Thread Stefan Hajnoczi
On Thu, Nov 30, 2023 at 03:31:37PM -0600, Eric Blake wrote:
> On Wed, Nov 29, 2023 at 02:55:46PM -0500, Stefan Hajnoczi wrote:
> > This is the big patch that removes
> > aio_context_acquire()/aio_context_release() from the block layer and
> > affected block layer users.
> > 
> > There isn't a clean way to split this patch and the reviewers are likely
> > the same group of people, so I decided to do it in one patch.
> > 
> > Signed-off-by: Stefan Hajnoczi 
> > ---
> 
> > +++ b/block.c
> > @@ -7585,29 +7433,12 @@ void coroutine_fn bdrv_co_leave(BlockDriverState 
> > *bs, AioContext *old_ctx)
> >  
> >  void coroutine_fn bdrv_co_lock(BlockDriverState *bs)
> >  {
> > -AioContext *ctx = bdrv_get_aio_context(bs);
> > -
> > -/* In the main thread, bs->aio_context won't change concurrently */
> > -assert(qemu_get_current_aio_context() == qemu_get_aio_context());
> > -
> > -/*
> > - * We're in coroutine context, so we already hold the lock of the main
> > - * loop AioContext. Don't lock it twice to avoid deadlocks.
> > - */
> > -assert(qemu_in_coroutine());
> 
> Is this assertion worth keeping in the short term?...

Probably not because coroutine vs non-coroutine functions don't change
in this patch series, so it's unlikely that this will break.

> 
> > -if (ctx != qemu_get_aio_context()) {
> > -aio_context_acquire(ctx);
> > -}
> > +/* TODO removed in next patch */
> >  }
> 
> ...I guess I'll see in the next patch.
> 
> >  
> >  void coroutine_fn bdrv_co_unlock(BlockDriverState *bs)
> >  {
> > -AioContext *ctx = bdrv_get_aio_context(bs);
> > -
> > -assert(qemu_in_coroutine());
> > -if (ctx != qemu_get_aio_context()) {
> > -aio_context_release(ctx);
> > -}
> > +/* TODO removed in next patch */
> >  }
> 
> Same comment.
> 
> > +++ b/blockdev.c
> > @@ -1395,7 +1352,6 @@ static void 
> > external_snapshot_action(TransactionAction *action,
> >  /* File name of the new image (for 'blockdev-snapshot-sync') */
> >  const char *new_image_file;
> >  ExternalSnapshotState *state = g_new0(ExternalSnapshotState, 1);
> > -AioContext *aio_context;
> >  uint64_t perm, shared;
> >  
> >  /* TODO We'll eventually have to take a writer lock in this function */
> 
> I'm guessing removal of the locking gets us one step closer to
> implementing this TODO at a later time?  Or is it now a stale comment?
> Either way, it doesn't affect this patch.

I'm not sure. Kevin can answer questions about the graph lock.

> > +++ b/tests/unit/test-blockjob.c
> 
> > -static void test_complete_in_standby(void)
> > -{
> 
> > @@ -531,13 +402,5 @@ int main(int argc, char **argv)
> >  g_test_add_func("/blockjob/cancel/standby", test_cancel_standby);
> >  g_test_add_func("/blockjob/cancel/pending", test_cancel_pending);
> >  g_test_add_func("/blockjob/cancel/concluded", test_cancel_concluded);
> > -
> > -/*
> > - * This test is flaky and sometimes fails in CI and otherwise:
> > - * don't run unless user opts in via environment variable.
> > - */
> > -if (getenv("QEMU_TEST_FLAKY_TESTS")) {
> > -g_test_add_func("/blockjob/complete_in_standby", 
> > test_complete_in_standby);
> > -}
> 
> Looks like you ripped out this entire test, because it is no longer
> viable.  I might have mentioned it in the commit message, or squashed
> the removal of this test into the earlier 02/12 patch.

I have sent a separate patch to remove this test and once it's merged
this hunk will disappear this patch series:
https://lore.kernel.org/qemu-devel/20231127170210.422728-1-stefa...@redhat.com/

Stefan


signature.asc
Description: PGP signature


Re: [XEN PATCH 1/2] x86/p2m: preparation work for xenmem_add_to_physmap_one()

2023-12-04 Thread Federico Serafini

On 04/12/23 15:51, Jan Beulich wrote:

On 30.11.2023 16:48, Federico Serafini wrote:

The objective is to use parameter name "gfn" for
xenmem_add_to_physmap_one().
Since the name "gfn" is currently used as identifier for a local
variable, bad things could happen if new uses of such variable are
committed while a renaming patch is waiting for the approval.
To avoid such danger, as first thing rename the local variable from
"gfn" to "gmfn".


"..., in line with XENMAPSPACE_gmfn which is the only case it is used
with."

This is to justify the name not matching our generally aimed at "gfn"
and "mfn" scheme.


No functional change.

Signed-off-by: Federico Serafini 


Reviewed-by: Jan Beulich 


There is an use of "gfn" also few lines outside of the
switch statement, within an if condition where also XENMAPSPACE_gmfn is
involved:
what is true is that "gfn" is used only when space == XENMAPSPACE_gmfn.

What do you think about improve the description by adding:
"..., in line with XENMAPSPACE_gmfn which is the only *space* it is used
with."

However, the description improvement can be done on commit?

--
Federico Serafini, M.Sc.

Software Engineer, BUGSENG (http://bugseng.com)



Re: [PATCH v11 16/17] xen/arm: vpci: permit access to guest vpci space

2023-12-04 Thread Stewart Hildebrand
On 12/4/23 03:29, Jan Beulich wrote:
> On 02.12.2023 02:27, Volodymyr Babchuk wrote:
>> --- a/xen/common/domain.c
>> +++ b/xen/common/domain.c
>> @@ -695,6 +695,9 @@ struct domain *domain_create(domid_t domid,
>>  radix_tree_init(&d->pirq_tree);
>>  }
>>  
>> +if ( !is_idle_domain(d) )
>> +d->iomem_caps = rangeset_new(d, "I/O Memory", 
>> RANGESETF_prettyprint_hex);
>> +
>>  if ( (err = arch_domain_create(d, config, flags)) != 0 )
>>  goto fail;
>>  init_status |= INIT_arch;
>> @@ -711,7 +714,6 @@ struct domain *domain_create(domid_t domid,
>>  watchdog_domain_init(d);
>>  init_status |= INIT_watchdog;
>>  
>> -d->iomem_caps = rangeset_new(d, "I/O Memory", 
>> RANGESETF_prettyprint_hex);
>>  d->irq_caps   = rangeset_new(d, "Interrupts", 0);
>>  if ( !d->iomem_caps || !d->irq_caps )
>>  goto fail;
> 
> I'm pretty sure I asked before why I/O mem caps' init would be moved, but
> IRQ caps' would remain where it is.

You did. Sorry about that, I made the change locally but forgot to propagate it 
to Volodymyr. I will reply here with the updated patch.

> 
> Jan



[PATCH v11.1 16/17] xen/arm: vpci: permit access to guest vpci space

2023-12-04 Thread Stewart Hildebrand
Move iomem_caps initialization earlier (before arch_domain_create()).

Signed-off-by: Stewart Hildebrand 
Signed-off-by: Volodymyr Babchuk 
---
Changes in v11:
* move both iomem_caps and irq_caps initializations earlier, along with NULL
  check

Changes in v10:
* fix off-by-one
* also permit access to GUEST_VPCI_PREFETCH_MEM_ADDR

Changes in v9:
* new patch

This is sort of a follow-up to:

  baa6ea700386 ("vpci: add permission checks to map_range()")

I don't believe we need a fixes tag since this depends on the vPCI p2m BAR
patches.
---
 xen/arch/arm/vpci.c |  7 +++
 xen/common/domain.c | 10 +-
 2 files changed, 12 insertions(+), 5 deletions(-)

diff --git a/xen/arch/arm/vpci.c b/xen/arch/arm/vpci.c
index 796ff55d09d0..f8cdd085e27f 100644
--- a/xen/arch/arm/vpci.c
+++ b/xen/arch/arm/vpci.c
@@ -2,6 +2,7 @@
 /*
  * xen/arch/arm/vpci.c
  */
+#include 
 #include 
 #include 
 #include 
@@ -128,6 +129,12 @@ int domain_vpci_init(struct domain *d)
 }
 register_mmio_handler(d, &vpci_mmio_handler,
   GUEST_VPCI_ECAM_BASE, GUEST_VPCI_ECAM_SIZE, 
NULL);
+iomem_permit_access(d, paddr_to_pfn(GUEST_VPCI_MEM_ADDR),
+paddr_to_pfn(GUEST_VPCI_MEM_ADDR +
+ GUEST_VPCI_MEM_SIZE - 1));
+iomem_permit_access(d, paddr_to_pfn(GUEST_VPCI_PREFETCH_MEM_ADDR),
+paddr_to_pfn(GUEST_VPCI_PREFETCH_MEM_ADDR +
+ GUEST_VPCI_PREFETCH_MEM_SIZE - 1));
 }
 
 return 0;
diff --git a/xen/common/domain.c b/xen/common/domain.c
index 26b4d558a41c..0aeb0520c96f 100644
--- a/xen/common/domain.c
+++ b/xen/common/domain.c
@@ -706,6 +706,11 @@ struct domain *domain_create(domid_t domid,
 d->nr_pirqs = min(d->nr_pirqs, nr_irqs);
 
 radix_tree_init(&d->pirq_tree);
+
+d->iomem_caps = rangeset_new(d, "I/O Memory", 
RANGESETF_prettyprint_hex);
+d->irq_caps   = rangeset_new(d, "Interrupts", 0);
+if ( !d->iomem_caps || !d->irq_caps )
+goto fail;
 }
 
 if ( (err = arch_domain_create(d, config, flags)) != 0 )
@@ -724,11 +729,6 @@ struct domain *domain_create(domid_t domid,
 watchdog_domain_init(d);
 init_status |= INIT_watchdog;
 
-d->iomem_caps = rangeset_new(d, "I/O Memory", 
RANGESETF_prettyprint_hex);
-d->irq_caps   = rangeset_new(d, "Interrupts", 0);
-if ( !d->iomem_caps || !d->irq_caps )
-goto fail;
-
 if ( (err = xsm_domain_create(XSM_HOOK, d, config->ssidref)) != 0 )
 goto fail;
 
-- 
2.43.0




  1   2   >