date:20231201

flight 183969 linux-linus real [real]
http://logs.test-lab.xenproject.org/osstest/logs/183969/

Failures :-/ but no regressions.

Tests which did not succeed, but are not blocking:
 test-armhf-armhf-libvirt 16 saverestore-support-check fail like 183961
 test-amd64-amd64-xl-qemut-win7-amd64 19 guest-stop fail like 183961
 test-amd64-amd64-xl-qemuu-win7-amd64 19 guest-stop fail like 183961
 test-amd64-amd64-xl-qemuu-ws16-amd64 19 guest-stop fail like 183961
 test-armhf-armhf-libvirt-raw 15 saverestore-support-check fail like 183961
 test-amd64-amd64-xl-qemut-ws16-amd64 19 guest-stop fail like 183961
 test-armhf-armhf-libvirt-qcow2 15 saverestore-support-check fail like 183961
 test-amd64-amd64-qemuu-nested-amd 20 debian-hvm-install/l1/l2 fail like 183961
 test-amd64-amd64-libvirt 15 migrate-support-check fail never pass
 test-amd64-amd64-libvirt-xsm 15 migrate-support-check fail never pass
 test-arm64-arm64-xl-thunderx 15 migrate-support-check fail never pass
 test-arm64-arm64-xl 15 migrate-support-check fail never pass
 test-arm64-arm64-xl-xsm 15 migrate-support-check fail never pass
 test-arm64-arm64-xl-thunderx 16 saverestore-support-check fail never pass
 test-arm64-arm64-xl 16 saverestore-support-check fail never pass
 test-arm64-arm64-xl-xsm 16 saverestore-support-check fail never pass
 test-arm64-arm64-xl-credit1 15 migrate-support-check fail never pass
 test-arm64-arm64-xl-credit1 16 saverestore-support-check fail never pass
 test-arm64-arm64-libvirt-xsm 15 migrate-support-check fail never pass
 test-arm64-arm64-libvirt-xsm 16 saverestore-support-check fail never pass
 test-arm64-arm64-xl-credit2 15 migrate-support-check fail never pass
 test-arm64-arm64-xl-credit2 16 saverestore-support-check fail never pass
 test-armhf-armhf-xl 15 migrate-support-check fail never pass
 test-armhf-armhf-xl 16 saverestore-support-check fail never pass
 test-armhf-armhf-xl-arndale 15 migrate-support-check fail never pass
 test-armhf-armhf-xl-arndale 16 saverestore-support-check fail never pass
 test-armhf-armhf-libvirt 15 migrate-support-check fail never pass
 test-armhf-armhf-xl-multivcpu 15 migrate-support-check fail never pass
 test-armhf-armhf-xl-multivcpu 16 saverestore-support-check fail never pass
 test-armhf-armhf-xl-rtds 15 migrate-support-check fail never pass
 test-armhf-armhf-xl-rtds 16 saverestore-support-check fail never pass
 test-amd64-amd64-libvirt-raw 14 migrate-support-check fail never pass
 test-amd64-amd64-libvirt-qcow2 14 migrate-support-check fail never pass
 test-arm64-arm64-libvirt-raw 14 migrate-support-check fail never pass
 test-arm64-arm64-libvirt-raw 15 saverestore-support-check fail never pass
 test-amd64-amd64-libvirt-qemuu-debianhvm-amd64-xsm 13 migrate-support-check fail never pass
 test-arm64-arm64-xl-vhd 14 migrate-support-check fail never pass
 test-arm64-arm64-xl-vhd 15 saverestore-support-check fail never pass
 test-armhf-armhf-libvirt-raw 14 migrate-support-check fail never pass
 test-armhf-armhf-xl-vhd 14 migrate-support-check fail never pass
 test-armhf-armhf-xl-vhd 15 saverestore-support-check fail never pass
 test-armhf-armhf-xl-credit2 15 migrate-support-check fail never pass
 test-armhf-armhf-xl-credit2 16 saverestore-support-check fail never pass
 test-armhf-armhf-xl-credit1 15 migrate-support-check fail never pass
 test-armhf-armhf-xl-credit1 16 saverestore-support-check fail never pass
 test-armhf-armhf-libvirt-qcow2 14 migrate-support-check fail never pass

version targeted for testing:
 linux c1c09da07c550971a1764a113963533dcc8e4d2a
baseline version:
 linux 994d5c58e50e91bb02c7be4a91d5186292a895c8

Last test of basis   183961  2023-12-01 08:05:10 Z    0 days
Testing same since   183969  2023-12-01 21:42:21 Z    0 days    1 attempts


People who touched revisions under test:
  Brian Foster 
  Jan Kara 
  Kent Overstreet 
  Linus Torvalds 
  Ritesh Harjani (IBM) 

jobs:
 build-amd64-xsm  pass
 build-arm64-xsm  pass
 build-i386-xsm   pass
 build-amd64  pass
 build-arm64  pass
 build-armhf  pass
 build-i386   pass
 build-amd64-libvirt  pass
 build-arm64-libvirt

[GIT PULL] xen: branch for v6.7-rc4

2023-12-01 Thread Juergen Gross

Linus,

Please git pull the following tag:

 git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip.git for-linus-6.7a-rc4-tag

xen: branch for v6.7-rc4

It contains 2 fixes:

- A fix for the Xen event driver setting the correct return value when
  experiencing an allocation failure

- A fix for allocating space for a struct in the percpu area to not
  cross page boundaries (this one is for x86, a similar one for Arm was
  already in the pull request for rc3)
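The x86 fix is described as keeping a struct in the percpu area from crossing a page boundary. The condition itself is easy to state: an object stays within one page exactly when its first and last bytes have the same page number. Below is a minimal sketch of that check. It assumes 4 KiB pages, and `crosses_page()` is a hypothetical helper, not the code of the actual fix:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Assumption for this sketch: 4 KiB pages, as on x86. */
#define SKETCH_PAGE_SIZE 4096u

/*
 * Hypothetical helper: true if the object [addr, addr + size) touches more
 * than one page, i.e. its first and last bytes have different page numbers.
 */
static bool crosses_page(uintptr_t addr, size_t size)
{
    assert(size > 0);
    return addr / SKETCH_PAGE_SIZE != (addr + size - 1) / SKETCH_PAGE_SIZE;
}
```

For instance, a 64-byte object that starts 32 bytes before a page boundary straddles it. The same object placed on a 64-byte boundary cannot, because 64 divides the page size. This is why aligning an object to a power of two that is at least its own size (and no larger than a page) is one way to rule out a crossing.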


Thanks.

Juergen

 arch/x86/xen/enlighten.c | 6 +-
 arch/x86/xen/xen-ops.h   | 2 +-
 drivers/xen/events/events_base.c | 4 +++-
 3 files changed, 9 insertions(+), 3 deletions(-)

Dan Carpenter (1):
  xen/events: fix error code in xen_bind_pirq_msi_to_irq()

Juergen Gross (1):
  x86/xen: fix percpu vcpu_info allocation
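The event-channel fix is summarized only as setting the correct return value on an allocation failure. One common shape of that class of bug is an error path that jumps to the exit label without assigning an error code, so the caller sees a success value. The sketch below illustrates the pattern generically. `bind_example()` and `struct binding` are invented for illustration and are not the code of xen_bind_pirq_msi_to_irq():

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

/* Hypothetical resource standing in for whatever the real function allocates. */
struct binding {
    int irq;
};

/*
 * Hypothetical bind routine. `simulate_oom` forces the allocation to fail so
 * the error path can be exercised. Returns the irq (>= 0) on success or a
 * negative errno value on failure.
 */
static int bind_example(int irq, int simulate_oom, struct binding **out)
{
    int ret = 0;
    struct binding *b;

    assert(out != NULL);
    *out = NULL;

    b = simulate_oom ? NULL : malloc(sizeof(*b));
    if (!b) {
        /* Set the error code before jumping out; omitting this line would
         * make the function report success (0) on allocation failure. */
        ret = -ENOMEM;
        goto out;
    }

    b->irq = irq;
    *out = b;
    ret = irq;
out:
    return ret;
}
```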

xen | Failed pipeline for staging | def73fc1

2023-12-01 Thread GitLab



Pipeline #1091754937 has failed!

Project: xen ( https://gitlab.com/xen-project/xen )
Branch: staging ( https://gitlab.com/xen-project/xen/-/commits/staging )

Commit: def73fc1 ( https://gitlab.com/xen-project/xen/-/commit/def73fc14407252cc801f35cd7746e60ccd70884 )
Commit Message: automation/eclair: improve scheduled analyses

...
Commit Author: Simone Ballarin
Committed by: Stefano Stabellini


Pipeline #1091754937 ( https://gitlab.com/xen-project/xen/-/pipelines/1091754937 ) triggered by Ganis ( https://gitlab.com/ganis )
had 3 failed jobs.

Job #5658840883 ( https://gitlab.com/xen-project/xen/-/jobs/5658840883/raw )

Stage: test
Name: zen3p-pci-hvm-x86-64-gcc-debug
Job #5658840879 ( https://gitlab.com/xen-project/xen/-/jobs/5658840879/raw )

Stage: test
Name: zen3p-smoke-x86-64-dom0pvh-gcc-debug
Job #5658840875 ( https://gitlab.com/xen-project/xen/-/jobs/5658840875/raw )

Stage: test
Name: zen3p-smoke-x86-64-gcc-debug


Re: [RFC KERNEL PATCH v2 2/3] xen/pvh: Unmask irq for passthrough device in PVH dom0

On Fri, 1 Dec 2023, Roger Pau Monné wrote:
> On Thu, Nov 30, 2023 at 07:15:17PM -0800, Stefano Stabellini wrote:
> > On Thu, 30 Nov 2023, Roger Pau Monné wrote:
> > > On Wed, Nov 29, 2023 at 07:53:59PM -0800, Stefano Stabellini wrote:
> > > > On Fri, 24 Nov 2023, Jiqian Chen wrote:
> > > > > This patch solves two problems we encountered when trying to
> > > > > pass through a device to an HVM domU based on a Xen PVH dom0.
> > > > > 
> > > > > First, the HVM guest allocates a pirq and irq for a passthrough
> > > > > device by using the gsi; before that, the gsi must already have a
> > > > > mapping in dom0. See the Xen code pci_add_dm_done->xc_domain_irq_permission:
> > > > > it calls into Xen and checks whether dom0 has the mapping. See
> > > > > XEN_DOMCTL_irq_permission->pirq_access_permitted: "current" is the PVH
> > > > > dom0, the irq it gets back is 0, and so it returns -EPERM.
> > > > > This is because the passthrough device doesn't do PHYSDEVOP_map_pirq
> > > > > when it is enabled.
> > > > > 
> > > > > Second, in PVH dom0, the gsi of a passthrough device doesn't get
> > > > > registered, but the gsi must be configured for it to be able to be
> > > > > mapped into a domU.
> > > > > 
> > > > > After searching the code, we found that map_pirq and register_gsi are
> > > > > done in vioapic_write_redirent->vioapic_hwdom_map_gsi when
> > > > > the gsi (aka the IO-APIC pin) is unmasked in PVH dom0. So both problems
> > > > > come down to the gsi of a passthrough device not being
> > > > > unmasked.
> > > > > 
> > > > > To solve the unmask problem, this patch calls unmask_irq when we
> > > > > assign a device for passthrough, so that the gsi can get registered
> > > > > and mapped in PVH dom0.
> > > > 
> > > > 
> > > > Roger, this seems to be more of a Xen issue than a Linux issue. Why do
> > > > we need the unmask check in Xen? Couldn't we just do:
> > > > 
> > > > 
> > > > diff --git a/xen/arch/x86/hvm/vioapic.c b/xen/arch/x86/hvm/vioapic.c
> > > > index 4e40d3609a..df262a4a18 100644
> > > > --- a/xen/arch/x86/hvm/vioapic.c
> > > > +++ b/xen/arch/x86/hvm/vioapic.c
> > > > @@ -287,7 +287,7 @@ static void vioapic_write_redirent(
> > > >  hvm_dpci_eoi(d, gsi);
> > > >  }
> > > >  
> > > > -if ( is_hardware_domain(d) && unmasked )
> > > > +if ( is_hardware_domain(d) )
> > > >  {
> > > >  /*
> > > >   * NB: don't call vioapic_hwdom_map_gsi while holding hvm.irq_lock
> > > 
> > > There are some issues with this approach.
> > > 
> > > mp_register_gsi() will only set up the trigger and polarity of the
> > > IO-APIC pin once, so we do so once the guest unmasks the pin in order
> > > to assert that the configuration is the intended one.  A guest is
> > > allowed to write all kinds of nonsense to the IO-APIC RTE, but
> > > that doesn't take effect unless the pin is unmasked.
> > > 
> > > Overall the question would be whether we have any guarantees that
> > > the hardware domain has properly configured the pin, even if it's not
> > > using it itself (as it hasn't been unmasked).
> > > 
> > > IIRC PCI legacy interrupts are level triggered and low polarity, so we
> > > could configure any pins that are not set up at bind time?
> > 
> > That could work.
> > 
> > Another idea is to move only the call to allocate_and_map_gsi_pirq at
> > bind time? That might be enough to pass a pirq_access_permitted check.
> 
> Maybe, albeit that would change the behavior of XEN_DOMCTL_bind_pt_irq
> just for PT_IRQ_TYPE_PCI and only when called from a PVH dom0 (as the
> parameter would be a GSI instead of a previously mapped IRQ).  Such
> difference just for PT_IRQ_TYPE_PCI is slightly weird - if we go that
> route I would recommend that we instead introduce a new dmop that has
> this syntax regardless of the domain type it's called from.

Looking at the code it is certainly a bit confusing. My point was that
we don't need to wait until polarity and trigger are set appropriately
to allow Dom0 to successfully pass a pirq_access_permitted() check. Xen
should be able to figure out that Dom0 is permitted pirq access.

So the idea was to move the call to allocate_and_map_gsi_pirq() somewhere
earlier, because allocate_and_map_gsi_pirq() doesn't require trigger or
polarity to be configured to work. But the suggestion of doing it at
"bind time" (meaning: XEN_DOMCTL_bind_pt_irq) was a bad idea.

But maybe we can find another location, maybe within
xen/arch/x86/hvm/vioapic.c, to call allocate_and_map_gsi_pirq() before
trigger and polarity are set and before the interrupt is unmasked.

Then we change the implementation of vioapic_hwdom_map_gsi() to skip the
call to allocate_and_map_gsi_pirq(), because by the time
vioapic_hwdom_map_gsi() is called we assume that allocate_and_map_gsi_pirq()
has already been done.
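The proposed ordering can be made concrete with a toy model in plain C. This is not Xen code: `struct sketch_gsi_state`, `sketch_early_map_pirq()`, `sketch_access_permitted()` and `sketch_hwdom_map_gsi()` are all hypothetical. The model assumes the pirq mapping can be created as soon as the pin is known, while trigger and polarity are committed only when the pin is unmasked. One deliberate deviation from the text: the later step does not skip the mapping unconditionally. Instead it is made idempotent, so the unmask path still works if the early step never ran.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical per-GSI bookkeeping for this model only. */
struct sketch_gsi_state {
    bool pirq_mapped;   /* stands in for allocate_and_map_gsi_pirq() having run */
    bool registered;    /* stands in for trigger/polarity being committed */
    int map_calls;      /* how many times the mapping step actually ran */
};

/* Early step: create the mapping without needing trigger or polarity. */
static void sketch_early_map_pirq(struct sketch_gsi_state *s)
{
    if (!s->pirq_mapped) {
        s->pirq_mapped = true;
        s->map_calls++;
    }
}

/* Models the idea that an existing mapping is all the access check needs. */
static bool sketch_access_permitted(const struct sketch_gsi_state *s)
{
    return s->pirq_mapped;
}

/* Unmask-time step: commit trigger/polarity; map only if still missing. */
static void sketch_hwdom_map_gsi(struct sketch_gsi_state *s)
{
    sketch_early_map_pirq(s);   /* no-op if the early step already ran */
    s->registered = true;
}
```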

I am not familiar with vioapic.c but to give you an idea of what I was
thinking:


diff --git a/xen/arch/x86/hvm/vioapic.c b/xen/arch/x86/hvm/vioapic.c
index 4e40d3609a..16d56fe851 100644
---

Re: [XEN PATCH v2 3/3] xen: address violations of MISRA C:2012 Rule 13.1

On Mon, 27 Nov 2023, Jan Beulich wrote:
> On 24.11.2023 18:29, Simone Ballarin wrote:
> > Rule 13.1: Initializer lists shall not contain persistent side effects
> > 
> > The assignment operation in:
> > 
> > .irq = rc = uart->irq,
> > 
> > is a persistent side effect in a struct initializer list.
> > 
> > This patch avoids rc assignment and directly uses uart->irq
> > in the following if statement.
> > 
> > No functional changes.
> > 
> > Signed-off-by: Maria Celeste Cesario  
> > Signed-off-by: Simone Ballarin 
> 
> Who's the author of this patch? (Either the order of the SoB is wrong, or
> there's a From: tag missing.)
> 
> > ---
> > Changes in v2:
> > - avoid assignment of rc;
> > - drop changes in vcpu_yield(void).
> > ---
> >  xen/drivers/char/ns16550.c | 6 --
> >  1 file changed, 4 insertions(+), 2 deletions(-)
> 
> This warrants a more specific subject prefix. Also there's only a single
> violation being dealt with here.
> 
> > --- a/xen/drivers/char/ns16550.c
> > +++ b/xen/drivers/char/ns16550.c
> > @@ -445,11 +445,13 @@ static void __init cf_check ns16550_init_postirq(struct serial_port *port)
> >  struct msi_info msi = {
> >  .sbdf = PCI_SBDF(0, uart->ps_bdf[0], uart->ps_bdf[1],
> >   uart->ps_bdf[2]),
> > -.irq = rc = uart->irq,
> > +.irq = uart->irq,
> >  .entry_nr = 1
> >  };
> >  
> > -if ( rc > 0 )
> > +rc = 0;
> > +
> > +if ( uart->irq > 0 )
> >  {
> >  struct msi_desc *msi_desc = NULL;
> 
> The fact that there's no functional change here isn't really obvious.
> Imo you want to prove that to a reasonable degree in the description.
 
Agreed. Just from reading this chunk, wouldn't it be better to do:

    };

    rc = uart->irq;

    if ( rc > 0 )

That way it would at least be obvious.
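The sketch below shows the two shapes discussed above side by side. It uses simplified stand-in types (`struct sketch_uart`, `struct sketch_msi_info`) rather than the real ns16550 structures. The first function keeps the assignment inside the initializer, which is the persistent side effect Rule 13.1 flags. The second hoists the assignment into a statement after the initializer, as suggested. Both leave `rc` and `.irq` with the same values. The sketch does not cover how `rc` is used after the `if`, which is what the "no functional change" claim also depends on.

```c
#include <assert.h>

/* Simplified stand-ins for the ns16550 types; not the real definitions. */
struct sketch_uart { int irq; };
struct sketch_msi_info { int irq; unsigned int entry_nr; };

/* Original shape: `rc = uart->irq` is a side effect inside the initializer. */
static int init_with_side_effect(const struct sketch_uart *uart,
                                 struct sketch_msi_info *out)
{
    int rc;
    struct sketch_msi_info msi = {
        .irq = rc = uart->irq,   /* flagged by MISRA C:2012 Rule 13.1 */
        .entry_nr = 1
    };

    *out = msi;
    return rc;
}

/* Suggested shape: the initializer is side-effect free; rc is set after it. */
static int init_without_side_effect(const struct sketch_uart *uart,
                                    struct sketch_msi_info *out)
{
    int rc;
    struct sketch_msi_info msi = {
        .irq = uart->irq,
        .entry_nr = 1
    };

    rc = uart->irq;
    *out = msi;
    return rc;
}
```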

Re: [XEN PATCH v2 1/3] automation/eclair: tag function calls to address violations of MISRA C:2012 Rule 13.1

On Fri, 24 Nov 2023, Simone Ballarin wrote:
> Rule 13.1: Initializer lists shall not contain persistent side effects
> 
> Invocations of functions in initializer lists cause violations of rule
> 13.1 if the called functions are not tagged with __attribute_pure__ or
> __attribute_const__ as they can produce persistent side effects.
> 
> Handling these violations with attributes is not always possible: the
> pure and const attributes may cause unwanted and potentially dangerous
> optimisations.
> 
> To avoid this problem ECLAIR allows using the same attributes in the
> -call_properties setting. Additionally, it adds the noeffect attribute
> with the following definition:
> "like pure but can also read volatile variable not triggering side effects"
> 
> This patch tags some functions used in initializer lists to address
> violations of Rule 13.1.
> 
> No functional changes.
> 
> Signed-off-by: Simone Ballarin 

Ideally we should also list them somewhere in a document, maybe
docs/misra/deviations.rst? Or a new doc? It would be best if this info
didn't only exist in call_properties.ecl.

But given that the below is OK:
Acked-by: Stefano Stabellini 


> ---
> Changes in v2:
> New patch partly based on "xen/arm: address violations of MISRA C:2012 Rule 13.1"
> and "xen/include: add pure and const attributes". This new patch uses
> ECL tagging instead of compiler attributes.
> ---
>  .../ECLAIR/call_properties.ecl| 22 +++
>  1 file changed, 22 insertions(+)
> 
> diff --git a/automation/eclair_analysis/ECLAIR/call_properties.ecl b/automation/eclair_analysis/ECLAIR/call_properties.ecl
> index 3f7794bf8b..c2b2a6182e 100644
> --- a/automation/eclair_analysis/ECLAIR/call_properties.ecl
> +++ b/automation/eclair_analysis/ECLAIR/call_properties.ecl
> @@ -73,6 +73,17 @@
>  -call_properties+={"macro(^va_start$)", {"pointee_write(1=always)", "pointee_read(1=never)", "taken()"}}
>  -call_properties+={"macro(^memcmp$)", {"pointee_write(1..2=never)", "taken()"}}
>  -call_properties+={"macro(^memcpy$)", {"pointee_write(1=always&&2..=never)", "pointee_read(1=never&&2..=always)", "taken()"}}
> +-call_properties+={"name(get_cpu_info)",{pure}}
> +-call_properties+={"name(pdx_to_pfn)",{pure}}
> +-call_properties+={"name(is_pci_passthrough_enabled)",{const}}
> +-call_properties+={"name(get_cycles)", {"noeffect"}}
> +-call_properties+={"name(msi_gflags)",{const}}
> +-call_properties+={"name(hvm_save_size)",{pure}}
> +-call_properties+={"name(cpu_has)",{pure}}
> +-call_properties+={"name(boot_cpu_has)",{pure}}
> +-call_properties+={"name(get_cpu_info)",{pure}}
> +-call_properties+={"name(put_pte_flags)",{const}}
> +-call_properties+={"name(is_pv_vcpu)",{pure}}
>  
>  -doc_begin="Property inferred as a consequence of the semantics of device_tree_get_reg"
>  -call_properties+={"name(acquire_static_memory_bank)", {"pointee_write(4..=always)", "pointee_read(4..=never)", "taken()"}}
> @@ -104,3 +115,14 @@ Furthermore, their uses do initialize the involved variables as needed by futher
>  -call_properties+={"macro(^(__)?(raw_)?copy_from_(paddr|guest|compat)(_offset)?$)", {"pointee_write(1=always)", "pointee_read(1=never)", "taken()"}}
>  -call_properties+={"macro(^(__)?copy_to_(guest|compat)(_offset)?$)", {"pointee_write(2=always)", "pointee_read(2=never)", "taken()"}}
>  -doc_end
> +
> +-doc_begin="Functions generated by build_atomic_read cannot be considered pure
> +since the input pointer is volatile, but they do not produce any persistent side
> +effect."
> +-call_properties+={"^read_u(8|16|32|64|int)_atomic.*$", {noeffect}}
> +-doc_end
> +
> +-doc_begin="Functions generated by TYPE_SAFE are const."
> +-call_properties+={"^(mfn|gfn|pfn)_x\\(.*$",{const}}
> +-call_properties+={"^_(mfn|gfn|pfn)\\(.*$",{const}}
> +-doc_end
> -- 
> 2.34.1
>
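The patch relies on the difference between pure, const and noeffect. Here is a small C sketch of that difference, using the GCC/Clang `__attribute__((const))` and `__attribute__((pure))` spellings. All functions and globals in it are hypothetical. A const function depends only on its arguments. A pure function may also read non-volatile global memory. A function that reads a volatile object is neither, even if it writes nothing, and that is the gap the ECL `noeffect` property is described as covering. Wrongly tagging a function const or pure lets the compiler merge or drop calls, which is the kind of unwanted optimisation the commit message warns about.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical globals used only by this sketch. */
static unsigned int nr_things = 4;          /* ordinary global: pure may read it */
static volatile uint32_t fake_hw_register;  /* volatile: every read is observable */

/* const: the result depends only on the argument values. */
static __attribute__((const)) unsigned int sketch_double(unsigned int x)
{
    return 2 * x;
}

/* pure: may also read non-volatile global memory, but writes nothing. */
static __attribute__((pure)) unsigned int sketch_nr_things(void)
{
    return nr_things;
}

/*
 * Neither const nor pure: it reads a volatile object, so two calls may return
 * different values. It still writes nothing, which matches the "no persistent
 * side effect" case that the noeffect property is described as covering.
 */
static uint32_t sketch_read_u32_atomic(const volatile uint32_t *p)
{
    return *p;
}

/* Calls like these, inside an initializer list, are what Rule 13.1 examines. */
struct sketch_cfg { unsigned int a, b; uint32_t c; };

static struct sketch_cfg sketch_make_cfg(void)
{
    struct sketch_cfg cfg = {
        .a = sketch_double(3),
        .b = sketch_nr_things(),
        .c = sketch_read_u32_atomic(&fake_hw_register),
    };
    return cfg;
}
```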

Re: [XEN PATCH] automation/eclair: tag files as "adopted" and "out of scope"

On Fri, 24 Nov 2023, Federico Serafini wrote:
> Tag arm64/efibind.h as "adopted":
> it is used to build the efi stub, which is a separate entry point
> for Xen when booted from EFI firmware.
> 
> Tag common/coverage/* as "out-of-scope":
> it is code to support gcov, hence it is part of the testing machinery.
> 
> Signed-off-by: Federico Serafini 

I think they should be in the exclude-list?


> ---
>  automation/eclair_analysis/ECLAIR/out_of_scope.ecl | 5 +
>  1 file changed, 5 insertions(+)
> 
> diff --git a/automation/eclair_analysis/ECLAIR/out_of_scope.ecl b/automation/eclair_analysis/ECLAIR/out_of_scope.ecl
> index e1ec4a607c..3bd385ecf9 100644
> --- a/automation/eclair_analysis/ECLAIR/out_of_scope.ecl
> +++ b/automation/eclair_analysis/ECLAIR/out_of_scope.ecl
> @@ -84,6 +84,7 @@
>  -doc_begin="Files imported from the gnu-efi package"
>  -file_tag+={adopted,"^xen/include/efi/.*$"}
>  -file_tag+={adopted,"^xen/arch/x86/include/asm/x86_64/efibind\\.h$"}
> +-file_tag+={adopted,"^xen/arch/arm/include/asm/arm64/efibind\\.h$"}
>  -doc_end
>  
>  -doc_begin="Build tools are out of scope."
> @@ -104,6 +105,10 @@
>  -file_tag+={out_of_scope,"^xen/include/xen/xxhash\\.h$"}
>  -doc_end
>  
> +-doc_begin="Out of scope code to support gcov."
> +-file_tag+={out_of_scope, "^xen/common/coverage/.*$"}
> +-doc_end
> +
>  -doc_begin="Headers under xen/include/public/ are the description of the public
>  hypercall ABI so the community is extremely conservative in making changes
>  there, because the interface is maintained for backward compatibility: ignore
> -- 
> 2.34.1
>
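A quick way to sanity-check which paths the new `-file_tag` patterns select is to run them through a regex engine. The sketch below uses POSIX extended regular expressions via `<regex.h>`. It assumes that ECLAIR's regex dialect agrees with POSIX ERE for these simple patterns. In the ECL strings, `\\.` is an escaped backslash, so the regex itself is `\.`. In C source that is again written `\\.`.

```c
#include <assert.h>
#include <regex.h>
#include <stdbool.h>
#include <stddef.h>

/* Returns true if `path` matches the POSIX extended regex `pattern`.
 * A pattern that fails to compile is treated as matching nothing. */
static bool path_matches(const char *pattern, const char *path)
{
    regex_t re;
    int rc;

    assert(pattern != NULL && path != NULL);
    if (regcomp(&re, pattern, REG_EXTENDED | REG_NOSUB) != 0)
        return false;
    rc = regexec(&re, path, 0, NULL, 0);
    regfree(&re);
    return rc == 0;
}
```

For example, `^xen/common/coverage/.*$` matches `xen/common/coverage/gcov.c` but not a hypothetical `xen/common/coverage.c`, because the pattern requires the trailing directory separator.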

Re: [PATCH v4 2/6] x86/hvm: Allow access to registers on the same page as MSI-X table

2023-12-01 Thread Marek Marczykowski-Górecki

On Mon, Nov 27, 2023 at 06:00:57PM +0100, Jan Beulich wrote:
> On 24.11.2023 02:47, Marek Marczykowski-Górecki wrote:
> > GCC gets confused about 'desc' variable:
> > 
> > arch/x86/hvm/vmsi.c: In function ‘msixtbl_range’:
> > arch/x86/hvm/vmsi.c:553:8: error: ‘desc’ may be used uninitialized 
> > [-Werror=maybe-uninitialized]
> >   553 | if ( desc )
> >   |^
> > arch/x86/hvm/vmsi.c:537:28: note: ‘desc’ was declared here
> >   537 | const struct msi_desc *desc;
> >   |^~~~
> 
> This could do with also indicating the gcc version. Issues like this
> tend to get fixed over time.

Sure, I'll add that it's GCC 12.2.1.
And indeed, GCC 13.2.1 does not complain anymore.

> > +
> > +if ( !msix->adj_access_idx[adj_type] )
> > +{
> > +gprintk(XENLOG_WARNING,
> > +"Page for adjacent(%d) MSI-X table access not initialized for %pp (addr %#lx, gtable %#lx)\n",
> > +adj_type, >pdev->sbdf, addr, entry->gtable);
> > +
> > +return ADJACENT_DONT_HANDLE;
> > +}
> > +
> > +/* If PBA lives on the same page too, discard writes. */
> > +if ( write &&
> > + ((adj_type == ADJ_IDX_LAST &&
> > +   msix->table.last == msix->pba.first) ||
> > +  (adj_type == ADJ_IDX_FIRST &&
> > +   msix->table.first == msix->pba.last)) )
> > +{
> > +gprintk(XENLOG_WARNING,
> > +"MSI-X table and PBA of %pp live on the same page, "
> > +"writing to other registers there is not implemented\n",
> > +>pdev->sbdf);
> 
> Here and above I think verbosity needs limiting to the first instance per
> device per domain.

Is there some clever API for that already, or do I need to introduce an
extra variable in one of those structures (msixtbl_entry? pci_dev?)?

(other requested changes ok)
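One possible answer to the question above, sketched outside Xen: keep a small per-device bitmask of warnings already emitted, and print each kind only the first time. Everything here is hypothetical: `struct sketch_msix`, the `SKETCH_WARN_*` bits and `sketch_warn_once()`. The sketch leaves open two questions from the thread: which structure should hold the field, and whether Xen already has a helper for this.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical warning kinds, one bit each. */
enum {
    SKETCH_WARN_ADJ_UNINIT = 1u << 0,   /* adjacent page access not initialized */
    SKETCH_WARN_PBA_SHARED = 1u << 1,   /* MSI-X table and PBA share a page */
};

/* Hypothetical per-device state; in Xen this would live in an existing struct. */
struct sketch_msix {
    unsigned int warned;    /* bitmask of SKETCH_WARN_* already printed */
};

/*
 * Returns true (and records the bit) only the first time `kind` is seen for
 * this device, so the caller prints at most one message per kind.
 * Not thread-safe: a real implementation would need an atomic test-and-set.
 */
static bool sketch_warn_once(struct sketch_msix *msix, unsigned int kind)
{
    assert(kind != 0);
    if (msix->warned & kind)
        return false;
    msix->warned |= kind;
    return true;
}
```

A caller would wrap each message, for example `if ( sketch_warn_once(msix, SKETCH_WARN_PBA_SHARED) ) gprintk(...);`. The message then appears once per device per kind, rather than on every access.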

-- 
Best Regards,
Marek Marczykowski-Górecki
Invisible Things Lab



Re: [XEN PATCH 7/7] xen/page_alloc: deviate first_valid_mfn for MISRA C Rule 8.4

On Fri, 1 Dec 2023, Jan Beulich wrote:
> On 01.12.2023 03:47, Stefano Stabellini wrote:
> > On Wed, 29 Nov 2023, Nicola Vetrini wrote:
> >> No functional change.
> >>
> >> Signed-off-by: Nicola Vetrini 
> >> ---
> >> The preferred way to deviate is to use asmlinkage, but this modification is only
> >> the consequence of NUMA on ARM (and possibly PPC) being a work in progress.
> >> As stated in the comment above the textual deviation, first_valid_mfn will
> >> likely then become static and there would be no need for the comment anymore.
> >> This works towards having the analysis for this rule clean (i.e. no violations);
> >> the interest in having a clean rule is that then it could be used to signal
> >> newly introduced violations by making the analysis job fail.
> > 
> > Please add this text as part of the commit message. It can be done on
> > commit.
> 
> I assume you saw my reply on another of the patches in this series as to
> asmlinkage use on variables? IOW I think this paragraph would also need
> adjustment to account for that.

I was going to ask you about that: reading your reply
https://marc.info/?l=xen-devel&m=170142048615336 it is not clear to me
what you are asking or suggesting as the next step regarding asmlinkage
use on variables.

Re: [PATCH v6 4/5] [FUTURE] xen/arm: enable vPCI for domUs

On Fri, 1 Dec 2023, Roger Pau Monné wrote:
> On Mon, Nov 13, 2023 at 05:21:13PM -0500, Stewart Hildebrand wrote:
> > @@ -1618,6 +1630,14 @@ int iommu_do_pci_domctl(
> >  bus = PCI_BUS(machine_sbdf);
> >  devfn = PCI_DEVFN(machine_sbdf);
> >  
> > +if ( needs_vpci(d) && !has_vpci(d) )
> > +{
> > +printk(XENLOG_G_WARNING "Cannot assign %pp to %pd: vPCI support not enabled\n",
> > +   _SBDF(seg, bus, devfn), d);
> > +ret = -EPERM;
> > +break;
> 
> I think this is likely too restrictive going forward.  The current
> approach is indeed to enable vPCI on a per-domain basis because that's
> how PVH dom0 uses it, due to being unable to use ioreq servers.
> 
> If we start to expose vPCI support to guests the interface should be on
> a per-device basis, so that vPCI could be enabled for some devices,
> while others could still be handled by ioreq servers.
> 
> We might want to add a new flag to xen_domctl_assign_device (used by
> XEN_DOMCTL_assign_device) in order to signal whether the device will
> use vPCI.

Actually I don't think this is a good idea. I am all for flexibility, but
supporting multiple different configurations comes at an extra cost for
both maintainers and contributors. I think we should try to reduce the
number of configurations we support rather than increase it
(especially on x86 where we have PV, PVH, HVM).

I don't think we should enable IOREQ servers to handle PCI passthrough
for PVH guests and/or guests with vPCI. If the domain has vPCI, PCI
passthrough can be handled by vPCI just fine. I think this should be a
good anti-feature to have (a goal to explicitly not add this feature) to
reduce complexity. Unless you see a specific use case to add support for
it?

Re: [PATCH v2 3/5] xen/x86: introduce self modifying code test

On Fri, 1 Dec 2023, Roger Pau Monné wrote:
> > > @@ -1261,6 +1269,7 @@ struct xen_sysctl {
> > >  struct xen_sysctl_livepatch_op  livepatch;
> > >  #if defined(__i386__) || defined(__x86_64__)
> > >  struct xen_sysctl_cpu_policycpu_policy;
> > > +struct xen_sysctl_test_smc  smc;
> > 
> > Imo the field name would better be test_smc (leaving aside Stefano's 
> > comment).
> 
> Right, will see what Stefano thinks about using test_smoc.

If you meant "test_smoc", that's totally fine.

If you meant "test_smc", I think that must be a test for the virtual SMC
interface that Xen exposes, right? Good, we need some tests for vsmc.c
;-)

[PATCH v4 3/6] xen: decouple generic xen code from legacy backends codebase

In xen-all.c there are unneeded dependencies on xen-legacy-backend.c:

 - xen_init() uses xen_pv_printf() to report errors, but it does not
   provide a pointer to the struct XenLegacyDevice, so it is of little
   use; we can use the standard error_report() instead.

 - xen-all.c has the function xenstore_record_dm_state(), which uses the
   global variable "xenstore" defined and initialized in
   xen-legacy-backend.c. It is used exactly once, so we can just open a
   new connection to xenstore, update the DM state and close the
   connection again.

Those two changes allow us to remove xen-legacy-backend.c altogether,
which should be done in the future anyway. But right now this patch
moves us one step closer to building QEMU without the legacy Xen
backends.

Signed-off-by: Volodymyr Babchuk 

---

In v4:

 - New in v4, previous was part of "xen: add option to disable legacy
 backends"
 - Do not move xenstore global variable from xen-legacy-backend.c,
   instead use a local variable.
---
 accel/xen/xen-all.c | 16 
 1 file changed, 12 insertions(+), 4 deletions(-)

diff --git a/accel/xen/xen-all.c b/accel/xen/xen-all.c
index 5ff0cb8bd9..6c2342581f 100644
--- a/accel/xen/xen-all.c
+++ b/accel/xen/xen-all.c
@@ -33,12 +33,20 @@ xendevicemodel_handle *xen_dmod;
 static void xenstore_record_dm_state(const char *state)
 {
 char path[50];
+struct qemu_xs_handle *xsh = qemu_xen_xs_open();
+
+if (!xsh) {
+error_report("error opening xenstore");
+exit(1);
+}
 
 snprintf(path, sizeof (path), "device-model/%u/state", xen_domid);
-if (!qemu_xen_xs_write(xenstore, XBT_NULL, path, state, strlen(state))) {
+if (!qemu_xen_xs_write(xsh, XBT_NULL, path, state, strlen(state))) {
 error_report("error recording dm state");
 exit(1);
 }
+
+qemu_xen_xs_close(xsh);
 }
 
 
@@ -80,18 +88,18 @@ static int xen_init(MachineState *ms)
 
 xen_xc = xc_interface_open(0, 0, 0);
 if (xen_xc == NULL) {
-xen_pv_printf(NULL, 0, "can't open xen interface\n");
+error_report("can't open xen interface");
 return -1;
 }
 xen_fmem = xenforeignmemory_open(0, 0);
 if (xen_fmem == NULL) {
-xen_pv_printf(NULL, 0, "can't open xen fmem interface\n");
+error_report("can't open xen fmem interface");
 xc_interface_close(xen_xc);
 return -1;
 }
 xen_dmod = xendevicemodel_open(0, 0);
 if (xen_dmod == NULL) {
-xen_pv_printf(NULL, 0, "can't open xen devicemodel interface\n");
+error_report("can't open xen devicemodel interface");
 xenforeignmemory_close(xen_fmem);
 xc_interface_close(xen_xc);
 return -1;
-- 
2.42.0

[PATCH v4 6/6] xen_arm: Add virtual PCIe host bridge support

From: Oleksandr Tyshchenko 

The bridge is needed for virtio-pci support, as QEMU can emulate the
whole bridge with any virtio-pci devices connected to it.

This patch provides a flexible way to configure PCIe bridge resources
using QEMU machine properties. We chose this approach for several reasons:

- We don't want to clash with vPCI devices, so we need information
  from the Xen toolstack on which PCI bus to use.
- The guest memory layout that describes these resources is not stable
  and may vary between guests, so we cannot rely on static resources
  being the same for both ends.
- Also the device-models which run in different domains and serve
  virtio-pci devices for the same guest should use different host
  bridge resources for Xen to distinguish. The rule for the guest
  device-tree generation is one PCI host bridge per backend domain.
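The `pcie-ecam-base` and `pcie-ecam-size` properties describe a PCIe ECAM window. In the standard ECAM layout, each function gets 4 KiB of configuration space. Each bus therefore needs 1 MiB (32 devices x 8 functions x 4 KiB). The helpers below are a hypothetical illustration of that address arithmetic, not code from this patch. They may help when choosing a window size for a given number of buses. They assume the window starts at the first bus it covers.

```c
#include <assert.h>
#include <stdint.h>

/* ECAM: 4 KiB of config space per function, 8 functions, 32 devices per bus. */
#define SKETCH_ECAM_BUS_SIZE (1ULL << 20)   /* 1 MiB per bus */

/* Hypothetical helper: size of an ECAM window covering `nr_buses` buses. */
static uint64_t sketch_ecam_size(unsigned int nr_buses)
{
    return (uint64_t)nr_buses * SKETCH_ECAM_BUS_SIZE;
}

/* Hypothetical helper: address of config register `reg` of bus:dev.fn,
 * with `bus` counted from the first bus of the window. */
static uint64_t sketch_ecam_addr(uint64_t ecam_base, unsigned int bus,
                                 unsigned int dev, unsigned int fn,
                                 unsigned int reg)
{
    assert(bus < 256 && dev < 32 && fn < 8 && reg < 4096);
    return ecam_base + ((uint64_t)bus << 20) + ((uint64_t)dev << 15) +
           ((uint64_t)fn << 12) + reg;
}
```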

Signed-off-by: Oleksandr Tyshchenko 
Signed-off-by: Volodymyr Babchuk 

---

Changes in v3:

 - Use QOM properties instead of reading from XenStore
 - Remove unneeded includes
 - Move pcie_* fields into "struct cfg"

Changes in v2:

 - Renamed virtio_pci_host to pcie_host entries in XenStore, because
 there is nothing specific to virtio-pci: any PCI device can be
 emulated via this newly created bridge.
---
 hw/arm/xen_arm.c| 226 
 hw/xen/xen-hvm-common.c |   9 +-
 include/hw/xen/xen_native.h |   8 +-
 3 files changed, 240 insertions(+), 3 deletions(-)

diff --git a/hw/arm/xen_arm.c b/hw/arm/xen_arm.c
index b9c3ae14b6..dc6d3a1d82 100644
--- a/hw/arm/xen_arm.c
+++ b/hw/arm/xen_arm.c
@@ -34,6 +34,7 @@
 #include "hw/xen/xen-hvm-common.h"
 #include "sysemu/tpm.h"
 #include "hw/xen/arch_hvm.h"
+#include "hw/pci-host/gpex.h"
 
 #define TYPE_XEN_ARM  MACHINE_TYPE_NAME("xenpvh")
 OBJECT_DECLARE_SIMPLE_TYPE(XenArmState, XEN_ARM)
@@ -57,6 +58,10 @@ struct XenArmState {
 
 struct {
 uint64_t tpm_base_addr;
+MemMapEntry pcie_mmio;
+MemMapEntry pcie_ecam;
+MemMapEntry pcie_mmio_high;
+int pcie_irq_base;
 } cfg;
 };
 
@@ -73,6 +78,15 @@ static MemoryRegion ram_lo, ram_hi;
 #define NR_VIRTIO_MMIO_DEVICES   \
(GUEST_VIRTIO_MMIO_SPI_LAST - GUEST_VIRTIO_MMIO_SPI_FIRST)
 
+#define XEN_ARM_PCIE_ECAM_BASE  "pcie-ecam-base"
+#define XEN_ARM_PCIE_ECAM_SIZE  "pcie-ecam-size"
+#define XEN_ARM_PCIE_MEM_BASE   "pcie-mem-base"
+#define XEN_ARM_PCIE_MEM_SIZE   "pcie-mem-size"
+#define XEN_ARM_PCIE_PREFETCH_BASE  "pcie-prefetch-mem-base"
+#define XEN_ARM_PCIE_PREFETCH_SIZE  "pcie-prefetch-mem-size"
+#define XEN_ARM_PCIE_IRQ_BASE   "pcie-irq-base"
+
+/* TODO It should be xendevicemodel_set_pci_intx_level() for PCI interrupts. */
 static void xen_set_irq(void *opaque, int irq, int level)
 {
 if (xendevicemodel_set_irq_level(xen_dmod, xen_domid, irq, level)) {
@@ -129,6 +143,89 @@ static void xen_init_ram(MachineState *machine)
 }
 }
 
+static bool xen_validate_pcie_config(XenArmState *xam)
+{
+if (xam->cfg.pcie_ecam.base == 0 &&
+xam->cfg.pcie_ecam.size == 0 &&
+xam->cfg.pcie_mmio.base == 0 &&
+xam->cfg.pcie_mmio.size == 0 &&
+xam->cfg.pcie_mmio_high.base == 0 &&
+xam->cfg.pcie_mmio_high.size == 0 &&
+xam->cfg.pcie_irq_base == 0) {
+
+/* It's okay, the user just doesn't want a PCIe bridge */
+
+return false;
+}
+
+if (xam->cfg.pcie_ecam.base == 0 ||
+xam->cfg.pcie_ecam.size == 0 ||
+xam->cfg.pcie_mmio.base == 0 ||
+xam->cfg.pcie_mmio.size == 0 ||
+xam->cfg.pcie_mmio_high.base == 0 ||
+xam->cfg.pcie_mmio_high.size == 0 ||
+xam->cfg.pcie_irq_base == 0) {
+
+/* User provided some PCIe options, but not all of them */
+
+error_printf("Incomplete PCIe bridge configuration\n");
+
+exit(1);
+}
+
+return true;
+}
+
+static void xen_create_pcie(XenArmState *xam)
+{
+MemoryRegion *mmio_alias, *mmio_alias_high, *mmio_reg;
+MemoryRegion *ecam_alias, *ecam_reg;
+DeviceState *dev;
+int i;
+
+dev = qdev_new(TYPE_GPEX_HOST);
+sysbus_realize_and_unref(SYS_BUS_DEVICE(dev), &error_fatal);
+
+/* Map ECAM space */
+ecam_alias = g_new0(MemoryRegion, 1);
+ecam_reg = sysbus_mmio_get_region(SYS_BUS_DEVICE(dev), 0);
+memory_region_init_alias(ecam_alias, OBJECT(dev), "pcie-ecam",
+ ecam_reg, 0, xam->cfg.pcie_ecam.size);
+memory_region_add_subregion(get_system_memory(), xam->cfg.pcie_ecam.base,
+ecam_alias);
+
+/* Map the MMIO space */
+mmio_alias = g_new0(MemoryRegion, 1);
+mmio_reg = sysbus_mmio_get_region(SYS_BUS_DEVICE(dev), 1);
+memory_region_init_alias(mmio_alias, OBJECT(dev), "pcie-mmio",
+ mmio_reg,
+ xam->cfg.pcie_mmio.base,
+ xam->cfg.pcie_mmio.size);
+memory_region_add_subregion(get_system_memory(),

[RFC PATCH v4 4/6] xen: add option to disable legacy backends

This patch makes legacy backends optional. As was discussed at [1]
this is a solution to a problem when we can't run QEMU as a device
model in a non-privileged domain. This is because legacy backends
assume that they are always running in domain with ID = 0. Actually,
this may prevent running QEMU in a privileged domain with ID not equal
to zero.

With this patch it is possible to provide
"--disable-xen-legacy-backends" configure option to get QEMU binary
that can run in a driver domain. With price of not be able to use
legacy backends of course.

[1]
https://lists.gnu.org/archive/html/qemu-devel/2023-11/msg05022.html

Signed-off-by: Volodymyr Babchuk 

---

I am not sure whether I made the correct changes to the build system,
so this patch is tagged as RFC.

Changes in v3:
 - New patch in v3
---
 hw/9pfs/meson.build   |  4 +++-
 hw/display/meson.build|  4 +++-
 hw/i386/pc.c  |  2 ++
 hw/usb/meson.build|  5 -
 hw/xen/meson.build| 11 ---
 hw/xen/xen-hvm-common.c   |  2 ++
 hw/xenpv/xen_machine_pv.c |  2 ++
 meson.build   |  5 +
 meson_options.txt |  2 ++
 scripts/meson-buildoptions.sh |  4 
 10 files changed, 35 insertions(+), 6 deletions(-)

diff --git a/hw/9pfs/meson.build b/hw/9pfs/meson.build
index 2944ea63c3..e8306ba8d2 100644
--- a/hw/9pfs/meson.build
+++ b/hw/9pfs/meson.build
@@ -15,7 +15,9 @@ fs_ss.add(files(
 ))
 fs_ss.add(when: 'CONFIG_LINUX', if_true: files('9p-util-linux.c'))
 fs_ss.add(when: 'CONFIG_DARWIN', if_true: files('9p-util-darwin.c'))
-fs_ss.add(when: 'CONFIG_XEN_BUS', if_true: files('xen-9p-backend.c'))
+if have_xen_legacy_backends
+  fs_ss.add(when: 'CONFIG_XEN_BUS', if_true: files('xen-9p-backend.c'))
+endif
 system_ss.add_all(when: 'CONFIG_FSDEV_9P', if_true: fs_ss)
 
 specific_ss.add(when: 'CONFIG_VIRTIO_9P', if_true: files('virtio-9p-device.c'))
diff --git a/hw/display/meson.build b/hw/display/meson.build
index 344dfe3d8c..18d657f6b3 100644
--- a/hw/display/meson.build
+++ b/hw/display/meson.build
@@ -14,7 +14,9 @@ system_ss.add(when: 'CONFIG_PL110', if_true: files('pl110.c'))
 system_ss.add(when: 'CONFIG_SII9022', if_true: files('sii9022.c'))
 system_ss.add(when: 'CONFIG_SSD0303', if_true: files('ssd0303.c'))
 system_ss.add(when: 'CONFIG_SSD0323', if_true: files('ssd0323.c'))
-system_ss.add(when: 'CONFIG_XEN_BUS', if_true: files('xenfb.c'))
+if have_xen_legacy_backends
+  system_ss.add(when: 'CONFIG_XEN_BUS', if_true: files('xenfb.c'))
+endif
 
 system_ss.add(when: 'CONFIG_VGA_PCI', if_true: files('vga-pci.c'))
 system_ss.add(when: 'CONFIG_VGA_ISA', if_true: files('vga-isa.c'))
diff --git a/hw/i386/pc.c b/hw/i386/pc.c
index 29b9964733..91857af428 100644
--- a/hw/i386/pc.c
+++ b/hw/i386/pc.c
@@ -1263,7 +1263,9 @@ void pc_basic_device_init(struct PCMachineState *pcms,
 pci_create_simple(pcms->bus, -1, "xen-platform");
 }
 pcms->xenbus = xen_bus_init();
+#ifdef CONFIG_XEN_LEGACY_BACKENDS
 xen_be_init();
+#endif
 }
 #endif
 
diff --git a/hw/usb/meson.build b/hw/usb/meson.build
index e94149ebde..8d395745b2 100644
--- a/hw/usb/meson.build
+++ b/hw/usb/meson.build
@@ -84,6 +84,9 @@ if libusb.found()
   hw_usb_modules += {'host': usbhost_ss}
 endif
 
-system_ss.add(when: ['CONFIG_USB', 'CONFIG_XEN_BUS', libusb], if_true: files('xen-usb.c'))
+if have_xen_legacy_backends
+  system_ss.add(when: ['CONFIG_USB', 'CONFIG_XEN_BUS', libusb],
+if_true: files('xen-usb.c'))
+endif
 
 modules += { 'hw-usb': hw_usb_modules }
diff --git a/hw/xen/meson.build b/hw/xen/meson.build
index d887fa9ba4..964c3364f2 100644
--- a/hw/xen/meson.build
+++ b/hw/xen/meson.build
@@ -2,11 +2,16 @@ system_ss.add(when: ['CONFIG_XEN_BUS'], if_true: files(
   'xen-backend.c',
   'xen-bus-helper.c',
   'xen-bus.c',
-  'xen-legacy-backend.c',
-  'xen_devconfig.c',
-  'xen_pvdev.c',
 ))
 
+if have_xen_legacy_backends
+  system_ss.add(when: ['CONFIG_XEN_BUS'], if_true: files(
+'xen_pvdev.c',
+'xen-legacy-backend.c',
+'xen_devconfig.c',
+  ))
+endif
+
 system_ss.add(when: ['CONFIG_XEN', xen], if_true: files(
   'xen-operations.c',
 ))
diff --git a/hw/xen/xen-hvm-common.c b/hw/xen/xen-hvm-common.c
index 565dc39c8f..2e7897dbd2 100644
--- a/hw/xen/xen-hvm-common.c
+++ b/hw/xen/xen-hvm-common.c
@@ -869,7 +869,9 @@ void xen_register_ioreq(XenIOState *state, unsigned int max_cpus,
 
 xen_bus_init();
 
+#ifdef CONFIG_XEN_LEGACY_BACKENDS
 xen_be_init();
+#endif
 
 return;
 
diff --git a/hw/xenpv/xen_machine_pv.c b/hw/xenpv/xen_machine_pv.c
index 9f9f137f99..03a55f345c 100644
--- a/hw/xenpv/xen_machine_pv.c
+++ b/hw/xenpv/xen_machine_pv.c
@@ -37,7 +37,9 @@ static void xen_init_pv(MachineState *machine)
 setup_xen_backend_ops();
 
 /* Initialize backend core & drivers */
+#ifdef CONFIG_XEN_LEGACY_BACKENDS
 xen_be_init();
+#endif
 
 switch (xen_mode) {
 case XEN_ATTACH:
diff --git a/meson.build b/meson.build
index ec01f8b138..c8a43dd97d

[PATCH v4 2/6] xen: backends: don't overwrite XenStore nodes created by toolstack

Xen PV devices in QEMU can be created in two ways: either by QEMU
itself, if they were passed via the command line, or by the Xen
toolstack. In the latter case, QEMU scans XenStore entries and
configures devices accordingly.

In the second case, we don't want QEMU to write or delete frontend
entries, for two reasons: it might have no access to those entries if
it is running in an unprivileged domain, and it is simply incorrect to
overwrite entries already provided by the Xen toolstack, because the
toolstack manages those nodes. For example, the toolstack might read
the backend or frontend state to make sure that both are disconnected
and that it is safe to destroy a domain.

This patch checks for the presence of xendev->backend to determine
whether the Xen PV device was configured by the Xen toolstack, and
hence whether QEMU should touch frontend entries in XenStore. Likewise,
XenStore entries are removed during device teardown only if they were
not created by the Xen toolstack. If they were created by the
toolstack, then it is the toolstack's job to do proper clean-up.
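The ownership rule can be modelled in isolation, as shown below. This is a minimal sketch that uses trimmed-down hypothetical types (`XenDeviceSketch`, `struct BackendSketch`) and counters that stand in for XenStore writes and deletions. Only the `backend == NULL` test corresponds to what the patch actually checks.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for XenBackendInstance. */
struct BackendSketch {
    const char *type;
};

typedef struct {
    struct BackendSketch *backend; /* non-NULL: created from toolstack nodes */
    int frontend_writes;           /* simulated XenStore frontend writes */
    int node_destroys;             /* simulated XenStore node deletions */
} XenDeviceSketch;

/* QEMU owns the XenStore nodes only if it created the device itself. */
static bool qemu_owns_nodes(const XenDeviceSketch *xendev)
{
    assert(xendev != NULL);
    return xendev->backend == NULL;
}

static void realize_sketch(XenDeviceSketch *xendev)
{
    if (qemu_owns_nodes(xendev)) {
        xendev->frontend_writes++;  /* e.g. "virtual-device" or "mac" */
    }
}

static void teardown_sketch(XenDeviceSketch *xendev)
{
    if (qemu_owns_nodes(xendev)) {
        xendev->node_destroys++;    /* toolstack-created nodes are left alone */
    }
}
```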

Suggested-by: Paul Durrant 
Suggested-by: David Woodhouse 
Co-Authored-by: Oleksandr Tyshchenko 
Signed-off-by: Volodymyr Babchuk 
Reviewed-by: David Woodhouse 

---

Changes in v4:
  - don't touch "tty" entry in the console backend

Changes in v3:
 - Rephrased the commit message
---
 hw/block/xen-block.c | 16 +---
 hw/net/xen_nic.c | 18 ++
 hw/xen/xen-bus.c | 14 +-
 3 files changed, 28 insertions(+), 20 deletions(-)

diff --git a/hw/block/xen-block.c b/hw/block/xen-block.c
index c2ac9db4a2..dac519a6d3 100644
--- a/hw/block/xen-block.c
+++ b/hw/block/xen-block.c
@@ -390,13 +390,15 @@ static void xen_block_realize(XenDevice *xendev, Error **errp)
 
 xen_device_backend_printf(xendev, "info", "%u", blockdev->info);
 
-xen_device_frontend_printf(xendev, "virtual-device", "%lu",
-   vdev->number);
-xen_device_frontend_printf(xendev, "device-type", "%s",
-   blockdev->device_type);
-
-xen_device_backend_printf(xendev, "sector-size", "%u",
-  conf->logical_block_size);
+if (!xendev->backend) {
+xen_device_frontend_printf(xendev, "virtual-device", "%lu",
+   vdev->number);
+xen_device_frontend_printf(xendev, "device-type", "%s",
+   blockdev->device_type);
+
+xen_device_backend_printf(xendev, "sector-size", "%u",
+  conf->logical_block_size);
+}
 
 xen_block_set_size(blockdev);
 
diff --git a/hw/net/xen_nic.c b/hw/net/xen_nic.c
index afa10c96e8..27442bef38 100644
--- a/hw/net/xen_nic.c
+++ b/hw/net/xen_nic.c
@@ -315,14 +315,16 @@ static void xen_netdev_realize(XenDevice *xendev, Error **errp)
 
 qemu_macaddr_default_if_unset(>conf.macaddr);
 
-xen_device_frontend_printf(xendev, "mac", "%02x:%02x:%02x:%02x:%02x:%02x",
-   netdev->conf.macaddr.a[0],
-   netdev->conf.macaddr.a[1],
-   netdev->conf.macaddr.a[2],
-   netdev->conf.macaddr.a[3],
-   netdev->conf.macaddr.a[4],
-   netdev->conf.macaddr.a[5]);
-
+if (!xendev->backend) {
+xen_device_frontend_printf(xendev, "mac",
+   "%02x:%02x:%02x:%02x:%02x:%02x",
+   netdev->conf.macaddr.a[0],
+   netdev->conf.macaddr.a[1],
+   netdev->conf.macaddr.a[2],
+   netdev->conf.macaddr.a[3],
+   netdev->conf.macaddr.a[4],
+   netdev->conf.macaddr.a[5]);
+}
 netdev->nic = qemu_new_nic(_xen_info, >conf,
object_get_typename(OBJECT(xendev)),
DEVICE(xendev)->id, netdev);
diff --git a/hw/xen/xen-bus.c b/hw/xen/xen-bus.c
index dd0171ab98..d0f17aeb27 100644
--- a/hw/xen/xen-bus.c
+++ b/hw/xen/xen-bus.c
@@ -599,8 +599,10 @@ static void xen_device_backend_destroy(XenDevice *xendev)
 
 g_assert(xenbus->xsh);
 
-xs_node_destroy(xenbus->xsh, XBT_NULL, xendev->backend_path,
-_err);
+if (!xendev->backend) {
+xs_node_destroy(xenbus->xsh, XBT_NULL, xendev->backend_path,
+_err);
+}
 g_free(xendev->backend_path);
 xendev->backend_path = NULL;
 
@@ -764,8 +766,10 @@ static void xen_device_frontend_destroy(XenDevice *xendev)
 
 g_assert(xenbus->xsh);
 
-xs_node_destroy(xenbus->xsh, XBT_NULL, xendev->frontend_path,
-_err);
+if (!xendev->backend) {
+xs_node_destroy(xenbus->xsh, XBT_NULL, xendev->frontend_path,
+_err);
+}
 g_free(xendev->frontend_path);
 xendev->frontend_path = NULL;
 
@@ -1063,7 +1067,7 @@

[PATCH v4 1/6] hw/xen: Set XenBackendInstance in the XenDevice before realizing it

From: David Woodhouse 

This allows a XenDevice implementation to know whether it was created
by QEMU, or merely discovered in XenStore after the toolstack created
it. This will allow us to create frontend/backend nodes only when we
should, rather than unconditionally attempting to overwrite them from
a driver domain which doesn't have privileges to do so.

As an added benefit, it also means we no longer have to call the
xen_backend_set_device() function from the device models immediately
after calling qdev_realize_and_unref(). Even though we could make
the argument that it's safe to do so, and the pointer to the unreffed
device *will* actually still be valid, it still made my skin itch to
look at it.
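Below is a minimal sketch of the new wiring, using hypothetical trimmed-down types (`XenDeviceSketch`, `XenBackendSketch`). The creator stores the backend pointer before realizing the device. Realize then fills in the reverse link, which is the job `xen_backend_set_device()` does in the patch. A destroy path can afterwards reach the backend directly, instead of walking a list as the removed `xen_backend_list_find()` did.

```c
#include <assert.h>
#include <stddef.h>

typedef struct XenDeviceSketch XenDeviceSketch;

typedef struct {
    XenDeviceSketch *xendev;        /* reverse link, set during realize */
} XenBackendSketch;

struct XenDeviceSketch {
    XenBackendSketch *backend;      /* set by the creator before realize */
    int realized;
};

static void device_realize_sketch(XenDeviceSketch *xendev)
{
    xendev->realized = 1;
    if (xendev->backend) {
        /* Corresponds to the xen_backend_set_device() call added to realize. */
        xendev->backend->xendev = xendev;
    }
}

/* Direct lookup: no walk over a global backend list is needed. */
static XenBackendSketch *backend_of_sketch(XenDeviceSketch *xendev)
{
    assert(xendev != NULL);
    return xendev->backend;
}
```

Devices created from the QEMU command line simply leave `backend` as NULL. The same pointer therefore also tells later code who created the device.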

Signed-off-by: David Woodhouse 
Reviewed-by: Paul Durrant 
---
 hw/block/xen-block.c |  3 +--
 hw/char/xen_console.c|  2 +-
 hw/net/xen_nic.c |  2 +-
 hw/xen/xen-backend.c | 15 +--
 hw/xen/xen-bus.c |  4 
 include/hw/xen/xen-backend.h |  2 --
 include/hw/xen/xen-bus.h |  2 ++
 7 files changed, 10 insertions(+), 20 deletions(-)

diff --git a/hw/block/xen-block.c b/hw/block/xen-block.c
index 6d64ede94f..c2ac9db4a2 100644
--- a/hw/block/xen-block.c
+++ b/hw/block/xen-block.c
@@ -1081,13 +1081,12 @@ static void xen_block_device_create(XenBackendInstance *backend,
 
 blockdev->iothread = iothread;
 blockdev->drive = drive;
+xendev->backend = backend;
 
 if (!qdev_realize_and_unref(DEVICE(xendev), BUS(xenbus), errp)) {
 error_prepend(errp, "realization of device %s failed: ", type);
 goto fail;
 }
-
-xen_backend_set_device(backend, xendev);
 return;
 
 fail:
diff --git a/hw/char/xen_console.c b/hw/char/xen_console.c
index 5cbee2f184..bef8a3a621 100644
--- a/hw/char/xen_console.c
+++ b/hw/char/xen_console.c
@@ -600,8 +600,8 @@ static void xen_console_device_create(XenBackendInstance *backend,
 goto fail;
 }
 
+xendev->backend = backend;
 if (qdev_realize_and_unref(DEVICE(xendev), BUS(xenbus), errp)) {
-xen_backend_set_device(backend, xendev);
 goto done;
 }
 
diff --git a/hw/net/xen_nic.c b/hw/net/xen_nic.c
index af4ba3f1e6..afa10c96e8 100644
--- a/hw/net/xen_nic.c
+++ b/hw/net/xen_nic.c
@@ -627,8 +627,8 @@ static void xen_net_device_create(XenBackendInstance *backend,
 net->dev = number;
 memcpy(>conf.macaddr, , sizeof(mac));
 
+xendev->backend = backend;
 if (qdev_realize_and_unref(DEVICE(xendev), BUS(xenbus), errp)) {
-xen_backend_set_device(backend, xendev);
 return;
 }
 
diff --git a/hw/xen/xen-backend.c b/hw/xen/xen-backend.c
index b9bf70a9f5..b2e753ebc8 100644
--- a/hw/xen/xen-backend.c
+++ b/hw/xen/xen-backend.c
@@ -88,19 +88,6 @@ static void xen_backend_list_add(XenBackendInstance *backend)
 QLIST_INSERT_HEAD(_list, backend, entry);
 }
 
-static XenBackendInstance *xen_backend_list_find(XenDevice *xendev)
-{
-XenBackendInstance *backend;
-
-QLIST_FOREACH(backend, _list, entry) {
-if (backend->xendev == xendev) {
-return backend;
-}
-}
-
-return NULL;
-}
-
 bool xen_backend_exists(const char *type, const char *name)
 {
 const XenBackendImpl *impl = xen_backend_table_lookup(type);
@@ -170,7 +157,7 @@ XenDevice *xen_backend_get_device(XenBackendInstance *backend)
 
 bool xen_backend_try_device_destroy(XenDevice *xendev, Error **errp)
 {
-XenBackendInstance *backend = xen_backend_list_find(xendev);
+XenBackendInstance *backend = xendev->backend;
 const XenBackendImpl *impl;
 
 if (!backend) {
diff --git a/hw/xen/xen-bus.c b/hw/xen/xen-bus.c
index 4973e7d9c9..dd0171ab98 100644
--- a/hw/xen/xen-bus.c
+++ b/hw/xen/xen-bus.c
@@ -1079,6 +1079,10 @@ static void xen_device_realize(DeviceState *dev, Error **errp)
 }
 }
 
+if (xendev->backend) {
+xen_backend_set_device(xendev->backend, xendev);
+}
+
 xendev->exit.notify = xen_device_exit;
 qemu_add_exit_notifier(>exit);
 return;
diff --git a/include/hw/xen/xen-backend.h b/include/hw/xen/xen-backend.h
index 0f01631ae7..ea080ba7c9 100644
--- a/include/hw/xen/xen-backend.h
+++ b/include/hw/xen/xen-backend.h
@@ -10,8 +10,6 @@
 
 #include "hw/xen/xen-bus.h"
 
-typedef struct XenBackendInstance XenBackendInstance;
-
 typedef void (*XenBackendDeviceCreate)(XenBackendInstance *backend,
QDict *opts, Error **errp);
 typedef void (*XenBackendDeviceDestroy)(XenBackendInstance *backend,
diff --git a/include/hw/xen/xen-bus.h b/include/hw/xen/xen-bus.h
index 334ddd1ff6..7647c4c38e 100644
--- a/include/hw/xen/xen-bus.h
+++ b/include/hw/xen/xen-bus.h
@@ -14,9 +14,11 @@
 #include "qom/object.h"
 
 typedef struct XenEventChannel XenEventChannel;
+typedef struct XenBackendInstance XenBackendInstance;
 
 struct XenDevice {
 DeviceState qdev;
+XenBackendInstance *backend;
 domid_t frontend_id;
 char *name;
 struct qemu_xs_handle *xsh;

[PATCH v11 07/17] vpci/header: implement guest BAR register handlers

From: Oleksandr Andrushchenko 

Add relevant vpci register handlers when assigning a PCI device to a
domain and remove them when de-assigning it. This allows having
different handlers for different domains, e.g. hwdom and other guests.

Emulate guest BAR register values: this allows creating a guest view
of the registers and emulating the size and properties probe that the
guest performs during PCI device enumeration.

All empty, I/O and ROM BARs for guests are emulated by returning 0 on
reads and ignoring writes: these BARs are special in this respect, as
their lower bits have special meaning, so returning the default ~0 on
reads may confuse the guest OS.
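The address bookkeeping in `guest_mem_bar_write()` can be illustrated with a standalone model. This sketch rests on several assumptions. `struct guest_bar`, `BAR_MEM_MASK` and the helper names are hypothetical. The BAR is assumed to be a 64-bit memory BAR with a power-of-two size. Reads omit the type and prefetch flag bits that a real BAR reports. The model shows why masking with `~(size - 1)` lets a guest size the BAR by writing all ones and reading the value back.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical mask: the low 4 bits of a memory BAR hold flag bits. */
#define BAR_MEM_MASK 0xfffffff0U

struct guest_bar {
    uint64_t guest_addr;  /* BAR address as seen by the guest */
    uint64_t size;        /* BAR size, assumed to be a power of two */
    bool enabled;         /* true while mapped into the p2m */
};

/* Apply a 32-bit guest write to the low (hi == false) or high half. */
static void guest_bar_write_sketch(struct guest_bar *bar, bool hi, uint32_t val)
{
    unsigned int shift = hi ? 32 : 0;
    uint64_t addr = bar->guest_addr;

    assert(bar->size != 0 && (bar->size & (bar->size - 1)) == 0);

    if ( !hi )
        val &= BAR_MEM_MASK;                /* drop the flag bits */

    addr &= ~(0xffffffffULL << shift);      /* clear the half being written */
    addr |= (uint64_t)val << shift;         /* merge in the new half */
    addr &= ~(bar->size - 1);               /* bits below the size read as 0 */

    if ( bar->enabled )
        return;                             /* do not move a mapped BAR */

    bar->guest_addr = addr;
}

static uint32_t guest_bar_read_lo_sketch(const struct guest_bar *bar)
{
    return (uint32_t)bar->guest_addr;
}
```

Because bits below the BAR size are always cleared, the all-ones probe reads back as `~(size - 1)` in the low half. Once the flag bits are masked off, the guest recovers the size as `~value + 1`, just as it would with a physical BAR.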

Signed-off-by: Oleksandr Andrushchenko 
Signed-off-by: Volodymyr Babchuk 
---
In v11:
- Access guest_addr after adjusting for MEM64_HI bar in
guest_bar_write()
- guest BAR handlers renamed and now have a _mem_ part to denote
that they handle only memory BARs
- refuse to update guest BAR address if BAR is enabled
In v10:
- ull -> ULL to be MISRA-compatible
- Use PAGE_OFFSET() instead of combining with ~PAGE_MASK
- Set type of empty bars to VPCI_BAR_EMPTY
In v9:
- factored-out "fail" label introduction in init_bars()
- replaced #ifdef CONFIG_X86 with IS_ENABLED()
- do not pass bars[i] to empty_bar_read() handler
- store guest's BAR address instead of guest's BAR register view
Since v6:
- unify the writing of the PCI_COMMAND register on the
  error path into a label
- do not introduce bar_ignore_access helper and open code
- s/guest_bar_ignore_read/empty_bar_read
- update error message in guest_bar_write
- only setup empty_bar_read for IO if !x86
Since v5:
- make sure that the guest set address has the same page offset
  as the physical address on the host
- remove guest_rom_{read|write} as those just implement the default
  behaviour of the registers not being handled
- adjusted comment for struct vpci.addr field
- add guest handlers for BARs which are not handled and will otherwise
  return ~0 on read and ignore writes. The BARs are special with this
  respect as their lower bits have special meaning, so returning ~0
  doesn't seem to be right
Since v4:
- updated commit message
- s/guest_addr/guest_reg
Since v3:
- squashed two patches: dynamic add/remove handlers and guest BAR
  handler implementation
- fix guest BAR read of the high part of a 64bit BAR (Roger)
- add error handling to vpci_assign_device
- s/dom%pd/%pd
- blank line before return
Since v2:
- remove unneeded ifdefs for CONFIG_HAS_VPCI_GUEST_SUPPORT as more code
  has been eliminated from being built on x86
Since v1:
 - constify struct pci_dev where possible
 - do not open code is_system_domain()
 - simplify some code
 - use gdprintk + error code instead of gprintk
 - gate vpci_bar_{add|remove}_handlers with CONFIG_HAS_VPCI_GUEST_SUPPORT,
   so these do not get compiled for x86
 - removed unneeded is_system_domain check
 - re-work guest read/write to be much simpler and do more work on write
   than read which is expected to be called more frequently
 - removed one too obvious comment
---
 xen/drivers/vpci/header.c | 135 +-
 xen/include/xen/vpci.h|   3 +
 2 files changed, 122 insertions(+), 16 deletions(-)

diff --git a/xen/drivers/vpci/header.c b/xen/drivers/vpci/header.c
index e6a1d58c42..43216429d9 100644
--- a/xen/drivers/vpci/header.c
+++ b/xen/drivers/vpci/header.c
@@ -477,6 +477,75 @@ static void cf_check bar_write(
 pci_conf_write32(pdev->sbdf, reg, val);
 }
 
+static void cf_check guest_mem_bar_write(const struct pci_dev *pdev,
+ unsigned int reg, uint32_t val,
+ void *data)
+{
+struct vpci_bar *bar = data;
+bool hi = false;
+uint64_t guest_addr;
+
+if ( bar->type == VPCI_BAR_MEM64_HI )
+{
+ASSERT(reg > PCI_BASE_ADDRESS_0);
+bar--;
+hi = true;
+}
+else
+{
+val &= PCI_BASE_ADDRESS_MEM_MASK;
+}
+
+guest_addr = bar->guest_addr;
+guest_addr &= ~(0xULL << (hi ? 32 : 0));
+guest_addr |= (uint64_t)val << (hi ? 32 : 0);
+
+/* Allow guest to size BAR correctly */
+guest_addr &= ~(bar->size - 1);
+
+/*
+ * Xen only cares whether the BAR is mapped into the p2m, so allow BAR
+ * writes as long as the BAR is not mapped into the p2m.
+ */
+if ( bar->enabled )
+{
+/* If the value written is the current one avoid printing a warning. */
+if ( guest_addr != bar->guest_addr )
+gprintk(XENLOG_WARNING,
+"%pp: ignored guest BAR %zu write while mapped\n",
+>sbdf, bar - pdev->vpci->header.bars + hi);
+return;
+}
+bar->guest_addr = guest_addr;
+}
+
+static uint32_t cf_check guest_mem_bar_read(const struct pci_dev *pdev,
+unsigned int reg, void *data)
+{
+const struct vpci_bar *bar = data;
+uint32_t reg_val;
+
+if ( bar->type ==

[PATCH v11 03/17] vpci: use per-domain PCI lock to protect vpci structure

From: Oleksandr Andrushchenko 

Use the previously introduced per-domain read/write lock to check
whether vpci is present, so we can be sure that the contents of the
vpci struct are not accessed if it is absent. This lock can be used
(and in a few cases is used right away) so that vpci removal can be
performed while holding the lock in write mode. Previously, such
removal could race with vpci_read(), for example.

When taking both d->pci_lock and pdev->vpci->lock, they should be
taken in this exact order: d->pci_lock then pdev->vpci->lock to avoid
possible deadlock situations.

1. Per-domain's pci_rwlock is used to protect pdev->vpci structure
from being removed.

2. Writing the command register and ROM BAR register may trigger
modify_bars to run, which in turn may access multiple pdevs while
checking for overlap with existing BARs. The overlap check, if done
under the read lock, requires vpci->lock to be acquired on both
devices being compared, which may produce a deadlock. It is not
possible to upgrade a read lock to a write lock in such a case. So, in
order to prevent the deadlock, take d->pci_lock in write mode instead.
To prevent a deadlock while locking both hwdom->pci_lock and
dom_xen->pci_lock, always lock hwdom first.

All other code, which doesn't lead to pdev->vpci destruction and does
not access multiple pdevs at the same time, can still use a
combination of the read lock and pdev->vpci->lock.

3. Drop const qualifier where the new rwlock is used and this is
appropriate.

4. Do not call process_pending_softirqs with any locks held. For that,
unlock prior to the call and re-acquire the locks afterwards. After
re-acquiring the lock there is no need to check if pdev->vpci exists:
 - in apply_map, because of the context in which it is called (no race
   condition is possible)
 - for MSI/MSI-X debug code, because it is called at the end of
   pdev->vpci access and no further access to pdev->vpci is made

5. Use d->pci_lock around for_each_pdev and pci_get_pdev_by_domain
while accessing pdevs in vpci code.

6. We are removing multiple ASSERT(pcidevs_locked()) instances because
they are too strict now: they should be corrected to
ASSERT(pcidevs_locked() || rw_is_locked(>pci_lock)), but the problem is
that these instances do not have access to the domain pointer, and it
is not feasible to pass a domain pointer to a function just for
debugging purposes.
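One way to picture the "always lock hwdom first" rule from item 2 is a helper that sorts the pair before locking. This is a hypothetical single-threaded model, not Xen code. `struct domain_sketch` records an acquisition sequence number instead of holding a real rwlock, which makes the enforced order easy to check.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for struct domain and the state of its pci_lock. */
struct domain_sketch {
    bool is_hwdom;
    unsigned int locked_seq;   /* acquisition order; 0 means not locked */
};

static unsigned int lock_counter;

static void write_lock_sketch(struct domain_sketch *d)
{
    assert(d->locked_seq == 0);          /* the model forbids recursion */
    d->locked_seq = ++lock_counter;
}

static void write_unlock_sketch(struct domain_sketch *d)
{
    assert(d->locked_seq != 0);
    d->locked_seq = 0;
}

/*
 * Take both locks in one global order (hardware domain first), so two
 * CPUs locking the same pair can never each hold one lock and wait for
 * the other.
 */
static void lock_pair_sketch(struct domain_sketch *a, struct domain_sketch *b)
{
    if ( b->is_hwdom && !a->is_hwdom )
    {
        struct domain_sketch *tmp = a;

        a = b;
        b = tmp;
    }

    write_lock_sketch(a);
    if ( b != a )
        write_lock_sketch(b);
}

static void unlock_pair_sketch(struct domain_sketch *a, struct domain_sketch *b)
{
    /* Release order does not matter for deadlock avoidance. */
    write_unlock_sketch(a);
    if ( b != a )
        write_unlock_sketch(b);
}
```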

Suggested-by: Roger Pau Monné 
Suggested-by: Jan Beulich 
Signed-off-by: Oleksandr Andrushchenko 
Signed-off-by: Volodymyr Babchuk 

---
Changes in v11:
 - Fixed commit message regarding possible spinlocks
 - Removed parameter from allocate_and_map_msi_pirq(), which was added
 in the prev version. Now we are taking pcidevs_lock in
 physdev_map_pirq()
 - Returned ASSERT to pci_enable_msi
 - Fixed case when we took read lock instead of write one
 - Fixed label indentation

Changes in v10:
 - Moved printk past locked area
 - Returned back ASSERTs
 - Added new parameter to allocate_and_map_msi_pirq() so it knows if
 it should take the global pci lock
 - Added comment about possible improvement in vpci_write
 - Changed ASSERT(rw_is_locked()) to rw_is_write_locked() in
   appropriate places
 - Renamed release_domain_locks() to release_domain_write_locks()
 - moved domain_done label in vpci_dump_msi() to correct place
Changes in v9:
 - extended locked region to protect vpci_remove_device and
   vpci_add_handlers() calls
 - vpci_write() takes lock in the write mode to protect
   potential call to modify_bars()
 - renamed lock releasing function
 - removed ASSERT()s from msi code
 - added trylock in vpci_dump_msi

Changes in v8:
 - changed d->vpci_lock to d->pci_lock
 - introducing d->pci_lock in a separate patch
 - extended locked region in vpci_process_pending
 - removed pcidevs_lock in vpci_dump_msi()
 - removed some changes as they are not needed with
   the new locking scheme
 - added handling for hwdom && dom_xen case
---
 xen/arch/x86/hvm/vmsi.c   | 22 +++
 xen/arch/x86/hvm/vmx/vmx.c|  2 --
 xen/arch/x86/irq.c|  8 +++---
 xen/arch/x86/msi.c| 10 ++-
 xen/arch/x86/physdev.c|  2 ++
 xen/drivers/passthrough/pci.c |  9 +++---
 xen/drivers/vpci/header.c | 18 
 xen/drivers/vpci/msi.c| 28 +--
 xen/drivers/vpci/msix.c   | 52 ++-
 xen/drivers/vpci/vpci.c   | 50 +++--
 10 files changed, 160 insertions(+), 41 deletions(-)

diff --git a/xen/arch/x86/hvm/vmsi.c b/xen/arch/x86/hvm/vmsi.c
index 128f236362..03caf91bee 100644
--- a/xen/arch/x86/hvm/vmsi.c
+++ b/xen/arch/x86/hvm/vmsi.c
@@ -468,7 +468,7 @@ int msixtbl_pt_register(struct domain *d, struct pirq *pirq, uint64_t gtable)
 struct msixtbl_entry *entry, *new_entry;
 int r = -EINVAL;
 
-ASSERT(pcidevs_locked());
+ASSERT(pcidevs_locked() || rw_is_locked(>pci_lock));
 ASSERT(rw_is_write_locked(>event_lock));
 
 if ( !msixtbl_initialised(d) )
@@ -538,7 +538,7 @@ void

[PATCH v11 00/17] PCI devices passthrough on Arm, part 3

This is the next version of the vPCI rework. The aim of this series is
to prepare the ground for introducing PCI support on the ARM platform.

in v11:
 - Added my Signed-off-by tag to all patches
 - Patch "vpci/header: emulate PCI_COMMAND register for guests" is in an
   intermediate state, because it was agreed to rework it once Stewart's
   series on register handling is in.
 - Addressed comments, please see patch descriptions for details.

in v10:

 - Removed patch ("xen/arm: vpci: check guest range"), proper fix
   for the issue is part of ("vpci/header: emulate PCI_COMMAND
   register for guests")
 - Removed patch ("pci/header: reset the command register when adding
   devices")
 - Added patch ("rangeset: add rangeset_empty() function") because
   this function is needed in ("vpci/header: handle p2m range sets
   per BAR")
 - Added ("vpci/header: handle p2m range sets per BAR") which addressed
   an issue discovered by Andrii Chepurnyi during virtio integration
 - Added ("pci: msi: pass pdev to pci_enable_msi() function"), which is
   prereq for ("pci: introduce per-domain PCI rwlock")
 - Fixed "Since v9/v8/... " comments in changelogs to reduce confusion.
   I left "Since" entries for older versions, because they were added
   by the original author of the patches.

in v9:

v9 addresses comments on the previous version. It also introduces a
couple of patches from Stewart. These patches are related to vPCI use
on ARM. Patch "vpci/header: rework exit path in init_bars" was
factored out from "vpci/header: handle p2m range sets per BAR".

in v8:

The biggest change from the previous, mistakenly named, v7 series is
how locking is implemented. Instead of d->vpci_rwlock we introduce
d->pci_lock, which has a broader scope, as it protects not only the
domain's vpci state but also the domain's list of PCI devices.

As discussed on IRC with Roger, it is not feasible to rework all the
existing code to use the new lock right away. It was agreed that any
write access to d->pdev_list will be protected by **both**
d->pci_lock in write mode and pcidevs_lock(). Read access, on the
other hand, should be protected by either d->pci_lock in read mode or
pcidevs_lock(). It is expected that existing code will use
pcidevs_lock() and new users will use the new rwlock. Of course, this
does not mean that new users shall not use pcidevs_lock() when it is
appropriate.
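The agreed rule can be written as two predicates. In this illustrative sketch, a hypothetical `struct lock_state` describes which locks the current context holds. The read predicate has the same shape as the `pcidevs_locked() || rw_is_locked(...)` assertions used later in the series.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical snapshot of the locks held by the current context. */
struct lock_state {
    bool pcidevs_locked;   /* global pcidevs_lock() held */
    bool pci_lock_read;    /* d->pci_lock held in read mode */
    bool pci_lock_write;   /* d->pci_lock held in write mode */
};

static void check_lock_state(const struct lock_state *s)
{
    /* One context cannot hold the same rwlock in both modes at once. */
    assert(!(s->pci_lock_read && s->pci_lock_write));
}

/* Writers to d->pdev_list must hold both locks. */
static bool may_modify_pdev_list(const struct lock_state *s)
{
    check_lock_state(s);
    return s->pcidevs_locked && s->pci_lock_write;
}

/* Readers need either lock; holding d->pci_lock for writing also allows reads. */
static bool may_read_pdev_list(const struct lock_state *s)
{
    check_lock_state(s);
    return s->pcidevs_locked || s->pci_lock_read || s->pci_lock_write;
}
```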



Changes from previous versions are described in each separate patch.

Oleksandr Andrushchenko (11):
  vpci: use per-domain PCI lock to protect vpci structure
  vpci: restrict unhandled read/write operations for guests
  vpci: add hooks for PCI device assign/de-assign
  vpci/header: implement guest BAR register handlers
  rangeset: add RANGESETF_no_print flag
  vpci/header: handle p2m range sets per BAR
  vpci/header: program p2m with guest BAR view
  vpci/header: emulate PCI_COMMAND register for guests
  vpci: add

[PATCH v11 01/17] pci: msi: pass pdev to pci_enable_msi() function

Previously, the pci_enable_msi() function obtained the pdev pointer by
itself. Taking into account upcoming changes to PCI locking, it is
better for the caller to pass an already acquired pdev pointer to the
function, because the caller knows better how to obtain the pointer
and which locks need to be held. Also, in most cases the caller already
has a pointer to pdev, so we can avoid an extra list walk.
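The benefit for callers that already hold a pdev can be sketched as follows. Everything here is hypothetical: `struct pci_dev_sketch`, `get_pdev_sketch()` and `enable_msix_sketch()`. The sketch only models the shape of the new interface. The caller performs the single lookup under whatever lock it needs, and the helper merely validates the pointer. For a missing pdev or a device without MSI-X, the helper returns -ENODEV, much like `__pci_enable_msix()`.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical, trimmed-down stand-in for struct pci_dev. */
struct pci_dev_sketch {
    unsigned int sbdf;
    int has_msix;
};

static struct pci_dev_sketch devices[] = { { 0x0800, 0 }, { 0x1000, 1 } };
static unsigned int lookups;   /* counts walks over the device list */

static struct pci_dev_sketch *get_pdev_sketch(unsigned int sbdf)
{
    lookups++;
    for ( size_t i = 0; i < sizeof(devices) / sizeof(devices[0]); i++ )
        if ( devices[i].sbdf == sbdf )
            return &devices[i];
    return NULL;
}

/* New-style helper: the caller supplies pdev; no lookup happens here. */
static int enable_msix_sketch(struct pci_dev_sketch *pdev)
{
    if ( !pdev || !pdev->has_msix )
        return -ENODEV;
    return 0;
}
```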

Signed-off-by: Volodymyr Babchuk 

---
In v11:
 - Made pdev parameter very first in pci_enable_msi() and friends.
 - Extended the commit message
 - Added check for pdev into ns16550 driver
 - Replaced hard tabs with spaces

Changes in v10:

 - New in v10. This is the result of discussion in "vpci: add initial
 support for virtual PCI bus topology"
---
 xen/arch/x86/include/asm/msi.h |  5 +++--
 xen/arch/x86/irq.c |  2 +-
 xen/arch/x86/msi.c | 19 ++-
 xen/drivers/char/ns16550.c | 28 ++--
 4 files changed, 32 insertions(+), 22 deletions(-)

diff --git a/xen/arch/x86/include/asm/msi.h b/xen/arch/x86/include/asm/msi.h
index c1ece2786e..07b3ee55e9 100644
--- a/xen/arch/x86/include/asm/msi.h
+++ b/xen/arch/x86/include/asm/msi.h
@@ -81,8 +81,9 @@ struct irq_desc;
 struct hw_interrupt_type;
 struct msi_desc;
 /* Helper functions */
-extern int pci_enable_msi(struct msi_info *msi, struct msi_desc **desc);
-extern void pci_disable_msi(struct msi_desc *msi_desc);
+extern int pci_enable_msi(struct pci_dev *pdev, struct msi_info *msi,
+  struct msi_desc **desc);
+extern void pci_disable_msi(struct msi_desc *desc);
 extern int pci_prepare_msix(u16 seg, u8 bus, u8 devfn, bool off);
 extern void pci_cleanup_msi(struct pci_dev *pdev);
 extern int setup_msi_irq(struct irq_desc *desc, struct msi_desc *msidesc);
diff --git a/xen/arch/x86/irq.c b/xen/arch/x86/irq.c
index 6e668b1b4f..50e49e1a4b 100644
--- a/xen/arch/x86/irq.c
+++ b/xen/arch/x86/irq.c
@@ -2176,7 +2176,7 @@ int map_domain_pirq(
 if ( !pdev )
 goto done;
 
-ret = pci_enable_msi(msi, _desc);
+ret = pci_enable_msi(pdev, msi, _desc);
 if ( ret )
 {
 if ( ret > 0 )
diff --git a/xen/arch/x86/msi.c b/xen/arch/x86/msi.c
index 7f8e794254..335c0868a2 100644
--- a/xen/arch/x86/msi.c
+++ b/xen/arch/x86/msi.c
@@ -983,13 +983,13 @@ static int msix_capability_init(struct pci_dev *dev,
  * irq or non-zero for otherwise.
  **/
 
-static int __pci_enable_msi(struct msi_info *msi, struct msi_desc **desc)
+static int __pci_enable_msi(struct pci_dev *pdev, struct msi_info *msi,
+struct msi_desc **desc)
 {
-struct pci_dev *pdev;
 struct msi_desc *old_desc;
 
 ASSERT(pcidevs_locked());
-pdev = pci_get_pdev(NULL, msi->sbdf);
+
 if ( !pdev )
 return -ENODEV;
 
@@ -1038,13 +1038,13 @@ static void __pci_disable_msi(struct msi_desc *entry)
  * of irqs available. Driver should use the returned value to re-send
  * its request.
  **/
-static int __pci_enable_msix(struct msi_info *msi, struct msi_desc **desc)
+static int __pci_enable_msix(struct pci_dev *pdev, struct msi_info *msi,
+ struct msi_desc **desc)
 {
-struct pci_dev *pdev;
 struct msi_desc *old_desc;
 
 ASSERT(pcidevs_locked());
-pdev = pci_get_pdev(NULL, msi->sbdf);
+
 if ( !pdev || !pdev->msix )
 return -ENODEV;
 
@@ -1151,15 +1151,16 @@ int pci_prepare_msix(u16 seg, u8 bus, u8 devfn, bool off)
  * Notice: only construct the msi_desc
  * no change to irq_desc here, and the interrupt is masked
  */
-int pci_enable_msi(struct msi_info *msi, struct msi_desc **desc)
+int pci_enable_msi(struct pci_dev *pdev, struct msi_info *msi,
+   struct msi_desc **desc)
 {
 ASSERT(pcidevs_locked());
 
 if ( !use_msi )
 return -EPERM;
 
-return msi->table_base ? __pci_enable_msix(msi, desc) :
- __pci_enable_msi(msi, desc);
+return msi->table_base ? __pci_enable_msix(pdev, msi, desc) :
+ __pci_enable_msi(pdev, msi, desc);
 }
 
 /*
diff --git a/xen/drivers/char/ns16550.c b/xen/drivers/char/ns16550.c
index ddf2a48be6..cfe9ff8d2a 100644
--- a/xen/drivers/char/ns16550.c
+++ b/xen/drivers/char/ns16550.c
@@ -452,21 +452,29 @@ static void __init cf_check ns16550_init_postirq(struct serial_port *port)
 if ( rc > 0 )
 {
 struct msi_desc *msi_desc = NULL;
+struct pci_dev *pdev;
 
 pcidevs_lock();
 
-rc = pci_enable_msi(, _desc);
-if ( !rc )
+pdev = pci_get_pdev(NULL, msi.sbdf);
+if ( pdev )
 {
-struct irq_desc *desc = irq_to_desc(msi.irq);
-unsigned long flags;
-
-spin_lock_irqsave(>lock, flags);
-rc = setup_msi_irq(desc, msi_desc);
-spin_unlock_irqrestore(&desc->lock,

[PATCH v11 08/17] rangeset: add RANGESETF_no_print flag

From: Oleksandr Andrushchenko 

There are range sets which should not be printed, so introduce a flag
which allows marking those as such. Implement relevant logic to skip
such entries while printing.

While at it, also simplify the definition of the flags by defining
them directly, without helpers.
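A minimal, self-contained sketch of the idea, not Xen code: `struct toy_rangeset`, `toy_domain_printk()` and `toy_flags_valid()` are hypothetical stand-ins. Only the two flag values mirror the definitions added by this patch. Sets carrying `RANGESETF_no_print` are skipped with `continue`, and unknown flag bits are rejected, as the updated `BUG_ON()` in `rangeset_new()` does.

```c
#include <stdio.h>

/* Flag values mirroring the definitions added by this patch. */
#define RANGESETF_prettyprint_hex (1U << 0)
#define RANGESETF_no_print        (1U << 1)

/* Hypothetical stand-in for struct rangeset: only a name and flags. */
struct toy_rangeset {
    const char *name;
    unsigned int flags;
};

/*
 * Print every set not marked RANGESETF_no_print, mirroring the
 * "continue" added to rangeset_domain_printk(). Returns how many
 * sets were printed.
 */
static unsigned int toy_domain_printk(const struct toy_rangeset *sets,
                                      unsigned int nr)
{
    unsigned int i, printed = 0;

    for ( i = 0; i < nr; i++ )
    {
        if ( sets[i].flags & RANGESETF_no_print )
            continue;
        printf("    %s\n", sets[i].name);
        printed++;
    }

    return printed;
}

/* Non-zero if only known flags are set (the BUG_ON() condition inverted). */
static int toy_flags_valid(unsigned int flags)
{
    return !(flags & ~(RANGESETF_prettyprint_hex | RANGESETF_no_print));
}
```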

Suggested-by: Jan Beulich 
Signed-off-by: Oleksandr Andrushchenko 
Signed-off-by: Volodymyr Babchuk 
Reviewed-by: Jan Beulich 
---
Since v5:
- comment indentation (Jan)
Since v1:
- update BUG_ON with new flag
- simplify the definition of the flags
---
 xen/common/rangeset.c  | 5 -
 xen/include/xen/rangeset.h | 5 +++--
 2 files changed, 7 insertions(+), 3 deletions(-)

diff --git a/xen/common/rangeset.c b/xen/common/rangeset.c
index 16a4c3b842..0ccd53caac 100644
--- a/xen/common/rangeset.c
+++ b/xen/common/rangeset.c
@@ -433,7 +433,7 @@ struct rangeset *rangeset_new(
 INIT_LIST_HEAD(&r->range_list);
 r->nr_ranges = -1;
 
-BUG_ON(flags & ~RANGESETF_prettyprint_hex);
+BUG_ON(flags & ~(RANGESETF_prettyprint_hex | RANGESETF_no_print));
 r->flags = flags;
 
 safe_strcpy(r->name, name ?: "(no name)");
@@ -575,6 +575,9 @@ void rangeset_domain_printk(
 
 list_for_each_entry ( r, &d->rangesets, rangeset_list )
 {
+if ( r->flags & RANGESETF_no_print )
+continue;
+
 printk("");
 rangeset_printk(r);
 printk("\n");
diff --git a/xen/include/xen/rangeset.h b/xen/include/xen/rangeset.h
index 8be0722787..87bd956962 100644
--- a/xen/include/xen/rangeset.h
+++ b/xen/include/xen/rangeset.h
@@ -49,8 +49,9 @@ void rangeset_limit(
 
 /* Flags for passing to rangeset_new(). */
  /* Pretty-print range limits in hexadecimal. */
-#define _RANGESETF_prettyprint_hex 0
-#define RANGESETF_prettyprint_hex  (1U << _RANGESETF_prettyprint_hex)
+#define RANGESETF_prettyprint_hex   (1U << 0)
+ /* Do not print entries marked with this flag. */
+#define RANGESETF_no_print  (1U << 1)
 
 bool __must_check rangeset_is_empty(
 const struct rangeset *r);
-- 
2.42.0

[PATCH v11 11/17] vpci/header: program p2m with guest BAR view

From: Oleksandr Andrushchenko 

Take into account the guest's view of the BARs and program its p2m
accordingly: the gfn is the guest's view of the BAR and the mfn is the
physical BAR value. This way the hardware domain sees physical BAR
values and the guest sees emulated ones.

The hardware domain continues to get its BARs identity mapped, while for
domUs the BARs are mapped at the requested guest address without
modifying the BAR address in the device PCI config space.
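The offset arithmetic that `map_range()` performs with `mfn_add(start_mfn, s - start_gfn)` can be sketched in isolation. This is a simplified illustration, not the patch's code: `struct toy_bar`, `toy_gfn_to_mfn()` and the `TOY_*` macros are hypothetical, 4 KiB pages are assumed, and the addresses in the test are made up. For the hardware domain both addresses are equal, so the mapping degenerates to identity.

```c
#include <stdint.h>

#define TOY_PAGE_SHIFT 12 /* assumption: 4 KiB pages */
#define TOY_PFN_DOWN(x) ((x) >> TOY_PAGE_SHIFT)

/* Simplified BAR: guest-visible address and host physical address. */
struct toy_bar {
    uint64_t guest_addr; /* where the guest placed the BAR */
    uint64_t addr;       /* physical BAR address on the host */
};

/*
 * Translate a guest frame inside the BAR into the machine frame backing
 * it. Ranges may start past the BAR start (holes, partially consumed
 * ranges), so the offset from the BAR start is carried over.
 */
static unsigned long toy_gfn_to_mfn(const struct toy_bar *bar,
                                    unsigned long gfn)
{
    unsigned long start_gfn = TOY_PFN_DOWN(bar->guest_addr);
    unsigned long start_mfn = TOY_PFN_DOWN(bar->addr);

    return start_mfn + (gfn - start_gfn);
}
```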

Signed-off-by: Oleksandr Andrushchenko 
Signed-off-by: Volodymyr Babchuk 
---
In v11:
- Add vmsix_guest_table_addr() and vmsix_guest_table_base() functions
  to access guest's view of the VMSIx tables.
- Use MFN (not GFN) to check access permissions
- Move page offset check to this patch
- Call rangeset_remove_range() with correct parameters
In v10:
- Moved GFN variable definition outside the loop in map_range()
- Updated printk error message in map_range()
- Now BAR address is always stored in bar->guest_addr, even for
  HW dom, this removes a bunch of ugly is_hwdom() checks in modify_bars()
- vmsix_table_base() now uses .guest_addr instead of .addr
In v9:
- Extended the commit message
- Use bar->guest_addr in modify_bars
- Extended printk error message in map_range
- Moved map_data initialization so .bar can be initialized during declaration
Since v5:
- remove debug print in map_range callback
- remove "identity" from the debug print
Since v4:
- moved start_{gfn|mfn} calculation into map_range
- pass vpci_bar in the map_data instead of start_{gfn|mfn}
- s/guest_addr/guest_reg
Since v3:
- updated comment (Roger)
- removed gfn_add(map->start_gfn, rc); which is wrong
- use v->domain instead of v->vpci.pdev->domain
- removed odd e.g. in comment
- s/d%d/%pd in altered code
- use gdprintk for map/unmap logs
Since v2:
- improve readability for data.start_gfn and restructure ?: construct
Since v1:
 - s/MSI/MSI-X in comments
---
 xen/drivers/vpci/header.c | 79 +--
 xen/include/xen/vpci.h| 13 +++
 2 files changed, 73 insertions(+), 19 deletions(-)

diff --git a/xen/drivers/vpci/header.c b/xen/drivers/vpci/header.c
index 7c84cee5d1..21b3fb5579 100644
--- a/xen/drivers/vpci/header.c
+++ b/xen/drivers/vpci/header.c
@@ -33,6 +33,7 @@
 
 struct map_data {
 struct domain *d;
+const struct vpci_bar *bar;
 bool map;
 };
 
@@ -40,13 +41,24 @@ static int cf_check map_range(
 unsigned long s, unsigned long e, void *data, unsigned long *c)
 {
 const struct map_data *map = data;
+/* Start address of the BAR as seen by the guest. */
+unsigned long start_gfn = PFN_DOWN(map->bar->guest_addr);
+/* Physical start address of the BAR. */
+mfn_t start_mfn = _mfn(PFN_DOWN(map->bar->addr));
 int rc;
 
 for ( ; ; )
 {
 unsigned long size = e - s + 1;
+/*
+ * Ranges to be mapped don't always start at the BAR start address, as
+ * there can be holes or partially consumed ranges. Account for the
+ * offset of the current address from the BAR start.
+ */
+mfn_t map_mfn = mfn_add(start_mfn, s - start_gfn);
+unsigned long m_end = mfn_x(map_mfn) + size - 1;
 
-if ( !iomem_access_permitted(map->d, s, e) )
+if ( !iomem_access_permitted(map->d, mfn_x(map_mfn), m_end) )
 {
 printk(XENLOG_G_WARNING
"%pd denied access to MMIO range [%#lx, %#lx]\n",
@@ -54,7 +66,8 @@ static int cf_check map_range(
 return -EPERM;
 }
 
-rc = xsm_iomem_mapping(XSM_HOOK, map->d, s, e, map->map);
+rc = xsm_iomem_mapping(XSM_HOOK, map->d, mfn_x(map_mfn), m_end,
+   map->map);
 if ( rc )
 {
 printk(XENLOG_G_WARNING
@@ -72,8 +85,8 @@ static int cf_check map_range(
  * - {un}map_mmio_regions doesn't support preemption.
  */
 
-rc = map->map ? map_mmio_regions(map->d, _gfn(s), size, _mfn(s))
-  : unmap_mmio_regions(map->d, _gfn(s), size, _mfn(s));
+rc = map->map ? map_mmio_regions(map->d, _gfn(s), size, map_mfn)
+  : unmap_mmio_regions(map->d, _gfn(s), size, map_mfn);
 if ( rc == 0 )
 {
 *c += size;
@@ -82,8 +95,9 @@ static int cf_check map_range(
 if ( rc < 0 )
 {
 printk(XENLOG_G_WARNING
-   "Failed to identity %smap [%lx, %lx] for d%d: %d\n",
-   map->map ? "" : "un", s, e, map->d->domain_id, rc);
+   "Failed to %smap [%lx %lx] -> [%lx %lx] for %pd: %d\n",
+   map->map ? "" : "un", s, e, mfn_x(map_mfn),
+   mfn_x(map_mfn) + size, map->d, rc);
 break;
 }
 ASSERT(rc < size);
@@ -162,10 +176,6 @@ static void modify_decoding(const struct pci_dev *pdev, uint16_t cmd,
 bool vpci_process_pending(struct vcpu *v)
 {
 struct pci_dev *pdev = v->vpci.pdev;
-struct map_data data = {
-.d = v->domain,
-

[PATCH v11 06/17] vpci/header: rework exit path in init_bars

Introduce a "fail" label in init_bars() to provide a centralized
error return path. This is a prerequisite for future changes
in this function.

This patch does not introduce functional changes.
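The centralized error path pattern can be sketched as follows. This is a toy model, not Xen code: `struct toy_dev`, `toy_step()` and `toy_init_bars()` are hypothetical, and the "restore the command register" step stands in for the `pci_conf_write16(pdev->sbdf, PCI_COMMAND, cmd)` call that every error exit previously repeated.

```c
#include <errno.h>
#include <stdint.h>

/* Hypothetical device state: the command register that must be restored. */
struct toy_dev {
    uint16_t command;
};

/* Hypothetical setup step; fails with -ENOMEM when step == fail_at. */
static int toy_step(int step, int fail_at)
{
    return step == fail_at ? -ENOMEM : 0;
}

/*
 * Every error exit jumps to one "fail" label that restores the command
 * register, instead of repeating the restore before each "return rc".
 */
static int toy_init_bars(struct toy_dev *dev, uint16_t cmd, int fail_at)
{
    int rc = 0, i;

    dev->command = 0; /* decoding disabled while the BARs are sized */

    for ( i = 0; i < 3; i++ )
    {
        rc = toy_step(i, fail_at);
        if ( rc )
            goto fail;
    }

    dev->command = cmd;
    return 0;

 fail:
    dev->command = cmd;
    return rc;
}
```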

Signed-off-by: Volodymyr Babchuk 
Suggested-by: Roger Pau Monné 
Acked-by: Roger Pau Monné 
--
In v11:
- Do not remove empty line between "goto fail;" and "continue;"
In v10:
- Added Roger's A-b tag.
In v9:
- New in v9
---
 xen/drivers/vpci/header.c | 19 +++
 1 file changed, 7 insertions(+), 12 deletions(-)

diff --git a/xen/drivers/vpci/header.c b/xen/drivers/vpci/header.c
index ec6c93eef6..e6a1d58c42 100644
--- a/xen/drivers/vpci/header.c
+++ b/xen/drivers/vpci/header.c
@@ -581,10 +581,7 @@ static int cf_check init_bars(struct pci_dev *pdev)
 rc = vpci_add_register(pdev->vpci, vpci_hw_read32, bar_write, reg,
4, &bars[i]);
 if ( rc )
-{
-pci_conf_write16(pdev->sbdf, PCI_COMMAND, cmd);
-return rc;
-}
+goto fail;
 
 continue;
 }
@@ -604,10 +601,7 @@ static int cf_check init_bars(struct pci_dev *pdev)
 rc = pci_size_mem_bar(pdev->sbdf, reg, &addr, &size,
   (i == num_bars - 1) ? PCI_BAR_LAST : 0);
 if ( rc < 0 )
-{
-pci_conf_write16(pdev->sbdf, PCI_COMMAND, cmd);
-return rc;
-}
+goto fail;
 
 if ( size == 0 )
 {
@@ -622,10 +616,7 @@ static int cf_check init_bars(struct pci_dev *pdev)
 rc = vpci_add_register(pdev->vpci, vpci_hw_read32, bar_write, reg, 4,
&bars[i]);
 if ( rc )
-{
-pci_conf_write16(pdev->sbdf, PCI_COMMAND, cmd);
-return rc;
-}
+goto fail;
 }
 
 /* Check expansion ROM. */
@@ -647,6 +638,10 @@ static int cf_check init_bars(struct pci_dev *pdev)
 }
 
 return (cmd & PCI_COMMAND_MEMORY) ? modify_bars(pdev, cmd, false) : 0;
+
+ fail:
+pci_conf_write16(pdev->sbdf, PCI_COMMAND, cmd);
+return rc;
 }
 REGISTER_VPCI_INIT(init_bars, VPCI_PRIORITY_MIDDLE);
 
-- 
2.42.0

[PATCH v11 17/17] arm/vpci: honor access size when returning an error

A guest can read the config space using different access sizes: 8,
16, 32 or 64 bits. We need to take this into account when
returning an error to the MMIO handler; otherwise it is possible to
provide more data than requested, e.g. the guest issues an LDRB
instruction to read one byte, but we write 0x in the target
register.
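A standalone sketch of the mask computation, under the assumption that `size_log2` uses the Arm data-abort SAS encoding (0 = byte, 1 = halfword, 2 = word, 3 = doubleword), as `1 << info->dabt.size` in the patch implies. `toy_error_value()` is a hypothetical name. It uses plain shifts instead of Xen's `GENMASK_ULL()` and special-cases 64 bits, because shifting a 64-bit value by 64 is undefined in C.

```c
#include <stdint.h>

/*
 * All-ones value for a failed read, truncated to the access size.
 * size_log2: 0 = 8 bits, 1 = 16 bits, 2 = 32 bits, 3 = 64 bits.
 */
static uint64_t toy_error_value(unsigned int size_log2)
{
    unsigned int bits = (1U << size_log2) * 8;

    /* (1 << 64) would be undefined, so the full width is handled apart. */
    return bits >= 64 ? ~UINT64_C(0) : (UINT64_C(1) << bits) - 1;
}
```

With this, a one-byte LDRB that fails gets 0xff rather than a full 64-bit all-ones register.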

Signed-off-by: Volodymyr Babchuk 
---
 xen/arch/arm/vpci.c | 6 --
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/xen/arch/arm/vpci.c b/xen/arch/arm/vpci.c
index b6ef440f17..05a479096e 100644
--- a/xen/arch/arm/vpci.c
+++ b/xen/arch/arm/vpci.c
@@ -42,6 +42,8 @@ static int vpci_mmio_read(struct vcpu *v, mmio_info_t *info,
 {
 struct pci_host_bridge *bridge = p;
 pci_sbdf_t sbdf;
+const uint8_t access_size = (1 << info->dabt.size) * 8;
+const uint64_t access_mask = GENMASK_ULL(access_size - 1, 0);
 /* data is needed to prevent a pointer cast on 32bit */
 unsigned long data;
 
@@ -49,7 +51,7 @@ static int vpci_mmio_read(struct vcpu *v, mmio_info_t *info,
 
 if ( !vpci_sbdf_from_gpa(v->domain, bridge, info->gpa, &sbdf) )
 {
-*r = ~0UL;
+*r = access_mask;
 return 1;
 }
 
@@ -60,7 +62,7 @@ static int vpci_mmio_read(struct vcpu *v, mmio_info_t *info,
 return 1;
 }
 
-*r = ~0UL;
+*r = access_mask;
 
 return 0;
 }
-- 
2.42.0

[PATCH v11 13/17] vpci: add initial support for virtual PCI bus topology

From: Oleksandr Andrushchenko 

Assign an SBDF on bus 0 to each PCI device being passed through.
In the resulting topology, PCIe devices reside on bus 0 of the
root complex itself (as embedded endpoints).
This implementation is limited to the 32 devices allowed on
a single PCI bus.

Please note that, at the moment, only function 0 of a multifunction
device can be passed through.
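The slot allocation done by add_virtual_device() (first free bit in a per-domain bitmap, then segment 0, bus 0, function 0) can be modelled with a 32-bit word. This is a sketch, not the patch's code: `toy_sbdf()`, `toy_add_virtual_device()` and `TOY_MAX_VIRT_DEV` are hypothetical. The encoding follows the usual SBDF layout (16-bit segment, 8-bit bus, 5-bit device, 3-bit function).

```c
#include <errno.h>
#include <stdint.h>

#define TOY_MAX_VIRT_DEV 32 /* device slots on one PCI bus */

/* SBDF layout: seg[31:16] bus[15:8] dev[7:3] fn[2:0]. */
static uint32_t toy_sbdf(unsigned int seg, unsigned int bus,
                         unsigned int dev, unsigned int fn)
{
    return (seg << 16) | (bus << 8) | (dev << 3) | fn;
}

/*
 * Take the first free slot in the bitmap, mark it used and return the
 * guest SBDF 0000:00:slot.0, or -ENOSPC when all 32 slots are taken.
 */
static int64_t toy_add_virtual_device(uint32_t *assigned_map)
{
    unsigned int slot;

    for ( slot = 0; slot < TOY_MAX_VIRT_DEV; slot++ )
        if ( !(*assigned_map & (1U << slot)) )
            break;

    if ( slot == TOY_MAX_VIRT_DEV )
        return -ENOSPC;

    *assigned_map |= 1U << slot;
    return toy_sbdf(0, 0, slot, 0);
}
```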

Signed-off-by: Oleksandr Andrushchenko 
Signed-off-by: Volodymyr Babchuk 
---
In v11:
- Fixed code formatting
- Removed bogus write_unlock() call
- Fixed type for new_dev_number
In v10:
- Removed ASSERT(pcidevs_locked())
- Removed redundant code (local sbdf variable, clearing sbdf during
device removal, etc)
- Added __maybe_unused attribute to "out:" label
- Introduced HAS_VPCI_GUEST_SUPPORT Kconfig option, as this is the
  first patch where it is used (previously was in "vpci: add hooks for
  PCI device assign/de-assign")
In v9:
- Lock in add_virtual_device() replaced with ASSERT (thanks, Stewart)
In v8:
- Added write lock in add_virtual_device
Since v6:
- re-work wrt new locking scheme
- OT: add ASSERT(pcidevs_write_locked()); to add_virtual_device()
Since v5:
- s/vpci_add_virtual_device/add_virtual_device and make it static
- call add_virtual_device from vpci_assign_device and do not use
  REGISTER_VPCI_INIT machinery
- add pcidevs_locked ASSERT
- use DECLARE_BITMAP for vpci_dev_assigned_map
Since v4:
- moved and re-worked guest sbdf initializers
- s/set_bit/__set_bit
- s/clear_bit/__clear_bit
- minor comment fix s/Virtual/Guest/
- added VPCI_MAX_VIRT_DEV constant (PCI_SLOT(~0) + 1) which will be used
  later for counting the number of MMIO handlers required for a guest
  (Julien)
Since v3:
 - make use of VPCI_INIT
 - moved all new code to vpci.c which belongs to it
 - changed open-coded 31 to PCI_SLOT(~0)
 - added comments and code to reject multifunction devices with
   functions other than 0
 - updated comment about vpci_dev_next and made it unsigned int
 - implement roll back in case of error while assigning/deassigning devices
 - s/dom%pd/%pd
Since v2:
 - remove casts that are (a) malformed and (b) unnecessary
 - add new line for better readability
 - remove CONFIG_HAS_VPCI_GUEST_SUPPORT ifdef's as the relevant vPCI
functions are now completely gated with this config
 - gate common code with CONFIG_HAS_VPCI_GUEST_SUPPORT
New in v2
---
 xen/drivers/Kconfig |  4 +++
 xen/drivers/vpci/vpci.c | 57 +
 xen/include/xen/sched.h |  8 ++
 xen/include/xen/vpci.h  | 11 
 4 files changed, 80 insertions(+)

diff --git a/xen/drivers/Kconfig b/xen/drivers/Kconfig
index db94393f47..780490cf8e 100644
--- a/xen/drivers/Kconfig
+++ b/xen/drivers/Kconfig
@@ -15,4 +15,8 @@ source "drivers/video/Kconfig"
 config HAS_VPCI
bool
 
+config HAS_VPCI_GUEST_SUPPORT
+   bool
+   depends on HAS_VPCI
+
 endmenu
diff --git a/xen/drivers/vpci/vpci.c b/xen/drivers/vpci/vpci.c
index 8865c1580e..c92f2d7bc3 100644
--- a/xen/drivers/vpci/vpci.c
+++ b/xen/drivers/vpci/vpci.c
@@ -36,6 +36,49 @@ extern vpci_register_init_t *const __start_vpci_array[];
 extern vpci_register_init_t *const __end_vpci_array[];
 #define NUM_VPCI_INIT (__end_vpci_array - __start_vpci_array)
 
+#ifdef CONFIG_HAS_VPCI_GUEST_SUPPORT
+static int add_virtual_device(struct pci_dev *pdev)
+{
+struct domain *d = pdev->domain;
+unsigned int new_dev_number;
+
+if ( is_hardware_domain(d) )
+return 0;
+
+ASSERT(rw_is_write_locked(&pdev->domain->pci_lock));
+
+/*
+ * Each PCI bus supports 32 devices/slots at max or up to 256 when
+ * there are multi-function ones which are not yet supported.
+ */
+if ( pdev->info.is_extfn && !pdev->info.is_virtfn )
+{
+gdprintk(XENLOG_ERR, "%pp: only function 0 passthrough supported\n",
+ &pdev->sbdf);
+return -EOPNOTSUPP;
+}
+new_dev_number = find_first_zero_bit(d->vpci_dev_assigned_map,
+ VPCI_MAX_VIRT_DEV);
+if ( new_dev_number == VPCI_MAX_VIRT_DEV )
+return -ENOSPC;
+
+__set_bit(new_dev_number, &d->vpci_dev_assigned_map);
+
+/*
+ * Both segment and bus number are 0:
+ *  - we emulate a single host bridge for the guest, e.g. segment 0
+ *  - with bus 0 the virtual devices are seen as embedded
+ *endpoints behind the root complex
+ *
+ * TODO: add support for multi-function devices.
+ */
+pdev->vpci->guest_sbdf = PCI_SBDF(0, 0, new_dev_number, 0);
+
+return 0;
+}
+
+#endif /* CONFIG_HAS_VPCI_GUEST_SUPPORT */
+
 void vpci_deassign_device(struct pci_dev *pdev)
 {
 unsigned int i;
@@ -45,6 +88,12 @@ void vpci_deassign_device(struct pci_dev *pdev)
 if ( !has_vpci(pdev->domain) || !pdev->vpci )
 return;
 
+#ifdef CONFIG_HAS_VPCI_GUEST_SUPPORT
+if ( pdev->vpci->guest_sbdf.sbdf != ~0 )
+__clear_bit(pdev->vpci->guest_sbdf.dev,
+&pdev->domain->vpci_dev_assigned_map);
+#endif
+

[PATCH v11 14/17] xen/arm: translate virtual PCI bus topology for guests

From: Oleksandr Andrushchenko 

There are three originators of PCI configuration space accesses:
1. The domain that owns the physical host bridge: MMIO handlers are
there so we can update vPCI register handlers with the values
written by the hardware domain, i.e. the physical view of the registers
vs the guest's view of the configuration space.
2. Guest accesses to passed-through PCI devices: we need to properly
map the virtual bus topology to the physical one, i.e. pass the
configuration space access on to the corresponding physical device.
3. Accesses to the emulated host PCI bridge. It doesn't exist in the
physical topology, i.e. it can't be mapped to any physical host bridge.
So all accesses to the host bridge itself need to be trapped and
emulated.
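Case 2 above, mapping a guest SBDF to a physical one, reduces to a lookup that may fail. A minimal sketch, assuming a flat table of passed-through devices; `struct toy_vdev` and `toy_translate()` are hypothetical and only model the contract of vpci_sbdf_from_gpa() and vpci_translate_virtual_device(): on a miss, the SBDF is left untouched and the caller fails the access.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical record of one passed-through device. */
struct toy_vdev {
    uint32_t guest_sbdf;   /* SBDF the guest sees on the emulated bridge */
    uint32_t machine_sbdf; /* SBDF of the physical device */
};

/*
 * Replace *sbdf (decoded from the guest ECAM offset) with the physical
 * SBDF of the matching device. Returns false, leaving *sbdf unchanged,
 * when no device sits at that virtual address.
 */
static bool toy_translate(const struct toy_vdev *devs, size_t nr,
                          uint32_t *sbdf)
{
    size_t i;

    for ( i = 0; i < nr; i++ )
    {
        if ( devs[i].guest_sbdf == *sbdf )
        {
            *sbdf = devs[i].machine_sbdf;
            return true;
        }
    }

    return false;
}
```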

Signed-off-by: Oleksandr Andrushchenko 
Signed-off-by: Volodymyr Babchuk 
---
In v11:
- Fixed format issues
- Added ASSERT_UNREACHABLE() to the dummy implementation of
vpci_translate_virtual_device()
- Moved variable in vpci_sbdf_from_gpa(), now it is easier to follow
the logic in the function
Since v9:
- Comment about required lock replaced with ASSERT()
- Style fixes
- call to vpci_translate_virtual_device folded into vpci_sbdf_from_gpa
Since v8:
- locks moved out of vpci_translate_virtual_device()
Since v6:
- add pcidevs locking to vpci_translate_virtual_device
- update wrt to the new locking scheme
Since v5:
- add vpci_translate_virtual_device for #ifndef CONFIG_HAS_VPCI_GUEST_SUPPORT
  case to simplify ifdefery
- add ASSERT(!is_hardware_domain(d)); to vpci_translate_virtual_device
- reset output register on failed virtual SBDF translation
Since v4:
- indentation fixes
- constify struct domain
- updated commit message
- updates to the new locking scheme (pdev->vpci_lock)
Since v3:
- revisit locking
- move code to vpci.c
Since v2:
 - pass struct domain instead of struct vcpu
 - constify arguments where possible
 - gate relevant code with CONFIG_HAS_VPCI_GUEST_SUPPORT
New in v2
---
 xen/arch/arm/vpci.c | 47 +++--
 xen/drivers/vpci/vpci.c | 24 +
 xen/include/xen/vpci.h  | 12 +++
 3 files changed, 72 insertions(+), 11 deletions(-)

diff --git a/xen/arch/arm/vpci.c b/xen/arch/arm/vpci.c
index 3bc4bb5508..7a6a0017d1 100644
--- a/xen/arch/arm/vpci.c
+++ b/xen/arch/arm/vpci.c
@@ -7,31 +7,51 @@
 
 #include 
 
-static pci_sbdf_t vpci_sbdf_from_gpa(const struct pci_host_bridge *bridge,
- paddr_t gpa)
+static bool vpci_sbdf_from_gpa(struct domain *d,
+   const struct pci_host_bridge *bridge,
+   paddr_t gpa, pci_sbdf_t *sbdf)
 {
-pci_sbdf_t sbdf;
+bool translated = true;
+
+ASSERT(sbdf);
 
 if ( bridge )
 {
-sbdf.sbdf = VPCI_ECAM_BDF(gpa - bridge->cfg->phys_addr);
-sbdf.seg = bridge->segment;
-sbdf.bus += bridge->cfg->busn_start;
+sbdf->sbdf = VPCI_ECAM_BDF(gpa - bridge->cfg->phys_addr);
+sbdf->seg = bridge->segment;
+sbdf->bus += bridge->cfg->busn_start;
 }
 else
-sbdf.sbdf = VPCI_ECAM_BDF(gpa - GUEST_VPCI_ECAM_BASE);
+{
+/*
+ * For the passed through devices we need to map their virtual SBDF
+ * to the physical PCI device being passed through.
+ */
+sbdf->sbdf = VPCI_ECAM_BDF(gpa - GUEST_VPCI_ECAM_BASE);
+read_lock(&d->pci_lock);
+translated = vpci_translate_virtual_device(d, sbdf);
+read_unlock(&d->pci_lock);
+}
 
-return sbdf;
+return translated;
 }
 
 static int vpci_mmio_read(struct vcpu *v, mmio_info_t *info,
   register_t *r, void *p)
 {
 struct pci_host_bridge *bridge = p;
-pci_sbdf_t sbdf = vpci_sbdf_from_gpa(bridge, info->gpa);
+pci_sbdf_t sbdf;
 /* data is needed to prevent a pointer cast on 32bit */
 unsigned long data;
 
+ASSERT(!bridge == !is_hardware_domain(v->domain));
+
+if ( !vpci_sbdf_from_gpa(v->domain, bridge, info->gpa, &sbdf) )
+{
+*r = ~0UL;
+return 1;
+}
+
 if ( vpci_ecam_read(sbdf, ECAM_REG_OFFSET(info->gpa),
 1U << info->dabt.size, &data) )
 {
@@ -39,7 +59,7 @@ static int vpci_mmio_read(struct vcpu *v, mmio_info_t *info,
 return 1;
 }
 
-*r = ~0ul;
+*r = ~0UL;
 
 return 0;
 }
@@ -48,7 +68,12 @@ static int vpci_mmio_write(struct vcpu *v, mmio_info_t *info,
register_t r, void *p)
 {
 struct pci_host_bridge *bridge = p;
-pci_sbdf_t sbdf = vpci_sbdf_from_gpa(bridge, info->gpa);
+pci_sbdf_t sbdf;
+
+ASSERT(!bridge == !is_hardware_domain(v->domain));
+
+if ( !vpci_sbdf_from_gpa(v->domain, bridge, info->gpa, &sbdf) )
+return 1;
 
 return vpci_ecam_write(sbdf, ECAM_REG_OFFSET(info->gpa),
1U << info->dabt.size, r);
diff --git a/xen/drivers/vpci/vpci.c b/xen/drivers/vpci/vpci.c
index c92f2d7bc3..7c0b610ccc 100644
--- a/xen/drivers/vpci/vpci.c
+++

[PATCH v11 02/17] pci: introduce per-domain PCI rwlock

Add a per-domain d->pci_lock that protects access to
d->pdev_list. The purpose of this lock is to guarantee to the vPCI
code that the underlying pdev will not disappear from under its feet.
This is an rwlock, but this patch adds only write_lock() users;
read_lock() users will be added in the next patches.

This lock should be taken in write mode every time d->pdev_list is
altered. All write accesses should also be protected by
pcidevs_lock(). The idea is that any user that wants read access to
the list, or to the devices stored in it, should use either the new
d->pci_lock or the old pcidevs_lock(). Holding either of these two
locks only ensures that the pdev of interest will not disappear from
under its feet and that it will remain assigned to the same domain.
Of course, new users should still use pcidevs_lock() where
appropriate (e.g. when accessing any other state protected by that
lock). When both the newly introduced per-domain rwlock and the
pcidevs lock are taken, the latter must be acquired first.
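The lock ordering described above can be modelled in userspace with POSIX primitives. This is an analogy, not Xen's locking API: a `pthread_mutex_t` stands in for pcidevs_lock(), a `pthread_rwlock_t` for d->pci_lock, and `struct toy_domain`, `toy_add_pdev()` and `toy_count_pdevs()` are hypothetical. Writers take the global lock first and then the per-domain lock in write mode; readers need only the per-domain lock in read mode.

```c
#define _POSIX_C_SOURCE 200809L
#include <pthread.h>
#include <stddef.h>

/* Stand-in for the global pcidevs_lock(). */
static pthread_mutex_t toy_pcidevs_lock = PTHREAD_MUTEX_INITIALIZER;

struct toy_domain {
    pthread_rwlock_t pci_lock; /* stand-in for d->pci_lock */
    int nr_pdevs;              /* stand-in for d->pdev_list */
};

/* Writer: alter the list with both locks held, global lock first. */
static void toy_add_pdev(struct toy_domain *d)
{
    pthread_mutex_lock(&toy_pcidevs_lock);
    pthread_rwlock_wrlock(&d->pci_lock);
    d->nr_pdevs++;
    pthread_rwlock_unlock(&d->pci_lock);
    pthread_mutex_unlock(&toy_pcidevs_lock);
}

/* Reader: the per-domain lock in read mode is enough. */
static int toy_count_pdevs(struct toy_domain *d)
{
    int n;

    pthread_rwlock_rdlock(&d->pci_lock);
    n = d->nr_pdevs;
    pthread_rwlock_unlock(&d->pci_lock);

    return n;
}
```

Acquiring in one fixed order everywhere is what rules out an ABBA deadlock between a writer and another path that needs both locks.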

Suggested-by: Roger Pau Monné 
Suggested-by: Jan Beulich 
Signed-off-by: Volodymyr Babchuk 
Reviewed-by: Roger Pau Monné 

---

Changes in v10:
 - pdev->domain is assigned after removing from source domain but
   before adding to target domain in reassign_device() functions.

Changes in v9:
 - returned back "pdev->domain = target;" in AMD IOMMU code
 - used "source" instead of pdev->domain in IOMMU functions
 - added comment about lock ordering in the commit message
 - reduced locked regions
 - minor non-functional changes in various places

Changes in v8:
 - New patch

Changes in v8 vs RFC:
 - Removed all read_locks after discussion with Roger in #xendevel
 - pci_release_devices() now returns the first error code
 - extended commit message
 - added missing lock in pci_remove_device()
 - extended locked region in pci_add_device() to protect list_del() calls
---
 xen/common/domain.c |  1 +
 xen/drivers/passthrough/amd/pci_amd_iommu.c |  9 ++-
 xen/drivers/passthrough/pci.c   | 71 +
 xen/drivers/passthrough/vtd/iommu.c |  9 ++-
 xen/include/xen/sched.h |  1 +
 5 files changed, 78 insertions(+), 13 deletions(-)

diff --git a/xen/common/domain.c b/xen/common/domain.c
index cd2ca6d49a..9b8902daa3 100644
--- a/xen/common/domain.c
+++ b/xen/common/domain.c
@@ -651,6 +651,7 @@ struct domain *domain_create(domid_t domid,
 
 #ifdef CONFIG_HAS_PCI
 INIT_LIST_HEAD(&d->pdev_list);
+rwlock_init(&d->pci_lock);
 #endif
 
 /* All error paths can depend on the above setup. */
diff --git a/xen/drivers/passthrough/amd/pci_amd_iommu.c b/xen/drivers/passthrough/amd/pci_amd_iommu.c
index 6bc73dc210..5cd208bbef 100644
--- a/xen/drivers/passthrough/amd/pci_amd_iommu.c
+++ b/xen/drivers/passthrough/amd/pci_amd_iommu.c
@@ -481,8 +481,15 @@ static int cf_check reassign_device(
 
 if ( devfn == pdev->devfn && pdev->domain != target )
 {
-list_move(&pdev->domain_list, &target->pdev_list);
+write_lock(&source->pci_lock);
+list_del(&pdev->domain_list);
+write_unlock(&source->pci_lock);
+
 pdev->domain = target;
+
+write_lock(&target->pci_lock);
+list_add(&pdev->domain_list, &target->pdev_list);
+write_unlock(&target->pci_lock);
 }
 
 /*
diff --git a/xen/drivers/passthrough/pci.c b/xen/drivers/passthrough/pci.c
index 04d00c7c37..b8ad4fa07c 100644
--- a/xen/drivers/passthrough/pci.c
+++ b/xen/drivers/passthrough/pci.c
@@ -453,7 +453,9 @@ static void __init _pci_hide_device(struct pci_dev *pdev)
 if ( pdev->domain )
 return;
 pdev->domain = dom_xen;
+write_lock(&dom_xen->pci_lock);
 list_add(&pdev->domain_list, &dom_xen->pdev_list);
+write_unlock(&dom_xen->pci_lock);
 }
 
 int __init pci_hide_device(unsigned int seg, unsigned int bus,
@@ -746,7 +748,9 @@ int pci_add_device(u16 seg, u8 bus, u8 devfn,
 if ( !pdev->domain )
 {
 pdev->domain = hardware_domain;
+write_lock(&hardware_domain->pci_lock);
 list_add(&pdev->domain_list, &hardware_domain->pdev_list);
+write_unlock(&hardware_domain->pci_lock);
 
 /*
  * For devices not discovered by Xen during boot, add vPCI handlers
@@ -756,7 +760,9 @@ int pci_add_device(u16 seg, u8 bus, u8 devfn,
 if ( ret )
 {
 printk(XENLOG_ERR "Setup of vPCI failed: %d\n", ret);
+write_lock(&hardware_domain->pci_lock);
 list_del(&pdev->domain_list);
+write_unlock(&hardware_domain->pci_lock);
 pdev->domain = NULL;
 goto out;
 }
@@ -764,7 +770,9 @@ int pci_add_device(u16 seg, u8 bus, u8 devfn,
 if ( ret )
 {
 vpci_remove_device(pdev);
+write_lock(&hardware_domain->pci_lock);
 list_del(&pdev->domain_list);
+write_unlock(&hardware_domain->pci_lock);
 pdev->domain = NULL;
 goto out;
 }
@@ -814,7 +822,11 @@ int pci_remove_device(u16 seg, u8 bus, u8 devfn)
 pci_cleanup_msi(pdev);
 ret =

[PATCH v11 16/17] xen/arm: vpci: permit access to guest vpci space

From: Stewart Hildebrand 

Move iomem_caps initialization earlier (before arch_domain_create()).

Signed-off-by: Stewart Hildebrand 
Signed-off-by: Volodymyr Babchuk 
---
Changes in v10:
* fix off-by-one
* also permit access to GUEST_VPCI_PREFETCH_MEM_ADDR

Changes in v9:
* new patch

This is sort of a follow-up to:

  baa6ea700386 ("vpci: add permission checks to map_range()")

I don't believe we need a fixes tag since this depends on the vPCI p2m BAR
patches.
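The "fix off-by-one" note concerns converting a byte window into an inclusive frame range for iomem_permit_access(). A sketch of that conversion, assuming 4 KiB pages; `toy_pfn_range()` is hypothetical and the base and size in the test are made-up values, not the real GUEST_VPCI_MEM_* constants.

```c
#include <stdint.h>

#define TOY_PAGE_SHIFT 12 /* assumption: 4 KiB pages */

/*
 * Inclusive frame range [*first, *last] covering bytes [base, base + size).
 * The last byte is base + size - 1; using base + size instead would also
 * grant the page just past the window.
 */
static void toy_pfn_range(uint64_t base, uint64_t size,
                          uint64_t *first, uint64_t *last)
{
    *first = base >> TOY_PAGE_SHIFT;
    *last = (base + size - 1) >> TOY_PAGE_SHIFT;
}
```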
---
 xen/arch/arm/vpci.c | 9 +
 xen/common/domain.c | 4 +++-
 2 files changed, 12 insertions(+), 1 deletion(-)

diff --git a/xen/arch/arm/vpci.c b/xen/arch/arm/vpci.c
index 348ba0fbc8..b6ef440f17 100644
--- a/xen/arch/arm/vpci.c
+++ b/xen/arch/arm/vpci.c
@@ -2,6 +2,7 @@
 /*
  * xen/arch/arm/vpci.c
  */
+#include 
 #include 
 #include 
 
@@ -115,8 +116,16 @@ int domain_vpci_init(struct domain *d)
 return ret;
 }
 else
+{
 register_mmio_handler(d, &vpci_mmio_handler,
  GUEST_VPCI_ECAM_BASE, GUEST_VPCI_ECAM_SIZE, NULL);
+iomem_permit_access(d, paddr_to_pfn(GUEST_VPCI_MEM_ADDR),
+paddr_to_pfn(GUEST_VPCI_MEM_ADDR +
+ GUEST_VPCI_MEM_SIZE - 1));
+iomem_permit_access(d, paddr_to_pfn(GUEST_VPCI_PREFETCH_MEM_ADDR),
+paddr_to_pfn(GUEST_VPCI_PREFETCH_MEM_ADDR +
+ GUEST_VPCI_PREFETCH_MEM_SIZE - 1));
+}
 
 return 0;
 }
diff --git a/xen/common/domain.c b/xen/common/domain.c
index 9b8902daa3..dccd272533 100644
--- a/xen/common/domain.c
+++ b/xen/common/domain.c
@@ -695,6 +695,9 @@ struct domain *domain_create(domid_t domid,
 radix_tree_init(&d->pirq_tree);
 }
 
+if ( !is_idle_domain(d) )
+d->iomem_caps = rangeset_new(d, "I/O Memory", RANGESETF_prettyprint_hex);
+
 if ( (err = arch_domain_create(d, config, flags)) != 0 )
 goto fail;
 init_status |= INIT_arch;
@@ -711,7 +714,6 @@ struct domain *domain_create(domid_t domid,
 watchdog_domain_init(d);
 init_status |= INIT_watchdog;
 
-d->iomem_caps = rangeset_new(d, "I/O Memory", RANGESETF_prettyprint_hex);
 d->irq_caps   = rangeset_new(d, "Interrupts", 0);
 if ( !d->iomem_caps || !d->irq_caps )
 goto fail;
-- 
2.42.0

[PATCH v11 15/17] xen/arm: account IO handlers for emulated PCI MSI-X

From: Oleksandr Andrushchenko 

At the moment, we always allocate an extra 16 slots for IO handlers
(see MAX_IO_HANDLER). So, when adding IO trap handlers for the emulated
MSI-X registers, we need to explicitly report the additional IO
handlers so that they are accounted for.
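The resulting count for a guest domain can be checked with a short sketch. `toy_guest_mmio_handlers()` and the `TOY_*` macros are hypothetical; `TOY_PCI_SLOT()` follows the conventional devfn-to-slot extraction, and `TOY_MAX_VIRT_DEV` reproduces the `PCI_SLOT(~0) + 1` definition mentioned in the 13/17 changelog (32).

```c
#include <stdbool.h>

#define TOY_PCI_SLOT(devfn) (((devfn) >> 3) & 0x1f)
#define TOY_MAX_VIRT_DEV    (TOY_PCI_SLOT(~0U) + 1) /* 32 */

/*
 * MMIO handlers a guest needs: one for the emulated ECAM window of its
 * single host bridge, plus one MSI-X handler (covering both PBA and
 * table) per possible virtual device when MSI support is built in.
 */
static unsigned int toy_guest_mmio_handlers(bool has_pci_msi)
{
    unsigned int count = 1;

    if ( has_pci_msi )
        count += TOY_MAX_VIRT_DEV;

    return count;
}
```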

Signed-off-by: Oleksandr Andrushchenko 
Acked-by: Julien Grall 
Signed-off-by: Volodymyr Babchuk 

---
This actually moved here from the part 2 of the prep work for PCI
passthrough on Arm as it seems to be the proper place for it.

Since v5:
- optimize with IS_ENABLED(CONFIG_HAS_PCI_MSI) since VPCI_MAX_VIRT_DEV is
  defined unconditionally
New in v5
---
 xen/arch/arm/vpci.c | 14 +-
 1 file changed, 13 insertions(+), 1 deletion(-)

diff --git a/xen/arch/arm/vpci.c b/xen/arch/arm/vpci.c
index 7a6a0017d1..348ba0fbc8 100644
--- a/xen/arch/arm/vpci.c
+++ b/xen/arch/arm/vpci.c
@@ -130,6 +130,8 @@ static int vpci_get_num_handlers_cb(struct domain *d,
 
 unsigned int domain_vpci_get_num_mmio_handlers(struct domain *d)
 {
+unsigned int count;
+
 if ( !has_vpci(d) )
 return 0;
 
@@ -150,7 +152,17 @@ unsigned int domain_vpci_get_num_mmio_handlers(struct domain *d)
  * For guests each host bridge requires one region to cover the
  * configuration space. At the moment, we only expose a single host bridge.
  */
-return 1;
+count = 1;
+
+/*
+ * There's a single MSI-X MMIO handler that deals with both PBA
+ * and MSI-X tables per each PCI device being passed through.
+ * Maximum number of emulated virtual devices is VPCI_MAX_VIRT_DEV.
+ */
+if ( IS_ENABLED(CONFIG_HAS_PCI_MSI) )
+count += VPCI_MAX_VIRT_DEV;
+
+return count;
 }
 
 /*
-- 
2.42.0

[PATCH v11 09/17] rangeset: add rangeset_empty() function

This function can be used when a user wants to remove all entries of
a rangeset but does not want to destroy the rangeset itself.
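The split between emptying and destroying can be sketched with a singly linked list. This is a toy model, not Xen's rangeset implementation: `struct toy_rset`, `toy_add()`, `toy_empty()` and `toy_destroy()` are hypothetical. Emptying frees every range but leaves the container usable; destroying empties and then frees the container, mirroring how rangeset_destroy() now calls rangeset_empty().

```c
#include <errno.h>
#include <stdlib.h>

struct toy_range {
    unsigned long s, e;
    struct toy_range *next;
};

struct toy_rset {
    struct toy_range *head;
};

/* Prepend the range [s, e]; returns 0 or -ENOMEM. */
static int toy_add(struct toy_rset *r, unsigned long s, unsigned long e)
{
    struct toy_range *x = malloc(sizeof(*x));

    if ( !x )
        return -ENOMEM;
    x->s = s;
    x->e = e;
    x->next = r->head;
    r->head = x;
    return 0;
}

/* Free every range but keep the set itself usable. */
static void toy_empty(struct toy_rset *r)
{
    struct toy_range *x;

    if ( r == NULL )
        return;

    while ( (x = r->head) != NULL )
    {
        r->head = x->next;
        free(x);
    }
}

/* Destroy = empty + release the container. */
static void toy_destroy(struct toy_rset *r)
{
    if ( r == NULL )
        return;

    toy_empty(r);
    free(r);
}
```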

Signed-off-by: Volodymyr Babchuk 

---
Changes in v11:
 - Now the function only empties rangeset, without removing it from
   domain's list

Changes in v10:
 - New in v10. The function is used in "vpci/header: handle p2m range sets per BAR"
---
 xen/common/rangeset.c  | 16 
 xen/include/xen/rangeset.h |  3 ++-
 2 files changed, 14 insertions(+), 5 deletions(-)

diff --git a/xen/common/rangeset.c b/xen/common/rangeset.c
index 0ccd53caac..d0c525cb50 100644
--- a/xen/common/rangeset.c
+++ b/xen/common/rangeset.c
@@ -448,11 +448,20 @@ struct rangeset *rangeset_new(
 return r;
 }
 
-void rangeset_destroy(
-struct rangeset *r)
+void rangeset_empty(struct rangeset *r)
 {
 struct range *x;
 
+if ( r == NULL )
+return;
+
+while ( (x = first_range(r)) != NULL )
+destroy_range(r, x);
+}
+
+void rangeset_destroy(
+struct rangeset *r)
+{
 if ( r == NULL )
 return;
 
@@ -463,8 +472,7 @@ void rangeset_destroy(
 spin_unlock(&r->domain->rangesets_lock);
 }
 
-while ( (x = first_range(r)) != NULL )
-destroy_range(r, x);
+rangeset_empty(r);
 
 xfree(r);
 }
diff --git a/xen/include/xen/rangeset.h b/xen/include/xen/rangeset.h
index 87bd956962..62cb67b49b 100644
--- a/xen/include/xen/rangeset.h
+++ b/xen/include/xen/rangeset.h
@@ -56,7 +56,7 @@ void rangeset_limit(
 bool __must_check rangeset_is_empty(
 const struct rangeset *r);
 
-/* Add/claim/remove/query a numeric range. */
+/* Add/claim/remove/query/empty a numeric range. */
 int __must_check rangeset_add_range(
 struct rangeset *r, unsigned long s, unsigned long e);
 int __must_check rangeset_claim_range(struct rangeset *r, unsigned long size,
@@ -70,6 +70,7 @@ bool __must_check rangeset_overlaps_range(
 int rangeset_report_ranges(
 struct rangeset *r, unsigned long s, unsigned long e,
 int (*cb)(unsigned long s, unsigned long e, void *data), void *ctxt);
+void rangeset_empty(struct rangeset *r);
 
 /*
  * Note that the consume function can return an error value apart from
-- 
2.42.0

[PATCH v11 05/17] vpci: add hooks for PCI device assign/de-assign

From: Oleksandr Andrushchenko 

When a PCI device gets assigned/de-assigned we need to
initialize/de-initialize vPCI state for the device.

Also, rename vpci_add_handlers() to vpci_assign_device() and
vpci_remove_device() to vpci_deassign_device() to better reflect the
role of the functions.
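The ordering used in deassign_device() can be modelled without locks. This is a sketch, not Xen code: `struct toy_pdev`, `toy_vpci_deassign()`, `toy_vpci_assign()` and `toy_reassign()` are hypothetical, and the `iommu_ok` flag stands in for the result of the IOMMU `reassign_device` call. vPCI state is torn down for the old owner first, and it is set up for the new owner only if the reassignment succeeded.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical per-device state: owner and whether vPCI state exists. */
struct toy_pdev {
    int owner;     /* domain id */
    bool has_vpci; /* vPCI state present for the current owner */
};

static void toy_vpci_deassign(struct toy_pdev *p) { p->has_vpci = false; }
static int toy_vpci_assign(struct toy_pdev *p) { p->has_vpci = true; return 0; }

/*
 * Tear down vPCI state for the old owner, switch ownership (the IOMMU
 * step), then set up vPCI state for the new owner only on success.
 */
static int toy_reassign(struct toy_pdev *p, int target, bool iommu_ok)
{
    toy_vpci_deassign(p);

    if ( !iommu_ok )
        return -EIO;

    p->owner = target;
    return toy_vpci_assign(p);
}
```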

Signed-off-by: Oleksandr Andrushchenko 
Signed-off-by: Volodymyr Babchuk 
---
In v11:
- Call vpci_assign_device() in "deassign_device" if IOMMU call
"reassign_device" was successful.
In v10:
- removed HAS_VPCI_GUEST_SUPPORT checks
- removed HAS_VPCI_GUEST_SUPPORT config option (in Kconfig) as it is not used
  anywhere
In v9:
- removed previous  vpci_[de]assign_device function and renamed
  existing handlers
- dropped attempts to handle errors in assign_device() function
- do not call vpci_assign_device for dom_io
- use d instead of pdev->domain
- use IS_ENABLED macro
In v8:
- removed vpci_deassign_device
In v6:
- do not pass struct domain to vpci_{assign|deassign}_device as
  pdev->domain can be used
- do not leave the device assigned (pdev->domain == new domain) in case
  vpci_assign_device fails: try to de-assign and if this also fails, then
  crash the domain
In v5:
- do not split code into run_vpci_init
- do not check for is_system_domain in vpci_{de}assign_device
- do not use vpci_remove_device_handlers_locked and re-allocate
  pdev->vpci completely
- make vpci_deassign_device void
In v4:
 - de-assign vPCI from the previous domain on device assignment
 - do not remove handlers in vpci_assign_device as those must not
   exist at that point
In v3:
 - remove toolstack roll-back description from the commit message
   as errors are to be handled with proper cleanup in Xen itself
 - remove __must_check
 - remove redundant rc check while assigning devices
 - fix redundant CONFIG_HAS_VPCI check for CONFIG_HAS_VPCI_GUEST_SUPPORT
 - use REGISTER_VPCI_INIT machinery to run required steps on device
   init/assign: add run_vpci_init helper
In v2:
- define CONFIG_HAS_VPCI_GUEST_SUPPORT so dead code is not compiled
  for x86
In v1:
 - constify struct pci_dev where possible
 - do not open code is_system_domain()
 - extended the commit message
---
 xen/drivers/passthrough/pci.c | 24 
 xen/drivers/vpci/header.c |  2 +-
 xen/drivers/vpci/vpci.c   |  6 +++---
 xen/include/xen/vpci.h| 10 +-
 4 files changed, 29 insertions(+), 13 deletions(-)

diff --git a/xen/drivers/passthrough/pci.c b/xen/drivers/passthrough/pci.c
index 182da45acb..a3312fdab2 100644
--- a/xen/drivers/passthrough/pci.c
+++ b/xen/drivers/passthrough/pci.c
@@ -755,7 +755,7 @@ int pci_add_device(u16 seg, u8 bus, u8 devfn,
  * For devices not discovered by Xen during boot, add vPCI handlers
  * when Dom0 first informs Xen about such devices.
  */
-ret = vpci_add_handlers(pdev);
+ret = vpci_assign_device(pdev);
 if ( ret )
 {
 list_del(&pdev->domain_list);
@@ -769,7 +769,7 @@ int pci_add_device(u16 seg, u8 bus, u8 devfn,
 if ( ret )
 {
 write_lock(&hardware_domain->pci_lock);
-vpci_remove_device(pdev);
+vpci_deassign_device(pdev);
 list_del(&pdev->domain_list);
 write_unlock(&hardware_domain->pci_lock);
 pdev->domain = NULL;
@@ -817,7 +817,7 @@ int pci_remove_device(u16 seg, u8 bus, u8 devfn)
 list_for_each_entry ( pdev, &pseg->alldevs_list, alldevs_list )
 if ( pdev->bus == bus && pdev->devfn == devfn )
 {
-vpci_remove_device(pdev);
+vpci_deassign_device(pdev);
 pci_cleanup_msi(pdev);
 ret = iommu_remove_device(pdev);
 if ( pdev->domain )
@@ -875,6 +875,10 @@ static int deassign_device(struct domain *d, uint16_t seg, uint8_t bus,
 goto out;
 }
 
+write_lock(>pci_lock);
+vpci_deassign_device(pdev);
+write_unlock(>pci_lock);
+
 devfn = pdev->devfn;
 ret = iommu_call(hd->platform_ops, reassign_device, d, target, devfn,
  pci_to_dev(pdev));
@@ -886,6 +890,10 @@ static int deassign_device(struct domain *d, uint16_t seg, uint8_t bus,
 
 pdev->fault.count = 0;
 
+write_lock(>pci_lock);
+ret = vpci_assign_device(pdev);
+write_unlock(>pci_lock);
+
  out:
 if ( ret )
 printk(XENLOG_G_ERR "%pd: deassign (%pp) failed (%d)\n",
@@ -1146,7 +1154,7 @@ static void __hwdom_init setup_one_hwdom_device(const struct setup_hwdom *ctxt,
   PCI_SLOT(devfn) == PCI_SLOT(pdev->devfn) );
 
 write_lock(&ctxt->d->pci_lock);
-err = vpci_add_handlers(pdev);
+err = vpci_assign_device(pdev);
 write_unlock(&ctxt->d->pci_lock);
 if ( err )
 printk(XENLOG_ERR "setup of vPCI for d%d failed: %d\n",
@@ -1476,6 +1484,10 @@ static int assign_device(struct domain *d, u16 seg, u8 bus, u8 devfn, u32 flag)
 if ( pdev->broken && d != hardware_domain && d != dom_io )
 goto done;
 
+write_lock(&pdev->domain->pci_lock);
+vpci_deassign_device(pdev);
+

[PATCH v11 10/17] vpci/header: handle p2m range sets per BAR

From: Oleksandr Andrushchenko 

Instead of handling a single range set that contains all the memory
regions of all the BARs and ROM, have one range set per BAR.
As the range sets are now created when a PCI device is added and destroyed
when it is removed, make them named and accounted.

Note that rangesets were chosen here despite there being only up to
3 separate ranges in each set (typically just 1). A rangeset per BAR
was chosen for ease of implementation and re-use of existing code.

Also note that error handling of vpci_process_pending() is slightly
modified, and that vPCI handlers are no longer removed if the creation
of the mappings in vpci_process_pending() fails, as that's unlikely to
lead to a functional device in any case.

This is in preparation for making non-identity mappings in p2m for the MMIOs.
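To make the per-BAR scheme concrete, here is a minimal standalone C model; it is not the Xen code. Each BAR owns a small set of pending frame ranges, and process_pending() skips empty sets and stops when a work budget runs out, much as vpci_process_pending() returns true on -ERESTART. The types struct range and struct bar_ranges, the budget parameter, NR_BARS and the fixed array of at most 3 ranges are hypothetical stand-ins for Xen's rangeset API.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define NR_BARS    7 /* assumption: 6 BARs plus the expansion ROM */
#define MAX_RANGES 3 /* the commit message notes at most 3 ranges per set */

/* Hypothetical stand-in for a Xen rangeset: inclusive [start, end] frames. */
struct range { uint64_t start, end; };

struct bar_ranges {
    struct range r[MAX_RANGES];
    unsigned int nr;            /* number of ranges still pending */
};

/*
 * Consume frames from one BAR until the budget runs out.  Returns true if
 * frames are still pending, playing the role of -ERESTART.
 */
static bool consume_bar(struct bar_ranges *b, unsigned int *budget,
                        uint64_t *done)
{
    assert(b->nr <= MAX_RANGES);

    while ( b->nr )
    {
        struct range *cur = &b->r[b->nr - 1];

        while ( cur->start <= cur->end )
        {
            if ( *budget == 0 )
                return true;
            /* Real code would add or remove a p2m mapping here. */
            cur->start++;
            (*done)++;
            (*budget)--;
        }
        b->nr--; /* this range is fully consumed */
    }

    return false;
}

/* Walk every BAR's own set, skipping empty ones; true means "call again". */
static bool process_pending(struct bar_ranges bars[NR_BARS],
                            unsigned int budget, uint64_t *done)
{
    for ( unsigned int i = 0; i < NR_BARS; i++ )
    {
        if ( bars[i].nr == 0 )
            continue;
        if ( consume_bar(&bars[i], &budget, done) )
            return true;
    }

    return false;
}
```

Calling process_pending() until it returns false mirrors a vCPU resuming the pending work after preemption. For example, with BAR0 holding 4 frames, BAR2 holding two 2-frame ranges and a budget of 3, all 8 frames are handled over 3 calls.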

Signed-off-by: Oleksandr Andrushchenko 
Signed-off-by: Volodymyr Babchuk 
Reviewed-by: Roger Pau Monné 

---
In v11:
- Modified commit message to note changes in error handling in
vpci_process_pending()
- Removed redundant ASSERT() in defer_map. There is no reason to
introduce it in this patch and there is no other patch where
introducing that ASSERT() was appropriate.
- Fixed formatting
- vpci_process_pending() clears v->vpci.pdev if it failed
  checks at the beginning
- Added Roger's R-B tag
In v10:
- Added additional checks to vpci_process_pending()
- vpci_process_pending() now clears rangeset in case of failure
- Fixed locks in vpci_process_pending()
- Fixed coding style issues
- Fixed error handling in init_bars
In v9:
- removed d->vpci.map_pending in favor of checking v->vpci.pdev !=
NULL
- printk -> gprintk
- renamed bar variable to fix shadowing
- fixed bug with iterating on remote device's BARs
- relaxed lock in vpci_process_pending
- removed stale comment
Since v6:
- update according to the new locking scheme
- remove odd fail label in modify_bars
Since v5:
- fix comments
- move rangeset allocation to init_bars and only allocate
  for MAPPABLE BARs
- check for overlap with the already set up BAR ranges
Since v4:
- use named range sets for BARs (Jan)
- changes required by the new locking scheme
- updated commit message (Jan)
Since v3:
- re-work vpci_cancel_pending according to the per-BAR handling
- s/num_mem_ranges/map_pending and s/uint8_t/bool
- ASSERT(bar->mem) in modify_bars
- create and destroy the rangesets on add/remove
---
 xen/drivers/vpci/header.c | 257 ++
 xen/drivers/vpci/vpci.c   |   6 +
 xen/include/xen/vpci.h|   2 +-
 3 files changed, 185 insertions(+), 80 deletions(-)

diff --git a/xen/drivers/vpci/header.c b/xen/drivers/vpci/header.c
index 43216429d9..7c84cee5d1 100644
--- a/xen/drivers/vpci/header.c
+++ b/xen/drivers/vpci/header.c
@@ -161,63 +161,107 @@ static void modify_decoding(const struct pci_dev *pdev, uint16_t cmd,
 
 bool vpci_process_pending(struct vcpu *v)
 {
-if ( v->vpci.mem )
+struct pci_dev *pdev = v->vpci.pdev;
+struct map_data data = {
+.d = v->domain,
+.map = v->vpci.cmd & PCI_COMMAND_MEMORY,
+};
+struct vpci_header *header = NULL;
+unsigned int i;
+
+if ( !pdev )
+return false;
+
+read_lock(&v->domain->pci_lock);
+
+if ( !pdev->vpci || (v->domain != pdev->domain) )
+{
+v->vpci.pdev = NULL;
+read_unlock(&v->domain->pci_lock);
+return false;
+}
+
+header = &pdev->vpci->header;
+for ( i = 0; i < ARRAY_SIZE(header->bars); i++ )
 {
-struct map_data data = {
-.d = v->domain,
-.map = v->vpci.cmd & PCI_COMMAND_MEMORY,
-};
-int rc = rangeset_consume_ranges(v->vpci.mem, map_range, &data);
+struct vpci_bar *bar = &header->bars[i];
+int rc;
+
+if ( rangeset_is_empty(bar->mem) )
+continue;
+
+rc = rangeset_consume_ranges(bar->mem, map_range, &data);
 
 if ( rc == -ERESTART )
+{
+read_unlock(&v->domain->pci_lock);
 return true;
+}
 
-write_lock(&v->domain->pci_lock);
-spin_lock(&v->vpci.pdev->vpci->lock);
-/* Disable memory decoding unconditionally on failure. */
-modify_decoding(v->vpci.pdev,
-rc ? v->vpci.cmd & ~PCI_COMMAND_MEMORY : v->vpci.cmd,
-!rc && v->vpci.rom_only);
-spin_unlock(&v->vpci.pdev->vpci->lock);
-
-rangeset_destroy(v->vpci.mem);
-v->vpci.mem = NULL;
 if ( rc )
-/*
- * FIXME: in case of failure remove the device from the domain.
- * Note that there might still be leftover mappings. While this is
- * safe for Dom0, for DomUs the domain will likely need to be
- * killed in order to avoid leaking stale p2m mappings on
- * failure.
- */
-vpci_deassign_device(v->vpci.pdev);
-write_unlock(&v->domain->pci_lock);
+{
+spin_lock(&pdev->vpci->lock);
+/* Disable memory decoding

[PATCH v11 04/17] vpci: restrict unhandled read/write operations for guests

From: Oleksandr Andrushchenko 

A guest would be able to read and write those registers which are not
emulated and have no respective vPCI handlers, so it would be able to
access the hardware directly.
In order to prevent a guest from reading and writing the unhandled
registers, make sure only the hardware domain can access the hardware
directly and restrict guests from doing so.
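The effect of the checks this patch adds to vpci_read_hw() and vpci_write_hw() can be modelled outside Xen with a short C sketch. It is only an illustration: struct domain, fake_cfg, read_hw() and write_hw() are hypothetical stand-ins for current->domain, the physical config space and vpci_{read,write}_hw(), and only aligned 32-bit accesses are modelled. As in the patch, reads by a guest return all ones and writes by a guest are dropped.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define CFG_DWORDS 64 /* assumption: model only the 256-byte config space */

/* Hypothetical model; Xen checks is_hardware_domain(current->domain). */
struct domain { bool is_hwdom; };

/* Stand-in for the device's real configuration space. */
static uint32_t fake_cfg[CFG_DWORDS];

/* Fallback for registers without a vPCI handler (32-bit accesses only). */
static uint32_t read_hw(const struct domain *cur, unsigned int reg)
{
    /* Guests never reach the hardware: report all ones. */
    if ( !cur->is_hwdom )
        return ~(uint32_t)0;

    assert(reg % 4 == 0 && reg / 4 < CFG_DWORDS);
    return fake_cfg[reg / 4];
}

static void write_hw(const struct domain *cur, unsigned int reg, uint32_t val)
{
    /* Guest writes to unhandled registers are silently dropped. */
    if ( !cur->is_hwdom )
        return;

    assert(reg % 4 == 0 && reg / 4 < CFG_DWORDS);
    fake_cfg[reg / 4] = val;
}
```

All ones is also the value conventionally returned for reads from an absent PCI function, so a guest sees an unhandled register as if nothing were there.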

Suggested-by: Roger Pau Monné 
Signed-off-by: Oleksandr Andrushchenko 
Signed-off-by: Volodymyr Babchuk 
Reviewed-by: Roger Pau Monné 

---
Since v9:
- removed stray formatting change
- added Roger's R-b tag
Since v6:
- do not use is_hwdom parameter for vpci_{read|write}_hw and use
  current->domain internally
- update commit message
New in v6
---
 xen/drivers/vpci/vpci.c | 8 
 1 file changed, 8 insertions(+)

diff --git a/xen/drivers/vpci/vpci.c b/xen/drivers/vpci/vpci.c
index 0b694beadf..4fec4b26d9 100644
--- a/xen/drivers/vpci/vpci.c
+++ b/xen/drivers/vpci/vpci.c
@@ -233,6 +233,10 @@ static uint32_t vpci_read_hw(pci_sbdf_t sbdf, unsigned int reg,
 {
 uint32_t data;
 
+/* Guest domains are not allowed to read real hardware. */
+if ( !is_hardware_domain(current->domain) )
+return ~(uint32_t)0;
+
 switch ( size )
 {
 case 4:
@@ -276,6 +280,10 @@ static uint32_t vpci_read_hw(pci_sbdf_t sbdf, unsigned int reg,
 static void vpci_write_hw(pci_sbdf_t sbdf, unsigned int reg, unsigned int size,
   uint32_t data)
 {
+/* Guest domains are not allowed to write real hardware. */
+if ( !is_hardware_domain(current->domain) )
+return;
+
 switch ( size )
 {
 case 4:
-- 
2.42.0

[PATCH v11 12/17] vpci/header: emulate PCI_COMMAND register for guests

From: Oleksandr Andrushchenko 

Xen and/or Dom0 may have put values in PCI_COMMAND which they expect
to remain unaltered. PCI_COMMAND_SERR bit is a good example: while the
guest's view of this will want to be zero initially, the host having set
it to 1 may not easily be overwritten with 0, or else we'd effectively
imply giving the guest control of the bit. Thus, PCI_COMMAND register needs
proper emulation in order to honor host's settings.

According to "PCI LOCAL BUS SPECIFICATION, REV. 3.0", section "6.2.2
Device Control" the reset state of the command register is typically 0,
so when assigning a PCI device use 0 as the initial state for the guest's view
of the command register.

Here is the full list of command register bits with notes about
emulation, along with QEMU behavior in the same situation:

PCI_COMMAND_IO - QEMU does not allow a guest to change the value of this
bit in the real device. Instead it is always set to 1. A guest can write
to this register, but writes are ignored.

PCI_COMMAND_MEMORY - QEMU behaves exactly as with PCI_COMMAND_IO. In
Xen case, we handle writes to this bit by mapping/unmapping BAR
regions. For devices assigned to DomUs, memory decoding will be
disabled at the initialization.

PCI_COMMAND_MASTER - Allow guest to control it. QEMU passes through
writes to this bit.

PCI_COMMAND_SPECIAL - A guest can generate special cycles only if it has
access to a host bridge that supports software generation of special
cycles. In our case the guest has no access to host bridges at all. The
value after reset is 0. QEMU passes through writes of this bit; we will
do the same.

PCI_COMMAND_INVALIDATE - Allows "Memory Write and Invalidate" commands
to be generated. It requires additional configuration via the Cacheline
Size register. We are not emulating this register right now and we
can't expect the guest to properly configure it. QEMU "emulates" access
to the Cacheline Size register by ignoring all writes to it. QEMU passes
through writes of the PCI_COMMAND_INVALIDATE bit; we will do the same.

PCI_COMMAND_VGA_PALETTE - Enable VGA palette snooping. QEMU passes
through writes of this bit, we will do the same.

PCI_COMMAND_PARITY - Controls how the device responds to parity
errors. QEMU ignores writes to this bit; we will do the same.

PCI_COMMAND_WAIT - Reserved. Should be 0, but QEMU passes
through writes of this bit, so we will do the same.

PCI_COMMAND_SERR - Controls if device can assert SERR. QEMU ignores
writes to this bit, we will do the same.

PCI_COMMAND_FAST_BACK - Optional bit that allows fast back-to-back
transactions. It is configured by firmware, so we don't want guest to
control it. QEMU ignores writes to this bit, we will do the same.

PCI_COMMAND_INTX_DISABLE - Disables INTx signals. If MSI(X) is
enabled, the device is prohibited from asserting INTx as per the
specification. The value after reset is 0. In QEMU's case, it checks if
INTx was mapped for a device. If it is not, then the guest can't control
the PCI_COMMAND_INTX_DISABLE bit. In our case, we prohibit a guest from
changing the value of this bit if MSI(X) is enabled.

Signed-off-by: Oleksandr Andrushchenko 
Signed-off-by: Volodymyr Babchuk 
---

It would be better to rework this patch using the new register handling
tools that Stewart Hildebrand is upstreaming right now.

In v11:
- Fix copy-paste mistake: vpci->msi should be vpci->msix
- Handle PCI_COMMAND_IO
- Fix condition for disabling INTx in the MSI-X code
- Show domU changes to only allowed bits
- Show PCI_COMMAND_MEMORY write only after P2M was altered
- Update comments in the code
In v10:
- Added cf_check attribute to guest_cmd_read
- Removed warning about non-zero cmd
- Updated comment MSI code regarding disabling INTX
- Used ternary operator in vpci_add_register() call
- Disable memory decoding for DomUs in init_bars()
In v9:
- Reworked guest_cmd_read
- Added handling for more bits
Since v6:
- fold guest's logic into cmd_write
- implement cmd_read, so we can report emulated INTx state to guests
- introduce header->guest_cmd to hold the emulated state of the
  PCI_COMMAND register for guests
Since v5:
- add additional check for MSI-X enabled while altering INTX bit
- make sure INTx disabled while guests enable MSI/MSI-X
Since v3:
- gate more code on CONFIG_HAS_MSI
- removed logic for the case when MSI/MSI-X not enabled
---
 xen/drivers/vpci/header.c | 53 ---
 xen/drivers/vpci/msi.c|  6 +
 xen/drivers/vpci/msix.c   |  5 
 xen/include/xen/vpci.h|  3 +++
 4 files changed, 64 insertions(+), 3 deletions(-)

diff --git a/xen/drivers/vpci/header.c b/xen/drivers/vpci/header.c
index 21b3fb5579..bc2ebe125b 100644
--- a/xen/drivers/vpci/header.c
+++ b/xen/drivers/vpci/header.c
@@ -167,6 +167,9 @@ static void modify_decoding(const struct pci_dev *pdev, uint16_t cmd,
 if ( !rom_only )
 {
 pci_conf_write16(pdev->sbdf, PCI_COMMAND, cmd);
+/* Show DomU that we updated P2M */
+header->guest_cmd &= ~PCI_COMMAND_MEMORY;
+header->guest_cmd = (val &

[xen-unstable test] 183965: tolerable FAIL - PUSHED

flight 183965 xen-unstable real [real]
flight 183970 xen-unstable real-retest [real]
http://logs.test-lab.xenproject.org/osstest/logs/183965/
http://logs.test-lab.xenproject.org/osstest/logs/183970/

Failures :-/ but no regressions.

Tests which are failing intermittently (not blocking):
 test-amd64-i386-qemut-rhel6hvm-amd 14 guest-start/redhat.repeat fail pass in 183970-retest

Tests which did not succeed, but are not blocking:
 test-amd64-amd64-xl-qemuu-win7-amd64 19 guest-stop fail like 183952
 test-amd64-amd64-xl-qemut-win7-amd64 19 guest-stop fail like 183959
 test-amd64-i386-xl-qemuu-win7-amd64 19 guest-stop fail like 183959
 test-amd64-amd64-xl-qemuu-ws16-amd64 19 guest-stop fail like 183959
 test-amd64-i386-xl-qemut-ws16-amd64 19 guest-stop fail like 183959
 test-armhf-armhf-libvirt-raw 15 saverestore-support-check fail  like 183959
 test-amd64-i386-xl-qemut-win7-amd64 19 guest-stop fail like 183959
 test-armhf-armhf-libvirt 16 saverestore-support-check fail  like 183959
 test-amd64-amd64-xl-qemut-ws16-amd64 19 guest-stop fail like 183959
 test-armhf-armhf-libvirt-qcow2 15 saverestore-support-check   fail like 183959
 test-amd64-amd64-qemuu-nested-amd 20 debian-hvm-install/l1/l2 fail like 183959
 test-amd64-i386-xl-qemuu-ws16-amd64 19 guest-stop fail like 183959
 test-amd64-i386-libvirt  15 migrate-support-check fail   never pass
 test-amd64-i386-libvirt-xsm  15 migrate-support-check fail   never pass
 test-amd64-i386-xl-pvshim 14 guest-start  fail   never pass
 test-amd64-amd64-libvirt-xsm 15 migrate-support-check fail   never pass
 test-arm64-arm64-xl-credit1  15 migrate-support-check fail   never pass
 test-arm64-arm64-xl-credit1  16 saverestore-support-check fail   never pass
 test-arm64-arm64-xl  15 migrate-support-check fail   never pass
 test-arm64-arm64-xl  16 saverestore-support-check fail   never pass
 test-arm64-arm64-xl-xsm  15 migrate-support-check fail   never pass
 test-arm64-arm64-xl-xsm  16 saverestore-support-check fail   never pass
 test-arm64-arm64-libvirt-xsm 15 migrate-support-check fail   never pass
 test-arm64-arm64-libvirt-xsm 16 saverestore-support-check fail   never pass
 test-arm64-arm64-xl-thunderx 15 migrate-support-check fail   never pass
 test-arm64-arm64-xl-thunderx 16 saverestore-support-check fail   never pass
 test-amd64-amd64-libvirt-qemuu-debianhvm-amd64-xsm 13 migrate-support-check fail never pass
 test-arm64-arm64-xl-credit2  15 migrate-support-check fail   never pass
 test-arm64-arm64-xl-credit2  16 saverestore-support-check fail   never pass
 test-armhf-armhf-xl-multivcpu 15 migrate-support-check fail  never pass
 test-armhf-armhf-xl-multivcpu 16 saverestore-support-check fail  never pass
 test-amd64-i386-libvirt-qemuu-debianhvm-amd64-xsm 13 migrate-support-check fail never pass
 test-armhf-armhf-xl-arndale  15 migrate-support-check fail   never pass
 test-armhf-armhf-xl-arndale  16 saverestore-support-check fail   never pass
 test-armhf-armhf-xl  15 migrate-support-check fail   never pass
 test-armhf-armhf-xl  16 saverestore-support-check fail   never pass
 test-armhf-armhf-xl-credit1  15 migrate-support-check fail   never pass
 test-armhf-armhf-xl-credit1  16 saverestore-support-check fail   never pass
 test-armhf-armhf-xl-rtds 15 migrate-support-check fail   never pass
 test-armhf-armhf-xl-rtds 16 saverestore-support-check fail   never pass
 test-amd64-amd64-libvirt 15 migrate-support-check fail   never pass
 test-amd64-amd64-libvirt-vhd 14 migrate-support-check fail   never pass
 test-amd64-i386-libvirt-raw  14 migrate-support-check fail   never pass
 test-arm64-arm64-libvirt-raw 14 migrate-support-check fail   never pass
 test-arm64-arm64-libvirt-raw 15 saverestore-support-check fail   never pass
 test-arm64-arm64-xl-vhd  14 migrate-support-check fail   never pass
 test-arm64-arm64-xl-vhd  15 saverestore-support-check fail   never pass
 test-armhf-armhf-libvirt-raw 14 migrate-support-check fail   never pass
 test-armhf-armhf-xl-credit2  15 migrate-support-check fail   never pass
 test-armhf-armhf-xl-credit2  16 saverestore-support-check fail   never pass
 test-armhf-armhf-libvirt 15 migrate-support-check fail   never pass
 test-armhf-armhf-libvirt-qcow2 14 migrate-support-check fail never pass
 test-armhf-armhf-xl-vhd  14 migrate-support-check fail   never pass
 test-armhf-armhf-xl-vhd  15 saverestore-support-check fail   never pass

version targeted for testing:
 xen  1571ff7a987b88b20598a6d49910457f3b2c59f1
baseline version:
 xen  f0dd0cd9598f22ee5509bb5d1466e4821834c4ba

Last test of basis   183959  2023-12-01

Re: [PATCH 1/3] xen/ppc: Enable Boot Allocator


Hi,

On 01/12/2023 00:01, Timothy Pearson wrote:



- Original Message -

From: "Julien Grall" 
To: "Timothy Pearson" 
Cc: "Shawn Anastasio" , "xen-devel" 
, "Jan Beulich"
, "Daniel P. Smith" , "Stefano 
Stabellini" ,
"Bertrand Marquis" , "Michal Orzel" , 
"Oleksii"

Sent: Friday, December 1, 2023 4:35:55 PM
Subject: Re: [PATCH 1/3] xen/ppc: Enable Boot Allocator



Hi,

On 01/12/2023 22:10, tpear...@raptorengineering.com wrote:

(+ Arm and RISC-V folks)

Hi Shawn,

On 01/12/2023 20:59, Shawn Anastasio wrote:

Adapt arm's earlyfdt parsing code to ppc64 and enable Xen's early boot
allocator. Routines for parsing arm-specific devicetree nodes (e.g.
multiboot) were excluded, reducing the overall footprint of code that
was copied.


I expect RISC-V to want similar code. I am not really thrilled by the
idea of having 3 similar copies of the parsing. So can we extract the
common bits (or harmonize it) so it can be shared?

Maybe Oleksii already has a version doing that.


Just my $0.02, but wouldn't it make more sense to have the RISC-V port
handle the deduplication, seeing as the POWER support came first here?  We
don't know if/when the RISC-V port will be ready for submission, so I'm
not sure why we should be on the hook for this particular work.


That would have been a valid point if you were writing a brand new
implementation. But this was *copied* from Arm.

Looking at the diff between arm/bootfdt.c and ppc/bootfdt.c, you seem to
have:
- As well copied some code from arm/setup.c
- Re-ordered some statements (not clear why)
- Removed some code which you say is Arm specific. Yet some of it is part
of the Device-Tree spec and I would expect it to be used in the future.

So my question here stands. Why are you mainly copying the Arm code
verbatim rather than consolidating it in one place?


That's fair, with the future RISC-V port removed from the discussion and good 
reasons still being put forth it makes more sense to deduplicate now.  Thank 
you for clarifying the objection! :)


I have had a brief look at the differences. I moved some of the
functions to bootfdt.c in order to match PPC. Below is the diff after that.


Leaving aside some of the clean-up, it sounds like you:
* removed BOOTMOD_{RAMDISK, XSM...}. So how do you plan to pass the XSM
blob and ramdisk in the future?
* split long printk messages. For Xen, we keep them on one line even if
it is over 80 characters, to facilitate grepping.
* removed device_tree_node_is_available(). I believe you still need it
because the property is not Arm specific.
* removed process_multiboot(). How do you plan to handle dom0less domains
in the future?
* Likewise for xen,static-mem and boot_fdt_cmdline()?
* fdt_get_mem_rsv_paddr(): this is part of the DT and is used to reserve
memory. It was superseded by /reserved-memory, but I wonder how
widespread its use is on PPC?
* You are removing the sorting of the memory banks. We had to do the
sorting on Arm because some DTs didn't have the banks sorted, and this
helped the memory subsystem logic. I can understand if you don't need
it, but it would not hurt.
* If I am not mistaken, you are adding the Xen module as BOOTMOD_KERNEL;
however, this is meant to be used for the domain kernel. Xen should be
BOOTMOD_XEN.
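For reference, sorting memory banks by start address takes very little code. The sketch below is a standalone C illustration, not Xen's implementation: struct membank here is a hypothetical two-field version, and it uses the C library's qsort(), whereas Xen code would more likely use Xen's own sort() helper. Once the banks are sorted, overlaps can be detected by comparing neighbours only.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical, simplified memory bank covering [start, start + size). */
struct membank {
    uint64_t start;
    uint64_t size;
};

/* Compare by start address without subtracting (avoids overflow). */
static int cmp_bank(const void *a, const void *b)
{
    const struct membank *x = a, *y = b;

    if ( x->start < y->start )
        return -1;
    if ( x->start > y->start )
        return 1;
    return 0;
}

static void sort_banks(struct membank *banks, size_t nr)
{
    assert(banks != NULL || nr == 0);
    qsort(banks, nr, sizeof(*banks), cmp_bank);
}

/* Expects sorted input: if any two banks overlap, some adjacent pair does. */
static bool banks_overlap(const struct membank *banks, size_t nr)
{
    for ( size_t i = 1; i < nr; i++ )
        if ( banks[i - 1].start + banks[i - 1].size > banks[i].start )
            return true;
    return false;
}
```

Without the sort, banks_overlap() would have to compare every pair of banks, which is one reason sorted input simplifies later memory-layout checks.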


Overall, it feels to me that nearly all (if not all) of bootfdt.c can be
moved to common code (maybe under a new directory common/device-tree?),
with #ifdef around the bits that are dom0less specific (we now have a
Kconfig for that). You can do the clean-up as well, but it would belong
in separate patches.


I hope that helps.

Cheers,

--- ../arm/bootfdt.c2023-12-01 23:02:43.501050219 +
+++ bootfdt.c   2023-12-01 22:26:47.719734253 +
@@ -1,9 +1,12 @@
-/* SPDX-License-Identifier: GPL-2.0 */
+/* SPDX-License-Identifier: GPL-2.0-only */
 /*
- * Early Device Tree
+ * Early Device Tree and boot info bookkeeping.
+ * Derived from arch/arm/bootfdt.c and setup.c.
  *
  * Copyright (C) 2012-2014 Citrix Systems, Inc.
+ * Copyright (C) Raptor Engineering LLC
  */
+
 #include 
 #include 
 #include 
@@ -15,7 +18,8 @@
 #include 
 #include 
 #include 
-#include 
+
+struct bootinfo __initdata bootinfo;

 struct bootmodule __init *add_boot_module(bootmodule_kind kind,
   paddr_t start, paddr_t size,
@@ -62,10 +66,6 @@
 case BOOTMOD_XEN: return "Xen";
 case BOOTMOD_FDT: return "Device Tree";
 case BOOTMOD_KERNEL:  return "Kernel";
-case BOOTMOD_RAMDISK: return "Ramdisk";
-case BOOTMOD_XSM: return "XSM";
-case BOOTMOD_GUEST_DTB: return "DTB";
-case BOOTMOD_UNKNOWN: return "Unknown";
 default: BUG();
 }
 }
@@ -91,8 +91,9 @@
 continue;
 else
 {
-printk("Region: [%#"PRIpaddr", %#"PRIpaddr") overlapping with mod[%u]: [%#"PRIpaddr", %#"PRIpaddr")\n",
-   region_start, region_end, i, mod_start, mod_end);
+printk("Region: [%#"PRIpaddr", %#"PRIpaddr") overlapping with"
+

Re: INFORMAL VOTE REQUIRED - DOCUMENTATION WORDING

2023-12-01 Thread George Dunlap

On Fri, Dec 1, 2023 at 9:44 PM Stefano Stabellini
 wrote:
> By the informal
> voting, we have 3 against "broken" and 2 in favor (not 1 as George wrote
> as Andrew's vote counts too).

Just to clarify: The opinions on that thread (if you include all
versions of the series) were:

Andy, Daniel: for keeping "broken"
Julien, Jan, Stefano, George: for changing "broken"

That's the "2 (+) / 4 split" I referred to (the "(+)" being the people
who agreed with Andy in private).  Regarding voting, I was only
counting the maintainers of the code in question; since it comes under
THE REST, that would include everyone except Daniel; hence 1 - 4.  That
is not at all because Daniel's opinion doesn't matter, but because from
a governance perspective, it's the maintainers (and then the committers)
who get votes in the case of a formal escalation.

 -George

Re: [PATCH 1/3] xen/ppc: Enable Boot Allocator

2023-12-01 Thread Timothy Pearson




- Original Message -
> From: "Julien Grall" 
> To: "Timothy Pearson" 
> Cc: "Shawn Anastasio" , "xen-devel" 
> , "Jan Beulich"
> , "Daniel P. Smith" , 
> "Stefano Stabellini" ,
> "Bertrand Marquis" , "Michal Orzel" 
> , "Oleksii"
> 
> Sent: Friday, December 1, 2023 4:35:55 PM
> Subject: Re: [PATCH 1/3] xen/ppc: Enable Boot Allocator

> Hi,
> 
> On 01/12/2023 22:10, tpear...@raptorengineering.com wrote:
>>> (+ Arm and RISC-V folks)
>>>
>>> Hi Shawn,
>>>
>>> On 01/12/2023 20:59, Shawn Anastasio wrote:
 Adapt arm's earlyfdt parsing code to ppc64 and enable Xen's early boot
 allocator. Routines for parsing arm-specific devicetree nodes (e.g.
 multiboot) were excluded, reducing the overall footprint of code that
 was copied.
>>>
>>> I expect RISC-V to want similar code. I am not really thrilled by the
>>> idea of having 3 similar copies of the parsing. So can we extract the
>>> common bits (or harmonize it) so it can be shared?
>>>
>>> Maybe Oleksii already has a version doing that.
>> 
>> Just my $0.02, but wouldn't it make more sense to have the RISC-V port
>> handle the deduplication, seeing as the POWER support came first here?  We
>> don't know if/when the RISC-V port will be ready for submission, so I'm
>> not sure why we should be on the hook for this particular work.
> 
> That would have been a valid point if you were writing a brand new
> implementation. But this was *copied* from Arm.
> 
> Looking at the diff between arm/bootfdt.c and ppc/bootfdt.c, you seem to
> have:
>- As well copied some code from arm/setup.c
>- Re-ordered some statements (not clear why)
>- Removed some code which you say is Arm specific. Yet some of it is part
> of the Device-Tree spec and I would expect it to be used in the future.
> 
> So my question here stands. Why are you mainly copying the Arm code
> verbatim rather than consolidating it in one place?

That's fair, with the future RISC-V port removed from the discussion and good 
reasons still being put forth it makes more sense to deduplicate now.  Thank 
you for clarifying the objection! :)

Re: [PATCH 1/3] xen/ppc: Enable Boot Allocator


Hi,

On 01/12/2023 22:10, tpear...@raptorengineering.com wrote:

(+ Arm and RISC-V folks)

Hi Shawn,

On 01/12/2023 20:59, Shawn Anastasio wrote:

Adapt arm's earlyfdt parsing code to ppc64 and enable Xen's early boot
allocator. Routines for parsing arm-specific devicetree nodes (e.g.
multiboot) were excluded, reducing the overall footprint of code that
was copied.


I expect RISC-V to want similar code. I am not really thrilled by the
idea of having 3 similar copies of the parsing. So can we extract the
common bits (or harmonize it) so it can be shared?

Maybe Oleksii already has a version doing that.


Just my $0.02, but wouldn't it make more sense to have the RISC-V port
handle the deduplication, seeing as the POWER support came first here?  We
don't know if/when the RISC-V port will be ready for submission, so I'm
not sure why we should be on the hook for this particular work.


That would have been a valid point if you were writing a brand new 
implementation. But this was *copied* from Arm.


Looking at the diff between arm/bootfdt.c and ppc/bootfdt.c, you seem to 
have:

   - As well copied some code from arm/setup.c
   - Re-ordered some statements (not clear why)
   - Removed some code which you say is Arm specific. Yet some of it is part
of the Device-Tree spec and I would expect it to be used in the future.


So my question here stands. Why are you mainly copying the Arm code
verbatim rather than consolidating it in one place?


Cheers,

--
Julien Grall

[xen-unstable-smoke test] 183968: tolerable all pass - PUSHED

flight 183968 xen-unstable-smoke real [real]
http://logs.test-lab.xenproject.org/osstest/logs/183968/

Failures :-/ but no regressions.

Tests which did not succeed, but are not blocking:
 test-amd64-amd64-libvirt 15 migrate-support-check fail   never pass
 test-arm64-arm64-xl-xsm  15 migrate-support-check fail   never pass
 test-arm64-arm64-xl-xsm  16 saverestore-support-check fail   never pass
 test-armhf-armhf-xl  15 migrate-support-check fail   never pass
 test-armhf-armhf-xl  16 saverestore-support-check fail   never pass

version targeted for testing:
 xen  525c7c094b258e8a46b494488eef96f5670eb352
baseline version:
 xen  1571ff7a987b88b20598a6d49910457f3b2c59f1

Last test of basis   183963  2023-12-01 10:02:10 Z    0 days
Testing same since   183968  2023-12-01 20:00:25 Z    0 days    1 attempts


People who touched revisions under test:
  Julien Grall 
  Michal Orzel 

jobs:
 build-arm64-xsm  pass
 build-amd64  pass
 build-armhf  pass
 build-amd64-libvirt  pass
 test-armhf-armhf-xl  pass
 test-arm64-arm64-xl-xsm  pass
 test-amd64-amd64-xl-qemuu-debianhvm-amd64 pass
 test-amd64-amd64-libvirt pass



sg-report-flight on osstest.test-lab.xenproject.org
logs: /home/logs/logs
images: /home/logs/images

Logs, config files, etc. are available at
http://logs.test-lab.xenproject.org/osstest/logs

Explanation of these reports, and of osstest in general, is at
http://xenbits.xen.org/gitweb/?p=osstest.git;a=blob;f=README.email;hb=master
http://xenbits.xen.org/gitweb/?p=osstest.git;a=blob;f=README;hb=master

Test harness code can be found at
http://xenbits.xen.org/gitweb?p=osstest.git;a=summary


Pushing revision :

To xenbits.xen.org:/home/xen/git/xen.git
   1571ff7a98..525c7c094b  525c7c094b258e8a46b494488eef96f5670eb352 -> smoke

Re: INFORMAL VOTE REQUIRED - DOCUMENTATION WORDING

Replying here on a couple of different people on this thread.

On Thu, 30 Nov 2023, Tamas K Lengyel wrote:
> I think this form is bad and is not helpful. 

I agree with Tamas (and also Jan) that this form is not helpful.

On Fri, 1 Dec 2023, George Dunlap wrote:
> If most people in the community really do think that "broken" is
> suitable for the documentation in our project, then of course the
> maintainers should stop objecting to that kind of language.  If most
> of the people in the community think that "broken" is *not* suitable
> for technical documentation, then of course this isn't an example of
> unreasonable review (although other instances may be).

I think there was a misconception when Kelly created this form that the
pushback was on the usage of the word "broken" globally in the Xen
Project. That is not the case.

I for example agree that "broken" can be used in Xen Project, but I
don't think that it is a good idea to use it in that specific instance.

On Fri, 1 Dec 2023, George Dunlap wrote:
> [Andy] removing "broken" is a completely unreasonable request

I am in favor of moving faster and nitpicking less. Also, Andy put in the
effort to produce the patch, so he should have the default choice in the
wording. If the choice is taking the patch as is or rejecting it, I
would take it as is.

I might have a preference on a different wording and I voiced that
preference. We could say that my request should have been optional, not
mandatory. But when the majority of reviewers request the same thing,
which wording choice should apply?

If we decide to ignore the feedback as unreasonable or because it should
have been all optional and commit the patch, what would stop anyone from
sending a patch to "fix" the wording in the comments to use "deprecated"
instead? And if someone pushes back on the second patch, would that be
nitpicking? If we commit the second patch, what if someone sends a third
patch changing the wording back to "broken"? We risk getting into
"commit wars".

To avoid this, I think we should go with the majority, whatever that is,
and the decision has to stick. We have just introduced informal votes.
We should have just used that in the original thread. By the informal
voting, we have 3 against "broken" and 2 in favor (not 1 as George wrote
as Andrew's vote counts too). 

The easiest would have been to go with the majority, resend the patch,
get it committed. If Andrew feels strongly that "broken" is the best
wording, then a proper voting form is a good idea, like Kelly did (which
I think is a full formal vote, not an informal vote). Except that the
form Kelly created is too generic and has too few options.

In conclusion, I don't care about the wording. I do care that we reach
alignment and move forward quicker. I think the informal voting
mechanism is the best way to do it.

[linux-linus test] 183961: tolerable FAIL - PUSHED

flight 183961 linux-linus real [real]
flight 183967 linux-linus real-retest [real]
http://logs.test-lab.xenproject.org/osstest/logs/183961/
http://logs.test-lab.xenproject.org/osstest/logs/183967/

Failures :-/ but no regressions.

Tests which are failing intermittently (not blocking):
 test-amd64-amd64-freebsd11-amd64 18 guest-saverestore.2 fail pass in 183967-retest
 test-armhf-armhf-xl-arndale  10 host-ping-check-xen fail pass in 183967-retest

Tests which did not succeed, but are not blocking:
 test-armhf-armhf-xl-arndale 15 migrate-support-check fail in 183967 never pass
 test-armhf-armhf-xl-arndale 16 saverestore-support-check fail in 183967 never pass
 test-armhf-armhf-libvirt 16 saverestore-support-check fail  like 183957
 test-amd64-amd64-xl-qemut-win7-amd64 19 guest-stop fail like 183957
 test-amd64-amd64-xl-qemuu-win7-amd64 19 guest-stop fail like 183957
 test-amd64-amd64-xl-qemuu-ws16-amd64 19 guest-stop fail like 183957
 test-armhf-armhf-libvirt-raw 15 saverestore-support-check fail  like 183957
 test-amd64-amd64-xl-qemut-ws16-amd64 19 guest-stop fail like 183957
 test-armhf-armhf-libvirt-qcow2 15 saverestore-support-check   fail like 183957
 test-amd64-amd64-qemuu-nested-amd 20 debian-hvm-install/l1/l2 fail like 183957
 test-amd64-amd64-libvirt-xsm 15 migrate-support-check fail   never pass
 test-amd64-amd64-libvirt 15 migrate-support-check fail   never pass
 test-arm64-arm64-xl-thunderx 15 migrate-support-check fail   never pass
 test-arm64-arm64-libvirt-xsm 15 migrate-support-check fail   never pass
 test-arm64-arm64-xl  15 migrate-support-check fail   never pass
 test-arm64-arm64-xl-xsm  15 migrate-support-check fail   never pass
 test-arm64-arm64-libvirt-xsm 16 saverestore-support-check fail   never pass
 test-arm64-arm64-xl-thunderx 16 saverestore-support-check fail   never pass
 test-arm64-arm64-xl  16 saverestore-support-check fail   never pass
 test-arm64-arm64-xl-xsm  16 saverestore-support-check fail   never pass
 test-arm64-arm64-xl-credit1  15 migrate-support-check fail   never pass
 test-arm64-arm64-xl-credit1  16 saverestore-support-check fail   never pass
 test-arm64-arm64-xl-credit2  15 migrate-support-check fail   never pass
 test-arm64-arm64-xl-credit2  16 saverestore-support-check fail   never pass
 test-armhf-armhf-xl  15 migrate-support-check fail   never pass
 test-armhf-armhf-xl  16 saverestore-support-check fail   never pass
 test-armhf-armhf-libvirt 15 migrate-support-check fail   never pass
 test-armhf-armhf-xl-credit1  15 migrate-support-check fail   never pass
 test-armhf-armhf-xl-multivcpu 15 migrate-support-check fail  never pass
 test-armhf-armhf-xl-rtds 15 migrate-support-check fail   never pass
 test-armhf-armhf-xl-credit1  16 saverestore-support-check fail   never pass
 test-armhf-armhf-xl-multivcpu 16 saverestore-support-check fail  never pass
 test-armhf-armhf-xl-rtds 16 saverestore-support-check fail   never pass
 test-amd64-amd64-libvirt-raw 14 migrate-support-check fail   never pass
 test-amd64-amd64-libvirt-qcow2 14 migrate-support-check fail never pass
 test-arm64-arm64-libvirt-raw 14 migrate-support-check fail   never pass
 test-arm64-arm64-libvirt-raw 15 saverestore-support-check fail   never pass
 test-amd64-amd64-libvirt-qemuu-debianhvm-amd64-xsm 13 migrate-support-check fail never pass
 test-arm64-arm64-xl-vhd  14 migrate-support-check fail   never pass
 test-arm64-arm64-xl-vhd  15 saverestore-support-check fail   never pass
 test-armhf-armhf-libvirt-raw 14 migrate-support-check fail   never pass
 test-armhf-armhf-xl-vhd  14 migrate-support-check fail   never pass
 test-armhf-armhf-xl-vhd  15 saverestore-support-check fail   never pass
 test-armhf-armhf-xl-credit2  15 migrate-support-check fail   never pass
 test-armhf-armhf-xl-credit2  16 saverestore-support-check fail   never pass
 test-armhf-armhf-libvirt-qcow2 14 migrate-support-check fail never pass

version targeted for testing:
 linux                994d5c58e50e91bb02c7be4a91d5186292a895c8
baseline version:
 linux                6172a5180fcc65170bfa2d49e55427567860f2a7

Last test of basis   183957  2023-11-30 23:42:23 Z    0 days
Testing same since   183961  2023-12-01 08:05:10 Z    0 days    1 attempts


People who touched revisions under test:
  "Gustavo A. R. Silva" 
  Arnaldo Carvalho de Melo 
  Bill Wendling 
  Dmitry Antipov 
  Gustavo A. R. Silva 
  Kees Cook 
  Linus Torvalds 
  Masahiro Yamada 
  Maxime Ripard 
  Michal Wajdeczko 
  Miguel Ojeda 
  Namhyung Kim 
  Nathan Chancellor 
  ndesaulni...@google.com 
  Nick Desaulniers 
  Oliver Upton 
  Richard Fitzgerald 
  Shuah Khan 
  Yang Jihong 

jobs:

Re: [PATCH 1/3] xen/ppc: Enable Boot Allocator

2023-12-01 Thread tpearson

> (+ Arm and RISC-V folks)
>
> Hi Shawn,
>
> On 01/12/2023 20:59, Shawn Anastasio wrote:
>> Adapt arm's earlyfdt parsing code to ppc64 and enable Xen's early boot
>> allocator. Routines for parsing arm-specific devicetree nodes (e.g.
>> multiboot) were excluded, reducing the overall footprint of code that
>> was copied.
>
> I expect RISC-V to want similar code. I am not really thrilled by the
> idea of having 3 similar copies of the parsing. So can we extract the
> common bits (or harmonize them) so they can be shared?
>
> Maybe Oleksii already has a version doing that.

Just my $0.02, but wouldn't it make more sense to have the RISC-V port
handle the deduplication, seeing as the POWER support came first here?  We
don't know if/when the RISC-V port will be ready for submission, so I'm
not sure why we should be on the hook for this particular work.

Thanks!

Re: [PATCH 1/3] xen/ppc: Enable Boot Allocator


(+ Arm and RISC-V folks)

Hi Shawn,

On 01/12/2023 20:59, Shawn Anastasio wrote:
> Adapt arm's earlyfdt parsing code to ppc64 and enable Xen's early boot
> allocator. Routines for parsing arm-specific devicetree nodes (e.g.
> multiboot) were excluded, reducing the overall footprint of code that
> was copied.

I expect RISC-V to want similar code. I am not really thrilled by the
idea of having 3 similar copies of the parsing. So can we extract the
common bits (or harmonize them) so they can be shared?

Maybe Oleksii already has a version doing that.

Cheers,

--
Julien Grall

[PATCH v5 0/7] Introduce generic headers

Some headers are common between several architectures, so the current patch
series provides them.

Another reason to make them generic is to simplify adding the support
necessary for a complete Xen build, as was/is being done in patch series [1]
and [2].

Also, rather than providing empty generic/stub headers, the
"#ifdef CONFIG_* #include  #endif" approach is used.

This patch series is a prerequisite for "Enable build of full Xen for RISC-V"
[3].

[1] https://lore.kernel.org/xen-devel/cover.1694543103.git.sanasta...@raptorengineering.com/
[2] https://lore.kernel.org/xen-devel/cover.1692181079.git.oleksii.kuroc...@gmail.com/
[3] https://lore.kernel.org/xen-devel/cover.1700761381.git.oleksii.kuroc...@gmail.com/

---
Changes in V5:
 - Update the patch series message as the patch related to delay.h was merged.
 - Rebase on top of staging because half of the patches of the patch series were
   merged to the staging branch.
 - Add A-by for some of the patches.
 - Add "depends on X86 || Arm" for CONFIG_GRANT_TABLE and CONFIG_MEM_ACCESS to be
   sure they won't be turned on by randconfig in CI.
 - Partly switch Arm and PPC to asm-generic/monitor.h.
 - Some other minor changes.
---
Changes in V4:
 - Update the cover letter message
 - Add Reviewed-by/Acked-by for patches:
[PATCH v3 01/14] xen/asm-generic: introduce stub header paging.h
[PATCH v3 03/14] xen/asm-generic: introduce generic hypercall.h
[PATCH v3 04/14] xen/asm-generic: introduce generic header iocap.h
[PATCH v3 05/14] xen/asm-generic: introduce stub header 
[PATCH v3 06/14] xen/asm-generic: introduce generic header percpu.h
[PATCH v3 07/14] xen/asm-generic: introduce generalized hardirq.h
[PATCH v3 08/14] xen/asm-generic: introduce generic div64.h header
[PATCH v3 09/14] xen/asm-generic: introduce generic header altp2m.h
[PATCH v3 10/14] xen/asm-generic: introduce stub header monitor.h
[PATCH v3 11/14] xen/asm-generic: introduce stub header numa.h
[PATCH v3 12/14] xen/asm-generic: introduce stub header softirq.h
 - Fix some code style and minor issues.
 - Use asm-generic version of device.h for Arm and PPC.
---
Changes in V3:
 - Update the commit message of the cover letter.
 - Drop the following patch as it can be arch-specific enough:
   * [PATCH v2 09/15] xen/asm-generic: introduce generic header smp.h
 - Drop the corresponding arch-specific headers and use the asm-generic version
   of the header instead.
 - Return the following patches to the patch series:
   * xen: ifdef inclusion of  in 
   * xen/asm-generic: ifdef inclusion of 
---
Changes in V2:
 - Update the commit message of the cover letter.
 - Drop the following patches because they are arch-specific or were sent as
   separate patches:
   - xen/asm-generic: introduce stub header event.h
   - xen/asm-generic: introduce stub header spinlock.h
   - [PATCH v1 03/29] xen/asm-generic: introduce stub header cpufeature.h
   - [PATCH v1 07/29] xen/asm-generic: introduce stub header guest_atomics.h
   - [PATCH v1 10/29] xen/asm-generic: introduce stub header iommu.h
   - [PATCH v1 12/29] xen/asm-generic: introduce stub header pci.h
     because a separate patch was sent [5]
   - [PATCH v1 14/29] xen/asm-generic: introduce stub header setup.h
   - [PATCH v1 15/29] xen/asm-generic: introduce stub header xenoprof.h
     because of [3].
   - [PATCH v1 16/29] xen/asm-generic: introduce stub header flushtlb.h
   - [PATCH v1 22/29] xen/asm-generic: introduce stub header delay.h
     because of [3]
   - [PATCH v1 23/29] xen/asm-generic: introduce stub header domain.h
   - [PATCH v1 24/29] xen/asm-generic: introduce stub header guest_access.h
   - [PATCH v1 25/29] xen/asm-generic: introduce stub header irq.h
     (probably not as generic as I expected; I'll come back to it if it
     becomes necessary in the future)
   - [PATCH v1 28/29] xen/asm-generic: introduce stub header p2m.h
     (probably not as generic as I expected; I'll come back to it if it
     becomes necessary in the future)
 - For the rest of the patches, please look at the changes for each patch
   separately.
---

Oleksii Kurochko (7):
  xen/asm-generic: introduce generic div64.h header
  xen/asm-generic: introduce stub header monitor.h
  xen/asm-generic: introduce stub header numa.h
  xen/asm-generic: introduce stub header softirq.h
  xen: ifdef inclusion of  in 
  xen/asm-generic: ifdef inclusion of 
  xen/asm-generic: introduce generic device.h

 xen/arch/arm/device.c |  15 ++-
 xen/arch/arm/domain_build.c   |   3 +-
 xen/arch/arm/gic-v2.c |   4 +-
 xen/arch/arm/gic-v3.c |   6 +-
 xen/arch/arm/gic.c |   4 +-
 xen/arch/arm/include/asm/Makefile |   3 +
 xen/arch/arm/include/asm/div64.h  |   8 +-
 xen/arch/arm/include/asm/monitor.h |  28 +---
 xen/arch/arm/p2m.c |   1 +

[PATCH v5 2/7] xen/asm-generic: introduce stub header monitor.h

The header is shared between several archs, so it is
moved to asm-generic.

Partly switch Arm and PPC to asm-generic/monitor.h; only
arch_monitor_get_capabilities() is left in the arch-specific monitor.h.

Signed-off-by: Oleksii Kurochko 
Acked-by: Jan Beulich 
---
Changes in V5:
  - Partly switched Arm and PPC to asm-generic monitor.h; only
    arch_monitor_get_capabilities() is left in the arch-specific monitor.h.
  - Updated the commit message.
---
Changes in V4:
 - Removed the double blank line.
 - Added Acked-by: Jan Beulich .
 - Update the commit message
---
Changes in V3:
 - Use forward-declaration of struct domain instead of " #include  ".
 - Add ' include  '
 - Drop PPC's monitor.h.
---
Changes in V2:
- remove inclusion of "+#include "
- add "struct xen_domctl_monitor_op;"
- remove one of SPDX tags.
---
 xen/arch/arm/include/asm/monitor.h | 28 +--
 xen/arch/ppc/include/asm/monitor.h | 28 +--
 xen/include/asm-generic/monitor.h  | 57 ++
 3 files changed, 59 insertions(+), 54 deletions(-)
 create mode 100644 xen/include/asm-generic/monitor.h

diff --git a/xen/arch/arm/include/asm/monitor.h b/xen/arch/arm/include/asm/monitor.h
index 7567be66bd..045217c310 100644
--- a/xen/arch/arm/include/asm/monitor.h
+++ b/xen/arch/arm/include/asm/monitor.h
@@ -25,33 +25,7 @@
 #include 
 #include 
 
-static inline
-void arch_monitor_allow_userspace(struct domain *d, bool allow_userspace)
-{
-}
-
-static inline
-int arch_monitor_domctl_op(struct domain *d, struct xen_domctl_monitor_op *mop)
-{
-    /* No arch-specific monitor ops on ARM. */
-    return -EOPNOTSUPP;
-}
-
-int arch_monitor_domctl_event(struct domain *d,
-                              struct xen_domctl_monitor_op *mop);
-
-static inline
-int arch_monitor_init_domain(struct domain *d)
-{
-    /* No arch-specific domain initialization on ARM. */
-    return 0;
-}
-
-static inline
-void arch_monitor_cleanup_domain(struct domain *d)
-{
-    /* No arch-specific domain cleanup on ARM. */
-}
+#include 
 
 static inline uint32_t arch_monitor_get_capabilities(struct domain *d)
 {
diff --git a/xen/arch/ppc/include/asm/monitor.h b/xen/arch/ppc/include/asm/monitor.h
index e5b0282bf1..89000dacc6 100644
--- a/xen/arch/ppc/include/asm/monitor.h
+++ b/xen/arch/ppc/include/asm/monitor.h
@@ -6,33 +6,7 @@
 #include 
 #include 
 
-static inline
-void arch_monitor_allow_userspace(struct domain *d, bool allow_userspace)
-{
-}
-
-static inline
-int arch_monitor_domctl_op(struct domain *d, struct xen_domctl_monitor_op *mop)
-{
-    /* No arch-specific monitor ops on PPC. */
-    return -EOPNOTSUPP;
-}
-
-int arch_monitor_domctl_event(struct domain *d,
-                              struct xen_domctl_monitor_op *mop);
-
-static inline
-int arch_monitor_init_domain(struct domain *d)
-{
-    /* No arch-specific domain initialization on PPC. */
-    return 0;
-}
-
-static inline
-void arch_monitor_cleanup_domain(struct domain *d)
-{
-    /* No arch-specific domain cleanup on PPC. */
-}
+#include 
 
 static inline uint32_t arch_monitor_get_capabilities(struct domain *d)
 {
diff --git a/xen/include/asm-generic/monitor.h b/xen/include/asm-generic/monitor.h
new file mode 100644
index 00..74e4870cd7
--- /dev/null
+++ b/xen/include/asm-generic/monitor.h
@@ -0,0 +1,57 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+/*
+ * include/asm-generic/monitor.h
+ *
+ * Arch-specific monitor_op domctl handler.
+ *
+ * Copyright (c) 2015 Tamas K Lengyel (ta...@tklengyel.com)
+ * Copyright (c) 2016, Bitdefender S.R.L.
+ *
+ */
+
+#ifndef __ASM_GENERIC_MONITOR_H__
+#define __ASM_GENERIC_MONITOR_H__
+
+#include 
+
+struct domain;
+struct xen_domctl_monitor_op;
+
+static inline
+void arch_monitor_allow_userspace(struct domain *d, bool allow_userspace)
+{
+}
+
+static inline
+int arch_monitor_domctl_op(struct domain *d, struct xen_domctl_monitor_op *mop)
+{
+    /* No arch-specific monitor ops on GENERIC. */
+    return -EOPNOTSUPP;
+}
+
+int arch_monitor_domctl_event(struct domain *d,
+                              struct xen_domctl_monitor_op *mop);
+
+static inline
+int arch_monitor_init_domain(struct domain *d)
+{
+    /* No arch-specific domain initialization on GENERIC. */
+    return 0;
+}
+
+static inline
+void arch_monitor_cleanup_domain(struct domain *d)
+{
+    /* No arch-specific domain cleanup on GENERIC. */
+}
+
+#endif /* __ASM_GENERIC_MONITOR_H__ */
+
+/*
+ * Local variables:
+ * mode: C
+ * c-file-style: BSD
+ * c-basic-offset: 4
+ * indent-tabs-mode: nil
+ * End:
+ */
-- 
2.43.0
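The generic monitor.h supplies inline no-op defaults, and each arch header includes it and keeps only what is genuinely arch-specific (here arch_monitor_get_capabilities()). A minimal single-file sketch of that layering follows. `struct domain` and `struct xen_domctl_monitor_op` are simplified stand-ins rather than Xen's definitions, and `EXAMPLE_MONITOR_CAPABILITY` is a made-up value.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

/* Simplified stand-ins for Xen's types (illustration only). */
struct domain { int domain_id; };
struct xen_domctl_monitor_op { uint32_t op; };

/* ---- Defaults in the style of asm-generic/monitor.h ---- */

static inline void arch_monitor_allow_userspace(struct domain *d,
                                                bool allow_userspace)
{
    (void)d;
    (void)allow_userspace;
}

static inline int arch_monitor_domctl_op(struct domain *d,
                                         struct xen_domctl_monitor_op *mop)
{
    (void)d;
    (void)mop;
    return -EOPNOTSUPP;  /* no arch-specific monitor ops */
}

/* Declared only; the definition lives out of line in arch code. */
int arch_monitor_domctl_event(struct domain *d,
                              struct xen_domctl_monitor_op *mop);

static inline int arch_monitor_init_domain(struct domain *d)
{
    (void)d;
    return 0;  /* no arch-specific domain initialization */
}

static inline void arch_monitor_cleanup_domain(struct domain *d)
{
    (void)d;  /* no arch-specific domain cleanup */
}

/* ---- What remains in the arch header after the switch ---- */

/* Hypothetical capability mask; the real one is arch-defined. */
#define EXAMPLE_MONITOR_CAPABILITY (1U << 0)

static inline uint32_t arch_monitor_get_capabilities(struct domain *d)
{
    (void)d;
    return EXAMPLE_MONITOR_CAPABILITY;
}
```

Because the defaults are `static inline` definitions, an arch that needs a real implementation of any of them (as x86 does) keeps its own header rather than including the generic one.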

[PATCH v5 6/7] xen/asm-generic: ifdef inclusion of

Ifdef-ing the inclusion of  in 
avoids generating an empty  header
when !CONFIG_MEM_ACCESS.

For Arm, an explicit inclusion of  was added to p2m.c
and traps.c because they require some functions from  which
aren't available when !CONFIG_MEM_ACCESS.

Suggested-by: Jan Beulich 
Signed-off-by: Oleksii Kurochko 
---
Changes in V5:
 - Added dependencies for "Config MEM_ACCESS" to be sure that randconfig will not
   turn on the config.
---
Changes in V4:
 - Nothing changed. Only rebase.
---
Changes in V3:
 - Remove unnecessary comment.
---
 xen/arch/arm/p2m.c | 1 +
 xen/arch/arm/traps.c  | 1 +
 xen/arch/ppc/include/asm/mem_access.h | 5 -
 xen/common/Kconfig | 2 +-
 xen/include/xen/mem_access.h  | 2 ++
 5 files changed, 5 insertions(+), 6 deletions(-)
 delete mode 100644 xen/arch/ppc/include/asm/mem_access.h

diff --git a/xen/arch/arm/p2m.c b/xen/arch/arm/p2m.c
index b991b76ce4..2465c266e9 100644
--- a/xen/arch/arm/p2m.c
+++ b/xen/arch/arm/p2m.c
@@ -7,6 +7,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 
diff --git a/xen/arch/arm/traps.c b/xen/arch/arm/traps.c
index 3784e8276e..37a457f4b1 100644
--- a/xen/arch/arm/traps.c
+++ b/xen/arch/arm/traps.c
@@ -35,6 +35,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 #include 
diff --git a/xen/arch/ppc/include/asm/mem_access.h b/xen/arch/ppc/include/asm/mem_access.h
deleted file mode 100644
index e7986dfdbd..00
--- a/xen/arch/ppc/include/asm/mem_access.h
+++ /dev/null
@@ -1,5 +0,0 @@
-/* SPDX-License-Identifier: GPL-2.0-only */
-#ifndef __ASM_PPC_MEM_ACCESS_H__
-#define __ASM_PPC_MEM_ACCESS_H__
-
-#endif /* __ASM_PPC_MEM_ACCESS_H__ */
diff --git a/xen/common/Kconfig b/xen/common/Kconfig
index 13e26ca06f..d84e395a0b 100644
--- a/xen/common/Kconfig
+++ b/xen/common/Kconfig
@@ -87,7 +87,7 @@ config MEM_ACCESS_ALWAYS_ON
 config MEM_ACCESS
def_bool MEM_ACCESS_ALWAYS_ON
prompt "Memory Access and VM events" if !MEM_ACCESS_ALWAYS_ON
-   depends on HVM
+   depends on HVM && (ARM || X86)
---help---
 
  Framework to configure memory access types for guests and receive
diff --git a/xen/include/xen/mem_access.h b/xen/include/xen/mem_access.h
index 4e4811680d..87d93b31f6 100644
--- a/xen/include/xen/mem_access.h
+++ b/xen/include/xen/mem_access.h
@@ -33,7 +33,9 @@
  */
 struct vm_event_st;
 
+#ifdef CONFIG_MEM_ACCESS
 #include 
+#endif
 
 /*
  * Additional access types, which are used to further restrict
-- 
2.43.0

[PATCH v5 7/7] xen/asm-generic: introduce generic device.h

Arm, PPC and RISC-V use the same device.h, so device.h
was moved to asm-generic. Arm's device.h was taken as a base with
the following changes:
 - #ifdef PCI related things.
 - #ifdef ACPI related things.
 - Rename #ifdef guards.
 - Add SPDX tag.
 - #ifdef CONFIG_HAS_DEVICE_TREE related things.
 - #ifdef-ing iommu related things with CONFIG_HAS_PASSTHROUGH.

Also, Arm and PPC are switched to the asm-generic version of device.h.

Signed-off-by: Oleksii Kurochko 
---

 Jan wrote the following:
   Overall I think there are too many changes done all in one go here.
   But it's mostly Arm which is affected, so I'll leave judging about that
   to the Arm maintainers.
 
 Arm maintainers, would it be fine with you not to split the patch?

---
Changes in V5:
  - Removed generated file: xen/include/headers++.chk.new
  - Removed the pointless #ifdef CONFIG_HAS_DEVICE_TREE ... #endif for PPC, as
    CONFIG_HAS_DEVICE_TREE will always be enabled for PPC.
---
Changes in V4:
 - Updated the commit message
 - Switched Arm and PPC to asm-generic version of device.h
 - Replaced HAS_PCI with CONFIG_HAS_PCI
 - ifdef-ing iommu field of dev_archdata struct with CONFIG_HAS_PASSTHROUGH
 - ifdef-ing iommu_fwspec of device struct with CONFIG_HAS_PASSTHROUGH
 - ifdef-ing DT related things with CONFIG_HAS_DEVICE_TREE
 - Updated the commit message ( remove a note with question about
   if device.h should be in asm-generic or not )
 - Replaced DEVICE_IC with DEVICE_INTERRUPT_CONTROLLER
 - Rationalized usage of CONFIG_HAS_* in device.h
 - Fixed indents for ACPI_DEVICE_START and ACPI_DEVICE_END
---
Changes in V3:
 - ifdef device tree related things.
 - update the commit message
---
Changes in V2:
- take ( as common ) device.h from Arm as PPC and RISC-V use it as a base.
- #ifdef PCI related things.
- #ifdef ACPI related things.
- rename DEVICE_GIC to DEVICE_IC.
- rename #ifdef guards.
- switch Arm and PPC to generic device.h
- add SPDX tag
- update the commit message

---
 xen/arch/arm/device.c |  15 ++-
 xen/arch/arm/domain_build.c   |   2 +-
 xen/arch/arm/gic-v2.c |   4 +-
 xen/arch/arm/gic-v3.c |   6 +-
 xen/arch/arm/gic.c |   4 +-
 xen/arch/arm/include/asm/Makefile |   1 +
 xen/arch/ppc/include/asm/Makefile |   1 +
 xen/arch/ppc/include/asm/device.h |  53 
 .../asm => include/asm-generic}/device.h  | 125 +++---
 9 files changed, 102 insertions(+), 109 deletions(-)
 delete mode 100644 xen/arch/ppc/include/asm/device.h
 rename xen/{arch/arm/include/asm => include/asm-generic}/device.h (79%)

diff --git a/xen/arch/arm/device.c b/xen/arch/arm/device.c
index 1f631d3274..affbe79f9a 100644
--- a/xen/arch/arm/device.c
+++ b/xen/arch/arm/device.c
@@ -16,7 +16,10 @@
 #include 
 
 extern const struct device_desc _sdevice[], _edevice[];
+
+#ifdef CONFIG_ACPI
 extern const struct acpi_device_desc _asdevice[], _aedevice[];
+#endif
 
 int __init device_init(struct dt_device_node *dev, enum device_class class,
const void *data)
@@ -45,6 +48,7 @@ int __init device_init(struct dt_device_node *dev, enum device_class class,
 return -EBADF;
 }
 
+#ifdef CONFIG_ACPI
 int __init acpi_device_init(enum device_class class, const void *data, int class_type)
 {
 const struct acpi_device_desc *desc;
@@ -61,6 +65,7 @@ int __init acpi_device_init(enum device_class class, const void *data, int class
 
 return -EBADF;
 }
+#endif
 
 enum device_class device_get_class(const struct dt_device_node *dev)
 {
@@ -329,9 +334,13 @@ int handle_device(struct domain *d, struct dt_device_node *dev, p2m_type_t p2mt,
 struct map_range_data mr_data = {
 .d = d,
 .p2mt = p2mt,
-.skip_mapping = !own_device ||
-(is_pci_passthrough_enabled() &&
-(device_get_class(dev) == DEVICE_PCI_HOSTBRIDGE)),
+.skip_mapping =
+!own_device
+#ifdef CONFIG_HAS_PCI
+|| (is_pci_passthrough_enabled() &&
+(device_get_class(dev) == DEVICE_PCI_HOSTBRIDGE))
+#endif
+,
 .iomem_ranges = iomem_ranges,
 .irq_ranges = irq_ranges
 };
diff --git a/xen/arch/arm/domain_build.c b/xen/arch/arm/domain_build.c
index 28df515a3d..a0518993b1 100644
--- a/xen/arch/arm/domain_build.c
+++ b/xen/arch/arm/domain_build.c
@@ -1651,7 +1651,7 @@ static int __init handle_node(struct domain *d, struct kernel_info *kinfo,
  * Replace these nodes with our own. Note that the original may be
  * used_by DOMID_XEN so this check comes first.
  */
-if ( device_get_class(node) == DEVICE_GIC )
+if ( device_get_class(node) == DEVICE_INTERRUPT_CONTROLLER )
 return make_gic_node(d, kinfo->fdt, node);
 if ( dt_match_node(timer_matches,

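The `.skip_mapping` initializer in the device.c hunk splices `#ifdef CONFIG_HAS_PCI` into the middle of a boolean expression, which is valid C but easy to misread. One possible alternative, shown here only as a sketch, is to hoist the condition into a helper. The enum below is a hypothetical subset of device classes, `is_pci_passthrough_enabled()` is a stub, and `should_skip_mapping()` is an illustrative name; none of this is part of the patch.

```c
#include <stdbool.h>

/* Hypothetical subset of device classes (illustration only). */
enum device_class {
    DEVICE_SERIAL,
    DEVICE_IOMMU,
    DEVICE_INTERRUPT_CONTROLLER,
    DEVICE_PCI_HOSTBRIDGE,
    DEVICE_UNKNOWN,
};

#ifdef CONFIG_HAS_PCI
/* Stub: the real helper reports whether PCI passthrough is enabled. */
static bool is_pci_passthrough_enabled(void)
{
    return true;
}
#endif

/*
 * Same logic as the patched initializer: always skip devices we don't
 * own; with PCI support, also skip PCI host bridges when passthrough
 * is enabled.
 */
static bool should_skip_mapping(bool own_device, enum device_class dev_class)
{
    if ( !own_device )
        return true;
#ifdef CONFIG_HAS_PCI
    if ( is_pci_passthrough_enabled() && dev_class == DEVICE_PCI_HOSTBRIDGE )
        return true;
#else
    (void)dev_class;
#endif
    return false;
}
```

Built without CONFIG_HAS_PCI, only `own_device` matters; built with `-DCONFIG_HAS_PCI`, PCI host bridges are also skipped while passthrough is enabled, matching the patched initializer.
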
[PATCH v5 3/7] xen/asm-generic: introduce stub header numa.h

 is common across several archs, so it is moved
to asm-generic.

Signed-off-by: Oleksii Kurochko 
Reviewed-by: Michal Orzel 
Acked-by: Jan Beulich 
Acked-by: Shawn Anastasio 
---
Changes in V5:
  - Added Acked-by: Jan Beulich 
  - Updated the comment around first_valid_mfn. ( Arm -> GENERIC )
  - Added Acked-by: Shawn Anastasio 
---
Changes in V4:
 - Updated guards name: *ARCH_GENERIC* -> *ASM_GENERIC*.
 - Moved inclusion of xen/mm-frame.h under "#ifndef CONFIG_NUMA".
 - Added Reviewed-by: Michal Orzel .
---
Changes in V3:
 - Remove old header inclusion in asm-generic numa.h and include
and 
 - Drop Arm and PPC's numa.h and use asm-generic version instead.
---
Changes in V2:
- update the commit message.
- change u8 to uint8_t.
- add #ifndef CONFIG_NUMA.
---
 xen/arch/arm/include/asm/Makefile |  1 +
 xen/arch/ppc/include/asm/Makefile |  1 +
 xen/arch/ppc/include/asm/numa.h   | 26 ---
 .../asm => include/asm-generic}/numa.h| 16 +++-
 4 files changed, 12 insertions(+), 32 deletions(-)
 delete mode 100644 xen/arch/ppc/include/asm/numa.h
 rename xen/{arch/arm/include/asm => include/asm-generic}/numa.h (67%)

diff --git a/xen/arch/arm/include/asm/Makefile b/xen/arch/arm/include/asm/Makefile
index 8221429c2c..0c855a798a 100644
--- a/xen/arch/arm/include/asm/Makefile
+++ b/xen/arch/arm/include/asm/Makefile
@@ -2,6 +2,7 @@
 generic-y += altp2m.h
 generic-y += hardirq.h
 generic-y += iocap.h
+generic-y += numa.h
 generic-y += paging.h
 generic-y += percpu.h
 generic-y += random.h
diff --git a/xen/arch/ppc/include/asm/Makefile b/xen/arch/ppc/include/asm/Makefile
index a8e848d4d0..f09c5ea8a1 100644
--- a/xen/arch/ppc/include/asm/Makefile
+++ b/xen/arch/ppc/include/asm/Makefile
@@ -4,6 +4,7 @@ generic-y += div64.h
 generic-y += hardirq.h
 generic-y += hypercall.h
 generic-y += iocap.h
+generic-y += numa.h
 generic-y += paging.h
 generic-y += percpu.h
 generic-y += random.h
diff --git a/xen/arch/ppc/include/asm/numa.h b/xen/arch/ppc/include/asm/numa.h
deleted file mode 100644
index 7fdf66c3da..00
--- a/xen/arch/ppc/include/asm/numa.h
+++ /dev/null
@@ -1,26 +0,0 @@
-#ifndef __ASM_PPC_NUMA_H__
-#define __ASM_PPC_NUMA_H__
-
-#include 
-#include 
-
-typedef uint8_t nodeid_t;
-
-/* Fake one node for now. See also node_online_map. */
-#define cpu_to_node(cpu) 0
-#define node_to_cpumask(node)   (cpu_online_map)
-
-/*
- * TODO: make first_valid_mfn static when NUMA is supported on PPC, this
- * is required because the dummy helpers are using it.
- */
-extern mfn_t first_valid_mfn;
-
-/* XXX: implement NUMA support */
-#define node_spanned_pages(nid) (max_page - mfn_x(first_valid_mfn))
-#define node_start_pfn(nid) (mfn_x(first_valid_mfn))
-#define __node_distance(a, b) (20)
-
-#define arch_want_default_dmazone() (false)
-
-#endif /* __ASM_PPC_NUMA_H__ */
diff --git a/xen/arch/arm/include/asm/numa.h b/xen/include/asm-generic/numa.h
similarity index 67%
rename from xen/arch/arm/include/asm/numa.h
rename to xen/include/asm-generic/numa.h
index e2bee2bd82..7f95a77e89 100644
--- a/xen/arch/arm/include/asm/numa.h
+++ b/xen/include/asm-generic/numa.h
@@ -1,18 +1,21 @@
-#ifndef __ARCH_ARM_NUMA_H
-#define __ARCH_ARM_NUMA_H
+/* SPDX-License-Identifier: GPL-2.0-only */
+#ifndef __ASM_GENERIC_NUMA_H
+#define __ASM_GENERIC_NUMA_H
 
-#include 
+#include 
 
-typedef u8 nodeid_t;
+typedef uint8_t nodeid_t;
 
 #ifndef CONFIG_NUMA
 
+#include 
+
 /* Fake one node for now. See also node_online_map. */
 #define cpu_to_node(cpu) 0
 #define node_to_cpumask(node)   (cpu_online_map)
 
 /*
- * TODO: make first_valid_mfn static when NUMA is supported on Arm, this
+ * TODO: make first_valid_mfn static when NUMA is supported on GENERIC, this
  * is required because the dummy helpers are using it.
  */
 extern mfn_t first_valid_mfn;
@@ -26,7 +29,8 @@ extern mfn_t first_valid_mfn;
 
 #define arch_want_default_dmazone() (false)
 
-#endif /* __ARCH_ARM_NUMA_H */
+#endif /* __ASM_GENERIC_NUMA_H */
+
 /*
  * Local variables:
  * mode: C
-- 
2.43.0
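With CONFIG_NUMA unset, the generic numa.h fakes a single node covering everything from first_valid_mfn up to max_page. A small sketch of how those fallbacks evaluate, assuming a simplified `mfn_t` wrapper and made-up values for `max_page` and `first_valid_mfn` (in Xen both are set up by the memory-management code):

```c
#include <stdint.h>

/* Simplified stand-in for Xen's typesafe mfn_t. */
typedef struct { unsigned long mfn; } mfn_t;
static inline unsigned long mfn_x(mfn_t m) { return m.mfn; }

typedef uint8_t nodeid_t;

/* Hypothetical values; Xen computes these while setting up memory. */
static unsigned long max_page = 0x80000;
static mfn_t first_valid_mfn = { 0x40000 };

/* The !CONFIG_NUMA fallbacks: one fake node spanning all of RAM. */
#define cpu_to_node(cpu)        0
#define node_spanned_pages(nid) (max_page - mfn_x(first_valid_mfn))
#define node_start_pfn(nid)     (mfn_x(first_valid_mfn))
```

This is also why first_valid_mfn cannot be static yet: the dummy helpers expand to code that references it from other translation units.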

[PATCH v5 4/7] xen/asm-generic: introduce stub header softirq.h

 is common between Arm, PPC and RISC-V so it is
moved to asm-generic.

Drop Arm and PPC's softirq.h and use asm-generic version instead.

Signed-off-by: Oleksii Kurochko 
Reviewed-by: Michal Orzel 
Acked-by: Jan Beulich 
Acked-by: Shawn Anastasio 
---
Changes in V5:
 - Removed a stray "Added" from the commit message
 - Added Acked-by: Shawn Anastasio 
---
Changes in V4:
 - Added Reviewed-by: Michal Orzel 
 - Added Acked-by: Jan Beulich 
---
Changes in V3:
 - Drop Arm and PPC's softirq.h
 - Update the commit message.
---
Changes in V2:
- update the commit message.
---
 xen/arch/arm/include/asm/Makefile | 1 +
 xen/arch/ppc/include/asm/Makefile | 1 +
 xen/arch/ppc/include/asm/softirq.h | 8 
 .../arm/include/asm => include/asm-generic}/softirq.h | 7 ---
 4 files changed, 6 insertions(+), 11 deletions(-)
 delete mode 100644 xen/arch/ppc/include/asm/softirq.h
 rename xen/{arch/arm/include/asm => include/asm-generic}/softirq.h (56%)

diff --git a/xen/arch/arm/include/asm/Makefile b/xen/arch/arm/include/asm/Makefile
index 0c855a798a..a28cc5d1b1 100644
--- a/xen/arch/arm/include/asm/Makefile
+++ b/xen/arch/arm/include/asm/Makefile
@@ -6,4 +6,5 @@ generic-y += numa.h
 generic-y += paging.h
 generic-y += percpu.h
 generic-y += random.h
+generic-y += softirq.h
 generic-y += vm_event.h
diff --git a/xen/arch/ppc/include/asm/Makefile b/xen/arch/ppc/include/asm/Makefile
index f09c5ea8a1..efd72862c8 100644
--- a/xen/arch/ppc/include/asm/Makefile
+++ b/xen/arch/ppc/include/asm/Makefile
@@ -8,4 +8,5 @@ generic-y += numa.h
 generic-y += paging.h
 generic-y += percpu.h
 generic-y += random.h
+generic-y += softirq.h
 generic-y += vm_event.h
diff --git a/xen/arch/ppc/include/asm/softirq.h b/xen/arch/ppc/include/asm/softirq.h
deleted file mode 100644
index a0b28a5e51..00
--- a/xen/arch/ppc/include/asm/softirq.h
+++ /dev/null
@@ -1,8 +0,0 @@
-#ifndef __ASM_PPC_SOFTIRQ_H__
-#define __ASM_PPC_SOFTIRQ_H__
-
-#define NR_ARCH_SOFTIRQS 0
-
-#define arch_skip_send_event_check(cpu) 0
-
-#endif /* __ASM_PPC_SOFTIRQ_H__ */
diff --git a/xen/arch/arm/include/asm/softirq.h b/xen/include/asm-generic/softirq.h
similarity index 56%
rename from xen/arch/arm/include/asm/softirq.h
rename to xen/include/asm-generic/softirq.h
index 976e0ebd70..83be855e50 100644
--- a/xen/arch/arm/include/asm/softirq.h
+++ b/xen/include/asm-generic/softirq.h
@@ -1,11 +1,12 @@
-#ifndef __ASM_SOFTIRQ_H__
-#define __ASM_SOFTIRQ_H__
+/* SPDX-License-Identifier: GPL-2.0-only */
+#ifndef __ASM_GENERIC_SOFTIRQ_H__
+#define __ASM_GENERIC_SOFTIRQ_H__
 
 #define NR_ARCH_SOFTIRQS   0
 
 #define arch_skip_send_event_check(cpu) 0
 
-#endif /* __ASM_SOFTIRQ_H__ */
+#endif /* __ASM_GENERIC_SOFTIRQ_H__ */
 /*
  * Local variables:
  * mode: C
-- 
2.43.0
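NR_ARCH_SOFTIRQS being 0 means the arch contributes no softirqs beyond the common ones, and arch_skip_send_event_check() evaluating to 0 means common code never skips the event-check IPI. A sketch, assuming the total is formed as NR_COMMON_SOFTIRQS + NR_ARCH_SOFTIRQS; the enum entries and `need_event_check_ipi()` are illustrative placeholders, not Xen's real list or API:

```c
#include <stdbool.h>

/* Illustrative common softirqs; placeholder names, not Xen's list. */
enum {
    EXAMPLE_TIMER_SOFTIRQ = 0,
    EXAMPLE_SCHEDULE_SOFTIRQ,
    EXAMPLE_TASKLET_SOFTIRQ,
    NR_COMMON_SOFTIRQS
};

/* From the generic softirq.h: no arch-specific softirqs... */
#define NR_ARCH_SOFTIRQS 0
/* ...and never ask common code to skip the event-check IPI. */
#define arch_skip_send_event_check(cpu) 0

/* Assumed combination of common and arch-specific counts. */
#define NR_SOFTIRQS (NR_COMMON_SOFTIRQS + NR_ARCH_SOFTIRQS)

/* Hypothetical helper: should an IPI be sent to wake this CPU? */
static inline bool need_event_check_ipi(unsigned int cpu)
{
    (void)cpu;
    return !arch_skip_send_event_check(cpu);
}
```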

[PATCH v5 5/7] xen: ifdef inclusion of in

Ifdef-ing the inclusion of  avoids generating an empty
 when CONFIG_GRANT_TABLE is not enabled.

The following changes were made for Arm:
 should be included directly because it contains the
gnttab_dom0_frames() macro, which is unique to Arm and is used in
arch/arm/domain_build.c.
 is #ifdef-ed with CONFIG_GRANT_TABLE in
 so, in the case of !CONFIG_GRANT_TABLE, gnttab_dom0_frames
won't be available for use in arch/arm/domain_build.c.

Suggested-by: Jan Beulich 
Signed-off-by: Oleksii Kurochko 
---
Changes in V5:
 - Added dependencies for "Config GRANT_TABLE" to be sure that randconfig will not
   turn on the config.
---
Changes in V4:
 - Nothing changed. Only rebase.
---
Changes in V3:
 - Remove unnecessary comment.
---
 xen/arch/arm/domain_build.c| 1 +
 xen/arch/ppc/include/asm/grant_table.h | 5 -
 xen/common/Kconfig | 1 +
 xen/include/xen/grant_table.h  | 3 +++
 4 files changed, 5 insertions(+), 5 deletions(-)
 delete mode 100644 xen/arch/ppc/include/asm/grant_table.h

diff --git a/xen/arch/arm/domain_build.c b/xen/arch/arm/domain_build.c
index df66fb88d8..28df515a3d 100644
--- a/xen/arch/arm/domain_build.c
+++ b/xen/arch/arm/domain_build.c
@@ -36,6 +36,7 @@
 
 #include 
 #include 
+#include 
 #include 
 
 #define STATIC_EVTCHN_NODE_SIZE_CELLS 2
diff --git a/xen/arch/ppc/include/asm/grant_table.h b/xen/arch/ppc/include/asm/grant_table.h
deleted file mode 100644
index d0ff58dd3d..00
--- a/xen/arch/ppc/include/asm/grant_table.h
+++ /dev/null
@@ -1,5 +0,0 @@
-/* SPDX-License-Identifier: GPL-2.0-only */
-#ifndef __ASM_PPC_GRANT_TABLE_H__
-#define __ASM_PPC_GRANT_TABLE_H__
-
-#endif /* __ASM_PPC_GRANT_TABLE_H__ */
diff --git a/xen/common/Kconfig b/xen/common/Kconfig
index 310ad4229c..13e26ca06f 100644
--- a/xen/common/Kconfig
+++ b/xen/common/Kconfig
@@ -15,6 +15,7 @@ config CORE_PARKING
 config GRANT_TABLE
bool "Grant table support" if EXPERT
default y
+   depends on ARM || X86
---help---
  Grant table provides a generic mechanism to memory sharing
  between domains. This shared memory interface underpins the
diff --git a/xen/include/xen/grant_table.h b/xen/include/xen/grant_table.h
index 85fe6b7b5e..50edfecfb6 100644
--- a/xen/include/xen/grant_table.h
+++ b/xen/include/xen/grant_table.h
@@ -26,7 +26,10 @@
 #include 
 #include 
 #include 
+
+#ifdef CONFIG_GRANT_TABLE
 #include 
+#endif
 
 struct grant_table;
 
-- 
2.43.0
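Once the arch grant-table header is only included under CONFIG_GRANT_TABLE, anything it defines, such as Arm's gnttab_dom0_frames(), disappears from builds without grant tables. The patch handles this by including the arch header directly in domain_build.c; the sketch below shows the other option a caller has, an #ifdef'd fallback. The frame count and `dom0_grant_frames_or_zero()` are hypothetical.

```c
/*
 * Stand-in for the arch grant-table header, which after this patch is
 * only pulled in when CONFIG_GRANT_TABLE is enabled. Value is made up.
 */
#ifdef CONFIG_GRANT_TABLE
#define gnttab_dom0_frames() 64U
#endif

/*
 * Hypothetical caller: code using gnttab_dom0_frames() must either be
 * built only with CONFIG_GRANT_TABLE or provide a fallback like this.
 */
static inline unsigned int dom0_grant_frames_or_zero(void)
{
#ifdef CONFIG_GRANT_TABLE
    return gnttab_dom0_frames();
#else
    return 0;
#endif
}
```

Building with `-DCONFIG_GRANT_TABLE` makes the helper return the arch value; without it the macro is never defined and the fallback of 0 is used.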

[PATCH v5 1/7] xen/asm-generic: introduce generic div64.h header

All archs have the same do_div implementation for BITS_PER_LONG == 64,
so div64.h is moved to asm-generic.

x86 and PPC were switched to the asm-generic version of div64.h.
Arm was only partly switched because it has a different implementation
for 32-bit.
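do_div() divides its first argument in place and evaluates to the remainder, so it behaves differently from an ordinary function call. A sketch of the BITS_PER_LONG == 64 form, modelled on the Arm and PPC macros this patch removes; it relies on the GNU statement-expression extension (GCC/Clang), and like the originals assumes `n` is a 64-bit lvalue and the divisor fits in 32 bits.

```c
#include <stdint.h>

/*
 * Divide n (a uint64_t lvalue) by a 32-bit divisor in place and
 * evaluate to the 32-bit remainder. GNU statement expression.
 */
#define do_div(n, divisor) ({                   \
    uint32_t divisor_ = (divisor);              \
    uint32_t rem_ = (uint64_t)(n) % divisor_;   \
    (n) = (uint64_t)(n) / divisor_;             \
    rem_;                                       \
})
```

Because `n` is assigned to, it must be a modifiable lvalue; to divide the result of a function call, copy it into a local first.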

Signed-off-by: Oleksii Kurochko 
Acked-by: Jan Beulich 
Acked-by: Shawn Anastasio 
---
Changes in V5:
  - add Acked-by: Shawn Anastasio 
  - Update the commit message
  - Partly switch Arm's div64.h to the asm-generic version. Arm has a different
    implementation for 32-bit, so only the 64-bit version was switched.
---
Changes in V4:
 - Added Acked-by: Jan Beulich .
 - include  in Arm's div64.h for 64-bit case.
---
Changes in V3:
 - Drop x86 and PPC's div64.h.
 - Update the commit message.
---
Changes in V2:
- rename base to divisor
- add "#if BITS_PER_LONG == 64"
- fix code style
---
 xen/arch/arm/include/asm/div64.h  |  8 +---
 xen/arch/ppc/include/asm/Makefile |  1 +
 xen/arch/ppc/include/asm/div64.h  | 14 --
 xen/arch/x86/include/asm/Makefile |  1 +
 xen/arch/x86/include/asm/div64.h  | 14 --
 xen/include/asm-generic/div64.h   | 27 +++
 6 files changed, 30 insertions(+), 35 deletions(-)
 delete mode 100644 xen/arch/ppc/include/asm/div64.h
 delete mode 100644 xen/arch/x86/include/asm/div64.h
 create mode 100644 xen/include/asm-generic/div64.h

diff --git a/xen/arch/arm/include/asm/div64.h b/xen/arch/arm/include/asm/div64.h
index fc667a80f9..0459d5cc01 100644
--- a/xen/arch/arm/include/asm/div64.h
+++ b/xen/arch/arm/include/asm/div64.h
@@ -24,13 +24,7 @@
 
 #if BITS_PER_LONG == 64
 
-# define do_div(n,base) ({  \
-uint32_t __base = (base);   \
-uint32_t __rem; \
-__rem = ((uint64_t)(n)) % __base;   \
-(n) = ((uint64_t)(n)) / __base; \
-__rem;  \
- })
+#include 
 
 #elif BITS_PER_LONG == 32
 
diff --git a/xen/arch/ppc/include/asm/Makefile b/xen/arch/ppc/include/asm/Makefile
index 2da995bb2f..a8e848d4d0 100644
--- a/xen/arch/ppc/include/asm/Makefile
+++ b/xen/arch/ppc/include/asm/Makefile
@@ -1,5 +1,6 @@
 # SPDX-License-Identifier: GPL-2.0-only
 generic-y += altp2m.h
+generic-y += div64.h
 generic-y += hardirq.h
 generic-y += hypercall.h
 generic-y += iocap.h
diff --git a/xen/arch/ppc/include/asm/div64.h b/xen/arch/ppc/include/asm/div64.h
deleted file mode 100644
index d213e50585..00
--- a/xen/arch/ppc/include/asm/div64.h
+++ /dev/null
@@ -1,14 +0,0 @@
-/* SPDX-License-Identifier: GPL-2.0-only */
-#ifndef __ASM_PPC_DIV64_H__
-#define __ASM_PPC_DIV64_H__
-
-#include 
-
-#define do_div(n, base) ({   \
-uint32_t base_ = (base); \
-uint32_t rem_ = (uint64_t)(n) % base_;   \
-(n) = (uint64_t)(n) / base_; \
-rem_;\
-})
-
-#endif /* __ASM_PPC_DIV64_H__ */
diff --git a/xen/arch/x86/include/asm/Makefile b/xen/arch/x86/include/asm/Makefile
index 874429ed30..daab34ff0a 100644
--- a/xen/arch/x86/include/asm/Makefile
+++ b/xen/arch/x86/include/asm/Makefile
@@ -1,2 +1,3 @@
 # SPDX-License-Identifier: GPL-2.0-only
+generic-y += div64.h
 generic-y += percpu.h
diff --git a/xen/arch/x86/include/asm/div64.h b/xen/arch/x86/include/asm/div64.h
deleted file mode 100644
index dd49f64a3b..00
--- a/xen/arch/x86/include/asm/div64.h
+++ /dev/null
@@ -1,14 +0,0 @@
-#ifndef __X86_DIV64
-#define __X86_DIV64
-
-#include 
-
-#define do_div(n,base) ({   \
-uint32_t __base = (base);   \
-uint32_t __rem; \
-__rem = ((uint64_t)(n)) % __base;   \
-(n) = ((uint64_t)(n)) / __base; \
-__rem;  \
-})
-
-#endif
diff --git a/xen/include/asm-generic/div64.h b/xen/include/asm-generic/div64.h
new file mode 100644
index 00..068d8a11ad
--- /dev/null
+++ b/xen/include/asm-generic/div64.h
@@ -0,0 +1,27 @@
+/* SPDX-License-Identifier: GPL-2.0-only */
+#ifndef __ASM_GENERIC_DIV64
+#define __ASM_GENERIC_DIV64
+
+#include 
+
+#if BITS_PER_LONG == 64
+
+#define do_div(n, divisor) ({   \
+uint32_t divisor_ = (divisor);  \
+uint32_t rem_ = (uint64_t)(n) % divisor_;   \
+(n) = (uint64_t)(n) / divisor_; \
+rem_;   \
+})
+
+#endif /* BITS_PER_LONG */
+
+#endif
+
+/*
+ * Local variables:
+ * mode: C
+ * c-file-style: "BSD"
+ * c-basic-offset: 4
+ * indent-tabs-mode: nil
+ * End:
+ */
-- 
2.43.0

Moving domain from credit2 to credit cpupool crash xen

2023-12-01 Thread René Winther Højgaard

When I move a domain from pool0 with credit2 to any pool with credit(1), I get the following crash.


Software: Xen-4.17.3 / Qubes OS 4.2.0-RC5
Firmware: Dasharo 0.9.0 - Z790P
Hardware: 13900K
(XEN) Xen BUG at common/sched/credit.c:1051
(XEN) [ Xen-4.17.3-pre  x86_64  debug=y  Not tainted ]
(XEN) CPU:    2
(XEN) RIP:    e008:[] credit.c#csched_free_udata+0x12/0x14
(XEN) RFLAGS: 00010202   CONTEXT: hypervisor (d0v2)
(XEN) rax: 82d040237ceb   rbx: 0014   rcx: 0013
(XEN) rdx: 831087d7   rsi: 830ad80e8da0   rdi: 830ad80e8da0
(XEN) rbp:    rsp: 831087d7fc90   r8:  830e2d6a49b0
(XEN) r9:  831087d7fbe0   r10: 83107c481068   r11: 002cfd1c274a
(XEN) r12: 830ad80e8c80   r13: 83107c45bee0   r14: 
(XEN) r15: 82d0405a9288   cr0: 80050033   cr4: 00b526e0
(XEN) cr3: 0009284d8000   cr2: 7fb535181240
(XEN) fsb: 7fb534c5f380   gsb: 8881b9d0   gss: 
(XEN) ds:    es:    fs:    gs:    ss: e010   cs: e008
(XEN) Xen code around  (credit.c#csched_free_udata+0x12/0x14):
(XEN)  75 06 e8 19 74 ff ff c3 <0f> 0b f3 0f 1e fa 53 48 8b 5f 18 48 85 db 74 2b
(XEN) Xen stack trace from rsp=831087d7fc90:
(XEN)    82d040247503 00132030 830ad80e8bf0 82d0405a9288
(XEN)    83107f59aa80 830ad80e8c80 83107c45bee0 830ad80e8bf0
(XEN)    831000af1010 83107c45bee0 830ad80ed000 83107c45bee0
(XEN)     82d04045d5d8 82d0405ae680 82d040235303
(XEN)    831087d7fe20 fffe 82d040236ec3 830ad80ed000
(XEN)     7fb535230010 831087d7 
(XEN)    82d04045d5d8 82d040234763 c102 
(XEN)     c102 000d 8101ede6
(XEN)    e033 00011082 c90046ebba90 e02b
(XEN)    5a33a1a65352beef feadf9effdf1beef 122ae2fa736bbeef 46023e9af174beef
(XEN)    82d040227cc6 831087d7fe48  00011082
(XEN)     831087d7  8101ede4
(XEN)    82d0403495d0 00150012 00010006 000d
(XEN)    7ffdf93fb3fc 00431042 0043d990 0043d9b0
(XEN)    7fb534eb8434 7ffdf93fb400 0013 02361838
(XEN)    04457fe81f7cf300 02360870 ff80 
(XEN)    7ffdf93fc652 0043d980 831087d7fef8 0023
(XEN)    83107f544000   
(XEN)    82d0402dd07f 83107f544000  
(XEN)    82d0402012b7  88811abbc100 7ffdf93fb2c0
(XEN) Xen call trace:
(XEN)    [] R credit.c#csched_free_udata+0x12/0x14
(XEN)    [] S sched_move_domain+0x5b0/0x5cc
(XEN)    [] S cpupool.c#cpupool_move_domain_locked+0x1d/0x3b
(XEN)    [] S cpupool_do_sysctl+0x725/0x760
(XEN)    [] S do_sysctl+0x827/0x1269
(XEN)    [] S timer.c#timer_lock+0x69/0x143
(XEN)    [] S x86_emulate_wrapper+0x24/0x56
(XEN)    [] S pv_hypercall+0x3a2/0x4a9
(XEN)    [] S lstar_enter+0x137/0x140
(XEN)
(XEN) debugtrace_dump() global buffer starting
(XEN) wrap: 0
(XEN) debugtrace_dump() global buffer finished
(XEN)
(XEN) 
(XEN) Panic on CPU 2:
(XEN) Xen BUG at common/sched/credit.c:1051
(XEN) 
(XEN)
(XEN) Reboot in five seconds...

/rene

publickey - renewin@proton.me - 0x43C32E54.asc
Description: application/pgp-keys


signature.asc
Description: OpenPGP digital signature

Re: Trying add smt=off disabled cores to cpupool crash xen

2023-12-01 Thread Andrew Cooper

On 01/12/2023 7:59 pm, René Winther Højgaard wrote:
> If I set smt=off and try to configure cpupools with credit(1) as if
> all cores are available, I get the following crash.  
>
> The crash happens when I try to use xl cpupool-add-cpu on the disabled
> HT sibling cores.
>
> Hyper-threading is enabled in the firmware, and only disabled with
> smt=off.

CC'ing some maintainers.

I expect this will also explode when a CPU is runtime offlined with
`xen-hptool cpu-offline` and then added to a cpupool.

Interestingly, the faulting instruction is mov (%rdx,%rax,1),%r13, and I
think that's the percpu poison value in %rdx.

I expect cpupools want to reject parked/offline CPUs.

~Andrew

>
> Software: Xen-4.17.3 / Qubes OS 4.2.0-RC5
> Firmware: Dasharo 0.9.0 - Z790P
> Hardware: 13900K
>
> (XEN) [ Xen-4.17.3-pre  x86_64  debug=y  Not tainted ]
> (XEN) CPU:    6
> (XEN) RIP:    e008:[] schedule_cpu_add+0x50/0x456
> (XEN) RFLAGS: 00010202   CONTEXT: hypervisor (d0v3)
> (XEN) rax: 82d0405a9288   rbx: 83107f5a1980   rcx: 0020
> (XEN) rdx: 80007d2fbfa59000   rsi: 83107f5a1980   rdi: 0020
> (XEN) rbp: 0009   rsp: 831087d3fc68   r8:  
> (XEN) r9:  82d0405b6b60   r10: 831087d22ab0   r11: 0003
> (XEN) r12: 831087d22ab0   r13: 0020   r14: 831087d22ab0
> (XEN) r15: 82d0405ae680   cr0: 80050033   cr4: 00b526e0
> (XEN) cr3: 000912e3   cr2: 72e5cb008375
> (XEN) fsb: 72e5caac7380   gsb: 8881b9d8   gss: 
> (XEN) ds:    es:    fs:    gs:    ss: e010   cs: e008
> (XEN) Xen code around  (schedule_cpu_add+0x50/0x456):
> (XEN)  db 8e 37 00 48 8b 14 ca <4c> 8b 2c 02 3b 3d 75 f0 1f 00 0f 83 c9 01 00 00
> (XEN) Xen stack trace from rsp=831087d3fc68:
> (XEN)    83107f5a16e0 82d040204c3b 83100018 831087d3fd28
> (XEN)    831087d3fcc8 3431831087d3fcd0 83107f002033 831087d3fcd0
> (XEN)     831087d40d70 82d040246d48 
> (XEN)    83107f5a1980 0009 831087d22ab0 0020
> (XEN)    831087d22ab0 82d0405ae680 82d040235dec 831087d3fe20
> (XEN)    ffed 0009 83107f5a1980 82d040236b05
> (XEN)      72e5cb098010 831087d3
> (XEN)     82d04045d5d8 82d040234763 c102
> (XEN)      c102 000d
> (XEN)    8101ede6 e033 00011082 c90043c1fb00
> (XEN)    e02b 11e6f31d9b4cbeef 96994088d9fcbeef 7d897394f3ecbeef
> (XEN)    c501dd1632b4beef 82d040227cc6 831087d3fe48 
> (XEN)    00011082  831087d3 
> (XEN)    8101ede4 82d0403495d0 00150012 00020004
> (XEN)     0009 72e5cad9cb60 7be382ddb0c16b00
> (XEN)    00a97768 00a97150  7ffe90589abc
> (XEN)    7ffe9058a780 0043d990 0043d9b0 72e5cad20434
> (XEN)    7ffe90589ac0 72e5cafa3f79 0008 831087d3fef8
> (XEN)    0023 83107f52b000  
> (XEN)     82d0402dd07f 83107f52b000 
> (XEN) Xen call trace:
> (XEN)    [] R schedule_cpu_add+0x50/0x456
> (XEN)    [] S debugtrace_printk+0x119/0x2cc
> (XEN)    [] S free_affinity_masks+0x15/0x17
> (XEN)    [] S cpupool.c#cpupool_assign_cpu_locked+0x53/0x160
> (XEN)    [] S cpupool_do_sysctl+0x367/0x760
> (XEN)    [] S do_sysctl+0x827/0x1269
> (XEN)    [] S timer.c#timer_lock+0x69/0x143
> (XEN)    [] S x86_emulate_wrapper+0x24/0x56
> (XEN)    [] S pv_hypercall+0x3a2/0x4a9
> (XEN)    [] S lstar_enter+0x137/0x140
> (XEN)
> (XEN) debugtrace_dump() global buffer starting
> (XEN) wrap: 0
> (XEN) debugtrace_dump() global buffer finished
> (XEN)
> (XEN) 
> (XEN) Panic on CPU 6:
> (XEN) GENERAL PROTECTION FAULT
> (XEN) [error_code=]
> (XEN) 
> (XEN)
> (XEN) Reboot in five seconds...
>
> /rene

Trying add smt=off disabled cores to cpupool crash xen

2023-12-01 Thread René Winther Højgaard

If I set smt=off and try to configure cpupools with credit(1) as if all cores are available, I get the following crash.

The crash happens when I try to use xl cpupool-add-cpu on the disabled HT sibling cores.

Hyper-threading is enabled in the firmware, and only disabled with smt=off.

Software: Xen-4.17.3 / Qubes OS 4.2.0-RC5
Firmware: Dasharo 0.9.0 - Z790P
Hardware: 13900K

(XEN) [ Xen-4.17.3-pre  x86_64  debug=y  Not tainted ]
(XEN) CPU:    6
(XEN) RIP:    e008:[] schedule_cpu_add+0x50/0x456
(XEN) RFLAGS: 00010202   CONTEXT: hypervisor (d0v3)
(XEN) rax: 82d0405a9288   rbx: 83107f5a1980   rcx: 0020
(XEN) rdx: 80007d2fbfa59000   rsi: 83107f5a1980   rdi: 0020
(XEN) rbp: 0009   rsp: 831087d3fc68   r8:  
(XEN) r9:  82d0405b6b60   r10: 831087d22ab0   r11: 0003
(XEN) r12: 831087d22ab0   r13: 0020   r14: 831087d22ab0
(XEN) r15: 82d0405ae680   cr0: 80050033   cr4: 00b526e0
(XEN) cr3: 000912e3   cr2: 72e5cb008375
(XEN) fsb: 72e5caac7380   gsb: 8881b9d8   gss: 
(XEN) ds:    es:    fs:    gs:    ss: e010   cs: e008
(XEN) Xen code around  (schedule_cpu_add+0x50/0x456):
(XEN)  db 8e 37 00 48 8b 14 ca <4c> 8b 2c 02 3b 3d 75 f0 1f 00 0f 83 c9 01 00 00
(XEN) Xen stack trace from rsp=831087d3fc68:
(XEN)    83107f5a16e0 82d040204c3b 83100018 831087d3fd28
(XEN)    831087d3fcc8 3431831087d3fcd0 83107f002033 831087d3fcd0
(XEN)     831087d40d70 82d040246d48 
(XEN)    83107f5a1980 0009 831087d22ab0 0020
(XEN)    831087d22ab0 82d0405ae680 82d040235dec 831087d3fe20
(XEN)    ffed 0009 83107f5a1980 82d040236b05
(XEN)      72e5cb098010 831087d3
(XEN)     82d04045d5d8 82d040234763 c102
(XEN)      c102 000d
(XEN)    8101ede6 e033 00011082 c90043c1fb00
(XEN)    e02b 11e6f31d9b4cbeef 96994088d9fcbeef 7d897394f3ecbeef
(XEN)    c501dd1632b4beef 82d040227cc6 831087d3fe48 
(XEN)    00011082  831087d3 
(XEN)    8101ede4 82d0403495d0 00150012 00020004
(XEN)     0009 72e5cad9cb60 7be382ddb0c16b00
(XEN)    00a97768 00a97150  7ffe90589abc
(XEN)    7ffe9058a780 0043d990 0043d9b0 72e5cad20434
(XEN)    7ffe90589ac0 72e5cafa3f79 0008 831087d3fef8
(XEN)    0023 83107f52b000  
(XEN)     82d0402dd07f 83107f52b000 
(XEN) Xen call trace:
(XEN)    [] R schedule_cpu_add+0x50/0x456
(XEN)    [] S debugtrace_printk+0x119/0x2cc
(XEN)    [] S free_affinity_masks+0x15/0x17
(XEN)    [] S cpupool.c#cpupool_assign_cpu_locked+0x53/0x160
(XEN)    [] S cpupool_do_sysctl+0x367/0x760
(XEN)    [] S do_sysctl+0x827/0x1269
(XEN)    [] S timer.c#timer_lock+0x69/0x143
(XEN)    [] S x86_emulate_wrapper+0x24/0x56
(XEN)    [] S pv_hypercall+0x3a2/0x4a9
(XEN)    [] S lstar_enter+0x137/0x140
(XEN)
(XEN) debugtrace_dump() global buffer starting
(XEN) wrap: 0
(XEN) debugtrace_dump() global buffer finished
(XEN)
(XEN) 
(XEN) Panic on CPU 6:
(XEN) GENERAL PROTECTION FAULT
(XEN) [error_code=]
(XEN) 
(XEN)
(XEN) Reboot in five seconds...

/rene


[PATCH 1/3] xen/ppc: Enable Boot Allocator

Adapt arm's earlyfdt parsing code to ppc64 and enable Xen's early boot
allocator. Routines for parsing arm-specific devicetree nodes (e.g.
multiboot) were excluded, reducing the overall footprint of code that
was copied.
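The copied bootmodules_overlap_check() and meminfo_overlap_check() helpers treat every module or bank as a half-open range [start, start + size): two ranges are disjoint exactly when one ends at or before the other begins. A standalone sketch of that test follows; struct region, ranges_overlap() and first_overlap() are hypothetical names, and paddr_t is assumed to be a 64-bit unsigned type. Like the original, it does not handle a range whose end wraps to 0 at the top of the physical address space (see the TODO comments in the patch).

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef uint64_t paddr_t;          /* assumption: 64-bit physical addresses */

struct region {                    /* hypothetical stand-in for a bank/module */
    paddr_t start;
    paddr_t size;
};

/* [a, a + a_size) and [b, b + b_size) overlap unless one ends before the other starts. */
static bool ranges_overlap(paddr_t a, paddr_t a_size, paddr_t b, paddr_t b_size)
{
    paddr_t a_end = a + a_size, b_end = b + b_size;

    return !(a_end <= b || a >= b_end);
}

/* Index of the first region overlapping [start, start + size), or -1. */
static int first_overlap(const struct region *regs, unsigned int nr,
                         paddr_t start, paddr_t size)
{
    unsigned int i;

    for ( i = 0; i < nr; i++ )
        if ( ranges_overlap(start, size, regs[i].start, regs[i].size) )
            return (int)i;

    return -1;
}

static void overlap_selftest(void)
{
    const struct region regs[] = { { 0x1000, 0x1000 }, { 0x4000, 0x2000 } };

    assert(first_overlap(regs, 2, 0x2000, 0x1000) == -1); /* only touches */
    assert(first_overlap(regs, 2, 0x5fff, 0x10) == 1);    /* straddles end */
}
```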

Signed-off-by: Shawn Anastasio 
---
 xen/arch/ppc/Makefile|   1 +
 xen/arch/ppc/bootfdt.c   | 507 +++
 xen/arch/ppc/include/asm/setup.h | 113 +++
 xen/arch/ppc/setup.c | 109 ++-
 4 files changed, 729 insertions(+), 1 deletion(-)
 create mode 100644 xen/arch/ppc/bootfdt.c

diff --git a/xen/arch/ppc/Makefile b/xen/arch/ppc/Makefile
index 71feb5e2c4..8a2a809c70 100644
--- a/xen/arch/ppc/Makefile
+++ b/xen/arch/ppc/Makefile
@@ -1,5 +1,6 @@
 obj-$(CONFIG_PPC64) += ppc64/
 
+obj-y += bootfdt.o
 obj-$(CONFIG_EARLY_PRINTK) += early_printk.init.o
 obj-y += mm-radix.o
 obj-y += opal.o
diff --git a/xen/arch/ppc/bootfdt.c b/xen/arch/ppc/bootfdt.c
new file mode 100644
index 00..791e1ca61f
--- /dev/null
+++ b/xen/arch/ppc/bootfdt.c
@@ -0,0 +1,507 @@
+/* SPDX-License-Identifier: GPL-2.0-only */
+/*
+ * Early Device Tree and boot info bookkeeping.
+ * Derived from arch/arm/bootfdt.c and setup.c.
+ *
+ * Copyright (C) 2012-2014 Citrix Systems, Inc.
+ * Copyright (C) Raptor Engineering LLC
+ */
+
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+struct bootinfo __initdata bootinfo;
+
+struct bootmodule __init *add_boot_module(bootmodule_kind kind,
+  paddr_t start, paddr_t size,
+  bool domU)
+{
+struct bootmodules *mods = 
+struct bootmodule *mod;
+unsigned int i;
+
+if ( mods->nr_mods == MAX_MODULES )
+{
+printk("Ignoring %s boot module at %"PRIpaddr"-%"PRIpaddr" (too many)\n",
+   boot_module_kind_as_string(kind), start, start + size);
+return NULL;
+}
+
+if ( check_reserved_regions_overlap(start, size) )
+return NULL;
+
+for ( i = 0 ; i < mods->nr_mods ; i++ )
+{
+mod = >module[i];
+if ( mod->kind == kind && mod->start == start )
+{
+if ( !domU )
+mod->domU = false;
+return mod;
+}
+}
+
+mod = >module[mods->nr_mods++];
+mod->kind = kind;
+mod->start = start;
+mod->size = size;
+mod->domU = domU;
+
+return mod;
+}
+
+const char * __init boot_module_kind_as_string(bootmodule_kind kind)
+{
+switch ( kind )
+{
+case BOOTMOD_XEN: return "Xen";
+case BOOTMOD_FDT: return "Device Tree";
+case BOOTMOD_KERNEL:  return "Kernel";
+default: BUG();
+}
+}
+
+/*
+ * TODO: '*_end' could be 0 if the module/region is at the end of the physical
+ * address space. This is for now not handled as it requires more rework.
+ */
+static bool __init bootmodules_overlap_check(struct bootmodules *bootmodules,
+ paddr_t region_start,
+ paddr_t region_size)
+{
+paddr_t mod_start = INVALID_PADDR, mod_end = 0;
+paddr_t region_end = region_start + region_size;
+unsigned int i, mod_num = bootmodules->nr_mods;
+
+for ( i = 0; i < mod_num; i++ )
+{
+mod_start = bootmodules->module[i].start;
+mod_end = mod_start + bootmodules->module[i].size;
+
+if ( region_end <= mod_start || region_start >= mod_end )
+continue;
+else
+{
+printk("Region: [%#"PRIpaddr", %#"PRIpaddr") overlapping with"
+   " mod[%u]: [%#"PRIpaddr", %#"PRIpaddr")\n", region_start,
+   region_end, i, mod_start, mod_end);
+return true;
+}
+}
+
+return false;
+}
+
+/*
+ * TODO: '*_end' could be 0 if the bank/region is at the end of the physical
+ * address space. This is for now not handled as it requires more rework.
+ */
+static bool __init meminfo_overlap_check(struct meminfo *meminfo,
+ paddr_t region_start,
+ paddr_t region_size)
+{
+paddr_t bank_start = INVALID_PADDR, bank_end = 0;
+paddr_t region_end = region_start + region_size;
+unsigned int i, bank_num = meminfo->nr_banks;
+
+for ( i = 0; i < bank_num; i++ )
+{
+bank_start = meminfo->bank[i].start;
+bank_end = bank_start + meminfo->bank[i].size;
+
+if ( region_end <= bank_start || region_start >= bank_end )
+continue;
+else
+{
+printk("Region: [%#"PRIpaddr", %#"PRIpaddr") overlapping with"
+   " bank[%u]: [%#"PRIpaddr", %#"PRIpaddr")\n", region_start,
+   region_end, i, bank_start, bank_end);
+return true;
+}
+}
+
+return false;
+}
+
+/*
+ * Given an input physical address range, check if this range is overlapping
+ * with the existing

[PATCH 0/3] Early Boot Allocation on Power

Hello all,

This series enables the Xen boot time allocator on Power by parsing
the available memory regions from the firmware-provided device tree.

The device tree processing and bookkeeping code was adapted from Arm,
but it was trimmed down to exclude code for parsing Arm-specific dt
nodes.

Additionally, the final patch of the series updates the radix MMU
code to use the newly-enabled boot allocator for the Partition and
Process Tables, instead of allocating them statically as was
previously done. Among other things, switching to run-time allocation
allows us to allocate a full-sized Process Table instead of the
minimal one that was previously used to keep the Xen binary size small.

Thanks,

Shawn Anastasio (3):
  xen/ppc: Enable Boot Allocator
  xen/ppc: mm-radix: Replace debug printing code with printk
  xen/ppc: mm-radix: Allocate Partition and Process Tables at runtime

 xen/arch/ppc/Makefile|   1 +
 xen/arch/ppc/bootfdt.c   | 507 +++
 xen/arch/ppc/include/asm/setup.h | 113 +++
 xen/arch/ppc/mm-radix.c  | 197 ++--
 xen/arch/ppc/setup.c | 109 ++-
 5 files changed, 823 insertions(+), 104 deletions(-)
 create mode 100644 xen/arch/ppc/bootfdt.c

--
2.30.2

[PATCH 3/3] xen/ppc: mm-radix: Allocate Partition and Process Tables at runtime

In the initial mm-radix implementation, the in-memory partition and
process tables required to configure the MMU were allocated statically
since the boot allocator was not yet available.

Now that it is, allocate these tables at runtime and bump the size of
the Process Table to its maximum supported value (on POWER9).
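The effect of bumping PRTB_SIZE_LOG2 from 12 to 24 is easy to quantify. The sketch below assumes each Process Table entry is 16 bytes (two 64-bit doublewords, as in the POWER ISA radix process table entry format), which gives 256 entries for the old 4 KiB table and 2^20 entries for the new 16 MiB one. The old static arrays were declared __aligned(PATB_SIZE) and __aligned(PRTB_SIZE), so the runtime-allocated tables should keep the same natural alignment. prtb_entries() and is_naturally_aligned() are hypothetical helpers.

```c
#include <assert.h>
#include <stdint.h>

#define PATB_SIZE_LOG2      16   /* from the patch */
#define PRTB_SIZE_LOG2_OLD  12   /* value before the patch */
#define PRTB_SIZE_LOG2_NEW  24   /* value after the patch */
#define PRTB_ENTRY_SIZE     16UL /* assumption: two 64-bit doublewords */

/* Number of process table entries in a table of 2^size_log2 bytes. */
static unsigned long prtb_entries(unsigned int size_log2)
{
    return (1UL << size_log2) / PRTB_ENTRY_SIZE;
}

/* True if base is aligned to the table size, as __aligned(SIZE) guaranteed. */
static int is_naturally_aligned(uint64_t base, unsigned int size_log2)
{
    return (base & ((UINT64_C(1) << size_log2) - 1)) == 0;
}

static void prtb_selftest(void)
{
    assert(prtb_entries(PRTB_SIZE_LOG2_OLD) == 256);          /* 4 KiB  */
    assert(prtb_entries(PRTB_SIZE_LOG2_NEW) == (1UL << 20));  /* 16 MiB */
    assert(is_naturally_aligned(0x1000000, PRTB_SIZE_LOG2_NEW));
    assert(!is_naturally_aligned(0x1010000, PRTB_SIZE_LOG2_NEW));
    assert(is_naturally_aligned(0x1010000, PATB_SIZE_LOG2));
}
```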

Signed-off-by: Shawn Anastasio 
---
 xen/arch/ppc/mm-radix.c | 167 +++-
 1 file changed, 96 insertions(+), 71 deletions(-)

diff --git a/xen/arch/ppc/mm-radix.c b/xen/arch/ppc/mm-radix.c
index de181cf6f1..fae5ebfdcc 100644
--- a/xen/arch/ppc/mm-radix.c
+++ b/xen/arch/ppc/mm-radix.c
@@ -34,17 +34,13 @@ static struct lvl2_pd initial_lvl2_lvl3_pd_pool[INITIAL_LVL2_LVL3_PD_COUNT];
 static size_t __initdata initial_lvl4_pt_pool_used;
 static struct lvl4_pt initial_lvl4_pt_pool[INITIAL_LVL4_PT_COUNT];
 
-/* Only reserve minimum Partition and Process tables  */
 #define PATB_SIZE_LOG2 16 /* Only supported partition table size on POWER9 */
 #define PATB_SIZE  (1UL << PATB_SIZE_LOG2)
-#define PRTB_SIZE_LOG2 12
+#define PRTB_SIZE_LOG2 24 /* Maximum process table size on POWER9 */
 #define PRTB_SIZE  (1UL << PRTB_SIZE_LOG2)
 
-static struct patb_entry
-__aligned(PATB_SIZE) initial_patb[PATB_SIZE / sizeof(struct patb_entry)];
-
-static struct prtb_entry
-__aligned(PRTB_SIZE) initial_prtb[PRTB_SIZE / sizeof(struct prtb_entry)];
+static struct patb_entry *initial_patb;
+static struct prtb_entry *initial_prtb;
 
 static __init struct lvl1_pd *lvl1_pd_pool_alloc(void)
 {
@@ -86,6 +82,62 @@ static __init struct lvl4_pt *lvl4_pt_pool_alloc(void)
 return _lvl4_pt_pool[initial_lvl4_pt_pool_used++];
 }
 
+static void map_page_initial(struct lvl1_pd *lvl1, vaddr_t virt, paddr_t phys,
+ unsigned long flags)
+{
+struct lvl2_pd *lvl2;
+struct lvl3_pd *lvl3;
+struct lvl4_pt *lvl4;
+pde_t *pde;
+pte_t *pte;
+
+/* Allocate LVL 2 PD if necessary */
+pde = pt_entry(lvl1, virt);
+if ( !pde_is_valid(*pde) )
+{
+lvl2 = lvl2_pd_pool_alloc();
+*pde = paddr_to_pde(__pa(lvl2), PDE_VALID,
+XEN_PT_ENTRIES_LOG2_LVL_2);
+}
+else
+lvl2 = __va(pde_to_paddr(*pde));
+
+/* Allocate LVL 3 PD if necessary */
+pde = pt_entry(lvl2, virt);
+if ( !pde_is_valid(*pde) )
+{
+lvl3 = lvl3_pd_pool_alloc();
+*pde = paddr_to_pde(__pa(lvl3), PDE_VALID,
+XEN_PT_ENTRIES_LOG2_LVL_3);
+}
+else
+lvl3 = __va(pde_to_paddr(*pde));
+
+/* Allocate LVL 4 PT if necessary */
+pde = pt_entry(lvl3, virt);
+if ( !pde_is_valid(*pde) )
+{
+lvl4 = lvl4_pt_pool_alloc();
+*pde = paddr_to_pde(__pa(lvl4), PDE_VALID,
+XEN_PT_ENTRIES_LOG2_LVL_4);
+}
+else
+lvl4 = __va(pde_to_paddr(*pde));
+
+/* Finally, create PTE in LVL 4 PT */
+pte = pt_entry(lvl4, virt);
+if ( !pte_is_valid(*pte) )
+{
+radix_dprintk("%016lx being mapped to %016lx\n", phys, virt);
+*pte = paddr_to_pte(phys, flags);
+}
+else
+{
+early_printk("BUG: Tried to create PTE for already-mapped page!");
+die();
+}
+}
+
 static void __init setup_initial_mapping(struct lvl1_pd *lvl1,
  vaddr_t map_start,
  vaddr_t map_end,
@@ -105,80 +157,43 @@ static void __init setup_initial_mapping(struct lvl1_pd *lvl1,
 die();
 }
 
+/* Identity map Xen itself */
 for ( page_addr = map_start; page_addr < map_end; page_addr += PAGE_SIZE )
 {
-struct lvl2_pd *lvl2;
-struct lvl3_pd *lvl3;
-struct lvl4_pt *lvl4;
-pde_t *pde;
-pte_t *pte;
-
-/* Allocate LVL 2 PD if necessary */
-pde = pt_entry(lvl1, page_addr);
-if ( !pde_is_valid(*pde) )
-{
-lvl2 = lvl2_pd_pool_alloc();
-*pde = paddr_to_pde(__pa(lvl2), PDE_VALID,
-XEN_PT_ENTRIES_LOG2_LVL_2);
-}
-else
-lvl2 = __va(pde_to_paddr(*pde));
+unsigned long flags;
 
-/* Allocate LVL 3 PD if necessary */
-pde = pt_entry(lvl2, page_addr);
-if ( !pde_is_valid(*pde) )
+if ( is_kernel_text(page_addr) || is_kernel_inittext(page_addr) )
 {
-lvl3 = lvl3_pd_pool_alloc();
-*pde = paddr_to_pde(__pa(lvl3), PDE_VALID,
-XEN_PT_ENTRIES_LOG2_LVL_3);
+radix_dprintk("%016lx being marked as TEXT (RX)\n", page_addr);
+flags = PTE_XEN_RX;
 }
-else
-lvl3 = __va(pde_to_paddr(*pde));
-
-/* Allocate LVL 4 PT if necessary */
-pde = pt_entry(lvl3, page_addr);
-if ( !pde_is_valid(*pde) )
-{
-lvl4 = lvl4_pt_pool_alloc();
-*pde = paddr_to_pde(__pa(lvl4), PDE_VALID,
-

[PATCH 2/3] xen/ppc: mm-radix: Replace debug printing code with printk

Now that we have common code building, there's no need to keep the old
itoa64_hex() and radix_dprint() debug printing helpers in mm-radix.c.
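The replacement radix_dprintk() relies on two preprocessor details: string-literal concatenation, which pastes the XENLOG_DEBUG prefix onto the format string, and the GNU C "## __VA_ARGS__" form, which swallows the trailing comma when a caller passes only a format string. The standalone sketch below mirrors the macro; fake_printk() and LOG_PREFIX are hypothetical stand-ins for Xen's printk() and XENLOG_DEBUG, and the formatted message is kept in a buffer so it can be checked.

```c
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>
#include <string.h>

#define LOG_PREFIX "(dbg) "          /* hypothetical stand-in for XENLOG_DEBUG */

static char last_msg[128];           /* last formatted message, for checking */

/* Hypothetical stand-in for printk(): format, remember, and print. */
static void fake_printk(const char *fmt, ...)
{
    va_list ap;

    va_start(ap, fmt);
    vsnprintf(last_msg, sizeof(last_msg), fmt, ap);
    va_end(ap);
    fputs(last_msg, stdout);
}

#ifdef NDEBUG
/* Release build: the call and its arguments disappear entirely. */
#define radix_dprintk(...)
#else
/*
 * LOG_PREFIX is joined to the format by string-literal concatenation;
 * "## __VA_ARGS__" (GNU extension) drops the comma when only msg is given.
 */
#define radix_dprintk(msg, ...) fake_printk(LOG_PREFIX msg, ## __VA_ARGS__)
#endif

static void radix_dprintk_selftest(void)
{
#ifndef NDEBUG
    radix_dprintk("%016lx being mapped to %016lx\n", 0x1000UL, 0x2000UL);
    assert(strcmp(last_msg, "(dbg) 0000000000001000 being mapped to "
                            "0000000000002000\n") == 0);

    radix_dprintk("no arguments\n");
    assert(strcmp(last_msg, "(dbg) no arguments\n") == 0);
#endif
}
```

Because the NDEBUG variant expands to nothing, any side effects in the arguments are also skipped in release builds.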

Signed-off-by: Shawn Anastasio 
---
 xen/arch/ppc/mm-radix.c | 58 +
 1 file changed, 12 insertions(+), 46 deletions(-)

diff --git a/xen/arch/ppc/mm-radix.c b/xen/arch/ppc/mm-radix.c
index daa411a6fa..de181cf6f1 100644
--- a/xen/arch/ppc/mm-radix.c
+++ b/xen/arch/ppc/mm-radix.c
@@ -15,6 +15,12 @@
 
 void enable_mmu(void);
 
+#ifdef NDEBUG
+#define radix_dprintk(...)
+#else
+#define radix_dprintk(msg, ...) printk(XENLOG_DEBUG msg, ## __VA_ARGS__)
+#endif
+
 #define INITIAL_LVL1_PD_COUNT  1
 #define INITIAL_LVL2_LVL3_PD_COUNT 2
 #define INITIAL_LVL4_PT_COUNT  256
@@ -80,45 +86,6 @@ static __init struct lvl4_pt *lvl4_pt_pool_alloc(void)
 return _lvl4_pt_pool[initial_lvl4_pt_pool_used++];
 }
 
-#ifndef NDEBUG
-/* TODO: Remove once we get common/ building */
-static char *__init itoa64_hex(uint64_t val, char *out_buf, size_t buf_len)
-{
-uint64_t cur;
-size_t i = buf_len - 1;
-
-/* Null terminate buffer */
-out_buf[i] = '\0';
-
-/* Add digits in reverse */
-cur = val;
-while ( cur && i > 0 )
-{
-out_buf[--i] = "0123456789ABCDEF"[cur % 16];
-cur /= 16;
-}
-
-/* Pad to 16 digits */
-while ( i > 0 )
-out_buf[--i] = '0';
-
-return out_buf + i;
-}
-#endif
-
-static void __init radix_dprint(uint64_t addr, const char *msg)
-{
-#ifndef NDEBUG
-char buf[sizeof("DEADBEEFCAFEBABA")];
-char *addr_s = itoa64_hex(addr, buf, sizeof(buf));
-
-early_printk("(0x");
-early_printk(addr_s);
-early_printk(") ");
-early_printk(msg);
-#endif
-}
-
 static void __init setup_initial_mapping(struct lvl1_pd *lvl1,
  vaddr_t map_start,
  vaddr_t map_end,
@@ -186,27 +153,26 @@ static void __init setup_initial_mapping(struct lvl1_pd *lvl1,
 unsigned long paddr = (page_addr - map_start) + phys_base;
 unsigned long flags;
 
-radix_dprint(paddr, "being mapped to ");
-radix_dprint(page_addr, "!\n");
+radix_dprintk("%016lx being mapped to %016lx\n", paddr, page_addr);
 if ( is_kernel_text(page_addr) || is_kernel_inittext(page_addr) )
 {
-radix_dprint(page_addr, "being marked as TEXT (RX)\n");
+radix_dprintk("%016lx being marked as TEXT (RX)\n", page_addr);
 flags = PTE_XEN_RX;
 }
 else if ( is_kernel_rodata(page_addr) )
 {
-radix_dprint(page_addr, "being marked as RODATA (RO)\n");
+radix_dprintk("%016lx being marked as RODATA (RO)\n", page_addr);
 flags = PTE_XEN_RO;
 }
 else
 {
-radix_dprint(page_addr, "being marked as DEFAULT (RW)\n");
+radix_dprintk("%016lx being marked as DEFAULT (RW)\n", page_addr);
 flags = PTE_XEN_RW;
 }
 
 *pte = paddr_to_pte(paddr, flags);
-radix_dprint(paddr_to_pte(paddr, flags).pte,
- "is result of PTE map!\n");
+radix_dprintk("%016lx is the result of PTE map\n",
+paddr_to_pte(paddr, flags).pte);
 }
 else
 {
-- 
2.30.2

Re: [PATCH v2 05/29] tools/xenlogd: add 9pfs response generation support

2023-12-01 Thread Jason Andryuk

On Fri, Nov 10, 2023 at 1:41 PM Juergen Gross  wrote:
> +static void fill_buffer(struct ring *ring, uint8_t cmd, uint16_t tag,
> +const char *fmt, ...)
> +{
> +struct p9_header *hdr = ring->buffer;
> +void *data = hdr + 1;
> +const char *f;
> +const void *par;
> +const char *str_val;
> +const struct p9_qid *qid;
> +unsigned int len;
> +va_list ap;
> +unsigned int array_sz = 0;
> +unsigned int elem_sz = 0;
> +
> +hdr->cmd = cmd;
> +hdr->tag = tag;
> +
> +va_start(ap, fmt);
> +
> +for ( f = fmt; *f; f++ )
> +{
> +if ( !array_sz )
> +par = va_arg(ap, const void *);
> +else
> +{
> +par += elem_sz;
> +array_sz--;
> +}
> +
> +switch ( *f )
> +{
> +case 'a':
> +f++;
> +if ( !*f || array_sz )
> +fmt_err(fmt);
> +array_sz = *(const unsigned int *)par;
> +if ( array_sz > 0x )
> +{
> +syslog(LOG_CRIT, "array size %u in fill_buffer()", array_sz);
> +exit(1);
> +}
> +*(__packed uint16_t *)data = array_sz;

Compiling on Fedora 39, gcc 13.2.1:

io.c: In function ‘fill_buffer’:
io.c:233:13: error: ‘packed’ attribute ignored for type ‘uint16_t *’ {aka ‘short unsigned int *’} [-Werror=attributes]
  233 | *(__packed uint16_t *)data = array_sz;
  | ^

The same error is reported for all of these uses of __packed.

Regards,
Jason
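GCC ignores the packed attribute when it is applied to a pointer type, so the __packed in the cast has no effect (hence the -Werror=attributes failure) and does not make the store alignment-safe. One common way around this, sketched here and not necessarily the fix adopted in a later version of the series, is to write multi-byte fields byte by byte. Since 9P encodes integers little-endian on the wire, this also makes the output independent of host byte order. put_le16() and get_le16() are hypothetical names.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helpers: store/load a 16-bit value at any byte address,
 * least significant byte first (the byte order 9P uses on the wire).
 * Byte accesses have no alignment requirement, so no packed cast is needed.
 */
static void put_le16(void *dst, uint16_t val)
{
    uint8_t *p = dst;

    p[0] = val & 0xff;
    p[1] = val >> 8;
}

static uint16_t get_le16(const void *src)
{
    const uint8_t *p = src;

    return (uint16_t)(p[0] | (p[1] << 8));
}

static void le16_selftest(void)
{
    uint8_t buf[4] = { 0 };

    put_le16(buf + 1, 0x1234);          /* deliberately misaligned */
    assert(buf[1] == 0x34 && buf[2] == 0x12);
    assert(get_le16(buf + 1) == 0x1234);
}
```

With this, the failing line would become put_le16(data, array_sz). Another option is memcpy(data, &val, sizeof(val)), which is also alignment-safe but stores in host byte order.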

Re: [PATCH] Config.mk: drop -Wdeclaration-after-statement

Hi Jan,

On 30/11/2023 08:36, Jan Beulich wrote:

On 29.11.2023 14:10, Anthony PERARD wrote:

On Wed, Nov 29, 2023 at 11:47:24AM +0100, Julien Grall wrote:

+ Anthony for the tools
+ Juergen for Xenstored

On 29/11/2023 11:34, Alexander Kanavin wrote:

On 11/29/23 08:51, Jan Beulich wrote:

On 28.11.2023 18:47, Alexander Kanavin wrote:

Such constructs are fully allowed by C99:
https://gcc.gnu.org/onlinedocs/gcc-12.2.0/gcc/Mixed-Labels-and-Declarations.html#Mixed-Labels-and-Declarations

There's more to this: It may also be a policy of ours (or of any
sub-component) to demand that declarations and statements are properly
separated. This would therefore need discussing first.

The error is coming from python 3.12 headers and not from anything in
xen tree, no? As you don't have control over those headers, I'm not sure
what other solution there could be.

We seem to add -Wno-declaration-after-statement for some components in
tools/. So one possibility would be to move the flags to a hypervisor
specific makefile (in xen/).

You mean xen/Makefile I hope.

Anthony/Juergen, do you have any concern if the tools are built without
-Wdeclaration-after-statement?

I don't, and as you said, there's already quite a few
-Wno-declaration-after-statement.

It can be nice to add a new variable in the middle of a function, it's
like creating a new scope without adding extra indentation (if we wanted
a new scope, we would need {} and thus the indent).

To be clear, I wouldn't mind this in the hypervisor either. But then I also
don't see why we should further request people to separate declarations
from statements in an easily noticeable way. Thing is that imo something
like this wants spelling out in ./CODING_STYLE.

So I agree that if we were to remove -Wdeclaration-after-statement then
we should also update the CODING_STYLE. However, I am not entirely sure
I would want to mix code and declaration in the hypervisor.

Anyway, I think this is a separate discussion from resolving the
immediate problem (i.e. building the python bindings).

So for now, I think it would make sense to push the
-Wdeclaration-after-statement to the tools.

@Alexander, are you going to send a new version? If not, I would be
happy to do it.

Cheers,

--
Julien Grall
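For reference, a minimal illustration of what -Wdeclaration-after-statement reports, and of the extra-scope alternative Anthony mentions. The function names are made up for the example; with gcc, only sum_squares_mixed() should trigger the warning, since it is the only function where a declaration follows a statement in the same block.

```c
#include <assert.h>

/* Declarations first, then statements: accepted with the warning enabled. */
static int sum_squares_separated(const int *v, int n)
{
    int i, total = 0;

    assert(n >= 0);
    for ( i = 0; i < n; i++ )
        total += v[i] * v[i];
    return total;
}

/* Valid C99, but a declaration follows a statement: this one warns. */
static int sum_squares_mixed(const int *v, int n)
{
    assert(n >= 0);
    int i, total = 0;

    for ( i = 0; i < n; i++ )
        total += v[i] * v[i];
    return total;
}

/* A nested block restores declarations-first, at the cost of an indent. */
static int sum_squares_scoped(const int *v, int n)
{
    assert(n >= 0);
    {
        int i, total = 0;

        for ( i = 0; i < n; i++ )
            total += v[i] * v[i];
        return total;
    }
}
```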

Re: [PATCH v2] xen/arm: Move static event channel feature to a separate module


Hi Luca,

On 30/11/2023 13:17, Luca Fancellu wrote:

On 30 Nov 2023, at 09:57, Michal Orzel  wrote:

Move static event channel feature related code to a separate module
(static-evtchn.{c,h}) in the spirit of fine granular configuration, so
that the feature can be disabled if not needed.

Introduce Kconfig option CONFIG_STATIC_EVTCHN, enabled by default (to
keep the current behavior) and dependent on CONFIG_DOM0LESS. While it could
be possible to create a loopback connection for dom0 only, this use case
does not really need this feature and all the docs and commit messages
refer explicitly to the use in dom0less systems.

The only function visible externally is alloc_static_evtchn(), so move
the prototype to static-evtchn.h and provide a stub in case the feature
is disabled. Guard static_evtchn_created in struct dt_device_node as
well as its helpers.

Signed-off-by: Michal Orzel 
---


Hi Michal,

FWIW because it is already Ack-ed.

Reviewed-by: Luca Fancellu 


Usually when I provide an Acked-by, it means I went through the code and
am generally happy with the patch, but I didn't review it thoroughly. Even
if I provide a Reviewed-by, it is always useful to have an extra pair of
eyes :). So thanks for looking at it!


It is now committed.

Cheers,







--
Julien Grall
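The "prototype plus stub" arrangement described in the static event channel commit message is a standard way to keep call sites free of #ifdefs. A minimal sketch with everything in one file for brevity (in the patch, the declaration and the stub live in static-evtchn.h); the void(void) signature and the evtchn_alloc_calls counter are assumptions for illustration only.

```c
#include <assert.h>

/*
 * Leave CONFIG_STATIC_EVTCHN undefined to model the feature being
 * disabled. evtchn_alloc_calls is a hypothetical counter.
 */
static unsigned int evtchn_alloc_calls;

#ifdef CONFIG_STATIC_EVTCHN
static void alloc_static_evtchn(void)
{
    /* The real code would walk the device tree and bind the channels. */
    evtchn_alloc_calls++;
}
#else
/* Feature compiled out: callers build unchanged and the call folds away. */
static inline void alloc_static_evtchn(void) { }
#endif

static void evtchn_selftest(void)
{
    alloc_static_evtchn();               /* no #ifdef needed at the call site */
#ifdef CONFIG_STATIC_EVTCHN
    assert(evtchn_alloc_calls == 1);
#else
    assert(evtchn_alloc_calls == 0);
#endif
}
```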

[RFC PATCH] xen/arm: Add emulation of Debug Data Transfer Registers

2023-12-01 Thread Ayan Kumar Halder

Currently, if the user enables the HVC_DCC config option in Linux, it
accesses the debug data transfer registers (i.e. DBGDTRTX_EL0 on arm64,
DBGDTRTXINT on arm32). As these registers are not emulated, Xen injects
an undefined exception into the guest, and Linux crashes.

We wish to avoid this crash by adding a "partial" emulation. MDCCSR_EL0
is emulated as TXfull | RXfull.
Refer to ARM DDI 0487I.a ID081822, D17.3.8, DBGDTRTX_EL0:
"If TXfull is set to 1, set DTRRX and DTRTX to UNKNOWN"
Also D17.3.7, DBGDTRRX_EL0:
"If RXfull is set to 1, return the last value written to DTRRX."

Thus, any OS is expected to read MDCCSR_EL0 and check for TXfull
before using DBGDTRTX_EL0. Linux does this via hvc_dcc_init() ->
hvc_dcc_check(); when the check fails, hvc_dcc_init() returns -ENODEV.
In this way, we prevent the guest from being aborted.

We also emulate DBGDTRTX_EL0 as RAZ/WI.

We have added emulation for the AArch32 variant of these registers as well.
Also, we have added handle_read_val_wi() to emulate the DBGDSCREXT register
so that it returns a specific value (i.e. TXfull | RXfull) and ignores any
writes to it.
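The probe-failure mechanism can be modelled in a few lines. This sketch uses the bit positions from the patch (bit 29 TXfull, bit 30 RXfull) and the value Xen now returns for MDCCSR_EL0. dcc_probe() is a hypothetical, simplified stand-in for Linux's hvc_dcc_check(), not its actual code: it polls the status and treats the DCC as usable only if TXfull clears.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define MDCCSR_TXFULL (UINT32_C(1) << 29)   /* bit 29, as in the patch */
#define MDCCSR_RXFULL (UINT32_C(1) << 30)   /* bit 30, as in the patch */

/* What a guest read of MDCCSR_EL0 returns under the proposed emulation. */
static uint32_t emulated_mdccsr_read(void)
{
    return MDCCSR_TXFULL | MDCCSR_RXFULL;
}

/*
 * Hypothetical probe: poll the status up to max_polls times and report
 * the DCC as present only if the TX buffer is ever seen empty.
 */
static bool dcc_probe(uint32_t (*read_status)(void), unsigned int max_polls)
{
    unsigned int i;

    for ( i = 0; i < max_polls; i++ )
        if ( !(read_status() & MDCCSR_TXFULL) )
            return true;

    return false;    /* TXfull never cleared: the driver should give up */
}

static void dcc_selftest(void)
{
    assert(emulated_mdccsr_read() == 0x60000000u);
    assert(!dcc_probe(emulated_mdccsr_read, 100));
}
```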

Signed-off-by: Michal Orzel 
Signed-off-by: Ayan Kumar Halder 
---
 xen/arch/arm/arm64/vsysreg.c | 21 ++
 xen/arch/arm/include/asm/arm64/hsr.h |  3 +++
 xen/arch/arm/include/asm/cpregs.h|  2 ++
 xen/arch/arm/include/asm/traps.h |  4 
 xen/arch/arm/traps.c | 18 +++
 xen/arch/arm/vcpreg.c| 33 +---
 6 files changed, 69 insertions(+), 12 deletions(-)

diff --git a/xen/arch/arm/arm64/vsysreg.c b/xen/arch/arm/arm64/vsysreg.c
index b5d54c569b..5082dfb02e 100644
--- a/xen/arch/arm/arm64/vsysreg.c
+++ b/xen/arch/arm/arm64/vsysreg.c
@@ -159,9 +159,6 @@ void do_sysreg(struct cpu_user_regs *regs,
  *
  * Unhandled:
  *MDCCINT_EL1
- *DBGDTR_EL0
- *DBGDTRRX_EL0
- *DBGDTRTX_EL0
  *OSDTRRX_EL1
  *OSDTRTX_EL1
  *OSECCR_EL1
@@ -172,11 +169,27 @@ void do_sysreg(struct cpu_user_regs *regs,
 case HSR_SYSREG_MDSCR_EL1:
 return handle_raz_wi(regs, regidx, hsr.sysreg.read, hsr, 1);
 case HSR_SYSREG_MDCCSR_EL0:
+{
+/*
+ * Bit 29: TX full, bit 30: RX full
+ * Given that we emulate DCC registers as RAZ/WI, doing the same for
+ * MDCCSR_EL0 would cause a guest to continue using the DCC despite no
+ * real effect. Setting the TX/RX status bits should result in a probe
+ * failure (based on Linux behavior).
+ */
+register_t guest_reg_value = (1U << 29) | (1U << 30);
+
 /*
 * Accessible at EL0 only if MDSCR_EL1.TDCC is set to 0. We emulate that
  * register as RAZ/WI above. So RO at both EL0 and EL1.
  */
-return handle_ro_raz(regs, regidx, hsr.sysreg.read, hsr, 0);
+return handle_ro_read_val(regs, regidx, hsr.sysreg.read, hsr, 0,
+  guest_reg_value);
+}
+case HSR_SYSREG_DBGDTR_EL0:
+/* DBGDTR[TR]X_EL0 share the same encoding */
+case HSR_SYSREG_DBGDTRTX_EL0:
+return handle_raz_wi(regs, regidx, hsr.sysreg.read, hsr, 0);
 HSR_SYSREG_DBG_CASES(DBGBVR):
 HSR_SYSREG_DBG_CASES(DBGBCR):
 HSR_SYSREG_DBG_CASES(DBGWVR):
diff --git a/xen/arch/arm/include/asm/arm64/hsr.h b/xen/arch/arm/include/asm/arm64/hsr.h
index e691d41c17..1495ccddea 100644
--- a/xen/arch/arm/include/asm/arm64/hsr.h
+++ b/xen/arch/arm/include/asm/arm64/hsr.h
@@ -47,6 +47,9 @@
 #define HSR_SYSREG_OSDLR_EL1  HSR_SYSREG(2,0,c1,c3,4)
 #define HSR_SYSREG_DBGPRCR_EL1HSR_SYSREG(2,0,c1,c4,4)
 #define HSR_SYSREG_MDCCSR_EL0 HSR_SYSREG(2,3,c0,c1,0)
+#define HSR_SYSREG_DBGDTR_EL0 HSR_SYSREG(2,3,c0,c4,0)
+#define HSR_SYSREG_DBGDTRTX_EL0   HSR_SYSREG(2,3,c0,c5,0)
+#define HSR_SYSREG_DBGDTRRX_EL0   HSR_SYSREG(2,3,c0,c5,0)
 
 #define HSR_SYSREG_DBGBVRn_EL1(n) HSR_SYSREG(2,0,c0,c##n,4)
 #define HSR_SYSREG_DBGBCRn_EL1(n) HSR_SYSREG(2,0,c0,c##n,5)
diff --git a/xen/arch/arm/include/asm/cpregs.h b/xen/arch/arm/include/asm/cpregs.h
index 6b083de204..aec9e8f329 100644
--- a/xen/arch/arm/include/asm/cpregs.h
+++ b/xen/arch/arm/include/asm/cpregs.h
@@ -75,6 +75,8 @@
 #define DBGDIDR p14,0,c0,c0,0   /* Debug ID Register */
 #define DBGDSCRINT  p14,0,c0,c1,0   /* Debug Status and Control Internal */
 #define DBGDSCREXT  p14,0,c0,c2,2   /* Debug Status and Control External */
+#define DBGDTRRXINT p14,0,c0,c5,0   /* Debug Data Transfer Register, Receive */
+#define DBGDTRTXINT p14,0,c0,c5,0   /* Debug Data Transfer Register, Transmit */
 #define DBGVCR  p14,0,c0,c7,0   /* Vector Catch */
 #define DBGBVR0 p14,0,c0,c0,4   /* Breakpoint Value 0 */
 #define DBGBCR0 p14,0,c0,c0,5   /* Breakpoint Control 0 */
diff --git a/xen/arch/arm/include/asm/traps.h b/xen/arch/arm/include/asm/traps.h
index 883dae368e..a2692722d5 100644
--- a/xen/arch/arm/include/asm/traps.h
+++ b/xen/arch/arm/include/asm/traps.h
@@ -56,6 +56,10 @@ void
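The hsr.h hunk gives DBGDTRTX_EL0 and DBGDTRRX_EL0 the same (op0, op1, CRn, CRm, op2) tuple, which is why a single case label covers both. The sketch below illustrates this with a hypothetical `SYSREG_KEY()` macro that packs the fields at the positions used by the Arm ARM ISS encoding for trapped MSR/MRS accesses (Op0 [21:20], Op2 [19:17], Op1 [16:14], CRn [13:10], CRm [4:1]). It is not Xen's `HSR_SYSREG()` definition, and `emulate_dcc_read()` only mimics the read side of the emulation described in the commit message.

```c
#include <stdint.h>

/* Hypothetical packing of a system register encoding into a comparable key. */
#define SYSREG_KEY(op0, op1, crn, crm, op2)              \
    (((uint32_t)(op0) << 20) | ((uint32_t)(op2) << 17) | \
     ((uint32_t)(op1) << 14) | ((uint32_t)(crn) << 10) | \
     ((uint32_t)(crm) << 1))

#define KEY_MDCCSR_EL0    SYSREG_KEY(2, 3, 0, 1, 0)
#define KEY_DBGDTR_EL0    SYSREG_KEY(2, 3, 0, 4, 0)
#define KEY_DBGDTRTX_EL0  SYSREG_KEY(2, 3, 0, 5, 0)
#define KEY_DBGDTRRX_EL0  SYSREG_KEY(2, 3, 0, 5, 0) /* same tuple as TX */

/* Value a guest read would return; -1 means "not handled here". */
static int64_t emulate_dcc_read(uint32_t key)
{
    switch (key) {
    case KEY_MDCCSR_EL0:
        return (1u << 29) | (1u << 30);   /* TXfull | RXfull */
    case KEY_DBGDTR_EL0:
    case KEY_DBGDTRTX_EL0:                /* also matches DBGDTRRX_EL0 */
        return 0;                         /* RAZ */
    default:
        return -1;
    }
}
```

Adding a separate `case KEY_DBGDTRRX_EL0:` would be a duplicate case label, which a C compiler rejects; that is the practical reason the diff comments the shared encoding instead of listing both registers.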

Re: [PATCH] CODING_STYLE: Add a section of the naming convention





On 01/12/2023 18:47, Julien Grall wrote:

From: Julien Grall 

Several maintainers have expressed a stronger preference
for using '-' in filenames and options that contain multiple
words.

So document it in CODING_STYLE.

Signed-off-by: Julien Grall 
---
  CODING_STYLE | 9 +
  1 file changed, 9 insertions(+)

diff --git a/CODING_STYLE b/CODING_STYLE
index ced3ade5a6fb..afd09177745b 100644
--- a/CODING_STYLE
+++ b/CODING_STYLE
@@ -144,6 +144,15 @@ separate lines and each line should begin with a leading '*'.
   * Note beginning and end markers on separate lines and leading '*'.
   */
  
+Naming convention
+-
+
+When a command line option or filename contains multiple words, a '-'
+should be used to separate them. E.g. 'timer-works'.
+
+Note that some of the options and filenames are using '_'. This is now
+deprecated.


Urgh, I sent the wrong draft :(. This is the wording I wanted to propose:

+Naming convention
+-
+
+'-' should be used to separate words in commandline options and filenames.
+E.g. timer-works.
+
+Note that some of the options and filenames are using '_'. This is now
+deprecated.
+

Cheers,

--
Julien Grall

[ovmf test] 183966: all pass - PUSHED

flight 183966 ovmf real [real]
http://logs.test-lab.xenproject.org/osstest/logs/183966/

Perfect :-)
All tests in this flight passed as required
version targeted for testing:
 ovmf 70b174e24db4a6de1590fda65846074dcb9fd7d3
baseline version:
 ovmf 534021965f6f7c417610add53984f39d6945bbcf

Last test of basis   183958  2023-12-01 01:15:11 Z    0 days
Testing same since   183966  2023-12-01 15:11:04 Z    0 days    1 attempts


People who touched revisions under test:
  Abner Chang 

jobs:
 build-amd64-xsm                      pass
 build-i386-xsm                       pass
 build-amd64                          pass
 build-i386                           pass
 build-amd64-libvirt                  pass
 build-i386-libvirt                   pass
 build-amd64-pvops                    pass
 build-i386-pvops                     pass
 test-amd64-amd64-xl-qemuu-ovmf-amd64 pass
 test-amd64-i386-xl-qemuu-ovmf-amd64  pass



sg-report-flight on osstest.test-lab.xenproject.org
logs: /home/logs/logs
images: /home/logs/images

Logs, config files, etc. are available at
http://logs.test-lab.xenproject.org/osstest/logs

Explanation of these reports, and of osstest in general, is at
http://xenbits.xen.org/gitweb/?p=osstest.git;a=blob;f=README.email;hb=master
http://xenbits.xen.org/gitweb/?p=osstest.git;a=blob;f=README;hb=master

Test harness code can be found at
http://xenbits.xen.org/gitweb?p=osstest.git;a=summary


Pushing revision :

To xenbits.xen.org:/home/xen/git/osstest/ovmf.git
   534021965f..70b174e24d  70b174e24db4a6de1590fda65846074dcb9fd7d3 -> xen-tested-master

[PATCH] CODING_STYLE: Add a section of the naming convention

From: Julien Grall 

Several maintainers have expressed a stronger preference
for using '-' in filenames and options that contain multiple
words.

So document it in CODING_STYLE.

Signed-off-by: Julien Grall 
---
 CODING_STYLE | 9 +
 1 file changed, 9 insertions(+)

diff --git a/CODING_STYLE b/CODING_STYLE
index ced3ade5a6fb..afd09177745b 100644
--- a/CODING_STYLE
+++ b/CODING_STYLE
@@ -144,6 +144,15 @@ separate lines and each line should begin with a leading '*'.
  * Note beginning and end markers on separate lines and leading '*'.
  */
 
+Naming convention
+-
+
+When a command line option or filename contains multiple words, a '-'
+should be used to separate them. E.g. 'timer-works'.
+
+Note that some of the options and filenames are using '_'. This is now
+deprecated.
+
 Emacs local variables
 -
 
-- 
2.40.1
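As a rough illustration of the rule, a lint-style check could flag multi-word names that still use the deprecated '_' separator. `follows_naming_convention()` is a hypothetical helper, not something in the Xen tree, and it deliberately ignores every other naming rule.

```c
#include <stdbool.h>
#include <string.h>

/*
 * Hypothetical check: true if an option or file name avoids the deprecated
 * '_' separator (so multi-word names must use '-', e.g. "timer-works").
 */
static bool follows_naming_convention(const char *name)
{
    return strchr(name, '_') == NULL;
}
```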

Re: [PATCH 12/12] block: remove outdated AioContext locking comments

2023-12-01 Thread Eric Blake

On Wed, Nov 29, 2023 at 02:55:53PM -0500, Stefan Hajnoczi wrote:
> The AioContext lock no longer exists.
> 
> There is one noteworthy change:
> 
>   - * More specifically, these functions use BDRV_POLL_WHILE(bs), which
>   - * requires the caller to be either in the main thread and hold
>   - * the BlockdriverState (bs) AioContext lock, or directly in the
>   - * home thread that runs the bs AioContext. Calling them from
>   - * another thread in another AioContext would cause deadlocks.
>   + * More specifically, these functions use BDRV_POLL_WHILE(bs), which requires
>   + * the caller to be either in the main thread or directly in the home thread
>   + * that runs the bs AioContext. Calling them from another thread in another
>   + * AioContext would cause deadlocks.
> 
> I am not sure whether deadlocks are still possible. Maybe they have just
> moved to the fine-grained locks that have replaced the AioContext. Since
> I am not sure if the deadlocks are gone, I have kept the substance
> unchanged and just removed mention of the AioContext.

I'd rather have text that may be overly conservative than an omission that
could lead to a bug, so I'm okay with your action there.

> 
> Signed-off-by: Stefan Hajnoczi 
> ---

Reviewed-by: Eric Blake 

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.
Virtualization:  qemu.org | libguestfs.org

Re: [PATCH 11/12] job: remove outdated AioContext locking comments

2023-12-01 Thread Eric Blake

On Wed, Nov 29, 2023 at 02:55:52PM -0500, Stefan Hajnoczi wrote:
> The AioContext lock no longer exists.
> 
> Signed-off-by: Stefan Hajnoczi 
> ---
>  include/qemu/job.h | 20 
>  1 file changed, 20 deletions(-)
>

Reviewed-by: Eric Blake 

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.
Virtualization:  qemu.org | libguestfs.org

Re: [XEN PATCH 6/7] xen/x86: remove stale comment

2023-12-01 Thread Nicola Vetrini


On 2023-11-30 17:41, Jan Beulich wrote:

On 29.11.2023 16:24, Nicola Vetrini wrote:

The comment referred to the declaration for do_mca, which
now is part of hypercall-defs.h, therefore the comment is stale.


If the comments were stale, the #include-s should also be able to
disappear?


--- a/xen/arch/x86/cpu/mcheck/mce.c
+++ b/xen/arch/x86/cpu/mcheck/mce.c
@@ -14,7 +14,7 @@
 #include 
 #include 
 #include 
-#include  /* for do_mca */
+#include 
 #include 


Here specifically I think the comment isn't stale, as xen/hypercall.h
includes xen/hypercall-defs.h.



Ok, I see your point


--- a/xen/arch/x86/include/asm/hypercall.h
+++ b/xen/arch/x86/include/asm/hypercall.h
@@ -12,7 +12,7 @@
 #include 
 #include 
 #include 
-#include  /* for do_mca */
+#include 
 #include 


Here otoh I'm not even sure this public header (or the others) is (are)
really needed.



I confirm this. It builds even without this header.

--
Nicola Vetrini, BSc
Software Engineer, BUGSENG srl (https://bugseng.com)

[PATCH v9 2/2] xen/vpci: header: filter PCI capabilities

2023-12-01 Thread Stewart Hildebrand

Currently, Xen vPCI only supports virtualizing the MSI and MSI-X capabilities.
Hide all other PCI capabilities (including extended capabilities) from domUs for
now, even though there may be certain devices/drivers that depend on being able
to discover certain capabilities.

We parse the physical PCI capabilities linked list and add vPCI register
handlers for the next elements, inserting our own next value, thus presenting a
modified linked list to the domU.

Introduce helper functions vpci_hw_read8 and vpci_read_val. The vpci_read_val
helper function returns a fixed value, which may be used for read-as-zero
registers, or registers whose value doesn't change.

Introduce pci_find_next_cap_ttl() helper while adapting the logic from
pci_find_next_cap() to suit our needs, and implement the existing
pci_find_next_cap() in terms of the new helper.
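The list rewriting described above can be sketched outside Xen. The model below treats config space as a 256-byte array and walks it the way the new pci_find_next_cap_ttl() does: follow the pointer, stop below 0x40 or on an ID of 0xff, match against an array of IDs, and share one TTL across calls. `build_virtual_cap_list()` and `example_cfg` are hypothetical, and the loop is written so the TTL cannot wrap, which differs slightly from the patch's `while ( (*ttl)-- )` form, where an exhausted counter ends up wrapped.

```c
#include <stddef.h>
#include <stdint.h>

#define CAP_LIST_PTR  0x34u  /* PCI_CAPABILITY_LIST */
#define CAP_ID_OFF    0u     /* PCI_CAP_LIST_ID */
#define CAP_NEXT_OFF  1u     /* PCI_CAP_LIST_NEXT */

/* Follow the pointer stored at 'pos' until an ID in cap[0..n-1] is found. */
static unsigned int find_next_cap_ttl(const uint8_t cfg[256], unsigned int pos,
                                      const unsigned int *cap, unsigned int n,
                                      unsigned int *ttl)
{
    while (*ttl > 0) {
        unsigned int id, i;

        --*ttl;                              /* bounds the walk across calls */
        pos = cfg[pos];
        if (pos < 0x40)
            break;                           /* end of list */
        id = cfg[(pos & ~3u) + CAP_ID_OFF];
        if (id == 0xff)
            break;                           /* treat 0xff as end of list */
        for (i = 0; i < n; i++)
            if (id == cap[i])
                return pos;
        pos = (pos & ~3u) + CAP_NEXT_OFF;    /* skip unsupported capability */
    }
    return 0;
}

/* Offsets of the capabilities a guest would see, in list order. */
static size_t build_virtual_cap_list(const uint8_t cfg[256],
                                     const unsigned int *supported,
                                     unsigned int n,
                                     unsigned int *out, size_t max_out)
{
    unsigned int ttl = 48;
    size_t count = 0;
    unsigned int pos = find_next_cap_ttl(cfg, CAP_LIST_PTR, supported, n, &ttl);

    while (pos && count < max_out) {
        out[count++] = pos & ~3u;            /* out[i]'s "next" is out[i + 1] */
        pos = find_next_cap_ttl(cfg, (pos & ~3u) + CAP_NEXT_OFF,
                                supported, n, &ttl);
    }
    return count;
}

/* Made-up layout: PM (0x01) -> MSI (0x05) -> PCIe (0x10) -> MSI-X (0x11). */
static const uint8_t example_cfg[256] = {
    [0x34] = 0x40,
    [0x40] = 0x01, [0x41] = 0x50,
    [0x50] = 0x05, [0x51] = 0x60,
    [0x60] = 0x10, [0x61] = 0x70,
    [0x70] = 0x11, [0x71] = 0x00,
};
```

For the sample layout and a supported set of {MSI, MSI-X}, the guest-visible chain would be 0x50 -> 0x70, with the PM and PCIe entries skipped.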

Signed-off-by: Stewart Hildebrand 
Reviewed-by: Roger Pau Monné 
---
v8->v9:
* move local variable definitions inside loop in pci_find_next_cap_ttl()
* constify supported_caps array and cap parameter of pci_find_next_cap_ttl()
* add comment by vpci_read_val() helper
* rename s/init_bars/init_header/
* add Roger's R-b tag

v7->v8:
* use to array instead of match function
* include lib.h for ARRAY_SIZE
* don't emulate PCI_CAPABILITY_LIST register if PCI_STATUS_CAP_LIST bit is not
  set in hardware
* spell out RAZ/WI acronym
* dropped R-b tag since the patch has changed moderately since the last rev

v6->v7:
* no change

v5->v6:
* add register handlers before status register handler in init_bars()
* s/header->mask_cap_list/mask_cap_list/

v4->v5:
* use more appropriate types, continued
* get rid of unnecessary hook function
* add Jan's R-b

v3->v4:
* move mask_cap_list setting to this patch
* leave pci_find_next_cap signature alone
* use more appropriate types

v2->v3:
* get rid of > 0 in loop condition
* implement pci_find_next_cap in terms of new pci_find_next_cap_ttl function so
  that hypothetical future callers wouldn't be required to pass 
* change NULL to (void *)0 for RAZ value passed to vpci_read_val
* change type of ttl to unsigned int
* remember to mask off the low 2 bits of next in the initial loop iteration
* change return type of pci_find_next_cap and pci_find_next_cap_ttl
* avoid wrapping the PCI_STATUS_CAP_LIST condition by using ! instead of == 0

v1->v2:
* change type of ttl to int
* use switch statement instead of if/else
* adapt existing pci_find_next_cap helper instead of rolling our own
* pass ttl as in/out
* "pass through" the lower 2 bits of the next pointer
* squash helper functions into this patch to avoid transient dead code situation
* extended capabilities RAZ/WI
---
 xen/drivers/pci/pci.c | 33 ---
 xen/drivers/vpci/header.c | 67 +--
 xen/drivers/vpci/vpci.c   | 12 +++
 xen/include/xen/pci.h |  3 ++
 xen/include/xen/vpci.h|  6 
 5 files changed, 108 insertions(+), 13 deletions(-)

diff --git a/xen/drivers/pci/pci.c b/xen/drivers/pci/pci.c
index 3569ccb24e9e..e6ccc86214ba 100644
--- a/xen/drivers/pci/pci.c
+++ b/xen/drivers/pci/pci.c
@@ -39,31 +39,42 @@ unsigned int pci_find_cap_offset(pci_sbdf_t sbdf, unsigned int cap)
 return 0;
 }
 
-unsigned int pci_find_next_cap(pci_sbdf_t sbdf, unsigned int pos,
-   unsigned int cap)
+unsigned int pci_find_next_cap_ttl(pci_sbdf_t sbdf, unsigned int pos,
+   const unsigned int *cap, unsigned int n,
+   unsigned int *ttl)
 {
-u8 id;
-int ttl = 48;
-
-while ( ttl-- )
+while ( (*ttl)-- )
 {
+unsigned int id, i;
+
 pos = pci_conf_read8(sbdf, pos);
 if ( pos < 0x40 )
 break;
 
-pos &= ~3;
-id = pci_conf_read8(sbdf, pos + PCI_CAP_LIST_ID);
+id = pci_conf_read8(sbdf, (pos & ~3) + PCI_CAP_LIST_ID);
 
 if ( id == 0xff )
 break;
-if ( id == cap )
-return pos;
+for ( i = 0; i < n; i++ )
+{
+if ( id == cap[i] )
+return pos;
+}
 
-pos += PCI_CAP_LIST_NEXT;
+pos = (pos & ~3) + PCI_CAP_LIST_NEXT;
 }
+
 return 0;
 }
 
+unsigned int pci_find_next_cap(pci_sbdf_t sbdf, unsigned int pos,
+   unsigned int cap)
+{
+unsigned int ttl = 48;
+
+return pci_find_next_cap_ttl(sbdf, pos, , 1, ) & ~3;
+}
+
 /**
  * pci_find_ext_capability - Find an extended capability
  * @sbdf: PCI device to query
diff --git a/xen/drivers/vpci/header.c b/xen/drivers/vpci/header.c
index 351318121e48..3be2e21cd925 100644
--- a/xen/drivers/vpci/header.c
+++ b/xen/drivers/vpci/header.c
@@ -18,6 +18,7 @@
  */
 
 #include 
+#include 
 #include 
 #include 
 #include 
@@ -513,7 +514,7 @@ static void cf_check rom_write(
 rom->addr = val & PCI_ROM_ADDRESS_MASK;
 }
 
-static int cf_check init_bars(struct pci_dev *pdev)
+static int cf_check init_header(struct

[PATCH v9 1/2] xen/vpci: header: status register handler

2023-12-01 Thread Stewart Hildebrand

Introduce a handler for the PCI status register, with ability to mask
the capabilities bit. The status register contains RsvdZ bits,
read-only bits, and write-1-to-clear bits. Additionally, we use RsvdP to
mask the capabilities bit. Introduce bitmasks to handle these in vPCI.
If a bit in the bitmask is set, then the special meaning applies:

  ro_mask: read normal, guest write ignore (preserve on write to hardware)
  rw1c_mask: read normal, write 1 to clear
  rsvdp_mask: read as zero, guest write ignore (preserve on write to hardware)
  rsvdz_mask: read as zero, guest write ignore (write zero to hardware)

The RO/RW1C/RsvdP/RsvdZ naming and definitions were borrowed from the
PCI Express Base 6.1 specification. RsvdP/RsvdZ bits help Xen enforce
our view of the world. Xen preserves the value of read-only bits on
write to hardware, discarding the guest's write value. This is done in
case hardware wrongly implements R/O bits as R/W.
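The four mask categories above can be expressed as two pure functions: what the guest reads, and what ends up being written to hardware. This is a sketch, not the vpci_write_helper() from the patch; `struct reg_masks`, `guest_read()` and `hw_write_value()` are hypothetical, and the masks are assumed not to overlap (the series adds sanity checks along those lines).

```c
#include <stdint.h>

/* Hypothetical mask set mirroring the four categories described above. */
struct reg_masks {
    uint32_t ro;     /* read normal, guest write ignored, preserve hw value */
    uint32_t rw1c;   /* read normal, guest writes 1 to clear */
    uint32_t rsvdp;  /* read as zero, guest write ignored, preserve hw value */
    uint32_t rsvdz;  /* read as zero, guest write ignored, write zero to hw */
};

/* Value the guest sees on a read. */
static uint32_t guest_read(uint32_t hw, const struct reg_masks *m)
{
    return hw & ~(m->rsvdp | m->rsvdz);
}

/* Value written to hardware when the guest writes 'val'. */
static uint32_t hw_write_value(uint32_t hw, uint32_t val,
                               const struct reg_masks *m)
{
    /* Normal RW bits and RW1C bits come from the guest (1 clears, 0 is a no-op). */
    uint32_t out = val & ~(m->ro | m->rsvdp | m->rsvdz);

    /* RO and RsvdP bits keep the current hardware value; RsvdZ bits stay zero. */
    out |= hw & (m->ro | m->rsvdp);
    return out;
}
```

RW1C bits are intentionally not copied from the current hardware value: writing back a bit that reads as 1 would clear it, so only the guest's own 1s reach hardware.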

The mask_cap_list flag will be set in a follow-on patch.

Signed-off-by: Stewart Hildebrand 
---
v8->v9:
* check that masks don't have bits set above register size
* rename variable in vpci_write_helper()
* only export one vpci_add_register* function, make the other one static inline
* style fixups

v7->v8:
* move PCI_STATUS_UDF to rsvdz_mask (per PCI Express Base 6 spec)
* add support for rsvdp bits
* add tests for ro/rw1c/rsvdp/rsvdz bits in tools/tests/vpci/main.c
* dropped R-b tag [1] since the patch has changed moderately since the last rev

[1] https://lists.xenproject.org/archives/html/xen-devel/2023-09/msg00909.html

v6->v7:
* re-work args passed to vpci_add_register_mask() (called in init_bars())
* also check for overlap of (rsvdz_mask & ro_mask) in add_register()
* slightly adjust masking operation in vpci_write_helper()

v5->v6:
* remove duplicate PCI_STATUS_CAP_LIST in constant definition
* style fixup in constant definitions
* s/res_mask/rsvdz_mask/
* combine a new masking operation into single line
* preserve r/o bits on write
* get rid of status_read. Instead, use rsvdz_mask for conditionally masking
  PCI_STATUS_CAP_LIST bit
* add comment about PCI_STATUS_CAP_LIST and rsvdp behavior
* add sanity checks in add_register
* move mask_cap_list from struct vpci_header to local variable

v4->v5:
* add support for res_mask
* add support for ro_mask (squash ro_mask patch)
* add constants for reserved, read-only, and rw1c masks

v3->v4:
* move mask_cap_list setting to the capabilities patch
* single pci_conf_read16 in status_read
* align mask_cap_list bitfield in struct vpci_header
* change to rw1c bit mask instead of treating whole register as rw1c
* drop subsystem prefix on renamed add_register function

v2->v3:
* new patch
---
 tools/tests/vpci/main.c| 111 +
 xen/drivers/vpci/header.c  |  12 
 xen/drivers/vpci/vpci.c|  52 -
 xen/include/xen/pci_regs.h |   9 +++
 xen/include/xen/vpci.h |  24 ++--
 5 files changed, 189 insertions(+), 19 deletions(-)

diff --git a/tools/tests/vpci/main.c b/tools/tests/vpci/main.c
index b9a0a6006bb9..64d4552936c7 100644
--- a/tools/tests/vpci/main.c
+++ b/tools/tests/vpci/main.c
@@ -70,6 +70,28 @@ static void vpci_write32(const struct pci_dev *pdev, unsigned int reg,
 *(uint32_t *)data = val;
 }
 
+struct mask_data {
+uint32_t val;
+uint32_t rw1c_mask;
+};
+
+static uint32_t vpci_read32_mask(const struct pci_dev *pdev, unsigned int reg,
+ void *data)
+{
+const struct mask_data *md = data;
+
+return md->val;
+}
+
+static void vpci_write32_mask(const struct pci_dev *pdev, unsigned int reg,
+  uint32_t val, void *data)
+{
+struct mask_data *md = data;
+
+md->val  = val | (md->val & md->rw1c_mask);
+md->val &= ~(val & md->rw1c_mask);
+}
+
 #define VPCI_READ(reg, size, data) ({   \
 data = vpci_read((pci_sbdf_t){ .sbdf = 0 }, reg, size); \
 })
@@ -94,9 +116,21 @@ static void vpci_write32(const struct pci_dev *pdev, unsigned int reg,
 assert(!vpci_add_register(test_pdev.vpci, fread, fwrite, off, size, \
   ))
 
+#define VPCI_ADD_REG_MASK(fread, fwrite, off, size, store, \
+  ro_mask, rw1c_mask, rsvdp_mask, rsvdz_mask) \
+assert(!vpci_add_register_mask(test_pdev.vpci, fread, fwrite, off, size, \
+   , \
+   ro_mask, rw1c_mask, rsvdp_mask, rsvdz_mask))
+
 #define VPCI_ADD_INVALID_REG(fread, fwrite, off, size)  \
 assert(vpci_add_register(test_pdev.vpci, fread, fwrite, off, size, NULL))
 
+#define VPCI_ADD_INVALID_REG_MASK(fread, fwrite, off, size,   \
+  ro_mask, rw1c_mask, rsvdp_mask, rsvdz_mask) \
+assert(vpci_add_register_mask(test_pdev.vpci, fread, fwrite, off, size,   \
+

[PATCH v9 0/2] vPCI capabilities filtering

2023-12-01 Thread Stewart Hildebrand

This small series enables vPCI to filter which PCI capabilities we expose to a
domU. This series adds vPCI register handlers within
xen/drivers/vpci/header.c:init_bars(), along with some supporting functions.

Note there are minor rebase conflicts with the in-progress vPCI series [1].
These conflicts fall into the category of functions and code being added
adjacent to one another, so are easily resolved. I did not identify any
dependency on the vPCI locking work, and the two series deal with different
aspects of emulating the PCI header.

Future work may involve adding handlers for more registers in the vPCI header,
such as VID/DID, etc. Future work may also involve exposing additional
capabilities to the guest for broader device/driver support.

v8->v9:
* address feedback

v7->v8:
* address feedback

v6->v7:
* address feedback in ("xen/vpci: header: status register handler")
* drop ("xen/pci: convert pci_find_*cap* to pci_sbdf_t") and
  ("x86/msi: rearrange read_pci_mem_bar slightly") as they were committed

v5->v6:
* drop ("xen/pci: update PCI_STATUS_* constants") as it has been committed

v4->v5:
* drop ("x86/msi: remove some unused-but-set-variables") as it has been
  committed
* add ("xen/pci: update PCI_STATUS_* constants")
* squash ro_mask patch

v3->v4:
* drop "xen/pci: address a violation of MISRA C:2012 Rule 8.3" as it has been
  committed
* re-order status register handler and capabilities filtering patches
* split an unrelated change from ("xen/pci: convert pci_find_*cap* to pci_sbdf_t")
  into its own patch
* add new patch ("x86/msi: rearrange read_pci_mem_bar slightly") based on
  feedback
* add new RFC patch ("xen/vpci: support ro mask")

v2->v3:
* drop RFC "xen/vpci: header: avoid cast for value passed to vpci_read_val"
* minor misra C violation fixup in preparatory patch
* switch to pci_sbdf_t in preparatory patch
* introduce status handler

v1->v2:
* squash helper functions into the patch where they are used to avoid transient
  dead code situation
* add new RFC patch, possibly throwaway, to get an idea of what it would look
  like to get rid of the (void *)(uintptr_t) cast by introducing a new memory
  allocation

[1] https://lists.xenproject.org/archives/html/xen-devel/2023-08/msg02361.html

Stewart Hildebrand (2):
  xen/vpci: header: status register handler
  xen/vpci: header: filter PCI capabilities

 tools/tests/vpci/main.c| 111 +
 xen/drivers/pci/pci.c  |  33 +++
 xen/drivers/vpci/header.c  |  79 +-
 xen/drivers/vpci/vpci.c|  64 -
 xen/include/xen/pci.h  |   3 +
 xen/include/xen/pci_regs.h |   9 +++
 xen/include/xen/vpci.h |  30 --
 7 files changed, 297 insertions(+), 32 deletions(-)


base-commit: 1571ff7a987b88b20598a6d49910457f3b2c59f1
-- 
2.43.0

[xen-unstable test] 183959: tolerable FAIL

flight 183959 xen-unstable real [real]
http://logs.test-lab.xenproject.org/osstest/logs/183959/

Failures :-/ but no regressions.

Tests which are failing intermittently (not blocking):
 test-amd64-amd64-xl-qemuu-win7-amd64 12 windows-install fail pass in 183952

Tests which did not succeed, but are not blocking:
 test-amd64-amd64-xl-qemuu-win7-amd64 19 guest-stop fail in 183952 blocked in 183959
 test-amd64-amd64-xl-qemut-win7-amd64 19 guest-stop fail like 183952
 test-amd64-i386-xl-qemuu-win7-amd64 19 guest-stop fail like 183952
 test-amd64-amd64-xl-qemuu-ws16-amd64 19 guest-stop fail like 183952
 test-amd64-i386-xl-qemut-ws16-amd64 19 guest-stop fail like 183952
 test-armhf-armhf-libvirt-raw 15 saverestore-support-check fail like 183952
 test-amd64-i386-xl-qemut-win7-amd64 19 guest-stop fail like 183952
 test-armhf-armhf-libvirt 16 saverestore-support-check fail like 183952
 test-amd64-amd64-xl-qemut-ws16-amd64 19 guest-stop fail like 183952
 test-armhf-armhf-libvirt-qcow2 15 saverestore-support-check fail like 183952
 test-amd64-amd64-qemuu-nested-amd 20 debian-hvm-install/l1/l2 fail like 183952
 test-amd64-i386-xl-qemuu-ws16-amd64 19 guest-stop fail like 183952
 test-amd64-i386-libvirt 15 migrate-support-check fail never pass
 test-amd64-i386-libvirt-xsm 15 migrate-support-check fail never pass
 test-amd64-i386-xl-pvshim 14 guest-start fail never pass
 test-amd64-amd64-libvirt-xsm 15 migrate-support-check fail never pass
 test-arm64-arm64-xl-credit1 15 migrate-support-check fail never pass
 test-arm64-arm64-xl-credit1 16 saverestore-support-check fail never pass
 test-arm64-arm64-xl 15 migrate-support-check fail never pass
 test-arm64-arm64-xl 16 saverestore-support-check fail never pass
 test-arm64-arm64-libvirt-xsm 15 migrate-support-check fail never pass
 test-arm64-arm64-libvirt-xsm 16 saverestore-support-check fail never pass
 test-arm64-arm64-xl-xsm 15 migrate-support-check fail never pass
 test-arm64-arm64-xl-xsm 16 saverestore-support-check fail never pass
 test-arm64-arm64-xl-thunderx 15 migrate-support-check fail never pass
 test-arm64-arm64-xl-thunderx 16 saverestore-support-check fail never pass
 test-amd64-amd64-libvirt-qemuu-debianhvm-amd64-xsm 13 migrate-support-check fail never pass
 test-arm64-arm64-xl-credit2 15 migrate-support-check fail never pass
 test-arm64-arm64-xl-credit2 16 saverestore-support-check fail never pass
 test-armhf-armhf-xl-multivcpu 15 migrate-support-check fail never pass
 test-armhf-armhf-xl-multivcpu 16 saverestore-support-check fail never pass
 test-amd64-i386-libvirt-qemuu-debianhvm-amd64-xsm 13 migrate-support-check fail never pass
 test-armhf-armhf-xl-arndale 15 migrate-support-check fail never pass
 test-armhf-armhf-xl-arndale 16 saverestore-support-check fail never pass
 test-armhf-armhf-xl 15 migrate-support-check fail never pass
 test-armhf-armhf-xl 16 saverestore-support-check fail never pass
 test-armhf-armhf-xl-credit1 15 migrate-support-check fail never pass
 test-armhf-armhf-xl-credit1 16 saverestore-support-check fail never pass
 test-armhf-armhf-xl-rtds 15 migrate-support-check fail never pass
 test-armhf-armhf-xl-rtds 16 saverestore-support-check fail never pass
 test-amd64-amd64-libvirt 15 migrate-support-check fail never pass
 test-amd64-amd64-libvirt-vhd 14 migrate-support-check fail never pass
 test-amd64-i386-libvirt-raw 14 migrate-support-check fail never pass
 test-arm64-arm64-libvirt-raw 14 migrate-support-check fail never pass
 test-arm64-arm64-libvirt-raw 15 saverestore-support-check fail never pass
 test-arm64-arm64-xl-vhd 14 migrate-support-check fail never pass
 test-arm64-arm64-xl-vhd 15 saverestore-support-check fail never pass
 test-armhf-armhf-libvirt-raw 14 migrate-support-check fail never pass
 test-armhf-armhf-xl-credit2 15 migrate-support-check fail never pass
 test-armhf-armhf-xl-credit2 16 saverestore-support-check fail never pass
 test-armhf-armhf-libvirt 15 migrate-support-check fail never pass
 test-armhf-armhf-libvirt-qcow2 14 migrate-support-check fail never pass
 test-armhf-armhf-xl-vhd 14 migrate-support-check fail never pass
 test-armhf-armhf-xl-vhd 15 saverestore-support-check fail never pass

version targeted for testing:
 xen  f0dd0cd9598f22ee5509bb5d1466e4821834c4ba
baseline version:
 xen  f0dd0cd9598f22ee5509bb5d1466e4821834c4ba

Last test of basis   183959  2023-12-01 03:24:21 Z0 days
Testing same since  (not found) 0 attempts

jobs:

Re: [PATCH v2] xen: address violations of MISRA C:2012 Rule 11.8.


On 01/12/23 14:48, Julien Grall wrote:



On 01/12/2023 13:42, Simone Ballarin wrote:

On 01/12/23 12:48, Julien Grall wrote:

Hi Simone,

On 01/12/2023 11:37, Simone Ballarin wrote:

From: Maria Celeste Cesario 

Remove or amend casts to comply with Rule 11.8.

The violations are resolved either by adding missing const
qualifiers in casts or by removing unnecessary casts.

Change type of operands from char* to uintptr_t: uintptr_t is
the appropriate type for memory address operations.

No functional change.

---
Changes in v2:
- arm/regs.h: add const qualifier to the first operand,
 change types of both operands from char* to uintptr_t.
- x86/regs.h: add const qualifier to both operands. Change
 types of both operands from char* to uintptr_t to
 conform with the arm version.
- dom0less-build.c: rebase change in the new file.

Signed-off-by: Maria Celeste Cesario 


Signed-off-by: Simone Ballarin  
---
  xen/arch/arm/dom0less-build.c | 2 +-
  xen/arch/arm/include/asm/atomic.h | 2 +-
  xen/arch/arm/include/asm/regs.h   | 2 +-
  xen/arch/x86/include/asm/regs.h   | 3 ++-
  4 files changed, 5 insertions(+), 4 deletions(-)

diff --git a/xen/arch/arm/dom0less-build.c b/xen/arch/arm/dom0less-build.c
index d39cbd969a..fb63ec6fd1 100644
--- a/xen/arch/arm/dom0less-build.c
+++ b/xen/arch/arm/dom0less-build.c
@@ -354,7 +354,7 @@ static int __init handle_passthrough_prop(struct kernel_info *kinfo,
  if ( node == NULL )
  {
  printk(XENLOG_ERR "Couldn't find node %s in host_dt!\n",
-   (char *)xen_path->data);
+   xen_path->data);
  return -EINVAL;
  }
diff --git a/xen/arch/arm/include/asm/atomic.h b/xen/arch/arm/include/asm/atomic.h
index 64314d59b3..517216d2a8 100644
--- a/xen/arch/arm/include/asm/atomic.h
+++ b/xen/arch/arm/include/asm/atomic.h
@@ -154,7 +154,7 @@ static always_inline void write_atomic_size(volatile void *p,
   */
  static inline int atomic_read(const atomic_t *v)
  {
-    return *(volatile int *)>counter;
+    return *(const volatile int *)>counter;
  }
  static inline int _atomic_read(atomic_t v)
diff --git a/xen/arch/arm/include/asm/regs.h b/xen/arch/arm/include/asm/regs.h
index 8a0db95415..b28eb5de7a 100644
--- a/xen/arch/arm/include/asm/regs.h
+++ b/xen/arch/arm/include/asm/regs.h
@@ -48,7 +48,7 @@ static inline bool regs_mode_is_32bit(const struct cpu_user_regs *regs)
  static inline bool guest_mode(const struct cpu_user_regs *r)
  {
-    unsigned long diff = (char *)guest_cpu_user_regs() - (char *)(r);
+    unsigned long diff = (const uintptr_t)guest_cpu_user_regs() - (const uintptr_t)(r);


NIT: The const should not be necessary here. Am I correct?


The const in the first parameter is not necessary, I will drop it.


I am confused. In the case of 'r', the const applies to the pointee, not
the pointer (i.e. the pointer can be modified but not the content). So
the 'const' should not be necessary even for the second parameter.




Yes, sorry. Here there is no reason to use a const: if we cast to a 
non-pointer type (uintptr_t) rule 11.8 does not apply.
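A minimal sketch of the point above, assuming a hypothetical `struct regs_frame` as a stand-in for `struct cpu_user_regs`: converting a pointer-to-const to `uintptr_t` yields a plain integer, so no pointee qualifier is discarded, and writing `(const uintptr_t)` would behave exactly like `(uintptr_t)` because a cast yields a value rather than an lvalue. `frame_diff()` and `is_top_frame()` are illustrative names only, loosely in the spirit of `guest_mode()`.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for struct cpu_user_regs. */
struct regs_frame {
    unsigned long gpr[4];
};

/*
 * Both pointers refer to const data, but casting the pointers themselves to
 * uintptr_t produces integers: no const is cast away from a pointed-to type.
 */
static unsigned long frame_diff(const struct regs_frame *top,
                                const struct regs_frame *r)
{
    return (unsigned long)((uintptr_t)top - (uintptr_t)r);
}

/* Only the topmost frame counts as a match. */
static bool is_top_frame(const struct regs_frame *top,
                         const struct regs_frame *r)
{
    return frame_diff(top, r) == 0;
}
```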



Cheers,



--
Simone Ballarin, M.Sc.

Field Application Engineer, BUGSENG (https://bugseng.com)

Re: [PATCH v2] xen: address violations of MISRA C:2012 Rule 11.8.





On 01/12/2023 13:42, Simone Ballarin wrote:

On 01/12/23 12:48, Julien Grall wrote:

Hi Simone,

On 01/12/2023 11:37, Simone Ballarin wrote:

From: Maria Celeste Cesario 

Remove or amend casts to comply with Rule 11.8.

The violations are resolved either by adding missing const
qualifiers in casts or by removing unnecessary casts.

Change type of operands from char* to uintptr_t: uintptr_t is
the appropriate type for memory address operations.

No functional change.

---
Changes in v2:
- arm/regs.h: add const qualifier to the first operand,
 change types of both operands from char* to uintptr_t.
- x86/regs.h: add const qualifier to both operands. Change
 types of both operands from char* to uintptr_t to
 conform with the arm version.
- dom0less-build.c: rebase change in the new file.

Signed-off-by: Maria Celeste Cesario  


Signed-off-by: Simone Ballarin  
---
  xen/arch/arm/dom0less-build.c | 2 +-
  xen/arch/arm/include/asm/atomic.h | 2 +-
  xen/arch/arm/include/asm/regs.h   | 2 +-
  xen/arch/x86/include/asm/regs.h   | 3 ++-
  4 files changed, 5 insertions(+), 4 deletions(-)

diff --git a/xen/arch/arm/dom0less-build.c b/xen/arch/arm/dom0less-build.c
index d39cbd969a..fb63ec6fd1 100644
--- a/xen/arch/arm/dom0less-build.c
+++ b/xen/arch/arm/dom0less-build.c
@@ -354,7 +354,7 @@ static int __init handle_passthrough_prop(struct kernel_info *kinfo,
  if ( node == NULL )
  {
  printk(XENLOG_ERR "Couldn't find node %s in host_dt!\n",
-   (char *)xen_path->data);
+   xen_path->data);
  return -EINVAL;
  }
diff --git a/xen/arch/arm/include/asm/atomic.h b/xen/arch/arm/include/asm/atomic.h
index 64314d59b3..517216d2a8 100644
--- a/xen/arch/arm/include/asm/atomic.h
+++ b/xen/arch/arm/include/asm/atomic.h
@@ -154,7 +154,7 @@ static always_inline void write_atomic_size(volatile void *p,
   */
  static inline int atomic_read(const atomic_t *v)
  {
-    return *(volatile int *)>counter;
+    return *(const volatile int *)>counter;
  }
  static inline int _atomic_read(atomic_t v)
diff --git a/xen/arch/arm/include/asm/regs.h b/xen/arch/arm/include/asm/regs.h
index 8a0db95415..b28eb5de7a 100644
--- a/xen/arch/arm/include/asm/regs.h
+++ b/xen/arch/arm/include/asm/regs.h
@@ -48,7 +48,7 @@ static inline bool regs_mode_is_32bit(const struct cpu_user_regs *regs)
  static inline bool guest_mode(const struct cpu_user_regs *r)
  {
-    unsigned long diff = (char *)guest_cpu_user_regs() - (char *)(r);
+    unsigned long diff = (const uintptr_t)guest_cpu_user_regs() - (const uintptr_t)(r);


NIT: The const should not be necessary here. Am I correct?


The const in the first parameter is not necessary, I will drop it.


I am confused. In the case of 'r' the const applies to the pointee, not
the pointer (i.e. the pointer can be modified but not the content). So
the 'const' should not be necessary even for the second parameter.
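Julien's point can be checked in isolation. A cast yields a value, not an lvalue, so a qualifier on the target type of a cast has no effect (the C standard notes that a cast to a qualified type behaves like a cast to the unqualified type). The `const` in `const struct cpu_user_regs *r` qualifies the pointee, not `r` itself. A minimal sketch, using a hypothetical `struct demo_regs` in place of `struct cpu_user_regs`:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for Xen's struct cpu_user_regs. */
struct demo_regs {
    unsigned long pc;
};

/* Plain conversion of the address to an integer. */
static uintptr_t addr_plain(const struct demo_regs *r)
{
    return (uintptr_t)r;
}

/* Same value: the 'const' on the cast's target type is simply discarded. */
static uintptr_t addr_const(const struct demo_regs *r)
{
    return (const uintptr_t)r;
}

/* 'r' itself may be reassigned; only the object it points to is read-only. */
static const struct demo_regs *next_frame(const struct demo_regs *r)
{
    r = r + 1;          /* allowed: the pointer is not const */
    /* r->pc = 0; */    /* would not compile: the pointee is const */
    return r;
}
```

Since `uintptr_t` has no pointed-to type, the conversion leaves no pointee qualifier for the cast to keep or drop.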


Cheers,

--
Julien Grall

Re: [PATCH v2] xen: address violations of MISRA C:2012 Rule 11.8.


On 01/12/23 14:03, Jan Beulich wrote:

On 01.12.2023 12:48, Julien Grall wrote:

On 01/12/2023 11:37, Simone Ballarin wrote:

--- a/xen/arch/arm/include/asm/regs.h
+++ b/xen/arch/arm/include/asm/regs.h
@@ -48,7 +48,7 @@ static inline bool regs_mode_is_32bit(const struct cpu_user_regs *regs)
 
 static inline bool guest_mode(const struct cpu_user_regs *r)
 {
-unsigned long diff = (char *)guest_cpu_user_regs() - (char *)(r);
+unsigned long diff = (const uintptr_t)guest_cpu_user_regs() - (const uintptr_t)(r);


NIT: The const should not be necessary here. Am I correct?


--- a/xen/arch/x86/include/asm/regs.h
+++ b/xen/arch/x86/include/asm/regs.h
@@ -6,7 +6,8 @@
   
 #define guest_mode(r) \
 ({ \
-unsigned long diff = (char *)guest_cpu_user_regs() - (char *)(r); \
+unsigned long diff = (const uintptr_t)guest_cpu_user_regs() - \
+(const uintptr_t(r)); \


Was this compiled on x86? Shouldn't the last one be (const uintptr_t)(r))?


And again with the stray const-s dropped and with indentation adjusted.



I will remove the const in the first parameter and fix the indentation
in the following way:

unsigned long diff = (uintptr_t)guest_cpu_user_regs() -\
 (const uintptr_t)(r); \


Jan



--
Simone Ballarin, M.Sc.

Field Application Engineer, BUGSENG (https://bugseng.com)

Re: [PATCH v2] xen: address violations of MISRA C:2012 Rule 11.8.


On 01/12/23 12:48, Julien Grall wrote:

Hi Simone,

On 01/12/2023 11:37, Simone Ballarin wrote:

From: Maria Celeste Cesario 

Remove or amend casts to comply with Rule 11.8.

The violations are resolved either by adding missing const
qualifiers in casts or by removing unnecessary cast.

Change type of operands from char* to uintptr_t: uintptr_t is
the appropriate type for memory address operations.

No functional change.

---
Changes in v2:
- arm/regs.h: add const qualifier to the first operand,
 change types of both operands from char* to uintptr_t.
- x86/regs.h: add const qualifier to both operands. Change
 types of both operands from char* to uintptr_t to
 conform with the arm version.
- dom0less-build.c: rebase change in the new file.

Signed-off-by: Maria Celeste Cesario  
Signed-off-by: Simone Ballarin  
---
  xen/arch/arm/dom0less-build.c | 2 +-
  xen/arch/arm/include/asm/atomic.h | 2 +-
  xen/arch/arm/include/asm/regs.h   | 2 +-
  xen/arch/x86/include/asm/regs.h   | 3 ++-
  4 files changed, 5 insertions(+), 4 deletions(-)

diff --git a/xen/arch/arm/dom0less-build.c b/xen/arch/arm/dom0less-build.c
index d39cbd969a..fb63ec6fd1 100644
--- a/xen/arch/arm/dom0less-build.c
+++ b/xen/arch/arm/dom0less-build.c
@@ -354,7 +354,7 @@ static int __init handle_passthrough_prop(struct kernel_info *kinfo,
  if ( node == NULL )
  {
  printk(XENLOG_ERR "Couldn't find node %s in host_dt!\n",
-   (char *)xen_path->data);
+   xen_path->data);
  return -EINVAL;
  }
diff --git a/xen/arch/arm/include/asm/atomic.h b/xen/arch/arm/include/asm/atomic.h
index 64314d59b3..517216d2a8 100644
--- a/xen/arch/arm/include/asm/atomic.h
+++ b/xen/arch/arm/include/asm/atomic.h
@@ -154,7 +154,7 @@ static always_inline void write_atomic_size(volatile void *p,
   */
  static inline int atomic_read(const atomic_t *v)
  {
-    return *(volatile int *)&v->counter;
+    return *(const volatile int *)&v->counter;
  }
  static inline int _atomic_read(atomic_t v)
diff --git a/xen/arch/arm/include/asm/regs.h b/xen/arch/arm/include/asm/regs.h
index 8a0db95415..b28eb5de7a 100644
--- a/xen/arch/arm/include/asm/regs.h
+++ b/xen/arch/arm/include/asm/regs.h
@@ -48,7 +48,7 @@ static inline bool regs_mode_is_32bit(const struct cpu_user_regs *regs)
  static inline bool guest_mode(const struct cpu_user_regs *r)
  {
-    unsigned long diff = (char *)guest_cpu_user_regs() - (char *)(r);
+    unsigned long diff = (const uintptr_t)guest_cpu_user_regs() - (const uintptr_t)(r);


NIT: The const should not be necessary here. Am I correct?


The const in the first parameter is not necessary, I will drop it.




  /* Frame pointer must point into current CPU stack. */
  ASSERT(diff < STACK_SIZE);
  /* If not a guest frame, it must be a hypervisor frame. */
diff --git a/xen/arch/x86/include/asm/regs.h b/xen/arch/x86/include/asm/regs.h
index 3fb94deedc..64f1e0d400 100644
--- a/xen/arch/x86/include/asm/regs.h
+++ b/xen/arch/x86/include/asm/regs.h
@@ -6,7 +6,8 @@
  #define guest_mode(r) \
  ({ \
-    unsigned long diff = (char *)guest_cpu_user_regs() - (char *)(r); \
+    unsigned long diff = (const uintptr_t)guest_cpu_user_regs() - \
+    (const uintptr_t(r)); \


Was this compiled on x86? Shouldn't the last one be (const uintptr_t)(r))?



Yes, you are right. I'll fix it in v3.


Cheers,



--
Simone Ballarin, M.Sc.

Field Application Engineer, BUGSENG (https://bugseng.com)

[xen-unstable-smoke test] 183963: tolerable all pass - PUSHED

flight 183963 xen-unstable-smoke real [real]
http://logs.test-lab.xenproject.org/osstest/logs/183963/

Failures :-/ but no regressions.

Tests which did not succeed, but are not blocking:
 test-amd64-amd64-libvirt     15 migrate-support-check        fail   never pass
 test-arm64-arm64-xl-xsm      15 migrate-support-check        fail   never pass
 test-arm64-arm64-xl-xsm      16 saverestore-support-check    fail   never pass
 test-armhf-armhf-xl          15 migrate-support-check        fail   never pass
 test-armhf-armhf-xl          16 saverestore-support-check    fail   never pass

version targeted for testing:
 xen  1571ff7a987b88b20598a6d49910457f3b2c59f1
baseline version:
 xen  def73fc14407252cc801f35cd7746e60ccd70884

Last test of basis   183960  2023-12-01 04:00:28 Z    0 days
Testing same since   183963  2023-12-01 10:02:10 Z    0 days    1 attempts


People who touched revisions under test:
  Alejandro Vallejo 
  Federico Serafini 
  Jan Beulich 
  Julien Grall 
  Nicola Vetrini 

jobs:
 build-arm64-xsm  pass
 build-amd64  pass
 build-armhf  pass
 build-amd64-libvirt  pass
 test-armhf-armhf-xl  pass
 test-arm64-arm64-xl-xsm  pass
 test-amd64-amd64-xl-qemuu-debianhvm-amd64  pass
 test-amd64-amd64-libvirt pass



sg-report-flight on osstest.test-lab.xenproject.org
logs: /home/logs/logs
images: /home/logs/images

Logs, config files, etc. are available at
http://logs.test-lab.xenproject.org/osstest/logs

Explanation of these reports, and of osstest in general, is at
http://xenbits.xen.org/gitweb/?p=osstest.git;a=blob;f=README.email;hb=master
http://xenbits.xen.org/gitweb/?p=osstest.git;a=blob;f=README;hb=master

Test harness code can be found at
http://xenbits.xen.org/gitweb?p=osstest.git;a=summary


Pushing revision :

To xenbits.xen.org:/home/xen/git/xen.git
   def73fc144..1571ff7a98  1571ff7a987b88b20598a6d49910457f3b2c59f1 -> smoke

Re: [PATCH v2] xen: address violations of MISRA C:2012 Rule 11.8.

On 01.12.2023 12:48, Julien Grall wrote:
> On 01/12/2023 11:37, Simone Ballarin wrote:
>> --- a/xen/arch/arm/include/asm/regs.h
>> +++ b/xen/arch/arm/include/asm/regs.h
>> @@ -48,7 +48,7 @@ static inline bool regs_mode_is_32bit(const struct cpu_user_regs *regs)
>>   
>>   static inline bool guest_mode(const struct cpu_user_regs *r)
>>   {
>> -unsigned long diff = (char *)guest_cpu_user_regs() - (char *)(r);
>> +unsigned long diff = (const uintptr_t)guest_cpu_user_regs() - (const uintptr_t)(r);
> 
> NIT: The const should not be necessary here. Am I correct?
> 
>> --- a/xen/arch/x86/include/asm/regs.h
>> +++ b/xen/arch/x86/include/asm/regs.h
>> @@ -6,7 +6,8 @@
>>   
>>   #define guest_mode(r) \
>>   ({ \
>> -unsigned long diff = (char *)guest_cpu_user_regs() - (char *)(r); \
>> +unsigned long diff = (const uintptr_t)guest_cpu_user_regs() - \
>> +(const uintptr_t(r)); \
> 
> Was this compiled on x86? Shouldn't the last one be (const uintptr_t)(r))?

And again with the stray const-s dropped and with indentation adjusted.

Jan
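For reference, a minimal sketch of the x86 macro with the stray `const`s dropped and the second cast written as `(uintptr_t)(r)`, so the macro argument is parenthesised on its own inside the cast. Like the Xen macro it relies on the GNU statement-expression extension. `demo_stack`, `DEMO_STACK_SLOTS`, `demo_guest_regs()` and `assert()` are hypothetical stand-ins for the per-CPU stack, `STACK_SIZE`, `guest_cpu_user_regs()` and `ASSERT()`, and the hypervisor-frame check is omitted:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-ins: a tiny "stack" whose top slot is the guest frame. */
struct demo_regs { unsigned long cs; };

#define DEMO_STACK_SLOTS 64
static struct demo_regs demo_stack[DEMO_STACK_SLOTS];

#define demo_guest_regs() (&demo_stack[DEMO_STACK_SLOTS - 1])

/* Plain (uintptr_t) casts; the argument is parenthesised inside the cast. */
#define demo_guest_mode(r)                                  \
({                                                          \
    unsigned long diff = (uintptr_t)demo_guest_regs() -     \
                         (uintptr_t)(r);                    \
    /* Frame pointer must point into the demo stack. */     \
    assert(diff < sizeof(demo_stack));                      \
    /* Nonzero only for the guest frame itself. */          \
    diff == 0;                                              \
})
```

Because `diff` is unsigned, a frame above the guest frame wraps around to a very large value, so the single `diff < sizeof(demo_stack)` comparison rejects frames outside the stack in either direction.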

Re: [PATCH v2 1/5] x86/livepatch: set function alignment to ensure minimal function size

On 01.12.2023 12:31, Roger Pau Monné wrote:
> On Fri, Dec 01, 2023 at 11:59:09AM +0100, Jan Beulich wrote:
>> On 01.12.2023 11:21, Roger Pau Monné wrote:
>>> On Fri, Dec 01, 2023 at 10:41:45AM +0100, Jan Beulich wrote:
>>>> On 01.12.2023 09:50, Roger Pau Monné wrote:
>>>>> On Fri, Dec 01, 2023 at 07:53:29AM +0100, Jan Beulich wrote:
>>>>>> On 30.11.2023 18:37, Roger Pau Monné wrote:
>>>>>>> On Thu, Nov 30, 2023 at 05:55:07PM +0100, Jan Beulich wrote:
>>>>>>>> On 28.11.2023 11:03, Roger Pau Monne wrote:
>>>>>>>>> The minimal function size requirements for livepatch are either 5
>>>>>>>>> bytes (for jmp) or 9 bytes (for endbr + jmp).  Ensure that functions
>>>>>>>>> are always at least that size by requesting the compiler to align the
>>>>>>>>> functions to 8 or 16 bytes, depending on whether Xen is built with
>>>>>>>>> IBT support.
>>>>>>>>
>>>>>>>> How is alignment going to enforce minimum function size? If a function
>>>>>>>> is last in a section, there may not be any padding added (ahead of
>>>>>>>> linking at least). The trailing padding also isn't part of the function.
>>>>>>>
>>>>>>> If each function lives in its own section (by using
>>>>>>> -ffunction-sections), and each section is aligned, then I think we can
>>>>>>> guarantee that there will always be enough padding space?
>>>>>>>
>>>>>>> Even the last function/section on the .text block would still be
>>>>>>> aligned, and as long as the function alignment <= SECTION_ALIGN
>>>>>>> there will be enough padding left.  I should add some build time
>>>>>>> assert that CONFIG_CC_FUNCTION_ALIGNMENT <= SECTION_ALIGN.
>>>>>>
>>>>>> I'm not sure of there being a requirement for a section to be padded to
>>>>>> its alignment. If the following section has smaller alignment, it could
>>>>>> be made to start earlier. Of course our linker scripts might guarantee
>>>>>> this ...
>>>>>
>>>>> I do think so, given our linker script arrangements for the .text
>>>>> section:
>>>>>
>>>>> DECL_SECTION(.text) {
>>>>> [...]
>>>>> } PHDR(text) = 0x9090
>>>>>
>>>>> . = ALIGN(SECTION_ALIGN);
>>>>>
>>>>> The end of the text section is aligned to SECTION_ALIGN, so as long as
>>>>> SECTION_ALIGN >= CONFIG_CC_FUNCTION_ALIGNMENT the alignment should
>>>>> guarantee a minimal function size.
>>>>>
>>>>> Do you think it would be clearer if I add the following paragraph:
>>>>>
>>>>> "Given the Xen linker script arrangement of the .text section, we can
>>>>> ensure that when all functions are aligned to the given boundary the
>>>>> function size will always be a multiple of such alignment, even for
>>>>> the last function in .text, as the linker script aligns the end of the
>>>>> section to SECTION_ALIGN."
>>>>
>>>> I think this would be useful to have there. Beyond that, assembly code
>>>> also needs considering btw.
>>>
>>> Assembly will get dealt with once we start to also have separate
>>> sections for each assembly function.  We cannot patch assembly code at
>>> the moment anyway, due to lack of debug symbols.
>>
>> Well, yes, that's one part of it. The other is that some .text coming
>> from an assembly source may follow one coming from some C source, and
>> if the assembly one then isn't properly aligned, padding space again
>> wouldn't necessarily be large enough. This may be alright now (where
>> .text is the only thing that can come from .S and would be linked
>> ahead of all .text.*, being the only thing that can come from .c),
> 
> What about adding:
> 
> #ifdef CONFIG_CC_SPLIT_SECTIONS
>    *(.text.*)
> #endif
> #ifdef CONFIG_CC_FUNCTION_ALIGNMENT
>    /* Ensure enough padding regardless of next section alignment. */
>    . = ALIGN(CONFIG_CC_FUNCTION_ALIGNMENT);
> #endif
> 
> In order to assert that the end of .text.* is also aligned?

Probably.

>> but
>> it might subtly when assembly code is also switched to per-function
>> sections (you may recall that a patch to this effect is already
>> pending: "common: honor CONFIG_CC_SPLIT_SECTIONS also for assembly
>> functions").
> 
> Yes, I think such patch should also honor the required alignment
> specified in CONFIG_CC_FUNCTION_ALIGNMENT.

I've added a note for myself to that patch, to adjust once yours has
landed (which given the state of my series is likely going to be much
earlier).

Jan

Re: [PATCH v2 3/5] xen/x86: introduce self modifying code test

On Thu, Nov 30, 2023 at 06:02:55PM +0100, Jan Beulich wrote:
> On 28.11.2023 11:03, Roger Pau Monne wrote:
> > --- /dev/null
> > +++ b/xen/arch/x86/test-smc.c
> > @@ -0,0 +1,68 @@
> > +/* SPDX-License-Identifier: GPL-2.0 */
> > +
> > +#include 
> > +
> > +#include 
> > +#include 
> > +#include 
> > +
> > +static bool cf_check test_insn_replacement(void)
> > +{
> > +#define EXPECTED_VALUE 2
> > +unsigned int r = ~EXPECTED_VALUE;
> 
> The compiler is permitted to elide the initializer unless ...
> 
> > +alternative_io("", "mov $" STR(EXPECTED_VALUE) ", %0",
> > +   X86_FEATURE_ALWAYS, "=r"(r));
> 
> ... you use "+r" here.

I see, '=' assumes the operand is always written to, which is not the
case if the alternative is not applied.
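A portable way to see the difference, assuming GCC or Clang extended asm: with `"+r"` the operand is declared as both read and written, so the compiler must place the initial value in the register before the asm and keep whatever the asm leaves there. The empty template below models the case where the alternative has not been patched in; `unpatched_site()` is a hypothetical name:

```c
#include <assert.h>

/*
 * Empty template: models the alternative *not* having been applied.
 * "+r" tells the compiler the asm reads 'r', so the initializer
 * cannot be treated as a dead store.
 */
static unsigned int unpatched_site(unsigned int initial)
{
    unsigned int r = initial;

    __asm__ volatile ("" : "+r" (r));
    return r;
}
```

With `"=r"` and the same empty template the returned value would be indeterminate, so no check on it would be meaningful.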

> Also (nit) there's a blank missing between that
> string and the opening parenthesis. Also what's wrong with passing
> EXPECTED_VALUE in as an "i" constraint input operand?

Me not knowing enough inline assembly I think, that's what's wrong.
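Jan's suggestion is to pass the value as an input operand with the `"i"` (immediate integer) constraint instead of stringifying it into the template. In AT&T syntax GCC prints an `"i"` operand with a `$` prefix, so `mov %1, %0` assembles as `mov $2, %reg`. The sketch below does not use Xen's `alternative_io()`; `patched_site()` is hypothetical, and it only runs the asm on x86, falling back to a plain assignment elsewhere:

```c
#include <assert.h>

#define EXPECTED_VALUE 2

/* Models the patched path: the immediate comes from an "i" input operand. */
static unsigned int patched_site(void)
{
    unsigned int r = ~EXPECTED_VALUE;   /* sentinel meaning "not patched" */

#if defined(__x86_64__) || defined(__i386__)
    /* "+r" keeps the sentinel meaningful if the template were empty. */
    __asm__ volatile ("mov %1, %0" : "+r" (r) : "i" (EXPECTED_VALUE));
#else
    r = EXPECTED_VALUE;                 /* non-x86 fallback */
#endif
    return r;
}
```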

> > @@ -1261,6 +1269,7 @@ struct xen_sysctl {
> >  struct xen_sysctl_livepatch_op  livepatch;
> >  #if defined(__i386__) || defined(__x86_64__)
> >  struct xen_sysctl_cpu_policycpu_policy;
> > +struct xen_sysctl_test_smc  smc;
> 
> Imo the field name would better be test_smc (leaving aside Stefano's comment).

Right, will see what Stefano thinks about using test_smoc.

Thanks, Roger.

Re: [PATCH v2 3/5] xen/x86: introduce self modifying code test

On Wed, Nov 29, 2023 at 06:58:38PM -0800, Stefano Stabellini wrote:
> On Tue, 28 Nov 2023, Roger Pau Monne wrote:
> > Introduce a helper to perform checks related to self modifying code, and
> > start by creating a simple test to check that alternatives have been
> > applied.
> > 
> > Such test is hooked into the boot process and called just after
> > alternatives have been applied.  In case of failure a message is printed,
> > and the hypervisor is tainted as not having passed the tests; this
> > requires introducing a new taint bit (printed as 'A').
> > 
> > A new sysctl is also introduced to run the tests on demand.  While there
> > are no current users introduced here, further changes will introduce
> > those, and it's helpful to have the interface defined in the sysctl
> > header from the start.
> > 
> > Signed-off-by: Roger Pau Monné 
> > ---
> > Changes since v1:
> >  - Rework test and interface.
> > ---
> >  tools/include/xenctrl.h |  2 +
> >  tools/libs/ctrl/xc_misc.c   | 14 ++
> >  xen/arch/x86/Makefile   |  1 +
> >  xen/arch/x86/include/asm/test-smc.h | 18 
> >  xen/arch/x86/setup.c|  3 ++
> >  xen/arch/x86/sysctl.c   |  7 +++
> >  xen/arch/x86/test-smc.c | 68 +
> 
> If possible, could we name this differently?

Wikipedia also suggests 'smoc' as an alternative acronym for
self-modifying code, would test-smoc be OK?

Thanks, Roger.

Re: [PATCH v3] xen/public: fix flexible array definitions





On 01/12/2023 11:20, Juergen Gross wrote:

Flexible arrays in public headers can be problematic with some
compilers.

With XEN_FLEX_ARRAY_DIM there is a mechanism available to deal with
this issue, but care must be taken to not change the affected structs
in an incompatible way.

So bump __XEN_LATEST_INTERFACE_VERSION__ and introduce a new macro
XENPV_FLEX_ARRAY_DIM which will be XEN_FLEX_ARRAY_DIM with the
interface version being new enough and "1" (the value used today in
the affected headers) when the interface version is an old one.

Replace the arr[1] instances (this includes the ones seen to be
problematic in recent Linux kernels [1]) with arr[XENPV_FLEX_ARRAY_DIM]
in order to avoid compilation errors.

[1]: https://bugzilla.kernel.org/show_bug.cgi?id=217693

Signed-off-by: Juergen Gross 


Acked-by: Julien Grall 

Cheers,

--
Julien Grall

Re: [PATCH v2] xen: address violations of MISRA C:2012 Rule 11.8.


Hi Simone,

On 01/12/2023 11:37, Simone Ballarin wrote:

From: Maria Celeste Cesario 

Remove or amend casts to comply with Rule 11.8.

The violations are resolved either by adding missing const
qualifiers in casts or by removing unnecessary cast.

Change type of operands from char* to uintptr_t: uintptr_t is
the appropriate type for memory address operations.

No functional change.

---
Changes in v2:
- arm/regs.h: add const qualifier to the first operand,
 change types of both operands from char* to uintptr_t.
- x86/regs.h: add const qualifier to both operands. Change
 types of both operands from char* to uintptr_t to
 conform with the arm version.
- dom0less-build.c: rebase change in the new file.

Signed-off-by: Maria Celeste Cesario  
Signed-off-by: Simone Ballarin  
---
  xen/arch/arm/dom0less-build.c | 2 +-
  xen/arch/arm/include/asm/atomic.h | 2 +-
  xen/arch/arm/include/asm/regs.h   | 2 +-
  xen/arch/x86/include/asm/regs.h   | 3 ++-
  4 files changed, 5 insertions(+), 4 deletions(-)

diff --git a/xen/arch/arm/dom0less-build.c b/xen/arch/arm/dom0less-build.c
index d39cbd969a..fb63ec6fd1 100644
--- a/xen/arch/arm/dom0less-build.c
+++ b/xen/arch/arm/dom0less-build.c
@@ -354,7 +354,7 @@ static int __init handle_passthrough_prop(struct kernel_info *kinfo,
  if ( node == NULL )
  {
  printk(XENLOG_ERR "Couldn't find node %s in host_dt!\n",
-   (char *)xen_path->data);
+   xen_path->data);
  return -EINVAL;
  }
  
diff --git a/xen/arch/arm/include/asm/atomic.h b/xen/arch/arm/include/asm/atomic.h
index 64314d59b3..517216d2a8 100644
--- a/xen/arch/arm/include/asm/atomic.h
+++ b/xen/arch/arm/include/asm/atomic.h
@@ -154,7 +154,7 @@ static always_inline void write_atomic_size(volatile void *p,
   */
  static inline int atomic_read(const atomic_t *v)
  {
-return *(volatile int *)&v->counter;
+return *(const volatile int *)&v->counter;
  }
  
  static inline int _atomic_read(atomic_t v)

diff --git a/xen/arch/arm/include/asm/regs.h b/xen/arch/arm/include/asm/regs.h
index 8a0db95415..b28eb5de7a 100644
--- a/xen/arch/arm/include/asm/regs.h
+++ b/xen/arch/arm/include/asm/regs.h
@@ -48,7 +48,7 @@ static inline bool regs_mode_is_32bit(const struct cpu_user_regs *regs)
  
  static inline bool guest_mode(const struct cpu_user_regs *r)
  {
-unsigned long diff = (char *)guest_cpu_user_regs() - (char *)(r);
+unsigned long diff = (const uintptr_t)guest_cpu_user_regs() - (const uintptr_t)(r);


NIT: The const should not be necessary here. Am I correct?


  /* Frame pointer must point into current CPU stack. */
  ASSERT(diff < STACK_SIZE);
  /* If not a guest frame, it must be a hypervisor frame. */
diff --git a/xen/arch/x86/include/asm/regs.h b/xen/arch/x86/include/asm/regs.h
index 3fb94deedc..64f1e0d400 100644
--- a/xen/arch/x86/include/asm/regs.h
+++ b/xen/arch/x86/include/asm/regs.h
@@ -6,7 +6,8 @@
  
  #define guest_mode(r) \
  ({ \
-unsigned long diff = (char *)guest_cpu_user_regs() - (char *)(r); \
+unsigned long diff = (const uintptr_t)guest_cpu_user_regs() - \
+(const uintptr_t(r)); \


Was this compiled on x86? Shouldn't the last one be (const uintptr_t)(r))?

Cheers,

--
Julien Grall

[PATCH v2] xen: address violations of MISRA C:2012 Rule 11.8.

From: Maria Celeste Cesario 

Remove or amend casts to comply with Rule 11.8.

The violations are resolved either by adding missing const
qualifiers in casts or by removing unnecessary cast.
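As an illustration of the first kind of fix (the `atomic_read()` hunk): a cast to `volatile int *` drops the `const` that `v` points to, which is what Rule 11.8 forbids, while a cast to `const volatile int *` keeps it and still forces a fresh load. A minimal sketch with a hypothetical `demo_atomic_t` standing in for Xen's `atomic_t`:

```c
#include <assert.h>

/* Hypothetical stand-in for Xen's atomic_t. */
typedef struct { int counter; } demo_atomic_t;

static inline int demo_atomic_read(const demo_atomic_t *v)
{
    /*
     * (volatile int *)&v->counter would remove the 'const' carried by the
     * pointee (a Rule 11.8 violation).  Keeping 'const' in the target type
     * preserves it; 'volatile' still makes every call perform a real load.
     */
    return *(const volatile int *)&v->counter;
}
```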

Change type of operands from char* to uintptr_t: uintptr_t is
the appropriate type for memory address operations.
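A sketch of the address-difference pattern, assuming a flat address space: ISO C makes the pointer-to-integer mapping implementation-defined, but on such targets the difference of two `uintptr_t` values equals the `char *` difference. Because the conversion goes straight from `const void *` to an integer type, no pointer cast discards a qualifier. `addr_distance()` is a hypothetical helper:

```c
#include <assert.h>
#include <stdint.h>

/*
 * Byte distance from 'lo' to 'hi'.  Both parameters are pointers to const;
 * converting them to uintptr_t involves no intermediate non-const pointer.
 */
static uintptr_t addr_distance(const void *hi, const void *lo)
{
    return (uintptr_t)hi - (uintptr_t)lo;
}
```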

No functional change.

---
Changes in v2:
- arm/regs.h: add const qualifier to the first operand,
change types of both operands from char* to uintptr_t.
- x86/regs.h: add const qualifier to both operands. Change
types of both operands from char* to uintptr_t to
conform with the arm version.
- dom0less-build.c: rebase change in the new file.

Signed-off-by: Maria Celeste Cesario  
Signed-off-by: Simone Ballarin  
---
 xen/arch/arm/dom0less-build.c | 2 +-
 xen/arch/arm/include/asm/atomic.h | 2 +-
 xen/arch/arm/include/asm/regs.h   | 2 +-
 xen/arch/x86/include/asm/regs.h   | 3 ++-
 4 files changed, 5 insertions(+), 4 deletions(-)

diff --git a/xen/arch/arm/dom0less-build.c b/xen/arch/arm/dom0less-build.c
index d39cbd969a..fb63ec6fd1 100644
--- a/xen/arch/arm/dom0less-build.c
+++ b/xen/arch/arm/dom0less-build.c
@@ -354,7 +354,7 @@ static int __init handle_passthrough_prop(struct kernel_info *kinfo,
 if ( node == NULL )
 {
 printk(XENLOG_ERR "Couldn't find node %s in host_dt!\n",
-   (char *)xen_path->data);
+   xen_path->data);
 return -EINVAL;
 }
 
diff --git a/xen/arch/arm/include/asm/atomic.h b/xen/arch/arm/include/asm/atomic.h
index 64314d59b3..517216d2a8 100644
--- a/xen/arch/arm/include/asm/atomic.h
+++ b/xen/arch/arm/include/asm/atomic.h
@@ -154,7 +154,7 @@ static always_inline void write_atomic_size(volatile void *p,
  */
 static inline int atomic_read(const atomic_t *v)
 {
-return *(volatile int *)&v->counter;
+return *(const volatile int *)&v->counter;
 }
 
 static inline int _atomic_read(atomic_t v)
diff --git a/xen/arch/arm/include/asm/regs.h b/xen/arch/arm/include/asm/regs.h
index 8a0db95415..b28eb5de7a 100644
--- a/xen/arch/arm/include/asm/regs.h
+++ b/xen/arch/arm/include/asm/regs.h
@@ -48,7 +48,7 @@ static inline bool regs_mode_is_32bit(const struct cpu_user_regs *regs)
 
 static inline bool guest_mode(const struct cpu_user_regs *r)
 {
-unsigned long diff = (char *)guest_cpu_user_regs() - (char *)(r);
+unsigned long diff = (const uintptr_t)guest_cpu_user_regs() - (const uintptr_t)(r);
 /* Frame pointer must point into current CPU stack. */
 ASSERT(diff < STACK_SIZE);
 /* If not a guest frame, it must be a hypervisor frame. */
diff --git a/xen/arch/x86/include/asm/regs.h b/xen/arch/x86/include/asm/regs.h
index 3fb94deedc..64f1e0d400 100644
--- a/xen/arch/x86/include/asm/regs.h
+++ b/xen/arch/x86/include/asm/regs.h
@@ -6,7 +6,8 @@
 
 #define guest_mode(r) \
 ({\
-unsigned long diff = (char *)guest_cpu_user_regs() - (char *)(r); \
+unsigned long diff = (const uintptr_t)guest_cpu_user_regs() - \
+(const uintptr_t(r)); \
 /* Frame pointer must point into current CPU stack. */\
 ASSERT(diff < STACK_SIZE);\
 /* If not a guest frame, it must be a hypervisor frame. */\
-- 
2.40.0

Re: [PATCH v2 1/5] x86/livepatch: set function alignment to ensure minimal function size

On Fri, Dec 01, 2023 at 11:59:09AM +0100, Jan Beulich wrote:
> On 01.12.2023 11:21, Roger Pau Monné wrote:
> > On Fri, Dec 01, 2023 at 10:41:45AM +0100, Jan Beulich wrote:
> >> On 01.12.2023 09:50, Roger Pau Monné wrote:
> >>> On Fri, Dec 01, 2023 at 07:53:29AM +0100, Jan Beulich wrote:
> >>>> On 30.11.2023 18:37, Roger Pau Monné wrote:
> >>>>> On Thu, Nov 30, 2023 at 05:55:07PM +0100, Jan Beulich wrote:
> >>>>>> On 28.11.2023 11:03, Roger Pau Monne wrote:
> >>>>>>> The minimal function size requirements for livepatch are either 5
> >>>>>>> bytes (for jmp) or 9 bytes (for endbr + jmp).  Ensure that functions
> >>>>>>> are always at least that size by requesting the compiler to align the
> >>>>>>> functions to 8 or 16 bytes, depending on whether Xen is built with
> >>>>>>> IBT support.
> >>>>>>
> >>>>>> How is alignment going to enforce minimum function size? If a function
> >>>>>> is last in a section, there may not be any padding added (ahead of
> >>>>>> linking at least). The trailing padding also isn't part of the function.
> >>>>>
> >>>>> If each function lives in its own section (by using
> >>>>> -ffunction-sections), and each section is aligned, then I think we can
> >>>>> guarantee that there will always be enough padding space?
> >>>>>
> >>>>> Even the last function/section on the .text block would still be
> >>>>> aligned, and as long as the function alignment <= SECTION_ALIGN
> >>>>> there will be enough padding left.  I should add some build time
> >>>>> assert that CONFIG_CC_FUNCTION_ALIGNMENT <= SECTION_ALIGN.
> >>>>
> >>>> I'm not sure of there being a requirement for a section to be padded to
> >>>> its alignment. If the following section has smaller alignment, it could
> >>>> be made to start earlier. Of course our linker scripts might guarantee
> >>>> this ...
> >>>
> >>> I do think so, given our linker script arrangements for the .text
> >>> section:
> >>>
> >>> DECL_SECTION(.text) {
> >>> [...]
> >>> } PHDR(text) = 0x9090
> >>>
> >>> . = ALIGN(SECTION_ALIGN);
> >>>
> >>> The end of the text section is aligned to SECTION_ALIGN, so as long as
> >>> SECTION_ALIGN >= CONFIG_CC_FUNCTION_ALIGNMENT the alignment should
> >>> guarantee a minimal function size.
> >>>
> >>> Do you think it would be clearer if I add the following paragraph:
> >>>
> >>> "Given the Xen linker script arrangement of the .text section, we can
> >>> ensure that when all functions are aligned to the given boundary the
> >>> function size will always be a multiple of such alignment, even for
> >>> the last function in .text, as the linker script aligns the end of the
> >>> section to SECTION_ALIGN."
> >>
> >> I think this would be useful to have there. Beyond that, assembly code
> >> also needs considering btw.
> > 
> > Assembly will get dealt with once we start to also have separate
> > sections for each assembly function.  We cannot patch assembly code at
> > the moment anyway, due to lack of debug symbols.
> 
> Well, yes, that's one part of it. The other is that some .text coming
> from an assembly source may follow one coming from some C source, and
> if the assembly one then isn't properly aligned, padding space again
> wouldn't necessarily be large enough. This may be alright now (where
> .text is the only thing that can come from .S and would be linked
> ahead of all .text.*, being the only thing that can come from .c),

What about adding:

#ifdef CONFIG_CC_SPLIT_SECTIONS
   *(.text.*)
#endif
#ifdef CONFIG_CC_FUNCTION_ALIGNMENT
   /* Ensure enough padding regardless of next section alignment. */
   . = ALIGN(CONFIG_CC_FUNCTION_ALIGNMENT);
#endif

In order to assert that the end of .text.* is also aligned?
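The arithmetic behind the padding argument can be sketched as follows, assuming each function starts on a `CONFIG_CC_FUNCTION_ALIGNMENT` boundary and the next function (or the aligned end of `.text`) starts on the next boundary: the space available for patching is the function size rounded up to the alignment, which is never smaller than the alignment itself. The `DEMO_*` values are hypothetical, and `_Static_assert` stands in for the build-time check mentioned earlier in the thread:

```c
#include <assert.h>
#include <stddef.h>

/* Minimum patch sizes quoted in the thread. */
#define LIVEPATCH_JMP_SIZE       5   /* jmp rel32 */
#define LIVEPATCH_ENDBR_JMP_SIZE 9   /* endbr64 (4) + jmp rel32 (5) */

/* Hypothetical configuration values (IBT build: 16-byte alignment). */
#define DEMO_FUNCTION_ALIGNMENT  16
#define DEMO_SECTION_ALIGN       4096

_Static_assert(DEMO_FUNCTION_ALIGNMENT <= DEMO_SECTION_ALIGN,
               "function alignment must not exceed section alignment");
_Static_assert(DEMO_FUNCTION_ALIGNMENT >= LIVEPATCH_ENDBR_JMP_SIZE,
               "alignment must leave room for endbr + jmp");

/* Round up to a power-of-two boundary, like the linker's ALIGN(). */
static size_t align_up(size_t value, size_t align)
{
    return (value + align - 1) & ~(align - 1);
}
```

Even a 1-byte function then owns a full alignment slot, which is why the end of the last `.text.*` input must also be aligned.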

> but
> it might subtly when assembly code is also switched to per-function
> sections (you may recall that a patch to this effect is already
> pending: "common: honor CONFIG_CC_SPLIT_SECTIONS also for assembly
> functions").

Yes, I think such patch should also honor the required alignment
specified in CONFIG_CC_FUNCTION_ALIGNMENT.

Thanks, Roger.

Re: [PATCH v3] xen/public: fix flexible array definitions

On 01.12.2023 12:20, Juergen Gross wrote:
> Flexible arrays in public headers can be problematic with some
> compilers.
> 
> With XEN_FLEX_ARRAY_DIM there is a mechanism available to deal with
> this issue, but care must be taken to not change the affected structs
> in an incompatible way.
> 
> So bump __XEN_LATEST_INTERFACE_VERSION__ and introduce a new macro
> XENPV_FLEX_ARRAY_DIM which will be XEN_FLEX_ARRAY_DIM with the
> interface version being new enough and "1" (the value used today in
> the affected headers) when the interface version is an old one.
> 
> Replace the arr[1] instances (this includes the ones seen to be
> problematic in recent Linux kernels [1]) with arr[XENPV_FLEX_ARRAY_DIM]
> in order to avoid compilation errors.
> 
> [1]: https://bugzilla.kernel.org/show_bug.cgi?id=217693
> 
> Signed-off-by: Juergen Gross 

Reviewed-by: Jan Beulich 

> --- a/xen/include/public/io/ring.h
> +++ b/xen/include/public/io/ring.h
> @@ -25,8 +25,16 @@
>   * and grant_table.h from the Xen public headers.
>   */
>  
> +#include "../xen.h"
>  #include "../xen-compat.h"

Just to mention it: While perhaps good practice, I'm not convinced this
extra #include is actually needed here.

> +/* Some PV I/O interfaces need a compatibility variant. */
> +#if __XEN_INTERFACE_VERSION__ < 0x00041300
> +#define XENPV_FLEX_ARRAY_DIM  1 /* variable size */
> +#else
> +#define XENPV_FLEX_ARRAY_DIM  XEN_FLEX_ARRAY_DIM
> +#endif
> +
>  #if __XEN_INTERFACE_VERSION__ < 0x00030208
>  #define xen_mb()  mb()
>  #define xen_rmb() rmb()
> @@ -110,7 +118,7 @@ struct __name##_sring { \
>  uint8_t pvt_pad[4]; \
>  } pvt;  \
>  uint8_t __pad[44];  \
> -union __name##_sring_entry ring[1]; /* variable-length */   \
> +union __name##_sring_entry ring[XENPV_FLEX_ARRAY_DIM];  \
>  };  \
>  \
>  /* "Front" end's private variables */   \
> @@ -479,7 +487,7 @@ struct name##_data_intf { \
>  uint8_t pad2[56]; \
>  \
>  RING_IDX ring_order; \
> -grant_ref_t ref[]; \
> +grant_ref_t ref[XEN_FLEX_ARRAY_DIM]; \
>  }; \
>  DEFINE_XEN_FLEX_RING(name)
>
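For consumers of these headers, one way to stay independent of whether the array is declared with `[1]` or as a flexible array member is to size allocations with `offsetof()` rather than `sizeof()`. A sketch with hypothetical structs modelled on the `*_page_directory` layouts in the patch (`grant_ref_t` is a `uint32_t` in the public headers):

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef uint32_t demo_grant_ref_t;   /* stand-in for grant_ref_t */

/* Old layout: a one-element array standing in for a variable-length tail. */
struct demo_dir_v1 {
    demo_grant_ref_t gref_dir_next_page;
    demo_grant_ref_t gref[1];
};

/* New layout: a C99 flexible array member. */
struct demo_dir_v2 {
    demo_grant_ref_t gref_dir_next_page;
    demo_grant_ref_t gref[];
};

/* Bytes needed for n entries; offsetof makes this correct for both layouts. */
#define DEMO_DIR_BYTES(type, n) \
    (offsetof(type, gref) + (size_t)(n) * sizeof(demo_grant_ref_t))
```

Code that instead computes `sizeof(struct ...) + (n - 1) * sizeof(grant_ref_t)` silently depends on the `[1]` layout, which is one reason the patch keeps that layout for older interface versions.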

Re: [PATCH v3] xen/public: fix flexible array definitions

2023-12-01 Thread Henry Wang

Hi Juergen,

> On Dec 1, 2023, at 19:20, Juergen Gross  wrote:
> 
> Flexible arrays in public headers can be problematic with some
> compilers.
> 
> With XEN_FLEX_ARRAY_DIM there is a mechanism available to deal with
> this issue, but care must be taken to not change the affected structs
> in an incompatible way.
> 
> So bump __XEN_LATEST_INTERFACE_VERSION__ and introduce a new macro
> XENPV_FLEX_ARRAY_DIM which will be XEN_FLEX_ARRAY_DIM with the
> interface version being new enough and "1" (the value used today in
> the affected headers) when the interface version is an old one.
> 
> Replace the arr[1] instances (this includes the ones seen to be
> problematic in recent Linux kernels [1]) with arr[XENPV_FLEX_ARRAY_DIM]
> in order to avoid compilation errors.
> 
> [1]: https://bugzilla.kernel.org/show_bug.cgi?id=217693
> 
> Signed-off-by: Juergen Gross 

Acked-by: Henry Wang  # CHANGELOG

Kind regards,
Henry

[PATCH v3] xen/public: fix flexible array definitions

2023-12-01 Thread Juergen Gross

Flexible arrays in public headers can be problematic with some
compilers.

With XEN_FLEX_ARRAY_DIM there is a mechanism available to deal with
this issue, but care must be taken to not change the affected structs
in an incompatible way.

So bump __XEN_LATEST_INTERFACE_VERSION__ and introduce a new macro
XENPV_FLEX_ARRAY_DIM which will be XEN_FLEX_ARRAY_DIM with the
interface version being new enough and "1" (the value used today in
the affected headers) when the interface version is an old one.

Replace the arr[1] instances (this includes the ones seen to be
problematic in recent Linux kernels [1]) with arr[XENPV_FLEX_ARRAY_DIM]
in order to avoid compilation errors.
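The version gating can be sketched in isolation. A consumer that selects an interface version older than the cut-off (0x00041300, i.e. 4.19) keeps the old `[1]` layout, so its `sizeof` results are unchanged; newer consumers get the flexible array. All `DEMO_*` names are hypothetical stand-ins for `__XEN_INTERFACE_VERSION__`, `XEN_FLEX_ARRAY_DIM` and `XENPV_FLEX_ARRAY_DIM`, and `XEN_FLEX_ARRAY_DIM` is assumed here to expand to nothing (a C99 flexible array member):

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical consumer pinned to an interface version older than 4.19. */
#define DEMO_INTERFACE_VERSION 0x00040e00

/* Assumed expansion of XEN_FLEX_ARRAY_DIM for a C99 compiler. */
#define DEMO_FLEX_ARRAY_DIM /* empty: flexible array member */

#if DEMO_INTERFACE_VERSION < 0x00041300
#define DEMO_PV_FLEX_ARRAY_DIM 1 /* variable size, legacy layout */
#else
#define DEMO_PV_FLEX_ARRAY_DIM DEMO_FLEX_ARRAY_DIM
#endif

struct demo_page_directory {
    uint32_t gref_dir_next_page;
    uint32_t gref[DEMO_PV_FLEX_ARRAY_DIM];
};
```

Raising `DEMO_INTERFACE_VERSION` to 0x00041300 or later would turn `gref` into a flexible array and shrink `sizeof(struct demo_page_directory)` by one element.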

[1]: https://bugzilla.kernel.org/show_bug.cgi?id=217693

Signed-off-by: Juergen Gross 
---
V2:
- bump interface version and make change only for new version
  (Jan Beulich)
V3:
- move XENPV_FLEX_ARRAY_DIM definition to ring.h (Jan Beulich)
- fix 2 wrong XENPV_FLEX_ARRAY_DIM use cases (Julien Grall)
- add CHANGELOG.md entry (Andrew Cooper)
---
 CHANGELOG.md |  2 ++
 xen/include/public/io/cameraif.h |  2 +-
 xen/include/public/io/displif.h  |  2 +-
 xen/include/public/io/fsif.h |  4 ++--
 xen/include/public/io/pvcalls.h  |  2 +-
 xen/include/public/io/ring.h | 12 ++--
 xen/include/public/io/sndif.h|  2 +-
 xen/include/public/xen-compat.h  |  2 +-
 8 files changed, 19 insertions(+), 9 deletions(-)

diff --git a/CHANGELOG.md b/CHANGELOG.md
index 4ecebb9f68..5ee5d41fc9 100644
--- a/CHANGELOG.md
+++ b/CHANGELOG.md
@@ -7,6 +7,8 @@ The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/)
 ## [4.19.0 UNRELEASED](https://xenbits.xenproject.org/gitweb/?p=xen.git;a=shortlog;h=staging)
 - TBD
 
 ### Changed
+ - Changed flexible array definitions in public I/O interface headers to not
+   use "1" as the number of array elements.
 
 ### Added
  - On x86:
diff --git a/xen/include/public/io/cameraif.h b/xen/include/public/io/cameraif.h
index 13763abef9..a389443769 100644
--- a/xen/include/public/io/cameraif.h
+++ b/xen/include/public/io/cameraif.h
@@ -763,7 +763,7 @@ struct xencamera_buf_create_req {
  */
 struct xencamera_page_directory {
 grant_ref_t gref_dir_next_page;
-grant_ref_t gref[1]; /* Variable length */
+grant_ref_t gref[XENPV_FLEX_ARRAY_DIM];
 };
 
 /*
diff --git a/xen/include/public/io/displif.h b/xen/include/public/io/displif.h
index 73d0cbdf15..132c96fa5c 100644
--- a/xen/include/public/io/displif.h
+++ b/xen/include/public/io/displif.h
@@ -537,7 +537,7 @@ struct xendispl_dbuf_create_req {
 
 struct xendispl_page_directory {
 grant_ref_t gref_dir_next_page;
-grant_ref_t gref[1]; /* Variable length */
+grant_ref_t gref[XENPV_FLEX_ARRAY_DIM];
 };
 
 /*
diff --git a/xen/include/public/io/fsif.h b/xen/include/public/io/fsif.h
index ec57850233..dcade1c698 100644
--- a/xen/include/public/io/fsif.h
+++ b/xen/include/public/io/fsif.h
@@ -40,7 +40,7 @@ struct fsif_read_request {
 int32_t pad;
 uint64_t len;
 uint64_t offset;
-grant_ref_t grefs[1];  /* Variable length */
+grant_ref_t grefs[XENPV_FLEX_ARRAY_DIM];
 };
 
 struct fsif_write_request {
@@ -48,7 +48,7 @@ struct fsif_write_request {
 int32_t pad;
 uint64_t len;
 uint64_t offset;
-grant_ref_t grefs[1];  /* Variable length */
+grant_ref_t grefs[XENPV_FLEX_ARRAY_DIM];
 };
 
 struct fsif_stat_request {
diff --git a/xen/include/public/io/pvcalls.h b/xen/include/public/io/pvcalls.h
index 230b0719e3..c8c7602470 100644
--- a/xen/include/public/io/pvcalls.h
+++ b/xen/include/public/io/pvcalls.h
@@ -30,7 +30,7 @@ struct pvcalls_data_intf {
 uint8_t pad2[52];
 
 RING_IDX ring_order;
-grant_ref_t ref[];
+grant_ref_t ref[XEN_FLEX_ARRAY_DIM];
 };
 DEFINE_XEN_FLEX_RING(pvcalls);
 
diff --git a/xen/include/public/io/ring.h b/xen/include/public/io/ring.h
index 0cae4367be..a79d913142 100644
--- a/xen/include/public/io/ring.h
+++ b/xen/include/public/io/ring.h
@@ -25,8 +25,16 @@
  * and grant_table.h from the Xen public headers.
  */
 
+#include "../xen.h"
 #include "../xen-compat.h"
 
+/* Some PV I/O interfaces need a compatibility variant. */
+#if __XEN_INTERFACE_VERSION__ < 0x00041300
+#define XENPV_FLEX_ARRAY_DIM  1 /* variable size */
+#else
+#define XENPV_FLEX_ARRAY_DIM  XEN_FLEX_ARRAY_DIM
+#endif
+
 #if __XEN_INTERFACE_VERSION__ < 0x00030208
 #define xen_mb()  mb()
 #define xen_rmb() rmb()
@@ -110,7 +118,7 @@ struct __name##_sring { \
 uint8_t pvt_pad[4]; \
 } pvt;  \
 uint8_t __pad[44];  \
-union __name##_sring_entry ring[1]; /* variable-length */   \
+union __name##_sring_entry ring[XENPV_FLEX_ARRAY_DIM];  \
 };  \
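A minimal sketch of what the compatibility macro buys a consumer, assuming for illustration that `XEN_FLEX_ARRAY_DIM` expands to nothing under a C99 compiler, so that the new form is a standard flexible array member. The structure is a stand-in modeled on `xendispl_page_directory`, the 32-bit `grant_ref_t` is an assumption, and `example_page_dir_size()` is a hypothetical helper. Sizing with `offsetof()` rather than `sizeof()` keeps the result the same for the old `[1]` and the new `[]` declarations.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed interface version; 0x00041300 or newer selects the new form. */
#ifndef __XEN_INTERFACE_VERSION__
#define __XEN_INTERFACE_VERSION__ 0x00041300
#endif

/* Assumption: with a C99 compiler the generic macro expands to nothing,
 * so "arr[XEN_FLEX_ARRAY_DIM]" becomes the flexible array member "arr[]". */
#define XEN_FLEX_ARRAY_DIM /* empty */

/* Same selection as in the ring.h hunk of the patch. */
#if __XEN_INTERFACE_VERSION__ < 0x00041300
#define XENPV_FLEX_ARRAY_DIM 1 /* variable size */
#else
#define XENPV_FLEX_ARRAY_DIM XEN_FLEX_ARRAY_DIM
#endif

typedef uint32_t grant_ref_t; /* width assumed for this sketch */

/* Stand-in modeled on struct xendispl_page_directory. */
struct example_page_directory {
    grant_ref_t gref_dir_next_page;
    grant_ref_t gref[XENPV_FLEX_ARRAY_DIM];
};

/* In both forms the array starts right after the single header field. */
static_assert(offsetof(struct example_page_directory, gref) ==
              sizeof(grant_ref_t), "unexpected padding before gref");

/*
 * Hypothetical helper: bytes needed for a directory carrying n grant refs.
 * offsetof() gives the same result for the [1] and [] declarations, while
 * sizeof(struct ...) would count one extra element in the [1] case.
 */
size_t example_page_dir_size(size_t n)
{
    return offsetof(struct example_page_directory, gref) +
           n * sizeof(grant_ref_t);
}
```

Built with `-D__XEN_INTERFACE_VERSION__=0x00041200`, the sketch takes the `[1]` branch: `example_page_dir_size()` returns the same values, while `sizeof(struct example_page_directory)` grows from one to two `grant_ref_t`s.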

Re: [PATCH v2 1/5] x86/livepatch: set function alignment to ensure minimal function size

On 01.12.2023 11:21, Roger Pau Monné wrote:
> On Fri, Dec 01, 2023 at 10:41:45AM +0100, Jan Beulich wrote:
>> On 01.12.2023 09:50, Roger Pau Monné wrote:
>>> On Fri, Dec 01, 2023 at 07:53:29AM +0100, Jan Beulich wrote:
>>>> On 30.11.2023 18:37, Roger Pau Monné wrote:
>>>>> On Thu, Nov 30, 2023 at 05:55:07PM +0100, Jan Beulich wrote:
>>>>>> On 28.11.2023 11:03, Roger Pau Monne wrote:
>>>>>>> The minimal function size requirements for livepatch are either 5 bytes (for
>>>>>>> jmp) or 9 bytes (for endbr + jmp).  Ensure that functions are always at least
>>>>>>> that size by requesting the compiler to align the functions to 8 or 16 bytes,
>>>>>>> depending on whether Xen is built with IBT support.
>>>>>>
>>>>>> How is alignment going to enforce minimum function size? If a function is
>>>>>> last in a section, there may not be any padding added (ahead of linking at
>>>>>> least). The trailing padding also isn't part of the function.
>>>>>
>>>>> If each function lives in its own section (by using
>>>>> -ffunction-sections), and each section is aligned, then I think we can
>>>>> guarantee that there will always be enough padding space?
>>>>>
>>>>> Even the last function/section on the .text block would still be
>>>>> aligned, and as long as the function alignment <= SECTION_ALIGN
>>>>> there will be enough padding left.  I should add some build time
>>>>> assert that CONFIG_CC_FUNCTION_ALIGNMENT <= SECTION_ALIGN.
>>>>
>>>> I'm not sure of there being a requirement for a section to be padded to
>>>> its alignment. If the following section has smaller alignment, it could
>>>> be made to start earlier. Of course our linker scripts might guarantee
>>>> this ...
>>>
>>> I do think so, given our linker script arrangements for the .text
>>> section:
>>>
>>> DECL_SECTION(.text) {
>>> [...]
>>> } PHDR(text) = 0x9090
>>>
>>> . = ALIGN(SECTION_ALIGN);
>>>
>>> The end of the text section is aligned to SECTION_ALIGN, so as long as
>>> SECTION_ALIGN >= CONFIG_CC_FUNCTION_ALIGNMENT the alignment should
>>> guarantee a minimal function size.
>>>
>>> Do you think it would be clearer if I add the following paragraph:
>>>
>>> "Given the Xen linker script arrangement of the .text section, we can
>>> ensure that when all functions are aligned to the given boundary the
>>> function size will always be a multiple of such alignment, even for
>>> the last function in .text, as the linker script aligns the end of the
>>> section to SECTION_ALIGN."
>>
>> I think this would be useful to have there. Beyond that, assembly code
>> also needs considering btw.
> 
> Assembly will get dealt with once we start to also have separate
> sections for each assembly function.  We cannot patch assembly code at
> the moment anyway, due to lack of debug symbols.

Well, yes, that's one part of it. The other is that some .text coming
from an assembly source may follow one coming from some C source, and
if the assembly one then isn't properly aligned, padding space again
wouldn't necessarily be large enough. This may be alright now (where
.text is the only thing that can come from .S and would be linked
ahead of all .text.*, being the only thing that can come from .c), but
it might subtly break when assembly code is also switched to per-function
sections (you may recall that a patch to this effect is already
pending: "common: honor CONFIG_CC_SPLIT_SECTIONS also for assembly
functions").

Jan
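The padding argument, and the concern about a less aligned section following a C function, can be modeled with plain address arithmetic. This is a sketch for illustration only, not how Xen or the linker computes layout; `align_up()`, `patchable_space()` and `fits_livepatch_jump()` are hypothetical names, and the 5- and 9-byte minimums are taken from the patch description.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Round addr up to the next multiple of align (a power of two). */
uint64_t align_up(uint64_t addr, uint64_t align)
{
    assert(align != 0 && (align & (align - 1)) == 0);
    return (addr + align - 1) & ~(align - 1);
}

/*
 * Bytes a function owns, counting trailing padding, when it starts at
 * 'start', is 'size' bytes long and the next input section has alignment
 * 'next_align'.  The linker may place that section at the first address
 * satisfying its own alignment, so next_align bounds the gap, not the
 * alignment of the function itself.
 */
uint64_t patchable_space(uint64_t start, uint64_t size, uint64_t next_align)
{
    return align_up(start + size, next_align) - start;
}

/* Does a jmp (5 bytes) or, with IBT, endbr + jmp (9 bytes) fit? */
bool fits_livepatch_jump(uint64_t start, uint64_t size, uint64_t next_align,
                         bool ibt)
{
    return patchable_space(start, size, next_align) >= (ibt ? 9u : 5u);
}
```

A 2-byte function at 0x1000 owns 16 bytes when the next section is 16-byte aligned, but only its own 2 bytes when the next section has alignment 1, which is the poorly aligned assembly `.text` case described above. Likewise, with IBT an 8-byte alignment can leave exactly 8 bytes, one short of endbr + jmp, consistent with choosing 16 for IBT builds.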

Re: INFORMAL VOTE REQUIRED - DOCUMENTATION WORDING

2023-12-01 George Dunlap

On Thu, Nov 30, 2023 at 10:28 PM Stefano Stabellini wrote:
>
> Hi all,
>
> This vote is in the context of this thread:
> https://marc.info/?l=xen-devel&m=169213351810075

To add slightly more context.

The issue here is more than a simple "should we use the word broken or
not".  We already have a mechanism for resolving this, which is that
the maintainers of the code in question (in this case THE REST) can
vote.  In any case, on that thread, four of THE REST were opposed to
using the word "broken" in technical documentation, and one in favor.

However -- and I hope I'm not misrepresenting Andy here -- Andy thinks
that position is preposterous, and that this kind of request is a
clear example of a pattern of unreasonable review, which is
damaging to the project and driving away contributors.  Daniel Smith
at least supported the use of the word "broken" in that thread as
well; and (hoping I'm not reading too much into it), the tone of
writing also suggests a level of exasperation.  Andy seems to think
there are others who agree with him as well. This specific issue has
been sort of simmering in the background since August, and we're
trying to get it resolved.

In my discussions with Andy, trying to understand his point of view,
we always reach a sort of impasse, where Andy thinks the majority of
contributors would agree with him, that insisting on removing "broken"
is a completely unreasonable request; and I think that the majority of
contributors would agree with me, that insisting on removing "broken"
is a simple enforcement of long-established norms about how technical
documentation is written.

Everyone would agree, I think, that community norms should be upheld;
everyone agrees that unreasonable nitpicking or imposition of personal
idiosyncratic preferences should be avoided; but in this case we
disagree about whether "don't use broken in technical documentation"
is a "community norm" or "personal idiosyncratic preference".

So the idea was to run a test and find out.  If most people in the
community really do think that "broken" is suitable for the
documentation in our project, then of course the maintainers should
stop objecting to that kind of language.  If most of the people in the
community think that "broken" is *not* suitable for technical
documentation, then of course this isn't an example of unreasonable
review (although other instances may be).

Fundamentally a lot of these sorts of issues come up because different
parts of the community are not "on the same page".  The question is,
how do we *get* on the same page?  I don't want to have a vote or poll
over every little issue; but if we really have a deep 2(+) / 4 split,
it's probably worth having some sort of a discussion to figure out
where we are.  Hence the poll.

I would have worded it differently; but nonetheless, it's a sort of
single data point.  What do you as the community think?  Is "this
hypercall is broken" the sort of thing you'd like us to prevent, or is
that being unreasonable?

FWIW I think a "five-point survey" would probably have been somewhat better:

Regarding the review insisting that the word "broken" be removed from
the updated documentation to the old hypercall:

( ) I think this sort of enforcement is right, and would argue that we continue doing it
( ) I'm happy with this sort of enforcement, but I wouldn't argue for it
( ) I'm not particularly happy with this sort of enforcement, but I wouldn't argue against it
( ) I think this sort of enforcement is unreasonable and is harming the community
( ) I have no idea why we're talking about this; it's really not a big deal.

 -George

Re: [PATCH v2 1/5] x86/livepatch: set function alignment to ensure minimal function size

On Fri, Dec 01, 2023 at 10:41:45AM +0100, Jan Beulich wrote:
> On 01.12.2023 09:50, Roger Pau Monné wrote:
> > On Fri, Dec 01, 2023 at 07:53:29AM +0100, Jan Beulich wrote:
> >> On 30.11.2023 18:37, Roger Pau Monné wrote:
> >>> On Thu, Nov 30, 2023 at 05:55:07PM +0100, Jan Beulich wrote:
> >>>> On 28.11.2023 11:03, Roger Pau Monne wrote:
> >>>>> The minimal function size requirements for livepatch are either 5 bytes (for
> >>>>> jmp) or 9 bytes (for endbr + jmp).  Ensure that functions are always at least
> >>>>> that size by requesting the compiler to align the functions to 8 or 16 bytes,
> >>>>> depending on whether Xen is built with IBT support.
> >>>>
> >>>> How is alignment going to enforce minimum function size? If a function is
> >>>> last in a section, there may not be any padding added (ahead of linking at
> >>>> least). The trailing padding also isn't part of the function.
> >>>
> >>> If each function lives in its own section (by using
> >>> -ffunction-sections), and each section is aligned, then I think we can
> >>> guarantee that there will always be enough padding space?
> >>>
> >>> Even the last function/section on the .text block would still be
> >>> aligned, and as long as the function alignment <= SECTION_ALIGN
> >>> there will be enough padding left.  I should add some build time
> >>> assert that CONFIG_CC_FUNCTION_ALIGNMENT <= SECTION_ALIGN.
> >>
> >> I'm not sure of there being a requirement for a section to be padded to
> >> its alignment. If the following section has smaller alignment, it could
> >> be made to start earlier. Of course our linker scripts might guarantee
> >> this ...
> > 
> > I do think so, given our linker script arrangements for the .text
> > section:
> > 
> > DECL_SECTION(.text) {
> > [...]
> > } PHDR(text) = 0x9090
> > 
> > . = ALIGN(SECTION_ALIGN);
> > 
> > The end of the text section is aligned to SECTION_ALIGN, so as long as
> > SECTION_ALIGN >= CONFIG_CC_FUNCTION_ALIGNMENT the alignment should
> > guarantee a minimal function size.
> > 
> > Do you think it would be clearer if I add the following paragraph:
> > 
> > "Given the Xen linker script arrangement of the .text section, we can
> > ensure that when all functions are aligned to the given boundary the
> > function size will always be a multiple of such alignment, even for
> > the last function in .text, as the linker script aligns the end of the
> > section to SECTION_ALIGN."
> 
> I think this would be useful to have there. Beyond that, assembly code
> also needs considering btw.

Assembly will get dealt with once we start to also have separate
sections for each assembly function.  We cannot patch assembly code at
the moment anyway, due to lack of debug symbols.

Thanks, Roger.
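The build-time check Roger mentions could look roughly like the following C11 sketch. The values are placeholders: in a real build `CONFIG_CC_FUNCTION_ALIGNMENT` comes from Kconfig and `SECTION_ALIGN` from the linker script, and 4096 is only an assumption here. `LIVEPATCH_MIN_FUNC_SIZE` and `alignment_covers_min_size()` are hypothetical names.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Illustrative values only.  In a real build CONFIG_CC_FUNCTION_ALIGNMENT
 * comes from Kconfig and SECTION_ALIGN from the linker script; 4096 is an
 * assumption for this sketch.
 */
#ifndef CONFIG_CC_FUNCTION_ALIGNMENT
#define CONFIG_CC_FUNCTION_ALIGNMENT 16 /* 16 with IBT, 8 without */
#endif
#ifndef SECTION_ALIGN
#define SECTION_ALIGN 4096
#endif

/* Smallest patch sequence: jmp (5 bytes), or endbr + jmp (9 bytes) with IBT. */
#define LIVEPATCH_MIN_FUNC_SIZE(ibt) ((ibt) ? 9u : 5u)

/* .text ends on a SECTION_ALIGN boundary, so the last function is still
 * padded to CONFIG_CC_FUNCTION_ALIGNMENT as long as this holds. */
static_assert(CONFIG_CC_FUNCTION_ALIGNMENT <= SECTION_ALIGN,
              "function alignment exceeds the .text end alignment");

/* A non-empty function aligned to 'align', followed by code aligned at
 * least as strictly, owns at least 'align' bytes. */
bool alignment_covers_min_size(unsigned int align, bool ibt)
{
    return align >= LIVEPATCH_MIN_FUNC_SIZE(ibt);
}
```

Inside Xen the same condition would more naturally be expressed with `BUILD_BUG_ON()` or a linker-script `ASSERT()`; the sketch only spells out the two inequalities the argument relies on.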

[PATCH] xen/livepatch: make .livepatch.funcs read-only for in-tree tests

2023-12-01 Roger Pau Monne

This matches the flags of the .livepatch.funcs section when generated using
livepatch-build-tools, which only sets the SHF_ALLOC flag.

Also constify the definitions of the livepatch_func variables in the tests
themselves, in order to better match the resulting output.  Note that just
making those variables constant is not enough to force the generated sections
to be read-only.
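The intended result can be checked on the section header flags of the generated object. The predicates below are a minimal sketch of that check, assuming a Linux/glibc system where `<elf.h>` provides `SHF_ALLOC`, `SHF_WRITE` and `SHF_EXECINSTR`; the function names are hypothetical and not part of the Xen tree.

```c
#include <assert.h>
#include <elf.h>      /* SHF_ALLOC, SHF_WRITE, SHF_EXECINSTR (glibc header) */
#include <stdbool.h>
#include <stdint.h>

/* The two flags are distinct bits (SHF_WRITE is 0x1, SHF_ALLOC is 0x2). */
static_assert((SHF_ALLOC & SHF_WRITE) == 0, "flag bits overlap");

/*
 * True when a section occupies memory at run time (SHF_ALLOC) but is not
 * writable (no SHF_WRITE): the combination requested by
 * "objcopy --set-section-flags .livepatch.funcs=alloc,readonly".
 */
bool section_is_readonly_alloc(uint64_t sh_flags)
{
    return (sh_flags & SHF_ALLOC) && !(sh_flags & SHF_WRITE);
}

/* Stricter variant: exactly SHF_ALLOC and nothing else, as described for
 * sections generated by livepatch-build-tools. */
bool section_flags_match_build_tools(uint64_t sh_flags)
{
    return sh_flags == SHF_ALLOC;
}
```

`readelf -S` on the resulting `.livepatch` object shows the same information in its flags column (`A` for alloc, `W` for write).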

Signed-off-by: Roger Pau Monné 
---
 xen/test/livepatch/Makefile                    | 5 ++++-
 xen/test/livepatch/xen_action_hooks.c          | 3 ++-
 xen/test/livepatch/xen_action_hooks_marker.c   | 3 ++-
 xen/test/livepatch/xen_action_hooks_noapply.c  | 3 ++-
 xen/test/livepatch/xen_action_hooks_norevert.c | 3 ++-
 xen/test/livepatch/xen_bye_world.c             | 3 ++-
 xen/test/livepatch/xen_expectations.c          | 3 ++-
 xen/test/livepatch/xen_expectations_fail.c     | 3 ++-
 xen/test/livepatch/xen_hello_world.c           | 3 ++-
 xen/test/livepatch/xen_nop.c                   | 2 +-
 xen/test/livepatch/xen_prepost_hooks.c         | 3 ++-
 xen/test/livepatch/xen_prepost_hooks_fail.c    | 3 ++-
 xen/test/livepatch/xen_replace_world.c         | 3 ++-
 13 files changed, 27 insertions(+), 13 deletions(-)

diff --git a/xen/test/livepatch/Makefile b/xen/test/livepatch/Makefile
index d987a8367f15..4caa9e24324e 100644
--- a/xen/test/livepatch/Makefile
+++ b/xen/test/livepatch/Makefile
@@ -142,7 +142,10 @@ xen_expectations_fail-objs := xen_expectations_fail.o xen_hello_world_func.o not
 
 
 quiet_cmd_livepatch = LD  $@
-cmd_livepatch = $(LD) $(XEN_LDFLAGS) $(build_id_linker) -r -o $@ $(real-prereqs)
+define cmd_livepatch
+$(LD) $(XEN_LDFLAGS) $(build_id_linker) -r -o $@ $(real-prereqs); \
+$(OBJCOPY) --set-section-flags ".livepatch.funcs=alloc,readonly" $@
+endef
 
 $(obj)/%.livepatch: FORCE
$(call if_changed,livepatch)
diff --git a/xen/test/livepatch/xen_action_hooks.c b/xen/test/livepatch/xen_action_hooks.c
index fa0b3ab35f38..30c2c5de3c82 100644
--- a/xen/test/livepatch/xen_action_hooks.c
+++ b/xen/test/livepatch/xen_action_hooks.c
@@ -84,7 +84,8 @@ LIVEPATCH_REVERT_HOOK(revert_hook);
 
 LIVEPATCH_POSTREVERT_HOOK(post_revert_hook);
 
-struct livepatch_func __section(".livepatch.funcs") livepatch_xen_hello_world = {
+const struct livepatch_func __section(".livepatch.funcs")
+livepatch_xen_hello_world = {
 .version = LIVEPATCH_PAYLOAD_VERSION,
 .name = hello_world_patch_this_fnc,
 .new_addr = xen_hello_world,
diff --git a/xen/test/livepatch/xen_action_hooks_marker.c b/xen/test/livepatch/xen_action_hooks_marker.c
index d2e22f70d1f4..eb31a4abc48b 100644
--- a/xen/test/livepatch/xen_action_hooks_marker.c
+++ b/xen/test/livepatch/xen_action_hooks_marker.c
@@ -96,7 +96,8 @@ LIVEPATCH_POSTAPPLY_HOOK(post_apply_hook);
 LIVEPATCH_PREREVERT_HOOK(pre_revert_hook);
 LIVEPATCH_POSTREVERT_HOOK(post_revert_hook);
 
-struct livepatch_func __section(".livepatch.funcs") livepatch_xen_hello_world = {
+const struct livepatch_func __section(".livepatch.funcs")
+livepatch_xen_hello_world = {
 .version = LIVEPATCH_PAYLOAD_VERSION,
 .name = hello_world_patch_this_fnc,
 .new_addr = xen_hello_world,
diff --git a/xen/test/livepatch/xen_action_hooks_noapply.c b/xen/test/livepatch/xen_action_hooks_noapply.c
index 646a5fd2f002..92d10d53ffc1 100644
--- a/xen/test/livepatch/xen_action_hooks_noapply.c
+++ b/xen/test/livepatch/xen_action_hooks_noapply.c
@@ -120,7 +120,8 @@ LIVEPATCH_POSTAPPLY_HOOK(post_apply_hook);
 LIVEPATCH_PREREVERT_HOOK(pre_revert_hook);
 LIVEPATCH_POSTREVERT_HOOK(post_revert_hook);
 
-struct livepatch_func __section(".livepatch.funcs") livepatch_xen_hello_world = {
+const struct livepatch_func __section(".livepatch.funcs")
+livepatch_xen_hello_world = {
 .version = LIVEPATCH_PAYLOAD_VERSION,
 .name = hello_world_patch_this_fnc,
 .new_addr = xen_hello_world,
diff --git a/xen/test/livepatch/xen_action_hooks_norevert.c b/xen/test/livepatch/xen_action_hooks_norevert.c
index cdfff156cede..0f31faa8f386 100644
--- a/xen/test/livepatch/xen_action_hooks_norevert.c
+++ b/xen/test/livepatch/xen_action_hooks_norevert.c
@@ -115,7 +115,8 @@ LIVEPATCH_POSTAPPLY_HOOK(post_apply_hook);
 LIVEPATCH_PREREVERT_HOOK(pre_revert_hook);
 LIVEPATCH_POSTREVERT_HOOK(post_revert_hook);
 
-struct livepatch_func __section(".livepatch.funcs") livepatch_xen_hello_world = {
+const struct livepatch_func __section(".livepatch.funcs")
+livepatch_xen_hello_world = {
 .version = LIVEPATCH_PAYLOAD_VERSION,
 .name = hello_world_patch_this_fnc,
 .new_addr = xen_hello_world,
diff --git a/xen/test/livepatch/xen_bye_world.c b/xen/test/livepatch/xen_bye_world.c
index 2700f0eeddd2..86589205d8bd 100644
--- a/xen/test/livepatch/xen_bye_world.c
+++ b/xen/test/livepatch/xen_bye_world.c
@@ -14,7 +14,8 @@
 static const char bye_world_patch_this_fnc[] = "xen_extra_version";
 extern const char *xen_bye_world(void);
 
-struct livepatch_func __section(".livepatch.funcs") livepatch_xen_bye_world = {
+const struct

Re: [PATCH v2 27/29] tools/xenstored: add helpers for filename handling

2023-12-01 Juergen Gross


On 28.11.23 21:42, Jason Andryuk wrote:

On Wed, Nov 15, 2023 at 1:14 AM Juergen Gross  wrote:


On 14.11.23 21:53, Julien Grall wrote:

Hi Juergen,

On 14/11/2023 09:26, Juergen Gross wrote:

On 14.11.23 10:10, Julien Grall wrote:

Hi Juergen,

On 14/11/2023 06:45, Juergen Gross wrote:

On 13.11.23 23:25, Julien Grall wrote:

Hi Juergen,

On 10/11/2023 16:08, Juergen Gross wrote:

diff --git a/tools/xenstored/lu_daemon.c b/tools/xenstored/lu_daemon.c
index 71bcabadd3..635ab0 100644
--- a/tools/xenstored/lu_daemon.c
+++ b/tools/xenstored/lu_daemon.c
@@ -24,7 +24,7 @@ void lu_get_dump_state(struct lu_dump_state *state)
   state->size = 0;
   state->filename = talloc_asprintf(NULL, "%s/state_dump",
-  xenstore_daemon_rundir());
+  xenstore_rundir());


... call and ...


   if (!state->filename)
   barf("Allocation failure");
@@ -65,7 +65,7 @@ FILE *lu_dump_open(const void *ctx)
   int fd;
   filename = talloc_asprintf(ctx, "%s/state_dump",
-   xenstore_daemon_rundir());
+   xenstore_rundir());


... this one could be replaced with absolute_filename().


No, I don't think this is a good idea.

I don't want trace files that the daemon is given as relative paths
to be stored in /var/run/xen, while I want all files of the stubdom to be
stored under /var/lib/xen.


Why? It is a bit odd to have different behavior between stubdom and
daemon. It would be much easier for the user if they knew all the files would
be in the same place regardless of the version used.


The main difference is that stubdom has access to only _one_ directory in dom0.


Would you be able to explain why we can only give access to a single directory?
Is this because of the 9pfs protocol?


Yes. I can mount a specific dom0 directory in the guest.


I'm fine with a single directory being used for stubdom.  Two
directories could be exported, and mini-os would need to use the "tag"
to differentiate the two.  That may not be worth the added code.  QEMU
can provide multiple 9pfs exports and Linux can mount them by tag
name.


The main thing is that the daemon is meant to solve exactly one problem:
having a way to enable infrastructure domains (Xenstore-stubdom, driver
domains, device-model stubdoms) to access a few files in dom0, e.g. for
logging or config purposes.

The daemon should be as simple as possible and, of course, have ways to
control resource usage (file system space) used by the domUs configured to
use it.

It should _not_ be a replacement for the full-blown backend in e.g. qemu.


Juergen
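The disagreement above is mainly about which base directory a relative name resolves against. A minimal sketch of such resolution follows, where the daemon would pass its run directory and the stubdom the single directory it can reach over 9pfs. The helper name and the way each variant picks its base directory are assumptions for illustration; this is not the `absolute_filename()` added by the series.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical helper, not the one added by the series: return a newly
 * allocated absolute path for 'name'.  Absolute names are copied unchanged,
 * relative ones are placed below 'base_dir'.  The caller frees the result;
 * NULL means allocation failure.
 */
char *example_absolute_filename(const char *base_dir, const char *name)
{
    char *path;
    size_t len;

    assert(base_dir != NULL && name != NULL);

    if ( name[0] == '/' )
    {
        len = strlen(name) + 1;
        path = malloc(len);
        if ( path )
            memcpy(path, name, len);
        return path;
    }

    len = strlen(base_dir) + 1 + strlen(name) + 1; /* '/' and NUL */
    path = malloc(len);
    if ( path )
        snprintf(path, len, "%s/%s", base_dir, name);
    return path;
}
```

With this shape the only per-variant decision is the `base_dir` argument, which keeps the difference between daemon and stubdom in one place.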



Re: [PATCH v2 1/5] x86/livepatch: set function alignment to ensure minimal function size

On 01.12.2023 09:50, Roger Pau Monné wrote:
> On Fri, Dec 01, 2023 at 07:53:29AM +0100, Jan Beulich wrote:
>> On 30.11.2023 18:37, Roger Pau Monné wrote:
>>> On Thu, Nov 30, 2023 at 05:55:07PM +0100, Jan Beulich wrote:
>>>> On 28.11.2023 11:03, Roger Pau Monne wrote:
>>>>> The minimal function size requirements for livepatch are either 5 bytes (for
>>>>> jmp) or 9 bytes (for endbr + jmp).  Ensure that functions are always at least
>>>>> that size by requesting the compiler to align the functions to 8 or 16 bytes,
>>>>> depending on whether Xen is built with IBT support.
>>>>
>>>> How is alignment going to enforce minimum function size? If a function is
>>>> last in a section, there may not be any padding added (ahead of linking at
>>>> least). The trailing padding also isn't part of the function.
>>>
>>> If each function lives in its own section (by using
>>> -ffunction-sections), and each section is aligned, then I think we can
>>> guarantee that there will always be enough padding space?
>>>
>>> Even the last function/section on the .text block would still be
>>> aligned, and as long as the function alignment <= SECTION_ALIGN
>>> there will be enough padding left.  I should add some build time
>>> assert that CONFIG_CC_FUNCTION_ALIGNMENT <= SECTION_ALIGN.
>>
>> I'm not sure of there being a requirement for a section to be padded to
>> its alignment. If the following section has smaller alignment, it could
>> be made to start earlier. Of course our linker scripts might guarantee
>> this ...
> 
> I do think so, given our linker script arrangements for the .text
> section:
> 
> DECL_SECTION(.text) {
> [...]
> } PHDR(text) = 0x9090
> 
> . = ALIGN(SECTION_ALIGN);
> 
> The end of the text section is aligned to SECTION_ALIGN, so as long as
> SECTION_ALIGN >= CONFIG_CC_FUNCTION_ALIGNMENT the alignment should
> guarantee a minimal function size.
> 
> Do you think it would be clearer if I add the following paragraph:
> 
> "Given the Xen linker script arrangement of the .text section, we can
> ensure that when all functions are aligned to the given boundary the
> function size will always be a multiple of such alignment, even for
> the last function in .text, as the linker script aligns the end of the
> section to SECTION_ALIGN."

I think this would be useful to have there. Beyond that, assembly code
also needs considering btw.

Jan

Re: INFORMAL VOTE REQUIRED - DOCUMENTATION WORDING


Hi Stefano,

On 30/11/2023 22:27, Stefano Stabellini wrote:

Hi all,

This vote is in the context of this thread:
https://marc.info/?l=xen-devel&m=169213351810075


Thanks for providing the context + CCing committers.

First, I will echo what Jan said and mention that providing context to
the vote is always useful.


My second point is that, while I understand the vote is open to everyone, it
would be good to at least CC the maintainers of the area involved (or 
committers if you need a wider input). At least a few of us don't often 
look at xen-devel and I would have missed this vote request if Stefano 
hadn't CCed me.


Cheers,

--
Julien Grall

Re: [PATCH v6 4/5] [FUTURE] xen/arm: enable vPCI for domUs

On Mon, Nov 13, 2023 at 05:21:13PM -0500, Stewart Hildebrand wrote:
> @@ -1618,6 +1630,14 @@ int iommu_do_pci_domctl(
>  bus = PCI_BUS(machine_sbdf);
>  devfn = PCI_DEVFN(machine_sbdf);
>  
> +if ( needs_vpci(d) && !has_vpci(d) )
> +{
> +printk(XENLOG_G_WARNING "Cannot assign %pp to %pd: vPCI support not enabled\n",
> +   _SBDF(seg, bus, devfn), d);
> +ret = -EPERM;
> +break;

I think this is likely too restrictive going forward.  The current
approach is indeed to enable vPCI on a per-domain basis because that's
how PVH dom0 uses it, due to being unable to use ioreq servers.

If we start to expose vPCI support to guests, the interface should be on
a per-device basis, so that vPCI could be enabled for some devices,
while others could still be handled by ioreq servers.

We might want to add a new flag to xen_domctl_assign_device (used by
XEN_DOMCTL_assign_device) in order to signal whether the device will
use vPCI.

Thanks, Roger.
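A sketch of the per-device alternative, kept deliberately small. The flag name and bit are invented for illustration (no such flag exists in `xen_domctl_assign_device` today), and `example_check_assign()` is a hypothetical stand-in for the check that the patch performs per domain in `iommu_do_pci_domctl()`.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical per-device flag for XEN_DOMCTL_assign_device: handle this
 * device's config space with vPCI instead of an ioreq server.  The name and
 * bit position are made up for this sketch. */
#define EXAMPLE_ASSIGN_FLAG_VPCI (1U << 1)

static_assert((EXAMPLE_ASSIGN_FLAG_VPCI & (EXAMPLE_ASSIGN_FLAG_VPCI - 1)) == 0,
              "flag must be a single bit");

/*
 * Hypothetical per-device check replacing the per-domain one in the patch:
 * only a device that asks for vPCI requires the domain to have vPCI; other
 * devices can still be handled by ioreq servers.
 */
int example_check_assign(uint32_t flags, bool domain_has_vpci)
{
    if ( (flags & EXAMPLE_ASSIGN_FLAG_VPCI) && !domain_has_vpci )
        return -EPERM; /* same error the patch returns today */
    return 0;
}
```

With a check of this shape, a domain without vPCI could still be assigned devices served by ioreq servers, and only a device explicitly flagged for vPCI would be refused.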

Re: Informal voting proposal

2023-12-01 Rich Persaud

On Nov 6, 2023, at 13:53, Kelly Choi  wrote:
> 
> Hi all,
> 
> As an open-source community, there will always be differences of opinion in 
> approaches and the way we think. It is imperative, however, that we view this 
> diversity as a source of strength rather than a hindrance.
> 
> Recent deliberations within our project have led to certain matters being put 
> on hold due to an inability to reach a consensus. While formal voting 
> procedures serve their purpose, they can be time-consuming and may not always 
> lead to meaningful progress.
> 
> Having received agreement from a few maintainers already, I would like to 
> propose the following:
> 
> Informal voting method:
> Each project should ideally have more than 2 maintainers to facilitate 
> impartial discussions. Projects lacking this configuration will be addressed 
> at a later stage.
> Anyone in the community is welcome to voice their opinions, ideas, and 
> concerns about any patch or contribution.
> If members cannot agree, the majority informal vote of the maintainers will 
> be the decision that stands. For instance, if, after careful consideration of 
> all suggestions and concerns, 2 out of 3 maintainers endorse a solution 
> within the x86 subsystem, it shall be the decision we move forward with.
> Naturally, there may be exceptional circumstances; as such, a formal vote may
> be warranted, but it should happen only a few times a year, for serious cases.
> Informal votes can be as easy as 2 out of 3 maintainers providing their 
> Acked-by/Reviewed-by tag. Alternatively, Maintainers can call an informal 
> vote by simply emailing the thread with "informal vote proposed, option 1 and 
> option 2." 
> All maintainers should reply with their vote within 5 working days.  
> Please note that with any new process, there will always be room for 
> improvement and we will reiterate where needed.
> Ultimately our goal here is to prevent the project coming to a standstill 
> while deliberating decisions that we all cannot agree on. This may mean 
> compromising in the short term but I am sure the long-term benefits will 
> stand for themselves.  
> 
> If you have any strong objections to the informal voting, please let me know 
> by 30th November 2023. 
> Should I receive no objections, the process will be implemented as of 1st 
> December 2023.
> 

Apologies for the late response; I was recently asked to look at this thread,
and it's now the end of my Nov 30th USA day.

In order to evaluate new governance proposals, historical test cases are 
needed.  Then the existing process, proposed process (and other candidate 
processes!) can be applied to each test case in turn, so we can evaluate the 
benefits and costs of each candidate.  

If the problem is not defined, how can candidate solutions be evaluated?  
Perhaps those who have responded to the thread have already discussed the 
problem(s) elsewhere, but we need to include them in the public, on-list 
discussion record.


> Again, there will be times that call for flexibility, but we should always
> aim to have a vote for two of the best solutions to avoid the project coming 
> to another standstill. 


Unless I am mistaken, only one solution has been proposed for a problem that 
has zero on-list examples or test cases.  The community is being given a choice 
between one solution and no solution?  

If we can define the problem, with more than one historical example, then we 
can consider multiple solutions, pick two of the best solutions, and approve 
one of the solutions for implementation.

Regards,
Rich

p.s. This is a strong objection to the absence of a problem definition.

Re: [PATCH v10 12/17] vpci/header: emulate PCI_COMMAND register for guests