[PATCH v7 6/7] xen/arm: process pending vPCI map/unmap operations

2021-11-23 Thread Oleksandr Andrushchenko
From: Oleksandr Andrushchenko 

vPCI may map and unmap PCI device memory (BARs) being passed through,
which may take a lot of time. For this reason those operations may be
deferred to be performed later, so that they can be safely preempted.

Currently this deferred processing happens in the common IOREQ code,
which doesn't seem to be the right place for x86 and is even more
doubtful because IOREQ may not be enabled for Arm at all.
So, for Arm, the pending vPCI work may have no chance to be executed
if the processing is left as is in the common IOREQ code only.
For that reason make vPCI processing happen in arch-specific code.

Please be aware that there are a few outstanding TODOs affecting this
code path, see xen/drivers/vpci/header.c:map_range and
xen/drivers/vpci/header.c:vpci_process_pending.
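
To illustrate the intended contract (a rough sketch only; the hunks below
are authoritative and the helper name here is made up):
vpci_process_pending() performs a bounded chunk of the deferred BAR
map/unmap work and returns true while work remains, so arch code is
expected to call it with interrupts enabled and keep re-entering it
instead of returning to the guest:

/* Illustrative sketch only -- arch_vpci_work_pending() is a made-up name. */
static bool arch_vpci_work_pending(struct vcpu *v)
{
    bool pending;

    if ( !has_vpci(v->domain) )
        return false;

    local_irq_enable();                /* the deferred work may be preempted */
    pending = vpci_process_pending(v); /* processes a bounded chunk of work */
    local_irq_disable();

    return pending;                    /* true: re-enter, don't resume the guest yet */
}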

Signed-off-by: Oleksandr Andrushchenko 
[x86 part]
Acked-by: Jan Beulich 
Reviewed-by: Julien Grall 
---
Cc: Andrew Cooper 
Cc: Paul Durrant 

Since v5:
 - check_for_vcpu_work: vPCI addition is moved before the
   vcpu_ioreq_handle_completion(v). This is to avoid differences
   with the x86 version. (Julien)
Since v2:
 - update commit message with more insight on x86, IOREQ and Arm
 - restored order of invocation for IOREQ and vPCI processing (Jan)
Since v1:
 - Moved the check for pending vpci work from the common IOREQ code
   to hvm_do_resume on x86
 - Re-worked the code for Arm to ensure we don't miss pending vPCI work
---
 xen/arch/arm/traps.c   | 13 +++++++++++++
 xen/arch/x86/hvm/hvm.c |  6 ++++++
 xen/common/ioreq.c     |  9 ---------
 3 files changed, 19 insertions(+), 9 deletions(-)

diff --git a/xen/arch/arm/traps.c b/xen/arch/arm/traps.c
index 219ab3c3fbde..8757210a798b 100644
--- a/xen/arch/arm/traps.c
+++ b/xen/arch/arm/traps.c
@@ -34,6 +34,7 @@
 #include 
 #include 
 #include 
+#include <xen/vpci.h>
 
 #include 
 #include 
@@ -2290,6 +2291,18 @@ static bool check_for_vcpu_work(void)
 {
     struct vcpu *v = current;
 
+    if ( has_vpci(v->domain) )
+    {
+        bool pending;
+
+        local_irq_enable();
+        pending = vpci_process_pending(v);
+        local_irq_disable();
+
+        if ( pending )
+            return true;
+    }
+
 #ifdef CONFIG_IOREQ_SERVER
     if ( domain_has_ioreq_server(v->domain) )
     {
diff --git a/xen/arch/x86/hvm/hvm.c b/xen/arch/x86/hvm/hvm.c
index eee365711d63..096a61b7ea02 100644
--- a/xen/arch/x86/hvm/hvm.c
+++ b/xen/arch/x86/hvm/hvm.c
@@ -546,6 +546,12 @@ void hvm_do_resume(struct vcpu *v)
 
     pt_restore_timer(v);
 
+    if ( has_vpci(v->domain) && vpci_process_pending(v) )
+    {
+        raise_softirq(SCHEDULE_SOFTIRQ);
+        return;
+    }
+
     if ( !vcpu_ioreq_handle_completion(v) )
         return;
 
diff --git a/xen/common/ioreq.c b/xen/common/ioreq.c
index d732dc045df9..689d256544c8 100644
--- a/xen/common/ioreq.c
+++ b/xen/common/ioreq.c
@@ -25,9 +25,7 @@
 #include 
 #include 
 #include 
-#include 
 #include 
-#include 
 
 #include 
 #include 
@@ -212,19 +210,12 @@ static bool wait_for_io(struct ioreq_vcpu *sv, ioreq_t *p)
 
 bool vcpu_ioreq_handle_completion(struct vcpu *v)
 {
-    struct domain *d = v->domain;
     struct vcpu_io *vio = &v->io;
     struct ioreq_server *s;
     struct ioreq_vcpu *sv;
     enum vio_completion completion;
     bool res = true;
 
-    if ( has_vpci(d) && vpci_process_pending(v) )
-    {
-        raise_softirq(SCHEDULE_SOFTIRQ);
-        return false;
-    }
-
     while ( (sv = get_pending_vcpu(v, &s)) != NULL )
         if ( !wait_for_io(sv, get_ioreq(s, v)) )
             return false;
-- 
2.25.1




[PATCH v7 5/7] xen/arm: do not map PCI ECAM and MMIO space to Domain-0's p2m

2021-11-23 Thread Oleksandr Andrushchenko
From: Oleksandr Andrushchenko 

PCI host bridges are special devices in terms of implementing PCI
passthrough. According to [1] the current implementation depends on
Domain-0 to perform the initialization of the relevant PCI host
bridge hardware and perform PCI device enumeration. In order to
achieve that, one of the required changes is to not map all the memory
ranges in map_range_to_domain as we traverse the device tree on startup,
and to perform some additional checks to determine whether a range needs
to be mapped to Domain-0.

The generic PCI host controller device tree binding says [2]:
- ranges: As described in IEEE Std 1275-1994, but must provide
  at least a definition of non-prefetchable memory. One
  or both of prefetchable Memory and IO Space may also
  be provided.

- reg   : The Configuration Space base address and size, as accessed
  from the parent bus.  The base address corresponds to
  the first bus in the "bus-range" property.  If no
  "bus-range" is specified, this will be bus 0 (the default).

From the above none of the memory ranges from the "ranges" property
needs to be mapped to Domain-0 at startup as MMIO mapping is going to
be handled dynamically by vPCI as we assign PCI devices, e.g. each
device assigned to Domain-0/guest will have its MMIOs mapped/unmapped
as needed by Xen.

The "reg" property covers not only ECAM space, but may also have other
then the configuration memory ranges described, for example [3]:
- reg: Should contain rc_dbi, config registers location and length.
- reg-names: Must include the following entries:
   "rc_dbi": controller configuration registers;
   "config": PCIe configuration space registers.

This patch makes it possible to not map all the ranges from the
"ranges" property, as well as the ECAM range from "reg". All the rest
from the "reg" property still needs to be mapped to Domain-0, so the
PCI host bridge remains functional in Domain-0. This is done by first
skipping those mappings while traversing the device tree, as is done
for regular devices, and then calling a dedicated
pci_host_bridge_mappings function which maps only the MMIO regions
required by the host bridges, leaving the regions needed for vPCI
traps unmapped.

[1] https://lists.xenproject.org/archives/html/xen-devel/2020-07/msg00777.html
[2] 
https://www.kernel.org/doc/Documentation/devicetree/bindings/pci/host-generic-pci.txt
[3] 
https://www.kernel.org/doc/Documentation/devicetree/bindings/pci/hisilicon-pcie.txt
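
For reference, the dedicated helper can be sketched roughly as below (an
outline only, with error handling omitted; the skip_mapping and
need_p2m_hwdom_mapping names are taken from the changelog below and the
actual hunks in this patch are authoritative):

void __init pci_host_bridge_mappings(struct domain *d)
{
    struct pci_host_bridge *bridge;
    struct map_range_data mr_data = {
        .d = d,
        .p2mt = p2m_mmio_direct_dev,
        .skip_mapping = false,
    };

    list_for_each_entry( bridge, &pci_host_bridges, node )
    {
        const struct dt_device_node *dev = bridge->dt_node;
        unsigned int i;

        for ( i = 0; i < dt_number_of_address(dev); i++ )
        {
            uint64_t addr, size;

            if ( dt_device_get_address(dev, i, &addr, &size) )
                continue;

            /* Map only what the bridge needs; ECAM stays trapped by vPCI. */
            if ( bridge->ops->need_p2m_hwdom_mapping(d, bridge, addr) )
                map_range_to_domain(dev, addr, size, &mr_data);
        }
    }
}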

Signed-off-by: Oleksandr Andrushchenko 
---
Since v7:
- updates in comments and commit message
Since v5:
- remove some need_mapping local variables
- use own_device in handle_device
- add __init for pci_ecam_need_p2m_hwdom_mapping
- make pci_host_bridge_mappings use p2m_mmio_direct_dev directly
Since v4:
- update skip_mapping comment
- add comment why we need to map interrupts to Dom0
Since v3:
 - pass struct map_range_data to map_dt_irq_to_domain
 - remove redundant check from map_range_to_domain
 - fix handle_device's .skip_mapping
Since v2:
 - removed check in map_range_to_domain for PCI_DEV
   and moved it to handle_device, so the code is
   simpler
 - s/map_pci_bridge/skip_mapping
 - extended comment in pci_host_bridge_mappings
 - minor code restructure in construct_dom0
 - s/.need_p2m_mapping/.need_p2m_hwdom_mapping and related
   callbacks
 - unsigned int i; in pci_host_bridge_mappings
Since v1:
 - Added better description of why and what needs to be mapped into
   Domain-0's p2m and what doesn't
 - Do not do any mappings for PCI devices while traversing the DT
 - Walk all the bridges and make required mappings in one go
---
 xen/arch/arm/domain_build.c| 66 +-
 xen/arch/arm/pci/ecam.c| 14 +++
 xen/arch/arm/pci/pci-host-common.c | 50 ++
 xen/arch/arm/pci/pci-host-zynqmp.c |  1 +
 xen/include/asm-arm/pci.h  | 10 +
 xen/include/asm-arm/setup.h| 13 ++
 6 files changed, 126 insertions(+), 28 deletions(-)

diff --git a/xen/arch/arm/domain_build.c b/xen/arch/arm/domain_build.c
index c83c02ab8ac6..47c884cca2c3 100644
--- a/xen/arch/arm/domain_build.c
+++ b/xen/arch/arm/domain_build.c
@@ -51,12 +51,6 @@ static int __init parse_dom0_mem(const char *s)
 }
 custom_param("dom0_mem", parse_dom0_mem);
 
-struct map_range_data
-{
-    struct domain *d;
-    p2m_type_t p2mt;
-};
-
 /* Override macros from asm/page.h to make them work with mfn_t */
 #undef virt_to_mfn
 #define virt_to_mfn(va) _mfn(__virt_to_mfn(va))
@@ -1720,10 +1714,10 @@ static int __init map_dt_irq_to_domain(const struct dt_device_node *dev,
                                        const struct dt_irq *dt_irq,
                                        void *data)
 {
-    struct domain *d = data;
+    struct map_range_data *mr_data = data;
+    struct domain *d = mr_data->d;
     unsigned int irq = dt_irq->irq;
     int res;
-    bool need_mapping = !dt_device_for_passthrough(dev);
 
     if ( irq < NR_LOCAL_IRQS )
  

[PATCH v7 3/7] xen/arm: setup MMIO range trap handlers for hardware domain

2021-11-23 Thread Oleksandr Andrushchenko
From: Oleksandr Andrushchenko 

In order for vPCI to work it needs to maintain guest and hardware
domain's views of the configuration space. For example, BARs and
COMMAND registers require emulation for guests, and the guest view
of the registers needs to be in sync with the real contents of the
relevant registers. For that, the ECAM address space also needs to be
trapped for the hardware domain, so we need to implement PCI host
bridge specific callbacks to properly set up MMIO handlers for those
ranges depending on the particular host bridge implementation.
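
Roughly, the resulting shape is sketched below (an outline only; the
callback name is made up and the vpci.c hunks of this patch are
authoritative): one set of configuration space traps is registered per
physical host bridge for the hardware domain, with the bridge passed as
the handler's private data, while guests keep a single emulated host
bridge at GUEST_VPCI_ECAM_BASE:

/* Outline only -- see the actual hunks below for the real implementation. */
static int vpci_setup_mmio_handler_cb(struct domain *d,
                                      struct pci_host_bridge *bridge)
{
    struct pci_config_window *cfg = bridge->cfg;

    register_mmio_handler(d, &vpci_mmio_handler,
                          cfg->phys_addr, cfg->size, bridge);

    /* One handler registered, so contribute 1 to the MMIO handler count. */
    return 1;
}

int domain_vpci_init(struct domain *d)
{
    if ( !has_vpci(d) )
        return 0;

    if ( is_hardware_domain(d) )
    {
        int ret = pci_host_iterate_bridges_and_count(d,
                                                     vpci_setup_mmio_handler_cb);

        return ret < 0 ? ret : 0;
    }

    /* Guest domains see a single emulated host bridge. */
    register_mmio_handler(d, &vpci_mmio_handler,
                          GUEST_VPCI_ECAM_BASE, GUEST_VPCI_ECAM_SIZE, NULL);

    return 0;
}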

Signed-off-by: Oleksandr Andrushchenko 
---
Since v6:
- eliminate pci_host_get_num_bridges and make pci_host_iterate_bridges
  return the count
- extend comment in domain_vpci_init
- remove not yet relevant code for num MMIOs and virtual bus topology
- add extra check for has_vpci in domain_vpci_get_num_mmio_handlers
- remove code that fixes num MMIOs for guest domain as it doesn't belong
  to this patch
Since v5:
- add vpci_sbdf_from_gpa helper for gpa to SBDF translation
- take bridge's bus start into account while calculating SBDF
Since v4:
- unsigned int for functions working with count
- gate number of MMIO handlers needed for CONFIG_HAS_PCI_MSI
  and fix their number, e.g. single handler for PBA and
  MSI-X tables (Roger)
- re-work code for assigning MMIO handlers to be simpler
  and account for the fact that multiple host bridges may
  exist for the hwdom
Since v3:
- fixed comment formatting
Since v2:
- removed unneeded assignment (count = 0)
- removed unneeded header inclusion
- update commit message
Since v1:
 - Dynamically calculate the number of MMIO handlers required for vPCI
   and update the total number accordingly
 - s/clb/cb
 - Do not introduce a new callback for MMIO handler setup
---
 xen/arch/arm/domain.c  |  2 +
 xen/arch/arm/pci/pci-host-common.c | 17 +++
 xen/arch/arm/vpci.c| 79 +++---
 xen/arch/arm/vpci.h|  6 +++
 xen/include/asm-arm/pci.h  |  4 ++
 5 files changed, 100 insertions(+), 8 deletions(-)

diff --git a/xen/arch/arm/domain.c b/xen/arch/arm/domain.c
index 96e1b235501d..92a6c509e5c5 100644
--- a/xen/arch/arm/domain.c
+++ b/xen/arch/arm/domain.c
@@ -739,6 +739,8 @@ int arch_domain_create(struct domain *d,
     if ( (rc = domain_vgic_register(d, &count)) != 0 )
         goto fail;
 
+    count += domain_vpci_get_num_mmio_handlers(d);
+
     if ( (rc = domain_io_init(d, count + MAX_IO_HANDLER)) != 0 )
         goto fail;
 
diff --git a/xen/arch/arm/pci/pci-host-common.c 
b/xen/arch/arm/pci/pci-host-common.c
index 9653b92b5b2e..18b09d5e6f10 100644
--- a/xen/arch/arm/pci/pci-host-common.c
+++ b/xen/arch/arm/pci/pci-host-common.c
@@ -296,6 +296,23 @@ int pci_get_host_bridge_segment(const struct 
dt_device_node *node,
 return -EINVAL;
 }
 
+int pci_host_iterate_bridges_and_count(struct domain *d,
+                                       int (*cb)(struct domain *d,
+                                                 struct pci_host_bridge *bridge))
+{
+    struct pci_host_bridge *bridge;
+    int err, count = 0;
+
+    list_for_each_entry( bridge, &pci_host_bridges, node )
+    {
+        err = cb(d, bridge);
+        if ( err < 0 )
+            return err;
+        count += err;
+    }
+    return count;
+}
+
 /*
  * Local variables:
  * mode: C
diff --git a/xen/arch/arm/vpci.c b/xen/arch/arm/vpci.c
index 23f45386f4b3..ccd998d8dba2 100644
--- a/xen/arch/arm/vpci.c
+++ b/xen/arch/arm/vpci.c
@@ -16,16 +16,31 @@
 
 #include 
 
+static pci_sbdf_t vpci_sbdf_from_gpa(const struct pci_host_bridge *bridge,
+                                     paddr_t gpa)
+{
+    pci_sbdf_t sbdf;
+
+    if ( bridge )
+    {
+        sbdf.sbdf = VPCI_ECAM_BDF(gpa - bridge->cfg->phys_addr);
+        sbdf.seg = bridge->segment;
+        sbdf.bus += bridge->cfg->busn_start;
+    }
+    else
+        sbdf.sbdf = VPCI_ECAM_BDF(gpa - GUEST_VPCI_ECAM_BASE);
+
+    return sbdf;
+}
+
 static int vpci_mmio_read(struct vcpu *v, mmio_info_t *info,
                           register_t *r, void *p)
 {
-    pci_sbdf_t sbdf;
+    struct pci_host_bridge *bridge = p;
+    pci_sbdf_t sbdf = vpci_sbdf_from_gpa(bridge, info->gpa);
     /* data is needed to prevent a pointer cast on 32bit */
     unsigned long data;
 
-    /* We ignore segment part and always handle segment 0 */
-    sbdf.sbdf = VPCI_ECAM_BDF(info->gpa - GUEST_VPCI_ECAM_BASE);
-
     if ( vpci_ecam_read(sbdf, ECAM_REG_OFFSET(info->gpa),
                         1U << info->dabt.size, &data) )
     {
@@ -41,10 +56,8 @@ static int vpci_mmio_read(struct vcpu *v, mmio_info_t *info,
 static int vpci_mmio_write(struct vcpu *v, mmio_info_t *info,
                            register_t r, void *p)
 {
-    pci_sbdf_t sbdf;
-
-    /* We ignore segment part and always handle segment 0 */
-    sbdf.sbdf = VPCI_ECAM_BDF(info->gpa - GUEST_VPCI_ECAM_BASE);
+    struct pci_host_bridge *bridge = p;
+    pci_sbdf_t sbdf = vpci_sbdf_from_gpa(bridge, info->gpa);
 
  

[PATCH v7 4/7] xen/arm: account IO handler for emulated PCI host bridge

2021-11-23 Thread Oleksandr Andrushchenko
From: Oleksandr Andrushchenko 

At the moment, we always allocate an extra 16 slots for IO handlers
(see MAX_IO_HANDLER). So while adding an IO trap handler for the emulated
PCI host bridge we are not breaking anything, but we have a latent bug
as the maximum number of IO handlers may be exceeded.
Fix this by explicitly stating that we have an additional IO handler, so
it is accounted for.

Fixes: d59168dc05a5 ("xen/arm: Enable the existing x86 virtual PCI support for ARM")

Signed-off-by: Oleksandr Andrushchenko 

---
New in v7
---
 xen/arch/arm/vpci.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/xen/arch/arm/vpci.c b/xen/arch/arm/vpci.c
index ccd998d8dba2..8e801f275879 100644
--- a/xen/arch/arm/vpci.c
+++ b/xen/arch/arm/vpci.c
@@ -126,7 +126,8 @@ unsigned int domain_vpci_get_num_mmio_handlers(struct domain *d)
         return ret < 0 ? 0 : ret;
     }
 
-    return 0;
+    /* For a single emulated host bridge's configuration space. */
+    return 1;
 }
 
 /*
-- 
2.25.1




[PATCH v7 1/7] xen/arm: rename DEVICE_PCI to DEVICE_PCI_HOSTBRIDGE

2021-11-23 Thread Oleksandr Andrushchenko
From: Oleksandr Andrushchenko 

To better reflect the nature of the device type and to avoid any
confusion, rename DEVICE_PCI to DEVICE_PCI_HOSTBRIDGE, which is what it
really is.

Suggested-by: Julien Grall 
Signed-off-by: Oleksandr Andrushchenko 
Reviewed-by: Julien Grall 
---
New in v6
---
 xen/arch/arm/pci/pci-host-generic.c | 2 +-
 xen/arch/arm/pci/pci-host-zynqmp.c  | 2 +-
 xen/arch/arm/pci/pci.c  | 2 +-
 xen/include/asm-arm/device.h| 2 +-
 4 files changed, 4 insertions(+), 4 deletions(-)

diff --git a/xen/arch/arm/pci/pci-host-generic.c 
b/xen/arch/arm/pci/pci-host-generic.c
index 33457fbe9615..46de6e43cc72 100644
--- a/xen/arch/arm/pci/pci-host-generic.c
+++ b/xen/arch/arm/pci/pci-host-generic.c
@@ -32,7 +32,7 @@ static int __init pci_host_generic_probe(struct dt_device_node *dev,
     return pci_host_common_probe(dev, &pci_generic_ecam_ops);
 }
 
-DT_DEVICE_START(pci_gen, "PCI HOST GENERIC", DEVICE_PCI)
+DT_DEVICE_START(pci_gen, "PCI HOST GENERIC", DEVICE_PCI_HOSTBRIDGE)
 .dt_match = gen_pci_dt_match,
 .init = pci_host_generic_probe,
 DT_DEVICE_END
diff --git a/xen/arch/arm/pci/pci-host-zynqmp.c 
b/xen/arch/arm/pci/pci-host-zynqmp.c
index 61a9807d3d58..516982bca833 100644
--- a/xen/arch/arm/pci/pci-host-zynqmp.c
+++ b/xen/arch/arm/pci/pci-host-zynqmp.c
@@ -49,7 +49,7 @@ static int __init pci_host_generic_probe(struct dt_device_node *dev,
     return pci_host_common_probe(dev, &nwl_pcie_ops);
 }
 
-DT_DEVICE_START(pci_gen, "PCI HOST ZYNQMP", DEVICE_PCI)
+DT_DEVICE_START(pci_gen, "PCI HOST ZYNQMP", DEVICE_PCI_HOSTBRIDGE)
 .dt_match = nwl_pcie_dt_match,
 .init = pci_host_generic_probe,
 DT_DEVICE_END
diff --git a/xen/arch/arm/pci/pci.c b/xen/arch/arm/pci/pci.c
index 082c14e127a8..78b97beaef12 100644
--- a/xen/arch/arm/pci/pci.c
+++ b/xen/arch/arm/pci/pci.c
@@ -46,7 +46,7 @@ static int __init dt_pci_init(void)
 
 dt_for_each_device_node(dt_host, np)
 {
-rc = device_init(np, DEVICE_PCI, NULL);
+rc = device_init(np, DEVICE_PCI_HOSTBRIDGE, NULL);
 /*
  * Ignore the following error codes:
  *   - EBADF: Indicate the current device is not a pci device.
diff --git a/xen/include/asm-arm/device.h b/xen/include/asm-arm/device.h
index 3782660751b6..086dde13eb6b 100644
--- a/xen/include/asm-arm/device.h
+++ b/xen/include/asm-arm/device.h
@@ -37,7 +37,7 @@ enum device_class
 DEVICE_SERIAL,
 DEVICE_IOMMU,
 DEVICE_GIC,
-DEVICE_PCI,
+DEVICE_PCI_HOSTBRIDGE,
 /* Use for error */
 DEVICE_UNKNOWN,
 };
-- 
2.25.1




[PATCH v7 0/7] PCI devices passthrough on Arm, part 2

2021-11-23 Thread Oleksandr Andrushchenko
From: Oleksandr Andrushchenko 

Hi, all!

This is an assorted series of patches whose aim is to lay some further
groundwork for PCI passthrough support on Arm. The series continues the
work published earlier by Arm [1] and adds new helpers, clearing the way
for the vPCI changes which will follow.

RFC is at [2], [3]. Design presentation can be found at [4].

I have removed patch
[PATCH v6 5/7] xen/arm: do not map IRQs and memory for disabled devices
as it seems that this needs more time to decide on how to achieve that.

I have also added a new patch
[PATCH v7 4/7] xen/arm: account IO handler for emulated PCI host bridge
with a tiny latent bug fix.

Thank you,
Oleksandr

[1] https://patchwork.kernel.org/project/xen-devel/list/?series=558681
[2] https://lists.xenproject.org/archives/html/xen-devel/2020-07/msg01184.html
[3] 
https://lists.xenproject.org/archives/html/xen-devel/2020-07/threads.html#01184
[4] 
https://static.sched.com/hosted_files/xen2021/e4/PCI_Device_Passthrough_On_Arm.pdf

Oleksandr Andrushchenko (7):
  xen/arm: rename DEVICE_PCI to DEVICE_PCI_HOSTBRIDGE
  xen/arm: add pci-domain for disabled devices
  xen/arm: setup MMIO range trap handlers for hardware domain
  xen/arm: account IO handler for emulated PCI host bridge
  xen/arm: do not map PCI ECAM and MMIO space to Domain-0's p2m
  xen/arm: process pending vPCI map/unmap operations
  xen/arm: do not use void pointer in pci_host_common_probe

 xen/arch/arm/domain.c   |   2 +
 xen/arch/arm/domain_build.c | 132 +++-
 xen/arch/arm/pci/ecam.c |  18 +++-
 xen/arch/arm/pci/pci-host-common.c  |  81 +++--
 xen/arch/arm/pci/pci-host-generic.c |   2 +-
 xen/arch/arm/pci/pci-host-zynqmp.c  |   3 +-
 xen/arch/arm/pci/pci.c  |   2 +-
 xen/arch/arm/traps.c|  13 +++
 xen/arch/arm/vpci.c |  80 +++--
 xen/arch/arm/vpci.h |   6 ++
 xen/arch/x86/hvm/hvm.c  |   6 ++
 xen/common/ioreq.c  |   9 --
 xen/include/asm-arm/device.h|   2 +-
 xen/include/asm-arm/pci.h   |  27 +-
 xen/include/asm-arm/setup.h |  13 +++
 15 files changed, 323 insertions(+), 73 deletions(-)

-- 
2.25.1




[PATCH v7 2/7] xen/arm: add pci-domain for disabled devices

2021-11-23 Thread Oleksandr Andrushchenko
From: Oleksandr Andrushchenko 

If a PCI host bridge device is present in the device tree, but is
disabled, then its PCI host bridge driver is not instantiated.
This results in pci_get_host_bridge_segment() failing and the
following panic during Xen start:

(XEN) Device tree generation failed (-22).
(XEN)
(XEN) 
(XEN) Panic on CPU 0:
(XEN) Could not set up DOM0 guest OS
(XEN) 

Fix this by adding a "linux,pci-domain" property for all device tree
nodes which have the "pci" device type, so we know which segments will
be used by the guest for which bridges.

Fixes: 4cfab4425d39 ("xen/arm: Add linux,pci-domain property for hwdom if not available.")

Signed-off-by: Oleksandr Andrushchenko 

---
Since v6:
- use use_dt_domains in pci_get_new_domain_nr and return -1 if set
- do not set "linux,pci-domain" if parent device is "pci"
- move the code to a new helper handle_linux_pci_domain (Julien)
New in v6
---
 xen/arch/arm/domain_build.c| 66 +++---
 xen/arch/arm/pci/pci-host-common.c |  8 +++-
 xen/include/asm-arm/pci.h  |  8 
 3 files changed, 66 insertions(+), 16 deletions(-)

diff --git a/xen/arch/arm/domain_build.c b/xen/arch/arm/domain_build.c
index e76cee3caccf..c83c02ab8ac6 100644
--- a/xen/arch/arm/domain_build.c
+++ b/xen/arch/arm/domain_build.c
@@ -654,6 +654,55 @@ static void __init allocate_static_memory(struct domain *d,
 }
 #endif
 
+/*
+ * When PCI passthrough is available we want to keep the
+ * "linux,pci-domain" in sync for every host bridge.
+ *
+ * Xen may not have a driver for all the host bridges. So we have
+ * to write a heuristic to detect whether a device node describes
+ * a host bridge.
+ *
+ * The current heuristic assumes that a device is a host bridge
+ * if the type is "pci" and the parent type is not "pci".
+ */
+static int handle_linux_pci_domain(struct kernel_info *kinfo,
+                                   const struct dt_device_node *node)
+{
+    uint16_t segment;
+    int res;
+
+    if ( !is_pci_passthrough_enabled() )
+        return 0;
+
+    if ( !dt_device_type_is_equal(node, "pci") )
+        return 0;
+
+    if ( node->parent && dt_device_type_is_equal(node->parent, "pci") )
+        return 0;
+
+    if ( dt_find_property(node, "linux,pci-domain", NULL) )
+        return 0;
+
+    /* Allocate and create the linux,pci-domain */
+    res = pci_get_host_bridge_segment(node, &segment);
+    if ( res < 0 )
+    {
+        res = pci_get_new_domain_nr();
+        if ( res < 0 )
+        {
+            printk(XENLOG_DEBUG "Can't assign PCI segment to %s\n",
+                   node->full_name);
+            return -FDT_ERR_NOTFOUND;
+        }
+
+        segment = res;
+        printk(XENLOG_DEBUG "Assigned segment %d to %s\n",
+               segment, node->full_name);
+    }
+
+    return fdt_property_cell(kinfo->fdt, "linux,pci-domain", segment);
+}
+
 static int __init write_properties(struct domain *d, struct kernel_info *kinfo,
const struct dt_device_node *node)
 {
@@ -755,21 +804,10 @@ static int __init write_properties(struct domain *d, 
struct kernel_info *kinfo,
 return res;
 }
 
-if ( is_pci_passthrough_enabled() && dt_device_type_is_equal(node, "pci") )
-{
-if ( !dt_find_property(node, "linux,pci-domain", NULL) )
-{
-uint16_t segment;
-
-res = pci_get_host_bridge_segment(node, );
-if ( res < 0 )
-return res;
+res = handle_linux_pci_domain(kinfo, node);
 
-res = fdt_property_cell(kinfo->fdt, "linux,pci-domain", segment);
-if ( res )
-return res;
-}
-}
+if ( res )
+return res;
 
 /*
  * Override the property "status" to disable the device when it's
diff --git a/xen/arch/arm/pci/pci-host-common.c 
b/xen/arch/arm/pci/pci-host-common.c
index cdeb3a283a77..9653b92b5b2e 100644
--- a/xen/arch/arm/pci/pci-host-common.c
+++ b/xen/arch/arm/pci/pci-host-common.c
@@ -30,6 +30,8 @@ static LIST_HEAD(pci_host_bridges);
 
 static atomic_t domain_nr = ATOMIC_INIT(-1);
 
+static int use_dt_domains = -1;
+
 static inline void __iomem *pci_remap_cfgspace(paddr_t start, size_t len)
 {
 return ioremap_nocache(start, len);
@@ -137,14 +139,16 @@ void pci_add_host_bridge(struct pci_host_bridge *bridge)
     list_add_tail(&bridge->node, &pci_host_bridges);
 }
 
-static int pci_get_new_domain_nr(void)
+int pci_get_new_domain_nr(void)
 {
+    if ( use_dt_domains )
+        return -1;
+
     return atomic_inc_return(&domain_nr);
 }
 
 static int pci_bus_find_domain_nr(struct dt_device_node *dev)
 {
-static int use_dt_domains = -1;
 int domain;
 
 domain = dt_get_pci_domain_nr(dev);
diff --git a/xen/include/asm-arm/pci.h b/xen/include/asm-arm/pci.h
index 81273e0d87ac..c20eba643d86 100644
--- a/xen/include/asm-arm/pci.h
+++ b/xen/include/asm-arm/pci.h
@@ -108,6 +108,8 @@ static 

Re: About support for memory event on AMD CPUs

2021-11-23 Thread Jan Beulich
On 23.11.2021 18:14, YAN Untitled wrote:
> Dear developers,
> 
> Short version: is memory event supported on AMD CPUs or is it going to be
> supported?
> 
> Long version:
> 
> Environment: LibVMI 0.15.0 + Xen 4.14.3 on an **AMD CPU** (5950x), 64-bit
> Debian 11 Dom0.
> 
> What I am trying to accomplish: register a **memory event** handler, so that
> I can capture all memory accesses within a specific range issued by any 
> thread,
> or all memory accesses within any range issued by a specific thread.
> 
> What I got instead: an error from LibVMI saying
> "xc_hvm_set_mem_access failed with code: -1".
> 
> Some investigation: by inspecting the source code of LibVMI, I find the direct
> cause is one of the libxc functions, 1) xc_set_mem_access or
> 2) xc_altp2m_set_mem_access, returned error code -1.
> 
> After some searching, I found someone else having a similar problem [1]. I 
> also
> noted LibVMI says:
> 
>> Currently only the Xen Hypervisor provides these features,
>> and some of these are specifically only available on Intel CPUs
> 
> However, I can't find the exact confirmation for the availability of memory
> event on AMD CPUs from https://wiki.xenproject.org.

Aiui underlying what you want is altp2m, which presently depends (in
the hypervisor) on EPT being available (and in use for the guest in
question).

Jan




Re: [PATCH v6 4/7] xen/arm: do not map PCI ECAM and MMIO space to Domain-0's p2m

2021-11-23 Thread Oleksandr Andrushchenko
Hi, Julien!

On 23.11.21 18:42, Julien Grall wrote:
> Hi Oleksandr,
>
> On 05/11/2021 06:33, Oleksandr Andrushchenko wrote:
>> From: Oleksandr Andrushchenko 
>>
>> PCI host bridges are special devices in terms of implementing PCI
>> passthrough. According to [1] the current implementation depends on
>> Domain-0 to perform the initialization of the relevant PCI host
>> bridge hardware and perform PCI device enumeration. In order to
>> achieve that one of the required changes is to not map all the memory
>> ranges in map_range_to_domain as we traverse the device tree on startup
>> and perform some additional checks if the range needs to be mapped to
>> Domain-0.
>>
>> The generic PCI host controller device tree binding says [2]:
>> - ranges: As described in IEEE Std 1275-1994, but must provide
>>    at least a definition of non-prefetchable memory. One
>>    or both of prefetchable Memory and IO Space may also
>>    be provided.
>>
>> - reg   : The Configuration Space base address and size, as accessed
>>    from the parent bus.  The base address corresponds to
>>    the first bus in the "bus-range" property.  If no
>>    "bus-range" is specified, this will be bus 0 (the default).
>>
>>  From the above none of the memory ranges from the "ranges" property
>> needs to be mapped to Domain-0 at startup as MMIO mapping is going to
>> be handled dynamically by vPCI as we assign PCI devices, e.g. each
>> device assigned to Domain-0/guest will have its MMIOs mapped/unmapped
>> as needed by Xen.
>>
>> The "reg" property covers not only ECAM space, but may also have other
>> then the configuration memory ranges described, for example [3]:
>> - reg: Should contain rc_dbi, config registers location and length.
>> - reg-names: Must include the following entries:
>>     "rc_dbi": controller configuration registers;
>>     "config": PCIe configuration space registers.
>>
>> This patch makes it possible to not map all the ranges from the
>> "ranges" property and also ECAM from the "reg". All the rest from the
>> "reg" property still needs to be mapped to Domain-0, so the PCI
>> host bridge remains functional in Domain-0.
>
> The commit message is explaining the problematic quite well (thanks for 
> that). I think it also wants to explain the approach taken (the fact that we 
> are deferring the mapping for hostbridges).
Will add
>
>>
>> [1] https://lists.xenproject.org/archives/html/xen-devel/2020-07/msg00777.html
>> [2] https://www.kernel.org/doc/Documentation/devicetree/bindings/pci/host-generic-pci.txt
>> [3] https://www.kernel.org/doc/Documentation/devicetree/bindings/pci/hisilicon-pcie.txt
>>
>> Signed-off-by: Oleksandr Andrushchenko 
>> ---
>> Since v5:
>> - remove some need_mapping local variables
>> - use own_device in handle_device
>> - add __init for pci_ecam_need_p2m_hwdom_mapping
>> - make pci_host_bridge_mappings use p2m_mmio_direct_dev directly
>> Since v4:
>> - update skip_mapping comment
>> - add comment why we need to map interrupts to Dom0
>> Since v3:
>>   - pass struct map_range_data to map_dt_irq_to_domain
>>   - remove redundant check from map_range_to_domain
>>   - fix handle_device's .skip_mapping
>> Since v2:
>>   - removed check in map_range_to_domain for PCI_DEV
>>     and moved it to handle_device, so the code is
>>     simpler
>>   - s/map_pci_bridge/skip_mapping
>>   - extended comment in pci_host_bridge_mappings
>>   - minor code restructure in construct_dom0
>>   - s/.need_p2m_mapping/.need_p2m_hwdom_mapping and related
>>     callbacks
>>   - unsigned int i; in pci_host_bridge_mappings
>> Since v1:
>>   - Added better description of why and what needs to be mapped into
>>     Domain-0's p2m and what doesn't
>>   - Do not do any mappings for PCI devices while traversing the DT
>>   - Walk all the bridges and make required mappings in one go
>> ---
>>   xen/arch/arm/domain_build.c    | 67 +-
>>   xen/arch/arm/pci/ecam.c    | 14 +++
>>   xen/arch/arm/pci/pci-host-common.c | 50 ++
>>   xen/arch/arm/pci/pci-host-zynqmp.c |  1 +
>>   xen/include/asm-arm/pci.h  | 10 +
>>   xen/include/asm-arm/setup.h    | 13 ++
>>   6 files changed, 126 insertions(+), 29 deletions(-)
>>
>> diff --git a/xen/arch/arm/domain_build.c b/xen/arch/arm/domain_build.c
>> index f7fcb1400c19..c7d992456ca7 100644
>> --- a/xen/arch/arm/domain_build.c
>> +++ b/xen/arch/arm/domain_build.c
>> @@ -10,7 +10,6 @@
>>   #include 
>>   #include 
>>   #include 
>> 

Re: [xen-4.15-testing test] 166311: regressions - FAIL

2021-11-23 Thread Jan Beulich
On 24.11.2021 03:00, osstest service owner wrote:
> flight 166311 xen-4.15-testing real [real]
> http://logs.test-lab.xenproject.org/osstest/logs/166311/
> 
> Regressions :-(
> 
> Tests which did not succeed and are blocking,
> including tests which could not be run:
>  test-amd64-amd64-xl-qemuu-debianhvm-amd64 12 debian-hvm-install fail REGR. 
> vs. 166198

At the example of this, I find:

Err:1 http://ftp.debian.org/debian buster/main amd64 genisoimage amd64 
9:1.1.11-3+b2
  Could not connect to cache:3143 (172.16.148.6). - connect (113: No route to 
host)
Err:2 http://ftp.debian.org/debian buster/main amd64 rsync amd64 3.1.3-6
  Unable to connect to cache:3143:
E: Failed to fetch 
http://ftp.debian.org/debian/pool/main/c/cdrkit/genisoimage_1.1.11-3+b2_amd64.deb
  Could not connect to cache:3143 (172.16.148.6). - connect (113: No route to 
host)
E: Failed to fetch 
http://ftp.debian.org/debian/pool/main/r/rsync/rsync_3.1.3-6_amd64.deb  Unable 
to connect to cache:3143:
E: Unable to fetch some archives, maybe run apt-get update or try with 
--fix-missing?

Network issue (hopefully just a transient one)?

Jan

>  test-amd64-i386-xl-qemuu-debianhvm-amd64-shadow 12 debian-hvm-install fail 
> REGR. vs. 166198
>  test-amd64-i386-xl   12 debian-install   fail REGR. vs. 
> 166198
>  test-amd64-amd64-xl-credit2  12 debian-install   fail REGR. vs. 
> 166198
>  test-amd64-amd64-xl-qemuu-ovmf-amd64 12 debian-hvm-install fail REGR. vs. 
> 166198
>  test-amd64-amd64-xl-qemuu-debianhvm-amd64-shadow 12 debian-hvm-install fail 
> REGR. vs. 166198
>  test-amd64-i386-xl-qemuu-debianhvm-i386-xsm 12 debian-hvm-install fail REGR. 
> vs. 166198
>  test-amd64-i386-xl-qemut-stubdom-debianhvm-amd64-xsm 12 debian-hvm-install 
> fail REGR. vs. 166198
>  test-amd64-i386-xl-qemuu-dmrestrict-amd64-dmrestrict 14 
> guest-start/debianhvm.repeat fail REGR. vs. 166198
>  test-amd64-i386-xl-qemut-debianhvm-amd64 12 debian-hvm-install fail REGR. 
> vs. 166198
>  test-amd64-amd64-xl-qemuu-debianhvm-i386-xsm 12 debian-hvm-install fail 
> REGR. vs. 166198
>  test-amd64-i386-xl-qemuu-ovmf-amd64 12 debian-hvm-install   running
>  test-amd64-i386-xl-qemuu-ovmf-amd64  4 syslog-serverrunning
> 
> Tests which did not succeed, but are not blocking:
>  test-amd64-amd64-xl-rtds 20 guest-localmigrate/x10   fail  like 
> 166198
>  test-amd64-amd64-xl-qemuu-win7-amd64 19 guest-stopfail like 
> 166198
>  test-amd64-amd64-xl-qemut-win7-amd64 19 guest-stopfail like 
> 166198
>  test-armhf-armhf-libvirt 16 saverestore-support-checkfail  like 
> 166198
>  test-amd64-amd64-qemuu-nested-amd 20 debian-hvm-install/l1/l2 fail like 
> 166198
>  test-amd64-amd64-xl-qemuu-ws16-amd64 19 guest-stopfail like 
> 166198
>  test-amd64-amd64-xl-qemut-ws16-amd64 19 guest-stopfail like 
> 166198
>  test-amd64-i386-xl-qemuu-ws16-amd64 19 guest-stop fail like 
> 166198
>  test-armhf-armhf-libvirt-raw 15 saverestore-support-checkfail  like 
> 166198
>  test-amd64-i386-xl-qemut-win7-amd64 19 guest-stop fail like 
> 166198
>  test-amd64-i386-xl-qemuu-win7-amd64 19 guest-stop fail like 
> 166198
>  test-armhf-armhf-libvirt-qcow2 15 saverestore-support-check   fail like 
> 166198
>  test-amd64-i386-xl-qemut-ws16-amd64 19 guest-stop fail like 
> 166198
>  test-amd64-amd64-libvirt 15 migrate-support-checkfail   never 
> pass
>  test-amd64-i386-libvirt  15 migrate-support-checkfail   never 
> pass
>  test-amd64-i386-libvirt-xsm  15 migrate-support-checkfail   never 
> pass
>  test-arm64-arm64-xl  15 migrate-support-checkfail   never 
> pass
>  test-arm64-arm64-xl  16 saverestore-support-checkfail   never 
> pass
>  test-arm64-arm64-xl-seattle  15 migrate-support-checkfail   never 
> pass
>  test-arm64-arm64-xl-seattle  16 saverestore-support-checkfail   never 
> pass
>  test-amd64-i386-xl-pvshim14 guest-start  fail   never 
> pass
>  test-amd64-amd64-libvirt-xsm 15 migrate-support-checkfail   never 
> pass
>  test-amd64-i386-libvirt-qemuu-debianhvm-amd64-xsm 13 migrate-support-check 
> fail never pass
>  test-arm64-arm64-xl-credit1  15 migrate-support-checkfail   never 
> pass
>  test-arm64-arm64-xl-credit1  16 saverestore-support-checkfail   never 
> pass
>  test-arm64-arm64-libvirt-xsm 15 migrate-support-checkfail   never 
> pass
>  test-arm64-arm64-libvirt-xsm 16 saverestore-support-checkfail   never 
> pass
>  test-arm64-arm64-xl-credit2  15 migrate-support-checkfail   never 
> pass
>  test-arm64-arm64-xl-xsm  15 migrate-support-checkfail   never 
> pass
>  test-arm64-arm64-xl-credit2  16 saverestore-support-checkfail   never 
> pass
>  test-arm64-arm64-xl-xsm  16 saverestore-support-checkfail   never 
> pass
>  test-arm64-arm64-xl-thunderx 15 migrate-support-check

Re: [PATCH v4] xen: detect uninitialized xenbus in xenbus_init

2021-11-23 Thread Jan Beulich
On 23.11.2021 22:07, Stefano Stabellini wrote:
> From: Stefano Stabellini 
> 
> If the xenstore page hasn't been allocated properly, reading the value
> of the related hvm_param (HVM_PARAM_STORE_PFN) won't actually return
> error. Instead, it will succeed and return zero. Instead of attempting
> to xen_remap a bad guest physical address, detect this condition and
> return early.
> 
> Note that although a guest physical address of zero for
> HVM_PARAM_STORE_PFN is theoretically possible, it is not a good choice
> and zero has never been validly used in that capacity.
> 
> Also recognize all bits set as an invalid value.
> 
> For 32-bit Linux, any pfn above ULONG_MAX would get truncated. Pfns
> above ULONG_MAX should never be passed by the Xen tools to HVM guests
> anyway, so check for this condition and return early.
> 
> Cc: sta...@vger.kernel.org
> Signed-off-by: Stefano Stabellini 

Reviewed-by: Jan Beulich 
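
For illustration, the check described in the patch quoted above amounts
to roughly the following sketch (variable names such as v and err, the
out_error label, and the exact error codes are assumptions taken from the
surrounding xenbus_init() context; the patch itself is authoritative):

        uint64_t v = 0;

        err = hvm_get_parameter(HVM_PARAM_STORE_PFN, &v);
        if (err)
                goto out_error;

        /*
         * Uninitialized hvm_params read back as zero without error, and
         * all bits set is equally implausible, so treat both as "the
         * xenstore page was never set up".
         */
        if (!v || !~v) {
                err = -ENOENT;
                goto out_error;
        }

#if BITS_PER_LONG == 32
        /* Avoid silently truncating the pfn on 32-bit kernels. */
        if (v > ULONG_MAX) {
                err = -EINVAL;
                goto out_error;
        }
#endif

        xen_store_gfn = (unsigned long)v;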




Re: [PATCH v6 3/7] xen/arm: setup MMIO range trap handlers for hardware domain

2021-11-23 Thread Oleksandr Andrushchenko
Hi, Julien!

On 23.11.21 18:58, Julien Grall wrote:
> Hi,
>
> On 23/11/2021 16:41, Oleksandr Andrushchenko wrote:
>> On 23.11.21 18:12, Julien Grall wrote:
>>> On 23/11/2021 06:58, Oleksandr Andrushchenko wrote:
 unsigned int domain_vpci_get_num_mmio_handlers(struct domain *d)
 {
        if ( !has_vpci(d) )
        return 0;

        if ( is_hardware_domain(d) )
        {
        int ret = pci_host_iterate_bridges_and_count(d, 
 vpci_get_num_handlers_cb);

        return ret < 0 ? 0 : ret;
        }

        /*
     * This is a guest domain:
     *
     * 1 for a single emulated host bridge's configuration space.
     */
        return 1;
>>>
>>> I am afraid that my question stands even with this approach. This patch is 
>>> only meant to handle the hardware domain, therefore the change seems to be 
>>> out of context.
>>>
>>> I would prefer if this change is done separately.
>> While I do agree that MSI part and virtual bus topology are not belonging to 
>> this
>> patch I can't agree with the rest: we already have MMIO handlers for guest 
>> domains
>> and we introduce domain_vpci_get_num_mmio_handlers which must also account
>> on guests and stay consistent.
>> So, despite the patch has "hardware domain" in its name it doesn't mean we 
>> should
>> break guests here.
>
> We were already registering the handler for guest domain before your patch. 
> So this is nothing new.
>
> At the moment, we always allocate an extra 16 slot for IO handlers (see 
> MAX_IO_HANDLER). So we are not breaking anything. Instead, this is simply a 
> latent bug.
Agree
>
>> Thus I do think the above is still correct wrt this patch.
>
> The idea of splitting patch is to separate bug fix from new code. This helps 
> backporting and review.
>
> In this case, we don't care about backport (PCI passthrough is no supported) 
> and the fix a simple enough. So I am not going to insist on splitting to a 
> separate patch.
>
> However, this change *must* be explained in the commit message.
I will add a dedicated patch to fix that
>
> Cheers,
>
Thank you,
Oleksandr

Re: [PATCH 1/3] VT-d: prune SAGAW recognition

2021-11-23 Thread Jan Beulich
On 24.11.2021 02:22, Tian, Kevin wrote:
>> From: Jan Beulich 
>> Sent: Tuesday, November 23, 2021 9:40 PM
>>
>> Bit 0 of the capability register field has become reserved at or before
> 
> Bit 0 of 'SAGAW' in the capability register ...

I've changed it, but I thought the use of "field" in the sentence
together with the title would be entirely unambiguous.

>> spec version 2.2. Treat it as such. Replace the effective open-coding of
>> find_first_set_bit(). Adjust local variable types.
>>
>> Signed-off-by: Jan Beulich 
>> ---
>> Strictly speaking IOMMUs supporting only 3-level tables ought to result
>> in guests seeing a suitably reduced physical address width in CPUID.
>> And then the same would apply to restrictions resulting from MGAW.
> 
> Yes. I remember there was some old discussion in Qemu community
> for whether guest physical addr width should be based on IOMMU
> constraints when passthrough device is used. But it didn't go anywhere
> (and I cannot find the link...)

I've added an item to my todo list.

> anyway with above comment fixed:
> 
>   Reviewed-by: Kevin Tian 

Thanks.

Jan




Re: [PATCH v4] xenbus: support large messages

2021-11-23 Thread Juergen Gross

Ping?

On 04.10.21 11:40, Juergen Gross wrote:

Today the implementation of the xenbus protocol in Mini-OS will only
allow to transfer the complete message to or from the ring page buffer.
This is limiting the maximum message size to lower values as the xenbus
protocol normally would allow.

Change that by allowing to transfer the xenbus message in chunks as
soon as they are available.

Avoid crashing Mini-OS in case of illegal data read from the ring
buffer.

Signed-off-by: Juergen Gross 
---
V2:
- drop redundant if (Samuel Thibault)
- move rmb() (Samuel Thibault)
V3:
- correct notification test (Samuel Thibault)
V4:
- more memory barriers (Samuel Thibault)
---
  xenbus/xenbus.c | 210 
  1 file changed, 122 insertions(+), 88 deletions(-)

diff --git a/xenbus/xenbus.c b/xenbus/xenbus.c
index 23de61e..b687678 100644
--- a/xenbus/xenbus.c
+++ b/xenbus/xenbus.c
@@ -29,6 +29,7 @@
  #include 
  #include 
  #include 
+#include 
  
  #define min(x,y) ({   \

  typeof(x) tmpx = (x); \
@@ -46,6 +47,7 @@
  static struct xenstore_domain_interface *xenstore_buf;
  static DECLARE_WAIT_QUEUE_HEAD(xb_waitq);
  DECLARE_WAIT_QUEUE_HEAD(xenbus_watch_queue);
+static __DECLARE_SEMAPHORE_GENERIC(xb_write_sem, 1);
  
  xenbus_event_queue xenbus_events;

  static struct watch {
@@ -231,75 +233,103 @@ char *xenbus_wait_for_state_change(const char* path, 
XenbusState *state, xenbus_
  }
  
  
+static void xenbus_read_data(char *buf, unsigned int len)

+{
+unsigned int off = 0;
+unsigned int prod, cons;
+unsigned int size;
+
+while (off != len)
+{
+wait_event(xb_waitq, xenstore_buf->rsp_prod != xenstore_buf->rsp_cons);
+
+prod = xenstore_buf->rsp_prod;
+cons = xenstore_buf->rsp_cons;
+DEBUG("Rsp_cons %d, rsp_prod %d.\n", cons, prod);
+size = min(len - off, prod - cons);
+
+rmb();   /* Make sure data read from ring is ordered with rsp_prod. */
+memcpy_from_ring(xenstore_buf->rsp, buf + off,
+ MASK_XENSTORE_IDX(cons), size);
+off += size;
+mb();/* memcpy() and rsp_cons update must not be reordered. */
+xenstore_buf->rsp_cons += size;
+mb();/* rsp_cons must be visible before we look at rsp_prod. */
+if (xenstore_buf->rsp_prod - cons >= XENSTORE_RING_SIZE)
+notify_remote_via_evtchn(xenbus_evtchn);
+}
+}
+
  static void xenbus_thread_func(void *ign)
  {
  struct xsd_sockmsg msg;
-unsigned prod = xenstore_buf->rsp_prod;
+char *data;
  
  for (;;) {

-wait_event(xb_waitq, prod != xenstore_buf->rsp_prod);
-while (1) {
-prod = xenstore_buf->rsp_prod;
-DEBUG("Rsp_cons %d, rsp_prod %d.\n", xenstore_buf->rsp_cons,
-  xenstore_buf->rsp_prod);
-if (xenstore_buf->rsp_prod - xenstore_buf->rsp_cons < sizeof(msg))
-break;
-rmb();
-memcpy_from_ring(xenstore_buf->rsp, &msg,
- MASK_XENSTORE_IDX(xenstore_buf->rsp_cons),
- sizeof(msg));
-DEBUG("Msg len %d, %d avail, id %d.\n", msg.len + sizeof(msg),
-  xenstore_buf->rsp_prod - xenstore_buf->rsp_cons, msg.req_id);
-
-if (xenstore_buf->rsp_prod - xenstore_buf->rsp_cons <
-sizeof(msg) + msg.len)
-break;
-
-DEBUG("Message is good.\n");
-
-if (msg.type == XS_WATCH_EVENT) {
-struct xenbus_event *event = malloc(sizeof(*event) + msg.len);
-xenbus_event_queue *events = NULL;
-char *data = (char*)event + sizeof(*event);
-struct watch *watch;
-
-memcpy_from_ring(xenstore_buf->rsp, data,
-MASK_XENSTORE_IDX(xenstore_buf->rsp_cons + sizeof(msg)),
-msg.len);
-
-event->path = data;
-event->token = event->path + strlen(event->path) + 1;
-
-mb();
-xenstore_buf->rsp_cons += msg.len + sizeof(msg);
-
-for (watch = watches; watch; watch = watch->next)
-if (!strcmp(watch->token, event->token)) {
-events = watch->events;
-break;
-}
-
-if (events) {
-event->next = *events;
-*events = event;
-wake_up(&xenbus_watch_queue);
-} else {
-printk("unexpected watch token %s\n", event->token);
-free(event);
+xenbus_read_data((char *)&msg, sizeof(msg));
+DEBUG("Msg len %d, %d avail, id %d.\n", msg.len + sizeof(msg),
+  xenstore_buf->rsp_prod - xenstore_buf->rsp_cons, msg.req_id);
+
+if (msg.len > XENSTORE_PAYLOAD_MAX) {
+printk("Xenstore violates protocol, message longer 

Re: [PATCH v6 2/7] xen/arm: add pci-domain for disabled devices

2021-11-23 Thread Oleksandr Andrushchenko
Hi, Julien!

On 23.11.21 19:15, Julien Grall wrote:
> Hi,
>
> On 23/11/2021 16:44, Oleksandr Andrushchenko wrote:
>> On 23.11.21 18:05, Julien Grall wrote:
>>> Hi Oleksandr,
>>>
>>> On 23/11/2021 06:31, Oleksandr Andrushchenko wrote:


 On 22.11.21 19:17, Julien Grall wrote:
> Hi,
>
> On 22/11/2021 16:23, Oleksandr Andrushchenko wrote:
>> On 22.11.21 17:29, Julien Grall wrote:
>>> I would prefer to go with my way. This can be refined in the future if 
>>> we find Device-Tree that matches what you wrote.
>> Ok, so just to make it clear:
>>     >a PCI device would always be described as a child of the 
>> hostbridges. So I would rework the 'if' to also check if the parent type 
>> is not "pci"
>
> That's correct. Thank you!
 Ok, so how about
        if ( is_pci_passthrough_enabled() && dt_device_type_is_equal(node, 
 "pci") )
        {
        bool skip = false;

        /*
     * If the parent is also a "pci" device, then "linux,pci-domain"
     * should already be there, so nothing to do then.
     */
>>>
>>> This comment is a bit confusing.
>> Do you have something on your mind?
>
> Yes, explain that we only want to cover hostbridge (see my reply on the 
> previous answer).
I guess this will be explained by the comment to handle_linux_pci_domain below
>
>>> I think what matter if the parent is a "pci" device, then the current node 
>>> must not be a hostbridge. So we can skip it.
>> By skipping you only mean we do not need to add/assign "linux,pci-domain", 
>> right?
>
> Yes.
>
>>> However...
>>>
        if ( node->parent && dt_device_type_is_equal(node->parent, 
 "pci") )
        skip = true;

        if ( !skip && !dt_find_property(node, "linux,pci-domain", NULL) 
 )
        {
 I played with a single if and it looks scary...
>>>
>>> ... how about introducing an helper that will return true if this node is 
>>> likely an hostbridge?
>> This is yet a single use of such a check: why would we need a helper and 
>> what would that
>> helper do?
>
> I like splitting code in multiple functions even if they are only called once 
> because this:
>   1) keeps the functions line count small
>   2) easier to understand what is the purpose of the check
>
> In fact, I would actually move the handling for "linux,pci-domain" in a 
> separate helper. Something like:
>
> /*
>  * When PCI passthrough is available we want to keep the
>  * "linux,pci-domain" in sync for every hostbridge.
>  *
>  * Xen may not have a driver for all the hostbridges. So we have
>  * to write an heuristic to detect whether a device node describes
>  * an hostbridge.
>  *
>  * The current heuristic assumes that a device is an hostbridge
>  * if the type is "pci" and then parent type is not "pci".
>  */
> static int handle_linux_pci_domain(struct dt_device *node)
> {
>     if ( !is_pci_passthrough_enabled() )
>   return 0;
>
> if ( !dt_device_type_is_equal(node, "pci") )
>   return 0;
>
>     if ( node->parent && dt_device_type_is_equal(node->parent, "pci") )
>   return 0;
>
>     if ( dt_find_property(node, "linux,pci-domain", NULL )
>   return 0;
>
>     /* Allocate and create the linux,pci-domain */
> }
>
I'm fine with this, will move, thanks
> Cheers,
>
Thank you,
Oleksandr

RE: [PATCH] cpu/hotplug: Allow the CPU in CPU_UP_PREPARE state to be brought up again.

2021-11-23 Thread Henry Wang
Hi,

> -Original Message-
> From: Xen-devel  On Behalf Of
> Sebastian Andrzej Siewior
> Sent: Monday, November 22, 2021 11:47 PM
> To: Longpeng (Mike, Cloud Infrastructure Service Product Dept.)
> 
> Cc: linux-ker...@vger.kernel.org; Gonglei (Arei) ;
> x...@kernel.org; xen-devel@lists.xenproject.org; Peter Zijlstra
> ; Ingo Molnar ; Valentin
> Schneider ; Boris Ostrovsky
> ; Juergen Gross ; Stefano
> Stabellini ; Thomas Gleixner ;
> Ingo Molnar ; Borislav Petkov ; Dave
> Hansen ; H. Peter Anvin 
> Subject: [PATCH] cpu/hotplug: Allow the CPU in CPU_UP_PREPARE state to be
> brought up again.
> 
> From: "Longpeng(Mike)" 
> 
> A CPU will not show up in a virtualized environment which includes an
> Enclave. The VM splits its resources into a primary VM and an Enclave
> VM. While the Enclave is active, the hypervisor will ignore all requests
> to bring up a CPU and this CPU will remain in CPU_UP_PREPARE state.
> The kernel will wait up to ten seconds for CPU to show up
> (do_boot_cpu()) and then rollback the hotplug state back to
> CPUHP_OFFLINE leaving the CPU state in CPU_UP_PREPARE. The CPU state is
> set back to CPUHP_TEARDOWN_CPU during the CPU_POST_DEAD stage.
> 
> After the Enclave VM terminates, the primary VM can bring up the CPU
> again.
> 
> Allow to bring up the CPU if it is in the CPU_UP_PREPARE state.
> 
> [bigeasy: Rewrite commit description.]
> 
> Signed-off-by: Longpeng(Mike) 
> Signed-off-by: Sebastian Andrzej Siewior 
> Link: https://lore.kernel.org/r/20210901051143.2752-1-
> longpe...@huawei.com
> ---
> 
> For XEN: this changes the behaviour as it allows to invoke
> cpu_initialize_context() again should it have failed earlier. I *think*
> this is okay and would allow bringing up the CPU again should the memory
> allocation in cpu_initialize_context() fail.
> 
>  kernel/smpboot.c | 7 +++
>  1 file changed, 7 insertions(+)
> 
> diff --git a/kernel/smpboot.c b/kernel/smpboot.c
> index f6bc0bc8a2aab..34958d7fe2c1c 100644
> --- a/kernel/smpboot.c
> +++ b/kernel/smpboot.c
> @@ -392,6 +392,13 @@ int cpu_check_up_prepare(int cpu)
>*/
>   return -EAGAIN;
> 
> + case CPU_UP_PREPARE:
> + /*
> +  * Timeout while waiting for the CPU to show up. Allow to try
> +  * again later.
> +  */
> + return 0;
> +
>   default:
> 
>   /* Should not happen.  Famous last words. */
> --
> 2.33.1
> 

Reviewed-by: Henry Wang 

Kind regards,

Henry


Re: [PATCH] cpu/hotplug: Allow the CPU in CPU_UP_PREPARE state to be brought up again.

2021-11-23 Thread Dongli Zhang



On 11/23/21 3:50 PM, Longpeng (Mike, Cloud Infrastructure Service Product Dept.)
wrote:
> 
> 
>> -Original Message-
>> From: Dongli Zhang [mailto:dongli.zh...@oracle.com]
>> Sent: Wednesday, November 24, 2021 5:22 AM
>> To: Sebastian Andrzej Siewior ; Longpeng (Mike, Cloud
>> Infrastructure Service Product Dept.) 
>> Cc: linux-ker...@vger.kernel.org; Gonglei (Arei) ;
>> x...@kernel.org; xen-devel@lists.xenproject.org; Peter Zijlstra
>> ; Ingo Molnar ; Valentin Schneider
>> ; Boris Ostrovsky ;
>> Juergen Gross ; Stefano Stabellini ;
>> Thomas Gleixner ; Ingo Molnar ; 
>> Borislav
>> Petkov ; Dave Hansen ; H. Peter
>> Anvin 
>> Subject: Re: [PATCH] cpu/hotplug: Allow the CPU in CPU_UP_PREPARE state to be
>> brought up again.
>>
>> Tested-by: Dongli Zhang 
>>
>>
>> The bug fixed by commit 53fafdbb8b21 ("KVM: x86: switch KVMCLOCK base to
>> monotonic raw clock") may leave the cpu_hotplug_state at CPU_UP_PREPARE. As a
>> result, to online this CPU again (even after removal) is always failed.
>>
>> I have tested that this patch works well to workaround the issue, by 
>> introducing
>> either a mdeley(11000) or while(1); to start_secondary(). That is, to online
>> the
>> same CPU again is successful even after initial do_boot_cpu() failure.
>>
>> 1. add mdelay(11000) or while(1); to the start_secondary().
>>
>> 2. to online CPU is failed at do_boot_cpu().
>>
> 
> Thanks for your testing :)
> 
> Does the cpu4 spin in wait_for_master_cpu() in your case ?

I did two tests.

TEST 1.

I added "mdelay(11000);" as the first line in start_secondary(). Once the issue
was encountered, the RIP of CPU=4 was 8c242021 (from QEMU's "info
registers -a") which was in the range of wait_for_master_cpu().

# cat /proc/kallsyms | grep 8c2420
8c242010 t wait_for_master_cpu
8c242030 T load_fixmap_gdt
8c242060 T native_write_cr4
8c2420c0 T cr4_init


TEST 2.

I added "while(true);" as the first line in start_secondary(). Once the issue
was encountered, the RIP of CPU=4 was 91654c0a (from QEMU's "info
registers -a") which was in the range of start_secondary().

# cat /proc/kallsyms | grep 91654c0
91654c00 t start_secondary

Dongli Zhang


> 
>> 3. to online CPU again is failed without this patch.
>>
>> # echo 1 > /sys/devices/system/cpu/cpu4/online
>> -su: echo: write error: Input/output error
>>
>> 4. to online CPU again is successful with this patch.
>>
>> Thank you very much!
>>
>> Dongli Zhang
>>
>> On 11/22/21 7:47 AM, Sebastian Andrzej Siewior wrote:
>>> From: "Longpeng(Mike)" 
>>>
>>> A CPU will not show up in a virtualized environment which includes an
>>> Enclave. The VM splits its resources into a primary VM and an Enclave
>>> VM. While the Enclave is active, the hypervisor will ignore all requests
>>> to bring up a CPU and this CPU will remain in CPU_UP_PREPARE state.
>>> The kernel will wait up to ten seconds for CPU to show up
>>> (do_boot_cpu()) and then rollback the hotplug state back to
>>> CPUHP_OFFLINE leaving the CPU state in CPU_UP_PREPARE. The CPU state is
>>> set back to CPUHP_TEARDOWN_CPU during the CPU_POST_DEAD stage.
>>>
>>> After the Enclave VM terminates, the primary VM can bring up the CPU
>>> again.
>>>
>>> Allow to bring up the CPU if it is in the CPU_UP_PREPARE state.
>>>
>>> [bigeasy: Rewrite commit description.]
>>>
>>> Signed-off-by: Longpeng(Mike) 
>>> Signed-off-by: Sebastian Andrzej Siewior 
>>> Link: https://lore.kernel.org/r/20210901051143.2752-1-longpeng2@huawei.com
>>> ---
>>>
>>> For XEN: this changes the behaviour as it allows to invoke
>>> cpu_initialize_context() again should it have failed earlier. I *think*
>>> this is okay and would allow bringing up the CPU again should the memory
>>> allocation in cpu_initialize_context() fail.
>>>
>>>  kernel/smpboot.c | 7 +++
>>>  1 file changed, 7 insertions(+)
>>>
>>> diff --git a/kernel/smpboot.c b/kernel/smpboot.c
>>> index f6bc0bc8a2aab..34958d7fe2c1c 100644
>>> --- a/kernel/smpboot.c
>>> +++ b/kernel/smpboot.c
>>> @@ -392,6 +392,13 @@ int cpu_check_up_prepare(int cpu)
>>>  */
>>> return -EAGAIN;
>>>
>>> +   case CPU_UP_PREPARE:
>>> +   /*
>>> +* Timeout while waiting for the CPU to show up. Allow to try
>>> +* again later.
>>> +*/
>>> +   return 0;
>>> +
>>> default:
>>>
>>> /* Should not happen.  Famous last words. */
>>>



Re: [PATCH V2 3/4] xen/unpopulated-alloc: Add mechanism to use Xen resource

2021-11-23 Thread Juergen Gross

On 23.11.21 17:46, Oleksandr wrote:


On 20.11.21 04:19, Stefano Stabellini wrote:

Hi Stefano, Juergen, all



Juergen please see the bottom of the email

On Fri, 19 Nov 2021, Oleksandr wrote:

On 19.11.21 02:59, Stefano Stabellini wrote:

On Tue, 9 Nov 2021, Oleksandr wrote:

On 28.10.21 19:37, Stefano Stabellini wrote:

Hi Stefano

I am sorry for the late response.


On Tue, 26 Oct 2021, Oleksandr Tyshchenko wrote:

From: Oleksandr Tyshchenko 

The main reason of this change is that unpopulated-alloc
code cannot be used in its current form on Arm, but there
is a desire to reuse it to avoid wasting real RAM pages
for the grant/foreign mappings.

The problem is that system "iomem_resource" is used for
the address space allocation, but the really unallocated
space can't be figured out precisely by the domain on Arm
without hypervisor involvement. For example, not all device
I/O regions are known by the time domain starts creating
grant/foreign mappings. And following the advise from
"iomem_resource" we might end up reusing these regions by
a mistake. So, the hypervisor which maintains the P2M for
the domain is in the best position to provide unused regions
of guest physical address space which could be safely used
to create grant/foreign mappings.

Introduce new helper arch_xen_unpopulated_init() which purpose
is to create specific Xen resource based on the memory regions
provided by the hypervisor to be used as unused space for Xen
scratch pages.

If arch doesn't implement arch_xen_unpopulated_init() to
initialize Xen resource the default "iomem_resource" will be used.
So the behavior on x86 won't be changed.

Also fall back to allocate xenballooned pages (steal real RAM
pages) if we do not have any suitable resource to work with and
as the result we won't be able to provide unpopulated pages.

Signed-off-by: Oleksandr Tyshchenko 
---
Changes RFC -> V2:
  - new patch, instead of
   "[RFC PATCH 2/2] xen/unpopulated-alloc: Query hypervisor to
provide
unallocated space"
---
    drivers/xen/unpopulated-alloc.c | 89
+++--
    include/xen/xen.h   |  2 +
    2 files changed, 88 insertions(+), 3 deletions(-)

diff --git a/drivers/xen/unpopulated-alloc.c
b/drivers/xen/unpopulated-alloc.c
index a03dc5b..1f1d8d8 100644
--- a/drivers/xen/unpopulated-alloc.c
+++ b/drivers/xen/unpopulated-alloc.c
@@ -8,6 +8,7 @@
  #include 
    +#include 
    #include 
    #include 
    @@ -15,13 +16,29 @@ static DEFINE_MUTEX(list_lock);
    static struct page *page_list;
    static unsigned int list_count;
    +static struct resource *target_resource;
+static struct resource xen_resource = {
+    .name = "Xen unused space",
+};
+
+/*
+ * If arch is not happy with system "iomem_resource" being used for
+ * the region allocation it can provide it's own view by 
initializing
+ * "xen_resource" with unused regions of guest physical address 
space

+ * provided by the hypervisor.
+ */
+int __weak arch_xen_unpopulated_init(struct resource *res)
+{
+    return -ENOSYS;
+}
+
    static int fill_list(unsigned int nr_pages)
    {
    struct dev_pagemap *pgmap;
-    struct resource *res;
+    struct resource *res, *tmp_res = NULL;
    void *vaddr;
    unsigned int i, alloc_pages = round_up(nr_pages,
PAGES_PER_SECTION);
-    int ret = -ENOMEM;
+    int ret;
  res = kzalloc(sizeof(*res), GFP_KERNEL);
    if (!res)
@@ -30,7 +47,7 @@ static int fill_list(unsigned int nr_pages)
    res->name = "Xen scratch";
    res->flags = IORESOURCE_MEM | IORESOURCE_BUSY;
-    ret = allocate_resource(&iomem_resource, res,
+    ret = allocate_resource(target_resource, res,
    alloc_pages * PAGE_SIZE, 0, -1,
    PAGES_PER_SECTION * PAGE_SIZE, NULL,
NULL);
    if (ret < 0) {
@@ -38,6 +55,31 @@ static int fill_list(unsigned int nr_pages)
    goto err_resource;
    }
    +    /*
+ * Reserve the region previously allocated from Xen resource
to avoid
+ * re-using it by someone else.
+ */
+    if (target_resource != _resource) {
+    tmp_res = kzalloc(sizeof(*tmp_res), GFP_KERNEL);
+    if (!res) {
+    ret = -ENOMEM;
+    goto err_insert;
+    }
+
+    tmp_res->name = res->name;
+    tmp_res->start = res->start;
+    tmp_res->end = res->end;
+    tmp_res->flags = res->flags;
+
+    ret = insert_resource(&iomem_resource, tmp_res);
+    if (ret < 0) {
+    pr_err("Cannot insert IOMEM resource [%llx -
%llx]\n",
+   tmp_res->start, tmp_res->end);
+    kfree(tmp_res);
+    goto err_insert;
+    }
+    }

I am a bit confused.. why do we need to do this? Who could be
erroneously re-using the region? Are you saying that the next time
allocate_resource is called it could find the same region again? It
doesn't seem possible?
No, as I understand it, allocate_resource() being called for the same root
resource won't provide the same region... 
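In other words: a range handed out by allocate_resource() against the private
"xen_resource" root is only recorded in that private tree, so the hunk above
mirrors it into "iomem_resource" to keep other iomem users from claiming the
same range. A minimal sketch of that mirroring (illustrative only; the posted
patch does this inline in fill_list(), and reserve_in_iomem() is a made-up
name):

static int reserve_in_iomem(const struct resource *res)
{
	struct resource *mirror = kzalloc(sizeof(*mirror), GFP_KERNEL);

	if (!mirror)
		return -ENOMEM;

	mirror->name  = res->name;
	mirror->start = res->start;
	mirror->end   = res->end;
	mirror->flags = res->flags;

	/* Make the range visible to everyone consulting iomem_resource. */
	return insert_resource(&iomem_resource, mirror);
}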

Re: [PATCH v4] xen: detect uninitialized xenbus in xenbus_init

2021-11-23 Thread Juergen Gross

On 23.11.21 22:07, Stefano Stabellini wrote:

From: Stefano Stabellini 

If the xenstore page hasn't been allocated properly, reading the value
of the related hvm_param (HVM_PARAM_STORE_PFN) won't actually return
error. Instead, it will succeed and return zero. Instead of attempting
to xen_remap a bad guest physical address, detect this condition and
return early.

Note that although a guest physical address of zero for
HVM_PARAM_STORE_PFN is theoretically possible, it is not a good choice
and zero has never been validly used in that capacity.

Also recognize all bits set as an invalid value.

For 32-bit Linux, any pfn above ULONG_MAX would get truncated. Pfns
above ULONG_MAX should never be passed by the Xen tools to HVM guests
anyway, so check for this condition and return early.

Cc: sta...@vger.kernel.org
Signed-off-by: Stefano Stabellini 


Reviewed-by: Juergen Gross 


Juergen




About support for memory event on AMD CPUs

2021-11-23 Thread YAN Untitled
Dear developers,

Short version: is memory event supported on AMD CPUs or is it going to be
supported?

Long version:

Environment: LibVMI 0.15.0 + Xen 4.14.3 on an **AMD CPU** (5950x), 64-bit
Debian 11 Dom0.

What I am trying to accomplish: register a **memory event** handler, so that
I can capture all memory accesses within a specific range issued by any thread,
or all memory accesses within any range issued by a specific thread.

What I got instead: an error from LibVMI saying
"xc_hvm_set_mem_access failed with code: -1".

Some investigation: by inspecting the source code of LibVMI, I found that the
direct cause is one of the libxc functions, 1) xc_set_mem_access or
2) xc_altp2m_set_mem_access, returning error code -1.

After some searching, I found someone else having a similar problem [1]. I also
noted LibVMI says:

> Currently only the Xen Hypervisor provides these features,
> and some of these are specifically only available on Intel CPUs

However, I can't find explicit confirmation of whether memory events are
available on AMD CPUs at https://wiki.xenproject.org.

Any suggestion?

(Sorry I was sending this email to the wrong place)

[1] https://github.com/libvmi/libvmi/pull/709#discussion_r353729777


Untitled YAN

Re: [PATCH] cpu/hotplug: Allow the CPU in CPU_UP_PREPARE state to be brought up again.

2021-11-23 Thread Valentin Schneider
On 22/11/21 16:47, Sebastian Andrzej Siewior wrote:
> From: "Longpeng(Mike)" 
>
> A CPU will not show up in a virtualized environment which includes an
> Enclave. The VM splits its resources into a primary VM and an Enclave
> VM. While the Enclave is active, the hypervisor will ignore all requests
> to bring up a CPU and this CPU will remain in CPU_UP_PREPARE state.
> The kernel will wait up to ten seconds for the CPU to show up
> (do_boot_cpu()) and then roll back the hotplug state to
> CPUHP_OFFLINE, leaving the CPU state in CPU_UP_PREPARE. The CPU state is
> set back to CPUHP_TEARDOWN_CPU during the CPU_POST_DEAD stage.
>

For my own education, this is talking about *host* CPU hotplug, right?

> After the Enclave VM terminates, the primary VM can bring up the CPU
> again.
>
> Allow to bring up the CPU if it is in the CPU_UP_PREPARE state.
>
> [bigeasy: Rewrite commit description.]
>
> Signed-off-by: Longpeng(Mike) 
> Signed-off-by: Sebastian Andrzej Siewior 
> Link: https://lore.kernel.org/r/20210901051143.2752-1-longpe...@huawei.com
> ---
>
> For XEN: this changes the behaviour as it allows invoking
> cpu_initialize_context() again should it have failed earlier. I *think*
> this is okay and would allow bringing up the CPU again should the memory
> allocation in cpu_initialize_context() fail.

Virt stuff notwithstanding, that looks OK to me.
Reviewed-by: Valentin Schneider 

That said, AFAICT only powerpc makes actual use of the state being set to
CPU_UP_PREPARE, it looks to be needless bookkeeping for everyone else (and
there's archs that seem happy using only CPU_DEAD / CPU_POST_DEAD).

>
>  kernel/smpboot.c | 7 +++
>  1 file changed, 7 insertions(+)
>
> diff --git a/kernel/smpboot.c b/kernel/smpboot.c
> index f6bc0bc8a2aab..34958d7fe2c1c 100644
> --- a/kernel/smpboot.c
> +++ b/kernel/smpboot.c
> @@ -392,6 +392,13 @@ int cpu_check_up_prepare(int cpu)
>*/
>   return -EAGAIN;
>
> + case CPU_UP_PREPARE:
> + /*
> +  * Timeout while waiting for the CPU to show up. Allow to try
> +  * again later.
> +  */
> + return 0;
> +
>   default:
>
>   /* Should not happen.  Famous last words. */
> --
> 2.33.1



[xen-unstable-smoke test] 166332: tolerable all pass - PUSHED

2021-11-23 Thread osstest service owner
flight 166332 xen-unstable-smoke real [real]
http://logs.test-lab.xenproject.org/osstest/logs/166332/

Failures :-/ but no regressions.

Tests which did not succeed, but are not blocking:
 test-amd64-amd64-libvirt 15 migrate-support-checkfail   never pass
 test-arm64-arm64-xl-xsm  15 migrate-support-checkfail   never pass
 test-arm64-arm64-xl-xsm  16 saverestore-support-checkfail   never pass
 test-armhf-armhf-xl  15 migrate-support-checkfail   never pass
 test-armhf-armhf-xl  16 saverestore-support-checkfail   never pass

version targeted for testing:
 xen  a5706b80f42e028c5153fc50734a1e86a99ff9d2
baseline version:
 xen  74a11c43fd7e074b1f77631b446dd2115eacb9e8

Last test of basis   166312  2021-11-23 13:00:27 Z0 days
Testing same since   166326  2021-11-23 18:01:44 Z0 days2 attempts


People who touched revisions under test:
  Ian Jackson 

jobs:
 build-arm64-xsm  pass
 build-amd64  pass
 build-armhf  pass
 build-amd64-libvirt  pass
 test-armhf-armhf-xl  pass
 test-arm64-arm64-xl-xsm  pass
 test-amd64-amd64-xl-qemuu-debianhvm-amd64pass
 test-amd64-amd64-libvirt pass



sg-report-flight on osstest.test-lab.xenproject.org
logs: /home/logs/logs
images: /home/logs/images

Logs, config files, etc. are available at
http://logs.test-lab.xenproject.org/osstest/logs

Explanation of these reports, and of osstest in general, is at
http://xenbits.xen.org/gitweb/?p=osstest.git;a=blob;f=README.email;hb=master
http://xenbits.xen.org/gitweb/?p=osstest.git;a=blob;f=README;hb=master

Test harness code can be found at
http://xenbits.xen.org/gitweb?p=osstest.git;a=summary


Pushing revision :

To xenbits.xen.org:/home/xen/git/xen.git
   74a11c43fd..a5706b80f4  a5706b80f42e028c5153fc50734a1e86a99ff9d2 -> smoke



[xen-4.15-testing test] 166311: regressions - FAIL

2021-11-23 Thread osstest service owner
flight 166311 xen-4.15-testing real [real]
http://logs.test-lab.xenproject.org/osstest/logs/166311/

Regressions :-(

Tests which did not succeed and are blocking,
including tests which could not be run:
 test-amd64-amd64-xl-qemuu-debianhvm-amd64 12 debian-hvm-install fail REGR. vs. 
166198
 test-amd64-i386-xl-qemuu-debianhvm-amd64-shadow 12 debian-hvm-install fail 
REGR. vs. 166198
 test-amd64-i386-xl   12 debian-install   fail REGR. vs. 166198
 test-amd64-amd64-xl-credit2  12 debian-install   fail REGR. vs. 166198
 test-amd64-amd64-xl-qemuu-ovmf-amd64 12 debian-hvm-install fail REGR. vs. 
166198
 test-amd64-amd64-xl-qemuu-debianhvm-amd64-shadow 12 debian-hvm-install fail 
REGR. vs. 166198
 test-amd64-i386-xl-qemuu-debianhvm-i386-xsm 12 debian-hvm-install fail REGR. 
vs. 166198
 test-amd64-i386-xl-qemut-stubdom-debianhvm-amd64-xsm 12 debian-hvm-install 
fail REGR. vs. 166198
 test-amd64-i386-xl-qemuu-dmrestrict-amd64-dmrestrict 14 
guest-start/debianhvm.repeat fail REGR. vs. 166198
 test-amd64-i386-xl-qemut-debianhvm-amd64 12 debian-hvm-install fail REGR. vs. 
166198
 test-amd64-amd64-xl-qemuu-debianhvm-i386-xsm 12 debian-hvm-install fail REGR. 
vs. 166198
 test-amd64-i386-xl-qemuu-ovmf-amd64 12 debian-hvm-install   running
 test-amd64-i386-xl-qemuu-ovmf-amd64  4 syslog-serverrunning

Tests which did not succeed, but are not blocking:
 test-amd64-amd64-xl-rtds 20 guest-localmigrate/x10   fail  like 166198
 test-amd64-amd64-xl-qemuu-win7-amd64 19 guest-stopfail like 166198
 test-amd64-amd64-xl-qemut-win7-amd64 19 guest-stopfail like 166198
 test-armhf-armhf-libvirt 16 saverestore-support-checkfail  like 166198
 test-amd64-amd64-qemuu-nested-amd 20 debian-hvm-install/l1/l2 fail like 166198
 test-amd64-amd64-xl-qemuu-ws16-amd64 19 guest-stopfail like 166198
 test-amd64-amd64-xl-qemut-ws16-amd64 19 guest-stopfail like 166198
 test-amd64-i386-xl-qemuu-ws16-amd64 19 guest-stop fail like 166198
 test-armhf-armhf-libvirt-raw 15 saverestore-support-checkfail  like 166198
 test-amd64-i386-xl-qemut-win7-amd64 19 guest-stop fail like 166198
 test-amd64-i386-xl-qemuu-win7-amd64 19 guest-stop fail like 166198
 test-armhf-armhf-libvirt-qcow2 15 saverestore-support-check   fail like 166198
 test-amd64-i386-xl-qemut-ws16-amd64 19 guest-stop fail like 166198
 test-amd64-amd64-libvirt 15 migrate-support-checkfail   never pass
 test-amd64-i386-libvirt  15 migrate-support-checkfail   never pass
 test-amd64-i386-libvirt-xsm  15 migrate-support-checkfail   never pass
 test-arm64-arm64-xl  15 migrate-support-checkfail   never pass
 test-arm64-arm64-xl  16 saverestore-support-checkfail   never pass
 test-arm64-arm64-xl-seattle  15 migrate-support-checkfail   never pass
 test-arm64-arm64-xl-seattle  16 saverestore-support-checkfail   never pass
 test-amd64-i386-xl-pvshim14 guest-start  fail   never pass
 test-amd64-amd64-libvirt-xsm 15 migrate-support-checkfail   never pass
 test-amd64-i386-libvirt-qemuu-debianhvm-amd64-xsm 13 migrate-support-check 
fail never pass
 test-arm64-arm64-xl-credit1  15 migrate-support-checkfail   never pass
 test-arm64-arm64-xl-credit1  16 saverestore-support-checkfail   never pass
 test-arm64-arm64-libvirt-xsm 15 migrate-support-checkfail   never pass
 test-arm64-arm64-libvirt-xsm 16 saverestore-support-checkfail   never pass
 test-arm64-arm64-xl-credit2  15 migrate-support-checkfail   never pass
 test-arm64-arm64-xl-xsm  15 migrate-support-checkfail   never pass
 test-arm64-arm64-xl-credit2  16 saverestore-support-checkfail   never pass
 test-arm64-arm64-xl-xsm  16 saverestore-support-checkfail   never pass
 test-arm64-arm64-xl-thunderx 15 migrate-support-checkfail   never pass
 test-arm64-arm64-xl-thunderx 16 saverestore-support-checkfail   never pass
 test-amd64-i386-libvirt-raw  14 migrate-support-checkfail   never pass
 test-armhf-armhf-xl-arndale  15 migrate-support-checkfail   never pass
 test-armhf-armhf-xl-arndale  16 saverestore-support-checkfail   never pass
 test-amd64-amd64-libvirt-qemuu-debianhvm-amd64-xsm 13 migrate-support-check 
fail never pass
 test-armhf-armhf-xl-multivcpu 15 migrate-support-checkfail  never pass
 test-armhf-armhf-xl-multivcpu 16 saverestore-support-checkfail  never pass
 test-arm64-arm64-libvirt-raw 14 migrate-support-checkfail   never pass
 test-arm64-arm64-libvirt-raw 15 saverestore-support-checkfail   never pass
 test-amd64-amd64-libvirt-vhd 14 migrate-support-checkfail   never pass
 test-arm64-arm64-xl-vhd  14 migrate-support-checkfail   never pass
 test-arm64-arm64-xl-vhd  15 saverestore-support-checkfail   never pass
 test-armhf-armhf-xl-credit2  15 

[xen-4.16-testing baseline test] 166322: FAIL

2021-11-23 Thread osstest service owner
"Old" tested version had not actually been tested; therefore in this
flight we test it, rather than a new candidate.  The baseline, if
any, is the most recent actually tested revision.

flight 166322 xen-4.16-testing real [real]
http://logs.test-lab.xenproject.org/osstest/logs/166322/

Failures and problems with tests :-(

Tests which did not succeed and are blocking,
including tests which could not be run:
 test-amd64-amd64-xl-qemut-stubdom-debianhvm-amd64-xsm  4 syslog-server running
 test-amd64-i386-libvirt-qemuu-debianhvm-amd64-xsm  4 syslog-server running
 test-amd64-i386-libvirt-qemuu-debianhvm-amd64-xsm 13 STARTING  running
 test-amd64-amd64-xl-qemut-stubdom-debianhvm-amd64-xsm 20 
guest-start/debianhvm.repeat running

Tests which did not succeed, but are not blocking:
 test-amd64-i386-xl-qemut-win7-amd64 12 windows-install  fail baseline untested
 test-amd64-amd64-xl-qemuu-debianhvm-amd64 7 xen-install fail baseline untested
 test-amd64-i386-libvirt-qemuu-debianhvm-amd64-xsm 12 debian-hvm-install fail 
baseline untested
 test-amd64-amd64-xl-shadow   12 debian-install  fail baseline untested
 test-amd64-i386-xl-qemut-debianhvm-i386-xsm 12 debian-hvm-install fail 
baseline untested
 test-amd64-i386-xl-xsm   12 debian-install  fail baseline untested
 test-amd64-i386-freebsd10-amd64 12 freebsd-install  fail baseline untested
 test-amd64-i386-freebsd10-i386 12 freebsd-install   fail baseline untested
 test-amd64-i386-xl-qemut-stubdom-debianhvm-amd64-xsm 12 debian-hvm-install 
fail baseline untested
 test-amd64-i386-qemut-rhel6hvm-intel 12 redhat-install  fail baseline untested
 test-amd64-i386-xl-qemuu-win7-amd64 12 windows-install  fail baseline untested
 test-amd64-i386-xl-qemut-ws16-amd64 12 windows-install  fail baseline untested
 test-amd64-amd64-xl  23 guest-start.2   fail baseline untested
 test-armhf-armhf-xl-arndale   8 xen-bootfail baseline untested
 test-amd64-amd64-xl-qemuu-win7-amd64 19 guest-stop  fail baseline untested
 test-amd64-amd64-xl-qemuu-ws16-amd64 19 guest-stop  fail baseline untested
 test-amd64-amd64-xl-qemut-win7-amd64 19 guest-stop  fail baseline untested
 test-armhf-armhf-libvirt-raw 15 saverestore-support-check fail baseline 
untested
 test-amd64-amd64-qemuu-nested-amd 20 debian-hvm-install/l1/l2 fail baseline 
untested
 test-armhf-armhf-libvirt   16 saverestore-support-check fail baseline untested
 test-amd64-amd64-xl-qemut-ws16-amd64 19 guest-stop  fail baseline untested
 test-amd64-amd64-xl-qemut-debianhvm-i386-xsm 12 debian-hvm-install fail 
baseline untested
 test-armhf-armhf-libvirt-qcow2 15 saverestore-support-check fail baseline 
untested
 test-amd64-i386-xl-qemuu-ws16-amd64 19 guest-stop   fail baseline untested
 test-arm64-arm64-xl  15 migrate-support-checkfail   never pass
 test-arm64-arm64-xl  16 saverestore-support-checkfail   never pass
 test-amd64-amd64-libvirt-xsm 15 migrate-support-checkfail   never pass
 test-amd64-amd64-libvirt 15 migrate-support-checkfail   never pass
 test-amd64-i386-xl-pvshim14 guest-start  fail   never pass
 test-arm64-arm64-xl-seattle  15 migrate-support-checkfail   never pass
 test-arm64-arm64-xl-seattle  16 saverestore-support-checkfail   never pass
 test-amd64-i386-libvirt-xsm  15 migrate-support-checkfail   never pass
 test-arm64-arm64-xl-credit2  15 migrate-support-checkfail   never pass
 test-arm64-arm64-xl-credit2  16 saverestore-support-checkfail   never pass
 test-armhf-armhf-xl-vhd  14 migrate-support-checkfail   never pass
 test-armhf-armhf-xl-vhd  15 saverestore-support-checkfail   never pass
 test-amd64-i386-libvirt  15 migrate-support-checkfail   never pass
 test-amd64-amd64-libvirt-vhd 14 migrate-support-checkfail   never pass
 test-amd64-amd64-libvirt-qemuu-debianhvm-amd64-xsm 13 migrate-support-check 
fail never pass
 test-amd64-i386-libvirt-raw  14 migrate-support-checkfail   never pass
 test-arm64-arm64-libvirt-raw 14 migrate-support-checkfail   never pass
 test-arm64-arm64-libvirt-raw 15 saverestore-support-checkfail   never pass
 test-armhf-armhf-xl-rtds 15 migrate-support-checkfail   never pass
 test-armhf-armhf-xl-rtds 16 saverestore-support-checkfail   never pass
 test-armhf-armhf-xl-credit1  15 migrate-support-checkfail   never pass
 test-armhf-armhf-xl-credit1  16 saverestore-support-checkfail   never pass
 test-arm64-arm64-xl-thunderx 15 migrate-support-checkfail   never pass
 test-arm64-arm64-xl-thunderx 16 saverestore-support-checkfail   never pass
 test-armhf-armhf-xl-multivcpu 15 migrate-support-checkfail  never pass
 test-armhf-armhf-xl-multivcpu 16 saverestore-support-checkfail  never pass
 test-arm64-arm64-xl-credit1  15 migrate-support-checkfail   never pass
 test-arm64-arm64-xl-credit1  

RE: [PATCH 3/3] VT-d: conditionalize IOTLB register offset check

2021-11-23 Thread Tian, Kevin
> From: Jan Beulich 
> Sent: Tuesday, November 23, 2021 9:40 PM
> 
> As of commit 6773b1a7584a ("VT-d: Don't assume register-based
> invalidation is always supported") we don't (try to) use register based
> invalidation anymore when that's not supported by hardware. Hence
> there's also no point in the respective check, avoiding pointless IOMMU
> initialization failure. After all the spec (version 3.3 at the time of
> writing) doesn't say what the respective Extended Capability Register
> field would contain in such a case.
> 
> Signed-off-by: Jan Beulich 

Reviewed-by: Kevin Tian 

> 
> --- a/xen/drivers/passthrough/vtd/iommu.c
> +++ b/xen/drivers/passthrough/vtd/iommu.c
> @@ -1228,7 +1228,8 @@ int __init iommu_alloc(struct acpi_drhd_
> 
>  if ( cap_fault_reg_offset(iommu->cap) +
>   cap_num_fault_regs(iommu->cap) * PRIMARY_FAULT_REG_LEN >
> PAGE_SIZE ||
> - ecap_iotlb_offset(iommu->ecap) >= PAGE_SIZE )
> + (has_register_based_invalidation(iommu) &&
> +  ecap_iotlb_offset(iommu->ecap) >= PAGE_SIZE) )
>  {
>  printk(XENLOG_ERR VTDPREFIX "IOMMU: unsupported\n");
>  print_iommu_regs(drhd);



RE: [PATCH 2/3] VT-d: correct off-by-1 in fault register range check

2021-11-23 Thread Tian, Kevin
> From: Jan Beulich 
> Sent: Tuesday, November 23, 2021 9:40 PM
> 
> All our present implementation requires is that the range fully fits
> in a single page. No need to exclude the case of the last register
> extending right to the end of that page.
> 
> Signed-off-by: Jan Beulich 

Reviewed-by: Kevin Tian 

> 
> --- a/xen/drivers/passthrough/vtd/iommu.c
> +++ b/xen/drivers/passthrough/vtd/iommu.c
> @@ -1229,7 +1229,7 @@ int __init iommu_alloc(struct acpi_drhd_
>  quirk_iommu_caps(iommu);
> 
>  if ( cap_fault_reg_offset(iommu->cap) +
> - cap_num_fault_regs(iommu->cap) * PRIMARY_FAULT_REG_LEN >=
> PAGE_SIZE ||
> + cap_num_fault_regs(iommu->cap) * PRIMARY_FAULT_REG_LEN >
> PAGE_SIZE ||
>   ecap_iotlb_offset(iommu->ecap) >= PAGE_SIZE )
>  {
>  printk(XENLOG_ERR VTDPREFIX "IOMMU: unsupported\n");



RE: [PATCH 1/3] VT-d: prune SAGAW recognition

2021-11-23 Thread Tian, Kevin
> From: Jan Beulich 
> Sent: Tuesday, November 23, 2021 9:40 PM
> 
> Bit 0 of the capability register field has become reserved at or before

Bit 0 of 'SAGAW' in the capability register ...

> spec version 2.2. Treat it as such. Replace the effective open-coding of
> find_first_set_bit(). Adjust local variable types.
> 
> Signed-off-by: Jan Beulich 
> ---
> Strictly speaking IOMMUs supporting only 3-level tables ought to result
> in guests seeing a suitably reduced physical address width in CPUID.
> And then the same would apply to restrictions resulting from MGAW.

Yes. I remember there was an old discussion in the QEMU community about
whether the guest physical address width should be based on IOMMU
constraints when a passthrough device is used, but it didn't go anywhere
(and I cannot find the link...)

Anyway, with the above comment fixed:

Reviewed-by: Kevin Tian 

> 
> --- a/xen/drivers/passthrough/vtd/iommu.c
> +++ b/xen/drivers/passthrough/vtd/iommu.c
> @@ -356,7 +356,7 @@ static uint64_t domain_pgd_maddr(struct
>  pgd_maddr = hd->arch.vtd.pgd_maddr;
>  }
> 
> -/* Skip top levels of page tables for 2- and 3-level DRHDs. */
> +/* Skip top level(s) of page tables for less-than-maximum level DRHDs. */
>  for ( agaw = level_to_agaw(4);
>agaw != level_to_agaw(nr_pt_levels);
>agaw-- )
> @@ -1183,8 +1183,7 @@ static int __init iommu_set_interrupt(st
>  int __init iommu_alloc(struct acpi_drhd_unit *drhd)
>  {
>  struct vtd_iommu *iommu;
> -unsigned long sagaw, nr_dom;
> -int agaw;
> +unsigned int sagaw, agaw = 0, nr_dom;
> 
>  iommu = xzalloc(struct vtd_iommu);
>  if ( iommu == NULL )
> @@ -1237,14 +1236,13 @@ int __init iommu_alloc(struct acpi_drhd_
>  return -ENODEV;
>  }
> 
> -/* Calculate number of pagetable levels: between 2 and 4. */
> +/* Calculate number of pagetable levels: 3 or 4. */
>  sagaw = cap_sagaw(iommu->cap);
> -for ( agaw = level_to_agaw(4); agaw >= 0; agaw-- )
> -if ( test_bit(agaw, &sagaw) )
> -break;
> -if ( agaw < 0 )
> +if ( sagaw & 6 )
> +agaw = find_first_set_bit(sagaw & 6);
> +if ( !agaw )
>  {
> -printk(XENLOG_ERR VTDPREFIX "IOMMU: unsupported sagaw %lx\n",
> sagaw);
> +printk(XENLOG_ERR VTDPREFIX "IOMMU: unsupported sagaw %x\n",
> sagaw);
>  print_iommu_regs(drhd);
>  return -ENODEV;
>  }



[xen-unstable-smoke test] 166326: regressions - FAIL

2021-11-23 Thread osstest service owner
flight 166326 xen-unstable-smoke real [real]
flight 166330 xen-unstable-smoke real-retest [real]
http://logs.test-lab.xenproject.org/osstest/logs/166326/
http://logs.test-lab.xenproject.org/osstest/logs/166330/

Regressions :-(

Tests which did not succeed and are blocking,
including tests which could not be run:
 test-arm64-arm64-xl-xsm   8 xen-boot fail REGR. vs. 166312

Tests which did not succeed, but are not blocking:
 test-amd64-amd64-libvirt 15 migrate-support-checkfail   never pass
 test-armhf-armhf-xl  15 migrate-support-checkfail   never pass
 test-armhf-armhf-xl  16 saverestore-support-checkfail   never pass

version targeted for testing:
 xen  a5706b80f42e028c5153fc50734a1e86a99ff9d2
baseline version:
 xen  74a11c43fd7e074b1f77631b446dd2115eacb9e8

Last test of basis   166312  2021-11-23 13:00:27 Z0 days
Testing same since   166326  2021-11-23 18:01:44 Z0 days1 attempts


People who touched revisions under test:
  Ian Jackson 

jobs:
 build-arm64-xsm  pass
 build-amd64  pass
 build-armhf  pass
 build-amd64-libvirt  pass
 test-armhf-armhf-xl  pass
 test-arm64-arm64-xl-xsm  fail
 test-amd64-amd64-xl-qemuu-debianhvm-amd64pass
 test-amd64-amd64-libvirt pass



sg-report-flight on osstest.test-lab.xenproject.org
logs: /home/logs/logs
images: /home/logs/images

Logs, config files, etc. are available at
http://logs.test-lab.xenproject.org/osstest/logs

Explanation of these reports, and of osstest in general, is at
http://xenbits.xen.org/gitweb/?p=osstest.git;a=blob;f=README.email;hb=master
http://xenbits.xen.org/gitweb/?p=osstest.git;a=blob;f=README;hb=master

Test harness code can be found at
http://xenbits.xen.org/gitweb?p=osstest.git;a=summary


Not pushing.


commit a5706b80f42e028c5153fc50734a1e86a99ff9d2
Author: Ian Jackson 
Date:   Tue Nov 23 16:55:32 2021 +

Set version to 4.17: rerun autogen.sh

Signed-off-by: Ian Jackson 

commit 8579d208ab78486717d396cb297f06526fe6b492
Author: Ian Jackson 
Date:   Tue Nov 23 16:54:08 2021 +

Set version to 4.17; 4.16 has branched

Signed-off-by: Ian Jackson 

commit 4fe6e73a8cc8f3fa93a7c6a52a9a03b24e51ea18
Author: Ian Jackson 
Date:   Tue Nov 23 16:51:47 2021 +

Revert "Config.mk: pin QEMU_UPSTREAM_REVISION (prep for Xen 4.16 RC1)"

This branch is unstable again now.

This reverts commit c9ce6afbf2d7772f47fc572bb7fc9555724927ed.
(qemu changes not included)



Re: [RFC PATCH 0/2] Boot time cpupools

2021-11-23 Thread Stefano Stabellini
On Tue, 23 Nov 2021, Bertrand Marquis wrote:
> Hi Julien,
> 
> > On 23 Nov 2021, at 13:54, Julien Grall  wrote:
> > 
> > Hi Stefano,
> > 
> > On 19/11/2021 18:55, Stefano Stabellini wrote:
> >> On Fri, 19 Nov 2021, Julien Grall wrote:
> >>> I like better Juergen's version. But before agreeing on the command line 
> >>> , I
> >>> would like to understand how Linux is dealing with big.LITTLE today (see 
> >>> my
> >>> question above).
> >> I also like Juergen's version better :-)
> >> The device tree binding that covers big.LITTLE is the same that covers
> >> cpus: Documentation/devicetree/bindings/arm/cpus.yaml
> > 
> > Are you sure? I found 
> > Documentation/devicetree/bindings/arm/cpu-capacity.txt.
> > 
> >> So basically, you can tell it is a big.LITTLE system because the
> >> compatible strings differ between certain cpus, e.g.:
> >>   cpu@0 {
> >> device_type = "cpu";
> >> compatible = "arm,cortex-a15";
> >> reg = <0x0>;
> >>   };
> >>   cpu@1 {
> >> device_type = "cpu";
> >> compatible = "arm,cortex-a15";
> >> reg = <0x1>;
> >>   };
> >>   cpu@100 {
> >> device_type = "cpu";
> >> compatible = "arm,cortex-a7";
> >> reg = <0x100>;
> >>   };
> >>   cpu@101 {
> >> device_type = "cpu";
> >> compatible = "arm,cortex-a7";
> >> reg = <0x101>;
> >>   };
> > 
> > I have not found any code in Linux using the compatible. Instead, they all 
> > seem to use the cpu-map (see drivers/base/arch_topology.c).
> > 
> > Anyway, to me it would be better to parse the Device-Tree over using the 
> > MIDR. The advantage is we can cover a wide range of cases (you may have 
> > processor with the same MIDR but different frequency). For now, we could 
> > create a cpupool per cluster.
> 
> This is a really good idea as this could also work for NUMA systems.
> 
> So reg & ~0xff would give the cluster number?
> 
> We will probably end up splitting cores into different cpupools even if they 
> are all the same as there are several CPUs having several clusters but the 
> same cores (I recall some NXP boards being like that).
> 
> Some device tree also have a cpu-map definition:
> cpu-map {
> cluster0 {
> core0 {
> cpu = <_0>;
> };
> core1 {
> cpu = <_1>;
> };
> };
> 
> cluster1 {
> core0 {
> cpu = <_0>;
> };
> core1 {
> cpu = <_1>;
> };
> core2 {
> cpu = <_2>;
> };
> core3 {
> cpu = <_3>;
> };
> };
> };
> 
> @stefano: is the cpu-map something standard? Lots of device trees do have it
> in Linux now but I do not recall seeing it on older device trees.
> Maybe using cpu-map could be a solution: we could say that device trees
> without a cpu-map are not supported.


cpu-map is newer than big.LITTLE support in Linux. See for instance
commit 4ab328f06. Before cpu-map was introduced, Linux used mostly the
MPIDR or sometimes the *machine* compatible string. I did find one
example of a board where the cpu compatible string is the same for both
big and LITTLE CPUs: arch/arm64/boot/dts/rockchip/rk3368.dtsi. That
could be the reason why the cpu compatible string is not used for
detecting big.LITTLE. Sorry about that, my earlier suggestion was wrong.

Yes I think using cpu-map would be fine. It seems to be widely used by
different vendors. (Even if something as generic as cpu-map should
really be under the devicetree.org spec, not under Linux, but anyway.)

Only one note: you mentioned NUMA. As you can see from
Documentation/devicetree/bindings/numa.txt, NUMA doesn't use cpu-map.
Instead, it uses numa-node-id and distance-map.



Re: [PATCH V2 3/4] xen/unpopulated-alloc: Add mechanism to use Xen resource

2021-11-23 Thread Stefano Stabellini
On Tue, 23 Nov 2021, Oleksandr wrote:
> > Actually, after reading your replies and explanation I changed my opinion: I
> > think we do need the fallback because Linux cannot really assume that
> > it is running on "new Xen" so it definitely needs to keep working if
> > CONFIG_XEN_UNPOPULATED_ALLOC is enabled and the extended regions are not
> > advertised.
> > 
> > I think we'll have to roll back some of the changes introduced by
> > 121f2faca2c0a. That's because even if CONFIG_XEN_UNPOPULATED_ALLOC is
> > enabled we cannot know if we can use unpopulated-alloc or whether we
> > have to use alloc_xenballooned_pages until we parse the /hypervisor node
> > in device tree at runtime.
> 
> Exactly!
> 
> 
> > 
> > In short, we cannot switch between unpopulated-alloc and
> > alloc_xenballooned_pages at build time, we have to do it at runtime
> > (boot time).
> 
> +1
> 
> 
> I created a patch to partially revert 121f2faca2c0a "xen/balloon: rename
> alloc/free_xenballooned_pages".
> 
> If there are no objections I will add it to V3 (which is almost ready, except
> the fallback bits). Could you please tell me what you think?
 
It makes sense to me. You can add my Reviewed-by.

 
> From dc79bcd425358596d95e715a8bd8b81deaaeb703 Mon Sep 17 00:00:00 2001
> From: Oleksandr Tyshchenko 
> Date: Tue, 23 Nov 2021 18:14:41 +0200
> Subject: [PATCH] xen/balloon: Bring alloc(free)_xenballooned_pages helpers
>  back
> 
> This patch rolls back some of the changes introduced by commit
> 121f2faca2c0a "xen/balloon: rename alloc/free_xenballooned_pages"
> in order to make it possible to still allocate xenballooned pages
> if CONFIG_XEN_UNPOPULATED_ALLOC is enabled.
> 
> On Arm the unpopulated pages will be allocated on top of extended
> regions provided by Xen via device-tree (the subsequent patches
> will add required bits to support unpopulated-alloc feature on Arm).
> The problem is that the extended regions feature has been introduced
> into Xen quite recently (during the 4.16 release cycle). So this
> effectively means that Linux must only use unpopulated-alloc on Arm
> if it is running on "new Xen" which advertises these regions.
> But, it will only be known after parsing the "hypervisor" node
> at boot time, so before doing that we cannot assume anything.
> 
> In order to keep working if CONFIG_XEN_UNPOPULATED_ALLOC is enabled
> and the extended regions are not advertised (Linux is running on
> "old Xen", etc) we need the fallback to alloc_xenballooned_pages().
> 
> This way we wouldn't reduce the amount of usable memory (wasting
> RAM pages) for any of the external mappings anymore (and would eliminate
> XSA-300) with "new Xen", but would still be functional, ballooning
> out RAM pages with "old Xen".
> 
> Also rename alloc(free)_xenballooned_pages to xen_alloc(free)_ballooned_pages.
> 
> Signed-off-by: Oleksandr Tyshchenko 
> ---
>  drivers/xen/balloon.c | 20 +---
>  include/xen/balloon.h |  3 +++
>  include/xen/xen.h |  6 ++
>  3 files changed, 18 insertions(+), 11 deletions(-)
> 
> diff --git a/drivers/xen/balloon.c b/drivers/xen/balloon.c
> index ba2ea11..a2c4fc49 100644
> --- a/drivers/xen/balloon.c
> +++ b/drivers/xen/balloon.c
> @@ -581,7 +581,6 @@ void balloon_set_new_target(unsigned long target)
>  }
>  EXPORT_SYMBOL_GPL(balloon_set_new_target);
> 
> -#ifndef CONFIG_XEN_UNPOPULATED_ALLOC
>  static int add_ballooned_pages(unsigned int nr_pages)
>  {
>  enum bp_state st;
> @@ -610,12 +609,12 @@ static int add_ballooned_pages(unsigned int nr_pages)
>  }
> 
>  /**
> - * xen_alloc_unpopulated_pages - get pages that have been ballooned out
> + * xen_alloc_ballooned_pages - get pages that have been ballooned out
>   * @nr_pages: Number of pages to get
>   * @pages: pages returned
>   * @return 0 on success, error otherwise
>   */
> -int xen_alloc_unpopulated_pages(unsigned int nr_pages, struct page **pages)
> +int xen_alloc_ballooned_pages(unsigned int nr_pages, struct page **pages)
>  {
>  unsigned int pgno = 0;
>  struct page *page;
> @@ -652,23 +651,23 @@ int xen_alloc_unpopulated_pages(unsigned int nr_pages,
> struct page **pages)
>  return 0;
>   out_undo:
>  mutex_unlock(&balloon_mutex);
> -    xen_free_unpopulated_pages(pgno, pages);
> +    xen_free_ballooned_pages(pgno, pages);
>  /*
> -     * NB: free_xenballooned_pages will only subtract pgno pages, but since
> +     * NB: xen_free_ballooned_pages will only subtract pgno pages, but since
>   * target_unpopulated is incremented with nr_pages at the start we need
>   * to remove the remaining ones also, or accounting will be screwed.
>   */
>  balloon_stats.target_unpopulated -= nr_pages - pgno;
>  return ret;
>  }
> -EXPORT_SYMBOL(xen_alloc_unpopulated_pages);
> +EXPORT_SYMBOL(xen_alloc_ballooned_pages);
> 
>  /**
> - * xen_free_unpopulated_pages - return pages retrieved with
> get_ballooned_pages
> + * xen_free_ballooned_pages - return pages retrieved with get_ballooned_pages
>   * @nr_pages: Number of pages
>   * @pages: 
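
To make the runtime fallback concrete, one possible shape of it is sketched
below (illustrative only, not the code actually posted; it reuses fill_list(),
target_resource, list_lock and list_count from patch 3/4 plus the helper
restored above):

int xen_alloc_unpopulated_pages(unsigned int nr_pages, struct page **pages)
{
	int ret = 0;

	/* "Old Xen": no suitable resource found at boot, balloon out RAM. */
	if (!target_resource)
		return xen_alloc_ballooned_pages(nr_pages, pages);

	mutex_lock(&list_lock);
	if (list_count < nr_pages)
		ret = fill_list(nr_pages - list_count);
	if (!ret) {
		/* ... hand out pages from page_list as the existing code does ... */
	}
	mutex_unlock(&list_lock);

	return ret;
}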

Re: [PATCH] cpu/hotplug: Allow the CPU in CPU_UP_PREPARE state to be brought up again.

2021-11-23 Thread Dongli Zhang
Tested-by: Dongli Zhang 


The bug fixed by commit 53fafdbb8b21 ("KVM: x86: switch KVMCLOCK base to
monotonic raw clock") may leave the cpu_hotplug_state at CPU_UP_PREPARE. As a
result, onlining this CPU again (even after removal) always fails.

I have tested that this patch works well to work around the issue, by
introducing either an mdelay(11000) or a while(1); to start_secondary(). That
is, onlining the same CPU again succeeds even after an initial do_boot_cpu()
failure.

1. Add mdelay(11000) or while(1); to start_secondary().

2. Onlining the CPU fails at do_boot_cpu().

3. Onlining the CPU again fails without this patch.

# echo 1 > /sys/devices/system/cpu/cpu4/online
-su: echo: write error: Input/output error

4. Onlining the CPU again succeeds with this patch.

Thank you very much!

Dongli Zhang

On 11/22/21 7:47 AM, Sebastian Andrzej Siewior wrote:
> From: "Longpeng(Mike)" 
> 
> A CPU will not show up in a virtualized environment which includes an
> Enclave. The VM splits its resources into a primary VM and an Enclave
> VM. While the Enclave is active, the hypervisor will ignore all requests
> to bring up a CPU and this CPU will remain in CPU_UP_PREPARE state.
> The kernel will wait up to ten seconds for the CPU to show up
> (do_boot_cpu()) and then roll back the hotplug state to
> CPUHP_OFFLINE, leaving the CPU state in CPU_UP_PREPARE. The CPU state is
> set back to CPUHP_TEARDOWN_CPU during the CPU_POST_DEAD stage.
> 
> After the Enclave VM terminates, the primary VM can bring up the CPU
> again.
> 
> Allow to bring up the CPU if it is in the CPU_UP_PREPARE state.
> 
> [bigeasy: Rewrite commit description.]
> 
> Signed-off-by: Longpeng(Mike) 
> Signed-off-by: Sebastian Andrzej Siewior 
> Link: https://lore.kernel.org/r/20210901051143.2752-1-longpeng2@huawei.com
> 
> ---
> 
> For XEN: this changes the behaviour as it allows invoking
> cpu_initialize_context() again should it have failed earlier. I *think*
> this is okay and would allow bringing up the CPU again should the memory
> allocation in cpu_initialize_context() fail.
> 
>  kernel/smpboot.c | 7 +++
>  1 file changed, 7 insertions(+)
> 
> diff --git a/kernel/smpboot.c b/kernel/smpboot.c
> index f6bc0bc8a2aab..34958d7fe2c1c 100644
> --- a/kernel/smpboot.c
> +++ b/kernel/smpboot.c
> @@ -392,6 +392,13 @@ int cpu_check_up_prepare(int cpu)
>*/
>   return -EAGAIN;
>  
> + case CPU_UP_PREPARE:
> + /*
> +  * Timeout while waiting for the CPU to show up. Allow to try
> +  * again later.
> +  */
> + return 0;
> +
>   default:
>  
>   /* Should not happen.  Famous last words. */
> 



[PATCH v4] xen: detect uninitialized xenbus in xenbus_init

2021-11-23 Thread Stefano Stabellini
From: Stefano Stabellini 

If the xenstore page hasn't been allocated properly, reading the value
of the related hvm_param (HVM_PARAM_STORE_PFN) won't actually return
error. Instead, it will succeed and return zero. Instead of attempting
to xen_remap a bad guest physical address, detect this condition and
return early.

Note that although a guest physical address of zero for
HVM_PARAM_STORE_PFN is theoretically possible, it is not a good choice
and zero has never been validly used in that capacity.

Also recognize all bits set as an invalid value.

For 32-bit Linux, any pfn above ULONG_MAX would get truncated. Pfns
above ULONG_MAX should never be passed by the Xen tools to HVM guests
anyway, so check for this condition and return early.

Cc: sta...@vger.kernel.org
Signed-off-by: Stefano Stabellini 
---
Changes in v4:
- say "all bits set" instead of INVALID_PFN
- improve check

Changes in v3:
- improve in-code comment
- improve check

Changes in v2:
- add check for ULLONG_MAX (unitialized)
- add check for ULONG_MAX #if BITS_PER_LONG == 32 (actual error)
- add pr_err error message

 drivers/xen/xenbus/xenbus_probe.c | 23 +++
 1 file changed, 23 insertions(+)

diff --git a/drivers/xen/xenbus/xenbus_probe.c 
b/drivers/xen/xenbus/xenbus_probe.c
index 94405bb3829e..251b26439733 100644
--- a/drivers/xen/xenbus/xenbus_probe.c
+++ b/drivers/xen/xenbus/xenbus_probe.c
@@ -951,6 +951,29 @@ static int __init xenbus_init(void)
err = hvm_get_parameter(HVM_PARAM_STORE_PFN, &v);
if (err)
goto out_error;
+   /*
+* Uninitialized hvm_params are zero and return no error.
+* Although it is theoretically possible to have
+* HVM_PARAM_STORE_PFN set to zero on purpose, in reality it is
+* not zero when valid. If zero, it means that Xenstore hasn't
+* been properly initialized. Instead of attempting to map a
+* wrong guest physical address return error.
+*
+* Also recognize all bits set as an invalid value.
+*/
+   if (!v || !~v) {
+   err = -ENOENT;
+   goto out_error;
+   }
+   /* Avoid truncation on 32-bit. */
+#if BITS_PER_LONG == 32
+   if (v > ULONG_MAX) {
+   pr_err("%s: cannot handle HVM_PARAM_STORE_PFN=%llx > ULONG_MAX\n",
+  __func__, v);
+   err = -EINVAL;
+   goto out_error;
+   }
+#endif
xen_store_gfn = (unsigned long)v;
xen_store_interface =
xen_remap(xen_store_gfn << XEN_PAGE_SHIFT,
-- 
2.25.1




Re: [PATCH 0/5] xen: cleanup detection of non-essential pv devices

2021-11-23 Thread Boris Ostrovsky



On 11/22/21 3:20 AM, Juergen Gross wrote:

On 22.10.21 08:47, Juergen Gross wrote:

Today the non-essential pv devices are hard coded in the xenbus driver
and this list is lacking multiple entries.

This series reworks the detection logic of non-essential devices by
adding a flag for that purpose to struct xenbus_driver.

Juergen Gross (5):
   xen: add "not_essential" flag to struct xenbus_driver
   xen: flag xen_drm_front to be not essential for system boot
   xen: flag hvc_xen to be not essential for system boot
   xen: flag pvcalls-front to be not essential for system boot
   xen: flag xen_snd_front to be not essential for system boot

  drivers/gpu/drm/xen/xen_drm_front.c    |  1 +
  drivers/input/misc/xen-kbdfront.c  |  1 +
  drivers/tty/hvc/hvc_xen.c  |  1 +
  drivers/video/fbdev/xen-fbfront.c  |  1 +
  drivers/xen/pvcalls-front.c    |  1 +
  drivers/xen/xenbus/xenbus_probe_frontend.c | 14 +++---
  include/xen/xenbus.h   |  1 +
  sound/xen/xen_snd_front.c  |  1 +
  8 files changed, 10 insertions(+), 11 deletions(-)



Any further comments?



Reviewed-by: Boris Ostrovsky 


(I'll fix the semicolon typo in the last patch, no need to resend)




RE: [PATCH V2 5/6] net: netvsc: Add Isolation VM support for netvsc driver

2021-11-23 Thread Michael Kelley (LINUX)
From: Tianyu Lan  Sent: Tuesday, November 23, 2021 6:31 AM
> 
> In Isolation VM, all memory shared with the host needs to be marked visible
> to the host via hvcall. vmbus_establish_gpadl() has already done it for
> the netvsc rx/tx ring buffer. The page buffer used by vmbus_sendpacket_
> pagebuffer() still needs to be handled. Use the DMA API to map/unmap
> this memory when sending/receiving packets; the Hyper-V swiotlb
> bounce buffer DMA address will be returned. The swiotlb bounce buffer
> has been marked visible to the host during boot up.
> 
> Allocate the rx/tx ring buffer via dma_alloc_noncontiguous() in Isolation
> VM. After calling vmbus_establish_gpadl(), which marks these pages visible
> to the host, map these pages into unencrypted address space via
> dma_vmap_noncontiguous().
> 
> Signed-off-by: Tianyu Lan 
> ---
>  drivers/net/hyperv/hyperv_net.h   |   5 +
>  drivers/net/hyperv/netvsc.c   | 192 +++---
>  drivers/net/hyperv/rndis_filter.c |   2 +
>  include/linux/hyperv.h|   6 +
>  4 files changed, 190 insertions(+), 15 deletions(-)
> 
> diff --git a/drivers/net/hyperv/hyperv_net.h b/drivers/net/hyperv/hyperv_net.h
> index 315278a7cf88..31c77a00d01e 100644
> --- a/drivers/net/hyperv/hyperv_net.h
> +++ b/drivers/net/hyperv/hyperv_net.h
> @@ -164,6 +164,7 @@ struct hv_netvsc_packet {
>   u32 total_bytes;
>   u32 send_buf_index;
>   u32 total_data_buflen;
> + struct hv_dma_range *dma_range;
>  };
> 
>  #define NETVSC_HASH_KEYLEN 40
> @@ -1074,6 +1075,7 @@ struct netvsc_device {
> 
>   /* Receive buffer allocated by us but manages by NetVSP */
>   void *recv_buf;
> + struct sg_table *recv_sgt;
>   u32 recv_buf_size; /* allocated bytes */
>   struct vmbus_gpadl recv_buf_gpadl_handle;
>   u32 recv_section_cnt;
> @@ -1082,6 +1084,7 @@ struct netvsc_device {
> 
>   /* Send buffer allocated by us */
>   void *send_buf;
> + struct sg_table *send_sgt;
>   u32 send_buf_size;
>   struct vmbus_gpadl send_buf_gpadl_handle;
>   u32 send_section_cnt;
> @@ -1731,4 +1734,6 @@ struct rndis_message {
>  #define RETRY_US_HI  1
>  #define RETRY_MAX2000/* >10 sec */
> 
> +void netvsc_dma_unmap(struct hv_device *hv_dev,
> +   struct hv_netvsc_packet *packet);
>  #endif /* _HYPERV_NET_H */
> diff --git a/drivers/net/hyperv/netvsc.c b/drivers/net/hyperv/netvsc.c
> index 396bc1c204e6..9cdc71930830 100644
> --- a/drivers/net/hyperv/netvsc.c
> +++ b/drivers/net/hyperv/netvsc.c
> @@ -20,6 +20,7 @@
>  #include 
>  #include 
>  #include 
> +#include 
> 
>  #include 
>  #include 
> @@ -146,15 +147,39 @@ static struct netvsc_device *alloc_net_device(void)
>   return net_device;
>  }
> 
> +static struct hv_device *netvsc_channel_to_device(struct vmbus_channel 
> *channel)
> +{
> + struct vmbus_channel *primary = channel->primary_channel;
> +
> + return primary ? primary->device_obj : channel->device_obj;
> +}
> +
>  static void free_netvsc_device(struct rcu_head *head)
>  {
>   struct netvsc_device *nvdev
>   = container_of(head, struct netvsc_device, rcu);
> + struct hv_device *dev =
> + netvsc_channel_to_device(nvdev->chan_table[0].channel);
>   int i;
> 
>   kfree(nvdev->extension);
> - vfree(nvdev->recv_buf);
> - vfree(nvdev->send_buf);
> +
> + if (nvdev->recv_sgt) {
> + dma_vunmap_noncontiguous(&dev->device, nvdev->recv_buf);
> + dma_free_noncontiguous(&dev->device, nvdev->recv_buf_size,
> +nvdev->recv_sgt, DMA_FROM_DEVICE);
> + } else {
> + vfree(nvdev->recv_buf);
> + }
> +
> + if (nvdev->send_sgt) {
> + dma_vunmap_noncontiguous(&dev->device, nvdev->send_buf);
> + dma_free_noncontiguous(&dev->device, nvdev->send_buf_size,
> +nvdev->send_sgt, DMA_TO_DEVICE);
> + } else {
> + vfree(nvdev->send_buf);
> + }
> +
>   kfree(nvdev->send_section_map);
> 
>   for (i = 0; i < VRSS_CHANNEL_MAX; i++) {
> @@ -348,7 +373,21 @@ static int netvsc_init_buf(struct hv_device *device,
>   buf_size = min_t(unsigned int, buf_size,
>NETVSC_RECEIVE_BUFFER_SIZE_LEGACY);
> 
> - net_device->recv_buf = vzalloc(buf_size);
> + if (hv_isolation_type_snp()) {
> + net_device->recv_sgt =
> + dma_alloc_noncontiguous(&device->device, buf_size,
> + DMA_FROM_DEVICE, GFP_KERNEL, 0);
> + if (!net_device->recv_sgt) {
> + pr_err("Fail to allocate recv buffer buf_size %d.\n.", 
> buf_size);
> + ret = -ENOMEM;
> + goto cleanup;
> + }
> +
> + net_device->recv_buf = (void *)net_device->recv_sgt->sgl->dma_address;

Use sg_dma_address() macro.

> + } else {
> + net_device->recv_buf = vzalloc(buf_size);
> + }
> +
>   if 
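
For reference, the accessor-based form Michael is asking for at the recv_buf
assignment would look roughly like this (sketch):

	net_device->recv_buf = (void *)sg_dma_address(net_device->recv_sgt->sgl);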

RE: [PATCH V2 4/6] hyperv/IOMMU: Enable swiotlb bounce buffer for Isolation VM

2021-11-23 Thread Michael Kelley (LINUX)
From: Tianyu Lan  Sent: Tuesday, November 23, 2021 6:31 AM
> 
> Hyper-V Isolation VM requires bounce buffer support to copy
> data from/to encrypted memory and so enable swiotlb force
> mode to use the swiotlb bounce buffer for DMA transactions.
> 
> In Isolation VM with AMD SEV, the bounce buffer needs to be
> accessed via extra address space which is above shared_gpa_boundary
> (e.g. 39-bit address line) reported by Hyper-V CPUID ISOLATION_CONFIG.
> The access physical address will be original physical address +
> shared_gpa_boundary. The shared_gpa_boundary in the AMD SEV SNP
> spec is called virtual top of memory (vTOM). Memory addresses below
> vTOM are automatically treated as private while memory above
> vTOM is treated as shared.
> 
> Hyper-V initializes the swiotlb bounce buffer and the default swiotlb
> needs to be disabled. pci_swiotlb_detect_override() and
> pci_swiotlb_detect_4gb() enable the default one. To override
> the setting, hyperv_swiotlb_detect() needs to run before
> these detect functions which depends on the pci_xen_swiotlb_
> init(). Make pci_xen_swiotlb_init() depends on the hyperv_swiotlb
> _detect() to keep the order.
> 
> Swiotlb bounce buffer code calls set_memory_decrypted()
> to mark bounce buffer visible to host and map it in extra
> address space via memremap. Populate the shared_gpa_boundary
> (vTOM) via swiotlb_unencrypted_base variable.
> 
> The map function memremap() can't work in the early place
> hyperv_iommu_swiotlb_init() and so call swiotlb_update_mem_attributes()
> in the hyperv_iommu_swiotlb_later_init().
> 
> Add Hyper-V dma ops and provide alloc/free and vmap/vunmap noncontiguous
> callbacks to handle requests for allocating and mapping noncontiguous dma
> memory in the vmbus device driver. The netvsc driver will use this. Set dma_ops_
> bypass flag for hv device to use dma direct functions during mapping/unmapping
> dma page.
> 
> Signed-off-by: Tianyu Lan 
> ---
> Change since v1:
>   * Remove hv isolation check in the sev_setup_arch()
> 
>  arch/x86/mm/mem_encrypt.c  |   1 +
>  arch/x86/xen/pci-swiotlb-xen.c |   3 +-
>  drivers/hv/Kconfig |   1 +
>  drivers/hv/vmbus_drv.c |   6 ++
>  drivers/iommu/hyperv-iommu.c   | 164 +
>  include/linux/hyperv.h |  10 ++
>  6 files changed, 184 insertions(+), 1 deletion(-)
> 
> diff --git a/arch/x86/mm/mem_encrypt.c b/arch/x86/mm/mem_encrypt.c
> index 35487305d8af..e48c73b3dd41 100644
> --- a/arch/x86/mm/mem_encrypt.c
> +++ b/arch/x86/mm/mem_encrypt.c
> @@ -31,6 +31,7 @@
>  #include 
>  #include 
>  #include 
> +#include 

There is no longer any need to add this #include since code changes to this
file in a previous version of the patch are now gone.

> 
>  #include "mm_internal.h"
> 
> diff --git a/arch/x86/xen/pci-swiotlb-xen.c b/arch/x86/xen/pci-swiotlb-xen.c
> index 46df59aeaa06..30fd0600b008 100644
> --- a/arch/x86/xen/pci-swiotlb-xen.c
> +++ b/arch/x86/xen/pci-swiotlb-xen.c
> @@ -4,6 +4,7 @@
> 
>  #include 
>  #include 
> +#include 
>  #include 
> 
>  #include 
> @@ -91,6 +92,6 @@ int pci_xen_swiotlb_init_late(void)
>  EXPORT_SYMBOL_GPL(pci_xen_swiotlb_init_late);
> 
>  IOMMU_INIT_FINISH(pci_xen_swiotlb_detect,
> -   NULL,
> +   hyperv_swiotlb_detect,
> pci_xen_swiotlb_init,
> NULL);
> diff --git a/drivers/hv/Kconfig b/drivers/hv/Kconfig
> index dd12af20e467..d43b4cd88f57 100644
> --- a/drivers/hv/Kconfig
> +++ b/drivers/hv/Kconfig
> @@ -9,6 +9,7 @@ config HYPERV
>   select PARAVIRT
>   select X86_HV_CALLBACK_VECTOR if X86
>   select VMAP_PFN
> + select DMA_OPS_BYPASS
>   help
> Select this option to run Linux as a Hyper-V client operating
> system.
> diff --git a/drivers/hv/vmbus_drv.c b/drivers/hv/vmbus_drv.c
> index 392c1ac4f819..32dc193e31cd 100644
> --- a/drivers/hv/vmbus_drv.c
> +++ b/drivers/hv/vmbus_drv.c
> @@ -33,6 +33,7 @@
>  #include 
>  #include 
>  #include 
> +#include 
>  #include 
>  #include "hyperv_vmbus.h"
> 
> @@ -2078,6 +2079,7 @@ struct hv_device *vmbus_device_create(const guid_t 
> *type,
>   return child_device_obj;
>  }
> 
> +static u64 vmbus_dma_mask = DMA_BIT_MASK(64);
>  /*
>   * vmbus_device_register - Register the child device
>   */
> @@ -2118,6 +2120,10 @@ int vmbus_device_register(struct hv_device 
> *child_device_obj)
>   }
>   hv_debug_add_dev_dir(child_device_obj);
> 
> + child_device_obj->device.dma_ops_bypass = true;
> + child_device_obj->device.dma_ops = _iommu_dma_ops;
> + child_device_obj->device.dma_mask = _dma_mask;
> + child_device_obj->device.dma_parms = &child_device_obj->dma_parms;
>   return 0;
> 
>  err_kset_unregister:
> diff --git a/drivers/iommu/hyperv-iommu.c b/drivers/iommu/hyperv-iommu.c
> index e285a220c913..ebcb628e7e8f 100644
> --- a/drivers/iommu/hyperv-iommu.c
> +++ b/drivers/iommu/hyperv-iommu.c
> @@ -13,14 +13,21 @@
>  #include 
>  #include 
>  #include 
> +#include 
> +#include 
> 
>  #include 

[xen-unstable-smoke test] 166312: tolerable all pass - PUSHED

2021-11-23 Thread osstest service owner
flight 166312 xen-unstable-smoke real [real]
http://logs.test-lab.xenproject.org/osstest/logs/166312/

Failures :-/ but no regressions.

Tests which did not succeed, but are not blocking:
 test-amd64-amd64-libvirt 15 migrate-support-checkfail   never pass
 test-arm64-arm64-xl-xsm  15 migrate-support-checkfail   never pass
 test-arm64-arm64-xl-xsm  16 saverestore-support-checkfail   never pass
 test-armhf-armhf-xl  15 migrate-support-checkfail   never pass
 test-armhf-armhf-xl  16 saverestore-support-checkfail   never pass

version targeted for testing:
 xen  74a11c43fd7e074b1f77631b446dd2115eacb9e8
baseline version:
 xen  be12fcca8b784e456df3adedbffe657d753c5ff9

Last test of basis   166197  2021-11-19 19:00:27 Z3 days
Testing same since   166312  2021-11-23 13:00:27 Z0 days1 attempts


People who touched revisions under test:
  Jan Beulich 
  Julien Grall 

jobs:
 build-arm64-xsm  pass
 build-amd64  pass
 build-armhf  pass
 build-amd64-libvirt  pass
 test-armhf-armhf-xl  pass
 test-arm64-arm64-xl-xsm  pass
 test-amd64-amd64-xl-qemuu-debianhvm-amd64pass
 test-amd64-amd64-libvirt pass



sg-report-flight on osstest.test-lab.xenproject.org
logs: /home/logs/logs
images: /home/logs/images

Logs, config files, etc. are available at
http://logs.test-lab.xenproject.org/osstest/logs

Explanation of these reports, and of osstest in general, is at
http://xenbits.xen.org/gitweb/?p=osstest.git;a=blob;f=README.email;hb=master
http://xenbits.xen.org/gitweb/?p=osstest.git;a=blob;f=README;hb=master

Test harness code can be found at
http://xenbits.xen.org/gitweb?p=osstest.git;a=summary


Pushing revision :

To xenbits.xen.org:/home/xen/git/xen.git
   be12fcca8b..74a11c43fd  74a11c43fd7e074b1f77631b446dd2115eacb9e8 -> smoke



Re: [PATCH v3 03/10] xen/arm: introduce new helper parse_static_mem_prop and acquire_static_memory_bank

2021-11-23 Thread Julien Grall

Hi Penny,

On 16/11/2021 06:31, Penny Zheng wrote:

Later, we will introduce allocate_static_memory_11 for allocating static
memory for direct-map domains, and it will share a lot of common code with
the existing allocate_static_memory.

In order not to introduce a lot of duplicate code, and also to make the whole
code more readable, this commit extracts the common code into two new helpers,
parse_static_mem_prop and acquire_static_memory_bank.

Signed-off-by: Penny Zheng 
---
v3 changes:
- new commit to move the split off of the code outside in a separate patch
---
  xen/arch/arm/domain_build.c | 100 +++-
  1 file changed, 64 insertions(+), 36 deletions(-)

diff --git a/xen/arch/arm/domain_build.c b/xen/arch/arm/domain_build.c
index 7a063f62fe..1dc728e848 100644
--- a/xen/arch/arm/domain_build.c
+++ b/xen/arch/arm/domain_build.c
@@ -515,12 +515,69 @@ static bool __init append_static_memory_to_bank(struct 
domain *d,
  return true;
  }
  
+static mfn_t __init acquire_static_memory_bank(struct domain *d,

+ const __be32 **cell,
+ u32 addr_cells, u32 size_cells,
+ paddr_t *pbase, paddr_t *psize)
+{
+mfn_t smfn;
+int res;
+
+device_tree_get_reg(cell, addr_cells, size_cells, pbase, psize);
+ASSERT(IS_ALIGNED(*pbase, PAGE_SIZE) && IS_ALIGNED(*psize, PAGE_SIZE));
+if ( PFN_DOWN(*psize) > UINT_MAX )
+{
+printk(XENLOG_ERR "%pd: static memory size too large: %#"PRIpaddr,
+   d, *psize);
+return INVALID_MFN;
+}
+
+smfn = maddr_to_mfn(*pbase);
+res = acquire_domstatic_pages(d, smfn, PFN_DOWN(*psize), 0);
+if ( res )
+{
+printk(XENLOG_ERR
+   "%pd: failed to acquire static memory: %d.\n", d, res);
+return INVALID_MFN;
+}
+
+return smfn;
+}
+
+static int __init parse_static_mem_prop(const struct dt_device_node *node,
+u32 *addr_cells, u32 *size_cells,
+int *length, const __be32 **cell)
+{
+const struct dt_property *prop;
+
+prop = dt_find_property(node, "xen,static-mem", NULL);
+if ( !dt_property_read_u32(node, "#xen,static-mem-address-cells",
+   addr_cells) )
+{
+printk(XENLOG_ERR
+   "failed to read \"#xen,static-mem-address-cells\".\n");
+return -EINVAL;
+}
+
+if ( !dt_property_read_u32(node, "#xen,static-mem-size-cells",
+   size_cells) )
+{
+printk(XENLOG_ERR
+   "failed to read \"#xen,static-mem-size-cells\".\n");
+return -EINVAL;
+}
+
+*cell = (const __be32 *)prop->value;
+*length = prop->length;
+
+return 0;
+}
+
  /* Allocate memory from static memory as RAM for one specific domain d. */
  static void __init allocate_static_memory(struct domain *d,
struct kernel_info *kinfo,
const struct dt_device_node *node)
  {
-const struct dt_property *prop;
  u32 addr_cells, size_cells, reg_cells;
  unsigned int nr_banks, gbank, bank = 0;
  const uint64_t rambase[] = GUEST_RAM_BANK_BASES;
@@ -529,24 +586,10 @@ static void __init allocate_static_memory(struct domain 
*d,
  u64 tot_size = 0;
  paddr_t pbase, psize, gsize;
  mfn_t smfn;
-int res;
-
-prop = dt_find_property(node, "xen,static-mem", NULL);
-if ( !dt_property_read_u32(node, "#xen,static-mem-address-cells",
-   &addr_cells) )
-{
-printk(XENLOG_ERR
-   "%pd: failed to read \"#xen,static-mem-address-cells\".\n", d);
-goto fail;
-}
+int length;
  
-if ( !dt_property_read_u32(node, "#xen,static-mem-size-cells",

-   &size_cells) )
-{
-printk(XENLOG_ERR
-   "%pd: failed to read \"#xen,static-mem-size-cells\".\n", d);
+if ( parse_static_mem_prop(node, &addr_cells, &size_cells, &length, &cell) )
  goto fail;
-}
  reg_cells = addr_cells + size_cells;
  
  /*

@@ -557,29 +600,14 @@ static void __init allocate_static_memory(struct domain 
*d,
  gbank = 0;
  gsize = ramsize[gbank];
  kinfo->mem.bank[gbank].start = rambase[gbank];
-
-cell = (const __be32 *)prop->value;
-nr_banks = (prop->length) / (reg_cells * sizeof (u32));
+nr_banks = length / (reg_cells * sizeof (u32));
  
  for ( ; bank < nr_banks; bank++ )

  {
-device_tree_get_reg(&cell, addr_cells, size_cells, &pbase, &psize);
-ASSERT(IS_ALIGNED(pbase, PAGE_SIZE) && IS_ALIGNED(psize, PAGE_SIZE));
-
-if ( PFN_DOWN(psize) > UINT_MAX )
-{
-printk(XENLOG_ERR "%pd: static memory size too large: %#"PRIpaddr,
-   d, psize);
-smfn = acquire_static_memory_bank(d, &cell, addr_cells, size_cells,
+

Re: [PATCH v3 02/10] xen/arm: avoid setting XEN_DOMCTL_CDF_iommu when IOMMU off

2021-11-23 Thread Julien Grall

Hi Penny,

On 16/11/2021 06:31, Penny Zheng wrote:

From: Stefano Stabellini 

This commit avoids setting XEN_DOMCTL_CDF_iommu when the IOMMU is
absent/disabled, otherwise xen will fail later when handling
device assignment.


I would explain briefly in the commit message why you want to do device 
assignment without PCI passthrough.


Other than that, the change below is fine with me.



Signed-off-by: Penny Zheng 
Signed-off-by: Stefano Stabellini 
---
v3 changes:
- new commit, split from the original "[PATCH v2 2/6] xen/arm: introduce
direct-map for domUs"
---
  xen/arch/arm/domain_build.c | 3 ++-
  1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/xen/arch/arm/domain_build.c b/xen/arch/arm/domain_build.c
index 664c88ebe4..7a063f62fe 100644
--- a/xen/arch/arm/domain_build.c
+++ b/xen/arch/arm/domain_build.c
@@ -2996,7 +2996,8 @@ void __init create_domUs(void)
  panic("Missing property 'cpus' for domain %s\n",
dt_node_name(node));
  
-if ( dt_find_compatible_node(node, NULL, "multiboot,device-tree") )

+if ( dt_find_compatible_node(node, NULL, "multiboot,device-tree") &&
+ iommu_enabled )
  d_cfg.flags |= XEN_DOMCTL_CDF_iommu;
  
  if ( !dt_property_read_u32(node, "nr_spis", _cfg.arch.nr_spis) )




Cheers,

--
Julien Grall



Re: [PATCH v6 2/7] xen/arm: add pci-domain for disabled devices

2021-11-23 Thread Julien Grall

Hi,

On 23/11/2021 16:44, Oleksandr Andrushchenko wrote:

On 23.11.21 18:05, Julien Grall wrote:

Hi Oleksandr,

On 23/11/2021 06:31, Oleksandr Andrushchenko wrote:



On 22.11.21 19:17, Julien Grall wrote:

Hi,

On 22/11/2021 16:23, Oleksandr Andrushchenko wrote:

On 22.11.21 17:29, Julien Grall wrote:

I would prefer to go with my way. This can be refined in the future if we find 
Device-Tree that matches what you wrote.

Ok, so just to make it clear:
    >a PCI device would always be described as a child of the hostbridges. So I would 
rework the 'if' to also check if the parent type is not "pci"


That's correct. Thank you!

Ok, so how about
       if ( is_pci_passthrough_enabled() && dt_device_type_is_equal(node, 
"pci") )
       {
       bool skip = false;

       /*
    * If the parent is also a "pci" device, then "linux,pci-domain"
    * should already be there, so nothing to do then.
    */


This comment is a bit confusing.

Do you have something on your mind?


Yes, explain that we only want to cover hostbridge (see my reply on the 
previous answer).



I think what matter if the parent is a "pci" device, then the current node must 
not be a hostbridge. So we can skip it.

By skipping you only mean we do not need to add/assign "linux,pci-domain", 
right?


Yes.


However...


       if ( node->parent && dt_device_type_is_equal(node->parent, "pci") )
       skip = true;

       if ( !skip && !dt_find_property(node, "linux,pci-domain", NULL) )
       {
I played with a single if and it looks scary...


... how about introducing a helper that will return true if this node is 
likely a hostbridge?

This is still only a single use of such a check: why would we need a helper, and what
would that helper do?


I like splitting code into multiple functions, even if they are only called 
once, because this:

  1) keeps the functions line count small
  2) easier to understand what is the purpose of the check

In fact, I would actually move the handling for "linux,pci-domain" in a 
separate helper. Something like:


/*
 * When PCI passthrough is available we want to keep the
 * "linux,pci-domain" in sync for every hostbridge.
 *
 * Xen may not have a driver for all the hostbridges. So we have
 * to write a heuristic to detect whether a device node describes
 * a hostbridge.
 *
 * The current heuristic assumes that a device is a hostbridge
 * if the type is "pci" and the parent type is not "pci".
 */
static int handle_linux_pci_domain(struct dt_device_node *node)
{
    if ( !is_pci_passthrough_enabled() )
        return 0;

    if ( !dt_device_type_is_equal(node, "pci") )
        return 0;

    if ( node->parent && dt_device_type_is_equal(node->parent, "pci") )
        return 0;

    if ( dt_find_property(node, "linux,pci-domain", NULL) )
        return 0;

    /* Allocate and create the linux,pci-domain */
}
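
A possible call site, as a sketch only (handle_node() is the existing device-tree
walker in domain_build.c, and the helper above is assumed to be completed to
return an error code):

    /* Sketch: run the helper for every node while copying the host DT. */
    res = handle_linux_pci_domain(node);
    if ( res )
        return res;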

Cheers,

--
Julien Grall



RE: [PATCH V2 1/6] Swiotlb: Add Swiotlb bounce buffer remap function for HV IVM

2021-11-23 Thread Michael Kelley (LINUX)
From: Tianyu Lan  Sent: Tuesday, November 23, 2021 6:31 AM
> 
> In Isolation VM with AMD SEV, bounce buffer needs to be accessed via
> extra address space which is above shared_gpa_boundary (E.G 39 bit
> address line) reported by Hyper-V CPUID ISOLATION_CONFIG. The access
> physical address will be original physical address + shared_gpa_boundary.
> The shared_gpa_boundary in the AMD SEV SNP spec is called virtual top of
> memory(vTOM). Memory addresses below vTOM are automatically treated as
> private while memory above vTOM is treated as shared.
> 
> Expose swiotlb_unencrypted_base for platforms to set unencrypted
> memory base offset and platform calls swiotlb_update_mem_attributes()
> to remap swiotlb mem to unencrypted address space. memremap() can
> not be called in the early stage and so put remapping code into
> swiotlb_update_mem_attributes(). Store remap address and use it to copy
> data from/to swiotlb bounce buffer.
> 
> Signed-off-by: Tianyu Lan 
> ---
> Change since v1:
>   * Rework comment in the swiotlb_init_io_tlb_mem()
>   * Make swiotlb_init_io_tlb_mem() back to return void.
> ---
>  include/linux/swiotlb.h |  6 +
>  kernel/dma/swiotlb.c| 53 +
>  2 files changed, 54 insertions(+), 5 deletions(-)
> 
> diff --git a/include/linux/swiotlb.h b/include/linux/swiotlb.h
> index 569272871375..f6c3638255d5 100644
> --- a/include/linux/swiotlb.h
> +++ b/include/linux/swiotlb.h
> @@ -73,6 +73,9 @@ extern enum swiotlb_force swiotlb_force;
>   * @end: The end address of the swiotlb memory pool. Used to do a quick
>   *   range check to see if the memory was in fact allocated by this
>   *   API.
> + * @vaddr:   The vaddr of the swiotlb memory pool. The swiotlb memory pool
> + *   may be remapped in the memory encrypted case and store virtual
> + *   address for bounce buffer operation.
>   * @nslabs:  The number of IO TLB blocks (in groups of 64) between @start and
>   *   @end. For default swiotlb, this is command line adjustable via
>   *   setup_io_tlb_npages.
> @@ -92,6 +95,7 @@ extern enum swiotlb_force swiotlb_force;
>  struct io_tlb_mem {
>   phys_addr_t start;
>   phys_addr_t end;
> + void *vaddr;
>   unsigned long nslabs;
>   unsigned long used;
>   unsigned int index;
> @@ -186,4 +190,6 @@ static inline bool is_swiotlb_for_alloc(struct device 
> *dev)
>  }
>  #endif /* CONFIG_DMA_RESTRICTED_POOL */
> 
> +extern phys_addr_t swiotlb_unencrypted_base;
> +
>  #endif /* __LINUX_SWIOTLB_H */
> diff --git a/kernel/dma/swiotlb.c b/kernel/dma/swiotlb.c
> index 8e840fbbed7c..c303fdeba82f 100644
> --- a/kernel/dma/swiotlb.c
> +++ b/kernel/dma/swiotlb.c
> @@ -50,6 +50,7 @@
>  #include 
>  #include 
> 
> +#include 
>  #include 
>  #include 
>  #include 
> @@ -72,6 +73,8 @@ enum swiotlb_force swiotlb_force;
> 
>  struct io_tlb_mem io_tlb_default_mem;
> 
> +phys_addr_t swiotlb_unencrypted_base;
> +
>  /*
>   * Max segment that we can provide which (if pages are contingous) will
>   * not be bounced (unless SWIOTLB_FORCE is set).
> @@ -155,6 +158,31 @@ static inline unsigned long nr_slots(u64 val)
>   return DIV_ROUND_UP(val, IO_TLB_SIZE);
>  }
> 
> +/*
> + * Remap swiotlb memory in the unencrypted physical address space
> + * when swiotlb_unencrypted_base is set. (e.g. for Hyper-V AMD SEV-SNP
> + * Isolation VMs).
> + */
> +void *swiotlb_mem_remap(struct io_tlb_mem *mem, unsigned long bytes)
> +{
> + void *vaddr;
> +
> + if (swiotlb_unencrypted_base) {
> + phys_addr_t paddr = mem->start + swiotlb_unencrypted_base;
> +
> + vaddr = memremap(paddr, bytes, MEMREMAP_WB);
> + if (!vaddr) {
> + pr_err("Failed to map the unencrypted memory %llx size 
> %lx.\n",
> +paddr, bytes);
> + return NULL;
> + }
> +
> + return vaddr;
> + }
> +
> + return phys_to_virt(mem->start);
> +}
> +
>  /*
>   * Early SWIOTLB allocation may be too early to allow an architecture to
>   * perform the desired operations.  This function allows the architecture to
> @@ -172,7 +200,14 @@ void __init swiotlb_update_mem_attributes(void)
>   vaddr = phys_to_virt(mem->start);
>   bytes = PAGE_ALIGN(mem->nslabs << IO_TLB_SHIFT);
>   set_memory_decrypted((unsigned long)vaddr, bytes >> PAGE_SHIFT);
> - memset(vaddr, 0, bytes);
> +
> + mem->vaddr = swiotlb_mem_remap(mem, bytes);
> + if (!mem->vaddr) {
> + pr_err("Fail to remap swiotlb mem.\n");
> + return;
> + }
> +
> + memset(mem->vaddr, 0, bytes);
>  }

In the error case, do you want to leave mem->vaddr as NULL?  Or is it
better to leave it as the virtual address of mem->start?  Your code leaves it
as NULL.
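
For illustration, the latter would look roughly like this in
swiotlb_update_mem_attributes() (a sketch of the alternative only, not part of
the posted patch):

	mem->vaddr = swiotlb_mem_remap(mem, bytes);
	if (!mem->vaddr) {
		pr_err("Fail to remap swiotlb mem.\n");
		/* Fall back to the direct-map address instead of leaving it NULL. */
		mem->vaddr = phys_to_virt(mem->start);
	}

	memset(mem->vaddr, 0, bytes);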

The interaction between swiotlb_update_mem_attributes() and the helper
function swiotlb_mem_remap() seems kind of clunky.  phys_to_virt() gets called
twice, for 

[qemu-mainline test] 166307: regressions - FAIL

2021-11-23 Thread osstest service owner
flight 166307 qemu-mainline real [real]
flight 166316 qemu-mainline real-retest [real]
http://logs.test-lab.xenproject.org/osstest/logs/166307/
http://logs.test-lab.xenproject.org/osstest/logs/166316/

Regressions :-(

Tests which did not succeed and are blocking,
including tests which could not be run:
 test-amd64-amd64-xl-credit2  12 debian-install   fail REGR. vs. 166300
 test-amd64-amd64-libvirt-xsm 12 debian-install   fail REGR. vs. 166300

Tests which are failing intermittently (not blocking):
 test-armhf-armhf-xl-credit2   8 xen-bootfail pass in 166316-retest

Tests which did not succeed, but are not blocking:
 test-armhf-armhf-xl-credit2 15 migrate-support-check fail in 166316 never pass
 test-armhf-armhf-xl-credit2 16 saverestore-support-check fail in 166316 never 
pass
 test-amd64-amd64-xl-qemuu-win7-amd64 19 guest-stopfail like 166300
 test-amd64-amd64-qemuu-nested-amd 20 debian-hvm-install/l1/l2 fail like 166300
 test-amd64-i386-xl-qemuu-win7-amd64 19 guest-stop fail like 166300
 test-armhf-armhf-libvirt-qcow2 15 saverestore-support-check   fail like 166300
 test-armhf-armhf-libvirt-raw 15 saverestore-support-checkfail  like 166300
 test-armhf-armhf-libvirt 16 saverestore-support-checkfail  like 166300
 test-amd64-i386-xl-qemuu-ws16-amd64 19 guest-stop fail like 166300
 test-amd64-amd64-xl-qemuu-ws16-amd64 19 guest-stopfail like 166300
 test-arm64-arm64-xl-seattle  15 migrate-support-checkfail   never pass
 test-arm64-arm64-xl-seattle  16 saverestore-support-checkfail   never pass
 test-arm64-arm64-xl  15 migrate-support-checkfail   never pass
 test-arm64-arm64-xl  16 saverestore-support-checkfail   never pass
 test-amd64-amd64-libvirt 15 migrate-support-checkfail   never pass
 test-amd64-i386-libvirt  15 migrate-support-checkfail   never pass
 test-amd64-i386-libvirt-xsm  15 migrate-support-checkfail   never pass
 test-amd64-i386-xl-pvshim14 guest-start  fail   never pass
 test-amd64-i386-libvirt-qemuu-debianhvm-amd64-xsm 13 migrate-support-check 
fail never pass
 test-arm64-arm64-xl-xsm  15 migrate-support-checkfail   never pass
 test-arm64-arm64-xl-xsm  16 saverestore-support-checkfail   never pass
 test-arm64-arm64-libvirt-xsm 15 migrate-support-checkfail   never pass
 test-arm64-arm64-xl-credit1  15 migrate-support-checkfail   never pass
 test-arm64-arm64-libvirt-xsm 16 saverestore-support-checkfail   never pass
 test-arm64-arm64-xl-credit1  16 saverestore-support-checkfail   never pass
 test-arm64-arm64-xl-thunderx 15 migrate-support-checkfail   never pass
 test-arm64-arm64-xl-thunderx 16 saverestore-support-checkfail   never pass
 test-arm64-arm64-xl-credit2  15 migrate-support-checkfail   never pass
 test-arm64-arm64-xl-credit2  16 saverestore-support-checkfail   never pass
 test-amd64-amd64-libvirt-qemuu-debianhvm-amd64-xsm 13 migrate-support-check 
fail never pass
 test-armhf-armhf-xl-arndale  15 migrate-support-checkfail   never pass
 test-armhf-armhf-xl-arndale  16 saverestore-support-checkfail   never pass
 test-amd64-i386-libvirt-raw  14 migrate-support-checkfail   never pass
 test-amd64-amd64-libvirt-vhd 14 migrate-support-checkfail   never pass
 test-arm64-arm64-libvirt-raw 14 migrate-support-checkfail   never pass
 test-arm64-arm64-libvirt-raw 15 saverestore-support-checkfail   never pass
 test-arm64-arm64-xl-vhd  14 migrate-support-checkfail   never pass
 test-arm64-arm64-xl-vhd  15 saverestore-support-checkfail   never pass
 test-armhf-armhf-xl-multivcpu 15 migrate-support-checkfail  never pass
 test-armhf-armhf-xl-multivcpu 16 saverestore-support-checkfail  never pass
 test-armhf-armhf-xl  15 migrate-support-checkfail   never pass
 test-armhf-armhf-xl  16 saverestore-support-checkfail   never pass
 test-armhf-armhf-xl-rtds 15 migrate-support-checkfail   never pass
 test-armhf-armhf-xl-rtds 16 saverestore-support-checkfail   never pass
 test-armhf-armhf-xl-cubietruck 15 migrate-support-checkfail never pass
 test-armhf-armhf-xl-cubietruck 16 saverestore-support-checkfail never pass
 test-armhf-armhf-libvirt-raw 14 migrate-support-checkfail   never pass
 test-armhf-armhf-libvirt-qcow2 14 migrate-support-checkfail never pass
 test-armhf-armhf-xl-vhd  14 migrate-support-checkfail   never pass
 test-armhf-armhf-xl-vhd  15 saverestore-support-checkfail   never pass
 test-armhf-armhf-xl-credit1  15 migrate-support-checkfail   never pass
 test-armhf-armhf-xl-credit1  16 saverestore-support-checkfail   never pass
 test-armhf-armhf-libvirt 15 migrate-support-checkfail   never pass

version targeted for testing:
 qemuu

Xen 4.16 branched, commit moratorium lifted, freeze still in force

2021-11-23 Thread Ian Jackson
Xen 4.16 has now branched.  The staging-4.16 branch et al exists.

The commit moratorium is lifted.  However:

* 4.16 is in deep freeze now.  Nothing should be committed without
  a release ack.  Roger, you have a blanket release ack for your
  CHANGELOG update(s).

* Until 4.16 is released, please avoid significant restructuring or
  other changes which would hinder fixes being back- or forward-
  ported between 4.16 and unstable.

I am hoping to release early in the week of 29th November.
This means *all* non-release-administrative tree changes need to be
committed by the end of this coming Friday, the 26th of November.

If a significant bug is discovered between now and then, we will have
to slip the release.

Ian.



Re: [PATCH v6 3/7] xen/arm: setup MMIO range trap handlers for hardware domain

2021-11-23 Thread Julien Grall

Hi,

On 23/11/2021 16:41, Oleksandr Andrushchenko wrote:

On 23.11.21 18:12, Julien Grall wrote:

On 23/11/2021 06:58, Oleksandr Andrushchenko wrote:

unsigned int domain_vpci_get_num_mmio_handlers(struct domain *d)
{
       if ( !has_vpci(d) )
       return 0;

       if ( is_hardware_domain(d) )
       {
       int ret = pci_host_iterate_bridges_and_count(d, 
vpci_get_num_handlers_cb);

       return ret < 0 ? 0 : ret;
       }

       /*
    * This is a guest domain:
    *
    * 1 for a single emulated host bridge's configuration space.
    */
       return 1;


I am afraid that my question stands even with this approach. This patch is only 
meant to handle the hardware domain, therefore the change seems to be out of 
context.

I would prefer if this change is done separately.

While I do agree that the MSI part and the virtual bus topology do not belong to this
patch, I can't agree with the rest: we already have MMIO handlers for guest domains,
and we introduce domain_vpci_get_num_mmio_handlers which must also account
for guests and stay consistent.
So, even though the patch has "hardware domain" in its name, it doesn't mean we should
break guests here.


We were already registering the handler for guest domain before your 
patch. So this is nothing new.


At the moment, we always allocate an extra 16 slots for IO handlers (see 
MAX_IO_HANDLER). So we are not breaking anything. Instead, this is 
simply a latent bug.



Thus I do think the above is still correct wrt this patch.


The idea of splitting the patch is to separate the bug fix from the new code. This 
helps backporting and review.


In this case, we don't care about backports (PCI passthrough is not 
supported) and the fix is simple enough. So I am not going to insist on 
splitting it into a separate patch.


However, this change *must* be explained in the commit message.

Cheers,

--
Julien Grall



[PATCH for-4.17 3/3] Set version to 4.17: rerun autogen.sh

2021-11-23 Thread Ian Jackson
Signed-off-by: Ian Jackson 
---
 configure | 18 +-
 docs/configure| 18 +-
 stubdom/configure | 18 +-
 tools/configure   | 18 +-
 4 files changed, 36 insertions(+), 36 deletions(-)

diff --git a/configure b/configure
index 62f6c2d47a..502273b263 100755
--- a/configure
+++ b/configure
@@ -1,6 +1,6 @@
 #! /bin/sh
 # Guess values for system-dependent variables and create Makefiles.
-# Generated by GNU Autoconf 2.69 for Xen Hypervisor 4.16.
+# Generated by GNU Autoconf 2.69 for Xen Hypervisor 4.17.
 #
 # Report bugs to .
 #
@@ -579,8 +579,8 @@ MAKEFLAGS=
 # Identity of this package.
 PACKAGE_NAME='Xen Hypervisor'
 PACKAGE_TARNAME='xen'
-PACKAGE_VERSION='4.16'
-PACKAGE_STRING='Xen Hypervisor 4.16'
+PACKAGE_VERSION='4.17'
+PACKAGE_STRING='Xen Hypervisor 4.17'
 PACKAGE_BUGREPORT='xen-de...@lists.xen.org'
 PACKAGE_URL='https://www.xen.org/'
 
@@ -1236,7 +1236,7 @@ if test "$ac_init_help" = "long"; then
   # Omit some internal or obsolete options to make the list less imposing.
   # This message is too long to be a string in the A/UX 3.1 sh.
   cat <<_ACEOF
-\`configure' configures Xen Hypervisor 4.16 to adapt to many kinds of systems.
+\`configure' configures Xen Hypervisor 4.17 to adapt to many kinds of systems.
 
 Usage: $0 [OPTION]... [VAR=VALUE]...
 
@@ -1302,7 +1302,7 @@ fi
 
 if test -n "$ac_init_help"; then
   case $ac_init_help in
- short | recursive ) echo "Configuration of Xen Hypervisor 4.16:";;
+ short | recursive ) echo "Configuration of Xen Hypervisor 4.17:";;
esac
   cat <<\_ACEOF
 
@@ -1403,7 +1403,7 @@ fi
 test -n "$ac_init_help" && exit $ac_status
 if $ac_init_version; then
   cat <<\_ACEOF
-Xen Hypervisor configure 4.16
+Xen Hypervisor configure 4.17
 generated by GNU Autoconf 2.69
 
 Copyright (C) 2012 Free Software Foundation, Inc.
@@ -1420,7 +1420,7 @@ cat >config.log <<_ACEOF
 This file contains any messages produced by compilers while
 running configure, to aid debugging if configure makes a mistake.
 
-It was created by Xen Hypervisor $as_me 4.16, which was
+It was created by Xen Hypervisor $as_me 4.17, which was
 generated by GNU Autoconf 2.69.  Invocation command line was
 
   $ $0 $@
@@ -2868,7 +2868,7 @@ cat >>$CONFIG_STATUS <<\_ACEOF || ac_write_fail=1
 # report actual input values of CONFIG_FILES etc. instead of their
 # values after options handling.
 ac_log="
-This file was extended by Xen Hypervisor $as_me 4.16, which was
+This file was extended by Xen Hypervisor $as_me 4.17, which was
 generated by GNU Autoconf 2.69.  Invocation command line was
 
   CONFIG_FILES= $CONFIG_FILES
@@ -2922,7 +2922,7 @@ _ACEOF
 cat >>$CONFIG_STATUS <<_ACEOF || ac_write_fail=1
 ac_cs_config="`$as_echo "$ac_configure_args" | sed 's/^ //; 
s/[\\""\`\$]/&/g'`"
 ac_cs_version="\\
-Xen Hypervisor config.status 4.16
+Xen Hypervisor config.status 4.17
 configured by $0, generated by GNU Autoconf 2.69,
   with options \\"\$ac_cs_config\\"
 
diff --git a/docs/configure b/docs/configure
index 569bd4c2ff..f93d086e9a 100755
--- a/docs/configure
+++ b/docs/configure
@@ -1,6 +1,6 @@
 #! /bin/sh
 # Guess values for system-dependent variables and create Makefiles.
-# Generated by GNU Autoconf 2.69 for Xen Hypervisor Documentation 4.16.
+# Generated by GNU Autoconf 2.69 for Xen Hypervisor Documentation 4.17.
 #
 # Report bugs to .
 #
@@ -579,8 +579,8 @@ MAKEFLAGS=
 # Identity of this package.
 PACKAGE_NAME='Xen Hypervisor Documentation'
 PACKAGE_TARNAME='xen'
-PACKAGE_VERSION='4.16'
-PACKAGE_STRING='Xen Hypervisor Documentation 4.16'
+PACKAGE_VERSION='4.17'
+PACKAGE_STRING='Xen Hypervisor Documentation 4.17'
 PACKAGE_BUGREPORT='xen-de...@lists.xen.org'
 PACKAGE_URL='https://www.xen.org/'
 
@@ -1224,7 +1224,7 @@ if test "$ac_init_help" = "long"; then
   # Omit some internal or obsolete options to make the list less imposing.
   # This message is too long to be a string in the A/UX 3.1 sh.
   cat <<_ACEOF
-\`configure' configures Xen Hypervisor Documentation 4.16 to adapt to many 
kinds of systems.
+\`configure' configures Xen Hypervisor Documentation 4.17 to adapt to many 
kinds of systems.
 
 Usage: $0 [OPTION]... [VAR=VALUE]...
 
@@ -1286,7 +1286,7 @@ fi
 
 if test -n "$ac_init_help"; then
   case $ac_init_help in
- short | recursive ) echo "Configuration of Xen Hypervisor Documentation 
4.16:";;
+ short | recursive ) echo "Configuration of Xen Hypervisor Documentation 
4.17:";;
esac
   cat <<\_ACEOF
 
@@ -1387,7 +1387,7 @@ fi
 test -n "$ac_init_help" && exit $ac_status
 if $ac_init_version; then
   cat <<\_ACEOF
-Xen Hypervisor Documentation configure 4.16
+Xen Hypervisor Documentation configure 4.17
 generated by GNU Autoconf 2.69
 
 Copyright (C) 2012 Free Software Foundation, Inc.
@@ -1404,7 +1404,7 @@ cat >config.log <<_ACEOF
 This file contains any messages produced by compilers while
 running configure, to aid debugging if configure makes a mistake.
 
-It was created by Xen Hypervisor Documentation 

[PATCH for-4.17 2/3] Set version to 4.17; 4.16 has branched

2021-11-23 Thread Ian Jackson
Signed-off-by: Ian Jackson 
---
 README   | 10 +-
 xen/Makefile |  4 ++--
 2 files changed, 7 insertions(+), 7 deletions(-)

diff --git a/README b/README
index a626e56436..562b808080 100644
--- a/README
+++ b/README
@@ -1,9 +1,9 @@
 
-__  ___  __  __
-\ \/ /___ _ __   | || |  / |/ /__ __ ___
- \  // _ \ '_ \  | || |_ | | '_ \ _| '__/ __|
- /  \  __/ | | | |__   _|| | (_) |_| | | (__
-/_/\_\___|_| |_||_|(_)_|\___/  |_|  \___|
+__  ____ _
+\ \/ /___ _ ___   _ _ __  ___| |_ __ _| |__ | | ___
+ \  // _ \ '_ \ _| | | | '_ \/ __| __/ _` | '_ \| |/ _ \
+ /  \  __/ | | |_| |_| | | | \__ \ || (_| | |_) | |  __/
+/_/\_\___|_| |_|  \__,_|_| |_|___/\__\__,_|_.__/|_|\___|
 
 
 
diff --git a/xen/Makefile b/xen/Makefile
index 2fc83f266b..1fd48af7ae 100644
--- a/xen/Makefile
+++ b/xen/Makefile
@@ -1,8 +1,8 @@
 # This is the correct place to edit the build version.
 # All other places this is stored (eg. compile.h) should be autogenerated.
 export XEN_VERSION   = 4
-export XEN_SUBVERSION= 16
-export XEN_EXTRAVERSION ?= -rc$(XEN_VENDORVERSION)
+export XEN_SUBVERSION= 17
+export XEN_EXTRAVERSION ?= -unstable$(XEN_VENDORVERSION)
 export XEN_FULLVERSION   = $(XEN_VERSION).$(XEN_SUBVERSION)$(XEN_EXTRAVERSION)
 -include xen-version
 
-- 
2.20.1




[PATCH for-4.17 1/3] Revert "Config.mk: pin QEMU_UPSTREAM_REVISION (prep for Xen 4.16 RC1)"

2021-11-23 Thread Ian Jackson
This branch is unstable again now.

This reverts commit c9ce6afbf2d7772f47fc572bb7fc9555724927ed.
---
 Config.mk | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/Config.mk b/Config.mk
index 6be010c7ce..6587c7d626 100644
--- a/Config.mk
+++ b/Config.mk
@@ -239,7 +239,7 @@ SEABIOS_UPSTREAM_URL ?= git://xenbits.xen.org/seabios.git
 MINIOS_UPSTREAM_URL ?= git://xenbits.xen.org/mini-os.git
 endif
 OVMF_UPSTREAM_REVISION ?= 7b4a99be8a39c12d3a7fc4b8db9f0eab4ac688d5
-QEMU_UPSTREAM_REVISION ?= b6e539830bf45e2d7a6bd86ddfdf003088b173b0
+QEMU_UPSTREAM_REVISION ?= master
 MINIOS_UPSTREAM_REVISION ?= 9f09744aa3e5982a083ecf8e9cd2123f477081f9
 
 SEABIOS_UPSTREAM_REVISION ?= rel-1.14.0
-- 
2.20.1




[PATCH for-4.16 3/3] Turn off debug by default

2021-11-23 Thread Ian Jackson
Signed-off-by: Ian Jackson 
---
 tools/Rules.mk| 2 +-
 xen/Kconfig.debug | 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/tools/Rules.mk b/tools/Rules.mk
index b022da3336..051a5d3555 100644
--- a/tools/Rules.mk
+++ b/tools/Rules.mk
@@ -25,7 +25,7 @@ CFLAGS_xeninclude = -I$(XEN_INCLUDE)
 XENSTORE_XENSTORED ?= y
 
 # A debug build of tools?
-debug ?= y
+debug ?= n
 debug_symbols ?= $(debug)
 
 XEN_GOCODE_URL= golang.xenproject.org
diff --git a/xen/Kconfig.debug b/xen/Kconfig.debug
index fad3050d4f..4419030235 100644
--- a/xen/Kconfig.debug
+++ b/xen/Kconfig.debug
@@ -3,7 +3,7 @@ menu "Debugging Options"
 
 config DEBUG
bool "Developer Checks"
-   default y
+   default n
---help---
  If you say Y here this will enable developer checks such as asserts
  and extra printks. This option is intended for development purposes
-- 
2.20.1




[PATCH for-4.16 2/3] SUPPORT.md: Set Release Notes link

2021-11-23 Thread Ian Jackson
Signed-off-by: Ian Jackson 
---
 SUPPORT.md | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/SUPPORT.md b/SUPPORT.md
index 3a34933c89..b1367e2b22 100644
--- a/SUPPORT.md
+++ b/SUPPORT.md
@@ -9,13 +9,13 @@ for the definitions of the support status levels etc.
 
 # Release Support
 
-Xen-Version: unstable
+Xen-Version: 4.16
 Initial-Release: n/a
 Supported-Until: TBD
 Security-Support-Until: Unreleased - not yet security-supported
 
 Release Notes
-: <a href="https://wiki.xenproject.org/wiki/Xen_Project_X.YY_Release_Notes">RN</a>
+: <a href="https://wiki.xenproject.org/wiki/Xen_Project_4.16_Release_Notes">RN</a>
 
 # Feature Support
 
-- 
2.20.1




[PATCH for-4.16 1/3] Config.mk: switch to named tags (for stable branch)

2021-11-23 Thread Ian Jackson
Signed-off-by: Ian Jackson 
---
 Config.mk | 8 +++-
 1 file changed, 3 insertions(+), 5 deletions(-)

diff --git a/Config.mk b/Config.mk
index 6be010c7ce..b32a185efc 100644
--- a/Config.mk
+++ b/Config.mk
@@ -239,17 +239,15 @@ SEABIOS_UPSTREAM_URL ?= git://xenbits.xen.org/seabios.git
 MINIOS_UPSTREAM_URL ?= git://xenbits.xen.org/mini-os.git
 endif
 OVMF_UPSTREAM_REVISION ?= 7b4a99be8a39c12d3a7fc4b8db9f0eab4ac688d5
-QEMU_UPSTREAM_REVISION ?= b6e539830bf45e2d7a6bd86ddfdf003088b173b0
-MINIOS_UPSTREAM_REVISION ?= 9f09744aa3e5982a083ecf8e9cd2123f477081f9
+QEMU_UPSTREAM_REVISION ?= qemu-xen-4.16.0-rc4
+MINIOS_UPSTREAM_REVISION ?= xen-4.16.0-rc4
 
 SEABIOS_UPSTREAM_REVISION ?= rel-1.14.0
 
 ETHERBOOT_NICS ?= rtl8139 8086100e
 
 
-QEMU_TRADITIONAL_REVISION ?= 3d273dd05e51e5a1ffba3d98c7437ee84e8f8764
-# Wed Jul 15 10:01:40 2020 +0100
-# qemu-trad: remove Xen path dependencies
+QEMU_TRADITIONAL_REVISION ?= xen-4.16.0-rc4
 
 # Specify which qemu-dm to use. This may be `ioemu' to use the old
 # Mercurial in-tree version, or a local directory, or a git URL.
-- 
2.20.1




[PATCH for-4.16 0/3] Changes from release technician checklist

2021-11-23 Thread Ian Jackson
I am about to push these to xen.git#staging-4.16.

Roger, I have not messed with the furniture in CHANGELOG.md on either
staging-4.16 or staging.

Ian.

Ian Jackson (3):
  Config.mk: switch to named tags (for stable branch)
  SUPPORT.md: Set Release Notes link
  Turn off debug by default

 Config.mk | 8 +++-
 SUPPORT.md| 4 ++--
 tools/Rules.mk| 2 +-
 xen/Kconfig.debug | 2 +-
 4 files changed, 7 insertions(+), 9 deletions(-)

-- 
2.20.1



Re: [PATCH V2 3/4] xen/unpopulated-alloc: Add mechanism to use Xen resource

2021-11-23 Thread Oleksandr



On 20.11.21 04:19, Stefano Stabellini wrote:

Hi Stefano, Juergen, all



Juergen please see the bottom of the email

On Fri, 19 Nov 2021, Oleksandr wrote:

On 19.11.21 02:59, Stefano Stabellini wrote:

On Tue, 9 Nov 2021, Oleksandr wrote:

On 28.10.21 19:37, Stefano Stabellini wrote:

Hi Stefano

I am sorry for the late response.


On Tue, 26 Oct 2021, Oleksandr Tyshchenko wrote:

From: Oleksandr Tyshchenko 

The main reason of this change is that unpopulated-alloc
code cannot be used in its current form on Arm, but there
is a desire to reuse it to avoid wasting real RAM pages
for the grant/foreign mappings.

The problem is that system "iomem_resource" is used for
the address space allocation, but the really unallocated
space can't be figured out precisely by the domain on Arm
without hypervisor involvement. For example, not all device
I/O regions are known by the time domain starts creating
grant/foreign mappings. And following the advise from
"iomem_resource" we might end up reusing these regions by
a mistake. So, the hypervisor which maintains the P2M for
the domain is in the best position to provide unused regions
of guest physical address space which could be safely used
to create grant/foreign mappings.

Introduce a new helper, arch_xen_unpopulated_init(), whose purpose
is to create a specific Xen resource based on the memory regions
provided by the hypervisor to be used as unused space for Xen
scratch pages.

If arch doesn't implement arch_xen_unpopulated_init() to
initialize Xen resource the default "iomem_resource" will be used.
So the behavior on x86 won't be changed.

Also fall back to allocating xenballooned pages (stealing real RAM
pages) if we do not have any suitable resource to work with, in which
case we won't be able to provide unpopulated pages.

Signed-off-by: Oleksandr Tyshchenko 
---
Changes RFC -> V2:
  - new patch, instead of
   "[RFC PATCH 2/2] xen/unpopulated-alloc: Query hypervisor to
provide
unallocated space"
---
drivers/xen/unpopulated-alloc.c | 89
+++--
include/xen/xen.h   |  2 +
2 files changed, 88 insertions(+), 3 deletions(-)

diff --git a/drivers/xen/unpopulated-alloc.c
b/drivers/xen/unpopulated-alloc.c
index a03dc5b..1f1d8d8 100644
--- a/drivers/xen/unpopulated-alloc.c
+++ b/drivers/xen/unpopulated-alloc.c
@@ -8,6 +8,7 @@
  #include 
+#include 
#include 
#include 
@@ -15,13 +16,29 @@ static DEFINE_MUTEX(list_lock);
static struct page *page_list;
static unsigned int list_count;
+static struct resource *target_resource;
+static struct resource xen_resource = {
+   .name = "Xen unused space",
+};
+
+/*
+ * If arch is not happy with system "iomem_resource" being used for
+ * the region allocation it can provide it's own view by initializing
+ * "xen_resource" with unused regions of guest physical address space
+ * provided by the hypervisor.
+ */
+int __weak arch_xen_unpopulated_init(struct resource *res)
+{
+   return -ENOSYS;
+}
+
static int fill_list(unsigned int nr_pages)
{
struct dev_pagemap *pgmap;
-   struct resource *res;
+   struct resource *res, *tmp_res = NULL;
void *vaddr;
unsigned int i, alloc_pages = round_up(nr_pages,
PAGES_PER_SECTION);
-   int ret = -ENOMEM;
+   int ret;
res = kzalloc(sizeof(*res), GFP_KERNEL);
if (!res)
@@ -30,7 +47,7 @@ static int fill_list(unsigned int nr_pages)
res->name = "Xen scratch";
res->flags = IORESOURCE_MEM | IORESOURCE_BUSY;
-   ret = allocate_resource(&iomem_resource, res,
+   ret = allocate_resource(target_resource, res,
alloc_pages * PAGE_SIZE, 0, -1,
PAGES_PER_SECTION * PAGE_SIZE, NULL,
NULL);
if (ret < 0) {
@@ -38,6 +55,31 @@ static int fill_list(unsigned int nr_pages)
goto err_resource;
}
+   /*
+* Reserve the region previously allocated from Xen resource
to avoid
+* re-using it by someone else.
+*/
+   if (target_resource != &iomem_resource) {
+   tmp_res = kzalloc(sizeof(*tmp_res), GFP_KERNEL);
+   if (!res) {
+   ret = -ENOMEM;
+   goto err_insert;
+   }
+
+   tmp_res->name = res->name;
+   tmp_res->start = res->start;
+   tmp_res->end = res->end;
+   tmp_res->flags = res->flags;
+
+   ret = insert_resource(&iomem_resource, tmp_res);
+   if (ret < 0) {
+   pr_err("Cannot insert IOMEM resource [%llx -
%llx]\n",
+  tmp_res->start, tmp_res->end);
+   kfree(tmp_res);
+   goto err_insert;
+   }
+   }

I am a bit confused.. why do we need to do this? Who could be
erroneously re-using the region? Are you saying that the next time
allocate_resource is called it could find the same 

Re: [PATCH v6 2/7] xen/arm: add pci-domain for disabled devices

2021-11-23 Thread Oleksandr Andrushchenko
Hi, Julien!

On 23.11.21 18:05, Julien Grall wrote:
> Hi Oleksandr,
>
> On 23/11/2021 06:31, Oleksandr Andrushchenko wrote:
>>
>>
>> On 22.11.21 19:17, Julien Grall wrote:
>>> Hi,
>>>
>>> On 22/11/2021 16:23, Oleksandr Andrushchenko wrote:
 On 22.11.21 17:29, Julien Grall wrote:
> I would prefer to go with my way. This can be refined in the future if we 
> find Device-Tree that matches what you wrote.
 Ok, so just to make it clear:
    >a PCI device would always be described as a child of the hostbridges. 
 So I would rework the 'if' to also check if the parent type is not "pci"
>>>
>>> That's correct. Thank you!
>> Ok, so how about
>>       if ( is_pci_passthrough_enabled() && dt_device_type_is_equal(node, 
>> "pci") )
>>       {
>>       bool skip = false;
>>
>>       /*
>>    * If the parent is also a "pci" device, then "linux,pci-domain"
>>    * should already be there, so nothing to do then.
>>    */
>
> This comment is a bit confusing.
Do you have something on your mind?
> I think what matter if the parent is a "pci" device, then the current node 
> must not be a hostbridge. So we can skip it.
By skipping you only mean we do not need to add/assign "linux,pci-domain", 
right?
> However...
>
>>       if ( node->parent && dt_device_type_is_equal(node->parent, "pci") )
>>       skip = true;
>>
>>       if ( !skip && !dt_find_property(node, "linux,pci-domain", NULL) )
>>       {
>> I played with a single if and it looks scary...
>
> ... how about introducing a helper that will return true if this node is 
> likely a hostbridge?
This is still only a single use of such a check: why would we need a helper, and what
would that helper do?
>
> Cheers,
>
Thank you,
Oleksandr

Re: [PATCH v6 4/7] xen/arm: do not map PCI ECAM and MMIO space to Domain-0's p2m

2021-11-23 Thread Julien Grall

Hi Oleksandr,

On 05/11/2021 06:33, Oleksandr Andrushchenko wrote:

From: Oleksandr Andrushchenko 

PCI host bridges are special devices in terms of implementing PCI
passthrough. According to [1] the current implementation depends on
Domain-0 to perform the initialization of the relevant PCI host
bridge hardware and perform PCI device enumeration. In order to
achieve that one of the required changes is to not map all the memory
ranges in map_range_to_domain as we traverse the device tree on startup
and perform some additional checks if the range needs to be mapped to
Domain-0.

The generic PCI host controller device tree binding says [2]:
- ranges: As described in IEEE Std 1275-1994, but must provide
   at least a definition of non-prefetchable memory. One
   or both of prefetchable Memory and IO Space may also
   be provided.

- reg   : The Configuration Space base address and size, as accessed
   from the parent bus.  The base address corresponds to
   the first bus in the "bus-range" property.  If no
   "bus-range" is specified, this will be bus 0 (the default).

 From the above none of the memory ranges from the "ranges" property
needs to be mapped to Domain-0 at startup as MMIO mapping is going to
be handled dynamically by vPCI as we assign PCI devices, e.g. each
device assigned to Domain-0/guest will have its MMIOs mapped/unmapped
as needed by Xen.

The "reg" property covers not only ECAM space, but may also have other
then the configuration memory ranges described, for example [3]:
- reg: Should contain rc_dbi, config registers location and length.
- reg-names: Must include the following entries:
"rc_dbi": controller configuration registers;
"config": PCIe configuration space registers.

This patch makes it possible to not map all the ranges from the
"ranges" property and also ECAM from the "reg". All the rest from the
"reg" property still needs to be mapped to Domain-0, so the PCI
host bridge remains functional in Domain-0.


The commit message is explaining the problem quite well (thanks for 
that). I think it also wants to explain the approach taken (the fact 
that we are deferring the mapping for hostbridges).




[1] https://lists.xenproject.org/archives/html/xen-devel/2020-07/msg00777.html
[2] 
https://www.kernel.org/doc/Documentation/devicetree/bindings/pci/host-generic-pci.txt
[3] 
https://www.kernel.org/doc/Documentation/devicetree/bindings/pci/hisilicon-pcie.txt

Signed-off-by: Oleksandr Andrushchenko 
---
Since v5:
- remove some need_mapping local variables
- use own_device in handle_device
- add __init for pci_ecam_need_p2m_hwdom_mapping
- make pci_host_bridge_mappings use p2m_mmio_direct_dev directly
Since v4:
- update skip_mapping comment
- add comment why we need to map interrupts to Dom0
Since v3:
  - pass struct map_range_data to map_dt_irq_to_domain
  - remove redundant check from map_range_to_domain
  - fix handle_device's .skip_mapping
Since v2:
  - removed check in map_range_to_domain for PCI_DEV
and moved it to handle_device, so the code is
simpler
  - s/map_pci_bridge/skip_mapping
  - extended comment in pci_host_bridge_mappings
  - minor code restructure in construct_dom0
  - s/.need_p2m_mapping/.need_p2m_hwdom_mapping and related
callbacks
  - unsigned int i; in pci_host_bridge_mappings
Since v1:
  - Added better description of why and what needs to be mapped into
Domain-0's p2m and what doesn't
  - Do not do any mappings for PCI devices while traversing the DT
  - Walk all the bridges and make required mappings in one go
---
  xen/arch/arm/domain_build.c| 67 +-
  xen/arch/arm/pci/ecam.c| 14 +++
  xen/arch/arm/pci/pci-host-common.c | 50 ++
  xen/arch/arm/pci/pci-host-zynqmp.c |  1 +
  xen/include/asm-arm/pci.h  | 10 +
  xen/include/asm-arm/setup.h| 13 ++
  6 files changed, 126 insertions(+), 29 deletions(-)

diff --git a/xen/arch/arm/domain_build.c b/xen/arch/arm/domain_build.c
index f7fcb1400c19..c7d992456ca7 100644
--- a/xen/arch/arm/domain_build.c
+++ b/xen/arch/arm/domain_build.c
@@ -10,7 +10,6 @@
  #include 
  #include 
  #include 
-#include 


You are still using helpers defined by this header. So I would keep the 
include even if it may have been included by another one.



  #include 
  #include 
  #include 
@@ -51,12 +50,6 @@ static int __init parse_dom0_mem(const char *s)
  }
  custom_param("dom0_mem", parse_dom0_mem);
  
-struct map_range_data

-{
-struct domain *d;
-p2m_type_t p2mt;
-};
-
  /* Override macros from asm/page.h to make them work with mfn_t */
  #undef virt_to_mfn
  #define virt_to_mfn(va) _mfn(__virt_to_mfn(va))
@@ -1676,10 +1669,10 @@ static int __init map_dt_irq_to_domain(const struct 
dt_device_node *dev,
 const struct dt_irq *dt_irq,
 void *data)
  {
-struct domain *d = data;
+

Re: [PATCH v6 3/7] xen/arm: setup MMIO range trap handlers for hardware domain

2021-11-23 Thread Oleksandr Andrushchenko
Hi, Julien!

On 23.11.21 18:12, Julien Grall wrote:
>
>
> On 23/11/2021 06:58, Oleksandr Andrushchenko wrote:
>> Hi, Julien!
>
> Hi Oleksandr,
>
>> On 22.11.21 19:36, Julien Grall wrote:
>>> On 18/11/2021 10:46, Oleksandr Andrushchenko wrote:
 On 18.11.21 09:27, Oleksandr Andrushchenko wrote:
>>> +    unsigned int count;
>>> +
>>> +    if ( is_hardware_domain(d) )
>>> +    /* For each PCI host bridge's configuration space. */
>>> +    count = pci_host_get_num_bridges();
>> This first part makes sense to me. But...
>>
>>> +    else
>> ... I don't understand how the else is related to this commit. Can you 
>> clarify it?
>>
>>> +    /*
>>> + * There's a single MSI-X MMIO handler that deals with both PBA
>>> + * and MSI-X tables per each PCI device being passed through.
>>> + * Maximum number of supported devices is 32 as virtual bus
>>> + * topology emulates the devices as embedded endpoints.
>>> + * +1 for a single emulated host bridge's configuration space.
>>> + */
>>> +    count = 1;
>>> +#ifdef CONFIG_HAS_PCI_MSI
>>> +    count += 32;
>> Surely, this is a decision that is based on other factor in the vPCI 
>> code. So can use a define and avoid hardcoding the number?
> Well, in the later series [1] this is defined via PCI_SLOT(~0) + 1 and 
> there is no dedicated
> constant for that. I can use the same here, e.g. s/32/PCI_SLOT(~0) + 1
>>>
>>> I would prefer if we introduce a new constant for that. This makes it easier 
>>> to update the code if we decide to increase the number of virtual devices.
>>>
>>> However, I am still not sure how the 'else' part is related to this commit. 
>>> Can you please clarify it?
>> Well, yes, this is too early for this patch to introduce some future 
>> knowledge, so I'll instead have:
>>
>> unsigned int domain_vpci_get_num_mmio_handlers(struct domain *d)
>> {
>>       if ( !has_vpci(d) )
>>       return 0;
>>
>>       if ( is_hardware_domain(d) )
>>       {
>>       int ret = pci_host_iterate_bridges_and_count(d, 
>> vpci_get_num_handlers_cb);
>>
>>       return ret < 0 ? 0 : ret;
>>       }
>>
>>       /*
>>    * This is a guest domain:
>>    *
>>    * 1 for a single emulated host bridge's configuration space.
>>    */
>>       return 1;
>
> I am afraid that my question stands even with this approach. This patch is 
> only meant to handle the hardware domain, therefore the change seems to be 
> out of context.
>
> I would prefer if this change is done separately.
While I do agree that the MSI part and the virtual bus topology do not belong to this
patch, I can't agree with the rest: we already have MMIO handlers for guest domains,
and we introduce domain_vpci_get_num_mmio_handlers which must also account
for guests and stay consistent.
So, even though the patch has "hardware domain" in its name, it doesn't mean we should
break guests here.
Thus I do think the above is still correct wrt this patch.
>
> Cheers,
>
Thank you,
Oleksandr

[PATCH for-4.16 v3] CHANGELOG: add missing entries for work during the 4.16 release cycle

2021-11-23 Thread Roger Pau Monne
Document some of the relevant changes during the 4.16 release cycle.

Signed-off-by: Roger Pau Monné 
Release-Acked-by: Ian Jackson 
---
Changes since v2:
 - Reword the x86 page table API change.

Changes since v1:
 - Add qemu-traditional and pvgrub notes.
 - Expand vPMU support to note it's limited.
 - xenstore library API changes.
 - xencall2L addition.
 - gitlab CI changes.
 - dom0less improvements.
 - vGIC fixes.
 - New Arm features.
 - OP-TEE fixes.
 - Arm64 heterogeneous CPU support.
---
Cc: Alistair Francis 
Cc: Andrew Cooper 
Cc: Anthony PERARD 
Cc: Anthony Perard 
Cc: Bertrand Marquis 
Cc: Bob Eshleman 
Cc: Christian Lindig 
Cc: Christopher Clark 
Cc: Daniel De Graaf 
Cc: Dario Faggioli 
Cc: David Scott 
Cc: Doug Goldstein 
Cc: Elena Ufimtseva 
Cc: George Dunlap 
Cc: Ian Jackson 
Cc: Jan Beulich 
Cc: Josh Whitehead 
Cc: Juergen Gross 
Cc: Julien Grall 
Cc: Jun Nakajima 
Cc: Kevin Tian 
Cc: Konrad Rzeszutek Wilk 
Cc: Marek Marczykowski-Górecki 
Cc: Meng Xu 
Cc: Nick Rosbrook 
Cc: Paul Durrant 
Cc: Quan Xu 
Cc: Rahul Singh 
Cc: Roger Pau Monné 
Cc: Ross Lagerwall 
Cc: Samuel Thibault 
Cc: Shriram Rajagopalan 
Cc: Stefano Stabellini 
Cc: Stewart Hildebrand 
Cc: Tamas K Lengyel 
Cc: Tim Deegan 
Cc: Volodymyr Babchuk 
Cc: Wei Liu 
---
 CHANGELOG.md | 26 ++
 1 file changed, 26 insertions(+)

diff --git a/CHANGELOG.md b/CHANGELOG.md
index ad1a8c2bc2..d312ddf627 100644
--- a/CHANGELOG.md
+++ b/CHANGELOG.md
@@ -21,6 +21,32 @@ The format is based on [Keep a 
Changelog](https://keepachangelog.com/en/1.0.0/)
  - qemu-traditional based device models (both, qemu-traditional and 
ioemu-stubdom) will
no longer be built per default. In order to be able to use those, configure 
needs to
be called with "--enable-qemu-traditional" as parameter.
+ - Fixes for credit2 scheduler stability in corner case conditions.
+ - Ongoing improvements in the hypervisor build system.
+ - vtpmmgr miscellaneous fixes in preparation for TPM 2.0 support.
+ - 32bit PV guests only supported in shim mode.
+ - Improved PVH dom0 debug key handling.
+ - Fix booting on some Intel systems without a PIT (i8254).
+ - Do not build qemu-traditional or pvgrub by default.
+ - Cleanup of the xenstore library interface.
+ - Fix truncation of return value from xencall2 by introducing a new helper
+   that returns a long instead.
+ - Fix system register accesses on Arm to use the proper 32/64bit access size.
+ - Various fixes for Arm OP-TEE mediator.
+ - Switch to domheap for Xen page tables.
+
+### Added
+ - 32bit Arm builds to the gitlab-ci automated tests.
+ - x86 full system tests to the gitlab-ci automated tests.
+ - Arm limited vPMU support for guests.
+ - Static physical memory allocation for dom0less on arm64.
+ - dom0less EFI support on arm64.
+ - GICD_ICPENDR register handling in vGIC emulation to support Zephyr OS.
+ - CPU feature leveling on arm64 platform with heterogeneous cores.
+ - Report unpopulated memory regions safe to use for external mappings, Arm and
+   device tree only.
+ - Support of generic DT IOMMU bindings for Arm SMMU v2.
+ - Limit grant table version on a per-domain basis.
 
 ## 
[4.15.0](https://xenbits.xen.org/gitweb/?p=xen.git;a=shortlog;h=RELEASE-4.15.0) 
- 2021-04-08
 
-- 
2.33.0




Xen 4.16 RC4

2021-11-23 Thread Ian Jackson
Xen 4.16 RC4 is now available.

It is available from git:
  git clone https://xenbits.xenproject.org/git-http/xen.git -b 4.16.0-rc4

For your convenience a tarball is available:
  https://downloads.xenproject.org/release/xen/4.16.0-rc4/xen-4.16.0-rc4.tar.gz
  
https://downloads.xenproject.org/release/xen/4.16.0-rc4/xen-4.16.0-rc4.tar.gz.sig

This may be the last RC before release.

Ian.
Xen 4.16 Release Manager



Re: [PATCH v6 3/7] xen/arm: setup MMIO range trap handlers for hardware domain

2021-11-23 Thread Julien Grall




On 23/11/2021 06:58, Oleksandr Andrushchenko wrote:

Hi, Julien!


Hi Oleksandr,


On 22.11.21 19:36, Julien Grall wrote:

On 18/11/2021 10:46, Oleksandr Andrushchenko wrote:

On 18.11.21 09:27, Oleksandr Andrushchenko wrote:

+    unsigned int count;
+
+    if ( is_hardware_domain(d) )
+    /* For each PCI host bridge's configuration space. */
+    count = pci_host_get_num_bridges();

This first part makes sense to me. But...


+    else

... I don't understand how the else is related to this commit. Can you clarify 
it?


+    /*
+ * There's a single MSI-X MMIO handler that deals with both PBA
+ * and MSI-X tables per each PCI device being passed through.
+ * Maximum number of supported devices is 32 as virtual bus
+ * topology emulates the devices as embedded endpoints.
+ * +1 for a single emulated host bridge's configuration space.
+ */
+    count = 1;
+#ifdef CONFIG_HAS_PCI_MSI
+    count += 32;

Surely, this is a decision that is based on other factor in the vPCI code. So 
can use a define and avoid hardcoding the number?

Well, in the later series [1] this is defined via PCI_SLOT(~0) + 1 and there is 
no dedicated
constant for that. I can use the same here, e.g. s/32/PCI_SLOT(~0) + 1


I would prefer if we introduce a new constant for that. This makes it easier to 
update the code if we decide to increase the number of virtual devices.

However, I am still not sure how the 'else' part is related to this commit. Can 
you please clarify it?

Well, yes, this is too early for this patch to introduce some future knowledge, 
so I'll instead have:

unsigned int domain_vpci_get_num_mmio_handlers(struct domain *d)
{
      if ( !has_vpci(d) )
      return 0;

      if ( is_hardware_domain(d) )
      {
      int ret = pci_host_iterate_bridges_and_count(d, 
vpci_get_num_handlers_cb);

      return ret < 0 ? 0 : ret;
      }

      /*
   * This is a guest domain:
   *
   * 1 for a single emulated host bridge's configuration space.
   */
      return 1;


I am afraid that my question stands even with this approach. This patch 
is only meant to handle the hardware domain, therefore the change seems 
to be out of context.


I would prefer if this change is done separately.

Cheers,

--
Julien Grall



Re: [PATCH v6 2/7] xen/arm: add pci-domain for disabled devices

2021-11-23 Thread Julien Grall

Hi Oleksandr,

On 23/11/2021 06:31, Oleksandr Andrushchenko wrote:



On 22.11.21 19:17, Julien Grall wrote:

Hi,

On 22/11/2021 16:23, Oleksandr Andrushchenko wrote:

On 22.11.21 17:29, Julien Grall wrote:

I would prefer to go with my way. This can be refined in the future if we find 
Device-Tree that matches what you wrote.

Ok, so just to make it clear:
   >a PCI device would always be described as a child of the hostbridges. So I would 
rework the 'if' to also check if the parent type is not "pci"


That's correct. Thank you!

Ok, so how about
      if ( is_pci_passthrough_enabled() && dt_device_type_is_equal(node, "pci") 
)
      {
      bool skip = false;

      /*
   * If the parent is also a "pci" device, then "linux,pci-domain"
   * should already be there, so nothing to do then.
   */


This comment is a bit confusing. I think what matters is that if the parent is also a 
"pci" device, then the current node cannot be a hostbridge. So we can 
skip it. However...



      if ( node->parent && dt_device_type_is_equal(node->parent, "pci") )
      skip = true;

      if ( !skip && !dt_find_property(node, "linux,pci-domain", NULL) )
      {
I played with a single if and it looks scary...


... how about introducing a helper that will return true if this node 
is likely a hostbridge?


Cheers,

--
Julien Grall



Re: [PATCH v4 05/11] vpci/header: implement guest BAR register handlers

2021-11-23 Thread Oleksandr Andrushchenko
Hi, Roger!

On 19.11.21 15:02, Jan Beulich wrote:
> On 19.11.2021 13:54, Oleksandr Andrushchenko wrote:
>> On 19.11.21 14:49, Jan Beulich wrote:
>>> On 19.11.2021 13:46, Oleksandr Andrushchenko wrote:
 On 19.11.21 14:37, Jan Beulich wrote:
> On 19.11.2021 13:10, Oleksandr Andrushchenko wrote:
>> On 19.11.21 13:58, Jan Beulich wrote:
>>> On 05.11.2021 07:56, Oleksandr Andrushchenko wrote:
 --- a/xen/drivers/vpci/header.c
 +++ b/xen/drivers/vpci/header.c
 @@ -408,6 +408,48 @@ static void bar_write(const struct pci_dev *pdev, 
 unsigned int reg,
  pci_conf_write32(pdev->sbdf, reg, val);
  }
  
 +static void guest_bar_write(const struct pci_dev *pdev, unsigned int 
 reg,
 +uint32_t val, void *data)
 +{
 +struct vpci_bar *bar = data;
 +bool hi = false;
 +
 +if ( bar->type == VPCI_BAR_MEM64_HI )
 +{
 +ASSERT(reg > PCI_BASE_ADDRESS_0);
 +bar--;
 +hi = true;
 +}
 +else
 +{
 +val &= PCI_BASE_ADDRESS_MEM_MASK;
 +val |= bar->type == VPCI_BAR_MEM32 ? 
 PCI_BASE_ADDRESS_MEM_TYPE_32
 +   : 
 PCI_BASE_ADDRESS_MEM_TYPE_64;
 +val |= bar->prefetchable ? PCI_BASE_ADDRESS_MEM_PREFETCH : 0;
 +}
 +
 +bar->guest_addr &= ~(0xffffffffull << (hi ? 32 : 0));
 +bar->guest_addr |= (uint64_t)val << (hi ? 32 : 0);
 +
 +bar->guest_addr &= ~(bar->size - 1) | ~PCI_BASE_ADDRESS_MEM_MASK;
 +}
 +
 +static uint32_t guest_bar_read(const struct pci_dev *pdev, unsigned 
 int reg,
 +   void *data)
 +{
 +const struct vpci_bar *bar = data;
 +bool hi = false;
 +
 +if ( bar->type == VPCI_BAR_MEM64_HI )
 +{
 +ASSERT(reg > PCI_BASE_ADDRESS_0);
 +bar--;
 +hi = true;
 +}
 +
 +return bar->guest_addr >> (hi ? 32 : 0);
>>> I'm afraid "guest_addr" then isn't the best name; maybe "guest_val"?
>>> This would make it more obvious that there is a meaningful difference
>>> from "addr" besides the guest vs host aspect.
>> I am not sure I can agree here:
>> bar->addr and bar->guest_addr make it clear what these are, while
>> bar->addr and bar->guest_val would make someone go look for
>> additional information about what that val is for.
> Feel free to replace "val" with something more suitable. "guest_bar"
> maybe? The value definitely is not an address, so "addr" seems
> inappropriate / misleading to me.
 This is a guest's view on the BAR's address. So to me it is still 
 guest_addr
>>> It's a guest's view on the BAR, not just the address. Or else you couldn't
>>> simply return the value here without folding in the correct low bits.
>> I agree in this respect, as it is indeed the address + lower bits.
>> How about guest_bar_val then? So it reflects its nature, e.g. the value
>> of the BAR as seen by the guest.
> Gets a little longish for my taste. I for one wouldn't mind it being just
> "guest". In the end Roger has the final say here anyway.
What is your preference on naming here?
1. guest_addr
2. guest_val
3. guest_bar_val
4. guest
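
For context, an illustrative example (not from the patch) of what the field
discussed above ends up holding for a 32-bit prefetchable BAR programmed at
0xe0000000:

    /* Illustrative only: the guest-visible register value, i.e. the address
     * plus the low type/prefetch bits, not just the address. */
    uint64_t guest_view = 0xe0000000 | PCI_BASE_ADDRESS_MEM_TYPE_32 |
                          PCI_BASE_ADDRESS_MEM_PREFETCH; /* == 0xe0000008 */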
>
> Jan
>
Thank you in advance,
Oleksandr

Re: [RFC PATCH 0/2] Boot time cpupools

2021-11-23 Thread Bertrand Marquis
Hi Julien,

> On 23 Nov 2021, at 13:54, Julien Grall  wrote:
> 
> Hi Stefano,
> 
> On 19/11/2021 18:55, Stefano Stabellini wrote:
>> On Fri, 19 Nov 2021, Julien Grall wrote:
>>> I like better Juergen's version. But before agreeing on the command line , I
>>> would like to understand how Linux is dealing with big.LITTLE today (see my
>>> question above).
>> I also like Juergen's version better :-)
>> The device tree binding that covers big.LITTLE is the same that covers
>> cpus: Documentation/devicetree/bindings/arm/cpus.yaml
> 
> Are you sure? I found Documentation/devicetree/bindings/arm/cpu-capacity.txt.
> 
>> So basically, you can tell it is a big.LITTLE system because the
>> compatible strings differ between certain cpus, e.g.:
>>   cpu@0 {
>> device_type = "cpu";
>> compatible = "arm,cortex-a15";
>> reg = <0x0>;
>>   };
>>   cpu@1 {
>> device_type = "cpu";
>> compatible = "arm,cortex-a15";
>> reg = <0x1>;
>>   };
>>   cpu@100 {
>> device_type = "cpu";
>> compatible = "arm,cortex-a7";
>> reg = <0x100>;
>>   };
>>   cpu@101 {
>> device_type = "cpu";
>> compatible = "arm,cortex-a7";
>> reg = <0x101>;
>>   };
> 
> I have not found any code in Linux using the compatible. Instead, they all 
> seem to use the cpu-map (see drivers/base/arch_topology.c).
> 
> Anyway, to me it would be better to parse the Device-Tree over using the 
> MIDR. The advantage is we can cover a wide range of cases (you may have 
> processor with the same MIDR but different frequency). For now, we could 
> create a cpupool per cluster.

This is a really good idea as this could also work for NUMA systems.

So reg & ~0xff would give the cluster number ?

We will probably end up splitting cores into different cpupools even if they 
are all identical, as several SoCs have multiple clusters made of the same 
core type (I recall some NXP boards being like that).

Some device trees also have a cpu-map definition:
cpu-map {
cluster0 {
core0 {
cpu = <_0>;
};
core1 {
cpu = <_1>;
};
};

cluster1 {
core0 {
cpu = <_0>;
};
core1 {
cpu = <_1>;
};
core2 {
cpu = <_2>;
};
core3 {
cpu = <_3>;
};
};
};

@stefano: is the cpu-map something standard? Lots of device trees do have it 
in Linux now, but I do not recall seeing it in older device trees.

Maybe using cpu-map could be a solution: we could say that device trees without 
a cpu-map are not supported.
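
A rough sketch (illustrative only, using Xen's device-tree helpers; the actual
cpupool creation is left out) of how cluster membership could be derived from
such a cpu-map node:

    /* Sketch: print which cpu-map cluster each CPU node belongs to. */
    static void __init dump_cpu_map_clusters(void)
    {
        const struct dt_device_node *map, *cluster, *core;
        unsigned int cluster_id = 0;

        map = dt_find_node_by_path("/cpus/cpu-map");
        if ( !map )
            return; /* No cpu-map: fall back to MPIDR/reg based grouping. */

        dt_for_each_child_node(map, cluster)
        {
            if ( strncmp(dt_node_name(cluster), "cluster", 7) )
                continue;

            dt_for_each_child_node(cluster, core)
            {
                const struct dt_device_node *cpu =
                    dt_parse_phandle(core, "cpu", 0);

                if ( cpu )
                    printk("%s -> cluster %u\n", dt_node_full_name(cpu),
                           cluster_id);
            }

            cluster_id++;
        }
    }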

Cheers
Bertrand



> 
> Cheers,
> 
> -- 
> Julien Grall




[PATCH V2 5/6] net: netvsc: Add Isolation VM support for netvsc driver

2021-11-23 Thread Tianyu Lan
From: Tianyu Lan 

In Isolation VM, all memory shared with the host needs to be marked
visible to the host via a hypercall. vmbus_establish_gpadl() has already
done this for the netvsc rx/tx ring buffers. The page buffers used by
vmbus_sendpacket_pagebuffer() still need to be handled. Use the DMA API
to map/unmap this memory when sending/receiving packets; the Hyper-V
swiotlb bounce buffer DMA address will be returned. The swiotlb bounce
buffer has been marked visible to the host during boot.

Allocate the rx/tx ring buffers via dma_alloc_noncontiguous() in Isolation
VM. After calling vmbus_establish_gpadl(), which marks these pages visible
to the host, map these pages into the unencrypted address space via
dma_vmap_noncontiguous().

Signed-off-by: Tianyu Lan 
---
 drivers/net/hyperv/hyperv_net.h   |   5 +
 drivers/net/hyperv/netvsc.c   | 192 +++---
 drivers/net/hyperv/rndis_filter.c |   2 +
 include/linux/hyperv.h|   6 +
 4 files changed, 190 insertions(+), 15 deletions(-)

diff --git a/drivers/net/hyperv/hyperv_net.h b/drivers/net/hyperv/hyperv_net.h
index 315278a7cf88..31c77a00d01e 100644
--- a/drivers/net/hyperv/hyperv_net.h
+++ b/drivers/net/hyperv/hyperv_net.h
@@ -164,6 +164,7 @@ struct hv_netvsc_packet {
u32 total_bytes;
u32 send_buf_index;
u32 total_data_buflen;
+   struct hv_dma_range *dma_range;
 };
 
 #define NETVSC_HASH_KEYLEN 40
@@ -1074,6 +1075,7 @@ struct netvsc_device {
 
/* Receive buffer allocated by us but manages by NetVSP */
void *recv_buf;
+   struct sg_table *recv_sgt;
u32 recv_buf_size; /* allocated bytes */
struct vmbus_gpadl recv_buf_gpadl_handle;
u32 recv_section_cnt;
@@ -1082,6 +1084,7 @@ struct netvsc_device {
 
/* Send buffer allocated by us */
void *send_buf;
+   struct sg_table *send_sgt;
u32 send_buf_size;
struct vmbus_gpadl send_buf_gpadl_handle;
u32 send_section_cnt;
@@ -1731,4 +1734,6 @@ struct rndis_message {
 #define RETRY_US_HI1
 #define RETRY_MAX  2000/* >10 sec */
 
+void netvsc_dma_unmap(struct hv_device *hv_dev,
+ struct hv_netvsc_packet *packet);
 #endif /* _HYPERV_NET_H */
diff --git a/drivers/net/hyperv/netvsc.c b/drivers/net/hyperv/netvsc.c
index 396bc1c204e6..9cdc71930830 100644
--- a/drivers/net/hyperv/netvsc.c
+++ b/drivers/net/hyperv/netvsc.c
@@ -20,6 +20,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #include 
 #include 
@@ -146,15 +147,39 @@ static struct netvsc_device *alloc_net_device(void)
return net_device;
 }
 
+static struct hv_device *netvsc_channel_to_device(struct vmbus_channel 
*channel)
+{
+   struct vmbus_channel *primary = channel->primary_channel;
+
+   return primary ? primary->device_obj : channel->device_obj;
+}
+
 static void free_netvsc_device(struct rcu_head *head)
 {
struct netvsc_device *nvdev
= container_of(head, struct netvsc_device, rcu);
+   struct hv_device *dev =
+   netvsc_channel_to_device(nvdev->chan_table[0].channel);
int i;
 
kfree(nvdev->extension);
-   vfree(nvdev->recv_buf);
-   vfree(nvdev->send_buf);
+
+   if (nvdev->recv_sgt) {
+   dma_vunmap_noncontiguous(&dev->device, nvdev->recv_buf);
+   dma_free_noncontiguous(&dev->device, nvdev->recv_buf_size,
+  nvdev->recv_sgt, DMA_FROM_DEVICE);
+   } else {
+   vfree(nvdev->recv_buf);
+   }
+
+   if (nvdev->send_sgt) {
+   dma_vunmap_noncontiguous(&dev->device, nvdev->send_buf);
+   dma_free_noncontiguous(&dev->device, nvdev->send_buf_size,
+  nvdev->send_sgt, DMA_TO_DEVICE);
+   } else {
+   vfree(nvdev->send_buf);
+   }
+
kfree(nvdev->send_section_map);
 
for (i = 0; i < VRSS_CHANNEL_MAX; i++) {
@@ -348,7 +373,21 @@ static int netvsc_init_buf(struct hv_device *device,
buf_size = min_t(unsigned int, buf_size,
 NETVSC_RECEIVE_BUFFER_SIZE_LEGACY);
 
-   net_device->recv_buf = vzalloc(buf_size);
+   if (hv_isolation_type_snp()) {
+   net_device->recv_sgt =
+   dma_alloc_noncontiguous(&device->device, buf_size,
+   DMA_FROM_DEVICE, GFP_KERNEL, 0);
+   if (!net_device->recv_sgt) {
+   pr_err("Fail to allocate recv buffer buf_size %d.\n.", 
buf_size);
+   ret = -ENOMEM;
+   goto cleanup;
+   }
+
+   net_device->recv_buf = (void *)net_device->recv_sgt->sgl->dma_address;
+   } else {
+   net_device->recv_buf = vzalloc(buf_size);
+   }
+
if (!net_device->recv_buf) {
netdev_err(ndev,
   "unable to allocate receive buffer of size %u\n",
@@ -357,8 +396,6 @@ static int netvsc_init_buf(struct hv_device *device,
  

[PATCH V2 6/6] scsi: storvsc: Add Isolation VM support for storvsc driver

2021-11-23 Thread Tianyu Lan
From: Tianyu Lan 

In an Isolation VM, all memory shared with the host needs to be marked
visible to the host via a hypercall. vmbus_establish_gpadl() already
does this for the storvsc rx/tx ring buffers. The page buffers used by
vmbus_sendpacket_mpb_desc() still need to be handled. Use the DMA API
(scsi_dma_map/unmap) to map this memory when sending/receiving packets
and return the swiotlb bounce buffer DMA address. In an Isolation VM,
the swiotlb bounce buffer is marked visible to the host and swiotlb
force mode is enabled.

Set the device's DMA min align mask to HV_HYP_PAGE_SIZE - 1 in order to
keep the original data offset in the bounce buffer.
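
For illustration only (not part of this patch), a minimal sketch of how
such an align mask is typically set during probe; the helper name and
call site are assumptions:

	#include <linux/dma-mapping.h>
	#include <linux/hyperv.h>
	#include <asm/hyperv-tlfs.h>

	/*
	 * Sketch: keep swiotlb bounce-buffer offsets aligned to Hyper-V
	 * pages. Assumes dev->dma_parms has already been populated (the
	 * vmbus core does that elsewhere in this series).
	 */
	static int storvsc_set_bounce_alignment(struct hv_device *device)
	{
		return dma_set_min_align_mask(&device->device,
					      HV_HYP_PAGE_SIZE - 1);
	}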

Signed-off-by: Tianyu Lan 
---
 drivers/scsi/storvsc_drv.c | 37 +
 include/linux/hyperv.h |  1 +
 2 files changed, 22 insertions(+), 16 deletions(-)

diff --git a/drivers/scsi/storvsc_drv.c b/drivers/scsi/storvsc_drv.c
index 20595c0ba0ae..ae293600d799 100644
--- a/drivers/scsi/storvsc_drv.c
+++ b/drivers/scsi/storvsc_drv.c
@@ -21,6 +21,8 @@
 #include 
 #include 
 #include 
+#include 
+
 #include 
 #include 
 #include 
@@ -1336,6 +1338,7 @@ static void storvsc_on_channel_callback(void *context)
continue;
}
request = (struct storvsc_cmd_request *)scsi_cmd_priv(scmnd);
+   scsi_dma_unmap(scmnd);
}
 
storvsc_on_receive(stor_device, packet, request);
@@ -1749,7 +1752,6 @@ static int storvsc_queuecommand(struct Scsi_Host *host, struct scsi_cmnd *scmnd)
struct hv_host_device *host_dev = shost_priv(host);
struct hv_device *dev = host_dev->dev;
struct storvsc_cmd_request *cmd_request = scsi_cmd_priv(scmnd);
-   int i;
struct scatterlist *sgl;
unsigned int sg_count;
struct vmscsi_request *vm_srb;
@@ -1831,10 +1833,11 @@ static int storvsc_queuecommand(struct Scsi_Host *host, struct scsi_cmnd *scmnd)
payload_sz = sizeof(cmd_request->mpb);
 
if (sg_count) {
-   unsigned int hvpgoff, hvpfns_to_add;
unsigned long offset_in_hvpg = offset_in_hvpage(sgl->offset);
unsigned int hvpg_count = HVPFN_UP(offset_in_hvpg + length);
-   u64 hvpfn;
+   struct scatterlist *sg;
+   unsigned long hvpfn, hvpfns_to_add;
+   int j, i = 0;
 
if (hvpg_count > MAX_PAGE_BUFFER_COUNT) {
 
@@ -1848,21 +1851,22 @@ static int storvsc_queuecommand(struct Scsi_Host *host, struct scsi_cmnd *scmnd)
payload->range.len = length;
payload->range.offset = offset_in_hvpg;
 
+   sg_count = scsi_dma_map(scmnd);
+   if (sg_count < 0)
+   return SCSI_MLQUEUE_DEVICE_BUSY;
 
-   for (i = 0; sgl != NULL; sgl = sg_next(sgl)) {
+   for_each_sg(sgl, sg, sg_count, j) {
/*
-* Init values for the current sgl entry. hvpgoff
-* and hvpfns_to_add are in units of Hyper-V size
-* pages. Handling the PAGE_SIZE != HV_HYP_PAGE_SIZE
-* case also handles values of sgl->offset that are
-* larger than PAGE_SIZE. Such offsets are handled
-* even on other than the first sgl entry, provided
-* they are a multiple of PAGE_SIZE.
+* Init values for the current sgl entry. hvpfns_to_add
+* is in units of Hyper-V size pages. Handling the
+* PAGE_SIZE != HV_HYP_PAGE_SIZE case also handles
+* values of sgl->offset that are larger than PAGE_SIZE.
+* Such offsets are handled even on other than the first
+* sgl entry, provided they are a multiple of PAGE_SIZE.
 */
-   hvpgoff = HVPFN_DOWN(sgl->offset);
-   hvpfn = page_to_hvpfn(sg_page(sgl)) + hvpgoff;
-   hvpfns_to_add = HVPFN_UP(sgl->offset + sgl->length) -
-   hvpgoff;
+   hvpfn = HVPFN_DOWN(sg_dma_address(sg));
+   hvpfns_to_add = HVPFN_UP(sg_dma_address(sg) +
+sg_dma_len(sg)) - hvpfn;
 
/*
 * Fill the next portion of the PFN array with
@@ -1872,7 +1876,7 @@ static int storvsc_queuecommand(struct Scsi_Host *host, struct scsi_cmnd *scmnd)
 * the PFN array is filled.
 */
while (hvpfns_to_add--)
-   payload->range.pfn_array[i++] = hvpfn++;
+   payload->range.pfn_array[i++] = hvpfn++;
}
}
 
@@ -2016,6 

[PATCH V2 4/6] hyperv/IOMMU: Enable swiotlb bounce buffer for Isolation VM

2021-11-23 Thread Tianyu Lan
From: Tianyu Lan 

A Hyper-V Isolation VM requires bounce buffer support to copy data
from/to encrypted memory, so enable swiotlb force mode to use the
swiotlb bounce buffer for DMA transactions.

In an Isolation VM with AMD SEV, the bounce buffer needs to be accessed
via an extra address space which is above shared_gpa_boundary (e.g. the
39th address bit) reported by the Hyper-V CPUID ISOLATION_CONFIG leaf.
The accessed physical address is the original physical address plus
shared_gpa_boundary. In the AMD SEV-SNP spec shared_gpa_boundary is
called the virtual top of memory (vTOM). Memory addresses below vTOM
are automatically treated as private while memory above vTOM is treated
as shared.

Hyper-V initializes its own swiotlb bounce buffer and the default
swiotlb needs to be disabled. pci_swiotlb_detect_override() and
pci_swiotlb_detect_4gb() enable the default one. To override the
setting, hyperv_swiotlb_detect() needs to run before these detect
functions, which depend on pci_xen_swiotlb_init(). Make
pci_xen_swiotlb_init() depend on hyperv_swiotlb_detect() to keep the
order.

The swiotlb bounce buffer code calls set_memory_decrypted() to mark the
bounce buffer visible to the host and maps it in the extra address
space via memremap. Populate the shared_gpa_boundary (vTOM) via the
swiotlb_unencrypted_base variable.

memremap() can't be used in the early hyperv_iommu_swiotlb_init() path,
so call swiotlb_update_mem_attributes() from
hyperv_iommu_swiotlb_later_init() instead.

Add Hyper-V DMA ops and provide alloc/free and vmap/vunmap
noncontiguous callbacks to handle requests to allocate and map
noncontiguous DMA memory in vmbus device drivers. The netvsc driver
will use this. Set the dma_ops_bypass flag for hv devices so that DMA
direct functions are used when mapping/unmapping DMA pages.
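
The drivers/iommu/hyperv-iommu.c hunk is truncated further down in this
archive. As a rough illustration only (not the patch itself; the
function name and use of ms_hyperv.shared_gpa_boundary below are
assumptions), a vmap_noncontiguous callback can remap the allocation
above shared_gpa_boundary along these lines:

	#include <linux/mm.h>
	#include <linux/slab.h>
	#include <linux/vmalloc.h>
	#include <linux/scatterlist.h>
	#include <asm/mshyperv.h>

	/* Sketch: map a noncontiguous allocation above vTOM via vmap_pfn(). */
	static void *hv_vmap_above_vtom(struct sg_table *sgt, size_t size)
	{
		unsigned int i, j, n = 0, count = PAGE_ALIGN(size) >> PAGE_SHIFT;
		unsigned long vtom_pfn = ms_hyperv.shared_gpa_boundary >> PAGE_SHIFT;
		struct scatterlist *sg;
		unsigned long *pfns;
		void *vaddr;

		pfns = kcalloc(count, sizeof(*pfns), GFP_KERNEL);
		if (!pfns)
			return NULL;

		/* Shift every PFN above the shared GPA boundary (vTOM). */
		for_each_sg(sgt->sgl, sg, sgt->orig_nents, i)
			for (j = 0; j < sg->length >> PAGE_SHIFT && n < count; j++)
				pfns[n++] = page_to_pfn(sg_page(sg)) + j + vtom_pfn;

		vaddr = vmap_pfn(pfns, count, PAGE_KERNEL_IO); /* needs VMAP_PFN */
		kfree(pfns);
		return vaddr;
	}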

Signed-off-by: Tianyu Lan 
---
Change since v1:
* Remove hv isolation check in the sev_setup_arch()

 arch/x86/mm/mem_encrypt.c  |   1 +
 arch/x86/xen/pci-swiotlb-xen.c |   3 +-
 drivers/hv/Kconfig |   1 +
 drivers/hv/vmbus_drv.c |   6 ++
 drivers/iommu/hyperv-iommu.c   | 164 +
 include/linux/hyperv.h |  10 ++
 6 files changed, 184 insertions(+), 1 deletion(-)

diff --git a/arch/x86/mm/mem_encrypt.c b/arch/x86/mm/mem_encrypt.c
index 35487305d8af..e48c73b3dd41 100644
--- a/arch/x86/mm/mem_encrypt.c
+++ b/arch/x86/mm/mem_encrypt.c
@@ -31,6 +31,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #include "mm_internal.h"
 
diff --git a/arch/x86/xen/pci-swiotlb-xen.c b/arch/x86/xen/pci-swiotlb-xen.c
index 46df59aeaa06..30fd0600b008 100644
--- a/arch/x86/xen/pci-swiotlb-xen.c
+++ b/arch/x86/xen/pci-swiotlb-xen.c
@@ -4,6 +4,7 @@
 
 #include 
 #include 
+#include 
 #include 
 
 #include 
@@ -91,6 +92,6 @@ int pci_xen_swiotlb_init_late(void)
 EXPORT_SYMBOL_GPL(pci_xen_swiotlb_init_late);
 
 IOMMU_INIT_FINISH(pci_xen_swiotlb_detect,
- NULL,
+ hyperv_swiotlb_detect,
  pci_xen_swiotlb_init,
  NULL);
diff --git a/drivers/hv/Kconfig b/drivers/hv/Kconfig
index dd12af20e467..d43b4cd88f57 100644
--- a/drivers/hv/Kconfig
+++ b/drivers/hv/Kconfig
@@ -9,6 +9,7 @@ config HYPERV
select PARAVIRT
select X86_HV_CALLBACK_VECTOR if X86
select VMAP_PFN
+   select DMA_OPS_BYPASS
help
  Select this option to run Linux as a Hyper-V client operating
  system.
diff --git a/drivers/hv/vmbus_drv.c b/drivers/hv/vmbus_drv.c
index 392c1ac4f819..32dc193e31cd 100644
--- a/drivers/hv/vmbus_drv.c
+++ b/drivers/hv/vmbus_drv.c
@@ -33,6 +33,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include "hyperv_vmbus.h"
 
@@ -2078,6 +2079,7 @@ struct hv_device *vmbus_device_create(const guid_t *type,
return child_device_obj;
 }
 
+static u64 vmbus_dma_mask = DMA_BIT_MASK(64);
 /*
  * vmbus_device_register - Register the child device
  */
@@ -2118,6 +2120,10 @@ int vmbus_device_register(struct hv_device *child_device_obj)
}
hv_debug_add_dev_dir(child_device_obj);
 
+   child_device_obj->device.dma_ops_bypass = true;
+   child_device_obj->device.dma_ops = _iommu_dma_ops;
+   child_device_obj->device.dma_mask = _dma_mask;
+   child_device_obj->device.dma_parms = _device_obj->dma_parms;
return 0;
 
 err_kset_unregister:
diff --git a/drivers/iommu/hyperv-iommu.c b/drivers/iommu/hyperv-iommu.c
index e285a220c913..ebcb628e7e8f 100644
--- a/drivers/iommu/hyperv-iommu.c
+++ b/drivers/iommu/hyperv-iommu.c
@@ -13,14 +13,21 @@
 #include 
 #include 
 #include 
+#include 
+#include 
 
 #include 
 #include 
 #include 
 #include 
+#include 
+#include 
 #include 
 #include 
 #include 
+#include 
+#include 
+#include 
 
 #include "irq_remapping.h"
 
@@ -337,4 +344,161 @@ static const struct irq_domain_ops hyperv_root_ir_domain_ops = {
.free = hyperv_root_irq_remapping_free,
 };
 
+static void __init hyperv_iommu_swiotlb_init(void)
+{
+   unsigned long hyperv_io_tlb_size;
+   

[PATCH V2 3/6] x86/hyper-v: Add hyperv Isolation VM check in the cc_platform_has()

2021-11-23 Thread Tianyu Lan
From: Tianyu Lan 

Hyper-V provides Isolation VMs which have memory encryption support.
Add hyperv_cc_platform_has() and return true for checks of the
CC_ATTR_GUEST_MEM_ENCRYPT attribute.
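
As a usage illustration only (the caller shown here is an assumption,
not part of this patch), generic code keys off the attribute like this:

	#include <linux/cc_platform.h>

	/* Sketch: decide whether DMA buffers must be shared/decrypted. */
	static bool need_unencrypted_dma(void)
	{
		/* True in a Hyper-V Isolation VM once this patch is applied. */
		return cc_platform_has(CC_ATTR_GUEST_MEM_ENCRYPT);
	}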

Signed-off-by: Tianyu Lan 
---
 arch/x86/kernel/cc_platform.c | 15 +++
 1 file changed, 15 insertions(+)

diff --git a/arch/x86/kernel/cc_platform.c b/arch/x86/kernel/cc_platform.c
index 03bb2f343ddb..f3bb0431f5c5 100644
--- a/arch/x86/kernel/cc_platform.c
+++ b/arch/x86/kernel/cc_platform.c
@@ -11,6 +11,7 @@
 #include 
 #include 
 
+#include 
 #include 
 
 static bool __maybe_unused intel_cc_platform_has(enum cc_attr attr)
@@ -58,9 +59,23 @@ static bool amd_cc_platform_has(enum cc_attr attr)
 #endif
 }
 
+static bool hyperv_cc_platform_has(enum cc_attr attr)
+{
+#ifdef CONFIG_HYPERV
+   if (attr == CC_ATTR_GUEST_MEM_ENCRYPT)
+   return true;
+   else
+   return false;
+#else
+   return false;
+#endif
+}
 
 bool cc_platform_has(enum cc_attr attr)
 {
+   if (hv_is_isolation_supported())
+   return hyperv_cc_platform_has(attr);
+
if (sme_me_mask)
return amd_cc_platform_has(attr);
 
-- 
2.25.1




[PATCH V2 1/6] Swiotlb: Add Swiotlb bounce buffer remap function for HV IVM

2021-11-23 Thread Tianyu Lan
From: Tianyu Lan 

In an Isolation VM with AMD SEV, the bounce buffer needs to be accessed
via an extra address space which is above shared_gpa_boundary (e.g. the
39th address bit) reported by the Hyper-V CPUID ISOLATION_CONFIG leaf.
The accessed physical address is the original physical address plus
shared_gpa_boundary. In the AMD SEV-SNP spec shared_gpa_boundary is
called the virtual top of memory (vTOM). Memory addresses below vTOM
are automatically treated as private while memory above vTOM is treated
as shared.

Expose swiotlb_unencrypted_base for platforms to set the unencrypted
memory base offset; the platform then calls
swiotlb_update_mem_attributes() to remap the swiotlb memory into the
unencrypted address space. memremap() cannot be called at the early
stage, so the remapping code lives in swiotlb_update_mem_attributes().
Store the remapped address and use it to copy data from/to the swiotlb
bounce buffer.
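
As an illustration of the intended platform-side usage (a sketch only;
the function below is hypothetical and the boundary value comes from
the platform):

	#include <linux/swiotlb.h>

	/* Sketch: set the unencrypted base, then let the core remap it. */
	static void __init example_swiotlb_setup(phys_addr_t shared_gpa_boundary)
	{
		swiotlb_unencrypted_base = shared_gpa_boundary;

		/* Later (once memremap() works) the pool is remapped and cleared. */
		swiotlb_update_mem_attributes();
	}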

Signed-off-by: Tianyu Lan 
---
Change since v1:
* Rework comment in the swiotlb_init_io_tlb_mem()
* Make swiotlb_init_io_tlb_mem() back to return void.
---
 include/linux/swiotlb.h |  6 +
 kernel/dma/swiotlb.c| 53 +
 2 files changed, 54 insertions(+), 5 deletions(-)

diff --git a/include/linux/swiotlb.h b/include/linux/swiotlb.h
index 569272871375..f6c3638255d5 100644
--- a/include/linux/swiotlb.h
+++ b/include/linux/swiotlb.h
@@ -73,6 +73,9 @@ extern enum swiotlb_force swiotlb_force;
  * @end:   The end address of the swiotlb memory pool. Used to do a quick
  * range check to see if the memory was in fact allocated by this
  * API.
+ * @vaddr: The vaddr of the swiotlb memory pool. The swiotlb memory pool
+ * may be remapped in the memory encrypted case and store virtual
+ * address for bounce buffer operation.
  * @nslabs:The number of IO TLB blocks (in groups of 64) between @start and
  * @end. For default swiotlb, this is command line adjustable via
  * setup_io_tlb_npages.
@@ -92,6 +95,7 @@ extern enum swiotlb_force swiotlb_force;
 struct io_tlb_mem {
phys_addr_t start;
phys_addr_t end;
+   void *vaddr;
unsigned long nslabs;
unsigned long used;
unsigned int index;
@@ -186,4 +190,6 @@ static inline bool is_swiotlb_for_alloc(struct device *dev)
 }
 #endif /* CONFIG_DMA_RESTRICTED_POOL */
 
+extern phys_addr_t swiotlb_unencrypted_base;
+
 #endif /* __LINUX_SWIOTLB_H */
diff --git a/kernel/dma/swiotlb.c b/kernel/dma/swiotlb.c
index 8e840fbbed7c..c303fdeba82f 100644
--- a/kernel/dma/swiotlb.c
+++ b/kernel/dma/swiotlb.c
@@ -50,6 +50,7 @@
 #include 
 #include 
 
+#include 
 #include 
 #include 
 #include 
@@ -72,6 +73,8 @@ enum swiotlb_force swiotlb_force;
 
 struct io_tlb_mem io_tlb_default_mem;
 
+phys_addr_t swiotlb_unencrypted_base;
+
 /*
  * Max segment that we can provide which (if pages are contingous) will
  * not be bounced (unless SWIOTLB_FORCE is set).
@@ -155,6 +158,31 @@ static inline unsigned long nr_slots(u64 val)
return DIV_ROUND_UP(val, IO_TLB_SIZE);
 }
 
+/*
+ * Remap swiotlb memory in the unencrypted physical address space
+ * when swiotlb_unencrypted_base is set. (e.g. for Hyper-V AMD SEV-SNP
+ * Isolation VMs).
+ */
+void *swiotlb_mem_remap(struct io_tlb_mem *mem, unsigned long bytes)
+{
+   void *vaddr;
+
+   if (swiotlb_unencrypted_base) {
+   phys_addr_t paddr = mem->start + swiotlb_unencrypted_base;
+
+   vaddr = memremap(paddr, bytes, MEMREMAP_WB);
+   if (!vaddr) {
+   pr_err("Failed to map the unencrypted memory %llx size 
%lx.\n",
+  paddr, bytes);
+   return NULL;
+   }
+
+   return vaddr;
+   }
+
+   return phys_to_virt(mem->start);
+}
+
 /*
  * Early SWIOTLB allocation may be too early to allow an architecture to
  * perform the desired operations.  This function allows the architecture to
@@ -172,7 +200,14 @@ void __init swiotlb_update_mem_attributes(void)
vaddr = phys_to_virt(mem->start);
bytes = PAGE_ALIGN(mem->nslabs << IO_TLB_SHIFT);
set_memory_decrypted((unsigned long)vaddr, bytes >> PAGE_SHIFT);
-   memset(vaddr, 0, bytes);
+
+   mem->vaddr = swiotlb_mem_remap(mem, bytes);
+   if (!mem->vaddr) {
+   pr_err("Fail to remap swiotlb mem.\n");
+   return;
+   }
+
+   memset(mem->vaddr, 0, bytes);
 }
 
 static void swiotlb_init_io_tlb_mem(struct io_tlb_mem *mem, phys_addr_t start,
@@ -196,7 +231,18 @@ static void swiotlb_init_io_tlb_mem(struct io_tlb_mem *mem, phys_addr_t start,
mem->slots[i].orig_addr = INVALID_PHYS_ADDR;
mem->slots[i].alloc_size = 0;
}
+
+   /*
+* If swiotlb_unencrypted_base is set, the bounce buffer memory will
+* be remapped and cleared in swiotlb_update_mem_attributes.
+*/
+   if (swiotlb_unencrypted_base)
+   

[PATCH V2 2/6] dma-mapping: Add vmap/vunmap_noncontiguous() callback in dma ops

2021-11-23 Thread Tianyu Lan
From: Tianyu Lan 

The Hyper-V netvsc driver needs to allocate noncontiguous DMA memory
and remap it into the unencrypted address space before sharing it with
the host. Add vmap/vunmap_noncontiguous() callbacks and handle the
remap in the Hyper-V DMA ops callbacks.
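
For context, a minimal driver-side sketch of the noncontiguous DMA API
these callbacks plug into (buffer size and direction are arbitrary;
this is not part of the patch):

	#include <linux/dma-mapping.h>

	/* Sketch: allocate a receive buffer and map it into the kernel. */
	static void *example_alloc_ring(struct device *dev, size_t size,
					struct sg_table **sgt_out)
	{
		struct sg_table *sgt;
		void *vaddr;

		sgt = dma_alloc_noncontiguous(dev, size, DMA_FROM_DEVICE,
					      GFP_KERNEL, 0);
		if (!sgt)
			return NULL;

		/* With the Hyper-V callback this returns the unencrypted mapping. */
		vaddr = dma_vmap_noncontiguous(dev, size, sgt);
		if (!vaddr) {
			dma_free_noncontiguous(dev, size, sgt, DMA_FROM_DEVICE);
			return NULL;
		}

		*sgt_out = sgt;
		return vaddr;
	}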

Signed-off-by: Tianyu Lan 
---
 include/linux/dma-map-ops.h |  3 +++
 kernel/dma/mapping.c| 18 ++
 2 files changed, 17 insertions(+), 4 deletions(-)

diff --git a/include/linux/dma-map-ops.h b/include/linux/dma-map-ops.h
index 0d5b06b3a4a6..f7b9958ca20a 100644
--- a/include/linux/dma-map-ops.h
+++ b/include/linux/dma-map-ops.h
@@ -27,6 +27,9 @@ struct dma_map_ops {
unsigned long attrs);
void (*free_noncontiguous)(struct device *dev, size_t size,
struct sg_table *sgt, enum dma_data_direction dir);
+   void *(*vmap_noncontiguous)(struct device *dev, size_t size,
+   struct sg_table *sgt);
+   void (*vunmap_noncontiguous)(struct device *dev, void *addr);
int (*mmap)(struct device *, struct vm_area_struct *,
void *, dma_addr_t, size_t, unsigned long attrs);
 
diff --git a/kernel/dma/mapping.c b/kernel/dma/mapping.c
index 9478eccd1c8e..7fd751d866cc 100644
--- a/kernel/dma/mapping.c
+++ b/kernel/dma/mapping.c
@@ -674,8 +674,14 @@ void *dma_vmap_noncontiguous(struct device *dev, size_t size,
const struct dma_map_ops *ops = get_dma_ops(dev);
unsigned long count = PAGE_ALIGN(size) >> PAGE_SHIFT;
 
-   if (ops && ops->alloc_noncontiguous)
-   return vmap(sgt_handle(sgt)->pages, count, VM_MAP, PAGE_KERNEL);
+   if (ops) {
+   if (ops->vmap_noncontiguous)
+   return ops->vmap_noncontiguous(dev, size, sgt);
+   else if (ops->alloc_noncontiguous)
+   return vmap(sgt_handle(sgt)->pages, count, VM_MAP,
+   PAGE_KERNEL);
+   }
+
return page_address(sg_page(sgt->sgl));
 }
 EXPORT_SYMBOL_GPL(dma_vmap_noncontiguous);
@@ -684,8 +690,12 @@ void dma_vunmap_noncontiguous(struct device *dev, void *vaddr)
 {
const struct dma_map_ops *ops = get_dma_ops(dev);
 
-   if (ops && ops->alloc_noncontiguous)
-   vunmap(vaddr);
+   if (ops) {
+   if (ops->vunmap_noncontiguous)
+   ops->vunmap_noncontiguous(dev, vaddr);
+   else if (ops->alloc_noncontiguous)
+   vunmap(vaddr);
+   }
 }
 EXPORT_SYMBOL_GPL(dma_vunmap_noncontiguous);
 
-- 
2.25.1




[PATCH V2 0/6] x86/Hyper-V: Add Hyper-V Isolation VM support(Second part)

2021-11-23 Thread Tianyu Lan
From: Tianyu Lan 

Hyper-V provides two kinds of Isolation VMs: VBS (Virtualization-based
Security) and AMD SEV-SNP unenlightened Isolation VMs. This patchset
adds support for these Isolation VMs in Linux.

The memory of these VMs is encrypted and the host can't access guest
memory directly. Hyper-V provides a new host-visibility hvcall, and the
guest needs to issue it to mark memory visible to the host before
sharing that memory with the host. For security, network/storage stack
memory should not be shared with the host, so bounce buffers are
required.

The vmbus channel ring buffer already plays the bounce buffer role
because all data exchanged with the host is copied between the ring
buffer and IO stack memory. So mark the vmbus channel ring buffer
visible.

For an SNP Isolation VM, the guest needs to access the shared memory
via an extra address space which is specified by the Hyper-V CPUID
HYPERV_CPUID_ISOLATION_CONFIG leaf. The physical address used to access
the shared memory is the bounce buffer memory GPA plus the
shared_gpa_boundary reported by CPUID.

This patchset enables the swiotlb bounce buffer for netvsc/storvsc in
Isolation VMs. It adds Hyper-V DMA ops providing
dma_alloc/free_noncontiguous and vmap/vunmap_noncontiguous callbacks.
The rx/tx rings are allocated via dma_alloc_noncontiguous() and mapped
into the extra address space via dma_vmap_noncontiguous().

Change since v1:
 * Add Hyper-V Isolation support check in the cc_platform_has()
   and return true for guest memory encrypt attr.
 * Remove hv isolation check in the sev_setup_arch()

Tianyu Lan (6):
  Swiotlb: Add Swiotlb bounce buffer remap function for HV IVM
  dma-mapping: Add vmap/vunmap_noncontiguous() callback in dma ops
  x86/hyper-v: Add hyperv Isolation VM check in the cc_platform_has()
  hyperv/IOMMU: Enable swiotlb bounce buffer for Isolation VM
  net: netvsc: Add Isolation VM support for netvsc driver
  scsi: storvsc: Add Isolation VM support for storvsc driver

 arch/x86/kernel/cc_platform.c |  15 +++
 arch/x86/mm/mem_encrypt.c |   1 +
 arch/x86/xen/pci-swiotlb-xen.c|   3 +-
 drivers/hv/Kconfig|   1 +
 drivers/hv/vmbus_drv.c|   6 +
 drivers/iommu/hyperv-iommu.c  | 164 +
 drivers/net/hyperv/hyperv_net.h   |   5 +
 drivers/net/hyperv/netvsc.c   | 192 +++---
 drivers/net/hyperv/rndis_filter.c |   2 +
 drivers/scsi/storvsc_drv.c|  37 +++---
 include/linux/dma-map-ops.h   |   3 +
 include/linux/hyperv.h|  17 +++
 include/linux/swiotlb.h   |   6 +
 kernel/dma/mapping.c  |  18 ++-
 kernel/dma/swiotlb.c  |  53 -
 15 files changed, 482 insertions(+), 41 deletions(-)

-- 
2.25.1




Re: [RFC PATCH 0/2] Boot time cpupools

2021-11-23 Thread Julien Grall

Hi Stefano,

On 19/11/2021 18:55, Stefano Stabellini wrote:

On Fri, 19 Nov 2021, Julien Grall wrote:

I like better Juergen's version. But before agreeing on the command line, I
would like to understand how Linux is dealing with big.LITTLE today (see my
question above).


I also like Juergen's version better :-)

The device tree binding that covers big.LITTLE is the same that covers
cpus: Documentation/devicetree/bindings/arm/cpus.yaml


Are you sure? I found 
Documentation/devicetree/bindings/arm/cpu-capacity.txt.




So basically, you can tell it is a big.LITTLE system because the
compatible strings differ between certain cpus, e.g.:

   cpu@0 {
 device_type = "cpu";
 compatible = "arm,cortex-a15";
 reg = <0x0>;
   };

   cpu@1 {
 device_type = "cpu";
 compatible = "arm,cortex-a15";
 reg = <0x1>;
   };

   cpu@100 {
 device_type = "cpu";
 compatible = "arm,cortex-a7";
 reg = <0x100>;
   };

   cpu@101 {
 device_type = "cpu";
 compatible = "arm,cortex-a7";
 reg = <0x101>;
   };


I have not found any code in Linux using the compatible. Instead, they 
all seem to use the cpu-map (see drivers/base/arch_topology.c).


Anyway, to me it would be better to parse the Device-Tree rather than
using the MIDR. The advantage is we can cover a wider range of cases
(you may have processors with the same MIDR but different frequencies).
For now, we could create a cpupool per cluster.


Cheers,

--
Julien Grall



[PATCH 3/3] VT-d: conditionalize IOTLB register offset check

2021-11-23 Thread Jan Beulich
As of commit 6773b1a7584a ("VT-d: Don't assume register-based
invalidation is always supported") we don't (try to) use register based
invalidation anymore when that's not supported by hardware. Hence
there's also no point in the respective check, avoiding pointless IOMMU
initialization failure. After all the spec (version 3.3 at the time of
writing) doesn't say what the respective Extended Capability Register
field would contain in such a case.

Signed-off-by: Jan Beulich 

--- a/xen/drivers/passthrough/vtd/iommu.c
+++ b/xen/drivers/passthrough/vtd/iommu.c
@@ -1228,7 +1228,8 @@ int __init iommu_alloc(struct acpi_drhd_
 
 if ( cap_fault_reg_offset(iommu->cap) +
  cap_num_fault_regs(iommu->cap) * PRIMARY_FAULT_REG_LEN > PAGE_SIZE ||
- ecap_iotlb_offset(iommu->ecap) >= PAGE_SIZE )
+ (has_register_based_invalidation(iommu) &&
+  ecap_iotlb_offset(iommu->ecap) >= PAGE_SIZE) )
 {
 printk(XENLOG_ERR VTDPREFIX "IOMMU: unsupported\n");
 print_iommu_regs(drhd);




[PATCH 2/3] VT-d: correct off-by-1 in fault register range check

2021-11-23 Thread Jan Beulich
All our present implementation requires is that the range fully fits
in a single page. No need to exclude the case of the last register
extending right to the end of that page.

Signed-off-by: Jan Beulich 

--- a/xen/drivers/passthrough/vtd/iommu.c
+++ b/xen/drivers/passthrough/vtd/iommu.c
@@ -1229,7 +1229,7 @@ int __init iommu_alloc(struct acpi_drhd_
 quirk_iommu_caps(iommu);
 
 if ( cap_fault_reg_offset(iommu->cap) +
- cap_num_fault_regs(iommu->cap) * PRIMARY_FAULT_REG_LEN >= PAGE_SIZE ||
+ cap_num_fault_regs(iommu->cap) * PRIMARY_FAULT_REG_LEN > PAGE_SIZE ||
  ecap_iotlb_offset(iommu->ecap) >= PAGE_SIZE )
 {
 printk(XENLOG_ERR VTDPREFIX "IOMMU: unsupported\n");




[PATCH 1/3] VT-d: prune SAGAW recognition

2021-11-23 Thread Jan Beulich
Bit 0 of the capability register field has become reserved at or before
spec version 2.2. Treat it as such. Replace the effective open-coding of
find_first_set_bit(). Adjust local variable types.

Signed-off-by: Jan Beulich 
---
Strictly speaking IOMMUs supporting only 3-level tables ought to result
in guests seeing a suitably reduced physical address width in CPUID.
And then the same would apply to restrictions resulting from MGAW.

--- a/xen/drivers/passthrough/vtd/iommu.c
+++ b/xen/drivers/passthrough/vtd/iommu.c
@@ -356,7 +356,7 @@ static uint64_t domain_pgd_maddr(struct
 pgd_maddr = hd->arch.vtd.pgd_maddr;
 }
 
-/* Skip top levels of page tables for 2- and 3-level DRHDs. */
+/* Skip top level(s) of page tables for less-than-maximum level DRHDs. */
 for ( agaw = level_to_agaw(4);
   agaw != level_to_agaw(nr_pt_levels);
   agaw-- )
@@ -1183,8 +1183,7 @@ static int __init iommu_set_interrupt(st
 int __init iommu_alloc(struct acpi_drhd_unit *drhd)
 {
 struct vtd_iommu *iommu;
-unsigned long sagaw, nr_dom;
-int agaw;
+unsigned int sagaw, agaw = 0, nr_dom;
 
 iommu = xzalloc(struct vtd_iommu);
 if ( iommu == NULL )
@@ -1237,14 +1236,13 @@ int __init iommu_alloc(struct acpi_drhd_
 return -ENODEV;
 }
 
-/* Calculate number of pagetable levels: between 2 and 4. */
+/* Calculate number of pagetable levels: 3 or 4. */
 sagaw = cap_sagaw(iommu->cap);
-for ( agaw = level_to_agaw(4); agaw >= 0; agaw-- )
-if ( test_bit(agaw, ) )
-break;
-if ( agaw < 0 )
+if ( sagaw & 6 )
+agaw = find_first_set_bit(sagaw & 6);
+if ( !agaw )
 {
-printk(XENLOG_ERR VTDPREFIX "IOMMU: unsupported sagaw %lx\n", sagaw);
+printk(XENLOG_ERR VTDPREFIX "IOMMU: unsupported sagaw %x\n", sagaw);
 print_iommu_regs(drhd);
 return -ENODEV;
 }




[PATCH 0/3] VT-d: further small corrections

2021-11-23 Thread Jan Beulich
1: prune SAGAW recognition
2: correct off-by-1 in fault register range check
3: conditionalize IOTLB register offset check

Jan




[PATCH v7 3/3] xen: add Xen pvUSB maintainer

2021-11-23 Thread Juergen Gross
Add myself as maintainer for the Xen pvUSB stuff.

Signed-off-by: Juergen Gross 
Acked-by: Konrad Rzeszutek Wilk 
---
 MAINTAINERS | 8 
 1 file changed, 8 insertions(+)

diff --git a/MAINTAINERS b/MAINTAINERS
index 5250298d2817..da7fd3d32dc4 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -20831,6 +20831,14 @@ F: drivers/scsi/xen-scsifront.c
 F: drivers/xen/xen-scsiback.c
 F: include/xen/interface/io/vscsiif.h
 
+XEN PVUSB DRIVER
+M: Juergen Gross 
+L: xen-devel@lists.xenproject.org (moderated for non-subscribers)
+L: linux-...@vger.kernel.org
+S: Supported
+F: drivers/usb/host/xen*
+F: include/xen/interface/io/usbif.h
+
 XEN SOUND FRONTEND DRIVER
 M: Oleksandr Andrushchenko 
 L: xen-devel@lists.xenproject.org (moderated for non-subscribers)
-- 
2.26.2




[PATCH v7 1/3] usb: Add Xen pvUSB protocol description

2021-11-23 Thread Juergen Gross
Add the definition of pvUSB protocol used between the pvUSB frontend in
a Xen domU and the pvUSB backend in a Xen driver domain (usually Dom0).

This header was originally provided by Fujitsu for Xen based on Linux
2.6.18.

Changes are:
- adapt to Linux kernel style guide
- use Xen namespace
- add lots of comments
- don't use kernel internal defines

Signed-off-by: Juergen Gross 
Reviewed-by: Boris Ostrovsky 
---
 include/xen/interface/io/usbif.h | 405 +++
 1 file changed, 405 insertions(+)
 create mode 100644 include/xen/interface/io/usbif.h

diff --git a/include/xen/interface/io/usbif.h b/include/xen/interface/io/usbif.h
new file mode 100644
index ..a70e0b93178b
--- /dev/null
+++ b/include/xen/interface/io/usbif.h
@@ -0,0 +1,405 @@
+/* SPDX-License-Identifier: MIT */
+
+/*
+ * usbif.h
+ *
+ * USB I/O interface for Xen guest OSes.
+ *
+ * Copyright (C) 2009, FUJITSU LABORATORIES LTD.
+ * Author: Noboru Iwamatsu 
+ */
+
+#ifndef __XEN_PUBLIC_IO_USBIF_H__
+#define __XEN_PUBLIC_IO_USBIF_H__
+
+#include "ring.h"
+#include "../grant_table.h"
+
+/*
+ * Detailed Interface Description
+ * ==
+ * The pvUSB interface is using a split driver design: a frontend driver in
+ * the guest and a backend driver in a driver domain (normally dom0) having
+ * access to the physical USB device(s) being passed to the guest.
+ *
+ * The frontend and backend drivers use XenStore to initiate the connection
+ * between them, the I/O activity is handled via two shared ring pages and an
+ * event channel. As the interface between frontend and backend is at the USB
+ * host connector level, multiple (up to 31) physical USB devices can be
+ * handled by a single connection.
+ *
+ * The Xen pvUSB device name is "qusb", so the frontend's XenStore entries are
+ * to be found under "device/qusb", while the backend's XenStore entries are
+ * under "backend//qusb".
+ *
+ * When a new pvUSB connection is established, the frontend needs to setup the
+ * two shared ring pages for communication and the event channel. The ring
+ * pages need to be made available to the backend via the grant table
+ * interface.
+ *
+ * One of the shared ring pages is used by the backend to inform the frontend
+ * about USB device plug events (device to be added or removed). This is the
+ * "conn-ring".
+ *
+ * The other ring page is used for USB I/O communication (requests and
+ * responses). This is the "urb-ring".
+ *
+ * Feature and Parameter Negotiation
+ * =
+ * The two halves of a Xen pvUSB driver utilize nodes within the XenStore to
+ * communicate capabilities and to negotiate operating parameters. This
+ * section enumerates these nodes which reside in the respective front and
+ * backend portions of the XenStore, following the XenBus convention.
+ *
+ * Any specified default value is in effect if the corresponding XenBus node
+ * is not present in the XenStore.
+ *
+ * XenStore nodes in sections marked "PRIVATE" are solely for use by the
+ * driver side whose XenBus tree contains them.
+ *
+ *
+ *Backend XenBus Nodes
+ *
+ *
+ *-- Backend Device Identification (PRIVATE) --
+ *
+ * num-ports
+ *  Values: unsigned [1...31]
+ *
+ *  Number of ports for this (virtual) USB host connector.
+ *
+ * usb-ver
+ *  Values: unsigned [1...2]
+ *
+ *  USB version of this host connector: 1 = USB 1.1, 2 = USB 2.0.
+ *
+ * port/[1...31]
+ *  Values: string
+ *
+ *  Physical USB device connected to the given port, e.g. "3-1.5".
+ *
+ *
+ *Frontend XenBus Nodes
+ *
+ *
+ *--- Request Transport Parameters ---
+ *
+ * event-channel
+ *  Values: unsigned
+ *
+ *  The identifier of the Xen event channel used to signal activity
+ *  in the ring buffer.
+ *
+ * urb-ring-ref
+ *  Values: unsigned
+ *
+ *  The Xen grant reference granting permission for the backend to map
+ *  the sole page in a single page sized ring buffer. This is the ring
+ *  buffer for urb requests.
+ *
+ * conn-ring-ref
+ *  Values: unsigned
+ *
+ *  The Xen grant reference granting permission for the backend to map
+ *  the sole page in a single page sized ring buffer. This is the ring
+ *  buffer for connection/disconnection requests.
+ *
+ * protocol
+ *  Values: string (XEN_IO_PROTO_ABI_*)
+ *  Default Value:  XEN_IO_PROTO_ABI_NATIVE
+ *
+ *  The machine ABI rules governing the format of all ring request and
+ *  response structures.
+ *
+ * 

[PATCH v7 0/3] xen, usb: support pvUSB frontend driver

2021-11-23 Thread Juergen Gross
This series adds XEN guest pvUSB support. With pvUSB it is possible to
use physical USB devices from a XEN domain.

Since V4 a lot of time (several years) has passed. This is a revived
attempt to get the frontend into the kernel.

The support consists of a frontend in form of a virtual hcd driver in
the unprivileged domU passing I/O-requests to the backend in a driver
domain (usually Dom0). The backend is not part of this patch series,
as it is supported via qemu.

The code is taken (and adapted) from the original pvUSB implementation
done for Linux 2.6 in 2008 by Fujitsu.

Normal operation of USB devices by adding and removing them dynamically
to/from a domain has been tested using various USB devices (USB 1.1,
2.0 and 3.0) using the qemu based backend.

Changes in V7:
- use SPDX header in usbif.h

Changes in V6:
- add SPDX line to driver

Changes in V5:
- added interface documentation to patch 1
- frontend no longer trusts backend to return only sane values

Changes in V4:
- remove sysfs file from frontend, as it violated the "one value per file"
  rule and didn't serve any real purpose.

Changes in V3:
- move frontend to drivers/usb/host and rename it to xen-hcd.
- changed name prefixes in patch 1 to "xenusb" as requested by Greg
- use __u types rather than uint_t as requested by Greg

Changes in V2:
- removed backend, as it can be implemented in user land
- added some access macros and definitions to the pvUSB interface
  description to make it independent of Linux kernel USB internals
- adapted frontend to newer kernel version and use new pvUSB
  interface macros
- set port status in one chunk as suggested by Oliver Neukum

Juergen Gross (3):
  usb: Add Xen pvUSB protocol description
  usb: Introduce Xen pvUSB frontend (xen hcd)
  xen: add Xen pvUSB maintainer

 MAINTAINERS  |8 +
 drivers/usb/host/Kconfig |   11 +
 drivers/usb/host/Makefile|1 +
 drivers/usb/host/xen-hcd.c   | 1606 ++
 include/xen/interface/io/usbif.h |  405 
 5 files changed, 2031 insertions(+)
 create mode 100644 drivers/usb/host/xen-hcd.c
 create mode 100644 include/xen/interface/io/usbif.h

-- 
2.26.2




[PATCH v7 2/3] usb: Introduce Xen pvUSB frontend (xen hcd)

2021-11-23 Thread Juergen Gross
Introduces the Xen pvUSB frontend. With pvUSB it is possible for a Xen
domU to communicate with a USB device assigned to that domU. The
communication is all done via the pvUSB backend in a driver domain
(usually Dom0) which is owner of the physical device.

The pvUSB frontend is a USB hcd for a virtual USB host connector.

The code is taken from the pvUSB implementation in Xen done by Fujitsu
based on Linux kernel 2.6.18.

Changes from the original version are:
- port to upstream kernel
- put all code in just one source file
- move module to appropriate location in kernel tree
- adapt to Linux style guide
- minor code modifications to increase readability

Signed-off-by: Juergen Gross 
---
 drivers/usb/host/Kconfig   |   11 +
 drivers/usb/host/Makefile  |1 +
 drivers/usb/host/xen-hcd.c | 1606 
 3 files changed, 1618 insertions(+)
 create mode 100644 drivers/usb/host/xen-hcd.c

diff --git a/drivers/usb/host/Kconfig b/drivers/usb/host/Kconfig
index d1d926f8f9c2..57ca5f97a3dc 100644
--- a/drivers/usb/host/Kconfig
+++ b/drivers/usb/host/Kconfig
@@ -772,3 +772,14 @@ config USB_HCD_TEST_MODE
  This option is of interest only to developers who need to validate
  their USB hardware designs.  It is not needed for normal use.  If
  unsure, say N.
+
+config USB_XEN_HCD
+   tristate "Xen usb virtual host driver"
+   depends on XEN
+   select XEN_XENBUS_FRONTEND
+   help
+ The Xen usb virtual host driver serves as a frontend driver enabling
+ a Xen guest system to access USB Devices passed through to the guest
+ by the Xen host (usually Dom0).
+ Only needed if the kernel is running in a Xen guest and generic
+ access to a USB device is needed.
diff --git a/drivers/usb/host/Makefile b/drivers/usb/host/Makefile
index 171de4df50bd..2948983618fb 100644
--- a/drivers/usb/host/Makefile
+++ b/drivers/usb/host/Makefile
@@ -85,3 +85,4 @@ obj-$(CONFIG_USB_HCD_BCMA)+= bcma-hcd.o
 obj-$(CONFIG_USB_HCD_SSB)  += ssb-hcd.o
 obj-$(CONFIG_USB_FOTG210_HCD)  += fotg210-hcd.o
 obj-$(CONFIG_USB_MAX3421_HCD)  += max3421-hcd.o
+obj-$(CONFIG_USB_XEN_HCD)  += xen-hcd.o
diff --git a/drivers/usb/host/xen-hcd.c b/drivers/usb/host/xen-hcd.c
new file mode 100644
index ..7801dde6f5ee
--- /dev/null
+++ b/drivers/usb/host/xen-hcd.c
@@ -0,0 +1,1606 @@
+// SPDX-License-Identifier: GPL-2.0-or-later
+/*
+ * xen-hcd.c
+ *
+ * Xen USB Virtual Host Controller driver
+ *
+ * Copyright (C) 2009, FUJITSU LABORATORIES LTD.
+ * Author: Noboru Iwamatsu 
+ */
+
+#include 
+#include 
+#include 
+#include 
+#include 
+
+#include 
+#include 
+#include 
+#include 
+#include 
+
+#include 
+
+/* Private per-URB data */
+struct urb_priv {
+   struct list_head list;
+   struct urb *urb;
+   int req_id; /* RING_REQUEST id for submitting */
+   int unlink_req_id;  /* RING_REQUEST id for unlinking */
+   int status;
+   bool unlinked;  /* dequeued marker */
+};
+
+/* virtual roothub port status */
+struct rhport_status {
+   __u32 status;
+   bool resuming;  /* in resuming */
+   bool c_connection;  /* connection changed */
+   unsigned long timeout;
+};
+
+/* status of attached device */
+struct vdevice_status {
+   int devnum;
+   enum usb_device_state status;
+   enum usb_device_speed speed;
+};
+
+/* RING request shadow */
+struct usb_shadow {
+   struct xenusb_urb_request req;
+   struct urb *urb;
+};
+
+struct xenhcd_info {
+   /* Virtual Host Controller has 4 urb queues */
+   struct list_head pending_submit_list;
+   struct list_head pending_unlink_list;
+   struct list_head in_progress_list;
+   struct list_head giveback_waiting_list;
+
+   spinlock_t lock;
+
+   /* timer that kick pending and giveback waiting urbs */
+   struct timer_list watchdog;
+   unsigned long actions;
+
+   /* virtual root hub */
+   int rh_numports;
+   struct rhport_status ports[XENUSB_MAX_PORTNR];
+   struct vdevice_status devices[XENUSB_MAX_PORTNR];
+
+   /* Xen related staff */
+   struct xenbus_device *xbdev;
+   int urb_ring_ref;
+   int conn_ring_ref;
+   struct xenusb_urb_front_ring urb_ring;
+   struct xenusb_conn_front_ring conn_ring;
+
+   unsigned int evtchn;
+   unsigned int irq;
+   struct usb_shadow shadow[XENUSB_URB_RING_SIZE];
+   unsigned int shadow_free;
+
+   bool error;
+};
+
+#define GRANT_INVALID_REF 0
+
+#define XENHCD_RING_JIFFIES (HZ/200)
+#define XENHCD_SCAN_JIFFIES 1
+
+enum xenhcd_timer_action {
+   TIMER_RING_WATCHDOG,
+   TIMER_SCAN_PENDING_URBS,
+};
+
+static struct kmem_cache *xenhcd_urbp_cachep;
+
+static inline struct xenhcd_info *xenhcd_hcd_to_info(struct usb_hcd *hcd)
+{
+   return (struct xenhcd_info *)hcd->hcd_priv;
+}
+
+static inline struct usb_hcd *xenhcd_info_to_hcd(struct xenhcd_info *info)
+{
+   

Re: [RFC PATCH] Added the logic to decode 32 bit ldr/str post-indexing instructions

2021-11-23 Thread Julien Grall




On 22/11/2021 14:19, Ayan Kumar Halder wrote:

Hi Julien/Stefano/Wei/Jan,


Hi,

As some of the comments were inter-related, I have consolidated my 
response to all the comments below.


On 19/11/2021 17:26, Julien Grall wrote:

Hi Ayan,

On 19/11/2021 16:52, Ayan Kumar Halder wrote:

At present, post indexing instructions are not emulated by Xen.


Can you explain in the commit message why this is a problem?


Yes, I will update the message as below :-
Armv8 hardware does not provide the correct syndrome for decoding post 
indexing ldr/str instructions. 


This statement implies that the issue happens on both AArch32 and 
AArch64 state. I am OK if we only handle the latter for now. But I would 
clarify it in the commit message.


Thus any post indexing ldr/str 
instruction at EL1 results in a data abort with ISV=0. 


Instruction from EL0 may also trap in Xen. So I would prefer if you just 
say "instruction executed by a domain results ...".


As a result, Xen 
needs to decode the instruction.


Can you give some information on the domain used. Something like:

"this was discovered with XXX which let the compiler deciding which 
instruction to use."




Thus, DomU will be able to read/write to ioreg space with post indexing 


I would say "domain" rather "domU" because all the domains are affected.


instructions for 32 bit.


How about the following commit message:

"
xen/arm64: io: Decode 32-bit ldr/str post-indexing instructions

At the moment, Xen is only handling data abort with valid syndrome (i.e.
ISV=0). Unfortunately, this doesn't cover all the instructions a domain
could use to access MMIO regions.

For instance, Xilinx baremetal OS will use:

volatile u32 *LocalAddr = (volatile u32 *)Addr;
*LocalAddr = Value;

This leave the compiler to decide which store instructions to use. This
may be a post-index store instruction where the HW will not provide a
valid syndrome.

In order to handle post-indexing store/load instructions,
Xen will need to fetch and decode the instruction.

This patch only cover post-index store/load instructions from AArch64 
mode. For now, this is left unimplemented for trap from AArch32 mode.

"


+{
+    int32_t ret;
+
+    if ( !(start >= 0 && length > 0 && length <= 32 - start) )
+    return -EINVAL;
+
+    ret = (value >> start) & (~0U >> (32 - length));


This would be easier to read if you use GENMASK().


I see that GENMASK returns a register mask. In my scenario, I wish to 
compare the offsets 10, 21, 24 and 27 to some fixed value.


So, instead of using GENMASK four times, I can try the following
instr & MASK_for_10_21_24_27 == VALID_for_10_21_24_27 (Wei Chen's 
suggestion)


That would be OK with me. Alternatively, you could use the union 
approach suggested by Bertrand.
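
For reference, a minimal sketch of the mask-and-compare check under
discussion (the constants below are placeholders, not the real A64
encodings, which must be taken from the Arm ARM):

    /* Placeholder mask/value covering bits 10, 21, 24-27, size, V and opc. */
    #define LDRSTR_POST_MASK   0x3b200c00U
    #define LDRSTR_POST_VALUE  0x38000400U

    static bool is_post_index_ldr_str(uint32_t instr)
    {
        return (instr & LDRSTR_POST_MASK) == LDRSTR_POST_VALUE;
    }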





Also for size, v and opc, I can define another bitmask to compare with 
VALID_for_32bit_LDR | VALID_for_32bit_STR.


Wei Chen, You had suggested using vreg_regx_extract(). However, I see 
that it is used to extract the complete u64 register value. In this 
case, I wish to compare certain offsets within a 32 bit register to some 
expected values. So, vreg_regx_extract() might not be appropriate and we 
can use the method mentioned before.


vreg_reg*_extract() are meant to work on a register. So I think they are 
not appropriate here as you don't deal with registers.


[...]


+
+    /* At the moment, we support STR(immediate) - 32 bit variant and
+ * LDR(immediate) - 32 bit variant only.
+ */


Coding style.

Ack




+    if (!((size==2) && (v==0) && ((opc==0) || (opc==1



The coding style for this should be:

if ( !(( size == 2 ) && ( ... ) ... ) )

Ack



  +    goto bad_64bit_loadstore;

+
+    rt = extract32(instr, 0, 5);
+    imm9 = extract32(instr, 12, 9);
+
+    if ( imm9 < 0 )
+    update_dabt(dabt, rt, size, true);
+    else
+    update_dabt(dabt, rt, size, false);


This could be simplified with:

update_dabt(dabt, rt, size, imm9 < 0);

Ack




+
+    dabt->valid = 1;
+
+
+    return 0;
+bad_64bit_loadstore:
+    gprintk(XENLOG_ERR, "unhandled 64bit instruction 0x%x\n", instr);
+    return 1;
+}
+
  static int decode_thumb(register_t pc, struct hsr_dabt *dabt)
  {
  uint16_t instr;
@@ -155,6 +229,9 @@ int decode_instruction(const struct cpu_user_regs 
*regs, struct hsr_dabt *dabt)

  if ( is_32bit_domain(current->domain) && regs->cpsr & PSR_THUMB )
  return decode_thumb(regs->pc, dabt);
+    if ( is_64bit_domain(current->domain) )


You can still run 32-bit code in 64-bit domain. So I think you want to 
check the SPSR mode.


Do you mean the following check :-
if ( (is_64bit_domain(current->domain) && (!(regs->spsr & PSR_MODE_BIT)) )


This should work, but I think you can simplify to use:

!psr_mode_is_32bit()


+ */
+    rc = decode_instruction(regs, );


I actually expect this to also be useful when forwarding the I/O to 
device-model. So I would move the decoding earlier in the code and the 
check of 

Re: [PATCH 01/10] vpci: move lock

2021-11-23 Thread Oleksandr Andrushchenko
Hi, Roger, Jan!

I think the below work will help with solving issues brought up by [1].

So, after thinking a bit more I concluded that indeed we do not actually
need a per-domain vPCI lock, but instead we just want to protect pdev->vpci
which is done in this patch.

At the moment I see four entities which may run concurrently and touch vpci:
1. hypercalls PHYSDEVOP_pci_device_{add|remove} - for Dom0 only
2. hypercalls XEN_DOMCTL_{de|}assign_device - any domain
3. MMIO trap handlers vpci_{read|write}
4. vpci_process_pending

 From the above, #1 will update pdev->vpci and #4 may remove pdev->vpci
via vpci_remove_device on the error path for Dom0. #2 and #3 seem to be
readers only. So it is possible to improve this patch to not only move
pdev->vpci->lock out of struct vpci, but also to convert it into an RW
lock.
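
A minimal sketch of what that conversion could look like (the helper
in vpci_read() is hypothetical and locking/error details are omitted):

    /* Sketch: pdev->vpci_lock becomes an rwlock_t protecting pdev->vpci. */
    uint32_t vpci_read(pci_sbdf_t sbdf, unsigned int reg, unsigned int size)
    {
        struct pci_dev *pdev = pci_get_pdev(sbdf.seg, sbdf.bus, sbdf.devfn);
        uint32_t data = ~(uint32_t)0;

        if ( !pdev )
            return data;

        read_lock(&pdev->vpci_lock);          /* readers: #2 and #3 above */
        if ( pdev->vpci )
            data = vpci_read_locked(pdev, reg, size); /* hypothetical helper */
        read_unlock(&pdev->vpci_lock);

        return data;
    }

    void vpci_remove_device(struct pci_dev *pdev)
    {
        write_lock(&pdev->vpci_lock);         /* writers: #1 and #4 above */
        /* ... free the register handlers hanging off pdev->vpci ... */
        xfree(pdev->vpci);
        pdev->vpci = NULL;
        write_unlock(&pdev->vpci_lock);
    }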

Hope this is what is needed in context of vpci and locking.

Thank you,
Oleksandr

On 20.06.18 17:42, Roger Pau Monne wrote:
> To the outside of the vpci struct. This way the lock can be used to
> check whether vpci is present, and removal can be performed while
> holding the lock, in order to make sure there are no accesses to the
> contents of the vpci struct. Previously removal could race with
> vpci_read for example, since the lock was dropped prior to freeing
> pdev->vpci.
>
> Signed-off-by: Roger Pau Monné 
> ---
> Cc: Andrew Cooper 
> Cc: George Dunlap 
> Cc: Ian Jackson 
> Cc: Jan Beulich 
> Cc: Julien Grall 
> Cc: Konrad Rzeszutek Wilk 
> Cc: Stefano Stabellini 
> Cc: Tim Deegan 
> Cc: Wei Liu 
> ---
>   tools/tests/vpci/emul.h   |  5 ++--
>   tools/tests/vpci/main.c   |  4 +--
>   xen/arch/x86/hvm/vmsi.c   |  8 +++---
>   xen/drivers/passthrough/pci.c |  1 +
>   xen/drivers/vpci/header.c | 19 +
>   xen/drivers/vpci/msi.c| 11 +---
>   xen/drivers/vpci/msix.c   |  8 +++---
>   xen/drivers/vpci/vpci.c   | 51 ++-
>   xen/include/xen/pci.h |  1 +
>   xen/include/xen/vpci.h|  3 +--
>   10 files changed, 70 insertions(+), 41 deletions(-)
>
> diff --git a/tools/tests/vpci/emul.h b/tools/tests/vpci/emul.h
> index 5d47544bf7..d344ef71c9 100644
> --- a/tools/tests/vpci/emul.h
> +++ b/tools/tests/vpci/emul.h
> @@ -44,6 +44,7 @@ struct domain {
>   };
>   
>   struct pci_dev {
> +bool vpci_lock;
>   struct vpci *vpci;
>   };
>   
> @@ -53,10 +54,8 @@ struct vcpu
>   };
>   
>   extern const struct vcpu *current;
> -extern const struct pci_dev test_pdev;
> +extern struct pci_dev test_pdev;
>   
> -typedef bool spinlock_t;
> -#define spin_lock_init(l) (*(l) = false)
>   #define spin_lock(l) (*(l) = true)
>   #define spin_unlock(l) (*(l) = false)
>   
> diff --git a/tools/tests/vpci/main.c b/tools/tests/vpci/main.c
> index b9a0a6006b..26c95b08b6 100644
> --- a/tools/tests/vpci/main.c
> +++ b/tools/tests/vpci/main.c
> @@ -23,7 +23,8 @@ static struct vpci vpci;
>   
>   const static struct domain d;
>   
> -const struct pci_dev test_pdev = {
> +struct pci_dev test_pdev = {
> +.vpci_lock = false,
>   .vpci = ,
>   };
>   
> @@ -158,7 +159,6 @@ main(int argc, char **argv)
>   int rc;
>   
>   INIT_LIST_HEAD();
> -spin_lock_init();
>   
>   VPCI_ADD_REG(vpci_read32, vpci_write32, 0, 4, r0);
>   VPCI_READ_CHECK(0, 4, r0);
> diff --git a/xen/arch/x86/hvm/vmsi.c b/xen/arch/x86/hvm/vmsi.c
> index 3001d5c488..94550cb8c4 100644
> --- a/xen/arch/x86/hvm/vmsi.c
> +++ b/xen/arch/x86/hvm/vmsi.c
> @@ -893,14 +893,14 @@ int vpci_msix_arch_print(const struct vpci_msix *msix)
>   {
>   struct pci_dev *pdev = msix->pdev;
>   
> -spin_unlock(>pdev->vpci->lock);
> +spin_unlock(>pdev->vpci_lock);
>   process_pending_softirqs();
>   /* NB: we assume that pdev cannot go away for an alive domain. 
> */
> -if ( !pdev->vpci || !spin_trylock(>vpci->lock) )
> +if ( !spin_trylock(>vpci_lock) )
>   return -EBUSY;
> -if ( pdev->vpci->msix != msix )
> +if ( !pdev->vpci || pdev->vpci->msix != msix )
>   {
> -spin_unlock(>vpci->lock);
> +spin_unlock(>vpci_lock);
>   return -EAGAIN;
>   }
>   }
> diff --git a/xen/drivers/passthrough/pci.c b/xen/drivers/passthrough/pci.c
> index c4890a4295..a5d59b83b7 100644
> --- a/xen/drivers/passthrough/pci.c
> +++ b/xen/drivers/passthrough/pci.c
> @@ -315,6 +315,7 @@ static struct pci_dev *alloc_pdev(struct pci_seg *pseg, 
> u8 bus, u8 devfn)
>   *((u8*) >devfn) = devfn;
>   pdev->domain = NULL;
>   INIT_LIST_HEAD(>msi_list);
> +spin_lock_init(>vpci_lock);
>   
>   if ( pci_find_cap_offset(pseg->nr, bus, PCI_SLOT(devfn), 
> PCI_FUNC(devfn),
>PCI_CAP_ID_MSIX) )
> diff --git a/xen/drivers/vpci/header.c b/xen/drivers/vpci/header.c
> index 0ec4c082a6..9d5607d5f8 100644
> --- a/xen/drivers/vpci/header.c
> +++ 

Xen Security Advisory 388 v3 (CVE-2021-28704,CVE-2021-28707,CVE-2021-28708) - PoD operations on misaligned GFNs

2021-11-23 Thread Xen . org security team
-BEGIN PGP SIGNED MESSAGE-
Hash: SHA256

 Xen Security Advisory CVE-2021-28704,CVE-2021-28707,CVE-2021-28708 / XSA-388
   version 3

   PoD operations on misaligned GFNs

UPDATES IN VERSION 3


Correct affected versions range.

Add CVE numbers to patches.

Public release.

ISSUE DESCRIPTION
=

x86 HVM and PVH guests may be started in populate-on-demand (PoD) mode,
to provide a way for them to later easily have more memory assigned.

Guests are permitted to control certain P2M aspects of individual
pages via hypercalls.  These hypercalls may act on ranges of pages
specified via page orders (resulting in a power-of-2 number of pages).
The implementation of some of these hypercalls for PoD does not
enforce the base page frame number to be suitably aligned for the
specified order, yet some code involved in PoD handling actually makes
such an assumption.  (For example, an order-9 request covers 512 pages,
and parts of the PoD code assume the base GFN of such a request is
512-page aligned.)

These operations are XENMEM_decrease_reservation (CVE-2021-28704) and
XENMEM_populate_physmap (CVE-2021-28707), the latter usable only by
domains controlling the guest, i.e. a de-privileged qemu or a stub
domain.  (Patch 1, combining the fix to both these two issues.)

In addition, handling of XENMEM_decrease_reservation can also trigger a
host crash when the specified page order is neither 4k nor 2M nor 1G
(CVE-2021-28708, patch 2).

IMPACT
==

Malicious or buggy guest kernels may be able to mount a Denial of
Service (DoS) attack affecting the entire system.  Privilege escalation
and information leaks cannot be ruled out.

VULNERABLE SYSTEMS
==

All Xen versions from 4.7 onwards are affected.  Xen versions 4.6 and
older are not affected.

Only x86 HVM and PVH guests started in populate-on-demand mode can
leverage the vulnerability.  Populate-on-demand mode is activated
when the guest's xl configuration file specifies a "maxmem" value which
is larger than the "memory" value.

MITIGATION
==

Not starting x86 HVM or PVH guests in populate-on-demand mode will avoid
the vulnerability.

CREDITS
===

This issue was discovered by Jan Beulich of SUSE.

RESOLUTION
==

Applying the appropriate pair of attached patches resolves this issue.

Note that patches for released versions are generally prepared to
apply to the stable branches, and may not apply cleanly to the most
recent release tarball.  Downstreams are encouraged to update to the
tip of the stable branch before applying these patches.

xsa388-?.patch   xen-unstable
xsa388-4.15-?.patch  Xen 4.15.x
xsa388-4.14-?.patch  Xen 4.14.x - 4.12.x

$ sha256sum xsa388*
43f6647e9f7d28d22eeb98680e116b301b0e29ef63ea65c9839a5aaebd449bc4  xsa388-1.patch
64b27a8c7c02036528e00a3070e27e873762d68f4ea1504e906aaf2ddc1c06be  xsa388-2.patch
6917267482101a3f8f1d13905e14994344a0af81370c7a2b92275fb176b321a0  
xsa388-4.14-1.patch
d5886e046c69f34f98f7e1fc6ffcc36d92f8fc79242b9dc88412c39aa79b4ac3  
xsa388-4.14-2.patch
fbe6af409447edc2318940d7c4bc0861a236d40db037166608fc09fa57ef54b1  
xsa388-4.15-1.patch
c828d735aaa3f430ccef314bf27519cd6a5f4daaa79e1c493dc47e42ab09ec9f  
xsa388-4.15-2.patch
$

DEPLOYMENT DURING EMBARGO
=

Deployment of the patches described above (or others which are
substantially similar) is permitted during the embargo, even on public-
facing systems with untrusted guest users and administrators.

HOWEVER, deployment of the mitigation described above is NOT permitted
during the embargo on public-facing systems with untrusted guest users
and administrators.  This is because such a configuration change is
recognizable by the affected guests.

AND: Distribution of updated software is prohibited (except to other
members of the predisclosure list).

Predisclosure list members who wish to deploy significantly different
patches and/or mitigations, please contact the Xen Project Security
Team.

(Note: this during-embargo deployment notice is retained in
post-embargo publicly released Xen Project advisories, even though it
is then no longer applicable.  This is to enable the community to have
oversight of the Xen Project Security Team's decisionmaking.)

For more information about permissible uses of embargoed information,
consult the Xen Project community's agreed Security Policy:
  http://www.xenproject.org/security-policy.html
-BEGIN PGP SIGNATURE-

iQFABAEBCAAqFiEEI+MiLBRfRHX6gGCng/4UyVfoK9kFAmGc2jkMHHBncEB4ZW4u
b3JnAAoJEIP+FMlX6CvZROMIALJsptV0nV8H5/nCLUWld3mKjAeb/+N20ul9NEwn
rUwIGGGzyrKZQdAljno+9y9o5pM8+BC+aTBwYhmxEWsHm1kodTD+YnJYf8uNW/CW
uhTJp/ZB5EsWhTFHF7YoKbPG0on4KIsy0TgoUug7bv+l2zEny9gfknsj8jdp3qCy
aFv1Bb2PzRh462qVHI3f27Ee8bn7GfErouuLppmDpCva19D3bhUXQ5PhxFB+oqsI
bww4VKUo0nxZftYhpbInWm34dajEIXK7jy5Z/CUPgCj2sTOHHBv7+5JJdw0umn/A
lJ2Ta1u03sdC9JWbat4qjvdVgK9L9vT+jWsfcwk02qq+XSU=
=uSRt
-END PGP SIGNATURE-


xsa388-1.patch
Description: Binary data


xsa388-2.patch
Description: Binary data


xsa388-4.14-1.patch
Description: Binary data



Xen Security Advisory 385 v2 (CVE-2021-28706) - guests may exceed their designated memory limit

2021-11-23 Thread Xen . org security team
-BEGIN PGP SIGNED MESSAGE-
Hash: SHA256

Xen Security Advisory CVE-2021-28706 / XSA-385
   version 2

 guests may exceed their designated memory limit

UPDATES IN VERSION 2


Add CVE numbers to patches.

Public release.

ISSUE DESCRIPTION
=

When a guest is permitted to have close to 16TiB of memory, it may be
able to issue hypercalls to increase its memory allocation beyond the
administrator-established limit.  This is a result of a calculation
done with 32-bit precision, which may overflow.  (16TiB corresponds to
2^32 4KiB pages, which no longer fits in 32 bits.)  It would then only
be the overflowed (and hence small) number which gets compared against
the established upper bound.

IMPACT
==

A guest may be able to allocate unbounded amounts of memory to itself.
This may result in a Denial of Service (DoS) affecting the entire host.

VULNERABLE SYSTEMS
==

All Xen versions from at least 3.2 onwards are affected.

On x86, only Xen builds with the BIGMEM configuration option enabled are
affected.  (This option is off by default.)

Only hosts with more than 16 TiB of memory are affected.

MITIGATION
==

Setting the maximum amount of memory a guest may allocate to strictly
less than 1023 GiB will avoid the vulnerability.

CREDITS
===

This issue was discovered by Julien Grall of Amazon.

RESOLUTION
==

Applying the appropriate first attached patch resolves this specific
issue.  The second patch in addition documents altered support status of
Xen on huge memory systems.

Note that patches for released versions are generally prepared to
apply to the stable branches, and may not apply cleanly to the most
recent release tarball.  Downstreams are encouraged to update to the
tip of the stable branch before applying these patches.

xsa385-?.patch   xen-unstable
xsa385-4.15.patchXen 4.15.x - 4.14.x
xsa385-4.13.patchXen 4.13.x
xsa385-4.12.patchXen 4.12.x

$ sha256sum xsa385*
b278902e293730a117605200910180bb842cf95db4bdedfd54b42b7314041d8c  xsa385-1.patch
46a5ccfbb763b857f6cd0df46a9b7eed155b9de399ca4c68c9925faf4d1d9adb  xsa385-2.patch
69ebe63dc7dca71f74260af19205a6387be56c7dc67b97fa7695ab1acd3c4da4  
xsa385-4.12.patch
858eaad715e7cc62c4ab9784360f4ec77df70b2636b0755afe780d5c618cf9b4  
xsa385-4.13.patch
831e86c3adfec532b1a48a0b967b7c58c37db3733aee8d78216eb9d535b34f12  
xsa385-4.15.patch
$

DEPLOYMENT DURING EMBARGO
=

Deployment of the patches described above (or others which are
substantially similar) is permitted during the embargo, even on public-
facing systems with untrusted guest users and administrators.

HOWEVER, deployment of the mitigation described above is NOT permitted
during the embargo on public-facing systems with untrusted guest users
and administrators.  This is because such a configuration change is
recognizable by the affected guests.

AND: Distribution of updated software is prohibited (except to other
members of the predisclosure list).

Predisclosure list members who wish to deploy significantly different
patches and/or mitigations, please contact the Xen Project Security
Team.

(Note: this during-embargo deployment notice is retained in
post-embargo publicly released Xen Project advisories, even though it
is then no longer applicable.  This is to enable the community to have
oversight of the Xen Project Security Team's decisionmaking.)

For more information about permissible uses of embargoed information,
consult the Xen Project community's agreed Security Policy:
  http://www.xenproject.org/security-policy.html
-BEGIN PGP SIGNATURE-

iQFABAEBCAAqFiEEI+MiLBRfRHX6gGCng/4UyVfoK9kFAmGc2jYMHHBncEB4ZW4u
b3JnAAoJEIP+FMlX6CvZEd4IAMwrHHAqFvSHgZ8Uw+DzMeT54db9nowudP9i/kYy
+KobbVlGkxwLAU3mvh5lRkOLYzoIonrcA99cajZQNIcOKt3Mfi/8qzGGUN+hWZvh
6EZo3m7+7vx9mhtAeDBUbjkcZBLiVyxRAWALMS67ScBEX9lZTvbyj9nGkdQJmmfR
pKt98z2Da2uR9YF521KWobuPYC0AFXujYBoavaTQpU/M8SiM+Wp1A2Fc6ZG+9ZKo
frMeqFbHvwj94Hbqpn6CoLu2d/XnykMvttuLlqCKTccQc3puHXdQRz14W8IxxGYx
gqYaIShZCFw/bUCu8mYHroDUlELJI3PIWQ1nJxy02bd5+N0=
=7E6A
-END PGP SIGNATURE-


xsa385-1.patch
Description: Binary data


xsa385-2.patch
Description: Binary data


xsa385-4.12.patch
Description: Binary data


xsa385-4.13.patch
Description: Binary data


xsa385-4.15.patch
Description: Binary data


Xen Security Advisory 387 v2 (CVE-2021-28703) - grant table v2 status pages may remain accessible after de-allocation (take two)

2021-11-23 Thread Xen . org security team
-BEGIN PGP SIGNED MESSAGE-
Hash: SHA256

Xen Security Advisory CVE-2021-28703 / XSA-387
   version 2

 grant table v2 status pages may remain accessible after de-allocation (take 
two)

UPDATES IN VERSION 2


Public release.

ISSUE DESCRIPTION
=

Guests get permitted access to certain Xen-owned pages of memory.  The
majority of such pages remain allocated / associated with a guest for
its entire lifetime.  Grant table v2 status pages, however, get
de-allocated when a guest switches (back) from v2 to v1.  The freeing
of such pages requires that the hypervisor know where in the guest
these pages were mapped.  The hypervisor tracks only one use within
guest space, but racing requests from the guest to insert mappings of
these pages may result in any of them becoming mapped in multiple
locations.  Upon switching back from v2 to v1, the guest would then
retain access to a page that was freed and perhaps re-used for other
purposes.

This bug was fortuitously fixed by code cleanup in Xen 4.14, and
backported to security-supported Xen branches as a prerequisite of the
fix for XSA-378.

IMPACT
==

A malicious guest may be able to elevate its privileges to that of the
host, cause host or guest Denial of Service (DoS), or cause information
leaks.

VULNERABLE SYSTEMS
==

All Xen branches up to and including 4.13 are vulnerable,
but only if the patches for XSA-378 have not been applied.

Xen versions 4.13.4, 4.14.x and 4.15.x are not affected.

Only x86 HVM and PVH guests permitted to use grant table version 2
interfaces can leverage this vulnerability.  x86 PV guests cannot
leverage this vulnerability.  On Arm, grant table v2 use is explicitly
unsupported.

MITIGATION
==========

Running only PV guests will avoid this vulnerability.

Suppressing use of grant table v2 interfaces for HVM or PVH guests will
also avoid this vulnerability.
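
For example, the hypervisor command line offers a knob to cap the grant table
version for all guests; the spelling below is the commonly documented form, but
it should be checked against the xen-command-line documentation of the deployed
release:

    gnttab=max-ver:1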

CREDITS
=======

This issue was discovered by Patryk Balicki and Julien Grall of Amazon.

RESOLUTION
==========

Applying the following patch resolves the issue:
  x86/p2m: don't assert that the passed in MFN matches for a remove

This patch was supplied with XSA-378, as one of 378's prerequisites.
The fix has already been applied to Xen stable branches as follows:

c65ea16dbcafbe4fe21693b18f8c2a3c5d14600e   in Xen 4.14.x, 4.15.x
f50fbddbae81fcccae56d27317bd71cc0e678ba2   in Xen 4.13.4
d44643199c96ac22491ae002d3bcd1c989b95ea4   in xen.git#stable-4.12
66f400c71d12fe8adfb895984b14f2941e8cb6ce   in xen.git#stable-4.11

DEPLOYMENT DURING EMBARGO
=========================

Deployment of the patches and/or mitigations described above (or
others which are substantially similar) is permitted during the
embargo, even on public-facing systems with untrusted guest users and
administrators.

But: Distribution of updated software is prohibited (except to other
members of the predisclosure list).

Predisclosure list members who wish to deploy significantly different
patches and/or mitigations, please contact the Xen Project Security
Team.

(Note: this during-embargo deployment notice is retained in
post-embargo publicly released Xen Project advisories, even though it
is then no longer applicable.  This is to enable the community to have
oversight of the Xen Project Security Team's decisionmaking.)

For more information about permissible uses of embargoed information,
consult the Xen Project community's agreed Security Policy:
  http://www.xenproject.org/security-policy.html


Xen Security Advisory 389 v3 (CVE-2021-28705,CVE-2021-28709) - issues with partially successful P2M updates on x86

2021-11-23 Thread Xen . org security team

 Xen Security Advisory CVE-2021-28705,CVE-2021-28709 / XSA-389
   version 3

  issues with partially successful P2M updates on x86

UPDATES IN VERSION 3
====================

Add CVE numbers to patches.

Public release.

ISSUE DESCRIPTION
=================

x86 HVM and PVH guests may be started in populate-on-demand (PoD) mode,
to provide a way for them to later easily have more memory assigned.

Guests are permitted to control certain P2M aspects of individual
pages via hypercalls.  These hypercalls may act on ranges of pages
specified via page orders (resulting in a power-of-2 number of pages).
In some cases the hypervisor carries out the requests by splitting
them into smaller chunks.  Error handling in certain PoD cases has
been insufficient in that, in particular, partial success of some
operations was not properly accounted for.

There are two code paths affected - page removal (CVE-2021-28705) and
insertion of new pages (CVE-2021-28709).  (We provide one patch which
combines the fix to both issues.)
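
To make the failure mode concrete, the following generic sketch (not the actual
P2M code; CHUNK and do_chunk() are placeholder names) shows a ranged operation
split into chunks, where the error path must report partial success instead of
discarding it:

    #define CHUNK 512UL                          /* placeholder chunk size */

    static long do_chunk(unsigned long gfn, unsigned long nr);  /* assumed */

    static long process_range(unsigned long gfn, unsigned long nr)
    {
        unsigned long done = 0;

        while ( done < nr )
        {
            unsigned long n = (nr - done < CHUNK) ? nr - done : CHUNK;
            long rc = do_chunk(gfn + done, n);

            if ( rc < 0 )
                /*
                 * The bug class at hand: returning rc alone hides the fact
                 * that 'done' pages were already changed, so the caller's
                 * accounting (e.g. PoD entry counts) goes stale.
                 */
                return done ? (long)done : rc;

            done += n;
        }

        return (long)nr;
    }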

IMPACT
======

Malicious or buggy guest kernels may be able to mount a Denial of
Service (DoS) attack affecting the entire system.  Privilege escalation
and information leaks cannot be ruled out.

VULNERABLE SYSTEMS
==================

All Xen versions from 3.4 onwards are affected.  Xen versions 3.3 and
older are believed to not be affected.

Only x86 HVM and PVH guests started in populate-on-demand mode are
believed to be able to leverage the vulnerability.  Populate-on-demand
mode is activated when the guest's xl configuration file specifies a
"maxmem" value which is larger than the "memory" value.

MITIGATION
==========

Not starting x86 HVM or PVH guests in populate-on-demand mode is
believed to allow avoiding the vulnerability.

CREDITS
=======

This issue was discovered by Jan Beulich of SUSE.

RESOLUTION
==========

Applying the appropriate attached patch resolves this issue.

Note that patches for released versions are generally prepared to
apply to the stable branches, and may not apply cleanly to the most
recent release tarball.  Downstreams are encouraged to update to the
tip of the stable branch before applying these patches.

xsa389.patch   xen-unstable
xsa389-4.15.patch  Xen 4.15.x
xsa389-4.14.patch  Xen 4.14.x
xsa389-4.13.patch  Xen 4.13.x
xsa389-4.12.patch  Xen 4.12.x

$ sha256sum xsa389*
c00f5b07594a6459bdd6f7334acc373bc3b0c14a5b0e444ec624ac60f857fc6f  xsa389.patch
bf0d66623c3239e334a17332035be5d7c7e33cfdd7f04f9b385f70ce8fa92752  xsa389-4.12.patch
2737affcf1e0fae5d412067ea8c7fe1cc91a28fa22f3f7e97a502cbd032582cc  xsa389-4.13.patch
b243284679b32ab8c817a2e41562d8694d9781fa8096c268bb41b0cd91684baa  xsa389-4.14.patch
0a213e141089fe7808eae067b3c43beed6c7d5887fa4c901e8f9352618788e5a  xsa389-4.15.patch
$

DEPLOYMENT DURING EMBARGO
=========================

Deployment of the patches described above (or others which are
substantially similar) is permitted during the embargo, even on public-
facing systems with untrusted guest users and administrators.

HOWEVER, deployment of the mitigation described above is NOT permitted
during the embargo on public-facing systems with untrusted guest users
and administrators.  This is because such a configuration change is
recognizable by the affected guests.

AND: Distribution of updated software is prohibited (except to other
members of the predisclosure list).

Predisclosure list members who wish to deploy significantly different
patches and/or mitigations, please contact the Xen Project Security
Team.

(Note: this during-embargo deployment notice is retained in
post-embargo publicly released Xen Project advisories, even though it
is then no longer applicable.  This is to enable the community to have
oversight of the Xen Project Security Team's decisionmaking.)

For more information about permissible uses of embargoed information,
consult the Xen Project community's agreed Security Policy:
  http://www.xenproject.org/security-policy.html


[xen-unstable test] 166304: tolerable FAIL

2021-11-23 Thread osstest service owner
flight 166304 xen-unstable real [real]
http://logs.test-lab.xenproject.org/osstest/logs/166304/

Failures :-/ but no regressions.

Tests which did not succeed, but are not blocking:
 test-amd64-amd64-xl-qemut-win7-amd64 19 guest-stopfail like 166263
 test-armhf-armhf-libvirt 16 saverestore-support-checkfail  like 166263
 test-amd64-amd64-qemuu-nested-amd 20 debian-hvm-install/l1/l2 fail like 166263
 test-amd64-amd64-xl-qemuu-ws16-amd64 19 guest-stopfail like 166263
 test-amd64-i386-xl-qemut-ws16-amd64 19 guest-stop fail like 166263
 test-amd64-i386-xl-qemut-win7-amd64 19 guest-stop fail like 166263
 test-armhf-armhf-libvirt-raw 15 saverestore-support-checkfail  like 166263
 test-armhf-armhf-libvirt-qcow2 15 saverestore-support-check   fail like 166263
 test-amd64-i386-xl-qemuu-win7-amd64 19 guest-stop fail like 166263
 test-amd64-amd64-xl-qemut-ws16-amd64 19 guest-stopfail like 166263
 test-amd64-i386-xl-qemuu-ws16-amd64 19 guest-stop fail like 166263
 test-amd64-amd64-xl-qemuu-win7-amd64 19 guest-stopfail like 166263
 test-arm64-arm64-xl  15 migrate-support-checkfail   never pass
 test-arm64-arm64-xl  16 saverestore-support-checkfail   never pass
 test-arm64-arm64-xl-seattle  15 migrate-support-checkfail   never pass
 test-arm64-arm64-xl-seattle  16 saverestore-support-checkfail   never pass
 test-amd64-amd64-libvirt-xsm 15 migrate-support-checkfail   never pass
 test-amd64-i386-libvirt-xsm  15 migrate-support-checkfail   never pass
 test-amd64-amd64-libvirt 15 migrate-support-checkfail   never pass
 test-amd64-i386-xl-pvshim14 guest-start  fail   never pass
 test-amd64-i386-libvirt  15 migrate-support-checkfail   never pass
 test-arm64-arm64-xl-credit1  15 migrate-support-checkfail   never pass
 test-arm64-arm64-libvirt-xsm 15 migrate-support-checkfail   never pass
 test-arm64-arm64-xl-credit1  16 saverestore-support-checkfail   never pass
 test-arm64-arm64-libvirt-xsm 16 saverestore-support-checkfail   never pass
 test-arm64-arm64-xl-thunderx 15 migrate-support-checkfail   never pass
 test-arm64-arm64-xl-thunderx 16 saverestore-support-checkfail   never pass
 test-arm64-arm64-xl-xsm  15 migrate-support-checkfail   never pass
 test-arm64-arm64-xl-xsm  16 saverestore-support-checkfail   never pass
 test-amd64-amd64-libvirt-qemuu-debianhvm-amd64-xsm 13 migrate-support-check fail never pass
 test-amd64-i386-libvirt-qemuu-debianhvm-amd64-xsm 13 migrate-support-check fail never pass
 test-armhf-armhf-xl-arndale  15 migrate-support-checkfail   never pass
 test-armhf-armhf-xl-arndale  16 saverestore-support-checkfail   never pass
 test-amd64-i386-libvirt-raw  14 migrate-support-checkfail   never pass
 test-amd64-amd64-libvirt-vhd 14 migrate-support-checkfail   never pass
 test-arm64-arm64-libvirt-raw 14 migrate-support-checkfail   never pass
 test-arm64-arm64-libvirt-raw 15 saverestore-support-checkfail   never pass
 test-arm64-arm64-xl-vhd  14 migrate-support-checkfail   never pass
 test-arm64-arm64-xl-vhd  15 saverestore-support-checkfail   never pass
 test-armhf-armhf-xl-credit1  15 migrate-support-checkfail   never pass
 test-armhf-armhf-xl-multivcpu 15 migrate-support-checkfail  never pass
 test-armhf-armhf-xl-credit1  16 saverestore-support-checkfail   never pass
 test-armhf-armhf-xl-multivcpu 16 saverestore-support-checkfail  never pass
 test-armhf-armhf-libvirt 15 migrate-support-checkfail   never pass
 test-armhf-armhf-xl-rtds 15 migrate-support-checkfail   never pass
 test-armhf-armhf-xl-rtds 16 saverestore-support-checkfail   never pass
 test-armhf-armhf-libvirt-raw 14 migrate-support-checkfail   never pass
 test-arm64-arm64-xl-credit2  15 migrate-support-checkfail   never pass
 test-arm64-arm64-xl-credit2  16 saverestore-support-checkfail   never pass
 test-armhf-armhf-libvirt-qcow2 14 migrate-support-checkfail never pass
 test-armhf-armhf-xl-vhd  14 migrate-support-checkfail   never pass
 test-armhf-armhf-xl-vhd  15 saverestore-support-checkfail   never pass
 test-armhf-armhf-xl  15 migrate-support-checkfail   never pass
 test-armhf-armhf-xl  16 saverestore-support-checkfail   never pass
 test-armhf-armhf-xl-credit2  15 migrate-support-checkfail   never pass
 test-armhf-armhf-xl-credit2  16 saverestore-support-checkfail   never pass
 test-armhf-armhf-xl-cubietruck 15 migrate-support-checkfail never pass
 test-armhf-armhf-xl-cubietruck 16 saverestore-support-checkfail never pass

version targeted for testing:
 xen  be12fcca8b784e456df3adedbffe657d753c5ff9
baseline version:
 xen  

Re: [RFC PATCH] Added the logic to decode 32 bit ldr/str post-indexing instructions

2021-11-23 Thread Bertrand Marquis
Hi,

> On 23 Nov 2021, at 08:53, Bertrand Marquis  wrote:
> 
> Hi Stefano,
> 
>> On 23 Nov 2021, at 04:13, Stefano Stabellini  wrote:
>> 
>> On Mon, 22 Nov 2021, Julien Grall wrote:
>>> On Mon, 22 Nov 2021 at 19:59, Stefano Stabellini  
>>> wrote:
 
 On Mon, 22 Nov 2021, Ayan Kumar Halder wrote:
> Stefano > It doesn't look like we are setting dabt->write anywhere.
> 
> Ayan > Yes, this is a miss. Depending on the opc, this should be updated
> accordingly in decode_64bit_loadstore_postindexing().
> 
> Stefano > Also, is info.gpa in try_handle_mmio already updated in the
> pre-index
> case? If not, do we need to apply the offset manually here?
> 
> Ayan > Sorry, I did not understand you. This patch is only related to the 
> post
> indexing ldr/str issue. Why do we need to check for pre-indexing ?
 
 I thought you were trying to handle both post-indexing and pre-indexing.
 It is OK if you intend to only handle post-indexing but considering that
 most of the code is shared between the two, we might as well also make
 pre-indexing work (unless it turns out it is more difficult).
>>> 
>>> Wouldn't this effectively be dead code?
> 
> I agree this would be dead code. Pre-indexing is handled by the hardware, 
> only post-indexed accesses are not.
> 
>>> 
 
 In the pre-indexing case, I would imagine we need to update the base
 address before taking any other actions.
>>> 
 From my understanding, this would have already been performed by the
>>> HW when the syndrome is valid. This may also be the case for
>>> the non-valid case, but I haven't checked the Arm Arm.
>> 
>> It is not clear to me either, that's why I wrote "I would imagine"... I
>> was guessing that it is not done by the HW in the non-valid case but I
>> don't know.
> 
> This should be handled by the hardware here, so only post actions should
> be handled here.
> 
>> 
>> Of course, if it is already done by the HW, that's all the better: no
>> need for us to do anything.
> 
> I am wondering though if other types of accesses could not be handled here
> without major modification of the code, such as sizes other than 32-bit.

I did some checks and I think the following cases could be handled:
ldr x2, [x1], #4
nop
ldr w2, [x1], #-4
nop
ldrh w2, [x1], #8
nop
ldrb w2, [x1], #16
nop
str x2, [x1], #4
nop
str w2, [x1], #-4
nop
strh w2, [x1], #8
nop
strb w2, [x1], #16
nop

Anything that I could have missed ?
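
For reference, all of the cases above fall into the A64 "load/store register
(immediate), post-indexed" encoding class, so a single field extraction covers
them.  A sketch of that extraction (not the Xen decoder; the mask and shift
values are transcribed from the Arm ARM and should be double-checked):

    #include <stdbool.h>
    #include <stdint.h>

    struct ldst_post {
        unsigned int size;   /* 0 = byte, 1 = half, 2 = word, 3 = doubleword */
        bool is_load;        /* true for the loads, false for the stores */
        int32_t imm;         /* signed 9-bit post-index byte offset */
        unsigned int rn;     /* base register */
        unsigned int rt;     /* transfer register */
    };

    /*
     * Matches "size 111 V=0 00 opc 0 imm9 01 Rn Rt" with opc bit 23 clear,
     * i.e. LDR/STR, LDRH/STRH, LDRB/STRB immediate post-indexed on general
     * registers, excluding sign-extending loads and SIMD/FP forms.
     */
    static bool decode_ldst_post(uint32_t insn, struct ldst_post *d)
    {
        if ( (insn & 0x3fa00c00U) != 0x38000400U )
            return false;

        d->size    = (insn >> 30) & 0x3;
        d->is_load = (insn >> 22) & 0x1;
        d->imm     = (int32_t)((insn >> 12) & 0x1ff);
        if ( d->imm & 0x100 )                    /* sign-extend imm9 */
            d->imm -= 0x200;
        d->rn      = (insn >> 5) & 0x1f;
        d->rt      = insn & 0x1f;

        return true;
    }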

> 
> There are also post instructions with shifting but to be honest I do not 
> think this is really needed.

Please ignore this, there is no post shifting.

Once this is done I can test and add a test to XTF on arm (on our side, 
upstreaming of this is in progress) to make sure this is maintained.

Regards
Bertrand




Re: [PATCH 4/5] x86/traps: Drop exception_table[] and use if/else dispatching

2021-11-23 Thread Jan Beulich
On 19.11.2021 19:21, Andrew Cooper wrote:
> There is also a lot of redundancy in the table.  8 vectors head to do_trap(),
> 3 are handled in the IST logic, and that only leaves 7 others not heading to
> the do_reserved_trap() catch-all.  This also removes the fragility that any
> accidental NULL entry in the table becomes a ticking timebomb.
> 
> Function pointers are expensive under retpoline, and different vectors have
> wildly different frequencies.  Drop the indirect call, and use an if/else chain
> instead, which is a code layout technique used by profile-guided optimisation.
> 
> Using Xen's own perfcounter infrastructure, we see the following frequencies of
> vectors measured from boot until I can SSH into dom0 and collect the stats:
> 
>   vec | CFL-R   | Milan   | Notes
>   +-+-+
>   NMI | 345 |3768 | Watchdog.  Milan has many more CPUs.
>   +-+-+
>   #PF | 1233234 | 2006441 |
>   #GP |   90054 |   96193 |
>   #UD | 848 | 851 |
>   #NM |   0 | 132 | Per-vendor lazy vs eager FPU policy.
>   #DB |  67 |  67 | No clue, but it's something in userspace.
> 
> Bloat-o-meter (after some manual insertion of ELF metadata) reports:
> 
>   add/remove: 0/1 grow/shrink: 2/0 up/down: 102/-256 (-154)
>   Function old new   delta
>   handle_exception_saved   148 226 +78
>   handle_ist_exception 453 477 +24
>   exception_table  256   -  -256
> 
> showing that the if/else chains are less than half the size that
> exception_table[] was in the first place.
> 
> As part of this change, make two other minor changes.  do_reserved_trap() is
> renamed to do_unhandled_trap() because it is the catchall, and already covers
> things that aren't reserved any more (#VE/#VC/#HV/#SX).
> 
> Furthermore, don't forward #TS to guests.  #TS is specifically for errors
> relating to the Task State Segment, which is a Xen-owned structure, not a
> guest-owned structure.  Even in the 32bit days, we never let guests register
> their own Task State Segments.
> 
> Signed-off-by: Andrew Cooper 

As it feels like we won't be coming to an agreement here, despite
my reservations against the specific form of some of this and on
the basis that the code change itself is certainly an improvement:
Acked-by: Jan Beulich 

I'd nevertheless be curious whether in five or ten years time you'd
still agree with your choice of basing a decision on not really
representative data (at least as far as what the description says).

Jan
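
For readers following along, the shape of the change is the classic swap of a
table of function pointers for a direct if/else chain.  A self-contained toy
(vector numbers and handler names are made up for illustration and are not
Xen's):

    #include <stdio.h>

    static void do_page_fault(int vec) { printf("#PF via vector %d\n", vec); }
    static void do_gp_fault(int vec)   { printf("#GP via vector %d\n", vec); }
    static void do_unhandled(int vec)  { printf("unhandled vector %d\n", vec); }

    /* Table dispatch: one indirect call per exception, costly under retpoline. */
    typedef void (*handler_t)(int);
    static handler_t table[32] = { [13] = do_gp_fault, [14] = do_page_fault };

    static void dispatch_table(int vec)
    {
        handler_t h = table[vec & 31];
        (h ? h : do_unhandled)(vec);         /* a NULL slot is a latent hazard */
    }

    /* If/else dispatch: direct calls, hottest vectors tested first. */
    static void dispatch_ifelse(int vec)
    {
        if ( vec == 14 )
            do_page_fault(vec);
        else if ( vec == 13 )
            do_gp_fault(vec);
        else
            do_unhandled(vec);
    }

    int main(void)
    {
        dispatch_table(14);
        dispatch_ifelse(13);
        dispatch_ifelse(6);
        return 0;
    }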




Re: [RFC PATCH] Added the logic to decode 32 bit ldr/str post-indexing instructions

2021-11-23 Thread Bertrand Marquis
Hi Stefano,

> On 23 Nov 2021, at 04:13, Stefano Stabellini  wrote:
> 
> On Mon, 22 Nov 2021, Julien Grall wrote:
>> On Mon, 22 Nov 2021 at 19:59, Stefano Stabellini  
>> wrote:
>>> 
>>> On Mon, 22 Nov 2021, Ayan Kumar Halder wrote:
 Stefano > It doesn't look like we are setting dabt->write anywhere.
 
 Ayan > Yes, this is a miss. Depending on the opc, this should be updated
 accordingly in decode_64bit_loadstore_postindexing().
 
 Stefano > Also, is info.gpa in try_handle_mmio already updated in the
 pre-index
 case? If not, do we need to apply the offset manually here?
 
 Ayan > Sorry, I did not understand you. This patch is only related to the 
 post
 indexing ldr/str issue. Why do we need to check for pre-indexing ?
>>> 
>>> I thought you were trying to handle both post-indexing and pre-indexing.
>>> It is OK if you intend to only handle post-indexing but considering that
>>> most of the code is shared between the two, we might as well also make
>>> pre-indexing work (unless it turns out it is more difficult).
>> 
>> Wouldn't this effectively be dead code?

I agree this would be dead code. Pre-indexing is handled by the hardware; only
post-indexed accesses are not.

>> 
>>> 
>>> In the pre-indexing case, I would imagine we need to update the base
>>> address before taking any other actions.
>> 
>>> From my understanding, this would have already been performed by the
>> HW when the syndrome is valid. This may also be the case for
>> the non-valid case, but I haven't checked the Arm Arm.
> 
> It is not clear to me either, that's why I wrote "I would imagine"... I
> was guessing that it is not done by the HW in the non-valid case but I
> don't know.

This should be handled by the hardware here, so only post actions should
be handled here.

> 
> Of course, if it is already done by the HW, that's all the better: no
> need for us to do anything.

I am wondering though if other types of accesses could not be handled here
without major modification of the code, such as sizes other than 32-bit.

There are also post instructions with shifting but to be honest I do not think 
this is really needed.

Regards
Bertrand






Re: [PATCH v3] xen: detect uninitialized xenbus in xenbus_init

2021-11-23 Thread Jan Beulich
On 23.11.2021 06:42, Juergen Gross wrote:
> On 22.11.21 23:16, Stefano Stabellini wrote:
>> From: Stefano Stabellini 
>>
>> If the xenstore page hasn't been allocated properly, reading the value
>> of the related hvm_param (HVM_PARAM_STORE_PFN) won't actually return
>> error. Instead, it will succeed and return zero. Instead of attempting
>> to xen_remap a bad guest physical address, detect this condition and
>> return early.
>>
>> Note that although a guest physical address of zero for
>> HVM_PARAM_STORE_PFN is theoretically possible, it is not a good choice
>> and zero has never been validly used in that capacity.
>>
>> Also recognize the invalid value of INVALID_PFN which is ULLONG_MAX.
>>
>> For 32-bit Linux, any pfn above ULONG_MAX would get truncated. Pfns
>> above ULONG_MAX should never be passed by the Xen tools to HVM guests
>> anyway, so check for this condition and return early.
>>
>> Cc: sta...@vger.kernel.org
>> Signed-off-by: Stefano Stabellini 
>> ---
>> Changes in v3:
>> - improve in-code comment
>> - improve check
>>
>> Changes in v2:
>> - add check for ULLONG_MAX (uninitialized)
>> - add check for ULONG_MAX #if BITS_PER_LONG == 32 (actual error)
>> - add pr_err error message
>> ---
>>   drivers/xen/xenbus/xenbus_probe.c | 24 
>>   1 file changed, 24 insertions(+)
>>
>> diff --git a/drivers/xen/xenbus/xenbus_probe.c 
>> b/drivers/xen/xenbus/xenbus_probe.c
>> index 94405bb3829e..d3ca57d48a73 100644
>> --- a/drivers/xen/xenbus/xenbus_probe.c
>> +++ b/drivers/xen/xenbus/xenbus_probe.c
>> @@ -951,6 +951,30 @@ static int __init xenbus_init(void)
>>  err = hvm_get_parameter(HVM_PARAM_STORE_PFN, &v);
>>  if (err)
>>  goto out_error;
>> +/*
>> + * Uninitialized hvm_params are zero and return no error.
>> + * Although it is theoretically possible to have
>> + * HVM_PARAM_STORE_PFN set to zero on purpose, in reality it is
>> + * not zero when valid. If zero, it means that Xenstore hasn't
>> + * been properly initialized. Instead of attempting to map a
>> + * wrong guest physical address return error.
>> + *
>> + * Also recognize the invalid value of INVALID_PFN which is
>> + * ULLONG_MAX.
> 
> Adjust the comment, e.g. s/ULLONG_MAX/all bits set/ (in the commit
> message, too)?

I also don't think the reference to INVALID_PFN is appropriate here. Afaict
the two aren't the same on 32-bit. Plus I can't even find a constant named
this way in Linux'es include/.

>> + */
>> +if (!v || !(v + 1)) {
> 
> For me "if (!v || !~v)" would be more readable, but I don't really feel
> strong here.

Oh, indeed.

Jan
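
For anyone reproducing the discussion, a standalone toy of the checks in
question (not the kernel patch itself; the function name is invented):

    #include <limits.h>
    #include <stdint.h>
    #include <stdio.h>

    /* Return 1 if 'v' looks like a usable xenstore pfn, 0 otherwise. */
    static int store_pfn_plausible(uint64_t v)
    {
        if (!v || !~v)       /* 0: never set; all bits set: invalid marker */
            return 0;
    #if ULONG_MAX == 0xffffffffUL
        if (v > ULONG_MAX)   /* would be truncated by a cast to unsigned long */
            return 0;
    #endif
        return 1;
    }

    int main(void)
    {
        printf("%d %d %d\n",
               store_pfn_plausible(0),            /* prints 0 */
               store_pfn_plausible(UINT64_MAX),   /* prints 0 */
               store_pfn_plausible(0xfeffcULL));  /* prints 1 */
        return 0;
    }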




[qemu-mainline test] 166300: tolerable FAIL - PUSHED

2021-11-23 Thread osstest service owner
flight 166300 qemu-mainline real [real]
flight 166306 qemu-mainline real-retest [real]
http://logs.test-lab.xenproject.org/osstest/logs/166300/
http://logs.test-lab.xenproject.org/osstest/logs/166306/

Failures :-/ but no regressions.

Tests which are failing intermittently (not blocking):
 test-armhf-armhf-xl-credit2  10 host-ping-check-xen fail pass in 166306-retest

Tests which did not succeed, but are not blocking:
 test-armhf-armhf-xl-credit2 15 migrate-support-check fail in 166306 never pass
 test-armhf-armhf-xl-credit2 16 saverestore-support-check fail in 166306 never pass
 test-amd64-amd64-xl-qemuu-win7-amd64 19 guest-stopfail like 166267
 test-amd64-amd64-qemuu-nested-amd 20 debian-hvm-install/l1/l2 fail like 166267
 test-amd64-i386-xl-qemuu-win7-amd64 19 guest-stop fail like 166267
 test-armhf-armhf-libvirt-qcow2 15 saverestore-support-check   fail like 166267
 test-armhf-armhf-libvirt-raw 15 saverestore-support-checkfail  like 166267
 test-armhf-armhf-libvirt 16 saverestore-support-checkfail  like 166267
 test-amd64-i386-xl-qemuu-ws16-amd64 19 guest-stop fail like 166267
 test-amd64-amd64-xl-qemuu-ws16-amd64 19 guest-stopfail like 166267
 test-arm64-arm64-xl-seattle  15 migrate-support-checkfail   never pass
 test-arm64-arm64-xl-seattle  16 saverestore-support-checkfail   never pass
 test-arm64-arm64-xl  15 migrate-support-checkfail   never pass
 test-arm64-arm64-xl  16 saverestore-support-checkfail   never pass
 test-amd64-amd64-libvirt-xsm 15 migrate-support-checkfail   never pass
 test-amd64-amd64-libvirt 15 migrate-support-checkfail   never pass
 test-amd64-i386-libvirt  15 migrate-support-checkfail   never pass
 test-amd64-i386-libvirt-xsm  15 migrate-support-checkfail   never pass
 test-amd64-i386-xl-pvshim14 guest-start  fail   never pass
 test-arm64-arm64-xl-xsm  15 migrate-support-checkfail   never pass
 test-arm64-arm64-xl-xsm  16 saverestore-support-checkfail   never pass
 test-arm64-arm64-libvirt-xsm 15 migrate-support-checkfail   never pass
 test-arm64-arm64-xl-credit1  15 migrate-support-checkfail   never pass
 test-arm64-arm64-libvirt-xsm 16 saverestore-support-checkfail   never pass
 test-arm64-arm64-xl-credit1  16 saverestore-support-checkfail   never pass
 test-arm64-arm64-xl-thunderx 15 migrate-support-checkfail   never pass
 test-arm64-arm64-xl-thunderx 16 saverestore-support-checkfail   never pass
 test-arm64-arm64-xl-credit2  15 migrate-support-checkfail   never pass
 test-arm64-arm64-xl-credit2  16 saverestore-support-checkfail   never pass
 test-amd64-i386-libvirt-qemuu-debianhvm-amd64-xsm 13 migrate-support-check fail never pass
 test-amd64-amd64-libvirt-qemuu-debianhvm-amd64-xsm 13 migrate-support-check fail never pass
 test-armhf-armhf-xl-arndale  15 migrate-support-checkfail   never pass
 test-armhf-armhf-xl-arndale  16 saverestore-support-checkfail   never pass
 test-amd64-i386-libvirt-raw  14 migrate-support-checkfail   never pass
 test-amd64-amd64-libvirt-vhd 14 migrate-support-checkfail   never pass
 test-arm64-arm64-libvirt-raw 14 migrate-support-checkfail   never pass
 test-arm64-arm64-libvirt-raw 15 saverestore-support-checkfail   never pass
 test-arm64-arm64-xl-vhd  14 migrate-support-checkfail   never pass
 test-arm64-arm64-xl-vhd  15 saverestore-support-checkfail   never pass
 test-armhf-armhf-xl-multivcpu 15 migrate-support-checkfail  never pass
 test-armhf-armhf-xl-multivcpu 16 saverestore-support-checkfail  never pass
 test-armhf-armhf-xl  15 migrate-support-checkfail   never pass
 test-armhf-armhf-xl  16 saverestore-support-checkfail   never pass
 test-armhf-armhf-xl-rtds 15 migrate-support-checkfail   never pass
 test-armhf-armhf-xl-rtds 16 saverestore-support-checkfail   never pass
 test-armhf-armhf-xl-cubietruck 15 migrate-support-checkfail never pass
 test-armhf-armhf-xl-cubietruck 16 saverestore-support-checkfail never pass
 test-armhf-armhf-libvirt-qcow2 14 migrate-support-checkfail never pass
 test-armhf-armhf-libvirt-raw 14 migrate-support-checkfail   never pass
 test-armhf-armhf-xl-vhd  14 migrate-support-checkfail   never pass
 test-armhf-armhf-xl-vhd  15 saverestore-support-checkfail   never pass
 test-armhf-armhf-xl-credit1  15 migrate-support-checkfail   never pass
 test-armhf-armhf-xl-credit1  16 saverestore-support-checkfail   never pass
 test-armhf-armhf-libvirt 15 migrate-support-checkfail   never pass

version targeted for testing:
 qemuu 89d2f9e4c63799f7f03e9180c63b7dc45fc2a04a
 baseline version:
 qemuu edf1aa8d44d5070828ac10c5b82dbc6caf29e7d3

Last test of basis   166267  2021-11-22 

Re: [PATCH 1/2] rangeset: add RANGESETF_no_print flag

2021-11-23 Thread Oleksandr Andrushchenko


On 23.11.21 10:01, Jan Beulich wrote:
> On 23.11.2021 08:49, Oleksandr Andrushchenko wrote:
>>
>> On 23.11.21 09:38, Jan Beulich wrote:
>>> On 22.11.2021 10:28, Oleksandr Andrushchenko wrote:
 --- a/xen/include/xen/rangeset.h
 +++ b/xen/include/xen/rangeset.h
 @@ -51,6 +51,9 @@ void rangeset_limit(
 /* Pretty-print range limits in hexadecimal. */
#define _RANGESETF_prettyprint_hex 0
#define RANGESETF_prettyprint_hex  (1U << _RANGESETF_prettyprint_hex)
 +/* Do not print entries marked with this flag. */
 +#define _RANGESETF_no_print 1
>>> There's no need for this, so I'd like to ask that you add ...
>>>
 +#define RANGESETF_no_print  (1U << _RANGESETF_no_print)
>>> ... just this. In due course we should do away with
>>> _RANGESETF_prettyprint_hex as well (unless you want to do so right
>>> at this occasion).
>> Ok, I can remove "#define _RANGESETF_prettyprint_hex 0" as well
>> and have both flags defined as
>>    /* Pretty-print range limits in hexadecimal. */
>> #define RANGESETF_prettyprint_hex  (1U << 0)
>> /* Do not print entries marked with this flag. */
>> #define RANGESETF_no_print (1U << 1)
>>
>> Or we can use BIT macro here:
>>
>> /* Pretty-print range limits in hexadecimal. */
>> #define RANGESETF_prettyprint_hex  BIT(0, U)
>> /* Do not print entries marked with this flag. */
>> #define RANGESETF_no_print BIT(1, U)
>>
>> What's your preference here?
> Clearly the former.
I am fine with this too, so I'll use it
>   But this may not be everybody's view. I'm hesitant
> towards further uses at least as long as its generalization, GENMASK(),
> isn't ready for easy use. It was some time ago that we had a discussion
> about that one, supporting my reservations voiced back at the time when
> it was introduced into our code base. Iirc there was more than one
> issue with it; the immediately obvious one is that it silently does the
> wrong thing when the arguments get specified the wrong way round, when
> it really would be relatively easy to support either ordering.
>
> Jan
>
Thank you,
Oleksandr

Re: [PATCH 1/2] rangeset: add RANGESETF_no_print flag

2021-11-23 Thread Jan Beulich
On 23.11.2021 08:49, Oleksandr Andrushchenko wrote:
> 
> 
> On 23.11.21 09:38, Jan Beulich wrote:
>> On 22.11.2021 10:28, Oleksandr Andrushchenko wrote:
>>> --- a/xen/include/xen/rangeset.h
>>> +++ b/xen/include/xen/rangeset.h
>>> @@ -51,6 +51,9 @@ void rangeset_limit(
>>>/* Pretty-print range limits in hexadecimal. */
>>>   #define _RANGESETF_prettyprint_hex 0
>>>   #define RANGESETF_prettyprint_hex  (1U << _RANGESETF_prettyprint_hex)
>>> +/* Do not print entries marked with this flag. */
>>> +#define _RANGESETF_no_print 1
>> There's no need for this, so I'd like to ask that you add ...
>>
>>> +#define RANGESETF_no_print  (1U << _RANGESETF_no_print)
>> ... just this. In due course we should do away with
>> _RANGESETF_prettyprint_hex as well (unless you want to do so right
>> at this occasion).
> Ok, I can remove "#define _RANGESETF_prettyprint_hex 0" as well
> and have both flags defined as
>   /* Pretty-print range limits in hexadecimal. */
> #define RANGESETF_prettyprint_hex  (1U << 0)
> /* Do not print entries marked with this flag. */
> #define RANGESETF_no_print (1U << 1)
> 
> Or we can use BIT macro here:
> 
> /* Pretty-print range limits in hexadecimal. */
> #define RANGESETF_prettyprint_hex  BIT(0, U)
> /* Do not print entries marked with this flag. */
> #define RANGESETF_no_print BIT(1, U)
> 
> What's your preference here?

Clearly the former. But this may not be everybody's view. I'm hesitant
towards further uses at least as long as its generalization, GENMASK(),
isn't ready for easy use. It was some time ago that we had a discussion
about that one, supporting my reservations voiced back at the time when
it was introduced into our code base. Iirc there was more than one
issue with it; the immediately obvious one is that it silently does the
wrong thing when the arguments get specified the wrong way round, when
it really would be relatively easy to support either ordering.

Jan
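
For completeness, a sketch of how a caller might use the flag once the patch
lands in the form agreed above (it assumes the existing rangeset_new() and
rangeset_add_range() signatures and is not taken from any tree):

    /* Track a range for bookkeeping, but keep it out of rangeset dumps. */
    static int track_quiet_range(struct domain *d, unsigned long s,
                                 unsigned long e)
    {
        struct rangeset *r = rangeset_new(d, "quiet example",
                                          RANGESETF_no_print);

        if ( !r )
            return -ENOMEM;

        return rangeset_add_range(r, s, e);
    }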