[PATCH v7 6/7] xen/arm: process pending vPCI map/unmap operations
From: Oleksandr Andrushchenko

vPCI may map and unmap PCI device memory (BARs) being passed through,
which may take a lot of time. For this reason those operations may be
deferred, to be performed later, so that they can be safely preempted.

Currently this deferred processing happens in the common IOREQ code,
which doesn't seem to be the right place for x86 and is even more
doubtful because IOREQ may not be enabled for Arm at all. So, for Arm,
the pending vPCI work may have no chance to be executed if the
processing is left as is in the common IOREQ code only. For that
reason, make vPCI processing happen in arch-specific code.

Please be aware that there are a few outstanding TODOs affecting this
code path, see xen/drivers/vpci/header.c:map_range and
xen/drivers/vpci/header.c:vpci_process_pending.

Signed-off-by: Oleksandr Andrushchenko
[x86 part]
Acked-by: Jan Beulich
Reviewed-by: Julien Grall
---
Cc: Andrew Cooper
Cc: Paul Durrant

Since v5:
- check_for_vcpu_work: vPCI addition is moved before the
  vcpu_ioreq_handle_completion(v). This is to avoid differences
  with the x86 version. (Julien)
Since v2:
- update commit message with more insight on x86, IOREQ and Arm
- restored order of invocation for IOREQ and vPCI processing (Jan)
Since v1:
- Moved the check for pending vpci work from the common IOREQ code
  to hvm_do_resume on x86
- Re-worked the code for Arm to ensure we don't miss pending vPCI work
---
 xen/arch/arm/traps.c   | 13 +
 xen/arch/x86/hvm/hvm.c |  6 ++
 xen/common/ioreq.c     |  9 -
 3 files changed, 19 insertions(+), 9 deletions(-)

diff --git a/xen/arch/arm/traps.c b/xen/arch/arm/traps.c
index 219ab3c3fbde..8757210a798b 100644
--- a/xen/arch/arm/traps.c
+++ b/xen/arch/arm/traps.c
@@ -34,6 +34,7 @@
 #include
 #include
 #include
+#include
 #include
 #include
@@ -2290,6 +2291,18 @@ static bool check_for_vcpu_work(void)
 {
     struct vcpu *v = current;

+    if ( has_vpci(v->domain) )
+    {
+        bool pending;
+
+        local_irq_enable();
+        pending = vpci_process_pending(v);
+        local_irq_disable();
+
+        if ( pending )
+            return true;
+    }
+
 #ifdef CONFIG_IOREQ_SERVER
     if ( domain_has_ioreq_server(v->domain) )
     {
diff --git a/xen/arch/x86/hvm/hvm.c b/xen/arch/x86/hvm/hvm.c
index eee365711d63..096a61b7ea02 100644
--- a/xen/arch/x86/hvm/hvm.c
+++ b/xen/arch/x86/hvm/hvm.c
@@ -546,6 +546,12 @@ void hvm_do_resume(struct vcpu *v)

     pt_restore_timer(v);

+    if ( has_vpci(v->domain) && vpci_process_pending(v) )
+    {
+        raise_softirq(SCHEDULE_SOFTIRQ);
+        return;
+    }
+
     if ( !vcpu_ioreq_handle_completion(v) )
         return;

diff --git a/xen/common/ioreq.c b/xen/common/ioreq.c
index d732dc045df9..689d256544c8 100644
--- a/xen/common/ioreq.c
+++ b/xen/common/ioreq.c
@@ -25,9 +25,7 @@
 #include
 #include
 #include
-#include
 #include
-#include
 #include
 #include
@@ -212,19 +210,12 @@ static bool wait_for_io(struct ioreq_vcpu *sv, ioreq_t *p)

 bool vcpu_ioreq_handle_completion(struct vcpu *v)
 {
-    struct domain *d = v->domain;
     struct vcpu_io *vio = &v->io;
     struct ioreq_server *s;
     struct ioreq_vcpu *sv;
     enum vio_completion completion;
     bool res = true;

-    if ( has_vpci(d) && vpci_process_pending(v) )
-    {
-        raise_softirq(SCHEDULE_SOFTIRQ);
-        return false;
-    }
-
     while ( (sv = get_pending_vcpu(v, &s)) != NULL )
         if ( !wait_for_io(sv, get_ioreq(s, v)) )
            return false;
--
2.25.1
[PATCH v7 5/7] xen/arm: do not map PCI ECAM and MMIO space to Domain-0's p2m
From: Oleksandr Andrushchenko

PCI host bridges are special devices in terms of implementing PCI
passthrough. According to [1] the current implementation depends on
Domain-0 to perform the initialization of the relevant PCI host bridge
hardware and perform PCI device enumeration. In order to achieve that,
one of the required changes is to not map all the memory ranges in
map_range_to_domain as we traverse the device tree on startup, and to
perform some additional checks if the range needs to be mapped to
Domain-0.

The generic PCI host controller device tree binding says [2]:
- ranges: As described in IEEE Std 1275-1994, but must provide
          at least a definition of non-prefetchable memory. One
          or both of prefetchable Memory and IO Space may also
          be provided.

- reg   : The Configuration Space base address and size, as accessed
          from the parent bus. The base address corresponds to
          the first bus in the "bus-range" property. If no
          "bus-range" is specified, this will be bus 0 (the default).

From the above, none of the memory ranges from the "ranges" property
needs to be mapped to Domain-0 at startup, as MMIO mapping is going to
be handled dynamically by vPCI as we assign PCI devices, e.g. each
device assigned to Domain-0/guest will have its MMIOs mapped/unmapped
as needed by Xen.

The "reg" property covers not only ECAM space, but may also have other
than the configuration memory ranges described, for example [3]:
- reg: Should contain rc_dbi, config registers location and length.
- reg-names: Must include the following entries:
  "rc_dbi": controller configuration registers;
  "config": PCIe configuration space registers.

This patch makes it possible to not map all the ranges from the
"ranges" property and also the ECAM from the "reg". All the rest from
the "reg" property still needs to be mapped to Domain-0, so the PCI
host bridge remains functional in Domain-0.

This is done by first skipping the mappings while traversing the
device tree, as it is done for usual devices, and then by calling a
dedicated pci_host_bridge_mappings function which only maps MMIOs
required by the host bridges, leaving the regions needed for vPCI
traps unmapped.

[1] https://lists.xenproject.org/archives/html/xen-devel/2020-07/msg00777.html
[2] https://www.kernel.org/doc/Documentation/devicetree/bindings/pci/host-generic-pci.txt
[3] https://www.kernel.org/doc/Documentation/devicetree/bindings/pci/hisilicon-pcie.txt

Signed-off-by: Oleksandr Andrushchenko
---
Since v7:
- updates in comments and commit message
Since v5:
- remove some need_mapping local variables
- use own_device in handle_device
- add __init for pci_ecam_need_p2m_hwdom_mapping
- make pci_host_bridge_mappings use p2m_mmio_direct_dev directly
Since v4:
- update skip_mapping comment
- add comment why we need to map interrupts to Dom0
Since v3:
- pass struct map_range_data to map_dt_irq_to_domain
- remove redundant check from map_range_to_domain
- fix handle_device's .skip_mapping
Since v2:
- removed check in map_range_to_domain for PCI_DEV and moved it to
  handle_device, so the code is simpler
- s/map_pci_bridge/skip_mapping
- extended comment in pci_host_bridge_mappings
- minor code restructure in construct_dom0
- s/.need_p2m_mapping/.need_p2m_hwdom_mapping and related callbacks
- unsigned int i; in pci_host_bridge_mappings
Since v1:
- Added better description of why and what needs to be mapped into
  Domain-0's p2m and what doesn't
- Do not do any mappings for PCI devices while traversing the DT
- Walk all the bridges and make required mappings in one go
---
 xen/arch/arm/domain_build.c        | 66 +-
 xen/arch/arm/pci/ecam.c            | 14 +++
 xen/arch/arm/pci/pci-host-common.c | 50 ++
 xen/arch/arm/pci/pci-host-zynqmp.c |  1 +
 xen/include/asm-arm/pci.h          | 10 +
 xen/include/asm-arm/setup.h        | 13 ++
 6 files changed, 126 insertions(+), 28 deletions(-)

diff --git a/xen/arch/arm/domain_build.c b/xen/arch/arm/domain_build.c
index c83c02ab8ac6..47c884cca2c3 100644
--- a/xen/arch/arm/domain_build.c
+++ b/xen/arch/arm/domain_build.c
@@ -51,12 +51,6 @@ static int __init parse_dom0_mem(const char *s)
 }
 custom_param("dom0_mem", parse_dom0_mem);

-struct map_range_data
-{
-    struct domain *d;
-    p2m_type_t p2mt;
-};
-
 /* Override macros from asm/page.h to make them work with mfn_t */
 #undef virt_to_mfn
 #define virt_to_mfn(va) _mfn(__virt_to_mfn(va))
@@ -1720,10 +1714,10 @@ static int __init map_dt_irq_to_domain(const struct dt_device_node *dev,
                                        const struct dt_irq *dt_irq,
                                        void *data)
 {
-    struct domain *d = data;
+    struct map_range_data *mr_data = data;
+    struct domain *d = mr_data->d;
     unsigned int irq = dt_irq->irq;
     int res;
-    bool need_mapping = !dt_device_for_passthrough(dev);

     if ( irq < NR_LOCAL_IRQS )
[PATCH v7 3/7] xen/arm: setup MMIO range trap handlers for hardware domain
From: Oleksandr Andrushchenko

In order for vPCI to work it needs to maintain guest and hardware
domain's views of the configuration space. For example, BARs and
COMMAND registers require emulation for guests, and the guest view of
the registers needs to be in sync with the real contents of the
relevant registers. For that, the ECAM address space also needs to be
trapped for the hardware domain, so we need to implement PCI host
bridge specific callbacks to properly set up MMIO handlers for those
ranges depending on the particular host bridge implementation.

Signed-off-by: Oleksandr Andrushchenko
---
Since v6:
- eliminate pci_host_get_num_bridges and make pci_host_iterate_bridges
  return the count
- extend comment in domain_vpci_init
- remove not yet relevant code for num MMIOs and virtual bus topology
- add extra check for has_vpci in domain_vpci_get_num_mmio_handlers
- remove code that fixes num MMIOs for guest domain as it doesn't
  belong to this patch
Since v5:
- add vpci_sbdf_from_gpa helper for gpa to SBDF translation
- take bridge's bus start into account while calculating SBDF
Since v4:
- unsigned int for functions working with count
- gate number of MMIO handlers needed for CONFIG_HAS_PCI_MSI and fix
  their number, e.g. single handler for PBA and MSI-X tables (Roger)
- re-work code for assigning MMIO handlers to be simpler and account
  for the fact that multiple host bridges may exist for the hwdom
Since v3:
- fixed comment formatting
Since v2:
- removed unneeded assignment (count = 0)
- removed unneeded header inclusion
- update commit message
Since v1:
- Dynamically calculate the number of MMIO handlers required for vPCI
  and update the total number accordingly
- s/clb/cb
- Do not introduce a new callback for MMIO handler setup
---
 xen/arch/arm/domain.c              |  2 +
 xen/arch/arm/pci/pci-host-common.c | 17 +++
 xen/arch/arm/vpci.c                | 79 +++---
 xen/arch/arm/vpci.h                |  6 +++
 xen/include/asm-arm/pci.h          |  4 ++
 5 files changed, 100 insertions(+), 8 deletions(-)

diff --git a/xen/arch/arm/domain.c b/xen/arch/arm/domain.c
index 96e1b235501d..92a6c509e5c5 100644
--- a/xen/arch/arm/domain.c
+++ b/xen/arch/arm/domain.c
@@ -739,6 +739,8 @@ int arch_domain_create(struct domain *d,
     if ( (rc = domain_vgic_register(d, &count)) != 0 )
         goto fail;

+    count += domain_vpci_get_num_mmio_handlers(d);
+
     if ( (rc = domain_io_init(d, count + MAX_IO_HANDLER)) != 0 )
         goto fail;

diff --git a/xen/arch/arm/pci/pci-host-common.c b/xen/arch/arm/pci/pci-host-common.c
index 9653b92b5b2e..18b09d5e6f10 100644
--- a/xen/arch/arm/pci/pci-host-common.c
+++ b/xen/arch/arm/pci/pci-host-common.c
@@ -296,6 +296,23 @@ int pci_get_host_bridge_segment(const struct dt_device_node *node,
     return -EINVAL;
 }

+int pci_host_iterate_bridges_and_count(struct domain *d,
+                                       int (*cb)(struct domain *d,
+                                                 struct pci_host_bridge *bridge))
+{
+    struct pci_host_bridge *bridge;
+    int err, count = 0;
+
+    list_for_each_entry( bridge, &pci_host_bridges, node )
+    {
+        err = cb(d, bridge);
+        if ( err < 0 )
+            return err;
+        count += err;
+    }
+
+    return count;
+}
+
 /*
  * Local variables:
  * mode: C
diff --git a/xen/arch/arm/vpci.c b/xen/arch/arm/vpci.c
index 23f45386f4b3..ccd998d8dba2 100644
--- a/xen/arch/arm/vpci.c
+++ b/xen/arch/arm/vpci.c
@@ -16,16 +16,31 @@
 #include

+static pci_sbdf_t vpci_sbdf_from_gpa(const struct pci_host_bridge *bridge,
+                                     paddr_t gpa)
+{
+    pci_sbdf_t sbdf;
+
+    if ( bridge )
+    {
+        sbdf.sbdf = VPCI_ECAM_BDF(gpa - bridge->cfg->phys_addr);
+        sbdf.seg = bridge->segment;
+        sbdf.bus += bridge->cfg->busn_start;
+    }
+    else
+        sbdf.sbdf = VPCI_ECAM_BDF(gpa - GUEST_VPCI_ECAM_BASE);
+
+    return sbdf;
+}
+
 static int vpci_mmio_read(struct vcpu *v, mmio_info_t *info,
                           register_t *r, void *p)
 {
-    pci_sbdf_t sbdf;
+    struct pci_host_bridge *bridge = p;
+    pci_sbdf_t sbdf = vpci_sbdf_from_gpa(bridge, info->gpa);
     /* data is needed to prevent a pointer cast on 32bit */
     unsigned long data;

-    /* We ignore segment part and always handle segment 0 */
-    sbdf.sbdf = VPCI_ECAM_BDF(info->gpa - GUEST_VPCI_ECAM_BASE);
-
     if ( vpci_ecam_read(sbdf, ECAM_REG_OFFSET(info->gpa),
                         1U << info->dabt.size, &data) )
     {
@@ -41,10 +56,8 @@ static int vpci_mmio_read(struct vcpu *v, mmio_info_t *info,
 static int vpci_mmio_write(struct vcpu *v, mmio_info_t *info,
                            register_t r, void *p)
 {
-    pci_sbdf_t sbdf;
-
-    /* We ignore segment part and always handle segment 0 */
-    sbdf.sbdf = VPCI_ECAM_BDF(info->gpa - GUEST_VPCI_ECAM_BASE);
+    struct pci_host_bridge *bridge = p;
+    pci_sbdf_t sbdf = vpci_sbdf_from_gpa(bridge, info->gpa);
[PATCH v7 4/7] xen/arm: account IO handler for emulated PCI host bridge
From: Oleksandr Andrushchenko

At the moment, we always allocate an extra 16 slots for IO handlers
(see MAX_IO_HANDLER). So, while adding an IO trap handler for the
emulated PCI host bridge we are not breaking anything, but we have a
latent bug as the maximum number of IOs may be exceeded.

Fix this by explicitly stating that we have an additional IO handler,
so it is accounted for.

Fixes: d59168dc05a5 ("xen/arm: Enable the existing x86 virtual PCI support for ARM")

Signed-off-by: Oleksandr Andrushchenko
---
New in v7
---
 xen/arch/arm/vpci.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/xen/arch/arm/vpci.c b/xen/arch/arm/vpci.c
index ccd998d8dba2..8e801f275879 100644
--- a/xen/arch/arm/vpci.c
+++ b/xen/arch/arm/vpci.c
@@ -126,7 +126,8 @@ unsigned int domain_vpci_get_num_mmio_handlers(struct domain *d)
         return ret < 0 ? 0 : ret;
     }

-    return 0;
+    /* For a single emulated host bridge's configuration space. */
+    return 1;
 }

 /*
--
2.25.1
[PATCH v7 1/7] xen/arm: rename DEVICE_PCI to DEVICE_PCI_HOSTBRIDGE
From: Oleksandr Andrushchenko

To better reflect the nature of the device type, and to avoid
confusion, rename DEVICE_PCI to DEVICE_PCI_HOSTBRIDGE, which is what
it really is.

Suggested-by: Julien Grall
Signed-off-by: Oleksandr Andrushchenko
Reviewed-by: Julien Grall
---
New in v6
---
 xen/arch/arm/pci/pci-host-generic.c | 2 +-
 xen/arch/arm/pci/pci-host-zynqmp.c  | 2 +-
 xen/arch/arm/pci/pci.c              | 2 +-
 xen/include/asm-arm/device.h        | 2 +-
 4 files changed, 4 insertions(+), 4 deletions(-)

diff --git a/xen/arch/arm/pci/pci-host-generic.c b/xen/arch/arm/pci/pci-host-generic.c
index 33457fbe9615..46de6e43cc72 100644
--- a/xen/arch/arm/pci/pci-host-generic.c
+++ b/xen/arch/arm/pci/pci-host-generic.c
@@ -32,7 +32,7 @@ static int __init pci_host_generic_probe(struct dt_device_node *dev,
     return pci_host_common_probe(dev, &pci_generic_ecam_ops);
 }

-DT_DEVICE_START(pci_gen, "PCI HOST GENERIC", DEVICE_PCI)
+DT_DEVICE_START(pci_gen, "PCI HOST GENERIC", DEVICE_PCI_HOSTBRIDGE)
     .dt_match = gen_pci_dt_match,
     .init = pci_host_generic_probe,
 DT_DEVICE_END
diff --git a/xen/arch/arm/pci/pci-host-zynqmp.c b/xen/arch/arm/pci/pci-host-zynqmp.c
index 61a9807d3d58..516982bca833 100644
--- a/xen/arch/arm/pci/pci-host-zynqmp.c
+++ b/xen/arch/arm/pci/pci-host-zynqmp.c
@@ -49,7 +49,7 @@ static int __init pci_host_generic_probe(struct dt_device_node *dev,
     return pci_host_common_probe(dev, &nwl_pcie_ops);
 }

-DT_DEVICE_START(pci_gen, "PCI HOST ZYNQMP", DEVICE_PCI)
+DT_DEVICE_START(pci_gen, "PCI HOST ZYNQMP", DEVICE_PCI_HOSTBRIDGE)
     .dt_match = nwl_pcie_dt_match,
     .init = pci_host_generic_probe,
 DT_DEVICE_END
diff --git a/xen/arch/arm/pci/pci.c b/xen/arch/arm/pci/pci.c
index 082c14e127a8..78b97beaef12 100644
--- a/xen/arch/arm/pci/pci.c
+++ b/xen/arch/arm/pci/pci.c
@@ -46,7 +46,7 @@ static int __init dt_pci_init(void)

     dt_for_each_device_node(dt_host, np)
     {
-        rc = device_init(np, DEVICE_PCI, NULL);
+        rc = device_init(np, DEVICE_PCI_HOSTBRIDGE, NULL);
         /*
          * Ignore the following error codes:
          *   - EBADF: Indicate the current device is not a pci device.
diff --git a/xen/include/asm-arm/device.h b/xen/include/asm-arm/device.h
index 3782660751b6..086dde13eb6b 100644
--- a/xen/include/asm-arm/device.h
+++ b/xen/include/asm-arm/device.h
@@ -37,7 +37,7 @@ enum device_class
     DEVICE_SERIAL,
     DEVICE_IOMMU,
     DEVICE_GIC,
-    DEVICE_PCI,
+    DEVICE_PCI_HOSTBRIDGE,
     /* Use for error */
     DEVICE_UNKNOWN,
 };
--
2.25.1
[PATCH v7 0/7] PCI devices passthrough on Arm, part 2
From: Oleksandr Andrushchenko

Hi, all!

This is an assorted series of patches whose aim is to lay some further
groundwork for PCI passthrough support on Arm. The series continues
the work published earlier by Arm [1], adds new helpers, and clears
the way for the vPCI changes which will follow.

RFC is at [2], [3]. The design presentation can be found at [4].

I have removed patch
[PATCH v6 5/7] xen/arm: do not map IRQs and memory for disabled devices
as it seems that this needs more time for a decision on how to achieve
that.

I have also added a new patch
[PATCH v7 4/7] xen/arm: account IO handler for emulated PCI host bridge
with a tiny latent bug fix.

Thank you,
Oleksandr

[1] https://patchwork.kernel.org/project/xen-devel/list/?series=558681
[2] https://lists.xenproject.org/archives/html/xen-devel/2020-07/msg01184.html
[3] https://lists.xenproject.org/archives/html/xen-devel/2020-07/threads.html#01184
[4] https://static.sched.com/hosted_files/xen2021/e4/PCI_Device_Passthrough_On_Arm.pdf

Oleksandr Andrushchenko (7):
  xen/arm: rename DEVICE_PCI to DEVICE_PCI_HOSTBRIDGE
  xen/arm: add pci-domain for disabled devices
  xen/arm: setup MMIO range trap handlers for hardware domain
  xen/arm: account IO handler for emulated PCI host bridge
  xen/arm: do not map PCI ECAM and MMIO space to Domain-0's p2m
  xen/arm: process pending vPCI map/unmap operations
  xen/arm: do not use void pointer in pci_host_common_probe

 xen/arch/arm/domain.c               |   2 +
 xen/arch/arm/domain_build.c         | 132 +++-
 xen/arch/arm/pci/ecam.c             |  18 +++-
 xen/arch/arm/pci/pci-host-common.c  |  81 +++--
 xen/arch/arm/pci/pci-host-generic.c |   2 +-
 xen/arch/arm/pci/pci-host-zynqmp.c  |   3 +-
 xen/arch/arm/pci/pci.c              |   2 +-
 xen/arch/arm/traps.c                |  13 +++
 xen/arch/arm/vpci.c                 |  80 +++--
 xen/arch/arm/vpci.h                 |   6 ++
 xen/arch/x86/hvm/hvm.c              |   6 ++
 xen/common/ioreq.c                  |   9 --
 xen/include/asm-arm/device.h        |   2 +-
 xen/include/asm-arm/pci.h           |  27 +-
 xen/include/asm-arm/setup.h         |  13 +++
 15 files changed, 323 insertions(+), 73 deletions(-)

--
2.25.1
[PATCH v7 2/7] xen/arm: add pci-domain for disabled devices
From: Oleksandr Andrushchenko

If a PCI host bridge device is present in the device tree, but is
disabled, then its PCI host bridge driver is not instantiated. This
results in the failure of pci_get_host_bridge_segment() and the
following panic during Xen start:

(XEN) Device tree generation failed (-22).
(XEN)
(XEN)
(XEN) Panic on CPU 0:
(XEN) Could not set up DOM0 guest OS
(XEN)

Fix this by adding the "linux,pci-domain" property for all device tree
nodes which have the "pci" device type, so we know which segments will
be used by the guest for which bridges.

Fixes: 4cfab4425d39 ("xen/arm: Add linux,pci-domain property for hwdom if not available.")

Signed-off-by: Oleksandr Andrushchenko
---
Since v6:
- use use_dt_domains in pci_get_new_domain_nr and return -1 if set
- do not set "linux,pci-domain" if parent device is "pci"
- move the code to a new helper handle_linux_pci_domain (Julien)
New in v6
---
 xen/arch/arm/domain_build.c        | 66 +++---
 xen/arch/arm/pci/pci-host-common.c |  8 +++-
 xen/include/asm-arm/pci.h          |  8
 3 files changed, 66 insertions(+), 16 deletions(-)

diff --git a/xen/arch/arm/domain_build.c b/xen/arch/arm/domain_build.c
index e76cee3caccf..c83c02ab8ac6 100644
--- a/xen/arch/arm/domain_build.c
+++ b/xen/arch/arm/domain_build.c
@@ -654,6 +654,55 @@ static void __init allocate_static_memory(struct domain *d,
 }
 #endif

+/*
+ * When PCI passthrough is available we want to keep the
+ * "linux,pci-domain" in sync for every host bridge.
+ *
+ * Xen may not have a driver for all the host bridges. So we have
+ * to write a heuristic to detect whether a device node describes
+ * a host bridge.
+ *
+ * The current heuristic assumes that a device is a host bridge
+ * if the type is "pci" and the parent type is not "pci".
+ */
+static int handle_linux_pci_domain(struct kernel_info *kinfo,
+                                   const struct dt_device_node *node)
+{
+    uint16_t segment;
+    int res;
+
+    if ( !is_pci_passthrough_enabled() )
+        return 0;
+
+    if ( !dt_device_type_is_equal(node, "pci") )
+        return 0;
+
+    if ( node->parent && dt_device_type_is_equal(node->parent, "pci") )
+        return 0;
+
+    if ( dt_find_property(node, "linux,pci-domain", NULL) )
+        return 0;
+
+    /* Allocate and create the linux,pci-domain */
+    res = pci_get_host_bridge_segment(node, &segment);
+    if ( res < 0 )
+    {
+        res = pci_get_new_domain_nr();
+        if ( res < 0 )
+        {
+            printk(XENLOG_DEBUG "Can't assign PCI segment to %s\n",
+                   node->full_name);
+            return -FDT_ERR_NOTFOUND;
+        }
+
+        segment = res;
+        printk(XENLOG_DEBUG "Assigned segment %d to %s\n",
+               segment, node->full_name);
+    }
+
+    return fdt_property_cell(kinfo->fdt, "linux,pci-domain", segment);
+}
+
 static int __init write_properties(struct domain *d, struct kernel_info *kinfo,
                                    const struct dt_device_node *node)
 {
@@ -755,21 +804,10 @@ static int __init write_properties(struct domain *d, struct kernel_info *kinfo,
             return res;
     }

-    if ( is_pci_passthrough_enabled() && dt_device_type_is_equal(node, "pci") )
-    {
-        if ( !dt_find_property(node, "linux,pci-domain", NULL) )
-        {
-            uint16_t segment;
-
-            res = pci_get_host_bridge_segment(node, &segment);
-            if ( res < 0 )
-                return res;
+    res = handle_linux_pci_domain(kinfo, node);

-            res = fdt_property_cell(kinfo->fdt, "linux,pci-domain", segment);
-            if ( res )
-                return res;
-        }
-    }
+    if ( res )
+        return res;

     /*
      * Override the property "status" to disable the device when it's
diff --git a/xen/arch/arm/pci/pci-host-common.c b/xen/arch/arm/pci/pci-host-common.c
index cdeb3a283a77..9653b92b5b2e 100644
--- a/xen/arch/arm/pci/pci-host-common.c
+++ b/xen/arch/arm/pci/pci-host-common.c
@@ -30,6 +30,8 @@ static LIST_HEAD(pci_host_bridges);

 static atomic_t domain_nr = ATOMIC_INIT(-1);

+static int use_dt_domains = -1;
+
 static inline void __iomem *pci_remap_cfgspace(paddr_t start, size_t len)
 {
     return ioremap_nocache(start, len);
@@ -137,14 +139,16 @@ void pci_add_host_bridge(struct pci_host_bridge *bridge)
     list_add_tail(&bridge->node, &pci_host_bridges);
 }

-static int pci_get_new_domain_nr(void)
+int pci_get_new_domain_nr(void)
 {
+    if ( use_dt_domains )
+        return -1;
+
     return atomic_inc_return(&domain_nr);
 }

 static int pci_bus_find_domain_nr(struct dt_device_node *dev)
 {
-    static int use_dt_domains = -1;
     int domain;

     domain = dt_get_pci_domain_nr(dev);
diff --git a/xen/include/asm-arm/pci.h b/xen/include/asm-arm/pci.h
index 81273e0d87ac..c20eba643d86 100644
--- a/xen/include/asm-arm/pci.h
+++ b/xen/include/asm-arm/pci.h
@@ -108,6 +108,8 @@ static
Re: About support for memory event on AMD CPUs
On 23.11.2021 18:14, YAN Untitled wrote:
> Dear developers,
>
> Short version: is memory event supported on AMD CPUs or is it going to be
> supported?
>
> Long version:
>
> Environment: LibVMI 0.15.0 + Xen 4.14.3 on an **AMD CPU** (5950x), 64-bit
> Debian 11 Dom0.
>
> What I am trying to accomplish: register a **memory event** handler, so that
> I can capture all memory accesses within a specific range issued by any thread,
> or all memory accesses within any range issued by a specific thread.
>
> What I got instead: an error from LibVMI saying
> "xc_hvm_set_mem_access failed with code: -1".
>
> Some investigation: by inspecting the source code of LibVMI, I find the direct
> cause is one of the libxc functions, 1) xc_set_mem_access or
> 2) xc_altp2m_set_mem_access, returned error code -1.
>
> After some searching, I found someone else having a similar problem [1]. I also
> noted LibVMI says:
>
>> Currently only the Xen Hypervisor provides these features,
>> and some of these are specifically only available on Intel CPUs
>
> However, I can't find the exact confirmation for the availability of memory
> event on AMD CPUs from https://wiki.xenproject.org.

Aiui underlying what you want is altp2m, which presently depends (in
the hypervisor) on EPT being available (and in use for the guest in
question).

Jan
Re: [PATCH v6 4/7] xen/arm: do not map PCI ECAM and MMIO space to Domain-0's p2m
Hi, Julien!

On 23.11.21 18:42, Julien Grall wrote:
> Hi Oleksandr,
>
> On 05/11/2021 06:33, Oleksandr Andrushchenko wrote:
>> From: Oleksandr Andrushchenko
>>
>> PCI host bridges are special devices in terms of implementing PCI
>> passthrough. According to [1] the current implementation depends on
>> Domain-0 to perform the initialization of the relevant PCI host
>> bridge hardware and perform PCI device enumeration. In order to
>> achieve that one of the required changes is to not map all the memory
>> ranges in map_range_to_domain as we traverse the device tree on startup
>> and perform some additional checks if the range needs to be mapped to
>> Domain-0.
>>
>> The generic PCI host controller device tree binding says [2]:
>> - ranges: As described in IEEE Std 1275-1994, but must provide
>>           at least a definition of non-prefetchable memory. One
>>           or both of prefetchable Memory and IO Space may also
>>           be provided.
>>
>> - reg : The Configuration Space base address and size, as accessed
>>         from the parent bus. The base address corresponds to
>>         the first bus in the "bus-range" property. If no
>>         "bus-range" is specified, this will be bus 0 (the default).
>>
>> From the above none of the memory ranges from the "ranges" property
>> needs to be mapped to Domain-0 at startup as MMIO mapping is going to
>> be handled dynamically by vPCI as we assign PCI devices, e.g. each
>> device assigned to Domain-0/guest will have its MMIOs mapped/unmapped
>> as needed by Xen.
>>
>> The "reg" property covers not only ECAM space, but may also have other
>> than the configuration memory ranges described, for example [3]:
>> - reg: Should contain rc_dbi, config registers location and length.
>> - reg-names: Must include the following entries:
>>   "rc_dbi": controller configuration registers;
>>   "config": PCIe configuration space registers.
>>
>> This patch makes it possible to not map all the ranges from the
>> "ranges" property and also ECAM from the "reg". All the rest from the
>> "reg" property still needs to be mapped to Domain-0, so the PCI
>> host bridge remains functional in Domain-0.
>
> The commit message is explaining the problematic quite well (thanks for
> that). I think it also wants to explain the approach taken (the fact
> that we are deferring the mapping for hostbridges).

Will add

>
>> [1] https://lists.xenproject.org/archives/html/xen-devel/2020-07/msg00777.html
>> [2] https://www.kernel.org/doc/Documentation/devicetree/bindings/pci/host-generic-pci.txt
>> [3] https://www.kernel.org/doc/Documentation/devicetree/bindings/pci/hisilicon-pcie.txt
>>
>> Signed-off-by: Oleksandr Andrushchenko
>> ---
>> Since v5:
>> - remove some need_mapping local variables
>> - use own_device in handle_device
>> - add __init for pci_ecam_need_p2m_hwdom_mapping
>> - make pci_host_bridge_mappings use p2m_mmio_direct_dev directly
>> Since v4:
>> - update skip_mapping comment
>> - add comment why we need to map interrupts to Dom0
>> Since v3:
>> - pass struct map_range_data to map_dt_irq_to_domain
>> - remove redundant check from map_range_to_domain
>> - fix handle_device's .skip_mapping
>> Since v2:
>> - removed check in map_range_to_domain for PCI_DEV
>>   and moved it to handle_device, so the code is simpler
>> - s/map_pci_bridge/skip_mapping
>> - extended comment in pci_host_bridge_mappings
>> - minor code restructure in construct_dom0
>> - s/.need_p2m_mapping/.need_p2m_hwdom_mapping and related callbacks
>> - unsigned int i; in pci_host_bridge_mappings
>> Since v1:
>> - Added better description of why and what needs to be mapped into
>>   Domain-0's p2m and what doesn't
>> - Do not do any mappings for PCI devices while traversing the DT
>> - Walk all the bridges and make required mappings in one go
>> ---
>>   xen/arch/arm/domain_build.c        | 67 +-
>>   xen/arch/arm/pci/ecam.c            | 14 +++
>>   xen/arch/arm/pci/pci-host-common.c | 50 ++
>>   xen/arch/arm/pci/pci-host-zynqmp.c |  1 +
>>   xen/include/asm-arm/pci.h          | 10 +
>>   xen/include/asm-arm/setup.h        | 13 ++
>>   6 files changed, 126 insertions(+), 29 deletions(-)
>>
>> diff --git a/xen/arch/arm/domain_build.c b/xen/arch/arm/domain_build.c
>> index f7fcb1400c19..c7d992456ca7 100644
>> --- a/xen/arch/arm/domain_build.c
>> +++ b/xen/arch/arm/domain_build.c
>> @@ -10,7 +10,6 @@
>>   #include
>>   #include
>>   #include
>>
Re: [xen-4.15-testing test] 166311: regressions - FAIL
On 24.11.2021 03:00, osstest service owner wrote:
> flight 166311 xen-4.15-testing real [real]
> http://logs.test-lab.xenproject.org/osstest/logs/166311/
>
> Regressions :-(
>
> Tests which did not succeed and are blocking,
> including tests which could not be run:
>  test-amd64-amd64-xl-qemuu-debianhvm-amd64 12 debian-hvm-install fail REGR. vs. 166198

At the example of this, I find:

Err:1 http://ftp.debian.org/debian buster/main amd64 genisoimage amd64 9:1.1.11-3+b2
  Could not connect to cache:3143 (172.16.148.6). - connect (113: No route to host)
Err:2 http://ftp.debian.org/debian buster/main amd64 rsync amd64 3.1.3-6
  Unable to connect to cache:3143:
E: Failed to fetch http://ftp.debian.org/debian/pool/main/c/cdrkit/genisoimage_1.1.11-3+b2_amd64.deb  Could not connect to cache:3143 (172.16.148.6). - connect (113: No route to host)
E: Failed to fetch http://ftp.debian.org/debian/pool/main/r/rsync/rsync_3.1.3-6_amd64.deb  Unable to connect to cache:3143:
E: Unable to fetch some archives, maybe run apt-get update or try with --fix-missing?

Network issue (hopefully just a transient one)?

Jan

>  test-amd64-i386-xl-qemuu-debianhvm-amd64-shadow 12 debian-hvm-install fail REGR. vs. 166198
>  test-amd64-i386-xl                   12 debian-install             fail REGR. vs. 166198
>  test-amd64-amd64-xl-credit2          12 debian-install             fail REGR. vs. 166198
>  test-amd64-amd64-xl-qemuu-ovmf-amd64 12 debian-hvm-install         fail REGR. vs. 166198
>  test-amd64-amd64-xl-qemuu-debianhvm-amd64-shadow 12 debian-hvm-install fail REGR. vs. 166198
>  test-amd64-i386-xl-qemuu-debianhvm-i386-xsm 12 debian-hvm-install  fail REGR. vs. 166198
>  test-amd64-i386-xl-qemut-stubdom-debianhvm-amd64-xsm 12 debian-hvm-install fail REGR. vs. 166198
>  test-amd64-i386-xl-qemuu-dmrestrict-amd64-dmrestrict 14 guest-start/debianhvm.repeat fail REGR. vs. 166198
>  test-amd64-i386-xl-qemut-debianhvm-amd64 12 debian-hvm-install     fail REGR. vs. 166198
>  test-amd64-amd64-xl-qemuu-debianhvm-i386-xsm 12 debian-hvm-install fail REGR. vs. 166198
>  test-amd64-i386-xl-qemuu-ovmf-amd64  12 debian-hvm-install         running
>  test-amd64-i386-xl-qemuu-ovmf-amd64   4 syslog-server              running
>
> Tests which did not succeed, but are not blocking:
>  test-amd64-amd64-xl-rtds             20 guest-localmigrate/x10     fail like 166198
>  test-amd64-amd64-xl-qemuu-win7-amd64 19 guest-stop                 fail like 166198
>  test-amd64-amd64-xl-qemut-win7-amd64 19 guest-stop                 fail like 166198
>  test-armhf-armhf-libvirt             16 saverestore-support-check  fail like 166198
>  test-amd64-amd64-qemuu-nested-amd    20 debian-hvm-install/l1/l2   fail like 166198
>  test-amd64-amd64-xl-qemuu-ws16-amd64 19 guest-stop                 fail like 166198
>  test-amd64-amd64-xl-qemut-ws16-amd64 19 guest-stop                 fail like 166198
>  test-amd64-i386-xl-qemuu-ws16-amd64  19 guest-stop                 fail like 166198
>  test-armhf-armhf-libvirt-raw         15 saverestore-support-check  fail like 166198
>  test-amd64-i386-xl-qemut-win7-amd64  19 guest-stop                 fail like 166198
>  test-amd64-i386-xl-qemuu-win7-amd64  19 guest-stop                 fail like 166198
>  test-armhf-armhf-libvirt-qcow2       15 saverestore-support-check  fail like 166198
>  test-amd64-i386-xl-qemut-ws16-amd64  19 guest-stop                 fail like 166198
>  test-amd64-amd64-libvirt             15 migrate-support-check      fail never pass
>  test-amd64-i386-libvirt              15 migrate-support-check      fail never pass
>  test-amd64-i386-libvirt-xsm          15 migrate-support-check      fail never pass
>  test-arm64-arm64-xl                  15 migrate-support-check      fail never pass
>  test-arm64-arm64-xl                  16 saverestore-support-check  fail never pass
>  test-arm64-arm64-xl-seattle          15 migrate-support-check      fail never pass
>  test-arm64-arm64-xl-seattle          16 saverestore-support-check  fail never pass
>  test-amd64-i386-xl-pvshim            14 guest-start                fail never pass
>  test-amd64-amd64-libvirt-xsm         15 migrate-support-check      fail never pass
>  test-amd64-i386-libvirt-qemuu-debianhvm-amd64-xsm 13 migrate-support-check fail never pass
>  test-arm64-arm64-xl-credit1          15 migrate-support-check      fail never pass
>  test-arm64-arm64-xl-credit1          16 saverestore-support-check  fail never pass
>  test-arm64-arm64-libvirt-xsm         15 migrate-support-check      fail never pass
>  test-arm64-arm64-libvirt-xsm         16 saverestore-support-check  fail never pass
>  test-arm64-arm64-xl-credit2          15 migrate-support-check      fail never pass
>  test-arm64-arm64-xl-xsm              15 migrate-support-check      fail never pass
>  test-arm64-arm64-xl-credit2          16 saverestore-support-check  fail never pass
>  test-arm64-arm64-xl-xsm              16 saverestore-support-check  fail never pass
>  test-arm64-arm64-xl-thunderx         15 migrate-support-check
Re: [PATCH v4] xen: detect uninitialized xenbus in xenbus_init
On 23.11.2021 22:07, Stefano Stabellini wrote: > From: Stefano Stabellini > > If the xenstore page hasn't been allocated properly, reading the value > of the related hvm_param (HVM_PARAM_STORE_PFN) won't actually return > error. Instead, it will succeed and return zero. Instead of attempting > to xen_remap a bad guest physical address, detect this condition and > return early. > > Note that although a guest physical address of zero for > HVM_PARAM_STORE_PFN is theoretically possible, it is not a good choice > and zero has never been validly used in that capacity. > > Also recognize all bits set as an invalid value. > > For 32-bit Linux, any pfn above ULONG_MAX would get truncated. Pfns > above ULONG_MAX should never be passed by the Xen tools to HVM guests > anyway, so check for this condition and return early. > > Cc: sta...@vger.kernel.org > Signed-off-by: Stefano Stabellini Reviewed-by: Jan Beulich
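The three invalid cases the commit message describes can be sketched in isolation. The helper below and its name are hypothetical illustration, not the kernel patch itself: it checks for a zero PFN (page never allocated), an all-bits-set PFN (invalid marker), and a PFN that would be truncated in a 32-bit kernel's unsigned long.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper mirroring the checks described above. The
 * ulong_max parameter stands in for the kernel's ULONG_MAX so the
 * 32-bit truncation case can be exercised from a 64-bit build. */
static bool xenstore_pfn_is_valid(uint64_t pfn, uint64_t ulong_max)
{
    if (pfn == 0)            /* xenstore page was never allocated */
        return false;
    if (pfn == UINT64_MAX)   /* all bits set: invalid marker */
        return false;
    if (pfn > ulong_max)     /* would truncate on a 32-bit kernel */
        return false;
    return true;
}
```

With a 32-bit ULONG_MAX of 0xffffffff, a PFN of 0x100000000 is rejected even though it is a perfectly representable 64-bit value, which is the truncation hazard the patch guards against.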
Re: [PATCH v6 3/7] xen/arm: setup MMIO range trap handlers for hardware domain
Hi, Julien! On 23.11.21 18:58, Julien Grall wrote: > Hi, > > On 23/11/2021 16:41, Oleksandr Andrushchenko wrote: >> On 23.11.21 18:12, Julien Grall wrote: >>> On 23/11/2021 06:58, Oleksandr Andrushchenko wrote: unsigned int domain_vpci_get_num_mmio_handlers(struct domain *d) { if ( !has_vpci(d) ) return 0; if ( is_hardware_domain(d) ) { int ret = pci_host_iterate_bridges_and_count(d, vpci_get_num_handlers_cb); return ret < 0 ? 0 : ret; } /* * This is a guest domain: * * 1 for a single emulated host bridge's configuration space. */ return 1; >>> >>> I am afraid that my question stands even with this approach. This patch is >>> only meant to handle the hardware domain, therefore the change seems to be >>> out of context. >>> >>> I would prefer if this change is done separately. >> While I do agree that MSI part and virtual bus topology are not belonging to >> this >> patch I can't agree with the rest: we already have MMIO handlers for guest >> domains >> and we introduce domain_vpci_get_num_mmio_handlers which must also account >> on guests and stay consistent. >> So, despite the patch has "hardware domain" in its name it doesn't mean we >> should >> break guests here. > > We were already registering the handler for guest domain before your patch. > So this is nothing new. > > At the moment, we always allocate an extra 16 slot for IO handlers (see > MAX_IO_HANDLER). So we are not breaking anything. Instead, this is simply a > latent bug. Agree > >> Thus I do think the above is still correct wrt this patch. > > The idea of splitting patch is to separate bug fix from new code. This helps > backporting and review. > > In this case, we don't care about backport (PCI passthrough is no supported) > and the fix a simple enough. So I am not going to insist on splitting to a > separate patch. > > However, this change *must* be explained in the commit message. I will add a dedicated patch to fix that > > Cheers, > Thank you, Oleksandr
Re: [PATCH 1/3] VT-d: prune SAGAW recognition
On 24.11.2021 02:22, Tian, Kevin wrote: >> From: Jan Beulich >> Sent: Tuesday, November 23, 2021 9:40 PM >> >> Bit 0 of the capability register field has become reserved at or before > > Bit 0 of 'SAGAW' in the capability register ... I've changed it, but I thought the use of "field" in the sentence together with the title would be entirely unambiguous. >> spec version 2.2. Treat it as such. Replace the effective open-coding of >> find_first_set_bit(). Adjust local variable types. >> >> Signed-off-by: Jan Beulich >> --- >> Strictly speaking IOMMUs supporting only 3-level tables ought to result >> in guests seeing a suitably reduced physical address width in CPUID. >> And then the same would apply to restrictions resulting from MGAW. > > Yes. I remember there was some old discussion in Qemu community > for whether guest physical addr width should be based on IOMMU > constraints when passthrough device is used. But it didn't go anywhere > (and I cannot find the link...) I've added an item to my todo list. > anyway with above comment fixed: > > Reviewed-by: Kevin Tian Thanks. Jan
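The SAGAW pruning under discussion can be illustrated with a small standalone sketch. This is not Xen's actual VT-d code (which uses find_first_set_bit() in the IOMMU driver); it assumes the usual encoding where SAGAW bit n advertises (n+2)-level page tables (AGAW of 30 + 9*n bits), with bit 0 (2-level, 30-bit) masked off as reserved, as the patch does.

```c
#include <assert.h>

/* Bit 0 of SAGAW (2-level tables) became reserved at or before
 * VT-d spec 2.2; ignore it before searching for the lowest
 * supported page-table depth. */
#define SAGAW_RESERVED_BITS 0x1u

static unsigned int sagaw_to_levels(unsigned int sagaw)
{
    sagaw &= ~SAGAW_RESERVED_BITS;
    if (!sagaw)
        return 0; /* no supported (non-reserved) width */
    /* __builtin_ctz plays the role of find_first_set_bit() here */
    return __builtin_ctz(sagaw) + 2;
}
```

For example, a capability advertising bits 1 and 2 (0x6) yields 3-level tables as the minimum supported depth, while a capability advertising only the reserved bit 0 yields nothing usable.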
Re: [PATCH v4] xenbus: support large messages
Ping? On 04.10.21 11:40, Juergen Gross wrote: Today the implementation of the xenbus protocol in Mini-OS will only allow to transfer the complete message to or from the ring page buffer. This is limiting the maximum message size to lower values as the xenbus protocol normally would allow. Change that by allowing to transfer the xenbus message in chunks as soon as they are available. Avoid crashing Mini-OS in case of illegal data read from the ring buffer. Signed-off-by: Juergen Gross --- V2: - drop redundant if (Samuel Thibault) - move rmb() (Samuel Thibault) V3: - correct notification test (Samuel Thibault) V4: - more memory barriers (Samuel Thibault) --- xenbus/xenbus.c | 210 1 file changed, 122 insertions(+), 88 deletions(-) diff --git a/xenbus/xenbus.c b/xenbus/xenbus.c index 23de61e..b687678 100644 --- a/xenbus/xenbus.c +++ b/xenbus/xenbus.c @@ -29,6 +29,7 @@ #include #include #include +#include #define min(x,y) ({ \ typeof(x) tmpx = (x); \ @@ -46,6 +47,7 @@ static struct xenstore_domain_interface *xenstore_buf; static DECLARE_WAIT_QUEUE_HEAD(xb_waitq); DECLARE_WAIT_QUEUE_HEAD(xenbus_watch_queue); +static __DECLARE_SEMAPHORE_GENERIC(xb_write_sem, 1); xenbus_event_queue xenbus_events; static struct watch { @@ -231,75 +233,103 @@ char *xenbus_wait_for_state_change(const char* path, XenbusState *state, xenbus_ } +static void xenbus_read_data(char *buf, unsigned int len) +{ +unsigned int off = 0; +unsigned int prod, cons; +unsigned int size; + +while (off != len) +{ +wait_event(xb_waitq, xenstore_buf->rsp_prod != xenstore_buf->rsp_cons); + +prod = xenstore_buf->rsp_prod; +cons = xenstore_buf->rsp_cons; +DEBUG("Rsp_cons %d, rsp_prod %d.\n", cons, prod); +size = min(len - off, prod - cons); + +rmb(); /* Make sure data read from ring is ordered with rsp_prod. */ +memcpy_from_ring(xenstore_buf->rsp, buf + off, + MASK_XENSTORE_IDX(cons), size); +off += size; +mb();/* memcpy() and rsp_cons update must not be reordered. 
 */
+xenstore_buf->rsp_cons += size;
+mb();   /* rsp_cons must be visible before we look at rsp_prod. */
+if (xenstore_buf->rsp_prod - cons >= XENSTORE_RING_SIZE)
+notify_remote_via_evtchn(xenbus_evtchn);
+}
+}
+
 static void xenbus_thread_func(void *ign)
 {
 struct xsd_sockmsg msg;
-unsigned prod = xenstore_buf->rsp_prod;
+char *data;

 for (;;)
 {
-wait_event(xb_waitq, prod != xenstore_buf->rsp_prod);
-while (1) {
-prod = xenstore_buf->rsp_prod;
-DEBUG("Rsp_cons %d, rsp_prod %d.\n", xenstore_buf->rsp_cons,
-  xenstore_buf->rsp_prod);
-if (xenstore_buf->rsp_prod - xenstore_buf->rsp_cons < sizeof(msg))
-break;
-rmb();
-memcpy_from_ring(xenstore_buf->rsp, &msg,
- MASK_XENSTORE_IDX(xenstore_buf->rsp_cons),
- sizeof(msg));
-DEBUG("Msg len %d, %d avail, id %d.\n", msg.len + sizeof(msg),
-  xenstore_buf->rsp_prod - xenstore_buf->rsp_cons, msg.req_id);
-
-if (xenstore_buf->rsp_prod - xenstore_buf->rsp_cons <
-sizeof(msg) + msg.len)
-break;
-
-DEBUG("Message is good.\n");
-
-if (msg.type == XS_WATCH_EVENT) {
-struct xenbus_event *event = malloc(sizeof(*event) + msg.len);
-xenbus_event_queue *events = NULL;
-char *data = (char*)event + sizeof(*event);
-struct watch *watch;
-
-memcpy_from_ring(xenstore_buf->rsp, data,
-MASK_XENSTORE_IDX(xenstore_buf->rsp_cons + sizeof(msg)),
-msg.len);
-
-event->path = data;
-event->token = event->path + strlen(event->path) + 1;
-
-mb();
-xenstore_buf->rsp_cons += msg.len + sizeof(msg);
-
-for (watch = watches; watch; watch = watch->next)
-if (!strcmp(watch->token, event->token)) {
-events = watch->events;
-break;
-}
-
-if (events) {
-event->next = *events;
-*events = event;
-wake_up(&xenbus_watch_queue);
-} else {
-printk("unexpected watch token %s\n", event->token);
-free(event);
+xenbus_read_data((char *)&msg, sizeof(msg));
+DEBUG("Msg len %d, %d avail, id %d.\n", msg.len + sizeof(msg),
+  xenstore_buf->rsp_prod - xenstore_buf->rsp_cons, msg.req_id);
+
+if (msg.len > XENSTORE_PAYLOAD_MAX) {
+printk("Xenstore violates protocol, message longer
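The chunked transfer in this patch relies on the xenstore ring's free-running indices: rsp_prod and rsp_cons only ever increase and are masked just before touching the buffer, so unsigned subtraction gives the number of unread bytes even after the 32-bit counters wrap. A minimal standalone illustration (the names and ring size here are local to the sketch, not Mini-OS's):

```c
#include <assert.h>
#include <stdint.h>

/* Free-running producer/consumer indices: never reset, only masked
 * at access time. RING_SIZE must be a power of two for the mask to
 * be equivalent to a modulo. */
#define RING_SIZE 1024u
#define MASK_IDX(idx) ((idx) & (RING_SIZE - 1))

static uint32_t ring_unread(uint32_t prod, uint32_t cons)
{
    /* Modular (wrapping) unsigned arithmetic makes this correct
     * even when prod has wrapped past UINT32_MAX and cons has not. */
    return prod - cons;
}
```

This is why the patch can compare `xenstore_buf->rsp_prod - cons` against XENSTORE_RING_SIZE directly, and why indices are never normalized back into the buffer's range.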
Re: [PATCH v6 2/7] xen/arm: add pci-domain for disabled devices
Hi, Julien! On 23.11.21 19:15, Julien Grall wrote: > Hi, > > On 23/11/2021 16:44, Oleksandr Andrushchenko wrote: >> On 23.11.21 18:05, Julien Grall wrote: >>> Hi Oleksandr, >>> >>> On 23/11/2021 06:31, Oleksandr Andrushchenko wrote: On 22.11.21 19:17, Julien Grall wrote: > Hi, > > On 22/11/2021 16:23, Oleksandr Andrushchenko wrote: >> On 22.11.21 17:29, Julien Grall wrote: >>> I would prefer to go with my way. This can be refined in the future if >>> we find Device-Tree that matches what you wrote. >> Ok, so just to make it clear: >> >a PCI device would always be described as a child of the >> hostbridges. So I would rework the 'if' to also check if the parent type >> is not "pci" > > That's correct. Thank you! Ok, so how about if ( is_pci_passthrough_enabled() && dt_device_type_is_equal(node, "pci") ) { bool skip = false; /* * If the parent is also a "pci" device, then "linux,pci-domain" * should already be there, so nothing to do then. */ >>> >>> This comment is a bit confusing. >> Do you have something on your mind? > > Yes, explain that we only want to cover hostbridge (see my reply on the > previous answer). I guess this will be explained by the comment to handle_linux_pci_domain below > >>> I think what matter if the parent is a "pci" device, then the current node >>> must not be a hostbridge. So we can skip it. >> By skipping you only mean we do not need to add/assign "linux,pci-domain", >> right? > > Yes. > >>> However... >>> if ( node->parent && dt_device_type_is_equal(node->parent, "pci") ) skip = true; if ( !skip && !dt_find_property(node, "linux,pci-domain", NULL) ) { I played with a single if and it looks scary... >>> >>> ... how about introducing an helper that will return true if this node is >>> likely an hostbridge? >> This is yet a single use of such a check: why would we need a helper and >> what would that >> helper do? 
> > I like splitting code in multiple functions even if they are only called once because this:
> 1) keeps the functions line count small
> 2) easier to understand what is the purpose of the check
>
> In fact, I would actually move the handling for "linux,pci-domain" in a separate helper. Something like:
>
> /*
>  * When PCI passthrough is available we want to keep the
>  * "linux,pci-domain" in sync for every hostbridge.
>  *
>  * Xen may not have a driver for all the hostbridges. So we have
>  * to write a heuristic to detect whether a device node describes
>  * a hostbridge.
>  *
>  * The current heuristic assumes that a device is a hostbridge
>  * if the type is "pci" and the parent type is not "pci".
>  */
> static int handle_linux_pci_domain(struct dt_device *node)
> {
>     if ( !is_pci_passthrough_enabled() )
>         return 0;
>
>     if ( !dt_device_type_is_equal(node, "pci") )
>         return 0;
>
>     if ( node->parent && dt_device_type_is_equal(node->parent, "pci") )
>         return 0;
>
>     if ( dt_find_property(node, "linux,pci-domain", NULL) )
>         return 0;
>
>     /* Allocate and create the linux,pci-domain */
> }
>
I'm fine with this, will move, thanks
> Cheers,
Thank you, Oleksandr
RE: [PATCH] cpu/hotplug: Allow the CPU in CPU_UP_PREPARE state to be brought up again.
Hi, > -Original Message- > From: Xen-devel On Behalf Of > Sebastian Andrzej Siewior > Sent: Monday, November 22, 2021 11:47 PM > To: Longpeng (Mike, Cloud Infrastructure Service Product Dept.) > > Cc: linux-ker...@vger.kernel.org; Gonglei (Arei) ; > x...@kernel.org; xen-devel@lists.xenproject.org; Peter Zijlstra > ; Ingo Molnar ; Valentin > Schneider ; Boris Ostrovsky > ; Juergen Gross ; Stefano > Stabellini ; Thomas Gleixner ; > Ingo Molnar ; Borislav Petkov ; Dave > Hansen ; H. Peter Anvin > Subject: [PATCH] cpu/hotplug: Allow the CPU in CPU_UP_PREPARE state to be > brought up again. > > From: "Longpeng(Mike)" > > A CPU will not show up in virtualized environment which includes an > Enclave. The VM splits its resources into a primary VM and a Enclave > VM. While the Enclave is active, the hypervisor will ignore all requests > to bring up a CPU and this CPU will remain in CPU_UP_PREPARE state. > The kernel will wait up to ten seconds for CPU to show up > (do_boot_cpu()) and then rollback the hotplug state back to > CPUHP_OFFLINE leaving the CPU state in CPU_UP_PREPARE. The CPU state is > set back to CPUHP_TEARDOWN_CPU during the CPU_POST_DEAD stage. > > After the Enclave VM terminates, the primary VM can bring up the CPU > again. > > Allow to bring up the CPU if it is in the CPU_UP_PREPARE state. > > [bigeasy: Rewrite commit description.] > > Signed-off-by: Longpeng(Mike) > Signed-off-by: Sebastian Andrzej Siewior > Link: https://lore.kernel.org/r/20210901051143.2752-1- > longpe...@huawei.com > --- > > For XEN: this changes the behaviour as it allows to invoke > cpu_initialize_context() again should it have have earlier. I *think* > this is okay and would to bring up the CPU again should the memory > allocation in cpu_initialize_context() fail. 
> > kernel/smpboot.c | 7 +++ > 1 file changed, 7 insertions(+) > > diff --git a/kernel/smpboot.c b/kernel/smpboot.c > index f6bc0bc8a2aab..34958d7fe2c1c 100644 > --- a/kernel/smpboot.c > +++ b/kernel/smpboot.c > @@ -392,6 +392,13 @@ int cpu_check_up_prepare(int cpu) >*/ > return -EAGAIN; > > + case CPU_UP_PREPARE: > + /* > + * Timeout while waiting for the CPU to show up. Allow to try > + * again later. > + */ > + return 0; > + > default: > > /* Should not happen. Famous last words. */ > -- > 2.33.1 > Reviewed-by: Henry Wang Kind regards, Henry
Re: [PATCH] cpu/hotplug: Allow the CPU in CPU_UP_PREPARE state to be brought up again.
On 11/23/21 3:50 PM, Longpeng (Mike, Cloud Infrastructure Service Product Dept.) wrote: > > >> -Original Message- >> From: Dongli Zhang [mailto:dongli.zh...@oracle.com] >> Sent: Wednesday, November 24, 2021 5:22 AM >> To: Sebastian Andrzej Siewior ; Longpeng (Mike, Cloud >> Infrastructure Service Product Dept.) >> Cc: linux-ker...@vger.kernel.org; Gonglei (Arei) ; >> x...@kernel.org; xen-devel@lists.xenproject.org; Peter Zijlstra >> ; Ingo Molnar ; Valentin Schneider >> ; Boris Ostrovsky ; >> Juergen Gross ; Stefano Stabellini ; >> Thomas Gleixner ; Ingo Molnar ; >> Borislav >> Petkov ; Dave Hansen ; H. Peter >> Anvin >> Subject: Re: [PATCH] cpu/hotplug: Allow the CPU in CPU_UP_PREPARE state to be >> brought up again. >> >> Tested-by: Dongli Zhang >> >> >> The bug fixed by commit 53fafdbb8b21 ("KVM: x86: switch KVMCLOCK base to >> monotonic raw clock") may leave the cpu_hotplug_state at CPU_UP_PREPARE. As a >> result, to online this CPU again (even after removal) is always failed. >> >> I have tested that this patch works well to workaround the issue, by >> introducing >> either a mdeley(11000) or while(1); to start_secondary(). That is, to online >> the >> same CPU again is successful even after initial do_boot_cpu() failure. >> >> 1. add mdelay(11000) or while(1); to the start_secondary(). >> >> 2. to online CPU is failed at do_boot_cpu(). >> > > Thanks for your testing :) > > Does the cpu4 spin in wait_for_master_cpu() in your case ? I did two tests. TEST 1. I added "mdelay(11000);" as the first line in start_secondary(). Once the issue was encountered, the RIP of CPU=4 was 8c242021 (from QEMU's "info registers -a") which was in the range of wait_for_master_cpu(). # cat /proc/kallsyms | grep 8c2420 8c242010 t wait_for_master_cpu 8c242030 T load_fixmap_gdt 8c242060 T native_write_cr4 8c2420c0 T cr4_init TEST 2. I added "while(true);" as the first line in start_secondary(). 
Once the issue was encountered, the RIP of CPU=4 was 91654c0a (from QEMU's "info registers -a") which was in the range of start_secondary(). # cat /proc/kallsyms | grep 91654c0 91654c00 t start_secondary Dongli Zhang > >> 3. to online CPU again is failed without this patch. >> >> # echo 1 > /sys/devices/system/cpu/cpu4/online >> -su: echo: write error: Input/output error >> >> 4. to online CPU again is successful with this patch. >> >> Thank you very much! >> >> Dongli Zhang >> >> On 11/22/21 7:47 AM, Sebastian Andrzej Siewior wrote: >>> From: "Longpeng(Mike)" >>> >>> A CPU will not show up in virtualized environment which includes an >>> Enclave. The VM splits its resources into a primary VM and a Enclave >>> VM. While the Enclave is active, the hypervisor will ignore all requests >>> to bring up a CPU and this CPU will remain in CPU_UP_PREPARE state. >>> The kernel will wait up to ten seconds for CPU to show up >>> (do_boot_cpu()) and then rollback the hotplug state back to >>> CPUHP_OFFLINE leaving the CPU state in CPU_UP_PREPARE. The CPU state is >>> set back to CPUHP_TEARDOWN_CPU during the CPU_POST_DEAD stage. >>> >>> After the Enclave VM terminates, the primary VM can bring up the CPU >>> again. >>> >>> Allow to bring up the CPU if it is in the CPU_UP_PREPARE state. >>> >>> [bigeasy: Rewrite commit description.] >>> >>> Signed-off-by: Longpeng(Mike) >>> Signed-off-by: Sebastian Andrzej Siewior >>> Link: >> https://urldefense.com/v3/__https://lore.kernel.org/r/20210901051143.2752-1 >> -longpeng2@huawei.com__;!!ACWV5N9M2RV99hQ!d4sCCXMQV7ekFwpd21vo1_9K-m5h4VZ-g >> E8Z62PLL58DT4VJ6StH57TR_KpBdbwhBE0$ >>> --- >>> >>> For XEN: this changes the behaviour as it allows to invoke >>> cpu_initialize_context() again should it have have earlier. I *think* >>> this is okay and would to bring up the CPU again should the memory >>> allocation in cpu_initialize_context() fail. 
>>> >>> kernel/smpboot.c | 7 +++ >>> 1 file changed, 7 insertions(+) >>> >>> diff --git a/kernel/smpboot.c b/kernel/smpboot.c >>> index f6bc0bc8a2aab..34958d7fe2c1c 100644 >>> --- a/kernel/smpboot.c >>> +++ b/kernel/smpboot.c >>> @@ -392,6 +392,13 @@ int cpu_check_up_prepare(int cpu) >>> */ >>> return -EAGAIN; >>> >>> + case CPU_UP_PREPARE: >>> + /* >>> +* Timeout while waiting for the CPU to show up. Allow to try >>> +* again later. >>> +*/ >>> + return 0; >>> + >>> default: >>> >>> /* Should not happen. Famous last words. */ >>>
Re: [PATCH V2 3/4] xen/unpopulated-alloc: Add mechanism to use Xen resource
On 23.11.21 17:46, Oleksandr wrote: On 20.11.21 04:19, Stefano Stabellini wrote: Hi Stefano, Juergen, all Juergen please see the bottom of the email On Fri, 19 Nov 2021, Oleksandr wrote: On 19.11.21 02:59, Stefano Stabellini wrote: On Tue, 9 Nov 2021, Oleksandr wrote: On 28.10.21 19:37, Stefano Stabellini wrote: Hi Stefano I am sorry for the late response. On Tue, 26 Oct 2021, Oleksandr Tyshchenko wrote: From: Oleksandr Tyshchenko The main reason of this change is that unpopulated-alloc code cannot be used in its current form on Arm, but there is a desire to reuse it to avoid wasting real RAM pages for the grant/foreign mappings. The problem is that system "iomem_resource" is used for the address space allocation, but the really unallocated space can't be figured out precisely by the domain on Arm without hypervisor involvement. For example, not all device I/O regions are known by the time domain starts creating grant/foreign mappings. And following the advise from "iomem_resource" we might end up reusing these regions by a mistake. So, the hypervisor which maintains the P2M for the domain is in the best position to provide unused regions of guest physical address space which could be safely used to create grant/foreign mappings. Introduce new helper arch_xen_unpopulated_init() which purpose is to create specific Xen resource based on the memory regions provided by the hypervisor to be used as unused space for Xen scratch pages. If arch doesn't implement arch_xen_unpopulated_init() to initialize Xen resource the default "iomem_resource" will be used. So the behavior on x86 won't be changed. Also fall back to allocate xenballooned pages (steal real RAM pages) if we do not have any suitable resource to work with and as the result we won't be able to provide unpopulated pages. 
Signed-off-by: Oleksandr Tyshchenko
---
Changes RFC -> V2:
 - new patch, instead of "[RFC PATCH 2/2] xen/unpopulated-alloc: Query hypervisor to provide unallocated space"
---
 drivers/xen/unpopulated-alloc.c | 89 +++--
 include/xen/xen.h | 2 +
 2 files changed, 88 insertions(+), 3 deletions(-)

diff --git a/drivers/xen/unpopulated-alloc.c b/drivers/xen/unpopulated-alloc.c
index a03dc5b..1f1d8d8 100644
--- a/drivers/xen/unpopulated-alloc.c
+++ b/drivers/xen/unpopulated-alloc.c
@@ -8,6 +8,7 @@
 #include
+#include
 #include
 #include
@@ -15,13 +16,29 @@
 static DEFINE_MUTEX(list_lock);
 static struct page *page_list;
 static unsigned int list_count;
+static struct resource *target_resource;
+static struct resource xen_resource = {
+	.name = "Xen unused space",
+};
+
+/*
+ * If arch is not happy with system "iomem_resource" being used for
+ * the region allocation it can provide it's own view by initializing
+ * "xen_resource" with unused regions of guest physical address space
+ * provided by the hypervisor.
+ */
+int __weak arch_xen_unpopulated_init(struct resource *res)
+{
+	return -ENOSYS;
+}
+
 static int fill_list(unsigned int nr_pages)
 {
 	struct dev_pagemap *pgmap;
-	struct resource *res;
+	struct resource *res, *tmp_res = NULL;
 	void *vaddr;
 	unsigned int i, alloc_pages = round_up(nr_pages, PAGES_PER_SECTION);
-	int ret = -ENOMEM;
+	int ret;

 	res = kzalloc(sizeof(*res), GFP_KERNEL);
 	if (!res)
@@ -30,7 +47,7 @@
 	res->name = "Xen scratch";
 	res->flags = IORESOURCE_MEM | IORESOURCE_BUSY;

-	ret = allocate_resource(&iomem_resource, res,
+	ret = allocate_resource(target_resource, res,
 				alloc_pages * PAGE_SIZE, 0, -1,
 				PAGES_PER_SECTION * PAGE_SIZE, NULL, NULL);
 	if (ret < 0) {
@@ -38,6 +55,31 @@
 		goto err_resource;
 	}

+	/*
+	 * Reserve the region previously allocated from Xen resource to avoid
+	 * re-using it by someone else.
+	 */
+	if (target_resource != &iomem_resource) {
+		tmp_res = kzalloc(sizeof(*tmp_res), GFP_KERNEL);
+		if (!res) {
+			ret = -ENOMEM;
+			goto err_insert;
+		}
+
+		tmp_res->name = res->name;
+		tmp_res->start = res->start;
+		tmp_res->end = res->end;
+		tmp_res->flags = res->flags;
+
+		ret = insert_resource(&iomem_resource, tmp_res);
+		if (ret < 0) {
+			pr_err("Cannot insert IOMEM resource [%llx - %llx]\n",
+			       tmp_res->start, tmp_res->end);
+			kfree(tmp_res);
+			goto err_insert;
+		}
+	}

I am a bit confused.. why do we need to do this? Who could be erroneously re-using the region? Are you saying that the next time allocate_resource is called it could find the same region again? It doesn't seem possible?

No, as I understand the allocate_resource() being called for the same root resource won't provide the same region...
Re: [PATCH v4] xen: detect uninitialized xenbus in xenbus_init
On 23.11.21 22:07, Stefano Stabellini wrote:

From: Stefano Stabellini

If the xenstore page hasn't been allocated properly, reading the value of the related hvm_param (HVM_PARAM_STORE_PFN) won't actually return error. Instead, it will succeed and return zero. Instead of attempting to xen_remap a bad guest physical address, detect this condition and return early.

Note that although a guest physical address of zero for HVM_PARAM_STORE_PFN is theoretically possible, it is not a good choice and zero has never been validly used in that capacity.

Also recognize all bits set as an invalid value.

For 32-bit Linux, any pfn above ULONG_MAX would get truncated. Pfns above ULONG_MAX should never be passed by the Xen tools to HVM guests anyway, so check for this condition and return early.

Cc: sta...@vger.kernel.org
Signed-off-by: Stefano Stabellini

Reviewed-by: Juergen Gross

Juergen
About support for memory event on AMD CPUs
Dear developers,

Short version: is memory event supported on AMD CPUs, or is it going to be supported?

Long version:

Environment: LibVMI 0.15.0 + Xen 4.14.3 on an **AMD CPU** (5950x), 64-bit Debian 11 Dom0.

What I am trying to accomplish: register a **memory event** handler, so that I can capture all memory accesses within a specific range issued by any thread, or all memory accesses within any range issued by a specific thread.

What I got instead: an error from LibVMI saying "xc_hvm_set_mem_access failed with code: -1".

Some investigation: by inspecting the source code of LibVMI, I find the direct cause is that one of the libxc functions, 1) xc_set_mem_access or 2) xc_altp2m_set_mem_access, returned error code -1. After some searching, I found someone else having a similar problem [1]. I also noted LibVMI says:

> Currently only the Xen Hypervisor provides these features,
> and some of these are specifically only available on Intel CPUs

However, I can't find the exact confirmation for the availability of memory event on AMD CPUs from https://wiki.xenproject.org. Any suggestion?

(Sorry I was sending this email to the wrong place)

[1] https://github.com/libvmi/libvmi/pull/709#discussion_r353729777

YAN
Re: [PATCH] cpu/hotplug: Allow the CPU in CPU_UP_PREPARE state to be brought up again.
On 22/11/21 16:47, Sebastian Andrzej Siewior wrote: > From: "Longpeng(Mike)" > > A CPU will not show up in virtualized environment which includes an > Enclave. The VM splits its resources into a primary VM and a Enclave > VM. While the Enclave is active, the hypervisor will ignore all requests > to bring up a CPU and this CPU will remain in CPU_UP_PREPARE state. > The kernel will wait up to ten seconds for CPU to show up > (do_boot_cpu()) and then rollback the hotplug state back to > CPUHP_OFFLINE leaving the CPU state in CPU_UP_PREPARE. The CPU state is > set back to CPUHP_TEARDOWN_CPU during the CPU_POST_DEAD stage. > For my own education, this is talking about *host* CPU hotplug, right? > After the Enclave VM terminates, the primary VM can bring up the CPU > again. > > Allow to bring up the CPU if it is in the CPU_UP_PREPARE state. > > [bigeasy: Rewrite commit description.] > > Signed-off-by: Longpeng(Mike) > Signed-off-by: Sebastian Andrzej Siewior > Link: https://lore.kernel.org/r/20210901051143.2752-1-longpe...@huawei.com > --- > > For XEN: this changes the behaviour as it allows to invoke > cpu_initialize_context() again should it have have earlier. I *think* > this is okay and would to bring up the CPU again should the memory > allocation in cpu_initialize_context() fail. Virt stuff notwithstanding, that looks OK to me. Reviewed-by: Valentin Schneider That said, AFAICT only powerpc makes actual use of the state being set to CPU_UP_PREPARE, it looks to be needless bookkeeping for everyone else (and there's archs that seem happy using only CPU_DEAD / CPU_POST_DEAD). > > kernel/smpboot.c | 7 +++ > 1 file changed, 7 insertions(+) > > diff --git a/kernel/smpboot.c b/kernel/smpboot.c > index f6bc0bc8a2aab..34958d7fe2c1c 100644 > --- a/kernel/smpboot.c > +++ b/kernel/smpboot.c > @@ -392,6 +392,13 @@ int cpu_check_up_prepare(int cpu) >*/ > return -EAGAIN; > > + case CPU_UP_PREPARE: > + /* > + * Timeout while waiting for the CPU to show up. 
Allow to try > + * again later. > + */ > + return 0; > + > default: > > /* Should not happen. Famous last words. */ > -- > 2.33.1
[xen-unstable-smoke test] 166332: tolerable all pass - PUSHED
flight 166332 xen-unstable-smoke real [real]
http://logs.test-lab.xenproject.org/osstest/logs/166332/

Failures :-/ but no regressions.

Tests which did not succeed, but are not blocking:
 test-amd64-amd64-libvirt     15 migrate-support-check        fail never pass
 test-arm64-arm64-xl-xsm      15 migrate-support-check        fail never pass
 test-arm64-arm64-xl-xsm      16 saverestore-support-check    fail never pass
 test-armhf-armhf-xl          15 migrate-support-check        fail never pass
 test-armhf-armhf-xl          16 saverestore-support-check    fail never pass

version targeted for testing:
 xen                  a5706b80f42e028c5153fc50734a1e86a99ff9d2
baseline version:
 xen                  74a11c43fd7e074b1f77631b446dd2115eacb9e8

Last test of basis   166312  2021-11-23 13:00:27 Z  0 days
Testing same since   166326  2021-11-23 18:01:44 Z  0 days  2 attempts

People who touched revisions under test:
  Ian Jackson

jobs:
 build-arm64-xsm                                              pass
 build-amd64                                                  pass
 build-armhf                                                  pass
 build-amd64-libvirt                                          pass
 test-armhf-armhf-xl                                          pass
 test-arm64-arm64-xl-xsm                                      pass
 test-amd64-amd64-xl-qemuu-debianhvm-amd64                    pass
 test-amd64-amd64-libvirt                                     pass

sg-report-flight on osstest.test-lab.xenproject.org
logs: /home/logs/logs
images: /home/logs/images

Logs, config files, etc. are available at
 http://logs.test-lab.xenproject.org/osstest/logs

Explanation of these reports, and of osstest in general, is at
 http://xenbits.xen.org/gitweb/?p=osstest.git;a=blob;f=README.email;hb=master
 http://xenbits.xen.org/gitweb/?p=osstest.git;a=blob;f=README;hb=master

Test harness code can be found at
 http://xenbits.xen.org/gitweb?p=osstest.git;a=summary

Pushing revision :

To xenbits.xen.org:/home/xen/git/xen.git
   74a11c43fd..a5706b80f4  a5706b80f42e028c5153fc50734a1e86a99ff9d2 -> smoke
[xen-4.15-testing test] 166311: regressions - FAIL
flight 166311 xen-4.15-testing real [real]
http://logs.test-lab.xenproject.org/osstest/logs/166311/

Regressions :-(

Tests which did not succeed and are blocking, including tests which could not be run:
 test-amd64-amd64-xl-qemuu-debianhvm-amd64 12 debian-hvm-install fail REGR. vs. 166198
 test-amd64-i386-xl-qemuu-debianhvm-amd64-shadow 12 debian-hvm-install fail REGR. vs. 166198
 test-amd64-i386-xl 12 debian-install fail REGR. vs. 166198
 test-amd64-amd64-xl-credit2 12 debian-install fail REGR. vs. 166198
 test-amd64-amd64-xl-qemuu-ovmf-amd64 12 debian-hvm-install fail REGR. vs. 166198
 test-amd64-amd64-xl-qemuu-debianhvm-amd64-shadow 12 debian-hvm-install fail REGR. vs. 166198
 test-amd64-i386-xl-qemuu-debianhvm-i386-xsm 12 debian-hvm-install fail REGR. vs. 166198
 test-amd64-i386-xl-qemut-stubdom-debianhvm-amd64-xsm 12 debian-hvm-install fail REGR. vs. 166198
 test-amd64-i386-xl-qemuu-dmrestrict-amd64-dmrestrict 14 guest-start/debianhvm.repeat fail REGR. vs. 166198
 test-amd64-i386-xl-qemut-debianhvm-amd64 12 debian-hvm-install fail REGR. vs. 166198
 test-amd64-amd64-xl-qemuu-debianhvm-i386-xsm 12 debian-hvm-install fail REGR. vs. 166198
 test-amd64-i386-xl-qemuu-ovmf-amd64 12 debian-hvm-install running
 test-amd64-i386-xl-qemuu-ovmf-amd64 4 syslog-server running

Tests which did not succeed, but are not blocking:
 test-amd64-amd64-xl-rtds 20 guest-localmigrate/x10 fail like 166198
 test-amd64-amd64-xl-qemuu-win7-amd64 19 guest-stop fail like 166198
 test-amd64-amd64-xl-qemut-win7-amd64 19 guest-stop fail like 166198
 test-armhf-armhf-libvirt 16 saverestore-support-check fail like 166198
 test-amd64-amd64-qemuu-nested-amd 20 debian-hvm-install/l1/l2 fail like 166198
 test-amd64-amd64-xl-qemuu-ws16-amd64 19 guest-stop fail like 166198
 test-amd64-amd64-xl-qemut-ws16-amd64 19 guest-stop fail like 166198
 test-amd64-i386-xl-qemuu-ws16-amd64 19 guest-stop fail like 166198
 test-armhf-armhf-libvirt-raw 15 saverestore-support-check fail like 166198
 test-amd64-i386-xl-qemut-win7-amd64 19 guest-stop fail like 166198
 test-amd64-i386-xl-qemuu-win7-amd64 19 guest-stop fail like 166198
 test-armhf-armhf-libvirt-qcow2 15 saverestore-support-check fail like 166198
 test-amd64-i386-xl-qemut-ws16-amd64 19 guest-stop fail like 166198
 test-amd64-amd64-libvirt 15 migrate-support-check fail never pass
 test-amd64-i386-libvirt 15 migrate-support-check fail never pass
 test-amd64-i386-libvirt-xsm 15 migrate-support-check fail never pass
 test-arm64-arm64-xl 15 migrate-support-check fail never pass
 test-arm64-arm64-xl 16 saverestore-support-check fail never pass
 test-arm64-arm64-xl-seattle 15 migrate-support-check fail never pass
 test-arm64-arm64-xl-seattle 16 saverestore-support-check fail never pass
 test-amd64-i386-xl-pvshim 14 guest-start fail never pass
 test-amd64-amd64-libvirt-xsm 15 migrate-support-check fail never pass
 test-amd64-i386-libvirt-qemuu-debianhvm-amd64-xsm 13 migrate-support-check fail never pass
 test-arm64-arm64-xl-credit1 15 migrate-support-check fail never pass
 test-arm64-arm64-xl-credit1 16 saverestore-support-check fail never pass
 test-arm64-arm64-libvirt-xsm 15 migrate-support-check fail never pass
 test-arm64-arm64-libvirt-xsm 16 saverestore-support-check fail never pass
 test-arm64-arm64-xl-credit2 15 migrate-support-check fail never pass
 test-arm64-arm64-xl-xsm 15 migrate-support-check fail never pass
 test-arm64-arm64-xl-credit2 16 saverestore-support-check fail never pass
 test-arm64-arm64-xl-xsm 16 saverestore-support-check fail never pass
 test-arm64-arm64-xl-thunderx 15 migrate-support-check fail never pass
 test-arm64-arm64-xl-thunderx 16 saverestore-support-check fail never pass
 test-amd64-i386-libvirt-raw 14 migrate-support-check fail never pass
 test-armhf-armhf-xl-arndale 15 migrate-support-check fail never pass
 test-armhf-armhf-xl-arndale 16 saverestore-support-check fail never pass
 test-amd64-amd64-libvirt-qemuu-debianhvm-amd64-xsm 13 migrate-support-check fail never pass
 test-armhf-armhf-xl-multivcpu 15 migrate-support-check fail never pass
 test-armhf-armhf-xl-multivcpu 16 saverestore-support-check fail never pass
 test-arm64-arm64-libvirt-raw 14 migrate-support-check fail never pass
 test-arm64-arm64-libvirt-raw 15 saverestore-support-check fail never pass
 test-amd64-amd64-libvirt-vhd 14 migrate-support-check fail never pass
 test-arm64-arm64-xl-vhd 14 migrate-support-check fail never pass
 test-arm64-arm64-xl-vhd 15 saverestore-support-check fail never pass
 test-armhf-armhf-xl-credit2 15
[xen-4.16-testing baseline test] 166322: FAIL
"Old" tested version had not actually been tested; therefore in this flight we test it, rather than a new candidate. The baseline, if any, is the most recent actually tested revision.

flight 166322 xen-4.16-testing real [real]
http://logs.test-lab.xenproject.org/osstest/logs/166322/

Failures and problems with tests :-(

Tests which did not succeed and are blocking, including tests which could not be run:
 test-amd64-amd64-xl-qemut-stubdom-debianhvm-amd64-xsm 4 syslog-server running
 test-amd64-i386-libvirt-qemuu-debianhvm-amd64-xsm 4 syslog-server running
 test-amd64-i386-libvirt-qemuu-debianhvm-amd64-xsm 13 STARTING running
 test-amd64-amd64-xl-qemut-stubdom-debianhvm-amd64-xsm 20 guest-start/debianhvm.repeat running

Tests which did not succeed, but are not blocking:
 test-amd64-i386-xl-qemut-win7-amd64 12 windows-install fail baseline untested
 test-amd64-amd64-xl-qemuu-debianhvm-amd64 7 xen-install fail baseline untested
 test-amd64-i386-libvirt-qemuu-debianhvm-amd64-xsm 12 debian-hvm-install fail baseline untested
 test-amd64-amd64-xl-shadow 12 debian-install fail baseline untested
 test-amd64-i386-xl-qemut-debianhvm-i386-xsm 12 debian-hvm-install fail baseline untested
 test-amd64-i386-xl-xsm 12 debian-install fail baseline untested
 test-amd64-i386-freebsd10-amd64 12 freebsd-install fail baseline untested
 test-amd64-i386-freebsd10-i386 12 freebsd-install fail baseline untested
 test-amd64-i386-xl-qemut-stubdom-debianhvm-amd64-xsm 12 debian-hvm-install fail baseline untested
 test-amd64-i386-qemut-rhel6hvm-intel 12 redhat-install fail baseline untested
 test-amd64-i386-xl-qemuu-win7-amd64 12 windows-install fail baseline untested
 test-amd64-i386-xl-qemut-ws16-amd64 12 windows-install fail baseline untested
 test-amd64-amd64-xl 23 guest-start.2 fail baseline untested
 test-armhf-armhf-xl-arndale 8 xen-boot fail baseline untested
 test-amd64-amd64-xl-qemuu-win7-amd64 19 guest-stop fail baseline untested
 test-amd64-amd64-xl-qemuu-ws16-amd64 19 guest-stop fail baseline untested
 test-amd64-amd64-xl-qemut-win7-amd64 19 guest-stop fail baseline untested
 test-armhf-armhf-libvirt-raw 15 saverestore-support-check fail baseline untested
 test-amd64-amd64-qemuu-nested-amd 20 debian-hvm-install/l1/l2 fail baseline untested
 test-armhf-armhf-libvirt 16 saverestore-support-check fail baseline untested
 test-amd64-amd64-xl-qemut-ws16-amd64 19 guest-stop fail baseline untested
 test-amd64-amd64-xl-qemut-debianhvm-i386-xsm 12 debian-hvm-install fail baseline untested
 test-armhf-armhf-libvirt-qcow2 15 saverestore-support-check fail baseline untested
 test-amd64-i386-xl-qemuu-ws16-amd64 19 guest-stop fail baseline untested
 test-arm64-arm64-xl 15 migrate-support-check fail never pass
 test-arm64-arm64-xl 16 saverestore-support-check fail never pass
 test-amd64-amd64-libvirt-xsm 15 migrate-support-check fail never pass
 test-amd64-amd64-libvirt 15 migrate-support-check fail never pass
 test-amd64-i386-xl-pvshim 14 guest-start fail never pass
 test-arm64-arm64-xl-seattle 15 migrate-support-check fail never pass
 test-arm64-arm64-xl-seattle 16 saverestore-support-check fail never pass
 test-amd64-i386-libvirt-xsm 15 migrate-support-check fail never pass
 test-arm64-arm64-xl-credit2 15 migrate-support-check fail never pass
 test-arm64-arm64-xl-credit2 16 saverestore-support-check fail never pass
 test-armhf-armhf-xl-vhd 14 migrate-support-check fail never pass
 test-armhf-armhf-xl-vhd 15 saverestore-support-check fail never pass
 test-amd64-i386-libvirt 15 migrate-support-check fail never pass
 test-amd64-amd64-libvirt-vhd 14 migrate-support-check fail never pass
 test-amd64-amd64-libvirt-qemuu-debianhvm-amd64-xsm 13 migrate-support-check fail never pass
 test-amd64-i386-libvirt-raw 14 migrate-support-check fail never pass
 test-arm64-arm64-libvirt-raw 14 migrate-support-check fail never pass
 test-arm64-arm64-libvirt-raw 15 saverestore-support-check fail never pass
 test-armhf-armhf-xl-rtds 15 migrate-support-check fail never pass
 test-armhf-armhf-xl-rtds 16 saverestore-support-check fail never pass
 test-armhf-armhf-xl-credit1 15 migrate-support-check fail never pass
 test-armhf-armhf-xl-credit1 16 saverestore-support-check fail never pass
 test-arm64-arm64-xl-thunderx 15 migrate-support-check fail never pass
 test-arm64-arm64-xl-thunderx 16 saverestore-support-check fail never pass
 test-armhf-armhf-xl-multivcpu 15 migrate-support-check fail never pass
 test-armhf-armhf-xl-multivcpu 16 saverestore-support-check fail never pass
 test-arm64-arm64-xl-credit1 15 migrate-support-check fail never pass
 test-arm64-arm64-xl-credit1
RE: [PATCH 3/3] VT-d: conditionalize IOTLB register offset check
> From: Jan Beulich > Sent: Tuesday, November 23, 2021 9:40 PM > > As of commit 6773b1a7584a ("VT-d: Don't assume register-based > invalidation is always supported") we don't (try to) use register based > invalidation anymore when that's not supported by hardware. Hence > there's also no point in the respective check, avoiding pointless IOMMU > initialization failure. After all the spec (version 3.3 at the time of > writing) doesn't say what the respective Extended Capability Register > field would contain in such a case. > > Signed-off-by: Jan Beulich Reviewed-by: Kevin Tian > > --- a/xen/drivers/passthrough/vtd/iommu.c > +++ b/xen/drivers/passthrough/vtd/iommu.c > @@ -1228,7 +1228,8 @@ int __init iommu_alloc(struct acpi_drhd_ > > if ( cap_fault_reg_offset(iommu->cap) + > cap_num_fault_regs(iommu->cap) * PRIMARY_FAULT_REG_LEN > > PAGE_SIZE || > - ecap_iotlb_offset(iommu->ecap) >= PAGE_SIZE ) > + (has_register_based_invalidation(iommu) && > + ecap_iotlb_offset(iommu->ecap) >= PAGE_SIZE) ) > { > printk(XENLOG_ERR VTDPREFIX "IOMMU: unsupported\n"); > print_iommu_regs(drhd);
RE: [PATCH 2/3] VT-d: correct off-by-1 in fault register range check
> From: Jan Beulich > Sent: Tuesday, November 23, 2021 9:40 PM > > All our present implementation requires is that the range fully fits > in a single page. No need to exclude the case of the last register > extending right to the end of that page. > > Signed-off-by: Jan Beulich Reviewed-by: Kevin Tian > > --- a/xen/drivers/passthrough/vtd/iommu.c > +++ b/xen/drivers/passthrough/vtd/iommu.c > @@ -1229,7 +1229,7 @@ int __init iommu_alloc(struct acpi_drhd_ > quirk_iommu_caps(iommu); > > if ( cap_fault_reg_offset(iommu->cap) + > - cap_num_fault_regs(iommu->cap) * PRIMARY_FAULT_REG_LEN >= > PAGE_SIZE || > + cap_num_fault_regs(iommu->cap) * PRIMARY_FAULT_REG_LEN > > PAGE_SIZE || > ecap_iotlb_offset(iommu->ecap) >= PAGE_SIZE ) > { > printk(XENLOG_ERR VTDPREFIX "IOMMU: unsupported\n");
RE: [PATCH 1/3] VT-d: prune SAGAW recognition
> From: Jan Beulich > Sent: Tuesday, November 23, 2021 9:40 PM > > Bit 0 of the capability register field has become reserved at or before Bit 0 of 'SAGAW' in the capability register ... > spec version 2.2. Treat it as such. Replace the effective open-coding of > find_first_set_bit(). Adjust local variable types. > > Signed-off-by: Jan Beulich > --- > Strictly speaking IOMMUs supporting only 3-level tables ought to result > in guests seeing a suitably reduced physical address width in CPUID. > And then the same would apply to restrictions resulting from MGAW. Yes. I remember there was some old discussion in Qemu community for whether guest physical addr width should be based on IOMMU constraints when passthrough device is used. But it didn't go anywhere (and I cannot find the link...) anyway with above comment fixed: Reviewed-by: Kevin Tian > > --- a/xen/drivers/passthrough/vtd/iommu.c > +++ b/xen/drivers/passthrough/vtd/iommu.c > @@ -356,7 +356,7 @@ static uint64_t domain_pgd_maddr(struct > pgd_maddr = hd->arch.vtd.pgd_maddr; > } > > -/* Skip top levels of page tables for 2- and 3-level DRHDs. */ > +/* Skip top level(s) of page tables for less-than-maximum level DRHDs. */ > for ( agaw = level_to_agaw(4); >agaw != level_to_agaw(nr_pt_levels); >agaw-- ) > @@ -1183,8 +1183,7 @@ static int __init iommu_set_interrupt(st > int __init iommu_alloc(struct acpi_drhd_unit *drhd) > { > struct vtd_iommu *iommu; > -unsigned long sagaw, nr_dom; > -int agaw; > +unsigned int sagaw, agaw = 0, nr_dom; > > iommu = xzalloc(struct vtd_iommu); > if ( iommu == NULL ) > @@ -1237,14 +1236,13 @@ int __init iommu_alloc(struct acpi_drhd_ > return -ENODEV; > } > > -/* Calculate number of pagetable levels: between 2 and 4. */ > +/* Calculate number of pagetable levels: 3 or 4. 
> */
> sagaw = cap_sagaw(iommu->cap);
> -for ( agaw = level_to_agaw(4); agaw >= 0; agaw-- )
> -if ( test_bit(agaw, &sagaw) )
> -break;
> -if ( agaw < 0 )
> +if ( sagaw & 6 )
> +agaw = find_first_set_bit(sagaw & 6);
> +if ( !agaw )
> {
> -printk(XENLOG_ERR VTDPREFIX "IOMMU: unsupported sagaw %lx\n",
> sagaw);
> +printk(XENLOG_ERR VTDPREFIX "IOMMU: unsupported sagaw %x\n",
> sagaw);
> print_iommu_regs(drhd);
> return -ENODEV;
> }
[xen-unstable-smoke test] 166326: regressions - FAIL
flight 166326 xen-unstable-smoke real [real]
flight 166330 xen-unstable-smoke real-retest [real]
http://logs.test-lab.xenproject.org/osstest/logs/166326/
http://logs.test-lab.xenproject.org/osstest/logs/166330/

Regressions :-(

Tests which did not succeed and are blocking, including tests which could not be run:
 test-arm64-arm64-xl-xsm 8 xen-boot fail REGR. vs. 166312

Tests which did not succeed, but are not blocking:
 test-amd64-amd64-libvirt 15 migrate-support-check fail never pass
 test-armhf-armhf-xl 15 migrate-support-check fail never pass
 test-armhf-armhf-xl 16 saverestore-support-check fail never pass

version targeted for testing:
 xen a5706b80f42e028c5153fc50734a1e86a99ff9d2
baseline version:
 xen 74a11c43fd7e074b1f77631b446dd2115eacb9e8

Last test of basis 166312 2021-11-23 13:00:27 Z 0 days
Testing same since  166326 2021-11-23 18:01:44 Z 0 days 1 attempts

People who touched revisions under test:
 Ian Jackson

jobs:
 build-arm64-xsm pass
 build-amd64 pass
 build-armhf pass
 build-amd64-libvirt pass
 test-armhf-armhf-xl pass
 test-arm64-arm64-xl-xsm fail
 test-amd64-amd64-xl-qemuu-debianhvm-amd64 pass
 test-amd64-amd64-libvirt pass

sg-report-flight on osstest.test-lab.xenproject.org
logs: /home/logs/logs
images: /home/logs/images

Logs, config files, etc. are available at
 http://logs.test-lab.xenproject.org/osstest/logs

Explanation of these reports, and of osstest in general, is at
 http://xenbits.xen.org/gitweb/?p=osstest.git;a=blob;f=README.email;hb=master
 http://xenbits.xen.org/gitweb/?p=osstest.git;a=blob;f=README;hb=master

Test harness code can be found at
 http://xenbits.xen.org/gitweb?p=osstest.git;a=summary

Not pushing.
commit a5706b80f42e028c5153fc50734a1e86a99ff9d2
Author: Ian Jackson
Date:   Tue Nov 23 16:55:32 2021 +

    Set version to 4.17: rerun autogen.sh

    Signed-off-by: Ian Jackson

commit 8579d208ab78486717d396cb297f06526fe6b492
Author: Ian Jackson
Date:   Tue Nov 23 16:54:08 2021 +

    Set version to 4.17; 4.16 has branched

    Signed-off-by: Ian Jackson

commit 4fe6e73a8cc8f3fa93a7c6a52a9a03b24e51ea18
Author: Ian Jackson
Date:   Tue Nov 23 16:51:47 2021 +

    Revert "Config.mk: pin QEMU_UPSTREAM_REVISION (prep for Xen 4.16 RC1)"

    This branch is unstable again now.

    This reverts commit c9ce6afbf2d7772f47fc572bb7fc9555724927ed.
    (qemu changes not included)
Re: [RFC PATCH 0/2] Boot time cpupools
On Tue, 23 Nov 2021, Bertrand Marquis wrote: > Hi Julien, > > > On 23 Nov 2021, at 13:54, Julien Grall wrote: > > > > Hi Stefano, > > > > On 19/11/2021 18:55, Stefano Stabellini wrote: > >> On Fri, 19 Nov 2021, Julien Grall wrote: > >>> I like better Juergen's version. But before agreeing on the command line > >>> , I > >>> would like to understand how Linux is dealing with big.LITTLE today (see > >>> my > >>> question above). > >> I also like Juergen's version better :-) > >> The device tree binding that covers big.LITTLE is the same that covers > >> cpus: Documentation/devicetree/bindings/arm/cpus.yaml > > > > Are you sure? I found > > Documentation/devicetree/bindings/arm/cpu-capacity.txt. > > > >> So basically, you can tell it is a big.LITTLE system because the > >> compatible strings differ between certain cpus, e.g.: > >> cpu@0 { > >> device_type = "cpu"; > >> compatible = "arm,cortex-a15"; > >> reg = <0x0>; > >> }; > >> cpu@1 { > >> device_type = "cpu"; > >> compatible = "arm,cortex-a15"; > >> reg = <0x1>; > >> }; > >> cpu@100 { > >> device_type = "cpu"; > >> compatible = "arm,cortex-a7"; > >> reg = <0x100>; > >> }; > >> cpu@101 { > >> device_type = "cpu"; > >> compatible = "arm,cortex-a7"; > >> reg = <0x101>; > >> }; > > > > I have not found any code in Linux using the compatible. Instead, they all > > seem to use the cpu-map (see drivers/base/arch_topology.c). > > > > Anyway, to me it would be better to parse the Device-Tree over using the > > MIDR. The advantage is we can cover a wide range of cases (you may have > > processor with the same MIDR but different frequency). For now, we could > > create a cpupool per cluster. > > This is a really good idea as this could also work for NUMA systems. > > So reg & ~0xff would give the cluster number ? > > We will probably end up splitting cores into different cpupools even if they > are all the same as there are several CPUs having several clusters but the > same cores (I recall some NXP boards being like that). 
> > Some device tree also have a cpu-map definition: > cpu-map { > cluster0 { > core0 { > cpu = <_0>; > }; > core1 { > cpu = <_1>; > }; > }; > > cluster1 { > core0 { > cpu = <_0>; > }; > core1 { > cpu = <_1>; > }; > core2 { > cpu = <_2>; > }; > core3 { > cpu = <_3>; > }; > }; > }; > > @stefano: is the cpu-map something standard ? Lots of device trees do have it > in Linux now but I do not recall seeing that on older device trees. > Maybe using cpu-map could be a solution, we could say that device tree > without a cpu-map are not supported. cpu-map is newer than big.LITTLE support in Linux. See for instance commit 4ab328f06. Before cpu-map was introduced, Linux used mostly the MPIDR or sometimes the *machine* compatible string. I did find one example of a board where the cpu compatible string is the same for both big and LITTLE CPUs: arch/arm64/boot/dts/rockchip/rk3368.dtsi. That could be the reason why the cpu compatible string is not used for detecting big.LITTLE. Sorry about that, my earlier suggestion was wrong. Yes I think using cpu-map would be fine. It seems to be widely used by different vendors. (Even if something as generic as cpu-map should really be under the devicetree.org spec, not under Linux, but anyway.) Only one note: you mentioned NUMA. As you can see from Documentation/devicetree/bindings/numa.txt, NUMA doesn't use cpu-map. Instead, it uses numa-node-id and distance-map.
Re: [PATCH V2 3/4] xen/unpopulated-alloc: Add mechanism to use Xen resource
On Tue, 23 Nov 2021, Oleksandr wrote: > > Actually after reading your replies and explanation I changed opinion: I > > think we do need the fallback because Linux cannot really assume that > > it is running on "new Xen" so it definitely needs to keep working if > > CONFIG_XEN_UNPOPULATED_ALLOC is enabled and the extended regions are not > > advertised. > > > > I think we'll have to roll back some of the changes introduced by > > 121f2faca2c0a. That's because even if CONFIG_XEN_UNPOPULATED_ALLOC is > > enabled we cannot know if we can use unpopulated-alloc or whether we > > have to use alloc_xenballooned_pages until we parse the /hypervisor node > > in device tree at runtime. > > Exactly! > > > > > > In short, we cannot switch between unpopulated-alloc and > > alloc_xenballooned_pages at build time, we have to do it at runtime > > (boot time). > > +1 > > > I created a patch to partially revert 121f2faca2c0a "xen/balloon: rename > alloc/free_xenballooned_pages". > > If there is no objections I will add it to V3 (which is almost ready, except > the fallback bits). Could you please tell me what do you think? It makes sense to me. You can add my Reviewed-by. > From dc79bcd425358596d95e715a8bd8b81deaaeb703 Mon Sep 17 00:00:00 2001 > From: Oleksandr Tyshchenko > Date: Tue, 23 Nov 2021 18:14:41 +0200 > Subject: [PATCH] xen/balloon: Bring alloc(free)_xenballooned_pages helpers > back > > This patch rolls back some of the changes introduced by commit > 121f2faca2c0a "xen/balloon: rename alloc/free_xenballooned_pages" > in order to make possible to still allocate xenballooned pages > if CONFIG_XEN_UNPOPULATED_ALLOC is enabled. > > On Arm the unpopulated pages will be allocated on top of extended > regions provided by Xen via device-tree (the subsequent patches > will add required bits to support unpopulated-alloc feature on Arm). > The problem is that extended regions feature has been introduced > into Xen quite recently (during 4.16 release cycle). 
So this > effectively means that Linux must only use unpopulated-alloc on Arm > if it is running on "new Xen" which advertises these regions. > But, it will only be known after parsing the "hypervisor" node > at boot time, so before doing that we cannot assume anything. > > In order to keep working if CONFIG_XEN_UNPOPULATED_ALLOC is enabled > and the extended regions are not advertised (Linux is running on > "old Xen", etc) we need the fallback to alloc_xenballooned_pages(). > > This way we wouldn't reduce the amount of memory usable (wasting > RAM pages) for any of the external mappings anymore (and eliminate > XSA-300) with "new Xen", but would be still functional ballooning > out RAM pages with "old Xen". > > Also rename alloc(free)_xenballooned_pages to xen_alloc(free)_ballooned_pages. > > Signed-off-by: Oleksandr Tyshchenko > --- > drivers/xen/balloon.c | 20 +--- > include/xen/balloon.h | 3 +++ > include/xen/xen.h | 6 ++ > 3 files changed, 18 insertions(+), 11 deletions(-) > > diff --git a/drivers/xen/balloon.c b/drivers/xen/balloon.c > index ba2ea11..a2c4fc49 100644 > --- a/drivers/xen/balloon.c > +++ b/drivers/xen/balloon.c > @@ -581,7 +581,6 @@ void balloon_set_new_target(unsigned long target) > } > EXPORT_SYMBOL_GPL(balloon_set_new_target); > > -#ifndef CONFIG_XEN_UNPOPULATED_ALLOC > static int add_ballooned_pages(unsigned int nr_pages) > { > enum bp_state st; > @@ -610,12 +609,12 @@ static int add_ballooned_pages(unsigned int nr_pages) > } > > /** > - * xen_alloc_unpopulated_pages - get pages that have been ballooned out > + * xen_alloc_ballooned_pages - get pages that have been ballooned out > * @nr_pages: Number of pages to get > * @pages: pages returned > * @return 0 on success, error otherwise > */ > -int xen_alloc_unpopulated_pages(unsigned int nr_pages, struct page **pages) > +int xen_alloc_ballooned_pages(unsigned int nr_pages, struct page **pages) > { > unsigned int pgno = 0; > struct page *page; > @@ -652,23 +651,23 @@ int 
xen_alloc_unpopulated_pages(unsigned int nr_pages, > struct page **pages) > return 0; > out_undo: > mutex_unlock(_mutex); > - xen_free_unpopulated_pages(pgno, pages); > + xen_free_ballooned_pages(pgno, pages); > /* > - * NB: free_xenballooned_pages will only subtract pgno pages, but since > + * NB: xen_free_ballooned_pages will only subtract pgno pages, but since > * target_unpopulated is incremented with nr_pages at the start we need > * to remove the remaining ones also, or accounting will be screwed. > */ > balloon_stats.target_unpopulated -= nr_pages - pgno; > return ret; > } > -EXPORT_SYMBOL(xen_alloc_unpopulated_pages); > +EXPORT_SYMBOL(xen_alloc_ballooned_pages); > > /** > - * xen_free_unpopulated_pages - return pages retrieved with > get_ballooned_pages > + * xen_free_ballooned_pages - return pages retrieved with get_ballooned_pages > * @nr_pages: Number of pages > * @pages:
Re: [PATCH] cpu/hotplug: Allow the CPU in CPU_UP_PREPARE state to be brought up again.
Tested-by: Dongli Zhang

The bug fixed by commit 53fafdbb8b21 ("KVM: x86: switch KVMCLOCK base to monotonic raw clock") may leave the cpu_hotplug_state at CPU_UP_PREPARE. As a result, onlining this CPU again (even after removal) always fails. I have tested that this patch works well to work around the issue, by introducing either an mdelay(11000) or a while(1); into start_secondary(). That is, onlining the same CPU again is successful even after the initial do_boot_cpu() failure:

1. Add mdelay(11000) or while(1); to start_secondary().
2. Onlining the CPU fails at do_boot_cpu().
3. Onlining the CPU again fails without this patch:
   # echo 1 > /sys/devices/system/cpu/cpu4/online
   -su: echo: write error: Input/output error
4. Onlining the CPU again is successful with this patch.

Thank you very much!

Dongli Zhang

On 11/22/21 7:47 AM, Sebastian Andrzej Siewior wrote:
> From: "Longpeng(Mike)"
>
> A CPU will not show up in a virtualized environment which includes an
> Enclave. The VM splits its resources into a primary VM and an Enclave
> VM. While the Enclave is active, the hypervisor will ignore all requests
> to bring up a CPU and this CPU will remain in CPU_UP_PREPARE state.
> The kernel will wait up to ten seconds for the CPU to show up
> (do_boot_cpu()) and then roll back the hotplug state to
> CPUHP_OFFLINE, leaving the CPU state in CPU_UP_PREPARE. The CPU state is
> set back to CPUHP_TEARDOWN_CPU during the CPU_POST_DEAD stage.
>
> After the Enclave VM terminates, the primary VM can bring up the CPU
> again.
>
> Allow bringing up the CPU if it is in the CPU_UP_PREPARE state.
>
> [bigeasy: Rewrite commit description.]
>
> Signed-off-by: Longpeng(Mike)
> Signed-off-by: Sebastian Andrzej Siewior
> Link: https://urldefense.com/v3/__https://lore.kernel.org/r/20210901051143.2752-1-longpeng2@huawei.com__;!!ACWV5N9M2RV99hQ!d4sCCXMQV7ekFwpd21vo1_9K-m5h4VZ-gE8Z62PLL58DT4VJ6StH57TR_KpBdbwhBE0$
>
> ---
>
> For XEN: this changes the behaviour as it allows to invoke
> cpu_initialize_context() again should it have failed earlier. I *think*
> this is okay and would allow to bring up the CPU again should the memory
> allocation in cpu_initialize_context() fail.
>
> kernel/smpboot.c | 7 +++
> 1 file changed, 7 insertions(+)
>
> diff --git a/kernel/smpboot.c b/kernel/smpboot.c
> index f6bc0bc8a2aab..34958d7fe2c1c 100644
> --- a/kernel/smpboot.c
> +++ b/kernel/smpboot.c
> @@ -392,6 +392,13 @@ int cpu_check_up_prepare(int cpu)
>          */
>         return -EAGAIN;
>
> +       case CPU_UP_PREPARE:
> +               /*
> +                * Timeout while waiting for the CPU to show up. Allow to try
> +                * again later.
> +                */
> +               return 0;
> +
>       default:
>
>       /* Should not happen. Famous last words. */
>
[PATCH v4] xen: detect uninitialized xenbus in xenbus_init
From: Stefano Stabellini If the xenstore page hasn't been allocated properly, reading the value of the related hvm_param (HVM_PARAM_STORE_PFN) won't actually return error. Instead, it will succeed and return zero. Instead of attempting to xen_remap a bad guest physical address, detect this condition and return early. Note that although a guest physical address of zero for HVM_PARAM_STORE_PFN is theoretically possible, it is not a good choice and zero has never been validly used in that capacity. Also recognize all bits set as an invalid value. For 32-bit Linux, any pfn above ULONG_MAX would get truncated. Pfns above ULONG_MAX should never be passed by the Xen tools to HVM guests anyway, so check for this condition and return early. Cc: sta...@vger.kernel.org Signed-off-by: Stefano Stabellini --- Changes in v4: - say "all bits set" instead of INVALID_PFN - improve check Changes in v3: - improve in-code comment - improve check Changes in v2: - add check for ULLONG_MAX (unitialized) - add check for ULONG_MAX #if BITS_PER_LONG == 32 (actual error) - add pr_err error message drivers/xen/xenbus/xenbus_probe.c | 23 +++ 1 file changed, 23 insertions(+) diff --git a/drivers/xen/xenbus/xenbus_probe.c b/drivers/xen/xenbus/xenbus_probe.c index 94405bb3829e..251b26439733 100644 --- a/drivers/xen/xenbus/xenbus_probe.c +++ b/drivers/xen/xenbus/xenbus_probe.c @@ -951,6 +951,29 @@ static int __init xenbus_init(void) err = hvm_get_parameter(HVM_PARAM_STORE_PFN, ); if (err) goto out_error; + /* +* Uninitialized hvm_params are zero and return no error. +* Although it is theoretically possible to have +* HVM_PARAM_STORE_PFN set to zero on purpose, in reality it is +* not zero when valid. If zero, it means that Xenstore hasn't +* been properly initialized. Instead of attempting to map a +* wrong guest physical address return error. +* +* Also recognize all bits set as an invalid value. +*/ + if (!v || !~v) { + err = -ENOENT; + goto out_error; + } + /* Avoid truncation on 32-bit. 
*/ +#if BITS_PER_LONG == 32 + if (v > ULONG_MAX) { + pr_err("%s: cannot handle HVM_PARAM_STORE_PFN=%llx > ULONG_MAX\n", + __func__, v); + err = -EINVAL; + goto out_error; + } +#endif xen_store_gfn = (unsigned long)v; xen_store_interface = xen_remap(xen_store_gfn << XEN_PAGE_SHIFT, -- 2.25.1
Re: [PATCH 0/5] xen: cleanup detection of non-essential pv devices
On 11/22/21 3:20 AM, Juergen Gross wrote: On 22.10.21 08:47, Juergen Gross wrote: Today the non-essential pv devices are hard coded in the xenbus driver and this list is lacking multiple entries. This series reworks the detection logic of non-essential devices by adding a flag for that purpose to struct xenbus_driver. Juergen Gross (5): xen: add "not_essential" flag to struct xenbus_driver xen: flag xen_drm_front to be not essential for system boot xen: flag hvc_xen to be not essential for system boot xen: flag pvcalls-front to be not essential for system boot xen: flag xen_snd_front to be not essential for system boot drivers/gpu/drm/xen/xen_drm_front.c | 1 + drivers/input/misc/xen-kbdfront.c | 1 + drivers/tty/hvc/hvc_xen.c | 1 + drivers/video/fbdev/xen-fbfront.c | 1 + drivers/xen/pvcalls-front.c | 1 + drivers/xen/xenbus/xenbus_probe_frontend.c | 14 +++--- include/xen/xenbus.h | 1 + sound/xen/xen_snd_front.c | 1 + 8 files changed, 10 insertions(+), 11 deletions(-) Any further comments? Reviewed-by: Boris Ostrovsky (I'll fix the semicolon typo in the last patch, no need to resend)
RE: [PATCH V2 5/6] net: netvsc: Add Isolation VM support for netvsc driver
From: Tianyu Lan Sent: Tuesday, November 23, 2021 6:31 AM > > In Isolation VM, all shared memory with host needs to mark visible > to host via hvcall. vmbus_establish_gpadl() has already done it for > netvsc rx/tx ring buffer. The page buffer used by vmbus_sendpacket_ > pagebuffer() stills need to be handled. Use DMA API to map/umap > these memory during sending/receiving packet and Hyper-V swiotlb > bounce buffer dma address will be returned. The swiotlb bounce buffer > has been masked to be visible to host during boot up. > > Allocate rx/tx ring buffer via dma_alloc_noncontiguous() in Isolation > VM. After calling vmbus_establish_gpadl() which marks these pages visible > to host, map these pages unencrypted addes space via dma_vmap_noncontiguous(). > > Signed-off-by: Tianyu Lan > --- > drivers/net/hyperv/hyperv_net.h | 5 + > drivers/net/hyperv/netvsc.c | 192 +++--- > drivers/net/hyperv/rndis_filter.c | 2 + > include/linux/hyperv.h| 6 + > 4 files changed, 190 insertions(+), 15 deletions(-) > > diff --git a/drivers/net/hyperv/hyperv_net.h b/drivers/net/hyperv/hyperv_net.h > index 315278a7cf88..31c77a00d01e 100644 > --- a/drivers/net/hyperv/hyperv_net.h > +++ b/drivers/net/hyperv/hyperv_net.h > @@ -164,6 +164,7 @@ struct hv_netvsc_packet { > u32 total_bytes; > u32 send_buf_index; > u32 total_data_buflen; > + struct hv_dma_range *dma_range; > }; > > #define NETVSC_HASH_KEYLEN 40 > @@ -1074,6 +1075,7 @@ struct netvsc_device { > > /* Receive buffer allocated by us but manages by NetVSP */ > void *recv_buf; > + struct sg_table *recv_sgt; > u32 recv_buf_size; /* allocated bytes */ > struct vmbus_gpadl recv_buf_gpadl_handle; > u32 recv_section_cnt; > @@ -1082,6 +1084,7 @@ struct netvsc_device { > > /* Send buffer allocated by us */ > void *send_buf; > + struct sg_table *send_sgt; > u32 send_buf_size; > struct vmbus_gpadl send_buf_gpadl_handle; > u32 send_section_cnt; > @@ -1731,4 +1734,6 @@ struct rndis_message { > #define RETRY_US_HI 1 > #define RETRY_MAX2000/* >10 sec 
*/ > > +void netvsc_dma_unmap(struct hv_device *hv_dev, > + struct hv_netvsc_packet *packet); > #endif /* _HYPERV_NET_H */ > diff --git a/drivers/net/hyperv/netvsc.c b/drivers/net/hyperv/netvsc.c > index 396bc1c204e6..9cdc71930830 100644 > --- a/drivers/net/hyperv/netvsc.c > +++ b/drivers/net/hyperv/netvsc.c > @@ -20,6 +20,7 @@ > #include > #include > #include > +#include > > #include > #include > @@ -146,15 +147,39 @@ static struct netvsc_device *alloc_net_device(void) > return net_device; > } > > +static struct hv_device *netvsc_channel_to_device(struct vmbus_channel > *channel) > +{ > + struct vmbus_channel *primary = channel->primary_channel; > + > + return primary ? primary->device_obj : channel->device_obj; > +} > + > static void free_netvsc_device(struct rcu_head *head) > { > struct netvsc_device *nvdev > = container_of(head, struct netvsc_device, rcu); > + struct hv_device *dev = > + netvsc_channel_to_device(nvdev->chan_table[0].channel); > int i; > > kfree(nvdev->extension); > - vfree(nvdev->recv_buf); > - vfree(nvdev->send_buf); > + > + if (nvdev->recv_sgt) { > + dma_vunmap_noncontiguous(>device, nvdev->recv_buf); > + dma_free_noncontiguous(>device, nvdev->recv_buf_size, > +nvdev->recv_sgt, DMA_FROM_DEVICE); > + } else { > + vfree(nvdev->recv_buf); > + } > + > + if (nvdev->send_sgt) { > + dma_vunmap_noncontiguous(>device, nvdev->send_buf); > + dma_free_noncontiguous(>device, nvdev->send_buf_size, > +nvdev->send_sgt, DMA_TO_DEVICE); > + } else { > + vfree(nvdev->send_buf); > + } > + > kfree(nvdev->send_section_map); > > for (i = 0; i < VRSS_CHANNEL_MAX; i++) { > @@ -348,7 +373,21 @@ static int netvsc_init_buf(struct hv_device *device, > buf_size = min_t(unsigned int, buf_size, >NETVSC_RECEIVE_BUFFER_SIZE_LEGACY); > > - net_device->recv_buf = vzalloc(buf_size); > + if (hv_isolation_type_snp()) { > + net_device->recv_sgt = > + dma_alloc_noncontiguous(>device, buf_size, > + DMA_FROM_DEVICE, GFP_KERNEL, 0); > + if (!net_device->recv_sgt) { > + pr_err("Fail to 
allocate recv buffer buf_size %d.\n.", > buf_size); > + ret = -ENOMEM; > + goto cleanup; > + } > + > + net_device->recv_buf = (void > *)net_device->recv_sgt->sgl->dma_address; Use sg_dma_address() macro. > + } else { > + net_device->recv_buf = vzalloc(buf_size); > + } > + > if
RE: [PATCH V2 4/6] hyperv/IOMMU: Enable swiotlb bounce buffer for Isolation VM
From: Tianyu Lan Sent: Tuesday, November 23, 2021 6:31 AM > > hyperv Isolation VM requires bounce buffer support to copy > data from/to encrypted memory and so enable swiotlb force > mode to use swiotlb bounce buffer for DMA transaction. > > In Isolation VM with AMD SEV, the bounce buffer needs to be > accessed via extra address space which is above shared_gpa_boundary > (E.G 39 bit address line) reported by Hyper-V CPUID ISOLATION_CONFIG. > The access physical address will be original physical address + > shared_gpa_boundary. The shared_gpa_boundary in the AMD SEV SNP > spec is called virtual top of memory(vTOM). Memory addresses below > vTOM are automatically treated as private while memory above > vTOM is treated as shared. > > Hyper-V initalizes swiotlb bounce buffer and default swiotlb > needs to be disabled. pci_swiotlb_detect_override() and > pci_swiotlb_detect_4gb() enable the default one. To override > the setting, hyperv_swiotlb_detect() needs to run before > these detect functions which depends on the pci_xen_swiotlb_ > init(). Make pci_xen_swiotlb_init() depends on the hyperv_swiotlb > _detect() to keep the order. > > Swiotlb bounce buffer code calls set_memory_decrypted() > to mark bounce buffer visible to host and map it in extra > address space via memremap. Populate the shared_gpa_boundary > (vTOM) via swiotlb_unencrypted_base variable. > > The map function memremap() can't work in the early place > hyperv_iommu_swiotlb_init() and so call swiotlb_update_mem_attributes() > in the hyperv_iommu_swiotlb_later_init(). > > Add Hyper-V dma ops and provide alloc/free and vmap/vunmap noncontiguous > callback to handle request of allocating and mapping noncontiguous dma > memory in vmbus device driver. Netvsc driver will use this. Set dma_ops_ > bypass flag for hv device to use dma direct functions during mapping/unmapping > dma page. 
> Signed-off-by: Tianyu Lan
> ---
> Change since v1:
>  * Remove hv isolation check in the sev_setup_arch()
>
>  arch/x86/mm/mem_encrypt.c      |   1 +
>  arch/x86/xen/pci-swiotlb-xen.c |   3 +-
>  drivers/hv/Kconfig             |   1 +
>  drivers/hv/vmbus_drv.c         |   6 ++
>  drivers/iommu/hyperv-iommu.c   | 164 +
>  include/linux/hyperv.h         |  10 ++
>  6 files changed, 184 insertions(+), 1 deletion(-)
>
> diff --git a/arch/x86/mm/mem_encrypt.c b/arch/x86/mm/mem_encrypt.c
> index 35487305d8af..e48c73b3dd41 100644
> --- a/arch/x86/mm/mem_encrypt.c
> +++ b/arch/x86/mm/mem_encrypt.c
> @@ -31,6 +31,7 @@
>  #include
>  #include
>  #include
> +#include

There is no longer any need to add this #include since code changes to this file in a previous version of the patch are now gone.

>
>  #include "mm_internal.h"
>
> diff --git a/arch/x86/xen/pci-swiotlb-xen.c b/arch/x86/xen/pci-swiotlb-xen.c
> index 46df59aeaa06..30fd0600b008 100644
> --- a/arch/x86/xen/pci-swiotlb-xen.c
> +++ b/arch/x86/xen/pci-swiotlb-xen.c
> @@ -4,6 +4,7 @@
>
>  #include
>  #include
> +#include
>  #include
>
>  #include
> @@ -91,6 +92,6 @@ int pci_xen_swiotlb_init_late(void)
>  EXPORT_SYMBOL_GPL(pci_xen_swiotlb_init_late);
>
>  IOMMU_INIT_FINISH(pci_xen_swiotlb_detect,
> -                  NULL,
> +                  hyperv_swiotlb_detect,
>                    pci_xen_swiotlb_init,
>                    NULL);
> diff --git a/drivers/hv/Kconfig b/drivers/hv/Kconfig
> index dd12af20e467..d43b4cd88f57 100644
> --- a/drivers/hv/Kconfig
> +++ b/drivers/hv/Kconfig
> @@ -9,6 +9,7 @@ config HYPERV
>  	select PARAVIRT
>  	select X86_HV_CALLBACK_VECTOR if X86
>  	select VMAP_PFN
> +	select DMA_OPS_BYPASS
>  	help
>  	  Select this option to run Linux as a Hyper-V client operating
>  	  system.
> diff --git a/drivers/hv/vmbus_drv.c b/drivers/hv/vmbus_drv.c
> index 392c1ac4f819..32dc193e31cd 100644
> --- a/drivers/hv/vmbus_drv.c
> +++ b/drivers/hv/vmbus_drv.c
> @@ -33,6 +33,7 @@
>  #include
>  #include
>  #include
> +#include
>  #include
>  #include "hyperv_vmbus.h"
>
> @@ -2078,6 +2079,7 @@ struct hv_device *vmbus_device_create(const guid_t *type,
>  	return child_device_obj;
>  }
>
> +static u64 vmbus_dma_mask = DMA_BIT_MASK(64);
>  /*
>   * vmbus_device_register - Register the child device
>   */
> @@ -2118,6 +2120,10 @@ int vmbus_device_register(struct hv_device *child_device_obj)
>  	}
>  	hv_debug_add_dev_dir(child_device_obj);
>
> +	child_device_obj->device.dma_ops_bypass = true;
> +	child_device_obj->device.dma_ops = &hyperv_iommu_dma_ops;
> +	child_device_obj->device.dma_mask = &vmbus_dma_mask;
> +	child_device_obj->device.dma_parms = &child_device_obj->dma_parms;
>  	return 0;
>
>  err_kset_unregister:
> diff --git a/drivers/iommu/hyperv-iommu.c b/drivers/iommu/hyperv-iommu.c
> index e285a220c913..ebcb628e7e8f 100644
> --- a/drivers/iommu/hyperv-iommu.c
> +++ b/drivers/iommu/hyperv-iommu.c
> @@ -13,14 +13,21 @@
>  #include
>  #include
>  #include
> +#include
> +#include
>
>  #include
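The vTOM addressing rule described in the commit message above (shared address = private physical address + shared_gpa_boundary) can be illustrated with a small standalone sketch. The 39-bit boundary value and the helper names here are assumptions for illustration only; a real isolation VM reads the boundary from the Hyper-V ISOLATION_CONFIG CPUID leaf.

```c
#include <stdint.h>

/*
 * Toy model of the vTOM rule: memory below shared_gpa_boundary is private,
 * memory at or above it is shared, and a private page is accessed as shared
 * by adding the boundary.
 *
 * The 39-bit boundary is an example value only; helper names are made up.
 */
static const uint64_t shared_gpa_boundary = 1ULL << 39; /* example vTOM */

static uint64_t to_shared_gpa(uint64_t private_paddr)
{
    /* Shared alias of a private physical address. */
    return private_paddr + shared_gpa_boundary;
}

static int gpa_is_shared(uint64_t gpa)
{
    /* Addresses at or above vTOM are treated as shared by the hardware. */
    return gpa >= shared_gpa_boundary;
}
```

This also shows why the swiotlb pool address computed in the patch is simply mem->start plus the boundary.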
[xen-unstable-smoke test] 166312: tolerable all pass - PUSHED
flight 166312 xen-unstable-smoke real [real]
http://logs.test-lab.xenproject.org/osstest/logs/166312/

Failures :-/ but no regressions.

Tests which did not succeed, but are not blocking:
 test-amd64-amd64-libvirt     15 migrate-support-check        fail   never pass
 test-arm64-arm64-xl-xsm      15 migrate-support-check        fail   never pass
 test-arm64-arm64-xl-xsm      16 saverestore-support-check    fail   never pass
 test-armhf-armhf-xl          15 migrate-support-check        fail   never pass
 test-armhf-armhf-xl          16 saverestore-support-check    fail   never pass

version targeted for testing:
 xen                  74a11c43fd7e074b1f77631b446dd2115eacb9e8
baseline version:
 xen                  be12fcca8b784e456df3adedbffe657d753c5ff9

Last test of basis   166197  2021-11-19 19:00:27 Z    3 days
Testing same since   166312  2021-11-23 13:00:27 Z    0 days    1 attempts

People who touched revisions under test:
  Jan Beulich
  Julien Grall

jobs:
 build-arm64-xsm                                              pass
 build-amd64                                                  pass
 build-armhf                                                  pass
 build-amd64-libvirt                                          pass
 test-armhf-armhf-xl                                          pass
 test-arm64-arm64-xl-xsm                                      pass
 test-amd64-amd64-xl-qemuu-debianhvm-amd64                    pass
 test-amd64-amd64-libvirt                                     pass

sg-report-flight on osstest.test-lab.xenproject.org
logs: /home/logs/logs
images: /home/logs/images

Logs, config files, etc. are available at
 http://logs.test-lab.xenproject.org/osstest/logs

Explanation of these reports, and of osstest in general, is at
 http://xenbits.xen.org/gitweb/?p=osstest.git;a=blob;f=README.email;hb=master
 http://xenbits.xen.org/gitweb/?p=osstest.git;a=blob;f=README;hb=master

Test harness code can be found at
 http://xenbits.xen.org/gitweb?p=osstest.git;a=summary

Pushing revision :
To xenbits.xen.org:/home/xen/git/xen.git
   be12fcca8b..74a11c43fd  74a11c43fd7e074b1f77631b446dd2115eacb9e8 -> smoke
Re: [PATCH v3 03/10] xen/arm: introduce new helper parse_static_mem_prop and acquire_static_memory_bank
Hi Penny,

On 16/11/2021 06:31, Penny Zheng wrote:

Later, we will introduce allocate_static_memory_11 for allocating static memory for direct-map domains, and it will share a lot of common code with the existing allocate_static_memory. In order not to bring a lot of duplicate code, and also to make the whole code more readable, this commit extracts the common code into two new helpers, parse_static_mem_prop and acquire_static_memory_bank.

Signed-off-by: Penny Zheng
---
v3 changes:
- new commit to move the split off of the code outside in a separate patch
---
 xen/arch/arm/domain_build.c | 100 +++-
 1 file changed, 64 insertions(+), 36 deletions(-)

diff --git a/xen/arch/arm/domain_build.c b/xen/arch/arm/domain_build.c
index 7a063f62fe..1dc728e848 100644
--- a/xen/arch/arm/domain_build.c
+++ b/xen/arch/arm/domain_build.c
@@ -515,12 +515,69 @@ static bool __init append_static_memory_to_bank(struct domain *d,
     return true;
 }

+static mfn_t __init acquire_static_memory_bank(struct domain *d,
+                                               const __be32 **cell,
+                                               u32 addr_cells, u32 size_cells,
+                                               paddr_t *pbase, paddr_t *psize)
+{
+    mfn_t smfn;
+    int res;
+
+    device_tree_get_reg(cell, addr_cells, size_cells, pbase, psize);
+    ASSERT(IS_ALIGNED(*pbase, PAGE_SIZE) && IS_ALIGNED(*psize, PAGE_SIZE));
+    if ( PFN_DOWN(*psize) > UINT_MAX )
+    {
+        printk(XENLOG_ERR "%pd: static memory size too large: %#"PRIpaddr,
+               d, *psize);
+        return INVALID_MFN;
+    }
+
+    smfn = maddr_to_mfn(*pbase);
+    res = acquire_domstatic_pages(d, smfn, PFN_DOWN(*psize), 0);
+    if ( res )
+    {
+        printk(XENLOG_ERR
+               "%pd: failed to acquire static memory: %d.\n", d, res);
+        return INVALID_MFN;
+    }
+
+    return smfn;
+}
+
+static int __init parse_static_mem_prop(const struct dt_device_node *node,
+                                        u32 *addr_cells, u32 *size_cells,
+                                        int *length, const __be32 **cell)
+{
+    const struct dt_property *prop;
+
+    prop = dt_find_property(node, "xen,static-mem", NULL);
+    if ( !dt_property_read_u32(node, "#xen,static-mem-address-cells",
+                               addr_cells) )
+    {
+        printk(XENLOG_ERR
+               "failed to read 
\"#xen,static-mem-address-cells\".\n");
+        return -EINVAL;
+    }
+
+    if ( !dt_property_read_u32(node, "#xen,static-mem-size-cells",
+                               size_cells) )
+    {
+        printk(XENLOG_ERR
+               "failed to read \"#xen,static-mem-size-cells\".\n");
+        return -EINVAL;
+    }
+
+    *cell = (const __be32 *)prop->value;
+    *length = prop->length;
+
+    return 0;
+}
+
 /* Allocate memory from static memory as RAM for one specific domain d. */
 static void __init allocate_static_memory(struct domain *d,
                                           struct kernel_info *kinfo,
                                           const struct dt_device_node *node)
 {
-    const struct dt_property *prop;
     u32 addr_cells, size_cells, reg_cells;
     unsigned int nr_banks, gbank, bank = 0;
     const uint64_t rambase[] = GUEST_RAM_BANK_BASES;
@@ -529,24 +586,10 @@ static void __init allocate_static_memory(struct domain *d,
     u64 tot_size = 0;
     paddr_t pbase, psize, gsize;
     mfn_t smfn;
-    int res;
-
-    prop = dt_find_property(node, "xen,static-mem", NULL);
-    if ( !dt_property_read_u32(node, "#xen,static-mem-address-cells",
-                               &addr_cells) )
-    {
-        printk(XENLOG_ERR
-               "%pd: failed to read \"#xen,static-mem-address-cells\".\n", d);
-        goto fail;
-    }
+    int length;

-    if ( !dt_property_read_u32(node, "#xen,static-mem-size-cells",
-                               &size_cells) )
-    {
-        printk(XENLOG_ERR
-               "%pd: failed to read \"#xen,static-mem-size-cells\".\n", d);
+    if ( parse_static_mem_prop(node, &addr_cells, &size_cells, &length, &cell) )
         goto fail;
-    }

     reg_cells = addr_cells + size_cells;

     /*
@@ -557,29 +600,14 @@ static void __init allocate_static_memory(struct domain *d,
     gbank = 0;
     gsize = ramsize[gbank];
     kinfo->mem.bank[gbank].start = rambase[gbank];
-
-    cell = (const __be32 *)prop->value;
-    nr_banks = (prop->length) / (reg_cells * sizeof (u32));
+    nr_banks = length / (reg_cells * sizeof (u32));

     for ( ; bank < nr_banks; bank++ )
     {
-        device_tree_get_reg(&cell, addr_cells, size_cells, &pbase, &psize);
-        ASSERT(IS_ALIGNED(pbase, PAGE_SIZE) && IS_ALIGNED(psize, PAGE_SIZE));
-
-        if ( PFN_DOWN(psize) > UINT_MAX )
-        {
-            printk(XENLOG_ERR "%pd: static memory size too large: %#"PRIpaddr,
-                   d, psize);
+        smfn = acquire_static_memory_bank(d, &cell, addr_cells,
size_cells, +
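The nr_banks computation in the refactored loop above, length / (reg_cells * sizeof(u32)), can be checked with a small standalone sketch. The helper name static_mem_nr_banks is hypothetical and not part of the patch.

```c
#include <stdint.h>

/*
 * Standalone sketch of how allocate_static_memory() derives the number of
 * "xen,static-mem" banks: each bank is one (address, size) reg entry made of
 * addr_cells + size_cells 32-bit cells. Illustrative helper, made-up name.
 */
static unsigned int static_mem_nr_banks(unsigned int length_bytes,
                                        uint32_t addr_cells,
                                        uint32_t size_cells)
{
    unsigned int reg_cells = addr_cells + size_cells;

    /* Property length in bytes divided by the size of one reg entry. */
    return length_bytes / (reg_cells * (unsigned int)sizeof(uint32_t));
}
```

For example, with #xen,static-mem-address-cells = 2 and #xen,static-mem-size-cells = 2, each bank occupies 16 bytes, so a 32-byte property describes two banks.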
Re: [PATCH v3 02/10] xen/arm: avoid setting XEN_DOMCTL_CDF_iommu when IOMMU off
Hi Penny,

On 16/11/2021 06:31, Penny Zheng wrote:

From: Stefano Stabellini

This commit avoids setting XEN_DOMCTL_CDF_iommu when the IOMMU is absent/disabled, otherwise Xen will fail later when handling device assignment.

I would explain briefly in the commit message why you want to do device assignment without PCI passthrough. Other than that, the change below is fine with me.

Signed-off-by: Penny Zheng
Signed-off-by: Stefano Stabellini
---
v3 changes:
- new commit, split from the original "[PATCH v2 2/6] xen/arm: introduce direct-map for domUs"
---
 xen/arch/arm/domain_build.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/xen/arch/arm/domain_build.c b/xen/arch/arm/domain_build.c
index 664c88ebe4..7a063f62fe 100644
--- a/xen/arch/arm/domain_build.c
+++ b/xen/arch/arm/domain_build.c
@@ -2996,7 +2996,8 @@ void __init create_domUs(void)
             panic("Missing property 'cpus' for domain %s\n",
                   dt_node_name(node));

-        if ( dt_find_compatible_node(node, NULL, "multiboot,device-tree") )
+        if ( dt_find_compatible_node(node, NULL, "multiboot,device-tree") &&
+             iommu_enabled )
             d_cfg.flags |= XEN_DOMCTL_CDF_iommu;

         if ( !dt_property_read_u32(node, "nr_spis", &d_cfg.arch.nr_spis) )

Cheers,

--
Julien Grall
Re: [PATCH v6 2/7] xen/arm: add pci-domain for disabled devices
Hi,

On 23/11/2021 16:44, Oleksandr Andrushchenko wrote:

On 23.11.21 18:05, Julien Grall wrote:

Hi Oleksandr,

On 23/11/2021 06:31, Oleksandr Andrushchenko wrote:

On 22.11.21 19:17, Julien Grall wrote:

Hi,

On 22/11/2021 16:23, Oleksandr Andrushchenko wrote:

On 22.11.21 17:29, Julien Grall wrote:

I would prefer to go with my way. This can be refined in the future if we find Device-Tree that matches what you wrote.

Ok, so just to make it clear:

> a PCI device would always be described as a child of the hostbridges.

So I would rework the 'if' to also check if the parent type is not "pci"

That's correct. Thank you! Ok, so how about

if ( is_pci_passthrough_enabled() && dt_device_type_is_equal(node, "pci") )
{
    bool skip = false;

    /*
     * If the parent is also a "pci" device, then "linux,pci-domain"
     * should already be there, so nothing to do then.
     */

This comment is a bit confusing. Do you have something on your mind?

Yes, explain that we only want to cover hostbridges (see my reply on the previous answer). I think what matters is: if the parent is a "pci" device, then the current node must not be a hostbridge. So we can skip it.

By skipping you only mean we do not need to add/assign "linux,pci-domain", right?

Yes. However...

    if ( node->parent && dt_device_type_is_equal(node->parent, "pci") )
        skip = true;

    if ( !skip && !dt_find_property(node, "linux,pci-domain", NULL) )
    {

I played with a single if and it looks scary...

... how about introducing a helper that will return true if this node is likely a hostbridge?

So far this is a single use of such a check: why would we need a helper and what would that helper do?

I like splitting code into multiple functions, even if they are only called once, because this:
 1) keeps the function line count small
 2) makes it easier to understand the purpose of the check

In fact, I would actually move the handling for "linux,pci-domain" into a separate helper.
Something like:

/*
 * When PCI passthrough is available we want to keep the
 * "linux,pci-domain" in sync for every hostbridge.
 *
 * Xen may not have a driver for all the hostbridges. So we have
 * to write a heuristic to detect whether a device node describes
 * a hostbridge.
 *
 * The current heuristic assumes that a device is a hostbridge
 * if the type is "pci" and the parent type is not "pci".
 */
static int handle_linux_pci_domain(struct dt_device *node)
{
    if ( !is_pci_passthrough_enabled() )
        return 0;

    if ( !dt_device_type_is_equal(node, "pci") )
        return 0;

    if ( node->parent && dt_device_type_is_equal(node->parent, "pci") )
        return 0;

    if ( dt_find_property(node, "linux,pci-domain", NULL) )
        return 0;

    /* Allocate and create the linux,pci-domain */
}

Cheers,

--
Julien Grall
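The heuristic Julien sketches above can be exercised with a self-contained mock. Note that struct dt_device_node below is a simplified stand-in for Xen's real device-tree type, and is_likely_hostbridge is a hypothetical name used only for this illustration.

```c
#include <string.h>

/*
 * Mock of the hostbridge heuristic discussed above: a node is treated as a
 * hostbridge when its device_type is "pci" and its parent's type is not
 * "pci". Simplified stand-in types; not Xen's real dt_device_node.
 */
struct dt_device_node {
    const char *type;
    const struct dt_device_node *parent;
};

static int dt_device_type_is_equal(const struct dt_device_node *node,
                                   const char *type)
{
    return node->type && strcmp(node->type, type) == 0;
}

static int is_likely_hostbridge(const struct dt_device_node *node)
{
    if ( !dt_device_type_is_equal(node, "pci") )
        return 0;

    /* A "pci" node under another "pci" node is a device, not a hostbridge. */
    if ( node->parent && dt_device_type_is_equal(node->parent, "pci") )
        return 0;

    return 1;
}

/* Example topology: soc -> hostbridge ("pci") -> PCI device ("pci"). */
static const struct dt_device_node soc = { "soc", NULL };
static const struct dt_device_node bridge = { "pci", &soc };
static const struct dt_device_node pcidev = { "pci", &bridge };
```

With this topology, only the bridge node is classified as a hostbridge, which matches the "parent type is not pci" rule agreed in the thread.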
RE: [PATCH V2 1/6] Swiotlb: Add Swiotlb bounce buffer remap function for HV IVM
From: Tianyu Lan Sent: Tuesday, November 23, 2021 6:31 AM

> In Isolation VM with AMD SEV, the bounce buffer needs to be accessed via
> an extra address space which is above shared_gpa_boundary (E.G 39 bit
> address line) reported by Hyper-V CPUID ISOLATION_CONFIG. The access
> physical address will be the original physical address + shared_gpa_boundary.
> The shared_gpa_boundary in the AMD SEV SNP spec is called virtual top of
> memory (vTOM). Memory addresses below vTOM are automatically treated as
> private while memory above vTOM is treated as shared.
>
> Expose swiotlb_unencrypted_base for platforms to set the unencrypted
> memory base offset, and have the platform call swiotlb_update_mem_attributes()
> to remap the swiotlb mem to the unencrypted address space. memremap() can
> not be called in the early stage, so put the remapping code into
> swiotlb_update_mem_attributes(). Store the remap address and use it to copy
> data from/to the swiotlb bounce buffer.
>
> Signed-off-by: Tianyu Lan
> ---
> Change since v1:
>  * Rework comment in the swiotlb_init_io_tlb_mem()
>  * Make swiotlb_init_io_tlb_mem() back to return void.
> ---
>  include/linux/swiotlb.h |  6 +
>  kernel/dma/swiotlb.c    | 53 +
>  2 files changed, 54 insertions(+), 5 deletions(-)
>
> diff --git a/include/linux/swiotlb.h b/include/linux/swiotlb.h
> index 569272871375..f6c3638255d5 100644
> --- a/include/linux/swiotlb.h
> +++ b/include/linux/swiotlb.h
> @@ -73,6 +73,9 @@ extern enum swiotlb_force swiotlb_force;
>   * @end: The end address of the swiotlb memory pool. Used to do a quick
>   *       range check to see if the memory was in fact allocated by this
>   *       API.
> + * @vaddr: The vaddr of the swiotlb memory pool. The swiotlb memory pool
> + *         may be remapped in the memory encrypted case and store virtual
> + *         address for bounce buffer operation.
>   * @nslabs: The number of IO TLB blocks (in groups of 64) between @start and
>   *          @end. For default swiotlb, this is command line adjustable via
>   *          setup_io_tlb_npages.
> @@ -92,6 +95,7 @@ extern enum swiotlb_force swiotlb_force;
>  struct io_tlb_mem {
>  	phys_addr_t start;
>  	phys_addr_t end;
> +	void *vaddr;
>  	unsigned long nslabs;
>  	unsigned long used;
>  	unsigned int index;
> @@ -186,4 +190,6 @@ static inline bool is_swiotlb_for_alloc(struct device *dev)
>  }
>  #endif /* CONFIG_DMA_RESTRICTED_POOL */
>
> +extern phys_addr_t swiotlb_unencrypted_base;
> +
>  #endif /* __LINUX_SWIOTLB_H */
> diff --git a/kernel/dma/swiotlb.c b/kernel/dma/swiotlb.c
> index 8e840fbbed7c..c303fdeba82f 100644
> --- a/kernel/dma/swiotlb.c
> +++ b/kernel/dma/swiotlb.c
> @@ -50,6 +50,7 @@
>  #include
>  #include
>
> +#include
>  #include
>  #include
>  #include
> @@ -72,6 +73,8 @@ enum swiotlb_force swiotlb_force;
>
>  struct io_tlb_mem io_tlb_default_mem;
>
> +phys_addr_t swiotlb_unencrypted_base;
> +
>  /*
>   * Max segment that we can provide which (if pages are contiguous) will
>   * not be bounced (unless SWIOTLB_FORCE is set).
> @@ -155,6 +158,31 @@ static inline unsigned long nr_slots(u64 val)
>  	return DIV_ROUND_UP(val, IO_TLB_SIZE);
>  }
>
> +/*
> + * Remap swiotlb memory in the unencrypted physical address space
> + * when swiotlb_unencrypted_base is set (e.g. for Hyper-V AMD SEV-SNP
> + * Isolation VMs).
> + */
> +void *swiotlb_mem_remap(struct io_tlb_mem *mem, unsigned long bytes)
> +{
> +	void *vaddr;
> +
> +	if (swiotlb_unencrypted_base) {
> +		phys_addr_t paddr = mem->start + swiotlb_unencrypted_base;
> +
> +		vaddr = memremap(paddr, bytes, MEMREMAP_WB);
> +		if (!vaddr) {
> +			pr_err("Failed to map the unencrypted memory %llx size %lx.\n",
> +			       paddr, bytes);
> +			return NULL;
> +		}
> +
> +		return vaddr;
> +	}
> +
> +	return phys_to_virt(mem->start);
> +}
> +
>  /*
>   * Early SWIOTLB allocation may be too early to allow an architecture to
>   * perform the desired operations. 
This function allows the architecture to
> @@ -172,7 +200,14 @@ void __init swiotlb_update_mem_attributes(void)
>  	vaddr = phys_to_virt(mem->start);
>  	bytes = PAGE_ALIGN(mem->nslabs << IO_TLB_SHIFT);
>  	set_memory_decrypted((unsigned long)vaddr, bytes >> PAGE_SHIFT);
> -	memset(vaddr, 0, bytes);
> +
> +	mem->vaddr = swiotlb_mem_remap(mem, bytes);
> +	if (!mem->vaddr) {
> +		pr_err("Fail to remap swiotlb mem.\n");
> +		return;
> +	}
> +
> +	memset(mem->vaddr, 0, bytes);
>  }

In the error case, do you want to leave mem->vaddr as NULL? Or is it better to leave it as the virtual address of mem->start? Your code leaves it as NULL.

The interaction between swiotlb_update_mem_attributes() and the helper function swiotlb_mem_remap() seems kind of clunky. phys_to_virt() gets called twice, for
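The remap-or-direct-map decision under review can be modeled in isolation. The phys_to_virt()/memremap() stand-ins below are stubs built from arbitrary example offsets, purely to show the control flow of swiotlb_mem_remap(); they do not reflect real kernel address layout.

```c
#include <stdint.h>

/*
 * Model of the decision inside swiotlb_mem_remap(): if an unencrypted base
 * (vTOM) is set, the pool is accessed through a remapping of
 * start + unencrypted_base; otherwise through the kernel direct map.
 * Both stubs return fake "virtual addresses" from made-up offsets.
 */
static const uint64_t direct_map_stub_base = 0x100000ULL; /* made-up offset */
static const uint64_t remap_stub_base = 0x200000ULL;      /* made-up offset */

static uint64_t phys_to_virt_stub(uint64_t paddr)
{
    /* Stand-in for phys_to_virt(): direct-map translation. */
    return direct_map_stub_base + paddr;
}

static uint64_t memremap_stub(uint64_t paddr)
{
    /* Stand-in for memremap(..., MEMREMAP_WB). */
    return remap_stub_base + paddr;
}

static uint64_t swiotlb_pool_vaddr(uint64_t start, uint64_t unencrypted_base)
{
    if (unencrypted_base)
        return memremap_stub(start + unencrypted_base);

    return phys_to_virt_stub(start);
}
```

The reviewer's point is visible here too: when unencrypted_base is zero, the caller ends up computing the direct-map address in both the helper and swiotlb_update_mem_attributes().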
[qemu-mainline test] 166307: regressions - FAIL
flight 166307 qemu-mainline real [real]
flight 166316 qemu-mainline real-retest [real]
http://logs.test-lab.xenproject.org/osstest/logs/166307/
http://logs.test-lab.xenproject.org/osstest/logs/166316/

Regressions :-(

Tests which did not succeed and are blocking, including tests which could not be run:
 test-amd64-amd64-xl-credit2  12 debian-install           fail REGR. vs. 166300
 test-amd64-amd64-libvirt-xsm 12 debian-install           fail REGR. vs. 166300

Tests which are failing intermittently (not blocking):
 test-armhf-armhf-xl-credit2   8 xen-boot    fail pass in 166316-retest

Tests which did not succeed, but are not blocking:
 test-armhf-armhf-xl-credit2  15 migrate-support-check     fail in 166316 never pass
 test-armhf-armhf-xl-credit2  16 saverestore-support-check fail in 166316 never pass
 test-amd64-amd64-xl-qemuu-win7-amd64 19 guest-stop        fail like 166300
 test-amd64-amd64-qemuu-nested-amd 20 debian-hvm-install/l1/l2 fail like 166300
 test-amd64-i386-xl-qemuu-win7-amd64 19 guest-stop         fail like 166300
 test-armhf-armhf-libvirt-qcow2 15 saverestore-support-check fail like 166300
 test-armhf-armhf-libvirt-raw 15 saverestore-support-check fail like 166300
 test-armhf-armhf-libvirt     16 saverestore-support-check fail like 166300
 test-amd64-i386-xl-qemuu-ws16-amd64 19 guest-stop         fail like 166300
 test-amd64-amd64-xl-qemuu-ws16-amd64 19 guest-stop        fail like 166300
 test-arm64-arm64-xl-seattle  15 migrate-support-check     fail never pass
 test-arm64-arm64-xl-seattle  16 saverestore-support-check fail never pass
 test-arm64-arm64-xl          15 migrate-support-check     fail never pass
 test-arm64-arm64-xl          16 saverestore-support-check fail never pass
 test-amd64-amd64-libvirt     15 migrate-support-check     fail never pass
 test-amd64-i386-libvirt      15 migrate-support-check     fail never pass
 test-amd64-i386-libvirt-xsm  15 migrate-support-check     fail never pass
 test-amd64-i386-xl-pvshim    14 guest-start               fail never pass
 test-amd64-i386-libvirt-qemuu-debianhvm-amd64-xsm 13 migrate-support-check fail never pass
 test-arm64-arm64-xl-xsm      15 migrate-support-check     fail never pass
 test-arm64-arm64-xl-xsm      16 saverestore-support-check fail never pass
 test-arm64-arm64-libvirt-xsm 15 migrate-support-check     fail never pass
 test-arm64-arm64-xl-credit1  15 migrate-support-check     fail never pass
 test-arm64-arm64-libvirt-xsm 16 saverestore-support-check fail never pass
 test-arm64-arm64-xl-credit1  16 saverestore-support-check fail never pass
 test-arm64-arm64-xl-thunderx 15 migrate-support-check     fail never pass
 test-arm64-arm64-xl-thunderx 16 saverestore-support-check fail never pass
 test-arm64-arm64-xl-credit2  15 migrate-support-check     fail never pass
 test-arm64-arm64-xl-credit2  16 saverestore-support-check fail never pass
 test-amd64-amd64-libvirt-qemuu-debianhvm-amd64-xsm 13 migrate-support-check fail never pass
 test-armhf-armhf-xl-arndale  15 migrate-support-check     fail never pass
 test-armhf-armhf-xl-arndale  16 saverestore-support-check fail never pass
 test-amd64-i386-libvirt-raw  14 migrate-support-check     fail never pass
 test-amd64-amd64-libvirt-vhd 14 migrate-support-check     fail never pass
 test-arm64-arm64-libvirt-raw 14 migrate-support-check     fail never pass
 test-arm64-arm64-libvirt-raw 15 saverestore-support-check fail never pass
 test-arm64-arm64-xl-vhd      14 migrate-support-check     fail never pass
 test-arm64-arm64-xl-vhd      15 saverestore-support-check fail never pass
 test-armhf-armhf-xl-multivcpu 15 migrate-support-check    fail never pass
 test-armhf-armhf-xl-multivcpu 16 saverestore-support-check fail never pass
 test-armhf-armhf-xl          15 migrate-support-check     fail never pass
 test-armhf-armhf-xl          16 saverestore-support-check fail never pass
 test-armhf-armhf-xl-rtds     15 migrate-support-check     fail never pass
 test-armhf-armhf-xl-rtds     16 saverestore-support-check fail never pass
 test-armhf-armhf-xl-cubietruck 15 migrate-support-check   fail never pass
 test-armhf-armhf-xl-cubietruck 16 saverestore-support-check fail never pass
 test-armhf-armhf-libvirt-raw 14 migrate-support-check     fail never pass
 test-armhf-armhf-libvirt-qcow2 14 migrate-support-check   fail never pass
 test-armhf-armhf-xl-vhd      14 migrate-support-check     fail never pass
 test-armhf-armhf-xl-vhd      15 saverestore-support-check fail never pass
 test-armhf-armhf-xl-credit1  15 migrate-support-check     fail never pass
 test-armhf-armhf-xl-credit1  16 saverestore-support-check fail never pass
 test-armhf-armhf-libvirt     15 migrate-support-check     fail never pass

version targeted for testing:
 qemuu
Xen 4.16 branched, commit moratorium lifted, freeze still in force
Xen 4.16 has now branched. The staging-4.16 branch et al exist. The commit moratorium is lifted. However:

* 4.16 is in deep freeze now. Nothing should be committed without a release ack. Roger, you have a blanket release ack for your CHANGELOG update(s).

* Until 4.16 is released, please avoid significant restructuring or other changes which would hinder fixes being back- or forward-ported between 4.16 and unstable.

I am hoping to release early in the week of the 29th of November. This means *all* non-release-administrative tree changes need to be committed by the end of this coming Friday, the 26th of November. If a significant bug is discovered between now and then, we will have to slip the release.

Ian.
Re: [PATCH v6 3/7] xen/arm: setup MMIO range trap handlers for hardware domain
Hi,

On 23/11/2021 16:41, Oleksandr Andrushchenko wrote:

On 23.11.21 18:12, Julien Grall wrote:

On 23/11/2021 06:58, Oleksandr Andrushchenko wrote:

unsigned int domain_vpci_get_num_mmio_handlers(struct domain *d)
{
    if ( !has_vpci(d) )
        return 0;

    if ( is_hardware_domain(d) )
    {
        int ret = pci_host_iterate_bridges_and_count(d, vpci_get_num_handlers_cb);

        return ret < 0 ? 0 : ret;
    }

    /*
     * This is a guest domain:
     *
     * 1 for a single emulated host bridge's configuration space.
     */
    return 1;

I am afraid that my question stands even with this approach. This patch is only meant to handle the hardware domain, therefore the change seems to be out of context. I would prefer if this change is done separately.

While I do agree that the MSI part and virtual bus topology do not belong to this patch, I can't agree with the rest: we already have MMIO handlers for guest domains, and we introduce domain_vpci_get_num_mmio_handlers, which must also account for guests and stay consistent. So, despite the patch having "hardware domain" in its name, it doesn't mean we should break guests here.

We were already registering the handler for guest domains before your patch. So this is nothing new. At the moment, we always allocate an extra 16 slots for IO handlers (see MAX_IO_HANDLER). So we are not breaking anything. Instead, this is simply a latent bug. Thus I do think the above is still correct wrt this patch.

The idea of splitting a patch is to separate the bug fix from new code. This helps backporting and review. In this case, we don't care about a backport (PCI passthrough is not supported) and the fix is simple enough. So I am not going to insist on splitting it into a separate patch. However, this change *must* be explained in the commit message.

Cheers,

--
Julien Grall
[PATCH for-4.17 3/3] Set version to 4.17: rerun autogen.sh
Signed-off-by: Ian Jackson --- configure | 18 +- docs/configure| 18 +- stubdom/configure | 18 +- tools/configure | 18 +- 4 files changed, 36 insertions(+), 36 deletions(-) diff --git a/configure b/configure index 62f6c2d47a..502273b263 100755 --- a/configure +++ b/configure @@ -1,6 +1,6 @@ #! /bin/sh # Guess values for system-dependent variables and create Makefiles. -# Generated by GNU Autoconf 2.69 for Xen Hypervisor 4.16. +# Generated by GNU Autoconf 2.69 for Xen Hypervisor 4.17. # # Report bugs to . # @@ -579,8 +579,8 @@ MAKEFLAGS= # Identity of this package. PACKAGE_NAME='Xen Hypervisor' PACKAGE_TARNAME='xen' -PACKAGE_VERSION='4.16' -PACKAGE_STRING='Xen Hypervisor 4.16' +PACKAGE_VERSION='4.17' +PACKAGE_STRING='Xen Hypervisor 4.17' PACKAGE_BUGREPORT='xen-de...@lists.xen.org' PACKAGE_URL='https://www.xen.org/' @@ -1236,7 +1236,7 @@ if test "$ac_init_help" = "long"; then # Omit some internal or obsolete options to make the list less imposing. # This message is too long to be a string in the A/UX 3.1 sh. cat <<_ACEOF -\`configure' configures Xen Hypervisor 4.16 to adapt to many kinds of systems. +\`configure' configures Xen Hypervisor 4.17 to adapt to many kinds of systems. Usage: $0 [OPTION]... [VAR=VALUE]... @@ -1302,7 +1302,7 @@ fi if test -n "$ac_init_help"; then case $ac_init_help in - short | recursive ) echo "Configuration of Xen Hypervisor 4.16:";; + short | recursive ) echo "Configuration of Xen Hypervisor 4.17:";; esac cat <<\_ACEOF @@ -1403,7 +1403,7 @@ fi test -n "$ac_init_help" && exit $ac_status if $ac_init_version; then cat <<\_ACEOF -Xen Hypervisor configure 4.16 +Xen Hypervisor configure 4.17 generated by GNU Autoconf 2.69 Copyright (C) 2012 Free Software Foundation, Inc. @@ -1420,7 +1420,7 @@ cat >config.log <<_ACEOF This file contains any messages produced by compilers while running configure, to aid debugging if configure makes a mistake. 
-It was created by Xen Hypervisor $as_me 4.16, which was +It was created by Xen Hypervisor $as_me 4.17, which was generated by GNU Autoconf 2.69. Invocation command line was $ $0 $@ @@ -2868,7 +2868,7 @@ cat >>$CONFIG_STATUS <<\_ACEOF || ac_write_fail=1 # report actual input values of CONFIG_FILES etc. instead of their # values after options handling. ac_log=" -This file was extended by Xen Hypervisor $as_me 4.16, which was +This file was extended by Xen Hypervisor $as_me 4.17, which was generated by GNU Autoconf 2.69. Invocation command line was CONFIG_FILES= $CONFIG_FILES @@ -2922,7 +2922,7 @@ _ACEOF cat >>$CONFIG_STATUS <<_ACEOF || ac_write_fail=1 ac_cs_config="`$as_echo "$ac_configure_args" | sed 's/^ //; s/[\\""\`\$]/&/g'`" ac_cs_version="\\ -Xen Hypervisor config.status 4.16 +Xen Hypervisor config.status 4.17 configured by $0, generated by GNU Autoconf 2.69, with options \\"\$ac_cs_config\\" diff --git a/docs/configure b/docs/configure index 569bd4c2ff..f93d086e9a 100755 --- a/docs/configure +++ b/docs/configure @@ -1,6 +1,6 @@ #! /bin/sh # Guess values for system-dependent variables and create Makefiles. -# Generated by GNU Autoconf 2.69 for Xen Hypervisor Documentation 4.16. +# Generated by GNU Autoconf 2.69 for Xen Hypervisor Documentation 4.17. # # Report bugs to . # @@ -579,8 +579,8 @@ MAKEFLAGS= # Identity of this package. PACKAGE_NAME='Xen Hypervisor Documentation' PACKAGE_TARNAME='xen' -PACKAGE_VERSION='4.16' -PACKAGE_STRING='Xen Hypervisor Documentation 4.16' +PACKAGE_VERSION='4.17' +PACKAGE_STRING='Xen Hypervisor Documentation 4.17' PACKAGE_BUGREPORT='xen-de...@lists.xen.org' PACKAGE_URL='https://www.xen.org/' @@ -1224,7 +1224,7 @@ if test "$ac_init_help" = "long"; then # Omit some internal or obsolete options to make the list less imposing. # This message is too long to be a string in the A/UX 3.1 sh. cat <<_ACEOF -\`configure' configures Xen Hypervisor Documentation 4.16 to adapt to many kinds of systems. 
+\`configure' configures Xen Hypervisor Documentation 4.17 to adapt to many kinds of systems. Usage: $0 [OPTION]... [VAR=VALUE]... @@ -1286,7 +1286,7 @@ fi if test -n "$ac_init_help"; then case $ac_init_help in - short | recursive ) echo "Configuration of Xen Hypervisor Documentation 4.16:";; + short | recursive ) echo "Configuration of Xen Hypervisor Documentation 4.17:";; esac cat <<\_ACEOF @@ -1387,7 +1387,7 @@ fi test -n "$ac_init_help" && exit $ac_status if $ac_init_version; then cat <<\_ACEOF -Xen Hypervisor Documentation configure 4.16 +Xen Hypervisor Documentation configure 4.17 generated by GNU Autoconf 2.69 Copyright (C) 2012 Free Software Foundation, Inc. @@ -1404,7 +1404,7 @@ cat >config.log <<_ACEOF This file contains any messages produced by compilers while running configure, to aid debugging if configure makes a mistake. -It was created by Xen Hypervisor Documentation
[PATCH for-4.17 2/3] Set version to 4.17; 4.16 has branched
Signed-off-by: Ian Jackson --- README | 10 +- xen/Makefile | 4 ++-- 2 files changed, 7 insertions(+), 7 deletions(-) diff --git a/README b/README index a626e56436..562b808080 100644 --- a/README +++ b/README @@ -1,9 +1,9 @@ -__ ___ __ __ -\ \/ /___ _ __ | || | / |/ /__ __ ___ - \ // _ \ '_ \ | || |_ | | '_ \ _| '__/ __| - / \ __/ | | | |__ _|| | (_) |_| | | (__ -/_/\_\___|_| |_||_|(_)_|\___/ |_| \___| +__ ____ _ +\ \/ /___ _ ___ _ _ __ ___| |_ __ _| |__ | | ___ + \ // _ \ '_ \ _| | | | '_ \/ __| __/ _` | '_ \| |/ _ \ + / \ __/ | | |_| |_| | | | \__ \ || (_| | |_) | | __/ +/_/\_\___|_| |_| \__,_|_| |_|___/\__\__,_|_.__/|_|\___| diff --git a/xen/Makefile b/xen/Makefile index 2fc83f266b..1fd48af7ae 100644 --- a/xen/Makefile +++ b/xen/Makefile @@ -1,8 +1,8 @@ # This is the correct place to edit the build version. # All other places this is stored (eg. compile.h) should be autogenerated. export XEN_VERSION = 4 -export XEN_SUBVERSION= 16 -export XEN_EXTRAVERSION ?= -rc$(XEN_VENDORVERSION) +export XEN_SUBVERSION= 17 +export XEN_EXTRAVERSION ?= -unstable$(XEN_VENDORVERSION) export XEN_FULLVERSION = $(XEN_VERSION).$(XEN_SUBVERSION)$(XEN_EXTRAVERSION) -include xen-version -- 2.20.1
[PATCH for-4.17 1/3] Revert "Config.mk: pin QEMU_UPSTREAM_REVISION (prep for Xen 4.16 RC1)"
This branch is unstable again now. This reverts commit c9ce6afbf2d7772f47fc572bb7fc9555724927ed. --- Config.mk | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/Config.mk b/Config.mk index 6be010c7ce..6587c7d626 100644 --- a/Config.mk +++ b/Config.mk @@ -239,7 +239,7 @@ SEABIOS_UPSTREAM_URL ?= git://xenbits.xen.org/seabios.git MINIOS_UPSTREAM_URL ?= git://xenbits.xen.org/mini-os.git endif OVMF_UPSTREAM_REVISION ?= 7b4a99be8a39c12d3a7fc4b8db9f0eab4ac688d5 -QEMU_UPSTREAM_REVISION ?= b6e539830bf45e2d7a6bd86ddfdf003088b173b0 +QEMU_UPSTREAM_REVISION ?= master MINIOS_UPSTREAM_REVISION ?= 9f09744aa3e5982a083ecf8e9cd2123f477081f9 SEABIOS_UPSTREAM_REVISION ?= rel-1.14.0 -- 2.20.1
[PATCH for-4.16 3/3] Turn off debug by default
Signed-off-by: Ian Jackson --- tools/Rules.mk| 2 +- xen/Kconfig.debug | 2 +- 2 files changed, 2 insertions(+), 2 deletions(-) diff --git a/tools/Rules.mk b/tools/Rules.mk index b022da3336..051a5d3555 100644 --- a/tools/Rules.mk +++ b/tools/Rules.mk @@ -25,7 +25,7 @@ CFLAGS_xeninclude = -I$(XEN_INCLUDE) XENSTORE_XENSTORED ?= y # A debug build of tools? -debug ?= y +debug ?= n debug_symbols ?= $(debug) XEN_GOCODE_URL= golang.xenproject.org diff --git a/xen/Kconfig.debug b/xen/Kconfig.debug index fad3050d4f..4419030235 100644 --- a/xen/Kconfig.debug +++ b/xen/Kconfig.debug @@ -3,7 +3,7 @@ menu "Debugging Options" config DEBUG bool "Developer Checks" - default y + default n ---help--- If you say Y here this will enable developer checks such as asserts and extra printks. This option is intended for development purposes -- 2.20.1
[PATCH for-4.16 2/3] SUPPORT.md: Set Release Notes link
Signed-off-by: Ian Jackson
---
 SUPPORT.md | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/SUPPORT.md b/SUPPORT.md
index 3a34933c89..b1367e2b22 100644
--- a/SUPPORT.md
+++ b/SUPPORT.md
@@ -9,13 +9,13 @@ for the definitions of the support status levels etc.

 # Release Support

-Xen-Version: unstable
+Xen-Version: 4.16
 Initial-Release: n/a
 Supported-Until: TBD
 Security-Support-Until: Unreleased - not yet security-supported

 Release Notes
-: <a href="https://wiki.xenproject.org/wiki/Xen_Project_X.YY_Release_Notes">RN</a>
+: <a href="https://wiki.xenproject.org/wiki/Xen_Project_4.16_Release_Notes">RN</a>

 # Feature Support
-- 
2.20.1
[PATCH for-4.16 1/3] Config.mk: switch to named tags (for stable branch)
Signed-off-by: Ian Jackson
---
 Config.mk | 8 +++-----
 1 file changed, 3 insertions(+), 5 deletions(-)

diff --git a/Config.mk b/Config.mk
index 6be010c7ce..b32a185efc 100644
--- a/Config.mk
+++ b/Config.mk
@@ -239,17 +239,15 @@ SEABIOS_UPSTREAM_URL ?= git://xenbits.xen.org/seabios.git
 MINIOS_UPSTREAM_URL ?= git://xenbits.xen.org/mini-os.git
 endif
 OVMF_UPSTREAM_REVISION ?= 7b4a99be8a39c12d3a7fc4b8db9f0eab4ac688d5
-QEMU_UPSTREAM_REVISION ?= b6e539830bf45e2d7a6bd86ddfdf003088b173b0
-MINIOS_UPSTREAM_REVISION ?= 9f09744aa3e5982a083ecf8e9cd2123f477081f9
+QEMU_UPSTREAM_REVISION ?= qemu-xen-4.16.0-rc4
+MINIOS_UPSTREAM_REVISION ?= xen-4.16.0-rc4
 SEABIOS_UPSTREAM_REVISION ?= rel-1.14.0

 ETHERBOOT_NICS ?= rtl8139 8086100e

-QEMU_TRADITIONAL_REVISION ?= 3d273dd05e51e5a1ffba3d98c7437ee84e8f8764
-# Wed Jul 15 10:01:40 2020 +0100
-# qemu-trad: remove Xen path dependencies
+QEMU_TRADITIONAL_REVISION ?= xen-4.16.0-rc4

 # Specify which qemu-dm to use. This may be `ioemu' to use the old
 # Mercurial in-tree version, or a local directory, or a git URL.
-- 
2.20.1
[PATCH for-4.16 0/3] Changes from release technician checklist
I am about to push these to xen.git#staging-4.16.

Roger, I have not messed with the furniture in CHANGELOG.md on either
staging-4.16 or staging.

Ian.

Ian Jackson (3):
  Config.mk: switch to named tags (for stable branch)
  SUPPORT.md: Set Release Notes link
  Turn off debug by default

 Config.mk         | 8 +++-----
 SUPPORT.md        | 4 ++--
 tools/Rules.mk    | 2 +-
 xen/Kconfig.debug | 2 +-
 4 files changed, 7 insertions(+), 9 deletions(-)
-- 
2.20.1
Re: [PATCH V2 3/4] xen/unpopulated-alloc: Add mechanism to use Xen resource
On 20.11.21 04:19, Stefano Stabellini wrote: Hi Stefano, Juergen, all Juergen please see the bottom of the email On Fri, 19 Nov 2021, Oleksandr wrote: On 19.11.21 02:59, Stefano Stabellini wrote: On Tue, 9 Nov 2021, Oleksandr wrote: On 28.10.21 19:37, Stefano Stabellini wrote: Hi Stefano I am sorry for the late response. On Tue, 26 Oct 2021, Oleksandr Tyshchenko wrote: From: Oleksandr Tyshchenko The main reason of this change is that unpopulated-alloc code cannot be used in its current form on Arm, but there is a desire to reuse it to avoid wasting real RAM pages for the grant/foreign mappings. The problem is that system "iomem_resource" is used for the address space allocation, but the really unallocated space can't be figured out precisely by the domain on Arm without hypervisor involvement. For example, not all device I/O regions are known by the time domain starts creating grant/foreign mappings. And following the advise from "iomem_resource" we might end up reusing these regions by a mistake. So, the hypervisor which maintains the P2M for the domain is in the best position to provide unused regions of guest physical address space which could be safely used to create grant/foreign mappings. Introduce new helper arch_xen_unpopulated_init() which purpose is to create specific Xen resource based on the memory regions provided by the hypervisor to be used as unused space for Xen scratch pages. If arch doesn't implement arch_xen_unpopulated_init() to initialize Xen resource the default "iomem_resource" will be used. So the behavior on x86 won't be changed. Also fall back to allocate xenballooned pages (steal real RAM pages) if we do not have any suitable resource to work with and as the result we won't be able to provide unpopulated pages. 
Signed-off-by: Oleksandr Tyshchenko --- Changes RFC -> V2: - new patch, instead of "[RFC PATCH 2/2] xen/unpopulated-alloc: Query hypervisor to provide unallocated space" --- drivers/xen/unpopulated-alloc.c | 89 +++-- include/xen/xen.h | 2 + 2 files changed, 88 insertions(+), 3 deletions(-) diff --git a/drivers/xen/unpopulated-alloc.c b/drivers/xen/unpopulated-alloc.c index a03dc5b..1f1d8d8 100644 --- a/drivers/xen/unpopulated-alloc.c +++ b/drivers/xen/unpopulated-alloc.c @@ -8,6 +8,7 @@ #include +#include #include #include @@ -15,13 +16,29 @@ static DEFINE_MUTEX(list_lock); static struct page *page_list; static unsigned int list_count; +static struct resource *target_resource; +static struct resource xen_resource = { + .name = "Xen unused space", +}; + +/* + * If arch is not happy with system "iomem_resource" being used for + * the region allocation it can provide it's own view by initializing + * "xen_resource" with unused regions of guest physical address space + * provided by the hypervisor. + */ +int __weak arch_xen_unpopulated_init(struct resource *res) +{ + return -ENOSYS; +} + static int fill_list(unsigned int nr_pages) { struct dev_pagemap *pgmap; - struct resource *res; + struct resource *res, *tmp_res = NULL; void *vaddr; unsigned int i, alloc_pages = round_up(nr_pages, PAGES_PER_SECTION); - int ret = -ENOMEM; + int ret; res = kzalloc(sizeof(*res), GFP_KERNEL); if (!res) @@ -30,7 +47,7 @@ static int fill_list(unsigned int nr_pages) res->name = "Xen scratch"; res->flags = IORESOURCE_MEM | IORESOURCE_BUSY; - ret = allocate_resource(_resource, res, + ret = allocate_resource(target_resource, res, alloc_pages * PAGE_SIZE, 0, -1, PAGES_PER_SECTION * PAGE_SIZE, NULL, NULL); if (ret < 0) { @@ -38,6 +55,31 @@ static int fill_list(unsigned int nr_pages) goto err_resource; } + /* +* Reserve the region previously allocated from Xen resource to avoid +* re-using it by someone else. 
+	 */
+	if (target_resource != &iomem_resource) {
+		tmp_res = kzalloc(sizeof(*tmp_res), GFP_KERNEL);
+		if (!res) {
+			ret = -ENOMEM;
+			goto err_insert;
+		}
+
+		tmp_res->name = res->name;
+		tmp_res->start = res->start;
+		tmp_res->end = res->end;
+		tmp_res->flags = res->flags;
+
+		ret = insert_resource(&iomem_resource, tmp_res);
+		if (ret < 0) {
+			pr_err("Cannot insert IOMEM resource [%llx - %llx]\n",
+			       tmp_res->start, tmp_res->end);
+			kfree(tmp_res);
+			goto err_insert;
+		}
+	}

I am a bit confused.. why do we need to do this? Who could be erroneously
re-using the region? Are you saying that the next time allocate_resource is
called it could find the same
Re: [PATCH v6 2/7] xen/arm: add pci-domain for disabled devices
Hi, Julien! On 23.11.21 18:05, Julien Grall wrote: > Hi Oleksandr, > > On 23/11/2021 06:31, Oleksandr Andrushchenko wrote: >> >> >> On 22.11.21 19:17, Julien Grall wrote: >>> Hi, >>> >>> On 22/11/2021 16:23, Oleksandr Andrushchenko wrote: On 22.11.21 17:29, Julien Grall wrote: > I would prefer to go with my way. This can be refined in the future if we > find Device-Tree that matches what you wrote. Ok, so just to make it clear: >a PCI device would always be described as a child of the hostbridges. So I would rework the 'if' to also check if the parent type is not "pci" >>> >>> That's correct. Thank you! >> Ok, so how about >> if ( is_pci_passthrough_enabled() && dt_device_type_is_equal(node, >> "pci") ) >> { >> bool skip = false; >> >> /* >> * If the parent is also a "pci" device, then "linux,pci-domain" >> * should already be there, so nothing to do then. >> */ > > This comment is a bit confusing. Do you have something on your mind? > I think what matter if the parent is a "pci" device, then the current node > must not be a hostbridge. So we can skip it. By skipping you only mean we do not need to add/assign "linux,pci-domain", right? > However... > >> if ( node->parent && dt_device_type_is_equal(node->parent, "pci") ) >> skip = true; >> >> if ( !skip && !dt_find_property(node, "linux,pci-domain", NULL) ) >> { >> I played with a single if and it looks scary... > > ... how about introducing an helper that will return true if this node is > likely an hostbridge? This is yet a single use of such a check: why would we need a helper and what would that helper do? > > Cheers, > Thank you, Oleksandr
Re: [PATCH v6 4/7] xen/arm: do not map PCI ECAM and MMIO space to Domain-0's p2m
Hi Oleksandr, On 05/11/2021 06:33, Oleksandr Andrushchenko wrote: From: Oleksandr Andrushchenko PCI host bridges are special devices in terms of implementing PCI passthrough. According to [1] the current implementation depends on Domain-0 to perform the initialization of the relevant PCI host bridge hardware and perform PCI device enumeration. In order to achieve that one of the required changes is to not map all the memory ranges in map_range_to_domain as we traverse the device tree on startup and perform some additional checks if the range needs to be mapped to Domain-0. The generic PCI host controller device tree binding says [2]: - ranges: As described in IEEE Std 1275-1994, but must provide at least a definition of non-prefetchable memory. One or both of prefetchable Memory and IO Space may also be provided. - reg : The Configuration Space base address and size, as accessed from the parent bus. The base address corresponds to the first bus in the "bus-range" property. If no "bus-range" is specified, this will be bus 0 (the default). From the above none of the memory ranges from the "ranges" property needs to be mapped to Domain-0 at startup as MMIO mapping is going to be handled dynamically by vPCI as we assign PCI devices, e.g. each device assigned to Domain-0/guest will have its MMIOs mapped/unmapped as needed by Xen. The "reg" property covers not only ECAM space, but may also have other then the configuration memory ranges described, for example [3]: - reg: Should contain rc_dbi, config registers location and length. - reg-names: Must include the following entries: "rc_dbi": controller configuration registers; "config": PCIe configuration space registers. This patch makes it possible to not map all the ranges from the "ranges" property and also ECAM from the "reg". All the rest from the "reg" property still needs to be mapped to Domain-0, so the PCI host bridge remains functional in Domain-0. 
The commit message is explaining the problematic quite well (thanks for that). I think it also wants to explain the approach taken (the fact that we are deferring the mapping for hostbridges). [1] https://lists.xenproject.org/archives/html/xen-devel/2020-07/msg00777.html [2] https://www.kernel.org/doc/Documentation/devicetree/bindings/pci/host-generic-pci.txt [3] https://www.kernel.org/doc/Documentation/devicetree/bindings/pci/hisilicon-pcie.txt Signed-off-by: Oleksandr Andrushchenko --- Since v5: - remove some need_mapping local variables - use own_device in handle_device - add __init for pci_ecam_need_p2m_hwdom_mapping - make pci_host_bridge_mappings use p2m_mmio_direct_dev directly Since v4: - update skip_mapping comment - add comment why we need to map interrupts to Dom0 Since v3: - pass struct map_range_data to map_dt_irq_to_domain - remove redundant check from map_range_to_domain - fix handle_device's .skip_mapping Since v2: - removed check in map_range_to_domain for PCI_DEV and moved it to handle_device, so the code is simpler - s/map_pci_bridge/skip_mapping - extended comment in pci_host_bridge_mappings - minor code restructure in construct_dom0 - s/.need_p2m_mapping/.need_p2m_hwdom_mapping and related callbacks - unsigned int i; in pci_host_bridge_mappings Since v1: - Added better description of why and what needs to be mapped into Domain-0's p2m and what doesn't - Do not do any mappings for PCI devices while traversing the DT - Walk all the bridges and make required mappings in one go --- xen/arch/arm/domain_build.c| 67 +- xen/arch/arm/pci/ecam.c| 14 +++ xen/arch/arm/pci/pci-host-common.c | 50 ++ xen/arch/arm/pci/pci-host-zynqmp.c | 1 + xen/include/asm-arm/pci.h | 10 + xen/include/asm-arm/setup.h| 13 ++ 6 files changed, 126 insertions(+), 29 deletions(-) diff --git a/xen/arch/arm/domain_build.c b/xen/arch/arm/domain_build.c index f7fcb1400c19..c7d992456ca7 100644 --- a/xen/arch/arm/domain_build.c +++ b/xen/arch/arm/domain_build.c @@ -10,7 +10,6 @@ 
#include #include #include -#include You are still using helpers defined by this header. So I would keep the include even if it may have been included by another one. #include #include #include @@ -51,12 +50,6 @@ static int __init parse_dom0_mem(const char *s) } custom_param("dom0_mem", parse_dom0_mem); -struct map_range_data -{ -struct domain *d; -p2m_type_t p2mt; -}; - /* Override macros from asm/page.h to make them work with mfn_t */ #undef virt_to_mfn #define virt_to_mfn(va) _mfn(__virt_to_mfn(va)) @@ -1676,10 +1669,10 @@ static int __init map_dt_irq_to_domain(const struct dt_device_node *dev, const struct dt_irq *dt_irq, void *data) { -struct domain *d = data; +
Re: [PATCH v6 3/7] xen/arm: setup MMIO range trap handlers for hardware domain
Hi, Julien! On 23.11.21 18:12, Julien Grall wrote: > > > On 23/11/2021 06:58, Oleksandr Andrushchenko wrote: >> Hi, Julien! > > Hi Oleksandr, > >> On 22.11.21 19:36, Julien Grall wrote: >>> On 18/11/2021 10:46, Oleksandr Andrushchenko wrote: On 18.11.21 09:27, Oleksandr Andrushchenko wrote: >>> + unsigned int count; >>> + >>> + if ( is_hardware_domain(d) ) >>> + /* For each PCI host bridge's configuration space. */ >>> + count = pci_host_get_num_bridges(); >> This first part makes sense to me. But... >> >>> + else >> ... I don't understand how the else is related to this commit. Can you >> clarify it? >> >>> + /* >>> + * There's a single MSI-X MMIO handler that deals with both PBA >>> + * and MSI-X tables per each PCI device being passed through. >>> + * Maximum number of supported devices is 32 as virtual bus >>> + * topology emulates the devices as embedded endpoints. >>> + * +1 for a single emulated host bridge's configuration space. >>> + */ >>> + count = 1; >>> +#ifdef CONFIG_HAS_PCI_MSI >>> + count += 32; >> Surely, this is a decision that is based on other factor in the vPCI >> code. So can use a define and avoid hardcoding the number? > Well, in the later series [1] this is defined via PCI_SLOT(~0) + 1 and > there is no dedicated > constant for that. I can use the same here, e.g. s/32/PCI_SLOT(~0) + 1 >>> >>> I would prefer if we introduce a new constant for that. This makes easier >>> to update the code if we decide to increase the number of virtual devices. >>> >>> However, I am still not sure how the 'else' part is related to this commit. >>> Can you please clarify it? >> Well, yes, this is too early for this patch to introduce some future >> knowledge, so I'll instead have: >> >> unsigned int domain_vpci_get_num_mmio_handlers(struct domain *d) >> { >> if ( !has_vpci(d) ) >> return 0; >> >> if ( is_hardware_domain(d) ) >> { >> int ret = pci_host_iterate_bridges_and_count(d, >> vpci_get_num_handlers_cb); >> >> return ret < 0 ? 
0 : ret; >> } >> >> /* >> * This is a guest domain: >> * >> * 1 for a single emulated host bridge's configuration space. >> */ >> return 1; > > I am afraid that my question stands even with this approach. This patch is > only meant to handle the hardware domain, therefore the change seems to be > out of context. > > I would prefer if this change is done separately. While I do agree that MSI part and virtual bus topology are not belonging to this patch I can't agree with the rest: we already have MMIO handlers for guest domains and we introduce domain_vpci_get_num_mmio_handlers which must also account on guests and stay consistent. So, despite the patch has "hardware domain" in its name it doesn't mean we should break guests here. Thus I do think the above is still correct wrt this patch. > > Cheers, > Thank you, Oleksandr
[PATCH for-4.16 v3] CHANGELOG: add missing entries for work during the 4.16 release cycle
Document some of the relevant changes during the 4.16 release cycle. Signed-off-by: Roger Pau Monné Release-Acked-by: Ian Jackson --- Changes since v2: - Reword the x86 page table API change. Changes since v1: - Add qemu-traditional and pvgrub notes. - Expand vPMU support to note it's limited. - xenstore library API changes. - xencall2L addition. - gitlab CI changes. - dom0less improvements. - vGIC fixes. - New Arm features. - OP-TEE fixes. - Arm64 heterogeneous CPU support. --- Cc: Alistair Francis Cc: Andrew Cooper Cc: Anthony PERARD Cc: Anthony Perard Cc: Bertrand Marquis Cc: Bob Eshleman Cc: Christian Lindig Cc: Christopher Clark Cc: Daniel De Graaf Cc: Dario Faggioli Cc: David Scott Cc: Doug Goldstein Cc: Elena Ufimtseva Cc: George Dunlap Cc: Ian Jackson Cc: Jan Beulich Cc: Josh Whitehead Cc: Juergen Gross Cc: Julien Grall Cc: Jun Nakajima Cc: Kevin Tian Cc: Konrad Rzeszutek Wilk Cc: Marek Marczykowski-Górecki Cc: Meng Xu Cc: Nick Rosbrook Cc: Paul Durrant Cc: Quan Xu Cc: Rahul Singh Cc: Roger Pau Monné Cc: Ross Lagerwall Cc: Samuel Thibault Cc: Shriram Rajagopalan Cc: Stefano Stabellini Cc: Stewart Hildebrand Cc: Tamas K Lengyel Cc: Tim Deegan Cc: Volodymyr Babchuk Cc: Wei Liu --- CHANGELOG.md | 26 ++ 1 file changed, 26 insertions(+) diff --git a/CHANGELOG.md b/CHANGELOG.md index ad1a8c2bc2..d312ddf627 100644 --- a/CHANGELOG.md +++ b/CHANGELOG.md @@ -21,6 +21,32 @@ The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/) - qemu-traditional based device models (both, qemu-traditional and ioemu-stubdom) will no longer be built per default. In order to be able to use those, configure needs to be called with "--enable-qemu-traditional" as parameter. + - Fixes for credit2 scheduler stability in corner case conditions. + - Ongoing improvements in the hypervisor build system. + - vtpmmgr miscellaneous fixes in preparation for TPM 2.0 support. + - 32bit PV guests only supported in shim mode. + - Improved PVH dom0 debug key handling. 
+ - Fix booting on some Intel systems without a PIT (i8254). + - Do not build qemu-traditional or pvgrub by default. + - Cleanup of the xenstore library interface. + - Fix truncation of return value from xencall2 by introducing a new helper + that returns a long instead. + - Fix system register accesses on Arm to use the proper 32/64bit access size. + - Various fixes for Arm OP-TEE mediator. + - Switch to domheap for Xen page tables. + +### Added + - 32bit Arm builds to the gitlab-ci automated tests. + - x86 full system tests to the gitlab-ci automated tests. + - Arm limited vPMU support for guests. + - Static physical memory allocation for dom0less on arm64. + - dom0less EFI support on arm64. + - GICD_ICPENDR register handling in vGIC emulation to support Zephyr OS. + - CPU feature leveling on arm64 platform with heterogeneous cores. + - Report unpopulated memory regions safe to use for external mappings, Arm and + device tree only. + - Support of generic DT IOMMU bindings for Arm SMMU v2. + - Limit grant table version on a per-domain basis. ## [4.15.0](https://xenbits.xen.org/gitweb/?p=xen.git;a=shortlog;h=RELEASE-4.15.0) - 2021-04-08 -- 2.33.0
Xen 4.16 RC4
Xen 4.16 RC4 is now available. It is available from git:

  git clone https://xenbits.xenproject.org/git-http/xen.git -b 4.16.0-rc4

For your convenience a tarball is available:

  https://downloads.xenproject.org/release/xen/4.16.0-rc4/xen-4.16.0-rc4.tar.gz
  https://downloads.xenproject.org/release/xen/4.16.0-rc4/xen-4.16.0-rc4.tar.gz.sig

This may be the last RC before release.

Ian.
Xen 4.16 Release Manager
Re: [PATCH v6 3/7] xen/arm: setup MMIO range trap handlers for hardware domain
On 23/11/2021 06:58, Oleksandr Andrushchenko wrote: Hi, Julien! Hi Oleksandr, On 22.11.21 19:36, Julien Grall wrote: On 18/11/2021 10:46, Oleksandr Andrushchenko wrote: On 18.11.21 09:27, Oleksandr Andrushchenko wrote: + unsigned int count; + + if ( is_hardware_domain(d) ) + /* For each PCI host bridge's configuration space. */ + count = pci_host_get_num_bridges(); This first part makes sense to me. But... + else ... I don't understand how the else is related to this commit. Can you clarify it? + /* + * There's a single MSI-X MMIO handler that deals with both PBA + * and MSI-X tables per each PCI device being passed through. + * Maximum number of supported devices is 32 as virtual bus + * topology emulates the devices as embedded endpoints. + * +1 for a single emulated host bridge's configuration space. + */ + count = 1; +#ifdef CONFIG_HAS_PCI_MSI + count += 32; Surely, this is a decision that is based on other factor in the vPCI code. So can use a define and avoid hardcoding the number? Well, in the later series [1] this is defined via PCI_SLOT(~0) + 1 and there is no dedicated constant for that. I can use the same here, e.g. s/32/PCI_SLOT(~0) + 1 I would prefer if we introduce a new constant for that. This makes easier to update the code if we decide to increase the number of virtual devices. However, I am still not sure how the 'else' part is related to this commit. Can you please clarify it? Well, yes, this is too early for this patch to introduce some future knowledge, so I'll instead have: unsigned int domain_vpci_get_num_mmio_handlers(struct domain *d) { if ( !has_vpci(d) ) return 0; if ( is_hardware_domain(d) ) { int ret = pci_host_iterate_bridges_and_count(d, vpci_get_num_handlers_cb); return ret < 0 ? 0 : ret; } /* * This is a guest domain: * * 1 for a single emulated host bridge's configuration space. */ return 1; I am afraid that my question stands even with this approach. 
This patch is only meant to handle the hardware domain, therefore the change seems to be out of context. I would prefer if this change is done separately. Cheers, -- Julien Grall
Re: [PATCH v6 2/7] xen/arm: add pci-domain for disabled devices
Hi Oleksandr, On 23/11/2021 06:31, Oleksandr Andrushchenko wrote: On 22.11.21 19:17, Julien Grall wrote: Hi, On 22/11/2021 16:23, Oleksandr Andrushchenko wrote: On 22.11.21 17:29, Julien Grall wrote: I would prefer to go with my way. This can be refined in the future if we find Device-Tree that matches what you wrote. Ok, so just to make it clear: >a PCI device would always be described as a child of the hostbridges. So I would rework the 'if' to also check if the parent type is not "pci" That's correct. Thank you! Ok, so how about if ( is_pci_passthrough_enabled() && dt_device_type_is_equal(node, "pci") ) { bool skip = false; /* * If the parent is also a "pci" device, then "linux,pci-domain" * should already be there, so nothing to do then. */ This comment is a bit confusing. I think what matter if the parent is a "pci" device, then the current node must not be a hostbridge. So we can skip it. However... if ( node->parent && dt_device_type_is_equal(node->parent, "pci") ) skip = true; if ( !skip && !dt_find_property(node, "linux,pci-domain", NULL) ) { I played with a single if and it looks scary... ... how about introducing an helper that will return true if this node is likely an hostbridge? Cheers, -- Julien Grall
Re: [PATCH v4 05/11] vpci/header: implement guest BAR register handlers
Hi, Roger! On 19.11.21 15:02, Jan Beulich wrote: > On 19.11.2021 13:54, Oleksandr Andrushchenko wrote: >> On 19.11.21 14:49, Jan Beulich wrote: >>> On 19.11.2021 13:46, Oleksandr Andrushchenko wrote: On 19.11.21 14:37, Jan Beulich wrote: > On 19.11.2021 13:10, Oleksandr Andrushchenko wrote: >> On 19.11.21 13:58, Jan Beulich wrote: >>> On 05.11.2021 07:56, Oleksandr Andrushchenko wrote: --- a/xen/drivers/vpci/header.c +++ b/xen/drivers/vpci/header.c @@ -408,6 +408,48 @@ static void bar_write(const struct pci_dev *pdev, unsigned int reg, pci_conf_write32(pdev->sbdf, reg, val); } +static void guest_bar_write(const struct pci_dev *pdev, unsigned int reg, +uint32_t val, void *data) +{ +struct vpci_bar *bar = data; +bool hi = false; + +if ( bar->type == VPCI_BAR_MEM64_HI ) +{ +ASSERT(reg > PCI_BASE_ADDRESS_0); +bar--; +hi = true; +} +else +{ +val &= PCI_BASE_ADDRESS_MEM_MASK; +val |= bar->type == VPCI_BAR_MEM32 ? PCI_BASE_ADDRESS_MEM_TYPE_32 + : PCI_BASE_ADDRESS_MEM_TYPE_64; +val |= bar->prefetchable ? PCI_BASE_ADDRESS_MEM_PREFETCH : 0; +} + +bar->guest_addr &= ~(0xull << (hi ? 32 : 0)); +bar->guest_addr |= (uint64_t)val << (hi ? 32 : 0); + +bar->guest_addr &= ~(bar->size - 1) | ~PCI_BASE_ADDRESS_MEM_MASK; +} + +static uint32_t guest_bar_read(const struct pci_dev *pdev, unsigned int reg, + void *data) +{ +const struct vpci_bar *bar = data; +bool hi = false; + +if ( bar->type == VPCI_BAR_MEM64_HI ) +{ +ASSERT(reg > PCI_BASE_ADDRESS_0); +bar--; +hi = true; +} + +return bar->guest_addr >> (hi ? 32 : 0); >>> I'm afraid "guest_addr" then isn't the best name; maybe "guest_val"? >>> This would make more obvious that there is a meaningful difference >>> from "addr" besides the guest vs host aspect. >> I am not sure I can agree here: >> bar->addr and bar->guest_addr make it clear what are these while >> bar->addr and bar->guest_val would make someone go look for >> additional information about what that val is for. > Feel free to replace "val" with something more suitable. 
"guest_bar" > maybe? The value definitely is not an address, so "addr" seems > inappropriate / misleading to me. This is a guest's view on the BAR's address. So to me it is still guest_addr >>> It's a guest's view on the BAR, not just the address. Or else you couldn't >>> simply return the value here without folding in the correct low bits. >> I agree with this this respect as it is indeed address + lower bits. >> How about guest_bar_val then? So it reflects its nature, e.g. the value >> of the BAR as seen by the guest. > Gets a little longish for my taste. I for one wouldn't mind it be just > "guest". In the end Roger has the final say here anyway. What is your preference on naming here? 1. guest_addr 2. guest_val 3. guest_bar_val 4. guest > > Jan > Thank you in advance, Oleksandr
Re: [RFC PATCH 0/2] Boot time cpupools
Hi Julien, > On 23 Nov 2021, at 13:54, Julien Grall wrote: > > Hi Stefano, > > On 19/11/2021 18:55, Stefano Stabellini wrote: >> On Fri, 19 Nov 2021, Julien Grall wrote: >>> I like better Juergen's version. But before agreeing on the command line , I >>> would like to understand how Linux is dealing with big.LITTLE today (see my >>> question above). >> I also like Juergen's version better :-) >> The device tree binding that covers big.LITTLE is the same that covers >> cpus: Documentation/devicetree/bindings/arm/cpus.yaml > > Are you sure? I found Documentation/devicetree/bindings/arm/cpu-capacity.txt. > >> So basically, you can tell it is a big.LITTLE system because the >> compatible strings differ between certain cpus, e.g.: >> cpu@0 { >> device_type = "cpu"; >> compatible = "arm,cortex-a15"; >> reg = <0x0>; >> }; >> cpu@1 { >> device_type = "cpu"; >> compatible = "arm,cortex-a15"; >> reg = <0x1>; >> }; >> cpu@100 { >> device_type = "cpu"; >> compatible = "arm,cortex-a7"; >> reg = <0x100>; >> }; >> cpu@101 { >> device_type = "cpu"; >> compatible = "arm,cortex-a7"; >> reg = <0x101>; >> }; > > I have not found any code in Linux using the compatible. Instead, they all > seem to use the cpu-map (see drivers/base/arch_topology.c). > > Anyway, to me it would be better to parse the Device-Tree over using the > MIDR. The advantage is we can cover a wide range of cases (you may have > processor with the same MIDR but different frequency). For now, we could > create a cpupool per cluster. This is a really good idea as this could also work for NUMA systems. So reg & ~0xff would give the cluster number ? We will probably end up splitting cores into different cpupools even if they are all the same as there are several CPUs having several clusters but the same cores (I recall some NXP boards being like that). 
Some device tree also have a cpu-map definition: cpu-map { cluster0 { core0 { cpu = <_0>; }; core1 { cpu = <_1>; }; }; cluster1 { core0 { cpu = <_0>; }; core1 { cpu = <_1>; }; core2 { cpu = <_2>; }; core3 { cpu = <_3>; }; }; }; @stefano: is the cpu-map something standard ? Lots of device trees do have it in Linux now but I do not recall seeing that on older device trees. Maybe using cpu-map could be a solution, we could say that device tree without a cpu-map are not supported. Cheers Bertrand > > Cheers, > > -- > Julien Grall
[PATCH V2 5/6] net: netvsc: Add Isolation VM support for netvsc driver
From: Tianyu Lan In Isolation VM, all shared memory with host needs to mark visible to host via hvcall. vmbus_establish_gpadl() has already done it for netvsc rx/tx ring buffer. The page buffer used by vmbus_sendpacket_ pagebuffer() stills need to be handled. Use DMA API to map/umap these memory during sending/receiving packet and Hyper-V swiotlb bounce buffer dma address will be returned. The swiotlb bounce buffer has been masked to be visible to host during boot up. Allocate rx/tx ring buffer via dma_alloc_noncontiguous() in Isolation VM. After calling vmbus_establish_gpadl() which marks these pages visible to host, map these pages unencrypted addes space via dma_vmap_noncontiguous(). Signed-off-by: Tianyu Lan --- drivers/net/hyperv/hyperv_net.h | 5 + drivers/net/hyperv/netvsc.c | 192 +++--- drivers/net/hyperv/rndis_filter.c | 2 + include/linux/hyperv.h| 6 + 4 files changed, 190 insertions(+), 15 deletions(-) diff --git a/drivers/net/hyperv/hyperv_net.h b/drivers/net/hyperv/hyperv_net.h index 315278a7cf88..31c77a00d01e 100644 --- a/drivers/net/hyperv/hyperv_net.h +++ b/drivers/net/hyperv/hyperv_net.h @@ -164,6 +164,7 @@ struct hv_netvsc_packet { u32 total_bytes; u32 send_buf_index; u32 total_data_buflen; + struct hv_dma_range *dma_range; }; #define NETVSC_HASH_KEYLEN 40 @@ -1074,6 +1075,7 @@ struct netvsc_device { /* Receive buffer allocated by us but manages by NetVSP */ void *recv_buf; + struct sg_table *recv_sgt; u32 recv_buf_size; /* allocated bytes */ struct vmbus_gpadl recv_buf_gpadl_handle; u32 recv_section_cnt; @@ -1082,6 +1084,7 @@ struct netvsc_device { /* Send buffer allocated by us */ void *send_buf; + struct sg_table *send_sgt; u32 send_buf_size; struct vmbus_gpadl send_buf_gpadl_handle; u32 send_section_cnt; @@ -1731,4 +1734,6 @@ struct rndis_message { #define RETRY_US_HI1 #define RETRY_MAX 2000/* >10 sec */ +void netvsc_dma_unmap(struct hv_device *hv_dev, + struct hv_netvsc_packet *packet); #endif /* _HYPERV_NET_H */ diff --git 
a/drivers/net/hyperv/netvsc.c b/drivers/net/hyperv/netvsc.c index 396bc1c204e6..9cdc71930830 100644 --- a/drivers/net/hyperv/netvsc.c +++ b/drivers/net/hyperv/netvsc.c @@ -20,6 +20,7 @@ #include #include #include +#include #include #include @@ -146,15 +147,39 @@ static struct netvsc_device *alloc_net_device(void) return net_device; } +static struct hv_device *netvsc_channel_to_device(struct vmbus_channel *channel) +{ + struct vmbus_channel *primary = channel->primary_channel; + + return primary ? primary->device_obj : channel->device_obj; +} + static void free_netvsc_device(struct rcu_head *head) { struct netvsc_device *nvdev = container_of(head, struct netvsc_device, rcu); + struct hv_device *dev = + netvsc_channel_to_device(nvdev->chan_table[0].channel); int i; kfree(nvdev->extension); - vfree(nvdev->recv_buf); - vfree(nvdev->send_buf); + + if (nvdev->recv_sgt) { + dma_vunmap_noncontiguous(>device, nvdev->recv_buf); + dma_free_noncontiguous(>device, nvdev->recv_buf_size, + nvdev->recv_sgt, DMA_FROM_DEVICE); + } else { + vfree(nvdev->recv_buf); + } + + if (nvdev->send_sgt) { + dma_vunmap_noncontiguous(>device, nvdev->send_buf); + dma_free_noncontiguous(>device, nvdev->send_buf_size, + nvdev->send_sgt, DMA_TO_DEVICE); + } else { + vfree(nvdev->send_buf); + } + kfree(nvdev->send_section_map); for (i = 0; i < VRSS_CHANNEL_MAX; i++) { @@ -348,7 +373,21 @@ static int netvsc_init_buf(struct hv_device *device, buf_size = min_t(unsigned int, buf_size, NETVSC_RECEIVE_BUFFER_SIZE_LEGACY); - net_device->recv_buf = vzalloc(buf_size); + if (hv_isolation_type_snp()) { + net_device->recv_sgt = + dma_alloc_noncontiguous(>device, buf_size, + DMA_FROM_DEVICE, GFP_KERNEL, 0); + if (!net_device->recv_sgt) { + pr_err("Fail to allocate recv buffer buf_size %d.\n.", buf_size); + ret = -ENOMEM; + goto cleanup; + } + + net_device->recv_buf = (void *)net_device->recv_sgt->sgl->dma_address; + } else { + net_device->recv_buf = vzalloc(buf_size); + } + if (!net_device->recv_buf) { 
netdev_err(ndev, "unable to allocate receive buffer of size %u\n", @@ -357,8 +396,6 @@ static int netvsc_init_buf(struct hv_device *device,
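As a side note on the arithmetic this netvsc patch relies on: once a page buffer has been mapped through the DMA API, the returned bounce-buffer address has to be split back into a Hyper-V page frame number plus an in-page offset before being handed to the host. The sketch below is a hypothetical userspace model of just that computation — `split_dma_addr` and `hv_page_buffer_model` are invented names for illustration; only the shift/mask logic reflects the 4 KiB Hyper-V page layout described in the commit message.

```c
#include <assert.h>
#include <stdint.h>

#define HV_HYP_PAGE_SHIFT 12
#define HV_HYP_PAGE_SIZE  (1UL << HV_HYP_PAGE_SHIFT)

/* Hypothetical model: split a bounce-buffer dma address into the
 * pfn/offset pair that a Hyper-V page buffer descriptor carries. */
struct hv_page_buffer_model {
	uint64_t pfn;    /* Hyper-V page frame number */
	uint32_t offset; /* offset within that 4 KiB page */
};

static struct hv_page_buffer_model split_dma_addr(uint64_t dma)
{
	struct hv_page_buffer_model pb = {
		.pfn    = dma >> HV_HYP_PAGE_SHIFT,
		.offset = (uint32_t)(dma & (HV_HYP_PAGE_SIZE - 1)),
	};
	return pb;
}
```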
[PATCH V2 6/6] scsi: storvsc: Add Isolation VM support for storvsc driver
From: Tianyu Lan In Isolation VM, all memory shared with the host needs to be marked visible to the host via hvcall. vmbus_establish_gpadl() has already done this for the storvsc rx/tx ring buffers. The page buffers used by vmbus_sendpacket_mpb_desc() still need to be handled. Use the DMA API (scsi_dma_map/unmap) to map this memory when sending/receiving packets and return the swiotlb bounce buffer dma address. In Isolation VM, the swiotlb bounce buffer is marked visible to the host and swiotlb force mode is enabled. Set the device's dma min align mask to HV_HYP_PAGE_SIZE - 1 in order to keep the original data offset in the bounce buffer. Signed-off-by: Tianyu Lan --- drivers/scsi/storvsc_drv.c | 37 + include/linux/hyperv.h | 1 + 2 files changed, 22 insertions(+), 16 deletions(-) diff --git a/drivers/scsi/storvsc_drv.c b/drivers/scsi/storvsc_drv.c index 20595c0ba0ae..ae293600d799 100644 --- a/drivers/scsi/storvsc_drv.c +++ b/drivers/scsi/storvsc_drv.c @@ -21,6 +21,8 @@ #include #include #include +#include + #include #include #include @@ -1336,6 +1338,7 @@ static void storvsc_on_channel_callback(void *context) continue; } request = (struct storvsc_cmd_request *)scsi_cmd_priv(scmnd); + scsi_dma_unmap(scmnd); } storvsc_on_receive(stor_device, packet, request); @@ -1749,7 +1752,6 @@ static int storvsc_queuecommand(struct Scsi_Host *host, struct scsi_cmnd *scmnd) struct hv_host_device *host_dev = shost_priv(host); struct hv_device *dev = host_dev->dev; struct storvsc_cmd_request *cmd_request = scsi_cmd_priv(scmnd); - int i; struct scatterlist *sgl; unsigned int sg_count; struct vmscsi_request *vm_srb; @@ -1831,10 +1833,11 @@ static int storvsc_queuecommand(struct Scsi_Host *host, struct scsi_cmnd *scmnd) payload_sz = sizeof(cmd_request->mpb); if (sg_count) { - unsigned int hvpgoff, hvpfns_to_add; unsigned long offset_in_hvpg = offset_in_hvpage(sgl->offset); unsigned int hvpg_count = HVPFN_UP(offset_in_hvpg + length); - u64 hvpfn; + struct scatterlist *sg; + unsigned long hvpfn, hvpfns_to_add; +
int j, i = 0; if (hvpg_count > MAX_PAGE_BUFFER_COUNT) { @@ -1848,21 +1851,22 @@ static int storvsc_queuecommand(struct Scsi_Host *host, struct scsi_cmnd *scmnd) payload->range.len = length; payload->range.offset = offset_in_hvpg; + sg_count = scsi_dma_map(scmnd); + if (sg_count < 0) + return SCSI_MLQUEUE_DEVICE_BUSY; - for (i = 0; sgl != NULL; sgl = sg_next(sgl)) { + for_each_sg(sgl, sg, sg_count, j) { /* -* Init values for the current sgl entry. hvpgoff -* and hvpfns_to_add are in units of Hyper-V size -* pages. Handling the PAGE_SIZE != HV_HYP_PAGE_SIZE -* case also handles values of sgl->offset that are -* larger than PAGE_SIZE. Such offsets are handled -* even on other than the first sgl entry, provided -* they are a multiple of PAGE_SIZE. +* Init values for the current sgl entry. hvpfns_to_add +* is in units of Hyper-V size pages. Handling the +* PAGE_SIZE != HV_HYP_PAGE_SIZE case also handles +* values of sgl->offset that are larger than PAGE_SIZE. +* Such offsets are handled even on other than the first +* sgl entry, provided they are a multiple of PAGE_SIZE. */ - hvpgoff = HVPFN_DOWN(sgl->offset); - hvpfn = page_to_hvpfn(sg_page(sgl)) + hvpgoff; - hvpfns_to_add = HVPFN_UP(sgl->offset + sgl->length) - - hvpgoff; + hvpfn = HVPFN_DOWN(sg_dma_address(sg)); + hvpfns_to_add = HVPFN_UP(sg_dma_address(sg) + +sg_dma_len(sg)) - hvpfn; /* * Fill the next portion of the PFN array with @@ -1872,7 +1876,7 @@ static int storvsc_queuecommand(struct Scsi_Host *host, struct scsi_cmnd *scmnd) * the PFN array is filled. */ while (hvpfns_to_add--) - payload->range.pfn_array[i++] = hvpfn++; + payload->range.pfn_array[i++] = hvpfn++; } } @@ -2016,6
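The rewritten for_each_sg() loop above converts each dma-mapped scatterlist entry into a run of Hyper-V-sized page frame numbers via HVPFN_DOWN()/HVPFN_UP(). A minimal userspace model of one loop iteration follows — the helper name `fill_hvpfns` is hypothetical, and the macros assume Hyper-V's 4 KiB page size; the rounding logic mirrors the patch:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define HV_HYP_PAGE_SHIFT 12
#define HVPFN_DOWN(x) ((x) >> HV_HYP_PAGE_SHIFT)
#define HVPFN_UP(x)   (((x) + (1ULL << HV_HYP_PAGE_SHIFT) - 1) >> HV_HYP_PAGE_SHIFT)

/* Hypothetical stand-in for the loop body in storvsc_queuecommand():
 * expand one dma-mapped sg entry into Hyper-V page frame numbers,
 * returning the updated index into the pfn array. */
static size_t fill_hvpfns(uint64_t dma_addr, uint32_t dma_len,
			  uint64_t *pfn_array, size_t i)
{
	uint64_t hvpfn = HVPFN_DOWN(dma_addr);
	uint64_t hvpfns_to_add = HVPFN_UP(dma_addr + dma_len) - hvpfn;

	while (hvpfns_to_add--)
		pfn_array[i++] = hvpfn++;
	return i;
}
```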
[PATCH V2 4/6] hyperv/IOMMU: Enable swiotlb bounce buffer for Isolation VM
From: Tianyu Lan Hyper-V Isolation VM requires bounce buffer support to copy data from/to encrypted memory, so enable swiotlb force mode to use the swiotlb bounce buffer for DMA transactions. In Isolation VM with AMD SEV, the bounce buffer needs to be accessed via an extra address space which is above shared_gpa_boundary (e.g. the 39-bit address line) reported by the Hyper-V CPUID ISOLATION_CONFIG leaf. The access physical address will be the original physical address + shared_gpa_boundary. The shared_gpa_boundary in the AMD SEV-SNP spec is called virtual top of memory (vTOM). Memory addresses below vTOM are automatically treated as private while memory above vTOM is treated as shared. Hyper-V initializes the swiotlb bounce buffer, and the default swiotlb needs to be disabled: pci_swiotlb_detect_override() and pci_swiotlb_detect_4gb() would enable the default one. To override the setting, hyperv_swiotlb_detect() needs to run before these detect functions; make pci_xen_swiotlb_init() depend on hyperv_swiotlb_detect() to keep the order. The swiotlb bounce buffer code calls set_memory_decrypted() to mark the bounce buffer visible to the host and maps it into the extra address space via memremap(). Populate the shared_gpa_boundary (vTOM) via the swiotlb_unencrypted_base variable. memremap() can't work at the early stage in which hyperv_iommu_swiotlb_init() runs, so call swiotlb_update_mem_attributes() in hyperv_iommu_swiotlb_later_init(). Add Hyper-V dma ops and provide alloc/free and vmap/vunmap noncontiguous callbacks to handle requests for allocating and mapping noncontiguous dma memory in vmbus device drivers. The netvsc driver will use this. Set the dma_ops_bypass flag for hv devices to use dma direct functions when mapping/unmapping dma pages.
Signed-off-by: Tianyu Lan --- Change since v1: * Remove hv isolation check in the sev_setup_arch() arch/x86/mm/mem_encrypt.c | 1 + arch/x86/xen/pci-swiotlb-xen.c | 3 +- drivers/hv/Kconfig | 1 + drivers/hv/vmbus_drv.c | 6 ++ drivers/iommu/hyperv-iommu.c | 164 + include/linux/hyperv.h | 10 ++ 6 files changed, 184 insertions(+), 1 deletion(-) diff --git a/arch/x86/mm/mem_encrypt.c b/arch/x86/mm/mem_encrypt.c index 35487305d8af..e48c73b3dd41 100644 --- a/arch/x86/mm/mem_encrypt.c +++ b/arch/x86/mm/mem_encrypt.c @@ -31,6 +31,7 @@ #include #include #include +#include #include "mm_internal.h" diff --git a/arch/x86/xen/pci-swiotlb-xen.c b/arch/x86/xen/pci-swiotlb-xen.c index 46df59aeaa06..30fd0600b008 100644 --- a/arch/x86/xen/pci-swiotlb-xen.c +++ b/arch/x86/xen/pci-swiotlb-xen.c @@ -4,6 +4,7 @@ #include #include +#include #include #include @@ -91,6 +92,6 @@ int pci_xen_swiotlb_init_late(void) EXPORT_SYMBOL_GPL(pci_xen_swiotlb_init_late); IOMMU_INIT_FINISH(pci_xen_swiotlb_detect, - NULL, + hyperv_swiotlb_detect, pci_xen_swiotlb_init, NULL); diff --git a/drivers/hv/Kconfig b/drivers/hv/Kconfig index dd12af20e467..d43b4cd88f57 100644 --- a/drivers/hv/Kconfig +++ b/drivers/hv/Kconfig @@ -9,6 +9,7 @@ config HYPERV select PARAVIRT select X86_HV_CALLBACK_VECTOR if X86 select VMAP_PFN + select DMA_OPS_BYPASS help Select this option to run Linux as a Hyper-V client operating system. 
diff --git a/drivers/hv/vmbus_drv.c b/drivers/hv/vmbus_drv.c index 392c1ac4f819..32dc193e31cd 100644 --- a/drivers/hv/vmbus_drv.c +++ b/drivers/hv/vmbus_drv.c @@ -33,6 +33,7 @@ #include #include #include +#include #include #include "hyperv_vmbus.h" @@ -2078,6 +2079,7 @@ struct hv_device *vmbus_device_create(const guid_t *type, return child_device_obj; } +static u64 vmbus_dma_mask = DMA_BIT_MASK(64); /* * vmbus_device_register - Register the child device */ @@ -2118,6 +2120,10 @@ int vmbus_device_register(struct hv_device *child_device_obj) } hv_debug_add_dev_dir(child_device_obj); + child_device_obj->device.dma_ops_bypass = true; + child_device_obj->device.dma_ops = &hyperv_iommu_dma_ops; + child_device_obj->device.dma_mask = &vmbus_dma_mask; + child_device_obj->device.dma_parms = &child_device_obj->dma_parms; return 0; err_kset_unregister: diff --git a/drivers/iommu/hyperv-iommu.c b/drivers/iommu/hyperv-iommu.c index e285a220c913..ebcb628e7e8f 100644 --- a/drivers/iommu/hyperv-iommu.c +++ b/drivers/iommu/hyperv-iommu.c @@ -13,14 +13,21 @@ #include #include #include +#include +#include #include #include #include #include +#include +#include #include #include #include +#include +#include +#include #include "irq_remapping.h" @@ -337,4 +344,161 @@ static const struct irq_domain_ops hyperv_root_ir_domain_ops = { .free = hyperv_root_irq_remapping_free, }; +static void __init hyperv_iommu_swiotlb_init(void) +{ + unsigned long hyperv_io_tlb_size; +
[PATCH V2 3/6] x86/hyper-v: Add hyperv Isolation VM check in the cc_platform_has()
From: Tianyu Lan Hyper-V provides Isolation VM which has memory encrypt support. Add hyperv_cc_platform_has() and return true for check of GUEST_MEM_ENCRYPT attribute. Signed-off-by: Tianyu Lan --- arch/x86/kernel/cc_platform.c | 15 +++ 1 file changed, 15 insertions(+) diff --git a/arch/x86/kernel/cc_platform.c b/arch/x86/kernel/cc_platform.c index 03bb2f343ddb..f3bb0431f5c5 100644 --- a/arch/x86/kernel/cc_platform.c +++ b/arch/x86/kernel/cc_platform.c @@ -11,6 +11,7 @@ #include #include +#include #include static bool __maybe_unused intel_cc_platform_has(enum cc_attr attr) @@ -58,9 +59,23 @@ static bool amd_cc_platform_has(enum cc_attr attr) #endif } +static bool hyperv_cc_platform_has(enum cc_attr attr) +{ +#ifdef CONFIG_HYPERV + if (attr == CC_ATTR_GUEST_MEM_ENCRYPT) + return true; + else + return false; +#else + return false; +#endif +} bool cc_platform_has(enum cc_attr attr) { + if (hv_is_isolation_supported()) + return hyperv_cc_platform_has(attr); + if (sme_me_mask) return amd_cc_platform_has(attr); -- 2.25.1
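The dispatch order added to cc_platform_has() matters: the Hyper-V isolation check runs before the sme_me_mask test, so an Isolation VM reports CC_ATTR_GUEST_MEM_ENCRYPT even though no SME/SEV state is exposed to the guest. A hypothetical userspace model of that ordering — platform predicates are passed in as booleans here instead of being probed from the hardware:

```c
#include <assert.h>
#include <stdbool.h>

/* Userspace model of the cc_platform_has() dispatch this patch sets up.
 * The enum and helper are invented names; only the branch order mirrors
 * the patch. */
enum cc_attr_model { CC_GUEST_MEM_ENCRYPT, CC_OTHER };

static bool cc_platform_has_model(bool hv_isolation, bool sme_me_mask,
				  enum cc_attr_model attr)
{
	if (hv_isolation)
		/* hyperv_cc_platform_has(): only GUEST_MEM_ENCRYPT is set */
		return attr == CC_GUEST_MEM_ENCRYPT;
	if (sme_me_mask)
		return true; /* placeholder for amd_cc_platform_has(attr) */
	return false;
}
```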
[PATCH V2 1/6] Swiotlb: Add Swiotlb bounce buffer remap function for HV IVM
From: Tianyu Lan In Isolation VM with AMD SEV, bounce buffer needs to be accessed via extra address space which is above shared_gpa_boundary (E.G 39 bit address line) reported by Hyper-V CPUID ISOLATION_CONFIG. The access physical address will be original physical address + shared_gpa_boundary. The shared_gpa_boundary in the AMD SEV SNP spec is called virtual top of memory(vTOM). Memory addresses below vTOM are automatically treated as private while memory above vTOM is treated as shared. Expose swiotlb_unencrypted_base for platforms to set unencrypted memory base offset and platform calls swiotlb_update_mem_attributes() to remap swiotlb mem to unencrypted address space. memremap() can not be called in the early stage and so put remapping code into swiotlb_update_mem_attributes(). Store remap address and use it to copy data from/to swiotlb bounce buffer. Signed-off-by: Tianyu Lan --- Change since v1: * Rework comment in the swiotlb_init_io_tlb_mem() * Make swiotlb_init_io_tlb_mem() back to return void. --- include/linux/swiotlb.h | 6 + kernel/dma/swiotlb.c| 53 + 2 files changed, 54 insertions(+), 5 deletions(-) diff --git a/include/linux/swiotlb.h b/include/linux/swiotlb.h index 569272871375..f6c3638255d5 100644 --- a/include/linux/swiotlb.h +++ b/include/linux/swiotlb.h @@ -73,6 +73,9 @@ extern enum swiotlb_force swiotlb_force; * @end: The end address of the swiotlb memory pool. Used to do a quick * range check to see if the memory was in fact allocated by this * API. + * @vaddr: The vaddr of the swiotlb memory pool. The swiotlb memory pool + * may be remapped in the memory encrypted case and store virtual + * address for bounce buffer operation. * @nslabs:The number of IO TLB blocks (in groups of 64) between @start and * @end. For default swiotlb, this is command line adjustable via * setup_io_tlb_npages. 
@@ -92,6 +95,7 @@ extern enum swiotlb_force swiotlb_force; struct io_tlb_mem { phys_addr_t start; phys_addr_t end; + void *vaddr; unsigned long nslabs; unsigned long used; unsigned int index; @@ -186,4 +190,6 @@ static inline bool is_swiotlb_for_alloc(struct device *dev) } #endif /* CONFIG_DMA_RESTRICTED_POOL */ +extern phys_addr_t swiotlb_unencrypted_base; + #endif /* __LINUX_SWIOTLB_H */ diff --git a/kernel/dma/swiotlb.c b/kernel/dma/swiotlb.c index 8e840fbbed7c..c303fdeba82f 100644 --- a/kernel/dma/swiotlb.c +++ b/kernel/dma/swiotlb.c @@ -50,6 +50,7 @@ #include #include +#include #include #include #include @@ -72,6 +73,8 @@ enum swiotlb_force swiotlb_force; struct io_tlb_mem io_tlb_default_mem; +phys_addr_t swiotlb_unencrypted_base; + /* * Max segment that we can provide which (if pages are contingous) will * not be bounced (unless SWIOTLB_FORCE is set). @@ -155,6 +158,31 @@ static inline unsigned long nr_slots(u64 val) return DIV_ROUND_UP(val, IO_TLB_SIZE); } +/* + * Remap swioltb memory in the unencrypted physical address space + * when swiotlb_unencrypted_base is set. (e.g. for Hyper-V AMD SEV-SNP + * Isolation VMs). + */ +void *swiotlb_mem_remap(struct io_tlb_mem *mem, unsigned long bytes) +{ + void *vaddr; + + if (swiotlb_unencrypted_base) { + phys_addr_t paddr = mem->start + swiotlb_unencrypted_base; + + vaddr = memremap(paddr, bytes, MEMREMAP_WB); + if (!vaddr) { + pr_err("Failed to map the unencrypted memory %llx size %lx.\n", + paddr, bytes); + return NULL; + } + + return vaddr; + } + + return phys_to_virt(mem->start); +} + /* * Early SWIOTLB allocation may be too early to allow an architecture to * perform the desired operations. 
This function allows the architecture to @@ -172,7 +200,14 @@ void __init swiotlb_update_mem_attributes(void) vaddr = phys_to_virt(mem->start); bytes = PAGE_ALIGN(mem->nslabs << IO_TLB_SHIFT); set_memory_decrypted((unsigned long)vaddr, bytes >> PAGE_SHIFT); - memset(vaddr, 0, bytes); + + mem->vaddr = swiotlb_mem_remap(mem, bytes); + if (!mem->vaddr) { + pr_err("Fail to remap swiotlb mem.\n"); + return; + } + + memset(mem->vaddr, 0, bytes); } static void swiotlb_init_io_tlb_mem(struct io_tlb_mem *mem, phys_addr_t start, @@ -196,7 +231,18 @@ static void swiotlb_init_io_tlb_mem(struct io_tlb_mem *mem, phys_addr_t start, mem->slots[i].orig_addr = INVALID_PHYS_ADDR; mem->slots[i].alloc_size = 0; } + + /* +* If swiotlb_unencrypted_base is set, the bounce buffer memory will +* be remapped and cleared in swiotlb_update_mem_attributes. +*/ + if (swiotlb_unencrypted_base) +
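The core of swiotlb_mem_remap() is a single address computation: when swiotlb_unencrypted_base (vTOM) is non-zero, the pool must be accessed at its physical start plus that boundary; otherwise the normal direct mapping of mem->start is used. A sketch of just that computation, with illustrative numbers assuming the 39-bit boundary mentioned in the commit message (the helper name is made up):

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model of the paddr selection in swiotlb_mem_remap():
 * shift the pool above the shared-GPA boundary only when one is set. */
static uint64_t swiotlb_access_paddr(uint64_t mem_start,
				     uint64_t unencrypted_base)
{
	return unencrypted_base ? mem_start + unencrypted_base : mem_start;
}
```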
[PATCH V2 2/6] dma-mapping: Add vmap/vunmap_noncontiguous() callback in dma ops
From: Tianyu Lan Hyper-V netvsc driver needs to allocate noncontiguous DMA memory and remap it into unencrypted address space before sharing with host. Add vmap/vunmap_noncontiguous() callback and handle the remap in the Hyper-V dma ops callback. Signed-off-by: Tianyu Lan --- include/linux/dma-map-ops.h | 3 +++ kernel/dma/mapping.c| 18 ++ 2 files changed, 17 insertions(+), 4 deletions(-) diff --git a/include/linux/dma-map-ops.h b/include/linux/dma-map-ops.h index 0d5b06b3a4a6..f7b9958ca20a 100644 --- a/include/linux/dma-map-ops.h +++ b/include/linux/dma-map-ops.h @@ -27,6 +27,9 @@ struct dma_map_ops { unsigned long attrs); void (*free_noncontiguous)(struct device *dev, size_t size, struct sg_table *sgt, enum dma_data_direction dir); + void *(*vmap_noncontiguous)(struct device *dev, size_t size, + struct sg_table *sgt); + void (*vunmap_noncontiguous)(struct device *dev, void *addr); int (*mmap)(struct device *, struct vm_area_struct *, void *, dma_addr_t, size_t, unsigned long attrs); diff --git a/kernel/dma/mapping.c b/kernel/dma/mapping.c index 9478eccd1c8e..7fd751d866cc 100644 --- a/kernel/dma/mapping.c +++ b/kernel/dma/mapping.c @@ -674,8 +674,14 @@ void *dma_vmap_noncontiguous(struct device *dev, size_t size, const struct dma_map_ops *ops = get_dma_ops(dev); unsigned long count = PAGE_ALIGN(size) >> PAGE_SHIFT; - if (ops && ops->alloc_noncontiguous) - return vmap(sgt_handle(sgt)->pages, count, VM_MAP, PAGE_KERNEL); + if (ops) { + if (ops->vmap_noncontiguous) + return ops->vmap_noncontiguous(dev, size, sgt); + else if (ops->alloc_noncontiguous) + return vmap(sgt_handle(sgt)->pages, count, VM_MAP, + PAGE_KERNEL); + } + return page_address(sg_page(sgt->sgl)); } EXPORT_SYMBOL_GPL(dma_vmap_noncontiguous); @@ -684,8 +690,12 @@ void dma_vunmap_noncontiguous(struct device *dev, void *vaddr) { const struct dma_map_ops *ops = get_dma_ops(dev); - if (ops && ops->alloc_noncontiguous) - vunmap(vaddr); + if (ops) { + if (ops->vunmap_noncontiguous) + 
ops->vunmap_noncontiguous(dev, vaddr); + else if (ops->alloc_noncontiguous) + vunmap(vaddr); + } } EXPORT_SYMBOL_GPL(dma_vunmap_noncontiguous); -- 2.25.1
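The change to dma_vmap_noncontiguous()/dma_vunmap_noncontiguous() introduces a three-way fallback: the new per-ops vmap callback, then the generic vmap() path for ops that only implement alloc_noncontiguous, then the direct-mapping page_address() case. A hypothetical userspace model of that selection logic (struct and enum names are invented; only the branch order mirrors the patch):

```c
#include <assert.h>
#include <stddef.h>

enum vmap_path { VIA_OPS_VMAP, VIA_GENERIC_VMAP, VIA_PAGE_ADDRESS };

/* Model of a dma_map_ops with only the presence bits we care about. */
struct dma_ops_model {
	int has_vmap_noncontiguous;
	int has_alloc_noncontiguous;
};

static enum vmap_path pick_vmap_path(const struct dma_ops_model *ops)
{
	if (ops) {
		if (ops->has_vmap_noncontiguous)
			return VIA_OPS_VMAP;      /* ops->vmap_noncontiguous() */
		if (ops->has_alloc_noncontiguous)
			return VIA_GENERIC_VMAP;  /* vmap(sgt_handle()->pages) */
	}
	return VIA_PAGE_ADDRESS;                  /* dma-direct case */
}
```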
[PATCH V2 0/6] x86/Hyper-V: Add Hyper-V Isolation VM support(Second part)
From: Tianyu Lan Hyper-V provides two kinds of Isolation VMs: VBS (Virtualization-Based Security) and AMD SEV-SNP unenlightened Isolation VMs. This patchset adds support for these Isolation VMs in Linux. The memory of these VMs is encrypted and the host can't access guest memory directly. Hyper-V provides a new host-visibility hvcall, and the guest needs to call it to mark memory visible to the host before sharing memory with the host. For security, network/storage stack memory should not be shared with the host, and so bounce buffers are required. The vmbus channel ring buffer already plays the bounce buffer role, because all data from/to the host is copied between the ring buffer and IO stack memory; so mark the vmbus channel ring buffer visible. For an SNP Isolation VM, the guest needs to access the shared memory via an extra address space which is specified by the Hyper-V CPUID HYPERV_CPUID_ISOLATION_CONFIG leaf. The access physical address of the shared memory should be the bounce buffer memory GPA plus the shared_gpa_boundary reported by CPUID. This patchset enables the swiotlb bounce buffer for netvsc/storvsc in Isolation VM, adds Hyper-V dma ops providing the dma_alloc/free_noncontiguous and vmap/vunmap_noncontiguous callbacks, allocates the rx/tx rings via dma_alloc_noncontiguous(), and maps them into the extra address space via dma_vmap_noncontiguous(). Change since v1: * Add Hyper-V Isolation support check in the cc_platform_has() and return true for guest memory encrypt attr.
* Remove hv isolation check in the sev_setup_arch() Tianyu Lan (6): Swiotlb: Add Swiotlb bounce buffer remap function for HV IVM dma-mapping: Add vmap/vunmap_noncontiguous() callback in dma ops x86/hyper-v: Add hyperv Isolation VM check in the cc_platform_has() hyperv/IOMMU: Enable swiotlb bounce buffer for Isolation VM net: netvsc: Add Isolation VM support for netvsc driver scsi: storvsc: Add Isolation VM support for storvsc driver arch/x86/kernel/cc_platform.c | 15 +++ arch/x86/mm/mem_encrypt.c | 1 + arch/x86/xen/pci-swiotlb-xen.c| 3 +- drivers/hv/Kconfig| 1 + drivers/hv/vmbus_drv.c| 6 + drivers/iommu/hyperv-iommu.c | 164 + drivers/net/hyperv/hyperv_net.h | 5 + drivers/net/hyperv/netvsc.c | 192 +++--- drivers/net/hyperv/rndis_filter.c | 2 + drivers/scsi/storvsc_drv.c| 37 +++--- include/linux/dma-map-ops.h | 3 + include/linux/hyperv.h| 17 +++ include/linux/swiotlb.h | 6 + kernel/dma/mapping.c | 18 ++- kernel/dma/swiotlb.c | 53 - 15 files changed, 482 insertions(+), 41 deletions(-) -- 2.25.1
Re: [RFC PATCH 0/2] Boot time cpupools
Hi Stefano, On 19/11/2021 18:55, Stefano Stabellini wrote: On Fri, 19 Nov 2021, Julien Grall wrote: I like better Juergen's version. But before agreeing on the command line , I would like to understand how Linux is dealing with big.LITTLE today (see my question above). I also like Juergen's version better :-) The device tree binding that covers big.LITTLE is the same that covers cpus: Documentation/devicetree/bindings/arm/cpus.yaml Are you sure? I found Documentation/devicetree/bindings/arm/cpu-capacity.txt. So basically, you can tell it is a big.LITTLE system because the compatible strings differ between certain cpus, e.g.: cpu@0 { device_type = "cpu"; compatible = "arm,cortex-a15"; reg = <0x0>; }; cpu@1 { device_type = "cpu"; compatible = "arm,cortex-a15"; reg = <0x1>; }; cpu@100 { device_type = "cpu"; compatible = "arm,cortex-a7"; reg = <0x100>; }; cpu@101 { device_type = "cpu"; compatible = "arm,cortex-a7"; reg = <0x101>; }; I have not found any code in Linux using the compatible. Instead, they all seem to use the cpu-map (see drivers/base/arch_topology.c). Anyway, to me it would be better to parse the Device-Tree over using the MIDR. The advantage is we can cover a wide range of cases (you may have processor with the same MIDR but different frequency). For now, we could create a cpupool per cluster. Cheers, -- Julien Grall
[PATCH 3/3] VT-d: conditionalize IOTLB register offset check
As of commit 6773b1a7584a ("VT-d: Don't assume register-based invalidation is always supported") we don't (try to) use register based invalidation anymore when that's not supported by hardware. Hence there's also no point in the respective check, avoiding pointless IOMMU initialization failure. After all the spec (version 3.3 at the time of writing) doesn't say what the respective Extended Capability Register field would contain in such a case. Signed-off-by: Jan Beulich --- a/xen/drivers/passthrough/vtd/iommu.c +++ b/xen/drivers/passthrough/vtd/iommu.c @@ -1228,7 +1228,8 @@ int __init iommu_alloc(struct acpi_drhd_ if ( cap_fault_reg_offset(iommu->cap) + cap_num_fault_regs(iommu->cap) * PRIMARY_FAULT_REG_LEN > PAGE_SIZE || - ecap_iotlb_offset(iommu->ecap) >= PAGE_SIZE ) + (has_register_based_invalidation(iommu) && + ecap_iotlb_offset(iommu->ecap) >= PAGE_SIZE) ) { printk(XENLOG_ERR VTDPREFIX "IOMMU: unsupported\n"); print_iommu_regs(drhd);
[PATCH 2/3] VT-d: correct off-by-1 in fault register range check
All our present implementation requires is that the range fully fits in a single page. No need to exclude the case of the last register extending right to the end of that page. Signed-off-by: Jan Beulich --- a/xen/drivers/passthrough/vtd/iommu.c +++ b/xen/drivers/passthrough/vtd/iommu.c @@ -1229,7 +1229,7 @@ int __init iommu_alloc(struct acpi_drhd_ quirk_iommu_caps(iommu); if ( cap_fault_reg_offset(iommu->cap) + - cap_num_fault_regs(iommu->cap) * PRIMARY_FAULT_REG_LEN >= PAGE_SIZE || + cap_num_fault_regs(iommu->cap) * PRIMARY_FAULT_REG_LEN > PAGE_SIZE || ecap_iotlb_offset(iommu->ecap) >= PAGE_SIZE ) { printk(XENLOG_ERR VTDPREFIX "IOMMU: unsupported\n");
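The off-by-one can be made concrete with a small model: with 16-byte fault records, an offset that puts the last record exactly at the page boundary is valid and is accepted by the corrected `>` comparison, while anything spilling past the page is still rejected (the old `>=` would have rejected the boundary case too). A userspace sketch — PRIMARY_FAULT_REG_LEN is 16 per the VT-d fault recording register layout, and the helper name is made up:

```c
#include <assert.h>
#include <stdint.h>

#define PAGE_SIZE_MODEL       4096u
#define PRIMARY_FAULT_REG_LEN 16u   /* one VT-d fault record */

/* Hypothetical model of the corrected range check: the fault records
 * are acceptable as long as they fit within one page, including the
 * case where the last record ends exactly at the page boundary. */
static int fault_regs_fit(uint32_t offset, uint32_t num_regs)
{
	return !(offset + num_regs * PRIMARY_FAULT_REG_LEN > PAGE_SIZE_MODEL);
}
```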
[PATCH 1/3] VT-d: prune SAGAW recognition
Bit 0 of the capability register field has become reserved at or before spec version 2.2. Treat it as such. Replace the effective open-coding of find_first_set_bit(). Adjust local variable types. Signed-off-by: Jan Beulich --- Strictly speaking IOMMUs supporting only 3-level tables ought to result in guests seeing a suitably reduced physical address width in CPUID. And then the same would apply to restrictions resulting from MGAW. --- a/xen/drivers/passthrough/vtd/iommu.c +++ b/xen/drivers/passthrough/vtd/iommu.c @@ -356,7 +356,7 @@ static uint64_t domain_pgd_maddr(struct pgd_maddr = hd->arch.vtd.pgd_maddr; } -/* Skip top levels of page tables for 2- and 3-level DRHDs. */ +/* Skip top level(s) of page tables for less-than-maximum level DRHDs. */ for ( agaw = level_to_agaw(4); agaw != level_to_agaw(nr_pt_levels); agaw-- ) @@ -1183,8 +1183,7 @@ static int __init iommu_set_interrupt(st int __init iommu_alloc(struct acpi_drhd_unit *drhd) { struct vtd_iommu *iommu; -unsigned long sagaw, nr_dom; -int agaw; +unsigned int sagaw, agaw = 0, nr_dom; iommu = xzalloc(struct vtd_iommu); if ( iommu == NULL ) @@ -1237,14 +1236,13 @@ int __init iommu_alloc(struct acpi_drhd_ return -ENODEV; } -/* Calculate number of pagetable levels: between 2 and 4. */ +/* Calculate number of pagetable levels: 3 or 4. */ sagaw = cap_sagaw(iommu->cap); -for ( agaw = level_to_agaw(4); agaw >= 0; agaw-- ) -if ( test_bit(agaw, ) ) -break; -if ( agaw < 0 ) +if ( sagaw & 6 ) +agaw = find_first_set_bit(sagaw & 6); +if ( !agaw ) { -printk(XENLOG_ERR VTDPREFIX "IOMMU: unsupported sagaw %lx\n", sagaw); +printk(XENLOG_ERR VTDPREFIX "IOMMU: unsupported sagaw %x\n", sagaw); print_iommu_regs(drhd); return -ENODEV; }
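The reworked SAGAW handling can be modeled in a few lines: bit 0 (2-level tables) is treated as reserved, so only bits 1 and 2 (3- and 4-level, i.e. 39- and 48-bit AGAW) are considered, and Xen's find_first_set_bit() picks the lowest supported one. A hypothetical userspace model, substituting `__builtin_ctz()` (GCC/Clang) for find_first_set_bit(); the levels-from-agaw relation `level = agaw + 2` matches Xen's level_to_agaw():

```c
#include <assert.h>

/* Model of the patched SAGAW logic: returns the number of page-table
 * levels (3 or 4), or -1 when no supported width is advertised. */
static int sagaw_to_levels(unsigned int sagaw)
{
	unsigned int agaw = 0;

	if (sagaw & 6)  /* mask off reserved bit 0 (2-level) */
		agaw = (unsigned int)__builtin_ctz(sagaw & 6);
	if (!agaw)
		return -1;      /* unsupported */
	return (int)(agaw + 2); /* agaw 1 -> 3 levels, agaw 2 -> 4 levels */
}
```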
[PATCH 0/3] VT-d: further small corrections
1: prune SAGAW recognition 2: correct off-by-1 in fault register range check 3: conditionalize IOTLB register offset check Jan
[PATCH v7 3/3] xen: add Xen pvUSB maintainer
Add myself as maintainer for the Xen pvUSB stuff. Signed-off-by: Juergen Gross Acked-by: Konrad Rzeszutek Wilk --- MAINTAINERS | 8 1 file changed, 8 insertions(+) diff --git a/MAINTAINERS b/MAINTAINERS index 5250298d2817..da7fd3d32dc4 100644 --- a/MAINTAINERS +++ b/MAINTAINERS @@ -20831,6 +20831,14 @@ F: drivers/scsi/xen-scsifront.c F: drivers/xen/xen-scsiback.c F: include/xen/interface/io/vscsiif.h +XEN PVUSB DRIVER +M: Juergen Gross +L: xen-devel@lists.xenproject.org (moderated for non-subscribers) +L: linux-...@vger.kernel.org +S: Supported +F: drivers/usb/host/xen* +F: include/xen/interface/io/usbif.h + XEN SOUND FRONTEND DRIVER M: Oleksandr Andrushchenko L: xen-devel@lists.xenproject.org (moderated for non-subscribers) -- 2.26.2
[PATCH v7 1/3] usb: Add Xen pvUSB protocol description
Add the definition of pvUSB protocol used between the pvUSB frontend in a Xen domU and the pvUSB backend in a Xen driver domain (usually Dom0). This header was originally provided by Fujitsu for Xen based on Linux 2.6.18. Changes are: - adapt to Linux kernel style guide - use Xen namespace - add lots of comments - don't use kernel internal defines Signed-off-by: Juergen Gross Reviewed-by: Boris Ostrovsky --- include/xen/interface/io/usbif.h | 405 +++ 1 file changed, 405 insertions(+) create mode 100644 include/xen/interface/io/usbif.h diff --git a/include/xen/interface/io/usbif.h b/include/xen/interface/io/usbif.h new file mode 100644 index ..a70e0b93178b --- /dev/null +++ b/include/xen/interface/io/usbif.h @@ -0,0 +1,405 @@ +/* SPDX-License-Identifier: MIT */ + +/* + * usbif.h + * + * USB I/O interface for Xen guest OSes. + * + * Copyright (C) 2009, FUJITSU LABORATORIES LTD. + * Author: Noboru Iwamatsu + */ + +#ifndef __XEN_PUBLIC_IO_USBIF_H__ +#define __XEN_PUBLIC_IO_USBIF_H__ + +#include "ring.h" +#include "../grant_table.h" + +/* + * Detailed Interface Description + * == + * The pvUSB interface is using a split driver design: a frontend driver in + * the guest and a backend driver in a driver domain (normally dom0) having + * access to the physical USB device(s) being passed to the guest. + * + * The frontend and backend drivers use XenStore to initiate the connection + * between them, the I/O activity is handled via two shared ring pages and an + * event channel. As the interface between frontend and backend is at the USB + * host connector level, multiple (up to 31) physical USB devices can be + * handled by a single connection. + * + * The Xen pvUSB device name is "qusb", so the frontend's XenStore entries are + * to be found under "device/qusb", while the backend's XenStore entries are + * under "backend//qusb". 
+ * + * When a new pvUSB connection is established, the frontend needs to setup the + * two shared ring pages for communication and the event channel. The ring + * pages need to be made available to the backend via the grant table + * interface. + * + * One of the shared ring pages is used by the backend to inform the frontend + * about USB device plug events (device to be added or removed). This is the + * "conn-ring". + * + * The other ring page is used for USB I/O communication (requests and + * responses). This is the "urb-ring". + * + * Feature and Parameter Negotiation + * = + * The two halves of a Xen pvUSB driver utilize nodes within the XenStore to + * communicate capabilities and to negotiate operating parameters. This + * section enumerates these nodes which reside in the respective front and + * backend portions of the XenStore, following the XenBus convention. + * + * Any specified default value is in effect if the corresponding XenBus node + * is not present in the XenStore. + * + * XenStore nodes in sections marked "PRIVATE" are solely for use by the + * driver side whose XenBus tree contains them. + * + * + *Backend XenBus Nodes + * + * + *-- Backend Device Identification (PRIVATE) -- + * + * num-ports + * Values: unsigned [1...31] + * + * Number of ports for this (virtual) USB host connector. + * + * usb-ver + * Values: unsigned [1...2] + * + * USB version of this host connector: 1 = USB 1.1, 2 = USB 2.0. + * + * port/[1...31] + * Values: string + * + * Physical USB device connected to the given port, e.g. "3-1.5". + * + * + *Frontend XenBus Nodes + * + * + *--- Request Transport Parameters --- + * + * event-channel + * Values: unsigned + * + * The identifier of the Xen event channel used to signal activity + * in the ring buffer. + * + * urb-ring-ref + * Values: unsigned + * + * The Xen grant reference granting permission for the backend to map + * the sole page in a single page sized ring buffer. This is the ring + * buffer for urb requests. 
+ * + * conn-ring-ref + * Values: unsigned + * + * The Xen grant reference granting permission for the backend to map + * the sole page in a single page sized ring buffer. This is the ring + * buffer for connection/disconnection requests. + * + * protocol + * Values: string (XEN_IO_PROTO_ABI_*) + * Default Value: XEN_IO_PROTO_ABI_NATIVE + * + * The machine ABI rules governing the format of all ring request and + * response structures. + * + *
[PATCH v7 0/3] xen, usb: support pvUSB frontend driver
This series adds XEN guest pvUSB support. With pvUSB it is possible to use physical USB devices from a XEN domain. Since V4 a lot of time (several years) has passed. This is a revived attempt to get the frontend into the kernel. The support consists of a frontend in form of a virtual hcd driver in the unprivileged domU passing I/O-requests to the backend in a driver domain (usually Dom0). The backend is not part of this patch series, as it is supported via qemu. The code is taken (and adapted) from the original pvUSB implementation done for Linux 2.6 in 2008 by Fujitsu. Normal operation of USB devices by adding and removing them dynamically to/from a domain has been tested using various USB devices (USB 1.1, 2.0 and 3.0) using the qemu based backend. Changes in V7: - use SPDX header in usbif.h Changes in V6: - add SPDX line to driver Changes in V5: - added interface documentation to patch 1 - frontend no longer trusts backend to return only sane values Changes in V4: - remove sysfs file from frontend, as it violated the "one value per file" rule and didn't serve any real purpose. Changes in V3: - move frontend to drivers/usb/host and rename it to xen-hcd. 
- changed name prefixes in patch 1 to "xenusb" as requested by Greg - use __u types rather than uint_t as requested by Greg Changes in V2: - removed backend, as it can be implemented in user land - added some access macros and definitions to the pvUSB interface description to make it independent from linux kernel USB internals - adapted frontend to newer kernel version and use new pvUSB interface macros - set port status in one chunk as suggested by Oliver Neukum Juergen Gross (3): usb: Add Xen pvUSB protocol description usb: Introduce Xen pvUSB frontend (xen hcd) xen: add Xen pvUSB maintainer MAINTAINERS |8 + drivers/usb/host/Kconfig | 11 + drivers/usb/host/Makefile|1 + drivers/usb/host/xen-hcd.c | 1606 ++ include/xen/interface/io/usbif.h | 405 5 files changed, 2031 insertions(+) create mode 100644 drivers/usb/host/xen-hcd.c create mode 100644 include/xen/interface/io/usbif.h -- 2.26.2
[PATCH v7 2/3] usb: Introduce Xen pvUSB frontend (xen hcd)
Introduces the Xen pvUSB frontend. With pvUSB it is possible for a Xen domU to communicate with a USB device assigned to that domU. The communication is all done via the pvUSB backend in a driver domain (usually Dom0) which is owner of the physical device. The pvUSB frontend is a USB hcd for a virtual USB host connector. The code is taken from the pvUSB implementation in Xen done by Fujitsu based on Linux kernel 2.6.18. Changes from the original version are: - port to upstream kernel - put all code in just one source file - move module to appropriate location in kernel tree - adapt to Linux style guide - minor code modifications to increase readability Signed-off-by: Juergen Gross --- drivers/usb/host/Kconfig | 11 + drivers/usb/host/Makefile |1 + drivers/usb/host/xen-hcd.c | 1606 3 files changed, 1618 insertions(+) create mode 100644 drivers/usb/host/xen-hcd.c diff --git a/drivers/usb/host/Kconfig b/drivers/usb/host/Kconfig index d1d926f8f9c2..57ca5f97a3dc 100644 --- a/drivers/usb/host/Kconfig +++ b/drivers/usb/host/Kconfig @@ -772,3 +772,14 @@ config USB_HCD_TEST_MODE This option is of interest only to developers who need to validate their USB hardware designs. It is not needed for normal use. If unsure, say N. + +config USB_XEN_HCD + tristate "Xen usb virtual host driver" + depends on XEN + select XEN_XENBUS_FRONTEND + help + The Xen usb virtual host driver serves as a frontend driver enabling + a Xen guest system to access USB Devices passed through to the guest + by the Xen host (usually Dom0). + Only needed if the kernel is running in a Xen guest and generic + access to a USB device is needed. 
diff --git a/drivers/usb/host/Makefile b/drivers/usb/host/Makefile
index 171de4df50bd..2948983618fb 100644
--- a/drivers/usb/host/Makefile
+++ b/drivers/usb/host/Makefile
@@ -85,3 +85,4 @@
 obj-$(CONFIG_USB_HCD_BCMA)	+= bcma-hcd.o
 obj-$(CONFIG_USB_HCD_SSB)	+= ssb-hcd.o
 obj-$(CONFIG_USB_FOTG210_HCD)	+= fotg210-hcd.o
 obj-$(CONFIG_USB_MAX3421_HCD)	+= max3421-hcd.o
+obj-$(CONFIG_USB_XEN_HCD)	+= xen-hcd.o

diff --git a/drivers/usb/host/xen-hcd.c b/drivers/usb/host/xen-hcd.c
new file mode 100644
index ..7801dde6f5ee
--- /dev/null
+++ b/drivers/usb/host/xen-hcd.c
@@ -0,0 +1,1606 @@
+// SPDX-License-Identifier: GPL-2.0-or-later
+/*
+ * xen-hcd.c
+ *
+ * Xen USB Virtual Host Controller driver
+ *
+ * Copyright (C) 2009, FUJITSU LABORATORIES LTD.
+ * Author: Noboru Iwamatsu
+ */
+
+#include
+#include
+#include
+#include
+#include
+
+#include
+#include
+#include
+#include
+#include
+
+#include
+
+/* Private per-URB data */
+struct urb_priv {
+	struct list_head list;
+	struct urb *urb;
+	int req_id;		/* RING_REQUEST id for submitting */
+	int unlink_req_id;	/* RING_REQUEST id for unlinking */
+	int status;
+	bool unlinked;		/* dequeued marker */
+};
+
+/* virtual roothub port status */
+struct rhport_status {
+	__u32 status;
+	bool resuming;		/* in resuming */
+	bool c_connection;	/* connection changed */
+	unsigned long timeout;
+};
+
+/* status of attached device */
+struct vdevice_status {
+	int devnum;
+	enum usb_device_state status;
+	enum usb_device_speed speed;
+};
+
+/* RING request shadow */
+struct usb_shadow {
+	struct xenusb_urb_request req;
+	struct urb *urb;
+};
+
+struct xenhcd_info {
+	/* Virtual Host Controller has 4 urb queues */
+	struct list_head pending_submit_list;
+	struct list_head pending_unlink_list;
+	struct list_head in_progress_list;
+	struct list_head giveback_waiting_list;
+
+	spinlock_t lock;
+
+	/* timer that kick pending and giveback waiting urbs */
+	struct timer_list watchdog;
+	unsigned long actions;
+
+	/* virtual root hub */
+	int rh_numports;
+	struct rhport_status ports[XENUSB_MAX_PORTNR];
+	struct vdevice_status devices[XENUSB_MAX_PORTNR];
+
+	/* Xen related stuff */
+	struct xenbus_device *xbdev;
+	int urb_ring_ref;
+	int conn_ring_ref;
+	struct xenusb_urb_front_ring urb_ring;
+	struct xenusb_conn_front_ring conn_ring;
+
+	unsigned int evtchn;
+	unsigned int irq;
+	struct usb_shadow shadow[XENUSB_URB_RING_SIZE];
+	unsigned int shadow_free;
+
+	bool error;
+};
+
+#define GRANT_INVALID_REF 0
+
+#define XENHCD_RING_JIFFIES (HZ/200)
+#define XENHCD_SCAN_JIFFIES 1
+
+enum xenhcd_timer_action {
+	TIMER_RING_WATCHDOG,
+	TIMER_SCAN_PENDING_URBS,
+};
+
+static struct kmem_cache *xenhcd_urbp_cachep;
+
+static inline struct xenhcd_info *xenhcd_hcd_to_info(struct usb_hcd *hcd)
+{
+	return (struct xenhcd_info *)hcd->hcd_priv;
+}
+
+static inline struct usb_hcd *xenhcd_info_to_hcd(struct xenhcd_info *info)
+{
+
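The `shadow[]` array and `shadow_free` index above follow the usual Xen frontend pattern: each in-flight ring request keeps a shadow copy so it can be completed or unlinked later, and free slots are recycled through a free-list index. A minimal userspace sketch of one way such a slot allocator can work — the chained `next_free[]` representation here is an illustrative simplification, not the driver's actual layout:

```c
#include <assert.h>

#define URB_RING_SIZE 64  /* stand-in for XENUSB_URB_RING_SIZE */

/* Hypothetical, simplified shadow free-list: free slots are chained
 * through next_free[], with shadow_free as the head of the chain. */
static unsigned int next_free[URB_RING_SIZE];
static unsigned int shadow_free;

static void shadow_init(void)
{
	unsigned int i;

	for (i = 0; i < URB_RING_SIZE; i++)
		next_free[i] = i + 1;	/* slot i chains to slot i+1 */
	shadow_free = 0;
}

static int shadow_get_id(void)
{
	unsigned int id = shadow_free;

	if (id >= URB_RING_SIZE)
		return -1;		/* ring full */
	shadow_free = next_free[id];	/* pop the head of the chain */
	return (int)id;
}

static void shadow_put_id(unsigned int id)
{
	next_free[id] = shadow_free;	/* push the slot back as new head */
	shadow_free = id;
}
```

The id handed out is what goes into the ring request, so the response's id leads straight back to the shadow slot to complete.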
Re: [RFC PATCH] Added the logic to decode 32 bit ldr/str post-indexing instructions
On 22/11/2021 14:19, Ayan Kumar Halder wrote: Hi Julien/Stefano/Wei/Jan, Hi, As some of the comments were inter-related, I have consolidated my response to all the comments below.

On 19/11/2021 17:26, Julien Grall wrote: Hi Ayan, On 19/11/2021 16:52, Ayan Kumar Halder wrote: At present, post indexing instructions are not emulated by Xen. Can you explain in the commit message why this is a problem? Yes, I will update the message as below :- Armv8 hardware does not provide the correct syndrome for decoding post indexing ldr/str instructions. This statement implies that the issue happens on both AArch32 and AArch64 state. I am OK if we only handle the latter for now. But I would clarify it in the commit message. Thus any post indexing ldr/str instruction at EL1 results in a data abort with ISV=0. Instruction from EL0 may also trap in Xen. So I would prefer if you just say "instruction executed by a domain results ...". As a result, Xen needs to decode the instruction. Can you give some information on the domain used. Something like: "this was discovered with XXX which lets the compiler decide which instruction to use." Thus, DomU will be able to read/write to ioreg space with post indexing I would say "domain" rather than "domU" because all the domains are affected. instructions for 32 bit.

How about the following commit message:

"
xen/arm64: io: Decode 32-bit ldr/str post-indexing instructions

At the moment, Xen is only handling data aborts with valid syndrome (i.e. ISV=1). Unfortunately, this doesn't cover all the instructions a domain could use to access MMIO regions. For instance, Xilinx baremetal OS will use:

volatile u32 *LocalAddr = (volatile u32 *)Addr;
*LocalAddr = Value;

This leaves the compiler to decide which store instruction to use. This may be a post-index store instruction where the HW will not provide a valid syndrome. In order to handle post-indexing store/load instructions, Xen will need to fetch and decode the instruction.
This patch only covers post-index store/load instructions from AArch64 mode. For now, this is left unimplemented for traps from AArch32 mode.
"

+{
+    int32_t ret;
+
+    if ( !(start >= 0 && length > 0 && length <= 32 - start) )
+        return -EINVAL;
+
+    ret = (value >> start) & (~0U >> (32 - length));

This would be easier to read if you use GENMASK(). I see that GENMASK returns a register mask. In my scenario, I wish to compare the offsets 10, 21, 24 and 27 to some fixed value. So, instead of using GENMASK four times, I can try the following: instr & MASK_for_10_21_24_27 == VALID_for_10_21_24_27 (Wei Chen's suggestion). That would be OK with me. Alternatively, you could use the union approach suggested by Bertrand. Also for size, v and opc, I can define another bitmask to compare with VALID_for_32bit_LDR | VALID_for_32bit_STR.

Wei Chen, You had suggested using vreg_regx_extract(). However, I see that it is used to extract the complete u64 register value. In this case, I wish to compare certain offsets within a 32 bit register to some expected values. So, vreg_regx_extract() might not be appropriate and we can use the method mentioned before. vreg_reg*_extract() are meant to work on a register. So I think they are not appropriate here as you don't deal with registers.

[...]

+
+    /* At the moment, we support STR(immediate) - 32 bit variant and
+     * LDR(immediate) - 32 bit variant only.
+     */

Coding style. Ack

+    if (!((size==2) && (v==0) && ((opc==0) || (opc==1

The coding style for this should be: if ( !(( size == 2 ) && ( ... ) ...
) ) Ack + goto bad_64bit_loadstore; + + rt = extract32(instr, 0, 5); + imm9 = extract32(instr, 12, 9); + + if ( imm9 < 0 ) + update_dabt(dabt, rt, size, true); + else + update_dabt(dabt, rt, size, false); This could be simplified with: update_dabt(dabt, rt, size, imm9 < 0); Ack + + dabt->valid = 1; + + + return 0; +bad_64bit_loadstore: + gprintk(XENLOG_ERR, "unhandled 64bit instruction 0x%x\n", instr); + return 1; +} + static int decode_thumb(register_t pc, struct hsr_dabt *dabt) { uint16_t instr; @@ -155,6 +229,9 @@ int decode_instruction(const struct cpu_user_regs *regs, struct hsr_dabt *dabt) if ( is_32bit_domain(current->domain) && regs->cpsr & PSR_THUMB ) return decode_thumb(regs->pc, dabt); + if ( is_64bit_domain(current->domain) ) You can still run 32-bit code in 64-bit domain. So I think you want to check the SPSR mode. Do you mean the following check :- if ( (is_64bit_domain(current->domain) && (!(regs->spsr & PSR_MODE_BIT)) ) This should work, but I think you can simplify to use: !psr_mode_is_32bit() + */ + rc = decode_instruction(regs, ); I actually expect this to also be useful when forwarding the I/O to device-model. So I would move the decoding earlier in the code and the check of
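The points debated above — comparing several fixed opcode bits in one mask-and-compare step, and the `imm9 < 0` test — can be sketched as follows. `extract32()` mirrors the helper quoted in the patch; `sign_extend()` is an illustrative addition, since `extract32()` always returns a non-negative value, so the sign bit of the 9-bit immediate must be propagated before any signedness test can fire:

```c
#include <assert.h>
#include <stdint.h>

/* Extract 'length' bits starting at bit 'start', like the patch's
 * extract32() helper. The result is always non-negative. */
static uint32_t extract32(uint32_t value, unsigned int start,
                          unsigned int length)
{
	return (value >> start) & (~0U >> (32 - length));
}

/* Sign-extend an n-bit field. Without this step a plain "imm9 < 0"
 * comparison on extract32()'s result can never be true. */
static int32_t sign_extend(uint32_t value, unsigned int bits)
{
	uint32_t sign_bit = 1U << (bits - 1);

	return (int32_t)((value ^ sign_bit) - sign_bit);
}
```

With the mask-and-compare approach, the fixed bits of the encoding would then be checked in a single `(instr & MASK) == VALID` test before extracting `rt` and `imm9`.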
Re: [PATCH 01/10] vpci: move lock
Hi, Roger, Jan!

I think the below work will help with solving issues brought up by [1]. So, after thinking a bit more I concluded that indeed we do not actually need a per-domain vPCI lock, but instead we just want to protect pdev->vpci which is done in this patch.

At the moment I see four entities which may run concurrently and touch vpci:
1. hypercalls PHYSDEVOP_pci_device_{add|remove} - for Dom0 only
2. hypercalls XEN_DOMCTL_{de|}assign_device - any domain
3. MMIO trap handlers vpci_{read|write}
4. vpci_process_pending

From the above, #1 will update pdev->vpci and #4 may remove pdev->vpci with vpci_remove_device on an error path for Dom0. #2 and #3 seem to be readers only. So, it is possible to improve this patch to not only take pdev->vpci->lock out of struct vpci, but also to convert it into an RW lock.

Hope this is what is needed in context of vpci and locking.

Thank you,
Oleksandr

On 20.06.18 17:42, Roger Pau Monne wrote:
> To the outside of the vpci struct. This way the lock can be used to
> check whether vpci is present, and removal can be performed while
> holding the lock, in order to make sure there are no accesses to the
> contents of the vpci struct. Previously removal could race with
> vpci_read for example, since the lock was dropped prior to freeing
> pdev->vpci.
> > Signed-off-by: Roger Pau Monné > --- > Cc: Andrew Cooper > Cc: George Dunlap > Cc: Ian Jackson > Cc: Jan Beulich > Cc: Julien Grall > Cc: Konrad Rzeszutek Wilk > Cc: Stefano Stabellini > Cc: Tim Deegan > Cc: Wei Liu > --- > tools/tests/vpci/emul.h | 5 ++-- > tools/tests/vpci/main.c | 4 +-- > xen/arch/x86/hvm/vmsi.c | 8 +++--- > xen/drivers/passthrough/pci.c | 1 + > xen/drivers/vpci/header.c | 19 + > xen/drivers/vpci/msi.c| 11 +--- > xen/drivers/vpci/msix.c | 8 +++--- > xen/drivers/vpci/vpci.c | 51 ++- > xen/include/xen/pci.h | 1 + > xen/include/xen/vpci.h| 3 +-- > 10 files changed, 70 insertions(+), 41 deletions(-) > > diff --git a/tools/tests/vpci/emul.h b/tools/tests/vpci/emul.h > index 5d47544bf7..d344ef71c9 100644 > --- a/tools/tests/vpci/emul.h > +++ b/tools/tests/vpci/emul.h > @@ -44,6 +44,7 @@ struct domain { > }; > > struct pci_dev { > +bool vpci_lock; > struct vpci *vpci; > }; > > @@ -53,10 +54,8 @@ struct vcpu > }; > > extern const struct vcpu *current; > -extern const struct pci_dev test_pdev; > +extern struct pci_dev test_pdev; > > -typedef bool spinlock_t; > -#define spin_lock_init(l) (*(l) = false) > #define spin_lock(l) (*(l) = true) > #define spin_unlock(l) (*(l) = false) > > diff --git a/tools/tests/vpci/main.c b/tools/tests/vpci/main.c > index b9a0a6006b..26c95b08b6 100644 > --- a/tools/tests/vpci/main.c > +++ b/tools/tests/vpci/main.c > @@ -23,7 +23,8 @@ static struct vpci vpci; > > const static struct domain d; > > -const struct pci_dev test_pdev = { > +struct pci_dev test_pdev = { > +.vpci_lock = false, > .vpci = , > }; > > @@ -158,7 +159,6 @@ main(int argc, char **argv) > int rc; > > INIT_LIST_HEAD(); > -spin_lock_init(); > > VPCI_ADD_REG(vpci_read32, vpci_write32, 0, 4, r0); > VPCI_READ_CHECK(0, 4, r0); > diff --git a/xen/arch/x86/hvm/vmsi.c b/xen/arch/x86/hvm/vmsi.c > index 3001d5c488..94550cb8c4 100644 > --- a/xen/arch/x86/hvm/vmsi.c > +++ b/xen/arch/x86/hvm/vmsi.c > @@ -893,14 +893,14 @@ int vpci_msix_arch_print(const struct vpci_msix 
*msix) > { > struct pci_dev *pdev = msix->pdev; > > -spin_unlock(>pdev->vpci->lock); > +spin_unlock(>pdev->vpci_lock); > process_pending_softirqs(); > /* NB: we assume that pdev cannot go away for an alive domain. > */ > -if ( !pdev->vpci || !spin_trylock(>vpci->lock) ) > +if ( !spin_trylock(>vpci_lock) ) > return -EBUSY; > -if ( pdev->vpci->msix != msix ) > +if ( !pdev->vpci || pdev->vpci->msix != msix ) > { > -spin_unlock(>vpci->lock); > +spin_unlock(>vpci_lock); > return -EAGAIN; > } > } > diff --git a/xen/drivers/passthrough/pci.c b/xen/drivers/passthrough/pci.c > index c4890a4295..a5d59b83b7 100644 > --- a/xen/drivers/passthrough/pci.c > +++ b/xen/drivers/passthrough/pci.c > @@ -315,6 +315,7 @@ static struct pci_dev *alloc_pdev(struct pci_seg *pseg, > u8 bus, u8 devfn) > *((u8*) >devfn) = devfn; > pdev->domain = NULL; > INIT_LIST_HEAD(>msi_list); > +spin_lock_init(>vpci_lock); > > if ( pci_find_cap_offset(pseg->nr, bus, PCI_SLOT(devfn), > PCI_FUNC(devfn), >PCI_CAP_ID_MSIX) ) > diff --git a/xen/drivers/vpci/header.c b/xen/drivers/vpci/header.c > index 0ec4c082a6..9d5607d5f8 100644 > --- a/xen/drivers/vpci/header.c > +++
Xen Security Advisory 388 v3 (CVE-2021-28704,CVE-2021-28707,CVE-2021-28708) - PoD operations on misaligned GFNs
-BEGIN PGP SIGNED MESSAGE- Hash: SHA256 Xen Security Advisory CVE-2021-28704,CVE-2021-28707,CVE-2021-28708 / XSA-388 version 3 PoD operations on misaligned GFNs UPDATES IN VERSION 3 Correct affected versions range. Add CVE numbers to patches. Public release. ISSUE DESCRIPTION = x86 HVM and PVH guests may be started in populate-on-demand (PoD) mode, to provide a way for them to later easily have more memory assigned. Guests are permitted to control certain P2M aspects of individual pages via hypercalls. These hypercalls may act on ranges of pages specified via page orders (resulting in a power-of-2 number of pages). The implementation of some of these hypercalls for PoD does not enforce the base page frame number to be suitably aligned for the specified order, yet some code involved in PoD handling actually makes such an assumption. These operations are XENMEM_decrease_reservation (CVE-2021-28704) and XENMEM_populate_physmap (CVE-2021-28707), the latter usable only by domains controlling the guest, i.e. a de-privileged qemu or a stub domain. (Patch 1, combining the fix to both these two issues.) In addition handling of XENMEM_decrease_reservation can also trigger a host crash when the specified page order is neither 4k nor 2M nor 1G (CVE-2021-28708, patch 2). IMPACT == Malicious or buggy guest kernels may be able to mount a Denial of Service (DoS) attack affecting the entire system. Privilege escalation and information leaks cannot be ruled out. VULNERABLE SYSTEMS == All Xen versions from 4.7 onwards are affected. Xen versions 4.6 and older are not affected. Only x86 HVM and PVH guests started in populate-on-demand mode can leverage the vulnerability. Populate-on-demand mode is activated when the guest's xl configuration file specifies a "maxmem" value which is larger than the "memory" value. MITIGATION == Not starting x86 HVM or PVH guests in populate-on-demand mode will avoid the vulnerability. CREDITS === This issue was discovered by Jan Beulich of SUSE. 
RESOLUTION
==

Applying the appropriate pair of attached patches resolves this issue.

Note that patches for released versions are generally prepared to apply to the stable branches, and may not apply cleanly to the most recent release tarball. Downstreams are encouraged to update to the tip of the stable branch before applying these patches.

 xsa388-?.patch       xen-unstable
 xsa388-4.15-?.patch  Xen 4.15.x
 xsa388-4.14-?.patch  Xen 4.14.x - 4.12.x

$ sha256sum xsa388*
43f6647e9f7d28d22eeb98680e116b301b0e29ef63ea65c9839a5aaebd449bc4  xsa388-1.patch
64b27a8c7c02036528e00a3070e27e873762d68f4ea1504e906aaf2ddc1c06be  xsa388-2.patch
6917267482101a3f8f1d13905e14994344a0af81370c7a2b92275fb176b321a0  xsa388-4.14-1.patch
d5886e046c69f34f98f7e1fc6ffcc36d92f8fc79242b9dc88412c39aa79b4ac3  xsa388-4.14-2.patch
fbe6af409447edc2318940d7c4bc0861a236d40db037166608fc09fa57ef54b1  xsa388-4.15-1.patch
c828d735aaa3f430ccef314bf27519cd6a5f4daaa79e1c493dc47e42ab09ec9f  xsa388-4.15-2.patch
$

DEPLOYMENT DURING EMBARGO
=

Deployment of the patches described above (or others which are substantially similar) is permitted during the embargo, even on public-facing systems with untrusted guest users and administrators.

HOWEVER, deployment of the mitigation described above is NOT permitted during the embargo on public-facing systems with untrusted guest users and administrators. This is because such a configuration change is recognizable by the affected guests.

AND: Distribution of updated software is prohibited (except to other members of the predisclosure list). Predisclosure list members who wish to deploy significantly different patches and/or mitigations, please contact the Xen Project Security Team.

(Note: this during-embargo deployment notice is retained in post-embargo publicly released Xen Project advisories, even though it is then no longer applicable. This is to enable the community to have oversight of the Xen Project Security Team's decisionmaking.)
For more information about permissible uses of embargoed information, consult the Xen Project community's agreed Security Policy: http://www.xenproject.org/security-policy.html
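The alignment assumption described in the advisory above comes down to a power-of-2 check on the base guest frame number: an operation on 2^order pages is only well-formed if the base GFN is aligned to that order. A minimal sketch of the kind of guard the fix needs (hypothetical names, not the actual Xen code):

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* An operation on a range of 2^order pages assumes the base gfn is
 * order-aligned; PoD code relying on that assumption must reject (or
 * clip) misaligned requests up front. */
static bool gfn_aligned(uint64_t gfn, unsigned int order)
{
	return (gfn & ((1ULL << order) - 1)) == 0;
}
```

For example, a 2M (order-9) request is only valid when the low nine bits of the GFN are zero.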
Xen Security Advisory 385 v2 (CVE-2021-28706) - guests may exceed their designated memory limit
-BEGIN PGP SIGNED MESSAGE-
Hash: SHA256

Xen Security Advisory CVE-2021-28706 / XSA-385
version 2

guests may exceed their designated memory limit

UPDATES IN VERSION 2

Add CVE numbers to patches. Public release.

ISSUE DESCRIPTION
=

When a guest is permitted to have close to 16TiB of memory, it may be able to issue hypercalls to increase its memory allocation beyond the administrator established limit. This is a result of a calculation done with 32-bit precision, which may overflow. It would then only be the overflowed (and hence small) number which gets compared against the established upper bound.

IMPACT
==

A guest may be able to allocate unbounded amounts of memory to itself. This may result in a Denial of Service (DoS) affecting the entire host.

VULNERABLE SYSTEMS
==

All Xen versions from at least 3.2 onwards are affected. On x86, only Xen builds with the BIGMEM configuration option enabled are affected. (This option is off by default.) Only hosts with more than 16 TiB of memory are affected.

MITIGATION
==

Setting the maximum amount of memory a guest may allocate to strictly less than 1023 GiB will avoid the vulnerability.

CREDITS
===

This issue was discovered by Julien Grall of Amazon.

RESOLUTION
==

Applying the appropriate first attached patch resolves this specific issue. The second patch in addition documents altered support status of Xen on huge memory systems.

Note that patches for released versions are generally prepared to apply to the stable branches, and may not apply cleanly to the most recent release tarball. Downstreams are encouraged to update to the tip of the stable branch before applying these patches.
 xsa385-?.patch     xen-unstable
 xsa385-4.15.patch  Xen 4.15.x - 4.14.x
 xsa385-4.13.patch  Xen 4.13.x
 xsa385-4.12.patch  Xen 4.12.x

$ sha256sum xsa385*
b278902e293730a117605200910180bb842cf95db4bdedfd54b42b7314041d8c  xsa385-1.patch
46a5ccfbb763b857f6cd0df46a9b7eed155b9de399ca4c68c9925faf4d1d9adb  xsa385-2.patch
69ebe63dc7dca71f74260af19205a6387be56c7dc67b97fa7695ab1acd3c4da4  xsa385-4.12.patch
858eaad715e7cc62c4ab9784360f4ec77df70b2636b0755afe780d5c618cf9b4  xsa385-4.13.patch
831e86c3adfec532b1a48a0b967b7c58c37db3733aee8d78216eb9d535b34f12  xsa385-4.15.patch
$

DEPLOYMENT DURING EMBARGO
=

Deployment of the patches described above (or others which are substantially similar) is permitted during the embargo, even on public-facing systems with untrusted guest users and administrators.

HOWEVER, deployment of the mitigation described above is NOT permitted during the embargo on public-facing systems with untrusted guest users and administrators. This is because such a configuration change is recognizable by the affected guests.

AND: Distribution of updated software is prohibited (except to other members of the predisclosure list). Predisclosure list members who wish to deploy significantly different patches and/or mitigations, please contact the Xen Project Security Team.

(Note: this during-embargo deployment notice is retained in post-embargo publicly released Xen Project advisories, even though it is then no longer applicable. This is to enable the community to have oversight of the Xen Project Security Team's decisionmaking.)
For more information about permissible uses of embargoed information, consult the Xen Project community's agreed Security Policy: http://www.xenproject.org/security-policy.html
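The 32-bit truncation described in this advisory can be illustrated in a few lines: once a guest's page total approaches 2^32 pages (16 TiB at 4 KiB per page), a sum computed in 32-bit precision wraps, and the wrapped (small) value slips under the administrator-set bound. Hypothetical names, not the actual Xen code:

```c
#include <assert.h>
#include <stdint.h>

/* Broken bound check: the page total is truncated to 32 bits before
 * comparison, so near 2^32 pages the sum wraps to a small number. */
static int alloc_allowed_buggy(uint64_t tot_pages, uint64_t extra,
                               uint64_t max_pages)
{
	uint32_t new_total = (uint32_t)(tot_pages + extra);	/* wraps */

	return new_total <= max_pages;
}

/* Fixed check: full precision, plus an explicit overflow guard. */
static int alloc_allowed_fixed(uint64_t tot_pages, uint64_t extra,
                               uint64_t max_pages)
{
	uint64_t new_total = tot_pages + extra;

	return new_total >= tot_pages && new_total <= max_pages;
}
```

With `tot_pages` just below 2^32, the buggy variant accepts an allocation that is wildly over the limit, which is exactly the failure mode the advisory describes.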
Xen Security Advisory 387 v2 (CVE-2021-28703) - grant table v2 status pages may remain accessible after de-allocation (take two)
-BEGIN PGP SIGNED MESSAGE-
Hash: SHA256

Xen Security Advisory CVE-2021-28703 / XSA-387
version 2

grant table v2 status pages may remain accessible after de-allocation (take two)

UPDATES IN VERSION 2

Public release.

ISSUE DESCRIPTION
=

Guests are permitted access to certain Xen-owned pages of memory. The majority of such pages remain allocated / associated with a guest for its entire lifetime. Grant table v2 status pages, however, get de-allocated when a guest switches (back) from v2 to v1. The freeing of such pages requires that the hypervisor know where in the guest these pages were mapped. The hypervisor tracks only one use within guest space, but racing requests from the guest to insert mappings of these pages may result in any of them becoming mapped in multiple locations. Upon switching back from v2 to v1, the guest would then retain access to a page that was freed and perhaps re-used for other purposes.

This bug was fortuitously fixed by code cleanup in Xen 4.14, and backported to security-supported Xen branches as a prerequisite of the fix for XSA-378.

IMPACT
==

A malicious guest may be able to elevate its privileges to that of the host, cause host or guest Denial of Service (DoS), or cause information leaks.

VULNERABLE SYSTEMS
==

All Xen branches up to and including 4.13 are vulnerable, but only if the patches for XSA-378 have not been applied. Xen versions 4.13.4, 4.14.x and 4.15.x are not affected.

Only x86 HVM and PVH guests permitted to use grant table version 2 interfaces can leverage this vulnerability. x86 PV guests cannot leverage this vulnerability. On Arm, grant table v2 use is explicitly unsupported.

MITIGATION
==

Running only PV guests will avoid this vulnerability. Suppressing use of grant table v2 interfaces for HVM or PVH guests will also avoid this vulnerability.

CREDITS
===

This issue was discovered by Patryk Balicki and Julien Grall of Amazon.
RESOLUTION == Applying the following patch resolves the issue: x86/p2m: don't assert that the passed in MFN matches for a remove This patch was supplied with XSA-378, as one of 378's prerequisites. The fix has already been applied to Xen stable branches as follows: c65ea16dbcafbe4fe21693b18f8c2a3c5d14600e in Xen 4.14.x, 4.15.x f50fbddbae81fcccae56d27317bd71cc0e678ba2 in Xen 4.13.4 d44643199c96ac22491ae002d3bcd1c989b95ea4 in xen.git#stable-4.12 66f400c71d12fe8adfb895984b14f2941e8cb6ce in xen.git#stable-4.11 DEPLOYMENT DURING EMBARGO = Deployment of the patches and/or mitigations described above (or others which are substantially similar) is permitted during the embargo, even on public-facing systems with untrusted guest users and administrators. But: Distribution of updated software is prohibited (except to other members of the predisclosure list). Predisclosure list members who wish to deploy significantly different patches and/or mitigations, please contact the Xen Project Security Team. (Note: this during-embargo deployment notice is retained in post-embargo publicly released Xen Project advisories, even though it is then no longer applicable. This is to enable the community to have oversight of the Xen Project Security Team's decisionmaking.) For more information about permissible uses of embargoed information, consult the Xen Project community's agreed Security Policy: http://www.xenproject.org/security-policy.html -BEGIN PGP SIGNATURE- iQFABAEBCAAqFiEEI+MiLBRfRHX6gGCng/4UyVfoK9kFAmGc2jgMHHBncEB4ZW4u b3JnAAoJEIP+FMlX6CvZlWUIAJ4bU9n2q9A4sqhiW0xJOCI4MIdwV2ym6xziP9iN e5sg0u3gdp94M1vLf//8h7julxLXgdJd10HWWpJkfRQcsfz3E1ul1O+mAsoHxJwI /qGl1Xis7AkDFjrPXthJUKh/DNgi8F1Rok7XDbfFznk34v4g6anh4JDfqJIUwIFQ l2s6qIOc2PjvmrJMXEboT1wEUADZNtChIqOL7Ibre9Zz6/mdr0FjPfPvLAqfvf9m aLaMElJMRx5iTEUG7qCYXUn8oKLbWNTv88yceudE7QZl3/zv/UnEL8nvBZWs/Gkx UbrC6wkNFUSpF/ngexvzsSE/SrfMYYaUPfIciyuxvuosGJY= =DmKh -END PGP SIGNATURE-
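The race described in this advisory can be modelled minimally: if the hypervisor remembers only one mapping location per status page, a second racing mapping simply overwrites the tracking slot, and the forgotten mapping survives the v2-to-v1 teardown. All names here are hypothetical illustration, not Xen code:

```c
#include <assert.h>
#include <stdint.h>

#define NO_GFN UINT64_MAX

/* One tracking slot per status page, as the advisory describes. */
static uint64_t tracked_gfn = NO_GFN;

/* A racing second mapping overwrites the only tracking slot, so the
 * first mapping location is lost. */
static void map_status_page(uint64_t gfn)
{
	tracked_gfn = gfn;
}

/* On the v2 -> v1 switch, only the tracked location can be torn down.
 * Returns how many of the actually-existing mappings are left behind. */
static int unmap_all(const uint64_t mapped[], unsigned int n)
{
	unsigned int i;
	int left = 0;

	for (i = 0; i < n; i++)
		if (mapped[i] != tracked_gfn)
			left++;		/* mapping the hypervisor forgot */
	return left;
}
```

Any leftover mapping points at a page that has already been freed, which is what makes the bug exploitable once the page is re-used.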
Xen Security Advisory 389 v3 (CVE-2021-28705,CVE-2021-28709) - issues with partially successful P2M updates on x86
-BEGIN PGP SIGNED MESSAGE- Hash: SHA256 Xen Security Advisory CVE-2021-28705,CVE-2021-28709 / XSA-389 version 3 issues with partially successful P2M updates on x86 UPDATES IN VERSION 3 Add CVE numbers to patches. Public release. ISSUE DESCRIPTION = x86 HVM and PVH guests may be started in populate-on-demand (PoD) mode, to provide a way for them to later easily have more memory assigned. Guests are permitted to control certain P2M aspects of individual pages via hypercalls. These hypercalls may act on ranges of pages specified via page orders (resulting in a power-of-2 number of pages). In some cases the hypervisor carries out the requests by splitting them into smaller chunks. Error handling in certain PoD cases has been insufficient in that in particular partial success of some operations was not properly accounted for. There are two code paths affected - page removal (CVE-2021-28705) and insertion of new pages (CVE-2021-28709). (We provide one patch which combines the fix to both issues.) IMPACT == Malicious or buggy guest kernels may be able to mount a Denial of Service (DoS) attack affecting the entire system. Privilege escalation and information leaks cannot be ruled out. VULNERABLE SYSTEMS == All Xen versions from 3.4 onwards are affected. Xen versions 3.3 and older are believed to not be affected. Only x86 HVM and PVH guests started in populate-on-demand mode are believed to be able to leverage the vulnerability. Populate-on-demand mode is activated when the guest's xl configuration file specifies a "maxmem" value which is larger than the "memory" value. MITIGATION == Not starting x86 HVM or PVH guests in populate-on-demand mode is believed to allow avoiding the vulnerability. CREDITS === This issue was discovered by Jan Beulich of SUSE. RESOLUTION == Applying the appropriate attached patch resolves this issue. 
Note that patches for released versions are generally prepared to apply to the stable branches, and may not apply cleanly to the most recent release tarball. Downstreams are encouraged to update to the tip of the stable branch before applying these patches. xsa389.patch xen-unstable xsa389-4.15.patch Xen 4.15.x xsa389-4.14.patch Xen 4.14.x xsa389-4.13.patch Xen 4.13.x xsa389-4.12.patch Xen 4.12.x $ sha256sum xsa389* c00f5b07594a6459bdd6f7334acc373bc3b0c14a5b0e444ec624ac60f857fc6f xsa389.patch bf0d66623c3239e334a17332035be5d7c7e33cfdd7f04f9b385f70ce8fa92752 xsa389-4.12.patch 2737affcf1e0fae5d412067ea8c7fe1cc91a28fa22f3f7e97a502cbd032582cc xsa389-4.13.patch b243284679b32ab8c817a2e41562d8694d9781fa8096c268bb41b0cd91684baa xsa389-4.14.patch 0a213e141089fe7808eae067b3c43beed6c7d5887fa4c901e8f9352618788e5a xsa389-4.15.patch $ DEPLOYMENT DURING EMBARGO = Deployment of the patches described above (or others which are substantially similar) is permitted during the embargo, even on public- facing systems with untrusted guest users and administrators. HOWEVER, deployment of the mitigation described above is NOT permitted during the embargo on public-facing systems with untrusted guest users and administrators. This is because such a configuration change is recognizable by the affected guests. AND: Distribution of updated software is prohibited (except to other members of the predisclosure list). Predisclosure list members who wish to deploy significantly different patches and/or mitigations, please contact the Xen Project Security Team. (Note: this during-embargo deployment notice is retained in post-embargo publicly released Xen Project advisories, even though it is then no longer applicable. This is to enable the community to have oversight of the Xen Project Security Team's decisionmaking.) 
For more information about permissible uses of embargoed information, consult the Xen Project community's agreed Security Policy: http://www.xenproject.org/security-policy.html
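The partial-success accounting issue in this advisory is an instance of the classic chunked-operation pitfall: when a ranged operation is split into chunks and a later chunk fails, the caller must be told how much actually completed, not just the final error. A generic sketch (hypothetical names, not the actual Xen code):

```c
#include <assert.h>
#include <stdint.h>

/* Process [gfn, gfn + count) in pieces of at most 'chunk' pages.
 * On a mid-range failure, report the amount already done rather than
 * discarding that information -- losing it is the bug class here. */
static long process_range(uint64_t gfn, unsigned long count,
                          unsigned long chunk,
                          int (*op)(uint64_t, unsigned long))
{
	unsigned long done = 0;

	while (done < count) {
		unsigned long n = count - done < chunk ? count - done : chunk;
		int rc = op(gfn + done, n);

		if (rc)
			/* Partial success: the caller must see 'done',
			 * not just rc, or accounting goes wrong. */
			return done ? (long)done : rc;
		done += n;
	}
	return (long)done;
}

/* Demo operation that only succeeds below gfn 0x10. */
static int demo_op(uint64_t gfn, unsigned long n)
{
	(void)n;
	return gfn >= 0x10 ? -1 : 0;
}
```

The fix amounts to making every such split-up path propagate `done` correctly, so PoD bookkeeping matches what really happened.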
[xen-unstable test] 166304: tolerable FAIL
flight 166304 xen-unstable real [real]
http://logs.test-lab.xenproject.org/osstest/logs/166304/

Failures :-/ but no regressions.

Tests which did not succeed, but are not blocking:
 test-amd64-amd64-xl-qemut-win7-amd64 19 guest-stop fail like 166263
 test-armhf-armhf-libvirt 16 saverestore-support-check fail like 166263
 test-amd64-amd64-qemuu-nested-amd 20 debian-hvm-install/l1/l2 fail like 166263
 test-amd64-amd64-xl-qemuu-ws16-amd64 19 guest-stop fail like 166263
 test-amd64-i386-xl-qemut-ws16-amd64 19 guest-stop fail like 166263
 test-amd64-i386-xl-qemut-win7-amd64 19 guest-stop fail like 166263
 test-armhf-armhf-libvirt-raw 15 saverestore-support-check fail like 166263
 test-armhf-armhf-libvirt-qcow2 15 saverestore-support-check fail like 166263
 test-amd64-i386-xl-qemuu-win7-amd64 19 guest-stop fail like 166263
 test-amd64-amd64-xl-qemut-ws16-amd64 19 guest-stop fail like 166263
 test-amd64-i386-xl-qemuu-ws16-amd64 19 guest-stop fail like 166263
 test-amd64-amd64-xl-qemuu-win7-amd64 19 guest-stop fail like 166263
 test-arm64-arm64-xl 15 migrate-support-check fail never pass
 test-arm64-arm64-xl 16 saverestore-support-check fail never pass
 test-arm64-arm64-xl-seattle 15 migrate-support-check fail never pass
 test-arm64-arm64-xl-seattle 16 saverestore-support-check fail never pass
 test-amd64-amd64-libvirt-xsm 15 migrate-support-check fail never pass
 test-amd64-i386-libvirt-xsm 15 migrate-support-check fail never pass
 test-amd64-amd64-libvirt 15 migrate-support-check fail never pass
 test-amd64-i386-xl-pvshim 14 guest-start fail never pass
 test-amd64-i386-libvirt 15 migrate-support-check fail never pass
 test-arm64-arm64-xl-credit1 15 migrate-support-check fail never pass
 test-arm64-arm64-libvirt-xsm 15 migrate-support-check fail never pass
 test-arm64-arm64-xl-credit1 16 saverestore-support-check fail never pass
 test-arm64-arm64-libvirt-xsm 16 saverestore-support-check fail never pass
 test-arm64-arm64-xl-thunderx 15 migrate-support-check fail never pass
 test-arm64-arm64-xl-thunderx 16 saverestore-support-check fail never pass
 test-arm64-arm64-xl-xsm 15 migrate-support-check fail never pass
 test-arm64-arm64-xl-xsm 16 saverestore-support-check fail never pass
 test-amd64-amd64-libvirt-qemuu-debianhvm-amd64-xsm 13 migrate-support-check fail never pass
 test-amd64-i386-libvirt-qemuu-debianhvm-amd64-xsm 13 migrate-support-check fail never pass
 test-armhf-armhf-xl-arndale 15 migrate-support-check fail never pass
 test-armhf-armhf-xl-arndale 16 saverestore-support-check fail never pass
 test-amd64-i386-libvirt-raw 14 migrate-support-check fail never pass
 test-amd64-amd64-libvirt-vhd 14 migrate-support-check fail never pass
 test-arm64-arm64-libvirt-raw 14 migrate-support-check fail never pass
 test-arm64-arm64-libvirt-raw 15 saverestore-support-check fail never pass
 test-arm64-arm64-xl-vhd 14 migrate-support-check fail never pass
 test-arm64-arm64-xl-vhd 15 saverestore-support-check fail never pass
 test-armhf-armhf-xl-credit1 15 migrate-support-check fail never pass
 test-armhf-armhf-xl-multivcpu 15 migrate-support-check fail never pass
 test-armhf-armhf-xl-credit1 16 saverestore-support-check fail never pass
 test-armhf-armhf-xl-multivcpu 16 saverestore-support-check fail never pass
 test-armhf-armhf-libvirt 15 migrate-support-check fail never pass
 test-armhf-armhf-xl-rtds 15 migrate-support-check fail never pass
 test-armhf-armhf-xl-rtds 16 saverestore-support-check fail never pass
 test-armhf-armhf-libvirt-raw 14 migrate-support-check fail never pass
 test-arm64-arm64-xl-credit2 15 migrate-support-check fail never pass
 test-arm64-arm64-xl-credit2 16 saverestore-support-check fail never pass
 test-armhf-armhf-libvirt-qcow2 14 migrate-support-check fail never pass
 test-armhf-armhf-xl-vhd 14 migrate-support-check fail never pass
 test-armhf-armhf-xl-vhd 15 saverestore-support-check fail never pass
 test-armhf-armhf-xl 15 migrate-support-check fail never pass
 test-armhf-armhf-xl 16 saverestore-support-check fail never pass
 test-armhf-armhf-xl-credit2 15 migrate-support-check fail never pass
 test-armhf-armhf-xl-credit2 16 saverestore-support-check fail never pass
 test-armhf-armhf-xl-cubietruck 15 migrate-support-check fail never pass
 test-armhf-armhf-xl-cubietruck 16 saverestore-support-check fail never pass

version targeted for testing:
 xen be12fcca8b784e456df3adedbffe657d753c5ff9
baseline version:
 xen
Re: [RFC PATCH] Added the logic to decode 32 bit ldr/str post-indexing instructions
Hi,

> On 23 Nov 2021, at 08:53, Bertrand Marquis wrote:
>
> Hi Stefano,
>
>> On 23 Nov 2021, at 04:13, Stefano Stabellini wrote:
>>
>> On Mon, 22 Nov 2021, Julien Grall wrote:
>>> On Mon, 22 Nov 2021 at 19:59, Stefano Stabellini wrote:
>>>> On Mon, 22 Nov 2021, Ayan Kumar Halder wrote:
>>>>> Stefano> It doesn't look like we are setting dabt->write anywhere.
>>>>>
>>>>> Ayan> Yes, this is a miss. Depending on the opc, this should be updated
>>>>> accordingly in decode_64bit_loadstore_postindexing().
>>>>>
>>>>> Stefano> Also, is info.gpa in try_handle_mmio already updated in the
>>>>> pre-index case? If not, do we need to apply the offset manually here?
>>>>>
>>>>> Ayan> Sorry, I did not understand you. This patch is only related to the
>>>>> post indexing ldr/str issue. Why do we need to check for pre-indexing?
>>>>
>>>> I thought you were trying to handle both post-indexing and pre-indexing.
>>>> It is OK if you intend to only handle post-indexing but considering that
>>>> most of the code is shared between the two, we might as well also make
>>>> pre-indexing work (unless it turns out it is more difficult).
>>>
>>> Wouldn't this effectively be dead code?
>
> I agree this would be dead code. Pre-indexing is handled by the hardware,
> only post are not.
>
>>>> In the pre-indexing case, I would imagine we need to update the base
>>>> address before taking any other actions.
>>>
>>> From my understanding, this would have already been performed by the
>>> HW when the syndrome is valid. This may also be the case for
>>> the non-valid case, but I haven't checked the Arm Arm.
>>
>> It is not clear to me either, that's why I wrote "I would imagine"... I
>> was guessing that it is not done by the HW in the non-valid case but I
>> don't know.
>
> This should be handled by the hardware here, so only post actions should
> be handled here.
>
>> Of course, if it is already done by the HW, that's all the better: no
>> need for us to do anything.
>
> I am wondering though if other types of accesses could not be handled here
> without major modification of the code like other sizes than 32bit.

I did some checks and I think the following cases could be handled:

    ldr  x2, [x1], #4
    nop
    ldr  w2, [x1], #-4
    nop
    ldrh w2, [x1], #8
    nop
    ldrb w2, [x1], #16
    nop
    str  x2, [x1], #4
    nop
    str  w2, [x1], #-4
    nop
    strh w2, [x1], #8
    nop
    strb w2, [x1], #16
    nop

Anything that I could have missed?

>
> There are also post instructions with shifting but to be honest I do not
> think this is really needed.

Please ignore this, there is no post shifting.

Once this is done I can test and add a test to XTF on arm (on our side,
upstreaming of this is in progress) to make sure this is maintained.

Regards
Bertrand
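For illustration only (this is not the patch's actual code, and the function and struct names here are invented): the cases listed above all fall into the A64 "load/store register (immediate post-indexed)" encoding class, so a decoder sketch only has to match that class and pull out the fields. Note how the opc field gives exactly the dabt->write information discussed earlier in the thread:

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical decoded form of a post-indexed ldr/str. */
struct postidx_insn {
    unsigned int rt, rn;
    int imm;            /* sign-extended imm9 writeback offset */
    unsigned int size;  /* access size in bytes: 1, 2, 4 or 8 */
    bool is_write;      /* true for stores (would feed dabt->write) */
};

static bool decode_postidx_ldr_str(uint32_t insn, struct postidx_insn *out)
{
    /*
     * Match "load/store register (immediate post-indexed)":
     * bits[29:27] == 0b111, V (bit 26) == 0, bits[25:24] == 0b00,
     * bit 21 == 0 and bits[11:10] == 0b01 (pre-indexed uses 0b11,
     * so those encodings are rejected here).
     */
    if ( ((insn >> 27) & 0x7) != 0x7 || ((insn >> 26) & 1) ||
         ((insn >> 24) & 0x3) || ((insn >> 21) & 1) ||
         ((insn >> 10) & 0x3) != 0x1 )
        return false;

    unsigned int size = (insn >> 30) & 0x3;   /* log2 of access size */
    unsigned int opc  = (insn >> 22) & 0x3;

    /* Only plain STR* (opc == 00) and LDR* (opc == 01) are handled;
     * sign-extending loads (opc 10/11) are left out of this sketch. */
    if ( opc > 1 )
        return false;

    out->size = 1u << size;
    out->is_write = (opc == 0);
    out->rt = insn & 0x1f;
    out->rn = (insn >> 5) & 0x1f;

    /* imm9 lives in bits[20:12]; sign-extend from 9 bits. */
    int imm = (insn >> 12) & 0x1ff;
    if ( imm & 0x100 )
        imm -= 0x200;
    out->imm = imm;
    return true;
}
```

For example, 0xB8404422 is `ldr w2, [x1], #4` and decodes to rt=2, rn=1, imm=4, a 4-byte read.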
Re: [PATCH 4/5] x86/traps: Drop exception_table[] and use if/else dispatching
On 19.11.2021 19:21, Andrew Cooper wrote:
> There is also a lot of redundancy in the table. 8 vectors head to do_trap(),
> 3 are handled in the IST logic, and that only leaves 7 others not heading to
> the do_reserved_trap() catch-all. This also removes the fragility that any
> accidental NULL entry in the table becomes a ticking timebomb.
>
> Function pointers are expensive under retpoline, and different vectors have
> wildly different frequencies. Drop the indirect call, and use an if/else chain
> instead, which is a code layout technique used by profile-guided optimisation.
>
> Using Xen's own perfcounter infrastructure, we see the following frequencies of
> vectors measured from boot until I can SSH into dom0 and collect the stats:
>
>   vec |  CFL-R  |  Milan  | Notes
>   ----+---------+---------+------
>   NMI |     345 |    3768 | Watchdog. Milan has many more CPUs.
>   ----+---------+---------+------
>   #PF | 1233234 | 2006441 |
>   #GP |   90054 |   96193 |
>   #UD |     848 |     851 |
>   #NM |       0 |     132 | Per-vendor lazy vs eager FPU policy.
>   #DB |      67 |      67 | No clue, but it's something in userspace.
>
> Bloat-o-meter (after some manual insertion of ELF metadata) reports:
>
>   add/remove: 0/1 grow/shrink: 2/0 up/down: 102/-256 (-154)
>   Function                 old     new   delta
>   handle_exception_saved   148     226     +78
>   handle_ist_exception     453     477     +24
>   exception_table          256       -    -256
>
> showing that the if/else chains are less than half the size that
> exception_table[] was in the first place.
>
> As part of this change, make two other minor changes. do_reserved_trap() is
> renamed to do_unhandled_trap() because it is the catchall, and already covers
> things that aren't reserved any more (#VE/#VC/#HV/#SX).
>
> Furthermore, don't forward #TS to guests. #TS is specifically for errors
> relating to the Task State Segment, which is a Xen-owned structure, not a
> guest-owned structure. Even in the 32bit days, we never let guests register
> their own Task State Segments.
>
> Signed-off-by: Andrew Cooper

As it feels like we won't be coming to an agreement here, despite my
reservations against the specific form of some of this and on the basis
that the code change itself is certainly an improvement:
Acked-by: Jan Beulich

I'd nevertheless be curious whether in five or ten years time you'd still
agree with your choice of basing a decision on not really representative
data (at least as far as what the description says).

Jan
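The layout technique the patch description argues for can be shown in miniature. This is a toy sketch, not Xen's actual handlers: the handler names and hit counters are invented here, and the point is only the shape of the dispatch, with the hottest vector tested first instead of an indirect call through a table:

```c
#include <stdint.h>

/* Invented handler stubs that just count how often they run. */
static int hits_pf, hits_gp, hits_other;

static void do_page_fault(void)         { hits_pf++; }
static void do_general_protection(void) { hits_gp++; }
static void do_unhandled_trap(void)     { hits_other++; }

/*
 * Frequency-ordered if/else dispatch: direct, predictable branches,
 * no retpoline-expensive indirect call through a function-pointer
 * table, and no NULL entries to go wrong.
 */
static void dispatch(uint8_t vector)
{
    if ( vector == 14 )        /* #PF: by far the most frequent above */
        do_page_fault();
    else if ( vector == 13 )   /* #GP: next most frequent */
        do_general_protection();
    else                       /* everything else hits the catch-all
                                  in this toy */
        do_unhandled_trap();
}
```

Branch predictors handle the first one or two compares essentially for free, which is why profile-guided optimisation emits exactly this shape for hot, skewed dispatch sites.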
Re: [RFC PATCH] Added the logic to decode 32 bit ldr/str post-indexing instructions
Hi Stefano,

> On 23 Nov 2021, at 04:13, Stefano Stabellini wrote:
>
> On Mon, 22 Nov 2021, Julien Grall wrote:
>> On Mon, 22 Nov 2021 at 19:59, Stefano Stabellini wrote:
>>>
>>> On Mon, 22 Nov 2021, Ayan Kumar Halder wrote:
>>>> Stefano> It doesn't look like we are setting dabt->write anywhere.
>>>>
>>>> Ayan> Yes, this is a miss. Depending on the opc, this should be updated
>>>> accordingly in decode_64bit_loadstore_postindexing().
>>>>
>>>> Stefano> Also, is info.gpa in try_handle_mmio already updated in the
>>>> pre-index case? If not, do we need to apply the offset manually here?
>>>>
>>>> Ayan> Sorry, I did not understand you. This patch is only related to the
>>>> post indexing ldr/str issue. Why do we need to check for pre-indexing?
>>>
>>> I thought you were trying to handle both post-indexing and pre-indexing.
>>> It is OK if you intend to only handle post-indexing but considering that
>>> most of the code is shared between the two, we might as well also make
>>> pre-indexing work (unless it turns out it is more difficult).
>>
>> Wouldn't this effectively be dead code?

I agree this would be dead code. Pre-indexing is handled by the hardware,
only post are not.

>>
>>> In the pre-indexing case, I would imagine we need to update the base
>>> address before taking any other actions.
>>
>> From my understanding, this would have already been performed by the
>> HW when the syndrome is valid. This may also be the case for
>> the non-valid case, but I haven't checked the Arm Arm.
>
> It is not clear to me either, that's why I wrote "I would imagine"... I
> was guessing that it is not done by the HW in the non-valid case but I
> don't know.

This should be handled by the hardware here, so only post actions should
be handled here.

> Of course, if it is already done by the HW, that's all the better: no
> need for us to do anything.

I am wondering though if other types of accesses could not be handled here
without major modification of the code like other sizes than 32bit.

There are also post instructions with shifting but to be honest I do not
think this is really needed.

Regards
Bertrand
Re: [PATCH v3] xen: detect uninitialized xenbus in xenbus_init
On 23.11.2021 06:42, Juergen Gross wrote:
> On 22.11.21 23:16, Stefano Stabellini wrote:
>> From: Stefano Stabellini
>>
>> If the xenstore page hasn't been allocated properly, reading the value
>> of the related hvm_param (HVM_PARAM_STORE_PFN) won't actually return
>> error. Instead, it will succeed and return zero. Instead of attempting
>> to xen_remap a bad guest physical address, detect this condition and
>> return early.
>>
>> Note that although a guest physical address of zero for
>> HVM_PARAM_STORE_PFN is theoretically possible, it is not a good choice
>> and zero has never been validly used in that capacity.
>>
>> Also recognize the invalid value of INVALID_PFN which is ULLONG_MAX.
>>
>> For 32-bit Linux, any pfn above ULONG_MAX would get truncated. Pfns
>> above ULONG_MAX should never be passed by the Xen tools to HVM guests
>> anyway, so check for this condition and return early.
>>
>> Cc: sta...@vger.kernel.org
>> Signed-off-by: Stefano Stabellini
>> ---
>> Changes in v3:
>> - improve in-code comment
>> - improve check
>>
>> Changes in v2:
>> - add check for ULLONG_MAX (uninitialized)
>> - add check for ULONG_MAX #if BITS_PER_LONG == 32 (actual error)
>> - add pr_err error message
>> ---
>>  drivers/xen/xenbus/xenbus_probe.c | 24 ++++++++++++++++++++++++
>>  1 file changed, 24 insertions(+)
>>
>> diff --git a/drivers/xen/xenbus/xenbus_probe.c
>> b/drivers/xen/xenbus/xenbus_probe.c
>> index 94405bb3829e..d3ca57d48a73 100644
>> --- a/drivers/xen/xenbus/xenbus_probe.c
>> +++ b/drivers/xen/xenbus/xenbus_probe.c
>> @@ -951,6 +951,30 @@ static int __init xenbus_init(void)
>>      err = hvm_get_parameter(HVM_PARAM_STORE_PFN, &v);
>>      if (err)
>>          goto out_error;
>> +    /*
>> +     * Uninitialized hvm_params are zero and return no error.
>> +     * Although it is theoretically possible to have
>> +     * HVM_PARAM_STORE_PFN set to zero on purpose, in reality it is
>> +     * not zero when valid. If zero, it means that Xenstore hasn't
>> +     * been properly initialized. Instead of attempting to map a
>> +     * wrong guest physical address return error.
>> +     *
>> +     * Also recognize the invalid value of INVALID_PFN which is
>> +     * ULLONG_MAX.
>
> Adjust the comment, e.g. s/ULLONG_MAX/all bits set/ (in the commit
> message, too)?

I also don't think the reference to INVALID_PFN is appropriate here.
Afaict the two aren't the same on 32-bit. Plus I can't even find a
constant named this way in Linux'es include/.

>> +     */
>> +    if (!v || !(v + 1)) {
>
> For me "if (!v || !~v)" would be more readable, but I don't really feel
> strong here.

Oh, indeed.

Jan
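The two spellings under discussion are equivalent for unsigned arithmetic: `!(v + 1)` is true exactly when v is all ones (the value wraps to 0), which is what Jan's `!~v` says more directly. A minimal standalone sketch of the check, with a hypothetical helper name (the real code is open-coded in xenbus_init):

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Reject the two values a never-initialized HVM_PARAM_STORE_PFN can
 * take: 0 (hvm_get_parameter succeeds but the param was never set)
 * and all bits set (the "invalid/unset" marker).  For uint64_t,
 * !(v + 1) and !~v are the same test: both fire only when v is
 * 0xffffffffffffffff, since unsigned overflow wraps to 0.
 */
static bool xenstore_pfn_invalid(uint64_t v)
{
    return !v || !(v + 1);
}
```

Any genuine page frame number in between passes the check and can then be remapped.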
[qemu-mainline test] 166300: tolerable FAIL - PUSHED
flight 166300 qemu-mainline real [real]
flight 166306 qemu-mainline real-retest [real]
http://logs.test-lab.xenproject.org/osstest/logs/166300/
http://logs.test-lab.xenproject.org/osstest/logs/166306/

Failures :-/ but no regressions.

Tests which are failing intermittently (not blocking):
 test-armhf-armhf-xl-credit2 10 host-ping-check-xen                 fail pass in 166306-retest

Tests which did not succeed, but are not blocking:
 test-armhf-armhf-xl-credit2 15 migrate-support-check               fail in 166306 never pass
 test-armhf-armhf-xl-credit2 16 saverestore-support-check           fail in 166306 never pass
 test-amd64-amd64-xl-qemuu-win7-amd64 19 guest-stop                 fail like 166267
 test-amd64-amd64-qemuu-nested-amd 20 debian-hvm-install/l1/l2      fail like 166267
 test-amd64-i386-xl-qemuu-win7-amd64 19 guest-stop                  fail like 166267
 test-armhf-armhf-libvirt-qcow2 15 saverestore-support-check        fail like 166267
 test-armhf-armhf-libvirt-raw 15 saverestore-support-check          fail like 166267
 test-armhf-armhf-libvirt 16 saverestore-support-check              fail like 166267
 test-amd64-i386-xl-qemuu-ws16-amd64 19 guest-stop                  fail like 166267
 test-amd64-amd64-xl-qemuu-ws16-amd64 19 guest-stop                 fail like 166267
 test-arm64-arm64-xl-seattle 15 migrate-support-check               fail never pass
 test-arm64-arm64-xl-seattle 16 saverestore-support-check           fail never pass
 test-arm64-arm64-xl 15 migrate-support-check                       fail never pass
 test-arm64-arm64-xl 16 saverestore-support-check                   fail never pass
 test-amd64-amd64-libvirt-xsm 15 migrate-support-check              fail never pass
 test-amd64-amd64-libvirt 15 migrate-support-check                  fail never pass
 test-amd64-i386-libvirt 15 migrate-support-check                   fail never pass
 test-amd64-i386-libvirt-xsm 15 migrate-support-check               fail never pass
 test-amd64-i386-xl-pvshim 14 guest-start                           fail never pass
 test-arm64-arm64-xl-xsm 15 migrate-support-check                   fail never pass
 test-arm64-arm64-xl-xsm 16 saverestore-support-check               fail never pass
 test-arm64-arm64-libvirt-xsm 15 migrate-support-check              fail never pass
 test-arm64-arm64-xl-credit1 15 migrate-support-check               fail never pass
 test-arm64-arm64-libvirt-xsm 16 saverestore-support-check          fail never pass
 test-arm64-arm64-xl-credit1 16 saverestore-support-check           fail never pass
 test-arm64-arm64-xl-thunderx 15 migrate-support-check              fail never pass
 test-arm64-arm64-xl-thunderx 16 saverestore-support-check          fail never pass
 test-arm64-arm64-xl-credit2 15 migrate-support-check               fail never pass
 test-arm64-arm64-xl-credit2 16 saverestore-support-check           fail never pass
 test-amd64-i386-libvirt-qemuu-debianhvm-amd64-xsm 13 migrate-support-check fail never pass
 test-amd64-amd64-libvirt-qemuu-debianhvm-amd64-xsm 13 migrate-support-check fail never pass
 test-armhf-armhf-xl-arndale 15 migrate-support-check               fail never pass
 test-armhf-armhf-xl-arndale 16 saverestore-support-check           fail never pass
 test-amd64-i386-libvirt-raw 14 migrate-support-check               fail never pass
 test-amd64-amd64-libvirt-vhd 14 migrate-support-check              fail never pass
 test-arm64-arm64-libvirt-raw 14 migrate-support-check              fail never pass
 test-arm64-arm64-libvirt-raw 15 saverestore-support-check          fail never pass
 test-arm64-arm64-xl-vhd 14 migrate-support-check                   fail never pass
 test-arm64-arm64-xl-vhd 15 saverestore-support-check               fail never pass
 test-armhf-armhf-xl-multivcpu 15 migrate-support-check             fail never pass
 test-armhf-armhf-xl-multivcpu 16 saverestore-support-check         fail never pass
 test-armhf-armhf-xl 15 migrate-support-check                       fail never pass
 test-armhf-armhf-xl 16 saverestore-support-check                   fail never pass
 test-armhf-armhf-xl-rtds 15 migrate-support-check                  fail never pass
 test-armhf-armhf-xl-rtds 16 saverestore-support-check              fail never pass
 test-armhf-armhf-xl-cubietruck 15 migrate-support-check            fail never pass
 test-armhf-armhf-xl-cubietruck 16 saverestore-support-check        fail never pass
 test-armhf-armhf-libvirt-qcow2 14 migrate-support-check            fail never pass
 test-armhf-armhf-libvirt-raw 14 migrate-support-check              fail never pass
 test-armhf-armhf-xl-vhd 14 migrate-support-check                   fail never pass
 test-armhf-armhf-xl-vhd 15 saverestore-support-check               fail never pass
 test-armhf-armhf-xl-credit1 15 migrate-support-check               fail never pass
 test-armhf-armhf-xl-credit1 16 saverestore-support-check           fail never pass
 test-armhf-armhf-libvirt 15 migrate-support-check                  fail never pass

version targeted for testing:
 qemuu                89d2f9e4c63799f7f03e9180c63b7dc45fc2a04a
baseline version:
 qemuu                edf1aa8d44d5070828ac10c5b82dbc6caf29e7d3

Last test of basis 166267 2021-11-22
Re: [PATCH 1/2] rangeset: add RANGESETF_no_print flag
On 23.11.21 10:01, Jan Beulich wrote:
> On 23.11.2021 08:49, Oleksandr Andrushchenko wrote:
>>
>> On 23.11.21 09:38, Jan Beulich wrote:
>>> On 22.11.2021 10:28, Oleksandr Andrushchenko wrote:
>>>> --- a/xen/include/xen/rangeset.h
>>>> +++ b/xen/include/xen/rangeset.h
>>>> @@ -51,6 +51,9 @@ void rangeset_limit(
>>>>  /* Pretty-print range limits in hexadecimal. */
>>>>  #define _RANGESETF_prettyprint_hex 0
>>>>  #define RANGESETF_prettyprint_hex (1U << _RANGESETF_prettyprint_hex)
>>>> +/* Do not print entries marked with this flag. */
>>>> +#define _RANGESETF_no_print 1
>>> There's no need for this, so I'd like to ask that you add ...
>>>
>>>> +#define RANGESETF_no_print (1U << _RANGESETF_no_print)
>>> ... just this. In due course we should do away with
>>> _RANGESETF_prettyprint_hex as well (unless you want to do so right
>>> at this occasion).
>> Ok, I can remove "#define _RANGESETF_prettyprint_hex 0" as well
>> and have both flags defined as
>>
>>   /* Pretty-print range limits in hexadecimal. */
>>   #define RANGESETF_prettyprint_hex (1U << 0)
>>   /* Do not print entries marked with this flag. */
>>   #define RANGESETF_no_print (1U << 1)
>>
>> Or we can use BIT macro here:
>>
>>   /* Pretty-print range limits in hexadecimal. */
>>   #define RANGESETF_prettyprint_hex BIT(0, U)
>>   /* Do not print entries marked with this flag. */
>>   #define RANGESETF_no_print BIT(1, U)
>>
>> What's your preference here?
> Clearly the former.

I am fine with this too, so I'll use it.

> But this may not be everybody's view. I'm hesitant
> towards further uses at least as long as its generalization, GENMASK(),
> isn't ready for easy use. It was some time ago that we had a discussion
> about that one, supporting my reservations voiced back at the time when
> it was introduced into our code base. Iirc there was more than one
> issue with it; the immediately obvious one is that it silently does the
> wrong thing when the arguments get specified the wrong way round, when
> it really would be relatively easy to support either ordering.
>
> Jan

Thank you,
Oleksandr
Re: [PATCH 1/2] rangeset: add RANGESETF_no_print flag
On 23.11.2021 08:49, Oleksandr Andrushchenko wrote:
>
> On 23.11.21 09:38, Jan Beulich wrote:
>> On 22.11.2021 10:28, Oleksandr Andrushchenko wrote:
>>> --- a/xen/include/xen/rangeset.h
>>> +++ b/xen/include/xen/rangeset.h
>>> @@ -51,6 +51,9 @@ void rangeset_limit(
>>>  /* Pretty-print range limits in hexadecimal. */
>>>  #define _RANGESETF_prettyprint_hex 0
>>>  #define RANGESETF_prettyprint_hex (1U << _RANGESETF_prettyprint_hex)
>>> +/* Do not print entries marked with this flag. */
>>> +#define _RANGESETF_no_print 1
>> There's no need for this, so I'd like to ask that you add ...
>>
>>> +#define RANGESETF_no_print (1U << _RANGESETF_no_print)
>> ... just this. In due course we should do away with
>> _RANGESETF_prettyprint_hex as well (unless you want to do so right
>> at this occasion).
> Ok, I can remove "#define _RANGESETF_prettyprint_hex 0" as well
> and have both flags defined as
>
>   /* Pretty-print range limits in hexadecimal. */
>   #define RANGESETF_prettyprint_hex (1U << 0)
>   /* Do not print entries marked with this flag. */
>   #define RANGESETF_no_print (1U << 1)
>
> Or we can use BIT macro here:
>
>   /* Pretty-print range limits in hexadecimal. */
>   #define RANGESETF_prettyprint_hex BIT(0, U)
>   /* Do not print entries marked with this flag. */
>   #define RANGESETF_no_print BIT(1, U)
>
> What's your preference here?

Clearly the former. But this may not be everybody's view. I'm hesitant
towards further uses at least as long as its generalization, GENMASK(),
isn't ready for easy use. It was some time ago that we had a discussion
about that one, supporting my reservations voiced back at the time when
it was introduced into our code base. Iirc there was more than one
issue with it; the immediately obvious one is that it silently does the
wrong thing when the arguments get specified the wrong way round, when
it really would be relatively easy to support either ordering.

Jan
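The outcome the thread converges on, plain shifted constants with no separate bit-number macros, can be sketched as below. This is a condensed illustration of the agreed style, not the committed rangeset.h, and rangeset_should_print() is a made-up helper standing in for the flag test the real printer would do:

```c
#include <stdbool.h>

/* Flag style agreed above: shifted constants directly, no
 * _RANGESETF_* bit-number indirection and no BIT() helper. */
/* Pretty-print range limits in hexadecimal. */
#define RANGESETF_prettyprint_hex (1U << 0)
/* Do not print entries marked with this flag. */
#define RANGESETF_no_print        (1U << 1)

/* Toy check a rangeset printer might apply per entry. */
static bool rangeset_should_print(unsigned int flags)
{
    return !(flags & RANGESETF_no_print);
}
```

The flags still combine with `|` and test with `&` exactly as before; only the definition style changes.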