[xen-4.13-testing test] 169180: tolerable FAIL - PUSHED
flight 169180 xen-4.13-testing real [real]
http://logs.test-lab.xenproject.org/osstest/logs/169180/

Failures :-/ but no regressions.

Tests which did not succeed, but are not blocking:
 test-amd64-amd64-xl-qemuu-win7-amd64 19 guest-stop fail like 168481
 test-amd64-i386-xl-qemuu-win7-amd64 19 guest-stop fail like 168481
 test-armhf-armhf-libvirt-qcow2 15 saverestore-support-check fail like 168481
 test-amd64-i386-xl-qemut-win7-amd64 19 guest-stop fail like 168481
 test-armhf-armhf-libvirt-raw 15 saverestore-support-check fail like 168481
 test-amd64-amd64-qemuu-nested-amd 20 debian-hvm-install/l1/l2 fail like 168481
 test-amd64-i386-xl-qemut-ws16-amd64 19 guest-stop fail like 168481
 test-amd64-amd64-xl-qemut-win7-amd64 19 guest-stop fail like 168481
 test-amd64-i386-xl-qemuu-ws16-amd64 19 guest-stop fail like 168481
 test-armhf-armhf-libvirt 16 saverestore-support-check fail like 168481
 test-amd64-amd64-xl-qemuu-ws16-amd64 19 guest-stop fail like 168481
 test-amd64-amd64-xl-qemut-ws16-amd64 19 guest-stop fail like 168481
 test-arm64-arm64-xl 15 migrate-support-check fail never pass
 test-arm64-arm64-xl 16 saverestore-support-check fail never pass
 test-amd64-i386-libvirt-xsm 15 migrate-support-check fail never pass
 test-amd64-amd64-libvirt-xsm 15 migrate-support-check fail never pass
 test-amd64-i386-xl-pvshim 14 guest-start fail never pass
 test-amd64-i386-libvirt 15 migrate-support-check fail never pass
 test-amd64-amd64-libvirt 15 migrate-support-check fail never pass
 test-arm64-arm64-libvirt-xsm 15 migrate-support-check fail never pass
 test-arm64-arm64-libvirt-xsm 16 saverestore-support-check fail never pass
 test-arm64-arm64-xl-seattle 15 migrate-support-check fail never pass
 test-arm64-arm64-xl-seattle 16 saverestore-support-check fail never pass
 test-arm64-arm64-xl-credit2 15 migrate-support-check fail never pass
 test-arm64-arm64-xl-credit2 16 saverestore-support-check fail never pass
 test-amd64-i386-libvirt-raw 14 migrate-support-check fail never pass
 test-armhf-armhf-xl-vhd 14 migrate-support-check fail never pass
 test-armhf-armhf-xl-vhd 15 saverestore-support-check fail never pass
 test-amd64-amd64-libvirt-qemuu-debianhvm-amd64-xsm 13 migrate-support-check fail never pass
 test-amd64-amd64-libvirt-vhd 14 migrate-support-check fail never pass
 test-armhf-armhf-xl-arndale 15 migrate-support-check fail never pass
 test-armhf-armhf-xl-arndale 16 saverestore-support-check fail never pass
 test-armhf-armhf-xl-multivcpu 15 migrate-support-check fail never pass
 test-armhf-armhf-xl-multivcpu 16 saverestore-support-check fail never pass
 test-armhf-armhf-xl-credit1 15 migrate-support-check fail never pass
 test-armhf-armhf-xl-credit1 16 saverestore-support-check fail never pass
 test-armhf-armhf-xl-rtds 15 migrate-support-check fail never pass
 test-armhf-armhf-xl-rtds 16 saverestore-support-check fail never pass
 test-armhf-armhf-xl 15 migrate-support-check fail never pass
 test-armhf-armhf-xl 16 saverestore-support-check fail never pass
 test-arm64-arm64-xl-thunderx 15 migrate-support-check fail never pass
 test-arm64-arm64-xl-thunderx 16 saverestore-support-check fail never pass
 test-arm64-arm64-xl-xsm 15 migrate-support-check fail never pass
 test-arm64-arm64-xl-xsm 16 saverestore-support-check fail never pass
 test-arm64-arm64-xl-credit1 15 migrate-support-check fail never pass
 test-arm64-arm64-xl-credit1 16 saverestore-support-check fail never pass
 test-armhf-armhf-xl-credit2 15 migrate-support-check fail never pass
 test-armhf-armhf-xl-credit2 16 saverestore-support-check fail never pass
 test-amd64-i386-libvirt-qemuu-debianhvm-amd64-xsm 13 migrate-support-check fail never pass
 test-armhf-armhf-libvirt-qcow2 14 migrate-support-check fail never pass
 test-arm64-arm64-libvirt-raw 14 migrate-support-check fail never pass
 test-arm64-arm64-libvirt-raw 15 saverestore-support-check fail never pass
 test-arm64-arm64-xl-vhd 14 migrate-support-check fail never pass
 test-arm64-arm64-xl-vhd 15 saverestore-support-check fail never pass
 test-armhf-armhf-xl-cubietruck 15 migrate-support-check fail never pass
 test-armhf-armhf-xl-cubietruck 16 saverestore-support-check fail never pass
 test-armhf-armhf-libvirt-raw 14 migrate-support-check fail never pass
 test-armhf-armhf-libvirt 15 migrate-support-check fail never pass

version targeted for testing:
 xen 169a2834ef5d723091f187a5d6493ae77825757a
baseline version:
 xen
Re: Increasing domain memory beyond initial maxmem
On 05.04.22 18:24, Marek Marczykowski-Górecki wrote:
On Tue, Apr 05, 2022 at 01:03:57PM +0200, Juergen Gross wrote:
Hi Marek,
On 31.03.22 14:36, Marek Marczykowski-Górecki wrote:
On Thu, Mar 31, 2022 at 02:22:03PM +0200, Juergen Gross wrote:

Maybe some kernel config differences, or other udev rules (memory onlining is done via udev in my guest)? I'm seeing:

    # zgrep MEMORY_HOTPLUG /proc/config.gz
    CONFIG_ARCH_ENABLE_MEMORY_HOTPLUG=y
    CONFIG_MEMORY_HOTPLUG=y
    # CONFIG_MEMORY_HOTPLUG_DEFAULT_ONLINE is not set
    CONFIG_XEN_BALLOON_MEMORY_HOTPLUG=y
    CONFIG_XEN_MEMORY_HOTPLUG_LIMIT=512

I have:

    # zgrep MEMORY_HOTPLUG /proc/config.gz
    CONFIG_ARCH_ENABLE_MEMORY_HOTPLUG=y
    CONFIG_MEMORY_HOTPLUG=y
    CONFIG_MEMORY_HOTPLUG_DEFAULT_ONLINE=y
    CONFIG_XEN_BALLOON_MEMORY_HOTPLUG=y
    CONFIG_XEN_MEMORY_HOTPLUG_LIMIT=512

Not sure if relevant, but I also have:

    CONFIG_XEN_UNPOPULATED_ALLOC=y

On top of that, I have a similar udev rule too:

    SUBSYSTEM=="memory", ACTION=="add", ATTR{state}=="offline", ATTR{state}="online"

But I don't think they are conflicting.

What type of guest are you using? Mine was a PVH guest.

PVH here too.

Would you like to try the attached patch? It seemed to work for me.

Unfortunately it doesn't help, now the behavior is different:

Initially guest started with 800M:

    [root@personal ~]# free -m
                  total        used        free      shared  buff/cache   available
    Mem:            740         223         272           2         243         401
    Swap:          1023           0        1023

Then increased:

    [root@dom0 ~]$ xl mem-max personal 2048
    [root@dom0 ~]$ xenstore-write /local/domain/$(xl domid personal)/memory/static-max $((2048*1024))
    [root@dom0 ~]$ xl mem-set personal 2000

And guest shows now only a little more memory, but not full 2000M:

    [root@personal ~]#
    [ 37.657046] xen:balloon: Populating new zone
    [ 37.658206] Fallback order for Node 0: 0
    [ 37.658219] Built 1 zonelists, mobility grouping on. Total pages: 175889
    [ 37.658233] Policy zone: Normal
    [root@personal ~]#
    [root@personal ~]# free -m
                  total        used        free      shared  buff/cache   available
    Mem:            826         245         337           2         244         462
    Swap:          1023           0        1023

I've applied the patch on top of 5.16.18. If you think 5.17 would make a difference, I can try that too.

Hmm, weird. Can you please post the output of

    cat /proc/buddyinfo
    cat /proc/iomem

in the guest before and after the operations?

Juergen
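For readers unfamiliar with the udev rule quoted in this thread, its effect on a single memory block can be sketched in C. This is only an illustration, not code from the thread: the helper name `online_if_offline` is made up, and it assumes the usual sysfs layout in which each memory block exposes a `state` attribute (for example `/sys/devices/system/memory/memoryN/state`) that reads `offline` or `online`.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper mirroring the udev rule: if the memory block's
 * state attribute reads "offline", write "online" to it.
 * Returns 1 if the block was switched online, 0 if no change was
 * needed, -1 on I/O error.
 */
int online_if_offline(const char *state_path)
{
    char state[32] = "";
    FILE *f = fopen(state_path, "r");

    if (!f)
        return -1;
    if (!fgets(state, sizeof(state), f)) {
        fclose(f);
        return -1;
    }
    fclose(f);

    /* The attribute ends in a newline, so compare only the keyword. */
    if (strncmp(state, "offline", 7) != 0)
        return 0;

    f = fopen(state_path, "w");
    if (!f)
        return -1;
    if (fputs("online", f) == EOF) {
        fclose(f);
        return -1;
    }
    return fclose(f) == 0 ? 1 : -1;
}
```

With CONFIG_MEMORY_HOTPLUG_DEFAULT_ONLINE=y the kernel onlines new blocks itself, so a helper like this (or the udev rule) would normally find nothing left to do, which matches the remark above that the two should not conflict.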
[xen-4.15-testing test] 169178: tolerable FAIL - PUSHED
flight 169178 xen-4.15-testing real [real]
http://logs.test-lab.xenproject.org/osstest/logs/169178/

Failures :-/ but no regressions.

Tests which did not succeed, but are not blocking:
 test-amd64-amd64-xl-qemuu-win7-amd64 19 guest-stop fail like 169162
 test-amd64-amd64-xl-qemut-win7-amd64 19 guest-stop fail like 169162
 test-armhf-armhf-libvirt 16 saverestore-support-check fail like 169162
 test-amd64-amd64-qemuu-nested-amd 20 debian-hvm-install/l1/l2 fail like 169162
 test-amd64-amd64-xl-qemuu-ws16-amd64 19 guest-stop fail like 169162
 test-amd64-amd64-xl-qemut-ws16-amd64 19 guest-stop fail like 169162
 test-amd64-i386-xl-qemuu-ws16-amd64 19 guest-stop fail like 169162
 test-armhf-armhf-libvirt-qcow2 15 saverestore-support-check fail like 169162
 test-armhf-armhf-libvirt-raw 15 saverestore-support-check fail like 169162
 test-amd64-i386-xl-qemut-win7-amd64 19 guest-stop fail like 169162
 test-amd64-i386-xl-qemut-ws16-amd64 19 guest-stop fail like 169162
 test-amd64-i386-xl-qemuu-win7-amd64 19 guest-stop fail like 169162
 test-amd64-i386-libvirt-xsm 15 migrate-support-check fail never pass
 test-arm64-arm64-xl-seattle 15 migrate-support-check fail never pass
 test-arm64-arm64-xl-seattle 16 saverestore-support-check fail never pass
 test-amd64-i386-xl-pvshim 14 guest-start fail never pass
 test-amd64-amd64-libvirt-xsm 15 migrate-support-check fail never pass
 test-amd64-amd64-libvirt 15 migrate-support-check fail never pass
 test-amd64-i386-libvirt 15 migrate-support-check fail never pass
 test-arm64-arm64-xl-credit1 15 migrate-support-check fail never pass
 test-arm64-arm64-xl-credit1 16 saverestore-support-check fail never pass
 test-arm64-arm64-xl 15 migrate-support-check fail never pass
 test-arm64-arm64-xl 16 saverestore-support-check fail never pass
 test-arm64-arm64-xl-xsm 15 migrate-support-check fail never pass
 test-arm64-arm64-xl-xsm 16 saverestore-support-check fail never pass
 test-arm64-arm64-libvirt-xsm 15 migrate-support-check fail never pass
 test-arm64-arm64-libvirt-xsm 16 saverestore-support-check fail never pass
 test-arm64-arm64-xl-thunderx 15 migrate-support-check fail never pass
 test-arm64-arm64-xl-thunderx 16 saverestore-support-check fail never pass
 test-amd64-amd64-libvirt-qemuu-debianhvm-amd64-xsm 13 migrate-support-check fail never pass
 test-amd64-i386-libvirt-qemuu-debianhvm-amd64-xsm 13 migrate-support-check fail never pass
 test-arm64-arm64-xl-credit2 15 migrate-support-check fail never pass
 test-arm64-arm64-xl-credit2 16 saverestore-support-check fail never pass
 test-amd64-i386-libvirt-raw 14 migrate-support-check fail never pass
 test-amd64-amd64-libvirt-vhd 14 migrate-support-check fail never pass
 test-arm64-arm64-libvirt-raw 14 migrate-support-check fail never pass
 test-arm64-arm64-libvirt-raw 15 saverestore-support-check fail never pass
 test-armhf-armhf-xl-multivcpu 15 migrate-support-check fail never pass
 test-armhf-armhf-xl-multivcpu 16 saverestore-support-check fail never pass
 test-armhf-armhf-xl 15 migrate-support-check fail never pass
 test-armhf-armhf-xl 16 saverestore-support-check fail never pass
 test-arm64-arm64-xl-vhd 14 migrate-support-check fail never pass
 test-armhf-armhf-xl-credit2 15 migrate-support-check fail never pass
 test-arm64-arm64-xl-vhd 15 saverestore-support-check fail never pass
 test-armhf-armhf-xl-credit2 16 saverestore-support-check fail never pass
 test-armhf-armhf-xl-cubietruck 15 migrate-support-check fail never pass
 test-armhf-armhf-xl-cubietruck 16 saverestore-support-check fail never pass
 test-armhf-armhf-xl-rtds 15 migrate-support-check fail never pass
 test-armhf-armhf-xl-rtds 16 saverestore-support-check fail never pass
 test-armhf-armhf-libvirt 15 migrate-support-check fail never pass
 test-armhf-armhf-libvirt-qcow2 14 migrate-support-check fail never pass
 test-armhf-armhf-xl-vhd 14 migrate-support-check fail never pass
 test-armhf-armhf-xl-vhd 15 saverestore-support-check fail never pass
 test-armhf-armhf-xl-arndale 15 migrate-support-check fail never pass
 test-armhf-armhf-xl-arndale 16 saverestore-support-check fail never pass
 test-armhf-armhf-libvirt-raw 14 migrate-support-check fail never pass
 test-armhf-armhf-xl-credit1 15 migrate-support-check fail never pass
 test-armhf-armhf-xl-credit1 16 saverestore-support-check fail never pass

version targeted for testing:
 xen aaa61028803a64e72f1026f9608dfa34d0c255ec
baseline version:
 xen
Re: [PATCH v4 8/9] tools: add example application to initialize dom0less PV drivers
On Fri, 1 Apr 2022, Juergen Gross wrote:
> On 01.04.22 12:21, Julien Grall wrote:
> > Hi,
> >
> > I have posted some comments in v3 after you sent this version. Please have a
> > look.
> >
> > On 01/04/2022 01:38, Stefano Stabellini wrote:
> > > +static int init_domain(struct xs_handle *xsh, libxl_dominfo *info)
> > > +{
> > > +    struct xc_interface_core *xch;
> > > +    libxl_uuid uuid;
> > > +    uint64_t xenstore_evtchn, xenstore_pfn;
> > > +    int rc;
> > > +
> > > +    printf("Init dom0less domain: %u\n", info->domid);
> > > +    xch = xc_interface_open(0, 0, 0);
> > > +
> > > +    rc = xc_hvm_param_get(xch, info->domid, HVM_PARAM_STORE_EVTCHN,
> > > +                          &xenstore_evtchn);
> > > +    if (rc != 0) {
> > > +        printf("Failed to get HVM_PARAM_STORE_EVTCHN\n");
> > > +        return 1;
> > > +    }
> > > +
> > > +    /* Alloc xenstore page */
> > > +    if (alloc_xs_page(xch, info, &xenstore_pfn) != 0) {
> > > +        printf("Error on alloc magic pages\n");
> > > +        return 1;
> > > +    }
> > > +
> > > +    rc = xc_dom_gnttab_seed(xch, info->domid, true,
> > > +                            (xen_pfn_t)-1, xenstore_pfn, 0, 0);
> > > +    if (rc)
> > > +        err(1, "xc_dom_gnttab_seed");
> > > +
> > > +    libxl_uuid_generate(&uuid);
> > > +    xc_domain_sethandle(xch, info->domid, libxl_uuid_bytearray(&uuid));
> > > +
> > > +    rc = gen_stub_json_config(info->domid, &uuid);
> > > +    if (rc)
> > > +        err(1, "gen_stub_json_config");
> > > +
> > > +    /* Now everything is ready: set HVM_PARAM_STORE_PFN */
> > > +    rc = xc_hvm_param_set(xch, info->domid, HVM_PARAM_STORE_PFN,
> > > +                          xenstore_pfn);
> >
> > On patch #1, you told me you didn't want to allocate the page in Xen because
> > it wouldn't be initialized by Xenstored. But this is what we are doing here.
>
> Xenstore (at least the C variant) is only using the fixed grant ref
> GNTTAB_RESERVED_XENSTORE, so it doesn't need the page to be advertised
> to the guest. And the mapping is done only when the domain is being
> introduced to Xenstore.
>
> > This would be a problem if Linux is still booting and hasn't yet call
> > xenbus_probe_initcall().
> >
> > I understand we need to have the page setup before raising the event
> > channel. I don't think we can allow Xenstored to set the HVM_PARAM (it may
> > run in a domain with less privilege). So I think we may need to create a
> > separate command to kick the client (not great).
> >
> > Juergen, any thoughts?
>
> I think it should work like that:
>
> - setup the grant via xc_dom_gnttab_seed()
> - introduce the domain to Xenstore
> - call xc_hvm_param_set()
>
> When the guest is receiving the event, it should wait for the xenstore
> page to appear.

I am OK with what you wrote above, and I understand Julien's concerns
about "waiting". Before discussing that, I would like to make sure I
understood why setting HVM_PARAM_STORE_PFN first (before
xs_introduce_domain) is not possible.

In a previous reply to Julien I wrote that it is not a good idea to
set HVM_PARAM_STORE_PFN in Xen before creating the domains because it
would cause Linux to hang at boot. That is true, Linux hangs on
drivers/xen/xenbus/xenbus_comms.c:xb_init_comms waiting on xb_waitq.
It could wait a very long time as domUs are typically a lot faster
than dom0 to boot.

However, if we set HVM_PARAM_STORE_PFN before calling
xs_introduce_domain in init-dom0less, for Linux to see it before
xs_introduce_domain is done, Linux would need to be racing against
init-dom0less. In that case, the wait in xb_init_comms would be minimal
anyway. It shouldn't be a problem. There would be no "hang", just a
wait a bit longer than usual.

Is that right?
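The ordering question discussed here can be made concrete with a toy model. The sketch below is not Xen or Linux code: `struct dom_model` and the `model_*` functions are hypothetical stand-ins for xc_dom_gnttab_seed(), xs_introduce_domain() and xc_hvm_param_set(HVM_PARAM_STORE_PFN), and the guest-side states are a simplification of what the thread describes (no page visible, page visible but Xenstore not yet connected, connected).

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of the per-domain state init-dom0less drives. */
struct dom_model {
    bool grant_seeded;   /* stands in for xc_dom_gnttab_seed() */
    bool introduced;     /* stands in for xs_introduce_domain() */
    bool store_pfn_set;  /* stands in for setting HVM_PARAM_STORE_PFN */
};

enum guest_view {
    GUEST_NO_PAGE,    /* guest sees no xenstore page yet */
    GUEST_WAITING,    /* page visible, Xenstore side not ready: guest waits */
    GUEST_CONNECTED   /* page visible and domain introduced */
};

int model_seed_grant(struct dom_model *d)
{
    d->grant_seeded = true;
    return 0;
}

/* Per the thread, Xenstore maps the ring via the reserved grant. */
int model_introduce(struct dom_model *d)
{
    if (!d->grant_seeded)
        return -1;
    d->introduced = true;
    return 0;
}

int model_set_store_pfn(struct dom_model *d)
{
    d->store_pfn_set = true;
    return 0;
}

enum guest_view model_guest_view(const struct dom_model *d)
{
    if (!d->store_pfn_set)
        return GUEST_NO_PAGE;
    return d->introduced ? GUEST_CONNECTED : GUEST_WAITING;
}
```

In this model, Juergen's order (seed, introduce, set parameter) never passes through GUEST_WAITING, while setting the parameter before the introduction does so only for the window between the two calls, which is the short race Stefano describes. Whether that window is acceptable is exactly what the thread leaves open.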
Re: [PATCH] arm/xen: Fix refcount leak in xen_dt_guest_init
Hi,

On Fri, Mar 11, 2022 at 06:01:11PM -0800, Stefano Stabellini wrote:
> On Wed, 9 Mar 2022, Miaoqian Lin wrote:
> > The of_find_compatible_node() function returns a node pointer with
> > refcount incremented, We should use of_node_put() on it when done
> > Add the missing of_node_put() to release the refcount.
> >
> > Fixes: 9b08aaa3199a ("ARM: XEN: Move xen_early_init() before efi_init()")
> > Signed-off-by: Miaoqian Lin
>
> Thanks for the patch!
>
> > ---
> >  arch/arm/xen/enlighten.c | 1 +
> >  1 file changed, 1 insertion(+)
> >
> > diff --git a/arch/arm/xen/enlighten.c b/arch/arm/xen/enlighten.c
> > index ec5b082f3de6..262f45f686b6 100644
> > --- a/arch/arm/xen/enlighten.c
> > +++ b/arch/arm/xen/enlighten.c
> > @@ -424,6 +424,7 @@ static void __init xen_dt_guest_init(void)
> >
> >  	if (of_address_to_resource(xen_node, GRANT_TABLE_INDEX, &res)) {
> >  		pr_err("Xen grant table region is not found\n");
> > +		of_node_put(xen_node);
> >  		return;
> >  	}
>
> This is adding a call to of_node_put on the error path. Shouldn't it
> be called also in the non-error path?

You're right. It should also be called in the non-error path. I made a
mistake.

> Also, there is another instance of of_address_to_resource being called
> in this file (in arch_xen_unpopulated_init), does it make sense to call
> of_node_put there too?

I think so, because the device node pointer np is a local variable, so
the reference it takes should be released within that scope.

I looked through the whole codebase for this usage pattern
($ret=of_find_compatible_node();of_address_to_resource($ret,_,_), where
$ret is a local variable). Most of them call of_node_put() when done.
The documentation of of_find_compatible_node() also mentions:

> Return: A node pointer with refcount incremented, use
> of_node_put() on it when done.

But I am not sure, since I am unfamiliar with the other code logic. It
would be better if the developers could double check. I found some
similar cases in arch/arm.
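The point raised in review, that the reference must be dropped on the success path as well as the error path, is commonly handled with a single exit label. The sketch below illustrates that pattern with mock types only: `struct mock_node`, `mock_node_get()` and `mock_node_put()` are hypothetical stand-ins for the kernel's device-node refcounting, and `lookup_fails` simulates of_address_to_resource() returning an error. It is not the fix that was eventually applied.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a refcounted device node. */
struct mock_node {
    int refcount;
};

/* Like of_find_compatible_node(): returns the node with a reference taken. */
struct mock_node *mock_node_get(struct mock_node *n)
{
    if (n)
        n->refcount++;
    return n;
}

/* Like of_node_put(): drops one reference. */
void mock_node_put(struct mock_node *n)
{
    if (n)
        n->refcount--;
}

/* Takes a reference, uses the node, and releases it on every path. */
int mock_guest_init(struct mock_node *tree_node, int lookup_fails)
{
    struct mock_node *np = mock_node_get(tree_node);
    int ret = 0;

    if (!np)
        return -1;

    if (lookup_fails) {
        ret = -1;
        goto out;
    }

    /* ... the resource would be used here ... */

out:
    mock_node_put(np);  /* single release point for error and success */
    return ret;
}
```

Routing both outcomes through one label keeps the get and the put visibly paired, so a later early return is less likely to leak the reference.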
Re: [XEN PATCH] tools/libs/light/libxl_pci.c: explicitly grant access to Intel IGD opregion
On 4/1/22 9:21 AM, Chuck Zmudzinski wrote: On 3/30/22 2:45 PM, Jason Andryuk wrote: On Fri, Mar 18, 2022 at 4:13 AM Jan Beulich wrote: On 14.03.2022 04:41, Chuck Zmudzinski wrote: When gfx_passthru is enabled for the Intel IGD, hvmloader maps the IGD opregion to the guest but libxl does not grant the guest permission to access the mapped memory region. This results in a crash of the i915.ko kernel module in a Linux HVM guest when it needs to access the IGD opregion: Oct 23 11:36:33 domU kernel: Call Trace: Oct 23 11:36:33 domU kernel: ? idr_alloc+0x39/0x70 Oct 23 11:36:33 domU kernel: drm_get_last_vbltimestamp+0xaa/0xc0 [drm] Oct 23 11:36:33 domU kernel: drm_reset_vblank_timestamp+0x5b/0xd0 [drm] Oct 23 11:36:33 domU kernel: drm_crtc_vblank_on+0x7b/0x130 [drm] Oct 23 11:36:33 domU kernel: intel_modeset_setup_hw_state+0xbd4/0x1900 [i915] Oct 23 11:36:33 domU kernel: ? _cond_resched+0x16/0x40 Oct 23 11:36:33 domU kernel: ? ww_mutex_lock+0x15/0x80 Oct 23 11:36:33 domU kernel: intel_modeset_init_nogem+0x867/0x1d30 [i915] Oct 23 11:36:33 domU kernel: ? gen6_write32+0x4b/0x1c0 [i915] Oct 23 11:36:33 domU kernel: ? intel_irq_postinstall+0xb9/0x670 [i915] Oct 23 11:36:33 domU kernel: i915_driver_probe+0x5c2/0xc90 [i915] Oct 23 11:36:33 domU kernel: ? vga_switcheroo_client_probe_defer+0x1f/0x40 Oct 23 11:36:33 domU kernel: ? i915_pci_probe+0x3f/0x150 [i915] Oct 23 11:36:33 domU kernel: local_pci_probe+0x42/0x80 Oct 23 11:36:33 domU kernel: ? _cond_resched+0x16/0x40 Oct 23 11:36:33 domU kernel: pci_device_probe+0xfd/0x1b0 Oct 23 11:36:33 domU kernel: really_probe+0x222/0x480 Oct 23 11:36:33 domU kernel: driver_probe_device+0xe1/0x150 Oct 23 11:36:33 domU kernel: device_driver_attach+0xa1/0xb0 Oct 23 11:36:33 domU kernel: __driver_attach+0x8a/0x150 Oct 23 11:36:33 domU kernel: ? device_driver_attach+0xb0/0xb0 Oct 23 11:36:33 domU kernel: ? 
device_driver_attach+0xb0/0xb0 Oct 23 11:36:33 domU kernel: bus_for_each_dev+0x78/0xc0 Oct 23 11:36:33 domU kernel: bus_add_driver+0x12b/0x1e0 Oct 23 11:36:33 domU kernel: driver_register+0x8b/0xe0 Oct 23 11:36:33 domU kernel: ? 0xc06b8000 Oct 23 11:36:33 domU kernel: i915_init+0x5d/0x70 [i915] Oct 23 11:36:33 domU kernel: do_one_initcall+0x44/0x1d0 Oct 23 11:36:33 domU kernel: ? do_init_module+0x23/0x260 Oct 23 11:36:33 domU kernel: ? kmem_cache_alloc_trace+0xf5/0x200 Oct 23 11:36:33 domU kernel: do_init_module+0x5c/0x260 Oct 23 11:36:33 domU kernel: __do_sys_finit_module+0xb1/0x110 Oct 23 11:36:33 domU kernel: do_syscall_64+0x33/0x80 Oct 23 11:36:33 domU kernel: entry_SYSCALL_64_after_hwframe+0x44/0xa9 The call trace alone leaves open where exactly the crash occurred. Looking at 5.17 I notice that the first thing the driver does after mapping the range it to check the signature (both in intel_opregion_setup()). As the signature can't possibly match with no access granted to the underlying mappings, there shouldn't be any further attempts to use the region in the driver; if there are, I'd view this as a driver bug. Yes. i915_driver_hw_probe does not check the return value of intel_opregion_setup(dev_priv) and just continues on. Chuck, the attached patch may help if you want to test it. Regards, Jason I tested the patch - it made no noticeable difference. Correction (sorry for the confusion): I didn't know I needed to replace more than just a re-built i915.ko module to enable the patch for testing. When I updated the entire Debian kernel package including all the modules and the kernel image with the patched kernel package, it made quite a difference. 
With Jason's patch, the three call traces just became a much shorter error message: Apr 05 20:46:18 debian kernel: xen: --> pirq=16 -> irq=24 (gsi=24) Apr 05 20:46:18 debian kernel: i915 :00:02.0: [drm] VT-d active for gfx access Apr 05 20:46:18 debian kernel: i915 :00:02.0: vgaarb: deactivate vga console Apr 05 20:46:18 debian kernel: Console: switching to colour dummy device 80x25 Apr 05 20:46:18 debian kernel: i915 :00:02.0: [drm] DMAR active, disabling use of stolen memory Apr 05 20:46:18 debian kernel: resource sanity check: requesting [mem 0x-0x11ffe], which spans more than Reserved [mem 0xfdfff000-0x] Apr 05 20:46:18 debian kernel: caller memremap+0xeb/0x1c0 mapping multiple BARs Apr 05 20:46:18 debian kernel: i915 :00:02.0: Device initialization failed (-22) Apr 05 20:46:18 debian kernel: i915 :00:02.0: Please file a bug on drm/i915; see https://gitlab.freedesktop.org/drm/intel/-/wikis/How-to-file-i915-bugs for details. Apr 05 20:46:18 debian kernel: i915: probe of :00:02.0 failed with error -22 - End of Kernel Error Log -- So I think the patch does propagate the error up the stack and bails out before producing the Call traces, and... I even had output after booting - the gdm3 Gnome display manager login page displayed, but when I tried to login to the Gnome
Re: [PATCH v3 5/5] tools: add example application to initialize dom0less PV drivers
On Tue, 5 Apr 2022, Stefano Stabellini wrote: > On Fri, 1 Apr 2022, Julien Grall wrote: > > On 01/04/2022 01:35, Stefano Stabellini wrote: > > > > > > > + > > > > > > > +/* Alloc magic pages */ > > > > > > > +if (alloc_magic_pages(info, ) != 0) { > > > > > > > +printf("Error on alloc magic pages\n"); > > > > > > > +return 1; > > > > > > > +} > > > > > > > + > > > > > > > +xc_dom_gnttab_init(); > > > > > > > > > > > > This call as the risk to break the guest if the dom0 Linux doesn't > > > > > > support > > > > > > the > > > > > > acquire interface. This is because it will punch a hole in the > > > > > > domain > > > > > > memory > > > > > > where the grant-table may have already been mapped. > > > > > > > > > > > > Also, this function could fails. > > > > > > > > > > I'll check for return errors. Dom0less is for fully static > > > > > configurations so I think it is OK to return error and abort if > > > > > something unexpected happens: dom0less' main reason for being is that > > > > > there is nothing unexpected :-) > > > > Does this mean the caller will have to reboot the system if there is an > > > > error? > > > > IOW, we don't expect them to call ./init-dom0less twice. > > > > > > Yes, exactly. I think init-dom0less could even panic. My mental model is > > > that this is an "extension" of construct_domU. Over there we just panic > > > if something is wrong and here it would be similar. The user provided a > > > wrong config and should fix it. > > > > Ok. I think we should make explicit how it can be used. 
> > > > > > > > > + > > > > > > > +libxl_uuid_generate(); > > > > > > > +xc_domain_sethandle(dom.xch, info->domid, > > > > > > > libxl_uuid_bytearray()); > > > > > > > + > > > > > > > +rc = gen_stub_json_config(info->domid, ); > > > > > > > +if (rc) > > > > > > > +err(1, "gen_stub_json_config"); > > > > > > > + > > > > > > > +rc = restore_xenstore(xsh, info, uuid, dom.xenstore_evtchn); > > > > > > > +if (rc) > > > > > > > +err(1, "writing to xenstore"); > > > > > > > + > > > > > > > +xs_introduce_domain(xsh, info->domid, > > > > > > > +(GUEST_MAGIC_BASE >> XC_PAGE_SHIFT) + > > > > > > > XENSTORE_PFN_OFFSET, > > > > > > > +dom.xenstore_evtchn); > > > > > > > > > > > > xs_introduce_domain() can technically fails. > > > > > > > > > > OK > > > > > > > > > > > > > > > > > +return 0; > > > > > > > +} > > > > > > > + > > > > > > > +/* Check if domain has been configured in XS */ > > > > > > > +static bool domain_exists(struct xs_handle *xsh, int domid) > > > > > > > +{ > > > > > > > +return xs_is_domain_introduced(xsh, domid); > > > > > > > +} > > > > > > > > > > > > Would not this lead to initialize a domain with PV driver disabled? > > > > > > > > > > I am not sure I understood your question, but I'll try to answer > > > > > anyway. > > > > > This check is purely to distinguish dom0less guests, which needs > > > > > further > > > > > initializations, from regular guests (e.g. xl guests) that don't need > > > > > any actions taken here. > > > > > > > > Dom0less domUs can be divided in two categories based on whether they > > > > are > > > > xen > > > > aware (e.g. xen,enhanced is set). > > > > > > > > Looking at this script, it seems to assume that all dom0less domUs are > > > > Xen > > > > aware. So it will end up to allocate Xenstore ring and call > > > > xs_introduce_domain(). I suspect the call will end up to fail because > > > > the > > > > event channel would be 0. 
> > > > > > > > So did you try to use this script on a platform where there only xen > > > > aware > > > > domU and/or a mix? > > > > > > Good idea of asking for this test. I thought I already ran that test, > > > but I did it again to be sure. Everything works OK (although the > > > xenstore page allocation is unneeded). xs_introduce_domain does not > > > fail: > > > > Are you sure? If I pass 0 as the 4th argument (event channel), the command > > will return EINVAL. However, looking at the code you are not checking the > > return for the call. So you will continue as if it were successful. > > We are not passing 0 as the 4th argument, we are passing the event > channel previously set as HVM_PARAM_STORE_EVTCHN by Xen: > > rc = xc_hvm_param_get(xch, info->domid, HVM_PARAM_STORE_EVTCHN, > _evtchn); > > Also in my working version of the series I have a check for the return > value of xs_introduce_domain (as you requested in one of your previous > reviews). So xs_introduce_domain is actually working correctly and > returning success. Sorry I didn't read carefully enough the older messages. I re-run the tests again and I can see the issue you were describing (I am puzzled on why I didn't see it before as I did have a check on the return value as I wrote -- probably a mistake in my setup.) The problem goes away if we only call xs_introduce_domain for xen,enhanced domains (when
Re: [PATCH v3 5/5] tools: add example application to initialize dom0less PV drivers
On Fri, 1 Apr 2022, Juergen Gross wrote: > On 01.04.22 12:02, Julien Grall wrote: > > Hi Stefano, > > > > On 01/04/2022 01:35, Stefano Stabellini wrote: > > > > > > > + > > > > > > > + /* Alloc magic pages */ > > > > > > > + if (alloc_magic_pages(info, ) != 0) { > > > > > > > + printf("Error on alloc magic pages\n"); > > > > > > > + return 1; > > > > > > > + } > > > > > > > + > > > > > > > + xc_dom_gnttab_init(); > > > > > > > > > > > > This call as the risk to break the guest if the dom0 Linux doesn't > > > > > > support > > > > > > the > > > > > > acquire interface. This is because it will punch a hole in the > > > > > > domain > > > > > > memory > > > > > > where the grant-table may have already been mapped. > > > > > > > > > > > > Also, this function could fails. > > > > > > > > > > I'll check for return errors. Dom0less is for fully static > > > > > configurations so I think it is OK to return error and abort if > > > > > something unexpected happens: dom0less' main reason for being is that > > > > > there is nothing unexpected :-) > > > > Does this mean the caller will have to reboot the system if there is an > > > > error? > > > > IOW, we don't expect them to call ./init-dom0less twice. > > > > > > Yes, exactly. I think init-dom0less could even panic. My mental model is > > > that this is an "extension" of construct_domU. Over there we just panic > > > if something is wrong and here it would be similar. The user provided a > > > wrong config and should fix it. > > > > Ok. I think we should make explicit how it can be used. 
> > > > > > > > > + > > > > > > > + libxl_uuid_generate(); > > > > > > > + xc_domain_sethandle(dom.xch, info->domid, > > > > > > > libxl_uuid_bytearray()); > > > > > > > + > > > > > > > + rc = gen_stub_json_config(info->domid, ); > > > > > > > + if (rc) > > > > > > > + err(1, "gen_stub_json_config"); > > > > > > > + > > > > > > > + rc = restore_xenstore(xsh, info, uuid, dom.xenstore_evtchn); > > > > > > > + if (rc) > > > > > > > + err(1, "writing to xenstore"); > > > > > > > + > > > > > > > + xs_introduce_domain(xsh, info->domid, > > > > > > > + (GUEST_MAGIC_BASE >> XC_PAGE_SHIFT) + > > > > > > > XENSTORE_PFN_OFFSET, > > > > > > > + dom.xenstore_evtchn); > > > > > > > > > > > > xs_introduce_domain() can technically fails. > > > > > > > > > > OK > > > > > > > > > > > > > > > > > + return 0; > > > > > > > +} > > > > > > > + > > > > > > > +/* Check if domain has been configured in XS */ > > > > > > > +static bool domain_exists(struct xs_handle *xsh, int domid) > > > > > > > +{ > > > > > > > + return xs_is_domain_introduced(xsh, domid); > > > > > > > +} > > > > > > > > > > > > Would not this lead to initialize a domain with PV driver disabled? > > > > > > > > > > I am not sure I understood your question, but I'll try to answer > > > > > anyway. > > > > > This check is purely to distinguish dom0less guests, which needs > > > > > further > > > > > initializations, from regular guests (e.g. xl guests) that don't need > > > > > any actions taken here. > > > > > > > > Dom0less domUs can be divided in two categories based on whether they > > > > are xen > > > > aware (e.g. xen,enhanced is set). > > > > > > > > Looking at this script, it seems to assume that all dom0less domUs are > > > > Xen > > > > aware. So it will end up to allocate Xenstore ring and call > > > > xs_introduce_domain(). I suspect the call will end up to fail because > > > > the > > > > event channel would be 0. 
> > > > > > > > So did you try to use this script on a platform where there only xen > > > > aware > > > > domU and/or a mix? > > > > > > Good idea of asking for this test. I thought I already ran that test, > > > but I did it again to be sure. Everything works OK (although the > > > xenstore page allocation is unneeded). xs_introduce_domain does not > > > fail: > > > > Are you sure? If I pass 0 as the 4th argument (event channel), the command > > will return EINVAL. However, looking at the code you are not checking the > > return for the call. So you will continue as if it were successful. > > > > So you will end up to write nodes for a domain Xenstored is not aware and > > also set HVM_PARAM_STORE_PFN which may further confuse the guest as it may > > try to initialize Xenstored it discovers the page. > > > > > I think that's because it is usually called on all domains by the > > > toolstack, even the ones without xenstore support in the kernel. > > > > The toolstack will always allocate the event channel irrespective to whether > > the guest will use Xenstore. So both the shared page and the event channel > > are always valid today. > > > > With your series, this will change as the event channel will not be > > allocated when "xen,enhanced" is not set. > > > > In your case, I think we may want to register the domain to xenstore but say > > there are no connection available for the domain. Juergen,
Re: [PATCH v3 5/5] tools: add example application to initialize dom0less PV drivers
On Fri, 1 Apr 2022, Julien Grall wrote: > On 01/04/2022 01:35, Stefano Stabellini wrote: > > > > > > + > > > > > > +/* Alloc magic pages */ > > > > > > +if (alloc_magic_pages(info, ) != 0) { > > > > > > +printf("Error on alloc magic pages\n"); > > > > > > +return 1; > > > > > > +} > > > > > > + > > > > > > +xc_dom_gnttab_init(); > > > > > > > > > > This call as the risk to break the guest if the dom0 Linux doesn't > > > > > support > > > > > the > > > > > acquire interface. This is because it will punch a hole in the domain > > > > > memory > > > > > where the grant-table may have already been mapped. > > > > > > > > > > Also, this function could fails. > > > > > > > > I'll check for return errors. Dom0less is for fully static > > > > configurations so I think it is OK to return error and abort if > > > > something unexpected happens: dom0less' main reason for being is that > > > > there is nothing unexpected :-) > > > Does this mean the caller will have to reboot the system if there is an > > > error? > > > IOW, we don't expect them to call ./init-dom0less twice. > > > > Yes, exactly. I think init-dom0less could even panic. My mental model is > > that this is an "extension" of construct_domU. Over there we just panic > > if something is wrong and here it would be similar. The user provided a > > wrong config and should fix it. > > Ok. I think we should make explicit how it can be used. 
> > > > > > > + > > > > > > +libxl_uuid_generate(); > > > > > > +xc_domain_sethandle(dom.xch, info->domid, > > > > > > libxl_uuid_bytearray()); > > > > > > + > > > > > > +rc = gen_stub_json_config(info->domid, ); > > > > > > +if (rc) > > > > > > +err(1, "gen_stub_json_config"); > > > > > > + > > > > > > +rc = restore_xenstore(xsh, info, uuid, dom.xenstore_evtchn); > > > > > > +if (rc) > > > > > > +err(1, "writing to xenstore"); > > > > > > + > > > > > > +xs_introduce_domain(xsh, info->domid, > > > > > > +(GUEST_MAGIC_BASE >> XC_PAGE_SHIFT) + > > > > > > XENSTORE_PFN_OFFSET, > > > > > > +dom.xenstore_evtchn); > > > > > > > > > > xs_introduce_domain() can technically fails. > > > > > > > > OK > > > > > > > > > > > > > > +return 0; > > > > > > +} > > > > > > + > > > > > > +/* Check if domain has been configured in XS */ > > > > > > +static bool domain_exists(struct xs_handle *xsh, int domid) > > > > > > +{ > > > > > > +return xs_is_domain_introduced(xsh, domid); > > > > > > +} > > > > > > > > > > Would not this lead to initialize a domain with PV driver disabled? > > > > > > > > I am not sure I understood your question, but I'll try to answer anyway. > > > > This check is purely to distinguish dom0less guests, which needs further > > > > initializations, from regular guests (e.g. xl guests) that don't need > > > > any actions taken here. > > > > > > Dom0less domUs can be divided in two categories based on whether they are > > > xen > > > aware (e.g. xen,enhanced is set). > > > > > > Looking at this script, it seems to assume that all dom0less domUs are Xen > > > aware. So it will end up to allocate Xenstore ring and call > > > xs_introduce_domain(). I suspect the call will end up to fail because the > > > event channel would be 0. > > > > > > So did you try to use this script on a platform where there only xen aware > > > domU and/or a mix? > > > > Good idea of asking for this test. I thought I already ran that test, > > but I did it again to be sure. 
Everything works OK (although the
> > xenstore page allocation is unneeded). xs_introduce_domain does not
> > fail:
>
> Are you sure? If I pass 0 as the 4th argument (event channel), the command
> will return EINVAL. However, looking at the code you are not checking the
> return for the call. So you will continue as if it were successful.

We are not passing 0 as the 4th argument, we are passing the event channel
previously set as HVM_PARAM_STORE_EVTCHN by Xen:

    rc = xc_hvm_param_get(xch, info->domid, HVM_PARAM_STORE_EVTCHN, _evtchn);

Also in my working version of the series I have a check for the return
value of xs_introduce_domain (as you requested in one of your previous
reviews). So xs_introduce_domain is actually working correctly and
returning success.
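The point about the unchecked return value can be illustrated with a minimal sketch. It does not link against libxenstore: stub_introduce_domain() is a hypothetical stand-in that only models the behaviour described in the thread (an event channel of 0 is rejected with EINVAL), and introduce_checked() is an illustrative wrapper name, not something from the series.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdio.h>

/*
 * Hypothetical stand-in for xs_introduce_domain(); NOT the libxenstore
 * implementation. It only models the behaviour described in the thread:
 * an event channel of 0 is rejected with EINVAL.
 */
bool stub_introduce_domain(unsigned int domid, unsigned long gfn,
                           unsigned int evtchn)
{
    (void)domid;
    (void)gfn;

    if (evtchn == 0) {
        errno = EINVAL;
        return false;
    }
    return true;
}

/* Returns 0 on success, or a negative errno value on failure. */
int introduce_checked(unsigned int domid, unsigned long gfn,
                      unsigned int evtchn)
{
    if (!stub_introduce_domain(domid, gfn, evtchn)) {
        int err = errno; /* save it before fprintf() can clobber errno */

        fprintf(stderr, "introducing domain %u failed: errno %d\n",
                domid, err);
        return -err;
    }
    return 0;
}
```

With a wrapper of this shape, a caller that gets a non-zero result can stop before writing any further nodes for a domain that was never registered.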
Re: [PATCH v3 13/19] xen/arm: Move fixmap definitions in a separate header
On Tue, 5 Apr 2022, Julien Grall wrote:
> On 05/04/2022 22:12, Stefano Stabellini wrote:
> > > +/* Map a page in a fixmap entry */
> > > +extern void set_fixmap(unsigned map, mfn_t mfn, unsigned attributes);
> > > +/* Remove a mapping from a fixmap entry */
> > > +extern void clear_fixmap(unsigned map);
> > > +
> > > +#endif /* __ASSEMBLY__ */
> > > +
> > > +#endif /* __ASM_FIXMAP_H */
> >
> >
> > It is a good idea to create fixmap.h, but I think it should be acpi.h to
> > include fixmap.h, not the other way around.
>
> As I wrote in the commit message, one definition in fixmap.h rely on define
> from acpi.h (i.e NUM_FIXMAP_ACPI_PAGES). So if we don't include it, then user
> of FIXMAP_PMAP_BEGIN (see next patch) will requires to include acpi.h in order
> to build.
>
> Re-ordering the values would not help because the problem would exactly be the
> same but this time the acpi users would have to include pmap.h to define
> NUM_FIX_PMAP.
>
> >
> > The appended changes build correctly on top of this patch.
>
> That's expected because all the users of FIXMAP_ACPI_END will be including
> acpi.h. But after the next patch, we would need pmap.c to include acpi.h.
>
> I don't think this would be right (and quite likely you would ask why
> this is done). Hence this approach.

I premise that I see your point and I don't feel very strongly either
way. In my opinion the fixmap is the low level "library" that others
make use of, so it should be acpi.h and pmap.h (the clients of the
library) that include fixmap.h and not the other way around.

So I would rather define NUM_FIXMAP_ACPI_PAGES and NUM_FIX_PMAP in
fixmap.h, then have both pmap.h and acpi.h include fixmap.h. It makes
more sense to me.

However, I won't insist if you don't like it. Rough patch below for
reference.
diff --git a/xen/arch/arm/include/asm/fixmap.h b/xen/arch/arm/include/asm/fixmap.h index c46a15e59d..a231ebfe25 100644 --- a/xen/arch/arm/include/asm/fixmap.h +++ b/xen/arch/arm/include/asm/fixmap.h @@ -4,8 +4,13 @@ #ifndef __ASM_FIXMAP_H #define __ASM_FIXMAP_H -#include -#include +#include +#include + +#define NUM_FIXMAP_ACPI_PAGES 64 + +/* Large enough for mapping 5 levels of page tables with some headroom */ +#define NUM_FIX_PMAP 8 /* Fixmap slots */ #define FIXMAP_CONSOLE 0 /* The primary UART */ @@ -22,6 +27,10 @@ #ifndef __ASSEMBLY__ +#include + +extern lpae_t xen_fixmap[XEN_PT_LPAE_ENTRIES]; + /* Map a page in a fixmap entry */ extern void set_fixmap(unsigned map, mfn_t mfn, unsigned attributes); /* Remove a mapping from a fixmap entry */ diff --git a/xen/arch/arm/include/asm/pmap.h b/xen/arch/arm/include/asm/pmap.h index 70eafe2891..31d29e021d 100644 --- a/xen/arch/arm/include/asm/pmap.h +++ b/xen/arch/arm/include/asm/pmap.h @@ -2,9 +2,8 @@ #define __ASM_PMAP_H__ #include +#include -/* XXX: Find an header to declare it */ -extern lpae_t xen_fixmap[XEN_PT_LPAE_ENTRIES]; static inline void arch_pmap_map(unsigned int slot, mfn_t mfn) { diff --git a/xen/include/xen/acpi.h b/xen/include/xen/acpi.h index 1b9c75e68f..afcc9d5b4f 100644 --- a/xen/include/xen/acpi.h +++ b/xen/include/xen/acpi.h @@ -28,12 +28,7 @@ #define _LINUX #endif -/* - * Fixmap pages to reserve for ACPI boot-time tables (see - * arch/x86/include/asm/fixmap.h or arch/arm/include/asm/fixmap.h), - * 64 pages(256KB) is large enough for most cases.) - */ -#define NUM_FIXMAP_ACPI_PAGES 64 +#include #ifndef __ASSEMBLY__ diff --git a/xen/include/xen/pmap.h b/xen/include/xen/pmap.h index 93e61b1087..aa892154c0 100644 --- a/xen/include/xen/pmap.h +++ b/xen/include/xen/pmap.h @@ -1,9 +1,6 @@ #ifndef __XEN_PMAP_H__ #define __XEN_PMAP_H__ -/* Large enough for mapping 5 levels of page tables with some headroom */ -#define NUM_FIX_PMAP 8 - #ifndef __ASSEMBLY__ #include
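To make the dependency under discussion concrete, here is a rough sketch of how the slot numbers chain together. NUM_FIXMAP_ACPI_PAGES, NUM_FIX_PMAP and FIXMAP_CONSOLE use the values shown in the patch above. The SK_-prefixed names, and the assumption that the ACPI range starts at slot 2 and is immediately followed by the PMAP range, are illustrative and not taken from the thread.

```c
#include <assert.h>

/* Values as they appear in the rough patch above. */
#define NUM_FIXMAP_ACPI_PAGES 64
#define NUM_FIX_PMAP          8
#define FIXMAP_CONSOLE        0   /* The primary UART */

/*
 * Assumed layout for this sketch only: one extra slot after the console,
 * then the ACPI range, then the PMAP range.
 */
#define SK_FIXMAP_ACPI_BEGIN  2
#define SK_FIXMAP_ACPI_END    (SK_FIXMAP_ACPI_BEGIN + NUM_FIXMAP_ACPI_PAGES - 1)
#define SK_FIXMAP_PMAP_BEGIN  (SK_FIXMAP_ACPI_END + 1)
#define SK_FIXMAP_PMAP_END    (SK_FIXMAP_PMAP_BEGIN + NUM_FIX_PMAP - 1)

/* Fixmap slot backing the idx-th PMAP entry. */
unsigned int sketch_pmap_slot(unsigned int idx)
{
    assert(idx < NUM_FIX_PMAP);
    return SK_FIXMAP_PMAP_BEGIN + idx;
}
```

Whichever header ends up owning the two NUM_* constants, the PMAP range cannot be computed without the ACPI page count, which is the build dependency both sides of the thread are trying to place.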
Re: [PATCH v3 17/19] xen/arm64: mm: Add memory to the boot allocator first
On Tue, 5 Apr 2022, Julien Grall wrote:
> On 05/04/2022 22:50, Stefano Stabellini wrote:
> > > +static void __init setup_mm(void)
> > > +{
> > > +const struct meminfo *banks =
> > > +paddr_t ram_start = ~0;
> > > +paddr_t ram_end = 0;
> > > +paddr_t ram_size = 0;
> > > +unsigned int i;
> > > +
> > > +init_pdx();
> > > +
> > > +/*
> > > + * We need some memory to allocate the page-tables used for the xenheap
> > > + * mappings. But some regions may contain memory already allocated
> > > + * for other uses (e.g. modules, reserved-memory...).
> > > + *
> > > + * For simplify add all the free regions in the boot allocator.
> > > + */
> >
> > We currently have:
> >
> > BUG_ON(nr_bootmem_regions == (PAGE_SIZE / sizeof(struct bootmem_region)));
>
> This has enough space for 256 distinct regions on arm64 (512 regions on
> arm32).
>
> >
> > Do you think we should check for the limit in populate_boot_allocator?
>
> This patch doesn't change the number of regions added to the boot allocator.
> So if we need to check the limit then I would rather deal separately (see more
> below).
>
> > Or there is no need because it is unrealistic to reach it?
> I can't say never because history told us on some UEFI systems, there will be
> a large number of regions exposed. I haven't heard anyone that would hit the
> BUG_ON().
>
> The problem is what do we do if we hit the limit? We could ignore all the
> regions after. However, there are potentially a risk there would not be enough
> memory to cover the boot memory allocation (regions may be really small).
>
> So if we ever hit the limit, then I think we should update the boot allocator.

OK, thanks for the explanation.

Reviewed-by: Stefano Stabellini
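The 256 and 512 figures quoted above follow from PAGE_SIZE / sizeof(struct bootmem_region). A small sketch of the arithmetic, assuming 4KB pages and assuming the region descriptor is a pair of unsigned long values (start and end frame numbers); the struct and function names below are illustrative, not the Xen ones.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_PAGE_SIZE 4096u /* assumed 4KB pages */

/* Descriptor as it would look with a 64-bit unsigned long (arm64). */
struct region64 { uint64_t s, e; };
/* Same descriptor with a 32-bit unsigned long (arm32). */
struct region32 { uint32_t s, e; };

/* Number of descriptors that fit in the single page backing the array. */
size_t max_regions(size_t region_size)
{
    assert(region_size != 0);
    return SKETCH_PAGE_SIZE / region_size;
}
```

Under these assumptions, max_regions(sizeof(struct region64)) gives 256 and max_regions(sizeof(struct region32)) gives 512, matching the numbers in the reply.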
Re: [PATCH v3 19/19] xen/arm: mm: Re-implement setup_frame_table_mappings() with map_pages_to_xen()
On Mon, 21 Feb 2022, Julien Grall wrote: > From: Julien Grall > > Now that map_pages_to_xen() has been extended to support 2MB mappings, > we can replace the create_mappings() call by map_pages_to_xen() call. > > This has the advantage to remove the differences between 32-bit and > 64-bit code. > > Lastly remove create_mappings() as there is no more callers. > > Signed-off-by: Julien Grall > Signed-off-by: Julien Grall > > --- > Changes in v3: > - Fix typo in the commit message > - Remove the TODO regarding contiguous bit > > Changes in v2: > - New patch > --- > xen/arch/arm/mm.c | 63 --- > 1 file changed, 5 insertions(+), 58 deletions(-) > > diff --git a/xen/arch/arm/mm.c b/xen/arch/arm/mm.c > index 4af59375d998..d73f49d5b6fc 100644 > --- a/xen/arch/arm/mm.c > +++ b/xen/arch/arm/mm.c > @@ -354,40 +354,6 @@ void clear_fixmap(unsigned map) > BUG_ON(res != 0); > } > > -/* Create Xen's mappings of memory. > - * Mapping_size must be either 2MB or 32MB. > - * Base and virt must be mapping_size aligned. > - * Size must be a multiple of mapping_size. > - * second must be a contiguous set of second level page tables > - * covering the region starting at virt_offset. */ > -static void __init create_mappings(lpae_t *second, > - unsigned long virt_offset, > - unsigned long base_mfn, > - unsigned long nr_mfns, > - unsigned int mapping_size) > -{ > -unsigned long i, count; > -const unsigned long granularity = mapping_size >> PAGE_SHIFT; > -lpae_t pte, *p; > - > -ASSERT((mapping_size == MB(2)) || (mapping_size == MB(32))); > -ASSERT(!((virt_offset >> PAGE_SHIFT) % granularity)); > -ASSERT(!(base_mfn % granularity)); > -ASSERT(!(nr_mfns % granularity)); > - > -count = nr_mfns / XEN_PT_LPAE_ENTRIES; > -p = second + second_linear_offset(virt_offset); > -pte = mfn_to_xen_entry(_mfn(base_mfn), MT_NORMAL); > -if ( granularity == 16 * XEN_PT_LPAE_ENTRIES ) > -pte.pt.contig = 1; /* These maps are in 16-entry contiguous chunks. 
> */ > -for ( i = 0; i < count; i++ ) > -{ > -write_pte(p + i, pte); > -pte.pt.base += 1 << XEN_PT_LPAE_SHIFT; > -} > -flush_xen_tlb_local(); > -} > - > #ifdef CONFIG_DOMAIN_PAGE > void *map_domain_page_global(mfn_t mfn) > { > @@ -846,36 +812,17 @@ void __init setup_frametable_mappings(paddr_t ps, > paddr_t pe) > unsigned long frametable_size = nr_pdxs * sizeof(struct page_info); > mfn_t base_mfn; > const unsigned long mapping_size = frametable_size < MB(32) ? MB(2) : > MB(32); > -#ifdef CONFIG_ARM_64 > -lpae_t *second, pte; > -unsigned long nr_second; > -mfn_t second_base; > -int i; > -#endif > +int rc; > > frametable_base_pdx = mfn_to_pdx(maddr_to_mfn(ps)); > /* Round up to 2M or 32M boundary, as appropriate. */ > frametable_size = ROUNDUP(frametable_size, mapping_size); > base_mfn = alloc_boot_pages(frametable_size >> PAGE_SHIFT, 32<<(20-12)); > > -#ifdef CONFIG_ARM_64 > -/* Compute the number of second level pages. */ > -nr_second = ROUNDUP(frametable_size, FIRST_SIZE) >> FIRST_SHIFT; > -second_base = alloc_boot_pages(nr_second, 1); > -second = mfn_to_virt(second_base); > -for ( i = 0; i < nr_second; i++ ) > -{ > -clear_page(mfn_to_virt(mfn_add(second_base, i))); > -pte = mfn_to_xen_entry(mfn_add(second_base, i), MT_NORMAL); > -pte.pt.table = 1; > -write_pte(_first[first_table_offset(FRAMETABLE_VIRT_START)+i], > pte); > -} > -create_mappings(second, 0, mfn_x(base_mfn), frametable_size >> > PAGE_SHIFT, > -mapping_size); > -#else > -create_mappings(xen_second, FRAMETABLE_VIRT_START, mfn_x(base_mfn), > -frametable_size >> PAGE_SHIFT, mapping_size); > -#endif > +rc = map_pages_to_xen(FRAMETABLE_VIRT_START, base_mfn, > + frametable_size >> PAGE_SHIFT, PAGE_HYPERVISOR_RW); Doesn't it need to be PAGE_HYPERVISOR_RW | _PAGE_BLOCK ? > +if ( rc ) > +panic("Unable to setup the frametable mappings.\n"); > > memset(_table[0], 0, nr_pdxs * sizeof(struct page_info)); > memset(_table[nr_pdxs], -1, > -- > 2.32.0 >
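As a worked illustration of the "Round up to 2M or 32M boundary" step kept by the patch, here is a small sketch. The helper names are hypothetical, and the 56-byte page_info size used in the usage note is an assumption for the example rather than a figure from the thread.

```c
#include <assert.h>
#include <stdint.h>

#define SK_MB(x) ((uint64_t)(x) << 20)

/* Mapping granularity: 2MB for small frametables, 32MB otherwise. */
uint64_t sketch_mapping_size(uint64_t frametable_size)
{
    return frametable_size < SK_MB(32) ? SK_MB(2) : SK_MB(32);
}

/* Round x up to a multiple of align, where align is a power of two. */
uint64_t sketch_roundup(uint64_t x, uint64_t align)
{
    assert(align != 0 && (align & (align - 1)) == 0);
    return (x + align - 1) & ~(align - 1);
}

/* Frametable size after rounding, for nr_pdxs entries of entry_size bytes. */
uint64_t sketch_frametable_size(uint64_t nr_pdxs, uint64_t entry_size)
{
    uint64_t size = nr_pdxs * entry_size;

    return sketch_roundup(size, sketch_mapping_size(size));
}
```

For example, with an assumed 56-byte entry, 0x40001 entries need just over 14MB and round up to 16MB, while 0x100000 entries need 56MB and round up to 64MB.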
Re: [PATCH v3 18/19] xen/arm: mm: Rework setup_xenheap_mappings()
On Mon, 21 Feb 2022, Julien Grall wrote: > From: Julien Grall > > The current implementation of setup_xenheap_mappings() is using 1GB > mappings. This can lead to unexpected result because the mapping > may alias a non-cachable region (such as device or reserved regions). > For more details see B2.8 in ARM DDI 0487H.a. > > map_pages_to_xen() was recently reworked to allow superpage mappings, > support contiguous mapping and deal with the use of pagge-tables before pagetables > they are mapped. > > Most of the code in setup_xenheap_mappings() is now replaced with a > single call to map_pages_to_xen(). > > Signed-off-by: Julien Grall > Signed-off-by: Julien Grall > > --- > Changes in v3: > - Don't use 1GB mapping > - Re-order code in setup_mm() in a separate patch > > Changes in v2: > - New patch > --- > xen/arch/arm/mm.c | 87 ++- > 1 file changed, 18 insertions(+), 69 deletions(-) Very good! > diff --git a/xen/arch/arm/mm.c b/xen/arch/arm/mm.c > index 11b6b60a2bc1..4af59375d998 100644 > --- a/xen/arch/arm/mm.c > +++ b/xen/arch/arm/mm.c > @@ -138,17 +138,6 @@ static DEFINE_PAGE_TABLE(cpu0_pgtable); > static DEFINE_PAGE_TABLES(cpu0_dommap, DOMHEAP_SECOND_PAGES); > #endif > > -#ifdef CONFIG_ARM_64 > -/* The first page of the first level mapping of the xenheap. The > - * subsequent xenheap first level pages are dynamically allocated, but > - * we need this one to bootstrap ourselves. */ > -static DEFINE_PAGE_TABLE(xenheap_first_first); > -/* The zeroeth level slot which uses xenheap_first_first. Used because > - * setup_xenheap_mappings otherwise relies on mfn_to_virt which isn't > - * valid for a non-xenheap mapping. */ > -static __initdata int xenheap_first_first_slot = -1; > -#endif > - > /* Common pagetable leaves */ > /* Second level page tables. 
> * > @@ -815,77 +804,37 @@ void __init setup_xenheap_mappings(unsigned long > base_mfn, > void __init setup_xenheap_mappings(unsigned long base_mfn, > unsigned long nr_mfns) > { > -lpae_t *first, pte; > -unsigned long mfn, end_mfn; > -vaddr_t vaddr; > - > -/* Align to previous 1GB boundary */ > -mfn = base_mfn & ~((FIRST_SIZE>>PAGE_SHIFT)-1); > +int rc; > > /* First call sets the xenheap physical and virtual offset. */ > if ( mfn_eq(xenheap_mfn_start, INVALID_MFN) ) > { > +unsigned long mfn_gb = base_mfn & ~((FIRST_SIZE >> PAGE_SHIFT) - 1); > + > xenheap_mfn_start = _mfn(base_mfn); > xenheap_base_pdx = mfn_to_pdx(_mfn(base_mfn)); > +/* > + * The base address may not be aligned to the first level > + * size (e.g. 1GB when using 4KB pages). This would prevent > + * superpage mappings for all the regions because the virtual > + * address and machine address should both be suitably aligned. > + * > + * Prevent that by offsetting the start of the xenheap virtual > + * address. > + */ > xenheap_virt_start = DIRECTMAP_VIRT_START + > -(base_mfn - mfn) * PAGE_SIZE; > +(base_mfn - mfn_gb) * PAGE_SIZE; > } [...] > +rc = map_pages_to_xen((vaddr_t)__mfn_to_virt(base_mfn), > + _mfn(base_mfn), nr_mfns, > + PAGE_HYPERVISOR_RW | _PAGE_BLOCK); > +if ( rc ) > +panic("Unable to setup the xenheap mappings.\n"); I understand the intent of the code and I like it. maddr_to_virt is implemented as: return (void *)(XENHEAP_VIRT_START - (xenheap_base_pdx << PAGE_SHIFT) + ((ma & ma_va_bottom_mask) | ((ma & ma_top_mask) >> pfn_pdx_hole_shift))); The PDX stuff is always difficult to follow and I cannot claim that I traced through exactly what the resulting virtual address in the mapping would be for a given base_mfn, but the patch looks correct compared to the previous code. Reviewed-by: Stefano Stabellini
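The offsetting of xenheap_virt_start discussed above can be traced with a small worked sketch. It assumes 4KB pages and a 1GB first-level size, and SK_DIRECTMAP_VIRT_START is a hypothetical 1GB-aligned base chosen for the example, not the real Xen constant. It deliberately ignores the PDX compression that makes the real maddr_to_virt() harder to follow.

```c
#include <assert.h>
#include <stdint.h>

#define SK_PAGE_SHIFT 12                        /* assumed 4KB pages */
#define SK_PAGE_SIZE  (UINT64_C(1) << SK_PAGE_SHIFT)
#define SK_FIRST_SIZE (UINT64_C(1) << 30)       /* 1GB first-level entry */
/* Hypothetical direct map base; any 1GB-aligned value works here. */
#define SK_DIRECTMAP_VIRT_START UINT64_C(0x0000008000000000)

/* Virtual address at which the first xenheap frame would be mapped. */
uint64_t sketch_xenheap_virt_start(uint64_t base_mfn)
{
    /* Round the frame number down to the previous 1GB boundary. */
    uint64_t mfn_gb = base_mfn & ~((SK_FIRST_SIZE >> SK_PAGE_SHIFT) - 1);

    /* Give the virtual address the same offset within 1GB. */
    return SK_DIRECTMAP_VIRT_START + (base_mfn - mfn_gb) * SK_PAGE_SIZE;
}

/* Non-zero when va and the machine address share their offset within 1GB. */
int sketch_same_gb_offset(uint64_t va, uint64_t base_mfn)
{
    uint64_t ma = base_mfn << SK_PAGE_SHIFT;

    return (va & (SK_FIRST_SIZE - 1)) == (ma & (SK_FIRST_SIZE - 1));
}
```

For a bank starting at machine address 0x80200000 (frame 0x80200), the sketch places the start of the xenheap 2MB above the direct map base, so virtual and machine addresses stay congruent modulo 1GB and superpage mappings remain possible.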
[xen-unstable test] 169172: tolerable FAIL
flight 169172 xen-unstable real [real] http://logs.test-lab.xenproject.org/osstest/logs/169172/ Failures :-/ but no regressions. Tests which are failing intermittently (not blocking): test-amd64-i386-freebsd10-amd64 19 guest-localmigrate/x10 fail pass in 169163 test-amd64-amd64-xl-qemut-debianhvm-i386-xsm 12 debian-hvm-install fail pass in 169163 Tests which did not succeed, but are not blocking: test-amd64-amd64-xl-qemut-win7-amd64 19 guest-stopfail like 169163 test-armhf-armhf-libvirt 16 saverestore-support-checkfail like 169163 test-amd64-amd64-qemuu-nested-amd 20 debian-hvm-install/l1/l2 fail like 169163 test-amd64-amd64-xl-qemuu-ws16-amd64 19 guest-stopfail like 169163 test-amd64-i386-xl-qemut-ws16-amd64 19 guest-stop fail like 169163 test-amd64-i386-xl-qemut-win7-amd64 19 guest-stop fail like 169163 test-armhf-armhf-libvirt-qcow2 15 saverestore-support-check fail like 169163 test-armhf-armhf-libvirt-raw 15 saverestore-support-checkfail like 169163 test-amd64-i386-xl-qemuu-win7-amd64 19 guest-stop fail like 169163 test-amd64-amd64-xl-qemut-ws16-amd64 19 guest-stopfail like 169163 test-amd64-i386-xl-qemuu-ws16-amd64 19 guest-stop fail like 169163 test-amd64-amd64-xl-qemuu-win7-amd64 19 guest-stopfail like 169163 test-arm64-arm64-xl-seattle 15 migrate-support-checkfail never pass test-arm64-arm64-xl-seattle 16 saverestore-support-checkfail never pass test-amd64-amd64-libvirt-xsm 15 migrate-support-checkfail never pass test-amd64-i386-libvirt-xsm 15 migrate-support-checkfail never pass test-amd64-i386-libvirt 15 migrate-support-checkfail never pass test-amd64-i386-xl-pvshim14 guest-start fail never pass test-amd64-amd64-libvirt 15 migrate-support-checkfail never pass test-arm64-arm64-xl 15 migrate-support-checkfail never pass test-arm64-arm64-xl 16 saverestore-support-checkfail never pass test-arm64-arm64-libvirt-xsm 15 migrate-support-checkfail never pass test-arm64-arm64-libvirt-xsm 16 saverestore-support-checkfail never pass test-arm64-arm64-xl-credit1 15 
migrate-support-checkfail never pass test-arm64-arm64-xl-credit1 16 saverestore-support-checkfail never pass test-arm64-arm64-xl-thunderx 15 migrate-support-checkfail never pass test-arm64-arm64-xl-thunderx 16 saverestore-support-checkfail never pass test-arm64-arm64-xl-xsm 15 migrate-support-checkfail never pass test-arm64-arm64-xl-xsm 16 saverestore-support-checkfail never pass test-amd64-amd64-libvirt-qemuu-debianhvm-amd64-xsm 13 migrate-support-check fail never pass test-amd64-i386-libvirt-qemuu-debianhvm-amd64-xsm 13 migrate-support-check fail never pass test-amd64-i386-libvirt-raw 14 migrate-support-checkfail never pass test-arm64-arm64-libvirt-raw 14 migrate-support-checkfail never pass test-arm64-arm64-libvirt-raw 15 saverestore-support-checkfail never pass test-armhf-armhf-xl-multivcpu 15 migrate-support-checkfail never pass test-armhf-armhf-xl-multivcpu 16 saverestore-support-checkfail never pass test-amd64-amd64-libvirt-vhd 14 migrate-support-checkfail never pass test-arm64-arm64-xl-vhd 14 migrate-support-checkfail never pass test-arm64-arm64-xl-vhd 15 saverestore-support-checkfail never pass test-armhf-armhf-libvirt 15 migrate-support-checkfail never pass test-armhf-armhf-xl-credit2 15 migrate-support-checkfail never pass test-armhf-armhf-xl-credit2 16 saverestore-support-checkfail never pass test-armhf-armhf-xl-rtds 15 migrate-support-checkfail never pass test-armhf-armhf-xl-rtds 16 saverestore-support-checkfail never pass test-arm64-arm64-xl-credit2 15 migrate-support-checkfail never pass test-arm64-arm64-xl-credit2 16 saverestore-support-checkfail never pass test-armhf-armhf-libvirt-qcow2 14 migrate-support-checkfail never pass test-armhf-armhf-xl-arndale 15 migrate-support-checkfail never pass test-armhf-armhf-xl-arndale 16 saverestore-support-checkfail never pass test-armhf-armhf-libvirt-raw 14 migrate-support-checkfail never pass test-armhf-armhf-xl-vhd 14 migrate-support-checkfail never pass test-armhf-armhf-xl-vhd 15 
saverestore-support-checkfail never pass test-armhf-armhf-xl-credit1 15 migrate-support-checkfail never pass test-armhf-armhf-xl-credit1 16 saverestore-support-checkfail never pass test-armhf-armhf-xl 15 migrate-support-checkfail never pass test-armhf-armhf-xl 16 saverestore-support-checkfail never pass test-armhf-armhf-xl-cubietruck 15 migrate-support-checkfail never pass
Re: [PATCH v3 17/19] xen/arm64: mm: Add memory to the boot allocator first
Hi Stefano,

On 05/04/2022 22:50, Stefano Stabellini wrote:
> > +static void __init setup_mm(void)
> > +{
> > +const struct meminfo *banks =
> > +paddr_t ram_start = ~0;
> > +paddr_t ram_end = 0;
> > +paddr_t ram_size = 0;
> > +unsigned int i;
> > +
> > +init_pdx();
> > +
> > +/*
> > + * We need some memory to allocate the page-tables used for the xenheap
> > + * mappings. But some regions may contain memory already allocated
> > + * for other uses (e.g. modules, reserved-memory...).
> > + *
> > + * For simplify add all the free regions in the boot allocator.
> > + */
>
> We currently have:
>
> BUG_ON(nr_bootmem_regions == (PAGE_SIZE / sizeof(struct bootmem_region)));

This has enough space for 256 distinct regions on arm64 (512 regions on
arm32).

> Do you think we should check for the limit in populate_boot_allocator?

This patch doesn't change the number of regions added to the boot
allocator. So if we need to check the limit then I would rather deal
separately (see more below).

> Or there is no need because it is unrealistic to reach it?

I can't say never because history told us on some UEFI systems, there
will be a large number of regions exposed. I haven't heard anyone that
would hit the BUG_ON().

The problem is what do we do if we hit the limit? We could ignore all
the regions after. However, there are potentially a risk there would not
be enough memory to cover the boot memory allocation (regions may be
really small).

So if we ever hit the limit, then I think we should update the boot
allocator.

Cheers,

--
Julien Grall
Re: [PATCH v3 17/19] xen/arm64: mm: Add memory to the boot allocator first
On Mon, 21 Feb 2022, Julien Grall wrote: > From: Julien Grall > > Currently, memory is added to the boot allocator after the xenheap > mappings are done. This will break if the first mapping is more than > 512GB of RAM. > > In addition to that, a follow-up patch will rework setup_xenheap_mappings() > to use smaller mappings (e.g. 2MB, 4KB). So it will be necessary to have > memory in the boot allocator earlier. > > Only free memory (e.g. not reserved or modules) can be added to the boot > allocator. It might be possible that some regions (including the first > one) will have no free memory. > > So we need to add all the free memory to the boot allocator first > and then add do the mappings. > > Signed-off-by: Julien Grall > > --- > Changes in v3: > - Patch added > --- > xen/arch/arm/setup.c | 63 +--- > 1 file changed, 42 insertions(+), 21 deletions(-) > > diff --git a/xen/arch/arm/setup.c b/xen/arch/arm/setup.c > index d5d0792ed48a..777cf96639f5 100644 > --- a/xen/arch/arm/setup.c > +++ b/xen/arch/arm/setup.c > @@ -767,30 +767,18 @@ static void __init setup_mm(void) > init_staticmem_pages(); > } > #else /* CONFIG_ARM_64 */ > -static void __init setup_mm(void) > +static void __init populate_boot_allocator(void) > { > -paddr_t ram_start = ~0; > -paddr_t ram_end = 0; > -paddr_t ram_size = 0; > -int bank; > - > -init_pdx(); > +unsigned int i; > +const struct meminfo *banks = > > -total_pages = 0; > -for ( bank = 0 ; bank < bootinfo.mem.nr_banks; bank++ ) > +for ( i = 0; i < banks->nr_banks; i++ ) > { > -paddr_t bank_start = bootinfo.mem.bank[bank].start; > -paddr_t bank_size = bootinfo.mem.bank[bank].size; > -paddr_t bank_end = bank_start + bank_size; > +const struct membank *bank = >bank[i]; > +paddr_t bank_end = bank->start + bank->size; > paddr_t s, e; > > -ram_size = ram_size + bank_size; > -ram_start = min(ram_start,bank_start); > -ram_end = max(ram_end,bank_end); > - > -setup_xenheap_mappings(bank_start>>PAGE_SHIFT, > bank_size>>PAGE_SHIFT); > - > -s = 
bank_start; > +s = bank->start; > while ( s < bank_end ) > { > paddr_t n = bank_end; > @@ -798,9 +786,7 @@ static void __init setup_mm(void) > e = next_module(s, ); > > if ( e == ~(paddr_t)0 ) > -{ > e = n = bank_end; > -} > > if ( e > bank_end ) > e = bank_end; > @@ -809,6 +795,41 @@ static void __init setup_mm(void) > s = n; > } > } > +} > + > +static void __init setup_mm(void) > +{ > +const struct meminfo *banks = > +paddr_t ram_start = ~0; > +paddr_t ram_end = 0; > +paddr_t ram_size = 0; > +unsigned int i; > + > +init_pdx(); > + > +/* > + * We need some memory to allocate the page-tables used for the xenheap > + * mappings. But some regions may contain memory already allocated > + * for other uses (e.g. modules, reserved-memory...). > + * > + * For simplify add all the free regions in the boot allocator. > + */ We currently have: BUG_ON(nr_bootmem_regions == (PAGE_SIZE / sizeof(struct bootmem_region))); Do you think we should check for the limit in populate_boot_allocator? Or there is no need because it is unrealistic to reach it? > +populate_boot_allocator(); > + > +total_pages = 0; > + > +for ( i = 0; i < banks->nr_banks; i++ ) > +{ > +const struct membank *bank = >bank[i]; > +paddr_t bank_end = bank->start + bank->size; > + > +ram_size = ram_size + bank->size; > +ram_start = min(ram_start, bank->start); > +ram_end = max(ram_end, bank_end); > + > +setup_xenheap_mappings(PFN_DOWN(bank->start), > + PFN_DOWN(bank->size)); > +} > > total_pages += ram_size >> PAGE_SHIFT; > > -- > 2.32.0 >
Re: [PATCH v3 13/19] xen/arm: Move fixmap definitions in a separate header
Hi Stefano,

On 05/04/2022 22:12, Stefano Stabellini wrote:
> > +/* Map a page in a fixmap entry */
> > +extern void set_fixmap(unsigned map, mfn_t mfn, unsigned attributes);
> > +/* Remove a mapping from a fixmap entry */
> > +extern void clear_fixmap(unsigned map);
> > +
> > +#endif /* __ASSEMBLY__ */
> > +
> > +#endif /* __ASM_FIXMAP_H */
>
> It is a good idea to create fixmap.h, but I think it should be acpi.h to
> include fixmap.h, not the other way around.

As I wrote in the commit message, one definition in fixmap.h rely on
define from acpi.h (i.e NUM_FIXMAP_ACPI_PAGES). So if we don't include
it, then user of FIXMAP_PMAP_BEGIN (see next patch) will requires to
include acpi.h in order to build.

Re-ordering the values would not help because the problem would exactly
be the same but this time the acpi users would have to include pmap.h to
define NUM_FIX_PMAP.

> The appended changes build correctly on top of this patch.

That's expected because all the users of FIXMAP_ACPI_END will be
including acpi.h. But after the next patch, we would need pmap.c to
include acpi.h.

I don't think this would be right (and quite likely you would ask why
this is done). Hence this approach.

Cheers,

--
Julien Grall
Re: [PATCH v3 16/19] xen/arm: mm: Use the PMAP helpers in xen_{,un}map_table()
On Mon, 21 Feb 2022, Julien Grall wrote: > From: Julien Grall > > During early boot, it is not possible to use xen_{,un}map_table() > if the page tables are not residing the Xen binary. > > This is a blocker to switch some of the helpers to use xen_pt_update() > as we may need to allocate extra page tables and access them before > the domheap has been initialized (see setup_xenheap_mappings()). > > xen_{,un}map_table() are now updated to use the PMAP helpers for early > boot map/unmap. Note that the special case for page-tables residing > in Xen binary has been dropped because it is "complex" and was > only added as a workaround in 8d4f1b8878e0 ("xen/arm: mm: Allow > generic xen page-tables helpers to be called early"). > > Signed-off-by: Julien Grall Reviewed-by: Stefano Stabellini > --- > Changes in v2: > - New patch > --- > xen/arch/arm/mm.c | 33 + > 1 file changed, 9 insertions(+), 24 deletions(-) > > diff --git a/xen/arch/arm/mm.c b/xen/arch/arm/mm.c > index 659bdf25e0ff..11b6b60a2bc1 100644 > --- a/xen/arch/arm/mm.c > +++ b/xen/arch/arm/mm.c > @@ -25,6 +25,7 @@ > #include > #include > #include > +#include > #include > #include > #include > @@ -964,27 +965,11 @@ void *ioremap(paddr_t pa, size_t len) > static lpae_t *xen_map_table(mfn_t mfn) > { > /* > - * We may require to map the page table before map_domain_page() is > - * useable. The requirements here is it must be useable as soon as > - * page-tables are allocated dynamically via alloc_boot_pages(). > - * > - * We need to do the check on physical address rather than virtual > - * address to avoid truncation on Arm32. Therefore is_kernel() cannot > - * be used. > + * During early boot, map_domain_page() may be unusable. Use the > + * PMAP to map temporarily a page-table. > */ > if ( system_state == SYS_STATE_early_boot ) > -{ > -if ( is_xen_fixed_mfn(mfn) ) > -{ > -/* > - * It is fine to demote the type because the size of Xen > - * will always fit in vaddr_t. 
> - */ > -vaddr_t offset = mfn_to_maddr(mfn) - virt_to_maddr(&_start); > - > -return (lpae_t *)(XEN_VIRT_START + offset); > -} > -} > +return pmap_map(mfn); > > return map_domain_page(mfn); > } > @@ -993,12 +978,12 @@ static void xen_unmap_table(const lpae_t *table) > { > /* > * During early boot, xen_map_table() will not use map_domain_page() > - * for page-tables residing in Xen binary. So skip the unmap part. > + * but the PMAP. > */ > -if ( system_state == SYS_STATE_early_boot && is_kernel(table) ) > -return; > - > -unmap_domain_page(table); > +if ( system_state == SYS_STATE_early_boot ) > +pmap_unmap(table); > +else > +unmap_domain_page(table); > } > > static int create_xen_table(lpae_t *entry) > -- > 2.32.0 >
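The PMAP helpers that xen_map_table() and xen_unmap_table() now rely on during early boot hand out a small, fixed pool of fixmap slots. The following is a minimal sketch of that kind of slot bookkeeping, not the code from xen/common/pmap.c: the sk_ names are hypothetical, and only NUM_FIX_PMAP (8) is taken from the thread. Like the real infrastructure as described in the series, it has no locking and so is only suitable for single-CPU early boot.

```c
#include <assert.h>

#define NUM_FIX_PMAP 8 /* slot count used in the series */

/* One bit per slot; no locking, so single-CPU use only. */
static unsigned long sk_inuse;

/* Claim the lowest free slot; returns its index, or -1 if all are busy. */
int sk_pmap_claim(void)
{
    for (int i = 0; i < NUM_FIX_PMAP; i++) {
        if (!(sk_inuse & (1UL << i))) {
            sk_inuse |= 1UL << i;
            return i;
        }
    }
    return -1;
}

/* Release a slot previously returned by sk_pmap_claim(). */
void sk_pmap_release(int slot)
{
    assert(slot >= 0 && slot < NUM_FIX_PMAP);
    assert(sk_inuse & (1UL << slot)); /* catch double release */
    sk_inuse &= ~(1UL << slot);
}
```

A real implementation would additionally write or clear the page-table entry for the claimed slot and convert the slot index to a virtual address; the sketch leaves that out to keep the bookkeeping visible.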
[linux-linus test] 169174: tolerable FAIL - PUSHED
flight 169174 linux-linus real [real] http://logs.test-lab.xenproject.org/osstest/logs/169174/ Failures :-/ but no regressions. Tests which did not succeed, but are not blocking: test-armhf-armhf-libvirt-raw 15 saverestore-support-check fail blocked in 169145 test-amd64-amd64-xl-qemut-win7-amd64 19 guest-stopfail like 169145 test-armhf-armhf-libvirt 16 saverestore-support-checkfail like 169145 test-amd64-amd64-xl-qemuu-ws16-amd64 19 guest-stopfail like 169145 test-amd64-amd64-qemuu-nested-amd 20 debian-hvm-install/l1/l2 fail like 169145 test-amd64-amd64-xl-qemut-ws16-amd64 19 guest-stopfail like 169145 test-amd64-amd64-xl-qemuu-win7-amd64 19 guest-stopfail like 169145 test-armhf-armhf-libvirt-qcow2 15 saverestore-support-check fail like 169145 test-amd64-amd64-libvirt 15 migrate-support-checkfail never pass test-arm64-arm64-xl-seattle 15 migrate-support-checkfail never pass test-arm64-arm64-xl-seattle 16 saverestore-support-checkfail never pass test-amd64-amd64-libvirt-xsm 15 migrate-support-checkfail never pass test-arm64-arm64-xl-credit2 15 migrate-support-checkfail never pass test-arm64-arm64-xl-credit2 16 saverestore-support-checkfail never pass test-arm64-arm64-xl 15 migrate-support-checkfail never pass test-arm64-arm64-xl 16 saverestore-support-checkfail never pass test-arm64-arm64-xl-xsm 15 migrate-support-checkfail never pass test-arm64-arm64-xl-xsm 16 saverestore-support-checkfail never pass test-arm64-arm64-xl-credit1 15 migrate-support-checkfail never pass test-arm64-arm64-xl-thunderx 15 migrate-support-checkfail never pass test-arm64-arm64-xl-credit1 16 saverestore-support-checkfail never pass test-arm64-arm64-xl-thunderx 16 saverestore-support-checkfail never pass test-arm64-arm64-libvirt-xsm 15 migrate-support-checkfail never pass test-arm64-arm64-libvirt-xsm 16 saverestore-support-checkfail never pass test-amd64-amd64-libvirt-qemuu-debianhvm-amd64-xsm 13 migrate-support-check fail never pass test-amd64-amd64-libvirt-qcow2 14 migrate-support-checkfail 
never pass
 test-armhf-armhf-xl-arndale    15 migrate-support-check        fail   never pass
 test-armhf-armhf-xl-arndale    16 saverestore-support-check    fail   never pass
 test-amd64-amd64-libvirt-raw   14 migrate-support-check        fail   never pass
 test-arm64-arm64-libvirt-raw   14 migrate-support-check        fail   never pass
 test-arm64-arm64-libvirt-raw   15 saverestore-support-check    fail   never pass
 test-arm64-arm64-xl-vhd        14 migrate-support-check        fail   never pass
 test-arm64-arm64-xl-vhd        15 saverestore-support-check    fail   never pass
 test-armhf-armhf-xl-credit2    15 migrate-support-check        fail   never pass
 test-armhf-armhf-xl-credit2    16 saverestore-support-check    fail   never pass
 test-armhf-armhf-xl            15 migrate-support-check        fail   never pass
 test-armhf-armhf-xl            16 saverestore-support-check    fail   never pass
 test-armhf-armhf-libvirt       15 migrate-support-check        fail   never pass
 test-armhf-armhf-xl-cubietruck 15 migrate-support-check        fail   never pass
 test-armhf-armhf-xl-cubietruck 16 saverestore-support-check    fail   never pass
 test-armhf-armhf-xl-rtds       15 migrate-support-check        fail   never pass
 test-armhf-armhf-xl-rtds       16 saverestore-support-check    fail   never pass
 test-armhf-armhf-libvirt-raw   14 migrate-support-check        fail   never pass
 test-armhf-armhf-xl-vhd        14 migrate-support-check        fail   never pass
 test-armhf-armhf-xl-vhd        15 saverestore-support-check    fail   never pass
 test-armhf-armhf-libvirt-qcow2 14 migrate-support-check        fail   never pass
 test-armhf-armhf-xl-multivcpu  15 migrate-support-check        fail   never pass
 test-armhf-armhf-xl-multivcpu  16 saverestore-support-check    fail   never pass
 test-armhf-armhf-xl-credit1    15 migrate-support-check        fail   never pass
 test-armhf-armhf-xl-credit1    16 saverestore-support-check    fail   never pass

version targeted for testing:
 linux                3123109284176b1532874591f7c81f3837bbdc17
baseline version:
 linux                09bb8856d4a7cf3128dedd79cd07d75bbf4a9f04

Last test of basis   169145  2022-04-03 20:41:35 Z    2 days
Testing same since   169157  2022-04-04 06:23:01 Z    1 days    3 attempts

People who touched revisions under test:
  Linus Torvalds

jobs:
 build-amd64-xsm                                              pass
 build-arm64-xsm                                              pass
 build-i386-xsm                                               pass
 build-amd64                                                  pass
 build-arm64                                                  pass
Re: [PATCH v3 15/19] xen/arm: mm: Clean-up the includes and order them
On Mon, 21 Feb 2022, Julien Grall wrote: > From: Julien Grall > > The numbers of includes in mm.c has been growing quite a lot. However > some of them (e.g. xen/device_tree.h, xen/softirq.h) doesn't look > to be directly used by the file or other will be included by > larger headers (e.g asm/flushtlb.h will be included by xen/mm.h). > > So trim down the number of includes. Take the opportunity to order > them with the xen headers first, then asm headers and last public > headers. > > Signed-off-by: Julien Grall I'll trust you on this one :-) Acked-by: Stefano Stabellini > --- > Changes in v3: > - Patch added > --- > xen/arch/arm/mm.c | 27 ++- > 1 file changed, 10 insertions(+), 17 deletions(-) > > diff --git a/xen/arch/arm/mm.c b/xen/arch/arm/mm.c > index b7942464d4de..659bdf25e0ff 100644 > --- a/xen/arch/arm/mm.c > +++ b/xen/arch/arm/mm.c > @@ -17,33 +17,26 @@ > * GNU General Public License for more details. > */ > > -#include > -#include > -#include > -#include > -#include > -#include > +#include > #include > #include > -#include > -#include > #include > -#include > -#include > -#include > -#include > -#include > -#include > +#include > +#include > +#include > +#include > #include > +#include > +#include > #include > + > #include > -#include > -#include > -#include > > #include > #include > > +#include > + > /* Override macros from asm/page.h to make them work with mfn_t */ > #undef virt_to_mfn > #define virt_to_mfn(va) _mfn(__virt_to_mfn(va)) > -- > 2.32.0 >
Re: [PATCH v3 14/19] xen/arm: add Persistent Map (PMAP) infrastructure
On Mon, 21 Feb 2022, Julien Grall wrote: > From: Wei Liu > > The basic idea is like Persistent Kernel Map (PKMAP) in Linux. We > pre-populate all the relevant page tables before the system is fully > set up. > > We will need it on Arm in order to rework the arm64 version of > xenheap_setup_mappings() as we may need to use pages allocated from > the boot allocator before they are effectively mapped. > > This infrastructure is not lock-protected therefore can only be used > before smpboot. After smpboot, map_domain_page() has to be used. > > This is based on the x86 version [1] that was originally implemented > by Wei Liu. > > The PMAP infrastructure is implemented in common code with some > arch helpers to set/clear the page-table entries and convertion > between a fixmap slot to a virtual address... > > As mfn_to_xen_entry() now needs to be exported, take the opportunity > to swich the parameter attr from unsigned to unsigned int. > > [1] > > > Signed-off-by: Wei Liu > Signed-off-by: Hongyan Xia > [julien: Adapted for Arm] > Signed-off-by: Julien Grall > > --- > Changes in v3: > - s/BITS_PER_LONG/BITS_PER_BYTE/ > - Move pmap to common code > > Changes in v2: > - New patch > > Cc: Jan Beulich > Cc: Wei Liu > Cc: Andrew Cooper > Cc: Roger Pau Monné > --- > xen/arch/arm/Kconfig | 1 + > xen/arch/arm/include/asm/fixmap.h | 17 +++ > xen/arch/arm/include/asm/lpae.h | 8 > xen/arch/arm/include/asm/pmap.h | 33 + > xen/arch/arm/mm.c | 7 +-- > xen/common/Kconfig| 3 ++ > xen/common/Makefile | 1 + > xen/common/pmap.c | 79 +++ > xen/include/xen/pmap.h| 16 +++ > 9 files changed, 159 insertions(+), 6 deletions(-) > create mode 100644 xen/arch/arm/include/asm/pmap.h > create mode 100644 xen/common/pmap.c > create mode 100644 xen/include/xen/pmap.h > > diff --git a/xen/arch/arm/Kconfig b/xen/arch/arm/Kconfig > index ecfa6822e4d3..a89a67802aa9 100644 > --- a/xen/arch/arm/Kconfig > +++ b/xen/arch/arm/Kconfig > @@ -14,6 +14,7 @@ config ARM > select HAS_DEVICE_TREE > select 
HAS_PASSTHROUGH > select HAS_PDX > + select HAS_PMAP > select IOMMU_FORCE_PT_SHARE > > config ARCH_DEFCONFIG > diff --git a/xen/arch/arm/include/asm/fixmap.h > b/xen/arch/arm/include/asm/fixmap.h > index 1cee51e52ab9..c46a15e59de4 100644 > --- a/xen/arch/arm/include/asm/fixmap.h > +++ b/xen/arch/arm/include/asm/fixmap.h > @@ -5,12 +5,20 @@ > #define __ASM_FIXMAP_H > > #include > +#include > > /* Fixmap slots */ > #define FIXMAP_CONSOLE 0 /* The primary UART */ > #define FIXMAP_MISC 1 /* Ephemeral mappings of hardware */ > #define FIXMAP_ACPI_BEGIN 2 /* Start mappings of ACPI tables */ > #define FIXMAP_ACPI_END(FIXMAP_ACPI_BEGIN + NUM_FIXMAP_ACPI_PAGES - 1) > /* End mappings of ACPI tables */ > +#define FIXMAP_PMAP_BEGIN (FIXMAP_ACPI_END + 1) /* Start of PMAP */ > +#define FIXMAP_PMAP_END (FIXMAP_PMAP_BEGIN + NUM_FIX_PMAP - 1) /* End of > PMAP */ > + > +#define FIXMAP_LAST FIXMAP_PMAP_END > + > +#define FIXADDR_START FIXMAP_ADDR(0) > +#define FIXADDR_TOP FIXMAP_ADDR(FIXMAP_LAST) > > #ifndef __ASSEMBLY__ > > @@ -19,6 +27,15 @@ extern void set_fixmap(unsigned map, mfn_t mfn, unsigned > attributes); > /* Remove a mapping from a fixmap entry */ > extern void clear_fixmap(unsigned map); > > +#define fix_to_virt(slot) ((void *)FIXMAP_ADDR(slot)) > + > +static inline unsigned int virt_to_fix(vaddr_t vaddr) > +{ > +BUG_ON(vaddr >= FIXADDR_TOP || vaddr < FIXADDR_START); > + > +return ((vaddr - FIXADDR_START) >> PAGE_SHIFT); > +} > + > #endif /* __ASSEMBLY__ */ > > #endif /* __ASM_FIXMAP_H */ > diff --git a/xen/arch/arm/include/asm/lpae.h b/xen/arch/arm/include/asm/lpae.h > index 8cf932b5c947..6099037da1c0 100644 > --- a/xen/arch/arm/include/asm/lpae.h > +++ b/xen/arch/arm/include/asm/lpae.h > @@ -4,6 +4,7 @@ > #ifndef __ASSEMBLY__ > > #include > +#include > > /* > * WARNING! 
Unlike the x86 pagetable code, where l1 is the lowest level and > @@ -168,6 +169,13 @@ static inline bool lpae_is_superpage(lpae_t pte, > unsigned int level) > third_table_offset(addr)\ > } > > +/* > + * Standard entry type that we'll use to build Xen's own pagetables. > + * We put the same permissions at every level, because they're ignored > + * by the walker in non-leaf entries. > + */ > +lpae_t mfn_to_xen_entry(mfn_t mfn, unsigned int attr); > + > #endif /* __ASSEMBLY__ */ > > /* > diff --git a/xen/arch/arm/include/asm/pmap.h b/xen/arch/arm/include/asm/pmap.h > new file mode 100644 > index ..70eafe2891d7 > --- /dev/null > +++ b/xen/arch/arm/include/asm/pmap.h > @@ -0,0 +1,33 @@ > +#ifndef __ASM_PMAP_H__ > +#define __ASM_PMAP_H__ > + > +#include > + > +/* XXX: Find an header to declare it */ > +extern lpae_t xen_fixmap[XEN_PT_LPAE_ENTRIES];
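The fix_to_virt() / virt_to_fix() helpers quoted above are plain page arithmetic: slots are consecutive pages, so converting between a slot index and its virtual address is a shift plus an offset. The following is a minimal standalone sketch of that round trip, not the Xen code. The base address, the slot count, and the names FIXMAP_BASE, NR_FIXMAP_SLOTS, fixmap_addr() and virt_to_fix_checked() are all assumptions made up for illustration, and the range check returns -1 instead of using BUG_ON().

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical values, chosen only for illustration (not Xen's layout). */
#define PAGE_SHIFT       12
#define PAGE_SIZE        (1UL << PAGE_SHIFT)
#define FIXMAP_BASE      0x00400000UL   /* assumed start of the fixmap area */
#define NR_FIXMAP_SLOTS  16U            /* assumed number of fixmap slots */

typedef uintptr_t vaddr_t;

/* Slot index -> virtual address: slot N lives N pages above the base. */
static inline vaddr_t fixmap_addr(unsigned int slot)
{
    return FIXMAP_BASE + ((vaddr_t)slot << PAGE_SHIFT);
}

/*
 * Virtual address -> slot index. Returns -1 when the address is outside
 * the (assumed) fixmap range; this error convention is the sketch's own.
 */
static inline int virt_to_fix_checked(vaddr_t vaddr)
{
    if ( vaddr < fixmap_addr(0) || vaddr >= fixmap_addr(NR_FIXMAP_SLOTS) )
        return -1;

    return (int)((vaddr - fixmap_addr(0)) >> PAGE_SHIFT);
}
```

Any address inside a slot's page maps back to that slot, which is why the conversion only needs the shift.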
Re: [PATCH v3 13/19] xen/arm: Move fixmap definitions in a separate header
On Mon, 21 Feb 2022, Julien Grall wrote: > From: Julien Grall > > To use properly the fixmap definitions, their user would need > also new to include . This is not very great when > the user itself is not meant to directly use ACPI definitions. > > Including in is not option because > the latter header is included by everyone. So move out the fixmap > entries definition in a new header. > > Take the opportunity to also move {set, clear}_fixmap() prototypes > in the new header. > > Note that most of the definitions in now need to be > surrounded with #ifndef __ASSEMBLY__ because will > be used in assembly (see EARLY_UART_VIRTUAL_ADDRESS). > > The split will become more helpful in a follow-up patch where new > fixmap entries will be defined. > > Signed-off-by: Julien Grall > > --- > Changes in v3: > - Patch added > --- > xen/arch/arm/acpi/lib.c | 2 ++ > xen/arch/arm/include/asm/config.h | 6 -- > xen/arch/arm/include/asm/early_printk.h | 1 + > xen/arch/arm/include/asm/fixmap.h | 24 > xen/arch/arm/include/asm/mm.h | 4 > xen/arch/arm/kernel.c | 1 + > xen/arch/arm/mm.c | 1 + > xen/include/xen/acpi.h | 18 +++--- > 8 files changed, 40 insertions(+), 17 deletions(-) > create mode 100644 xen/arch/arm/include/asm/fixmap.h > > diff --git a/xen/arch/arm/acpi/lib.c b/xen/arch/arm/acpi/lib.c > index a59cc4074cfb..41d521f720ac 100644 > --- a/xen/arch/arm/acpi/lib.c > +++ b/xen/arch/arm/acpi/lib.c > @@ -25,6 +25,8 @@ > #include > #include > > +#include > + > static bool fixmap_inuse; > > char *__acpi_map_table(paddr_t phys, unsigned long size) > diff --git a/xen/arch/arm/include/asm/config.h > b/xen/arch/arm/include/asm/config.h > index 85d4a510ce8a..51908bf9422c 100644 > --- a/xen/arch/arm/include/asm/config.h > +++ b/xen/arch/arm/include/asm/config.h > @@ -175,12 +175,6 @@ > > #endif > > -/* Fixmap slots */ > -#define FIXMAP_CONSOLE 0 /* The primary UART */ > -#define FIXMAP_MISC 1 /* Ephemeral mappings of hardware */ > -#define FIXMAP_ACPI_BEGIN 2 /* Start mappings of ACPI 
tables */ > -#define FIXMAP_ACPI_END(FIXMAP_ACPI_BEGIN + NUM_FIXMAP_ACPI_PAGES - 1) > /* End mappings of ACPI tables */ > - > #define NR_hypercalls 64 > > #define STACK_ORDER 3 > diff --git a/xen/arch/arm/include/asm/early_printk.h > b/xen/arch/arm/include/asm/early_printk.h > index 8dc911cf48a3..c5149b2976da 100644 > --- a/xen/arch/arm/include/asm/early_printk.h > +++ b/xen/arch/arm/include/asm/early_printk.h > @@ -11,6 +11,7 @@ > #define __ARM_EARLY_PRINTK_H__ > > #include > +#include > > #ifdef CONFIG_EARLY_PRINTK > > diff --git a/xen/arch/arm/include/asm/fixmap.h > b/xen/arch/arm/include/asm/fixmap.h > new file mode 100644 > index ..1cee51e52ab9 > --- /dev/null > +++ b/xen/arch/arm/include/asm/fixmap.h > @@ -0,0 +1,24 @@ > +/* > + * fixmap.h: compile-time virtual memory allocation > + */ > +#ifndef __ASM_FIXMAP_H > +#define __ASM_FIXMAP_H > + > +#include > + > +/* Fixmap slots */ > +#define FIXMAP_CONSOLE 0 /* The primary UART */ > +#define FIXMAP_MISC 1 /* Ephemeral mappings of hardware */ > +#define FIXMAP_ACPI_BEGIN 2 /* Start mappings of ACPI tables */ > +#define FIXMAP_ACPI_END(FIXMAP_ACPI_BEGIN + NUM_FIXMAP_ACPI_PAGES - 1) > /* End mappings of ACPI tables */ > + > +#ifndef __ASSEMBLY__ > + > +/* Map a page in a fixmap entry */ > +extern void set_fixmap(unsigned map, mfn_t mfn, unsigned attributes); > +/* Remove a mapping from a fixmap entry */ > +extern void clear_fixmap(unsigned map); > + > +#endif /* __ASSEMBLY__ */ > + > +#endif /* __ASM_FIXMAP_H */ It is a good idea to create fixmap.h, but I think it should be acpi.h to include fixmap.h, not the other way around. The appended changes build correctly on top of this patch. 
diff --git a/xen/arch/arm/include/asm/fixmap.h b/xen/arch/arm/include/asm/fixmap.h index 1cee51e52a..8cf9dbb618 100644 --- a/xen/arch/arm/include/asm/fixmap.h +++ b/xen/arch/arm/include/asm/fixmap.h @@ -4,8 +4,6 @@ #ifndef __ASM_FIXMAP_H #define __ASM_FIXMAP_H -#include - /* Fixmap slots */ #define FIXMAP_CONSOLE 0 /* The primary UART */ #define FIXMAP_MISC 1 /* Ephemeral mappings of hardware */ @@ -14,6 +12,8 @@ #ifndef __ASSEMBLY__ +#include + /* Map a page in a fixmap entry */ extern void set_fixmap(unsigned map, mfn_t mfn, unsigned attributes); /* Remove a mapping from a fixmap entry */ diff --git a/xen/include/xen/acpi.h b/xen/include/xen/acpi.h index 1b9c75e68f..148673e77c 100644 --- a/xen/include/xen/acpi.h +++ b/xen/include/xen/acpi.h @@ -28,6 +28,8 @@ #define _LINUX #endif +#include + /* * Fixmap pages to reserve for ACPI boot-time tables (see * arch/x86/include/asm/fixmap.h or arch/arm/include/asm/fixmap.h),
Re: [PATCH v3 12/19] xen/arm: mm: Allow page-table allocation from the boot allocator
On Mon, 21 Feb 2022, Julien Grall wrote:
> From: Julien Grall
>
> At the moment, page-table can only be allocated from domheap. This means
> it is not possible to create mapping in the page-tables via
> map_pages_to_xen() if page-table needs to be allocated.
>
> In order to avoid open-coding page-tables update in early boot, we need
> to be able to allocate page-tables much earlier. Thankfully, we have the
> boot allocator for those cases.
>
> create_xen_table() is updated to cater early boot allocation by using
> alloc_boot_pages().
>
> Note, this is not sufficient to bootstrap the page-tables (i.e mapping
> before any memory is actually mapped). This will be addressed
> separately.
>
> Signed-off-by: Julien Grall
> Signed-off-by: Julien Grall

Reviewed-by: Stefano Stabellini

> ---
> Changes in v2:
>     - New patch
> ---
>  xen/arch/arm/mm.c | 20 ++++++++++++++------
>  1 file changed, 14 insertions(+), 6 deletions(-)
>
> diff --git a/xen/arch/arm/mm.c b/xen/arch/arm/mm.c
> index 58364bb6c820..f70b8cc7ce87 100644
> --- a/xen/arch/arm/mm.c
> +++ b/xen/arch/arm/mm.c
> @@ -1014,19 +1014,27 @@ static void xen_unmap_table(const lpae_t *table)
>
>  static int create_xen_table(lpae_t *entry)
>  {
> -    struct page_info *pg;
> +    mfn_t mfn;
>      void *p;
>      lpae_t pte;
>
> -    pg = alloc_domheap_page(NULL, 0);
> -    if ( pg == NULL )
> -        return -ENOMEM;
> +    if ( system_state != SYS_STATE_early_boot )
> +    {
> +        struct page_info *pg = alloc_domheap_page(NULL, 0);
> +
> +        if ( pg == NULL )
> +            return -ENOMEM;
> +
> +        mfn = page_to_mfn(pg);
> +    }
> +    else
> +        mfn = alloc_boot_pages(1, 1);
>
> -    p = xen_map_table(page_to_mfn(pg));
> +    p = xen_map_table(mfn);
>      clear_page(p);
>      xen_unmap_table(p);
>
> -    pte = mfn_to_xen_entry(page_to_mfn(pg), MT_NORMAL);
> +    pte = mfn_to_xen_entry(mfn, MT_NORMAL);
>      pte.pt.table = 1;
>      write_pte(entry, pte);
Re: [PATCH v3 07/19] xen/arm: mm: Don't open-code Xen PT update in remove_early_mappings()
On Sat, 2 Apr 2022, Julien Grall wrote:
> On 02/04/2022 01:04, Stefano Stabellini wrote:
> > On Mon, 21 Feb 2022, Julien Grall wrote:
> > > From: Julien Grall
> > >
> > > Now that xen_pt_update_entry() is able to deal with different mapping
> > > size, we can replace the open-coding of the page-tables update by a call
> > > to modify_xen_mappings().
> > >
> > > As the function is not meant to fail, a BUG_ON() is added to check the
> > > return.
> > >
> > > Signed-off-by: Julien Grall
> > > Signed-off-by: Julien Grall
> >
> > Nice!
> >
> >
> > > ---
> > >     Changes in v2:
> > >         - Stay consistent with how function name are used in the commit
> > >         message
> > >         - Add my AWS signed-off-by
> > > ---
> > >  xen/arch/arm/mm.c | 10 +++++-----
> > >  1 file changed, 5 insertions(+), 5 deletions(-)
> > >
> > > diff --git a/xen/arch/arm/mm.c b/xen/arch/arm/mm.c
> > > index 7b4b9de8693e..f088a4b2de96 100644
> > > --- a/xen/arch/arm/mm.c
> > > +++ b/xen/arch/arm/mm.c
> > > @@ -599,11 +599,11 @@ void * __init early_fdt_map(paddr_t fdt_paddr)
> > >  void __init remove_early_mappings(void)
> > >  {
> > > -    lpae_t pte = {0};
> > > -    write_pte(xen_second + second_table_offset(BOOT_FDT_VIRT_START), pte);
> > > -    write_pte(xen_second + second_table_offset(BOOT_FDT_VIRT_START + SZ_2M),
> > > -              pte);
> > > -    flush_xen_tlb_range_va(BOOT_FDT_VIRT_START, BOOT_FDT_SLOT_SIZE);
> > > +    int rc;
> > > +
> > > +    rc = modify_xen_mappings(BOOT_FDT_VIRT_START, BOOT_FDT_VIRT_END,
> > > +                             _PAGE_BLOCK);
> > > +    BUG_ON(rc);
> >
> > Am I right that we are actually destroying the mapping, which usually is
> > done by calling destroy_xen_mappings, but we cannot call
> > destroy_xen_mappings in this case because it doesn't take a flags
> > parameter?
>
> You are right.
>
> >
> > If so, then I would add a flags parameter to destroy_xen_mappings
> > instead of calling modify_xen_mappings just to pass _PAGE_BLOCK.
> > But I don't feel strongly about it so if you don't feel like making the
> > change to destroy_xen_mappings, you can add my acked-by here anyway.
>
> destroy_xen_mappings() is a function used by common code. This is the only
> place so far where I need to pass _PAGE_BLOCK and I don't expect it to be used
> by the common code any time soon.
>
> So I am not in favor of adding an extra parameter to destroy_xen_mappings().
>
> Would you prefer if I open-code the call to xen_pt_update?

No need, just add a one-line in-code comment like:

/* destroy the _PAGE_BLOCK mapping */
Re: [PATCH v3 06/19] xen/arm: mm: Avoid flushing the TLBs when mapping are inserted
On Sat, 2 Apr 2022, Julien Grall wrote:
> Hi Stefano,
>
> On 02/04/2022 01:00, Stefano Stabellini wrote:
> > On Mon, 21 Feb 2022, Julien Grall wrote:
> > > From: Julien Grall
> > >
> > > Currently, the function xen_pt_update() will flush the TLBs even when
> > > the mappings are inserted. This is a bit wasteful because we don't
> > > allow mapping replacement. Even if we were, the flush would need to
> > > happen earlier because mapping replacement should use Break-Before-Make
> > > when updating the entry.
> > >
> > > A single call to xen_pt_update() can perform a single action. IOW, it
> > > is not possible to, for instance, mix inserting and removing mappings.
> > > Therefore, we can use `flags` to determine what action is performed.
> > >
> > > This change will particularly help to limit the impact of switching
> > > boot time mapping to use xen_pt_update().
> > >
> > > Signed-off-by: Julien Grall
> > >
> > > ---
> > >     Changes in v2:
> > >         - New patch
> > > ---
> > >  xen/arch/arm/mm.c | 17 ++++++++++++++---
> > >  1 file changed, 14 insertions(+), 3 deletions(-)
> > >
> > > diff --git a/xen/arch/arm/mm.c b/xen/arch/arm/mm.c
> > > index fd16c1541ce2..7b4b9de8693e 100644
> > > --- a/xen/arch/arm/mm.c
> > > +++ b/xen/arch/arm/mm.c
> > > @@ -1104,7 +1104,13 @@ static bool xen_pt_check_entry(lpae_t entry, mfn_t mfn, unsigned int level,
> > >          /* We should be here with a valid MFN. */
> > >          ASSERT(!mfn_eq(mfn, INVALID_MFN));
> > > -        /* We don't allow replacing any valid entry. */
> > > +        /*
> > > +         * We don't allow replacing any valid entry.
> > > +         *
> > > +         * Note that the function xen_pt_update() relies on this
> > > +         * assumption and will skip the TLB flush. The function will need
> > > +         * to be updated if the check is relaxed.
> > > +         */
> > >          if ( lpae_is_valid(entry) )
> > >          {
> > >              if ( lpae_is_mapping(entry, level) )
> > > @@ -1417,11 +1423,16 @@ static int xen_pt_update(unsigned long virt,
> > >      }
> > >      /*
> > > -     * Flush the TLBs even in case of failure because we may have
> > > +     * The TLBs flush can be safely skipped when a mapping is inserted
> > > +     * as we don't allow mapping replacement (see xen_pt_check_entry()).
> > > +     *
> > > +     * For all the other cases, the TLBs will be flushed unconditionally
> > > +     * even if the mapping has failed. This is because we may have
> > >       * partially modified the PT. This will prevent any unexpected
> > >       * behavior afterwards.
> > >       */
> > > -    flush_xen_tlb_range_va(virt, PAGE_SIZE * nr_mfns);
> > > +    if ( !(flags & _PAGE_PRESENT) || mfn_eq(mfn, INVALID_MFN) )
> > > +        flush_xen_tlb_range_va(virt, PAGE_SIZE * nr_mfns);
> >
> > I am trying to think of a case where the following wouldn't be enough
> > but I cannot come up with one:
> >
> >     if ( mfn_eq(mfn, INVALID_MFN) )
> >         flush_xen_tlb_range_va(virt, PAGE_SIZE * nr_mfns);
>
> _PAGE_PRESENT is not set for two cases: when removing a page or populating
> page-tables for a region. Both of them will expect an INVALID_MFN (see the two
> asserts in xen_pt_check_entry()).
>
> Therefore your solution should work. However, technically the 'mfn' is ignored
> in both situations (hence why this is an ASSERT() rather than a prod check).
>
> Also, I feel it is better to flush more than less (missing a flush could have
> catastrophic results). So I chose to be explicit in which case the flush can be
> skipped.
>
> Maybe it would be clearer if I write:
>
>  !((flags & _PAGE_PRESENT) && !mfn_eq(mfn, INVALID_MFN))

It is not so much a matter of clarity -- I just wanted to check with you
the reasons for the if condition because, as you wrote, wrong tlb
flushes can have catastrophic effects.

That said, actually I prefer your second version:

  !((flags & _PAGE_PRESENT) && !mfn_eq(mfn, INVALID_MFN))
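The two forms of the flush condition discussed in this message are related by De Morgan's law, so they select exactly the same cases. The standalone sketch below checks that exhaustively over the four input combinations. It is only an illustration: PAGE_PRESENT_FLAG stands in for _PAGE_PRESENT, the boolean mfn_is_invalid stands in for mfn_eq(mfn, INVALID_MFN), and the function names are made up.

```c
#include <assert.h>
#include <stdbool.h>

#define PAGE_PRESENT_FLAG 0x1U   /* stand-in for _PAGE_PRESENT (assumed value) */

/* Condition as written in the quoted patch. */
static bool need_flush_v1(unsigned int flags, bool mfn_is_invalid)
{
    return !(flags & PAGE_PRESENT_FLAG) || mfn_is_invalid;
}

/* Rewritten form: flush unless a present mapping with a valid MFN is inserted. */
static bool need_flush_v2(unsigned int flags, bool mfn_is_invalid)
{
    return !((flags & PAGE_PRESENT_FLAG) && !mfn_is_invalid);
}

/* Returns true when both forms agree on every input combination. */
static bool flush_conditions_agree(void)
{
    for ( unsigned int present = 0; present <= 1; present++ )
    {
        for ( unsigned int invalid = 0; invalid <= 1; invalid++ )
        {
            unsigned int flags = present ? PAGE_PRESENT_FLAG : 0;

            if ( need_flush_v1(flags, invalid) != need_flush_v2(flags, invalid) )
                return false;
        }
    }

    return true;
}
```

The only combination for which neither form flushes is "present flag set and valid MFN", i.e. a plain insertion.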
Re: [PATCH v3 05/19] xen/arm: mm: Add support for the contiguous bit
On Sat, 2 Apr 2022, Julien Grall wrote:
> On 02/04/2022 00:53, Stefano Stabellini wrote:
> > On Mon, 21 Feb 2022, Julien Grall wrote:
> > > @@ -1333,21 +1386,34 @@ static int xen_pt_update(unsigned long virt,
> > >      while ( left )
> > >      {
> > >          unsigned int order, level;
> > > +        unsigned int nr_contig;
> > > +        unsigned int new_flags;
> > >          level = xen_pt_mapping_level(vfn, mfn, left, flags);
> > >          order = XEN_PT_LEVEL_ORDER(level);
> > >          ASSERT(left >= BIT(order, UL));
> > > -        rc = xen_pt_update_entry(root, pfn_to_paddr(vfn), mfn, level, flags);
> > > -        if ( rc )
> > > -            break;
> > > +        /*
> > > +         * Check if we can set the contiguous mapping and update the
> > > +         * flags accordingly.
> > > +         */
> > > +        nr_contig = xen_pt_check_contig(vfn, mfn, level, left, flags);
> > > +        new_flags = flags | ((nr_contig > 1) ? _PAGE_CONTIG : 0);
> >
> > Here is an optional idea to make the code simpler. We could move the
> > flags changes (adding/removing _PAGE_CONTIG) to xen_pt_check_contig.
> > That way, we could remove the inner loop.
> >
> > xen_pt_check_contig could check if _PAGE_CONTIG is already set and based
> > on alignment, it should be able to figure out when it needs to be
> > disabled.
>
> My initial attempt was to do everything in a loop. But this didn't pan out as
> I wanted (I felt the code was complex) and there is extra work to be done for
> the next 31 entries (assuming 4KB granularity).
>
> Hence the two loops. Unfortunately, I didn't keep my first attempt. So I can't
> really show what I wrote.

I trusted you that the resulting code with a single loop was worse.

Reviewed-by: Stefano Stabellini
Re: [PATCH v3 04/19] xen/arm: mm: Allow other mapping size in xen_pt_update_entry()
On Sat, 2 Apr 2022, Julien Grall wrote:
> On 02/04/2022 00:35, Stefano Stabellini wrote:
> > > +/* Return the level where mapping should be done */
> > > +static int xen_pt_mapping_level(unsigned long vfn, mfn_t mfn, unsigned long nr,
> > > +                                unsigned int flags)
> > > +{
> > > +    unsigned int level;
> > > +    unsigned long mask;
> >
> > Shouldn't mask be 64-bit on aarch32?
>
> The 3 variables we will use (mfn, vfn, nr) are unsigned long. So it is fine to
> define the mask as unsigned long.

Good point

> > > +}
> > > +
> > >  static DEFINE_SPINLOCK(xen_pt_lock);
> > >  static int xen_pt_update(unsigned long virt,
> > >                           mfn_t mfn,
> > > -                         unsigned long nr_mfns,
> > > +                         const unsigned long nr_mfns,
> >
> > Why const? nr_mfns is an unsigned long so it is passed as value: it
> > couldn't change the caller's parameter anyway. Just curious.
>
> Because nr_mfns is used to flush the TLBs. In the original I made the mistake
> of decrementing the variable and only discovered it later on when the TLB
> contained the wrong entry.
>
> Such bugs tend to be very subtle and it is hard to find the root cause. So
> better mark the variable const to avoid any surprise.
>
> The short version of what I wrote is in the commit message. I can write a
> small comment in the code if you want.

No, that's fine. Thanks for the explanation.

> > >                           unsigned int flags)
> > >  {
> > >      int rc = 0;
> > > -    unsigned long addr = virt, addr_end = addr + nr_mfns * PAGE_SIZE;
> > > +    unsigned long vfn = virt >> PAGE_SHIFT;
> > > +    unsigned long left = nr_mfns;
> > >      /*
> > >       * For arm32, page-tables are different on each CPUs. Yet, they share
> > > @@ -1268,14 +1330,24 @@ static int xen_pt_update(unsigned long virt,
> > >      spin_lock(&xen_pt_lock);
> > > -    for ( ; addr < addr_end; addr += PAGE_SIZE )
> > > +    while ( left )
> > >      {
> > > -        rc = xen_pt_update_entry(root, addr, mfn, flags);
> > > +        unsigned int order, level;
> > > +
> > > +        level = xen_pt_mapping_level(vfn, mfn, left, flags);
> > > +        order = XEN_PT_LEVEL_ORDER(level);
> > > +
> > > +        ASSERT(left >= BIT(order, UL));
> > > +
> > > +        rc = xen_pt_update_entry(root, pfn_to_paddr(vfn), mfn, level, flags);
> >
> > NIT: I know we don't have vfn_to_vaddr at the moment and there is no
> > widespread usage of vfn in Xen anyway, but it looks off to use
> > pfn_to_paddr on a vfn parameter. Maybe open-code pfn_to_paddr instead?
> > Or introduce vfn_to_vaddr locally in this file?
>
> To avoid inconsistency with mfn_to_maddr() and gfn_to_gaddr(), I don't want to
> introduce vfn_to_vaddr() without the typesafe part. I think this is a bit
> over the top for now.
>
> So I will open-code pfn_to_paddr().

Sounds good
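The reworked loop in the quoted hunk walks the range in variable-sized steps: at each iteration it picks the largest mapping level whose alignment and remaining length allow it, then advances by that many frames, while the const nr_mfns stays untouched so that it can still be used for the later TLB flush. The following standalone sketch shows one way such a loop can be structured. It is not the Xen implementation: plain unsigned long is used instead of mfn_t, a 4KB granule with 512 entries per table is assumed, and pick_level() and count_entries() are hypothetical names.

```c
#include <assert.h>

#define LPAE_SHIFT        9                          /* assumed: 512 entries per table */
#define LEVEL_ORDER(lvl)  ((3U - (lvl)) * LPAE_SHIFT) /* level 3 -> 0, 2 -> 9, 1 -> 18 */

/*
 * Pick the lowest-numbered (largest) level usable for the next mapping:
 * both frame numbers must be aligned to the block size and at least one
 * full block must remain. Falls back to level 3 (a single page).
 */
static unsigned int pick_level(unsigned long vfn, unsigned long mfn,
                               unsigned long left)
{
    for ( unsigned int level = 1; level < 3; level++ )
    {
        unsigned long mask = (1UL << LEVEL_ORDER(level)) - 1;

        if ( !((vfn | mfn) & mask) && left > mask )
            return level;
    }

    return 3;
}

/*
 * Count how many page-table entries are needed to map nr frames.
 * nr is const: the loop consumes a local copy ('left'), so the original
 * count is still intact afterwards (e.g. for a range-based flush).
 */
static unsigned int count_entries(unsigned long vfn, unsigned long mfn,
                                  const unsigned long nr)
{
    unsigned long left = nr;
    unsigned int entries = 0;

    while ( left )
    {
        unsigned int order = LEVEL_ORDER(pick_level(vfn, mfn, left));

        vfn += 1UL << order;
        mfn += 1UL << order;
        left -= 1UL << order;
        entries++;
    }

    return entries;
}
```

With these assumptions, an aligned range of 512 frames needs a single level-2 entry, whereas the same range shifted by one frame falls back to 512 single-page entries.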
Re: cleanup swiotlb initialization v8
On 4/4/22 1:05 AM, Christoph Hellwig wrote:
> Hi all,
>
> this series tries to clean up the swiotlb initialization, including
> that of swiotlb-xen. To get there it also removes the x86 iommu table
> infrastructure that massively obfuscates the initialization path.
>
> Git tree:
>
>     git://git.infradead.org/users/hch/misc.git swiotlb-init-cleanup
>
> Gitweb:
>
>     http://git.infradead.org/users/hch/misc.git/shortlog/refs/heads/swiotlb-init-cleanup

Tested-by: Boris Ostrovsky
Design meeting for AMD SEV-SNP project
Hello everyone,

As announced during an earlier community call, I'm posting here to
announce our intention to bootstrap Xen support for the AMD SEV-SNP
technology.

In short, this hardware extension on AMD CPUs allows guests to run with
encrypted memory, except for explicitly permitted areas. The obvious
example use case is running this technology in the Cloud, introducing an
increased level of trust, since even Xen or Dom0 couldn't read encrypted
guest memory.

For reference, here is our current "base" document to discuss further:
https://cryptpad.fr/pad/#/2/pad/view/ApLTDJLGLG0mzKGIwrL9M0UVft5nTBnVre7eVAbIk00/

The first action that should be organized is a "design/discussion
session" to clarify some choices and put things in motion with clear,
achievable initial targets.

It would be nice to meet relatively soon (i.e. next week). Here is a
Doodle link we'll use to choose the best day to meet:
https://doodle.com/meeting/participate/id/dBBoPjkd

I deliberately selected an hour that suits both US and EU based people:
the same hour as the Xen community call, which is 3:00 PM UTC, 4:00 PM
London time, 5:00 PM Paris time and 11:00 AM New York time.

I will leave the Doodle open until the end of the week, and then let you
know here which day was selected.

Regarding the meeting location: https://meet.vates.fr/sev (Jitsi powered)

Let me know if you need any other information or if you have any
questions :)

Regards,

Olivier Lambert | Vates CEO
XCP-ng & Xen Orchestra - Vates solutions
w: vates.fr | xcp-ng.org | xen-orchestra.com
Re: [PATCH v2] Grab the EFI System Resource Table and check it
> On 5 Apr 2022, at 20:21, Stefano Stabellini wrote:
>
> On Mon, 4 Apr 2022, Luca Fancellu wrote:
>>> On 2 Apr 2022, at 00:14, Demi Marie Obenour wrote:
>>>
>>> The EFI System Resource Table (ESRT) is necessary for fwupd to identify
>>> firmware updates to install. According to the UEFI specification §23.4,
>>> the table shall be stored in memory of type EfiBootServicesData.
>>> Therefore, Xen must avoid reusing that memory for other purposes, so
>>> that Linux can access the ESRT. Additionally, Xen must mark the memory
>>> as reserved, so that Linux knows accessing it is safe.
>>>
>>> See https://lore.kernel.org/xen-devel/20200818184018.GN1679@mail-itl/T/
>>> for details.
>>>
>>> Signed-off-by: Demi Marie Obenour
>>
>> Hi,
>>
>> I’ve tested the patch on an arm machine booting Xen+Dom0 through EFI, unfortunately
>> I could not test the functionality.
>
> I understand you couldn't test ESRT but did the basic Xen+Dom0 boot via
> EFI on ARM work?

Yes, I realise now I should have added *and it works* before the comma,
without it the sentence is misleading.

Cheers,
Luca
Re: [PATCH v2] Grab the EFI System Resource Table and check it
On Mon, 4 Apr 2022, Luca Fancellu wrote:
> > On 2 Apr 2022, at 00:14, Demi Marie Obenour wrote:
> >
> > The EFI System Resource Table (ESRT) is necessary for fwupd to identify
> > firmware updates to install. According to the UEFI specification §23.4,
> > the table shall be stored in memory of type EfiBootServicesData.
> > Therefore, Xen must avoid reusing that memory for other purposes, so
> > that Linux can access the ESRT. Additionally, Xen must mark the memory
> > as reserved, so that Linux knows accessing it is safe.
> >
> > See https://lore.kernel.org/xen-devel/20200818184018.GN1679@mail-itl/T/
> > for details.
> >
> > Signed-off-by: Demi Marie Obenour
>
> Hi,
>
> I’ve tested the patch on an arm machine booting Xen+Dom0 through EFI, unfortunately
> I could not test the functionality.

I understand you couldn't test ESRT but did the basic Xen+Dom0 boot via
EFI on ARM work?
Re: [PATCH 1/2] xsm: add ability to elevate a domain to privileged
On 4/5/22 13:17, Jason Andryuk wrote: > On Mon, Apr 4, 2022 at 11:34 AM Daniel P. Smith > wrote: >> >> On 3/31/22 09:16, Jason Andryuk wrote: >>> On Wed, Mar 30, 2022 at 3:05 PM Daniel P. Smith >>> wrote: There are now instances where internal hypervisor logic needs to make resource allocation calls that are protected by XSM checks. The internal hypervisor logic is represented a number of system domains which by designed are represented by non-privileged struct domain instances. To enable these logic blocks to function correctly but in a controlled manner, this commit introduces a pair of privilege escalation and demotion functions that will make a system domain privileged and then remove that privilege. Signed-off-by: Daniel P. Smith --- xen/include/xsm/xsm.h | 22 ++ 1 file changed, 22 insertions(+) diff --git a/xen/include/xsm/xsm.h b/xen/include/xsm/xsm.h index e22d6160b5..157e57151e 100644 --- a/xen/include/xsm/xsm.h +++ b/xen/include/xsm/xsm.h @@ -189,6 +189,28 @@ struct xsm_operations { #endif }; +static always_inline int xsm_elevate_priv(struct domain *d) +{ +if ( is_system_domain(d) ) +{ +d->is_privileged = true; +return 0; +} + +return -EPERM; +} >>> >>> These look sufficient for the default policy, but they don't seem >>> sufficient for Flask. I think you need to create a new XSM hook. For >>> Flask, you would want the demote hook to transition xen_boot_t -> >>> xen_t. That would start xen_boot_t with privileges that are dropped >>> in a one-way transition. Does that require all policies to then have >>> xen_boot_t and xen_t? I guess it does unless the hook code has some >>> logic to skip the transition. >> >> I am still thinking this through but my initial concern for Flask is >> that I don't think we want dedicated domain transitions directly in >> code. 
>> My current thinking would be to use a Kconfig to use xen_boot_t
>> type as the initial sid for the idle domain which would then require the
>> default policy to include an allowed transition from xen_boot_t to
>> xen_t. Then rely upon a boot domain to issue an xsm_op to do a relabel
>> transition for the idle domain with an assertion that the idle domain is
>> no longer labeled with its initial sid before Xen transitions its state
>> to SYS_STATE_active. The one wrinkle to this is whether I will be able
>> to schedule the boot domain before allowing Xen to transition into
>> SYS_STATE_active.
>
> That is an interesting approach. While it would work, I find it
> unusual that a domain would relabel Xen. I think Xen should be
> responsible for itself and not rely on a domain for this operation.

The boot domain is not a general domain as no domain can/should be
created with its domid or flask label post transition to
SYS_STATE_active. Its purpose was specifically meant to be a natural way
to push out complicated pre-execution domain configuration from having
to be in the hypervisor code. Therefore in a way it can be considered a
user provided de-privileged part of the hypervisor.

With that said, I just realized a flaw in the basis of my position. What
is the difference between codifying a check that the idle domain is not
the boot label versus codifying a transition from the boot label to the
running label? None really, both will require some knowledge that there
is a boot label and some running label. Combined with the fact that the
idle domain really shouldn't have any other label than xen_t. I will
work out how to incorporate the domain transition.

>>> For the default policy, you could start by creating the system domains
>>> as privileged and just have a single hook to drop privs. Then you
>>> don't have to worry about the "elevate" hook existing.
The patch 2 >>> asserts could instead become the location of xsm_drop_privs calls to >>> have a clear demarcation point. That expands the window with >>> privileges though. It's a little simpler, but maybe you don't want >>> that. However, it seems like you can only depriv once for the Flask >>> case since you want it to be one-way. >> >> This does simplify the solution and since today we cannot differentiate >> between hypervisor setup and hypervisor initiated domain construction >> contexts, it does not run counter to what I have proposed. As for flask, >> again I do not believe codifying a domain transition bound to a new XSM >> op is the appropriate approach. > > This hard coded domain transition does feel a little weird. But it > seems like a natural consequence of trying to use Flask to > deprivilege. I guess the transition could be behind a > dom0less/hyperlaunch Kconfig option. I just don't see a way around it > in some fashion with Flask enforcing. > > Another idea: Flask could start in permissive and only transition to >
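The elevate/demote pairing described in the quoted commit message can be sketched in isolation as follows. This is only an illustration under stated assumptions, not the XSM code: struct domain is reduced to two booleans, is_system stands in for is_system_domain(d), and demote_priv(), with_elevated_priv() and privileged_op() are hypothetical names. The demotion helper simply mirrors the quoted xsm_elevate_priv(), since its actual body is not shown in this excerpt.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Minimal stand-in for Xen's struct domain; only the fields used here. */
struct domain {
    bool is_system;      /* stand-in for is_system_domain(d) */
    bool is_privileged;
};

/* Mirrors the quoted xsm_elevate_priv(): only system domains may elevate. */
static int elevate_priv(struct domain *d)
{
    if ( d->is_system )
    {
        d->is_privileged = true;
        return 0;
    }

    return -EPERM;
}

/* Hypothetical counterpart that removes the privilege again. */
static int demote_priv(struct domain *d)
{
    if ( d->is_system )
    {
        d->is_privileged = false;
        return 0;
    }

    return -EPERM;
}

/* Run fn with privilege held, always dropping it afterwards. */
static int with_elevated_priv(struct domain *d, int (*fn)(struct domain *))
{
    int rc = elevate_priv(d);

    if ( rc )
        return rc;

    rc = fn(d);
    demote_priv(d);

    return rc;
}

/* Example operation that succeeds only for a privileged caller. */
static int privileged_op(struct domain *d)
{
    return d->is_privileged ? 0 : -EPERM;
}
```

Bracketing the privileged call in a single helper keeps the window in which the system domain is privileged as small as possible, which is the concern raised in the thread about a single "drop privileges" hook.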
[xen-unstable-smoke test] 169183: tolerable all pass - PUSHED
flight 169183 xen-unstable-smoke real [real]
http://logs.test-lab.xenproject.org/osstest/logs/169183/

Failures :-/ but no regressions.

Tests which did not succeed, but are not blocking:
 test-amd64-amd64-libvirt     15 migrate-support-check        fail   never pass
 test-arm64-arm64-xl-xsm      15 migrate-support-check        fail   never pass
 test-arm64-arm64-xl-xsm      16 saverestore-support-check    fail   never pass
 test-armhf-armhf-xl          15 migrate-support-check        fail   never pass
 test-armhf-armhf-xl          16 saverestore-support-check    fail   never pass

version targeted for testing:
 xen                  14dd241aad8af447680ac73e8579990e2c09c1e7
baseline version:
 xen                  120e26c2bb0097a589d718b1b58d7052ccce4458

Last test of basis   169175  2022-04-05 10:01:52 Z    0 days
Testing same since   169183  2022-04-05 14:01:59 Z    0 days    1 attempts

People who touched revisions under test:
  Jan Beulich
  Roger Pau Monne
  Roger Pau Monné

jobs:
 build-arm64-xsm                                              pass
 build-amd64                                                  pass
 build-armhf                                                  pass
 build-amd64-libvirt                                          pass
 test-armhf-armhf-xl                                          pass
 test-arm64-arm64-xl-xsm                                      pass
 test-amd64-amd64-xl-qemuu-debianhvm-amd64                    pass
 test-amd64-amd64-libvirt                                     pass

sg-report-flight on osstest.test-lab.xenproject.org
logs: /home/logs/logs
images: /home/logs/images

Logs, config files, etc. are available at
    http://logs.test-lab.xenproject.org/osstest/logs

Explanation of these reports, and of osstest in general, is at
    http://xenbits.xen.org/gitweb/?p=osstest.git;a=blob;f=README.email;hb=master
    http://xenbits.xen.org/gitweb/?p=osstest.git;a=blob;f=README;hb=master

Test harness code can be found at
    http://xenbits.xen.org/gitweb?p=osstest.git;a=summary

Pushing revision :

To xenbits.xen.org:/home/xen/git/xen.git
   120e26c2bb..14dd241aad  14dd241aad8af447680ac73e8579990e2c09c1e7 -> smoke
[ovmf test] 169177: regressions - FAIL
flight 169177 ovmf real [real]
http://logs.test-lab.xenproject.org/osstest/logs/169177/

Regressions :-(

Tests which did not succeed and are blocking,
including tests which could not be run:
 build-amd64                   6 xen-build                fail REGR. vs. 168254
 build-amd64-xsm               6 xen-build                fail REGR. vs. 168254
 build-i386-xsm                6 xen-build                fail REGR. vs. 168254
 build-i386                    6 xen-build                fail REGR. vs. 168254

Tests which did not succeed, but are not blocking:
 build-amd64-libvirt                   1 build-check(1)       blocked  n/a
 build-i386-libvirt                    1 build-check(1)       blocked  n/a
 test-amd64-amd64-xl-qemuu-ovmf-amd64  1 build-check(1)       blocked  n/a
 test-amd64-i386-xl-qemuu-ovmf-amd64   1 build-check(1)       blocked  n/a

version targeted for testing:
 ovmf                 a298a84478053872ed9da660a75f182ce81b8ddc
baseline version:
 ovmf                 b1b89f9009f2390652e0061bd7b24fc40732bc70

Last test of basis   168254  2022-02-28 10:41:46 Z   36 days
Failing since        168258  2022-03-01 01:55:31 Z   35 days  280 attempts
Testing same since   169173  2022-04-05 05:13:00 Z    0 days    2 attempts

People who touched revisions under test:
  Abdul Lateef Attar Abdul Lateef Attar via groups.io Abner Chang
  Akihiko Odaki Anthony PERARD Bob Feng Gerd Hoffmann Guo Dong
  Guomin Jiang Hao A Wu Hua Ma Huang, Li-Xia Jagadeesh Ujja Jason
  Jason Lou Ken Lautner Kenneth Lautner Kuo, Ted Laszlo Ersek
  Leif Lindholm Li, Zhihao Liming Gao Liu Liu Yun Liu Yun Y
  Lixia Huang Lou, Yun Ma, Hua Mara Sophie Grosch
  Mara Sophie Grosch via groups.io Matt DeVillier Michael D Kinney
  Michael Kubacki Michael Kubacki Min Xu Patrick Rudolph
  Purna Chandra Rao Bandaru Ray Ni Sami Mujawar Sean Rhodes
  Sean Rhodes sean@starlabs.systems Sebastien Boeuf Sunny Wang Ted Kuo
  Wenyi Xie wenyi,xie via groups.io Xiaolu.Jiang Xie, Yuanhao Yi Li
  Yuanhao Xie Zhihao Li

jobs:
 build-amd64-xsm                                              fail
 build-i386-xsm                                               fail
 build-amd64                                                  fail
 build-i386                                                   fail
 build-amd64-libvirt                                          blocked
 build-i386-libvirt                                           blocked
 build-amd64-pvops                                            pass
 build-i386-pvops                                             pass
 test-amd64-amd64-xl-qemuu-ovmf-amd64                         blocked
 test-amd64-i386-xl-qemuu-ovmf-amd64                          blocked

sg-report-flight on osstest.test-lab.xenproject.org
logs: /home/logs/logs
images: /home/logs/images

Logs, config files, etc. are available at
    http://logs.test-lab.xenproject.org/osstest/logs

Explanation of these reports, and of osstest in general, is at
    http://xenbits.xen.org/gitweb/?p=osstest.git;a=blob;f=README.email;hb=master
    http://xenbits.xen.org/gitweb/?p=osstest.git;a=blob;f=README;hb=master

Test harness code can be found at
    http://xenbits.xen.org/gitweb?p=osstest.git;a=summary

Not pushing.

(No revision log; it would be 4610 lines long.)
Re: [PATCH 1/2] xsm: add ability to elevate a domain to privileged
On Mon, Apr 4, 2022 at 11:34 AM Daniel P. Smith wrote: > > On 3/31/22 09:16, Jason Andryuk wrote: > > On Wed, Mar 30, 2022 at 3:05 PM Daniel P. Smith > > wrote: > >> > >> There are now instances where internal hypervisor logic needs to make > >> resource > >> allocation calls that are protected by XSM checks. The internal hypervisor > >> logic > >> is represented a number of system domains which by designed are > >> represented by > >> non-privileged struct domain instances. To enable these logic blocks to > >> function correctly but in a controlled manner, this commit introduces a > >> pair > >> of privilege escalation and demotion functions that will make a system > >> domain > >> privileged and then remove that privilege. > >> > >> Signed-off-by: Daniel P. Smith > >> --- > >> xen/include/xsm/xsm.h | 22 ++ > >> 1 file changed, 22 insertions(+) > >> > >> diff --git a/xen/include/xsm/xsm.h b/xen/include/xsm/xsm.h > >> index e22d6160b5..157e57151e 100644 > >> --- a/xen/include/xsm/xsm.h > >> +++ b/xen/include/xsm/xsm.h > >> @@ -189,6 +189,28 @@ struct xsm_operations { > >> #endif > >> }; > >> > >> +static always_inline int xsm_elevate_priv(struct domain *d) > >> +{ > >> +if ( is_system_domain(d) ) > >> +{ > >> +d->is_privileged = true; > >> +return 0; > >> +} > >> + > >> +return -EPERM; > >> +} > > > > These look sufficient for the default policy, but they don't seem > > sufficient for Flask. I think you need to create a new XSM hook. For > > Flask, you would want the demote hook to transition xen_boot_t -> > > xen_t. That would start xen_boot_t with privileges that are dropped > > in a one-way transition. Does that require all policies to then have > > xen_boot_t and xen_t? I guess it does unless the hook code has some > > logic to skip the transition. > > I am still thinking this through but my initial concern for Flask is > that I don't think we want dedicated domain transitions directly in > code. 
> My current thinking would be to use a Kconfig to use xen_boot_t
> type as the initial sid for the idle domain which would then require the
> default policy to include an allowed transition from xen_boot_t to
> xen_t. Then rely upon a boot domain to issue an xsm_op to do a relabel
> transition for the idle domain with an assertion that the idle domain is
> no longer labeled with its initial sid before Xen transitions its state
> to SYS_STATE_active. The one wrinkle to this is whether I will be able
> to schedule the boot domain before allowing Xen to transition into
> SYS_STATE_active.

That is an interesting approach. While it would work, I find it unusual
that a domain would relabel Xen. I think Xen should be responsible for
itself and not rely on a domain for this operation.

> > For the default policy, you could start by creating the system domains
> > as privileged and just have a single hook to drop privs. Then you
> > don't have to worry about the "elevate" hook existing. The patch 2
> > asserts could instead become the location of xsm_drop_privs calls to
> > have a clear demarcation point. That expands the window with
> > privileges though. It's a little simpler, but maybe you don't want
> > that. However, it seems like you can only depriv once for the Flask
> > case since you want it to be one-way.
>
> This does simplify the solution and since today we cannot differentiate
> between hypervisor setup and hypervisor initiated domain construction
> contexts, it does not run counter to what I have proposed. As for flask,
> again I do not believe codifying a domain transition bound to a new XSM
> op is the appropriate approach.

This hard coded domain transition does feel a little weird. But it seems
like a natural consequence of trying to use Flask to deprivilege. I
guess the transition could be behind a dom0less/hyperlaunch Kconfig
option. I just don't see a way around it in some fashion with Flask
enforcing.

Another idea: Flask could start in permissive and only transition to
enforcing at the deprivilege point. Kinda gross, but it works without
needing a transition.

To reiterate, XSM isn't really appropriate to enforce anything internal
to Xen. We are working around the need to go through hook points during
correct operation. Code exec in Xen means all bets are off. Memory
writes to Xen data mean the XSM checks can be disabled (flip Flask to
permissive) or bypassed (set d->is_privileged or change d->ssid). We
shouldn't lose sight of this when we talk about deprivileging the idle
domain.

Regards,
Jason
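
A minimal sketch of the single, one-way "drop privs" hook discussed above for the default policy, assuming system domains are created privileged. Everything here is hypothetical and for illustration only: struct sketch_domain, sketch_system_domain() and sketch_drop_privs() are stand-ins, not Xen's struct domain or any existing XSM hook. Only the is_privileged field name is taken from the quoted patch.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical stand-in for Xen's struct domain; only the fields this
 * sketch needs are modelled. */
struct sketch_domain {
    bool is_system;      /* stand-in for is_system_domain(d) */
    bool is_privileged;  /* mirrors d->is_privileged in the quoted patch */
    bool priv_dropped;   /* hypothetical latch that makes the drop one-way */
};

/* Hypothetical constructor: a system domain that starts out privileged. */
static struct sketch_domain sketch_system_domain(void)
{
    struct sketch_domain d = {
        .is_system = true,
        .is_privileged = true,
        .priv_dropped = false,
    };

    return d;
}

/* Drop privilege exactly once. A second call, or a call on a non-system
 * domain, is refused with -EPERM. There is deliberately no elevate path. */
static int sketch_drop_privs(struct sketch_domain *d)
{
    if ( !d->is_system || d->priv_dropped )
        return -EPERM;

    d->is_privileged = false;
    d->priv_dropped = true;

    return 0;
}
```

As noted in the mail, a latch like this only guards against accidental misuse by correct code; anything able to write Xen's memory could flip the fields directly.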
[libvirt test] 169171: regressions - FAIL
flight 169171 libvirt real [real] http://logs.test-lab.xenproject.org/osstest/logs/169171/ Regressions :-( Tests which did not succeed and are blocking, including tests which could not be run: build-amd64-libvirt 6 libvirt-buildfail REGR. vs. 151777 build-i386-libvirt6 libvirt-buildfail REGR. vs. 151777 build-arm64-libvirt 6 libvirt-buildfail REGR. vs. 151777 build-armhf-libvirt 6 libvirt-buildfail REGR. vs. 151777 Tests which did not succeed, but are not blocking: test-amd64-amd64-libvirt 1 build-check(1) blocked n/a test-amd64-amd64-libvirt-pair 1 build-check(1) blocked n/a test-amd64-amd64-libvirt-qemuu-debianhvm-amd64-xsm 1 build-check(1) blocked n/a test-amd64-amd64-libvirt-vhd 1 build-check(1) blocked n/a test-amd64-amd64-libvirt-xsm 1 build-check(1) blocked n/a test-amd64-i386-libvirt 1 build-check(1) blocked n/a test-amd64-i386-libvirt-pair 1 build-check(1) blocked n/a test-amd64-i386-libvirt-qemuu-debianhvm-amd64-xsm 1 build-check(1) blocked n/a test-amd64-i386-libvirt-raw 1 build-check(1) blocked n/a test-amd64-i386-libvirt-xsm 1 build-check(1) blocked n/a test-arm64-arm64-libvirt 1 build-check(1) blocked n/a test-arm64-arm64-libvirt-qcow2 1 build-check(1) blocked n/a test-arm64-arm64-libvirt-raw 1 build-check(1) blocked n/a test-armhf-armhf-libvirt-raw 1 build-check(1) blocked n/a test-arm64-arm64-libvirt-xsm 1 build-check(1) blocked n/a test-armhf-armhf-libvirt 1 build-check(1) blocked n/a test-armhf-armhf-libvirt-qcow2 1 build-check(1) blocked n/a version targeted for testing: libvirt 5d0eeb8cd7088e575c0a2b1d5759ccfb72c525c9 baseline version: libvirt 2c846fa6bcc11929c9fb857a22430fb9945654ad Last test of basis 151777 2020-07-10 04:19:19 Z 634 days Failing since151818 2020-07-11 04:18:52 Z 633 days 615 attempts Testing same since 169171 2022-04-05 04:21:54 Z0 days1 attempts People who touched revisions under test: Adolfo Jayme Barrientos Aleksandr Alekseev Aleksei Zakharov Amneesh Singh Andika Triwidada Andrea Bolognani Ani Sinha Balázs Meskó Barrett 
Schonefeld Bastian Germann Bastien Orivel BiaoXiang Ye Bihong Yu Binfeng Wu Bjoern Walk Boris Fiuczynski Brad Laue Brian Turek Bruno Haible Chris Mayo Christian Borntraeger Christian Ehrhardt Christian Kirbach Christian Schoenebeck Christophe Fergeau Claudio Fontana Cole Robinson Collin Walling Cornelia Huck Cédric Bosdonnat Côme Borsoi Daniel Henrique Barboza Daniel Letai Daniel P. Berrange Daniel P. Berrangé Didik Supriadi dinglimin Divya Garg Dmitrii Shcherbakov Dmytro Linkin Eiichi Tsukata Emilio Herrera Eric Farman Erik Skultety Fabian Affolter Fabian Freyer Fabiano Fidêncio Fangge Jin Farhan Ali Fedora Weblate Translation Franck Ridel Gavi Teitz gongwei Guoyi Tu Göran Uddeborg Halil Pasic Han Han Hao Wang Haonan Wang Hela Basa Helmut Grohne Hiroki Narukawa Hyman Huang(黄勇) Ian Wienand Ioanna Alifieraki Ivan Teterevkov Jakob Meng Jamie Strandboge Jamie Strandboge Jan Kuparinen jason lee Jean-Baptiste Holcroft Jia Zhou Jianan Gao Jim Fehlig Jin Yan Jing Qi Jinsheng Zhang Jiri Denemark Joachim Falk John Ferlan John Levon John Levon Jonathan Watt Jonathon Jongsma Julio Faracco Justin Gatzen Ján Tomko Kashyap Chamarthy Kevin Locke Kim InSoo Koichi Murase Kristina Hanicova Laine Stump Laszlo Ersek Lee Yarwood Lei Yang Liao Pingfang Lin Ma Lin Ma Lin Ma Liu Yiding Lubomir Rintel Luke Yue Luyao Zhong Marc Hartmayer Marc-André Lureau Marek Marczykowski-Górecki Markus Schade Martin Kletzander Martin Pitt Masayoshi Mizuma Matej Cepl Matt Coleman Matt Coleman Mauro Matteo Cascella Meina Li Michal Privoznik Michał Smyk Milo Casagrande Moshe Levi Muha Aliss Nathan Neal Gompa Nick Chevsky Nick Shyrokovskiy Nickys Music Group Nico Pache Nicolas Lécureuil Nicolas Lécureuil Nikolay Shirokovskiy Olaf Hering Olesya Gerasimenko Or Ozeri Orion Poplawski Pany Paolo Bonzini Patrick Magauran Paulo de Rezende Pinatti Pavel Hrdina Peng Liang Peter Krempa Pino Toscano Pino Toscano Piotr Drąg Prathamesh Chavan Praveen K Paladugu Richard W.M. 
Jones Ricky Tigg Robin Lee Rohit Kumar Roman Bogorodskiy Roman Bolshakov Ryan Gahagan Ryan
Re: Increasing domain memory beyond initial maxmem
On Tue, Apr 05, 2022 at 01:03:57PM +0200, Juergen Gross wrote:
> Hi Marek,
>
> On 31.03.22 14:36, Marek Marczykowski-Górecki wrote:
> > On Thu, Mar 31, 2022 at 02:22:03PM +0200, Juergen Gross wrote:
> > > Maybe some kernel config differences, or other udev rules (memory onlining
> > > is done via udev in my guest)?
> > >
> > > I'm seeing:
> > >
> > > # zgrep MEMORY_HOTPLUG /proc/config.gz
> > > CONFIG_ARCH_ENABLE_MEMORY_HOTPLUG=y
> > > CONFIG_MEMORY_HOTPLUG=y
> > > # CONFIG_MEMORY_HOTPLUG_DEFAULT_ONLINE is not set
> > > CONFIG_XEN_BALLOON_MEMORY_HOTPLUG=y
> > > CONFIG_XEN_MEMORY_HOTPLUG_LIMIT=512
> >
> > I have:
> > # zgrep MEMORY_HOTPLUG /proc/config.gz
> > CONFIG_ARCH_ENABLE_MEMORY_HOTPLUG=y
> > CONFIG_MEMORY_HOTPLUG=y
> > CONFIG_MEMORY_HOTPLUG_DEFAULT_ONLINE=y
> > CONFIG_XEN_BALLOON_MEMORY_HOTPLUG=y
> > CONFIG_XEN_MEMORY_HOTPLUG_LIMIT=512
> >
> > Not sure if relevant, but I also have:
> > CONFIG_XEN_UNPOPULATED_ALLOC=y
> >
> > on top of that, I have a similar udev rule too:
> >
> > SUBSYSTEM=="memory", ACTION=="add", ATTR{state}=="offline",
> > ATTR{state}="online"
> >
> > But I don't think they are conflicting.
> >
> > > What type of guest are you using? Mine was a PVH guest.
> >
> > PVH here too.
>
> Would you like to try the attached patch? It seemed to work for me.

Unfortunately it doesn't help, now the behavior is different:

Initially guest started with 800M:

[root@personal ~]# free -m
              total        used        free      shared  buff/cache   available
Mem:            740         223         272           2         243         401
Swap:          1023           0        1023

Then increased:

[root@dom0 ~]$ xl mem-max personal 2048
[root@dom0 ~]$ xenstore-write /local/domain/$(xl domid personal)/memory/static-max $((2048*1024))
[root@dom0 ~]$ xl mem-set personal 2000

And guest shows now only a little more memory, but not full 2000M:

[root@personal ~]# [   37.657046] xen:balloon: Populating new zone
[   37.658206] Fallback order for Node 0: 0
[   37.658219] Built 1 zonelists, mobility grouping on.  Total pages: 175889
[   37.658233] Policy zone: Normal
[root@personal ~]#
[root@personal ~]# free -m
              total        used        free      shared  buff/cache   available
Mem:            826         245         337           2         244         462
Swap:          1023           0        1023

I've applied the patch on top of 5.16.18. If you think 5.17 would make a
difference, I can try that too.

--
Best Regards,
Marek Marczykowski-Górecki
Invisible Things Lab
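
For reference, the arithmetic behind the commands and numbers above can be written out as a small sketch. It assumes, as the $((2048*1024)) expression suggests, that the memory/static-max xenstore node is expressed in KiB while the xl commands take MiB. The helper names sketch_mib_to_kib() and sketch_shortfall_mib() are hypothetical and exist only for this illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Convert a size in MiB (as given to "xl mem-max") to KiB, the unit
 * assumed here for the memory/static-max xenstore node. */
static uint64_t sketch_mib_to_kib(uint64_t mib)
{
    return mib * 1024;
}

/* How many MiB the guest is still short of the requested target.
 * Returns 0 once the guest reports at least the target. */
static uint64_t sketch_shortfall_mib(uint64_t target_mib, uint64_t seen_mib)
{
    return seen_mib >= target_mib ? 0 : target_mib - seen_mib;
}
```

With the figures from the mail, sketch_mib_to_kib(2048) gives the 2097152 written to xenstore, and sketch_shortfall_mib(2000, 826) gives 1174 MiB that the guest did not gain. Note that "free -m" total excludes memory the kernel reserves, so the comparison is only approximate.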
Re: [PATCH v4 3/8] x86/EFI: retrieve EDID
On Tue, Apr 05, 2022 at 04:36:53PM +0200, Jan Beulich wrote: > On 05.04.2022 12:27, Roger Pau Monné wrote: > > On Thu, Mar 31, 2022 at 11:45:36AM +0200, Jan Beulich wrote: > >> --- a/xen/arch/x86/efi/efi-boot.h > >> +++ b/xen/arch/x86/efi/efi-boot.h > >> @@ -568,6 +568,49 @@ static void __init efi_arch_video_init(E > >> #endif > >> } > >> > >> +#ifdef CONFIG_VIDEO > >> +static bool __init copy_edid(const void *buf, unsigned int size) > >> +{ > >> +/* > >> + * Be conservative - for both undersized and oversized blobs it is > >> unclear > >> + * what to actually do with them. The more that unlike the VESA BIOS > >> + * interface we also have no associated "capabilities" value (which > >> might > >> + * carry a hint as to possible interpretation). > >> + */ > >> +if ( size != ARRAY_SIZE(boot_edid_info) ) > >> +return false; > >> + > >> +memcpy(boot_edid_info, buf, size); > >> +boot_edid_caps = 0; > >> + > >> +return true; > >> +} > >> +#endif > >> + > >> +static void __init efi_arch_edid(EFI_HANDLE gop_handle) > >> +{ > >> +#ifdef CONFIG_VIDEO > >> +static EFI_GUID __initdata active_guid = > >> EFI_EDID_ACTIVE_PROTOCOL_GUID; > >> +static EFI_GUID __initdata discovered_guid = > >> EFI_EDID_DISCOVERED_PROTOCOL_GUID; > > > > Is there a need to make those static? > > > > I think this function is either called from efi_start or > > efi_multiboot, but there aren't multiple calls to it? (also both > > parameters are IN only, so not to be changed by the EFI method? > > > > I have the feeling setting them to static is done because they can't > > be set to const? > > Even if they could be const, they ought to also be static. They don't > strictly need to be, but without "static" code will be generated to > populate the on-stack variables; quite possibly the compiler would > even allocate an unnamed static variable and memcpy() from there onto > the stack. 
I thought that making those const (and then annotating them with
__initconst) would already have the same effect as making them static,
as there will be no memcpy in that case either.

> >> +    EFI_EDID_ACTIVE_PROTOCOL *active_edid;
> >> +    EFI_EDID_DISCOVERED_PROTOCOL *discovered_edid;
> >> +    EFI_STATUS status;
> >> +
> >> +    status = efi_bs->OpenProtocol(gop_handle, &active_guid,
> >> +                                  (void **)&active_edid, efi_ih, NULL,
> >> +                                  EFI_OPEN_PROTOCOL_GET_PROTOCOL);
> >> +    if ( status == EFI_SUCCESS &&
> >> +         copy_edid(active_edid->Edid, active_edid->SizeOfEdid) )
> >> +        return;
> >
> > Isn't it enough to just call EFI_EDID_ACTIVE_PROTOCOL_GUID?
> >
> > From my reading of the UEFI spec this will either return
> > EFI_EDID_OVERRIDE_PROTOCOL_GUID or EFI_EDID_DISCOVERED_PROTOCOL_GUID.
> > If EFI_EDID_OVERRIDE_PROTOCOL is set it must be used, and hence
> > falling back to EFI_EDID_DISCOVERED_PROTOCOL_GUID if
> > EFI_EDID_ACTIVE_PROTOCOL_GUID cannot be parsed would likely mean
> > ignoring EFI_EDID_OVERRIDE_PROTOCOL?
>
> That's the theory. As per one of the post-commit-message remarks I had
> looked at what GrUB does, and I decided to follow its behavior in this
> regard, assuming they do what they do to work around quirks. As said
> in the remark, I didn't want to go as far as also cloning their use of
> the undocumented (afaik) "agp-internal-edid" variable.

Could you add this as a comment here? So it's not lost on commit as
being just a post-commit log remark.

With that:

Acked-by: Roger Pau Monné

Thanks, Roger.
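
The "active first, then discovered" fallback and the exact-size check from the quoted patch can be sketched in plain C outside the EFI environment. This is only an illustration under stated assumptions: SKETCH_EDID_SIZE (128 bytes, the size of one base EDID block) stands in for ARRAY_SIZE(boot_edid_info), and sketch_edid_info, sketch_copy_edid() and sketch_pick_edid() are hypothetical names, not Xen code. The NULL check is an addition of the sketch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Assumed destination size: one 128-byte base EDID block. */
#define SKETCH_EDID_SIZE 128

static uint8_t sketch_edid_info[SKETCH_EDID_SIZE];

/* Accept a blob only if it is exactly the expected size; undersized and
 * oversized blobs are rejected rather than truncated or padded. */
static bool sketch_copy_edid(const void *buf, size_t size)
{
    if ( buf == NULL || size != sizeof(sketch_edid_info) )
        return false;

    memcpy(sketch_edid_info, buf, size);

    return true;
}

/* Try the "active" blob first and fall back to the "discovered" one only
 * if the active blob is missing or has an unexpected size. */
static bool sketch_pick_edid(const void *active, size_t active_size,
                             const void *discovered, size_t discovered_size)
{
    return sketch_copy_edid(active, active_size) ||
           sketch_copy_edid(discovered, discovered_size);
}
```

The short-circuit || keeps the discovered blob from overwriting a successfully copied active one, which matches the early return in the quoted hunk.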
Re: [PATCH 1/2] xsm: add ability to elevate a domain to privileged
On Tue, Apr 05, 2022 at 08:06:31AM -0400, Daniel P. Smith wrote: > On 4/5/22 03:42, Roger Pau Monné wrote: > > On Mon, Apr 04, 2022 at 12:08:25PM -0400, Daniel P. Smith wrote: > >> On 4/4/22 11:12, Roger Pau Monné wrote: > >>> On Mon, Apr 04, 2022 at 10:21:18AM -0400, Daniel P. Smith wrote: > On 3/31/22 08:36, Roger Pau Monné wrote: > > On Wed, Mar 30, 2022 at 07:05:48PM -0400, Daniel P. Smith wrote: > >> diff --git a/xen/include/xsm/xsm.h b/xen/include/xsm/xsm.h > >> index e22d6160b5..157e57151e 100644 > >> --- a/xen/include/xsm/xsm.h > >> +++ b/xen/include/xsm/xsm.h > >> @@ -189,6 +189,28 @@ struct xsm_operations { > >> #endif > >> }; > >> > >> +static always_inline int xsm_elevate_priv(struct domain *d) > > > > I don't think it needs to be always_inline, using just inline would be > > fine IMO. > > > > Also this needs to be __init. > > AIUI always_inline is likely the best way to preserve the speculation > safety brought in by the call to is_system_domain(). > >>> > >>> There's nothing related to speculation safety in is_system_domain() > >>> AFAICT. It's just a plain check against d->domain_id. It's my > >>> understanding there's no need for any speculation barrier there > >>> because d->domain_id is not an external input. > >> > >> Hmmm, this actually raises a good question. Why is is_control_domain(), > >> is_hardware_domain, and others all have evaluate_nospec() wrapping the > >> check of a struct domain element while is_system_domain() does not? > > > > Jan replied to this regard, see: > > > > https://lore.kernel.org/xen-devel/54272d08-7ce1-b162-c8e9-1955b780c...@suse.com/ > > Jan can correct me if I misunderstood, but his point is with respect to > where the inline function will be expanded into and I would think you > would want to ensure that if anyone were to use is_system_domain(), then > the inline expansion of this new location could create a potential > speculation-able branch. 
Basically my concern is not putting the guards > in place today just because there is not currently any location where > is_system_domain() is expanded to create a speculation opportunity does > not mean there is not an opening for the opportunity down the road for a > future unprotected use. > > >>> In any case this function should be __init only, at which point there > >>> are no untrusted inputs to Xen. > >> > >> I thought it was agreed that __init on inline functions in headers had > >> no meaning? > > > > In a different reply I already noted my preference would be for the > > function to not reside in a header and not be inline, simply because > > it would be gone after initialization and we won't have to worry about > > any stray calls when the system is active. > > If an inline function is only used by __init code, how would be > available for stray calls when the system is active? I would concede > that it is possible for someone to explicitly use in not __init code but > I would like to believe any usage in a submitted code change would be > questioned by the reviewers. Right, it's IMO easier when things just explode when not used correctly, hence my suggestion to make it __init. > With that said, if we consider Jason's suggestion would this remove your > concern since that would only introduce a de-privilege function and > there would be no piv escalation that could be erroneously called at > anytime? Indeed. IMO everything that happens before the system switches to the active state should be considered to be running in a privileged context anyway. Maybe others have different opinions. Or maybe there are use-cases I'm not aware of where this is not true. Thanks, Roger.
Re: [PATCH] x86/irq: Skip unmap_domain_pirq XSM during destruction
On Tue, Apr 5, 2022 at 4:18 AM Jan Beulich wrote:
>
> On 30.03.2022 20:17, Jason Andryuk wrote:
> > xsm_unmap_domain_irq was seen denying unmap_domain_pirq when called from
> > complete_domain_destroy as an RCU callback. The source context was an
> > unexpected, random domain. Since this is a xen-internal operation,
> > we don't want the XSM hook denying the operation.
> >
> > Check d->is_dying and skip the check when the domain is dead. The RCU
> > callback runs when a domain is in that state.
>
> One question which has always been puzzling me (perhaps to Daniel): While
> I can see why mapping of an IRQ needs to be subject to an XSM check, it's
> not really clear to me why unmapping would need to be, at least as long
> as it's the domain itself which requests the unmap (and which I would
> view to extend to the domain being cleaned up). But maybe that's why it's
> XSM_HOOK ...
>
> > ---
> > Dan wants to change current to point at DOMID_IDLE when the RCU callback
> > runs. I think Juergen's commit 53594c7bd197 "rcu: don't use
> > stop_machine_run() for rcu_barrier()" may have changed this since it
> > mentions stop_machine_run scheduled the idle vcpus to run the callbacks
> > for the old code.
> >
> > Would that be as easy as changing rcu_do_batch() to do:
> >
> > +        /* Run as "Xen" not a random domain's vcpu. */
> > +        vcpu = get_current();
> > +        set_current(idle_vcpu[smp_processor_id()]);
> >          list->func(list);
> > +        set_current(vcpu);
> >
> > or is using set_current() only acceptable as part of context_switch?
>
> Indeed I would question any uses outside of context_switch() (and
> system bringup).
>
> > --- a/xen/arch/x86/irq.c
> > +++ b/xen/arch/x86/irq.c
> > @@ -2340,10 +2340,14 @@ int unmap_domain_pirq(struct domain *d, int pirq)
> >          nr = msi_desc->msi.nvec;
> >      }
> >
> > -    ret = xsm_unmap_domain_irq(XSM_HOOK, d, irq,
> > -                               msi_desc ? msi_desc->dev : NULL);
> > -    if ( ret )
> > -        goto done;
> > +    /* When called by complete_domain_destroy via RCU, current is a random
> > +     * domain. Skip the XSM check since this is a Xen-initiated action. */
>
> Comment style.

Yes. Sorry about that.

> > +    if ( d->is_dying != DOMDYING_dead ) {
>
> Please use !d->is_dying. Also please correct the placement of the brace.
> Or you could avoid the need for a brace by leveraging that ret is zero
> ahead of this if(), i.e. ...

Here I was patting myself on the back for remembering the spaces inside
the parens, and I screwed up the brace... Sorry.

I intentionally chose DOMDYING_dead because, from my reading of the
code, complete_domain_destroy should only reach here when dead (and not
dying). If this function is reached when DOMDYING_dying, then that is
unexpected. That would be a guest-initiated action and therefore the XSM
check should apply. Just checking is_dying is fine, but I want to
explain and highlight this aspect.

> > +        ret = xsm_unmap_domain_irq(XSM_HOOK, d, irq,
> > +                                   msi_desc ? msi_desc->dev : NULL);
> > +        if ( ret )
> > +            goto done;
> > +    }
> >
>     if ( !d->is_dying )
>         ret = xsm_unmap_domain_irq(XSM_HOOK, d, irq,
>                                    msi_desc ? msi_desc->dev : NULL);
>     if ( ret )
>         goto done;

I'm planning to just do it this way.

Thank you for reviewing.

-Jason
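
The control flow agreed on above (run the policy check only for a domain that is not dying, relying on ret already being zero) can be sketched in standalone C. All names are hypothetical stand-ins for illustration: irq_sketch_domain models is_dying as a plain bool rather than Xen's DOMDYING_* states, and sketch_xsm_check() merely plays the role of xsm_unmap_domain_irq().

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical stand-in for struct domain; is_dying is simplified to a
 * bool here, whereas Xen distinguishes alive, dying and dead. */
struct irq_sketch_domain {
    bool is_dying;
};

/* Counts how often the stand-in policy hook was consulted. */
static int sketch_hook_calls;

/* Stand-in for the XSM hook: returns 0 when the policy allows the
 * operation and -EPERM when it denies it. */
static int sketch_xsm_check(const struct irq_sketch_domain *d, bool allow)
{
    (void)d;
    sketch_hook_calls++;

    return allow ? 0 : -EPERM;
}

static int sketch_unmap_pirq(struct irq_sketch_domain *d, bool policy_allows)
{
    int ret = 0;

    /* Skip the policy check for Xen-initiated teardown of a dying
     * domain; ret stays 0 in that case. */
    if ( !d->is_dying )
        ret = sketch_xsm_check(d, policy_allows);
    if ( ret )
        goto done;

    /* ... the actual unmapping work would happen here ... */

 done:
    return ret;
}
```

For a live domain a denying policy still blocks the unmap, while for a dying domain the hook is never consulted, which is the behaviour the patch is after.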
[xen-4.14-testing test] 169170: regressions - FAIL
flight 169170 xen-4.14-testing real [real] http://logs.test-lab.xenproject.org/osstest/logs/169170/ Regressions :-( Tests which did not succeed and are blocking, including tests which could not be run: test-amd64-amd64-xl 18 guest-localmigrate fail REGR. vs. 168506 build-arm64-xsm 6 xen-buildfail REGR. vs. 168506 Tests which did not succeed, but are not blocking: test-arm64-arm64-libvirt-xsm 1 build-check(1) blocked n/a test-arm64-arm64-xl-xsm 1 build-check(1) blocked n/a test-amd64-i386-xl-qemuu-win7-amd64 19 guest-stop fail like 168506 test-amd64-amd64-xl-qemut-win7-amd64 19 guest-stopfail like 168506 test-armhf-armhf-libvirt 16 saverestore-support-checkfail like 168506 test-amd64-amd64-qemuu-nested-amd 20 debian-hvm-install/l1/l2 fail like 168506 test-amd64-amd64-xl-qemuu-win7-amd64 19 guest-stopfail like 168506 test-amd64-amd64-xl-qemut-ws16-amd64 19 guest-stopfail like 168506 test-amd64-amd64-xl-qemuu-ws16-amd64 19 guest-stopfail like 168506 test-amd64-i386-xl-qemut-ws16-amd64 19 guest-stop fail like 168506 test-amd64-i386-xl-qemut-win7-amd64 19 guest-stop fail like 168506 test-armhf-armhf-libvirt-raw 15 saverestore-support-checkfail like 168506 test-amd64-i386-xl-qemuu-ws16-amd64 19 guest-stop fail like 168506 test-armhf-armhf-libvirt-qcow2 15 saverestore-support-check fail like 168506 test-arm64-arm64-xl-seattle 15 migrate-support-checkfail never pass test-arm64-arm64-xl-seattle 16 saverestore-support-checkfail never pass test-amd64-i386-xl-pvshim14 guest-start fail never pass test-amd64-amd64-libvirt 15 migrate-support-checkfail never pass test-amd64-i386-libvirt 15 migrate-support-checkfail never pass test-amd64-amd64-libvirt-xsm 15 migrate-support-checkfail never pass test-amd64-i386-libvirt-xsm 15 migrate-support-checkfail never pass test-arm64-arm64-xl-thunderx 15 migrate-support-checkfail never pass test-arm64-arm64-xl-thunderx 16 saverestore-support-checkfail never pass test-arm64-arm64-xl-credit2 15 migrate-support-checkfail never pass 
test-arm64-arm64-xl-credit2 16 saverestore-support-checkfail never pass test-arm64-arm64-xl-credit1 15 migrate-support-checkfail never pass test-arm64-arm64-xl-credit1 16 saverestore-support-checkfail never pass test-arm64-arm64-xl 15 migrate-support-checkfail never pass test-arm64-arm64-xl 16 saverestore-support-checkfail never pass test-amd64-i386-libvirt-qemuu-debianhvm-amd64-xsm 13 migrate-support-check fail never pass test-amd64-amd64-libvirt-vhd 14 migrate-support-checkfail never pass test-amd64-i386-libvirt-raw 14 migrate-support-checkfail never pass test-arm64-arm64-libvirt-raw 14 migrate-support-checkfail never pass test-arm64-arm64-libvirt-raw 15 saverestore-support-checkfail never pass test-arm64-arm64-xl-vhd 14 migrate-support-checkfail never pass test-arm64-arm64-xl-vhd 15 saverestore-support-checkfail never pass test-armhf-armhf-libvirt 15 migrate-support-checkfail never pass test-armhf-armhf-xl-cubietruck 15 migrate-support-checkfail never pass test-armhf-armhf-xl-cubietruck 16 saverestore-support-checkfail never pass test-armhf-armhf-xl-credit2 15 migrate-support-checkfail never pass test-armhf-armhf-xl-credit2 16 saverestore-support-checkfail never pass test-armhf-armhf-xl 15 migrate-support-checkfail never pass test-armhf-armhf-xl 16 saverestore-support-checkfail never pass test-armhf-armhf-xl-rtds 15 migrate-support-checkfail never pass test-armhf-armhf-xl-rtds 16 saverestore-support-checkfail never pass test-armhf-armhf-xl-vhd 14 migrate-support-checkfail never pass test-armhf-armhf-xl-vhd 15 saverestore-support-checkfail never pass test-armhf-armhf-libvirt-raw 14 migrate-support-checkfail never pass test-armhf-armhf-xl-multivcpu 15 migrate-support-checkfail never pass test-armhf-armhf-xl-multivcpu 16 saverestore-support-checkfail never pass test-armhf-armhf-xl-arndale 15 migrate-support-checkfail never pass test-armhf-armhf-xl-arndale 16 saverestore-support-checkfail never pass test-amd64-amd64-libvirt-qemuu-debianhvm-amd64-xsm 13 
migrate-support-check fail never pass test-armhf-armhf-xl-credit1 15 migrate-support-checkfail never pass test-armhf-armhf-xl-credit1 16 saverestore-support-checkfail never pass test-armhf-armhf-libvirt-qcow2 14 migrate-support-checkfail never pass version targeted for testing: xen
Re: [PATCH v4 3/8] x86/EFI: retrieve EDID
On 05.04.2022 12:27, Roger Pau Monné wrote: > On Thu, Mar 31, 2022 at 11:45:36AM +0200, Jan Beulich wrote: >> --- a/xen/arch/x86/efi/efi-boot.h >> +++ b/xen/arch/x86/efi/efi-boot.h >> @@ -568,6 +568,49 @@ static void __init efi_arch_video_init(E >> #endif >> } >> >> +#ifdef CONFIG_VIDEO >> +static bool __init copy_edid(const void *buf, unsigned int size) >> +{ >> +/* >> + * Be conservative - for both undersized and oversized blobs it is >> unclear >> + * what to actually do with them. The more that unlike the VESA BIOS >> + * interface we also have no associated "capabilities" value (which >> might >> + * carry a hint as to possible interpretation). >> + */ >> +if ( size != ARRAY_SIZE(boot_edid_info) ) >> +return false; >> + >> +memcpy(boot_edid_info, buf, size); >> +boot_edid_caps = 0; >> + >> +return true; >> +} >> +#endif >> + >> +static void __init efi_arch_edid(EFI_HANDLE gop_handle) >> +{ >> +#ifdef CONFIG_VIDEO >> +static EFI_GUID __initdata active_guid = EFI_EDID_ACTIVE_PROTOCOL_GUID; >> +static EFI_GUID __initdata discovered_guid = >> EFI_EDID_DISCOVERED_PROTOCOL_GUID; > > Is there a need to make those static? > > I think this function is either called from efi_start or > efi_multiboot, but there aren't multiple calls to it? (also both > parameters are IN only, so not to be changed by the EFI method? > > I have the feeling setting them to static is done because they can't > be set to const? Even if they could be const, they ought to also be static. They don't strictly need to be, but without "static" code will be generated to populate the on-stack variables; quite possibly the compiler would even allocate an unnamed static variable and memcpy() from there onto the stack. 
>> +    EFI_EDID_ACTIVE_PROTOCOL *active_edid;
>> +    EFI_EDID_DISCOVERED_PROTOCOL *discovered_edid;
>> +    EFI_STATUS status;
>> +
>> +    status = efi_bs->OpenProtocol(gop_handle, &active_guid,
>> +                                  (void **)&active_edid, efi_ih, NULL,
>> +                                  EFI_OPEN_PROTOCOL_GET_PROTOCOL);
>> +    if ( status == EFI_SUCCESS &&
>> +         copy_edid(active_edid->Edid, active_edid->SizeOfEdid) )
>> +        return;
>
> Isn't it enough to just call EFI_EDID_ACTIVE_PROTOCOL_GUID?
>
> From my reading of the UEFI spec this will either return
> EFI_EDID_OVERRIDE_PROTOCOL_GUID or EFI_EDID_DISCOVERED_PROTOCOL_GUID.
> If EFI_EDID_OVERRIDE_PROTOCOL is set it must be used, and hence
> falling back to EFI_EDID_DISCOVERED_PROTOCOL_GUID if
> EFI_EDID_ACTIVE_PROTOCOL_GUID cannot be parsed would likely mean
> ignoring EFI_EDID_OVERRIDE_PROTOCOL?

That's the theory. As per one of the post-commit-message remarks I had
looked at what GrUB does, and I decided to follow its behavior in this
regard, assuming they do what they do to work around quirks. As said in
the remark, I didn't want to go as far as also cloning their use of the
undocumented (afaik) "agp-internal-edid" variable.

>> --- a/xen/include/efi/efiprot.h
>> +++ b/xen/include/efi/efiprot.h
>> @@ -724,5 +724,52 @@ struct _EFI_GRAPHICS_OUTPUT_PROTOCOL {
>>    EFI_GRAPHICS_OUTPUT_PROTOCOL_BLT   Blt;
>>    EFI_GRAPHICS_OUTPUT_PROTOCOL_MODE  *Mode;
>>  };
>> +
>> +/*
>> + * EFI EDID Discovered Protocol
>> + * UEFI Specification Version 2.5 Section 11.9
>> + */
>> +#define EFI_EDID_DISCOVERED_PROTOCOL_GUID \
>> +    { 0x1C0C34F6, 0xD380, 0x41FA, { 0xA0, 0x49, 0x8a, 0xD0, 0x6C, 0x1A, 0x66, 0xAA} }
>> +
>> +typedef struct _EFI_EDID_DISCOVERED_PROTOCOL {
>> +    UINT32 SizeOfEdid;
>> +    UINT8  *Edid;
>> +} EFI_EDID_DISCOVERED_PROTOCOL;
>> +
>> +/*
>> + * EFI EDID Active Protocol
>> + * UEFI Specification Version 2.5 Section 11.9
>> + */
>> +#define EFI_EDID_ACTIVE_PROTOCOL_GUID \
>> +    { 0xBD8C1056, 0x9F36, 0x44EC, { 0x92, 0xA8, 0xA6, 0x33, 0x7F, 0x81, 0x79, 0x86} }
>> +
>> +typedef struct _EFI_EDID_ACTIVE_PROTOCOL {
>> +    UINT32 SizeOfEdid;
>> +    UINT8  *Edid;
>> +} EFI_EDID_ACTIVE_PROTOCOL;
>> +
>> +/*
>> + * EFI EDID Override Protocol
>> + * UEFI Specification Version 2.5 Section 11.9
>> + */
>> +#define EFI_EDID_OVERRIDE_PROTOCOL_GUID \
>> +    { 0x48ECB431, 0xFB72, 0x45C0, { 0xA9, 0x22, 0xF4, 0x58, 0xFE, 0x04, 0x0B, 0xD5} }
>> +
>> +INTERFACE_DECL(_EFI_EDID_OVERRIDE_PROTOCOL);
>> +
>> +typedef
>> +EFI_STATUS
>> +(EFIAPI *EFI_EDID_OVERRIDE_PROTOCOL_GET_EDID) (
>> +  IN      struct _EFI_EDID_OVERRIDE_PROTOCOL *This,
>> +  IN      EFI_HANDLE                         *ChildHandle,
>> +  OUT     UINT32                             *Attributes,
>> +  IN OUT  UINTN                              *EdidSize,
>> +  IN OUT  UINT8                              **Edid);
>> +
>> +typedef struct _EFI_EDID_OVERRIDE_PROTOCOL {
>> +    EFI_EDID_OVERRIDE_PROTOCOL_GET_EDID GetEdid;
>> +} EFI_EDID_OVERRIDE_PROTOCOL;
>> +
>>  #endif
>
> FWIW, EFI_EDID_OVERRIDE_PROTOCOL_GUID is not used by the patch, so I
> guess it's introduced for completeness (or because it's
Re: [PATCH 1/2] hw/xen/xen_pt: Confine igd-passthrough-isa-bridge to XEN
On Sat, Mar 26, 2022 at 05:58:23PM +0100, Bernhard Beschow wrote:
> igd-passthrough-isa-bridge is only requested in xen_pt but was
> implemented in pc_piix.c. This caused xen_pt to depend on i386/pc
> which is hereby resolved.
>
> Signed-off-by: Bernhard Beschow

Acked-by: Anthony PERARD

Thanks,

--
Anthony PERARD
Re: [PATCH 2/2] hw/xen/xen_pt: Resolve igd_passthrough_isa_bridge_create() indirection
On Sat, Mar 26, 2022 at 05:58:24PM +0100, Bernhard Beschow wrote:
> Now that igd_passthrough_isa_bridge_create() is implemented within the
> xen context it may use Xen* data types directly and become
> xen_igd_passthrough_isa_bridge_create(). This resolves an indirection.
>
> Signed-off-by: Bernhard Beschow

Acked-by: Anthony PERARD

Thanks,

--
Anthony PERARD
[PATCH] osstest: stop anacron service
Just disabling cron in rc.d is not enough. There's also anacron which
will get invoked during startup, and since apt-compat has a delay of up
to 30min it can be picked up by the leak detector if the test finishes
fast enough:

LEAKED [process 14563 sleep] process: root 14563 14556 0 07:49 ? 00:00:00 sleep 1163
LEAKED [process 14550 /bin/sh] process: root 14550 2264 0 07:49 ? 00:00:00 /bin/sh -c run-parts --report /etc/cron.daily
LEAKED [process 14551 run-parts] process: root 14551 14550 0 07:49 ? 00:00:00 run-parts --report /etc/cron.daily
LEAKED [process 14556 /bin/sh] process: root 14556 14551 0 07:49 ? 00:00:00 /bin/sh /etc/cron.daily/apt-compat

From:

http://logs.test-lab.xenproject.org/osstest/logs/169015

To prevent this disable anacron like it's done for cron.

Signed-off-by: Roger Pau Monné
---
 Osstest/TestSupport.pm | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/Osstest/TestSupport.pm b/Osstest/TestSupport.pm
index 8103ea1d..8e3e5f68 100644
--- a/Osstest/TestSupport.pm
+++ b/Osstest/TestSupport.pm
@@ -3151,6 +3151,8 @@ sub host_install_postboot_complete ($) {
     target_core_dump_setup($ho);
     target_cmd_root($ho, "update-rc.d cron disable");
     target_cmd_root($ho, "service cron stop");
+    target_cmd_root($ho, "update-rc.d anacron disable");
+    target_cmd_root($ho, "service anacron stop");
     target_cmd_root($ho, "update-rc.d osstest-confirm-booted start 99 2 .");
     target_https_mitm_proxy_setup($ho);
 }
--
2.35.1
[xen-unstable-smoke test] 169175: tolerable all pass - PUSHED
flight 169175 xen-unstable-smoke real [real]
http://logs.test-lab.xenproject.org/osstest/logs/169175/

Failures :-/ but no regressions.

Tests which did not succeed, but are not blocking:
 test-amd64-amd64-libvirt     15 migrate-support-check        fail   never pass
 test-arm64-arm64-xl-xsm      15 migrate-support-check        fail   never pass
 test-arm64-arm64-xl-xsm      16 saverestore-support-check    fail   never pass
 test-armhf-armhf-xl          15 migrate-support-check        fail   never pass
 test-armhf-armhf-xl          16 saverestore-support-check    fail   never pass

version targeted for testing:
 xen                  120e26c2bb0097a589d718b1b58d7052ccce4458
baseline version:
 xen                  e270af94280e6a9610705ebc1fdd1d7a9b1f8a98

Last test of basis   169160  2022-04-04 12:03:06 Z    1 days
Testing same since   169175  2022-04-05 10:01:52 Z    0 days    1 attempts

People who touched revisions under test:
  Anthony PERARD
  Jan Beulich
  Julien Grall

jobs:
 build-arm64-xsm                              pass
 build-amd64                                  pass
 build-armhf                                  pass
 build-amd64-libvirt                          pass
 test-armhf-armhf-xl                          pass
 test-arm64-arm64-xl-xsm                      pass
 test-amd64-amd64-xl-qemuu-debianhvm-amd64    pass
 test-amd64-amd64-libvirt                     pass

sg-report-flight on osstest.test-lab.xenproject.org
logs: /home/logs/logs
images: /home/logs/images

Logs, config files, etc. are available at
    http://logs.test-lab.xenproject.org/osstest/logs

Explanation of these reports, and of osstest in general, is at
    http://xenbits.xen.org/gitweb/?p=osstest.git;a=blob;f=README.email;hb=master
    http://xenbits.xen.org/gitweb/?p=osstest.git;a=blob;f=README;hb=master

Test harness code can be found at
    http://xenbits.xen.org/gitweb?p=osstest.git;a=summary

Pushing revision :

To xenbits.xen.org:/home/xen/git/xen.git
   e270af9428..120e26c2bb  120e26c2bb0097a589d718b1b58d7052ccce4458 -> smoke
Re: preparations for 4.14.5 ?
On Mon, Apr 04, 2022 at 03:42:09PM +0200, Jan Beulich wrote:
> On 01.04.2022 15:46, Marek Marczykowski-Górecki wrote:
> > On Wed, Mar 30, 2022 at 12:16:00PM +0200, Jan Beulich wrote:
> > I'm not sure if "just" bugfix qualify for 4.14 at this point, but if so,
> > I'd propose:
> > 0a20a53df158 tools/libs/light: set video_mem for PVH guests
> >
> > In any case, the above should be backported to 4.15 and 4.16.
>
> Hmm, Anthony, I'd like to ask for your view here: This looks more
> like a cosmetic change to me at the first glance. Plus it's a
> little odd to see it being proposed for backporting now, when it's
> already almost 4 months old and hence could have gone into 4.15.2
> and 4.14.4 if it was important.

The patch might be good to backport. I guess that, without the patch,
memory hotplug with PVH guests could get messed up a little.

I've got a few other commits which would be good to backport, I think:

e45ad0b1b0 ("xl: Fix global pci options")
d2ecf97f91 ("libxl: Don't segfault on soft-reset failure")
d62a34423a ("libxl: Re-scope qmp_proxy_spawn.ao usage")

Thanks,

--
Anthony PERARD
Re: [PATCH] x86/irq: Skip unmap_domain_pirq XSM during destruction
On 4/5/22 04:18, Jan Beulich wrote: > On 30.03.2022 20:17, Jason Andryuk wrote: >> xsm_unmap_domain_irq was seen denying unmap_domain_pirq when called from >> complete_domain_destroy as an RCU callback. The source context was an >> unexpected, random domain. Since this is a xen-internal operation, >> we don't want the XSM hook denying the operation. >> >> Check d->is_dying and skip the check when the domain is dead. The RCU >> callback runs when a domain is in that state. > > One question which has always been puzzling me (perhaps to Daniel): While > I can see why mapping of an IRQ needs to be subject to an XSM check, it's > not really clear to me why unmapping would need to be, at least as long > as it's the domain itself which requests the unmap (and which I would > view to extend to the domain being cleaned up). But maybe that's why it's > XSM_HOOK ... There are situations for instance where there is a flask-based system with one or more domains (v-platform-mgr) that are each responsible for the management of a subset of domains and are responsible for hotplugging in and out a device, i.e. granting the privilege to a v-platform-mgr to call PHYSDEVOP_map_pirq/PHYSDEVOP_unmap_pirq, for the domains each one is managing. >> --- >> Dan wants to change current to point at DOMID_IDLE when the RCU callback >> runs. I think Juergen's commit 53594c7bd197 "rcu: don't use >> stop_machine_run() for rcu_barrier()" may have changed this since it >> mentions stop_machine_run scheduled the idle vcpus to run the callbacks >> for the old code. >> >> Would that be as easy as changing rcu_do_batch() to do: >> >> +/* Run as "Xen" not a random domain's vcpu. */ >> +vcpu = get_current(); >> +set_current(idle_vcpu[smp_processor_id()]); >> list->func(list); >> +set_current(vcpu); >> >> or is using set_current() only acceptable as part of context_switch? > > Indeed I would question any uses outside of context_switch() (and > system bringup). 
I am not familiar with the details of the scheduler, but from a higher level, conceptual perspective, I do not understand why an idle domain task is being executed without an explicit context switch to the idle domain to ensure the current world view is consistent with the task execution scope. Just seems to me like this is creating a situation where things have the potential to go sideways/wrong. v/r, dps
Re: [PATCH 2/2] arch: ensure idle domain is not left privileged
On 4/5/22 04:26, Jan Beulich wrote: > On 31.03.2022 01:05, Daniel P. Smith wrote: >> --- a/xen/arch/x86/setup.c >> +++ b/xen/arch/x86/setup.c >> @@ -589,6 +589,9 @@ static void noinline init_done(void) >> void *va; >> unsigned long start, end; >> >> +/* Ensure idle domain was not left privileged */ >> +ASSERT(current->domain->is_privileged == false) ; > > I think this should be stronger than ASSERT(); I'd recommend calling > panic(). Also please don't compare against "true" or "false" - use > ordinary boolean operations instead (here it would be > "!current->domain->is_privileged"). Ack. v/r, dps
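A standalone sketch of the two review points (a check that is not compiled out, and a plain boolean test instead of a comparison against "false") could look like this. All names here are hypothetical: struct domain is reduced to a single field, and demo_panic() merely stands in for Xen's panic(). The motivation assumed for preferring panic() is that ASSERT() is only active in debug builds.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical, heavily reduced stand-in for Xen's struct domain. */
struct domain {
    bool is_privileged;
};

/* Stand-in for Xen's panic(): report the problem and stop. */
static void demo_panic(const char *msg)
{
    fprintf(stderr, "panic: %s\n", msg);
    abort();
}

/* Plain boolean test, i.e. no "== false" comparison. */
bool idle_is_deprivileged(const struct domain *idle)
{
    assert(idle != NULL);
    return !idle->is_privileged;
}

/*
 * Unlike an assertion, this check stays in place in every build and
 * stops the system if the idle domain was left privileged.
 */
void check_idle_deprivileged(const struct domain *idle)
{
    if ( !idle_is_deprivileged(idle) )
        demo_panic("idle domain was left privileged");
}
```

In the real init_done() the argument would correspond to current->domain; here it is passed in explicitly so the sketch stays self-contained.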
[ovmf test] 169173: regressions - FAIL
flight 169173 ovmf real [real] http://logs.test-lab.xenproject.org/osstest/logs/169173/ Regressions :-( Tests which did not succeed and are blocking, including tests which could not be run: build-amd64 6 xen-buildfail REGR. vs. 168254 build-amd64-xsm 6 xen-buildfail REGR. vs. 168254 build-i3866 xen-buildfail REGR. vs. 168254 build-i386-xsm6 xen-buildfail REGR. vs. 168254 Tests which did not succeed, but are not blocking: build-amd64-libvirt 1 build-check(1) blocked n/a build-i386-libvirt1 build-check(1) blocked n/a test-amd64-amd64-xl-qemuu-ovmf-amd64 1 build-check(1) blocked n/a test-amd64-i386-xl-qemuu-ovmf-amd64 1 build-check(1) blocked n/a version targeted for testing: ovmf a298a84478053872ed9da660a75f182ce81b8ddc baseline version: ovmf b1b89f9009f2390652e0061bd7b24fc40732bc70 Last test of basis 168254 2022-02-28 10:41:46 Z 36 days Failing since168258 2022-03-01 01:55:31 Z 35 days 279 attempts Testing same since 169173 2022-04-05 05:13:00 Z0 days1 attempts People who touched revisions under test: Abdul Lateef Attar Abdul Lateef Attar via groups.io Abner Chang Akihiko Odaki Anthony PERARD Bob Feng Gerd Hoffmann Guo Dong Guomin Jiang Hao A Wu Hua Ma Huang, Li-Xia Jagadeesh Ujja Jason Jason Lou Ken Lautner Kenneth Lautner Kuo, Ted Laszlo Ersek Leif Lindholm Li, Zhihao Liming Gao Liu Liu Yun Liu Yun Y Lixia Huang Lou, Yun Ma, Hua Mara Sophie Grosch Mara Sophie Grosch via groups.io Matt DeVillier Michael D Kinney Michael Kubacki Michael Kubacki Min Xu Patrick Rudolph Purna Chandra Rao Bandaru Ray Ni Sami Mujawar Sean Rhodes Sean Rhodes sean@starlabs.systems Sebastien Boeuf Sunny Wang Ted Kuo Wenyi Xie wenyi,xie via groups.io Xiaolu.Jiang Xie, Yuanhao Yi Li Yuanhao Xie Zhihao Li jobs: build-amd64-xsm fail build-i386-xsm fail build-amd64 fail build-i386 fail build-amd64-libvirt blocked build-i386-libvirt blocked build-amd64-pvopspass build-i386-pvops pass test-amd64-amd64-xl-qemuu-ovmf-amd64 blocked test-amd64-i386-xl-qemuu-ovmf-amd64 blocked sg-report-flight on 
osstest.test-lab.xenproject.org logs: /home/logs/logs images: /home/logs/images Logs, config files, etc. are available at http://logs.test-lab.xenproject.org/osstest/logs Explanation of these reports, and of osstest in general, is at http://xenbits.xen.org/gitweb/?p=osstest.git;a=blob;f=README.email;hb=master http://xenbits.xen.org/gitweb/?p=osstest.git;a=blob;f=README;hb=master Test harness code can be found at http://xenbits.xen.org/gitweb?p=osstest.git;a=summary Not pushing. (No revision log; it would be 4610 lines long.)
Re: [PATCH 1/2] xsm: add ability to elevate a domain to privileged
On 4/5/22 03:42, Roger Pau Monné wrote: > On Mon, Apr 04, 2022 at 12:08:25PM -0400, Daniel P. Smith wrote: >> On 4/4/22 11:12, Roger Pau Monné wrote: >>> On Mon, Apr 04, 2022 at 10:21:18AM -0400, Daniel P. Smith wrote: On 3/31/22 08:36, Roger Pau Monné wrote: > On Wed, Mar 30, 2022 at 07:05:48PM -0400, Daniel P. Smith wrote: >> diff --git a/xen/include/xsm/xsm.h b/xen/include/xsm/xsm.h >> index e22d6160b5..157e57151e 100644 >> --- a/xen/include/xsm/xsm.h >> +++ b/xen/include/xsm/xsm.h >> @@ -189,6 +189,28 @@ struct xsm_operations { >> #endif >> }; >> >> +static always_inline int xsm_elevate_priv(struct domain *d) > > I don't think it needs to be always_inline, using just inline would be > fine IMO. > > Also this needs to be __init. AIUI always_inline is likely the best way to preserve the speculation safety brought in by the call to is_system_domain(). >>> >>> There's nothing related to speculation safety in is_system_domain() >>> AFAICT. It's just a plain check against d->domain_id. It's my >>> understanding there's no need for any speculation barrier there >>> because d->domain_id is not an external input. >> >> Hmmm, this actually raises a good question. Why is is_control_domain(), >> is_hardware_domain, and others all have evaluate_nospec() wrapping the >> check of a struct domain element while is_system_domain() does not? > > Jan replied to this regard, see: > > https://lore.kernel.org/xen-devel/54272d08-7ce1-b162-c8e9-1955b780c...@suse.com/ Jan can correct me if I misunderstood, but his point is with respect to where the inline function will be expanded into and I would think you would want to ensure that if anyone were to use is_system_domain(), then the inline expansion of this new location could create a potential speculation-able branch. 
Basically, my concern is that leaving the guards out today, just because
there is currently no location where is_system_domain() is expanded in a
way that creates a speculation opportunity, does not mean that a future
unprotected use could not open one up down the road.

>>> In any case this function should be __init only, at which point there
>>> are no untrusted inputs to Xen.
>>
>> I thought it was agreed that __init on inline functions in headers had
>> no meaning?
>
> In a different reply I already noted my preference would be for the
> function to not reside in a header and not be inline, simply because
> it would be gone after initialization and we won't have to worry about
> any stray calls when the system is active.

If an inline function is only used by __init code, how would it be
available for stray calls when the system is active? I would concede
that it is possible for someone to explicitly use it in non-__init code,
but I would like to believe any such usage in a submitted code change
would be questioned by the reviewers.

With that said, if we consider Jason's suggestion, would this remove
your concern, since that would only introduce a de-privilege function
and there would be no privilege escalation that could be erroneously
called at any time?

v/r
dps
Xen Security Advisory 399 v2 (CVE-2022-26357) - race in VT-d domain ID cleanup
-BEGIN PGP SIGNED MESSAGE- Hash: SHA256 Xen Security Advisory CVE-2022-26357 / XSA-399 version 2 race in VT-d domain ID cleanup UPDATES IN VERSION 2 Public release. ISSUE DESCRIPTION = Xen domain IDs are up to 15 bits wide. VT-d hardware may allow for only less than 15 bits to hold a domain ID associating a physical device with a particular domain. Therefore internally Xen domain IDs are mapped to the smaller value range. The cleaning up of the housekeeping structures has a race, allowing for VT-d domain IDs to be leaked and flushes to be bypassed. IMPACT == The precise impact is system specific, but would typically be a Denial of Service (DoS) affecting the entire host. Privilege escalation and information leaks cannot be ruled out. VULNERABLE SYSTEMS == Xen versions 4.11 through 4.16 are vulnerable. Xen versions 4.10 and earlier are not vulnerable. Only x86 systems with VT-d IOMMU hardware are vulnerable. Arm systems as well as x86 systems without VT-d hardware or without any IOMMUs in use are not vulnerable. Only x86 guests which have physical devices passed through to them can leverage the vulnerability. MITIGATION == Not passing through physical devices to untrusted guests will avoid the vulnerability. CREDITS === This issue was discovered by Jan Beulich of SUSE. RESOLUTION == Applying the appropriate attached patch resolves this issue. Note that patches for released versions are generally prepared to apply to the stable branches, and may not apply cleanly to the most recent release tarball. Downstreams are encouraged to update to the tip of the stable branch before applying these patches. 
xsa399.patch xen-unstable xsa399-4.16.patch Xen 4.16.x - Xen 4.13.x xsa399-4.12.patch Xen 4.12.x $ sha256sum xsa399* 53b9745564eb21f70dbb7bd7194ff3518f29cd9715c68e9dd7eff25812968019 xsa399.patch 16c3327a60d8ab6c3524f10f57d63efaf2e3e54b807bc285a749cd1a94392a30 xsa399-4.12.patch 79d0f5a0442dec0a806d77a722a1d2c04793572fe0b564bf86dcd1c6d992a679 xsa399-4.16.patch $ DEPLOYMENT DURING EMBARGO = Deployment of the patches described above (or others which are substantially similar) is permitted during the embargo, even on public-facing systems with untrusted guest users and administrators. HOWEVER, deployment of the mitigation is NOT permitted (except where all the affected systems and VMs are administered and used only by organisations which are members of the Xen Project Security Issues Predisclosure List). Specifically, deployment on public cloud systems is NOT permitted. This is because removal of pass-through devices or their replacement by emulated devices is a guest visible configuration change, which may lead to re-discovery of the issue. Deployment of this mitigation is permitted only AFTER the embargo ends. AND: Distribution of updated software is prohibited (except to other members of the predisclosure list). Predisclosure list members who wish to deploy significantly different patches and/or mitigations, please contact the Xen Project Security Team. (Note: this during-embargo deployment notice is retained in post-embargo publicly released Xen Project advisories, even though it is then no longer applicable. This is to enable the community to have oversight of the Xen Project Security Team's decisionmaking.) 
For more information about permissible uses of embargoed information, consult the Xen Project community's agreed Security Policy: http://www.xenproject.org/security-policy.html -BEGIN PGP SIGNATURE- iQFABAEBCAAqFiEEI+MiLBRfRHX6gGCng/4UyVfoK9kFAmJMJDcMHHBncEB4ZW4u b3JnAAoJEIP+FMlX6CvZpo8H/AqiAS0l5WJWl00bTQ4Q69REzd83m9Y3+UnUqRaf JUFWo4R1m4V2zJlq0E3TR/2ZS1RkXFJxlmXQyzueFmDEvMV2oKB0ids5ta1oUO2E eiQxdSFbTLrLnhI+4IxbTHHy+ovSHT/SKPeo1Zd1tXHfZ35g1OgGTYHHqj7RKJHp SyZT4iuAKjIr61M4NBKJcycpfRidlXEDvAotDX3jBQ06t3vgs/12nwe5LzzeV2V4 sIDjpeDGNKzgT2NgLP2b+XMEUg1259iWb19tS3PPNJaLKSvQqTBOFjK+sqh7ACXV v6ph2Yy0Q/ZP+N9DvCeBCPEU9A9RhmPYzobU+Lc/T85SrQ4= =sp/Q -END PGP SIGNATURE- xsa399.patch Description: Binary data xsa399-4.12.patch Description: Binary data xsa399-4.16.patch Description: Binary data
Xen Security Advisory 397 v2 (CVE-2022-26356) - Racy interactions between dirty vram tracking and paging log dirty hypercalls
-BEGIN PGP SIGNED MESSAGE- Hash: SHA256 Xen Security Advisory CVE-2022-26356 / XSA-397 version 2 Racy interactions between dirty vram tracking and paging log dirty hypercalls UPDATES IN VERSION 2 Public release. ISSUE DESCRIPTION = Activation of log dirty mode done by XEN_DMOP_track_dirty_vram (was named HVMOP_track_dirty_vram before Xen 4.9) is racy with ongoing log dirty hypercalls. A suitably timed call to XEN_DMOP_track_dirty_vram can enable log dirty while another CPU is still in the process of tearing down the structures related to a previously enabled log dirty mode (XEN_DOMCTL_SHADOW_OP_OFF). This is due to lack of mutually exclusive locking between both operations and can lead to entries being added in already freed slots, resulting in a memory leak. IMPACT == An attacker can cause Xen to leak memory, eventually leading to a Denial of Service (DoS) affecting the entire host. VULNERABLE SYSTEMS == All Xen versions from at least 4.0 onwards are vulnerable. Only x86 systems are vulnerable. Arm systems are not vulnerable. Only domains controlling an x86 HVM guest using Hardware Assisted Paging (HAP) can leverage the vulnerability. On common deployments this is limited to domains that run device models on behalf of guests. MITIGATION == Using only PV or PVH guests and/or running HVM guests in shadow mode will avoid the vulnerability. CREDITS === This issue was discovered by Roger Pau Monné of Citrix. RESOLUTION == Applying the appropriate attached patch resolves this issue. Note that patches for released versions are generally prepared to apply to the stable branches, and may not apply cleanly to the most recent release tarball. Downstreams are encouraged to update to the tip of the stable branch before applying these patches. 
xsa397.patch xen-unstable xsa397-4.16.patch Xen 4.16.x - Xen 4.15.x xsa397-4.14.patch Xen 4.14.x - Xen 4.13.x xsa397-4.12.patch Xen 4.12.x $ sha256sum xsa397* 49c663e2bb9131dbc2488e12487f79bdf0dafd51a32413cbf3964e39d8779cae xsa397.patch 24f95f47b79739c9cb5b9110137c802989356c82d0aa27963b5ac7e33f667285 xsa397-4.12.patch 9af14f90ba10d074425eb6072a6c648082c92c1cf8b6f881f57ed2fc13d6e49d xsa397-4.14.patch ff5dd3b7a8dbf349c3b832b7916322c0296fa59c7f9cd2ba30858989add5f65c xsa397-4.16.patch $ DEPLOYMENT DURING EMBARGO = Deployment of the patches described above (or others which are substantially similar) is permitted during the embargo, even on public-facing systems with untrusted guest users and administrators. But: Distribution of updated software (except to other members of the predisclosure list) or deployment of mitigations is prohibited. Predisclosure list members who wish to deploy significantly different patches and/or mitigations, please contact the Xen Project Security Team. (Note: this during-embargo deployment notice is retained in post-embargo publicly released Xen Project advisories, even though it is then no longer applicable. This is to enable the community to have oversight of the Xen Project Security Team's decisionmaking.) 
For more information about permissible uses of embargoed information, consult the Xen Project community's agreed Security Policy: http://www.xenproject.org/security-policy.html -BEGIN PGP SIGNATURE- iQFABAEBCAAqFiEEI+MiLBRfRHX6gGCng/4UyVfoK9kFAmJMJDEMHHBncEB4ZW4u b3JnAAoJEIP+FMlX6CvZOUMH/RRZ8aMaoywqTV38SeTFne2tFT5jnWPPXR1ZGCvh 825hmSqzcYUaILbWFruUfT2PdpGoU9Eprz3xWXBDwgsUEGvKt7ZhGoWvxzXASlDh cPRh/XwQVEEYsB1cRSk/GoLxLCQEV8oGNpmAcjEM4K1dG0VbVaRD0W2thNCmyPcv d7aTkAdD2IE8NU4hX8YGN6v+UCkjrgzL0AF/hff9CMj7Sn/wBRrdStLT0LDZU20c G/5+9nsOAVM7EwrzImI5Lx9KELyHwl37XUPffbftyTLUofdHJ5PK40J1tNIRS/RW YYvs2alF7ng7LlwB/Go8gtn4XRx6xZidceYrUk22oB4JBqo= =Fje3 -END PGP SIGNATURE- xsa397.patch Description: Binary data xsa397-4.12.patch Description: Binary data xsa397-4.14.patch Description: Binary data xsa397-4.16.patch Description: Binary data
[qemu-mainline test] 169166: tolerable FAIL - PUSHED
flight 169166 qemu-mainline real [real] flight 169176 qemu-mainline real-retest [real] http://logs.test-lab.xenproject.org/osstest/logs/169166/ http://logs.test-lab.xenproject.org/osstest/logs/169176/ Failures :-/ but no regressions. Tests which are failing intermittently (not blocking): test-arm64-arm64-xl 13 debian-fixupfail pass in 169176-retest Tests which did not succeed, but are not blocking: test-arm64-arm64-xl 15 migrate-support-check fail in 169176 never pass test-arm64-arm64-xl 16 saverestore-support-check fail in 169176 never pass test-amd64-amd64-xl-qemuu-win7-amd64 19 guest-stopfail like 169138 test-amd64-amd64-qemuu-nested-amd 20 debian-hvm-install/l1/l2 fail like 169138 test-amd64-i386-xl-qemuu-win7-amd64 19 guest-stop fail like 169138 test-armhf-armhf-libvirt-qcow2 15 saverestore-support-check fail like 169138 test-armhf-armhf-libvirt-raw 15 saverestore-support-checkfail like 169138 test-armhf-armhf-libvirt 16 saverestore-support-checkfail like 169138 test-amd64-i386-xl-qemuu-ws16-amd64 19 guest-stop fail like 169138 test-amd64-amd64-xl-qemuu-ws16-amd64 19 guest-stopfail like 169138 test-amd64-i386-xl-pvshim14 guest-start fail never pass test-arm64-arm64-xl-seattle 15 migrate-support-checkfail never pass test-arm64-arm64-xl-seattle 16 saverestore-support-checkfail never pass test-amd64-i386-libvirt 15 migrate-support-checkfail never pass test-amd64-i386-libvirt-xsm 15 migrate-support-checkfail never pass test-amd64-amd64-libvirt-xsm 15 migrate-support-checkfail never pass test-amd64-amd64-libvirt 15 migrate-support-checkfail never pass test-arm64-arm64-xl-credit2 15 migrate-support-checkfail never pass test-arm64-arm64-xl-credit1 15 migrate-support-checkfail never pass test-arm64-arm64-xl-credit2 16 saverestore-support-checkfail never pass test-arm64-arm64-xl-credit1 16 saverestore-support-checkfail never pass test-arm64-arm64-libvirt-xsm 15 migrate-support-checkfail never pass test-arm64-arm64-xl-thunderx 15 migrate-support-checkfail never pass 
test-arm64-arm64-libvirt-xsm 16 saverestore-support-checkfail never pass test-arm64-arm64-xl-thunderx 16 saverestore-support-checkfail never pass test-arm64-arm64-xl-xsm 15 migrate-support-checkfail never pass test-arm64-arm64-xl-xsm 16 saverestore-support-checkfail never pass test-amd64-amd64-libvirt-qemuu-debianhvm-amd64-xsm 13 migrate-support-check fail never pass test-amd64-i386-libvirt-qemuu-debianhvm-amd64-xsm 13 migrate-support-check fail never pass test-amd64-i386-libvirt-raw 14 migrate-support-checkfail never pass test-amd64-amd64-libvirt-vhd 14 migrate-support-checkfail never pass test-arm64-arm64-libvirt-raw 14 migrate-support-checkfail never pass test-arm64-arm64-libvirt-raw 15 saverestore-support-checkfail never pass test-armhf-armhf-xl-multivcpu 15 migrate-support-checkfail never pass test-armhf-armhf-xl-multivcpu 16 saverestore-support-checkfail never pass test-armhf-armhf-xl-credit1 15 migrate-support-checkfail never pass test-armhf-armhf-xl-credit1 16 saverestore-support-checkfail never pass test-armhf-armhf-xl 15 migrate-support-checkfail never pass test-armhf-armhf-xl 16 saverestore-support-checkfail never pass test-arm64-arm64-xl-vhd 14 migrate-support-checkfail never pass test-arm64-arm64-xl-vhd 15 saverestore-support-checkfail never pass test-armhf-armhf-xl-cubietruck 15 migrate-support-checkfail never pass test-armhf-armhf-xl-cubietruck 16 saverestore-support-checkfail never pass test-armhf-armhf-xl-rtds 15 migrate-support-checkfail never pass test-armhf-armhf-xl-rtds 16 saverestore-support-checkfail never pass test-armhf-armhf-xl-credit2 15 migrate-support-checkfail never pass test-armhf-armhf-xl-credit2 16 saverestore-support-checkfail never pass test-armhf-armhf-libvirt-qcow2 14 migrate-support-checkfail never pass test-armhf-armhf-libvirt-raw 14 migrate-support-checkfail never pass test-armhf-armhf-xl-vhd 14 migrate-support-checkfail never pass test-armhf-armhf-xl-vhd 15 saverestore-support-checkfail never pass test-armhf-armhf-libvirt 15 
migrate-support-checkfail never pass test-armhf-armhf-xl-arndale 15 migrate-support-checkfail never pass test-armhf-armhf-xl-arndale 16 saverestore-support-checkfail never pass version targeted for testing: qemuu20661b75ea6093f5e59079d00a778a972d6732c5 baseline version: qemuubc6ec396d471d9e4aae7e2ff8b72e11da9a97665 Last test of basis 169138 2022-04-03 02:47:41 Z
Re: [PATCH v4 2/8] x86/boot: obtain video info from boot loader
On Tue, Apr 05, 2022 at 12:57:51PM +0200, Jan Beulich wrote: > On 05.04.2022 11:35, Roger Pau Monné wrote: > > On Thu, Mar 31, 2022 at 11:45:02AM +0200, Jan Beulich wrote: > >> --- a/xen/arch/x86/boot/head.S > >> +++ b/xen/arch/x86/boot/head.S > >> @@ -562,12 +562,18 @@ trampoline_setup: > >> mov %esi, sym_esi(xen_phys_start) > >> mov %esi, sym_esi(trampoline_xen_phys_start) > >> > >> -mov sym_esi(trampoline_phys), %ecx > >> - > >> /* Get bottom-most low-memory stack address. */ > >> +mov sym_esi(trampoline_phys), %ecx > >> add $TRAMPOLINE_SPACE,%ecx > > > > Just for my understanding, since you are already touching the > > instruction, why not switch it to a lea like you do below? > > > > Is that because you would also like to take the opportunity to fold > > the add into the lea and that would be too much of a change? > > No. This MOV cannot be converted, as its source operand isn't an > immediate (or register); such a conversion would also be undesirable, > for increasing insn size. See the later patch doing conversions in > the other direction, to reduce code size. Somewhat similarly ... > > >> +#ifdef CONFIG_VIDEO > >> +lea sym_esi(boot_vid_info), %edx > > ... this LEA also cannot be expressed by a single MOV. > > >> @@ -32,6 +33,39 @@ asm ( > >> #include "../../../include/xen/kconfig.h" > >> #include > >> > >> +#ifdef CONFIG_VIDEO > >> +# include "video.h" > >> + > >> +/* VESA control information */ > >> +struct __packed vesa_ctrl_info { > >> +uint8_t signature[4]; > >> +uint16_t version; > >> +uint32_t oem_name; > >> +uint32_t capabilities; > >> +uint32_t mode_list; > >> +uint16_t mem_size; > >> +/* We don't use any further fields. */ > >> +}; > >> + > >> +/* VESA 2.0 mode information */ > >> +struct vesa_mode_info { > > > > Should we add __packed here just in case further added fields are no > > longer naturally aligned? (AFAICT all field right now are aligned to > > it's size so there's no need for it). 
> > I think we should avoid __packed whenever possible. > > >> +uint16_t attrib; > >> +uint8_t window[14]; /* We don't use the individual fields. */ > >> +uint16_t bytes_per_line; > >> +uint16_t width; > >> +uint16_t height; > >> +uint8_t cell_width; > >> +uint8_t cell_height; > >> +uint8_t nr_planes; > >> +uint8_t depth; > >> +uint8_t memory[5]; /* We don't use the individual fields. */ > >> +struct boot_video_colors colors; > >> +uint8_t direct_color; > >> +uint32_t base; > >> +/* We don't use any further fields. */ > >> +}; > > > > Would it make sense to put those struct definitions in boot/video.h > > like you do for boot_video_info? > > Personally I prefer to expose things in headers only when multiple > other files want to consume what is being declared/defined. > > >> @@ -254,17 +291,64 @@ static multiboot_info_t *mbi2_reloc(u32 > >> ++mod_idx; > >> break; > >> > >> +#ifdef CONFIG_VIDEO > >> +case MULTIBOOT2_TAG_TYPE_VBE: > >> +if ( video_out ) > >> +{ > >> +const struct vesa_ctrl_info *ci; > >> +const struct vesa_mode_info *mi; > >> + > >> +video = _p(video_out); > >> +ci = (void *)get_mb2_data(tag, vbe, vbe_control_info); > >> +mi = (void *)get_mb2_data(tag, vbe, vbe_mode_info); > >> + > >> +if ( ci->version >= 0x0200 && (mi->attrib & 0x9b) == 0x9b > >> ) > >> +{ > >> +video->capabilities = ci->capabilities; > >> +video->lfb_linelength = mi->bytes_per_line; > >> +video->lfb_width = mi->width; > >> +video->lfb_height = mi->height; > >> +video->lfb_depth = mi->depth; > >> +video->lfb_base = mi->base; > >> +video->lfb_size = ci->mem_size; > >> +video->colors = mi->colors; > >> +video->vesa_attrib = mi->attrib; > >> +} > >> + > >> +video->vesapm.seg = get_mb2_data(tag, vbe, > >> vbe_interface_seg); > >> +video->vesapm.off = get_mb2_data(tag, vbe, > >> vbe_interface_off); > >> +} > >> +break; > >> + > >> +case MULTIBOOT2_TAG_TYPE_FRAMEBUFFER: > >> +if ( (get_mb2_data(tag, framebuffer, framebuffer_type) != > >> + MULTIBOOT2_FRAMEBUFFER_TYPE_RGB) ) > >> +{ > >> 
+video_out = 0; > >> +video = NULL; > >> +} > > > > I'm confused, don't you need to store the information in the > > framebuffer tag for use after relocation? > > If there was a consumer - yes. Right now this tag is used only to > invalidate the information taken from the other tag (or to suppress > taking values from there if that other tag
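Editorial illustration (not part of the thread): the quoted check `ci->version >= 0x0200 && (mi->attrib & 0x9b) == 0x9b` can be read as "VBE 2.0 or later, and every required mode attribute bit is set". The sketch below spells the mask out bit by bit. The macro and function names are hypothetical, and the bit meanings follow a reading of the VBE 2.0 ModeAttributes field rather than anything stated in the patch.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical names; bit meanings assumed from the VBE 2.0 ModeAttributes field. */
#define VBE_ATTR_SUPPORTED (1u << 0) /* mode supported by the hardware */
#define VBE_ATTR_RESERVED1 (1u << 1) /* reserved bit, expected to read as 1 */
#define VBE_ATTR_COLOR     (1u << 3) /* color (not monochrome) mode */
#define VBE_ATTR_GRAPHICS  (1u << 4) /* graphics (not text) mode */
#define VBE_ATTR_LFB       (1u << 7) /* linear frame buffer available */

/* The five bits together form the 0x9b mask used in the quoted patch. */
#define VBE_ATTR_REQUIRED (VBE_ATTR_SUPPORTED | VBE_ATTR_RESERVED1 | \
                           VBE_ATTR_COLOR | VBE_ATTR_GRAPHICS | VBE_ATTR_LFB)

/* Mirrors the quoted condition: version gate plus "all required bits set". */
static bool vbe_mode_usable(uint16_t version, uint16_t attrib)
{
    return version >= 0x0200 &&
           (attrib & VBE_ATTR_REQUIRED) == VBE_ATTR_REQUIRED;
}
```

Comparing the masked value against the mask itself (rather than testing for non-zero) is what makes all five bits mandatory; extra bits in `attrib` are ignored.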
Re: [PATCH 1/2] hw/xen/xen_pt: Confine igd-passthrough-isa-bridge to XEN
Am 26. März 2022 16:58:23 UTC schrieb Bernhard Beschow : >igd-passthrough-isa-bridge is only requested in xen_pt but was >implemented in pc_piix.c. This caused xen_pt to dependend on i386/pc >which is hereby resolved. > >Signed-off-by: Bernhard Beschow >--- > hw/i386/pc_piix.c| 118 -- > hw/xen/xen_pt.c | 1 - > hw/xen/xen_pt.h | 1 + > hw/xen/xen_pt_graphics.c | 119 +++ > include/hw/i386/pc.h | 1 - > 5 files changed, 120 insertions(+), 120 deletions(-) > >diff --git a/hw/i386/pc_piix.c b/hw/i386/pc_piix.c >index b72c03d0a6..6ad5c02f07 100644 >--- a/hw/i386/pc_piix.c >+++ b/hw/i386/pc_piix.c >@@ -801,124 +801,6 @@ static void pc_i440fx_1_4_machine_options(MachineClass >*m) > DEFINE_I440FX_MACHINE(v1_4, "pc-i440fx-1.4", pc_compat_1_4_fn, > pc_i440fx_1_4_machine_options); > >-typedef struct { >-uint16_t gpu_device_id; >-uint16_t pch_device_id; >-uint8_t pch_revision_id; >-} IGDDeviceIDInfo; >- >-/* In real world different GPU should have different PCH. But actually >- * the different PCH DIDs likely map to different PCH SKUs. We do the >- * same thing for the GPU. For PCH, the different SKUs are going to be >- * all the same silicon design and implementation, just different >- * features turn on and off with fuses. The SW interfaces should be >- * consistent across all SKUs in a given family (eg LPT). But just same >- * features may not be supported. >- * >- * Most of these different PCH features probably don't matter to the >- * Gfx driver, but obviously any difference in display port connections >- * will so it should be fine with any PCH in case of passthrough. >- * >- * So currently use one PCH version, 0x8c4e, to cover all HSW(Haswell) >- * scenarios, 0x9cc3 for BDW(Broadwell). 
>- */ >-static const IGDDeviceIDInfo igd_combo_id_infos[] = { >-/* HSW Classic */ >-{0x0402, 0x8c4e, 0x04}, /* HSWGT1D, HSWD_w7 */ >-{0x0406, 0x8c4e, 0x04}, /* HSWGT1M, HSWM_w7 */ >-{0x0412, 0x8c4e, 0x04}, /* HSWGT2D, HSWD_w7 */ >-{0x0416, 0x8c4e, 0x04}, /* HSWGT2M, HSWM_w7 */ >-{0x041E, 0x8c4e, 0x04}, /* HSWGT15D, HSWD_w7 */ >-/* HSW ULT */ >-{0x0A06, 0x8c4e, 0x04}, /* HSWGT1UT, HSWM_w7 */ >-{0x0A16, 0x8c4e, 0x04}, /* HSWGT2UT, HSWM_w7 */ >-{0x0A26, 0x8c4e, 0x06}, /* HSWGT3UT, HSWM_w7 */ >-{0x0A2E, 0x8c4e, 0x04}, /* HSWGT3UT28W, HSWM_w7 */ >-{0x0A1E, 0x8c4e, 0x04}, /* HSWGT2UX, HSWM_w7 */ >-{0x0A0E, 0x8c4e, 0x04}, /* HSWGT1ULX, HSWM_w7 */ >-/* HSW CRW */ >-{0x0D26, 0x8c4e, 0x04}, /* HSWGT3CW, HSWM_w7 */ >-{0x0D22, 0x8c4e, 0x04}, /* HSWGT3CWDT, HSWD_w7 */ >-/* HSW Server */ >-{0x041A, 0x8c4e, 0x04}, /* HSWSVGT2, HSWD_w7 */ >-/* HSW SRVR */ >-{0x040A, 0x8c4e, 0x04}, /* HSWSVGT1, HSWD_w7 */ >-/* BSW */ >-{0x1606, 0x9cc3, 0x03}, /* BDWULTGT1, BDWM_w7 */ >-{0x1616, 0x9cc3, 0x03}, /* BDWULTGT2, BDWM_w7 */ >-{0x1626, 0x9cc3, 0x03}, /* BDWULTGT3, BDWM_w7 */ >-{0x160E, 0x9cc3, 0x03}, /* BDWULXGT1, BDWM_w7 */ >-{0x161E, 0x9cc3, 0x03}, /* BDWULXGT2, BDWM_w7 */ >-{0x1602, 0x9cc3, 0x03}, /* BDWHALOGT1, BDWM_w7 */ >-{0x1612, 0x9cc3, 0x03}, /* BDWHALOGT2, BDWM_w7 */ >-{0x1622, 0x9cc3, 0x03}, /* BDWHALOGT3, BDWM_w7 */ >-{0x162B, 0x9cc3, 0x03}, /* BDWHALO28W, BDWM_w7 */ >-{0x162A, 0x9cc3, 0x03}, /* BDWGT3WRKS, BDWM_w7 */ >-{0x162D, 0x9cc3, 0x03}, /* BDWGT3SRVR, BDWM_w7 */ >-}; >- >-static void isa_bridge_class_init(ObjectClass *klass, void *data) >-{ >-DeviceClass *dc = DEVICE_CLASS(klass); >-PCIDeviceClass *k = PCI_DEVICE_CLASS(klass); >- >-dc->desc= "ISA bridge faked to support IGD PT"; >-set_bit(DEVICE_CATEGORY_BRIDGE, dc->categories); >-k->vendor_id= PCI_VENDOR_ID_INTEL; >-k->class_id = PCI_CLASS_BRIDGE_ISA; >-}; >- >-static const TypeInfo isa_bridge_info = { >-.name = "igd-passthrough-isa-bridge", >-.parent= TYPE_PCI_DEVICE, >-.instance_size = sizeof(PCIDevice), >-.class_init 
= isa_bridge_class_init, >-.interfaces = (InterfaceInfo[]) { >-{ INTERFACE_CONVENTIONAL_PCI_DEVICE }, >-{ }, >-}, >-}; >- >-static void pt_graphics_register_types(void) >-{ >-type_register_static(_bridge_info); >-} >-type_init(pt_graphics_register_types) >- >-void igd_passthrough_isa_bridge_create(PCIBus *bus, uint16_t gpu_dev_id) >-{ >-struct PCIDevice *bridge_dev; >-int i, num; >-uint16_t pch_dev_id = 0x; >-uint8_t pch_rev_id = 0; >- >-num = ARRAY_SIZE(igd_combo_id_infos); >-for (i = 0; i < num; i++) { >-if (gpu_dev_id == igd_combo_id_infos[i].gpu_device_id) { >-pch_dev_id = igd_combo_id_infos[i].pch_device_id; >-pch_rev_id = igd_combo_id_infos[i].pch_revision_id; >-} >-} >- >-if (pch_dev_id == 0x) { >-return; >-} >- >-/* Currently IGD drivers always need to access PCH by 1f.0. */ >-bridge_dev =
Re: [PATCH v4 8/8] x86/boot: fold two MOVs into an ADD
On Thu, Mar 31, 2022 at 11:51:02AM +0200, Jan Beulich wrote: > There's no point going through %ax; the addition can be done directly in > %di. > > Signed-off-by: Jan Beulich Acked-by: Roger Pau Monné Thanks, Roger.
Re: [PATCH v4 7/8] x86/boot: LEA -> MOV in video handling code
On Thu, Mar 31, 2022 at 11:50:20AM +0200, Jan Beulich wrote: > Replace most LEA instances with (one byte shorter) MOV. > > Signed-off-by: Jan Beulich Acked-by: Roger Pau Monné Thanks, Roger.
Re: [PATCH v4 6/8] x86/boot: fold/replace moves in video handling code
On Thu, Mar 31, 2022 at 11:50:00AM +0200, Jan Beulich wrote: > Replace (mainly) MOV forms with shorter insns (or sequences thereof). > > Signed-off-by: Jan Beulich Acked-by: Roger Pau Monné Thanks, Roger.
Re: [PATCH 1/2] tools/firmware: fix setting of fcf-protection=none
On 05/04/2022 12:04, Jan Beulich wrote: > On 05.04.2022 12:58, Andrew Cooper wrote: >> On 05/04/2022 11:18, Jan Beulich wrote: >>> On 01.04.2022 17:05, Andrew Cooper wrote: On 01/04/2022 15:48, Andrew Cooper wrote: > On 01/04/2022 15:37, Roger Pau Monne wrote: >> Setting the fcf-protection=none option in EMBEDDED_EXTRA_CFLAGS in the >> Makefile doesn't get it propagated to the subdirectories, so instead >> set the flag in firmware/Rules.mk, like it's done for other compiler >> flags. >> >> Fixes: 3667f7f8f7 ('x86: Introduce support for CET-IBT') >> Signed-off-by: Roger Pau Monné > Acked-by: Andrew Cooper This also needs backporting with the XSA-398 CET-IBT fixes. >>> I don't think so - the backports of the original commit didn't include >>> what this patch fixes. I have queued patch 2 of this series though. >> In which case I screwed up the backport. (I remember spotting this bug >> and thought I'd corrected it, but clearly not.) tools/firmware really >> does need to be -fcf-protection=none to counteract the defaults in >> Ubuntu/etc. > Okay, I'll adjust title and description some then while doing the backport. Thanks, and sorry for this mess. ~Andrew
Re: [PATCH 1/2] tools/firmware: fix setting of fcf-protection=none
On 05.04.2022 12:58, Andrew Cooper wrote: > On 05/04/2022 11:18, Jan Beulich wrote: >> On 01.04.2022 17:05, Andrew Cooper wrote: >>> On 01/04/2022 15:48, Andrew Cooper wrote: On 01/04/2022 15:37, Roger Pau Monne wrote: > Setting the fcf-protection=none option in EMBEDDED_EXTRA_CFLAGS in the > Makefile doesn't get it propagated to the subdirectories, so instead > set the flag in firmware/Rules.mk, like it's done for other compiler > flags. > > Fixes: 3667f7f8f7 ('x86: Introduce support for CET-IBT') > Signed-off-by: Roger Pau Monné Acked-by: Andrew Cooper >>> This also needs backporting with the XSA-398 CET-IBT fixes. >> I don't think so - the backports of the original commit didn't include >> what this patch fixes. I have queued patch 2 of this series though. > > In which case I screwed up the backport. (I remember spotting this bug > and thought I'd corrected it, but clearly not.) tools/firmware really > does need to be -fcf-protection=none to counteract the defaults in > Ubuntu/etc. Okay, I'll adjust title and description some then while doing the backport. Jan
Re: Increasing domain memory beyond initial maxmem
Hi Marek, On 31.03.22 14:36, Marek Marczykowski-Górecki wrote: On Thu, Mar 31, 2022 at 02:22:03PM +0200, Juergen Gross wrote: Maybe some kernel config differences, or other udev rules (memory onlining is done via udev in my guest)? I'm seeing: # zgrep MEMORY_HOTPLUG /proc/config.gz CONFIG_ARCH_ENABLE_MEMORY_HOTPLUG=y CONFIG_MEMORY_HOTPLUG=y # CONFIG_MEMORY_HOTPLUG_DEFAULT_ONLINE is not set CONFIG_XEN_BALLOON_MEMORY_HOTPLUG=y CONFIG_XEN_MEMORY_HOTPLUG_LIMIT=512 I have: # zgrep MEMORY_HOTPLUG /proc/config.gz CONFIG_ARCH_ENABLE_MEMORY_HOTPLUG=y CONFIG_MEMORY_HOTPLUG=y CONFIG_MEMORY_HOTPLUG_DEFAULT_ONLINE=y CONFIG_XEN_BALLOON_MEMORY_HOTPLUG=y CONFIG_XEN_MEMORY_HOTPLUG_LIMIT=512 Not sure if relevant, but I also have: CONFIG_XEN_UNPOPULATED_ALLOC=y on top of that, I have a similar udev rule too: SUBSYSTEM=="memory", ACTION=="add", ATTR{state}=="offline", ATTR{state}="online" But I don't think they are conflicting. What type of guest are you using? Mine was a PVH guest. PVH here too. Would you like to try the attached patch? It seemed to work for me. Juergen From a605232115a9c3d3f8103d0833b149ff22956c4b Mon Sep 17 00:00:00 2001 From: Juergen Gross To: linux-ker...@vger.kernel.org Cc: Boris Ostrovsky Cc: Juergen Gross Cc: Stefano Stabellini Cc: xen-devel@lists.xenproject.org Date: Tue, 5 Apr 2022 12:43:41 +0200 Subject: [PATCH] xen/balloon: fix page onlining when populating new zone MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit When onlining a new memory page in a guest the Xen balloon driver is adding it to the ballooned pages instead making it available to be used immediately. This is meant to enable to add a new upper memory limit to a guest via hotplugging memory, without having to assign the new memory in one go. In case the upper memory limit will be raised above 4G, the new memory will populate the ZONE_NORMAL memory zone, which wasn't populated before. 
The newly populated zone won't be added to the list of zones looked at by the page allocator though, as only zones with available memory are being added, and the memory isn't yet available as it is ballooned out. This will result in the new memory being assigned to the guest, but without the allocator being able to use it. When running as a PV guest the situation is even worse: when having been started with less memory than allowed, and the upper limit being lower than 4G, ballooning up will have the same effect as hotplugging new memory. This is due to the usage of the zone device functionality since commit 9e2369c06c8a ("xen: add helpers to allocate unpopulated memory") for creating mappings of other guest's pages, which as a side effect is being used for PV guest ballooning, too. Fix this by checking in xen_online_page() whether the new memory page will be the first in a new zone. If this is the case, add another page to the balloon and use the first memory page of the new chunk as a replacement for this now ballooned out page. This will result in the newly populated zone containing one page being available for the page allocator, which in turn will lead to the zone being added to the allocator. 
Cc: sta...@vger.kernel.org Fixes: 9e2369c06c8a ("xen: add helpers to allocate unpopulated memory") Reported-by: Marek Marczykowski-Górecki Signed-off-by: Juergen Gross --- drivers/xen/balloon.c | 72 ++- 1 file changed, 65 insertions(+), 7 deletions(-) diff --git a/drivers/xen/balloon.c b/drivers/xen/balloon.c index dfe26fa17e95..f895c54c4c65 100644 --- a/drivers/xen/balloon.c +++ b/drivers/xen/balloon.c @@ -355,14 +355,77 @@ static enum bp_state reserve_additional_memory(void) return BP_ECANCELED; } +static struct page *alloc_page_for_balloon(gfp_t gfp) +{ + struct page *page; + + page = alloc_page(gfp); + if (page == NULL) + return NULL; + + adjust_managed_page_count(page, -1); + xenmem_reservation_scrub_page(page); + + return page; +} + +static void add_page_to_balloon(struct page *page) +{ + xenmem_reservation_va_mapping_reset(1, ); + balloon_append(page); +} + static void xen_online_page(struct page *page, unsigned int order) { unsigned long i, size = (1 << order); unsigned long start_pfn = page_to_pfn(page); struct page *p; + struct zone *zone; pr_debug("Online %lu pages starting at pfn 0x%lx\n", size, start_pfn); mutex_lock(_mutex); + zone = page_zone(pfn_to_page(start_pfn)); + + /* + * In case a new memory zone is going to be populated, we need to + * ensure at least one page is made available for the memory allocator. + * As the number of pages per zone is updated only after a batch of + * pages having been added, use the number of managed pages as an + * additional indicator for a new zone. + * Otherwise this zone won't be added to the zonelist resulting in the + * zone's memory not usable by the kernel. + * Add an already valid page to the balloon and replace it with the + * first page of the to be added new
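Editorial illustration (not part of the thread): the commit message above describes swapping one already valid page into the balloon so that the first page of a newly populated zone can go to the allocator. The toy model below only tracks page counts to show that bookkeeping. It is not kernel code; every type and function name is hypothetical, and the real patch operates on `struct page` and `struct zone`.

```c
/* Toy model only: counters stand in for the kernel's zone and balloon state. */
struct toy_zone { unsigned long managed_pages; };
struct toy_balloon { unsigned long pages; };

/*
 * Online `count` hotplugged pages belonging to `target`.
 * Default behaviour: every new page goes straight into the balloon.
 * Exception sketched from the commit message: if `target` has no managed
 * pages yet, balloon out one page from `donor` instead and hand the new
 * page to the allocator, so that `target` is no longer empty.
 */
static void toy_online_pages(struct toy_zone *donor, struct toy_zone *target,
                             struct toy_balloon *balloon, unsigned long count)
{
    unsigned long i;

    for (i = 0; i < count; i++) {
        if (target->managed_pages == 0 && donor->managed_pages > 0) {
            donor->managed_pages--;   /* an already valid page leaves the allocator */
            balloon->pages++;         /* ...and is ballooned out in its place */
            target->managed_pages++;  /* the new page becomes usable immediately */
        } else {
            balloon->pages++;         /* usual case: new page starts ballooned out */
        }
    }
}
```

In this model the balloon grows by exactly `count` pages either way, so the guest's usable memory is unchanged; only the zone that the usable page lives in differs.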
Re: [PATCH 1/2] tools/firmware: fix setting of fcf-protection=none
On 05/04/2022 11:18, Jan Beulich wrote: > On 01.04.2022 17:05, Andrew Cooper wrote: >> On 01/04/2022 15:48, Andrew Cooper wrote: >>> On 01/04/2022 15:37, Roger Pau Monne wrote: Setting the fcf-protection=none option in EMBEDDED_EXTRA_CFLAGS in the Makefile doesn't get it propagated to the subdirectories, so instead set the flag in firmware/Rules.mk, like it's done for other compiler flags. Fixes: 3667f7f8f7 ('x86: Introduce support for CET-IBT') Signed-off-by: Roger Pau Monné >>> Acked-by: Andrew Cooper >> This also needs backporting with the XSA-398 CET-IBT fixes. > I don't think so - the backports of the original commit didn't include > what this patch fixes. I have queued patch 2 of this series though. In which case I screwed up the backport. (I remember spotting this bug and thought I'd corrected it, but clearly not.) tools/firmware really does need to be -fcf-protection=none to counteract the defaults in Ubuntu/etc. ~Andrew
Re: [PATCH v4 2/8] x86/boot: obtain video info from boot loader
On 05.04.2022 11:35, Roger Pau Monné wrote: > On Thu, Mar 31, 2022 at 11:45:02AM +0200, Jan Beulich wrote: >> --- a/xen/arch/x86/boot/head.S >> +++ b/xen/arch/x86/boot/head.S >> @@ -562,12 +562,18 @@ trampoline_setup: >> mov %esi, sym_esi(xen_phys_start) >> mov %esi, sym_esi(trampoline_xen_phys_start) >> >> -mov sym_esi(trampoline_phys), %ecx >> - >> /* Get bottom-most low-memory stack address. */ >> +mov sym_esi(trampoline_phys), %ecx >> add $TRAMPOLINE_SPACE,%ecx > > Just for my understanding, since you are already touching the > instruction, why not switch it to a lea like you do below? > > Is that because you would also like to take the opportunity to fold > the add into the lea and that would be too much of a change? No. This MOV cannot be converted, as its source operand isn't an immediate (or register); such a conversion would also be undesirable, for increasing insn size. See the later patch doing conversions in the other direction, to reduce code size. Somewhat similarly ... >> +#ifdef CONFIG_VIDEO >> +lea sym_esi(boot_vid_info), %edx ... this LEA also cannot be expressed by a single MOV. >> @@ -32,6 +33,39 @@ asm ( >> #include "../../../include/xen/kconfig.h" >> #include >> >> +#ifdef CONFIG_VIDEO >> +# include "video.h" >> + >> +/* VESA control information */ >> +struct __packed vesa_ctrl_info { >> +uint8_t signature[4]; >> +uint16_t version; >> +uint32_t oem_name; >> +uint32_t capabilities; >> +uint32_t mode_list; >> +uint16_t mem_size; >> +/* We don't use any further fields. */ >> +}; >> + >> +/* VESA 2.0 mode information */ >> +struct vesa_mode_info { > > Should we add __packed here just in case further added fields are no > longer naturally aligned? (AFAICT all field right now are aligned to > it's size so there's no need for it). I think we should avoid __packed whenever possible. >> +uint16_t attrib; >> +uint8_t window[14]; /* We don't use the individual fields. 
*/ >> +uint16_t bytes_per_line; >> +uint16_t width; >> +uint16_t height; >> +uint8_t cell_width; >> +uint8_t cell_height; >> +uint8_t nr_planes; >> +uint8_t depth; >> +uint8_t memory[5]; /* We don't use the individual fields. */ >> +struct boot_video_colors colors; >> +uint8_t direct_color; >> +uint32_t base; >> +/* We don't use any further fields. */ >> +}; > > Would it make sense to put those struct definitions in boot/video.h > like you do for boot_video_info? Personally I prefer to expose things in headers only when multiple other files want to consume what is being declared/defined. >> @@ -254,17 +291,64 @@ static multiboot_info_t *mbi2_reloc(u32 >> ++mod_idx; >> break; >> >> +#ifdef CONFIG_VIDEO >> +case MULTIBOOT2_TAG_TYPE_VBE: >> +if ( video_out ) >> +{ >> +const struct vesa_ctrl_info *ci; >> +const struct vesa_mode_info *mi; >> + >> +video = _p(video_out); >> +ci = (void *)get_mb2_data(tag, vbe, vbe_control_info); >> +mi = (void *)get_mb2_data(tag, vbe, vbe_mode_info); >> + >> +if ( ci->version >= 0x0200 && (mi->attrib & 0x9b) == 0x9b ) >> +{ >> +video->capabilities = ci->capabilities; >> +video->lfb_linelength = mi->bytes_per_line; >> +video->lfb_width = mi->width; >> +video->lfb_height = mi->height; >> +video->lfb_depth = mi->depth; >> +video->lfb_base = mi->base; >> +video->lfb_size = ci->mem_size; >> +video->colors = mi->colors; >> +video->vesa_attrib = mi->attrib; >> +} >> + >> +video->vesapm.seg = get_mb2_data(tag, vbe, >> vbe_interface_seg); >> +video->vesapm.off = get_mb2_data(tag, vbe, >> vbe_interface_off); >> +} >> +break; >> + >> +case MULTIBOOT2_TAG_TYPE_FRAMEBUFFER: >> +if ( (get_mb2_data(tag, framebuffer, framebuffer_type) != >> + MULTIBOOT2_FRAMEBUFFER_TYPE_RGB) ) >> +{ >> +video_out = 0; >> +video = NULL; >> +} > > I'm confused, don't you need to store the information in the > framebuffer tag for use after relocation? If there was a consumer - yes. 
Right now this tag is used only to invalidate the information taken from the other tag (or to suppress taking values from there if that other tag came later) in case the framebuffer type doesn't match what we support. >> +break; >> +#endif /* CONFIG_VIDEO */ >> + >> case MULTIBOOT2_TAG_TYPE_END: >> -return mbi_out; >> +goto end; /* Cannot "break;" here. */ >> >> default: >> break; >> } >>
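Editorial illustration (not part of the thread): the `__packed` discussion above comes down to whether natural alignment already yields the firmware-defined offsets. The sketch below checks this at compile time for both quoted structures. It assumes that `struct boot_video_colors` occupies eight bytes with byte alignment (its definition is not shown in the thread) and a conventional ABI where `uint32_t` is 4-byte aligned.

```c
#include <stddef.h>
#include <stdint.h>

/* Assumption: eight one-byte fields; the real definition is not quoted here. */
struct boot_video_colors {
    uint8_t bytes[8];
};

/* Needs packing: oem_name sits at offset 6, which is not 4-byte aligned. */
struct __attribute__((packed)) vesa_ctrl_info {
    uint8_t signature[4];
    uint16_t version;
    uint32_t oem_name;
    uint32_t capabilities;
    uint32_t mode_list;
    uint16_t mem_size;
};

/* No packing: every field already lands on a multiple of its own size. */
struct vesa_mode_info {
    uint16_t attrib;
    uint8_t window[14];
    uint16_t bytes_per_line;
    uint16_t width;
    uint16_t height;
    uint8_t cell_width;
    uint8_t cell_height;
    uint8_t nr_planes;
    uint8_t depth;
    uint8_t memory[5];
    struct boot_video_colors colors;
    uint8_t direct_color;
    uint32_t base;
};

_Static_assert(offsetof(struct vesa_ctrl_info, oem_name) == 6,
               "packed layout keeps oem_name right after version");
_Static_assert(sizeof(struct vesa_ctrl_info) == 20,
               "no padding inside or after the packed struct");
_Static_assert(offsetof(struct vesa_mode_info, colors) == 31,
               "byte-aligned member needs no padding");
_Static_assert(offsetof(struct vesa_mode_info, base) == 40,
               "base is naturally 4-byte aligned without __packed");
```

Assertions like these are one way to address the reviewer's concern about later fields without packing the second structure: a newly added misaligned field would fail the build instead of silently shifting offsets.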
Re: [PATCH v4 5/8] x86/boot: fold branches in video handling code
On Thu, Mar 31, 2022 at 11:49:24AM +0200, Jan Beulich wrote: > Using Jcc to branch around a JMP is necessary only in pre-386 code, > where Jcc is limited to disp8. Use the opposite Jcc directly in two > places. Since it's adjacent, also convert an ORB to TESTB. > > Signed-off-by: Jan Beulich Reviewed-by: Roger Pau Monné Thanks, Roger.
Re: [PATCH v4 4/8] x86/boot: simplify mode_table
On Thu, Mar 31, 2022 at 11:48:51AM +0200, Jan Beulich wrote: > There's no point in writing 80x25 text mode information via multiple > insns all storing immediate values. The data can simply be included > first thing in the vga_modes table, allowing the already present > REP MOVSB to take care of everything in one go. > > While touching this also correct a related but stale comment. > > Signed-off-by: Jan Beulich Reviewed-by: Roger Pau Monné Thanks, Roger.
[seabios test] 169167: tolerable FAIL - PUSHED
flight 169167 seabios real [real]
http://logs.test-lab.xenproject.org/osstest/logs/169167/

Failures :-/ but no regressions.

Tests which did not succeed, but are not blocking:
 test-amd64-amd64-xl-qemuu-win7-amd64 19 guest-stop fail like 168315
 test-amd64-amd64-qemuu-nested-amd 20 debian-hvm-install/l1/l2 fail like 168315
 test-amd64-amd64-xl-qemuu-ws16-amd64 19 guest-stop fail like 168315
 test-amd64-i386-xl-qemuu-win7-amd64 19 guest-stop fail like 168315
 test-amd64-i386-xl-qemuu-ws16-amd64 19 guest-stop fail like 168315
 test-amd64-i386-libvirt-qemuu-debianhvm-amd64-xsm 13 migrate-support-check fail never pass
 test-amd64-amd64-libvirt-qemuu-debianhvm-amd64-xsm 13 migrate-support-check fail never pass

version targeted for testing:
 seabios 01774004c7f7fdc9c1e8f1715f70d3b913f8d491
baseline version:
 seabios d239552ce7220e448ae81f41515138f7b9e3c4db

Last test of basis 168315 2022-03-02 02:40:13 Z 34 days
Testing same since 169167 2022-04-04 21:41:47 Z 0 days 1 attempts

People who touched revisions under test:
 Volker Rümelin

jobs:
 build-amd64-xsm pass
 build-i386-xsm pass
 build-amd64 pass
 build-i386 pass
 build-amd64-libvirt pass
 build-i386-libvirt pass
 build-amd64-pvops pass
 build-i386-pvops pass
 test-amd64-amd64-libvirt-qemuu-debianhvm-amd64-xsm pass
 test-amd64-i386-libvirt-qemuu-debianhvm-amd64-xsm pass
 test-amd64-amd64-xl-qemuu-debianhvm-i386-xsm pass
 test-amd64-i386-xl-qemuu-debianhvm-i386-xsm pass
 test-amd64-amd64-qemuu-nested-amd fail
 test-amd64-i386-qemuu-rhel6hvm-amd pass
 test-amd64-amd64-xl-qemuu-debianhvm-amd64 pass
 test-amd64-i386-xl-qemuu-debianhvm-amd64 pass
 test-amd64-amd64-qemuu-freebsd11-amd64 pass
 test-amd64-amd64-qemuu-freebsd12-amd64 pass
 test-amd64-amd64-xl-qemuu-win7-amd64 fail
 test-amd64-i386-xl-qemuu-win7-amd64 fail
 test-amd64-amd64-xl-qemuu-ws16-amd64 fail
 test-amd64-i386-xl-qemuu-ws16-amd64 fail
 test-amd64-amd64-xl-qemuu-dmrestrict-amd64-dmrestrict pass
 test-amd64-i386-xl-qemuu-dmrestrict-amd64-dmrestrict pass
 test-amd64-amd64-qemuu-nested-intel pass
 test-amd64-i386-qemuu-rhel6hvm-intel pass
 test-amd64-amd64-xl-qemuu-debianhvm-amd64-shadow pass
 test-amd64-i386-xl-qemuu-debianhvm-amd64-shadow pass

sg-report-flight on osstest.test-lab.xenproject.org
logs: /home/logs/logs
images: /home/logs/images

Logs, config files, etc. are available at
 http://logs.test-lab.xenproject.org/osstest/logs

Explanation of these reports, and of osstest in general, is at
 http://xenbits.xen.org/gitweb/?p=osstest.git;a=blob;f=README.email;hb=master
 http://xenbits.xen.org/gitweb/?p=osstest.git;a=blob;f=README;hb=master

Test harness code can be found at
 http://xenbits.xen.org/gitweb?p=osstest.git;a=summary

Pushing revision :

To xenbits.xen.org:/home/xen/git/osstest/seabios.git
 d239552..0177400 01774004c7f7fdc9c1e8f1715f70d3b913f8d491 -> xen-tested-master
Re: [PATCH v4 3/8] x86/EFI: retrieve EDID
On Thu, Mar 31, 2022 at 11:45:36AM +0200, Jan Beulich wrote: > When booting directly from EFI, obtaining this information from EFI is > the only possible way. And even when booting with a boot loader > interposed, it's more clean not to use legacy BIOS calls for this > purpose. (The downside being that there are no "capabilities" that we > can retrieve the EFI way.) > > To achieve this we need to propagate the handle used to obtain the > EFI_GRAPHICS_OUTPUT_PROTOCOL instance for further obtaining an > EFI_EDID_*_PROTOCOL instance, which has been part of the spec since 2.5. > > Signed-off-by: Jan Beulich > --- > Setting boot_edid_caps to zero isn't desirable, but arbitrarily setting > one or both of the two low bits also doesn't seem appropriate. > > GrUB also checks an "agp-internal-edid" variable. As I haven't been able > to find any related documentation, and as GrUB being happy about the > variable being any size (rather than at least / precisely 128 bytes), > I didn't follow that route. > --- > v3: Re-base. > v2: New. > > --- a/xen/arch/arm/efi/efi-boot.h > +++ b/xen/arch/arm/efi/efi-boot.h > @@ -464,6 +464,10 @@ static void __init efi_arch_edd(void) > { > } > > +static void __init efi_arch_edid(EFI_HANDLE gop_handle) > +{ > +} > + > static void __init efi_arch_memory_setup(void) > { > } > --- a/xen/arch/x86/boot/video.S > +++ b/xen/arch/x86/boot/video.S > @@ -922,7 +922,14 @@ store_edid: > pushw %dx > pushw %di > > -cmpb$1, bootsym(opt_edid) # EDID disabled on cmdline (edid=no)? > +movbbootsym(opt_edid), %al > +cmpw$0x1313, bootsym(boot_edid_caps) # Data already retrieved? > +je .Lcheck_edid > +cmpb$2, %al # EDID forced on cmdline > (edid=force)? > +jne .Lno_edid > + > +.Lcheck_edid: > +cmpb$1, %al # EDID disabled on cmdline (edid=no)? 
> je .Lno_edid > > leaw vesa_glob_info, %di > --- a/xen/arch/x86/efi/efi-boot.h > +++ b/xen/arch/x86/efi/efi-boot.h > @@ -568,6 +568,49 @@ static void __init efi_arch_video_init(E > #endif > } > > +#ifdef CONFIG_VIDEO > +static bool __init copy_edid(const void *buf, unsigned int size) > +{ > +/* > + * Be conservative - for both undersized and oversized blobs it is > unclear > + * what to actually do with them. The more that unlike the VESA BIOS > + * interface we also have no associated "capabilities" value (which might > + * carry a hint as to possible interpretation). > + */ > +if ( size != ARRAY_SIZE(boot_edid_info) ) > +return false; > + > +memcpy(boot_edid_info, buf, size); > +boot_edid_caps = 0; > + > +return true; > +} > +#endif > + > +static void __init efi_arch_edid(EFI_HANDLE gop_handle) > +{ > +#ifdef CONFIG_VIDEO > +static EFI_GUID __initdata active_guid = EFI_EDID_ACTIVE_PROTOCOL_GUID; > +static EFI_GUID __initdata discovered_guid = > EFI_EDID_DISCOVERED_PROTOCOL_GUID; Is there a need to make those static? I think this function is either called from efi_start or efi_multiboot, but there aren't multiple calls to it? (Also, both parameters are IN only, so not to be changed by the EFI method?) I have the feeling setting them to static is done because they can't be set to const? > +EFI_EDID_ACTIVE_PROTOCOL *active_edid; > +EFI_EDID_DISCOVERED_PROTOCOL *discovered_edid; > +EFI_STATUS status; > + > +status = efi_bs->OpenProtocol(gop_handle, &active_guid, > + (void **)&active_edid, efi_ih, NULL, > + EFI_OPEN_PROTOCOL_GET_PROTOCOL); > +if ( status == EFI_SUCCESS && > + copy_edid(active_edid->Edid, active_edid->SizeOfEdid) ) > +return; Isn't it enough to just call EFI_EDID_ACTIVE_PROTOCOL_GUID? From my reading of the UEFI spec this will either return EFI_EDID_OVERRIDE_PROTOCOL_GUID or EFI_EDID_DISCOVERED_PROTOCOL_GUID.
 If EFI_EDID_OVERRIDE_PROTOCOL is set it must be used, and hence falling back to EFI_EDID_DISCOVERED_PROTOCOL_GUID if EFI_EDID_ACTIVE_PROTOCOL_GUID cannot be parsed would likely mean ignoring EFI_EDID_OVERRIDE_PROTOCOL? > +status = efi_bs->OpenProtocol(gop_handle, &discovered_guid, > + (void **)&discovered_edid, efi_ih, NULL, > + EFI_OPEN_PROTOCOL_GET_PROTOCOL); > +if ( status == EFI_SUCCESS ) > +copy_edid(discovered_edid->Edid, discovered_edid->SizeOfEdid); > +#endif > +} > + > static void __init efi_arch_memory_setup(void) > { > unsigned int i; > @@ -729,6 +772,7 @@ static void __init efi_arch_flush_dcache > void __init efi_multiboot2(EFI_HANDLE ImageHandle, EFI_SYSTEM_TABLE > *SystemTable) > { > EFI_GRAPHICS_OUTPUT_PROTOCOL *gop; > +EFI_HANDLE gop_handle; > UINTN cols, gop_mode = ~0, rows; > > __set_bit(EFI_BOOT, _flags); > @@ -742,11 +786,15 @@ void __init efi_multiboot2(EFI_HANDLE Im > , ) ==
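Editorial illustration (not part of the thread): the quoted `copy_edid()` accepts a blob only when its size matches the destination buffer exactly, and the caller tries the "active" EDID before the "discovered" one. The sketch below reproduces that control flow without any EFI types. The 128-byte buffer size, the `0x1313` "nothing retrieved yet" sentinel (inferred from the quoted assembly), and the helper `pick_edid()` are assumptions made for this example.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define EDID_BLOCK_SIZE 128 /* assumption: one base EDID block */

static uint8_t boot_edid_info[EDID_BLOCK_SIZE];
static uint16_t boot_edid_caps = 0x1313; /* assumed "not retrieved" sentinel */

/* Conservative copy: reject undersized and oversized blobs alike. */
static bool copy_edid(const void *buf, unsigned int size)
{
    if (size != sizeof(boot_edid_info))
        return false;

    memcpy(boot_edid_info, buf, size);
    boot_edid_caps = 0; /* EFI offers no "capabilities" value to store */

    return true;
}

/* Hypothetical helper: prefer the active EDID, fall back to the discovered one. */
static bool pick_edid(const void *active, unsigned int active_size,
                      const void *discovered, unsigned int discovered_size)
{
    if (active != NULL && copy_edid(active, active_size))
        return true;

    return discovered != NULL && copy_edid(discovered, discovered_size);
}
```

The fallback in `pick_edid()` is exactly the behaviour the review questions: if the active blob is rejected for its size, the discovered blob is used, which could bypass an override that the active protocol was reporting.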
Re: [PATCH 1/2] tools/firmware: fix setting of fcf-protection=none
On 01.04.2022 17:05, Andrew Cooper wrote: > On 01/04/2022 15:48, Andrew Cooper wrote: >> On 01/04/2022 15:37, Roger Pau Monne wrote: >>> Setting the fcf-protection=none option in EMBEDDED_EXTRA_CFLAGS in the >>> Makefile doesn't get it propagated to the subdirectories, so instead >>> set the flag in firmware/Rules.mk, like it's done for other compiler >>> flags. >>> >>> Fixes: 3667f7f8f7 ('x86: Introduce support for CET-IBT') >>> Signed-off-by: Roger Pau Monné >> Acked-by: Andrew Cooper > > This also needs backporting with the XSA-398 CET-IBT fixes. I don't think so - the backports of the original commit didn't include what this patch fixes. I have queued patch 2 of this series though. Jan
Re: [PATCH v4 4/9] xen: export evtchn_alloc_unbound
On 01.04.2022 02:38, Stefano Stabellini wrote: > From: Stefano Stabellini > > It will be used during dom0less domains construction. > > Signed-off-by: Stefano Stabellini I think this better wouldn't be a patch of its own. Functions should be non-static only when they have a user outside of their defining TU. > --- a/xen/include/xen/event.h > +++ b/xen/include/xen/event.h > @@ -71,6 +71,9 @@ void evtchn_free(struct domain *d, struct evtchn *chn); > /* Allocate a specific event channel port. */ > int evtchn_allocate_port(struct domain *d, unsigned int port); > > +/* Allocate a new event channel */ > +int evtchn_alloc_unbound(evtchn_alloc_unbound_t *alloc); I wonder whether while exposing it the function should also become __must_check. Jan
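Editorial illustration (not part of the thread): `__must_check` in Xen and Linux expands to a compiler attribute that warns when a caller drops the return value. The sketch below shows the effect on a stand-in function; `struct alloc_req` and `alloc_unbound()` are hypothetical placeholders and do not reflect the real `evtchn_alloc_unbound()` implementation.

```c
#include <errno.h>
#include <stddef.h>

/* Same expansion the GCC/Clang attribute provides; the macro name follows Xen usage. */
#define __must_check __attribute__((__warn_unused_result__))

/* Hypothetical stand-in for evtchn_alloc_unbound_t. */
struct alloc_req {
    int port;
};

/* Callers that ignore the result get a compile-time warning. */
static int __must_check alloc_unbound(struct alloc_req *req)
{
    if (req == NULL)
        return -EINVAL;

    req->port = 1; /* pretend the first free port was allocated */
    return 0;
}
```

A bare call such as `alloc_unbound(&req);` would then trigger `-Wunused-result`, while `rc = alloc_unbound(&req);` compiles cleanly, which is the safety net being suggested for a function that becomes callable from other translation units.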
Re: [PATCH v4 2/8] x86/boot: obtain video info from boot loader
On Thu, Mar 31, 2022 at 11:45:02AM +0200, Jan Beulich wrote: > With MB2 the boot loader may provide this information, allowing us to > obtain it without needing to enter real mode (assuming we don't need to > set a new mode from "vga=", but can instead inherit the one the > bootloader may have established). > > Signed-off-by: Jan Beulich > --- > v4: Re-base. > v3: Re-base. > v2: New. > > --- a/xen/arch/x86/boot/defs.h > +++ b/xen/arch/x86/boot/defs.h > @@ -53,6 +53,7 @@ typedef unsigned int u32; > typedef unsigned long long u64; > typedef unsigned int size_t; > typedef u8 uint8_t; > +typedef u16 uint16_t; > typedef u32 uint32_t; > typedef u64 uint64_t; > > --- a/xen/arch/x86/boot/head.S > +++ b/xen/arch/x86/boot/head.S > @@ -562,12 +562,18 @@ trampoline_setup: > mov %esi, sym_esi(xen_phys_start) > mov %esi, sym_esi(trampoline_xen_phys_start) > > -mov sym_esi(trampoline_phys), %ecx > - > /* Get bottom-most low-memory stack address. */ > +mov sym_esi(trampoline_phys), %ecx > add $TRAMPOLINE_SPACE,%ecx Just for my understanding, since you are already touching the instruction, why not switch it to a lea like you do below? Is that because you would also like to take the opportunity to fold the add into the lea and that would be too much of a change? > > +#ifdef CONFIG_VIDEO > +lea sym_esi(boot_vid_info), %edx > +#else > +xor %edx, %edx > +#endif > + > /* Save Multiboot / PVH info struct (after relocation) for later > use. */ > +push%edx/* Boot video info to be filled from > MB2. */ > push%ecx/* Bottom-most low-memory stack address. > */ > push%ebx/* Multiboot / PVH information address. > */ > push%eax/* Magic number. */ > --- a/xen/arch/x86/boot/reloc.c > +++ b/xen/arch/x86/boot/reloc.c > @@ -14,9 +14,10 @@ > > /* > * This entry point is entered from xen/arch/x86/boot/head.S with: > - * - 0x4(%esp) = MAGIC, > - * - 0x8(%esp) = INFORMATION_ADDRESS, > - * - 0xc(%esp) = TOPMOST_LOW_MEMORY_STACK_ADDRESS. 
> + * - 0x04(%esp) = MAGIC, > + * - 0x08(%esp) = INFORMATION_ADDRESS, > + * - 0x0c(%esp) = TOPMOST_LOW_MEMORY_STACK_ADDRESS. > + * - 0x10(%esp) = BOOT_VIDEO_INFO_ADDRESS. > */ > asm ( > ".text \n" > @@ -32,6 +33,39 @@ asm ( > #include "../../../include/xen/kconfig.h" > #include > > +#ifdef CONFIG_VIDEO > +# include "video.h" > + > +/* VESA control information */ > +struct __packed vesa_ctrl_info { > +uint8_t signature[4]; > +uint16_t version; > +uint32_t oem_name; > +uint32_t capabilities; > +uint32_t mode_list; > +uint16_t mem_size; > +/* We don't use any further fields. */ > +}; > + > +/* VESA 2.0 mode information */ > +struct vesa_mode_info { Should we add __packed here just in case further added fields are no longer naturally aligned? (AFAICT all field right now are aligned to it's size so there's no need for it). > +uint16_t attrib; > +uint8_t window[14]; /* We don't use the individual fields. */ > +uint16_t bytes_per_line; > +uint16_t width; > +uint16_t height; > +uint8_t cell_width; > +uint8_t cell_height; > +uint8_t nr_planes; > +uint8_t depth; > +uint8_t memory[5]; /* We don't use the individual fields. */ > +struct boot_video_colors colors; > +uint8_t direct_color; > +uint32_t base; > +/* We don't use any further fields. */ > +}; Would it make sense to put those struct definitions in boot/video.h like you do for boot_video_info? I also wonder whether you could then hide the #ifdef CONFIG_VIDEO check inside of the header itself. 
> +#endif /* CONFIG_VIDEO */ > + > #define get_mb2_data(tag, type, member) (((multiboot2_tag_##type##_t > *)(tag))->member) > #define get_mb2_string(tag, type, member) ((u32)get_mb2_data(tag, type, > member)) > > @@ -146,7 +180,7 @@ static multiboot_info_t *mbi_reloc(u32 m > return mbi_out; > } > > -static multiboot_info_t *mbi2_reloc(u32 mbi_in) > +static multiboot_info_t *mbi2_reloc(uint32_t mbi_in, uint32_t video_out) > { > const multiboot2_fixed_t *mbi_fix = _p(mbi_in); > const multiboot2_memory_map_t *mmap_src; > @@ -154,6 +188,9 @@ static multiboot_info_t *mbi2_reloc(u32 > module_t *mbi_out_mods = NULL; > memory_map_t *mmap_dst; > multiboot_info_t *mbi_out; > +#ifdef CONFIG_VIDEO > +struct boot_video_info *video = NULL; > +#endif > u32 ptr; > unsigned int i, mod_idx = 0; > > @@ -254,17 +291,64 @@ static multiboot_info_t *mbi2_reloc(u32 > ++mod_idx; > break; > > +#ifdef CONFIG_VIDEO > +case MULTIBOOT2_TAG_TYPE_VBE: > +if ( video_out ) > +{ > +const struct vesa_ctrl_info *ci; > +const struct vesa_mode_info *mi; > + > +video =
Re: [PATCH v4 2/2] xen: Populate xen.lds.h and make use of its macros
On 05.04.2022 11:16, Michal Orzel wrote: > Populate header file xen.lds.h with the first portion of macros storing > constructs common to x86 and arm linker scripts. Replace the original > constructs with these helpers. > > No functional improvements to x86 linker script. > > Making use of common macros improves arm linker script with: > - explicit list of debug sections that otherwise are seen as "orphans" > by the linker. This will allow to fix issues after enabling linker > option --orphan-handling one day, > - extended list of discarded section to include: .discard, destructors > related sections, .fini_array which can reference .text.exit, > - sections not related to debugging that are placed by ld.lld. Even > though we do not support linking with LLD on Arm, these sections do > not cause problem to GNU ld. > > Please note that this patch does not aim to perform the full sync up > between the linker scripts. It creates a base for further work. > > Signed-off-by: Michal Orzel Reviewed-by: Jan Beulich
[PATCH v4 2/2] xen: Populate xen.lds.h and make use of its macros
Populate header file xen.lds.h with the first portion of macros storing constructs common to x86 and arm linker scripts. Replace the original constructs with these helpers. No functional improvements to x86 linker script. Making use of common macros improves arm linker script with: - explicit list of debug sections that otherwise are seen as "orphans" by the linker. This will allow to fix issues after enabling linker option --orphan-handling one day, - extended list of discarded section to include: .discard, destructors related sections, .fini_array which can reference .text.exit, - sections not related to debugging that are placed by ld.lld. Even though we do not support linking with LLD on Arm, these sections do not cause problem to GNU ld. Please note that this patch does not aim to perform the full sync up between the linker scripts. It creates a base for further work. Signed-off-by: Michal Orzel --- Changes since v3: -use POINTER_ALIGN in debug sections when needed -modify comment about ELF_DETAILS_SECTIONS Changes since v2: -refactor commit msg -move constructs together with surrounding ifdefery -list constructs other than *_SECTIONS in alphabetical order -add comment about EFI vs EFI support Changes since v1: -merge x86 and arm changes into single patch -do not propagate issues by generalizing CTORS -extract sections not related to debugging into separate macro -get rid of _SECTION suffix in favor of using more meaningful suffixes --- xen/arch/arm/xen.lds.S| 44 +++-- xen/arch/x86/xen.lds.S| 96 +++- xen/include/xen/xen.lds.h | 129 ++ 3 files changed, 147 insertions(+), 122 deletions(-) diff --git a/xen/arch/arm/xen.lds.S b/xen/arch/arm/xen.lds.S index c666fc3e69..649aa04f7f 100644 --- a/xen/arch/arm/xen.lds.S +++ b/xen/arch/arm/xen.lds.S @@ -68,12 +68,7 @@ SECTIONS *(.proc.info) __proc_info_end = .; -#ifdef CONFIG_HAS_VPCI - . 
= ALIGN(POINTER_ALIGN); - __start_vpci_array = .; - *(SORT(.data.vpci.*)) - __end_vpci_array = .; -#endif + VPCI_ARRAY } :text #if defined(BUILD_ID) @@ -109,12 +104,7 @@ SECTIONS *(.data.schedulers) __end_schedulers_array = .; -#ifdef CONFIG_HYPFS - . = ALIGN(8); - __paramhypfs_start = .; - *(.data.paramhypfs) - __paramhypfs_end = .; -#endif + HYPFS_PARAM *(.data .data.*) CONSTRUCTORS @@ -178,12 +168,7 @@ SECTIONS *(.altinstructions) __alt_instructions_end = .; -#ifdef CONFIG_DEBUG_LOCK_PROFILE - . = ALIGN(POINTER_ALIGN); - __lock_profile_start = .; - *(.lockprofile.data) - __lock_profile_end = .; -#endif + LOCK_PROFILE_DATA *(.init.data) *(.init.data.rel) @@ -222,22 +207,13 @@ SECTIONS /* Section for the device tree blob (if any). */ .dtb : { *(.dtb) } :text - /* Sections to be discarded */ - /DISCARD/ : { - *(.exit.text) - *(.exit.data) - *(.exitcall.exit) - *(.eh_frame) - } - - /* Stabs debugging sections. */ - .stab 0 : { *(.stab) } - .stabstr 0 : { *(.stabstr) } - .stab.excl 0 : { *(.stab.excl) } - .stab.exclstr 0 : { *(.stab.exclstr) } - .stab.index 0 : { *(.stab.index) } - .stab.indexstr 0 : { *(.stab.indexstr) } - .comment 0 : { *(.comment) } + DWARF2_DEBUG_SECTIONS + + DISCARD_SECTIONS + + STABS_DEBUG_SECTIONS + + ELF_DETAILS_SECTIONS } /* diff --git a/xen/arch/x86/xen.lds.S b/xen/arch/x86/xen.lds.S index 3e65c09bb3..65cc4c9231 100644 --- a/xen/arch/x86/xen.lds.S +++ b/xen/arch/x86/xen.lds.S @@ -13,13 +13,6 @@ #undef __XEN_VIRT_START #define __XEN_VIRT_START __image_base__ #define DECL_SECTION(x) x : -/* - * Use the NOLOAD directive, despite currently ignored by (at least) GNU ld - * for PE output, in order to record that we'd prefer these sections to not - * be loaded into memory. 
- */ -#define DECL_DEBUG(x, a) #x ALIGN(a) (NOLOAD) : { *(x) } -#define DECL_DEBUG2(x, y, a) #x ALIGN(a) (NOLOAD) : { *(x) *(y) } ENTRY(efi_start) @@ -27,8 +20,6 @@ ENTRY(efi_start) #define FORMAT "elf64-x86-64" #define DECL_SECTION(x) #x : AT(ADDR(#x) - __XEN_VIRT_START) -#define DECL_DEBUG(x, a) #x 0 : { *(x) } -#define DECL_DEBUG2(x, y, a) #x 0 : { *(x) *(y) } ENTRY(start_pa) @@ -159,12 +150,7 @@ SECTIONS *(.note.gnu.build-id) __note_gnu_build_id_end = .; #endif -#ifdef CONFIG_HAS_VPCI - . = ALIGN(POINTER_ALIGN); - __start_vpci_array = .; - *(SORT(.data.vpci.*)) - __end_vpci_array = .; -#endif + VPCI_ARRAY } PHDR(text) #if defined(CONFIG_PVH_GUEST) && !defined(EFI) @@ -278,12 +264,7 @@ SECTIONS *(.altinstructions) __alt_instructions_end = .; -#ifdef CONFIG_DEBUG_LOCK_PROFILE - . = ALIGN(POINTER_ALIGN); - __lock_profile_start = .; - *(.lockprofile.data) - __lock_profile_end = .; -#endif + LOCK_PROFILE_DATA . = ALIGN(8);
[PATCH v4 1/2] xen: Introduce a header to store common linker scripts content
Both x86 and arm linker scripts share quite a lot of common content. It is difficult to keep syncing them up, thus introduce a new header in include/xen called xen.lds.h to store the internals mutual to all the linker scripts. Include this header in linker scripts for x86 and arm. This patch serves as an intermediate step before populating xen.lds.h and making use of its content in the linker scripts later on. Signed-off-by: Michal Orzel Acked-by: Jan Beulich --- Changes since v2,v3: -none Changes since v1: -rename header to xen.lds.h to be coherent with Linux kernel -include empty header in linker scripts --- xen/arch/arm/xen.lds.S| 1 + xen/arch/x86/xen.lds.S| 1 + xen/include/xen/xen.lds.h | 8 3 files changed, 10 insertions(+) create mode 100644 xen/include/xen/xen.lds.h diff --git a/xen/arch/arm/xen.lds.S b/xen/arch/arm/xen.lds.S index 7921d8fa28..c666fc3e69 100644 --- a/xen/arch/arm/xen.lds.S +++ b/xen/arch/arm/xen.lds.S @@ -3,6 +3,7 @@ /* Modified for ARM Xen by Ian Campbell */ #include +#include #include #undef ENTRY #undef ALIGN diff --git a/xen/arch/x86/xen.lds.S b/xen/arch/x86/xen.lds.S index 3f9f633f55..3e65c09bb3 100644 --- a/xen/arch/x86/xen.lds.S +++ b/xen/arch/x86/xen.lds.S @@ -2,6 +2,7 @@ /* Modified for i386/x86-64 Xen by Keir Fraser */ #include +#include #include #undef ENTRY #undef ALIGN diff --git a/xen/include/xen/xen.lds.h b/xen/include/xen/xen.lds.h new file mode 100644 index 00..dd292fa7dc --- /dev/null +++ b/xen/include/xen/xen.lds.h @@ -0,0 +1,8 @@ +#ifndef __XEN_LDS_H__ +#define __XEN_LDS_H__ + +/* + * Common macros to be used in architecture specific linker scripts. + */ + +#endif /* __XEN_LDS_H__ */ -- 2.25.1
[PATCH v4 0/2] xen: Linker scripts synchronization
This patch series takes the first step towards linker scripts synchronization. Linker scripts for arm and x86 share a lot of common sections, and in order to make the process of changing/improving/syncing them easier, these sections shall be defined in just one place.

The first patch creates an empty header file xen.lds.h to store the constructs mutual to both x86 and arm linker scripts. It also includes this header in the scripts.

The second patch populates xen.lds.h with the first portion of common macros and replaces the original constructs with these helpers.

Michal Orzel (2):
  xen: Introduce a header to store common linker scripts content
  xen: Populate xen.lds.h and make use of its macros

 xen/arch/arm/xen.lds.S    |  45 +++--
 xen/arch/x86/xen.lds.S    |  97 +++
 xen/include/xen/xen.lds.h | 137 ++
 3 files changed, 157 insertions(+), 122 deletions(-)
 create mode 100644 xen/include/xen/xen.lds.h

-- 
2.25.1
Re: [PATCH v3 2/2] xen: Populate xen.lds.h and make use of its macros
Hi Jan, On 05.04.2022 10:49, Jan Beulich wrote: > On 31.03.2022 09:14, Michal Orzel wrote: >> --- a/xen/include/xen/xen.lds.h >> +++ b/xen/include/xen/xen.lds.h >> @@ -5,4 +5,133 @@ >> * Common macros to be used in architecture specific linker scripts. >> */ >> >> +/* >> + * To avoid any confusion, please note that the EFI macro does not >> correspond >> + * to EFI support and is used when linking a native EFI (i.e. PE/COFF) >> binary, >> + * hence its usage in this header. >> + */ >> + >> +/* Macros to declare debug sections. */ >> +#ifdef EFI >> +/* >> + * Use the NOLOAD directive, despite currently ignored by (at least) GNU ld >> + * for PE output, in order to record that we'd prefer these sections to not >> + * be loaded into memory. >> + */ >> +#define DECL_DEBUG(x, a) #x ALIGN(a) (NOLOAD) : { *(x) } >> +#define DECL_DEBUG2(x, y, a) #x ALIGN(a) (NOLOAD) : { *(x) *(y) } >> +#else >> +#define DECL_DEBUG(x, a) #x 0 : { *(x) } >> +#define DECL_DEBUG2(x, y, a) #x 0 : { *(x) *(y) } >> +#endif >> + >> +/* >> + * DWARF2+ debug sections. >> + * Explicitly list debug sections, first of all to avoid these sections >> being >> + * viewed as "orphan" by the linker. >> + * >> + * For the PE output this is further necessary so that they don't end up at >> + * VA 0, which is below image base and thus invalid. Note that this macro is >> + * to be used after _end, so if these sections get loaded they'll be >> discarded >> + * at runtime anyway. 
>> + */ >> +#define DWARF2_DEBUG_SECTIONS \ >> + DECL_DEBUG(.debug_abbrev, 1)\ >> + DECL_DEBUG2(.debug_info, .gnu.linkonce.wi.*, 1) \ >> + DECL_DEBUG(.debug_types, 1) \ >> + DECL_DEBUG(.debug_str, 1) \ >> + DECL_DEBUG2(.debug_line, .debug_line.*, 1) \ >> + DECL_DEBUG(.debug_line_str, 1) \ >> + DECL_DEBUG(.debug_names, 4) \ >> + DECL_DEBUG(.debug_frame, 4) \ >> + DECL_DEBUG(.debug_loc, 1) \ >> + DECL_DEBUG(.debug_loclists, 4) \ >> + DECL_DEBUG(.debug_macinfo, 1) \ >> + DECL_DEBUG(.debug_macro, 1) \ >> + DECL_DEBUG(.debug_ranges, 8)\ > > Here and ... > >> + DECL_DEBUG(.debug_rnglists, 4) \ >> + DECL_DEBUG(.debug_addr, 8) \ > > ... here I think you also want to switch to POINTER_ALIGN. > Ok, you're right. >> + DECL_DEBUG(.debug_aranges, 1) \ >> + DECL_DEBUG(.debug_pubnames, 1) \ >> + DECL_DEBUG(.debug_pubtypes, 1) >> + >> +/* Stabs debug sections. */ >> +#define STABS_DEBUG_SECTIONS \ >> + .stab 0 : { *(.stab) } \ >> + .stabstr 0 : { *(.stabstr) } \ >> + .stab.excl 0 : { *(.stab.excl) } \ >> + .stab.exclstr 0 : { *(.stab.exclstr) } \ >> + .stab.index 0 : { *(.stab.index) } \ >> + .stab.indexstr 0 : { *(.stab.indexstr) } >> + >> +/* >> + * Required sections not related to debugging. > > Nit: Perhaps better "Required ELF sections ..."? Personally I'd also > drop the mentioning of debugging - that's not really relevant here. > I'm also unsure about "Required" - .comment isn't really required. > IOW ideally simply "ELF sections" or "Sections to be retained in ELF > binaries" or some such. > ELF sections is ok for me. > Jan > I will push updated series soon. Cheers, Michal
[PATCH v5 6/6] xen/cpupool: Allow cpupool0 to use different scheduler
Currently cpupool0 can use only the default scheduler, and cpupool_create has a hardcoded behavior when creating the pool 0 that doesn't allocate new memory for the scheduler, but uses the default scheduler structure in memory. With this commit it is possible to allocate a different scheduler for the cpupool0 when using boot time cpupools. To achieve this the hardcoded behavior in cpupool_create is removed and the cpupool0 creation is moved. When compiling without boot time cpupools enabled, the current behavior is maintained (except that cpupool0 scheduler memory will be allocated). Signed-off-by: Luca Fancellu --- Changes in v5: - no changes Changes in v4: - no changes Changes in v3: - fix typo in commit message (Juergen) - rebase changes Changes in v2: - new patch --- xen/common/boot_cpupools.c | 5 - xen/common/sched/cpupool.c | 8 +--- xen/include/xen/sched.h| 5 - 3 files changed, 9 insertions(+), 9 deletions(-) diff --git a/xen/common/boot_cpupools.c b/xen/common/boot_cpupools.c index 1f940330a62d..a56baf3329a9 100644 --- a/xen/common/boot_cpupools.c +++ b/xen/common/boot_cpupools.c @@ -201,8 +201,11 @@ void __init btcpupools_allocate_pools(void) if ( add_extra_cpupool ) next_pool_id++; +/* Keep track of cpupool id 0 with the global cpupool0 */ +cpupool0 = cpupool_create_pool(0, pool_sched_map[0]); + /* Create cpupools with selected schedulers */ -for ( i = 0; i < next_pool_id; i++ ) +for ( i = 1; i < next_pool_id; i++ ) cpupool_create_pool(i, pool_sched_map[i]); } diff --git a/xen/common/sched/cpupool.c b/xen/common/sched/cpupool.c index 86a175f99cd5..83112f5f04d3 100644 --- a/xen/common/sched/cpupool.c +++ b/xen/common/sched/cpupool.c @@ -312,10 +312,7 @@ static struct cpupool *cpupool_create(unsigned int poolid, c->cpupool_id = q->cpupool_id + 1; } -if ( poolid == 0 ) -c->sched = scheduler_get_default(); -else -c->sched = scheduler_alloc(sched_id); +c->sched = scheduler_alloc(sched_id); if ( IS_ERR(c->sched) ) { ret = PTR_ERR(c->sched); @@ -1242,9
+1239,6 @@ static int __init cf_check cpupool_init(void) cpupool_hypfs_init(); -cpupool0 = cpupool_create(0, 0); -BUG_ON(IS_ERR(cpupool0)); -cpupool_put(cpupool0); register_cpu_notifier(_nfb); btcpupools_dtb_parse(); diff --git a/xen/include/xen/sched.h b/xen/include/xen/sched.h index b62315ad5e5d..e8f31758c058 100644 --- a/xen/include/xen/sched.h +++ b/xen/include/xen/sched.h @@ -1185,7 +1185,10 @@ void btcpupools_dtb_parse(void); int btcpupools_get_domain_pool_id(const struct dt_device_node *node); #else /* !CONFIG_BOOT_TIME_CPUPOOLS */ -static inline void btcpupools_allocate_pools(void) {} +static inline void btcpupools_allocate_pools(void) +{ +cpupool0 = cpupool_create_pool(0, -1); +} static inline void btcpupools_dtb_parse(void) {} static inline unsigned int btcpupools_get_cpupool_id(unsigned int cpu) { -- 2.17.1
[PATCH v5 3/6] xen/sched: retrieve scheduler id by name
Add a static function to retrieve the scheduler pointer using the scheduler name. Add a public function to retrieve the scheduler id by the scheduler name that makes use of the new static function. Take the occasion to replace open coded scheduler search with the new static function in scheduler_init. Signed-off-by: Luca Fancellu Reviewed-by: Juergen Gross --- Changes in v5: - no changes Changes in v4: - no changes Changes in v3: - add R-by Changes in v2: - replace open coded scheduler search in scheduler_init (Juergen) --- xen/common/sched/core.c | 40 ++-- xen/include/xen/sched.h | 11 +++ 2 files changed, 37 insertions(+), 14 deletions(-) diff --git a/xen/common/sched/core.c b/xen/common/sched/core.c index 19ab67818106..48ee01420fb8 100644 --- a/xen/common/sched/core.c +++ b/xen/common/sched/core.c @@ -2947,10 +2947,30 @@ void scheduler_enable(void) scheduler_active = true; } +static inline +const struct scheduler *__init sched_get_by_name(const char *sched_name) +{ +unsigned int i; + +for ( i = 0; i < NUM_SCHEDULERS; i++ ) +if ( schedulers[i] && !strcmp(schedulers[i]->opt_name, sched_name) ) +return schedulers[i]; + +return NULL; +} + +int __init sched_get_id_by_name(const char *sched_name) +{ +const struct scheduler *scheduler = sched_get_by_name(sched_name); + +return scheduler ? scheduler->sched_id : -1; +} + /* Initialise the data structures. 
*/ void __init scheduler_init(void) { struct domain *idle_domain; +const struct scheduler *scheduler; int i; scheduler_enable(); @@ -2981,25 +3001,17 @@ void __init scheduler_init(void) schedulers[i]->opt_name); schedulers[i] = NULL; } - -if ( schedulers[i] && !ops.name && - !strcmp(schedulers[i]->opt_name, opt_sched) ) -ops = *schedulers[i]; } -if ( !ops.name ) +scheduler = sched_get_by_name(opt_sched); +if ( !scheduler ) { printk("Could not find scheduler: %s\n", opt_sched); -for ( i = 0; i < NUM_SCHEDULERS; i++ ) -if ( schedulers[i] && - !strcmp(schedulers[i]->opt_name, CONFIG_SCHED_DEFAULT) ) -{ -ops = *schedulers[i]; -break; -} -BUG_ON(!ops.name); -printk("Using '%s' (%s)\n", ops.name, ops.opt_name); +scheduler = sched_get_by_name(CONFIG_SCHED_DEFAULT); +BUG_ON(!scheduler); +printk("Using '%s' (%s)\n", scheduler->name, scheduler->opt_name); } +ops = *scheduler; if ( cpu_schedule_up(0) ) BUG(); diff --git a/xen/include/xen/sched.h b/xen/include/xen/sched.h index b07717987434..b527f141a1d3 100644 --- a/xen/include/xen/sched.h +++ b/xen/include/xen/sched.h @@ -758,6 +758,17 @@ void sched_destroy_domain(struct domain *d); long sched_adjust(struct domain *, struct xen_domctl_scheduler_op *); long sched_adjust_global(struct xen_sysctl_scheduler_op *); int sched_id(void); + +/* + * sched_get_id_by_name - retrieves a scheduler id given a scheduler name + * @sched_name: scheduler name as a string + * + * returns: + * positive value being the scheduler id, on success + * negative value if the scheduler name is not found. + */ +int sched_get_id_by_name(const char *sched_name); + void vcpu_wake(struct vcpu *v); long vcpu_yield(void); void vcpu_sleep_nosync(struct vcpu *v); -- 2.17.1
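The name-to-id lookup introduced by this patch can be tried outside of Xen with a small sketch like the one below. The struct is a cut-down, hypothetical stand-in for Xen's struct scheduler, the table contents are made up, and the id values 1 and 2 are illustrative only (they are not Xen's real scheduler ids). The NULL slot models a scheduler that scheduler_init() dropped after a failed global init.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in: only the two fields the lookup needs. */
struct scheduler {
    const char *opt_name;
    int sched_id;
};

/* Made-up table entries; the ids are illustrative, not Xen's. */
static const struct scheduler sched_credit2 = { "credit2", 1 };
static const struct scheduler sched_null = { "null", 2 };

static const struct scheduler *schedulers[] = {
    &sched_credit2,
    NULL,            /* models an entry cleared by a failed init */
    &sched_null,
};

#define NUM_SCHEDULERS (sizeof(schedulers) / sizeof(schedulers[0]))

/* Linear search by name, skipping cleared slots. */
static const struct scheduler *sched_get_by_name(const char *sched_name)
{
    size_t i;

    assert(sched_name != NULL);

    for ( i = 0; i < NUM_SCHEDULERS; i++ )
        if ( schedulers[i] && !strcmp(schedulers[i]->opt_name, sched_name) )
            return schedulers[i];

    return NULL;
}

/* Public wrapper: the scheduler id on success, -1 if the name is unknown. */
int sched_get_id_by_name(const char *sched_name)
{
    const struct scheduler *scheduler = sched_get_by_name(sched_name);

    return scheduler ? scheduler->sched_id : -1;
}
```

Keeping the pointer-returning helper static and exporting only the id-returning wrapper matches the split described in the commit message: callers outside the scheduler core never see the scheduler structure itself.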
[PATCH v5 5/6] arm/dom0less: assign dom0less guests to cpupools
Introduce domain-cpupool property of a xen,domain device tree node, that specifies the cpupool device tree handle of a xen,cpupool node that identifies a cpupool created at boot time where the guest will be assigned on creation. Add member to the xen_domctl_createdomain public interface so the XEN_DOMCTL_INTERFACE_VERSION version is bumped. Add public function to retrieve a pool id from the device tree cpupool node. Update documentation about the property. Signed-off-by: Luca Fancellu Reviewed-by: Stefano Stabellini --- Changes in v5: - no changes Changes in v4: - no changes - add R-by Changes in v3: - Use explicitely sized integer for struct xen_domctl_createdomain cpupool_id member. (Stefano) - Changed code due to previous commit code changes Changes in v2: - Moved cpupool_id from arch specific to common part (Juergen) - Implemented functions to retrieve the cpupool id from the cpupool dtb node. --- docs/misc/arm/device-tree/booting.txt | 5 + xen/arch/arm/domain_build.c | 14 +- xen/common/boot_cpupools.c| 24 xen/common/domain.c | 2 +- xen/include/public/domctl.h | 4 +++- xen/include/xen/sched.h | 9 + 6 files changed, 55 insertions(+), 3 deletions(-) diff --git a/docs/misc/arm/device-tree/booting.txt b/docs/misc/arm/device-tree/booting.txt index a94125394e35..7b4a29a2c293 100644 --- a/docs/misc/arm/device-tree/booting.txt +++ b/docs/misc/arm/device-tree/booting.txt @@ -188,6 +188,11 @@ with the following properties: An empty property to request the memory of the domain to be direct-map (guest physical address == physical address). +- domain-cpupool + +Optional. Handle to a xen,cpupool device tree node that identifies the +cpupool where the guest will be started at boot. + Under the "xen,domain" compatible node, one or more sub-nodes are present for the DomU kernel and ramdisk. 
diff --git a/xen/arch/arm/domain_build.c b/xen/arch/arm/domain_build.c index 8be01678de05..9c67a483d4a4 100644 --- a/xen/arch/arm/domain_build.c +++ b/xen/arch/arm/domain_build.c @@ -3172,7 +3172,8 @@ static int __init construct_domU(struct domain *d, void __init create_domUs(void) { struct dt_device_node *node; -const struct dt_device_node *chosen = dt_find_node_by_path("/chosen"); +const struct dt_device_node *cpupool_node, +*chosen = dt_find_node_by_path("/chosen"); BUG_ON(chosen == NULL); dt_for_each_child_node(chosen, node) @@ -3241,6 +3242,17 @@ void __init create_domUs(void) vpl011_virq - 32 + 1); } +/* Get the optional property domain-cpupool */ +cpupool_node = dt_parse_phandle(node, "domain-cpupool", 0); +if ( cpupool_node ) +{ +int pool_id = btcpupools_get_domain_pool_id(cpupool_node); +if ( pool_id < 0 ) +panic("Error getting cpupool id from domain-cpupool (%d)\n", + pool_id); +d_cfg.cpupool_id = pool_id; +} + /* * The variable max_init_domid is initialized with zero, so here it's * very important to use the pre-increment operator to call diff --git a/xen/common/boot_cpupools.c b/xen/common/boot_cpupools.c index 97c321386879..1f940330a62d 100644 --- a/xen/common/boot_cpupools.c +++ b/xen/common/boot_cpupools.c @@ -21,6 +21,8 @@ static unsigned int __initdata next_pool_id; #define BTCPUPOOLS_DT_NODE_NO_REG (-1) #define BTCPUPOOLS_DT_NODE_NO_LOG_CPU (-2) +#define BTCPUPOOLS_DT_WRONG_NODE (-3) +#define BTCPUPOOLS_DT_CORRUPTED_NODE (-4) static int __init get_logical_cpu_from_hw_id(unsigned int hwid) { @@ -55,6 +57,28 @@ get_logical_cpu_from_cpu_node(const struct dt_device_node *cpu_node) return cpu_num; } +int __init btcpupools_get_domain_pool_id(const struct dt_device_node *node) +{ +const struct dt_device_node *phandle_node; +int cpu_num; + +if ( !dt_device_is_compatible(node, "xen,cpupool") ) +return BTCPUPOOLS_DT_WRONG_NODE; +/* + * Get first cpu listed in the cpupool, from its reg it's possible to + * retrieve the cpupool id. 
+ */ +phandle_node = dt_parse_phandle(node, "cpupool-cpus", 0); +if ( !phandle_node ) +return BTCPUPOOLS_DT_CORRUPTED_NODE; + +cpu_num = get_logical_cpu_from_cpu_node(phandle_node); +if ( cpu_num < 0 ) +return cpu_num; + +return pool_cpu_map[cpu_num]; +} + static int __init check_and_get_sched_id(const char* scheduler_name) { int sched_id = sched_get_id_by_name(scheduler_name); diff --git a/xen/common/domain.c b/xen/common/domain.c index 351029f8b239..0827400f4f49 100644 --- a/xen/common/domain.c +++ b/xen/common/domain.c @@ -698,7 +698,7 @@ struct domain *domain_create(domid_t domid, if ( !d->pbuf ) goto fail; -if ( (err = sched_init_domain(d, 0)) != 0 ) +
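The control flow of btcpupools_get_domain_pool_id() can be illustrated with the standalone sketch below. It is only a model under stated assumptions: struct fake_pool_node, the sample nodes, and the contents of pool_cpu_map are hypothetical stand-ins for the device tree helpers and boot time state used by the real code; the error values are the ones defined in the patch.

```c
#include <stdbool.h>
#include <stddef.h>

/* Error values as defined in the patch. */
#define BTCPUPOOLS_DT_NODE_NO_REG     (-1)
#define BTCPUPOOLS_DT_NODE_NO_LOG_CPU (-2)
#define BTCPUPOOLS_DT_WRONG_NODE      (-3)
#define BTCPUPOOLS_DT_CORRUPTED_NODE  (-4)

/* Hypothetical, heavily simplified model of a cpupool device tree node. */
struct fake_pool_node {
    bool is_cpupool;      /* models the "xen,cpupool" compatible check */
    const int *first_cpu; /* models the first cpupool-cpus phandle:
                             NULL if missing, otherwise points at the
                             logical cpu number (or a negative error) */
};

/* Made-up mapping: logical cpus 0-1 in pool 0, cpus 2-3 in pool 1. */
static const int pool_cpu_map[] = { 0, 0, 1, 1 };

/* Sample nodes for experimentation (hypothetical). */
static const int sample_cpu = 2;
static const struct fake_pool_node sample_good = { true, &sample_cpu };
static const struct fake_pool_node sample_wrong = { false, &sample_cpu };
static const struct fake_pool_node sample_no_cpus = { true, NULL };

int get_domain_pool_id(const struct fake_pool_node *node)
{
    int cpu_num;

    if ( !node->is_cpupool )
        return BTCPUPOOLS_DT_WRONG_NODE;

    /* The first cpu of the pool is enough to find the pool id. */
    if ( !node->first_cpu )
        return BTCPUPOOLS_DT_CORRUPTED_NODE;

    cpu_num = *node->first_cpu;
    if ( cpu_num < 0 )
        return cpu_num; /* propagate NO_REG / NO_LOG_CPU */

    /* Assumption: cpu_num is within the bounds of pool_cpu_map. */
    return pool_cpu_map[cpu_num];
}
```

Because every failure is a distinct negative value, the caller in create_domUs() can treat any result below zero as fatal and still print the specific code in its panic message.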
[PATCH v5 1/6] tools/cpupools: Give a name to unnamed cpupools
With the introduction of boot time cpupools, Xen can create many different cpupools at boot time other than cpupool with id 0. Since these newly created cpupools can't have an entry in Xenstore, create the entry using xen-init-dom0 helper with the usual convention: Pool-<id>. Given the change, remove the check for poolid == 0 from libxl_cpupoolid_to_name(...). Signed-off-by: Luca Fancellu Reviewed-by: Juergen Gross --- Changes in v5: - no changes Changes in v4: - no changes Changes in v3: - no changes, add R-by Changes in v2: - Remove unused variable, moved xc_cpupool_infofree ahead to simplify the code, use asprintf (Juergen) --- tools/helpers/xen-init-dom0.c | 35 +- tools/libs/light/libxl_utils.c | 3 +-- 2 files changed, 35 insertions(+), 3 deletions(-) diff --git a/tools/helpers/xen-init-dom0.c b/tools/helpers/xen-init-dom0.c index c99224a4b607..84286617790f 100644 --- a/tools/helpers/xen-init-dom0.c +++ b/tools/helpers/xen-init-dom0.c @@ -43,7 +43,9 @@ int main(int argc, char **argv) int rc; struct xs_handle *xsh = NULL; xc_interface *xch = NULL; -char *domname_string = NULL, *domid_string = NULL; +char *domname_string = NULL, *domid_string = NULL, *pool_path, *pool_name; +xc_cpupoolinfo_t *xcinfo; +unsigned int pool_id = 0; libxl_uuid uuid; /* Accept 0 or 1 argument */ @@ -114,6 +116,37 @@ int main(int argc, char **argv) goto out; } +/* Create an entry in xenstore for each cpupool on the system */ +do { +xcinfo = xc_cpupool_getinfo(xch, pool_id); +if (xcinfo != NULL) { +if (xcinfo->cpupool_id != pool_id) +pool_id = xcinfo->cpupool_id; +xc_cpupool_infofree(xch, xcinfo); +if (asprintf(&pool_path, "/local/pool/%d/name", pool_id) <= 0) { +fprintf(stderr, "cannot allocate memory for pool path\n"); +rc = 1; +goto out; +} +if (asprintf(&pool_name, "Pool-%d", pool_id) <= 0) { +fprintf(stderr, "cannot allocate memory for pool name\n"); +rc = 1; +goto out_err; +} +pool_id++; +if (!xs_write(xsh, XBT_NULL, pool_path, pool_name, + strlen(pool_name))) { +fprintf(stderr, "cannot set pool 
name\n"); +rc = 1; +} +free(pool_name); +out_err: +free(pool_path); +if ( rc ) +goto out; +} +} while(xcinfo != NULL); + printf("Done setting up Dom0\n"); out: diff --git a/tools/libs/light/libxl_utils.c b/tools/libs/light/libxl_utils.c index b91c2cafa223..81780da3ff40 100644 --- a/tools/libs/light/libxl_utils.c +++ b/tools/libs/light/libxl_utils.c @@ -151,8 +151,7 @@ char *libxl_cpupoolid_to_name(libxl_ctx *ctx, uint32_t poolid) snprintf(path, sizeof(path), "/local/pool/%d/name", poolid); s = xs_read(ctx->xsh, XBT_NULL, path, ); -if (!s && (poolid == 0)) -return strdup("Pool-0"); + return s; } -- 2.17.1
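The naming convention used for the new Xenstore entries can be sketched in isolation as follows. This is not the helper's code: it swaps the patch's asprintf() (a GNU extension) for snprintf() into a caller-provided buffer so it builds with any C99 library, it formats the unsigned pool id with %u, and the two function names are hypothetical.

```c
#include <stdio.h>
#include <string.h>

/*
 * Writes the Xenstore key holding a cpupool's name into buf.
 * Returns the snprintf() result: the length that was (or would
 * have been) written, excluding the terminating NUL.
 */
int format_pool_path(char *buf, size_t len, unsigned int pool_id)
{
    return snprintf(buf, len, "/local/pool/%u/name", pool_id);
}

/* Writes the conventional pool name for the given id into buf. */
int format_pool_name(char *buf, size_t len, unsigned int pool_id)
{
    return snprintf(buf, len, "Pool-%u", pool_id);
}
```

With both strings available, the helper's loop amounts to writing the name under the path for every pool id that xc_cpupool_getinfo() reports.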
[PATCH v5 4/6] xen/cpupool: Create different cpupools at boot time
Introduce a way to create different cpupools at boot time, this is particularly useful on ARM big.LITTLE system where there might be the need to have different cpupools for each type of core, but also systems using NUMA can have different cpu pools for each node. The feature on arm relies on a specification of the cpupools from the device tree to build pools and assign cpus to them. Documentation is created to explain the feature. Signed-off-by: Luca Fancellu --- Changes in v5: - Fixed wrong variable name, swapped schedulers, add scheduler info in the printk (Stefano) - introduce assert in cpupool_init and btcpupools_get_cpupool_id to harden the code Changes in v4: - modify Makefile to put in *.init.o, fixed stubs and macro (Jan) - fixed docs, fix brakets (Stefano) - keep cpu0 in Pool-0 (Julien) - moved printk from btcpupools_allocate_pools to btcpupools_get_cpupool_id - Add to docs constraint about cpu0 and Pool-0 Changes in v3: - Add newline to cpupools.txt and removed "default n" from Kconfig (Jan) - Fixed comment, moved defines, used global cpu_online_map, use HAS_DEVICE_TREE instead of ARM and place arch specific code in header (Juergen) - Fix brakets, x86 code only panic, get rid of scheduler dt node, don't save pool pointer and look for it from the pool list (Stefano) - Changed data structures to allow modification to the code. 
Changes in v2: - Move feature to common code (Juergen) - Try to decouple dtb parse and cpupool creation to allow more way to specify cpupools (for example command line) - Created standalone dt node for the scheduler so it can be used in future work to set scheduler specific parameters - Use only auto generated ids for cpupools --- docs/misc/arm/device-tree/cpupools.txt | 136 + xen/arch/arm/include/asm/smp.h | 3 + xen/common/Kconfig | 7 + xen/common/Makefile| 1 + xen/common/boot_cpupools.c | 203 + xen/common/sched/cpupool.c | 12 +- xen/include/xen/sched.h| 14 ++ 7 files changed, 375 insertions(+), 1 deletion(-) create mode 100644 docs/misc/arm/device-tree/cpupools.txt create mode 100644 xen/common/boot_cpupools.c diff --git a/docs/misc/arm/device-tree/cpupools.txt b/docs/misc/arm/device-tree/cpupools.txt new file mode 100644 index ..5dac2b1384e0 --- /dev/null +++ b/docs/misc/arm/device-tree/cpupools.txt @@ -0,0 +1,136 @@ +Boot time cpupools +== + +When BOOT_TIME_CPUPOOLS is enabled in the Xen configuration, it is possible to +create cpupools during boot phase by specifying them in the device tree. + +Cpupools specification nodes shall be direct childs of /chosen node. +Each cpupool node contains the following properties: + +- compatible (mandatory) + +Must always include the compatiblity string: "xen,cpupool". + +- cpupool-cpus (mandatory) + +Must be a list of device tree phandle to nodes describing cpus (e.g. having +device_type = "cpu"), it can't be empty. + +- cpupool-sched (optional) + +Must be a string having the name of a Xen scheduler. Check the sched=<...> +boot argument for allowed values. + + +Constraints +=== + +If no cpupools are specified, all cpus will be assigned to one cpupool +implicitly created (Pool-0). + +If cpupools node are specified, but not every cpu brought up by Xen is assigned, +all the not assigned cpu will be assigned to an additional cpupool. + +If a cpu is assigned to a cpupool, but it's not brought up correctly, Xen will +stop. 
+ +The boot cpu must be assigned to Pool-0, so the cpupool containing that core +will become Pool-0 automatically. + + +Examples + + +A system having two types of core, the following device tree specification will +instruct Xen to have two cpupools: + +- The cpupool with id 0 will have 4 cpus assigned. +- The cpupool with id 1 will have 2 cpus assigned. + +The following example can work only if hmp-unsafe=1 is passed to Xen boot +arguments, otherwise not all cores will be brought up by Xen and the cpupool +creation process will stop Xen. + + +a72_1: cpu@0 { +compatible = "arm,cortex-a72"; +reg = <0x0 0x0>; +device_type = "cpu"; +[...] +}; + +a72_2: cpu@1 { +compatible = "arm,cortex-a72"; +reg = <0x0 0x1>; +device_type = "cpu"; +[...] +}; + +a53_1: cpu@100 { +compatible = "arm,cortex-a53"; +reg = <0x0 0x100>; +device_type = "cpu"; +[...] +}; + +a53_2: cpu@101 { +compatible = "arm,cortex-a53"; +reg = <0x0 0x101>; +device_type = "cpu"; +[...] +}; + +a53_3: cpu@102 { +compatible = "arm,cortex-a53"; +reg = <0x0 0x102>; +device_type = "cpu"; +[...] +}; + +a53_4: cpu@103 { +compatible = "arm,cortex-a53"; +reg = <0x0 0x103>; +device_type = "cpu"; +[...] +}; + +chosen { + +cpupool_a { +compatible =
[PATCH v5 2/6] xen/sched: create public function for cpupools creation
Create a new public function to create cpupools. It takes as a parameter either the scheduler id or a negative value, which means that the default Xen scheduler will be used. Signed-off-by: Luca Fancellu --- Changes in v5: - no changes Changes in v4: - no changes Changes in v3: - Fixed comment (Andrew) Changes in v2: - cpupool_create_pool doesn't check anymore for pool id uniqueness before calling cpupool_create. Modified commit message accordingly --- xen/common/sched/cpupool.c | 15 +++ xen/include/xen/sched.h| 16 2 files changed, 31 insertions(+) diff --git a/xen/common/sched/cpupool.c b/xen/common/sched/cpupool.c index a6da4970506a..89a891af7076 100644 --- a/xen/common/sched/cpupool.c +++ b/xen/common/sched/cpupool.c @@ -1219,6 +1219,21 @@ static void cpupool_hypfs_init(void) #endif /* CONFIG_HYPFS */ +struct cpupool *__init cpupool_create_pool(unsigned int pool_id, int sched_id) +{ +struct cpupool *pool; + +if ( sched_id < 0 ) +sched_id = scheduler_get_default()->sched_id; + +pool = cpupool_create(pool_id, sched_id); + +BUG_ON(IS_ERR(pool)); +cpupool_put(pool); + +return pool; +} + static int __init cf_check cpupool_init(void) { unsigned int cpu; diff --git a/xen/include/xen/sched.h b/xen/include/xen/sched.h index 406d9bc610a4..b07717987434 100644 --- a/xen/include/xen/sched.h +++ b/xen/include/xen/sched.h @@ -1147,6 +1147,22 @@ int cpupool_move_domain(struct domain *d, struct cpupool *c); int cpupool_do_sysctl(struct xen_sysctl_cpupool_op *op); unsigned int cpupool_get_id(const struct domain *d); const cpumask_t *cpupool_valid_cpus(const struct cpupool *pool); + +/* + * cpupool_create_pool - Creates a cpupool + * @pool_id: id of the pool to be created + * @sched_id: id of the scheduler to be used for the pool + * + * Creates a cpupool with pool_id id. + * The sched_id parameter identifies the scheduler to be used, if it is + * negative, the default scheduler of Xen will be used. 
+ * + * returns: + * pointer to the struct cpupool just created, or Xen will panic in case of + * error + */ +struct cpupool *cpupool_create_pool(unsigned int pool_id, int sched_id); + extern void cf_check dump_runq(unsigned char key); void arch_do_physinfo(struct xen_sysctl_physinfo *pi); -- 2.17.1
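The "negative means default" convention of cpupool_create_pool() boils down to the small sketch below. It is an illustration only: DEFAULT_SCHED_ID is a made-up constant standing in for scheduler_get_default()->sched_id, and resolve_sched_id() is a hypothetical helper that does not exist in Xen.

```c
/* Made-up value; in Xen this comes from scheduler_get_default(). */
#define DEFAULT_SCHED_ID 7

/*
 * Maps the sched_id argument to the scheduler actually used:
 * any negative value selects the default scheduler, everything
 * else is passed through unchanged.
 */
int resolve_sched_id(int sched_id)
{
    return sched_id < 0 ? DEFAULT_SCHED_ID : sched_id;
}
```

This is why a caller that has no scheduler preference can simply pass -1, which is also the value sched_get_id_by_name() returns for an unknown name in the related patch of this series.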
[PATCH v5 0/6] Boot time cpupools
This series introduces a feature for Xen to create cpu pools at boot time; the feature is enabled using a configuration option that is disabled by default. The boot time cpupool feature relies on the device tree to describe the cpu pools. Another feature is introduced by the series: the possibility to assign a dom0less guest to a cpupool at boot time.

Here follows an example, where Xen is built with CONFIG_BOOT_TIME_CPUPOOLS=y.

From the DT:

[...]

a72_0: cpu@0 {
    compatible = "arm,cortex-a72";
    reg = <0x0 0x0>;
    device_type = "cpu";
    [...]
};

a72_1: cpu@1 {
    compatible = "arm,cortex-a72";
    reg = <0x0 0x1>;
    device_type = "cpu";
    [...]
};

a53_0: cpu@100 {
    compatible = "arm,cortex-a53";
    reg = <0x0 0x100>;
    device_type = "cpu";
    [...]
};

a53_1: cpu@101 {
    compatible = "arm,cortex-a53";
    reg = <0x0 0x101>;
    device_type = "cpu";
    [...]
};

a53_2: cpu@102 {
    compatible = "arm,cortex-a53";
    reg = <0x0 0x102>;
    device_type = "cpu";
    [...]
};

a53_3: cpu@103 {
    compatible = "arm,cortex-a53";
    reg = <0x0 0x103>;
    device_type = "cpu";
    [...]
};

chosen {
    #size-cells = <0x1>;
    #address-cells = <0x1>;
    xen,dom0-bootargs = "...";
    xen,xen-bootargs = "...";

    cpupool0 {
        compatible = "xen,cpupool";
        cpupool-cpus = <_0 _1>;
        cpupool-sched = "credit2";
    };

    cp1: cpupool1 {
        compatible = "xen,cpupool";
        cpupool-cpus = <_0 _1 _2 _3>;
    };

    module@0 {
        reg = <0x8008 0x130>;
        compatible = "multiboot,module";
    };

    domU1 {
        #size-cells = <0x1>;
        #address-cells = <0x1>;
        compatible = "xen,domain";
        cpus = <1>;
        memory = <0 0xC>;
        vpl011;
        domain-cpupool = <>;

        module@9200 {
            compatible = "multiboot,kernel", "multiboot,module";
            reg = <0x9200 0x1ff>;
            bootargs = "...";
        };
    };
};

[...]

The example DT is instructing Xen to have two cpu pools, the one with id 0 having two physical cpus and the one with id 1 having four physical cpus; the second cpu pool uses the null scheduler, and from the /chosen node we can see that a dom0less guest will be started on that cpu pool.
In this particular case Xen must boot with different type of cpus, so the boot argument hmp_unsafe must be enabled. Luca Fancellu (6): tools/cpupools: Give a name to unnamed cpupools xen/sched: create public function for cpupools creation xen/sched: retrieve scheduler id by name xen/cpupool: Create different cpupools at boot time arm/dom0less: assign dom0less guests to cpupools xen/cpupool: Allow cpupool0 to use different scheduler docs/misc/arm/device-tree/booting.txt | 5 + docs/misc/arm/device-tree/cpupools.txt | 136 +++ tools/helpers/xen-init-dom0.c | 35 +++- tools/libs/light/libxl_utils.c | 3 +- xen/arch/arm/domain_build.c| 14 +- xen/arch/arm/include/asm/smp.h | 3 + xen/common/Kconfig | 7 + xen/common/Makefile| 1 + xen/common/boot_cpupools.c | 230 + xen/common/domain.c| 2 +- xen/common/sched/core.c| 40 +++-- xen/common/sched/cpupool.c | 35 +++- xen/include/public/domctl.h| 4 +- xen/include/xen/sched.h| 53 ++ 14 files changed, 540 insertions(+), 28 deletions(-) create mode 100644 docs/misc/arm/device-tree/cpupools.txt create mode 100644 xen/common/boot_cpupools.c -- 2.17.1
Re: [PATCH v3 2/2] xen: Populate xen.lds.h and make use of its macros
On 31.03.2022 09:14, Michal Orzel wrote: > --- a/xen/include/xen/xen.lds.h > +++ b/xen/include/xen/xen.lds.h > @@ -5,4 +5,133 @@ > * Common macros to be used in architecture specific linker scripts. > */ > > +/* > + * To avoid any confusion, please note that the EFI macro does not correspond > + * to EFI support and is used when linking a native EFI (i.e. PE/COFF) > binary, > + * hence its usage in this header. > + */ > + > +/* Macros to declare debug sections. */ > +#ifdef EFI > +/* > + * Use the NOLOAD directive, despite currently ignored by (at least) GNU ld > + * for PE output, in order to record that we'd prefer these sections to not > + * be loaded into memory. > + */ > +#define DECL_DEBUG(x, a) #x ALIGN(a) (NOLOAD) : { *(x) } > +#define DECL_DEBUG2(x, y, a) #x ALIGN(a) (NOLOAD) : { *(x) *(y) } > +#else > +#define DECL_DEBUG(x, a) #x 0 : { *(x) } > +#define DECL_DEBUG2(x, y, a) #x 0 : { *(x) *(y) } > +#endif > + > +/* > + * DWARF2+ debug sections. > + * Explicitly list debug sections, first of all to avoid these sections being > + * viewed as "orphan" by the linker. > + * > + * For the PE output this is further necessary so that they don't end up at > + * VA 0, which is below image base and thus invalid. Note that this macro is > + * to be used after _end, so if these sections get loaded they'll be > discarded > + * at runtime anyway. > + */ > +#define DWARF2_DEBUG_SECTIONS \ > + DECL_DEBUG(.debug_abbrev, 1)\ > + DECL_DEBUG2(.debug_info, .gnu.linkonce.wi.*, 1) \ > + DECL_DEBUG(.debug_types, 1) \ > + DECL_DEBUG(.debug_str, 1) \ > + DECL_DEBUG2(.debug_line, .debug_line.*, 1) \ > + DECL_DEBUG(.debug_line_str, 1) \ > + DECL_DEBUG(.debug_names, 4) \ > + DECL_DEBUG(.debug_frame, 4) \ > + DECL_DEBUG(.debug_loc, 1) \ > + DECL_DEBUG(.debug_loclists, 4) \ > + DECL_DEBUG(.debug_macinfo, 1) \ > + DECL_DEBUG(.debug_macro, 1) \ > + DECL_DEBUG(.debug_ranges, 8)\ Here and ... > + DECL_DEBUG(.debug_rnglists, 4) \ > + DECL_DEBUG(.debug_addr, 8) \ ... 
here I think you also want to switch to POINTER_ALIGN. > + DECL_DEBUG(.debug_aranges, 1) \ > + DECL_DEBUG(.debug_pubnames, 1) \ > + DECL_DEBUG(.debug_pubtypes, 1) > + > +/* Stabs debug sections. */ > +#define STABS_DEBUG_SECTIONS \ > + .stab 0 : { *(.stab) } \ > + .stabstr 0 : { *(.stabstr) } \ > + .stab.excl 0 : { *(.stab.excl) } \ > + .stab.exclstr 0 : { *(.stab.exclstr) } \ > + .stab.index 0 : { *(.stab.index) } \ > + .stab.indexstr 0 : { *(.stab.indexstr) } > + > +/* > + * Required sections not related to debugging. Nit: Perhaps better "Required ELF sections ..."? Personally I'd also drop the mentioning of debugging - that's not really relevant here. I'm also unsure about "Required" - .comment isn't really required. IOW ideally simply "ELF sections" or "Sections to be retained in ELF binaries" or some such. Jan
Re: [PATCH v4 1/8] x86/boot: make "vga=current" work with graphics modes
On Thu, Mar 31, 2022 at 11:44:10AM +0200, Jan Beulich wrote: > GrUB2 can be told to leave the screen in the graphics mode it has been > using (or any other one), via "set gfxpayload=keep" (or suitable > variants thereof). In this case we can avoid doing another mode switch > ourselves. This in particular avoids possibly setting the screen to a > less desirable mode: On one of my test systems the set of modes > reported available by the VESA BIOS depends on whether the interposed > KVM switch has that machine set as the active one. If it's not active, > only modes up to 1024x768 get reported, while when active 1280x1024 > modes are also included. For things to always work with an explicitly > specified mode (via the "vga=" option), that mode therefore needs be a > 1024x768 one. > > For some reason this only works for me with "multiboot2" (and > "module2"); "multiboot" (and "module") still forces the screen into text > mode, despite my reading of the sources suggesting otherwise. > > For starters I'm limiting this to graphics modes; I do think this ought > to also work for text modes, but > - I can't tell whether GrUB2 can set any text mode other than 80x25 > (I've only found plain "text" to be valid as a "gfxpayload" setting), > - I'm uncertain whether supporting that is worth it, since I'm uncertain > how many people would be running their systems/screens in text mode, > - I'd like to limit the amount of code added to the realmode trampoline. > > For starters I'm also limiting mode information retrieval to raw BIOS > accesses. This will allow things to work (in principle) also with other > boot environments where a graphics mode can be left in place. The > downside is that this then still is dependent upon switching back to > real mode, so retrieving the needed information from multiboot info is > likely going to be desirable down the road. > > Signed-off-by: Jan Beulich Acked-by: Roger Pau Monné Thanks, Roger.
Re: [PATCH v4 1/8] x86/boot: make "vga=current" work with graphics modes
On 05.04.2022 10:24, Roger Pau Monné wrote: > On Mon, Apr 04, 2022 at 05:50:57PM +0200, Jan Beulich wrote: >> (reducing Cc list some) >> >> On 04.04.2022 16:49, Roger Pau Monné wrote: >>> On Thu, Mar 31, 2022 at 11:44:10AM +0200, Jan Beulich wrote: GrUB2 can be told to leave the screen in the graphics mode it has been using (or any other one), via "set gfxpayload=keep" (or suitable variants thereof). In this case we can avoid doing another mode switch ourselves. This in particular avoids possibly setting the screen to a less desirable mode: On one of my test systems the set of modes reported available by the VESA BIOS depends on whether the interposed KVM switch has that machine set as the active one. If it's not active, only modes up to 1024x768 get reported, while when active 1280x1024 modes are also included. For things to always work with an explicitly specified mode (via the "vga=" option), that mode therefore needs be a 1024x768 one. > > So this patch helps you by not having to set a mode and just relying > on the mode set by GrUB? Yes, but it goes beyond that: The modes offered by VESA on the particular system don't include the higher resolution one under certain circumstances, so I cannot tell Xen to switch to that mode. By not having to tell Xen a specific mode (but rather inherit that set / left active by the boot loader), I can leverage the better mode in most cases, but things will still work if I turn on (or reset) the system with another machine being the presently selected one at the KVM switch. But yes, beyond the particular quirk on this system the benefit is one less mode switch and hence less screen flickering and slightly faster boot. 
--- a/xen/arch/x86/boot/video.S +++ b/xen/arch/x86/boot/video.S @@ -575,7 +575,6 @@ set14: movw$0x, %ax movb$0x01, %ah # Define cursor scan lines 11-12 movw$0x0b0c, %cx int $0x10 -set_current: stc ret @@ -693,6 +692,39 @@ vga_modes: .word VIDEO_80x60, 0x50,0x3c,0# 80x60 vga_modes_end: +# If the current mode is a VESA graphics one, obtain its parameters. +set_current: +leawvesa_glob_info, %di +movw$0x4f00, %ax +int $0x10 +cmpw$0x004f, %ax +jne .Lsetc_done >>> >>> You don't seem to make use of the information fetched here? I guess >>> this is somehow required to access the other functions? >> >> See the similar logic at check_vesa. The information is used later, by >> mode_params (half way into mopar_gr). Quite likely this could be done >> just in a single place, but that would require some restructuring of >> the code, which I'd like to avoid doing here. > > I didn't realize check_vesa and set_current where mutually > exclusive. > +movw$0x4f03, %ax >>> >>> It would help readability to have defines for those values, ie: >>> VESA_GET_CURRENT_MODE or some such (not that you need to do it here, >>> just a comment). >> >> Right - this applies to all of our BIOS interfacing code, I guess. >> +int $0x10 +cmpw$0x004f, %ax +jne .Lsetc_done + +leawvesa_mode_info, %di # Get mode information structure +movw%bx, %cx +movw$0x4f01, %ax +int $0x10 +cmpw$0x004f, %ax +jne .Lsetc_done + +movb(%di), %al # Check mode attributes +andb$0x9b, %al +cmpb$0x9b, %al >>> >>> So you also check that the reserved D1 bit is set to 1 as mandated by >>> the spec. This is slightly different than what's done in check_vesa, >>> would you mind adding a define for this an unifying with check_vesa? >> >> Well, see the v2 changelog comment. I'm somewhat hesitant to do that >> here; I'd prefer to consolidate this in a separate patch. > > Sorry, didn't notice that v2 comment before. 
> > It's my understanding that the main difference this patch introduces > is that set_current now fetches the currently set mode, so that we > avoid further mode changes if the mode set already matches the > selected one, or if Xen is to use the already set mode? Not exactly: You either tell Xen to use the current mode ("vga=current") or you tell Xen to use a specific mode ("vga="). Checking whether the present mode is the (specific) one Xen was told to switch to would require yet more work. But skipping a requested mode switch can also have unintended consequences, so I wouldn't even be certain we would want to go such a route. Jan
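The review comments above ask for named constants in place of the raw BIOS values and for a shared definition of the mode-attribute mask. A minimal C sketch of what such definitions could look like follows. Only VESA_GET_CURRENT_MODE was floated in the thread; every other name here is hypothetical, and the bit meanings are my reading of the VBE mode attribute field rather than anything taken from the patch.

```c
#include <stdbool.h>
#include <stdint.h>

/* VBE function numbers as loaded into %ax in the quoted assembly. */
#define VESA_GET_CONTROLLER_INFO 0x4f00 /* hypothetical name */
#define VESA_GET_MODE_INFO       0x4f01 /* hypothetical name */
#define VESA_GET_CURRENT_MODE    0x4f03 /* name suggested in the review */
#define VESA_STATUS_OK           0x004f /* hypothetical name */

/* Mode attribute bits that together form the 0x9b mask (names hypothetical). */
#define VESA_ATTR_SUPPORTED (1u << 0) /* mode supported by the hardware */
#define VESA_ATTR_RESERVED1 (1u << 1) /* reserved, expected to be 1 */
#define VESA_ATTR_COLOR     (1u << 3) /* color mode */
#define VESA_ATTR_GRAPHICS  (1u << 4) /* graphics (not text) mode */
#define VESA_ATTR_LFB       (1u << 7) /* linear frame buffer available */

#define VESA_ATTR_REQUIRED \
    (VESA_ATTR_SUPPORTED | VESA_ATTR_RESERVED1 | VESA_ATTR_COLOR | \
     VESA_ATTR_GRAPHICS | VESA_ATTR_LFB)

/* Mirrors "cmpw $0x004f, %ax": did the BIOS call report success? */
static inline bool vesa_call_ok(uint16_t ax)
{
    return ax == VESA_STATUS_OK;
}

/* Mirrors "andb $0x9b, %al; cmpb $0x9b, %al": all required bits set? */
static inline bool vesa_mode_usable(uint8_t attrs)
{
    return (attrs & VESA_ATTR_REQUIRED) == VESA_ATTR_REQUIRED;
}
```

The real code lives in real-mode assembly, so the equivalent there would be assembler-visible defines; the C form is only meant to make the mask composition easy to read.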
Re: [PATCH 2/2] arch: ensure idle domain is not left privileged
On 31.03.2022 01:05, Daniel P. Smith wrote:
> --- a/xen/arch/x86/setup.c
> +++ b/xen/arch/x86/setup.c
> @@ -589,6 +589,9 @@ static void noinline init_done(void)
>      void *va;
>      unsigned long start, end;
>
> +    /* Ensure idle domain was not left privileged */
> +    ASSERT(current->domain->is_privileged == false) ;

I think this should be stronger than ASSERT(); I'd recommend calling
panic(). Also please don't compare against "true" or "false" - use
ordinary boolean operations instead (here it would be
"!current->domain->is_privileged").

Jan
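To illustrate the two review points (an unconditional check instead of ASSERT(), and a plain boolean test instead of a comparison against false), here is a small standalone C sketch. The structure and function names are hypothetical stand-ins, not Xen's definitions, and panic_sketch() merely imitates the effect of a panic by aborting.

```c
#include <stdbool.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical stand-in for the relevant part of struct domain. */
struct domain_sketch {
    bool is_privileged;
};

/* Stand-in for a panic: report and stop, in every build type. */
static void panic_sketch(const char *msg)
{
    fprintf(stderr, "PANIC: %s\n", msg);
    abort();
}

/* Plain boolean operation, no "== false". */
static bool idle_domain_is_clean(const struct domain_sketch *d)
{
    return !d->is_privileged;
}

/* Unlike an assertion, this check is not compiled out of release builds. */
static void init_done_check(const struct domain_sketch *d)
{
    if ( !idle_domain_is_clean(d) )
        panic_sketch("idle domain was left privileged");
}
```

The design point is that an assertion typically vanishes in non-debug builds, whereas the explicit check stays in place everywhere.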
Re: [PATCH v4 1/8] x86/boot: make "vga=current" work with graphics modes
On Mon, Apr 04, 2022 at 05:50:57PM +0200, Jan Beulich wrote: > (reducing Cc list some) > > On 04.04.2022 16:49, Roger Pau Monné wrote: > > On Thu, Mar 31, 2022 at 11:44:10AM +0200, Jan Beulich wrote: > >> GrUB2 can be told to leave the screen in the graphics mode it has been > >> using (or any other one), via "set gfxpayload=keep" (or suitable > >> variants thereof). In this case we can avoid doing another mode switch > >> ourselves. This in particular avoids possibly setting the screen to a > >> less desirable mode: On one of my test systems the set of modes > >> reported available by the VESA BIOS depends on whether the interposed > >> KVM switch has that machine set as the active one. If it's not active, > >> only modes up to 1024x768 get reported, while when active 1280x1024 > >> modes are also included. For things to always work with an explicitly > >> specified mode (via the "vga=" option), that mode therefore needs be a > >> 1024x768 one. So this patch helps you by not having to set a mode and just relying on the mode set by GrUB? > >> > >> For some reason this only works for me with "multiboot2" (and > >> "module2"); "multiboot" (and "module") still forces the screen into text > >> mode, despite my reading of the sources suggesting otherwise. > >> > >> For starters I'm limiting this to graphics modes; I do think this ought > >> to also work for text modes, but > >> - I can't tell whether GrUB2 can set any text mode other than 80x25 > >> (I've only found plain "text" to be valid as a "gfxpayload" setting), > >> - I'm uncertain whether supporting that is worth it, since I'm uncertain > >> how many people would be running their systems/screens in text mode, > >> - I'd like to limit the amount of code added to the realmode trampoline. > >> > >> For starters I'm also limiting mode information retrieval to raw BIOS > >> accesses. This will allow things to work (in principle) also with other > >> boot environments where a graphics mode can be left in place. 
The > >> downside is that this then still is dependent upon switching back to > >> real mode, so retrieving the needed information from multiboot info is > >> likely going to be desirable down the road. > > > > I'm unsure, what's the benefit from retrieving this information from > > the VESA blob rather than from multiboot(2) structures? > > As said - it allows things to work even when that data isn't provided. > Note also how I say "for starters" - patch 2 adds logic to retrieve > the information from MB. > > > Is it because we require a VESA mode to be set before we parse the > > multiboot information? > > No, I don't think so. > > >> --- a/xen/arch/x86/boot/video.S > >> +++ b/xen/arch/x86/boot/video.S > >> @@ -575,7 +575,6 @@ set14: movw$0x, %ax > >> movb$0x01, %ah # Define cursor scan lines 11-12 > >> movw$0x0b0c, %cx > >> int $0x10 > >> -set_current: > >> stc > >> ret > >> > >> @@ -693,6 +692,39 @@ vga_modes: > >> .word VIDEO_80x60, 0x50,0x3c,0# 80x60 > >> vga_modes_end: > >> > >> +# If the current mode is a VESA graphics one, obtain its parameters. > >> +set_current: > >> +leawvesa_glob_info, %di > >> +movw$0x4f00, %ax > >> +int $0x10 > >> +cmpw$0x004f, %ax > >> +jne .Lsetc_done > > > > You don't seem to make use of the information fetched here? I guess > > this is somehow required to access the other functions? > > See the similar logic at check_vesa. The information is used later, by > mode_params (half way into mopar_gr). Quite likely this could be done > just in a single place, but that would require some restructuring of > the code, which I'd like to avoid doing here. I didn't realize check_vesa and set_current where mutually exclusive. > >> +movw$0x4f03, %ax > > > > It would help readability to have defines for those values, ie: > > VESA_GET_CURRENT_MODE or some such (not that you need to do it here, > > just a comment). > > Right - this applies to all of our BIOS interfacing code, I guess. 
> > >> +int $0x10 > >> +cmpw$0x004f, %ax > >> +jne .Lsetc_done > >> + > >> +leawvesa_mode_info, %di # Get mode information structure > >> +movw%bx, %cx > >> +movw$0x4f01, %ax > >> +int $0x10 > >> +cmpw$0x004f, %ax > >> +jne .Lsetc_done > >> + > >> +movb(%di), %al # Check mode attributes > >> +andb$0x9b, %al > >> +cmpb$0x9b, %al > > > > So you also check that the reserved D1 bit is set to 1 as mandated by > > the spec. This is slightly different than what's done in check_vesa, > > would you mind adding a define for this an unifying with check_vesa? > > Well, see the v2 changelog comment. I'm somewhat hesitant to do that > here; I'd prefer to consolidate this in a separate patch. Sorry, didn't notice
[linux-linus test] 169164: regressions - FAIL
flight 169164 linux-linus real [real]
http://logs.test-lab.xenproject.org/osstest/logs/169164/

Regressions :-(

Tests which did not succeed and are blocking,
including tests which could not be run:
 build-arm64-pvops 6 kernel-build fail in 169157 REGR. vs. 169145

Tests which are failing intermittently (not blocking):
 test-armhf-armhf-libvirt-raw 10 host-ping-check-xen fail pass in 169157

Tests which did not succeed, but are not blocking:
 test-arm64-arm64-xl 1 build-check(1) blocked in 169157 n/a
 test-arm64-arm64-xl-thunderx 1 build-check(1) blocked in 169157 n/a
 test-arm64-arm64-examine 1 build-check(1) blocked in 169157 n/a
 test-arm64-arm64-libvirt-raw 1 build-check(1) blocked in 169157 n/a
 test-arm64-arm64-xl-seattle 1 build-check(1) blocked in 169157 n/a
 test-arm64-arm64-libvirt-xsm 1 build-check(1) blocked in 169157 n/a
 test-arm64-arm64-xl-credit2 1 build-check(1) blocked in 169157 n/a
 test-arm64-arm64-xl-xsm 1 build-check(1) blocked in 169157 n/a
 test-arm64-arm64-xl-credit1 1 build-check(1) blocked in 169157 n/a
 test-arm64-arm64-xl-vhd 1 build-check(1) blocked in 169157 n/a
 test-armhf-armhf-libvirt-raw 15 saverestore-support-check fail in 169157 blocked in 169145
 test-armhf-armhf-libvirt-raw 14 migrate-support-check fail in 169157 never pass
 test-amd64-amd64-xl-qemut-win7-amd64 19 guest-stop fail like 169145
 test-armhf-armhf-libvirt 16 saverestore-support-check fail like 169145
 test-amd64-amd64-xl-qemuu-ws16-amd64 19 guest-stop fail like 169145
 test-amd64-amd64-qemuu-nested-amd 20 debian-hvm-install/l1/l2 fail like 169145
 test-amd64-amd64-xl-qemut-ws16-amd64 19 guest-stop fail like 169145
 test-amd64-amd64-xl-qemuu-win7-amd64 19 guest-stop fail like 169145
 test-armhf-armhf-libvirt-qcow2 15 saverestore-support-check fail like 169145
 test-amd64-amd64-libvirt 15 migrate-support-check fail never pass
 test-arm64-arm64-xl-seattle 15 migrate-support-check fail never pass
 test-arm64-arm64-xl-seattle 16 saverestore-support-check fail never pass
 test-amd64-amd64-libvirt-xsm 15 migrate-support-check fail never pass
 test-arm64-arm64-xl 15 migrate-support-check fail never pass
 test-arm64-arm64-xl 16 saverestore-support-check fail never pass
 test-arm64-arm64-xl-xsm 15 migrate-support-check fail never pass
 test-arm64-arm64-xl-xsm 16 saverestore-support-check fail never pass
 test-arm64-arm64-xl-credit2 15 migrate-support-check fail never pass
 test-arm64-arm64-xl-credit2 16 saverestore-support-check fail never pass
 test-arm64-arm64-xl-credit1 15 migrate-support-check fail never pass
 test-arm64-arm64-xl-thunderx 15 migrate-support-check fail never pass
 test-arm64-arm64-xl-credit1 16 saverestore-support-check fail never pass
 test-arm64-arm64-xl-thunderx 16 saverestore-support-check fail never pass
 test-arm64-arm64-libvirt-xsm 15 migrate-support-check fail never pass
 test-arm64-arm64-libvirt-xsm 16 saverestore-support-check fail never pass
 test-amd64-amd64-libvirt-qemuu-debianhvm-amd64-xsm 13 migrate-support-check fail never pass
 test-amd64-amd64-libvirt-qcow2 14 migrate-support-check fail never pass
 test-armhf-armhf-xl-arndale 15 migrate-support-check fail never pass
 test-armhf-armhf-xl-arndale 16 saverestore-support-check fail never pass
 test-amd64-amd64-libvirt-raw 14 migrate-support-check fail never pass
 test-arm64-arm64-libvirt-raw 14 migrate-support-check fail never pass
 test-arm64-arm64-libvirt-raw 15 saverestore-support-check fail never pass
 test-arm64-arm64-xl-vhd 14 migrate-support-check fail never pass
 test-arm64-arm64-xl-vhd 15 saverestore-support-check fail never pass
 test-armhf-armhf-xl-credit2 15 migrate-support-check fail never pass
 test-armhf-armhf-xl-credit2 16 saverestore-support-check fail never pass
 test-armhf-armhf-xl 15 migrate-support-check fail never pass
 test-armhf-armhf-xl 16 saverestore-support-check fail never pass
 test-armhf-armhf-libvirt 15 migrate-support-check fail never pass
 test-armhf-armhf-xl-cubietruck 15 migrate-support-check fail never pass
 test-armhf-armhf-xl-cubietruck 16 saverestore-support-check fail never pass
 test-armhf-armhf-xl-rtds 15 migrate-support-check fail never pass
 test-armhf-armhf-xl-rtds 16 saverestore-support-check fail never pass
 test-armhf-armhf-xl-vhd 14 migrate-support-check fail never pass
 test-armhf-armhf-xl-vhd 15 saverestore-support-check fail never pass
 test-armhf-armhf-libvirt-qcow2 14 migrate-support-check fail never pass
Re: [PATCH] x86/irq: Skip unmap_domain_pirq XSM during destruction
On 30.03.2022 20:17, Jason Andryuk wrote:
> xsm_unmap_domain_irq was seen denying unmap_domain_pirq when called from
> complete_domain_destroy as an RCU callback. The source context was an
> unexpected, random domain. Since this is a xen-internal operation,
> we don't want the XSM hook denying the operation.
>
> Check d->is_dying and skip the check when the domain is dead. The RCU
> callback runs when a domain is in that state.

One question which has always been puzzling me (perhaps to Daniel):
While I can see why mapping of an IRQ needs to be subject to an XSM
check, it's not really clear to me why unmapping would need to be, at
least as long as it's the domain itself which requests the unmap (and
which I would view to extend to the domain being cleaned up). But maybe
that's why it's XSM_HOOK ...

> ---
> Dan wants to change current to point at DOMID_IDLE when the RCU callback
> runs. I think Juergen's commit 53594c7bd197 "rcu: don't use
> stop_machine_run() for rcu_barrier()" may have changed this since it
> mentions stop_machine_run scheduled the idle vcpus to run the callbacks
> for the old code.
>
> Would that be as easy as changing rcu_do_batch() to do:
>
> +        /* Run as "Xen" not a random domain's vcpu. */
> +        vcpu = get_current();
> +        set_current(idle_vcpu[smp_processor_id()]);
>          list->func(list);
> +        set_current(vcpu);
>
> or is using set_current() only acceptable as part of context_switch?

Indeed I would question any uses outside of context_switch() (and
system bringup).

> --- a/xen/arch/x86/irq.c
> +++ b/xen/arch/x86/irq.c
> @@ -2340,10 +2340,14 @@ int unmap_domain_pirq(struct domain *d, int pirq)
>          nr = msi_desc->msi.nvec;
>      }
>
> -    ret = xsm_unmap_domain_irq(XSM_HOOK, d, irq,
> -                               msi_desc ? msi_desc->dev : NULL);
> -    if ( ret )
> -        goto done;
> +    /* When called by complete_domain_destroy via RCU, current is a random
> +     * domain. Skip the XSM check since this is a Xen-initiated action. */

Comment style.

> +    if ( d->is_dying != DOMDYING_dead ) {

Please use !d->is_dying. Also please correct the placement of the brace.
Or you could avoid the need for a brace by leveraging that ret is zero
ahead of this if(), i.e. ...

> +        ret = xsm_unmap_domain_irq(XSM_HOOK, d, irq,
> +                                   msi_desc ? msi_desc->dev : NULL);
> +        if ( ret )
> +            goto done;
> +    }

    if ( !d->is_dying )
        ret = xsm_unmap_domain_irq(XSM_HOOK, d, irq,
                                   msi_desc ? msi_desc->dev : NULL);
    if ( ret )
        goto done;

Jan
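The brace-free form suggested above works only because ret is already zero when the if() is reached, so skipping the hook leaves the "no error" value in place. A standalone C sketch of that control flow follows. The structure, the stub hook and its counters are hypothetical stand-ins for illustration, not the Xen code.

```c
#include <stdbool.h>

/* Hypothetical stand-in: 0 means the domain is alive, non-zero means dying. */
struct domain_sketch {
    int is_dying;
};

/* Stub for the XSM hook: records calls and returns a configurable result. */
static int xsm_stub_result;
static int xsm_stub_calls;

static int xsm_unmap_stub(const struct domain_sketch *d)
{
    (void)d;
    xsm_stub_calls++;
    return xsm_stub_result;
}

static int unmap_sketch(const struct domain_sketch *d)
{
    int ret = 0; /* the suggested form relies on this initial value */

    /* The hook is only consulted for domains that are not being torn down. */
    if ( !d->is_dying )
        ret = xsm_unmap_stub(d);
    if ( ret )
        goto done;

    /* ... the actual unmap work would happen here ... */

 done:
    return ret;
}
```

With the stub set to deny, a live domain gets the error back while a dying domain proceeds without the hook ever being called.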
Re: [PATCH v4 4/4] x86/time: use fake read_tsc()
On Mon, Apr 04, 2022 at 05:33:04PM +0200, Jan Beulich wrote: > On 04.04.2022 15:22, Roger Pau Monné wrote: > > On Thu, Mar 31, 2022 at 11:31:38AM +0200, Jan Beulich wrote: > >> Go a step further than bed9ae54df44 ("x86/time: switch platform timer > >> hooks to altcall") did and eliminate the "real" read_tsc() altogether: > >> It's not used except in pointer comparisons, and hence it looks overall > >> more safe to simply poison plt_tsc's read_counter hook. > >> > >> Signed-off-by: Jan Beulich > >> --- > >> I wasn't really sure whether it would be better to use simply void * for > >> the type of the expression, resulting in an undesirable data -> function > >> pointer conversion, but making it impossible to mistakenly try and call > >> the (fake) function directly. > > > > I think it's slightly better to avoid being able to call the function, > > hence using void * would be my preference. What's wrong with the data > > -> function pointer conversion for the comparisons? > > There's no data -> function pointer conversion for the comparisons; the > situation there is even less pleasant. What I referred to was actually > the initializer, where there would be a data -> function pointer > conversion if I used void *. I see, there are architectures with different sizes for function and data pointers. It's also not clear all compilers will be happy with the conversion. > >> --- > >> v2: Comment wording. > >> > >> --- a/xen/arch/x86/time.c > >> +++ b/xen/arch/x86/time.c > >> @@ -607,10 +607,12 @@ static s64 __init cf_check init_tsc(stru > >> return ret; > >> } > >> > >> -static uint64_t __init cf_check read_tsc(void) > >> -{ > >> -return rdtsc_ordered(); > >> -} > >> +/* > >> + * plt_tsc's read_counter hook is not (and should not be) invoked via the > >> + * struct field. To avoid carrying an unused, indirectly reachable > >> function, > >> + * poison the field with an easily identifiable non-canonical pointer. 
> >> + */ > >> +#define read_tsc ((uint64_t(*)(void))0x75C75C75C75C75C0ul) > > > > Instead of naming this like a suitable function, I would rather use > > READ_TSC_PTR_POISON or some such. > > I'll be happy to name it something like this; the primary thing to > settle on is the type to use. I think it's safer to use a function pointer type like you currently have from a correctness PoV, but in order to prevent stray calls to read_tsc() I would rename to READ_TSC_PTR_POISON. This was already static, so I guess it's hard anyway for any of such direct calls to appear without us realizing. With that: Reviewed-by: Roger Pau Monné Thanks, Roger.
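The outcome of this exchange (a function-pointer typed poison value, renamed so that it no longer looks callable) can be sketched in standalone C as below. READ_TSC_PTR_POISON is the name proposed in the review; the structure, the stubs and the dispatch helper are hypothetical and only illustrate how a hook can be identified by comparison without ever being invoked. Converting an integer to a function pointer is implementation-defined, so this assumes a compiler such as GCC or Clang.

```c
#include <stdbool.h>
#include <stdint.h>

typedef uint64_t (*read_counter_fn)(void);

/* Easily identifiable poison value; it is never meant to be called. */
#define READ_TSC_PTR_POISON \
    ((read_counter_fn)(uintptr_t)0x75C75C75C75C75C0ull)

/* Hypothetical, reduced stand-in for a platform timer description. */
struct timesource_sketch {
    const char *name;
    read_counter_fn read_counter;
};

/* Stand-in for reading the TSC directly (the real code would use rdtsc). */
static uint64_t read_tsc_direct_stub(void)
{
    return 1234;
}

/* Stand-in for some other timer's ordinary hook. */
static uint64_t read_other_stub(void)
{
    return 42;
}

/* The poisoned field is only ever used in pointer comparisons. */
static bool is_tsc(const struct timesource_sketch *ts)
{
    return ts->read_counter == READ_TSC_PTR_POISON;
}

/* Hypothetical dispatch: never call through the poisoned field. */
static uint64_t read_counter(const struct timesource_sketch *ts)
{
    if ( is_tsc(ts) )
        return read_tsc_direct_stub();
    return ts->read_counter();
}
```

Keeping the function-pointer type avoids a data to function pointer conversion in the initializer, and the macro-style name makes a stray direct call stand out in review.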
Re: [PATCH 1/2] xsm: add ability to elevate a domain to privileged
On Mon, Apr 04, 2022 at 12:08:25PM -0400, Daniel P. Smith wrote: > On 4/4/22 11:12, Roger Pau Monné wrote: > > On Mon, Apr 04, 2022 at 10:21:18AM -0400, Daniel P. Smith wrote: > >> On 3/31/22 08:36, Roger Pau Monné wrote: > >>> On Wed, Mar 30, 2022 at 07:05:48PM -0400, Daniel P. Smith wrote: > diff --git a/xen/include/xsm/xsm.h b/xen/include/xsm/xsm.h > index e22d6160b5..157e57151e 100644 > --- a/xen/include/xsm/xsm.h > +++ b/xen/include/xsm/xsm.h > @@ -189,6 +189,28 @@ struct xsm_operations { > #endif > }; > > +static always_inline int xsm_elevate_priv(struct domain *d) > >>> > >>> I don't think it needs to be always_inline, using just inline would be > >>> fine IMO. > >>> > >>> Also this needs to be __init. > >> > >> AIUI always_inline is likely the best way to preserve the speculation > >> safety brought in by the call to is_system_domain(). > > > > There's nothing related to speculation safety in is_system_domain() > > AFAICT. It's just a plain check against d->domain_id. It's my > > understanding there's no need for any speculation barrier there > > because d->domain_id is not an external input. > > Hmmm, this actually raises a good question. Why is is_control_domain(), > is_hardware_domain, and others all have evaluate_nospec() wrapping the > check of a struct domain element while is_system_domain() does not? Jan replied to this regard, see: https://lore.kernel.org/xen-devel/54272d08-7ce1-b162-c8e9-1955b780c...@suse.com/ > > In any case this function should be __init only, at which point there > > are no untrusted inputs to Xen. > > I thought it was agreed that __init on inline functions in headers had > no meaning? In a different reply I already noted my preference would be for the function to not reside in a header and not be inline, simply because it would be gone after initialization and we won't have to worry about any stray calls when the system is active. Thanks, Roger.