[xen-4.13-testing test] 169180: tolerable FAIL - PUSHED

2022-04-05 Thread osstest service owner
flight 169180 xen-4.13-testing real [real]
http://logs.test-lab.xenproject.org/osstest/logs/169180/

Failures :-/ but no regressions.

Tests which did not succeed, but are not blocking:
 test-amd64-amd64-xl-qemuu-win7-amd64 19 guest-stop fail like 168481
 test-amd64-i386-xl-qemuu-win7-amd64 19 guest-stop fail like 168481
 test-armhf-armhf-libvirt-qcow2 15 saverestore-support-check fail like 168481
 test-amd64-i386-xl-qemut-win7-amd64 19 guest-stop fail like 168481
 test-armhf-armhf-libvirt-raw 15 saverestore-support-check fail like 168481
 test-amd64-amd64-qemuu-nested-amd 20 debian-hvm-install/l1/l2 fail like 168481
 test-amd64-i386-xl-qemut-ws16-amd64 19 guest-stop fail like 168481
 test-amd64-amd64-xl-qemut-win7-amd64 19 guest-stop fail like 168481
 test-amd64-i386-xl-qemuu-ws16-amd64 19 guest-stop fail like 168481
 test-armhf-armhf-libvirt 16 saverestore-support-check fail like 168481
 test-amd64-amd64-xl-qemuu-ws16-amd64 19 guest-stop fail like 168481
 test-amd64-amd64-xl-qemut-ws16-amd64 19 guest-stop fail like 168481
 test-arm64-arm64-xl 15 migrate-support-check fail never pass
 test-arm64-arm64-xl 16 saverestore-support-check fail never pass
 test-amd64-i386-libvirt-xsm 15 migrate-support-check fail never pass
 test-amd64-amd64-libvirt-xsm 15 migrate-support-check fail never pass
 test-amd64-i386-xl-pvshim 14 guest-start fail never pass
 test-amd64-i386-libvirt 15 migrate-support-check fail never pass
 test-amd64-amd64-libvirt 15 migrate-support-check fail never pass
 test-arm64-arm64-libvirt-xsm 15 migrate-support-check fail never pass
 test-arm64-arm64-libvirt-xsm 16 saverestore-support-check fail never pass
 test-arm64-arm64-xl-seattle 15 migrate-support-check fail never pass
 test-arm64-arm64-xl-seattle 16 saverestore-support-check fail never pass
 test-arm64-arm64-xl-credit2 15 migrate-support-check fail never pass
 test-arm64-arm64-xl-credit2 16 saverestore-support-check fail never pass
 test-amd64-i386-libvirt-raw 14 migrate-support-check fail never pass
 test-armhf-armhf-xl-vhd 14 migrate-support-check fail never pass
 test-armhf-armhf-xl-vhd 15 saverestore-support-check fail never pass
 test-amd64-amd64-libvirt-qemuu-debianhvm-amd64-xsm 13 migrate-support-check fail never pass
 test-amd64-amd64-libvirt-vhd 14 migrate-support-check fail never pass
 test-armhf-armhf-xl-arndale 15 migrate-support-check fail never pass
 test-armhf-armhf-xl-arndale 16 saverestore-support-check fail never pass
 test-armhf-armhf-xl-multivcpu 15 migrate-support-check fail never pass
 test-armhf-armhf-xl-multivcpu 16 saverestore-support-check fail never pass
 test-armhf-armhf-xl-credit1 15 migrate-support-check fail never pass
 test-armhf-armhf-xl-credit1 16 saverestore-support-check fail never pass
 test-armhf-armhf-xl-rtds 15 migrate-support-check fail never pass
 test-armhf-armhf-xl-rtds 16 saverestore-support-check fail never pass
 test-armhf-armhf-xl 15 migrate-support-check fail never pass
 test-armhf-armhf-xl 16 saverestore-support-check fail never pass
 test-arm64-arm64-xl-thunderx 15 migrate-support-check fail never pass
 test-arm64-arm64-xl-thunderx 16 saverestore-support-check fail never pass
 test-arm64-arm64-xl-xsm 15 migrate-support-check fail never pass
 test-arm64-arm64-xl-xsm 16 saverestore-support-check fail never pass
 test-arm64-arm64-xl-credit1 15 migrate-support-check fail never pass
 test-arm64-arm64-xl-credit1 16 saverestore-support-check fail never pass
 test-armhf-armhf-xl-credit2 15 migrate-support-check fail never pass
 test-armhf-armhf-xl-credit2 16 saverestore-support-check fail never pass
 test-amd64-i386-libvirt-qemuu-debianhvm-amd64-xsm 13 migrate-support-check fail never pass
 test-armhf-armhf-libvirt-qcow2 14 migrate-support-check fail never pass
 test-arm64-arm64-libvirt-raw 14 migrate-support-check fail never pass
 test-arm64-arm64-libvirt-raw 15 saverestore-support-check fail never pass
 test-arm64-arm64-xl-vhd 14 migrate-support-check fail never pass
 test-arm64-arm64-xl-vhd 15 saverestore-support-check fail never pass
 test-armhf-armhf-xl-cubietruck 15 migrate-support-check fail never pass
 test-armhf-armhf-xl-cubietruck 16 saverestore-support-check fail never pass
 test-armhf-armhf-libvirt-raw 14 migrate-support-check fail never pass
 test-armhf-armhf-libvirt 15 migrate-support-check fail never pass

version targeted for testing:
 xen  169a2834ef5d723091f187a5d6493ae77825757a
baseline version:
 xen  

Re: Increasing domain memory beyond initial maxmem

2022-04-05 Thread Juergen Gross

On 05.04.22 18:24, Marek Marczykowski-Górecki wrote:

On Tue, Apr 05, 2022 at 01:03:57PM +0200, Juergen Gross wrote:

Hi Marek,

On 31.03.22 14:36, Marek Marczykowski-Górecki wrote:

On Thu, Mar 31, 2022 at 02:22:03PM +0200, Juergen Gross wrote:

Maybe some kernel config differences, or other udev rules (memory onlining
is done via udev in my guest)?

I'm seeing:

# zgrep MEMORY_HOTPLUG /proc/config.gz
CONFIG_ARCH_ENABLE_MEMORY_HOTPLUG=y
CONFIG_MEMORY_HOTPLUG=y
# CONFIG_MEMORY_HOTPLUG_DEFAULT_ONLINE is not set
CONFIG_XEN_BALLOON_MEMORY_HOTPLUG=y
CONFIG_XEN_MEMORY_HOTPLUG_LIMIT=512


I have:
# zgrep MEMORY_HOTPLUG /proc/config.gz
CONFIG_ARCH_ENABLE_MEMORY_HOTPLUG=y
CONFIG_MEMORY_HOTPLUG=y
CONFIG_MEMORY_HOTPLUG_DEFAULT_ONLINE=y
CONFIG_XEN_BALLOON_MEMORY_HOTPLUG=y
CONFIG_XEN_MEMORY_HOTPLUG_LIMIT=512

Not sure if relevant, but I also have:
CONFIG_XEN_UNPOPULATED_ALLOC=y

On top of that, I have a similar udev rule too:

SUBSYSTEM=="memory", ACTION=="add", ATTR{state}=="offline", ATTR{state}="online"

But I don't think they are conflicting.
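
For readers unfamiliar with the udev rule quoted above: it writes "online" to the state attribute of a newly added memory block that is still "offline". The same action can be sketched in C, under the assumption that the block's state file is at a path such as /sys/devices/system/memory/memoryN/state. The helper name online_if_offline() is hypothetical and is not part of any kernel or udev interface.

```c
#include <stdio.h>
#include <string.h>

/*
 * Read a memory block's state file and, if it reads "offline", write
 * "online" back to it.
 * Returns 1 if the block was onlined, 0 if it was in any other state,
 * and -1 on an I/O error.
 * Assumption: on a real guest state_path would look like
 * /sys/devices/system/memory/memoryN/state.
 */
static int online_if_offline(const char *state_path)
{
    char state[32] = "";
    FILE *f = fopen(state_path, "r");

    if (!f)
        return -1;
    if (!fgets(state, sizeof(state), f)) {
        fclose(f);
        return -1;
    }
    fclose(f);

    /* sysfs values end with a newline; strip it before comparing. */
    state[strcspn(state, "\n")] = '\0';
    if (strcmp(state, "offline") != 0)
        return 0;

    f = fopen(state_path, "w");
    if (!f)
        return -1;
    if (fputs("online", f) == EOF) {
        fclose(f);
        return -1;
    }
    return fclose(f) == 0 ? 1 : -1;
}
```

With CONFIG_MEMORY_HOTPLUG_DEFAULT_ONLINE=y the kernel onlines new blocks itself, so a helper like this would normally find nothing left to do.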


What type of guest are you using? Mine was a PVH guest.


PVH here too.


Would you like to try the attached patch? It seemed to work for me.


Unfortunately it doesn't help, now the behavior is different:

Initially guest started with 800M:

 [root@personal ~]# free -m
               total        used        free      shared  buff/cache   available
 Mem:            740         223         272           2         243         401
 Swap:          1023           0        1023

Then increased:

 [root@dom0 ~]$ xl mem-max personal 2048
 [root@dom0 ~]$ xenstore-write /local/domain/$(xl domid personal)/memory/static-max $((2048*1024))
 [root@dom0 ~]$ xl mem-set personal 2000

The guest now shows only a little more memory, not the full 2000M:

 [root@personal ~]# [   37.657046] xen:balloon: Populating new zone
 [   37.658206] Fallback order for Node 0: 0
 [   37.658219] Built 1 zonelists, mobility grouping on.  Total pages: 175889
 [   37.658233] Policy zone: Normal

 [root@personal ~]#
 [root@personal ~]# free -m
               total        used        free      shared  buff/cache   available
 Mem:            826         245         337           2         244         462
 Swap:          1023           0        1023
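
The numbers in this report can be cross-checked with a little arithmetic. The sketch below assumes that the static-max value is in KiB (as the $((2048*1024)) expression suggests) and that the "Total pages" figure counts 4 KiB pages. The helper names are hypothetical.

```c
#include <stdint.h>

#define KIB_PER_MIB      1024ULL
#define PAGE_SIZE_BYTES  4096ULL  /* assumption: 4 KiB pages */

/* Convert a size in MiB to KiB, the unit assumed for static-max. */
static uint64_t mib_to_kib(uint64_t mib)
{
    return mib * KIB_PER_MIB;
}

/* Convert a page count to whole MiB, rounding down. */
static uint64_t pages_to_mib(uint64_t pages)
{
    return pages * PAGE_SIZE_BYTES / (KIB_PER_MIB * KIB_PER_MIB);
}
```

Under those assumptions, mib_to_kib(2048) gives 2097152 for the xenstore write, and pages_to_mib(175889) gives 687, well short of the 2000M target.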


I've applied the patch on top of 5.16.18. If you think 5.17 would make a
difference, I can try that too.


Hmm, weird.

Can you please post the output of

cat /proc/buddyinfo
cat /proc/iomem

in the guest before and after the operations?


Juergen




[xen-4.15-testing test] 169178: tolerable FAIL - PUSHED

2022-04-05 Thread osstest service owner
flight 169178 xen-4.15-testing real [real]
http://logs.test-lab.xenproject.org/osstest/logs/169178/

Failures :-/ but no regressions.

Tests which did not succeed, but are not blocking:
 test-amd64-amd64-xl-qemuu-win7-amd64 19 guest-stop fail like 169162
 test-amd64-amd64-xl-qemut-win7-amd64 19 guest-stop fail like 169162
 test-armhf-armhf-libvirt 16 saverestore-support-check fail like 169162
 test-amd64-amd64-qemuu-nested-amd 20 debian-hvm-install/l1/l2 fail like 169162
 test-amd64-amd64-xl-qemuu-ws16-amd64 19 guest-stop fail like 169162
 test-amd64-amd64-xl-qemut-ws16-amd64 19 guest-stop fail like 169162
 test-amd64-i386-xl-qemuu-ws16-amd64 19 guest-stop fail like 169162
 test-armhf-armhf-libvirt-qcow2 15 saverestore-support-check fail like 169162
 test-armhf-armhf-libvirt-raw 15 saverestore-support-check fail like 169162
 test-amd64-i386-xl-qemut-win7-amd64 19 guest-stop fail like 169162
 test-amd64-i386-xl-qemut-ws16-amd64 19 guest-stop fail like 169162
 test-amd64-i386-xl-qemuu-win7-amd64 19 guest-stop fail like 169162
 test-amd64-i386-libvirt-xsm 15 migrate-support-check fail never pass
 test-arm64-arm64-xl-seattle 15 migrate-support-check fail never pass
 test-arm64-arm64-xl-seattle 16 saverestore-support-check fail never pass
 test-amd64-i386-xl-pvshim 14 guest-start fail never pass
 test-amd64-amd64-libvirt-xsm 15 migrate-support-check fail never pass
 test-amd64-amd64-libvirt 15 migrate-support-check fail never pass
 test-amd64-i386-libvirt 15 migrate-support-check fail never pass
 test-arm64-arm64-xl-credit1 15 migrate-support-check fail never pass
 test-arm64-arm64-xl-credit1 16 saverestore-support-check fail never pass
 test-arm64-arm64-xl 15 migrate-support-check fail never pass
 test-arm64-arm64-xl 16 saverestore-support-check fail never pass
 test-arm64-arm64-xl-xsm 15 migrate-support-check fail never pass
 test-arm64-arm64-xl-xsm 16 saverestore-support-check fail never pass
 test-arm64-arm64-libvirt-xsm 15 migrate-support-check fail never pass
 test-arm64-arm64-libvirt-xsm 16 saverestore-support-check fail never pass
 test-arm64-arm64-xl-thunderx 15 migrate-support-check fail never pass
 test-arm64-arm64-xl-thunderx 16 saverestore-support-check fail never pass
 test-amd64-amd64-libvirt-qemuu-debianhvm-amd64-xsm 13 migrate-support-check fail never pass
 test-amd64-i386-libvirt-qemuu-debianhvm-amd64-xsm 13 migrate-support-check fail never pass
 test-arm64-arm64-xl-credit2 15 migrate-support-check fail never pass
 test-arm64-arm64-xl-credit2 16 saverestore-support-check fail never pass
 test-amd64-i386-libvirt-raw 14 migrate-support-check fail never pass
 test-amd64-amd64-libvirt-vhd 14 migrate-support-check fail never pass
 test-arm64-arm64-libvirt-raw 14 migrate-support-check fail never pass
 test-arm64-arm64-libvirt-raw 15 saverestore-support-check fail never pass
 test-armhf-armhf-xl-multivcpu 15 migrate-support-check fail never pass
 test-armhf-armhf-xl-multivcpu 16 saverestore-support-check fail never pass
 test-armhf-armhf-xl 15 migrate-support-check fail never pass
 test-armhf-armhf-xl 16 saverestore-support-check fail never pass
 test-arm64-arm64-xl-vhd 14 migrate-support-check fail never pass
 test-armhf-armhf-xl-credit2 15 migrate-support-check fail never pass
 test-arm64-arm64-xl-vhd 15 saverestore-support-check fail never pass
 test-armhf-armhf-xl-credit2 16 saverestore-support-check fail never pass
 test-armhf-armhf-xl-cubietruck 15 migrate-support-check fail never pass
 test-armhf-armhf-xl-cubietruck 16 saverestore-support-check fail never pass
 test-armhf-armhf-xl-rtds 15 migrate-support-check fail never pass
 test-armhf-armhf-xl-rtds 16 saverestore-support-check fail never pass
 test-armhf-armhf-libvirt 15 migrate-support-check fail never pass
 test-armhf-armhf-libvirt-qcow2 14 migrate-support-check fail never pass
 test-armhf-armhf-xl-vhd 14 migrate-support-check fail never pass
 test-armhf-armhf-xl-vhd 15 saverestore-support-check fail never pass
 test-armhf-armhf-xl-arndale 15 migrate-support-check fail never pass
 test-armhf-armhf-xl-arndale 16 saverestore-support-check fail never pass
 test-armhf-armhf-libvirt-raw 14 migrate-support-check fail never pass
 test-armhf-armhf-xl-credit1 15 migrate-support-check fail never pass
 test-armhf-armhf-xl-credit1 16 saverestore-support-check fail never pass

version targeted for testing:
 xen  aaa61028803a64e72f1026f9608dfa34d0c255ec
baseline version:
 xen  

Re: [PATCH v4 8/9] tools: add example application to initialize dom0less PV drivers

2022-04-05 Thread Stefano Stabellini
On Fri, 1 Apr 2022, Juergen Gross wrote:
> On 01.04.22 12:21, Julien Grall wrote:
> > Hi,
> > 
> > I have posted some comments in v3 after you sent this version. Please have a
> > look.
> > 
> > On 01/04/2022 01:38, Stefano Stabellini wrote:
> > > +static int init_domain(struct xs_handle *xsh, libxl_dominfo *info)
> > > +{
> > > +    struct xc_interface_core *xch;
> > > +    libxl_uuid uuid;
> > > +    uint64_t xenstore_evtchn, xenstore_pfn;
> > > +    int rc;
> > > +
> > > +    printf("Init dom0less domain: %u\n", info->domid);
> > > +    xch = xc_interface_open(0, 0, 0);
> > > +
> > > +    rc = xc_hvm_param_get(xch, info->domid, HVM_PARAM_STORE_EVTCHN,
> > > +  _evtchn);
> > > +    if (rc != 0) {
> > > +    printf("Failed to get HVM_PARAM_STORE_EVTCHN\n");
> > > +    return 1;
> > > +    }
> > > +
> > > +    /* Alloc xenstore page */
> > > +    if (alloc_xs_page(xch, info, _pfn) != 0) {
> > > +    printf("Error on alloc magic pages\n");
> > > +    return 1;
> > > +    }
> > > +
> > > +    rc = xc_dom_gnttab_seed(xch, info->domid, true,
> > > +    (xen_pfn_t)-1, xenstore_pfn, 0, 0);
> > > +    if (rc)
> > > +    err(1, "xc_dom_gnttab_seed");
> > > +
> > > +    libxl_uuid_generate();
> > > +    xc_domain_sethandle(xch, info->domid, libxl_uuid_bytearray());
> > > +
> > > +    rc = gen_stub_json_config(info->domid, );
> > > +    if (rc)
> > > +    err(1, "gen_stub_json_config");
> > > +
> > > +    /* Now everything is ready: set HVM_PARAM_STORE_PFN */
> > > +    rc = xc_hvm_param_set(xch, info->domid, HVM_PARAM_STORE_PFN,
> > > +  xenstore_pfn);
> > 
> > On patch #1, you told me you didn't want to allocate the page in Xen because
> > it wouldn't be initialized by Xenstored. But this is what we are doing here.
> 
> Xenstore (at least the C variant) is only using the fixed grant ref
> GNTTAB_RESERVED_XENSTORE, so it doesn't need the page to be advertised
> to the guest. And the mapping is done only when the domain is being
> introduced to Xenstore.
> 
> > 
> > > This would be a problem if Linux is still booting and hasn't yet called
> > xenbus_probe_initcall().
> > 
> > I understand we need to have the page setup before raising the event
> > channel. I don't think we can allow Xenstored to set the HVM_PARAM (it may
> > run in a domain with less privilege). So I think we may need to create a
> > separate command to kick the client (not great).
> > 
> > Juergen, any thoughts?
> 
> I think it should work like that:
> 
> - setup the grant via xc_dom_gnttab_seed()
> - introduce the domain to Xenstore
> - call xc_hvm_param_set()
> 
> When the guest is receiving the event, it should wait for the xenstore
> page to appear.
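
The ordering proposed above (seed the grant, introduce the domain, then set the parameter) can be modelled with a small self-contained sketch. It does not call libxc or libxenstore. The struct and function names are hypothetical, and treating a store PFN of 0 as "not yet set" is an assumption of the model, not something stated in the thread.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of the state touched by the three steps. */
struct domain_model {
    bool grant_seeded;   /* step 1: grant for the xenstore page is set up */
    bool introduced;     /* step 2: domain has been introduced to Xenstore */
    uint64_t store_pfn;  /* step 3: HVM_PARAM_STORE_PFN; 0 means "not set" */
};

/* Step 1: stands in for xc_dom_gnttab_seed(). */
static int seed_grant(struct domain_model *d)
{
    d->grant_seeded = true;
    return 0;
}

/* Step 2: Xenstore maps the page via the grant, so the grant must exist. */
static int introduce(struct domain_model *d)
{
    if (!d->grant_seeded)
        return -1;
    d->introduced = true;
    return 0;
}

/* Step 3: only advertise the page once Xenstore knows the domain. */
static int set_store_pfn(struct domain_model *d, uint64_t pfn)
{
    if (!d->introduced)
        return -1;
    d->store_pfn = pfn;
    return 0;
}

/* Guest side: after the event, wait until the page has been advertised. */
static bool guest_sees_store_page(const struct domain_model *d)
{
    return d->store_pfn != 0;
}
```

In this model a guest that receives the event early simply keeps seeing "no page yet" until step 3 completes, which is the waiting behaviour described above.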


I am OK with what you wrote above, and I understand Julien's concerns
about "waiting". Before discussing that, I would like to make sure I
understood why setting HVM_PARAM_STORE_PFN first (before
xs_introduce_domain) is not possible.

In a previous reply to Julien I wrote that it is not a good idea to
set HVM_PARAM_STORE_PFN in Xen before creating the domains because it
would cause Linux to hang at boot. That is true, Linux hangs on
drivers/xen/xenbus/xenbus_comms.c:xb_init_comms waiting on xb_waitq.
It could wait a very long time as domUs are typically a lot faster than
dom0 to boot.

However, if we set HVM_PARAM_STORE_PFN before calling
xs_introduce_domain in init-dom0less, for Linux to see it before
xs_introduce_domain is done, Linux would need to be racing against
init-dom0less. In that case, the wait in xb_init_comms would be minimal
anyway. It shouldn't be a problem. There would be no "hang", just a wait
a bit longer than usual.

Is that right?


Re: [PATCH] arm/xen: Fix refcount leak in xen_dt_guest_init

2022-04-05 Thread Miaoqian Lin
Hi,

On Fri, Mar 11, 2022 at 06:01:11PM -0800, Stefano Stabellini wrote:
> On Wed, 9 Mar 2022, Miaoqian Lin wrote:
> > The of_find_compatible_node() function returns a node pointer with
> > refcount incremented. We should use of_node_put() on it when done.
> > Add the missing of_node_put() to release the refcount.
> > 
> > Fixes: 9b08aaa3199a ("ARM: XEN: Move xen_early_init() before efi_init()")
> > Signed-off-by: Miaoqian Lin 
> 
> Thanks for the patch!
> 
> 
> > ---
> >  arch/arm/xen/enlighten.c | 1 +
> >  1 file changed, 1 insertion(+)
> > 
> > diff --git a/arch/arm/xen/enlighten.c b/arch/arm/xen/enlighten.c
> > index ec5b082f3de6..262f45f686b6 100644
> > --- a/arch/arm/xen/enlighten.c
> > +++ b/arch/arm/xen/enlighten.c
> > @@ -424,6 +424,7 @@ static void __init xen_dt_guest_init(void)
> >  
> > > if (of_address_to_resource(xen_node, GRANT_TABLE_INDEX, &res)) {
> > pr_err("Xen grant table region is not found\n");
> > +   of_node_put(xen_node);
> > return;
> > }
> 
> This is adding a call to of_node_put on the error path. Shouldn't it
> be called also in the non-error path?

You're right. It should be called also in the non-error path.
I made a mistake.
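
One common way to release the reference on both the error and the non-error path is a single exit label. The sketch below uses a hypothetical stand-in type and helpers (fake_node, fake_find_node, fake_node_put) rather than the kernel's device tree API, so it only illustrates the pattern and is not the eventual patch.

```c
#include <stddef.h>

/* Hypothetical stand-in for a refcounted device node. */
struct fake_node {
    int refcount;
};

/* Like a lookup that returns the node with its refcount incremented. */
static struct fake_node *fake_find_node(struct fake_node *n)
{
    n->refcount++;
    return n;
}

/* Drop one reference; a NULL node is ignored. */
static void fake_node_put(struct fake_node *n)
{
    if (n)
        n->refcount--;
}

/* Returns 0 on success and -1 when the (simulated) resource lookup fails. */
static int fake_guest_init(struct fake_node *tree, int lookup_fails)
{
    struct fake_node *node = fake_find_node(tree);
    int ret = 0;

    if (lookup_fails) {
        ret = -1;
        goto out;
    }

    /* ... use the node here ... */

out:
    /* Both paths reach this point, so the reference is always dropped. */
    fake_node_put(node);
    return ret;
}
```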

> Also, there is another instance of of_address_to_resource being called
> in this file (in arch_xen_unpopulated_init), does it make sense to call
> of_node_put there too?

I think so, because the device node pointer np is a local variable,
so the reference it holds should be released within that scope.

I looked through the whole codebase for this usage pattern
($ret=of_find_compatible_node();of_address_to_resource($ret,_,_),
where $ret is a local variable). Most instances call of_node_put() when done.
The documentation of of_find_compatible_node() also says:
> Return: A node pointer with refcount incremented, use
> of_node_put() on it when done.

But I am not sure, since I am unfamiliar with the rest of the code logic.
It would be better if the developers could double-check. I found some
similar cases in arch/arm.



Re: [XEN PATCH] tools/libs/light/libxl_pci.c: explicitly grant access to Intel IGD opregion

2022-04-05 Thread Chuck Zmudzinski




On 4/1/22 9:21 AM, Chuck Zmudzinski wrote:

On 3/30/22 2:45 PM, Jason Andryuk wrote:

On Fri, Mar 18, 2022 at 4:13 AM Jan Beulich  wrote:

On 14.03.2022 04:41, Chuck Zmudzinski wrote:

When gfx_passthru is enabled for the Intel IGD, hvmloader maps the IGD
opregion to the guest but libxl does not grant the guest permission to
access the mapped memory region. This results in a crash of the i915.ko
kernel module in a Linux HVM guest when it needs to access the IGD
opregion:

Oct 23 11:36:33 domU kernel: Call Trace:
Oct 23 11:36:33 domU kernel:  ? idr_alloc+0x39/0x70
Oct 23 11:36:33 domU kernel:  drm_get_last_vbltimestamp+0xaa/0xc0 [drm]
Oct 23 11:36:33 domU kernel:  drm_reset_vblank_timestamp+0x5b/0xd0 [drm]
Oct 23 11:36:33 domU kernel:  drm_crtc_vblank_on+0x7b/0x130 [drm]
Oct 23 11:36:33 domU kernel:  intel_modeset_setup_hw_state+0xbd4/0x1900 [i915]
Oct 23 11:36:33 domU kernel:  ? _cond_resched+0x16/0x40
Oct 23 11:36:33 domU kernel:  ? ww_mutex_lock+0x15/0x80
Oct 23 11:36:33 domU kernel:  intel_modeset_init_nogem+0x867/0x1d30 [i915]
Oct 23 11:36:33 domU kernel:  ? gen6_write32+0x4b/0x1c0 [i915]
Oct 23 11:36:33 domU kernel:  ? intel_irq_postinstall+0xb9/0x670 [i915]
Oct 23 11:36:33 domU kernel:  i915_driver_probe+0x5c2/0xc90 [i915]
Oct 23 11:36:33 domU kernel:  ? vga_switcheroo_client_probe_defer+0x1f/0x40
Oct 23 11:36:33 domU kernel:  ? i915_pci_probe+0x3f/0x150 [i915]
Oct 23 11:36:33 domU kernel:  local_pci_probe+0x42/0x80
Oct 23 11:36:33 domU kernel:  ? _cond_resched+0x16/0x40
Oct 23 11:36:33 domU kernel:  pci_device_probe+0xfd/0x1b0
Oct 23 11:36:33 domU kernel:  really_probe+0x222/0x480
Oct 23 11:36:33 domU kernel:  driver_probe_device+0xe1/0x150
Oct 23 11:36:33 domU kernel:  device_driver_attach+0xa1/0xb0
Oct 23 11:36:33 domU kernel:  __driver_attach+0x8a/0x150
Oct 23 11:36:33 domU kernel:  ? device_driver_attach+0xb0/0xb0
Oct 23 11:36:33 domU kernel:  ? device_driver_attach+0xb0/0xb0
Oct 23 11:36:33 domU kernel:  bus_for_each_dev+0x78/0xc0
Oct 23 11:36:33 domU kernel:  bus_add_driver+0x12b/0x1e0
Oct 23 11:36:33 domU kernel:  driver_register+0x8b/0xe0
Oct 23 11:36:33 domU kernel:  ? 0xc06b8000
Oct 23 11:36:33 domU kernel:  i915_init+0x5d/0x70 [i915]
Oct 23 11:36:33 domU kernel:  do_one_initcall+0x44/0x1d0
Oct 23 11:36:33 domU kernel:  ? do_init_module+0x23/0x260
Oct 23 11:36:33 domU kernel:  ? kmem_cache_alloc_trace+0xf5/0x200
Oct 23 11:36:33 domU kernel:  do_init_module+0x5c/0x260
Oct 23 11:36:33 domU kernel: __do_sys_finit_module+0xb1/0x110
Oct 23 11:36:33 domU kernel:  do_syscall_64+0x33/0x80
Oct 23 11:36:33 domU kernel: entry_SYSCALL_64_after_hwframe+0x44/0xa9

The call trace alone leaves open where exactly the crash occurred.
Looking at 5.17 I notice that the first thing the driver does
after mapping the range is to check the signature (both in
intel_opregion_setup()). As the signature can't possibly match
with no access granted to the underlying mappings, there shouldn't
be any further attempts to use the region in the driver; if there
are, I'd view this as a driver bug.

Yes.  i915_driver_hw_probe does not check the return value of
intel_opregion_setup(dev_priv) and just continues on.
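
The difference between ignoring and propagating a setup failure can be shown with a tiny sketch. The function names below are hypothetical stand-ins, not the i915 functions, and -EINVAL is chosen only for illustration. This is not the attached patch.

```c
#include <errno.h>

/* Hypothetical stand-in for a setup step that validates a signature. */
static int fake_opregion_setup(int signature_ok)
{
    return signature_ok ? 0 : -EINVAL;
}

/*
 * Hypothetical probe step: check the setup result and bail out early,
 * so nothing later touches a region that failed validation.
 */
static int fake_hw_probe(int signature_ok, int *later_steps_ran)
{
    int ret = fake_opregion_setup(signature_ok);

    if (ret)
        return ret;      /* propagate the error to the caller */

    *later_steps_ran = 1; /* only reached when setup succeeded */
    return 0;
}
```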

Chuck, the attached patch may help if you want to test it.

Regards,
Jason


I tested the patch - it made no noticeable difference.


Correction (sorry for the confusion):

I didn't know I needed to replace more than just a
re-built i915.ko module to enable the patch
for testing. When I updated the entire Debian kernel
package including all the modules and the kernel
image with the patched kernel package, it made
quite a difference.

With Jason's patch, the three call traces just became a
much shorter error message:

Apr 05 20:46:18 debian kernel: xen: --> pirq=16 -> irq=24 (gsi=24)
Apr 05 20:46:18 debian kernel: i915 :00:02.0: [drm] VT-d active for gfx access
Apr 05 20:46:18 debian kernel: i915 :00:02.0: vgaarb: deactivate vga console
Apr 05 20:46:18 debian kernel: Console: switching to colour dummy device 80x25
Apr 05 20:46:18 debian kernel: i915 :00:02.0: [drm] DMAR active, disabling use of stolen memory
Apr 05 20:46:18 debian kernel: resource sanity check: requesting [mem 0x-0x11ffe], which spans more than Reserved [mem 0xfdfff000-0x]
Apr 05 20:46:18 debian kernel: caller memremap+0xeb/0x1c0 mapping multiple BARs
Apr 05 20:46:18 debian kernel: i915 :00:02.0: Device initialization failed (-22)
Apr 05 20:46:18 debian kernel: i915 :00:02.0: Please file a bug on drm/i915; see https://gitlab.freedesktop.org/drm/intel/-/wikis/How-to-file-i915-bugs for details.
Apr 05 20:46:18 debian kernel: i915: probe of :00:02.0 failed with error -22

- End of Kernel Error Log --

So I think the patch does propagate the error up the
stack and bails out before producing the Call traces,

and...

I even had output after booting - the gdm3 Gnome display
manager login page displayed, but when I tried to login to
the Gnome 

Re: [PATCH v3 5/5] tools: add example application to initialize dom0less PV drivers

2022-04-05 Thread Stefano Stabellini
On Tue, 5 Apr 2022, Stefano Stabellini wrote:
> On Fri, 1 Apr 2022, Julien Grall wrote:
> > On 01/04/2022 01:35, Stefano Stabellini wrote:
> > > > > > > +
> > > > > > > > +    /* Alloc magic pages */
> > > > > > > > +    if (alloc_magic_pages(info, &dom) != 0) {
> > > > > > > > +        printf("Error on alloc magic pages\n");
> > > > > > > > +        return 1;
> > > > > > > > +    }
> > > > > > > > +
> > > > > > > > +    xc_dom_gnttab_init(&dom);
> > > > > > > 
> > > > > > > This call risks breaking the guest if the dom0 Linux doesn't
> > > > > > > support the acquire interface. This is because it will punch a hole
> > > > > > > in the domain memory where the grant-table may have already been
> > > > > > > mapped.
> > > > > > > 
> > > > > > > Also, this function could fail.
> > > > > 
> > > > > I'll check for return errors. Dom0less is for fully static
> > > > > configurations so I think it is OK to return error and abort if
> > > > > something unexpected happens: dom0less' main reason for being is that
> > > > > there is nothing unexpected :-)
> > > > Does this mean the caller will have to reboot the system if there is an
> > > > error?
> > > > IOW, we don't expect them to call ./init-dom0less twice.
> > > 
> > > Yes, exactly. I think init-dom0less could even panic. My mental model is
> > > that this is an "extension" of construct_domU. Over there we just panic
> > > if something is wrong and here it would be similar. The user provided a
> > > wrong config and should fix it.
> > 
> > Ok. I think we should make explicit how it can be used.
> > 
> > > > > > > +
> > > > > > > > +    libxl_uuid_generate(&uuid);
> > > > > > > > +    xc_domain_sethandle(dom.xch, info->domid, libxl_uuid_bytearray(&uuid));
> > > > > > > > +
> > > > > > > > +    rc = gen_stub_json_config(info->domid, &uuid);
> > > > > > > > +    if (rc)
> > > > > > > > +        err(1, "gen_stub_json_config");
> > > > > > > > +
> > > > > > > > +    rc = restore_xenstore(xsh, info, uuid, dom.xenstore_evtchn);
> > > > > > > > +    if (rc)
> > > > > > > > +        err(1, "writing to xenstore");
> > > > > > > > +
> > > > > > > > +    xs_introduce_domain(xsh, info->domid,
> > > > > > > > +            (GUEST_MAGIC_BASE >> XC_PAGE_SHIFT) + XENSTORE_PFN_OFFSET,
> > > > > > > > +            dom.xenstore_evtchn);
> > > > > > 
> > > > > > > xs_introduce_domain() can technically fail.
> > > > > 
> > > > > OK
> > > > > 
> > > > > 
> > > > > > > > +    return 0;
> > > > > > > > +}
> > > > > > > > +
> > > > > > > > +/* Check if domain has been configured in XS */
> > > > > > > > +static bool domain_exists(struct xs_handle *xsh, int domid)
> > > > > > > > +{
> > > > > > > > +    return xs_is_domain_introduced(xsh, domid);
> > > > > > > +}
> > > > > > 
> > > > > > Would not this lead to initialize a domain with PV driver disabled?
> > > > > 
> > > > > > I am not sure I understood your question, but I'll try to answer
> > > > > > anyway. This check is purely to distinguish dom0less guests, which
> > > > > > need further initialization, from regular guests (e.g. xl guests)
> > > > > > that don't need any actions taken here.
> > > > > 
> > > > > Dom0less domUs can be divided into two categories based on whether
> > > > > they are Xen aware (e.g. xen,enhanced is set).
> > > > > 
> > > > > Looking at this script, it seems to assume that all dom0less domUs are
> > > > > Xen aware. So it will end up allocating the Xenstore ring and calling
> > > > > xs_introduce_domain(). I suspect the call will end up failing because
> > > > > the event channel would be 0.
> > > > > 
> > > > > So did you try to use this script on a platform where there are only
> > > > > Xen aware domUs and/or a mix?
> > > 
> > > Good idea of asking for this test. I thought I already ran that test,
> > > but I did it again to be sure. Everything works OK (although the
> > > xenstore page allocation is unneeded). xs_introduce_domain does not
> > > fail:
> > 
> > Are you sure? If I pass 0 as the 4th argument (event channel), the command
> > will return EINVAL. However, looking at the code you are not checking the
> > return for the call. So you will continue as if it were successful.
> 
> We are not passing 0 as the 4th argument, we are passing the event
> channel previously set as HVM_PARAM_STORE_EVTCHN by Xen:
> 
> > rc = xc_hvm_param_get(xch, info->domid, HVM_PARAM_STORE_EVTCHN,
> >                       &xenstore_evtchn);
> 
> Also in my working version of the series I have a check for the return
> value of xs_introduce_domain (as you requested in one of your previous
> reviews). So xs_introduce_domain is actually working correctly and
> returning success.

Sorry, I didn't read the older messages carefully enough. I re-ran the
tests and I can see the issue you were describing (I am puzzled as to
why I didn't see it before, since I did have a check on the return value
as I wrote; probably a mistake in my setup.)

The problem goes away if we only call xs_introduce_domain for
xen,enhanced domains (when 

Re: [PATCH v3 5/5] tools: add example application to initialize dom0less PV drivers

2022-04-05 Thread Stefano Stabellini
On Fri, 1 Apr 2022, Juergen Gross wrote:
> On 01.04.22 12:02, Julien Grall wrote:
> > Hi Stefano,
> > 
> > On 01/04/2022 01:35, Stefano Stabellini wrote:
> > > > > > > +
> > > > > > > +    /* Alloc magic pages */
> > > > > > > +    if (alloc_magic_pages(info, ) != 0) {
> > > > > > > +    printf("Error on alloc magic pages\n");
> > > > > > > +    return 1;
> > > > > > > +    }
> > > > > > > +
> > > > > > > +    xc_dom_gnttab_init();
> > > > > > 
> > > > > > This call as the risk to break the guest if the dom0 Linux doesn't
> > > > > > support
> > > > > > the
> > > > > > acquire interface. This is because it will punch a hole in the
> > > > > > domain
> > > > > > memory
> > > > > > where the grant-table may have already been mapped.
> > > > > > 
> > > > > > Also, this function could fails.
> > > > > 
> > > > > I'll check for return errors. Dom0less is for fully static
> > > > > configurations so I think it is OK to return error and abort if
> > > > > something unexpected happens: dom0less' main reason for being is that
> > > > > there is nothing unexpected :-)
> > > > Does this mean the caller will have to reboot the system if there is an
> > > > error?
> > > > IOW, we don't expect them to call ./init-dom0less twice.
> > > 
> > > Yes, exactly. I think init-dom0less could even panic. My mental model is
> > > that this is an "extension" of construct_domU. Over there we just panic
> > > if something is wrong and here it would be similar. The user provided a
> > > wrong config and should fix it.
> > 
> > Ok. I think we should make explicit how it can be used.
> > 
> > > > > > > +
> > > > > > > +    libxl_uuid_generate();
> > > > > > > +    xc_domain_sethandle(dom.xch, info->domid,
> > > > > > > libxl_uuid_bytearray());
> > > > > > > +
> > > > > > > +    rc = gen_stub_json_config(info->domid, );
> > > > > > > +    if (rc)
> > > > > > > +    err(1, "gen_stub_json_config");
> > > > > > > +
> > > > > > > +    rc = restore_xenstore(xsh, info, uuid, dom.xenstore_evtchn);
> > > > > > > +    if (rc)
> > > > > > > +    err(1, "writing to xenstore");
> > > > > > > +
> > > > > > > +    xs_introduce_domain(xsh, info->domid,
> > > > > > > +    (GUEST_MAGIC_BASE >> XC_PAGE_SHIFT) +
> > > > > > > XENSTORE_PFN_OFFSET,
> > > > > > > +    dom.xenstore_evtchn);
> > > > > > 
> > > > > > > xs_introduce_domain() can technically fail.
> > > > > 
> > > > > OK
> > > > > 
> > > > > 
> > > > > > > +    return 0;
> > > > > > > +}
> > > > > > > +
> > > > > > > +/* Check if domain has been configured in XS */
> > > > > > > +static bool domain_exists(struct xs_handle *xsh, int domid)
> > > > > > > +{
> > > > > > > +    return xs_is_domain_introduced(xsh, domid);
> > > > > > > +}
> > > > > > 
> > > > > > Would not this lead to initialize a domain with PV driver disabled?
> > > > > 
> > > > > > I am not sure I understood your question, but I'll try to answer
> > > > > > anyway. This check is purely to distinguish dom0less guests, which
> > > > > > need further initialization, from regular guests (e.g. xl guests)
> > > > > > that don't need any actions taken here.
> > > > > 
> > > > > Dom0less domUs can be divided into two categories based on whether
> > > > > they are Xen aware (e.g. xen,enhanced is set).
> > > > > 
> > > > > Looking at this script, it seems to assume that all dom0less domUs are
> > > > > Xen aware. So it will end up allocating the Xenstore ring and calling
> > > > > xs_introduce_domain(). I suspect the call will end up failing because
> > > > > the event channel would be 0.
> > > > > 
> > > > > So did you try to use this script on a platform where there are only
> > > > > Xen aware domUs and/or a mix?
> > > 
> > > Good idea of asking for this test. I thought I already ran that test,
> > > but I did it again to be sure. Everything works OK (although the
> > > xenstore page allocation is unneeded). xs_introduce_domain does not
> > > > fail:
> > 
> > Are you sure? If I pass 0 as the 4th argument (event channel), the command
> > will return EINVAL. However, looking at the code you are not checking the
> > return for the call. So you will continue as if it were successful.
> > 
> > So you will end up to write nodes for a domain Xenstored is not aware and
> > also set HVM_PARAM_STORE_PFN which may further confuse the guest as it may
> > try to initialize Xenstored it discovers the page.
> > 
> > > I think that's because it is usually called on all domains by the
> > > toolstack, even the ones without xenstore support in the kernel.
> > 
> > The toolstack will always allocate the event channel irrespective to whether
> > the guest will use Xenstore. So both the shared page and the event channel
> > are always valid today.
> > 
> > With your series, this will change as the event channel will not be
> > allocated when "xen,enhanced" is not set.
> > 
> > In your case, I think we may want to register the domain to xenstore but say
> > there are no connection available for the domain. Juergen, 

Re: [PATCH v3 5/5] tools: add example application to initialize dom0less PV drivers

2022-04-05 Thread Stefano Stabellini
On Fri, 1 Apr 2022, Julien Grall wrote:
> On 01/04/2022 01:35, Stefano Stabellini wrote:
> > > > > > +
> > > > > > +/* Alloc magic pages */
> > > > > > +if (alloc_magic_pages(info, ) != 0) {
> > > > > > +printf("Error on alloc magic pages\n");
> > > > > > +return 1;
> > > > > > +}
> > > > > > +
> > > > > > +xc_dom_gnttab_init();
> > > > > 
> > > > > This call as the risk to break the guest if the dom0 Linux doesn't
> > > > > support
> > > > > the
> > > > > acquire interface. This is because it will punch a hole in the domain
> > > > > memory
> > > > > where the grant-table may have already been mapped.
> > > > > 
> > > > > Also, this function could fails.
> > > > 
> > > > I'll check for return errors. Dom0less is for fully static
> > > > configurations so I think it is OK to return error and abort if
> > > > something unexpected happens: dom0less' main reason for being is that
> > > > there is nothing unexpected :-)
> > > Does this mean the caller will have to reboot the system if there is an
> > > error?
> > > IOW, we don't expect them to call ./init-dom0less twice.
> > 
> > Yes, exactly. I think init-dom0less could even panic. My mental model is
> > that this is an "extension" of construct_domU. Over there we just panic
> > if something is wrong and here it would be similar. The user provided a
> > wrong config and should fix it.
> 
> Ok. I think we should make explicit how it can be used.
> 
> > > > > > +
> > > > > > +libxl_uuid_generate();
> > > > > > +xc_domain_sethandle(dom.xch, info->domid,
> > > > > > libxl_uuid_bytearray());
> > > > > > +
> > > > > > +rc = gen_stub_json_config(info->domid, );
> > > > > > +if (rc)
> > > > > > +err(1, "gen_stub_json_config");
> > > > > > +
> > > > > > +rc = restore_xenstore(xsh, info, uuid, dom.xenstore_evtchn);
> > > > > > +if (rc)
> > > > > > +err(1, "writing to xenstore");
> > > > > > +
> > > > > > +xs_introduce_domain(xsh, info->domid,
> > > > > > +(GUEST_MAGIC_BASE >> XC_PAGE_SHIFT) +
> > > > > > XENSTORE_PFN_OFFSET,
> > > > > > +dom.xenstore_evtchn);
> > > > > 
> > > > > xs_introduce_domain() can technically fails.
> > > > 
> > > > OK
> > > > 
> > > > 
> > > > > > +return 0;
> > > > > > +}
> > > > > > +
> > > > > > +/* Check if domain has been configured in XS */
> > > > > > +static bool domain_exists(struct xs_handle *xsh, int domid)
> > > > > > +{
> > > > > > +return xs_is_domain_introduced(xsh, domid);
> > > > > > +}
> > > > > 
> > > > > Would not this lead to initialize a domain with PV driver disabled?
> > > > 
> > > > I am not sure I understood your question, but I'll try to answer anyway.
> > > > This check is purely to distinguish dom0less guests, which needs further
> > > > initializations, from regular guests (e.g. xl guests) that don't need
> > > > any actions taken here.
> > > 
> > > Dom0less domUs can be divided in two categories based on whether they are
> > > xen
> > > aware (e.g. xen,enhanced is set).
> > > 
> > > Looking at this script, it seems to assume that all dom0less domUs are Xen
> > > aware. So it will end up to allocate Xenstore ring and call
> > > xs_introduce_domain(). I suspect the call will end up to fail because the
> > > event channel would be 0.
> > > 
> > > So did you try to use this script on a platform where there only xen aware
> > > domU and/or a mix?
> > 
> > Good idea of asking for this test. I thought I already ran that test,
> > but I did it again to be sure. Everything works OK (although the
> > xenstore page allocation is unneeded). xs_introduce_domain does not
> > fail:
> 
> Are you sure? If I pass 0 as the 4th argument (event channel), the command
> will return EINVAL. However, looking at the code you are not checking the
> return for the call. So you will continue as if it were successful.

We are not passing 0 as the 4th argument, we are passing the event
channel previously set as HVM_PARAM_STORE_EVTCHN by Xen:

rc = xc_hvm_param_get(xch, info->domid, HVM_PARAM_STORE_EVTCHN,
  _evtchn);

Also in my working version of the series I have a check for the return
value of xs_introduce_domain (as you requested in one of your previous
reviews). So xs_introduce_domain is actually working correctly and
returning success.
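
For readers following the thread, the pattern under discussion (check the result of the introduce call instead of carrying on regardless) can be sketched as below. This is only an illustration: fake_introduce_domain() is a hypothetical stand-in that models the behaviour Julien describes (event channel 0 rejected with EINVAL); it is not the libxenstore implementation, and introduce_checked() is a made-up wrapper name.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdio.h>

/*
 * Hypothetical stand-in for xs_introduce_domain(). It only models the
 * behaviour discussed in this thread (event channel 0 is rejected with
 * EINVAL) and does not talk to xenstored.
 */
static bool fake_introduce_domain(unsigned int domid, unsigned long pfn,
                                  unsigned int evtchn)
{
    (void)domid;
    (void)pfn;

    if (evtchn == 0) {
        errno = EINVAL;
        return false;
    }
    return true;
}

/* Returns 0 on success, or a positive errno value on failure. */
static int introduce_checked(unsigned int domid, unsigned long pfn,
                             unsigned int evtchn)
{
    if (!fake_introduce_domain(domid, pfn, evtchn)) {
        int saved = errno;  /* save before fprintf() can clobber it */

        fprintf(stderr, "introduce domain %u failed: errno %d\n",
                domid, saved);
        return saved;
    }
    return 0;
}
```

With a valid event channel the wrapper returns 0; with event channel 0 it reports the failure and returns EINVAL, so the caller can stop before writing any further Xenstore nodes for that domain.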



Re: [PATCH v3 13/19] xen/arm: Move fixmap definitions in a separate header

2022-04-05 Thread Stefano Stabellini
On Tue, 5 Apr 2022, Julien Grall wrote:
> On 05/04/2022 22:12, Stefano Stabellini wrote:
> > > +/* Map a page in a fixmap entry */
> > > +extern void set_fixmap(unsigned map, mfn_t mfn, unsigned attributes);
> > > +/* Remove a mapping from a fixmap entry */
> > > +extern void clear_fixmap(unsigned map);
> > > +
> > > +#endif /* __ASSEMBLY__ */
> > > +
> > > +#endif /* __ASM_FIXMAP_H */
> > 
> > 
> > It is a good idea to create fixmap.h, but I think it should be acpi.h to
> > include fixmap.h, not the other way around.
> 
> As I wrote in the commit message, one definition in fixmap.h rely on define
> from acpi.h (i.e NUM_FIXMAP_ACPI_PAGES). So if we don't include it, then user
> of FIXMAP_PMAP_BEGIN (see next patch) will requires to include acpi.h in order
> to build.
> 
> Re-ordering the values would not help because the problem would exactly be the
> same but this time the acpi users would have to include pmap.h to define
> NUM_FIX_PMAP.
> 
> > 
> > The appended changes build correctly on top of this patch.
> 
> That's expected because all the users of FIXMAP_ACPI_END will be including
> acpi.h. But after the next patch, we would need pmap.c to include acpi.h.
> 
> I don't think this would be right (and quite likely you would ask why
> this is done). Hence this approach.


To be clear, I see your point and I don't feel very strongly either
way. In my opinion the fixmap is the low-level "library" that others
make use of, so it should be acpi.h and pmap.h (the clients of the
library) that include fixmap.h, not the other way around.

So I would rather define NUM_FIXMAP_ACPI_PAGES and NUM_FIX_PMAP in
fixmap.h, then have both pmap.h and acpi.h include fixmap.h. It makes
more sense to me. However, I won't insist if you don't like it. Rough
patch below for reference.



diff --git a/xen/arch/arm/include/asm/fixmap.h 
b/xen/arch/arm/include/asm/fixmap.h
index c46a15e59d..a231ebfe25 100644
--- a/xen/arch/arm/include/asm/fixmap.h
+++ b/xen/arch/arm/include/asm/fixmap.h
@@ -4,8 +4,13 @@
 #ifndef __ASM_FIXMAP_H
 #define __ASM_FIXMAP_H
 
-#include 
-#include 
+#include 
+#include 
+
+#define NUM_FIXMAP_ACPI_PAGES  64
+
+/* Large enough for mapping 5 levels of page tables with some headroom */
+#define NUM_FIX_PMAP 8
 
 /* Fixmap slots */
 #define FIXMAP_CONSOLE  0  /* The primary UART */
@@ -22,6 +27,10 @@
 
 #ifndef __ASSEMBLY__
 
+#include 
+
+extern lpae_t xen_fixmap[XEN_PT_LPAE_ENTRIES];
+
 /* Map a page in a fixmap entry */
 extern void set_fixmap(unsigned map, mfn_t mfn, unsigned attributes);
 /* Remove a mapping from a fixmap entry */
diff --git a/xen/arch/arm/include/asm/pmap.h b/xen/arch/arm/include/asm/pmap.h
index 70eafe2891..31d29e021d 100644
--- a/xen/arch/arm/include/asm/pmap.h
+++ b/xen/arch/arm/include/asm/pmap.h
@@ -2,9 +2,8 @@
 #define __ASM_PMAP_H__
 
 #include 
+#include 
 
-/* XXX: Find an header to declare it */
-extern lpae_t xen_fixmap[XEN_PT_LPAE_ENTRIES];
 
 static inline void arch_pmap_map(unsigned int slot, mfn_t mfn)
 {
diff --git a/xen/include/xen/acpi.h b/xen/include/xen/acpi.h
index 1b9c75e68f..afcc9d5b4f 100644
--- a/xen/include/xen/acpi.h
+++ b/xen/include/xen/acpi.h
@@ -28,12 +28,7 @@
 #define _LINUX
 #endif
 
-/*
- * Fixmap pages to reserve for ACPI boot-time tables (see
- * arch/x86/include/asm/fixmap.h or arch/arm/include/asm/fixmap.h),
- * 64 pages(256KB) is large enough for most cases.)
- */
-#define NUM_FIXMAP_ACPI_PAGES  64
+#include 
 
 #ifndef __ASSEMBLY__
 
diff --git a/xen/include/xen/pmap.h b/xen/include/xen/pmap.h
index 93e61b1087..aa892154c0 100644
--- a/xen/include/xen/pmap.h
+++ b/xen/include/xen/pmap.h
@@ -1,9 +1,6 @@
 #ifndef __XEN_PMAP_H__
 #define __XEN_PMAP_H__
 
-/* Large enough for mapping 5 levels of page tables with some headroom */
-#define NUM_FIX_PMAP 8
-
 #ifndef __ASSEMBLY__
 
 #include 
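
The dependency both sides are arguing about can be seen in a single-file sketch. This is an assumption-laden illustration, not the header from the series: the two NUM_* values and FIXMAP_CONSOLE are taken from the patch text above, while the remaining slot numbers and the helper fixmap_last_slot() are hypothetical.

```c
/* Hypothetical single-file stand-in for the proposed fixmap.h layout. */
#define NUM_FIXMAP_ACPI_PAGES 64  /* value from the rough patch */
#define NUM_FIX_PMAP          8   /* value from the rough patch */

/* Slot numbering other than FIXMAP_CONSOLE is illustrative only. */
#define FIXMAP_CONSOLE     0
#define FIXMAP_MISC        1
#define FIXMAP_ACPI_BEGIN  2
#define FIXMAP_ACPI_END    (FIXMAP_ACPI_BEGIN + NUM_FIXMAP_ACPI_PAGES - 1)
#define FIXMAP_PMAP_BEGIN  (FIXMAP_ACPI_END + 1)
#define FIXMAP_PMAP_END    (FIXMAP_PMAP_BEGIN + NUM_FIX_PMAP - 1)

/* Hypothetical helper: index of the last fixmap slot in this layout. */
static unsigned int fixmap_last_slot(void)
{
    return FIXMAP_PMAP_END;
}
```

Because FIXMAP_PMAP_BEGIN expands through FIXMAP_ACPI_END, any user of the PMAP slots needs NUM_FIXMAP_ACPI_PAGES to be visible. Either fixmap.h pulls in the headers that define the counts, or the counts live in fixmap.h itself, which is the alternative the rough patch shows.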



Re: [PATCH v3 17/19] xen/arm64: mm: Add memory to the boot allocator first

2022-04-05 Thread Stefano Stabellini
On Tue, 5 Apr 2022, Julien Grall wrote:
> On 05/04/2022 22:50, Stefano Stabellini wrote:
> > > +static void __init setup_mm(void)
> > > +{
> > > +const struct meminfo *banks = 
> > > +paddr_t ram_start = ~0;
> > > +paddr_t ram_end = 0;
> > > +paddr_t ram_size = 0;
> > > +unsigned int i;
> > > +
> > > +init_pdx();
> > > +
> > > +/*
> > > + * We need some memory to allocate the page-tables used for the
> > > xenheap
> > > + * mappings. But some regions may contain memory already allocated
> > > + * for other uses (e.g. modules, reserved-memory...).
> > > + *
> > > + * For simplify add all the free regions in the boot allocator.
> > > + */
> > 
> > We currently have:
> > 
> > BUG_ON(nr_bootmem_regions == (PAGE_SIZE / sizeof(struct bootmem_region)));
> 
> This has enough space for 256 distinct regions on arm64 (512 regions on
> arm32).
> 
> > 
> > Do you think we should check for the limit in populate_boot_allocator?
> 
> This patch doesn't change the number of regions added to the boot allocator.
> So if we need to check the limit then I would rather deal separately (see more
> below).
> 
> > Or there is no need because it is unrealistic to reach it?
> I can't say never because history told us on some UEFI systems, there will be
> a large number of regions exposed. I haven't heard anyone that would hit the
> BUG_ON().
> 
> The problem is what do we do if we hit the limit? We could ignore all the
> regions after. However, there are potentially a risk there would not be enough
> memory to cover the boot memory allocation (regions may be really small).
> 
> So if we ever hit the limit, then I think we should update the boot allocator.

OK, thanks for the explanation.

Reviewed-by: Stefano Stabellini 
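
As a worked check of the 256/512 figures quoted above, here is a small sketch of the arithmetic behind the BUG_ON() limit. It assumes 4KB pages and that a bootmem region is a pair of unsigned long values (start and end); the struct and macro names are made up for the example.

```c
#include <stddef.h>
#include <stdint.h>

#define EXAMPLE_PAGE_SIZE 4096u  /* assumes 4KB pages */

/* Assumed layout: two words per region (start and end). */
struct region64 { uint64_t s, e; };  /* 64-bit unsigned long (arm64) */
struct region32 { uint32_t s, e; };  /* 32-bit unsigned long (arm32) */

/* Number of regions that fit in the single page backing the array. */
static size_t max_regions(size_t region_size)
{
    return EXAMPLE_PAGE_SIZE / region_size;
}
```

Under these assumptions a 16-byte region gives 4096 / 16 = 256 entries and an 8-byte region gives 4096 / 8 = 512, matching the numbers in the reply.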



Re: [PATCH v3 19/19] xen/arm: mm: Re-implement setup_frame_table_mappings() with map_pages_to_xen()

2022-04-05 Thread Stefano Stabellini
On Mon, 21 Feb 2022, Julien Grall wrote:
> From: Julien Grall 
> 
> Now that map_pages_to_xen() has been extended to support 2MB mappings,
> we can replace the create_mappings() call by map_pages_to_xen() call.
> 
> This has the advantage to remove the differences between 32-bit and
> 64-bit code.
> 
> Lastly remove create_mappings() as there is no more callers.
> 
> Signed-off-by: Julien Grall 
> Signed-off-by: Julien Grall 
> 
> ---
> Changes in v3:
> - Fix typo in the commit message
> - Remove the TODO regarding contiguous bit
> 
> Changes in v2:
> - New patch
> ---
>  xen/arch/arm/mm.c | 63 ---
>  1 file changed, 5 insertions(+), 58 deletions(-)
> 
> diff --git a/xen/arch/arm/mm.c b/xen/arch/arm/mm.c
> index 4af59375d998..d73f49d5b6fc 100644
> --- a/xen/arch/arm/mm.c
> +++ b/xen/arch/arm/mm.c
> @@ -354,40 +354,6 @@ void clear_fixmap(unsigned map)
>  BUG_ON(res != 0);
>  }
>  
> -/* Create Xen's mappings of memory.
> - * Mapping_size must be either 2MB or 32MB.
> - * Base and virt must be mapping_size aligned.
> - * Size must be a multiple of mapping_size.
> - * second must be a contiguous set of second level page tables
> - * covering the region starting at virt_offset. */
> -static void __init create_mappings(lpae_t *second,
> -   unsigned long virt_offset,
> -   unsigned long base_mfn,
> -   unsigned long nr_mfns,
> -   unsigned int mapping_size)
> -{
> -unsigned long i, count;
> -const unsigned long granularity = mapping_size >> PAGE_SHIFT;
> -lpae_t pte, *p;
> -
> -ASSERT((mapping_size == MB(2)) || (mapping_size == MB(32)));
> -ASSERT(!((virt_offset >> PAGE_SHIFT) % granularity));
> -ASSERT(!(base_mfn % granularity));
> -ASSERT(!(nr_mfns % granularity));
> -
> -count = nr_mfns / XEN_PT_LPAE_ENTRIES;
> -p = second + second_linear_offset(virt_offset);
> -pte = mfn_to_xen_entry(_mfn(base_mfn), MT_NORMAL);
> -if ( granularity == 16 * XEN_PT_LPAE_ENTRIES )
> -pte.pt.contig = 1;  /* These maps are in 16-entry contiguous chunks. 
> */
> -for ( i = 0; i < count; i++ )
> -{
> -write_pte(p + i, pte);
> -pte.pt.base += 1 << XEN_PT_LPAE_SHIFT;
> -}
> -flush_xen_tlb_local();
> -}
> -
>  #ifdef CONFIG_DOMAIN_PAGE
>  void *map_domain_page_global(mfn_t mfn)
>  {
> @@ -846,36 +812,17 @@ void __init setup_frametable_mappings(paddr_t ps, 
> paddr_t pe)
>  unsigned long frametable_size = nr_pdxs * sizeof(struct page_info);
>  mfn_t base_mfn;
>  const unsigned long mapping_size = frametable_size < MB(32) ? MB(2) : 
> MB(32);
> -#ifdef CONFIG_ARM_64
> -lpae_t *second, pte;
> -unsigned long nr_second;
> -mfn_t second_base;
> -int i;
> -#endif
> +int rc;
>  
>  frametable_base_pdx = mfn_to_pdx(maddr_to_mfn(ps));
>  /* Round up to 2M or 32M boundary, as appropriate. */
>  frametable_size = ROUNDUP(frametable_size, mapping_size);
>  base_mfn = alloc_boot_pages(frametable_size >> PAGE_SHIFT, 32<<(20-12));
>  
> -#ifdef CONFIG_ARM_64
> -/* Compute the number of second level pages. */
> -nr_second = ROUNDUP(frametable_size, FIRST_SIZE) >> FIRST_SHIFT;
> -second_base = alloc_boot_pages(nr_second, 1);
> -second = mfn_to_virt(second_base);
> -for ( i = 0; i < nr_second; i++ )
> -{
> -clear_page(mfn_to_virt(mfn_add(second_base, i)));
> -pte = mfn_to_xen_entry(mfn_add(second_base, i), MT_NORMAL);
> -pte.pt.table = 1;
> -write_pte(_first[first_table_offset(FRAMETABLE_VIRT_START)+i], 
> pte);
> -}
> -create_mappings(second, 0, mfn_x(base_mfn), frametable_size >> 
> PAGE_SHIFT,
> -mapping_size);
> -#else
> -create_mappings(xen_second, FRAMETABLE_VIRT_START, mfn_x(base_mfn),
> -frametable_size >> PAGE_SHIFT, mapping_size);
> -#endif
> +rc = map_pages_to_xen(FRAMETABLE_VIRT_START, base_mfn,
> +  frametable_size >> PAGE_SHIFT, PAGE_HYPERVISOR_RW);

Doesn't it need to be PAGE_HYPERVISOR_RW | _PAGE_BLOCK ?


> +if ( rc )
> +panic("Unable to setup the frametable mappings.\n");
>  
>  memset(_table[0], 0, nr_pdxs * sizeof(struct page_info));
>  memset(_table[nr_pdxs], -1,
> -- 
> 2.32.0
> 



Re: [PATCH v3 18/19] xen/arm: mm: Rework setup_xenheap_mappings()

2022-04-05 Thread Stefano Stabellini
On Mon, 21 Feb 2022, Julien Grall wrote:
> From: Julien Grall 
> 
> The current implementation of setup_xenheap_mappings() is using 1GB
> mappings. This can lead to unexpected result because the mapping
> may alias a non-cachable region (such as device or reserved regions).
> For more details see B2.8 in ARM DDI 0487H.a.
> 
> map_pages_to_xen() was recently reworked to allow superpage mappings,
> support contiguous mapping and deal with the use of pagge-tables before

pagetables


> they are mapped.
> 
> Most of the code in setup_xenheap_mappings() is now replaced with a
> single call to map_pages_to_xen().
> 
> Signed-off-by: Julien Grall 
> Signed-off-by: Julien Grall 
> 
> ---
> Changes in v3:
> - Don't use 1GB mapping
> - Re-order code in setup_mm() in a separate patch
> 
> Changes in v2:
> - New patch
> ---
>  xen/arch/arm/mm.c | 87 ++-
>  1 file changed, 18 insertions(+), 69 deletions(-)

Very good!



> diff --git a/xen/arch/arm/mm.c b/xen/arch/arm/mm.c
> index 11b6b60a2bc1..4af59375d998 100644
> --- a/xen/arch/arm/mm.c
> +++ b/xen/arch/arm/mm.c
> @@ -138,17 +138,6 @@ static DEFINE_PAGE_TABLE(cpu0_pgtable);
>  static DEFINE_PAGE_TABLES(cpu0_dommap, DOMHEAP_SECOND_PAGES);
>  #endif
>  
> -#ifdef CONFIG_ARM_64
> -/* The first page of the first level mapping of the xenheap. The
> - * subsequent xenheap first level pages are dynamically allocated, but
> - * we need this one to bootstrap ourselves. */
> -static DEFINE_PAGE_TABLE(xenheap_first_first);
> -/* The zeroeth level slot which uses xenheap_first_first. Used because
> - * setup_xenheap_mappings otherwise relies on mfn_to_virt which isn't
> - * valid for a non-xenheap mapping. */
> -static __initdata int xenheap_first_first_slot = -1;
> -#endif
> -
>  /* Common pagetable leaves */
>  /* Second level page tables.
>   *
> @@ -815,77 +804,37 @@ void __init setup_xenheap_mappings(unsigned long 
> base_mfn,
>  void __init setup_xenheap_mappings(unsigned long base_mfn,
> unsigned long nr_mfns)
>  {
> -lpae_t *first, pte;
> -unsigned long mfn, end_mfn;
> -vaddr_t vaddr;
> -
> -/* Align to previous 1GB boundary */
> -mfn = base_mfn & ~((FIRST_SIZE>>PAGE_SHIFT)-1);
> +int rc;
>  
>  /* First call sets the xenheap physical and virtual offset. */
>  if ( mfn_eq(xenheap_mfn_start, INVALID_MFN) )
>  {
> +unsigned long mfn_gb = base_mfn & ~((FIRST_SIZE >> PAGE_SHIFT) - 1);
> +
>  xenheap_mfn_start = _mfn(base_mfn);
>  xenheap_base_pdx = mfn_to_pdx(_mfn(base_mfn));
> +/*
> + * The base address may not be aligned to the first level
> + * size (e.g. 1GB when using 4KB pages). This would prevent
> + * superpage mappings for all the regions because the virtual
> + * address and machine address should both be suitably aligned.
> + *
> + * Prevent that by offsetting the start of the xenheap virtual
> + * address.
> + */
>  xenheap_virt_start = DIRECTMAP_VIRT_START +
> -(base_mfn - mfn) * PAGE_SIZE;
> +(base_mfn - mfn_gb) * PAGE_SIZE;
>  }

[...]

> +rc = map_pages_to_xen((vaddr_t)__mfn_to_virt(base_mfn),
> +  _mfn(base_mfn), nr_mfns,
> +  PAGE_HYPERVISOR_RW | _PAGE_BLOCK);
> +if ( rc )
> +panic("Unable to setup the xenheap mappings.\n");


I understand the intent of the code and I like it. maddr_to_virt is
implemented as:

return (void *)(XENHEAP_VIRT_START -
(xenheap_base_pdx << PAGE_SHIFT) +
((ma & ma_va_bottom_mask) |
 ((ma & ma_top_mask) >> pfn_pdx_hole_shift)));

The PDX stuff is always difficult to follow and I cannot claim that I
traced through exactly what the resulting virtual address in the mapping
would be for a given base_mfn, but the patch looks correct compared to
the previous code.
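
To make the offsetting in the quoted hunk a little more concrete, here is a minimal sketch of the xenheap_virt_start computation. It assumes 4KB pages and a 1GB first-level size; EX_DIRECTMAP_VIRT_START is a placeholder value chosen only because it is 1GB-aligned, and the function name is hypothetical. It deliberately ignores the PDX compression done by maddr_to_virt().

```c
#include <stdint.h>

#define EX_PAGE_SHIFT   12                        /* 4KB pages */
#define EX_PAGE_SIZE    (UINT64_C(1) << EX_PAGE_SHIFT)
#define EX_FIRST_SIZE   (UINT64_C(1) << 30)       /* 1GB first-level entry */
/* Assumed, 1GB-aligned placeholder for DIRECTMAP_VIRT_START. */
#define EX_DIRECTMAP_VIRT_START (UINT64_C(256) << 39)

/* Virtual start of the xenheap for a bank starting at base_mfn. */
static uint64_t xenheap_virt_start_for(uint64_t base_mfn)
{
    /* Round the MFN down to the previous 1GB boundary. */
    uint64_t mfn_gb = base_mfn & ~((EX_FIRST_SIZE >> EX_PAGE_SHIFT) - 1);

    /* Keep the same offset within the 1GB region as the machine address. */
    return EX_DIRECTMAP_VIRT_START + (base_mfn - mfn_gb) * EX_PAGE_SIZE;
}
```

For a bank starting at machine address 0x40200000 (1GB + 2MB), base_mfn is 0x40200, mfn_gb is 0x40000, and the virtual start is the direct map base plus 0x200000. The virtual and machine addresses then share the same offset within a 1GB region, which is what allows superpage mappings to be used.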


Reviewed-by: Stefano Stabellini 



[xen-unstable test] 169172: tolerable FAIL

2022-04-05 Thread osstest service owner
flight 169172 xen-unstable real [real]
http://logs.test-lab.xenproject.org/osstest/logs/169172/

Failures :-/ but no regressions.

Tests which are failing intermittently (not blocking):
 test-amd64-i386-freebsd10-amd64 19 guest-localmigrate/x10 fail pass in 169163
 test-amd64-amd64-xl-qemut-debianhvm-i386-xsm 12 debian-hvm-install fail pass in 169163

Tests which did not succeed, but are not blocking:
 test-amd64-amd64-xl-qemut-win7-amd64 19 guest-stop fail like 169163
 test-armhf-armhf-libvirt 16 saverestore-support-check fail like 169163
 test-amd64-amd64-qemuu-nested-amd 20 debian-hvm-install/l1/l2 fail like 169163
 test-amd64-amd64-xl-qemuu-ws16-amd64 19 guest-stop fail like 169163
 test-amd64-i386-xl-qemut-ws16-amd64 19 guest-stop fail like 169163
 test-amd64-i386-xl-qemut-win7-amd64 19 guest-stop fail like 169163
 test-armhf-armhf-libvirt-qcow2 15 saverestore-support-check fail like 169163
 test-armhf-armhf-libvirt-raw 15 saverestore-support-check fail like 169163
 test-amd64-i386-xl-qemuu-win7-amd64 19 guest-stop fail like 169163
 test-amd64-amd64-xl-qemut-ws16-amd64 19 guest-stop fail like 169163
 test-amd64-i386-xl-qemuu-ws16-amd64 19 guest-stop fail like 169163
 test-amd64-amd64-xl-qemuu-win7-amd64 19 guest-stop fail like 169163
 test-arm64-arm64-xl-seattle 15 migrate-support-check fail never pass
 test-arm64-arm64-xl-seattle 16 saverestore-support-check fail never pass
 test-amd64-amd64-libvirt-xsm 15 migrate-support-check fail never pass
 test-amd64-i386-libvirt-xsm 15 migrate-support-check fail never pass
 test-amd64-i386-libvirt 15 migrate-support-check fail never pass
 test-amd64-i386-xl-pvshim 14 guest-start fail never pass
 test-amd64-amd64-libvirt 15 migrate-support-check fail never pass
 test-arm64-arm64-xl 15 migrate-support-check fail never pass
 test-arm64-arm64-xl 16 saverestore-support-check fail never pass
 test-arm64-arm64-libvirt-xsm 15 migrate-support-check fail never pass
 test-arm64-arm64-libvirt-xsm 16 saverestore-support-check fail never pass
 test-arm64-arm64-xl-credit1 15 migrate-support-check fail never pass
 test-arm64-arm64-xl-credit1 16 saverestore-support-check fail never pass
 test-arm64-arm64-xl-thunderx 15 migrate-support-check fail never pass
 test-arm64-arm64-xl-thunderx 16 saverestore-support-check fail never pass
 test-arm64-arm64-xl-xsm 15 migrate-support-check fail never pass
 test-arm64-arm64-xl-xsm 16 saverestore-support-check fail never pass
 test-amd64-amd64-libvirt-qemuu-debianhvm-amd64-xsm 13 migrate-support-check fail never pass
 test-amd64-i386-libvirt-qemuu-debianhvm-amd64-xsm 13 migrate-support-check fail never pass
 test-amd64-i386-libvirt-raw 14 migrate-support-check fail never pass
 test-arm64-arm64-libvirt-raw 14 migrate-support-check fail never pass
 test-arm64-arm64-libvirt-raw 15 saverestore-support-check fail never pass
 test-armhf-armhf-xl-multivcpu 15 migrate-support-check fail never pass
 test-armhf-armhf-xl-multivcpu 16 saverestore-support-check fail never pass
 test-amd64-amd64-libvirt-vhd 14 migrate-support-check fail never pass
 test-arm64-arm64-xl-vhd 14 migrate-support-check fail never pass
 test-arm64-arm64-xl-vhd 15 saverestore-support-check fail never pass
 test-armhf-armhf-libvirt 15 migrate-support-check fail never pass
 test-armhf-armhf-xl-credit2 15 migrate-support-check fail never pass
 test-armhf-armhf-xl-credit2 16 saverestore-support-check fail never pass
 test-armhf-armhf-xl-rtds 15 migrate-support-check fail never pass
 test-armhf-armhf-xl-rtds 16 saverestore-support-check fail never pass
 test-arm64-arm64-xl-credit2 15 migrate-support-check fail never pass
 test-arm64-arm64-xl-credit2 16 saverestore-support-check fail never pass
 test-armhf-armhf-libvirt-qcow2 14 migrate-support-check fail never pass
 test-armhf-armhf-xl-arndale 15 migrate-support-check fail never pass
 test-armhf-armhf-xl-arndale 16 saverestore-support-check fail never pass
 test-armhf-armhf-libvirt-raw 14 migrate-support-check fail never pass
 test-armhf-armhf-xl-vhd 14 migrate-support-check fail never pass
 test-armhf-armhf-xl-vhd 15 saverestore-support-check fail never pass
 test-armhf-armhf-xl-credit1 15 migrate-support-check fail never pass
 test-armhf-armhf-xl-credit1 16 saverestore-support-check fail never pass
 test-armhf-armhf-xl 15 migrate-support-check fail never pass
 test-armhf-armhf-xl 16 saverestore-support-check fail never pass
 test-armhf-armhf-xl-cubietruck 15 migrate-support-check fail never pass
 

Re: [PATCH v3 17/19] xen/arm64: mm: Add memory to the boot allocator first

2022-04-05 Thread Julien Grall

Hi Stefano,

On 05/04/2022 22:50, Stefano Stabellini wrote:

> > +static void __init setup_mm(void)
> > +{
> > +const struct meminfo *banks = 
> > +paddr_t ram_start = ~0;
> > +paddr_t ram_end = 0;
> > +paddr_t ram_size = 0;
> > +unsigned int i;
> > +
> > +init_pdx();
> > +
> > +/*
> > + * We need some memory to allocate the page-tables used for the xenheap
> > + * mappings. But some regions may contain memory already allocated
> > + * for other uses (e.g. modules, reserved-memory...).
> > + *
> > + * For simplify add all the free regions in the boot allocator.
> > + */
> 
> We currently have:
> 
> BUG_ON(nr_bootmem_regions == (PAGE_SIZE / sizeof(struct bootmem_region)));

This has enough space for 256 distinct regions on arm64 (512 regions on
arm32).

> Do you think we should check for the limit in populate_boot_allocator?

This patch doesn't change the number of regions added to the boot
allocator. So if we need to check the limit then I would rather deal
separately (see more below).

> Or there is no need because it is unrealistic to reach it?

I can't say never because history told us on some UEFI systems, there
will be a large number of regions exposed. I haven't heard anyone that
would hit the BUG_ON().

The problem is what do we do if we hit the limit? We could ignore all
the regions after. However, there are potentially a risk there would not
be enough memory to cover the boot memory allocation (regions may be
really small).

So if we ever hit the limit, then I think we should update the boot
allocator.


Cheers,

--
Julien Grall



Re: [PATCH v3 17/19] xen/arm64: mm: Add memory to the boot allocator first

2022-04-05 Thread Stefano Stabellini
On Mon, 21 Feb 2022, Julien Grall wrote:
> From: Julien Grall 
> 
> Currently, memory is added to the boot allocator after the xenheap
> mappings are done. This will break if the first mapping is more than
> 512GB of RAM.
> 
> In addition to that, a follow-up patch will rework setup_xenheap_mappings()
> to use smaller mappings (e.g. 2MB, 4KB). So it will be necessary to have
> memory in the boot allocator earlier.
> 
> Only free memory (e.g. not reserved or modules) can be added to the boot
> allocator. It might be possible that some regions (including the first
> one) will have no free memory.
> 
> So we need to add all the free memory to the boot allocator first
> and then add do the mappings.
> 
> Signed-off-by: Julien Grall 
> 
> ---
> Changes in v3:
> - Patch added
> ---
>  xen/arch/arm/setup.c | 63 +---
>  1 file changed, 42 insertions(+), 21 deletions(-)
> 
> diff --git a/xen/arch/arm/setup.c b/xen/arch/arm/setup.c
> index d5d0792ed48a..777cf96639f5 100644
> --- a/xen/arch/arm/setup.c
> +++ b/xen/arch/arm/setup.c
> @@ -767,30 +767,18 @@ static void __init setup_mm(void)
>  init_staticmem_pages();
>  }
>  #else /* CONFIG_ARM_64 */
> -static void __init setup_mm(void)
> +static void __init populate_boot_allocator(void)
>  {
> -paddr_t ram_start = ~0;
> -paddr_t ram_end = 0;
> -paddr_t ram_size = 0;
> -int bank;
> -
> -init_pdx();
> +unsigned int i;
> +const struct meminfo *banks = 
>  
> -total_pages = 0;
> -for ( bank = 0 ; bank < bootinfo.mem.nr_banks; bank++ )
> +for ( i = 0; i < banks->nr_banks; i++ )
>  {
> -paddr_t bank_start = bootinfo.mem.bank[bank].start;
> -paddr_t bank_size = bootinfo.mem.bank[bank].size;
> -paddr_t bank_end = bank_start + bank_size;
> +const struct membank *bank = >bank[i];
> +paddr_t bank_end = bank->start + bank->size;
>  paddr_t s, e;
>  
> -ram_size = ram_size + bank_size;
> -ram_start = min(ram_start,bank_start);
> -ram_end = max(ram_end,bank_end);
> -
> -setup_xenheap_mappings(bank_start>>PAGE_SHIFT, 
> bank_size>>PAGE_SHIFT);
> -
> -s = bank_start;
> +s = bank->start;
>  while ( s < bank_end )
>  {
>  paddr_t n = bank_end;
> @@ -798,9 +786,7 @@ static void __init setup_mm(void)
>  e = next_module(s, );
>  
>  if ( e == ~(paddr_t)0 )
> -{
>  e = n = bank_end;
> -}
>  
>  if ( e > bank_end )
>  e = bank_end;
> @@ -809,6 +795,41 @@ static void __init setup_mm(void)
>  s = n;
>  }
>  }
> +}
> +
> +static void __init setup_mm(void)
> +{
> +const struct meminfo *banks = 
> +paddr_t ram_start = ~0;
> +paddr_t ram_end = 0;
> +paddr_t ram_size = 0;
> +unsigned int i;
> +
> +init_pdx();
> +
> +/*
> + * We need some memory to allocate the page-tables used for the xenheap
> + * mappings. But some regions may contain memory already allocated
> + * for other uses (e.g. modules, reserved-memory...).
> + *
> + * For simplify add all the free regions in the boot allocator.
> + */

We currently have:

BUG_ON(nr_bootmem_regions == (PAGE_SIZE / sizeof(struct bootmem_region)));

Do you think we should check for the limit in populate_boot_allocator?
Or there is no need because it is unrealistic to reach it?


> +populate_boot_allocator();
> +
> +total_pages = 0;
> +
> +for ( i = 0; i < banks->nr_banks; i++ )
> +{
> +const struct membank *bank = >bank[i];
> +paddr_t bank_end = bank->start + bank->size;
> +
> +ram_size = ram_size + bank->size;
> +ram_start = min(ram_start, bank->start);
> +ram_end = max(ram_end, bank_end);
> +
> +setup_xenheap_mappings(PFN_DOWN(bank->start),
> +   PFN_DOWN(bank->size));
> +}
>  
>  total_pages += ram_size >> PAGE_SHIFT;
>  
> -- 
> 2.32.0
> 
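
The loop in populate_boot_allocator() walks each bank and hands the gaps between modules to the boot allocator. A minimal sketch of that idea follows, under stated assumptions: struct range and collect_free_ranges() are hypothetical names, the reserved list is assumed to be sorted and non-overlapping, and the free ranges are recorded in an array rather than passed to the real boot allocator.

```c
#include <stddef.h>
#include <stdint.h>

typedef uint64_t paddr_t;

struct range { paddr_t start, end; };   /* half-open: [start, end) */

/*
 * Hypothetical helper: walk one bank and record every gap between the
 * reserved ranges. 'reserved' must be sorted by start and non-overlapping.
 * Returns the number of free ranges written to 'out' (at most 'max_out').
 */
static size_t collect_free_ranges(struct range bank,
                                  const struct range *reserved, size_t nr,
                                  struct range *out, size_t max_out)
{
    paddr_t s = bank.start;
    size_t count = 0;

    for (size_t i = 0; i < nr && s < bank.end; i++) {
        paddr_t rs = reserved[i].start, re = reserved[i].end;

        if (re <= s)
            continue;               /* reserved range is behind us */
        if (rs >= bank.end)
            break;                  /* nothing reserved left in this bank */
        if (rs > s && count < max_out)
            out[count++] = (struct range){ s, rs };
        s = re;                     /* resume after the reserved range */
    }

    /* Whatever remains after the last reserved range is free too. */
    if (s < bank.end && count < max_out)
        out[count++] = (struct range){ s, bank.end };

    return count;
}
```

A bank covering 0x1000-0x9000 with reserved ranges at 0x2000-0x3000 and 0x5000-0x6000 yields three free ranges. This also shows why the number of regions can exceed the number of banks, which is the concern behind the BUG_ON() question.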



Re: [PATCH v3 13/19] xen/arm: Move fixmap definitions in a separate header

2022-04-05 Thread Julien Grall

Hi Stefano,

On 05/04/2022 22:12, Stefano Stabellini wrote:

> > +/* Map a page in a fixmap entry */
> > +extern void set_fixmap(unsigned map, mfn_t mfn, unsigned attributes);
> > +/* Remove a mapping from a fixmap entry */
> > +extern void clear_fixmap(unsigned map);
> > +
> > +#endif /* __ASSEMBLY__ */
> > +
> > +#endif /* __ASM_FIXMAP_H */
> 
> It is a good idea to create fixmap.h, but I think it should be acpi.h to
> include fixmap.h, not the other way around.

As I wrote in the commit message, one definition in fixmap.h rely on
define from acpi.h (i.e NUM_FIXMAP_ACPI_PAGES). So if we don't include
it, then user of FIXMAP_PMAP_BEGIN (see next patch) will requires to
include acpi.h in order to build.

Re-ordering the values would not help because the problem would exactly
be the same but this time the acpi users would have to include pmap.h to
define NUM_FIX_PMAP.

> The appended changes build correctly on top of this patch.

That's expected because all the users of FIXMAP_ACPI_END will be
including acpi.h. But after the next patch, we would need pmap.c to
include acpi.h.

I don't think this would be right (and quite likely you would ask why
this is done). Hence this approach.

Cheers,

--
Julien Grall



Re: [PATCH v3 16/19] xen/arm: mm: Use the PMAP helpers in xen_{,un}map_table()

2022-04-05 Thread Stefano Stabellini
On Mon, 21 Feb 2022, Julien Grall wrote:
> From: Julien Grall 
> 
> During early boot, it is not possible to use xen_{,un}map_table()
> if the page tables are not residing the Xen binary.
> 
> This is a blocker to switch some of the helpers to use xen_pt_update()
> as we may need to allocate extra page tables and access them before
> the domheap has been initialized (see setup_xenheap_mappings()).
> 
> xen_{,un}map_table() are now updated to use the PMAP helpers for early
> boot map/unmap. Note that the special case for page-tables residing
> in Xen binary has been dropped because it is "complex" and was
> only added as a workaround in 8d4f1b8878e0 ("xen/arm: mm: Allow
> generic xen page-tables helpers to be called early").
> 
> Signed-off-by: Julien Grall 


Reviewed-by: Stefano Stabellini 


> ---
> Changes in v2:
> - New patch
> ---
>  xen/arch/arm/mm.c | 33 +
>  1 file changed, 9 insertions(+), 24 deletions(-)
> 
> diff --git a/xen/arch/arm/mm.c b/xen/arch/arm/mm.c
> index 659bdf25e0ff..11b6b60a2bc1 100644
> --- a/xen/arch/arm/mm.c
> +++ b/xen/arch/arm/mm.c
> @@ -25,6 +25,7 @@
>  #include 
>  #include 
>  #include 
> +#include 
>  #include 
>  #include 
>  #include 
> @@ -964,27 +965,11 @@ void *ioremap(paddr_t pa, size_t len)
>  static lpae_t *xen_map_table(mfn_t mfn)
>  {
>  /*
> - * We may require to map the page table before map_domain_page() is
> - * useable. The requirements here is it must be useable as soon as
> - * page-tables are allocated dynamically via alloc_boot_pages().
> - *
> - * We need to do the check on physical address rather than virtual
> - * address to avoid truncation on Arm32. Therefore is_kernel() cannot
> - * be used.
> + * During early boot, map_domain_page() may be unusable. Use the
> + * PMAP to map temporarily a page-table.
>   */
>  if ( system_state == SYS_STATE_early_boot )
> -{
> -if ( is_xen_fixed_mfn(mfn) )
> -{
> -/*
> - * It is fine to demote the type because the size of Xen
> - * will always fit in vaddr_t.
> - */
> -vaddr_t offset = mfn_to_maddr(mfn) - virt_to_maddr(&_start);
> -
> -return (lpae_t *)(XEN_VIRT_START + offset);
> -}
> -}
> +return pmap_map(mfn);
>  
>  return map_domain_page(mfn);
>  }
> @@ -993,12 +978,12 @@ static void xen_unmap_table(const lpae_t *table)
>  {
>  /*
>   * During early boot, xen_map_table() will not use map_domain_page()
> - * for page-tables residing in Xen binary. So skip the unmap part.
> + * but the PMAP.
>   */
> -if ( system_state == SYS_STATE_early_boot && is_kernel(table) )
> -return;
> -
> -unmap_domain_page(table);
> +if ( system_state == SYS_STATE_early_boot )
> +pmap_unmap(table);
> +else
> +unmap_domain_page(table);
>  }
>  
>  static int create_xen_table(lpae_t *entry)
> -- 
> 2.32.0
> 



[linux-linus test] 169174: tolerable FAIL - PUSHED

2022-04-05 Thread osstest service owner
flight 169174 linux-linus real [real]
http://logs.test-lab.xenproject.org/osstest/logs/169174/

Failures :-/ but no regressions.

Tests which did not succeed, but are not blocking:
 test-armhf-armhf-libvirt-raw 15 saverestore-support-check fail blocked in 169145
 test-amd64-amd64-xl-qemut-win7-amd64 19 guest-stop fail like 169145
 test-armhf-armhf-libvirt 16 saverestore-support-check fail like 169145
 test-amd64-amd64-xl-qemuu-ws16-amd64 19 guest-stop fail like 169145
 test-amd64-amd64-qemuu-nested-amd 20 debian-hvm-install/l1/l2 fail like 169145
 test-amd64-amd64-xl-qemut-ws16-amd64 19 guest-stop fail like 169145
 test-amd64-amd64-xl-qemuu-win7-amd64 19 guest-stop fail like 169145
 test-armhf-armhf-libvirt-qcow2 15 saverestore-support-check fail like 169145
 test-amd64-amd64-libvirt 15 migrate-support-check fail never pass
 test-arm64-arm64-xl-seattle 15 migrate-support-check fail never pass
 test-arm64-arm64-xl-seattle 16 saverestore-support-check fail never pass
 test-amd64-amd64-libvirt-xsm 15 migrate-support-check fail never pass
 test-arm64-arm64-xl-credit2 15 migrate-support-check fail never pass
 test-arm64-arm64-xl-credit2 16 saverestore-support-check fail never pass
 test-arm64-arm64-xl 15 migrate-support-check fail never pass
 test-arm64-arm64-xl 16 saverestore-support-check fail never pass
 test-arm64-arm64-xl-xsm 15 migrate-support-check fail never pass
 test-arm64-arm64-xl-xsm 16 saverestore-support-check fail never pass
 test-arm64-arm64-xl-credit1 15 migrate-support-check fail never pass
 test-arm64-arm64-xl-thunderx 15 migrate-support-check fail never pass
 test-arm64-arm64-xl-credit1 16 saverestore-support-check fail never pass
 test-arm64-arm64-xl-thunderx 16 saverestore-support-check fail never pass
 test-arm64-arm64-libvirt-xsm 15 migrate-support-check fail never pass
 test-arm64-arm64-libvirt-xsm 16 saverestore-support-check fail never pass
 test-amd64-amd64-libvirt-qemuu-debianhvm-amd64-xsm 13 migrate-support-check fail never pass
 test-amd64-amd64-libvirt-qcow2 14 migrate-support-check fail never pass
 test-armhf-armhf-xl-arndale 15 migrate-support-check fail never pass
 test-armhf-armhf-xl-arndale 16 saverestore-support-check fail never pass
 test-amd64-amd64-libvirt-raw 14 migrate-support-check fail never pass
 test-arm64-arm64-libvirt-raw 14 migrate-support-check fail never pass
 test-arm64-arm64-libvirt-raw 15 saverestore-support-check fail never pass
 test-arm64-arm64-xl-vhd 14 migrate-support-check fail never pass
 test-arm64-arm64-xl-vhd 15 saverestore-support-check fail never pass
 test-armhf-armhf-xl-credit2 15 migrate-support-check fail never pass
 test-armhf-armhf-xl-credit2 16 saverestore-support-check fail never pass
 test-armhf-armhf-xl 15 migrate-support-check fail never pass
 test-armhf-armhf-xl 16 saverestore-support-check fail never pass
 test-armhf-armhf-libvirt 15 migrate-support-check fail never pass
 test-armhf-armhf-xl-cubietruck 15 migrate-support-check fail never pass
 test-armhf-armhf-xl-cubietruck 16 saverestore-support-check fail never pass
 test-armhf-armhf-xl-rtds 15 migrate-support-check fail never pass
 test-armhf-armhf-xl-rtds 16 saverestore-support-check fail never pass
 test-armhf-armhf-libvirt-raw 14 migrate-support-check fail never pass
 test-armhf-armhf-xl-vhd 14 migrate-support-check fail never pass
 test-armhf-armhf-xl-vhd 15 saverestore-support-check fail never pass
 test-armhf-armhf-libvirt-qcow2 14 migrate-support-check fail never pass
 test-armhf-armhf-xl-multivcpu 15 migrate-support-check fail never pass
 test-armhf-armhf-xl-multivcpu 16 saverestore-support-check fail never pass
 test-armhf-armhf-xl-credit1 15 migrate-support-check fail never pass
 test-armhf-armhf-xl-credit1 16 saverestore-support-check fail never pass

version targeted for testing:
 linux                3123109284176b1532874591f7c81f3837bbdc17
baseline version:
 linux                09bb8856d4a7cf3128dedd79cd07d75bbf4a9f04

Last test of basis   169145  2022-04-03 20:41:35 Z    2 days
Testing same since   169157  2022-04-04 06:23:01 Z    1 days    3 attempts


People who touched revisions under test:
  Linus Torvalds 

jobs:
 build-amd64-xsm  pass
 build-arm64-xsm  pass
 build-i386-xsm   pass
 build-amd64  pass
 build-arm64  pass
 

Re: [PATCH v3 15/19] xen/arm: mm: Clean-up the includes and order them

2022-04-05 Thread Stefano Stabellini
On Mon, 21 Feb 2022, Julien Grall wrote:
> From: Julien Grall 
> 
> The number of includes in mm.c has been growing quite a lot. However,
> some of them (e.g. xen/device_tree.h, xen/softirq.h) don't look
> to be directly used by the file, and others will be included by
> larger headers (e.g. asm/flushtlb.h will be included by xen/mm.h).
> 
> So trim down the number of includes. Take the opportunity to order
> them with the xen headers first, then asm headers and last public
> headers.
> 
> Signed-off-by: Julien Grall 

I'll trust you on this one :-)

Acked-by: Stefano Stabellini 


> ---
> Changes in v3:
> - Patch added
> ---
>  xen/arch/arm/mm.c | 27 ++-
>  1 file changed, 10 insertions(+), 17 deletions(-)
> 
> diff --git a/xen/arch/arm/mm.c b/xen/arch/arm/mm.c
> index b7942464d4de..659bdf25e0ff 100644
> --- a/xen/arch/arm/mm.c
> +++ b/xen/arch/arm/mm.c
> @@ -17,33 +17,26 @@
>   * GNU General Public License for more details.
>   */
>  
> -#include 
> -#include 
> -#include 
> -#include 
> -#include 
> -#include 
> +#include 
>  #include 
>  #include 
> -#include 
> -#include 
>  #include 
> -#include 
> -#include 
> -#include 
> -#include 
> -#include 
> -#include 
> +#include 
> +#include 
> +#include 
> +#include 
>  #include 
> +#include 
> +#include 
>  #include 
> +
>  #include 
> -#include 
> -#include 
> -#include 
>  
>  #include 
>  #include 
>  
> +#include 
> +
>  /* Override macros from asm/page.h to make them work with mfn_t */
>  #undef virt_to_mfn
>  #define virt_to_mfn(va) _mfn(__virt_to_mfn(va))
> -- 
> 2.32.0
> 



Re: [PATCH v3 14/19] xen/arm: add Persistent Map (PMAP) infrastructure

2022-04-05 Thread Stefano Stabellini
On Mon, 21 Feb 2022, Julien Grall wrote:
> From: Wei Liu 
> 
> The basic idea is like Persistent Kernel Map (PKMAP) in Linux. We
> pre-populate all the relevant page tables before the system is fully
> set up.
> 
> We will need it on Arm in order to rework the arm64 version of
> setup_xenheap_mappings() as we may need to use pages allocated from
> the boot allocator before they are effectively mapped.
> 
> This infrastructure is not lock-protected therefore can only be used
> before smpboot. After smpboot, map_domain_page() has to be used.
> 
> This is based on the x86 version [1] that was originally implemented
> by Wei Liu.
> 
> The PMAP infrastructure is implemented in common code with some
> arch helpers to set/clear the page-table entries and to convert
> between a fixmap slot and a virtual address.
> 
> As mfn_to_xen_entry() now needs to be exported, take the opportunity
> to switch the parameter attr from unsigned to unsigned int.
> 
> [1] 
> 
> 
> Signed-off-by: Wei Liu 
> Signed-off-by: Hongyan Xia 
> [julien: Adapted for Arm]
> Signed-off-by: Julien Grall 
> 
> ---
> Changes in v3:
> - s/BITS_PER_LONG/BITS_PER_BYTE/
> - Move pmap to common code
> 
> Changes in v2:
> - New patch
> 
> Cc: Jan Beulich 
> Cc: Wei Liu 
> Cc: Andrew Cooper 
> Cc: Roger Pau Monné 
> ---
>  xen/arch/arm/Kconfig  |  1 +
>  xen/arch/arm/include/asm/fixmap.h | 17 +++
>  xen/arch/arm/include/asm/lpae.h   |  8 
>  xen/arch/arm/include/asm/pmap.h   | 33 +
>  xen/arch/arm/mm.c |  7 +--
>  xen/common/Kconfig|  3 ++
>  xen/common/Makefile   |  1 +
>  xen/common/pmap.c | 79 +++
>  xen/include/xen/pmap.h| 16 +++
>  9 files changed, 159 insertions(+), 6 deletions(-)
>  create mode 100644 xen/arch/arm/include/asm/pmap.h
>  create mode 100644 xen/common/pmap.c
>  create mode 100644 xen/include/xen/pmap.h
> 
> diff --git a/xen/arch/arm/Kconfig b/xen/arch/arm/Kconfig
> index ecfa6822e4d3..a89a67802aa9 100644
> --- a/xen/arch/arm/Kconfig
> +++ b/xen/arch/arm/Kconfig
> @@ -14,6 +14,7 @@ config ARM
>   select HAS_DEVICE_TREE
>   select HAS_PASSTHROUGH
>   select HAS_PDX
> + select HAS_PMAP
>   select IOMMU_FORCE_PT_SHARE
>  
>  config ARCH_DEFCONFIG
> diff --git a/xen/arch/arm/include/asm/fixmap.h b/xen/arch/arm/include/asm/fixmap.h
> index 1cee51e52ab9..c46a15e59de4 100644
> --- a/xen/arch/arm/include/asm/fixmap.h
> +++ b/xen/arch/arm/include/asm/fixmap.h
> @@ -5,12 +5,20 @@
>  #define __ASM_FIXMAP_H
>  
>  #include 
> +#include 
>  
>  /* Fixmap slots */
>  #define FIXMAP_CONSOLE  0  /* The primary UART */
>  #define FIXMAP_MISC 1  /* Ephemeral mappings of hardware */
>  #define FIXMAP_ACPI_BEGIN  2  /* Start mappings of ACPI tables */
>  #define FIXMAP_ACPI_END (FIXMAP_ACPI_BEGIN + NUM_FIXMAP_ACPI_PAGES - 1)  /* End mappings of ACPI tables */
> +#define FIXMAP_PMAP_BEGIN (FIXMAP_ACPI_END + 1) /* Start of PMAP */
> +#define FIXMAP_PMAP_END (FIXMAP_PMAP_BEGIN + NUM_FIX_PMAP - 1) /* End of PMAP */
> +
> +#define FIXMAP_LAST FIXMAP_PMAP_END
> +
> +#define FIXADDR_START FIXMAP_ADDR(0)
> +#define FIXADDR_TOP FIXMAP_ADDR(FIXMAP_LAST)
>  
>  #ifndef __ASSEMBLY__
>  
> @@ -19,6 +27,15 @@ extern void set_fixmap(unsigned map, mfn_t mfn, unsigned attributes);
>  /* Remove a mapping from a fixmap entry */
>  extern void clear_fixmap(unsigned map);
>  
> +#define fix_to_virt(slot) ((void *)FIXMAP_ADDR(slot))
> +
> +static inline unsigned int virt_to_fix(vaddr_t vaddr)
> +{
> +BUG_ON(vaddr >= FIXADDR_TOP || vaddr < FIXADDR_START);
> +
> +return ((vaddr - FIXADDR_START) >> PAGE_SHIFT);
> +}
> +
>  #endif /* __ASSEMBLY__ */
>  
>  #endif /* __ASM_FIXMAP_H */
> diff --git a/xen/arch/arm/include/asm/lpae.h b/xen/arch/arm/include/asm/lpae.h
> index 8cf932b5c947..6099037da1c0 100644
> --- a/xen/arch/arm/include/asm/lpae.h
> +++ b/xen/arch/arm/include/asm/lpae.h
> @@ -4,6 +4,7 @@
>  #ifndef __ASSEMBLY__
>  
>  #include 
> +#include 
>  
>  /*
>   * WARNING!  Unlike the x86 pagetable code, where l1 is the lowest level and
> @@ -168,6 +169,13 @@ static inline bool lpae_is_superpage(lpae_t pte, unsigned int level)
>  third_table_offset(addr)\
>  }
>  
> +/*
> + * Standard entry type that we'll use to build Xen's own pagetables.
> + * We put the same permissions at every level, because they're ignored
> + * by the walker in non-leaf entries.
> + */
> +lpae_t mfn_to_xen_entry(mfn_t mfn, unsigned int attr);
> +
>  #endif /* __ASSEMBLY__ */
>  
>  /*
> diff --git a/xen/arch/arm/include/asm/pmap.h b/xen/arch/arm/include/asm/pmap.h
> new file mode 100644
> index ..70eafe2891d7
> --- /dev/null
> +++ b/xen/arch/arm/include/asm/pmap.h
> @@ -0,0 +1,33 @@
> +#ifndef __ASM_PMAP_H__
> +#define __ASM_PMAP_H__
> +
> +#include 
> +
> +/* XXX: Find an header to declare it */
> +extern lpae_t xen_fixmap[XEN_PT_LPAE_ENTRIES];
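
The fix_to_virt()/virt_to_fix() pair added to fixmap.h above converts between a fixmap slot index and its virtual address. The following is a minimal standalone sketch of that round trip, not the Xen code itself: `SKETCH_FIXMAP_BASE`, `SKETCH_NR_SLOTS`, the 4KB page size and both function names are assumptions made for illustration, and the bounds check uses an exclusive limit one page past the last slot, which is a choice of this sketch rather than a statement about the patch.

```c
#include <assert.h>

/* Assumed values for illustration only; Xen's real layout differs. */
#define SKETCH_PAGE_SHIFT  12
#define SKETCH_PAGE_SIZE   (1UL << SKETCH_PAGE_SHIFT)
#define SKETCH_FIXMAP_BASE 0x00400000UL
#define SKETCH_NR_SLOTS    8u

/* Virtual address of the first byte of a slot. */
#define SKETCH_FIXMAP_ADDR(slot) \
    (SKETCH_FIXMAP_BASE + (unsigned long)(slot) * SKETCH_PAGE_SIZE)

/* Slot index -> virtual address (counterpart of fix_to_virt()). */
static inline unsigned long slot_to_virt(unsigned int slot)
{
    assert(slot < SKETCH_NR_SLOTS);
    return SKETCH_FIXMAP_ADDR(slot);
}

/* Virtual address -> slot index (counterpart of virt_to_fix()). */
static inline unsigned int virt_to_slot(unsigned long vaddr)
{
    /* Accept any address inside the slots, including the last one. */
    assert(vaddr >= SKETCH_FIXMAP_ADDR(0) &&
           vaddr < SKETCH_FIXMAP_ADDR(SKETCH_NR_SLOTS));
    return (unsigned int)((vaddr - SKETCH_FIXMAP_ADDR(0)) >> SKETCH_PAGE_SHIFT);
}
```

Any address within a slot maps back to that slot's index, which is the property the PMAP code relies on when it unmaps by virtual address.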


Re: [PATCH v3 13/19] xen/arm: Move fixmap definitions in a separate header

2022-04-05 Thread Stefano Stabellini
On Mon, 21 Feb 2022, Julien Grall wrote:
> From: Julien Grall 
> 
> To properly use the fixmap definitions, their user would
> also need to include . This is not great when
> the user itself is not meant to directly use ACPI definitions.
> 
> Including  in  is not an option because
> the latter header is included by everyone. So move the fixmap
> entries definition out to a new header.
> 
> Take the opportunity to also move {set, clear}_fixmap() prototypes
> in the new header.
> 
> Note that most of the definitions in  now need to be
> surrounded with #ifndef __ASSEMBLY__ because  will
> be used in assembly (see EARLY_UART_VIRTUAL_ADDRESS).
> 
> The split will become more helpful in a follow-up patch where new
> fixmap entries will be defined.
> 
> Signed-off-by: Julien Grall 
> 
> ---
> Changes in v3:
> - Patch added
> ---
>  xen/arch/arm/acpi/lib.c |  2 ++
>  xen/arch/arm/include/asm/config.h   |  6 --
>  xen/arch/arm/include/asm/early_printk.h |  1 +
>  xen/arch/arm/include/asm/fixmap.h   | 24 
>  xen/arch/arm/include/asm/mm.h   |  4 
>  xen/arch/arm/kernel.c   |  1 +
>  xen/arch/arm/mm.c   |  1 +
>  xen/include/xen/acpi.h  | 18 +++---
>  8 files changed, 40 insertions(+), 17 deletions(-)
>  create mode 100644 xen/arch/arm/include/asm/fixmap.h
> 
> diff --git a/xen/arch/arm/acpi/lib.c b/xen/arch/arm/acpi/lib.c
> index a59cc4074cfb..41d521f720ac 100644
> --- a/xen/arch/arm/acpi/lib.c
> +++ b/xen/arch/arm/acpi/lib.c
> @@ -25,6 +25,8 @@
>  #include 
>  #include 
>  
> +#include 
> +
>  static bool fixmap_inuse;
>  
>  char *__acpi_map_table(paddr_t phys, unsigned long size)
> diff --git a/xen/arch/arm/include/asm/config.h b/xen/arch/arm/include/asm/config.h
> index 85d4a510ce8a..51908bf9422c 100644
> --- a/xen/arch/arm/include/asm/config.h
> +++ b/xen/arch/arm/include/asm/config.h
> @@ -175,12 +175,6 @@
>  
>  #endif
>  
> -/* Fixmap slots */
> -#define FIXMAP_CONSOLE  0  /* The primary UART */
> -#define FIXMAP_MISC 1  /* Ephemeral mappings of hardware */
> -#define FIXMAP_ACPI_BEGIN  2  /* Start mappings of ACPI tables */
> -#define FIXMAP_ACPI_END (FIXMAP_ACPI_BEGIN + NUM_FIXMAP_ACPI_PAGES - 1)  /* End mappings of ACPI tables */
> -
>  #define NR_hypercalls 64
>  
>  #define STACK_ORDER 3
> diff --git a/xen/arch/arm/include/asm/early_printk.h b/xen/arch/arm/include/asm/early_printk.h
> index 8dc911cf48a3..c5149b2976da 100644
> --- a/xen/arch/arm/include/asm/early_printk.h
> +++ b/xen/arch/arm/include/asm/early_printk.h
> @@ -11,6 +11,7 @@
>  #define __ARM_EARLY_PRINTK_H__
>  
>  #include 
> +#include 
>  
>  #ifdef CONFIG_EARLY_PRINTK
>  
> diff --git a/xen/arch/arm/include/asm/fixmap.h b/xen/arch/arm/include/asm/fixmap.h
> new file mode 100644
> index ..1cee51e52ab9
> --- /dev/null
> +++ b/xen/arch/arm/include/asm/fixmap.h
> @@ -0,0 +1,24 @@
> +/*
> + * fixmap.h: compile-time virtual memory allocation
> + */
> +#ifndef __ASM_FIXMAP_H
> +#define __ASM_FIXMAP_H
> +
> +#include 
> +
> +/* Fixmap slots */
> +#define FIXMAP_CONSOLE  0  /* The primary UART */
> +#define FIXMAP_MISC 1  /* Ephemeral mappings of hardware */
> +#define FIXMAP_ACPI_BEGIN  2  /* Start mappings of ACPI tables */
> +#define FIXMAP_ACPI_END (FIXMAP_ACPI_BEGIN + NUM_FIXMAP_ACPI_PAGES - 1)  /* End mappings of ACPI tables */
> +
> +#ifndef __ASSEMBLY__
> +
> +/* Map a page in a fixmap entry */
> +extern void set_fixmap(unsigned map, mfn_t mfn, unsigned attributes);
> +/* Remove a mapping from a fixmap entry */
> +extern void clear_fixmap(unsigned map);
> +
> +#endif /* __ASSEMBLY__ */
> +
> +#endif /* __ASM_FIXMAP_H */


It is a good idea to create fixmap.h, but I think acpi.h should
include fixmap.h, not the other way around.

The appended changes build correctly on top of this patch.


diff --git a/xen/arch/arm/include/asm/fixmap.h b/xen/arch/arm/include/asm/fixmap.h
index 1cee51e52a..8cf9dbb618 100644
--- a/xen/arch/arm/include/asm/fixmap.h
+++ b/xen/arch/arm/include/asm/fixmap.h
@@ -4,8 +4,6 @@
 #ifndef __ASM_FIXMAP_H
 #define __ASM_FIXMAP_H
 
-#include 
-
 /* Fixmap slots */
 #define FIXMAP_CONSOLE  0  /* The primary UART */
 #define FIXMAP_MISC 1  /* Ephemeral mappings of hardware */
@@ -14,6 +12,8 @@
 
 #ifndef __ASSEMBLY__
 
+#include 
+
 /* Map a page in a fixmap entry */
 extern void set_fixmap(unsigned map, mfn_t mfn, unsigned attributes);
 /* Remove a mapping from a fixmap entry */
diff --git a/xen/include/xen/acpi.h b/xen/include/xen/acpi.h
index 1b9c75e68f..148673e77c 100644
--- a/xen/include/xen/acpi.h
+++ b/xen/include/xen/acpi.h
@@ -28,6 +28,8 @@
 #define _LINUX
 #endif
 
+#include 
+
 /*
  * Fixmap pages to reserve for ACPI boot-time tables (see
  * arch/x86/include/asm/fixmap.h or arch/arm/include/asm/fixmap.h),



Re: [PATCH v3 12/19] xen/arm: mm: Allow page-table allocation from the boot allocator

2022-04-05 Thread Stefano Stabellini
On Mon, 21 Feb 2022, Julien Grall wrote:
> From: Julien Grall 
> 
> At the moment, page-table can only be allocated from domheap. This means
> it is not possible to create mapping in the page-tables via
> map_pages_to_xen() if page-table needs to be allocated.
> 
> In order to avoid open-coding page-tables update in early boot, we need
> to be able to allocate page-tables much earlier. Thankfully, we have the
> boot allocator for those cases.
> 
> create_xen_table() is updated to cater for early boot allocation by
> using alloc_boot_pages().
> 
> Note, this is not sufficient to bootstrap the page-tables (i.e mapping
> before any memory is actually mapped). This will be addressed
> separately.
> 
> Signed-off-by: Julien Grall 
> Signed-off-by: Julien Grall 

Reviewed-by: Stefano Stabellini 


> ---
> Changes in v2:
> - New patch
> ---
>  xen/arch/arm/mm.c | 20 ++--
>  1 file changed, 14 insertions(+), 6 deletions(-)
> 
> diff --git a/xen/arch/arm/mm.c b/xen/arch/arm/mm.c
> index 58364bb6c820..f70b8cc7ce87 100644
> --- a/xen/arch/arm/mm.c
> +++ b/xen/arch/arm/mm.c
> @@ -1014,19 +1014,27 @@ static void xen_unmap_table(const lpae_t *table)
>  
>  static int create_xen_table(lpae_t *entry)
>  {
> -struct page_info *pg;
> +mfn_t mfn;
>  void *p;
>  lpae_t pte;
>  
> -pg = alloc_domheap_page(NULL, 0);
> -if ( pg == NULL )
> -return -ENOMEM;
> +if ( system_state != SYS_STATE_early_boot )
> +{
> +struct page_info *pg = alloc_domheap_page(NULL, 0);
> +
> +if ( pg == NULL )
> +return -ENOMEM;
> +
> +mfn = page_to_mfn(pg);
> +}
> +else
> +mfn = alloc_boot_pages(1, 1);
>  
> -p = xen_map_table(page_to_mfn(pg));
> +p = xen_map_table(mfn);
>  clear_page(p);
>  xen_unmap_table(p);
>  
> -pte = mfn_to_xen_entry(page_to_mfn(pg), MT_NORMAL);
> +pte = mfn_to_xen_entry(mfn, MT_NORMAL);
>  pte.pt.table = 1;
>  write_pte(entry, pte);
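
The allocator choice made by create_xen_table() in the patch above can be illustrated outside Xen with toy allocators. This is a hedged sketch only: `enum boot_phase`, `alloc_table_frame()` and the counters are hypothetical stand-ins for system_state, alloc_boot_pages() and alloc_domheap_page(), and the error handling here is an assumption of the sketch, not a description of how Xen's boot allocator behaves on failure.

```c
/* Hypothetical stand-ins for Xen's system_state values. */
enum boot_phase { PHASE_EARLY_BOOT, PHASE_ACTIVE };

/* Toy allocators: each hands out frame numbers from its own range. */
static unsigned long boot_next = 0x1000;  /* stand-in for alloc_boot_pages() */
static unsigned long heap_next = 0x80000; /* stand-in for alloc_domheap_page() */
static unsigned long heap_left = 2;       /* frames left in the toy heap */

/*
 * Pick the allocator from the boot phase, as the patched function does.
 * Returns 0 and stores the frame number in *mfn on success, or -1 when
 * the toy heap is exhausted.
 */
static int alloc_table_frame(enum boot_phase phase, unsigned long *mfn)
{
    if ( phase != PHASE_EARLY_BOOT )
    {
        if ( heap_left == 0 )
            return -1;

        heap_left--;
        *mfn = heap_next++;
    }
    else
        *mfn = boot_next++;

    return 0;
}
```

The point of the structure is that callers only ever see a frame number, so the rest of the function (map, clear, unmap, write the entry) is identical in both phases.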



Re: [PATCH v3 07/19] xen/arm: mm: Don't open-code Xen PT update in remove_early_mappings()

2022-04-05 Thread Stefano Stabellini
On Sat, 2 Apr 2022, Julien Grall wrote:
> On 02/04/2022 01:04, Stefano Stabellini wrote:
> > On Mon, 21 Feb 2022, Julien Grall wrote:
> > > From: Julien Grall 
> > > 
> > > Now that xen_pt_update_entry() is able to deal with different mapping
> > > size, we can replace the open-coding of the page-tables update by a call
> > > to modify_xen_mappings().
> > > 
> > > As the function is not meant to fail, a BUG_ON() is added to check the
> > > return.
> > > 
> > > Signed-off-by: Julien Grall 
> > > Signed-off-by: Julien Grall 
> > 
> > Nice!
> > 
> > 
> > > ---
> > >  Changes in v2:
> > >  - Stay consistent with how function name are used in the commit
> > >  message
> > >  - Add my AWS signed-off-by
> > > ---
> > >   xen/arch/arm/mm.c | 10 +-
> > >   1 file changed, 5 insertions(+), 5 deletions(-)
> > > 
> > > diff --git a/xen/arch/arm/mm.c b/xen/arch/arm/mm.c
> > > index 7b4b9de8693e..f088a4b2de96 100644
> > > --- a/xen/arch/arm/mm.c
> > > +++ b/xen/arch/arm/mm.c
> > > @@ -599,11 +599,11 @@ void * __init early_fdt_map(paddr_t fdt_paddr)
> > > void __init remove_early_mappings(void)
> > >   {
> > > -lpae_t pte = {0};
> > > -write_pte(xen_second + second_table_offset(BOOT_FDT_VIRT_START), pte);
> > > -write_pte(xen_second + second_table_offset(BOOT_FDT_VIRT_START + SZ_2M),
> > > -  pte);
> > > -flush_xen_tlb_range_va(BOOT_FDT_VIRT_START, BOOT_FDT_SLOT_SIZE);
> > > +int rc;
> > > +
> > > +rc = modify_xen_mappings(BOOT_FDT_VIRT_START, BOOT_FDT_VIRT_END,
> > > + _PAGE_BLOCK);
> > > +BUG_ON(rc);
> > 
> > Am I right that we are actually destroying the mapping, which usually is
> > done by calling destroy_xen_mappings, but we cannot call
> > destroy_xen_mappings in this case because it doesn't take a flags
> > parameter?
> 
> You are right.
> 
> > 
> > If so, then I would add a flags parameter to destroy_xen_mappings
> > instead of calling modify_xen_mappings just to pass _PAGE_BLOCK.
> > But I don't feel strongly about it so if you don't feel like making the
> > change to destroy_xen_mappings, you can add my acked-by here anyway.
> 
> destroy_xen_mappings() is a function used by common code. This is the only
> place so far where I need to pass _PAGE_BLOCK and I don't expect it to be used
> by the common code any time soon.
> 
> So I am not in favor of adding an extra parameter to destroy_xen_mappings().
> 
> Would you prefer if I open-code the call to xen_pt_update?

No need, just add a one-line in-code comment like:

/* destroy the _PAGE_BLOCK mapping */



Re: [PATCH v3 06/19] xen/arm: mm: Avoid flushing the TLBs when mapping are inserted

2022-04-05 Thread Stefano Stabellini
On Sat, 2 Apr 2022, Julien Grall wrote:
> Hi Stefano,
> 
> On 02/04/2022 01:00, Stefano Stabellini wrote:
> > On Mon, 21 Feb 2022, Julien Grall wrote:
> > > From: Julien Grall 
> > > 
> > > Currently, the function xen_pt_update() will flush the TLBs even when
> > > the mappings are inserted. This is a bit wasteful because we don't
> > > allow mapping replacement. Even if we did, the flush would need to
> > > happen earlier because mapping replacement should use Break-Before-Make
> > > when updating the entry.
> > > 
> > > A single call to xen_pt_update() can perform a single action. IOW, it
> > > is not possible to, for instance, mix inserting and removing mappings.
> > > Therefore, we can use `flags` to determine what action is performed.
> > > 
> > > This change will particularly help to limit the impact of switching
> > > boot time mapping to use xen_pt_update().
> > > 
> > > Signed-off-by: Julien Grall 
> > > 
> > > ---
> > >  Changes in v2:
> > >  - New patch
> > > ---
> > >   xen/arch/arm/mm.c | 17 ++---
> > >   1 file changed, 14 insertions(+), 3 deletions(-)
> > > 
> > > diff --git a/xen/arch/arm/mm.c b/xen/arch/arm/mm.c
> > > index fd16c1541ce2..7b4b9de8693e 100644
> > > --- a/xen/arch/arm/mm.c
> > > +++ b/xen/arch/arm/mm.c
> > > @@ -1104,7 +1104,13 @@ static bool xen_pt_check_entry(lpae_t entry, mfn_t mfn, unsigned int level,
> > >   /* We should be here with a valid MFN. */
> > >   ASSERT(!mfn_eq(mfn, INVALID_MFN));
> > >   -/* We don't allow replacing any valid entry. */
> > > +/*
> > > + * We don't allow replacing any valid entry.
> > > + *
> > > + * Note that the function xen_pt_update() relies on this
> > > + * assumption and will skip the TLB flush. The function will need
> > > + * to be updated if the check is relaxed.
> > > + */
> > >   if ( lpae_is_valid(entry) )
> > >   {
> > >   if ( lpae_is_mapping(entry, level) )
> > > @@ -1417,11 +1423,16 @@ static int xen_pt_update(unsigned long virt,
> > >   }
> > > /*
> > > - * Flush the TLBs even in case of failure because we may have
> > > + * The TLBs flush can be safely skipped when a mapping is inserted
> > > + * as we don't allow mapping replacement (see xen_pt_check_entry()).
> > > + *
> > > + * For all the other cases, the TLBs will be flushed unconditionally
> > > + * even if the mapping has failed. This is because we may have
> > >* partially modified the PT. This will prevent any unexpected
> > >* behavior afterwards.
> > >*/
> > > -flush_xen_tlb_range_va(virt, PAGE_SIZE * nr_mfns);
> > > +if ( !(flags & _PAGE_PRESENT) || mfn_eq(mfn, INVALID_MFN) )
> > > +flush_xen_tlb_range_va(virt, PAGE_SIZE * nr_mfns);
> > 
> > I am trying to think of a case where the following wouldn't be enough
> > but I cannot come up with one:
> > 
> > if ( mfn_eq(mfn, INVALID_MFN) )
> > flush_xen_tlb_range_va(virt, PAGE_SIZE * nr_mfns);
> 
> _PAGE_PRESENT is not set for two cases: when removing a page or populating
> page-tables for a region. Both of them will expect an INVALID_MFN (see the two
> asserts in xen_pt_check_entry()).
> 
> Therefore your solution should work. However, technically the 'mfn' is ignored
> in both situations (hence why this is an ASSERT() rather than a prod check).
> 
> Also, I feel it is better to flush more than less (missing a flush could have
> catastrophic results). So I chose to be explicit about the cases in which the
> flush can be skipped.
> 
> Maybe it would be clearer if I write:
> 
>  !((flags & _PAGE_PRESENT) && !mfn_eq(mfn, INVALID_MFN))

It is not so much a matter of clarity -- I just wanted to check with you
the reasons for the if condition because, as you wrote, wrong TLB
flushes can have catastrophic effects.

That said, actually I prefer your second version:

  !((flags & _PAGE_PRESENT) && !mfn_eq(mfn, INVALID_MFN))
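
The two spellings of the flush condition discussed above are equivalent by De Morgan's law. A minimal standalone sketch that checks this exhaustively follows; `SKETCH_PAGE_PRESENT`, the helper names and the boolean `mfn_is_invalid` parameter are stand-ins introduced for illustration and are not Xen definitions.

```c
#include <stdbool.h>

/* Hypothetical stand-in for Xen's _PAGE_PRESENT; the real value differs. */
#define SKETCH_PAGE_PRESENT (1u << 0)

/* Condition as written in the patch: flush unless inserting a mapping. */
static bool need_flush_v1(unsigned int flags, bool mfn_is_invalid)
{
    return !(flags & SKETCH_PAGE_PRESENT) || mfn_is_invalid;
}

/* Reworded condition: "not (present and valid MFN)". */
static bool need_flush_v2(unsigned int flags, bool mfn_is_invalid)
{
    return !((flags & SKETCH_PAGE_PRESENT) && !mfn_is_invalid);
}

/* Return true if both forms agree for every combination of inputs. */
static bool flush_conditions_agree(void)
{
    for ( unsigned int flags = 0; flags <= SKETCH_PAGE_PRESENT; flags++ )
        for ( int invalid = 0; invalid <= 1; invalid++ )
            if ( need_flush_v1(flags, invalid) !=
                 need_flush_v2(flags, invalid) )
                return false;

    return true;
}
```

The only input for which either form skips the flush is "present flag set and a valid MFN", i.e. the insertion of a new mapping, so the choice between the two is purely one of readability.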



Re: [PATCH v3 05/19] xen/arm: mm: Add support for the contiguous bit

2022-04-05 Thread Stefano Stabellini
On Sat, 2 Apr 2022, Julien Grall wrote:
> On 02/04/2022 00:53, Stefano Stabellini wrote:
> > On Mon, 21 Feb 2022, Julien Grall wrote:
> > > @@ -1333,21 +1386,34 @@ static int xen_pt_update(unsigned long virt,
> > >   while ( left )
> > >   {
> > >   unsigned int order, level;
> > > +unsigned int nr_contig;
> > > +unsigned int new_flags;
> > > level = xen_pt_mapping_level(vfn, mfn, left, flags);
> > >   order = XEN_PT_LEVEL_ORDER(level);
> > > ASSERT(left >= BIT(order, UL));
> > >   -rc = xen_pt_update_entry(root, pfn_to_paddr(vfn), mfn, level, flags);
> > > -if ( rc )
> > > -break;
> > > +/*
> > > + * Check if we can set the contiguous mapping and update the
> > > + * flags accordingly.
> > > + */
> > > +nr_contig = xen_pt_check_contig(vfn, mfn, level, left, flags);
> > > +new_flags = flags | ((nr_contig > 1) ? _PAGE_CONTIG : 0);
> > 
> > Here is an optional idea to make the code simpler. We could move the
> > flags changes (adding/removing _PAGE_CONTIG) to xen_pt_check_contig.
> > That way, we could remove the inner loop.
> > 
> > xen_pt_check_contig could check if _PAGE_CONTIG is already set and based
> > on alignment, it should be able to figure out when it needs to be
> > disabled.
> 
> My initial attempt was to do everything in a loop. But this didn't pan out as
> I wanted (I felt the code was complex) and there is extra work to be done for
> the next 31 entries (assuming 4KB granularity).
> 
> Hence the two loops. Unfortunately, I didn't keep my first attempt. So I can't
> really show what I wrote.

I trusted you that the resulting code with a single loop was worse.

Reviewed-by: Stefano Stabellini 



Re: [PATCH v3 04/19] xen/arm: mm: Allow other mapping size in xen_pt_update_entry()

2022-04-05 Thread Stefano Stabellini
On Sat, 2 Apr 2022, Julien Grall wrote:
> On 02/04/2022 00:35, Stefano Stabellini wrote:
> > > +/* Return the level where mapping should be done */
> > > +static int xen_pt_mapping_level(unsigned long vfn, mfn_t mfn, unsigned long nr,
> > > +unsigned int flags)
> > > +{
> > > +unsigned int level;
> > > +unsigned long mask;
> > 
> > Shouldn't mask be 64-bit on aarch32?
> 
> The 3 variables we will use (mfn, vfn, nr) are unsigned long. So it is fine to
> define the mask as unsigned long.

Good point


> > > +}
> > > +
> > >   static DEFINE_SPINLOCK(xen_pt_lock);
> > > static int xen_pt_update(unsigned long virt,
> > >mfn_t mfn,
> > > - unsigned long nr_mfns,
> > > + const unsigned long nr_mfns,
> > 
> > Why const? nr_mfns is an unsigned long so it is passed as value: it
> > couldn't change the caller's parameter anyway. Just curious.
> 
> Because nr_mfns is used to flush the TLBs. In the original version I made the
> mistake of decrementing the variable, and only discovered it later on when the
> TLB contained the wrong entry.
> 
> Such bugs tend to be very subtle and the root cause is hard to find. So it is
> better to mark the variable const to avoid any surprise.
> 
> The short version of what I wrote is in the commit message. I can write a
> small comment in the code if you want.

No, that's fine. Thanks for the explanation.


> > >unsigned int flags)
> > >   {
> > >   int rc = 0;
> > > -unsigned long addr = virt, addr_end = addr + nr_mfns * PAGE_SIZE;
> > > +unsigned long vfn = virt >> PAGE_SHIFT;
> > > +unsigned long left = nr_mfns;
> > > /*
> > >* For arm32, page-tables are different on each CPUs. Yet, they share
> > > @@ -1268,14 +1330,24 @@ static int xen_pt_update(unsigned long virt,
> > > spin_lock(&xen_pt_lock);
> > >   -for ( ; addr < addr_end; addr += PAGE_SIZE )
> > > +while ( left )
> > >   {
> > > -rc = xen_pt_update_entry(root, addr, mfn, flags);
> > > +unsigned int order, level;
> > > +
> > > +level = xen_pt_mapping_level(vfn, mfn, left, flags);
> > > +order = XEN_PT_LEVEL_ORDER(level);
> > > +
> > > +ASSERT(left >= BIT(order, UL));
> > > +
> > > +rc = xen_pt_update_entry(root, pfn_to_paddr(vfn), mfn, level, flags);
> > 
> > NIT: I know we don't have vfn_to_vaddr at the moment and there is no
> > widespread usage of vfn in Xen anyway, but it looks off to use
> > pfn_to_paddr on a vfn parameter. Maybe open-code pfn_to_paddr instead?
> > Or introduce vfn_to_vaddr locally in this file?
> 
> To avoid inconsistency with mfn_to_maddr() and gfn_to_gaddr(), I don't want to
> introduce vfn_to_vaddr() without the typesafe part. I think this is a bit
> over the top for now.
> 
> So I will open-code pfn_to_paddr().

Sounds good
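
The points raised in this review (an unsigned long mask, a const nr_mfns with a separate working counter, and an open-coded frame-to-address shift) can be put together in a standalone sketch. The body of xen_pt_mapping_level() is not shown in this thread, so `mapping_level()` below is only one plausible way to pick a level under assumed 4KB granularity; `SKETCH_FLAG_BLOCK`, `SKETCH_LEVEL_ORDER` and `count_entries()` are hypothetical names, and the plain `unsigned long mfn` ignores the INVALID_MFN handling a real implementation would need.

```c
#include <assert.h>

#define SKETCH_PAGE_SHIFT 12 /* Assumes 4KB pages. */

/* Order (in 4KB pages) covered by one entry at level 1, 2 or 3. */
#define SKETCH_LEVEL_ORDER(lvl) ((3u - (lvl)) * 9u)

/* Hypothetical flag asking for block (superpage) mappings. */
#define SKETCH_FLAG_BLOCK (1u << 0)

/*
 * Pick the page-table level for the next mapping. vfn, mfn and nr are
 * all unsigned long, so an unsigned long mask is wide enough.
 */
static unsigned int mapping_level(unsigned long vfn, unsigned long mfn,
                                  unsigned long nr, unsigned int flags)
{
    unsigned long mask = vfn | mfn;
    unsigned int level;

    if ( !(flags & SKETCH_FLAG_BLOCK) )
        return 3;

    /* Try the largest block first; fall back to a single page. */
    for ( level = 1; level < 3; level++ )
    {
        unsigned long size = 1UL << SKETCH_LEVEL_ORDER(level);

        if ( !(mask & (size - 1)) && nr >= size )
            return level;
    }

    return 3;
}

/*
 * Walk a range the way the reworked loop does. nr_mfns is const so it
 * cannot be decremented by mistake; 'left' is the working copy.
 * Returns the number of entries that would be written.
 */
static unsigned long count_entries(unsigned long virt, unsigned long mfn,
                                   const unsigned long nr_mfns,
                                   unsigned int flags)
{
    unsigned long vfn = virt >> SKETCH_PAGE_SHIFT;
    unsigned long left = nr_mfns;
    unsigned long entries = 0;

    while ( left )
    {
        unsigned int level = mapping_level(vfn, mfn, left, flags);
        unsigned long step = 1UL << SKETCH_LEVEL_ORDER(level);
        /* Open-coded frame-to-address conversion for a virtual frame. */
        unsigned long vaddr = vfn << SKETCH_PAGE_SHIFT;

        assert(left >= step);
        (void)vaddr; /* Would be handed to the per-entry update. */

        vfn += step;
        mfn += step;
        left -= step;
        entries++;
    }

    return entries;
}
```

With block mappings requested, an aligned 2MB range (512 pages) collapses into a single level-2 entry, while 512 pages plus one extra page take two entries; without the flag every page gets its own level-3 entry. Because nr_mfns is never modified, it remains usable for the final TLB flush of the whole range.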



Re: cleanup swiotlb initialization v8

2022-04-05 Thread Boris Ostrovsky



On 4/4/22 1:05 AM, Christoph Hellwig wrote:

Hi all,

this series tries to clean up the swiotlb initialization, including
that of swiotlb-xen.  To get there it also removes the x86 iommu table
infrastructure that massively obfuscates the initialization path.

Git tree:

 git://git.infradead.org/users/hch/misc.git swiotlb-init-cleanup

Gitweb:

 
http://git.infradead.org/users/hch/misc.git/shortlog/refs/heads/swiotlb-init-cleanup





Tested-by: Boris Ostrovsky 





Design meeting for AMD SEV-SNP project

2022-04-05 Thread Olivier Lambert
Hello everyone,


As announced during an earlier community call, I'm posting here to announce our
intention to bootstrap Xen support for AMD SEV-SNP technology. In short,
this hardware extension on AMD CPUs allows guests to run with encrypted memory,
except for explicitly permitted areas. The obvious example use case is
running this technology in the cloud, which raises the level of trust
since even Xen or Dom0 could not read an encrypted guest's memory.


For reference, here is our current "base" document to discuss further: 
https://cryptpad.fr/pad/#/2/pad/view/ApLTDJLGLG0mzKGIwrL9M0UVft5nTBnVre7eVAbIk00/


The first action that should be organized is a "design/discussion session" to 
clarify some choices and put things in motion with clear initial and achievable 
targets. It would be nice to meet relatively soon (ie: next week).


Here is a Doodle link we'll use to choose the best day to meet:

https://doodle.com/meeting/participate/id/dBBoPjkd


I deliberately picked an hour that suits both US- and EU-based
people: the same hour as the Xen community call, which is 3:00 PM UTC,
4:00 PM London time, 5:00 PM Paris time and 11:00 AM New York time.


I will leave the Doodle open until the end of the week, then let you know
here which day was selected.

Regarding the meeting location: https://meet.vates.fr/sev (Jitsi powered)


Let me know if you need any other information or if you have any questions :)



Regards,







Olivier Lambert | Vates CEO
XCP-ng & Xen Orchestra - Vates solutions
w: vates.fr | xcp-ng.org | xen-orchestra.com


Re: [PATCH v2] Grab the EFI System Resource Table and check it

2022-04-05 Thread Luca Fancellu


> On 5 Apr 2022, at 20:21, Stefano Stabellini  wrote:
> 
> On Mon, 4 Apr 2022, Luca Fancellu wrote:
>>> On 2 Apr 2022, at 00:14, Demi Marie Obenour  
>>> wrote:
>>> 
>>> The EFI System Resource Table (ESRT) is necessary for fwupd to identify
>>> firmware updates to install.  According to the UEFI specification §23.4,
>>> the table shall be stored in memory of type EfiBootServicesData.
>>> Therefore, Xen must avoid reusing that memory for other purposes, so
>>> that Linux can access the ESRT.  Additionally, Xen must mark the memory
>>> as reserved, so that Linux knows accessing it is safe.
>>> 
>>> See https://lore.kernel.org/xen-devel/20200818184018.GN1679@mail-itl/T/
>>> for details.
>>> 
>>> Signed-off-by: Demi Marie Obenour 
>> 
>> Hi,
>> 
>> I’ve tested the patch on an arm machine booting Xen+Dom0 through EFI,
>> unfortunately I could not test the functionality.
> 
> I understand you couldn't test ESRT but did the basic Xen+Dom0 boot via
> EFI on ARM work?

Yes, I realise now I should have added *and it works* before the comma;
without it, the sentence is misleading.

Cheers,
Luca

Re: [PATCH v2] Grab the EFI System Resource Table and check it

2022-04-05 Thread Stefano Stabellini
On Mon, 4 Apr 2022, Luca Fancellu wrote:
> > On 2 Apr 2022, at 00:14, Demi Marie Obenour wrote:
> > 
> > The EFI System Resource Table (ESRT) is necessary for fwupd to identify
> > firmware updates to install.  According to the UEFI specification §23.4,
> > the table shall be stored in memory of type EfiBootServicesData.
> > Therefore, Xen must avoid reusing that memory for other purposes, so
> > that Linux can access the ESRT.  Additionally, Xen must mark the memory
> > as reserved, so that Linux knows accessing it is safe.
> > 
> > See https://lore.kernel.org/xen-devel/20200818184018.GN1679@mail-itl/T/
> > for details.
> > 
> > Signed-off-by: Demi Marie Obenour 
> 
> Hi,
> 
> I’ve tested the patch on an arm machine booting Xen+Dom0 through EFI,
> unfortunately I could not test the functionality.

I understand you couldn't test ESRT but did the basic Xen+Dom0 boot via
EFI on ARM work?

Re: [PATCH 1/2] xsm: add ability to elevate a domain to privileged

2022-04-05 Thread Daniel P. Smith
On 4/5/22 13:17, Jason Andryuk wrote:
> On Mon, Apr 4, 2022 at 11:34 AM Daniel P. Smith
>  wrote:
>>
>> On 3/31/22 09:16, Jason Andryuk wrote:
>>> On Wed, Mar 30, 2022 at 3:05 PM Daniel P. Smith
>>>  wrote:

>>>> There are now instances where internal hypervisor logic needs to make resource
>>>> allocation calls that are protected by XSM checks. The internal hypervisor logic
>>>> is represented by a number of system domains, which by design are represented by
>>>> non-privileged struct domain instances. To enable these logic blocks to
>>>> function correctly but in a controlled manner, this commit introduces a pair
>>>> of privilege escalation and demotion functions that will make a system domain
>>>> privileged and then remove that privilege.
>>>>
>>>> Signed-off-by: Daniel P. Smith
>>>> ---
>>>>  xen/include/xsm/xsm.h | 22 ++
>>>>  1 file changed, 22 insertions(+)
>>>>
>>>> diff --git a/xen/include/xsm/xsm.h b/xen/include/xsm/xsm.h
>>>> index e22d6160b5..157e57151e 100644
>>>> --- a/xen/include/xsm/xsm.h
>>>> +++ b/xen/include/xsm/xsm.h
>>>> @@ -189,6 +189,28 @@ struct xsm_operations {
>>>>  #endif
>>>>  };
>>>>
>>>> +static always_inline int xsm_elevate_priv(struct domain *d)
>>>> +{
>>>> +    if ( is_system_domain(d) )
>>>> +    {
>>>> +        d->is_privileged = true;
>>>> +        return 0;
>>>> +    }
>>>> +
>>>> +    return -EPERM;
>>>> +}
>>>
>>> These look sufficient for the default policy, but they don't seem
>>> sufficient for Flask.  I think you need to create a new XSM hook.  For
>>> Flask, you would want the demote hook to transition xen_boot_t ->
>>> xen_t.  That would start xen_boot_t with privileges that are dropped
>>> in a one-way transition.  Does that require all policies to then have
>>> xen_boot_t and xen_t?  I guess it does unless the hook code has some
>>> logic to skip the transition.
>>
>> I am still thinking this through but my initial concern for Flask is
>> that I don't think we want dedicated domain transitions directly in
>> code. My current thinking would be to use a Kconfig to use xen_boot_t
>> type as the initial sid for the idle domain which would then require the
>> default policy to include an allowed transition from xen_boot_t to
>> xen_t. Then rely upon a boot domain to issue an xsm_op to do a relabel
>> transition for the idle domain with an assertion that the idle domain is
>> no longer labeled with its initial sid before Xen transitions its state
>> to SYS_STATE_active. The one wrinkle to this is whether I will be able
>> to schedule the boot domain before allowing Xen to transition into
>> SYS_STATE_active.
> 
> That is an interesting approach.  While it would work, I find it
> unusual that a domain would relabel Xen.  I think Xen should be
> responsible for itself and not rely on a domain for this operation.

The boot domain is not a general domain as no domain can/should be
created with its domid or flask label post transition to
SYS_STATE_active. Its purpose was specifically meant to be a natural way
to push out complicated pre-execution domain configuration from having
to be in the hypervisor code. Therefore in a way it can be considered a
user-provided, de-privileged part of the hypervisor.

With that said, I just realized a flaw in the basis of my position. What
is the difference between codifying a check that the idle domain is not
the boot label versus codifying a transition from the boot label to the
running label? None really; both will require some knowledge that there
is a boot label and some running label. Combined with the fact that the
idle domain really shouldn't have any label other than xen_t, I will
work out how to incorporate the domain transition.
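
To make the idea concrete, here is a minimal, self-contained sketch of a
one-way transition from a boot label to a running label, with an assertion
before the system is allowed to go active. This is not Xen or Flask code:
toy_label, toy_domain, toy_demote and toy_enter_active are names invented
for the illustration, and LABEL_BOOT / LABEL_RUNNING merely stand in for
xen_boot_t and xen_t.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical labels standing in for xen_boot_t and xen_t. */
enum toy_label { LABEL_BOOT, LABEL_RUNNING };

/* Hypothetical stand-in for the idle domain's security state. */
struct toy_domain {
    enum toy_label label;
};

/*
 * One-way transition: boot label -> running label.
 * Returns 0 on success and -EPERM if the domain has already left the
 * boot label, so the transition can be neither repeated nor reversed.
 */
static int toy_demote(struct toy_domain *d)
{
    if ( d->label != LABEL_BOOT )
        return -EPERM;

    d->label = LABEL_RUNNING;
    return 0;
}

/* Refuses (via assert) to mark the system active while still boot-labelled. */
static void toy_enter_active(const struct toy_domain *idle, int *sys_active)
{
    assert(idle->label == LABEL_RUNNING);
    *sys_active = 1;
}
```

Either way the code has to know both labels, which is the point made above:
the check and the transition carry the same knowledge.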

>>> For the default policy, you could start by creating the system domains
>>> as privileged and just have a single hook to drop privs.  Then you
>>> don't have to worry about the "elevate" hook existing.  The patch 2
>>> asserts could instead become the location of xsm_drop_privs calls to
>>> have a clear demarcation point.  That expands the window with
>>> privileges though.  It's a little simpler, but maybe you don't want
>>> that.  However, it seems like you can only depriv once for the Flask
>>> case since you want it to be one-way.
>>
>> This does simplify the solution and since today we cannot differentiate
>> between hypervisor setup and hypervisor initiated domain construction
>> contexts, it does not run counter to what I have proposed. As for flask,
>> again I do not believe codifying a domain transition bound to a new XSM
>> op is the appropriate approach.
> 
> This hard coded domain transition does feel a little weird.  But it
> seems like a natural consequence of trying to use Flask to
> deprivilege.  I guess the transition could be behind a
> dom0less/hyperlaunch Kconfig option.  I just don't see a way around it
> in some fashion with Flask enforcing.
> 
> Another idea: Flask could start in permissive and only transition to
> 

[xen-unstable-smoke test] 169183: tolerable all pass - PUSHED

2022-04-05 Thread osstest service owner
flight 169183 xen-unstable-smoke real [real]
http://logs.test-lab.xenproject.org/osstest/logs/169183/

Failures :-/ but no regressions.

Tests which did not succeed, but are not blocking:
 test-amd64-amd64-libvirt     15 migrate-support-check        fail   never pass
 test-arm64-arm64-xl-xsm      15 migrate-support-check        fail   never pass
 test-arm64-arm64-xl-xsm      16 saverestore-support-check    fail   never pass
 test-armhf-armhf-xl          15 migrate-support-check        fail   never pass
 test-armhf-armhf-xl          16 saverestore-support-check    fail   never pass

version targeted for testing:
 xen  14dd241aad8af447680ac73e8579990e2c09c1e7
baseline version:
 xen  120e26c2bb0097a589d718b1b58d7052ccce4458

Last test of basis   169175  2022-04-05 10:01:52 Z    0 days
Testing same since   169183  2022-04-05 14:01:59 Z    0 days    1 attempts


People who touched revisions under test:
  Jan Beulich 
  Roger Pau Monne 
  Roger Pau Monné 

jobs:
 build-arm64-xsm                                              pass
 build-amd64                                                  pass
 build-armhf                                                  pass
 build-amd64-libvirt                                          pass
 test-armhf-armhf-xl                                          pass
 test-arm64-arm64-xl-xsm                                      pass
 test-amd64-amd64-xl-qemuu-debianhvm-amd64                    pass
 test-amd64-amd64-libvirt                                     pass



sg-report-flight on osstest.test-lab.xenproject.org
logs: /home/logs/logs
images: /home/logs/images

Logs, config files, etc. are available at
http://logs.test-lab.xenproject.org/osstest/logs

Explanation of these reports, and of osstest in general, is at
http://xenbits.xen.org/gitweb/?p=osstest.git;a=blob;f=README.email;hb=master
http://xenbits.xen.org/gitweb/?p=osstest.git;a=blob;f=README;hb=master

Test harness code can be found at
http://xenbits.xen.org/gitweb?p=osstest.git;a=summary


Pushing revision :

To xenbits.xen.org:/home/xen/git/xen.git
   120e26c2bb..14dd241aad  14dd241aad8af447680ac73e8579990e2c09c1e7 -> smoke



[ovmf test] 169177: regressions - FAIL

2022-04-05 Thread osstest service owner
flight 169177 ovmf real [real]
http://logs.test-lab.xenproject.org/osstest/logs/169177/

Regressions :-(

Tests which did not succeed and are blocking,
including tests which could not be run:
 build-amd64                   6 xen-build                fail REGR. vs. 168254
 build-amd64-xsm               6 xen-build                fail REGR. vs. 168254
 build-i386-xsm                6 xen-build                fail REGR. vs. 168254
 build-i386                    6 xen-build                fail REGR. vs. 168254

Tests which did not succeed, but are not blocking:
 build-amd64-libvirt   1 build-check(1)   blocked  n/a
 build-i386-libvirt1 build-check(1)   blocked  n/a
 test-amd64-amd64-xl-qemuu-ovmf-amd64  1 build-check(1) blocked n/a
 test-amd64-i386-xl-qemuu-ovmf-amd64  1 build-check(1)  blocked n/a

version targeted for testing:
 ovmf a298a84478053872ed9da660a75f182ce81b8ddc
baseline version:
 ovmf b1b89f9009f2390652e0061bd7b24fc40732bc70

Last test of basis   168254  2022-02-28 10:41:46 Z   36 days
Failing since        168258  2022-03-01 01:55:31 Z   35 days  280 attempts
Testing same since   169173  2022-04-05 05:13:00 Z    0 days    2 attempts


People who touched revisions under test:
  Abdul Lateef Attar 
  Abdul Lateef Attar via groups.io 
  Abner Chang 
  Akihiko Odaki 
  Anthony PERARD 
  Bob Feng 
  Gerd Hoffmann 
  Guo Dong 
  Guomin Jiang 
  Hao A Wu 
  Hua Ma 
  Huang, Li-Xia 
  Jagadeesh Ujja 
  Jason 
  Jason Lou 
  Ken Lautner 
  Kenneth Lautner 
  Kuo, Ted 
  Laszlo Ersek 
  Leif Lindholm 
  Li, Zhihao 
  Liming Gao 
  Liu 
  Liu Yun 
  Liu Yun Y 
  Lixia Huang 
  Lou, Yun 
  Ma, Hua 
  Mara Sophie Grosch 
  Mara Sophie Grosch via groups.io 
  Matt DeVillier 
  Michael D Kinney 
  Michael Kubacki 
  Michael Kubacki 
  Min Xu 
  Patrick Rudolph 
  Purna Chandra Rao Bandaru 
  Ray Ni 
  Sami Mujawar 
  Sean Rhodes 
  Sean Rhodes sean@starlabs.systems
  Sebastien Boeuf 
  Sunny Wang 
  Ted Kuo 
  Wenyi Xie 
  wenyi,xie via groups.io 
  Xiaolu.Jiang 
  Xie, Yuanhao 
  Yi Li 
  Yuanhao Xie 
  Zhihao Li 

jobs:
 build-amd64-xsm                                              fail
 build-i386-xsm                                               fail
 build-amd64                                                  fail
 build-i386                                                   fail
 build-amd64-libvirt                                          blocked
 build-i386-libvirt                                           blocked
 build-amd64-pvops                                            pass
 build-i386-pvops                                             pass
 test-amd64-amd64-xl-qemuu-ovmf-amd64                         blocked
 test-amd64-i386-xl-qemuu-ovmf-amd64                          blocked



sg-report-flight on osstest.test-lab.xenproject.org
logs: /home/logs/logs
images: /home/logs/images

Logs, config files, etc. are available at
http://logs.test-lab.xenproject.org/osstest/logs

Explanation of these reports, and of osstest in general, is at
http://xenbits.xen.org/gitweb/?p=osstest.git;a=blob;f=README.email;hb=master
http://xenbits.xen.org/gitweb/?p=osstest.git;a=blob;f=README;hb=master

Test harness code can be found at
http://xenbits.xen.org/gitweb?p=osstest.git;a=summary


Not pushing.

(No revision log; it would be 4610 lines long.)



Re: [PATCH 1/2] xsm: add ability to elevate a domain to privileged

2022-04-05 Thread Jason Andryuk
On Mon, Apr 4, 2022 at 11:34 AM Daniel P. Smith
 wrote:
>
> On 3/31/22 09:16, Jason Andryuk wrote:
> > On Wed, Mar 30, 2022 at 3:05 PM Daniel P. Smith
> >  wrote:
> >>
> >> There are now instances where internal hypervisor logic needs to make resource
> >> allocation calls that are protected by XSM checks. The internal hypervisor logic
> >> is represented by a number of system domains, which by design are represented by
> >> non-privileged struct domain instances. To enable these logic blocks to
> >> function correctly but in a controlled manner, this commit introduces a pair
> >> of privilege escalation and demotion functions that will make a system domain
> >> privileged and then remove that privilege.
> >>
> >> Signed-off-by: Daniel P. Smith
> >> ---
> >>  xen/include/xsm/xsm.h | 22 ++
> >>  1 file changed, 22 insertions(+)
> >>
> >> diff --git a/xen/include/xsm/xsm.h b/xen/include/xsm/xsm.h
> >> index e22d6160b5..157e57151e 100644
> >> --- a/xen/include/xsm/xsm.h
> >> +++ b/xen/include/xsm/xsm.h
> >> @@ -189,6 +189,28 @@ struct xsm_operations {
> >>  #endif
> >>  };
> >>
> >> +static always_inline int xsm_elevate_priv(struct domain *d)
> >> +{
> >> +    if ( is_system_domain(d) )
> >> +    {
> >> +        d->is_privileged = true;
> >> +        return 0;
> >> +    }
> >> +
> >> +    return -EPERM;
> >> +}
> >
> > These look sufficient for the default policy, but they don't seem
> > sufficient for Flask.  I think you need to create a new XSM hook.  For
> > Flask, you would want the demote hook to transition xen_boot_t ->
> > xen_t.  That would start xen_boot_t with privileges that are dropped
> > in a one-way transition.  Does that require all policies to then have
> > xen_boot_t and xen_t?  I guess it does unless the hook code has some
> > logic to skip the transition.
>
> I am still thinking this through but my initial concern for Flask is
> that I don't think we want dedicated domain transitions directly in
> code. My current thinking would be to use a Kconfig to use xen_boot_t
> type as the initial sid for the idle domain which would then require the
> default policy to include an allowed transition from xen_boot_t to
> xen_t. Then rely upon a boot domain to issue an xsm_op to do a relabel
> transition for the idle domain with an assertion that the idle domain is
> no longer labeled with its initial sid before Xen transitions its state
> to SYS_STATE_active. The one wrinkle to this is whether I will be able
> to schedule the boot domain before allowing Xen to transition into
> SYS_STATE_active.

That is an interesting approach.  While it would work, I find it
unusual that a domain would relabel Xen.  I think Xen should be
responsible for itself and not rely on a domain for this operation.

> > For the default policy, you could start by creating the system domains
> > as privileged and just have a single hook to drop privs.  Then you
> > don't have to worry about the "elevate" hook existing.  The patch 2
> > asserts could instead become the location of xsm_drop_privs calls to
> > have a clear demarcation point.  That expands the window with
> > privileges though.  It's a little simpler, but maybe you don't want
> > that.  However, it seems like you can only depriv once for the Flask
> > case since you want it to be one-way.
>
> This does simplify the solution and since today we cannot differentiate
> between hypervisor setup and hypervisor initiated domain construction
> contexts, it does not run counter to what I have proposed. As for flask,
> again I do not believe codifying a domain transition bound to a new XSM
> op is the appropriate approach.

This hard coded domain transition does feel a little weird.  But it
seems like a natural consequence of trying to use Flask to
deprivilege.  I guess the transition could be behind a
dom0less/hyperlaunch Kconfig option.  I just don't see a way around it
in some fashion with Flask enforcing.

Another idea: Flask could start in permissive and only transition to
enforcing at the deprivilege point.  Kinda gross, but it works without
needing a transition.

To reiterate, XSM isn't really appropriate to enforce anything
internal to Xen.  We are working around the need to go through hook
points during correct operation.  Code exec in Xen means all bets are
off.  Memory writes to Xen data mean the XSM checks can be disabled
(flip Flask to permissive) or bypassed (set d->is_privileged or change
d->ssid).  We shouldn't lose sight of this when we talk about
deprivileging the idle domain.

Regards,
Jason



[libvirt test] 169171: regressions - FAIL

2022-04-05 Thread osstest service owner
flight 169171 libvirt real [real]
http://logs.test-lab.xenproject.org/osstest/logs/169171/

Regressions :-(

Tests which did not succeed and are blocking,
including tests which could not be run:
 build-amd64-libvirt   6 libvirt-build            fail REGR. vs. 151777
 build-i386-libvirt    6 libvirt-build            fail REGR. vs. 151777
 build-arm64-libvirt   6 libvirt-build            fail REGR. vs. 151777
 build-armhf-libvirt   6 libvirt-build            fail REGR. vs. 151777

Tests which did not succeed, but are not blocking:
 test-amd64-amd64-libvirt  1 build-check(1)   blocked  n/a
 test-amd64-amd64-libvirt-pair  1 build-check(1)   blocked  n/a
 test-amd64-amd64-libvirt-qemuu-debianhvm-amd64-xsm 1 build-check(1) blocked n/a
 test-amd64-amd64-libvirt-vhd  1 build-check(1)   blocked  n/a
 test-amd64-amd64-libvirt-xsm  1 build-check(1)   blocked  n/a
 test-amd64-i386-libvirt   1 build-check(1)   blocked  n/a
 test-amd64-i386-libvirt-pair  1 build-check(1)   blocked  n/a
 test-amd64-i386-libvirt-qemuu-debianhvm-amd64-xsm 1 build-check(1) blocked n/a
 test-amd64-i386-libvirt-raw   1 build-check(1)   blocked  n/a
 test-amd64-i386-libvirt-xsm   1 build-check(1)   blocked  n/a
 test-arm64-arm64-libvirt  1 build-check(1)   blocked  n/a
 test-arm64-arm64-libvirt-qcow2  1 build-check(1)   blocked  n/a
 test-arm64-arm64-libvirt-raw  1 build-check(1)   blocked  n/a
 test-armhf-armhf-libvirt-raw  1 build-check(1)   blocked  n/a
 test-arm64-arm64-libvirt-xsm  1 build-check(1)   blocked  n/a
 test-armhf-armhf-libvirt  1 build-check(1)   blocked  n/a
 test-armhf-armhf-libvirt-qcow2  1 build-check(1)   blocked  n/a

version targeted for testing:
 libvirt  5d0eeb8cd7088e575c0a2b1d5759ccfb72c525c9
baseline version:
 libvirt  2c846fa6bcc11929c9fb857a22430fb9945654ad

Last test of basis   151777  2020-07-10 04:19:19 Z  634 days
Failing since        151818  2020-07-11 04:18:52 Z  633 days  615 attempts
Testing same since   169171  2022-04-05 04:21:54 Z    0 days    1 attempts


People who touched revisions under test:
  Adolfo Jayme Barrientos 
  Aleksandr Alekseev 
  Aleksei Zakharov 
  Amneesh Singh 
  Andika Triwidada 
  Andrea Bolognani 
  Ani Sinha 
  Balázs Meskó 
  Barrett Schonefeld 
  Bastian Germann 
  Bastien Orivel 
  BiaoXiang Ye 
  Bihong Yu 
  Binfeng Wu 
  Bjoern Walk 
  Boris Fiuczynski 
  Brad Laue 
  Brian Turek 
  Bruno Haible 
  Chris Mayo 
  Christian Borntraeger 
  Christian Ehrhardt 
  Christian Kirbach 
  Christian Schoenebeck 
  Christophe Fergeau 
  Claudio Fontana 
  Cole Robinson 
  Collin Walling 
  Cornelia Huck 
  Cédric Bosdonnat 
  Côme Borsoi 
  Daniel Henrique Barboza 
  Daniel Letai 
  Daniel P. Berrange 
  Daniel P. Berrangé 
  Didik Supriadi 
  dinglimin 
  Divya Garg 
  Dmitrii Shcherbakov 
  Dmytro Linkin 
  Eiichi Tsukata 
  Emilio Herrera 
  Eric Farman 
  Erik Skultety 
  Fabian Affolter 
  Fabian Freyer 
  Fabiano Fidêncio 
  Fangge Jin 
  Farhan Ali 
  Fedora Weblate Translation 
  Franck Ridel 
  Gavi Teitz 
  gongwei 
  Guoyi Tu
  Göran Uddeborg 
  Halil Pasic 
  Han Han 
  Hao Wang 
  Haonan Wang 
  Hela Basa 
  Helmut Grohne 
  Hiroki Narukawa 
  Hyman Huang(黄勇) 
  Ian Wienand 
  Ioanna Alifieraki 
  Ivan Teterevkov 
  Jakob Meng 
  Jamie Strandboge 
  Jamie Strandboge 
  Jan Kuparinen 
  jason lee 
  Jean-Baptiste Holcroft 
  Jia Zhou 
  Jianan Gao 
  Jim Fehlig 
  Jin Yan 
  Jing Qi 
  Jinsheng Zhang 
  Jiri Denemark 
  Joachim Falk 
  John Ferlan 
  John Levon 
  John Levon 
  Jonathan Watt 
  Jonathon Jongsma 
  Julio Faracco 
  Justin Gatzen 
  Ján Tomko 
  Kashyap Chamarthy 
  Kevin Locke 
  Kim InSoo 
  Koichi Murase 
  Kristina Hanicova 
  Laine Stump 
  Laszlo Ersek 
  Lee Yarwood 
  Lei Yang 
  Liao Pingfang 
  Lin Ma 
  Lin Ma 
  Lin Ma 
  Liu Yiding 
  Lubomir Rintel 
  Luke Yue 
  Luyao Zhong 
  Marc Hartmayer 
  Marc-André Lureau 
  Marek Marczykowski-Górecki 
  Markus Schade 
  Martin Kletzander 
  Martin Pitt 
  Masayoshi Mizuma 
  Matej Cepl 
  Matt Coleman 
  Matt Coleman 
  Mauro Matteo Cascella 
  Meina Li 
  Michal Privoznik 
  Michał Smyk 
  Milo Casagrande 
  Moshe Levi 
  Muha Aliss 
  Nathan 
  Neal Gompa 
  Nick Chevsky 
  Nick Shyrokovskiy 
  Nickys Music Group 
  Nico Pache 
  Nicolas Lécureuil 
  Nicolas Lécureuil 
  Nikolay Shirokovskiy 
  Olaf Hering 
  Olesya Gerasimenko 
  Or Ozeri 
  Orion Poplawski 
  Pany 
  Paolo Bonzini 
  Patrick Magauran 
  Paulo de Rezende Pinatti 
  Pavel Hrdina 
  Peng Liang 
  Peter Krempa 
  Pino Toscano 
  Pino Toscano 
  Piotr Drąg 
  Prathamesh Chavan 
  Praveen K Paladugu 
  Richard W.M. Jones 
  Ricky Tigg 
  Robin Lee 
  Rohit Kumar 
  Roman Bogorodskiy 
  Roman Bolshakov 
  Ryan Gahagan 
  Ryan 

Re: Increasing domain memory beyond initial maxmem

2022-04-05 Thread Marek Marczykowski-Górecki
On Tue, Apr 05, 2022 at 01:03:57PM +0200, Juergen Gross wrote:
> Hi Marek,
> 
> On 31.03.22 14:36, Marek Marczykowski-Górecki wrote:
> > On Thu, Mar 31, 2022 at 02:22:03PM +0200, Juergen Gross wrote:
> > > Maybe some kernel config differences, or other udev rules (memory onlining
> > > is done via udev in my guest)?
> > > 
> > > I'm seeing:
> > > 
> > > # zgrep MEMORY_HOTPLUG /proc/config.gz
> > > CONFIG_ARCH_ENABLE_MEMORY_HOTPLUG=y
> > > CONFIG_MEMORY_HOTPLUG=y
> > > # CONFIG_MEMORY_HOTPLUG_DEFAULT_ONLINE is not set
> > > CONFIG_XEN_BALLOON_MEMORY_HOTPLUG=y
> > > CONFIG_XEN_MEMORY_HOTPLUG_LIMIT=512
> > 
> > I have:
> > # zgrep MEMORY_HOTPLUG /proc/config.gz
> > CONFIG_ARCH_ENABLE_MEMORY_HOTPLUG=y
> > CONFIG_MEMORY_HOTPLUG=y
> > CONFIG_MEMORY_HOTPLUG_DEFAULT_ONLINE=y
> > CONFIG_XEN_BALLOON_MEMORY_HOTPLUG=y
> > CONFIG_XEN_MEMORY_HOTPLUG_LIMIT=512
> > 
> > Not sure if relevant, but I also have:
> > CONFIG_XEN_UNPOPULATED_ALLOC=y
> > 
> > on top of that, I have a similar udev rule too:
> > 
> > SUBSYSTEM=="memory", ACTION=="add", ATTR{state}=="offline", ATTR{state}="online"
> > 
> > But I don't think they are conflicting.
> > 
> > > What type of guest are you using? Mine was a PVH guest.
> > 
> > PVH here too.
> 
> Would you like to try the attached patch? It seemed to work for me.

Unfortunately it doesn't help, now the behavior is different:

Initially guest started with 800M:

[root@personal ~]# free -m
              total        used        free      shared  buff/cache   available
Mem:            740         223         272           2         243         401
Swap:          1023           0        1023

Then increased:

[root@dom0 ~]$ xl mem-max personal 2048
[root@dom0 ~]$ xenstore-write /local/domain/$(xl domid personal)/memory/static-max $((2048*1024))
[root@dom0 ~]$ xl mem-set personal 2000

The guest now shows only a little more memory, but not the full 2000M:

[root@personal ~]# [   37.657046] xen:balloon: Populating new zone
[   37.658206] Fallback order for Node 0: 0 
[   37.658219] Built 1 zonelists, mobility grouping on.  Total pages: 175889
[   37.658233] Policy zone: Normal

[root@personal ~]# 
[root@personal ~]# free -m
              total        used        free      shared  buff/cache   available
Mem:            826         245         337           2         244         462
Swap:          1023           0        1023


I've applied the patch on top of 5.16.18. If you think 5.17 would make a
difference, I can try that too.
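
For reference, the unit arithmetic behind the commands and log above can be
sketched in C. The helper names are made up for this illustration, and the
page conversion assumes 4 KiB pages. Under that assumption, the 2048 MiB
limit corresponds to the 2097152 written to static-max, and the
"Total pages: 175889" line would correspond to roughly 687 MiB.

```c
#include <assert.h>
#include <stdint.h>

/* Convert MiB to KiB, matching the $((2048*1024)) shell arithmetic. */
static uint64_t mib_to_kib(uint64_t mib)
{
    return mib * 1024;
}

/* Convert a page count to whole MiB (rounds down). */
static uint64_t pages_to_mib(uint64_t pages)
{
    const uint64_t page_kib = 4; /* assumption: 4 KiB page size */

    return pages * page_kib / 1024;
}

/* Usage example with the numbers from the commands and log above. */
static void units_example(void)
{
    assert(mib_to_kib(2048) == 2097152);
    assert(pages_to_mib(175889) == 687);
}
```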


-- 
Best Regards,
Marek Marczykowski-Górecki
Invisible Things Lab




Re: [PATCH v4 3/8] x86/EFI: retrieve EDID

2022-04-05 Thread Roger Pau Monné
On Tue, Apr 05, 2022 at 04:36:53PM +0200, Jan Beulich wrote:
> On 05.04.2022 12:27, Roger Pau Monné wrote:
> > On Thu, Mar 31, 2022 at 11:45:36AM +0200, Jan Beulich wrote:
> >> --- a/xen/arch/x86/efi/efi-boot.h
> >> +++ b/xen/arch/x86/efi/efi-boot.h
> >> @@ -568,6 +568,49 @@ static void __init efi_arch_video_init(E
> >>  #endif
> >>  }
> >>  
> >> +#ifdef CONFIG_VIDEO
> >> +static bool __init copy_edid(const void *buf, unsigned int size)
> >> +{
> >> +    /*
> >> +     * Be conservative - for both undersized and oversized blobs it is unclear
> >> +     * what to actually do with them. The more that unlike the VESA BIOS
> >> +     * interface we also have no associated "capabilities" value (which might
> >> +     * carry a hint as to possible interpretation).
> >> +     */
> >> +    if ( size != ARRAY_SIZE(boot_edid_info) )
> >> +        return false;
> >> +
> >> +    memcpy(boot_edid_info, buf, size);
> >> +    boot_edid_caps = 0;
> >> +
> >> +    return true;
> >> +}
> >> +#endif
> >> +
> >> +static void __init efi_arch_edid(EFI_HANDLE gop_handle)
> >> +{
> >> +#ifdef CONFIG_VIDEO
> >> +    static EFI_GUID __initdata active_guid = EFI_EDID_ACTIVE_PROTOCOL_GUID;
> >> +    static EFI_GUID __initdata discovered_guid = EFI_EDID_DISCOVERED_PROTOCOL_GUID;
> > 
> > Is there a need to make those static?
> > 
> > I think this function is either called from efi_start or
> > efi_multiboot, but there aren't multiple calls to it? (also both
> > parameters are IN only, so not to be changed by the EFI method?)
> > 
> > I have the feeling setting them to static is done because they can't
> > be set to const?
> 
> Even if they could be const, they ought to also be static. They don't
> strictly need to be, but without "static" code will be generated to
> populate the on-stack variables; quite possibly the compiler would
> even allocate an unnamed static variable and memcpy() from there onto
> the stack.

I thought that making those const (and then annotate with __initconst)
would already have the same effect as having it static, as there will
be no memcpy in that case either.
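
As an illustration of the storage-class distinction under discussion, here
is a small standalone sketch (not Xen code). struct toy_guid, TOY_GUID_INIT
and the function names are invented for the example. Whether a compiler
really emits a run-time copy for the automatic variant depends on the
compiler and optimisation level, so this only shows the semantic difference.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical 16-byte GUID type, standing in for EFI_GUID. */
struct toy_guid {
    unsigned char bytes[16];
};

/* Made-up value; the remaining bytes are zero-initialised. */
#define TOY_GUID_INIT { { 0x1c, 0x0c, 0x34, 0xf6 } }

/* Static storage: a single object, initialised before the program runs. */
static const struct toy_guid *guid_static(void)
{
    static const struct toy_guid guid = TOY_GUID_INIT;

    return &guid;
}

/*
 * Automatic storage: conceptually a fresh object per call, which has to be
 * populated at run time unless the compiler can optimise the copy away.
 */
static struct toy_guid guid_automatic(void)
{
    const struct toy_guid guid = TOY_GUID_INIT;

    return guid;
}

/* Usage example: same contents either way, but only one static object. */
static void toy_guid_example(void)
{
    struct toy_guid copy = guid_automatic();

    assert(guid_static() == guid_static());
    assert(memcmp(&copy, guid_static(), sizeof(copy)) == 0);
}
```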

> >> +    EFI_EDID_ACTIVE_PROTOCOL *active_edid;
> >> +    EFI_EDID_DISCOVERED_PROTOCOL *discovered_edid;
> >> +    EFI_STATUS status;
> >> +
> >> +    status = efi_bs->OpenProtocol(gop_handle, &active_guid,
> >> +                                  (void **)&active_edid, efi_ih, NULL,
> >> +                                  EFI_OPEN_PROTOCOL_GET_PROTOCOL);
> >> +    if ( status == EFI_SUCCESS &&
> >> +         copy_edid(active_edid->Edid, active_edid->SizeOfEdid) )
> >> +        return;
> > 
> > Isn't it enough to just call EFI_EDID_ACTIVE_PROTOCOL_GUID?
> > 
> > From my reading of the UEFI spec this will either return
> > EFI_EDID_OVERRIDE_PROTOCOL_GUID or EFI_EDID_DISCOVERED_PROTOCOL_GUID.
> > If EFI_EDID_OVERRIDE_PROTOCOL is set it must be used, and hence
> > falling back to EFI_EDID_DISCOVERED_PROTOCOL_GUID if
> > EFI_EDID_ACTIVE_PROTOCOL_GUID cannot be parsed would likely mean
> > ignoring EFI_EDID_OVERRIDE_PROTOCOL?
> 
> That's the theory. As per one of the post-commit-message remarks I had
> looked at what GrUB does, and I decided to follow its behavior in this
> regard, assuming they do what they do to work around quirks. As said
> in the remark, I didn't want to go as far as also cloning their use of
> the undocumented (afaik) "agp-internal-edid" variable.

Could you add this as a comment here? So it's not lost on commit as
being just a post-commit log remark. With that:

Acked-by: Roger Pau Monné 

Thanks, Roger.



Re: [PATCH 1/2] xsm: add ability to elevate a domain to privileged

2022-04-05 Thread Roger Pau Monné
On Tue, Apr 05, 2022 at 08:06:31AM -0400, Daniel P. Smith wrote:
> On 4/5/22 03:42, Roger Pau Monné wrote:
> > On Mon, Apr 04, 2022 at 12:08:25PM -0400, Daniel P. Smith wrote:
> >> On 4/4/22 11:12, Roger Pau Monné wrote:
> >>> On Mon, Apr 04, 2022 at 10:21:18AM -0400, Daniel P. Smith wrote:
> >>>> On 3/31/22 08:36, Roger Pau Monné wrote:
> >>>>> On Wed, Mar 30, 2022 at 07:05:48PM -0400, Daniel P. Smith wrote:
> >>>>>> diff --git a/xen/include/xsm/xsm.h b/xen/include/xsm/xsm.h
> >>>>>> index e22d6160b5..157e57151e 100644
> >>>>>> --- a/xen/include/xsm/xsm.h
> >>>>>> +++ b/xen/include/xsm/xsm.h
> >>>>>> @@ -189,6 +189,28 @@ struct xsm_operations {
> >>>>>>  #endif
> >>>>>>  };
> >>>>>>
> >>>>>> +static always_inline int xsm_elevate_priv(struct domain *d)
> >>>>>
> >>>>> I don't think it needs to be always_inline, using just inline would be
> >>>>> fine IMO.
> >>>>>
> >>>>> Also this needs to be __init.
> >>>>
> >>>> AIUI always_inline is likely the best way to preserve the speculation
> >>>> safety brought in by the call to is_system_domain().
> >>>
> >>> There's nothing related to speculation safety in is_system_domain()
> >>> AFAICT. It's just a plain check against d->domain_id. It's my
> >>> understanding there's no need for any speculation barrier there
> >>> because d->domain_id is not an external input.
> >>
> >> Hmmm, this actually raises a good question. Why do is_control_domain(),
> >> is_hardware_domain(), and others all have evaluate_nospec() wrapping the
> >> check of a struct domain element while is_system_domain() does not?
> > 
> > Jan replied to this regard, see:
> > 
> > https://lore.kernel.org/xen-devel/54272d08-7ce1-b162-c8e9-1955b780c...@suse.com/
> 
> Jan can correct me if I misunderstood, but his point is with respect to
> where the inline function will be expanded into and I would think you
> would want to ensure that if anyone were to use is_system_domain(), then
> the inline expansion of this new location could create a potential
> speculation-able branch. Basically my concern is not putting the guards
> in place today just because there is not currently any location where
> is_system_domain() is expanded to create a speculation opportunity does
> not mean there is not an opening for the opportunity down the road for a
> future unprotected use.
> 
> >>> In any case this function should be __init only, at which point there
> >>> are no untrusted inputs to Xen.
> >>
> >> I thought it was agreed that __init on inline functions in headers had
> >> no meaning?
> > 
> > In a different reply I already noted my preference would be for the
> > function to not reside in a header and not be inline, simply because
> > it would be gone after initialization and we won't have to worry about
> > any stray calls when the system is active.
> 
> If an inline function is only used by __init code, how would it be
> available for stray calls when the system is active? I would concede
> that it is possible for someone to explicitly use it in non-__init code but
> I would like to believe any usage in a submitted code change would be
> questioned by the reviewers.

Right, it's IMO easier when things just explode when not used
correctly, hence my suggestion to make it __init.

> With that said, if we consider Jason's suggestion would this remove your
> concern since that would only introduce a de-privilege function and
> there would be no priv escalation that could be erroneously called at
> any time?

Indeed.  IMO everything that happens before the system switches to the
active state should be considered to be running in a privileged
context anyway.  Maybe others have different opinions.  Or maybe there
are use-cases I'm not aware of where this is not true.

Thanks, Roger.



Re: [PATCH] x86/irq: Skip unmap_domain_pirq XSM during destruction

2022-04-05 Thread Jason Andryuk
On Tue, Apr 5, 2022 at 4:18 AM Jan Beulich  wrote:
>
> On 30.03.2022 20:17, Jason Andryuk wrote:
> > xsm_unmap_domain_irq was seen denying unmap_domain_pirq when called from
> > complete_domain_destroy as an RCU callback.  The source context was an
> > unexpected, random domain.  Since this is a xen-internal operation,
> > we don't want the XSM hook denying the operation.
> >
> > Check d->is_dying and skip the check when the domain is dead.  The RCU
> > callback runs when a domain is in that state.
>
> One question which has always been puzzling me (perhaps to Daniel): While
> I can see why mapping of an IRQ needs to be subject to an XSM check, it's
> not really clear to me why unmapping would need to be, at least as long
> as it's the domain itself which requests the unmap (and which I would
> view to extend to the domain being cleaned up). But maybe that's why it's
> XSM_HOOK ...
>
> > ---
> > Dan wants to change current to point at DOMID_IDLE when the RCU callback
> > runs.  I think Juergen's commit 53594c7bd197 "rcu: don't use
> > stop_machine_run() for rcu_barrier()" may have changed this since it
> > mentions stop_machine_run scheduled the idle vcpus to run the callbacks
> > for the old code.
> >
> > Would that be as easy as changing rcu_do_batch() to do:
> >
> > +/* Run as "Xen" not a random domain's vcpu. */
> > +vcpu = get_current();
> > +set_current(idle_vcpu[smp_processor_id()]);
> >  list->func(list);
> > +set_current(vcpu);
> >
> > or is using set_current() only acceptable as part of context_switch?
>
> Indeed I would question any uses outside of context_switch() (and
> system bringup).
>
> > --- a/xen/arch/x86/irq.c
> > +++ b/xen/arch/x86/irq.c
> > @@ -2340,10 +2340,14 @@ int unmap_domain_pirq(struct domain *d, int pirq)
> >  nr = msi_desc->msi.nvec;
> >  }
> >
> > -ret = xsm_unmap_domain_irq(XSM_HOOK, d, irq,
> > -   msi_desc ? msi_desc->dev : NULL);
> > -if ( ret )
> > -goto done;
> > +/* When called by complete_domain_destroy via RCU, current is a random
> > + * domain.  Skip the XSM check since this is a Xen-initiated action. */
>
> Comment style.

Yes.  Sorry about that.

> > +if ( d->is_dying != DOMDYING_dead ) {
>
> Please use !d->is_dying. Also please correct the placement of the brace.
> Or you could avoid the need for a brace by leveraging that ret is zero
> ahead of this if(), i.e. ...

Here I was patting myself on the back for remembering the spaces
inside the parens, and I screwed up the brace...  Sorry.

I intentionally chose DOMDYING_dead because, from my reading of the
code, complete_domain_destroy should only reach here when dead (and
not dying).  If this function is reached when DOMDYING_dying, then
that is unexpected.  That would be a guest-initiated action and
therefore the XSM check should apply.

Just checking is_dying is fine, but I want to explain and highlight this aspect.

> > +ret = xsm_unmap_domain_irq(XSM_HOOK, d, irq,
> > +   msi_desc ? msi_desc->dev : NULL);
> > +if ( ret )
> > +goto done;
> > +}
>
>
> if ( !d->is_dying )
> ret = xsm_unmap_domain_irq(XSM_HOOK, d, irq,
>msi_desc ? msi_desc->dev : NULL);
> if ( ret )
> goto done;

I'm planning to just do it this way.

Thank you for reviewing.

-Jason
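
For readers following the thread, the agreed-upon shape of the check can be sketched outside of Xen. This is only an illustration under stated assumptions: `struct domain`, the `dying_state` enum, and `mock_xsm_unmap_domain_irq()` are hypothetical stand-ins, not the real Xen definitions, and the mock hook simply denies every caller.

```c
#include <stddef.h>

/* Hypothetical stand-ins for Xen's types; not the real definitions. */
enum dying_state { DOMDYING_alive, DOMDYING_dying, DOMDYING_dead };

struct domain {
    enum dying_state is_dying;
};

/* Hypothetical policy hook: pretend the policy denies every caller. */
static int mock_xsm_unmap_domain_irq(const struct domain *d, int irq)
{
    (void)d;
    (void)irq;
    return -1;
}

/* Returns 0 when the unmap may proceed, non-zero when it is denied. */
int unmap_check(const struct domain *d, int irq)
{
    int ret = 0;

    /* ret is already 0, so a dying or dead domain skips the hook entirely. */
    if ( !d->is_dying )
        ret = mock_xsm_unmap_domain_irq(d, irq);

    return ret;
}
```

Note that `!d->is_dying` also skips the hook in the `DOMDYING_dying` state, which is the difference from the `!= DOMDYING_dead` variant discussed above.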



[xen-4.14-testing test] 169170: regressions - FAIL

2022-04-05 Thread osstest service owner
flight 169170 xen-4.14-testing real [real]
http://logs.test-lab.xenproject.org/osstest/logs/169170/

Regressions :-(

Tests which did not succeed and are blocking,
including tests which could not be run:
 test-amd64-amd64-xl          18 guest-localmigrate       fail REGR. vs. 168506
 build-arm64-xsm               6 xen-build                fail REGR. vs. 168506

Tests which did not succeed, but are not blocking:
 test-arm64-arm64-libvirt-xsm  1 build-check(1)   blocked  n/a
 test-arm64-arm64-xl-xsm   1 build-check(1)   blocked  n/a
 test-amd64-i386-xl-qemuu-win7-amd64 19 guest-stop fail like 168506
 test-amd64-amd64-xl-qemut-win7-amd64 19 guest-stop fail like 168506
 test-armhf-armhf-libvirt 16 saverestore-support-check fail like 168506
 test-amd64-amd64-qemuu-nested-amd 20 debian-hvm-install/l1/l2 fail like 168506
 test-amd64-amd64-xl-qemuu-win7-amd64 19 guest-stop fail like 168506
 test-amd64-amd64-xl-qemut-ws16-amd64 19 guest-stop fail like 168506
 test-amd64-amd64-xl-qemuu-ws16-amd64 19 guest-stop fail like 168506
 test-amd64-i386-xl-qemut-ws16-amd64 19 guest-stop fail like 168506
 test-amd64-i386-xl-qemut-win7-amd64 19 guest-stop fail like 168506
 test-armhf-armhf-libvirt-raw 15 saverestore-support-check fail like 168506
 test-amd64-i386-xl-qemuu-ws16-amd64 19 guest-stop fail like 168506
 test-armhf-armhf-libvirt-qcow2 15 saverestore-support-check fail like 168506
 test-arm64-arm64-xl-seattle  15 migrate-support-check        fail   never pass
 test-arm64-arm64-xl-seattle  16 saverestore-support-check    fail   never pass
 test-amd64-i386-xl-pvshim    14 guest-start                  fail   never pass
 test-amd64-amd64-libvirt     15 migrate-support-check        fail   never pass
 test-amd64-i386-libvirt      15 migrate-support-check        fail   never pass
 test-amd64-amd64-libvirt-xsm 15 migrate-support-check        fail   never pass
 test-amd64-i386-libvirt-xsm  15 migrate-support-check        fail   never pass
 test-arm64-arm64-xl-thunderx 15 migrate-support-check        fail   never pass
 test-arm64-arm64-xl-thunderx 16 saverestore-support-check    fail   never pass
 test-arm64-arm64-xl-credit2  15 migrate-support-check        fail   never pass
 test-arm64-arm64-xl-credit2  16 saverestore-support-check    fail   never pass
 test-arm64-arm64-xl-credit1  15 migrate-support-check        fail   never pass
 test-arm64-arm64-xl-credit1  16 saverestore-support-check    fail   never pass
 test-arm64-arm64-xl          15 migrate-support-check        fail   never pass
 test-arm64-arm64-xl          16 saverestore-support-check    fail   never pass
 test-amd64-i386-libvirt-qemuu-debianhvm-amd64-xsm 13 migrate-support-check fail never pass
 test-amd64-amd64-libvirt-vhd 14 migrate-support-check        fail   never pass
 test-amd64-i386-libvirt-raw  14 migrate-support-check        fail   never pass
 test-arm64-arm64-libvirt-raw 14 migrate-support-check        fail   never pass
 test-arm64-arm64-libvirt-raw 15 saverestore-support-check    fail   never pass
 test-arm64-arm64-xl-vhd      14 migrate-support-check        fail   never pass
 test-arm64-arm64-xl-vhd      15 saverestore-support-check    fail   never pass
 test-armhf-armhf-libvirt     15 migrate-support-check        fail   never pass
 test-armhf-armhf-xl-cubietruck 15 migrate-support-check      fail   never pass
 test-armhf-armhf-xl-cubietruck 16 saverestore-support-check  fail   never pass
 test-armhf-armhf-xl-credit2  15 migrate-support-check        fail   never pass
 test-armhf-armhf-xl-credit2  16 saverestore-support-check    fail   never pass
 test-armhf-armhf-xl          15 migrate-support-check        fail   never pass
 test-armhf-armhf-xl          16 saverestore-support-check    fail   never pass
 test-armhf-armhf-xl-rtds     15 migrate-support-check        fail   never pass
 test-armhf-armhf-xl-rtds     16 saverestore-support-check    fail   never pass
 test-armhf-armhf-xl-vhd      14 migrate-support-check        fail   never pass
 test-armhf-armhf-xl-vhd      15 saverestore-support-check    fail   never pass
 test-armhf-armhf-libvirt-raw 14 migrate-support-check        fail   never pass
 test-armhf-armhf-xl-multivcpu 15 migrate-support-check       fail   never pass
 test-armhf-armhf-xl-multivcpu 16 saverestore-support-check   fail   never pass
 test-armhf-armhf-xl-arndale  15 migrate-support-check        fail   never pass
 test-armhf-armhf-xl-arndale  16 saverestore-support-check    fail   never pass
 test-amd64-amd64-libvirt-qemuu-debianhvm-amd64-xsm 13 migrate-support-check fail never pass
 test-armhf-armhf-xl-credit1  15 migrate-support-check        fail   never pass
 test-armhf-armhf-xl-credit1  16 saverestore-support-check    fail   never pass
 test-armhf-armhf-libvirt-qcow2 14 migrate-support-check      fail   never pass

version targeted for testing:
 xen  

Re: [PATCH v4 3/8] x86/EFI: retrieve EDID

2022-04-05 Thread Jan Beulich
On 05.04.2022 12:27, Roger Pau Monné wrote:
> On Thu, Mar 31, 2022 at 11:45:36AM +0200, Jan Beulich wrote:
>> --- a/xen/arch/x86/efi/efi-boot.h
>> +++ b/xen/arch/x86/efi/efi-boot.h
>> @@ -568,6 +568,49 @@ static void __init efi_arch_video_init(E
>>  #endif
>>  }
>>  
>> +#ifdef CONFIG_VIDEO
>> +static bool __init copy_edid(const void *buf, unsigned int size)
>> +{
>> +/*
>> + * Be conservative - for both undersized and oversized blobs it is unclear
>> + * what to actually do with them. The more that unlike the VESA BIOS
>> + * interface we also have no associated "capabilities" value (which might
>> + * carry a hint as to possible interpretation).
>> + */
>> +if ( size != ARRAY_SIZE(boot_edid_info) )
>> +return false;
>> +
>> +memcpy(boot_edid_info, buf, size);
>> +boot_edid_caps = 0;
>> +
>> +return true;
>> +}
>> +#endif
>> +
>> +static void __init efi_arch_edid(EFI_HANDLE gop_handle)
>> +{
>> +#ifdef CONFIG_VIDEO
>> +static EFI_GUID __initdata active_guid = EFI_EDID_ACTIVE_PROTOCOL_GUID;
>> +static EFI_GUID __initdata discovered_guid = EFI_EDID_DISCOVERED_PROTOCOL_GUID;
> 
> Is there a need to make those static?
> 
> I think this function is either called from efi_start or
> efi_multiboot, but there aren't multiple calls to it? (also both
> parameters are IN only, so not to be changed by the EFI method?
> 
> I have the feeling setting them to static is done because they can't
> be set to const?

Even if they could be const, they ought to also be static. They don't
strictly need to be, but without "static" code will be generated to
populate the on-stack variables; quite possibly the compiler would
even allocate an unnamed static variable and memcpy() from there onto
the stack.
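
The difference described here can be sketched in plain C. This is a minimal illustration, not the patch itself: `guid_t`, `EXAMPLE_GUID`, and the helper names are hypothetical, and what a given compiler actually emits for the automatic variant will vary.

```c
#include <stdint.h>

/* Hypothetical GUID layout, used only for this illustration. */
typedef struct {
    uint32_t data1;
    uint16_t data2;
    uint16_t data3;
    uint8_t  data4[8];
} guid_t;

#define EXAMPLE_GUID \
    { 0xBD8C1056, 0x9F36, 0x44EC, \
      { 0x92, 0xA8, 0xA6, 0x33, 0x7F, 0x81, 0x79, 0x86 } }

/* Mimics an API that takes a non-const pointer to the GUID. */
static uint32_t read_first(guid_t *g)
{
    return g->data1;
}

/* Automatic storage: the object has to be built on the stack per call. */
uint32_t first_field_auto(void)
{
    guid_t guid = EXAMPLE_GUID;

    return read_first(&guid);
}

/* Static storage: the object lives in the image and needs no setup code. */
uint32_t first_field_static(void)
{
    static guid_t guid = EXAMPLE_GUID;

    return read_first(&guid);
}
```

Both functions return the same value; the point is only where the initialised object lives and whether code must run to populate it.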

>> +EFI_EDID_ACTIVE_PROTOCOL *active_edid;
>> +EFI_EDID_DISCOVERED_PROTOCOL *discovered_edid;
>> +EFI_STATUS status;
>> +
>> +status = efi_bs->OpenProtocol(gop_handle, &active_guid,
>> +  (void **)&active_edid, efi_ih, NULL,
>> +  EFI_OPEN_PROTOCOL_GET_PROTOCOL);
>> +if ( status == EFI_SUCCESS &&
>> + copy_edid(active_edid->Edid, active_edid->SizeOfEdid) )
>> +return;
> 
> Isn't it enough to just call EFI_EDID_ACTIVE_PROTOCOL_GUID?
> 
> From my reading of the UEFI spec this will either return
> EFI_EDID_OVERRIDE_PROTOCOL_GUID or EFI_EDID_DISCOVERED_PROTOCOL_GUID.
> If EFI_EDID_OVERRIDE_PROTOCOL is set it must be used, and hence
> falling back to EFI_EDID_DISCOVERED_PROTOCOL_GUID if
> EFI_EDID_ACTIVE_PROTOCOL_GUID cannot be parsed would likely mean
> ignoring EFI_EDID_OVERRIDE_PROTOCOL?

That's the theory. As per one of the post-commit-message remarks I had
looked at what GrUB does, and I decided to follow its behavior in this
regard, assuming they do what they do to work around quirks. As said
in the remark, I didn't want to go as far as also cloning their use of
the undocumented (afaik) "agp-internal-edid" variable.
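
The fallback order under discussion (try the active EDID first, then the discovered one, accepting only blobs of exactly the expected size) can be sketched as follows. This is a stand-alone mock under stated assumptions: `struct edid_source`, `pick_edid()`, and the 128-byte `EDID_SIZE` are hypothetical, and no real UEFI calls are made.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define EDID_SIZE 128  /* assumed size of the destination buffer */

static unsigned char edid_buf[EDID_SIZE];

/* Hypothetical description of one EDID provider (active or discovered). */
struct edid_source {
    bool available;              /* did opening the protocol succeed? */
    const unsigned char *data;
    size_t size;
};

/* Conservative copy: reject both undersized and oversized blobs. */
static bool copy_edid(const void *buf, size_t size)
{
    if ( size != sizeof(edid_buf) )
        return false;

    memcpy(edid_buf, buf, size);
    return true;
}

/* Returns 1 if the active EDID was used, 2 for discovered, 0 for neither. */
int pick_edid(const struct edid_source *active,
              const struct edid_source *discovered)
{
    if ( active->available && copy_edid(active->data, active->size) )
        return 1;

    if ( discovered->available &&
         copy_edid(discovered->data, discovered->size) )
        return 2;

    return 0;
}
```

With this ordering, an active EDID of unexpected size is skipped rather than truncated, which is the case where an override could end up ignored, as noted above.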

>> --- a/xen/include/efi/efiprot.h
>> +++ b/xen/include/efi/efiprot.h
>> @@ -724,5 +724,52 @@ struct _EFI_GRAPHICS_OUTPUT_PROTOCOL {
>>EFI_GRAPHICS_OUTPUT_PROTOCOL_BLT Blt;
>>EFI_GRAPHICS_OUTPUT_PROTOCOL_MODE*Mode;
>>  };
>> +
>> +/*
>> + * EFI EDID Discovered Protocol
>> + * UEFI Specification Version 2.5 Section 11.9
>> + */
>> +#define EFI_EDID_DISCOVERED_PROTOCOL_GUID \
>> +{ 0x1C0C34F6, 0xD380, 0x41FA, { 0xA0, 0x49, 0x8a, 0xD0, 0x6C, 0x1A, 0x66, 0xAA} }
>> +
>> +typedef struct _EFI_EDID_DISCOVERED_PROTOCOL {
>> +UINT32   SizeOfEdid;
>> +UINT8   *Edid;
>> +} EFI_EDID_DISCOVERED_PROTOCOL;
>> +
>> +/*
>> + * EFI EDID Active Protocol
>> + * UEFI Specification Version 2.5 Section 11.9
>> + */
>> +#define EFI_EDID_ACTIVE_PROTOCOL_GUID \
>> +{ 0xBD8C1056, 0x9F36, 0x44EC, { 0x92, 0xA8, 0xA6, 0x33, 0x7F, 0x81, 0x79, 0x86} }
>> +
>> +typedef struct _EFI_EDID_ACTIVE_PROTOCOL {
>> +UINT32   SizeOfEdid;
>> +UINT8   *Edid;
>> +} EFI_EDID_ACTIVE_PROTOCOL;
>> +
>> +/*
>> + * EFI EDID Override Protocol
>> + * UEFI Specification Version 2.5 Section 11.9
>> + */
>> +#define EFI_EDID_OVERRIDE_PROTOCOL_GUID \
>> +{ 0x48ECB431, 0xFB72, 0x45C0, { 0xA9, 0x22, 0xF4, 0x58, 0xFE, 0x04, 0x0B, 0xD5} }
>> +
>> +INTERFACE_DECL(_EFI_EDID_OVERRIDE_PROTOCOL);
>> +
>> +typedef
>> +EFI_STATUS
>> +(EFIAPI *EFI_EDID_OVERRIDE_PROTOCOL_GET_EDID) (
>> +  IN  struct _EFI_EDID_OVERRIDE_PROTOCOL   *This,
>> +  IN  EFI_HANDLE   *ChildHandle,
>> +  OUT UINT32   *Attributes,
>> +  IN OUT  UINTN*EdidSize,
>> +  IN OUT  UINT8   **Edid);
>> +
>> +typedef struct _EFI_EDID_OVERRIDE_PROTOCOL {
>> +EFI_EDID_OVERRIDE_PROTOCOL_GET_EDID  GetEdid;
>> +} EFI_EDID_OVERRIDE_PROTOCOL;
>> +
>>  #endif
> 
> FWIW, EFI_EDID_OVERRIDE_PROTOCOL_GUID is not used by the patch, so I
> guess it's introduced for completeness (or because it's 

Re: [PATCH 1/2] hw/xen/xen_pt: Confine igd-passthrough-isa-bridge to XEN

2022-04-05 Thread Anthony PERARD
On Sat, Mar 26, 2022 at 05:58:23PM +0100, Bernhard Beschow wrote:
> igd-passthrough-isa-bridge is only requested in xen_pt but was
> implemented in pc_piix.c. This caused xen_pt to depend on i386/pc
> which is hereby resolved.
> 
> Signed-off-by: Bernhard Beschow 

Acked-by: Anthony PERARD 

Thanks,

-- 
Anthony PERARD



Re: [PATCH 2/2] hw/xen/xen_pt: Resolve igd_passthrough_isa_bridge_create() indirection

2022-04-05 Thread Anthony PERARD
On Sat, Mar 26, 2022 at 05:58:24PM +0100, Bernhard Beschow wrote:
> Now that igd_passthrough_isa_bridge_create() is implemented within the
> xen context it may use Xen* data types directly and become
> xen_igd_passthrough_isa_bridge_create(). This resolves an indirection.
> 
> Signed-off-by: Bernhard Beschow 

Acked-by: Anthony PERARD 

Thanks,

-- 
Anthony PERARD



[PATCH] osstest: stop anacron service

2022-04-05 Thread Roger Pau Monne
Just disabling cron in rc.d is not enough. There's also anacron which
will get invoked during startup, and since apt-compat has a delay of
up to 30min it can be picked up by the leak detector if the test
finishes fast enough:

LEAKED [process 14563 sleep] process: root 14563 14556  0 07:49 ? 00:00:00 sleep 1163
LEAKED [process 14550 /bin/sh] process: root 14550  2264  0 07:49 ? 00:00:00 /bin/sh -c run-parts --report /etc/cron.daily
LEAKED [process 14551 run-parts] process: root 14551 14550  0 07:49 ? 00:00:00 run-parts --report /etc/cron.daily
LEAKED [process 14556 /bin/sh] process: root 14556 14551  0 07:49 ? 00:00:00 /bin/sh /etc/cron.daily/apt-compat

From:

http://logs.test-lab.xenproject.org/osstest/logs/169015

To prevent this disable anacron like it's done for cron.

Signed-off-by: Roger Pau Monné 
---
 Osstest/TestSupport.pm | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/Osstest/TestSupport.pm b/Osstest/TestSupport.pm
index 8103ea1d..8e3e5f68 100644
--- a/Osstest/TestSupport.pm
+++ b/Osstest/TestSupport.pm
@@ -3151,6 +3151,8 @@ sub host_install_postboot_complete ($) {
 target_core_dump_setup($ho);
 target_cmd_root($ho, "update-rc.d cron disable");
 target_cmd_root($ho, "service cron stop");
+target_cmd_root($ho, "update-rc.d anacron disable");
+target_cmd_root($ho, "service anacron stop");
 target_cmd_root($ho, "update-rc.d osstest-confirm-booted start 99 2 .");
 target_https_mitm_proxy_setup($ho);
 }
-- 
2.35.1




[xen-unstable-smoke test] 169175: tolerable all pass - PUSHED

2022-04-05 Thread osstest service owner
flight 169175 xen-unstable-smoke real [real]
http://logs.test-lab.xenproject.org/osstest/logs/169175/

Failures :-/ but no regressions.

Tests which did not succeed, but are not blocking:
 test-amd64-amd64-libvirt     15 migrate-support-check        fail   never pass
 test-arm64-arm64-xl-xsm      15 migrate-support-check        fail   never pass
 test-arm64-arm64-xl-xsm      16 saverestore-support-check    fail   never pass
 test-armhf-armhf-xl          15 migrate-support-check        fail   never pass
 test-armhf-armhf-xl          16 saverestore-support-check    fail   never pass

version targeted for testing:
 xen  120e26c2bb0097a589d718b1b58d7052ccce4458
baseline version:
 xen  e270af94280e6a9610705ebc1fdd1d7a9b1f8a98

Last test of basis   169160  2022-04-04 12:03:06 Z    1 days
Testing same since   169175  2022-04-05 10:01:52 Z    0 days    1 attempts


People who touched revisions under test:
  Anthony PERARD 
  Jan Beulich 
  Julien Grall 

jobs:
 build-arm64-xsm  pass
 build-amd64  pass
 build-armhf  pass
 build-amd64-libvirt  pass
 test-armhf-armhf-xl  pass
 test-arm64-arm64-xl-xsm  pass
 test-amd64-amd64-xl-qemuu-debianhvm-amd64    pass
 test-amd64-amd64-libvirt pass



sg-report-flight on osstest.test-lab.xenproject.org
logs: /home/logs/logs
images: /home/logs/images

Logs, config files, etc. are available at
http://logs.test-lab.xenproject.org/osstest/logs

Explanation of these reports, and of osstest in general, is at
http://xenbits.xen.org/gitweb/?p=osstest.git;a=blob;f=README.email;hb=master
http://xenbits.xen.org/gitweb/?p=osstest.git;a=blob;f=README;hb=master

Test harness code can be found at
http://xenbits.xen.org/gitweb?p=osstest.git;a=summary


Pushing revision :

To xenbits.xen.org:/home/xen/git/xen.git
   e270af9428..120e26c2bb  120e26c2bb0097a589d718b1b58d7052ccce4458 -> smoke



Re: preparations for 4.14.5 ?

2022-04-05 Thread Anthony PERARD
On Mon, Apr 04, 2022 at 03:42:09PM +0200, Jan Beulich wrote:
> On 01.04.2022 15:46, Marek Marczykowski-Górecki wrote:
> > On Wed, Mar 30, 2022 at 12:16:00PM +0200, Jan Beulich wrote:
> > I'm not sure if "just" bugfix qualify for 4.14 at this point, but if so,
> > I'd propose:
> > 0a20a53df158 tools/libs/light: set video_mem for PVH guests
> > 
> > In any case, the above should be backported to 4.15 and 4.16.
> 
> Hmm, Anthony, I'd like to ask for your view here: This looks more
> like a cosmetic change to me at the first glance. Plus it's a
> little odd to see it being proposed for backporting now, when it's
> already almost 4 months old and hence could have gone into 4.15.2
> and 4.14.4 if it was important.

The patch might be good to backport. I guess that could mess up memory
hotplug a little with PVH guests without the patch.

I've got a few other commits which would be good to backport, I think:

e45ad0b1b0 ("xl: Fix global pci options")
d2ecf97f91 ("libxl: Don't segfault on soft-reset failure")
d62a34423a ("libxl: Re-scope qmp_proxy_spawn.ao usage")

Thanks,

-- 
Anthony PERARD



Re: [PATCH] x86/irq: Skip unmap_domain_pirq XSM during destruction

2022-04-05 Thread Daniel P. Smith
On 4/5/22 04:18, Jan Beulich wrote:
> On 30.03.2022 20:17, Jason Andryuk wrote:
>> xsm_unmap_domain_irq was seen denying unmap_domain_pirq when called from
>> complete_domain_destroy as an RCU callback.  The source context was an
>> unexpected, random domain.  Since this is a xen-internal operation,
>> we don't want the XSM hook denying the operation.
>>
>> Check d->is_dying and skip the check when the domain is dead.  The RCU
>> callback runs when a domain is in that state.
> 
> One question which has always been puzzling me (perhaps to Daniel): While
> I can see why mapping of an IRQ needs to be subject to an XSM check, it's
> not really clear to me why unmapping would need to be, at least as long
> as it's the domain itself which requests the unmap (and which I would
> view to extend to the domain being cleaned up). But maybe that's why it's
> XSM_HOOK ...

There are situations for instance where there is a flask-based system
with one or more domains (v-platform-mgr) that are each responsible for
the management of a subset of domains and are responsible for
hotplugging in and out a device, i.e. granting the privilege to a
v-platform-mgr to call PHYSDEVOP_map_pirq/PHYSDEVOP_unmap_pirq, for the
domains each one is managing.

>> ---
>> Dan wants to change current to point at DOMID_IDLE when the RCU callback
>> runs.  I think Juergen's commit 53594c7bd197 "rcu: don't use
>> stop_machine_run() for rcu_barrier()" may have changed this since it
>> mentions stop_machine_run scheduled the idle vcpus to run the callbacks
>> for the old code.
>>
>> Would that be as easy as changing rcu_do_batch() to do:
>>
>> +/* Run as "Xen" not a random domain's vcpu. */
>> +vcpu = get_current();
>> +set_current(idle_vcpu[smp_processor_id()]);
>>  list->func(list);
>> +set_current(vcpu);
>>
>> or is using set_current() only acceptable as part of context_switch?
> 
> Indeed I would question any uses outside of context_switch() (and
> system bringup).

I am not familiar with the details of the scheduler, but from a higher
level, conceptual perspective, I do not understand why an idle domain
task is being executed without an explicit context switch to the idle
domain to ensure the current world view is consistent with the task
execution scope. Just seems to me like this is creating a situation
where things have the potential to go sideways/wrong.

v/r,
dps



Re: [PATCH 2/2] arch: ensure idle domain is not left privileged

2022-04-05 Thread Daniel P. Smith
On 4/5/22 04:26, Jan Beulich wrote:
> On 31.03.2022 01:05, Daniel P. Smith wrote:
>> --- a/xen/arch/x86/setup.c
>> +++ b/xen/arch/x86/setup.c
>> @@ -589,6 +589,9 @@ static void noinline init_done(void)
>>  void *va;
>>  unsigned long start, end;
>>  
>> +/* Ensure idle domain was not left privileged */
>> +ASSERT(current->domain->is_privileged == false) ;
> 
> I think this should be stronger than ASSERT(); I'd recommend calling
> panic(). Also please don't compare against "true" or "false" - use
> ordinary boolean operations instead (here it would be
> "!current->domain->is_privileged").

Ack.

v/r,
dps
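
The review comment above asks for two things: an unconditional check rather than an assertion (presumably because assertions are typically compiled out of release builds), and a plain boolean test instead of a comparison against `false`. A minimal user-space sketch follows; `struct domain`, `mock_panic()`, and the function names are hypothetical stand-ins and not Xen's API.

```c
#include <stdbool.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical, reduced stand-in for Xen's struct domain. */
struct domain {
    bool is_privileged;
};

/* Hypothetical stand-in for a hypervisor panic(): active in every build. */
static void mock_panic(const char *msg)
{
    fprintf(stderr, "panic: %s\n", msg);
    abort();
}

/* True when the idle domain has been correctly de-privileged. */
bool idle_domain_deprivileged(const struct domain *idle)
{
    /* Ordinary boolean operation, no comparison against true/false. */
    return !idle->is_privileged;
}

/* Stops the system if the idle domain was left privileged. */
void check_idle_domain(const struct domain *idle)
{
    if ( !idle_domain_deprivileged(idle) )
        mock_panic("idle domain was left privileged");
}
```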



[ovmf test] 169173: regressions - FAIL

2022-04-05 Thread osstest service owner
flight 169173 ovmf real [real]
http://logs.test-lab.xenproject.org/osstest/logs/169173/

Regressions :-(

Tests which did not succeed and are blocking,
including tests which could not be run:
 build-amd64                   6 xen-build                fail REGR. vs. 168254
 build-amd64-xsm               6 xen-build                fail REGR. vs. 168254
 build-i386                    6 xen-build                fail REGR. vs. 168254
 build-i386-xsm                6 xen-build                fail REGR. vs. 168254

Tests which did not succeed, but are not blocking:
 build-amd64-libvirt   1 build-check(1)   blocked  n/a
 build-i386-libvirt1 build-check(1)   blocked  n/a
 test-amd64-amd64-xl-qemuu-ovmf-amd64  1 build-check(1) blocked n/a
 test-amd64-i386-xl-qemuu-ovmf-amd64  1 build-check(1)  blocked n/a

version targeted for testing:
 ovmf a298a84478053872ed9da660a75f182ce81b8ddc
baseline version:
 ovmf b1b89f9009f2390652e0061bd7b24fc40732bc70

Last test of basis   168254  2022-02-28 10:41:46 Z   36 days
Failing since        168258  2022-03-01 01:55:31 Z   35 days  279 attempts
Testing same since   169173  2022-04-05 05:13:00 Z    0 days    1 attempts


People who touched revisions under test:
  Abdul Lateef Attar 
  Abdul Lateef Attar via groups.io 
  Abner Chang 
  Akihiko Odaki 
  Anthony PERARD 
  Bob Feng 
  Gerd Hoffmann 
  Guo Dong 
  Guomin Jiang 
  Hao A Wu 
  Hua Ma 
  Huang, Li-Xia 
  Jagadeesh Ujja 
  Jason 
  Jason Lou 
  Ken Lautner 
  Kenneth Lautner 
  Kuo, Ted 
  Laszlo Ersek 
  Leif Lindholm 
  Li, Zhihao 
  Liming Gao 
  Liu 
  Liu Yun 
  Liu Yun Y 
  Lixia Huang 
  Lou, Yun 
  Ma, Hua 
  Mara Sophie Grosch 
  Mara Sophie Grosch via groups.io 
  Matt DeVillier 
  Michael D Kinney 
  Michael Kubacki 
  Michael Kubacki 
  Min Xu 
  Patrick Rudolph 
  Purna Chandra Rao Bandaru 
  Ray Ni 
  Sami Mujawar 
  Sean Rhodes 
  Sean Rhodes sean@starlabs.systems
  Sebastien Boeuf 
  Sunny Wang 
  Ted Kuo 
  Wenyi Xie 
  wenyi,xie via groups.io 
  Xiaolu.Jiang 
  Xie, Yuanhao 
  Yi Li 
  Yuanhao Xie 
  Zhihao Li 

jobs:
 build-amd64-xsm  fail
 build-i386-xsm   fail
 build-amd64  fail
 build-i386   fail
 build-amd64-libvirt  blocked 
 build-i386-libvirt   blocked 
 build-amd64-pvops    pass
 build-i386-pvops pass
 test-amd64-amd64-xl-qemuu-ovmf-amd64 blocked 
 test-amd64-i386-xl-qemuu-ovmf-amd64  blocked 



sg-report-flight on osstest.test-lab.xenproject.org
logs: /home/logs/logs
images: /home/logs/images

Logs, config files, etc. are available at
http://logs.test-lab.xenproject.org/osstest/logs

Explanation of these reports, and of osstest in general, is at
http://xenbits.xen.org/gitweb/?p=osstest.git;a=blob;f=README.email;hb=master
http://xenbits.xen.org/gitweb/?p=osstest.git;a=blob;f=README;hb=master

Test harness code can be found at
http://xenbits.xen.org/gitweb?p=osstest.git;a=summary


Not pushing.

(No revision log; it would be 4610 lines long.)



Re: [PATCH 1/2] xsm: add ability to elevate a domain to privileged

2022-04-05 Thread Daniel P. Smith
On 4/5/22 03:42, Roger Pau Monné wrote:
> On Mon, Apr 04, 2022 at 12:08:25PM -0400, Daniel P. Smith wrote:
>> On 4/4/22 11:12, Roger Pau Monné wrote:
>>> On Mon, Apr 04, 2022 at 10:21:18AM -0400, Daniel P. Smith wrote:
>>>> On 3/31/22 08:36, Roger Pau Monné wrote:
>>>>> On Wed, Mar 30, 2022 at 07:05:48PM -0400, Daniel P. Smith wrote:
>>>>>> diff --git a/xen/include/xsm/xsm.h b/xen/include/xsm/xsm.h
>>>>>> index e22d6160b5..157e57151e 100644
>>>>>> --- a/xen/include/xsm/xsm.h
>>>>>> +++ b/xen/include/xsm/xsm.h
>>>>>> @@ -189,6 +189,28 @@ struct xsm_operations {
>>>>>>  #endif
>>>>>>  };
>>>>>>  
>>>>>> +static always_inline int xsm_elevate_priv(struct domain *d)
>>>>>
>>>>> I don't think it needs to be always_inline, using just inline would be
>>>>> fine IMO.
>>>>>
>>>>> Also this needs to be __init.
>>>>
>>>> AIUI always_inline is likely the best way to preserve the speculation
>>>> safety brought in by the call to is_system_domain().
>>>
>>> There's nothing related to speculation safety in is_system_domain()
>>> AFAICT. It's just a plain check against d->domain_id. It's my
>>> understanding there's no need for any speculation barrier there
>>> because d->domain_id is not an external input.
>>
>> Hmmm, this actually raises a good question. Why do is_control_domain(),
>> is_hardware_domain(), and others all have evaluate_nospec() wrapping the
>> check of a struct domain element while is_system_domain() does not?
> 
> Jan replied to this regard, see:
> 
> https://lore.kernel.org/xen-devel/54272d08-7ce1-b162-c8e9-1955b780c...@suse.com/

Jan can correct me if I misunderstood, but his point is with respect to
where the inline function will be expanded. I would think you would want
to guard against the case where someone later uses is_system_domain() in
a new location and the inline expansion there creates a potentially
speculatable branch. Basically, my concern is that not putting the guards
in place today, just because there is currently no location where
is_system_domain() is expanded to create a speculation opportunity, does
not mean the opportunity cannot arise down the road through a future
unprotected use.

>>> In any case this function should be __init only, at which point there
>>> are no untrusted inputs to Xen.
>>
>> I thought it was agreed that __init on inline functions in headers had
>> no meaning?
> 
> In a different reply I already noted my preference would be for the
> function to not reside in a header and not be inline, simply because
> it would be gone after initialization and we won't have to worry about
> any stray calls when the system is active.

If an inline function is only used by __init code, how would it be
available for stray calls when the system is active? I would concede
that it is possible for someone to explicitly use it in non-__init code,
but I would like to believe any such usage in a submitted code change
would be questioned by the reviewers.

With that said, if we consider Jason's suggestion would this remove your
concern since that would only introduce a de-privilege function and
there would be no priv escalation that could be erroneously called at
anytime?

v/r
dps
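
The "de-privilege only" idea mentioned above can be sketched as a one-way helper: it can clear the privilege flag but contains no path that sets it, so a stray call at runtime cannot escalate anything. This is a hypothetical illustration, not the proposed patch; `struct domain`, its `is_idle` flag, and `drop_idle_privilege()` are assumed names.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical, reduced stand-in for Xen's struct domain. */
struct domain {
    bool is_idle;        /* assumed flag marking the idle domain */
    bool is_privileged;
};

/*
 * One-way operation: clears the privilege flag and never sets it.
 * Returns 0 on success, -EPERM if called for any other domain.
 */
int drop_idle_privilege(struct domain *d)
{
    if ( !d->is_idle )
        return -EPERM;

    d->is_privileged = false;
    return 0;
}
```

Calling the helper twice is harmless, since the second call finds the flag already cleared.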



Xen Security Advisory 399 v2 (CVE-2022-26357) - race in VT-d domain ID cleanup

2022-04-05 Thread Xen . org security team
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA256

Xen Security Advisory CVE-2022-26357 / XSA-399
   version 2

race in VT-d domain ID cleanup

UPDATES IN VERSION 2


Public release.

ISSUE DESCRIPTION
=================

Xen domain IDs are up to 15 bits wide.  VT-d hardware may allow for only
less than 15 bits to hold a domain ID associating a physical device with
a particular domain.  Therefore internally Xen domain IDs are mapped to
the smaller value range.  The cleaning up of the housekeeping structures
has a race, allowing for VT-d domain IDs to be leaked and flushes to be
bypassed.

IMPACT
======

The precise impact is system specific, but would typically be a Denial
of Service (DoS) affecting the entire host.  Privilege escalation and
information leaks cannot be ruled out.

VULNERABLE SYSTEMS
==================

Xen versions 4.11 through 4.16 are vulnerable.  Xen versions 4.10 and
earlier are not vulnerable.

Only x86 systems with VT-d IOMMU hardware are vulnerable.  Arm systems
as well as x86 systems without VT-d hardware or without any IOMMUs in
use are not vulnerable.

Only x86 guests which have physical devices passed through to them can
leverage the vulnerability.

MITIGATION
==========

Not passing through physical devices to untrusted guests will avoid
the vulnerability.

CREDITS
=======

This issue was discovered by Jan Beulich of SUSE.

RESOLUTION
==========

Applying the appropriate attached patch resolves this issue.

Note that patches for released versions are generally prepared to
apply to the stable branches, and may not apply cleanly to the most
recent release tarball.  Downstreams are encouraged to update to the
tip of the stable branch before applying these patches.

xsa399.patch   xen-unstable
xsa399-4.16.patch  Xen 4.16.x - Xen 4.13.x
xsa399-4.12.patch  Xen 4.12.x

$ sha256sum xsa399*
53b9745564eb21f70dbb7bd7194ff3518f29cd9715c68e9dd7eff25812968019  xsa399.patch
16c3327a60d8ab6c3524f10f57d63efaf2e3e54b807bc285a749cd1a94392a30  xsa399-4.12.patch
79d0f5a0442dec0a806d77a722a1d2c04793572fe0b564bf86dcd1c6d992a679  xsa399-4.16.patch
$

DEPLOYMENT DURING EMBARGO
=========================

Deployment of the patches described above (or others which are
substantially similar) is permitted during the embargo, even on
public-facing systems with untrusted guest users and administrators.

HOWEVER, deployment of the mitigation is NOT permitted (except where
all the affected systems and VMs are administered and used only by
organisations which are members of the Xen Project Security Issues
Predisclosure List).  Specifically, deployment on public cloud systems
is NOT permitted.

This is because removal of pass-through devices or their replacement by
emulated devices is a guest visible configuration change, which may lead
to re-discovery of the issue.

Deployment of this mitigation is permitted only AFTER the embargo ends.

AND: Distribution of updated software is prohibited (except to other
members of the predisclosure list).

Predisclosure list members who wish to deploy significantly different
patches and/or mitigations, please contact the Xen Project Security
Team.

(Note: this during-embargo deployment notice is retained in
post-embargo publicly released Xen Project advisories, even though it
is then no longer applicable.  This is to enable the community to have
oversight of the Xen Project Security Team's decisionmaking.)

For more information about permissible uses of embargoed information,
consult the Xen Project community's agreed Security Policy:
  http://www.xenproject.org/security-policy.html
-----BEGIN PGP SIGNATURE-----

iQFABAEBCAAqFiEEI+MiLBRfRHX6gGCng/4UyVfoK9kFAmJMJDcMHHBncEB4ZW4u
b3JnAAoJEIP+FMlX6CvZpo8H/AqiAS0l5WJWl00bTQ4Q69REzd83m9Y3+UnUqRaf
JUFWo4R1m4V2zJlq0E3TR/2ZS1RkXFJxlmXQyzueFmDEvMV2oKB0ids5ta1oUO2E
eiQxdSFbTLrLnhI+4IxbTHHy+ovSHT/SKPeo1Zd1tXHfZ35g1OgGTYHHqj7RKJHp
SyZT4iuAKjIr61M4NBKJcycpfRidlXEDvAotDX3jBQ06t3vgs/12nwe5LzzeV2V4
sIDjpeDGNKzgT2NgLP2b+XMEUg1259iWb19tS3PPNJaLKSvQqTBOFjK+sqh7ACXV
v6ph2Yy0Q/ZP+N9DvCeBCPEU9A9RhmPYzobU+Lc/T85SrQ4=
=sp/Q
-----END PGP SIGNATURE-----


xsa399.patch
Description: Binary data


xsa399-4.12.patch
Description: Binary data


xsa399-4.16.patch
Description: Binary data


Xen Security Advisory 397 v2 (CVE-2022-26356) - Racy interactions between dirty vram tracking and paging log dirty hypercalls

2022-04-05 Thread Xen . org security team
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA256

Xen Security Advisory CVE-2022-26356 / XSA-397
   version 2

 Racy interactions between dirty vram tracking and paging log dirty hypercalls

UPDATES IN VERSION 2


Public release.

ISSUE DESCRIPTION
=

Activation of log dirty mode done by XEN_DMOP_track_dirty_vram (was named
HVMOP_track_dirty_vram before Xen 4.9) is racy with ongoing log dirty
hypercalls.  A suitably timed call to XEN_DMOP_track_dirty_vram can enable
log dirty while another CPU is still in the process of tearing down the
structures related to a previously enabled log dirty mode
(XEN_DOMCTL_SHADOW_OP_OFF).  This is due to lack of mutually exclusive locking
between both operations and can lead to entries being added in already freed
slots, resulting in a memory leak.

IMPACT
======

An attacker can cause Xen to leak memory, eventually leading to a Denial of
Service (DoS) affecting the entire host.

VULNERABLE SYSTEMS
==

All Xen versions from at least 4.0 onwards are vulnerable.

Only x86 systems are vulnerable.  Arm systems are not vulnerable.

Only domains controlling an x86 HVM guest using Hardware Assisted Paging (HAP)
can leverage the vulnerability.  On common deployments this is limited to
domains that run device models on behalf of guests.

MITIGATION
==========

Using only PV or PVH guests and/or running HVM guests in shadow mode will avoid
the vulnerability.

CREDITS
=======

This issue was discovered by Roger Pau Monné of Citrix.

RESOLUTION
==========

Applying the appropriate attached patch resolves this issue.

Note that patches for released versions are generally prepared to
apply to the stable branches, and may not apply cleanly to the most
recent release tarball.  Downstreams are encouraged to update to the
tip of the stable branch before applying these patches.

xsa397.patch   xen-unstable
xsa397-4.16.patch  Xen 4.16.x - Xen 4.15.x
xsa397-4.14.patch  Xen 4.14.x - Xen 4.13.x
xsa397-4.12.patch  Xen 4.12.x

$ sha256sum xsa397*
49c663e2bb9131dbc2488e12487f79bdf0dafd51a32413cbf3964e39d8779cae  xsa397.patch
24f95f47b79739c9cb5b9110137c802989356c82d0aa27963b5ac7e33f667285  xsa397-4.12.patch
9af14f90ba10d074425eb6072a6c648082c92c1cf8b6f881f57ed2fc13d6e49d  xsa397-4.14.patch
ff5dd3b7a8dbf349c3b832b7916322c0296fa59c7f9cd2ba30858989add5f65c  xsa397-4.16.patch
$

DEPLOYMENT DURING EMBARGO
=========================

Deployment of the patches described above (or others which are substantially
similar) is permitted during the embargo, even on public-facing systems with
untrusted guest users and administrators.

But: Distribution of updated software (except to other members of the
predisclosure list) or deployment of mitigations is prohibited.

Predisclosure list members who wish to deploy significantly different
patches and/or mitigations, please contact the Xen Project Security
Team.


(Note: this during-embargo deployment notice is retained in
post-embargo publicly released Xen Project advisories, even though it
is then no longer applicable.  This is to enable the community to have
oversight of the Xen Project Security Team's decisionmaking.)

For more information about permissible uses of embargoed information,
consult the Xen Project community's agreed Security Policy:
  http://www.xenproject.org/security-policy.html
-----BEGIN PGP SIGNATURE-----

iQFABAEBCAAqFiEEI+MiLBRfRHX6gGCng/4UyVfoK9kFAmJMJDEMHHBncEB4ZW4u
b3JnAAoJEIP+FMlX6CvZOUMH/RRZ8aMaoywqTV38SeTFne2tFT5jnWPPXR1ZGCvh
825hmSqzcYUaILbWFruUfT2PdpGoU9Eprz3xWXBDwgsUEGvKt7ZhGoWvxzXASlDh
cPRh/XwQVEEYsB1cRSk/GoLxLCQEV8oGNpmAcjEM4K1dG0VbVaRD0W2thNCmyPcv
d7aTkAdD2IE8NU4hX8YGN6v+UCkjrgzL0AF/hff9CMj7Sn/wBRrdStLT0LDZU20c
G/5+9nsOAVM7EwrzImI5Lx9KELyHwl37XUPffbftyTLUofdHJ5PK40J1tNIRS/RW
YYvs2alF7ng7LlwB/Go8gtn4XRx6xZidceYrUk22oB4JBqo=
=Fje3
-----END PGP SIGNATURE-----


xsa397.patch
Description: Binary data


xsa397-4.12.patch
Description: Binary data


xsa397-4.14.patch
Description: Binary data


xsa397-4.16.patch
Description: Binary data


[qemu-mainline test] 169166: tolerable FAIL - PUSHED

2022-04-05 Thread osstest service owner
flight 169166 qemu-mainline real [real]
flight 169176 qemu-mainline real-retest [real]
http://logs.test-lab.xenproject.org/osstest/logs/169166/
http://logs.test-lab.xenproject.org/osstest/logs/169176/

Failures :-/ but no regressions.

Tests which are failing intermittently (not blocking):
 test-arm64-arm64-xl          13 debian-fixup        fail pass in 169176-retest

Tests which did not succeed, but are not blocking:
 test-arm64-arm64-xl 15 migrate-support-check fail in 169176 never pass
 test-arm64-arm64-xl 16 saverestore-support-check fail in 169176 never pass
 test-amd64-amd64-xl-qemuu-win7-amd64 19 guest-stop             fail like 169138
 test-amd64-amd64-qemuu-nested-amd 20 debian-hvm-install/l1/l2  fail like 169138
 test-amd64-i386-xl-qemuu-win7-amd64 19 guest-stop              fail like 169138
 test-armhf-armhf-libvirt-qcow2 15 saverestore-support-check    fail like 169138
 test-armhf-armhf-libvirt-raw 15 saverestore-support-check      fail like 169138
 test-armhf-armhf-libvirt     16 saverestore-support-check      fail like 169138
 test-amd64-i386-xl-qemuu-ws16-amd64 19 guest-stop              fail like 169138
 test-amd64-amd64-xl-qemuu-ws16-amd64 19 guest-stop             fail like 169138
 test-amd64-i386-xl-pvshim      14 guest-start                  fail   never pass
 test-arm64-arm64-xl-seattle    15 migrate-support-check        fail   never pass
 test-arm64-arm64-xl-seattle    16 saverestore-support-check    fail   never pass
 test-amd64-i386-libvirt        15 migrate-support-check        fail   never pass
 test-amd64-i386-libvirt-xsm    15 migrate-support-check        fail   never pass
 test-amd64-amd64-libvirt-xsm   15 migrate-support-check        fail   never pass
 test-amd64-amd64-libvirt       15 migrate-support-check        fail   never pass
 test-arm64-arm64-xl-credit2    15 migrate-support-check        fail   never pass
 test-arm64-arm64-xl-credit1    15 migrate-support-check        fail   never pass
 test-arm64-arm64-xl-credit2    16 saverestore-support-check    fail   never pass
 test-arm64-arm64-xl-credit1    16 saverestore-support-check    fail   never pass
 test-arm64-arm64-libvirt-xsm   15 migrate-support-check        fail   never pass
 test-arm64-arm64-xl-thunderx   15 migrate-support-check        fail   never pass
 test-arm64-arm64-libvirt-xsm   16 saverestore-support-check    fail   never pass
 test-arm64-arm64-xl-thunderx   16 saverestore-support-check    fail   never pass
 test-arm64-arm64-xl-xsm        15 migrate-support-check        fail   never pass
 test-arm64-arm64-xl-xsm        16 saverestore-support-check    fail   never pass
 test-amd64-amd64-libvirt-qemuu-debianhvm-amd64-xsm 13 migrate-support-check fail never pass
 test-amd64-i386-libvirt-qemuu-debianhvm-amd64-xsm 13 migrate-support-check fail never pass
 test-amd64-i386-libvirt-raw    14 migrate-support-check        fail   never pass
 test-amd64-amd64-libvirt-vhd   14 migrate-support-check        fail   never pass
 test-arm64-arm64-libvirt-raw   14 migrate-support-check        fail   never pass
 test-arm64-arm64-libvirt-raw   15 saverestore-support-check    fail   never pass
 test-armhf-armhf-xl-multivcpu  15 migrate-support-check        fail   never pass
 test-armhf-armhf-xl-multivcpu  16 saverestore-support-check    fail   never pass
 test-armhf-armhf-xl-credit1    15 migrate-support-check        fail   never pass
 test-armhf-armhf-xl-credit1    16 saverestore-support-check    fail   never pass
 test-armhf-armhf-xl            15 migrate-support-check        fail   never pass
 test-armhf-armhf-xl            16 saverestore-support-check    fail   never pass
 test-arm64-arm64-xl-vhd        14 migrate-support-check        fail   never pass
 test-arm64-arm64-xl-vhd        15 saverestore-support-check    fail   never pass
 test-armhf-armhf-xl-cubietruck 15 migrate-support-check        fail   never pass
 test-armhf-armhf-xl-cubietruck 16 saverestore-support-check    fail   never pass
 test-armhf-armhf-xl-rtds       15 migrate-support-check        fail   never pass
 test-armhf-armhf-xl-rtds       16 saverestore-support-check    fail   never pass
 test-armhf-armhf-xl-credit2    15 migrate-support-check        fail   never pass
 test-armhf-armhf-xl-credit2    16 saverestore-support-check    fail   never pass
 test-armhf-armhf-libvirt-qcow2 14 migrate-support-check        fail   never pass
 test-armhf-armhf-libvirt-raw   14 migrate-support-check        fail   never pass
 test-armhf-armhf-xl-vhd        14 migrate-support-check        fail   never pass
 test-armhf-armhf-xl-vhd        15 saverestore-support-check    fail   never pass
 test-armhf-armhf-libvirt       15 migrate-support-check        fail   never pass
 test-armhf-armhf-xl-arndale    15 migrate-support-check        fail   never pass
 test-armhf-armhf-xl-arndale    16 saverestore-support-check    fail   never pass

version targeted for testing:
 qemuu                20661b75ea6093f5e59079d00a778a972d6732c5
baseline version:
 qemuu                bc6ec396d471d9e4aae7e2ff8b72e11da9a97665

Last test of basis   169138  2022-04-03 02:47:41 Z   

Re: [PATCH v4 2/8] x86/boot: obtain video info from boot loader

2022-04-05 Thread Roger Pau Monné
On Tue, Apr 05, 2022 at 12:57:51PM +0200, Jan Beulich wrote:
> On 05.04.2022 11:35, Roger Pau Monné wrote:
> > On Thu, Mar 31, 2022 at 11:45:02AM +0200, Jan Beulich wrote:
> >> --- a/xen/arch/x86/boot/head.S
> >> +++ b/xen/arch/x86/boot/head.S
> >> @@ -562,12 +562,18 @@ trampoline_setup:
> >>  mov %esi, sym_esi(xen_phys_start)
> >>  mov %esi, sym_esi(trampoline_xen_phys_start)
> >>  
> >> -mov sym_esi(trampoline_phys), %ecx
> >> -
> >>  /* Get bottom-most low-memory stack address. */
> >> +mov sym_esi(trampoline_phys), %ecx
> >>  add $TRAMPOLINE_SPACE,%ecx
> > 
> > Just for my understanding, since you are already touching the
> > instruction, why not switch it to a lea like you do below?
> > 
> > Is that because you would also like to take the opportunity to fold
> > the add into the lea and that would be too much of a change?
> 
> No. This MOV cannot be converted, as its source operand isn't an
> immediate (or register); such a conversion would also be undesirable,
> for increasing insn size. See the later patch doing conversions in
> the other direction, to reduce code size. Somewhat similarly ...
> 
> >> +#ifdef CONFIG_VIDEO
> >> +lea sym_esi(boot_vid_info), %edx
> 
> ... this LEA also cannot be expressed by a single MOV.
> 
> >> @@ -32,6 +33,39 @@ asm (
> >>  #include "../../../include/xen/kconfig.h"
> >>  #include 
> >>  
> >> +#ifdef CONFIG_VIDEO
> >> +# include "video.h"
> >> +
> >> +/* VESA control information */
> >> +struct __packed vesa_ctrl_info {
> >> +uint8_t signature[4];
> >> +uint16_t version;
> >> +uint32_t oem_name;
> >> +uint32_t capabilities;
> >> +uint32_t mode_list;
> >> +uint16_t mem_size;
> >> +/* We don't use any further fields. */
> >> +};
> >> +
> >> +/* VESA 2.0 mode information */
> >> +struct vesa_mode_info {
> > 
> > > Should we add __packed here just in case further added fields are no
> > > longer naturally aligned? (AFAICT all fields right now are aligned to
> > > their size so there's no need for it).
> 
> I think we should avoid __packed whenever possible.
> 
> >> +uint16_t attrib;
> >> +uint8_t window[14]; /* We don't use the individual fields. */
> >> +uint16_t bytes_per_line;
> >> +uint16_t width;
> >> +uint16_t height;
> >> +uint8_t cell_width;
> >> +uint8_t cell_height;
> >> +uint8_t nr_planes;
> >> +uint8_t depth;
> >> +uint8_t memory[5]; /* We don't use the individual fields. */
> >> +struct boot_video_colors colors;
> >> +uint8_t direct_color;
> >> +uint32_t base;
> >> +/* We don't use any further fields. */
> >> +};
> > 
> > Would it make sense to put those struct definitions in boot/video.h
> > like you do for boot_video_info?
> 
> Personally I prefer to expose things in headers only when multiple
> other files want to consume what is being declared/defined.
> 
> >> @@ -254,17 +291,64 @@ static multiboot_info_t *mbi2_reloc(u32
> >>  ++mod_idx;
> >>  break;
> >>  
> >> +#ifdef CONFIG_VIDEO
> >> +case MULTIBOOT2_TAG_TYPE_VBE:
> >> +if ( video_out )
> >> +{
> >> +const struct vesa_ctrl_info *ci;
> >> +const struct vesa_mode_info *mi;
> >> +
> >> +video = _p(video_out);
> >> +ci = (void *)get_mb2_data(tag, vbe, vbe_control_info);
> >> +mi = (void *)get_mb2_data(tag, vbe, vbe_mode_info);
> >> +
> >> +if ( ci->version >= 0x0200 && (mi->attrib & 0x9b) == 0x9b )
> >> +{
> >> +video->capabilities = ci->capabilities;
> >> +video->lfb_linelength = mi->bytes_per_line;
> >> +video->lfb_width = mi->width;
> >> +video->lfb_height = mi->height;
> >> +video->lfb_depth = mi->depth;
> >> +video->lfb_base = mi->base;
> >> +video->lfb_size = ci->mem_size;
> >> +video->colors = mi->colors;
> >> +video->vesa_attrib = mi->attrib;
> >> +}
> >> +
> >> +video->vesapm.seg = get_mb2_data(tag, vbe, vbe_interface_seg);
> >> +video->vesapm.off = get_mb2_data(tag, vbe, vbe_interface_off);
> >> +}
> >> +break;
> >> +
> >> +case MULTIBOOT2_TAG_TYPE_FRAMEBUFFER:
> >> +if ( (get_mb2_data(tag, framebuffer, framebuffer_type) !=
> >> +  MULTIBOOT2_FRAMEBUFFER_TYPE_RGB) )
> >> +{
> >> +video_out = 0;
> >> +video = NULL;
> >> +}
> > 
> > I'm confused, don't you need to store the information in the
> > framebuffer tag for use after relocation?
> 
> If there was a consumer - yes. Right now this tag is used only to
> invalidate the information taken from the other tag (or to suppress
> taking values from there if that other tag 

Re: [PATCH 1/2] hw/xen/xen_pt: Confine igd-passthrough-isa-bridge to XEN

2022-04-05 Thread Bernhard Beschow
On 26 March 2022 16:58:23 UTC, Bernhard Beschow wrote:
>igd-passthrough-isa-bridge is only requested in xen_pt but was
>implemented in pc_piix.c. This caused xen_pt to depend on i386/pc
>which is hereby resolved.
>
>Signed-off-by: Bernhard Beschow 
>---
> hw/i386/pc_piix.c| 118 --
> hw/xen/xen_pt.c  |   1 -
> hw/xen/xen_pt.h  |   1 +
> hw/xen/xen_pt_graphics.c | 119 +++
> include/hw/i386/pc.h |   1 -
> 5 files changed, 120 insertions(+), 120 deletions(-)
>
>diff --git a/hw/i386/pc_piix.c b/hw/i386/pc_piix.c
>index b72c03d0a6..6ad5c02f07 100644
>--- a/hw/i386/pc_piix.c
>+++ b/hw/i386/pc_piix.c
>@@ -801,124 +801,6 @@ static void pc_i440fx_1_4_machine_options(MachineClass 
>*m)
> DEFINE_I440FX_MACHINE(v1_4, "pc-i440fx-1.4", pc_compat_1_4_fn,
>   pc_i440fx_1_4_machine_options);
> 
>-typedef struct {
>-uint16_t gpu_device_id;
>-uint16_t pch_device_id;
>-uint8_t pch_revision_id;
>-} IGDDeviceIDInfo;
>-
>-/* In real world different GPU should have different PCH. But actually
>- * the different PCH DIDs likely map to different PCH SKUs. We do the
>- * same thing for the GPU. For PCH, the different SKUs are going to be
>- * all the same silicon design and implementation, just different
>- * features turn on and off with fuses. The SW interfaces should be
>- * consistent across all SKUs in a given family (eg LPT). But just same
>- * features may not be supported.
>- *
>- * Most of these different PCH features probably don't matter to the
>- * Gfx driver, but obviously any difference in display port connections
>- * will so it should be fine with any PCH in case of passthrough.
>- *
>- * So currently use one PCH version, 0x8c4e, to cover all HSW(Haswell)
>- * scenarios, 0x9cc3 for BDW(Broadwell).
>- */
>-static const IGDDeviceIDInfo igd_combo_id_infos[] = {
>-/* HSW Classic */
>-{0x0402, 0x8c4e, 0x04}, /* HSWGT1D, HSWD_w7 */
>-{0x0406, 0x8c4e, 0x04}, /* HSWGT1M, HSWM_w7 */
>-{0x0412, 0x8c4e, 0x04}, /* HSWGT2D, HSWD_w7 */
>-{0x0416, 0x8c4e, 0x04}, /* HSWGT2M, HSWM_w7 */
>-{0x041E, 0x8c4e, 0x04}, /* HSWGT15D, HSWD_w7 */
>-/* HSW ULT */
>-{0x0A06, 0x8c4e, 0x04}, /* HSWGT1UT, HSWM_w7 */
>-{0x0A16, 0x8c4e, 0x04}, /* HSWGT2UT, HSWM_w7 */
>-{0x0A26, 0x8c4e, 0x06}, /* HSWGT3UT, HSWM_w7 */
>-{0x0A2E, 0x8c4e, 0x04}, /* HSWGT3UT28W, HSWM_w7 */
>-{0x0A1E, 0x8c4e, 0x04}, /* HSWGT2UX, HSWM_w7 */
>-{0x0A0E, 0x8c4e, 0x04}, /* HSWGT1ULX, HSWM_w7 */
>-/* HSW CRW */
>-{0x0D26, 0x8c4e, 0x04}, /* HSWGT3CW, HSWM_w7 */
>-{0x0D22, 0x8c4e, 0x04}, /* HSWGT3CWDT, HSWD_w7 */
>-/* HSW Server */
>-{0x041A, 0x8c4e, 0x04}, /* HSWSVGT2, HSWD_w7 */
>-/* HSW SRVR */
>-{0x040A, 0x8c4e, 0x04}, /* HSWSVGT1, HSWD_w7 */
>-/* BSW */
>-{0x1606, 0x9cc3, 0x03}, /* BDWULTGT1, BDWM_w7 */
>-{0x1616, 0x9cc3, 0x03}, /* BDWULTGT2, BDWM_w7 */
>-{0x1626, 0x9cc3, 0x03}, /* BDWULTGT3, BDWM_w7 */
>-{0x160E, 0x9cc3, 0x03}, /* BDWULXGT1, BDWM_w7 */
>-{0x161E, 0x9cc3, 0x03}, /* BDWULXGT2, BDWM_w7 */
>-{0x1602, 0x9cc3, 0x03}, /* BDWHALOGT1, BDWM_w7 */
>-{0x1612, 0x9cc3, 0x03}, /* BDWHALOGT2, BDWM_w7 */
>-{0x1622, 0x9cc3, 0x03}, /* BDWHALOGT3, BDWM_w7 */
>-{0x162B, 0x9cc3, 0x03}, /* BDWHALO28W, BDWM_w7 */
>-{0x162A, 0x9cc3, 0x03}, /* BDWGT3WRKS, BDWM_w7 */
>-{0x162D, 0x9cc3, 0x03}, /* BDWGT3SRVR, BDWM_w7 */
>-};
>-
>-static void isa_bridge_class_init(ObjectClass *klass, void *data)
>-{
>-DeviceClass *dc = DEVICE_CLASS(klass);
>-PCIDeviceClass *k = PCI_DEVICE_CLASS(klass);
>-
>-dc->desc= "ISA bridge faked to support IGD PT";
>-set_bit(DEVICE_CATEGORY_BRIDGE, dc->categories);
>-k->vendor_id= PCI_VENDOR_ID_INTEL;
>-k->class_id = PCI_CLASS_BRIDGE_ISA;
>-};
>-
>-static const TypeInfo isa_bridge_info = {
>-.name  = "igd-passthrough-isa-bridge",
>-.parent= TYPE_PCI_DEVICE,
>-.instance_size = sizeof(PCIDevice),
>-.class_init = isa_bridge_class_init,
>-.interfaces = (InterfaceInfo[]) {
>-{ INTERFACE_CONVENTIONAL_PCI_DEVICE },
>-{ },
>-},
>-};
>-
>-static void pt_graphics_register_types(void)
>-{
>-type_register_static(&isa_bridge_info);
>-}
>-type_init(pt_graphics_register_types)
>-
>-void igd_passthrough_isa_bridge_create(PCIBus *bus, uint16_t gpu_dev_id)
>-{
>-struct PCIDevice *bridge_dev;
>-int i, num;
>-uint16_t pch_dev_id = 0xffff;
>-uint8_t pch_rev_id = 0;
>-
>-num = ARRAY_SIZE(igd_combo_id_infos);
>-for (i = 0; i < num; i++) {
>-if (gpu_dev_id == igd_combo_id_infos[i].gpu_device_id) {
>-pch_dev_id = igd_combo_id_infos[i].pch_device_id;
>-pch_rev_id = igd_combo_id_infos[i].pch_revision_id;
>-}
>-}
>-
>-if (pch_dev_id == 0xffff) {
>-return;
>-}
>-
>-/* Currently IGD drivers always need to access PCH by 1f.0. */
>-bridge_dev = 

Re: [PATCH v4 8/8] x86/boot: fold two MOVs into an ADD

2022-04-05 Thread Roger Pau Monné
On Thu, Mar 31, 2022 at 11:51:02AM +0200, Jan Beulich wrote:
> There's no point going through %ax; the addition can be done directly in
> %di.
> 
> Signed-off-by: Jan Beulich 

Acked-by: Roger Pau Monné 

Thanks, Roger.



Re: [PATCH v4 7/8] x86/boot: LEA -> MOV in video handling code

2022-04-05 Thread Roger Pau Monné
On Thu, Mar 31, 2022 at 11:50:20AM +0200, Jan Beulich wrote:
> Replace most LEA instances with (one byte shorter) MOV.
> 
> Signed-off-by: Jan Beulich 

Acked-by: Roger Pau Monné 

Thanks, Roger.



Re: [PATCH v4 6/8] x86/boot: fold/replace moves in video handling code

2022-04-05 Thread Roger Pau Monné
On Thu, Mar 31, 2022 at 11:50:00AM +0200, Jan Beulich wrote:
> Replace (mainly) MOV forms with shorter insns (or sequences thereof).
> 
> Signed-off-by: Jan Beulich 

Acked-by: Roger Pau Monné 

Thanks, Roger.



Re: [PATCH 1/2] tools/firmware: fix setting of fcf-protection=none

2022-04-05 Thread Andrew Cooper
On 05/04/2022 12:04, Jan Beulich wrote:
> On 05.04.2022 12:58, Andrew Cooper wrote:
>> On 05/04/2022 11:18, Jan Beulich wrote:
>>> On 01.04.2022 17:05, Andrew Cooper wrote:
>>>> On 01/04/2022 15:48, Andrew Cooper wrote:
>>>>> On 01/04/2022 15:37, Roger Pau Monne wrote:
>>>>>> Setting the fcf-protection=none option in EMBEDDED_EXTRA_CFLAGS in the
>>>>>> Makefile doesn't get it propagated to the subdirectories, so instead
>>>>>> set the flag in firmware/Rules.mk, like it's done for other compiler
>>>>>> flags.
>>>>>>
>>>>>> Fixes: 3667f7f8f7 ('x86: Introduce support for CET-IBT')
>>>>>> Signed-off-by: Roger Pau Monné 
>>>>> Acked-by: Andrew Cooper 
>>>> This also needs backporting with the XSA-398 CET-IBT fixes.
>>> I don't think so - the backports of the original commit didn't include
>>> what this patch fixes. I have queued patch 2 of this series though.
>> In which case I screwed up the backport.  (I remember spotting this bug
>> and thought I'd corrected it, but clearly not.)  tools/firmware really
>> does need to be -fcf-protection=none to counteract the defaults in
>> Ubuntu/etc.
> Okay, I'll adjust title and description some then while doing the backport.

Thanks, and sorry for this mess.

~Andrew


Re: [PATCH 1/2] tools/firmware: fix setting of fcf-protection=none

2022-04-05 Thread Jan Beulich
On 05.04.2022 12:58, Andrew Cooper wrote:
> On 05/04/2022 11:18, Jan Beulich wrote:
>> On 01.04.2022 17:05, Andrew Cooper wrote:
>>> On 01/04/2022 15:48, Andrew Cooper wrote:
>>>> On 01/04/2022 15:37, Roger Pau Monne wrote:
>>>>> Setting the fcf-protection=none option in EMBEDDED_EXTRA_CFLAGS in the
>>>>> Makefile doesn't get it propagated to the subdirectories, so instead
>>>>> set the flag in firmware/Rules.mk, like it's done for other compiler
>>>>> flags.
>>>>>
>>>>> Fixes: 3667f7f8f7 ('x86: Introduce support for CET-IBT')
>>>>> Signed-off-by: Roger Pau Monné 
>>>> Acked-by: Andrew Cooper 
>>> This also needs backporting with the XSA-398 CET-IBT fixes.
>> I don't think so - the backports of the original commit didn't include
>> what this patch fixes. I have queued patch 2 of this series though.
> 
> In which case I screwed up the backport.  (I remember spotting this bug
> and thought I'd corrected it, but clearly not.)  tools/firmware really
> does need to be -fcf-protection=none to counteract the defaults in
> Ubuntu/etc.

Okay, I'll adjust title and description some then while doing the backport.

Jan




Re: Increasing domain memory beyond initial maxmem

2022-04-05 Thread Juergen Gross

Hi Marek,

On 31.03.22 14:36, Marek Marczykowski-Górecki wrote:
> On Thu, Mar 31, 2022 at 02:22:03PM +0200, Juergen Gross wrote:
>> Maybe some kernel config differences, or other udev rules (memory onlining
>> is done via udev in my guest)?
>>
>> I'm seeing:
>>
>> # zgrep MEMORY_HOTPLUG /proc/config.gz
>> CONFIG_ARCH_ENABLE_MEMORY_HOTPLUG=y
>> CONFIG_MEMORY_HOTPLUG=y
>> # CONFIG_MEMORY_HOTPLUG_DEFAULT_ONLINE is not set
>> CONFIG_XEN_BALLOON_MEMORY_HOTPLUG=y
>> CONFIG_XEN_MEMORY_HOTPLUG_LIMIT=512
>
> I have:
> # zgrep MEMORY_HOTPLUG /proc/config.gz
> CONFIG_ARCH_ENABLE_MEMORY_HOTPLUG=y
> CONFIG_MEMORY_HOTPLUG=y
> CONFIG_MEMORY_HOTPLUG_DEFAULT_ONLINE=y
> CONFIG_XEN_BALLOON_MEMORY_HOTPLUG=y
> CONFIG_XEN_MEMORY_HOTPLUG_LIMIT=512
>
> Not sure if relevant, but I also have:
> CONFIG_XEN_UNPOPULATED_ALLOC=y
>
> on top of that, I have a similar udev rule too:
>
> SUBSYSTEM=="memory", ACTION=="add", ATTR{state}=="offline", ATTR{state}="online"
>
> But I don't think they are conflicting.
>
>> What type of guest are you using? Mine was a PVH guest.
>
> PVH here too.


Would you like to try the attached patch? It seemed to work for me.


Juergen
From a605232115a9c3d3f8103d0833b149ff22956c4b Mon Sep 17 00:00:00 2001
From: Juergen Gross 
To: linux-ker...@vger.kernel.org
Cc: Boris Ostrovsky 
Cc: Juergen Gross 
Cc: Stefano Stabellini 
Cc: xen-devel@lists.xenproject.org
Date: Tue, 5 Apr 2022 12:43:41 +0200
Subject: [PATCH] xen/balloon: fix page onlining when populating new zone
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

When onlining a new memory page in a guest the Xen balloon driver is
adding it to the ballooned pages instead of making it available to be
used immediately. This is meant to enable adding a new upper memory
limit to a guest via hotplugging memory, without having to assign the
new memory in one go.

In case the upper memory limit will be raised above 4G, the new memory
will populate the ZONE_NORMAL memory zone, which wasn't populated
before. The newly populated zone won't be added to the list of zones
looked at by the page allocator though, as only zones with available
memory are being added, and the memory isn't yet available as it is
ballooned out.

This will result in the new memory being assigned to the guest, but
without the allocator being able to use it.

When running as a PV guest the situation is even worse: when having
been started with less memory than allowed, and the upper limit being
lower than 4G, ballooning up will have the same effect as hotplugging
new memory. This is due to the usage of the zone device functionality
since commit 9e2369c06c8a ("xen: add helpers to allocate unpopulated
memory") for creating mappings of other guest's pages, which as a side
effect is being used for PV guest ballooning, too.

Fix this by checking in xen_online_page() whether the new memory page
will be the first in a new zone. If this is the case, add another page
to the balloon and use the first memory page of the new chunk as a
replacement for this now ballooned out page. This will result in the
newly populated zone containing one page being available for the page
allocator, which in turn will lead to the zone being added to the
allocator.

Cc: sta...@vger.kernel.org
Fixes: 9e2369c06c8a ("xen: add helpers to allocate unpopulated memory")
Reported-by: Marek Marczykowski-Górecki 
Signed-off-by: Juergen Gross 
---
 drivers/xen/balloon.c | 72 ++-
 1 file changed, 65 insertions(+), 7 deletions(-)

diff --git a/drivers/xen/balloon.c b/drivers/xen/balloon.c
index dfe26fa17e95..f895c54c4c65 100644
--- a/drivers/xen/balloon.c
+++ b/drivers/xen/balloon.c
@@ -355,14 +355,77 @@ static enum bp_state reserve_additional_memory(void)
 	return BP_ECANCELED;
 }
 
+static struct page *alloc_page_for_balloon(gfp_t gfp)
+{
+	struct page *page;
+
+	page = alloc_page(gfp);
+	if (page == NULL)
+		return NULL;
+
+	adjust_managed_page_count(page, -1);
+	xenmem_reservation_scrub_page(page);
+
+	return page;
+}
+
+static void add_page_to_balloon(struct page *page)
+{
+	xenmem_reservation_va_mapping_reset(1, &page);
+	balloon_append(page);
+}
+
 static void xen_online_page(struct page *page, unsigned int order)
 {
 	unsigned long i, size = (1 << order);
 	unsigned long start_pfn = page_to_pfn(page);
 	struct page *p;
+	struct zone *zone;
 
 	pr_debug("Online %lu pages starting at pfn 0x%lx\n", size, start_pfn);
 	mutex_lock(&balloon_mutex);
+	zone = page_zone(pfn_to_page(start_pfn));
+
+	/*
+	 * In case a new memory zone is going to be populated, we need to
+	 * ensure at least one page is made available for the memory allocator.
+	 * As the number of pages per zone is updated only after a batch of
+	 * pages having been added, use the number of managed pages as an
+	 * additional indicator for a new zone.
+	 * Otherwise this zone won't be added to the zonelist resulting in the
+	 * zone's memory not usable by the kernel.
+	 * Add an already valid page to the balloon and replace it with the
+	 * first page of the to be added new 

Re: [PATCH 1/2] tools/firmware: fix setting of fcf-protection=none

2022-04-05 Thread Andrew Cooper
On 05/04/2022 11:18, Jan Beulich wrote:
> On 01.04.2022 17:05, Andrew Cooper wrote:
>> On 01/04/2022 15:48, Andrew Cooper wrote:
>>> On 01/04/2022 15:37, Roger Pau Monne wrote:
>>>> Setting the fcf-protection=none option in EMBEDDED_EXTRA_CFLAGS in the
>>>> Makefile doesn't get it propagated to the subdirectories, so instead
>>>> set the flag in firmware/Rules.mk, like it's done for other compiler
>>>> flags.
>>>>
>>>> Fixes: 3667f7f8f7 ('x86: Introduce support for CET-IBT')
>>>> Signed-off-by: Roger Pau Monné 
>>> Acked-by: Andrew Cooper 
>> This also needs backporting with the XSA-398 CET-IBT fixes.
> I don't think so - the backports of the original commit didn't include
> what this patch fixes. I have queued patch 2 of this series though.

In which case I screwed up the backport.  (I remember spotting this bug
and thought I'd corrected it, but clearly not.)  tools/firmware really
does need to be -fcf-protection=none to counteract the defaults in
Ubuntu/etc.

~Andrew


Re: [PATCH v4 2/8] x86/boot: obtain video info from boot loader

2022-04-05 Thread Jan Beulich
On 05.04.2022 11:35, Roger Pau Monné wrote:
> On Thu, Mar 31, 2022 at 11:45:02AM +0200, Jan Beulich wrote:
>> --- a/xen/arch/x86/boot/head.S
>> +++ b/xen/arch/x86/boot/head.S
>> @@ -562,12 +562,18 @@ trampoline_setup:
>>  mov %esi, sym_esi(xen_phys_start)
>>  mov %esi, sym_esi(trampoline_xen_phys_start)
>>  
>> -mov sym_esi(trampoline_phys), %ecx
>> -
>>  /* Get bottom-most low-memory stack address. */
>> +mov sym_esi(trampoline_phys), %ecx
>>  add $TRAMPOLINE_SPACE,%ecx
> 
> Just for my understanding, since you are already touching the
> instruction, why not switch it to a lea like you do below?
> 
> Is that because you would also like to take the opportunity to fold
> the add into the lea and that would be too much of a change?

No. This MOV cannot be converted, as its source operand isn't an
immediate (or register); such a conversion would also be undesirable,
for increasing insn size. See the later patch doing conversions in
the other direction, to reduce code size. Somewhat similarly ...

>> +#ifdef CONFIG_VIDEO
>> +lea sym_esi(boot_vid_info), %edx

... this LEA also cannot be expressed by a single MOV.

>> @@ -32,6 +33,39 @@ asm (
>>  #include "../../../include/xen/kconfig.h"
>>  #include 
>>  
>> +#ifdef CONFIG_VIDEO
>> +# include "video.h"
>> +
>> +/* VESA control information */
>> +struct __packed vesa_ctrl_info {
>> +uint8_t signature[4];
>> +uint16_t version;
>> +uint32_t oem_name;
>> +uint32_t capabilities;
>> +uint32_t mode_list;
>> +uint16_t mem_size;
>> +/* We don't use any further fields. */
>> +};
>> +
>> +/* VESA 2.0 mode information */
>> +struct vesa_mode_info {
> 
> Should we add __packed here just in case further added fields are no
> longer naturally aligned? (AFAICT all fields right now are aligned to
> their size so there's no need for it).

I think we should avoid __packed whenever possible.

>> +uint16_t attrib;
>> +uint8_t window[14]; /* We don't use the individual fields. */
>> +uint16_t bytes_per_line;
>> +uint16_t width;
>> +uint16_t height;
>> +uint8_t cell_width;
>> +uint8_t cell_height;
>> +uint8_t nr_planes;
>> +uint8_t depth;
>> +uint8_t memory[5]; /* We don't use the individual fields. */
>> +struct boot_video_colors colors;
>> +uint8_t direct_color;
>> +uint32_t base;
>> +/* We don't use any further fields. */
>> +};
> 
> Would it make sense to put those struct definitions in boot/video.h
> like you do for boot_video_info?

Personally I prefer to expose things in headers only when multiple
other files want to consume what is being declared/defined.

>> @@ -254,17 +291,64 @@ static multiboot_info_t *mbi2_reloc(u32
>>  ++mod_idx;
>>  break;
>>  
>> +#ifdef CONFIG_VIDEO
>> +case MULTIBOOT2_TAG_TYPE_VBE:
>> +if ( video_out )
>> +{
>> +const struct vesa_ctrl_info *ci;
>> +const struct vesa_mode_info *mi;
>> +
>> +video = _p(video_out);
>> +ci = (void *)get_mb2_data(tag, vbe, vbe_control_info);
>> +mi = (void *)get_mb2_data(tag, vbe, vbe_mode_info);
>> +
>> +if ( ci->version >= 0x0200 && (mi->attrib & 0x9b) == 0x9b )
>> +{
>> +video->capabilities = ci->capabilities;
>> +video->lfb_linelength = mi->bytes_per_line;
>> +video->lfb_width = mi->width;
>> +video->lfb_height = mi->height;
>> +video->lfb_depth = mi->depth;
>> +video->lfb_base = mi->base;
>> +video->lfb_size = ci->mem_size;
>> +video->colors = mi->colors;
>> +video->vesa_attrib = mi->attrib;
>> +}
>> +
>> +video->vesapm.seg = get_mb2_data(tag, vbe, vbe_interface_seg);
>> +video->vesapm.off = get_mb2_data(tag, vbe, vbe_interface_off);
>> +}
>> +break;
>> +
>> +case MULTIBOOT2_TAG_TYPE_FRAMEBUFFER:
>> +if ( (get_mb2_data(tag, framebuffer, framebuffer_type) !=
>> +  MULTIBOOT2_FRAMEBUFFER_TYPE_RGB) )
>> +{
>> +video_out = 0;
>> +video = NULL;
>> +}
> 
> I'm confused, don't you need to store the information in the
> framebuffer tag for use after relocation?

If there was a consumer - yes. Right now this tag is used only to
invalidate the information taken from the other tag (or to suppress
taking values from there if that other tag came later) in case the
framebuffer type doesn't match what we support.

>> +break;
>> +#endif /* CONFIG_VIDEO */
>> +
>>  case MULTIBOOT2_TAG_TYPE_END:
>> -return mbi_out;
>> +goto end; /* Cannot "break;" here. */
>>  
>>  default:
>>  break;
>>  }
>>  

Re: [PATCH v4 5/8] x86/boot: fold branches in video handling code

2022-04-05 Thread Roger Pau Monné
On Thu, Mar 31, 2022 at 11:49:24AM +0200, Jan Beulich wrote:
> Using Jcc to branch around a JMP is necessary only in pre-386 code,
> where Jcc is limited to disp8. Use the opposite Jcc directly in two
> places. Since it's adjacent, also convert an ORB to TESTB.
> 
> Signed-off-by: Jan Beulich 

Reviewed-by: Roger Pau Monné 

Thanks, Roger.



Re: [PATCH v4 4/8] x86/boot: simplify mode_table

2022-04-05 Thread Roger Pau Monné
On Thu, Mar 31, 2022 at 11:48:51AM +0200, Jan Beulich wrote:
> There's no point in writing 80x25 text mode information via multiple
> insns all storing immediate values. The data can simply be included
> first thing in the vga_modes table, allowing the already present
> REP MOVSB to take care of everything in one go.
> 
> While touching this also correct a related but stale comment.
> 
> Signed-off-by: Jan Beulich 

Reviewed-by: Roger Pau Monné 

Thanks, Roger.



[seabios test] 169167: tolerable FAIL - PUSHED

2022-04-05 Thread osstest service owner
flight 169167 seabios real [real]
http://logs.test-lab.xenproject.org/osstest/logs/169167/

Failures :-/ but no regressions.

Tests which did not succeed, but are not blocking:
 test-amd64-amd64-xl-qemuu-win7-amd64 19 guest-stop             fail like 168315
 test-amd64-amd64-qemuu-nested-amd 20 debian-hvm-install/l1/l2  fail like 168315
 test-amd64-amd64-xl-qemuu-ws16-amd64 19 guest-stop             fail like 168315
 test-amd64-i386-xl-qemuu-win7-amd64 19 guest-stop              fail like 168315
 test-amd64-i386-xl-qemuu-ws16-amd64 19 guest-stop              fail like 168315
 test-amd64-i386-libvirt-qemuu-debianhvm-amd64-xsm 13 migrate-support-check fail never pass
 test-amd64-amd64-libvirt-qemuu-debianhvm-amd64-xsm 13 migrate-support-check fail never pass

version targeted for testing:
 seabios  01774004c7f7fdc9c1e8f1715f70d3b913f8d491
baseline version:
 seabios  d239552ce7220e448ae81f41515138f7b9e3c4db

Last test of basis   168315  2022-03-02 02:40:13 Z   34 days
Testing same since   169167  2022-04-04 21:41:47 Z    0 days    1 attempts


People who touched revisions under test:
  Volker Rümelin 

jobs:
 build-amd64-xsm                                         pass
 build-i386-xsm                                          pass
 build-amd64                                             pass
 build-i386                                              pass
 build-amd64-libvirt                                     pass
 build-i386-libvirt                                      pass
 build-amd64-pvops                                       pass
 build-i386-pvops                                        pass
 test-amd64-amd64-libvirt-qemuu-debianhvm-amd64-xsm      pass
 test-amd64-i386-libvirt-qemuu-debianhvm-amd64-xsm       pass
 test-amd64-amd64-xl-qemuu-debianhvm-i386-xsm            pass
 test-amd64-i386-xl-qemuu-debianhvm-i386-xsm             pass
 test-amd64-amd64-qemuu-nested-amd                       fail
 test-amd64-i386-qemuu-rhel6hvm-amd                      pass
 test-amd64-amd64-xl-qemuu-debianhvm-amd64               pass
 test-amd64-i386-xl-qemuu-debianhvm-amd64                pass
 test-amd64-amd64-qemuu-freebsd11-amd64                  pass
 test-amd64-amd64-qemuu-freebsd12-amd64                  pass
 test-amd64-amd64-xl-qemuu-win7-amd64                    fail
 test-amd64-i386-xl-qemuu-win7-amd64                     fail
 test-amd64-amd64-xl-qemuu-ws16-amd64                    fail
 test-amd64-i386-xl-qemuu-ws16-amd64                     fail
 test-amd64-amd64-xl-qemuu-dmrestrict-amd64-dmrestrict   pass
 test-amd64-i386-xl-qemuu-dmrestrict-amd64-dmrestrict    pass
 test-amd64-amd64-qemuu-nested-intel                     pass
 test-amd64-i386-qemuu-rhel6hvm-intel                    pass
 test-amd64-amd64-xl-qemuu-debianhvm-amd64-shadow        pass
 test-amd64-i386-xl-qemuu-debianhvm-amd64-shadow         pass



sg-report-flight on osstest.test-lab.xenproject.org
logs: /home/logs/logs
images: /home/logs/images

Logs, config files, etc. are available at
http://logs.test-lab.xenproject.org/osstest/logs

Explanation of these reports, and of osstest in general, is at
http://xenbits.xen.org/gitweb/?p=osstest.git;a=blob;f=README.email;hb=master
http://xenbits.xen.org/gitweb/?p=osstest.git;a=blob;f=README;hb=master

Test harness code can be found at
http://xenbits.xen.org/gitweb?p=osstest.git;a=summary


Pushing revision :

To xenbits.xen.org:/home/xen/git/osstest/seabios.git
   d239552..0177400  01774004c7f7fdc9c1e8f1715f70d3b913f8d491 -> xen-tested-master



Re: [PATCH v4 3/8] x86/EFI: retrieve EDID

2022-04-05 Thread Roger Pau Monné
On Thu, Mar 31, 2022 at 11:45:36AM +0200, Jan Beulich wrote:
> When booting directly from EFI, obtaining this information from EFI is
> the only possible way. And even when booting with a boot loader
> interposed, it's more clean not to use legacy BIOS calls for this
> purpose. (The downside being that there are no "capabilities" that we
> can retrieve the EFI way.)
> 
> To achieve this we need to propagate the handle used to obtain the
> EFI_GRAPHICS_OUTPUT_PROTOCOL instance for further obtaining an
> EFI_EDID_*_PROTOCOL instance, which has been part of the spec since 2.5.
> 
> Signed-off-by: Jan Beulich 
> ---
> Setting boot_edid_caps to zero isn't desirable, but arbitrarily setting
> one or both of the two low bits also doesn't seem appropriate.
> 
> GrUB also checks an "agp-internal-edid" variable. As I haven't been able
> to find any related documentation, and as GrUB being happy about the
> variable being any size (rather than at least / precisely 128 bytes),
> I didn't follow that route.
> ---
> v3: Re-base.
> v2: New.
> 
> --- a/xen/arch/arm/efi/efi-boot.h
> +++ b/xen/arch/arm/efi/efi-boot.h
> @@ -464,6 +464,10 @@ static void __init efi_arch_edd(void)
>  {
>  }
>  
> +static void __init efi_arch_edid(EFI_HANDLE gop_handle)
> +{
> +}
> +
>  static void __init efi_arch_memory_setup(void)
>  {
>  }
> --- a/xen/arch/x86/boot/video.S
> +++ b/xen/arch/x86/boot/video.S
> @@ -922,7 +922,14 @@ store_edid:
>  pushw   %dx
>  pushw   %di
>  
> -cmpb    $1, bootsym(opt_edid)   # EDID disabled on cmdline (edid=no)?
> +movb    bootsym(opt_edid), %al
> +cmpw    $0x1313, bootsym(boot_edid_caps) # Data already retrieved?
> +je      .Lcheck_edid
> +cmpb    $2, %al                 # EDID forced on cmdline (edid=force)?
> +jne     .Lno_edid
> +
> +.Lcheck_edid:
> +cmpb    $1, %al                 # EDID disabled on cmdline (edid=no)?
>  je      .Lno_edid
>  
>  leaw    vesa_glob_info, %di
> --- a/xen/arch/x86/efi/efi-boot.h
> +++ b/xen/arch/x86/efi/efi-boot.h
> @@ -568,6 +568,49 @@ static void __init efi_arch_video_init(E
>  #endif
>  }
>  
> +#ifdef CONFIG_VIDEO
> +static bool __init copy_edid(const void *buf, unsigned int size)
> +{
> +/*
> + * Be conservative - for both undersized and oversized blobs it is unclear
> + * what to actually do with them. The more that unlike the VESA BIOS
> + * interface we also have no associated "capabilities" value (which might
> + * carry a hint as to possible interpretation).
> + */
> +if ( size != ARRAY_SIZE(boot_edid_info) )
> +return false;
> +
> +memcpy(boot_edid_info, buf, size);
> +boot_edid_caps = 0;
> +
> +return true;
> +}
> +#endif
> +
> +static void __init efi_arch_edid(EFI_HANDLE gop_handle)
> +{
> +#ifdef CONFIG_VIDEO
> +static EFI_GUID __initdata active_guid = EFI_EDID_ACTIVE_PROTOCOL_GUID;
> +static EFI_GUID __initdata discovered_guid = EFI_EDID_DISCOVERED_PROTOCOL_GUID;

Is there a need to make those static?

I think this function is either called from efi_start or
efi_multiboot, but there aren't multiple calls to it? (also both
parameters are IN only, so not to be changed by the EFI method?)

I have the feeling setting them to static is done because they can't
be set to const?

> +EFI_EDID_ACTIVE_PROTOCOL *active_edid;
> +EFI_EDID_DISCOVERED_PROTOCOL *discovered_edid;
> +EFI_STATUS status;
> +
> +status = efi_bs->OpenProtocol(gop_handle, &active_guid,
> +  (void **)&active_edid, efi_ih, NULL,
> +  EFI_OPEN_PROTOCOL_GET_PROTOCOL);
> +if ( status == EFI_SUCCESS &&
> + copy_edid(active_edid->Edid, active_edid->SizeOfEdid) )
> +return;

Isn't it enough to just call EFI_EDID_ACTIVE_PROTOCOL_GUID?

From my reading of the UEFI spec this will either return
EFI_EDID_OVERRIDE_PROTOCOL_GUID or EFI_EDID_DISCOVERED_PROTOCOL_GUID.
If EFI_EDID_OVERRIDE_PROTOCOL is set it must be used, and hence
falling back to EFI_EDID_DISCOVERED_PROTOCOL_GUID if
EFI_EDID_ACTIVE_PROTOCOL_GUID cannot be parsed would likely mean
ignoring EFI_EDID_OVERRIDE_PROTOCOL?

> +status = efi_bs->OpenProtocol(gop_handle, &discovered_guid,
> +  (void **)&discovered_edid, efi_ih, NULL,
> +  EFI_OPEN_PROTOCOL_GET_PROTOCOL);
> +if ( status == EFI_SUCCESS )
> +copy_edid(discovered_edid->Edid, discovered_edid->SizeOfEdid);
> +#endif
> +}
> +
>  static void __init efi_arch_memory_setup(void)
>  {
>  unsigned int i;
> @@ -729,6 +772,7 @@ static void __init efi_arch_flush_dcache
>  void __init efi_multiboot2(EFI_HANDLE ImageHandle, EFI_SYSTEM_TABLE *SystemTable)
>  {
>  EFI_GRAPHICS_OUTPUT_PROTOCOL *gop;
> +EFI_HANDLE gop_handle;
>  UINTN cols, gop_mode = ~0, rows;
>  
>  __set_bit(EFI_BOOT, _flags);
> @@ -742,11 +786,15 @@ void __init efi_multiboot2(EFI_HANDLE Im
> , ) == 

Re: [PATCH 1/2] tools/firmware: fix setting of fcf-protection=none

2022-04-05 Thread Jan Beulich
On 01.04.2022 17:05, Andrew Cooper wrote:
> On 01/04/2022 15:48, Andrew Cooper wrote:
>> On 01/04/2022 15:37, Roger Pau Monne wrote:
>>> Setting the fcf-protection=none option in EMBEDDED_EXTRA_CFLAGS in the
>>> Makefile doesn't get it propagated to the subdirectories, so instead
>>> set the flag in firmware/Rules.mk, like it's done for other compiler
>>> flags.
>>>
>>> Fixes: 3667f7f8f7 ('x86: Introduce support for CET-IBT')
>>> Signed-off-by: Roger Pau Monné 
>> Acked-by: Andrew Cooper 
> 
> This also needs backporting with the XSA-398 CET-IBT fixes.

I don't think so - the backports of the original commit didn't include
what this patch fixes. I have queued patch 2 of this series though.

Jan




Re: [PATCH v4 4/9] xen: export evtchn_alloc_unbound

2022-04-05 Thread Jan Beulich
On 01.04.2022 02:38, Stefano Stabellini wrote:
> From: Stefano Stabellini 
> 
> It will be used during dom0less domains construction.
> 
> Signed-off-by: Stefano Stabellini 

I think this better wouldn't be a patch of its own. Functions should
be non-static only when they have a user outside of their defining TU.

> --- a/xen/include/xen/event.h
> +++ b/xen/include/xen/event.h
> @@ -71,6 +71,9 @@ void evtchn_free(struct domain *d, struct evtchn *chn);
>  /* Allocate a specific event channel port. */
>  int evtchn_allocate_port(struct domain *d, unsigned int port);
>  
> +/* Allocate a new event channel */
> +int evtchn_alloc_unbound(evtchn_alloc_unbound_t *alloc);

I wonder whether while exposing it the function should also become
__must_check.

Jan




Re: [PATCH v4 2/8] x86/boot: obtain video info from boot loader

2022-04-05 Thread Roger Pau Monné
On Thu, Mar 31, 2022 at 11:45:02AM +0200, Jan Beulich wrote:
> With MB2 the boot loader may provide this information, allowing us to
> obtain it without needing to enter real mode (assuming we don't need to
> set a new mode from "vga=", but can instead inherit the one the
> bootloader may have established).
> 
> Signed-off-by: Jan Beulich 
> ---
> v4: Re-base.
> v3: Re-base.
> v2: New.
> 
> --- a/xen/arch/x86/boot/defs.h
> +++ b/xen/arch/x86/boot/defs.h
> @@ -53,6 +53,7 @@ typedef unsigned int u32;
>  typedef unsigned long long u64;
>  typedef unsigned int size_t;
>  typedef u8 uint8_t;
> +typedef u16 uint16_t;
>  typedef u32 uint32_t;
>  typedef u64 uint64_t;
>  
> --- a/xen/arch/x86/boot/head.S
> +++ b/xen/arch/x86/boot/head.S
> @@ -562,12 +562,18 @@ trampoline_setup:
>  mov %esi, sym_esi(xen_phys_start)
>  mov %esi, sym_esi(trampoline_xen_phys_start)
>  
> -mov sym_esi(trampoline_phys), %ecx
> -
>  /* Get bottom-most low-memory stack address. */
> +mov sym_esi(trampoline_phys), %ecx
>  add $TRAMPOLINE_SPACE,%ecx

Just for my understanding, since you are already touching the
instruction, why not switch it to a lea like you do below?

Is that because you would also like to take the opportunity to fold
the add into the lea and that would be too much of a change?

>  
> +#ifdef CONFIG_VIDEO
> +lea sym_esi(boot_vid_info), %edx
> +#else
> +xor %edx, %edx
> +#endif
> +
>  /* Save Multiboot / PVH info struct (after relocation) for later use. */
> +push    %edx    /* Boot video info to be filled from MB2. */
>  push    %ecx    /* Bottom-most low-memory stack address. */
>  push    %ebx    /* Multiboot / PVH information address. */
>  push    %eax    /* Magic number. */
> --- a/xen/arch/x86/boot/reloc.c
> +++ b/xen/arch/x86/boot/reloc.c
> @@ -14,9 +14,10 @@
>  
>  /*
>   * This entry point is entered from xen/arch/x86/boot/head.S with:
> - *   - 0x4(%esp) = MAGIC,
> - *   - 0x8(%esp) = INFORMATION_ADDRESS,
> - *   - 0xc(%esp) = TOPMOST_LOW_MEMORY_STACK_ADDRESS.
> + *   - 0x04(%esp) = MAGIC,
> + *   - 0x08(%esp) = INFORMATION_ADDRESS,
> + *   - 0x0c(%esp) = TOPMOST_LOW_MEMORY_STACK_ADDRESS.
> + *   - 0x10(%esp) = BOOT_VIDEO_INFO_ADDRESS.
>   */
>  asm (
>  ".text \n"
> @@ -32,6 +33,39 @@ asm (
>  #include "../../../include/xen/kconfig.h"
>  #include 
>  
> +#ifdef CONFIG_VIDEO
> +# include "video.h"
> +
> +/* VESA control information */
> +struct __packed vesa_ctrl_info {
> +uint8_t signature[4];
> +uint16_t version;
> +uint32_t oem_name;
> +uint32_t capabilities;
> +uint32_t mode_list;
> +uint16_t mem_size;
> +/* We don't use any further fields. */
> +};
> +
> +/* VESA 2.0 mode information */
> +struct vesa_mode_info {

Should we add __packed here just in case further added fields are no
longer naturally aligned? (AFAICT all fields right now are aligned to
their size, so there's no need for it.)

> +uint16_t attrib;
> +uint8_t window[14]; /* We don't use the individual fields. */
> +uint16_t bytes_per_line;
> +uint16_t width;
> +uint16_t height;
> +uint8_t cell_width;
> +uint8_t cell_height;
> +uint8_t nr_planes;
> +uint8_t depth;
> +uint8_t memory[5]; /* We don't use the individual fields. */
> +struct boot_video_colors colors;
> +uint8_t direct_color;
> +uint32_t base;
> +/* We don't use any further fields. */
> +};

Would it make sense to put those struct definitions in boot/video.h
like you do for boot_video_info?

I also wonder whether you could then hide the #ifdef CONFIG_VIDEO
check inside of the header itself.

> +#endif /* CONFIG_VIDEO */
> +
>  #define get_mb2_data(tag, type, member)   (((multiboot2_tag_##type##_t *)(tag))->member)
>  #define get_mb2_string(tag, type, member) ((u32)get_mb2_data(tag, type, member))
>  
> @@ -146,7 +180,7 @@ static multiboot_info_t *mbi_reloc(u32 m
>  return mbi_out;
>  }
>  
> -static multiboot_info_t *mbi2_reloc(u32 mbi_in)
> +static multiboot_info_t *mbi2_reloc(uint32_t mbi_in, uint32_t video_out)
>  {
>  const multiboot2_fixed_t *mbi_fix = _p(mbi_in);
>  const multiboot2_memory_map_t *mmap_src;
> @@ -154,6 +188,9 @@ static multiboot_info_t *mbi2_reloc(u32
>  module_t *mbi_out_mods = NULL;
>  memory_map_t *mmap_dst;
>  multiboot_info_t *mbi_out;
> +#ifdef CONFIG_VIDEO
> +struct boot_video_info *video = NULL;
> +#endif
>  u32 ptr;
>  unsigned int i, mod_idx = 0;
>  
> @@ -254,17 +291,64 @@ static multiboot_info_t *mbi2_reloc(u32
>  ++mod_idx;
>  break;
>  
> +#ifdef CONFIG_VIDEO
> +case MULTIBOOT2_TAG_TYPE_VBE:
> +if ( video_out )
> +{
> +const struct vesa_ctrl_info *ci;
> +const struct vesa_mode_info *mi;
> +
> +video = 

Re: [PATCH v4 2/2] xen: Populate xen.lds.h and make use of its macros

2022-04-05 Thread Jan Beulich
On 05.04.2022 11:16, Michal Orzel wrote:
> Populate header file xen.lds.h with the first portion of macros storing
> constructs common to x86 and arm linker scripts. Replace the original
> constructs with these helpers.
> 
> No functional improvements to x86 linker script.
> 
> Making use of common macros improves arm linker script with:
> - explicit list of debug sections that otherwise are seen as "orphans"
>   by the linker. This will allow fixing issues after enabling linker
>   option --orphan-handling one day,
> - extended list of discarded sections to include: .discard, destructors
>   related sections, .fini_array which can reference .text.exit,
> - sections not related to debugging that are placed by ld.lld. Even
>   though we do not support linking with LLD on Arm, these sections do
>   not cause problems for GNU ld.
> 
> Please note that this patch does not aim to perform the full sync up
> between the linker scripts. It creates a base for further work.
> 
> Signed-off-by: Michal Orzel 

Reviewed-by: Jan Beulich 




[PATCH v4 2/2] xen: Populate xen.lds.h and make use of its macros

2022-04-05 Thread Michal Orzel
Populate header file xen.lds.h with the first portion of macros storing
constructs common to x86 and arm linker scripts. Replace the original
constructs with these helpers.

No functional improvements to x86 linker script.

Making use of common macros improves arm linker script with:
- explicit list of debug sections that otherwise are seen as "orphans"
  by the linker. This will allow fixing issues after enabling linker
  option --orphan-handling one day,
- extended list of discarded sections to include: .discard, destructors
  related sections, .fini_array which can reference .text.exit,
- sections not related to debugging that are placed by ld.lld. Even
  though we do not support linking with LLD on Arm, these sections do
  not cause problems for GNU ld.

Please note that this patch does not aim to perform the full sync up
between the linker scripts. It creates a base for further work.

Signed-off-by: Michal Orzel 
---
Changes since v3:
-use POINTER_ALIGN in debug sections when needed
-modify comment about ELF_DETAILS_SECTIONS
Changes since v2:
-refactor commit msg
-move constructs together with surrounding ifdefery
-list constructs other than *_SECTIONS in alphabetical order
-add comment about EFI vs EFI support
Changes since v1:
-merge x86 and arm changes into single patch
-do not propagate issues by generalizing CTORS
-extract sections not related to debugging into separate macro
-get rid of _SECTION suffix in favor of using more meaningful suffixes
---
 xen/arch/arm/xen.lds.S|  44 +++--
 xen/arch/x86/xen.lds.S|  96 +++-
 xen/include/xen/xen.lds.h | 129 ++
 3 files changed, 147 insertions(+), 122 deletions(-)

diff --git a/xen/arch/arm/xen.lds.S b/xen/arch/arm/xen.lds.S
index c666fc3e69..649aa04f7f 100644
--- a/xen/arch/arm/xen.lds.S
+++ b/xen/arch/arm/xen.lds.S
@@ -68,12 +68,7 @@ SECTIONS
*(.proc.info)
__proc_info_end = .;
 
-#ifdef CONFIG_HAS_VPCI
-   . = ALIGN(POINTER_ALIGN);
-   __start_vpci_array = .;
-   *(SORT(.data.vpci.*))
-   __end_vpci_array = .;
-#endif
+   VPCI_ARRAY
   } :text
 
 #if defined(BUILD_ID)
@@ -109,12 +104,7 @@ SECTIONS
*(.data.schedulers)
__end_schedulers_array = .;
 
-#ifdef CONFIG_HYPFS
-   . = ALIGN(8);
-   __paramhypfs_start = .;
-   *(.data.paramhypfs)
-   __paramhypfs_end = .;
-#endif
+   HYPFS_PARAM
 
*(.data .data.*)
CONSTRUCTORS
@@ -178,12 +168,7 @@ SECTIONS
*(.altinstructions)
__alt_instructions_end = .;
 
-#ifdef CONFIG_DEBUG_LOCK_PROFILE
-   . = ALIGN(POINTER_ALIGN);
-   __lock_profile_start = .;
-   *(.lockprofile.data)
-   __lock_profile_end = .;
-#endif
+   LOCK_PROFILE_DATA
 
*(.init.data)
*(.init.data.rel)
@@ -222,22 +207,13 @@ SECTIONS
   /* Section for the device tree blob (if any). */
   .dtb : { *(.dtb) } :text
 
-  /* Sections to be discarded */
-  /DISCARD/ : {
-   *(.exit.text)
-   *(.exit.data)
-   *(.exitcall.exit)
-   *(.eh_frame)
-  }
-
-  /* Stabs debugging sections.  */
-  .stab 0 : { *(.stab) }
-  .stabstr 0 : { *(.stabstr) }
-  .stab.excl 0 : { *(.stab.excl) }
-  .stab.exclstr 0 : { *(.stab.exclstr) }
-  .stab.index 0 : { *(.stab.index) }
-  .stab.indexstr 0 : { *(.stab.indexstr) }
-  .comment 0 : { *(.comment) }
+  DWARF2_DEBUG_SECTIONS
+
+  DISCARD_SECTIONS
+
+  STABS_DEBUG_SECTIONS
+
+  ELF_DETAILS_SECTIONS
 }
 
 /*
diff --git a/xen/arch/x86/xen.lds.S b/xen/arch/x86/xen.lds.S
index 3e65c09bb3..65cc4c9231 100644
--- a/xen/arch/x86/xen.lds.S
+++ b/xen/arch/x86/xen.lds.S
@@ -13,13 +13,6 @@
 #undef __XEN_VIRT_START
 #define __XEN_VIRT_START __image_base__
 #define DECL_SECTION(x) x :
-/*
- * Use the NOLOAD directive, despite currently ignored by (at least) GNU ld
- * for PE output, in order to record that we'd prefer these sections to not
- * be loaded into memory.
- */
-#define DECL_DEBUG(x, a) #x ALIGN(a) (NOLOAD) : { *(x) }
-#define DECL_DEBUG2(x, y, a) #x ALIGN(a) (NOLOAD) : { *(x) *(y) }
 
 ENTRY(efi_start)
 
@@ -27,8 +20,6 @@ ENTRY(efi_start)
 
 #define FORMAT "elf64-x86-64"
 #define DECL_SECTION(x) #x : AT(ADDR(#x) - __XEN_VIRT_START)
-#define DECL_DEBUG(x, a) #x 0 : { *(x) }
-#define DECL_DEBUG2(x, y, a) #x 0 : { *(x) *(y) }
 
 ENTRY(start_pa)
 
@@ -159,12 +150,7 @@ SECTIONS
*(.note.gnu.build-id)
__note_gnu_build_id_end = .;
 #endif
-#ifdef CONFIG_HAS_VPCI
-   . = ALIGN(POINTER_ALIGN);
-   __start_vpci_array = .;
-   *(SORT(.data.vpci.*))
-   __end_vpci_array = .;
-#endif
+   VPCI_ARRAY
   } PHDR(text)
 
 #if defined(CONFIG_PVH_GUEST) && !defined(EFI)
@@ -278,12 +264,7 @@ SECTIONS
 *(.altinstructions)
 __alt_instructions_end = .;
 
-#ifdef CONFIG_DEBUG_LOCK_PROFILE
-   . = ALIGN(POINTER_ALIGN);
-   __lock_profile_start = .;
-   *(.lockprofile.data)
-   __lock_profile_end = .;
-#endif
+   LOCK_PROFILE_DATA
 
. = ALIGN(8);

[PATCH v4 1/2] xen: Introduce a header to store common linker scripts content

2022-04-05 Thread Michal Orzel
Both x86 and arm linker scripts share quite a lot of common content.
It is difficult to keep syncing them up, thus introduce a new header
in include/xen called xen.lds.h to store the internals mutual to all
the linker scripts.

Include this header in linker scripts for x86 and arm.
This patch serves as an intermediate step before populating xen.lds.h
and making use of its content in the linker scripts later on.

Signed-off-by: Michal Orzel 
Acked-by: Jan Beulich 
---
Changes since v2,v3:
-none
Changes since v1:
-rename header to xen.lds.h to be coherent with Linux kernel
-include empty header in linker scripts
---
 xen/arch/arm/xen.lds.S| 1 +
 xen/arch/x86/xen.lds.S| 1 +
 xen/include/xen/xen.lds.h | 8 
 3 files changed, 10 insertions(+)
 create mode 100644 xen/include/xen/xen.lds.h

diff --git a/xen/arch/arm/xen.lds.S b/xen/arch/arm/xen.lds.S
index 7921d8fa28..c666fc3e69 100644
--- a/xen/arch/arm/xen.lds.S
+++ b/xen/arch/arm/xen.lds.S
@@ -3,6 +3,7 @@
 /* Modified for ARM Xen by Ian Campbell */
 
 #include 
+#include 
 #include 
 #undef ENTRY
 #undef ALIGN
diff --git a/xen/arch/x86/xen.lds.S b/xen/arch/x86/xen.lds.S
index 3f9f633f55..3e65c09bb3 100644
--- a/xen/arch/x86/xen.lds.S
+++ b/xen/arch/x86/xen.lds.S
@@ -2,6 +2,7 @@
 /* Modified for i386/x86-64 Xen by Keir Fraser */
 
 #include 
+#include 
 #include 
 #undef ENTRY
 #undef ALIGN
diff --git a/xen/include/xen/xen.lds.h b/xen/include/xen/xen.lds.h
new file mode 100644
index 00..dd292fa7dc
--- /dev/null
+++ b/xen/include/xen/xen.lds.h
@@ -0,0 +1,8 @@
+#ifndef __XEN_LDS_H__
+#define __XEN_LDS_H__
+
+/*
+ * Common macros to be used in architecture specific linker scripts.
+ */
+
+#endif /* __XEN_LDS_H__ */
-- 
2.25.1




[PATCH v4 0/2] xen: Linker scripts synchronization

2022-04-05 Thread Michal Orzel
This patch series aims to do the first step towards linker scripts
synchronization. Linker scripts for arm and x86 share a lot of common
sections and in order to make the process of changing/improving/syncing
them easier, these sections shall be defined in just one place.

The first patch creates an empty header file xen.lds.h to store the
constructs mutual to both x86 and arm linker scripts. It also includes
this header in the scripts.

The second patch populates xen.lds.h with the first portion of common
macros and replaces the original constructs with these helpers.

Michal Orzel (2):
  xen: Introduce a header to store common linker scripts content
  xen: Populate xen.lds.h and make use of its macros

 xen/arch/arm/xen.lds.S|  45 +++--
 xen/arch/x86/xen.lds.S|  97 +++
 xen/include/xen/xen.lds.h | 137 ++
 3 files changed, 157 insertions(+), 122 deletions(-)
 create mode 100644 xen/include/xen/xen.lds.h

-- 
2.25.1




Re: [PATCH v3 2/2] xen: Populate xen.lds.h and make use of its macros

2022-04-05 Thread Michal Orzel
Hi Jan,

On 05.04.2022 10:49, Jan Beulich wrote:
> On 31.03.2022 09:14, Michal Orzel wrote:
>> --- a/xen/include/xen/xen.lds.h
>> +++ b/xen/include/xen/xen.lds.h
>> @@ -5,4 +5,133 @@
>>   * Common macros to be used in architecture specific linker scripts.
>>   */
>>  
>> +/*
>> + * To avoid any confusion, please note that the EFI macro does not correspond
>> + * to EFI support and is used when linking a native EFI (i.e. PE/COFF) binary,
>> + * hence its usage in this header.
>> + */
>> +
>> +/* Macros to declare debug sections. */
>> +#ifdef EFI
>> +/*
>> + * Use the NOLOAD directive, despite currently ignored by (at least) GNU ld
>> + * for PE output, in order to record that we'd prefer these sections to not
>> + * be loaded into memory.
>> + */
>> +#define DECL_DEBUG(x, a) #x ALIGN(a) (NOLOAD) : { *(x) }
>> +#define DECL_DEBUG2(x, y, a) #x ALIGN(a) (NOLOAD) : { *(x) *(y) }
>> +#else
>> +#define DECL_DEBUG(x, a) #x 0 : { *(x) }
>> +#define DECL_DEBUG2(x, y, a) #x 0 : { *(x) *(y) }
>> +#endif
>> +
>> +/*
>> + * DWARF2+ debug sections.
>> + * Explicitly list debug sections, first of all to avoid these sections being
>> + * viewed as "orphan" by the linker.
>> + *
>> + * For the PE output this is further necessary so that they don't end up at
>> + * VA 0, which is below image base and thus invalid. Note that this macro is
>> + * to be used after _end, so if these sections get loaded they'll be discarded
>> + * at runtime anyway.
>> + */
>> +#define DWARF2_DEBUG_SECTIONS                     \
>> +  DECL_DEBUG(.debug_abbrev, 1)                    \
>> +  DECL_DEBUG2(.debug_info, .gnu.linkonce.wi.*, 1) \
>> +  DECL_DEBUG(.debug_types, 1)                     \
>> +  DECL_DEBUG(.debug_str, 1)                       \
>> +  DECL_DEBUG2(.debug_line, .debug_line.*, 1)      \
>> +  DECL_DEBUG(.debug_line_str, 1)                  \
>> +  DECL_DEBUG(.debug_names, 4)                     \
>> +  DECL_DEBUG(.debug_frame, 4)                     \
>> +  DECL_DEBUG(.debug_loc, 1)                       \
>> +  DECL_DEBUG(.debug_loclists, 4)                  \
>> +  DECL_DEBUG(.debug_macinfo, 1)                   \
>> +  DECL_DEBUG(.debug_macro, 1)                     \
>> +  DECL_DEBUG(.debug_ranges, 8)                    \
> 
> Here and ...
> 
>> +  DECL_DEBUG(.debug_rnglists, 4)                  \
>> +  DECL_DEBUG(.debug_addr, 8)                      \
> 
> ... here I think you also want to switch to POINTER_ALIGN.
> 
Ok, you're right.

>> +  DECL_DEBUG(.debug_aranges, 1)                   \
>> +  DECL_DEBUG(.debug_pubnames, 1)                  \
>> +  DECL_DEBUG(.debug_pubtypes, 1)
>> +
>> +/* Stabs debug sections. */
>> +#define STABS_DEBUG_SECTIONS \
>> +  .stab 0 : { *(.stab) } \
>> +  .stabstr 0 : { *(.stabstr) }   \
>> +  .stab.excl 0 : { *(.stab.excl) }   \
>> +  .stab.exclstr 0 : { *(.stab.exclstr) } \
>> +  .stab.index 0 : { *(.stab.index) } \
>> +  .stab.indexstr 0 : { *(.stab.indexstr) }
>> +
>> +/*
>> + * Required sections not related to debugging.
> 
> Nit: Perhaps better "Required ELF sections ..."? Personally I'd also
> drop the mentioning of debugging - that's not really relevant here.
> I'm also unsure about "Required" - .comment isn't really required.
> IOW ideally simply "ELF sections" or "Sections to be retained in ELF
> binaries" or some such.
> 
ELF sections is ok for me.

> Jan
> 

I will push updated series soon.

Cheers,
Michal



[PATCH v5 6/6] xen/cpupool: Allow cpupool0 to use different scheduler

2022-04-05 Thread Luca Fancellu
Currently cpupool0 can use only the default scheduler, and
cpupool_create has a hardcoded behavior when creating the pool 0
that doesn't allocate new memory for the scheduler, but uses the
default scheduler structure in memory.

With this commit it is possible to allocate a different scheduler for
the cpupool0 when using the boot time cpupool.
To achieve this the hardcoded behavior in cpupool_create is removed
and the cpupool0 creation is moved.

When compiling without boot time cpupools enabled, the current
behavior is maintained (except that cpupool0 scheduler memory will be
allocated).

Signed-off-by: Luca Fancellu 
---
Changes in v5:
- no changes
Changes in v4:
- no changes
Changes in v3:
- fix typo in commit message (Juergen)
- rebase changes
Changes in v2:
- new patch
---
 xen/common/boot_cpupools.c | 5 -
 xen/common/sched/cpupool.c | 8 +---
 xen/include/xen/sched.h| 5 -
 3 files changed, 9 insertions(+), 9 deletions(-)

diff --git a/xen/common/boot_cpupools.c b/xen/common/boot_cpupools.c
index 1f940330a62d..a56baf3329a9 100644
--- a/xen/common/boot_cpupools.c
+++ b/xen/common/boot_cpupools.c
@@ -201,8 +201,11 @@ void __init btcpupools_allocate_pools(void)
 if ( add_extra_cpupool )
 next_pool_id++;
 
+/* Keep track of cpupool id 0 with the global cpupool0 */
+cpupool0 = cpupool_create_pool(0, pool_sched_map[0]);
+
 /* Create cpupools with selected schedulers */
-for ( i = 0; i < next_pool_id; i++ )
+for ( i = 1; i < next_pool_id; i++ )
 cpupool_create_pool(i, pool_sched_map[i]);
 }
 
diff --git a/xen/common/sched/cpupool.c b/xen/common/sched/cpupool.c
index 86a175f99cd5..83112f5f04d3 100644
--- a/xen/common/sched/cpupool.c
+++ b/xen/common/sched/cpupool.c
@@ -312,10 +312,7 @@ static struct cpupool *cpupool_create(unsigned int poolid,
 c->cpupool_id = q->cpupool_id + 1;
 }
 
-if ( poolid == 0 )
-c->sched = scheduler_get_default();
-else
-c->sched = scheduler_alloc(sched_id);
+c->sched = scheduler_alloc(sched_id);
 if ( IS_ERR(c->sched) )
 {
 ret = PTR_ERR(c->sched);
@@ -1242,9 +1239,6 @@ static int __init cf_check cpupool_init(void)
 
 cpupool_hypfs_init();
 
-cpupool0 = cpupool_create(0, 0);
-BUG_ON(IS_ERR(cpupool0));
-cpupool_put(cpupool0);
 register_cpu_notifier(_nfb);
 
 btcpupools_dtb_parse();
diff --git a/xen/include/xen/sched.h b/xen/include/xen/sched.h
index b62315ad5e5d..e8f31758c058 100644
--- a/xen/include/xen/sched.h
+++ b/xen/include/xen/sched.h
@@ -1185,7 +1185,10 @@ void btcpupools_dtb_parse(void);
 int btcpupools_get_domain_pool_id(const struct dt_device_node *node);
 
 #else /* !CONFIG_BOOT_TIME_CPUPOOLS */
-static inline void btcpupools_allocate_pools(void) {}
+static inline void btcpupools_allocate_pools(void)
+{
+cpupool0 = cpupool_create_pool(0, -1);
+}
 static inline void btcpupools_dtb_parse(void) {}
 static inline unsigned int btcpupools_get_cpupool_id(unsigned int cpu)
 {
-- 
2.17.1




[PATCH v5 3/6] xen/sched: retrieve scheduler id by name

2022-04-05 Thread Luca Fancellu
Add a static function to retrieve the scheduler pointer using the
scheduler name.

Add a public function to retrieve the scheduler id by the scheduler
name that makes use of the new static function.

Take the occasion to replace open coded scheduler search with the
new static function in scheduler_init.

Signed-off-by: Luca Fancellu 
Reviewed-by: Juergen Gross 
---
Changes in v5:
- no changes
Changes in v4:
- no changes
Changes in v3:
- add R-by
Changes in v2:
- replace open coded scheduler search in scheduler_init (Juergen)
---
 xen/common/sched/core.c | 40 ++--
 xen/include/xen/sched.h | 11 +++
 2 files changed, 37 insertions(+), 14 deletions(-)

diff --git a/xen/common/sched/core.c b/xen/common/sched/core.c
index 19ab67818106..48ee01420fb8 100644
--- a/xen/common/sched/core.c
+++ b/xen/common/sched/core.c
@@ -2947,10 +2947,30 @@ void scheduler_enable(void)
 scheduler_active = true;
 }
 
+static inline
+const struct scheduler *__init sched_get_by_name(const char *sched_name)
+{
+unsigned int i;
+
+for ( i = 0; i < NUM_SCHEDULERS; i++ )
+if ( schedulers[i] && !strcmp(schedulers[i]->opt_name, sched_name) )
+return schedulers[i];
+
+return NULL;
+}
+
+int __init sched_get_id_by_name(const char *sched_name)
+{
+const struct scheduler *scheduler = sched_get_by_name(sched_name);
+
+return scheduler ? scheduler->sched_id : -1;
+}
+
 /* Initialise the data structures. */
 void __init scheduler_init(void)
 {
 struct domain *idle_domain;
+const struct scheduler *scheduler;
 int i;
 
 scheduler_enable();
@@ -2981,25 +3001,17 @@ void __init scheduler_init(void)
schedulers[i]->opt_name);
 schedulers[i] = NULL;
 }
-
-if ( schedulers[i] && !ops.name &&
- !strcmp(schedulers[i]->opt_name, opt_sched) )
-ops = *schedulers[i];
 }
 
-if ( !ops.name )
+scheduler = sched_get_by_name(opt_sched);
+if ( !scheduler )
 {
 printk("Could not find scheduler: %s\n", opt_sched);
-for ( i = 0; i < NUM_SCHEDULERS; i++ )
-if ( schedulers[i] &&
- !strcmp(schedulers[i]->opt_name, CONFIG_SCHED_DEFAULT) )
-{
-ops = *schedulers[i];
-break;
-}
-BUG_ON(!ops.name);
-printk("Using '%s' (%s)\n", ops.name, ops.opt_name);
+scheduler = sched_get_by_name(CONFIG_SCHED_DEFAULT);
+BUG_ON(!scheduler);
+printk("Using '%s' (%s)\n", scheduler->name, scheduler->opt_name);
 }
+ops = *scheduler;
 
 if ( cpu_schedule_up(0) )
 BUG();
diff --git a/xen/include/xen/sched.h b/xen/include/xen/sched.h
index b07717987434..b527f141a1d3 100644
--- a/xen/include/xen/sched.h
+++ b/xen/include/xen/sched.h
@@ -758,6 +758,17 @@ void sched_destroy_domain(struct domain *d);
 long sched_adjust(struct domain *, struct xen_domctl_scheduler_op *);
 long sched_adjust_global(struct xen_sysctl_scheduler_op *);
 int  sched_id(void);
+
+/*
+ * sched_get_id_by_name - retrieves a scheduler id given a scheduler name
+ * @sched_name: scheduler name as a string
+ *
+ * returns:
+ * positive value being the scheduler id, on success
+ * negative value if the scheduler name is not found.
+ */
+int sched_get_id_by_name(const char *sched_name);
+
 void vcpu_wake(struct vcpu *v);
 long vcpu_yield(void);
 void vcpu_sleep_nosync(struct vcpu *v);
-- 
2.17.1




[PATCH v5 5/6] arm/dom0less: assign dom0less guests to cpupools

2022-04-05 Thread Luca Fancellu
Introduce domain-cpupool property of a xen,domain device tree node,
that specifies the cpupool device tree handle of a xen,cpupool node
that identifies a cpupool created at boot time where the guest will
be assigned on creation.

Add member to the xen_domctl_createdomain public interface so the
XEN_DOMCTL_INTERFACE_VERSION version is bumped.

Add public function to retrieve a pool id from the device tree
cpupool node.

Update documentation about the property.

Signed-off-by: Luca Fancellu 
Reviewed-by: Stefano Stabellini 
---
Changes in v5:
- no changes
Changes in v4:
- no changes
- add R-by
Changes in v3:
- Use explicitly sized integer for struct xen_domctl_createdomain
  cpupool_id member. (Stefano)
- Changed code due to previous commit code changes
Changes in v2:
- Moved cpupool_id from arch specific to common part (Juergen)
- Implemented functions to retrieve the cpupool id from the
  cpupool dtb node.
---
 docs/misc/arm/device-tree/booting.txt |  5 +
 xen/arch/arm/domain_build.c   | 14 +-
 xen/common/boot_cpupools.c| 24 
 xen/common/domain.c   |  2 +-
 xen/include/public/domctl.h   |  4 +++-
 xen/include/xen/sched.h   |  9 +
 6 files changed, 55 insertions(+), 3 deletions(-)

diff --git a/docs/misc/arm/device-tree/booting.txt b/docs/misc/arm/device-tree/booting.txt
index a94125394e35..7b4a29a2c293 100644
--- a/docs/misc/arm/device-tree/booting.txt
+++ b/docs/misc/arm/device-tree/booting.txt
@@ -188,6 +188,11 @@ with the following properties:
     An empty property to request the memory of the domain to be
     direct-map (guest physical address == physical address).
 
+- domain-cpupool
+
+    Optional. Handle to a xen,cpupool device tree node that identifies the
+    cpupool where the guest will be started at boot.
+
 Under the "xen,domain" compatible node, one or more sub-nodes are present
 for the DomU kernel and ramdisk.
 
diff --git a/xen/arch/arm/domain_build.c b/xen/arch/arm/domain_build.c
index 8be01678de05..9c67a483d4a4 100644
--- a/xen/arch/arm/domain_build.c
+++ b/xen/arch/arm/domain_build.c
@@ -3172,7 +3172,8 @@ static int __init construct_domU(struct domain *d,
 void __init create_domUs(void)
 {
     struct dt_device_node *node;
-    const struct dt_device_node *chosen = dt_find_node_by_path("/chosen");
+    const struct dt_device_node *cpupool_node,
+                                *chosen = dt_find_node_by_path("/chosen");
 
     BUG_ON(chosen == NULL);
     dt_for_each_child_node(chosen, node)
@@ -3241,6 +3242,17 @@ void __init create_domUs(void)
                                          vpl011_virq - 32 + 1);
         }
 
+        /* Get the optional property domain-cpupool */
+        cpupool_node = dt_parse_phandle(node, "domain-cpupool", 0);
+        if ( cpupool_node )
+        {
+            int pool_id = btcpupools_get_domain_pool_id(cpupool_node);
+            if ( pool_id < 0 )
+                panic("Error getting cpupool id from domain-cpupool (%d)\n",
+                      pool_id);
+            d_cfg.cpupool_id = pool_id;
+        }
+
         /*
          * The variable max_init_domid is initialized with zero, so here it's
          * very important to use the pre-increment operator to call
diff --git a/xen/common/boot_cpupools.c b/xen/common/boot_cpupools.c
index 97c321386879..1f940330a62d 100644
--- a/xen/common/boot_cpupools.c
+++ b/xen/common/boot_cpupools.c
@@ -21,6 +21,8 @@ static unsigned int __initdata next_pool_id;
 
 #define BTCPUPOOLS_DT_NODE_NO_REG (-1)
 #define BTCPUPOOLS_DT_NODE_NO_LOG_CPU (-2)
+#define BTCPUPOOLS_DT_WRONG_NODE  (-3)
+#define BTCPUPOOLS_DT_CORRUPTED_NODE  (-4)
 
 static int __init get_logical_cpu_from_hw_id(unsigned int hwid)
 {
@@ -55,6 +57,28 @@ get_logical_cpu_from_cpu_node(const struct dt_device_node *cpu_node)
     return cpu_num;
 }
 
+int __init btcpupools_get_domain_pool_id(const struct dt_device_node *node)
+{
+    const struct dt_device_node *phandle_node;
+    int cpu_num;
+
+    if ( !dt_device_is_compatible(node, "xen,cpupool") )
+        return BTCPUPOOLS_DT_WRONG_NODE;
+    /*
+     * Get first cpu listed in the cpupool, from its reg it's possible to
+     * retrieve the cpupool id.
+     */
+    phandle_node = dt_parse_phandle(node, "cpupool-cpus", 0);
+    if ( !phandle_node )
+        return BTCPUPOOLS_DT_CORRUPTED_NODE;
+
+    cpu_num = get_logical_cpu_from_cpu_node(phandle_node);
+    if ( cpu_num < 0 )
+        return cpu_num;
+
+    return pool_cpu_map[cpu_num];
+}
+
 static int __init check_and_get_sched_id(const char* scheduler_name)
 {
     int sched_id = sched_get_id_by_name(scheduler_name);
diff --git a/xen/common/domain.c b/xen/common/domain.c
index 351029f8b239..0827400f4f49 100644
--- a/xen/common/domain.c
+++ b/xen/common/domain.c
@@ -698,7 +698,7 @@ struct domain *domain_create(domid_t domid,
 if ( !d->pbuf )
 goto fail;
 
-if ( (err = sched_init_domain(d, 0)) != 0 )
+ 

[PATCH v5 1/6] tools/cpupools: Give a name to unnamed cpupools

2022-04-05 Thread Luca Fancellu
With the introduction of boot time cpupools, Xen can create many
different cpupools at boot time other than cpupool with id 0.

Since these newly created cpupools can't have an
entry in Xenstore, create the entry using xen-init-dom0
helper with the usual convention: Pool-<poolid>.

Given the change, remove the check for poolid == 0 from
libxl_cpupoolid_to_name(...).
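
For illustration only, the naming convention can be sketched in standalone C
as below. This is not the patch code: format_pool_entry() is a hypothetical
helper, and it uses fixed caller-provided buffers rather than asprintf() to
stay self-contained.

```c
#include <stdio.h>

/*
 * Hypothetical helper: build the Xenstore path and the default name for a
 * cpupool id, following the "/local/pool/<poolid>/name" and "Pool-<poolid>"
 * convention. Returns 0 on success, -1 if either buffer is too small.
 */
int format_pool_entry(unsigned int pool_id,
                      char *path, size_t path_len,
                      char *name, size_t name_len)
{
    int n;

    n = snprintf(path, path_len, "/local/pool/%u/name", pool_id);
    if (n < 0 || (size_t)n >= path_len)
        return -1;

    n = snprintf(name, name_len, "Pool-%u", pool_id);
    if (n < 0 || (size_t)n >= name_len)
        return -1;

    return 0;
}
```

With every pool given such an entry by xen-init-dom0, the toolstack no longer
needs a special case for pool id 0 when reading the name back.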

Signed-off-by: Luca Fancellu 
Reviewed-by: Juergen Gross 
---
Changes in v5:
- no changes
Changes in v4:
- no changes
Changes in v3:
- no changes, add R-by
Changes in v2:
 - Remove unused variable, moved xc_cpupool_infofree
   ahead to simplify the code, use asprintf (Juergen)
---
 tools/helpers/xen-init-dom0.c  | 35 +-
 tools/libs/light/libxl_utils.c |  3 +--
 2 files changed, 35 insertions(+), 3 deletions(-)

diff --git a/tools/helpers/xen-init-dom0.c b/tools/helpers/xen-init-dom0.c
index c99224a4b607..84286617790f 100644
--- a/tools/helpers/xen-init-dom0.c
+++ b/tools/helpers/xen-init-dom0.c
@@ -43,7 +43,9 @@ int main(int argc, char **argv)
     int rc;
     struct xs_handle *xsh = NULL;
     xc_interface *xch = NULL;
-    char *domname_string = NULL, *domid_string = NULL;
+    char *domname_string = NULL, *domid_string = NULL, *pool_path, *pool_name;
+    xc_cpupoolinfo_t *xcinfo;
+    unsigned int pool_id = 0;
     libxl_uuid uuid;
 
     /* Accept 0 or 1 argument */
@@ -114,6 +116,37 @@ int main(int argc, char **argv)
         goto out;
     }
 
+    /* Create an entry in xenstore for each cpupool on the system */
+    do {
+        xcinfo = xc_cpupool_getinfo(xch, pool_id);
+        if (xcinfo != NULL) {
+            if (xcinfo->cpupool_id != pool_id)
+                pool_id = xcinfo->cpupool_id;
+            xc_cpupool_infofree(xch, xcinfo);
+            if (asprintf(&pool_path, "/local/pool/%d/name", pool_id) <= 0) {
+                fprintf(stderr, "cannot allocate memory for pool path\n");
+                rc = 1;
+                goto out;
+            }
+            if (asprintf(&pool_name, "Pool-%d", pool_id) <= 0) {
+                fprintf(stderr, "cannot allocate memory for pool name\n");
+                rc = 1;
+                goto out_err;
+            }
+            pool_id++;
+            if (!xs_write(xsh, XBT_NULL, pool_path, pool_name,
+                          strlen(pool_name))) {
+                fprintf(stderr, "cannot set pool name\n");
+                rc = 1;
+            }
+            free(pool_name);
+out_err:
+            free(pool_path);
+            if ( rc )
+                goto out;
+        }
+    } while(xcinfo != NULL);
+
     printf("Done setting up Dom0\n");
 
 out:
diff --git a/tools/libs/light/libxl_utils.c b/tools/libs/light/libxl_utils.c
index b91c2cafa223..81780da3ff40 100644
--- a/tools/libs/light/libxl_utils.c
+++ b/tools/libs/light/libxl_utils.c
@@ -151,8 +151,7 @@ char *libxl_cpupoolid_to_name(libxl_ctx *ctx, uint32_t poolid)
 
     snprintf(path, sizeof(path), "/local/pool/%d/name", poolid);
     s = xs_read(ctx->xsh, XBT_NULL, path, &len);
-    if (!s && (poolid == 0))
-        return strdup("Pool-0");
+
     return s;
 }
 
-- 
2.17.1




[PATCH v5 4/6] xen/cpupool: Create different cpupools at boot time

2022-04-05 Thread Luca Fancellu
Introduce a way to create different cpupools at boot time. This is
particularly useful on ARM big.LITTLE systems, where there might be the
need to have different cpupools for each type of core, but systems using
NUMA can also have different cpupools for each node.

The feature on arm relies on a specification of the cpupools from the
device tree to build pools and assign cpus to them.

Documentation is created to explain the feature.

Signed-off-by: Luca Fancellu 
---
Changes in v5:
- Fixed wrong variable name, swapped schedulers, add scheduler info
  in the printk (Stefano)
- introduce assert in cpupool_init and btcpupools_get_cpupool_id to
  harden the code
Changes in v4:
- modify Makefile to put in *.init.o, fixed stubs and macro (Jan)
- fixed docs, fix brackets (Stefano)
- keep cpu0 in Pool-0 (Julien)
- moved printk from btcpupools_allocate_pools to
  btcpupools_get_cpupool_id
- Add to docs constraint about cpu0 and Pool-0
Changes in v3:
- Add newline to cpupools.txt and removed "default n" from Kconfig (Jan)
- Fixed comment, moved defines, used global cpu_online_map, use
  HAS_DEVICE_TREE instead of ARM and place arch specific code in header
  (Juergen)
- Fix brackets, x86 code only panic, get rid of scheduler dt node, don't
  save pool pointer and look for it from the pool list (Stefano)
- Changed data structures to allow modification to the code.
Changes in v2:
- Move feature to common code (Juergen)
- Try to decouple dtb parse and cpupool creation to allow
  more way to specify cpupools (for example command line)
- Created standalone dt node for the scheduler so it can
  be used in future work to set scheduler specific
  parameters
- Use only auto generated ids for cpupools
---
 docs/misc/arm/device-tree/cpupools.txt | 136 +
 xen/arch/arm/include/asm/smp.h |   3 +
 xen/common/Kconfig |   7 +
 xen/common/Makefile|   1 +
 xen/common/boot_cpupools.c | 203 +
 xen/common/sched/cpupool.c |  12 +-
 xen/include/xen/sched.h|  14 ++
 7 files changed, 375 insertions(+), 1 deletion(-)
 create mode 100644 docs/misc/arm/device-tree/cpupools.txt
 create mode 100644 xen/common/boot_cpupools.c

diff --git a/docs/misc/arm/device-tree/cpupools.txt b/docs/misc/arm/device-tree/cpupools.txt
new file mode 100644
index ..5dac2b1384e0
--- /dev/null
+++ b/docs/misc/arm/device-tree/cpupools.txt
@@ -0,0 +1,136 @@
+Boot time cpupools
+==================
+
+When BOOT_TIME_CPUPOOLS is enabled in the Xen configuration, it is possible to
+create cpupools during boot phase by specifying them in the device tree.
+
+Cpupool specification nodes shall be direct children of the /chosen node.
+Each cpupool node contains the following properties:
+
+- compatible (mandatory)
+
+    Must always include the compatibility string: "xen,cpupool".
+
+- cpupool-cpus (mandatory)
+
+    Must be a list of device tree phandles to nodes describing cpus (e.g. having
+    device_type = "cpu"); it can't be empty.
+
+- cpupool-sched (optional)
+
+    Must be a string having the name of a Xen scheduler. Check the sched=<...>
+    boot argument for allowed values.
+
+
+Constraints
+===========
+
+If no cpupools are specified, all cpus will be assigned to one cpupool
+implicitly created (Pool-0).
+
+If cpupool nodes are specified, but not every cpu brought up by Xen is assigned,
+all the unassigned cpus will be assigned to an additional cpupool.
+
+If a cpu is assigned to a cpupool, but it's not brought up correctly, Xen will
+stop.
+
+The boot cpu must be assigned to Pool-0, so the cpupool containing that core
+will become Pool-0 automatically.
+
+
+Examples
+========
+
+For a system having two types of core, the following device tree specification
+will instruct Xen to have two cpupools:
+
+- The cpupool with id 0 will have 4 cpus assigned.
+- The cpupool with id 1 will have 2 cpus assigned.
+
+The following example can work only if hmp-unsafe=1 is passed to Xen boot
+arguments, otherwise not all cores will be brought up by Xen and the cpupool
+creation process will stop Xen.
+
+
+    a72_1: cpu@0 {
+        compatible = "arm,cortex-a72";
+        reg = <0x0 0x0>;
+        device_type = "cpu";
+        [...]
+    };
+
+    a72_2: cpu@1 {
+        compatible = "arm,cortex-a72";
+        reg = <0x0 0x1>;
+        device_type = "cpu";
+        [...]
+    };
+
+    a53_1: cpu@100 {
+        compatible = "arm,cortex-a53";
+        reg = <0x0 0x100>;
+        device_type = "cpu";
+        [...]
+    };
+
+    a53_2: cpu@101 {
+        compatible = "arm,cortex-a53";
+        reg = <0x0 0x101>;
+        device_type = "cpu";
+        [...]
+    };
+
+    a53_3: cpu@102 {
+        compatible = "arm,cortex-a53";
+        reg = <0x0 0x102>;
+        device_type = "cpu";
+        [...]
+    };
+
+    a53_4: cpu@103 {
+        compatible = "arm,cortex-a53";
+        reg = <0x0 0x103>;
+        device_type = "cpu";
+        [...]
+    };
+
+    chosen {
+
+        cpupool_a {
+            compatible = 

[PATCH v5 2/6] xen/sched: create public function for cpupools creation

2022-04-05 Thread Luca Fancellu
Create a new public function to create cpupools. It takes as a parameter
the scheduler id, or a negative value meaning that the default Xen
scheduler will be used.
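
A tiny standalone sketch of that convention, for illustration only:
resolve_sched_id() and DEFAULT_SCHED_ID are hypothetical names, and the value
5 is an arbitrary placeholder for whatever scheduler_get_default() would
report in the hypervisor.

```c
/* Hypothetical placeholder for the id of Xen's default scheduler. */
#define DEFAULT_SCHED_ID 5

/*
 * Return the scheduler id to use for a new pool: a negative request means
 * "use the default scheduler", any other value is taken as is.
 */
int resolve_sched_id(int sched_id)
{
    return sched_id < 0 ? DEFAULT_SCHED_ID : sched_id;
}
```

Using a negative sentinel keeps the parameter a plain int and lets callers
that could not find a scheduler by name fall back to the default.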

Signed-off-by: Luca Fancellu 
---
Changes in v5:
- no changes
Changes in v4:
- no changes
Changes in v3:
- Fixed comment (Andrew)
Changes in v2:
- cpupool_create_pool doesn't check anymore for pool id uniqueness
  before calling cpupool_create. Modified commit message accordingly
---
 xen/common/sched/cpupool.c | 15 +++
 xen/include/xen/sched.h| 16 
 2 files changed, 31 insertions(+)

diff --git a/xen/common/sched/cpupool.c b/xen/common/sched/cpupool.c
index a6da4970506a..89a891af7076 100644
--- a/xen/common/sched/cpupool.c
+++ b/xen/common/sched/cpupool.c
@@ -1219,6 +1219,21 @@ static void cpupool_hypfs_init(void)
 
 #endif /* CONFIG_HYPFS */
 
+struct cpupool *__init cpupool_create_pool(unsigned int pool_id, int sched_id)
+{
+    struct cpupool *pool;
+
+    if ( sched_id < 0 )
+        sched_id = scheduler_get_default()->sched_id;
+
+    pool = cpupool_create(pool_id, sched_id);
+
+    BUG_ON(IS_ERR(pool));
+    cpupool_put(pool);
+
+    return pool;
+}
+
 static int __init cf_check cpupool_init(void)
 {
     unsigned int cpu;
diff --git a/xen/include/xen/sched.h b/xen/include/xen/sched.h
index 406d9bc610a4..b07717987434 100644
--- a/xen/include/xen/sched.h
+++ b/xen/include/xen/sched.h
@@ -1147,6 +1147,22 @@ int cpupool_move_domain(struct domain *d, struct cpupool *c);
 int cpupool_do_sysctl(struct xen_sysctl_cpupool_op *op);
 unsigned int cpupool_get_id(const struct domain *d);
 const cpumask_t *cpupool_valid_cpus(const struct cpupool *pool);
+
+/*
+ * cpupool_create_pool - Creates a cpupool
+ * @pool_id: id of the pool to be created
+ * @sched_id: id of the scheduler to be used for the pool
+ *
+ * Creates a cpupool with pool_id id.
+ * The sched_id parameter identifies the scheduler to be used, if it is
+ * negative, the default scheduler of Xen will be used.
+ *
+ * returns:
+ * pointer to the struct cpupool just created, or Xen will panic in case of
+ * error
+ */
+struct cpupool *cpupool_create_pool(unsigned int pool_id, int sched_id);
+
 extern void cf_check dump_runq(unsigned char key);
 
 void arch_do_physinfo(struct xen_sysctl_physinfo *pi);
-- 
2.17.1




[PATCH v5 0/6] Boot time cpupools

2022-04-05 Thread Luca Fancellu
This series introduces a feature for Xen to create cpu pools at boot time. The
feature is enabled using a Kconfig option that is disabled by default.
The boot time cpupool feature relies on the device tree to describe the cpu
pools.
The series also introduces the possibility to assign a dom0less guest to a
cpupool at boot time.

Here follows an example, where Xen is built with CONFIG_BOOT_TIME_CPUPOOLS=y.

From the DT:

  [...]

  a72_0: cpu@0 {
    compatible = "arm,cortex-a72";
    reg = <0x0 0x0>;
    device_type = "cpu";
    [...]
  };

  a72_1: cpu@1 {
    compatible = "arm,cortex-a72";
    reg = <0x0 0x1>;
    device_type = "cpu";
    [...]
  };

  a53_0: cpu@100 {
    compatible = "arm,cortex-a53";
    reg = <0x0 0x100>;
    device_type = "cpu";
    [...]
  };

  a53_1: cpu@101 {
    compatible = "arm,cortex-a53";
    reg = <0x0 0x101>;
    device_type = "cpu";
    [...]
  };

  a53_2: cpu@102 {
    compatible = "arm,cortex-a53";
    reg = <0x0 0x102>;
    device_type = "cpu";
    [...]
  };

  a53_3: cpu@103 {
    compatible = "arm,cortex-a53";
    reg = <0x0 0x103>;
    device_type = "cpu";
    [...]
  };

  chosen {
    #size-cells = <0x1>;
    #address-cells = <0x1>;
    xen,dom0-bootargs = "...";
    xen,xen-bootargs = "...";

    cpupool0 {
      compatible = "xen,cpupool";
      cpupool-cpus = <&a72_0 &a72_1>;
      cpupool-sched = "credit2";
    };

    cp1: cpupool1 {
      compatible = "xen,cpupool";
      cpupool-cpus = <&a53_0 &a53_1 &a53_2 &a53_3>;
    };

    module@0 {
      reg = <0x8008 0x130>;
      compatible = "multiboot,module";
    };

    domU1 {
      #size-cells = <0x1>;
      #address-cells = <0x1>;
      compatible = "xen,domain";
      cpus = <1>;
      memory = <0 0xC>;
      vpl011;
      domain-cpupool = <&cp1>;

      module@9200 {
        compatible = "multiboot,kernel", "multiboot,module";
        reg = <0x9200 0x1ff>;
        bootargs = "...";
      };
    };
  };

  [...]

The example DT instructs Xen to have two cpu pools: the one with id 0 has two
physical cpus and the one with id 1 has four physical cpus. The second cpu
pool uses the null scheduler, and from the /chosen node we can see that a
dom0less guest will be started on that cpu pool.

In this particular case Xen must boot with different types of cpus, so the
boot argument hmp-unsafe must be enabled.

Luca Fancellu (6):
  tools/cpupools: Give a name to unnamed cpupools
  xen/sched: create public function for cpupools creation
  xen/sched: retrieve scheduler id by name
  xen/cpupool: Create different cpupools at boot time
  arm/dom0less: assign dom0less guests to cpupools
  xen/cpupool: Allow cpupool0 to use different scheduler

 docs/misc/arm/device-tree/booting.txt  |   5 +
 docs/misc/arm/device-tree/cpupools.txt | 136 +++
 tools/helpers/xen-init-dom0.c  |  35 +++-
 tools/libs/light/libxl_utils.c |   3 +-
 xen/arch/arm/domain_build.c|  14 +-
 xen/arch/arm/include/asm/smp.h |   3 +
 xen/common/Kconfig |   7 +
 xen/common/Makefile|   1 +
 xen/common/boot_cpupools.c | 230 +
 xen/common/domain.c|   2 +-
 xen/common/sched/core.c|  40 +++--
 xen/common/sched/cpupool.c |  35 +++-
 xen/include/public/domctl.h|   4 +-
 xen/include/xen/sched.h|  53 ++
 14 files changed, 540 insertions(+), 28 deletions(-)
 create mode 100644 docs/misc/arm/device-tree/cpupools.txt
 create mode 100644 xen/common/boot_cpupools.c

-- 
2.17.1




Re: [PATCH v3 2/2] xen: Populate xen.lds.h and make use of its macros

2022-04-05 Thread Jan Beulich
On 31.03.2022 09:14, Michal Orzel wrote:
> --- a/xen/include/xen/xen.lds.h
> +++ b/xen/include/xen/xen.lds.h
> @@ -5,4 +5,133 @@
>   * Common macros to be used in architecture specific linker scripts.
>   */
>  
> +/*
> + * To avoid any confusion, please note that the EFI macro does not correspond
> + * to EFI support and is used when linking a native EFI (i.e. PE/COFF) binary,
> + * hence its usage in this header.
> + */
> +
> +/* Macros to declare debug sections. */
> +#ifdef EFI
> +/*
> + * Use the NOLOAD directive, despite currently ignored by (at least) GNU ld
> + * for PE output, in order to record that we'd prefer these sections to not
> + * be loaded into memory.
> + */
> +#define DECL_DEBUG(x, a) #x ALIGN(a) (NOLOAD) : { *(x) }
> +#define DECL_DEBUG2(x, y, a) #x ALIGN(a) (NOLOAD) : { *(x) *(y) }
> +#else
> +#define DECL_DEBUG(x, a) #x 0 : { *(x) }
> +#define DECL_DEBUG2(x, y, a) #x 0 : { *(x) *(y) }
> +#endif
> +
> +/*
> + * DWARF2+ debug sections.
> + * Explicitly list debug sections, first of all to avoid these sections being
> + * viewed as "orphan" by the linker.
> + *
> + * For the PE output this is further necessary so that they don't end up at
> + * VA 0, which is below image base and thus invalid. Note that this macro is
> + * to be used after _end, so if these sections get loaded they'll be discarded
> + * at runtime anyway.
> + */
> +#define DWARF2_DEBUG_SECTIONS \
> +  DECL_DEBUG(.debug_abbrev, 1)\
> +  DECL_DEBUG2(.debug_info, .gnu.linkonce.wi.*, 1) \
> +  DECL_DEBUG(.debug_types, 1) \
> +  DECL_DEBUG(.debug_str, 1)   \
> +  DECL_DEBUG2(.debug_line, .debug_line.*, 1)  \
> +  DECL_DEBUG(.debug_line_str, 1)  \
> +  DECL_DEBUG(.debug_names, 4) \
> +  DECL_DEBUG(.debug_frame, 4) \
> +  DECL_DEBUG(.debug_loc, 1)   \
> +  DECL_DEBUG(.debug_loclists, 4)  \
> +  DECL_DEBUG(.debug_macinfo, 1)   \
> +  DECL_DEBUG(.debug_macro, 1) \
> +  DECL_DEBUG(.debug_ranges, 8)\

Here and ...

> +  DECL_DEBUG(.debug_rnglists, 4)  \
> +  DECL_DEBUG(.debug_addr, 8)  \

... here I think you also want to switch to POINTER_ALIGN.

> +  DECL_DEBUG(.debug_aranges, 1)   \
> +  DECL_DEBUG(.debug_pubnames, 1)  \
> +  DECL_DEBUG(.debug_pubtypes, 1)
> +
> +/* Stabs debug sections. */
> +#define STABS_DEBUG_SECTIONS \
> +  .stab 0 : { *(.stab) } \
> +  .stabstr 0 : { *(.stabstr) }   \
> +  .stab.excl 0 : { *(.stab.excl) }   \
> +  .stab.exclstr 0 : { *(.stab.exclstr) } \
> +  .stab.index 0 : { *(.stab.index) } \
> +  .stab.indexstr 0 : { *(.stab.indexstr) }
> +
> +/*
> + * Required sections not related to debugging.

Nit: Perhaps better "Required ELF sections ..."? Personally I'd also
drop the mentioning of debugging - that's not really relevant here.
I'm also unsure about "Required" - .comment isn't really required.
IOW ideally simply "ELF sections" or "Sections to be retained in ELF
binaries" or some such.

Jan




Re: [PATCH v4 1/8] x86/boot: make "vga=current" work with graphics modes

2022-04-05 Thread Roger Pau Monné
On Thu, Mar 31, 2022 at 11:44:10AM +0200, Jan Beulich wrote:
> GrUB2 can be told to leave the screen in the graphics mode it has been
> using (or any other one), via "set gfxpayload=keep" (or suitable
> variants thereof). In this case we can avoid doing another mode switch
> ourselves. This in particular avoids possibly setting the screen to a
> less desirable mode: On one of my test systems the set of modes
> reported available by the VESA BIOS depends on whether the interposed
> KVM switch has that machine set as the active one. If it's not active,
> only modes up to 1024x768 get reported, while when active 1280x1024
> modes are also included. For things to always work with an explicitly
> specified mode (via the "vga=" option), that mode therefore needs be a
> 1024x768 one.
> 
> For some reason this only works for me with "multiboot2" (and
> "module2"); "multiboot" (and "module") still forces the screen into text
> mode, despite my reading of the sources suggesting otherwise.
> 
> For starters I'm limiting this to graphics modes; I do think this ought
> to also work for text modes, but
> - I can't tell whether GrUB2 can set any text mode other than 80x25
>   (I've only found plain "text" to be valid as a "gfxpayload" setting),
> - I'm uncertain whether supporting that is worth it, since I'm uncertain
>   how many people would be running their systems/screens in text mode,
> - I'd like to limit the amount of code added to the realmode trampoline.
> 
> For starters I'm also limiting mode information retrieval to raw BIOS
> accesses. This will allow things to work (in principle) also with other
> boot environments where a graphics mode can be left in place. The
> downside is that this then still is dependent upon switching back to
> real mode, so retrieving the needed information from multiboot info is
> likely going to be desirable down the road.
> 
> Signed-off-by: Jan Beulich 

Acked-by: Roger Pau Monné 

Thanks, Roger.



Re: [PATCH v4 1/8] x86/boot: make "vga=current" work with graphics modes

2022-04-05 Thread Jan Beulich
On 05.04.2022 10:24, Roger Pau Monné wrote:
> On Mon, Apr 04, 2022 at 05:50:57PM +0200, Jan Beulich wrote:
>> (reducing Cc list some)
>>
>> On 04.04.2022 16:49, Roger Pau Monné wrote:
>>> On Thu, Mar 31, 2022 at 11:44:10AM +0200, Jan Beulich wrote:
>>>> GrUB2 can be told to leave the screen in the graphics mode it has been
>>>> using (or any other one), via "set gfxpayload=keep" (or suitable
>>>> variants thereof). In this case we can avoid doing another mode switch
>>>> ourselves. This in particular avoids possibly setting the screen to a
>>>> less desirable mode: On one of my test systems the set of modes
>>>> reported available by the VESA BIOS depends on whether the interposed
>>>> KVM switch has that machine set as the active one. If it's not active,
>>>> only modes up to 1024x768 get reported, while when active 1280x1024
>>>> modes are also included. For things to always work with an explicitly
>>>> specified mode (via the "vga=" option), that mode therefore needs be a
>>>> 1024x768 one.
> 
> So this patch helps you by not having to set a mode and just relying
> on the mode set by GrUB?

Yes, but it goes beyond that: The modes offered by VESA on the particular
system don't include the higher resolution one under certain circumstances,
so I cannot tell Xen to switch to that mode. By not having to tell Xen a
specific mode (but rather inherit that set / left active by the boot
loader), I can leverage the better mode in most cases, but things will
still work if I turn on (or reset) the system with another machine being
the presently selected one at the KVM switch.

But yes, beyond the particular quirk on this system the benefit is one
less mode switch and hence less screen flickering and slightly faster
boot.

>>>> --- a/xen/arch/x86/boot/video.S
>>>> +++ b/xen/arch/x86/boot/video.S
>>>> @@ -575,7 +575,6 @@ set14:  movw    $0x, %ax
>>>>          movb    $0x01, %ah              # Define cursor scan lines 11-12
>>>>          movw    $0x0b0c, %cx
>>>>          int     $0x10
>>>> -set_current:
>>>>          stc
>>>>          ret
>>>>  
>>>> @@ -693,6 +692,39 @@ vga_modes:
>>>>          .word   VIDEO_80x60, 0x50,0x3c,0        # 80x60
>>>>  vga_modes_end:
>>>>  
>>>> +# If the current mode is a VESA graphics one, obtain its parameters.
>>>> +set_current:
>>>> +        leaw    vesa_glob_info, %di
>>>> +        movw    $0x4f00, %ax
>>>> +        int     $0x10
>>>> +        cmpw    $0x004f, %ax
>>>> +        jne     .Lsetc_done
>>>
>>> You don't seem to make use of the information fetched here? I guess
>>> this is somehow required to access the other functions?
>>
>> See the similar logic at check_vesa. The information is used later, by
>> mode_params (half way into mopar_gr). Quite likely this could be done
>> just in a single place, but that would require some restructuring of
>> the code, which I'd like to avoid doing here.
> 
> I didn't realize check_vesa and set_current were mutually
> exclusive.
> 
>>>> +        movw    $0x4f03, %ax
>>>
>>> It would help readability to have defines for those values, ie:
>>> VESA_GET_CURRENT_MODE or some such (not that you need to do it here,
>>> just a comment).
>>
>> Right - this applies to all of our BIOS interfacing code, I guess.
>>
>>>> +        int     $0x10
>>>> +        cmpw    $0x004f, %ax
>>>> +        jne     .Lsetc_done
>>>> +
>>>> +        leaw    vesa_mode_info, %di     # Get mode information structure
>>>> +        movw    %bx, %cx
>>>> +        movw    $0x4f01, %ax
>>>> +        int     $0x10
>>>> +        cmpw    $0x004f, %ax
>>>> +        jne     .Lsetc_done
>>>> +
>>>> +        movb    (%di), %al              # Check mode attributes
>>>> +        andb    $0x9b, %al
>>>> +        cmpb    $0x9b, %al
>>>
>>> So you also check that the reserved D1 bit is set to 1 as mandated by
>>> the spec. This is slightly different than what's done in check_vesa,
>>> would you mind adding a define for this an unifying with check_vesa?
>>
>> Well, see the v2 changelog comment. I'm somewhat hesitant to do that
>> here; I'd prefer to consolidate this in a separate patch.
> 
> Sorry, didn't notice that v2 comment before.
> 
> It's my understanding that the main difference this patch introduces
> is that set_current now fetches the currently set mode, so that we
> avoid further mode changes if the mode set already matches the
> selected one, or if Xen is to use the already set mode?

Not exactly: You either tell Xen to use the current mode ("vga=current")
or you tell Xen to use a specific mode ("vga="). Checking whether
the present mode is the (specific) one Xen was told to switch to would
require yet more work. But skipping a requested mode switch can also
have unintended consequences, so I wouldn't even be certain we would
want to go such a route.

Jan




Re: [PATCH 2/2] arch: ensure idle domain is not left privileged

2022-04-05 Thread Jan Beulich
On 31.03.2022 01:05, Daniel P. Smith wrote:
> --- a/xen/arch/x86/setup.c
> +++ b/xen/arch/x86/setup.c
> @@ -589,6 +589,9 @@ static void noinline init_done(void)
>  void *va;
>  unsigned long start, end;
>  
> +/* Ensure idle domain was not left privileged */
> +ASSERT(current->domain->is_privileged == false) ;

I think this should be stronger than ASSERT(); I'd recommend calling
panic(). Also please don't compare against "true" or "false" - use
ordinary boolean operations instead (here it would be
"!current->domain->is_privileged").
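
To make the shape of that check concrete, here is a minimal standalone sketch
(not hypervisor code): struct domain is reduced to the one field that matters,
and the hypothetical check_idle_unprivileged() returns -1 at the point where
Xen would call panic().

```c
#include <stdbool.h>

/* Reduced stand-in for Xen's struct domain, for illustration only. */
struct domain {
    bool is_privileged;
};

/*
 * Returns 0 when the domain is not privileged and -1 otherwise; in the
 * hypervisor the failure branch would be a panic() call rather than a
 * return value.
 */
int check_idle_unprivileged(const struct domain *d)
{
    if ( d->is_privileged )   /* plain boolean test, no "== false" */
        return -1;

    return 0;
}
```

Unlike ASSERT(), a check of this form stays active in release builds, which
is the point of asking for something stronger.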

Jan




Re: [PATCH v4 1/8] x86/boot: make "vga=current" work with graphics modes

2022-04-05 Thread Roger Pau Monné
On Mon, Apr 04, 2022 at 05:50:57PM +0200, Jan Beulich wrote:
> (reducing Cc list some)
> 
> On 04.04.2022 16:49, Roger Pau Monné wrote:
> > On Thu, Mar 31, 2022 at 11:44:10AM +0200, Jan Beulich wrote:
> >> GrUB2 can be told to leave the screen in the graphics mode it has been
> >> using (or any other one), via "set gfxpayload=keep" (or suitable
> >> variants thereof). In this case we can avoid doing another mode switch
> >> ourselves. This in particular avoids possibly setting the screen to a
> >> less desirable mode: On one of my test systems the set of modes
> >> reported available by the VESA BIOS depends on whether the interposed
> >> KVM switch has that machine set as the active one. If it's not active,
> >> only modes up to 1024x768 get reported, while when active 1280x1024
> >> modes are also included. For things to always work with an explicitly
> >> specified mode (via the "vga=" option), that mode therefore needs be a
> >> 1024x768 one.

So this patch helps you by not having to set a mode and just relying
on the mode set by GrUB?

> >>
> >> For some reason this only works for me with "multiboot2" (and
> >> "module2"); "multiboot" (and "module") still forces the screen into text
> >> mode, despite my reading of the sources suggesting otherwise.
> >>
> >> For starters I'm limiting this to graphics modes; I do think this ought
> >> to also work for text modes, but
> >> - I can't tell whether GrUB2 can set any text mode other than 80x25
> >>   (I've only found plain "text" to be valid as a "gfxpayload" setting),
> >> - I'm uncertain whether supporting that is worth it, since I'm uncertain
> >>   how many people would be running their systems/screens in text mode,
> >> - I'd like to limit the amount of code added to the realmode trampoline.
> >>
> >> For starters I'm also limiting mode information retrieval to raw BIOS
> >> accesses. This will allow things to work (in principle) also with other
> >> boot environments where a graphics mode can be left in place. The
> >> downside is that this then still is dependent upon switching back to
> >> real mode, so retrieving the needed information from multiboot info is
> >> likely going to be desirable down the road.
> > 
> > I'm unsure, what's the benefit from retrieving this information from
> > the VESA blob rather than from multiboot(2) structures?
> 
> As said - it allows things to work even when that data isn't provided.
> Note also how I say "for starters" - patch 2 adds logic to retrieve
> the information from MB.
> 
> > Is it because we require a VESA mode to be set before we parse the
> > multiboot information?
> 
> No, I don't think so.
> 
> >> --- a/xen/arch/x86/boot/video.S
> >> +++ b/xen/arch/x86/boot/video.S
> >> @@ -575,7 +575,6 @@ set14:  movw    $0x, %ax
> >>          movb    $0x01, %ah              # Define cursor scan lines 11-12
> >>          movw    $0x0b0c, %cx
> >>          int     $0x10
> >> -set_current:
> >>          stc
> >>          ret
> >>  
> >> @@ -693,6 +692,39 @@ vga_modes:
> >>          .word   VIDEO_80x60, 0x50,0x3c,0        # 80x60
> >>  vga_modes_end:
> >>  
> >> +# If the current mode is a VESA graphics one, obtain its parameters.
> >> +set_current:
> >> +        leaw    vesa_glob_info, %di
> >> +        movw    $0x4f00, %ax
> >> +        int     $0x10
> >> +        cmpw    $0x004f, %ax
> >> +        jne     .Lsetc_done
> > 
> > You don't seem to make use of the information fetched here? I guess
> > this is somehow required to access the other functions?
> 
> See the similar logic at check_vesa. The information is used later, by
> mode_params (half way into mopar_gr). Quite likely this could be done
> just in a single place, but that would require some restructuring of
> the code, which I'd like to avoid doing here.

I didn't realize check_vesa and set_current were mutually
exclusive.

> >> +        movw    $0x4f03, %ax
> > 
> > It would help readability to have defines for those values, ie:
> > VESA_GET_CURRENT_MODE or some such (not that you need to do it here,
> > just a comment).
> 
> Right - this applies to all of our BIOS interfacing code, I guess.
> 
> >> +        int     $0x10
> >> +        cmpw    $0x004f, %ax
> >> +        jne     .Lsetc_done
> >> +
> >> +        leaw    vesa_mode_info, %di     # Get mode information structure
> >> +        movw    %bx, %cx
> >> +        movw    $0x4f01, %ax
> >> +        int     $0x10
> >> +        cmpw    $0x004f, %ax
> >> +        jne     .Lsetc_done
> >> +
> >> +        movb    (%di), %al              # Check mode attributes
> >> +        andb    $0x9b, %al
> >> +        cmpb    $0x9b, %al
> > 
> > So you also check that the reserved D1 bit is set to 1 as mandated by
> > the spec. This is slightly different than what's done in check_vesa,
> > would you mind adding a define for this an unifying with check_vesa?
> 
> Well, see the v2 changelog comment. I'm somewhat hesitant to do that
> here; I'd prefer to consolidate this in a separate patch.

Sorry, didn't notice 

[linux-linus test] 169164: regressions - FAIL

2022-04-05 Thread osstest service owner
flight 169164 linux-linus real [real]
http://logs.test-lab.xenproject.org/osstest/logs/169164/

Regressions :-(

Tests which did not succeed and are blocking,
including tests which could not be run:
 build-arm64-pvops 6 kernel-build   fail in 169157 REGR. vs. 169145

Tests which are failing intermittently (not blocking):
 test-armhf-armhf-libvirt-raw 10 host-ping-check-xen fail pass in 169157

Tests which did not succeed, but are not blocking:
 test-arm64-arm64-xl   1 build-check(1)   blocked in 169157 n/a
 test-arm64-arm64-xl-thunderx  1 build-check(1)   blocked in 169157 n/a
 test-arm64-arm64-examine  1 build-check(1)   blocked in 169157 n/a
 test-arm64-arm64-libvirt-raw  1 build-check(1)   blocked in 169157 n/a
 test-arm64-arm64-xl-seattle   1 build-check(1)   blocked in 169157 n/a
 test-arm64-arm64-libvirt-xsm  1 build-check(1)   blocked in 169157 n/a
 test-arm64-arm64-xl-credit2   1 build-check(1)   blocked in 169157 n/a
 test-arm64-arm64-xl-xsm   1 build-check(1)   blocked in 169157 n/a
 test-arm64-arm64-xl-credit1   1 build-check(1)   blocked in 169157 n/a
 test-arm64-arm64-xl-vhd   1 build-check(1)   blocked in 169157 n/a
 test-armhf-armhf-libvirt-raw 15 saverestore-support-check fail in 169157 blocked in 169145
 test-armhf-armhf-libvirt-raw 14 migrate-support-check fail in 169157 never pass
 test-amd64-amd64-xl-qemut-win7-amd64 19 guest-stop fail like 169145
 test-armhf-armhf-libvirt 16 saverestore-support-check fail like 169145
 test-amd64-amd64-xl-qemuu-ws16-amd64 19 guest-stop fail like 169145
 test-amd64-amd64-qemuu-nested-amd 20 debian-hvm-install/l1/l2 fail like 169145
 test-amd64-amd64-xl-qemut-ws16-amd64 19 guest-stop fail like 169145
 test-amd64-amd64-xl-qemuu-win7-amd64 19 guest-stop fail like 169145
 test-armhf-armhf-libvirt-qcow2 15 saverestore-support-check fail like 169145
 test-amd64-amd64-libvirt 15 migrate-support-check fail never pass
 test-arm64-arm64-xl-seattle 15 migrate-support-check fail never pass
 test-arm64-arm64-xl-seattle 16 saverestore-support-check fail never pass
 test-amd64-amd64-libvirt-xsm 15 migrate-support-check fail never pass
 test-arm64-arm64-xl 15 migrate-support-check fail never pass
 test-arm64-arm64-xl 16 saverestore-support-check fail never pass
 test-arm64-arm64-xl-xsm 15 migrate-support-check fail never pass
 test-arm64-arm64-xl-xsm 16 saverestore-support-check fail never pass
 test-arm64-arm64-xl-credit2 15 migrate-support-check fail never pass
 test-arm64-arm64-xl-credit2 16 saverestore-support-check fail never pass
 test-arm64-arm64-xl-credit1 15 migrate-support-check fail never pass
 test-arm64-arm64-xl-thunderx 15 migrate-support-check fail never pass
 test-arm64-arm64-xl-credit1 16 saverestore-support-check fail never pass
 test-arm64-arm64-xl-thunderx 16 saverestore-support-check fail never pass
 test-arm64-arm64-libvirt-xsm 15 migrate-support-check fail never pass
 test-arm64-arm64-libvirt-xsm 16 saverestore-support-check fail never pass
 test-amd64-amd64-libvirt-qemuu-debianhvm-amd64-xsm 13 migrate-support-check fail never pass
 test-amd64-amd64-libvirt-qcow2 14 migrate-support-check fail never pass
 test-armhf-armhf-xl-arndale 15 migrate-support-check fail never pass
 test-armhf-armhf-xl-arndale 16 saverestore-support-check fail never pass
 test-amd64-amd64-libvirt-raw 14 migrate-support-check fail never pass
 test-arm64-arm64-libvirt-raw 14 migrate-support-check fail never pass
 test-arm64-arm64-libvirt-raw 15 saverestore-support-check fail never pass
 test-arm64-arm64-xl-vhd 14 migrate-support-check fail never pass
 test-arm64-arm64-xl-vhd 15 saverestore-support-check fail never pass
 test-armhf-armhf-xl-credit2 15 migrate-support-check fail never pass
 test-armhf-armhf-xl-credit2 16 saverestore-support-check fail never pass
 test-armhf-armhf-xl 15 migrate-support-check fail never pass
 test-armhf-armhf-xl 16 saverestore-support-check fail never pass
 test-armhf-armhf-libvirt 15 migrate-support-check fail never pass
 test-armhf-armhf-xl-cubietruck 15 migrate-support-check fail never pass
 test-armhf-armhf-xl-cubietruck 16 saverestore-support-check fail never pass
 test-armhf-armhf-xl-rtds 15 migrate-support-check fail never pass
 test-armhf-armhf-xl-rtds 16 saverestore-support-check fail never pass
 test-armhf-armhf-xl-vhd 14 migrate-support-check fail never pass
 test-armhf-armhf-xl-vhd 15 saverestore-support-check fail never pass
 test-armhf-armhf-libvirt-qcow2 14 migrate-support-check fail never pass
 

Re: [PATCH] x86/irq: Skip unmap_domain_pirq XSM during destruction

2022-04-05 Thread Jan Beulich
On 30.03.2022 20:17, Jason Andryuk wrote:
> xsm_unmap_domain_irq was seen denying unmap_domain_pirq when called from
> complete_domain_destroy as an RCU callback.  The source context was an
> unexpected, random domain.  Since this is a xen-internal operation,
> we don't want the XSM hook denying the operation.
> 
> Check d->is_dying and skip the check when the domain is dead.  The RCU
> callback runs when a domain is in that state.

One question which has always been puzzling me (perhaps to Daniel): While
I can see why mapping of an IRQ needs to be subject to an XSM check, it's
not really clear to me why unmapping would need to be, at least as long
as it's the domain itself which requests the unmap (and which I would
view to extend to the domain being cleaned up). But maybe that's why it's
XSM_HOOK ...

> ---
> Dan wants to change current to point at DOMID_IDLE when the RCU callback
> runs.  I think Juergen's commit 53594c7bd197 "rcu: don't use
> stop_machine_run() for rcu_barrier()" may have changed this since it
> mentions stop_machine_run scheduled the idle vcpus to run the callbacks
> for the old code.
> 
> Would that be as easy as changing rcu_do_batch() to do:
> 
> +/* Run as "Xen" not a random domain's vcpu. */
> +vcpu = get_current();
> +set_current(idle_vcpu[smp_processor_id()]);
>  list->func(list);
> +set_current(vcpu);
> 
> or is using set_current() only acceptable as part of context_switch?

Indeed I would question any uses outside of context_switch() (and
system bringup).

> --- a/xen/arch/x86/irq.c
> +++ b/xen/arch/x86/irq.c
> @@ -2340,10 +2340,14 @@ int unmap_domain_pirq(struct domain *d, int pirq)
>  nr = msi_desc->msi.nvec;
>  }
>  
> -ret = xsm_unmap_domain_irq(XSM_HOOK, d, irq,
> -   msi_desc ? msi_desc->dev : NULL);
> -if ( ret )
> -goto done;
> +/* When called by complete_domain_destroy via RCU, current is a random
> + * domain.  Skip the XSM check since this is a Xen-initiated action. */

Comment style.

> +if ( d->is_dying != DOMDYING_dead ) {

Please use !d->is_dying. Also please correct the placement of the brace.
Or you could avoid the need for a brace by leveraging that ret is zero
ahead of this if(), i.e. ...

> +ret = xsm_unmap_domain_irq(XSM_HOOK, d, irq,
> +   msi_desc ? msi_desc->dev : NULL);
> +if ( ret )
> +goto done;
> +}


    if ( !d->is_dying )
        ret = xsm_unmap_domain_irq(XSM_HOOK, d, irq,
                                   msi_desc ? msi_desc->dev : NULL);
    if ( ret )
        goto done;

Jan
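
To illustrate the control flow of the brace-free variant suggested above, here is a small self-contained sketch. It is not Xen code: struct domain is reduced to a single field, and xsm_unmap_stub() and unmap_sketch() are hypothetical stand-ins for the real hook and for unmap_domain_pirq(). The stub always denies, so that skipping it for a dying domain becomes observable.

```c
/* Stand-in for Xen's struct domain; a hypothetical simplification. */
struct domain {
    int is_dying;   /* 0 while alive, non-zero once destruction has begun */
};

static int xsm_calls;   /* counts hook invocations, for illustration only */

/* Stub for the XSM hook: always denies, and records that it was consulted. */
static int xsm_unmap_stub(const struct domain *d)
{
    (void)d;
    xsm_calls++;
    return -1;
}

static int unmap_sketch(const struct domain *d)
{
    int ret = 0;

    /* ret is already zero here, so no braces and no else branch are needed. */
    if ( !d->is_dying )
        ret = xsm_unmap_stub(d);
    if ( ret )
        goto done;

    /* ... the actual unmapping work would go here ... */

 done:
    return ret;
}
```

For a live domain the stub is consulted and its denial is propagated; for a dying domain the hook is never called and the function proceeds to the unmapping work with ret still zero.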




Re: [PATCH v4 4/4] x86/time: use fake read_tsc()

2022-04-05 Thread Roger Pau Monné
On Mon, Apr 04, 2022 at 05:33:04PM +0200, Jan Beulich wrote:
> On 04.04.2022 15:22, Roger Pau Monné wrote:
> > On Thu, Mar 31, 2022 at 11:31:38AM +0200, Jan Beulich wrote:
> >> Go a step further than bed9ae54df44 ("x86/time: switch platform timer
> >> hooks to altcall") did and eliminate the "real" read_tsc() altogether:
> >> It's not used except in pointer comparisons, and hence it looks overall
> >> more safe to simply poison plt_tsc's read_counter hook.
> >>
> >> Signed-off-by: Jan Beulich 
> >> ---
> >> I wasn't really sure whether it would be better to use simply void * for
> >> the type of the expression, resulting in an undesirable data -> function
> >> pointer conversion, but making it impossible to mistakenly try and call
> >> the (fake) function directly.
> > 
> > I think it's slightly better to avoid being able to call the function,
> > hence using void * would be my preference. What's wrong with the data
> > -> function pointer conversion for the comparisons?
> 
> There's no data -> function pointer conversion for the comparisons; the
> situation there is even less pleasant. What I referred to was actually
> the initializer, where there would be a data -> function pointer
> conversion if I used void *.

I see, there are architectures with different sizes for function and
data pointers. It's also not clear all compilers will be happy with
the conversion.

> >> ---
> >> v2: Comment wording.
> >>
> >> --- a/xen/arch/x86/time.c
> >> +++ b/xen/arch/x86/time.c
> >> @@ -607,10 +607,12 @@ static s64 __init cf_check init_tsc(stru
> >>  return ret;
> >>  }
> >>  
> >> -static uint64_t __init cf_check read_tsc(void)
> >> -{
> >> -return rdtsc_ordered();
> >> -}
> >> +/*
> >> + * plt_tsc's read_counter hook is not (and should not be) invoked via the
> >> + * struct field. To avoid carrying an unused, indirectly reachable function,
> >> + * poison the field with an easily identifiable non-canonical pointer.
> >> + */
> >> +#define read_tsc ((uint64_t(*)(void))0x75C75C75C75C75C0ul)
> > 
> > Instead of naming this like a suitable function, I would rather use
> > READ_TSC_PTR_POISON or some such.
> 
> I'll be happy to name it something like this; the primary thing to
> settle on is the type to use.

I think it's safer to use a function pointer type like you currently
have from a correctness PoV, but in order to prevent stray calls to
read_tsc() I would rename to READ_TSC_PTR_POISON. This was already
static, so I guess it's hard anyway for any of such direct calls to
appear without us realizing.

With that:

Reviewed-by: Roger Pau Monné 

Thanks, Roger.
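
As a rough illustration of the approach discussed in this thread, the sketch below poisons a hook with a function-pointer-typed constant and only ever compares against it. READ_TSC_PTR_POISON and the constant come from the discussion; the struct layout, the read_counter_fn typedef, and the plt_tsc_sketch, read_dummy() and uses_tsc() names are assumptions made for the example. Converting an integer to a function pointer is implementation-defined in C, so this relies on compiler behaviour rather than on the standard.

```c
#include <stdbool.h>
#include <stdint.h>

typedef uint64_t (*read_counter_fn)(void);

/* Function-pointer-typed poison: comparable, but never meant to be called. */
#define READ_TSC_PTR_POISON \
    ((read_counter_fn)(uintptr_t)0x75C75C75C75C75C0ull)

/* Reduced stand-in for the platform timer descriptor (hypothetical layout). */
struct platform_timesource {
    const char *name;
    read_counter_fn read_counter;
};

/* An ordinary hook, to contrast with the poisoned one. */
static uint64_t read_dummy(void)
{
    return 42;
}

static const struct platform_timesource plt_tsc_sketch = {
    .name = "tsc",
    .read_counter = READ_TSC_PTR_POISON,
};

/* The hook is used purely as an identity tag in pointer comparisons. */
static bool uses_tsc(const struct platform_timesource *plt)
{
    return plt->read_counter == READ_TSC_PTR_POISON;
}
```

Since the macro no longer looks like a function name, an accidental direct call reads as obviously wrong at the call site, while the initializer and the comparisons stay free of data-to-function pointer conversions.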



Re: [PATCH 1/2] xsm: add ability to elevate a domain to privileged

2022-04-05 Thread Roger Pau Monné
On Mon, Apr 04, 2022 at 12:08:25PM -0400, Daniel P. Smith wrote:
> On 4/4/22 11:12, Roger Pau Monné wrote:
> > On Mon, Apr 04, 2022 at 10:21:18AM -0400, Daniel P. Smith wrote:
> >> On 3/31/22 08:36, Roger Pau Monné wrote:
> >>> On Wed, Mar 30, 2022 at 07:05:48PM -0400, Daniel P. Smith wrote:
> >>>> diff --git a/xen/include/xsm/xsm.h b/xen/include/xsm/xsm.h
> >>>> index e22d6160b5..157e57151e 100644
> >>>> --- a/xen/include/xsm/xsm.h
> >>>> +++ b/xen/include/xsm/xsm.h
> >>>> @@ -189,6 +189,28 @@ struct xsm_operations {
> >>>>  #endif
> >>>>  };
> >>>>  
> >>>> +static always_inline int xsm_elevate_priv(struct domain *d)
> >>>
> >>> I don't think it needs to be always_inline, using just inline would be
> >>> fine IMO.
> >>>
> >>> Also this needs to be __init.
> >>
> >> AIUI always_inline is likely the best way to preserve the speculation
> >> safety brought in by the call to is_system_domain().
> > 
> > There's nothing related to speculation safety in is_system_domain()
> > AFAICT. It's just a plain check against d->domain_id. It's my
> > understanding there's no need for any speculation barrier there
> > because d->domain_id is not an external input.
> 
> Hmmm, this actually raises a good question. Why do is_control_domain(),
> is_hardware_domain(), and others all have evaluate_nospec() wrapping the
> check of a struct domain element while is_system_domain() does not?

Jan replied in this regard, see:

https://lore.kernel.org/xen-devel/54272d08-7ce1-b162-c8e9-1955b780c...@suse.com/

> > In any case this function should be __init only, at which point there
> > are no untrusted inputs to Xen.
> 
> I thought it was agreed that __init on inline functions in headers had
> no meaning?

In a different reply I already noted my preference would be for the
function to not reside in a header and not be inline, simply because
it would be gone after initialization and we won't have to worry about
any stray calls when the system is active.

Thanks, Roger.