[Xen-devel] [xen-unstable baseline-only test] 68385: tolerable trouble: blocked/broken

2017-01-17 Thread Platform Team regression test user
This run is configured for baseline tests only.

flight 68385 xen-unstable real [real]
http://osstest.xs.citrite.net/~osstest/testlogs/logs/68385/

Failures :-/ but no regressions.

Regressions which are regarded as allowable (not blocking):
 build-armhf   3 host-install(3)   broken baseline untested
 build-armhf-xsm   3 host-install(3)   broken baseline untested
 build-armhf-pvops 3 host-install(3)   broken baseline untested
 build-i386-oldkern    3 host-install(3)   broken baseline untested
 build-i386-pvops  3 host-install(3)   broken baseline untested
 build-i386-xsm    3 host-install(3)   broken baseline untested
 build-i386-prev   3 host-install(3)   broken baseline untested
 build-amd64-xsm   3 host-install(3)   broken baseline untested
 build-amd64-oldkern   3 host-install(3)   broken baseline untested
 build-amd64-xtf   3 host-install(3)   broken baseline untested
 build-amd64-pvops 3 host-install(3)   broken baseline untested
 build-i386    3 host-install(3)   broken baseline untested
 build-amd64-prev  3 host-install(3)   broken baseline untested
 build-amd64   3 host-install(3)   broken baseline untested

Tests which did not succeed, but are not blocking:
 test-amd64-amd64-xl-qemuu-debianhvm-amd64  1 build-check(1)blocked n/a
 test-amd64-i386-freebsd10-i386  1 build-check(1)   blocked  n/a
 test-amd64-amd64-qemuu-nested-intel  1 build-check(1)  blocked n/a
 test-amd64-i386-libvirt-qemuu-debianhvm-amd64-xsm 1 build-check(1) blocked n/a
 test-xtf-amd64-amd64-1   1 build-check(1)   blocked  n/a
 test-armhf-armhf-xl-midway   1 build-check(1)   blocked  n/a
 test-armhf-armhf-libvirt  1 build-check(1)   blocked  n/a
 test-amd64-i386-xl-qemuu-debianhvm-amd64-xsm  1 build-check(1) blocked n/a
 test-amd64-i386-xl-qemut-debianhvm-amd64-xsm  1 build-check(1) blocked n/a
 test-amd64-amd64-migrupgrade  1 build-check(1)   blocked  n/a
 test-amd64-amd64-xl-qemut-stubdom-debianhvm-amd64-xsm 1 build-check(1) blocked n/a
 test-amd64-i386-xl-qemuu-winxpsp3-vcpus1  1 build-check(1) blocked n/a
 test-armhf-armhf-libvirt-raw  1 build-check(1)   blocked  n/a
 test-amd64-i386-libvirt-xsm   1 build-check(1)   blocked  n/a
 test-amd64-amd64-xl-multivcpu  1 build-check(1)   blocked  n/a
 test-amd64-i386-xl-qemuu-winxpsp3  1 build-check(1)   blocked  n/a
 test-amd64-amd64-libvirt  1 build-check(1)   blocked  n/a
 test-amd64-i386-xl-qemut-winxpsp3  1 build-check(1)   blocked  n/a
 test-amd64-amd64-xl-pvh-amd   1 build-check(1)   blocked  n/a
 test-amd64-i386-xl-qemut-debianhvm-amd64  1 build-check(1) blocked n/a
 test-amd64-i386-qemut-rhel6hvm-intel  1 build-check(1) blocked n/a
 test-amd64-i386-freebsd10-amd64  1 build-check(1)   blocked  n/a
 test-amd64-amd64-pair 1 build-check(1)   blocked  n/a
 test-armhf-armhf-xl-credit2   1 build-check(1)   blocked  n/a
 build-i386-rumprun   1 build-check(1)   blocked  n/a
 test-amd64-amd64-xl-qemuu-win7-amd64  1 build-check(1) blocked n/a
 test-amd64-amd64-libvirt-qemuu-debianhvm-amd64-xsm 1 build-check(1) blocked n/a
 test-amd64-amd64-pygrub   1 build-check(1)   blocked  n/a
 test-amd64-amd64-xl-qemuu-winxpsp3  1 build-check(1)   blocked n/a
 test-amd64-amd64-xl-qcow2 1 build-check(1)   blocked  n/a
 test-amd64-amd64-amd64-pvgrub  1 build-check(1)   blocked  n/a
 test-xtf-amd64-amd64-2   1 build-check(1)   blocked  n/a
 test-amd64-amd64-xl-qemut-win7-amd64  1 build-check(1) blocked n/a
 test-amd64-i386-xl-qemuu-debianhvm-amd64  1 build-check(1) blocked n/a
 test-armhf-armhf-libvirt-xsm  1 build-check(1)   blocked  n/a
 build-amd64-rumprun   1 build-check(1)   blocked  n/a
 build-i386-libvirt   1 build-check(1)   blocked  n/a
 test-amd64-i386-xl   1 build-check(1)   blocked  n/a
 test-amd64-i386-libvirt-pair  1 build-check(1)   blocked  n/a
 test-xtf-amd64-amd64-4   1 build-check(1)   blocked  n/a
 test-amd64-amd64-xl-qemuu-ovmf-amd64  1 build-check(1) blocked n/a
 test-amd64-amd64-libvirt-vhd  1 build-check(1)   blocked  n/a
 test-amd64-amd64-xl-credit2   1 build-check(1)   blocked  n/a
 test-armhf-armhf-xl-multivcpu  1 build-check(1)   blocked  n/a
 test-amd64-i386-xl-xsm   1 build-check(1)   blocked  n/a
 build-amd64-libvirt   1 build-check(1)   blocked  n/a
 

[Xen-devel] [qemu-mainline test] 104227: regressions - trouble: broken/fail/pass

2017-01-17 Thread osstest service owner
flight 104227 qemu-mainline real [real]
http://logs.test-lab.xenproject.org/osstest/logs/104227/

Regressions :-(

Tests which did not succeed and are blocking,
including tests which could not be run:
 test-armhf-armhf-xl   3 host-install(3)broken REGR. vs. 104208
 test-amd64-amd64-xl-qcow2 9 debian-di-installfail REGR. vs. 104208
 test-armhf-armhf-libvirt-qcow2 14 guest-start/debian.repeat fail REGR. vs. 104208

Regressions which are regarded as allowable (not blocking):
 test-armhf-armhf-xl-rtds 11 guest-start  fail REGR. vs. 104208
 test-armhf-armhf-libvirt-xsm 13 saverestore-support-checkfail  like 104208
 test-armhf-armhf-libvirt 13 saverestore-support-checkfail  like 104208
 test-amd64-amd64-xl-qemuu-win7-amd64 16 guest-stopfail like 104208
 test-amd64-i386-xl-qemuu-win7-amd64 16 guest-stop fail like 104208
 test-armhf-armhf-libvirt-qcow2 12 saverestore-support-check   fail like 104208
 test-armhf-armhf-libvirt-raw 12 saverestore-support-checkfail  like 104208
 test-amd64-amd64-xl-rtds  9 debian-install   fail  like 104208

Tests which did not succeed, but are not blocking:
 test-amd64-amd64-xl-pvh-amd  11 guest-start  fail   never pass
 test-amd64-amd64-xl-pvh-intel 11 guest-start  fail  never pass
 test-amd64-i386-libvirt-xsm  12 migrate-support-checkfail   never pass
 test-amd64-i386-libvirt  12 migrate-support-checkfail   never pass
 test-amd64-amd64-libvirt-xsm 12 migrate-support-checkfail   never pass
 test-amd64-amd64-libvirt 12 migrate-support-checkfail   never pass
 test-amd64-i386-libvirt-qemuu-debianhvm-amd64-xsm 10 migrate-support-check fail never pass
 test-amd64-amd64-libvirt-qemuu-debianhvm-amd64-xsm 10 migrate-support-check fail never pass
 test-amd64-amd64-libvirt-vhd 11 migrate-support-checkfail   never pass
 test-amd64-amd64-qemuu-nested-amd 16 debian-hvm-install/l1/l2  fail never pass
 test-armhf-armhf-xl-xsm  12 migrate-support-checkfail   never pass
 test-armhf-armhf-xl-xsm  13 saverestore-support-checkfail   never pass
 test-armhf-armhf-libvirt-xsm 12 migrate-support-checkfail   never pass
 test-armhf-armhf-libvirt 12 migrate-support-checkfail   never pass
 test-armhf-armhf-xl-cubietruck 12 migrate-support-checkfail never pass
 test-armhf-armhf-xl-cubietruck 13 saverestore-support-checkfail never pass
 test-armhf-armhf-xl-arndale  12 migrate-support-checkfail   never pass
 test-armhf-armhf-xl-arndale  13 saverestore-support-checkfail   never pass
 test-armhf-armhf-xl-multivcpu 12 migrate-support-checkfail  never pass
 test-armhf-armhf-xl-multivcpu 13 saverestore-support-checkfail  never pass
 test-armhf-armhf-xl-credit2  12 migrate-support-checkfail   never pass
 test-armhf-armhf-xl-credit2  13 saverestore-support-checkfail   never pass
 test-armhf-armhf-libvirt-qcow2 11 migrate-support-checkfail never pass
 test-armhf-armhf-libvirt-raw 11 migrate-support-checkfail   never pass
 test-armhf-armhf-xl-vhd  11 migrate-support-checkfail   never pass
 test-armhf-armhf-xl-vhd  12 saverestore-support-checkfail   never pass

version targeted for testing:
 qemuu23eb9e6b6d5315171cc15969bbc755f258004df0
baseline version:
 qemuua8c611e1133f97c979922f41103f79309339dc27

Last test of basis   104208  2017-01-17 12:43:29 Z   0 days
Testing same since   104227  2017-01-17 20:43:59 Z   0 days   1 attempts


People who touched revisions under test:
  Marc-André Lureau 
  Markus Armbruster 
  Paolo Bonzini 
  Peter Maydell 
  Stefan Hajnoczi 

jobs:
 build-amd64-xsm  pass
 build-armhf-xsm  pass
 build-i386-xsm   pass
 build-amd64  pass
 build-armhf  pass
 build-i386   pass
 build-amd64-libvirt  pass
 build-armhf-libvirt  pass
 build-i386-libvirt   pass
 build-amd64-pvops    pass
 build-armhf-pvops    pass
 build-i386-pvops pass
 test-amd64-amd64-xl  pass
 test-armhf-armhf-xl  broken  
 test-amd64-i386-xl   

Re: [Xen-devel] [xen-unstable test] 104131: regressions - FAIL

2017-01-17 Thread Tian, Kevin
> From: Jan Beulich [mailto:jbeul...@suse.com]
> Sent: Monday, January 16, 2017 7:00 PM
> 
> >>> On 16.01.17 at 06:25,  wrote:
> > One thing noted though. The original patch from Quan is actually orthogonal
> > to this ASSERT. Regardless of whether intack.vector is larger or smaller
> > than pt_vector, we always require the trick as long as pt_vector is not the
> > one being currently programmed to RVI.
> 
> I don't think the ASSERT() addition is orthogonal: It exchanges
> intack.vector for pt_vector in the invocation of
> vmx_set_eoi_exit_bitmap(), and during discussion of the patch
> there at least intermediately was max() of the two used instead.
> It was - iirc - one of you who suggested that the use of max()
> there is unnecessary, which the ASSERT() triggering has now
> shown is wrong.

Attached was my earlier comment:

--
> >>> On 20.12.16 at 06:37,  wrote:
> >>  From: Xuquan (Quan Xu) [mailto:xuqu...@huawei.com]
> >> Sent: Friday, December 16, 2016 5:40 PM
> >> -if (pt_vector != -1)
> >> -vmx_set_eoi_exit_bitmap(v, pt_vector);
> >> +if ( pt_vector != -1 ) {
> >> +if ( intack.vector > pt_vector )
> >> +vmx_set_eoi_exit_bitmap(v, intack.vector);
> >> +else
> >> +vmx_set_eoi_exit_bitmap(v, pt_vector);
> >> +}
> >
> > Above can be simplified as one line change:
> > if ( pt_vector != -1 )
> > vmx_set_eoi_exit_bitmap(v, intack.vector);
> 
> Hmm, I don't understand. Did you mean to use max() here? Or
> else how is this an equivalent of the originally proposed code?
> 

The original code is not 100% correct. The purpose is to set the EOI exit
bitmap for any vector which may block injection of pt_vector, to
give pt_vector a chance to be recognized in a future intack and then
have the posted interrupt delivered. The simplified code achieves the same
effect as the original code if intack.vector >= pt_vector. I cannot come up
with a case where intack.vector might be smaller than pt_vector. If that case
happens, we still need to enable the exit bitmap for intack.vector instead of
pt_vector for the said purpose, while the original code got it wrong.

Thanks
Kevin
--

Using intack.vector is always expected here regardless of the
comparison result between intack.vector and pt_vector. The
reason why I was OK adding an ASSERT was simply to test
whether intack.vector can ever be smaller than pt_vector in practice.
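
(As an illustration only, not what was committed, the max()-based variant
that was discussed would look roughly like:

    if ( pt_vector != -1 )
        vmx_set_eoi_exit_bitmap(v, max(intack.vector, pt_vector));

i.e. it covers both orderings of the two vectors.)
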
> > Then do we want to revert the whole
> > commit until the problem is finally fixed, or OK to just remove ASSERT
> > (or replace with WARN_ON with more debug info) to unblock test system
> > before the fix is ready?
> 
> Well, as the VMX maintainer I think the proposal of whether to
> revert or wait should really come from you.
> 
> Jan

Andrew, how long do you usually tolerate a failure case in osstest?
I'm not sure how long it may take for the developer to reproduce this
situation. If it has a blocking impact on your side, I'd suggest
replacing the ASSERT with a more informative warning before the final
root cause is found, if Quan cannot reproduce it in a short time (say
one or two weeks).

Thanks
Kevin

___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel


[Xen-devel] Problems with pci/vga passthrough

2017-01-17 Thread Diederik de Haas
Hi!

I reported/discussed this earlier on IRC, but was asked to report it here too.

The goal I tried to accomplish was getting VGA passthrough working, so that a
VM running KDE Plasma 5 (Debian Stretch) would get hardware acceleration and
OpenGL 2+.
The graphics card I tried it with was an (old) NVidia GeForce 6200 TurboCache.
Even though I already suspected that it would be challenging, which was
confirmed on IRC, I wanted to try and figure out what I should be doing, as
I'm (very) new to Xen.

But as soon as I enabled `gfx_passthru = 1` and `pci = [ '02:00.0' ]`, the
`xl create <cfg>` process kept crashing and the only way I was able to
stop it was by doing `kill <pid>`; the command in question was
"/usr/lib/xen-4.8/bin/xl create -c /etc/xen/tradestation.home.cknow.org.cfg"
(attached as tradestation.xen.cfg).
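
For reference, the two passthrough-related lines in the guest config look
roughly like this (an illustrative excerpt only; the BDF is the one from this
report and the rest of the config is omitted):

gfx_passthru = 1      # make the passed-through card the guest's primary VGA
pci = [ '02:00.0' ]   # PCI BDF of the GeForce 6200 being handed to the guest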

Here's some console output illustrating the crashing:
root@cknowsvr01:/home/diederik# xl create -c /etc/xen/tradestation.home.cknow.org.cfg
Parsing config from /etc/xen/tradestation.home.cknow.org.cfg
libxl: notice: libxl_numa.c:518:libxl__get_numa_candidate: NUMA placement 
failed, performance might be affected
libxl: error: libxl_qmp.c:287:qmp_handle_error_response: received an error 
message from QMP server: Could not set password
root@cknowsvr01:/home/diederik# xl list
NameID   Mem VCPUs  State   Time(s)
Domain-0 0 11277332 r-  80.0
tradestation.home.cknow.org  5 16383 1 --psc-   0.0
root@cknowsvr01:/home/diederik# xl list
NameID   Mem VCPUs  State   Time(s)
Domain-0 0 11277332 r-  86.2
tradestation.home.cknow.org  6 16383 1 ---sc-   0.0
root@cknowsvr01:/home/diederik# xl list
NameID   Mem VCPUs  State   Time(s)
Domain-0 0 11277332 r-  86.3
tradestation.home.cknow.org  6 16383 1 ---sc-   0.0
root@cknowsvr01:/home/diederik# xl list
NameID   Mem VCPUs  State   Time(s)
Domain-0 0 11277332 r-  90.4
root@cknowsvr01:/home/diederik# xl list
NameID   Mem VCPUs  State   Time(s)
Domain-0 0 11277332 r-  92.8
tradestation.home.cknow.org  7 16383 1 ---sc-   0.0
root@cknowsvr01:/home/diederik# xl list
NameID   Mem VCPUs  State   Time(s)
Domain-0 0 11277332 r- 132.1
tradestation.home.cknow.org 13 16383 1 ---sc-   0.0

Trying `xl destroy <domain>` didn't help; `xl pause <domain>` paused the
crashing, but as soon as I unpaused it again the crash loop
continued, and as said before `kill <pid>` was the only way out.

I've also attached the output of `xl info` and `xl dmesg` as that may 
provide some info as well. 
In the `xl dmesg` output you'll notice various crashes as well; those are very
likely due to the VGA card failing at the hardware level, and they disappeared
when I took the card out of the system.

Furthermore, I have also attached a description of the process I went through
when trying to use `xl pci-assignable-add`, which resulted in a
complete system hang. In retrospect it may have been caused by the hardware
failure, but still, getting a complete system hang when executing an `xl`
command isn't nice.
But as I said, I'm a n00b wrt Xen, so I figured I'd better provide too much
info than too little.

Hardware failure aside, I see the continuous loop of the failed VM
creation attempt as a real problem, as I could only stop it by using
`kill <pid>`, and that was before the hardware (really) died.

As it looks like the hardware failure of the vga card even prevented booting 
of the whole system at some point, I have removed it and have no plans to 
put it back in, even for testing purposes. 
But otherwise I'll try to answer any questions to the best of my abilities.
I have ordered an XFX Radeon RX 460 - 4GB GDDR5 (passively cooled) and 
when that arrives I can try to see whether I can reproduce it with that too, 
but it may take a couple of days and it is a completely different card.

Lastly, I subscribed to this list but due to the huge volume, I unsubscribed 
again, so a CC of any response would be preferable.

Cheers,
  Diederik

root@cknowsvr01:/home/diederik# cat /etc/xen/tradestation.home.cknow.org.cfg
#
# Configuration file for the Xen instance tradestation.home.cknow.org, created
# by xen-tools 4.6.2 on Sun Jan  8 12:58:13 2017.
#

#
#  Kernel + memory size
#
kernel  = '/boot/vmlinuz-4.8.0-2-amd64'
extra   = 'elevator=noop xen-fbfront.video=32,1920,1080'
ramdisk = 

Re: [Xen-devel] [PATCH V5] x86/HVM: Introduce struct hvm_pi_ops

2017-01-17 Thread Tian, Kevin


> -Original Message-
> From: Jan Beulich [mailto:jbeul...@suse.com]
> Sent: Tuesday, January 17, 2017 5:30 PM
> To: Suravee Suthikulpanit
> Cc: sherry.hurw...@amd.com; andrew.coop...@citrix.com; Nakajima, Jun; Tian, 
> Kevin;
> xen-devel@lists.xen.org; Boris Ostrovsky; konrad.w...@oracle.com
> Subject: Re: [PATCH V5] x86/HVM: Introduce struct hvm_pi_ops
> 
> >>> On 17.01.17 at 03:35,  wrote:
> > The current function pointers in struct vmx_domain for managing hvm
> > posted interrupt can be used also by SVM AVIC. Therefore, this patch
> > introduces the struct hvm_pi_ops in the struct hvm_domain to hold them.
> >
> > Signed-off-by: Suravee Suthikulpanit 
> 
> Acked-by: Jan Beulich 
> 

Acked-by: Kevin Tian 

___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel


Re: [Xen-devel] PV audio drivers for Linux

2017-01-17 Thread Ughreja, Rakesh A


>-Original Message-
>From: Stefano Stabellini [mailto:sstabell...@kernel.org]
>Sent: Wednesday, January 18, 2017 5:41 AM
>To: Ughreja, Rakesh A 
>Cc: xen-devel@lists.xen.org; oleksandr_andrushche...@epam.com;
>oleksandr_gryt...@epam.com; oleksandr.dmytrys...@globallogic.com;
>iurii.konovale...@globallogic.com; konrad.w...@oracle.com
>Subject: Re: [Xen-devel] PV audio drivers for Linux
>
>On Tue, 17 Jan 2017, Ughreja, Rakesh A wrote:
>> Hi,
>>
>> I am trying to develop PV audio drivers and facing one issue to
>> achieve zero copy of the buffers between Front End (DOM1) and
>> Back End (DOM0) drivers.
>
>You might want to take a look at the existing PV sound proposal:
>
>http://marc.info/?l=xen-devel&m=148094319010445
>
Sure, let me look into this.
Thank you very much for the quick reply and the reference.

>
>> When the buffer is allocated using __get_free_pages() on the DOM0
>> OS, I am able to grant the access using gnttab_grant_foreign_access()
>> to DOM1 as well as I am able to map it in the DOM1 virtual space
>> using xenbus_map_ring_valloc().
>>
>> However the existing audio driver allocates buffer using
>> dma_alloc_coherent(). In that case I am able to grant the access using
>> gnttab_grant_foreign_access() to DOM1 but when I try to map in the
>> DOM1 virtual space using xenbus_map_ring_valloc(), it returns an error.
>>
>> [1] Code returns from here.
>>
>> 507 xenbus_dev_fatal(dev, map[i].status,
>> 508  "mapping in shared page %d from 
>> domain %d",
>> 509  gnt_refs[i], dev->otherend_id);
>>
>> gnttab_batch_map(map, i) is unable to map the page, but I am unable to
>> understand why. May be its due to the difference in the way buffers
>> are allocated dma_alloc_coherent() vs __get_free_pages().
>>
>> Since I don't want to touch existing audio driver, I need to figure out
>> how to map buffer to DOM1 space with dma_alloc_coherent().
>>
>> Any pointers would be really helpful. Thank you in advance.
>
>Pages allocated by dma_alloc_coherent can be a bit special. Are you
>going through the swiotlb-xen
>(drivers/xen/swiotlb-xen.c:xen_swiotlb_alloc_coherent) in Dom0?
>

No, I am not using this. Actually I am trying to reuse the existing
HDA driver and just open the ALSA streams at the kernel level in the
PV backend driver.

Buffers are allocated by the existing HDA driver.
http://lxr.free-electrons.com/source/sound/core/memalloc.c#L83

>I would probably add a few printks to Xen in
>xen/common/grant_table.c:do_grant_table_op to understand what is the
>error exactly.

In the gnttab_retry_eagain_gop function, when it tries to
get the status it always receives GNTST_eagain. After the retries
the status is marked as GNTST_bad_page.

I am unable to figure out what properties of dma_alloc_coherent()-allocated
buffers make them unmappable in Dom1.
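
For reference, a minimal sketch of the __get_free_pages() path that does work
for me, as described above, looks roughly like the following (names such as
NR_PAGES and audio_share_buffer are made up for illustration, and error
unwinding is omitted):

#include <linux/gfp.h>
#include <linux/mm.h>
#include <xen/grant_table.h>
#include <xen/page.h>

#define NR_PAGES 4  /* illustrative buffer size */

/* Grant the frontend domain read/write access to a __get_free_pages() buffer. */
static int audio_share_buffer(domid_t frontend_id, grant_ref_t grefs[NR_PAGES])
{
    unsigned long vaddr = __get_free_pages(GFP_KERNEL | __GFP_ZERO,
                                           get_order(NR_PAGES * PAGE_SIZE));
    unsigned int i;

    if (!vaddr)
        return -ENOMEM;

    for (i = 0; i < NR_PAGES; i++) {
        int ref = gnttab_grant_foreign_access(frontend_id,
                        virt_to_gfn((void *)(vaddr + i * PAGE_SIZE)),
                        0 /* read-write */);
        if (ref < 0)
            return ref;  /* error unwinding omitted */
        grefs[i] = ref;
    }

    return 0;
}

The open question remains how to do the same for the dma_alloc_coherent()
buffers used by the existing HDA driver.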

___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel


[Xen-devel] [xen-unstable test] 104223: tolerable FAIL - PUSHED

2017-01-17 Thread osstest service owner
flight 104223 xen-unstable real [real]
http://logs.test-lab.xenproject.org/osstest/logs/104223/

Failures :-/ but no regressions.

Regressions which are regarded as allowable (not blocking):
 test-armhf-armhf-libvirt 13 saverestore-support-checkfail  like 104202
 test-armhf-armhf-libvirt-xsm 13 saverestore-support-checkfail  like 104202
 test-amd64-i386-xl-qemuu-win7-amd64 16 guest-stop fail like 104202
 test-amd64-i386-xl-qemut-win7-amd64 16 guest-stop fail like 104202
 test-amd64-amd64-xl-qemut-win7-amd64 16 guest-stopfail like 104202
 test-armhf-armhf-libvirt-qcow2 12 saverestore-support-check   fail like 104202
 test-amd64-amd64-xl-qemuu-win7-amd64 16 guest-stopfail like 104202
 test-armhf-armhf-libvirt-raw 12 saverestore-support-checkfail  like 104202
 test-amd64-amd64-xl-rtds  9 debian-install   fail  like 104202

Tests which did not succeed, but are not blocking:
 test-amd64-amd64-xl-pvh-amd  11 guest-start  fail   never pass
 test-amd64-amd64-xl-pvh-intel 11 guest-start  fail  never pass
 test-amd64-i386-libvirt  12 migrate-support-checkfail   never pass
 test-amd64-i386-libvirt-xsm  12 migrate-support-checkfail   never pass
 test-amd64-amd64-libvirt-xsm 12 migrate-support-checkfail   never pass
 test-amd64-amd64-libvirt 12 migrate-support-checkfail   never pass
 test-amd64-i386-libvirt-qemuu-debianhvm-amd64-xsm 10 migrate-support-check fail never pass
 test-amd64-amd64-libvirt-qemuu-debianhvm-amd64-xsm 10 migrate-support-check fail never pass
 test-armhf-armhf-xl-arndale  12 migrate-support-checkfail   never pass
 test-armhf-armhf-xl-arndale  13 saverestore-support-checkfail   never pass
 test-amd64-amd64-libvirt-vhd 11 migrate-support-checkfail   never pass
 test-amd64-amd64-qemuu-nested-amd 16 debian-hvm-install/l1/l2  fail never pass
 test-armhf-armhf-xl-cubietruck 12 migrate-support-checkfail never pass
 test-armhf-armhf-xl-cubietruck 13 saverestore-support-checkfail never pass
 test-armhf-armhf-xl-credit2  12 migrate-support-checkfail   never pass
 test-armhf-armhf-xl-credit2  13 saverestore-support-checkfail   never pass
 test-armhf-armhf-libvirt 12 migrate-support-checkfail   never pass
 test-armhf-armhf-libvirt-xsm 12 migrate-support-checkfail   never pass
 test-armhf-armhf-libvirt-qcow2 11 migrate-support-checkfail never pass
 test-armhf-armhf-xl-multivcpu 12 migrate-support-checkfail  never pass
 test-armhf-armhf-xl-multivcpu 13 saverestore-support-checkfail  never pass
 test-armhf-armhf-xl-xsm  12 migrate-support-checkfail   never pass
 test-armhf-armhf-xl-xsm  13 saverestore-support-checkfail   never pass
 test-armhf-armhf-xl  12 migrate-support-checkfail   never pass
 test-armhf-armhf-xl  13 saverestore-support-checkfail   never pass
 test-armhf-armhf-xl-rtds 12 migrate-support-checkfail   never pass
 test-armhf-armhf-xl-rtds 13 saverestore-support-checkfail   never pass
 test-armhf-armhf-libvirt-raw 11 migrate-support-checkfail   never pass
 test-armhf-armhf-xl-vhd  11 migrate-support-checkfail   never pass
 test-armhf-armhf-xl-vhd  12 saverestore-support-checkfail   never pass

version targeted for testing:
 xen  5ad98e3c7fa92f46d77a788e1109b7d282bd1256
baseline version:
 xen  c33b5f013db3460c07c017dea45a1c010c3dacc0

Last test of basis   104202  2017-01-17 09:44:42 Z   0 days
Testing same since   104223  2017-01-17 19:14:55 Z   0 days   1 attempts


People who touched revisions under test:
  Jan Beulich 

jobs:
 build-amd64-xsm  pass
 build-armhf-xsm  pass
 build-i386-xsm   pass
 build-amd64-xtf  pass
 build-amd64  pass
 build-armhf  pass
 build-i386   pass
 build-amd64-libvirt  pass
 build-armhf-libvirt  pass
 build-i386-libvirt   pass
 build-amd64-oldkern  pass
 build-i386-oldkern   pass
 build-amd64-prev pass
 build-i386-prev  pass
 build-amd64-pvops    pass
 build-armhf-pvops    pass
 

[Xen-devel] [PATCH v5 23/24] tools: L2 CAT: support set cbm for L2 CAT.

2017-01-17 Thread Yi Sun
This patch implements the xl/xc changes to support setting the CBM
for L2 CAT.

A new level option is added to the original CAT setting command so
that the CBM can be set for the CAT of the specified level.
- 'xl psr-cat-cbm-set' is updated to set cache capacity
  bitmasks (CBM) for a domain according to the given cache level.

root@:~$ xl psr-cat-cbm-set -l2 1 0x7f

root@:~$ xl psr-cat-show -l2 1
Socket ID   : 0
Default CBM : 0xff
   ID        NAME            CBM
    1        ubuntu14        0x7f
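
For illustration only (not part of this patch), the xc-level call behind the
xl command above would look roughly like the following fragment, assuming xch
is an already-opened xc_interface handle (error handling is a sketch):

/* Set the L2 CBM of domain 1 on socket 0 to 0x7f, mirroring the xl example. */
int rc = xc_psr_cat_set_domain_data(xch, 1 /* domid */, XC_PSR_CAT_L2_CBM,
                                    0 /* socket */, 0x7f);
if ( rc )
    fprintf(stderr, "setting the L2 CBM failed: %d\n", rc);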

Signed-off-by: He Chen 
Signed-off-by: Yi Sun 
---
 tools/libxc/xc_psr.c  |  3 +++
 tools/libxl/xl_cmdimpl.c  | 31 ---
 tools/libxl/xl_cmdtable.c |  1 +
 3 files changed, 24 insertions(+), 11 deletions(-)

diff --git a/tools/libxc/xc_psr.c b/tools/libxc/xc_psr.c
index f0aed2d..67556bb 100644
--- a/tools/libxc/xc_psr.c
+++ b/tools/libxc/xc_psr.c
@@ -266,6 +266,9 @@ int xc_psr_cat_set_domain_data(xc_interface *xch, uint32_t 
domid,
 case XC_PSR_CAT_L3_CBM_DATA:
 cmd = XEN_DOMCTL_PSR_CAT_OP_SET_L3_DATA;
 break;
+case XC_PSR_CAT_L2_CBM:
+cmd = XEN_DOMCTL_PSR_CAT_OP_SET_L2_CBM;
+break;
 default:
 errno = EINVAL;
 return -1;
diff --git a/tools/libxl/xl_cmdimpl.c b/tools/libxl/xl_cmdimpl.c
index cdcee5f..a32438c 100644
--- a/tools/libxl/xl_cmdimpl.c
+++ b/tools/libxl/xl_cmdimpl.c
@@ -9523,19 +9523,21 @@ int main_psr_cat_cbm_set(int argc, char **argv)
 char *value;
 libxl_string_list socket_list;
 unsigned long start, end;
-int i, j, len;
+unsigned int i, j, len;
+unsigned int lvl = 3;
 
 static struct option opts[] = {
 {"socket", 1, 0, 's'},
 {"data", 0, 0, 'd'},
 {"code", 0, 0, 'c'},
+{"level", 1, 0, 'l'},
 COMMON_LONG_OPTS
 };
 
 libxl_socket_bitmap_alloc(ctx, _map, 0);
 libxl_bitmap_set_none(_map);
 
-SWITCH_FOREACH_OPT(opt, "s:cd", opts, "psr-cat-cbm-set", 2) {
+SWITCH_FOREACH_OPT(opt, "s:l:cd", opts, "psr-cat-cbm-set", 2) {
 case 's':
 trim(isspace, optarg, );
 split_string_into_string_list(value, ",", _list);
@@ -9555,17 +9557,24 @@ int main_psr_cat_cbm_set(int argc, char **argv)
 case 'c':
 opt_code = 1;
 break;
+case 'l':
+lvl = atoi(optarg);
+break;
 }
 
-if (opt_data && opt_code) {
-fprintf(stderr, "Cannot handle -c and -d at the same time\n");
-return -1;
-} else if (opt_data) {
-type = LIBXL_PSR_CBM_TYPE_L3_CBM_DATA;
-} else if (opt_code) {
-type = LIBXL_PSR_CBM_TYPE_L3_CBM_CODE;
-} else {
-type = LIBXL_PSR_CBM_TYPE_L3_CBM;
+if (lvl == 2)
+type = LIBXL_PSR_CBM_TYPE_L2_CBM;
+else if (lvl == 3) {
+if (opt_data && opt_code) {
+fprintf(stderr, "Cannot handle -c and -d at the same time\n");
+return ERROR_FAIL;
+} else if (opt_data) {
+type = LIBXL_PSR_CBM_TYPE_L3_CBM_DATA;
+} else if (opt_code) {
+type = LIBXL_PSR_CBM_TYPE_L3_CBM_CODE;
+} else {
+type = LIBXL_PSR_CBM_TYPE_L3_CBM;
+}
 }
 
 if (libxl_bitmap_is_empty(_map))
diff --git a/tools/libxl/xl_cmdtable.c b/tools/libxl/xl_cmdtable.c
index c5fbad4..32c3ee5 100644
--- a/tools/libxl/xl_cmdtable.c
+++ b/tools/libxl/xl_cmdtable.c
@@ -550,6 +550,7 @@ struct cmd_spec cmd_table[] = {
   "Set cache capacity bitmasks(CBM) for a domain",
   "[options]  ",
   "-sSpecify the socket to process, otherwise all sockets 
are processed\n"
+  "-l Specify the cache level to process, otherwise L3 
cache is processed\n"
   "-cSet code CBM if CDP is supported\n"
   "-dSet data CBM if CDP is supported\n"
 },
-- 
1.9.1


___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel


[Xen-devel] [PATCH v5 21/24] tools: L2 CAT: support get HW info for L2 CAT.

2017-01-17 Thread Yi Sun
This patch implements the xl/xc changes to support getting HW info
for L2 CAT.

'xl psr-hwinfo' is updated to show both L3 CAT and L2 CAT
info.

Example (on a machine which only supports L2 CAT):
Cache Monitoring Technology (CMT):
Enabled : 0
Cache Allocation Technology (CAT): L3
libxl: error: libxl_psr.c:100:libxl__psr_cat_log_err_msg: CAT is not enabled on the socket: No such file or directory
Failed to get l3 cat info
Cache Allocation Technology (CAT): L2
Socket ID   : 0
Maximum COS : 3
CBM length  : 8
Default CBM : 0xff
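
For illustration only (not part of this patch), a libxc caller could query the
new L2 information roughly as follows; the socket number, output formatting
and error handling are just a sketch:

#include <stdio.h>
#include <stdbool.h>
#include <xenctrl.h>

/* Illustrative caller of the new interface: query L2 CAT info for socket 0. */
int main(void)
{
    xc_interface *xch = xc_interface_open(NULL, NULL, 0);
    uint32_t cos_max = 0, cbm_len = 0;
    bool cdp_enabled = false;

    if ( !xch )
        return 1;

    if ( !xc_psr_cat_get_info(xch, 0 /* socket */, 2 /* cache level */,
                              &cos_max, &cbm_len, &cdp_enabled) )
        printf("L2 CAT: Maximum COS %u, CBM length %u\n", cos_max, cbm_len);

    xc_interface_close(xch);
    return 0;
}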

Signed-off-by: He Chen 
Signed-off-by: Yi Sun 
---
 tools/libxc/include/xenctrl.h |  6 ++---
 tools/libxc/xc_psr.c  | 40 +++--
 tools/libxl/libxl.h   |  9 
 tools/libxl/libxl_psr.c   | 19 +++-
 tools/libxl/libxl_types.idl   |  1 +
 tools/libxl/xl_cmdimpl.c  | 52 +--
 6 files changed, 95 insertions(+), 32 deletions(-)

diff --git a/tools/libxc/include/xenctrl.h b/tools/libxc/include/xenctrl.h
index 4ab0f57..7ea0c92 100644
--- a/tools/libxc/include/xenctrl.h
+++ b/tools/libxc/include/xenctrl.h
@@ -2626,9 +2626,9 @@ int xc_psr_cat_set_domain_data(xc_interface *xch, 
uint32_t domid,
 int xc_psr_cat_get_domain_data(xc_interface *xch, uint32_t domid,
xc_psr_cat_type type, uint32_t target,
uint64_t *data);
-int xc_psr_cat_get_l3_info(xc_interface *xch, uint32_t socket,
-   uint32_t *cos_max, uint32_t *cbm_len,
-   bool *cdp_enabled);
+int xc_psr_cat_get_info(xc_interface *xch, uint32_t socket, unsigned int lvl,
+uint32_t *cos_max, uint32_t *cbm_len,
+bool *cdp_enabled);
 
 int xc_get_cpu_levelling_caps(xc_interface *xch, uint32_t *caps);
 int xc_get_cpu_featureset(xc_interface *xch, uint32_t index,
diff --git a/tools/libxc/xc_psr.c b/tools/libxc/xc_psr.c
index 43b3286..6c61aa5 100644
--- a/tools/libxc/xc_psr.c
+++ b/tools/libxc/xc_psr.c
@@ -317,24 +317,40 @@ int xc_psr_cat_get_domain_data(xc_interface *xch, 
uint32_t domid,
 return rc;
 }
 
-int xc_psr_cat_get_l3_info(xc_interface *xch, uint32_t socket,
-   uint32_t *cos_max, uint32_t *cbm_len,
-   bool *cdp_enabled)
+int xc_psr_cat_get_info(xc_interface *xch, uint32_t socket, unsigned int lvl,
+uint32_t *cos_max, uint32_t *cbm_len, bool 
*cdp_enabled)
 {
-int rc;
+int rc = -1;
 DECLARE_SYSCTL;
 
 sysctl.cmd = XEN_SYSCTL_psr_cat_op;
-sysctl.u.psr_cat_op.cmd = XEN_SYSCTL_PSR_CAT_get_l3_info;
 sysctl.u.psr_cat_op.target = socket;
 
-rc = xc_sysctl(xch, );
-if ( !rc )
-{
-*cos_max = sysctl.u.psr_cat_op.u.l3_info.cos_max;
-*cbm_len = sysctl.u.psr_cat_op.u.l3_info.cbm_len;
-*cdp_enabled = sysctl.u.psr_cat_op.u.l3_info.flags &
-   XEN_SYSCTL_PSR_CAT_L3_CDP;
+switch ( lvl ) {
+case 2:
+sysctl.u.psr_cat_op.cmd = XEN_SYSCTL_PSR_CAT_get_l2_info;
+rc = xc_sysctl(xch, );
+if ( !rc )
+{
+*cos_max = sysctl.u.psr_cat_op.u.l2_info.cos_max;
+*cbm_len = sysctl.u.psr_cat_op.u.l2_info.cbm_len;
+*cdp_enabled = false;
+}
+break;
+case 3:
+sysctl.u.psr_cat_op.cmd = XEN_SYSCTL_PSR_CAT_get_l3_info;
+rc = xc_sysctl(xch, );
+if ( !rc )
+{
+*cos_max = sysctl.u.psr_cat_op.u.l3_info.cos_max;
+*cbm_len = sysctl.u.psr_cat_op.u.l3_info.cbm_len;
+*cdp_enabled = sysctl.u.psr_cat_op.u.l3_info.flags &
+   XEN_SYSCTL_PSR_CAT_L3_CDP;
+}
+break;
+default:
+errno = EOPNOTSUPP;
+break;
 }
 
 return rc;
diff --git a/tools/libxl/libxl.h b/tools/libxl/libxl.h
index 3924464..c75a928 100644
--- a/tools/libxl/libxl.h
+++ b/tools/libxl/libxl.h
@@ -904,6 +904,13 @@ void libxl_mac_copy(libxl_ctx *ctx, libxl_mac *dst, const 
libxl_mac *src);
  * If this is defined, the Code and Data Prioritization feature is supported.
  */
 #define LIBXL_HAVE_PSR_CDP 1
+
+/*
+ * LIBXL_HAVE_PSR_L2_CAT
+ *
+ * If this is defined, the L2 Cache Allocation Technology feature is supported.
+ */
+#define LIBXL_HAVE_PSR_L2_CAT 1
 #endif
 
 /*
@@ -2166,6 +2173,8 @@ int libxl_psr_cat_get_cbm(libxl_ctx *ctx, uint32_t domid,
  * On success, the function returns an array of elements in 'info',
  * and the length in 'nr'.
  */
+int libxl_psr_cat_get_info(libxl_ctx *ctx, libxl_psr_cat_info **info,
+   int *nr, unsigned int lvl);
 int libxl_psr_cat_get_l3_info(libxl_ctx *ctx, libxl_psr_cat_info **info,
   int *nr);
 void libxl_psr_cat_info_list_free(libxl_psr_cat_info *list, int nr);
diff --git a/tools/libxl/libxl_psr.c b/tools/libxl/libxl_psr.c
index 

[Xen-devel] [PATCH v5 24/24] docs: add L2 CAT description in docs.

2017-01-17 Thread Yi Sun
This patch adds the L2 CAT description to the related documents.

Signed-off-by: He Chen 
Signed-off-by: Yi Sun 
---
 docs/man/xl.pod.1.in  | 25 ++---
 docs/misc/xl-psr.markdown | 10 --
 2 files changed, 30 insertions(+), 5 deletions(-)

diff --git a/docs/man/xl.pod.1.in b/docs/man/xl.pod.1.in
index 8e2aa5b..2c41ea7 100644
--- a/docs/man/xl.pod.1.in
+++ b/docs/man/xl.pod.1.in
@@ -1701,6 +1701,9 @@ occupancy monitoring share the same set of underlying 
monitoring service. Once
 a domain is attached to the monitoring service, monitoring data can be shown
 for any of these monitoring types.
 
+There is no cache monitoring and memory bandwidth monitoring on L2 cache so
+far.
+
 =over 4
 
 =item B [I]
@@ -1725,7 +1728,7 @@ monitor types are:
 
 Intel Broadwell and later server platforms offer capabilities to configure and
 make use of the Cache Allocation Technology (CAT) mechanisms, which enable more
-cache resources (i.e. L3 cache) to be made available for high priority
+cache resources (i.e. L3/L2 cache) to be made available for high priority
 applications. In the Xen implementation, CAT is used to control cache 
allocation
 on VM basis. To enforce cache on a specific domain, just set capacity bitmasks
 (CBM) for the domain.
@@ -1735,7 +1738,7 @@ Intel Broadwell and later server platforms also offer 
Code/Data Prioritization
 applications. CDP is used on a per VM basis in the Xen implementation. To
 specify code or data CBM for the domain, CDP feature must be enabled and CBM
 type options need to be specified when setting CBM, and the type options (code
-and data) are mutually exclusive.
+and data) are mutually exclusive. There is no CDP support on L2 so far.
 
 =over 4
 
@@ -1752,6 +1755,11 @@ B
 
 Specify the socket to process, otherwise all sockets are processed.
 
+=item B<-l LEVEL>, B<--level=LEVEL>
+
+Specify the cache level to process, otherwise the last level cache (L3) is
+processed.
+
 =item B<-c>, B<--code>
 
 Set code CBM when CDP is enabled.
@@ -1762,10 +1770,21 @@ Set data CBM when CDP is enabled.
 
 =back
 
-=item B [I]
+=item B [I] [I]
 
 Show CAT settings for a certain domain or all domains.
 
+B
+
+=over 4
+
+=item B<-l LEVEL>, B<--level=LEVEL>
+
+Specify the cache level to process, otherwise the last level cache (L3) is
+processed.
+
+=back
+
 =back
 
 =head1 IGNORED FOR COMPATIBILITY WITH XM
diff --git a/docs/misc/xl-psr.markdown b/docs/misc/xl-psr.markdown
index c3c1e8e..bd2b6bd 100644
--- a/docs/misc/xl-psr.markdown
+++ b/docs/misc/xl-psr.markdown
@@ -70,7 +70,7 @@ total-mem-bandwidth instead of cache-occupancy). E.g. after a 
`xl psr-cmt-attach
 
 Cache Allocation Technology (CAT) is a new feature available on Intel
 Broadwell and later server platforms that allows an OS or Hypervisor/VMM to
-partition cache allocation (i.e. L3 cache) based on application priority or
+partition cache allocation (i.e. L3/L2 cache) based on application priority or
 Class of Service (COS). Each COS is configured using capacity bitmasks (CBM)
 which represent cache capacity and indicate the degree of overlap and
 isolation between classes. System cache resource is divided into numbers of
@@ -119,13 +119,19 @@ A cbm is valid only when:
 In a multi-socket system, the same cbm will be set on each socket by default.
 Per socket cbm can be specified with the `--socket SOCKET` option.
 
+In different systems, the different cache level is supported, e.g. L3 cache or
+L2 cache. Per cache level cbm can be specified with the `--level LEVEL` option.
+
 Setting the CBM may not be successful if insufficient COS is available. In
 such case unused COS(es) may be freed by setting CBM of all related domains to
 its default value(all-ones).
 
 Per domain CBM settings can be shown by:
 
-`xl psr-cat-show`
+`xl psr-cat-show [OPTIONS] `
+
+In different systems, the different cache level is supported, e.g. L3 cache or
+L2 cache. Per cache level cbm can be specified with the `--level LEVEL` option.
 
 ## Code and Data Prioritization (CDP)
 
-- 
1.9.1


___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel


[Xen-devel] [PATCH v5 18/24] x86: L2 CAT: implement get hw info flow.

2017-01-17 Thread Yi Sun
This patch implements the get-HW-info flow for L2 CAT, including the L2 CAT
callback function.

Signed-off-by: Yi Sun 
---
 xen/arch/x86/psr.c  | 16 
 xen/arch/x86/sysctl.c   | 15 +++
 xen/include/asm-x86/psr.h   |  1 +
 xen/include/public/sysctl.h |  6 ++
 4 files changed, 38 insertions(+)

diff --git a/xen/arch/x86/psr.c b/xen/arch/x86/psr.c
index 5320ae6..b630c48 100644
--- a/xen/arch/x86/psr.c
+++ b/xen/arch/x86/psr.c
@@ -268,6 +268,9 @@ static enum psr_feat_type psr_cbm_type_to_feat_type(enum 
cbm_type type)
 case PSR_CBM_TYPE_L3_CODE:
 feat_type = PSR_SOCKET_L3_CDP;
 break;
+case PSR_CBM_TYPE_L2:
+feat_type = PSR_SOCKET_L2_CAT;
+break;
 default:
 feat_type = 0x;
 break;
@@ -715,8 +718,21 @@ static unsigned int l2_cat_get_cos_max(const struct 
feat_node *feat)
 return feat->info.l2_cat_info.cos_max;
 }
 
+static bool l2_cat_get_feat_info(const struct feat_node *feat,
+ uint32_t data[], uint32_t array_len)
+{
+if ( !data || 2 > array_len )
+return false;
+
+data[CBM_LEN] = feat->info.l2_cat_info.cbm_len;
+data[COS_MAX] = feat->info.l2_cat_info.cos_max;
+
+return true;
+}
+
 struct feat_ops l2_cat_ops = {
 .get_cos_max = l2_cat_get_cos_max,
+.get_feat_info = l2_cat_get_feat_info,
 };
 
 static void __init parse_psr_bool(char *s, char *value, char *feature,
diff --git a/xen/arch/x86/sysctl.c b/xen/arch/x86/sysctl.c
index a4c8cfe..ae3600a 100644
--- a/xen/arch/x86/sysctl.c
+++ b/xen/arch/x86/sysctl.c
@@ -207,6 +207,21 @@ long arch_do_sysctl(
 ret = -EFAULT;
 break;
 }
+case XEN_SYSCTL_PSR_CAT_get_l2_info:
+{
+uint32_t dat[2];
+ret = psr_get_info(sysctl->u.psr_cat_op.target,
+   PSR_CBM_TYPE_L2, dat, 2);
+if ( ret )
+break;
+
+sysctl->u.psr_cat_op.u.l2_info.cbm_len = dat[CBM_LEN];
+sysctl->u.psr_cat_op.u.l2_info.cos_max = dat[COS_MAX];
+
+if ( !ret && __copy_field_to_guest(u_sysctl, sysctl, u.psr_cat_op) 
)
+ret = -EFAULT;
+break;
+}
 default:
 ret = -EOPNOTSUPP;
 break;
diff --git a/xen/include/asm-x86/psr.h b/xen/include/asm-x86/psr.h
index d2c7a13..31aa332 100644
--- a/xen/include/asm-x86/psr.h
+++ b/xen/include/asm-x86/psr.h
@@ -56,6 +56,7 @@ enum cbm_type {
 PSR_CBM_TYPE_L3,
 PSR_CBM_TYPE_L3_CODE,
 PSR_CBM_TYPE_L3_DATA,
+PSR_CBM_TYPE_L2,
 };
 
 extern struct psr_cmt *psr_cmt;
diff --git a/xen/include/public/sysctl.h b/xen/include/public/sysctl.h
index 00f5e77..cbf5372 100644
--- a/xen/include/public/sysctl.h
+++ b/xen/include/public/sysctl.h
@@ -744,6 +744,7 @@ typedef struct xen_sysctl_pcitopoinfo 
xen_sysctl_pcitopoinfo_t;
 DEFINE_XEN_GUEST_HANDLE(xen_sysctl_pcitopoinfo_t);
 
 #define XEN_SYSCTL_PSR_CAT_get_l3_info   0
+#define XEN_SYSCTL_PSR_CAT_get_l2_info   1
 struct xen_sysctl_psr_cat_op {
 uint32_t cmd;   /* IN: XEN_SYSCTL_PSR_CAT_* */
 uint32_t target;/* IN */
@@ -754,6 +755,11 @@ struct xen_sysctl_psr_cat_op {
 #define XEN_SYSCTL_PSR_CAT_L3_CDP   (1u << 0)
 uint32_t flags; /* OUT: CAT flags */
 } l3_info;
+
+struct {
+uint32_t cbm_len;   /* OUT: CBM length */
+uint32_t cos_max;   /* OUT: Maximum COS */
+} l2_info;
 } u;
 };
 typedef struct xen_sysctl_psr_cat_op xen_sysctl_psr_cat_op_t;
-- 
1.9.1


___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel


[Xen-devel] [PATCH v5 22/24] tools: L2 CAT: support show cbm for L2 CAT.

2017-01-17 Thread Yi Sun
This patch implements the xl/xc changes to support
showing the CBM of L2 CAT.

A new level option is added to the original CAT show command so
that the CBM can be shown for the CAT of the specified level.
- 'xl psr-cat-show' is updated to show the CBM of a domain
  according to the given cache level.

Examples:
root@:~$ xl psr-cat-show -l2 1
Socket ID   : 0
Default CBM : 0xff
   ID        NAME            CBM
    1        ubuntu14        0x7f
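
For illustration only (not part of this patch), reading that value back
through libxc would look roughly like the following fragment, assuming xch is
an already-opened xc_interface handle:

uint64_t cbm = 0;

/* Read back the L2 CBM of domain 1 on socket 0. */
if ( !xc_psr_cat_get_domain_data(xch, 1 /* domid */, XC_PSR_CAT_L2_CBM,
                                 0 /* socket */, &cbm) )
    printf("L2 CBM: 0x%llx\n", (unsigned long long)cbm);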

Signed-off-by: He Chen 
Signed-off-by: Yi Sun 
---
 tools/libxc/include/xenctrl.h |  1 +
 tools/libxc/xc_psr.c  |  3 ++
 tools/libxl/xl_cmdimpl.c  | 81 ---
 tools/libxl/xl_cmdtable.c |  3 +-
 4 files changed, 59 insertions(+), 29 deletions(-)

diff --git a/tools/libxc/include/xenctrl.h b/tools/libxc/include/xenctrl.h
index 7ea0c92..a009625 100644
--- a/tools/libxc/include/xenctrl.h
+++ b/tools/libxc/include/xenctrl.h
@@ -2602,6 +2602,7 @@ enum xc_psr_cat_type {
 XC_PSR_CAT_L3_CBM  = 1,
 XC_PSR_CAT_L3_CBM_CODE = 2,
 XC_PSR_CAT_L3_CBM_DATA = 3,
+XC_PSR_CAT_L2_CBM  = 4,
 };
 typedef enum xc_psr_cat_type xc_psr_cat_type;
 
diff --git a/tools/libxc/xc_psr.c b/tools/libxc/xc_psr.c
index 6c61aa5..f0aed2d 100644
--- a/tools/libxc/xc_psr.c
+++ b/tools/libxc/xc_psr.c
@@ -299,6 +299,9 @@ int xc_psr_cat_get_domain_data(xc_interface *xch, uint32_t 
domid,
 case XC_PSR_CAT_L3_CBM_DATA:
 cmd = XEN_DOMCTL_PSR_CAT_OP_GET_L3_DATA;
 break;
+case XC_PSR_CAT_L2_CBM:
+cmd = XEN_DOMCTL_PSR_CAT_OP_GET_L2_CBM;
+break;
 default:
 errno = EINVAL;
 return -1;
diff --git a/tools/libxl/xl_cmdimpl.c b/tools/libxl/xl_cmdimpl.c
index 42d6827..cdcee5f 100644
--- a/tools/libxl/xl_cmdimpl.c
+++ b/tools/libxl/xl_cmdimpl.c
@@ -9379,7 +9379,7 @@ static void psr_cat_print_one_domain_cbm_type(uint32_t 
domid, uint32_t socketid,
 }
 
 static void psr_cat_print_one_domain_cbm(uint32_t domid, uint32_t socketid,
- bool cdp_enabled)
+ bool cdp_enabled, unsigned int lvl)
 {
 char *domain_name;
 
@@ -9387,27 +9387,38 @@ static void psr_cat_print_one_domain_cbm(uint32_t 
domid, uint32_t socketid,
 printf("%5d%25s", domid, domain_name);
 free(domain_name);
 
-if (!cdp_enabled) {
-psr_cat_print_one_domain_cbm_type(domid, socketid,
-  LIBXL_PSR_CBM_TYPE_L3_CBM);
-} else {
-psr_cat_print_one_domain_cbm_type(domid, socketid,
-  LIBXL_PSR_CBM_TYPE_L3_CBM_CODE);
+switch (lvl) {
+case 3:
+if (!cdp_enabled) {
+psr_cat_print_one_domain_cbm_type(domid, socketid,
+  LIBXL_PSR_CBM_TYPE_L3_CBM);
+} else {
+psr_cat_print_one_domain_cbm_type(domid, socketid,
+  LIBXL_PSR_CBM_TYPE_L3_CBM_CODE);
+psr_cat_print_one_domain_cbm_type(domid, socketid,
+  LIBXL_PSR_CBM_TYPE_L3_CBM_DATA);
+}
+break;
+case 2:
 psr_cat_print_one_domain_cbm_type(domid, socketid,
-  LIBXL_PSR_CBM_TYPE_L3_CBM_DATA);
+  LIBXL_PSR_CBM_TYPE_L2_CBM);
+break;
+default:
+printf("Input lvl %d is wrong!", lvl);
+break;
 }
 
 printf("\n");
 }
 
 static int psr_cat_print_domain_cbm(uint32_t domid, uint32_t socketid,
-bool cdp_enabled)
+bool cdp_enabled, unsigned int lvl)
 {
 int i, nr_domains;
 libxl_dominfo *list;
 
 if (domid != INVALID_DOMID) {
-psr_cat_print_one_domain_cbm(domid, socketid, cdp_enabled);
+psr_cat_print_one_domain_cbm(domid, socketid, cdp_enabled, lvl);
 return 0;
 }
 
@@ -9417,49 +9428,55 @@ static int psr_cat_print_domain_cbm(uint32_t domid, 
uint32_t socketid,
 }
 
 for (i = 0; i < nr_domains; i++)
-psr_cat_print_one_domain_cbm(list[i].domid, socketid, cdp_enabled);
+psr_cat_print_one_domain_cbm(list[i].domid, socketid, cdp_enabled, 
lvl);
 libxl_dominfo_list_free(list, nr_domains);
 
 return 0;
 }
 
-static int psr_cat_print_socket(uint32_t domid, libxl_psr_cat_info *info)
+static int psr_cat_print_socket(uint32_t domid, libxl_psr_cat_info *info,
+unsigned int lvl)
 {
 int rc;
 uint32_t l3_cache_size;
 
-rc = libxl_psr_cmt_get_l3_cache_size(ctx, info->id, _cache_size);
-if (rc) {
-fprintf(stderr, "Failed to get l3 cache size for socket:%d\n",
-info->id);
-return -1;
+printf("%-16s: %u\n", "Socket ID", info->id);
+
+/* So far, CMT only supports L3 cache. */
+if (lvl == 3)
+{
+rc = 

[Xen-devel] [PATCH v5 20/24] x86: L2 CAT: implement set value flow.

2017-01-17 Thread Yi Sun
This patch implements the L2 CAT set-value callback functions
and the related domctl interface.

Signed-off-by: Yi Sun 
---
 xen/arch/x86/domctl.c   |  6 +++
 xen/arch/x86/psr.c  | 92 +
 xen/include/asm-x86/msr-index.h |  1 +
 xen/include/public/domctl.h |  1 +
 4 files changed, 100 insertions(+)

diff --git a/xen/arch/x86/domctl.c b/xen/arch/x86/domctl.c
index af6153d..2767c6a 100644
--- a/xen/arch/x86/domctl.c
+++ b/xen/arch/x86/domctl.c
@@ -1382,6 +1382,12 @@ long arch_do_domctl(
   PSR_CBM_TYPE_L3_DATA);
 break;
 
+case XEN_DOMCTL_PSR_CAT_OP_SET_L2_CBM:
+ret = psr_set_val(d, domctl->u.psr_cat_op.target,
+  domctl->u.psr_cat_op.data,
+  PSR_CBM_TYPE_L2);
+break;
+
 case XEN_DOMCTL_PSR_CAT_OP_GET_L3_CBM:
 ret = psr_get_val(d, domctl->u.psr_cat_op.target,
   >u.psr_cat_op.data,
diff --git a/xen/arch/x86/psr.c b/xen/arch/x86/psr.c
index 1fad540..13d85e0 100644
--- a/xen/arch/x86/psr.c
+++ b/xen/arch/x86/psr.c
@@ -741,10 +741,102 @@ static bool l2_cat_get_val(const struct feat_node *feat, 
unsigned int cos,
 return true;
 }
 
+static unsigned int l2_cat_get_cos_num(const struct feat_node *feat)
+{
+/* L2 CAT uses one COS. */
+return 1;
+}
+
+static int l2_cat_get_old_val(uint64_t val[],
+  const struct feat_node *feat,
+  unsigned int old_cos)
+{
+if ( old_cos > feat->info.l2_cat_info.cos_max )
+/* Use default value. */
+old_cos = 0;
+
+val[0] = feat->cos_reg_val[old_cos];
+
+return 0;
+}
+
+static int l2_cat_set_new_val(uint64_t val[],
+  const struct feat_node *feat,
+  enum cbm_type type,
+  uint64_t m)
+{
+if ( !psr_check_cbm(feat->info.l2_cat_info.cbm_len, m) )
+return -EINVAL;
+
+val[0] = m;
+
+return 0;
+}
+
+static int l2_cat_compare_val(const uint64_t val[],
+  const struct feat_node *feat,
+  unsigned int cos, bool *found)
+{
+uint64_t l2_def_cbm;
+
+l2_def_cbm = (1ull << feat->info.l2_cat_info.cbm_len) - 1;
+
+if ( cos > feat->info.l2_cat_info.cos_max )
+{
+if ( val[0] != l2_def_cbm )
+{
+*found = false;
+return -ENOENT;
+}
+*found = true;
+}
+else
+*found = (val[0] == feat->cos_reg_val[cos]);
+
+return 0;
+}
+
+static bool l2_cat_fits_cos_max(const uint64_t val[],
+const struct feat_node *feat,
+unsigned int cos)
+{
+uint64_t l2_def_cbm;
+
+l2_def_cbm = (1ull << feat->info.l2_cat_info.cbm_len) - 1;
+
+if ( cos > feat->info.l2_cat_info.cos_max &&
+ val[0] != l2_def_cbm )
+/*
+ * Exceed cos_max and value to set is not default,
+ * return error.
+ */
+return false;
+
+return true;
+}
+
+static int l2_cat_write_msr(unsigned int cos, const uint64_t val[],
+struct feat_node *feat)
+{
+if ( cos > feat->info.l2_cat_info.cos_max )
+return -EINVAL;
+
+feat->cos_reg_val[cos] = val[0];
+wrmsrl(MSR_IA32_PSR_L2_MASK(cos), val[0]);
+
+return 0;
+}
+
 struct feat_ops l2_cat_ops = {
 .get_cos_max = l2_cat_get_cos_max,
 .get_feat_info = l2_cat_get_feat_info,
 .get_val = l2_cat_get_val,
+.get_cos_num = l2_cat_get_cos_num,
+.get_old_val = l2_cat_get_old_val,
+.set_new_val = l2_cat_set_new_val,
+.compare_val = l2_cat_compare_val,
+.fits_cos_max = l2_cat_fits_cos_max,
+.write_msr = l2_cat_write_msr,
 };
 
 static void __init parse_psr_bool(char *s, char *value, char *feature,
diff --git a/xen/include/asm-x86/msr-index.h b/xen/include/asm-x86/msr-index.h
index 98dbff1..a41e63a 100644
--- a/xen/include/asm-x86/msr-index.h
+++ b/xen/include/asm-x86/msr-index.h
@@ -343,6 +343,7 @@
 #define MSR_IA32_PSR_L3_MASK(n)(0x0c90 + (n))
 #define MSR_IA32_PSR_L3_MASK_CODE(n)   (0x0c90 + (n) * 2 + 1)
 #define MSR_IA32_PSR_L3_MASK_DATA(n)   (0x0c90 + (n) * 2)
+#define MSR_IA32_PSR_L2_MASK(n)(0x0d10 + (n))
 
 /* Intel Model 6 */
 #define MSR_P6_PERFCTR(n)  (0x00c1 + (n))
diff --git a/xen/include/public/domctl.h b/xen/include/public/domctl.h
index 8c183ba..523a2cd 100644
--- a/xen/include/public/domctl.h
+++ b/xen/include/public/domctl.h
@@ -1138,6 +1138,7 @@ struct xen_domctl_psr_cat_op {
 #define XEN_DOMCTL_PSR_CAT_OP_SET_L3_DATA3
 #define XEN_DOMCTL_PSR_CAT_OP_GET_L3_CODE4
 #define XEN_DOMCTL_PSR_CAT_OP_GET_L3_DATA5
+#define XEN_DOMCTL_PSR_CAT_OP_SET_L2_CBM 6
 #define XEN_DOMCTL_PSR_CAT_OP_GET_L2_CBM 7
 uint32_t cmd;   /* 

[Xen-devel] [PATCH v5 15/24] x86: refactor psr: implement get value flow for CDP.

2017-01-17 Thread Yi Sun
This patch implements the L3 CDP get-value callback function.

With this patch, 'psr-cat-show' can work for L3 CDP.

Signed-off-by: Yi Sun 
---
 xen/arch/x86/psr.c | 16 
 1 file changed, 16 insertions(+)

diff --git a/xen/arch/x86/psr.c b/xen/arch/x86/psr.c
index b856761..dc062ff 100644
--- a/xen/arch/x86/psr.c
+++ b/xen/arch/x86/psr.c
@@ -533,9 +533,25 @@ static bool l3_cdp_get_feat_info(const struct feat_node 
*feat,
 return true;
 }
 
+static bool l3_cdp_get_val(const struct feat_node *feat, unsigned int cos,
+   enum cbm_type type, uint64_t *val)
+{
+if ( cos > feat->info.l3_cdp_info.cos_max )
+/* Use default value. */
+cos = 0;
+
+if ( type == PSR_CBM_TYPE_L3_DATA )
+*val = get_cdp_data(feat, cos);
+else
+*val = get_cdp_code(feat, cos);
+
+return true;
+}
+
 struct feat_ops l3_cdp_ops = {
 .get_cos_max = l3_cdp_get_cos_max,
 .get_feat_info = l3_cdp_get_feat_info,
+.get_val = l3_cdp_get_val,
 };
 
 static void __init parse_psr_bool(char *s, char *value, char *feature,
-- 
1.9.1


___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel


[Xen-devel] [PATCH v5 19/24] x86: L2 CAT: implement get value flow.

2017-01-17 Thread Yi Sun
This patch implements the L2 CAT get-value callback function and
the corresponding domctl interface.

Signed-off-by: Yi Sun 
---
 xen/arch/x86/domctl.c   |  7 +++
 xen/arch/x86/psr.c  | 12 
 xen/include/public/domctl.h |  1 +
 3 files changed, 20 insertions(+)

diff --git a/xen/arch/x86/domctl.c b/xen/arch/x86/domctl.c
index db56500..af6153d 100644
--- a/xen/arch/x86/domctl.c
+++ b/xen/arch/x86/domctl.c
@@ -1403,6 +1403,13 @@ long arch_do_domctl(
 copyback = 1;
 break;
 
+case XEN_DOMCTL_PSR_CAT_OP_GET_L2_CBM:
+ret = psr_get_val(d, domctl->u.psr_cat_op.target,
+  >u.psr_cat_op.data,
+  PSR_CBM_TYPE_L2);
+copyback = 1;
+break;
+
 default:
 ret = -EOPNOTSUPP;
 break;
diff --git a/xen/arch/x86/psr.c b/xen/arch/x86/psr.c
index b630c48..1fad540 100644
--- a/xen/arch/x86/psr.c
+++ b/xen/arch/x86/psr.c
@@ -730,9 +730,21 @@ static bool l2_cat_get_feat_info(const struct feat_node 
*feat,
 return true;
 }
 
+static bool l2_cat_get_val(const struct feat_node *feat, unsigned int cos,
+  enum cbm_type type, uint64_t *val)
+{
+if ( cos > feat->info.l2_cat_info.cos_max )
+cos = 0;
+
+*val = feat->cos_reg_val[cos];
+
+return true;
+}
+
 struct feat_ops l2_cat_ops = {
 .get_cos_max = l2_cat_get_cos_max,
 .get_feat_info = l2_cat_get_feat_info,
+.get_val = l2_cat_get_val,
 };
 
 static void __init parse_psr_bool(char *s, char *value, char *feature,
diff --git a/xen/include/public/domctl.h b/xen/include/public/domctl.h
index 85cbb7c..8c183ba 100644
--- a/xen/include/public/domctl.h
+++ b/xen/include/public/domctl.h
@@ -1138,6 +1138,7 @@ struct xen_domctl_psr_cat_op {
 #define XEN_DOMCTL_PSR_CAT_OP_SET_L3_DATA3
 #define XEN_DOMCTL_PSR_CAT_OP_GET_L3_CODE4
 #define XEN_DOMCTL_PSR_CAT_OP_GET_L3_DATA5
+#define XEN_DOMCTL_PSR_CAT_OP_GET_L2_CBM 7
 uint32_t cmd;   /* IN: XEN_DOMCTL_PSR_CAT_OP_* */
 uint32_t target;/* IN */
 uint64_t data;  /* IN/OUT */
-- 
1.9.1


___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel


[Xen-devel] [PATCH v5 11/24] x86: refactor psr: set value: implement cos id picking flow.

2017-01-17 Thread Yi Sun
Continue with previous patch:
'x86: refactor psr: set value: implement cos finding flow.'

If we fail to find a matching COS ID, we need to pick a new COS ID for the
domain. Only a COS ID whose ref[COS_ID] is 1 or 0 can be picked to hold the
new set of feature values.

Signed-off-by: Yi Sun 
---
 xen/arch/x86/psr.c | 99 ++
 1 file changed, 99 insertions(+)

diff --git a/xen/arch/x86/psr.c b/xen/arch/x86/psr.c
index 8832e08..c3e25bf 100644
--- a/xen/arch/x86/psr.c
+++ b/xen/arch/x86/psr.c
@@ -154,6 +154,17 @@ struct feat_ops {
  */
 int (*compare_val)(const uint64_t val[], const struct feat_node *feat,
 unsigned int cos, bool *found);
+/*
+ * fits_cos_max is used to check if the input cos id exceeds the
+ * feature's cos_max and if the input value is not the default one.
+ * Even if the associated cos exceeds the cos_max, HW can work with default
+ * value. That is the reason we need check if input value is default one.
+ * If both criteria are fulfilled, that means the input exceeds the range.
+ * If not, that means the input fits the requirements.
+ */
+bool (*fits_cos_max)(const uint64_t val[],
+ const struct feat_node *feat,
+ unsigned int cos);
 };
 
 /*
@@ -388,6 +399,25 @@ static int l3_cat_compare_val(const uint64_t val[],
 return 0;
 }
 
+static bool l3_cat_fits_cos_max(const uint64_t val[],
+const struct feat_node *feat,
+unsigned int cos)
+{
+uint64_t l3_def_cbm;
+
+l3_def_cbm = (1ull << feat->info.l3_cat_info.cbm_len) - 1;
+
+if ( cos > feat->info.l3_cat_info.cos_max &&
+ val[0] != l3_def_cbm )
+/*
+ * Exceed cos_max and value to set is not default,
+ * return error.
+ */
+return false;
+
+return true;
+}
+
 static const struct feat_ops l3_cat_ops = {
 .get_cos_max = l3_cat_get_cos_max,
 .get_feat_info = l3_cat_get_feat_info,
@@ -396,6 +426,7 @@ static const struct feat_ops l3_cat_ops = {
 .get_old_val = l3_cat_get_old_val,
 .set_new_val = l3_cat_set_new_val,
 .compare_val = l3_cat_compare_val,
+.fits_cos_max = l3_cat_fits_cos_max,
 };
 
 static void __init parse_psr_bool(char *s, char *value, char *feature,
@@ -802,11 +833,79 @@ static int find_cos(const uint64_t *val, uint32_t 
array_len,
 return -ENOENT;
 }
 
+static bool fits_cos_max(const uint64_t *val,
+ uint32_t array_len,
+ const struct psr_socket_info *info,
+ unsigned int cos)
+{
+unsigned int ret;
+const uint64_t *val_tmp = val;
+const struct feat_node *feat;
+
+list_for_each_entry(feat, >feat_list, list)
+{
+ret = feat->ops.fits_cos_max(val_tmp, feat, cos);
+if ( !ret )
+return false;
+
+val_tmp += feat->ops.get_cos_num(feat);
+if ( val_tmp - val > array_len )
+return false;
+}
+
+return true;
+}
+
 static int pick_avail_cos(const struct psr_socket_info *info,
   const uint64_t *val, uint32_t array_len,
   unsigned int old_cos,
   enum psr_feat_type feat_type)
 {
+unsigned int cos;
+unsigned int cos_max = 0;
+const struct feat_node *feat;
+const unsigned int *ref = info->cos_ref;
+
+/*
+ * cos_max is the one of the feature which is being set.
+ */
+list_for_each_entry(feat, >feat_list, list)
+{
+if ( feat->feature != feat_type )
+continue;
+
+cos_max = feat->ops.get_cos_max(feat);
+if ( cos_max > 0 )
+break;
+}
+
+if ( !cos_max )
+return -ENOENT;
+
+/*
+ * If old cos is referred only by the domain, then use it. And, we cannot
+ * use id 0 because it stores the default values.
+ */
+if ( old_cos && ref[old_cos] == 1 &&
+ fits_cos_max(val, array_len, info, old_cos) )
+return old_cos;
+
+/* Find an unused one other than cos0. */
+for ( cos = 1; cos <= cos_max; cos++ )
+{
+/*
+ * ref is 0 means this COS is not used by other domain and
+ * can be used for current setting.
+ */
+if ( !ref[cos] )
+{
+if ( !fits_cos_max(val, array_len, info, cos) )
+return -ENOENT;
+
+return cos;
+}
+}
+
 return -ENOENT;
 }
 
-- 
1.9.1


___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel


[Xen-devel] [PATCH v5 10/24] x86: refactor psr: set value: implement cos finding flow.

2017-01-17 Thread Yi Sun
Continue with patch:
'x86: refactor psr: set value: assemble features value array'

We can try to find whether there is a COS ID for which all features' COS
register values are the same as the value array assembled before.

Signed-off-by: Yi Sun 
---
 xen/arch/x86/psr.c | 93 ++
 1 file changed, 93 insertions(+)

diff --git a/xen/arch/x86/psr.c b/xen/arch/x86/psr.c
index 7c6f2bf..8832e08 100644
--- a/xen/arch/x86/psr.c
+++ b/xen/arch/x86/psr.c
@@ -141,6 +141,19 @@ struct feat_ops {
const struct feat_node *feat,
enum cbm_type type,
uint64_t m);
+/*
+ * compare_val is used in set value process to compare if the
+ * input value array can match all the features' COS registers values
+ * according to input cos id.
+ *
+ * The return value is the amount of entries to skip in the value array
+ * or error.
+ * 1 - one entry in value array.
+ * 2 - two entries in value array, e.g. CDP uses two entries.
+ * negative - error.
+ */
+int (*compare_val)(const uint64_t val[], const struct feat_node *feat,
+unsigned int cos, bool *found);
 };
 
 /*
@@ -347,6 +360,34 @@ static int l3_cat_set_new_val(uint64_t val[],
 return 0;
 }
 
+static int l3_cat_compare_val(const uint64_t val[],
+  const struct feat_node *feat,
+  unsigned int cos, bool *found)
+{
+uint64_t l3_def_cbm;
+
+l3_def_cbm = (1ull << feat->info.l3_cat_info.cbm_len) - 1;
+
+/*
+ * Different features' cos_max are different. If cos id of the feature
+ * being set exceeds other feature's cos_max, the val of other feature
+ * must be default value. HW supports such case.
+ */
+if ( cos > feat->info.l3_cat_info.cos_max )
+{
+if ( val[0] != l3_def_cbm )
+{
+*found = false;
+return -ENOENT;
+}
+*found = true;
+}
+else
+*found = (val[0] == feat->cos_reg_val[cos]);
+
+return 0;
+}
+
 static const struct feat_ops l3_cat_ops = {
 .get_cos_max = l3_cat_get_cos_max,
 .get_feat_info = l3_cat_get_feat_info,
@@ -354,6 +395,7 @@ static const struct feat_ops l3_cat_ops = {
 .get_cos_num = l3_cat_get_cos_num,
 .get_old_val = l3_cat_get_old_val,
 .set_new_val = l3_cat_set_new_val,
+.compare_val = l3_cat_compare_val,
 };
 
 static void __init parse_psr_bool(char *s, char *value, char *feature,
@@ -706,6 +748,57 @@ static int find_cos(const uint64_t *val, uint32_t 
array_len,
 enum psr_feat_type feat_type,
 const struct psr_socket_info *info)
 {
+unsigned int cos;
+const unsigned int *ref = info->cos_ref;
+const struct feat_node *feat;
+const uint64_t *val_tmp = val;
+int ret;
+bool found = false;
+unsigned int cos_max = 0;
+
+/* cos_max is the one of the feature which is being set. */
+list_for_each_entry(feat, >feat_list, list)
+{
+if ( feat->feature != feat_type )
+continue;
+
+cos_max = feat->ops.get_cos_max(feat);
+if ( cos_max > 0 )
+break;
+}
+
+for ( cos = 0; cos <= cos_max; cos++ )
+{
+if ( cos && !ref[cos] )
+continue;
+
+/* Not found, need find again from beginning. */
+val_tmp = val;
+list_for_each_entry(feat, >feat_list, list)
+{
+/*
+ * Compare value according to feature list order.
+ * We must follow this order because value array is assembled
+ * as this order in get_old_set_new().
+ */
+ret = feat->ops.compare_val(val_tmp, feat, cos, &found);
+if ( ret < 0 )
+return ret;
+
+/* If fail to match, go to next cos to compare. */
+if ( !found )
+break;
+
+val_tmp += feat->ops.get_cos_num(feat);
+if ( val_tmp - val > array_len )
+return -EINVAL;
+}
+
+/* For this COS ID all entries in the values array did match. Use it. */
+if ( found )
+return cos;
+}
+
 return -ENOENT;
 }
 
-- 
1.9.1


___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel


[Xen-devel] [PATCH v5 12/24] x86: refactor psr: set value: implement write msr flow.

2017-01-17 Thread Yi Sun
Continue with previous patch:
'x86: refactor psr: set value: implement cos id picking flow.'

We have got all features' values and the COS ID to set. Then, we write the
MSRs of all features, skipping any whose value to set is the same as the
original value.

With this, the set value process is complete.
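
A minimal sketch of the "skip unchanged values" rule described above
(standalone C; the MSR number and the cached copy are purely illustrative):

    #include <stdint.h>
    #include <stdio.h>

    /* Stand-in for wrmsrl(): just log the write so the example is runnable. */
    static void write_msr(uint32_t msr, uint64_t val)
    {
        printf("wrmsr 0x%x <- 0x%llx\n", msr, (unsigned long long)val);
    }

    /* Write the COS register only when the cached value differs, mirroring
     * the "skip values equal to the original" rule. */
    static void set_cos_reg(uint64_t *cached, uint32_t msr, uint64_t new_val)
    {
        if ( *cached == new_val )
            return;               /* unchanged: avoid a pointless MSR write */

        *cached = new_val;
        write_msr(msr, new_val);
    }

    int main(void)
    {
        uint64_t cached = 0x7ff;

        set_cos_reg(&cached, 0xc90, 0x7ff);   /* same value: no write issued */
        set_cos_reg(&cached, 0xc90, 0x1ff);   /* new value: one write issued */
        return 0;
    }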

Signed-off-by: Yi Sun 
---
 xen/arch/x86/psr.c | 78 +-
 1 file changed, 77 insertions(+), 1 deletion(-)

diff --git a/xen/arch/x86/psr.c b/xen/arch/x86/psr.c
index c3e25bf..b8d3c82 100644
--- a/xen/arch/x86/psr.c
+++ b/xen/arch/x86/psr.c
@@ -165,6 +165,9 @@ struct feat_ops {
 bool (*fits_cos_max)(const uint64_t val[],
  const struct feat_node *feat,
  unsigned int cos);
+/* write_msr is used to write out feature MSR register. */
+int (*write_msr)(unsigned int cos, const uint64_t val[],
+ struct feat_node *feat);
 };
 
 /*
@@ -418,6 +421,21 @@ static bool l3_cat_fits_cos_max(const uint64_t val[],
 return true;
 }
 
+static int l3_cat_write_msr(unsigned int cos, const uint64_t val[],
+struct feat_node *feat)
+{
+if ( cos > feat->info.l3_cat_info.cos_max )
+return -EINVAL;
+
+if ( feat->cos_reg_val[cos] != val[0] )
+{
+feat->cos_reg_val[cos] = val[0];
+wrmsrl(MSR_IA32_PSR_L3_MASK(cos), val[0]);
+}
+
+return 0;
+}
+
 static const struct feat_ops l3_cat_ops = {
 .get_cos_max = l3_cat_get_cos_max,
 .get_feat_info = l3_cat_get_feat_info,
@@ -427,6 +445,7 @@ static const struct feat_ops l3_cat_ops = {
 .set_new_val = l3_cat_set_new_val,
 .compare_val = l3_cat_compare_val,
 .fits_cos_max = l3_cat_fits_cos_max,
+.write_msr = l3_cat_write_msr,
 };
 
 static void __init parse_psr_bool(char *s, char *value, char *feature,
@@ -909,10 +928,67 @@ static int pick_avail_cos(const struct psr_socket_info *info,
 return -ENOENT;
 }
 
+static unsigned int get_socket_cpu(unsigned int socket)
+{
+if ( likely(socket < nr_sockets) )
+return cpumask_any(socket_cpumask[socket]);
+
+return nr_cpu_ids;
+}
+
+struct cos_write_info
+{
+unsigned int cos;
+struct list_head *feat_list;
+const uint64_t *val;
+};
+
+static void do_write_psr_msr(void *data)
+{
+struct cos_write_info *info = (struct cos_write_info *)data;
+unsigned int cos   = info->cos;
+struct list_head *feat_list= info->feat_list;
+const uint64_t *val= info->val;
+struct feat_node *feat;
+int ret;
+
+if ( !feat_list )
+return;
+
+/* We need set all features values into MSRs. */
+list_for_each_entry(feat, feat_list, list)
+{
+ret = feat->ops.write_msr(cos, val, feat);
+if ( ret < 0 )
+return;
+
+val += feat->ops.get_cos_num(feat);
+}
+}
+
 static int write_psr_msr(unsigned int socket, unsigned int cos,
  const uint64_t *val)
 {
-return -ENOENT;
+struct psr_socket_info *info = get_socket_info(socket);
+struct cos_write_info data =
+{
+.cos = cos,
+.feat_list = &info->feat_list,
+.val = val,
+};
+
+if ( socket == cpu_to_socket(smp_processor_id()) )
+do_write_psr_msr(&data);
+else
+{
+unsigned int cpu = get_socket_cpu(socket);
+
+if ( cpu >= nr_cpu_ids )
+return -ENOTSOCK;
+on_selected_cpus(cpumask_of(cpu), do_write_psr_msr, &data, 1);
+}
+
+return 0;
 }
 
 int psr_set_val(struct domain *d, unsigned int socket,
-- 
1.9.1


___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel


[Xen-devel] [PATCH v5 00/24] Enable L2 Cache Allocation Technology & Refactor psr.c

2017-01-17 Thread Yi Sun
Hi all,

We plan to bring a new PSR (Platform Shared Resource) feature called
Intel L2 Cache Allocation Technology (L2 CAT) to Xen.

Besides the L2 CAT implementation, we refactor psr.c to make it more
flexible for adding new features and to follow the principle of being open
for extension but closed for modification. We abstract the general operations
of all features and encapsulate them into a structure. Developing a new
feature then mainly consists of implementing these callback functions.
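
As a rough illustration of that "open for extension" shape (invented names,
not the actual psr.c interface), each feature only registers a small ops
structure and the generic flow walks the list of registered features:

    #include <stdint.h>
    #include <stdio.h>

    /* Invented, simplified ops table: the generic flow only ever calls through
     * these pointers, so adding a feature means adding callbacks, not changing
     * the main code paths. */
    struct feature_ops {
        const char *name;
        unsigned int (*get_cos_max)(void);
        void (*write_cos)(unsigned int cos, uint64_t val);
    };

    static unsigned int l3_cos_max(void) { return 15; }
    static void l3_write(unsigned int cos, uint64_t val)
    {
        printf("L3 CAT: cos %u <- 0x%llx\n", cos, (unsigned long long)val);
    }

    static unsigned int l2_cos_max(void) { return 7; }
    static void l2_write(unsigned int cos, uint64_t val)
    {
        printf("L2 CAT: cos %u <- 0x%llx\n", cos, (unsigned long long)val);
    }

    static const struct feature_ops features[] = {
        { "L3 CAT", l3_cos_max, l3_write },
        { "L2 CAT", l2_cos_max, l2_write },
    };

    int main(void)
    {
        /* Generic flow: stays unchanged when a new feature is appended. */
        for ( unsigned int i = 0; i < sizeof(features) / sizeof(features[0]); i++ )
        {
            printf("%s: cos_max=%u\n", features[i].name, features[i].get_cos_max());
            features[i].write_cos(1, 0x1ff);
        }
        return 0;
    }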

The patch set can be found at:
https://github.com/yisun-git/xen_l2_cat_v5.git l2_cat_v5

Yi Sun (24):
  docs: create L2 Cache Allocation Technology (CAT) feature document
  x86: refactor psr: remove L3 CAT/CDP codes.
  x86: refactor psr: implement main data structures.
  x86: refactor psr: implement CPU init and free flow.
  x86: refactor psr: implement Domain init/free and schedule flows.
  x86: refactor psr: implement get hw info flow.
  x86: refactor psr: implement get value flow.
  x86: refactor psr: set value: implement framework.
  x86: refactor psr: set value: assemble features value array.
  x86: refactor psr: set value: implement cos finding flow.
  x86: refactor psr: set value: implement cos id picking flow.
  x86: refactor psr: set value: implement write msr flow.
  x86: refactor psr: implement CPU init and free flow for CDP.
  x86: refactor psr: implement get hw info flow for CDP.
  x86: refactor psr: implement get value flow for CDP.
  x86: refactor psr: implement set value callback functions for CDP.
  x86: L2 CAT: implement CPU init and free flow.
  x86: L2 CAT: implement get hw info flow.
  x86: L2 CAT: implement get value flow.
  x86: L2 CAT: implement set value flow.
  tools: L2 CAT: support get HW info for L2 CAT.
  tools: L2 CAT: support show cbm for L2 CAT.
  tools: L2 CAT: support set cbm for L2 CAT.
  docs: add L2 CAT description in docs.

 docs/features/intel_psr_l2_cat.pandoc |  347 +++
 docs/man/xl.pod.1.in  |   25 +-
 docs/misc/xl-psr.markdown |   10 +-
 tools/libxc/include/xenctrl.h |7 +-
 tools/libxc/xc_psr.c  |   46 +-
 tools/libxl/libxl.h   |9 +
 tools/libxl/libxl_psr.c   |   19 +-
 tools/libxl/libxl_types.idl   |1 +
 tools/libxl/xl_cmdimpl.c  |  162 ++--
 tools/libxl/xl_cmdtable.c |4 +-
 xen/arch/x86/domctl.c |   49 +-
 xen/arch/x86/psr.c| 1593 +++--
 xen/arch/x86/sysctl.c |   45 +-
 xen/include/asm-x86/msr-index.h   |1 +
 xen/include/asm-x86/psr.h |   19 +-
 xen/include/public/domctl.h   |2 +
 xen/include/public/sysctl.h   |6 +
 17 files changed, 1951 insertions(+), 394 deletions(-)
 create mode 100644 docs/features/intel_psr_l2_cat.pandoc

-- 
1.9.1


___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel


[Xen-devel] [PATCH v5 07/24] x86: refactor psr: implement get value flow.

2017-01-17 Thread Yi Sun
This patch implements the get value flow, including the L3 CAT callback
function.

It also changes the domctl interface to make it more general.

With this patch, 'psr-cat-show' can work for L3 CAT.
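
A toy model of the read path described above (standalone C, made-up sizes;
the real flow looks the value up via the domain's COS ID on the socket and
falls back to the default COS when out of range):

    #include <stdint.h>
    #include <stdio.h>

    #define COS_MAX 3

    /* Cached COS register values for one made-up feature; index 0 holds the
     * default (all-ones) CBM. */
    static const uint64_t cos_reg_val[COS_MAX + 1] = { 0x7ff, 0x1ff, 0x3ff, 0x7f };

    /* Return the CBM for the COS ID a domain uses on this socket, falling back
     * to the default COS 0 when the ID exceeds this feature's cos_max. */
    static uint64_t get_val(unsigned int domain_cos)
    {
        if ( domain_cos > COS_MAX )
            domain_cos = 0;

        return cos_reg_val[domain_cos];
    }

    int main(void)
    {
        printf("cbm for cos 1  = 0x%llx\n", (unsigned long long)get_val(1));
        printf("cbm for cos 99 = 0x%llx\n", (unsigned long long)get_val(99));
        return 0;
    }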

Signed-off-by: Yi Sun 
---
 xen/arch/x86/domctl.c | 18 +-
 xen/arch/x86/psr.c| 41 ++---
 xen/include/asm-x86/psr.h |  4 ++--
 3 files changed, 49 insertions(+), 14 deletions(-)

diff --git a/xen/arch/x86/domctl.c b/xen/arch/x86/domctl.c
index ab141b1..11d2127 100644
--- a/xen/arch/x86/domctl.c
+++ b/xen/arch/x86/domctl.c
@@ -1383,23 +1383,23 @@ long arch_do_domctl(
 break;
 
 case XEN_DOMCTL_PSR_CAT_OP_GET_L3_CBM:
-ret = psr_get_l3_cbm(d, domctl->u.psr_cat_op.target,
- &domctl->u.psr_cat_op.data,
- PSR_CBM_TYPE_L3);
+ret = psr_get_val(d, domctl->u.psr_cat_op.target,
+  &domctl->u.psr_cat_op.data,
+  PSR_CBM_TYPE_L3);
 copyback = 1;
 break;
 
 case XEN_DOMCTL_PSR_CAT_OP_GET_L3_CODE:
-ret = psr_get_l3_cbm(d, domctl->u.psr_cat_op.target,
- &domctl->u.psr_cat_op.data,
- PSR_CBM_TYPE_L3_CODE);
+ret = psr_get_val(d, domctl->u.psr_cat_op.target,
+  &domctl->u.psr_cat_op.data,
+  PSR_CBM_TYPE_L3_CODE);
 copyback = 1;
 break;
 
 case XEN_DOMCTL_PSR_CAT_OP_GET_L3_DATA:
-ret = psr_get_l3_cbm(d, domctl->u.psr_cat_op.target,
- &domctl->u.psr_cat_op.data,
- PSR_CBM_TYPE_L3_DATA);
+ret = psr_get_val(d, domctl->u.psr_cat_op.target,
+  &domctl->u.psr_cat_op.data,
+  PSR_CBM_TYPE_L3_DATA);
 copyback = 1;
 break;
 
diff --git a/xen/arch/x86/psr.c b/xen/arch/x86/psr.c
index 319bfcc..3cbb60c 100644
--- a/xen/arch/x86/psr.c
+++ b/xen/arch/x86/psr.c
@@ -112,6 +112,9 @@ struct feat_ops {
 /* get_feat_info is used to get feature HW info. */
 bool (*get_feat_info)(const struct feat_node *feat,
   uint32_t data[], unsigned int array_len);
+/* get_val is used to get feature COS register value. */
+bool (*get_val)(const struct feat_node *feat, unsigned int cos,
+enum cbm_type type, uint64_t *val);
 };
 
 /*
@@ -251,9 +254,22 @@ static bool l3_cat_get_feat_info(const struct feat_node *feat,
 return true;
 }
 
+static bool l3_cat_get_val(const struct feat_node *feat, unsigned int cos,
+   enum cbm_type type, uint64_t *val)
+{
+if ( cos > feat->info.l3_cat_info.cos_max )
+/* Use default value. */
+cos = 0;
+
+*val =  feat->cos_reg_val[cos];
+
+return true;
+}
+
 static const struct feat_ops l3_cat_ops = {
 .get_cos_max = l3_cat_get_cos_max,
 .get_feat_info = l3_cat_get_feat_info,
+.get_val = l3_cat_get_val,
 };
 
 static void __init parse_psr_bool(char *s, char *value, char *feature,
@@ -498,10 +514,29 @@ int psr_get_info(unsigned int socket, enum cbm_type type,
 return -ENOENT;
 }
 
-int psr_get_l3_cbm(struct domain *d, unsigned int socket,
-   uint64_t *cbm, enum cbm_type type)
+int psr_get_val(struct domain *d, unsigned int socket,
+uint64_t *val, enum cbm_type type)
 {
-return 0;
+const struct psr_socket_info *info = get_socket_info(socket);
+unsigned int cos = d->arch.psr_cos_ids[socket];
+const struct feat_node *feat;
+enum psr_feat_type feat_type;
+
+if ( IS_ERR(info) )
+return PTR_ERR(info);
+
+feat_type = psr_cbm_type_to_feat_type(type);
+list_for_each_entry(feat, &info->feat_list, list)
+{
+if ( feat->feature != feat_type )
+continue;
+
+if ( feat->ops.get_val(feat, cos, type, val) )
+/* Found */
+return 0;
+}
+
+return -ENOENT;
 }
 
 int psr_set_l3_cbm(struct domain *d, unsigned int socket,
diff --git a/xen/include/asm-x86/psr.h b/xen/include/asm-x86/psr.h
index e3b18bc..d50e359 100644
--- a/xen/include/asm-x86/psr.h
+++ b/xen/include/asm-x86/psr.h
@@ -70,8 +70,8 @@ void psr_ctxt_switch_to(struct domain *d);
 
 int psr_get_info(unsigned int socket, enum cbm_type type,
  uint32_t data[], unsigned int array_len);
-int psr_get_l3_cbm(struct domain *d, unsigned int socket,
-   uint64_t *cbm, enum cbm_type type);
+int psr_get_val(struct domain *d, unsigned int socket,
+uint64_t *val, enum cbm_type type);
 int psr_set_l3_cbm(struct domain *d, unsigned int socket,
uint64_t cbm, enum cbm_type type);
 
-- 
1.9.1


___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel


[Xen-devel] [PATCH v5 17/24] x86: L2 CAT: implement CPU init and free flow.

2017-01-17 Thread Yi Sun
This patch implements the CPU init and free flow for L2 CAT including
L2 CAT initialization callback function.

Signed-off-by: Yi Sun 
---
 xen/arch/x86/psr.c| 72 +++
 xen/include/asm-x86/psr.h |  1 +
 2 files changed, 73 insertions(+)

diff --git a/xen/arch/x86/psr.c b/xen/arch/x86/psr.c
index 596e5b1..5320ae6 100644
--- a/xen/arch/x86/psr.c
+++ b/xen/arch/x86/psr.c
@@ -94,6 +94,7 @@ struct feat_hw_info {
 union {
 struct psr_cat_hw_info l3_cat_info;
 struct psr_cat_hw_info l3_cdp_info;
+struct psr_cat_hw_info l2_cat_info;
 };
 };
 
@@ -236,6 +237,7 @@ static DEFINE_PER_CPU(struct psr_assoc, psr_assoc);
  */
 static struct feat_node *feat_l3_cat;
 static struct feat_node *feat_l3_cdp;
+static struct feat_node *feat_l2_cat;
 
 /* Common functions. */
 static void free_feature(struct psr_socket_info *info)
@@ -672,6 +674,51 @@ struct feat_ops l3_cdp_ops = {
 .write_msr = l3_cdp_write_msr,
 };
 
+/* L2 CAT callback functions implementation. */
+static void l2_cat_init_feature(struct cpuid_leaf_regs regs,
+struct feat_node *feat,
+struct psr_socket_info *info)
+{
+struct psr_cat_hw_info l2_cat;
+unsigned int socket;
+
+/* No valid values so do not enable the feature. */
+if ( !regs.eax || !regs.edx )
+return;
+
+l2_cat.cbm_len = (regs.eax & CAT_CBM_LEN_MASK) + 1;
+l2_cat.cos_max = min(opt_cos_max, regs.edx & CAT_COS_MAX_MASK);
+
+/* cos=0 is reserved as default cbm(all ones). */
+feat->cos_reg_val[0] = (1ull << l2_cat.cbm_len) - 1;
+
+feat->feature = PSR_SOCKET_L2_CAT;
+__set_bit(PSR_SOCKET_L2_CAT, &info->feat_mask);
+
+feat->info.l2_cat_info = l2_cat;
+
+info->nr_feat++;
+
+/* Add this feature into list. */
+list_add_tail(&feat->list, &info->feat_list);
+
+socket = cpu_to_socket(smp_processor_id());
+if ( opt_cpu_info )
+printk(XENLOG_INFO
+   "L2 CAT: enabled on socket %u, cos_max:%u, cbm_len:%u.\n",
+   socket, feat->info.l2_cat_info.cos_max,
+   feat->info.l2_cat_info.cbm_len);
+}
+
+static unsigned int l2_cat_get_cos_max(const struct feat_node *feat)
+{
+return feat->info.l2_cat_info.cos_max;
+}
+
+struct feat_ops l2_cat_ops = {
+.get_cos_max = l2_cat_get_cos_max,
+};
+
 static void __init parse_psr_bool(char *s, char *value, char *feature,
   unsigned int mask)
 {
@@ -1445,6 +1492,20 @@ static void cpu_init_work(void)
 l3_cat_init_feature(regs, feat, info);
 }
 }
+
+cpuid_count(PSR_CPUID_LEVEL_CAT, 0,
+&regs.eax, &regs.ebx, &regs.ecx, &regs.edx);
+if ( regs.ebx & PSR_RESOURCE_TYPE_L2 )
+{
+/* Initialize L2 CAT according to CPUID. */
+cpuid_count(PSR_CPUID_LEVEL_CAT, 2,
+&regs.eax, &regs.ebx, &regs.ecx, &regs.edx);
+
+feat = feat_l2_cat;
+feat_l2_cat = NULL;
+feat->ops = l2_cat_ops;
+l2_cat_init_feature(regs, feat, info);
+}
 }
 
 static void cpu_fini_work(unsigned int cpu)
@@ -1503,6 +1564,17 @@ static int psr_cpu_prepare(unsigned int cpu)
 return -ENOMEM;
 }
 
+if ( feat_l2_cat == NULL &&
+ (feat_l2_cat = xzalloc(struct feat_node)) == NULL )
+{
+xfree(feat_l3_cat);
+feat_l3_cat = NULL;
+
+xfree(feat_l3_cdp);
+feat_l3_cdp = NULL;
+return -ENOMEM;
+}
+
 return 0;
 }
 
diff --git a/xen/include/asm-x86/psr.h b/xen/include/asm-x86/psr.h
index 97214fe..d2c7a13 100644
--- a/xen/include/asm-x86/psr.h
+++ b/xen/include/asm-x86/psr.h
@@ -23,6 +23,7 @@
 
 /* Resource Type Enumeration */
 #define PSR_RESOURCE_TYPE_L30x2
+#define PSR_RESOURCE_TYPE_L20x4
 
 /* L3 Monitoring Features */
 #define PSR_CMT_L3_OCCUPANCY   0x1
-- 
1.9.1


___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel


[Xen-devel] [PATCH v5 13/24] x86: refactor psr: implement CPU init and free flow for CDP.

2017-01-17 Thread Yi Sun
This patch implements the CPU init and free flow for CDP including L3 CDP
initialization callback function.

Signed-off-by: Yi Sun 
---
 xen/arch/x86/psr.c | 98 +++---
 1 file changed, 93 insertions(+), 5 deletions(-)

diff --git a/xen/arch/x86/psr.c b/xen/arch/x86/psr.c
index b8d3c82..a979128 100644
--- a/xen/arch/x86/psr.c
+++ b/xen/arch/x86/psr.c
@@ -93,6 +93,7 @@ struct psr_cat_hw_info {
 struct feat_hw_info {
 union {
 struct psr_cat_hw_info l3_cat_info;
+struct psr_cat_hw_info l3_cdp_info;
 };
 };
 
@@ -197,6 +198,21 @@ struct cpuid_leaf_regs {
 unsigned int ecx;
 unsigned int edx;
 };
+/*
+ * get_data - get DATA COS register value from input COS ID.
+ * @feat:the feature list entry.
+ * @cos: the COS ID.
+ */
+#define get_cdp_data(feat, cos)  \
+( feat->cos_reg_val[cos * 2] )
+
+/*
+ * get_cdp_code - get CODE COS register value from input COS ID.
+ * @feat:the feature list entry.
+ * @cos: the COS ID.
+ */
+#define get_cdp_code(feat, cos)  \
+( feat->cos_reg_val[cos * 2 + 1] )
 
 struct psr_assoc {
 uint64_t val;
@@ -219,6 +235,7 @@ static DEFINE_PER_CPU(struct psr_assoc, psr_assoc);
  * inserted into feature list in cpu_init_work().
  */
 static struct feat_node *feat_l3_cat;
+static struct feat_node *feat_l3_cdp;
 
 /* Common functions. */
 static void free_feature(struct psr_socket_info *info)
@@ -448,6 +465,61 @@ static const struct feat_ops l3_cat_ops = {
 .write_msr = l3_cat_write_msr,
 };
 
+/* L3 CDP functions implementation. */
+static void l3_cdp_init_feature(struct cpuid_leaf_regs regs,
+struct feat_node *feat,
+struct psr_socket_info *info)
+{
+struct psr_cat_hw_info l3_cdp;
+unsigned int socket;
+uint64_t val;
+
+/* No valid value so do not enable feature. */
+if ( !regs.eax || !regs.edx )
+return;
+
+l3_cdp.cbm_len = (regs.eax & CAT_CBM_LEN_MASK) + 1;
+/* Cut half of cos_max when CDP is enabled. */
+l3_cdp.cos_max = min(opt_cos_max, regs.edx & CAT_COS_MAX_MASK) >> 1;
+
+/* cos=0 is reserved as default cbm(all ones). */
+get_cdp_code(feat, 0) =
+ (1ull << l3_cdp.cbm_len) - 1;
+get_cdp_data(feat, 0) =
+ (1ull << l3_cdp.cbm_len) - 1;
+
+/* We only write mask1 since mask0 is always all ones by default. */
+wrmsrl(MSR_IA32_PSR_L3_MASK(1), (1ull << l3_cdp.cbm_len) - 1);
+rdmsrl(MSR_IA32_PSR_L3_QOS_CFG, val);
+wrmsrl(MSR_IA32_PSR_L3_QOS_CFG, val | (1 << PSR_L3_QOS_CDP_ENABLE_BIT));
+
+feat->feature = PSR_SOCKET_L3_CDP;
+__set_bit(PSR_SOCKET_L3_CDP, &info->feat_mask);
+
+feat->info.l3_cdp_info = l3_cdp;
+
+info->nr_feat++;
+
+/* Add this feature into list. */
+list_add_tail(&feat->list, &info->feat_list);
+
+socket = cpu_to_socket(smp_processor_id());
+if ( opt_cpu_info )
+printk(XENLOG_INFO
+   "L3 CDP: enabled on socket %u, cos_max:%u, cbm_len:%u\n",
+   socket, feat->info.l3_cdp_info.cos_max,
+   feat->info.l3_cdp_info.cbm_len);
+}
+
+static unsigned int l3_cdp_get_cos_max(const struct feat_node *feat)
+{
+return feat->info.l3_cdp_info.cos_max;
+}
+
+struct feat_ops l3_cdp_ops = {
+.get_cos_max = l3_cdp_get_cos_max,
+};
+
 static void __init parse_psr_bool(char *s, char *value, char *feature,
   unsigned int mask)
 {
@@ -1207,11 +1279,19 @@ static void cpu_init_work(void)
 cpuid_count(PSR_CPUID_LEVEL_CAT, 1,
&regs.eax, &regs.ebx, &regs.ecx, &regs.edx);
 
-feat = feat_l3_cat;
-feat_l3_cat = NULL;
-feat->ops = l3_cat_ops;
-
-l3_cat_init_feature(regs, feat, info);
+if ( (regs.ecx & PSR_CAT_CDP_CAPABILITY) && (opt_psr & PSR_CDP) &&
+ !test_bit(PSR_SOCKET_L3_CDP, >feat_mask) )
+{
+feat = feat_l3_cdp;
+feat_l3_cdp = NULL;
+feat->ops = l3_cdp_ops;
+l3_cdp_init_feature(regs, feat, info);
+} else {
+feat = feat_l3_cat;
+feat_l3_cat = NULL;
+feat->ops = l3_cat_ops;
+l3_cat_init_feature(regs, feat, info);
+}
 }
 }
 
@@ -1263,6 +1343,14 @@ static int psr_cpu_prepare(unsigned int cpu)
  (feat_l3_cat = xzalloc(struct feat_node)) == NULL )
 return -ENOMEM;
 
+if ( feat_l3_cdp == NULL &&
+ (feat_l3_cdp = xzalloc(struct feat_node)) == NULL )
+{
+xfree(feat_l3_cat);
+feat_l3_cat = NULL;
+return -ENOMEM;
+}
+
 return 0;
 }
 
-- 
1.9.1


___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel


[Xen-devel] [PATCH v5 05/24] x86: refactor psr: implement Domain init/free and schedule flows.

2017-01-17 Thread Yi Sun
This patch implements the Domain init/free and schedule flows.

Signed-off-by: Yi Sun 
---
 xen/arch/x86/psr.c | 62 +-
 1 file changed, 61 insertions(+), 1 deletion(-)

diff --git a/xen/arch/x86/psr.c b/xen/arch/x86/psr.c
index e9dc07a..7f06235 100644
--- a/xen/arch/x86/psr.c
+++ b/xen/arch/x86/psr.c
@@ -50,6 +50,8 @@
  */
 #define MAX_COS_REG_CNT  128
 
+#define PSR_ASSOC_REG_POS 32
+
 /*
  * PSR features are managed per socket. Below structure defines the members
  * used to manage these features.
@@ -211,7 +213,13 @@ static void l3_cat_init_feature(struct cpuid_leaf_regs regs,
feat->info.l3_cat_info.cbm_len);
 }
 
+static unsigned int l3_cat_get_cos_max(const struct feat_node *feat)
+{
+return feat->info.l3_cat_info.cos_max;
+}
+
 static const struct feat_ops l3_cat_ops = {
+.get_cos_max = l3_cat_get_cos_max,
 };
 
 static void __init parse_psr_bool(char *s, char *value, char *feature,
@@ -355,11 +363,33 @@ void psr_free_rmid(struct domain *d)
 d->arch.psr_rmid = 0;
 }
 
+static inline unsigned int get_max_cos_max(const struct psr_socket_info *info)
+{
+const struct feat_node *feat;
+unsigned int cos_max = 0;
+
+list_for_each_entry(feat, &info->feat_list, list)
+cos_max = max(feat->ops.get_cos_max(feat), cos_max);
+
+return cos_max;
+}
+
 static inline void psr_assoc_init(void)
 {
struct psr_assoc *psra = &this_cpu(psr_assoc);
 
-if ( psr_cmt_enabled() )
+if ( socket_info )
+{
+unsigned int socket = cpu_to_socket(smp_processor_id());
+const struct psr_socket_info *info = socket_info + socket;
+unsigned int cos_max = get_max_cos_max(info);
+
+if ( info->feat_mask )
+psra->cos_mask = ((1ull << get_count_order(cos_max)) - 1) <<
+  PSR_ASSOC_REG_POS;
+}
+
+if ( psr_cmt_enabled() || psra->cos_mask )
 rdmsrl(MSR_IA32_PSR_ASSOC, psra->val);
 }
 
@@ -368,6 +398,13 @@ static inline void psr_assoc_rmid(uint64_t *reg, unsigned int rmid)
 *reg = (*reg & ~rmid_mask) | (rmid & rmid_mask);
 }
 
+static inline void psr_assoc_cos(uint64_t *reg, unsigned int cos,
+ uint64_t cos_mask)
+{
+*reg = (*reg & ~cos_mask) |
+(((uint64_t)cos << PSR_ASSOC_REG_POS) & cos_mask);
+}
+
 void psr_ctxt_switch_to(struct domain *d)
 {
struct psr_assoc *psra = &this_cpu(psr_assoc);
@@ -376,6 +413,11 @@ void psr_ctxt_switch_to(struct domain *d)
 if ( psr_cmt_enabled() )
psr_assoc_rmid(&reg, d->arch.psr_rmid);
 
+if ( psra->cos_mask )
+psr_assoc_cos(&reg, d->arch.psr_cos_ids ?
+  d->arch.psr_cos_ids[cpu_to_socket(smp_processor_id())] :
+  0, psra->cos_mask);
+
 if ( reg != psra->val )
 {
 wrmsrl(MSR_IA32_PSR_ASSOC, reg);
@@ -401,14 +443,32 @@ int psr_set_l3_cbm(struct domain *d, unsigned int socket,
 return 0;
 }
 
+/* Called with domain lock held, no extra lock needed for 'psr_cos_ids' */
+static void psr_free_cos(struct domain *d)
+{
+if( !d->arch.psr_cos_ids )
+return;
+
+xfree(d->arch.psr_cos_ids);
+d->arch.psr_cos_ids = NULL;
+}
+
 int psr_domain_init(struct domain *d)
 {
+if ( socket_info )
+{
+d->arch.psr_cos_ids = xzalloc_array(unsigned int, nr_sockets);
+if ( !d->arch.psr_cos_ids )
+return -ENOMEM;
+}
+
 return 0;
 }
 
 void psr_domain_free(struct domain *d)
 {
 psr_free_rmid(d);
+psr_free_cos(d);
 }
 
 static void cpu_init_work(void)
-- 
1.9.1


___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel


[Xen-devel] [PATCH v5 16/24] x86: refactor psr: implement set value callback functions for CDP.

2017-01-17 Thread Yi Sun
This patch implements L3 CDP set value related callback functions.

With this patch, 'psr-cat-cbm-set' command can work for L3 CDP.

Signed-off-by: Yi Sun 
---
 xen/arch/x86/psr.c | 118 +
 1 file changed, 118 insertions(+)

diff --git a/xen/arch/x86/psr.c b/xen/arch/x86/psr.c
index dc062ff..596e5b1 100644
--- a/xen/arch/x86/psr.c
+++ b/xen/arch/x86/psr.c
@@ -548,10 +548,128 @@ static bool l3_cdp_get_val(const struct feat_node *feat, unsigned int cos,
 return true;
 }
 
+static unsigned int l3_cdp_get_cos_num(const struct feat_node *feat)
+{
+return 2;
+}
+
+static int l3_cdp_get_old_val(uint64_t val[],
+  const struct feat_node *feat,
+  unsigned int old_cos)
+{
+if ( old_cos > feat->info.l3_cdp_info.cos_max )
+/* Use default value. */
+old_cos = 0;
+
+/* Data */
+val[0] = get_cdp_data(feat, old_cos);
+/* Code */
+val[1] = get_cdp_code(feat, old_cos);
+
+return 0;
+}
+
+static int l3_cdp_set_new_val(uint64_t val[],
+  const struct feat_node *feat,
+  enum cbm_type type,
+  uint64_t m)
+{
+if ( !psr_check_cbm(feat->info.l3_cdp_info.cbm_len, m) )
+return -EINVAL;
+
+if ( type == PSR_CBM_TYPE_L3_DATA )
+val[0] = m;
+else
+val[1] = m;
+
+return 0;
+}
+
+static int l3_cdp_compare_val(const uint64_t val[],
+  const struct feat_node *feat,
+  unsigned int cos, bool *found)
+{
+uint64_t l3_def_cbm;
+
+l3_def_cbm = (1ull << feat->info.l3_cdp_info.cbm_len) - 1;
+
+/*
+ * Different features' cos_max are different. If cos id of the feature
+ * being set exceeds other feature's cos_max, the val of other feature
+ * must be default value. HW supports such case.
+ */
+if ( cos > feat->info.l3_cdp_info.cos_max )
+{
+if ( val[0] != l3_def_cbm ||
+ val[1] != l3_def_cbm )
+{
+*found = false;
+return -ENOENT;
+}
+*found = true;
+}
+else
+*found = (val[0] == get_cdp_data(feat, cos) &&
+  val[1] == get_cdp_code(feat, cos));
+
+return 0;
+}
+
+static bool l3_cdp_fits_cos_max(const uint64_t val[],
+const struct feat_node *feat,
+unsigned int cos)
+{
+uint64_t l3_def_cbm;
+
+l3_def_cbm = (1ull << feat->info.l3_cdp_info.cbm_len) - 1;
+
+if ( cos > feat->info.l3_cdp_info.cos_max &&
+ (val[0] != l3_def_cbm || val[1] != l3_def_cbm) )
+/*
+ * Exceed cos_max and value to set is not default,
+ * return error.
+ */
+return false;
+
+return true;
+}
+
+static int l3_cdp_write_msr(unsigned int cos, const uint64_t val[],
+struct feat_node *feat)
+{
+/*
+ * If input cos is more than the cos_max of the feature, we should
+ * not set the value.
+ */
+if ( cos > feat->info.l3_cdp_info.cos_max )
+return -EINVAL;
+
+/* Data */
+if ( get_cdp_data(feat, cos) != val[0] )
+{
+get_cdp_data(feat, cos) = val[0];
+wrmsrl(MSR_IA32_PSR_L3_MASK_DATA(cos), val[0]);
+}
+/* Code */
+if ( get_cdp_code(feat, cos) != val[1] )
+{
+get_cdp_code(feat, cos) = val[1];
+wrmsrl(MSR_IA32_PSR_L3_MASK_CODE(cos), val[1]);
+}
+
+return 0;
+}
+
 struct feat_ops l3_cdp_ops = {
 .get_cos_max = l3_cdp_get_cos_max,
 .get_feat_info = l3_cdp_get_feat_info,
 .get_val = l3_cdp_get_val,
+.get_cos_num = l3_cdp_get_cos_num,
+.get_old_val = l3_cdp_get_old_val,
+.set_new_val = l3_cdp_set_new_val,
+.compare_val = l3_cdp_compare_val,
+.fits_cos_max = l3_cdp_fits_cos_max,
+.write_msr = l3_cdp_write_msr,
 };
 
 static void __init parse_psr_bool(char *s, char *value, char *feature,
-- 
1.9.1


___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel


[Xen-devel] [PATCH v5 14/24] x86: refactor psr: implement get hw info flow for CDP.

2017-01-17 Thread Yi Sun
This patch implements get HW info flow for CDP including L3 CDP callback
function.

It also changes sysctl function to make it work for CDP.

With this patch, 'psr-hwinfo' can work for L3 CDP.

Signed-off-by: Yi Sun 
---
 xen/arch/x86/psr.c| 18 ++
 xen/arch/x86/sysctl.c | 24 +---
 2 files changed, 39 insertions(+), 3 deletions(-)

diff --git a/xen/arch/x86/psr.c b/xen/arch/x86/psr.c
index a979128..b856761 100644
--- a/xen/arch/x86/psr.c
+++ b/xen/arch/x86/psr.c
@@ -262,6 +262,10 @@ static enum psr_feat_type psr_cbm_type_to_feat_type(enum cbm_type type)
 case PSR_CBM_TYPE_L3:
 feat_type = PSR_SOCKET_L3_CAT;
 break;
+case PSR_CBM_TYPE_L3_DATA:
+case PSR_CBM_TYPE_L3_CODE:
+feat_type = PSR_SOCKET_L3_CDP;
+break;
 default:
 feat_type = 0x;
 break;
@@ -516,8 +520,22 @@ static unsigned int l3_cdp_get_cos_max(const struct feat_node *feat)
 return feat->info.l3_cdp_info.cos_max;
 }
 
+static bool l3_cdp_get_feat_info(const struct feat_node *feat,
+ uint32_t data[], uint32_t array_len)
+{
+if ( !data || 3 > array_len )
+return false;
+
+data[CBM_LEN] = feat->info.l3_cdp_info.cbm_len;
+data[COS_MAX] = feat->info.l3_cdp_info.cos_max;
+data[PSR_FLAG] |= XEN_SYSCTL_PSR_CAT_L3_CDP;
+
+return true;
+}
+
 struct feat_ops l3_cdp_ops = {
 .get_cos_max = l3_cdp_get_cos_max,
+.get_feat_info = l3_cdp_get_feat_info,
 };
 
 static void __init parse_psr_bool(char *s, char *value, char *feature,
diff --git a/xen/arch/x86/sysctl.c b/xen/arch/x86/sysctl.c
index d90db78..a4c8cfe 100644
--- a/xen/arch/x86/sysctl.c
+++ b/xen/arch/x86/sysctl.c
@@ -181,9 +181,27 @@ long arch_do_sysctl(
 ret = psr_get_info(sysctl->u.psr_cat_op.target,
PSR_CBM_TYPE_L3, data, 3);
 
-sysctl->u.psr_cat_op.u.l3_info.cbm_len = data[CBM_LEN];
-sysctl->u.psr_cat_op.u.l3_info.cos_max = data[COS_MAX];
-sysctl->u.psr_cat_op.u.l3_info.flags   = data[PSR_FLAG];
+if ( !ret )
+{
+sysctl->u.psr_cat_op.u.l3_info.cbm_len = data[CBM_LEN];
+sysctl->u.psr_cat_op.u.l3_info.cos_max = data[COS_MAX];
+sysctl->u.psr_cat_op.u.l3_info.flags   = data[PSR_FLAG];
+} else {
+/*
+ * Check if CDP is enabled.
+ *
+ * Per spec, L3 CAT and CDP cannot co-exist. So, we need replace
+ * output values to CDP's if it is enabled.
+ */
+ret = psr_get_info(sysctl->u.psr_cat_op.target,
+   PSR_CBM_TYPE_L3_CODE, data, 3);
+if ( !ret )
+{
+sysctl->u.psr_cat_op.u.l3_info.cbm_len = data[CBM_LEN];
+sysctl->u.psr_cat_op.u.l3_info.cos_max = data[COS_MAX];
+sysctl->u.psr_cat_op.u.l3_info.flags   = data[PSR_FLAG];
+}
+}
 
if ( !ret && __copy_field_to_guest(u_sysctl, sysctl, u.psr_cat_op) )
 ret = -EFAULT;
-- 
1.9.1


___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel


[Xen-devel] [PATCH v5 04/24] x86: refactor psr: implement CPU init and free flow.

2017-01-17 Thread Yi Sun
This patch implements the CPU init and free flow including L3 CAT
initialization and feature list free.

Signed-off-by: Yi Sun 
---
 xen/arch/x86/psr.c | 176 -
 1 file changed, 174 insertions(+), 2 deletions(-)

diff --git a/xen/arch/x86/psr.c b/xen/arch/x86/psr.c
index f7ff3fc..e9dc07a 100644
--- a/xen/arch/x86/psr.c
+++ b/xen/arch/x86/psr.c
@@ -35,6 +35,9 @@
 #define PSR_CAT(1<<1)
 #define PSR_CDP(1<<2)
 
+#define CAT_CBM_LEN_MASK 0x1f
+#define CAT_COS_MAX_MASK 0x
+
 /*
  * Per SDM chapter 'Cache Allocation Technology: Cache Mask Configuration',
  * the MSRs range from 0C90H through 0D0FH (inclusive), enables support for
@@ -127,6 +130,13 @@ struct feat_node {
 struct list_head list;
 };
 
+struct cpuid_leaf_regs {
+unsigned int eax;
+unsigned int ebx;
+unsigned int ecx;
+unsigned int edx;
+};
+
 struct psr_assoc {
 uint64_t val;
 uint64_t cos_mask;
@@ -134,11 +144,76 @@ struct psr_assoc {
 
 struct psr_cmt *__read_mostly psr_cmt;
 
+static struct psr_socket_info *__read_mostly socket_info;
+
 static unsigned int opt_psr;
 static unsigned int __initdata opt_rmid_max = 255;
+static unsigned int __read_mostly opt_cos_max = MAX_COS_REG_CNT;
 static uint64_t rmid_mask;
 static DEFINE_PER_CPU(struct psr_assoc, psr_assoc);
 
+/*
+ * Declare global feature list entry for every feature to facilitate the
+ * feature list creation. It will be allocated in psr_cpu_prepare() and
+ * inserted into feature list in cpu_init_work().
+ */
+static struct feat_node *feat_l3_cat;
+
+/* Common functions. */
+static void free_feature(struct psr_socket_info *info)
+{
+struct feat_node *feat, *next;
+
+if ( !info )
+return;
+
+list_for_each_entry_safe(feat, next, &info->feat_list, list)
+{
+clear_bit(feat->feature, &info->feat_mask);
+list_del(&feat->list);
+xfree(feat);
+}
+}
+
+/* L3 CAT functions implementation. */
+static void l3_cat_init_feature(struct cpuid_leaf_regs regs,
+struct feat_node *feat,
+struct psr_socket_info *info)
+{
+struct psr_cat_hw_info l3_cat;
+unsigned int socket;
+
+/* No valid value so do not enable feature. */
+if ( !regs.eax || !regs.edx )
+return;
+
+l3_cat.cbm_len = (regs.eax & CAT_CBM_LEN_MASK) + 1;
+l3_cat.cos_max = min(opt_cos_max, regs.edx & CAT_COS_MAX_MASK);
+
+/* cos=0 is reserved as default cbm(all bits within cbm_len are 1). */
+feat->cos_reg_val[0] = (1ull << l3_cat.cbm_len) - 1;
+
+feat->feature = PSR_SOCKET_L3_CAT;
+__set_bit(PSR_SOCKET_L3_CAT, &info->feat_mask);
+
+feat->info.l3_cat_info = l3_cat;
+
+info->nr_feat++;
+
+/* Add this feature into list. */
+list_add_tail(&feat->list, &info->feat_list);
+
+socket = cpu_to_socket(smp_processor_id());
+if ( opt_cpu_info )
+printk(XENLOG_INFO
+   "L3 CAT: enabled on socket %u, cos_max:%u, cbm_len:%u\n",
+   socket, feat->info.l3_cat_info.cos_max,
+   feat->info.l3_cat_info.cbm_len);
+}
+
+static const struct feat_ops l3_cat_ops = {
+};
+
 static void __init parse_psr_bool(char *s, char *value, char *feature,
   unsigned int mask)
 {
@@ -178,6 +253,9 @@ static void __init parse_psr_param(char *s)
 if ( val_str && !strcmp(s, "rmid_max") )
 opt_rmid_max = simple_strtoul(val_str, NULL, 0);
 
+if ( val_str && !strcmp(s, "cos_max") )
+opt_cos_max = simple_strtoul(val_str, NULL, 0);
+
 s = ss + 1;
 } while ( ss );
 }
@@ -333,18 +411,108 @@ void psr_domain_free(struct domain *d)
 psr_free_rmid(d);
 }
 
+static void cpu_init_work(void)
+{
+struct psr_socket_info *info;
+unsigned int socket;
+unsigned int cpu = smp_processor_id();
+struct feat_node *feat;
+struct cpuid_leaf_regs regs;
+
+if ( !cpu_has(&current_cpu_data, X86_FEATURE_PQE) )
+return;
+else if ( current_cpu_data.cpuid_level < PSR_CPUID_LEVEL_CAT )
+{
+clear_bit(X86_FEATURE_PQE, current_cpu_data.x86_capability);
+return;
+}
+
+socket = cpu_to_socket(cpu);
+info = socket_info + socket;
+if ( info->feat_mask )
+return;
+
+INIT_LIST_HEAD(&info->feat_list);
+spin_lock_init(&info->ref_lock);
+
+cpuid_count(PSR_CPUID_LEVEL_CAT, 0,
+&regs.eax, &regs.ebx, &regs.ecx, &regs.edx);
+if ( regs.ebx & PSR_RESOURCE_TYPE_L3 )
+{
+cpuid_count(PSR_CPUID_LEVEL_CAT, 1,
+&regs.eax, &regs.ebx, &regs.ecx, &regs.edx);
+
+feat = feat_l3_cat;
+feat_l3_cat = NULL;
+feat->ops = l3_cat_ops;
+
+l3_cat_init_feature(regs, feat, info);
+}
+}
+
+static void cpu_fini_work(unsigned int cpu)
+{
+unsigned int socket = cpu_to_socket(cpu);
+
+if ( !socket_cpumask[socket] || cpumask_empty(socket_cpumask[socket]) )
+{
+free_feature(socket_info + socket);
+}
+}
+
+static void __init init_psr(void)
+{
+if ( opt_cos_max < 1 )
+  

[Xen-devel] [PATCH v5 03/24] x86: refactor psr: implement main data structures.

2017-01-17 Thread Yi Sun
To construct an extensible framework, we need to analyze the PSR features
and separate the common parts from the feature-specific parts. Then, we
encapsulate them into different data structures.

By analyzing the PSR features, we can get the map below.
              +------+------+------+
   Dom ID --->| Dom0 | Dom1 | ...  |
              +------+------+------+
                  |
                  | cos_id of domain
                  v
 User ----> PSR (per-socket info, selected by socket ID)
              +--------------+---------------+-----+
              | Socket0 Info | Socket 1 Info | ... |
              +--------------+---------------+-----+
                  |
                  |              cos_id=0    cos_id=1    ...
                  |            +-----------+-----------+-----+
                  |-> Ref   :  |   ref 0   |   ref 1   | ... |
                  |            +-----------+-----------+-----+
                  |            +-----------+-----------+-----+
                  |-> L3 CAT:  |   cos 0   |   cos 1   | ... |
                  |            +-----------+-----------+-----+
                  |            +-----------+-----------+-----+
                  |-> L2 CAT:  |   cos 0   |   cos 1   | ... |
                  |            +-----------+-----------+-----+
                  |            +-----------+-----------+-----------+-----------+-----+
                  |-> CDP   :  | cos0 code | cos0 data | cos1 code | cos1 data | ... |
                               +-----------+-----------+-----------+-----------+-----+

So, we need to define a socket info data structure, 'struct
psr_socket_info', to manage information per socket. It contains a
reference count array indexed by COS ID and a feature list to manage
all enabled features. Every entry of the reference count array records
how many domains are using the COS registers of that COS ID. For
example, if L3 CAT and L2 CAT are enabled and Dom1 uses the COS_ID=1
registers of both features to save its CBM values, the layout looks
like below.
        +-------+-------+-------+-----+
        | COS 0 | COS 1 | COS 2 | ... |
        +-------+-------+-------+-----+
L3 CAT  | 0x7ff | 0x1ff | ...   | ... |
        +-------+-------+-------+-----+
L2 CAT  | 0xff  | 0xff  | ...   | ... |
        +-------+-------+-------+-----+

If Dom2 has the same CBM values, it can reuse these COS_ID=1 registers.
That means both Dom1 and Dom2 use the same COS registers (ID=1) to save
the same L3/L2 values, so ref[1] is 2, which means two domains are using
COS_ID 1.

To manage a feature, we need to define a feature node data structure,
'struct feat_node', to hold the feature's specific HW info, its callback
functions (all feature-specific behaviors are encapsulated into these
callbacks), and an array of all COS register values of this feature.

CDP is a special feature which uses two entries of the array for one COS
ID, so the number of CDP COS registers is half that of L3 CAT. E.g. if L3
CAT has 16 COS registers, then CDP has 8 COS registers when it is enabled.
CDP uses the COS registers array as below.

                         +-----------+-----------+-----------+-----------+-----+
CDP cos_reg_val[] index: |     0     |     1     |     2     |     3     | ... |
                         +-----------+-----------+-----------+-----------+-----+
                  value: | cos0 code | cos0 data | cos1 code | cos1 data | ... |
                         +-----------+-----------+-----------+-----------+-----+

For more details, please refer to the spec and the code.
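
To make the layout concrete, here is a small compile-time sketch of the
shapes described above (field names simplified and not identical to the
patch; the code/data pair ordering follows the table above):

    #include <stdint.h>
    #include <stdio.h>

    #define MAX_COS_NUM 128

    /* Simplified per-feature node: HW limits plus one flat array of COS
     * register values (a CDP-like feature uses two entries per COS ID). */
    struct feat_node_sketch {
        unsigned int cbm_len;
        unsigned int cos_max;
        uint64_t cos_reg_val[MAX_COS_NUM];
    };

    /* Simplified per-socket info: one reference count per COS ID, recording
     * how many domains currently point at that COS ID. */
    struct socket_info_sketch {
        unsigned int cos_ref[MAX_COS_NUM];
    };

    /* Pair layout follows the table above: code in the even slot, data in
     * the odd slot of each COS ID's pair. */
    #define cdp_code(f, cos) ((f)->cos_reg_val[(cos) * 2])
    #define cdp_data(f, cos) ((f)->cos_reg_val[(cos) * 2 + 1])

    int main(void)
    {
        struct feat_node_sketch cdp = { .cbm_len = 11, .cos_max = 7 };
        struct socket_info_sketch sock = { { 0 } };

        cdp_code(&cdp, 1) = 0x1ff;   /* cos_reg_val[2] */
        cdp_data(&cdp, 1) = 0x0ff;   /* cos_reg_val[3] */
        sock.cos_ref[1] = 2;         /* e.g. Dom1 and Dom2 both use COS ID 1 */

        printf("COS1 code=0x%llx data=0x%llx ref=%u\n",
               (unsigned long long)cdp_code(&cdp, 1),
               (unsigned long long)cdp_data(&cdp, 1), sock.cos_ref[1]);
        return 0;
    }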

Signed-off-by: Yi Sun 
---
 xen/arch/x86/psr.c | 104 +
 1 file changed, 104 insertions(+)

diff --git a/xen/arch/x86/psr.c b/xen/arch/x86/psr.c
index 96a8589..f7ff3fc 100644
--- a/xen/arch/x86/psr.c
+++ b/xen/arch/x86/psr.c
@@ -17,12 +17,116 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 
+/*
+ * Terminology:
+ * - CAT Cache Allocation Technology
+ * - CBM Capacity BitMasks
+ * - CDP Code and Data Prioritization
+ * - COS/CLOSClass of Service. Also mean COS registers.
+ * - COS_MAX Max number of COS for the feature (minus 1)
+ * - MSRsMachine Specific Registers
+ * - PSR Intel 

[Xen-devel] [PATCH v5 06/24] x86: refactor psr: implement get hw info flow.

2017-01-17 Thread Yi Sun
This patch implements get HW info flow including L3 CAT callback
function.

It also changes sysctl interface to make it more general.

With this patch, 'psr-hwinfo' can work for L3 CAT.

Signed-off-by: Yi Sun 
---
 xen/arch/x86/psr.c| 73 +--
 xen/arch/x86/sysctl.c | 14 +
 xen/include/asm-x86/psr.h |  9 --
 3 files changed, 86 insertions(+), 10 deletions(-)

diff --git a/xen/arch/x86/psr.c b/xen/arch/x86/psr.c
index 7f06235..319bfcc 100644
--- a/xen/arch/x86/psr.c
+++ b/xen/arch/x86/psr.c
@@ -109,6 +109,9 @@ struct feat_node;
 struct feat_ops {
 /* get_cos_max is used to get feature's cos_max. */
 unsigned int (*get_cos_max)(const struct feat_node *feat);
+/* get_feat_info is used to get feature HW info. */
+bool (*get_feat_info)(const struct feat_node *feat,
+  uint32_t data[], unsigned int array_len);
 };
 
 /*
@@ -177,6 +180,23 @@ static void free_feature(struct psr_socket_info *info)
 }
 }
 
+static enum psr_feat_type psr_cbm_type_to_feat_type(enum cbm_type type)
+{
+enum psr_feat_type feat_type;
+
+/* Judge if feature is enabled. */
+switch ( type ) {
+case PSR_CBM_TYPE_L3:
+feat_type = PSR_SOCKET_L3_CAT;
+break;
+default:
+feat_type = 0x;
+break;
+}
+
+return feat_type;
+}
+
 /* L3 CAT functions implementation. */
 static void l3_cat_init_feature(struct cpuid_leaf_regs regs,
 struct feat_node *feat,
@@ -218,8 +238,22 @@ static unsigned int l3_cat_get_cos_max(const struct feat_node *feat)
 return feat->info.l3_cat_info.cos_max;
 }
 
+static bool l3_cat_get_feat_info(const struct feat_node *feat,
+ uint32_t data[], unsigned int array_len)
+{
+if ( !data || 3 > array_len )
+return false;
+
+data[CBM_LEN] = feat->info.l3_cat_info.cbm_len;
+data[COS_MAX] = feat->info.l3_cat_info.cos_max;
+data[PSR_FLAG] = 0;
+
+return true;
+}
+
 static const struct feat_ops l3_cat_ops = {
 .get_cos_max = l3_cat_get_cos_max,
+.get_feat_info = l3_cat_get_feat_info,
 };
 
 static void __init parse_psr_bool(char *s, char *value, char *feature,
@@ -425,10 +459,43 @@ void psr_ctxt_switch_to(struct domain *d)
 }
 }
 
-int psr_get_cat_l3_info(unsigned int socket, uint32_t *cbm_len,
-uint32_t *cos_max, uint32_t *flags)
+static struct psr_socket_info *get_socket_info(unsigned int socket)
 {
-return 0;
+if ( !socket_info )
+return ERR_PTR(-ENODEV);
+
+if ( socket >= nr_sockets )
+return ERR_PTR(-ENOTSOCK);
+
+if ( !socket_info[socket].feat_mask )
+return ERR_PTR(-ENOENT);
+
+return socket_info + socket;
+}
+
+int psr_get_info(unsigned int socket, enum cbm_type type,
+ uint32_t data[], unsigned int array_len)
+{
+const struct psr_socket_info *info = get_socket_info(socket);
+const struct feat_node *feat;
+enum psr_feat_type feat_type;
+
+if ( IS_ERR(info) )
+return PTR_ERR(info);
+
+feat_type = psr_cbm_type_to_feat_type(type);
+list_for_each_entry(feat, &info->feat_list, list)
+{
+if ( feat->feature != feat_type )
+continue;
+
+if ( feat->ops.get_feat_info(feat, data, array_len) )
+return 0;
+else
+return -EINVAL;
+}
+
+return -ENOENT;
 }
 
 int psr_get_l3_cbm(struct domain *d, unsigned int socket,
diff --git a/xen/arch/x86/sysctl.c b/xen/arch/x86/sysctl.c
index 14e7dc7..d90db78 100644
--- a/xen/arch/x86/sysctl.c
+++ b/xen/arch/x86/sysctl.c
@@ -176,15 +176,19 @@ long arch_do_sysctl(
 switch ( sysctl->u.psr_cat_op.cmd )
 {
 case XEN_SYSCTL_PSR_CAT_get_l3_info:
-ret = psr_get_cat_l3_info(sysctl->u.psr_cat_op.target,
-  &sysctl->u.psr_cat_op.u.l3_info.cbm_len,
-  &sysctl->u.psr_cat_op.u.l3_info.cos_max,
-  &sysctl->u.psr_cat_op.u.l3_info.flags);
+{
+uint32_t data[3];
+ret = psr_get_info(sysctl->u.psr_cat_op.target,
+   PSR_CBM_TYPE_L3, data, 3);
+
+sysctl->u.psr_cat_op.u.l3_info.cbm_len = data[CBM_LEN];
+sysctl->u.psr_cat_op.u.l3_info.cos_max = data[COS_MAX];
+sysctl->u.psr_cat_op.u.l3_info.flags   = data[PSR_FLAG];
 
if ( !ret && __copy_field_to_guest(u_sysctl, sysctl, u.psr_cat_op) )
 ret = -EFAULT;
 break;
-
+}
 default:
 ret = -EOPNOTSUPP;
 break;
diff --git a/xen/include/asm-x86/psr.h b/xen/include/asm-x86/psr.h
index 57f47e9..e3b18bc 100644
--- a/xen/include/asm-x86/psr.h
+++ b/xen/include/asm-x86/psr.h
@@ -33,6 +33,11 @@
 /* L3 CDP Enable bit*/
 #define PSR_L3_QOS_CDP_ENABLE_BIT   0x0
 
+/* Used by psr_get_info() */
+#define CBM_LEN  0
+#define 

[Xen-devel] [PATCH v5 01/24] docs: create L2 Cache Allocation Technology (CAT) feature document

2017-01-17 Thread Yi Sun
This patch creates L2 CAT feature document in doc/features/.
It describes details of L2 CAT.

Signed-off-by: Yi Sun 
---
 docs/features/intel_psr_l2_cat.pandoc | 347 ++
 1 file changed, 347 insertions(+)
 create mode 100644 docs/features/intel_psr_l2_cat.pandoc

diff --git a/docs/features/intel_psr_l2_cat.pandoc 
b/docs/features/intel_psr_l2_cat.pandoc
new file mode 100644
index 000..77bd61f
--- /dev/null
+++ b/docs/features/intel_psr_l2_cat.pandoc
@@ -0,0 +1,347 @@
+% Intel L2 Cache Allocation Technology (L2 CAT) Feature
+% Revision 1.0
+
+\clearpage
+
+# Basics
+
+ 
+ Status: **Tech Preview**
+
+Architecture(s): Intel x86
+
+   Component(s): Hypervisor, toolstack
+
+   Hardware: Atom codename Goldmont and beyond CPUs
+ 
+
+# Overview
+
+L2 CAT allows an OS or Hypervisor/VMM to control allocation of a
+CPU's shared L2 cache based on application priority or Class of Service
+(COS). Each CLOS is configured using capacity bitmasks (CBM) which
+represent cache capacity and indicate the degree of overlap and
+isolation between classes. Once L2 CAT is configured, the processor
+allows access to portions of L2 cache according to the established
+class of service.
+
+## Terminology
+
+* CAT Cache Allocation Technology
+* CBM Capacity BitMasks
+* CDP Code and Data Prioritization
+* COS/CLOSClass of Service
+* MSRsMachine Specific Registers
+* PSR Intel Platform Shared Resource
+* VMM Virtual Machine Monitor
+
+# User details
+
+* Feature Enabling:
+
+  Add "psr=cat" to boot line parameter to enable all supported level CAT
+  features.
+
+* xl interfaces:
+
+  1. `psr-cat-show [OPTIONS] domain-id`:
+
+ Show domain L2 or L3 CAT CBM.
+
+ New option `-l` is added.
+ `-l2`: Show cbm for L2 cache.
+ `-l3`: Show cbm for L3 cache.
+
+ If neither `-l2` nor `-l3` is given, show both of them. If either one
+ is not supported, an error message is printed.
+
+  2. `psr-cat-cbm-set [OPTIONS] domain-id cbm`:
+
+ Set domain L2 or L3 CBM.
+
+ New option `-l` is added.
+ `-l2`: Specify cbm for L2 cache.
+ `-l3`: Specify cbm for L3 cache.
+
+ If neither `-l2` nor `-l3` is given, level 3 is the default option.
+
+  3. `psr-hwinfo [OPTIONS]`:
+
+ Show L2 & L3 CAT HW information on every socket.
+
+# Technical details
+
+L2 CAT is a member of the Intel PSR features and part of CAT; it shares
+some base PSR infrastructure in Xen.
+
+## Hardware perspective
+
+L2 CAT defines a new range of MSRs to assign different L2 cache access
+patterns, known as CBMs; each CBM is associated with a COS.
+
+```
+
+                                  +--------------------+----------------+
+   IA32_PQR_ASSOC                 |  MSR (per socket)  |    Address     |
+ +------+-----+------+            +--------------------+----------------+
+ |      | COS |      |            | IA32_L2_QOS_MASK_0 |     0xD10      |
+ +------+--+--+------+            +--------------------+----------------+
+           └--------------------> |        ...         |      ...       |
+                                  +--------------------+----------------+
+                                  | IA32_L2_QOS_MASK_n | 0xD10+n (n<64) |
+                                  +--------------------+----------------+
+```
+
+When a context switch happens, the COS of the VCPU is written to the
+per-thread MSR `IA32_PQR_ASSOC`, and the hardware then enforces L2 cache
+allocation according to the corresponding CBM.
+
+## The relationship between L2 CAT and L3 CAT/CDP
+
+L2 CAT is independent of L3 CAT/CDP: L2 CAT can be enabled while L3
+CAT/CDP is disabled, or L2 CAT and L3 CAT/CDP can both be enabled.
+
+L2 CAT uses a new range of CBM MSRs from 0xD10 ~ 0xD10+n (n<64), following
+the L3 CAT/CDP CBMs, and supports setting L2 cache access patterns that
+differ from those of the L3 cache. Like the L3 CAT/CDP requirement, the set
+bits of an L2 CAT CBM must be contiguous too.
+
+N.B. L2 CAT and L3 CAT/CDP share the same COS field in the same
+associate register `IA32_PQR_ASSOC`, which means one COS associates to a
+pair of L2 CBM and L3 CBM.
+
+Besides, the max COS of L2 CAT may be different from L3 CAT/CDP (or
+other PSR features in future). In some cases, a VM is permitted to have a
+COS that is beyond one (or more) of PSR features but within the others.
+For instance, let's assume the max COS of L2 CAT is 8 but the max COS of
+L3 CAT is 16, when a VM is assigned 9 as COS, the L3 CBM associated to
+COS 9 would be enforced, but for L2 CAT, the behavior is fully open (no
+limit) since COS 9 is beyond the max COS (8) of L2 CAT.
+
+## Design Overview
+
+* Core COS/CBM association
+
+  When enforcing L2 CAT, all cores of domains have the same default
+  COS (COS0) which associated to the fully open CBM (all ones bitmask)
+  to access all L2 cache. 

[Xen-devel] [PATCH v5 02/24] x86: refactor psr: remove L3 CAT/CDP codes.

2017-01-17 Thread Yi Sun
The current cache allocation code in psr.c does not consider
future feature additions and is not easy to extend.

To make psr.c more flexible for adding new features and to follow
the programming principle of being open for extension but closed for
modification, we have to refactor psr.c:
1. Analyze the cache allocation features and abstract general data
   structures.
2. Analyze the init and all other function flows, and abstract all
   steps for which different features may have different implementations.
   Make these steps callback functions and register feature-specific
   functions. Then, the main processes need not change when introducing
   a new feature.

Because the amount of refactored code is big and the logic changes
a lot, only modifying the old code would confuse reviewers, who would
have to understand both the old code and the new implementation. After
review iterations from V1 to V3, Jan proposed to remove all the old
cache allocation code first, then implement the new code step by step.
This helps to make the code more easily reviewable.

There is no construction without destruction. So, this patch removes
all the current L3 CAT/CDP code in psr.c. The following patches will
introduce the new mechanism.

Signed-off-by: Yi Sun 
Acked-by: Jan Beulich 
---
 xen/arch/x86/psr.c | 470 +
 1 file changed, 5 insertions(+), 465 deletions(-)

diff --git a/xen/arch/x86/psr.c b/xen/arch/x86/psr.c
index 0b5073c..96a8589 100644
--- a/xen/arch/x86/psr.c
+++ b/xen/arch/x86/psr.c
@@ -23,24 +23,6 @@
 #define PSR_CAT(1<<1)
 #define PSR_CDP(1<<2)
 
-struct psr_cat_cbm {
-union {
-uint64_t cbm;
-struct {
-uint64_t code;
-uint64_t data;
-};
-};
-unsigned int ref;
-};
-
-struct psr_cat_socket_info {
-unsigned int cbm_len;
-unsigned int cos_max;
-struct psr_cat_cbm *cos_to_cbm;
-spinlock_t cbm_lock;
-};
-
 struct psr_assoc {
 uint64_t val;
 uint64_t cos_mask;
@@ -48,26 +30,11 @@ struct psr_assoc {
 
 struct psr_cmt *__read_mostly psr_cmt;
 
-static unsigned long *__read_mostly cat_socket_enable;
-static struct psr_cat_socket_info *__read_mostly cat_socket_info;
-static unsigned long *__read_mostly cdp_socket_enable;
-
 static unsigned int opt_psr;
 static unsigned int __initdata opt_rmid_max = 255;
-static unsigned int __read_mostly opt_cos_max = 255;
 static uint64_t rmid_mask;
 static DEFINE_PER_CPU(struct psr_assoc, psr_assoc);
 
-static struct psr_cat_cbm *temp_cos_to_cbm;
-
-static unsigned int get_socket_cpu(unsigned int socket)
-{
-if ( likely(socket < nr_sockets) )
-return cpumask_any(socket_cpumask[socket]);
-
-return nr_cpu_ids;
-}
-
 static void __init parse_psr_bool(char *s, char *value, char *feature,
   unsigned int mask)
 {
@@ -107,9 +74,6 @@ static void __init parse_psr_param(char *s)
 if ( val_str && !strcmp(s, "rmid_max") )
 opt_rmid_max = simple_strtoul(val_str, NULL, 0);
 
-if ( val_str && !strcmp(s, "cos_max") )
-opt_cos_max = simple_strtoul(val_str, NULL, 0);
-
 s = ss + 1;
 } while ( ss );
 }
@@ -213,16 +177,7 @@ static inline void psr_assoc_init(void)
 {
struct psr_assoc *psra = &this_cpu(psr_assoc);
 
-if ( cat_socket_info )
-{
-unsigned int socket = cpu_to_socket(smp_processor_id());
-
-if ( test_bit(socket, cat_socket_enable) )
-psra->cos_mask = ((1ull << get_count_order(
- cat_socket_info[socket].cos_max)) - 1) << 32;
-}
-
-if ( psr_cmt_enabled() || psra->cos_mask )
+if ( psr_cmt_enabled() )
 rdmsrl(MSR_IA32_PSR_ASSOC, psra->val);
 }
 
@@ -231,12 +186,6 @@ static inline void psr_assoc_rmid(uint64_t *reg, unsigned 
int rmid)
 *reg = (*reg & ~rmid_mask) | (rmid & rmid_mask);
 }
 
-static inline void psr_assoc_cos(uint64_t *reg, unsigned int cos,
- uint64_t cos_mask)
-{
-*reg = (*reg & ~cos_mask) | (((uint64_t)cos << 32) & cos_mask);
-}
-
 void psr_ctxt_switch_to(struct domain *d)
 {
struct psr_assoc *psra = &this_cpu(psr_assoc);
@@ -245,459 +194,54 @@ void psr_ctxt_switch_to(struct domain *d)
 if ( psr_cmt_enabled() )
psr_assoc_rmid(&reg, d->arch.psr_rmid);
 
-if ( psra->cos_mask )
-psr_assoc_cos(&reg, d->arch.psr_cos_ids ?
-  d->arch.psr_cos_ids[cpu_to_socket(smp_processor_id())] :
-  0, psra->cos_mask);
-
 if ( reg != psra->val )
 {
 wrmsrl(MSR_IA32_PSR_ASSOC, reg);
 psra->val = reg;
 }
 }
-static struct psr_cat_socket_info *get_cat_socket_info(unsigned int socket)
-{
-if ( !cat_socket_info )
-return ERR_PTR(-ENODEV);
-
-if ( socket >= nr_sockets )
-return ERR_PTR(-ENOTSOCK);
-
-if ( !test_bit(socket, cat_socket_enable) )
-return ERR_PTR(-ENOENT);
-
-return 

[Xen-devel] [PATCH v5 09/24] x86: refactor psr: set value: assemble features value array.

2017-01-17 Thread Yi Sun
Only one COS ID can be used by a domain at one time. That means all enabled
features' COS registers at this COS ID are in effect for this domain at that
time.

When the user updates a feature's value, we need to make sure all other
features' values are not affected. So, we first need to assemble an array
which contains all features' current values and replace the value of the
feature being set with the new input value.

Then, we can try to find a COS ID on which all features' COS register values
are the same as the array. If we find one, we just use this COS ID. If not,
we need to allocate a new COS ID.

This patch implements the value array assembling flow.
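
A minimal, self-contained sketch of the assembling step (plain C, invented
names; a CDP-like feature would occupy two slots and pick which of them to
overwrite):

    #include <stdint.h>
    #include <stdio.h>
    #include <string.h>

    /* Illustrative feature description: entries per COS and the values it
     * currently holds at the domain's old COS ID. */
    struct feat_sketch {
        const char *name;
        unsigned int cos_num;    /* 1 for CAT, 2 for a CDP-like feature */
        uint64_t old_val[2];     /* values at the old COS ID */
    };

    /* Copy every feature's old values into the flat array in list order, then
     * overwrite the first slot of the feature being changed. */
    static unsigned int assemble(uint64_t *val,
                                 const struct feat_sketch *feats, unsigned int n,
                                 unsigned int target, uint64_t new_val)
    {
        unsigned int pos = 0;

        for ( unsigned int f = 0; f < n; f++ )
        {
            memcpy(&val[pos], feats[f].old_val,
                   feats[f].cos_num * sizeof(uint64_t));
            if ( f == target )
                val[pos] = new_val;    /* replace only the feature being set */
            pos += feats[f].cos_num;
        }

        return pos;                    /* total number of entries used */
    }

    int main(void)
    {
        const struct feat_sketch feats[] = {
            { "L3 CAT", 1, { 0x7ff } },
            { "L2 CAT", 1, { 0xff } },
        };
        uint64_t val[4];
        unsigned int len = assemble(val, feats, 2, 0, 0x1ff);

        /* Prints val[0] = 0x1ff and val[1] = 0xff. */
        for ( unsigned int i = 0; i < len; i++ )
            printf("val[%u] = 0x%llx\n", i, (unsigned long long)val[i]);
        return 0;
    }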

Signed-off-by: Yi Sun 
---
 xen/arch/x86/psr.c | 145 +++--
 1 file changed, 142 insertions(+), 3 deletions(-)

diff --git a/xen/arch/x86/psr.c b/xen/arch/x86/psr.c
index 050b0df..7c6f2bf 100644
--- a/xen/arch/x86/psr.c
+++ b/xen/arch/x86/psr.c
@@ -115,6 +115,32 @@ struct feat_ops {
 /* get_val is used to get feature COS register value. */
 bool (*get_val)(const struct feat_node *feat, unsigned int cos,
 enum cbm_type type, uint64_t *val);
+/*
+ * get_cos_num is used to get the COS registers amount used by the
+ * feature for one setting, e.g. CDP uses 2 COSs but CAT uses 1.
+ */
+unsigned int (*get_cos_num)(const struct feat_node *feat);
+/*
+ * get_old_val and set_new_val are a pair of functions called in order.
+ * The caller will traverse all features in the list and call both
+ * functions for every feature to do below two things:
+ * 1. get old_cos register value of all supported features and
+ * 2. set the new value for the feature.
+ *
+ * All the values are set into value array according the traversal order,
+ * meaning the same order of feature list members.
+ *
+ * The return value meaning:
+ * 0 - success.
+ * negative - error.
+ */
+int (*get_old_val)(uint64_t val[],
+   const struct feat_node *feat,
+   unsigned int old_cos);
+int (*set_new_val)(uint64_t val[],
+   const struct feat_node *feat,
+   enum cbm_type type,
+   uint64_t m);
 };
 
 /*
@@ -200,6 +226,29 @@ static enum psr_feat_type psr_cbm_type_to_feat_type(enum cbm_type type)
 return feat_type;
 }
 
+static bool psr_check_cbm(unsigned int cbm_len, uint64_t cbm)
+{
+unsigned int first_bit, zero_bit;
+
+/* Set bits should only in the range of [0, cbm_len). */
+if ( cbm & (~0ull << cbm_len) )
+return false;
+
+/* At least one bit need to be set. */
+if ( cbm == 0 )
+return false;
+
+first_bit = find_first_bit(&cbm, cbm_len);
+zero_bit = find_next_zero_bit(&cbm, cbm_len, first_bit);
+
+/* Set bits should be contiguous. */
+if ( zero_bit < cbm_len &&
+ find_next_bit(&cbm, cbm_len, zero_bit) < cbm_len )
+return false;
+
+return true;
+}
+
 /* L3 CAT functions implementation. */
 static void l3_cat_init_feature(struct cpuid_leaf_regs regs,
 struct feat_node *feat,
@@ -266,10 +315,45 @@ static bool l3_cat_get_val(const struct feat_node *feat, unsigned int cos,
 return true;
 }
 
+static unsigned int l3_cat_get_cos_num(const struct feat_node *feat)
+{
+return 1;
+}
+
+static int l3_cat_get_old_val(uint64_t val[],
+  const struct feat_node *feat,
+  unsigned int old_cos)
+{
+if ( old_cos > feat->info.l3_cat_info.cos_max )
+/* Use default value. */
+old_cos = 0;
+
+/* CAT */
+val[0] =  feat->cos_reg_val[old_cos];
+
+return 0;
+}
+
+static int l3_cat_set_new_val(uint64_t val[],
+  const struct feat_node *feat,
+  enum cbm_type type,
+  uint64_t m)
+{
+if ( !psr_check_cbm(feat->info.l3_cat_info.cbm_len, m) )
+return -EINVAL;
+
+val[0] = m;
+
+return 0;
+}
+
 static const struct feat_ops l3_cat_ops = {
 .get_cos_max = l3_cat_get_cos_max,
 .get_feat_info = l3_cat_get_feat_info,
 .get_val = l3_cat_get_val,
+.get_cos_num = l3_cat_get_cos_num,
+.get_old_val = l3_cat_get_old_val,
+.set_new_val = l3_cat_set_new_val,
 };
 
 static void __init parse_psr_bool(char *s, char *value, char *feature,
@@ -542,7 +626,14 @@ int psr_get_val(struct domain *d, unsigned int socket,
 /* Set value functions */
 static unsigned int get_cos_num(const struct psr_socket_info *info)
 {
-return 0;
+const struct feat_node *feat_tmp;
+unsigned int num = 0;
+
+/* Get all features total amount. */
+list_for_each_entry(feat_tmp, &info->feat_list, list)
+num += feat_tmp->ops.get_cos_num(feat_tmp);
+
+return num;
 }
 
 static int assemble_val_array(uint64_t *val,
@@ -550,7 +641,25 @@ static int 

[Xen-devel] [PATCH v5 08/24] x86: refactor psr: set value: implement framework.

2017-01-17 Thread Yi Sun
As the set value flow is the most complicated one in psr, it is divided into
several patches to make things clearer. This patch implements the set value
framework first, to show the whole picture.

It also changes the domctl interface to make it more general.

To make the set value flow general and able to support multiple features at
the same time, it includes the steps below:
1. Get the COS ID the current domain is using.
2. Assemble a value array that stores every feature's current value and
   replace the value of the feature being set with the new input value.
3. Check whether there is already a COS ID on which all features' values are
   the same as the array. If so, we can reuse this COS ID.
4. If none is found, we need to pick an available COS ID. Only a COS ID whose
   ref is 0 or 1 can be picked.
5. Write all features' MSRs according to the COS ID.
6. Update ref according to the COS ID.
7. Save the COS ID into the current domain's psr_cos_ids[socket] so that we
   know which COS the domain is using on the socket.

So, some functions are abstracted here, and the callback functions will be
implemented in the next patches.

Here is an example to illustrate the process. The CPU supports two features,
e.g. L3 CAT and L2 CAT. The user wants to set the L3 CAT of Dom1 to 0x1ff.
1. Get the old_cos of Dom1, which is 0. L3 CAT is the first element of the
feature list. The COS register values are as below at this time.
        +-------+-------+-------+-----+
        | COS 0 | COS 1 | COS 2 | ... |
        +-------+-------+-------+-----+
L3 CAT  | 0x7ff | ...   | ...   | ... |
        +-------+-------+-------+-----+
L2 CAT  | 0xff  | ...   | ...   | ... |
        +-------+-------+-------+-----+

2. Assemble the value array to be:
val[0]: 0x1ff
val[1]: 0xff

3. It cannot find a matching COS.

4. Allocate COS 1 to store the value set.

5. Write the COS 1 registers. The COS register values are now as below.
        +-------+-------+-------+-----+
        | COS 0 | COS 1 | COS 2 | ... |
        +-------+-------+-------+-----+
L3 CAT  | 0x7ff | 0x1ff | ...   | ... |
        +-------+-------+-------+-----+
L2 CAT  | 0xff  | 0xff  | ...   | ... |
        +-------+-------+-------+-----+

6. The ref[1] is increased to 1 because Dom1 is using it now.

7. Save 1 to Dom1's psr_cos_ids[socket].

Then, the user wants to set the L3 CAT of Dom2 to 0x1ff too. The old_cos of
Dom2 is also 0. Repeat the above flow.

The val array assembled is:
val[0]: 0x1ff
val[1]: 0xff

So, a matching COS, COS 1, can be found and reused for Dom2.

The ref[1] is increased to 2 now because both Dom1 and Dom2 are using this
COS ID. Save 1 to Dom2's psr_cos_ids[socket].
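
A toy sketch of the bookkeeping in steps 6 and 7 above (standalone C,
illustrative array sizes and names only; the real code also has special
handling for the default COS 0):

    #include <stdio.h>

    #define NR_COS  16
    #define NR_DOMS 8

    /* ref[] counts how many domains use each COS ID; dom_cos[] remembers
     * which COS each domain uses on one socket. */
    static unsigned int ref[NR_COS];
    static unsigned int dom_cos[NR_DOMS];

    static void use_cos(unsigned int dom, unsigned int cos)
    {
        unsigned int old = dom_cos[dom];

        if ( old == cos )
            return;

        if ( ref[old] )
            ref[old]--;          /* drop the reference on the old COS ID */
        ref[cos]++;              /* take a reference on the new COS ID */
        dom_cos[dom] = cos;
    }

    int main(void)
    {
        ref[0] = 2;              /* Dom1 and Dom2 both start on the default COS 0 */

        use_cos(1, 1);           /* Dom1 moves to COS 1 */
        use_cos(2, 1);           /* Dom2 ends up reusing COS 1 */

        printf("ref[0]=%u ref[1]=%u\n", ref[0], ref[1]);   /* prints 0 and 2 */
        return 0;
    }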

Signed-off-by: Yi Sun 
---
 xen/arch/x86/domctl.c |  18 ++---
 xen/arch/x86/psr.c| 202 +-
 xen/include/asm-x86/psr.h |   4 +-
 3 files changed, 210 insertions(+), 14 deletions(-)

diff --git a/xen/arch/x86/domctl.c b/xen/arch/x86/domctl.c
index 11d2127..db56500 100644
--- a/xen/arch/x86/domctl.c
+++ b/xen/arch/x86/domctl.c
@@ -1365,21 +1365,21 @@ long arch_do_domctl(
 switch ( domctl->u.psr_cat_op.cmd )
 {
 case XEN_DOMCTL_PSR_CAT_OP_SET_L3_CBM:
-ret = psr_set_l3_cbm(d, domctl->u.psr_cat_op.target,
- domctl->u.psr_cat_op.data,
- PSR_CBM_TYPE_L3);
+ret = psr_set_val(d, domctl->u.psr_cat_op.target,
+  domctl->u.psr_cat_op.data,
+  PSR_CBM_TYPE_L3);
 break;
 
 case XEN_DOMCTL_PSR_CAT_OP_SET_L3_CODE:
-ret = psr_set_l3_cbm(d, domctl->u.psr_cat_op.target,
- domctl->u.psr_cat_op.data,
- PSR_CBM_TYPE_L3_CODE);
+ret = psr_set_val(d, domctl->u.psr_cat_op.target,
+  domctl->u.psr_cat_op.data,
+  PSR_CBM_TYPE_L3_CODE);
 break;
 
 case XEN_DOMCTL_PSR_CAT_OP_SET_L3_DATA:
-ret = psr_set_l3_cbm(d, domctl->u.psr_cat_op.target,
- domctl->u.psr_cat_op.data,
- PSR_CBM_TYPE_L3_DATA);
+ret = psr_set_val(d, domctl->u.psr_cat_op.target,
+  domctl->u.psr_cat_op.data,
+  PSR_CBM_TYPE_L3_DATA);
 break;
 
 case XEN_DOMCTL_PSR_CAT_OP_GET_L3_CBM:
diff --git a/xen/arch/x86/psr.c b/xen/arch/x86/psr.c
index 3cbb60c..050b0df 100644
--- a/xen/arch/x86/psr.c
+++ b/xen/arch/x86/psr.c
@@ -539,18 +539,214 @@ int psr_get_val(struct domain *d, unsigned int socket,
 return -ENOENT;
 }
 
-int psr_set_l3_cbm(struct domain *d, unsigned int socket,
-   uint64_t cbm, enum cbm_type type)
+/* Set value functions */
+static unsigned int get_cos_num(const struct psr_socket_info *info)
 {
 return 0;
 }
 
+static int assemble_val_array(uint64_t *val,
+   

[Xen-devel] [xen-unstable-smoke test] 104232: tolerable all pass - PUSHED

2017-01-17 Thread osstest service owner
flight 104232 xen-unstable-smoke real [real]
http://logs.test-lab.xenproject.org/osstest/logs/104232/

Failures :-/ but no regressions.

Tests which did not succeed, but are not blocking:
 test-amd64-amd64-libvirt 12 migrate-support-checkfail   never pass
 test-armhf-armhf-xl  12 migrate-support-checkfail   never pass
 test-armhf-armhf-xl  13 saverestore-support-checkfail   never pass

version targeted for testing:
 xen  ac94372254232a3ba0ecbb5ae8034cbcc0eeac00
baseline version:
 xen  5ad98e3c7fa92f46d77a788e1109b7d282bd1256

Last test of basis   104206  2017-01-17 12:01:29 Z0 days
Testing same since   104232  2017-01-18 00:01:38 Z0 days1 attempts


People who touched revisions under test:
  Julien Grall 

jobs:
 build-amd64  pass
 build-armhf  pass
 build-amd64-libvirt  pass
 test-armhf-armhf-xl  pass
 test-amd64-amd64-xl-qemuu-debianhvm-i386 pass
 test-amd64-amd64-libvirt pass



sg-report-flight on osstest.test-lab.xenproject.org
logs: /home/logs/logs
images: /home/logs/images

Logs, config files, etc. are available at
http://logs.test-lab.xenproject.org/osstest/logs

Explanation of these reports, and of osstest in general, is at
http://xenbits.xen.org/gitweb/?p=osstest.git;a=blob;f=README.email;hb=master
http://xenbits.xen.org/gitweb/?p=osstest.git;a=blob;f=README;hb=master

Test harness code can be found at
http://xenbits.xen.org/gitweb?p=osstest.git;a=summary


Pushing revision :

+ branch=xen-unstable-smoke
+ revision=ac94372254232a3ba0ecbb5ae8034cbcc0eeac00
+ . ./cri-lock-repos
++ . ./cri-common
+++ . ./cri-getconfig
+++ umask 002
+++ getrepos
 getconfig Repos
 perl -e '
use Osstest;
readglobalconfig();
print $c{"Repos"} or die $!;
'
+++ local repos=/home/osstest/repos
+++ '[' -z /home/osstest/repos ']'
+++ '[' '!' -d /home/osstest/repos ']'
+++ echo /home/osstest/repos
++ repos=/home/osstest/repos
++ repos_lock=/home/osstest/repos/lock
++ '[' x '!=' x/home/osstest/repos/lock ']'
++ OSSTEST_REPOS_LOCK_LOCKED=/home/osstest/repos/lock
++ exec with-lock-ex -w /home/osstest/repos/lock ./ap-push xen-unstable-smoke 
ac94372254232a3ba0ecbb5ae8034cbcc0eeac00
+ branch=xen-unstable-smoke
+ revision=ac94372254232a3ba0ecbb5ae8034cbcc0eeac00
+ . ./cri-lock-repos
++ . ./cri-common
+++ . ./cri-getconfig
+++ umask 002
+++ getrepos
 getconfig Repos
 perl -e '
use Osstest;
readglobalconfig();
print $c{"Repos"} or die $!;
'
+++ local repos=/home/osstest/repos
+++ '[' -z /home/osstest/repos ']'
+++ '[' '!' -d /home/osstest/repos ']'
+++ echo /home/osstest/repos
++ repos=/home/osstest/repos
++ repos_lock=/home/osstest/repos/lock
++ '[' x/home/osstest/repos/lock '!=' x/home/osstest/repos/lock ']'
+ . ./cri-common
++ . ./cri-getconfig
++ umask 002
+ select_xenbranch
+ case "$branch" in
+ tree=xen
+ xenbranch=xen-unstable-smoke
+ qemuubranch=qemu-upstream-unstable
+ '[' xxen = xlinux ']'
+ linuxbranch=
+ '[' xqemu-upstream-unstable = x ']'
+ select_prevxenbranch
++ ./cri-getprevxenbranch xen-unstable-smoke
+ prevxenbranch=xen-4.8-testing
+ '[' xac94372254232a3ba0ecbb5ae8034cbcc0eeac00 = x ']'
+ : tested/2.6.39.x
+ . ./ap-common
++ : osst...@xenbits.xen.org
+++ getconfig OsstestUpstream
+++ perl -e '
use Osstest;
readglobalconfig();
print $c{"OsstestUpstream"} or die $!;
'
++ :
++ : git://xenbits.xen.org/xen.git
++ : osst...@xenbits.xen.org:/home/xen/git/xen.git
++ : git://xenbits.xen.org/qemu-xen-traditional.git
++ : git://git.kernel.org
++ : git://git.kernel.org/pub/scm/linux/kernel/git
++ : git
++ : git://xenbits.xen.org/xtf.git
++ : osst...@xenbits.xen.org:/home/xen/git/xtf.git
++ : git://xenbits.xen.org/xtf.git
++ : git://xenbits.xen.org/libvirt.git
++ : osst...@xenbits.xen.org:/home/xen/git/libvirt.git
++ : git://xenbits.xen.org/libvirt.git
++ : git://xenbits.xen.org/osstest/rumprun.git
++ : git
++ : git://xenbits.xen.org/osstest/rumprun.git
++ : osst...@xenbits.xen.org:/home/xen/git/osstest/rumprun.git
++ : git://git.seabios.org/seabios.git
++ : osst...@xenbits.xen.org:/home/xen/git/osstest/seabios.git
++ : git://xenbits.xen.org/osstest/seabios.git
++ : https://github.com/tianocore/edk2.git
++ : osst...@xenbits.xen.org:/home/xen/git/osstest/ovmf.git
++ : git://xenbits.xen.org/osstest/ovmf.git
++ : git://xenbits.xen.org/osstest/linux-firmware.git
++ : osst...@xenbits.xen.org:/home/osstest/ext/linux-firmware.git
++ : 

[Xen-devel] [PATCH] xen: credit2: improve debug dump output.

2017-01-17 Thread Dario Faggioli
The scheduling information debug dump for Credit2 is hard
to read as it contains the same information repeated
multiple times in different ways.

In fact, in Credit2, CPUs are grouped in runqueues. Before
this change, for each CPU, we were printing the whole
content of the runqueue, as shown below:

 CPU[00]  sibling=03, core=ff
run: [32767.0] flags=0 cpu=0 credit=-1073741824 [w=0] load=0 (~0%)
  1: [0.0] flags=0 cpu=2 credit=3860932 [w=256] load=262144 (~100%)
  2: [0.1] flags=0 cpu=2 credit=3859906 [w=256] load=262144 (~100%)
 CPU[01]  sibling=03, core=ff
run: [32767.1] flags=0 cpu=1 credit=-1073741824 [w=0] load=0 (~0%)
  1: [0.0] flags=0 cpu=2 credit=2859840 [w=256] load=262144 (~100%)
  2: [0.3] flags=0 cpu=2 credit=-17466062 [w=256] load=262144 (~100%)
 CPU[02]  sibling=0c, core=ff
run: [0.0] flags=2 cpu=2 credit=1858628 [w=256] load=262144 (~100%)
  1: [0.3] flags=0 cpu=2 credit=-17466062 [w=256] load=262144 (~100%)
  2: [0.1] flags=0 cpu=2 credit=-23957055 [w=256] load=262144 (~100%)
 CPU[03]  sibling=0c, core=ff
run: [32767.3] flags=0 cpu=3 credit=-1073741824 [w=0] load=0 (~0%)
  1: [0.1] flags=0 cpu=2 credit=-3957055 [w=256] load=262144 (~100%)
  2: [0.0] flags=0 cpu=2 credit=-6216254 [w=256] load=262144 (~100%)
 CPU[04]  sibling=30, core=ff
run: [32767.4] flags=0 cpu=4 credit=-1073741824 [w=0] load=0 (~0%)
  1: [0.1] flags=0 cpu=2 credit=3782667 [w=256] load=262144 (~100%)
  2: [0.3] flags=0 cpu=2 credit=-16287483 [w=256] load=262144 (~100%)

As can be seen, every CPU prints the whole content
of the runqueue it belongs to, at the time of its
sampling, which is cumbersome and hard to interpret!

In the new output format we print, for each CPU, only the vCPU
that is running there (unless that is the idle vCPU, in which
case nothing is printed), while the runqueue content
is printed only once, in a dedicated section.

An example:

 CPUs info:
 CPU[02]  runq=0, sibling=0c, core=ff
run: [0.3] flags=2 cpu=2 credit=8054391 [w=256] load=262144 (~100%)
 CPU[14]  runq=1, sibling=00c000, core=00ff00
run: [0.4] flags=2 cpu=14 credit=8771420 [w=256] load=262144 (~100%)
 ... ... ... ... ... ... ... ... ...
 Runqueue info:
 runqueue 0:
  0: [0.1] flags=0 cpu=2 credit=7869771 [w=256] load=262144 (~100%)
  1: [0.0] flags=0 cpu=2 credit=7709649 [w=256] load=262144 (~100%)
 runqueue 1:
  0: [0.5] flags=0 cpu=14 credit=-1188 [w=256] load=262144 (~100%)

Note that there is still a risk of inconsistency between
what is printed in the 'Runqueue info:' and 'CPUs info:'
sections. That is unavoidable, as the relevant locks are
released and re-acquired around each single operation.

At least, the inconsistency is less severe than before.

Signed-off-by: Dario Faggioli 
---
Cc: George Dunlap 
Cc: Anshul Makkar 
---
 xen/common/sched_credit2.c |   50 ++--
 xen/common/schedule.c  |1 +
 2 files changed, 30 insertions(+), 21 deletions(-)

diff --git a/xen/common/sched_credit2.c b/xen/common/sched_credit2.c
index ef8e0d8..90fe591 100644
--- a/xen/common/sched_credit2.c
+++ b/xen/common/sched_credit2.c
@@ -2581,50 +2581,35 @@ static void
 csched2_dump_pcpu(const struct scheduler *ops, int cpu)
 {
 struct csched2_private *prv = CSCHED2_PRIV(ops);
-struct list_head *runq, *iter;
 struct csched2_vcpu *svc;
 unsigned long flags;
 spinlock_t *lock;
-int loop;
 #define cpustr keyhandler_scratch
 
 /*
  * We need both locks:
+ * - we print current, so we need the runqueue lock for this
+ *   cpu (the one of the runqueue this cpu is associated to);
  * - csched2_dump_vcpu() wants to access domains' weights,
- *   which are protected by the private scheduler lock;
- * - we scan through the runqueue, so we need the proper runqueue
- *   lock (the one of the runqueue this cpu is associated to).
+ *   which are protected by the private scheduler lock.
  */
 read_lock_irqsave(&prv->lock, flags);
 lock = per_cpu(schedule_data, cpu).schedule_lock;
 spin_lock(lock);
 
-runq = &RQD(ops, cpu)->runq;
-
 cpumask_scnprintf(cpustr, sizeof(cpustr), per_cpu(cpu_sibling_mask, cpu));
-printk(" sibling=%s, ", cpustr);
+printk(" runq=%d, sibling=%s, ", c2r(ops, cpu), cpustr);
 cpumask_scnprintf(cpustr, sizeof(cpustr), per_cpu(cpu_core_mask, cpu));
 printk("core=%s\n", cpustr);
 
-/* current VCPU */
+/* current VCPU (nothing to say if that's the idle vcpu) */
 svc = CSCHED2_VCPU(curr_on_cpu(cpu));
-if ( svc )
+if ( svc && !is_idle_vcpu(svc->vcpu) )
 {
 printk("\trun: ");
 csched2_dump_vcpu(prv, svc);
 }
 
-loop = 0;
-list_for_each( iter, runq )
-{
-svc = __runq_elem(iter);
-if ( svc )
-{
-printk("\t%3d: ", ++loop);
-

[Xen-devel] [PATCH] xen: credit2: clear bit instead of skip step in runq_tickle()

2017-01-17 Thread Dario Faggioli
Since we are doing cpumask manipulation already, clear a bit
in the mask at once. Doing that will save us an if, later in
the code.

No functional change intended.

Signed-off-by: Dario Faggioli 
---
Cc: George Dunlap 
---
 xen/common/sched_credit2.c |5 ++---
 1 file changed, 2 insertions(+), 3 deletions(-)

diff --git a/xen/common/sched_credit2.c b/xen/common/sched_credit2.c
index ef8e0d8..d086264 100644
--- a/xen/common/sched_credit2.c
+++ b/xen/common/sched_credit2.c
@@ -985,7 +985,7 @@ runq_tickle(const struct scheduler *ops, struct 
csched2_vcpu *new, s_time_t now)
 cpumask_andnot(&mask, &rqd->active, &rqd->idle);
 cpumask_andnot(&mask, &mask, &rqd->tickled);
 cpumask_and(&mask, &mask, new->vcpu->cpu_hard_affinity);
-if ( cpumask_test_cpu(cpu, &mask) )
+if ( __cpumask_test_and_clear_cpu(cpu, &mask) )
 {
 cur = CSCHED2_VCPU(curr_on_cpu(cpu));
 burn_credits(rqd, cur, now);
@@ -1001,8 +1001,7 @@ runq_tickle(const struct scheduler *ops, struct 
csched2_vcpu *new, s_time_t now)
 for_each_cpu(i, &mask)
 {
 /* Already looked at this one above */
-if ( i == cpu )
-continue;
+ASSERT(i != cpu);
 
 cur = CSCHED2_VCPU(curr_on_cpu(i));
 


___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel


Re: [Xen-devel] PV audio drivers for Linux

2017-01-17 Thread Stefano Stabellini
On Tue, 17 Jan 2017, Ughreja, Rakesh A wrote:
> Hi,
> 
> I am trying to develop PV audio drivers and facing one issue to 
> achieve zero copy of the buffers between Front End (DOM1) and 
> Back End (DOM0) drivers.

You might want to take a look at the existing PV sound proposal:

http://marc.info/?l=xen-devel&m=148094319010445


> When the buffer is allocated using __get_free_pages() on the DOM0 
> OS, I am able to grant the access using gnttab_grant_foreign_access() 
> to DOM1 as well as I am able to map it in the DOM1 virtual space 
> using xenbus_map_ring_valloc().
> 
> However the existing audio driver allocates buffer using 
> dma_alloc_coherent(). In that case I am able to grant the access using 
> gnttab_grant_foreign_access() to DOM1 but when I try to map in the 
> DOM1 virtual space using xenbus_map_ring_valloc(), it returns an error.
> 
> [1] Code returns from here.
> 
> 507 xenbus_dev_fatal(dev, map[i].status,
> 508  "mapping in shared page %d from 
> domain %d",
> 509  gnt_refs[i], dev->otherend_id);
> 
> gnttab_batch_map(map, i) is unable to map the page, but I am unable to 
> understand why. May be its due to the difference in the way buffers
> are allocated dma_alloc_coherent() vs __get_free_pages().
> 
> Since I don't want to touch existing audio driver, I need to figure out 
> how to map buffer to DOM1 space with dma_alloc_coherent().
> 
> Any pointers would be really helpful. Thank you in advance.

Pages allocated by dma_alloc_coherent can be a bit special. Are you
going through the swiotlb-xen
(drivers/xen/swiotlb-xen.c:xen_swiotlb_alloc_coherent) in Dom0?

I would probably add a few printks to Xen in
xen/common/grant_table.c:do_grant_table_op to understand what exactly
the error is.
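
As a very rough, untested illustration of that suggestion (the hook point
inside gnttab_map_grant_ref()'s per-op loop, the message and the fields
printed are all assumptions, not a proposed patch), something along these
lines would show which reference fails and why:

    __gnttab_map_grant_ref(&op);

    /* Sketch: report the per-op status, which is what the frontend later
     * sees in map[i].status. */
    if ( op.status != GNTST_okay )
        gdprintk(XENLOG_WARNING,
                 "map_grant_ref: ref %u from dom %u failed, status %d\n",
                 op.ref, op.dom, op.status);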

___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH] xen/arm: Don't mix GFN and MFN when using iomem_deny_access

2017-01-17 Thread Stefano Stabellini
On Tue, 17 Jan 2017, Julien Grall wrote:
> iomem_deny_access is working on MFN and not GFN. Make it clear by
> renaming the local variables.
> 
> Signed-off-by: Julien Grall 

Reviewed-by: Stefano Stabellini 


>  xen/arch/arm/domain_build.c |  6 +++---
>  xen/arch/arm/gic-v2.c   | 18 +-
>  xen/arch/arm/gic-v3.c   | 18 +-
>  3 files changed, 21 insertions(+), 21 deletions(-)
> 
> diff --git a/xen/arch/arm/domain_build.c b/xen/arch/arm/domain_build.c
> index 07b868d..63301e6 100644
> --- a/xen/arch/arm/domain_build.c
> +++ b/xen/arch/arm/domain_build.c
> @@ -1373,7 +1373,7 @@ static int acpi_iomem_deny_access(struct domain *d)
>  {
>  acpi_status status;
>  struct acpi_table_spcr *spcr = NULL;
> -unsigned long gfn;
> +unsigned long mfn;
>  int rc;
>  
>  /* Firstly permit full MMIO capabilities. */
> @@ -1391,9 +1391,9 @@ static int acpi_iomem_deny_access(struct domain *d)
>  return -EINVAL;
>  }
>  
> -gfn = spcr->serial_port.address >> PAGE_SHIFT;
> +mfn = spcr->serial_port.address >> PAGE_SHIFT;
>  /* Deny MMIO access for UART */
> -rc = iomem_deny_access(d, gfn, gfn + 1);
> +rc = iomem_deny_access(d, mfn, mfn + 1);
>  if ( rc )
>  return rc;
>  
> diff --git a/xen/arch/arm/gic-v2.c b/xen/arch/arm/gic-v2.c
> index 9245e7d..cd8e504 100644
> --- a/xen/arch/arm/gic-v2.c
> +++ b/xen/arch/arm/gic-v2.c
> @@ -991,26 +991,26 @@ static void __init gicv2_dt_init(void)
>  static int gicv2_iomem_deny_access(const struct domain *d)
>  {
>  int rc;
> -unsigned long gfn, nr;
> +unsigned long mfn, nr;
>  
> -gfn = dbase >> PAGE_SHIFT;
> -rc = iomem_deny_access(d, gfn, gfn + 1);
> +mfn = dbase >> PAGE_SHIFT;
> +rc = iomem_deny_access(d, mfn, mfn + 1);
>  if ( rc )
>  return rc;
>  
> -gfn = hbase >> PAGE_SHIFT;
> -rc = iomem_deny_access(d, gfn, gfn + 1);
> +mfn = hbase >> PAGE_SHIFT;
> +rc = iomem_deny_access(d, mfn, mfn + 1);
>  if ( rc )
>  return rc;
>  
> -gfn = cbase >> PAGE_SHIFT;
> +mfn = cbase >> PAGE_SHIFT;
>  nr = DIV_ROUND_UP(csize, PAGE_SIZE);
> -rc = iomem_deny_access(d, gfn, gfn + nr);
> +rc = iomem_deny_access(d, mfn, mfn + nr);
>  if ( rc )
>  return rc;
>  
> -gfn = vbase >> PAGE_SHIFT;
> -return iomem_deny_access(d, gfn, gfn + nr);
> +mfn = vbase >> PAGE_SHIFT;
> +return iomem_deny_access(d, mfn, mfn + nr);
>  }
>  
>  #ifdef CONFIG_ACPI
> diff --git a/xen/arch/arm/gic-v3.c b/xen/arch/arm/gic-v3.c
> index 12775f5..955591b 100644
> --- a/xen/arch/arm/gic-v3.c
> +++ b/xen/arch/arm/gic-v3.c
> @@ -1238,37 +1238,37 @@ static void __init gicv3_dt_init(void)
>  static int gicv3_iomem_deny_access(const struct domain *d)
>  {
>  int rc, i;
> -unsigned long gfn, nr;
> +unsigned long mfn, nr;
>  
> -gfn = dbase >> PAGE_SHIFT;
> +mfn = dbase >> PAGE_SHIFT;
>  nr = DIV_ROUND_UP(SZ_64K, PAGE_SIZE);
> -rc = iomem_deny_access(d, gfn, gfn + nr);
> +rc = iomem_deny_access(d, mfn, mfn + nr);
>  if ( rc )
>  return rc;
>  
>  for ( i = 0; i < gicv3.rdist_count; i++ )
>  {
> -gfn = gicv3.rdist_regions[i].base >> PAGE_SHIFT;
> +mfn = gicv3.rdist_regions[i].base >> PAGE_SHIFT;
>  nr = DIV_ROUND_UP(gicv3.rdist_regions[i].size, PAGE_SIZE);
> -rc = iomem_deny_access(d, gfn, gfn + nr);
> +rc = iomem_deny_access(d, mfn, mfn + nr);
>  if ( rc )
>  return rc;
>  }
>  
>  if ( cbase != INVALID_PADDR )
>  {
> -gfn = cbase >> PAGE_SHIFT;
> +mfn = cbase >> PAGE_SHIFT;
>  nr = DIV_ROUND_UP(csize, PAGE_SIZE);
> -rc = iomem_deny_access(d, gfn, gfn + nr);
> +rc = iomem_deny_access(d, mfn, mfn + nr);
>  if ( rc )
>  return rc;
>  }
>  
>  if ( vbase != INVALID_PADDR )
>  {
> -gfn = vbase >> PAGE_SHIFT;
> +mfn = vbase >> PAGE_SHIFT;
>  nr = DIV_ROUND_UP(csize, PAGE_SIZE);
> -return iomem_deny_access(d, gfn, gfn + nr);
> +return iomem_deny_access(d, mfn, mfn + nr);
>  }
>  
>  return 0;
> -- 
> 1.9.1
> 

___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH] xen/arm: bootfdt.c is only used during initialization

2017-01-17 Thread Stefano Stabellini
On Tue, 17 Jan 2017, Julien Grall wrote:
> This file contains data and code only used at initialization. Mark the
> file as such in the build system and correct kind_guess.
> 
> Signed-off-by: Julien Grall 

Reviewed-by: Stefano Stabellini 


>  xen/arch/arm/Makefile  | 2 +-
>  xen/arch/arm/bootfdt.c | 2 +-
>  2 files changed, 2 insertions(+), 2 deletions(-)
> 
> diff --git a/xen/arch/arm/Makefile b/xen/arch/arm/Makefile
> index 59b3b53..cf67bbe 100644
> --- a/xen/arch/arm/Makefile
> +++ b/xen/arch/arm/Makefile
> @@ -5,7 +5,7 @@ subdir-$(CONFIG_ARM_64) += efi
>  subdir-$(CONFIG_ACPI) += acpi
>  
>  obj-$(CONFIG_HAS_ALTERNATIVE) += alternative.o
> -obj-y += bootfdt.o
> +obj-y += bootfdt.init.o
>  obj-y += cpu.o
>  obj-y += cpuerrata.o
>  obj-y += cpufeature.o
> diff --git a/xen/arch/arm/bootfdt.c b/xen/arch/arm/bootfdt.c
> index d130633..cae6f83 100644
> --- a/xen/arch/arm/bootfdt.c
> +++ b/xen/arch/arm/bootfdt.c
> @@ -168,7 +168,7 @@ static void __init process_multiboot_node(const void 
> *fdt, int node,
>const char *name,
>u32 address_cells, u32 size_cells)
>  {
> -static int kind_guess = 0;
> +static int __initdata kind_guess = 0;
>  const struct fdt_property *prop;
>  const __be32 *cell;
>  bootmodule_kind kind;
> -- 
> 1.9.1
> 

___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel


Re: [Xen-devel] Reading network data going into a VM from netback.c

2017-01-17 Thread Dario Faggioli
On Tue, 2017-01-17 at 06:02 +, #PATHANGI JANARDHANAN JATINSHRAVAN#
wrote:
> But I am not able to parse the hexadecimal string as shown above. 
> 
> Can anyone point me in the right direction regarding this?
> 
I'm afraid I can't.

At the same time, I'm quite sure that by sending the exact same HTML
email 3 times within a couple of hours, you're making it very unlikely
that you'll receive any useful help.

Perhaps (re)read
https://wiki.xenproject.org/wiki/Asking_Developer_Questions

Regards,
Dario
-- 
<> (Raistlin Majere)
-
Dario Faggioli, Ph.D, http://about.me/dario.faggioli
Senior Software Engineer, Citrix Systems R&D Ltd., Cambridge (UK)

signature.asc
Description: This is a digitally signed message part
___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH v2 1/3] xen: clean up xenbus internal headers

2017-01-17 Thread Boris Ostrovsky
On 01/16/2017 09:15 AM, Juergen Gross wrote:
> The xenbus driver has an awful mixture of internally and globally
> visible headers: some of the internally used only stuff is defined in
> the global header include/xen/xenbus.h while some stuff defined in
> internal headers is used by other drivers, too.
>
> Clean this up by moving the externally used symbols to
> include/xen/xenbus.h and the symbols used internally only to a new
> header drivers/xen/xenbus/xenbus.h replacing xenbus_comms.h and
> xenbus_probe.h
>
> Signed-off-by: Juergen Gross 


Reviewed-by: Boris Ostrovsky 


___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH] x86: Use ACPI reboot method for Dell OptiPlex 9020

2017-01-17 Thread Jan Beulich
>>> On 14.12.16 at 14:15,  wrote:
> On 14/12/16 12:58, Jan Beulich wrote:
> On 14.12.16 at 12:12,  wrote:
>>> When EFI booting the Dell OptiPlex 9020, it sometimes GP faults in the
>>> EFI runtime instead of rebooting.
>> Has it been understood what the #GP is due to? I.e. is it namely
>> not because of a mis-aligned SSE instruction memory reference?
> 
> (XEN) [  349.551011] Hardware Dom0 shutdown: rebooting machine
> (XEN) [  349.553668] APIC error on CPU0: 40(00)
> (XEN) [  349.553675] [ Xen-4.7.0-xs128737-d  x86_64  debug=y  Not tainted 
> ]
> (XEN) [  349.553676] CPU:0
> (XEN) [  349.553677] RIP:e008:[] db7aa368
> (XEN) [  349.553678] RFLAGS: 00010246   CONTEXT: hypervisor
> (XEN) [  349.553680] rax: d48595e0   rbx:    rcx: 
> 5a5a5a5a
> (XEN) [  349.553681] rdx: 1830   rsi:    rdi: 
> 8300ded37bb8
> (XEN) [  349.553682] rbp: 0021   rsp: 8300ded37b68   r8:  
> 
> (XEN) [  349.553682] r9:  0001   r10: 0004   r11: 
> 0200
> (XEN) [  349.553683] r12:    r13: 1830   r14: 
> 0065
> (XEN) [  349.553684] r15: 0010   cr0: 80050033   cr4: 
> 001526e0
> (XEN) [  349.553685] cr3: 00040df7c000   cr2: 7f7186aa1a40
> (XEN) [  349.553686] ds: 002b   es: 002b   fs:    gs:    ss: e010   
> cs: e008
> (XEN) [  349.553687] Xen code around  (db7aa368):
> (XEN) [  349.553688]  08 00 00 b9 5a 5a 5a 5a  50 18 48 8b d8 48 83 f8 1f 
> 74 0f 48 8b 15 dd
> (XEN) [  349.553692] Xen stack trace from rsp=8300ded37b68:
> (XEN) [  349.553693]0700ded37bb8 80022023 83041b92b914 
> 83041b80
> (XEN) [  349.553694] db79f348  
> 
> (XEN) [  349.553696]  000d 
> db7ff870
> (XEN) [  349.553697]  005162b3bf76 
> 
> (XEN) [  349.553698]7cff212c83e7 82d080244f57 0010 
> db7fe671
> (XEN) [  349.553699]8300ded37c38 0206 ded28000 
> 
> (XEN) [  349.553701] db7e0311  
> 
> (XEN) [  349.553702] 080f  
> 
> (XEN) [  349.553703]00040df7c000 82d080101242 ded28000 
> 8300ded37ca8
> (XEN) [  349.553704]82d080352b00 8300ded37ca0  
> fffe
> (XEN) [  349.553706]8300ded37cf8 82d0801956e7 8300ded37cd0 
> 0046
> (XEN) [  349.553707]8300dee1d000   
> 8300ded37dd8
> (XEN) [  349.553709]00fb 005162b3bf76 8300ded37d08 
> 82d080195785
> (XEN) [  349.553710]8300ded37d28 82d080137482 0016 
> 
> (XEN) [  349.553711]8300ded37d38 82d080195de7 8300ded37dc8 
> 82d08017532c
> (XEN) [  349.553713]   
> 
> (XEN) [  349.553714]83040df78c50 8300ded97000 8000dee1d000 
> 
> (XEN) [  349.553715] 83040defc000 8300ded37dc0 
> 005162dd573d
> (XEN) [  349.553717]83040df77010 83040df770e8 005162b3bf76 
> 0010
> (XEN) [  349.553718]7cff212c8207 82d080244f57 0010 
> 005162b3bf76
> (XEN) [  349.553720] Xen call trace:
> (XEN) [  349.553721][] db7aa368
> (XEN) [  349.553721] 
> (XEN) [  349.553722] 
> (XEN) [  349.553723] 
> (XEN) [  349.553723] Panic on CPU 0:
> (XEN) [  349.553724] GENERAL PROTECTION FAULT
> (XEN) [  349.553724] [error_code=]
> (XEN) [  349.553725] 
> (XEN) [  349.553725] 
> (XEN) [  349.553726] Reboot in five seconds...
> (XEN) [  349.553727] Executing kexec image on cpu0
> (XEN) [  349.553728] Shot down all CPUs
> 
> 
> This is caused by callq *0x18(%rax)
> 
> The only #GP fault to be had is if the end of that pointer is
> non-canonical.

Thanks for clarifying - I'm fine with the patch then.

Jan

___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel


[Xen-devel] [RFC] Device memory mappings for Dom0 on ARM64 ACPI systems

2017-01-17 Thread Stefano Stabellini
Hi all,

I would like to discuss with ARM64 and ACPI Linux maintainers the best
way to complete ACPI support in Linux for Dom0 on ARM64.


As a reminder, Xen can only parse static ACPI tables. It doesn't have a
bytecode interpreter. Xen maps all ACPI tables to Dom0, which parses
them as it does on native. Device memory is mapped in stage-2 by Xen
upon Dom0 request: a small driver under drivers/xen registers for
BUS_NOTIFY_ADD_DEVICE events, then calls xen_map_device_mmio, which
issues XENMEM_add_to_physmap_range hypercalls to Xen that creates the
appropriate stage-2 mappings.

This approach works well, but it breaks in a few interesting cases.
Specifically, anything that requires a device memory mapping but is
not a device doesn't generate a BUS_NOTIFY_ADD_DEVICE event, thus no
hypercalls to Xen are made. Examples are: ACPI OperationRegion (1), ECAM
(2), and other memory regions described in static tables such as BERT (3).

What is the best way to map these regions in Dom0? I am going to
detail a few options that have been proposed and evaluated so far.


(2) and (3), being described by static tables, could be parsed by Xen
and mapped beforehand. However, this approach wouldn't work for (1).
Additionally, Xen and Linux versions can mix and match, so it is
possible, even likely, to run an old Xen and a new Dom0 on a new
platform. Xen might not know about a new ACPI table, while Linux might.
In this scenario, Xen wouldn't be able to map the region described in
the new table beforehand, but Linux would still try to access it. I
imagine that this problem could be worked around by blacklisting any
unknown static tables in Xen, but it seems suboptimal. (By blacklisting,
I mean removing them before starting Dom0.)

For this reason, and to use the same approach for (1), (2) and (3), it
looks like the best solution is for Dom0 to request the stage-2
mappings from Xen. If we go down this route, what is the best way to do
it?


a) One option is to provide a Xen specific implementation of
acpi_os_ioremap in Linux. I think this is the cleanest approach, but
unfortunately, it doesn't cover cases where ioremap is used directly. (2)
is one of such cases, see
arch/arm64/kernel/pci.c:pci_acpi_setup_ecam_mapping and
drivers/pci/ecam.c:pci_ecam_create. (3) is another one of these cases,
see drivers/acpi/apei/bert.c:bert_init.

b) Otherwise, we could write an alternative implementation of ioremap
on arm64. The Xen specific ioremap would request a stage-2 mapping
first, then create the stage-1 mapping as usual. However, this means
issuing a hypercall for every ioremap call (a rough sketch follows below).

c) Finally, a third option is to create the stage-2 mappings seamlessly
in Xen upon Dom0 memory faults. Keeping in mind that SMMU and guest
pagetables are shared in the Xen hypervisor, this approach does not work
if one of the pages that need a stage-2 mapping is used as DMA target
before Dom0 accesses it. No SMMU mappings would be available for the
page yet, so the DMA transaction would fail. After Dom0 touches the
page, the DMA transaction would succeed. I don't know how likely this
scenario is to happen, but it seems fragile to rely on it.
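
For illustration only, option b) could look roughly like the sketch below.
It assumes that a helper such as the existing xen_map_device_mmio() (which
issues XENMEM_add_to_physmap_range for BUS_NOTIFY_ADD_DEVICE today) can be
reused for an arbitrary physical range; the wrapper name and its placement
are made up for the example:

static void __iomem *xen_dom0_ioremap(phys_addr_t phys_addr, size_t size)
{
        /* Sketch only: xen_dom0_ioremap() is a hypothetical wrapper name. */
        if (xen_initial_domain()) {
                struct resource res = DEFINE_RES_MEM(phys_addr, size);

                /* Ask Xen for the stage-2 mapping first (one hypercall per call). */
                if (xen_map_device_mmio(&res, 1))
                        return NULL;
        }

        /* Then create the stage-1 mapping as on native. */
        return ioremap(phys_addr, size);
}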


For these reasons, I think that the best option might be b).
Do you agree? Did I miss anything? Do you have other suggestions?


Many thanks,

Stefano


References:
https://lists.xenproject.org/archives/html/xen-devel/2016-12/msg01693.html
https://lists.xenproject.org/archives/html/xen-devel/2016-12/msg02425.html
https://lists.xenproject.org/archives/html/xen-devel/2016-12/msg02531.html

___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH v3] kexec: implement STATUS hypercall to check if image is loaded

2017-01-17 Thread Daniel Kiper
On Tue, Jan 17, 2017 at 11:29:16AM -0600, Eric DeVolder wrote:
> The tools that use kexec are asynchronous in nature and do not keep
> state changes. As such provide an hypercall to find out whether an
> image has been loaded for either type.
>
> Note: No need to modify XSM as it has one size fits all check and
> does not check for subcommands.
>
> Note: No need to check KEXEC_FLAG_IN_PROGRESS (and error out of
> kexec_status()) as this flag is set only once by the first/only
> cpu on the crash path.
>
> Note: This is just the Xen side of the hypercall, kexec-tools patch
> to come separately.
>
> Signed-off-by: Konrad Rzeszutek Wilk 
> Signed-off-by: Eric DeVolder 

Reviewed-by: Daniel Kiper 

Daniel

___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel


[Xen-devel] [seabios test] 104213: tolerable FAIL - PUSHED

2017-01-17 Thread osstest service owner
flight 104213 seabios real [real]
http://logs.test-lab.xenproject.org/osstest/logs/104213/

Failures :-/ but no regressions.

Regressions which are regarded as allowable (not blocking):
 test-amd64-i386-xl-qemuu-win7-amd64 16 guest-stop fail like 104000
 test-amd64-amd64-xl-qemuu-win7-amd64 16 guest-stopfail like 104000

Tests which did not succeed, but are not blocking:
 test-amd64-amd64-libvirt-qemuu-debianhvm-amd64-xsm 10 migrate-support-check 
fail never pass
 test-amd64-i386-libvirt-qemuu-debianhvm-amd64-xsm 10 migrate-support-check 
fail never pass
 test-amd64-amd64-qemuu-nested-amd 16 debian-hvm-install/l1/l2  fail never pass

version targeted for testing:
 seabios  106543deb447c4005f9a9845f1f43a72547f6209
baseline version:
 seabios  9332965e1c46ddf4e19d7050f1e957a195c703fa

Last test of basis   104000  2016-12-30 16:46:04 Z   18 days
Testing same since   104213  2017-01-17 15:15:02 Z0 days1 attempts


People who touched revisions under test:
  Ladi Prosek 

jobs:
 build-amd64-xsm  pass
 build-i386-xsm   pass
 build-amd64  pass
 build-i386   pass
 build-amd64-libvirt  pass
 build-i386-libvirt   pass
 build-amd64-pvopspass
 build-i386-pvops pass
 test-amd64-amd64-libvirt-qemuu-debianhvm-amd64-xsm   pass
 test-amd64-i386-libvirt-qemuu-debianhvm-amd64-xsmpass
 test-amd64-amd64-xl-qemuu-debianhvm-amd64-xsmpass
 test-amd64-i386-xl-qemuu-debianhvm-amd64-xsm pass
 test-amd64-amd64-qemuu-nested-amdfail
 test-amd64-i386-qemuu-rhel6hvm-amd   pass
 test-amd64-amd64-xl-qemuu-debianhvm-amd64pass
 test-amd64-i386-xl-qemuu-debianhvm-amd64 pass
 test-amd64-amd64-xl-qemuu-win7-amd64 fail
 test-amd64-i386-xl-qemuu-win7-amd64  fail
 test-amd64-amd64-qemuu-nested-intel  pass
 test-amd64-i386-qemuu-rhel6hvm-intel pass
 test-amd64-i386-xl-qemuu-winxpsp3-vcpus1 pass
 test-amd64-amd64-xl-qemuu-winxpsp3   pass
 test-amd64-i386-xl-qemuu-winxpsp3pass



sg-report-flight on osstest.test-lab.xenproject.org
logs: /home/logs/logs
images: /home/logs/images

Logs, config files, etc. are available at
http://logs.test-lab.xenproject.org/osstest/logs

Explanation of these reports, and of osstest in general, is at
http://xenbits.xen.org/gitweb/?p=osstest.git;a=blob;f=README.email;hb=master
http://xenbits.xen.org/gitweb/?p=osstest.git;a=blob;f=README;hb=master

Test harness code can be found at
http://xenbits.xen.org/gitweb?p=osstest.git;a=summary


Pushing revision :

+ branch=seabios
+ revision=106543deb447c4005f9a9845f1f43a72547f6209
+ . ./cri-lock-repos
++ . ./cri-common
+++ . ./cri-getconfig
+++ umask 002
+++ getrepos
 getconfig Repos
 perl -e '
use Osstest;
readglobalconfig();
print $c{"Repos"} or die $!;
'
+++ local repos=/home/osstest/repos
+++ '[' -z /home/osstest/repos ']'
+++ '[' '!' -d /home/osstest/repos ']'
+++ echo /home/osstest/repos
++ repos=/home/osstest/repos
++ repos_lock=/home/osstest/repos/lock
++ '[' x '!=' x/home/osstest/repos/lock ']'
++ OSSTEST_REPOS_LOCK_LOCKED=/home/osstest/repos/lock
++ exec with-lock-ex -w /home/osstest/repos/lock ./ap-push seabios 
106543deb447c4005f9a9845f1f43a72547f6209
+ branch=seabios
+ revision=106543deb447c4005f9a9845f1f43a72547f6209
+ . ./cri-lock-repos
++ . ./cri-common
+++ . ./cri-getconfig
+++ umask 002
+++ getrepos
 getconfig Repos
 perl -e '
use Osstest;
readglobalconfig();
print $c{"Repos"} or die $!;
'
+++ local repos=/home/osstest/repos
+++ '[' -z /home/osstest/repos ']'
+++ '[' '!' -d /home/osstest/repos ']'
+++ echo /home/osstest/repos
++ repos=/home/osstest/repos
++ repos_lock=/home/osstest/repos/lock
++ '[' x/home/osstest/repos/lock '!=' x/home/osstest/repos/lock ']'
+ . ./cri-common
++ . ./cri-getconfig
++ umask 002
+ select_xenbranch
+ case "$branch" in
+ tree=seabios
+ xenbranch=xen-unstable
+ '[' xseabios = xlinux ']'
+ linuxbranch=
+ '[' x = x ']'
+ qemuubranch=qemu-upstream-unstable
+ select_prevxenbranch
++ ./cri-getprevxenbranch xen-unstable
+ prevxenbranch=xen-4.8-testing
+ '[' 

[Xen-devel] [PATCH v2 0/2] xen-netback: fix memory leaks on XenBus disconnect

2017-01-17 Thread Igor Druzhinin
Just split the initial patch in two as proposed by Wei.

Since the approach to locking netdev statistics is inconsistent across
the kernel (it tends to have no locking at all), we'd better rely on our
internal lock for this purpose.

Igor Druzhinin (2):
  xen-netback: fix memory leaks on XenBus disconnect
  xen-netback: protect resource cleaning on XenBus disconnect

 drivers/net/xen-netback/interface.c |  6 --
 drivers/net/xen-netback/xenbus.c| 13 +
 2 files changed, 17 insertions(+), 2 deletions(-)

-- 
1.8.3.1


___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel


[Xen-devel] [PATCH v2 1/2] xen-netback: fix memory leaks on XenBus disconnect

2017-01-17 Thread Igor Druzhinin
Eliminate memory leaks introduced several years ago by cleaning up the
queue resources which are allocated on the XenBus connection event, namely
the queue structure array and the pages used for IO rings.

Signed-off-by: Igor Druzhinin 
---
 drivers/net/xen-netback/xenbus.c | 11 +++
 1 file changed, 11 insertions(+)

diff --git a/drivers/net/xen-netback/xenbus.c b/drivers/net/xen-netback/xenbus.c
index 6c57b02..3e99071 100644
--- a/drivers/net/xen-netback/xenbus.c
+++ b/drivers/net/xen-netback/xenbus.c
@@ -493,11 +493,20 @@ static int backend_create_xenvif(struct backend_info *be)
 static void backend_disconnect(struct backend_info *be)
 {
if (be->vif) {
+   unsigned int queue_index;
+
xen_unregister_watchers(be->vif);
 #ifdef CONFIG_DEBUG_FS
xenvif_debugfs_delif(be->vif);
 #endif /* CONFIG_DEBUG_FS */
xenvif_disconnect_data(be->vif);
+   for (queue_index = 0; queue_index < be->vif->num_queues; 
++queue_index)
+   xenvif_deinit_queue(&be->vif->queues[queue_index]);
+
+   vfree(be->vif->queues);
+   be->vif->num_queues = 0;
+   be->vif->queues = NULL;
+
xenvif_disconnect_ctrl(be->vif);
}
 }
@@ -1026,6 +1035,8 @@ static void connect(struct backend_info *be)
 err:
if (be->vif->num_queues > 0)
xenvif_disconnect_data(be->vif); /* Clean up existing queues */
+   for (queue_index = 0; queue_index < be->vif->num_queues; ++queue_index)
+   xenvif_deinit_queue(&be->vif->queues[queue_index]);
vfree(be->vif->queues);
be->vif->queues = NULL;
be->vif->num_queues = 0;
-- 
1.8.3.1


___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel


[Xen-devel] [PATCH v2 2/2] xen-netback: protect resource cleaning on XenBus disconnect

2017-01-17 Thread Igor Druzhinin
vif->lock is used to protect statistics gathering agents from using the
queue structure during cleaning.

Signed-off-by: Igor Druzhinin 
---
 drivers/net/xen-netback/interface.c | 6 --
 drivers/net/xen-netback/xenbus.c| 2 ++
 2 files changed, 6 insertions(+), 2 deletions(-)

diff --git a/drivers/net/xen-netback/interface.c 
b/drivers/net/xen-netback/interface.c
index 41c69b3..c48252a 100644
--- a/drivers/net/xen-netback/interface.c
+++ b/drivers/net/xen-netback/interface.c
@@ -230,18 +230,18 @@ static struct net_device_stats *xenvif_get_stats(struct 
net_device *dev)
 {
struct xenvif *vif = netdev_priv(dev);
struct xenvif_queue *queue = NULL;
-   unsigned int num_queues = vif->num_queues;
unsigned long rx_bytes = 0;
unsigned long rx_packets = 0;
unsigned long tx_bytes = 0;
unsigned long tx_packets = 0;
unsigned int index;
 
+   spin_lock(&vif->lock);
if (vif->queues == NULL)
goto out;
 
/* Aggregate tx and rx stats from each queue */
-   for (index = 0; index < num_queues; ++index) {
+   for (index = 0; index < vif->num_queues; ++index) {
queue = &vif->queues[index];
rx_bytes += queue->stats.rx_bytes;
rx_packets += queue->stats.rx_packets;
@@ -250,6 +250,8 @@ static struct net_device_stats *xenvif_get_stats(struct 
net_device *dev)
}
 
 out:
+   spin_unlock(&vif->lock);
+
vif->dev->stats.rx_bytes = rx_bytes;
vif->dev->stats.rx_packets = rx_packets;
vif->dev->stats.tx_bytes = tx_bytes;
diff --git a/drivers/net/xen-netback/xenbus.c b/drivers/net/xen-netback/xenbus.c
index 3e99071..d82cd71 100644
--- a/drivers/net/xen-netback/xenbus.c
+++ b/drivers/net/xen-netback/xenbus.c
@@ -503,9 +503,11 @@ static void backend_disconnect(struct backend_info *be)
for (queue_index = 0; queue_index < be->vif->num_queues; 
++queue_index)
xenvif_deinit_queue(&be->vif->queues[queue_index]);
 
+   spin_lock(&be->vif->lock);
vfree(be->vif->queues);
be->vif->num_queues = 0;
be->vif->queues = NULL;
+   spin_unlock(&be->vif->lock);
 
xenvif_disconnect_ctrl(be->vif);
}
-- 
1.8.3.1


___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH] xen/common: Drop function calls for Xen compile/version information

2017-01-17 Thread Konrad Rzeszutek Wilk
> > Is this patch of yours that neccessary? Could at least some of the
> > functions still exist?
> 
> Well.  This patch is manually doing what LTO would do automatically when
> it has a cross-translation-unit view of things, and come to the
> conclusion that in all cases, inlining cross-unit will make the calling
> code shorter, and allow the functions to be elided entirely.
> 
> I take it from this that noone has tried livepatching an LTO build of Xen.

Roger did build it (FreeBSD uses it), but I don't think he tried
the test-cases.

Wait, I think he may have as his patch: f8c66c2ad2efdb281e4ebf15bf329d73c4f02ce7
Author: Roger Pau Monne 
Date:   Tue May 3 12:55:09 2016 +0200

xen/xsplice: add ELFOSABI_FREEBSD as a supported OSABI for payloads

implies that he did build a payload (and I remember us trying to figure
out if the issues he had been hitting were due to LTO or something else).


___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH v3] kexec: implement STATUS hypercall to check if image is loaded

2017-01-17 Thread Andrew Cooper
On 17/01/17 17:29, Eric DeVolder wrote:
> The tools that use kexec are asynchronous in nature and do not keep
> state changes. As such provide an hypercall to find out whether an
> image has been loaded for either type.
>
> Note: No need to modify XSM as it has one size fits all check and
> does not check for subcommands.
>
> Note: No need to check KEXEC_FLAG_IN_PROGRESS (and error out of
> kexec_status()) as this flag is set only once by the first/only
> cpu on the crash path.
>
> Note: This is just the Xen side of the hypercall, kexec-tools patch
> to come separately.
>
> Signed-off-by: Konrad Rzeszutek Wilk 
> Signed-off-by: Eric DeVolder 

Reviewed-by: Andrew Cooper 

___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH] xen/common: Drop function calls for Xen compile/version information

2017-01-17 Thread Andrew Cooper
On 17/01/17 19:00, Konrad Rzeszutek Wilk wrote:
> On Tue, Jan 17, 2017 at 01:42:54PM -0500, Konrad Rzeszutek Wilk wrote:
>> On Tue, Jan 17, 2017 at 06:16:36PM +, Andrew Cooper wrote:
>>> On 17/01/17 18:05, Konrad Rzeszutek Wilk wrote:
 On Mon, Jan 16, 2017 at 01:04:09PM +, Andrew Cooper wrote:
> The chageset/version/compile information is currently exported as a set of
> function calls into a separate translation unit, which is inefficient for 
> all
> callers.
>
> Replace the function calls with externs pointing appropriately into 
> .rodata,
> which allows all users to generate code referencing the data directly.
>
> No functional change, but causes smaller and more efficient compiled code.
>
> Signed-off-by: Andrew Cooper 
 Ah crud. That breaks the livepatch test-cases (they patch the 
 xen_extra_version
 function).
>>> Lucky I haven't pushed it then, (although the livepatch build seems to
>>> still work fine for me, despite this change.)
>> make tests should fail.
> (which is not built by default, as requested by Jan - but the
> OSSTest test cases I am working on would do this).
 Are there some other code that can be modified that is reported
 by 'xl info' on which the test-cases can run (and reported easily?).
>>> Patch do_version() itself to return the same difference of information?
>> Ugh. That is going to make the building of test-cases quite complex.
>>
>> I guess it can just do it and .. return only one value :-)
> As in something like this (not compile tested):
>
>
> diff --git a/xen/arch/x86/test/xen_hello_world_func.c 
> b/xen/arch/x86/test/xen_hello_world_func.c
> index 2e4af9c..3572600 100644
> --- a/xen/arch/x86/test/xen_hello_world_func.c
> +++ b/xen/arch/x86/test/xen_hello_world_func.c
> @@ -10,7 +10,7 @@
>  static unsigned long *non_canonical_addr = (unsigned long 
> *)0xdead000000000000ULL;
>  
>  /* Our replacement function for xen_extra_version. */
> -const char *xen_hello_world(void)
> +const char *xen_version_hello_world(int cmd, XEN_GUEST_HANDLE_PARAM(void) 
> arg)
>  {
>  unsigned long tmp;
>  int rc;
> @@ -21,7 +21,19 @@ const char *xen_hello_world(void)
>  rc = __get_user(tmp, non_canonical_addr);
>  BUG_ON(rc != -EFAULT);
>  
> -return "Hello World";
> +if ( cmd == XENVER_extraversion )
> +{
> +xen_extraversion_t extraversion = "Hello World";
> +
> +if ( copy_to_guest(arg, extraversion, ARRAY_SIZE(extraversion)) )
> +return -EFAULT;
> +return 0;
> +}
> +/*
> + * Can't return -EPERM as certain subversions can't deal with negative
> + * values.
> + */
> +return 0;
>  }
>
>
> That will make three of the test-patches work but not for the last
> one - the NOP one (which needs to patch the _whole_ function).
>
> Argh.
>
> Is this patch of yours that neccessary? Could at least some of the
> functions still exist?

Well.  This patch is manually doing what LTO would do automatically when
it has a cross-translation-unit view of things, and come to the
conclusion that in all cases, inlining cross-unit will make the calling
code shorter, and allow the functions to be elided entirely.

I take it from this that noone has tried livepatching an LTO build of Xen.

~Andrew

___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel


[Xen-devel] [PATCH v3 5/5] fix: add multiboot2 protocol support for EFI platforms

2017-01-17 Thread Doug Goldstein
This should be squashed into the 4/4 patch 'x86: add multiboot2 protocol
support for EFI platforms'.

- fix incorrect assembly (identified by Andrew Cooper)
- fix an issue where the trampoline size was left as 0; because of the
  way the memory for the trampolines is allocated, we would go to the
  end of an available section and then subtract off the size to decide
  where to place them. The end result was that we would always copy the
  trampolines and the 32-bit stack into some form of reserved memory
  after the conventional region we wanted to put things into. On some
  systems this did not manifest as a crash while on others it did.
  Reworked the changes to always reserve 64kb for both the stack and
  the size of the trampolines.

Signed-off-by: Doug Goldstein 
Reviewed-by: Doug Goldstein 
---
Doug v3 - drop ASSERTs since they are runtime only without any output.
  This should be completely mitigated by using max() and
  ensuring we have a sane value.
  (found by Jan Beulich)
- removed extra_mem variable that was incorrectly left behind.
  (found by Jan Beulich)
- fix comment around the "start of stack"
  (found by Jan Beulich)
Doug v2 - new in this version to help show what's changed
---
---
 xen/arch/x86/boot/head.S|  1 +
 xen/arch/x86/efi/efi-boot.h |  9 +
 xen/arch/x86/efi/stub.c |  2 +-
 3 files changed, 7 insertions(+), 5 deletions(-)

diff --git a/xen/arch/x86/boot/head.S b/xen/arch/x86/boot/head.S
index ac93df0..d288959 100644
--- a/xen/arch/x86/boot/head.S
+++ b/xen/arch/x86/boot/head.S
@@ -519,6 +519,7 @@ trampoline_setup:
 1:
 /* Switch to low-memory stack.  */
 mov sym_phys(trampoline_phys),%edi
+/* The stack base is 64kb after the location of trampoline_phys */
 lea 0x10000(%edi),%esp
 lea trampoline_boot_cpu_entry-trampoline_start(%edi),%eax
 pushl   $BOOT_CS32
diff --git a/xen/arch/x86/efi/efi-boot.h b/xen/arch/x86/efi/efi-boot.h
index dc857d8..a73134c 100644
--- a/xen/arch/x86/efi/efi-boot.h
+++ b/xen/arch/x86/efi/efi-boot.h
@@ -146,8 +146,6 @@ static void __init 
efi_arch_process_memory_map(EFI_SYSTEM_TABLE *SystemTable,
 {
 struct e820entry *e;
 unsigned int i;
-/* Check for extra mem for mbi data if Xen is loaded via multiboot2 
protocol. */
-UINTN extra_mem = efi_enabled(EFI_LOADER) ? 0 : (64 << 10);
 
 /* Populate E820 table and check trampoline area availability. */
 e = e820map - 1;
@@ -170,8 +168,7 @@ static void __init 
efi_arch_process_memory_map(EFI_SYSTEM_TABLE *SystemTable,
 /* fall through */
 case EfiConventionalMemory:
 if ( !trampoline_phys && desc->PhysicalStart + len <= 0x100000 &&
- len >= cfg.size + extra_mem &&
- desc->PhysicalStart + len > cfg.addr )
+ len >= cfg.size && desc->PhysicalStart + len > cfg.addr )
 cfg.addr = (desc->PhysicalStart + len - cfg.size) & PAGE_MASK;
 /* fall through */
 case EfiLoaderCode:
@@ -686,6 +683,10 @@ paddr_t __init efi_multiboot2(EFI_HANDLE ImageHandle, 
EFI_SYSTEM_TABLE *SystemTa
 setup_efi_pci();
 efi_variables();
 
+/* This is the maximum size of our trampoline + our low memory stack */
+cfg.size = max_t(UINTN, 64 << 10,
+(trampoline_end - trampoline_start) + 4096);
+
 if ( gop )
 efi_set_gop_mode(gop, gop_mode);
 
diff --git a/xen/arch/x86/efi/stub.c b/xen/arch/x86/efi/stub.c
index 6ea6aa1..b81adc0 100644
--- a/xen/arch/x86/efi/stub.c
+++ b/xen/arch/x86/efi/stub.c
@@ -33,7 +33,7 @@ paddr_t __init noreturn efi_multiboot2(EFI_HANDLE ImageHandle,
  * not be directly supported by C compiler.
  */
 asm volatile(
-"call %2  \n"
+"call *%2 \n"
 "0:  hlt  \n"
 "jmp  0b  \n"
: "+c" (StdErr), "+d" (err) : "g" (StdErr->OutputString)
-- 
git-series 0.9.1

___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel


[Xen-devel] [PATCH v3 3/5] efi: create new early memory allocator

2017-01-17 Thread Doug Goldstein
From: Daniel Kiper 

There is a problem with place_string() which is used as an early memory
allocator. It gets memory chunks starting from the start symbol and goes
down. Sadly this does not work when Xen is loaded using the multiboot2
protocol, because then start lives at the 1 MiB address and we should
not allocate memory below it. So, I tried to use the mem_lower address
calculated by GRUB2. However, this solution works only on some
machines. There are machines in the wild (e.g. Dell PowerEdge R820)
which use the first ~640 KiB for boot services code or data... :-(((
Hence, we need a new memory allocator for the Xen EFI boot code which is
quite simple and generic and could be used by place_string() and
efi_arch_allocate_mmap_buffer(). I thought about the following solutions:

1) We could use native EFI allocation functions (e.g. AllocatePool()
   or AllocatePages()) to get memory chunk. However, later (somewhere
   in __start_xen()) we must copy its contents to safe place or reserve
   it in e820 memory map and map it in Xen virtual address space. This
   means that the code referring to Xen command line, loaded modules and
   EFI memory map, mostly in __start_xen(), will be further complicated
   and diverge from legacy BIOS cases. Additionally, both former things
   have to be placed below 4 GiB because their addresses are stored in
   multiboot_info_t structure which has 32-bit relevant members.

2) We may allocate memory area statically somewhere in Xen code which
   could be used as memory pool for early dynamic allocations. Looks
   quite simple. Additionally, it would not depend on EFI at all and
   could be used on legacy BIOS platforms if we need it. However, we
   must carefully choose size of this pool. We do not want increase Xen
   binary size too much and waste too much memory but also we must fit
   at least memory map on x86 EFI platforms. As I saw on small machine,
   e.g. IBM System x3550 M2 with 8 GiB RAM, memory map may contain more
   than 200 entries. Every entry on x86-64 platform is 40 bytes in size.
   So, it means that we need more than 8 KiB for EFI memory map only.
   Additionally, if we use this memory pool for Xen and modules command
   line storage (it would be used when xen.efi is executed as EFI application)
   then we should add, I think, about 1 KiB. In this case, to be on safe
   side, we should assume at least 64 KiB pool for early memory allocations.
   Which is about 4 times of our earlier calculations. However, during
   discussion on Xen-devel Jan Beulich suggested that just in case we should
   use 1 MiB memory pool like it is in original place_string() implementation.
   So, let's use 1 MiB as it was proposed. If we think that we should not
   waste unallocated memory in the pool on running system then we can mark
   this region as __initdata and move all required data to dynamically
   allocated places somewhere in __start_xen().

2a) We could put memory pool into .bss.page_aligned section. Then allocate
memory chunks starting from the lowest address. After init phase we can
free unused portion of the memory pool as in case of .init.text or 
.init.data
sections. This way we do not need to allocate any space in image file and
freeing of unused area in the memory pool is very simple.

Now #2a solution is implemented because it is quite simple and requires
limited number of changes, especially in __start_xen().

The new allocator is quite generic and can be used on ARM platforms too.
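
For illustration, below is a minimal sketch of the kind of bump allocator
described in #2a; it is close to, but not identical with, the code this
patch adds to xen/common/efi/boot.c (error handling, alignment and the
freeing of the unused tail differ in the real version):

#define EBMALLOC_SIZE  MB(1)

static char __section(".bss.page_aligned") __aligned(PAGE_SIZE)
    ebmalloc_mem[EBMALLOC_SIZE];
static unsigned long __initdata ebmalloc_allocated;

/* Bump-allocate from the lowest address of the static pool. */
static void __init *ebmalloc(size_t size)
{
    void *ptr = ebmalloc_mem + ebmalloc_allocated;

    ebmalloc_allocated += ROUNDUP(size, sizeof(void *));

    if ( ebmalloc_allocated > sizeof(ebmalloc_mem) )
        blexit(L"Out of static memory\r\n");

    return ptr;
}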

Signed-off-by: Daniel Kiper 
Acked-by: Jan Beulich 
Acked-by: Julien Grall 
Reviewed-by: Doug Goldstein 
---
Doug v1 - removed stale paragraph

v11 - suggestions/fixes:
- #ifdef only EBMALLOC_SIZE from ebmalloc machinery
  (suggested by Jan Beulich).

v10 - suggestions/fixes:
- remove unneeded ARM free_ebmalloc_unused_mem() stub.

v9 - suggestions/fixes:
   - call free_ebmalloc_unused_mem() from efi_init_memory()
 instead of xen/arch/arm/setup.c:init_done()
 (suggested by Jan Beulich),
   - improve comments.

v8 - suggestions/fixes:
   - disable whole ebmalloc machinery on ARM platforms,
   - add comment saying what should be done before
 enabling ebmalloc on ARM,
 (suggested by Julien Grall),
   - move ebmalloc code before efi-boot.h inclusion and
 remove unneeded forward declaration
 (suggested by Jan Beulich),
   - remove free_ebmalloc_unused_mem() call from
 xen/arch/arm/setup.c:init_done()
 (suggested by Julien Grall),
   - improve commit message.

v7 - suggestions/fixes:
   - enable most of ebmalloc machinery on ARM platforms
 (suggested by Jan Beulich),
   - remove unneeded cast
 (suggested by Jan Beulich),
   - wrap long line
 (suggested by Jan Beulich),
   - improve commit message.

v6 - suggestions/fixes:
   - optimize ebmalloc allocator,
   - move ebmalloc machinery to xen/common/efi/boot.c
 (suggested by 

[Xen-devel] [PATCH v3 1/5] x86: add multiboot2 protocol support

2017-01-17 Thread Doug Goldstein
From: Daniel Kiper 

Add multiboot2 protocol support. Alter min memory limit handling as we
now may not find it from either multiboot (v1) or multiboot2.

This way we are laying the foundation for EFI + GRUB2 + Xen development.

Signed-off-by: Daniel Kiper 
Reviewed-by: Jan Beulich 
Reviewed-by: Doug Goldstein 
---
v9 - suggestions/fixes:
   - use .L label instead of numeric one in multiboot2 data scanning loop;
 I hope that this change does not invalidate Jan's Reviewed-by
 (suggested by Jan Beulich).

v8 - suggestions/fixes:
   - use sizeof(/) instead of sizeof()
 if it is possible
 (suggested by Jan Beulich).

v7 - suggestions/fixes:
   - rename mbi_mbi/mbi2_mbi to mbi_reloc/mbi2_reloc respectively
 (suggested by Jan Beulich),
   - initialize mbi_out->flags using "|=" instead of "="
 (suggested by Jan Beulich),
   - use sizeof(*mmap_dst) instead of sizeof(memory_map_t)
 if it makes sense
 (suggested by Jan Beulich).

v6 - suggestions/fixes:
   - properly index multiboot2_tag_mmap_t.entries[]
 (suggested by Jan Beulich),
   - do not index mbi_out_mods[] beyond its end
 (suggested by Andrew Cooper),
   - reduce number of casts
 (suggested by Andrew Cooper and Jan Beulich),
   - add braces to increase code readability
 (suggested by Andrew Cooper).

v5 - suggestions/fixes:
   - check multiboot2_tag_mmap_t.entry_size before
 multiboot2_tag_mmap_t.entries[] use
 (suggested by Jan Beulich),
   - properly index multiboot2_tag_mmap_t.entries[]
 (suggested by Jan Beulich),
   - use "type name[]" instad of "type name[0]"
 in xen/include/xen/multiboot2.h
 (suggested by Jan Beulich),
   - remove unneeded comment
 (suggested by Jan Beulich).

v4 - suggestions/fixes:
   - avoid assembly usage in xen/arch/x86/boot/reloc.c,
   - fix boundary check issue and optimize
 for() loops in mbi2_mbi(),
   - move to stdcall calling convention,
   - remove unneeded typeof() from ALIGN_UP() macro
 (suggested by Jan Beulich),
   - add and use NULL definition in xen/arch/x86/boot/reloc.c
 (suggested by Jan Beulich),
   - do not read data beyond the end of multiboot2
 information in xen/arch/x86/boot/head.S
 (suggested by Jan Beulich),
   - add :req to some .macro arguments
 (suggested by Jan Beulich),
   - use cmovcc if possible,
   - add .L to multiboot2_header_end label
 (suggested by Jan Beulich),
   - add .L to multiboot2_proto label
 (suggested by Jan Beulich),
   - improve label names
 (suggested by Jan Beulich).

v3 - suggestions/fixes:
   - reorder reloc() arguments
 (suggested by Jan Beulich),
   - remove .L from multiboot2 header labels
 (suggested by Andrew Cooper, Jan Beulich and Konrad Rzeszutek Wilk),
   - take into account alignment when skipping multiboot2 fixed part
 (suggested by Konrad Rzeszutek Wilk),
   - create modules data if modules count != 0
 (suggested by Jan Beulich),
   - improve macros
 (suggested by Jan Beulich),
   - reduce number of casts
 (suggested by Jan Beulich),
   - use const if possible
 (suggested by Jan Beulich),
   - drop static and __used__ attribute from reloc()
 (suggested by Jan Beulich),
   - remove isolated/stray __packed attribute from
 multiboot2_memory_map_t type definition
 (suggested by Jan Beulich),
   - reformat xen/include/xen/multiboot2.h
 (suggested by Konrad Rzeszutek Wilk),
   - improve comments
 (suggested by Konrad Rzeszutek Wilk),
   - remove hard tabs
 (suggested by Jan Beulich and Konrad Rzeszutek Wilk).

v2 - suggestions/fixes:
   - generate multiboot2 header using macros
 (suggested by Jan Beulich),
   - improve comments
 (suggested by Jan Beulich),
   - simplify assembly in xen/arch/x86/boot/head.S
 (suggested by Jan Beulich),
   - do not include include/xen/compiler.h
 in xen/arch/x86/boot/reloc.c
 (suggested by Jan Beulich),
   - do not read data beyond the end of multiboot2 information
 (suggested by Jan Beulich).

v2 - not fixed yet:
   - dynamic dependency generation for xen/arch/x86/boot/reloc.S;
 this requires more work; I am not sure that it pays because
 potential patch requires more changes than addition of just
 multiboot2.h to Makefile
 (suggested by Jan Beulich),
   - isolated/stray __packed attribute usage for multiboot2_memory_map_t
 (suggested by Jan Beulich).
---
---
 xen/arch/x86/boot/Makefile|   3 +-
 xen/arch/x86/boot/head.S  | 107 +++-
 xen/arch/x86/boot/reloc.c | 148 ++-
 xen/arch/x86/x86_64/asm-offsets.c |   9 ++-
 xen/include/xen/multiboot2.h  | 169 +++-
 5 files changed, 426 insertions(+), 10 deletions(-)
 create mode 100644 xen/include/xen/multiboot2.h

diff --git a/xen/arch/x86/boot/Makefile b/xen/arch/x86/boot/Makefile
index 5fdb5ae..06893d8 100644
--- 

[Xen-devel] [PATCH v3 4/5] x86: add multiboot2 protocol support for EFI platforms

2017-01-17 Thread Doug Goldstein
From: Daniel Kiper 

This way Xen can be loaded on EFI platforms using GRUB2 and
other boot loaders which support multiboot2 protocol.

Signed-off-by: Daniel Kiper 
Reviewed-by: Doug Goldstein 
Tested-by: Doug Goldstein 
---
Doug v2 - dropped all my changes and moved them into their own patch
Doug v1 - fix incorrect assembly (identified by Andrew Cooper)
- fix an issue where the trampoline size was left as 0; because of the
  way the memory for the trampolines is allocated, we would go to the
  end of an available section and then subtract off the size to decide
  where to place them. The end result was that we would always copy the
  trampolines and the 32-bit stack into some form of reserved memory
  after the conventional region we wanted to put things into. On some
  systems this did not manifest as a crash while on others it did.
  Reworked the changes to always reserve 64kb for both the stack and
  the size of the trampolines. Added an ASSERT to make sure we never
  blow through this size.

v10 - suggestions/fixes:
- replace ljmpl with lretq
  (suggested by Andrew Cooper),
- introduce efi_platform to increase code readability
  (suggested by Andrew Cooper).

v9 - suggestions/fixes:
   - use .L labels instead of numeric ones in multiboot2 data scanning loops
 (suggested by Jan Beulich).

v8 - suggestions/fixes:
   - use __bss_start(%rip)/__bss_end(%rip) instead of
 of .startof.(.bss)(%rip)/$.sizeof.(.bss) because
 latter is not tested extensively in different
 built environments yet
 (suggested by Andrew Cooper),
   - fix multiboot2 data scanning loop in x86_32 code
 (suggested by Jan Beulich),
   - add check for extra mem for mbi data if Xen is loaded
 via multiboot2 protocol on EFI platform
 (suggested by Jan Beulich),
   - improve comments
 (suggested by Jan Beulich).

v7 - suggestions/fixes:
   - do not allocate memory twice for the trampoline if we were
  loaded via the multiboot2 protocol on an EFI platform,
   - wrap long line
 (suggested by Jan Beulich),
   - improve comments
 (suggested by Jan Beulich).

v6 - suggestions/fixes:
   - improve label names in assembly
 error printing code
 (suggested by Jan Beulich),
   - improve comments
 (suggested by Jan Beulich),
   - various minor cleanups and fixes
 (suggested by Jan Beulich).

v4 - suggestions/fixes:
   - remove redundant BSS alignment,
   - update BSS alignment check,
   - use __set_bit() instead of set_bit() if possible
 (suggested by Jan Beulich),
   - call efi_arch_cpu() from efi_multiboot2()
  even if the same work is done later in
  another place right now
 (suggested by Jan Beulich),
   - make xen/arch/x86/efi/stub.c:efi_multiboot2()
  fail properly on EFI platforms,
   - do not read data beyond the end of multiboot2
 information in xen/arch/x86/boot/head.S
 (suggested by Jan Beulich),
   - use 32-bit registers in x86_64 code if possible
 (suggested by Jan Beulich),
   - multiboot2 information address is 64-bit
  in x86_64 code, so treat it as is
 (suggested by Jan Beulich),
   - use cmovcc if possible,
   - leave only one space between rep and stosq
 (suggested by Jan Beulich),
   - improve error handling,
   - improve early error messages,
 (suggested by Jan Beulich),
   - improve early error messages printing code,
   - improve label names
 (suggested by Jan Beulich),
   - improve comments
 (suggested by Jan Beulich),
   - various minor cleanups.

v3 - suggestions/fixes:
   - take into account alignment when skipping multiboot2 fixed part
 (suggested by Konrad Rzeszutek Wilk),
   - improve segment registers initialization
 (suggested by Jan Beulich),
   - improve comments
 (suggested by Jan Beulich and Konrad Rzeszutek Wilk),
   - improve commit message
 (suggested by Jan Beulich).

v2 - suggestions/fixes:
   - generate multiboot2 header using macros
 (suggested by Jan Beulich),
   - switch CPU to x86_32 mode before
 jumping to 32-bit code
 (suggested by Andrew Cooper),
   - reduce code changes to increase patch readability
 (suggested by Jan Beulich),
   - improve comments
 (suggested by Jan Beulich),
   - ignore MULTIBOOT2_TAG_TYPE_BASIC_MEMINFO tag on EFI platform
  and compute the multiboot2.mem_lower value on my own,
   - stop execution if EFI platform is detected
 in legacy BIOS path.
---
---
 xen/arch/x86/boot/head.S  | 263 +--
 xen/arch/x86/efi/efi-boot.h   |  54 +-
 xen/arch/x86/efi/stub.c   |  38 -
 xen/arch/x86/x86_64/asm-offsets.c |   2 +-
 xen/arch/x86/xen.lds.S|   4 +-
 xen/common/efi/boot.c |  11 +-
 6 files changed, 349 insertions(+), 23 deletions(-)

diff --git a/xen/arch/x86/boot/head.S b/xen/arch/x86/boot/head.S
index 

[Xen-devel] [PATCH v3 0/5] multiboot2 protocol support

2017-01-17 Thread Doug Goldstein
This is a series based on v11 of Daniel Kiper's
"x86: multiboot2 protocol support" series. It aims to collect up all the
fixes and changes that Andrew Cooper, Jan Beulich and myself discovered in
code review and testing on actual hardware. I've had problems with the
relocation portion of the series, so I've dropped it; all the hardware I
currently need to support for my $EMPLOYER does not load anything at
the 1MB mark. To me this adds MB2 support for all hardware that
doesn't have things located at 1MB, so it's an incremental step. I've also
dropped the early command line conversion to C as it was done in support
of the relocation changes and therefore not necessary. In the end my goal
is to help Daniel out by providing the portion of the series that works
on half a dozen physical machines I've tested with and integrates all
changes as discussed on the v11 thread. The reason I am posting this is that
Daniel has said he won't be able to address feedback and issues identified
for another 2 weeks but my requirements from my $EMPLOYER are more immediate
than that.

Feel free to grab this series at: https://github.com/cardoe/xen/tree/doug-mb2-v3

v3 - address review comments by Jan Beulich. They are contained within 5/5.
v2 - separate my fixes from Daniel's original series
   - add back some ACKs I accidentally dropped

Daniel Kiper (4):
  x86: add multiboot2 protocol support
  efi: build xen.gz with EFI code
  efi: create new early memory allocator
  x86: add multiboot2 protocol support for EFI platforms

Doug Goldstein (1):
  fix: add multiboot2 protocol support for EFI platforms

 xen/arch/x86/Makefile |   2 +-
 xen/arch/x86/boot/Makefile|   3 +-
 xen/arch/x86/boot/head.S  | 361 +--
 xen/arch/x86/boot/reloc.c | 148 -
 xen/arch/x86/efi/Makefile |  12 +-
 xen/arch/x86/efi/efi-boot.h   |  64 +++--
 xen/arch/x86/efi/stub.c   |  38 +++-
 xen/arch/x86/setup.c  |   3 +-
 xen/arch/x86/x86_64/asm-offsets.c |  11 +-
 xen/arch/x86/xen.lds.S|   8 +-
 xen/common/efi/boot.c |  64 +-
 xen/common/efi/runtime.c  |   9 +-
 xen/include/xen/multiboot2.h  | 169 +++-
 13 files changed, 844 insertions(+), 48 deletions(-)
 create mode 100644 xen/include/xen/multiboot2.h

base-commit: 98be5ffc05e689e2131f175ed95b011a7270db67
-- 
git-series 0.9.1

___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel


[Xen-devel] [PATCH v3 2/5] efi: build xen.gz with EFI code

2017-01-17 Thread Doug Goldstein
From: Daniel Kiper 

Build xen.gz with EFI code. We need this to support multiboot2
protocol on EFI platforms.

If we wish to load a non-ELF file using multiboot (v1) or multiboot2, then
it must contain a "linear" (or "flat") representation of code and data.
This is a requirement of both boot protocols. Currently, the PE file contains
many sections which are not "linear" (one after another without any holes)
or which do not even have a representation in the file (e.g. BSS). From the
EFI point of view everything is OK and works. However, this file layout cannot
be properly interpreted by the multiboot protocol family. In theory there is
a chance that we could build a proper PE file (from the multiboot protocols'
POV) using the current build system. However, that would make xen.efi diverge
further from the Xen ELF file (in terms of contents and build method). On the
other hand, ELF has all the needed properties, so it is a good starting
point for further development. Additionally, I think it is also a good
starting point for further xen.efi code and build optimizations. It looks
like there is a chance that we can eventually generate xen.efi directly from
the Xen ELF file using just simple objcopy or another tool. This way we will
have one Xen binary which can be loaded by three boot protocols: EFI native
loader, multiboot (v1) and multiboot2.

Signed-off-by: Daniel Kiper 
Acked-by: Jan Beulich 
Reviewed-by: Doug Goldstein 
---
v6 - suggestions/fixes:
   - improve efi_enabled() checks in efi_runtime_call()
 (suggested by Jan Beulich).

v5 - suggestions/fixes:
   - properly calculate efi symbol address in
 xen/arch/x86/xen.lds.S (I hope that this
 change does not invalidate Jan's ACK).

v4 - suggestions/fixes:
   - functions should return -ENOSYS instead
 of -EOPNOTSUPP if EFI runtime services
 are not available
 (suggested by Jan Beulich),
   - remove stale bits from xen/arch/x86/Makefile
 (suggested by Jan Beulich).

v3 - suggestions/fixes:
   - check for EFI platform in EFI code
 (suggested by Jan Beulich),
   - fix Makefiles
 (suggested by Jan Beulich),
   - improve commit message
 (suggested by Jan Beulich).

v2 - suggestions/fixes:
   - build EFI code only if it is supported in a given build environment
 (suggested by Jan Beulich).
---
---
 xen/arch/x86/Makefile |  2 +-
 xen/arch/x86/efi/Makefile | 12 
 xen/arch/x86/xen.lds.S|  4 ++--
 xen/common/efi/boot.c |  3 +++
 xen/common/efi/runtime.c  |  9 +
 5 files changed, 19 insertions(+), 11 deletions(-)

diff --git a/xen/arch/x86/Makefile b/xen/arch/x86/Makefile
index 7f6b5d7..2e22cdf 100644
--- a/xen/arch/x86/Makefile
+++ b/xen/arch/x86/Makefile
@@ -219,6 +219,6 @@ efi/mkreloc: efi/mkreloc.c
 clean::
rm -f asm-offsets.s *.lds boot/*.o boot/*~ boot/core boot/mkelf32
rm -f $(BASEDIR)/.xen-syms.[0-9]* boot/.*.d
-   rm -f $(BASEDIR)/.xen.efi.[0-9]* efi/*.o efi/.*.d efi/*.efi 
efi/disabled efi/mkreloc
+   rm -f $(BASEDIR)/.xen.efi.[0-9]* efi/*.efi efi/disabled efi/mkreloc
rm -f boot/reloc.S boot/reloc.lnk boot/reloc.bin
rm -f note.o
diff --git a/xen/arch/x86/efi/Makefile b/xen/arch/x86/efi/Makefile
index ad3fdf7..442f3fc 100644
--- a/xen/arch/x86/efi/Makefile
+++ b/xen/arch/x86/efi/Makefile
@@ -1,18 +1,14 @@
 CFLAGS += -fshort-wchar
 
-obj-y += stub.o
-
-create = test -e $(1) || touch -t 19990101 $(1)
-
 efi := y$(shell rm -f disabled)
 efi := $(if $(efi),$(shell $(CC) $(filter-out $(CFLAGS-y) .%.d,$(CFLAGS)) -c 
check.c 2>disabled && echo y))
 efi := $(if $(efi),$(shell $(LD) -mi386pep --subsystem=10 -o check.efi check.o 
2>disabled && echo y))
-efi := $(if $(efi),$(shell rm disabled)y,$(shell $(call create,boot.init.o); 
$(call create,runtime.o)))
-
-extra-$(efi) += boot.init.o relocs-dummy.o runtime.o compat.o buildid.o
+efi := $(if $(efi),$(shell rm disabled)y)
 
 %.o: %.ihex
$(OBJCOPY) -I ihex -O binary $< $@
 
-stub.o: $(extra-y)
+obj-y := stub.o
+obj-$(efi) := boot.init.o compat.o relocs-dummy.o runtime.o
+extra-$(efi) += buildid.o
 nogcov-$(efi) += stub.o
diff --git a/xen/arch/x86/xen.lds.S b/xen/arch/x86/xen.lds.S
index 7676de9..b0b1c9b 100644
--- a/xen/arch/x86/xen.lds.S
+++ b/xen/arch/x86/xen.lds.S
@@ -270,10 +270,10 @@ SECTIONS
   .pad : {
 . = ALIGN(MB(16));
   } :text
-#else
-  efi = .;
 #endif
 
+  efi = DEFINED(efi) ? efi : .;
+
   /* Sections to be discarded */
   /DISCARD/ : {
*(.exit.text)
diff --git a/xen/common/efi/boot.c b/xen/common/efi/boot.c
index 3e5e4ab..df8c702 100644
--- a/xen/common/efi/boot.c
+++ b/xen/common/efi/boot.c
@@ -1251,6 +1251,9 @@ void __init efi_init_memory(void)
 } *extra, *extra_head = NULL;
 #endif
 
+if ( !efi_enabled(EFI_BOOT) )
+return;
+
 printk(XENLOG_INFO "EFI memory map:%s\n",
map_bs ? " (mapping BootServices)" : "");
 for ( i = 0; i < efi_memmap_size; i += efi_mdesc_size )
diff --git 

[Xen-devel] [qemu-mainline test] 104208: tolerable FAIL - PUSHED

2017-01-17 Thread osstest service owner
flight 104208 qemu-mainline real [real]
http://logs.test-lab.xenproject.org/osstest/logs/104208/

Failures :-/ but no regressions.

Regressions which are regarded as allowable (not blocking):
 test-armhf-armhf-libvirt 15 guest-start/debian.repeatfail  like 104176
 test-armhf-armhf-libvirt-xsm 13 saverestore-support-checkfail  like 104178
 test-armhf-armhf-libvirt 13 saverestore-support-checkfail  like 104178
 test-amd64-amd64-xl-qemuu-win7-amd64 16 guest-stopfail like 104178
 test-amd64-i386-xl-qemuu-win7-amd64 16 guest-stop fail like 104178
 test-armhf-armhf-xl-rtds 15 guest-start/debian.repeatfail  like 104178
 test-armhf-armhf-libvirt-qcow2 12 saverestore-support-check   fail like 104178
 test-armhf-armhf-libvirt-raw 12 saverestore-support-checkfail  like 104178
 test-amd64-amd64-xl-rtds  9 debian-install   fail  like 104178

Tests which did not succeed, but are not blocking:
 test-amd64-amd64-xl-pvh-amd  11 guest-start  fail   never pass
 test-amd64-amd64-xl-pvh-intel 11 guest-start  fail  never pass
 test-amd64-i386-libvirt  12 migrate-support-checkfail   never pass
 test-amd64-i386-libvirt-xsm  12 migrate-support-checkfail   never pass
 test-amd64-amd64-libvirt-xsm 12 migrate-support-checkfail   never pass
 test-amd64-amd64-libvirt 12 migrate-support-checkfail   never pass
 test-amd64-i386-libvirt-qemuu-debianhvm-amd64-xsm 10 migrate-support-check 
fail never pass
 test-amd64-amd64-libvirt-qemuu-debianhvm-amd64-xsm 10 migrate-support-check 
fail never pass
 test-amd64-amd64-libvirt-vhd 11 migrate-support-checkfail   never pass
 test-amd64-amd64-qemuu-nested-amd 16 debian-hvm-install/l1/l2  fail never pass
 test-armhf-armhf-xl-xsm  12 migrate-support-checkfail   never pass
 test-armhf-armhf-xl-xsm  13 saverestore-support-checkfail   never pass
 test-armhf-armhf-xl  12 migrate-support-checkfail   never pass
 test-armhf-armhf-xl  13 saverestore-support-checkfail   never pass
 test-armhf-armhf-libvirt-xsm 12 migrate-support-checkfail   never pass
 test-armhf-armhf-libvirt 12 migrate-support-checkfail   never pass
 test-armhf-armhf-xl-cubietruck 12 migrate-support-checkfail never pass
 test-armhf-armhf-xl-cubietruck 13 saverestore-support-checkfail never pass
 test-armhf-armhf-xl-arndale  12 migrate-support-checkfail   never pass
 test-armhf-armhf-xl-arndale  13 saverestore-support-checkfail   never pass
 test-armhf-armhf-xl-rtds 12 migrate-support-checkfail   never pass
 test-armhf-armhf-xl-rtds 13 saverestore-support-checkfail   never pass
 test-armhf-armhf-xl-multivcpu 12 migrate-support-checkfail  never pass
 test-armhf-armhf-xl-multivcpu 13 saverestore-support-checkfail  never pass
 test-armhf-armhf-xl-credit2  12 migrate-support-checkfail   never pass
 test-armhf-armhf-xl-credit2  13 saverestore-support-checkfail   never pass
 test-armhf-armhf-libvirt-qcow2 11 migrate-support-checkfail never pass
 test-armhf-armhf-libvirt-raw 11 migrate-support-checkfail   never pass
 test-armhf-armhf-xl-vhd  11 migrate-support-checkfail   never pass
 test-armhf-armhf-xl-vhd  12 saverestore-support-checkfail   never pass

version targeted for testing:
 qemuua8c611e1133f97c979922f41103f79309339dc27
baseline version:
 qemuub6af8ea60282df514f87d32e36afd1c9aeee28c8

Last test of basis   104178  2017-01-14 03:53:39 Z3 days
Failing since104191  2017-01-16 13:12:51 Z1 days3 attempts
Testing same since   104208  2017-01-17 12:43:29 Z0 days1 attempts


People who touched revisions under test:
  Alex Bennée 
  David Gibson 
  Laurent Vivier 
  Peter Maydell 
  Richard Henderson 
  Thomas Huth 

jobs:
 build-amd64-xsm  pass
 build-armhf-xsm  pass
 build-i386-xsm   pass
 build-amd64  pass
 build-armhf  pass
 build-i386   pass
 build-amd64-libvirt  pass
 build-armhf-libvirt  pass
 build-i386-libvirt   pass
 build-amd64-pvopspass
 build-armhf-pvopspass
 build-i386-pvops pass
 

Re: [Xen-devel] STAO spec in xen.git

2017-01-17 Thread Stefano Stabellini
On Tue, 17 Jan 2017, Olaf Hering wrote:
> On Fri, Jan 13, Julien Grall wrote:
> 
> > > Regarding the format: will ODT allow git to do proper diffs?
> 
> There is flat ODT: "Save as ..." and pick that format from the pulldown
> menu.

Sounds like a good idea. Can you submit a patch?

___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel


[Xen-devel] [ovmf baseline-only test] 68383: tolerable trouble: blocked/broken

2017-01-17 Thread Platform Team regression test user
This run is configured for baseline tests only.

flight 68383 ovmf real [real]
http://osstest.xs.citrite.net/~osstest/testlogs/logs/68383/

Failures :-/ but no regressions.

Regressions which are regarded as allowable (not blocking):
 build-i3863 host-install(3)   broken baseline untested
 build-amd64   3 host-install(3)   broken baseline untested
 build-i386-pvops  3 host-install(3)   broken baseline untested
 build-i386-xsm3 host-install(3)   broken baseline untested
 build-amd64-pvops 3 host-install(3)   broken baseline untested
 build-amd64-xsm   3 host-install(3)   broken baseline untested

Tests which did not succeed, but are not blocking:
 test-amd64-amd64-xl-qemuu-ovmf-amd64  1 build-check(1) blocked n/a
 build-amd64-libvirt   1 build-check(1)   blocked  n/a
 test-amd64-i386-xl-qemuu-ovmf-amd64  1 build-check(1)  blocked n/a
 build-i386-libvirt1 build-check(1)   blocked  n/a

version targeted for testing:
 ovmf 315d9d08fd77db1024ccc5307823da8aaed85e2f
baseline version:
 ovmf 2b631390f9f5f6971c3c8a7f0f47160b80cf072b

Last test of basis68381  2017-01-17 08:46:09 Z0 days
Testing same since68383  2017-01-17 14:46:21 Z0 days1 attempts


People who touched revisions under test:
  Gary Lin 
  Laszlo Ersek 

jobs:
 build-amd64-xsm  broken  
 build-i386-xsm   broken  
 build-amd64  broken  
 build-i386   broken  
 build-amd64-libvirt  blocked 
 build-i386-libvirt   blocked 
 build-amd64-pvopsbroken  
 build-i386-pvops broken  
 test-amd64-amd64-xl-qemuu-ovmf-amd64 blocked 
 test-amd64-i386-xl-qemuu-ovmf-amd64  blocked 



sg-report-flight on osstest.xs.citrite.net
logs: /home/osstest/logs
images: /home/osstest/images

Logs, config files, etc. are available at
http://osstest.xs.citrite.net/~osstest/testlogs/logs

Test harness code can be found at
http://xenbits.xensource.com/gitweb?p=osstest.git;a=summary

broken-step build-i386 host-install(3)
broken-step build-amd64 host-install(3)
broken-step build-i386-pvops host-install(3)
broken-step build-i386-xsm host-install(3)
broken-step build-amd64-pvops host-install(3)
broken-step build-amd64-xsm host-install(3)

Push not applicable.


commit 315d9d08fd77db1024ccc5307823da8aaed85e2f
Author: Gary Lin 
Date:   Tue Jan 17 12:52:32 2017 +0800

OvmfPkg: pull in TLS modules with -D TLS_ENABLE (also enabling HTTPS)

This commit introduces a new build option, TLS_ENABLE, to pull in the
TLS-related modules. If HTTP_BOOT_ENABLE and TLS_ENABLE are enabled at
the same time, the HTTP driver locates the TLS protocols automatically
and thus HTTPS is enabled.

To build OVMF with HTTP Boot:

$ ./build.sh -D HTTP_BOOT_ENABLE

To build OVMF with HTTPS Boot:

$ ./build.sh -D HTTP_BOOT_ENABLE -D TLS_ENABLE

Cc: Laszlo Ersek 
Cc: Justen Jordan L 
Cc: Wu Jiaxin 
Cc: Long Qin 
Contributed-under: TianoCore Contribution Agreement 1.0
Signed-off-by: Gary Lin 
Reviewed-by: Wu Jiaxin 
Reviewed-by: Laszlo Ersek 

commit 32e22f20c985b6053fe286ef95882ded73b8b398
Author: Gary Lin 
Date:   Tue Jan 17 12:52:31 2017 +0800

OvmfPkg: correct the IScsiDxe module included for the IPv6 stack

Always use IScsiDxe from NetworkPkg when IPv6 is enabled since it provides
the complete ISCSI support.

NOTE: This makes OpenSSL a hard requirement when NETWORK_IP6_ENABLE is
  true.

(Based on Jiaxin's suggestion)

Cc: Laszlo Ersek 
Cc: Justen Jordan L 
Cc: Wu Jiaxin 
Cc: Long Qin 
Contributed-under: TianoCore Contribution Agreement 1.0
Signed-off-by: Gary Lin 
Reviewed-by: Wu Jiaxin 
Reviewed-by: Laszlo Ersek 
[ler...@redhat.com: update subject line]
Signed-off-by: Laszlo Ersek 

commit 6d0f8941bdc2900085e7c1ab06df5549c10b91dd
Author: Gary Lin 
Date:   Tue Jan 

Re: [Xen-devel] Xen ARM - Exposing a PL011 to the guest

2017-01-17 Thread Stefano Stabellini
On Tue, 17 Jan 2017, Julien Grall wrote:
> Hi,
> 
> Sorry for the late answer, I am just back from holidays and still catching-up
> with my e-mails.
> 
> On 03/01/17 20:08, Stefano Stabellini wrote:
> > On Thu, 29 Dec 2016, Bhupinder Thakur wrote:
> > > On 28 December 2016 at 23:19, Julien Grall  wrote:
> > > > On 21/12/16 22:12, Stefano Stabellini wrote:
> > > > > 
> > > > > On Wed, 21 Dec 2016, Julien Grall wrote:
> > > > > > 
> > > > > > On 20/12/2016 20:53, Stefano Stabellini wrote:
> > > > > > > 
> > > > > > > On Tue, 20 Dec 2016, Julien Grall wrote:
> > > > > > > > 
> > > > > > > > On 19/12/2016 21:24, Stefano Stabellini wrote:
> > > > > > > > > 
> > > > > > > > > On Mon, 19 Dec 2016, Christoffer Dall wrote:
> > > > > > > > > > 
> > > > > > > > > > On Fri, Dec 16, 2016 at 05:03:13PM +, Julien Grall
> > > > > > > > > > wrote:
> > > > > > > > > 
> > > > > > > > > If we use hvm_params for this, we need two new hvm_params and
> > > > > > > > > Xen
> > > > > > > > > needs
> > > > > > > > > to unmap the pfn from the guest immediately, because we don't
> > > > > > > > > want the
> > > > > > > > > guest to have access to it.
> > > > > > > > 
> > > > > > > > 
> > > > > > > > If you unmap the pfn, the PV backend will not be able to request
> > > > > > > > the
> > > > > > > > page
> > > > > > > > because there will be no translation available.
> > > > > > > > 
> > > > > > > > So what you want to do is prevent the guest from at least writing
> > > > > > > > into the
> > > > > > > > region
> > > > > > > > (not sure if it is worth restricting reads)
> > > > > > > 
> > > > > > > 
> > > > > > > That's a good idea.
> > > > > > > 
> > > > > > > 
> > > > > > > > and unmap the page via the hypercall
> > > > > > > > XENMEM_decrease_reservation.
> > > > > > > 
> > > > > > > 
> > > > > > > That would be issued by the guest itself, right? To save address
> > > > > > > space?
> > > > > > 
> > > > > > 
> > > > > > Correct. The main use case today is ballooning, but guest could call
> > > > > > it
> > > > > > on any
> > > > > > other RAM-backed page.
> > > > > > 
> > > > > > I was thinking about more about the protection needed. Technically
> > > > > > the
> > > > > > data in
> > > > > > the ring are not trusted. So if the guest is messing up with it, it
> > > > > > would
> > > > > > not
> > > > > > be a big issue. Or did I miss anything here?
> > > > > 
> > > > > 
> > > > > I understand that a guest would be smart to call
> > > > > XENMEM_decrease_reservation on the PV console page for pl011, but it
> > > > > cannot be a security measure, because, in fact, it needs to be called
> > > > > by
> > > > > the guest.  Of course, a malicious guest can simply not call
> > > > > XENMEM_decrease_reservation for it.
> > > > 
> > > > 
> > > > Sorry I was not clear. I was not suggesting the guest call
> > > > XENMEM_decrease_reservation on the ring for security, but rather a malicious guest
> > > > issuing the hypercall on the protected ring and replacing it by another
> > > > page.
> > > > 
> > > > This is the exact same problem as the one I mentioned on the ITS thread.
> > > > The
> > > > page live in guest memory but contains data that will only be touched by
> > > > Xen.
> > > > 
> > > > If you remove those page from stage-2, the translation IPA -> MFN will
> > > > be
> > > > lost unless you store somewhere else. You would have to do it per-page
> > > > as
> > > > the buffer will use contiguous IPA but potentially noncontiguous MFN.
> > > > 
> > > > In the case of ITS the memory is provisioned by the guest. So there are
> > > > not
> > > > much to do there except adding protection in stage-2 such as write
> > > > protection and preventing the guest to unmap it. However for the pl011
> > > > ring,
> > > > as Andrew pointed on IRC, what we need to do is accounting this page to
> > > > the
> > > > domain memory. No mapping is necessary in stage-2.
> > > 
> > > Please clarify what is meant by that no stage-2 mapping is required.
> > > Does it mean that no stage-2 mapping is required for the guest as it
> > > never needs to access this page?
> > 
> > That's right.
> > 
> > 
> > > However, the Xen HYP will need the stage-2 mapping to find out the
> > > pl011 PFN --> physical MFN mapping so that it can map the page to its
> > > own address space. Currently, I am using prepare_ring_for_helper () to
> > > map the pl011 PFN (passed via hvm call) ---> phyiscal MFN ---> Xen HYP
> > > VA.
> > 
> > I am not sure what Julien had in mind exactly. I like the idea of not
> > mapping the page at stage-2, but it is true that many interfaces expect
> > pfns. If Xen is the one to allocate the pl011 PV console page, then Xen
> > knows the mfn and could use it to map the page, instead of the pfn.
> > However, the PV console backend also needs to map the same page, and it
> > currently does that by calling xc_map_foreign_range, which I believe
> > also expect a pfn.
> 
> Do you agree that page such as ioreq and the pl011 PV console are only used
> for 

Re: [Xen-devel] [PATCH] xen/common: Drop function calls for Xen compile/version information

2017-01-17 Thread Konrad Rzeszutek Wilk
On Tue, Jan 17, 2017 at 01:42:54PM -0500, Konrad Rzeszutek Wilk wrote:
> On Tue, Jan 17, 2017 at 06:16:36PM +, Andrew Cooper wrote:
> > On 17/01/17 18:05, Konrad Rzeszutek Wilk wrote:
> > > On Mon, Jan 16, 2017 at 01:04:09PM +, Andrew Cooper wrote:
> > >> The changeset/version/compile information is currently exported as a set 
> > >> of
> > >> function calls into a separate translation unit, which is inefficient 
> > >> for all
> > >> callers.
> > >>
> > >> Replace the function calls with externs pointing appropriately into 
> > >> .rodata,
> > >> which allows all users to generate code referencing the data directly.
> > >>
> > >> No functional change, but causes smaller and more efficient compiled 
> > >> code.
> > >>
> > >> Signed-off-by: Andrew Cooper 
> > > Ah crud. That breaks the livepatch test-cases (they patch the 
> > > xen_extra_version
> > > function).
> > 
> > Lucky I haven't pushed it then, (although the livepatch build seems to
> > still work fine for me, despite this change.)
> 
> make tests should fail.

(which is not built by default, as requested by Jan - but the
OSSTest test cases I am working on would do this).
> 
> > 
> > > Is there some other code that can be modified that is reported
> > > by 'xl info', on which the test-cases can run (and reported easily)?
> > 
> > Patch do_version() itself to return the same difference of information?
> 
> Ugh. That is going to make the building of test-cases quite complex.
> 
> I guess it can just do it and .. return only one value :-)

As in something like this (not compile tested):


diff --git a/xen/arch/x86/test/xen_hello_world_func.c 
b/xen/arch/x86/test/xen_hello_world_func.c
index 2e4af9c..3572600 100644
--- a/xen/arch/x86/test/xen_hello_world_func.c
+++ b/xen/arch/x86/test/xen_hello_world_func.c
@@ -10,7 +10,7 @@
 static unsigned long *non_canonical_addr = (unsigned long 
*)0xdeadULL;
 
 /* Our replacement function for xen_extra_version. */
-const char *xen_hello_world(void)
+long xen_version_hello_world(int cmd, XEN_GUEST_HANDLE_PARAM(void) arg)
 {
 unsigned long tmp;
 int rc;
@@ -21,7 +21,19 @@ const char *xen_hello_world(void)
 rc = __get_user(tmp, non_canonical_addr);
 BUG_ON(rc != -EFAULT);
 
-return "Hello World";
+if ( cmd == XENVER_extraversion )
+{
+xen_extraversion_t extraversion = "Hello World";
+
+if ( copy_to_guest(arg, extraversion, ARRAY_SIZE(extraversion)) )
+return -EFAULT;
+return 0;
+}
+/*
+ * Can't return -EPERM as certain subversions can't deal with negative
+ * values.
+ */
+return 0;
 }


That will make three of the test-patches work but not for the last
one - the NOP one (which needs to patch the _whole_ function).

Argh.

Is this patch of yours that necessary? Could at least some of the
functions still exist?

Like 

xen_minor_version() and xen_extra_version() ?

___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel


[Xen-devel] [xen-unstable test] 104202: tolerable FAIL - PUSHED

2017-01-17 Thread osstest service owner
flight 104202 xen-unstable real [real]
http://logs.test-lab.xenproject.org/osstest/logs/104202/

Failures :-/ but no regressions.

Regressions which are regarded as allowable (not blocking):
 test-armhf-armhf-libvirt 13 saverestore-support-checkfail  like 104181
 test-armhf-armhf-libvirt-xsm 13 saverestore-support-checkfail  like 104181
 test-amd64-i386-xl-qemuu-win7-amd64 16 guest-stop fail like 104181
 test-amd64-i386-xl-qemut-win7-amd64 16 guest-stop fail like 104181
 test-amd64-amd64-xl-qemut-win7-amd64 16 guest-stopfail like 104181
 test-armhf-armhf-libvirt-qcow2 12 saverestore-support-check   fail like 104181
 test-amd64-amd64-xl-qemuu-win7-amd64 16 guest-stopfail like 104181
 test-armhf-armhf-libvirt-raw 12 saverestore-support-checkfail  like 104181
 test-amd64-amd64-xl-rtds  9 debian-install   fail  like 104181
 test-armhf-armhf-xl-rtds 15 guest-start/debian.repeatfail  like 104181

Tests which did not succeed, but are not blocking:
 test-amd64-amd64-xl-pvh-amd  11 guest-start  fail   never pass
 test-amd64-amd64-xl-pvh-intel 11 guest-start  fail  never pass
 test-amd64-i386-libvirt  12 migrate-support-checkfail   never pass
 test-amd64-i386-libvirt-xsm  12 migrate-support-checkfail   never pass
 test-amd64-amd64-libvirt-xsm 12 migrate-support-checkfail   never pass
 test-amd64-amd64-libvirt 12 migrate-support-checkfail   never pass
 test-amd64-i386-libvirt-qemuu-debianhvm-amd64-xsm 10 migrate-support-check 
fail never pass
 test-amd64-amd64-libvirt-qemuu-debianhvm-amd64-xsm 10 migrate-support-check 
fail never pass
 test-armhf-armhf-xl-arndale  12 migrate-support-checkfail   never pass
 test-armhf-armhf-xl-arndale  13 saverestore-support-checkfail   never pass
 test-amd64-amd64-libvirt-vhd 11 migrate-support-checkfail   never pass
 test-amd64-amd64-qemuu-nested-amd 16 debian-hvm-install/l1/l2  fail never pass
 test-armhf-armhf-xl-cubietruck 12 migrate-support-checkfail never pass
 test-armhf-armhf-xl-cubietruck 13 saverestore-support-checkfail never pass
 test-armhf-armhf-xl-credit2  12 migrate-support-checkfail   never pass
 test-armhf-armhf-xl-credit2  13 saverestore-support-checkfail   never pass
 test-armhf-armhf-libvirt 12 migrate-support-checkfail   never pass
 test-armhf-armhf-libvirt-xsm 12 migrate-support-checkfail   never pass
 test-armhf-armhf-libvirt-qcow2 11 migrate-support-checkfail never pass
 test-armhf-armhf-xl-multivcpu 12 migrate-support-checkfail  never pass
 test-armhf-armhf-xl-multivcpu 13 saverestore-support-checkfail  never pass
 test-armhf-armhf-xl-xsm  12 migrate-support-checkfail   never pass
 test-armhf-armhf-xl-xsm  13 saverestore-support-checkfail   never pass
 test-armhf-armhf-xl  12 migrate-support-checkfail   never pass
 test-armhf-armhf-xl  13 saverestore-support-checkfail   never pass
 test-armhf-armhf-xl-rtds 12 migrate-support-checkfail   never pass
 test-armhf-armhf-xl-rtds 13 saverestore-support-checkfail   never pass
 test-armhf-armhf-libvirt-raw 11 migrate-support-checkfail   never pass
 test-armhf-armhf-xl-vhd  11 migrate-support-checkfail   never pass
 test-armhf-armhf-xl-vhd  12 saverestore-support-checkfail   never pass

version targeted for testing:
 xen  c33b5f013db3460c07c017dea45a1c010c3dacc0
baseline version:
 xen  98be5ffc05e689e2131f175ed95b011a7270db67

Last test of basis   104187  2017-01-16 01:55:18 Z1 days
Testing same since   104197  2017-01-17 01:57:57 Z0 days2 attempts


People who touched revisions under test:
  Andrew Cooper 
  He Chen 
  Stefano Stabellini 
  Wei Liu 

jobs:
 build-amd64-xsm  pass
 build-armhf-xsm  pass
 build-i386-xsm   pass
 build-amd64-xtf  pass
 build-amd64  pass
 build-armhf  pass
 build-i386   pass
 build-amd64-libvirt  pass
 build-armhf-libvirt  pass
 build-i386-libvirt   pass
 build-amd64-oldkern  pass
 build-i386-oldkern   pass
 build-amd64-prev pass
 build-i386-prev  

Re: [Xen-devel] PVH CPU hotplug design document

2017-01-17 Thread Boris Ostrovsky
On 01/17/2017 12:45 PM, Roger Pau Monné wrote:
> On Tue, Jan 17, 2017 at 10:50:44AM -0500, Boris Ostrovsky wrote:
>> On 01/17/2017 10:33 AM, Jan Beulich wrote:
>> On 17.01.17 at 16:27,  wrote:
 On 01/17/2017 09:44 AM, Jan Beulich wrote:
 On 17.01.17 at 15:13,  wrote:
>> There's only one kind of PVHv2 guest that doesn't require ACPI, and that 
>> guest
>> type also doesn't have emulated local APICs. We agreed that this model 
>> was
>> interesting for things like unikernel DomUs, but that's the only 
>> reason why
>> we are providing it. Not that full OSes couldn't use it, but it seems
>> pointless.
> You writing things this way makes me notice another possible design
> issue here: Requiring ACPI is a bad thing imo, with even bare hardware
> going different directions for at least some use cases (SFI being one
> example). Hence I think ACPI should - like on bare hardware - remain
> an optional thing. Which in turn requires _all_ information obtained from
> ACPI (if available) to also be available another way. And this other
> way might be hypercalls in our case.
 At the risk of derailing this thread: why do we need vCPU hotplug for
 dom0 in the first place? What do we gain over "echo {1|0} >
 /sys/devices/system/cpu/cpuX/online" ?

 I can see why this may be needed for domUs where Xen can enforce number
 of vCPUs that are allowed to run (which we don't enforce now anyway) but
 why for dom0?
>>> Good that you now ask this too - that's the PV hotplug mechanism,
>>> and I've been saying all the time that this should be just fine for PVH
>>> (Dom0 and DomU).
>> I think domU hotplug has some value in that we can change the number of VCPUs
>> that the guest sees and ACPI-based hotplug allows us to do that in a
>> "standard" manner.
>>
>> For dom0 this doesn't seem to be necessary as it's a special domain
>> available only to platform administrator.
>>
>> Part of confusion I think is because PV hotplug is not hotplug, really,
>> as far as Linux kernel is concerned.
> Hm, I'm not really sure I'm following, but I think that we could translate 
> this
> Dom0 PV hotplug mechanism to PVH as:
>
>  - Dom0 is provided with up to HVM_MAX_VCPUS local APIC entries in the MADT, 
> and
>the entries > dom0_max_vcpus are marked as disabled.
>  - Dom0 has HVM_MAX_VCPUS vCPUs ready to be started, either by using the local
>APIC or an hypercall.
>
> Would that match what's done for classic PV Dom0?

To match what we have for PV dom0 I believe you'd provide MADT with
opt_dom0_max_vcpus_max entries and mark all of them enabled.

dom0 brings up all opt_dom0_max_vcpus_max VCPUs, and then offlines
(opt_dom0_max_vcpus_min+1)..opt_dom0_max_vcpus_max. See
drivers/xen/cpu_hotplug.c:setup_cpu_watcher(). That's why I said it's
not a hotplug but rather on/off-lining.

-boris


___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH] xen/common: Drop function calls for Xen compile/version information

2017-01-17 Thread Konrad Rzeszutek Wilk
On Tue, Jan 17, 2017 at 06:16:36PM +, Andrew Cooper wrote:
> On 17/01/17 18:05, Konrad Rzeszutek Wilk wrote:
> > On Mon, Jan 16, 2017 at 01:04:09PM +, Andrew Cooper wrote:
> >> The changeset/version/compile information is currently exported as a set of
> >> function calls into a separate translation unit, which is inefficient for 
> >> all
> >> callers.
> >>
> >> Replace the function calls with externs pointing appropriately into 
> >> .rodata,
> >> which allows all users to generate code referencing the data directly.
> >>
> >> No functional change, but causes smaller and more efficient compiled code.
> >>
> >> Signed-off-by: Andrew Cooper 
> > Ah crud. That breaks the livepatch test-cases (they patch the 
> > xen_extra_version
> > function).
> 
> Lucky I haven't pushed it then, (although the livepatch build seems to
> still work fine for me, despite this change.)

make tests should fail.

> 
> > Is there some other code that can be modified that is reported
> > by 'xl info', on which the test-cases can run (and reported easily)?
> 
> Patch do_version() itself to return the same difference of information?

Ugh. That is going to make the building of test-cases quite complex.

I guess it can just do it and .. return only one value :-)

> 
> ~Andrew

___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH] xen/common: Drop function calls for Xen compile/version information

2017-01-17 Thread Andrew Cooper
On 17/01/17 18:05, Konrad Rzeszutek Wilk wrote:
> On Mon, Jan 16, 2017 at 01:04:09PM +, Andrew Cooper wrote:
>> The changeset/version/compile information is currently exported as a set of
>> function calls into a separate translation unit, which is inefficient for all
>> callers.
>>
>> Replace the function calls with externs pointing appropriately into .rodata,
>> which allows all users to generate code referencing the data directly.
>>
>> No functional change, but causes smaller and more efficient compiled code.
>>
>> Signed-off-by: Andrew Cooper 
> Ah crud. That breaks the livepatch test-cases (they patch the 
> xen_extra_version
> function).

Lucky I haven't pushed it then, (although the livepatch build seems to
still work fine for me, despite this change.)

> Is there some other code that can be modified that is reported
> by 'xl info', on which the test-cases can run (and reported easily)?

Patch do_version() itself to return the same difference of information?

~Andrew

___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH v2 2/2] swiotlb-xen: implement xen_swiotlb_get_sgtable callback

2017-01-17 Thread Konrad Rzeszutek Wilk
On Mon, Jan 16, 2017 at 05:09:24PM -0800, Stefano Stabellini wrote:
> On Mon, 16 Jan 2017, Andrii Anisov wrote:
> > From: Andrii Anisov 
> > 
> > Signed-off-by: Andrii Anisov 
> 
> Thanks for the patch!
> 
> 
> >  arch/arm/xen/mm.c | 11 +++
> >  1 file changed, 11 insertions(+)
> > 
> > diff --git a/arch/arm/xen/mm.c b/arch/arm/xen/mm.c
> > index ff812a2..dc83a35 100644
> > --- a/arch/arm/xen/mm.c
> > +++ b/arch/arm/xen/mm.c
> > @@ -176,6 +176,16 @@ static int xen_swiotlb_dma_mmap(struct device *dev, 
> > struct vm_area_struct *vma,
> > return dma_common_mmap(dev, vma, cpu_addr, dma_addr, size);
> >  }
> 
> As for the other patch, I would move xen_swiotlb_get_sgtable to
> drivers/xen/swiotlb-xen.c, if Konrad agrees.

That is fine.
> 
> 
> > +static int xen_swiotlb_get_sgtable(struct device *dev, struct sg_table 
> > *sgt,
> > +void *cpu_addr, dma_addr_t handle, size_t size,
> > +unsigned long attrs)
> > +{
> > +   if (__generic_dma_ops(dev)->get_sgtable)
> > +   return __generic_dma_ops(dev)->get_sgtable(dev, sgt, cpu_addr, 
> > handle,
> > +size, attrs);
> > +   return dma_common_get_sgtable(dev, sgt, cpu_addr, handle, size);
> > +}
> > +
> 
> __generic_dma_ops(dev)->get_sgtable on ARM is implemented by
> arm_dma_get_sgtable, which doesn't work on foreign pages (pages for
> which bfn != pfn).
> 
> If get_sgtable is guaranteed to be always called passing references to
> pages previously allocated with dma_alloc_coherent, then we don't have
> any issues, because those can't be foreign pages. I suggest we add an
> in-code comment to explain why this is safe, as for the previous patch.
> I think this is the case, but I am not 100% sure.
> 
> On the other hand, if this function can be called passing as parameters
> cpu_addr and handle that could potentially refer to a foreign page, then
> we have a problem. On ARM, virt_to_phys doesn't work on some pages, in
> fact that is the reason why ARM has its own separate get_sgtable
> implementation (arm_dma_get_sgtable). But with Xen foreign pages,
> dma_to_pfn doesn't work either, because we have no way of finding out
> the pfn address corresponding to the mfn of the foreign page. Both
> arm_dma_get_sgtable and dma_common_get_sgtable wouldn't work. I have no
> solution to this problem, but maybe we could add a check like the
> following (also to the previous patch?). I haven't tested it, but I
> think it should work as long as page_is_ram returns the correct value
> for the handle parameter.
> 
> Signed-off-by: Stefano Stabellini 
> 
> diff --git a/arch/arm/xen/mm.c b/arch/arm/xen/mm.c
> index dc83a35..cd0441c 100644
> --- a/arch/arm/xen/mm.c
> +++ b/arch/arm/xen/mm.c
> @@ -18,6 +18,7 @@
>  #include 
>  #include 
>  
> +#include 
>  #include 
>  #include 
>  #include 
> @@ -180,9 +181,18 @@ static int xen_swiotlb_get_sgtable(struct device *dev, 
> struct sg_table *sgt,
>void *cpu_addr, dma_addr_t handle, size_t size,
>unsigned long attrs)
>  {
> - if (__generic_dma_ops(dev)->get_sgtable)
> +
> + if (__generic_dma_ops(dev)->get_sgtable) {
> + /* We can't handle foreign pages here. */
> +#ifdef CONFIG_ARM
> + unsigned long bfn = dma_to_pfn(dev, handle);
> +#else
> + unsigned long bfn = handle >> PAGE_SHIFT;
> +#endif
> + BUG_ON (!page_is_ram(bfn));
>   return __generic_dma_ops(dev)->get_sgtable(dev, sgt, cpu_addr, 
> handle,
>size, attrs);
> + }
>   return dma_common_get_sgtable(dev, sgt, cpu_addr, handle, size);
>  }
>  
> 
> ___
> Xen-devel mailing list
> Xen-devel@lists.xen.org
> https://lists.xen.org/xen-devel

___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH] xen/common: Drop function calls for Xen compile/version information

2017-01-17 Thread Konrad Rzeszutek Wilk
On Mon, Jan 16, 2017 at 01:04:09PM +, Andrew Cooper wrote:
> The changeset/version/compile information is currently exported as a set of
> function calls into a separate translation unit, which is inefficient for all
> callers.
> 
> Replace the function calls with externs pointing appropriately into .rodata,
> which allows all users to generate code referencing the data directly.
> 
> No functional change, but causes smaller and more efficient compiled code.
> 
> Signed-off-by: Andrew Cooper 

Ah crud. That breaks the livepatch test-cases (they patch the xen_extra_version
function).

Is there some other code that can be modified that is reported
by 'xl info', on which the test-cases can run (and reported easily)?

___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel


[Xen-devel] PV audio drivers for Linux

2017-01-17 Thread Ughreja, Rakesh A
Hi,

I am trying to develop PV audio drivers and am facing an issue
achieving zero-copy of the buffers between the Front End (DOM1) and
Back End (DOM0) drivers.

When the buffer is allocated using __get_free_pages() on the DOM0
OS, I am able to grant access to DOM1 using gnttab_grant_foreign_access()
and to map it into the DOM1 virtual address space
using xenbus_map_ring_valloc().

However, the existing audio driver allocates the buffer using
dma_alloc_coherent(). In that case I am able to grant access to DOM1 using
gnttab_grant_foreign_access(), but when I try to map it into the
DOM1 virtual address space using xenbus_map_ring_valloc(), it returns an error.

[1] Code returns from here.

507 xenbus_dev_fatal(dev, map[i].status,
508  "mapping in shared page %d from 
domain %d",
509  gnt_refs[i], dev->otherend_id);

gnttab_batch_map(map, i) is unable to map the page, but I am unable to
understand why. Maybe it is due to the difference in the way the buffers
are allocated: dma_alloc_coherent() vs __get_free_pages().

Since I don't want to touch the existing audio driver, I need to figure out
how to map a dma_alloc_coherent() buffer into DOM1 space.

Any pointers would be really helpful. Thank you in advance.

Regards,
Rakesh

[1] http://lxr.free-electrons.com/source/drivers/xen/xenbus/xenbus_client.c#L469
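For reference, a minimal sketch of the working __get_free_pages() path described
above, with hypothetical names (frontend_domid, nr_pages, gref[], dev) and error
handling omitted; it is an illustration only, not code from the driver in
question. For a dma_alloc_coherent() buffer, virt_to_gfn() on the returned CPU
address is not generally valid, which is one possible reason the subsequent
grant mapping fails.

#include <linux/errno.h>
#include <linux/gfp.h>
#include <xen/grant_table.h>
#include <xen/page.h>
#include <xen/xenbus.h>

/* DOM0 (backend): allocate the audio buffer and grant it to DOM1. */
static int backend_share_buffer(domid_t frontend_domid, unsigned int nr_pages,
                                grant_ref_t *gref, unsigned long *buf_out)
{
    unsigned long buf = __get_free_pages(GFP_KERNEL | __GFP_ZERO,
                                         get_order(nr_pages * PAGE_SIZE));
    unsigned int i;

    if (!buf)
        return -ENOMEM;

    /* One grant reference per page; error handling omitted for brevity. */
    for (i = 0; i < nr_pages; i++)
        gref[i] = gnttab_grant_foreign_access(frontend_domid,
                                              virt_to_gfn((void *)(buf + i * PAGE_SIZE)),
                                              0 /* read-write */);

    *buf_out = buf;
    return 0;
}

/* DOM1 (frontend): map the granted pages into its own virtual address space. */
static int frontend_map_buffer(struct xenbus_device *dev, grant_ref_t *gref,
                               unsigned int nr_pages, void **vaddr)
{
    return xenbus_map_ring_valloc(dev, gref, nr_pages, vaddr);
}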


___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel


Re: [Xen-devel] PVH CPU hotplug design document

2017-01-17 Thread Roger Pau Monné
On Tue, Jan 17, 2017 at 10:50:44AM -0500, Boris Ostrovsky wrote:
> On 01/17/2017 10:33 AM, Jan Beulich wrote:
>  On 17.01.17 at 16:27,  wrote:
> >> On 01/17/2017 09:44 AM, Jan Beulich wrote:
> >> On 17.01.17 at 15:13,  wrote:
>  There's only one kind of PVHv2 guest that doesn't require ACPI, and that 
>  guest
>  type also doesn't have emulated local APICs. We agreed that this model 
>  was
>  interesting for things like unikernel DomUs, but that's the only 
>  reason why
>  we are providing it. Not that full OSes couldn't use it, but it seems
>  pointless.
> >>> You writing things this way makes me notice another possible design
> >>> issue here: Requiring ACPI is a bad thing imo, with even bare hardware
> >>> going different directions for at least some use cases (SFI being one
> >>> example). Hence I think ACPI should - like on bare hardware - remain
> >>> an optional thing. Which in turn requires _all_ information obtained from
> >>> ACPI (if available) to also be available another way. And this other
> >>> way might be hypercalls in our case.
> >>
> >> At the risk of derailing this thread: why do we need vCPU hotplug for
> >> dom0 in the first place? What do we gain over "echo {1|0} >
> >> /sys/devices/system/cpu/cpuX/online" ?
> >>
> >> I can see why this may be needed for domUs where Xen can enforce number
> >> of vCPUs that are allowed to run (which we don't enforce now anyway) but
> >> why for dom0?
> > Good that you now ask this too - that's the PV hotplug mechanism,
> > and I've been saying all the time that this should be just fine for PVH
> > (Dom0 and DomU).
> 
> I think domU hotplug has some value in that we can change the number of VCPUs
> that the guest sees and ACPI-based hotplug allows us to do that in a
> "standard" manner.
> 
> For dom0 this doesn't seem to be necessary as it's a special domain
> available only to platform administrator.
> 
> Part of confusion I think is because PV hotplug is not hotplug, really,
> as far as Linux kernel is concerned.

Hm, I'm not really sure I'm following, but I think that we could translate this
Dom0 PV hotplug mechanism to PVH as:

 - Dom0 is provided with up to HVM_MAX_VCPUS local APIC entries in the MADT, and
   the entries > dom0_max_vcpus are marked as disabled.
 - Dom0 has HVM_MAX_VCPUS vCPUs ready to be started, either by using the local
   APIC or an hypercall.

Would that match what's done for classic PV Dom0?

Roger.
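
A minimal illustrative sketch, not taken from any posted patch, of what the MADT
local APIC entries described in the two points above could look like: the
structure layout follows the ACPI spec (subtable type 0, 8 bytes), while the
helper name and the 1:1 vCPU-to-APIC-ID mapping are assumptions.

#include <stdint.h>

#define ACPI_MADT_TYPE_LOCAL_APIC 0
#define ACPI_MADT_ENABLED         (1u << 0)

/* ACPI "Processor Local APIC" MADT subtable (type 0). */
struct madt_lapic_entry {
    uint8_t  type;          /* 0 = processor local APIC */
    uint8_t  length;        /* 8 bytes */
    uint8_t  acpi_proc_id;  /* ACPI processor UID */
    uint8_t  apic_id;       /* local APIC ID */
    uint32_t flags;         /* bit 0: enabled */
} __attribute__((packed));

/* Fill hvm_max_vcpus entries; those beyond dom0_max_vcpus stay disabled. */
static void fill_dom0_lapic_entries(struct madt_lapic_entry *e,
                                    unsigned int hvm_max_vcpus,
                                    unsigned int dom0_max_vcpus)
{
    unsigned int i;

    for ( i = 0; i < hvm_max_vcpus; i++ )
    {
        e[i].type = ACPI_MADT_TYPE_LOCAL_APIC;
        e[i].length = sizeof(e[i]);
        e[i].acpi_proc_id = i;
        e[i].apic_id = i;                   /* assumed 1:1 vCPU -> APIC ID */
        e[i].flags = (i < dom0_max_vcpus) ? ACPI_MADT_ENABLED : 0;
    }
}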

___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel


[Xen-devel] [PATCH 4/5] xen: sched: improve use of cpumask scratch space in Credit1.

2017-01-17 Thread Dario Faggioli
It is ok to use just cpumask_scratch in csched_runq_steal().
In fact, the cpu parameter comes from the cpu local variable
in csched_load_balance(), which in turn comes from cpu in
csched_schedule(), which is smp_processor_id().

While there, also:
 - move the comment about cpumask_scratch into the header
   where the scratch space is declared;
 - spell out more clearly (in that same comment) what the
   serialization rules are.

No functional change intended.

Signed-off-by: Dario Faggioli 
---
Cc: George Dunlap 
Cc: Jan Beulich 
---
 xen/common/sched_credit.c  |5 ++---
 xen/common/schedule.c  |7 +--
 xen/include/xen/sched-if.h |7 +++
 3 files changed, 10 insertions(+), 9 deletions(-)

diff --git a/xen/common/sched_credit.c b/xen/common/sched_credit.c
index dfe8545..ad20819 100644
--- a/xen/common/sched_credit.c
+++ b/xen/common/sched_credit.c
@@ -1636,9 +1636,8 @@ csched_runq_steal(int peer_cpu, int cpu, int pri, int 
balance_step)
  && !__vcpu_has_soft_affinity(vc, vc->cpu_hard_affinity) )
 continue;
 
-csched_balance_cpumask(vc, balance_step, cpumask_scratch_cpu(cpu));
-if ( __csched_vcpu_is_migrateable(vc, cpu,
-  cpumask_scratch_cpu(cpu)) )
+csched_balance_cpumask(vc, balance_step, cpumask_scratch);
+if ( __csched_vcpu_is_migrateable(vc, cpu, cpumask_scratch) )
 {
 /* We got a candidate. Grab it! */
 TRACE_3D(TRC_CSCHED_STOLEN_VCPU, peer_cpu,
diff --git a/xen/common/schedule.c b/xen/common/schedule.c
index 36ff2e9..bee5d1f 100644
--- a/xen/common/schedule.c
+++ b/xen/common/schedule.c
@@ -65,12 +65,7 @@ static void poll_timer_fn(void *data);
 DEFINE_PER_CPU(struct schedule_data, schedule_data);
 DEFINE_PER_CPU(struct scheduler *, scheduler);
 
-/*
- * Scratch space, for avoiding having too many cpumask_var_t on the stack.
- * Properly serializing access, if necessary, is responsibility of each
- * scheduler (typically, one can expect this to be protected by the per pCPU
- * or per runqueue lock).
- */
+/* Scratch space for cpumasks. */
 DEFINE_PER_CPU(cpumask_t, cpumask_scratch);
 
 extern const struct scheduler *__start_schedulers_array[], 
*__end_schedulers_array[];
diff --git a/xen/include/xen/sched-if.h b/xen/include/xen/sched-if.h
index bc0e794..c25cda6 100644
--- a/xen/include/xen/sched-if.h
+++ b/xen/include/xen/sched-if.h
@@ -47,6 +47,13 @@ DECLARE_PER_CPU(struct schedule_data, schedule_data);
 DECLARE_PER_CPU(struct scheduler *, scheduler);
 DECLARE_PER_CPU(struct cpupool *, cpupool);
 
+/*
+ * Scratch space, for avoiding having too many cpumask_var_t on the stack.
+ * Within each scheduler, when using the scratch mask of one pCPU:
+ * - the pCPU must belong to the scheduler,
+ * - the caller must own the per-pCPU scheduler lock (a.k.a. runqueue
+ *   lock).
+ */
 DECLARE_PER_CPU(cpumask_t, cpumask_scratch);
 #define cpumask_scratch        (&this_cpu(cpumask_scratch))
 #define cpumask_scratch_cpu(c) (&per_cpu(cpumask_scratch, c))


___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel


[Xen-devel] [PATCH v4 4/8] dm_op: convert HVMOP_set_pci_intx_level, HVMOP_set_isa_irq_level, and...

2017-01-17 Thread Paul Durrant
... HVMOP_set_pci_link_route

These HVMOPs were exposed to guests so their definitions need to be
preserved for compatibility. This patch therefore updates
__XEN_LATEST_INTERFACE_VERSION__ to 0x00040900 and makes the HVMOP
definitions conditional on __XEN_INTERFACE_VERSION__ less than that value.

NOTE: This patch also widens the 'domain' parameter of
  xc_hvm_set_pci_intx_level() from a uint8_t to a uint16_t.

Suggested-by: Jan Beulich 
Signed-off-by: Paul Durrant 
---
Reviewed-by: Jan Beulich 
Cc: Daniel De Graaf 
Cc: Ian Jackson 
Acked-by: Wei Liu 
Cc: Andrew Cooper 

v3:
- Remove unnecessary padding.

v2:
- Interface version modification moved to this patch, where it is needed.
- Addressed several comments from Jan.
---
 tools/flask/policy/modules/xen.if   |   8 +--
 tools/libxc/include/xenctrl.h   |   2 +-
 tools/libxc/xc_misc.c   |  83 --
 xen/arch/x86/hvm/dm.c   |  72 +++
 xen/arch/x86/hvm/hvm.c  | 136 
 xen/arch/x86/hvm/irq.c  |   7 +-
 xen/include/public/hvm/dm_op.h  |  42 +++
 xen/include/public/hvm/hvm_op.h |   4 ++
 xen/include/public/xen-compat.h |   2 +-
 xen/include/xen/hvm/irq.h   |   2 +-
 xen/include/xsm/dummy.h |  18 -
 xen/include/xsm/xsm.h   |  18 -
 xen/xsm/dummy.c |   3 -
 xen/xsm/flask/hooks.c   |  15 
 xen/xsm/flask/policy/access_vectors |   6 --
 15 files changed, 158 insertions(+), 260 deletions(-)

diff --git a/tools/flask/policy/modules/xen.if 
b/tools/flask/policy/modules/xen.if
index 45e5b5f..092a6c5 100644
--- a/tools/flask/policy/modules/xen.if
+++ b/tools/flask/policy/modules/xen.if
@@ -57,8 +57,8 @@ define(`create_domain_common', `
allow $1 $2:shadow enable;
allow $1 $2:mmu { map_read map_write adjust memorymap physmap pinpage 
mmuext_op updatemp };
allow $1 $2:grant setup;
-   allow $1 $2:hvm { cacheattr getparam hvmctl irqlevel pciroute sethvmc
-   setparam pcilevel nested altp2mhvm altp2mhvm_op 
send_irq };
+   allow $1 $2:hvm { cacheattr getparam hvmctl sethvmc
+   setparam nested altp2mhvm altp2mhvm_op send_irq };
 ')
 
 # create_domain(priv, target)
@@ -93,7 +93,7 @@ define(`manage_domain', `
 #   (inbound migration is the same as domain creation)
 define(`migrate_domain_out', `
allow $1 domxen_t:mmu map_read;
-   allow $1 $2:hvm { gethvmc getparam irqlevel };
+   allow $1 $2:hvm { gethvmc getparam };
allow $1 $2:mmu { stat pageinfo map_read };
allow $1 $2:domain { getaddrsize getvcpucontext pause destroy };
allow $1 $2:domain2 gettsc;
@@ -151,7 +151,7 @@ define(`device_model', `
 
allow $1 $2_target:domain { getdomaininfo shutdown };
allow $1 $2_target:mmu { map_read map_write adjust physmap target_hack 
};
-   allow $1 $2_target:hvm { getparam setparam hvmctl irqlevel pciroute 
pcilevel cacheattr send_irq dm };
+   allow $1 $2_target:hvm { getparam setparam hvmctl cacheattr send_irq dm 
};
 ')
 
 # make_device_model(priv, dm_dom, hvm_dom)
diff --git a/tools/libxc/include/xenctrl.h b/tools/libxc/include/xenctrl.h
index c7ee412..f819bf2 100644
--- a/tools/libxc/include/xenctrl.h
+++ b/tools/libxc/include/xenctrl.h
@@ -1594,7 +1594,7 @@ int xc_physdev_unmap_pirq(xc_interface *xch,
 
 int xc_hvm_set_pci_intx_level(
 xc_interface *xch, domid_t dom,
-uint8_t domain, uint8_t bus, uint8_t device, uint8_t intx,
+uint16_t domain, uint8_t bus, uint8_t device, uint8_t intx,
 unsigned int level);
 int xc_hvm_set_isa_irq_level(
 xc_interface *xch, domid_t dom,
diff --git a/tools/libxc/xc_misc.c b/tools/libxc/xc_misc.c
index 4c41d41..ddea2bb 100644
--- a/tools/libxc/xc_misc.c
+++ b/tools/libxc/xc_misc.c
@@ -470,33 +470,24 @@ int xc_getcpuinfo(xc_interface *xch, int max_cpus,
 
 int xc_hvm_set_pci_intx_level(
 xc_interface *xch, domid_t dom,
-uint8_t domain, uint8_t bus, uint8_t device, uint8_t intx,
+uint16_t domain, uint8_t bus, uint8_t device, uint8_t intx,
 unsigned int level)
 {
-DECLARE_HYPERCALL_BUFFER(struct xen_hvm_set_pci_intx_level, arg);
-int rc;
-
-arg = xc_hypercall_buffer_alloc(xch, arg, sizeof(*arg));
-if ( arg == NULL )
-{
-PERROR("Could not allocate memory for xc_hvm_set_pci_intx_level 
hypercall");
-return -1;
-}
+struct xen_dm_op op;
+struct xen_dm_op_set_pci_intx_level *data;
 
-arg->domid  = dom;
-arg->domain = domain;
-arg->bus= bus;
-arg->device = device;
-arg->intx   = intx;
-arg->level  = level;
+memset(&op, 0, sizeof(op));
 
-rc = xencall2(xch->xcall, __HYPERVISOR_hvm_op,
-  HVMOP_set_pci_intx_level,
-   

[Xen-devel] [PATCH v4 7/8] dm_op: convert HVMOP_inject_trap and HVMOP_inject_msi

2017-01-17 Thread Paul Durrant
NOTE: This patch also modifies the types of the 'vector', 'type' and
  'insn_len' arguments of xc_hvm_inject_trap() from uint32_t to
  uint8_t. In practice the values passed were always truncated to
  8 bits.

Suggested-by: Jan Beulich 
Signed-off-by: Paul Durrant 
---
Reviewed-by: Jan Beulich 
Cc: Daniel De Graaf 
Cc: Ian Jackson 
Acked-by: Wei Liu 
Cc: Andrew Cooper 

v3:
- Fixed prefixing and padding.

v2:
- Addressed several comments from Jan.
---
 tools/flask/policy/modules/xen.if   |  2 +-
 tools/libxc/include/xenctrl.h   |  4 +-
 tools/libxc/xc_misc.c   | 64 +++
 xen/arch/x86/hvm/dm.c   | 49 
 xen/arch/x86/hvm/hvm.c  | 76 -
 xen/include/public/hvm/dm_op.h  | 48 +++
 xen/include/public/hvm/hvm_op.h | 45 --
 xen/include/xsm/dummy.h |  6 ---
 xen/include/xsm/xsm.h   |  6 ---
 xen/xsm/dummy.c |  1 -
 xen/xsm/flask/hooks.c   |  6 ---
 xen/xsm/flask/policy/access_vectors |  5 +--
 12 files changed, 124 insertions(+), 188 deletions(-)

diff --git a/tools/flask/policy/modules/xen.if 
b/tools/flask/policy/modules/xen.if
index 092a6c5..45e5cea 100644
--- a/tools/flask/policy/modules/xen.if
+++ b/tools/flask/policy/modules/xen.if
@@ -151,7 +151,7 @@ define(`device_model', `
 
allow $1 $2_target:domain { getdomaininfo shutdown };
allow $1 $2_target:mmu { map_read map_write adjust physmap target_hack 
};
-   allow $1 $2_target:hvm { getparam setparam hvmctl cacheattr send_irq dm 
};
+   allow $1 $2_target:hvm { getparam setparam hvmctl cacheattr dm };
 ')
 
 # make_device_model(priv, dm_dom, hvm_dom)
diff --git a/tools/libxc/include/xenctrl.h b/tools/libxc/include/xenctrl.h
index 13431bb..539cc69 100644
--- a/tools/libxc/include/xenctrl.h
+++ b/tools/libxc/include/xenctrl.h
@@ -1641,8 +1641,8 @@ int xc_hvm_set_mem_type(
  * resumes. 
  */
 int xc_hvm_inject_trap(
-xc_interface *xch, domid_t dom, int vcpu, uint32_t vector,
-uint32_t type, uint32_t error_code, uint32_t insn_len,
+xc_interface *xch, domid_t dom, int vcpu, uint8_t vector,
+uint8_t type, uint32_t error_code, uint8_t insn_len,
 uint64_t cr2);
 
 /*
diff --git a/tools/libxc/xc_misc.c b/tools/libxc/xc_misc.c
index 5b06d6b..98ab826 100644
--- a/tools/libxc/xc_misc.c
+++ b/tools/libxc/xc_misc.c
@@ -527,29 +527,20 @@ int xc_hvm_set_pci_link_route(
 }
 
 int xc_hvm_inject_msi(
-xc_interface *xch, domid_t dom, uint64_t addr, uint32_t data)
+xc_interface *xch, domid_t dom, uint64_t msi_addr, uint32_t msi_data)
 {
-DECLARE_HYPERCALL_BUFFER(struct xen_hvm_inject_msi, arg);
-int rc;
-
-arg = xc_hypercall_buffer_alloc(xch, arg, sizeof(*arg));
-if ( arg == NULL )
-{
-PERROR("Could not allocate memory for xc_hvm_inject_msi hypercall");
-return -1;
-}
+struct xen_dm_op op;
+struct xen_dm_op_inject_msi *data;
 
-arg->domid = dom;
-arg->addr  = addr;
-arg->data  = data;
+memset(&op, 0, sizeof(op));
 
-rc = xencall2(xch->xcall, __HYPERVISOR_hvm_op,
-  HVMOP_inject_msi,
-  HYPERCALL_BUFFER_AS_ARG(arg));
+op.op = XEN_DMOP_inject_msi;
+data = &op.u.inject_msi;
 
-xc_hypercall_buffer_free(xch, arg);
+data->addr = msi_addr;
+data->data = msi_data;
 
-return rc;
+return do_dm_op(xch, dom, 1, &op, sizeof(op));
 }
 
 int xc_hvm_track_dirty_vram(
@@ -608,35 +599,26 @@ int xc_hvm_set_mem_type(
 }
 
 int xc_hvm_inject_trap(
-xc_interface *xch, domid_t dom, int vcpu, uint32_t vector,
-uint32_t type, uint32_t error_code, uint32_t insn_len,
+xc_interface *xch, domid_t dom, int vcpu, uint8_t vector,
+uint8_t type, uint32_t error_code, uint8_t insn_len,
 uint64_t cr2)
 {
-DECLARE_HYPERCALL_BUFFER(struct xen_hvm_inject_trap, arg);
-int rc;
-
-arg = xc_hypercall_buffer_alloc(xch, arg, sizeof(*arg));
-if ( arg == NULL )
-{
-PERROR("Could not allocate memory for xc_hvm_inject_trap hypercall");
-return -1;
-}
+struct xen_dm_op op;
+struct xen_dm_op_inject_trap *data;
 
-arg->domid   = dom;
-arg->vcpuid  = vcpu;
-arg->vector  = vector;
-arg->type= type;
-arg->error_code  = error_code;
-arg->insn_len= insn_len;
-arg->cr2 = cr2;
+memset(&op, 0, sizeof(op));
 
-rc = xencall2(xch->xcall, __HYPERVISOR_hvm_op,
-  HVMOP_inject_trap,
-  HYPERCALL_BUFFER_AS_ARG(arg));
+op.op = XEN_DMOP_inject_trap;
+data = &op.u.inject_trap;
 
-xc_hypercall_buffer_free(xch, arg);
+data->vcpuid = vcpu;
+data->vector = vector;
+data->type = type;
+data->error_code = error_code;
+

[Xen-devel] [PATCH v4 8/8] x86/hvm: serialize trap injecting producer and consumer

2017-01-17 Thread Paul Durrant
Since injection works on a remote vCPU, and since there's no
enforcement of the subject vCPU being paused, there's a potential race
between the producing and consuming sides. Fix this by leveraging the
vector field as a synchronization variable.

Signed-off-by: Jan Beulich 
[re-based]
Signed-off-by: Paul Durrant 
---
Reviewed-by: Andrew Cooper 

v3:
- Re-re-re-based after more changes.

v2:
- Re-re-based after Andrew's recent changes.
---
 xen/arch/x86/hvm/dm.c | 5 -
 xen/arch/x86/hvm/hvm.c| 8 +---
 xen/include/asm-x86/hvm/hvm.h | 3 +++
 3 files changed, 12 insertions(+), 4 deletions(-)

diff --git a/xen/arch/x86/hvm/dm.c b/xen/arch/x86/hvm/dm.c
index 3c031d9..4739deb 100644
--- a/xen/arch/x86/hvm/dm.c
+++ b/xen/arch/x86/hvm/dm.c
@@ -245,13 +245,16 @@ static int inject_trap(struct domain *d, unsigned int 
vcpuid,
 if ( vcpuid >= d->max_vcpus || !(v = d->vcpu[vcpuid]) )
 return -EINVAL;
 
-if ( v->arch.hvm_vcpu.inject_trap.vector != -1 )
+if ( cmpxchg(&v->arch.hvm_vcpu.inject_trap.vector,
+ HVM_TRAP_VECTOR_UNSET, HVM_TRAP_VECTOR_UPDATING) !=
+ HVM_TRAP_VECTOR_UNSET )
 return -EBUSY;
 
 v->arch.hvm_vcpu.inject_trap.type = type;
 v->arch.hvm_vcpu.inject_trap.insn_len = insn_len;
 v->arch.hvm_vcpu.inject_trap.error_code = error_code;
 v->arch.hvm_vcpu.inject_trap.cr2 = cr2;
+smp_wmb();
 v->arch.hvm_vcpu.inject_trap.vector = vector;
 
 return 0;
diff --git a/xen/arch/x86/hvm/hvm.c b/xen/arch/x86/hvm/hvm.c
index f1d59b2..eafad65 100644
--- a/xen/arch/x86/hvm/hvm.c
+++ b/xen/arch/x86/hvm/hvm.c
@@ -539,12 +539,14 @@ void hvm_do_resume(struct vcpu *v)
 }
 
 /* Inject pending hw/sw trap */
-if ( v->arch.hvm_vcpu.inject_trap.vector != -1 )
+if ( v->arch.hvm_vcpu.inject_trap.vector >= 0 )
 {
+smp_rmb();
+
 if ( !hvm_event_pending(v) )
hvm_inject_event(&v->arch.hvm_vcpu.inject_trap);
 
-v->arch.hvm_vcpu.inject_trap.vector = -1;
+v->arch.hvm_vcpu.inject_trap.vector = HVM_TRAP_VECTOR_UNSET;
 }
 
 if ( unlikely(v->arch.vm_event) && v->arch.monitor.next_interrupt_enabled )
@@ -1515,7 +1517,7 @@ int hvm_vcpu_initialise(struct vcpu *v)
 (void(*)(unsigned long))hvm_assert_evtchn_irq,
 (unsigned long)v);
 
-v->arch.hvm_vcpu.inject_trap.vector = -1;
+v->arch.hvm_vcpu.inject_trap.vector = HVM_TRAP_VECTOR_UNSET;
 
 if ( is_pvh_domain(d) )
 {
diff --git a/xen/include/asm-x86/hvm/hvm.h b/xen/include/asm-x86/hvm/hvm.h
index 04e67fe..9b58346 100644
--- a/xen/include/asm-x86/hvm/hvm.h
+++ b/xen/include/asm-x86/hvm/hvm.h
@@ -77,6 +77,9 @@ enum hvm_intblk {
 #define HVM_HAP_SUPERPAGE_2MB   0x0001
 #define HVM_HAP_SUPERPAGE_1GB   0x0002
 
+#define HVM_TRAP_VECTOR_UNSET(-1)
+#define HVM_TRAP_VECTOR_UPDATING (-2)
+
 /*
  * The hardware virtual machine (HVM) interface abstracts away from the
  * x86/x86_64 CPU virtualization assist specifics. Currently this interface
-- 
2.1.4


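For readers skimming the diff above: the patch's synchronization scheme can be
summarised in a short, self-contained sketch. This is an illustration only,
assuming Xen-style cmpxchg()/smp_wmb()/smp_rmb() primitives and errno values;
the struct below is a stand-in for the real HVM per-vCPU state, not code from
the series.

/* Hedged sketch of the vector-as-sync-variable protocol used above. */
#define TRAP_VECTOR_UNSET    (-1)
#define TRAP_VECTOR_UPDATING (-2)

struct trap_slot {
    int vector;                        /* UNSET -> UPDATING -> vector >= 0 */
    unsigned int type, insn_len, error_code;
    unsigned long cr2;
};

/* Producer: runs in hypercall context, possibly remote to the vCPU. */
static int produce_trap(struct trap_slot *s, int vector, unsigned int type)
{
    if ( cmpxchg(&s->vector, TRAP_VECTOR_UNSET, TRAP_VECTOR_UPDATING) !=
         TRAP_VECTOR_UNSET )
        return -EBUSY;                 /* an injection is already pending */

    s->type = type;                    /* fill the payload while UPDATING  */
    smp_wmb();                         /* payload must be visible first    */
    s->vector = vector;                /* publish: consumer may now use it */
    return 0;
}

/* Consumer: runs on the vCPU itself, e.g. from hvm_do_resume(). */
static void consume_trap(struct trap_slot *s)
{
    if ( s->vector >= 0 )              /* negative means unset or updating */
    {
        smp_rmb();                     /* read payload only after vector   */
        /* ... inject s->vector / s->type / s->cr2 here ... */
        s->vector = TRAP_VECTOR_UNSET; /* slot free for the next producer  */
    }
}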


[Xen-devel] [PATCH v4 0/8] New hypercall for device models

2017-01-17 Thread Paul Durrant
Following on from the design submitted by Jennifer Herbert to the list [1]
this series provides an implementation of __HYPERCALL_dm_op followed by
patches based on Jan Beulich's previous HVMCTL series [2] to convert
tools-only HVMOPs used by device models to DMOPs.

[1] https://lists.xenproject.org/archives/html/xen-devel/2016-09/msg01052.html
[2] https://lists.xenproject.org/archives/html/xen-devel/2016-06/msg02433.html

Paul Durrant (8):
  public / x86: Introduce __HYPERCALL_dm_op...
  dm_op: convert HVMOP_*ioreq_server*
  dm_op: convert HVMOP_track_dirty_vram
  dm_op: convert HVMOP_set_pci_intx_level, HVMOP_set_isa_irq_level,
and...
  dm_op: convert HVMOP_modified_memory
  dm_op: convert HVMOP_set_mem_type
  dm_op: convert HVMOP_inject_trap and HVMOP_inject_msi
  x86/hvm: serialize trap injecting producer and consumer

 docs/designs/dmop.markdown  | 165 +
 tools/flask/policy/modules/xen.if   |   8 +-
 tools/libxc/include/xenctrl.h   |  13 +-
 tools/libxc/xc_domain.c | 212 +--
 tools/libxc/xc_misc.c   | 235 +
 tools/libxc/xc_private.c|  70 
 tools/libxc/xc_private.h|   2 +
 xen/arch/x86/hvm/Makefile   |   1 +
 xen/arch/x86/hvm/dm.c   | 565 ++
 xen/arch/x86/hvm/hvm.c  | 677 +---
 xen/arch/x86/hvm/ioreq.c|  36 +-
 xen/arch/x86/hvm/irq.c  |   7 +-
 xen/arch/x86/hypercall.c|   2 +
 xen/arch/x86/mm/hap/hap.c   |   2 +-
 xen/arch/x86/mm/shadow/common.c |   2 +-
 xen/include/Makefile|   1 +
 xen/include/asm-x86/hap.h   |   2 +-
 xen/include/asm-x86/hvm/domain.h|   3 +-
 xen/include/asm-x86/hvm/hvm.h   |   3 +
 xen/include/asm-x86/shadow.h|   2 +-
 xen/include/public/hvm/dm_op.h  | 377 
 xen/include/public/hvm/hvm_op.h | 230 +---
 xen/include/public/xen-compat.h |   2 +-
 xen/include/public/xen.h|   1 +
 xen/include/xen/hvm/irq.h   |   2 +-
 xen/include/xen/hypercall.h |  15 +
 xen/include/xlat.lst|   1 +
 xen/include/xsm/dummy.h |  36 +-
 xen/include/xsm/xsm.h   |  36 +-
 xen/xsm/dummy.c |   5 -
 xen/xsm/flask/hooks.c   |  37 +-
 xen/xsm/flask/policy/access_vectors |  15 +-
 32 files changed, 1451 insertions(+), 1314 deletions(-)
 create mode 100644 docs/designs/dmop.markdown
 create mode 100644 xen/arch/x86/hvm/dm.c
 create mode 100644 xen/include/public/hvm/dm_op.h

-- 
2.1.4




[Xen-devel] [PATCH v4 2/8] dm_op: convert HVMOP_*ioreq_server*

2017-01-17 Thread Paul Durrant
The definitions of HVM_IOREQSRV_BUFIOREQ_* have to persist as they are
already in use by callers of the libxc interface.

Suggested-by: Jan Beulich 
Signed-off-by: Paul Durrant 
--
Reviewed-by: Jan Beulich 
Cc: Ian Jackson 
Acked-by: Wei Liu 
Cc: Andrew Cooper 
Cc: Daniel De Graaf 

v4:
- #define uint64_aligned_t if necessary to handle compat code generation.

v3:
- Fix pad check.

v2:
- Addressed several comments from Jan.
---
 tools/libxc/xc_domain.c  | 212 -
 xen/arch/x86/hvm/dm.c|  89 
 xen/arch/x86/hvm/hvm.c   | 219 ---
 xen/arch/x86/hvm/ioreq.c |  36 +++
 xen/include/asm-x86/hvm/domain.h |   3 +-
 xen/include/public/hvm/dm_op.h   | 157 
 xen/include/public/hvm/hvm_op.h  | 132 +--
 xen/include/xsm/dummy.h  |   6 --
 xen/include/xsm/xsm.h|   6 --
 xen/xsm/dummy.c  |   1 -
 xen/xsm/flask/hooks.c|   6 --
 11 files changed, 360 insertions(+), 507 deletions(-)

diff --git a/tools/libxc/xc_domain.c b/tools/libxc/xc_domain.c
index 296b852..419a897 100644
--- a/tools/libxc/xc_domain.c
+++ b/tools/libxc/xc_domain.c
@@ -1417,24 +1417,24 @@ int xc_hvm_create_ioreq_server(xc_interface *xch,
int handle_bufioreq,
ioservid_t *id)
 {
-DECLARE_HYPERCALL_BUFFER(xen_hvm_create_ioreq_server_t, arg);
+struct xen_dm_op op;
+struct xen_dm_op_create_ioreq_server *data;
 int rc;
 
-arg = xc_hypercall_buffer_alloc(xch, arg, sizeof(*arg));
-if ( arg == NULL )
-return -1;
+memset(&op, 0, sizeof(op));
 
-arg->domid = domid;
-arg->handle_bufioreq = handle_bufioreq;
+op.op = XEN_DMOP_create_ioreq_server;
+data = &op.u.create_ioreq_server;
 
-rc = xencall2(xch->xcall, __HYPERVISOR_hvm_op,
-  HVMOP_create_ioreq_server,
-  HYPERCALL_BUFFER_AS_ARG(arg));
+data->handle_bufioreq = handle_bufioreq;
+
+rc = do_dm_op(xch, domid, 1, &op, sizeof(op));
+if ( rc )
+return rc;
 
-*id = arg->id;
+*id = data->id;
 
-xc_hypercall_buffer_free(xch, arg);
-return rc;
+return 0;
 }
 
 int xc_hvm_get_ioreq_server_info(xc_interface *xch,
@@ -1444,84 +1444,71 @@ int xc_hvm_get_ioreq_server_info(xc_interface *xch,
  xen_pfn_t *bufioreq_pfn,
  evtchn_port_t *bufioreq_port)
 {
-DECLARE_HYPERCALL_BUFFER(xen_hvm_get_ioreq_server_info_t, arg);
+struct xen_dm_op op;
+struct xen_dm_op_get_ioreq_server_info *data;
 int rc;
 
-arg = xc_hypercall_buffer_alloc(xch, arg, sizeof(*arg));
-if ( arg == NULL )
-return -1;
+memset(&op, 0, sizeof(op));
 
-arg->domid = domid;
-arg->id = id;
+op.op = XEN_DMOP_get_ioreq_server_info;
+data = &op.u.get_ioreq_server_info;
 
-rc = xencall2(xch->xcall, __HYPERVISOR_hvm_op,
-  HVMOP_get_ioreq_server_info,
-  HYPERCALL_BUFFER_AS_ARG(arg));
-if ( rc != 0 )
-goto done;
+data->id = id;
+
+rc = do_dm_op(xch, domid, 1, &op, sizeof(op));
+if ( rc )
+return rc;
 
 if ( ioreq_pfn )
-*ioreq_pfn = arg->ioreq_pfn;
+*ioreq_pfn = data->ioreq_pfn;
 
 if ( bufioreq_pfn )
-*bufioreq_pfn = arg->bufioreq_pfn;
+*bufioreq_pfn = data->bufioreq_pfn;
 
 if ( bufioreq_port )
-*bufioreq_port = arg->bufioreq_port;
+*bufioreq_port = data->bufioreq_port;
 
-done:
-xc_hypercall_buffer_free(xch, arg);
-return rc;
+return 0;
 }
 
 int xc_hvm_map_io_range_to_ioreq_server(xc_interface *xch, domid_t domid,
 ioservid_t id, int is_mmio,
 uint64_t start, uint64_t end)
 {
-DECLARE_HYPERCALL_BUFFER(xen_hvm_io_range_t, arg);
-int rc;
+struct xen_dm_op op;
+struct xen_dm_op_ioreq_server_range *data;
 
-arg = xc_hypercall_buffer_alloc(xch, arg, sizeof(*arg));
-if ( arg == NULL )
-return -1;
+memset(&op, 0, sizeof(op));
 
-arg->domid = domid;
-arg->id = id;
-arg->type = is_mmio ? HVMOP_IO_RANGE_MEMORY : HVMOP_IO_RANGE_PORT;
-arg->start = start;
-arg->end = end;
+op.op = XEN_DMOP_map_io_range_to_ioreq_server;
+data = &op.u.map_io_range_to_ioreq_server;
 
-rc = xencall2(xch->xcall, __HYPERVISOR_hvm_op,
-  HVMOP_map_io_range_to_ioreq_server,
-  HYPERCALL_BUFFER_AS_ARG(arg));
+data->id = id;
+data->type = is_mmio ? XEN_DMOP_IO_RANGE_MEMORY : XEN_DMOP_IO_RANGE_PORT;
+data->start = start;
+data->end = end;
 
-xc_hypercall_buffer_free(xch, arg);
-return rc;
+return do_dm_op(xch, domid, 1, &op, sizeof(op));
 }

[Xen-devel] [PATCH v4 3/8] dm_op: convert HVMOP_track_dirty_vram

2017-01-17 Thread Paul Durrant
The handle type passed to the underlying shadow and hap functions is
changed for compatibility with the new hypercall buffer.

NOTE: This patch also modifies the type of the 'nr' parameter of
  xc_hvm_track_dirty_vram() from uint64_t to uint32_t. In practice
  the value passed was always truncated to 32 bits.

Suggested-by: Jan Beulich 
Signed-off-by: Paul Durrant 
---
Cc: Jan Beulich 
Cc: Daniel De Graaf 
Cc: Ian Jackson 
Acked-by: Wei Liu 
Cc: Andrew Cooper 
Acked-by: George Dunlap 
Acked-by: Tim Deegan 

v4:
- Knock-on changes from compat code in dm.c. Not adding Jan's R-b since
  the patch has fundamentally changed.

v3:
- Check d->max_vcpus rather than d->vcpu, as requested by Jan.
- The handle type changes (from uint8 to void) are still necessary, hence
  omitting Jan's R-b until this is confirmed to be acceptable.

v2:
- Addressed several comments from Jan.
---
 tools/flask/policy/modules/xen.if   |  4 ++--
 tools/libxc/include/xenctrl.h   |  2 +-
 tools/libxc/xc_misc.c   | 32 +
 xen/arch/x86/hvm/dm.c   | 38 ++
 xen/arch/x86/hvm/hvm.c  | 41 -
 xen/arch/x86/mm/hap/hap.c   |  2 +-
 xen/arch/x86/mm/shadow/common.c |  2 +-
 xen/include/asm-x86/hap.h   |  2 +-
 xen/include/asm-x86/shadow.h|  2 +-
 xen/include/public/hvm/dm_op.h  | 18 
 xen/include/public/hvm/hvm_op.h | 16 ---
 xen/xsm/flask/hooks.c   |  3 ---
 xen/xsm/flask/policy/access_vectors |  2 --
 13 files changed, 73 insertions(+), 91 deletions(-)

diff --git a/tools/flask/policy/modules/xen.if 
b/tools/flask/policy/modules/xen.if
index f9254c2..45e5b5f 100644
--- a/tools/flask/policy/modules/xen.if
+++ b/tools/flask/policy/modules/xen.if
@@ -58,7 +58,7 @@ define(`create_domain_common', `
allow $1 $2:mmu { map_read map_write adjust memorymap physmap pinpage 
mmuext_op updatemp };
allow $1 $2:grant setup;
allow $1 $2:hvm { cacheattr getparam hvmctl irqlevel pciroute sethvmc
-   setparam pcilevel trackdirtyvram nested altp2mhvm 
altp2mhvm_op send_irq };
+   setparam pcilevel nested altp2mhvm altp2mhvm_op 
send_irq };
 ')
 
 # create_domain(priv, target)
@@ -151,7 +151,7 @@ define(`device_model', `
 
allow $1 $2_target:domain { getdomaininfo shutdown };
allow $1 $2_target:mmu { map_read map_write adjust physmap target_hack 
};
-   allow $1 $2_target:hvm { getparam setparam trackdirtyvram hvmctl 
irqlevel pciroute pcilevel cacheattr send_irq dm };
+   allow $1 $2_target:hvm { getparam setparam hvmctl irqlevel pciroute 
pcilevel cacheattr send_irq dm };
 ')
 
 # make_device_model(priv, dm_dom, hvm_dom)
diff --git a/tools/libxc/include/xenctrl.h b/tools/libxc/include/xenctrl.h
index 2ba46d7..c7ee412 100644
--- a/tools/libxc/include/xenctrl.h
+++ b/tools/libxc/include/xenctrl.h
@@ -1620,7 +1620,7 @@ int xc_hvm_inject_msi(
  */
 int xc_hvm_track_dirty_vram(
 xc_interface *xch, domid_t dom,
-uint64_t first_pfn, uint64_t nr,
+uint64_t first_pfn, uint32_t nr,
 unsigned long *bitmap);
 
 /*
diff --git a/tools/libxc/xc_misc.c b/tools/libxc/xc_misc.c
index 06e90de..4c41d41 100644
--- a/tools/libxc/xc_misc.c
+++ b/tools/libxc/xc_misc.c
@@ -581,34 +581,22 @@ int xc_hvm_inject_msi(
 
 int xc_hvm_track_dirty_vram(
 xc_interface *xch, domid_t dom,
-uint64_t first_pfn, uint64_t nr,
+uint64_t first_pfn, uint32_t nr,
 unsigned long *dirty_bitmap)
 {
-DECLARE_HYPERCALL_BOUNCE(dirty_bitmap, (nr+7) / 8, 
XC_HYPERCALL_BUFFER_BOUNCE_OUT);
-DECLARE_HYPERCALL_BUFFER(struct xen_hvm_track_dirty_vram, arg);
-int rc;
+struct xen_dm_op op;
+struct xen_dm_op_track_dirty_vram *data;
 
-arg = xc_hypercall_buffer_alloc(xch, arg, sizeof(*arg));
-if ( arg == NULL || xc_hypercall_bounce_pre(xch, dirty_bitmap) )
-{
-PERROR("Could not bounce memory for xc_hvm_track_dirty_vram 
hypercall");
-rc = -1;
-goto out;
-}
+memset(&op, 0, sizeof(op));
 
-arg->domid = dom;
-arg->first_pfn = first_pfn;
-arg->nr= nr;
-set_xen_guest_handle(arg->dirty_bitmap, dirty_bitmap);
+op.op = XEN_DMOP_track_dirty_vram;
+data = &op.u.track_dirty_vram;
 
-rc = xencall2(xch->xcall, __HYPERVISOR_hvm_op,
-  HVMOP_track_dirty_vram,
-  HYPERCALL_BUFFER_AS_ARG(arg));
+data->first_pfn = first_pfn;
+data->nr = nr;
 
-out:
-xc_hypercall_buffer_free(xch, arg);
-xc_hypercall_bounce_post(xch, dirty_bitmap);
-return rc;
+return do_dm_op(xch, dom, 2, &op, sizeof(op),
+dirty_bitmap, (nr + 7) / 8);
 }
 
 int xc_hvm_modified_memory(
diff 

[Xen-devel] [PATCH v4 1/8] public / x86: Introduce __HYPERCALL_dm_op...

2017-01-17 Thread Paul Durrant
...as a set of hypercalls to be used by a device model.

As stated in the new docs/designs/dm_op.markdown:

"The aim of DMOP is to prevent a compromised device model from
compromising domains other than the one it is associated with. (And is
therefore likely already compromised)."

See that file for further information.

This patch simply adds the boilerplate for the hypercall.

Signed-off-by: Paul Durrant 
Suggested-by: Ian Jackson 
Suggested-by: Jennifer Herbert 
---
Cc: Ian Jackson 
Cc: Jennifer Herbert 
Cc: Daniel De Graaf 
Cc: Wei Liu 
Cc: Jan Beulich 
Cc: Andrew Cooper 

v4:
- Change XEN_GUEST_HANDLE_64 to XEN_GUEST_HANDLE in struct xen_dm_op_buf
  and add the necessary compat code. Drop Jan's R-b since the patch has
  been fundamentally modified.

v3:
- Re-written large portions of dmop.markdown to remove references to
  previous proposals and make it a standalone design doc.

v2:
- Addressed several comments from Jan.
- Removed modification of __XEN_LATEST_INTERFACE_VERSION__ as it is not
  needed in this patch.
---
 docs/designs/dmop.markdown| 165 ++
 tools/flask/policy/modules/xen.if |   2 +-
 tools/libxc/include/xenctrl.h |   1 +
 tools/libxc/xc_private.c  |  70 
 tools/libxc/xc_private.h  |   2 +
 xen/arch/x86/hvm/Makefile |   1 +
 xen/arch/x86/hvm/dm.c | 149 ++
 xen/arch/x86/hvm/hvm.c|   1 +
 xen/arch/x86/hypercall.c  |   2 +
 xen/include/Makefile  |   1 +
 xen/include/public/hvm/dm_op.h|  71 
 xen/include/public/xen.h  |   1 +
 xen/include/xen/hypercall.h   |  15 
 xen/include/xlat.lst  |   1 +
 xen/include/xsm/dummy.h   |   6 ++
 xen/include/xsm/xsm.h |   6 ++
 xen/xsm/flask/hooks.c |   7 ++
 17 files changed, 500 insertions(+), 1 deletion(-)
 create mode 100644 docs/designs/dmop.markdown
 create mode 100644 xen/arch/x86/hvm/dm.c
 create mode 100644 xen/include/public/hvm/dm_op.h

diff --git a/docs/designs/dmop.markdown b/docs/designs/dmop.markdown
new file mode 100644
index 000..9f2f0d4
--- /dev/null
+++ b/docs/designs/dmop.markdown
@@ -0,0 +1,165 @@
+DMOP
+====
+
+Introduction
+------------
+
+The aim of DMOP is to prevent a compromised device model from compromising
+domains other than the one it is associated with. (And is therefore likely
+already compromised).
+
+The problem occurs when a device model issues a hypercall that
+includes references to user memory other than the operation structure
+itself, such as with Track dirty VRAM (as used in VGA emulation).
+In this case, the address of this other user memory needs to be vetted,
+to ensure it is not within restricted address ranges, such as kernel
+memory. The real problem comes down to how you would vet this address -
+the ideal place to do this is within the privcmd driver, without privcmd
+having to have specific knowledge of the hypercall's semantics.
+
+The Design
+----------
+
+The privcmd driver implements a new restriction ioctl, which takes a domid
+parameter.  After that restriction ioctl is issued, the privcmd driver will
+permit only DMOP hypercalls, and only with the specified target domid.
+
+A DMOP hypercall consists of an array of buffers and lengths, with the
+first one containing the specific DMOP parameters. These can then reference
+further buffers from within the array. Since the only user buffers
+passed are those found within that array, they can all be audited by
+privcmd.
+
+The following code illustrates this idea:
+
+struct xen_dm_op {
+uint32_t op;
+};
+
+struct xen_dm_op_buf {
+XEN_GUEST_HANDLE(void) h;
+unsigned long size;
+};
+typedef struct xen_dm_op_buf xen_dm_op_buf_t;
+
+enum neg_errnoval
+HYPERVISOR_dm_op(domid_t domid,
+ xen_dm_op_buf_t bufs[],
+ unsigned int nr_bufs)
+
+@domid is the domain the hypercall operates on.
+@bufs points to an array of buffers where @bufs[0] contains a struct
+dm_op, describing the specific device model operation and its parameters.
+@bufs[1..] may be referenced in the parameters for the purposes of
+passing extra information to or from the domain.
+@nr_bufs is the number of buffers in the @bufs array.
+
+It is forbidden for the above struct (xen_dm_op) to contain any guest
+handles. If they are needed, they should instead be in
+HYPERVISOR_dm_op->bufs.
+
+Validation by privcmd driver
+----------------------------
+
+If the privcmd driver has been restricted to a specific domain (using a
+ new ioctl), when it receives an op, it will:
+
+1. Check hypercall is DMOP.
+
+2. Check domid == restricted domid.
+
+3. For each @nr_bufs in @bufs: Check @h and @size give 

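To make the buffer scheme from the quoted design a little more concrete, here
is a hedged sketch of how a caller could package an operation together with
one auxiliary buffer. It uses only the types and prototype declared in the
design above (struct xen_dm_op, xen_dm_op_buf_t, HYPERVISOR_dm_op()); the
XEN_DMOP_track_dirty_vram value is taken from later patches in this series,
while set_handle() stands in for whatever guest-handle plumbing the real
headers provide.

/* Hedged sketch only: a DMOP referencing one extra buffer (a dirty bitmap). */
int example_issue_dmop(domid_t domid, void *bitmap, unsigned long bitmap_size)
{
    struct xen_dm_op op = { .op = XEN_DMOP_track_dirty_vram };
    xen_dm_op_buf_t bufs[2];

    set_handle(&bufs[0].h, &op);       /* bufs[0]: the op structure itself */
    bufs[0].size = sizeof(op);

    set_handle(&bufs[1].h, bitmap);    /* bufs[1]: data the op refers to   */
    bufs[1].size = bitmap_size;

    /* Only @domid and the bufs[] array cross privcmd, so all user memory
     * involved can be audited there, as the design describes. */
    return HYPERVISOR_dm_op(domid, bufs, 2);
}

The same two-buffer pattern appears in the converted libxc code later in this
series, e.g. xc_hvm_track_dirty_vram() passing the op and the dirty bitmap to
do_dm_op() as two separate buffers.
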
[Xen-devel] [PATCH v4 6/8] dm_op: convert HVMOP_set_mem_type

2017-01-17 Thread Paul Durrant
This patch removes the need for handling HVMOP restarts, so that
infrastructure is removed.

NOTE: This patch also modifies the type of the 'nr' argument of
  xc_hvm_set_mem_type() from uint64_t to uint32_t. In practice the
  value passed was always truncated to 32 bits.

Suggested-by: Jan Beulich 
Signed-off-by: Paul Durrant 
---
Reviewed-by: Jan Beulich 
Cc: Ian Jackson 
Acked-by: Wei Liu 
Cc: Andrew Cooper 
Cc: Daniel De Graaf 

v4:
- Added initializers as requested by Jan.

v3:
- Addressed more comments from Jan.

v2:
- Addressed several comments from Jan.
---
 tools/libxc/include/xenctrl.h   |   2 +-
 tools/libxc/xc_misc.c   |  29 +++-
 xen/arch/x86/hvm/dm.c   |  90 
 xen/arch/x86/hvm/hvm.c  | 136 +---
 xen/include/public/hvm/dm_op.h  |  22 ++
 xen/include/public/hvm/hvm_op.h |  20 --
 xen/xsm/flask/policy/access_vectors |   2 +-
 7 files changed, 125 insertions(+), 176 deletions(-)

diff --git a/tools/libxc/include/xenctrl.h b/tools/libxc/include/xenctrl.h
index a5c234f..13431bb 100644
--- a/tools/libxc/include/xenctrl.h
+++ b/tools/libxc/include/xenctrl.h
@@ -1634,7 +1634,7 @@ int xc_hvm_modified_memory(
  * Allowed types are HVMMEM_ram_rw, HVMMEM_ram_ro, HVMMEM_mmio_dm
  */
 int xc_hvm_set_mem_type(
-xc_interface *xch, domid_t dom, hvmmem_type_t memtype, uint64_t first_pfn, 
uint64_t nr);
+xc_interface *xch, domid_t dom, hvmmem_type_t memtype, uint64_t first_pfn, 
uint32_t nr);
 
 /*
  * Injects a hardware/software CPU trap, to take effect the next time the HVM 
diff --git a/tools/libxc/xc_misc.c b/tools/libxc/xc_misc.c
index 597df99..5b06d6b 100644
--- a/tools/libxc/xc_misc.c
+++ b/tools/libxc/xc_misc.c
@@ -590,30 +590,21 @@ int xc_hvm_modified_memory(
 }
 
 int xc_hvm_set_mem_type(
-xc_interface *xch, domid_t dom, hvmmem_type_t mem_type, uint64_t 
first_pfn, uint64_t nr)
+xc_interface *xch, domid_t dom, hvmmem_type_t mem_type, uint64_t 
first_pfn, uint32_t nr)
 {
-DECLARE_HYPERCALL_BUFFER(struct xen_hvm_set_mem_type, arg);
-int rc;
-
-arg = xc_hypercall_buffer_alloc(xch, arg, sizeof(*arg));
-if ( arg == NULL )
-{
-PERROR("Could not allocate memory for xc_hvm_set_mem_type hypercall");
-return -1;
-}
+struct xen_dm_op op;
+struct xen_dm_op_set_mem_type *data;
 
-arg->domid= dom;
-arg->hvmmem_type  = mem_type;
-arg->first_pfn= first_pfn;
-arg->nr   = nr;
+memset(&op, 0, sizeof(op));
 
-rc = xencall2(xch->xcall, __HYPERVISOR_hvm_op,
-  HVMOP_set_mem_type,
-  HYPERCALL_BUFFER_AS_ARG(arg));
+op.op = XEN_DMOP_set_mem_type;
+data = &op.u.set_mem_type;
 
-xc_hypercall_buffer_free(xch, arg);
+data->mem_type = mem_type;
+data->first_pfn = first_pfn;
+data->nr = nr;
 
-return rc;
+return do_dm_op(xch, dom, 1, &op, sizeof(op));
 }
 
 int xc_hvm_inject_trap(
diff --git a/xen/arch/x86/hvm/dm.c b/xen/arch/x86/hvm/dm.c
index dd81116..b3c91f8 100644
--- a/xen/arch/x86/hvm/dm.c
+++ b/xen/arch/x86/hvm/dm.c
@@ -159,6 +159,82 @@ static int modified_memory(struct domain *d, xen_pfn_t 
*first_pfn,
 return rc;
 }
 
+static bool allow_p2m_type_change(p2m_type_t old, p2m_type_t new)
+{
+return p2m_is_ram(old) ||
+   (p2m_is_hole(old) && new == p2m_mmio_dm) ||
+   (old == p2m_ioreq_server && new == p2m_ram_rw);
+}
+
+static int set_mem_type(struct domain *d, hvmmem_type_t mem_type,
+xen_pfn_t *first_pfn, unsigned int *nr)
+{
+xen_pfn_t last_pfn = *first_pfn + *nr - 1;
+unsigned int iter = 0;
+int rc = 0;
+
+/* Interface types to internal p2m types */
+static const p2m_type_t memtype[] = {
+[HVMMEM_ram_rw]  = p2m_ram_rw,
+[HVMMEM_ram_ro]  = p2m_ram_ro,
+[HVMMEM_mmio_dm] = p2m_mmio_dm,
+[HVMMEM_unused] = p2m_invalid,
+[HVMMEM_ioreq_server] = p2m_ioreq_server
+};
+
+if ( (*first_pfn > last_pfn) ||
+ (last_pfn > domain_get_maximum_gpfn(d)) )
+return -EINVAL;
+
+if ( mem_type >= ARRAY_SIZE(memtype) ||
+ unlikely(mem_type == HVMMEM_unused) )
+return -EINVAL;
+
+while ( iter < *nr )
+{
+unsigned long pfn = *first_pfn + iter;
+p2m_type_t t;
+
+get_gfn_unshare(d, pfn, &t);
+if ( p2m_is_paging(t) )
+{
+put_gfn(d, pfn);
+p2m_mem_paging_populate(d, pfn);
+return -EAGAIN;
+}
+
+if ( p2m_is_shared(t) )
+rc = -EAGAIN;
+else if ( !allow_p2m_type_change(t, memtype[mem_type]) )
+rc = -EINVAL;
+else
+rc = p2m_change_type_one(d, pfn, t, memtype[mem_type]);
+
+put_gfn(d, pfn);
+
+if ( rc )
+

[Xen-devel] [PATCH v4 5/8] dm_op: convert HVMOP_modified_memory

2017-01-17 Thread Paul Durrant
This patch introduces code to handle DMOP continuations.

NOTE: This patch also modifies the type of the 'nr' argument of
  xc_hvm_modified_memory() from uint64_t to uint32_t. In practice the
  value passed was always truncated to 32 bits.

Suggested-by: Jan Beulich 
Signed-off-by: Paul Durrant 
---
Cc: Jan Beulich 
Cc: Ian Jackson 
Acked-by: Wei Liu 
Cc: Andrew Cooper 
Cc: Daniel De Graaf 

v4:
- Continuation code in dm.c modified as knock-on from compat code. Not
  adding Jan's R-b since patch has fundamentally changed.

v3:
- Addressed more comments from Jan.

v2:
- Addressed several comments from Jan, including...
- Added explanatory note on continuation handling
---
 tools/libxc/include/xenctrl.h   |  2 +-
 tools/libxc/xc_misc.c   | 27 +
 xen/arch/x86/hvm/dm.c   | 77 -
 xen/arch/x86/hvm/hvm.c  | 60 -
 xen/include/public/hvm/dm_op.h  | 19 +
 xen/include/public/hvm/hvm_op.h | 13 ---
 xen/xsm/flask/policy/access_vectors |  2 +-
 7 files changed, 106 insertions(+), 94 deletions(-)

diff --git a/tools/libxc/include/xenctrl.h b/tools/libxc/include/xenctrl.h
index f819bf2..a5c234f 100644
--- a/tools/libxc/include/xenctrl.h
+++ b/tools/libxc/include/xenctrl.h
@@ -1627,7 +1627,7 @@ int xc_hvm_track_dirty_vram(
  * Notify that some pages got modified by the Device Model
  */
 int xc_hvm_modified_memory(
-xc_interface *xch, domid_t dom, uint64_t first_pfn, uint64_t nr);
+xc_interface *xch, domid_t dom, uint64_t first_pfn, uint32_t nr);
 
 /*
  * Set a range of memory to a specific type.
diff --git a/tools/libxc/xc_misc.c b/tools/libxc/xc_misc.c
index ddea2bb..597df99 100644
--- a/tools/libxc/xc_misc.c
+++ b/tools/libxc/xc_misc.c
@@ -573,29 +573,20 @@ int xc_hvm_track_dirty_vram(
 }
 
 int xc_hvm_modified_memory(
-xc_interface *xch, domid_t dom, uint64_t first_pfn, uint64_t nr)
+xc_interface *xch, domid_t dom, uint64_t first_pfn, uint32_t nr)
 {
-DECLARE_HYPERCALL_BUFFER(struct xen_hvm_modified_memory, arg);
-int rc;
-
-arg = xc_hypercall_buffer_alloc(xch, arg, sizeof(*arg));
-if ( arg == NULL )
-{
-PERROR("Could not allocate memory for xc_hvm_modified_memory 
hypercall");
-return -1;
-}
+struct xen_dm_op op;
+struct xen_dm_op_modified_memory *data;
 
-arg->domid = dom;
-arg->first_pfn = first_pfn;
-arg->nr= nr;
+memset(&op, 0, sizeof(op));
 
-rc = xencall2(xch->xcall, __HYPERVISOR_hvm_op,
-  HVMOP_modified_memory,
-  HYPERCALL_BUFFER_AS_ARG(arg));
+op.op = XEN_DMOP_modified_memory;
+data = &op.u.modified_memory;
 
-xc_hypercall_buffer_free(xch, arg);
+data->first_pfn = first_pfn;
+data->nr = nr;
 
-return rc;
+return do_dm_op(xch, dom, 1, &op, sizeof(op));
 }
 
 int xc_hvm_set_mem_type(
diff --git a/xen/arch/x86/hvm/dm.c b/xen/arch/x86/hvm/dm.c
index 12a82e5..dd81116 100644
--- a/xen/arch/x86/hvm/dm.c
+++ b/xen/arch/x86/hvm/dm.c
@@ -14,6 +14,7 @@
  * this program; If not, see .
  */
 
+#include 
 #include 
 #include 
 #include 
@@ -105,6 +106,59 @@ static int set_isa_irq_level(struct domain *d, uint8_t 
isa_irq,
 return 0;
 }
 
+static int modified_memory(struct domain *d, xen_pfn_t *first_pfn,
+   unsigned int *nr)
+{
+xen_pfn_t last_pfn = *first_pfn + *nr - 1;
+unsigned int iter = 0;
+int rc = 0;
+
+if ( (*first_pfn > last_pfn) ||
+ (last_pfn > domain_get_maximum_gpfn(d)) )
+return -EINVAL;
+
+if ( !paging_mode_log_dirty(d) )
+return 0;
+
+while ( iter < *nr )
+{
+unsigned long pfn = *first_pfn + iter;
+struct page_info *page;
+
+page = get_page_from_gfn(d, pfn, NULL, P2M_UNSHARE);
+if ( page )
+{
+mfn_t gmfn = _mfn(page_to_mfn(page));
+
+paging_mark_dirty(d, gmfn);
+/*
+ * These are most probably not page tables any more
+ * don't take a long time and don't die either.
+ */
+sh_remove_shadows(d, gmfn, 1, 0);
+put_page(page);
+}
+
+iter++;
+
+/*
+ * Check for continuation every 256th iteration and if the
+ * iteration is not the last.
+ */
+if ( (iter < *nr) && ((iter & 0xff) == 0) &&
+ hypercall_preempt_check() )
+{
+*first_pfn += iter;
+*nr -= iter;
+
+rc = -ERESTART;
+break;
+}
+}
+
+return rc;
+}
+
 static int dm_op(domid_t domid,
  unsigned int nr_bufs,
  xen_dm_op_buf_t bufs[])
@@ -266,12 +320,25 @@ static int dm_op(domid_t domid,
 break;
 }
 
+

[Xen-devel] [PATCH v3] kexec: implement STATUS hypercall to check if image is loaded

2017-01-17 Thread Eric DeVolder
The tools that use kexec are asynchronous in nature and do not keep
state changes. As such, provide a hypercall to find out whether an
image has been loaded for either type.

Note: No need to modify XSM as it has a one-size-fits-all check and
does not check for subcommands.

Note: No need to check KEXEC_FLAG_IN_PROGRESS (and error out of
kexec_status()) as this flag is set only once by the first/only
cpu on the crash path.

Note: This is just the Xen side of the hypercall, kexec-tools patch
to come separately.

Signed-off-by: Konrad Rzeszutek Wilk 
Signed-off-by: Eric DeVolder 
---
CC: Andrew Cooper 
CC: Elena Ufimtseva 
CC: Daniel Kiper 

v0: Internal version.
v1: Dropped Reviewed-by, posting on xen-devel.
v2: Incorporated xen-devel feedback, re-posted on xen-devel.
v3: Incorporated xen-devel feedback
---
 tools/libxc/include/xenctrl.h | 10 ++
 tools/libxc/xc_kexec.c| 24 
 xen/common/kexec.c| 19 +++
 xen/include/public/kexec.h| 13 +
 4 files changed, 66 insertions(+)

diff --git a/tools/libxc/include/xenctrl.h b/tools/libxc/include/xenctrl.h
index 4ab0f57..63c616f 100644
--- a/tools/libxc/include/xenctrl.h
+++ b/tools/libxc/include/xenctrl.h
@@ -2574,6 +2574,16 @@ int xc_kexec_load(xc_interface *xch, uint8_t type, 
uint16_t arch,
  */
 int xc_kexec_unload(xc_interface *xch, int type);
 
+/*
+ * Find out whether the image has been successfully loaded.
+ *
+ * The type can be either KEXEC_TYPE_DEFAULT or KEXEC_TYPE_CRASH.
+ * If zero is returned, that means no image is loaded for the type.
+ * If one is returned, that means an image is loaded for the type.
+ * Otherwise, negative return value indicates error.
+ */
+int xc_kexec_status(xc_interface *xch, int type);
+
 typedef xenpf_resource_entry_t xc_resource_entry_t;
 
 /*
diff --git a/tools/libxc/xc_kexec.c b/tools/libxc/xc_kexec.c
index 59e2f07..a4e8966 100644
--- a/tools/libxc/xc_kexec.c
+++ b/tools/libxc/xc_kexec.c
@@ -126,3 +126,27 @@ out:
 
 return ret;
 }
+
+int xc_kexec_status(xc_interface *xch, int type)
+{
+DECLARE_HYPERCALL_BUFFER(xen_kexec_status_t, status);
+int ret = -1;
+
+status = xc_hypercall_buffer_alloc(xch, status, sizeof(*status));
+if ( status == NULL )
+{
+PERROR("Could not alloc buffer for kexec status hypercall");
+goto out;
+}
+
+status->type = type;
+
+ret = xencall2(xch->xcall, __HYPERVISOR_kexec_op,
+   KEXEC_CMD_kexec_status,
+   HYPERCALL_BUFFER_AS_ARG(status));
+
+out:
+xc_hypercall_buffer_free(xch, status);
+
+return ret;
+}
diff --git a/xen/common/kexec.c b/xen/common/kexec.c
index c83d48f..aa808cb 100644
--- a/xen/common/kexec.c
+++ b/xen/common/kexec.c
@@ -1169,6 +1169,22 @@ static int kexec_unload(XEN_GUEST_HANDLE_PARAM(void) 
uarg)
 return kexec_do_unload();
 }
 
+static int kexec_status(XEN_GUEST_HANDLE_PARAM(void) uarg)
+{
+xen_kexec_status_t status;
+int base, bit;
+
+if ( unlikely(copy_from_guest(&status, uarg, 1)) )
+return -EFAULT;
+
+/* No need to check KEXEC_FLAG_IN_PROGRESS. */
+
+if ( kexec_load_get_bits(status.type, &base, &bit) )
+return -EINVAL;
+
+return test_bit(bit, &kexec_flags);
+}
+
 static int do_kexec_op_internal(unsigned long op,
 XEN_GUEST_HANDLE_PARAM(void) uarg,
 bool_t compat)
@@ -1208,6 +1224,9 @@ static int do_kexec_op_internal(unsigned long op,
 case KEXEC_CMD_kexec_unload:
 ret = kexec_unload(uarg);
 break;
+case KEXEC_CMD_kexec_status:
+ret = kexec_status(uarg);
+break;
 }
 
 return ret;
diff --git a/xen/include/public/kexec.h b/xen/include/public/kexec.h
index a6a0a88..c200e8c 100644
--- a/xen/include/public/kexec.h
+++ b/xen/include/public/kexec.h
@@ -227,6 +227,19 @@ typedef struct xen_kexec_unload {
 } xen_kexec_unload_t;
 DEFINE_XEN_GUEST_HANDLE(xen_kexec_unload_t);
 
+/*
+ * Figure out whether we have an image loaded. A return value of
+ * zero indicates no image loaded. A return value of one
+ * indicates an image is loaded. A negative return value
+ * indicates an error.
+ *
+ * Type must be one of KEXEC_TYPE_DEFAULT or KEXEC_TYPE_CRASH.
+ */
+#define KEXEC_CMD_kexec_status 6
+typedef struct xen_kexec_status {
+uint8_t type;
+} xen_kexec_status_t;
+DEFINE_XEN_GUEST_HANDLE(xen_kexec_status_t);
 #else /* __XEN_INTERFACE_VERSION__ < 0x00040400 */
 
 #define KEXEC_CMD_kexec_load KEXEC_CMD_kexec_load_v1
-- 
2.7.4


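As a usage illustration (not part of the patch): a tool could query the new
status call roughly as follows. This is a hedged sketch; only xc_kexec_status()
and KEXEC_TYPE_CRASH come from the patch and public headers above, the rest is
ordinary libxc boilerplate.

/* Hedged sketch: checking whether a crash image is currently loaded. */
#include <stdio.h>
#include <xenctrl.h>

int main(void)
{
    xc_interface *xch = xc_interface_open(NULL, NULL, 0);
    int rc;

    if ( !xch )
        return 1;

    rc = xc_kexec_status(xch, KEXEC_TYPE_CRASH);
    if ( rc < 0 )
        fprintf(stderr, "kexec status query failed: %d\n", rc);
    else
        printf("crash kernel %s loaded\n", rc ? "is" : "is not");

    xc_interface_close(xch);
    return rc < 0;
}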


[Xen-devel] [PATCH 3/5] xen: credit2: fix shutdown/suspend when playing with cpupools.

2017-01-17 Thread Dario Faggioli
In fact, during shutdown/suspend, we temporarily move all
the vCPUs to the BSP (i.e., pCPU 0, as of now). For Credit2
domains, we call csched2_vcpu_migrate(), which expects to find the
target pCPU in the domain's pool.

Therefore, if Credit2 is the default scheduler and we have
removed pCPU 0 from cpupool0, shutdown/suspend fails like
this:

 RIP:e008:[] sched_credit2.c#migrate+0x274/0x2d1
 Xen call trace:
[] sched_credit2.c#migrate+0x274/0x2d1
[] sched_credit2.c#csched2_vcpu_migrate+0x6e/0x86
[] schedule.c#vcpu_move_locked+0x69/0x6f
[] cpu_disable_scheduler+0x3d7/0x430
[] __cpu_disable+0x299/0x2b0
[] cpu.c#take_cpu_down+0x2f/0x38
[] stop_machine.c#stopmachine_action+0x7f/0x8d
[] tasklet.c#do_tasklet_work+0x74/0xab
[] do_tasklet+0x66/0x8b
[] domain.c#idle_loop+0x3b/0x5e

 
 Panic on CPU 8:
 Assertion 'svc->vcpu->processor < nr_cpu_ids' failed at sched_credit2.c:1729
 

On the other hand, if Credit2 is the scheduler of another
pool, when trying (still during shutdown/suspend) to move
the vCPUs of the Credit2 domains to pCPU 0, it figures
out that pCPU 0 is not a Credit2 pCPU, and fails like this:

 RIP:e008:[] 
sched_credit2.c#csched2_vcpu_migrate+0xa1/0x107
 Xen call trace:
[] sched_credit2.c#csched2_vcpu_migrate+0xa1/0x107
[] schedule.c#vcpu_move_locked+0x69/0x6f
[] cpu_disable_scheduler+0x3d7/0x430
[] __cpu_disable+0x299/0x2b0
[] cpu.c#take_cpu_down+0x2f/0x38
[] stop_machine.c#stopmachine_action+0x7f/0x8d
[] tasklet.c#do_tasklet_work+0x74/0xab
[] do_tasklet+0x66/0x8b
[] domain.c#idle_loop+0x3b/0x5e

The solution is to recognise this specific situation inside
csched2_vcpu_migrate() and, considering it is something temporary
which only happens during shutdown/suspend, deal with it quickly.

Then, in the resume path, in restore_vcpu_affinity(), things
are set back to normal, and a new v->processor is chosen, for
each vCPU, from the proper set of pCPUs (i.e., the ones of
the proper cpupool).

Signed-off-by: Dario Faggioli 
---
Cc: George Dunlap 
---
This is a bugfix, and should be backported to 4.8.

Note that Credit2 being used, either as default scheduler or in a cpupool is
what triggers the bug, but it's actually more a general thing, which would
affect any scheduler that remaps the runqueue locks.
---
 xen/common/sched_credit2.c |   32 +++-
 xen/common/schedule.c  |   25 ++---
 2 files changed, 53 insertions(+), 4 deletions(-)

diff --git a/xen/common/sched_credit2.c b/xen/common/sched_credit2.c
index ce0e146..2ce738d 100644
--- a/xen/common/sched_credit2.c
+++ b/xen/common/sched_credit2.c
@@ -1946,13 +1946,43 @@ static void
 csched2_vcpu_migrate(
 const struct scheduler *ops, struct vcpu *vc, unsigned int new_cpu)
 {
+struct domain *d = vc->domain;
 struct csched2_vcpu * const svc = CSCHED2_VCPU(vc);
 struct csched2_runqueue_data *trqd;
+s_time_t now = NOW();
 
 /* Check if new_cpu is valid */
ASSERT(cpumask_test_cpu(new_cpu, &CSCHED2_PRIV(ops)->initialized));
 ASSERT(cpumask_test_cpu(new_cpu, vc->cpu_hard_affinity));
 
+/*
+ * Being passed a target pCPU which is outside of our cpupool is only
+ * valid if we are shutting down (or doing ACPI suspend), and we are
+ * moving everyone to BSP, no matter whether or not BSP is inside our
+ * cpupool.
+ *
+ * And since there indeed is the chance that it is not part of it, all
+ * we must do is remove _and_ unassign the vCPU from any runqueue, as
+ * well as updating v->processor with the target, so that the suspend
+ * process can continue.
+ *
+ * It will then be during resume that a new, meaningful, value for
+ * v->processor will be chosen, and during actual domain unpause that
+ * the vCPU will be assigned to and added to the proper runqueue.
+ */
+if ( unlikely(!cpumask_test_cpu(new_cpu, cpupool_domain_cpumask(d))) )
+{
+ASSERT(system_state == SYS_STATE_suspend);
+if ( __vcpu_on_runq(svc) )
+{
+__runq_remove(svc);
+update_load(ops, svc->rqd, NULL, -1, now);
+}
+__runq_deassign(svc);
+vc->processor = new_cpu;
+return;
+}
+
 trqd = RQD(ops, new_cpu);
 
 /*
@@ -1964,7 +1994,7 @@ csched2_vcpu_migrate(
  * pointing to a pcpu where we can't run any longer.
  */
 if ( trqd != svc->rqd )
-migrate(ops, svc, trqd, NOW());
+migrate(ops, svc, trqd, now);
 else
 vc->processor = new_cpu;
 }
diff --git a/xen/common/schedule.c b/xen/common/schedule.c
index 5b444c4..36ff2e9 100644
--- a/xen/common/schedule.c
+++ b/xen/common/schedule.c
@@ -633,8 +633,11 @@ void vcpu_force_reschedule(struct vcpu *v)
 
 void restore_vcpu_affinity(struct domain *d)
 {
+unsigned int cpu = 

[Xen-devel] [PATCH 5/5] xen: sched: simplify ACPI S3 resume path.

2017-01-17 Thread Dario Faggioli
In fact, when domains are being unpaused:
 - it's not necessary to put the vcpu to sleep, as
   they are all already paused;
 - it is not necessary to call vcpu_migrate(), as
   the vcpus are still paused, and therefore won't
   wakeup anyway.

Basically, the only important thing is to call
pick_cpu, to let the scheduler run and figure out
what would be the best initial placement (i.e., the
value stored in v->processor), for the vcpus, as
they come back up, one after another.

Note that this is consistent with what was happening
before this change, as vcpu_migrate() calls pick_cpu.
But much simpler and quicker.

Signed-off-by: Dario Faggioli 
---
Cc: George Dunlap 
---
 xen/common/schedule.c |   22 ++
 1 file changed, 10 insertions(+), 12 deletions(-)

diff --git a/xen/common/schedule.c b/xen/common/schedule.c
index bee5d1f..43b5b99 100644
--- a/xen/common/schedule.c
+++ b/xen/common/schedule.c
@@ -635,7 +635,11 @@ void restore_vcpu_affinity(struct domain *d)
 
 for_each_vcpu ( d, v )
 {
-spinlock_t *lock = vcpu_schedule_lock_irq(v);
+spinlock_t *lock;
+
+ASSERT(!vcpu_runnable(v));
+
+lock = vcpu_schedule_lock_irq(v);
 
 if ( v->affinity_broken )
 {
@@ -659,17 +663,11 @@ void restore_vcpu_affinity(struct domain *d)
 cpupool_domain_cpumask(v->domain));
 v->processor = cpumask_any(cpumask_scratch_cpu(cpu));
 
-if ( v->processor == cpu )
-{
-set_bit(_VPF_migrating, &v->pause_flags);
-spin_unlock_irq(lock);;
-vcpu_sleep_nosync(v);
-vcpu_migrate(v);
-}
-else
-{
-spin_unlock_irq(lock);
-}
+spin_unlock_irq(lock);;
+
+lock = vcpu_schedule_lock_irq(v);
+v->processor = SCHED_OP(VCPU2OP(v), pick_cpu, v);
+spin_unlock_irq(lock);
 }
 
 domain_update_node_affinity(d);




[Xen-devel] [PATCH 0/5] xen: sched: scheduling (mostly, Credit2) and cpupool fixes and improvements

2017-01-17 Thread Dario Faggioli
Hello,

This series fixes a few issues related to Credit2 and to scheduling and
cpupool interactions in a more general fashion.

The first 3 patches cure (symptoms of) bugs in Credit2, and should be
backported to 4.8 (it should not be too hard to do so, and I can help with
that, if necessary).

In fact, patch 1 ("xen: credit2: use the correct scratch cpumask.") fixes a
buggy behavior identified by Jan here [1]. No Oops or ASSERT was triggering,
but there's the risk of incurring nonoptimal or unpredictable scheduling
behavior when multiple cpupools, with different schedulers, are used.

Patch 2 ("xen: credit2: never consider CPUs outside of our cpupool.") is
necessary because I thought we were already taking all the proper measures to
have Credit2 vCPUs live in their cpupool, but that wasn't the case. The patch
cures a potential crash, so it's important, IMO, and should also be backported.
As noted in the extended changelog, while working on this, I identified some
unideal aspects of the interface and the interactions between cpupools and the
scheduler. Fixing that properly will require more work, if not a rethink of the
said interface.

Patch 3 ("xen: credit2: fix shutdown/suspend when playing with cpupools.") also
fixes a bug which manifests itself when the host is shut down or attempts
suspending with the BSP (CPU 0, as of now) not belonging to cpupool0 as it does
by default. This again manifests only when Credit2 is involved (see patch
description for more details), but is more general and could potentially affect
any scheduler that does a runqueue lock remapping and management similar to
what Credit2 does in that department. This is probably the most 'invasive'
(affects schedule.c), but I think it should also be backported.

The last 2 patches, OTOH, are improvements rather than bugfixes, and so they're
not backport candidates.

There is a git branch with the patch applied available here:

 * git://xenbits.xen.org/people/dariof/xen.git rel/sched/fix-credit2-cpupool
 * 
http://xenbits.xen.org/gitweb/?p=people/dariof/xen.git;a=shortlog;h=refs/heads/rel/sched/fix-credit2-cpupool
 * https://travis-ci.org/fdario/xen/builds/192726171

Thanks and Regards,
Dario

---
Dario Faggioli (5):
  xen: credit2: use the correct scratch cpumask.
  xen: credit2: never consider CPUs outside of our cpupool.
  xen: credit2: fix shutdown/suspend when playing with cpupools.
  xen: sched: improve use of cpumask scratch space in Credit1.
  xen: sched: simplify ACPI S3 resume path.

 xen/common/sched_credit.c  |5 +-
 xen/common/sched_credit2.c |  110 
 xen/common/schedule.c  |   48 ---
 xen/include/xen/sched-if.h |7 +++
 4 files changed, 118 insertions(+), 52 deletions(-)
--
<> (Raistlin Majere)
-
Dario Faggioli, Ph.D, http://about.me/dario.faggioli
Senior Software Engineer, Citrix Systems R&D Ltd., Cambridge (UK)



[Xen-devel] [PATCH 1/5] xen: credit2: use the correct scratch cpumask.

2017-01-17 Thread Dario Faggioli
In fact, there is one scratch mask for each CPU. When
you use the one belonging to a CPU, it must be true that:
 - the CPU belongs to your cpupool and scheduler,
 - you own the runqueue lock (the one you take via
   {v,p}cpu_schedule_lock()) for that CPU.

This was not the case within the following functions:

get_fallback_cpu(), csched2_cpu_pick(): as we can't be
sure we either are on, or hold the lock for, the CPU
that is in the vCPU's 'v->processor'.

migrate(): it's ok, when called from balance_load(),
because that comes from csched2_schedule(), which takes
the runqueue lock of the CPU where it executes. But it is
not ok when we come from csched2_vcpu_migrate(), which
can be called from other places.

The fix is to explicitly use the scratch space of the
CPUs for which we know we hold the runqueue lock.

Signed-off-by: Dario Faggioli 
Reported-by: Jan Beulich 
---
Cc: George Dunlap 
Cc: Jan Beulich 
---
This is a bugfix, and should be backported to 4.8.
---
 xen/common/sched_credit2.c |   39 ---
 1 file changed, 20 insertions(+), 19 deletions(-)

diff --git a/xen/common/sched_credit2.c b/xen/common/sched_credit2.c
index ef8e0d8..523922e 100644
--- a/xen/common/sched_credit2.c
+++ b/xen/common/sched_credit2.c
@@ -510,24 +510,23 @@ void smt_idle_mask_clear(unsigned int cpu, cpumask_t 
*mask)
  */
 static int get_fallback_cpu(struct csched2_vcpu *svc)
 {
-int cpu;
+int fallback_cpu, cpu = svc->vcpu->processor;
 
-if ( likely(cpumask_test_cpu(svc->vcpu->processor,
- svc->vcpu->cpu_hard_affinity)) )
-return svc->vcpu->processor;
+if ( likely(cpumask_test_cpu(cpu, svc->vcpu->cpu_hard_affinity)) )
+return cpu;
 
-cpumask_and(cpumask_scratch, svc->vcpu->cpu_hard_affinity,
+cpumask_and(cpumask_scratch_cpu(cpu), svc->vcpu->cpu_hard_affinity,
&svc->rqd->active);
-cpu = cpumask_first(cpumask_scratch);
-if ( likely(cpu < nr_cpu_ids) )
-return cpu;
+fallback_cpu = cpumask_first(cpumask_scratch_cpu(cpu));
+if ( likely(fallback_cpu < nr_cpu_ids) )
+return fallback_cpu;
 
 cpumask_and(cpumask_scratch, svc->vcpu->cpu_hard_affinity,
 cpupool_domain_cpumask(svc->vcpu->domain));
 
-ASSERT(!cpumask_empty(cpumask_scratch));
+ASSERT(!cpumask_empty(cpumask_scratch_cpu(cpu)));
 
-return cpumask_first(cpumask_scratch);
+return cpumask_first(cpumask_scratch_cpu(cpu));
 }
 
 /*
@@ -1492,7 +1491,7 @@ static int
 csched2_cpu_pick(const struct scheduler *ops, struct vcpu *vc)
 {
 struct csched2_private *prv = CSCHED2_PRIV(ops);
-int i, min_rqi = -1, new_cpu;
+int i, min_rqi = -1, new_cpu, cpu = vc->processor;
 struct csched2_vcpu *svc = CSCHED2_VCPU(vc);
 s_time_t min_avgload = MAX_LOAD;
 
@@ -1512,7 +1511,7 @@ csched2_cpu_pick(const struct scheduler *ops, struct vcpu 
*vc)
  * just grab the prv lock.  Instead, we'll have to trylock, and
  * do something else reasonable if we fail.
  */
-ASSERT(spin_is_locked(per_cpu(schedule_data, 
vc->processor).schedule_lock));
+ASSERT(spin_is_locked(per_cpu(schedule_data, cpu).schedule_lock));
 
if ( !read_trylock(&prv->lock) )
 {
@@ -1539,9 +1538,9 @@ csched2_cpu_pick(const struct scheduler *ops, struct vcpu 
*vc)
 }
 else
 {
-cpumask_and(cpumask_scratch, vc->cpu_hard_affinity,
+cpumask_and(cpumask_scratch_cpu(cpu), vc->cpu_hard_affinity,
&svc->migrate_rqd->active);
-new_cpu = cpumask_any(cpumask_scratch);
+new_cpu = cpumask_any(cpumask_scratch_cpu(cpu));
 if ( new_cpu < nr_cpu_ids )
 goto out_up;
 }
@@ -1598,9 +1597,9 @@ csched2_cpu_pick(const struct scheduler *ops, struct vcpu 
*vc)
 goto out_up;
 }
 
-cpumask_and(cpumask_scratch, vc->cpu_hard_affinity,
+cpumask_and(cpumask_scratch_cpu(cpu), vc->cpu_hard_affinity,
&prv->rqd[min_rqi].active);
-new_cpu = cpumask_any(cpumask_scratch);
+new_cpu = cpumask_any(cpumask_scratch_cpu(cpu));
 BUG_ON(new_cpu >= nr_cpu_ids);
 
  out_up:
@@ -1675,6 +1674,8 @@ static void migrate(const struct scheduler *ops,
 struct csched2_runqueue_data *trqd, 
 s_time_t now)
 {
+int cpu = svc->vcpu->processor;
+
 if ( unlikely(tb_init_done) )
 {
 struct {
@@ -1696,7 +1697,7 @@ static void migrate(const struct scheduler *ops,
 svc->migrate_rqd = trqd;
__set_bit(_VPF_migrating, &svc->vcpu->pause_flags);
__set_bit(__CSFLAG_runq_migrate_request, &svc->flags);
-cpu_raise_softirq(svc->vcpu->processor, SCHEDULE_SOFTIRQ);
+cpu_raise_softirq(cpu, SCHEDULE_SOFTIRQ);
 SCHED_STAT_CRANK(migrate_requested);
 }
 else
@@ -1711,9 +1712,9 @@ static void migrate(const struct scheduler *ops,
 }
 

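To restate the rule the patch above enforces, here is a minimal hedged sketch
(not code from the patch; it only reuses names that appear elsewhere in this
series and assumes the usual Xen scheduler headers): the per-CPU scratch mask
may only be used while holding the schedule lock that serializes that CPU.

/* Hedged sketch of the invariant: scratch mask of 'cpu' only under its lock. */
static void scratch_mask_example(struct vcpu *v)
{
    unsigned int cpu = v->processor;
    spinlock_t *lock = vcpu_schedule_lock_irq(v);   /* serializes 'cpu' */

    cpumask_and(cpumask_scratch_cpu(cpu), v->cpu_hard_affinity,
                cpupool_domain_cpumask(v->domain));
    /* ... pick a CPU out of cpumask_scratch_cpu(cpu) while still locked ... */

    spin_unlock_irq(lock);
}
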
[Xen-devel] [DRAFT C] PVH CPU hotplug design document

2017-01-17 Thread Roger Pau Monné
Hello,

Below is a draft of a design document for PVHv2 CPU hotplug. It should cover
both vCPU and pCPU hotplug. It's mainly centered around the hardware domain,
since for unprivileged PVH guests the vCPU hotplug mechanism is already
described in Boris' series [0], and it's shared with HVM.

The aim here is to find a way to use ACPI vCPU hotplug for the hardware domain,
while still being able to properly detect and notify Xen of pCPU hotplug.

[0] https://lists.xenproject.org/archives/html/xen-devel/2017-01/msg00060.html

---8<---
% CPU hotplug support for PVH
% Roger Pau Monné 
% Draft C

# Revision History

| Version | Date| Changes   |
|-|-|---|
| Draft A | 5 Jan 2017  | Initial draft.|
|-|-|---|
| Draft B | 12 Jan 2017 | Removed the XXX comments and clarify some |
| | | sections. |
| | |   |
| | | Added a sample of the SSDT ASL code that would be |
| | | appended to the hardware domain.  |
|-|-|---|
|Draft C  | 17 Jan 2017 | Define a _SB.XEN0 bus device and place all the|
| | | processor objects and the GPE block inside of it. |
| | |   |
| | | Place the GPE status and enable registers and |
| | | the vCPU enable bitmap in memory instead of IO|
| | | space.|

# Preface

This document aims to describe the interface to use in order to implement CPU
hotplug for PVH guests; this applies to hotplug of both physical and virtual
CPUs.

# Introduction

One of the design goals of PVH is to be able to remove as much Xen PV specific
code as possible, thus limiting the number of Xen PV interfaces used by guests,
and tending to use native interfaces (as used by bare metal) as much as
possible. This is in line with the efforts also done by Xen on ARM and helps
reduce the burden of maintaining huge amounts of Xen PV code inside of guests
kernels.

This however presents some challenges due to the model used by the Xen
Hypervisor, where some devices are handled by Xen while others are left for the
hardware domain to manage. The fact that Xen lacks an AML parser also makes it
harder, since it cannot get the full hardware description from dynamic ACPI
tables (DSDT, SSDT) without the hardware domain's collaboration.

One such issue is CPU enumeration and hotplug, for both the hardware and
unprivileged domains. The aim is to be able to use the same enumeration and
hotplug interface for all PVH guests, regardless of their privilege.

This document aims to describe the interface used in order to fulfill the
following actions:

 * Virtual CPU (vCPU) enumeration at boot time.
 * Hotplug of vCPUs.
 * Hotplug of physical CPUs (pCPUs) to Xen.

# Prior work

## PV CPU hotplug

CPU hotplug for Xen PV guests is implemented using xenstore and hypercalls. The
guest has to set up a watch event on the "cpu/" xenstore node, and react to
changes in this directory. CPUs are added by creating a new node and setting its
"availability" to online:

cpu/X/availability = "online"

Where X is the vCPU ID. This is an out-of-band method that relies on
Xen-specific interfaces in order to perform CPU hotplug.
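
For illustration only (not part of the draft): the toolstack side of this
mechanism is essentially a xenstore write that the guest's watch then reacts
to. A hedged sketch using libxenstore follows; the absolute path prefix is an
assumption, since the text above only names the relative cpu/X/availability
node.

/* Hedged sketch: marking vCPU 'vcpu' of domain 'domid' as online. */
#include <stdbool.h>
#include <stdio.h>
#include <string.h>
#include <xenstore.h>

static bool pv_vcpu_online(unsigned int domid, unsigned int vcpu)
{
    struct xs_handle *xsh = xs_open(0);
    char path[64];
    bool ok;

    if ( !xsh )
        return false;

    /* The "/local/domain/<domid>" prefix is assumed; the draft only specifies
     * the cpu/X/availability node relative to the guest's directory. */
    snprintf(path, sizeof(path),
             "/local/domain/%u/cpu/%u/availability", domid, vcpu);
    ok = xs_write(xsh, XBT_NULL, path, "online", strlen("online"));

    xs_close(xsh);
    return ok;
}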

## QEMU CPU hotplug using ACPI

The ACPI tables provided to HVM guests contain processor objects, as created by
libacpi. The number of processor objects in the ACPI namespace matches the
maximum number of processors supported by HVM guests (up to 128 at the time of
writing). Processors currently disabled are marked as such in the MADT and in
their \_MAT and \_STA methods.

A PRST operation region in I/O space is also defined, with a size of 128 bits,
that's used as a bitmap of enabled vCPUs on the system. A PRSC method is
provided in order to check for updates to the PRST region and trigger
notifications on the affected processor objects. The execution of the PRSC
method is done by a GPE event. Then OSPM checks the value returned by \_STA for
the ACPI\_STA\_DEVICE\_PRESENT flag in order to check if the vCPU has been
enabled.

## Native CPU hotplug

OSPM waits for a notification from ACPI on the processor object and when an
event is received the return value from _STA is checked in order to see if
ACPI\_STA\_DEVICE\_PRESENT has been enabled. This notification is triggered
from the method of a GPE block.

# PVH CPU hotplug

The aim as stated in the introduction is to use a method as similar as possible
to bare metal CPU hotplug for PVH, this is 

Re: [Xen-devel] IOMMU fault with IGD passthrough setup on XEN 4.8.0

2017-01-17 Thread Jan Beulich
>>> On 17.01.17 at 16:08,  wrote:
> I was lucky to capture the full log before it fills up my 100MB ring buffer
> (in less than 2 seconds).
> Please find the log in the attachment.

Sadly nothing helpful in there; I'm a little puzzled though that the
first thing we see is

(XEN) [VT-D]iommu.c:909: iommu_fault_status: Fault Overflow

which suggests there were (unlogged) faults already before.

My primary suspicion right now is that your problem is due to the
relatively large RMRR, as the first logged fault occurs on the first
2Mb boundary after the start of the RMRR. I'll therefore have to
find time to create a debugging patch for you.

Jan




[Xen-devel] [PATCH] xen/arm: bootfdt.c is only used during initialization

2017-01-17 Thread Julien Grall
This file contains data and code only used at initialization. Mark the
file as such in the build system and correct kind_guess.

Signed-off-by: Julien Grall 
---
 xen/arch/arm/Makefile  | 2 +-
 xen/arch/arm/bootfdt.c | 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/xen/arch/arm/Makefile b/xen/arch/arm/Makefile
index 59b3b53..cf67bbe 100644
--- a/xen/arch/arm/Makefile
+++ b/xen/arch/arm/Makefile
@@ -5,7 +5,7 @@ subdir-$(CONFIG_ARM_64) += efi
 subdir-$(CONFIG_ACPI) += acpi
 
 obj-$(CONFIG_HAS_ALTERNATIVE) += alternative.o
-obj-y += bootfdt.o
+obj-y += bootfdt.init.o
 obj-y += cpu.o
 obj-y += cpuerrata.o
 obj-y += cpufeature.o
diff --git a/xen/arch/arm/bootfdt.c b/xen/arch/arm/bootfdt.c
index d130633..cae6f83 100644
--- a/xen/arch/arm/bootfdt.c
+++ b/xen/arch/arm/bootfdt.c
@@ -168,7 +168,7 @@ static void __init process_multiboot_node(const void *fdt, 
int node,
   const char *name,
   u32 address_cells, u32 size_cells)
 {
-static int kind_guess = 0;
+static int __initdata kind_guess = 0;
 const struct fdt_property *prop;
 const __be32 *cell;
 bootmodule_kind kind;
-- 
1.9.1




[Xen-devel] [PATCH] xen/arm: Don't mix GFN and MFN when using iomem_deny_access

2017-01-17 Thread Julien Grall
iomem_deny_access is working on MFN and not GFN. Make it clear by
renaming the local variables.

Signed-off-by: Julien Grall 
---
 xen/arch/arm/domain_build.c |  6 +++---
 xen/arch/arm/gic-v2.c   | 18 +-
 xen/arch/arm/gic-v3.c   | 18 +-
 3 files changed, 21 insertions(+), 21 deletions(-)

diff --git a/xen/arch/arm/domain_build.c b/xen/arch/arm/domain_build.c
index 07b868d..63301e6 100644
--- a/xen/arch/arm/domain_build.c
+++ b/xen/arch/arm/domain_build.c
@@ -1373,7 +1373,7 @@ static int acpi_iomem_deny_access(struct domain *d)
 {
 acpi_status status;
 struct acpi_table_spcr *spcr = NULL;
-unsigned long gfn;
+unsigned long mfn;
 int rc;
 
 /* Firstly permit full MMIO capabilities. */
@@ -1391,9 +1391,9 @@ static int acpi_iomem_deny_access(struct domain *d)
 return -EINVAL;
 }
 
-gfn = spcr->serial_port.address >> PAGE_SHIFT;
+mfn = spcr->serial_port.address >> PAGE_SHIFT;
 /* Deny MMIO access for UART */
-rc = iomem_deny_access(d, gfn, gfn + 1);
+rc = iomem_deny_access(d, mfn, mfn + 1);
 if ( rc )
 return rc;
 
diff --git a/xen/arch/arm/gic-v2.c b/xen/arch/arm/gic-v2.c
index 9245e7d..cd8e504 100644
--- a/xen/arch/arm/gic-v2.c
+++ b/xen/arch/arm/gic-v2.c
@@ -991,26 +991,26 @@ static void __init gicv2_dt_init(void)
 static int gicv2_iomem_deny_access(const struct domain *d)
 {
 int rc;
-unsigned long gfn, nr;
+unsigned long mfn, nr;
 
-gfn = dbase >> PAGE_SHIFT;
-rc = iomem_deny_access(d, gfn, gfn + 1);
+mfn = dbase >> PAGE_SHIFT;
+rc = iomem_deny_access(d, mfn, mfn + 1);
 if ( rc )
 return rc;
 
-gfn = hbase >> PAGE_SHIFT;
-rc = iomem_deny_access(d, gfn, gfn + 1);
+mfn = hbase >> PAGE_SHIFT;
+rc = iomem_deny_access(d, mfn, mfn + 1);
 if ( rc )
 return rc;
 
-gfn = cbase >> PAGE_SHIFT;
+mfn = cbase >> PAGE_SHIFT;
 nr = DIV_ROUND_UP(csize, PAGE_SIZE);
-rc = iomem_deny_access(d, gfn, gfn + nr);
+rc = iomem_deny_access(d, mfn, mfn + nr);
 if ( rc )
 return rc;
 
-gfn = vbase >> PAGE_SHIFT;
-return iomem_deny_access(d, gfn, gfn + nr);
+mfn = vbase >> PAGE_SHIFT;
+return iomem_deny_access(d, mfn, mfn + nr);
 }
 
 #ifdef CONFIG_ACPI
diff --git a/xen/arch/arm/gic-v3.c b/xen/arch/arm/gic-v3.c
index 12775f5..955591b 100644
--- a/xen/arch/arm/gic-v3.c
+++ b/xen/arch/arm/gic-v3.c
@@ -1238,37 +1238,37 @@ static void __init gicv3_dt_init(void)
 static int gicv3_iomem_deny_access(const struct domain *d)
 {
 int rc, i;
-unsigned long gfn, nr;
+unsigned long mfn, nr;
 
-gfn = dbase >> PAGE_SHIFT;
+mfn = dbase >> PAGE_SHIFT;
 nr = DIV_ROUND_UP(SZ_64K, PAGE_SIZE);
-rc = iomem_deny_access(d, gfn, gfn + nr);
+rc = iomem_deny_access(d, mfn, mfn + nr);
 if ( rc )
 return rc;
 
 for ( i = 0; i < gicv3.rdist_count; i++ )
 {
-gfn = gicv3.rdist_regions[i].base >> PAGE_SHIFT;
+mfn = gicv3.rdist_regions[i].base >> PAGE_SHIFT;
 nr = DIV_ROUND_UP(gicv3.rdist_regions[i].size, PAGE_SIZE);
-rc = iomem_deny_access(d, gfn, gfn + nr);
+rc = iomem_deny_access(d, mfn, mfn + nr);
 if ( rc )
 return rc;
 }
 
 if ( cbase != INVALID_PADDR )
 {
-gfn = cbase >> PAGE_SHIFT;
+mfn = cbase >> PAGE_SHIFT;
 nr = DIV_ROUND_UP(csize, PAGE_SIZE);
-rc = iomem_deny_access(d, gfn, gfn + nr);
+rc = iomem_deny_access(d, mfn, mfn + nr);
 if ( rc )
 return rc;
 }
 
 if ( vbase != INVALID_PADDR )
 {
-gfn = vbase >> PAGE_SHIFT;
+mfn = vbase >> PAGE_SHIFT;
 nr = DIV_ROUND_UP(csize, PAGE_SIZE);
-return iomem_deny_access(d, gfn, gfn + nr);
+return iomem_deny_access(d, mfn, mfn + nr);
 }
 
 return 0;
-- 
1.9.1




Re: [Xen-devel] PVH CPU hotplug design document

2017-01-17 Thread Boris Ostrovsky
On 01/17/2017 10:33 AM, Jan Beulich wrote:
 On 17.01.17 at 16:27,  wrote:
>> On 01/17/2017 09:44 AM, Jan Beulich wrote:
>> On 17.01.17 at 15:13,  wrote:
 There's only one kind of PVHv2 guest that doesn't require ACPI, and that 
 guest
 type also doesn't have emulated local APICs. We agreed that this model was
 interesting from things like unikernels DomUs, but that's the only reason 
 why
 we are providing it. Not that full OSes couldn't use it, but it seems
 pointless.
>>> You writing things this way makes me notice another possible design
>>> issue here: Requiring ACPI is a bad thing imo, with even bare hardware
>>> going different directions for at least some use cases (SFI being one
>>> example). Hence I think ACPI should - like on bare hardware - remain
>>> an optional thing. Which in turn require _all_ information obtained from
>>> ACPI (if available) to also be available another way. And this other
>>> way might by hypercalls in our case.
>>
>> At the risk of derailing this thread: why do we need vCPU hotplug for
>> dom0 in the first place? What do we gain over "echo {1|0} >
>> /sys/devices/system/cpu/cpuX/online" ?
>>
>> I can see why this may be needed for domUs where Xen can enforce number
>> of vCPUs that are allowed to run (which we don't enforce now anyway) but
>> why for dom0?
> Good that you now ask this too - that's the PV hotplug mechanism,
> and I've been saying all the time that this should be just fine for PVH
> (Dom0 and DomU).

I think domU hotplug has some value in that we can change the number of VCPUs
that the guest sees and ACPI-based hotplug allows us to do that in a
"standard" manner.

For dom0 this doesn't seem to be necessary as it's a special domain
available only to platform administrator.

Part of confusion I think is because PV hotplug is not hotplug, really,
as far as Linux kernel is concerned.
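
As an illustration of the mechanism being referred to (a minimal
standalone sketch, with the CPU number hard-coded and the sysfs path as
quoted earlier; equivalent to the echo command above):

/* Offline or online an already-registered CPU from inside the guest via
 * sysfs -- the action that "PV hotplug" boils down to.  Illustration only. */
#include <stdio.h>
#include <stdlib.h>

int main(int argc, char **argv)
{
    const char *path = "/sys/devices/system/cpu/cpu1/online"; /* example CPU */
    const char *val  = (argc > 1) ? argv[1] : "1";            /* "0" or "1" */
    FILE *f = fopen(path, "w");

    if ( !f )
    {
        perror("fopen");
        return EXIT_FAILURE;
    }
    if ( fputs(val, f) == EOF || fclose(f) == EOF )
    {
        perror("write");
        return EXIT_FAILURE;
    }
    return EXIT_SUCCESS;
}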


-boris


___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH v3 1/8] public / x86: Introduce __HYPERCALL_dm_op...

2017-01-17 Thread Jan Beulich
>>> On 17.01.17 at 16:06,  wrote:
> On 17/01/17 12:29, George Dunlap wrote:
>> On Tue, Jan 17, 2017 at 11:22 AM, Andrew Cooper
>>  wrote:
>>> On 16/01/17 16:16, Jan Beulich wrote:
>>> On 16.01.17 at 17:05,  wrote:
> On 13/01/17 12:47, Jan Beulich wrote:
>> The kernel already has to parse this structure anyway, and will know 
>> the
>> bitness of its userspace process.  We could easily (at this point)
>> require the kernel to turn it into the kernels bitness for 
>> forwarding on
>> to Xen, which covers the 32bit userspace under a 64bit kernel 
>> problem,
>> in a way which won't break the hypercall ABI when 128bit comes along.
 But that won't cover a 32-bit kernel.
>>> Yes it will.
>> How that, without a compat translation layer in Xen?
> Why shouldn't there be a compat layer?
 Because the compat layer we have is kind of ugly to maintain. Hence
 I would expect additions to it to not make the situation any better.
>>> This is because our compat handling is particularly ugly (partially
>>> because our ABI has varying-size fields at random places in the middle
>>> of structures).  Not because a compat layer is the wrong thing to do.
>>>
 And I'm not sure we really need to bother considering hypothetical
 128-bit architectures at this point in time.
>>> Because considering this case will avoid us painting ourselves into a
>>> corner.
>> Why would we consider this case here, when all other parts of the
>> public interface don't do so?
> This is asking why we should continue to shoot ourselves in the foot,
> ABI wise, rather than trying to do something better.
>
> And the answer is that I'd prefer that we started fixing the problem,
> rather than making it worse.
 Okay, so 128 bit handles then. But wait, we should be prepared for
256-bit environments too, so 256-bit handles then. But wait, ...
>>> Precisely. A fixed bit width doesn't work, and cannot work going
>>> forwards.  Using a fixed bitsize will force us to burn a hypercall
>>> number every time we want to implement this ABI at a larger bit size.
>> Are we running so low on hypercall numbers that "burning" them when
>> the dominant bit width doubles in size is going to be an issue?
> 
> There is a fixed ABI of 63 hypercalls.
> 
> This can compatibly be extended up to 255 (the amount of extra room in the
> hypercall page), but no further, as c/s 2a33551d in 2008 added:
> 
> /*
>  * Leaf 3 (0x4002)
>  * EAX: Number of hypercall transfer pages. This register is always
> guaranteed
>  *  to specify one hypercall page.
> 
> to our public ABI.

As said in the other reply - there's nothing keeping us from making
the hypervisor fill stub N such that it passes N+0x1000 as hypercall
number.
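
To make that concrete (a purely hypothetical sketch; the macro names are
made up, and only the 0x1000 offset comes from the suggestion above):

/* Hypothetical only -- no such range exists in the ABI.  A stub in a
 * second hypercall range would load an offset number before trapping
 * into the hypervisor: */
#define HYPERCALL_RANGE1_OFFSET  0x1000                 /* assumed offset */
#define HYPERCALL_RANGE1_NR(n)   (HYPERCALL_RANGE1_OFFSET + (n))
/* i.e. stub N in the new range would move HYPERCALL_RANGE1_NR(N), rather
 * than plain N, into the hypercall-number register. */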

Jan


___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH v2 2/6] x86/cpuid: Introduce recalculate_xstate()

2017-01-17 Thread Andrew Cooper
On 17/01/17 15:28, Jan Beulich wrote:
 On 17.01.17 at 16:15,  wrote:
>> On 17/01/17 12:52, Jan Beulich wrote:
>> On 17.01.17 at 12:27,  wrote:
 @@ -154,6 +152,13 @@ struct cpuid_policy
  };
  uint32_t /* b */:32, xss_low, xss_high;
  };
 +
 +/* Per-component common state.  Valid for i >= 2. */
 +struct {
 +uint32_t size, offset;
 +bool xss:1, align:1;
 +uint32_t _res_d;
>>> I see you've decided against an inner union. Should be fine of
>>> course, at least until we have a need to access the full ECX value
>>> by name.
>> Oh - I misinterpreted what you meant then.
>>
>> Did you mean
>>
>> struct {
>> uint32_t size, offset;
>> union {
>> struct {
>> bool xss:1, align:1;
>> };
>> uint32_t c;
>> };
>> uint32_t /* d */:32;
>> };
>>
>> Then?
> Yes. But in the end it's up to you which variant to use.

We only write via the xss/align names, and read through raw[].

For now, let's go with mine, which is the simpler structure.  We can always
change it if we need to.
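
For reference, both layouts compile as plain C11 anonymous members; a
standalone sketch of the two variants under discussion (names shortened,
not the real cpuid_policy definition):

/* Standalone sketch, C11 anonymous members; illustration only. */
#include <stdbool.h>
#include <stdint.h>

/* Variant used in the series: write via the bit names, read elsewhere
 * through the raw leaf data. */
struct comp_simple {
    uint32_t size, offset;
    bool xss:1, align:1;
    uint32_t _res_d;
};

/* Variant with an inner union: the full ECX word is also addressable by
 * name ("c") if that is ever needed. */
struct comp_union {
    uint32_t size, offset;
    union {
        struct {
            bool xss:1, align:1;
        };
        uint32_t c;
    };
    uint32_t _res_d;
};

int main(void)
{
    struct comp_union u = { .size = 256, .offset = 576 };

    u.align = true;
    return (u.c != 0) ? 0 : 1;   /* raw ECX word readable through the union */
}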

~Andrew

___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel


Re: [Xen-devel] PVH CPU hotplug design document

2017-01-17 Thread Jan Beulich
>>> On 17.01.17 at 16:27,  wrote:
> On 01/17/2017 09:44 AM, Jan Beulich wrote:
> On 17.01.17 at 15:13,  wrote:
>>> There's only one kind of PVHv2 guest that doesn't require ACPI, and that 
>>> guest
>>> type also doesn't have emulated local APICs. We agreed that this model was
>>> interesting for things like unikernel DomUs, but that's the only reason
>>> why
>>> we are providing it. Not that full OSes couldn't use it, but it seems
>>> pointless.
>> You writing things this way makes me notice another possible design
>> issue here: Requiring ACPI is a bad thing imo, with even bare hardware
>> going different directions for at least some use cases (SFI being one
>> example). Hence I think ACPI should - like on bare hardware - remain
>> an optional thing. Which in turn requires _all_ information obtained from
>> ACPI (if available) to also be available another way. And this other
>> way might be hypercalls in our case.
> 
> 
> At the risk of derailing this thread: why do we need vCPU hotplug for
> dom0 in the first place? What do we gain over "echo {1|0} >
> /sys/devices/system/cpu/cpuX/online" ?
> 
> I can see why this may be needed for domUs where Xen can enforce number
> of vCPUs that are allowed to run (which we don't enforce now anyway) but
> why for dom0?

Good that you now ask this too - that's the PV hotplug mechanism,
and I've been saying all the time that this should be just fine for PVH
(Dom0 and DomU).

Jan


___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH v3 1/8] public / x86: Introduce __HYPERCALL_dm_op...

2017-01-17 Thread Jan Beulich
>>> On 17.01.17 at 16:13,  wrote:
> On 17/01/17 12:42, Jan Beulich wrote:
>>
 And I'm not sure we really need to bother considering hypothetical
 128-bit architectures at this point in time.
>>> Because considering this case will avoid us painting ourselves into a
>>> corner.
>> Why would we consider this case here, when all other parts of the
>> public interface don't do so?
> This is asking why we should continue to shoot ourselves in the foot,
> ABI wise, rather than trying to do something better.
>
> And the answer is that I'd prefer that we started fixing the problem,
> rather than making it worse.
 Okay, so 128 bit handles then. But wait, we should be prepared for
256-bit environments too, so 256-bit handles then. But wait, ...
>>> Precisely. A fixed bit width doesn't work, and cannot work going
>>> forwards.  Using a fixed bitsize will force us to burn a hypercall
>>> number every time we want to implement this ABI at a larger bit size.
>> With wider machine word width the number space of hypercalls
>> widens too, so I would not be worried at all using new hypercall
>> numbers, or even wholesale new hypercall number ranges.
> 
> I will leave this to the other fork of the thread, but our hypercall
> space does extend.  It is currently fixed at an absolute maximum of 4k/32.

If we were to introduce a new range, we'd likely still fit the stubs
all in one page, just that the numbers the stubs put into whatever
the equivalent of %rax would be would then have some higher
bit set.

 Or maybe I'm simply not getting what you mean to put in place here.
>>> The interface should be in terms of void * (and where appropriate,
>>> size_t), from the guest's point of view, and is what a plain
>>> GUEST_HANDLE() gives you.
>> As said - that'll further break 32-bit tool stacks on 64-bit kernels.
> 
> dmop is not a tool api.  It can only be issued by a kernel, after the
> kernel has audited the internals.

To me qemu, for which the interface is mainly being made (hence
the naming of it), is very much a tool. And fundamentally I don't
see anything wrong with using a 32-bit qemu binary.

Jan


___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel


Re: [Xen-devel] STAO spec in xen.git

2017-01-17 Thread Olaf Hering
On Fri, Jan 13, Julien Grall wrote:

> Regarding the format: will ODT allow git to do a proper diff?

There is flat ODT: use "Save as ..." and pick the flat format from the
pulldown menu.

Olaf


___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH v2 2/6] x86/cpuid: Introduce recalculate_xstate()

2017-01-17 Thread Jan Beulich
>>> On 17.01.17 at 16:15,  wrote:
> On 17/01/17 12:52, Jan Beulich wrote:
> On 17.01.17 at 12:27,  wrote:
>>> @@ -154,6 +152,13 @@ struct cpuid_policy
>>>  };
>>>  uint32_t /* b */:32, xss_low, xss_high;
>>>  };
>>> +
>>> +/* Per-component common state.  Valid for i >= 2. */
>>> +struct {
>>> +uint32_t size, offset;
>>> +bool xss:1, align:1;
>>> +uint32_t _res_d;
>> I see you've decided against an inner union. Should be fine of
>> course, at least until we have a need to access the full ECX value
>> by name.
> 
> Oh - I misinterpreted what you meant then.
> 
> Did you mean
> 
> struct {
> uint32_t size, offset;
> union {
> struct {
> bool xss:1, align:1;
> };
> uint32_t c;
> };
> uint32_t /* d */:32;
> };
> 
> Then?

Yes. But in the end it's up to you which variant to use.

Jan


___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH v2 2/6] x86/cpuid: Introduce recalculate_xstate()

2017-01-17 Thread Andrew Cooper
On 17/01/17 12:52, Jan Beulich wrote:
 On 17.01.17 at 12:27,  wrote:
>> --- a/xen/arch/x86/cpuid.c
>> +++ b/xen/arch/x86/cpuid.c
>> @@ -80,6 +80,103 @@ static void sanitise_featureset(uint32_t *fs)
>>(fs[FEATURESET_e1d] & ~CPUID_COMMON_1D_FEATURES));
>>  }
>>  
>> +static void recalculate_xstate(struct cpuid_policy *p)
>> +{
>> +uint64_t xstates = XSTATE_FP_SSE;
>> +uint32_t xstate_size = XSTATE_AREA_MIN_SIZE;
>> +unsigned int i, Da1 = p->xstate.Da1;
>> +
>> +/*
>> + * The Da1 leaf is the only piece if information preserved in the common
>> + * case.  Everything else is derived from other feature state.
>> + */
> "piece of" I think.

Ah yes - will fix.

>
>> +memset(&p->xstate, 0, sizeof(p->xstate));
>> +
>> +if ( !p->basic.xsave )
>> +return;
>> +
>> +if ( p->basic.avx )
>> +{
>> +xstates |= XSTATE_YMM;
>> +xstate_size = max(xstate_size,
>> +  xstate_offsets[_XSTATE_YMM] +
>> +  xstate_sizes[_XSTATE_YMM]);
>> +}
>> +
>> +if ( p->feat.mpx )
>> +{
>> +xstates |= XSTATE_BNDREGS | XSTATE_BNDCSR;
>> +xstate_size = max(xstate_size,
>> +  xstate_offsets[_XSTATE_BNDCSR] +
>> +  xstate_sizes[_XSTATE_BNDCSR]);
>> +}
>> +
>> +if ( p->feat.avx512f )
>> +{
>> +xstates |= XSTATE_OPMASK | XSTATE_ZMM | XSTATE_HI_ZMM;
>> +xstate_size = max(xstate_size,
>> +  xstate_offsets[_XSTATE_HI_ZMM] +
>> +  xstate_sizes[_XSTATE_HI_ZMM]);
>> +}
>> +
>> +if ( p->feat.pku )
>> +{
>> +xstates |= XSTATE_PKRU;
>> +xstate_size = max(xstate_size,
>> +  xstate_offsets[_XSTATE_PKRU] +
>> +  xstate_sizes[_XSTATE_PKRU]);
>> +}
>> +
>> +if ( p->extd.lwp )
>> +{
>> +xstates |= XSTATE_LWP;
>> +xstate_size = max(xstate_size,
>> +  xstate_offsets[_XSTATE_LWP] +
>> +  xstate_sizes[_XSTATE_LWP]);
>> +}
>> +
>> +/* Sanity check we aren't advertising unknown states. */
>> +ASSERT((xstates & ~XCNTXT_MASK) == 0);
>> +
>> +p->xstate.max_size  =  xstate_size;
>> +p->xstate.xcr0_low  =  xstates & ~XSTATE_XSAVES_ONLY;
>> +p->xstate.xcr0_high = (xstates & ~XSTATE_XSAVES_ONLY) >> 32;
>> +
>> +p->xstate.Da1 = Da1;
>> +if ( p->xstate.xsaves )
>> +{
>> +p->xstate.xss_low   =  xstates & XSTATE_XSAVES_ONLY;
>> +p->xstate.xss_high  = (xstates & XSTATE_XSAVES_ONLY) >> 32;
>> +}
>> +else
>> +xstates &= ~XSTATE_XSAVES_ONLY;
>> +
>> +for ( i = 2; i < min(63ul, ARRAY_SIZE(p->xstate.comp)); ++i )
>> +{
>> +uint64_t curr_xstate = 1ul << i;
>> +
>> +if ( !(xstates & curr_xstate) )
>> +continue;
>> +
>> +p->xstate.comp[i].size   = xstate_sizes[i];
>> +p->xstate.comp[i].offset = xstate_offsets[i];
>> +p->xstate.comp[i].xss= curr_xstate & XSTATE_XSAVES_ONLY;
>> +p->xstate.comp[i].align  = curr_xstate & xstate_align;
>> +
>> +/*
>> + * Sanity checks:
>> + * - All valid components should have non-zero size.
>> + * - All xcr0 components should have non-zero offset.
>> + * - All xss components should report 0 offset.
>> + */
>> +ASSERT(xstate_sizes[i]);
>> +if ( curr_xstate & XSTATE_XSAVES_ONLY )
>> +ASSERT(xstate_offsets[i] == 0);
>> +else
>> +ASSERT(xstate_offsets[i]);
>> +}
> Hmm, now that I look at this again - what business do these
> assertions have here? They're checking host information, which
> isn't going to change post boot. Such checking, if indeed wanted,
> should be done once during system boot.

I put this in to try and ensure that we don't put junk into the policy,
but thinking about it more, by the time we have junk in these arrays, we
have bigger xstate problems.  I will drop these from here and focus on
beefing up the xstate driver itself.
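
To keep the invariants concrete once they move out of this function, here
is a standalone illustration (mock tables and values, not Xen code) of
the kind of boot-time consistency check meant above:

/* Standalone illustration with mock data -- not Xen code.  It checks the
 * same invariants the ASSERTs above express: every enabled component has
 * a non-zero size, XSS-only components report offset 0, and XCR0
 * components report a non-zero offset. */
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

#define NR_COMP 8                      /* small mock table for illustration */
static const uint32_t mock_sizes[NR_COMP]   = { 160, 256, 256, 64, 64, 512, 1024, 8 };
static const uint32_t mock_offsets[NR_COMP] = { 0, 160, 576, 832, 896, 960, 1472, 0 };
static const uint64_t mock_xsaves_only      = 1ull << 7; /* pretend comp 7 is XSS-only */

int main(void)
{
    unsigned int i;

    for ( i = 2; i < NR_COMP; ++i )    /* components 0/1 have a fixed layout */
    {
        assert(mock_sizes[i] != 0);
        if ( mock_xsaves_only & (1ull << i) )
            assert(mock_offsets[i] == 0);
        else
            assert(mock_offsets[i] != 0);
    }
    printf("mock xstate layout consistent\n");
    return 0;
}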

>
> I also think such checks should be consistent in style - either both
> explicitly comparing with zero, or using ! in the if() branch to match
> the else one.
>
>> @@ -154,6 +152,13 @@ struct cpuid_policy
>>  };
>>  uint32_t /* b */:32, xss_low, xss_high;
>>  };
>> +
>> +/* Per-component common state.  Valid for i >= 2. */
>> +struct {
>> +uint32_t size, offset;
>> +bool xss:1, align:1;
>> +uint32_t _res_d;
> I see you've decided against an inner union. Should be fine of
> course, at least until we have a need to access the full ECX value
> by name.

Oh - I misinterpreted what you meant then.

Did you mean

struct {
uint32_t size, offset;
union {
struct {
bool xss:1, align:1;
};
uint32_t c;

Re: [Xen-devel] PVH CPU hotplug design document

2017-01-17 Thread Boris Ostrovsky
On 01/17/2017 09:44 AM, Jan Beulich wrote:
 On 17.01.17 at 15:13,  wrote:
>> On Tue, Jan 17, 2017 at 05:33:41AM -0700, Jan Beulich wrote:
>> On 17.01.17 at 12:43,  wrote:
 If the PVH domain has access to an APIC and wants to use it it must parse 
 the
 info from the MADT, or else it cannot get the APIC address or the APIC ID 
 (you
 could guess those, since their position is quite standard, but what's the
 point?)
>>> There's always the option of obtaining needed information via hypercall.
>> I think we should avoid that and instead use ACPI only, or else we are
>> duplicating the information provided in ACPI using another interface, which 
>> is
>> pointless IMHO.
>>
>> There's only one kind of PVHv2 guest that doesn't require ACPI, and that 
>> guest
>> type also doesn't have emulated local APICs. We agreed that this model was
>> interesting for things like unikernel DomUs, but that's the only reason why
>> we are providing it. Not that full OSes couldn't use it, but it seems
>> pointless.
> You writing things this way makes me notice another possible design
> issue here: Requiring ACPI is a bad thing imo, with even bare hardware
> going different directions for at least some use cases (SFI being one
> example). Hence I think ACPI should - like on bare hardware - remain
> an optional thing. Which in turn requires _all_ information obtained from
> ACPI (if available) to also be available another way. And this other
> way might be hypercalls in our case.


At the risk of derailing this thread: why do we need vCPU hotplug for
dom0 in the first place? What do we gain over "echo {1|0} >
/sys/devices/system/cpu/cpuX/online" ?

I can see why this may be needed for domUs where Xen can enforce number
of vCPUs that are allowed to run (which we don't enforce now anyway) but
why for dom0?

-boris


___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel


Re: [Xen-devel] STAO spec in xen.git

2017-01-17 Thread Andrew Cooper
On 17/01/17 04:55, Juergen Gross wrote:
> On 16/01/17 19:40, Stefano Stabellini wrote:
>> On Mon, 16 Jan 2017, Ian Jackson wrote:
>>> Stefano Stabellini writes ("Re: STAO spec in xen.git"):
 In that case, I think we should still commit it as ODT, but convert it
 automatically to PDF when we publish it (we do something similar with
 the markdown docs, converting them from markdown to html).
>>> Exactly.
>>>
>>> The fact that git diff won't show updates well is not ideal, but IMO
>>> it is imperative to get the document 1. into our own version
>>> control, 2. in a format we can edit, and 3. using FLOSS tools.
>>>
>>> Please can we commit the source of the document to xen.git as a first
>>> step.  At that point the xen.git copy will become the master copy.
>>>
>>> We can then think about PDF export.  If the conversion has to be done
>>> ad hoc manually at first then that is not a blocker for committing the
>>> file.
>> Right. I am going to commit this version for now.
>>
>> If somebody was willing to come up with an alternative version of this
>> document in a more git-friendly format, I would be more than happy for
>> it to replace the ODT version.
> LaTeX? Should be less than an hour's work. I can do it if a .tex file is
> okay.

We have had LaTeX documents in the past in tree.  It certainly would be
more text-friendly.

~Andrew

___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH v3 1/8] public / x86: Introduce __HYPERCALL_dm_op...

2017-01-17 Thread Andrew Cooper
On 17/01/17 12:42, Jan Beulich wrote:
>
>>> And I'm not sure we really need to bother considering hypothetical
>>> 128-bit architectures at this point in time.
>> Because considering this case will avoid us painting ourselves into a
>> corner.
> Why would we consider this case here, when all other parts of the
> public interface don't do so?
 This is asking why we should continue to shoot ourselves in the foot,
 ABI wise, rather than trying to do something better.

 And the answer is that I'd prefer that we started fixing the problem,
 rather than making it worse.
>>> Okay, so 128 bit handles then. But wait, we should be prepared for
>>> 256-bit environments too, so 256-bit handles then. But wait, ...
>> Precisely. A fixed bit width doesn't work, and cannot work going
>> forwards.  Using a fixed bitsize will force us to burn a hypercall
>> number every time we want to implement this ABI at a larger bit size.
> With wider machine word width the number space of hypercalls
> widens too, so I would not be worried at all using new hypercall
> numbers, or even wholesale new hypercall number ranges.

I will leave this to the other fork of the thread, but our hypercall
space does extend.  It is currently fixed at an absolute maximum of 4k/32.

>
>>> Or maybe I'm simply not getting what you mean to put in place here.
>> The interface should be in terms of void * (and where appropriate,
>> size_t), from the guest's point of view, and is what a plain
>> GUEST_HANDLE() gives you.
> As said - that'll further break 32-bit tool stacks on 64-bit kernels.

dmop is not a tool api.  It can only be issued by a kernel, after the
kernel has audited the internals.
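
To illustrate the point (hypothetical structures only, not the proposed
dm_op ABI): a kernel that receives a buffer descriptor from a 32-bit
process widens it to its own word size before forwarding, so the
hypervisor only ever sees native-width handles.

/* Hypothetical sketch -- the types below are made up for illustration. */
#include <stddef.h>
#include <stdint.h>

struct dm_buf32 {            /* what a 32-bit userspace process might pass */
    uint32_t addr;
    uint32_t size;
};

struct dm_buf_native {       /* what the kernel would forward to Xen */
    void  *addr;
    size_t size;
};

static void widen_buf(const struct dm_buf32 *in, struct dm_buf_native *out)
{
    /* The kernel knows its caller's bitness, so this widening copy is done
     * once, outside the hypervisor. */
    out->addr = (void *)(uintptr_t)in->addr;
    out->size = in->size;
}

int main(void)
{
    struct dm_buf32 u = { .addr = 0x1000u, .size = 64 };    /* example values */
    struct dm_buf_native n;

    widen_buf(&u, &n);
    return (n.size == 64) ? 0 : 1;
}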

~Andrew

___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH v3 1/8] public / x86: Introduce __HYPERCALL_dm_op...

2017-01-17 Thread Andrew Cooper
On 17/01/17 12:29, George Dunlap wrote:
> On Tue, Jan 17, 2017 at 11:22 AM, Andrew Cooper
>  wrote:
>> On 16/01/17 16:16, Jan Beulich wrote:
>> On 16.01.17 at 17:05,  wrote:
 On 13/01/17 12:47, Jan Beulich wrote:
> The kernel already has to parse this structure anyway, and will know 
> the
> bitness of its userspace process.  We could easily (at this point)
> require the kernel to turn it into the kernels bitness for forwarding 
> on
> to Xen, which covers the 32bit userspace under a 64bit kernel problem,
> in a way which won't break the hypercall ABI when 128bit comes along.
>>> But that won't cover a 32-bit kernel.
>> Yes it will.
> How that, without a compat translation layer in Xen?
 Why shouldn't there be a compat layer?
>>> Because the compat layer we have is kind of ugly to maintain. Hence
>>> I would expect additions to it to not make the situation any better.
>> This is because our compat handling is particularly ugly (partially
>> because our ABI has varying-size fields at random places in the middle
>> of structures).  Not because a compat layer is the wrong thing to do.
>>
>>> And I'm not sure we really need to bother considering hypothetical
>>> 128-bit architectures at this point in time.
>> Because considering this case will avoid us painting ourselves into a
>> corner.
> Why would we consider this case here, when all other parts of the
> public interface don't do so?
 This is asking why we should continue to shoot ourselves in the foot,
 ABI wise, rather than trying to do something better.

 And the answer is that I'd prefer that we started fixing the problem,
 rather than making it worse.
>>> Okay, so 128 bit handles then. But wait, we should be prepared for
>>> 256-bit environments too, so 256-bit handles then. But wait, ...
>> Precisely. A fixed bit width doesn't work, and cannot work going
>> forwards.  Using a fixed bitsize will force us to burn a hypercall
>> number every time we want to implement this ABI at a larger bit size.
> Are we running so low on hypercall numbers that "burning" them when
> the dominant bit width doubles in size is going to be an issue?

There is a fixed ABI of 63 hypercalls.

This can compatibly be extended up to 255 (the amount of extra room in the
hypercall page), but no further, as c/s 2a33551d in 2008 added:

/*
 * Leaf 3 (0x4002)
 * EAX: Number of hypercall transfer pages. This register is always
guaranteed
 *  to specify one hypercall page.

to our public ABI.

~Andrew

___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

