date:20181129

Re: [Xen-devel] Xen optimization

2018-11-29 Thread Andrii Anisov


Hello Stefano,

On 27.11.18 23:27, Stefano Stabellini wrote:

Hi Andrii,

See the following:

https://marc.info/?l=xen-devel&m=148668817704668
Thank you for the point. I remember this email, but missed it also gives 
details to setup the experiment. It looks that bare-metal app is not SoC 
specific, so going to take it in use.



The numbers have improved now thanks to vwfi=native and other
optimizations but the mechanism to setup the experiment are the same.I know 
about `vwfi=native` but it does not fit our requirements :(


--
Sincerely,
Andrii Anisov.

___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] Linux 4.19.5 fails to boot as Xen dom0

2018-11-29 Thread Juergen Gross

On 29/11/2018 02:22, Hans van Kranenburg wrote:
> Hi,
> 
> As also seen at:
> https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=914951
> 
> Attached there are two serial console output logs. One is starting with
> Xen 4.11 (from debian unstable) as dom0, and the other one without Xen.
> 
> [2.085543] BUG: unable to handle kernel paging request at
> 888d9fffc000
> [2.085610] PGD 200c067 P4D 200c067 PUD 0
> [2.085674] Oops:  [#1] SMP NOPTI
> [2.085736] CPU: 1 PID: 1 Comm: swapper/0 Not tainted
> 4.19.0-trunk-amd64 #1 Debian 4.19.5-1~exp1+pvh1
> [2.085823] Hardware name: HP ProLiant DL360 G7, BIOS P68 05/21/2018
> [2.085895] RIP: e030:ptdump_walk_pgd_level_core+0x1fd/0x490
> [...]

I can reproduce this.

Now searching for the patch causing that ...


Juergen

___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] [PATCH 14/18] xen: add implementations of xen-qdisk connect and disconnect functions...

2018-11-29 Thread Kevin Wolf

Am 28.11.2018 um 17:40 hat Paul Durrant geschrieben:
> > -Original Message-
> > From: Kevin Wolf [mailto:kw...@redhat.com]
> > Sent: 28 November 2018 16:35
> > To: Paul Durrant 
> > Cc: qemu-bl...@nongnu.org; qemu-de...@nongnu.org; xen-
> > de...@lists.xenproject.org; Stefano Stabellini ;
> > Anthony Perard ; Max Reitz 
> > Subject: Re: [PATCH 14/18] xen: add implementations of xen-qdisk connect
> > and disconnect functions...
> > 
> > Am 21.11.2018 um 16:12 hat Paul Durrant geschrieben:
> > > ...and wire in the dataplane.
> > >
> > > This patch adds the remaining code to make the xen-qdisk XenDevice
> > > functional. The parameters that a block frontend expects to find are
> > > populated in the backend xenstore area, and the 'ring-ref' and
> > > 'event-channel' values specified in the frontend xenstore area are
> > > mapped/bound and used to set up the dataplane.
> > >
> > > Signed-off-by: Paul Durrant 
> > > ---
> > > Cc: Stefano Stabellini 
> > > Cc: Anthony Perard 
> > > Cc: Kevin Wolf 
> > > Cc: Max Reitz 
> > > ---
> > >  hw/block/xen-qdisk.c   | 140
> > +
> > >  hw/xen/xen-bus.c   |  12 ++--
> > >  include/hw/xen/xen-bus.h   |   8 +++
> > >  include/hw/xen/xen-qdisk.h |  12 
> > >  4 files changed, 166 insertions(+), 6 deletions(-)
> > >
> > > diff --git a/hw/block/xen-qdisk.c b/hw/block/xen-qdisk.c
> > > index 35f7b70480..8c88393832 100644
> > > --- a/hw/block/xen-qdisk.c
> > > +++ b/hw/block/xen-qdisk.c
> > > @@ -9,6 +9,10 @@
> > >  #include "qapi/visitor.h"
> > >  #include "hw/hw.h"
> > >  #include "hw/xen/xen-qdisk.h"
> > > +#include "sysemu/blockdev.h"
> > > +#include "sysemu/block-backend.h"
> > > +#include "sysemu/iothread.h"
> > > +#include "dataplane/xen-qdisk.h"
> > >  #include "trace.h"
> > >
> > >  static char *xen_qdisk_get_name(XenDevice *xendev, Error **errp)
> > > @@ -23,6 +27,11 @@ static void xen_qdisk_realize(XenDevice *xendev,
> > Error **errp)
> > >  {
> > >  XenQdiskDevice *qdiskdev = XEN_QDISK_DEVICE(xendev);
> > >  XenQdiskVdev *vdev = &qdiskdev->vdev;
> > > +BlockConf *conf = &qdiskdev->conf;
> > > +DriveInfo *dinfo;
> > > +bool is_cdrom;
> > > +unsigned int info;
> > > +int64_t size;
> > >
> > >  if (!vdev->valid) {
> > >  error_setg(errp, "vdev property not set");
> > > @@ -30,13 +39,134 @@ static void xen_qdisk_realize(XenDevice *xendev,
> > Error **errp)
> > >  }
> > >
> > >  trace_xen_qdisk_realize(vdev->disk, vdev->partition);
> > > +
> > > +if (!conf->blk) {
> > > +error_setg(errp, "drive property not set");
> > > +return;
> > > +}
> > > +
> > > +if (!blk_is_inserted(conf->blk)) {
> > > +error_setg(errp, "device needs media, but drive is empty");
> > > +return;
> > > +}
> > 
> > Hm, the code below suggests that you support CD-ROMs. Don't you want to
> > support media change as well then? Which would mean that you need to
> > support empty drives.
> 
> Yes, that's a good point. I should get rid of that check.

Or rather apply it only to hard disks. And for empty CDs, you'll
probably need to create an empty BlockBackend (the !conf->blk case).
Just check the IDE and/or SCSI code for comparison.

> > 
> > > +if (!blkconf_apply_backend_options(conf, blk_is_read_only(conf-
> > >blk),
> > > +   false, errp)) {
> > > +return;
> > > +}
> > > +
> > > +if (!blkconf_geometry(conf, NULL, 65535, 255, 255, errp)) {
> > > +return;
> > > +}
> > > +
> > > +dinfo = blk_legacy_dinfo(conf->blk);
> > > +is_cdrom = (dinfo && dinfo->media_cd);
> > 
> > It's called legacy for a reason. Don't use this in new devices.
> > 
> > The proper way is to have two different devices for hard disks and CDs
> > (like scsi-hd and scsi-cd).
> 
> ...or presumably I could have a property? The legacy init code could
> then set it based on the drive info.

Technically yes, but why would that be a good way to model things? I
mean, it's true that xen-qdisk is not real hardware, but I've never seen
any hardware that has a switch to decide whether it should behave as a
CD drive or a hard disk.

Both have very different characteristics (read-only with removable
media, or a single read-write disk), and the existing implementations
use two separate devices. So even if you're not convinced that users
will consider them different concepts (I am; and if they weren't
different concepts, you wouldn't need an is_cdrom variable), consistency
is still a good thing.

Kevin

___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] [Qemu-block] [PATCH 02/18] xen: introduce new 'XenBus' and 'XenDevice' object hierarchy

2018-11-29 Thread Kevin Wolf

Am 28.11.2018 um 17:46 hat Paul Durrant geschrieben:
> > -Original Message-
> > From: Paul Durrant
> > Sent: 28 November 2018 16:46
> > To: 'Kevin Wolf' 
> > Cc: 'Stefano Stabellini' ; qemu-bl...@nongnu.org;
> > qemu-de...@nongnu.org; xen-devel@lists.xenproject.org; Eduardo Habkost
> > ; Michael S. Tsirkin ; Marcel
> > Apfelbaum ; Anthony Perard
> > ; Paolo Bonzini ; Richard
> > Henderson 
> > Subject: RE: [Qemu-block] [PATCH 02/18] xen: introduce new 'XenBus' and
> > 'XenDevice' object hierarchy
> > 
> > > -Original Message-
> > > From: Kevin Wolf [mailto:kw...@redhat.com]
> > > Sent: 28 November 2018 16:39
> > > To: Paul Durrant 
> > > Cc: 'Stefano Stabellini' ; qemu-
> > bl...@nongnu.org;
> > > qemu-de...@nongnu.org; xen-devel@lists.xenproject.org; Eduardo Habkost
> > > ; Michael S. Tsirkin ; Marcel
> > > Apfelbaum ; Anthony Perard
> > > ; Paolo Bonzini ;
> > Richard
> > > Henderson 
> > > Subject: Re: [Qemu-block] [PATCH 02/18] xen: introduce new 'XenBus' and
> > > 'XenDevice' object hierarchy
> > >
> > > Am 28.11.2018 um 17:29 hat Paul Durrant geschrieben:
> > > > > -Original Message-
> > > > > From: Stefano Stabellini [mailto:sstabell...@kernel.org]
> > > > > Sent: 28 November 2018 16:28
> > > > > To: Paul Durrant 
> > > > > Cc: 'Kevin Wolf' ; qemu-bl...@nongnu.org; qemu-
> > > > > de...@nongnu.org; xen-devel@lists.xenproject.org; Stefano Stabellini
> > > > > ; Eduardo Habkost ;
> > > Michael
> > > > > S. Tsirkin ; Marcel Apfelbaum
> > > > > ; Anthony Perard
> > > ;
> > > > > Paolo Bonzini ; Richard Henderson
> > > 
> > > > > Subject: RE: [Qemu-block] [PATCH 02/18] xen: introduce new 'XenBus'
> > > and
> > > > > 'XenDevice' object hierarchy
> > > > >
> > > > > On Wed, 28 Nov 2018, Paul Durrant wrote:
> > > > > > > -Original Message-
> > > > > > > From: Kevin Wolf [mailto:kw...@redhat.com]
> > > > > > > Sent: 28 November 2018 16:19
> > > > > > > To: Paul Durrant 
> > > > > > > Cc: qemu-bl...@nongnu.org; qemu-de...@nongnu.org; xen-
> > > > > > > de...@lists.xenproject.org; Stefano Stabellini
> > > > > ;
> > > > > > > Eduardo Habkost ; Michael S. Tsirkin
> > > > > > > ; Marcel Apfelbaum ;
> > > > > Anthony
> > > > > > > Perard ; Paolo Bonzini
> > > > > ;
> > > > > > > Richard Henderson 
> > > > > > > Subject: Re: [Qemu-block] [PATCH 02/18] xen: introduce new
> > > 'XenBus'
> > > > > and
> > > > > > > 'XenDevice' object hierarchy
> > > > > > >
> > > > > > > Am 21.11.2018 um 16:11 hat Paul Durrant geschrieben:
> > > > > > > > This patch adds the basic boilerplate for a 'XenBus' object
> > that
> > > > > will
> > > > > > > act
> > > > > > > > as a parent to 'XenDevice' PV backends.
> > > > > > > > A new 'XenBridge' object is also added to connect XenBus to
> > the
> > > > > system
> > > > > > > bus.
> > > > > > > >
> > > > > > > > The XenBus object is instantiated by a new xen_bus_init()
> > > function
> > > > > > > called
> > > > > > > > from the same sites as the legacy xen_be_init() function.
> > > > > > > >
> > > > > > > > Subsequent patches will flesh-out the functionality of these
> > > > > objects.
> > > > > > > >
> > > > > > > > Signed-off-by: Paul Durrant 
> > > > > > >
> > > > > > > > diff --git a/hw/xen/xen-bus.c b/hw/xen/xen-bus.c
> > > > > > > > new file mode 100644
> > > > > > > > index 00..dede2d914a
> > > > > > > > --- /dev/null
> > > > > > > > +++ b/hw/xen/xen-bus.c
> > > > > > > > @@ -0,0 +1,125 @@
> > > > > > > > +/*
> > > > > > > > + * Copyright (c) Citrix Systems Inc.
> > > > > > > > + * All rights reserved.
> > > > > > > > + */
> > > > > > >
> > > > > > > This doesn't look very compatible with the GPL. In fact it might
> > > even
> > > > > > > make it illegal for the QEMU project to distribute this code. :-
> > )
> > > > > > >
> > > > > > > Other files you add throughout the series seem to have the same
> > > > > problem.
> > > > > > >
> > > > > >
> > > > > > I was working on the assumption that a lack of explicit license
> > > meant
> > > > > that the overall project license as described in item 2 in LICENSE.
> > > Did I
> > > > > misinterpret that text?
> > > > >
> > > > > It's "All rights reserved." the problem
> > > >
> > > > Oh, I see. I'm happy to remove that.
> > >
> > > That would be better at least. I'm not sure about files that have a
> > > copyright header, but no license statement. Do such files exist yet in
> > > the source tree?
> > 
> > Yes, there's quite a few... e.g. (first ones I tripped over)
> > hw/rdma/rdma_backend.c, hw/virtio/vhost-backend.c, ...
> 
> Oh... I see they have a license statement but no boilerplate... is
> that statement enough?

Yes, this is good enough.

> > > To be on the safe side, I'd just stick with the
> > > established practice, which is having a license header in every file.
> > 
> > Ok... if that is established practice. It really wasn't clear.
> > 
> > >
> > > By the way, in a later patch you remove the existing license header,
> > > which is different from the default license (because parts of the

Re: [Xen-devel] [PATCH v4 1/6] microcode/intel: extend microcode_update_match()

2018-11-29 Thread Roger Pau Monné

On Thu, Nov 29, 2018 at 10:00:34AM +0800, Chao Gao wrote:
> On Wed, Nov 28, 2018 at 11:58:06AM +0100, Roger Pau Monné wrote:
> >On Wed, Nov 28, 2018 at 01:34:11PM +0800, Chao Gao wrote:
> >>  static int microcode_sanity_check(void *mc)
> >> @@ -236,31 +259,13 @@ static int get_matching_microcode(const void *mc, 
> >> unsigned int cpu)
> >>  {
> >>  struct ucode_cpu_info *uci = &per_cpu(ucode_cpu_info, cpu);
> >>  const struct microcode_header_intel *mc_header = mc;
> >> -const struct extended_sigtable *ext_header;
> >>  unsigned long total_size = get_totalsize(mc_header);
> >> -int ext_sigcount, i;
> >> -struct extended_signature *ext_sig;
> >>  void *new_mc;
> >>  
> >> -if ( microcode_update_match(cpu, mc_header,
> >> -mc_header->sig, mc_header->pf) )
> >> -goto find;
> >> -
> >> -if ( total_size <= (get_datasize(mc_header) + MC_HEADER_SIZE) )
> >> +if ( microcode_update_match(mc, uci->cpu_sig.sig, uci->cpu_sig.pf,
> >> +uci->cpu_sig.rev) != NEW_UCODE )
> >>  return 0;
> >
> >Shouldn't you differentiate between the function returning OLD_UCODE
> >or MIS_UCODE? I would expect that trying to load a mismatched UCODE
> >would trigger some kind of message from Xen.
> 
> I don't differentiate these two cases. For both of them, we do nothing.
> Actually, I add a message "No newer or matched microcode found" in patch 4
> for them (Currently each cpu parses the file locally, if we add
> an error message, it will show up many times). However, if you are to load
> a corrupted file, another error will be prompted.

What I wanted to point out is that you don't need 3 different return
values if you only differentiate between two of them. Ie: a boolean
will achieve the same result here.

If those 3 different return values are indeed used by patch 4 then
that's fine, I didn't realize it.

Thanks, Roger.

___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] [PATCH v4 2/6] microcode: save all microcodes which pass sanity check

2018-11-29 Thread Roger Pau Monné

On Thu, Nov 29, 2018 at 10:40:32AM +0800, Chao Gao wrote:
> On Wed, Nov 28, 2018 at 01:00:14PM +0100, Roger Pau Monné wrote:
> >On Wed, Nov 28, 2018 at 01:34:12PM +0800, Chao Gao wrote:
> >> ... and search caches to find a suitable one when loading.
> >
> >Why do you need to save all of them? You are only going to load a
> >single microcode, so I don't understand the need to cache them all.

I think the above question needs an answer.

> >IMO making such modifications to the AMD code without testing it is
> >very dangerous. Could you get an AMD system or ask an AMD dev to test
> >it? I would try with the AMD SVM maintainers.
> 
> It is improbable for me to find an AMD machine in my team. I will copy AMD
> SVM maintainers in the coming versions and ask them to help to test this
> series.

I'm Cc'ing them now in case they want to provide some feedback.

> >> +static int save_patch(struct ucode_patch *new_patch)
> >> +{
> >> +struct ucode_patch *ucode_patch;
> >> +struct microcode_amd *new_mc = new_patch->data;
> >> +struct microcode_header_amd *new_header = new_mc->mpb;
> >> +
> >> +list_for_each_entry(ucode_patch, µcode_cache, list)
> >> +{
> >> +struct microcode_amd *old_mc = ucode_patch->data;
> >> +struct microcode_header_amd *old_header = old_mc->mpb;
> >> +
> >> +if ( new_header->processor_rev_id == old_header->processor_rev_id 
> >> )
> >> +{
> >> +if ( new_header->patch_id <= old_header->patch_id )
> >> +return -1;
> >> +list_replace(&ucode_patch->list, &new_patch->list);
> >> +free_ucode_patch(ucode_patch);
> >> +return 0;
> >> +}
> >> +}
> >
> >This could be made common code with a specific hook for AMD and Intel
> >in order to do the comparison, so that at least the loop over the
> >list of ucode entries could be shared.
> 
> Something like pt_pirq_iterate()? Will give it a try.

Yes, that might also be helpful. I was thinking of adding such a
comparison hook in microcode_ops, also having something like
pt_pirq_iterate will be helpful if you need to iterate over the cache
in other functions.

> >> @@ -491,6 +559,21 @@ static int cpu_request_microcode(unsigned int cpu, 
> >> const void *buf,
> >>  while ( (error = get_ucode_from_buffer_amd(mc_amd, buf, bufsize,
> >> &offset)) == 0 )
> >>  {
> >> +struct ucode_patch *ucode_patch;
> >> +
> >> +/*
> >> + * Save this microcode before checking the signature. It is to
> >> + * optimize microcode update on a mixed family system. Parsing
> >
> >Er, is it possible to have a system with CPUs of different family?
> >What's going to happen with CPUs having different features?
> 
> I have no idea. That each cpu has a per-cpu variable to store the
> microcode rather than a global one gives me a feeling that the current
> implementation wants to make it work on a system with CPUs of different
> family.

I think we need AMD maintainers input on this one. TBH I very much
doubt there are (working) systems out there with mixed family CPUs.

Thanks, Roger.

___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] vpci: deferral of register write until p2m changes are done

2018-11-29 Thread Jan Beulich

>>> On 28.11.18 at 18:48,  wrote:
> On Wed, Nov 28, 2018 at 10:04:33AM -0700, Jan Beulich wrote:
>> >>> On 28.11.18 at 17:54,  wrote:
>> > On Wed, Nov 28, 2018 at 09:22:16AM -0700, Jan Beulich wrote:
>> >> >>> On 28.11.18 at 16:41,  wrote:
>> >> > My plan is that DomUs won't be allowed to toggle the memory decoding
>> >> > bit, and it's going to be always enabled, like it's currently done for
>> >> > pci-passthrough in QEMU. Toggling the memory decoding bit in a DomU is
>> >> > going to trigger a change to the p2m (map or unmap) but the command
>> >> > register will always have the memory decoding bit enabled.
>> >> 
>> >> But this isn't entirely correct, even if we've got away with this
>> >> so far. But we're mostly considering well-behaved guests and
>> >> devices. What if one actually triggers bus activity in parallel to
>> >> a BAR change?
>> > 
>> > Well, that's likely to not work properly in any case with or without
>> > disabling the memory decoding bit?
>> 
>> Of course not.
>> 
>> > But I don't see how that's going to affect Xen stability (or what the
>> > domain is attempting to achieve with it).
>> 
>> "I don't see how ..." != "That's not going to ...". And in case my
>> prior way of wording it was ambiguous: We very much need to
>> think about malicious guests (once any of this is to be extended
>> to DomU-s). Hence a goal of "I want to crash Xen" needs to be
>> taken into consideration.
> 
> Right, so Xen might what to disable memory decoding for DomUs also.
> 
> But that's orthogonal to whether we agree that the write to the
> command register can happen before the p2m modifications, both in the
> map and the unmap case. I think so, but I would like to be sure you
> agree before I code this.

Thing is, as said before, I'm not sure. I can be convinced by, well,
convincing arguments (which a proof that nothing bad can happen
would be, but anything less likely wouldn't).

Jan



___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] [PATCH 14/18] xen: add implementations of xen-qdisk connect and disconnect functions...

2018-11-29 Thread Paul Durrant

> -Original Message-
> From: Kevin Wolf [mailto:kw...@redhat.com]
> Sent: 29 November 2018 09:01
> To: Paul Durrant 
> Cc: qemu-bl...@nongnu.org; qemu-de...@nongnu.org; xen-
> de...@lists.xenproject.org; Stefano Stabellini ;
> Anthony Perard ; Max Reitz 
> Subject: Re: [PATCH 14/18] xen: add implementations of xen-qdisk connect
> and disconnect functions...
> 
> Am 28.11.2018 um 17:40 hat Paul Durrant geschrieben:
> > > -Original Message-
> > > From: Kevin Wolf [mailto:kw...@redhat.com]
> > > Sent: 28 November 2018 16:35
> > > To: Paul Durrant 
> > > Cc: qemu-bl...@nongnu.org; qemu-de...@nongnu.org; xen-
> > > de...@lists.xenproject.org; Stefano Stabellini
> ;
> > > Anthony Perard ; Max Reitz
> 
> > > Subject: Re: [PATCH 14/18] xen: add implementations of xen-qdisk
> connect
> > > and disconnect functions...
> > >
> > > Am 21.11.2018 um 16:12 hat Paul Durrant geschrieben:
> > > > ...and wire in the dataplane.
> > > >
> > > > This patch adds the remaining code to make the xen-qdisk XenDevice
> > > > functional. The parameters that a block frontend expects to find are
> > > > populated in the backend xenstore area, and the 'ring-ref' and
> > > > 'event-channel' values specified in the frontend xenstore area are
> > > > mapped/bound and used to set up the dataplane.
> > > >
> > > > Signed-off-by: Paul Durrant 
> > > > ---
> > > > Cc: Stefano Stabellini 
> > > > Cc: Anthony Perard 
> > > > Cc: Kevin Wolf 
> > > > Cc: Max Reitz 
> > > > ---
> > > >  hw/block/xen-qdisk.c   | 140
> > > +
> > > >  hw/xen/xen-bus.c   |  12 ++--
> > > >  include/hw/xen/xen-bus.h   |   8 +++
> > > >  include/hw/xen/xen-qdisk.h |  12 
> > > >  4 files changed, 166 insertions(+), 6 deletions(-)
> > > >
> > > > diff --git a/hw/block/xen-qdisk.c b/hw/block/xen-qdisk.c
> > > > index 35f7b70480..8c88393832 100644
> > > > --- a/hw/block/xen-qdisk.c
> > > > +++ b/hw/block/xen-qdisk.c
> > > > @@ -9,6 +9,10 @@
> > > >  #include "qapi/visitor.h"
> > > >  #include "hw/hw.h"
> > > >  #include "hw/xen/xen-qdisk.h"
> > > > +#include "sysemu/blockdev.h"
> > > > +#include "sysemu/block-backend.h"
> > > > +#include "sysemu/iothread.h"
> > > > +#include "dataplane/xen-qdisk.h"
> > > >  #include "trace.h"
> > > >
> > > >  static char *xen_qdisk_get_name(XenDevice *xendev, Error **errp)
> > > > @@ -23,6 +27,11 @@ static void xen_qdisk_realize(XenDevice *xendev,
> > > Error **errp)
> > > >  {
> > > >  XenQdiskDevice *qdiskdev = XEN_QDISK_DEVICE(xendev);
> > > >  XenQdiskVdev *vdev = &qdiskdev->vdev;
> > > > +BlockConf *conf = &qdiskdev->conf;
> > > > +DriveInfo *dinfo;
> > > > +bool is_cdrom;
> > > > +unsigned int info;
> > > > +int64_t size;
> > > >
> > > >  if (!vdev->valid) {
> > > >  error_setg(errp, "vdev property not set");
> > > > @@ -30,13 +39,134 @@ static void xen_qdisk_realize(XenDevice
> *xendev,
> > > Error **errp)
> > > >  }
> > > >
> > > >  trace_xen_qdisk_realize(vdev->disk, vdev->partition);
> > > > +
> > > > +if (!conf->blk) {
> > > > +error_setg(errp, "drive property not set");
> > > > +return;
> > > > +}
> > > > +
> > > > +if (!blk_is_inserted(conf->blk)) {
> > > > +error_setg(errp, "device needs media, but drive is empty");
> > > > +return;
> > > > +}
> > >
> > > Hm, the code below suggests that you support CD-ROMs. Don't you want
> to
> > > support media change as well then? Which would mean that you need to
> > > support empty drives.
> >
> > Yes, that's a good point. I should get rid of that check.
> 
> Or rather apply it only to hard disks. And for empty CDs, you'll
> probably need to create an empty BlockBackend (the !conf->blk case).
> Just check the IDE and/or SCSI code for comparison.
> 
> > >
> > > > +if (!blkconf_apply_backend_options(conf, blk_is_read_only(conf-
> > > >blk),
> > > > +   false, errp)) {
> > > > +return;
> > > > +}
> > > > +
> > > > +if (!blkconf_geometry(conf, NULL, 65535, 255, 255, errp)) {
> > > > +return;
> > > > +}
> > > > +
> > > > +dinfo = blk_legacy_dinfo(conf->blk);
> > > > +is_cdrom = (dinfo && dinfo->media_cd);
> > >
> > > It's called legacy for a reason. Don't use this in new devices.
> > >
> > > The proper way is to have two different devices for hard disks and CDs
> > > (like scsi-hd and scsi-cd).
> >
> > ...or presumably I could have a property? The legacy init code could
> > then set it based on the drive info.
> 
> Technically yes, but why would that be a good way to model things? I
> mean, it's true that xen-qdisk is not real hardware, but I've never seen
> any hardware that has a switch to decide whether it should behave as a
> CD drive or a hard disk.
> 
> Both have very different characteristics (read-only with removable
> media, or a single read-write disk), and the existing implementations
> use two separate devices. So even if you're not conv

Re: [Xen-devel] [ARM] gvirt_to_maddr fails when DomU is created

2018-11-29 Thread Jan Beulich

>>> On 28.11.18 at 01:05,  wrote:
> update_runstate_area() using a virtual address is a complete misfeature,
> and the sooner we can replace it, the better.  It's history is with x86
> PV guests, where the early ABIs were designed in terms of Linux's
> copy_{to,from}_user().
> 
> It is similarly broken in x86 with meltdown mitigations, as well as SMAP
> considerations (PAN in ARM, iirc).
> 
> We've got two options.  Invent a new API which takes a gfn/gaddr, or
> retrofit the API to be "you pass a virtual address, we translate to
> gfn/gaddr, then update that".  Perhaps both.
> 
> When this was last discussed, I think the "onetime translate to
> gfn/gaddr" was a good enough compatibility to cope with existing guests,
> but that we should have a more clean way for modern guests.

I don't think a one-time translate can be a replacement without
the guest giving its consent, at which point the guest could as
well be switched to whatever the replacement interface is going
to be. Aiui (the introduction of the functionality predating my
involvement with Xen) the original idea was that guests would
specifically be allowed to context switch the mapping of the
involved linear address.

Furthermore for x86-64 guests we already have logic to deal
with the case where there is no present mapping at the time
the write is to occur, as that's a common situation with x86-64
guest user mode running with kernel page tables removed.
For x86-32 guests with Meltdown mitigation in place something
similar might indeed be needed. Whether something like this is
doable on ARM depends on whether Xen has a way to know
at which point missing mappings re-appear.

Jan

___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] Linux 4.19.5 fails to boot as Xen dom0

2018-11-29 Thread Juergen Gross

On 29/11/2018 02:22, Hans van Kranenburg wrote:
> Hi,
> 
> As also seen at:
> https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=914951
> 
> Attached there are two serial console output logs. One is starting with
> Xen 4.11 (from debian unstable) as dom0, and the other one without Xen.
> 
> [2.085543] BUG: unable to handle kernel paging request at
> 888d9fffc000
> [2.085610] PGD 200c067 P4D 200c067 PUD 0
> [2.085674] Oops:  [#1] SMP NOPTI
> [2.085736] CPU: 1 PID: 1 Comm: swapper/0 Not tainted
> 4.19.0-trunk-amd64 #1 Debian 4.19.5-1~exp1+pvh1
> [2.085823] Hardware name: HP ProLiant DL360 G7, BIOS P68 05/21/2018
> [2.085895] RIP: e030:ptdump_walk_pgd_level_core+0x1fd/0x490
> [...]

The offending stable commit is 4074ca7d8a1832921c865d250bbd08f3441b3657
("x86/mm: Move LDT remap out of KASLR region on 5-level paging"), this
is commit d52888aa2753e3063a9d3a0c9f72f94aa9809c15 upstream.

Current upstream kernel is booting fine under Xen, so in general the
patch should be fine. Using an upstream kernel built from above commit
(with the then needed Xen fixup patch 1457d8cf7664f34c4ba534) is fine,
too.

Kirill, are you aware of any prerequisite patch from 4.20 which could be
missing in 4.19.5?

Juergen

___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] [PATCH v4 4/6] microcode: don't call apply_microcode() in cpu_request_microcode()

2018-11-29 Thread Roger Pau Monné

On Thu, Nov 29, 2018 at 12:28:46PM +0800, Chao Gao wrote:
> On Wed, Nov 28, 2018 at 04:02:25PM +0100, Roger Pau Monné wrote:
> >On Wed, Nov 28, 2018 at 01:34:14PM +0800, Chao Gao wrote:
> >> diff --git a/xen/arch/x86/microcode.c b/xen/arch/x86/microcode.c
> >> index 8350d22..cca7b2c 100644
> >> --- a/xen/arch/x86/microcode.c
> >> +++ b/xen/arch/x86/microcode.c
> >> @@ -233,20 +233,12 @@ int microcode_resume_cpu(unsigned int cpu)
> >>  return err;
> >>  }
> >>  
> >> -static int microcode_update_cpu(const void *buf, size_t size)
> >> +static int microcode_update_cpu(void)
> >>  {
> >>  int err;
> >> -unsigned int cpu = smp_processor_id();
> >> -struct ucode_cpu_info *uci = &per_cpu(ucode_cpu_info, cpu);
> >>  
> >>  spin_lock(µcode_mutex);
> >> -
> >> -err = microcode_ops->collect_cpu_info(cpu, &uci->cpu_sig);
> >> -if ( likely(!err) )
> >> -err = microcode_ops->cpu_request_microcode(cpu, buf, size);
> >> -else
> >> -__microcode_fini_cpu(cpu);
> >> -
> >> +err = microcode_ops->apply_microcode(smp_processor_id());
> >>  spin_unlock(µcode_mutex);
> >>  
> >>  return err;
> >> @@ -259,7 +251,7 @@ static long do_microcode_update(void *_info)
> >>  
> >>  BUG_ON(info->cpu != smp_processor_id());
> >>  
> >> -error = microcode_update_cpu(info->buffer, info->buffer_size);
> >> +error = microcode_update_cpu();
> >
> >Why don't you just set info->error = microcode_update_cpu()?
> >
> >AFAICT this is done to attempt to update the remaining CPUs if one
> >update failed?
> 
> Yes. But this patch doesn't change the logic here. Actually, if HT is
> enabled and microcode is shared between the logical threads of the same
> core, so if one thread updates microcode successfully, its sibling would
> always fail in current logic. I am trying to explain why we cannot abort
> the update even though an error is met in current logic. It definitely
> can be solved by tweaking the logic slightly. 
> 
> >
> >Is there anyway to rollback to the previous state so all CPUs have the
> >same microcode?
> 
> Seems it is not allowed to load a microcode with numeratically smaller
> revision according to 9.11.7.2.
> 
> With patch 6, a panic() would be triggered if some cpus failed to do the
> update. I didn't try to change the logic here.
> 
> >I assume nothing good will come out of running a
> >system with CPUs using different microcode versions, but maybe that's
> >not so bad?
> 
> It is better that all CPUs have the same microcode revision. 
> 
> Linux kernel rejects late microcode update if finding some CPUs
> offlined. I may port this patch to Xen too in a separate patch.

What happens with hotplug CPUs?

Even if you disable late loading when there are offlined CPUs you
still need to handle hotplug CPUs, which IMO should share the same
path with offlined CPUs: the microcode update should be done ASAP
after bringing the CPU up.

> >
> >>  if ( error )
> >>  info->error = error;
> >>  
> >> @@ -276,6 +268,8 @@ int 
> >> microcode_update(XEN_GUEST_HANDLE_PARAM(const_void) buf, unsigned long len)
> >>  {
> >>  int ret;
> >>  struct microcode_info *info;
> >> +unsigned int cpu = smp_processor_id();
> >> +struct ucode_cpu_info *uci = &per_cpu(ucode_cpu_info, cpu);
> >>  
> >>  if ( len != (uint32_t)len )
> >>  return -E2BIG;
> >> @@ -294,10 +288,6 @@ int 
> >> microcode_update(XEN_GUEST_HANDLE_PARAM(const_void) buf, unsigned long len)
> >>  return ret;
> >>  }
> >>  
> >> -info->buffer_size = len;
> >> -info->error = 0;
> >> -info->cpu = cpumask_first(&cpu_online_map);
> >> -
> >>  if ( microcode_ops->start_update )
> >>  {
> >>  ret = microcode_ops->start_update();
> >> @@ -308,6 +298,26 @@ int 
> >> microcode_update(XEN_GUEST_HANDLE_PARAM(const_void) buf, unsigned long len)
> >>  }
> >>  }
> >>  
> >> +spin_lock(µcode_mutex);
> >> +
> >> +ret = microcode_ops->collect_cpu_info(cpu, &uci->cpu_sig);
> >> +if ( likely(!ret) )
> >> +ret = microcode_ops->cpu_request_microcode(cpu, info->buffer, 
> >> len);
> >> +else
> >> +__microcode_fini_cpu(cpu);
> >> +
> >> +spin_unlock(µcode_mutex);
> >
> >Why do you need to hold the lock here?
> >
> >microcode_update is already serialized and should only be executed on
> >a CPU at a time due to the usage of chained
> >continue_hypercall_on_cpu.
> 
> microcode_resume_cpu() also uses the 'uci' and the global microcode cache.
> This lock is to prevent them happening simultaneously (someone is
> adding/replacing entries to a list and another is reading the list).
> All existing call sites of collec_cpu_info() and cpu_request_microcode()
> are protected with this lock.

I mean, there should be some kind of protection to prevent the list
from changing at all while there's an update in progress, or else you
risk using different versions for different CPUs if there's a list
addition while a microcode update is in progres

Re: [Xen-devel] [ARM] gvirt_to_maddr fails when DomU is created

2018-11-29 Thread Juergen Gross

On 29/11/2018 10:39, Jan Beulich wrote:
 On 28.11.18 at 01:05,  wrote:
>> update_runstate_area() using a virtual address is a complete misfeature,
>> and the sooner we can replace it, the better.  It's history is with x86
>> PV guests, where the early ABIs were designed in terms of Linux's
>> copy_{to,from}_user().
>>
>> It is similarly broken in x86 with meltdown mitigations, as well as SMAP
>> considerations (PAN in ARM, iirc).
>>
>> We've got two options.  Invent a new API which takes a gfn/gaddr, or
>> retrofit the API to be "you pass a virtual address, we translate to
>> gfn/gaddr, then update that".  Perhaps both.
>>
>> When this was last discussed, I think the "onetime translate to
>> gfn/gaddr" was a good enough compatibility to cope with existing guests,
>> but that we should have a more clean way for modern guests.
> 
> I don't think a one-time translate can be a replacement without
> the guest giving its consent, at which point the guest could as
> well be switched to whatever the replacement interface is going
> to be. Aiui (the introduction of the functionality predating my
> involvement with Xen) the original idea was that guests would
> specifically be allowed to context switch the mapping of the
> involved linear address.
> 
> Furthermore for x86-64 guests we already have logic to deal
> with the case where there is no present mapping at the time
> the write is to occur, as that's a common situation with x86-64
> guest user mode running with kernel page tables removed.
> For x86-32 guests with Meltdown mitigation in place something
> similar might indeed be needed. Whether something like this is
> doable on ARM depends on whether Xen has a way to know
> at which point missing mappings re-appear.

In any case we want some interface using gfn/gaddr for performance
reasons: Always having to do a vaddr->gaddr translation is expensive
(especially for HVM/PVH and probably on ARM, too), so we should try to
avoid that.


Juergen

___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] [PATCH v4 6/6] x86/microcode: Synchronize late microcode loading

2018-11-29 Thread Roger Pau Monné

On Thu, Nov 29, 2018 at 12:43:25PM +0800, Chao Gao wrote:
> On Wed, Nov 28, 2018 at 04:22:09PM +0100, Roger Pau Monné wrote:
> >On Wed, Nov 28, 2018 at 01:34:16PM +0800, Chao Gao wrote:
> >> This patch ports microcode improvement patches from linux kernel.
> >> 
> >> Before you read any further: the early loading method is still the
> >> preferred one and you should always do that. The following patch is
> >> improving the late loading mechanism for long running jobs and cloud use
> >> cases.
> >> 
> >> Gather all cores and serialize the microcode update on them by doing it
> >> one-by-one to make the late update process as reliable as possible and
> >> avoid potential issues caused by the microcode update.
> >> 
> >> Signed-off-by: Chao Gao 
> >> Tested-by: Chao Gao 
> >> [linux commit: a5321aec6412b20b5ad15db2d6b916c05349dbff]
> >> [linux commit: bb8c13d61a629276a162c1d2b1a20a815cbcfbb7]
> >
> >If this patch is the squash of two Linux commits, please post the
> >ported versions of the two commits separately.
> 
> I don't understand this one.

You reference two Linux commits above, why is this done?

I assume this is because you are porting two Linux commits to Xen, in
which case I think that should be done in two different patches, or a
note needs to be added to why you merge two Linux commits into a
single Xen patch.

> >> @@ -311,13 +350,45 @@ int 
> >> microcode_update(XEN_GUEST_HANDLE_PARAM(const_void) buf, unsigned long len)
> >>  if ( ret <= 0 )
> >>  {
> >>  printk("No valid or newer microcode found. Update abort!\n");
> >> -return -EINVAL;
> >> +ret = -EINVAL;
> >> +goto put;
> >>  }
> >>  
> >> -info->error = 0;
> >> -info->cpu = cpumask_first(&cpu_online_map);
> >> +atomic_set(&info->cpu_in, 0);
> >> +atomic_set(&info->cpu_out, 0);
> >> +
> >> +/* Calculate the number of online CPU core */
> >> +nr_cores = 0;
> >> +for_each_online_cpu(cpu)
> >> +if ( cpu == cpumask_first(per_cpu(cpu_sibling_mask, cpu)) )
> >> +nr_cores++;
> >> +
> >> +printk("%d cores are to update its microcode\n", nr_cores);
> >>  
> >> -return continue_hypercall_on_cpu(info->cpu, do_microcode_update, 
> >> info);
> >> +/*
> >> + * We intend to disable interrupt for long time, which may lead to
> >> + * watchdog timeout.
> >> + */
> >> +watchdog_disable();
> >> +/*
> >> + * Late loading dance. Why the heavy-handed stop_machine effort?
> >> + *
> >> + * - HT siblings must be idle and not execute other code while the 
> >> other
> >> + *   sibling is loading microcode in order to avoid any negative
> >> + *   interactions cause by the loading.
> >
> >Well, the HT siblings will be executing code, since they are in a
> >while loop waiting for the non-siblings cores to finish updating.
> 
> Strictly speaking, you are right. The 'idle' I think means no other
> workload on the cpu except microcode loading (for a HT sibling which
> isn't chosen to do the update, means waiting for the completion of
> the other sibling).

Could you clarify the comment then?

By workload you mean that no other microcode loading should be
attempted from a HT sibling?

Is there a set of instructions or functionality that cannot be used by
HT siblings while performing a microcode load?

Thanks, Roger.

___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] [PATCH V10 4/5] p2m: Always use hostp2m when clipping rangesets

2018-11-29 Thread Jan Beulich

>>> On 28.11.18 at 22:56,  wrote:
> Changes since V9:
>  - Removed the patch RFC (replaced by a printk(XENLOG_G_WARNING).
>  - Reused start and end in change_type_range() and removed the
>intermediary variables range_start and range_end.
>  - Added an extra explanation for the if ( start > end ) return;
>code in the comment.

This last item isn't really taking care of the comments I gave on v9.
The _incoming_ start being larger than the _incoming_ end is
something worth to point out. But you put that check after clipping
end. Furthermore it looks like you continue to break the case
where ->max_mapped_pfn increases subsequently, i.e. you still
don't update the rangeset with the unmodified incoming values.
Or otherwise I would have expected an explanation (as a reply
to my comments, not necessarily by adding to description or
comments of the patch here) why either this is not an issue or I'm
misreading anything.

Jan

___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] [ARM] gvirt_to_maddr fails when DomU is created

2018-11-29 Thread Andrew Cooper

On 29/11/2018 09:39, Jan Beulich wrote:
 On 28.11.18 at 01:05,  wrote:
>> update_runstate_area() using a virtual address is a complete misfeature,
>> and the sooner we can replace it, the better.  It's history is with x86
>> PV guests, where the early ABIs were designed in terms of Linux's
>> copy_{to,from}_user().
>>
>> It is similarly broken in x86 with meltdown mitigations, as well as SMAP
>> considerations (PAN in ARM, iirc).
>>
>> We've got two options.  Invent a new API which takes a gfn/gaddr, or
>> retrofit the API to be "you pass a virtual address, we translate to
>> gfn/gaddr, then update that".  Perhaps both.
>>
>> When this was last discussed, I think the "onetime translate to
>> gfn/gaddr" was a good enough compatibility to cope with existing guests,
>> but that we should have a more clean way for modern guests.
> I don't think a one-time translate can be a replacement without
> the guest giving its consent, at which point the guest could as
> well be switched to whatever the replacement interface is going
> to be. Aiui (the introduction of the functionality predating my
> involvement with Xen) the original idea was that guests would
> specifically be allowed to context switch the mapping of the
> involved linear address.

That entirely depends on how guests currently use the address they set
up.  The purpose of investigating an option like this is specifically to
avoid needing to alter the guest...

>
> Furthermore for x86-64 guests we already have logic to deal
> with the case where there is no present mapping at the time
> the write is to occur, as that's a common situation with x86-64
> guest user mode running with kernel page tables removed.
> For x86-32 guests with Meltdown mitigation in place something
> similar might indeed be needed. Whether something like this is
> doable on ARM depends on whether Xen has a way to know
> at which point missing mappings re-appear.

... so we can get something which works without having to do a full
pagewalk (and nested vmexit!) on every context switch and each time we
toggle the guest pagetables.

Furthermore, looking at the content held in the runstate area, what do
guests actually use the information for?  It looks like it is of
marginal use for diagnostics within the guest.

~Andrew

___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] [PATCH V10 5/5] p2m: change_type_range: Only invalidate mapped gfns

2018-11-29 Thread Jan Beulich

>>> On 28.11.18 at 22:56,  wrote:
> change_range_type() invalidates gfn ranges to lazily change the type
> of a range of gfns, and also modifies the logdirty rangesets of that
> p2m. At the moment, it clips both down by the hostp2m.
> 
> While this will result in correct behavior, it's not entirely efficient,
> since invalidated entries outside that range will, on fault, simply be
> modified back to "empty" before faulting normally again.
> 
> Separate out the calculation of the two ranges.  Keep using the
> hostp2m's max_mapped_pfn to clip the logdirty ranges, but use the
> current p2m's max_mapped_pfn to further clip the invalidation range
> for alternate p2ms.
> 
> Signed-off-by: George Dunlap 
> Signed-off-by: Razvan Cojocaru 
> Reviewed-by: Jan Beulich 
> 
> ---
> CC: George Dunlap 
> CC: Jan Beulich 
> CC: Andrew Cooper 
> CC: Wei Liu 
> CC: "Roger Pau Monné" 
> 
> ---
> Changes since V9:
>  - Corrected function name in patch subject.

Funny: I didn't even notice the issue in the patch title. I had
pointed it out against the start of the description, where it
still exists.

Jan


___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] [ARM] gvirt_to_maddr fails when DomU is created

2018-11-29 Thread Jan Beulich

>>> On 29.11.18 at 11:03,  wrote:
> Furthermore, looking at the content held in the runstate area, what do
> guests actually use the information for?  It looks like it is of
> marginal use for diagnostics within the guest.

Correct steal time accounting, at the very least, depends on this
iirc. Steal time accounting being wrong has lead to multiple support
calls here in the past, so I don't think I'd subscribe to the "marginal"
part.

Jan

___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] [PATCH v4 2/6] microcode: save all microcodes which pass sanity check

2018-11-29 Thread Jan Beulich

>>> On 29.11.18 at 03:40,  wrote:
> On Wed, Nov 28, 2018 at 01:00:14PM +0100, Roger Pau Monné wrote:
>>On Wed, Nov 28, 2018 at 01:34:12PM +0800, Chao Gao wrote:
>>> @@ -491,6 +559,21 @@ static int cpu_request_microcode(unsigned int cpu, 
>>> const void *buf,
>>>  while ( (error = get_ucode_from_buffer_amd(mc_amd, buf, bufsize,
>>> &offset)) == 0 )
>>>  {
>>> +struct ucode_patch *ucode_patch;
>>> +
>>> +/*
>>> + * Save this microcode before checking the signature. It is to
>>> + * optimize microcode update on a mixed family system. Parsing
>>
>>Er, is it possible to have a system with CPUs of different family?
>>What's going to happen with CPUs having different features?
> 
> I have no idea. That each cpu has a per-cpu variable to store the
> microcode rather than a global one gives me a feeling that the current
> implementation wants to make it work on a system with CPUs of different
> family.

I think we've long given up on supporting mixed-model or even
mixed-family systems. Therefore in this overhaul doing away with
per-CPU tracking beyond the present ucode level (which indeed
may differ, even if we may have to consider to refuse keeping the
system up in such a case) would perhaps be pretty reasonable.

One thing definitely needs to work: Updating of ucode when
firmware has loaded differing versions (which usually boils down
to firmware neglecting to load ucode on all cores).

Jan

___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

[Xen-devel] [xen-4.11-testing baseline-only test] 75625: regressions - FAIL

2018-11-29 Thread Platform Team regression test user

This run is configured for baseline tests only.

flight 75625 xen-4.11-testing real [real]
http://osstest.xensource.com/osstest/logs/75625/

Regressions :-(

Tests which did not succeed and are blocking,
including tests which could not be run:
 test-amd64-amd64-qemuu-nested-intel 14 xen-boot/l1fail REGR. vs. 75588

Tests which did not succeed, but are not blocking:
 test-amd64-amd64-xl-qemuu-ws16-amd64 10 windows-installfail like 75588
 test-amd64-amd64-i386-pvgrub 19 guest-start/debian.repeatfail   like 75588
 test-amd64-i386-xl-raw   10 debian-di-installfail   like 75588
 test-amd64-i386-xl-qemuu-dmrestrict-amd64-dmrestrict 10 debian-hvm-install 
fail never pass
 test-amd64-amd64-xl-qemuu-dmrestrict-amd64-dmrestrict 10 debian-hvm-install 
fail never pass
 test-amd64-i386-libvirt-xsm  13 migrate-support-checkfail   never pass
 test-amd64-i386-xl-pvshim12 guest-start  fail   never pass
 test-amd64-amd64-xl-qemut-win10-i386 10 windows-installfail never pass
 test-amd64-amd64-libvirt 13 migrate-support-checkfail   never pass
 test-amd64-amd64-xl-qemuu-win10-i386 10 windows-installfail never pass
 test-amd64-i386-libvirt  13 migrate-support-checkfail   never pass
 test-amd64-amd64-xl-pvshim   12 guest-start  fail   never pass
 test-armhf-armhf-xl-rtds 12 guest-start  fail   never pass
 test-armhf-armhf-xl-midway   12 guest-start  fail   never pass
 test-armhf-armhf-xl-credit1  12 guest-start  fail   never pass
 test-armhf-armhf-xl-multivcpu 12 guest-start  fail  never pass
 test-amd64-i386-xl-qemuu-ws16-amd64 10 windows-install fail never pass
 test-armhf-armhf-xl-credit2  12 guest-start  fail   never pass
 test-amd64-amd64-libvirt-xsm 13 migrate-support-checkfail   never pass
 test-amd64-amd64-libvirt-qemuu-debianhvm-amd64-xsm 11 migrate-support-check 
fail never pass
 test-armhf-armhf-libvirt 12 guest-start  fail   never pass
 test-armhf-armhf-xl  12 guest-start  fail   never pass
 test-amd64-i386-libvirt-qemuu-debianhvm-amd64-xsm 11 migrate-support-check 
fail never pass
 test-amd64-amd64-qemuu-nested-amd 17 debian-hvm-install/l1/l2  fail never pass
 test-amd64-i386-xl-qemuu-win7-amd64 17 guest-stop  fail never pass
 test-amd64-amd64-libvirt-vhd 12 migrate-support-checkfail   never pass
 test-amd64-amd64-xl-qemut-win7-amd64 17 guest-stop fail never pass
 test-amd64-i386-xl-qemut-win10-i386 17 guest-stop  fail never pass
 test-amd64-amd64-xl-qemuu-win7-amd64 17 guest-stop fail never pass
 test-armhf-armhf-xl-vhd  10 debian-di-installfail   never pass
 test-armhf-armhf-libvirt-raw 10 debian-di-installfail   never pass
 test-amd64-i386-xl-qemut-win7-amd64 17 guest-stop  fail never pass
 test-amd64-i386-xl-qemuu-win10-i386 17 guest-stop  fail never pass
 test-amd64-amd64-xl-qemut-ws16-amd64 17 guest-stop fail never pass
 test-amd64-i386-xl-qemut-ws16-amd64 17 guest-stop  fail never pass

version targeted for testing:
 xen  49caabf2584a26d16f73b4bd423329f8d99f7e71
baseline version:
 xen  dea9fc0e02d92f5e6d46680aa0a52fa758eca9c4

Last test of basis75588  2018-11-11 18:23:28 Z   17 days
Testing same since75625  2018-11-29 00:19:25 Z0 days1 attempts


People who touched revisions under test:
  Andrew Cooper 
  Jan Beulich 
  Paul Durrant 
  Roger Pau Monné 

jobs:
 build-amd64-xsm  pass
 build-i386-xsm   pass
 build-amd64-xtf  pass
 build-amd64  pass
 build-armhf  pass
 build-i386   pass
 build-amd64-libvirt  pass
 build-armhf-libvirt  pass
 build-i386-libvirt   pass
 build-amd64-prev pass
 build-i386-prev  pass
 build-amd64-pvopspass
 build-armhf-pvopspass
 build-i386-pvops pass
 build-amd64-rumprun  pass
 build-i386-rumprun   pass
 test-xtf-amd64-amd64-1   pass
 test-xtf-amd64-amd64-2   pass
 t

Re: [Xen-devel] [RFC 09/16] gic-vgic:vgic: avoid excessive conversions

2018-11-29 Thread Julien Grall


Hi Andrii,

On 28/11/2018 21:32, Andrii Anisov wrote:

From: Andrii Anisov 

Avoid excessive conversions between pending_irq and irq number/priority.
This simlifies functions interface and reduces under locks code size.


I was expecting the commit message to be updated based on the discussion we had 
when you sent this patch separately.


Cheers,



Signed-off-by: Andrii Anisov 
---
  xen/arch/arm/gic-vgic.c| 10 +++---
  xen/arch/arm/vgic-v3-its.c |  2 +-
  xen/arch/arm/vgic.c|  6 +++---
  xen/include/asm-arm/gic.h  |  3 ---
  xen/include/asm-arm/vgic.h |  2 ++
  5 files changed, 9 insertions(+), 14 deletions(-)

diff --git a/xen/arch/arm/gic-vgic.c b/xen/arch/arm/gic-vgic.c
index 9bbd0c5..f0e6c6f 100644
--- a/xen/arch/arm/gic-vgic.c
+++ b/xen/arch/arm/gic-vgic.c
@@ -75,10 +75,8 @@ void gic_remove_from_lr_pending(struct vcpu *v, struct 
pending_irq *p)
  list_del_init(&p->lr_queue);
  }
  
-void gic_raise_inflight_irq(struct vcpu *v, unsigned int virtual_irq)

+void gic_raise_inflight_irq(struct vcpu *v, struct pending_irq *n)
  {
-struct pending_irq *n = irq_to_pending(v, virtual_irq);
-
  #ifdef CONFIG_GICV3
  /* If an LPI has been removed meanwhile, there is nothing left to raise. 
*/
  if ( unlikely(!n) )
@@ -135,12 +133,10 @@ static unsigned int gic_find_unused_lr(struct vcpu *v,
  return lr;
  }
  
-void gic_raise_guest_irq(struct vcpu *v, unsigned int virtual_irq,

-unsigned int priority)
+void gic_raise_guest_irq(struct vcpu *v, struct pending_irq *p)
  {
  int i;
  unsigned int nr_lrs = gic_get_nr_lrs();
-struct pending_irq *p = irq_to_pending(v, virtual_irq);
  
  ASSERT(spin_is_locked(&v->arch.vgic.lock));
  
@@ -233,7 +229,7 @@ static void gic_update_one_lr(struct vcpu *v, int i)

  if ( test_bit(GIC_IRQ_GUEST_ENABLED, &p->status) &&
   test_bit(GIC_IRQ_GUEST_QUEUED, &p->status) &&
   !test_bit(GIC_IRQ_GUEST_MIGRATING, &p->status) )
-gic_raise_guest_irq(v, irq, p->priority);
+gic_raise_guest_irq(v, p);
  else {
  list_del_init(&p->inflight);
  /*
diff --git a/xen/arch/arm/vgic-v3-its.c b/xen/arch/arm/vgic-v3-its.c
index 3c6693d..f9bd27f 100644
--- a/xen/arch/arm/vgic-v3-its.c
+++ b/xen/arch/arm/vgic-v3-its.c
@@ -448,7 +448,7 @@ static void update_lpi_vgic_status(struct vcpu *v, struct 
pending_irq *p)
  {
  if ( !list_empty(&p->inflight) &&
   !test_bit(GIC_IRQ_GUEST_VISIBLE, &p->status) )
-gic_raise_guest_irq(v, p->irq, p->lpi_priority);
+gic_raise_guest_irq(v, p);
  }
  else
  gic_remove_from_lr_pending(v, p);
diff --git a/xen/arch/arm/vgic.c b/xen/arch/arm/vgic.c
index e30a518..c142476 100644
--- a/xen/arch/arm/vgic.c
+++ b/xen/arch/arm/vgic.c
@@ -434,7 +434,7 @@ void vgic_enable_irqs(struct vcpu *v, uint32_t r, int n)
  }
  out:
  if ( !list_empty(&p->inflight) && !test_bit(GIC_IRQ_GUEST_VISIBLE, 
&p->status) )
-gic_raise_guest_irq(v_target, irq, p->priority);
+gic_raise_guest_irq(v_target, p);
  spin_unlock_irqrestore(&v_target->arch.vgic.lock, flags);
  if ( p->desc != NULL )
  {
@@ -608,14 +608,14 @@ void vgic_inject_irq(struct domain *d, struct vcpu *v, 
unsigned int virq,
  
  if ( !list_empty(&n->inflight) )

  {
-gic_raise_inflight_irq(v, virq);
+gic_raise_inflight_irq(v, n);
  goto out;
  }
  
  priority = vgic_get_virq_priority(v, virq);

  n->priority = priority;
  
-gic_raise_guest_irq(v, virq, priority);

+gic_raise_guest_irq(v, n);
  
  list_for_each_entry ( iter, &v->arch.vgic.inflight_irqs, inflight )

  {
diff --git a/xen/include/asm-arm/gic.h b/xen/include/asm-arm/gic.h
index fab02f1..3d7394d 100644
--- a/xen/include/asm-arm/gic.h
+++ b/xen/include/asm-arm/gic.h
@@ -252,9 +252,6 @@ int gic_remove_irq_from_guest(struct domain *d, unsigned 
int virq,
  extern void gic_clear_pending_irqs(struct vcpu *v);
  
  extern void init_maintenance_interrupt(void);

-extern void gic_raise_guest_irq(struct vcpu *v, unsigned int irq,
-unsigned int priority);
-extern void gic_raise_inflight_irq(struct vcpu *v, unsigned int virtual_irq);
  
  /* Accept an interrupt from the GIC and dispatch its handler */

  extern void gic_interrupt(struct cpu_user_regs *regs, int is_fiq);
diff --git a/xen/include/asm-arm/vgic.h b/xen/include/asm-arm/vgic.h
index 7507162..a27a1a9 100644
--- a/xen/include/asm-arm/vgic.h
+++ b/xen/include/asm-arm/vgic.h
@@ -287,6 +287,8 @@ enum gic_sgi_mode;
  extern struct vcpu *vgic_get_target_vcpu(struct vcpu *v, unsigned int virq);
  extern void vgic_remove_irq_from_queues(struct vcpu *v, struct pending_irq 
*p);
  extern void gic_remove_from_lr_pending(struct vcpu *v, struct pending_irq *p);
+extern void gic_raise_guest_irq(struct vcpu *v, struct pending_irq *p);
+extern void gic_raise_inflight_irq(struct vcpu *v, struct pending_ir

Re: [Xen-devel] vpci: deferral of register write until p2m changes are done

2018-11-29 Thread Roger Pau Monné

On Thu, Nov 29, 2018 at 02:25:02AM -0700, Jan Beulich wrote:
> >>> On 28.11.18 at 18:48,  wrote:
> > On Wed, Nov 28, 2018 at 10:04:33AM -0700, Jan Beulich wrote:
> >> >>> On 28.11.18 at 17:54,  wrote:
> >> > On Wed, Nov 28, 2018 at 09:22:16AM -0700, Jan Beulich wrote:
> >> >> >>> On 28.11.18 at 16:41,  wrote:
> >> >> > My plan is that DomUs won't be allowed to toggle the memory decoding
> >> >> > bit, and it's going to be always enabled, like it's currently done for
> >> >> > pci-passthrough in QEMU. Toggling the memory decoding bit in a DomU is
> >> >> > going to trigger a change to the p2m (map or unmap) but the command
> >> >> > register will always have the memory decoding bit enabled.
> >> >> 
> >> >> But this isn't entirely correct, even if we've got away with this
> >> >> so far. But we're mostly considering well-behaved guests and
> >> >> devices. What if one actually triggers bus activity in parallel to
> >> >> a BAR change?
> >> > 
> >> > Well, that's likely to not work properly in any case with or without
> >> > disabling the memory decoding bit?
> >> 
> >> Of course not.
> >> 
> >> > But I don't see how that's going to affect Xen stability (or what the
> >> > domain is attempting to achieve with it).
> >> 
> >> "I don't see how ..." != "That's not going to ...". And in case my
> >> prior way of wording it was ambiguous: We very much need to
> >> think about malicious guests (once any of this is to be extended
> >> to DomU-s). Hence a goal of "I want to crash Xen" needs to be
> >> taken into consideration.
> > 
> > Right, so Xen might what to disable memory decoding for DomUs also.
> > 
> > But that's orthogonal to whether we agree that the write to the
> > command register can happen before the p2m modifications, both in the
> > map and the unmap case. I think so, but I would like to be sure you
> > agree before I code this.
> 
> Thing is, as said before, I'm not sure. I can be convinced by, well,
> convincing arguments (which a proof that nothing bad can happen
> would be, but anything less likely wouldn't).

Hm, OK, please bear with me because I'm afraid I will need some help
in order to provide such argument.

We agree that the end result is going to be the same, ie: a p2m change
and a write to the device command register. Then the issue is the
order in which to perform those, and whether that order will have a
security impact.

We also agree that in the unmap case it's best to first disable memory
decoding and then perform the p2m unmap. So the only remaining
problematic case is the map action.

I guess the only vector of a possible attack could be other vCPUs in a
SMP guest, that could poke at either the device registers or the p2m
after the memory decoding bit has been set and while the map operation
is taking place:

 - Writing to the BARs MMIO while performing the p2m map and with the
   memory decoding bit is enabled can result in missing writes not
   reaching the device, because the p2m entries are not yet setup.
   This can also be triggered by the guest when the BARs are
   completely mapped by performing incorrect writes.

 - Attempting to program a DMA transaction to the device MMIO BAR
   regions will result in IOMMU page faults, the same can be achieved by
   attempting to perform DMA transactions to unpopulated memory
   regions.

I think there are other scenarios you are worried about and I'm
missing, could you please point them out?

Thanks, Roger.

___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] vpci: deferral of register write until p2m changes are done

2018-11-29 Thread Jan Beulich

>>> On 29.11.18 at 11:25,  wrote:
> On Thu, Nov 29, 2018 at 02:25:02AM -0700, Jan Beulich wrote:
>> >>> On 28.11.18 at 18:48,  wrote:
>> > On Wed, Nov 28, 2018 at 10:04:33AM -0700, Jan Beulich wrote:
>> >> >>> On 28.11.18 at 17:54,  wrote:
>> >> > On Wed, Nov 28, 2018 at 09:22:16AM -0700, Jan Beulich wrote:
>> >> >> >>> On 28.11.18 at 16:41,  wrote:
>> >> >> > My plan is that DomUs won't be allowed to toggle the memory decoding
>> >> >> > bit, and it's going to be always enabled, like it's currently done 
>> >> >> > for
>> >> >> > pci-passthrough in QEMU. Toggling the memory decoding bit in a DomU 
>> >> >> > is
>> >> >> > going to trigger a change to the p2m (map or unmap) but the command
>> >> >> > register will always have the memory decoding bit enabled.
>> >> >> 
>> >> >> But this isn't entirely correct, even if we've got away with this
>> >> >> so far. But we're mostly considering well-behaved guests and
>> >> >> devices. What if one actually triggers bus activity in parallel to
>> >> >> a BAR change?
>> >> > 
>> >> > Well, that's likely to not work properly in any case with or without
>> >> > disabling the memory decoding bit?
>> >> 
>> >> Of course not.
>> >> 
>> >> > But I don't see how that's going to affect Xen stability (or what the
>> >> > domain is attempting to achieve with it).
>> >> 
>> >> "I don't see how ..." != "That's not going to ...". And in case my
>> >> prior way of wording it was ambiguous: We very much need to
>> >> think about malicious guests (once any of this is to be extended
>> >> to DomU-s). Hence a goal of "I want to crash Xen" needs to be
>> >> taken into consideration.
>> > 
>> > Right, so Xen might what to disable memory decoding for DomUs also.
>> > 
>> > But that's orthogonal to whether we agree that the write to the
>> > command register can happen before the p2m modifications, both in the
>> > map and the unmap case. I think so, but I would like to be sure you
>> > agree before I code this.
>> 
>> Thing is, as said before, I'm not sure. I can be convinced by, well,
>> convincing arguments (which a proof that nothing bad can happen
>> would be, but anything less likely wouldn't).
> 
> Hm, OK, please bear with me because I'm afraid I will need some help
> in order to provide such argument.
> 
> We agree that the end result is going to be the same, ie: a p2m change
> and a write to the device command register. Then the issue is the
> order in which to perform those, and whether that order will have a
> security impact.
> 
> We also agree that in the unmap case it's best to first disable memory
> decoding and then perform the p2m unmap. So the only remaining
> problematic case is the map action.
> 
> I guess the only vector of a possible attack could be other vCPUs in a
> SMP guest, that could poke at either the device registers or the p2m
> after the memory decoding bit has been set and while the map operation
> is taking place:
> 
>  - Writing to the BARs MMIO while performing the p2m map and with the
>memory decoding bit is enabled can result in missing writes not
>reaching the device, because the p2m entries are not yet setup.
>This can also be triggered by the guest when the BARs are
>completely mapped by performing incorrect writes.

Not sure what "incorrect writes" you mean here. Ones to other GFNs
are not targeted at the device in question anyway. But yes, P2M
mappings not in place _should_ not have any bad effects. My issue
is with the "should not" here, which I'd like to be "cannot".

>  - Attempting to program a DMA transaction to the device MMIO BAR
>regions will result in IOMMU page faults, the same can be achieved by
>attempting to perform DMA transactions to unpopulated memory
>regions.
> 
> I think there are other scenarios you are worried about and I'm
> missing, could you please point them out?

That's the point: I can't point out other scenarios, but I also can't
convince myself that there are none. What I'm concerned about
from a more abstract pov are any actions by the guest which might
ultimately lead to master or target aborts (or alike), which may in
turn cause #SERR to be signaled. Yet then again we all know that
PCI pass-through is not perfectly secure despite all claims (or
should I say wishes), so maybe all of this is really tolerable.

Jan


___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] [PATCH 14/18] xen: add implementations of xen-qdisk connect and disconnect functions...

2018-11-29 Thread Kevin Wolf

Am 29.11.2018 um 10:33 hat Paul Durrant geschrieben:
> > -Original Message-
> > From: Kevin Wolf [mailto:kw...@redhat.com]
> > Sent: 29 November 2018 09:01
> > To: Paul Durrant 
> > Cc: qemu-bl...@nongnu.org; qemu-de...@nongnu.org; xen-
> > de...@lists.xenproject.org; Stefano Stabellini ;
> > Anthony Perard ; Max Reitz 
> > Subject: Re: [PATCH 14/18] xen: add implementations of xen-qdisk connect
> > and disconnect functions...
> > 
> > Am 28.11.2018 um 17:40 hat Paul Durrant geschrieben:
> > > > -Original Message-
> > > > From: Kevin Wolf [mailto:kw...@redhat.com]
> > > > Sent: 28 November 2018 16:35
> > > > To: Paul Durrant 
> > > > Cc: qemu-bl...@nongnu.org; qemu-de...@nongnu.org; xen-
> > > > de...@lists.xenproject.org; Stefano Stabellini
> > ;
> > > > Anthony Perard ; Max Reitz
> > 
> > > > Subject: Re: [PATCH 14/18] xen: add implementations of xen-qdisk
> > connect
> > > > and disconnect functions...
> > > >
> > > > Am 21.11.2018 um 16:12 hat Paul Durrant geschrieben:
> > > > > ...and wire in the dataplane.
> > > > >
> > > > > This patch adds the remaining code to make the xen-qdisk XenDevice
> > > > > functional. The parameters that a block frontend expects to find are
> > > > > populated in the backend xenstore area, and the 'ring-ref' and
> > > > > 'event-channel' values specified in the frontend xenstore area are
> > > > > mapped/bound and used to set up the dataplane.
> > > > >
> > > > > Signed-off-by: Paul Durrant 
> > > > > ---
> > > > > Cc: Stefano Stabellini 
> > > > > Cc: Anthony Perard 
> > > > > Cc: Kevin Wolf 
> > > > > Cc: Max Reitz 
> > > > > ---
> > > > >  hw/block/xen-qdisk.c   | 140
> > > > +
> > > > >  hw/xen/xen-bus.c   |  12 ++--
> > > > >  include/hw/xen/xen-bus.h   |   8 +++
> > > > >  include/hw/xen/xen-qdisk.h |  12 
> > > > >  4 files changed, 166 insertions(+), 6 deletions(-)
> > > > >
> > > > > diff --git a/hw/block/xen-qdisk.c b/hw/block/xen-qdisk.c
> > > > > index 35f7b70480..8c88393832 100644
> > > > > --- a/hw/block/xen-qdisk.c
> > > > > +++ b/hw/block/xen-qdisk.c
> > > > > @@ -9,6 +9,10 @@
> > > > >  #include "qapi/visitor.h"
> > > > >  #include "hw/hw.h"
> > > > >  #include "hw/xen/xen-qdisk.h"
> > > > > +#include "sysemu/blockdev.h"
> > > > > +#include "sysemu/block-backend.h"
> > > > > +#include "sysemu/iothread.h"
> > > > > +#include "dataplane/xen-qdisk.h"
> > > > >  #include "trace.h"
> > > > >
> > > > >  static char *xen_qdisk_get_name(XenDevice *xendev, Error **errp)
> > > > > @@ -23,6 +27,11 @@ static void xen_qdisk_realize(XenDevice *xendev,
> > > > Error **errp)
> > > > >  {
> > > > >  XenQdiskDevice *qdiskdev = XEN_QDISK_DEVICE(xendev);
> > > > >  XenQdiskVdev *vdev = &qdiskdev->vdev;
> > > > > +BlockConf *conf = &qdiskdev->conf;
> > > > > +DriveInfo *dinfo;
> > > > > +bool is_cdrom;
> > > > > +unsigned int info;
> > > > > +int64_t size;
> > > > >
> > > > >  if (!vdev->valid) {
> > > > >  error_setg(errp, "vdev property not set");
> > > > > @@ -30,13 +39,134 @@ static void xen_qdisk_realize(XenDevice
> > *xendev,
> > > > Error **errp)
> > > > >  }
> > > > >
> > > > >  trace_xen_qdisk_realize(vdev->disk, vdev->partition);
> > > > > +
> > > > > +if (!conf->blk) {
> > > > > +error_setg(errp, "drive property not set");
> > > > > +return;
> > > > > +}
> > > > > +
> > > > > +if (!blk_is_inserted(conf->blk)) {
> > > > > +error_setg(errp, "device needs media, but drive is empty");
> > > > > +return;
> > > > > +}
> > > >
> > > > Hm, the code below suggests that you support CD-ROMs. Don't you want
> > to
> > > > support media change as well then? Which would mean that you need to
> > > > support empty drives.
> > >
> > > Yes, that's a good point. I should get rid of that check.
> > 
> > Or rather apply it only to hard disks. And for empty CDs, you'll
> > probably need to create an empty BlockBackend (the !conf->blk case).
> > Just check the IDE and/or SCSI code for comparison.
> > 
> > > >
> > > > > +if (!blkconf_apply_backend_options(conf, blk_is_read_only(conf-
> > > > >blk),
> > > > > +   false, errp)) {
> > > > > +return;
> > > > > +}
> > > > > +
> > > > > +if (!blkconf_geometry(conf, NULL, 65535, 255, 255, errp)) {
> > > > > +return;
> > > > > +}
> > > > > +
> > > > > +dinfo = blk_legacy_dinfo(conf->blk);
> > > > > +is_cdrom = (dinfo && dinfo->media_cd);
> > > >
> > > > It's called legacy for a reason. Don't use this in new devices.
> > > >
> > > > The proper way is to have two different devices for hard disks and CDs
> > > > (like scsi-hd and scsi-cd).
> > >
> > > ...or presumably I could have a property? The legacy init code could
> > > then set it based on the drive info.
> > 
> > Technically yes, but why would that be a good way to model things? I
> > mean, it's true that xen-qdisk is not real hardware, but I've ne

Re: [Xen-devel] [PATCH 14/18] xen: add implementations of xen-qdisk connect and disconnect functions...

2018-11-29 Thread Paul Durrant

> -Original Message-
> From: Kevin Wolf [mailto:kw...@redhat.com]
> Sent: 29 November 2018 10:46
> To: Paul Durrant 
> Cc: qemu-bl...@nongnu.org; qemu-de...@nongnu.org; xen-
> de...@lists.xenproject.org; Stefano Stabellini ;
> Anthony Perard ; Max Reitz 
> Subject: Re: [PATCH 14/18] xen: add implementations of xen-qdisk connect
> and disconnect functions...
> 
> Am 29.11.2018 um 10:33 hat Paul Durrant geschrieben:
> > > -Original Message-
> > > From: Kevin Wolf [mailto:kw...@redhat.com]
> > > Sent: 29 November 2018 09:01
> > > To: Paul Durrant 
> > > Cc: qemu-bl...@nongnu.org; qemu-de...@nongnu.org; xen-
> > > de...@lists.xenproject.org; Stefano Stabellini
> ;
> > > Anthony Perard ; Max Reitz
> 
> > > Subject: Re: [PATCH 14/18] xen: add implementations of xen-qdisk
> connect
> > > and disconnect functions...
> > >
> > > Am 28.11.2018 um 17:40 hat Paul Durrant geschrieben:
> > > > > -Original Message-
> > > > > From: Kevin Wolf [mailto:kw...@redhat.com]
> > > > > Sent: 28 November 2018 16:35
> > > > > To: Paul Durrant 
> > > > > Cc: qemu-bl...@nongnu.org; qemu-de...@nongnu.org; xen-
> > > > > de...@lists.xenproject.org; Stefano Stabellini
> > > ;
> > > > > Anthony Perard ; Max Reitz
> > > 
> > > > > Subject: Re: [PATCH 14/18] xen: add implementations of xen-qdisk
> > > connect
> > > > > and disconnect functions...
> > > > >
> > > > > Am 21.11.2018 um 16:12 hat Paul Durrant geschrieben:
> > > > > > ...and wire in the dataplane.
> > > > > >
> > > > > > This patch adds the remaining code to make the xen-qdisk
> XenDevice
> > > > > > functional. The parameters that a block frontend expects to find
> are
> > > > > > populated in the backend xenstore area, and the 'ring-ref' and
> > > > > > 'event-channel' values specified in the frontend xenstore area
> are
> > > > > > mapped/bound and used to set up the dataplane.
> > > > > >
> > > > > > Signed-off-by: Paul Durrant 
> > > > > > ---
> > > > > > Cc: Stefano Stabellini 
> > > > > > Cc: Anthony Perard 
> > > > > > Cc: Kevin Wolf 
> > > > > > Cc: Max Reitz 
> > > > > > ---
> > > > > >  hw/block/xen-qdisk.c   | 140
> > > > > +
> > > > > >  hw/xen/xen-bus.c   |  12 ++--
> > > > > >  include/hw/xen/xen-bus.h   |   8 +++
> > > > > >  include/hw/xen/xen-qdisk.h |  12 
> > > > > >  4 files changed, 166 insertions(+), 6 deletions(-)
> > > > > >
> > > > > > diff --git a/hw/block/xen-qdisk.c b/hw/block/xen-qdisk.c
> > > > > > index 35f7b70480..8c88393832 100644
> > > > > > --- a/hw/block/xen-qdisk.c
> > > > > > +++ b/hw/block/xen-qdisk.c
> > > > > > @@ -9,6 +9,10 @@
> > > > > >  #include "qapi/visitor.h"
> > > > > >  #include "hw/hw.h"
> > > > > >  #include "hw/xen/xen-qdisk.h"
> > > > > > +#include "sysemu/blockdev.h"
> > > > > > +#include "sysemu/block-backend.h"
> > > > > > +#include "sysemu/iothread.h"
> > > > > > +#include "dataplane/xen-qdisk.h"
> > > > > >  #include "trace.h"
> > > > > >
> > > > > >  static char *xen_qdisk_get_name(XenDevice *xendev, Error
> **errp)
> > > > > > @@ -23,6 +27,11 @@ static void xen_qdisk_realize(XenDevice
> *xendev,
> > > > > Error **errp)
> > > > > >  {
> > > > > >  XenQdiskDevice *qdiskdev = XEN_QDISK_DEVICE(xendev);
> > > > > >  XenQdiskVdev *vdev = &qdiskdev->vdev;
> > > > > > +BlockConf *conf = &qdiskdev->conf;
> > > > > > +DriveInfo *dinfo;
> > > > > > +bool is_cdrom;
> > > > > > +unsigned int info;
> > > > > > +int64_t size;
> > > > > >
> > > > > >  if (!vdev->valid) {
> > > > > >  error_setg(errp, "vdev property not set");
> > > > > > @@ -30,13 +39,134 @@ static void xen_qdisk_realize(XenDevice
> > > *xendev,
> > > > > Error **errp)
> > > > > >  }
> > > > > >
> > > > > >  trace_xen_qdisk_realize(vdev->disk, vdev->partition);
> > > > > > +
> > > > > > +if (!conf->blk) {
> > > > > > +error_setg(errp, "drive property not set");
> > > > > > +return;
> > > > > > +}
> > > > > > +
> > > > > > +if (!blk_is_inserted(conf->blk)) {
> > > > > > +error_setg(errp, "device needs media, but drive is
> empty");
> > > > > > +return;
> > > > > > +}
> > > > >
> > > > > Hm, the code below suggests that you support CD-ROMs. Don't you
> want
> > > to
> > > > > support media change as well then? Which would mean that you need
> to
> > > > > support empty drives.
> > > >
> > > > Yes, that's a good point. I should get rid of that check.
> > >
> > > Or rather apply it only to hard disks. And for empty CDs, you'll
> > > probably need to create an empty BlockBackend (the !conf->blk case).
> > > Just check the IDE and/or SCSI code for comparison.
> > >
> > > > >
> > > > > > +if (!blkconf_apply_backend_options(conf,
> blk_is_read_only(conf-
> > > > > >blk),
> > > > > > +   false, errp)) {
> > > > > > +return;
> > > > > > +}
> > > > > > +
> > > > > > +if (!blkconf_geometry(conf, NULL, 65535, 255, 255, errp)) {
> > > > > > +

Re: [Xen-devel] Xen stable-4.11 crash when trying to start a VM on fedora 29

2018-11-29 Thread Jan Beulich

>>> On 28.11.18 at 17:46,  wrote:
> I managed to compile and install Xen from source,using the  stable-4.11 
> branch.
> 
> I configured my VMs in libvirt, but when I try to start one of them, the 
> host completely crashes after a few seconds (the VM is never started), and 
> reboots.
> 
> I ran dmesg -w before starting the VM, and the ouput
> is 2 general protection faults:
> https://gist.github.com/Wenzel/763b17fae773f7dd6b1d8b303187f3c0 

Without a full hypervisor log this is going to remain guesswork, but
could you check whether "pcid=no" and/or "pv-l1tf=no" on the Xen
boot command line help?

Jan



___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] [RFC 00/16] Old GIC (gic-vgic) optimizations for GICV2

2018-11-29 Thread Julien Grall


Hi,

On 29/11/2018 07:40, Andrii Anisov wrote:


Hello,

Again, I sent this cover letter only to myself. So, here it is, hope it does 
not break the thread. Sorry for the mess.


It is seems to be part of the thread. Thank you!




From: Andrii Anisov 
Sent: Wednesday, November 28, 2018 11:31 PM
Cc: Andrii Anisov
Subject: [RFC 00/16] Old GIC (gic-vgic) optimizations for GICV2
   


From: Andrii Anisov 

This patch series is an attempt to reduce IRQ latency with the
old GIC implementation (gic-vgic). These patches originally based
on XEN 4.10 release. The motivation was to improve benchmark
results of a system given to a customer for evaluation.


I will repeat what I said on patch #15 and expanding it a bit. This series
is touching the old vGIC we plan to decommissioned soon. Amongst may issues
with the old vGIC the majors one are the locking is fragile
and edge/level interrupts are not handled correctly.

Furthermore, a lot of those patches seem to borrow the ideas from the new vGIC.
So I think it would be more beneficial to focus on the new vGIC that will be
the only vGIC supported soon.



This patch series is tailored for GICv2 on RCAR H3. > Several
of the most controversial patches (i.e. LRs shadowing) were
not shared to the customer, and here are for comments and discussion.


We need benchmark and detailed explanation to understand where the latency
comes from and how every patches reduce the latency. Ideally I would like to
see the performance improvement done patch by patch so we can decide
which one are worth it.


I hope several patches from here could be upstreamed. Some as is,
others with modifications.

There are several simple ideas behind these changes:
     - reduce an excessive code (condition checks)


This makes sense.


     - drop an excessive peripheral register accesses


This needs discussion. I will have a look at the patches.


     - if not reduce, then move an excessive code out of spinlocks
     - if not drop, then move an excessive register
   accesses out of spinlocks

In general spinlocks should not be contended otherwise you have other
issues. Moving code of spinlock is only worth it if that code may
return in most of the cases.



With porting existing patches, I've got another idea that accessing
PER_CPU variables like `current` and `lr_mask` is not really cheap.

How come? The resulting code for lr_mask is pretty trivial.

C function:

uint64_t foo(void)
{
return this_cpu(lr_mask);
}

Assembly code:

44:   9000adrpx0, 0 
48:   9100add x0, x0, #0x0
4c:   d53cd041mrs x1, tpidr_el2
50:   f8616800ldr x0, [x0, x1]
54:   d65f03c0ret

So this is one memory load. Leaving aside nested virt, TPDIR_EL2
should be fast to access.


So it should produce faster code keeping `lr_mask` solely in
`struct vcpu` and pass `current` as a function parameter instead
of calculation it each time in called functions.


So you are going to save 3 instructions... But that's just a drop in the bucket 
compare to the rest of code. Actually, I would be surprised if you actually 
notice it on your benchmark.


Overall, I think that there are other places to look at where
the benefits will be much bigger. We can discuss about smaller one if the 
performance are still low.


Cheers,



Andrii Anisov (15):
   hack: drop GIC v3 support
   vgic: move pause_flags check out of vgic spinlock
   vgic: move irq_to_pending out of lock
   gic-vgic: Drop an excessive clear_lrs
   gic: drop interrupts enabling on interrupts processing
   gic-vgic:vgic: do not keep disabled IRQs in any of queues
   gic: separate ppi processing
   gic-vgic:vgic: avoid excessive conversions
   gic:vgic:gic-vgic: introduce non-atomic bitops
   irq: skip action avalability check for guest's IRQ
   gic-v2: Write HCR only on change
   gic-vgic: skip irqs locking
   hack: arm/domain: simplify context restore from idle vcpu
   hack: move gicv2 LRs reads and writes out of spinlocks
   gic: vgic: align frequently accessed data by cache line size

Julien Grall (1):
   xen/arm: Re-enable interrupt later in the trap path

  xen/arch/arm/arm64/entry.S   | 11 ++---
  xen/arch/arm/configs/arm64_defconfig |  1 +
  xen/arch/arm/domain.c    | 25 ++
  xen/arch/arm/gic-v2.c    | 63 
  xen/arch/arm/gic-v3-lpi.c    |  2 +
  xen/arch/arm/gic-v3.c    |  2 +
  xen/arch/arm/gic-vgic.c  | 84 +++-
  xen/arch/arm/gic.c   | 32 +++--
  xen/arch/arm/irq.c   | 46 +++---
  xen/arch/arm/traps.c |  6 +++
  xen/arch/arm/vgic-v3-its.c   |  3 +-
  xen/arch/arm/vgic.c  | 93 ++--
  xen/arch/arm/vgic/vgic.c |  2 +
  xen/include/asm-arm/config.h |  2 +-
  xen/include/asm-arm/gic.h    | 10 ++--
  xen/include/asm-arm/irq.h    |

[Xen-devel] [distros-debian-wheezy test] 75626: all pass

2018-11-29 Thread Platform Team regression test user

flight 75626 distros-debian-wheezy real [real]
http://osstest.xensource.com/osstest/logs/75626/

Perfect :-)
All tests in this flight passed as required
baseline version:
 flight   75617

jobs:
 build-amd64  pass
 build-armhf  pass
 build-i386   pass
 build-amd64-pvopspass
 build-armhf-pvopspass
 build-i386-pvops pass
 test-amd64-amd64-amd64-wheezy-netboot-pvgrub pass
 test-amd64-i386-i386-wheezy-netboot-pvgrub   pass
 test-amd64-i386-amd64-wheezy-netboot-pygrub  pass
 test-amd64-amd64-i386-wheezy-netboot-pygrub  pass



sg-report-flight on osstest.xs.citrite.net
logs: /home/osstest/logs
images: /home/osstest/images

Logs, config files, etc. are available at
http://osstest.xensource.com/osstest/logs

Test harness code can be found at
http://xenbits.xensource.com/gitweb?p=osstest.git;a=summary


Push not applicable.


___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] [RFC 04/16] vgic: move irq_to_pending out of lock

2018-11-29 Thread Julien Grall




On 28/11/2018 21:31, Andrii Anisov wrote:

From: Andrii Anisov 

For GICV2 pending_irq allocation is not concurrent, so reduce
some code under lock.


So this is reverting part of 5f66da659060563df8481a86c017f07455095045
"ARM: vGIC: move irq_to_pending() calls under the VGIC VCPU lock".

The vGIC is meant to be common, so I don't think the patch would be correct for 
GICv3.


However, this function can only be called with SPI (PPI/SGI can't migrate). All 
but one SPI are associated to a physical interrupts. So I am not entirely 
convinced this will have any benefits on GICv2. Do you see any contention on the 
lock?




Signed-off-by: Andrii Anisov 
---
  xen/arch/arm/vgic.c | 9 -
  1 file changed, 4 insertions(+), 5 deletions(-)

diff --git a/xen/arch/arm/vgic.c b/xen/arch/arm/vgic.c
index a64633f..37857b1 100644
--- a/xen/arch/arm/vgic.c
+++ b/xen/arch/arm/vgic.c
@@ -267,17 +267,16 @@ bool vgic_migrate_irq(struct vcpu *old, struct vcpu *new, 
unsigned int irq)
  ASSERT(!is_lpi(irq));
  #endif
  
-spin_lock_irqsave(&old->arch.vgic.lock, flags);

-
  p = irq_to_pending(old, irq);
  
  /* nothing to do for virtual interrupts */

  if ( p->desc == NULL )
  {
-spin_unlock_irqrestore(&old->arch.vgic.lock, flags);
  return true;
  }
  
+spin_lock_irqsave(&old->arch.vgic.lock, flags);

+
  /* migration already in progress, no need to do anything */
  if ( test_bit(GIC_IRQ_GUEST_MIGRATING, &p->status) )
  {
@@ -569,8 +568,6 @@ void vgic_inject_irq(struct domain *d, struct vcpu *v, 
unsigned int virq,
  return;
  }
  
-spin_lock_irqsave(&v->arch.vgic.lock, flags);

-
  n = irq_to_pending(v, virq);
  #ifdef CONFIG_GICV3
  /* If an LPI has been removed, there is nothing to inject here. */
@@ -581,6 +578,8 @@ void vgic_inject_irq(struct domain *d, struct vcpu *v, 
unsigned int virq,
  }
  #endif
  
+spin_lock_irqsave(&v->arch.vgic.lock, flags);

+
  set_bit(GIC_IRQ_GUEST_QUEUED, &n->status);
  
  if ( !list_empty(&n->inflight) )




Cheers,

--
Julien Grall

___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] [ARM] gvirt_to_maddr fails when DomU is created

2018-11-29 Thread Julien Grall


Hi Juergen,

I have noticed on the last few threads that your e-mails don't get threaded
correctly. Looking at the source, I can't find the In-Reply-To tag. Do you have 
any issue with your e-mail?


On 29/11/2018 09:51, Juergen Gross wrote:

On 29/11/2018 10:39, Jan Beulich wrote:

On 28.11.18 at 01:05,  wrote:

update_runstate_area() using a virtual address is a complete misfeature,
and the sooner we can replace it, the better.  It's history is with x86
PV guests, where the early ABIs were designed in terms of Linux's
copy_{to,from}_user().

It is similarly broken in x86 with meltdown mitigations, as well as SMAP
considerations (PAN in ARM, iirc).

We've got two options.  Invent a new API which takes a gfn/gaddr, or
retrofit the API to be "you pass a virtual address, we translate to
gfn/gaddr, then update that".  Perhaps both.

When this was last discussed, I think the "onetime translate to
gfn/gaddr" was a good enough compatibility to cope with existing guests,
but that we should have a more clean way for modern guests.


I don't think a one-time translate can be a replacement without
the guest giving its consent, at which point the guest could as
well be switched to whatever the replacement interface is going
to be. Aiui (the introduction of the functionality predating my
involvement with Xen) the original idea was that guests would
specifically be allowed to context switch the mapping of the
involved linear address.

Furthermore for x86-64 guests we already have logic to deal
with the case where there is no present mapping at the time
the write is to occur, as that's a common situation with x86-64
guest user mode running with kernel page tables removed.
For x86-32 guests with Meltdown mitigation in place something
similar might indeed be needed. Whether something like this is
doable on ARM depends on whether Xen has a way to know
at which point missing mappings re-appear.


In any case we want some interface using gfn/gaddr for performance
reasons: Always having to do a vaddr->gaddr translation is expensive
(especially for HVM/PVH and probably on ARM, too), so we should try to
avoid that.


On Arm, performance is one of the reason, but not the main one. Using Virtual 
Address is by default unreliable as you have no way to prevent the guest to play 
with its page-tables. For instance, Arm requires to go through an invalid state 
when updating the page-tables entry under certain condition (e.g superpage <-> 
small mapping).


The long-term goal is to use guest physical address for all hypercalls rather 
than guest virtual address.


Cheers,

--
Julien Grall

___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] [PATCH 1/3] xen: Introduce shared buffer helpers for page directory...

2018-11-29 Thread Oleksandr Andrushchenko


ping

On 11/22/18 12:02 PM, Oleksandr Andrushchenko wrote:

From: Oleksandr Andrushchenko 

based frontends. Currently the frontends which implement
similar code for sharing big buffers between frontend and
backend are para-virtualized DRM and sound drivers.
Both define the same way to share grant references of a
data buffer with the corresponding backend with little
differences.

Move shared code into a helper module, so there is a single
implementation of the same functionality for all.

Signed-off-by: Oleksandr Andrushchenko 
---
  drivers/xen/Kconfig |   3 +
  drivers/xen/Makefile|   1 +
  drivers/xen/xen-front-pgdir-shbuf.c | 553 
  include/xen/xen-front-pgdir-shbuf.h |  89 +
  4 files changed, 646 insertions(+)
  create mode 100644 drivers/xen/xen-front-pgdir-shbuf.c
  create mode 100644 include/xen/xen-front-pgdir-shbuf.h

diff --git a/drivers/xen/Kconfig b/drivers/xen/Kconfig
index 815b9e9bb975..838b66a9a0e7 100644
--- a/drivers/xen/Kconfig
+++ b/drivers/xen/Kconfig
@@ -340,4 +340,7 @@ config XEN_SYMS
  config XEN_HAVE_VPMU
 bool
  
+config XEN_FRONT_PGDIR_SHBUF

+   tristate
+
  endmenu
diff --git a/drivers/xen/Makefile b/drivers/xen/Makefile
index 3e542f60f29f..c48927a58e10 100644
--- a/drivers/xen/Makefile
+++ b/drivers/xen/Makefile
@@ -44,3 +44,4 @@ xen-gntdev-y  := gntdev.o
  xen-gntdev-$(CONFIG_XEN_GNTDEV_DMABUF)+= gntdev-dmabuf.o
  xen-gntalloc-y:= gntalloc.o
  xen-privcmd-y := privcmd.o privcmd-buf.o
+obj-$(CONFIG_XEN_FRONT_PGDIR_SHBUF)+= xen-front-pgdir-shbuf.o
diff --git a/drivers/xen/xen-front-pgdir-shbuf.c 
b/drivers/xen/xen-front-pgdir-shbuf.c
new file mode 100644
index ..48a658dc7ccf
--- /dev/null
+++ b/drivers/xen/xen-front-pgdir-shbuf.c
@@ -0,0 +1,553 @@
+// SPDX-License-Identifier: GPL-2.0 OR MIT
+
+/*
+ * Xen frontend/backend page directory based shared buffer
+ * helper module.
+ *
+ * Copyright (C) 2018 EPAM Systems Inc.
+ *
+ * Author: Oleksandr Andrushchenko 
+ */
+
+#include 
+#include 
+#include 
+
+#include 
+#include 
+#include 
+#include 
+#include 
+
+#include 
+
+#ifndef GRANT_INVALID_REF
+/*
+ * FIXME: usage of grant reference 0 as invalid grant reference:
+ * grant reference 0 is valid, but never exposed to a PV driver,
+ * because of the fact it is already in use/reserved by the PV console.
+ */
+#define GRANT_INVALID_REF  0
+#endif
+
+/**
+ * This structure represents the structure of a shared page
+ * that contains grant references to the pages of the shared
+ * buffer. This structure is common to many Xen para-virtualized
+ * protocols at include/xen/interface/io/
+ */
+struct xen_page_directory {
+   grant_ref_t gref_dir_next_page;
+   grant_ref_t gref[1]; /* Variable length */
+};
+
+/**
+ * Shared buffer ops which are differently implemented
+ * depending on the allocation mode, e.g. if the buffer
+ * is allocated by the corresponding backend or frontend.
+ * Some of the operations.
+ */
+struct xen_front_pgdir_shbuf_ops {
+   /*
+* Calculate number of grefs required to handle this buffer,
+* e.g. if grefs are required for page directory only or the buffer
+* pages as well.
+*/
+   void (*calc_num_grefs)(struct xen_front_pgdir_shbuf *buf);
+
+   /* Fill page directory according to para-virtual display protocol. */
+   void (*fill_page_dir)(struct xen_front_pgdir_shbuf *buf);
+
+   /* Claim grant references for the pages of the buffer. */
+   int (*grant_refs_for_buffer)(struct xen_front_pgdir_shbuf *buf,
+grant_ref_t *priv_gref_head, int gref_idx);
+
+   /* Map grant references of the buffer. */
+   int (*map)(struct xen_front_pgdir_shbuf *buf);
+
+   /* Unmap grant references of the buffer. */
+   int (*unmap)(struct xen_front_pgdir_shbuf *buf);
+};
+
+/**
+ * Get granted reference to the very first page of the
+ * page directory. Usually this is passed to the backend,
+ * so it can find/fill the grant references to the buffer's
+ * pages.
+ *
+ * \param buf shared buffer which page directory is of interest.
+ * \return granted reference to the very first page of the
+ * page directory.
+ */
+grant_ref_t
+xen_front_pgdir_shbuf_get_dir_start(struct xen_front_pgdir_shbuf *buf)
+{
+   if (!buf->grefs)
+   return GRANT_INVALID_REF;
+
+   return buf->grefs[0];
+}
+EXPORT_SYMBOL_GPL(xen_front_pgdir_shbuf_get_dir_start);
+
+/**
+ * Map granted references of the shared buffer.
+ *
+ * Depending on the shared buffer mode of allocation
+ * (be_alloc flag) this can either do nothing (for buffers
+ * shared by the frontend itself) or map the provided granted
+ * references onto the backing storage (buf->pages).
+ *
+ * \param buf shared buffer which grants to be maped.
+ * \return zero on success or a negative number on failure.
+ */
+int xen_front_pgdir_sh

Re: [Xen-devel] [PATCH] tools/firmware: update OVMF Makefile

2018-11-29 Thread Anthony PERARD

On Wed, Nov 28, 2018 at 05:43:33PM +, Wei Liu wrote:
> OVMF has become dependent on OpenSSL, which it is included as a submodule.
> Initialise submodules before building.
> 
> Signed-off-by: Wei Liu 
> ---
> This should fix the build breakage for OVMF branch in OSSTEST.
> 
> Cc: Anthony PERARD 
> Cc: Ian Jackson 
> ---
>  tools/firmware/ovmf-makefile | 1 +
>  1 file changed, 1 insertion(+)
> 
> diff --git a/tools/firmware/ovmf-makefile b/tools/firmware/ovmf-makefile
> index 2838744461..3de2fc0300 100644
> --- a/tools/firmware/ovmf-makefile
> +++ b/tools/firmware/ovmf-makefile
> @@ -16,6 +16,7 @@ all: build
>  
>  .PHONY: build
>  build:
> + $(GIT) submodule update --init --recursive
>   OvmfPkg/build.sh -a X64 -b $(TARGET) -n 4
>   cp Build/OvmfX64/$(TARGET)_GCC*/FV/OVMF.fd ovmf.bin
>  

What about the release tarball? Do we includes OVMF in it?

Also, doesn't osstest needs some updates? I forgot if there is something
to do when projects have submodules.

-- 
Anthony PERARD

___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

[Xen-devel] [PATCH for-4.12 0/2] xen/arm: mm: Boot fixes

2018-11-29 Thread Julien Grall

Hi all,

This patch series fixes 2 bug in the boot code for the memory management.

The first patch should resolve Xen stall when setting SCTLR.XN on some
platforms.

The second patch should allow to boot Xen again the Hikey board.

Cheers,

Cc: Shameerali Kolothum Thodi 
Cc: Jan-Peter Larsson 
Cc: Matthew Daley 

Julien Grall (2):
  xen/arm: mm: Set-up page permission for Xen mappings earlier on
  xen/arm: Stop relocating Xen

 xen/arch/arm/arm32/head.S | 54 +++---
 xen/arch/arm/arm64/head.S | 50 +--
 xen/arch/arm/mm.c | 67 ++-
 xen/arch/arm/setup.c  | 65 +++--
 xen/include/asm-arm/mm.h  |  2 +-
 5 files changed, 39 insertions(+), 199 deletions(-)

-- 
2.11.0


___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

[Xen-devel] [PATCH 2/2] xen/arm: Stop relocating Xen

2018-11-29 Thread Julien Grall

At the moment, Xen is relocated towards the end of the memory. While
this has the advantage to free space in low memory, the code is not
compliant with the break-before-make because it requires to switch
between two sets of page-table. This is not entirely trivial to fix as
it would require us to go through an identity mapping and disabling MMU.

Furthermore, it looks like that some platform (such as the Hikey960)
may not be able to bring-up secondary CPUs if the entry is too high.

I don't believe the low memory is an issue because Xen is quite tiny
(< 2MB). So the best solution is to stop relocating Xen. This has the
advantage to simplify the code and should speed-up the boot as relocation
is not necessary anymore.

Note that the break-before-make issue is not fixed by this patch.

Signed-off-by: Julien Grall 
Reported-by: Matthew Daley 
---
 xen/arch/arm/arm32/head.S | 54 +++
 xen/arch/arm/arm64/head.S | 50 +---
 xen/arch/arm/mm.c | 20 ---
 xen/arch/arm/setup.c  | 65 +++
 xen/include/asm-arm/mm.h  |  2 +-
 5 files changed, 18 insertions(+), 173 deletions(-)

diff --git a/xen/arch/arm/arm32/head.S b/xen/arch/arm/arm32/head.S
index 93b51e9ef2..390a505e05 100644
--- a/xen/arch/arm/arm32/head.S
+++ b/xen/arch/arm/arm32/head.S
@@ -469,58 +469,12 @@ fail:   PRINT("- Boot failed -\r\n")
 GLOBAL(_end_boot)
 
 /*
- * Copy Xen to new location and switch TTBR
+ * Switch TTBR
  * r1:r0   ttbr
- * r2  source address
- * r3  destination address
- * [sp]=>r4length
  *
- * Source and destination must be word aligned, length is rounded up
- * to a 16 byte boundary.
- *
- * MUST BE VERY CAREFUL when saving things to RAM over the copy
+ * TODO: This code does not comply with break-before-make.
  */
-ENTRY(relocate_xen)
-push {r4,r5,r6,r7,r8,r9,r10,r11}
-
-ldr   r4, [sp, #8*4]/* Get 4th argument from stack */
-
-/* Copy 16 bytes at a time using:
- * r5:  counter
- * r6:  data
- * r7:  data
- * r8:  data
- * r9:  data
- * r10: source
- * r11: destination
- */
-mov   r5, r4
-mov   r10, r2
-mov   r11, r3
-1:  ldmia r10!, {r6, r7, r8, r9}
-stmia r11!, {r6, r7, r8, r9}
-
-subs  r5, r5, #16
-bgt   1b
-
-/* Flush destination from dcache using:
- * r5: counter
- * r6: step
- * r7: vaddr
- */
-dsb/* So the CPU issues all writes to the range */
-
-mov   r5, r4
-ldr   r6, =dcache_line_bytes /* r6 := step */
-ldr   r6, [r6]
-mov   r7, r3
-
-1:  mcr   CP32(r7, DCCMVAC)
-
-add   r7, r7, r6
-subs  r5, r5, r6
-bgt   1b
-
+ENTRY(switch_ttbr)
 dsb/* Ensure the flushes happen before
 * continuing */
 isb/* Ensure synchronization with previous
@@ -543,8 +497,6 @@ ENTRY(relocate_xen)
 dsb/* Ensure completion of TLB+BP flush */
 isb
 
-pop {r4, r5,r6,r7,r8,r9,r10,r11}
-
 mov pc, lr
 
 #ifdef CONFIG_EARLY_PRINTK
diff --git a/xen/arch/arm/arm64/head.S b/xen/arch/arm/arm64/head.S
index 9428c3f5a2..4589a37874 100644
--- a/xen/arch/arm/arm64/head.S
+++ b/xen/arch/arm/arm64/head.S
@@ -605,52 +605,14 @@ fail:   PRINT("- Boot failed -\r\n")
 
 GLOBAL(_end_boot)
 
-/* Copy Xen to new location and switch TTBR
- * x0ttbr
- * x1source address
- * x2destination address
- * x3length
+/*
+ * Switch TTBR
  *
- * Source and destination must be word aligned, length is rounded up
- * to a 16 byte boundary.
+ * x0ttbr
  *
- * MUST BE VERY CAREFUL when saving things to RAM over the copy */
-ENTRY(relocate_xen)
-/* Copy 16 bytes at a time using:
- *   x9: counter
- *   x10: data
- *   x11: data
- *   x12: source
- *   x13: destination
- */
-mov x9, x3
-mov x12, x1
-mov x13, x2
-
-1:  ldp x10, x11, [x12], #16
-stp x10, x11, [x13], #16
-
-subsx9, x9, #16
-bgt 1b
-
-/* Flush destination from dcache using:
- * x9: counter
- * x10: step
- * x11: vaddr
- */
-dsb   sy/* So the CPU issues all writes to the range */
-
-mov   x9, x3
-ldr   x10, =dcache_line_bytes /* x10 := step */
-ldr   x10, [x10]
-mov   x11, x2
-
-1:  dccvac, x11
-
-add   x11, x11, x10
-subs  x9, x9, x10
-bgt   1b
-
+ * TODO: This code does not comply with break-before-make.
+ */
+ENTRY(switch_ttbr)
 dsb   sy /* Ensure the flushes happen before
   * continuing */
 isb

[Xen-devel] [PATCH 1/2] xen/arm: mm: Set-up page permission for Xen mappings earlier on

2018-11-29 Thread Julien Grall

Xen mapping is first create using a 2MB page and then shatterred in 4KB
page for fine-graine permission. However, it is not safe to break-down
superpage page without going to an intermediate step invalidating
the entry.

As we are changing Xen mappings, we cannot go through the intermediate
step. The only solution is to create Xen mapping using 4KB entries
directly. As the Xen should always access the mappings according with
the runtime permission, it is then possible to set-up the permissions
while create the mapping.

We are still playing with the fire as there are still some
break-before-make issue in setup_pagetables (i.e switch between 2 sets of
page-tables). But it should slightly be better than the current state.

Signed-off-by: Julien Grall 
Reported-by: Shameerali Kolothum Thodi 
Reported-by: Jan-Peter Larsson 

---
I had few reports on new platforms where Xen reliably stale as soon as
SCTLR.WXN is turned on. This likely happens because of not complying
with Break-Before-Make when setting-up the permission as we
break-down a superpage to 4KB mappings.
---
 xen/arch/arm/mm.c | 49 ++---
 1 file changed, 22 insertions(+), 27 deletions(-)

diff --git a/xen/arch/arm/mm.c b/xen/arch/arm/mm.c
index 987fcb9162..2556e57a99 100644
--- a/xen/arch/arm/mm.c
+++ b/xen/arch/arm/mm.c
@@ -649,11 +649,31 @@ void __init setup_pagetables(unsigned long 
boot_phys_offset, paddr_t xen_paddr)
 }
 #endif
 
+/* Break up the Xen mapping into 4k pages and protect them separately. */
+for ( i = 0; i < LPAE_ENTRIES; i++ )
+{
+mfn_t mfn = mfn_add(maddr_to_mfn(xen_paddr), i);
+unsigned long va = XEN_VIRT_START + (i << PAGE_SHIFT);
+
+if ( !is_kernel(va) )
+break;
+pte = mfn_to_xen_entry(mfn, MT_NORMAL);
+pte.pt.table = 1; /* 4k mappings always have this bit set */
+if ( is_kernel_text(va) || is_kernel_inittext(va) )
+{
+pte.pt.xn = 0;
+pte.pt.ro = 1;
+}
+if ( is_kernel_rodata(va) )
+pte.pt.ro = 1;
+xen_xenmap[i] = pte;
+}
+
 /* Initialise xen second level entries ... */
 /* ... Xen's text etc */
 
-pte = mfn_to_xen_entry(maddr_to_mfn(xen_paddr), MT_NORMAL);
-pte.pt.xn = 0;/* Contains our text mapping! */
+pte = pte_of_xenaddr((vaddr_t)xen_xenmap);
+pte.pt.table = 1;
 xen_second[second_table_offset(XEN_VIRT_START)] = pte;
 
 /* ... Fixmap */
@@ -693,31 +713,6 @@ void __init setup_pagetables(unsigned long 
boot_phys_offset, paddr_t xen_paddr)
 clear_table(boot_second);
 clear_table(boot_third);
 
-/* Break up the Xen mapping into 4k pages and protect them separately. */
-for ( i = 0; i < LPAE_ENTRIES; i++ )
-{
-mfn_t mfn = mfn_add(maddr_to_mfn(xen_paddr), i);
-unsigned long va = XEN_VIRT_START + (i << PAGE_SHIFT);
-if ( !is_kernel(va) )
-break;
-pte = mfn_to_xen_entry(mfn, MT_NORMAL);
-pte.pt.table = 1; /* 4k mappings always have this bit set */
-if ( is_kernel_text(va) || is_kernel_inittext(va) )
-{
-pte.pt.xn = 0;
-pte.pt.ro = 1;
-}
-if ( is_kernel_rodata(va) )
-pte.pt.ro = 1;
-write_pte(xen_xenmap + i, pte);
-/* No flush required here as page table is not hooked in yet. */
-}
-
-pte = pte_of_xenaddr((vaddr_t)xen_xenmap);
-pte.pt.table = 1;
-write_pte(xen_second + second_linear_offset(XEN_VIRT_START), pte);
-/* TLBFLUSH and ISB would be needed here, but wait until we set WXN */
-
 /* From now on, no mapping may be both writable and executable. */
 WRITE_SYSREG32(READ_SYSREG32(SCTLR_EL2) | SCTLR_WXN, SCTLR_EL2);
 /* Flush everything after setting WXN bit. */
-- 
2.11.0


___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

[Xen-devel] [PATCH 1/2] xen/arm: mm: Set-up page permission for Xen mappings earlier on

2018-11-29 Thread Julien Grall

Xen mapping is first create using a 2MB page and then shatterred in 4KB
page for fine-graine permission. However, it is not safe to break-down
superpage page without going to an intermediate step invalidating
the entry.

As we are changing Xen mappings, we cannot go through the intermediate
step. The only solution is to create Xen mapping using 4KB entries
directly. As the Xen should always access the mappings according with
the runtime permission, it is then possible to set-up the permissions
while create the mapping.

We are still playing with the fire as there are still some
break-before-make issue in setup_pagetables (i.e switch between 2 sets of
page-tables). But it should slightly be better than the current state.

Signed-off-by: Julien Grall 
Reported-by: Shameerali Kolothum Thodi 
Reported-by: Jan-Peter Larsson 

---
I had few reports on new platforms where Xen reliably stale as soon as
SCTLR.WXN is turned on. This likely happens because of not complying
with Break-Before-Make when setting-up the permission as we
break-down a superpage to 4KB mappings.
---
 xen/arch/arm/mm.c | 49 ++---
 1 file changed, 22 insertions(+), 27 deletions(-)

diff --git a/xen/arch/arm/mm.c b/xen/arch/arm/mm.c
index 987fcb9162..2556e57a99 100644
--- a/xen/arch/arm/mm.c
+++ b/xen/arch/arm/mm.c
@@ -649,11 +649,31 @@ void __init setup_pagetables(unsigned long 
boot_phys_offset, paddr_t xen_paddr)
 }
 #endif
 
+/* Break up the Xen mapping into 4k pages and protect them separately. */
+for ( i = 0; i < LPAE_ENTRIES; i++ )
+{
+mfn_t mfn = mfn_add(maddr_to_mfn(xen_paddr), i);
+unsigned long va = XEN_VIRT_START + (i << PAGE_SHIFT);
+
+if ( !is_kernel(va) )
+break;
+pte = mfn_to_xen_entry(mfn, MT_NORMAL);
+pte.pt.table = 1; /* 4k mappings always have this bit set */
+if ( is_kernel_text(va) || is_kernel_inittext(va) )
+{
+pte.pt.xn = 0;
+pte.pt.ro = 1;
+}
+if ( is_kernel_rodata(va) )
+pte.pt.ro = 1;
+xen_xenmap[i] = pte;
+}
+
 /* Initialise xen second level entries ... */
 /* ... Xen's text etc */
 
-pte = mfn_to_xen_entry(maddr_to_mfn(xen_paddr), MT_NORMAL);
-pte.pt.xn = 0;/* Contains our text mapping! */
+pte = pte_of_xenaddr((vaddr_t)xen_xenmap);
+pte.pt.table = 1;
 xen_second[second_table_offset(XEN_VIRT_START)] = pte;
 
 /* ... Fixmap */
@@ -693,31 +713,6 @@ void __init setup_pagetables(unsigned long 
boot_phys_offset, paddr_t xen_paddr)
 clear_table(boot_second);
 clear_table(boot_third);
 
-/* Break up the Xen mapping into 4k pages and protect them separately. */
-for ( i = 0; i < LPAE_ENTRIES; i++ )
-{
-mfn_t mfn = mfn_add(maddr_to_mfn(xen_paddr), i);
-unsigned long va = XEN_VIRT_START + (i << PAGE_SHIFT);
-if ( !is_kernel(va) )
-break;
-pte = mfn_to_xen_entry(mfn, MT_NORMAL);
-pte.pt.table = 1; /* 4k mappings always have this bit set */
-if ( is_kernel_text(va) || is_kernel_inittext(va) )
-{
-pte.pt.xn = 0;
-pte.pt.ro = 1;
-}
-if ( is_kernel_rodata(va) )
-pte.pt.ro = 1;
-write_pte(xen_xenmap + i, pte);
-/* No flush required here as page table is not hooked in yet. */
-}
-
-pte = pte_of_xenaddr((vaddr_t)xen_xenmap);
-pte.pt.table = 1;
-write_pte(xen_second + second_linear_offset(XEN_VIRT_START), pte);
-/* TLBFLUSH and ISB would be needed here, but wait until we set WXN */
-
 /* From now on, no mapping may be both writable and executable. */
 WRITE_SYSREG32(READ_SYSREG32(SCTLR_EL2) | SCTLR_WXN, SCTLR_EL2);
 /* Flush everything after setting WXN bit. */
-- 
2.11.0


___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

[Xen-devel] [PATCH 2/2] xen/arm: Stop relocating Xen

2018-11-29 Thread Julien Grall

At the moment, Xen is relocated towards the end of the memory. While
this has the advantage to free space in low memory, the code is not
compliant with the break-before-make because it requires to switch
between two sets of page-table. This is not entirely trivial to fix as
it would require us to go through an identity mapping and disabling MMU.

Furthermore, it looks like that some platform (such as the Hikey960)
may not be able to bring-up secondary CPUs if the entry is too high.

I don't believe the low memory is an issue because Xen is quite tiny
(< 2MB). So the best solution is to stop relocating Xen. This has the
advantage to simplify the code and should speed-up the boot as relocation
is not necessary anymore.

Note that the break-before-make issue is not fixed by this patch.

Signed-off-by: Julien Grall 
Reported-by: Matthew Daley 
---
 xen/arch/arm/arm32/head.S | 54 +++
 xen/arch/arm/arm64/head.S | 50 +---
 xen/arch/arm/mm.c | 20 ---
 xen/arch/arm/setup.c  | 65 +++
 xen/include/asm-arm/mm.h  |  2 +-
 5 files changed, 18 insertions(+), 173 deletions(-)

diff --git a/xen/arch/arm/arm32/head.S b/xen/arch/arm/arm32/head.S
index 93b51e9ef2..390a505e05 100644
--- a/xen/arch/arm/arm32/head.S
+++ b/xen/arch/arm/arm32/head.S
@@ -469,58 +469,12 @@ fail:   PRINT("- Boot failed -\r\n")
 GLOBAL(_end_boot)
 
 /*
- * Copy Xen to new location and switch TTBR
+ * Switch TTBR
  * r1:r0   ttbr
- * r2  source address
- * r3  destination address
- * [sp]=>r4length
  *
- * Source and destination must be word aligned, length is rounded up
- * to a 16 byte boundary.
- *
- * MUST BE VERY CAREFUL when saving things to RAM over the copy
+ * TODO: This code does not comply with break-before-make.
  */
-ENTRY(relocate_xen)
-push {r4,r5,r6,r7,r8,r9,r10,r11}
-
-ldr   r4, [sp, #8*4]/* Get 4th argument from stack */
-
-/* Copy 16 bytes at a time using:
- * r5:  counter
- * r6:  data
- * r7:  data
- * r8:  data
- * r9:  data
- * r10: source
- * r11: destination
- */
-mov   r5, r4
-mov   r10, r2
-mov   r11, r3
-1:  ldmia r10!, {r6, r7, r8, r9}
-stmia r11!, {r6, r7, r8, r9}
-
-subs  r5, r5, #16
-bgt   1b
-
-/* Flush destination from dcache using:
- * r5: counter
- * r6: step
- * r7: vaddr
- */
-dsb/* So the CPU issues all writes to the range */
-
-mov   r5, r4
-ldr   r6, =dcache_line_bytes /* r6 := step */
-ldr   r6, [r6]
-mov   r7, r3
-
-1:  mcr   CP32(r7, DCCMVAC)
-
-add   r7, r7, r6
-subs  r5, r5, r6
-bgt   1b
-
+ENTRY(switch_ttbr)
 dsb/* Ensure the flushes happen before
 * continuing */
 isb/* Ensure synchronization with previous
@@ -543,8 +497,6 @@ ENTRY(relocate_xen)
 dsb/* Ensure completion of TLB+BP flush */
 isb
 
-pop {r4, r5,r6,r7,r8,r9,r10,r11}
-
 mov pc, lr
 
 #ifdef CONFIG_EARLY_PRINTK
diff --git a/xen/arch/arm/arm64/head.S b/xen/arch/arm/arm64/head.S
index 9428c3f5a2..4589a37874 100644
--- a/xen/arch/arm/arm64/head.S
+++ b/xen/arch/arm/arm64/head.S
@@ -605,52 +605,14 @@ fail:   PRINT("- Boot failed -\r\n")
 
 GLOBAL(_end_boot)
 
-/* Copy Xen to new location and switch TTBR
- * x0ttbr
- * x1source address
- * x2destination address
- * x3length
+/*
+ * Switch TTBR
  *
- * Source and destination must be word aligned, length is rounded up
- * to a 16 byte boundary.
+ * x0ttbr
  *
- * MUST BE VERY CAREFUL when saving things to RAM over the copy */
-ENTRY(relocate_xen)
-/* Copy 16 bytes at a time using:
- *   x9: counter
- *   x10: data
- *   x11: data
- *   x12: source
- *   x13: destination
- */
-mov x9, x3
-mov x12, x1
-mov x13, x2
-
-1:  ldp x10, x11, [x12], #16
-stp x10, x11, [x13], #16
-
-subsx9, x9, #16
-bgt 1b
-
-/* Flush destination from dcache using:
- * x9: counter
- * x10: step
- * x11: vaddr
- */
-dsb   sy/* So the CPU issues all writes to the range */
-
-mov   x9, x3
-ldr   x10, =dcache_line_bytes /* x10 := step */
-ldr   x10, [x10]
-mov   x11, x2
-
-1:  dccvac, x11
-
-add   x11, x11, x10
-subs  x9, x9, x10
-bgt   1b
-
+ * TODO: This code does not comply with break-before-make.
+ */
+ENTRY(switch_ttbr)
 dsb   sy /* Ensure the flushes happen before
   * continuing */
 isb

Re: [Xen-devel] [PATCH] tools/firmware: update OVMF Makefile

2018-11-29 Thread Wei Liu

On Thu, Nov 29, 2018 at 11:31:41AM +, Anthony PERARD wrote:
> On Wed, Nov 28, 2018 at 05:43:33PM +, Wei Liu wrote:
> > OVMF has become dependent on OpenSSL, which it is included as a submodule.
> > Initialise submodules before building.
> > 
> > Signed-off-by: Wei Liu 
> > ---
> > This should fix the build breakage for OVMF branch in OSSTEST.
> > 
> > Cc: Anthony PERARD 
> > Cc: Ian Jackson 
> > ---
> >  tools/firmware/ovmf-makefile | 1 +
> >  1 file changed, 1 insertion(+)
> > 
> > diff --git a/tools/firmware/ovmf-makefile b/tools/firmware/ovmf-makefile
> > index 2838744461..3de2fc0300 100644
> > --- a/tools/firmware/ovmf-makefile
> > +++ b/tools/firmware/ovmf-makefile
> > @@ -16,6 +16,7 @@ all: build
> >  
> >  .PHONY: build
> >  build:
> > +   $(GIT) submodule update --init --recursive
> > OvmfPkg/build.sh -a X64 -b $(TARGET) -n 4
> > cp Build/OvmfX64/$(TARGET)_GCC*/FV/OVMF.fd ovmf.bin
> >  
> 
> What about the release tarball? Do we includes OVMF in it?

Yes we do. But this should work because the Makefile is also shipped.
What does qemu-xen do regarding its submodules? OVMF should just follow
suite.

> 
> Also, doesn't osstest needs some updates? I forgot if there is something
> to do when projects have submodules.

Yes there is some special arrangement for libvirt. Not sure what needs
to be done for OVMF since it is part of xen.git.

Ian?

Wei.

> 
> -- 
> Anthony PERARD

___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] [PATCH 9/9] libxl: Kill QEMU with "reaper" ruid

2018-11-29 Thread George Dunlap


> On Nov 28, 2018, at 6:33 PM, George Dunlap  wrote:
> 
>>> -ret = setresuid(dm_uid, dm_uid, 0);
>>> +fd = open(lockfile, O_RDWR|O_CREAT, 0666);
>>> +if (fd < 0) {
>>> +/* All other errno: EBADF, EINVAL, ENOLCK, EWOULDBLOCK */
>>> +LOGED(ERROR, domid,
>>> +  "unexpected error while trying to open lockfile %s, 
>>> errno=%d",
>>> +  lockfile, errno);
>>> +goto kill;
>> 
>> More gotos!  I doubt this error handling is right.
>> 
>> I'm also not convinced that it is sensible to handle the case where we
>> have multiple per-domain uids but no reaper uid.  This turns host
>> configuration errors into systems that are less secure than they
>> should be in a really obscure way.
> 
> At this point, we have a target_uid but no way of getting a lock for 
> reaper_uid.  We have three options:
> 
> 1. Don’t kill(-1) at all.
> 2. Try to  kill(-1) with setresuid(target_uid, target_uid, 0)
> 3. kill(-1) with setresuid(reaper_uid, target_uid, 0) without holding the 
> lock.
> 
> #1 means that a rogue qemu will not be destroyed.
> 
> #2 means that there’s a race, whereby sometimes the rogue qemu is destroyed, 
> and sometimes the reaper process is destroyed by the rogue qemu first.
> 
> #3 means there’s a race, whereby sometimes everything works fine, sometimes 
> both the rogue qemu and *the reaper process from another domain* is 
> destroyed, and sometimes this reaper process is killed by the reaper process 
> from another domain (leaving the rogue qemu alive).
> 
> I think #1 is obviously the best option.

Sorry, this should say, ‘I think #2 is the best option’.

 -G
___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] [PATCH v2 0/3] Remove tmem

2018-11-29 Thread Wei Liu

On Wed, Nov 28, 2018 at 09:50:33PM -0500, Konrad Rzeszutek Wilk wrote:
> On Wed, Nov 28, 2018 at 01:58:03PM +, Wei Liu wrote:
> > It is agreed that tmem can be removed from xen.git. See the thread starting 
> > 
> >   
> > from .
> > 
> > In this version:
> > 
> > 1. Remove some residuals from previous version and fix all build errors
> >discovered by Gitlab CI.
> > 2. Swap the order of patches to make sure bisection still works. This
> >is verified by calling
> >   `./automation/scripts/build-test.sh origin/staging HEAD`
> > 3. Make sure Xen still boots and passes all XTF tests after the removal.
> > 4. Keep public/tmem.h.
> 
> Please also remove the entry in the MAINTAINERS file.

Sure. I will fold that into the hypervisor patch.

> 
> Acked-by: Konrad Rzeszutek Wilk 

Thank you!

Wei.

___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

[Xen-devel] [xen-4.7-testing test] 130826: regressions - FAIL

2018-11-29 Thread osstest service owner

flight 130826 xen-4.7-testing real [real]
http://logs.test-lab.xenproject.org/osstest/logs/130826/

Regressions :-(

Tests which did not succeed and are blocking,
including tests which could not be run:
 build-arm64  broken  in 130773
 build-arm64-pvopsbroken  in 130773
 build-arm64-xsm  broken  in 130773
 test-xtf-amd64-amd64-5 50 xtf/test-hvm64-lbr-tsx-vmentry fail REGR. vs. 129540

Tests which are failing intermittently (not blocking):
 test-xtf-amd64-amd64-3   50 xtf/test-hvm64-lbr-tsx-vmentry fail pass in 130773

Regressions which are regarded as allowable (not blocking):
 build-arm64-xsm 2 hosts-allocate broken in 130773 REGR. vs. 129540
 build-arm64 2 hosts-allocate broken in 130773 REGR. vs. 129540
 build-arm64-pvops   2 hosts-allocate broken in 130773 REGR. vs. 129540

Tests which did not succeed, but are not blocking:
 test-arm64-arm64-xl-credit1   1 build-check(1)   blocked in 130773 n/a
 build-arm64-libvirt   1 build-check(1)   blocked in 130773 n/a
 test-arm64-arm64-xl-xsm   1 build-check(1)   blocked in 130773 n/a
 test-arm64-arm64-libvirt-xsm  1 build-check(1)   blocked in 130773 n/a
 test-arm64-arm64-xl   1 build-check(1)   blocked in 130773 n/a
 test-arm64-arm64-xl-credit2   1 build-check(1)   blocked in 130773 n/a
 build-arm64-xsm  3 capture-logs broken in 130773 blocked in 129540
 build-arm64-pvops3 capture-logs broken in 130773 blocked in 129540
 build-arm64  3 capture-logs broken in 130773 blocked in 129540
 test-xtf-amd64-amd64-3   69 xtf/test-hvm64-xsa-278  fail blocked in 129540
 test-xtf-amd64-amd64-5 69 xtf/test-hvm64-xsa-278 fail in 130773 blocked in 
129540
 test-xtf-amd64-amd64-1 50 xtf/test-hvm64-lbr-tsx-vmentry fail in 130773 like 
129540
 test-amd64-amd64-xl-qemuu-win7-amd64 17 guest-stopfail like 129540
 test-amd64-i386-xl-qemut-win7-amd64 17 guest-stop fail like 129540
 test-amd64-i386-xl-qemuu-win7-amd64 17 guest-stop fail like 129540
 test-amd64-amd64-xl-qemut-win7-amd64 17 guest-stopfail like 129540
 test-amd64-i386-xl-qemut-ws16-amd64 17 guest-stop fail like 129540
 test-armhf-armhf-libvirt-raw 13 saverestore-support-checkfail  like 129540
 test-amd64-i386-xl-qemuu-ws16-amd64 17 guest-stop fail like 129540
 test-amd64-amd64-xl-qemuu-ws16-amd64 17 guest-stopfail like 129540
 test-armhf-armhf-libvirt 14 saverestore-support-checkfail  like 129540
 test-amd64-amd64-xl-rtds 10 debian-install   fail  like 129540
 test-armhf-armhf-xl-rtds 16 guest-start/debian.repeatfail  like 129540
 test-amd64-amd64-xl-qemut-ws16-amd64 17 guest-stopfail like 129540
 test-amd64-i386-libvirt  13 migrate-support-checkfail   never pass
 test-amd64-i386-libvirt-xsm  13 migrate-support-checkfail   never pass
 test-amd64-amd64-libvirt-xsm 13 migrate-support-checkfail   never pass
 test-amd64-amd64-libvirt 13 migrate-support-checkfail   never pass
 test-amd64-i386-libvirt-qemuu-debianhvm-amd64-xsm 11 migrate-support-check 
fail never pass
 test-amd64-amd64-libvirt-qemuu-debianhvm-amd64-xsm 11 migrate-support-check 
fail never pass
 test-armhf-armhf-xl-arndale  13 migrate-support-checkfail   never pass
 test-armhf-armhf-xl-arndale  14 saverestore-support-checkfail   never pass
 test-amd64-amd64-qemuu-nested-amd 17 debian-hvm-install/l1/l2  fail never pass
 test-amd64-amd64-libvirt-vhd 12 migrate-support-checkfail   never pass
 test-armhf-armhf-xl-credit2  13 migrate-support-checkfail   never pass
 test-armhf-armhf-xl-credit2  14 saverestore-support-checkfail   never pass
 test-armhf-armhf-xl-cubietruck 13 migrate-support-checkfail never pass
 test-armhf-armhf-xl-cubietruck 14 saverestore-support-checkfail never pass
 test-armhf-armhf-xl-credit1  13 migrate-support-checkfail   never pass
 test-armhf-armhf-xl-credit1  14 saverestore-support-checkfail   never pass
 test-armhf-armhf-xl-multivcpu 13 migrate-support-checkfail  never pass
 test-armhf-armhf-xl-multivcpu 14 saverestore-support-checkfail  never pass
 test-armhf-armhf-libvirt-raw 12 migrate-support-checkfail   never pass
 test-armhf-armhf-libvirt 13 migrate-support-checkfail   never pass
 test-armhf-armhf-xl  13 migrate-support-checkfail   never pass
 test-armhf-armhf-xl  14 saverestore-support-checkfail   never pass
 test-armhf-armhf-xl-rtds 13 migrate-support-checkfail   never pass
 test-armhf-armhf-xl-rtds 14 saverestore-support-checkfail   never pass
 test-armhf-armhf-xl-vhd  12 migrate-support-checkfail   never pass
 test-armhf-armhf-xl-vhd  13 saverestore-support-checkfail

Re: [Xen-devel] [PATCH V10 5/5] p2m: change_type_range: Only invalidate mapped gfns

2018-11-29 Thread Razvan Cojocaru

On 11/29/18 12:07 PM, Jan Beulich wrote:
 On 28.11.18 at 22:56,  wrote:
>> change_range_type() invalidates gfn ranges to lazily change the type
>> of a range of gfns, and also modifies the logdirty rangesets of that
>> p2m. At the moment, it clips both down by the hostp2m.
>>
>> While this will result in correct behavior, it's not entirely efficient,
>> since invalidated entries outside that range will, on fault, simply be
>> modified back to "empty" before faulting normally again.
>>
>> Separate out the calculation of the two ranges.  Keep using the
>> hostp2m's max_mapped_pfn to clip the logdirty ranges, but use the
>> current p2m's max_mapped_pfn to further clip the invalidation range
>> for alternate p2ms.
>>
>> Signed-off-by: George Dunlap 
>> Signed-off-by: Razvan Cojocaru 
>> Reviewed-by: Jan Beulich 
>>
>> ---
>> CC: George Dunlap 
>> CC: Jan Beulich 
>> CC: Andrew Cooper 
>> CC: Wei Liu 
>> CC: "Roger Pau Monné" 
>>
>> ---
>> Changes since V9:
>>  - Corrected function name in patch subject.
> 
> Funny: I didn't even notice the issue in the patch title. I had
> pointed it out against the start of the description, where it
> still exists.

Right, I'll correct that as well.


Thanks,
Razvan

___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] [PATCH 8/9] libxl: Kill QEMU by uid when possible

2018-11-29 Thread Wei Liu

On Wed, Nov 28, 2018 at 03:57:58PM +, Anthony PERARD wrote:
> On Fri, Nov 23, 2018 at 05:18:59PM +, George Dunlap wrote:
> > On 11/23/18 5:15 PM, George Dunlap wrote:
> > Does libxl__qmp_cleanup() need to be called after the kill() happens?
> > If not, we could put this before the kill() and avoid having two call sites.
> 
> QEMU is supposed to create monitor sockets before the guest is running,
> even before it drops priviledge, so I don't think it matter when we `rm`
> those qmp sockets. There are only useful to libxl anyway, once libxl
> don't needs them they can be removed.
> 
> So, before kill() should be fine.

With this scheme, my question is supposedly there is a rogue QEMU, will
it be able to recreate these sockets again by forking so we may end up
having some garbage lying around after it has been killed?

Wei.

> 
> -- 
> Anthony PERARD

___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] [PATCH] tools/xenstore: domain can sometimes disappear when destroying connection

2018-11-29 Thread Wei Liu

On Mon, Nov 26, 2018 at 01:22:04PM +, Petre Eftime wrote:
> There is a circular link formed between domain and a connection. In certain
> circustances, when conn is freed, domain is also freed, which leads to use

circumstances.

> after free when trying to set the conn field in domain to null.
> 
> Signed-off-by: Petre Eftime 


> ---
>  tools/xenstore/xenstored_domain.c | 9 -
>  1 file changed, 8 insertions(+), 1 deletion(-)
> 
> diff --git a/tools/xenstore/xenstored_domain.c 
> b/tools/xenstore/xenstored_domain.c
> index fa6655033a..f085d40476 100644
> --- a/tools/xenstore/xenstored_domain.c
> +++ b/tools/xenstore/xenstored_domain.c
> @@ -222,6 +222,7 @@ static void domain_cleanup(void)
>  {
>   xc_dominfo_t dominfo;
>   struct domain *domain;
> + struct connection *tmp_conn;

This can be moved to a smaller scope inside the if statement.

Anyway, I'm happy to fix these issues while committing:

Acked-by: Wei Liu 

>   int notify = 0;
>  
>   again:
> @@ -238,8 +239,14 @@ static void domain_cleanup(void)
>   continue;
>   }
>   if (domain->conn) {
> - talloc_unlink(talloc_autofree_context(), domain->conn);
> + /*
> +  * In certain circumstances conn owns domain and
> +  * domain will be freed when conn is unlinked.
> +  */
> + tmp_conn = domain->conn;
>   domain->conn = NULL;
> +
> + talloc_unlink(talloc_autofree_context(), tmp_conn);
>   notify = 0; /* destroy_domain() fires the watch */
>   goto again;
>   }
> -- 
> 2.16.5
> 
> 
> 
> 
> Amazon Development Center (Romania) S.R.L. registered office: 27A Sf. Lazar 
> Street, UBC5, floor 2, Iasi, Iasi County, 700045, Romania. Registered in 
> Romania. Registration number J22/2621/2005.
> 

___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] [PATCH] tools/firmware: update OVMF Makefile

2018-11-29 Thread Anthony PERARD

On Thu, Nov 29, 2018 at 11:39:54AM +, Wei Liu wrote:
> On Thu, Nov 29, 2018 at 11:31:41AM +, Anthony PERARD wrote:
> > What about the release tarball? Do we includes OVMF in it?
> 
> Yes we do. But this should work because the Makefile is also shipped.

The fact that the Makefile is shipped doesn't matter... Also of course
it would work, as long as the machine that build xen have access to
internet.

If we ship OVMF sources code, we need to include its submodules, so we
need to ship openssl source code. I think I've look into removing this
build dependency, but I don't think that was possible with simple build
options.

> What does qemu-xen do regarding its submodules? OVMF should just follow
> suite.

Nothing, qemu-xen isn't a project. If you meant "what does xen.git do
regarding qemu-xen's submodules" then the answer is, we call a script
from qemu to have it export whatever it needs to build (the fact that
there are submodules are transparent to users building qemu, like xen
do).

OVMF/edk2 is a bit special, its build system doesn't know about external
source code and teaching it to do something about it is probably not
going to be simple.

So we can't treat OVMF like QEMU, regarding the build, and releases.

We probably need a script to export git-submodules in xen.git.

-- 
Anthony PERARD

___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] [PATCH 8/9] libxl: Kill QEMU by uid when possible

2018-11-29 Thread George Dunlap



> On Nov 29, 2018, at 11:55 AM, Wei Liu  wrote:
> 
> On Wed, Nov 28, 2018 at 03:57:58PM +, Anthony PERARD wrote:
>> On Fri, Nov 23, 2018 at 05:18:59PM +, George Dunlap wrote:
>>> On 11/23/18 5:15 PM, George Dunlap wrote:
>>> Does libxl__qmp_cleanup() need to be called after the kill() happens?
>>> If not, we could put this before the kill() and avoid having two call sites.
>> 
>> QEMU is supposed to create monitor sockets before the guest is running,
>> even before it drops priviledge, so I don't think it matter when we `rm`
>> those qmp sockets. There are only useful to libxl anyway, once libxl
>> don't needs them they can be removed.
>> 
>> So, before kill() should be fine.
> 
> With this scheme, my question is supposedly there is a rogue QEMU, will
> it be able to recreate these sockets again by forking so we may end up
> having some garbage lying around after it has been killed?

No; it should at that point be deprivileged and chrooted into a directory owned 
by root; so it shouldn’t be able to create any new sockets.

It wouldn’t be terribly hard to have a common “exit” to both the kill-by-pid 
and kill-by-uid paths that did it once, but it would involve adding Yet Another 
Function; and each additional function makes the code a little bit more 
difficult to follow.

 -George
___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] [RFC 00/16] Old GIC (gic-vgic) optimizations for GICV2

2018-11-29 Thread Andre Przywara

On Thu, 29 Nov 2018 07:40:00 +
Andrii Anisov  wrote:

> Hello,
> 
> Again, I sent this cover letter only to myself. So, here it is, hope
> it does not break the thread. Sorry for the mess.
> 
> 
> From: Andrii Anisov 
> Sent: Wednesday, November 28, 2018 11:31 PM
> Cc: Andrii Anisov
> Subject: [RFC 00/16] Old GIC (gic-vgic) optimizations for GICV2
>   
> 
> From: Andrii Anisov 
> 
> This patch series is an attempt to reduce IRQ latency with the
> old GIC implementation (gic-vgic). These patches originally based
> on XEN 4.10 release. The motivation was to improve benchmark
> results of a system given to a customer for evaluation.

Have you tried the new VGIC? I am curious if you see a performance drop
compared to the old VGIC. It *should* be faster in the common case of
no (or maybe one) pending interrupts, because it doesn't bother to
touch LRs at all.

> This patch series is tailored for GICv2 on RCAR H3. Several
> of the most controversial patches (i.e. LRs shadowing) were
> not shared to the customer, and here are for comments and discussion.
> I hope several patches from here could be upstreamed. Some as is,
> others with modifications.
> 
> There are several simple ideas behind these changes:
>     - reduce an excessive code (condition checks)
>     - drop an excessive peripheral register accesses
>     - if not reduce, then move an excessive code out of spinlocks
>     - if not drop, then move an excessive register
>   accesses out of spinlocks

So some of the patches contain some good ideas, but as Julien said: we
would need some numbers to prove that they are worth it.

Cheers,
Andre.
 
> With porting existing patches, I've got another idea that accessing
> PER_CPU variables like `current` and `lr_mask` is not really cheap.
> So it should produce faster code keeping `lr_mask` solely in
> `struct vcpu` and pass `current` as a function parameter instead
> of calculation it each time in called functions.
> 
> Andrii Anisov (15):
>   hack: drop GIC v3 support
>   vgic: move pause_flags check out of vgic spinlock
>   vgic: move irq_to_pending out of lock
>   gic-vgic: Drop an excessive clear_lrs
>   gic: drop interrupts enabling on interrupts processing
>   gic-vgic:vgic: do not keep disabled IRQs in any of queues
>   gic: separate ppi processing
>   gic-vgic:vgic: avoid excessive conversions
>   gic:vgic:gic-vgic: introduce non-atomic bitops
>   irq: skip action avalability check for guest's IRQ
>   gic-v2: Write HCR only on change
>   gic-vgic: skip irqs locking
>   hack: arm/domain: simplify context restore from idle vcpu
>   hack: move gicv2 LRs reads and writes out of spinlocks
>   gic: vgic: align frequently accessed data by cache line size
> 
> Julien Grall (1):
>   xen/arm: Re-enable interrupt later in the trap path
> 
>  xen/arch/arm/arm64/entry.S   | 11 ++---
>  xen/arch/arm/configs/arm64_defconfig |  1 +
>  xen/arch/arm/domain.c    | 25 ++
>  xen/arch/arm/gic-v2.c    | 63 
>  xen/arch/arm/gic-v3-lpi.c    |  2 +
>  xen/arch/arm/gic-v3.c    |  2 +
>  xen/arch/arm/gic-vgic.c  | 84
> +++- xen/arch/arm/gic.c
> | 32 +++-- xen/arch/arm/irq.c   | 46
> +++--- xen/arch/arm/traps.c |  6 +++
>  xen/arch/arm/vgic-v3-its.c   |  3 +-
>  xen/arch/arm/vgic.c  | 93
> ++--
> xen/arch/arm/vgic/vgic.c |  2 +
> xen/include/asm-arm/config.h |  2 +-
> xen/include/asm-arm/gic.h    | 10 ++--
> xen/include/asm-arm/irq.h    |  3 ++
> xen/include/asm-arm/vgic.h   | 24 ++
> xen/include/xen/sched.h  |  1 + 18 files changed, 299
> insertions(+), 111 deletions(-)
> 


___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] [RFC 02/16] hack: drop GIC v3 support

2018-11-29 Thread Andre Przywara

On Wed, 28 Nov 2018 23:31:57 +0200
Andrii Anisov  wrote:

Hi,

> From: Andrii Anisov 
> 
> This reduces some code and conditions in an IRQ processing path,
> and opens way to further code reduction.

While I understand that this is some sort of a hack, I am commenting
just on this patch to demonstrate some issues I see all over the series.
So please apply those ideas to the other patches as well if applicable.

> Also insert compilation errors into gicv3 code, because it would
> not compile or work with following patches.
> 
> Signed-off-by: Andrii Anisov 
> ---
>  xen/arch/arm/configs/arm64_defconfig |  1 +
>  xen/arch/arm/gic-v3-lpi.c|  2 ++
>  xen/arch/arm/gic-v3.c|  2 ++
>  xen/arch/arm/gic-vgic.c  | 13 -
>  xen/arch/arm/gic.c   |  6 ++
>  xen/arch/arm/vgic-v3-its.c   |  1 +
>  xen/arch/arm/vgic.c  | 20 
>  xen/arch/arm/vgic/vgic.c |  2 ++
>  xen/include/asm-arm/irq.h|  2 ++
>  xen/include/asm-arm/vgic.h   | 17 +++--
>  10 files changed, 59 insertions(+), 7 deletions(-)
> 
> diff --git a/xen/arch/arm/configs/arm64_defconfig
> b/xen/arch/arm/configs/arm64_defconfig index e69de29..452c7ac 100644
> --- a/xen/arch/arm/configs/arm64_defconfig
> +++ b/xen/arch/arm/configs/arm64_defconfig
> @@ -0,0 +1 @@
> +# CONFIG_GICV3 is not set
> diff --git a/xen/arch/arm/gic-v3-lpi.c b/xen/arch/arm/gic-v3-lpi.c
> index e8c6e15..be64e17 100644
> --- a/xen/arch/arm/gic-v3-lpi.c
> +++ b/xen/arch/arm/gic-v3-lpi.c
> @@ -32,6 +32,8 @@
>  #include 
>  #include 
>  
> +#error "The current gic/vgic/domain code does not support GICv3"
> +

I didn't check too thoroughly, but why do you need this? Your code
changes below seem to be guarded by #ifdef CONFIG_GICV3, so this file
(and the next) wouldn't even be build.

>  /*
>   * There could be a lot of LPIs on the host side, and they always go
> to
>   * a guest. So having a struct irq_desc for each of them would be
> wasteful diff --git a/xen/arch/arm/gic-v3.c b/xen/arch/arm/gic-v3.c
> index 6fbc106..8e835b5 100644
> --- a/xen/arch/arm/gic-v3.c
> +++ b/xen/arch/arm/gic-v3.c
> @@ -44,6 +44,8 @@
>  #include 
>  #include 
>  
> +#error "The current gic/vgic/domain code does not support GICv3"
> +
>  /* Global state */
>  static struct {
>  void __iomem *map_dbase;  /* Mapped address of distributor
> registers */ diff --git a/xen/arch/arm/gic-vgic.c
> b/xen/arch/arm/gic-vgic.c index 990399c..869987a 100644
> --- a/xen/arch/arm/gic-vgic.c
> +++ b/xen/arch/arm/gic-vgic.c
> @@ -36,7 +36,9 @@ static inline void gic_set_lr(int lr, struct
> pending_irq *p, {
>  ASSERT(!local_irq_is_enabled());
>  
> +#ifdef CONFIG_GICV3
>  clear_bit(GIC_IRQ_GUEST_PRISTINE_LPI, &p->status);
> +#endif
>  
>  gic_hw_ops->update_lr(lr, p->irq, p->priority,
>p->desc ? p->desc->irq : INVALID_IRQ,
> state); @@ -77,9 +79,11 @@ void gic_raise_inflight_irq(struct vcpu
> *v, unsigned int virtual_irq) {
>  struct pending_irq *n = irq_to_pending(v, virtual_irq);
>  
> +#ifdef CONFIG_GICV3
>  /* If an LPI has been removed meanwhile, there is nothing left
> to raise. */ if ( unlikely(!n) )
>  return;
> +#endif

So are you sure that this really saves you something? This check boils
down to a:
cbz x0, 670 
in assembly, which is basically free on modern CPUs.
Surprisingly removing this removes some more instructions from the
function, it would be interesting to know why.

In general I get the impression you optimise things just by looking at
the C code. Does saving a few instructions here and there really make a
difference? For instance I'd say that any non-memory instructions are
not worth to think about, and any stack accesses are probably caught in
the L1 cache.

In general I believe your performance drop is due to *exits* from the
guests due to the h/w interrupts, not necessarily because of the VGIC
code. So to reduce latency I think it would be more worthwhile to see
if we can reduce the world switch time, by avoiding unnessary
save/restores.
At least that saved a lot in KVM (with VHE), although the conditions
there are quite different.

>  ASSERT(spin_is_locked(&v->arch.vgic.lock));
>  
> @@ -112,13 +116,14 @@ static unsigned int gic_find_unused_lr(struct
> vcpu *v, {
>  unsigned int nr_lrs = gic_get_nr_lrs();
>  unsigned long *lr_mask = (unsigned long *) &this_cpu(lr_mask);
> -struct gic_lr lr_val;
>  
>  ASSERT(spin_is_locked(&v->arch.vgic.lock));
>  
> +#ifdef CONFIG_GICV3

I am not against removing GICv3 code from the code path if that is
not configured, but obviously those #ifdef's directly in the code are
hideous.
If you can prove that this saves you something, you could think
factoring out those functions, and define them as empty if CONFIG_GICV3
is not defined.
In this case you could do (outside of this function):

#ifdef CONFIG_GICV3
#define lpi_pristine_bit_set(p)

Re: [Xen-devel] [PATCH v6 11/20] xen: setup hypercall page for PVH

2018-11-29 Thread Daniel Kiper

On Wed, Nov 28, 2018 at 02:55:21PM +0100, Juergen Gross wrote:
> Add the needed code to setup the hypercall page for calling into the
> Xen hypervisor.
>
> Import the XEN_HVM_DEBUGCONS_IOPORT define from Xen unstable into
> include/xen/arch-x86/xen.h
>
> Signed-off-by: Juergen Gross 
> Reviewed-by: Roger Pau Monné 
> ---
> V3: grub_xen_early_halt->grub_xen_panic (Roger Pau Monné)
> issue panic message (Roger Pau Monné)
> rewrite grub_xen_hypercall to avoid register variables (Daniel Kiper)
> V5: Use XEN_HVM_DEBUGCONS_IOPORT from Xen unstable (Roger Pau Monné)
> Issue "System halted!" in panic (Daniel Kiper)
> Clear interrupts and loop for halting (Roger Pau Monné, Daniel Kiper)
> Use only one dummy variable for hypercall asm statement
> V6: Added some comments (Daniel Kiper)
> Use "+x" constraints instead of dummy variable (Daniel Kiper)
> ---
>  grub-core/kern/i386/xen/pvh.c | 80 
> +++
>  include/xen/arch-x86/xen.h|  7 
>  2 files changed, 87 insertions(+)
>
> diff --git a/grub-core/kern/i386/xen/pvh.c b/grub-core/kern/i386/xen/pvh.c
> index 4f629b15e..a2554fb1d 100644
> --- a/grub-core/kern/i386/xen/pvh.c
> +++ b/grub-core/kern/i386/xen/pvh.c
> @@ -20,15 +20,95 @@
>  #include 
>  #include 
>  #include 
> +#include 
> +#include 
>  #include 
>  #include 
>  #include 
>
>  grub_uint64_t grub_rsdp_addr;
>
> +static char hypercall_page[GRUB_XEN_PAGE_SIZE]
> +  __attribute__ ((aligned (GRUB_XEN_PAGE_SIZE)));
> +
> +static grub_uint32_t xen_cpuid_base;
> +
> +static void
> +grub_xen_cons_msg (const char *msg)
> +{
> +  const char *c;
> +
> +  for (c = msg; *c; c++)
> +grub_outb (*c, XEN_HVM_DEBUGCONS_IOPORT);
> +}
> +
> +static void
> +grub_xen_panic (const char *msg)
> +{
> +  grub_xen_cons_msg (msg);
> +  grub_xen_cons_msg ("System halted!\n");
> +
> +  asm volatile ("cli");
> +
> +  while (1)
> +{
> +  asm volatile ("hlt");
> +}
> +}
> +
> +static void
> +grub_xen_cpuid_base (void)
> +{
> +  grub_uint32_t base, eax, signature[3];
> +
> +  for (base = 0x4000; base < 0x4001; base += 0x100)
> +{
> +  grub_cpuid (base, eax, signature[0], signature[1], signature[2]);
> +  if (!grub_memcmp ("XenVMMXenVMM", signature, 12) && (eax - base) >= 2)
> + {
> +   xen_cpuid_base = base;
> +   return;
> + }
> +}
> +
> +  grub_xen_panic ("Found no Xen signature!\n");
> +}
> +
> +static void
> +grub_xen_setup_hypercall_page (void)
> +{
> +  grub_uint32_t msr, addr, eax, ebx, ecx, edx;
> +
> +  /* Get base address of Xen-specific MSRs. */
> +  grub_cpuid (xen_cpuid_base + 2, eax, ebx, ecx, edx);
> +  msr = ebx;
> +  addr = (grub_uint32_t) (&hypercall_page);
> +
> +  /* Specify hypercall page address for Xen. */
> +  asm volatile ("wrmsr" : : "c" (msr), "a" (addr), "d" (0) : "memory");
> +}
> +
> +int
> +grub_xen_hypercall (grub_uint32_t callno, grub_uint32_t a0,
> + grub_uint32_t a1, grub_uint32_t a2,
> + grub_uint32_t a3, grub_uint32_t a4,
> + grub_uint32_t a5 __attribute__ ((unused)))
> +{
> +  grub_uint32_t res;
> +
> +  asm volatile ("call *%[callno]"
> + : "=a" (res), "+b" (a0), "+c" (a1), "+d" (a2),
> +   "+S" (a3), "+D" (a4)
> + : [callno] "a" (&hypercall_page[callno * 32])

I am not sure why you so tightly tie compiler hands using "a" constraint for
[callno]. IMO you can use "rm" here. However, this not change anything in
the output code, so, Reviewed-by: Daniel Kiper 

Daniel

___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] [RFC 10/16] gic:vgic:gic-vgic: introduce non-atomic bitops

2018-11-29 Thread Andre Przywara

On Wed, 28 Nov 2018 23:32:05 +0200
Andrii Anisov  wrote:

Hi,

> From: Andrii Anisov 
> 
> All bit operations for gic, vgic and gic-vgic are performed under
> spinlocks, so there is no need for atomic bit ops here, they only
> introduce excessive call to functions used more expensive exclusive
> ARM instructions.
> 
> Signed-off-by: Andrii Anisov 
> ---
>  xen/arch/arm/gic-vgic.c | 16 
>  xen/arch/arm/gic.c  | 16 
>  xen/arch/arm/vgic.c | 16 
>  3 files changed, 48 insertions(+)
> 
> diff --git a/xen/arch/arm/gic-vgic.c b/xen/arch/arm/gic-vgic.c
> index f0e6c6f..5b73bbd 100644
> --- a/xen/arch/arm/gic-vgic.c
> +++ b/xen/arch/arm/gic-vgic.c
> @@ -25,6 +25,22 @@
>  #include 
>  #include 
>  
> +#undef set_bit
> +#define set_bit(nr, addr) (*(addr) |= (1< +
> +#undef clear_bit
> +#define clear_bit(nr, addr) (*(addr) &= ~(1< +
> +#undef test_bit
> +#define test_bit(nr,addr) (*(addr) & (1< +
> +#undef test_and_clear_bit
> +#define test_and_clear_bit(nr,addr) ({\
> +bool _x; \
> +_x = (*(addr) & (1< +(*(addr) &= ~(1< +(_x);})
> +
>  #define lr_all_full() (this_cpu(lr_mask) == ((1 << gic_get_nr_lrs())
> - 1)) 
>  #undef GIC_DEBUG
> diff --git a/xen/arch/arm/gic.c b/xen/arch/arm/gic.c
> index 52e4270..d558059 100644
> --- a/xen/arch/arm/gic.c
> +++ b/xen/arch/arm/gic.c
> @@ -40,6 +40,22 @@
>  
>  DEFINE_PER_CPU(uint64_t, lr_mask);
>  
> +#undef set_bit
> +#define set_bit(nr, addr) (*(addr) |= (1< +
> +#undef clear_bit
> +#define clear_bit(nr, addr) (*(addr) &= ~(1< +
> +#undef test_bit
> +#define test_bit(nr,addr) (*(addr) & (1< +
> +#undef test_and_clear_bit
> +#define test_and_clear_bit(nr,addr) ({\
> +bool _x; \
> +_x = (*(addr) & (1< +(*(addr) &= ~(1< +(_x);})
> +
>  #undef GIC_DEBUG
>  
>  const struct gic_hw_operations *gic_hw_ops;
> diff --git a/xen/arch/arm/vgic.c b/xen/arch/arm/vgic.c
> index c142476..7691310 100644
> --- a/xen/arch/arm/vgic.c
> +++ b/xen/arch/arm/vgic.c
> @@ -33,6 +33,22 @@
>  #include 
>  #include 
>  
> +#undef set_bit

Nah, please don't do this. Can you show that atomic bit ops are a
problem? They shouldn't be expensive unless contended, also pretty
lightweight on small systems (single cluster).

But if you really think this is useful, why not go with the Linux way
of using __set_bit to provide a non-atomic version?
This would have the big advantage that you can replace them on a
case-by-case base, which is much less risky than unconditionally
replacing every (even future!) usage in the whole file.

Cheers,
Andre.

> +#define set_bit(nr, addr) (*(addr) |= (1< +
> +#undef clear_bit
> +#define clear_bit(nr, addr) (*(addr) &= ~(1< +
> +#undef test_bit
> +#define test_bit(nr,addr) (*(addr) & (1< +
> +#undef test_and_clear_bit
> +#define test_and_clear_bit(nr,addr) ({\
> +bool _x = (*(addr) & (1< +(*(addr) &= ~(1< +return (_x); \
> +})
> +
>  static inline struct vgic_irq_rank *vgic_get_rank(struct vcpu *v,
> int rank) {
>  if ( rank == 0 )


___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] [PATCH v6 16/20] grub-module-verifier: Ignore all_video for xenpvh

2018-11-29 Thread Daniel Kiper

On Wed, Nov 28, 2018 at 11:21:54PM +0100, Hans van Kranenburg wrote:
> On 11/28/18 2:55 PM, Juergen Gross wrote:
> > From: Hans van Kranenburg 
> >
> > This solves the build failing with "Error: no symbol table and no
> > .moddeps section"
> >
> > Also see:
> > - 6371e9c10433578bb236a8284ddb9ce9e201eb59
> > - https://savannah.gnu.org/bugs/?49012
> >
> > Signed-off-by: Hans van Kranenburg 
> > Reviewed-by: Daniel Kiper 
>
> Just a small detail... The xenpvh in the subject was not renamed to
> xen_pvh. That can probably be fixed up while committing. :)

Yeah, I can do that...

Daniel

___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

[Xen-devel] [PATCH] pvcalls-front: Use GFP_ATOMIC under spin_lock

2018-11-29 Thread Wen Yang

The problem is that we call this with a spin lock held.
The call tree is:
pvcalls_front_accept() holds bedata->socket_lock.
-> create_active()
-> __get_free_pages() uses GFP_KERNEL

The create_active() function is only called from pvcalls_front_accept()
with a spin_lock held, The allocation is not allowed to sleep and
GFP_KERNEL is not sufficient, it has to be ATOMIC.

This issue was detected by using the Coccinelle software.

Signed-off-by: Wen Yang 
CC: Julia Lawall 
CC: Boris Ostrovsky 
CC: Juergen Gross 
CC: Stefano Stabellini 
CC: xen-devel@lists.xenproject.org
CC: linux-ker...@vger.kernel.org
---
 drivers/xen/pvcalls-front.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/xen/pvcalls-front.c b/drivers/xen/pvcalls-front.c
index 2f11ca72a281..f2bbc06a0f7f 100644
--- a/drivers/xen/pvcalls-front.c
+++ b/drivers/xen/pvcalls-front.c
@@ -344,11 +344,11 @@ static int create_active(struct sock_mapping *map, int 
*evtchn)
init_waitqueue_head(&map->active.inflight_conn_req);
 
map->active.ring = (struct pvcalls_data_intf *)
-   __get_free_page(GFP_KERNEL | __GFP_ZERO);
+   __get_free_page(GFP_ATOMIC | __GFP_ZERO);
if (map->active.ring == NULL)
goto out_error;
map->active.ring->ring_order = PVCALLS_RING_ORDER;
-   bytes = (void *)__get_free_pages(GFP_KERNEL | __GFP_ZERO,
+   bytes = (void *)__get_free_pages(GFP_ATOMIC | __GFP_ZERO,
PVCALLS_RING_ORDER);
if (bytes == NULL)
goto out_error;
-- 
2.19.1


___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] [PATCH 3/9] libxl: Get rid of support for QEMU_USER_BASE (xen-qemuuser-domidNN)

2018-11-29 Thread Ian Jackson

George Dunlap writes ("Re: [PATCH 3/9] libxl: Get rid of support for 
QEMU_USER_BASE (xen-qemuuser-domidNN)"):
> I’d personally just as soon leave it out (and add it back in if someone asks 
> for it), but if you think it has value I can leave it in and do the work of 
> thinking about the logic.

OK, fine, I'm sold.

Acked-by: Ian Jackson 

___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] [PATCH 4/9] dm_depriv: Describe expected usage of device_model_user parameter

2018-11-29 Thread Ian Jackson

George Dunlap writes ("Re: [PATCH 4/9] dm_depriv: Describe expected usage of 
device_model_user parameter"):
> Because the feature is already implemented and working correctly according to 
> the pre-series semantics (AFAICT), but not documented (other than a comment 
> in libxl_types.idl saying, “is not ready for use yet” (which I suppose I 
> should remove while I’m at it)).

Oh!   Yes, please do remove that comment then.

> Given that...
> 
> > So, I guess,
> > 
> > Acked-by: Ian Jackson 
> 
> Does the Ack still stand?

Yes.

ian.

___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] [PATCH v6 00/20] xen: add pvh guest support

2018-11-29 Thread Daniel Kiper

On Wed, Nov 28, 2018 at 02:55:10PM +0100, Juergen Gross wrote:
> This patch series adds support for booting Linux as PVH guest.
>
> Similar to i386/xen and x86_64/xen platforms the new i386/xenpvh
> platform grub is booted as a standalone image directly by Xen.
>
> For booting Linux kernel it is using the standard linux kernel
> loader. The only modification of the linux loader is to pass the
> ACPI RSDP address via boot parameters to the kernel, as that table
> might not be located at the usual physical address just below 1MB.
>
> The related Linux kernel patches have been accepted for 4.20-rc4

Patch set LGTM. If there are no objections I will apply it in a week
or so.

Hans, may I add "Tested-by: Hans van Kranenburg " to
the patches?

Juergen, thank you for doing the work.

Daniel

___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] [PATCH 8/9] libxl: Kill QEMU by uid when possible

2018-11-29 Thread Ian Jackson

George Dunlap writes ("Re: [PATCH 8/9] libxl: Kill QEMU by uid when possible"):
> It wouldn’t be terribly hard to have a common “exit” to both the
> kill-by-pid and kill-by-uid paths that did it once, but it would
> involve adding Yet Another Function; and each additional function
> makes the code a little bit more difficult to follow.

I'm afraid I disagree on this point.

I agree that additional functions should be avoided where they are
needless.  But a single exit path is more important.

Without a single exit path, someone who modifies this code in the
fututure (to add new state, say) will risk altering only one of the
exit paths.

Ian.

___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] [PATCH 8/9] libxl: Kill QEMU by uid when possible

2018-11-29 Thread George Dunlap



> On Nov 29, 2018, at 12:26 PM, Ian Jackson  wrote:
> 
> George Dunlap writes ("Re: [PATCH 8/9] libxl: Kill QEMU by uid when 
> possible"):
>> It wouldn’t be terribly hard to have a common “exit” to both the
>> kill-by-pid and kill-by-uid paths that did it once, but it would
>> involve adding Yet Another Function; and each additional function
>> makes the code a little bit more difficult to follow.
> 
> I'm afraid I disagree on this point.
> 
> I agree that additional functions should be avoided where they are
> needless.  But a single exit path is more important.
> 
> Without a single exit path, someone who modifies this code in the
> fututure (to add new state, say) will risk altering only one of the
> exit paths.

“Creating an extra function to avoid moving qmp_cleanup earlier” doesn’t sound 
like a good reason to me. “Creating an extra function to make sure future 
modifications have only one exit path” sounds like a good reason, though.

 -George
___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] [PATCH v6 00/20] xen: add pvh guest support

2018-11-29 Thread Hans van Kranenburg

Hi Daniel,

On 11/29/18 1:22 PM, Daniel Kiper wrote:
> On Wed, Nov 28, 2018 at 02:55:10PM +0100, Juergen Gross wrote:
>> This patch series adds support for booting Linux as PVH guest.
>>
>> Similar to i386/xen and x86_64/xen platforms the new i386/xenpvh
>> platform grub is booted as a standalone image directly by Xen.
>>
>> For booting Linux kernel it is using the standard linux kernel
>> loader. The only modification of the linux loader is to pass the
>> ACPI RSDP address via boot parameters to the kernel, as that table
>> might not be located at the usual physical address just below 1MB.
>>
>> The related Linux kernel patches have been accepted for 4.20-rc4
> 
> Patch set LGTM. If there are no objections I will apply it in a week
> or so.
> 
> Hans, may I add "Tested-by: Hans van Kranenburg " to
> the patches?

Sure!

> Juergen, thank you for doing the work.

Hans

___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] [PATCH] tools/xenstore: domain can sometimes disappear when destroying connection

2018-11-29 Thread Wei Liu

On Mon, Nov 26, 2018 at 01:22:04PM +, Petre Eftime wrote:
> There is a circular link formed between domain and a connection. In certain
> circustances, when conn is freed, domain is also freed, which leads to use
> after free when trying to set the conn field in domain to null.

Actually, can you provide more context on this? When will the circular
link happen?

Wei.

> 
> Signed-off-by: Petre Eftime 
> ---
>  tools/xenstore/xenstored_domain.c | 9 -
>  1 file changed, 8 insertions(+), 1 deletion(-)
> 
> diff --git a/tools/xenstore/xenstored_domain.c 
> b/tools/xenstore/xenstored_domain.c
> index fa6655033a..f085d40476 100644
> --- a/tools/xenstore/xenstored_domain.c
> +++ b/tools/xenstore/xenstored_domain.c
> @@ -222,6 +222,7 @@ static void domain_cleanup(void)
>  {
>   xc_dominfo_t dominfo;
>   struct domain *domain;
> + struct connection *tmp_conn;
>   int notify = 0;
>  
>   again:
> @@ -238,8 +239,14 @@ static void domain_cleanup(void)
>   continue;
>   }
>   if (domain->conn) {
> - talloc_unlink(talloc_autofree_context(), domain->conn);
> + /*
> +  * In certain circumstances conn owns domain and
> +  * domain will be freed when conn is unlinked.
> +  */
> + tmp_conn = domain->conn;
>   domain->conn = NULL;
> +
> + talloc_unlink(talloc_autofree_context(), tmp_conn);
>   notify = 0; /* destroy_domain() fires the watch */
>   goto again;
>   }
> -- 
> 2.16.5
> 
> 
> 
> 
> Amazon Development Center (Romania) S.R.L. registered office: 27A Sf. Lazar 
> Street, UBC5, floor 2, Iasi, Iasi County, 700045, Romania. Registered in 
> Romania. Registration number J22/2621/2005.
> 

___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

[Xen-devel] [PATCH v2] tools/xenstore: domain can sometimes disappear when destroying connection

2018-11-29 Thread Petre Eftime

There is a circular link formed between domain and a connection. Under certain
circumstances, when conn is freed, domain is also freed, which leads to use
after free when trying to set the conn field in domain to null.

Signed-off-by: Petre Eftime 
---
 tools/xenstore/xenstored_domain.c | 9 -
 1 file changed, 8 insertions(+), 1 deletion(-)

diff --git a/tools/xenstore/xenstored_domain.c 
b/tools/xenstore/xenstored_domain.c
index fa6655033a..f085d40476 100644
--- a/tools/xenstore/xenstored_domain.c
+++ b/tools/xenstore/xenstored_domain.c
@@ -222,6 +222,7 @@ static void domain_cleanup(void)
 {
xc_dominfo_t dominfo;
struct domain *domain;
+   struct connection *tmp_conn;
int notify = 0;
 
  again:
@@ -238,8 +239,14 @@ static void domain_cleanup(void)
continue;
}
if (domain->conn) {
-   talloc_unlink(talloc_autofree_context(), domain->conn);
+   /*
+* In certain circumstances conn owns domain and
+* domain will be freed when conn is unlinked.
+*/
+   tmp_conn = domain->conn;
domain->conn = NULL;
+
+   talloc_unlink(talloc_autofree_context(), tmp_conn);
notify = 0; /* destroy_domain() fires the watch */
goto again;
}
-- 
2.16.5




Amazon Development Center (Romania) S.R.L. registered office: 27A Sf. Lazar 
Street, UBC5, floor 2, Iasi, Iasi County, 700045, Romania. Registered in 
Romania. Registration number J22/2621/2005.


___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] [PATCH v2 1/2] amd-iommu: replace occurrences of bool_t with bool

2018-11-29 Thread Paul Durrant

Ping? Can I get an ack or otherwise from an AMD maintainer please?

> -Original Message-
> From: Paul Durrant [mailto:paul.durr...@citrix.com]
> Sent: 26 November 2018 11:33
> To: xen-devel@lists.xenproject.org
> Cc: Paul Durrant ; Suravee Suthikulpanit
> ; Brian Woods 
> Subject: [PATCH v2 1/2] amd-iommu: replace occurrences of bool_t with bool
> 
> Bring the coding style up to date. No functional change (except for
> removal of some pointless initializers).
> 
> Signed-off-by: Paul Durrant 
> Reviewed-by: Jan Beulich 
> ---
> Cc: Suravee Suthikulpanit 
> Cc: Brian Woods 
> ---
>  xen/drivers/passthrough/amd/iommu_map.c | 26 +-
>  1 file changed, 13 insertions(+), 13 deletions(-)
> 
> diff --git a/xen/drivers/passthrough/amd/iommu_map.c
> b/xen/drivers/passthrough/amd/iommu_map.c
> index c1daba8422..fde4686ee9 100644
> --- a/xen/drivers/passthrough/amd/iommu_map.c
> +++ b/xen/drivers/passthrough/amd/iommu_map.c
> @@ -45,9 +45,9 @@ static void clear_iommu_pte_present(unsigned long
> l1_mfn, unsigned long dfn)
>  unmap_domain_page(table);
>  }
> 
> -static bool_t set_iommu_pde_present(u32 *pde, unsigned long next_mfn,
> -unsigned int next_level,
> -bool_t iw, bool_t ir)
> +static bool set_iommu_pde_present(u32 *pde, unsigned long next_mfn,
> +  unsigned int next_level,
> +  bool iw, bool ir)
>  {
>  uint64_t addr_lo, addr_hi, maddr_next;
>  u32 entry;
> @@ -123,13 +123,13 @@ static bool_t set_iommu_pde_present(u32 *pde,
> unsigned long next_mfn,
>  return need_flush;
>  }
> 
> -static bool_t set_iommu_pte_present(unsigned long pt_mfn, unsigned long
> dfn,
> -unsigned long next_mfn, int
> pde_level,
> -bool_t iw, bool_t ir)
> +static bool set_iommu_pte_present(unsigned long pt_mfn, unsigned long
> dfn,
> +  unsigned long next_mfn, int pde_level,
> +  bool iw, bool ir)
>  {
>  u64 *table;
>  u32 *pde;
> -bool_t need_flush = 0;
> +bool need_flush;
> 
>  table = map_domain_page(_mfn(pt_mfn));
> 
> @@ -347,16 +347,16 @@ static void set_pde_count(u64 *pde, unsigned int
> count)
>  /* Return 1, if pages are suitable for merging at merge_level.
>   * otherwise increase pde count if mfn is contigous with mfn - 1
>   */
> -static int iommu_update_pde_count(struct domain *d, unsigned long pt_mfn,
> -  unsigned long dfn, unsigned long mfn,
> -  unsigned int merge_level)
> +static bool iommu_update_pde_count(struct domain *d, unsigned long
> pt_mfn,
> +   unsigned long dfn, unsigned long mfn,
> +   unsigned int merge_level)
>  {
>  unsigned int pde_count, next_level;
>  unsigned long first_mfn;
>  u64 *table, *pde, *ntable;
>  u64 ntable_maddr, mask;
>  struct domain_iommu *hd = dom_iommu(d);
> -bool_t ok = 0;
> +bool ok = false;
> 
>  ASSERT( spin_is_locked(&hd->arch.mapping_lock) && pt_mfn );
> 
> @@ -384,7 +384,7 @@ static int iommu_update_pde_count(struct domain *d,
> unsigned long pt_mfn,
>  pde_count = get_pde_count(*pde);
> 
>  if ( pde_count == (PTE_PER_TABLE_SIZE - 1) )
> -ok = 1;
> +ok = true;
>  else if ( pde_count < (PTE_PER_TABLE_SIZE - 1))
>  {
>  pde_count++;
> @@ -648,7 +648,7 @@ static int update_paging_mode(struct domain *d,
> unsigned long dfn)
>  int amd_iommu_map_page(struct domain *d, dfn_t dfn, mfn_t mfn,
> unsigned int flags)
>  {
> -bool_t need_flush = 0;
> +bool need_flush;
>  struct domain_iommu *hd = dom_iommu(d);
>  int rc;
>  unsigned long pt_mfn[7];
> --
> 2.11.0


___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] [PATCH v2 2/2] amd-iommu: replace occurrences of u with uint_t...

2018-11-29 Thread Paul Durrant

Ping? Can I get an ack or otherwise from an AMD maintainer please?

> -Original Message-
> From: Paul Durrant [mailto:paul.durr...@citrix.com]
> Sent: 26 November 2018 11:33
> To: xen-devel@lists.xenproject.org
> Cc: Paul Durrant ; Suravee Suthikulpanit
> ; Brian Woods 
> Subject: [PATCH v2 2/2] amd-iommu: replace occurrences of u with
> uint_t...
> 
> ...for N in {8, 16, 32, 64}.
> 
> Bring the coding style up to date.
> 
> Also, while in the neighbourhood, fix some tabs and remove use of uint64_t
> values where it leads to the need for explicit casting.
> 
> Signed-off-by: Paul Durrant 
> ---
> Cc: Suravee Suthikulpanit 
> Cc: Brian Woods 
> 
> v2:
>  - Remove some uses of uint64_t variables
>  - Add missing blanks in pointer casts
>  - Use paddr_t for address argument in
> amd_iommu_reserve_domain_unity_map()
> ---
>  xen/drivers/passthrough/amd/iommu_map.c   | 124 +
> -
>  xen/include/asm-x86/hvm/svm/amd-iommu-proto.h |   2 +-
>  2 files changed, 65 insertions(+), 61 deletions(-)
> 
> diff --git a/xen/drivers/passthrough/amd/iommu_map.c
> b/xen/drivers/passthrough/amd/iommu_map.c
> index fde4686ee9..0ac3f473b3 100644
> --- a/xen/drivers/passthrough/amd/iommu_map.c
> +++ b/xen/drivers/passthrough/amd/iommu_map.c
> @@ -37,7 +37,7 @@ static unsigned int pfn_to_pde_idx(unsigned long pfn,
> unsigned int level)
> 
>  static void clear_iommu_pte_present(unsigned long l1_mfn, unsigned long
> dfn)
>  {
> -u64 *table, *pte;
> +uint64_t *table, *pte;
> 
>  table = map_domain_page(_mfn(l1_mfn));
>  pte = table + pfn_to_pde_idx(dfn, 1);
> @@ -45,15 +45,15 @@ static void clear_iommu_pte_present(unsigned long
> l1_mfn, unsigned long dfn)
>  unmap_domain_page(table);
>  }
> 
> -static bool set_iommu_pde_present(u32 *pde, unsigned long next_mfn,
> +static bool set_iommu_pde_present(uint32_t *pde, unsigned long next_mfn,
>unsigned int next_level,
>bool iw, bool ir)
>  {
> -uint64_t addr_lo, addr_hi, maddr_next;
> -u32 entry;
> +uint64_t maddr_next;
> +uint32_t addr_lo, addr_hi, entry;
>  bool need_flush = false, old_present;
> 
> -maddr_next = (u64)next_mfn << PAGE_SHIFT;
> +maddr_next = __pfn_to_paddr(next_mfn);
> 
>  old_present = get_field_from_reg_u32(pde[0], IOMMU_PTE_PRESENT_MASK,
>   IOMMU_PTE_PRESENT_SHIFT);
> @@ -79,7 +79,8 @@ static bool set_iommu_pde_present(u32 *pde, unsigned
> long next_mfn,
> IOMMU_PTE_IO_READ_PERMISSION_MASK,
> 
> IOMMU_PTE_IO_READ_PERMISSION_SHIFT);
> 
> -maddr_old = (addr_hi << 32) | (addr_lo << PAGE_SHIFT);
> +maddr_old = ((uint64_t)addr_hi << 32) |
> +((uint64_t)addr_lo << PAGE_SHIFT);
> 
>  if ( maddr_old != maddr_next || iw != old_w || ir != old_r ||
>   old_level != next_level )
> @@ -90,7 +91,7 @@ static bool set_iommu_pde_present(u32 *pde, unsigned
> long next_mfn,
>  addr_hi = maddr_next >> 32;
> 
>  /* enable read/write permissions,which will be enforced at the PTE */
> -set_field_in_reg_u32((u32)addr_hi, 0,
> +set_field_in_reg_u32(addr_hi, 0,
>   IOMMU_PDE_ADDR_HIGH_MASK,
>   IOMMU_PDE_ADDR_HIGH_SHIFT, &entry);
>  set_field_in_reg_u32(iw, entry,
> @@ -109,7 +110,7 @@ static bool set_iommu_pde_present(u32 *pde, unsigned
> long next_mfn,
>  pde[1] = entry;
> 
>  /* mark next level as 'present' */
> -set_field_in_reg_u32((u32)addr_lo >> PAGE_SHIFT, 0,
> +set_field_in_reg_u32(addr_lo >> PAGE_SHIFT, 0,
>   IOMMU_PDE_ADDR_LOW_MASK,
>   IOMMU_PDE_ADDR_LOW_SHIFT, &entry);
>  set_field_in_reg_u32(next_level, entry,
> @@ -127,24 +128,24 @@ static bool set_iommu_pte_present(unsigned long
> pt_mfn, unsigned long dfn,
>unsigned long next_mfn, int pde_level,
>bool iw, bool ir)
>  {
> -u64 *table;
> -u32 *pde;
> +uint64_t *table;
> +uint32_t *pde;
>  bool need_flush;
> 
>  table = map_domain_page(_mfn(pt_mfn));
> 
> -pde = (u32*)(table + pfn_to_pde_idx(dfn, pde_level));
> +pde = (uint32_t *)(table + pfn_to_pde_idx(dfn, pde_level));
> 
>  need_flush = set_iommu_pde_present(pde, next_mfn, 0, iw, ir);
>  unmap_domain_page(table);
>  return need_flush;
>  }
> 
> -void amd_iommu_set_root_page_table(
> -u32 *dte, u64 root_ptr, u16 domain_id, u8 paging_mode, u8 valid)
> +void amd_iommu_set_root_page_table(uint32_t *dte, uint64_t root_ptr,
> +   uint16_t domain_id, uint8_t
> paging_mode,
> +   uint8_t valid)
>  {
> -u64 addr_hi, addr_lo;
> -u32 entry;
> +uint32_t addr_hi, addr_lo, entry;
>  set_field_in_reg_u32(domain_id, 0,
>   IOMMU_DEV_TABLE_DOMAIN_ID_M

Re: [Xen-devel] [PATCH v6 00/20] xen: add pvh guest support

2018-11-29 Thread Daniel Kiper

On Thu, Nov 29, 2018 at 01:40:35PM +0100, Hans van Kranenburg wrote:
> Hi Daniel,
>
> On 11/29/18 1:22 PM, Daniel Kiper wrote:
> > On Wed, Nov 28, 2018 at 02:55:10PM +0100, Juergen Gross wrote:
> >> This patch series adds support for booting Linux as PVH guest.
> >>
> >> Similar to i386/xen and x86_64/xen platforms the new i386/xenpvh
> >> platform grub is booted as a standalone image directly by Xen.
> >>
> >> For booting Linux kernel it is using the standard linux kernel
> >> loader. The only modification of the linux loader is to pass the
> >> ACPI RSDP address via boot parameters to the kernel, as that table
> >> might not be located at the usual physical address just below 1MB.
> >>
> >> The related Linux kernel patches have been accepted for 4.20-rc4
> >
> > Patch set LGTM. If there are no objections I will apply it in a week
> > or so.
> >
> > Hans, may I add "Tested-by: Hans van Kranenburg " to
> > the patches?
>
> Sure!

Great! Thanks a lot!

Daniel

___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] [PATCH] tools/firmware: update OVMF Makefile

2018-11-29 Thread Wei Liu

On Thu, Nov 29, 2018 at 12:02:16PM +, Anthony PERARD wrote:
> On Thu, Nov 29, 2018 at 11:39:54AM +, Wei Liu wrote:
> > On Thu, Nov 29, 2018 at 11:31:41AM +, Anthony PERARD wrote:
> > > What about the release tarball? Do we includes OVMF in it?
> > 
> > Yes we do. But this should work because the Makefile is also shipped.
> 
> The fact that the Makefile is shipped doesn't matter... Also of course
> it would work, as long as the machine that build xen have access to
> internet.
> 

I had assumed that QEMU would clone submodule on the fly. But I was
wrong. They had a script to extract those modules.

> If we ship OVMF sources code, we need to include its submodules, so we
> need to ship openssl source code. I think I've look into removing this
> build dependency, but I don't think that was possible with simple build
> options.
> 

Yes so we should ship those submodules as well.

> > What does qemu-xen do regarding its submodules? OVMF should just follow
> > suite.
> 
> Nothing, qemu-xen isn't a project. If you meant "what does xen.git do
> regarding qemu-xen's submodules" then the answer is, we call a script
> from qemu to have it export whatever it needs to build (the fact that
> there are submodules are transparent to users building qemu, like xen
> do).
> 
> OVMF/edk2 is a bit special, its build system doesn't know about external
> source code and teaching it to do something about it is probably not
> going to be simple.
> 

Maybe we should ask upstream's opinion on this matter?

Their github page has a release tarball but it doesn't build because of
the same problem.

If upstream doesn't want to do anything we would need special
arrangement for xen.git build and tarball build separately.

> So we can't treat OVMF like QEMU, regarding the build, and releases.
> 
> We probably need a script to export git-submodules in xen.git.
> 

We should.

Wei.

> -- 
> Anthony PERARD

___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] [PATCH] tools/xenstore: domain can sometimes disappear when destroying connection

2018-11-29 Thread Eftime, Petre

I didn't go extremely deep in my debugging, as the talloc library is a bit
difficult to debug, but under the do_introduce function you have these
two lines:

 /* Now domain belongs to its connection. */
 talloc_steal(domain->conn, domain);

After these happen, destroying the domain leads to a SIGSEGV in xenstored, as
when conn gets freed, so does domain, which ends up in a use-after-free.

I've posted the patch with the fixed text.

Best,
Petre

On 2018-11-29, 14:54, "Wei Liu"  wrote:

On Mon, Nov 26, 2018 at 01:22:04PM +, Petre Eftime wrote:
> There is a circular link formed between domain and a connection. In 
certain
> circustances, when conn is freed, domain is also freed, which leads to use
> after free when trying to set the conn field in domain to null.

Actually, can you provide more context on this? When will the circular
link happen?

Wei.

> 
> Signed-off-by: Petre Eftime 
> ---
>  tools/xenstore/xenstored_domain.c | 9 -
>  1 file changed, 8 insertions(+), 1 deletion(-)
> 
> diff --git a/tools/xenstore/xenstored_domain.c 
b/tools/xenstore/xenstored_domain.c
> index fa6655033a..f085d40476 100644
> --- a/tools/xenstore/xenstored_domain.c
> +++ b/tools/xenstore/xenstored_domain.c
> @@ -222,6 +222,7 @@ static void domain_cleanup(void)
>  {
>   xc_dominfo_t dominfo;
>   struct domain *domain;
> + struct connection *tmp_conn;
>   int notify = 0;
>  
>   again:
> @@ -238,8 +239,14 @@ static void domain_cleanup(void)
>   continue;
>   }
>   if (domain->conn) {
> - talloc_unlink(talloc_autofree_context(), domain->conn);
> + /*
> +  * In certain circumstances conn owns domain and
> +  * domain will be freed when conn is unlinked.
> +  */
> + tmp_conn = domain->conn;
>   domain->conn = NULL;
> +
> + talloc_unlink(talloc_autofree_context(), tmp_conn);
>   notify = 0; /* destroy_domain() fires the watch */
>   goto again;
>   }
> -- 
> 2.16.5
> 
> 
> 
> 
> Amazon Development Center (Romania) S.R.L. registered office: 27A Sf. 
Lazar Street, UBC5, floor 2, Iasi, Iasi County, 700045, Romania. Registered in 
Romania. Registration number J22/2621/2005.
> 





Amazon Development Center (Romania) S.R.L. registered office: 27A Sf. Lazar 
Street, UBC5, floor 2, Iasi, Iasi County, 700045, Romania. Registered in 
Romania. Registration number J22/2621/2005.
___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] [PATCH v9] new config option vtsc_tolerance_khz to avoid TSC emulation

2018-11-29 Thread Olaf Hering

Thanks Andrew and Ian for taking the time to look at this change.
In turn it took me some time to get back to this topic.

Am Mon, 1 Oct 2018 13:39:51 +0100
schrieb Andrew Cooper :

> On 07/06/18 14:08, Olaf Hering wrote:
> > Add an option to control when vTSC emulation will be activated for a
> > domU with tsc_mode=default. Without such option each TSC access from
> > domU will be emulated, which causes a significant perfomance drop for
> > workloads that make use of rdtsc.
> >
> > One option to avoid the TSC option is to run domUs with tsc_mode=native.
> > This has the drawback that migrating a domU from a "2.3GHz" class host
> > to a "2.4GHz" class host may change the rate at wich the TSC counter
> > increases, the domU may not be prepared for that.
> >
> > With the new option the host admin can decide how a domU should behave
> > when it is migrated across systems of the same class. Since there is
> > always some jitter when Xen calibrates the cpu_khz value, all hosts of
> > the same class will most likely have slightly different values. As a
> > result vTSC emulation is unavoidable. Data collected during the incident
> > which triggered this change showed a jitter of up to 200 KHz across
> > systems of the same class.  
> 
> Do you have any further details of the systems involved?  If they are
> identical systems, they should all have the same real TSC frequency, and
> its a known issue that Xen isn't very good at working out the
> frequency.  TBH, fixing that would be far better overall.

My test hosts have a E5504 cpu. The ones where the issue was reported
use "E7-8880 v3" today, the used hardware two years ago was likely older.

From what I understand the TSC frequency stored in "cpu_khz" is just an
estimated value, not the real hardware frequency. Still, it is used to
decide if two hosts tick at the same speed. The domU kernel may use the
estimated value for its timekeeping, I think it does the same estimation
as Xen itself. But, I have to dig into that.

To me it looks like domUs should run ntpd themselves if there is a plan
to migrate them at some point in the future. At least if they use TSC
for timekeeping. With ntpd the domU would detect time skew, even if it
was not yet migrated to another host. I do not know much about timekeeping,
so this is just a guess on my side.

> > Existing padding fields are reused to store vtsc_khz_tolerance as u16.
> > The padding is sent as zero in write_tsc_info to the receving host.
> > The padding is undefined if the changed code runs as receiver.  
> I'm not sure what you mean by this final sentence.

I have removed that part, since incoming padding is in practice always zero.

> > handle_tsc_info has no code to verify that padding is indeed zero. Due
> > to the lack of a version field it is impossible to know if the sender
> > already has the newly introduced vtsc_tolerance field. In the worst
> > case the receiving domU will get an unemulated TSC.  
> 
> The lack of padding verification is deliberate, for forwards
> compatibility.  Why does the sending code matter?  One way or another,
> if the field is 0, the option wasn't present or wasn't configured. 
> Neither of these situations affect the decision-making that the
> receiving side needs to perform.

> > Signed-off-by: Olaf Hering 
> > Reviewed-by: Wei Liu  (v07/v08)
> > Reviewed-by: Jan Beulich  (v08)  
> 
> I'm still -0.5 for this patch.  I can appreciate why you want it, but it
> is a gross hack which only works when you don't skew time more than NTP
> in the guest can cope with.  My gut feeling is that there will be other
> more subtle fallout.

I will do some research regarding how much skewing a domU can handle.
As said in another reply, the expected time drift with a difference of
just 11 kHz in the cpu_khz variable on a 2.6GHz system is about 0.3 seconds
during a day.

IanJ requested clarification for how much time skew a system can handle.
Perhaps this should have been part of the initial submission for
tsc_mode=native already. I will do some research. Also some of the concerns
about missing documentation are already covered in paragraph #3 of the
commit message.

I will do some more testing with staging and send v10 next week.
The accumulated changes for a v10, so far:
v10:
 - rebase to 402411ec40
 - update write_libxc_tsc_info to handle the new parameter vtsc_tolerance_khz
   without this change, migration from xen-4.5 will fail (Andrew)
 - add newline to tsc_set_info (Andrew)
 - add measurment unit to libxc-migration-stream.pandoc (Andrew)
 - add pointer to xen-tscmode(7) in xl.cfg(5)/vtsc_tolerance_khz (Andrew)
 - reword the newly added paragraph in xen-tscmode(7) (Andrew),
   and also mention that it is about the measured/estimated TSC value
   rather than the real value. The latter is simply unknown.
 - simplify wording regarding the value of padding field in old Xen
   versions, the previous one turned out to be confusing and not helpful

Olaf

pgpqIIyvcccC2.pgp
Description: Digitale

[Xen-devel] [PATCH V10 4/5] p2m: Always use hostp2m when clipping rangesets

2018-11-29 Thread Razvan Cojocaru

On 11/29/18 12:04 PM, Jan Beulich wrote:
 On 28.11.18 at 22:56,  wrote:
>> Changes since V9:
>>  - Removed the patch RFC (replaced by a printk(XENLOG_G_WARNING).
>>  - Reused start and end in change_type_range() and removed the
>>intermediary variables range_start and range_end.
>>  - Added an extra explanation for the if ( start > end ) return;
>>code in the comment.
> 
> This last item isn't really taking care of the comments I gave on v9.
> The _incoming_ start being larger than the _incoming_ end is
> something worth to point out. But you put that check after clipping
> end. Furthermore it looks like you continue to break the case
> where ->max_mapped_pfn increases subsequently, i.e. you still
> don't update the rangeset with the unmodified incoming values.
> Or otherwise I would have expected an explanation (as a reply
> to my comments, not necessarily by adding to description or
> comments of the patch here) why either this is not an issue or I'm
> misreading anything.

max_mapped_pfn _should_ end up being >= the logdirty range upper bound,
since AFAICT the logdirty ranges are tied to ept_set_entry() calls,
which always end up calling p2m_altp2m_propagate_change() when they
occur on the hostp2m (which in turn calls p2m_set_entry() on the
altp2ms, and so on).

Long story short, all modifications to the hostp2m's max_mapped_pfn will
end up updating it for all active altp2ms. The other way around is not
true if I understand the code correctly, so it is theoretically possible
for altp2m->max_mapped_pfn > hostp2m->max_mapped_pfn (although, again
AFAICT, this should not really affect the logdirty case where we're now
doing our best to keep the hostp2m in sync with altp2ms).

So this is why I believe that this is not an issue, however I might be
missing something or am (quite likely) possibly misunderstanding the
question.

Also, apologies if I'm speaking out of turn and the question was really
addressed to George (who is the proper author of the patch).

Thanks,
Razvan

___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] Linux 4.19.5 fails to boot as Xen dom0

2018-11-29 Thread Kirill A. Shutemov

On Thu, Nov 29, 2018 at 09:41:25AM +, Juergen Gross wrote:
> On 29/11/2018 02:22, Hans van Kranenburg wrote:
> > Hi,
> > 
> > As also seen at:
> > https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=914951
> > 
> > Attached there are two serial console output logs. One is starting with
> > Xen 4.11 (from debian unstable) as dom0, and the other one without Xen.
> > 
> > [2.085543] BUG: unable to handle kernel paging request at
> > 888d9fffc000
> > [2.085610] PGD 200c067 P4D 200c067 PUD 0
> > [2.085674] Oops:  [#1] SMP NOPTI
> > [2.085736] CPU: 1 PID: 1 Comm: swapper/0 Not tainted
> > 4.19.0-trunk-amd64 #1 Debian 4.19.5-1~exp1+pvh1
> > [2.085823] Hardware name: HP ProLiant DL360 G7, BIOS P68 05/21/2018
> > [2.085895] RIP: e030:ptdump_walk_pgd_level_core+0x1fd/0x490
> > [...]
> 
> The offending stable commit is 4074ca7d8a1832921c865d250bbd08f3441b3657
> ("x86/mm: Move LDT remap out of KASLR region on 5-level paging"), this
> is commit d52888aa2753e3063a9d3a0c9f72f94aa9809c15 upstream.
> 
> Current upstream kernel is booting fine under Xen, so in general the
> patch should be fine. Using an upstream kernel built from above commit
> (with the then needed Xen fixup patch 1457d8cf7664f34c4ba534) is fine,
> too.
> 
> Kirill, are you aware of any prerequisite patch from 4.20 which could be
> missing in 4.19.5?

I'm not.

Let me look into this.

-- 
 Kirill A. Shutemov

___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

[Xen-devel] PCI BAR sizing quirks

2018-11-29 Thread Jan Beulich

Roger,

in the context of some other bug report here I've been pointed
at Linux commit 6af7e4f77259ee946103387372cb159f2e99a6d4.
I think we will need to implement something similar for any
devices that may be present on x86 systems, but don't have
standard compliant behavior of registers in what normally is the
BAR range in config space, for your PVH BAR handling to not
cause boot failures.

Jan



___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] [PATCH v2 1/2] amd-iommu: replace occurrences of bool_t with bool

2018-11-29 Thread Jan Beulich

>>> On 29.11.18 at 13:53,  wrote:
> Ping? Can I get an ack or otherwise from an AMD maintainer please?

Brian did ack this one, which I've committed yesterday, but not 2.

Jan

>> -Original Message-
>> From: Paul Durrant [mailto:paul.durr...@citrix.com]
>> Sent: 26 November 2018 11:33
>> To: xen-devel@lists.xenproject.org 
>> Cc: Paul Durrant ; Suravee Suthikulpanit
>> ; Brian Woods 
>> Subject: [PATCH v2 1/2] amd-iommu: replace occurrences of bool_t with bool
>> 
>> Bring the coding style up to date. No functional change (except for
>> removal of some pointless initializers).
>> 
>> Signed-off-by: Paul Durrant 
>> Reviewed-by: Jan Beulich 
>> ---
>> Cc: Suravee Suthikulpanit 
>> Cc: Brian Woods 
>> ---
>>  xen/drivers/passthrough/amd/iommu_map.c | 26 +-
>>  1 file changed, 13 insertions(+), 13 deletions(-)
>> 
>> diff --git a/xen/drivers/passthrough/amd/iommu_map.c
>> b/xen/drivers/passthrough/amd/iommu_map.c
>> index c1daba8422..fde4686ee9 100644
>> --- a/xen/drivers/passthrough/amd/iommu_map.c
>> +++ b/xen/drivers/passthrough/amd/iommu_map.c
>> @@ -45,9 +45,9 @@ static void clear_iommu_pte_present(unsigned long
>> l1_mfn, unsigned long dfn)
>>  unmap_domain_page(table);
>>  }
>> 
>> -static bool_t set_iommu_pde_present(u32 *pde, unsigned long next_mfn,
>> -unsigned int next_level,
>> -bool_t iw, bool_t ir)
>> +static bool set_iommu_pde_present(u32 *pde, unsigned long next_mfn,
>> +  unsigned int next_level,
>> +  bool iw, bool ir)
>>  {
>>  uint64_t addr_lo, addr_hi, maddr_next;
>>  u32 entry;
>> @@ -123,13 +123,13 @@ static bool_t set_iommu_pde_present(u32 *pde,
>> unsigned long next_mfn,
>>  return need_flush;
>>  }
>> 
>> -static bool_t set_iommu_pte_present(unsigned long pt_mfn, unsigned long
>> dfn,
>> -unsigned long next_mfn, int
>> pde_level,
>> -bool_t iw, bool_t ir)
>> +static bool set_iommu_pte_present(unsigned long pt_mfn, unsigned long
>> dfn,
>> +  unsigned long next_mfn, int pde_level,
>> +  bool iw, bool ir)
>>  {
>>  u64 *table;
>>  u32 *pde;
>> -bool_t need_flush = 0;
>> +bool need_flush;
>> 
>>  table = map_domain_page(_mfn(pt_mfn));
>> 
>> @@ -347,16 +347,16 @@ static void set_pde_count(u64 *pde, unsigned int
>> count)
>>  /* Return 1, if pages are suitable for merging at merge_level.
>>   * otherwise increase pde count if mfn is contigous with mfn - 1
>>   */
>> -static int iommu_update_pde_count(struct domain *d, unsigned long pt_mfn,
>> -  unsigned long dfn, unsigned long mfn,
>> -  unsigned int merge_level)
>> +static bool iommu_update_pde_count(struct domain *d, unsigned long
>> pt_mfn,
>> +   unsigned long dfn, unsigned long mfn,
>> +   unsigned int merge_level)
>>  {
>>  unsigned int pde_count, next_level;
>>  unsigned long first_mfn;
>>  u64 *table, *pde, *ntable;
>>  u64 ntable_maddr, mask;
>>  struct domain_iommu *hd = dom_iommu(d);
>> -bool_t ok = 0;
>> +bool ok = false;
>> 
>>  ASSERT( spin_is_locked(&hd->arch.mapping_lock) && pt_mfn );
>> 
>> @@ -384,7 +384,7 @@ static int iommu_update_pde_count(struct domain *d,
>> unsigned long pt_mfn,
>>  pde_count = get_pde_count(*pde);
>> 
>>  if ( pde_count == (PTE_PER_TABLE_SIZE - 1) )
>> -ok = 1;
>> +ok = true;
>>  else if ( pde_count < (PTE_PER_TABLE_SIZE - 1))
>>  {
>>  pde_count++;
>> @@ -648,7 +648,7 @@ static int update_paging_mode(struct domain *d,
>> unsigned long dfn)
>>  int amd_iommu_map_page(struct domain *d, dfn_t dfn, mfn_t mfn,
>> unsigned int flags)
>>  {
>> -bool_t need_flush = 0;
>> +bool need_flush;
>>  struct domain_iommu *hd = dom_iommu(d);
>>  int rc;
>>  unsigned long pt_mfn[7];
>> --
>> 2.11.0
> 
> 
> ___
> Xen-devel mailing list
> Xen-devel@lists.xenproject.org 
> https://lists.xenproject.org/mailman/listinfo/xen-devel 




___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] Linux 4.19.5 fails to boot as Xen dom0

2018-11-29 Thread Juergen Gross

On 29/11/2018 14:26, Kirill A. Shutemov wrote:
> On Thu, Nov 29, 2018 at 09:41:25AM +, Juergen Gross wrote:
>> On 29/11/2018 02:22, Hans van Kranenburg wrote:
>>> Hi,
>>>
>>> As also seen at:
>>> https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=914951
>>>
>>> Attached there are two serial console output logs. One is starting with
>>> Xen 4.11 (from debian unstable) as dom0, and the other one without Xen.
>>>
>>> [2.085543] BUG: unable to handle kernel paging request at
>>> 888d9fffc000
>>> [2.085610] PGD 200c067 P4D 200c067 PUD 0
>>> [2.085674] Oops:  [#1] SMP NOPTI
>>> [2.085736] CPU: 1 PID: 1 Comm: swapper/0 Not tainted
>>> 4.19.0-trunk-amd64 #1 Debian 4.19.5-1~exp1+pvh1
>>> [2.085823] Hardware name: HP ProLiant DL360 G7, BIOS P68 05/21/2018
>>> [2.085895] RIP: e030:ptdump_walk_pgd_level_core+0x1fd/0x490
>>> [...]
>>
>> The offending stable commit is 4074ca7d8a1832921c865d250bbd08f3441b3657
>> ("x86/mm: Move LDT remap out of KASLR region on 5-level paging"), this
>> is commit d52888aa2753e3063a9d3a0c9f72f94aa9809c15 upstream.
>>
>> Current upstream kernel is booting fine under Xen, so in general the
>> patch should be fine. Using an upstream kernel built from above commit
>> (with the then needed Xen fixup patch 1457d8cf7664f34c4ba534) is fine,
>> too.
>>
>> Kirill, are you aware of any prerequisite patch from 4.20 which could be
>> missing in 4.19.5?
> 
> I'm not.
> 
> Let me look into this.
> 

What is making me suspicious is the failure happening just after
releasing the init memory. Maybe there is an access to .init.data
segment or similar? The native kernel booting could be related to the
usage of 2M mappings not being available in a PV-domain.


Juergen

___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] [PATCH] tools/xenstore: domain can sometimes disappear when destroying connection

2018-11-29 Thread Wei Liu

On Thu, Nov 29, 2018 at 01:10:52PM +, Eftime, Petre wrote:
> I didn't go extremely deep in my debugging, as the talloc library is a bit
> difficult to debug, but under the do_introduce function you have these
> two lines:

difficult to debug> I completely agree. ;-)

> 
>  /* Now domain belongs to its connection. */
>  talloc_steal(domain->conn, domain);
> 

Thanks, I think this is what I missed. This is useful information. I
think it should go into the commit message if we end up committing your
patch.

> After these happen, destroying the domain leads to a SIGSEGV in xenstored, as
> when conn gets freed, so does domain, which ends up in a use-after-free.
> 
> I've posted the patch with the fixed text.

Yes, saw that. Thanks for the quick response. There is no need to post
another one just yet. Please give us some time while we try to figure
out the root cause.

Wei.

> 
> Best,
> Petre
> 
> On 2018-11-29, 14:54, "Wei Liu"  wrote:
> 
> On Mon, Nov 26, 2018 at 01:22:04PM +, Petre Eftime wrote:
> > There is a circular link formed between domain and a connection. In 
> certain
> > circustances, when conn is freed, domain is also freed, which leads to 
> use
> > after free when trying to set the conn field in domain to null.
> 
> Actually, can you provide more context on this? When will the circular
> link happen?
> 
> Wei.
> 
> > 
> > Signed-off-by: Petre Eftime 
> > ---
> >  tools/xenstore/xenstored_domain.c | 9 -
> >  1 file changed, 8 insertions(+), 1 deletion(-)
> > 
> > diff --git a/tools/xenstore/xenstored_domain.c 
> b/tools/xenstore/xenstored_domain.c
> > index fa6655033a..f085d40476 100644
> > --- a/tools/xenstore/xenstored_domain.c
> > +++ b/tools/xenstore/xenstored_domain.c
> > @@ -222,6 +222,7 @@ static void domain_cleanup(void)
> >  {
> > xc_dominfo_t dominfo;
> > struct domain *domain;
> > +   struct connection *tmp_conn;
> > int notify = 0;
> >  
> >   again:
> > @@ -238,8 +239,14 @@ static void domain_cleanup(void)
> > continue;
> > }
> > if (domain->conn) {
> > -   talloc_unlink(talloc_autofree_context(), 
> domain->conn);
> > +   /*
> > +* In certain circumstances conn owns domain and
> > +* domain will be freed when conn is unlinked.
> > +*/
> > +   tmp_conn = domain->conn;
> > domain->conn = NULL;
> > +
> > +   talloc_unlink(talloc_autofree_context(), 
> tmp_conn);
> > notify = 0; /* destroy_domain() fires the watch 
> */
> > goto again;
> > }
> > -- 
> > 2.16.5
> > 
> > 
> > 
> > 
> > Amazon Development Center (Romania) S.R.L. registered office: 27A Sf. 
> Lazar Street, UBC5, floor 2, Iasi, Iasi County, 700045, Romania. Registered 
> in Romania. Registration number J22/2621/2005.
> > 
> 
> 
> 
> 
> 
> Amazon Development Center (Romania) S.R.L. registered office: 27A Sf. Lazar 
> Street, UBC5, floor 2, Iasi, Iasi County, 700045, Romania. Registered in 
> Romania. Registration number J22/2621/2005.

___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] vpci: deferral of register write until p2m changes are done

2018-11-29 Thread Roger Pau Monné

On Thu, Nov 29, 2018 at 03:41:07AM -0700, Jan Beulich wrote:
> >>> On 29.11.18 at 11:25,  wrote:
> > On Thu, Nov 29, 2018 at 02:25:02AM -0700, Jan Beulich wrote:
> >> >>> On 28.11.18 at 18:48,  wrote:
> >> > On Wed, Nov 28, 2018 at 10:04:33AM -0700, Jan Beulich wrote:
> >> >> >>> On 28.11.18 at 17:54,  wrote:
> >> >> > On Wed, Nov 28, 2018 at 09:22:16AM -0700, Jan Beulich wrote:
> >> >> >> >>> On 28.11.18 at 16:41,  wrote:
> >> >> >> > My plan is that DomUs won't be allowed to toggle the memory 
> >> >> >> > decoding
> >> >> >> > bit, and it's going to be always enabled, like it's currently done 
> >> >> >> > for
> >> >> >> > pci-passthrough in QEMU. Toggling the memory decoding bit in a 
> >> >> >> > DomU is
> >> >> >> > going to trigger a change to the p2m (map or unmap) but the command
> >> >> >> > register will always have the memory decoding bit enabled.
> >> >> >> 
> >> >> >> But this isn't entirely correct, even if we've got away with this
> >> >> >> so far. But we're mostly considering well-behaved guests and
> >> >> >> devices. What if one actually triggers bus activity in parallel to
> >> >> >> a BAR change?
> >> >> > 
> >> >> > Well, that's likely to not work properly in any case with or without
> >> >> > disabling the memory decoding bit?
> >> >> 
> >> >> Of course not.
> >> >> 
> >> >> > But I don't see how that's going to affect Xen stability (or what the
> >> >> > domain is attempting to achieve with it).
> >> >> 
> >> >> "I don't see how ..." != "That's not going to ...". And in case my
> >> >> prior way of wording it was ambiguous: We very much need to
> >> >> think about malicious guests (once any of this is to be extended
> >> >> to DomU-s). Hence a goal of "I want to crash Xen" needs to be
> >> >> taken into consideration.
> >> > 
> >> > Right, so Xen might what to disable memory decoding for DomUs also.
> >> > 
> >> > But that's orthogonal to whether we agree that the write to the
> >> > command register can happen before the p2m modifications, both in the
> >> > map and the unmap case. I think so, but I would like to be sure you
> >> > agree before I code this.
> >> 
> >> Thing is, as said before, I'm not sure. I can be convinced by, well,
> >> convincing arguments (which a proof that nothing bad can happen
> >> would be, but anything less likely wouldn't).
> > 
> > Hm, OK, please bear with me because I'm afraid I will need some help
> > in order to provide such argument.
> > 
> > We agree that the end result is going to be the same, ie: a p2m change
> > and a write to the device command register. Then the issue is the
> > order in which to perform those, and whether that order will have a
> > security impact.
> > 
> > We also agree that in the unmap case it's best to first disable memory
> > decoding and then perform the p2m unmap. So the only remaining
> > problematic case is the map action.
> > 
> > I guess the only vector of a possible attack could be other vCPUs in a
> > SMP guest, that could poke at either the device registers or the p2m
> > after the memory decoding bit has been set and while the map operation
> > is taking place:
> > 
> >  - Writing to the BARs MMIO while performing the p2m map and with the
> >memory decoding bit is enabled can result in missing writes not
> >reaching the device, because the p2m entries are not yet setup.
> >This can also be triggered by the guest when the BARs are
> >completely mapped by performing incorrect writes.
> 
> Not sure what "incorrect writes" you mean here. Ones to other GFNs
> are not targeted at the device in question anyway. But yes, P2M
> mappings not in place _should_ not have any bad effects. My issue
> is with the "should not" here, which I'd like to be "cannot".
> 
> >  - Attempting to program a DMA transaction to the device MMIO BAR
> >regions will result in IOMMU page faults, the same can be achieved by
> >attempting to perform DMA transactions to unpopulated memory
> >regions.
> > 
> > I think there are other scenarios you are worried about and I'm
> > missing, could you please point them out?
> 
> That's the point: I can't point out other scenarios, but I also can't
> convince myself that there are none. What I'm concerned about
> from a more abstract pov are any actions by the guest which might
> ultimately lead to master or target aborts (or alike), which may in
> turn cause #SERR to be signaled.

I cannot see how interactions with a device with half-mapped BARs
could trigger aborts that cannot be triggered when the device has
fully mapped BARs. Ie: if there's indeed a way to trigger such aborts
it would also be possible to do so with fully mapped BARs.

> Yet then again we all know that
> PCI pass-through is not perfectly secure despite all claims (or
> should I say wishes), so maybe all of this is really tolerable.

Well, we already know there are devices that expose backdoors to the
PCI config space in a BAR, so it's mostly a question of whether you
trust the devices you are passing thro

Re: [Xen-devel] [PATCH] tools/xenstore: domain can sometimes disappear when destroying connection

2018-11-29 Thread Wei Liu

On Thu, Nov 29, 2018 at 01:38:11PM +, Wei Liu wrote:
> On Thu, Nov 29, 2018 at 01:10:52PM +, Eftime, Petre wrote:
> > I didn't go extremely deep in my debugging, as the talloc library is a bit
> > difficult to debug, but under the do_introduce function you have these
> > two lines:
> 
> difficult to debug> I completely agree. ;-)
> 
> > 
> >  /* Now domain belongs to its connection. */
> >  talloc_steal(domain->conn, domain);
> > 
> 
> Thanks, I think this is what I missed. This is useful information. I
> think it should go into the commit message if we end up committing your
> patch.
> 

I have taken a closer look. I don't think the circular link exists. Note
that a new connection is always created with talloc_autofree_context.

Following the code, we get (to distinguish, I use conn(X)):

 conn(1) <- belongs to talloc_autofree_context, created by new_connection when
accepting connection from sockets.

 conn(1)->in <- allocated with new_buffer, belongs to conn(1).

 domain <- allocated with new_domain, belongs to conn(1)->in .

 domain->conn(2) <- allocated with new_connection, belongs to 
talloc_autofree_context.

 talloc_steal(domain->conn(2), domain) <- now domain belongs to conn(2).

So the n-ary tree ends up like:

  talloc_autofree_context->conn(1)->in
  \->conn(2)->domain

Ian, can I have a second opinion from you?

Wei.

___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] [PATCH V10 4/5] p2m: Always use hostp2m when clipping rangesets

2018-11-29 Thread Jan Beulich

>>> On 29.11.18 at 14:23,  wrote:
> On 11/29/18 12:04 PM, Jan Beulich wrote:
> On 28.11.18 at 22:56,  wrote:
>>> Changes since V9:
>>>  - Removed the patch RFC (replaced by a printk(XENLOG_G_WARNING).
>>>  - Reused start and end in change_type_range() and removed the
>>>intermediary variables range_start and range_end.
>>>  - Added an extra explanation for the if ( start > end ) return;
>>>code in the comment.
>> 
>> This last item isn't really taking care of the comments I gave on v9.
>> The _incoming_ start being larger than the _incoming_ end is
>> something worth to point out. But you put that check after clipping
>> end. Furthermore it looks like you continue to break the case
>> where ->max_mapped_pfn increases subsequently, i.e. you still
>> don't update the rangeset with the unmodified incoming values.
>> Or otherwise I would have expected an explanation (as a reply
>> to my comments, not necessarily by adding to description or
>> comments of the patch here) why either this is not an issue or I'm
>> misreading anything.
> 
> max_mapped_pfn _should_ end up being >= the logdirty range upper bound,
> since AFAICT the logdirty ranges are tied to ept_set_entry() calls,
> which always end up calling p2m_altp2m_propagate_change() when they
> occur on the hostp2m (which in turn calls p2m_set_entry() on the
> altp2ms, and so on).

Altp2m-s don't matter here at all. My point is that the present,
unpatched p2m_change_type_range() updates the log-dirty
ranges with the unclipped [start,end), but calls
p2m->change_entry_type_range() with a possibly reduced
range. Any subsequent caller of p2m_is_logdirty_range() may
thus be mislead if the rangeset update now also used only the
clipped range.

> Also, apologies if I'm speaking out of turn and the question was really
> addressed to George (who is the proper author of the patch).

That's okay, since you've made the change in question. I did
expect George to reply to my comments, but you volunteering
to take care of addressing them is fine with me.

Jan



___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] [PATCH 17/18] xen/arm: Resume Dom0 after Xen resumes

2018-11-29 Thread Mirela Simonovic

Hi Julien,

On Tue, Nov 27, 2018 at 7:36 PM Julien Grall  wrote:
>
>
>
> On 11/17/18 4:01 PM, Mirela Simonovic wrote:
> > Hi,
>
> Hi Mirela,
>
> >
> > On Sat, Nov 17, 2018 at 12:06 AM Stefano Stabellini
> >  wrote:
> >>
> >> On Sat, 17 Nov 2018, Dario Faggioli wrote:
> >>> On Fri, 2018-11-16 at 21:58 +, Julien Grall wrote:
>  On 16/11/2018 21:41, Mirela Simonovic wrote:
> > On Fri, Nov 16, 2018 at 8:09 PM Stefano Stabellini
> >  wrote:
> >>> It should be possible to figure out which domain needs to
> >>> awaken from
> >>> there.
> >>
> >> Actually, evtchn_send eventually will trigger a proper interrupt
> >> injection into the domain
> >> (xen/arch/arm/vgic.c:arch_evtchn_inject),
> >> which will necessarely wake it up. So it is possible that it will
> >> already work without any need for additional changes?
> >>
> >
> > Absolutely, that sounds great :) Then we could just drop this
> > patch.
> 
>  I don't think you can drop this patch... As you tie the host suspend
>  to
>  the hardware domain suspend, it may makes sense to resume at the same
>  time.
> 
> >>> FWIW, I think that too.
> >>>
> >>> In fact, let's assume a *fully* disaggregated setup, where dom0 only
> >>> has the toolstack, while it has no hardware, no PV backend, etc... If
> >>> we don't resume it explicitly together with Xen, who is going to resume
> >>> it? :-O
> >>
> >> Yes, that's right. However, it should work for driver domains: there is
> >> no need to wake up driver domains explicitly because they will be
> >> woken up by the frontends?
> >>
> >
> > I think we all agree, except some of us weren't so clear about it :)
> > For now, dom0 issues suspend and should resume as well when Xen
> > suspends. This is done in the series, resume is covered by this patch,
> > and commit message should be clarified.
> >
> > If a domU has a backend, we should verify that it can be woken-up by
> > an event triggered by a frontend driver in another domain.
> >
> > One day, this patch could be dropped/reverted if one come up with a
> > different logic for triggering Xen suspend. This should be of the
> > table for now, but a good option to remember for future.
>
> Such change cannot be easily dropped because some hardware domain OS may
> rely on that behavior.
>
> I am also interested to see how this is going to fit with the Dom0less
> use case. The end goal is to have no Dom0/Hardware domain. So how do you
> expect suspend/resume to work in that case? Note that I am not asking to
> implement it :).
>

From the implementation perspective and future changes which would be
required in this series it's not going to be the problem to support
dom0less approach - just one line of code (the if statement that
checks whether the hardware domain suspended) has to be replaced with
some other check. That other check would be a new condition to suspend
Xen that needs to be implemented. What that check would do depends on
the system architecture and target use cases that are specific to
dom0less architecture. I'm not so familiar with dom0less work, so
can't really say what would be the best approach to suspend condition
rules.
Do you have some whitepaper or anything similar that describes an
example of a target system architecture and/or use cases for dom0less
setup?
In dom0less world, who creates other domains? Is some domain still
more privileged than the others? E.g. is there a domain which knows
about other domains in the system so that it could do the coordination
in EL1?

Thanks,
Mirela

> Cheers,
>
> --
> Julien Grall

___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] [Qemu-devel] [PATCH 17/18] MAINTAINERS: add myself as a Xen maintainer

2018-11-29 Thread Philippe Mathieu-Daudé

On 21/11/18 16:12, Paul Durrant wrote:
> I have made many significant contributions to the Xen code in QEMU,
> particularly the recent patches introducing a new PV device framework.
> I intend to make further significant contributions, porting other PV back-
> ends to the new framework with the intent of eventually removing the
> legacy code. It therefore seems reasonable that I become a maintiner of

"a maintainer of"

> the Xen code.
> 
> Signed-off-by: Paul Durrant 
> ---
> Cc: Stefano Stabellini 
> Cc: Anthony Perard 
> Cc: Paolo Bonzini 
> ---
>  MAINTAINERS | 1 +
>  1 file changed, 1 insertion(+)
> 
> diff --git a/MAINTAINERS b/MAINTAINERS
> index 5871f035c3..0b668dd205 100644
> --- a/MAINTAINERS
> +++ b/MAINTAINERS
> @@ -382,6 +382,7 @@ Guest CPU Cores (Xen):
>  X86
>  M: Stefano Stabellini 
>  M: Anthony Perard 
> +M: Paul Durrant 
>  L: xen-devel@lists.xenproject.org
>  S: Supported
>  F: */xen*
> 

___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] [Qemu-devel] [PATCH 17/18] MAINTAINERS: add myself as a Xen maintainer

2018-11-29 Thread Paul Durrant

> -Original Message-
> From: Philippe Mathieu-Daudé [mailto:phi...@redhat.com]
> Sent: 29 November 2018 14:01
> To: Paul Durrant ; qemu-bl...@nongnu.org; qemu-
> de...@nongnu.org; xen-devel@lists.xenproject.org
> Cc: Anthony Perard ; Paolo Bonzini
> ; Stefano Stabellini 
> Subject: Re: [Qemu-devel] [PATCH 17/18] MAINTAINERS: add myself as a Xen
> maintainer
> 
> On 21/11/18 16:12, Paul Durrant wrote:
> > I have made many significant contributions to the Xen code in QEMU,
> > particularly the recent patches introducing a new PV device framework.
> > I intend to make further significant contributions, porting other PV
> back-
> > ends to the new framework with the intent of eventually removing the
> > legacy code. It therefore seems reasonable that I become a maintiner of
> 
> "a maintainer of"

Yes, of course. I'll correct it in v2.

  Paul

> 
> > the Xen code.
> >
> > Signed-off-by: Paul Durrant 
> > ---
> > Cc: Stefano Stabellini 
> > Cc: Anthony Perard 
> > Cc: Paolo Bonzini 
> > ---
> >  MAINTAINERS | 1 +
> >  1 file changed, 1 insertion(+)
> >
> > diff --git a/MAINTAINERS b/MAINTAINERS
> > index 5871f035c3..0b668dd205 100644
> > --- a/MAINTAINERS
> > +++ b/MAINTAINERS
> > @@ -382,6 +382,7 @@ Guest CPU Cores (Xen):
> >  X86
> >  M: Stefano Stabellini 
> >  M: Anthony Perard 
> > +M: Paul Durrant 
> >  L: xen-devel@lists.xenproject.org
> >  S: Supported
> >  F: */xen*
> >
___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

[Xen-devel] [PATCH 4.4 29/86] x86/entry/64: Remove %ebx handling from error_entry/exit

2018-11-29 Thread Greg Kroah-Hartman

4.4-stable review patch.  If anyone has any objections, please let me know.

--

[ Upstream commit b3681dd548d06deb2e1573890829dff4b15abf46 ]

error_entry and error_exit communicate the user vs. kernel status of
the frame using %ebx.  This is unnecessary -- the information is in
regs->cs.  Just use regs->cs.

This makes error_entry simpler and makes error_exit more robust.

It also fixes a nasty bug.  Before all the Spectre nonsense, the
xen_failsafe_callback entry point returned like this:

ALLOC_PT_GPREGS_ON_STACK
SAVE_C_REGS
SAVE_EXTRA_REGS
ENCODE_FRAME_POINTER
jmp error_exit

And it did not go through error_entry.  This was bogus: RBX
contained garbage, and error_exit expected a flag in RBX.

Fortunately, it generally contained *nonzero* garbage, so the
correct code path was used.  As part of the Spectre fixes, code was
added to clear RBX to mitigate certain speculation attacks.  Now,
depending on kernel configuration, RBX got zeroed and, when running
some Wine workloads, the kernel crashes.  This was introduced by:

commit 3ac6d8c787b8 ("x86/entry/64: Clear registers for 
exceptions/interrupts, to reduce speculation attack surface")

With this patch applied, RBX is no longer needed as a flag, and the
problem goes away.

I suspect that malicious userspace could use this bug to crash the
kernel even without the offending patch applied, though.

[ Historical note: I wrote this patch as a cleanup before I was aware
  of the bug it fixed. ]

[ Note to stable maintainers: this should probably get applied to all
  kernels.  If you're nervous about that, a more conservative fix to
  add xorl %ebx,%ebx; incl %ebx before the jump to error_exit should
  also fix the problem. ]

Reported-and-tested-by: M. Vefa Bicakci 
Signed-off-by: Andy Lutomirski 
Cc: Boris Ostrovsky 
Cc: Borislav Petkov 
Cc: Brian Gerst 
Cc: Dave Hansen 
Cc: Denys Vlasenko 
Cc: Dominik Brodowski 
Cc: Greg KH 
Cc: H. Peter Anvin 
Cc: Josh Poimboeuf 
Cc: Juergen Gross 
Cc: Linus Torvalds 
Cc: Peter Zijlstra 
Cc: Thomas Gleixner 
Cc: sta...@vger.kernel.org
Cc: xen-devel@lists.xenproject.org
Fixes: 3ac6d8c787b8 ("x86/entry/64: Clear registers for exceptions/interrupts, 
to reduce speculation attack surface")
Link: 
http://lkml.kernel.org/r/b5010a090d3586b2d6e06c7ad3ec5542d1241c45.1532282627.git.l...@kernel.org
Signed-off-by: Ingo Molnar 
Signed-off-by: Sasha Levin 
---
 arch/x86/entry/entry_64.S | 20 
 1 file changed, 4 insertions(+), 16 deletions(-)

diff --git a/arch/x86/entry/entry_64.S b/arch/x86/entry/entry_64.S
index b569b46660fc..375ed605c83d 100644
--- a/arch/x86/entry/entry_64.S
+++ b/arch/x86/entry/entry_64.S
@@ -856,7 +856,7 @@ ENTRY(\sym)
 
call\do_sym
 
-   jmp error_exit  /* %ebx: no swapgs flag */
+   jmp error_exit
.endif
 END(\sym)
 .endm
@@ -1118,7 +1118,6 @@ END(paranoid_exit)
 
 /*
  * Save all registers in pt_regs, and switch gs if needed.
- * Return: EBX=0: came from user mode; EBX=1: otherwise
  */
 ENTRY(error_entry)
cld
@@ -1131,7 +1130,6 @@ ENTRY(error_entry)
 * the kernel CR3 here.
 */
SWITCH_KERNEL_CR3
-   xorl%ebx, %ebx
testb   $3, CS+8(%rsp)
jz  .Lerror_kernelspace
 
@@ -1165,7 +1163,6 @@ ENTRY(error_entry)
 * for these here too.
 */
 .Lerror_kernelspace:
-   incl%ebx
leaqnative_irq_return_iret(%rip), %rcx
cmpq%rcx, RIP+8(%rsp)
je  .Lerror_bad_iret
@@ -1196,28 +1193,19 @@ ENTRY(error_entry)
 
/*
 * Pretend that the exception came from user mode: set up pt_regs
-* as if we faulted immediately after IRET and clear EBX so that
-* error_exit knows that we will be returning to user mode.
+* as if we faulted immediately after IRET.
 */
mov %rsp, %rdi
callfixup_bad_iret
mov %rax, %rsp
-   decl%ebx
jmp .Lerror_entry_from_usermode_after_swapgs
 END(error_entry)
 
-
-/*
- * On entry, EBX is a "return to kernel mode" flag:
- *   1: already in kernel mode, don't need SWAPGS
- *   0: user gsbase is loaded, we need SWAPGS and standard preparation for 
return to usermode
- */
 ENTRY(error_exit)
-   movl%ebx, %eax
DISABLE_INTERRUPTS(CLBR_NONE)
TRACE_IRQS_OFF
-   testl   %eax, %eax
-   jnz retint_kernel
+   testb   $3, CS(%rsp)
+   jz  retint_kernel
jmp retint_user
 END(error_exit)
 
-- 
2.17.1




___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] Linux 4.19.5 fails to boot as Xen dom0

2018-11-29 Thread Kirill A. Shutemov

On Thu, Nov 29, 2018 at 01:35:17PM +, Juergen Gross wrote:
> On 29/11/2018 14:26, Kirill A. Shutemov wrote:
> > On Thu, Nov 29, 2018 at 09:41:25AM +, Juergen Gross wrote:
> >> On 29/11/2018 02:22, Hans van Kranenburg wrote:
> >>> Hi,
> >>>
> >>> As also seen at:
> >>> https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=914951
> >>>
> >>> Attached there are two serial console output logs. One is starting with
> >>> Xen 4.11 (from debian unstable) as dom0, and the other one without Xen.
> >>>
> >>> [2.085543] BUG: unable to handle kernel paging request at
> >>> 888d9fffc000
> >>> [2.085610] PGD 200c067 P4D 200c067 PUD 0
> >>> [2.085674] Oops:  [#1] SMP NOPTI
> >>> [2.085736] CPU: 1 PID: 1 Comm: swapper/0 Not tainted
> >>> 4.19.0-trunk-amd64 #1 Debian 4.19.5-1~exp1+pvh1
> >>> [2.085823] Hardware name: HP ProLiant DL360 G7, BIOS P68 05/21/2018
> >>> [2.085895] RIP: e030:ptdump_walk_pgd_level_core+0x1fd/0x490
> >>> [...]
> >>
> >> The offending stable commit is 4074ca7d8a1832921c865d250bbd08f3441b3657
> >> ("x86/mm: Move LDT remap out of KASLR region on 5-level paging"), this
> >> is commit d52888aa2753e3063a9d3a0c9f72f94aa9809c15 upstream.
> >>
> >> Current upstream kernel is booting fine under Xen, so in general the
> >> patch should be fine. Using an upstream kernel built from above commit
> >> (with the then needed Xen fixup patch 1457d8cf7664f34c4ba534) is fine,
> >> too.
> >>
> >> Kirill, are you aware of any prerequisite patch from 4.20 which could be
> >> missing in 4.19.5?
> > 
> > I'm not.
> > 
> > Let me look into this.
> > 
> 
> What is making me suspicious is the failure happening just after
> releasing the init memory. Maybe there is an access to .init.data
> segment or similar? The native kernel booting could be related to the
> usage of 2M mappings not being available in a PV-domain.

Sounds like a valid hypothesis.

[ 2.085616] Code: 00 00 00 00 40 00 00 49 83 c5 08 48 01 04 24 4c 3b 6c 24 48 
0f 84 83 02 00 00 48 8b 04 24 48 c1 f8
 10 48 89 84 24 88 00 00 00 <49> 8b 7d 00 48 f7 c7 9f ff ff ff 0f 85 36 ff ff 
ff 41 b8 03 00 00
All code

   0:   00 00   add%al,(%rax)
   2:   00 00   add%al,(%rax)
   4:   40 00 00add%al,(%rax)
   7:   49 83 c5 08 add$0x8,%r13
   b:   48 01 04 24 add%rax,(%rsp)
   f:   4c 3b 6c 24 48  cmp0x48(%rsp),%r13
  14:   0f 84 83 02 00 00   je 0x29d
  1a:   48 8b 04 24 mov(%rsp),%rax
  1e:   48 c1 f8 10 sar$0x10,%rax
  22:   48 89 84 24 88 00 00mov%rax,0x88(%rsp)
  29:   00
  2a:*  49 8b 7d 00 mov0x0(%r13),%rdi   <-- trapping 
instruction
  2e:   48 f7 c7 9f ff ff fftest   $0xff9f,%rdi
  35:   0f 85 36 ff ff ff   jne0xff71
  3b:   41  rex.B
  3c:   b8  .byte 0xb8
  3d:   03 00   add(%rax),%eax
...

Code starting with the faulting instruction
===
   0:   49 8b 7d 00 mov0x0(%r13),%rdi
   4:   48 f7 c7 9f ff ff fftest   $0xff9f,%rdi
   b:   0f 85 36 ff ff ff   jne0xff47
  11:   41  rex.B
  12:   b8  .byte 0xb8
  13:   03 00   add(%rax),%eax
...

Reading from %r13 causes the fault.

I don't have a setup to reproduce the issue myself and have hard time
correlate the code with source.

What is ptdump_walk_pgd_level_core+0x1fd/0x490 for you?

-- 
 Kirill A. Shutemov

___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] Linux 4.19.5 fails to boot as Xen dom0

2018-11-29 Thread Kirill A. Shutemov

On Thu, Nov 29, 2018 at 01:35:17PM +, Juergen Gross wrote:
> On 29/11/2018 14:26, Kirill A. Shutemov wrote:
> > On Thu, Nov 29, 2018 at 09:41:25AM +, Juergen Gross wrote:
> >> On 29/11/2018 02:22, Hans van Kranenburg wrote:
> >>> Hi,
> >>>
> >>> As also seen at:
> >>> https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=914951
> >>>
> >>> Attached there are two serial console output logs. One is starting with
> >>> Xen 4.11 (from debian unstable) as dom0, and the other one without Xen.
> >>>
> >>> [2.085543] BUG: unable to handle kernel paging request at
> >>> 888d9fffc000
> >>> [2.085610] PGD 200c067 P4D 200c067 PUD 0
> >>> [2.085674] Oops:  [#1] SMP NOPTI
> >>> [2.085736] CPU: 1 PID: 1 Comm: swapper/0 Not tainted
> >>> 4.19.0-trunk-amd64 #1 Debian 4.19.5-1~exp1+pvh1
> >>> [2.085823] Hardware name: HP ProLiant DL360 G7, BIOS P68 05/21/2018
> >>> [2.085895] RIP: e030:ptdump_walk_pgd_level_core+0x1fd/0x490
> >>> [...]
> >>
> >> The offending stable commit is 4074ca7d8a1832921c865d250bbd08f3441b3657
> >> ("x86/mm: Move LDT remap out of KASLR region on 5-level paging"), this
> >> is commit d52888aa2753e3063a9d3a0c9f72f94aa9809c15 upstream.
> >>
> >> Current upstream kernel is booting fine under Xen, so in general the
> >> patch should be fine. Using an upstream kernel built from above commit
> >> (with the then needed Xen fixup patch 1457d8cf7664f34c4ba534) is fine,
> >> too.
> >>
> >> Kirill, are you aware of any prerequisite patch from 4.20 which could be
> >> missing in 4.19.5?
> > 
> > I'm not.
> > 
> > Let me look into this.
> > 
> 
> What is making me suspicious is the failure happening just after
> releasing the init memory. Maybe there is an access to .init.data
> segment or similar? The native kernel booting could be related to the
> usage of 2M mappings not being available in a PV-domain.

Ahh.. Could you test this:

diff --git a/arch/x86/mm/dump_pagetables.c b/arch/x86/mm/dump_pagetables.c
index a12afff146d1..7dec63ec7aab 100644
--- a/arch/x86/mm/dump_pagetables.c
+++ b/arch/x86/mm/dump_pagetables.c
@@ -496,7 +496,7 @@ static inline bool is_hypervisor_range(int idx)
 * 8000 - 87ff is reserved for
 * the hypervisor.
 */
-   return  (idx >= pgd_index(__PAGE_OFFSET) - 16) &&
+   return  (idx >= pgd_index(__PAGE_OFFSET) - 17) &&
(idx <  pgd_index(__PAGE_OFFSET));
 #else
return false;
-- 
 Kirill A. Shutemov

___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] vpci: deferral of register write until p2m changes are done

2018-11-29 Thread Jan Beulich

>>> On 29.11.18 at 14:44,  wrote:
> I cannot see how interactions with a device with half-mapped BARs
> could trigger aborts that cannot be triggered when the device has
> fully mapped BARs. Ie: if there's indeed a way to trigger such aborts
> it would also be possible to do so with fully mapped BARs.

Whether comparing with the fully mapped case or the fully
unmapped one is more relevant I can't tell. In any event I
don't think we're going to get anywhere here without either
someone chiming in who has better PCIe knowledge than
me, or without you submitting a patch with a sufficiently
convincing description.

Jan

___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] Linux 4.19.5 fails to boot as Xen dom0

2018-11-29 Thread Kirill A. Shutemov

On Thu, Nov 29, 2018 at 02:24:47PM +, Kirill A. Shutemov wrote:
> On Thu, Nov 29, 2018 at 01:35:17PM +, Juergen Gross wrote:
> > On 29/11/2018 14:26, Kirill A. Shutemov wrote:
> > > On Thu, Nov 29, 2018 at 09:41:25AM +, Juergen Gross wrote:
> > >> On 29/11/2018 02:22, Hans van Kranenburg wrote:
> > >>> Hi,
> > >>>
> > >>> As also seen at:
> > >>> https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=914951
> > >>>
> > >>> Attached there are two serial console output logs. One is starting with
> > >>> Xen 4.11 (from debian unstable) as dom0, and the other one without Xen.
> > >>>
> > >>> [2.085543] BUG: unable to handle kernel paging request at
> > >>> 888d9fffc000
> > >>> [2.085610] PGD 200c067 P4D 200c067 PUD 0
> > >>> [2.085674] Oops:  [#1] SMP NOPTI
> > >>> [2.085736] CPU: 1 PID: 1 Comm: swapper/0 Not tainted
> > >>> 4.19.0-trunk-amd64 #1 Debian 4.19.5-1~exp1+pvh1
> > >>> [2.085823] Hardware name: HP ProLiant DL360 G7, BIOS P68 05/21/2018
> > >>> [2.085895] RIP: e030:ptdump_walk_pgd_level_core+0x1fd/0x490
> > >>> [...]
> > >>
> > >> The offending stable commit is 4074ca7d8a1832921c865d250bbd08f3441b3657
> > >> ("x86/mm: Move LDT remap out of KASLR region on 5-level paging"), this
> > >> is commit d52888aa2753e3063a9d3a0c9f72f94aa9809c15 upstream.
> > >>
> > >> Current upstream kernel is booting fine under Xen, so in general the
> > >> patch should be fine. Using an upstream kernel built from above commit
> > >> (with the then needed Xen fixup patch 1457d8cf7664f34c4ba534) is fine,
> > >> too.
> > >>
> > >> Kirill, are you aware of any prerequisite patch from 4.20 which could be
> > >> missing in 4.19.5?
> > > 
> > > I'm not.
> > > 
> > > Let me look into this.
> > > 
> > 
> > What is making me suspicious is the failure happening just after
> > releasing the init memory. Maybe there is an access to .init.data
> > segment or similar? The native kernel booting could be related to the
> > usage of 2M mappings not being available in a PV-domain.
> 
> Ahh.. Could you test this:
> 
> diff --git a/arch/x86/mm/dump_pagetables.c b/arch/x86/mm/dump_pagetables.c
> index a12afff146d1..7dec63ec7aab 100644
> --- a/arch/x86/mm/dump_pagetables.c
> +++ b/arch/x86/mm/dump_pagetables.c
> @@ -496,7 +496,7 @@ static inline bool is_hypervisor_range(int idx)
>* 8000 - 87ff is reserved for
>* the hypervisor.
>*/
> - return  (idx >= pgd_index(__PAGE_OFFSET) - 16) &&
> + return  (idx >= pgd_index(__PAGE_OFFSET) - 17) &&
>   (idx <  pgd_index(__PAGE_OFFSET));
>  #else
>   return false;

Or, better, this:

diff --git a/arch/x86/mm/dump_pagetables.c b/arch/x86/mm/dump_pagetables.c
index a12afff146d1..8c04fadc4423 100644
--- a/arch/x86/mm/dump_pagetables.c
+++ b/arch/x86/mm/dump_pagetables.c
@@ -496,8 +496,8 @@ static inline bool is_hypervisor_range(int idx)
 * 8000 - 87ff is reserved for
 * the hypervisor.
 */
-   return  (idx >= pgd_index(__PAGE_OFFSET) - 16) &&
-   (idx <  pgd_index(__PAGE_OFFSET));
+   return  (idx >= pgd_index(LDT_BASE_ADDR) - 16) &&
+   (idx <  pgd_index(LDT_BASE_ADDR));
 #else
return false;
 #endif
diff --git a/arch/x86/xen/mmu_pv.c b/arch/x86/xen/mmu_pv.c
index 2c84c6ad8b50..b078a5b0ac91 100644
--- a/arch/x86/xen/mmu_pv.c
+++ b/arch/x86/xen/mmu_pv.c
@@ -652,7 +652,7 @@ static int __xen_pgd_walk(struct mm_struct *mm, pgd_t *pgd,
 * will end up making a zero-sized hole and so is a no-op.
 */
hole_low = pgd_index(USER_LIMIT);
-   hole_high = pgd_index(PAGE_OFFSET);
+   hole_high = pgd_index(LDT_BASE_ADDR);
 
nr = pgd_index(limit) + 1;
for (i = 0; i < nr; i++) {
-- 
 Kirill A. Shutemov

___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] [PATCH v5 13/13] IOMMU: patch certain indirect calls to direct ones

2018-11-29 Thread Wei Liu

On Thu, Nov 08, 2018 at 09:17:06AM -0700, Jan Beulich wrote:
> This is intentionally not touching hooks used rarely (or not at all)
> during the lifetime of a VM, unless perhaps sitting on an error path
> next to a call which gets changed (in which case I think the error
> path better remains consistent with the respective main path).
> 
> Signed-off-by: Jan Beulich 

Reviewed-by: Wei Liu 

___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

[Xen-devel] [PATCH] x86/shadow: don't enable shadow mode with too small a shadow allocation

2018-11-29 Thread Jan Beulich

We've had more than one report of host crashes after failed migration,
and in at least one case we've had a hint towards a too far shrunk
shadow allocation pool. Instead of just checking the pool for being
empty, check whether the pool is smaller than what
shadow_set_allocation() would minimally bump it to if it was invoked in
the first place.

Signed-off-by: Jan Beulich 

--- a/xen/arch/x86/mm/shadow/common.c
+++ b/xen/arch/x86/mm/shadow/common.c
@@ -977,7 +977,7 @@ const u8 sh_type_to_size[] = {
  * allow for more than ninety allocated pages per vcpu.  We round that
  * up to 128 pages, or half a megabyte per vcpu, and add 1 more vcpu's
  * worth to make sure we never return zero. */
-static unsigned int shadow_min_acceptable_pages(struct domain *d)
+static unsigned int shadow_min_acceptable_pages(const struct domain *d)
 {
 return (d->max_vcpus + 1) * 128;
 }
@@ -1369,6 +1369,15 @@ shadow_free_p2m_page(struct domain *d, s
 paging_unlock(d);
 }
 
+static unsigned int sh_min_allocation(const struct domain *d)
+{
+/*
+ * Don't allocate less than the minimum acceptable, plus one page per
+ * megabyte of RAM (for the p2m table).
+ */
+return shadow_min_acceptable_pages(d) + (d->tot_pages / 256);
+}
+
 int shadow_set_allocation(struct domain *d, unsigned int pages, bool 
*preempted)
 {
 struct page_info *sp;
@@ -1384,9 +1393,7 @@ int shadow_set_allocation(struct domain
 else
 pages -= d->arch.paging.shadow.p2m_pages;
 
-/* Don't allocate less than the minimum acceptable, plus one page per
- * megabyte of RAM (for the p2m table) */
-lower_bound = shadow_min_acceptable_pages(d) + (d->tot_pages / 256);
+lower_bound = sh_min_allocation(d);
 if ( pages < lower_bound )
 pages = lower_bound;
 }
@@ -2712,7 +2719,7 @@ int shadow_enable(struct domain *d, u32
 
 /* Init the shadow memory allocation if the user hasn't done so */
 old_pages = d->arch.paging.shadow.total_pages;
-if ( old_pages == 0 )
+if ( old_pages < sh_min_allocation(d) + d->arch.paging.shadow.p2m_pages )
 {
 paging_lock(d);
 rv = shadow_set_allocation(d, 1024, NULL); /* Use at least 4MB */





___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] [PATCH v5 01/13] x86: reduce general stack alignment to 8

2018-11-29 Thread Wei Liu

On Thu, Nov 08, 2018 at 09:05:45AM -0700, Jan Beulich wrote:
> We don't need bigger alignment except when calling EFI boot or runtime
> services functions (and we don't guarantee that either, as explained
> close to the top of xen/common/efi/runtime.c in the struct efi_rs_state
> declaration). Hence if the compiler supports reducing stack alignment
> from the ABI compatible 16 bytes (gcc 7 and newer), do so wherever
> possible.
> 
> The EFI case itself is largely dealt with already (actually forcing
> 32-byte alignment) as a result of commit f6b7fedc89 ("x86/EFI: meet
> further spec requirements for runtime calls"). However, as explained in
> the description of that earlier change, without using
> -mincoming-stack-boundary=3 (which we don't want) we still have to make
> the compiler assume 16-byte stack boundaries for CUs making EFI calls in
> order to keep the compiler from aligning the stack, but then placing an
> odd number of 8-byte objects on it, resulting in a mis-aligned outgoing
> stack.
> 
> This as a side effect yields some code size reduction, since for a
> number of sufficiently simple non-leaf functions the stack adjustment
> (by 8, when there are no local stack variables at all) gets dropped
> altogether. I notice exceptions though, for example in guest_cpuid(),
> where in a release build gcc 8.2 now decides to set up a frame pointer
> (without ever using %rbp); I consider this a compiler quirk which we
> should leave to the compiler folks to address eventually.
> 
> Signed-off-by: Jan Beulich 

The code does what it says it does, that's all I can without having gone
through EFI spec.

Since you're the EFI maintainer, you have the final say on this matter.
Not sure if you're expecting anything else from Andrew or me.

Wei.

___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] [PATCH RFC 5/6] xen-access: add support for slotted channel vm_events

2018-11-29 Thread Petre Ovidiu PIRCALABU

On Wed, 2018-11-28 at 21:24 +0200, Razvan Cojocaru wrote:
> On 11/28/18 5:29 PM, Petre Pircalabu wrote:
> > Signed-off-by: Petre Pircalabu 
> > ---
> > 
> > +static int xenaccess_evtchn_bind(xenaccess_t *xenaccess)
> > +{
> +int rc, i = 0;
> > +
> > +rc = xenaccess_evtchn_bind_port(xenaccess->vm_event.ring-
> > >evtchn_port,
> > +xenaccess->vm_event.domain_id,
> > +&xenaccess->vm_event.ring-
> > >xce_handle,
> > +&xenaccess->vm_event.ring-
> > >port);
> > +if ( rc < 0 )
> > +{
> > +ERROR("Failed to bind ring events\n");
> > +return rc;
> > +}
> +
> +if ( xenaccess->vm_event.channel == NULL)
> > +return 0;
> > +
> > +for ( i = 0; i < xenaccess->vm_event.num_vcpus; i++ )
> > +{
> > +rc = xenaccess_evtchn_bind_port(xenaccess-
> > >vm_event.channel->evtchn_ports[i],
> > +xenaccess-
> > >vm_event.domain_id,
> > +&xenaccess-
> > >vm_event.channel->xce_handles[i],
> > +&xenaccess-
> > >vm_event.channel->ports[i]);
> > +if ( rc < 0 )
> >  {
> > -ERROR("Failed to unmask event channel port");
> > +ERROR("Failed to bind channel events\n");
> >  goto err;
> >  }
> >  }
> > -else
> > -port = -1;
> >  
> > -return port;
> > +evtchn_bind = true;
> > +return 0;
> >  
> > - err:
> > -return -errno;
> > +err:
> > +xenaccess_evtchn_unbind_port(xenaccess->vm_event.ring-
> > >evtchn_port,
> > + &xenaccess->vm_event.ring-
> > >xce_handle,
> + &xenaccess->vm_event.ring->port);
> > +
> > +for ( i--; i >= 0; i-- )
> 
> This for() looks peculiar.
> 
In case of an error "i" will point to the xce_handles index for which
"bind" failed. "unbind" has to be called for the xce_handles 0 to (i-1) 
.
> 
> Thanks,
> Razvan

___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] Linux 4.19.5 fails to boot as Xen dom0

2018-11-29 Thread Juergen Gross

On 29/11/2018 15:32, Kirill A. Shutemov wrote:
> On Thu, Nov 29, 2018 at 02:24:47PM +, Kirill A. Shutemov wrote:
>> On Thu, Nov 29, 2018 at 01:35:17PM +, Juergen Gross wrote:
>>> On 29/11/2018 14:26, Kirill A. Shutemov wrote:
 On Thu, Nov 29, 2018 at 09:41:25AM +, Juergen Gross wrote:
> On 29/11/2018 02:22, Hans van Kranenburg wrote:
>> Hi,
>>
>> As also seen at:
>> https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=914951
>>
>> Attached there are two serial console output logs. One is starting with
>> Xen 4.11 (from debian unstable) as dom0, and the other one without Xen.
>>
>> [2.085543] BUG: unable to handle kernel paging request at
>> 888d9fffc000
>> [2.085610] PGD 200c067 P4D 200c067 PUD 0
>> [2.085674] Oops:  [#1] SMP NOPTI
>> [2.085736] CPU: 1 PID: 1 Comm: swapper/0 Not tainted
>> 4.19.0-trunk-amd64 #1 Debian 4.19.5-1~exp1+pvh1
>> [2.085823] Hardware name: HP ProLiant DL360 G7, BIOS P68 05/21/2018
>> [2.085895] RIP: e030:ptdump_walk_pgd_level_core+0x1fd/0x490
>> [...]
>
> The offending stable commit is 4074ca7d8a1832921c865d250bbd08f3441b3657
> ("x86/mm: Move LDT remap out of KASLR region on 5-level paging"), this
> is commit d52888aa2753e3063a9d3a0c9f72f94aa9809c15 upstream.
>
> Current upstream kernel is booting fine under Xen, so in general the
> patch should be fine. Using an upstream kernel built from above commit
> (with the then needed Xen fixup patch 1457d8cf7664f34c4ba534) is fine,
> too.
>
> Kirill, are you aware of any prerequisite patch from 4.20 which could be
> missing in 4.19.5?

 I'm not.

 Let me look into this.

>>>
>>> What is making me suspicious is the failure happening just after
>>> releasing the init memory. Maybe there is an access to .init.data
>>> segment or similar? The native kernel booting could be related to the
>>> usage of 2M mappings not being available in a PV-domain.
>>
>> Ahh.. Could you test this:
>>
>> diff --git a/arch/x86/mm/dump_pagetables.c b/arch/x86/mm/dump_pagetables.c
>> index a12afff146d1..7dec63ec7aab 100644
>> --- a/arch/x86/mm/dump_pagetables.c
>> +++ b/arch/x86/mm/dump_pagetables.c
>> @@ -496,7 +496,7 @@ static inline bool is_hypervisor_range(int idx)
>>   * 8000 - 87ff is reserved for
>>   * the hypervisor.
>>   */
>> -return  (idx >= pgd_index(__PAGE_OFFSET) - 16) &&
>> +return  (idx >= pgd_index(__PAGE_OFFSET) - 17) &&
>>  (idx <  pgd_index(__PAGE_OFFSET));
>>  #else
>>  return false;
> 
> Or, better, this:

That makes it boot again!

Any idea why upstream doesn't need it?


Juergen

> 
> diff --git a/arch/x86/mm/dump_pagetables.c b/arch/x86/mm/dump_pagetables.c
> index a12afff146d1..8c04fadc4423 100644
> --- a/arch/x86/mm/dump_pagetables.c
> +++ b/arch/x86/mm/dump_pagetables.c
> @@ -496,8 +496,8 @@ static inline bool is_hypervisor_range(int idx)
>* 8000 - 87ff is reserved for
>* the hypervisor.
>*/
> - return  (idx >= pgd_index(__PAGE_OFFSET) - 16) &&
> - (idx <  pgd_index(__PAGE_OFFSET));
> + return  (idx >= pgd_index(LDT_BASE_ADDR) - 16) &&
> + (idx <  pgd_index(LDT_BASE_ADDR));
>  #else
>   return false;
>  #endif
> diff --git a/arch/x86/xen/mmu_pv.c b/arch/x86/xen/mmu_pv.c
> index 2c84c6ad8b50..b078a5b0ac91 100644
> --- a/arch/x86/xen/mmu_pv.c
> +++ b/arch/x86/xen/mmu_pv.c
> @@ -652,7 +652,7 @@ static int __xen_pgd_walk(struct mm_struct *mm, pgd_t 
> *pgd,
>* will end up making a zero-sized hole and so is a no-op.
>*/
>   hole_low = pgd_index(USER_LIMIT);
> - hole_high = pgd_index(PAGE_OFFSET);
> + hole_high = pgd_index(LDT_BASE_ADDR);
>  
>   nr = pgd_index(limit) + 1;
>   for (i = 0; i < nr; i++) {
> 


___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] [PATCH v5 01/13] x86: reduce general stack alignment to 8

2018-11-29 Thread Jan Beulich

>>> On 29.11.18 at 15:54,  wrote:
> On Thu, Nov 08, 2018 at 09:05:45AM -0700, Jan Beulich wrote:
>> We don't need bigger alignment except when calling EFI boot or runtime
>> services functions (and we don't guarantee that either, as explained
>> close to the top of xen/common/efi/runtime.c in the struct efi_rs_state
>> declaration). Hence if the compiler supports reducing stack alignment
>> from the ABI compatible 16 bytes (gcc 7 and newer), do so wherever
>> possible.
>> 
>> The EFI case itself is largely dealt with already (actually forcing
>> 32-byte alignment) as a result of commit f6b7fedc89 ("x86/EFI: meet
>> further spec requirements for runtime calls"). However, as explained in
>> the description of that earlier change, without using
>> -mincoming-stack-boundary=3 (which we don't want) we still have to make
>> the compiler assume 16-byte stack boundaries for CUs making EFI calls in
>> order to keep the compiler from aligning the stack, but then placing an
>> odd number of 8-byte objects on it, resulting in a mis-aligned outgoing
>> stack.
>> 
>> This as a side effect yields some code size reduction, since for a
>> number of sufficiently simple non-leaf functions the stack adjustment
>> (by 8, when there are no local stack variables at all) gets dropped
>> altogether. I notice exceptions though, for example in guest_cpuid(),
>> where in a release build gcc 8.2 now decides to set up a frame pointer
>> (without ever using %rbp); I consider this a compiler quirk which we
>> should leave to the compiler folks to address eventually.
>> 
>> Signed-off-by: Jan Beulich 
> 
> The code does what it says it does, that's all I can without having gone
> through EFI spec.
> 
> Since you're the EFI maintainer, you have the final say on this matter.
> Not sure if you're expecting anything else from Andrew or me.

Well, since the change affects all x86 code, not just the EFI pieces,
an ack allowing me to commit this would be nice.

Jan



___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] Linux 4.19.5 fails to boot as Xen dom0

2018-11-29 Thread Kirill A. Shutemov

On Thu, Nov 29, 2018 at 03:00:45PM +, Juergen Gross wrote:
> On 29/11/2018 15:32, Kirill A. Shutemov wrote:
> > On Thu, Nov 29, 2018 at 02:24:47PM +, Kirill A. Shutemov wrote:
> >> On Thu, Nov 29, 2018 at 01:35:17PM +, Juergen Gross wrote:
> >>> On 29/11/2018 14:26, Kirill A. Shutemov wrote:
>  On Thu, Nov 29, 2018 at 09:41:25AM +, Juergen Gross wrote:
> > On 29/11/2018 02:22, Hans van Kranenburg wrote:
> >> Hi,
> >>
> >> As also seen at:
> >> https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=914951
> >>
> >> Attached there are two serial console output logs. One is starting with
> >> Xen 4.11 (from debian unstable) as dom0, and the other one without Xen.
> >>
> >> [2.085543] BUG: unable to handle kernel paging request at
> >> 888d9fffc000
> >> [2.085610] PGD 200c067 P4D 200c067 PUD 0
> >> [2.085674] Oops:  [#1] SMP NOPTI
> >> [2.085736] CPU: 1 PID: 1 Comm: swapper/0 Not tainted
> >> 4.19.0-trunk-amd64 #1 Debian 4.19.5-1~exp1+pvh1
> >> [2.085823] Hardware name: HP ProLiant DL360 G7, BIOS P68 05/21/2018
> >> [2.085895] RIP: e030:ptdump_walk_pgd_level_core+0x1fd/0x490
> >> [...]
> >
> > The offending stable commit is 4074ca7d8a1832921c865d250bbd08f3441b3657
> > ("x86/mm: Move LDT remap out of KASLR region on 5-level paging"), this
> > is commit d52888aa2753e3063a9d3a0c9f72f94aa9809c15 upstream.
> >
> > Current upstream kernel is booting fine under Xen, so in general the
> > patch should be fine. Using an upstream kernel built from above commit
> > (with the then needed Xen fixup patch 1457d8cf7664f34c4ba534) is fine,
> > too.
> >
> > Kirill, are you aware of any prerequisite patch from 4.20 which could be
> > missing in 4.19.5?
> 
>  I'm not.
> 
>  Let me look into this.
> 
> >>>
> >>> What is making me suspicious is the failure happening just after
> >>> releasing the init memory. Maybe there is an access to .init.data
> >>> segment or similar? The native kernel booting could be related to the
> >>> usage of 2M mappings not being available in a PV-domain.
> >>
> >> Ahh.. Could you test this:
> >>
> >> diff --git a/arch/x86/mm/dump_pagetables.c b/arch/x86/mm/dump_pagetables.c
> >> index a12afff146d1..7dec63ec7aab 100644
> >> --- a/arch/x86/mm/dump_pagetables.c
> >> +++ b/arch/x86/mm/dump_pagetables.c
> >> @@ -496,7 +496,7 @@ static inline bool is_hypervisor_range(int idx)
> >> * 8000 - 87ff is reserved for
> >> * the hypervisor.
> >> */
> >> -  return  (idx >= pgd_index(__PAGE_OFFSET) - 16) &&
> >> +  return  (idx >= pgd_index(__PAGE_OFFSET) - 17) &&
> >>(idx <  pgd_index(__PAGE_OFFSET));
> >>  #else
> >>return false;
> > 
> > Or, better, this:
> 
> That makes it boot again!
> 
> Any idea why upstream doesn't need it?

Nope.

I'll prepare a proper fix.

-- 
 Kirill A. Shutemov

___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] [PATCH v2 2/2] amd-iommu: replace occurrences of u with uint_t...

2018-11-29 Thread Woods, Brian

On Mon, Nov 26, 2018 at 11:32:53AM +, Paul Durrant wrote:
> ...for N in {8, 16, 32, 64}.
> 
> Bring the coding style up to date.
> 
> Also, while in the neighbourhood, fix some tabs and remove use of uint64_t
> values where it leads to the need for explicit casting.
> 
> Signed-off-by: Paul Durrant 

Acked-by: Brian Woods 

-- 
Brian Woods

___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] [PATCH v6 2/2] amd/iommu: skip bridge devices when updating IOMMU page tables

2018-11-29 Thread Woods, Brian

On Tue, Nov 27, 2018 at 04:24:41PM +0100, Roger Pau Monne wrote:
> Bridges are not behind an IOMMU, and are already special cased and
> skipped in amd_iommu_add_device. Apply the same special casing when
> updating page tables.
> 
> This is required or else update_paging_mode will fail and return an
> error to the caller (amd_iommu_{un}map_page) which will destroy the
> domain.
> 
> Signed-off-by: Roger Pau Monné 

Acked-by: Brian Woods 

-- 
Brian Woods

___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] [PATCH v6 1/2] amd/iommu: assign iommu devices to Xen

2018-11-29 Thread Woods, Brian

On Tue, Nov 27, 2018 at 04:24:40PM +0100, Roger Pau Monne wrote:
> AMD IOMMU devices are exposed on the PCI bus, and thus are assigned by
> default to the hardware domain. This can cause issues because the
> IOMMU devices themselves are not behind an IOMMU, so update_paging_mode will
> return an error if Xen tries to expand the page tables of a domain
> that has assigned devices not behind an IOMMU. update_paging_mode
> failing will cause the domain to be destroyed.
> 
> Fix this by hiding PCI IOMMU devices, so they are not assigned to the
> hardware domain.
> 
> Signed-off-by: Roger Pau Monné 
> Reviewed-by: Jan Beulich 

Acked-by: Brian Woods 

-- 
Brian Woods

___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] [RFC 14/16] hack: arm/domain: simplify context restore from idle vcpu

2018-11-29 Thread Julien Grall


Hi Andrii,

On 28/11/2018 21:32, Andrii Anisov wrote:

From: Andrii Anisov 

Simplify context restore from idle vcpu to the one ran before it.
This improves low cpu load but high irq rate use-cases.


While I agree that the context switch today is pretty inefficient, I would be 
interest to know how much you improve it.


Also, it would be nice if you can explain why you improve it that way. Why do 
you only need to restore the timer and gic?


Lastly, there are code coming up that will require p2m_save_store and 
p2m_restore_state to work in pair (see Cortex A76 erratum [1]). So that solution 
may not work very well in all the cases.


Overall, I think you can save much more in the context switch than what you 
currently do. For instance, if you are switch from vCPU A to idle vCPU you don't 
need to save the context.


You would only do it if after the idle vCPU you are scheduling a different vCPU 
or the vCPU is been migrated.


Another place of optimization when switching between 2 non-idle vCPU is 
save/restore of the VFP. You can avoid restoring them and instead trap it when 
the guest is using it.


A last place is the number of isb scattered over the context switch code. A lot 
of them are unnecessary at all because the system registers asssociated to 
EL1/EL0 are out-of-context.


I know that Stefano has been working on some patches. Stefano would it be 
possible to share your work?


Cheers,

[1] https://lists.xenproject.org/archives/html/xen-devel/2018-11/msg03241.html



Signed-off-by: Andrii Anisov 
---
  xen/arch/arm/domain.c   | 21 +++--
  xen/include/xen/sched.h |  1 +
  2 files changed, 12 insertions(+), 10 deletions(-)

diff --git a/xen/arch/arm/domain.c b/xen/arch/arm/domain.c
index 1d926dc..8e886b7 100644
--- a/xen/arch/arm/domain.c
+++ b/xen/arch/arm/domain.c
@@ -187,9 +187,6 @@ static void ctxt_switch_to(struct vcpu *n)
  WRITE_SYSREG32(vpidr, VPIDR_EL2);
  WRITE_SYSREG(n->arch.vmpidr, VMPIDR_EL2);
  
-/* VGIC */

-gic_restore_state(n);
-
  /* VFP */
  vfp_restore_state(n);
  
@@ -263,11 +260,6 @@ static void ctxt_switch_to(struct vcpu *n)

  WRITE_SYSREG(n->arch.csselr, CSSELR_EL1);
  
  isb();

-
-/* This is could trigger an hardware interrupt from the virtual
- * timer. The interrupt needs to be injected into the guest. */
-WRITE_SYSREG32(n->arch.cntkctl, CNTKCTL_EL1);
-virt_timer_restore(n);
  }
  
  /* Update per-VCPU guest runstate shared memory area (if registered). */

@@ -302,8 +294,17 @@ static void update_runstate_area(struct vcpu *v)
  static void schedule_tail(struct vcpu *prev)
  {
  ctxt_switch_from(prev);
-
-ctxt_switch_to(current);
+if ( !(is_idle_vcpu(prev) && (prev->prev == current)) )
+ctxt_switch_to(current);
+/* VGIC */
+if ( !is_idle_vcpu(current) )
+{
+gic_restore_state(current);
+WRITE_SYSREG32(current->arch.cntkctl, CNTKCTL_EL1);
+virt_timer_restore(current);
+}
+else
+current->prev = prev;
  
  local_irq_enable();
  
diff --git a/xen/include/xen/sched.h b/xen/include/xen/sched.h

index 0309c1f..e85108d 100644
--- a/xen/include/xen/sched.h
+++ b/xen/include/xen/sched.h
@@ -272,6 +272,7 @@ struct vcpu
  struct vpci_vcpu vpci;
  
  struct arch_vcpu arch;

+struct vcpu *prev;
  };
  
  /* Per-domain lock can be recursively acquired in fault handlers. */




--
Julien Grall

___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] [RFC 14/16] hack: arm/domain: simplify context restore from idle vcpu

2018-11-29 Thread Wei Liu

FYI the To: field is empty for your patch.

On Wed, Nov 28, 2018 at 11:32:09PM +0200, Andrii Anisov wrote:
[...]
> --- a/xen/include/xen/sched.h
> +++ b/xen/include/xen/sched.h
> @@ -272,6 +272,7 @@ struct vcpu
>  struct vpci_vcpu vpci;
>  
>  struct arch_vcpu arch;
> +struct vcpu *prev;

I guess this is why I get CC'ed on this patch. I would suggest you put
it into arch_vcpu if this is going to be ARM specific.

(I have skipped other parts of this email)

Wei.

>  };
>  
>  /* Per-domain lock can be recursively acquired in fault handlers. */
> -- 
> 2.7.4
> 

___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] [Qemu-devel] [PATCH for-3.2 v4 15/28] hw: apply accel compat properties without touching globals

2018-11-29 Thread Eduardo Habkost

On Wed, Nov 28, 2018 at 12:02:21AM +0400, Marc-André Lureau wrote:
> Hi
> 
> On Tue, Nov 27, 2018 at 11:40 PM Eduardo Habkost  wrote:
> >
> > On Tue, Nov 27, 2018 at 01:27:48PM +0400, Marc-André Lureau wrote:
> > > Introduce object_apply_global_props() function, to apply compatibility
> > > properties from a GPtrArray.
> > >
> > > For accel compatibility properties, apply them during
> > > device_post_init(), after accel_register_compat_props() has set them.
> > >
> > > To populate the compatibility properties, introduce SET_COMPAT(), a
> > > more generic version of SET_MACHINE_COMPAT() that can set compat
> > > properties on other objects than Machine, and using GPtrArray.
> > >
> > > Signed-off-by: Marc-André Lureau 
> > > ---
> > >  include/hw/qdev-core.h | 13 +
> > >  include/qom/object.h   |  3 +++
> > >  include/sysemu/accel.h |  4 +---
> > >  accel/accel.c  | 12 
> > >  hw/core/qdev.c | 11 +++
> > >  hw/xen/xen-common.c| 38 +++---
> > >  qom/object.c   | 25 +
> > >  vl.c   |  2 +-
> > >  8 files changed, 73 insertions(+), 35 deletions(-)
> > >
> > > diff --git a/include/hw/qdev-core.h b/include/hw/qdev-core.h
> > > index a24d0dd566..82afd3c50d 100644
> > > --- a/include/hw/qdev-core.h
> > > +++ b/include/hw/qdev-core.h
> > > @@ -267,6 +267,19 @@ typedef struct GlobalProperty {
> > >  Error **errp;
> > >  } GlobalProperty;
> > >
> > > +#define SET_COMPAT(S, COMPAT)   \
> > > +do {\
> > > +int i;  \
> > > +static GlobalProperty props[] = {   \
> > > +COMPAT  \
> > > +};  \
> > > +for (i = 0; i < G_N_ELEMENTS(props); i++) { \
> > > +g_ptr_array_add((S)->compat_props, (void *)&props[i]);  \
> > > +}   \
> > > +} while (0)
> >
> > I think this macro would be an acceptable alternative to the
> > existing SET_MACHINE_COMPAT macro trickery, but:
> >
> > > +
> > > +void accel_register_compat_props(const GPtrArray *props);
> > [...]
> > > @@ -185,7 +183,9 @@ static void xen_accel_class_init(ObjectClass *oc, 
> > > void *data)
> > >  ac->init_machine = xen_init;
> > >  ac->setup_post = xen_setup_post;
> > >  ac->allowed = &xen_allowed;
> > > -ac->global_props = xen_compat_props;
> > > +ac->compat_props = g_ptr_array_new();
> > > +
> > > +SET_COMPAT(ac, XEN_COMPAT);
> >
> > I think this is a step backwards.  I like us to be able to
> > register compat properties without macro magic.  The existence of
> > SET_MACHINE_COMPAT is a bug and not a feature.
> >
> > If you really want to use GPtrArray instead of a simple
> > GlobalProperty* field (I'm not sure I understand the reasoning
> > behind the choice to use GPtrArray), what about:
> 
> Except in the Xen case, It needs to register multiple GlobalProperty*,
> not necessarily from contiguous in memory. That's why we have an array
> of ptr.

If you also need to register multiple properties not from a
contiguous array, would a simple wrapper that does a single
g_ptr_array_add() call be enough?


> 
> >
> > static GPtrArray *build_compat_props_array(GlobalProperty *props)
> > {
> > GlobalProperty *p = props;
> > GPtrArray *array = g_ptr_array_new();
> > while (p->driver) {
> > g_ptr_array_add(array, (void *)p);
> > }
> > return array;
> > }
> >
> >
> > static void xen_accel_class_init(ObjectClass *oc, void *data)
> > {
> > ...
> > ac->compat_props = build_compat_props_array(xen_compat_props);
> 
> If we would register from one place, that would be fine.
> 
> We could replace the macro by a function, then we would have to
> declare the GlobalProperty arrays manually basically.

I would prefer to replace the macro with a function, if possible.
What do you mean by declaring the GlobalProperty arrays manually?


> 
> > }
> >
> >
> >
> > >  }
> > >
> > >  #define TYPE_XEN_ACCEL ACCEL_CLASS_NAME("xen")
> > > diff --git a/qom/object.c b/qom/object.c
> > > index 17921c0a71..dbdab0aead 100644
> > > --- a/qom/object.c
> > > +++ b/qom/object.c
> > > @@ -370,6 +370,31 @@ static void object_post_init_with_type(Object *obj, 
> > > TypeImpl *ti)
> > >  }
> > >  }
> > >
> > > +void object_apply_global_props(Object *obj, const GPtrArray *props, 
> > > Error **errp)
> > > +{
> > > +Error *err = NULL;
> > > +int i;
> > > +
> > > +if (!props) {
> > > +return;
> > > +}
> > > +
> > > +for (i = 0; i < props->len; i++) {
> > > +GlobalProperty *p = g_ptr_array_index(props, i);
> > > +
> > > +if (object_dynamic_cast(obj, p->driver) == NULL)

Re: [Xen-devel] [PATCH 03/18] xen: introduce 'xen-qdisk'

2018-11-29 Thread Anthony PERARD

On Wed, Nov 21, 2018 at 03:11:56PM +, Paul Durrant wrote:
> This patch adds a new XenDevice: 'xen-qdisk' [1]. This will eventually
> replace the 'xen_disk' legacy PV backend but it is illustrative to build
> up the implementation incrementally, along with the XenBus/XenDevice
> framework. Subsequent patches will therefore add to this device's
> implementation as new features are added to the framework.
> 
> After this patch has been applied it is possible to instantiate a new
> 'xen-qdisk' device with a single 'vdev' parameter, which accepts values
> adhering to the Xen VBD naming scheme [2]. For example, a command-line
> instantiation of a xen-qdisk can be done with an argument similar to the
> following:
> 
> -device xen-qdisk,vdev=hda

That works when QEMU boot, but doing the same thing once the guest have
booted, via QMP, doesn't. Here is the result (tested in qmp-shell):

(QEMU) device_add driver=xen-qdisk vdev=hda
{
"error": {
"class": "GenericError", 
"desc": "Bus 'xen-bus.0' does not support hotplugging"
}
}

That's probably why I've asked about the hotplug capability on the
previous patch.

> The implementation of the vdev parameter formulates the appropriate VBD
> number for use in the PV protocol.
> 
> [1] The name 'qdisk' as always been the name given to the QEMU
> implementation of the Xen PV block protocol backend implementation
> [2] 
> http://xenbits.xen.org/gitweb/?p=xen.git;a=blob;f=docs/man/xen-vbd-interface.markdown.7

Maybe a link to the generated docs would be better:
https://xenbits.xen.org/docs/unstable/man/xen-vbd-interface.7.html

Also, it would be useful to have the same link in the source code.

> diff --git a/hw/block/xen-qdisk.c b/hw/block/xen-qdisk.c
> new file mode 100644
> index 00..72122073f7
> --- /dev/null
> +++ b/hw/block/xen-qdisk.c
[...]
> +static char *disk_to_vbd_name(unsigned int disk)
> +{
> +unsigned int len = DIV_ROUND_UP(disk, 26);
> +char *name = g_malloc0(len + 1);
> +
> +do {
> +name[len--] = 'a' + (disk % 26);
> +disk /= 26;
> +} while (disk != 0);
> +assert(len == 0);
> +
> +return name;
> +}

That function doesn't work.

For a simple xvdp, (so disk==15), it return "", I mean "\0p".

For a more complicated 'xvdbhwza', we have len == 22901. And the assert failed.

Maybe the recursing algo in libxl would be fine, with a buffer that is
big enough, and could probably be on the stack (in _get_vdev).

> +
> +switch (vdev->type) {
> +case XEN_QDISK_VDEV_TYPE_DP:
> +case XEN_QDISK_VDEV_TYPE_XVD:
> +if (vdev->disk < (1 << 4) && vdev->partition < (1 << 4)) {
> +vdev->number = (202 << 8) | (vdev->disk << 4) |
> +vdev->partition;
> +} else if (vdev->disk < (1 << 20) && vdev->partition < (1 << 8)) {
> +vdev->number = (1 << 28) | (vdev->disk << 8) |
> +vdev->partition;
> +} else {
> +goto invalid;
> +}
> +break;
> +
> +case XEN_QDISK_VDEV_TYPE_HD:
> +if ((vdev->disk == 0 || vdev->disk == 1) &&
> +vdev->partition < (1 << 4)) {

I think that should be:

vdev->partition < (1 << 6)

Because hd disk have 0..63 partitions.

> +vdev->number = (3 << 8) | (vdev->disk << 6) | vdev->partition;
> +} else if ((vdev->disk == 2 || vdev->disk == 3) &&
> +   vdev->partition < (1 << 4)) {

same here.

> +vdev->number = (22 << 8) | ((vdev->disk - 2) << 6) |
> +vdev->partition;
> +} else {
> +goto invalid;
> +}
> +break;
> +
> +case XEN_QDISK_VDEV_TYPE_SD:
> +if (vdev->disk < (1 << 4) && vdev->partition < (1 << 4)) {
> +vdev->number = (8 << 8) | (vdev->disk << 4) | vdev->partition;
> +} else {
> +goto invalid;
> +}
> +break;
> +
> +default:
> +goto invalid;
> +}
> +
> +g_free(str);
> +vdev->valid = true;
> +return;
> +
> +invalid:
> +error_setg(errp, "invalid virtual disk specifier");
> +g_free(str);

:(, g_free is called twice.

maybe we could have:
vdev->valid=true;
out:
  if (!vdev->valid)
error_setg(...);
  g_free;

> diff --git a/include/hw/xen/xen-qdisk.h b/include/hw/xen/xen-qdisk.h
> new file mode 100644
> index 00..ade0866037
> --- /dev/null
> +++ b/include/hw/xen/xen-qdisk.h
> @@ -0,0 +1,38 @@
> +/*
> + * Copyright (c) Citrix Systems Inc.
> + * All rights reserved.
> + */
> +
> +#ifndef HW_XEN_QDISK_H
> +#define HW_XEN_QDISK_H
> +
> +#include "hw/xen/xen-bus.h"
> +
> +typedef enum XenQdiskVdevType {
> +XEN_QDISK_VDEV_TYPE_DP,

Maybe we could set type_dp value to 1, so that, when vdev->type isn't
set, we can detect it later.


> +XEN_QDISK_VDEV_TYPE_XVD,
> +XEN_QDISK_VDEV_TYPE_HD,
> +XEN_QDISK_VDEV_TYPE_SD,
> +XEN_QDISK_VDEV_TYPE__MAX
> +} XenQdiskVdevType;

Thanks,

-- 
Anthony PERARD

___

Re: [Xen-devel] [RFC 10/16] gic:vgic:gic-vgic: introduce non-atomic bitops

2018-11-29 Thread Julien Grall


Hi,

On 29/11/2018 12:14, Andre Przywara wrote:

On Wed, 28 Nov 2018 23:32:05 +0200
Andrii Anisov  wrote:

Hi,


From: Andrii Anisov 

All bit operations for gic, vgic and gic-vgic are performed under
spinlocks, so there is no need for atomic bit ops here, they only
introduce excessive call to functions used more expensive exclusive
ARM instructions.

Signed-off-by: Andrii Anisov 
---
  xen/arch/arm/gic-vgic.c | 16 
  xen/arch/arm/gic.c  | 16 
  xen/arch/arm/vgic.c | 16 
  3 files changed, 48 insertions(+)

diff --git a/xen/arch/arm/gic-vgic.c b/xen/arch/arm/gic-vgic.c
index f0e6c6f..5b73bbd 100644
--- a/xen/arch/arm/gic-vgic.c
+++ b/xen/arch/arm/gic-vgic.c
@@ -25,6 +25,22 @@
  #include 
  #include 
  
+#undef set_bit

+#define set_bit(nr, addr) (*(addr) |= (1<  
  DEFINE_PER_CPU(uint64_t, lr_mask);
  
+#undef set_bit

+#define set_bit(nr, addr) (*(addr) |= (1<  
  const struct gic_hw_operations *gic_hw_ops;

diff --git a/xen/arch/arm/vgic.c b/xen/arch/arm/vgic.c
index c142476..7691310 100644
--- a/xen/arch/arm/vgic.c
+++ b/xen/arch/arm/vgic.c
@@ -33,6 +33,22 @@
  #include 
  #include 
  
+#undef set_bit


Nah, please don't do this. Can you show that atomic bit ops are a
problem? They shouldn't be expensive unless contended, also pretty
lightweight on small systems (single cluster).

But if you really think this is useful, why not go with the Linux way
of using __set_bit to provide a non-atomic version?


We already have __set_bit/__clear_bit on Xen. However, it looks like the arm 
version is a wrapper to the atomic version. Rationale from the comment is those 
functions should be interrupt safe.


I don't know where that requirement come from. You already have race if that 
variable is modified simultaneously. So I would just re-use the Linux version.



This would have the big advantage that you can replace them on a
case-by-case base, which is much less risky than unconditionally
replacing every (even future!) usage in the whole file.

Cheers,
Andre.


+#define set_bit(nr, addr) (*(addr) |= (1<



--
Julien Grall

___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] remove the ->mapping_error method from dma_map_ops V2

2018-11-29 Thread Christoph Hellwig

On Wed, Nov 28, 2018 at 11:19:15AM -0800, Linus Torvalds wrote:
> Let me just paste it back in here:
> 
>   "Which is what we ALREADY do for these exact reasons. If the DMA
> mappings means that you'd need to add one more page to that list of
> reserved pages, then so be it."
> 
> So no, I'm not at all confused.
> 
> Let me re-iterate: the argument that all addresses have to be dma'able is
> garbage.
> 
> *Exactly* as with kmalloc and limited virtual addresses, we can limit
> physical addresses.

We can.  At least in theory.  The problem is that depending on the
crazy mapping from physical and kernel virtual address to dma addresses
these might be pages at pretty random places.  Look at fun like
arch/x86/pci/sta2x11-fixup.c for how ugly these mappings could look.

It also means that we might have setup swiotlb on just about every
32-bit architecture, even if it has no real addressing limit except for
the one we imposed.  I don't really see how this is a win for us just
to be able to report more detailed error codes, which would be nice
to have, but the lack of which hasn't really harmed us.

So as far as I'm concerned I'd go either with the series that we are
discussing here, or change the map_page method to return an errno
and the dma_addr_t in the argument.  As Davem pointed out that can lead
to less optimal code, but it would still be better than the indirect
call we have.  But then again I think the series as posted here might
and up much simpler and good enough without opening up this rathole.

___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] [RFC 16/16] gic: vgic: align frequently accessed data by cache line size

2018-11-29 Thread Julien Grall




On 28/11/2018 21:32, Andrii Anisov wrote:

From: Andrii Anisov 

Cache line size assumed 64 byte as for H3. Align the `struct
pending_irq` and allocate lrs shadow aligned to cache line size.


The arch_vcpu is already aligned to a cache size. So how does it improve the 
performance by making lr also aligned to a cache line?


I can believe that pending_irq would be nice to have it aligned to a cache line 
(or half of it). But you need to explain the trade-off as you are going to use 
more memory.




Signed-off-by: Andrii Anisov 
---
  xen/arch/arm/domain.c| 4 
  xen/arch/arm/vgic.c  | 9 +
  xen/include/asm-arm/config.h | 2 +-
  xen/include/asm-arm/gic.h| 2 +-
  xen/include/asm-arm/vgic.h   | 2 +-
  5 files changed, 16 insertions(+), 3 deletions(-)

diff --git a/xen/arch/arm/domain.c b/xen/arch/arm/domain.c
index 8e886b7..8bcb667 100644
--- a/xen/arch/arm/domain.c
+++ b/xen/arch/arm/domain.c
@@ -558,6 +558,10 @@ int arch_vcpu_create(struct vcpu *v)
  v->arch.saved_context.sp = (register_t)v->arch.cpu_info;
  v->arch.saved_context.pc = (register_t)continue_new_vcpu;
  
+v->arch.gic.v2.lr = xzalloc_bytes(sizeof(uint32_t) * gic_number_lines());

+if ( v->arch.gic.v2.lr == NULL )
+return -ENOMEM;
+
  /* Idle VCPUs don't need the rest of this setup */
  if ( is_idle_vcpu(v) )
  return rc;
diff --git a/xen/arch/arm/vgic.c b/xen/arch/arm/vgic.c
index 7691310..bedfb99 100644
--- a/xen/arch/arm/vgic.c
+++ b/xen/arch/arm/vgic.c
@@ -166,6 +166,15 @@ int domain_vgic_init(struct domain *d, unsigned int 
nr_spis)
  
  d->arch.vgic.pending_irqs =

  xzalloc_array(struct pending_irq, d->arch.vgic.nr_spis);
+
+if ( sizeof(struct pending_irq) != dcache_line_bytes )
+{
+printk ("sizeof(struct pending_irq) = %lu  is not equal to cacheline"
+"size %lu. Is it expected?\n", sizeof(struct pending_irq),
+dcache_line_bytes);
+WARN();
+}
What could happen if pending_irq is not aligned to your cacheline? If your 
cacheline is smaller than pending_irq, then you are going to use multiple cache 
line. So nothing to worry.


If your cacheline is bigger, then it is a potential concern as you would share 
the cacheline with something else. This may be an issue, but I would like to see 
numbers first.



+
  if ( d->arch.vgic.pending_irqs == NULL )
  return -ENOMEM;
  
diff --git a/xen/include/asm-arm/config.h b/xen/include/asm-arm/config.h

index bc89e84..4f3669f 100644
--- a/xen/include/asm-arm/config.h
+++ b/xen/include/asm-arm/config.h
@@ -28,7 +28,7 @@
  
  #define CONFIG_ARM 1
  
-#define CONFIG_ARM_L1_CACHE_SHIFT 7 /* XXX */

+#define CONFIG_ARM_L1_CACHE_SHIFT 6 /* XXX */


That's not acceptable, this value is based on the biggest cache line used by Arm 
processors. If you want to use 64 bytes cache line, then you either make sure 
the structure fits in 64-bytes by adding padding.


  
  #define CONFIG_SMP 1
  
diff --git a/xen/include/asm-arm/gic.h b/xen/include/asm-arm/gic.h

index add2566..fe44d3a 100644
--- a/xen/include/asm-arm/gic.h
+++ b/xen/include/asm-arm/gic.h
@@ -186,7 +186,7 @@ struct gic_v2 {
  uint32_t hcr;
  uint32_t vmcr;
  uint32_t apr;
-uint32_t lr[64];
+uint32_t *lr;
  uint64_t lr_update_mask;
  };
  
diff --git a/xen/include/asm-arm/vgic.h b/xen/include/asm-arm/vgic.h

index a27a1a9..d4ec96f 100644
--- a/xen/include/asm-arm/vgic.h
+++ b/xen/include/asm-arm/vgic.h
@@ -99,7 +99,7 @@ struct pending_irq
   * TODO: when implementing irq migration, taking only the current
   * vgic lock is not going to be enough. */
  struct list_head lr_queue;
-};
+}__cacheline_aligned;
  
  #define NR_INTERRUPT_PER_RANK   32

  #define INTERRUPT_RANK_MASK (NR_INTERRUPT_PER_RANK - 1)



Cheers,

--
Julien Grall

___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] [RFC 12/16] gic-v2: Write HCR only on change

2018-11-29 Thread Julien Grall




On 28/11/2018 21:32, Andrii Anisov wrote:

From: Andrii Anisov 

This saves one write to peripheral HCR register per hypervisor entry for
most cases.

Signed-off-by: Andrii Anisov 
---
  xen/arch/arm/gic-v2.c | 7 +--
  1 file changed, 5 insertions(+), 2 deletions(-)

diff --git a/xen/arch/arm/gic-v2.c b/xen/arch/arm/gic-v2.c
index 1a744c5..25147bd 100644
--- a/xen/arch/arm/gic-v2.c
+++ b/xen/arch/arm/gic-v2.c
@@ -579,14 +579,17 @@ static void gicv2_write_lr(int lr, const struct gic_lr 
*lr_reg)
  
  static void gicv2_hcr_status(uint32_t flag, bool status)

  {
-uint32_t hcr = readl_gich(GICH_HCR);
+uint32_t hcr, ohcr;
+
+ohcr = hcr = readl_gich(GICH_HCR);
  
  if ( status )

  hcr |= flag;
  else
  hcr &= (~flag);
  
-writel_gich(hcr, GICH_HCR);

+if ( hcr != ohcr )
+writel_gich(hcr, GICH_HCR);


While I understand that theoretically this is an issue. Is it actually a real 
performance benefits?


I would actually expect the read to be more expensive than the write because we 
implement writel_gich using writel_relaxed. So there are no barrier afterwards 
to "block" the processor to go forward.



  }
  
  static unsigned int gicv2_read_vmcr_priority(void)




Cheers,

--
Julien Grall

___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

1 2 >

1 - 100 of 142 matches

Mail list logo