Re: [Xen-devel] XenServer Xen-4.5 (-rc3ish) testing

2014-12-09 Thread Andrew Cooper
On 09/12/14 08:46, Jan Beulich wrote:
>> We have certain machines which are showing reliable failure to boot
>> under Xen-4.5, where they worked with 4.4.  Symptoms range from the dom0
>> kernel crashing before printing anything, to complaining that the initrd
>> is corrupt when attempting to decompress.  This appears to be hardware
>> specific.
> Any chance this is C-state related, just like the issue narrowed down in
> http://lists.xenproject.org/archives/html/xen-devel/2014-11/msg00228.html?
> I.e. Westmere Xeons being affected? If not, this would seem rather
> worrying to me (read: a release blocker). And even if so, a workaround
> would be minimally needed. Otoh you didn't report so for earlier RCs -
> was that just because the testing scope was more narrow then, or can
> we infer that this is a recently introduced regression?
>
> Jan
>

I very much doubt it.  We blanket set max cstate to 1 on that era of
hardware, because in our experiments the existing workarounds in Xen
still don't work.

https://github.com/xenserver/xen-4.4.pg/blob/master/detect-nehalem-c-state.patch

The first system I am looking at with a view to fixing is a SandyBridge
EN IBM Blade.

~Andrew


___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] XenServer Xen-4.5 (-rc3ish) testing

2014-12-09 Thread Paul Durrant
> -----Original Message-----
> From: xen-devel-boun...@lists.xen.org [mailto:xen-devel-
> boun...@lists.xen.org] On Behalf Of Andrew Cooper
> Sent: 09 December 2014 00:21
> To: Konrad Rzeszutek Wilk
> Cc: Xen-devel List
> Subject: Re: [Xen-devel] XenServer Xen-4.5 (-rc3ish) testing
> 
> On 08/12/2014 20:00, Konrad Rzeszutek Wilk wrote:
> > On Mon, Dec 08, 2014 at 07:03:19PM +, Andrew Cooper wrote:
> >> Hi,
> > Hey Andrew!
> >> Over the weekend, XenServer testing managed to run a side-by-side test
> >> of XenServer trunk and XenServer experimental xen-4.5.  These are
> >> identical other than the version of Xen (and associated libraries) in
> >> use, i.e. identical dom0 kernel, Xapi toolstack and dom0 userspace,
> >> other than as linked against newer Xen libraries.
> >>
> >> The Xen-4.5 tests were on top of c/s e6c3d371d4 "systemd: use pkg-config
> >> to determine systemd library availability"
> >>
> >> There are a few notable issues exposed:
> >>
> >> XEN_DOMCTL_memory_mapping hypercall fails with EPERM where it didn't in
> >> xen-4.4, given identical parameters.  The hypercall gained an extra
> >> permission check as part of 0561e1f01e.  Our usecase here is a daemon in
> >> dom0 mapping guest RAM to emulate a graphics card, but I currently don't
> >> see how that is incompatible with the new permissions check.
> > I presume the daemon also does 'XEN_DOMCTL_iomem_permission' to grant
> > the other domain access to its RAM? And before it makes the
> > XEN_DOMCTL_memory_mapping hypercall.
> 
> This is purely an implementer of the ioreq server infrastructure
> providing an emulated set of BARs in the guest as qemu would, but
> without using dom0 map-foreign powers.  The gfn ranges in question are
> regular guest RAM as far as I am aware, and should not require any
> special io permissions.

IIRC the ranges in question are sections of BAR on the GPU which the dom0 code 
is trying to inject into the guest.

  Paul

> 
> Either way, the identified changeset has apparently caused a regression,
> but I am not yet certain whether it is legitimately disabling something
> which should not have worked in the first place, or whether it is a
> change which needs reconsidering.
> 
> >
> >> Migrations from older Xens to Xen-4.5 fairly reliably crash domains (90%
> >> of the time, both PV and HVM guests).  This includes migrations from older
> >> XenServers using the legacy->v2 migration code which is confirmed-good
> >> in the "upgrade to Xen-4.4" case.  The migration v2 code in use is identical.
> > The XenServer is not using the in-the-tree migration system except in
> > the older version of XenServer right?
> 
> XenServer experimental-4.5 is strictly using migration v2, including
> upgrade from legacy, and as far as I am aware it is the identical migration
> v2 code to our current Xen-4.4 trunk, which works fine for all tests.
> 
> >> We have certain machines which are showing reliable failure to boot
> >> under Xen-4.5, where they worked with 4.4.  Symptoms range from the dom0
> >> kernel crashing before printing anything, to complaining that the initrd
> >> is corrupt when attempting to decompress.  This appears to be hardware
> >> specific.
> > Hardware specific is good. Could you give some ideas of what make/model
> > this is?
> 
> They are all IBM blades to the best of my knowledge, which are a similar
> system to the hardware which gave me 1ed7679 to debug, so I am hoping it
> is a latent BIOS issue and not a Xen 4.5 issue.
> 
> ~Andrew
> 



Re: [Xen-devel] XenServer Xen-4.5 (-rc3ish) testing

2014-12-09 Thread Jan Beulich
>>> On 09.12.14 at 01:21, Andrew Cooper wrote:
> On 08/12/2014 20:00, Konrad Rzeszutek Wilk wrote:
>> On Mon, Dec 08, 2014 at 07:03:19PM +, Andrew Cooper wrote:
>>> XEN_DOMCTL_memory_mapping hypercall fails with EPERM where it didn't in
>>> xen-4.4, given identical parameters.  The hypercall gained an extra
>>> permission check as part of 0561e1f01e.  Our usecase here is a daemon in
>>> dom0 mapping guest RAM to emulate a graphics card, but I currently don't
>>> see how that is incompatible with the new permissions check.
>> I presume the daemon also does 'XEN_DOMCTL_iomem_permission' to grant
>> the other domain access to its RAM? And before it makes the
>> XEN_DOMCTL_memory_mapping hypercall.
> 
> This is purely an implementer of the ioreq server infrastructure
> providing an emulated set of BARs in the guest as qemu would, but
> without using dom0 map-foreign powers.  The gfn ranges in question are
> regular guest RAM as far as I am aware, and should not require any
> special io permissions.
> 
> Either way, the identified changeset has apparently caused a regression,
> but I am not yet certain whether it is legitimately disabling something
> which should not have worked in the first place, or whether it is a
> change which needs reconsidering.

Actually I think this is still too lax: When we set up Dom0's iomem
permissions, we blindly add all memory, when we should only add
all I/O memory (or better, exclude all RAM). Iiuc such a change
would similarly break your daemon, without need for Arianna's.
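
The suggestion above (give Dom0 access to everything except RAM, rather than to all memory) can be sketched roughly as follows. This is only an illustration in Python, not Xen code: the function name non_ram_ranges, the inclusive frame-range representation and the example memory map are all assumptions made for the sketch.

```python
def non_ram_ranges(total_frames, ram_ranges):
    """Return the inclusive frame ranges in [0, total_frames) not covered by RAM.

    ram_ranges is a list of inclusive (start, end) frame ranges, in any
    order and possibly overlapping. Hypothetical helper, not a Xen API.
    """
    result = []
    cursor = 0  # first frame not yet classified
    for start, end in sorted(ram_ranges):
        if start > cursor:
            # Gap between the previous RAM range and this one is non-RAM.
            result.append((cursor, start - 1))
        cursor = max(cursor, end + 1)
    if cursor < total_frames:
        # Everything after the last RAM range is non-RAM as well.
        result.append((cursor, total_frames - 1))
    return result


# Hypothetical memory map: two RAM regions inside a 0x100-frame space.
io_only = non_ram_ranges(0x100, [(0x00, 0x9F), (0xC0, 0xEF)])
```

With permissions granted only on the ranges in io_only, a request that names a RAM frame would fail the permission check regardless of the newer check on XEN_DOMCTL_memory_mapping.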

Jan




Re: [Xen-devel] XenServer Xen-4.5 (-rc3ish) testing

2014-12-09 Thread Jan Beulich
>>> On 08.12.14 at 20:03, Andrew Cooper wrote:
> XEN_DOMCTL_memory_mapping hypercall fails with EPERM where it didn't in
> xen-4.4, given identical parameters.  The hypercall gained an extra
> permission check as part of 0561e1f01e.  Our usecase here is a daemon in
> dom0 mapping guest RAM to emulate a graphics card, but I currently don't
> see how that is incompatible with the new permissions check.

This seems quite obvious: The added check makes sure that what gets
mapped is I/O memory both domains have access to, yet you say the
daemon maps guest RAM. What I can't see is why you need this
hypercall in this case - given what you say it's certainly not meant for
the daemon to map memory into the guest? Mapping guest RAM ought
to work via the privcmd kernel interface.
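
As a rough model of the check described above (the mapping is refused unless both the calling domain and the target domain have been granted access to the whole frame range), here is a small Python sketch. It is not the hypervisor's code: the IomemRanges class, the memory_mapping function and the frame numbers are hypothetical, and a real range set would also merge adjacent ranges, which this toy does not.

```python
import errno


class IomemRanges:
    """Toy stand-in for a domain's set of permitted machine frame ranges."""

    def __init__(self):
        self._ranges = []  # inclusive (start, end) frame ranges

    def permit(self, start, end):
        self._ranges.append((start, end))

    def permitted(self, start, end):
        # True only if a single recorded range covers the whole request.
        return any(s <= start and end <= e for s, e in self._ranges)


def memory_mapping(caller, target, mfn, nr_mfns):
    """Return 0 if both domains may access the range, else -EPERM."""
    mfn_end = mfn + nr_mfns - 1
    if not caller.permitted(mfn, mfn_end) or not target.permitted(mfn, mfn_end):
        return -errno.EPERM
    return 0


# Hypothetical example: dom0 has access to the range, the guest does not yet.
dom0 = IomemRanges()
dom0.permit(0xF0000, 0xF0FFF)
guest = IomemRanges()
before = memory_mapping(dom0, guest, 0xF0000, 16)  # guest lacks permission
guest.permit(0xF0000, 0xF000F)
after = memory_mapping(dom0, guest, 0xF0000, 16)   # both now have permission
```

In this model, frames that are ordinary guest RAM would never appear in either permitted set, which is consistent with the EPERM reported in the thread.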

> We have certain machines which are showing reliable failure to boot
> under Xen-4.5, where they worked with 4.4.  Symptoms range from the dom0
> kernel crashing before printing anything, to complaining that the initrd
> is corrupt when attempting to decompress.  This appears to be hardware
> specific.

Any chance this is C-state related, just like the issue narrowed down in
http://lists.xenproject.org/archives/html/xen-devel/2014-11/msg00228.html?
I.e. Westmere Xeons being affected? If not, this would seem rather
worrying to me (read: a release blocker). And even if so, a workaround
would be minimally needed. Otoh you didn't report so for earlier RCs -
was that just because the testing scope was more narrow then, or can
we infer that this is a recently introduced regression?

Jan




Re: [Xen-devel] XenServer Xen-4.5 (-rc3ish) testing

2014-12-08 Thread Andrew Cooper
On 08/12/2014 20:00, Konrad Rzeszutek Wilk wrote:
> On Mon, Dec 08, 2014 at 07:03:19PM +, Andrew Cooper wrote:
>> Hi,
> Hey Andrew!
>> Over the weekend, XenServer testing managed to run a side-by-side test
>> of XenServer trunk and XenServer experimental xen-4.5.  These are
>> identical other than the version of Xen (and associated libraries) in
>> use, i.e. identical dom0 kernel, Xapi toolstack and dom0 userspace,
>> other than as linked against newer Xen libraries.
>>
>> The Xen-4.5 tests were on top of c/s e6c3d371d4 "systemd: use pkg-config
>> to determine systemd library availability"
>>
>> There are a few notable issues exposed:
>>
>> XEN_DOMCTL_memory_mapping hypercall fails with EPERM where it didn't in
>> xen-4.4, given identical parameters.  The hypercall gained an extra
>> permission check as part of 0561e1f01e.  Our usecase here is a daemon in
>> dom0 mapping guest RAM to emulate a graphics card, but I currently don't
>> see how that is incompatible with the new permissions check.
> I presume the daemon also does 'XEN_DOMCTL_iomem_permission' to grant
> the other domain access to its RAM? And before it makes the
> XEN_DOMCTL_memory_mapping hypercall.

This is purely an implementer of the ioreq server infrastructure
providing an emulated set of BARs in the guest as qemu would, but
without using dom0 map-foreign powers.  The gfn ranges in question are
regular guest RAM as far as I am aware, and should not require any
special io permissions.

Either way, the identified changeset has apparently caused a regression,
but I am not yet certain whether it is legitimately disabling something
which should not have worked in the first place, or whether it is a
change which needs reconsidering.

>
>> Migrations from older Xens to Xen-4.5 fairly reliably crash domains (90%
>> of the time, both PV and HVM guests).  This includes migrations from older
>> XenServers using the legacy->v2 migration code which is confirmed-good
>> in the "upgrade to Xen-4.4" case.  The migration v2 code in use is identical.
> The XenServer is not using the in-the-tree migration system except in
> the older version of XenServer right?

XenServer experimental-4.5 is strictly using migration v2, including
upgrade from legacy, and as far as I am aware it is the identical migration
v2 code to our current Xen-4.4 trunk, which works fine for all tests.

>> We have certain machines which are showing reliable failure to boot
>> under Xen-4.5, where they worked with 4.4.  Symptoms range from the dom0
>> kernel crashing before printing anything, to complaining that the initrd
>> is corrupt when attempting to decompress.  This appears to be hardware
>> specific.
> Hardware specific is good. Could you give some ideas of what make/model
> this is?

They are all IBM blades to the best of my knowledge, which are a similar
system to the hardware which gave me 1ed7679 to debug, so I am hoping it
is a latent BIOS issue and not a Xen 4.5 issue.

~Andrew



Re: [Xen-devel] XenServer Xen-4.5 (-rc3ish) testing

2014-12-08 Thread Konrad Rzeszutek Wilk
On Mon, Dec 08, 2014 at 07:03:19PM +, Andrew Cooper wrote:
> Hi,

Hey Andrew!
> 
> Over the weekend, XenServer testing managed to run a side-by-side test
> of XenServer trunk and XenServer experimental xen-4.5.  These are
> identical other than the version of Xen (and associated libraries) in
> use, i.e. identical dom0 kernel, Xapi toolstack and dom0 userspace,
> other than as linked against newer Xen libraries.
> 
> The Xen-4.5 tests were on top of c/s e6c3d371d4 "systemd: use pkg-config
> to determine systemd library availability"
> 
> There are a few notable issues exposed:
> 
> XEN_DOMCTL_memory_mapping hypercall fails with EPERM where it didn't in
> xen-4.4, given identical parameters.  The hypercall gained an extra
> permission check as part of 0561e1f01e.  Our usecase here is a daemon in
> dom0 mapping guest RAM to emulate a graphics card, but I currently don't
> see how that is incompatible with the new permissions check.

I presume the daemon also does 'XEN_DOMCTL_iomem_permission' to grant
the other domain access to its RAM? And before it makes the
XEN_DOMCTL_memory_mapping hypercall.

> 
> Migrations from older Xens to Xen-4.5 fairly reliably crash domains (90%
> of the time, both PV and HVM guests).  This includes migrations from older
> XenServers using the legacy->v2 migration code which is confirmed-good
> in the "upgrade to Xen-4.4" case.  The migration v2 code in use is identical.

The XenServer is not using the in-the-tree migration system except in
the older version of XenServer right?
> 
> We have certain machines which are showing reliable failure to boot
> under Xen-4.5, where they worked with 4.4.  Symptoms range from the dom0
> kernel crashing before printing anything, to complaining that the initrd
> is corrupt when attempting to decompress.  This appears to be hardware
> specific.

Hardware specific is good. Could you give some ideas of what make/model
this is?

> 
> 
> I will be looking into all of these issues, to identify whether they are
> indeed regressions in xen-4.5, or whether I have made some mistakes
> forward porting our patch queue.

OK
> 
> Unfortunately, our PCI Passthrough testing has been blocked behind other
> regressions which have crept in recently, meaning that both sets of
> tests were equally affected, and no 4.4 vs 4.5 comparison has been
> possible at this time.  I hope this will be fixed by the next time I run
> a similar pair of tests.

 Thank you!
> 
> ~Andrew
> 



[Xen-devel] XenServer Xen-4.5 (-rc3ish) testing

2014-12-08 Thread Andrew Cooper
Hi,

Over the weekend, XenServer testing managed to run a side-by-side test
of XenServer trunk and XenServer experimental xen-4.5.  These are
identical other than the version of Xen (and associated libraries) in
use, i.e. identical dom0 kernel, Xapi toolstack and dom0 userspace,
other than as linked against newer Xen libraries.

The Xen-4.5 tests were on top of c/s e6c3d371d4 "systemd: use pkg-config
to determine systemd library availability"

There are a few notable issues exposed:

XEN_DOMCTL_memory_mapping hypercall fails with EPERM where it didn't in
xen-4.4, given identical parameters.  The hypercall gained an extra
permission check as part of 0561e1f01e.  Our usecase here is a daemon in
dom0 mapping guest RAM to emulate a graphics card, but I currently don't
see how that is incompatible with the new permissions check.

Migrations from older Xens to Xen-4.5 fairly reliably crash domains (90%
of the time, both PV and HVM guests).  This includes migrations from older
XenServers using the legacy->v2 migration code which is confirmed-good
in the "upgrade to Xen-4.4" case.  The migration v2 code in use is identical.

We have certain machines which are showing reliable failure to boot
under Xen-4.5, where they worked with 4.4.  Symptoms range from the dom0
kernel crashing before printing anything, to complaining that the initrd
is corrupt when attempting to decompress.  This appears to be hardware
specific.


I will be looking into all of these issues, to identify whether they are
indeed regressions in xen-4.5, or whether I have made some mistakes
forward porting our patch queue.

Unfortunately, our PCI Passthrough testing has been blocked behind other
regressions which have crept in recently, meaning that both sets of
tests were equally affected, and no 4.4 vs 4.5 comparison has been
possible at this time.  I hope this will be fixed by the next time I run
a similar pair of tests.

~Andrew

