Re: [Xen-devel] XenServer Xen-4.5 (-rc3ish) testing
On 09/12/14 08:46, Jan Beulich wrote:
>> We have certain machines which are showing reliable failure to boot
>> under Xen-4.5, where they worked with 4.4. Symptoms range from the dom0
>> kernel crashing before printing anything, to complaining that the initrd
>> is corrupt when attempting to decompress. This appears to be hardware
>> specific.
> Any chance this is C-state related, just like narrowed down to for
> http://lists.xenproject.org/archives/html/xen-devel/2014-11/msg00228.html?
> I.e. Westmere Xeons being affected? If not, this would seem rather
> worrying to me (read: a release blocker). And even if so, a workaround
> would be minimally needed. Otoh you didn't report so for earlier RCs -
> was that just because the testing scope was more narrow then, or can
> we imply that this is a recently introduced regression?
>
> Jan
>

I very much doubt it. We blanket set max cstate to 1 on that era of
hardware, because the existing workarounds in Xen still experimentally
don't work.

https://github.com/xenserver/xen-4.4.pg/blob/master/detect-nehalem-c-state.patch

The first system I am looking at with a view to fixing is a SandyBridge
EN IBM Blade.

~Andrew

___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel
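For readers following along, the "blanket" cap Andrew mentions can be pictured roughly as below. This is only a sketch and is not taken from the linked patch: the function name `max_cstate_for` is hypothetical, and the model list is an assumption (the commonly cited Intel family 6 model numbers for Nehalem and Westmere parts); the real patch may match hardware differently.

```python
# Assumed family 6 model numbers for Nehalem/Westmere-era CPUs.
# Illustrative only; the XenServer patch may use a different list.
NEHALEM_WESTMERE_MODELS = {
    0x1A, 0x1E, 0x1F, 0x2E,   # Nehalem
    0x25, 0x2C, 0x2F,         # Westmere
}


def max_cstate_for(family, model):
    """Return the C-state cap for a CPU, or None to leave it uncapped.

    Hypothetical helper: caps the deepest allowed C-state at C1 on the
    affected era of hardware and imposes no limit elsewhere.
    """
    if family == 6 and model in NEHALEM_WESTMERE_MODELS:
        return 1
    return None


westmere_ep_cap = max_cstate_for(6, 0x2C)     # Westmere-EP: capped
sandybridge_en_cap = max_cstate_for(6, 0x2D)  # SandyBridge EN/EP: not capped
```

Under these assumptions a SandyBridge EN system is not covered by the cap at all, which fits Andrew's doubt that the boot failures on that blade are the Nehalem/Westmere C-state problem.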
Re: [Xen-devel] XenServer Xen-4.5 (-rc3ish) testing
> -----Original Message-----
> From: xen-devel-boun...@lists.xen.org [mailto:xen-devel-
> boun...@lists.xen.org] On Behalf Of Andrew Cooper
> Sent: 09 December 2014 00:21
> To: Konrad Rzeszutek Wilk
> Cc: Xen-devel List
> Subject: Re: [Xen-devel] XenServer Xen-4.5 (-rc3ish) testing
>
> On 08/12/2014 20:00, Konrad Rzeszutek Wilk wrote:
> > On Mon, Dec 08, 2014 at 07:03:19PM +, Andrew Cooper wrote:
> >> Hi,
> > Hey Andrew!
> >> Over the weekend, XenServer testing managed to run a side-by-side test
> >> of XenServer trunk and XenServer experimental xen-4.5. These are
> >> identical other than the version of Xen (and associated libraries) in
> >> use, i.e. identical dom0 kernel, Xapi toolstack and dom0 userspace,
> >> other than as linked against newer Xen libraries.
> >>
> >> The Xen-4.5 tests were on top of c/s e6c3d371d4 "systemd: use pkg-config
> >> to determine systemd library availability"
> >>
> >> There are a few notable issues exposed:
> >>
> >> XEN_DOMCTL_memory_mapping hypercall fails with EPERM where it didn't in
> >> xen-4.4, given identical parameters. The hypercall gained an extra
> >> permission check as part of 0561e1f01e. Our usecase here is a daemon in
> >> dom0 mapping guest RAM to emulate a graphics card, but I currently don't
> >> see how that is incompatible with the new permissions check.
> > I presume the daemon also does 'XEN_DOMCTL_iomem_permission' to grant
> > the other domain access to its RAM? And before it makes the
> > XEN_DOMCTL_memory_mapping hypercall.
>
> This is purely an implementer of the ioreq server infrastructure
> providing an emulated set of BARs in the guest as qemu would, but
> without using dom0 map-foreign powers. The gfn ranges in question are
> regular guest RAM as far as I am aware, and should not require any
> special io permissions.

IIRC the ranges in question are sections of BAR on the GPU which the
dom0 code is trying to inject into the guest.

  Paul

>
> Either way, the identified changeset has apparently caused a regression,
> but I am not yet certain whether it is legitimately disabling something
> which should not have worked in the first place, or whether it is a
> change which needs reconsidering.
>
> >
> >> Migrations from older Xens to Xen-4.5 fairly reliably crash domains (90%
> >> of the time, both PV and HVM guests). This includes migrates from older
> >> XenServers using the legacy->v2 migration code which is confirmed-good
> >> in "upgrade to Xen-4.4" case. The migration v2 code in use is identical.
> > The XenServer is not using the in-the-tree migration system except in
> > the older version of XenServer right?
>
> XenServer experimental-4.5 is strictly using migration v2, including
> upgrade from legacy, but as far as I am aware identical migration v2 as
> our current Xen-4.4 trunk which works fine for all tests.
>
> >> We have certain machines which are showing reliable failure to boot
> >> under Xen-4.5, where they worked with 4.4. Symptoms range from the dom0
> >> kernel crashing before printing anything, to complaining that the initrd
> >> is corrupt when attempting to decompress. This appears to be hardware
> >> specific.
> > Hardware specific is good. Could you give some ideas of what make/model
> > this is?
>
> They are all IBM blades to the best of my knowledge, which are a similar
> system to the hardware which gave me 1ed7679 to debug, so I am hoping it
> is a latent BIOS issue and not a Xen 4.5 issue.
>
> ~Andrew
Re: [Xen-devel] XenServer Xen-4.5 (-rc3ish) testing
>>> On 09.12.14 at 01:21, wrote:
> On 08/12/2014 20:00, Konrad Rzeszutek Wilk wrote:
>> On Mon, Dec 08, 2014 at 07:03:19PM +, Andrew Cooper wrote:
>>> XEN_DOMCTL_memory_mapping hypercall fails with EPERM where it didn't in
>>> xen-4.4, given identical parameters. The hypercall gained an extra
>>> permission check as part of 0561e1f01e. Our usecase here is a daemon in
>>> dom0 mapping guest RAM to emulate a graphics card, but I currently don't
>>> see how that is incompatible with the new permissions check.
>> I presume the daemon also does 'XEN_DOMCTL_iomem_permission' to grant
>> the other domain access to its RAM? And before it makes the
>> XEN_DOMCTL_memory_mapping hypercall.
>
> This is purely an implementer of the ioreq server infrastructure
> providing an emulated set of BARs in the guest as qemu would, but
> without using dom0 map-foreign powers. The gfn ranges in question are
> regular guest RAM as far as I am aware, and should not require any
> special io permissions.
>
> Either way, the identified changeset has apparently caused a regression,
> but I am not yet certain whether it is legitimately disabling something
> which should not have worked in the first place, or whether it is a
> change which needs reconsidering.

Actually I think this is still too lax: When we set up Dom0's iomem
permissions, we blindly add all memory, when we should only add all
I/O memory (or better, exclude all RAM). Iiuc such a change would
similarly break your daemon, without need for Arianna's.

Jan
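Jan's "exclude all RAM" suggestion amounts to range subtraction: start from the whole machine address space and carve out every RAM region, leaving only what Dom0 should be granted as I/O memory. The sketch below illustrates that idea only. It is not Xen's rangeset code, `exclude_ranges` is a hypothetical helper, and the frame numbers are made up.

```python
def exclude_ranges(whole, holes):
    """Return the parts of the inclusive range `whole` not covered by `holes`.

    `whole` is a (first, last) tuple; `holes` is an iterable of such tuples
    (here: RAM regions). Overlapping or unsorted holes are tolerated.
    """
    start, end = whole
    result = []
    cursor = start                      # first frame not yet accounted for
    for lo, hi in sorted(holes):
        if hi < cursor or lo > end:
            continue                    # hole lies outside what is left
        if lo > cursor:
            result.append((cursor, lo - 1))
        cursor = max(cursor, hi + 1)
    if cursor <= end:
        result.append((cursor, end))
    return result


# Made-up layout: two RAM regions inside a 0x0-0xfffff frame space.
ram = [(0x100, 0xbffff), (0x0, 0x9f)]
dom0_iomem = exclude_ranges((0x0, 0xfffff), ram)
```

With this layout Dom0 would be left with the legacy hole `(0xa0, 0xff)` and the region above RAM `(0xc0000, 0xfffff)`. No guest RAM frame appears in the result, which is why, as Jan notes, such a change would break a daemon that passes RAM frames to the hypercall.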
Re: [Xen-devel] XenServer Xen-4.5 (-rc3ish) testing
>>> On 08.12.14 at 20:03, wrote:
> XEN_DOMCTL_memory_mapping hypercall fails with EPERM where it didn't in
> xen-4.4, given identical parameters. The hypercall gained an extra
> permission check as part of 0561e1f01e. Our usecase here is a daemon in
> dom0 mapping guest RAM to emulate a graphics card, but I currently don't
> see how that is incompatible with the new permissions check.

This seems quite obvious: The added check makes sure that what gets
mapped is I/O memory both domains have access to, yet you say the
daemon maps guest RAM. What I can't see is why you need this hypercall
in this case - given what you say it's certainly not meant for the
daemon to map memory into the guest? Mapping guest RAM ought to work
via the privcmd kernel interface.

> We have certain machines which are showing reliable failure to boot
> under Xen-4.5, where they worked with 4.4. Symptoms range from the dom0
> kernel crashing before printing anything, to complaining that the initrd
> is corrupt when attempting to decompress. This appears to be hardware
> specific.

Any chance this is C-state related, just like narrowed down to for
http://lists.xenproject.org/archives/html/xen-devel/2014-11/msg00228.html?
I.e. Westmere Xeons being affected? If not, this would seem rather
worrying to me (read: a release blocker). And even if so, a workaround
would be minimally needed. Otoh you didn't report so for earlier RCs -
was that just because the testing scope was more narrow then, or can
we imply that this is a recently introduced regression?

Jan
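A toy model of the check Jan describes may help show why the daemon now sees EPERM. It is a sketch, not Xen code: the `Domain` class and both functions are illustrative stand-ins, the frame numbers are invented, and the rule that both the calling domain and the target domain need iomem permission over the whole range is taken from Jan's description above rather than from the changeset itself.

```python
import errno
from dataclasses import dataclass, field


@dataclass
class Domain:
    """Hypothetical stand-in for a domain and its iomem permission ranges."""
    name: str
    # Inclusive (first_mfn, last_mfn) ranges usable as I/O memory.
    iomem_caps: list = field(default_factory=list)


def iomem_access_permitted(dom, first_mfn, last_mfn):
    """True if one permitted range covers [first_mfn, last_mfn].

    Simplification: a request spanning two adjacent permitted ranges
    is rejected here.
    """
    return any(lo <= first_mfn and last_mfn <= hi for lo, hi in dom.iomem_caps)


def memory_mapping(caller, target, first_mfn, nr_mfns):
    """Model of the check: both domains need access to the whole range.

    Returns 0 on success or a negative errno value, hypercall style.
    """
    last_mfn = first_mfn + nr_mfns - 1
    if not (iomem_access_permitted(caller, first_mfn, last_mfn)
            and iomem_access_permitted(target, first_mfn, last_mfn)):
        return -errno.EPERM
    return 0


# Invented frames: 0xf0000-0xf00ff is a device BAR, 0x1000-0x1fff is RAM.
dom0 = Domain("dom0", iomem_caps=[(0xf0000, 0xf00ff)])
guest = Domain("guest", iomem_caps=[(0xf0000, 0xf00ff)])

bar_result = memory_mapping(dom0, guest, 0xf0000, 0x100)  # permitted I/O range
ram_result = memory_mapping(dom0, guest, 0x1000, 0x10)    # RAM frames
```

In this model the BAR request succeeds while the RAM request returns `-EPERM`, since RAM frames appear in neither domain's iomem permissions.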
Re: [Xen-devel] XenServer Xen-4.5 (-rc3ish) testing
On 08/12/2014 20:00, Konrad Rzeszutek Wilk wrote:
> On Mon, Dec 08, 2014 at 07:03:19PM +, Andrew Cooper wrote:
>> Hi,
> Hey Andrew!
>> Over the weekend, XenServer testing managed to run a side-by-side test
>> of XenServer trunk and XenServer experimental xen-4.5. These are
>> identical other than the version of Xen (and associated libraries) in
>> use, i.e. identical dom0 kernel, Xapi toolstack and dom0 userspace,
>> other than as linked against newer Xen libraries.
>>
>> The Xen-4.5 tests were on top of c/s e6c3d371d4 "systemd: use pkg-config
>> to determine systemd library availability"
>>
>> There are a few notable issues exposed:
>>
>> XEN_DOMCTL_memory_mapping hypercall fails with EPERM where it didn't in
>> xen-4.4, given identical parameters. The hypercall gained an extra
>> permission check as part of 0561e1f01e. Our usecase here is a daemon in
>> dom0 mapping guest RAM to emulate a graphics card, but I currently don't
>> see how that is incompatible with the new permissions check.
> I presume the daemon also does 'XEN_DOMCTL_iomem_permission' to grant
> the other domain access to its RAM? And before it makes the
> XEN_DOMCTL_memory_mapping hypercall.

This is purely an implementer of the ioreq server infrastructure
providing an emulated set of BARs in the guest as qemu would, but
without using dom0 map-foreign powers. The gfn ranges in question are
regular guest RAM as far as I am aware, and should not require any
special io permissions.

Either way, the identified changeset has apparently caused a regression,
but I am not yet certain whether it is legitimately disabling something
which should not have worked in the first place, or whether it is a
change which needs reconsidering.

>
>> Migrations from older Xens to Xen-4.5 fairly reliably crash domains (90%
>> of the time, both PV and HVM guests). This includes migrates from older
>> XenServers using the legacy->v2 migration code which is confirmed-good
>> in "upgrade to Xen-4.4" case. The migration v2 code in use is identical.
> The XenServer is not using the in-the-tree migration system except in
> the older version of XenServer right?

XenServer experimental-4.5 is strictly using migration v2, including
upgrade from legacy, but as far as I am aware identical migration v2 as
our current Xen-4.4 trunk which works fine for all tests.

>> We have certain machines which are showing reliable failure to boot
>> under Xen-4.5, where they worked with 4.4. Symptoms range from the dom0
>> kernel crashing before printing anything, to complaining that the initrd
>> is corrupt when attempting to decompress. This appears to be hardware
>> specific.
> Hardware specific is good. Could you give some ideas of what make/model
> this is?

They are all IBM blades to the best of my knowledge, which are a similar
system to the hardware which gave me 1ed7679 to debug, so I am hoping it
is a latent BIOS issue and not a Xen 4.5 issue.

~Andrew
Re: [Xen-devel] XenServer Xen-4.5 (-rc3ish) testing
On Mon, Dec 08, 2014 at 07:03:19PM +, Andrew Cooper wrote:
> Hi,

Hey Andrew!

>
> Over the weekend, XenServer testing managed to run a side-by-side test
> of XenServer trunk and XenServer experimental xen-4.5. These are
> identical other than the version of Xen (and associated libraries) in
> use, i.e. identical dom0 kernel, Xapi toolstack and dom0 userspace,
> other than as linked against newer Xen libraries.
>
> The Xen-4.5 tests were on top of c/s e6c3d371d4 "systemd: use pkg-config
> to determine systemd library availability"
>
> There are a few notable issues exposed:
>
> XEN_DOMCTL_memory_mapping hypercall fails with EPERM where it didn't in
> xen-4.4, given identical parameters. The hypercall gained an extra
> permission check as part of 0561e1f01e. Our usecase here is a daemon in
> dom0 mapping guest RAM to emulate a graphics card, but I currently don't
> see how that is incompatible with the new permissions check.

I presume the daemon also does 'XEN_DOMCTL_iomem_permission' to grant
the other domain access to its RAM? And before it makes the
XEN_DOMCTL_memory_mapping hypercall.

>
> Migrations from older Xens to Xen-4.5 fairly reliably crash domains (90%
> of the time, both PV and HVM guests). This includes migrates from older
> XenServers using the legacy->v2 migration code which is confirmed-good
> in "upgrade to Xen-4.4" case. The migration v2 code in use is identical.

The XenServer is not using the in-the-tree migration system except in
the older version of XenServer right?

>
> We have certain machines which are showing reliable failure to boot
> under Xen-4.5, where they worked with 4.4. Symptoms range from the dom0
> kernel crashing before printing anything, to complaining that the initrd
> is corrupt when attempting to decompress. This appears to be hardware
> specific.

Hardware specific is good. Could you give some ideas of what make/model
this is?

>
>
> I will be looking into all of these issues, to identify whether they are
> indeed regressions in xen-4.5, or whether I have made some mistakes
> forward porting our patch queue.

OK

>
> Unfortunately, our PCI Passthrough testing has been blocked behind other
> regressions which have crept in recently, meaning that both sets of
> tests were equally affected, and no 4.4 vs 4.5 comparison has been
> possible at this time. I hope this will be fixed by the next time I run
> a similar pair of tests.

Thank you!

>
> ~Andrew
>
[Xen-devel] XenServer Xen-4.5 (-rc3ish) testing
Hi,

Over the weekend, XenServer testing managed to run a side-by-side test
of XenServer trunk and XenServer experimental xen-4.5. These are
identical other than the version of Xen (and associated libraries) in
use, i.e. identical dom0 kernel, Xapi toolstack and dom0 userspace,
other than as linked against newer Xen libraries.

The Xen-4.5 tests were on top of c/s e6c3d371d4 "systemd: use pkg-config
to determine systemd library availability"

There are a few notable issues exposed:

XEN_DOMCTL_memory_mapping hypercall fails with EPERM where it didn't in
xen-4.4, given identical parameters. The hypercall gained an extra
permission check as part of 0561e1f01e. Our usecase here is a daemon in
dom0 mapping guest RAM to emulate a graphics card, but I currently don't
see how that is incompatible with the new permissions check.

Migrations from older Xens to Xen-4.5 fairly reliably crash domains (90%
of the time, both PV and HVM guests). This includes migrates from older
XenServers using the legacy->v2 migration code which is confirmed-good
in "upgrade to Xen-4.4" case. The migration v2 code in use is identical.

We have certain machines which are showing reliable failure to boot
under Xen-4.5, where they worked with 4.4. Symptoms range from the dom0
kernel crashing before printing anything, to complaining that the initrd
is corrupt when attempting to decompress. This appears to be hardware
specific.

I will be looking into all of these issues, to identify whether they are
indeed regressions in xen-4.5, or whether I have made some mistakes
forward porting our patch queue.

Unfortunately, our PCI Passthrough testing has been blocked behind other
regressions which have crept in recently, meaning that both sets of
tests were equally affected, and no 4.4 vs 4.5 comparison has been
possible at this time. I hope this will be fixed by the next time I run
a similar pair of tests.

~Andrew