Re: [Xen-devel] [BUG] Assertion '(sp == 0) || (peoi[sp-1].vector < vector)' failed at irq.c:1163
On 22/01/2016 08:57, Håkon Alstadheim wrote:
> On 17 Jan 2016 16:25, Andrew Cooper wrote:
>> On 17/01/16 15:16, Andrew Cooper wrote:
>>>>> This isn't the first time we have seen this on Haswell processors. Do
>>>>> you have microcode loading set up?
>>>>>
>>>>> ~Andrew
>>>> Still happening with kernel-genkernel-x86_64-4.1.15-gentoo and updated
>>>> cpu microcode, using microcode from 20151106.
>>> ...
>> Actually, this will be more useful:
>>
>> diff --git a/xen/arch/x86/irq.c b/xen/arch/x86/irq.c
>> index 1228568..4e75b03 100644
>> --- a/xen/arch/x86/irq.c
>> +++ b/xen/arch/x86/irq.c
>> @@ -1165,6 +1165,15 @@ static void __do_IRQ_guest(int irq)
>>      if ( action->ack_type == ACKTYPE_EOI )
>>      {
>>          sp = pending_eoi_sp(peoi);
>> +        if ( unlikely(!((sp == 0) || (peoi[sp-1].vector < vector))) )
>> +        {
>> +            int p;
>> +
>> +            printk("** sp %d, irq %d, vec %#x\n", sp, irq, vector);
>> +            for ( p = sp; p > 0; --p )
>> +                printk("**peoi[%d] = {%d, %#x, %d}\n",
>> +                       p-1, peoi[p-1].irq, peoi[p-1].vector, peoi[p-1].ready);
>> +        }
>>          ASSERT((sp == 0) || (peoi[sp-1].vector < vector));
>>          ASSERT(sp < (NR_DYNAMIC_VECTORS-1));
>>          peoi[sp].irq = irq;
> Got one again. dom5 is my desktop, dom1 is my
> mail-server/router/firewall. (planning to split that up ... ) . Is there
> any additional info that would be useful?
>
> Running now with gentoo xen 4.6.0-r8 and xen-tools 4.6.0-r7. dom0 kernel
> is gentoo-sources-4.1.15-r1 , and the above patch.
>
> I tried running with maxcpus=6 for a while, but I had to disable some
> services to get that running. So, when nothing happened for a while I
> re-enabled all my cores (two cpus, 12 cores, 24 threads). I was running
> with two cpu-pools, one for each cpu. I have not re-enabled that.
>
> grant_table.c:1491:d1v3 Expanding dom (1) grant table from (12) to (13) frames.
> ** sp 1, irq 107, vec 0x3b
> **peoi[0] = {107, 0x3b, 0}
> Assertion '(sp == 0) || (peoi[sp-1].vector < vector)' failed at irq.c:1172
> [ Xen-4.6.0 x86_64 debug=y Tainted:C ]
>
> Xen call trace:
>    [] do_IRQ+0x451/0x6ea
>    [] common_interrupt+0x62/0x70
>    [] mwait_idle+0x2cb/0x315
>    [] idle_loop+0x51/0x6b

So we have been interrupted with an interrupt we already believe to be
pending. I wonder if there is an erratum to do with going to sleep with
a pending interrupt.

I will see about extending the debugging patch to stash the IRR/ISR
before going to sleep.

~Andrew

___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel
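For readers following the thread, the check that fired can be modelled in a few lines. This is only a minimal sketch, not the code in xen/arch/x86/irq.c: the struct layout, its field types and both function names are assumptions made for illustration, with the field names borrowed from the debug printout. It shows why the logged state (sp 1, peoi[0] = {107, 0x3b, 0}, incoming vec 0x3b) violates the asserted condition: the new vector is equal to, not strictly greater than, the vector already on top of the pending-EOI stack.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for one pending-EOI stack entry. The field
 * names follow the debug printout in the thread; the types are assumed. */
struct peoi_entry {
    int irq;
    unsigned int vector;
    int ready;
};

/* True when pushing `vector` keeps the stack strictly increasing by
 * vector, i.e. the condition the failing ASSERT checks. */
bool peoi_push_ok(const struct peoi_entry *peoi, int sp, unsigned int vector)
{
    assert(sp >= 0);
    return (sp == 0) || (peoi[sp - 1].vector < vector);
}

/* The state from the log: sp 1, peoi[0] = {107, 0x3b, 0}, vec 0x3b. */
bool reported_case_ok(void)
{
    const struct peoi_entry peoi[] = { { 107, 0x3b, 0 } };
    return peoi_push_ok(peoi, 1, 0x3b);
}
```

Calling `reported_case_ok()` returns false, which matches Andrew's reading that the CPU took an interrupt for a vector it already considered pending.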
Re: [Xen-devel] [BUG] Assertion '(sp == 0) || (peoi[sp-1].vector < vector)' failed at irq.c:1163
>>> On 22.01.16 at 10:20, wrote:
> ** sp 1, irq 107, vec 0x3b
> **peoi[0] = {107, 0x3b, 0}
> Assertion '(sp == 0) || (peoi[sp-1].vector < vector)' failed at irq.c:1172
> [ Xen-4.6.0 x86_64 debug=y Tainted:C ]
>
> Xen call trace:
>    [] do_IRQ+0x451/0x6ea
>    [] common_interrupt+0x62/0x70
>    [] mwait_idle+0x2cb/0x315
>    [] idle_loop+0x51/0x6b
>
> So we have been interrupted with an interrupt we already believe to be
> pending. I wonder if there is an erratum to do with going to sleep with
> a pending interrupt.

An immediate way to check whether that's (part of) the problem would
be to run with "cpuidle=0" for a while.

Jan
Re: [Xen-devel] [BUG] Assertion '(sp == 0) || (peoi[sp-1].vector < vector)' failed at irq.c:1163
>>> On 18.01.16 at 11:35, wrote:
> On 18/01/16 10:31, Jan Beulich wrote:
>> On 18.01.16 at 00:07, wrote:
>>> There we go :-/ . Log attached from boot to assertion-failure with
>>> loglvl=all guest_loglvl=all . Some of the log output might be a bit
>>> cryptic, they are notes to myself from local boot-scripts, basically
>>> firing up my router/name-server/dhcp-server and waiting until services
>>> are ready before continuing.
>>>
>>> ---
>>> (XEN) [2016-01-17 22:46:38] **peoi[0] = {107, 0x40, 0}
>> According to
>>
>> (XEN) [2016-01-17 16:50:49] IOAPIC[0]: Set PCI routing entry (1-3 -> 0x40 -> IRQ 3 Mode:0 Active:0)
>>
>> this might be the serial console, albeit IRQ 107 contradicts this
>> afaict. Does this also occur without serial console? Are we
>> perhaps wrongly re-using vector 0x40 (and if so might this be
>> fixed with -unstable commit fc0c3fa2ad, in turn requiring
>> e509b8e09c)?
>
> I also had a bug in the first patch which printed the vector as 0x%u,
> fixed in the second to be %#x. As such, the actual vector on the
> pending EOI stack is 0x28.

That wouldn't make it any better, as then, considering the other
similar messages, we would have to conclude it's the vector of some
other Xen internally used device (the IOMMU?), which again shouldn't
be used by guest IRQ unless it got recycled (albeit I don't think
e.g. IOMMU vectors get recycled at all).

Håkon, considering

(XEN) Failed to enable Interrupt Remapping: Will not enable x2APIC.

plus

(XEN) Intel VT-d Interrupt Remapping enabled.

(a logging inconsistency addressed on -unstable already) could you
check your BIOS setup whether you can make firmware permit use of
x2APIC mode? And could you try whether the issue goes away with
"maxcpus=6" (or less) on the Xen command line?

Also, you appear to be doing GPU pass-through - is the problem
connected to that?

Jan
Re: [Xen-devel] [BUG] Assertion '(sp == 0) || (peoi[sp-1].vector < vector)' failed at irq.c:1163
On 18/01/16 10:31, Jan Beulich wrote:
> On 18.01.16 at 00:07, wrote:
>> There we go :-/ . Log attached from boot to assertion-failure with
>> loglvl=all guest_loglvl=all . Some of the log output might be a bit
>> cryptic, they are notes to myself from local boot-scripts, basically
>> firing up my router/name-server/dhcp-server and waiting until services
>> are ready before continuing.
>>
>> ---
>> (XEN) [2016-01-17 22:46:38] **peoi[0] = {107, 0x40, 0}
> According to
>
> (XEN) [2016-01-17 16:50:49] IOAPIC[0]: Set PCI routing entry (1-3 -> 0x40 -> IRQ 3 Mode:0 Active:0)
>
> this might be the serial console, albeit IRQ 107 contradicts this
> afaict. Does this also occur without serial console? Are we
> perhaps wrongly re-using vector 0x40 (and if so might this be
> fixed with -unstable commit fc0c3fa2ad, in turn requiring
> e509b8e09c)?

I also had a bug in the first patch which printed the vector as 0x%u,
fixed in the second to be %#x. As such, the actual vector on the
pending EOI stack is 0x28.

~Andrew
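The format-specifier slip Andrew describes is easy to reproduce outside the hypervisor. The sketch below uses standard C `snprintf` as a stand-in for Xen's `printk`, on the assumption that both treat `%u` and `%#x` the same way; the two function names are made up for illustration. With "0x%u" the value is printed in decimal behind a hex-looking prefix, so a vector of 0x28 (decimal 40) shows up as "0x40".

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Format used by the first debug patch: decimal digits behind a
 * literal "0x" prefix, which reads like hex but is not. */
int format_first_patch(char *buf, size_t len, unsigned int vector)
{
    assert(buf != NULL);
    return snprintf(buf, len, "0x%u", vector);
}

/* Format used by the second debug patch: real hex with a 0x prefix. */
int format_second_patch(char *buf, size_t len, unsigned int vector)
{
    assert(buf != NULL);
    return snprintf(buf, len, "%#x", vector);
}
```

Formatting 0x28 with the first function yields the string "0x40" and with the second "0x28", which is why the "0x40" in the earlier log has to be read as decimal 40.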
Re: [Xen-devel] [BUG] Assertion '(sp == 0) || (peoi[sp-1].vector < vector)' failed at irq.c:1163
>>> On 18.01.16 at 00:07, wrote:
> There we go :-/ . Log attached from boot to assertion-failure with
> loglvl=all guest_loglvl=all . Some of the log output might be a bit
> cryptic, they are notes to myself from local boot-scripts, basically
> firing up my router/name-server/dhcp-server and waiting until services
> are ready before continuing.
>
> ---
> (XEN) [2016-01-17 22:46:38] **peoi[0] = {107, 0x40, 0}

According to

(XEN) [2016-01-17 16:50:49] IOAPIC[0]: Set PCI routing entry (1-3 -> 0x40 -> IRQ 3 Mode:0 Active:0)

this might be the serial console, albeit IRQ 107 contradicts this
afaict. Does this also occur without serial console? Are we
perhaps wrongly re-using vector 0x40 (and if so might this be
fixed with -unstable commit fc0c3fa2ad, in turn requiring
e509b8e09c)?

Jan
Re: [Xen-devel] [BUG] Assertion '(sp == 0) || (peoi[sp-1].vector < vector)' failed at irq.c:1163
On 18 Jan 2016 11:31, Jan Beulich wrote:
> On 18.01.16 at 00:07, wrote:
>> There we go :-/ . Log attached from boot to assertion-failure with
>> loglvl=all guest_loglvl=all . Some of the log output might be a bit
>> cryptic, they are notes to myself from local boot-scripts, basically
>> firing up my router/name-server/dhcp-server and waiting until services
>> are ready before continuing.
>>
>> ---
>> (XEN) [2016-01-17 22:46:38] **peoi[0] = {107, 0x40, 0}
> According to
>
> (XEN) [2016-01-17 16:50:49] IOAPIC[0]: Set PCI routing entry (1-3 -> 0x40 -> IRQ 3 Mode:0 Active:0)
>
> this might be the serial console, albeit IRQ 107 contradicts this
> afaict. Does this also occur without serial console? Are we
> perhaps wrongly re-using vector 0x40 (and if so might this be
> fixed with -unstable commit fc0c3fa2ad, in turn requiring
> e509b8e09c)?
>

I don't understand all this, but fyi I believe I have two "serial
ports" on the motherboard, one old fashioned serial-port and one
created by BMC for "SOL". They show up in dom0 as ttyS0 and ttyS1. Only
ttyS0 is ever used that I am aware of. I also have one usb-rs232
emulation thingy which is actually my UPS. All of these are used
directly by dom0.
Re: [Xen-devel] [BUG] Assertion '(sp == 0) || (peoi[sp-1].vector < vector)' failed at irq.c:1163
On 17/01/2016 23:07, Håkon Alstadheim wrote:
> On 17 Jan 2016 17:30, Håkon Alstadheim wrote:
>> On 17 Jan 2016 16:16, Andrew Cooper wrote:
>>> On 17/01/16 14:50, Håkon Alstadheim wrote:
>>>> On 15 Jan 2016 12:05, Andrew Cooper wrote:
>>>>> On 15/01/16 10:58, Håkon Alstadheim wrote:
>>>>>> CPUINFO:
>>>>>> vendor_id: GenuineIntel
>>>>>> cpu family: 6
>>>>>> model: 63
>>>>>> model name: Intel(R) Xeon(R) CPU E5-2620 v3 @ 2.40GHz
>>>>>>
>>>>>> # smbios-sys-info
>>>>>> Libsmbios version: 2.2.28
>>>>>> Product Name: Z10PE-D8 WS
>>>>>> Vendor: ASUSTeK COMPUTER INC.
>>>>>> BIOS Version: 3101
>>>>>>
>>>>>>
>>>>>> I have been experiencing issues with domains with passed through PCIe
>>>>>> devices since I first installed xen. Then at version 4.5.x , I'm now
>>>>>> at 4.6.0 with gentoo patches. Crashes SEEM mostly related to this pci
>>>>>> pass through and interrupts (usb-cards, sound cards).
>>>>>>
>>>>>> Recently the system has been more stable, whether it is because I pass
>>>>>> through as few things as possible, or because of improvements in Xen I
>>>>>> do not know. I have also taken to building with debug, which leads to
>>>>>> more abrupt but less mysterious failures. Earlier (w/o debug and under
>>>>>> xen 4.5 ) stuff would just gradually stop working and end up in total
>>>>>> hang of everything. So, hey, things are improving :-b
>>>>> This isn't the first time we have seen this on Haswell processors. Do
>>>>> you have microcode loading set up?
>>>>>
>>>>> ~Andrew
>>>>>
>>>> Still happening with kernel-genkernel-x86_64-4.1.15-gentoo and updated
>>>> cpu microcode, using microcode from 20151106.
>>> Ok - I previously investigated this issue, but my repro evaporated from
>>> under my feet with a firmware update, and I never got to the bottom of it.
>>>
>>> Please can you start with the following patch which will dump some more
>>> information on crash.
>>>
>>> ---8<---
>>> diff --git a/xen/arch/x86/irq.c b/xen/arch/x86/irq.c
>>> index 1228568..588b562 100644
>>> --- a/xen/arch/x86/irq.c
>>> +++ b/xen/arch/x86/irq.c
>>> @@ -1165,6 +1165,13 @@ static void __do_IRQ_guest(int irq)
>>>      if ( action->ack_type == ACKTYPE_EOI )
>>>      {
>>>          sp = pending_eoi_sp(peoi);
>>> +        if ( unlikely(!((sp == 0) || (peoi[sp-1].vector < vector))) )
>>> +        {
>>> +            int p;
>>> +            for ( p = sp; p > 0; --p )
>>> +                printk("**peoi[%d] = {%d, 0x%u, %d}\n",
>>> +                       p-1, peoi[p-1].irq, peoi[p-1].vector,
>>> peoi[p-1].ready);
>>> +        }
>>>          ASSERT((sp == 0) || (peoi[sp-1].vector < vector));
>>>          ASSERT(sp < (NR_DYNAMIC_VECTORS-1));
>>>          peoi[sp].irq = irq;
>>>
>>>
>> Will do. Building now.
>> Seems there is a line accidentally folded "peoi[p-1].ready);" belongs at
>> the end of preceding line I presume?
>>
> There we go :-/ . Log attached from boot to assertion-failure with
> loglvl=all guest_loglvl=all . Some of the log output might be a bit
> cryptic, they are notes to myself from local boot-scripts, basically
> firing up my router/name-server/dhcp-server and waiting until services
> are ready before continuing.

Would you mind running with the second patch I sent? It gathers more
information.

~Andrew
Re: [Xen-devel] [BUG] Assertion '(sp == 0) || (peoi[sp-1].vector < vector)' failed at irq.c:1163
On 17/01/16 14:50, Håkon Alstadheim wrote:
> On 15 Jan 2016 12:05, Andrew Cooper wrote:
>> On 15/01/16 10:58, Håkon Alstadheim wrote:
>>> CPUINFO:
>>> vendor_id: GenuineIntel
>>> cpu family: 6
>>> model: 63
>>> model name: Intel(R) Xeon(R) CPU E5-2620 v3 @ 2.40GHz
>>>
>>> # smbios-sys-info
>>> Libsmbios version: 2.2.28
>>> Product Name: Z10PE-D8 WS
>>> Vendor: ASUSTeK COMPUTER INC.
>>> BIOS Version: 3101
>>>
>>>
>>> I have been experiencing issues with domains with passed through PCIe
>>> devices since I first installed xen. Then at version 4.5.x , I'm now
>>> at 4.6.0 with gentoo patches. Crashes SEEM mostly related to this pci
>>> pass through and interrupts (usb-cards, sound cards).
>>>
>>> Recently the system has been more stable, whether it is because I pass
>>> through as few things as possible, or because of improvements in Xen I
>>> do not know. I have also taken to building with debug, which leads to
>>> more abrupt but less mysterious failures. Earlier (w/o debug and under
>>> xen 4.5 ) stuff would just gradually stop working and end up in total
>>> hang of everything. So, hey, things are improving :-b
>> This isn't the first time we have seen this on Haswell processors. Do
>> you have microcode loading set up?
>>
>> ~Andrew
>>
> Still happening with kernel-genkernel-x86_64-4.1.15-gentoo and updated
> cpu microcode, using microcode from 20151106.

Ok - I previously investigated this issue, but my repro evaporated from
under my feet with a firmware update, and I never got to the bottom of it.

Please can you start with the following patch which will dump some more
information on crash.

---8<---
diff --git a/xen/arch/x86/irq.c b/xen/arch/x86/irq.c
index 1228568..588b562 100644
--- a/xen/arch/x86/irq.c
+++ b/xen/arch/x86/irq.c
@@ -1165,6 +1165,13 @@ static void __do_IRQ_guest(int irq)
     if ( action->ack_type == ACKTYPE_EOI )
     {
         sp = pending_eoi_sp(peoi);
+        if ( unlikely(!((sp == 0) || (peoi[sp-1].vector < vector))) )
+        {
+            int p;
+            for ( p = sp; p > 0; --p )
+                printk("**peoi[%d] = {%d, 0x%u, %d}\n",
+                       p-1, peoi[p-1].irq, peoi[p-1].vector, peoi[p-1].ready);
+        }
         ASSERT((sp == 0) || (peoi[sp-1].vector < vector));
         ASSERT(sp < (NR_DYNAMIC_VECTORS-1));
         peoi[sp].irq = irq;
Re: [Xen-devel] [BUG] Assertion '(sp == 0) || (peoi[sp-1].vector < vector)' failed at irq.c:1163
On 17/01/16 15:16, Andrew Cooper wrote:
>
>>> This isn't the first time we have seen this on Haswell processors. Do
>>> you have microcode loading set up?
>>>
>>> ~Andrew
>>>
>> Still happening with kernel-genkernel-x86_64-4.1.15-gentoo and updated
>> cpu microcode, using microcode from 20151106.
> Ok - I previously investigated this issue, but my repro evaporated from
> under my feet with a firmware update, and I never got to the bottom of it.
>
> Please can you start with the following patch which will dump some more
> information on crash.
>
> ---8<---
> diff --git a/xen/arch/x86/irq.c b/xen/arch/x86/irq.c
> index 1228568..588b562 100644
> --- a/xen/arch/x86/irq.c
> +++ b/xen/arch/x86/irq.c
> @@ -1165,6 +1165,13 @@ static void __do_IRQ_guest(int irq)
>      if ( action->ack_type == ACKTYPE_EOI )
>      {
>          sp = pending_eoi_sp(peoi);
> +        if ( unlikely(!((sp == 0) || (peoi[sp-1].vector < vector))) )
> +        {
> +            int p;
> +            for ( p = sp; p > 0; --p )
> +                printk("**peoi[%d] = {%d, 0x%u, %d}\n",
> +                       p-1, peoi[p-1].irq, peoi[p-1].vector,
> peoi[p-1].ready);
> +        }
>          ASSERT((sp == 0) || (peoi[sp-1].vector < vector));
>          ASSERT(sp < (NR_DYNAMIC_VECTORS-1));
>          peoi[sp].irq = irq;

Actually, this will be more useful:

diff --git a/xen/arch/x86/irq.c b/xen/arch/x86/irq.c
index 1228568..4e75b03 100644
--- a/xen/arch/x86/irq.c
+++ b/xen/arch/x86/irq.c
@@ -1165,6 +1165,15 @@ static void __do_IRQ_guest(int irq)
     if ( action->ack_type == ACKTYPE_EOI )
     {
         sp = pending_eoi_sp(peoi);
+        if ( unlikely(!((sp == 0) || (peoi[sp-1].vector < vector))) )
+        {
+            int p;
+
+            printk("** sp %d, irq %d, vec %#x\n", sp, irq, vector);
+            for ( p = sp; p > 0; --p )
+                printk("**peoi[%d] = {%d, %#x, %d}\n",
+                       p-1, peoi[p-1].irq, peoi[p-1].vector, peoi[p-1].ready);
+        }
         ASSERT((sp == 0) || (peoi[sp-1].vector < vector));
         ASSERT(sp < (NR_DYNAMIC_VECTORS-1));
         peoi[sp].irq = irq;
Re: [Xen-devel] [BUG] Assertion '(sp == 0) || (peoi[sp-1].vector < vector)' failed at irq.c:1163
On 17 Jan 2016 16:16, Andrew Cooper wrote:
> On 17/01/16 14:50, Håkon Alstadheim wrote:
>> On 15 Jan 2016 12:05, Andrew Cooper wrote:
>>> On 15/01/16 10:58, Håkon Alstadheim wrote:
>>>> CPUINFO:
>>>> vendor_id: GenuineIntel
>>>> cpu family: 6
>>>> model: 63
>>>> model name: Intel(R) Xeon(R) CPU E5-2620 v3 @ 2.40GHz
>>>>
>>>> # smbios-sys-info
>>>> Libsmbios version: 2.2.28
>>>> Product Name: Z10PE-D8 WS
>>>> Vendor: ASUSTeK COMPUTER INC.
>>>> BIOS Version: 3101
>>>>
>>>> I have been experiencing issues with domains with passed through PCIe
>>>> devices since I first installed xen. Then at version 4.5.x , I'm now
>>>> at 4.6.0 with gentoo patches. Crashes SEEM mostly related to this pci
>>>> pass through and interrupts (usb-cards, sound cards).
>>>>
>>>> Recently the system has been more stable, whether it is because I pass
>>>> through as few things as possible, or because of improvements in Xen I
>>>> do not know. I have also taken to building with debug, which leads to
>>>> more abrupt but less mysterious failures. Earlier (w/o debug and under
>>>> xen 4.5 ) stuff would just gradually stop working and end up in total
>>>> hang of everything. So, hey, things are improving :-b
>>> This isn't the first time we have seen this on Haswell processors. Do
>>> you have microcode loading set up?
>>>
>>> ~Andrew
>>>
>> Still happening with kernel-genkernel-x86_64-4.1.15-gentoo and updated
>> cpu microcode, using microcode from 20151106.
> Ok - I previously investigated this issue, but my repro evaporated from
> under my feet with a firmware update, and I never got to the bottom of it.
>
> Please can you start with the following patch which will dump some more
> information on crash.
>
> ---8<---
> diff --git a/xen/arch/x86/irq.c b/xen/arch/x86/irq.c
> index 1228568..588b562 100644
> --- a/xen/arch/x86/irq.c
> +++ b/xen/arch/x86/irq.c
> @@ -1165,6 +1165,13 @@ static void __do_IRQ_guest(int irq)
>      if ( action->ack_type == ACKTYPE_EOI )
>      {
>          sp = pending_eoi_sp(peoi);
> +        if ( unlikely(!((sp == 0) || (peoi[sp-1].vector < vector))) )
> +        {
> +            int p;
> +            for ( p = sp; p > 0; --p )
> +                printk("**peoi[%d] = {%d, 0x%u, %d}\n",
> +                       p-1, peoi[p-1].irq, peoi[p-1].vector,
> peoi[p-1].ready);
> +        }
>          ASSERT((sp == 0) || (peoi[sp-1].vector < vector));
>          ASSERT(sp < (NR_DYNAMIC_VECTORS-1));
>          peoi[sp].irq = irq;
>
>

Will do. Building now.
It seems a line was accidentally folded: "peoi[p-1].ready);" belongs at
the end of the preceding line, I presume?
Re: [Xen-devel] [BUG] Assertion '(sp == 0) || (peoi[sp-1].vector < vector)' failed at irq.c:1163
On 15/01/16 10:58, Håkon Alstadheim wrote:
> This is just a preliminary report, mostly just for the record.
>
> I will report again if this keeps happening after 4.7 is out, or upon
> request. Anyone working on this, please mail me and request more
> information. I have available logs from dom0 boot (I dump dmesg and xl
> dmesg to disk after every boot, and log dom0 serial console to disk).
> I will send boot logs if requested. I will turn on maximum verbosity
> and provide all output. My serial console is very slow, so I can not
> keep running at max verbosity all the time.
>
> At the end of this mail there is "xl info" and output from dom0 serial
> console.
>
> CPUINFO:
> vendor_id: GenuineIntel
> cpu family: 6
> model: 63
> model name: Intel(R) Xeon(R) CPU E5-2620 v3 @ 2.40GHz
>
> # smbios-sys-info
> Libsmbios version: 2.2.28
> Product Name: Z10PE-D8 WS
> Vendor: ASUSTeK COMPUTER INC.
> BIOS Version: 3101
>
> Dom0 OS:
> Linux gentoo 4.1.12-gentoo #1 SMP Sat Jan 2 09:36:31 CET 2016 x86_64
> Intel(R) Xeon(R) CPU E5-2620 v3 @ 2.40GHz GenuineIntel GNU/Linux.
> Kernel is gentoo-sources, with experimental use-flag. Cpu type set to
> Haswell. Issue also happened without experimental.
> # cat /proc/cmdline
> placeholder root=LABEL=ssdroot ro
> xen-pciback.hide=(02:00.*)(08:00.*)(00:1b.*)(81:00.*)(82:00.*)(83:00.*)
> console=hvc0
> console=vga domodules domdadm dolvm intel_iommu=on earlyprintk=xen
> usbcore.autosuspend=-1
>
> The system is mostly built with stable packages, xen and xen-tools
> keyworded to ~amd64.
>
> I have been experiencing issues with domains with passed through PCIe
> devices since I first installed xen. Then at version 4.5.x , I'm now
> at 4.6.0 with gentoo patches. Crashes SEEM mostly related to this pci
> pass through and interrupts (usb-cards, sound cards).
>
> Recently the system has been more stable, whether it is because I pass
> through as few things as possible, or because of improvements in Xen I
> do not know. I have also taken to building with debug, which leads to
> more abrupt but less mysterious failures. Earlier (w/o debug and under
> xen 4.5 ) stuff would just gradually stop working and end up in total
> hang of everything. So, hey, things are improving :-b

This isn't the first time we have seen this on Haswell processors. Do
you have microcode loading set up?

~Andrew
Re: [Xen-devel] [BUG] Assertion '(sp == 0) || (peoi[sp-1].vector < vector)' failed at irq.c:1163
On 01/15/2016 12:05 PM, Andrew Cooper wrote:
> On 15/01/16 10:58, Håkon Alstadheim wrote:
>> This is just a preliminary report, mostly just for the record.
>>
>> I will report again if this keeps happening after 4.7 is out, or upon
>> request. Anyone working on this, please mail me and request more
>> information. I have available logs from dom0 boot (I dump dmesg and xl
>> dmesg to disk after every boot, and log dom0 serial console to disk).
>> I will send boot logs if requested. I will turn on maximum verbosity
>> and provide all output. My serial console is very slow, so I can not
>> keep running at max verbosity all the time.
>>
>> At the end of this mail there is "xl info" and output from dom0 serial
>> console.
>>
>> CPUINFO:
>> vendor_id: GenuineIntel
>> cpu family: 6
>> model: 63
>> model name: Intel(R) Xeon(R) CPU E5-2620 v3 @ 2.40GHz
>>
>> # smbios-sys-info
>> Libsmbios version: 2.2.28
>> Product Name: Z10PE-D8 WS
>> Vendor: ASUSTeK COMPUTER INC.
>> BIOS Version: 3101
>>
>> Dom0 OS:
>> Linux gentoo 4.1.12-gentoo #1 SMP Sat Jan 2 09:36:31 CET 2016 x86_64
>> Intel(R) Xeon(R) CPU E5-2620 v3 @ 2.40GHz GenuineIntel GNU/Linux.
>> Kernel is gentoo-sources, with experimental use-flag. Cpu type set to
>> Haswell. Issue also happened without experimental.
>> # cat /proc/cmdline
>> placeholder root=LABEL=ssdroot ro
>> xen-pciback.hide=(02:00.*)(08:00.*)(00:1b.*)(81:00.*)(82:00.*)(83:00.*)
>> console=hvc0
>> console=vga domodules domdadm dolvm intel_iommu=on earlyprintk=xen
>> usbcore.autosuspend=-1
>>
>> The system is mostly built with stable packages, xen and xen-tools
>> keyworded to ~amd64.
>>
>> I have been experiencing issues with domains with passed through PCIe
>> devices since I first installed xen. Then at version 4.5.x , I'm now
>> at 4.6.0 with gentoo patches. Crashes SEEM mostly related to this pci
>> pass through and interrupts (usb-cards, sound cards).
>>
>> Recently the system has been more stable, whether it is because I pass
>> through as few things as possible, or because of improvements in Xen I
>> do not know. I have also taken to building with debug, which leads to
>> more abrupt but less mysterious failures. Earlier (w/o debug and under
>> xen 4.5 ) stuff would just gradually stop working and end up in total
>> hang of everything. So, hey, things are improving :-b
> This isn't the first time we have seen this on Haswell processors. Do
> you have microcode loading set up?

Not entirely sure to be honest. Is microcode: 0x31 the newest?

I AM running the very latest bios from Asus, but I do not have
confidence in my microcode loading setup, so I have not had one in
place. Trying now.

Downloading microcode.dat from Intel.
Installing iucode_tool, which in its --help states:

  -w, --write-to=file
        Write selected microcodes to a file in binary format. The
        binary format is suitable to be uploaded to the kernel

Ran "iucode_tool microcode.dat -w microcode.bin"

# ls -l micro*
-rwxr-xr-x 1 root root  693248 Jan 15 12:40 microcode.bin
-rwxr-xr-x 1 root root 2081807 Nov  6 04:04 microcode.dat

placed microcode.bin in /boot/microcode.bin

booted with :
---
xen_commandline: ssd-xen-debug-marker console_timestamps=date
loglvl=all guest_loglvl=all sync_console iommu=1,verbose,debug
iommu_inclusive_mapping=1 com1=115200,8n1 console=com1 dom0_max_vcpus=4
dom0_vcpus_pin=1 dom0_mem=8G,max:8G cpufreq=xen:performance,verbose
tmem=1 sched_smt_power_savings=1 apic_verbosity=debug e820-verbose=1
core_parking=power ucode=microcode.bin
---

#cat /proc/cpuinfo | grep micro
says: microcode: 0x31

This is no change from previous boot.
Now: How do I know whether 0x31 is the newest? Grepping the console
output reveals no reference to ucode or microcode other than the Xen
command-line.

---
Håkon
Re: [Xen-devel] [BUG] Assertion '(sp == 0) || (peoi[sp-1].vector < vector)' failed at irq.c:1163
On 01/15/2016 01:42 PM, Jan Beulich wrote:
> On 15.01.16 at 13:32, wrote:
>> placed microcode.bin in /boot/microcode.bin
>>
>> booted with :
>> ---
>> xen_commandline: ssd-xen-debug-marker console_timestamps=date
>> loglvl=all guest_loglvl=all sync_console iommu=1,verbose,debug
>> iommu_inclusive_mapping=1 com1=115200,8n1 console=com1 dom0_max_vcpus=4
>> dom0_vcpus_pin=1 dom0_mem=8G,max:8G cpufreq=xen:performance,verbose
>> tmem=1 sched_smt_power_savings=1 apic_verbosity=debug e820-verbose=1
>> core_parking=power ucode=microcode.bin
>> ---
> This can't work - did you look at the command line documentation?
> You can't specify a file name here - there's no file system driver
> inside the hypervisor, and hence it can't read files (it instead
> has to rely on the boot loader bringing those into memory for it).

Get with the times :-) . Under EFI it most definitely wants a file-name.
Not entirely sure about the file FORMAT though.

From xen-command-line.html:
"Note further that use of this option has an unspecified effect when
used with xen.efi (there the concept of modules doesn't exist, and the
blob gets specified via the ucode= config file/section entry; see EFI
configuration file description)."

From efi.html:
"ucode= Specifies a CPU microcode blob to load. (x86 only)"

>> #cat /proc/cpuinfo | grep micro
>> says: microcode: 0x31
>>
>> This is no change from previous boot.
>> Now: How do I know whether 0x31 is the newest?
> By checking - for the precise model and stepping of your CPU(s) -
> the information in the blob (which admittedly is a little
> cumbersome, but without knowing model and stepping I also can't
> try to help).
>
> Jan
Re: [Xen-devel] [BUG] Assertion '(sp == 0) || (peoi[sp-1].vector < vector)' failed at irq.c:1163
>>> On 15.01.16 at 13:32, wrote:
> placed microcode.bin in /boot/microcode.bin
>
> booted with :
> ---
> xen_commandline: ssd-xen-debug-marker console_timestamps=date
> loglvl=all guest_loglvl=all sync_console iommu=1,verbose,debug
> iommu_inclusive_mapping=1 com1=115200,8n1 console=com1 dom0_max_vcpus=4
> dom0_vcpus_pin=1 dom0_mem=8G,max:8G cpufreq=xen:performance,verbose
> tmem=1 sched_smt_power_savings=1 apic_verbosity=debug e820-verbose=1
> core_parking=power ucode=microcode.bin
> ---

This can't work - did you look at the command line documentation?
You can't specify a file name here - there's no file system driver
inside the hypervisor, and hence it can't read files (it instead
has to rely on the boot loader bringing those into memory for it).

> #cat /proc/cpuinfo | grep micro
> says: microcode: 0x31
>
> This is no change from previous boot.
> Now: How do I know whether 0x31 is the newest?

By checking - for the precise model and stepping of your CPU(s) -
the information in the blob (which admittedly is a little
cumbersome, but without knowing model and stepping I also can't
try to help).

Jan
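As background to Jan's remark: each update in an Intel microcode blob carries a processor signature in its header, in the same layout as CPUID leaf 1 EAX, so comparing revisions only makes sense for the entry whose signature matches the installed CPU. The sketch below packs the family/model/stepping values shown by /proc/cpuinfo into that layout. The function name is made up for illustration, and it deliberately handles family 6 only, which is an assumption that fits the reported CPU (family 6, model 63); the thread never states the stepping, so any stepping passed in is a placeholder.

```c
#include <assert.h>
#include <stdint.h>

/* Packs family/model/stepping (as shown by /proc/cpuinfo) into the
 * CPUID(1).EAX signature layout. Sketch for family 6 only: other
 * families use the extended-family bits, which are not handled here. */
uint32_t cpuid_signature(unsigned int family, unsigned int model,
                         unsigned int stepping)
{
    assert(family == 6);
    assert(model <= 0xff && stepping <= 0xf);
    return ((uint32_t)(model >> 4) << 16)    /* extended model, bits 19:16 */
         | ((uint32_t)family << 8)           /* family, bits 11:8 */
         | ((uint32_t)(model & 0xf) << 4)    /* model, bits 7:4 */
         | (uint32_t)stepping;               /* stepping, bits 3:0 */
}
```

For family 6, model 63 the result has the form 0x306fN, where N is the stepping reported on the "stepping" line of /proc/cpuinfo.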
[Xen-devel] [BUG] Assertion '(sp == 0) || (peoi[sp-1].vector < vector)' failed at irq.c:1163
This is just a preliminary report, mostly just for the record. I will report again if this keeps happening after 4.7 is out, or upon request. Anyone working on this, please mail me and request more information. I have available logs from dom0 boot (I dump dmesg and xl dmesg to disk after every boot, and log dom0 serial console to disk). I will send boot logs if requested. I will turn on maximum verbosity and provide all output. My serial console is very slow, so I can not keep running at max verbosity all the time. At the end of this mail there is "xl info" and output from dom0 serial console. CPUINFO: vendor_id: GenuineIntel cpu family: 6 model: 63 model name: Intel(R) Xeon(R) CPU E5-2620 v3 @ 2.40GHz # smbios-sys-info Libsmbios version: 2.2.28 Product Name: Z10PE-D8 WS Vendor: ASUSTeK COMPUTER INC. BIOS Version: 3101 Dom0 OS: Linux gentoo 4.1.12-gentoo #1 SMP Sat Jan 2 09:36:31 CET 2016 x86_64 Intel(R) Xeon(R) CPU E5-2620 v3 @ 2.40GHz GenuineIntel GNU/Linux. Kernel is gentoo-sources, with experimental use-flag. Cpu type set to Haswell. Issue also happened without experimental. # cat /proc/cmdline placeholder root=LABEL=ssdroot ro xen-pciback.hide=(02:00.*)(08:00.*)(00:1b.*)(81:00.*)(82:00.*)(83:00.*) console=hvc0 console=vga domodules domdadm dolvm intel_iommu=on earlyprintk=xen usbcore.autosuspend=-1 The system is mostly built with stable packages, xen and xen-tools keyworded to ~amd64. I have been experiencing issues with domains with passed through PCIe devices since I first installed xen. Then at version 4.5.x , I'm now at 4.6.0 with gentoo patches. Crashes SEEM mostly related to this pci pass through and interrupts (usb-cards, sound cards). Recently the system has been more stable, whether it is because I pass through as few things as possible, or because of improvements in Xen I do not know. I have also taken to building with debug, which leads to more abrupt but less mysterious failures. 
Earlier (w/o debug and under xen 4.5 ) stuff would just gradually stop working and end up in total hang of everything. So, hey, things are improving :-b ---xl info: host : gentoo release: 4.1.12-gentoo version: #1 SMP Sat Jan 2 09:36:31 CET 2016 machine: x86_64 nr_cpus: 24 max_cpu_id : 23 nr_nodes : 2 cores_per_socket : 6 threads_per_core : 2 cpu_mhz: 2394 hw_caps: bfebfbff:2c100800::7f00:77fefbff::0021:37ab virt_caps : hvm hvm_directio total_memory : 65379 free_memory: 20123 sharing_freed_memory : 0 sharing_used_memory: 0 outstanding_claims : 0 free_cpus : 0 xen_major : 4 xen_minor : 6 xen_extra : .0 xen_version: 4.6.0 xen_caps : xen-3.0-x86_64 xen-3.0-x86_32p hvm-3.0-x86_32 hvm-3.0-x86_32p hvm-3.0-x86_64 xen_scheduler : credit xen_pagesize : 4096 platform_params: virt_start=0x8000 xen_changeset : xen_commandline: ssd-xen-marker console_timestamps=date loglvl=warn/warn guest_loglvl=warn/warn iommu=1 iommu_inclusive_mapping=1 com1=115200,8n1 console=com1 dom0_max_vcpus=4 dom0_vcpus_pin=1 dom0_mem=8G,max:8G cpufreq=xen:performance,verbose tmem=1 sched_smt_power_savings=1 e820-verbose=1 core_parking=power cc_compiler: x86_64-pc-linux-gnu-gcc (Gentoo 4.9.3 p1.4, pie-0.6.4) 4.9.3 cc_compile_by : cc_compile_domain : alstadheim.priv.no cc_compile_date: Tue Jan 12 22:19:05 CET 2016 xend_config_format : 4 - serial console output from time dom0 finished booting until crash: (I have more of these) (XEN) [2016-01-13 08:46:25] tmem: flushing tmem pools for domid=4 (XEN) [2016-01-13 08:54:27] tmem: flushing tmem pools for domid=5 (XEN) [2016-01-13 09:01:53] tmem: flushing tmem pools for domid=6 (XEN) [2016-01-13 09:04:41] tmem: flushing tmem pools for domid=7 (XEN) [2016-01-13 09:19:46] tmem: flushing tmem pools for domid=8 (XEN) [2016-01-13 09:22:42] tmem: flushing tmem pools for domid=9 (XEN) [2016-01-13 09:40:37] tmem: flushing tmem pools for domid=10 (XEN) [2016-01-13 20:59:46] [VT-D] It's risky to assign :81:00.0 with shared RMRR at 7db92000 for Dom12. 
(XEN) [2016-01-13 22:21:06] tmem: flushing tmem pools for domid=3
(XEN) [2016-01-14 19:31:12] tmem: flushing tmem pools for domid=11
(XEN) [2016-01-15 07:02:56] Assertion '(sp == 0) || (peoi[sp-1].vector < vector)' failed at irq.c:1163
(XEN) [2016-01-15 07:02:56] [ Xen-4.6.0 x86_64 debug=y Not tainted ]
(XEN) [2016-01-15 07:02:56] CPU:19
(XEN) [2016-01-15 07:02:56] RIP:e008:[] do_IRQ+0x3ca/0x63b
(XEN) [2016-01-15 07:02:56] RFLAGS: 00010046 CONTEXT: hypervisor
(XEN) [2016-01-15 07:02:56]
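For readers following the assertion text in the log above: the condition says a new entry may only be pushed onto the pending-EOI stack if the stack is empty or the new vector is strictly higher than the one on top. Below is a minimal standalone sketch of that check, not Xen's actual code. The names (`struct peoi_model`, `push_allowed`, `dump_model`, `push`), the depth `MODEL_DEPTH` and any sample vectors are assumptions made up for illustration; only the condition itself is taken from the assertion message.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

#define MODEL_DEPTH 8  /* hypothetical stack depth for this sketch */

/* Hypothetical stand-in for one pending-EOI entry. */
struct peoi_entry {
    unsigned int irq;
    unsigned int vector;
};

/* Hypothetical model of the stack; sp counts the entries in use. */
struct peoi_model {
    struct peoi_entry e[MODEL_DEPTH];
    unsigned int sp;
};

/*
 * Mirrors the condition from the assertion message:
 *   (sp == 0) || (peoi[sp-1].vector < vector)
 * True when pushing `vector` keeps the stack strictly increasing.
 */
bool push_allowed(const struct peoi_model *m, unsigned int vector)
{
    return (m->sp == 0) || (m->e[m->sp - 1].vector < vector);
}

/* Print the stack from top to bottom. */
void dump_model(const struct peoi_model *m)
{
    for (unsigned int p = m->sp; p > 0; --p)
        printf("peoi[%u] = {irq %u, vec %#x}\n",
               p - 1, m->e[p - 1].irq, m->e[p - 1].vector);
}

/* Push an entry; dump the stack first if the invariant would break. */
void push(struct peoi_model *m, unsigned int irq, unsigned int vector)
{
    if (!push_allowed(m, vector)) {
        printf("sp %u, irq %u, vec %#x\n", m->sp, irq, vector);
        dump_model(m);
    }
    assert(push_allowed(m, vector));
    assert(m->sp < MODEL_DEPTH);
    m->e[m->sp].irq = irq;
    m->e[m->sp].vector = vector;
    m->sp++;
}
```

In this model, calling `push` twice in a row with the same vector trips the first assertion. That is the shape of the failure reported here: an interrupt arrives whose vector is not higher than the one already on top of the stack.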
Re: [Xen-devel] [BUG] Assertion '(sp == 0) || (peoi[sp-1].vector < vector)' failed at irq.c:1163
On 01/15/2016 02:09 PM, Ian Campbell wrote:
> On Fri, 2016-01-15 at 13:49 +0100, Håkon Alstadheim wrote:
>> On 01/15/2016 01:42 PM, Jan Beulich wrote:
>>> On 15.01.16 at 13:32, wrote:
>>>> placed microcode.bin in /boot/microcode.bin
>>>> booted with :
>>>> ---
>>>> xen_commandline: ssd-xen-debug-marker console_timestamps=date loglvl=all guest_loglvl=all sync_console iommu=1,verbose,debug iommu_inclusive_mapping=1 com1=115200,8n1 console=com1 dom0_max_vcpus=4 dom0_vcpus_pin=1 dom0_mem=8G,max:8G cpufreq=xen:performance,verbose tmem=1 sched_smt_power_savings=1 apic_verbosity=debug e820-verbose=1 core_parking=power ucode=microcode.bin
>>>> ---
>>> This can't work - did you look at the command line documentation?
>>> You can't specify a file name here - there's no file system driver
>>> inside the hypervisor, and hence it can't read files (it instead has
>>> to rely on the boot loader bringing those into memory for it).
>> Get with the times :-) . Under EFI it most definitely wants a
>> file-name. Not entirely sure about the file FORMAT though.
>>
>> From xen-command-line.html
>> "Note further that use of this option has an unspecified effect when
>> used with xen.efi (there the concept of modules doesn't exist, and the
>> blob gets specified via the ucode= config file/section entry; see EFI
>> configuration file description).
>>
>> From efi.html
>> "ucode= Specifies a CPU microcode blob to load. (x86 only)
> This needs to go in your xen.cfg file (alongside kernel= ramdisk= etc),
> not on the xen command line.
>
> Ian.

Ahh (face + palm). It dawned on me right after I sent my previous.

Now I DO get some acknowledgement of microcode.bin in the console-log, but /proc/cpuinfo still reports microcode: 0x31, so it seems stale microcode is not the issue :-/
Re: [Xen-devel] [BUG] Assertion '(sp == 0) || (peoi[sp-1].vector < vector)' failed at irq.c:1163
On 01/15/2016 01:49 PM, Håkon Alstadheim wrote:
> On 01/15/2016 01:42 PM, Jan Beulich wrote:
>> On 15.01.16 at 13:32, wrote:
>>> placed microcode.bin in /boot/microcode.bin
>>> booted with :
>>> ---
>>> xen_commandline: ssd-xen-debug-marker console_timestamps=date loglvl=all guest_loglvl=all sync_console iommu=1,verbose,debug iommu_inclusive_mapping=1 com1=115200,8n1 console=com1 dom0_max_vcpus=4 dom0_vcpus_pin=1 dom0_mem=8G,max:8G cpufreq=xen:performance,verbose tmem=1 sched_smt_power_savings=1 apic_verbosity=debug e820-verbose=1 core_parking=power ucode=microcode.bin
>>> ---
>> This can't work - did you look at the command line documentation?
>> You can't specify a file name here - there's no file system driver
>> inside the hypervisor, and hence it can't read files (it instead has
>> to rely on the boot loader bringing those into memory for it).
> Get with the times :-) . Under EFI it most definitely wants a
> file-name. Not entirely sure about the file FORMAT though.
>
> From xen-command-line.html
> "Note further that use of this option has an unspecified effect when
> used with xen.efi (there the concept of modules doesn't exist, and the
> blob gets specified via the ucode= config file/section entry; see EFI
> configuration file description).
>
> From efi.html
> "ucode= Specifies a CPU microcode blob to load. (x86 only)
>>> #cat /proc/cpuinfo | grep micro
>>> says:
>>> microcode: 0x31
>>> This is no change from previous boot. Now: How do I know whether 0x31
>>> is the newest?
>> By checking - for the precise model and stepping of your CPU(s) - the
>> information in the blob (which admittedly is a little cumbersome, but
>> without knowing model and stepping I also can't try to help).
>>
>> Jan

My fingers are running faster than my head here. I managed to generate a blob that Xen accepts with the command "iucode_tool microcode.dat -S -w microcode.bin" (I missed the -S before), and put ucode=microcode.bin on a line by itself in the config. Now the file actually loads, and there is indeed an update, to 0x36 in my case.
If the error at irq.c:1163 keeps happening, I'll be sure to report again. :-~ Humbly, thanks Håkon. Sorry for all the noise.
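The before-and-after check described in this message (0x31 before the blob loaded, 0x36 after) can also be done programmatically instead of by eye. The following is a rough sketch, assuming the "microcode : 0x.." line format that Linux prints in /proc/cpuinfo on x86. The helper names `parse_microcode_line` and `all_cpus_at_least` are made up for this example and do not come from Xen or iucode_tool.

```c
#include <stdbool.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/*
 * Parse one /proc/cpuinfo line of the form "microcode\t: 0x31".
 * On success stores the revision in *rev and returns true;
 * returns false for any other line.
 */
bool parse_microcode_line(const char *line, unsigned long *rev)
{
    static const char key[] = "microcode";
    const char *colon;
    char *end;
    unsigned long value;

    if (strncmp(line, key, sizeof(key) - 1) != 0)
        return false;

    colon = strchr(line, ':');
    if (colon == NULL)
        return false;

    /* Base 16 accepts an optional "0x" prefix and skips leading blanks. */
    value = strtoul(colon + 1, &end, 16);
    if (end == colon + 1)
        return false;   /* nothing could be converted after the colon */

    *rev = value;
    return true;
}

/*
 * Returns true if the stream contains at least one "microcode" line and
 * every such line reports a revision of at least `want`.
 */
bool all_cpus_at_least(FILE *f, unsigned long want)
{
    char line[4096];
    unsigned long rev;
    bool seen = false;

    while (fgets(line, sizeof(line), f) != NULL) {
        if (!parse_microcode_line(line, &rev))
            continue;
        seen = true;
        if (rev < want)
            return false;
    }
    return seen;
}
```

To use it, open /proc/cpuinfo in dom0 with `fopen` and pass the stream to `all_cpus_at_least` together with the revision you expect after the update (0x36 in the case above). Which revision is actually the newest for a given model and stepping still has to be read from the blob itself, as Jan points out.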