Re: [Xen-devel] [PATCH] x86/nmi: lower initial watchdog frequency to avoid boot hangs

2018-02-13 Thread Jan Beulich
>>> On 08.02.18 at 13:18, wrote: > We switch the NMI frequency to ~2Hz after the calibration, but that is > after having run the BSP at 100Hz for a long period of time, and the APs > at this rate for a short while. Irrespective of the exact fix here, it > is simply not a good idea to be running w

Re: [Xen-devel] [PATCH] x86/nmi: lower initial watchdog frequency to avoid boot hangs

2018-02-08 Thread Alexey G
On Thu, 8 Feb 2018 15:00:33 + Andrew Cooper wrote: >On 08/02/18 14:37, Alexey G wrote: >> On Thu, 8 Feb 2018 12:40:41 + >> Andrew Cooper wrote: >>> - Perf/Oprofile.  This is currently mutually exclusive with Xen >>> using the watchdog, but needn't be and hopefully won't be in the >>> f

Re: [Xen-devel] [PATCH] x86/nmi: lower initial watchdog frequency to avoid boot hangs

2018-02-08 Thread Andrew Cooper
On 08/02/18 14:37, Alexey G wrote: > On Thu, 8 Feb 2018 12:40:41 + > Andrew Cooper wrote: >> - Perf/Oprofile.  This is currently mutually exclusive with Xen using >> the watchdog, but needn't be and hopefully won't be in the future. >> >>> Most of the time we deal with watchdog NMIs, while all

Re: [Xen-devel] [PATCH] x86/nmi: lower initial watchdog frequency to avoid boot hangs

2018-02-08 Thread Alexey G
On Thu, 8 Feb 2018 12:40:41 + Andrew Cooper wrote: >- Perf/Oprofile.  This is currently mutually exclusive with Xen using >the watchdog, but needn't be and hopefully won't be in the future. > >> >> Most of the time we deal with watchdog NMIs, while all others should >> be somewhat rare. The th

Re: [Xen-devel] [PATCH] x86/nmi: lower initial watchdog frequency to avoid boot hangs

2018-02-08 Thread Andrew Cooper
On 08/02/18 12:32, Alexey G wrote: > On Thu, 8 Feb 2018 10:47:45 + > Igor Druzhinin wrote: >> I've done this measurement before. So what we are seeing exactly is >> that the time we are spending in SMI is spiking (sometimes up to >> 200ms) at the moment we go through INIT-SIPI-SIPI sequence. L

Re: [Xen-devel] [PATCH] x86/nmi: lower initial watchdog frequency to avoid boot hangs

2018-02-08 Thread Alexey G
On Thu, 8 Feb 2018 10:47:45 + Igor Druzhinin wrote: >I've done this measurement before. So what we are seeing exactly is >that the time we are spending in SMI is spiking (sometimes up to >200ms) at the moment we go through INIT-SIPI-SIPI sequence. Looks like >this is enough to push the system

Re: [Xen-devel] [PATCH] x86/nmi: lower initial watchdog frequency to avoid boot hangs

2018-02-08 Thread Andrew Cooper
On 08/02/18 09:12, Jan Beulich wrote: On 07.02.18 at 18:08, wrote: >> On 07/02/18 15:06, Jan Beulich wrote: >>> Also you completely ignore my argument against the seemingly >>> random division by 10, including the resulting question of what you >>> mean to do once 10Hz also turns out too high

Re: [Xen-devel] [PATCH] x86/nmi: lower initial watchdog frequency to avoid boot hangs

2018-02-08 Thread Igor Druzhinin
On 08/02/18 06:37, Alexey G wrote: > On Wed, 7 Feb 2018 13:01:08 + > Igor Druzhinin wrote: >> So far the issue confirmed: >> Dell PowerEdge R740, Huawei systems based on Xeon Gold 6152 (the one >> that it was tested on), Intel S2600XX, etc. >> >> Also see: >> https://bugs.xenserver.org/browse/

Re: [Xen-devel] [PATCH] x86/nmi: lower initial watchdog frequency to avoid boot hangs

2018-02-08 Thread Jan Beulich
>>> On 07.02.18 at 18:08, wrote: > On 07/02/18 15:06, Jan Beulich wrote: >> Also you completely ignore my argument against the seemingly >> random division by 10, including the resulting question of what you >> mean to do once 10Hz also turns out too high a frequency. > > We've got to pick a freq

Re: [Xen-devel] [PATCH] x86/nmi: lower initial watchdog frequency to avoid boot hangs

2018-02-07 Thread Alexey G
On Wed, 7 Feb 2018 13:01:08 + Igor Druzhinin wrote: >So far the issue confirmed: >Dell PowerEdge R740, Huawei systems based on Xeon Gold 6152 (the one >that it was tested on), Intel S2600XX, etc. > >Also see: >https://bugs.xenserver.org/browse/XSO-774 > >Well, no-watchdog is what we currently

Re: [Xen-devel] [PATCH] x86/nmi: lower initial watchdog frequency to avoid boot hangs

2018-02-07 Thread Andrew Cooper
On 07/02/18 15:06, Jan Beulich wrote: On 07.02.18 at 14:24, wrote: >> On 07/02/18 13:08, Jan Beulich wrote: >> On 07.02.18 at 14:01, wrote: So far the issue confirmed: Dell PowerEdge R740, Huawei systems based on Xeon Gold 6152 (the one that it was tested on), Intel S2600X

Re: [Xen-devel] [PATCH] x86/nmi: lower initial watchdog frequency to avoid boot hangs

2018-02-07 Thread Jan Beulich
>>> On 07.02.18 at 14:24, wrote: > On 07/02/18 13:08, Jan Beulich wrote: > On 07.02.18 at 14:01, wrote: >>> So far the issue confirmed: >>> Dell PowerEdge R740, Huawei systems based on Xeon Gold 6152 (the one >>> that it was tested on), Intel S2600XX, etc. >>> >>> Also see: >>> https://bugs.x

Re: [Xen-devel] [PATCH] x86/nmi: lower initial watchdog frequency to avoid boot hangs

2018-02-07 Thread Igor Druzhinin
On 07/02/18 13:08, Jan Beulich wrote: On 07.02.18 at 14:01, wrote: >> So far the issue confirmed: >> Dell PowerEdge R740, Huawei systems based on Xeon Gold 6152 (the one >> that it was tested on), Intel S2600XX, etc. >> >> Also see: >> https://bugs.xenserver.org/browse/XSO-774 >> >> Well, no

Re: [Xen-devel] [PATCH] x86/nmi: lower initial watchdog frequency to avoid boot hangs

2018-02-07 Thread Andrew Cooper
On 07/02/18 13:08, Jan Beulich wrote: On 07.02.18 at 14:01, wrote: >> So far the issue confirmed: >> Dell PowerEdge R740, Huawei systems based on Xeon Gold 6152 (the one >> that it was tested on), Intel S2600XX, etc. >> >> Also see: >> https://bugs.xenserver.org/browse/XSO-774 >> >> Well, no

Re: [Xen-devel] [PATCH] x86/nmi: lower initial watchdog frequency to avoid boot hangs

2018-02-07 Thread Jan Beulich
>>> On 07.02.18 at 14:01, wrote: > So far the issue confirmed: > Dell PowerEdge R740, Huawei systems based on Xeon Gold 6152 (the one > that it was tested on), Intel S2600XX, etc. > > Also see: > https://bugs.xenserver.org/browse/XSO-774 > > Well, no-watchdog is what we currently recommend in t

Re: [Xen-devel] [PATCH] x86/nmi: lower initial watchdog frequency to avoid boot hangs

2018-02-07 Thread Igor Druzhinin
On 07/02/18 09:13, Jan Beulich wrote: On 06.02.18 at 22:51, wrote: >> The problem with a quirk/commandline parameter is that the issue is >> reported for a wide variety of systems and, as it looks like, depends on >> the default BIOS setup - means it's hard to identify particular >> machines.

Re: [Xen-devel] [PATCH] x86/nmi: lower initial watchdog frequency to avoid boot hangs

2018-02-07 Thread Jan Beulich
>>> On 06.02.18 at 22:51, wrote: > The problem with a quirk/commandline parameter is that the issue is > reported for a wide variety of systems and, as it looks like, depends on > the default BIOS setup - means it's hard to identify particular > machines. We should obviously sort this out with Int

Re: [Xen-devel] [PATCH] x86/nmi: lower initial watchdog frequency to avoid boot hangs

2018-02-06 Thread Alexey G
If the actual SMI source is not related to some place in the NMI handler code but was e.g. due to some SMI timer, lowering NMI watchdog frequency might not fix the issue completely, but lower its reproducibility (perhaps to some very rare occurrences). So it's better be sur

Re: [Xen-devel] [PATCH] x86/nmi: lower initial watchdog frequency to avoid boot hangs

2018-02-06 Thread Igor Druzhinin
On 06/02/18 16:23, Jan Beulich wrote: On 06.02.18 at 17:14, wrote: >> On 06/02/18 16:07, Jan Beulich wrote: >> On 05.02.18 at 22:18, wrote: --- a/xen/arch/x86/nmi.c +++ b/xen/arch/x86/nmi.c @@ -34,7 +34,8 @@ #include unsigned int nmi_watchdog = NMI_NONE

Re: [Xen-devel] [PATCH] x86/nmi: lower initial watchdog frequency to avoid boot hangs

2018-02-06 Thread Igor Druzhinin
On 06/02/18 18:17, Alexey G wrote: > On Tue, 6 Feb 2018 17:21:19 + > Igor Druzhinin wrote: >> On 06/02/18 17:08, Alexey G wrote: >>> The major concern here is the possibility of SMI being triggered _not_ >>> by some specific I/O port access. Primarily, if it actually was a >>> periodic SMI. >>>

Re: [Xen-devel] [PATCH] x86/nmi: lower initial watchdog frequency to avoid boot hangs

2018-02-06 Thread Alexey G
On Tue, 6 Feb 2018 17:21:19 + Igor Druzhinin wrote: >On 06/02/18 17:08, Alexey G wrote: >> The major concern here is the possibility of SMI being triggered _not_ >> by some specific I/O port access. Primarily, if it actually was a >> periodic SMI. >> >> If the actual SMI source is not related

Re: [Xen-devel] [PATCH] x86/nmi: lower initial watchdog frequency to avoid boot hangs

2018-02-06 Thread Igor Druzhinin
On 06/02/18 17:08, Alexey G wrote: > On Tue, 6 Feb 2018 14:21:12 + > Andrew Cooper wrote: > >> On 06/02/18 03:10, Alexey G wrote: >>> I/O port 61h normally is not emulated by SMI legacy kbd handling code >>> in BIOS, only ports like 60h, 64h, etc. >>> Contrary to USB legacy emulation, it has

Re: [Xen-devel] [PATCH] x86/nmi: lower initial watchdog frequency to avoid boot hangs

2018-02-06 Thread Alexey G
On Tue, 6 Feb 2018 14:21:12 + Andrew Cooper wrote: >On 06/02/18 03:10, Alexey G wrote: >> I/O port 61h normally is not emulated by SMI legacy kbd handling code >> in BIOS, only ports like 60h, 64h, etc. >> Contrary to USB legacy emulation, it has to intercept port 61h via a >> different appro

Re: [Xen-devel] [PATCH] x86/nmi: lower initial watchdog frequency to avoid boot hangs

2018-02-06 Thread Igor Druzhinin
On 06/02/18 16:23, Jan Beulich wrote: On 06.02.18 at 17:14, wrote: >> On 06/02/18 16:07, Jan Beulich wrote: >> On 05.02.18 at 22:18, wrote: --- a/xen/arch/x86/nmi.c +++ b/xen/arch/x86/nmi.c @@ -34,7 +34,8 @@ #include unsigned int nmi_watchdog = NMI_NONE

Re: [Xen-devel] [PATCH] x86/nmi: lower initial watchdog frequency to avoid boot hangs

2018-02-06 Thread Igor Druzhinin
On 06/02/18 16:23, Jan Beulich wrote: On 06.02.18 at 17:14, wrote: >> On 06/02/18 16:07, Jan Beulich wrote: >> On 05.02.18 at 22:18, wrote: --- a/xen/arch/x86/nmi.c +++ b/xen/arch/x86/nmi.c @@ -34,7 +34,8 @@ #include unsigned int nmi_watchdog = NMI_NONE

Re: [Xen-devel] [PATCH] x86/nmi: lower initial watchdog frequency to avoid boot hangs

2018-02-06 Thread Jan Beulich
>>> On 06.02.18 at 17:14, wrote: > On 06/02/18 16:07, Jan Beulich wrote: > On 05.02.18 at 22:18, wrote: >>> --- a/xen/arch/x86/nmi.c >>> +++ b/xen/arch/x86/nmi.c >>> @@ -34,7 +34,8 @@ >>> #include >>> >>> unsigned int nmi_watchdog = NMI_NONE; >>> -static unsigned int nmi_hz = HZ; >>> +/*

Re: [Xen-devel] [PATCH] x86/nmi: lower initial watchdog frequency to avoid boot hangs

2018-02-06 Thread Igor Druzhinin
On 06/02/18 16:07, Jan Beulich wrote: On 05.02.18 at 22:18, wrote: >> --- a/xen/arch/x86/nmi.c >> +++ b/xen/arch/x86/nmi.c >> @@ -34,7 +34,8 @@ >> #include >> >> unsigned int nmi_watchdog = NMI_NONE; >> -static unsigned int nmi_hz = HZ; >> +/* initial watchdog frequency - shouldn't be to

Re: [Xen-devel] [PATCH] x86/nmi: lower initial watchdog frequency to avoid boot hangs

2018-02-06 Thread Jan Beulich
>>> On 05.02.18 at 22:18, wrote: > --- a/xen/arch/x86/nmi.c > +++ b/xen/arch/x86/nmi.c > @@ -34,7 +34,8 @@ > #include > > unsigned int nmi_watchdog = NMI_NONE; > -static unsigned int nmi_hz = HZ; > +/* initial watchdog frequency - shouldn't be too high to avoid boot hangs */ > +static unsigne

Re: [Xen-devel] [PATCH] x86/nmi: lower initial watchdog frequency to avoid boot hangs

2018-02-06 Thread Andrew Cooper
On 06/02/18 03:10, Alexey G wrote: > On Mon, 5 Feb 2018 21:18:42 + > Igor Druzhinin wrote: > >> We're noticing a reproducible system boot hang on certain >> post-Skylake platforms where the BIOS is configured in >> legacy boot mode with x2APIC disabled. The system stalls >> immediately after w

Re: [Xen-devel] [PATCH] x86/nmi: lower initial watchdog frequency to avoid boot hangs

2018-02-06 Thread Andrew Cooper
On 05/02/18 21:18, Igor Druzhinin wrote: > We're noticing a reproducible system boot hang on certain > post-Skylake platforms where the BIOS is configured in It's just a plain Skylake Server, from what I can see. > legacy boot mode with x2APIC disabled. The system stalls > immediately after writin

Re: [Xen-devel] [PATCH] x86/nmi: lower initial watchdog frequency to avoid boot hangs

2018-02-05 Thread Alexey G
On Mon, 5 Feb 2018 21:18:42 + Igor Druzhinin wrote: >We're noticing a reproducible system boot hang on certain >post-Skylake platforms where the BIOS is configured in >legacy boot mode with x2APIC disabled. The system stalls >immediately after writing the first SMP initialization >sequence in

[Xen-devel] [PATCH] x86/nmi: lower initial watchdog frequency to avoid boot hangs

2018-02-05 Thread Igor Druzhinin
We're noticing a reproducible system boot hang on certain post-Skylake platforms where the BIOS is configured in legacy boot mode with x2APIC disabled. The system stalls immediately after writing the first SMP initialization sequence into APIC ICR. The cause of the problem is watchdog NMI handler