>>> On 08.02.18 at 13:18, wrote:
> We switch the NMI frequency to ~2Hz after the calibration, but that is
> after having run the BSP at 100Hz for a long period of time, and the APs
> at this rate for a short while. Irrespective of the exact fix here, it
> is simply not a good idea to be running w
On Thu, 8 Feb 2018 15:00:33 +
Andrew Cooper wrote:
>On 08/02/18 14:37, Alexey G wrote:
>> On Thu, 8 Feb 2018 12:40:41 +
>> Andrew Cooper wrote:
>>> - Perf/Oprofile. This is currently mutually exclusive with Xen
>>> using the watchdog, but needn't be and hopefully won't be in the
>>> f
On 08/02/18 14:37, Alexey G wrote:
> On Thu, 8 Feb 2018 12:40:41 +
> Andrew Cooper wrote:
>> - Perf/Oprofile. This is currently mutually exclusive with Xen using
>> the watchdog, but needn't be and hopefully won't be in the future.
>>
>>> Most of the time we deal with watchdog NMIs, while all
On Thu, 8 Feb 2018 12:40:41 +
Andrew Cooper wrote:
>- Perf/Oprofile. This is currently mutually exclusive with Xen using
>the watchdog, but needn't be and hopefully won't be in the future.
>
>>
>> Most of the time we deal with watchdog NMIs, while all others should
>> be somewhat rare. The th
On 08/02/18 12:32, Alexey G wrote:
> On Thu, 8 Feb 2018 10:47:45 +
> Igor Druzhinin wrote:
>> I've done this measurement before. So what we are seeing exactly is
>> that the time we are spending in SMI is spiking (sometimes up to
>> 200ms) at the moment we go through INIT-SIPI-SIPI sequence. L
On Thu, 8 Feb 2018 10:47:45 +
Igor Druzhinin wrote:
>I've done this measurement before. So what we are seeing exactly is
>that the time we are spending in SMI is spiking (sometimes up to
>200ms) at the moment we go through INIT-SIPI-SIPI sequence. Looks like
>this is enough to push the system
On 08/02/18 09:12, Jan Beulich wrote:
On 07.02.18 at 18:08, wrote:
>> On 07/02/18 15:06, Jan Beulich wrote:
>>> Also you completely ignore my argument against the seemingly
>>> random division by 10, including the resulting question of what you
>>> mean to do once 10Hz also turns out too high
On 08/02/18 06:37, Alexey G wrote:
> On Wed, 7 Feb 2018 13:01:08 +
> Igor Druzhinin wrote:
>> So far the issue confirmed:
>> Dell PowerEdge R740, Huawei systems based on Xeon Gold 6152 (the one
>> that it was tested on), Intel S2600XX, etc.
>>
>> Also see:
>> https://bugs.xenserver.org/browse/
>>> On 07.02.18 at 18:08, wrote:
> On 07/02/18 15:06, Jan Beulich wrote:
>> Also you completely ignore my argument against the seemingly
>> random division by 10, including the resulting question of what you
>> mean to do once 10Hz also turns out too high a frequency.
>
> We've got to pick a freq
On Wed, 7 Feb 2018 13:01:08 +
Igor Druzhinin wrote:
>So far the issue confirmed:
>Dell PowerEdge R740, Huawei systems based on Xeon Gold 6152 (the one
>that it was tested on), Intel S2600XX, etc.
>
>Also see:
>https://bugs.xenserver.org/browse/XSO-774
>
>Well, no-watchdog is what we currently
On 07/02/18 15:06, Jan Beulich wrote:
On 07.02.18 at 14:24, wrote:
>> On 07/02/18 13:08, Jan Beulich wrote:
>> On 07.02.18 at 14:01, wrote:
So far the issue confirmed:
Dell PowerEdge R740, Huawei systems based on Xeon Gold 6152 (the one
that it was tested on), Intel S2600X
>>> On 07.02.18 at 14:24, wrote:
> On 07/02/18 13:08, Jan Beulich wrote:
> On 07.02.18 at 14:01, wrote:
>>> So far the issue confirmed:
>>> Dell PowerEdge R740, Huawei systems based on Xeon Gold 6152 (the one
>>> that it was tested on), Intel S2600XX, etc.
>>>
>>> Also see:
>>> https://bugs.x
On 07/02/18 13:08, Jan Beulich wrote:
On 07.02.18 at 14:01, wrote:
>> So far the issue confirmed:
>> Dell PowerEdge R740, Huawei systems based on Xeon Gold 6152 (the one
>> that it was tested on), Intel S2600XX, etc.
>>
>> Also see:
>> https://bugs.xenserver.org/browse/XSO-774
>>
>> Well, no
>>> On 07.02.18 at 14:01, wrote:
> So far the issue confirmed:
> Dell PowerEdge R740, Huawei systems based on Xeon Gold 6152 (the one
> that it was tested on), Intel S2600XX, etc.
>
> Also see:
> https://bugs.xenserver.org/browse/XSO-774
>
> Well, no-watchdog is what we currently recommend in t
On 07/02/18 09:13, Jan Beulich wrote:
On 06.02.18 at 22:51, wrote:
>> The problem with a quirk/commandline parameter is that the issue is
>> reported for a wide variety of systems and, as it looks like, depends on
>> the default BIOS setup - means it's hard to identify particular
>> machines.
>>> On 06.02.18 at 22:51, wrote:
> The problem with a quirk/commandline parameter is that the issue is
> reported for a wide variety of systems and, as it looks like, depends on
> the default BIOS setup - means it's hard to identify particular
> machines. We should obviously sort this out with Int
If the actual SMI source is not related to some place in the NMI
handler code but was eg. due to some SMI timer, lowering NMI
watchdog frequency might not fix the issue completely, but lower
its reproducibility (perhaps to some very rare occurrences). So
it's better be sur
On 06/02/18 16:23, Jan Beulich wrote:
On 06.02.18 at 17:14, wrote:
>> On 06/02/18 16:07, Jan Beulich wrote:
>> On 05.02.18 at 22:18, wrote:
--- a/xen/arch/x86/nmi.c
+++ b/xen/arch/x86/nmi.c
@@ -34,7 +34,8 @@
#include
unsigned int nmi_watchdog = NMI_NONE
On 06/02/18 18:17, Alexey G wrote:
> On Tue, 6 Feb 2018 17:21:19 +
> Igor Druzhinin wrote:
>> On 06/02/18 17:08, Alexey G wrote:
>>> The major concern here is the possibility of SMI being triggered _not_
>>> by some specific I/O port access. Primarily, if it actually was a
>>> periodic SMI.
>>>
On Tue, 6 Feb 2018 17:21:19 +
Igor Druzhinin wrote:
>On 06/02/18 17:08, Alexey G wrote:
>> The major concern here is the possibility of SMI being triggered _not_
>> by some specific I/O port access. Primarily, if it actually was a
>> periodic SMI.
>>
>> If the actual SMI source is not related
On 06/02/18 17:08, Alexey G wrote:
> On Tue, 6 Feb 2018 14:21:12 +
> Andrew Cooper wrote:
>
>> On 06/02/18 03:10, Alexey G wrote:
>>> I/O port 61h normally is not emulated by SMI legacy kbd handling code
>>> in BIOS, only ports like 60h, 64h, etc.
>>> Contrary to USB legacy emulation, it has
On Tue, 6 Feb 2018 14:21:12 +
Andrew Cooper wrote:
>On 06/02/18 03:10, Alexey G wrote:
>> I/O port 61h normally is not emulated by SMI legacy kbd handling code
>> in BIOS, only ports like 60h, 64h, etc.
>> Contrary to USB legacy emulation, it has to intercept port 61h via a
>> different appro
>>> On 06.02.18 at 17:14, wrote:
> On 06/02/18 16:07, Jan Beulich wrote:
> On 05.02.18 at 22:18, wrote:
>>> --- a/xen/arch/x86/nmi.c
>>> +++ b/xen/arch/x86/nmi.c
>>> @@ -34,7 +34,8 @@
>>> #include
>>>
>>> unsigned int nmi_watchdog = NMI_NONE;
>>> -static unsigned int nmi_hz = HZ;
>>> +/*
On 06/02/18 16:07, Jan Beulich wrote:
On 05.02.18 at 22:18, wrote:
>> --- a/xen/arch/x86/nmi.c
>> +++ b/xen/arch/x86/nmi.c
>> @@ -34,7 +34,8 @@
>> #include
>>
>> unsigned int nmi_watchdog = NMI_NONE;
>> -static unsigned int nmi_hz = HZ;
>> +/* initial watchdog frequency - shouldn't be to
>>> On 05.02.18 at 22:18, wrote:
> --- a/xen/arch/x86/nmi.c
> +++ b/xen/arch/x86/nmi.c
> @@ -34,7 +34,8 @@
> #include
>
> unsigned int nmi_watchdog = NMI_NONE;
> -static unsigned int nmi_hz = HZ;
> +/* initial watchdog frequency - shouldn't be too high to avoid boot hangs */
> +static unsigne
On 06/02/18 03:10, Alexey G wrote:
> On Mon, 5 Feb 2018 21:18:42 +
> Igor Druzhinin wrote:
>
>> We're noticing a reproducible system boot hang on certain
>> post-Skylake platforms where the BIOS is configured in
>> legacy boot mode with x2APIC disabled. The system stalls
>> immediately after w
On 05/02/18 21:18, Igor Druzhinin wrote:
> We're noticing a reproducible system boot hang on certain
> post-Skylake platforms where the BIOS is configured in
It's just a plain Skylake Server, from what I can see.
> legacy boot mode with x2APIC disabled. The system stalls
> immediately after writin
On Mon, 5 Feb 2018 21:18:42 +
Igor Druzhinin wrote:
>We're noticing a reproducible system boot hang on certain
>post-Skylake platforms where the BIOS is configured in
>legacy boot mode with x2APIC disabled. The system stalls
>immediately after writing the first SMP initialization
>sequence in
We're noticing a reproducible system boot hang on certain
post-Skylake platforms where the BIOS is configured in
legacy boot mode with x2APIC disabled. The system stalls
immediately after writing the first SMP initialization
sequence into APIC ICR.
The cause of the problem is watchdog NMI handler