RE: [PATCH V2] kernel/watchdog: fix spurious hard lockups

2017-08-17 Liang, Kan
> On Tue, 15 Aug 2017, Liang, Kan wrote: > > This patch which speed up the hrtimer > (https://lkml.org/lkml/2017/6/26/685) > > is decent to fix the spurious hard lockups. > > Tested-by: Kan Liang > > > > Please consider to merge it into both mainline and stable tree. > > Well, it 'fixes' the pr

RE: [PATCH V2] kernel/watchdog: fix spurious hard lockups

2017-08-15 Thomas Gleixner
On Tue, 15 Aug 2017, Liang, Kan wrote: > This patch which speed up the hrtimer (https://lkml.org/lkml/2017/6/26/685) > is decent to fix the spurious hard lockups. > Tested-by: Kan Liang > > Please consider to merge it into both mainline and stable tree. Well, it 'fixes' the problem, but at the s

Re: [PATCH V2] kernel/watchdog: fix spurious hard lockups

2017-08-14 Linus Torvalds
On Mon, Aug 14, 2017 at 6:16 PM, Liang, Kan wrote: > > We have confirmed that the hardlock with "speed up the hrtimer" patch is > actually another issue. Good. However: > Tim has already proposed a patch to fix it. > Here is his patch. https://lkml.org/lkml/2017/8/14/1000 Ugh. I hate that patc

RE: [PATCH V2] kernel/watchdog: fix spurious hard lockups

2017-08-14 Liang, Kan
> On Mon, Jul 17, 2017 at 01:24:23AM +, Liang, Kan wrote: > > Hi Don & Thomas, > > > > Sorry for the late response. We just finished the tests for all proposed > patches. > > > > There are three proposed patches so far. > > Patch 1: The patch as above which speed up the hrtimer. > > Patch 2:
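Patch 1 in this list makes the soft watchdog hrtimer fire more often. The arithmetic behind that kind of fix can be sketched as follows. The sketch assumes that the hrtimer period is 2 * watchdog_thresh / divisor seconds (our understanding is that mainline used a divisor of 5 at the time). It also assumes that a fully busy core runs at turbo_ratio times its nominal clock, so the cycle-based NMI fires every watchdog_thresh / turbo_ratio seconds. The helper names and the divisor parameter are hypothetical. This is not the patch linked above.

```c
#include <assert.h>

/*
 * Hypothetical helper: smallest integer divisor d such that the hrtimer
 * period 2*T/d stays strictly shorter than the NMI period T/r, where T is
 * watchdog_thresh and r is the worst-case turbo ratio (max / nominal clock).
 *   2*T/d < T/r  <=>  d > 2*r
 */
static int min_hrtimer_divisor(double turbo_ratio)
{
    assert(turbo_ratio >= 1.0);
    return (int)(2.0 * turbo_ratio) + 1;   /* floor(2r) + 1 is the first integer > 2r */
}

/* Soft watchdog hrtimer interrupts per second, per CPU, for a given divisor. */
static double hrtimer_irqs_per_sec(int divisor, double watchdog_thresh_s)
{
    return divisor / (2.0 * watchdog_thresh_s);
}
```

Under these assumptions, a divisor of 5 only covers turbo ratios below 2.5. A part that can run at 3x its nominal clock would need a divisor of 7, which raises the per-CPU hrtimer rate from 0.25 to 0.35 interrupts per second. That extra timer load is the cost of this style of fix.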

RE: [PATCH V2] kernel/watchdog: fix spurious hard lockups

2017-07-17 Thomas Gleixner
On Mon, 17 Jul 2017, Liang, Kan wrote: > > > > > According to our test, only patch 3 works well. > > > > > The other two patches will hang the system eventually. > > > > Hang the system eventually? Does that mean that the system stops working > > and the watchdog does not catch the problem? > > R

RE: [PATCH V2] kernel/watchdog: fix spurious hard lockups

2017-07-17 Liang, Kan
> On Mon, 17 Jul 2017, Liang, Kan wrote: > > > That doesn't make sense. What's the exact test procedure? > > > > I don't know the exact test procedure. The test case is from our customer. > > I only know that the test case makes calls into the x11 libs. > > Sigh. This starts to be silly. You tes

Re: [PATCH V2] kernel/watchdog: fix spurious hard lockups

2017-07-17 Don Zickus
On Mon, Jul 17, 2017 at 01:24:23AM +, Liang, Kan wrote: > Hi Don & Thomas, > > Sorry for the late response. We just finished the tests for all proposed > patches. > > There are three proposed patches so far. > Patch 1: The patch as above which speed up the hrtimer. > Patch 2: Thomas's first

RE: [PATCH V2] kernel/watchdog: fix spurious hard lockups

2017-07-17 Thomas Gleixner
On Mon, 17 Jul 2017, Liang, Kan wrote: > > That doesn't make sense. What's the exact test procedure? > > I don't know the exact test procedure. The test case is from our customer. > I only know that the test case makes calls into the x11 libs. Sigh. This starts to be silly. You test something and

RE: [PATCH V2] kernel/watchdog: fix spurious hard lockups

2017-07-17 Liang, Kan
> > On Mon, 17 Jul 2017, Liang, Kan wrote: > > There are three proposed patches so far. > > Patch 1: The patch as above which speed up the hrtimer. > > Patch 2: Thomas's first proposal. > > https://patchwork.kernel.org/patch/9803033/ > > https://patchwork.kernel.org/patch/9805903/ > > Patch 3: my

RE: [PATCH V2] kernel/watchdog: fix spurious hard lockups

2017-07-17 Thomas Gleixner
On Mon, 17 Jul 2017, Liang, Kan wrote: > There are three proposed patches so far. > Patch 1: The patch as above which speed up the hrtimer. > Patch 2: Thomas's first proposal. > https://patchwork.kernel.org/patch/9803033/ > https://patchwork.kernel.org/patch/9805903/ > Patch 3: my original proposal

RE: [PATCH V2] kernel/watchdog: fix spurious hard lockups

2017-07-16 Liang, Kan
> On Mon, Jun 26, 2017 at 04:19:27PM -0400, Don Zickus wrote: > > On Fri, Jun 23, 2017 at 11:50:25PM +0200, Thomas Gleixner wrote: > > > On Fri, 23 Jun 2017, Don Zickus wrote: > > > > Hmm, all this work for a temp fix. Kan, how much longer until the > > > > real fix of having perf count the righ

Re: [PATCH V2] kernel/watchdog: fix spurious hard lockups

2017-06-29 Andi Kleen
> > Thomas' patch to modulate the frequency seemed reasonable to me. > > It made the NMI watchdog depend on accurate ktime, but that's probably ok. > > Ok, did Kan finish testing this patch (with the small fix on top)? Kan doesn't have the specific hardware to test it. We've been waiting for anot

Re: [PATCH V2] kernel/watchdog: fix spurious hard lockups

2017-06-29 Don Zickus
On Thu, Jun 29, 2017 at 09:12:20AM -0700, Andi Kleen wrote: > On Thu, Jun 29, 2017 at 11:44:06AM -0400, Don Zickus wrote: > > On Wed, Jun 28, 2017 at 01:14:04PM -0700, Andi Kleen wrote: > > > It can be a useful debugging tool for a specific class of bugs: > > > when kernel software is looping fore

Re: [PATCH V2] kernel/watchdog: fix spurious hard lockups

2017-06-29 Andi Kleen
On Thu, Jun 29, 2017 at 11:44:06AM -0400, Don Zickus wrote: > On Wed, Jun 28, 2017 at 01:14:04PM -0700, Andi Kleen wrote: > > It can be a useful debugging tool for a specific class of bugs: > > when kernel software is looping forever. > > > > But if that happens does it really matter how many ite

Re: [PATCH V2] kernel/watchdog: fix spurious hard lockups

2017-06-29 Don Zickus
On Wed, Jun 28, 2017 at 01:14:04PM -0700, Andi Kleen wrote: > It can be a useful debugging tool for a specific class of bugs: > when kernel software is looping forever. > > But if that happens does it really matter how many iterations the > loop does before it is stopped? > > Even the current ti

Re: [PATCH V2] kernel/watchdog: fix spurious hard lockups

2017-06-28 Andi Kleen
On Wed, Jun 28, 2017 at 03:00:08PM -0400, Don Zickus wrote: > On Tue, Jun 27, 2017 at 04:48:22PM -0700, Andi Kleen wrote: > > > I haven't heard back any test result yet. > > > > > > The above patch looks good to me. > > > > This needs performance testing. It may slow down performance or latency

Re: [PATCH V2] kernel/watchdog: fix spurious hard lockups

2017-06-28 Don Zickus
On Tue, Jun 27, 2017 at 04:48:22PM -0700, Andi Kleen wrote: > > I haven't heard back any test result yet. > > > > The above patch looks good to me. > > This needs performance testing. It may slow down performance or latency > sensitive workloads. More motivation to work through the issues with

Re: [PATCH V2] kernel/watchdog: fix spurious hard lockups

2017-06-27 Andi Kleen
> I haven't heard back any test result yet. > > The above patch looks good to me. This needs performance testing. It may slow down performance or latency sensitive workloads. > Which workaround do you prefer, the above one or the one checking timestamp? I prefer the earlier patch, it has far
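"The one checking timestamp" refers to a workaround that filters NMIs arriving too early in wall-clock time. The idea can be sketched as a small filter in front of the usual lockup check. This is a user-space model under stated assumptions, not the posted patch. The struct and function names are hypothetical. The 8 s threshold is also an assumption: it is 4/5 of a 10 s watchdog_thresh, or twice an assumed 4 s hrtimer period. The filter only works if the clock behind now_ns keeps advancing while the CPU is stuck. That is the dependency on accurate ktime that Andi points out elsewhere in the thread.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical per-CPU state for a timestamp filter in the NMI path. */
struct nmi_filter {
    uint64_t last_check_ns;   /* when the lockup check last actually ran */
    uint64_t threshold_ns;    /* minimum wall-clock spacing between checks */
};

/*
 * Returns 1 if this NMI should go on to the normal "has the hrtimer count
 * moved?" check, 0 if it arrived too soon after the previous check (for
 * example because the cycles counter ran fast under Turbo) and is ignored.
 * With threshold_ns >= 2 * hrtimer period, a healthy CPU always takes at
 * least one hrtimer tick between two checks that are allowed through.
 */
static int nmi_check_allowed(struct nmi_filter *f, uint64_t now_ns)
{
    if (now_ns - f->last_check_ns < f->threshold_ns)
        return 0;
    f->last_check_ns = now_ns;
    return 1;
}
```

For example, if NMIs arrive every 3.33 s, only those at least 8 s after the last accepted one reach the lockup check. Each accepted pair is then far enough apart that a working hrtimer must have ticked in between.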

Re: [PATCH V2] kernel/watchdog: fix spurious hard lockups

2017-06-27 Don Zickus
On Tue, Jun 27, 2017 at 08:49:19PM +, Liang, Kan wrote: > > > On Mon, Jun 26, 2017 at 04:19:27PM -0400, Don Zickus wrote: > > > On Fri, Jun 23, 2017 at 11:50:25PM +0200, Thomas Gleixner wrote: > > > > On Fri, 23 Jun 2017, Don Zickus wrote: > > > > > Hmm, all this work for a temp fix. Kan, how

RE: [PATCH V2] kernel/watchdog: fix spurious hard lockups

2017-06-27 Liang, Kan
> On Mon, Jun 26, 2017 at 04:19:27PM -0400, Don Zickus wrote: > > On Fri, Jun 23, 2017 at 11:50:25PM +0200, Thomas Gleixner wrote: > > > On Fri, 23 Jun 2017, Don Zickus wrote: > > > > Hmm, all this work for a temp fix. Kan, how much longer until the > > > > real fix of having perf count the right

Re: [PATCH V2] kernel/watchdog: fix spurious hard lockups

2017-06-27 Don Zickus
On Mon, Jun 26, 2017 at 04:19:27PM -0400, Don Zickus wrote: > On Fri, Jun 23, 2017 at 11:50:25PM +0200, Thomas Gleixner wrote: > > On Fri, 23 Jun 2017, Don Zickus wrote: > > > Hmm, all this work for a temp fix. Kan, how much longer until the real > > > fix > > > of having perf count the right cyc

Re: [PATCH V2] kernel/watchdog: fix spurious hard lockups

2017-06-26 Thomas Gleixner
On Mon, 26 Jun 2017, Don Zickus wrote: > On Fri, Jun 23, 2017 at 11:50:25PM +0200, Thomas Gleixner wrote: > > On Fri, 23 Jun 2017, Don Zickus wrote: > > > Hmm, all this work for a temp fix. Kan, how much longer until the real > > > fix > > > of having perf count the right cycles? > > > > Quite a

Re: [PATCH V2] kernel/watchdog: fix spurious hard lockups

2017-06-26 Don Zickus
On Fri, Jun 23, 2017 at 11:50:25PM +0200, Thomas Gleixner wrote: > On Fri, 23 Jun 2017, Don Zickus wrote: > > Hmm, all this work for a temp fix. Kan, how much longer until the real fix > > of having perf count the right cycles? > > Quite a while. The approach is wilfully breaking the user space A

Re: [PATCH V2] kernel/watchdog: fix spurious hard lockups

2017-06-23 Thomas Gleixner
On Fri, 23 Jun 2017, Don Zickus wrote: > Hmm, all this work for a temp fix. Kan, how much longer until the real fix > of having perf count the right cycles? Quite a while. The approach is wilfully breaking the user space ABI, which is not going to happen. And there is a simpler solution as well,
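Here we assume that "the right cycles" means a counter that advances at the nominal rate regardless of Turbo. Under that assumption, the gap the patch describes can be observed from user space by counting perf's generic cycles and reference-cycles events side by side. The sketch uses the standard perf_event_open(2) interface. Whether PERF_COUNT_HW_REF_CPU_CYCLES is available depends on the CPU, virtualization and perf_event_paranoid, so the code treats failure as "unavailable". This is not the kernel-side change discussed in the thread. The helper names are hypothetical.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <linux/perf_event.h>
#include <stdint.h>
#include <string.h>
#include <sys/ioctl.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Open a disabled, user-space-only hardware counter for this thread. */
static int open_hw_counter(uint64_t config)
{
    struct perf_event_attr attr;

    memset(&attr, 0, sizeof(attr));
    attr.type = PERF_TYPE_HARDWARE;
    attr.size = sizeof(attr);
    attr.config = config;
    attr.disabled = 1;
    attr.exclude_kernel = 1;
    attr.exclude_hv = 1;
    /* pid 0 = calling thread, cpu -1 = any CPU, no group leader, no flags */
    return (int)syscall(SYS_perf_event_open, &attr, 0, -1, -1, 0);
}

/* Core cycles per reference cycle; > 1.0 means the core ran above nominal. */
static double cycles_ratio(uint64_t cycles, uint64_t ref_cycles)
{
    return ref_cycles ? (double)cycles / (double)ref_cycles : 0.0;
}

/* Count both events over a busy loop. Returns 0 and stores the ratio on
 * success, -1 if either counter cannot be opened or read. */
static int measure_turbo_ratio(double *ratio)
{
    uint64_t cycles = 0, ref = 0;
    volatile uint64_t sink = 0;
    int fd_cyc, fd_ref, rc = -1;

    assert(ratio != NULL);
    fd_cyc = open_hw_counter(PERF_COUNT_HW_CPU_CYCLES);
    fd_ref = open_hw_counter(PERF_COUNT_HW_REF_CPU_CYCLES);
    if (fd_cyc < 0 || fd_ref < 0)
        goto out;

    ioctl(fd_cyc, PERF_EVENT_IOC_RESET, 0);
    ioctl(fd_ref, PERF_EVENT_IOC_RESET, 0);
    ioctl(fd_cyc, PERF_EVENT_IOC_ENABLE, 0);
    ioctl(fd_ref, PERF_EVENT_IOC_ENABLE, 0);
    for (uint64_t i = 0; i < 50000000u; i++)   /* keep the core busy */
        sink += i;
    ioctl(fd_cyc, PERF_EVENT_IOC_DISABLE, 0);
    ioctl(fd_ref, PERF_EVENT_IOC_DISABLE, 0);

    if (read(fd_cyc, &cycles, sizeof(cycles)) == (ssize_t)sizeof(cycles) &&
        read(fd_ref, &ref, sizeof(ref)) == (ssize_t)sizeof(ref) && ref != 0) {
        *ratio = cycles_ratio(cycles, ref);
        rc = 0;
    }
out:
    if (fd_cyc >= 0)
        close(fd_cyc);
    if (fd_ref >= 0)
        close(fd_ref);
    return rc;
}
```

If a busy loop shows a ratio noticeably above 1.0, the core clock ran above nominal during the measurement. That is exactly the situation in which an NMI period computed from the nominal frequency in unhalted cycles expires early.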

Re: [PATCH V2] kernel/watchdog: fix spurious hard lockups

2017-06-23 Don Zickus
On Fri, Jun 23, 2017 at 10:01:55AM +0200, Thomas Gleixner wrote: > On Thu, 22 Jun 2017, Don Zickus wrote: > > On Wed, Jun 21, 2017 at 11:53:57PM +0200, Thomas Gleixner wrote: > > > On Wed, 21 Jun 2017, kan.li...@intel.com wrote: > > > > We now have more and more systems where the Turbo range is wid

Re: [PATCH V2] kernel/watchdog: fix spurious hard lockups

2017-06-23 Thomas Gleixner
On Thu, 22 Jun 2017, Don Zickus wrote: > On Wed, Jun 21, 2017 at 11:53:57PM +0200, Thomas Gleixner wrote: > > On Wed, 21 Jun 2017, kan.li...@intel.com wrote: > > > We now have more and more systems where the Turbo range is wide enough > > > that the NMI watchdog expires faster than the soft watchdo

RE: [PATCH V2] kernel/watchdog: fix spurious hard lockups

2017-06-22 Liang, Kan
> Subject: Re: [PATCH V2] kernel/watchdog: fix spurious hard lockups > > On Wed, Jun 21, 2017 at 11:53:57PM +0200, Thomas Gleixner wrote: > > On Wed, 21 Jun 2017, kan.li...@intel.com wrote: > > > We now have more and more systems where the Turbo range is wide > >

Re: [PATCH V2] kernel/watchdog: fix spurious hard lockups

2017-06-22 Don Zickus
On Wed, Jun 21, 2017 at 11:53:57PM +0200, Thomas Gleixner wrote: > On Wed, 21 Jun 2017, kan.li...@intel.com wrote: > > We now have more and more systems where the Turbo range is wide enough > > that the NMI watchdog expires faster than the soft watchdog timer that > > updates the interrupt tick the

Re: [PATCH V2] kernel/watchdog: fix spurious hard lockups

2017-06-22 Thomas Gleixner
On Wed, 21 Jun 2017, Thomas Gleixner wrote: > On Wed, 21 Jun 2017, kan.li...@intel.com wrote: > > We now have more and more systems where the Turbo range is wide enough > > that the NMI watchdog expires faster than the soft watchdog timer that > > updates the interrupt tick the NMI watchdog relies

Re: [PATCH V2] kernel/watchdog: fix spurious hard lockups

2017-06-21 Thomas Gleixner
On Wed, 21 Jun 2017, kan.li...@intel.com wrote: > We now have more and more systems where the Turbo range is wide enough > that the NMI watchdog expires faster than the soft watchdog timer that > updates the interrupt tick the NMI watchdog relies on. > > This problem was originally added by commit
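The timing relationship in the patch description can be made concrete with some arithmetic. The sketch relies on the defaults we understand mainline used at the time:

- watchdog_thresh = 10 s.
- An NMI period programmed as nominal_kHz * 1000 * watchdog_thresh unhalted cycles, as x86 hw_nmi_get_sample_period() computes it.
- A soft watchdog hrtimer period of 2 * watchdog_thresh / 5 = 4 s.

It also assumes a fully busy core at a constant frequency, because idle time slows an unhalted-cycles counter down. The helper names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Unhalted cycles the NMI counter is programmed to count (nominal clock). */
static uint64_t nmi_period_cycles(double nominal_khz, double thresh_s)
{
    return (uint64_t)(nominal_khz * 1000.0 * thresh_s);
}

/* Wall-clock seconds until that count is reached if the busy core actually
 * runs at cur_khz (e.g. a Turbo frequency) instead of nominal_khz. */
static double nmi_period_seconds(double nominal_khz, double cur_khz, double thresh_s)
{
    assert(cur_khz > 0.0);
    return (double)nmi_period_cycles(nominal_khz, thresh_s) / (cur_khz * 1000.0);
}

/* Assumed soft watchdog hrtimer period: 2 * thresh / 5 seconds. */
static double hrtimer_period_seconds(double thresh_s)
{
    return thresh_s * 2.0 / 5.0;
}

/* Once the NMI fires more often than the hrtimer, two NMIs can land between
 * consecutive hrtimer ticks and a healthy CPU looks locked up. With the
 * assumed defaults this happens when cur/nominal exceeds 2.5. */
static int nmi_outruns_hrtimer(double nominal_khz, double cur_khz, double thresh_s)
{
    return nmi_period_seconds(nominal_khz, cur_khz, thresh_s) <
           hrtimer_period_seconds(thresh_s);
}
```

Take a hypothetical part with a 1 GHz nominal clock that sustains 3 GHz. Its NMI fires about every 3.3 s, which is more often than the 4 s hrtimer. At 2 GHz the NMI period is 5 s and there is still margin.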

Re: [PATCH V2] kernel/watchdog: fix spurious hard lockups

2017-06-21 Thomas Gleixner
On Wed, 21 Jun 2017, Andi Kleen wrote: > On Wed, Jun 21, 2017 at 05:12:06PM +0200, Thomas Gleixner wrote: > > On Wed, 21 Jun 2017, kan.li...@intel.com wrote: > > > > > > #ifdef CONFIG_HARDLOCKUP_DETECTOR > > > +/* > > > + * The NMI watchdog relies on PERF_COUNT_HW_CPU_CYCLES event, which > > > +

Re: [PATCH V2] kernel/watchdog: fix spurious hard lockups

2017-06-21 Prarit Bhargava
On 06/21/2017 11:47 AM, Liang, Kan wrote: > > >> On Wed, 21 Jun 2017, kan.li...@intel.com wrote: >>> >>> #ifdef CONFIG_HARDLOCKUP_DETECTOR >>> +/* >>> + * The NMI watchdog relies on PERF_COUNT_HW_CPU_CYCLES event, >> which >>> + * can tick faster than the measured CPU Frequency due to Turbo mo

RE: [PATCH V2] kernel/watchdog: fix spurious hard lockups

2017-06-21 Liang, Kan
> On Wed, 21 Jun 2017, kan.li...@intel.com wrote: > > > > #ifdef CONFIG_HARDLOCKUP_DETECTOR > > +/* > > + * The NMI watchdog relies on PERF_COUNT_HW_CPU_CYCLES event, > which > > + * can tick faster than the measured CPU Frequency due to Turbo mode. > > + * That can lead to spurious timeouts. >

Re: [PATCH V2] kernel/watchdog: fix spurious hard lockups

2017-06-21 Andi Kleen
On Wed, Jun 21, 2017 at 05:12:06PM +0200, Thomas Gleixner wrote: > On Wed, 21 Jun 2017, kan.li...@intel.com wrote: > > > > #ifdef CONFIG_HARDLOCKUP_DETECTOR > > +/* > > + * The NMI watchdog relies on PERF_COUNT_HW_CPU_CYCLES event, which > > + * can tick faster than the measured CPU Frequency du

Re: [PATCH V2] kernel/watchdog: fix spurious hard lockups

2017-06-21 Thomas Gleixner
On Wed, 21 Jun 2017, kan.li...@intel.com wrote: > > #ifdef CONFIG_HARDLOCKUP_DETECTOR > +/* > + * The NMI watchdog relies on PERF_COUNT_HW_CPU_CYCLES event, which > + * can tick faster than the measured CPU Frequency due to Turbo mode. > + * That can lead to spurious timeouts. > + * To workaroun
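The spurious timeouts in the patch comment can be reproduced with a small user-space model of the hardlockup check. As we understand the kernel/watchdog.c code of that era, the hrtimer increments a per-CPU hrtimer_interrupts counter. Each NMI then reports a lockup if that counter has not moved since the previous NMI. The model makes three simplifying assumptions:

- Time advances in integer microseconds.
- The hrtimer is taken to have ticked once at t = 0, so only steady-state reports are counted.
- Simultaneous events are resolved in the hrtimer's favour.

Apart from is_hardlockup and the counter names, the identifiers are hypothetical. This is a model, not kernel code.

```c
#include <assert.h>

/* Per-CPU state, named after the kernel's counters but modelled here. */
struct wd_state {
    unsigned long hrtimer_interrupts;        /* bumped by every hrtimer tick */
    unsigned long hrtimer_interrupts_saved;  /* snapshot taken by the NMI   */
};

/* 1 if no hrtimer tick happened since the previous NMI, else 0. */
static int is_hardlockup(struct wd_state *s)
{
    unsigned long hrint = s->hrtimer_interrupts;

    if (s->hrtimer_interrupts_saved == hrint)
        return 1;
    s->hrtimer_interrupts_saved = hrint;
    return 0;
}

/* Interleave both periodic events on a CPU that never really locks up and
 * count how many NMIs would nevertheless report a hard lockup. */
static int count_spurious(long hrtimer_us, long nmi_us, long duration_us)
{
    struct wd_state s = { 0, 0 };
    long next_hr = 0;            /* assumed first tick when the watchdog is armed */
    long next_nmi = nmi_us;
    int reports = 0;

    /* Always process the earlier event; stop once both lie past the end. */
    while (next_hr <= duration_us || next_nmi <= duration_us) {
        if (next_hr <= next_nmi) {
            s.hrtimer_interrupts++;
            next_hr += hrtimer_us;
        } else {
            reports += is_hardlockup(&s);
            next_nmi += nmi_us;
        }
    }
    return reports;
}
```

With a 4 s hrtimer and a 10 s NMI period, the count stays at 0. Shrinking the NMI period to 3 s (a turbo ratio of about 3.3 in this model) yields a report on a perfectly healthy CPU: no hrtimer tick falls between the NMI at 12 s and the one at 15 s.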