Re: NMI hardlock stacktrace deadlock [was Re: Linux 5.2-rc5]

2019-06-26 Thread Steven Rostedt
On Mon, 24 Jun 2019 22:03:45 -0500 Josh Poimboeuf wrote: > Looking at the dmesg, panic_on_oops doesn't seem to be enabled: it went > through the rewind_stack_do_exit() path instead of the panic() path. So > the system is apparently not configured to reboot on oops. "Command line: BOOT_IMAGE=/bo

Re: NMI hardlock stacktrace deadlock [was Re: Linux 5.2-rc5]

2019-06-24 Thread Josh Poimboeuf
On Wed, Jun 19, 2019 at 01:42:53PM -0700, Linus Torvalds wrote: > On Wed, Jun 19, 2019 at 12:19 PM Chris Wilson > wrote: > > > > > Do you have the oops itself at all? > > > > An example at > > https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_6310/fi-kbl-x1275/dmesg0.log > > https://intel-gfx-ci.01

Re: NMI hardlock stacktrace deadlock [was Re: Linux 5.2-rc5]

2019-06-21 Thread Chris Wilson
Quoting Thomas Gleixner (2019-06-21 20:33:36) > On Fri, 21 Jun 2019, Chris Wilson wrote: > > > Quoting Thomas Gleixner (2019-06-21 16:30:52) > > > Chris, do you have the actual NMI lockup detector splats somewhere? > > > > Sorry, I'm having a hard time reproducing this at will now. The test > > c

Re: NMI hardlock stacktrace deadlock [was Re: Linux 5.2-rc5]

2019-06-21 Thread Thomas Gleixner
On Fri, 21 Jun 2019, Chris Wilson wrote: > Quoting Thomas Gleixner (2019-06-21 16:30:52) > > Chris, do you have the actual NMI lockup detector splats somewhere? > > Sorry, I'm having a hard time reproducing this at will now. The test > case depends on the right timing of the wrong event to cause

Re: NMI hardlock stacktrace deadlock [was Re: Linux 5.2-rc5]

2019-06-21 Thread Chris Wilson
Quoting Thomas Gleixner (2019-06-21 16:30:52) > Chris, do you have the actual NMI lockup detector splats somewhere? Sorry, I'm having a hard time reproducing this at will now. The test case depends on the right timing of the wrong event to cause the GPU to hang. From memory, I got the "W

Re: NMI hardlock stacktrace deadlock [was Re: Linux 5.2-rc5]

2019-06-21 Thread Thomas Gleixner
On Wed, 19 Jun 2019, Linus Torvalds wrote: > On Wed, Jun 19, 2019 at 12:19 PM Chris Wilson > wrote: > > > > > Do you have the oops itself at all? > > > > An example at > > https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_6310/fi-kbl-x1275/dmesg0.log > > https://intel-gfx-ci.01.org/tree/drm-tip/CI

Re: NMI hardlock stacktrace deadlock [was Re: Linux 5.2-rc5]

2019-06-19 Thread Linus Torvalds
On Wed, Jun 19, 2019 at 12:19 PM Chris Wilson wrote: > > > Do you have the oops itself at all? > > An example at > https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_6310/fi-kbl-x1275/dmesg0.log > https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_6310/fi-kbl-x1275/boot0.log > > The bug causing the oops

Re: NMI hardlock stacktrace deadlock [was Re: Linux 5.2-rc5]

2019-06-19 Thread Chris Wilson
Quoting Linus Torvalds (2019-06-19 19:49:37) > On Wed, Jun 19, 2019 at 5:40 AM Chris Wilson wrote: > > > > I haven't bisected this, but with the merge of rc5 into our CI we > > started hitting an issue that resulted in a oops and the NMI watchdog > > firing as we dumped the ftrace. > > Do you hav

Re: NMI hardlock stacktrace deadlock [was Re: Linux 5.2-rc5]

2019-06-19 Thread Linus Torvalds
On Wed, Jun 19, 2019 at 5:40 AM Chris Wilson wrote: > > I haven't bisected this, but with the merge of rc5 into our CI we > started hitting an issue that resulted in a oops and the NMI watchdog > firing as we dumped the ftrace. Do you have the oops itself at all? Linus