Re: NMI hardlock stacktrace deadlock [was Re: Linux 5.2-rc5]

2019-06-26 Thread Steven Rostedt
On Mon, 24 Jun 2019 22:03:45 -0500 Josh Poimboeuf wrote: > Looking at the dmesg, panic_on_oops doesn't seem to be enabled: it went > through the rewind_stack_do_exit() path instead of the panic() path. So > the system is apparently not configured to reboot on oops. "Command line: BOOT_IMAGE=/bo

Re: NMI hardlock stacktrace deadlock [was Re: Linux 5.2-rc5]

2019-06-24 Thread Josh Poimboeuf
On Wed, Jun 19, 2019 at 01:42:53PM -0700, Linus Torvalds wrote: > On Wed, Jun 19, 2019 at 12:19 PM Chris Wilson > wrote: > > > > > Do you have the oops itself at all? > > > > An example at > > https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_6310/fi-kbl-x1275/dmesg0.log > > https://intel-gfx-ci.01

Re: NMI hardlock stacktrace deadlock [was Re: Linux 5.2-rc5]

2019-06-21 Thread Chris Wilson
Quoting Thomas Gleixner (2019-06-21 20:33:36) > On Fri, 21 Jun 2019, Chris Wilson wrote: > > > Quoting Thomas Gleixner (2019-06-21 16:30:52) > > > Chris, do you have the actual NMI lockup detector splats somewhere? > > > > Sorry, I'm having a hard time reproducing this at will now. The test > > c

Re: NMI hardlock stacktrace deadlock [was Re: Linux 5.2-rc5]

2019-06-21 Thread Thomas Gleixner
On Fri, 21 Jun 2019, Chris Wilson wrote: > Quoting Thomas Gleixner (2019-06-21 16:30:52) > > Chris, do you have the actual NMI lockup detector splats somewhere? > > Sorry, I'm having a hard time reproducing this at will now. The test > case depends on the right timing of the wrong event to cause

Re: NMI hardlock stacktrace deadlock [was Re: Linux 5.2-rc5]

2019-06-21 Thread Chris Wilson
Quoting Thomas Gleixner (2019-06-21 16:30:52) > Chris, do you have the actual NMI lockup detector splats somewhere? Sorry, I'm having a hard time reproducing this at will now. The test case depends on the right timing of the wrong event to cause the GPU to hang. From memory, I got the "W

Re: NMI hardlock stacktrace deadlock [was Re: Linux 5.2-rc5]

2019-06-21 Thread Thomas Gleixner
On Wed, 19 Jun 2019, Linus Torvalds wrote: > On Wed, Jun 19, 2019 at 12:19 PM Chris Wilson > wrote: > > > > > Do you have the oops itself at all? > > > > An example at > > https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_6310/fi-kbl-x1275/dmesg0.log > > https://intel-gfx-ci.01.org/tree/drm-tip/CI

Re: NMI hardlock stacktrace deadlock [was Re: Linux 5.2-rc5]

2019-06-19 Thread Linus Torvalds
On Wed, Jun 19, 2019 at 12:19 PM Chris Wilson wrote: > > > Do you have the oops itself at all? > > An example at > https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_6310/fi-kbl-x1275/dmesg0.log > https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_6310/fi-kbl-x1275/boot0.log > > The bug causing the oops

Re: NMI hardlock stacktrace deadlock [was Re: Linux 5.2-rc5]

2019-06-19 Thread Chris Wilson
Quoting Linus Torvalds (2019-06-19 19:49:37) > On Wed, Jun 19, 2019 at 5:40 AM Chris Wilson wrote: > > > > I haven't bisected this, but with the merge of rc5 into our CI we > > started hitting an issue that resulted in an oops and the NMI watchdog > > firing as we dumped the ftrace. > > Do you hav

Re: NMI hardlock stacktrace deadlock [was Re: Linux 5.2-rc5]

2019-06-19 Thread Linus Torvalds
On Wed, Jun 19, 2019 at 5:40 AM Chris Wilson wrote: > > I haven't bisected this, but with the merge of rc5 into our CI we > started hitting an issue that resulted in an oops and the NMI watchdog > firing as we dumped the ftrace. Do you have the oops itself at all? Linus

NMI hardlock stacktrace deadlock [was Re: Linux 5.2-rc5]

2019-06-19 Thread Chris Wilson
I haven't bisected this, but with the merge of rc5 into our CI we started hitting an issue that resulted in an oops and the NMI watchdog firing as we dumped the ftrace. This NMI watchdog locks up prior to the backtraces being printed, preventing the machine from rebooting, and can be avoided with ha