Re: [lkp-robot] [locking/ww_mutex] 857811a371: INFO:task_blocked_for_more_than#seconds
On Wed, Mar 08, 2017 at 12:13:12PM +, Chris Wilson wrote:
> On Wed, Mar 08, 2017 at 09:08:54AM +0800, kernel test robot wrote:
> > FYI, we noticed the following commit:
> >
> > commit: 857811a37129f5d2ba162d7be3986eff44724014 ("locking/ww_mutex: Adjust the lock number for stress test")
> > https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git master
> >
> > in testcase: boot
> >
> > on test machine: qemu-system-i386 -enable-kvm -m 320M
> >
> > caused below changes (please refer to attached dmesg/kmsg for entire log/backtrace):
>
> Now the test is running, it takes too long. :)

Sorry, that's right. Up to now the 0day robot still cannot guarantee timely reporting of a runtime regression, nor can it guarantee bisection of a new regression even when some test actually triggered the bug.

One fundamental challenge is that there are ~50,000 runtime "regressions" queued for bisect. Obviously there is no way to bisect them all, so a large portion of real regressions never get a chance to be bisected, not to mention the problems of bisect reliability and efficiency.

Most of the test "regressions" may be duplicates of each other (e.g. a bug in the mainline kernel will also show up in various developer trees). A large portion of them may also be random noise (e.g. performance fluctuations).

We've tried various approaches to improve the de-duplication, filtering, prioritization etc. algorithms. Together with increased test coverage, they are reflected in our slowly increasing report numbers. However, there is still a long way to go.

Thanks,
Fengguang
Re: [lkp-robot] [locking/ww_mutex] 857811a371: INFO:task_blocked_for_more_than#seconds
On Wed, Mar 08, 2017 at 09:08:54AM +0800, kernel test robot wrote:
> FYI, we noticed the following commit:
>
> commit: 857811a37129f5d2ba162d7be3986eff44724014 ("locking/ww_mutex: Adjust the lock number for stress test")
> https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git master
>
> in testcase: boot
>
> on test machine: qemu-system-i386 -enable-kvm -m 320M
>
> caused below changes (please refer to attached dmesg/kmsg for entire log/backtrace):

Now the test is running, it takes too long. :)

wait_for_completion_interruptible() would stop the hung task check? That leaves the NMI watchdog to check if we hit a deadlock between the workers. And add a timeout to the stress test.
-Chris

-- 
Chris Wilson, Intel Open Source Technology Centre
[lkp-robot] [locking/ww_mutex] 857811a371: INFO:task_blocked_for_more_than#seconds
FYI, we noticed the following commit:

commit: 857811a37129f5d2ba162d7be3986eff44724014 ("locking/ww_mutex: Adjust the lock number for stress test")
https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git master

in testcase: boot

on test machine: qemu-system-i386 -enable-kvm -m 320M

caused below changes (please refer to attached dmesg/kmsg for entire log/backtrace):

+-----------------------------------------------------------------------+------------+------------+
|                                                                       | 7fb4a2cea6 | 857811a371 |
+-----------------------------------------------------------------------+------------+------------+
| boot_successes                                                        | 0          | 0          |
| boot_failures                                                         | 8          | 8          |
| WARNING:at_kernel/locking/lockdep.c:#__lock_acquire                   | 8          |            |
| WARNING:at_drivers/gpu/drm/drm_mode_config.c:#drm_mode_config_cleanup | 8          | 4          |
| INFO:task_blocked_for_more_than#seconds                               | 0          | 4          |
| Kernel_panic-not_syncing:hung_task:blocked_tasks                      | 0          | 4          |
| calltrace:debug_show_all_locks                                        | 0          | 2          |
+-----------------------------------------------------------------------+------------+------------+

[ 246.752394] INFO: task swapper:1 blocked for more than 120 seconds.
[ 246.765907]       Not tainted 4.10.0-11078-g857811a #1
[ 246.777618] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 246.793776] swapper         D 5948     1      0 0x
[ 246.806182] Call Trace:
[ 246.812532]  __schedule+0x369/0xdb0
[ 246.821768]  schedule+0x3c/0xd0
[ 246.830251]  schedule_timeout+0x105/0x410
[ 246.837524]  ? trace_hardirqs_on_caller+0x12c/0x1d0
[ 246.849060]  ? do_raw_spin_unlock+0xd9/0x170
[ 246.859553]  ? trace_hardirqs_on+0xb/0x10
[ 246.867883]  wait_for_completion+0x11a/0x190
[ 246.878229]  ? do_task_dead+0x70/0x70
[ 246.887266]  flush_workqueue+0x2fc/0xc30
[ 246.896894]  ? flush_workqueue+0xa0/0xc30
[ 246.906836]  ? kmalloc_array+0x70/0x70
[ 246.915459]  stress+0x18c/0x214
[ 246.923946]  test_ww_mutex_init+0x3b2/0x3d9
[ 246.934226]  ? lock_torture_init+0x7f0/0x7f0
[ 246.944458]  do_one_initcall+0x9f/0x1b6
[ 246.951910]  ? parse_args+0x465/0x540
[ 246.959370]  kernel_init_freeable+0x109/0x1cb
[ 246.967281]  ? rest_init+0x130/0x130
[ 246.976223]  kernel_init+0x11/0x190
[ 246.982703]  ? schedule_tail+0x11/0x70
[ 246.990237]  ? rest_init+0x130/0x130
[ 246.42]  ? rest_init+0x130/0x130
[ 247.008987]  ret_from_fork+0x21/0x2c
[ 247.018186]
[ 247.018186] Showing all locks held in the system:
[ 247.033511] 2 locks held by khungtaskd/11:
[ 247.040956]  #0:  (rcu_read_lock){..}, at: [] watchdog+0xc7/0x7b0
[ 247.054513]  #1:  (tasklist_lock){.+.+..}, at: [] debug_show_all_locks+0x37/0x1a0
[ 247.073278] 4 locks held by kworker/u2:1/30:
[ 247.083586]  #0:  ("test-ww_mutex"){.+}, at: [] process_one_work+0x21c/0xab0
[ 247.101437]  #1:  ((>work)){+.+.+.}, at: [] process_one_work+0x21c/0xab0
[ 247.117544]  #2:  (ww_class_acquire){+.+...}, at: [] process_one_work+0x2e3/0xab0
[ 247.132833]  #3:  (ww_class_mutex){+.+...}, at: [] stress_reorder_work+0x1cb/0x4f0
[ 247.152234]
[ 247.156751] =
[ 247.156751]
[ 247.173280] NMI backtrace for cpu 0
[ 247.182489] CPU: 0 PID: 11 Comm: khungtaskd Not tainted 4.10.0-11078-g857811a #1
[ 247.199866] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.9.3-20161025_171302-gandalf 04/01/2014
[ 247.220997] Call Trace:
[ 247.228372]  dump_stack+0x16/0x20
[ 247.236300]  nmi_cpu_backtrace+0xb3/0xc0
[ 247.245947]  nmi_trigger_cpumask_backtrace+0x10a/0x150
[ 247.256543]  ? debug_show_all_locks+0x15b/0x1a0
[ 247.267307]  ? setup_vector_irq+0x280/0x280
[ 247.277455]  arch_trigger_cpumask_backtrace+0x15/0x20
[ 247.287045]  watchdog+0x4fd/0x7b0
[ 247.296058]  ? hung_task_panic+0x20/0x20
[ 247.306052]  kthread+0x153/0x160
[ 247.314949]  ? hung_task_panic+0x20/0x20
[ 247.324960]  ? __kthread_create_on_node+0x230/0x230
[ 247.336396]  ret_from_fork+0x21/0x2c
[ 247.345793] Kernel panic - not syncing: hung_task: blocked tasks
[ 247.346565] CPU: 0 PID: 11 Comm: khungtaskd Not tainted 4.10.0-11078-g857811a #1
[ 247.346565] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.9.3-20161025_171302-gandalf 04/01/2014
[ 247.346565] Call Trace:
[ 247.346565]  dump_stack+0x16/0x20
[ 247.346565]  panic+0xda/0x295
[ 247.346565]  watchdog+0x509/0x7b0
[ 247.346565]  ? hung_task_panic+0x20/0x20
[ 247.346565]  kthread+0x153/0x160
[ 247.346565]  ? hung_task_panic+0x20/0x20
[ 247.346565]  ? __kthread_create_on_node+0x230/0x230
[ 247.346565]  ret_from_fork+0x21/0x2c
[