Re: [lkp-robot] [locking/ww_mutex] 857811a371: INFO:task_blocked_for_more_than#seconds

2017-03-08 Thread Fengguang Wu

On Wed, Mar 08, 2017 at 12:13:12PM +0000, Chris Wilson wrote:

On Wed, Mar 08, 2017 at 09:08:54AM +0800, kernel test robot wrote:


FYI, we noticed the following commit:

commit: 857811a37129f5d2ba162d7be3986eff44724014 ("locking/ww_mutex: Adjust the lock 
number for stress test")
https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git master

in testcase: boot

on test machine: qemu-system-i386 -enable-kvm -m 320M

caused below changes (please refer to attached dmesg/kmsg for entire 
log/backtrace):


Now the test is running, it takes too long. :)


Sorry, that's right. As of now the 0day robot still cannot guarantee
timely reporting of a runtime regression, nor can it guarantee that a
new regression will be bisected even when some test has actually
triggered the bug.

One fundamental challenge is that there are ~50,000 runtime
"regressions" queued for bisection. Obviously there is no way to
bisect them all, so a large portion of real regressions never get a
chance to be bisected. Not to mention the problems of bisection
reliability and efficiency.

Most of the test "regressions" may be duplicates of each other (e.g. a
bug in the mainline kernel will also show up in various developer
trees). A large portion of them may also be random noise (e.g.
performance fluctuations). We've tried various approaches to improve
the de-duplication, filtering, prioritization etc. algorithms.
Together with increased test coverage, these have been reflected in
our slowly increasing report numbers. However, there is still a long
way to go.

Thanks,
Fengguang


Re: [lkp-robot] [locking/ww_mutex] 857811a371: INFO:task_blocked_for_more_than#seconds

2017-03-08 Thread Chris Wilson
On Wed, Mar 08, 2017 at 09:08:54AM +0800, kernel test robot wrote:
> 
> FYI, we noticed the following commit:
> 
> commit: 857811a37129f5d2ba162d7be3986eff44724014 ("locking/ww_mutex: Adjust 
> the lock number for stress test")
> https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git master
> 
> in testcase: boot
> 
> on test machine: qemu-system-i386 -enable-kvm -m 320M
> 
> caused below changes (please refer to attached dmesg/kmsg for entire 
> log/backtrace):

Now the test is running, it takes too long. :)

wait_for_completion_interruptible() would stop the hung-task check?
That leaves the NMI watchdog to check whether we hit a deadlock
between the workers.

And add a timeout to the stress test.
-Chris

-- 
Chris Wilson, Intel Open Source Technology Centre
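The suggestion above can be sketched roughly as follows. This is a hypothetical illustration, not the actual patch: the `stress_wait()` helper and the 120-second budget are made up here. The point is that khungtaskd only flags tasks blocked in TASK_UNINTERRUPTIBLE, so an interruptible, timed wait both silences the hung-task check and bounds the run:

```c
#include <linux/completion.h>
#include <linux/workqueue.h>
#include <linux/errno.h>
#include <linux/jiffies.h>

/*
 * Hypothetical sketch, not the real fix: bound the stress test's
 * wait so a stuck worker produces a timeout error instead of a
 * hung-task panic during boot.
 */
static int stress_wait(struct workqueue_struct *wq, struct completion *done)
{
	long ret;

	/* Interruptible sleep: ignored by the hung-task detector. */
	ret = wait_for_completion_interruptible_timeout(done, 120 * HZ);
	if (ret == 0)
		return -ETIMEDOUT;	/* suspected deadlock between workers */
	if (ret < 0)
		return ret;		/* interrupted by a signal */

	flush_workqueue(wq);
	return 0;
}
```

A genuine worker deadlock would then surface as -ETIMEDOUT from the test, with the NMI watchdog still available to dump CPU state, rather than as a khungtaskd panic.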


[lkp-robot] [locking/ww_mutex] 857811a371: INFO:task_blocked_for_more_than#seconds

2017-03-07 Thread kernel test robot

FYI, we noticed the following commit:

commit: 857811a37129f5d2ba162d7be3986eff44724014 ("locking/ww_mutex: Adjust the 
lock number for stress test")
https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git master

in testcase: boot

on test machine: qemu-system-i386 -enable-kvm -m 320M

caused below changes (please refer to attached dmesg/kmsg for entire 
log/backtrace):


+------------------------------------------------------------------------+------------+------------+
|                                                                        | 7fb4a2cea6 | 857811a371 |
+------------------------------------------------------------------------+------------+------------+
| boot_successes                                                         | 0          | 0          |
| boot_failures                                                          | 8          | 8          |
| WARNING:at_kernel/locking/lockdep.c:#__lock_acquire                    | 8          |            |
| WARNING:at_drivers/gpu/drm/drm_mode_config.c:#drm_mode_config_cleanup  | 8          | 4          |
| INFO:task_blocked_for_more_than#seconds                                | 0          | 4          |
| Kernel_panic-not_syncing:hung_task:blocked_tasks                       | 0          | 4          |
| calltrace:debug_show_all_locks                                         | 0          | 2          |
+------------------------------------------------------------------------+------------+------------+



[  246.752394] INFO: task swapper:1 blocked for more than 120 seconds.
[  246.765907]   Not tainted 4.10.0-11078-g857811a #1
[  246.777618] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[  246.793776] swapper D 5948 1  0 0x
[  246.806182] Call Trace:
[  246.812532]  __schedule+0x369/0xdb0
[  246.821768]  schedule+0x3c/0xd0
[  246.830251]  schedule_timeout+0x105/0x410
[  246.837524]  ? trace_hardirqs_on_caller+0x12c/0x1d0
[  246.849060]  ? do_raw_spin_unlock+0xd9/0x170
[  246.859553]  ? trace_hardirqs_on+0xb/0x10
[  246.867883]  wait_for_completion+0x11a/0x190
[  246.878229]  ? do_task_dead+0x70/0x70
[  246.887266]  flush_workqueue+0x2fc/0xc30
[  246.896894]  ? flush_workqueue+0xa0/0xc30
[  246.906836]  ? kmalloc_array+0x70/0x70
[  246.915459]  stress+0x18c/0x214
[  246.923946]  test_ww_mutex_init+0x3b2/0x3d9
[  246.934226]  ? lock_torture_init+0x7f0/0x7f0
[  246.944458]  do_one_initcall+0x9f/0x1b6
[  246.951910]  ? parse_args+0x465/0x540
[  246.959370]  kernel_init_freeable+0x109/0x1cb
[  246.967281]  ? rest_init+0x130/0x130
[  246.976223]  kernel_init+0x11/0x190
[  246.982703]  ? schedule_tail+0x11/0x70
[  246.990237]  ? rest_init+0x130/0x130
[  246.42]  ? rest_init+0x130/0x130
[  247.008987]  ret_from_fork+0x21/0x2c
[  247.018186] 
[  247.018186] Showing all locks held in the system:
[  247.033511] 2 locks held by khungtaskd/11:
[  247.040956]  #0:  (rcu_read_lock){..}, at: [] watchdog+0xc7/0x7b0
[  247.054513]  #1:  (tasklist_lock){.+.+..}, at: [] debug_show_all_locks+0x37/0x1a0
[  247.073278] 4 locks held by kworker/u2:1/30:
[  247.083586]  #0:  ("test-ww_mutex"){.+}, at: [] process_one_work+0x21c/0xab0
[  247.101437]  #1:  ((>work)){+.+.+.}, at: [] process_one_work+0x21c/0xab0
[  247.117544]  #2:  (ww_class_acquire){+.+...}, at: [] process_one_work+0x2e3/0xab0
[  247.132833]  #3:  (ww_class_mutex){+.+...}, at: [] stress_reorder_work+0x1cb/0x4f0
[  247.152234] 
[  247.156751] =
[  247.156751] 
[  247.173280] NMI backtrace for cpu 0
[  247.182489] CPU: 0 PID: 11 Comm: khungtaskd Not tainted 4.10.0-11078-g857811a #1
[  247.199866] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.9.3-20161025_171302-gandalf 04/01/2014
[  247.220997] Call Trace:
[  247.228372]  dump_stack+0x16/0x20
[  247.236300]  nmi_cpu_backtrace+0xb3/0xc0
[  247.245947]  nmi_trigger_cpumask_backtrace+0x10a/0x150
[  247.256543]  ? debug_show_all_locks+0x15b/0x1a0
[  247.267307]  ? setup_vector_irq+0x280/0x280
[  247.277455]  arch_trigger_cpumask_backtrace+0x15/0x20
[  247.287045]  watchdog+0x4fd/0x7b0
[  247.296058]  ? hung_task_panic+0x20/0x20
[  247.306052]  kthread+0x153/0x160
[  247.314949]  ? hung_task_panic+0x20/0x20
[  247.324960]  ? __kthread_create_on_node+0x230/0x230
[  247.336396]  ret_from_fork+0x21/0x2c
[  247.345793] Kernel panic - not syncing: hung_task: blocked tasks
[  247.346565] CPU: 0 PID: 11 Comm: khungtaskd Not tainted 4.10.0-11078-g857811a #1
[  247.346565] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.9.3-20161025_171302-gandalf 04/01/2014
[  247.346565] Call Trace:
[  247.346565]  dump_stack+0x16/0x20
[  247.346565]  panic+0xda/0x295
[  247.346565]  watchdog+0x509/0x7b0
[  247.346565]  ? hung_task_panic+0x20/0x20
[  247.346565]  kthread+0x153/0x160
[  247.346565]  ? hung_task_panic+0x20/0x20
[  247.346565]  ? __kthread_create_on_node+0x230/0x230
[  247.346565]  ret_from_fork+0x21/0x2c
