Ever since upgrading to a Ryzen (1700X), I have experienced frequent
system freezes, which may be related to the problems discussed here.


The freeze mostly happens during one particular heavily threaded task
with disk I/O.

Symptoms:

* Screen completely freezes, including mouse pointer,
* Existing SSH connections die, no new connection can be established,
* System can no longer switch to text console,
* LEDs indicate **unceasing disk activity**,
* System still responds to pings,
* Alt-SysRq keys remain active, but cannot output to the screen, even if
  already in a text console.

I've succeeded in capturing the kernel log after a freeze using
netconsole.

This timeout message appears:

    [35042.581242] INFO: task jbd2/dm-2-8:610 blocked for more than 120 seconds.
    [35042.581259]       Not tainted 4.15.0-62-generic #69-Ubuntu
    [35042.581262] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
    [35042.581273] jbd2/dm-2-8     D    0   610      2 0x80000000
    [35042.581278] Call Trace:
    [35042.581290]  __schedule+0x24e/0x880
    [35042.581295]  ? bit_wait+0x60/0x60
    [35042.581300]  schedule+0x2c/0x80
    [35042.581304]  io_schedule+0x16/0x40
    [35042.581308]  bit_wait_io+0x11/0x60
    [35042.581313]  __wait_on_bit+0x4c/0x90
    [35042.581317]  out_of_line_wait_on_bit+0x90/0xb0
    [35042.581323]  ? bit_waitqueue+0x40/0x40
    [35042.581328]  __wait_on_buffer+0x32/0x40
    [35042.581333]  jbd2_journal_commit_transaction+0xdac/0x1730
    [35042.581337]  ? __switch_to_asm+0x41/0x70
    [35042.581343]  kjournald2+0xc8/0x270
    [35042.581347]  ? kjournald2+0xc8/0x270
    [35042.581351]  ? wait_woken+0x80/0x80
    [35042.581355]  kthread+0x121/0x140
    [35042.581359]  ? commit_timeout+0x20/0x20
    [35042.581363]  ? kthread_create_worker_on_cpu+0x70/0x70
    [35042.581366]  ret_from_fork+0x22/0x40
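
For anyone who wants to pull reports like this out of a long netconsole capture, here is a minimal Python sketch. It assumes the capture was saved as plain text and that the hung-task header has exactly the shape shown above; the names `HUNG_TASK` and `find_hung_tasks` are made up for illustration and are not part of any existing tool.

```python
import re

# Matches the hung-task header printed by the kernel, for example:
# "[35042.581242] INFO: task jbd2/dm-2-8:610 blocked for more than 120 seconds."
HUNG_TASK = re.compile(
    r"\[\s*(?P<ts>\d+\.\d+)\]\s+INFO: task (?P<comm>.+):(?P<pid>\d+) "
    r"blocked for more than (?P<secs>\d+) seconds\."
)


def find_hung_tasks(log_text):
    """Return (timestamp, command, pid, seconds) for each hung-task header."""
    reports = []
    for line in log_text.splitlines():
        m = HUNG_TASK.search(line)
        if m:
            reports.append(
                (float(m["ts"]), m["comm"], int(m["pid"]), int(m["secs"]))
            )
    return reports


# Sample line copied from the log excerpt above.
sample = "[35042.581242] INFO: task jbd2/dm-2-8:610 blocked for more than 120 seconds."
print(find_hung_tasks(sample))
```

This only lists which tasks were reported and when; it does not try to interpret the call traces.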

Also, I have thousands of lines of output for blocked tasks. Most traces
look more or less like this:

    [34274.346748] sysrq: SysRq : Show Blocked State
    [34274.346766]   task                        PC stack   pid father
    [34274.346771] systemd         D    0     1      0 0x00000000
    [34274.346776] Call Trace:
    [34274.346786]  __schedule+0x24e/0x880
    [34274.346792]  ? mempool_alloc_slab+0x15/0x20
    [34274.346795]  schedule+0x2c/0x80
    [34274.346798]  schedule_timeout+0x15d/0x350
    [34274.346804]  ? __next_timer_interrupt+0xe0/0xe0
    [34274.346808]  ? wait_woken+0x80/0x80
    [34274.346812]  io_schedule_timeout+0x1e/0x50
    [34274.346815]  mempool_alloc+0x15d/0x190
    [34274.346820]  ? wait_woken+0x80/0x80
    [34274.346825]  bio_alloc_bioset+0xa9/0x1e0
    [34274.346830]  __split_and_process_non_flush+0x147/0x2c0
    [34274.346834]  __split_and_process_bio+0x139/0x2a0
    [34274.346838]  dm_make_request+0x7a/0xd0
    [34274.346843]  ? SyS_madvise+0x990/0x990
    [34274.346847]  generic_make_request+0x124/0x300
    [34274.346850]  submit_bio+0x73/0x140
    [34274.346853]  ? submit_bio+0x73/0x140
    [34274.346856]  ? get_swap_bio+0xcd/0x100
    [34274.346861]  __swap_writepage+0x323/0x3b0
    [34274.346865]  ? __frontswap_store+0x73/0x100
    [34274.346869]  swap_writepage+0x34/0x90
    [34274.346872]  pageout.isra.54+0x11b/0x350
    [34274.346878]  shrink_page_list+0x99a/0xbc0
    [34274.346883]  shrink_inactive_list+0x242/0x590
    [34274.346887]  shrink_node_memcg+0x364/0x770
    [34274.346892]  shrink_node+0xf7/0x300
    [34274.346896]  ? shrink_node+0xf7/0x300
    [34274.346900]  do_try_to_free_pages+0xc9/0x330
    [34274.346904]  try_to_free_pages+0xee/0x1b0
    [34274.346910]  __alloc_pages_slowpath+0x3fc/0xe00
    [34274.346914]  ? __switch_to_asm+0x35/0x70
    [34274.346917]  ? __switch_to_asm+0x35/0x70
    [34274.346920]  ? __switch_to_asm+0x35/0x70
    [34274.346924]  ? __switch_to_asm+0x35/0x70
    [34274.346929]  ? __switch_to_asm+0x35/0x70
    [34274.346932]  ? __switch_to_asm+0x41/0x70
    [34274.346936]  __alloc_pages_nodemask+0x29a/0x2c0
    [34274.346940]  alloc_pages_current+0x6a/0xe0
    [34274.346944]  __page_cache_alloc+0x81/0xa0
    [34274.346948]  __do_page_cache_readahead+0x113/0x2c0
    [34274.346952]  ? radix_tree_lookup_slot+0x22/0x50
    [34274.346956]  ? find_get_entry+0x1e/0x110
    [34274.346959]  filemap_fault+0x2ad/0x6f0
    [34274.346968]  ? filemap_fault+0x2ad/0x6f0
    [34274.346971]  ? page_add_file_rmap+0x134/0x180
    [34274.346975]  ? filemap_map_pages+0x181/0x390
    [34274.346980]  ext4_filemap_fault+0x31/0x44
    [34274.346984]  __do_fault+0x5b/0x115
    [34274.346988]  __handle_mm_fault+0xdef/0x1290
    [34274.346992]  handle_mm_fault+0xb1/0x210
    [34274.346997]  __do_page_fault+0x281/0x4b0
    [34274.347001]  do_page_fault+0x2e/0xe0
    [34274.347004]  ? page_fault+0x2f/0x50
    [34274.347008]  page_fault+0x45/0x50
    [34274.347011] RIP: 0033:0x7fa9446ee83a
    [34274.347015] RSP: 002b:00007ffcccb01470 EFLAGS: 00010206
    [34274.347019] RAX: 0000000000000001 RBX: 00005615eff63650 RCX: 00007fa944bcebb7
    [34274.347021] RDX: 0000000000000093 RSI: 00007ffcccb01470 RDI: 0000000000000000
    [34274.347025] RBP: 00007ffcccb01c60 R08: 0000000000000000 R09: 0000000000000008
    [34274.347027] R10: 00000000ffffffff R11: 0000000000000000 R12: 0000000000000001
    [34274.347032] R13: ffffffffffffffff R14: 00007ffcccb01470 R15: 0000000000000001
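
With thousands of lines of blocked-task output, it may help to summarize where the tasks are waiting rather than reading each trace. The following Python sketch counts, per blocked task, the first reliable frame that is not a generic scheduler function. It assumes the "Show Blocked State" format shown above, one task header per trace, and command names without spaces. The `SCHED` set and the function name `wait_sites` are illustrative choices, not an established convention.

```python
import re
from collections import Counter

# Task header, e.g. "[34274.346771] systemd         D    0     1      0 0x00000000"
TASK = re.compile(
    r"\]\s+(?P<comm>\S+)\s+D\s+\d+\s+(?P<pid>\d+)\s+\d+\s+0x[0-9a-f]+\s*$"
)
# Stack frame, e.g. "[34274.346815]  mempool_alloc+0x15d/0x190"; a leading "? "
# marks a frame the unwinder is not sure about.
FRAME = re.compile(
    r"\]\s+(?P<guess>\? )?(?P<func>[\w.]+)\+0x[0-9a-f]+/0x[0-9a-f]+"
)
# Generic scheduler entry points that say nothing about *why* a task sleeps.
SCHED = {"__schedule", "schedule", "schedule_timeout",
         "io_schedule", "io_schedule_timeout"}


def wait_sites(log_text):
    """Count, per blocked task, the first reliable non-scheduler frame."""
    counts = Counter()
    waiting_for_site = False
    for line in log_text.splitlines():
        if TASK.search(line):
            waiting_for_site = True
            continue
        m = FRAME.search(line)
        if not (waiting_for_site and m):
            continue
        if m["guess"] or m["func"] in SCHED:
            continue
        counts[m["func"]] += 1
        waiting_for_site = False
    return counts


# First lines of the trace quoted above.
sample = """\
[34274.346771] systemd         D    0     1      0 0x00000000
[34274.346776] Call Trace:
[34274.346786]  __schedule+0x24e/0x880
[34274.346792]  ? mempool_alloc_slab+0x15/0x20
[34274.346795]  schedule+0x2c/0x80
[34274.346798]  schedule_timeout+0x15d/0x350
[34274.346812]  io_schedule_timeout+0x1e/0x50
[34274.346815]  mempool_alloc+0x15d/0x190
[34274.346825]  bio_alloc_bioset+0xa9/0x1e0
"""
print(wait_sites(sample).most_common())
```

Run over the full capture, the most common entries would show whether most tasks are stuck at the same point as the trace above.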

Another detail that may be relevant is that show-task-states outputs
about 5000 lines of this kind:

    [34830.962684]     in-flight: 3235:kcryptd_crypt [dm_crypt], 
3237:kcryptd_crypt [dm_crypt], 6056:kcryptd_crypt [dm_crypt], 
6058:kcryptd_crypt [dm_crypt], 6055:kcryptd_crypt [dm_crypt], 
6057:kcryptd_crypt [dm_crypt], 3992:kcryptd_crypt [dm_crypt], 
4861:kcryptd_crypt [dm_crypt], 32431:kcryptd_crypt [dm_crypt], 
2682:kcryptd_crypt [dm_crypt], 4850:kcryptd_crypt [dm_crypt], 
1429:kcryptd_crypt [dm_crypt], 6054:kcryptd_crypt [dm_crypt], 
6060:kcryptd_crypt [dm_crypt], 4862:kcryptd_crypt [dm_crypt], 
1862:kcryptd_crypt [dm_crypt]
    [34830.874519] DefaultDispatch 22853    343906.389378     12131   120 
    [34830.962714]     delayed: kcryptd_crypt [dm_crypt], kcryptd_crypt 
[dm_crypt], kcryptd_crypt [dm_crypt], kcryptd_crypt [dm_crypt], kcryptd_crypt 
[dm_crypt], kcryptd_crypt [dm_crypt], kcryptd_crypt [dm_crypt], kcryptd_crypt 
[dm_crypt], kcryptd_crypt [dm_crypt], kcryptd_crypt [dm_crypt], kcryptd_crypt 
[dm_crypt], kcryptd_crypt [dm_crypt], kcryptd_crypt [dm_crypt], kcryptd_crypt 
[dm_crypt], kcryptd_crypt [dm_crypt], kcryptd_crypt [dm_crypt], kcryptd_crypt 
[dm_crypt], kcryptd_crypt [dm_crypt], kcryptd_crypt [dm_crypt], kcryptd_crypt 
[dm_crypt], kcryptd_crypt [dm_crypt], kcryptd_crypt [dm_crypt], kcryptd_crypt 
[dm_crypt], kcryptd_crypt [dm_crypt], kcryptd_crypt [dm_crypt], kcryptd_crypt 
[dm_crypt], kcryptd_crypt [dm_crypt], kcryptd_crypt [dm_crypt], kcryptd_crypt 
[dm_crypt], kcryptd_crypt [dm_crypt], kcryptd_crypt [dm_crypt]
    [34830.962761] , kcryptd_crypt [dm_crypt], kcryptd_crypt [dm_crypt], 
kcryptd_crypt [dm_crypt], kcryptd_crypt [dm_crypt], kcryptd_crypt [dm_crypt], 
kcryptd_crypt [dm_crypt], kcryptd_crypt [dm_crypt], kcryptd_crypt [dm_crypt], 
kcryptd_crypt [dm_crypt], kcryptd_crypt [dm_crypt], kcryptd_crypt [dm_crypt], 
kcryptd_crypt [dm_crypt], kcryptd_crypt [dm_crypt], kcryptd_crypt [dm_crypt], 
kcryptd_crypt [dm_crypt], kcryptd_crypt [dm_crypt], kcryptd_crypt [dm_crypt], 
kcryptd_crypt [dm_crypt], kcryptd_crypt [dm_crypt], kcryptd_crypt [dm_crypt], 
kcryptd_crypt [dm_crypt], kcryptd_crypt [dm_crypt], kcryptd_crypt [dm_crypt], 
kcryptd_crypt [dm_crypt], kcryptd_crypt [dm_crypt], kcryptd_crypt [dm_crypt], 
kcryptd_crypt [dm_crypt], kcryptd_crypt [dm_crypt], kcryptd_crypt [dm_crypt], 
kcryptd_crypt [dm_crypt], kcryptd_crypt [dm_crypt]
    [34830.881008]   .nr_spread_over                : 0
    [34830.962862] , kcryptd_crypt [dm_crypt], kcryptd_crypt [dm_crypt], 
kcryptd_crypt [dm_crypt], kcryptd_crypt [dm_crypt], kcryptd_crypt [dm_crypt], 
kcryptd_crypt [dm_crypt], kcryptd_crypt [dm_crypt], kcryptd_crypt [dm_crypt], 
kcryptd_crypt [dm_crypt], kcryptd_crypt [dm_crypt], kcryptd_crypt [dm_crypt], 
kcryptd_crypt [dm_crypt], kcryptd_crypt [dm_crypt], kcryptd_crypt [dm_crypt], 
kcryptd_crypt [dm_crypt], kcryptd_crypt [dm_crypt], kcryptd_crypt [dm_crypt], 
kcryptd_crypt [dm_crypt], kcryptd_crypt [dm_crypt], kcryptd_crypt [dm_crypt], 
kcryptd_crypt [dm_crypt], kcryptd_crypt [dm_crypt], kcryptd_crypt [dm_crypt], 
kcryptd_crypt [dm_crypt], kcryptd_crypt [dm_crypt], kcryptd_crypt [dm_crypt], 
kcryptd_crypt [dm_crypt], kcryptd_crypt [dm_crypt], kcryptd_crypt [dm_crypt], 
kcryptd_crypt [dm_crypt], kcryptd_crypt [dm_crypt]
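
To put a number on how many kcryptd_crypt work items are queued, the lines above can be tallied by state. The Python sketch below assumes the workqueue dump format shown here, with state labels followed by a comma-separated item list that may continue on later lines. The `pending` label does not appear in this excerpt and is included only as an assumption about what else such a dump can contain; `count_work_items` is a hypothetical helper name.

```python
import re
from collections import Counter

# Either a state label ("in-flight:", "pending:", "delayed:") or one work item.
# "pending" is not present in the excerpt above; it is an assumed extra label.
TOKEN = re.compile(
    r"(?P<state>in-flight|pending|delayed):"
    r"|(?P<item>kcryptd_crypt \[dm_crypt\])"
)


def count_work_items(log_text):
    """Count kcryptd_crypt work items under the most recent state label."""
    # Collapse all whitespace so an item wrapped across two lines still matches.
    text = " ".join(log_text.split())
    counts = Counter()
    state = None
    for m in TOKEN.finditer(text):
        if m["state"]:
            state = m["state"]
        elif state is not None:
            counts[state] += 1
    return counts


# Shortened sample in the same shape as the output above.
sample = """\
[34830.962684]     in-flight: 3235:kcryptd_crypt [dm_crypt],
3237:kcryptd_crypt [dm_crypt]
[34830.962714]     delayed: kcryptd_crypt [dm_crypt], kcryptd_crypt
[dm_crypt], kcryptd_crypt [dm_crypt]
[34830.962761] , kcryptd_crypt [dm_crypt]
"""
print(count_work_items(sample))
```

Unrelated lines interleaved in the capture are skipped, since only state labels and kcryptd_crypt items are matched.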

Is there someone who can interpret all this? If it is helpful, I can
attach the full blocked-tasks output.

(kernel version is 4.15.0-62-generic)

-- 
You received this bug notification because you are a member of Ubuntu
Bugs, which is subscribed to Ubuntu.
https://bugs.launchpad.net/bugs/1690085

Title:
  Ryzen 1800X freeze - rcu_sched detected stalls on CPUs/tasks