Public bug reported:

I have a network of stock Ubuntu 16.04.2 LTS (Xenial Xerus) servers
running the entirely unmodified "4.4.0-62-generic #83-Ubuntu" kernel on
a private network. They run telemetry programs, mostly sh/php out of
crontab, with very light user interaction for configuration via apache
and extremely occasional administrator ssh access.  They are all on the
same hardware: same motherboard, same amount of RAM, and very similar,
very small SATA SSD disks.

A recent fault made us examine the logs, and we see that since 2017
about half a dozen servers have been reporting kernel bugs about once a
month.

 BUG: unable to handle kernel paging request at ffff88032fc00062
 CPU: 0 PID: 26071 Comm: mkdir Not tainted 4.4.0-62-generic #83-Ubuntu

The details vary.  The most common command is mkdir, but rm, head,
basename, ls and sleep also appear. (Cron jobs run sh scripts every
minute, and those scripts use these commands.)

About half of the logs show tainted (G, D) and half untainted.

I have found no pattern with time of day, uptime, load (0.16, 0.22, 0.25
for the report below), or day of week.
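For anyone who wants to repeat that pattern search on their own logs, here is a minimal sketch of one way to tally the oopses. It assumes each syslog record is on a single unwrapped line in the layout quoted in this report; the function name tally_oopses and the output keys are hypothetical, and this is not the script actually used for the observations above.

```python
import re
from collections import Counter

# Assumed syslog layout, taken from the entry quoted in this report:
#   "Jan 29 19:50:17 hostname kernel: [2315584.884470] <message>"
PREFIX = re.compile(
    r"^\w{3}\s+\d+\s+(?P<hour>\d{2}):\d{2}:\d{2}\s+\S+\s+kernel:\s+"
    r"\[\s*(?P<uptime>\d+)\.\d+\]\s+(?P<msg>.*)$"
)
COMM = re.compile(r"Comm:\s+(?P<comm>\S+)\s+(?P<taint>Not tainted|Tainted:)")
BUG = "BUG: unable to handle kernel paging request"


def tally_oopses(lines):
    """Count paging-request oopses by command, hour, taint state and uptime (days)."""
    stats = {
        "comm": Counter(),
        "hour": Counter(),
        "taint": Counter(),
        "uptime_days": [],
    }
    pending = False  # True between a BUG line and its "CPU: ... Comm:" line
    for line in lines:
        m = PREFIX.match(line.rstrip("\n"))
        if not m:
            continue  # not a kernel record in the assumed layout
        msg = m.group("msg")
        if msg.startswith(BUG):
            pending = True
            continue
        c = COMM.search(msg)
        if pending and c:
            stats["comm"][c.group("comm")] += 1
            stats["hour"][int(m.group("hour"))] += 1
            tainted = c.group("taint").startswith("Tainted")
            stats["taint"]["tainted" if tainted else "not tainted"] += 1
            # The bracketed kernel timestamp is seconds since boot.
            stats["uptime_days"].append(int(m.group("uptime")) // 86400)
            pending = False
    return stats
```

It can be fed an open log file, for example tally_oopses(open("/var/log/syslog")), and the counters compared across machines. Syslog timestamps carry no year, so day-of-week analysis would need the year supplied from the rotated file's date.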

This is a typical syslog entry, from 2021-01-29; the same machine shows
the same issue in March and May (Comm: mkdir, but tainted G D).

Jan 29 19:50:17 hostname kernel: [2315584.884470] BUG: unable to handle kernel paging request at ffff88042fc80062
Jan 29 19:50:17 hostname kernel: [2315584.884500] IP: [<ffffffff811af629>] __inc_zone_state+0x19/0x60
Jan 29 19:50:17 hostname kernel: [2315584.884524] PGD 220b067 PUD 0
Jan 29 19:50:17 hostname kernel: [2315584.884538] Oops: 0002 [#1] SMP
Jan 29 19:50:17 hostname kernel: [2315584.884552] Modules linked in: ppdev snd_hda_codec_realtek snd_hda_codec_generic snd_hda_codec_hdmi snd_hda_intel coretemp snd_hda_codec serio_raw snd_hda_core snd_hwdep snd_pcm snd_timer snd lpc_ich shpchp soundcore parport_pc mac_hid 8250_fintek parport ib_iser rdma_cm iw_cm ib_cm ib_sa ib_mad ib_core ib_addr iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi autofs4 btrfs raid10 raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx xor raid6_pq libcrc32c raid1 raid0 multipath linear psmouse ahci e1000e libahci ptp pps_core video fjes
Jan 29 19:50:17 hostname kernel: [2315584.884744] CPU: 1 PID: 10730 Comm: mkdir Not tainted 4.4.0-62-generic #83-Ubuntu
Jan 29 19:50:17 hostname kernel: [2315584.884760] Hardware name:                  /PD11TI, BIOS MTCDT10N.85T.0201.2014.1209.1030 12/09/2014
Jan 29 19:50:17 hostname kernel: [2315584.884779] task: ffff880034c13fc0 ti: ffff8800c8838000 task.ti: ffff8800c8838000
Jan 29 19:50:17 hostname kernel: [2315584.884795] RIP: 0010:[<ffffffff811af629>]  [<ffffffff811af629>] __inc_zone_state+0x19/0x60
Jan 29 19:50:17 hostname kernel: [2315584.884816] RSP: 0000:ffff8800c883bc28  EFLAGS: 00010203
Jan 29 19:50:17 hostname kernel: [2315584.884842] RAX: 0000000000000001 RBX: ffffea000285d540 RCX: 00000002ffffffff
Jan 29 19:50:17 hostname kernel: [2315584.884878] RDX: 0000000300000062 RSI: 0000000000000021 RDI: ffffea000285d540
Jan 29 19:50:17 hostname kernel: [2315584.884915] RBP: ffff8800c883bc28 R08: ffffffff81cd2dc4 R09: ffffffff81cd2db3
Jan 29 19:50:17 hostname kernel: [2315584.884951] R10: 0000000000000000 R11: ffffffff81cd2da2 R12: ffff88012fff7f80
Jan 29 19:50:17 hostname kernel: [2315584.884987] R13: 0000000000800000 R14: ffffea000285d500 R15: ffff88012fff77c0
Jan 29 19:50:17 hostname kernel: [2315584.885027] FS:  00007fa2813a1800(0000) GS:ffff88012fc80000(0000) knlGS:0000000000000000
Jan 29 19:50:17 hostname kernel: [2315584.885065] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Jan 29 19:50:17 hostname kernel: [2315584.885088] CR2: ffff88042fc80062 CR3: 0000000034f10000 CR4: 00000000000006e0
Jan 29 19:50:17 hostname kernel: [2315584.885125] Stack:
Jan 29 19:50:17 hostname kernel: [2315584.885144]  ffff8800c883bcf0 ffffffff811af98d ffff88012fff96c0 00000001df4d6b62
Jan 29 19:50:17 hostname kernel: [2315584.885186]  ffff880035327a10 ffff880035327a00 ffff8800c97a7628 ffff8800c97a7628
Jan 29 19:50:17 hostname kernel: [2315584.885227]  00000000df4d6b62 ffff880035327a00 ffff88012fff96d0 0000000000000000
Jan 29 19:50:17 hostname kernel: [2315584.885269] Call Trace:
Jan 29 19:50:17 hostname kernel: [2315584.885296]  [<ffffffff811af98d>] zone_statistics+0x5d/0xa0
Jan 29 19:50:17 hostname kernel: [2315584.885324]  [<ffffffff81198d29>] __alloc_pages_nodemask+0x159/0x2a0
Jan 29 19:50:17 hostname kernel: [2315584.885355]  [<ffffffff811e3f5e>] alloc_pages_vma+0xbe/0x240
Jan 29 19:50:17 hostname kernel: [2315584.885383]  [<ffffffff811c1e11>] handle_mm_fault+0x1491/0x1820
Jan 29 19:50:17 hostname kernel: [2315584.885410]  [<ffffffff811c9563>] ? do_mmap+0x333/0x420
Jan 29 19:50:17 hostname kernel: [2315584.885436]  [<ffffffff811adc2b>] ? vm_mmap_pgoff+0xbb/0xe0
Jan 29 19:50:17 hostname kernel: [2315584.885464]  [<ffffffff8106b4f7>] __do_page_fault+0x197/0x400
Jan 29 19:50:17 hostname kernel: [2315584.885490]  [<ffffffff8106b782>] do_page_fault+0x22/0x30
Jan 29 19:50:17 hostname kernel: [2315584.885517]  [<ffffffff8183a778>] page_fault+0x28/0x30
Jan 29 19:50:17 hostname kernel: [2315584.885541] Code: 41 5c 41 5d 41 5e 41 5f 5d c3 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 48 8b 4f 58 89 f6 55 b8 01 00 00 00 48 89 e5 48 8d 54 31 42 <65> 0f c0 02 83 c0 01 65 8a 49 41 38 c8 7f 02 5d c3 d0 f9 0f be
Jan 29 19:50:17 hostname kernel: [2315584.885736] RIP  [<ffffffff811af629>] __inc_zone_state+0x19/0x60
Jan 29 19:50:17 hostname kernel: [2315584.885763]  RSP <ffff8800c883bc28>
Jan 29 19:50:17 hostname kernel: [2315584.885783] CR2: ffff88042fc80062
Jan 29 19:50:17 hostname kernel: [2315584.886024] ---[ end trace f32f2db37ef9c9df ]---
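
As a cross-check on the trace above, the register dump is self-consistent with the faulting instruction. In the Code: line, the bytes 48 8d 54 31 42 decode as lea 0x42(%rcx,%rsi,1),%rdx, and the marked bytes 65 0f c0 02 decode as xadd %al,%gs:(%rdx). The short sketch below only reproduces that arithmetic from the values in the log; the function names are hypothetical, and what the RCX value means is left open here.

```python
MASK64 = (1 << 64) - 1  # registers are 64 bits wide

# Values copied from the oops quoted in this report.
GS_BASE = 0xFFFF88012FC80000
RCX = 0x00000002FFFFFFFF
RSI = 0x0000000000000021
RDX = 0x0000000300000062
CR2 = 0xFFFF88042FC80062


def lea_result(rcx, rsi):
    """Result of lea 0x42(%rcx,%rsi,1),%rdx."""
    return (rcx + rsi + 0x42) & MASK64


def fault_address(gs_base, rdx):
    """Effective address of a %gs:(%rdx) memory operand."""
    return (gs_base + rdx) & MASK64


if __name__ == "__main__":
    print("lea matches RDX:", lea_result(RCX, RSI) == RDX)
    print("GS + RDX matches CR2:", fault_address(GS_BASE, RDX) == CR2)
```

Both comparisons hold for this entry, so the bad address in CR2 is the GS base plus an offset derived from RCX. RCX was loaded by the preceding mov 0x58(%rdi),%rcx (bytes 48 8b 4f 58), so comparing RCX and RDI across the other logged oopses might show whether the same value recurs.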


I have several years of logs on multiple machines for this and will be happy to
supply whatever information is necessary.  As these machines are in service, it
is difficult to do experiments such as running different kernels, but I'll
consider any requests carefully.

Please ask for whatever logs/information might be helpful.

Thanks in advance for any help.

Jonathan

** Affects: linux (Ubuntu)
     Importance: Undecided
         Status: New

-- 
You received this bug notification because you are a member of Ubuntu
Bugs, which is subscribed to Ubuntu.
https://bugs.launchpad.net/bugs/1928405

Title:
  mkdir/rm/sleep/ls causes kernel 'BUG: unable to handle kernel paging
  request'

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1928405/+subscriptions

-- 
ubuntu-bugs mailing list
ubuntu-bugs@lists.ubuntu.com
https://lists.ubuntu.com/mailman/listinfo/ubuntu-bugs
