** Summary changed:

- kernel panic after upgrading to kernel 5.13.0-23
+ amd_sfh: Null pointer dereference on early device init causes early panic and fails to boot

** Description changed:

- After upgrading my son's Asus PN50 with Ubuntu 21.10 to the latest
- kernel 5.13.0-23, I am no longer able to boot it normally. Kernel fails
- with the panic halfway through the boot process (which got overall
- suspiciously slow):
+ BugLink: https://bugs.launchpad.net/bugs/1956519

- [ 1.359465] BUG: kernel NULL pointer dereference, address: 000000000000000c
- [ 1.359498] #PF: supervisor write access in kernel mode
- [ 1.359519] #PF: error_code(0x0002) - not-present page
- [ 1.359540] PGD 0 P4D 0
- [ 1.359553] Oops: 0002 [#1] SMP NOPTI
- [ 1.359569] CPU: 0 PID: 175 Comm: systemd-udevd Not tainted 5.13.0-23-generic #23-Ubuntu
- [ 1.359602] Hardware name: ASUSTeK COMPUTER INC. MINIPC PN50/PN50, BIOS 0623 05/13/2021
- [ 1.359632] RIP: 0010:amd_sfh_hid_client_init+0x47/0x350 [amd_sfh]
- [ 1.359661] Code: 00 53 48 83 ec 20 48 8b 5f 08 48 8b 07 48 8d b3 22 01 00 00 4c 8d b0 c8 00 00 00 e8 23 07 00 00 45 31 c0 31 c9 ba 00 00 20 00 <89> 43 0c 48 8d 83 68 01 00 00 48 8d bb 80 01 00 00 48 c7 c6 20 6d
- [ 1.359729] RSP: 0018:ffffbf71c099f9d8 EFLAGS: 00010246
- [ 1.359750] RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000000
- [ 1.359777] RDX: 0000000000200000 RSI: ffffffffc03cd249 RDI: ffffffffa680004c
- [ 1.359804] RBP: ffffbf71c099fa20 R08: 0000000000000000 R09: 0000000000000006
- [ 1.359831] R10: ffffbf71c0d00000 R11: 0000000000000007 R12: 0000000fffffffe0
- [ 1.359857] R13: ffff992bc3387cd8 R14: ffff992bc11560c8 R15: ffff992bc3387cd8
- [ 1.359884] FS: 00007ff0ec1a48c0(0000) GS:ffff992ebf600000(0000) knlGS:0000000000000000
- [ 1.359915] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
- [ 1.359937] CR2: 000000000000000c CR3: 0000000102fd0000 CR4: 0000000000350ef0
- [ 1.359964] Call Trace:
- [ 1.359976] ? __pci_set_master+0x5f/0xe0
- [ 1.359997] amd_mp2_pci_probe+0xad/0x160 [amd_sfh]
- [ 1.360021] local_pci_probe+0x48/0x80
- [ 1.360038] pci_device_probe+0x105/0x1c0
- [ 1.360056] really_probe+0x24b/0x4c0
- [ 1.360073] driver_probe_device+0xf0/0x160
- [ 1.360091] device_driver_attach+0xab/0xb0
- [ 1.360110] __driver_attach+0xb2/0x140
- [ 1.360126] ? device_driver_attach+0xb0/0xb0
- [ 1.360145] bus_for_each_dev+0x7e/0xc0
- [ 1.360161] driver_attach+0x1e/0x20
- [ 1.360177] bus_add_driver+0x135/0x1f0
- [ 1.360194] driver_register+0x95/0xf0
- [ 1.360210] ? 0xffffffffc03d2000
- [ 1.360225] __pci_register_driver+0x57/0x60
- [ 1.360242] amd_mp2_pci_driver_init+0x23/0x1000 [amd_sfh]
- [ 1.360266] do_one_initcall+0x48/0x1d0
- [ 1.360284] ? kmem_cache_alloc_trace+0xfb/0x240
- [ 1.360306] do_init_module+0x62/0x290
- [ 1.360323] load_module+0xa8f/0xb10
- [ 1.360340] __do_sys_finit_module+0xc2/0x120
- [ 1.360359] __x64_sys_finit_module+0x18/0x20
- [ 1.360377] do_syscall_64+0x61/0xb0
- [ 1.361638] ? ksys_mmap_pgoff+0x135/0x260
- [ 1.362883] ? exit_to_user_mode_prepare+0x37/0xb0
- [ 1.364121] ? syscall_exit_to_user_mode+0x27/0x50
- [ 1.365343] ? __x64_sys_mmap+0x33/0x40
- [ 1.366550] ? do_syscall_64+0x6e/0xb0
- [ 1.367749] ? do_syscall_64+0x6e/0xb0
- [ 1.368923] ? do_syscall_64+0x6e/0xb0
- [ 1.370079] ? syscall_exit_to_user_mode+0x27/0x50
- [ 1.371227] ? do_syscall_64+0x6e/0xb0
- [ 1.372359] ? exc_page_fault+0x8f/0x170
- [ 1.373478] ? asm_exc_page_fault+0x8/0x30
- [ 1.374584] entry_SYSCALL_64_after_hwframe+0x44/0xae
- [ 1.375684] RIP: 0033:0x7ff0ec73a94d
- [ 1.376767] Code: 5b 41 5c c3 66 0f 1f 84 00 00 00 00 00 f3 0f 1e fa 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d b3 64 0f 00 f7 d8 64 89 01 48
- [ 1.377926] RSP: 002b:00007ffd00724ba8 EFLAGS: 00000246 ORIG_RAX: 0000000000000139
- [ 1.379076] RAX: ffffffffffffffda RBX: 000055e130084390 RCX: 00007ff0ec73a94d
- [ 1.380225] RDX: 0000000000000000 RSI: 00007ff0ec8ca3fe RDI: 0000000000000005
- [ 1.381363] RBP: 0000000000020000 R08: 0000000000000000 R09: 0000000000000000
- [ 1.382488] R10: 0000000000000005 R11: 0000000000000246 R12: 00007ff0ec8ca3fe
- [ 1.383598] R13: 000055e130083370 R14: 000055e130084480 R15: 000055e130086cb0
- [ 1.384698] Modules linked in: ahci(+) libahci i2c_piix4(+) r8169(+) amd_sfh(+) i2c_hid_acpi realtek i2c_hid xhci_pci(+) xhci_pci_renesas wmi(+) video(+) fjes(+) hid
- [ 1.385841] CR2: 000000000000000c
- [ 1.386955] ---[ end trace b2ebcacf74b788da ]---
- [ 1.388064] RIP: 0010:amd_sfh_hid_client_init+0x47/0x350 [amd_sfh]
- [ 1.389176] Code: 00 53 48 83 ec 20 48 8b 5f 08 48 8b 07 48 8d b3 22 01 00 00 4c 8d b0 c8 00 00 00 e8 23 07 00 00 45 31 c0 31 c9 ba 00 00 20 00 <89> 43 0c 48 8d 83 68 01 00 00 48 8d bb 80 01 00 00 48 c7 c6 20 6d
- [ 1.390374] RSP: 0018:ffffbf71c099f9d8 EFLAGS: 00010246
- [ 1.391560] RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000000
- [ 1.392338] piix4_smbus 0000:00:14.0: Auxiliary SMBus Host Controller at 0xb20
- [ 1.392763] RDX: 0000000000200000 RSI: ffffffffc03cd249 RDI: ffffffffa680004c
- [ 1.395162] RBP: ffffbf71c099fa20 R08: 0000000000000000 R09: 0000000000000006
- [ 1.396372] R10: ffffbf71c0d00000 R11: 0000000000000007 R12: 0000000fffffffe0
- [ 1.397564] R13: ffff992bc3387cd8 R14: ffff992bc11560c8 R15: ffff992bc3387cd8
- [ 1.398754] FS: 00007ff0ec1a48c0(0000) GS:ffff992ebf600000(0000) knlGS:0000000000000000
- [ 1.399916] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
- [ 1.401044] CR2: 000000000000000c CR3: 0000000102fd0000 CR4: 0000000000350ef0
+ [Impact]

- Previous kernel 5.13.0-22 works alright.
+ A regression was introduced into 5.13.0-23-generic for devices using AMD
+ Ryzen chipsets that incorporate AMD Sensor Fusion Hub (SFH) HID devices,
+ which are mostly Ryzen based laptops, but desktops do have the SOC
+ embedded as well.

- ProblemType: Bug
- DistroRelease: Ubuntu 21.10
- Package: linux-image-5.13.0-23-generic 5.13.0-23.23
- ProcVersionSignature: Ubuntu 5.13.0-22.22-generic 5.13.19
- Uname: Linux 5.13.0-22-generic x86_64
- NonfreeKernelModules: zfs zunicode zavl icp zcommon znvpair
- ApportVersion: 2.20.11-0ubuntu71
- Architecture: amd64
- AudioDevicesInUse: Error: command ['fuser', '-v', '/dev/snd/by-id', '/dev/snd/controlC1', '/dev/snd/pcmC1D0c', '/dev/snd/controlC2', '/dev/snd/hwC2D0', '/dev/snd/pcmC2D0c', '/dev/snd/pcmC2D0p', '/dev/snd/by-path', '/dev/snd/controlC0', '/dev/snd/hwC0D0', '/dev/snd/pcmC0D9p', '/dev/snd/pcmC0D8p', '/dev/snd/pcmC0D7p', '/dev/snd/pcmC0D3p', '/dev/snd/seq', '/dev/snd/timer'] failed with exit code 1:
- CasperMD5CheckResult: unknown
- Date: Wed Jan 5 19:00:15 2022
- InstallationDate: Installed on 2021-01-01 (369 days ago)
- InstallationMedia: Ubuntu 20.10 "Groovy Gorilla" - Release amd64 (20201022)
- MachineType: ASUSTeK COMPUTER INC. MINIPC PN50
- ProcFB: 0 amdgpudrmfb
- ProcKernelCmdLine: BOOT_IMAGE=/BOOT/ubuntu_ct91lc@/vmlinuz-5.13.0-22-generic root=ZFS=rpool/ROOT/ubuntu_ct91lc ro quiet splash
- RelatedPackageVersions:
-  linux-restricted-modules-5.13.0-22-generic N/A
-  linux-backports-modules-5.13.0-22-generic N/A
-  linux-firmware 1.201.3
- SourcePackage: linux
- UpgradeStatus: Upgraded to impish on 2021-10-17 (80 days ago)
- WifiSyslog:
+ On early boot, when the driver initialises the device, it hits a null
+ pointer dereference with the following stack trace:

- dmi.bios.date: 05/13/2021
- dmi.bios.release: 6.23
- dmi.bios.vendor: ASUSTeK COMPUTER INC.
- dmi.bios.version: 0623
- dmi.board.asset.tag: Default string
- dmi.board.name: PN50
- dmi.board.vendor: ASUSTeK COMPUTER INC.
- dmi.board.version: To be filled by O.E.M.
- dmi.chassis.asset.tag: Default string
- dmi.chassis.type: 35
- dmi.chassis.vendor: Default string
- dmi.chassis.version: Default string
- dmi.modalias: dmi:bvnASUSTeKCOMPUTERINC.:bvr0623:bd05/13/2021:br6.23:svnASUSTeKCOMPUTERINC.:pnMINIPCPN50:pvr0623:rvnASUSTeKCOMPUTERINC.:rnPN50:rvrTobefilledbyO.E.M.:cvnDefaultstring:ct35:cvrDefaultstring:sku:
- dmi.product.family: Vivo PC
- dmi.product.name: MINIPC PN50
- dmi.product.version: 0623
- dmi.sys.vendor: ASUSTeK COMPUTER INC.
+ BUG: kernel NULL pointer dereference, address: 000000000000000c
+ #PF: supervisor write access in kernel mode
+ #PF: error_code(0x0002) - not-present page
+ PGD 0 P4D 0
+ Oops: 0002 [#1] SMP NOPTI
+ CPU: 0 PID: 175 Comm: systemd-udevd Not tainted 5.13.0-23-generic #23-Ubuntu
+ RIP: 0010:amd_sfh_hid_client_init+0x47/0x350 [amd_sfh]
+ Call Trace:
+ ? __pci_set_master+0x5f/0xe0
+ amd_mp2_pci_probe+0xad/0x160 [amd_sfh]
+ local_pci_probe+0x48/0x80
+ pci_device_probe+0x105/0x1c0
+ really_probe+0x24b/0x4c0
+ driver_probe_device+0xf0/0x160
+ device_driver_attach+0xab/0xb0
+ __driver_attach+0xb2/0x140
+ ? device_driver_attach+0xb0/0xb0
+ bus_for_each_dev+0x7e/0xc0
+ driver_attach+0x1e/0x20
+ bus_add_driver+0x135/0x1f0
+ driver_register+0x95/0xf0
+ ? 0xffffffffc03d2000
+ __pci_register_driver+0x57/0x60
+ amd_mp2_pci_driver_init+0x23/0x1000 [amd_sfh]
+ do_one_initcall+0x48/0x1d0
+ ? kmem_cache_alloc_trace+0xfb/0x240
+ do_init_module+0x62/0x290
+ load_module+0xa8f/0xb10
+ __do_sys_finit_module+0xc2/0x120
+ __x64_sys_finit_module+0x18/0x20
+ do_syscall_64+0x61/0xb0
+ ? ksys_mmap_pgoff+0x135/0x260
+ ? exit_to_user_mode_prepare+0x37/0xb0
+ ? syscall_exit_to_user_mode+0x27/0x50
+ ? __x64_sys_mmap+0x33/0x40
+ ? do_syscall_64+0x6e/0xb0
+ ? do_syscall_64+0x6e/0xb0
+ ? do_syscall_64+0x6e/0xb0
+ ? syscall_exit_to_user_mode+0x27/0x50
+ ? do_syscall_64+0x6e/0xb0
+ ? exc_page_fault+0x8f/0x170
+ ? asm_exc_page_fault+0x8/0x30
+ entry_SYSCALL_64_after_hwframe+0x44/0xae
+
+ This causes a panic and the system is unable to continue booting, and
+ the user must select an older kernel to boot.
+
+ [Fix]
+
+ The issue was introduced in 5.13.0-23-generic by the commit:
+
+ commit d46ef750ed58cbeeba2d9a55c99231c30a172764
+ commit-impish 56559d7910e704470ad72da58469b5588e8cbf85
+ Author: Evgeny Novikov <novi...@ispras.ru>
+ Date: Tue Jun 1 19:38:01 2021 +0300
+ Subject: HID: amd_sfh: Fix potential NULL pointer dereference
+ Link: https://github.com/torvalds/linux/commit/d46ef750ed58cbeeba2d9a55c99231c30a172764
+
+ The issue is pretty straightforward, amd_sfh_client.c attempts to
+ dereference cl_data, but it is NULL:
+
+ $ eu-addr2line -ifae ./usr/lib/debug/lib/modules/5.13.0-23-generic/kernel/drivers/hid/amd-sfh-hid/amd_sfh.ko amd_sfh_hid_client_init+0x47
+ 0x0000000000000767
+ amd_sfh_hid_client_init
+ /build/linux-k2e9CH/linux-5.13.0/drivers/hid/amd-sfh-hid/amd_sfh_client.c:147:27
+
+ 134 int amd_sfh_hid_client_init(struct amd_mp2_dev *privdata)
+ 135 {
+ ...
+ 146
+ 147 cl_data->num_hid_devices = amd_mp2_get_sensor_num(privdata, &cl_data->sensor_idx[0]);
+ 148
+ ...
+
+ The patch moves the call to amd_sfh_hid_client_init() before
+ privdata->cl_data is actually allocated by devm_kzalloc, hence cl_data
+ being NULL.
+
+ + rc = amd_sfh_hid_client_init(privdata);
+ + if (rc)
+ +     return rc;
+ +
+   privdata->cl_data = devm_kzalloc(&pdev->dev, sizeof(struct amdtp_cl_data), GFP_KERNEL);
+   if (!privdata->cl_data)
+       return -ENOMEM;
+   ...
+ - return amd_sfh_hid_client_init(privdata);
+ + return 0;
+
+ The issue was fixed upstream in 5.15-rc4 by the commit:
+
+ commit 88a04049c08cd62e698bc1b1af2d09574b9e0aee
+ Author: Basavaraj Natikar <basavaraj.nati...@amd.com>
+ Date: Thu Sep 23 17:59:27 2021 +0530
+ Subject: HID: amd_sfh: Fix potential NULL pointer dereference
+ Link: https://github.com/torvalds/linux/commit/88a04049c08cd62e698bc1b1af2d09574b9e0aee
+
+ The fix places the call to amd_sfh_hid_client_init() after
+ privdata->cl_data is allocated, and it changes the order of
+ amd_sfh_hid_client_init() to happen before devm_add_action_or_reset(),
+ fixing the actual null pointer dereference which caused these commits to
+ exist.
+
+ This patch also landed in 5.14.10 -stable, but it seems it was omitted
+ from being backported to impish, likely because it shares the exact same
+ subject line as the regression commit and was dropped as a duplicate.
+
+ [Testcase]
+
+ You need an AMD Ryzen based system that has an AMD Sensor Fusion Hub HID
+ device built in to test this.
+
+ Simply booting the system is enough to trigger the issue.
+
+ A test kernel is available in the following ppa:
+
+ https://launchpad.net/~mruffell/+archive/ubuntu/lp1956519-test
+
+ A community user has tested the test kernel, and has confirmed that it
+ fixes the issue.
+
+ [Where problems could occur]
+
+ If a regression were to occur, it would only affect AMD Ryzen based
+ systems with the AMD Sensor Fusion Hub HID device SOC.
+ Since the changes affect the device initialisation function, a
+ regression could cause systems to panic during boot, forcing users to
+ revert to older kernels to start their systems.
+
+ That said, the patch is present in 5.15-rc4 and 5.14.10, is in
+ widespread use, and is already present in Jammy.