** Summary changed:

- kernel panic after upgrading to kernel 5.13.0-23
+ amd_sfh: Null pointer dereference on early device init causes early panic and fails to boot

** Description changed:

- After upgrading my son's Asus PN50 with Ubuntu 21.10 to the latest
- kernel 5.13.0-23, I am no longer able to boot it normally. Kernel fails
- with the panic halfway through the boot process (which got overall
- suspiciously slow):
+ BugLink: https://bugs.launchpad.net/bugs/1956519
  
- [    1.359465] BUG: kernel NULL pointer dereference, address: 000000000000000c
- [    1.359498] #PF: supervisor write access in kernel mode
- [    1.359519] #PF: error_code(0x0002) - not-present page
- [    1.359540] PGD 0 P4D 0
- [    1.359553] Oops: 0002 [#1] SMP NOPTI
- [    1.359569] CPU: 0 PID: 175 Comm: systemd-udevd Not tainted 
5.13.0-23-generic #23-Ubuntu
- [    1.359602] Hardware name: ASUSTeK COMPUTER INC. MINIPC PN50/PN50, BIOS 
0623 05/13/2021
- [    1.359632] RIP: 0010:amd_sfh_hid_client_init+0x47/0x350 [amd_sfh]
- [    1.359661] Code: 00 53 48 83 ec 20 48 8b 5f 08 48 8b 07 48 8d b3 22 01 00 
00 4c 8d b0 c8 00 00 00 e8 23 07 00 00 45 31 c0 31 c9 ba 00 00 20 00 <89> 43 0c 
48 8d 83 68 01 00 00 48 8d bb 80 01 00 00 48 c7 c6 20 6d
- [    1.359729] RSP: 0018:ffffbf71c099f9d8 EFLAGS: 00010246
- [    1.359750] RAX: 0000000000000000 RBX: 0000000000000000 RCX: 
0000000000000000
- [    1.359777] RDX: 0000000000200000 RSI: ffffffffc03cd249 RDI: 
ffffffffa680004c
- [    1.359804] RBP: ffffbf71c099fa20 R08: 0000000000000000 R09: 
0000000000000006
- [    1.359831] R10: ffffbf71c0d00000 R11: 0000000000000007 R12: 
0000000fffffffe0
- [    1.359857] R13: ffff992bc3387cd8 R14: ffff992bc11560c8 R15: 
ffff992bc3387cd8
- [    1.359884] FS:  00007ff0ec1a48c0(0000) GS:ffff992ebf600000(0000) 
knlGS:0000000000000000
- [    1.359915] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
- [    1.359937] CR2: 000000000000000c CR3: 0000000102fd0000 CR4: 
0000000000350ef0
- [    1.359964] Call Trace:
- [    1.359976]  ? __pci_set_master+0x5f/0xe0
- [    1.359997]  amd_mp2_pci_probe+0xad/0x160 [amd_sfh]
- [    1.360021]  local_pci_probe+0x48/0x80
- [    1.360038]  pci_device_probe+0x105/0x1c0
- [    1.360056]  really_probe+0x24b/0x4c0
- [    1.360073]  driver_probe_device+0xf0/0x160
- [    1.360091]  device_driver_attach+0xab/0xb0
- [    1.360110]  __driver_attach+0xb2/0x140
- [    1.360126]  ? device_driver_attach+0xb0/0xb0
- [    1.360145]  bus_for_each_dev+0x7e/0xc0
- [    1.360161]  driver_attach+0x1e/0x20
- [    1.360177]  bus_add_driver+0x135/0x1f0
- [    1.360194]  driver_register+0x95/0xf0
- [    1.360210]  ? 0xffffffffc03d2000
- [    1.360225]  __pci_register_driver+0x57/0x60
- [    1.360242]  amd_mp2_pci_driver_init+0x23/0x1000 [amd_sfh]
- [    1.360266]  do_one_initcall+0x48/0x1d0
- [    1.360284]  ? kmem_cache_alloc_trace+0xfb/0x240
- [    1.360306]  do_init_module+0x62/0x290
- [    1.360323]  load_module+0xa8f/0xb10
- [    1.360340]  __do_sys_finit_module+0xc2/0x120
- [    1.360359]  __x64_sys_finit_module+0x18/0x20
- [    1.360377]  do_syscall_64+0x61/0xb0
- [    1.361638]  ? ksys_mmap_pgoff+0x135/0x260
- [    1.362883]  ? exit_to_user_mode_prepare+0x37/0xb0
- [    1.364121]  ? syscall_exit_to_user_mode+0x27/0x50
- [    1.365343]  ? __x64_sys_mmap+0x33/0x40
- [    1.366550]  ? do_syscall_64+0x6e/0xb0
- [    1.367749]  ? do_syscall_64+0x6e/0xb0
- [    1.368923]  ? do_syscall_64+0x6e/0xb0
- [    1.370079]  ? syscall_exit_to_user_mode+0x27/0x50
- [    1.371227]  ? do_syscall_64+0x6e/0xb0
- [    1.372359]  ? exc_page_fault+0x8f/0x170
- [    1.373478]  ? asm_exc_page_fault+0x8/0x30
- [    1.374584]  entry_SYSCALL_64_after_hwframe+0x44/0xae
- [    1.375684] RIP: 0033:0x7ff0ec73a94d
- [    1.376767] Code: 5b 41 5c c3 66 0f 1f 84 00 00 00 00 00 f3 0f 1e fa 48 89 
f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 
f0 ff ff 73 01 c3 48 8b 0d b3 64 0f 00 f7 d8 64 89 01 48
- [    1.377926] RSP: 002b:00007ffd00724ba8 EFLAGS: 00000246 ORIG_RAX: 
0000000000000139
- [    1.379076] RAX: ffffffffffffffda RBX: 000055e130084390 RCX: 
00007ff0ec73a94d
- [    1.380225] RDX: 0000000000000000 RSI: 00007ff0ec8ca3fe RDI: 
0000000000000005
- [    1.381363] RBP: 0000000000020000 R08: 0000000000000000 R09: 
0000000000000000
- [    1.382488] R10: 0000000000000005 R11: 0000000000000246 R12: 
00007ff0ec8ca3fe
- [    1.383598] R13: 000055e130083370 R14: 000055e130084480 R15: 
000055e130086cb0
- [    1.384698] Modules linked in: ahci(+) libahci i2c_piix4(+) r8169(+) 
amd_sfh(+) i2c_hid_acpi realtek i2c_hid xhci_pci(+) xhci_pci_renesas wmi(+) 
video(+) fjes(+) hid
- [    1.385841] CR2: 000000000000000c
- [    1.386955] ---[ end trace b2ebcacf74b788da ]---
- [    1.388064] RIP: 0010:amd_sfh_hid_client_init+0x47/0x350 [amd_sfh]
- [    1.389176] Code: 00 53 48 83 ec 20 48 8b 5f 08 48 8b 07 48 8d b3 22 01 00 
00 4c 8d b0 c8 00 00 00 e8 23 07 00 00 45 31 c0 31 c9 ba 00 00 20 00 <89> 43 0c 
48 8d 83 68 01 00 00 48 8d bb 80 01 00 00 48 c7 c6 20 6d
- [    1.390374] RSP: 0018:ffffbf71c099f9d8 EFLAGS: 00010246
- [    1.391560] RAX: 0000000000000000 RBX: 0000000000000000 RCX: 
0000000000000000
- [    1.392338] piix4_smbus 0000:00:14.0: Auxiliary SMBus Host Controller at 
0xb20
- [    1.392763] RDX: 0000000000200000 RSI: ffffffffc03cd249 RDI: 
ffffffffa680004c
- [    1.395162] RBP: ffffbf71c099fa20 R08: 0000000000000000 R09: 
0000000000000006
- [    1.396372] R10: ffffbf71c0d00000 R11: 0000000000000007 R12: 
0000000fffffffe0
- [    1.397564] R13: ffff992bc3387cd8 R14: ffff992bc11560c8 R15: 
ffff992bc3387cd8
- [    1.398754] FS:  00007ff0ec1a48c0(0000) GS:ffff992ebf600000(0000) 
knlGS:0000000000000000
- [    1.399916] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
- [    1.401044] CR2: 000000000000000c CR3: 0000000102fd0000 CR4: 
0000000000350ef0
+ [Impact]
  
- Previous kernel 5.13.0-22 works alright.
+ A regression was introduced in 5.13.0-23-generic for systems with AMD
+ Ryzen chipsets that incorporate AMD Sensor Fusion Hub (SFH) HID devices.
+ These are mostly Ryzen-based laptops, but desktops can have the SoC
+ embedded as well.
  
- ProblemType: Bug
- DistroRelease: Ubuntu 21.10
- Package: linux-image-5.13.0-23-generic 5.13.0-23.23
- ProcVersionSignature: Ubuntu 5.13.0-22.22-generic 5.13.19
- Uname: Linux 5.13.0-22-generic x86_64
- NonfreeKernelModules: zfs zunicode zavl icp zcommon znvpair
- ApportVersion: 2.20.11-0ubuntu71
- Architecture: amd64
- AudioDevicesInUse: Error: command ['fuser', '-v', '/dev/snd/by-id', 
'/dev/snd/controlC1', '/dev/snd/pcmC1D0c', '/dev/snd/controlC2', 
'/dev/snd/hwC2D0', '/dev/snd/pcmC2D0c', '/dev/snd/pcmC2D0p', 
'/dev/snd/by-path', '/dev/snd/controlC0', '/dev/snd/hwC0D0', 
'/dev/snd/pcmC0D9p', '/dev/snd/pcmC0D8p', '/dev/snd/pcmC0D7p', 
'/dev/snd/pcmC0D3p', '/dev/snd/seq', '/dev/snd/timer'] failed with exit code 1:
- CasperMD5CheckResult: unknown
- Date: Wed Jan  5 19:00:15 2022
- InstallationDate: Installed on 2021-01-01 (369 days ago)
- InstallationMedia: Ubuntu 20.10 "Groovy Gorilla" - Release amd64 (20201022)
- MachineType: ASUSTeK COMPUTER INC. MINIPC PN50
- ProcFB: 0 amdgpudrmfb
- ProcKernelCmdLine: BOOT_IMAGE=/BOOT/ubuntu_ct91lc@/vmlinuz-5.13.0-22-generic 
root=ZFS=rpool/ROOT/ubuntu_ct91lc ro quiet splash
- RelatedPackageVersions:
-  linux-restricted-modules-5.13.0-22-generic N/A
-  linux-backports-modules-5.13.0-22-generic  N/A
-  linux-firmware                             1.201.3
- SourcePackage: linux
- UpgradeStatus: Upgraded to impish on 2021-10-17 (80 days ago)
- WifiSyslog:
+ On early boot, when the driver initialises the device, it hits a null
+ pointer dereference with the following stack trace:
  
- dmi.bios.date: 05/13/2021
- dmi.bios.release: 6.23
- dmi.bios.vendor: ASUSTeK COMPUTER INC.
- dmi.bios.version: 0623
- dmi.board.asset.tag: Default string
- dmi.board.name: PN50
- dmi.board.vendor: ASUSTeK COMPUTER INC.
- dmi.board.version: To be filled by O.E.M.
- dmi.chassis.asset.tag: Default string
- dmi.chassis.type: 35
- dmi.chassis.vendor: Default string
- dmi.chassis.version: Default string
- dmi.modalias: 
dmi:bvnASUSTeKCOMPUTERINC.:bvr0623:bd05/13/2021:br6.23:svnASUSTeKCOMPUTERINC.:pnMINIPCPN50:pvr0623:rvnASUSTeKCOMPUTERINC.:rnPN50:rvrTobefilledbyO.E.M.:cvnDefaultstring:ct35:cvrDefaultstring:sku:
- dmi.product.family: Vivo PC
- dmi.product.name: MINIPC PN50
- dmi.product.version: 0623
- dmi.sys.vendor: ASUSTeK COMPUTER INC.
+ BUG: kernel NULL pointer dereference, address: 000000000000000c
+ #PF: supervisor write access in kernel mode
+ #PF: error_code(0x0002) - not-present page
+ PGD 0 P4D 0
+ Oops: 0002 [#1] SMP NOPTI
+ CPU: 0 PID: 175 Comm: systemd-udevd Not tainted 5.13.0-23-generic #23-Ubuntu
+ RIP: 0010:amd_sfh_hid_client_init+0x47/0x350 [amd_sfh]
+ Call Trace:
+   ? __pci_set_master+0x5f/0xe0
+   amd_mp2_pci_probe+0xad/0x160 [amd_sfh]
+   local_pci_probe+0x48/0x80
+   pci_device_probe+0x105/0x1c0
+   really_probe+0x24b/0x4c0
+   driver_probe_device+0xf0/0x160
+   device_driver_attach+0xab/0xb0
+   __driver_attach+0xb2/0x140
+   ? device_driver_attach+0xb0/0xb0
+   bus_for_each_dev+0x7e/0xc0
+   driver_attach+0x1e/0x20
+   bus_add_driver+0x135/0x1f0
+   driver_register+0x95/0xf0
+   ? 0xffffffffc03d2000
+   __pci_register_driver+0x57/0x60
+   amd_mp2_pci_driver_init+0x23/0x1000 [amd_sfh]
+   do_one_initcall+0x48/0x1d0
+   ? kmem_cache_alloc_trace+0xfb/0x240
+   do_init_module+0x62/0x290
+   load_module+0xa8f/0xb10
+   __do_sys_finit_module+0xc2/0x120
+   __x64_sys_finit_module+0x18/0x20
+   do_syscall_64+0x61/0xb0
+   ? ksys_mmap_pgoff+0x135/0x260
+   ? exit_to_user_mode_prepare+0x37/0xb0
+   ? syscall_exit_to_user_mode+0x27/0x50
+   ? __x64_sys_mmap+0x33/0x40
+   ? do_syscall_64+0x6e/0xb0
+   ? do_syscall_64+0x6e/0xb0
+   ? do_syscall_64+0x6e/0xb0
+   ? syscall_exit_to_user_mode+0x27/0x50
+   ? do_syscall_64+0x6e/0xb0
+   ? exc_page_fault+0x8f/0x170
+   ? asm_exc_page_fault+0x8/0x30
+   entry_SYSCALL_64_after_hwframe+0x44/0xae
+ 
+ This causes a panic, and the system is unable to continue booting. The
+ user must select an older kernel to boot.
+ 
+ [Fix]
+ 
+ The issue was introduced in 5.13.0-23-generic by the commit:
+ 
+ commit d46ef750ed58cbeeba2d9a55c99231c30a172764
+ commit-impish 56559d7910e704470ad72da58469b5588e8cbf85
+ Author: Evgeny Novikov <novi...@ispras.ru>
+ Date: Tue Jun 1 19:38:01 2021 +0300
+ Subject: HID: amd_sfh: Fix potential NULL pointer dereference
+ Link: https://github.com/torvalds/linux/commit/d46ef750ed58cbeeba2d9a55c99231c30a172764
+ 
+ The issue is straightforward: amd_sfh_hid_client_init() in
+ amd_sfh_client.c dereferences cl_data while it is still NULL:
+ 
+ $ eu-addr2line -ifae ./usr/lib/debug/lib/modules/5.13.0-23-generic/kernel/drivers/hid/amd-sfh-hid/amd_sfh.ko amd_sfh_hid_client_init+0x47
+ 0x0000000000000767
+ amd_sfh_hid_client_init
+ /build/linux-k2e9CH/linux-5.13.0/drivers/hid/amd-sfh-hid/amd_sfh_client.c:147:27
+ 
+ 134 int amd_sfh_hid_client_init(struct amd_mp2_dev *privdata)
+ 135 {
+ ...
+ 146
+ 147 cl_data->num_hid_devices = amd_mp2_get_sensor_num(privdata, &cl_data->sensor_idx[0]);
+ 148
+ ...
+ 
+ The regression commit moves the call to amd_sfh_hid_client_init() to
+ before privdata->cl_data is allocated by devm_kzalloc(), so cl_data is
+ still NULL when it is dereferenced:
+ 
+ +       rc = amd_sfh_hid_client_init(privdata);
+ +       if (rc)
+ +               return rc;
+ +
+         privdata->cl_data = devm_kzalloc(&pdev->dev, sizeof(struct amdtp_cl_data), GFP_KERNEL);
+         if (!privdata->cl_data)
+                 return -ENOMEM;
+ ...
+ -       return amd_sfh_hid_client_init(privdata);
+ +       return 0;
+ 
+ The issue was fixed upstream in 5.15-rc4 by the commit:
+ 
+ commit 88a04049c08cd62e698bc1b1af2d09574b9e0aee
+ Author: Basavaraj Natikar <basavaraj.nati...@amd.com>
+ Date: Thu Sep 23 17:59:27 2021 +0530
+ Subject: HID: amd_sfh: Fix potential NULL pointer dereference
+ Link: https://github.com/torvalds/linux/commit/88a04049c08cd62e698bc1b1af2d09574b9e0aee
+ 
+ The fix moves the call to amd_sfh_hid_client_init() to after
+ privdata->cl_data is allocated, but still before
+ devm_add_action_or_reset(). This resolves the regression and also fixes
+ the original null pointer dereference that both commits were written to
+ address.
+ 
+ This patch also landed in 5.14.10 -stable, but it appears to have been
+ omitted from the backports to impish. It shares the exact same subject
+ line as the regression commit, so it was likely dropped as a duplicate.
+ 
+ [Testcase]
+ 
+ You need an AMD Ryzen based system that has an AMD Sensor Fusion Hub HID
+ device built in to test this.
+ 
+ Simply booting the system is enough to trigger the issue.
+ 
+ A test kernel is available in the following ppa:
+ 
+ https://launchpad.net/~mruffell/+archive/ubuntu/lp1956519-test
+ 
+ A community user has tested the test kernel, and has confirmed that it
+ fixes the issue.
+ 
+ [Where problems could occur]
+ 
+ If a regression were to occur, it would only affect AMD Ryzen based
+ systems with the AMD Sensor Fusion Hub HID device SOC. Since the changes
+ affect the device initialisation function, a regression could cause
+ systems to panic during boot, forcing users to revert to older kernels
+ to start their systems.
+ 
+ That said, the patch is present in 5.15-rc4 and 5.14.10, is in
+ widespread use, and is already present in Jammy.

-- 
You received this bug notification because you are a member of Ubuntu
Bugs, which is subscribed to Ubuntu.
https://bugs.launchpad.net/bugs/1956519

Title:
  amd_sfh: Null pointer dereference on early device init causes early
  panic and fails to boot

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1956519/+subscriptions


-- 
ubuntu-bugs mailing list
ubuntu-bugs@lists.ubuntu.com
https://lists.ubuntu.com/mailman/listinfo/ubuntu-bugs
