Public bug reported:

SRU Justification

[Impact]
Systems that use the Intel Data Accelerator Driver (IDXD) may hit a kernel NULL
pointer dereference when reading the op_config attribute of an idxd WQ if the
WQ does not offer the op_config capability.

On a DGXH100 system, this can be reproduced by running:
$ cat /sys/devices/pci0000\:e7/0000\:e7\:02.0/iax3/wq3.7/op_config

This affects 5.15.0-112-generic, and derivative kernels based on that
generic version.

[Fix]

Author: Jacob Martin <jacob.mar...@canonical.com>
Date:   Tue Jun 11 11:48:32 2024 -0500

    UBUNTU: SAUCE: dmaengine: idxd: set is_visible member of idxd_wq_attribute_group
    
    BugLink: ...
    
    The backport of commit b0325aefd398 ("dmaengine: idxd: add WQ operation
    cap restriction support") for K5.15 omitted a line setting the
    is_visible callback of idxd_wq_attribute_group to the
    idxd_wq_attr_visible function introduced in the same commit.
    
    This results in the op_config attribute being accessible from userspace
    when the underlying wq->opcap_bmap pointer used to service reads from it
    is uninitialized, leading to a NULL pointer dereference when the
    op_config attribute is read. Resolve this by setting the is_visible
    callback as the upstream commit does.
    
    Signed-off-by: Jacob Martin <jacob.mar...@canonical.com>

This patch adds the line that sets the is_visible callback of
idxd_wq_attribute_group to idxd_wq_attr_visible, the function introduced by
the Jammy K5.15 backport of commit b0325aefd398 ("dmaengine: idxd: add WQ
operation cap restriction support"). The backport does not set this callback,
but the upstream version does, so this fix simply brings the backport in sync
with the upstream commit.
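
To illustrate why the missing callback matters, here is a minimal userspace
sketch in C. It is not the kernel code: the model_* types, the two example
groups, and attr_is_exposed() are hypothetical stand-ins that only mimic how
an attribute group's is_visible callback is consulted. The assumption being
modelled is that when the callback is unset, every attribute in the group is
exposed with its default mode.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical userspace stand-ins for the kernel structures. */
struct model_wq {
    const unsigned long *opcap_bmap;  /* NULL when op_config is unsupported */
};

struct model_attr {
    const char *name;
    unsigned int mode;                /* default file mode, e.g. 0644 */
};

struct model_attr_group {
    const struct model_attr *const *attrs;  /* NULL-terminated list */
    /* Returns the mode to expose, or 0 to hide the attribute. */
    unsigned int (*is_visible)(const struct model_wq *wq,
                               const struct model_attr *attr);
};

static const struct model_attr attr_size = { "size", 0644 };
static const struct model_attr attr_op_config = { "op_config", 0644 };
static const struct model_attr *const wq_attrs[] = {
    &attr_size, &attr_op_config, NULL,
};

/* Models the visibility check: hide op_config when there is no bitmap. */
unsigned int model_wq_attr_visible(const struct model_wq *wq,
                                   const struct model_attr *attr)
{
    if (attr == &attr_op_config && wq->opcap_bmap == NULL)
        return 0;
    return attr->mode;
}

/* Group as in the incomplete backport: callback left unset. */
const struct model_attr_group group_without_cb = {
    .attrs = wq_attrs,
};

/* Group as after the fix: callback wired up. */
const struct model_attr_group group_with_cb = {
    .attrs = wq_attrs,
    .is_visible = model_wq_attr_visible,
};

/* Returns 1 if the named attribute would be exposed for this WQ, else 0. */
int attr_is_exposed(const struct model_attr_group *grp,
                    const struct model_wq *wq, const char *name)
{
    for (size_t i = 0; grp->attrs[i] != NULL; i++) {
        const struct model_attr *attr = grp->attrs[i];
        unsigned int mode = attr->mode;

        if (strcmp(attr->name, name) != 0)
            continue;
        if (grp->is_visible)          /* unset callback: keep default mode */
            mode = grp->is_visible(wq, attr);
        return mode != 0;
    }
    return 0;
}
```

In this model, a WQ with a NULL opcap_bmap still exposes op_config through
group_without_cb, which corresponds to the readable but invalid attribute
described above. With group_with_cb the attribute is hidden for that WQ, while
other attributes and WQs that do have a bitmap are unaffected.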

[Test Case]
Verified on DGXH100 that the patch "UBUNTU: SAUCE: dmaengine: idxd: set
is_visible member of idxd_wq_attribute_group" resolves the issue. With the
patch applied, no instances of op_config are present under /sys, so the
attribute cannot be read when it is invalid to do so on this system.

[Regression Potential]
There is a low risk of regression:
* This is specific to systems using IDXD.
* This patch brings us closer in line with the upstream change.

[Other]
The Mantic 6.5 and Noble 6.8 kernels already carry the upstream version of
commit b0325aefd398 ("dmaengine: idxd: add WQ operation cap restriction
support"), which was introduced in v6.1. These kernels set the is_visible
callback, so they are unaffected by this issue. Only Jammy K5.15 needs this
fix.

-----------------------
On a DGXH100 system, this can be reproduced by running:
$ cat /sys/devices/pci0000\:e7/0000\:e7\:02.0/iax3/wq3.7/op_config

$ dmesg
...
[  236.620986] BUG: kernel NULL pointer dereference, address: 0000000000000018
[  236.628829] #PF: supervisor read access in kernel mode
[  236.634615] #PF: error_code(0x0000) - not-present page
[  236.640404] PGD 1eff19067 P4D 0 
[  236.644049] Oops: 0000 [#1] SMP NOPTI
[  236.648180] CPU: 117 PID: 8857 Comm: cat Tainted: G           OE     5.15.0-112-generic #122-Ubuntu
[  236.658361] Hardware name: NVIDIA DGXH100/DGXH100, BIOS 1.1.3 10/30/2023
[  236.665901] RIP: 0010:op_cap_show_common+0x33/0x110 [idxd]
[  236.672095] Code: 41 57 49 89 f7 41 56 4c 8d 72 10 41 55 49 89 d5 41 54 53 31 db 48 83 ec 18 48 89 7d c0 65 48 8b 04 25 28 00 00 00 48 89 45 d0 <48> 8b 42 18 48 89 45 c8 89 de 4c 8d 45 c8 b9 40 00 00 00 4c 89 ff
[  236.693194] RSP: 0018:ff85e6e6b43f3bf8 EFLAGS: 00010292
[  236.699084] RAX: a91c7a5c2727bd00 RBX: 0000000000000000 RCX: 0000000000000000
[  236.707118] RDX: 0000000000000000 RSI: ff401fb55265d000 RDI: ff4020b52b7f8040
[  236.715146] RBP: ff85e6e6b43f3c38 R08: ff4020b52b7f8040 R09: ff401fb55265d000
[  236.723170] R10: 000000000000000b R11: 0000000000000000 R12: ffffffffb9ad1f60
[  236.731198] R13: 0000000000000000 R14: 0000000000000010 R15: ff401fb55265d000
[  236.739221] FS:  00007fdae7c99740(0000) GS:ff4020b0bf340000(0000) knlGS:0000000000000000
[  236.748328] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  236.754800] CR2: 0000000000000018 CR3: 000000017be08004 CR4: 0000000000771ee0
[  236.762836] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[  236.770863] DR3: 0000000000000000 DR6: 00000000fffe07f0 DR7: 0000000000000400
[  236.778893] PKRU: 55555554
[  236.781954] Call Trace:
[  236.784720]  <TASK>
[  236.787098]  ? show_trace_log_lvl+0x1d6/0x2ea
[  236.792017]  ? show_trace_log_lvl+0x1d6/0x2ea
[  236.796929]  ? wq_op_config_show+0x18/0x20 [idxd]
[  236.802231]  ? show_regs.part.0+0x23/0x29
[  236.806743]  ? __die_body.cold+0x8/0xd
[  236.810969]  ? __die+0x2b/0x37
[  236.814405]  ? page_fault_oops+0x13b/0x170
[  236.819029]  ? do_user_addr_fault+0x321/0x670
[  236.823937]  ? page_counter_try_charge+0x34/0xc0
[  236.829143]  ? exc_page_fault+0x77/0x170
[  236.833569]  ? asm_exc_page_fault+0x27/0x30
[  236.838287]  ? op_cap_show_common+0x33/0x110 [idxd]
[  236.843785]  wq_op_config_show+0x18/0x20 [idxd]
[  236.848891]  dev_attr_show+0x1a/0x50
[  236.852925]  sysfs_kf_seq_show+0xa2/0x100
[  236.857448]  kernfs_seq_show+0x24/0x30
[  236.861678]  seq_read_iter+0x121/0x4b0
[  236.865910]  ? __alloc_pages+0x17e/0x330
[  236.870339]  kernfs_fop_read_iter+0x30/0x40
[  236.875058]  new_sync_read+0x10a/0x190
[  236.879279]  vfs_read+0x103/0x1a0
[  236.883022]  ksys_read+0x67/0xf0
[  236.886665]  __x64_sys_read+0x19/0x20
[  236.890796]  x64_sys_call+0x1dba/0x1fa0
[  236.895130]  do_syscall_64+0x56/0xb0
[  236.899167]  ? handle_mm_fault+0xd8/0x2c0
[  236.903686]  ? do_user_addr_fault+0x1e7/0x670
[  236.908599]  ? do_syscall_64+0x63/0xb0
[  236.912828]  ? exit_to_user_mode_prepare+0x37/0xb0
[  236.918232]  ? irqentry_exit_to_user_mode+0x17/0x20
[  236.923726]  ? irqentry_exit+0x1d/0x30
[  236.927957]  ? exc_page_fault+0x89/0x170
[  236.932378]  entry_SYSCALL_64_after_hwframe+0x67/0xd1
[  236.938071] RIP: 0033:0x7fdae7db07e2
[  236.942107] Code: c0 e9 b2 fe ff ff 50 48 8d 3d 8a b4 0c 00 e8 a5 1d 02 00 0f 1f 44 00 00 f3 0f 1e fa 64 8b 04 25 18 00 00 00 85 c0 75 10 0f 05 <48> 3d 00 f0 ff ff 77 56 c3 0f 1f 44 00 00 48 83 ec 28 48 89 54 24
[  236.963206] RSP: 002b:00007ffd26fe4b48 EFLAGS: 00000246 ORIG_RAX: 0000000000000000
[  236.971720] RAX: ffffffffffffffda RBX: 0000000000020000 RCX: 00007fdae7db07e2
[  236.979747] RDX: 0000000000020000 RSI: 00007fdae792e000 RDI: 0000000000000003
[  236.987780] RBP: 00007fdae792e000 R08: 00007fdae792d010 R09: 00007fdae792d010
[  236.995812] R10: 0000000000000022 R11: 0000000000000246 R12: 0000000000022000
[  237.003843] R13: 0000000000000003 R14: 0000000000020000 R15: 0000000000020000
[  237.011876]  </TASK>
[  237.014349] Modules linked in: intel_rapl_msr intel_rapl_common i10nm_edac nfit x86_pkg_temp_thermal intel_powerclamp coretemp kvm_intel binfmt_misc kvm ipmi_ssif nls_iso8859_1 rapl qat_4xxx intel_th_gth idxd intel_qat mlx5_ib(OE) isst_if_mmio pmt_telemetry isst_if_mbox_pci pmt_crashlog intel_th_pci ib_uverbs(OE) joydev input_leds isst_if_common pmt_class idxd_bus authenc mei_me ib_core(OE) intel_th mei switchtec acpi_ipmi ipmi_si ipmi_devintf ipmi_msghandler mac_hid sch_fq_codel dm_multipath scsi_dh_rdac scsi_dh_emc scsi_dh_alua efi_pstore ip_tables x_tables autofs4 btrfs blake2b_generic zstd_compress raid10 raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx xor raid6_pq libcrc32c raid1 raid0 multipath linear hid_lenovo hid_generic usbhid hid ast mlx5_core(OE) i2c_algo_bit drm_vram_helper drm_ttm_helper ttm drm_kms_helper crct10dif_pclmul syscopyarea crc32_pclmul ghash_clmulni_intel sysfillrect sha256_ssse3 sysimgblt mlxdevm(OE) fb_sys_fops sha1_ssse3 mlxfw(OE)
[  237.014410]  aesni_intel crypto_simd psample cryptd ixgbe cec tls xfrm_algo rc_core mlx_compat(OE) xhci_pci dca i2c_i801 nvme intel_pmt drm pci_hyperv_intf mdio xhci_pci_renesas i2c_smbus i2c_ismt nvme_core wmi pinctrl_emmitsburg
[  237.134526] CR2: 0000000000000018
[  237.138263] ---[ end trace 58ef1dd45abd6934 ]---
[  237.461968] RIP: 0010:op_cap_show_common+0x33/0x110 [idxd]
[  237.468149] Code: 41 57 49 89 f7 41 56 4c 8d 72 10 41 55 49 89 d5 41 54 53 31 db 48 83 ec 18 48 89 7d c0 65 48 8b 04 25 28 00 00 00 48 89 45 d0 <48> 8b 42 18 48 89 45 c8 89 de 4c 8d 45 c8 b9 40 00 00 00 4c 89 ff
[  237.489246] RSP: 0018:ff85e6e6b43f3bf8 EFLAGS: 00010292
[  237.495128] RAX: a91c7a5c2727bd00 RBX: 0000000000000000 RCX: 0000000000000000
[  237.503158] RDX: 0000000000000000 RSI: ff401fb55265d000 RDI: ff4020b52b7f8040
[  237.511187] RBP: ff85e6e6b43f3c38 R08: ff4020b52b7f8040 R09: ff401fb55265d000
[  237.519217] R10: 000000000000000b R11: 0000000000000000 R12: ffffffffb9ad1f60
[  237.527246] R13: 0000000000000000 R14: 0000000000000010 R15: ff401fb55265d000
[  237.535275] FS:  00007fdae7c99740(0000) GS:ff4020b0bf340000(0000) knlGS:0000000000000000
[  237.544379] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  237.550845] CR2: 0000000000000018 CR3: 000000017be08004 CR4: 0000000000771ee0
[  237.558875] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[  237.566904] DR3: 0000000000000000 DR6: 00000000fffe07f0 DR7: 0000000000000400
[  237.574933] PKRU: 55555554

** Affects: linux (Ubuntu)
     Importance: Undecided
     Assignee: Jacob Martin (jacobmartin)
         Status: New

** Affects: linux-nvidia (Ubuntu)
     Importance: Undecided
     Assignee: Jacob Martin (jacobmartin)
         Status: New

** Affects: linux (Ubuntu Jammy)
     Importance: Undecided
     Assignee: Jacob Martin (jacobmartin)
         Status: New

** Affects: linux-nvidia (Ubuntu Jammy)
     Importance: Undecided
     Assignee: Jacob Martin (jacobmartin)
         Status: New

** Also affects: linux (Ubuntu Jammy)
   Importance: Undecided
       Status: New

** Changed in: linux (Ubuntu Jammy)
     Assignee: (unassigned) => Jacob Martin (jacobmartin)

** Also affects: linux-nvidia (Ubuntu)
   Importance: Undecided
       Status: New

** Changed in: linux-nvidia (Ubuntu)
     Assignee: (unassigned) => Jacob Martin (jacobmartin)

** Changed in: linux-nvidia (Ubuntu Jammy)
     Assignee: (unassigned) => Jacob Martin (jacobmartin)

https://bugs.launchpad.net/bugs/2069081

Title:
  idxd: NULL pointer dereference reading wq op_config attribute
