** Description changed:

- In the Ubuntu 24's 6.8.0-31-generic kernel version, the capability
- https://github.com/torvalds/linux/commit/fd27ef6b44bec26915c5b2b22c13856d9f0ba17a
- (virtio-pci: Introduce admin virtqueue) was added. However, an issue was
- overlooked by the upstream community, which is that if the virtio device
- is not a modern virtio device, but a legacy virtio device, the is_avq
- function pointer is not assigned, resulting in a NULL pointer for the
- is_avq function pointer in the virtio_pci_device structure of the legacy
- virtio device. When unloading the virtio device, if the code calls if
- (vp_dev->is_avq(vdev, vq->index)), the RIP register of the CPU points to
- a NULL pointer address.
+ BugLink: https://bugs.launchpad.net/bugs/2067862
  
- I have noticed that the kernel community has already included a related
- solution, and I hope that the Ubuntu kernel can backport to support the
- remove operation for legacy virtio devices:
- https://github.com/torvalds/linux/commit/c8fae27d141a32a1624d0d0d5419d94252824498
- (virtio-pci: Check if is_avq is NULL).
+ [Impact]
+ 
+ If you detach a legacy virtio-pci device from a current Noble system, it
+ will cause a NULL pointer dereference and panic the system. This is an
+ issue if you force Noble to use legacy virtio-pci devices, or run Noble
+ on very old hypervisors that only support legacy virtio-pci devices,
+ e.g. Trusty and older.
+ 
+ BUG: kernel NULL pointer dereference, address: 0000000000000000
+ ...
+ CPU: 2 PID: 358 Comm: kworker/u8:3 Kdump: loaded Not tainted 6.8.0-31-generic #31-Ubuntu
+ Workqueue: kacpi_hotplug acpi_hotplug_work_fn
+ RIP: 0010:0x0
+ ...
+ Call Trace:
+  <TASK>
+  ? show_regs+0x6d/0x80
+  ? __die+0x24/0x80
+  ? page_fault_oops+0x99/0x1b0
+  ? do_user_addr_fault+0x2ee/0x6b0
+  ? exc_page_fault+0x83/0x1b0
+  ? asm_exc_page_fault+0x27/0x30
+  vp_del_vqs+0x6e/0x2a0
+  remove_vq_common+0x166/0x1a0
+  virtnet_remove+0x61/0x80
+  virtio_dev_remove+0x3f/0xc0
+  device_remove+0x40/0x80
+  device_release_driver_internal+0x20b/0x270
+  device_release_driver+0x12/0x20
+  bus_remove_device+0xcb/0x140
+  device_del+0x161/0x3e0
+  ? pci_bus_generic_read_dev_vendor_id+0x2c/0x1a0
+  device_unregister+0x17/0x60
+  unregister_virtio_device+0x16/0x40
+  virtio_pci_remove+0x43/0xa0
+  pci_device_remove+0x36/0xb0
+  device_remove+0x40/0x80
+  device_release_driver_internal+0x20b/0x270
+  device_release_driver+0x12/0x20
+  pci_stop_bus_device+0x7a/0xb0
+  pci_stop_and_remove_bus_device+0x12/0x30
+  disable_slot+0x4f/0xa0
+  acpiphp_disable_and_eject_slot+0x1c/0xa0
+  hotplug_event+0x11b/0x280
+  ? __pfx_acpiphp_hotplug_notify+0x10/0x10
+  acpiphp_hotplug_notify+0x27/0x70
+  acpi_device_hotplug+0xb6/0x300
+  acpi_hotplug_work_fn+0x1e/0x40
+  process_one_work+0x16c/0x350
+  worker_thread+0x306/0x440
+  ? _raw_spin_lock_irqsave+0xe/0x20
+  ? __pfx_worker_thread+0x10/0x10
+  kthread+0xef/0x120
+  ? __pfx_kthread+0x10/0x10
+  ret_from_fork+0x44/0x70
+  ? __pfx_kthread+0x10/0x10
+  ret_from_fork_asm+0x1b/0x30
+  </TASK>
+ 
+ The issue was introduced in:
+ 
+ commit fd27ef6b44bec26915c5b2b22c13856d9f0ba17a
+ Author: Feng Liu <fe...@nvidia.com>
+ Date: Tue Dec 19 11:32:40 2023 +0200
+ Subject: virtio-pci: Introduce admin virtqueue
+ Link: https://github.com/torvalds/linux/commit/fd27ef6b44bec26915c5b2b22c13856d9f0ba17a
+ 
+ Modern virtio-pci devices are not affected. For a legacy virtio device,
+ the is_avq function pointer in its virtio_pci_device structure is never
+ assigned, resulting in a NULL pointer dereference when vp_del_vqs()
+ evaluates if (vp_dev->is_avq(vdev, vq->index)).
+ 
+ There is no workaround. If you are affected, the only option for now is
+ to avoid detaching devices.
+ 
+ [Fix]
+ 
+ This was fixed in 6.9-rc1 by:
+ 
+ commit c8fae27d141a32a1624d0d0d5419d94252824498
+ From: Li Zhang <zhangliker...@gmail.com>
+ Date: Sat, 16 Mar 2024 13:25:54 +0800
+ Subject: virtio-pci: Check if is_avq is NULL
+ Link: https://github.com/torvalds/linux/commit/c8fae27d141a32a1624d0d0d5419d94252824498
+ 
+ This is a clean cherry-pick to Noble. The commit just adds a basic NULL
+ pointer check before the pointer is dereferenced.
+ 
+ [Testcase]
+ 
+ Start a fresh Noble VM.
+ 
+ Edit the grub kernel command line:
+ 
+ 1) sudo vim /etc/default/grub
+    GRUB_CMDLINE_LINUX_DEFAULT="virtio_pci.force_legacy=1"
+ 2) sudo update-grub
+ 3) sudo reboot
+ 
+ Outside the VM, on the host:
+ 
+ $ sudo -s
+ # qemu-img create -f qcow2 /root/share-device.qcow2 2G
+ # cat >> share-device.xml << EOF
+ <disk type='file' device='disk'>
+    <driver name='qemu' type='qcow2' cache='writeback' io='threads'/>
+    <source file='/root/share-device.qcow2'/>
+    <target dev='vdc' bus='virtio'/>
+ </disk>
+ EOF
+ # virsh attach-device noble-test share-device.xml --config --live
+ # virsh detach-device noble-test share-device.xml --config --live
+ 
+ A kernel panic should occur.
+ 
+ There is a test kernel available in:
+ 
+ https://launchpad.net/~mruffell/+archive/ubuntu/lp2067862-test
+ 
+ If you install it, the panic should no longer occur.
+ 
+ [Where problems could occur]
+ 
+ We are adding a basic NULL pointer check right before the pointer is
+ about to be used, which is quite low risk.
+ 
+ If a regression were to occur, it would only affect VMs using legacy
+ virtio-pci devices, which is not the default. It would potentially have
+ large impacts on fleets of very old hypervisors running Trusty, Precise
+ or Lucid, but that is very unlikely in this day and age.
+ 
+ [Other Info]
+ 
+ Upstream mailing list discussion and author testcase:
+ https://lore.kernel.org/kvm/CACGkMEs1t-ipP7TasHkKNKd=pevees6xdw1zssjkb-bc9et...@mail.gmail.com/T/#m167335bf7ab09b12fec3bdc5d46a30bc2e26cac7

--
You received this bug notification because you are a member of Ubuntu
Bugs, which is subscribed to the bug report.
https://bugs.launchpad.net/bugs/2067862

Title:
  Removing legacy virtio-pci devices causes kernel panic

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/2067862/+subscriptions

--
ubuntu-bugs mailing list
ubuntu-bugs@lists.ubuntu.com
https://lists.ubuntu.com/mailman/listinfo/ubuntu-bugs