[Bug 2146445] Re: qemu-system-x86: module upgrade fallback in /run/qemu/ broken — "Unknown driver 'rbd'" + crash on retry

Christian Ehrhardt Wed, 24 Jun 2026 04:01:46 -0700

** Description changed:

+ [ Impact ]
+ 
+  * An explanation of the effects of the bug on users and justification
+    for backporting the fix to the stable release.
+ 
+  * In addition, it is helpful, but not required, to include an
+    explanation of how the upload fixes this bug.
+ 
+ [ Test Plan ]
+ 
+  * detailed instructions how to reproduce the bug
+ 
+  * these should allow someone who is not familiar with the affected
+    package to reproduce the bug and verify that the updated package
+    fixes the problem.
+ 
+  * if other testing is appropriate to perform before landing this
+    update, this should also be described here.
+ 
+ [ Where problems could occur ]
+ 
+  * This is in the code for gpu in the virtio/virtgl context and only
+ there. This is a reasonable, but rare setup. If we missed a regression
+ one should look for those components (virtgl and virtio-gpu) in the bug
+ report to map it back to potentially be an issue caused by this.
+ 
+ [ Other Info ]
+ 
+  * n/a
+ 
  ## Package
  
  qemu-system-x86 (Ubuntu noble)
  
  ## Affects
  
  qemu (Ubuntu)
  
  ## Related bugs
  
  - LP #1847361 (Upgrade of qemu binaries causes running instances to be unable to hot-attach)
  - LP #1913421 (module retention improvements)
  
  ## Description
  
  ### Summary
  
  After upgrading QEMU packages on a compute node (e.g. from
  `1:8.2.2+ds-0ubuntu1.12` to `0ubuntu1.13`), long-running VM instances
  started with the older build can no longer hot-attach Ceph RBD volumes —
  even though `/run/qemu/` contains the retained modules for the old
  build.
  
  The first attach attempt fails with "Unknown driver 'rbd'". A second
  attempt crashes QEMU with an assertion failure.
  
  This is a regression in the module-retention mechanism introduced for LP
  #1847361.
  
  ### Root cause
  
  Two bugs in `util/module.c` (confirmed identical on current QEMU master
  as of 2026-03-26):
  
  **Bug A — module_load() does not fall back on build mismatch:**
  
  The directory search loop (lines 282–303) only continues to the next
  directory when the module file is not found (`ENOENT`). When the file
  exists but `module_load_dso()` fails (build mismatch), the loop hits
  `goto out` immediately — never reaching `/run/qemu/<version>/`.
  
  `CONFIG_MODULE_UPGRADES` is enabled in the Ubuntu noble build
  (`debian/rules`: `$(if ${enable-system},--enable-module-upgrades)`), so
  the `/run/qemu/<version>/` path is added to the search list — but it is
  never reached because the system path
  (`/usr/lib/x86_64-linux-gnu/qemu/`) contains the new build's modules,
  which exist but fail the stamp check.
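
The loop behaviour described above can be modelled with a small standalone sketch. This is not the code from `util/module.c`; `try_result`, `demo_try_load`, `search_dirs` and the `fall_back_on_mismatch` flag are hypothetical names introduced only to contrast the reported behaviour with the proposed one.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical outcome of trying one candidate directory. It stands in
 * for the file lookup plus module_load_dso() described in the report. */
typedef enum {
    TRY_NOT_FOUND,      /* module file missing (ENOENT) */
    TRY_BUILD_MISMATCH, /* file exists but fails the stamp check */
    TRY_LOADED          /* file exists and initialised */
} try_result;

typedef try_result (*try_load_fn)(const char *dir);

/* Demo stand-in: the system path holds the new build's module, the
 * retained directory holds the matching one. Paths are from the report. */
static try_result demo_try_load(const char *dir)
{
    if (strcmp(dir, "/usr/lib/x86_64-linux-gnu/qemu") == 0) {
        return TRY_BUILD_MISMATCH;
    }
    if (strcmp(dir, "/run/qemu/Debian_1_8.2.2+ds-0ubuntu1.12") == 0) {
        return TRY_LOADED;
    }
    return TRY_NOT_FOUND;
}

/* Returns the index of the directory that supplied the module, or -1. */
static int search_dirs(const char *const *dirs, size_t n_dirs,
                       try_load_fn try_load, bool fall_back_on_mismatch)
{
    for (size_t i = 0; i < n_dirs; i++) {
        switch (try_load(dirs[i])) {
        case TRY_LOADED:
            return (int)i;
        case TRY_NOT_FOUND:
            continue;           /* already handled: try the next directory */
        case TRY_BUILD_MISMATCH:
            if (fall_back_on_mismatch) {
                continue;       /* proposed: drop the error, keep searching */
            }
            return -1;          /* reported: "goto out" ends the search */
        }
    }
    return -1;
}
```

With the system path listed first, the search never reaches the retained directory unless `fall_back_on_mismatch` is set, which mirrors the "Unknown driver 'rbd'" symptom.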
  
  **Bug B — module_load_dso() leaks dso_init_list on failure:**
  
  When `g_module_open()` loads a `.so`, its constructors populate
  `dso_init_list`. On build mismatch, `g_module_close()` is called but
  `dso_init_list` is not drained. On the next module load attempt,
  `assert(QTAILQ_EMPTY(&dso_init_list))` fires and QEMU aborts.
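
The leak and the resulting assertion can be illustrated with a simplified model. It uses a plain linked list instead of QEMU's `QTAILQ`, and `pending_inits`, `register_init`, `drain_pending` and `load_attempt` are hypothetical stand-ins, so treat it as a sketch of the failure sequence rather than the actual code.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

typedef struct init_entry {
    struct init_entry *next;
} init_entry;

/* Stands in for dso_init_list. */
static init_entry *pending_inits;

/* Models a module constructor registering itself while the shared
 * object is being opened. */
static void register_init(void)
{
    init_entry *e = malloc(sizeof(*e));
    if (e == NULL) {
        abort();
    }
    e->next = pending_inits;
    pending_inits = e;
}

/* Removes and frees every pending entry. */
static void drain_pending(void)
{
    while (pending_inits != NULL) {
        init_entry *e = pending_inits;
        pending_inits = e->next;
        free(e);
    }
}

typedef enum { LOAD_OK, LOAD_MISMATCH, LOAD_WOULD_ASSERT } load_result;

/* Models one load attempt. LOAD_WOULD_ASSERT marks the point where the
 * real code would hit assert(QTAILQ_EMPTY(&dso_init_list)). */
static load_result load_attempt(bool stamp_ok, bool drain_on_failure)
{
    if (pending_inits != NULL) {
        return LOAD_WOULD_ASSERT;
    }
    register_init();            /* constructors run during the open */
    if (!stamp_ok) {
        if (drain_on_failure) {
            drain_pending();    /* proposed: clean up before closing */
        }
        return LOAD_MISMATCH;   /* the module is closed here */
    }
    drain_pending();            /* success path consumes the entries */
    return LOAD_OK;
}
```

Two consecutive mismatching attempts without draining give `LOAD_MISMATCH` followed by `LOAD_WOULD_ASSERT`, matching the "fails once, crashes on retry" sequence in the report.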
  
  ### Environment
  
  - Ubuntu 24.04 (noble), OpenStack compute nodes (Nova Victoria, libvirt/kvm, Cinder/Ceph RBD)
  - Kernel: `6.14.0-37-generic #37~24.04.1-Ubuntu SMP PREEMPT_DYNAMIC x86_64`
  - QEMU: `qemu-system-x86 1:8.2.2+ds-0ubuntu1.13`
  - libvirt: 10.0.0-2ubuntu8.12
  - AppArmor: enabled, no DENIED entries for `/run/qemu` or `block-rbd.so`
  - `/run/qemu` mounted as tmpfs (rw, no noexec)
  
  ### Observed symptoms
  
  **Instance started with QEMU 0ubuntu1.11, host upgraded to
  0ubuntu1.13:**
  
  Instance log (first attach attempt):
  ```
  failed to initialize module: /usr/lib/x86_64-linux-gnu/qemu/block-rbd.so
  Only modules from the same build can be loaded.
  ```
  
  libvirt:
  ```
  internal error: unable to execute QEMU command 'blockdev-add': Unknown driver 'rbd'
  ```
  
  VM continues running, but attach fails. `/proc/$PID/maps` shows no
  mapping of `block-rbd.so`.
  
  Second attempt — instance log:
  ```
  qemu-system-x86_64: util/module.c:165: module_load_dso: Assertion `QTAILQ_EMPTY(&dso_init_list)' failed.
  ```
  
  QEMU exits (`reason=crashed`), VM ends up in SHUTOFF state.
  
  At the time of the failure, the retained modules exist:
  ```
  /run/qemu/Debian_1_8.2.2+ds-0ubuntu1.12/block-rbd.so   (40312 bytes, readable)
  /run/qemu/Debian_1_8.2.2+ds-0ubuntu1.11/block-rbd.so   (40312 bytes, readable)
  ```
  
  This has been reproduced across multiple minor build upgrades
  (0ubuntu1.11→12 and 0ubuntu1.12→13).
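
The retained directory names listed above look like the package version string with unusual characters replaced by `_`. A minimal sketch of that mapping follows. It assumes the version string has the form `Debian 1:8.2.2+ds-0ubuntu1.12` and that only letters, digits and `+-.~` are kept; both are assumptions inferred from the paths in this report, and `retained_module_dir` is a hypothetical helper, not a QEMU function.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Assumed set of characters that survive unchanged in the directory name. */
static bool keep_char(char c)
{
    return (c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z') ||
           (c >= '0' && c <= '9') ||
           c == '+' || c == '-' || c == '.' || c == '~';
}

/* Writes "/run/qemu/<sanitised version>" into out.
 * Returns false if the buffer is too small. */
static bool retained_module_dir(const char *pkgversion,
                                char *out, size_t out_len)
{
    static const char prefix[] = "/run/qemu/";
    size_t need = strlen(prefix) + strlen(pkgversion) + 1;

    if (need > out_len) {
        return false;
    }
    strcpy(out, prefix);
    char *dst = out + strlen(prefix);
    for (const char *p = pkgversion; *p != '\0'; p++) {
        *dst++ = keep_char(*p) ? *p : '_';
    }
    *dst = '\0';
    return true;
}
```

This can help when checking by hand which retained directory a given running process would be expected to use.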
  
  ### Steps to reproduce
  
  1. Start an OpenStack instance on a compute node running QEMU `1:8.2.2+ds-0ubuntu1.X`. The instance must not use RBD at boot.
  2. Upgrade QEMU on the host to `0ubuntu1.(X+1)` while the instance keeps running.
  3. Verify `/run/qemu/Debian_1_8.2.2+ds-0ubuntu1.X/block-rbd.so` exists.
  4. Hot-attach a Cinder/Ceph RBD volume (`openstack server add volume`).
  5. First attempt: "Unknown driver 'rbd'".
  6. Second attempt: QEMU assertion crash.
  
  ### Impact
  
  - Long-running VMs that predate a QEMU package upgrade cannot hot-attach RBD volumes (or any other module-backed driver not already loaded).
  - Second attempt crashes the VM, causing unplanned downtime.
  - Defeats the purpose of the `/run/qemu/` module-retention mechanism (LP #1847361, LP #1913421).
  
  ### Proposed fix
  
  See upstream QEMU GitLab issue
  (https://gitlab.com/qemu-project/qemu/-/work_items/3354) for detailed
  code analysis and patch proposals. Summary:
  
  - **Bug A:** On `module_load_dso()` failure, clear the error and `continue` to the next directory instead of `goto out`.
  - **Bug B:** In `module_load_dso()`, drain `dso_init_list` before `g_module_close()` when the stamp check fails.
  
  Both fixes are against upstream `util/module.c` — the code is identical
  on current QEMU master.
  
  ### Current workaround
  
  Proactively reboot or live-migrate any instance whose running QEMU
  version (via QMP `query-version`) does not match the installed package
  version, before hot-attaching RBD volumes.
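
The version comparison behind this workaround could look roughly like the sketch below. It assumes the `package` string reported by `query-version` contains the full package version (for example `Debian 1:8.2.2+ds-0ubuntu1.11`) and that the installed version is obtained separately, for instance from the package manager. `same_build` and `needs_restart_before_hot_attach` are hypothetical helper names.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* True if `installed` appears in `running_pkg` as a whole token, so that
 * "...ubuntu1.1" does not match inside "...ubuntu1.11". */
static bool same_build(const char *running_pkg, const char *installed)
{
    size_t len = strlen(installed);

    if (len == 0) {
        return false;
    }
    for (const char *p = strstr(running_pkg, installed); p != NULL;
         p = strstr(p + 1, installed)) {
        bool starts_token = p == running_pkg || p[-1] == ' ' || p[-1] == '(';
        bool ends_token = p[len] == '\0' || p[len] == ' ' || p[len] == ')';
        if (starts_token && ends_token) {
            return true;
        }
    }
    return false;
}

/* True if the instance should be rebooted or live-migrated before a
 * hot-attach is attempted. */
static bool needs_restart_before_hot_attach(const char *running_pkg,
                                            const char *installed)
{
    return !same_build(running_pkg, installed);
}
```

An instance started under `0ubuntu1.11` on a host that now has `0ubuntu1.13` installed would be flagged by this check.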


** Description changed:

  [ Impact ]
  
   * An explanation of the effects of the bug on users and justification
     for backporting the fix to the stable release.
  
   * In addition, it is helpful, but not required, to include an
     explanation of how the upload fixes this bug.
  
  [ Test Plan ]
  
   * detailed instructions how to reproduce the bug
  
   * these should allow someone who is not familiar with the affected
     package to reproduce the bug and verify that the updated package
     fixes the problem.
  
   * if other testing is appropriate to perform before landing this
     update, this should also be described here.
  
  [ Where problems could occur ]
  
-  * This is in the code for gpu in the virtio/virtgl context and only
- there. This is a reasonable, but rare setup. If we missed a regression
- one should look for those components (virtgl and virtio-gpu) in the bug
- report to map it back to potentially be an issue caused by this.
+  * This makes loading modules after upgrades possible. Of the many
+    things qemu does, module loading is therefore the only path to
+    watch for regressions. The most common real-world case is
+    hot-attaching devices, which may load further modules late in
+    the lifecycle of a process.
  
  [ Other Info ]
  
   * n/a
  

-- 
You received this bug notification because you are a member of Ubuntu
Bugs, which is subscribed to Ubuntu.
https://bugs.launchpad.net/bugs/2146445

Title:
  qemu-system-x86: module upgrade fallback in /run/qemu/ broken —
  "Unknown driver 'rbd'" + crash on retry

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/qemu/+bug/2146445/+subscriptions


