Public bug reported:

[Impact]

libvirt's virHostCPUParseNode() (src/util/virhostcpu.c) derives the host
socket count from the maximum physical_package_id value read from
/sys/devices/system/cpu/cpu*/topology/physical_package_id.

On systems where physical_package_id is not contiguous or zero-based (for
example NVIDIA GB200), these identifiers can be very large, arbitrary
numbers (e.g. 256123234).
libvirt then allocates an array (cores_maps) sized to that maximum value and 
creates one virBitmap per slot.
With a package id of ~10^9 this becomes a multi-gigabyte allocation plus ~10^9 
allocation calls, causing excessive memory use, long CPU time, and OOM / denial 
of service whenever libvirt inspects host CPU topology (virsh capabilities, 
virsh nodeinfo, domain start, etc.).
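
A rough back-of-the-envelope estimate of the array alone may help gauge the
scale. This is only a sketch: it assumes 8-byte pointers and one array slot
for every id from 0 up to the maximum, and the helper name is hypothetical
(it is not part of libvirt). The per-bitmap allocations come on top of this
figure.

```python
# Rough estimate only: assumes 64-bit pointers and one array slot per
# id in the range 0..max_package_id. Per-bitmap heap overhead is ignored.
POINTER_SIZE = 8  # bytes; assumption for a 64-bit host


def cores_maps_array_bytes(max_package_id: int) -> int:
    """Bytes needed for a pointer array indexed directly by package id."""
    slots = max_package_id + 1
    return slots * POINTER_SIZE


if __name__ == "__main__":
    for pkg_id in (1, 256123234, 999999999):
        size = cores_maps_array_bytes(pkg_id)
        print(f"max id {pkg_id}: {size / 1e9:.2f} GB for the array alone")
```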

[Test Plan]
1. Prepare a fresh VM.
2. Install libvirt:
     sudo apt-get update
     sudo apt-get install -y libvirt-daemon-system libvirt-clients

3. Give CPU 0 a large physical_package_id and query host node info
(sysfs is read-only, so the value is supplied with a bind mount):

     ppath=/sys/devices/system/cpu/cpu0/topology/physical_package_id
     echo 999999999 | sudo tee /tmp/ppid
     sudo mount --bind /tmp/ppid "$ppath"
     sudo virsh nodeinfo
     sudo umount "$ppath"

4. Result:
- Without the fix

virHostCPUParseNode() derives the socket count from the package id, so 
cores_maps is sized for ~10^9 sockets (an 8 GB array) and the daemon then 
begins allocating a bitmap per socket.
The libvirt daemon's memory balloons until the kernel OOM-killer terminates it.
"virsh nodeinfo" fails:
     error: Disconnected from qemu:///system due to end of file
     error: failed to get node information
     error: End of file while reading data: Input/output error

...
     Out of memory: Killed process <pid> (libvirtd) total-vm:~15 GB

- With the fix

The large id is counted as a single unique socket and "virsh nodeinfo" returns
the host information normally ("CPU socket(s): 1").

[Where problems could occur]

The change is limited to virHostCPUParseNode() in src/util/virhostcpu.c.
Instead of using the maximum physical_package_id value as the socket count, it 
now counts unique physical_package_id values and maps them to sequential socket 
indexes using a GHashTable.
If a regression occurs, it would most likely appear as incorrect host CPU 
topology reporting (such as sockets, cores, or threads in virsh nodeinfo or 
capabilities output).
The change does not affect guest handling, migration, or any on-disk state.
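
The unique-id approach described above can be sketched as follows. This is
an illustration in Python, not the upstream C code: a dict stands in for the
GHashTable, the function name is hypothetical, and assigning indexes in
first-seen order is an assumption of the sketch.

```python
def map_package_ids(package_ids):
    """Map each distinct physical_package_id to a sequential socket index.

    Returns (socket_count, index_by_id). Indexes are handed out in
    first-seen order, which is an assumption of this sketch.
    """
    index_by_id = {}
    for pkg_id in package_ids:
        if pkg_id not in index_by_id:
            # The next free index equals the number of ids seen so far.
            index_by_id[pkg_id] = len(index_by_id)
    return len(index_by_id), index_by_id


if __name__ == "__main__":
    # Four CPUs spread over two packages with large, sparse ids.
    ids = [999999999, 999999999, 256123234, 256123234]
    count, mapping = map_package_ids(ids)
    print(count, mapping)
```

With this scheme, per-socket storage scales with the number of distinct ids
rather than with the largest id value.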

[Other Info]

Fixed upstream in libvirt 12.1.0 by commit
a64367115015df58e0d82635a40d76df56144c60 ("util: Fix max socket calculation"):
https://github.com/libvirt/libvirt/commit/a64367115015df58e0d82635a40d76df56144c60

Affected Ubuntu releases (libvirt < 12.1.0):

focal     6.0.0-0ubuntu8
jammy     8.0.0-1ubuntu7
noble     10.0.0-2ubuntu8
questing  11.6.0-1ubuntu3
resolute  12.0.0-1ubuntu5

** Affects: libvirt (Ubuntu)
     Importance: Undecided
         Status: New

** Affects: libvirt (Ubuntu Focal)
     Importance: Undecided
         Status: New

** Affects: libvirt (Ubuntu Jammy)
     Importance: Undecided
         Status: New

** Affects: libvirt (Ubuntu Noble)
     Importance: Undecided
         Status: New

** Affects: libvirt (Ubuntu Resolute)
     Importance: Undecided
         Status: New

** Also affects: libvirt (Ubuntu Stonking)
   Importance: Undecided
       Status: New

** Also affects: libvirt (Ubuntu Resolute)
   Importance: Undecided
       Status: New

** Also affects: libvirt (Ubuntu Noble)
   Importance: Undecided
       Status: New

** Also affects: libvirt (Ubuntu Jammy)
   Importance: Undecided
       Status: New

** Also affects: libvirt (Ubuntu Focal)
   Importance: Undecided
       Status: New

** No longer affects: libvirt (Ubuntu Stonking)

-- 
You received this bug notification because you are a member of Ubuntu
Bugs, which is subscribed to Ubuntu.
https://bugs.launchpad.net/bugs/2153530

Title:
  libvirt: excessive memory allocation / OOM when physical_package_id is
  large
