Rebased, reviewed, and build-tested on PPA [1].
Uploaded to Focal.

[1] https://launchpad.net/~mfo/+archive/ubuntu/lp1999814

** Description changed:

+ [ Impact ]
+ 
+  * Live migration is increasingly being impacted by changes to CPU flags
+    (e.g., 'xsaves' disabled on AMD EPYC; PKRU/'xsave' behavior changes),
+    which prevent migration between otherwise identical hypervisors whose
+    only difference is a CPU flag (i.e., source hypervisor still has flag
+    enabled; destination hypervisor had flag disabled on a kernel update).
+     
+  * These CPU flag updates require changes to CPU model definitions in
+    several places (qemu, libvirt, and nova if OpenStack is being used),
+    which is a lot of overhead for each subtle variation that may appear.
+    
+  * Fortunately, it's possible to reduce the changes required by allowing
+    nova to customize CPU flags to enable/disable _on top_ of a CPU model
+    definition (e.g., the same AMD EPYC CPU model with 'xsaves' disabled).
+ 
+  * This change is present in Jammy and later, and is backward compatible
+    with the existing config files, as the (new) enable/disable operators 
+    are an optional prefix to existing flags (e.g., '-xsaves' or '+xsaves').
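
A minimal sketch of the optional-prefix semantics described above, assuming
a comma-separated flag list. The function name parse_extra_flags is
hypothetical and this is not nova's actual code; it only illustrates why
un-prefixed flags in existing config files keep their meaning (enable).

```python
def parse_extra_flags(raw):
    """Split a comma-separated flag list into (enabled, disabled) sets.

    Hypothetical helper for illustration only, not nova's implementation.
    A leading '-' disables a flag; a leading '+' or no prefix enables it.
    """
    enabled, disabled = set(), set()
    for item in raw.split(","):
        flag = item.strip().lower()
        if not flag:
            continue  # ignore empty entries such as a trailing comma
        if flag.startswith("-"):
            disabled.add(flag[1:])
        elif flag.startswith("+"):
            enabled.add(flag[1:])
        else:
            # Pre-existing syntax: a bare flag name means "enable".
            enabled.add(flag)
    return enabled, disabled


if __name__ == "__main__":
    print(parse_extra_flags("pcid, +ssbd, -xsaves"))
```

Under these assumptions, a bare 'pcid' and a prefixed '+pcid' are treated
identically, which is what makes the new syntax backward compatible.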
+ 
+ [ Test Plan ]
+ 
+  * Deploy OpenStack with 2 hypervisors (or more), and configure nova.conf
+    with a cpu_model and cpu_model_extra_flags to disable/enable, for
+    example:
+    
+    # grep cpu_model /etc/nova/nova.conf
+    cpu_model = EPYC-Rome
+    cpu_model_extra_flags = -xsaves
+ 
+  * Start a VM before/after the package upgrade (focal-proposed), checking
+    the VM XML for that flag (e.g., policy change from require to disable);
+    for example:
+    
+    Before:
+    
+    # virsh dumpxml instance-<number> | grep xsaves
+    <feature policy='require' name='xsaves'/>
+    
+    After:
+    
+    # virsh dumpxml instance-<number> | grep xsaves 
+    <feature policy='disable' name='xsaves'/>
+    
+  * Ensure that nova is able to start *with* and *without* enable/disable
+    CPU flag changes.
+  
+  * Ensure live migration works in both directions across the 2 hypervisors
+    *with* and *without* enable/disable CPU flag changes.
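
The XML check in the test plan could be scripted roughly as below. This is
a sketch under stated assumptions: the helper name feature_policies and the
SAMPLE_XML snippet are hypothetical, and real input would be the output of
'virsh dumpxml instance-<number>' rather than the inline sample.

```python
import xml.etree.ElementTree as ET

# Hypothetical, trimmed-down domain XML for illustration only.
SAMPLE_XML = """
<domain type='kvm'>
  <cpu mode='custom' match='exact'>
    <model fallback='forbid'>EPYC-Rome</model>
    <feature policy='disable' name='xsaves'/>
  </cpu>
</domain>
"""


def feature_policies(domain_xml):
    """Map each <cpu><feature> name to its policy attribute."""
    root = ET.fromstring(domain_xml)
    return {
        feature.get("name"): feature.get("policy")
        for feature in root.findall("./cpu/feature")
    }


if __name__ == "__main__":
    policies = feature_policies(SAMPLE_XML)
    # After the upgrade, with '-xsaves' configured, 'disable' is expected.
    print(policies.get("xsaves"))
```

Comparing the mapping before and after the package upgrade gives the same
information as the grep shown in the test plan, in a form that is easier
to assert on.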
+    
+    
+ [ Regression Potential ]
+ 
+  * Regressions would likely manifest in the areas modified by the patches,
+    i.e., parsing the config file's cpu flags (on nova startup), generating
+    a VM's XML file (on nova VM start/creation), and also live migration.
+ 
+  * The patched packages have been evaluated and running in production for
+    2-3 months now, and live migrations have been performed without any
+    issues.
+    
+ [ Other Info ]
+ 
+  * The code changes had their call paths reviewed, and no potential
+    issues were identified.
+    
+  * The patches are already present in Jammy and later.
+ 
+ [ Original Bug Description ]
  The linux kernel upstream disabled XSAVES on AMD EPYC Rome CPUs ([1]). 
Upstream qemu shortly followed with a patch adding a CPU model version of 
EPYC-Rome without XSAVES ([2]).
  The change in the kernel has been backported to ubuntu focal ([3]).
  
  Without further workarounds or the adapted CPU model in qemu this will lead
to a situation where virtual machines with an EPYC-Rome CPU model created on
hypervisors with newer EPYC CPUs will have the XSAVES flag enabled, thus
preventing live migration to hypervisors with EPYC Rome CPUs where XSAVES is no
longer available.
  Therefore I would like to argue that the patch adapting the CPU model in qemu 
should also be backported to ubuntu focal.
- 
- 
  
  [1]
  
https://lore.kernel.org/all/20230307174643.1240184-1-andrew.coop...@citrix.com/
  
  [2]
  https://patchew.org/QEMU/20230524213748.8918-1-davydov-...@yandex-team.ru/
  
  [3]
  https://bugs.launchpad.net/ubuntu/+source/linux/+bug/2023420

** Changed in: nova (Ubuntu Focal)
       Status: Triaged => In Progress

-- 
You received this bug notification because you are a member of Ubuntu
Bugs, which is subscribed to Ubuntu.
https://bugs.launchpad.net/bugs/2048517

Title:
  EPYC-Rome model without XSAVES may break live migration since the
  removal of the flag on the physical CPU

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/nova/+bug/2048517/+subscriptions


-- 
ubuntu-bugs mailing list
ubuntu-bugs@lists.ubuntu.com
https://lists.ubuntu.com/mailman/listinfo/ubuntu-bugs
