Public bug reported:

[Impact]

During our AWS testing we were able to trigger some hibernation failures
in some Xen instance types.

One problem is a kernel panic in the resume callback of the xen-netfront
driver. A workaround is to compile the driver as a module and reload it
at resume (we were already doing this reload with the bionic kernel,
which had this driver compiled as a module, but for some reason eoan and
focal had it statically compiled).

Other issues showed up as hangs on resume; these seem to be prevented by
using the new Xen/hibernation patch set posted by Anchal to the LKML:
https://lore.kernel.org/lkml/cover.1589926004.git.ancha...@amazon.com/

This new patch set is still being reviewed, but according to our tests
it really seems to fix some of these hangs on resume.

In addition to that we can improve hibernation reliability and
performance even more by applying the updated swapoff optimization patch
(that has been merged upstream).

[Test case]

Create a Xen instance in AWS, hibernate/resume multiple times.

[Fix]

The following set of fixes can be used to improve hibernation
performance and reliability:
 - new Xen/hibernation patch set from the LKML (see link above)
 - config change to compile xen-netfront as a module
 - new swapoff optimization patch

[Regression potential]

The xen-netfront config change and the new swapoff optimization patch
are pretty safe (one is a config change that affects only the
xen-netfront driver, the other is a clean cherry-pick of an upstream
commit).

The new Xen/hibernation update is pretty big and the new patches are
still under review. However, according to our tests it really seems to
fix some of the hang issues (it definitely makes things better).
Moreover, all the changes affect Xen and are restricted to the
hibernation/resume code paths, so the overall regression potential is
minimal.

[See also]

NOTE: the fix mentioned in LP: #1879711 (disable CONFIG_DMA_CMA) was
also applied during our tests and it is also required to make
hibernation stable in Xen.
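
The two config requirements mentioned in this report (xen-netfront built
as a module, CONFIG_DMA_CMA disabled) could be checked against a kernel
config file with something like the sketch below. This is only an
illustration, not part of the fix: the helper names (parse_kernel_config,
check_config) and the EXPECTED table are hypothetical, and the sketch
assumes the usual .config syntax ("CONFIG_FOO=m", "# CONFIG_FOO is not
set").

```python
import re

# Expected states for the two options named in this report (assumption:
# "m" means built as a module, "n" means disabled / not set).
EXPECTED = {
    "CONFIG_XEN_NETFRONT": "m",
    "CONFIG_DMA_CMA": "n",
}


def parse_kernel_config(text):
    """Map option name -> value from the text of a kernel config file."""
    options = {}
    for line in text.splitlines():
        line = line.strip()
        # Disabled options appear as "# CONFIG_FOO is not set".
        m = re.fullmatch(r"# (CONFIG_\w+) is not set", line)
        if m:
            options[m.group(1)] = "n"
            continue
        # Enabled options appear as "CONFIG_FOO=value".
        m = re.fullmatch(r"(CONFIG_\w+)=(.*)", line)
        if m:
            options[m.group(1)] = m.group(2)
    return options


def check_config(text, expected=EXPECTED):
    """Return a list of (option, expected, actual) mismatches."""
    options = parse_kernel_config(text)
    mismatches = []
    for name, want in expected.items():
        # Options missing from the file are treated as disabled.
        got = options.get(name, "n")
        if got != want:
            mismatches.append((name, want, got))
    return mismatches
```

On an Ubuntu instance the text passed to check_config() would typically
be the contents of the /boot/config-<kernel release> file for the running
kernel; an empty result means both options are in the state described
above.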

** Affects: linux-aws (Ubuntu)
     Importance: High
     Assignee: Andrea Righi (arighi)
         Status: New

** Affects: linux-aws (Ubuntu Eoan)
     Importance: High
     Assignee: Andrea Righi (arighi)
         Status: New

** Affects: linux-aws (Ubuntu Focal)
     Importance: High
     Assignee: Andrea Righi (arighi)
         Status: New

** Also affects: linux-aws (Ubuntu Focal)
   Importance: Undecided
       Status: New

** Also affects: linux-aws (Ubuntu Eoan)
   Importance: Undecided
       Status: New

** Changed in: linux-aws (Ubuntu Eoan)
   Importance: Undecided => High

** Changed in: linux-aws (Ubuntu Focal)
   Importance: Undecided => High

** Changed in: linux-aws (Ubuntu Focal)
     Assignee: (unassigned) => Andrea Righi (arighi)

** Changed in: linux-aws (Ubuntu Eoan)
     Assignee: (unassigned) => Andrea Righi (arighi)

** Changed in: linux-aws (Ubuntu)
     Assignee: (unassigned) => Andrea Righi (arighi)

** Changed in: linux-aws (Ubuntu)
   Importance: Undecided => High

-- 
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux-aws in Ubuntu.
https://bugs.launchpad.net/bugs/1881869

Title:
  linux-aws: fix Xen / hibernation issues

Status in linux-aws package in Ubuntu:
  New
Status in linux-aws source package in Eoan:
  New
Status in linux-aws source package in Focal:
  New


To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/linux-aws/+bug/1881869/+subscriptions

