Re: [PATCH v3 00/11] Fix PM hibernation in Xen guests

2020-09-11 Thread Anchal Agarwal
On Fri, Aug 28, 2020 at 06:39:45PM +, Anchal Agarwal wrote:
> On Fri, Aug 28, 2020 at 08:29:24PM +0200, Rafael J. Wysocki wrote:
> > On Fri, Aug 28, 2020 at 8:26 PM Anchal Agarwal  wrote:
> > >
> > > On Fri, Aug 21, 2020 at 10:22:43PM +, Anchal Agarwal wrote:
> > > > Hello,
> > > > This series fixes PM hibernation for HVM guests running on the Xen
> > > > hypervisor.
> > > > A running guest can now be hibernated and resumed successfully at a
> > > > later time. The fixes for PM hibernation are added to the block and
> > > > network device drivers, i.e. xen-blkfront and xen-netfront. Any other
> > > > driver that needs S4 support and does not already have it can follow
> > > > the same method of introducing freeze/thaw/restore callbacks.
> > > > The patches have been tested against the upstream kernel and Xen 4.11.
> > > > Large-scale testing was also done on Xen-based Amazon EC2 instances.
> > > > All of this testing involved running a memory-exhausting workload in
> > > > the background.
> > > >
> > > > Guest hibernation does not require any support from the hypervisor,
> > > > so the guest has complete control over its state. Infrastructure
> > > > restrictions on saving guest state can be overcome by guest-initiated
> > > > hibernation.
> > > >
> > > > These patches were sent out as an RFC before, and all the feedback has
> > > > been incorporated in the patches. v1 and v2 can be found here:
> > > >
> > > > [v1]: https://lkml.org/lkml/2020/5/19/1312
> > > > [v2]: https://lkml.org/lkml/2020/7/2/995
> > > > All comments and feedback from v2 have been incorporated in the v3
> > > > series.
> > > >
> > > > Known issues:
> > > > 1. KASLR causes intermittent hibernation failures. The VM fails to
> > > > resume and has to be restarted. I will investigate this issue
> > > > separately; it shouldn't be a blocker for this patch series.
> > > > 2. During hibernation, I sometimes observed that freezing of tasks
> > > > fails due to busy XFS workqueues [xfs-cil/xfs-sync]. This is also
> > > > intermittent, maybe 1 out of 200 runs, and hibernation is aborted in
> > > > this case. Retrying hibernation may work. This is a known issue with
> > > > hibernation on some filesystems such as XFS; it has been discussed by
> > > > the community for years without an effective resolution at this point.
> > > >
> > > > Testing How to:
> > > > ---
> > > > 1. Set up the Xen hypervisor on a physical machine [I used Ubuntu 16.04
> > > > + upstream xen-4.11]
> > > > 2. Bring up an HVM guest with a kernel compiled with the hibernation
> > > > patches [I used ubuntu18.04 netboot bionic images and also Amazon Linux
> > > > on-prem images].
> > > > 3. Create a swap file size=RAM size
> > > > 4. Update grub parameters and reboot
> > > > 5. Trigger pm-hibernation from within the VM
> > > >
> > > > Example:
> > > > Set up a file-backed swap space. Swap file size>=Total memory on the system
> > > > sudo dd if=/dev/zero of=/swap bs=$(( 1024 * 1024 )) count=4096 # 4096MiB
> > > > sudo chmod 600 /swap
> > > > sudo mkswap /swap
> > > > sudo swapon /swap
> > > >
> > > > Update resume device/resume offset in grub if using swap file:
> > > > resume=/dev/xvda1 resume_offset=200704 no_console_suspend=1
> > > >
> > > > Execute:
> > > > 
> > > > sudo pm-hibernate
> > > > OR
> > > > echo disk > /sys/power/state && echo reboot > /sys/power/disk
> > > >
> > > > Compute resume offset code:
> > > > "
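> > > > #!/usr/bin/env python3
> > > > import sys
> > > > import array
> > > > import fcntl
> > > >
> > > > # Swap file path is the first argument
> > > > f = open(sys.argv[1], 'rb')
> > > > # FIBMAP works on a C int: logical block 0 in, physical block out
> > > > buf = array.array('i', [0])
> > > >
> > > > # FIBMAP ioctl (request number 1); needs root
> > > > ret = fcntl.ioctl(f.fileno(), 0x01, buf)
> > > > print(buf[0])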
> > > > "
> > > >
> > > > Aleksei Besogonov (1):
> > > >   PM / hibernate: update the resume offset on SNAPSHOT_SET_SWAP_AREA
> > > >
> > > > Anchal Agarwal (4):
> > > >   x86/xen: Introduce new function to map HYPERVISOR_shared_info on
> > > > Resume
> > > >   x86/xen: save and restore steal clock during PM hibernation
> > > >   xen: Introduce wrapper for save/restore sched clock offset
> > > >   xen: Update sched clock offset to avoid system instability in
> > > > hibernation
> > > >
> > > > Munehisa Kamata (5):
> > > >   xen/manage: keep track of the on-going suspend mode
> > > >   xenbus: add freeze/thaw/restore callbacks support
> > > >   x86/xen: add system core suspend and resume callbacks
> > > >   xen-blkfront: add callbacks for PM suspend and hibernation
> > > >   xen-netfront: add callbacks for PM suspend and hibernation
> > > >
> > > > Thomas Gleixner (1):
> > > >   genirq: Shutdown irq chips in suspend/resume during hibernation
> > > >
> > > >  arch/x86/xen/enlighten_hvm.c  |   7 +++
> > > >

Re: [PATCH v3 00/11] Fix PM hibernation in Xen guests

2020-09-11 Thread boris . ostrovsky


On 8/21/20 6:22 PM, Anchal Agarwal wrote:
>
> Known issues:
> 1. KASLR causes intermittent hibernation failures. The VM fails to resume and
> has to be restarted. I will investigate this issue separately; it shouldn't
> be a blocker for this patch series.


Is there any change in status for this? This has been noted since January.


-boris


> 2. During hibernation, I sometimes observed that freezing of tasks fails due
> to busy XFS workqueues [xfs-cil/xfs-sync]. This is also intermittent, maybe 1
> out of 200 runs, and hibernation is aborted in this case. Retrying hibernation
> may work. This is a known issue with hibernation on some filesystems such as
> XFS; it has been discussed by the community for years without an effective
> resolution at this point.
>



Re: [PATCH v3 00/11] Fix PM hibernation in Xen guests

2020-08-28 Thread Rafael J. Wysocki
On Fri, Aug 28, 2020 at 8:26 PM Anchal Agarwal  wrote:
>
> On Fri, Aug 21, 2020 at 10:22:43PM +, Anchal Agarwal wrote:
> > Hello,
> > This series fixes PM hibernation for HVM guests running on the Xen
> > hypervisor.
> > A running guest can now be hibernated and resumed successfully at a
> > later time. The fixes for PM hibernation are added to the block and
> > network device drivers, i.e. xen-blkfront and xen-netfront. Any other driver
> > that needs S4 support and does not already have it can follow the same
> > method of introducing freeze/thaw/restore callbacks.
> > The patches have been tested against the upstream kernel and Xen 4.11.
> > Large-scale testing was also done on Xen-based Amazon EC2 instances. All of
> > this testing involved running a memory-exhausting workload in the background.
> >
> > Guest hibernation does not require any support from the hypervisor, so
> > the guest has complete control over its state. Infrastructure
> > restrictions on saving guest state can be overcome by guest-initiated
> > hibernation.
> >
> > These patches were sent out as an RFC before, and all the feedback has been
> > incorporated in the patches. v1 and v2 can be found here:
> >
> > [v1]: https://lkml.org/lkml/2020/5/19/1312
> > [v2]: https://lkml.org/lkml/2020/7/2/995
> > All comments and feedback from v2 have been incorporated in the v3 series.
> >
> > Known issues:
> > 1. KASLR causes intermittent hibernation failures. The VM fails to resume and
> > has to be restarted. I will investigate this issue separately; it shouldn't
> > be a blocker for this patch series.
> > 2. During hibernation, I sometimes observed that freezing of tasks fails due
> > to busy XFS workqueues [xfs-cil/xfs-sync]. This is also intermittent, maybe 1
> > out of 200 runs, and hibernation is aborted in this case. Retrying
> > hibernation may work. This is a known issue with hibernation on some
> > filesystems such as XFS; it has been discussed by the community for years
> > without an effective resolution at this point.
> >
> > Testing How to:
> > ---
> > 1. Set up the Xen hypervisor on a physical machine [I used Ubuntu 16.04 +
> > upstream xen-4.11]
> > 2. Bring up an HVM guest with a kernel compiled with the hibernation patches
> > [I used ubuntu18.04 netboot bionic images and also Amazon Linux on-prem
> > images].
> > 3. Create a swap file size=RAM size
> > 4. Update grub parameters and reboot
> > 5. Trigger pm-hibernation from within the VM
> >
> > Example:
> > Set up a file-backed swap space. Swap file size>=Total memory on the system
> > sudo dd if=/dev/zero of=/swap bs=$(( 1024 * 1024 )) count=4096 # 4096MiB
> > sudo chmod 600 /swap
> > sudo mkswap /swap
> > sudo swapon /swap
> >
> > Update resume device/resume offset in grub if using swap file:
> > resume=/dev/xvda1 resume_offset=200704 no_console_suspend=1
> >
> > Execute:
> > 
> > sudo pm-hibernate
> > OR
> > echo disk > /sys/power/state && echo reboot > /sys/power/disk
> >
> > Compute resume offset code:
> > "
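> > #!/usr/bin/env python3
> > import sys
> > import array
> > import fcntl
> >
> > # Swap file path is the first argument
> > f = open(sys.argv[1], 'rb')
> > # FIBMAP works on a C int: logical block 0 in, physical block out
> > buf = array.array('i', [0])
> >
> > # FIBMAP ioctl (request number 1); needs root
> > ret = fcntl.ioctl(f.fileno(), 0x01, buf)
> > print(buf[0])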
> > "
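
[Editorial sketch, not part of the original posting.] The offset script
quoted here prints only the raw FIBMAP result. A minimal Python 3 sketch of
how that value might be turned into the full set of boot parameters follows.
It assumes that resume_offset is counted in PAGE_SIZE units, that the block
size reported by os.statvfs() matches the one FIBMAP uses, and that the swap
file's first block holds the swap header. The helper names
(first_physical_block, block_to_page_offset, resume_params) are hypothetical
and chosen only for illustration.

```python
#!/usr/bin/env python3
"""Sketch: derive resume= / resume_offset= values for a swap file."""
import fcntl
import os
import struct
import sys

FIBMAP = 1  # ioctl request number for FIBMAP, as used by the quoted script


def first_physical_block(path):
    """Return the physical block number of the file's first block (needs root)."""
    with open(path, "rb") as f:
        # FIBMAP takes a pointer to a C int: logical block in, physical block out.
        out = fcntl.ioctl(f.fileno(), FIBMAP, struct.pack("i", 0))
    return struct.unpack("i", out)[0]


def block_to_page_offset(block, fs_block_size, page_size):
    """Convert a filesystem block number into PAGE_SIZE units (assumed unit)."""
    return block * fs_block_size // page_size


def resume_params(device, offset):
    """Format the kernel parameters in the same shape as the example in the thread."""
    return f"resume={device} resume_offset={offset} no_console_suspend=1"


# Only runs when called as: script.py <swap-file> <resume-device>
if __name__ == "__main__" and len(sys.argv) == 3:
    swap_path, device = sys.argv[1], sys.argv[2]
    block = first_physical_block(swap_path)
    fs_block_size = os.statvfs(swap_path).f_bsize
    page_size = os.sysconf("SC_PAGE_SIZE")
    print(resume_params(device, block_to_page_offset(block, fs_block_size, page_size)))
```

Run as root, for example with /swap and /dev/xvda1 as arguments. When the
filesystem block size equals the page size, the printed offset is the same
number the quoted script prints.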
> >
> > Aleksei Besogonov (1):
> >   PM / hibernate: update the resume offset on SNAPSHOT_SET_SWAP_AREA
> >
> > Anchal Agarwal (4):
> >   x86/xen: Introduce new function to map HYPERVISOR_shared_info on
> > Resume
> >   x86/xen: save and restore steal clock during PM hibernation
> >   xen: Introduce wrapper for save/restore sched clock offset
> >   xen: Update sched clock offset to avoid system instability in
> > hibernation
> >
> > Munehisa Kamata (5):
> >   xen/manage: keep track of the on-going suspend mode
> >   xenbus: add freeze/thaw/restore callbacks support
> >   x86/xen: add system core suspend and resume callbacks
> >   xen-blkfront: add callbacks for PM suspend and hibernation
> >   xen-netfront: add callbacks for PM suspend and hibernation
> >
> > Thomas Gleixner (1):
> >   genirq: Shutdown irq chips in suspend/resume during hibernation
> >
> >  arch/x86/xen/enlighten_hvm.c      |   7 +++
> >  arch/x86/xen/suspend.c            |  63
> >  arch/x86/xen/time.c               |  15 -
> >  arch/x86/xen/xen-ops.h            |   3 +
> >  drivers/block/xen-blkfront.c      | 122 --
> >  drivers/net/xen-netfront.c        |  96 +-
> >  drivers/xen/events/events_base.c  |   1 +
> >  drivers/xen/manage.c              |  46 ++
> >  drivers/xen/xenbus/xenbus_probe.c |  96 +-
> >  include/linux/irq.h               |   2 +
> >  include/xen/xen-ops.h             |   3 +
> >  include/xen/xenbus.h              |   3 +
> >  kernel/irq/chip.c                 |   2 +-
> >  kernel/irq/internals.h            |   1 +
> >  kernel/irq/pm.c                   |  31 +++---
> >  kernel/power/user.c