** Description changed:

- TODO
+ [Impact]
+ 
+  * When EC2 instances resume from hibernation, processes are sometimes
+ killed by the OOM killer.
+ 
+ [Test Case]
+ 
+  * Set up an EC2 instance to allow hibernation as the stop instance action.
+  * Start the attached Python script in a screen session to reserve 85% of
+ the memory:
+   python3 mem-waster-pct.py -p 85
+ 
+  * Log out, hibernate, then resume the instance.
+  * Observe that the Python script is still running after resuming.
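
The attached mem-waster-pct.py is not reproduced in this notification. As a
rough illustration only, a script of that kind could reserve a percentage of
physical memory along the following lines. The function names and the
page-touching approach are assumptions for this sketch, not the contents of
the attachment.

```python
import os


def total_memory_bytes():
    # Total physical memory as reported by sysconf (Linux).
    return os.sysconf("SC_PAGE_SIZE") * os.sysconf("SC_PHYS_PAGES")


def bytes_to_reserve(percent, total):
    # Hypothetical helper: number of bytes matching the requested percentage.
    if not 0 < percent <= 100:
        raise ValueError("percent must be in (0, 100]")
    return total * percent // 100


def reserve(nbytes, page_size=4096):
    # Allocate a zero-filled buffer and write one byte per page so the
    # kernel has to back the allocation instead of leaving it untouched.
    buf = bytearray(nbytes)
    for offset in range(0, nbytes, page_size):
        buf[offset] = 1
    return buf
```

A caller would keep the returned buffer referenced, for example
reserve(bytes_to_reserve(85, total_memory_bytes())), and then sleep so the
memory stays held across hibernation and resume.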
+ 
+ [Regression Potential]
+ 
+  * The fix sets the memory overcommit policy to 'always overcommit'
+ while the swap file is being removed. This helps cope with the shrinking
+ swap space during swap removal. No side effects are expected, since
+ processes trying to allocate an excessive amount of memory would fail
+ under stricter policies, too.
+ 
+ The fix introduces a potential race condition with processes detecting
+ the overcommit policy:
+ 
+ The policy in effect when hibernation took place is saved shortly after
+ resuming and restored after the swap file is removed. During this time
+ window other processes see the policy as 'always overcommit', even though
+ it may not have been set that way before hibernation and may be restored
+ to a different policy after the swap file is removed. Hitting this race
+ condition seems unlikely, and there seems to be no good way of
+ avoiding it.
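
The save, override and restore sequence described above can be sketched as
follows. This is a minimal illustration, not the agent's actual code: the
helper names are hypothetical, and it assumes the policy is read and written
through /proc/sys/vm/overcommit_memory, where the value 1 means 'always
overcommit', and that the swap file is deactivated with swapoff.

```python
import subprocess
from contextlib import contextmanager

OVERCOMMIT_PATH = "/proc/sys/vm/overcommit_memory"
ALWAYS_OVERCOMMIT = "1"  # kernel value for 'always overcommit'


def read_policy(path=OVERCOMMIT_PATH):
    with open(path) as f:
        return f.read().strip()


def write_policy(value, path=OVERCOMMIT_PATH):
    with open(path, "w") as f:
        f.write(value + "\n")


@contextmanager
def always_overcommit(path=OVERCOMMIT_PATH):
    # Save the current policy, force 'always overcommit', and restore the
    # saved value on exit. Other processes reading the policy inside this
    # window see '1'; that is the race described above.
    saved = read_policy(path)
    write_policy(ALWAYS_OVERCOMMIT, path)
    try:
        yield saved
    finally:
        write_policy(saved, path)


def remove_swap(swapfile, path=OVERCOMMIT_PATH):
    # Hypothetical wrapper: deactivate the swap file under the relaxed policy.
    with always_overcommit(path):
        subprocess.run(["swapoff", swapfile], check=True)
```

Restoring in a finally clause keeps the window as short as the swap removal
itself, but it does not close it: any process that samples the policy while
swapoff is running still sees the temporary value.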

-- 
You received this bug notification because you are a member of Ubuntu
Bugs, which is subscribed to Ubuntu.
https://bugs.launchpad.net/bugs/1863242

Title:
  [SRU] OOM errors with new kernels on resuming

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/ec2-hibinit-agent/+bug/1863242/+subscriptions

-- 
ubuntu-bugs mailing list
ubuntu-bugs@lists.ubuntu.com
https://lists.ubuntu.com/mailman/listinfo/ubuntu-bugs
