** Description changed:

- TODO
+ [Impact]
+
+  * When EC2 instances resume from hibernation, processes are sometimes
+    killed by the OOM killer.
+
+ [Test Case]
+
+  * Set up an EC2 instance to allow hibernation as the stop instance action.
+  * Start the attached Python script in a screen session to reserve 85% of the memory:
+    python3 mem-waster-pct.py -p 85
+
+  * Log out, hibernate, then resume the instance.
+  * Observe that the Python script is still running after resuming.
+
+ [Regression Potential]
+
+  * The fix sets the memory overcommit policy to 'always overcommit'
+    while the swap file is being removed. This helps to deal with the
+    shrinking swap space during the swap removal. No side effect is
+    expected, since processes trying to allocate an excessive amount of
+    memory would fail with stricter policies, too.
+
+ The fix introduces a potential race condition with processes detecting
+ the overcommit policy:
+
+ The policy used when the hibernation took place is saved shortly after
+ resuming and is restored after the swap file is removed. In this time
+ window other processes detect the policy as 'always overcommit', even
+ though it may not have been set as such before hibernation and may be
+ restored to a different policy after removing the swap file. Hitting this
+ race condition seems unlikely, and there seems to be no good way of
+ avoiding it.
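The save/override/restore sequence described under [Regression Potential] can be sketched roughly as follows. This is only an illustration, not the actual ec2-hibinit-agent code: the names always_overcommit and remove_swap, and the injectable swapoff callable, are assumptions made for the example. The only kernel facts it relies on are the /proc/sys/vm/overcommit_memory file and its value 1 meaning 'always overcommit'.

```python
import contextlib
import subprocess
from pathlib import Path

# Kernel tunable: 0 = heuristic, 1 = always overcommit, 2 = never overcommit.
OVERCOMMIT_PATH = Path("/proc/sys/vm/overcommit_memory")
ALWAYS_OVERCOMMIT = "1"


@contextlib.contextmanager
def always_overcommit(path=OVERCOMMIT_PATH):
    """Hypothetical helper: force 'always overcommit', then restore the saved policy."""
    path = Path(path)
    saved = path.read_text().strip()  # policy in effect shortly after resuming
    path.write_text(ALWAYS_OVERCOMMIT + "\n")
    try:
        yield saved
    finally:
        # Restore even if swap removal fails, so the override does not stick.
        path.write_text(saved + "\n")


def remove_swap(swapfile, path=OVERCOMMIT_PATH, swapoff=None):
    """Hypothetical helper: disable the swap file while overcommit is relaxed."""
    if swapoff is None:
        # Default to the real swapoff command; needs root privileges.
        def swapoff(target):
            subprocess.run(["swapoff", str(target)], check=True)
    with always_overcommit(path):
        swapoff(swapfile)
```

Between the write of 1 and the restore in the finally clause, any process reading the tunable sees 'always overcommit'. That is the race window the description mentions, and the sketch does nothing to close it.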
-- 
You received this bug notification because you are a member of Ubuntu
Bugs, which is subscribed to Ubuntu.
https://bugs.launchpad.net/bugs/1863242

Title:
  [SRU] OOM errors with new kernels on resuming

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/ec2-hibinit-agent/+bug/1863242/+subscriptions

-- 
ubuntu-bugs mailing list
ubuntu-bugs@lists.ubuntu.com
https://lists.ubuntu.com/mailman/listinfo/ubuntu-bugs