** Tags added: ubuntu-certified
--
You received this bug notification because you are a member of Ubuntu
Bugs, which is subscribed to Ubuntu.
https://bugs.launchpad.net/bugs/1861359
Title:
swap storms kills interactive use
To manage notifications about this bug go to:
I was reminded of this bug earlier today -- Andrea, Sultan, thanks so
much for fixing my issues. I've been happily running along for months
now. :) Thanks!
This bug was fixed in the package linux - 5.4.0-26.30
---
linux (5.4.0-26.30) focal; urgency=medium
* focal/linux: 5.4.0-26.30 -proposed tracker (LP: #1873882)
* Packaging resync (LP: #1786013)
- update dkms package versions
* swap storms kills interactive use (LP:
** Description changed:
[Impact]
- High watermark boosting can cause large swap activity under certain
- memory intensive workloads, making the system very unresponsive (screen
- does not refresh, keyboard not responding, etc.).
-
- This large swap activity seems to be prevented disabling
Seth, thanks for the update!
JFYI, I've also just uploaded a v3 kernel (5.4.0-24.28+lp1861359v3) that
I'm currently testing on my laptop with positive results. This change is
even smaller than the previous one (v2), because we simply disable the
direct swap out in the i915 shrinker
Andrea, I've been running the v1 kernel for a day or so now:
[0.00] Linux version 5.4.0-24-generic (arighi@sita) (gcc version
9.3.0 (Ubuntu 9.3.0-10ubuntu1)) #28+lp1861359v1 SMP Wed Apr 15 14:49:33
UTC 2020 (Ubuntu 5.4.0-24.28+lp1861359v1-generic 5.4.30)
$ uptime
02:21:45 up 1 day, 9
This entry:
* swap storms kills interactive use (LP: #1861359)
- SAUCE: mm/page_alloc.c: disable memory reclaim watermark boosting by
default
closed this bug, but per latest comments, that isn't sufficient to
address the issue. Putting back to Confirmed.
** Changed in: linux (Ubuntu
This bug was fixed in the package linux - 5.4.0-24.28
---
linux (5.4.0-24.28) focal; urgency=medium
* focal/linux: 5.4.0-24.28 -proposed tracker (LP: #1871939)
* getitimer returns it_value=0 erroneously (LP: #1349028)
- [Config] CONTEXT_TRACKING_FORCE policy should be unset
I've uploaded another test kernel (5.4.0-24.28+lp1861359v2):
https://kernel.ubuntu.com/~arighi/LP-1861359/
In this one, instead of completely disabling the i915 shrinker, I'm only
preventing the i915 caches from being swapped out when the system is
short on memory.
I'm testing this new one on my laptop
Hi Seth, sorry for my late response.
I did more tests this morning on my laptop tracing the callers of
__alloc_pages_nodemask() and I noticed that pretty much all the time it
is called by the i915 shrinker. So I tried to disable it and I have to
say that on my laptop (at least) the system is
I should point out that the periodic bursts of writes every five seconds
in my vmstat 1 output are due to zfs's flushing mechanism; by default it
flushes dirty pages every five seconds.
Thanks Andrea, I don't think that helped. I'll attach a file with vmstat
1 output and funclatency output, along with a few notes on the testing.
Thanks
** Attachment added: "after-limiting-dirty-bytes"
TL;DR @seth-arnold, as a test can you try to set the following options?
$ echo $((32 * 1024 * 1024)) | sudo tee /proc/sys/vm/dirty_bytes
$ echo $((32 * 1024 * 1024)) | sudo tee /proc/sys/vm/dirty_background_bytes
Repeat the test and see if the system is still unresponsive.
Details below.
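For reference, the arithmetic in the two commands above works out to 32 MiB
expressed in bytes. The sketch below is not from the thread: mib_to_bytes is
a hypothetical helper name, and the commented lines only show how the
current values could be recorded before changing them.

```shell
#!/bin/sh
# mib_to_bytes: hypothetical helper, converts MiB to bytes
# (the same arithmetic as $((32 * 1024 * 1024)) in the commands above).
mib_to_bytes() {
    echo $(( $1 * 1024 * 1024 ))
}

mib_to_bytes 32   # prints 33554432

# Record the current settings before the test (read-only, no root needed):
#   cat /proc/sys/vm/dirty_bytes /proc/sys/vm/dirty_background_bytes
#   cat /proc/sys/vm/dirty_ratio /proc/sys/vm/dirty_background_ratio
```

Note that, per the kernel's vm sysctl documentation, dirty_bytes and
dirty_ratio are counterparts: once one is written, the other reads back as 0.
Undoing the test therefore means writing the saved ratio values back, not
just zeroing the byte limits.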
Stefan, while recent kernels seem happier than previous kernels (I think
-14 era was terrible), I don't think this problem is fixed yet:
sarnold@millbarge:/tmp$ uname -a
Linux millbarge 5.4.0-21-generic #25-Ubuntu SMP Sat Mar 28 13:10:28 UTC 2020
x86_64 x86_64 x86_64 GNU/Linux
Reporter hasn't confirmed that it's corrected yet... "Fix committed"
seems premature.
** Changed in: linux (Ubuntu Focal)
Status: Fix Committed => Confirmed
** Also affects: linux (Ubuntu Focal)
Importance: High
Assignee: Andrea Righi (arighi)
Status: Confirmed
** Changed in: linux (Ubuntu Focal)
Status: Confirmed => Fix Committed
Sultan put together a kernel with some debugging for me:
[101616.889859] __alloc_pages_nodemask: stall of 3683ms for order-0, mask:
0x100dca
[101616.889863] Call Trace:
[101616.889880] __alloc_pages_nodemask+0x34f/0x3b0
[101616.889887] alloc_pages_vma+0x7f/0x200
[101616.889893]
As a note: https://platform.leolabs.space/visualizations/leo is not a
valid reproducer for this bug, since the lags it causes are from
overloading the GPU, not from stressing memory.
Sultan, thanks for the advice.
I set this watermark boost factor to zero as you suggested, and then
decided to try a stupid simple benchmark of my storage -- my swap is a
zfs dataset on nvme. zfs means it'll go slower than raw nvme block
access:
$ dd if=ubuntu-18.04.4-desktop-amd64.iso of=foo
FYI, this bug has nothing to do with the use of swap. It just happens
that the slow writeback incurred by using a swap device backed by
non-volatile memory makes kswapd's bouts of page thrashing last long
enough to cause a visible freeze.
** Description changed:
+ [Impact]
+
+ High watermark boosting can cause large swap activity under certain
+ memory intensive workloads, making the system very unresponsive (screen
+ does not refresh, keyboard not responding, etc.).
+
+ This large swap activity seems to be prevented disabling
This problem is caused by an upstream memory management feature called
watermark boosting. Normally, when a memory allocation fails and falls
back to the page allocator, the page allocator will wake up kswapd to
free up pages in order to make the memory allocation succeed. kswapd
tries to free
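As a rough illustration of the knob discussed in this thread: watermark
boosting is controlled by the vm.watermark_boost_factor sysctl, which the
kernel documentation describes as a fraction of 10,000 (default 15000).
The sketch below is not from the thread, and boost_percent is a
hypothetical helper name used only to show the unit conversion.

```shell
#!/bin/sh
# boost_percent: hypothetical helper. vm.watermark_boost_factor is expressed
# in fractions of 10,000, so 15000 corresponds to 150% of the high watermark.
boost_percent() {
    echo $(( $1 * 100 / 10000 ))
}

boost_percent 15000   # prints 150
boost_percent 0       # prints 0 (boosting disabled)

# Inspect the current value (read-only):
#   cat /proc/sys/vm/watermark_boost_factor
# Disable boosting until the next reboot (needs root):
#   echo 0 | sudo tee /proc/sys/vm/watermark_boost_factor
```

Setting the value through /proc does not persist across reboots; a
persistent setting would normally go in a sysctl.d file.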
I'm adding the champagne tag to this bug to bring it to a potential
wider audience; I think we may need to take more drastic steps like
disabling swap on upgrades, not offering swap in our installers, etc.,
to try to have a better experience.
Thanks
** Tags added: champagne
BTW, this is still happening in:
Linux millbarge 5.4.0-20-generic #24-Ubuntu SMP Mon Mar 23 20:55:46 UTC
2020 x86_64 x86_64 x86_64 GNU/Linux
I've seen it both with firefox in trello, firefox in launchpad (typing
this comment) and doing two sequential wgets of
Using 5.4.0-17.21-generic, my laptop has 16G of ram. If I launch 3 vms
(xenial desktop (768M), bionic desktop (1.5G) and focal desktop (2.6G))
then load this page: https://people.canonical.com/~ubuntu-
security/oval/com.ubuntu.xenial.cve.oval.xml, at some point while the
page is loading, the
There is an interesting (to me, anyway) change of behaviour with the -17
kernel: while earlier kernels would appear to be locked solid for 30-60
seconds before the screen could update, -17 allows screen updates every
six seconds or so.
I have an always-running mosh session to a remote host
@seth-arnold ok, I'll run these tests on my side as well and see if I can
reproduce the problem. If you find a specific web page that can trigger
the problem easily let me know. Thanks!
Andrea, unfortunately this updated kernel hasn't fixed the problem:
01:02:48 up 21:21, 9 users, load average: 1.45, 0.98, 0.58
Linux millbarge 5.4.0-17-generic #21-Ubuntu SMP Fri Feb 28 16:18:44 UTC
2020 x86_64 x86_64 x86_64 GNU/Linux
I was able to reproduce the swap growth and hangs with
Andrea, this new kernel looks promising.
Linux millbarge 5.4.0-17-generic #21-Ubuntu SMP Fri Feb 28 16:18:44 UTC
2020 x86_64 x86_64 x86_64 GNU/Linux
Unfortunately I didn't check the reproducer before rebooting: the error
message I get with it now suggests that it might not work any more.
The depmod error messages have been fixed in initramfs-tools (see
https://bugs.launchpad.net/bugs/1863261). It doesn't actually prevent
the kernel from booting, so you can safely reboot.
On Sat, Feb 29, 2020 at 10:06:42AM -, Andrea Righi wrote:
> Seth, can you try to see if you can reproduce the problem with the
> latest unstable kernel (5.4.0-17.21)? https://launchpad.net/~canonical-
> kernel-team/+archive/ubuntu/unstable
>
> I can't reproduce the problem with it. I have not
Seth, can you try to see if you can reproduce the problem with the
latest unstable kernel (5.4.0-17.21)? https://launchpad.net/~canonical-
kernel-team/+archive/ubuntu/unstable
I can't reproduce the problem with it. I have not verified yet, but I
suspect it might be related to this commit:
Many thanks for the reproducer Seth! I've been able to reproduce the
swapping issue on my laptop! Now I can investigate more on my side. I'll
keep you posted!
This web page may be a good reproducer candidate:
https://platform.leolabs.space/visualizations/conjunction?type=conjunction=2004981040
Loading it in firefox would make my computer unresponsive for over a
minute. (Be careful with firefox reloading it when re-opening firefox.)
Loading it in
I guess we can't use ftrace and secure boot at the same time then...
would it be possible to disable secure boot / kernel lockdown on your
side and run a test using that kprobe-perf command?
If it's not possible or too complicated we'll find an alternative way,
maybe I can create a custom kernel
$ uname -a
Linux millbarge 5.4.0-12-generic #15-Ubuntu SMP Tue Jan 21 15:12:29 UTC 2020
x86_64 x86_64 x86_64 GNU/Linux
$ cat /proc/sys/kernel/ftrace_enabled
1
$ sudo kprobe-perf -s 'p:shrink_node'
[sudo] password for sarnold:
ERROR: func shrink_node not in
Ah! You're right, that's the reason! When the kernel is locked down
ftrace is explicitly disabled. To confirm that, you should have 0 in
/proc/sys/kernel/ftrace_enabled.
Can you try to set it back to 1 and see if kprobe-perf works after that?
Otherwise I'll figure out an alternative way to trace
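A quick way to check both conditions mentioned above might look like the
sketch below. It is not from the thread: lockdown_mode is a hypothetical
helper, and it assumes the kernel exposes /sys/kernel/security/lockdown in
the upstream format, where the active mode is shown in square brackets.

```shell
#!/bin/sh
# lockdown_mode: hypothetical helper. Extracts the active mode (the bracketed
# word) from a line formatted like /sys/kernel/security/lockdown.
lockdown_mode() {
    printf '%s\n' "$1" | sed -n 's/.*\[\([a-z]*\)].*/\1/p'
}

lockdown_mode "none [integrity] confidentiality"   # prints integrity

# On a live system (read-only; the lockdown file only exists on kernels
# built with lockdown support):
#   cat /proc/sys/kernel/ftrace_enabled
#   lockdown_mode "$(cat /sys/kernel/security/lockdown)"
```

If the mode is anything other than none, tracing tools that depend on
kprobes may be restricted regardless of the ftrace_enabled value.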
$ uname -a
Linux millbarge 5.4.0-12-generic #15-Ubuntu SMP Tue Jan 21 15:12:29 UTC 2020
x86_64 x86_64 x86_64 GNU/Linux
$ grep FTRACE /boot/config-`uname -r`
CONFIG_KPROBES_ON_FTRACE=y
CONFIG_HAVE_KPROBES_ON_FTRACE=y
CONFIG_STM_SOURCE_FTRACE=m
# CONFIG_PSTORE_FTRACE is not set
Weird that kprobe-perf isn't working... I've just tried it on a freshly
installed 20.04 instance and:
ubuntu@ubuntu:~$ uname -a
Linux ubuntu 5.4.0-12-generic #15-Ubuntu SMP Tue Jan 21 15:12:29 UTC 2020
x86_64 x86_64 x86_64 GNU/Linux
ubuntu@ubuntu:~$ sudo ls -l
> What do you have in /proc/sys/vm/swappiness? Could you try to set that
> to 0 (kernel prefers to drop file-backed pages instead of swapping out
> anonymous pages) and see if the swap out activity is still happening?
Unfortunately, this did not solve the problem.
Setting swappiness to 0 and
On Mon, Feb 10, 2020 at 07:53:20AM -, Andrea Righi wrote:
> OK, so we know that it's not related to the memory cgroup subsystem.
But this is a good instinct. It does seem to happen when, e.g., firefox or
git is in heavy memory use, not the system as a whole.
> Another reason of such unexpected
OK, so we know that it's not related to the memory cgroup subsystem.
Another reason for such unexpected swapping activity could be the
memory compaction code triggering some direct memory reclaim and
forcing pages to be swapped out.
What do you have in /proc/sys/vm/swappiness? Could you try
$ systemctl show '*.slice' | grep -e '^Slice' -e '^ControlGroup' -e
"^DefaultMem" -e "^Memory"
Slice=system.slice
ControlGroup=/system.slice/system-systemd\x2dfsck.slice
MemoryCurrent=[not set]
MemoryAccounting=yes
DefaultMemoryLow=0
DefaultMemoryMin=0
MemoryMin=0
MemoryLow=0
MemoryHigh=infinity
This kernel command line parameter didn't appear to help:
Moments after loading a new URL into firefox:
1 00 69917617804 369982000 0 0
2289 863 2 1 98 0 0
0 00 70286017804 369978000 0 2288
Hello Seth, thanks for reporting the problem. I was wondering if this
could be related to the memory cgroup controller.
As a simple test could you try to reboot the system adding
cgroup_disable=memory to the kernel boot parameters?
In this way if the problem goes away at least we know it's
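After rebooting, it can be worth confirming that the parameter actually
reached the kernel. The sketch below is not from the thread:
has_boot_param is a hypothetical helper, and the sample command line is
made up for illustration.

```shell
#!/bin/sh
# has_boot_param: hypothetical helper. Succeeds if the given kernel command
# line (first argument) contains the exact parameter (second argument).
has_boot_param() {
    for word in $1; do
        [ "$word" = "$2" ] && return 0
    done
    return 1
}

# Example with a made-up command line:
sample="BOOT_IMAGE=/vmlinuz root=/dev/sda1 ro quiet cgroup_disable=memory"
if has_boot_param "$sample" "cgroup_disable=memory"; then
    echo "memory cgroup controller disabled at boot"
fi

# On a live system:
#   has_boot_param "$(cat /proc/cmdline)" "cgroup_disable=memory" && echo yes
```

The enabled column of /proc/cgroups gives a second, independent view of
whether the memory controller is active.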
** Changed in: linux (Ubuntu)
Assignee: Colin Ian King (colin-king) => Andrea Righi (arighi)
I had a vmstat 1 running; the entire time the system was swapping out, X
was unusable.
Thanks
** Attachment added: "vmstat 1 output"
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1861359/+attachment/5324495/+files/vmstat-1
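To pick the swap-out bursts out of a long vmstat 1 capture, a filter along
these lines may help. It is a sketch, not something used in the thread:
swapout_rows is a hypothetical helper, the sample rows are made up, and the
column position assumes the default procps vmstat layout.

```shell
#!/bin/sh
# swapout_rows: hypothetical helper. Reads `vmstat 1` output on stdin and
# prints only the samples whose "so" (swapped out per second) column is
# non-zero. Assumes the default column order:
#   r b swpd free buff cache si so bi bo in cs us sy id wa st
swapout_rows() {
    awk '$1 ~ /^[0-9]+$/ && $8 > 0 { print }'
}

# Made-up sample data, not taken from the attachment:
printf '%s\n' \
    ' r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa st' \
    ' 1  0      0 500000  10000 300000    0    0     0     0  100  200  1  1 98  0  0' \
    ' 0  1  20480 100000  10000 300000    0 4096     0  4100  900  800  2  5 60 33  0' \
    | swapout_rows
```

On a saved capture this would be run as swapout_rows < vmstat-1; header
lines are skipped because their first field is not numeric.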
** Changed in: linux (Ubuntu)
Assignee: (unassigned) => Colin Ian King (colin-king)
** Changed in: linux (Ubuntu)
Importance: Undecided => High
I forgot to mention, I also have nvme.
FYI, I decided to do this:
$ sudo swapoff -a && sudo swapon -a
$ free -h
              total        used        free      shared  buff/cache   available
Mem:           15Gi       5.9Gi       4.8Gi       2.0Gi       4.8Gi       7.2Gi
Swap:          15Gi       348Mi        15Gi
Even though I am no
Seth and I talked about this and I marked this as affects me. If it
helps, I saw this on eoan and focal doesn't make a difference (which
might suggest the change is between disco and eoan).
** Changed in: linux (Ubuntu)
Status: Incomplete => Confirmed
apport information
** Description changed:
Hello, several times since upgrading to focal from 19.04 I've found my
computer entirely unresponsive for periods of twenty or thirty seconds.
No mouse movement, no keyboard input, the screen output does not change.
My computer was using swap
apport information
** Tags added: apport-collected
** Description changed:
Hello, several times since upgrading to focal from 19.04 I've found my
computer entirely unresponsive for periods of twenty or thirty seconds.
No mouse movement, no keyboard input, the screen output does not
** Package changed: linux-signed-5.4 (Ubuntu) => linux (Ubuntu)
** Attachment added: "vmstat 1 output"
https://bugs.launchpad.net/ubuntu/+source/linux-signed-5.4/+bug/1861359/+attachment/5323949/+files/vmstat1
** Attachment added: "a small bit of top output after an event"
https://bugs.launchpad.net/ubuntu/+source/linux-signed-5.4/+bug/1861359/+attachment/5323950/+files/top