On Sat, 10 Feb 2018 14:23:34 +0100, Kai Krakow wrote:

> On Sat, 10 Feb 2018 02:16:44 +0200, Uoti Urpala wrote:
>
>> On Fri, 2018-02-09 at 12:41 +0100, Lennart Poettering wrote:
>>> These last log lines indicate journald wasn't scheduled for a long
>>> time, which caused the watchdog to hit, and journald was aborted.
>>> Consider increasing the watchdog timeout if your system is indeed that
>>> loaded and that is supposed to be an OK thing...
>>
>> BTW I've seen the same behavior on a system with a single active
>> process that uses enough memory to trigger significant swap use. I
>> wonder if there has been a regression in the kernel causing misbehavior
>> when swapping? The problems aren't specific to journald - the desktop
>> environment can totally freeze too, etc.
>
> This problem seems to have been there since kernel 4.9, which was a real
> PITA in this regard. It has been getting progressively better since
> kernel 4.10. Since then, the kernel seems to try to avoid swapping at
> any cost, at least at the cost of much higher latency and of pushing
> all cache out of RAM.
>
> The result is processes stuck for easily 30 seconds or more during
> memory pressure. Sometimes I see the kernel loudly complaining in dmesg
> about high wait times for allocating RAM, especially from the btrfs
> module. Thus, the biggest problem may be that kernel threads themselves
> get stuck in memory allocations and fall victim to the high latency.
>
> Currently I'm running my user session in a slice limited to 80% of RAM,
> which seems to help: it avoids discarding all cache. I also put some
> potentially high memory users (regarding cache and/or resident mem) into
> slices with carefully selected memory limits (backup and maintenance
> services). Slices limited in such a way will start swapping before cache
> is discarded, and everything works better again. Part of this problem
> may be that I have one process running which mmaps and locks 1G of
> memory (bees, a btrfs deduplicator).
>
> This system has 16G of RAM, which is usually plenty, but I use tmpfs to
> build packages in Gentoo, and while that worked wonderfully before 4.9,
> I have to be really careful now. The kernel happily throws away cache
> instead of swapping early. Setting vm.swappiness differently seems to
> have no perceivable effect.
>
> Software that uses mmap is the first latency victim of this new
> behavior. As such, systemd-journald also seems to be hit hard by this.
>
> After the system has recovered from high memory pressure (which can take
> 10-15 minutes, resulting in a loadavg of 400+), it ends up with some
> gigabytes of inactive memory in swap, which it only swaps back in
> during shutdown (which then also takes some minutes).
>
> The problem since 4.9 seems to be that the kernel tends to do swap
> storms instead of constantly swapping out memory at low rates during
> usage. The swap storms totally thrash the system.
>
> Before 4.9, the kernel had no such latency spikes under memory pressure.
> Swap usually grew slowly over time, and the system felt sluggish now
> and then but was still usable wrt latency. I usually ended up with
> 5-8G of swap usage, and that was no problem. Now, swap only grows
> significantly during swap storms, with the system unusable for many
> minutes and latencies of 10+ seconds around twice per minute.
>
> I have had no swap storm yet since the last boot, and swap usage is
> around 16M now. Before kernel 4.9, this would be much higher already.

After some more research, I found that vm.watermark_scale_factor may be
the knob I am looking for. I'm going to watch the behavior now with a
higher factor (default = 10, now 200).

-- 
Regards,
Kai

Replies to list-only preferred.
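
PS: To put the two factors into perspective, here is a rough
back-of-the-envelope sketch in Python. As far as I understand the kernel
documentation, vm.watermark_scale_factor is given in fractions of 10,000
of available memory, so 10 means 0.1% and 200 means 2%. The function name
and the simplification of treating all 16G as a single pool are my own
assumptions; the kernel does this accounting per zone and also takes
min_free_kbytes into account, so real numbers will differ.

```python
# Rough estimate of the distance between watermarks implied by
# vm.watermark_scale_factor (unit: fractions of 10,000 of memory).
# Simplification (assumption): all RAM is treated as one pool, and the
# min_free_kbytes contribution is ignored.

GIB = 1024 ** 3
MIB = 1024 ** 2


def watermark_gap_bytes(mem_bytes: int, scale_factor: int) -> int:
    """Approximate gap between watermarks for the given amount of memory."""
    return mem_bytes * scale_factor // 10_000


mem = 16 * GIB  # the 16G machine mentioned above

for factor in (10, 200):  # default value vs. the value being tried
    gap = watermark_gap_bytes(mem, factor)
    print(f"factor {factor:>3}: ~{gap / MIB:.0f} MiB between watermarks")
```

Under these assumptions that is roughly 16 MiB with the default versus
roughly 328 MiB with a factor of 200, which would give kswapd much more
headroom to start reclaiming before allocations stall.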