On 8/11/25 18:24, Sulev-Madis Silber wrote:
damn, if this is the same thing that i experience on 13. been battling it for a
long time now
the worst version of this is when wired grows to the entire ram size. then the
entire system becomes unusable, unless you never need userland
things that use mmap seem to be very good at triggering it, and having low storage
device write speeds "helps" as well
i'm not the only one who reports it
the problem also seems hard to debug, otherwise it would have been fixed?
i've seen a number of zfs-related wtf's over a decade but they do eventually get
fixed
in my case, i don't think it leaks. it does give it back on memory pressure, but
only to a point: wired stays super high and i can't see where it goes. it doesn't
show up in any of the lines that the various stats utils give
If the wired memory is taken up by the ZFS ARC, it should be given back
reasonably quickly, down to within the ARC's bounds. Unfortunately the
ZFS ARC is not the only thing that uses memory as wired, and wired
memory is not supposed to be swappable. That has been confusing for
those coming to ZFS from UFS, as UFS cache memory was more clearly
marked as 'temporarily using RAM while we can' by tools like top.
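
For anyone trying to see whether the wired total is really ARC or
something else, a rough starting point is something like the below
(sysctl names can differ slightly between releases, e.g. the older
spelling vfs.zfs.arc_max):

  # overall wired pages vs. what the ARC admits to using
  sysctl vm.stats.vm.v_wire_count
  sysctl kstat.zfs.misc.arcstats.size vfs.zfs.arc.max
  # kernel malloc consumers, largest first; the UMA zones in vmstat -z
  # are worth scanning too for where the rest of wired might be sitting
  vmstat -m | sort -rn -k3 | head -20
  vmstat -z | head -40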
i think that it rightfully assumes that free ram is wasted ram and just caches
something somewhere and never gives it back, or at least does so very slowly
And without caching, ZFS performance can be quite horrible:
copy-on-write causes noticeable fragmentation in all but write-once
files, while metadata similarly never seems to get regrouped
efficiently after edits unless a full copy rewrites it.
i for example observe it being ok for a while after a reboot, but after you start
actually using zfs, running scrubs and so on, it gets into a weird state where
it's slower. nothing fails, it just stays like that
I haven't tracked it down specifically, but I see ZFS get slower too.
My 'guess' is that it has trouble tracking which file metadata to keep
in memory once heavy memory pressure has been applied to the system
(Firefox definitely, or 'sometimes' a large poudriere job, seems easy
enough to trigger it). Performance stays down while the processes
causing the memory pressure are still running, and it doesn't always
recover after closing them.
I see the biggest performance hits with fragmented ZFS filesystem
metadata, which reads very slowly from magnetic disk.
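
A crude way to compare is to time a metadata-only walk shortly after a
reboot and again once the pool has seen real use (the path below is
just an example):

  # metadata-heavy walk of a dataset; compare the times between runs
  /usr/bin/time -h find /tank/backup -type f | wc -l
  # watch the disks while it runs; a trickle of small scattered reads
  # usually points at fragmented metadata
  gstat -p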
I think atime updates were one cause of it, which is likely why
installs have atime disabled on all but one ZFS dataset; I haven't
tested whether relatime is still much of a source of it. I saw very
high performance on a pool that had received a backup of my full
system (either 20 MB/s or 200 MB/s doing directory listings after the
transfer and a clean reboot, I can't remember which) and found very
bad performance checking it a short time later (<2 MB/s was not
uncommon after cron jobs ran, so likely the locate database,
permission checks, pkg checks, etc. had been 'accessing' the content).
I had disabled atime for the backup pool and set it read-only to try
to mitigate such unnecessary performance drops, but never did a lot of
testing around it.
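
For reference, the settings involved look roughly like this (the
pool/dataset name is just an example; relatime is its own property if
you want to test that separately):

  # check what is currently set
  zfs get atime,relatime,readonly tank/backup
  # what I used on the backup pool: no atime updates and read-only
  zfs set atime=off tank/backup
  zfs set readonly=on tank/backup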
Other disk modifications/deletions cause it too. Git runs slower over
time doing updates, checking whether the tree is in a clean state, and
even running 'git log' after a series of updates. Running `git gc`
seems to help, but seemed less effective when run every time instead
of waiting until a number of pulls and maybe other activities had
caused a slowdown first. A ccache cache is another candidate for
horrible performance. I have a 61 GB ccache 4 cache that has been
through 301 cleanups and seems to be getting slower, but I've had a
<20 GB cache running more than 15x slower than this one after a good
number of cleanups. Not permitting cleanups until you are willing to
make a copy seems the best choice for performance. Running
`ccache -X 13`, or any number greater than the original compression
level, seems to help reorder it somewhat, but requires ccache 4 (which
is more efficient, but misses some compiler substitutions in poudriere
builds, so less can be cache accelerated).
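
The maintenance commands I mean, spelled out (compression level 13 is
just what I used; anything above the level the cache was written with
forces a rewrite):

  # repack a repository once it has slowed down
  git count-objects -v   # shows how much loose vs. packed data exists
  git gc
  # ccache 4 only: recompress at a higher level, which rewrites (and
  # somewhat reorders) the cached files
  ccache -s
  ccache -X 13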
I normally 'fix' the performance degradation by moving the impacted
area aside and copying it back (e.g. `mv /var/cache/ccache
/var/cache/ccache.orig && cp -a /var/cache/ccache.orig
/var/cache/ccache`, then rm -rf the original if the copy completes
without error), but that sometimes becomes more troublesome if the
folder is a dataset instead of just a folder. It gets silly looking
for and managing candidates, as even things like a Firefox or
Thunderbird profile get a lot slower with metadata fragmentation. I
haven't tried tools that rewrite files in place to see if that is
enough to get the metadata rewritten well without a full copy being
temporarily stored.
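
When the slow area is its own dataset, the rough equivalent I'd sketch
is a local send/receive into a fresh dataset followed by a rename
(names below are examples; mountpoints and properties need checking
before destroying anything):

  zfs snapshot tank/ccache@copy
  zfs send tank/ccache@copy | zfs receive tank/ccache.new
  zfs rename tank/ccache tank/ccache.orig
  zfs rename tank/ccache.new tank/ccache
  # destroy tank/ccache.orig and the @copy snapshot once the new copy
  # checks out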
I haven't tracked how much impact snapshots vs. no snapshots have on
slowing down datasets that have been through a lot of
modification/deletion. Making copies this way will increase disk usage
if dedupe and block cloning are not in use, and last I heard
block-cloned copies expand out to their own blocks after ZFS
replication.
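
If the pool is new enough for block cloning (OpenZFS 2.2 or later, so
FreeBSD 14.x), whether it is active and saving space can be checked
with something like:

  zpool get feature@block_cloning tank
  zpool list -o name,bcloneused,bclonesaved,bcloneratio tank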
I find that turning swap devices off and back on after memory
conditions are reasonable again helps, but I don't know if it's truly
related. If you don't have swap, try adding some and note whether that
has any impact too.
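
By turning swap off and on I mean roughly:

  swapinfo -h   # see what is configured and how much is in use
  swapoff -a    # needs enough free RAM to absorb what gets swapped in
  swapon -a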
anyone else with such issues?
On August 11, 2025 11:18:47 PM GMT+03:00, Mark Millard <[email protected]>
wrote:
Context reported by notafet:
14.3-RELEASE-p2 GENERIC on amd64 with ZFS in use
RAM: looks to be 24 GiBytes, not explicitly mentioned
SWAP: 8192 MiBytes
(From using the image of top's figures.)
Wired: 17 GiBytes
ARC Total: 1942 MiBytes
SWAP used: 1102 MiBytes
The link to the storage channel's message is:
https://discord.com/channels/727023752348434432/757305697527398481/1404367777904463914
===
Mark Millard
marklmi at yahoo.com