[ceph-users] Re: Very slow snaptrim operations blocking client I/O

2023-01-28 Thread Victor Rodriguez
On 1/29/23 00:50, Matt Vandermeulen wrote:
> I've observed a similar horror when upgrading a cluster from Luminous to Nautilus, which had the same effect of an overwhelming amount of snaptrim making the cluster unusable. In our case, we held its hand by setting all OSDs to have zero max trimmin

[ceph-users] Re: Very slow snaptrim operations blocking client I/O

2023-01-28 Thread Matt Vandermeulen
I've observed a similar horror when upgrading a cluster from Luminous to Nautilus, which had the same effect of an overwhelming amount of snaptrim making the cluster unusable. In our case, we held its hand by setting all OSDs to have zero max trimming PGs, unsetting nosnaptrim, and then slowly
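
For readers following along, the first two steps described above might look roughly like the sketch below. It makes assumptions that the message does not confirm: that "max trimming PGs" refers to the `osd_max_trimming_pgs` option, and that it is set cluster-wide with `ceph config set` rather than by some other mechanism. The `run` helper, the `DRY_RUN` variable, and the function name are hypothetical, added only so the commands can be previewed without touching a cluster. How the limit was raised afterwards is not shown here.

```shell
#!/bin/sh
# Sketch only. Dry-run by default: commands are printed, not executed.
# Set DRY_RUN=0 to run them against a real cluster.
DRY_RUN="${DRY_RUN:-1}"

# Hypothetical helper: print the command in dry-run mode, otherwise run it.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

throttle_then_enable_snaptrim() {
    # Step 1 (assumed option name): let each OSD trim zero PGs at a time.
    run ceph config set osd osd_max_trimming_pgs 0
    # Step 2: clear the cluster-wide nosnaptrim flag.
    run ceph osd unset nosnaptrim
}

throttle_then_enable_snaptrim
```

Run as-is, the script only prints the two `ceph` commands it would issue, so they can be reviewed before setting `DRY_RUN=0`.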

[ceph-users] Re: Very slow snaptrim operations blocking client I/O

2023-01-28 Thread Victor Rodriguez
After some investigation, this is what I'm seeing:
- OSD processes get stuck at 100% CPU or more if I run ceph osd unset nosnaptrim. They stay at 100% CPU even if I then run ceph osd set nosnaptrim, and they stayed like that for at least 26 hours. Some quick benchmarks don't show a reduction of the performance