Another update: Giovanna agreed to switch back to mclock_scheduler and
adjust osd_snap_trim_cost to 400K. It looks very promising: after a
few hours the snaptrim queue was processed.
@Sridhar: thanks a lot for your valuable input!
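For anyone finding this thread later, the change described above boils down to something like the following sketch (assuming "400K" means 409600 bytes; osd_op_queue is only read at OSD startup, so the OSDs need a restart afterwards, in this Rook setup via the OSD pods):
ceph config set osd osd_op_queue mclock_scheduler
ceph config set osd osd_snap_trim_cost 409600   # 400K
# restart the OSDs so the scheduler change takes effect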
Quoting Eugen Block:
Quick update: we decided to switch to wpq to see if that would confirm
our suspicion, and it did. After a few hours all PGs in the snaptrim
queue had been processed. We haven't looked into the average object
sizes yet, maybe we'll try that approach next week or so. If you have
any other ideas, let me know.
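For reference, a rough way to look at the average object size of a pool is to divide its stored bytes by its object count, e.g.:
ceph df detail   # note the STORED and OBJECTS columns for the cephfs data pool
# average object size is roughly STORED / OBJECTS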
Hi,
as expected the issue is not resolved and turned up again a couple of
hours later. Here's the tracker issue:
https://tracker.ceph.com/issues/67702
I also attached a log snippet from one osd with debug_osd 10 to the
tracker. Let me know if you need anything else, I'll stay in touch.
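For anyone reproducing this, collecting such a log snippet looks roughly like this (osd.5 is just an example id; with Rook the output ends up in the OSD pod's log rather than /var/log/ceph):
ceph tell osd.5 config set debug_osd 10
# wait while a PG sits in snaptrim, collect the OSD log
ceph tell osd.5 config set debug_osd 1/5   # back to the default level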
Hi Eugen,
On Fri, Aug 23, 2024 at 1:37 PM Eugen Block wrote:
Hi again,
I have a couple of questions about this.
What exactly happened to the PGs? They were queued for snaptrimming,
but we didn't see any progress. Let's assume the average object size
in that pool was around 2 MB (I don't have the actual numbers). Does
that mean if osd_snap_trim_cost (
Oh yeah, I think I stumbled upon that as well, but then it slipped my
mind again. Thanks for pointing that out, I appreciate it!
Quoting Sridhar Seshasayee:
Hi Eugen,
There was a PR (https://github.com/ceph/ceph/pull/55040) related to mClock
and snaptrim that was backported and is available from v18.2.4. The fix more
accurately determines the cost (instead of the priority used with wpq) of the
snaptrim operation based on the average size of the objects in the PG.
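A quick way to check whether all daemons already run a release containing that backport (v18.2.4 or later):
ceph versions   # lists the running version per daemon type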
I know, I know, but since the rest seemed to work well I didn't want
to change it yet but rather analyze what else was going on. And since
we found a few things, it was worth it. :-)
Quoting Joachim Kraftmayer:
Hi Eugen,
the first thing that came to my mind was to replace mclock with wpq.
Joachim
Eugen Block wrote on Thu, 22 Aug 2024, 14:31:
Just a quick update on this topic. I assisted Giovanna directly off
list. For now the issue seems resolved, although I don't think we
really fixed anything but rather got rid of the current symptoms.
A couple of findings for posterity:
- There's a k8s pod creating new snap-schedules every co
Hello Eugen,
Hi (please don't drop the ML from your responses),
Sorry. I didn't pay attention. I will.
Hi (please don't drop the ML from your responses),
All PGs of pool cephfs are affected and they are in all OSDs
then just pick a random one and check if anything stands out. I'm not
sure if you mentioned it already, did you also try restarting OSDs?
Oh, not yesterday. I'll do it now, then I
PS, here is the command:
[rook@rook-ceph-tools-5459f7cb5b-p55np /]$ ceph pg dump | grep snaptrim
| grep -v 'snaptrim_wait' | awk '{print $18}' | sort | uniq
dumped all
0
10
11
2
8
9
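A variant of the same check that does not depend on the column position, as a sketch using the JSON dump (field names assumed from recent releases; adjust if your output differs):
ceph pg dump -f json 2>/dev/null | jq -r '
  .pg_map.pg_stats[]
  | select((.state | test("snaptrim")) and ((.state | test("snaptrim_wait")) | not))
  | .up_primary' | sort | uniq -c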
On 20.08.2024 at 10:25, MARTEL Arnaud wrote:
I had this problem once in the past and found that it was related to
Hello Arnaud,
I have all 6 OSDs in the list :-(.
Thanks for the idea, maybe it could help other users.
Regards,
Giovanna
On 20.08.2024 at 10:25, MARTEL Arnaud wrote:
ceph pg dump | grep snaptrim | grep -v 'snaptrim_wait'
--
Giovanna Ratini
Mail:rat...@dbvis.inf.uni-konstanz.de
Phone: +49 (0) 7
I had this problem once in the past and found that it was related to a
particular osd. To identify it, I ran the command "ceph pg dump | grep snaptrim
| grep -v 'snaptrim_wait'" and found that the osd displayed in the "UP_PRIMARY"
column was almost always the same.
So I restarted this osd and t
Did you reduce the default values I mentioned? You could also look
into the historic_ops of the primary OSD for one affected PG:
ceph tell osd.<id> dump_historic_ops_by_duration
But I'm not sure if that can actually help here. There are plenty of
places to look at, you could turn on debug logs o
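Putting that together for one affected PG (PG 3.12 from later in this thread is only used as an example):
ceph pg map 3.12                                   # shows the up/acting set; the first OSD is the primary
ceph tell osd.<id> dump_historic_ops_by_duration   # replace <id> with that primary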
Hello Eugen,
yesterday, after stopping and restarting snaptrim, the queue decreased a
little and then remained blocked.
It didn't grow and it didn't shrink.
Is that good or bad?
On 19.08.2024 at 15:43, Eugen Block wrote:
There's a lengthy thread [0] where several approaches are proposed.
Hello Eugen,
root@kube-master02:~# k ceph config get osd osd_pg_max_concurrent_snap_trims
Info: running 'ceph' command with args: [config get osd
osd_pg_max_concurrent_snap_trims]
2
root@kube-master02:~# k ceph config get osd osd_max_trimming_pgs
Info: running 'ceph' command with args: [config
There's a lengthy thread [0] where several approaches are proposed.
The worst is an OSD recreation, but that's the last resort, of course.
What are the current values for these configs?
ceph config get osd osd_pg_max_concurrent_snap_trims
ceph config get osd osd_max_trimming_pgs
Maybe decrea
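If the suggestion that got cut off above was to lower those values (the cluster reports 2 for the first one), a minimal sketch would be:
ceph config set osd osd_pg_max_concurrent_snap_trims 1
ceph config set osd osd_max_trimming_pgs 1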
Hello Eugen,
yes, the load is not too high for now.
I stopped the snaptrim and this is the output now. No changes in the queue.
root@kube-master02:~# k ceph -s
Info: running 'ceph' command with args: [-s]
cluster:
id: 3a35629a-6129-4daf-9db6-36e0eda637c7
health: HEALTH_WARN
n
What happens when you disable snaptrimming entirely?
ceph osd set nosnaptrim
So the load on your cluster seems low, but are the OSDs heavily
utilized? Have you checked iostat?
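For reference, pausing snaptrim cluster-wide and checking device utilization could look like this (iostat is run on the OSD hosts):
ceph osd set nosnaptrim
ceph osd dump | grep flags   # should now include nosnaptrim
iostat -x 5                  # watch %util of the OSD devices
ceph osd unset nosnaptrim    # re-enable trimming afterwards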
Quoting Giovanna Ratini:
Hello Eugen,
root@kube-master02:~# k ceph -s
Info: running 'ceph' command with args: [-s]
cluster:
id: 3a35629a-6129-4daf-9db6-36e0eda637c7
health: HEALTH_WARN
32 pgs not deep-scrubbed in time
32 pgs not scrubbed in time
services:
mon: 3 daemons, qu
Can you share the current ceph status? Are the OSDs reporting anything
suspicious? How is the disk utilization?
Quoting Giovanna Ratini:
More information:
The snaptrim takes a lot of time but objects_trimmed is "0":
"objects_trimmed": 0,
"snaptrim_duration": 500.5807601752,
That could explain why the queue keeps growing.
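Those counters can be pulled for a single PG as well, e.g. (grep keeps this independent of the exact JSON layout, assuming the fields appear in the query output as they do here):
ceph pg 3.12 query | grep -E 'snap_trimq_len|objects_trimmed|snaptrim_duration'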
On 17.08.2024 at 14:37, Giovanna Ratini wrote:
Hello again,
I checked the pg dump. The snapshot queue keeps growing.
Query for PG: 3.12
{
"snap_trimq":
"[5b974~3b,5cc3a~1,5cc3c~1,5cc3e~1,5cc40~1,5cd83~1,5cd85~1,5cd87~1,5cd89~1,5cecc~1,5cece~4,5ced3~2,5cf72~1,5cf74~4,5cf79~a2,5d0b8~1,5d0bb~1,5d0bd~a5,5d1f9~2,5d204~a5,5d349~a7,5d48e~3,5d493~a4,5d5d7~a7,5
Hello Eugen,
thank you for your answer.
I restarted all the kube-ceph nodes one after the other. Nothing has
changed.
OK, I deactivated the snapshots with: ceph fs snap-schedule deactivate /
Is there a way to see how many snapshots will be deleted per hour?
Regards,
Gio
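To see which schedules exist and whether they are still active, and to count the snapshots already present, something like this should work (the mount path is just an example):
ceph fs snap-schedule status /
ceph fs snap-schedule list /
ls /mnt/cephfs/.snap | wc -l   # on a client mount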
On 17.08.2024 at 10:
Hi,
have you tried to fail the mgr? Sometimes the PG stats are not
correct. You could also temporarily disable snapshots to see if things
settle down.
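Failing over the active mgr is just:
ceph mgr stat   # shows the currently active mgr
ceph mgr fail   # on older releases, pass the active mgr's name explicitly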
Quoting Giovanna Ratini:
Hello all,
We use Ceph (v18.2.2) and Rook (1.14.3) as the CSI for a Kubernetes
environment. Last week, we h