[ceph-users] Re: CephFS metadata pool size

2024-06-13 Thread Lars Köppel
I updated from 17.2.6 to 17.2.7 and a few hours later to 18.2.2. Would it be an option to go back to 17.2.6?

[ceph-users] Re: Safe to move misplaced hosts between failure domains in the crush tree?

2024-06-13 Thread Bandelow, Gunnar
Hi Torkil, Maybe I'm overlooking something, but how about just renaming the datacenter buckets? Best regards, Gunnar
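For reference, renaming CRUSH buckets changes only the labels; as long as no CRUSH rule takes a datacenter bucket by name, no PGs should be remapped. A minimal sketch, assuming the two datacenter buckets are called DC1 and DC2 (placeholder names, not from the thread):

    # Swap the two datacenter labels via a temporary name; only labels change
    ceph osd crush rename-bucket DC1 DC-tmp
    ceph osd crush rename-bucket DC2 DC1
    ceph osd crush rename-bucket DC-tmp DC2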

[ceph-users] Re: Safe to move misplaced hosts between failure domains in the crush tree?

2024-06-13 Thread Torkil Svensgaard
On 13/06/2024 12:17, Bandelow, Gunnar wrote: Hi Torkil, Hi Gunnar Maybe I'm overlooking something, but how about just renaming the datacenter buckets? Here's the ceph osd tree command header and my pruned tree: ID  CLASS  WEIGHT  TYPE NAME  STATUS  REWEIGHT  PRI-AFF

[ceph-users] Re: Safe to move misplaced hosts between failure domains in the crush tree?

2024-06-13 Thread Torkil Svensgaard
On 13/06/2024 08:54, Janne Johansson wrote: We made a mistake when we moved the servers physically, so while replica 3 is intact the crush tree is not accurate. If we just remedy the situation with "ceph osd crush move ceph-flashX datacenter=Y" we will just end up with a lot of misplaced data
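Should the buckets need to be physically moved in the tree rather than renamed, a rough sketch of the move itself, with host and datacenter names as placeholders; unlike a pure rename this does remap data, and setting norebalance only defers the backfill until it is unset:

    # Pause rebalancing while the tree is corrected
    ceph osd set norebalance
    ceph osd crush move ceph-flash1 datacenter=DC2
    ceph osd crush move ceph-flash2 datacenter=DC1
    # Verify the tree, then let the misplaced objects move
    ceph osd tree
    ceph osd unset norebalance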

[ceph-users] Re: CephFS metadata pool size

2024-06-13 Thread Eugen Block
Downgrading isn't supported; I don't think that would be a good idea. I also don't see anything obvious standing out in the pg output. Any chance you can add more OSDs to the metadata pool to see if it stops at some point? Did the cluster usage change in any way? For example cephfs snapshots
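One way to watch whether the metadata pool keeps growing is to sample the pool stats over time; a small sketch, where cephfs_metadata is a placeholder pool name:

    # Pool-level usage, repeated over time to see the trend
    ceph df detail
    # MDS and pool overview for the filesystem
    ceph fs status
    # PGs of the metadata pool, with their per-PG sizes
    ceph pg ls-by-pool cephfs_metadata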

[ceph-users] Re: CephFS metadata pool size

2024-06-13 Thread Lars Köppel
We have been using snapshots for a long time. The only change in usage is that we are currently deleting many small files from the system. Because this is slow (~150 requests/s), it has been running for the last few weeks. Could such a load result in a problem with the MDS? I have to ask for permission
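For context, one way to see whether deletions are piling up inside the MDS is to look at the stray and purge-queue counters; a sketch, run on the MDS host, with mds.<name> being a placeholder for the actual daemon name:

    # Stray directory entries waiting to be purged
    ceph daemon mds.<name> perf dump mds_cache | grep -i stray
    # Purge queue backlog and throughput
    ceph daemon mds.<name> perf dump purge_queue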

[ceph-users] Re: CephFS metadata pool size

2024-06-13 Thread Eugen Block
I'm quite sure that this could result in the impact you're seeing. To confirm that suspicion you could stop deleting and wait a couple of days to see if the usage stabilizes. And if it does, maybe delete fewer files at once to see how far you can tweak it. That would be my approach.

[ceph-users] Re: Testing CEPH scrubbing / self-healing capabilities

2024-06-13 Thread Frédéric Nass
Hello, 'ceph osd deep-scrub 5' deep-scrubs all PGs for which osd.5 is primary (and only those). You can check that from ceph-osd.5.log by running: for pg in $(grep 'deep-scrub starts' /var/log/ceph/*/ceph-osd.5.log | awk '{print $8}') ; do echo "pg: $pg, primary osd is osd.$(ceph pg $pg query -
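One possible way the check could be completed (a sketch, not the original command; the jq field path is an assumption and may differ between releases):

    for pg in $(grep 'deep-scrub starts' /var/log/ceph/*/ceph-osd.5.log | awk '{print $8}'); do
        # pg query returns JSON; pull the acting primary out of the pg stats
        echo "pg: $pg, primary osd is osd.$(ceph pg $pg query | jq -r '.info.stats.acting_primary')"
    done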

[ceph-users] Re: CephFS metadata pool size

2024-06-13 Thread Lars Köppel
OK. Thank you for your help. We will try this and report back in a few days.

[ceph-users] deep scrub and scrub do not get the job done

2024-06-13 Thread Manuel Oetiker
Hi, our cluster has been in warning state for more than two weeks. We had to move some pools from ssd to hdd and that looked good ... but somehow the PGs don't get their scrub jobs done: * PG_NOT_DEEP_SCRUBBED: 171 pgs not deep-scrubbed in time * PG_NOT_SCRUBBED: 132 pgs not scrubbed in time
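For reference, a few commands that are often used to chase this down; the pg id and the value below are placeholders, and defaults differ between releases:

    # Which PGs are overdue?
    ceph health detail | grep -E 'not (deep-)?scrubbed'
    # Queue a deep scrub for one of them by hand
    ceph pg deep-scrub 12.7f
    # Allow more than one concurrent scrub per OSD if the disks can take it
    ceph config set osd osd_max_scrubs 2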

[ceph-users] Ceph crash :-(

2024-06-13 Thread Ranjan Ghosh
Hi all, I just upgraded the first node of our cluster to Ubuntu 24.04 (Noble) from 23.10 (Mantic). Unfortunately Ceph doesn't work anymore on the new node: ===  ceph version 19.2.0~git20240301.4c76c50 (4c76c50a73f63ba48ccdf0adccce03b00d1d80c7) squid (dev)  1: /lib/x86_64-linux-gnu/libc.so.6

[ceph-users] Re: Ceph crash :-(

2024-06-13 Thread Robert Sander
On 13.06.24 18:18, Ranjan Ghosh wrote: What's more, APT says I now got a Ceph version (19.2.0~git20240301.4c76c50-0ubuntu6) which doesn't even have any official release notes: Ubuntu 24.04 ships with that version from a git snapshot. You have to ask Canonical why they did this. I would not use it

[ceph-users] Re: [SPAM] Re: Ceph crash :-(

2024-06-13 Thread Sebastian
If this is one node of many, it's not a problem, because you can reinstall the system and Ceph and rebalance the cluster. BTW, read the release notes beforehand :) I don't read them for my personal desktop either, but on servers where I keep data I do. But what Canonical did in this case is… this

[ceph-users] Re: Performance issues RGW (S3)

2024-06-13 Thread sinan
I have been doing some further testing. My RGW pool is placed on spinning disks. I created a 2nd RGW data pool, placed on flash disks. Benchmarking on HDD pool: Client 1 -> 1 RGW Node: 150 obj/s Client 1-5 -> 1 RGW Node: 150 obj/s (30 obj/s each client) Client 1 -> HAProxy -> 3 RGW Nodes: 150 obj/s Cl
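For context, a flash-backed RGW data pool is usually wired up as a separate placement target; a rough sketch, with all pool, zone and placement names being assumptions rather than the poster's actual setup:

    radosgw-admin zonegroup placement add --rgw-zonegroup default --placement-id flash-placement
    radosgw-admin zone placement add --rgw-zone default --placement-id flash-placement \
        --data-pool default.rgw.flash.data \
        --index-pool default.rgw.flash.index \
        --data-extra-pool default.rgw.flash.non-ec
    # Commit the period if the zone is part of a multisite configuration
    radosgw-admin period update --commit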

[ceph-users] Re: Performance issues RGW (S3)

2024-06-13 Thread Anthony D'Atri
How large are the objects you tested with? > On Jun 13, 2024, at 14:46, si...@turka.nl wrote: > > I have been doing some further testing. > > My RGW pool is placed on spinning disks. > I created a 2nd RGW data pool, placed on flash disks. > > Benchmarking on HDD pool: > Client 1 -> 1 RGW Node: 150

[ceph-users] Re: Ceph crash :-(

2024-06-13 Thread Robert Sander
Hi, On 13.06.24 20:29, Ranjan Ghosh wrote: Other Ceph nodes run on 18.2 which came with the previous Ubuntu version. I wonder if I could easily switch to Ceph packages or whether that would cause even more problems. Perhaps it's more advisable to wait until Ubuntu releases proper packages.

[ceph-users] Re: [SPAM] Re: Ceph crash :-(

2024-06-13 Thread David C.
Debian unstable. The situation is absolutely not dramatic, but if this is a large production setup you should have product support. Based on the geographical area of your email domain, perhaps ask Robert for a local service? On Thu, 13 Jun 2024 at 20:35, Ranjan Ghosh wrote: > I'm still in

[ceph-users] Re: [SPAM] Re: Ceph crash :-(

2024-06-13 Thread Ranjan Ghosh
I'm still in doubt whether any reinstall will fix this issue, because the packages seem to be buggy and there are no better packages for Ubuntu 24.04 right now, it seems. Canonical is really crazy if you ask me. That would be questionable even for a non-LTS version, but especially for an LTS version. What were they thinking

[ceph-users] Re: Ceph crash :-(

2024-06-13 Thread David C.
In addition to Robert's recommendations, remember to respect the update order (mgr -> mon -> (crash ->) osd -> mds -> ...). Before everything was containerized, it was not recommended to have different services on the same machine. On Thu, 13 Jun 2024 at 19:37, Robert Sander wrote: > On 13.06.24 18:

[ceph-users] Re: Patching Ceph cluster

2024-06-13 Thread Sake Ceph
Yeah, we fully automated this with Ansible. In short we do the following (a rough shell sketch follows below):
1. Check if the cluster is healthy before continuing (via the REST API); only HEALTH_OK is good.
2. Disable scrub and deep-scrub.
3. Update all applications on all the hosts in the cluster.
4. For every host, one by one, do the following
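A rough shell sketch of that sequence; it is not the actual playbook, and the real automation checks health via the REST API rather than the CLI:

    # 1. Only proceed when the cluster is healthy
    ceph health | grep -q HEALTH_OK || exit 1
    # 2. Pause scrubbing during the maintenance window
    ceph osd set noscrub
    ceph osd set nodeep-scrub
    # 3./4. Per host: keep OSDs in, patch, reboot, wait for recovery
    ceph osd set noout
    # ... apply OS updates and reboot the host here ...
    ceph osd unset noout
    # Re-enable scrubbing once every host is done
    ceph osd unset noscrub
    ceph osd unset nodeep-scrub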

[ceph-users] Re: Performance issues RGW (S3)

2024-06-13 Thread sinan
Disabling Nagle didn't have any effect. I created a new RGW pool (data, index), both on flash disks. No effect. I set size=2, no effect. Btw, the cluster is running Octopus (15.2). When using 3 MB objects, I am still getting 150 objects/s, just with a higher throughput (150 x 3 MB = 450 MB/s). But

[ceph-users] Re: Performance issues RGW (S3)

2024-06-13 Thread Anthony D'Atri
There you go. Tiny objects are the hardest thing for any object storage service: you can have space amplification and metadata operations become a very high portion of the overall workload. With 500KB objects, you may waste a significant fraction of underlying space -- especially if you have
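As an illustration of the space amplification, under assumed (not stated) parameters of an EC 4+2 data pool and the pre-Pacific HDD default of a 64 KB bluestore min_alloc_size: a 500 KB object splits into four 125 KB data chunks, each rounded up to 128 KB, so 6 x 128 KB = 768 KB lands on disk instead of the nominal 750 KB. Shrink the object to 50 KB and each of the six shards still occupies a full 64 KB allocation unit, i.e. 384 KB on disk for 50 KB of data, roughly 7.7x the raw size. The smaller the object, the worse the ratio, and the per-object index and metadata work starts to dominate as well.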

[ceph-users] Can't comment on my own tracker item any more

2024-06-13 Thread Frank Schilder
Hi all, I just received a notification about a bug I reported 4 years ago (https://tracker.ceph.com/issues/45253): > Issue #45253 has been updated by Victoria Mackie. I would like to leave a comment, but the comment function seems not to be available any more even though I'm logged in and I'm repor

[ceph-users] Re: Can't comment on my own tracker item any more

2024-06-13 Thread Frank Schilder
OK, I can click on the little "quote" symbol and then a huge dialog opens that says "edit" but means "comment". Would it be possible to add the simple comment action again? Also, the fact that the quote action removes nested text makes it a little less useful than it could be. I had to copy the code

[ceph-users] Re: Performance issues RGW (S3)

2024-06-13 Thread Sinan Polat
500K object size > On 13 Jun 2024 at 21:11, Anthony D'Atri wrote: > > How large are the objects you tested with? > >> On Jun 13, 2024, at 14:46, si...@turka.nl wrote: >> >> I have been doing some further testing. >> >> My RGW pool is placed on spinning disks. >> I crea

[ceph-users] Re: deep scrub and scrub do not get the job done

2024-06-13 Thread Frank Schilder
Yes, there is: https://github.com/frans42/ceph-goodies/blob/main/doc/TuningScrub.md This is work in progress and a few details are missing, but it should help you find the right parameters. Note that this is tested on octopus with WPQ. Best regards, Frank Schilder
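The exact knobs that guide recommends may differ, but the usual scrub-related settings can be inspected (and later adjusted) with ceph config; a sketch:

    ceph config get osd osd_max_scrubs
    ceph config get osd osd_scrub_load_threshold
    ceph config get osd osd_deep_scrub_interval
    ceph config get osd osd_scrub_sleep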

[ceph-users] Are ceph commands backward compatible?

2024-06-13 Thread Satoru Takeuchi
Hi, I'm developing some tools that execute ceph commands like rbd. During development, I have come to wonder about the compatibility of ceph commands. I'd like to use ceph commands whose version is >= the version used by the ceph daemons. That results in executing new ceph commands against ceph clusters using older versions
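A quick way to compare what the client tools and the daemons are running before relying on newer command behaviour (no assumptions beyond a working admin keyring):

    # Version of the locally installed CLI
    ceph --version
    # Versions of the running mons/mgrs/osds/mds
    ceph versions
    # Release/feature level of currently connected clients
    ceph features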

[ceph-users] Re: Patching Ceph cluster

2024-06-13 Thread Michael Worsham
I'd love to see what your playbook(s) look like for doing this. -- Michael From: Sake Ceph Sent: Thursday, June 13, 2024 4:05 PM To: ceph-users@ceph.io Subject: [ceph-users] Re: Patching Ceph cluster

[ceph-users] Separated multisite sync and user traffic, doable?

2024-06-13 Thread Szabo, Istvan (Agoda)
Hi, Could it cause any issue if the endpoints defined in the zonegroups are not in the endpoint list behind haproxy? The question is mainly about the role of the endpoint servers in the zonegroup list. Is their role sync only, or something else as well? This would be the scenario, could it work
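For reference, the zonegroup endpoints can be listed and changed independently of whatever haproxy publishes to clients; a sketch with placeholder names and URLs:

    # Show the endpoints currently recorded in the zonegroup
    radosgw-admin zonegroup get
    # Point the zonegroup at dedicated sync RGWs
    radosgw-admin zonegroup modify --rgw-zonegroup default \
        --endpoints http://sync-rgw1:8080,http://sync-rgw2:8080
    radosgw-admin period update --commit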