[ceph-users] Re: What's the best way to add numerous OSDs?

2024-08-06 Thread Boris
Hi Fabien, in addition to what Anthony said you could do the following: - `ceph osd set nobackfill` to disable initial backfilling - `ceph config set osd osd_mclock_override_recovery_settings true` to override the mclock scheduler backfill settings - Let the orchestrator add one host at a time. I w
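As a rough sketch of that flow (assuming a cephadm-managed Quincy or later cluster; the host name is a placeholder):

    # pause backfilling while the new OSDs come in
    ceph osd set nobackfill
    # allow recovery/backfill limits to be tuned manually despite mClock
    ceph config set osd osd_mclock_override_recovery_settings true
    # add hosts one at a time and let the orchestrator create their OSDs
    ceph orch host add <new-host>
    # once everything is up and in, re-enable backfilling
    ceph osd unset nobackfill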

[ceph-users] Re: What's the best way to add numerous OSDs?

2024-08-06 Thread Fox, Kevin M
Some kernels (el7?) lie about being Jewel until after they are blocked from connecting at Jewel; then they report newer. Just FYI. From: Anthony D'Atri Sent: Tuesday, August 6, 2024 5:08 PM To: Fabien Sirjean Cc: ceph-users Subject: [ceph-users] Re: What'

[ceph-users] Re: What's the best way to add numerous OSDs?

2024-08-06 Thread Anthony D'Atri
Since they’re 20TB, I’m going to assume that these are HDDs. There are a number of approaches. One common theme is to avoid rebalancing until after all have been added to the cluster and are up / in, otherwise you can end up with a storm of map updates and superfluous rebalancing. One strateg
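One possible way to hold off data movement until every new OSD is up and in, sketched here with the standard cluster flags (only one of several strategies):

    # before adding any OSDs
    ceph osd set norebalance
    ceph osd set nobackfill
    # ... add all new OSDs ...
    # after all of them are up and in
    ceph osd unset norebalance
    ceph osd unset nobackfill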

[ceph-users] Re: Ceph MDS failing because of corrupted dentries in lost+found after update from 17.2.7 to 18.2.0

2024-08-06 Thread Justin Lee
The actual mount command doesn't hang, we just can't interact with any of the directory's contents once mounted. I couldn't find anything unusual in the logs. Best, Justin Lee On Fri, Aug 2, 2024 at 10:38 AM Dhairya Parmar wrote: > So the mount hung? Can you see anything suspicious in the logs?

[ceph-users] RGW sync gets stuck every day

2024-08-06 Thread Olaf Seibert
Hi all, we have some Ceph clusters with RGW replication between them. It seems that for the last month at least, it gets stuck at around the same time ~every day. Not 100% the same time, and also not 100% of the days, but in more recent days it seems to happen more, and for longer. With "stuc

[ceph-users] Re: Can you return orphaned objects to a bucket?

2024-08-06 Thread vuphung69
Hi, currently I see it only supports the latest version; is there any way to support old versions like Pacific or Quincy?

[ceph-users] RGW bucket notifications stop working after a while and blocking requests

2024-08-06 Thread Florian Schwab
Hi, we just set up 2 new Ceph clusters (using Rook). To do some processing of the user activity, we configured a topic that sends events to Kafka. After 5-12 hours this stops working with a 503 SlowDown response: debug 2024-08-02T09:17:58.205+ 7ff4359ad700 1 req 13681579273117692719 0.005000

[ceph-users] Re: Ceph MDS failing because of corrupted dentries in lost+found after update from 17.2.7 to 18.2.0

2024-08-06 Thread Justin Lee
Hi Dhairya, Thanks for the response! We tried removing it as you suggested with `rm -rf` but the command just hangs indefinitely with no output. We are also unable to `ls lost+found`, or otherwise interact with the directory's contents. Best, Justin Lee On Fri, Aug 2, 2024 at 8:24 AM Dhairya Par

[ceph-users] Cephadm: unable to copy ceph.conf.new

2024-08-06 Thread Magnus Larsen
Hi Ceph-users! Ceph version: ceph version 17.2.6 (d7ff0d10654d2280e08f1ab989c7cdf3064446a5) quincy (stable). Using cephadm to orchestrate the Ceph cluster, I'm running into https://tracker.ceph.com/issues/59189, which is fixed in the next version, Quincy 17.2.7, via https://github.com/ceph/ceph/pull/50
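If moving to 17.2.7 is an option, a minimal cephadm upgrade sketch (the image tag assumes the stock quay.io build):

    ceph orch upgrade start --image quay.io/ceph/ceph:v17.2.7
    ceph orch upgrade status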

[ceph-users] Re: Pull failed on cluster upgrade

2024-08-06 Thread Adam King
If you're using VMs, https://lists.ceph.io/hyperkitty/list/ceph-users@ceph.io/message/6X6QIEMWDYSA6XOKEYH5OJ4TIQSBD5BL/ might be relevant On Tue, Aug 6, 2024 at 3:21 AM Nicola Mori wrote: > I think I found the problem. Setting the cephadm log level to debug and > then watching the logs during th

[ceph-users] Re: Pull failed on cluster upgrade

2024-08-06 Thread David Orman
What operating system/distribution are you running? What hardware? David On Tue, Aug 6, 2024, at 02:20, Nicola Mori wrote: > I think I found the problem. Setting the cephadm log level to debug and > then watching the logs during the upgrade: > >ceph config set mgr mgr/cephadm/log_to_cluster_

[ceph-users] What's the best way to add numerous OSDs?

2024-08-06 Thread Fabien Sirjean
Hello everyone, We need to add 180 20TB OSDs to our Ceph cluster, which currently consists of 540 OSDs of identical size (replicated size 3). I'm not sure, though: is it a good idea to add all the OSDs at once? Or is it better to add them gradually? The idea is to minimize the impact of reb

[ceph-users] Re: [EXTERNAL] RGW bucket notifications stop working after a while and blocking requests

2024-08-06 Thread Florian Schwab
Looks like the issue was fixed in the latest Reef release (18.2.4). I found the following commit that seems to fix it: https://github.com/ceph/ceph/commit/26f1d6614bbc45a0079608718f191f94bd4eebb6 After upgrading we haven't encountered the problem again. Cheers, Florian > On 5. Aug 2024, at

[ceph-users] Ceph Developer Summit (Tentacle) Aug 12-19

2024-08-06 Thread Noah Lehman
Hi Ceph users, The next Ceph Developer Summit is happening virtually from August 12 – 19, 2024 and we want to see you there. The focus of the summit will include planning around our next release, Tentacle, and everyone in our community is welcome to participate! Learn more and RSVP here: https://

[ceph-users] Re: Recovering from total mon loss and backing up lockbox secrets

2024-08-06 Thread Christian Rohmann
On 06.08.24 1:19 PM, Boris wrote: I am in the process of creating disaster recovery documentation and I have two topics where I am not sure how to do it or even if it is possible. Is it possible to recover from a 100% mon data loss? Like all mons fail and the actual mon data is not recoverable.

[ceph-users] Re: Osds going down/flapping after Luminous to Nautilus upgrade part 1

2024-08-06 Thread Eugen Block
Hi, the upgrade notes for Nautilus [0] contain this section: Running nautilus OSDs will not bind to their v2 address automatically. They must be restarted for that to happen. Regards, Eugen [0] https://docs.ceph.com/en/latest/releases/nautilus/#instructions Quoting Mark Kirkwood: We ha
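For a package-based (non-containerized) Nautilus deployment, that restart step might look like this sketch, run per OSD host after msgr2 has been enabled on the monitors:

    # once all mons run Nautilus, enable the v2 protocol
    ceph mon enable-msgr2
    # then restart the OSDs on each host so they bind to their v2 address
    systemctl restart ceph-osd.target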

[ceph-users] Recovering from total mon loss and backing up lockbox secrets

2024-08-06 Thread Boris
Hi, I am in the process of creating disaster recovery documentation and I have two topics where I am not sure how to do it or even if it is possible. Is it possible to recover from a 100% mon data loss? Like all mons fail and the actual mon data is not recoverable. In my head I would think that

[ceph-users] Re: Resize RBD - New size not compatible with object map

2024-08-06 Thread Torkil Svensgaard
On 06/08/2024 12:37, Ilya Dryomov wrote: On Tue, Aug 6, 2024 at 11:55 AM Torkil Svensgaard wrote: Hi [ceph: root@ceph-flash1 /]# rbd info rbd_ec/projects rbd image 'projects': size 750 TiB in 196608000 objects order 22 (4 MiB objects) snapshot_count: 0

[ceph-users] Re: Resize RBD - New size not compatible with object map

2024-08-06 Thread Ilya Dryomov
On Tue, Aug 6, 2024 at 11:55 AM Torkil Svensgaard wrote: > > Hi > > [ceph: root@ceph-flash1 /]# rbd info rbd_ec/projects > rbd image 'projects': > size 750 TiB in 196608000 objects > order 22 (4 MiB objects) > snapshot_count: 0 > id: 15a979db61dda7 > da

[ceph-users] Resize RBD - New size not compatible with object map

2024-08-06 Thread Torkil Svensgaard
Hi [ceph: root@ceph-flash1 /]# rbd info rbd_ec/projects rbd image 'projects': size 750 TiB in 196608000 objects order 22 (4 MiB objects) snapshot_count: 0 id: 15a979db61dda7 data_pool: rbd_ec_data block_name_prefix: rbd_data.10.15a979db61dda7
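If the object map itself is what blocks the resize, one workaround (a sketch only, not necessarily the resolution reached in this thread; <new-size> is a placeholder) is to drop and rebuild it around the resize:

    rbd feature disable rbd_ec/projects fast-diff
    rbd feature disable rbd_ec/projects object-map
    rbd resize --size <new-size> rbd_ec/projects
    rbd feature enable rbd_ec/projects object-map
    rbd feature enable rbd_ec/projects fast-diff
    # rebuild the object map after re-enabling it
    rbd object-map rebuild rbd_ec/projects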

[ceph-users] Re: Pull failed on cluster upgrade

2024-08-06 Thread Nicola Mori
I think I found the problem. Setting the cephadm log level to debug and then watching the logs during the upgrade: ceph config set mgr mgr/cephadm/log_to_cluster_level debug ceph -W cephadm --watch-debug I found this line just before the error: ceph: stderr Fatal glibc error: CPU does no
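The debug steps from the message, laid out as commands; the final line (removing the override again) is an addition, not part of the original message:

    ceph config set mgr mgr/cephadm/log_to_cluster_level debug
    ceph -W cephadm --watch-debug
    # added: drop the override to return to the default cephadm log level
    ceph config rm mgr mgr/cephadm/log_to_cluster_level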