[ceph-users] Re: reef 18.2.3 QE validation status

2024-05-28 Thread Yuri Weinstein
We have discovered some issues (#1 and #2) during the final stages of testing that require us to consider delaying this point release until all options and risks are assessed and resolved. We will keep you all updated on the progress. Thank you for your patience! #1

[ceph-users] Rocky 8 to Rocky 9 upgrade and ceph without data loss

2024-05-28 Thread Christopher Durham
I have both a small test cluster and a larger production cluster. They are (were, for the test cluster) running Rocky Linux 8.9. Both were originally upgraded from Pacific and are currently at reef 18.2.2. These are all rpm installs. It has come time to consider upgrading to Rocky 9.3. As there is
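
For reference, the usual per-host approach is to set noout, reinstall one host at a time, and bring its daemons back before moving on. This is only a rough sketch, assuming the OSD data devices survive the OS reinstall; it is not a tested procedure for this cluster:

    # before taking a host down for the OS reinstall
    ceph osd set noout
    ceph -s            # confirm the cluster is otherwise healthy

    # ... reinstall the host with Rocky 9.3, reinstall the reef 18.2.2 rpms,
    # ... restore the ceph configuration, then start the daemons again

    # once the host's OSDs are back up and in
    ceph osd unset noout
    ceph versions      # verify all daemons still report 18.2.2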

[ceph-users] Re: ceph orch osd rm --zap --replace leaves cluster in odd state

2024-05-28 Thread Matthew Vernon
On 28/05/2024 17:07, Wesley Dillingham wrote: What is the state of your PGs? could you post "ceph -s" PGs all good:
    root@moss-be1001:/# ceph -s
      cluster:
        id:     d7849d66-183c-11ef-b973-bc97e1bb7c18
        health: HEALTH_WARN
                1 stray daemon(s) not managed by cephadm
      services:
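
The stray-daemon warning itself can be pinned down with the following (a sketch, not commands quoted from the thread):

    ceph health detail   # names the stray daemon and the host it was last seen on
    ceph orch ps         # compare against the daemons cephadm actually manages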

[ceph-users] Re: ceph orch osd rm --zap --replace leaves cluster in odd state

2024-05-28 Thread Wesley Dillingham
What is the state of your PGs? Could you post "ceph -s"? I believe (though this is a bit of an assumption, after encountering something similar myself) that under the hood cephadm is using the "ceph osd safe-to-destroy osd.X" command, and when osd.X is no longer running and not all PGs are active+clean (for
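
The check cephadm is believed to rely on can be run by hand; using osd.35 from the original post, a sketch:

    ceph osd safe-to-destroy osd.35                  # refuses while PGs that map to it are not active+clean
    ceph pg dump pgs_brief | grep -v active+clean    # list any PGs still recovering or backfilling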

[ceph-users] ceph orch osd rm --zap --replace leaves cluster in odd state

2024-05-28 Thread Matthew Vernon
Hi, I want to prepare a failed disk for replacement. I did: ceph orch osd rm 35 --zap --replace and it's now in the state "Done, waiting for purge", with 0 pgs, and REPLACE and ZAP set to true. It's been like this for some hours, and now my cluster is unhappy: [WRN] CEPHADM_STRAY_DAEMON: 1
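
A first step when diagnosing this state (a hedged sketch, with osd id 35 taken from the post):

    ceph orch osd rm status    # shows the removal queue and its DRAIN/ZAP/REPLACE flags
    ceph orch ps | grep osd.35 # check whether cephadm still lists the daemon it now reports as stray
    ceph -s                    # confirm whether all PGs are active+clean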

[ceph-users] Re: OSD processes crashes on repair 'unexpected clone'

2024-05-28 Thread Thomas Björklund
Sorry, not sure what happened with the formatting, pasting the whole contents again. Hi, We have an old cluster with 3 nodes running ceph version 15.2.17. We have a PG in state active+clean+inconsistent which we are unable to repair. It's an RBD pool in use by kubernetes. The earliest
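
For a PG stuck in active+clean+inconsistent, the usual way to see which object and clone the scrub complained about is below (a sketch; the PG id is a placeholder, not taken from the thread):

    ceph health detail                                        # names the inconsistent PG
    rados list-inconsistent-obj <pgid> --format=json-pretty   # shows the object behind the 'unexpected clone' error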

[ceph-users] Help needed! First MDs crashing, then MONs. How to recover ?

2024-05-28 Thread Noe P.
Hi, we ran into a bigger problem today with our ceph cluster (Quincy, Alma 8.9). We have 4 filesystems and a total of 6 MDs, the largest fs having two ranks assigned (i.e. one standby). Since we often have the problem of MDs lagging behind, we restart the MDs occasionally. That usually helps; the
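
For reference, restarting a lagging MDS is typically done by failing its rank so a standby takes over; a sketch, with the filesystem name and rank as placeholders rather than values from the post:

    ceph fs status                  # shows which daemon holds which rank per filesystem
    ceph mds fail <fsname>:<rank>   # hand the rank over to a standby instead of restarting in place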

[ceph-users] OSD processes crashes on repair 'unexpected clone'

2024-05-28 Thread Thomas Björklund
Hi, We have an old cluster with 3 nodes running ceph version 15.2.17. We have a PG in state active+clean+inconsistent which we are unable to repair. It's an RBD pool in use by kubernetes. The earliest indication of the issue comes from ceph-osd.4.log on one of the nodes:

[ceph-users] Re: Safe method to perform failback for RBD on one way mirroring.

2024-05-28 Thread Eugen Block
Hi, I think there might be a misunderstanding about one-way mirroring. It really only mirrors one way, from A to B. In case site A fails, you can promote the images in B and continue using those images. But there's no automated way back, because it's only one way. When site A comes back,
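
The promotion step mentioned here looks roughly like this (a sketch; pool and image names are placeholders):

    # on site B, after site A has failed (the image was not cleanly demoted first)
    rbd mirror image promote <pool>/<image> --force
    rbd mirror pool status <pool> --verbose    # confirm the image now reports itself as primary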