[ceph-users] MDS stuck in replay and continually crashing during replay

2024-10-03 Thread Ivan Clayson
disaster recovery again (https://docs.ceph.com/en/reef/cephfs/disaster-recovery-experts/#disaster-recovery-experts) as this is the 2nd time this has happened to this cluster in the last 4 months and it took over a month to recover the data last time. Kindest regards, Ivan -- Iva

[ceph-users] Re: CephFS MDS crashing during replay with standby MDSes crashing afterwards

2024-07-10 Thread Ivan Clayson
appease the upgrade process. Probably won't solve your problem, but at least you'll be able to move fairly painlessly to a better-supported platform. Best Regards, Tim On Tue, 2024-07-09 at 11:14 +0100, Ivan Clayson wrote: Hi Dhairya, I would be more than happy to try and gi

[ceph-users] Re: CephFS MDS crashing during replay with standby MDSes crashing afterwards

2024-07-09 Thread Ivan Clayson
[1] https://ceph-storage.slack.com/archives/C04LVQMHM9B/p1720443057919519 On Mon, Jul 8, 2024 at 7:37 PM Ivan Clayson wrote: Hi Dhairya, Thank you ever so much for having another look at this so quickly. I don't think I have any logs similar to the ones you referenced this time

[ceph-users] Re: CephFS MDS crashing during replay with standby MDSes crashing afterwards

2024-07-08 Thread Ivan Clayson
for this? The other log that you shared is being downloaded right now; once that's done and I've gone through it, I'll update you. [0] https://tracker.ceph.com/issues/54546 On Mon, Jul 8, 2024 at 4:49 PM Ivan Clayson wrote: Hi Dhairya, Sorry to resurrect this

[ceph-users] Re: CephFS MDS crashing during replay with standby MDSes crashing afterwards

2024-07-08 Thread Ivan Clayson
On Fri, Jun 28, 2024 at 6:02 PM Ivan Clayson wrote: Hi Dhairya, I would be

[ceph-users] Re: CephFS MDS crashing during replay with standby MDSes crashing afterwards

2024-06-28 Thread Ivan Clayson
ing in OSDs taking a significant percentage of CPU. Do let me know how this goes. On Thu, Jun 27, 2024 at 3:44 PM Ivan Clayson wrote: Hi Dhairya, We can induce the crash by simply restarting the MDS and the crash seems to happen when an MDS goes from up:standby to up:replay. The MDS

[ceph-users] Re: CephFS MDS crashing during replay with standby MDSes crashing afterwards

2024-06-27 Thread Ivan Clayson
18, Dhairya Parmar wrote:

[ceph-users] Re: CephFS MDS crashing during replay with standby MDSes crashing afterwards

2024-06-25 Thread Ivan Clayson
s/66251 -- Dhairya Parmar, Associate Software Engineer, CephFS <https://www.redhat.com/> IBM, Inc. On Mon, Jun 24, 2024 at 8:54 PM Ivan Clayson wrote: Hello, We have been experiencing a serious issue with our CephFS backup cluster running quincy (version 17.2.7)

[ceph-users] CephFS MDS crashing during replay with standby MDSes crashing afterwards

2024-06-24 Thread Ivan Clayson
ld the entire metadata pool if we could avoid it (once was enough for us) as this cluster has ~9 PB of data on it. Kindest regards, Ivan Clayson -- Ivan Clayson - Scientific Computing Officer Room 2N249 Structural Studies MRC Laboratory of Molecular Biology Francis Crick Ave,

[ceph-users] Re: MDS_CLIENT_LATE_RELEASE, MDS_SLOW_METADATA_IO, and MDS_SLOW_REQUEST errors and slow osd_ops despite hardware being fine

2024-03-19 Thread Ivan Clayson
24 18:07, Gregory Farnum wrote:

[ceph-users] MDS_CLIENT_LATE_RELEASE, MDS_SLOW_METADATA_IO, and MDS_SLOW_REQUEST errors and slow osd_ops despite hardware being fine

2024-03-15 Thread Ivan Clayson
a way to stop these from happening as we are having to solve these nearly daily now and we can't seem to find a way to reduce them. We do use snapshots to back up our cluster and have been doing so for ~6 months, and these issues have only been occurring on and off for a couple

[ceph-users] Re: Clients failing to respond to capability release

2023-10-12 Thread Ivan Clayson
see whether there are any timeout or uncorrectable errors. We would also be very eager to hear if these approaches are sub-optimal and whether anyone else has any insight into our problems. Sorry as well for resurrecting an old thread but we thought our experiences may be helpful for others!