[ceph-users] Re: recovery_unfound

2020-02-03 Thread Paul Emmerich
This might be related to recent problems with OSDs not being queried for unfound objects properly in some cases (which I think was fixed in master?). Anyway: run `ceph pg query` on the affected PGs, check for "might_have_unfound", and try restarting the OSDs mentioned there. Probably also sufficient
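Paul's suggestion can be sketched as a short script. This is only a sketch: the JSON below is an abridged, hypothetical `ceph pg query` result, and the exact layout of the `recovery_state` / `might_have_unfound` fields may differ between Ceph releases.

```python
import json

# Abridged, hypothetical output of `ceph pg <pgid> query` (assumption:
# Nautilus-era JSON layout; field names may vary across releases).
pg_query_output = json.loads("""
{
  "recovery_state": [
    {
      "name": "Started/Primary/Active",
      "might_have_unfound": [
        {"osd": "12", "status": "already probed"},
        {"osd": "37", "status": "osd is down"},
        {"osd": "51", "status": "not queried"}
      ]
    }
  ]
}
""")

# Collect the OSDs the primary has not successfully probed yet --
# these are the candidates to restart per the advice above.
candidates = [
    entry["osd"]
    for state in pg_query_output["recovery_state"]
    for entry in state.get("might_have_unfound", [])
    if entry["status"] != "already probed"
]
print(candidates)  # e.g. ['37', '51'] for the sample above
```

With real data you would feed the script the output of `ceph pg <pgid> query` for each affected PG and restart the listed OSDs one at a time.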

[ceph-users] Re: recovery_unfound

2020-02-04 Thread Jake Grimmett
Hi Paul, Many thanks for your helpful suggestions. Yes, we have 13 PGs with "might_have_unfound" entries (also 1 PG without "might_have_unfound" stuck in the active+recovery_unfound+degraded+repair state). Taking one PG with unfound objects: [root@ceph1 ~]# ceph health detail | grep 5.5c9 pg
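Jake's grep can be generalized: the snippet below pulls the affected PG ids out of `ceph health detail` text. The sample output here is hypothetical, and the exact wording of the health lines varies between Ceph releases, so treat the regex as an assumption to adapt.

```python
import re

# Hypothetical excerpt from `ceph health detail` (assumption: Nautilus-style
# wording; check your release's actual output before relying on the pattern).
health_detail = """\
OBJECT_UNFOUND 14/1234567 objects unfound (0.001%)
    pg 5.5c9 has 1 unfound objects
    pg 5.1a2 has 3 unfound objects
PG_DEGRADED Degraded data redundancy: 42/1234567 objects degraded
"""

# Extract every PG id reported as having unfound objects.
unfound_pgs = re.findall(r"pg (\S+) has \d+ unfound objects", health_detail)
print(unfound_pgs)  # ['5.5c9', '5.1a2'] for the sample above
```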

[ceph-users] Re: recovery_unfound

2020-02-04 Thread Chad William Seys
Hi Jake and all, We're having what looks to be exactly the same problem. In our case it happened when I was "draining" an OSD for removal (ceph osd crush remove ...). Adding the OSD back doesn't work around the bug. Everything is either triply replicated or EC k3m2, either of which should st

[ceph-users] Re: recovery_unfound

2020-02-05 Thread Jake Grimmett
Hi Chad, In case it's relevant, we are on Nautilus 14.2.6, not Mimic. I've followed Paul's advice and issued a "ceph osd down XXX" command for the primary OSD in each affected PG. I've also tried doing a systemctl restart for several of the primary OSDs, again with no apparent effect. Unfortunate
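The loop Jake describes can be sketched as follows: for each affected PG, look up the acting primary and emit the corresponding `ceph osd down` command. The PG-to-primary mapping below is made-up sample data; in practice it would come from `ceph pg map <pgid>` or `ceph pg dump` output.

```python
# Hypothetical mapping of affected PG -> acting primary OSD id
# (in reality, parse this from `ceph pg map <pgid>` or `ceph pg dump`).
acting_primary = {"5.5c9": 751, "5.1a2": 144}

# One `ceph osd down` per distinct primary; sorted for stable output.
commands = [f"ceph osd down {osd}" for osd in sorted(set(acting_primary.values()))]
for cmd in commands:
    print(cmd)
```

Note that `ceph osd down` only marks the OSD down in the map; the daemon keeps running and re-asserts itself, which forces peering without an actual restart.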

[ceph-users] Re: recovery_unfound

2020-02-05 Thread Chad William Seys
Hi Jake, "In case it's relevant we are on Nautilus 14.2.6, not Mimic." Yeah, my guess is that it is multi-version. Also, my scenario simply should not have lost any data, so don't kick yourself too hard. This command supposedly checks CephFS for damaged files. It came back with nothing for me

[ceph-users] Re: recovery_unfound during scrub with auto repair = true

2021-06-13 Thread Dan van der Ster
Hi again, We haven't taken any actions yet, but this seems like it might be a bug. We compared the version numbers with the osdmap epoch at the time the object went unfound -- indeed the osdmap was e3593555 when this PG was marked recovery_unfound: 2021-06-13 03:50:13.808204 mon.cephbeesly-mon-2a
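The comparison Dan describes can be sketched like this. Ceph prints object versions (eversion_t) as `<epoch>'<version>`, so checking whether an object's last-written epoch matches the osdmap epoch at which it went unfound is a simple string split. The object version string below is hypothetical; only the epoch 3593555 comes from the message above.

```python
# Ceph object versions (eversion_t) print as "<epoch>'<version>".
def parse_eversion(s):
    epoch, version = s.split("'")
    return int(epoch), int(version)

osdmap_epoch_at_unfound = 3593555        # epoch quoted in the message above
object_version = "3593555'118294"        # hypothetical object version string

epoch, _ = parse_eversion(object_version)
print(epoch == osdmap_epoch_at_unfound)  # True when the epochs coincide
```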

[ceph-users] Re: recovery_unfound during scrub with auto repair = true

2021-06-13 Thread Dan van der Ster
Last update of the day: We decided to stop osd.951 and mark it out. The PG went active, the object was no longer unfound, and backfilling just completed successfully. Now we're deep scrubbing one more time, but it looks good! We'll follow up in the tracker about trying to reproduce this bug and h