[ceph-users] Re: OSD corruption and down PGs

2020-05-12 Thread Kári Bertilsson
Hello David, I have physical devices I can use to mirror the OSDs, no problem. But I don't think those disks are actually failing, since there are no bad sectors on them and they are brand new with no issues reading from them. But they got a corrupt OSD superblock, which I believe happened because of bad SAS

[ceph-users] Re: OSD corruption and down PGs

2020-05-12 Thread David Turner
Do you have access to another Ceph cluster with enough available space to create RBDs that you can dd these failing disks into? That's what I'm doing right now with some failing disks. I've recovered 2 out of 6 OSDs that failed in this way. I would recommend against using the same cluster for this, but

[ceph-users] Re: OSD corruption and down PGs

2020-05-12 Thread Kári Bertilsson
Hi Paul, I was able to successfully mount both OSDs I need data from using "ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-92 --op fuse --mountpoint /osd92/". I see the missing PG shards in the mounted folder: "41.b3s3_head", "41.ccs5_head", etc. And I can copy any data from inside

[ceph-users] Re: OSD corruption and down PGs

2020-05-12 Thread Paul Emmerich
First thing I'd try is to use objectstore-tool to scrape the inactive/broken PGs from the dead OSDs using its PG export feature. Then import these PGs into any other OSD, which will automatically recover them. Paul -- Paul Emmerich Looking for help with your Ceph cluster? Contact us at https://cr
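For reference, a minimal sketch of the export/import approach Paul describes. It only prints the commands unless RUN=1 is set. The source OSD id (92) and shard name (41.b3s3) come from elsewhere in this thread; the target OSD id (12) and the export directory are hypothetical placeholders, and the sketch assumes the pgid is the mounted folder name without the "_head" suffix. ceph-objectstore-tool needs the OSD it operates on to be stopped.

```shell
#!/bin/sh
# Sketch only: dry-run by default, prints commands instead of executing them.
SRC_OSD=92                   # dead OSD still holding the shard (id from the thread)
DST_OSD=12                   # hypothetical healthy OSD to import into
PGID=41.b3s3                 # shard from the thread, assumed pgid without "_head"
EXPORT_DIR=/mnt/pg-exports   # hypothetical scratch directory with enough space

# Execute the command only when RUN=1; otherwise just print it.
run() {
    if [ "${RUN:-0}" = "1" ]; then "$@"; else echo "$@"; fi
}

# Export one PG shard from a stopped OSD into a file.
export_pg() {
    run ceph-objectstore-tool --data-path "/var/lib/ceph/osd/ceph-$1" \
        --pgid "$2" --op export --file "$EXPORT_DIR/$2.export"
}

# Import a previously exported PG shard into another stopped OSD.
import_pg() {
    run ceph-objectstore-tool --data-path "/var/lib/ceph/osd/ceph-$1" \
        --op import --file "$EXPORT_DIR/$2.export"
}

export_pg "$SRC_OSD" "$PGID"
import_pg "$DST_OSD" "$PGID"
```

Review the printed commands first, then rerun with RUN=1 once both OSDs are stopped. After the target OSD is started again, the cluster should pick up the imported shard during peering.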

[ceph-users] Re: OSD corruption and down PGs

2020-05-12 Thread Kári Bertilsson
Yes, the output of ceph osd df tree and ceph -s is at https://pastebin.com/By6b1ps1 On Tue, May 12, 2020 at 10:39 AM Eugen Block wrote: > Can you share your osd tree and the current ceph status? > > Quoting Kári Bertilsson: > > > Hello > > > > I had an incident where 3 OSDs crashed at once completely an

[ceph-users] Re: OSD corruption and down PGs

2020-05-12 Thread Eugen Block
Can you share your osd tree and the current ceph status? Quoting Kári Bertilsson: Hello, I had an incident where 3 OSDs crashed at once completely and won't power up. And during recovery 3 OSDs in another host have somehow become corrupted. I am running erasure coding with 8+2 setup usin