[ceph-users] Re: multiple OSD crash, unfound objects

2020-10-16 Thread Frank Schilder
Dear Michael, this is a bit of a nut. I can't see anything obvious. I have two hypotheses that you might consider testing. 1) Problem with 1 incomplete PG. In the shadow hierarchy for your cluster I can see quite a lot of nodes like { "id": -135, "name":

[ceph-users] Re: multiple OSD crash, unfound objects

2020-10-16 Thread Frank Schilder
Dear Michael,
> Please mark OSD 41 as "in" again and wait for some slow ops to show up.
I forgot. "wait for some slow ops to show up" ... and then what? Could you please go to the host of the affected OSD and look at the output of "ceph daemon osd.ID ops" or "ceph daemon osd.ID

[ceph-users] Re: multiple OSD crash, unfound objects

2020-10-16 Thread Frank Schilder
Dear Michael, thanks for this initial work. I will need to look through the files you posted in more detail. In the meantime: Please mark OSD 41 as "in" again and wait for some slow ops to show up. As far as I can see, marking it "out" might have cleared hanging slow ops (there were 1000