Hi Steve, I was just about to follow your steps [0] with the
ceph-objectstore-tool (I do not want to remove more snapshots).

So I have this error:

pg 17.36 is active+clean+inconsistent, acting [7,29,12]

2019-09-02 14:17:34.175139 7f9b3f061700 -1 log_channel(cluster) log
[ERR] : deep-scrub 17.36
17:6ca1f70a:::rbd_data.1f114174b0dc51.0000000000000974:head : expected
clone 17:6ca1f70a:::rbd_data.1f114174b0dc51.0000000000000974:4 1 missing

I removed the snapshot with snapshot id 4 and did a pg repair, without
any result. I am trying to understand this command of yours:

ceph-objectstore-tool --type bluestore --data-path
/var/lib/ceph/osd/ceph-229/ --pgid 2.9a6
'{"oid":"rb.0.2479b45.238e1f29","snapid":-2,"hash":2320771494,"max":0,"pool":2,"namespace":"","max":0}'

I think you are getting this info from the --op list output, no? And
grepping for the "rbd_data.1f114174b0dc51.0000000000000974" occurrence?
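I would guess something like this (just a sketch: I am assuming the OSD
has to be stopped before ceph-objectstore-tool can open its store, and I
am using my osd.29 and pg 17.36 as the example):

systemctl stop ceph-osd@29
ceph-objectstore-tool --type bluestore --data-path /var/lib/ceph/osd/ceph-29/ --pgid 17.36 --op list | grep rbd_data.1f114174b0dc51.0000000000000974
systemctl start ceph-osd@29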
I have these entries on osd.29:

["17.36",{"oid":"rbd_data.1f114174b0dc51.0000000000000974","key":"","snapid":63,"hash":1357874486,"max":0,"pool":17,"namespace":"","max":0}]
["17.36",{"oid":"rbd_data.1f114174b0dc51.0000000000000974","key":"","snapid":-2,"hash":1357874486,"max":0,"pool":17,"namespace":"","max":0}]

So I guess snapids of -2 are bad? Actually, I have noticed quite a few
-2 listings in this --op list output, and I do not understand why there
are so many when the cluster is healthy except for this pg 17.36.

[0] https://www.mail-archive.com/ceph-users@lists.ceph.com/msg47212.html


-----Original Message-----
From: Steve Anthony [mailto:sma...@lehigh.edu]
Sent: vrijdag 16 november 2018 17:44
To: ceph-us...@lists.ceph.com
Subject: Re: [ceph-users] pg 17.36 is active+clean+inconsistent head
expected clone 1 missing?

Looks similar to a problem I had after several OSDs crashed while
trimming snapshots. In my case, the primary OSD thought the snapshot was
gone, but on some of the replicas it was still there, so scrubbing
flagged it.

First I purged all snapshots and then ran ceph pg repair on the
problematic placement groups. The first time I encountered this, that
action was sufficient to repair the problem. The second time, however, I
ended up having to manually remove the snapshot objects:

http://lists.ceph.com/pipermail/ceph-users-ceph.com/2018-June/027431.html

Once I had done that, repairing the placement group fixed the issue.
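In rough outline it looks like this (a sketch only: the trailing remove
op is an assumption, since the command quoted earlier in the thread
appears truncated; the OSD must be stopped while ceph-objectstore-tool
runs, and the object JSON is copied from the --op list output):

systemctl stop ceph-osd@229
ceph-objectstore-tool --type bluestore --data-path /var/lib/ceph/osd/ceph-229/ --pgid 2.9a6 '{"oid":"rb.0.2479b45.238e1f29","snapid":-2,"hash":2320771494,"max":0,"pool":2,"namespace":"","max":0}' remove
systemctl start ceph-osd@229
ceph pg repair 2.9a6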
-Steve

On 11/16/2018 04:00 AM, Marc Roos wrote:
>
> I am not sure that is going to work, because I have had this error for
> quite some time, from before I added the 4th node. And on the 3-node
> cluster it was:
>
> osdmap e18970 pg 17.36 (17.36) -> up [9,0,12] acting [9,0,12]
>
> If I understand correctly what you intend to do, moving the data
> around, that was sort of accomplished by adding the 4th node.
>
>
> -----Original Message-----
> From: Frank Yu [mailto:flyxia...@gmail.com]
> Sent: vrijdag 16 november 2018 3:51
> To: Marc Roos
> Cc: ceph-users
> Subject: Re: [ceph-users] pg 17.36 is active+clean+inconsistent head
> expected clone 1 missing?
>
> Try to restart osd.29, then use pg repair. If this doesn't work, or it
> appears again after a while, scan the HDD used for osd.29; maybe there
> are bad sectors on the disk, in which case just replace it with a new
> one.
>
>
> On Thu, Nov 15, 2018 at 5:00 PM Marc Roos <m.r...@f1-outsourcing.eu>
> wrote:
>
> Forgot, these are bluestore osds
>
>
> -----Original Message-----
> From: Marc Roos
> Sent: donderdag 15 november 2018 9:59
> To: ceph-users
> Subject: [ceph-users] pg 17.36 is active+clean+inconsistent head
> expected clone 1 missing?
>
> I thought I would give it another try, asking again here since there
> is another thread current. I have been having this error for a year or
> so.
>
> This I of course already tried:
>
> ceph pg deep-scrub 17.36
> ceph pg repair 17.36
>
> [@c01 ~]# rados list-inconsistent-obj 17.36
> {"epoch":24363,"inconsistents":[]}
>
> [@c01 ~]# ceph pg map 17.36
> osdmap e24380 pg 17.36 (17.36) -> up [29,12,6] acting [29,12,6]
>
> [@c04 ceph]# zgrep ERR ceph-osd.29.log*gz
> ceph-osd.29.log-20181114.gz:2018-11-13 14:19:55.766604 7f25a05b1700 -1
> log_channel(cluster) log [ERR] : deep-scrub 17.36
> 17:6ca1f70a:::rbd_data.1f114174b0dc51.0000000000000974:head expected
> clone 17:6ca1f70a:::rbd_data.1f114174b0dc51.0000000000000974:4 1
> missing
> ceph-osd.29.log-20181114.gz:2018-11-13 14:24:55.943454 7f25a05b1700 -1
> log_channel(cluster) log [ERR] : 17.36 deep-scrub 1 errors

-- 
Steve Anthony
LTS HPC Senior Analyst
Lehigh University
sma...@lehigh.edu

_______________________________________________
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io