Hello,
I have been using Ceph for a long time. A day ago I set the Jewel requirement for the OSDs and upgraded the CRUSH map.
Since then I have been getting all kinds of errors, maybe because disks are failing due to the rebalancing, or maybe because of some other problem I am not aware of.
I have some PGs in active+clean+inconsistent, from different volumes. When I try to repair or scrub them I get:
2017-09-14 15:24:32.139215 [ERR] 9.8b shard 2: soid
9:d1c72806:::rb.0.21dc.238e1f29.0000000125ae:head data_digest 0x903e1482
!= data_digest 0x4d4e39be from auth oi
9:d1c72806:::rb.0.21dc.238e1f29.0000000125ae:head(3982'375882
osd.1.0:2494526 dirty|data_digest|omap_digest s 4194304 uv 375794 dd
4d4e39be od ffffffff)
2017-09-14 15:24:32.139220 [ERR] 9.8b shard 6: soid
9:d1c72806:::rb.0.21dc.238e1f29.0000000125ae:head data_digest 0x903e1482
!= data_digest 0x4d4e39be from auth oi
9:d1c72806:::rb.0.21dc.238e1f29.0000000125ae:head(3982'375882
osd.1.0:2494526 dirty|data_digest|omap_digest s 4194304 uv 375794 dd
4d4e39be od ffffffff)
2017-09-14 15:24:32.139222 [ERR] 9.8b soid
9:d1c72806:::rb.0.21dc.238e1f29.0000000125ae:head: failed to pick
suitable auth object
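For reference, these are the commands I am running against that PG; nothing unusual (the list-inconsistent-obj call is just to see the per-shard digests):
> ceph pg deep-scrub 9.8b
> ceph pg repair 9.8b
> rados list-inconsistent-obj 9.8b --format=json-pretty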
I removed one of the OSDs and added a bigger one to the cluster, but the old (authoritative) disk is still in the machine. (I did remove it from the CRUSH map and everything else, as the documentation says.) Mine is a small cluster, and I know problems tend to be more critical there, since there are not enough replicas if something goes wrong. This is the current tree (the pool replication settings I checked are shown after it):
ID WEIGHT  TYPE NAME                  UP/DOWN REWEIGHT PRIMARY-AFFINITY
-1 4.27299 root default
-4 4.27299     rack rack-1
-2 1.00000         host blue-compute
 0 1.00000             osd.0               up  1.00000          1.00000
 2 1.00000             osd.2               up  1.00000          1.00000
-3 3.27299         host red-compute
 4 1.00000             osd.4               up  1.00000          1.00000
 3 1.36380             osd.3               up  1.00000          1.00000
 6 0.90919             osd.6               up  1.00000          1.00000
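For completeness, this is how I check the replication settings on the affected pool (standard ceph osd pool get calls; pool name taken from the list-inconsistent-pg call further below):
> ceph osd pool get high_value size
> ceph osd pool get high_value min_size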
The old osd.1 is still in the machine red-compute, but outside the cluster. So, to repeat, my questions are:
First: with this kind of error, is there anything I can do to recover?
Second: if no authoritative copy can be found in the cluster, on osd.2 or osd.6, how can I fix it? Can I get the object from the old osd.1, and if so, how?
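In case it matters, this is roughly what I was thinking of trying against the stopped osd.1 to pull the object out. The data and journal paths are just how it is mounted on my machine, and I have not verified the exact ceph-objectstore-tool invocation (the object may need to be given as the JSON identifier that --op list prints, rather than the plain name):
> ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-1 \
      --journal-path /var/lib/ceph/osd/ceph-1/journal \
      --pgid 9.8b --op list rb.0.21dc.238e1f29.0000000125ae
> ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-1 \
      --journal-path /var/lib/ceph/osd/ceph-1/journal \
      --pgid 9.8b rb.0.21dc.238e1f29.0000000125ae get-bytes /tmp/rb.0.21dc.125ae.bin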
> ceph pg map 9.8b
osdmap e7049 pg 9.8b (9.8b) -> up [6,2] acting [6,2]
> rados list-inconsistent-pg high_value
["9.8b"]
Any help on this?
Thank you in advance.