Hello,

I've been using Ceph for a long time. A day ago I set the Jewel requirement for the OSDs and upgraded the CRUSH map.


Since then I have been seeing all kinds of errors, maybe because disks are failing under the rebalancing, or because there is some other problem I'm not aware of.

I have some PGs in active+clean+inconsistent, from different volumes. When I try to repair them or run a scrub I get:

2017-09-14 15:24:32.139215 [ERR] 9.8b shard 2: soid 9:d1c72806:::rb.0.21dc.238e1f29.0000000125ae:head data_digest 0x903e1482 != data_digest 0x4d4e39be from auth oi 9:d1c72806:::rb.0.21dc.238e1f29.0000000125ae:head(3982'375882 osd.1.0:2494526 dirty|data_digest|omap_digest s 4194304 uv 375794 dd 4d4e39be od ffffffff)
2017-09-14 15:24:32.139220 [ERR] 9.8b shard 6: soid 9:d1c72806:::rb.0.21dc.238e1f29.0000000125ae:head data_digest 0x903e1482 != data_digest 0x4d4e39be from auth oi 9:d1c72806:::rb.0.21dc.238e1f29.0000000125ae:head(3982'375882 osd.1.0:2494526 dirty|data_digest|omap_digest s 4194304 uv 375794 dd 4d4e39be od ffffffff)
2017-09-14 15:24:32.139222 [ERR] 9.8b soid 9:d1c72806:::rb.0.21dc.238e1f29.0000000125ae:head: failed to pick suitable auth object
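
For reference, the scrub and repair attempts are just the standard PG commands, roughly:

> ceph pg deep-scrub 9.8b
> ceph pg repair 9.8b

Both just end up reporting the errors above again.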

I removed one of the OSDs and added a bigger one to the cluster, but I still have the old authoritative disk in the machine (I did remove it from the CRUSH map and followed the rest of the removal steps from the documentation). Mine is a small cluster, and I know problems tend to be more critical on it, since there aren't enough replicas if something goes wrong:


ID WEIGHT  TYPE NAME                 UP/DOWN REWEIGHT PRIMARY-AFFINITY
-1 4.27299 root default
-4 4.27299     rack rack-1
-2 1.00000         host blue-compute
 0 1.00000             osd.0              up  1.00000 1.00000
 2 1.00000             osd.2              up  1.00000 1.00000
-3 3.27299         host red-compute
 4 1.00000             osd.4              up  1.00000 1.00000
 3 1.36380             osd.3              up  1.00000 1.00000
 6 0.90919             osd.6              up  1.00000 1.00000


The old osd.1 is still in the machine red-compute, but outside the cluster. To repeat, my questions are:


First: with this kind of error, is there anything I can do to recover?

Second: if I cannot find an authoritative copy of the PG on the cluster (on osd.2 and osd.6), how can I fix it? Can I get it from the old osd.1, and if so, how?
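
In case it helps to be concrete: would something along these lines be the right way to pull the PG off the old osd.1 disk with ceph-objectstore-tool? The paths below are only my guesses for a FileStore OSD, and I understand the OSDs involved have to be stopped first:

> ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-1 --journal-path /var/lib/ceph/osd/ceph-1/journal --op export --pgid 9.8b --file /tmp/pg9.8b.export

and then, on one of the current OSDs (I guess after removing the existing copy there with --op remove):

> ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-6 --journal-path /var/lib/ceph/osd/ceph-6/journal --op import --file /tmp/pg9.8b.export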

> ceph pg map 9.8b
  osdmap e7049 pg 9.8b (9.8b) -> up [6,2] acting [6,2]

> rados list-inconsistent-pg high_value
["9.8b"]

Any help on this would be appreciated.


Thank you in advance.

