Re: [ceph-users] how to debug pg inconsistent state - no ioerrors seen

2016-08-31 Thread Goncalo Borges
Hi Kenneth, All Just an update for completeness on this topic. We have been hit again by this issue. I have been discussing it with Brad (RH staff) in another ML thread, and I have opened a tracker issue: http://tracker.ceph.com/issues/17177 I believe this is a bug since there are other peop

Re: [ceph-users] how to debug pg inconsistent state - no ioerrors seen

2016-08-24 Thread Gregory Farnum
On Tue, Aug 9, 2016 at 11:15 PM, Goncalo Borges wrote: > Hi Greg... > > Thanks for replying, You seem omnipresent in all ceph/cephfs issues! > > Can you please confirm that, in Jewel, 'ceph pg repair' simply copies the pg > contents of the primary osd to the others? And that can lead to data > cor

Re: [ceph-users] how to debug pg inconsistent state - no ioerrors seen

2016-08-09 Thread Goncalo Borges
Hi Greg... Thanks for replying. You seem omnipresent in all ceph/cephfs issues! Can you please confirm that, in Jewel, 'ceph pg repair' simply copies the pg contents of the primary osd to the others? And that this can lead to data corruption if the problematic osd is indeed the primary? If in Jew

Re: [ceph-users] how to debug pg inconsistent state - no ioerrors seen

2016-08-09 Thread Gregory Farnum
On Tue, Aug 9, 2016 at 2:00 AM, Kenneth Waegeman wrote: > Hi, > > I did a diff on the directories of all three the osds, no difference .. So I > don't know what's wrong. omap (as implied by the omap_digest complaint) is stored in the OSD leveldb, not in the data directories, so you wouldn't expec

Re: [ceph-users] how to debug pg inconsistent state - no ioerrors seen

2016-08-09 Thread Kenneth Waegeman
Hi, I did a diff on the directories of all three OSDs and found no difference, so I don't know what's wrong. The only thing I see that is different is a scrub file in the TEMP folder (it is already a different pg than in the last mail): -rw-r--r-- 1 ceph ceph 0 Aug 9 09:51 scrub\u6.107__head_0107__f

Re: [ceph-users] how to debug pg inconsistent state - no ioerrors seen

2016-08-08 Thread Goncalo Borges
Hi Kenneth... The previous default behavior of 'ceph pg repair' was to copy the pg objects from the primary osd to the others. Not sure if it is still the case in Jewel. For this reason, once we get this kind of error in a data pool, the best practice is to compare the md5 checksums of the damage
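
The checksum comparison suggested above could be sketched roughly as follows. This is only an illustration, not a Ceph tool: the function names and the replica labels are made up for the example, and it assumes you have already located the on-disk copy of the object on each OSD yourself. Note that, as pointed out elsewhere in this thread, omap data lives in the OSD leveldb rather than in the data directories, so a file checksum like this would not be expected to reveal an omap_digest mismatch.

```python
import hashlib
from collections import Counter


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex md5 of a file, read in chunks to keep memory use flat."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def find_outliers(checksums):
    """Given {replica_label: checksum}, return (majority_checksum, outlier_labels).

    If no checksum is held by a strict majority of replicas, the majority is
    None and every label is reported, since no copy can be trusted by vote.
    """
    if not checksums:
        return (None, [])
    counts = Counter(checksums.values())
    majority, majority_count = counts.most_common(1)[0]
    if majority_count * 2 <= len(checksums):
        return (None, sorted(checksums))
    outliers = sorted(label for label, value in checksums.items() if value != majority)
    return (majority, outliers)


if __name__ == "__main__":
    # Hypothetical labels and checksums; in practice each value would come from
    # md5_of_file() run against that OSD's copy of the damaged object.
    example = {"osd.1": "aa", "osd.2": "aa", "osd.3": "bb"}
    majority, outliers = find_outliers(example)
    print("majority checksum:", majority)
    print("replicas that disagree:", outliers)
```

If the replica that disagrees turns out to be the primary, that is exactly the case where a plain copy-from-primary repair would be risky, so it is worth knowing before running 'ceph pg repair'.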

[ceph-users] how to debug pg inconsistent state - no ioerrors seen

2016-08-08 Thread Kenneth Waegeman
Hi all, Since last week, some PGs have been going into the inconsistent state after a scrub error. Last week we had 4 PGs in that state. They were on different OSDs, but all in the metadata pool. I did a pg repair on them, and all were healthy again. But now one pg is inconsistent again. with healt