[ceph-users] Re: rocksdb corruption with 16.2.6

2021-09-21 Thread Sven Kieske
On Mon, 2021-09-20 at 10:29 -0500, Mark Nelson wrote: > At least in one case for us, the user was using consumer-grade SSDs > without power loss protection. I don't think we ever fully diagnosed whether > that was the cause, though. Another case was potentially related to high > memory usage on the

[ceph-users] Re: rocksdb corruption with 16.2.6

2021-09-21 Thread Andrej Filipcic
Hi, Some further investigation on the failed OSDs: 1 out of 8 OSDs actually has a hardware issue: [16841006.029332] sd 0:0:10:0: [sdj] tag#96 FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE cmd_age=2s [16841006.037917] sd 0:0:10:0: [sdj] tag#34 FAILED Result: hostbyte=DID_SOFT_ERROR

[ceph-users] Re: rocksdb corruption with 16.2.6

2021-09-20 Thread Paul Mezzanini
I was doing a rolling upgrade from 14.2.x -> 15.2.x (wait a week) -> 16.2.5.  It was the last jump that had the hiccup. I'm doing the 16.2.5 -> .6 upgrade as I type this.  So far, so good. -paul On 9/20/21 10:02 AM, David Orman wrote: For clarity, was this on upgrading to 16.2.6 from

[ceph-users] Re: rocksdb corruption with 16.2.6

2021-09-20 Thread Mark Nelson
FWIW, we've had similar reports in the past: https://tracker.ceph.com/issues/37282 https://tracker.ceph.com/issues/48002 https://lists.ceph.io/hyperkitty/list/ceph-users@ceph.io/thread/2GBK5NJFOSQGMN25GQ3CZNX4W2ZGQV5U/?sort=date https://www.spinics.net/lists/ceph-users/msg59466.html

[ceph-users] Re: rocksdb corruption with 16.2.6

2021-09-20 Thread Neha Ojha
Can we please create a bluestore tracker issue for this (if one does not exist already), where we can start capturing all the relevant information needed to debug this? Given that this has been encountered in previous 16.2.* versions, it doesn't sound like a regression in 16.2.6 to me, rather an

[ceph-users] Re: rocksdb corruption with 16.2.6

2021-09-20 Thread Andrej Filipcic
On 20/09/2021 16:02, David Orman wrote: > Same question here, for clarity, was this on upgrading to 16.2.6 from 16.2.5? Or upgrading from some other release? From 16.2.5, but the OSD services were never restarted after the upgrade to .5, so it could be a leftover of previous issues. Cheers,

[ceph-users] Re: rocksdb corruption with 16.2.6

2021-09-20 Thread Sean
In my case it happened after upgrading from v16.2.4 to v16.2.5 a couple months ago. ~ Sean On Sep 20, 2021 at 9:02:45 AM, David Orman wrote: > Same question here, for clarity, was this on upgrading to 16.2.6 from > 16.2.5? Or upgrading > from some other release? > > On Mon, Sep 20, 2021 at

[ceph-users] Re: rocksdb corruption with 16.2.6

2021-09-20 Thread David Orman
Same question here, for clarity, was this on upgrading to 16.2.6 from 16.2.5? Or upgrading from some other release? On Mon, Sep 20, 2021 at 8:57 AM Sean wrote: > > I also ran into this with v16. In my case, trying to run a repair totally > exhausted the RAM on the box, and was unable to

[ceph-users] Re: rocksdb corruption with 16.2.6

2021-09-20 Thread David Orman
For clarity, was this on upgrading to 16.2.6 from 16.2.5? Or upgrading from some other release? On Mon, Sep 20, 2021 at 8:33 AM Paul Mezzanini wrote: > > I got the exact same error on one of my OSDs when upgrading to 16. I > used it as an exercise on trying to fix a corrupt rocksdb. I spent a

[ceph-users] Re: rocksdb corruption with 16.2.6

2021-09-20 Thread Sean
I also ran into this with v16. In my case, trying to run a repair totally exhausted the RAM on the box, and was unable to complete. After removing/recreating the OSD, I did notice that it has a drastically smaller OMAP size than the other OSDs. I don’t know if that actually means anything, but

[ceph-users] Re: rocksdb corruption with 16.2.6

2021-09-20 Thread Paul Mezzanini
I got the exact same error on one of my OSDs when upgrading to 16. I used it as an exercise in trying to fix a corrupt rocksdb. I spent a few days poking at it with no success. I got mostly tool crashes like you are seeing, with no forward progress. I eventually just gave up, purged the OSD,

[ceph-users] Re: rocksdb corruption with 16.2.6

2021-09-20 Thread Andrej Filipcic
I attached it, but that did not work, so here it is: https://www-f9.ijs.si/~andrej/ceph/ceph-osd.1049.log-20210920.gz Cheers, Andrej On 9/20/21 9:41 AM, Dan van der Ster wrote: On Sun, Sep 19, 2021 at 4:48 PM Andrej Filipcic wrote: I have attached a part of the osd log. Hi Andrej. Did you mean to

[ceph-users] Re: rocksdb corruption with 16.2.6

2021-09-20 Thread Dan van der Ster
On Sun, Sep 19, 2021 at 4:48 PM Andrej Filipcic wrote: > I have attached a part of the osd log. Hi Andrej. Did you mean to attach more than the snippets? Could you also send the log of the first startup in 16.2.6 of a now-corrupted OSD? Cheers, dan