[ceph-users] Re: Possible data corruption with 14.2.3 and 14.2.4

2019-12-02 Thread Paul Emmerich
On Mon, Dec 2, 2019 at 4:55 PM Simon Ironside wrote:
> Any word on 14.2.5? Nervously waiting here . . .

Real soon, the release is 99% done (check the corresponding thread on the devel mailing list)

Paul

> Thanks,
> Simon.
>
> On 18/11/2019 11:29, Simon Ironside wrote:
> > I will sit tig

[ceph-users] Re: Possible data corruption with 14.2.3 and 14.2.4

2019-12-02 Thread Simon Ironside
Any word on 14.2.5? Nervously waiting here . . .

Thanks,
Simon.

On 18/11/2019 11:29, Simon Ironside wrote:
> I will sit tight and wait for 14.2.5. Thanks again, Simon.

[ceph-users] Re: Possible data corruption with 14.2.3 and 14.2.4

2019-11-18 Thread Simon Ironside
Hi Igor,

Thanks very much for providing all this detail.

On 18/11/2019 10:43, Igor Fedotov wrote:
> - Check how full their DB devices are? For your case it makes sense to check this. And then safely wait for 14.2.5 if it's not full.

bluefs.db_used_bytes / bluefs.db_total_bytes is only around 1
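As a minimal sketch, assuming you have admin socket access on the OSD host (osd.0 below is just a placeholder for the OSD id), one way to read those counters on a running OSD is:

    ceph daemon osd.0 perf dump bluefs

and then compare db_used_bytes against db_total_bytes in the output.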

[ceph-users] Re: Possible data corruption with 14.2.3 and 14.2.4

2019-11-18 Thread Igor Fedotov
Hi Simon,

On 11/15/2019 6:02 PM, Simon Ironside wrote:
> Hi Igor,
>
> On 15/11/2019 14:22, Igor Fedotov wrote:
> > Do you mean both standalone DB and (!!) standalone WAL devices/partitions by having SSD DB/WAL?
>
> No, 1x combined DB/WAL partition on an SSD and 1x data partition on an HDD per OSD. I.e. c

[ceph-users] Re: Possible data corruption with 14.2.3 and 14.2.4

2019-11-15 Thread Simon Ironside
Hi Igor,

On 15/11/2019 14:22, Igor Fedotov wrote:
> Do you mean both standalone DB and (!!) standalone WAL devices/partitions by having SSD DB/WAL?

No, 1x combined DB/WAL partition on an SSD and 1x data partition on an HDD per OSD. I.e. created like:

ceph-deploy osd create --data /dev/sda --b
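For illustration only, a combined DB/WAL layout of this kind is typically created with something along these lines (device names and hostname are placeholders, not the exact command used above):

    # placeholder devices/host; WAL colocates with the DB when no --block-wal is given
    ceph-deploy osd create --data /dev/sda --block-db /dev/sdb1 node1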

[ceph-users] Re: Possible data corruption with 14.2.3 and 14.2.4

2019-11-15 Thread Igor Fedotov
Hi Simon,

Do you mean both standalone DB and (!!) standalone WAL devices/partitions by having SSD DB/WAL?

If so then BlueFS might eventually overwrite some data at your DB volume with BlueFS log content, which most probably makes the OSD crash and become unable to restart one day. This is quite random an

[ceph-users] Re: Possible data corruption with 14.2.3 and 14.2.4

2019-11-15 Thread Simon Ironside
Hi,

I have two new-ish 14.2.4 clusters that began life on 14.2.0, all with HDD OSDs with SSD DB/WALs, but neither has experienced obvious problems yet.

What's the impact of this? Does possible data corruption mean possible silent data corruption? Or does the corruption cause the OSD failures

[ceph-users] Re: Possible data corruption with 14.2.3 and 14.2.4

2019-11-14 Thread Mark Nelson
Great job to everyone involved in tracking this down!

Mark

On 11/14/19 10:10 AM, Sage Weil wrote:
> Hi everyone,
>
> We've identified a data corruption bug[1], first introduced[2] (by yours truly) in 14.2.3 and affecting both 14.2.3 and 14.2.4. The corruption appears as a rocksdb checksum error or as