Re: Datafile Corruption

2019-10-09 Thread Laxmikant Upadhyay
Cool! Thanks for sharing the RCA. On Wed, Oct 9, 2019 at 2:56 PM Philip Ó Condúin wrote: > Just to follow up on this issue as others may see it in the future, we > cracked it! > > Our datafile corruption issues were a problem with the OS wrongly taking > one block belonging to

Re: Datafile Corruption

2019-10-09 Thread Philip Ó Condúin
Just to follow up on this issue as others may see it in the future, we cracked it! Our datafile corruption issues were a problem with the OS wrongly taking one block belonging to a C* data file, thinking it was no longer used, and treating it as a free block that would later be used. For example: C

Re: Datafile Corruption

2019-08-14 Thread Patrick McFadin
/ssd2/data/KeyspaceMetadata/x-x/lb-26203-big-Data.db > > /dev/null" > If you get an error message, it's probably a hardware issue. > > - Erik - > > -- > *From:* Philip Ó Condúin > *Sent:* Thursday, August 8, 2019 09:58 > *To:* user@cassandra
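[Editor's sketch] The check quoted above appears to amount to reading the whole data file and discarding the output, so that a failing disk surfaces as a read error. A minimal Python equivalent might look like the following; the function name `read_through` and the 1 MiB chunk size are assumptions made for illustration, not anything from the thread.

```python
import sys


def read_through(path: str, chunk_size: int = 1 << 20) -> int:
    """Read every byte of `path` and discard it; return the byte count.

    An OSError (for example EIO) raised by a read propagates to the caller,
    which is the signal that the storage layer could not return the data.
    """
    total = 0
    with open(path, "rb") as f:
        while True:
            block = f.read(chunk_size)
            if not block:  # empty bytes object means end of file
                break
            total += len(block)
    return total


if __name__ == "__main__":
    # Usage: python read_through.py FILE [FILE ...]
    for path in sys.argv[1:]:
        try:
            print(f"{path}: read {read_through(path)} bytes without error")
        except OSError as exc:
            print(f"{path}: read failed: {exc}")
```

A clean pass only shows that the reads succeeded; blocks served from the page cache may never touch the disk, and a block that reads back without error can still hold the wrong contents.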

Re: Datafile Corruption

2019-08-14 Thread Forkalsrud, Erik
ardware issue. - Erik - From: Philip Ó Condúin Sent: Thursday, August 8, 2019 09:58 To: user@cassandra.apache.org Subject: Re: Datafile Corruption Hi Jon, Good question, I'm not sure if we're using NVMe, I don't see /dev/nvme but we could still be using it. W

Re: Datafile Corruption

2019-08-09 Thread Philip Ó Condúin
8 callbacks suppressed >>> Jul 9 03:00:43 kernel: fnic_handle_fip_timer: 8 callbacks suppressed >>> >>> >>> >>> On Thu, 8 Aug 2019 at 15:42, ZAIDI, ASAD A wrote: >>> >>>> Did you check if packets are NOT being dropped for network interfaces

Re: Datafile Corruption

2019-08-08 Thread Philip Ó Condúin
faces >>> Cassandra instances are consuming (ifconfig -a) internode compression is >>> set for all endpoint – maybe the network is playing a role here? >>> >>> is this corruption limited to certain keyspace/table | DCs or is that >>> wide spread – the

Re: Datafile Corruption

2019-08-08 Thread Jon Haddad
cific >> keyspace/table is affected – is that correct? >> >> When you remove corrupted sstable of a certain table, I guess you >> verify all nodes for corrupted sstables for same table (maybe with the >> nodetool scrub tool) so to limit spread of corruptions – right? >> >> Ju

Re: Datafile Corruption

2019-08-08 Thread Philip Ó Condúin
mit spread of corruptions – right? > > Just curious to know – you’re not using lz4/default compressor for all > tables there must be some reason for it. > > > > > > > > *From:* Philip Ó Condúin [mailto:philipocond...@gmail.com] > *Sent:* Thursday, August 08, 201

RE: Datafile Corruption

2019-08-08 Thread ZAIDI, ASAD A
– right? Just curious to know – you’re not using lz4/default compressor for all tables there must be some reason for it. From: Philip Ó Condúin [mailto:philipocond...@gmail.com] Sent: Thursday, August 08, 2019 6:20 AM To: user@cassandra.apache.org Subject: Re: Datafile Corruption Hi All, Thank

Re: Datafile Corruption

2019-08-08 Thread Jeff Jirsa
The corrupt block exception from the compressor in 2.1/2.2 is something I don’t recall ever being attributed to anything other than bad hardware, so that seems by far the most likely option. The corruption that the compressor is catching says the checksum written immediately after the
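[Editor's sketch] The idea described above, a checksum stored right after a compressed block and re-verified on read, can be illustrated in a few lines of Python. This is not Cassandra's on-disk format: the use of zlib, CRC32, a 4-byte big-endian trailer, and the names `write_chunk`, `read_chunk`, and `CorruptBlockError` are all assumptions chosen for the example.

```python
import struct
import zlib


class CorruptBlockError(Exception):
    """Raised when a chunk's stored checksum does not match its bytes."""


def write_chunk(data: bytes) -> bytes:
    """Compress `data` and append a CRC32 of the compressed bytes."""
    compressed = zlib.compress(data)
    checksum = zlib.crc32(compressed) & 0xFFFFFFFF
    return compressed + struct.pack(">I", checksum)


def read_chunk(chunk: bytes) -> bytes:
    """Verify the trailing checksum, then decompress the chunk."""
    if len(chunk) < 4:
        raise CorruptBlockError("chunk too short to hold a checksum")
    compressed = chunk[:-4]
    (stored,) = struct.unpack(">I", chunk[-4:])
    actual = zlib.crc32(compressed) & 0xFFFFFFFF
    if actual != stored:
        # The bytes read back differ from the bytes that were written.
        raise CorruptBlockError(
            f"checksum mismatch: stored {stored:#010x}, computed {actual:#010x}"
        )
    return zlib.decompress(compressed)
```

Because the checksum is computed from the bytes at write time, a mismatch at read time means those bytes changed somewhere between the write and the read, which is why this kind of error tends to point below the application.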

Re: Datafile Corruption

2019-08-08 Thread Philip Ó Condúin
Hi All, Thank you so much for the replies. Currently, I have the following list that can potentially cause some sort of corruption in a Cassandra cluster. - Sudden Power cut - *We have had no power cuts in the datacenters* - Network Issues - *no network issues from what I can tell*

Re: Datafile Corruption

2019-08-07 Thread Nitan Kainth
Repairs during upgrades have caused corruption too. Also, dropping and adding columns with the same name but a different type. Regards, Nitan Cell: 510 449 9629 > On Aug 7, 2019, at 2:42 PM, Jeff Jirsa wrote: > > Is compression enabled? > > If not, bit flips on disk can corrupt data files and reads

Re: Datafile Corruption

2019-08-07 Thread Jeff Jirsa
Is compression enabled? If not, bit flips on disk can corrupt data files and reads + repair may send that corruption to other hosts in the cluster > On Aug 7, 2019, at 3:46 AM, Philip Ó Condúin wrote: > > Hi All, > > I am currently experiencing multiple datafile corruptions across most

Re: Datafile Corruption

2019-08-07 Thread Laxmikant Upadhyay
A few possible reasons: sudden power cut, disk full, an issue in the Cassandra version such as CASSANDRA-13752. On Wed, Aug 7, 2019, 4:16 PM Philip Ó Condúin wrote: > Hi All, > > I am currently experiencing multiple datafile corruptions across most > nodes in my cluster, there seems to be no pattern to the

Datafile Corruption

2019-08-07 Thread Philip Ó Condúin
Hi All, I am currently experiencing multiple datafile corruptions across most nodes in my cluster, there seems to be no pattern to the corruption. I'm starting to think it might be a bug, we're using Cassandra 2.2.13. Without going into detail about the issue I just want to confirm something.