Cool! Thanks for sharing the RCA.
On Wed, Oct 9, 2019 at 2:56 PM Philip Ó Condúin wrote:
Just to follow up on this issue as others may see it in the future, we
cracked it!
Our datafile corruption issues were a problem with the OS wrongly taking one
block belonging to a C* data file, thinking it was no longer used, and
treating it as a free block that would later be reused.
For example:
> "cat /ssd2/data/KeyspaceMetadata/x-x/lb-26203-big-Data.db > /dev/null"
> If you get an error message, it's probably a hardware issue.
>
> - Erik -
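Erik's one-file check generalizes to a whole node. Here is a minimal sketch (not from the thread; `DATA_DIR` and its default path are assumptions) that streams every sstable data file to /dev/null and reports any read errors:

```shell
# Read every sstable Data file end-to-end; a read error here usually points
# at the disk/filesystem rather than Cassandra itself.
# DATA_DIR is an assumed default location -- adjust for your install.
DATA_DIR=${DATA_DIR:-/var/lib/cassandra/data}
find "$DATA_DIR" -name '*-Data.db' 2>/dev/null | while IFS= read -r f; do
  cat "$f" > /dev/null 2>&1 || echo "READ ERROR: $f"
done
echo "scan complete"
```

Any "READ ERROR" line flags a file the OS could not read back, which is worth cross-checking against dmesg before suspecting Cassandra.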
From: Philip Ó Condúin
Sent: Thursday, August 8, 2019 09:58
To: user@cassandra.apache.org
Subject: Re: Datafile Corruption
Hi Jon,
Good question. I'm not sure if we're using NVMe; I don't see /dev/nvme, but we
could still be using it.
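For anyone checking the same thing, a couple of quick looks can confirm whether NVMe devices are present (a sketch; the /dev and lsblk output naturally vary per host):

```shell
# NVMe block devices show up as /dev/nvme* on Linux.
ls /dev/nvme* 2>/dev/null || echo "no /dev/nvme devices visible"
# lsblk, where available, also shows the model for each disk.
lsblk -d -o NAME,MODEL 2>/dev/null || true
```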
Jul 9 03:00:43 kernel: fnic_handle_fip_timer: 8 callbacks suppressed
>>> On Thu, 8 Aug 2019 at 15:42, ZAIDI, ASAD A wrote:
>>>
>>> Did you check if packets are NOT being dropped for network interfaces
>>> Cassandra instances are consuming (ifconfig -a)? Internode compression is
>>> set for all endpoints – maybe the network is playing a role here?
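The dropped-packet counters Asad mentions can be read straight from sysfs as well as from ifconfig. A small sketch (the `drops` helper is hypothetical, not a standard tool):

```shell
# Print RX/TX drop counters for one interface from Linux sysfs -- the same
# numbers ifconfig -a reports as "dropped". Missing interfaces report 0.
drops() {
  rx=$(cat "/sys/class/net/$1/statistics/rx_dropped" 2>/dev/null || echo 0)
  tx=$(cat "/sys/class/net/$1/statistics/tx_dropped" 2>/dev/null || echo 0)
  echo "$1 rx_dropped=$rx tx_dropped=$tx"
}
drops lo
```

Steadily climbing counters on the interfaces Cassandra uses would support the network theory.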
>>>
>>> Is this corruption limited to certain keyspaces/tables | DCs, or is it
>>> widespread – only a specific keyspace/table is affected – is that correct?
>>
>> When you remove a corrupted sstable of a certain table, I guess you
>> verify all nodes for corrupted sstables for the same table (maybe with the
>> nodetool scrub tool) to limit the spread of corruption – right?
>>
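Scrubbing the same table on every node, as suggested above, might be scripted along these lines (a sketch that only prints the commands; the keyspace, table, and host names are placeholders, and in practice you would run this via ssh or your orchestration tool):

```shell
# Emit the scrub command to run on each node for the affected table.
# KEYSPACE, TABLE, and the host list are hypothetical placeholders.
KEYSPACE=my_keyspace
TABLE=my_table
for host in node1 node2 node3; do
  echo "ssh $host nodetool scrub $KEYSPACE $TABLE"
done
```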
Just curious to know – you're not using lz4/the default compressor for all
tables; there must be some reason for it.
From: Philip Ó Condúin [mailto:philipocond...@gmail.com]
Sent: Thursday, August 08, 2019 6:20 AM
To: user@cassandra.apache.org
Subject: Re: Datafile Corruption
The corrupt block exception from the compressor in 2.1/2.2 is something I don't
recall ever being attributed to anything other than bad hardware, so that seems
by far the most likely option.
The corruption that the compressor is catching says the checksum written
immediately after the compressed chunk no longer matches the data read back.
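The chunk-plus-checksum mechanism described here can be illustrated with a toy shell example (this is not Cassandra's actual on-disk format; `cksum` and the /tmp files stand in for the real per-chunk checksum):

```shell
# Toy illustration: store a checksum next to a chunk of data, then detect a
# single corrupted byte on "read", as the compressor does per chunk.
printf 'hello cassandra' > /tmp/chunk.bin
cksum /tmp/chunk.bin | awk '{print $1}' > /tmp/chunk.crc

# Simulate corruption: overwrite the first byte ('h' -> 'i').
printf 'i' | dd of=/tmp/chunk.bin bs=1 count=1 conv=notrunc 2>/dev/null

stored=$(cat /tmp/chunk.crc)
actual=$(cksum /tmp/chunk.bin | awk '{print $1}')
if [ "$stored" != "$actual" ]; then echo "corrupt block detected"; fi
```

A CRC always catches an error burst shorter than its width, so a single flipped bit or byte in a chunk is guaranteed to trip the check.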
Hi All,
Thank you so much for the replies.
Currently, I have the following list of things that can potentially cause some
sort of corruption in a Cassandra cluster:
- Sudden Power cut - *We have had no power cuts in the datacenters*
- Network Issues - *no network issues from what I can tell*
Repair during upgrade has caused corruption too.
Also, dropping and adding columns with the same name but a different type.
Regards,
Nitan
Cell: 510 449 9629
> On Aug 7, 2019, at 2:42 PM, Jeff Jirsa wrote:
Is compression enabled?
If not, bit flips on disk can corrupt data files and reads + repair may send
that corruption to other hosts in the cluster
> On Aug 7, 2019, at 3:46 AM, Philip Ó Condúin wrote:
A few of the reasons:
Sudden power cut
Disk full
Issue in the Cassandra version, like CASSANDRA-13752
On Wed, Aug 7, 2019, 4:16 PM Philip Ó Condúin wrote:
Hi All,
I am currently experiencing multiple datafile corruptions across most nodes
in my cluster; there seems to be no pattern to the corruption. I'm starting
to think it might be a bug; we're using Cassandra 2.2.13.
Without going into detail about the issue, I just want to confirm something.