The corrupt block exception from the compressor in 2.1/2.2 is something I don’t 
recall ever being attributed to anything other than bad hardware, so that seems 
by far the most likely cause. 

The corruption the compressor is catching means that the checksum written 
immediately after each compressed block doesn’t match when the block is read back. 
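For context, here's a minimal sketch of that layout and check: each compressed chunk is followed by its checksum, which is recomputed and compared on every read. (I believe 2.x uses Adler32 for the chunk checksums and 3.0+ moved to CRC32, but the principle is the same either way.)

```python
import zlib

def write_chunk(compressed: bytes) -> bytes:
    # Cassandra-style layout: compressed chunk, then its 4-byte checksum
    checksum = zlib.adler32(compressed) & 0xFFFFFFFF
    return compressed + checksum.to_bytes(4, "big")

def read_chunk(stored: bytes) -> bytes:
    data, stored_sum = stored[:-4], int.from_bytes(stored[-4:], "big")
    if (zlib.adler32(data) & 0xFFFFFFFF) != stored_sum:
        # this is the condition that surfaces as CorruptBlockException
        raise IOError("Corrupted block: checksum mismatch")
    return data

chunk = write_chunk(zlib.compress(b"some sstable data"))
read_chunk(chunk)            # round-trips fine

flipped = bytearray(chunk)
flipped[5] ^= 0x01           # a single bit flip on "disk"
try:
    read_chunk(bytes(flipped))
except IOError as e:
    print(e)                 # the mismatch is detected on read-back
```

The point being: any single flipped bit between write and read, whether in the disk media, controller, or RAM on the read path, is enough to trip this check.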

How sensitive is this data? Would you be able to send one of the corrupt data 
files to a developer to check? Or is it something like PII you can’t share? 

Have you found this corruption on every single instance? Are you using single 
tokens or vnodes? Is it happening in both DCs? 
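Incidentally, the cmd= fields in your /var/log/messages audit excerpt are just hex-encoded command lines, so you can decode them to see exactly what ran at that moment. For example, the 15:56:04 one:

```python
# The audit log's cmd= field is the command line, hex-encoded.
hex_cmd = (
    "72756E75736572202D73202F62696E2F62617368202D6C20636173735F62202D6320"
    "63617373616E6472612D6D6574612F63617373616E6472612F62696E2F6E6F646574"
    "6F6F6C2074707374617473"
)
print(bytes.fromhex(hex_cmd).decode())
# -> runuser -s /bin/bash -l cass_b -c cassandra-meta/cassandra/bin/nodetool tpstats
```

(The 15:55:14 one decodes to an /usr/sbin/ipmi-sensors invocation, so nothing Cassandra-related there.)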




> On Aug 8, 2019, at 4:20 AM, Philip Ó Condúin <philipocond...@gmail.com> wrote:
> 
> Hi All,
> 
> Thank you so much for the replies.  
> 
> Currently, I have the following list that can potentially cause some sort of 
> corruption in a Cassandra cluster. 
> 
> Sudden Power cut  -  We have had no power cuts in the datacenters
> Network Issues - no network issues from what I can tell
> Disk full - I don't think this is an issue for us, see disks below.
> A bug in the Cassandra version, like CASSANDRA-13752 - couldn't find any Jira 
> issues similar to ours.
> Bit Flips - we have compression enabled so I don't think this should be an 
> issue.
> Repair during upgrade has caused corruption too - we have not upgraded
> Dropping and adding columns with the same name but a different type - I will 
> need to ask the apps team how they are using the database.
> 
> 
> OK, let me try to explain the issue we are having. I am under a lot of 
> pressure from above to get this fixed and I can't figure it out.
> 
> This is a PRE-PROD environment.
> 2 datacenters.
> 9 physical servers in each datacenter
> 4 Cassandra instances on each server
> 72 Cassandra instances across the 2 datacenters: 36 in site A, 36 in site B.
> 
> We also have 2 Reaper nodes we use for repair: one Reaper node in each 
> datacenter, each running with its own Cassandra backend, clustered together.
> 
> OS Details [Red Hat Linux]
> cass_a@x 0 10:53:01 ~ $ uname -a
> Linux x 3.10.0-957.5.1.el7.x86_64 #1 SMP Wed Dec 19 10:46:58 EST 2018 x86_64 
> x86_64 x86_64 GNU/Linux
> 
> cass_a@x 0 10:57:31 ~ $ cat /etc/*release
> NAME="Red Hat Enterprise Linux Server"
> VERSION="7.6 (Maipo)"
> ID="rhel"
> 
> Storage Layout 
> cass_a@xx 0 10:46:28 ~ $ df -h
> Filesystem                         Size  Used Avail Use% Mounted on
> /dev/mapper/vg01-lv_root            20G  2.2G   18G  11% /
> devtmpfs                            63G     0   63G   0% /dev
> tmpfs                               63G     0   63G   0% /dev/shm
> tmpfs                               63G  4.1G   59G   7% /run
> tmpfs                               63G     0   63G   0% /sys/fs/cgroup
> >> 4 cassandra instances
> /dev/sdd                           1.5T  802G  688G  54% /data/ssd4
> /dev/sda                           1.5T  798G  692G  54% /data/ssd1
> /dev/sdb                           1.5T  681G  810G  46% /data/ssd2
> /dev/sdc                           1.5T  558G  932G  38% /data/ssd3
> 
> Cassandra load is about 200GB; the rest of the space is snapshots.
> 
> CPU
> cass_a@x 127 10:58:47 ~ $ lscpu | grep -E '^Thread|^Core|^Socket|^CPU\('
> CPU(s):                64
> Thread(s) per core:    2
> Core(s) per socket:    16
> Socket(s):             2
> 
> Description of problem:
> During repair of the cluster, we are seeing multiple corruptions in the log 
> files on a lot of instances.  There seems to be no pattern to the corruption. 
>  It seems that the repair job is finding all the corrupted files for us.  The 
> repair will hang on the node where the corrupted file is found.  To fix this 
> we remove/rename the datafile and bounce the Cassandra instance.  Our 
> hardware/OS team have stated there is no problem on their side.  I do not 
> believe it is the repair causing the corruption. 
> 
> We have maintenance scripts that run every night, running compactions and 
> creating snapshots. I decided to turn these off, fix any corruptions we 
> currently had, and run a major compaction on all nodes. Once this was done we 
> had a "clean" cluster, and we left the cluster for a few days. After this 
> process we noticed one corruption in the cluster; that datafile was created 
> after I turned off the maintenance scripts, so my theory that the scripts 
> were causing the issue was wrong. We then kicked off another repair and 
> started to find more corrupt files created after the maintenance scripts were 
> turned off.
> 
> 
> So let me give you an example of a corrupted file and maybe someone might be 
> able to work through it with me?
> 
> When this corrupted file was reported in the log it looks like it was the 
> repair that found it.
> 
> $ journalctl -u cassmeta-cass_b.service --since "2019-08-07 22:25:00" --until 
> "2019-08-07 22:45:00"
> 
> Aug 07 22:30:33 cassandra[34611]: INFO  21:30:33 Writing 
> Memtable-compactions_in_progress@830377457(0.008KiB serialized bytes, 1 ops, 
> 0%/0% of on/off-heap limit)
> Aug 07 22:30:33 cassandra[34611]: ERROR 21:30:33 Failed creating a merkle 
> tree for [repair #9587a200-b95a-11e9-8920-9f72868b8375 on KeyspaceMetadata/x, 
> (-1476350953672479093,-1474461
> Aug 07 22:30:33 cassandra[34611]: ERROR 21:30:33 Exception in thread 
> Thread[ValidationExecutor:825,1,main]
> Aug 07 22:30:33 cassandra[34611]: org.apache.cassandra.io.FSReadError: 
> org.apache.cassandra.io.sstable.CorruptSSTableException: Corrupted: 
> /x/ssd2/data/KeyspaceMetadata/x-1e453cb0
> Aug 07 22:30:33 cassandra[34611]: at 
> org.apache.cassandra.io.util.RandomAccessReader.readBytes(RandomAccessReader.java:365)
>  ~[apache-cassandra-2.2.13.jar:2.2.13]
> Aug 07 22:30:33 cassandra[34611]: at 
> org.apache.cassandra.utils.ByteBufferUtil.read(ByteBufferUtil.java:361) 
> ~[apache-cassandra-2.2.13.jar:2.2.13]
> Aug 07 22:30:33 cassandra[34611]: at 
> org.apache.cassandra.utils.ByteBufferUtil.readWithShortLength(ByteBufferUtil.java:340)
>  ~[apache-cassandra-2.2.13.jar:2.2.13]
> Aug 07 22:30:33 cassandra[34611]: at 
> org.apache.cassandra.db.composites.AbstractCType$Serializer.deserialize(AbstractCType.java:382)
>  ~[apache-cassandra-2.2.13.jar:2.2.13]
> Aug 07 22:30:33 cassandra[34611]: at 
> org.apache.cassandra.db.composites.AbstractCType$Serializer.deserialize(AbstractCType.java:366)
>  ~[apache-cassandra-2.2.13.jar:2.2.13]
> Aug 07 22:30:33 cassandra[34611]: at 
> org.apache.cassandra.db.OnDiskAtom$Serializer.deserializeFromSSTable(OnDiskAtom.java:81)
>  ~[apache-cassandra-2.2.13.jar:2.2.13]
> Aug 07 22:30:33 cassandra[34611]: at 
> org.apache.cassandra.db.AbstractCell$1.computeNext(AbstractCell.java:52) 
> ~[apache-cassandra-2.2.13.jar:2.2.13]
> Aug 07 22:30:33 cassandra[34611]: at 
> org.apache.cassandra.db.AbstractCell$1.computeNext(AbstractCell.java:46) 
> ~[apache-cassandra-2.2.13.jar:2.2.13]
> Aug 07 22:30:33 cassandra[34611]: at 
> com.google.common.collect.AbstractIterator.tryToComputeNext(AbstractIterator.java:143)
>  ~[guava-16.0.jar:na]
> Aug 07 22:30:33 cassandra[34611]: at 
> com.google.common.collect.AbstractIterator.hasNext(AbstractIterator.java:138) 
> ~[guava-16.0.jar:na]
> Aug 07 22:30:33 cassandra[34611]: at 
> org.apache.cassandra.io.sstable.SSTableIdentityIterator.hasNext(SSTableIdentityIterator.java:169)
>  ~[apache-cassandra-2.2.13.jar:2.2.13]
> Aug 07 22:30:33 cassandra[34611]: at 
> org.apache.cassandra.utils.MergeIterator$OneToOne.computeNext(MergeIterator.java:202)
>  ~[apache-cassandra-2.2.13.jar:2.2.13]
> Aug 07 22:30:33 cassandra[34611]: at 
> com.google.common.collect.AbstractIterator.tryToComputeNext(AbstractIterator.java:143)
>  ~[guava-16.0.jar:na]
> Aug 07 22:30:33 cassandra[34611]: at 
> com.google.common.collect.AbstractIterator.hasNext(AbstractIterator.java:138) 
> ~[guava-16.0.jar:na]
> Aug 07 22:30:33 cassandra[34611]: at 
> com.google.common.collect.Iterators$7.computeNext(Iterators.java:645) 
> ~[guava-16.0.jar:na]
> Aug 07 22:30:33 cassandra[34611]: at 
> com.google.common.collect.AbstractIterator.tryToComputeNext(AbstractIterator.java:143)
>  ~[guava-16.0.jar:na]
> Aug 07 22:30:33 cassandra[34611]: at 
> com.google.common.collect.AbstractIterator.hasNext(AbstractIterator.java:138) 
> ~[guava-16.0.jar:na]
> Aug 07 22:30:33 cassandra[34611]: at 
> org.apache.cassandra.db.ColumnIndex$Builder.buildForCompaction(ColumnIndex.java:174)
>  ~[apache-cassandra-2.2.13.jar:2.2.13]
> Aug 07 22:30:33 cassandra[34611]: at 
> org.apache.cassandra.db.compaction.LazilyCompactedRow.update(LazilyCompactedRow.java:187)
>  ~[apache-cassandra-2.2.13.jar:2.2.13]
> Aug 07 22:30:33 cassandra[34611]: at 
> org.apache.cassandra.repair.Validator.rowHash(Validator.java:201) 
> ~[apache-cassandra-2.2.13.jar:2.2.13]
> Aug 07 22:30:33 cassandra[34611]: at 
> org.apache.cassandra.repair.Validator.add(Validator.java:150) 
> ~[apache-cassandra-2.2.13.jar:2.2.13]
> Aug 07 22:30:33 cassandra[34611]: at 
> org.apache.cassandra.db.compaction.CompactionManager.doValidationCompaction(CompactionManager.java:1166)
>  ~[apache-cassandra-2.2.13.jar:2.2.13]
> Aug 07 22:30:33 cassandra[34611]: at 
> org.apache.cassandra.db.compaction.CompactionManager.access$600(CompactionManager.java:76)
>  ~[apache-cassandra-2.2.13.jar:2.2.13]
> Aug 07 22:30:33 cassandra[34611]: at 
> org.apache.cassandra.db.compaction.CompactionManager$10.call(CompactionManager.java:736)
>  ~[apache-cassandra-2.2.13.jar:2.2.13]
> Aug 07 22:30:33 cassandra[34611]: at 
> java.util.concurrent.FutureTask.run(FutureTask.java:266) ~[na:1.8.0_172]
> Aug 07 22:30:33 cassandra[34611]: at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>  ~[na:1.8.0_172]
> Aug 07 22:30:33 cassandra[34611]: at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>  [na:1.8.0_172]
> Aug 07 22:30:33 cassandra[34611]: at java.lang.Thread.run(Thread.java:748) 
> [na:1.8.0_172]
> Aug 07 22:30:33 cassandra[34611]: Caused by: 
> org.apache.cassandra.io.sstable.CorruptSSTableException: Corrupted: 
> /data/ssd2/data/KeyspaceMetadata/x-x/l
> Aug 07 22:30:33 cassandra[34611]: at 
> org.apache.cassandra.io.compress.CompressedRandomAccessReader.reBufferMmap(CompressedRandomAccessReader.java:216)
>  ~[apache-cassandra-2.2.13.jar:2.2.13]
> Aug 07 22:30:33 cassandra[34611]: at 
> org.apache.cassandra.io.compress.CompressedRandomAccessReader.reBuffer(CompressedRandomAccessReader.java:226)
>  ~[apache-cassandra-2.2.13.jar:2.2.13]
> Aug 07 22:30:33 cassandra[34611]: at 
> org.apache.cassandra.io.compress.CompressedThrottledReader.reBuffer(CompressedThrottledReader.java:42)
>  ~[apache-cassandra-2.2.13.jar:2.2.13]
> Aug 07 22:30:33 cassandra[34611]: at 
> org.apache.cassandra.io.util.RandomAccessReader.readBytes(RandomAccessReader.java:352)
>  ~[apache-cassandra-2.2.13.jar:2.2.13]
> Aug 07 22:30:33 cassandra[34611]: ... 27 common frames omitted
> Aug 07 22:30:33 cassandra[34611]: Caused by: 
> org.apache.cassandra.io.compress.CorruptBlockException: 
> (/data/ssd2/data/KeyspaceMetadata/x-x/lb-26203-big
> Aug 07 22:30:33 cassandra[34611]: at 
> org.apache.cassandra.io.compress.CompressedRandomAccessReader.reBufferMmap(CompressedRandomAccessReader.java:185)
>  ~[apache-cassandra-2.2.13.jar:2.2.13]
> Aug 07 22:30:33 cassandra[34611]: ... 30 common frames omitted
> Aug 07 22:30:33 cassandra[34611]: INFO  21:30:33 Not a global repair, will 
> not do anticompaction
> Aug 07 22:30:33 cassandra[34611]: ERROR 21:30:33 Stopping gossiper
> Aug 07 22:30:33 cassandra[34611]: WARN  21:30:33 Stopping gossip by operator 
> request
> Aug 07 22:30:33 cassandra[34611]: INFO  21:30:33 Announcing shutdown
> Aug 07 22:30:33 cassandra[34611]: INFO  21:30:33 Node /10.2.57.37 state jump 
> to shutdown
> 
> 
> So I went to the file system to see when this corrupt file was created: it 
> was created on July 30th at 15:55.
> 
> root@x 0 01:14:03 ~ # ls -l 
> /data/ssd2/data/KeyspaceMetadata/x-x/lb-26203-big-Data.db
> -rw-r--r-- 1 cass_b cass_b 3182243670 Jul 30 15:55 
> /data/ssd2/data/KeyspaceMetadata/x-x/lb-26203-big-Data.db
> 
> 
> 
> So I checked /var/log/messages for errors around that time.  The only thing 
> that stands out to me is the message "Cannot perform a full major compaction 
> as repaired and unrepaired sstables cannot be compacted together", though I'm 
> not sure whether this would be an issue and cause corruption.
> 
> Jul 30 15:55:06 x systemd: Created slice User Slice of root.
> Jul 30 15:55:06 x systemd: Started Session c165280 of user root.
> Jul 30 15:55:06 x audispd: node=x. type=USER_START 
> msg=audit(1564498506.021:457933): pid=17533 uid=0 auid=4294967295 
> ses=4294967295 msg='op=PAM:session_open 
> grantors=pam_keyinit,pam_limits,pam_keyinit,pam_limits,pam_tty_audit,pam_systemd,pam_unix
>  acct="root" exe="/usr/bin/sudo" hostname=? addr=? terminal=? res=success'
> Jul 30 15:55:06 x systemd: Removed slice User Slice of root.
> Jul 30 15:55:14 x tag_audit_log: type=USER_CMD 
> msg=audit(1564498506.013:457932): pid=17533 uid=509 auid=4294967295 
> ses=4294967295 msg='cwd="/" 
> cmd=2F7573722F7362696E2F69706D692D73656E736F7273202D2D71756965742D6361636865202D2D7364722D63616368652D7265637265617465202D2D696E746572707265742D6F656D2D64617461202D2D6F75747075742D73656E736F722D7374617465202D2D69676E6F72652D6E6F742D617661696C61626C652D73656E736F7273202D2D6F75747075742D73656E736F722D7468726573686F6C6473
>  terminal=? res=success'
> Jul 30 15:55:14 x tag_audit_log: type=USER_START 
> msg=audit(1564498506.021:457933): pid=17533 uid=0 auid=4294967295 
> ses=4294967295 msg='op=PAM:session_open 
> grantors=pam_keyinit,pam_limits,pam_keyinit,pam_limits,pam_tty_audit,pam_systemd,pam_unix
>  acct="root" exe="/usr/bin/sudo" hostname=? addr=? terminal=? res=success'
> Jul 30 15:55:19 x cassandra: INFO  14:55:19 Writing 
> Memtable-compactions_in_progress@1462227999(0.008KiB serialized bytes, 1 ops, 
> 0%/0% of on/off-heap limit)
> Jul 30 15:55:19 x cassandra: INFO  14:55:19 Cannot perform a full major 
> compaction as repaired and unrepaired sstables cannot be compacted together. 
> These two set of sstables will be compacted separately.
> Jul 30 15:55:19 x cassandra: INFO  14:55:19 Writing 
> Memtable-compactions_in_progress@1198535528(1.002KiB serialized bytes, 57 
> ops, 0%/0% of on/off-heap limit)
> Jul 30 15:55:20 x cassandra: INFO  14:55:20 Writing 
> Memtable-compactions_in_progress@2039409834(0.008KiB serialized bytes, 1 ops, 
> 0%/0% of on/off-heap limit)
> Jul 30 15:55:24 x audispd: node=x. type=USER_LOGOUT 
> msg=audit(1564498524.409:457934): pid=46620 uid=0 auid=464400029 ses=2747 
> msg='op=login id=464400029 exe="/usr/sbin/sshd" hostname=? addr=? 
> terminal=/dev/pts/0 res=success'
> Jul 30 15:55:24 x audispd: node=x. type=USER_LOGOUT 
> msg=audit(1564498524.409:457935): pid=4878 uid=0 auid=464400029 ses=2749 
> msg='op=login id=464400029 exe="/usr/sbin/sshd" hostname=? addr=? 
> terminal=/dev/pts/1 res=success'
> 
> Jul 30 15:55:57 x systemd: Created slice User Slice of root.
> Jul 30 15:55:57 x systemd: Started Session c165288 of user root.
> Jul 30 15:55:57 x audispd: node=x. type=USER_START 
> msg=audit(1564498557.294:457958): pid=19687 uid=0 auid=4294967295 
> ses=4294967295 msg='op=PAM:session_open 
> grantors=pam_keyinit,pam_limits,pam_keyinit,pam_limits,pam_tty_audit,pam_systemd,pam_unix
>  acct="root" exe="/usr/bin/sudo" hostname=? addr=? terminal=? res=success'
> Jul 30 15:55:57 x audispd: node=x. type=USER_START 
> msg=audit(1564498557.298:457959): pid=19690 uid=0 auid=4294967295 
> ses=4294967295 msg='op=PAM:session_open 
> grantors=pam_keyinit,pam_systemd,pam_keyinit,pam_limits,pam_unix 
> acct="cass_b" exe="/usr/sbin/runuser" hostname=? addr=? terminal=? 
> res=success'
> Jul 30 15:55:58 x systemd: Removed slice User Slice of root.
> Jul 30 15:56:02 x cassandra: INFO  14:56:02 Writing 
> Memtable-compactions_in_progress@1532791194(0.008KiB serialized bytes, 1 ops, 
> 0%/0% of on/off-heap limit)
> Jul 30 15:56:02 x cassandra: INFO  14:56:02 Cannot perform a full major 
> compaction as repaired and unrepaired sstables cannot be compacted together. 
> These two set of sstables will be compacted separately.
> Jul 30 15:56:02 x cassandra: INFO  14:56:02 Writing 
> Memtable-compactions_in_progress@1455399453(0.281KiB serialized bytes, 16 
> ops, 0%/0% of on/off-heap limit)
> Jul 30 15:56:04 x tag_audit_log: type=USER_CMD 
> msg=audit(1564498555.190:457951): pid=19294 uid=509 auid=4294967295 
> ses=4294967295 msg='cwd="/" 
> cmd=72756E75736572202D73202F62696E2F62617368202D6C20636173735F62202D632063617373616E6472612D6D6574612F63617373616E6472612F62696E2F6E6F6465746F6F6C2074707374617473
>  terminal=? res=success'
> 
> 
> 
> We have checked a number of other things, like NTP settings, but nothing is 
> telling us what could cause so many corruptions across the entire cluster.
> Things were healthy with this cluster for months; the only thing I can think 
> of is that we increased the load from 20GB per instance up to 200GB, where it 
> sits now, and maybe this just highlighted the issue.
> 
> 
> 
> Compaction and compression on the keyspace CFs [a mixture]. 
> All CFs are using compression.
> 
> AND compaction = {'min_threshold': '4', 'class': 
> 'org.apache.cassandra.db.compaction.SizeTieredCompactionStrategy', 
> 'max_threshold': '32'}
> AND compression = {'sstable_compression': 
> 'org.apache.cassandra.io.compress.SnappyCompressor'}
> 
> AND compaction = {'min_threshold': '4', 'class': 
> 'org.apache.cassandra.db.compaction.SizeTieredCompactionStrategy', 
> 'max_threshold': '32'}
> AND compression = {'sstable_compression': 
> 'org.apache.cassandra.io.compress.LZ4Compressor'}
> 
> AND compaction = {'class': 
> 'org.apache.cassandra.db.compaction.LeveledCompactionStrategy'}
> AND compression = {'sstable_compression': 
> 'org.apache.cassandra.io.compress.SnappyCompressor'}
> 
> --We are also using internode network compression:
> internode_compression: all
> 
> 
> 
> Does anyone have any idea what I should check next?
> Our next theory is that there may be an issue with checksums, but I'm not 
> sure where to go with this.
> 
> Any help would be very much appreciated before I lose the last bit of hair I 
> have on my head. 
> 
> Kind Regards,
> Phil
> 
>> On Wed, 7 Aug 2019 at 20:51, Nitan Kainth <nitankai...@gmail.com> wrote:
>> Repair during an upgrade has caused corruption too.
>> 
>> Also, dropping and adding columns with the same name but a different type
>> 
>> 
>> Regards,
>> Nitan
>> Cell: 510 449 9629
>> 
>>> On Aug 7, 2019, at 2:42 PM, Jeff Jirsa <jji...@gmail.com> wrote:
>>> 
>>> Is compression enabled?
>>> 
>>> If not, bit flips on disk can corrupt data files and reads + repair may 
>>> send that corruption to other hosts in the cluster 
>>> 
>>> 
>>>> On Aug 7, 2019, at 3:46 AM, Philip Ó Condúin <philipocond...@gmail.com> 
>>>> wrote:
>>>> 
>>>> Hi All,
>>>> 
>>>> I am currently experiencing multiple datafile corruptions across most 
>>>> nodes in my cluster, and there seems to be no pattern to the corruption.  
>>>> I'm starting to think it might be a bug; we're using Cassandra 2.2.13.
>>>> 
>>>> Without going into detail about the issue I just want to confirm something.
>>>> 
>>>> Can someone share with me a list of scenarios that would cause corruption?
>>>> 
>>>> 1. OS failure
>>>> 2. Cassandra interrupted during a write 
>>>> 
>>>> etc etc.
>>>> 
>>>> I need to investigate each scenario and don't want to leave any out.
>>>> 
>>>> -- 
>>>> Regards,
>>>> Phil
> 
> 
> -- 
> Regards,
> Phil
