Hi everyone,

We are running several Cassandra clusters (usually 5 nodes each, with a replication factor of 3), and at least once per day we see corruption reported for a specific sstable in system/hints. We are using Cassandra 1.2.16 on RHEL 6.5.
Here is an example of such an exception:

ERROR [CompactionExecutor:1694] 2014-06-08 21:37:33,267 CassandraDaemon.java (line 191) Exception in thread Thread[CompactionExecutor:1694,1,main]
org.apache.cassandra.io.sstable.CorruptSSTableException: java.io.IOException: dataSize of 8224262783474088549 starting at 502360510 would be larger than file /home/y/var/cassandra/data/system/hints/system-hints-ic-281-Data.db length 504590769
    at org.apache.cassandra.io.sstable.SSTableIdentityIterator.<init>(SSTableIdentityIterator.java:167)
    at org.apache.cassandra.io.sstable.SSTableIdentityIterator.<init>(SSTableIdentityIterator.java:83)
    at org.apache.cassandra.io.sstable.SSTableIdentityIterator.<init>(SSTableIdentityIterator.java:69)
    at org.apache.cassandra.io.sstable.SSTableScanner$KeyScanningIterator.next(SSTableScanner.java:180)
    at org.apache.cassandra.io.sstable.SSTableScanner$KeyScanningIterator.next(SSTableScanner.java:155)
    at org.apache.cassandra.io.sstable.SSTableScanner.next(SSTableScanner.java:142)
    at org.apache.cassandra.io.sstable.SSTableScanner.next(SSTableScanner.java:38)
    at org.apache.cassandra.utils.MergeIterator$Candidate.advance(MergeIterator.java:145)
    at org.apache.cassandra.utils.MergeIterator$ManyToOne.advance(MergeIterator.java:122)
    at org.apache.cassandra.utils.MergeIterator$ManyToOne.computeNext(MergeIterator.java:96)
    at com.google.common.collect.AbstractIterator.tryToComputeNext(AbstractIterator.java:143)
    at com.google.common.collect.AbstractIterator.hasNext(AbstractIterator.java:138)
    at org.apache.cassandra.db.compaction.CompactionTask.runWith(CompactionTask.java:145)
    at org.apache.cassandra.io.util.DiskAwareRunnable.runMayThrow(DiskAwareRunnable.java:48)
    at org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:28)
    at org.apache.cassandra.db.compaction.CompactionTask.executeInternal(CompactionTask.java:58)
    at org.apache.cassandra.db.compaction.AbstractCompactionTask.execute(AbstractCompactionTask.java:60)
    at org.apache.cassandra.db.compaction.CompactionManager$7.runMayThrow(CompactionManager.java:442)
    at org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:28)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
    at java.util.concurrent.FutureTask.run(FutureTask.java:262)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:745)
Caused by: java.io.IOException: dataSize of 8224262783474088549 starting at 502360510 would be larger than file /home/y/var/cassandra/data/system/hints/system-hints-ic-281-Data.db length 504590769
    at org.apache.cassandra.io.sstable.SSTableIdentityIterator.<init>(SSTableIdentityIterator.java:123)
    ... 23 more
INFO [HintedHandoff:35] 2014-06-08 21:37:33,267 HintedHandOffManager.java (line 296) Started hinted handoff for host: 502a48cd-171b-4e83-a9ad-67f32437353a with IP: /10.210.239.190
ERROR [HintedHandoff:33] 2014-06-08 21:37:33,267 CassandraDaemon.java (line 191) Exception in thread Thread[HintedHandoff:33,1,main]
java.lang.RuntimeException: java.util.concurrent.ExecutionException: org.apache.cassandra.io.sstable.CorruptSSTableException: java.io.IOException: dataSize of 8224262783474088549 starting at 502360510 would be larger than file /home/y/var/cassandra/data/system/hints/system-hints-ic-281-Data.db length 504590769
    at org.apache.cassandra.db.HintedHandOffManager.doDeliverHintsToEndpoint(HintedHandOffManager.java:441)
    at org.apache.cassandra.db.HintedHandOffManager.deliverHintsToEndpoint(HintedHandOffManager.java:282)
    at org.apache.cassandra.db.HintedHandOffManager.access$300(HintedHandOffManager.java:90)
    at org.apache.cassandra.db.HintedHandOffManager$4.run(HintedHandOffManager.java:508)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:745)
Caused by: java.util.concurrent.ExecutionException: org.apache.cassandra.io.sstable.CorruptSSTableException: java.io.IOException: dataSize of 8224262783474088549 starting at 502360510 would be larger than file /home/y/var/cassandra/data/system/hints/system-hints-ic-281-Data.db length 504590769
    at java.util.concurrent.FutureTask.report(FutureTask.java:122)
    at java.util.concurrent.FutureTask.get(FutureTask.java:188)
    at org.apache.cassandra.db.HintedHandOffManager.doDeliverHintsToEndpoint(HintedHandOffManager.java:437)
    ... 6 more
Caused by: org.apache.cassandra.io.sstable.CorruptSSTableException: java.io.IOException: dataSize of 8224262783474088549 starting at 502360510 would be larger than file /home/y/var/cassandra/data/system/hints/system-hints-ic-281-Data.db length 504590769
    at org.apache.cassandra.io.sstable.SSTableIdentityIterator.<init>(SSTableIdentityIterator.java:167)
    at org.apache.cassandra.io.sstable.SSTableIdentityIterator.<init>(SSTableIdentityIterator.java:83)
    at org.apache.cassandra.io.sstable.SSTableIdentityIterator.<init>(SSTableIdentityIterator.java:69)
    at org.apache.cassandra.io.sstable.SSTableScanner$KeyScanningIterator.next(SSTableScanner.java:180)
    at org.apache.cassandra.io.sstable.SSTableScanner$KeyScanningIterator.next(SSTableScanner.java:155)
    at org.apache.cassandra.io.sstable.SSTableScanner.next(SSTableScanner.java:142)
    at org.apache.cassandra.io.sstable.SSTableScanner.next(SSTableScanner.java:38)
    at org.apache.cassandra.utils.MergeIterator$Candidate.advance(MergeIterator.java:145)
    at org.apache.cassandra.utils.MergeIterator$ManyToOne.advance(MergeIterator.java:122)
    at org.apache.cassandra.utils.MergeIterator$ManyToOne.computeNext(MergeIterator.java:96)
    at com.google.common.collect.AbstractIterator.tryToComputeNext(AbstractIterator.java:143)
    at com.google.common.collect.AbstractIterator.hasNext(AbstractIterator.java:138)
    at org.apache.cassandra.db.compaction.CompactionTask.runWith(CompactionTask.java:145)
    at org.apache.cassandra.io.util.DiskAwareRunnable.runMayThrow(DiskAwareRunnable.java:48)
    at org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:28)
    at org.apache.cassandra.db.compaction.CompactionTask.executeInternal(CompactionTask.java:58)
    at org.apache.cassandra.db.compaction.AbstractCompactionTask.execute(AbstractCompactionTask.java:60)

Our current filesystem configuration for Cassandra (nothing fancy …):

/dev/sda6 /home/y/var/cassandra/commitlog ext4 defaults,commit=20,noatime,nobarrier,nodiratime 0 0
/dev/sda7 /home/y/var/cassandra/data ext4 defaults,commit=20,data=writeback,noatime,nobarrier,nodiratime 0 0

The workaround we have right now is the following:

1- Delete the "guilty" sstable, in this case: /home/y/var/cassandra/data/system/hints/system-hints-ic-281*
2- Issue a major compaction for system/hints: nodetool compact system hints
3- Repeat for all the sstables producing this issue.

My biggest worry here is around the following message:

org.apache.cassandra.io.sstable.CorruptSSTableException: java.io.IOException: dataSize of 8224262783474088549 starting at 502360510 would be larger than file /home/y/var/cassandra/data/system/hints/system-hints-ic-281-Data.db length 504590769

Any clues on why this is happening?

Thanks,
FR
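P.S. As a sanity check on the dataSize number in that message, here is a minimal Python sketch that prints the reported value as the 8 big-endian bytes a Java readLong would have consumed. The helper name decode_size_as_bytes is made up for illustration and is not part of Cassandra. This assumes the size field is read as a big-endian signed 64-bit long, which is how Java's DataInput reads longs.

```python
import struct


def decode_size_as_bytes(value: int) -> bytes:
    """Return the 8 big-endian bytes of a signed 64-bit integer.

    Hypothetical helper for illustration only; it mirrors the byte
    order used by Java's DataInput.readLong().
    """
    return struct.pack(">q", value)


# The dataSize value copied from the CorruptSSTableException message.
reported_data_size = 8224262783474088549

raw = decode_size_as_bytes(reported_data_size)
print(raw.hex())  # the same bytes in hexadecimal
print(raw)        # printable ASCII shows up as characters
```

For the value in the log, the eight bytes are all printable ASCII (r"smtp:e), so the "size" looks like a fragment of text rather than a real length. That may mean the reader landed on row payload instead of a length field, but it does not by itself say how the file got into that state.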