Hi, we had some hard-disk issues this week, which caused some datafiles to get corrupt, which was reported by the compaction. My approach to fix this was to delete the corrupted files and run repair. That sounded easy at first, but unfortunetaly C* 1.1.11 sometimes does not show which datafile is causing the exception.
How do you handle such cases? Do you delete the entire CF or do you look up the compaction-started message and delete the files being involved? In my opinion the Stacktrace should always show the filename of the file which could not be read. Does anybody know if there were already changes to the logging since 1.1.11? CASSANDRA-2261<https://issues.apache.org/jira/browse/CASSANDRA-2261>does not seem to have fixed the Exceptionhandling part. Were there perhaps changes in 1.2 with the new disk-failure handling? cheers, Christian PS: Here are some examples I found in my logs: *Bad behaviour:* ERROR [ValidationExecutor:1] 2013-05-29 13:26:09,121 AbstractCassandraDaemon.java (line 132) Exception in thread Thread[ValidationExecutor:1,1,main] java.io.IOError: java.io.IOException: FAILED_TO_UNCOMPRESS(5) at org.apache.cassandra.db.compaction.PrecompactedRow.merge(PrecompactedRow.java:116) at org.apache.cassandra.db.compaction.PrecompactedRow.<init>(PrecompactedRow.java:99) at org.apache.cassandra.db.compaction.CompactionController.getCompactedRow(CompactionController.java:176) at org.apache.cassandra.db.compaction.CompactionIterable$Reducer.getReduced(CompactionIterable.java:83) at org.apache.cassandra.db.compaction.CompactionIterable$Reducer.getReduced(CompactionIterable.java:68) at org.apache.cassandra.utils.MergeIterator$ManyToOne.consume(MergeIterator.java:118) at org.apache.cassandra.utils.MergeIterator$ManyToOne.computeNext(MergeIterator.java:101) at com.google.common.collect.AbstractIterator.tryToComputeNext(AbstractIterator.java:140) at com.google.common.collect.AbstractIterator.hasNext(AbstractIterator.java:135) at com.google.common.collect.Iterators$7.computeNext(Iterators.java:614) at com.google.common.collect.AbstractIterator.tryToComputeNext(AbstractIterator.java:140) at com.google.common.collect.AbstractIterator.hasNext(AbstractIterator.java:135) at org.apache.cassandra.db.compaction.CompactionManager.doValidationCompaction(CompactionManager.java:726) at org.apache.cassandra.db.compaction.CompactionManager.access$600(CompactionManager.java:69) at org.apache.cassandra.db.compaction.CompactionManager$9.call(CompactionManager.java:457) at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303) at java.util.concurrent.FutureTask.run(FutureTask.java:138) at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908) at java.lang.Thread.run(Thread.java:662) Caused by: java.io.IOException: FAILED_TO_UNCOMPRESS(5) at org.xerial.snappy.SnappyNative.throw_error(SnappyNative.java:78) at org.xerial.snappy.SnappyNative.rawUncompress(Native Method) at org.xerial.snappy.Snappy.rawUncompress(Snappy.java:391) at org.apache.cassandra.io.compress.SnappyCompressor.uncompress(SnappyCompressor.java:94) at org.apache.cassandra.io.compress.CompressedRandomAccessReader.decompressChunk(CompressedRandomAccessReader.java:90) at org.apache.cassandra.io.compress.CompressedRandomAccessReader.reBuffer(CompressedRandomAccessReader.java:71) at org.apache.cassandra.io.util.RandomAccessReader.read(RandomAccessReader.java:302) at java.io.RandomAccessFile.readFully(RandomAccessFile.java:397) at java.io.RandomAccessFile.readFully(RandomAccessFile.java:377) at org.apache.cassandra.utils.BytesReadTracker.readFully(BytesReadTracker.java:95) at org.apache.cassandra.utils.ByteBufferUtil.read(ByteBufferUtil.java:401) at org.apache.cassandra.utils.ByteBufferUtil.readWithLength(ByteBufferUtil.java:363) at org.apache.cassandra.db.ColumnSerializer.deserialize(ColumnSerializer.java:114) at org.apache.cassandra.db.ColumnSerializer.deserialize(ColumnSerializer.java:37) at org.apache.cassandra.db.ColumnFamilySerializer.deserializeColumns(ColumnFamilySerializer.java:144) at org.apache.cassandra.io.sstable.SSTableIdentityIterator.getColumnFamilyWithColumns(SSTableIdentityIterator.java:234) at org.apache.cassandra.db.compaction.PrecompactedRow.merge(PrecompactedRow.java:112) ... 19 more *Also bad behaviour:* ERROR [CompactionExecutor:1] 2013-05-29 13:12:58,896 AbstractCassandraDaemon.java (line 132) Exception in thread Thread[CompactionExecutor:1,1,main] java.io.IOError: java.io.IOException: java.util.zip.DataFormatException: incomplete dynamic bit lengths tree at org.apache.cassandra.db.compaction.PrecompactedRow.merge(PrecompactedRow.java:116) at org.apache.cassandra.db.compaction.PrecompactedRow.<init>(PrecompactedRow.java:99) at org.apache.cassandra.db.compaction.CompactionController.getCompactedRow(CompactionController.java:176) at org.apache.cassandra.db.compaction.CompactionIterable$Reducer.getReduced(CompactionIterable.java:83) at org.apache.cassandra.db.compaction.CompactionIterable$Reducer.getReduced(CompactionIterable.java:68) at org.apache.cassandra.utils.MergeIterator$ManyToOne.consume(MergeIterator.java:118) at org.apache.cassandra.utils.MergeIterator$ManyToOne.computeNext(MergeIterator.java:101) at com.google.common.collect.AbstractIterator.tryToComputeNext(AbstractIterator.java:140) at com.google.common.collect.AbstractIterator.hasNext(AbstractIterator.java:135) at com.google.common.collect.Iterators$7.computeNext(Iterators.java:614) at com.google.common.collect.AbstractIterator.tryToComputeNext(AbstractIterator.java:140) at com.google.common.collect.AbstractIterator.hasNext(AbstractIterator.java:135) at org.apache.cassandra.db.compaction.CompactionTask.execute(CompactionTask.java:174) at org.apache.cassandra.db.compaction.CompactionManager$2.runMayThrow(CompactionManager.java:164) at org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:30) at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:441) at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303) at java.util.concurrent.FutureTask.run(FutureTask.java:138) at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908) at java.lang.Thread.run(Thread.java:662) Caused by: java.io.IOException: java.util.zip.DataFormatException: incomplete dynamic bit lengths tree at org.apache.cassandra.io.compress.DeflateCompressor.uncompress(DeflateCompressor.java:114) at org.apache.cassandra.io.compress.CompressedRandomAccessReader.decompressChunk(CompressedRandomAccessReader.java:90) at org.apache.cassandra.io.compress.CompressedRandomAccessReader.reBuffer(CompressedRandomAccessReader.java:71) at org.apache.cassandra.io.util.RandomAccessReader.read(RandomAccessReader.java:302) at java.io.RandomAccessFile.readFully(RandomAccessFile.java:397) at java.io.RandomAccessFile.readFully(RandomAccessFile.java:377) at org.apache.cassandra.utils.BytesReadTracker.readFully(BytesReadTracker.java:95) at org.apache.cassandra.utils.ByteBufferUtil.read(ByteBufferUtil.java:401) at org.apache.cassandra.utils.ByteBufferUtil.readWithLength(ByteBufferUtil.java:363) at org.apache.cassandra.db.ColumnSerializer.deserialize(ColumnSerializer.java:114) at org.apache.cassandra.db.ColumnSerializer.deserialize(ColumnSerializer.java:37) at org.apache.cassandra.db.ColumnFamilySerializer.deserializeColumns(ColumnFamilySerializer.java:144) at org.apache.cassandra.io.sstable.SSTableIdentityIterator.getColumnFamilyWithColumns(SSTableIdentityIterator.java:234) at org.apache.cassandra.db.compaction.PrecompactedRow.merge(PrecompactedRow.java:112) ... 20 more Caused by: java.util.zip.DataFormatException: incomplete dynamic bit lengths tree at java.util.zip.Inflater.inflateBytes(Native Method) at java.util.zip.Inflater.inflate(Inflater.java:238) at org.apache.cassandra.io.compress.DeflateCompressor.uncompress(DeflateCompressor.java:110) ... 33 more *This corruption is logging it correctly:* ERROR [CompactionExecutor:3095] 2013-05-27 08:25:06,777 AbstractCassandraDaemon.java (line 132) Exception in thread Thread[CompactionExecutor:3095,1,main] java.io.IOError: org.apache.cassandra.io.compress.CorruptedBlockException: (/var/lib/cassandra/data/Monitoring/cfLargeData/Monitoring-cfLargeData-hf-21873-Data.db): corruption detected, chunk at 977039 of length 237504. at org.apache.cassandra.db.compaction.PrecompactedRow.merge(PrecompactedRow.java:116) at org.apache.cassandra.db.compaction.PrecompactedRow.<init>(PrecompactedRow.java:99) at org.apache.cassandra.db.compaction.CompactionController.getCompactedRow(CompactionController.java:176) at org.apache.cassandra.db.compaction.CompactionIterable$Reducer.getReduced(CompactionIterable.java:83) at org.apache.cassandra.db.compaction.CompactionIterable$Reducer.getReduced(CompactionIterable.java:68) at org.apache.cassandra.utils.MergeIterator$ManyToOne.consume(MergeIterator.java:118) at org.apache.cassandra.utils.MergeIterator$ManyToOne.computeNext(MergeIterator.java:101) at com.google.common.collect.AbstractIterator.tryToComputeNext(AbstractIterator.java:140) at com.google.common.collect.AbstractIterator.hasNext(AbstractIterator.java:135) at com.google.common.collect.Iterators$7.computeNext(Iterators.java:614) at com.google.common.collect.AbstractIterator.tryToComputeNext(AbstractIterator.java:140) at com.google.common.collect.AbstractIterator.hasNext(AbstractIterator.java:135) at org.apache.cassandra.db.compaction.CompactionTask.execute(CompactionTask.java:174) at org.apache.cassandra.db.compaction.CompactionManager$2.runMayThrow(CompactionManager.java:164) at org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:30) at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:441) at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303) at java.util.concurrent.FutureTask.run(FutureTask.java:138) at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908) at java.lang.Thread.run(Thread.java:662) Caused by: org.apache.cassandra.io.compress.CorruptedBlockException: (/var/lib/cassandra/data/Monitoring/cfLargeData/Monitoring-cfLargeData-hf-21873-Data.db): corruption detected, chunk at 977039 of length 237504. at org.apache.cassandra.io.compress.CompressedRandomAccessReader.decompressChunk(CompressedRandomAccessReader.java:97) at org.apache.cassandra.io.compress.CompressedRandomAccessReader.reBuffer(CompressedRandomAccessReader.java:71) at org.apache.cassandra.io.util.RandomAccessReader.read(RandomAccessReader.java:302) at java.io.RandomAccessFile.readFully(RandomAccessFile.java:397) at java.io.RandomAccessFile.readFully(RandomAccessFile.java:377) at org.apache.cassandra.utils.BytesReadTracker.readFully(BytesReadTracker.java:95) at org.apache.cassandra.utils.ByteBufferUtil.read(ByteBufferUtil.java:401) at org.apache.cassandra.utils.ByteBufferUtil.readWithLength(ByteBufferUtil.java:363) at org.apache.cassandra.db.ColumnSerializer.deserialize(ColumnSerializer.java:114) at org.apache.cassandra.db.ColumnSerializer.deserialize(ColumnSerializer.java:37) at org.apache.cassandra.db.ColumnFamilySerializer.deserializeColumns(ColumnFamilySerializer.java:144) at org.apache.cassandra.io.sstable.SSTableIdentityIterator.getColumnFamilyWithColumns(SSTableIdentityIterator.java:234) at org.apache.cassandra.db.compaction.PrecompactedRow.merge(PrecompactedRow.java:112) ... 20 more