Dan,

Do you have any more information on this issue? Have you been able to discover anything from exporting your SSTables to JSON?

Thanks,
Ben

On 1/29/11 12:45 PM, Dan Hendry wrote:

I am once again having severe problems with my Cassandra cluster. This time, I straight up cannot read sections of data (consistency level ONE). Client side, I am seeing timeout exceptions. On the Cassandra node, I am seeing errors as shown below. I don't understand what has happened or how to fix it. I also don't understand how I am seeing errors on only one node, using consistency level ONE with RF=2, and yet clients are failing. I have tried turning on debug logging but that has been no help: the logs roll over (20 MB) in < 10 seconds (the cluster is being used quite heavily).

My cluster had been working fine for weeks, then suddenly I had a corrupt SSTable which caused me all sorts of grief (outlined in previous emails). I was able to solve the problem by turning down the max compaction threshold, then figuring out which SSTable was corrupt by watching which minor compactions failed. After that, I straight up deleted the on-disk data. Now I am having problems on a different node (but adjacent in the ring) for what I am almost certain is the same column family (presumably the same row/column). At this point, the data is effectively lost as I know 1 of the 2 replicas was completely deleted.

Is there any advice going forward? My next course of action was going to be exporting all of the sstables to JSON using the provided tool and trying to look it over manually to see what the problem actually is (if exporting will even work). I am not sure how useful this will be as there is nearly 80 GB of data for this CF on a single node. What is more concerning is that I have no idea how this problem initially popped up. I have performed hardware tests and nothing seems to be malfunctioning. Furthermore, the fact that these issues have 'jumped' nodes is a strong indication to me this is a Cassandra problem.
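With that much data, one way to narrow things down before reading any JSON by hand is to run the export once per data file and only record which files fail. The following is a minimal sketch of that idea, not a tested procedure: the function name find_failing_exports is hypothetical, the export command is passed in by the caller rather than hard-coded, and it assumes the export tool exits with a non-zero status when it hits a corrupt column, which I have not verified.

```python
import subprocess


def find_failing_exports(data_files, export_cmd):
    """Run export_cmd on each SSTable data file and report the ones that fail.

    export_cmd is the command prefix as a list (an assumption supplied by the
    caller, for example the path to the export script); each data file path is
    appended as the last argument. The exported JSON itself is discarded,
    since the goal here is only to find which files cannot be exported.
    """
    failing = []
    for path in data_files:
        result = subprocess.run(
            export_cmd + [path],
            stdout=subprocess.DEVNULL,   # do not keep the exported JSON
            stderr=subprocess.PIPE,
            text=True,
        )
        # Assumption: a failed export is signalled by a non-zero exit status.
        if result.returncode != 0:
            last_lines = result.stderr.strip().splitlines()[-1:]
            failing.append((path, last_lines[0] if last_lines else ""))
    return failing
```

Each entry in the returned list pairs a failing file with the last line the tool wrote to stderr, which should be enough to decide which files deserve a closer manual look.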

There is a Cassandra bug here somewhere, if only in the way corrupt columns are dealt with.

db (85098417 bytes)

ERROR [ReadStage:221] 2011-01-29 12:42:39,153 DebuggableThreadPoolExecutor.java (line 103) Error in ThreadPoolExecutor
java.lang.RuntimeException: java.io.IOException: Invalid localDeleteTime read: -1516572672
        at org.apache.cassandra.db.columniterator.IndexedSliceReader.computeNext(IndexedSliceReader.java:124)
        at org.apache.cassandra.db.columniterator.IndexedSliceReader.computeNext(IndexedSliceReader.java:47)
        at com.google.common.collect.AbstractIterator.tryToComputeNext(AbstractIterator.java:136)
        at com.google.common.collect.AbstractIterator.hasNext(AbstractIterator.java:131)
        at org.apache.cassandra.db.columniterator.SSTableSliceIterator.hasNext(SSTableSliceIterator.java:108)
        at org.apache.commons.collections.iterators.CollatingIterator.set(CollatingIterator.java:283)
        at org.apache.commons.collections.iterators.CollatingIterator.least(CollatingIterator.java:326)
        at org.apache.commons.collections.iterators.CollatingIterator.next(CollatingIterator.java:230)
        at org.apache.cassandra.utils.ReducingIterator.computeNext(ReducingIterator.java:68)
        at com.google.common.collect.AbstractIterator.tryToComputeNext(AbstractIterator.java:136)
        at com.google.common.collect.AbstractIterator.hasNext(AbstractIterator.java:131)
        at org.apache.cassandra.db.filter.SliceQueryFilter.collectReducedColumns(SliceQueryFilter.java:118)
        at org.apache.cassandra.db.filter.QueryFilter.collectCollatedColumns(QueryFilter.java:142)
        at org.apache.cassandra.db.ColumnFamilyStore.getTopLevelColumns(ColumnFamilyStore.java:1230)
        at org.apache.cassandra.db.ColumnFamilyStore.getColumnFamily(ColumnFamilyStore.java:1107)
        at org.apache.cassandra.db.ColumnFamilyStore.getColumnFamily(ColumnFamilyStore.java:1077)
        at org.apache.cassandra.db.Table.getRow(Table.java:384)
        at org.apache.cassandra.db.SliceFromReadCommand.getRow(SliceFromReadCommand.java:63)
        at org.apache.cassandra.db.ReadVerbHandler.doVerb(ReadVerbHandler.java:68)
        at org.apache.cassandra.net.MessageDeliveryTask.run(MessageDeliveryTask.java:63)
        at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
        at java.lang.Thread.run(Thread.java:662)
Caused by: java.io.IOException: Invalid localDeleteTime read: -1516572672
        at org.apache.cassandra.db.SuperColumnSerializer.deserialize(SuperColumn.java:356)
        at org.apache.cassandra.db.SuperColumnSerializer.deserialize(SuperColumn.java:313)
        at org.apache.cassandra.db.columniterator.IndexedSliceReader$IndexedBlockFetcher.getNextBlock(IndexedSliceReader.java:180)
        at org.apache.cassandra.db.columniterator.IndexedSliceReader.computeNext(IndexedSliceReader.java:119)
        ... 22 more

ERROR [ReadStage:210] 2011-01-29 12:42:41,529 DebuggableThreadPoolExecutor.java (line 103) Error in ThreadPoolExecutor
java.lang.RuntimeException: org.apache.cassandra.db.ColumnSerializer$CorruptColumnException: invalid column name length 0
        at org.apache.cassandra.db.columniterator.IndexedSliceReader.computeNext(IndexedSliceReader.java:124)
        at org.apache.cassandra.db.columniterator.IndexedSliceReader.computeNext(IndexedSliceReader.java:47)
        at com.google.common.collect.AbstractIterator.tryToComputeNext(AbstractIterator.java:136)
        at com.google.common.collect.AbstractIterator.hasNext(AbstractIterator.java:131)
        at org.apache.cassandra.db.columniterator.SSTableSliceIterator.hasNext(SSTableSliceIterator.java:108)
        at org.apache.commons.collections.iterators.CollatingIterator.set(CollatingIterator.java:283)
        at org.apache.commons.collections.iterators.CollatingIterator.least(CollatingIterator.java:326)
        at org.apache.commons.collections.iterators.CollatingIterator.next(CollatingIterator.java:230)
        at org.apache.cassandra.utils.ReducingIterator.computeNext(ReducingIterator.java:68)
        at com.google.common.collect.AbstractIterator.tryToComputeNext(AbstractIterator.java:136)
        at com.google.common.collect.AbstractIterator.hasNext(AbstractIterator.java:131)
        at org.apache.cassandra.db.filter.SliceQueryFilter.collectReducedColumns(SliceQueryFilter.java:118)
        at org.apache.cassandra.db.filter.QueryFilter.collectCollatedColumns(QueryFilter.java:142)
        at org.apache.cassandra.db.ColumnFamilyStore.getTopLevelColumns(ColumnFamilyStore.java:1230)
        at org.apache.cassandra.db.ColumnFamilyStore.getColumnFamily(ColumnFamilyStore.java:1107)
        at org.apache.cassandra.db.ColumnFamilyStore.getColumnFamily(ColumnFamilyStore.java:1077)
        at org.apache.cassandra.db.Table.getRow(Table.java:384)
        at org.apache.cassandra.db.SliceFromReadCommand.getRow(SliceFromReadCommand.java:63)
        at org.apache.cassandra.db.ReadVerbHandler.doVerb(ReadVerbHandler.java:68)
        at org.apache.cassandra.net.MessageDeliveryTask.run(MessageDeliveryTask.java:63)
        at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
        at java.lang.Thread.run(Thread.java:662)
Caused by: org.apache.cassandra.db.ColumnSerializer$CorruptColumnException: invalid column name length 0
        at org.apache.cassandra.db.ColumnSerializer.deserialize(ColumnSerializer.java:68)
        at org.apache.cassandra.db.SuperColumnSerializer.deserialize(SuperColumn.java:364)
        at org.apache.cassandra.db.SuperColumnSerializer.deserialize(SuperColumn.java:313)
        at org.apache.cassandra.db.columniterator.IndexedSliceReader$IndexedBlockFetcher.getNextBlock(IndexedSliceReader.java:180)
        at org.apache.cassandra.db.columniterator.IndexedSliceReader.computeNext(IndexedSliceReader.java:119)
        ... 22 more
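Since the logs roll over within seconds, it may help to reduce each log file to a count of distinct root causes rather than reading traces individually. This is a rough sketch that relies only on the "Caused by:" line format visible in the traces above; the function name tally_root_causes is hypothetical, and replacing numbers with N is just a grouping choice so that messages differing only in the value read are counted together.

```python
import re
from collections import Counter

# Matches lines such as:
#   Caused by: java.io.IOException: Invalid localDeleteTime read: -1516572672
CAUSED_BY = re.compile(r"^Caused by: (?P<exc>[\w.$]+): (?P<msg>.*)$")


def tally_root_causes(lines):
    """Count 'Caused by:' entries, grouped by exception class and message."""
    counts = Counter()
    for line in lines:
        match = CAUSED_BY.match(line.strip())
        if match:
            # Replace numbers with N so messages that differ only in the
            # value read (e.g. the bad localDeleteTime) fall in one group.
            message = re.sub(r"-?\d+", "N", match.group("msg"))
            counts[(match.group("exc"), message)] += 1
    return counts
```

Passing an open log file to tally_root_causes (a file object iterates line by line) gives one counter per log segment, which can be compared across segments to see whether more than the two corruption messages above ever appear.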

Dan Hendry

(403) 660-2297

