I am once again having severe problems with my Cassandra cluster. This time,
I straight up cannot read sections of data (consistency level ONE). Client
side, I am seeing timeout exceptions. On the Cassandra node, I am seeing
errors as shown below. I don't understand what has happened or how to fix
it. I also don't understand how I am seeing errors on only one node, using
consistency level ONE with RF=2, and yet clients are failing. I have tried
turning on debug logging but that has been no help: the logs roll over (20 MB)
in < 10 seconds (the cluster is being used quite heavily).

My cluster had been working fine for weeks, then suddenly I had a corrupt
SSTable which caused me all sorts of grief (outlined in previous emails). I
was able to solve the problem by turning down the max compaction threshold
then figuring out which SSTable was corrupt by watching which minor
compactions failed. After that, I straight up deleted the on-disk data. Now
I am having problems on a different node (but adjacent in the ring) for what
I am almost certain is the same column family (presumably the same
row/column). At this point, the data is effectively lost as I know 1 of the
2 replicas was completely deleted.

Is there any advice going forward? My next course of action was going to be
exporting all of the sstables to JSON using the provided tool and trying to
look it over manually to see what the problem actually is (if exporting will
even work). I am not sure how useful this will be as there is nearly 80 GB
of data for this CF on a single node. What is more concerning is that I have
no idea how this problem initially popped up. I have performed hardware
tests and nothing seems to be malfunctioning. Furthermore, the fact that
these issues have 'jumped' nodes is a strong indication to me this is a
Cassandra problem.
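
A rough sketch of how the export step described above could be automated, so
that each SSTable is tried in turn and only the ones that fail are reported.
This is an illustration under assumptions, not a tested procedure: it assumes
the export tool is on the PATH as "sstable2json", that it accepts the path to
a -Data.db file as its only argument, and that it exits non-zero when it hits
data it cannot deserialize (worth verifying first). The function names and
the data directory in the usage note are hypothetical.

```python
import glob
import os
import subprocess


def run_export(data_file, tool="sstable2json"):
    """Run the export tool on one SSTable and return (ok, last stderr line).

    Assumption: the tool takes the Data.db path as its only argument and
    returns a non-zero exit code on failure. The JSON output is discarded
    because only success or failure matters here.
    """
    proc = subprocess.run(
        [tool, data_file],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.PIPE,
        text=True,
    )
    lines = proc.stderr.strip().splitlines()
    return proc.returncode == 0, (lines[-1] if lines else "")


def find_unreadable_sstables(cf_dir, export=run_export):
    """Try to export every *-Data.db file in cf_dir; return the failures.

    Each failure is a (path, message) tuple. The export callable can be
    swapped out, for example to wrap a different tool.
    """
    failures = []
    for data_file in sorted(glob.glob(os.path.join(cf_dir, "*-Data.db"))):
        ok, message = export(data_file)
        if not ok:
            failures.append((data_file, message))
    return failures
```

For example, find_unreadable_sstables("/var/lib/cassandra/data/MyKeyspace")
(a hypothetical path) would return the list of files the tool could not
export. With close to 80 GB in the CF this will take a while, but it avoids
having to read the JSON by hand.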

 

There is a Cassandra bug here somewhere, if only in the way corrupt columns
are dealt with.

db (85098417 bytes)

ERROR [ReadStage:221] 2011-01-29 12:42:39,153 DebuggableThreadPoolExecutor.java (line 103) Error in ThreadPoolExecutor
java.lang.RuntimeException: java.io.IOException: Invalid localDeleteTime read: -1516572672
        at org.apache.cassandra.db.columniterator.IndexedSliceReader.computeNext(IndexedSliceReader.java:124)
        at org.apache.cassandra.db.columniterator.IndexedSliceReader.computeNext(IndexedSliceReader.java:47)
        at com.google.common.collect.AbstractIterator.tryToComputeNext(AbstractIterator.java:136)
        at com.google.common.collect.AbstractIterator.hasNext(AbstractIterator.java:131)
        at org.apache.cassandra.db.columniterator.SSTableSliceIterator.hasNext(SSTableSliceIterator.java:108)
        at org.apache.commons.collections.iterators.CollatingIterator.set(CollatingIterator.java:283)
        at org.apache.commons.collections.iterators.CollatingIterator.least(CollatingIterator.java:326)
        at org.apache.commons.collections.iterators.CollatingIterator.next(CollatingIterator.java:230)
        at org.apache.cassandra.utils.ReducingIterator.computeNext(ReducingIterator.java:68)
        at com.google.common.collect.AbstractIterator.tryToComputeNext(AbstractIterator.java:136)
        at com.google.common.collect.AbstractIterator.hasNext(AbstractIterator.java:131)
        at org.apache.cassandra.db.filter.SliceQueryFilter.collectReducedColumns(SliceQueryFilter.java:118)
        at org.apache.cassandra.db.filter.QueryFilter.collectCollatedColumns(QueryFilter.java:142)
        at org.apache.cassandra.db.ColumnFamilyStore.getTopLevelColumns(ColumnFamilyStore.java:1230)
        at org.apache.cassandra.db.ColumnFamilyStore.getColumnFamily(ColumnFamilyStore.java:1107)
        at org.apache.cassandra.db.ColumnFamilyStore.getColumnFamily(ColumnFamilyStore.java:1077)
        at org.apache.cassandra.db.Table.getRow(Table.java:384)
        at org.apache.cassandra.db.SliceFromReadCommand.getRow(SliceFromReadCommand.java:63)
        at org.apache.cassandra.db.ReadVerbHandler.doVerb(ReadVerbHandler.java:68)
        at org.apache.cassandra.net.MessageDeliveryTask.run(MessageDeliveryTask.java:63)
        at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
        at java.lang.Thread.run(Thread.java:662)
Caused by: java.io.IOException: Invalid localDeleteTime read: -1516572672
        at org.apache.cassandra.db.SuperColumnSerializer.deserialize(SuperColumn.java:356)
        at org.apache.cassandra.db.SuperColumnSerializer.deserialize(SuperColumn.java:313)
        at org.apache.cassandra.db.columniterator.IndexedSliceReader$IndexedBlockFetcher.getNextBlock(IndexedSliceReader.java:180)
        at org.apache.cassandra.db.columniterator.IndexedSliceReader.computeNext(IndexedSliceReader.java:119)
        ... 22 more

ERROR [ReadStage:210] 2011-01-29 12:42:41,529 DebuggableThreadPoolExecutor.java (line 103) Error in ThreadPoolExecutor
java.lang.RuntimeException: org.apache.cassandra.db.ColumnSerializer$CorruptColumnException: invalid column name length 0
        at org.apache.cassandra.db.columniterator.IndexedSliceReader.computeNext(IndexedSliceReader.java:124)
        at org.apache.cassandra.db.columniterator.IndexedSliceReader.computeNext(IndexedSliceReader.java:47)
        at com.google.common.collect.AbstractIterator.tryToComputeNext(AbstractIterator.java:136)
        at com.google.common.collect.AbstractIterator.hasNext(AbstractIterator.java:131)
        at org.apache.cassandra.db.columniterator.SSTableSliceIterator.hasNext(SSTableSliceIterator.java:108)
        at org.apache.commons.collections.iterators.CollatingIterator.set(CollatingIterator.java:283)
        at org.apache.commons.collections.iterators.CollatingIterator.least(CollatingIterator.java:326)
        at org.apache.commons.collections.iterators.CollatingIterator.next(CollatingIterator.java:230)
        at org.apache.cassandra.utils.ReducingIterator.computeNext(ReducingIterator.java:68)
        at com.google.common.collect.AbstractIterator.tryToComputeNext(AbstractIterator.java:136)
        at com.google.common.collect.AbstractIterator.hasNext(AbstractIterator.java:131)
        at org.apache.cassandra.db.filter.SliceQueryFilter.collectReducedColumns(SliceQueryFilter.java:118)
        at org.apache.cassandra.db.filter.QueryFilter.collectCollatedColumns(QueryFilter.java:142)
        at org.apache.cassandra.db.ColumnFamilyStore.getTopLevelColumns(ColumnFamilyStore.java:1230)
        at org.apache.cassandra.db.ColumnFamilyStore.getColumnFamily(ColumnFamilyStore.java:1107)
        at org.apache.cassandra.db.ColumnFamilyStore.getColumnFamily(ColumnFamilyStore.java:1077)
        at org.apache.cassandra.db.Table.getRow(Table.java:384)
        at org.apache.cassandra.db.SliceFromReadCommand.getRow(SliceFromReadCommand.java:63)
        at org.apache.cassandra.db.ReadVerbHandler.doVerb(ReadVerbHandler.java:68)
        at org.apache.cassandra.net.MessageDeliveryTask.run(MessageDeliveryTask.java:63)
        at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
        at java.lang.Thread.run(Thread.java:662)
Caused by: org.apache.cassandra.db.ColumnSerializer$CorruptColumnException: invalid column name length 0
        at org.apache.cassandra.db.ColumnSerializer.deserialize(ColumnSerializer.java:68)
        at org.apache.cassandra.db.SuperColumnSerializer.deserialize(SuperColumn.java:364)
        at org.apache.cassandra.db.SuperColumnSerializer.deserialize(SuperColumn.java:313)
        at org.apache.cassandra.db.columniterator.IndexedSliceReader$IndexedBlockFetcher.getNextBlock(IndexedSliceReader.java:180)
        at org.apache.cassandra.db.columniterator.IndexedSliceReader.computeNext(IndexedSliceReader.java:119)
        ... 22 more

 

Dan Hendry

(403) 660-2297

 
