> bleeding edge code you are running (did you try rc1?) or you do have nodes
> on different versions

All nodes are running code from
https://svn.apache.org/repos/asf/cassandra/branches/cassandra-0.7, which I
thought was essentially RC1 plus fixes, but I will give the actual release a
try.

> you have a hardware problem

Hard to say, but I don’t think so; everything else seems to be working fine.
I will try to run some diagnostics on the two nodes that seem to be acting
up.


Now for some new developments; the plot thickens. I am fairly sure there is
a corrupt ColumnFamily/SSTable. After a restart, two adjacent nodes both
show the following error, after which the CompactionManager's pending task
count never returns to zero. I am fairly sure this CF is not getting
compacted, but compaction for other column families seems to continue. To
try to get rid of these errors I performed a truncate using the CLI, but
afterwards I still get the same IndexOutOfBoundsException. Can I just shut
down the node (draining first) and delete all data files related to this
column family on the two problematic nodes? The data they contain is
reasonably unimportant and I don’t mind losing it.


ERROR [CompactionExecutor:1] 2010-12-06 05:07:56,736 AbstractCassandraDaemon.java (line 90) Fatal exception in thread Thread[CompactionExecutor:1,1,main]
java.lang.IndexOutOfBoundsException
        at java.nio.Buffer.checkIndex(Buffer.java:520)
        at java.nio.HeapByteBuffer.getInt(HeapByteBuffer.java:340)
        at org.apache.cassandra.db.DeletedColumn.getLocalDeletionTime(DeletedColumn.java:57)
        at org.apache.cassandra.db.ColumnFamilyStore.removeDeletedSuper(ColumnFamilyStore.java:818)
        at org.apache.cassandra.db.ColumnFamilyStore.removeDeletedColumnsOnly(ColumnFamilyStore.java:781)
        at org.apache.cassandra.db.ColumnFamilyStore.removeDeleted(ColumnFamilyStore.java:774)
        at org.apache.cassandra.io.PrecompactedRow.<init>(PrecompactedRow.java:93)
        at org.apache.cassandra.io.CompactionIterator.getCompactedRow(CompactionIterator.java:138)
        at org.apache.cassandra.io.CompactionIterator.getReduced(CompactionIterator.java:107)
        at org.apache.cassandra.io.CompactionIterator.getReduced(CompactionIterator.java:42)
        at org.apache.cassandra.utils.ReducingIterator.computeNext(ReducingIterator.java:73)
        at com.google.common.collect.AbstractIterator.tryToComputeNext(AbstractIterator.java:136)
        at com.google.common.collect.AbstractIterator.hasNext(AbstractIterator.java:131)
        at org.apache.commons.collections.iterators.FilterIterator.setNextObject(FilterIterator.java:183)
        at org.apache.commons.collections.iterators.FilterIterator.hasNext(FilterIterator.java:94)
        at org.apache.cassandra.db.CompactionManager.doCompaction(CompactionManager.java:321)
        at org.apache.cassandra.db.CompactionManager$1.call(CompactionManager.java:124)
        at org.apache.cassandra.db.CompactionManager$1.call(CompactionManager.java:97)
        at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
        at java.util.concurrent.FutureTask.run(FutureTask.java:138)
        at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
        at java.lang.Thread.run(Thread.java:662)
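For reference, the top frames show an absolute HeapByteBuffer.getInt failing its bounds check inside DeletedColumn.getLocalDeletionTime. A minimal sketch, assuming a tombstone's value is meant to hold a single 4-byte int (the local deletion time) read with an absolute getInt: any value shorter than 4 bytes, such as a corrupt SSTable could yield, fails in exactly this way. The class and method names are hypothetical; this is not Cassandra's code.

```java
import java.nio.ByteBuffer;

// Hypothetical illustration, not Cassandra's DeletedColumn implementation.
public class TombstoneDemo {
    // Reads a 4-byte local deletion time with an absolute getInt, the same
    // kind of call that fails in the trace. Throws IndexOutOfBoundsException
    // when fewer than 4 bytes are available from the buffer's position.
    static int readLocalDeletionTime(ByteBuffer value) {
        return value.getInt(value.position());
    }

    // Returns true if reading a value of the given size fails the bounds check.
    static boolean failsForSize(int size) {
        try {
            readLocalDeletionTime(ByteBuffer.allocate(size));
            return false;
        } catch (IndexOutOfBoundsException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        ByteBuffer good = ByteBuffer.allocate(4);
        good.putInt(0, 1291612076);                      // a seconds-since-epoch style value
        System.out.println(readLocalDeletionTime(good)); // 1291612076
        System.out.println(failsForSize(2));             // true: value too short
    }
}
```

Note that only the size check matters here: a 4-byte value containing garbage would be read without complaint, so this failure specifically points at a value of the wrong length.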

Dan

-----Original Message-----
From: Jonathan Ellis [mailto:jbel...@gmail.com] 
Sent: December-04-10 22:45
To: user
Subject: Re: Various exceptions on 0.7

At least one of your nodes is sending garbage to the others.

Either there's a bug in the bleeding edge code you are running (did
you try rc1?) or you do have nodes on different versions or you have a
hardware problem.
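To make "sending garbage" concrete: serialized mutations are decoded by reading length prefixes and then that many bytes, so a truncated or corrupted stream surfaces as errors like the EOFException from DataInputStream.readFully or the "invalid column name length 0" and "Invalid localDeleteTime read: 0" messages quoted below. The sketch assumes a hypothetical, simplified record layout (2-byte name length, name bytes, 4-byte local delete time); it is not Cassandra's actual wire format, and its validation rules are illustrative only.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.EOFException;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical, simplified record layout (NOT Cassandra's wire format):
//   unsigned short nameLength | name bytes | int localDeleteTime
public class GarbageDecodeDemo {
    static byte[] encode(byte[] name, int localDeleteTime) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            DataOutputStream out = new DataOutputStream(bytes);
            out.writeShort(name.length);
            out.write(name);
            out.writeInt(localDeleteTime);
            out.flush();
            return bytes.toByteArray();
        } catch (IOException e) {
            // Writing to an in-memory stream should not fail.
            throw new UncheckedIOException(e);
        }
    }

    // Decodes one record, validating it the way a defensive reader might.
    static String decode(byte[] data) throws IOException {
        DataInputStream in = new DataInputStream(new ByteArrayInputStream(data));
        int nameLength = in.readUnsignedShort();
        if (nameLength == 0) // cf. "invalid column name length 0"
            throw new IOException("invalid column name length 0");
        byte[] name = new byte[nameLength];
        in.readFully(name); // throws EOFException if fewer bytes arrived than promised
        int localDeleteTime = in.readInt();
        if (localDeleteTime <= 0) // this sketch treats non-positive times as corrupt
            throw new IOException("Invalid localDeleteTime read: " + localDeleteTime);
        return new String(name, StandardCharsets.UTF_8);
    }

    // Summarizes what happens when decoding the given bytes.
    static String outcome(byte[] data) {
        try {
            decode(data);
            return "ok";
        } catch (EOFException e) {
            return "EOFException";
        } catch (IOException e) {
            return e.getMessage();
        }
    }

    public static void main(String[] args) {
        byte[] good = encode("col".getBytes(StandardCharsets.UTF_8), 1291612076);
        System.out.println(outcome(good));                    // ok
        System.out.println(outcome(Arrays.copyOf(good, 4)));  // EOFException
        System.out.println(outcome(encode(new byte[0], 1291612076)));
        System.out.println(outcome(encode("col".getBytes(StandardCharsets.UTF_8), 0)));
    }
}
```

The point of the sketch is that the reader cannot tell a buggy sender, a version mismatch, or flipped bits apart: all three just look like bytes that violate the expected layout.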

On Sat, Dec 4, 2010 at 5:51 PM, Dan Hendry <dan.hendry.j...@gmail.com>
wrote:
> Here are two other errors which appear frequently:
> ERROR [MutationStage:29] 2010-12-04 17:47:46,931 RowMutationVerbHandler.java (line 83) Error in row mutation
> java.io.IOException: Invalid localDeleteTime read: 0
>         at org.apache.cassandra.db.SuperColumnSerializer.deserialize(SuperColumn.java:355)
>         at org.apache.cassandra.db.SuperColumnSerializer.deserialize(SuperColumn.java:312)
>         at org.apache.cassandra.db.ColumnFamilySerializer.deserializeColumns(ColumnFamilySerializer.java:129)
>         at org.apache.cassandra.db.ColumnFamilySerializer.deserialize(ColumnFamilySerializer.java:120)
>         at org.apache.cassandra.db.RowMutationSerializer.defreezeTheMaps(RowMutation.java:383)
>         at org.apache.cassandra.db.RowMutationSerializer.deserialize(RowMutation.java:393)
>         at org.apache.cassandra.db.RowMutationSerializer.deserialize(RowMutation.java:351)
>         at org.apache.cassandra.db.RowMutationVerbHandler.doVerb(RowMutationVerbHandler.java:52)
>         at org.apache.cassandra.net.MessageDeliveryTask.run(MessageDeliveryTask.java:63)
>         at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
>         at java.lang.Thread.run(Thread.java:662)
>
> ERROR [MutationStage:15] 2010-12-04 17:48:33,216 RowMutationVerbHandler.java (line 83) Error in row mutation
> org.apache.cassandra.db.ColumnSerializer$CorruptColumnException: invalid column name length 0
>         at org.apache.cassandra.db.ColumnSerializer.deserialize(ColumnSerializer.java:68)
>         at org.apache.cassandra.db.SuperColumnSerializer.deserialize(SuperColumn.java:363)
>         at org.apache.cassandra.db.SuperColumnSerializer.deserialize(SuperColumn.java:312)
>         at org.apache.cassandra.db.ColumnFamilySerializer.deserializeColumns(ColumnFamilySerializer.java:129)
>         at org.apache.cassandra.db.ColumnFamilySerializer.deserialize(ColumnFamilySerializer.java:120)
>         at org.apache.cassandra.db.RowMutationSerializer.defreezeTheMaps(RowMutation.java:383)
>         at org.apache.cassandra.db.RowMutationSerializer.deserialize(RowMutation.java:393)
>         at org.apache.cassandra.db.RowMutationSerializer.deserialize(RowMutation.java:351)
>         at org.apache.cassandra.db.RowMutationVerbHandler.doVerb(RowMutationVerbHandler.java:52)
>         at org.apache.cassandra.net.MessageDeliveryTask.run(MessageDeliveryTask.java:63)
>         at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
>         at java.lang.Thread.run(Thread.java:662)
>
> On Sat, Dec 4, 2010 at 6:29 PM, Dan Hendry <dan.hendry.j...@gmail.com>
> wrote:
>>
>> No, all nodes are running very recent (less than 2 days old) code out of
>> the 0.7 branch. This cluster has always had 0.7 RC1 (or later) code running on it.
>>
>> On Sat, Dec 4, 2010 at 6:24 PM, Jonathan Ellis <jbel...@gmail.com> wrote:
>>>
>>> Are you mixing different Cassandra versions?
>>>
>>> On Sat, Dec 4, 2010 at 4:58 PM, Dan Hendry <dan.hendry.j...@gmail.com>
>>> wrote:
>>> > To be clear, I had to interrupt a cleanup operation earlier in the day by
>>> > killing the cassandra process. Now the node works for a while, but it
>>> > continually logs the "Error in row mutation" errors and then eventually
>>> > logs a "Fatal exception in thread" error. After that, the process stays
>>> > alive, but there seem to be problems reading from the node. At the very
>>> > least, read performance is massively degraded.
>>> >
>>> > On Sat, Dec 4, 2010 at 5:52 PM, Dan Hendry <dan.hendry.j...@gmail.com>
>>> > wrote:
>>> >>
>>> >> One of my Cassandra nodes is giving me a number of errors and then
>>> >> effectively dying. I think it was somehow caused by interrupting a
>>> >> nodetool cleanup operation. I am running a recent 0.7 build out of svn.
>>> >> ERROR [MutationStage:26] 2010-12-04 16:23:04,395 RowMutationVerbHandler.java (line 83) Error in row mutation
>>> >> java.io.EOFException
>>> >>         at java.io.DataInputStream.readFully(DataInputStream.java:180)
>>> >>         at org.apache.cassandra.utils.FBUtilities.readByteArray(FBUtilities.java:264)
>>> >>         at org.apache.cassandra.db.ColumnSerializer.deserialize(ColumnSerializer.java:76)
>>> >>         at org.apache.cassandra.db.SuperColumnSerializer.deserialize(SuperColumn.java:363)
>>> >>         at org.apache.cassandra.db.SuperColumnSerializer.deserialize(SuperColumn.java:312)
>>> >>         at org.apache.cassandra.db.ColumnFamilySerializer.deserializeColumns(ColumnFamilySerializer.java:129)
>>> >>         at org.apache.cassandra.db.ColumnFamilySerializer.deserialize(ColumnFamilySerializer.java:120)
>>> >>         at org.apache.cassandra.db.RowMutationSerializer.defreezeTheMaps(RowMutation.java:383)
>>> >>         at org.apache.cassandra.db.RowMutationSerializer.deserialize(RowMutation.java:393)
>>> >>         at org.apache.cassandra.db.RowMutationSerializer.deserialize(RowMutation.java:351)
>>> >>         at org.apache.cassandra.db.RowMutationVerbHandler.doVerb(RowMutationVerbHandler.java:52)
>>> >>         at org.apache.cassandra.net.MessageDeliveryTask.run(MessageDeliveryTask.java:63)
>>> >>         at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
>>> >>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
>>> >>         at java.lang.Thread.run(Thread.java:662)
>>> >> ERROR [MutationStage:13] 2010-12-04 16:25:04,061 RowMutationVerbHandler.java (line 83) Error in row mutation
>>> >> org.apache.cassandra.db.UnserializableColumnFamilyException: Couldn't find cfId=524288
>>> >>         at org.apache.cassandra.db.ColumnFamilySerializer.deserialize(ColumnFamilySerializer.java:117)
>>> >>         at org.apache.cassandra.db.RowMutationSerializer.defreezeTheMaps(RowMutation.java:383)
>>> >>         at org.apache.cassandra.db.RowMutationSerializer.deserialize(RowMutation.java:393)
>>> >>         at org.apache.cassandra.db.RowMutationSerializer.deserialize(RowMutation.java:351)
>>> >>         at org.apache.cassandra.db.RowMutationVerbHandler.doVerb(RowMutationVerbHandler.java:52)
>>> >>         at org.apache.cassandra.net.MessageDeliveryTask.run(MessageDeliveryTask.java:63)
>>> >>         at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
>>> >>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
>>> >>         at java.lang.Thread.run(Thread.java:662)
>>> >> ERROR [MutationStage:20] 2010-12-04 16:25:25,216 DebuggableThreadPoolExecutor.java (line 103) Error in ThreadPoolExecutor
>>> >> java.lang.NullPointerException
>>> >>         at org.apache.cassandra.db.Table.apply(Table.java:398)
>>> >>         at org.apache.cassandra.db.RowMutationVerbHandler.doVerb(RowMutationVerbHandler.java:73)
>>> >>         at org.apache.cassandra.net.MessageDeliveryTask.run(MessageDeliveryTask.java:63)
>>> >>         at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
>>> >>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
>>> >>         at java.lang.Thread.run(Thread.java:662)
>>> >> ERROR [MutationStage:20] 2010-12-04 16:25:25,216 AbstractCassandraDaemon.java (line 90) Fatal exception in thread Thread[MutationStage:20,5,main]
>>> >> java.lang.NullPointerException
>>> >>         at org.apache.cassandra.db.Table.apply(Table.java:398)
>>> >>         at org.apache.cassandra.db.RowMutationVerbHandler.doVerb(RowMutationVerbHandler.java:73)
>>> >>         at org.apache.cassandra.net.MessageDeliveryTask.run(MessageDeliveryTask.java:63)
>>> >>         at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
>>> >>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
>>> >>         at java.lang.Thread.run(Thread.java:662)
>>> >> ERROR [COMMIT-LOG-WRITER] 2010-12-04 16:25:25,216 AbstractCassandraDaemon.java (line 90) Fatal exception in thread Thread[COMMIT-LOG-WRITER,5,main]
>>> >> java.lang.RuntimeException: java.lang.NullPointerException
>>> >>         at org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:34)
>>> >>         at java.lang.Thread.run(Thread.java:662)
>>> >> Caused by: java.lang.NullPointerException
>>> >>         at org.apache.cassandra.db.commitlog.CommitLogSegment.write(CommitLogSegment.java:92)
>>> >>         at org.apache.cassandra.db.commitlog.CommitLog$LogRecordAdder.run(CommitLog.java:509)
>>> >>         at org.apache.cassandra.db.commitlog.PeriodicCommitLogExecutorService$1.runMayThrow(PeriodicCommitLogExecutorService.java:52)
>>> >>         at org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:30)
>>> >>         ... 1 more
>>> >
>>> >
>>>
>>>
>>>
>>> --
>>> Jonathan Ellis
>>> Project Chair, Apache Cassandra
>>> co-founder of Riptano, the source for professional Cassandra support
>>> http://riptano.com
>>
>
>
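The "Couldn't find cfId=524288" error quoted above is at least consistent with garbage input: 524288 is 0x00080000. One generic way such a value appears is reading a big-endian int from a byte offset that is off by one. This is a hypothetical illustration only; the byte values are invented, not taken from these logs, and it is not a diagnosis of this cluster.

```java
import java.nio.ByteBuffer;

// Hypothetical illustration: the bytes below are invented, not taken from the logs.
public class MisalignedReadDemo {
    // Reads a big-endian int (ByteBuffer's default byte order) at the given offset.
    static int readIntAt(byte[] data, int offset) {
        return ByteBuffer.wrap(data).getInt(offset);
    }

    public static void main(String[] args) {
        // The int 2048 (0x00000800) followed by one trailing zero byte.
        byte[] data = {0x00, 0x00, 0x08, 0x00, 0x00};
        System.out.println(readIntAt(data, 0)); // 2048: read from the intended offset
        System.out.println(readIntAt(data, 1)); // 524288 (0x00080000): off by one byte
    }
}
```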



-- 
Jonathan Ellis
Project Chair, Apache Cassandra
co-founder of Riptano, the source for professional Cassandra support
http://riptano.com