> bleeding edge code you are running (did you try rc1?) or you do have nodes on different versions
All nodes are running code from https://svn.apache.org/repos/asf/cassandra/branches/cassandra-0.7, which I thought was essentially RC1 with fixes, but I will give the actual release a try.

> you have a hardware problem

Hard to say. I don't think so; everything else seems to be working fine. I will try running some diagnostics on the two nodes that seem to be acting up.

Now for some new developments; the plot thickens. I am fairly sure there is a corrupt ColumnFamily/SSTable. After a restart, two adjacent nodes both show the following error, after which the CompactionManager pending task count never returns to zero. I am fairly sure this CF is not getting compacted, but compaction for other column families seems to continue. To get rid of all these errors I have to perform a truncate operation using the CLI, after which I get the same IndexOutOfBoundsException again.

Can I just shut down the node (draining first) and delete all data files related to this column family on the two problematic nodes? The data they contain is reasonably unimportant and I don't mind losing it.

ERROR [CompactionExecutor:1] 2010-12-06 05:07:56,736 AbstractCassandraDaemon.java (line 90) Fatal exception in thread Thread[CompactionExecutor:1,1,main]
java.lang.IndexOutOfBoundsException
	at java.nio.Buffer.checkIndex(Buffer.java:520)
	at java.nio.HeapByteBuffer.getInt(HeapByteBuffer.java:340)
	at org.apache.cassandra.db.DeletedColumn.getLocalDeletionTime(DeletedColumn.java:57)
	at org.apache.cassandra.db.ColumnFamilyStore.removeDeletedSuper(ColumnFamilyStore.java:818)
	at org.apache.cassandra.db.ColumnFamilyStore.removeDeletedColumnsOnly(ColumnFamilyStore.java:781)
	at org.apache.cassandra.db.ColumnFamilyStore.removeDeleted(ColumnFamilyStore.java:774)
	at org.apache.cassandra.io.PrecompactedRow.<init>(PrecompactedRow.java:93)
	at org.apache.cassandra.io.CompactionIterator.getCompactedRow(CompactionIterator.java:138)
	at org.apache.cassandra.io.CompactionIterator.getReduced(CompactionIterator.java:107)
	at org.apache.cassandra.io.CompactionIterator.getReduced(CompactionIterator.java:42)
	at org.apache.cassandra.utils.ReducingIterator.computeNext(ReducingIterator.java:73)
	at com.google.common.collect.AbstractIterator.tryToComputeNext(AbstractIterator.java:136)
	at com.google.common.collect.AbstractIterator.hasNext(AbstractIterator.java:131)
	at org.apache.commons.collections.iterators.FilterIterator.setNextObject(FilterIterator.java:183)
	at org.apache.commons.collections.iterators.FilterIterator.hasNext(FilterIterator.java:94)
	at org.apache.cassandra.db.CompactionManager.doCompaction(CompactionManager.java:321)
	at org.apache.cassandra.db.CompactionManager$1.call(CompactionManager.java:124)
	at org.apache.cassandra.db.CompactionManager$1.call(CompactionManager.java:97)
	at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
	at java.util.concurrent.FutureTask.run(FutureTask.java:138)
	at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
	at java.lang.Thread.run(Thread.java:662)

Dan

-----Original Message-----
From: Jonathan Ellis [mailto:jbel...@gmail.com]
Sent: December-04-10 22:45
To: user
Subject: Re: Various exceptions on 0.7

At least one of your nodes is sending garbage to the others. Either there's a bug in the bleeding-edge code you are running (did you try rc1?) or you do have nodes on different versions or you have a hardware problem.

On Sat, Dec 4, 2010 at 5:51 PM, Dan Hendry <dan.hendry.j...@gmail.com> wrote:
> Here are two other errors which appear frequently:
>
> ERROR [MutationStage:29] 2010-12-04 17:47:46,931 RowMutationVerbHandler.java (line 83) Error in row mutation
> java.io.IOException: Invalid localDeleteTime read: 0
> 	at org.apache.cassandra.db.SuperColumnSerializer.deserialize(SuperColumn.java:355)
> 	at org.apache.cassandra.db.SuperColumnSerializer.deserialize(SuperColumn.java:312)
> 	at org.apache.cassandra.db.ColumnFamilySerializer.deserializeColumns(ColumnFamilySerializer.java:129)
> 	at org.apache.cassandra.db.ColumnFamilySerializer.deserialize(ColumnFamilySerializer.java:120)
> 	at org.apache.cassandra.db.RowMutationSerializer.defreezeTheMaps(RowMutation.java:383)
> 	at org.apache.cassandra.db.RowMutationSerializer.deserialize(RowMutation.java:393)
> 	at org.apache.cassandra.db.RowMutationSerializer.deserialize(RowMutation.java:351)
> 	at org.apache.cassandra.db.RowMutationVerbHandler.doVerb(RowMutationVerbHandler.java:52)
> 	at org.apache.cassandra.net.MessageDeliveryTask.run(MessageDeliveryTask.java:63)
> 	at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
> 	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
> 	at java.lang.Thread.run(Thread.java:662)
>
> ERROR [MutationStage:15] 2010-12-04 17:48:33,216 RowMutationVerbHandler.java (line 83) Error in row mutation
> org.apache.cassandra.db.ColumnSerializer$CorruptColumnException: invalid column name length 0
> 	at org.apache.cassandra.db.ColumnSerializer.deserialize(ColumnSerializer.java:68)
> 	at org.apache.cassandra.db.SuperColumnSerializer.deserialize(SuperColumn.java:363)
> 	at org.apache.cassandra.db.SuperColumnSerializer.deserialize(SuperColumn.java:312)
> 	at org.apache.cassandra.db.ColumnFamilySerializer.deserializeColumns(ColumnFamilySerializer.java:129)
> 	at org.apache.cassandra.db.ColumnFamilySerializer.deserialize(ColumnFamilySerializer.java:120)
> 	at org.apache.cassandra.db.RowMutationSerializer.defreezeTheMaps(RowMutation.java:383)
> 	at org.apache.cassandra.db.RowMutationSerializer.deserialize(RowMutation.java:393)
> 	at org.apache.cassandra.db.RowMutationSerializer.deserialize(RowMutation.java:351)
> 	at org.apache.cassandra.db.RowMutationVerbHandler.doVerb(RowMutationVerbHandler.java:52)
> 	at org.apache.cassandra.net.MessageDeliveryTask.run(MessageDeliveryTask.java:63)
> 	at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
> 	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
> 	at java.lang.Thread.run(Thread.java:662)
>
> On Sat, Dec 4, 2010 at 6:29 PM, Dan Hendry <dan.hendry.j...@gmail.com> wrote:
>>
>> No, all nodes are running very recent (< 2 day old) code out of the 0.7
>> branch. This cluster has always had 0.7 RC1(+) code running on it.
>>
>> On Sat, Dec 4, 2010 at 6:24 PM, Jonathan Ellis <jbel...@gmail.com> wrote:
>>>
>>> Are you mixing different Cassandra versions?
>>>
>>> On Sat, Dec 4, 2010 at 4:58 PM, Dan Hendry <dan.hendry.j...@gmail.com> wrote:
>>> > To be clear, I had to interrupt a clean operation earlier in the day by
>>> > killing the Cassandra process. Now the node works for a while, but
>>> > continually logs the "Error in row mutation" errors and then eventually
>>> > logs a "Fatal exception in thread" error. After that, the process stays
>>> > alive but there seem to be problems reading from the node. At the very
>>> > least, read performance is massively degraded.
>>> >
>>> > On Sat, Dec 4, 2010 at 5:52 PM, Dan Hendry <dan.hendry.j...@gmail.com> wrote:
>>> >>
>>> >> One of my Cassandra nodes is giving me a number of errors and then
>>> >> effectively dying. I think it was somehow caused by interrupting a
>>> >> nodetool clean operation. Running a recent 0.7 build out of svn.
>>> >>
>>> >> ERROR [MutationStage:26] 2010-12-04 16:23:04,395 RowMutationVerbHandler.java (line 83) Error in row mutation
>>> >> java.io.EOFException
>>> >> 	at java.io.DataInputStream.readFully(DataInputStream.java:180)
>>> >> 	at org.apache.cassandra.utils.FBUtilities.readByteArray(FBUtilities.java:264)
>>> >> 	at org.apache.cassandra.db.ColumnSerializer.deserialize(ColumnSerializer.java:76)
>>> >> 	at org.apache.cassandra.db.SuperColumnSerializer.deserialize(SuperColumn.java:363)
>>> >> 	at org.apache.cassandra.db.SuperColumnSerializer.deserialize(SuperColumn.java:312)
>>> >> 	at org.apache.cassandra.db.ColumnFamilySerializer.deserializeColumns(ColumnFamilySerializer.java:129)
>>> >> 	at org.apache.cassandra.db.ColumnFamilySerializer.deserialize(ColumnFamilySerializer.java:120)
>>> >> 	at org.apache.cassandra.db.RowMutationSerializer.defreezeTheMaps(RowMutation.java:383)
>>> >> 	at org.apache.cassandra.db.RowMutationSerializer.deserialize(RowMutation.java:393)
>>> >> 	at org.apache.cassandra.db.RowMutationSerializer.deserialize(RowMutation.java:351)
>>> >> 	at org.apache.cassandra.db.RowMutationVerbHandler.doVerb(RowMutationVerbHandler.java:52)
>>> >> 	at org.apache.cassandra.net.MessageDeliveryTask.run(MessageDeliveryTask.java:63)
>>> >> 	at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
>>> >> 	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
>>> >> 	at java.lang.Thread.run(Thread.java:662)
>>> >> ERROR [MutationStage:13] 2010-12-04 16:25:04,061 RowMutationVerbHandler.java (line 83) Error in row mutation
>>> >> org.apache.cassandra.db.UnserializableColumnFamilyException: Couldn't find cfId=524288
>>> >> 	at org.apache.cassandra.db.ColumnFamilySerializer.deserialize(ColumnFamilySerializer.java:117)
>>> >> 	at org.apache.cassandra.db.RowMutationSerializer.defreezeTheMaps(RowMutation.java:383)
>>> >> 	at org.apache.cassandra.db.RowMutationSerializer.deserialize(RowMutation.java:393)
>>> >> 	at org.apache.cassandra.db.RowMutationSerializer.deserialize(RowMutation.java:351)
>>> >> 	at org.apache.cassandra.db.RowMutationVerbHandler.doVerb(RowMutationVerbHandler.java:52)
>>> >> 	at org.apache.cassandra.net.MessageDeliveryTask.run(MessageDeliveryTask.java:63)
>>> >> 	at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
>>> >> 	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
>>> >> 	at java.lang.Thread.run(Thread.java:662)
>>> >> ERROR [MutationStage:20] 2010-12-04 16:25:25,216 DebuggableThreadPoolExecutor.java (line 103) Error in ThreadPoolExecutor
>>> >> java.lang.NullPointerException
>>> >> 	at org.apache.cassandra.db.Table.apply(Table.java:398)
>>> >> 	at org.apache.cassandra.db.RowMutationVerbHandler.doVerb(RowMutationVerbHandler.java:73)
>>> >> 	at org.apache.cassandra.net.MessageDeliveryTask.run(MessageDeliveryTask.java:63)
>>> >> 	at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
>>> >> 	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
>>> >> 	at java.lang.Thread.run(Thread.java:662)
>>> >> ERROR [MutationStage:20] 2010-12-04 16:25:25,216 AbstractCassandraDaemon.java (line 90) Fatal exception in thread Thread[MutationStage:20,5,main]
>>> >> java.lang.NullPointerException
>>> >> 	at org.apache.cassandra.db.Table.apply(Table.java:398)
>>> >> 	at org.apache.cassandra.db.RowMutationVerbHandler.doVerb(RowMutationVerbHandler.java:73)
>>> >> 	at org.apache.cassandra.net.MessageDeliveryTask.run(MessageDeliveryTask.java:63)
>>> >> 	at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
>>> >> 	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
>>> >> 	at java.lang.Thread.run(Thread.java:662)
>>> >> ERROR [COMMIT-LOG-WRITER] 2010-12-04 16:25:25,216 AbstractCassandraDaemon.java (line 90) Fatal exception in thread Thread[COMMIT-LOG-WRITER,5,main]
>>> >> java.lang.RuntimeException: java.lang.NullPointerException
>>> >> 	at org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:34)
>>> >> 	at java.lang.Thread.run(Thread.java:662)
>>> >> Caused by: java.lang.NullPointerException
>>> >> 	at org.apache.cassandra.db.commitlog.CommitLogSegment.write(CommitLogSegment.java:92)
>>> >> 	at org.apache.cassandra.db.commitlog.CommitLog$LogRecordAdder.run(CommitLog.java:509)
>>> >> 	at org.apache.cassandra.db.commitlog.PeriodicCommitLogExecutorService$1.runMayThrow(PeriodicCommitLogExecutorService.java:52)
>>> >> 	at org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:30)
>>> >> 	... 1 more
>>> >
>>> >
>>>
>>>
>>>
>>> --
>>> Jonathan Ellis
>>> Project Chair, Apache Cassandra
>>> co-founder of Riptano, the source for professional Cassandra support
>>> http://riptano.com
>>
>
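A note on the IndexOutOfBoundsException in the compaction trace at the top of this thread: the frames show DeletedColumn.getLocalDeletionTime calling HeapByteBuffer.getInt, which then fails in Buffer.checkIndex. That pattern is consistent with a tombstone whose stored value holds fewer than the 4 bytes an int needs, which would fit the corrupt-SSTable theory. The Java sketch below reproduces only that buffer behaviour. The class name DeletionTimeSketch and the method localDeletionTime are hypothetical, and the assumption that the deletion time is a 4-byte int read at the buffer's current position is not taken from Cassandra's source.

```java
import java.nio.ByteBuffer;

public class DeletionTimeSketch {
    // Hypothetical stand-in for reading a tombstone's local deletion time.
    // ASSUMPTION: the value is a 4-byte int stored at the buffer's position.
    static int localDeletionTime(ByteBuffer value) {
        // Absolute read: checks that 4 bytes exist from this index up to the
        // limit and throws IndexOutOfBoundsException if they do not.
        return value.getInt(value.position());
    }

    public static void main(String[] args) {
        // A well-formed 4-byte value (42 is an arbitrary example).
        ByteBuffer intact = ByteBuffer.allocate(4).putInt(0, 42);
        System.out.println("intact value: " + localDeletionTime(intact));

        // A hypothetical truncated value: only 2 bytes where 4 are expected.
        ByteBuffer truncated = ByteBuffer.wrap(new byte[2]);
        try {
            localDeletionTime(truncated);
        } catch (IndexOutOfBoundsException e) {
            System.out.println("truncated value: IndexOutOfBoundsException");
        }
    }
}
```

If this reading is right, the exception is a symptom of bad bytes already on disk rather than a compaction bug in itself, which is why it keeps recurring for the same column family.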