Hi,

I had a schema disagreement problem in cassandra 1.0.9 cluster, where one node had different schema version. So I followed the faq at http://wiki.apache.org/cassandra/FAQ#schema_disagreement disabled gossip, disabled thrift, drained and finally stopped the cassandra process, on startup
noticed
INFO [main] 2012-05-18 16:23:11,879 DatabaseDescriptor.java (line 467) Couldn't detect any schema definitions in local storage.
in the log, and after
INFO [main] 2012-05-18 16:23:15,463 StorageService.java (line 619) Bootstrap/Replace/Move completed! Now serving reads. it started throwing Fatal exceptions for all read/write operations endlessly.

I had to stop cassandra process again(no draining was done)

On second start it did came up ok immediately loading the correct cluster schema version INFO [main] 2012-05-18 16:54:44,303 DatabaseDescriptor.java (line 499) Loading schema version 9db34ef0-a0be-11e1-0000-f9687e034cf7

But now this node appears to have started with no data from keyspace which had schema disagreement.
The original keyspace sstables now appear under snapshots dir.

# nodetool -h localhost ring
Address DC Rack Status State Load Owns Token
                                                                               
141784319550391026443072753096570088106
10.49.127.4 eu-west 1a Up Normal 8.19 GB 16.67% 0 10.241.29.65 eu-west 1b Up Normal 8.18 GB 16.67% 28356863910078205288614550619314017621 10.59.46.236 eu-west 1c Up Normal 8.22 GB 16.67% 56713727820156410577229101238628035242 10.50.33.232 eu-west 1a Up Normal 8.2 GB 16.67% 85070591730234615865843651857942052864 10.234.71.33 eu-west 1b Up Normal 8.15 GB 16.67% 113427455640312821154458202477256070485 10.58.249.118 eu-west 1c Up Normal 660.98 MB 16.67% 141784319550391026443072753096570088106
#

The node is the one with 660.98 MB data( which is opscenter keyspace data which was not invalidated)

So i have some questions:

1) What did I wrong? - why cassandra was throwing exceptions on first startup?
2) Why the keyspace data was invalidated ? Is it expected?
3) If answer to #2 is "yes it's expected" then that's the point in doing http://wiki.apache.org/cassandra/FAQ#schema_disagreement then all keyspace data is lost anyway? It makes more sense to just do http://wiki.apache.org/cassandra/Operations#Replacing_a_Dead_Node 4) afaiu i could also stop cassandra again move old sstables from snapshot back to keyspace data dir and run repair for all keyspace CFs? So that it finishes faster and makes less load than running a repair which has no previous keyspace data at all?

The first startup log is below:

INFO [main] 2012-05-18 16:23:07,367 AbstractCassandraDaemon.java (line 105) Logging initialized INFO [main] 2012-05-18 16:23:07,382 AbstractCassandraDaemon.java (line 126) JVM vendor/version: Java HotSpot(TM) 64-Bit Server VM/1.6.0_24 INFO [main] 2012-05-18 16:23:07,383 AbstractCassandraDaemon.java (line 127) Heap size: 2600468480/2600468480 INFO [main] 2012-05-18 16:23:07,383 AbstractCassandraDaemon.java (line 128) Classpath: /etc/cassandra/conf:/usr/share/java/jna.jar:/usr/share/java/mx4j-tools.jar:/usr/share/cassandra/lib/antlr-3.2.jar:/usr/share/cassandra/lib/apache-cassandra-1.0.9.jar:/usr/share/cassandra/lib/apache-cassandra-clientutil-1.0.9.jar:/usr/share/cassandra/lib/apache-cassandra-thrift-1.0.9.jar:/usr/share/cassandra/lib/avro-1.4.0-fixes.jar:/usr/share/cassandra/lib/avro-1.4.0-sources-fixes.jar:/usr/share/cassandra/lib/commons-cli-1.1.jar:/usr/share/cassandra/lib/commons-codec-1.2.jar:/usr/share/cassandra/lib/commons-lang-2.4.jar:/usr/share/cassandra/lib/compress-lzf-0.8.4.jar:/usr/share/cassandra/lib/concurrentlinkedhashmap-lru-1.2.jar:/usr/share/cassandra/lib/guava-r08.jar:/usr/share/cassandra/lib/high-scale-lib-1.1.2.jar:/usr/share/cassandra/lib/jackson-core-asl-1.4.0.jar:/usr/share/cassandra/lib/jackson-mapper-asl-1.4.0.jar:/usr/share/cassandra/lib/jamm-0.2.5.jar:/usr/share/cassandra/lib/jline-0.9.94.jar:/usr/share/cassandra/lib/joda-time-1.6.2.jar:/usr/share/cassandra/lib/json-simple-1.1.jar:/usr/share/cassandra/lib/libthrift-0.6.jar:/usr/share/cassandra/lib/log4j-1.2.16.jar:/usr/share/cassandra/lib/servlet-api-2.5-20081211.jar:/usr/share/cassandra/lib/slf4j-api-1.6.1.jar:/usr/share/cassandra/lib/slf4j-log4j12-1.6.1.jar:/usr/share/cassandra/lib/snakeyaml-1.6.jar:/usr/share/cassandra/lib/snappy-java-1.0.4.1.jar:/usr/share/cassandra//lib/jamm-0.2.5.jar INFO [main] 2012-05-18 16:23:10,661 CLibrary.java (line 109) JNA mlockall successful INFO [main] 2012-05-18 16:23:10,692 DatabaseDescriptor.java (line 114) Loading settings from file:/etc/cassandra/ssa/cassandra.yaml INFO [main] 2012-05-18 16:23:10,868 DatabaseDescriptor.java (line 168) DiskAccessMode 'auto' determined to be mmap, indexAccessMode is mmap INFO [main] 2012-05-18 16:23:10,911 DatabaseDescriptor.java (line 233) Global memtable threshold is enabled at 826MB INFO [main] 2012-05-18 16:23:10,970 Ec2Snitch.java (line 63) EC2Snitch using region: eu-west, zone: 1c. INFO [SSTableBatchOpen:1] 2012-05-18 16:23:11,652 SSTableReader.java (line 153) Opening /var/lib/cassandra/data/system/Versions-hc-5 (247 bytes) INFO [SSTableBatchOpen:1] 2012-05-18 16:23:11,767 SSTableReader.java (line 153) Opening /var/lib/cassandra/data/system/IndexInfo-hc-2 (128 bytes) INFO [SSTableBatchOpen:2] 2012-05-18 16:23:11,768 SSTableReader.java (line 153) Opening /var/lib/cassandra/data/system/IndexInfo-hc-3 (124 bytes) INFO [SSTableBatchOpen:1] 2012-05-18 16:23:11,802 SSTableReader.java (line 153) Opening /var/lib/cassandra/data/system/IndexInfo-hc-1 (124 bytes) INFO [SSTableBatchOpen:1] 2012-05-18 16:23:11,825 SSTableReader.java (line 153) Opening /var/lib/cassandra/data/system/LocationInfo-hc-35 (89 bytes) INFO [SSTableBatchOpen:2] 2012-05-18 16:23:11,825 SSTableReader.java (line 153) Opening /var/lib/cassandra/data/system/LocationInfo-hc-34 (558 bytes) INFO [SSTableBatchOpen:1] 2012-05-18 16:23:11,835 SSTableReader.java (line 153) Opening /var/lib/cassandra/data/system/HintsColumnFamily-hc-56 (235748 bytes) INFO [main] 2012-05-18 16:23:11,879 DatabaseDescriptor.java (line 467) Couldn't detect any schema definitions in local storage. INFO [main] 2012-05-18 16:23:11,880 DatabaseDescriptor.java (line 493) Found table data in data directories. Consider using the CLI to define your schema. INFO [main] 2012-05-18 16:23:11,914 CommitLogSegment.java (line 60) Creating new commitlog segment /var/lib/cassandra/commitlog/CommitLog-1337358191914.log INFO [main] 2012-05-18 16:23:11,928 CommitLog.java (line 172) Replaying /var/lib/cassandra/commitlog/CommitLog-1337356780099.log INFO [main] 2012-05-18 16:23:15,237 CommitLog.java (line 355) Finished reading /var/lib/cassandra/commitlog/CommitLog-1337356780099.log INFO [main] 2012-05-18 16:23:15,238 CommitLog.java (line 360) Skipped 77886 mutations from unknown (probably removed) CF with id 1016 INFO [main] 2012-05-18 16:23:15,238 CommitLog.java (line 360) Skipped 37804 mutations from unknown (probably removed) CF with id 1008 INFO [main] 2012-05-18 16:23:15,238 CommitLog.java (line 360) Skipped 3 mutations from unknown (probably removed) CF with id 1024 INFO [main] 2012-05-18 16:23:15,239 CommitLog.java (line 360) Skipped 67759 mutations from unknown (probably removed) CF with id 1009 INFO [main] 2012-05-18 16:23:15,239 CommitLog.java (line 360) Skipped 51268 mutations from unknown (probably removed) CF with id 1010 INFO [main] 2012-05-18 16:23:15,239 CommitLog.java (line 360) Skipped 59319 mutations from unknown (probably removed) CF with id 1011 INFO [main] 2012-05-18 16:23:15,239 CommitLog.java (line 360) Skipped 61110 mutations from unknown (probably removed) CF with id 1012 INFO [main] 2012-05-18 16:23:15,239 CommitLog.java (line 360) Skipped 182908 mutations from unknown (probably removed) CF with id 1013 INFO [main] 2012-05-18 16:23:15,240 CommitLog.java (line 360) Skipped 10038 mutations from unknown (probably removed) CF with id 1014 INFO [main] 2012-05-18 16:23:15,240 CommitLog.java (line 360) Skipped 14783 mutations from unknown (probably removed) CF with id 1001 INFO [main] 2012-05-18 16:23:15,240 CommitLog.java (line 360) Skipped 4256 mutations from unknown (probably removed) CF with id 1000 INFO [main] 2012-05-18 16:23:15,240 CommitLog.java (line 360) Skipped 746 mutations from unknown (probably removed) CF with id 1003 INFO [main] 2012-05-18 16:23:15,241 CommitLog.java (line 360) Skipped 3013 mutations from unknown (probably removed) CF with id 1002 INFO [main] 2012-05-18 16:23:15,241 CommitLog.java (line 179) Log replay complete, 0 replayed mutations INFO [main] 2012-05-18 16:23:15,259 StorageService.java (line 400) Cassandra version: 1.0.9 INFO [main] 2012-05-18 16:23:15,260 StorageService.java (line 401) Thrift API version: 19.20.0 INFO [main] 2012-05-18 16:23:15,260 StorageService.java (line 414) Loading persisted ring state INFO [main] 2012-05-18 16:23:15,288 StorageService.java (line 494) Starting up server gossip INFO [main] 2012-05-18 16:23:15,292 ColumnFamilyStore.java (line 705) Enqueuing flush of Memtable-LocationInfo@189237805(29/36 serialized/live bytes, 1 ops) INFO [FlushWriter:1] 2012-05-18 16:23:15,293 Memtable.java (line 246) Writing Memtable-LocationInfo@189237805(29/36 serialized/live bytes, 1 ops) INFO [FlushWriter:1] 2012-05-18 16:23:15,350 Memtable.java (line 283) Completed flushing /var/lib/cassandra/data/system/LocationInfo-hc-36-Data.db (80 bytes) INFO [main] 2012-05-18 16:23:15,365 Ec2Snitch.java (line 113) Ec2Snitch adding ApplicationState ec2region=eu-west ec2zone=1c INFO [main] 2012-05-18 16:23:15,379 MessagingService.java (line 269) Starting Messaging Service on port 7000 INFO [main] 2012-05-18 16:23:15,399 StorageService.java (line 609) Using saved token 141784319550391026443072753096570088106 INFO [main] 2012-05-18 16:23:15,403 ColumnFamilyStore.java (line 705) Enqueuing flush of Memtable-LocationInfo@559001765(53/66 serialized/live bytes, 2 ops) INFO [FlushWriter:1] 2012-05-18 16:23:15,404 Memtable.java (line 246) Writing Memtable-LocationInfo@559001765(53/66 serialized/live bytes, 2 ops) INFO [FlushWriter:1] 2012-05-18 16:23:15,451 Memtable.java (line 283) Completed flushing /var/lib/cassandra/data/system/LocationInfo-hc-37-Data.db (163 bytes) INFO [main] 2012-05-18 16:23:15,454 StorageService.java (line 989) Node dsc2c.internal/10.58.249.118 state jump to normal INFO [CompactionExecutor:1] 2012-05-18 16:23:15,461 CompactionTask.java (line 115) Compacting [SSTableReader(path='/var/lib/cassandra/data/system/LocationInfo-hc-37-Data.db'), SSTableReader(path='/var/lib/cassandra/data/system/LocationInfo-hc-35-Data.db'), SSTableReader(path='/var/lib/cassandra/data/system/LocationInfo-hc-34-Data.db'), SSTableReader(path='/var/lib/cassandra/data/system/LocationInfo-hc-36-Data.db')] INFO [main] 2012-05-18 16:23:15,463 StorageService.java (line 619) Bootstrap/Replace/Move completed! Now serving reads. INFO [CompactionExecutor:1] 2012-05-18 16:23:15,507 CompactionTask.java (line 223) Compacted to [/var/lib/cassandra/data/system/LocationInfo-hc-38-Data.db,]. 890 to 558 (~62% of original) bytes for 4 keys at 0.015651MB/s. Time: 34ms. ERROR [MutationStage:1] 2012-05-18 16:23:15,664 RowMutationVerbHandler.java (line 61) Error in row mutation org.apache.cassandra.db.UnknownColumnFamilyException: Couldn't find cfId=1012 at org.apache.cassandra.db.ColumnFamilySerializer.deserialize(ColumnFamilySerializer.java:126) at org.apache.cassandra.db.RowMutation$RowMutationSerializer.deserialize(RowMutation.java:401) at org.apache.cassandra.db.RowMutation$RowMutationSerializer.deserialize(RowMutation.java:409) at org.apache.cassandra.db.RowMutation.fromBytes(RowMutation.java:357) at org.apache.cassandra.db.RowMutationVerbHandler.doVerb(RowMutationVerbHandler.java:42) at org.apache.cassandra.net.MessageDeliveryTask.run(MessageDeliveryTask.java:59) at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
        at java.lang.Thread.run(Thread.java:662)
ERROR [MutationStage:2] 2012-05-18 16:23:15,666 RowMutationVerbHandler.java (line 61) Error in row mutation org.apache.cassandra.db.UnknownColumnFamilyException: Couldn't find cfId=1010 at org.apache.cassandra.db.ColumnFamilySerializer.deserialize(ColumnFamilySerializer.java:126) at org.apache.cassandra.db.RowMutation$RowMutationSerializer.deserialize(RowMutation.java:401) at org.apache.cassandra.db.RowMutation$RowMutationSerializer.deserialize(RowMutation.java:409) at org.apache.cassandra.db.RowMutation.fromBytes(RowMutation.java:357) at org.apache.cassandra.db.RowMutationVerbHandler.doVerb(RowMutationVerbHandler.java:42) at org.apache.cassandra.net.MessageDeliveryTask.run(MessageDeliveryTask.java:59) at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
        at java.lang.Thread.run(Thread.java:662)
ERROR [ReadStage:1] 2012-05-18 16:23:15,668 AbstractCassandraDaemon.java (line 139) Fatal exception in thread Thread[ReadStage:1,5,main] java.lang.IllegalArgumentException: Unknown ColumnFamily UserData in keyspace PRODUCTION at org.apache.cassandra.config.Schema.getComparator(Schema.java:223) at org.apache.cassandra.db.ColumnFamily.getComparatorFor(ColumnFamily.java:320) at org.apache.cassandra.db.ReadCommand.getComparator(ReadCommand.java:92) at org.apache.cassandra.db.SliceByNamesReadCommand.<init>(SliceByNamesReadCommand.java:44) at org.apache.cassandra.db.SliceByNamesReadCommandSerializer.deserialize(SliceByNamesReadCommand.java:106) at org.apache.cassandra.db.SliceByNamesReadCommandSerializer.deserialize(SliceByNamesReadCommand.java:74) at org.apache.cassandra.db.ReadCommandSerializer.deserialize(ReadCommand.java:132) at org.apache.cassandra.db.ReadVerbHandler.doVerb(ReadVerbHandler.java:51) at org.apache.cassandra.net.MessageDeliveryTask.run(MessageDeliveryTask.java:59) at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
        at java.lang.Thread.run(Thread.java:662)
ERROR [ReadStage:3] 2012-05-18 16:23:15,669 AbstractCassandraDaemon.java (line 139) Fatal exception in thread Thread[ReadStage:3,5,main] java.lang.IllegalArgumentException: Unknown ColumnFamily UserFraudParameters in keyspace PRODUCTION at org.apache.cassandra.config.Schema.getComparator(Schema.java:223) at org.apache.cassandra.db.ColumnFamily.getComparatorFor(ColumnFamily.java:320) at org.apache.cassandra.db.ReadCommand.getComparator(ReadCommand.java:92) at org.apache.cassandra.db.SliceByNamesReadCommand.<init>(SliceByNamesReadCommand.java:44) at org.apache.cassandra.db.SliceByNamesReadCommandSerializer.deserialize(SliceByNamesReadCommand.java:106) at org.apache.cassandra.db.SliceByNamesReadCommandSerializer.deserialize(SliceByNamesReadCommand.java:74) at org.apache.cassandra.db.ReadCommandSerializer.deserialize(ReadCommand.java:132) at org.apache.cassandra.db.ReadVerbHandler.doVerb(ReadVerbHandler.java:51) at org.apache.cassandra.net.MessageDeliveryTask.run(MessageDeliveryTask.java:59) at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
        at java.lang.Thread.run(Thread.java:662)
...
and so endlessly on


Thanks
Alex



Reply via email to