Hi,
I had a schema disagreement problem in cassandra 1.0.9 cluster, where
one node had different schema version.
So I followed the faq at
http://wiki.apache.org/cassandra/FAQ#schema_disagreement
disabled gossip, disabled thrift, drained and finally stopped the
cassandra process, on startup
noticed
INFO [main] 2012-05-18 16:23:11,879 DatabaseDescriptor.java (line 467)
Couldn't detect any schema definitions in local storage.
in the log, and after
INFO [main] 2012-05-18 16:23:15,463 StorageService.java (line 619)
Bootstrap/Replace/Move completed! Now serving reads.
it started throwing Fatal exceptions for all read/write operations
endlessly.
I had to stop cassandra process again(no draining was done)
On second start it did came up ok immediately loading the correct
cluster schema version
INFO [main] 2012-05-18 16:54:44,303 DatabaseDescriptor.java (line 499)
Loading schema version 9db34ef0-a0be-11e1-0000-f9687e034cf7
But now this node appears to have started with no data from keyspace
which had schema disagreement.
The original keyspace sstables now appear under snapshots dir.
# nodetool -h localhost ring
Address DC Rack Status State Load
Owns Token
141784319550391026443072753096570088106
10.49.127.4 eu-west 1a Up Normal 8.19 GB
16.67% 0
10.241.29.65 eu-west 1b Up Normal 8.18 GB
16.67% 28356863910078205288614550619314017621
10.59.46.236 eu-west 1c Up Normal 8.22 GB
16.67% 56713727820156410577229101238628035242
10.50.33.232 eu-west 1a Up Normal 8.2 GB
16.67% 85070591730234615865843651857942052864
10.234.71.33 eu-west 1b Up Normal 8.15 GB
16.67% 113427455640312821154458202477256070485
10.58.249.118 eu-west 1c Up Normal 660.98 MB
16.67% 141784319550391026443072753096570088106
#
The node is the one with 660.98 MB data( which is opscenter keyspace
data which was not invalidated)
So i have some questions:
1) What did I wrong? - why cassandra was throwing exceptions on first
startup?
2) Why the keyspace data was invalidated ? Is it expected?
3) If answer to #2 is "yes it's expected" then that's the point in
doing http://wiki.apache.org/cassandra/FAQ#schema_disagreement
then all keyspace data is lost anyway? It makes more sense to just do
http://wiki.apache.org/cassandra/Operations#Replacing_a_Dead_Node
4) afaiu i could also stop cassandra again move old sstables from
snapshot back to keyspace data dir and run repair for all keyspace CFs?
So that it finishes faster
and makes less load than running a repair which has no previous keyspace
data at all?
The first startup log is below:
INFO [main] 2012-05-18 16:23:07,367 AbstractCassandraDaemon.java (line
105) Logging initialized
INFO [main] 2012-05-18 16:23:07,382 AbstractCassandraDaemon.java (line
126) JVM vendor/version: Java HotSpot(TM) 64-Bit Server VM/1.6.0_24
INFO [main] 2012-05-18 16:23:07,383 AbstractCassandraDaemon.java (line
127) Heap size: 2600468480/2600468480
INFO [main] 2012-05-18 16:23:07,383 AbstractCassandraDaemon.java (line
128) Classpath:
/etc/cassandra/conf:/usr/share/java/jna.jar:/usr/share/java/mx4j-tools.jar:/usr/share/cassandra/lib/antlr-3.2.jar:/usr/share/cassandra/lib/apache-cassandra-1.0.9.jar:/usr/share/cassandra/lib/apache-cassandra-clientutil-1.0.9.jar:/usr/share/cassandra/lib/apache-cassandra-thrift-1.0.9.jar:/usr/share/cassandra/lib/avro-1.4.0-fixes.jar:/usr/share/cassandra/lib/avro-1.4.0-sources-fixes.jar:/usr/share/cassandra/lib/commons-cli-1.1.jar:/usr/share/cassandra/lib/commons-codec-1.2.jar:/usr/share/cassandra/lib/commons-lang-2.4.jar:/usr/share/cassandra/lib/compress-lzf-0.8.4.jar:/usr/share/cassandra/lib/concurrentlinkedhashmap-lru-1.2.jar:/usr/share/cassandra/lib/guava-r08.jar:/usr/share/cassandra/lib/high-scale-lib-1.1.2.jar:/usr/share/cassandra/lib/jackson-core-asl-1.4.0.jar:/usr/share/cassandra/lib/jackson-mapper-asl-1.4.0.jar:/usr/share/cassandra/lib/jamm-0.2.5.jar:/usr/share/cassandra/lib/jline-0.9.94.jar:/usr/share/cassandra/lib/joda-time-1.6.2.jar:/usr/share/cassandra/lib/json-simple-1.1.jar:/usr/share/cassandra/lib/libthrift-0.6.jar:/usr/share/cassandra/lib/log4j-1.2.16.jar:/usr/share/cassandra/lib/servlet-api-2.5-20081211.jar:/usr/share/cassandra/lib/slf4j-api-1.6.1.jar:/usr/share/cassandra/lib/slf4j-log4j12-1.6.1.jar:/usr/share/cassandra/lib/snakeyaml-1.6.jar:/usr/share/cassandra/lib/snappy-java-1.0.4.1.jar:/usr/share/cassandra//lib/jamm-0.2.5.jar
INFO [main] 2012-05-18 16:23:10,661 CLibrary.java (line 109) JNA
mlockall successful
INFO [main] 2012-05-18 16:23:10,692 DatabaseDescriptor.java (line 114)
Loading settings from file:/etc/cassandra/ssa/cassandra.yaml
INFO [main] 2012-05-18 16:23:10,868 DatabaseDescriptor.java (line 168)
DiskAccessMode 'auto' determined to be mmap, indexAccessMode is mmap
INFO [main] 2012-05-18 16:23:10,911 DatabaseDescriptor.java (line 233)
Global memtable threshold is enabled at 826MB
INFO [main] 2012-05-18 16:23:10,970 Ec2Snitch.java (line 63) EC2Snitch
using region: eu-west, zone: 1c.
INFO [SSTableBatchOpen:1] 2012-05-18 16:23:11,652 SSTableReader.java
(line 153) Opening /var/lib/cassandra/data/system/Versions-hc-5 (247 bytes)
INFO [SSTableBatchOpen:1] 2012-05-18 16:23:11,767 SSTableReader.java
(line 153) Opening /var/lib/cassandra/data/system/IndexInfo-hc-2 (128 bytes)
INFO [SSTableBatchOpen:2] 2012-05-18 16:23:11,768 SSTableReader.java
(line 153) Opening /var/lib/cassandra/data/system/IndexInfo-hc-3 (124 bytes)
INFO [SSTableBatchOpen:1] 2012-05-18 16:23:11,802 SSTableReader.java
(line 153) Opening /var/lib/cassandra/data/system/IndexInfo-hc-1 (124 bytes)
INFO [SSTableBatchOpen:1] 2012-05-18 16:23:11,825 SSTableReader.java
(line 153) Opening /var/lib/cassandra/data/system/LocationInfo-hc-35 (89
bytes)
INFO [SSTableBatchOpen:2] 2012-05-18 16:23:11,825 SSTableReader.java
(line 153) Opening /var/lib/cassandra/data/system/LocationInfo-hc-34
(558 bytes)
INFO [SSTableBatchOpen:1] 2012-05-18 16:23:11,835 SSTableReader.java
(line 153) Opening
/var/lib/cassandra/data/system/HintsColumnFamily-hc-56 (235748 bytes)
INFO [main] 2012-05-18 16:23:11,879 DatabaseDescriptor.java (line 467)
Couldn't detect any schema definitions in local storage.
INFO [main] 2012-05-18 16:23:11,880 DatabaseDescriptor.java (line 493)
Found table data in data directories. Consider using the CLI to define
your schema.
INFO [main] 2012-05-18 16:23:11,914 CommitLogSegment.java (line 60)
Creating new commitlog segment
/var/lib/cassandra/commitlog/CommitLog-1337358191914.log
INFO [main] 2012-05-18 16:23:11,928 CommitLog.java (line 172)
Replaying /var/lib/cassandra/commitlog/CommitLog-1337356780099.log
INFO [main] 2012-05-18 16:23:15,237 CommitLog.java (line 355) Finished
reading /var/lib/cassandra/commitlog/CommitLog-1337356780099.log
INFO [main] 2012-05-18 16:23:15,238 CommitLog.java (line 360) Skipped
77886 mutations from unknown (probably removed) CF with id 1016
INFO [main] 2012-05-18 16:23:15,238 CommitLog.java (line 360) Skipped
37804 mutations from unknown (probably removed) CF with id 1008
INFO [main] 2012-05-18 16:23:15,238 CommitLog.java (line 360) Skipped
3 mutations from unknown (probably removed) CF with id 1024
INFO [main] 2012-05-18 16:23:15,239 CommitLog.java (line 360) Skipped
67759 mutations from unknown (probably removed) CF with id 1009
INFO [main] 2012-05-18 16:23:15,239 CommitLog.java (line 360) Skipped
51268 mutations from unknown (probably removed) CF with id 1010
INFO [main] 2012-05-18 16:23:15,239 CommitLog.java (line 360) Skipped
59319 mutations from unknown (probably removed) CF with id 1011
INFO [main] 2012-05-18 16:23:15,239 CommitLog.java (line 360) Skipped
61110 mutations from unknown (probably removed) CF with id 1012
INFO [main] 2012-05-18 16:23:15,239 CommitLog.java (line 360) Skipped
182908 mutations from unknown (probably removed) CF with id 1013
INFO [main] 2012-05-18 16:23:15,240 CommitLog.java (line 360) Skipped
10038 mutations from unknown (probably removed) CF with id 1014
INFO [main] 2012-05-18 16:23:15,240 CommitLog.java (line 360) Skipped
14783 mutations from unknown (probably removed) CF with id 1001
INFO [main] 2012-05-18 16:23:15,240 CommitLog.java (line 360) Skipped
4256 mutations from unknown (probably removed) CF with id 1000
INFO [main] 2012-05-18 16:23:15,240 CommitLog.java (line 360) Skipped
746 mutations from unknown (probably removed) CF with id 1003
INFO [main] 2012-05-18 16:23:15,241 CommitLog.java (line 360) Skipped
3013 mutations from unknown (probably removed) CF with id 1002
INFO [main] 2012-05-18 16:23:15,241 CommitLog.java (line 179) Log
replay complete, 0 replayed mutations
INFO [main] 2012-05-18 16:23:15,259 StorageService.java (line 400)
Cassandra version: 1.0.9
INFO [main] 2012-05-18 16:23:15,260 StorageService.java (line 401)
Thrift API version: 19.20.0
INFO [main] 2012-05-18 16:23:15,260 StorageService.java (line 414)
Loading persisted ring state
INFO [main] 2012-05-18 16:23:15,288 StorageService.java (line 494)
Starting up server gossip
INFO [main] 2012-05-18 16:23:15,292 ColumnFamilyStore.java (line 705)
Enqueuing flush of Memtable-LocationInfo@189237805(29/36 serialized/live
bytes, 1 ops)
INFO [FlushWriter:1] 2012-05-18 16:23:15,293 Memtable.java (line 246)
Writing Memtable-LocationInfo@189237805(29/36 serialized/live bytes, 1 ops)
INFO [FlushWriter:1] 2012-05-18 16:23:15,350 Memtable.java (line 283)
Completed flushing
/var/lib/cassandra/data/system/LocationInfo-hc-36-Data.db (80 bytes)
INFO [main] 2012-05-18 16:23:15,365 Ec2Snitch.java (line 113)
Ec2Snitch adding ApplicationState ec2region=eu-west ec2zone=1c
INFO [main] 2012-05-18 16:23:15,379 MessagingService.java (line 269)
Starting Messaging Service on port 7000
INFO [main] 2012-05-18 16:23:15,399 StorageService.java (line 609)
Using saved token 141784319550391026443072753096570088106
INFO [main] 2012-05-18 16:23:15,403 ColumnFamilyStore.java (line 705)
Enqueuing flush of Memtable-LocationInfo@559001765(53/66 serialized/live
bytes, 2 ops)
INFO [FlushWriter:1] 2012-05-18 16:23:15,404 Memtable.java (line 246)
Writing Memtable-LocationInfo@559001765(53/66 serialized/live bytes, 2 ops)
INFO [FlushWriter:1] 2012-05-18 16:23:15,451 Memtable.java (line 283)
Completed flushing
/var/lib/cassandra/data/system/LocationInfo-hc-37-Data.db (163 bytes)
INFO [main] 2012-05-18 16:23:15,454 StorageService.java (line 989)
Node dsc2c.internal/10.58.249.118 state jump to normal
INFO [CompactionExecutor:1] 2012-05-18 16:23:15,461
CompactionTask.java (line 115) Compacting
[SSTableReader(path='/var/lib/cassandra/data/system/LocationInfo-hc-37-Data.db'),
SSTableReader(path='/var/lib/cassandra/data/system/LocationInfo-hc-35-Data.db'),
SSTableReader(path='/var/lib/cassandra/data/system/LocationInfo-hc-34-Data.db'),
SSTableReader(path='/var/lib/cassandra/data/system/LocationInfo-hc-36-Data.db')]
INFO [main] 2012-05-18 16:23:15,463 StorageService.java (line 619)
Bootstrap/Replace/Move completed! Now serving reads.
INFO [CompactionExecutor:1] 2012-05-18 16:23:15,507
CompactionTask.java (line 223) Compacted to
[/var/lib/cassandra/data/system/LocationInfo-hc-38-Data.db,]. 890 to
558 (~62% of original) bytes for 4 keys at 0.015651MB/s. Time: 34ms.
ERROR [MutationStage:1] 2012-05-18 16:23:15,664
RowMutationVerbHandler.java (line 61) Error in row mutation
org.apache.cassandra.db.UnknownColumnFamilyException: Couldn't find
cfId=1012
at
org.apache.cassandra.db.ColumnFamilySerializer.deserialize(ColumnFamilySerializer.java:126)
at
org.apache.cassandra.db.RowMutation$RowMutationSerializer.deserialize(RowMutation.java:401)
at
org.apache.cassandra.db.RowMutation$RowMutationSerializer.deserialize(RowMutation.java:409)
at
org.apache.cassandra.db.RowMutation.fromBytes(RowMutation.java:357)
at
org.apache.cassandra.db.RowMutationVerbHandler.doVerb(RowMutationVerbHandler.java:42)
at
org.apache.cassandra.net.MessageDeliveryTask.run(MessageDeliveryTask.java:59)
at
java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
at
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
at java.lang.Thread.run(Thread.java:662)
ERROR [MutationStage:2] 2012-05-18 16:23:15,666
RowMutationVerbHandler.java (line 61) Error in row mutation
org.apache.cassandra.db.UnknownColumnFamilyException: Couldn't find
cfId=1010
at
org.apache.cassandra.db.ColumnFamilySerializer.deserialize(ColumnFamilySerializer.java:126)
at
org.apache.cassandra.db.RowMutation$RowMutationSerializer.deserialize(RowMutation.java:401)
at
org.apache.cassandra.db.RowMutation$RowMutationSerializer.deserialize(RowMutation.java:409)
at
org.apache.cassandra.db.RowMutation.fromBytes(RowMutation.java:357)
at
org.apache.cassandra.db.RowMutationVerbHandler.doVerb(RowMutationVerbHandler.java:42)
at
org.apache.cassandra.net.MessageDeliveryTask.run(MessageDeliveryTask.java:59)
at
java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
at
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
at java.lang.Thread.run(Thread.java:662)
ERROR [ReadStage:1] 2012-05-18 16:23:15,668 AbstractCassandraDaemon.java
(line 139) Fatal exception in thread Thread[ReadStage:1,5,main]
java.lang.IllegalArgumentException: Unknown ColumnFamily UserData in
keyspace PRODUCTION
at
org.apache.cassandra.config.Schema.getComparator(Schema.java:223)
at
org.apache.cassandra.db.ColumnFamily.getComparatorFor(ColumnFamily.java:320)
at
org.apache.cassandra.db.ReadCommand.getComparator(ReadCommand.java:92)
at
org.apache.cassandra.db.SliceByNamesReadCommand.<init>(SliceByNamesReadCommand.java:44)
at
org.apache.cassandra.db.SliceByNamesReadCommandSerializer.deserialize(SliceByNamesReadCommand.java:106)
at
org.apache.cassandra.db.SliceByNamesReadCommandSerializer.deserialize(SliceByNamesReadCommand.java:74)
at
org.apache.cassandra.db.ReadCommandSerializer.deserialize(ReadCommand.java:132)
at
org.apache.cassandra.db.ReadVerbHandler.doVerb(ReadVerbHandler.java:51)
at
org.apache.cassandra.net.MessageDeliveryTask.run(MessageDeliveryTask.java:59)
at
java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
at
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
at java.lang.Thread.run(Thread.java:662)
ERROR [ReadStage:3] 2012-05-18 16:23:15,669 AbstractCassandraDaemon.java
(line 139) Fatal exception in thread Thread[ReadStage:3,5,main]
java.lang.IllegalArgumentException: Unknown ColumnFamily
UserFraudParameters in keyspace PRODUCTION
at
org.apache.cassandra.config.Schema.getComparator(Schema.java:223)
at
org.apache.cassandra.db.ColumnFamily.getComparatorFor(ColumnFamily.java:320)
at
org.apache.cassandra.db.ReadCommand.getComparator(ReadCommand.java:92)
at
org.apache.cassandra.db.SliceByNamesReadCommand.<init>(SliceByNamesReadCommand.java:44)
at
org.apache.cassandra.db.SliceByNamesReadCommandSerializer.deserialize(SliceByNamesReadCommand.java:106)
at
org.apache.cassandra.db.SliceByNamesReadCommandSerializer.deserialize(SliceByNamesReadCommand.java:74)
at
org.apache.cassandra.db.ReadCommandSerializer.deserialize(ReadCommand.java:132)
at
org.apache.cassandra.db.ReadVerbHandler.doVerb(ReadVerbHandler.java:51)
at
org.apache.cassandra.net.MessageDeliveryTask.run(MessageDeliveryTask.java:59)
at
java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
at
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
at java.lang.Thread.run(Thread.java:662)
...
and so endlessly on
Thanks
Alex