Problem with PropertyFileSnitch in Amazon EC2

Sameer Farooqui Mon, 20 Jun 2011 14:28:40 -0700

Hi,

I'm setting up a 3-node test cluster across multiple Amazon Availability
Zones to test cross-zone internode communication (and eventually
cross-region communication).

I'm starting with a cross-zone setup, but I'm having trouble getting the
nodes to connect to each other and join a single 3-node ring. Each node
seems to form its own ring and claim 100% of the token space.

I'm using this Beta2 distribution of Brisk:
http://debian.datastax.com/maverick/pool/brisk_1.0~beta1.2.tar.gz

I had to recreate the $BRISK_HOME/lib/ folder manually because it was
missing from the binary for some reason, and I also added the jna and mx4j
jar files to that lib directory.

The cluster is geographically located like this:

Node 1 (seed): East-A
Node 2: East-A
Node 3: East-B

The cassandra-topology.properties file on all three nodes contains this:

# Cassandra Node IP=Data Center:Rack
10.68.x.x=DC1:RAC1
10.198.x.x=DC1:RAC2
10.204.x.x=DC2:RAC1
default=DC1:RAC1
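
As a sanity check on the mapping above, here is a minimal Python sketch of how
IP=DC:RACK lines with a default entry can be read and looked up. It is only an
illustration of the file format, not the snitch's actual implementation; the
names parse_topology and lookup are hypothetical, and 192.0.2.1 is just a
placeholder for an address that is not listed in the file.

```python
def parse_topology(text):
    """Parse lines of the form IP=DC:RACK, skipping blanks and # comments."""
    mapping = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        key, _, value = line.partition("=")
        dc, _, rack = value.partition(":")
        mapping[key.strip()] = (dc.strip(), rack.strip())
    return mapping


def lookup(mapping, ip):
    """Return (dc, rack) for ip, falling back to the 'default' entry."""
    return mapping.get(ip, mapping.get("default"))


# Same content as the file shown above (addresses are masked in the post).
SAMPLE = """\
# Cassandra Node IP=Data Center:Rack
10.68.x.x=DC1:RAC1
10.198.x.x=DC1:RAC2
10.204.x.x=DC2:RAC1
default=DC1:RAC1
"""

topology = parse_topology(SAMPLE)
print(lookup(topology, "10.204.x.x"))  # listed explicitly
print(lookup(topology, "192.0.2.1"))   # not listed, uses the default entry
```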


Finally, here are the relevant sections of the YAML file on each node:

++ Node 1 ++
cluster_name: 'Test Cluster'
initial_token: 0
auto_bootstrap: false
partitioner: org.apache.cassandra.dht.RandomPartitioner
- seeds: 50.17.x.x    #This is the elastic IP for Node 1
listen_address: 10.68.x.x
rpc_address: 0.0.0.0
endpoint_snitch: org.apache.cassandra.locator.PropertyFileSnitch
encryption_options:
    internode_encryption: none

++ Node 2 ++
cluster_name: 'Test Cluster'
initial_token: 56713727820156410577229101238628035242
auto_bootstrap: true
partitioner: org.apache.cassandra.dht.RandomPartitioner
- seeds: 50.17.x.x    #This is the elastic IP for Node 1
listen_address: 10.198.x.x
rpc_address: 0.0.0.0
endpoint_snitch: org.apache.cassandra.locator.PropertyFileSnitch
encryption_options:
    internode_encryption: none

++ Node 3 ++
cluster_name: 'Test Cluster'
initial_token: 113427455640312821154458202477256070485
auto_bootstrap: true
partitioner: org.apache.cassandra.dht.RandomPartitioner
- seeds: 50.17.x.x    #This is the elastic IP for Node 1
listen_address: 10.204.x.x
rpc_address: 0.0.0.0
endpoint_snitch: org.apache.cassandra.locator.PropertyFileSnitch
encryption_options:
    internode_encryption: none
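
For reference, the initial_token values above are evenly spaced over the
RandomPartitioner range of 0 to 2**127. A minimal Python sketch of that
calculation follows; the function name balanced_tokens is hypothetical, and
integer division is assumed for rounding.

```python
# RandomPartitioner tokens fall in the range 0 .. 2**127.
RING_SIZE = 2 ** 127


def balanced_tokens(node_count):
    """Return evenly spaced tokens, one per node, using integer division."""
    return [i * RING_SIZE // node_count for i in range(node_count)]


tokens = balanced_tokens(3)
for node, token in enumerate(tokens, start=1):
    print(f"Node {node}: initial_token: {token}")
```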


When I start Cassandra on all three nodes using "sudo bin/brisk cassandra",
the startup log doesn't show any warnings or errors. The end of the startup
log on Node 1 says:
 INFO [main] 2011-06-20 21:06:57,702 MessagingService.java (line 201)
Starting Messaging Service on port 7000
 INFO [main] 2011-06-20 21:06:57,723 StorageService.java (line 482) Using
saved token 0
 INFO [main] 2011-06-20 21:06:57,724 ColumnFamilyStore.java (line 1011)
Enqueuing flush of Memtable-LocationInfo@1260987126(38/47 serialized/live
bytes, 2 ops)
 INFO [FlushWriter:1] 2011-06-20 21:06:57,724 Memtable.java (line 237)
Writing Memtable-LocationInfo@1260987126(38/47 serialized/live bytes, 2 ops)
 INFO [FlushWriter:1] 2011-06-20 21:06:57,809 Memtable.java (line 254)
Completed flushing /raiddrive/data/system/LocationInfo-g-12-Data.db (148
bytes)
 INFO [CompactionExecutor:2] 2011-06-20 21:06:57,812 CompactionManager.java
(line 539) Compacting Major:
[SSTableReader(path='/raiddrive/data/system/LocationInfo-g-9-Data.db'),
SSTableReader(path='/raiddrive/data/system/LocationInfo-g-11-Data.db'),
SSTableReader(path='/raiddrive/data/system/LocationInfo-g-10-Data.db'),
SSTableReader(path='/raiddrive/data/system/LocationInfo-g-12-Data.db')]
 INFO [CompactionExecutor:2] 2011-06-20 21:06:57,828 CompactionIterator.java
(line 186) Major@1110828771(system, LocationInfo, 429/808) now compacting at
16777 bytes/ms.
 INFO [main] 2011-06-20 21:06:57,881 Mx4jTool.java (line 67) mx4j
successfuly loaded
 INFO [CompactionExecutor:2] 2011-06-20 21:06:57,909 CompactionManager.java
(line 603) Compacted to
/raiddrive/data/system/LocationInfo-tmp-g-13-Data.db.  808 to 432 (~53% of
original) bytes for 3 keys.  Time: 97ms.
 INFO [main] 2011-06-20 21:06:57,953 BriskDaemon.java (line 146) Binding
thrift service to /0.0.0.0:9160
 INFO [main] 2011-06-20 21:06:57,955 BriskDaemon.java (line 160) Using
TFastFramedTransport with a max frame size of 15728640 bytes.
 INFO [Thread-4] 2011-06-20 21:06:57,958 BriskDaemon.java (line 187)
Listening for thrift clients...


And the end of the log on node 2 says:
 INFO [main] 2011-06-20 21:06:57,899 StorageService.java (line 368)
Cassandra version: 0.8.0-beta2-SNAPSHOT
 INFO [main] 2011-06-20 21:06:57,901 StorageService.java (line 369) Thrift
API version: 19.10.0
 INFO [main] 2011-06-20 21:06:57,901 StorageService.java (line 382) Loading
persisted ring state
 INFO [main] 2011-06-20 21:06:57,904 StorageService.java (line 418) Starting
up server gossip
 INFO [main] 2011-06-20 21:06:57,915 ColumnFamilyStore.java (line 1011)
Enqueuing flush of Memtable-LocationInfo@885597447(29/36 serialized/live
bytes, 1 ops)
 INFO [FlushWriter:1] 2011-06-20 21:06:57,916 Memtable.java (line 237)
Writing Memtable-LocationInfo@885597447(29/36 serialized/live bytes, 1 ops)
 INFO [FlushWriter:1] 2011-06-20 21:06:57,990 Memtable.java (line 254)
Completed flushing /raiddrive/data/system/LocationInfo-g-8-Data.db (80
bytes)
 INFO [CompactionExecutor:1] 2011-06-20 21:06:58,000 CompactionManager.java
(line 539) Compacting Major:
[SSTableReader(path='/raiddrive/data/system/LocationInfo-g-6-Data.db'),
SSTableReader(path='/raiddrive/data/system/LocationInfo-g-8-Data.db'),
SSTableReader(path='/raiddrive/data/system/LocationInfo-g-7-Data.db'),
SSTableReader(path='/raiddrive/data/system/LocationInfo-g-5-Data.db')]
 INFO [main] 2011-06-20 21:06:58,007 MessagingService.java (line 201)
Starting Messaging Service on port 7000
 INFO [CompactionExecutor:1] 2011-06-20 21:06:58,015 CompactionIterator.java
(line 186) Major@291813814(system, LocationInfo, 467/770) now compacting at
16777 bytes/ms.
 INFO [main] 2011-06-20 21:06:58,032 StorageService.java (line 482) Using
saved token 56713727820156410577229101238628035242
 INFO [main] 2011-06-20 21:06:58,033 ColumnFamilyStore.java (line 1011)
Enqueuing flush of Memtable-LocationInfo@934909150(53/66 serialized/live
bytes, 2 ops)
 INFO [FlushWriter:1] 2011-06-20 21:06:58,033 Memtable.java (line 237)
Writing Memtable-LocationInfo@934909150(53/66 serialized/live bytes, 2 ops)
 INFO [FlushWriter:1] 2011-06-20 21:06:58,157 Memtable.java (line 254)
Completed flushing /raiddrive/data/system/LocationInfo-g-10-Data.db (163
bytes)
 INFO [CompactionExecutor:1] 2011-06-20 21:06:58,169 CompactionManager.java
(line 603) Compacted to /raiddrive/data/system/LocationInfo-tmp-g-9-Data.db.
 770 to 447 (~58% of original) bytes for 3 keys.  Time: 168ms.
 INFO [main] 2011-06-20 21:06:58,206 Mx4jTool.java (line 67) mx4j
successfuly loaded
 INFO [main] 2011-06-20 21:06:58,249 BriskDaemon.java (line 146) Binding
thrift service to /0.0.0.0:9160
 INFO [main] 2011-06-20 21:06:58,252 BriskDaemon.java (line 160) Using
TFastFramedTransport with a max frame size of 15728640 bytes.
 INFO [Thread-4] 2011-06-20 21:06:58,254 BriskDaemon.java (line 187)
Listening for thrift clients...


Running nodetool ring on Node1 shows:
ubuntu@ip-10-68-x-x:~/brisk-1.0~beta1.2/resources/cassandra$ bin/nodetool -h localhost ring
Address         Status State   Load            Owns    Token
10.68.x.x     Up     Normal  10.9 KB         100.00% 0

nodetool ring on Node2 shows:
ubuntu@domU-12-31-39-10-x-x:~/brisk-1.0~beta1.2/resources/cassandra$ bin/nodetool -h localhost ring
Address         Status State   Load            Owns    Token
10.198.x.x  Up     Normal  15.21 KB        100.00% 56713727820156410577229101238628035242


I have also tried placing all three nodes in the same data center, like
this, with no luck:
10.68.x.x=DC1:RAC1
10.198.x.x=DC1:RAC2
10.204.x.x=DC1:RAC3

After this change, each node still joins its own ring and claims 100% of
it. Here are the full startup logs with just one data center specified in
the cassandra-topology.properties file:

Node 1: http://pastebin.com/Vzy2u9WB
Node 2: http://pastebin.com/rqGy5Asy

On a side note, I have also tried switching the snitch in the YAML file on
all three nodes to BriskSimpleSnitch. The problem persists: the nodes still
don't join the same ring and show the same symptoms. So I'm guessing the
problem is not necessarily the snitch, but something else?

I can ping each node from the others, and the following are open between
the nodes: ICMP, and TCP ports 1024-65535, 7000, 7199, 8012, 8888
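
Ping only exercises ICMP, so a TCP-level check of the individual ports may be
more telling. Below is a minimal Python sketch of such a check, to be run from
one node against another. The function name can_connect is hypothetical, and
PEER is a placeholder that would need to be replaced with the address under
test (for example the seed address from the YAML file); the port list is the
one given above.

```python
import socket


def can_connect(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return False


if __name__ == "__main__":
    # Placeholder: substitute the address of the node being tested.
    PEER = "127.0.0.1"
    for port in (7000, 7199, 8012, 8888):
        state = "open" if can_connect(PEER, port) else "not reachable"
        print(f"{PEER}:{port} {state}")
```

A port reported as not reachable here only means that nothing accepted a TCP
connection on that address and port from this machine; it does not say whether
the cause is a firewall rule or a service bound to a different interface.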


Questions:

1) What am I doing wrong that's preventing the nodes from seeing each other
and joining one ring? What should I look at more closely to troubleshoot
this?

2) Would it help to troubleshoot this if I turn on DEBUG logging for
Cassandra and then restart the "bin/brisk cassandra" service?
