Sorry, the bug was in our snitch. We were using getHostName() instead of getCanonicalHostName() to determine DC & Rack. For the local node, getHostName() returns the alias rather than the reverse DNS name, so the DC & Rack numbers were not as expected.
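For anyone who wants to check this on their own nodes, here is a minimal sketch that prints both names for the local address. The class name HostNameCheck and the helper names() are made up for illustration and are not part of our snitch; only the two InetAddress calls are the ones mentioned above. What each call returns depends on the host's resolver and DNS setup, so no particular output is implied.

```java
import java.net.InetAddress;
import java.net.UnknownHostException;

public class HostNameCheck {
    // Hypothetical helper: returns { getHostName(), getCanonicalHostName() }
    // for one address so the two values can be compared side by side.
    static String[] names(InetAddress address) {
        return new String[] {
            address.getHostName(),          // may be the name the address was created with (e.g. an alias)
            address.getCanonicalHostName()  // asks the name service for the fully qualified name
        };
    }

    public static void main(String[] args) throws UnknownHostException {
        InetAddress local = InetAddress.getLocalHost();
        String[] n = names(local);
        System.out.println("getHostName():          " + n[0]);
        System.out.println("getCanonicalHostName(): " + n[1]);
    }
}
```

If the two lines differ on a node, a snitch that parses DC and rack out of the host name will see different input depending on which call it uses.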
Best regards / Pagarbiai
Viktor Jevdokimov
Senior Developer
Email: viktor.jevdoki...@adform.com
Phone: +370 5 212 3063. Fax: +370 5 261 0453
J. Jasinskio 16C, LT-01112 Vilnius, Lithuania

From: Viktor Jevdokimov [mailto:viktor.jevdoki...@adform.com]
Sent: Thursday, December 01, 2011 14:05
To: user@cassandra.apache.org
Subject: NetworkTopologyStrategy bug?

Assume for now we have 1 DC and 1 rack with 3 nodes. We use our own snitch, which returns DC=0, Rack=0 for this case. The ring looks like this:

Address    DC  Rack  Token
                     113427455640312821154458202477256070484
10.0.0.1   0   0     0
10.0.0.2   0   0     56713727820156410577229101238628035242
10.0.0.3   0   0     113427455640312821154458202477256070484

Schema: ReplicaPlacementStrategy=NetworkTopologyStrategy, options: [0:2] (2 replicas in DC 0).
When trying to run cleanup (same problem with repair), Cassandra reports:

From 10.0.0.1:
    DEBUG [time] 10.0.0.2,10.0.0.3 endpoints in datacenter 0 for token 0
    DEBUG [time] 10.0.0.2,10.0.0.3 endpoints in datacenter 0 for token 56713727820156410577229101238628035242
    DEBUG [time] 10.0.0.3,10.0.0.2 endpoints in datacenter 0 for token 113427455640312821154458202477256070484
    INFO [time] Cleanup cannot run before a node has joined the ring

From 10.0.0.2:
    DEBUG [time] 10.0.0.1,10.0.0.3 endpoints in datacenter 0 for token 0
    DEBUG [time] 10.0.0.1,10.0.0.3 endpoints in datacenter 0 for token 56713727820156410577229101238628035242
    DEBUG [time] 10.0.0.3,10.0.0.1 endpoints in datacenter 0 for token 113427455640312821154458202477256070484
    INFO [time] Cleanup cannot run before a node has joined the ring

From 10.0.0.3:
    DEBUG [time] 10.0.0.1,10.0.0.2 endpoints in datacenter 0 for token 0
    DEBUG [time] 10.0.0.1,10.0.0.2 endpoints in datacenter 0 for token 56713727820156410577229101238628035242
    DEBUG [time] 10.0.0.2,10.0.0.1 endpoints in datacenter 0 for token 113427455640312821154458202477256070484
    INFO [time] Cleanup cannot run before a node has joined the ring

To me this means that each node thinks the whole data range is on the other two nodes. As a result:

A WRITE request with any key/any token sent to 10.0.0.1 (as coordinator) will be forwarded to and saved on 10.0.0.2 and 10.0.0.3.

A READ request with CL.ONE with any key/any token sent to 10.0.0.2 (as coordinator) will be forwarded to 10.0.0.1 or 10.0.0.3. Since 10.0.0.1 does not hold the data from the write above, some requests fail and some don't (if 10.0.0.3 answers).

Moreover, every READ request to any node will be forwarded to another node.

That is what we have right now with 0.8.6 and up to 1.0.5, both with 3 nodes in 1 DC and with 8x2 nodes.
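The endpoint lists in the DEBUG output can be reproduced with a small simulation. This is a simplified sketch, not Cassandra's NetworkTopologyStrategy code: it walks the ring clockwise from the token and keeps the first 2 nodes that the snitch places in DC "0", ignoring racks. It rests on one loud assumption, based only on the diagnosis at the top of this thread: each node's snitch resolves the node itself to some other DC (labelled "unresolved" here, a made-up value) while resolving its peers to DC "0". The names ReplicaSketch, endpoints, ring, viewFrom and correctView are all hypothetical.

```java
import java.math.BigInteger;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class ReplicaSketch {
    // Walk the ring clockwise, starting at the first node whose token is >= the
    // given token, and collect up to rf nodes that this snitch view puts in targetDc.
    static List<String> endpoints(TreeMap<BigInteger, String> ring,
                                  Map<String, String> dcByNode,
                                  String targetDc, int rf, BigInteger token) {
        List<String> ordered = new ArrayList<>(ring.tailMap(token, true).values());
        ordered.addAll(ring.headMap(token, false).values()); // wrap around the ring
        List<String> result = new ArrayList<>();
        for (String node : ordered) {
            if (result.size() == rf) break;
            if (targetDc.equals(dcByNode.get(node))) result.add(node);
        }
        return result;
    }

    // The 3-node ring from the message above.
    static TreeMap<BigInteger, String> ring() {
        TreeMap<BigInteger, String> ring = new TreeMap<>();
        ring.put(BigInteger.ZERO, "10.0.0.1");
        ring.put(new BigInteger("56713727820156410577229101238628035242"), "10.0.0.2");
        ring.put(new BigInteger("113427455640312821154458202477256070484"), "10.0.0.3");
        return ring;
    }

    // Snitch view in which every node is in DC "0" (the intended behaviour).
    static Map<String, String> correctView() {
        Map<String, String> view = new HashMap<>();
        for (String node : ring().values()) view.put(node, "0");
        return view;
    }

    // ASSUMPTION: the buggy snitch maps the local node to a different DC.
    static Map<String, String> viewFrom(String localNode) {
        Map<String, String> view = correctView();
        view.put(localNode, "unresolved"); // hypothetical placeholder value
        return view;
    }

    public static void main(String[] args) {
        TreeMap<BigInteger, String> ring = ring();
        for (String local : ring.values()) {
            System.out.println("From " + local + ":");
            for (BigInteger token : ring.keySet()) {
                System.out.println("  " + endpoints(ring, viewFrom(local), "0", 2, token)
                        + " endpoints in datacenter 0 for token " + token);
            }
        }
    }
}
```

Under that assumption, the view from each node never contains the node itself, which is the pattern in the logs above. With correctView() the replicas for token 0 are 10.0.0.1 and 10.0.0.2, as one would expect for 2 replicas on this ring.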