Thanks, Robert. I didn't realize that some of the keyspaces (not all, and especially the biggest one I was focusing on) had RF > 2. I wasted 3 days on it. Thanks again for the pointers. I will try again and share the results.
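For reference, the one-at-a-time restart suggested in the quoted message below could be sketched roughly like this. It is only a minimal sketch under assumptions: `start_node` and `get_mode` are hypothetical callables that you would have to supply yourself (for example, `get_mode` might read the mode that nodetool reports for the node), and the node names are placeholders. It has no timeout or error handling, so a node that never leaves JOINING would block it forever.

```python
import time
from typing import Callable, Iterable, List


def bootstrap_sequentially(
    nodes: Iterable[str],
    start_node: Callable[[str], None],   # hypothetical: starts Cassandra on one node
    get_mode: Callable[[str], str],      # hypothetical: returns e.g. "JOINING" or "NORMAL"
    poll_seconds: float = 60.0,
    sleep: Callable[[float], None] = time.sleep,
) -> List[str]:
    """Start each node only after the previous one has finished bootstrapping.

    Returns the nodes in the order in which they reached NORMAL.
    """
    joined: List[str] = []
    for node in nodes:
        start_node(node)
        # Poll until this node reports NORMAL before touching the next one,
        # so that no two nodes are ever bootstrapping at the same time.
        while get_mode(node).upper() != "NORMAL":
            sleep(poll_seconds)
        joined.append(node)
    return joined
```

Passing `sleep` as a parameter is just a convenience so the loop can be dry-run with fake callables before pointing it at real nodes.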

On Wed, Oct 30, 2013 at 12:28 AM, Robert Coli <rc...@eventbrite.com> wrote:

> On Tue, Oct 29, 2013 at 11:45 AM, Narendra Sharma <narendra.sha...@gmail.com> wrote:
>
>> We had a cluster of 4 nodes in AWS. The average load on each node was
>> approx 750GB. We added 4 new nodes. It is now more than 30 hours and the
>> node is still in JOINING mode.
>> Specifically I am analyzing the one with IP 10.3.1.29. There is no
>> compaction or streaming or index building happening.
>
> If your cluster has RF>2, you are bootstrapping two nodes into the same
> range simultaneously. That is not supported. [1,2] The node you are having
> the problem with is in the range that is probably overlapping.
>
> If I were you I would :
>
> 1) stop all "Joining" nodes and wipe their state including system keyspace
> 2) optionally "removetoken" any nodes which remain in cluster gossip state
>    after stopping
> 3) re-start/bootstrap them one at a time, waiting for each to complete
>    bootstrapping before starting the next one
> 4) (unrelated) Upgrade from 1.1.6 to the head of 1.1.x ASAP.
>
> =Rob
> [1] https://issues.apache.org/jira/browse/CASSANDRA-2434
> [2] https://issues.apache.org/jira/browse/CASSANDRA-2434?focusedCommentId=13091851&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13091851

--
Narendra Sharma
Software Engineer
*http://www.aeris.com*
*http://narendrasharma.blogspot.com/*