Looking at the output of "nodetool netstats", I see that the bootstrapping node 
is pulling from only two of the nine nodes currently in the datacenter. That 
surprises me: I'd think the vnodes it pulls from would be randomly spread 
across the existing nodes. We're using Cassandra 2.0.11 with 256 vnodes per node.
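
To make that expectation concrete, here is a toy sketch in Python. It is not Cassandra code and rests on assumptions of mine: tokens are drawn uniformly at random from a 64-bit space, and the "source" for each new token is simply the node that previously owned that point on the ring. It ignores replication factor, the snitch, and however Cassandra actually chooses stream sources, so it only shows why I expected a wide spread. The names primary_owner and previous_owners are made up for this illustration.

```python
import bisect
import random


def primary_owner(ring, token):
    """Return the node holding the first ring token >= `token`, wrapping around."""
    tokens = [t for t, _ in ring]
    i = bisect.bisect_left(tokens, token) % len(ring)
    return ring[i][1]


def previous_owners(num_nodes=9, vnodes=256, seed=0):
    """Set of existing nodes that owned the points where a new node's tokens land."""
    rng = random.Random(seed)
    # Existing ring: every node holds `vnodes` random tokens (toy assumption).
    ring = sorted(
        (rng.getrandbits(64), f"node{n}")
        for n in range(num_nodes)
        for _ in range(vnodes)
    )
    # The joining node also picks `vnodes` random tokens.
    new_tokens = [rng.getrandbits(64) for _ in range(vnodes)]
    return {primary_owner(ring, t) for t in new_tokens}


if __name__ == "__main__":
    owners = previous_owners()
    print(len(owners), "distinct previous owners")
```

Under these assumptions the new node's 256 tokens land in ranges owned by essentially every existing node, which is why seeing only two source nodes surprised me.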

I also notice that while bootstrapping, the node is quite busy doing 
compactions. There are over 1000 pending compactions on the new node, and it 
has not finished bootstrapping yet. I'd think those would be unnecessary, since 
the other nodes in the data center have zero pending compactions. Perhaps the 
compactions explain why running "du -hs /var/lib/cassandra/data" on the new 
node shows more disk space usage than on the old nodes.

Is it reasonable to run "nodetool disableautocompaction" on the bootstrapping 
node? Should that be the default?

If I start bootstrapping one node, it's not yet in the cluster, but it decides 
which token ranges it owns and requests streams for that data. If I then try 
to bootstrap a SECOND node concurrently, it will take over ownership of some 
token ranges from the first node. Will the first node then adjust what data it 
streams?
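
Here is a small Python sketch of the situation I'm worried about. It is a toy model, not Cassandra's implementation: I assume each token t owns the range (predecessor, t], where the predecessor is the nearest token counter-clockwise on the ring, and planned_ranges is a hypothetical helper I made up for illustration. It says nothing about what Cassandra 2.0.11 actually does when two nodes bootstrap at once.

```python
import bisect


def planned_ranges(own_tokens, other_tokens):
    """Map each own token t to the range (predecessor, t] it would own.

    The predecessor is the nearest token counter-clockwise among all tokens;
    index -1 wraps around to the largest token on the ring.
    """
    all_tokens = sorted(set(own_tokens) | set(other_tokens))
    ranges = {}
    for t in own_tokens:
        i = bisect.bisect_left(all_tokens, t)
        ranges[t] = (all_tokens[i - 1], t)
    return ranges


if __name__ == "__main__":
    existing = [0, 100]   # tokens already in the cluster
    first = [50]          # token chosen by the first bootstrapping node
    second = [30]         # token chosen by the second bootstrapping node

    # Ranges the first node computes when it starts, unaware of the second.
    print(planned_ranges(first, existing))
    # Ranges it would own once the second node's token is also on the ring.
    print(planned_ranges(first, existing + second))
```

In this model the first node plans to stream (0, 50], but once the second node's token 30 is counted it should own only (30, 50]. Whether the first node ever learns about that change is exactly my question.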

It seems to me the Cassandra server needs to keep track of both the OLD token 
ranges and vnodes and the NEW ones. I'm not convinced that running two 
bootstraps concurrently (starting the second one after several minutes of 
delay) is safe.

Thanks, Don

Donald A. Smith | Senior Software Engineer
P: 425.201.3900 x 3866
C: (206) 819-5965
F: (646) 443-2333
dona...@audiencescience.com
