Is this understanding correct: "we had a 12-node cluster with 256 vnodes on each 
node (upgraded from 1.1); we added two additional nodes that streamed so much 
data (600+GB when other nodes had 150-200GB) during the joining phase that they 
filled their local disks and had to be killed"?

Can you raise a ticket at https://issues.apache.org/jira/browse/CASSANDRA and 
update this thread with the ticket number?

Can you show the output from nodetool status so we can get a feel for the ring?
Can you include the logs from one of the nodes that failed to join?
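
For example, something like this (log path assumes a package install):

    nodetool status
    grep -i stream /var/log/cassandra/system.log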

Thanks

-----------------
Aaron Morton
Freelance Cassandra Consultant
New Zealand

@aaronmorton
http://www.thelastpickle.com

On 29/04/2013, at 10:01 AM, John Watson <j...@disqus.com> wrote:

> On Sun, Apr 28, 2013 at 2:19 PM, aaron morton <aa...@thelastpickle.com> wrote:
>>> We're going to try running a shuffle before adding a new node again... maybe 
>>> that will help
> 
>> I don't think it will hurt, but I doubt it will help. 
> 
> We had to bail on shuffle since we need to add capacity ASAP, not in 20 
> days.
>  
> 
>>> It seems when new nodes join, they are streamed *all* sstables in the 
>>> cluster.
>> 
>> How many nodes did you join, and what was num_tokens set to? 
>> Did you notice streaming from all nodes (in the logs), or are you saying this 
>> in response to the cluster load increasing? 
> 
>  
> We were only adding 2 nodes at the time (planning to add a total of 12). We 
> started with a cluster of 12, but are now at 11 since one node entered some 
> weird state when one of the new nodes ran out of disk space.
> num_tokens is set to 256 on all nodes.
> Yes, nearly all current nodes were streaming to the new ones (which was great 
> until disk space became an issue).
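> 
> (For reference, the vnodes setting in cassandra.yaml, the same on every node:)
> 
>     num_tokens: 256
> 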
>>> The purple line machine, I just stopped the joining process because the 
>>> main cluster was dropping mutation messages at this point on a few nodes 
>>> (and it still had dozens of sstables to stream.)
>> Which were the new nodes?
>> Can you show the output from nodetool status?
> 
> 
> The new nodes are the purple and gray lines above all the others.
> 
> nodetool status doesn't show joining nodes. I think I saw a bug already filed 
> for this, but I can't seem to find it.
>  
> 
> Cheers
> 
> -----------------
> Aaron Morton
> Freelance Cassandra Consultant
> New Zealand
> 
> @aaronmorton
> http://www.thelastpickle.com
> 
> On 27/04/2013, at 9:35 AM, Bryan Talbot <btal...@aeriagames.com> wrote:
> 
>> I believe that "nodetool rebuild" is used to add a new datacenter, not just 
>> a new host to an existing cluster.  Is that what you ran to add the node?
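>> 
>> For reference, rebuild is typically run with the name of an existing 
>> datacenter to stream from, e.g. (DC name illustrative):
>> 
>>     nodetool rebuild us-east    # stream data from the existing "us-east" DC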
>> 
>> -Bryan
>> 
>> 
>> 
>> On Fri, Apr 26, 2013 at 1:27 PM, John Watson <j...@disqus.com> wrote:
>> It's a small relief we're not the only ones who've hit this issue.
>> 
>> We're going to try running a shuffle before adding a new node again... maybe 
>> that will help
>> 
>> - John
>> 
>> 
>> On Fri, Apr 26, 2013 at 5:07 AM, Francisco Nogueira Calmon Sobral 
>> <fsob...@igcorp.com.br> wrote:
>> I am using the same version and observed something similar.
>> 
>> I've added a new node, but the instructions from Datastax did not work for 
>> me. Then I ran "nodetool rebuild" on the new node. After that command 
>> finished, it held twice the load of the other nodes. Even after I ran 
>> "nodetool cleanup" on the older nodes, the situation was the same.
>> 
>> The problem only seemed to disappear when "nodetool repair" was applied to 
>> all nodes.
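>> 
>> Roughly this sequence (a sketch; I have left out any keyspace arguments):
>> 
>>     nodetool rebuild    # on the new node
>>     nodetool cleanup    # then on each of the older nodes
>>     nodetool repair     # finally, on every node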
>> 
>> Regards,
>> Francisco Sobral.
>> 
>> 
>> 
>> 
>> On Apr 25, 2013, at 4:57 PM, John Watson <j...@disqus.com> wrote:
>> 
>>> After finally upgrading to 1.2.3 from 1.1.9, enabling vnodes, and running 
>>> upgradesstables, I figured it would be safe to start adding nodes to the 
>>> cluster. Guess not?
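>>> 
>>> (Roughly, per node: upgrade the binaries from 1.1.9 to 1.2.3, set 
>>> num_tokens: 256 in cassandra.yaml, restart, then:)
>>> 
>>>     nodetool upgradesstables    # rewrite sstables in the new on-disk format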
>>> 
>>> It seems when new nodes join, they are streamed *all* sstables in the 
>>> cluster.
>>> 
>>> https://dl.dropbox.com/s/bampemkvlfck2dt/Screen%20Shot%202013-04-25%20at%2012.35.24%20PM.png
>>> 
>>> The gray line machine ran out of disk space, and for some reason this 
>>> cascaded into errors across the cluster about 'no host id' when trying to 
>>> store hints for it (even though it hadn't joined yet).
>>> For the purple line machine, I just stopped the joining process because the 
>>> main cluster was dropping mutation messages on a few nodes at that point 
>>> (and it still had dozens of sstables to stream).
>>> 
>>> I followed this: 
>>> http://www.datastax.com/docs/1.2/operations/add_replace_nodes
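>>> 
>>> (My rough paraphrase of that page, not the full checklist; the service 
>>> command assumes a package install:)
>>> 
>>>     # On the new node, set cluster_name, seeds, listen_address and
>>>     # num_tokens: 256 in cassandra.yaml. auto_bootstrap defaults to true,
>>>     # so on first start the node streams its ranges from the cluster.
>>>     sudo service cassandra start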
>>> 
>>> Is there something missing in that documentation?
>>> 
>>> Thanks,
>>> 
>>> John
>> 
>> 
>> 
> 
> 
