Opened a ticket: https://issues.apache.org/jira/browse/CASSANDRA-5525
On Mon, Apr 29, 2013 at 2:24 AM, aaron morton <aa...@thelastpickle.com> wrote:

> is this understanding correct: "we had a 12 node cluster with 256 vnodes on each node (upgraded from 1.1); we added two additional nodes that streamed so much data (600+ GB when other nodes had 150-200 GB) during the joining phase that they filled their local disks and had to be killed"?
>
> Can you raise a ticket on https://issues.apache.org/jira/browse/CASSANDRA and update the thread with the ticket number.
>
> Can you show the output from nodetool status so we can get a feel for the ring?
> Can you include the logs from one of the nodes that failed to join?
>
> Thanks
>
> -----------------
> Aaron Morton
> Freelance Cassandra Consultant
> New Zealand
>
> @aaronmorton
> http://www.thelastpickle.com
>
> On 29/04/2013, at 10:01 AM, John Watson <j...@disqus.com> wrote:
>
> On Sun, Apr 28, 2013 at 2:19 PM, aaron morton <aa...@thelastpickle.com> wrote:
>
>>> We're going to try running a shuffle before adding a new node again... maybe that will help
>>
>> I don't think it will hurt but I doubt it will help.
>
> We had to bail on shuffle since we need to add capacity ASAP and not in 20 days.
>
>>>> It seems when new nodes join, they are streamed *all* sstables in the cluster.
>>
>> How many nodes did you join, what was the num_tokens?
>> Did you notice streaming from all nodes (in the logs) or are you saying this in response to the cluster load increasing?
>
> Was only adding 2 nodes at the time (planning to add a total of 12.)
> Starting with a cluster of 12, but now 11 since 1 node entered some weird state when one of the new nodes ran out of disk space.
> num_tokens is set to 256 on all nodes.
> Yes, nearly all current nodes were streaming to the new ones (which was great until disk space was an issue.)
>
>>>> The purple line machine, I just stopped the joining process because the main cluster was dropping mutation messages at this point on a few nodes (and it still had dozens of sstables to stream.)
>>
>> Which were the new nodes?
>> Can you show the output from nodetool status?
>
> The new nodes are the purple and gray lines above all the others.
>
> nodetool status doesn't show joining nodes. I think I saw a bug already filed for this but I can't seem to find it.
>
>> Cheers
>>
>> -----------------
>> Aaron Morton
>> Freelance Cassandra Consultant
>> New Zealand
>>
>> @aaronmorton
>> http://www.thelastpickle.com
>>
>> On 27/04/2013, at 9:35 AM, Bryan Talbot <btal...@aeriagames.com> wrote:
>>
>> I believe that "nodetool rebuild" is used to add a new datacenter, not just a new host to an existing cluster. Is that what you ran to add the node?
>>
>> -Bryan
>>
>> On Fri, Apr 26, 2013 at 1:27 PM, John Watson <j...@disqus.com> wrote:
>>
>>> Small relief we're not the only ones that had this issue.
>>>
>>> We're going to try running a shuffle before adding a new node again... maybe that will help
>>>
>>> - John
>>>
>>> On Fri, Apr 26, 2013 at 5:07 AM, Francisco Nogueira Calmon Sobral <fsob...@igcorp.com.br> wrote:
>>>
>>>> I am using the same version and observed something similar.
>>>>
>>>> I've added a new node, but the instructions from Datastax did not work for me. Then I ran "nodetool rebuild" on the new node. After this command finished, it contained twice the load of the other nodes. Even when I ran "nodetool cleanup" on the older nodes, the situation was the same.
>>>>
>>>> The problem only seemed to disappear when "nodetool repair" was applied to all nodes.
>>>>
>>>> Regards,
>>>> Francisco Sobral.
>>>>
>>>> On Apr 25, 2013, at 4:57 PM, John Watson <j...@disqus.com> wrote:
>>>>
>>>> After finally upgrading to 1.2.3 from 1.1.9, enabling vnodes, and running upgradesstables, I figured it would be safe to start adding nodes to the cluster. Guess not?
>>>>
>>>> It seems when new nodes join, they are streamed *all* sstables in the cluster.
>>>>
>>>> https://dl.dropbox.com/s/bampemkvlfck2dt/Screen%20Shot%202013-04-25%20at%2012.35.24%20PM.png
>>>>
>>>> The gray line machine ran out of disk space, and for some reason this cascaded into errors in the cluster about 'no host id' when trying to store hints for it (even though it hadn't joined yet).
>>>> The purple line machine, I just stopped the joining process because the main cluster was dropping mutation messages at this point on a few nodes (and it still had dozens of sstables to stream.)
>>>>
>>>> I followed this: http://www.datastax.com/docs/1.2/operations/add_replace_nodes
>>>>
>>>> Is there something missing in that documentation?
>>>>
>>>> Thanks,
>>>>
>>>> John