Mark,

I see this output in my log many times over for 2 nodes.  We have a cron entry 
across all clusters that forces a full GC at 2 AM.  For node1, the cause was the 
scheduled full GC (I can disable this).  For node2, it was a full GC that 
occurred during our peak operation (these happen occasionally; we’ve been 
working to reduce them).  A few questions:


1)      Will any node leaving the cluster while streaming force us to bootstrap 
all over again?  If so, is this addressed in future versions?

2)      We have too much data to limit the migration to non-production hours.  
How can we keep full GCs from impacting bootstrapping?  Should we increase 
phi_convict_threshold?

Parag



From: Mark Reddy [mailto:mark.re...@boxever.com]
Sent: Wednesday, July 30, 2014 7:58 AM
To: user@cassandra.apache.org
Subject: Re: bootstrapping new nodes on 1.2.12

Thanks for the detailed response.  I checked ‘nodetool netstats’ and I see 
there are pending streams, all of which are stuck at 0%.  I was expecting to 
see at least one output that was more than 0%.  Have you seen this before?

This could indicate that the bootstrap process is hung due to a failed 
streaming session. Can you check your logs for the following line:

AbstractStreamSession.java (line 110) Stream failed because /xxx.xxx.xxx.xxx 
died or was restarted/removed (streams may still be active in background, but 
further streams won't be started)

If that is the case, you will need to wipe the node and begin the bootstrapping 
process again.
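
A minimal sketch of that log check, in Python. It assumes the message appears on a single line in the Cassandra system log (it is only wrapped here by the mail client); the function name and the log path in the usage note are illustrative assumptions, not part of Cassandra.

```python
import re
from typing import Iterable, List

# Matches the failure message quoted above and captures the peer address.
STREAM_FAILED = re.compile(
    r"Stream failed because /(\S+) died or was restarted/removed"
)


def failed_stream_peers(lines: Iterable[str]) -> List[str]:
    """Return the peer address from every 'Stream failed' log line."""
    peers = []
    for line in lines:
        match = STREAM_FAILED.search(line)
        if match:
            peers.append(match.group(1))
    return peers
```

To use it, pass an open log file, for example `failed_stream_peers(open("/var/log/cassandra/system.log"))`; the path is an assumption and depends on how the node was installed. Any address returned means at least one streaming session to the bootstrapping node failed.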


Mark


On Wed, Jul 30, 2014 at 12:03 PM, Parag Patel 
<ppa...@clearpoolgroup.com> wrote:
Thanks for the detailed response.  I checked ‘nodetool netstats’ and I see 
there are pending streams, all of which are stuck at 0%.  I was expecting to 
see at least one output that was more than 0%.  Have you seen this before?

Side question – does a new node stream from other nodes in any particular 
order?  Perhaps this is a coincidence, but if I were to sort my hostnames in 
alphabetical order, it’s currently streaming from the last 2.

From: Mark Reddy [mailto:mark.re...@boxever.com]
Sent: Wednesday, July 30, 2014 4:42 AM
To: user@cassandra.apache.org
Subject: Re: bootstrapping new nodes on 1.2.12

Hi Parag,

1)      Would anyone be able to help me interpret this information from 
OpsCenter?

At a high level, bootstrapping a new node has two phases: streaming and 
secondary index builds. I believe OpsCenter reports only active streams as in 
progress; pending streams are listed as such. In OpsCenter, rather than 
looking at the Data Size, check the used space on the Storage Capacity pie 
chart. This shows how much data is on disk but not necessarily live on the 
node yet.

Personally, I would check 'nodetool netstats' to see which streams remain. It 
lists all active and pending streams and the files to be streamed. At the 
moment you might just be streaming some very large files, and once they 
complete you will see a dramatic increase in data size.

If streaming is complete and you use secondary indexes, check 'nodetool 
compactionstats' for any secondary index builds that may be taking place.


2)      Is there anything I can do to speed this up?

If you have the capacity, you could increase 
stream_throughput_outbound_megabits_per_sec in your cassandra.yaml.
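
For a sense of scale, the figures from the original message (0.9TB to 1.1TB in 12 hours) can be turned into an observed rate in the same units as that setting. This is only a rough sketch: it assumes decimal terabytes, a constant rate, and that the new node ends up at the 1.7TB cluster average (with 13 nodes it may well hold less). The function name is illustrative.

```python
def bootstrap_estimate(start_tb, end_tb, hours, target_tb):
    """Return (observed rate in megabits/s, hours left at that rate)."""
    rate_tb_per_hour = (end_tb - start_tb) / hours
    # TB/hour -> bytes/hour -> bits/hour -> megabits/hour -> megabits/second
    mbit_per_sec = rate_tb_per_hour * 1e12 * 8 / 1e6 / 3600
    hours_left = (target_tb - end_tb) / rate_tb_per_hour
    return mbit_per_sec, hours_left


rate, remaining = bootstrap_estimate(0.9, 1.1, 12, 1.7)
print(f"{rate:.0f} Mbit/s observed, about {remaining:.0f} hours left")
```

With these inputs the observed inbound rate works out to roughly 37 megabits per second, with about 36 hours remaining. Note that the setting throttles each sending node's outbound traffic, while this number is the aggregate arriving at the new node, so the two are not directly comparable.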

If you don't have the capacity you could add more nodes to spread the data so 
you stream less in future.

Finally you could upgrade to 2.0.x as it contains a complete refactor of 
streaming and should make your streaming sessions more robust and transparent: 
https://issues.apache.org/jira/browse/CASSANDRA-5286


Mark

On Wed, Jul 30, 2014 at 3:15 AM, Parag Patel 
<ppa...@clearpoolgroup.com> wrote:
Hi,

It’s taking a while to bootstrap a 13th node into a 12-node cluster.  The 
average node size is about 1.7TB.  At the beginning of today we were close to 
.9TB on the new node and 12 hours later we’re at 1.1TB.  I figured it would 
have finished by now because when I was looking on OpsCenter, there were 2 
transfers remaining.  1 was at 0% and the other was at 2%.  I look again now 
and those same nodes haven’t progressed all day.  Instead I see 9 more 
transfers (some of which are progressing).


1)      Would anyone be able to help me interpret this information from 
OpsCenter?

2)      Is there anything I can do to speed this up?

Thanks,
Parag



