Compaction is backed up; that may be normal write load (because of the rack imbalance), or it may be a secondary index build. Hard to say for sure. nodetool compactionstats would help, if you're able to provide it. The jstack is probably not necessary: streaming is being marked as failed and the node is turning itself off. Not sure why streaming is marked as failing, though; anything on the sending sides?
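(For anyone following along, those checks are all stock nodetool/shell commands; something like the following on the joining node and on the nodes sending to it. The log path is the package-install default and may differ on your hosts.)

    nodetool compactionstats   # pending/active compactions; secondary index builds generally show up here too
    nodetool tpstats           # thread pool backlog; watch CompactionExecutor Active/Pending
    nodetool netstats          # streaming progress; run on both the sender and the receiver
    grep -i stream /var/log/cassandra/system.log | tail -n 50   # recent stream/session errors on the sending side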
From: Brian Spindler <brian.spind...@gmail.com>
Reply-To: <user@cassandra.apache.org>
Date: Saturday, August 12, 2017 at 6:34 PM
To: <user@cassandra.apache.org>
Subject: Re: Dropping down replication factor

Thanks for replying Jeff. Responses below.

On Sat, Aug 12, 2017 at 8:33 PM Jeff Jirsa <jji...@gmail.com> wrote:

> Answers inline
>
> --
> Jeff Jirsa
>
>> > On Aug 12, 2017, at 2:58 PM, brian.spind...@gmail.com wrote:
>> >
>> > Hi folks, hopefully a quick one:
>> >
>> > We are running a 12 node cluster (2.1.15) in AWS with Ec2Snitch. It's all in one region but spread across 3 availability zones. It was nicely balanced with 4 nodes in each.
>> >
>> > But with a couple of failures and subsequent provisions to the wrong az we now have a cluster with:
>> >
>> > 5 nodes in az A
>> > 5 nodes in az B
>> > 2 nodes in az C
>> >
>> > Not sure why, but when adding a third node in AZ C it fails to stream after getting all the way to completion and no apparent error in logs. I've looked at a couple of bugs referring to scrubbing and possible OOM bugs due to metadata writing at end of streaming (sorry don't have ticket handy). I'm worried I might not be able to do much with these since the disk space usage is high and they are under a lot of load given the small number of them for this rack.
>
> You'll definitely have higher load on az C instances with rf=3 in this ratio
>
> Streaming should still work - are you sure it's not busy doing something? Like building secondary index or similar? jstack thread dump would be useful, or at least nodetool tpstats
>

Only other thing might be a backup. We do incrementals x1hr and snapshots x24h; they are shipped to s3 then links are cleaned up.

The error I get on the node I'm trying to add to rack C is:

ERROR [main] 2017-08-12 23:54:51,546 CassandraDaemon.java:583 - Exception encountered during startup
java.lang.RuntimeException: Error during boostrap: Stream failed
        at org.apache.cassandra.dht.BootStrapper.bootstrap(BootStrapper.java:87) ~[apache-cassandra-2.1.15.jar:2.1.15]
        at org.apache.cassandra.service.StorageService.bootstrap(StorageService.java:1166) ~[apache-cassandra-2.1.15.jar:2.1.15]
        at org.apache.cassandra.service.StorageService.joinTokenRing(StorageService.java:944) ~[apache-cassandra-2.1.15.jar:2.1.15]
        at org.apache.cassandra.service.StorageService.initServer(StorageService.java:740) ~[apache-cassandra-2.1.15.jar:2.1.15]
        at org.apache.cassandra.service.StorageService.initServer(StorageService.java:617) ~[apache-cassandra-2.1.15.jar:2.1.15]
        at org.apache.cassandra.service.CassandraDaemon.setup(CassandraDaemon.java:391) [apache-cassandra-2.1.15.jar:2.1.15]
        at org.apache.cassandra.service.CassandraDaemon.activate(CassandraDaemon.java:566) [apache-cassandra-2.1.15.jar:2.1.15]
        at org.apache.cassandra.service.CassandraDaemon.main(CassandraDaemon.java:655) [apache-cassandra-2.1.15.jar:2.1.15]
Caused by: org.apache.cassandra.streaming.StreamException: Stream failed
        at org.apache.cassandra.streaming.management.StreamEventJMXNotifier.onFailure(StreamEventJMXNotifier.java:85) ~[apache-cassandra-2.1.15.jar:2.1.15]
        at com.google.common.util.concurrent.Futures$4.run(Futures.java:1172) ~[guava-16.0.jar:na]
        at com.google.common.util.concurrent.MoreExecutors$SameThreadExecutorService.execute(MoreExecutors.java:297) ~[guava-16.0.jar:na]
        at com.google.common.util.concurrent.ExecutionList.executeListener(ExecutionList.java:156) ~[guava-16.0.jar:na]
        at com.google.common.util.concurrent.ExecutionList.execute(ExecutionList.java:145) ~[guava-16.0.jar:na]
        at com.google.common.util.concurrent.AbstractFuture.setException(AbstractFuture.java:202) ~[guava-16.0.jar:na]
        at org.apache.cassandra.streaming.StreamResultFuture.maybeComplete(StreamResultFuture.java:209) ~[apache-cassandra-2.1.15.jar:2.1.15]
        at org.apache.cassandra.streaming.StreamResultFuture.handleSessionComplete(StreamResultFuture.java:185) ~[apache-cassandra-2.1.15.jar:2.1.15]
        at org.apache.cassandra.streaming.StreamSession.closeSession(StreamSession.java:413) ~[apache-cassandra-2.1.15.jar:2.1.15]
        at org.apache.cassandra.streaming.StreamSession.maybeCompleted(StreamSession.java:700) ~[apache-cassandra-2.1.15.jar:2.1.15]
        at org.apache.cassandra.streaming.StreamSession.taskCompleted(StreamSession.java:661) ~[apache-cassandra-2.1.15.jar:2.1.15]
        at org.apache.cassandra.streaming.StreamReceiveTask$OnCompletionRunnable.run(StreamReceiveTask.java:179) ~[apache-cassandra-2.1.15.jar:2.1.15]
        at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) ~[na:1.8.0_112]
        at java.util.concurrent.FutureTask.run(FutureTask.java:266) ~[na:1.8.0_112]
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) ~[na:1.8.0_112]
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) ~[na:1.8.0_112]
        at java.lang.Thread.run(Thread.java:745) ~[na:1.8.0_112]
WARN  [StorageServiceShutdownHook] 2017-08-12 23:54:51,582 Gossiper.java:1462 - No local state or state is in silent shutdown, not announcing shutdown
INFO  [StorageServiceShutdownHook] 2017-08-12 23:54:51,582 MessagingService.java:734 - Waiting for messaging service to quiesce
INFO  [ACCEPT-/10.40.17.114] 2017-08-12 23:54:51,583 MessagingService.java:1020 - MessagingService has terminated the accept() thread

And I got this on this same node when it was bootstrapping; I ran 'nodetool netstats' just before it shut down:

Receiving 377 files, 161928296443 bytes total. Already received 377 files, 161928296443 bytes total

tpstats on the host that was streaming the data to this node:

Pool Name                    Active   Pending      Completed   Blocked  All time blocked
MutationStage                     1         1     4488289014         0                 0
ReadStage                         0         0       24486526         0                 0
RequestResponseStage              0         0     3038847374         0                 0
ReadRepairStage                   0         0        1601576         0                 0
CounterMutationStage              0         0          68403         0                 0
MiscStage                         0         0              0         0                 0
AntiEntropySessions               0         0              0         0                 0
HintedHandoff                     0         0             18         0                 0
GossipStage                       0         0        2786892         0                 0
CacheCleanupExecutor              0         0              0         0                 0
InternalResponseStage             0         0          61115         0                 0
CommitLogArchiver                 0         0              0         0                 0
CompactionExecutor                4        83         304167         0                 0
ValidationExecutor                0         0          78249         0                 0
MigrationStage                    0         0          94201         0                 0
AntiEntropyStage                  0         0         160505         0                 0
PendingRangeCalculator            0         0             30         0                 0
Sampler                           0         0              0         0                 0
MemtableFlushWriter               0         0          71270         0                 0
MemtablePostFlush                 0         0         175209         0                 0
MemtableReclaimMemory             0         0          81222         0                 0
Native-Transport-Requests         2         0     1983565628         0           9405444

Message type           Dropped
READ                       218
RANGE_SLICE                 15
_TRACE                       0
MUTATION               2949001
COUNTER_MUTATION             0
BINARY                       0
REQUEST_RESPONSE             0
PAGED_RANGE                  0
READ_REPAIR               8571

I can get a jstack if needed.

>
>> > Rather than troubleshoot this further, what I was thinking about doing was:
>> > - drop the replication factor on our keyspace to two
>
> Repair before you do this, or you'll lose your consistency guarantees

Given the load on the 2 nodes in rack C I'm hoping a repair will succeed.
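(For concreteness, the repair-then-drop-RF step is all standard tooling; roughly the following, where "my_keyspace" and the data-center name are placeholders and NetworkTopologyStrategy is assumed.)

    # 1. Make the existing replicas consistent while RF is still 3 (run on every node, one at a time)
    nodetool repair -pr my_keyspace

    # 2. Drop the replication factor from 3 to 2
    cqlsh -e "ALTER KEYSPACE my_keyspace WITH replication = {'class': 'NetworkTopologyStrategy', 'us-east': 2};"

    # 3. Remove the data each node no longer owns
    nodetool cleanup my_keyspace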
>
>> > - hopefully this would reduce load on these two remaining nodes
>
> It should; rack awareness guarantees one replica per rack if rf==num racks, so right now those 2 C machines have 2.5x as much data as the others. This will drop that requirement and drop the load significantly
>
>> > - run repairs/cleanup across the cluster
>> > - then shoot these two nodes in the 'c' rack
>
> Why shoot the c instances? Why not drop RF and then add 2 more C instances, then increase RF back to 3, run repair, then decom the extra instances in a and b?
>

Fair point. I was considering staying at RF two but I think with your points below, I should reconsider.

>> > - run repairs/cleanup across the cluster
>> >
>> > Would this work with minimal/no disruption?
>
> The big risk of running rf=2 is that quorum==all - any gc pause or node restarting will make you lose HA or strong consistency guarantees.
>
>> > Should I update their "rack" beforehand or after?
>
> You can't change a node's rack once it's in the cluster; it SHOULD refuse to start if you do that
>

Got it.

>> > What else am I not thinking about?
>> >
>> > My main goal atm is to get back to where the cluster is in a clean consistent state that allows nodes to properly bootstrap.
>> >
>> > Thanks for your help in advance.
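(And if you end up going the route Jeff suggests - add two more C nodes, move back to RF 3, then retire the extras in A and B - the command side of that is roughly the following, again with placeholder keyspace/DC names.)

    # After the two new az-C nodes have finished bootstrapping:
    cqlsh -e "ALTER KEYSPACE my_keyspace WITH replication = {'class': 'NetworkTopologyStrategy', 'us-east': 3};"
    nodetool repair -pr my_keyspace   # on every node, so the restored third replica gets its data
    nodetool decommission             # on each surplus A/B node, one at a time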