[ https://issues.apache.org/jira/browse/CASSANDRA-15141?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16876492#comment-16876492 ]
Jay Zhuang commented on CASSANDRA-15141:
----------------------------------------

Here is a patch to improve the performance of calculating endpoint replicas. It is 100x to 1000x faster than the default implementation:

| Branch | uTest | JVM-dTest | dTest |
| [15141-trunk|https://github.com/Instagram/cassandra/tree/15141-trunk] | [circle #51|https://circleci.com/gh/Instagram/cassandra/51] | [circle #50|https://circleci.com/gh/Instagram/cassandra/50] | [circle #53|https://circleci.com/gh/Instagram/cassandra/53] |

> RemoveNode takes long time and blocks gossip stage
> --------------------------------------------------
>
>                 Key: CASSANDRA-15141
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-15141
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Cluster/Gossip, Cluster/Membership
>            Reporter: Jay Zhuang
>            Assignee: Jay Zhuang
>            Priority: Normal
>
> This function
> [{{getAddressReplicas()}}|https://github.com/apache/cassandra/blob/7df67eff2d66dba4bed2b4f6aeabf05144d9b057/src/java/org/apache/cassandra/service/StorageService.java#L3002]
> is slow during removenode and decommission on large vnode clusters that use
> NetworkTopologyStrategy, because it needs to build the whole replication map
> for every token range.
> In one of our clusters (> 1k nodes), it takes about 20 seconds for each
> NetworkTopologyStrategy keyspace, so processing a single removenode message
> takes at least 80 seconds in total (20 * 4: 3 system keyspaces, 1 user
> keyspace). This blocks heartbeat propagation and causes nodes to be falsely
> marked down.

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org
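
The cost described in the issue text (building the whole replication map just to find the ranges held by one endpoint) can be illustrated with a toy model. The sketch below is not Cassandra code and not the patch: the class `ReplicaMapSketch`, its methods, and the "next distinct nodes clockwise" placement rule are all assumptions made for illustration, and the rule is much simpler than NetworkTopologyStrategy. It only shows that answering a question about one node still walks the ring once per token.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

// Hypothetical toy ring: token i is simply the index i, owned by owners.get(i).
final class ReplicaMapSketch {
    private final List<String> owners = new ArrayList<>();
    private final int rf;      // replication factor
    long ringSteps = 0;        // counts ring positions visited, as a rough cost measure

    ReplicaMapSketch(int nodes, int vnodesPerNode, int rf) {
        this.rf = rf;
        // Interleave the vnodes so that neighbouring tokens belong to different nodes.
        for (int v = 0; v < vnodesPerNode; v++)
            for (int n = 0; n < nodes; n++)
                owners.add("node" + n);
    }

    // Simplified placement (assumption): walk clockwise until rf distinct nodes are found.
    List<String> replicasFor(int tokenIndex) {
        Set<String> replicas = new LinkedHashSet<>();
        for (int i = 0; i < owners.size() && replicas.size() < rf; i++) {
            ringSteps++;
            replicas.add(owners.get((tokenIndex + i) % owners.size()));
        }
        return new ArrayList<>(replicas);
    }

    // Builds the replica list for every token on the ring.
    Map<Integer, List<String>> buildFullReplicaMap() {
        Map<Integer, List<String>> map = new TreeMap<>();
        for (int i = 0; i < owners.size(); i++)
            map.put(i, replicasFor(i));
        return map;
    }

    // Finds the ranges replicated by one node by filtering the full map.
    List<Integer> rangesReplicatedBy(String node) {
        List<Integer> result = new ArrayList<>();
        for (Map.Entry<Integer, List<String>> e : buildFullReplicaMap().entrySet())
            if (e.getValue().contains(node))
                result.add(e.getKey());
        return result;
    }

    public static void main(String[] args) {
        ReplicaMapSketch ring = new ReplicaMapSketch(4, 3, 2);
        System.out.println("ranges replicated by node0: " + ring.rangesReplicatedBy("node0"));
        System.out.println("ring positions visited: " + ring.ringSteps);
    }
}
```

In this model the work grows with the total number of tokens (nodes times vnodes per node) even though only one node's ranges are wanted, and it is repeated for each keyspace. With a rack- and datacenter-aware strategy each per-token walk is presumably longer than in this toy rule, which would be consistent with the 20 seconds per keyspace reported above.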