[ https://issues.apache.org/jira/browse/CASSANDRA-5456?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Oleg Kibirev updated CASSANDRA-5456:
------------------------------------
    Comment: was deleted

(was: Copying bootstrapTokens rather than holding a lock on the same for entire loop)

> Large number of bootstrapping nodes cause gossip to stop working
> ----------------------------------------------------------------
>
>                 Key: CASSANDRA-5456
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-5456
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Core
>    Affects Versions: 1.1.10
>            Reporter: Oleg Kibirev
>
> A long-running section of code in PendingRangeCalculatorService is
> synchronized on bootstrapTokens. When a large number of nodes (hundreds in
> our case) are bootstrapping, gossip stops working because it waits for the
> same lock. Consequently, the whole cluster becomes non-functional.
>
> I experimented with the following change in
> PendingRangeCalculatorService.java, and it resolved the problem in our case.
> The prior code had the synchronized block around the entire for loop.
>
>     synchronized(bootstrapTokens) {
>         bootstrapTokens = new LinkedHashMap<Token, InetAddress>(bootstrapTokens);
>     }
>     for (Map.Entry<Token, InetAddress> entry : bootstrapTokens.entrySet())
>     {
>         InetAddress endpoint = entry.getValue();
>         allLeftMetadata.updateNormalToken(entry.getKey(), endpoint);
>         for (Range<Token> range : strategy.getAddressRanges(allLeftMetadata).get(endpoint))
>             pendingRanges.put(range, endpoint);
>         allLeftMetadata.removeEndpoint(endpoint);
>     }
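The change described in the report follows a general pattern: hold the lock only long enough to copy the shared map, then do the slow work on the copy. The following is a minimal, self-contained sketch of that pattern, not the Cassandra code. The class `SnapshotUnderLock`, its methods, and the use of `String` in place of `Token` and `InetAddress` are all hypothetical stand-ins. Unlike the snippet in the report, the sketch assigns the copy to a local variable, so the object being locked on never changes.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical stand-in for the service in the report; not Cassandra's class.
final class SnapshotUnderLock {
    // Shared map guarded by its own monitor, as bootstrapTokens is in the report.
    private final Map<String, String> bootstrapTokens = new LinkedHashMap<>();

    // Writers (gossip, in the report) take the same lock briefly.
    void addToken(String token, String endpoint) {
        synchronized (bootstrapTokens) {
            bootstrapTokens.put(token, endpoint);
        }
    }

    // Copy under the lock, then iterate over the copy without holding it.
    Map<String, String> calculatePending() {
        Map<String, String> snapshot;
        synchronized (bootstrapTokens) {
            snapshot = new LinkedHashMap<>(bootstrapTokens);
        }
        Map<String, String> pending = new LinkedHashMap<>();
        for (Map.Entry<String, String> entry : snapshot.entrySet()) {
            // Placeholder for the expensive per-endpoint range calculation.
            pending.put("range-for-" + entry.getKey(), entry.getValue());
        }
        return pending;
    }

    public static void main(String[] args) {
        SnapshotUnderLock service = new SnapshotUnderLock();
        service.addToken("token-1", "10.0.0.1");
        service.addToken("token-2", "10.0.0.2");
        System.out.println(service.calculatePending());
    }
}
```

The trade-off is that the loop works on a point-in-time view: tokens added after the copy is taken are not seen until the next calculation.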