I don’t know if it’s the OP’s intent in this case, but the response latency profile will likely be different for two clusters that are equivalent in total storage but different in node count. There are multiple reasons for that, but probably the biggest is that you’re changing a divisor in the I/O queuing statistics that matter to compaction-triggered dirty page flushes, and I’d expect that to show up in latencies. Speculative retry stats, used to bounce past slow nodes busy with garbage collection, might shift a bit too.
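As a back-of-the-envelope illustration of that divisor effect (all figures below are made-up assumptions, not measurements from this thread): with total data held constant, each node in the smaller cluster owns a larger slice, so it has to absorb proportionally more memtable-flush and compaction I/O per disk.

    # Back-of-the-envelope sketch: same total data, different node counts.
    # All figures here are hypothetical assumptions for illustration only.

    TOTAL_DATA_TB = 400            # assumed total cluster data set
    WRITE_AMPLIFICATION = 5        # assumed compaction write amplification
    DAILY_INGEST_TB = 4            # assumed new data written per day, cluster-wide

    def per_node_io(node_count):
        """Rough per-node numbers for a cluster of `node_count` equally sized nodes."""
        data_per_node_tb = TOTAL_DATA_TB / node_count
        # Flush + compaction traffic a single node's disks must absorb each day:
        daily_io_per_node_tb = (DAILY_INGEST_TB / node_count) * WRITE_AMPLIFICATION
        return data_per_node_tb, daily_io_per_node_tb

    for nodes in (50, 100, 200):
        data_tb, io_tb = per_node_io(nodes)
        print(f"{nodes:>4} nodes: {data_tb:6.1f} TB/node, "
              f"{io_tb:5.2f} TB/day of flush+compaction I/O per node")

With the assumed numbers, halving the node count doubles the per-node flush and compaction traffic, which is exactly the divisor the latency comment is pointing at.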
R From: "Durity, Sean R" <sean_r_dur...@homedepot.com> Reply-To: "user@cassandra.apache.org" <user@cassandra.apache.org> Date: Monday, July 13, 2020 at 10:48 AM To: "user@cassandra.apache.org" <user@cassandra.apache.org> Subject: RE: Running Large Clusters in Production Message from External Sender I’m curious – is the scaling needed for the amount of data, the amount of user connections, throughput or what? I have a 200ish cluster, but it is primarily a disk space issue. When I can have (and administer) nodes with large disks, the cluster size will shrink. Sean Durity From: Isaac Reath (BLOOMBERG/ 919 3RD A) <ire...@bloomberg.net> Sent: Monday, July 13, 2020 10:35 AM To: user@cassandra.apache.org Subject: [EXTERNAL] Re: Running Large Clusters in Production Thanks for the info Jeff, all very helpful! From: user@cassandra.apache.org<mailto:user@cassandra.apache.org> At: 07/11/20 12:30:36 To: user@cassandra.apache.org<mailto:user@cassandra.apache.org> Subject: Re: Running Large Clusters in Production Gossip related stuff eventually becomes the issue For example, when a new host joins the cluster (or replaces a failed host), the new bootstrapping tokens go into a “pending range” set. Writes then merge pending ranges with final ranges, and the data structures involved here weren’t necessarily designed for hundreds of thousands of ranges, so it’s likely they stop behaving at some point (https://issues.apache.org/jira/browse/CASSANDRA-6345 [issues.apache.org]<https://urldefense.com/v3/__https:/issues.apache.org/jira/browse/CASSANDRA-6345__;!!M-nmYVHPHQ!cuNBkxxbXQWIlkzR4IaScJBk5m04XNEIXtp5dnuYZj5rQQjp_cM8neG6aq1fHgj60hZbg2U$> , https://issues.apache.org/jira/browse/CASSANDRA-6127 [issues.apache.org]<https://urldefense.com/v3/__https:/issues.apache.org/jira/browse/CASSANDRA-6127__;!!M-nmYVHPHQ!cuNBkxxbXQWIlkzR4IaScJBk5m04XNEIXtp5dnuYZj5rQQjp_cM8neG6aq1fHgj616F6neA$> as an example, but there have been others) Unrelated to vnodes, until cassandra 4.0, the internode messaging requires basically 6 threads per instance - 3 for ingress and 3 for egress, to every other host in the cluster. The full mesh gets pretty expensive, it was rewritten in 4.0 and that thousand number may go up quite a bit after that. On Jul 11, 2020, at 9:16 AM, Isaac Reath (BLOOMBERG/ 919 3RD A) <ire...@bloomberg.net<mailto:ire...@bloomberg.net>> wrote: Thank you John and Jeff, I was leaning towards sharding and this really helps support that opinion. Would you mind explaining a bit more what about vnodes caused those issues? From: user@cassandra.apache.org<mailto:user@cassandra.apache.org> At: 07/10/20 19:06:27 To: user@cassandra.apache.org<mailto:user@cassandra.apache.org> Cc: Isaac Reath (BLOOMBERG/ 919 3RD A ) Subject: Re: Running Large Clusters in Production I worked on a handful of large clusters (> 200 nodes) using vnodes, and there were some serious issues with both performance and availability. We had to put in a LOT of work to fix the problems. I agree with Jeff - it's way better to manage multiple clusters than a really large one. On Fri, Jul 10, 2020 at 2:49 PM Jeff Jirsa <jji...@gmail.com<mailto:jji...@gmail.com>> wrote: 1000 instances are fine if you're not using vnodes. I'm not sure what the limit is if you're using vnodes. If you might get to 1000, shard early before you get there. Running 8x100 host clusters will be easier than one 800 host cluster. 
On Fri, Jul 10, 2020 at 2:19 PM Isaac Reath (BLOOMBERG/ 919 3RD A) <ire...@bloomberg.net> wrote:

Hi All,

I’m currently dealing with a use case that runs on around 200 nodes. Due to growth of the product as well as onboarding additional data sources, we are looking at having to expand that to around 700 nodes, and potentially beyond to 1000+. To that end I have a couple of questions:

1) For those who have experienced managing clusters at that scale, what types of operational challenges have you run into that you might not see when operating 100-node clusters? A couple that come to mind: version upgrades (especially major versions) become a lot more risky because it is no longer feasible to do a blue/green style deployment of the database, and backup & restore operations seem far more error prone for the same reason (having to do an in-place restore instead of being able to spin up a new cluster to restore to).

2) Is there a cluster size beyond which sharding across multiple clusters becomes the recommended approach?

Thanks,
Isaac
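On the sharding question, the "8x100 rather than one 800" advice above usually ends up as application-level routing of a tenant or partition-key prefix to one of several independent clusters. Below is a minimal sketch of that idea; the cluster names, contact points, and hash scheme are hypothetical illustrations, not anything from this thread:

    # Minimal sketch of application-level sharding across several Cassandra clusters.
    # Cluster names and contact points are hypothetical. A real setup also needs a
    # plan for resharding/migration, which an explicit tenant -> cluster mapping
    # (or consistent hashing) handles better than a simple modulo over a fixed list.

    import hashlib

    # Hypothetical shards, e.g. several ~100-node clusters instead of one huge one.
    CLUSTERS = {
        "cluster-a": ["10.0.1.1", "10.0.1.2"],
        "cluster-b": ["10.0.2.1", "10.0.2.2"],
        "cluster-c": ["10.0.3.1", "10.0.3.2"],
    }

    def cluster_for(shard_key: str) -> str:
        """Deterministically map a tenant / data-source key to one cluster."""
        names = sorted(CLUSTERS)
        digest = hashlib.md5(shard_key.encode()).digest()
        index = int.from_bytes(digest[:8], "big") % len(names)
        return names[index]

    if __name__ == "__main__":
        for tenant in ("tenant-42", "tenant-7", "tenant-99"):
            name = cluster_for(tenant)
            print(f"{tenant} -> {name} (contact points: {CLUSTERS[name]})")

In practice the hard part is usually not the routing but moving data when a shard fills up, which is one reason to key the mapping on something coarse (a tenant or data source) that can be migrated as a unit.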