I remember a conversation from about a year ago about C* clusters with large numbers of nodes. I think Jon Haddad had raised the point that beyond 100 nodes you start to run into issues, something related to a thread pool whose size is proportional to the number of nodes, but that this problem would be mitigated in C* 4. However, that's about all I recall, and I haven't found anything via Googling that discusses this concern.
I have a current project where I need to know the specifics a bit better. In particular:

1. In a multi-DC cluster, is this size concern about the size of a single DC, or does it apply to the aggregate number of nodes in the entire cluster?
2. What specific misbehaviors manifest? Does this show up as a memory drain, a latency increase, network congestion, etc.?
3. How sharp is the cliff in terms of when that misbehavior appears?
4. Any pointer to the code artifact that causes this would be of interest. We're using a mix of 3.7 and 3.11 in our clusters, so any git pointer appropriate to those versions would be great.

Long story short, I'm planning out a bunch of upgrades, with details I won't get into here. Spinning up a new DC that matches the desired final configuration looks like the healthier path, so long as I don't slam face first into a node-count problem in the middle of it.

--
Reid M. Pinchback
Owner & CEO
CodeKami Consulting LLC
