I agree it's only a problem with 'small' clusters - but doesn't 'small' describe most users? Even with 10 nodes, adding an 11th looks like a pretty big imbalance unless I add another 9 or move a large part of the ring. Or have folks in practice not had trouble with incremental scalability?
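To make that concrete, here's a rough back-of-the-envelope sketch (plain Python, nothing from the Cassandra code base; it assumes RandomPartitioner's 0..2^127 token space and a perfectly even 10-node ring to start):

    # Ownership fractions before/after adding an 11th node that
    # bisects one existing range on an evenly balanced 10-node ring.
    RING = 2 ** 127  # RandomPartitioner token space

    def ownership(tokens):
        """Fraction of the ring each node owns; a node owns the
        range ending at its token, wrapping around the ring."""
        tokens = sorted(tokens)
        return {t: ((t - tokens[i - 1]) % RING) / RING
                for i, t in enumerate(tokens)}

    ten = [i * RING // 10 for i in range(10)]
    print(max(ownership(ten).values()))   # ~0.10 for everyone

    # Add node 11 halfway into one range: it and the node it split
    # now own ~5% each while the other nine still own ~10%, so the
    # heaviest node carries twice the load of the lightest.
    eleven = ten + [(ten[0] + ten[1]) // 2]
    vals = ownership(eleven).values()
    print(min(vals), max(vals))           # ~0.05 vs ~0.10

So without further moves, the split pair sits at half the load of everyone else until the cluster roughly doubles again.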
-----Original Message-----
From: Jonathan Ellis [mailto:jbel...@gmail.com]
Sent: Thursday, March 25, 2010 2:27 PM
To: user@cassandra.apache.org
Subject: Re: Ring management and load balance

One problem is if the heaviest node is next to a node that is lighter
than average, instead of heavier. Then if the new node takes extra from
the heaviest, say 75% instead of just 1/2, and then we take 1/2 of the
heaviest's neighbor and put it on the heaviest, you've made that
lighter-than-average node even lighter.

Could you move 1/2, 1/4, etc. only until you get to a node lighter than
average? Probably. But I'm not sure it's a big enough win to justify
the complexity.

Probably a better solution would be a tool where you tell it "I want to
add N nodes to my cluster," and it analyzes the load factors and tells
you what tokens to add them with, and what additional moves to make to
get you within M% of equal loads, with the minimum amount of data
movement.

-Jonathan

On Thu, Mar 25, 2010 at 1:52 PM, Jeremy Dunck <jdu...@gmail.com> wrote:
> On Thu, Mar 25, 2010 at 1:26 PM, Jonathan Ellis <jbel...@gmail.com> wrote:
>> Pretty much everything assumes that there is a 1:1 correspondence
>> between IP and Token. It's probably in the ballpark of "one month to
>> code, two to get the bugs out." Gossip is one of the trickier parts
>> of our code base, and this would be all over that. The actual storage
>> system changes would be simpler, I think.
>
> What if adding a node shifted down-ring tokens less and less? If
> adding node N+1, it shifts the first by N/2^x, the second by N/2^2x,
> the third by N/2^3x, etc., so that a fixed number of nodes are
> shifted, but the bump is smoothed out? Tokens stay 1:1.
>
> I'm talking out of my league here -- haven't actually run a cluster
> yet -- so probably a dumb idea. :-)
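For what it's worth, the naive core of the tool Jonathan describes is just a greedy loop: repeatedly bisect the widest remaining range. A hypothetical sketch (plain Python; range width stands in for load, where a real tool would use the measured per-node load factors, and this skips the "additional moves to get within M%" part entirely):

    # Hypothetical planner: given the current tokens, choose tokens
    # for n_new nodes by repeatedly bisecting the widest range.
    # Bisecting widest-first also keeps data movement low, since each
    # new node only streams from the single node whose range it splits.
    RING = 2 ** 127

    def plan_tokens(tokens, n_new):
        tokens = sorted(tokens)
        for _ in range(n_new):
            # Widest range is (predecessor, token], wrap included.
            width, i = max(((tokens[i] - tokens[i - 1]) % RING, i)
                           for i in range(len(tokens)))
            # Place the new token at the midpoint of that range.
            tokens.append((tokens[i - 1] + width // 2) % RING)
            tokens.sort()
        return tokens

    # e.g. growing an uneven 4-node ring by 3:
    print(plan_tokens([0, RING // 4, RING // 3, RING // 2], 3))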