I agree it's only a problem with 'small' clusters - but it seems like 'small' 
is 'most users'? Even with 10 nodes it looks like a pretty big imbalance if I 
add an 11th node, and don't add the other 9 or move a large part of the ring. 
Or in practice have folks not had trouble with incremental scalability?
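
To make the 10-to-11-node case concrete, here's a rough back-of-the-envelope model (plain Python, not Cassandra code) assuming a single replica and uniformly distributed keys: the bisected range leaves two nodes with half the load of everyone else.

```python
# Rough back-of-the-envelope model, not Cassandra code: a 10-node ring with
# evenly spaced tokens, then an 11th node inserted midway into one range
# without moving any other tokens.  Assumes one replica and uniform keys.

def loads_after_bisect(n_nodes):
    even = 1.0 / n_nodes
    # The new node bisects one range: it and that range's old owner each
    # end up with half of it; every other node keeps a full share.
    return [even / 2, even / 2] + [even] * (n_nodes - 1)

loads = loads_after_bisect(10)
print(max(loads) / min(loads))  # -> 2.0: heaviest nodes own twice the lightest
```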



-----Original Message-----
From: Jonathan Ellis [mailto:jbel...@gmail.com] 
Sent: Thursday, March 25, 2010 2:27 PM
To: user@cassandra.apache.org
Subject: Re: Ring management and load balance

One problem is if the heaviest node's neighbor is lighter than
average, instead of heavier.  If the new node takes extra from the
heaviest, say 75% instead of just 1/2, and then we take 1/2 of that
neighbor's range and put it on the heaviest, we've made the
lighter-than-average node even lighter.

Could you move 1/2, 1/4, etc. only until you get to a node lighter
than average?  Probably.  But I'm not sure it's a big enough win to
justify the complexity.

Probably a better solution would be a tool where you tell it "I want
to add N nodes to my cluster," and it analyzes the load factors and
tells you what tokens to add them with, and what additional moves to
make to get within M% of equal load, with the minimum amount of data
movement.
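
One naive way such a tool might pick tokens (a hypothetical sketch, not an existing Cassandra utility, and ignoring measured per-node load) is to repeatedly bisect the largest remaining range on a unit ring:

```python
# Hypothetical planner sketch, not an existing Cassandra tool: place each new
# node at the midpoint of the currently largest range on a unit ring.  A real
# tool would weight ranges by measured load factors, not just range width.

def plan_new_tokens(tokens, n_new):
    tokens = sorted(tokens)
    for _ in range(n_new):
        # Each token owns the range from itself up to the next token,
        # wrapping around the ring.
        gaps = [(tokens[(i + 1) % len(tokens)] - tokens[i]) % 1.0
                for i in range(len(tokens))]
        i = max(range(len(gaps)), key=gaps.__getitem__)
        tokens = sorted(tokens + [(tokens[i] + gaps[i] / 2) % 1.0])
    return tokens

# Four evenly spaced tokens plus four new nodes -> eight evenly spaced tokens.
print(plan_new_tokens([0.0, 0.25, 0.5, 0.75], 4))
```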

-Jonathan

On Thu, Mar 25, 2010 at 1:52 PM, Jeremy Dunck <jdu...@gmail.com> wrote:
> On Thu, Mar 25, 2010 at 1:26 PM, Jonathan Ellis <jbel...@gmail.com> wrote:
>> Pretty much everything assumes that there is a 1:1 correspondence
>> between IP and Token.  It's probably in the ballpark of "one month to
>> code, two to get the bugs out."  Gossip is one of the trickier parts
>> of our code base, and this would be all over that.  The actual storage
>> system changes would be simpler I think.
>
> What if adding a node shifted down-ring tokens less and less?  If
> adding node N+1, it shifts the first down-ring token by N/2^x, the
> second by N/2^2x, the third by N/2^3x, etc., so that a fixed number
> of nodes are shifted but the bump is smoothed out?  Tokens stay 1:1.
>
> I'm talking out of my league here -- haven't actually run a cluster
> yet -- so probably a dumb idea.  :-)
>
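
Jeremy's geometric-decay idea can be illustrated numerically (a sketch of the proposal as stated, not an implementation): each successive down-ring token moves half as far as the previous one, so the total token movement stays bounded by N.

```python
# Sketch of the proposal above, not an implementation: after inserting a node,
# shift the k nearest down-ring tokens by N/2^x, N/2^(2x), N/2^(3x), ...
# so the disruption tapers off instead of landing entirely on one neighbor.

def decaying_shifts(n, x, k):
    return [n / 2 ** (i * x) for i in range(1, k + 1)]

shifts = decaying_shifts(n=0.1, x=1, k=5)
print(shifts)       # [0.05, 0.025, 0.0125, 0.00625, 0.003125]
print(sum(shifts))  # total movement is bounded by n (here < 0.1)
```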
