On Fri, Mar 26, 2010 at 4:35 PM, Mike Malone wrote:
> With the random partitioner there's no need to suggest a token. The key
> space is statistically random so you should be able to just split 2^128 into
> equal-sized segments and get a fairly equal storage load. Your read/write
> load could get
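A minimal sketch of the even split Mike describes, assuming Java and the
2^128 token-space figure from his message (the class name and node-count
handling are just for illustration):

    import java.math.BigInteger;

    public class EvenTokens {
        // 2^128, the token-space size Mike's message uses for the
        // MD5-based RandomPartitioner.
        static final BigInteger RING = BigInteger.ONE.shiftLeft(128);

        public static void main(String[] args) {
            int n = Integer.parseInt(args[0]);   // number of nodes
            for (int i = 0; i < n; i++) {
                // node i gets token i * 2^128 / n, so every range is equal-sized
                BigInteger token = RING.multiply(BigInteger.valueOf(i))
                                       .divide(BigInteger.valueOf(n));
                System.out.println("node " + i + ": token " + token);
            }
        }
    }

For the five-node example later in the thread this would give tokens at 0,
2^128/5, 2*2^128/5, and so on.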
Mike,
If you assume that your rows are roughly equal in size (at least
statistically), then you could also just take a node's total load (this is
exposed via JMX) and divide by the number of keys/rows on that node. Not
sure how to get the latter, but it shouldn't be such a big deal to int
Sorry for the last mail, hit the wrong button. This JMX property gives a
per-CF granularity, right?
I think it doesn't completely solve the problem here, because key
load-balancing effectively demands per-key granularity. But it could help
with statistical sampling.
Roland
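A sketch of the arithmetic Roland proposes. The row count is an assumed
input (from sampling, say), since the thread notes it isn't obviously
exposed:

    public class AvgRowSize {
        // Roland's estimate: a node's total on-disk load divided by its
        // row count. The row count is an assumed input (e.g. from
        // sampling); as noted above, it is not obviously exposed.
        static double averageRowSize(long totalDiskSpaceUsed, long rowCount) {
            return rowCount == 0 ? 0.0 : (double) totalDiskSpaceUsed / rowCount;
        }

        public static void main(String[] args) {
            // e.g. 12 GB of load over 3 million rows -> ~4 KB per row
            System.out.println(averageRowSize(12L << 30, 3000000L));
        }
    }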
On 26.03.2010 at 22:29, "Rob Coli" wrote:
On 3/26/10 1:36 PM, Roland Hänel wrote:
>
> If I was going to write such a tool: do you think the th...
The JMX interface exposes an attribute that seems appropriate for this use.
It is called "TotalDiskSpaceUsed", and is available on a per-ColumnFamily basis.
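A rough sketch of reading that attribute over JMX (plain javax.management;
the host/port and especially the ObjectName are assumptions for a 0.6-era
node, so check the exact bean name with jconsole):

    import javax.management.MBeanServerConnection;
    import javax.management.ObjectName;
    import javax.management.remote.JMXConnector;
    import javax.management.remote.JMXConnectorFactory;
    import javax.management.remote.JMXServiceURL;

    public class DiskSpaceProbe {
        public static void main(String[] args) throws Exception {
            // 0.6-era nodes default to JMX on port 8080; adjust to taste.
            JMXServiceURL url = new JMXServiceURL(
                "service:jmx:rmi:///jndi/rmi://" + args[0] + ":8080/jmxrmi");
            JMXConnector jmx = JMXConnectorFactory.connect(url);
            try {
                MBeanServerConnection mbs = jmx.getMBeanServerConnection();
                // The ObjectName below is a guess at the per-CF bean; the
                // exact domain/keys vary by version, so browse with jconsole.
                ObjectName cf = new ObjectName(
                    "org.apache.cassandra.db:type=ColumnFamilyStores,name=" + args[1]);
                System.out.println("TotalDiskSpaceUsed = "
                    + mbs.getAttribute(cf, "TotalDiskSpaceUsed") + " bytes");
            } finally {
                jmx.close();
            }
        }
    }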
2010/3/26 Roland Hänel
> Jonathan,
>
> I agree with your idea about a tool that could 'propose' good token choices
> for optimal load-balancing.
>
> If I was going to write such a tool: do you think the thrift API provides
> the necessary information? I think with the RandomPartitioner you cannot
On 3/26/10 1:36 PM, Roland Hänel wrote:
If I was going to write such a tool: do you think the thrift API
provides the necessary information? I think with the RandomPartitioner
you cannot scan all your rows to actually find out how big certain
ranges of rows are. And even with the OPP (that is the
Jonathan,
I agree with your idea about a tool that could 'propose' good token choices
for optimal load-balancing.
If I was going to write such a tool: do you think the thrift API provides
the necessary information? I think with the RandomPartitioner you cannot
scan all your rows to actually find
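For what it's worth, the thrift interface does expose the ring layout such a
tool would start from. A rough sketch against the 0.6-era API (signatures
shifted between releases, and "Keyspace1" plus the connection details are
placeholders); per-range row counts are the missing piece, as noted above:

    import org.apache.cassandra.thrift.Cassandra;
    import org.apache.cassandra.thrift.TokenRange;
    import org.apache.thrift.protocol.TBinaryProtocol;
    import org.apache.thrift.transport.TSocket;

    public class RingInspector {
        public static void main(String[] args) throws Exception {
            TSocket socket = new TSocket("localhost", 9160);
            Cassandra.Client client =
                new Cassandra.Client(new TBinaryProtocol(socket));
            socket.open();
            // describe_ring lists each token range and the endpoints that
            // hold it -- the map a balancing tool would start from. What it
            // does not give you is how many rows fall into each range.
            for (TokenRange r : client.describe_ring("Keyspace1")) {
                System.out.println(r.start_token + " .. " + r.end_token
                    + " -> " + r.endpoints);
            }
            socket.close();
        }
    }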
ing"
Sent: Thursday, March 25, 2010 6:59pm
To: "user@cassandra.apache.org"
Subject: RE: Ring management and load balance
I agree it's only a problem with 'small' clusters - but it seems like 'small'
is 'most users'? Even with 10 nodes it looks like
Have folks not had trouble with incremental scalability?
-----Original Message-----
From: Jonathan Ellis [mailto:jbel...@gmail.com]
Sent: Thursday, March 25, 2010 2:27 PM
To: user@cassandra.apache.org
Subject: Re: Ring management and load balance
One problem is if the heaviest node is next to a node that is
lighter than average, instead of heavier. Then if the new node takes
extra from the heaviest, say 75% instead of just 1/2, and then we take
1/2 of the heaviest's neighbor and put it on the heaviest, you made
that lighter-than-average
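A hypothetical numeric run of that failure mode (the loads 40 and 5 are
made up; the ring average here would be around 15):

    public class RebalanceCascade {
        public static void main(String[] args) {
            // Made-up loads in arbitrary units: the heaviest node holds 40,
            // its successor only 5; the ring average is around 15.
            double heaviest = 40, lightNeighbor = 5;

            // The new node bootstraps and takes 75% of the heaviest's range.
            double newNode = 0.75 * heaviest;     // 30
            heaviest -= newNode;                  // 10

            // The heaviest then takes half of its already-light neighbor.
            double moved = 0.5 * lightNeighbor;   // 2.5
            heaviest += moved;                    // 12.5
            lightNeighbor -= moved;               // 2.5 -- even further below average

            System.out.printf("new=%.1f heaviest=%.1f neighbor=%.1f%n",
                newNode, heaviest, lightNeighbor);
        }
    }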
On Thu, Mar 25, 2010 at 1:26 PM, Jonathan Ellis wrote:
> Pretty much everything assumes that there is a 1:1 correspondence
> between IP and Token. It's probably in the ballpark of "one month to
> code, two to get the bugs out." Gossip is one of the trickier parts
> of our code base, and this wou
On Thu, Mar 25, 2010 at 1:17 PM, Mike Malone wrote:
> On Thu, Mar 25, 2010 at 9:56 AM, Jonathan Ellis wrote:
>>
>> The advantage to doing it the way Cassandra does is that you can keep
>> keys sorted with OrderPreservingPartitioner for range scans. grabbing
>> one token of many from each node in
On Thu, Mar 25, 2010 at 9:56 AM, Jonathan Ellis wrote:
> The advantage to doing it the way Cassandra does is that you can keep
> keys sorted with OrderPreservingPartitioner for range scans. grabbing
> one token of many from each node in the ring would prohibit that.
>
> So we rely on active load
On Thu, Mar 25, 2010 at 11:40 AM, Jeremy Dunck wrote:
> On Thu, Mar 25, 2010 at 10:56 AM, Jonathan Ellis wrote:
>> The advantage to doing it the way Cassandra does is that you can keep
>> keys sorted with OrderPreservingPartitioner for range scans. grabbing
>> one token of many from each node in
On Thu, Mar 25, 2010 at 10:56 AM, Jonathan Ellis wrote:
> The advantage to doing it the way Cassandra does is that you can keep
> keys sorted with OrderPreservingPartitioner for range scans. grabbing
> one token of many from each node in the ring would prohibit that.
>
> So we rely on active load
The advantage to doing it the way Cassandra does is that you can keep
keys sorted with OrderPreservingPartitioner for range scans. grabbing
one token of many from each node in the ring would prohibit that.
So we rely on active load balancing to get to a "good enough" balance,
say within 50%. It
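A minimal sketch of the usual balancing move, relieving the heaviest node of
half its range by bisecting its arc (Java BigInteger; the 2^128 ring size
follows the figure used upthread):

    import java.math.BigInteger;

    public class SplitHeaviest {
        static final BigInteger RING = BigInteger.ONE.shiftLeft(128);

        // Token a new node would take to relieve the heaviest node of half
        // its range: the midpoint of the arc (prevToken, heavyToken],
        // with wrap-around at the top of the ring handled via mod.
        static BigInteger midpoint(BigInteger prevToken, BigInteger heavyToken) {
            BigInteger width = heavyToken.subtract(prevToken).mod(RING);
            return prevToken.add(width.shiftRight(1)).mod(RING);
        }

        public static void main(String[] args) {
            System.out.println(midpoint(new BigInteger(args[0]),
                                        new BigInteger(args[1])));
        }
    }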
I wanted to check my understanding of the load balance operation. Let's say I
have 5 nodes, each of them has been assigned at startup 1/5 of the ring, and
the load is equal across them (say, using the random partitioner). The load on the
cluster gets high, so I add a sixth server. During bootstrap, t