Hi,

Thanks for explaining: as I understand it, each node now only displays
its local view of the data it contains, and no longer the global view.

One more question:
Why do the nodes at the end of the ring only show the % load from 2
nodes and not from 3?
We always write at quorum, so there should also be data on the adjacent
nodes. Or are the quorum writes not working as expected at the beginning
and end of the cluster, writing to only 2 nodes instead of 3?
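
(For context, my mental model of replica placement - which may well be wrong -
is sketched below: with SimpleStrategy and RF 3 each key goes to the node that
owns its range plus the next two nodes clockwise, and the ring wraps around,
so even the "last" nodes should receive replicas from their neighbours. Node
names and tokens are made up for illustration.)

    // Rough sketch, not Cassandra code: SimpleStrategy-style placement, RF 3.
    // The wrap-around means ranges near the end of the ring are still
    // replicated onto the first nodes.
    import java.util.*;

    class ReplicaSketch {
        static List<String> replicasFor(String keyToken, TreeMap<String, String> ring, int rf) {
            // Primary replica: first node whose token is >= the key's token,
            // wrapping back to the first token if we run off the end.
            String token = Optional.ofNullable(ring.ceilingKey(keyToken)).orElse(ring.firstKey());
            List<String> replicas = new ArrayList<>();
            while (replicas.size() < rf) {
                replicas.add(ring.get(token));
                // Next node clockwise, again wrapping around the ring.
                token = Optional.ofNullable(ring.higherKey(token)).orElse(ring.firstKey());
            }
            return replicas;
        }

        public static void main(String[] args) {
            TreeMap<String, String> ring = new TreeMap<>(); // token -> node (hypothetical)
            ring.put("2a", "node190"); ring.put("55", "node191"); ring.put("80", "node192");
            ring.put("aa", "node194"); ring.put("ff", "node196");
            // A key near the end of the ring still lands on 3 nodes thanks to the wrap:
            System.out.println(replicasFor("f0", ring, 3)); // [node196, node190, node191]
        }
    }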

Thanks,
Thibaut


On Mon, Aug 22, 2011 at 12:01 AM, aaron morton <aa...@thelastpickle.com> wrote:
> I'm not sure what the fix is.
>
> When using an order preserving partitioner it's up to you to ensure the ring 
> is correctly balanced.
>
> Say you have the following setup…
>
> node : token
> 1 : a
> 2 : h
> 3 : p
>
> If keys are always 1 character we can say each node owns roughly 33% of the
> ring, because we know there are only 26 possible keys.
>
> With the RP we know how big the token space is: the output of the md5
> calculation is a 128-bit integer. So we can say what fraction of the total
> each range covers.
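>
> (Sketch only, to make that concrete: because the RP token space has a fixed,
> known size, ownership can be computed from the tokens alone, without looking
> at any keys. I'm assuming 2^127 as the size of the space here - my
> understanding of RP's maximum token - but the exact bound isn't the point.)
>
>     import java.math.BigInteger;
>
>     class RpOwnershipSketch {
>         public static void main(String[] args) {
>             BigInteger ringSize = BigInteger.valueOf(2).pow(127);      // assumed RP token space
>             BigInteger left  = ringSize.divide(BigInteger.valueOf(4)); // previous node's token
>             BigInteger right = ringSize.divide(BigInteger.valueOf(2)); // this node's token
>             // Fraction of the ring between the two tokens - no key counts needed.
>             double owns = right.subtract(left).doubleValue() / ringSize.doubleValue();
>             System.out.printf("owns %.2f%%%n", owns * 100);            // 25.00%
>         }
>     }
>
> With an order preserving partitioner there is no such fixed bound, hence the
> question below.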
>
> If, in the example above, keys can be of any length, how many values exist
> between a and h?
>
> Cheers
>
> -----------------
> Aaron Morton
> Freelance Cassandra Developer
> @aaronmorton
> http://www.thelastpickle.com
>
> On 22/08/2011, at 3:33 AM, Thibaut Britz wrote:
>
>> Hi,
>>
>> I will wait until this is fixed before I upgrade, just to be sure.
>>
>> Shall I open a new ticket for this issue?
>>
>> Thanks,
>> Thibaut
>>
>> On Sun, Aug 21, 2011 at 11:57 AM, aaron morton <aa...@thelastpickle.com> 
>> wrote:
>>> This looks like an artifact of the way ownership is calculated for the OOP.
>>> See
>>> https://github.com/apache/cassandra/blob/cassandra-0.8.4/src/java/org/apache/cassandra/dht/OrderPreservingPartitioner.java#L177
>>> - it was changed in this ticket:
>>> https://issues.apache.org/jira/browse/CASSANDRA-2800
>>> The change applied in CASSANDRA-2800 was not applied to the
>>> AbstractByteOrderPartitioner. Looks like it should have been. I'll chase
>>> that up.
>>>
>>> When each node calculates the ownership of the token ranges (for OOP and
>>> BOP), it is based on the number of keys that node has in each range, as
>>> there is no way for the OOP to know the range of values the keys may take.
>>> If you look at the 192 node, it shows ownership mostly on 192, 191 and
>>> 190 - so I'm assuming RF 3, and that 192 also has data from the ranges
>>> owned by 191 and 190.
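>>> Roughly - an illustration of the idea only, not the actual partitioner
>>> code - the per-node estimate behaves like this:
>>>
>>>     import java.util.*;
>>>
>>>     class OwnershipEstimateSketch {
>>>         // Hypothetical helper: ownership of one token range as seen from ONE
>>>         // node, estimated from the keys that node happens to hold. Nodes that
>>>         // hold replicas of a range therefore report non-zero ownership for it,
>>>         // and every node can report different percentages.
>>>         static double localOwnership(List<String> localKeys, String rangeStart, String rangeEnd) {
>>>             long inRange = localKeys.stream()
>>>                     .filter(k -> k.compareTo(rangeStart) > 0 && k.compareTo(rangeEnd) <= 0)
>>>                     .count();
>>>             return localKeys.isEmpty() ? 0.0 : (double) inRange / localKeys.size();
>>>         }
>>>
>>>         public static void main(String[] args) {
>>>             List<String> keysOn192 = Arrays.asList("51", "63", "7a", "9c"); // made-up keys
>>>             System.out.println(localOwnership(keysOn192, "55", "80"));      // 0.5 from 192's view
>>>         }
>>>     }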
>>> IMHO you can ignore this.
>>> You can use the load and the number-of-keys estimate from cfstats to get
>>> an idea of what's happening.
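>>> For example (from memory, so check the exact label in your version):
>>>
>>>     nodetool -h localhost cfstats
>>>
>>> on each node, comparing the "Number of Keys (estimate)" line for the column
>>> family across the cluster, alongside the Load column from nodetool ring.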
>>> Hope that helps.
>>> -----------------
>>> Aaron Morton
>>> Freelance Cassandra Developer
>>> @aaronmorton
>>> http://www.thelastpickle.com
>>> On 19/08/2011, at 9:42 PM, Thibaut Britz wrote:
>>>
>>> Hi,
>>>
>>> We have been using apache-cassandra-2011-06-28_08-04-46.jar in production
>>> so far and wanted to upgrade to 0.8.4.
>>>
>>> Our cluster was well balanced and we only saved keys with a lower-case
>>> md5 prefix (order preserving partitioner).
>>> Each node owned 20% of the tokens, which was also displayed on each
>>> node by nodetool -h localhost ring.
>>>
>>> After upgrading, our well-balanced cluster shows completely wrong
>>> percentages for who owns which keys:
>>>
>>> *.*.*.190:
>>> Address       DC          Rack   Status State   Load       Owns    Token
>>>                                                                    ffffffffffffffff
>>> *.*.*.190     datacenter1 rack1  Up     Normal  87.95 GB   34.57%  2a
>>> *.*.*.191     datacenter1 rack1  Up     Normal  84.3 GB    0.02%   55
>>> *.*.*.192     datacenter1 rack1  Up     Normal  79.46 GB   0.02%   80
>>> *.*.*.194     datacenter1 rack1  Up     Normal  68.16 GB   0.02%   aa
>>> *.*.*.196     datacenter1 rack1  Up     Normal  79.9 GB    65.36%  ffffffffffffffff
>>>
>>> *.*.*.191:
>>> Address       DC          Rack   Status State   Load       Owns    Token
>>>                                                                    ffffffffffffffff
>>> *.*.*.190     datacenter1 rack1  Up     Normal  87.95 GB   36.46%  2a
>>> *.*.*.191     datacenter1 rack1  Up     Normal  84.3 GB    26.02%  55
>>> *.*.*.192     datacenter1 rack1  Up     Normal  79.46 GB   0.02%   80
>>> *.*.*.194     datacenter1 rack1  Up     Normal  68.16 GB   0.02%   aa
>>> *.*.*.196     datacenter1 rack1  Up     Normal  79.9 GB    37.48%  ffffffffffffffff
>>>
>>> *.*.*.192:
>>> Address       DC          Rack   Status State   Load       Owns    Token
>>>                                                                    ffffffffffffffff
>>> *.*.*.190     datacenter1 rack1  Up     Normal  87.95 GB   38.16%  2a
>>> *.*.*.191     datacenter1 rack1  Up     Normal  84.3 GB    27.61%  55
>>> *.*.*.192     datacenter1 rack1  Up     Normal  79.46 GB   34.17%  80
>>> *.*.*.194     datacenter1 rack1  Up     Normal  68.16 GB   0.02%   aa
>>> *.*.*.196     datacenter1 rack1  Up     Normal  79.9 GB    0.02%   ffffffffffffffff
>>>
>>> *.*.*.194:
>>> Address       DC          Rack   Status State   Load       Owns    Token
>>>                                                                    ffffffffffffffff
>>> *.*.*.190     datacenter1 rack1  Up     Normal  87.95 GB   0.03%   2a
>>> *.*.*.191     datacenter1 rack1  Up     Normal  84.3 GB    31.43%  55
>>> *.*.*.192     datacenter1 rack1  Up     Normal  79.46 GB   39.69%  80
>>> *.*.*.194     datacenter1 rack1  Up     Normal  68.16 GB   28.82%  aa
>>> *.*.*.196     datacenter1 rack1  Up     Normal  79.9 GB    0.03%   ffffffffffffffff
>>>
>>> *.*.*.196:
>>> Address       DC          Rack   Status State   Load       Owns    Token
>>>                                                                    ffffffffffffffff
>>> *.*.*.190     datacenter1 rack1  Up     Normal  87.95 GB   0.02%   2a
>>> *.*.*.191     datacenter1 rack1  Up     Normal  84.3 GB    0.02%   55
>>> *.*.*.192     datacenter1 rack1  Up     Normal  79.46 GB   0.02%   80
>>> *.*.*.194     datacenter1 rack1  Up     Normal  68.16 GB   27.52%  aa
>>> *.*.*.196     datacenter1 rack1  Up     Normal  79.9 GB    72.42%  ffffffffffffffff
>>>
>>>
>>> Interestingly, each server shows something completely different.
>>>
>>> Removing the locationInfo files didn't help.
>>> -Dcassandra.load_ring_state=false didn't help either.
>>>
>>> Our cassandra.yaml is at http://pastebin.com/pCVCt3RM
>>>
>>> Any idea what might cause this? Should I suspect that operating under
>>> this distribution will cause severe data loss, or can I safely ignore
>>> it?
>>>
>>> Thanks,
>>> Thibaut
>>>
>>>
>
>
