Hi Eugene,

Data distribution is done via bucket allocation in the partitioned region. By default a partitioned region uses 113 buckets.

Bucket allocation for a data entry is done by taking key.hashCode() % 113 + 1. This can have an effect on the distribution: if, over a prolonged period, you see that some buckets hold a greater percentage of the data, you might consider increasing or decreasing the number of buckets.
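
For illustration, here is a minimal sketch (in Java) of how a key lands in a bucket with the default 113 buckets. The exact arithmetic inside Geode may differ slightly by version; this only shows why a skewed hashCode() skews the bucket distribution. The key is a made-up example:

    // Sketch only: illustrates the key -> bucket mapping, not Geode internals.
    public class BucketSketch {
        public static void main(String[] args) {
            final int totalNumBuckets = 113;   // partitioned-region default
            String key = "order-42";           // hypothetical cache key
            int bucketId = Math.abs(key.hashCode() % totalNumBuckets);
            System.out.println(key + " -> bucket " + bucketId);
        }
    }
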

You could potentially write your own custom PartitionResolver <http://geode.docs.pivotal.io/docs/developing/partitioned_regions/using_custom_partition_resolvers.html>. But that seems to be overkill for what you are trying to achieve.
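
If you did go that route, a resolver sketch might look like the following. I'm assuming the org.apache.geode package names here (older GemFire-era releases use com.gemstone.gemfire instead), and the key format with a "customerId|orderId" prefix is purely hypothetical:

    import org.apache.geode.cache.EntryOperation;
    import org.apache.geode.cache.PartitionResolver;

    // Routes all entries for one customer to the same bucket by returning
    // the customer-id prefix of the key as the routing object.
    public class CustomerIdResolver implements PartitionResolver<String, Object> {
        @Override
        public Object getRoutingObject(EntryOperation<String, Object> op) {
            String key = op.getKey();          // e.g. "customer42|order7"
            return key.substring(0, key.indexOf('|'));
        }

        @Override
        public String getName() {
            return "CustomerIdResolver";
        }

        @Override
        public void close() {}
    }

You would then attach the resolver to the region's partition attributes (in cache.xml or via the API) so Geode consults it instead of the raw key hash.
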

As for the distribution of buckets over nodes, this can only be influenced by doing a rebalance. Keeping a perfectly equal distribution of data across all nodes at all times is not feasible, given the nature of hashes and keys, but over time you should see the data become "balanced" across all the nodes.
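
You can trigger a rebalance from gfsh with the "rebalance" command, or from the Java API along these lines (a sketch, assuming you already have a Cache handle and the org.apache.geode package names):

    import org.apache.geode.cache.Cache;
    import org.apache.geode.cache.control.RebalanceOperation;
    import org.apache.geode.cache.control.RebalanceResults;

    public class RebalanceSketch {
        public static void rebalance(Cache cache) throws InterruptedException {
            // Starts an asynchronous rebalance of partitioned-region buckets.
            RebalanceOperation op = cache.getResourceManager()
                                         .createRebalanceFactory()
                                         .start();
            RebalanceResults results = op.getResults(); // blocks until done
            System.out.println("Buckets transferred: "
                    + results.getTotalBucketTransfersCompleted());
        }
    }
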

--Udo

On 4/05/2016 7:24 am, Eugene Strokin wrote:
I've got a 10-node cluster now, and when I was starting it up, 3 nodes didn't connect because I was missing the enable-network-partition-detection=true property. I sent some traffic before I noticed the problem, and it created a total of ~500 items in the distributed cache.
I've stopped the traffic.
I've fixed the problem with those 3 nodes.
So 7 nodes had about 55 items and 3 nodes had 0.
I thought such a small difference wouldn't even be visible in the long run, so I put the traffic back on without cleaning the data. Now the cluster has a total of ~300K items cached: the 7 nodes have about 40K items each, and those 3 nodes about 10K each. It looks like the distribution somehow kept that ratio. Is this right? Can it be fixed somehow without running a rebalancing job? I don't mind keeping the initial 55-item difference, I just wanted to set up the cluster to keep an equal data distribution all the time.

Thanks
Eugene
