Hi Eugene,

Data distribution is done via bucket allocation in the partitioned region. By default a partitioned region uses 113 buckets.

Bucket allocation for a data entry is done by taking key.hashCode() % 113 + 1. This can have an effect on the distribution: if, over a prolonged period, you see that some buckets hold a greater percentage of the data, you might consider increasing or decreasing the number of buckets.
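
For illustration, here is a minimal sketch (in Java) of how a key lands in a bucket with the default 113 buckets. The exact arithmetic inside Geode may differ slightly by version; this only shows why a skewed hashCode() skews the bucket distribution. The key is a made-up example:

    // Sketch only: illustrates the key -> bucket mapping, not Geode internals.
    public class BucketSketch {
        public static void main(String[] args) {
            final int totalNumBuckets = 113;   // partitioned-region default
            String key = "order-42";           // hypothetical cache key
            int bucketId = Math.abs(key.hashCode() % totalNumBuckets);
            System.out.println(key + " -> bucket " + bucketId);
        }
    }
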

You could potentially write your own custom PartitionResolver <http://geode.docs.pivotal.io/docs/developing/partitioned_regions/using_custom_partition_resolvers.html>. But that seems to be overkill for what you are trying to achieve.
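
If you did go that route, a resolver sketch might look like the following. I'm assuming the org.apache.geode package names here (older GemFire-era releases use com.gemstone.gemfire instead), and the key format with a "customerId|orderId" prefix is purely hypothetical:

    import org.apache.geode.cache.EntryOperation;
    import org.apache.geode.cache.PartitionResolver;

    // Routes all entries for one customer to the same bucket by returning
    // the customer-id prefix of the key as the routing object.
    public class CustomerIdResolver implements PartitionResolver<String, Object> {
        @Override
        public Object getRoutingObject(EntryOperation<String, Object> op) {
            String key = op.getKey();          // e.g. "customer42|order7"
            return key.substring(0, key.indexOf('|'));
        }

        @Override
        public String getName() {
            return "CustomerIdResolver";
        }

        @Override
        public void close() {}
    }

You would then attach the resolver to the region's partition attributes (in cache.xml or via the API) so Geode consults it instead of the raw key hash.
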

As for the distribution of buckets over nodes, this can only be influenced by doing a rebalance. Keeping a perfectly equal distribution of data across all nodes at all times is not feasible, given the nature of hashes and keys, but over time you should see the data become "balanced" across all the nodes.
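
You can trigger a rebalance from gfsh with the "rebalance" command, or from the Java API along these lines (a sketch, assuming you already have a Cache handle and the org.apache.geode package names):

    import org.apache.geode.cache.Cache;
    import org.apache.geode.cache.control.RebalanceOperation;
    import org.apache.geode.cache.control.RebalanceResults;

    public class RebalanceSketch {
        public static void rebalance(Cache cache) throws InterruptedException {
            // Starts an asynchronous rebalance of partitioned-region buckets.
            RebalanceOperation op = cache.getResourceManager()
                                         .createRebalanceFactory()
                                         .start();
            RebalanceResults results = op.getResults(); // blocks until done
            System.out.println("Buckets transferred: "
                    + results.getTotalBucketTransfersCompleted());
        }
    }
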

--Udo

On 4/05/2016 7:24 am, Eugene Strokin wrote:
I've got a 10-node cluster now, and when I was starting it up, 3 nodes didn't connect because I was missing the enable-network-partition-detection=true property. I sent some traffic before I noticed the problem, and it created a total of ~500 items in the distributed cache.
I've stopped the traffic.
I've fixed the problem with those 3 nodes.
So 7 nodes had about 55 items and 3 nodes had 0.
I thought such a small difference wouldn't even be visible in the long run, so I put the traffic back on without cleaning the data. Now the cluster has a total of ~300K items cached: the 7 nodes have about 40K items each, and those 3 nodes about 10K each. It looks like the distribution somehow kept that ratio. Is this right? Can it be fixed somehow without running a rebalancing job? I don't mind keeping the initial 55-item difference, I just wanted to set up the cluster to keep an equal data distribution all the time.

Thanks
Eugene
