Hi Eugene,
Data distribution is done via bucket allocation in the partitioned
region. By default a partitioned region uses 113 buckets.
Bucket allocation for a data entry is done by taking key.hashCode() %
113 + 1. This can have an effect on the distribution. If over a
prolonged period you see that some buckets hold a greater percentage
of the data, you might consider increasing/decreasing the number of buckets.
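To make that concrete, here is a rough sketch of the mapping
(illustrative only; the real internals may differ, e.g. in how
negative hash codes are handled, and the key is made up):

    // Sketch: how a key maps to a bucket in a partitioned region,
    // following the hashCode-modulo scheme described above.
    int totalNumBuckets = 113;                 // the default
    Object key = "customer-42";                // hypothetical key
    int bucketId = Math.abs(key.hashCode() % totalNumBuckets) + 1;
    System.out.println("key " + key + " -> bucket " + bucketId);

Two keys whose hash codes differ by a multiple of 113 land in the same
bucket, which is why a skewed key space can produce skewed buckets.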
You could potentially write your own custom PartitionResolver
<http://geode.docs.pivotal.io/docs/developing/partitioned_regions/using_custom_partition_resolvers.html>,
but that seems to be overkill for what you are trying to achieve.
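For reference, a minimal resolver would look something like this (a
sketch only; the CustomerIdResolver name and the "customerId|orderId"
key format are invented for illustration, and the package prefix may
differ between Geode releases):

    import org.apache.geode.cache.EntryOperation;
    import org.apache.geode.cache.PartitionResolver;

    // Sketch: entries whose keys produce the same routing object are
    // placed in the same bucket, so related data stays together.
    public class CustomerIdResolver implements PartitionResolver<String, Object> {
        @Override
        public Object getRoutingObject(EntryOperation<String, Object> op) {
            // Hypothetical key format "customerId|orderId": route on
            // the customerId prefix so one customer's entries colocate.
            String key = op.getKey();
            return key.substring(0, key.indexOf('|'));
        }

        @Override
        public String getName() {
            return "CustomerIdResolver";
        }

        @Override
        public void close() {}
    }

You would then register the resolver on the region's
PartitionAttributes when the region is created.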
As for the distribution of buckets over nodes, this can only be
affected by doing a rebalance. Keeping a perfectly equal distribution
of data across all nodes at all times does not seem feasible, due to
the nature of hashes and keys. But over time, you should see that the
data becomes "balanced" across all the nodes.
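Should you decide to run one, a rebalance can be started from gfsh
with the rebalance command, or programmatically, roughly like this (a
sketch, assuming 'cache' is your existing Cache instance; getResults()
throws InterruptedException, so handle that as appropriate):

    import org.apache.geode.cache.Cache;
    import org.apache.geode.cache.control.RebalanceOperation;
    import org.apache.geode.cache.control.RebalanceResults;

    // Sketch: kick off a rebalance and wait for it to finish.
    RebalanceOperation op = cache.getResourceManager()
                                 .createRebalanceFactory()
                                 .start();
    RebalanceResults results = op.getResults();  // blocks until done
    System.out.println("Buckets transferred: "
        + results.getTotalBucketTransfersCompleted());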
--Udo
On 4/05/2016 7:24 am, Eugene Strokin wrote:
I've got a 10-node cluster now, and when I was starting it up 3 nodes
didn't connect because I was missing the
enable-network-partition-detection=true property. I sent some traffic
before I noticed the problem, and it created a total of ~500 items in
the distributed cache.
I've stopped the traffic.
I've fixed the problem with those 3 nodes.
So 7 nodes had about 55 items and 3 nodes had 0.
I thought that such a small difference wouldn't even be visible in
the long run, so I put the traffic back on without cleaning the data.
Now the cluster has a total of ~300K items cached; 7 nodes have about
40K items each, and those 3 nodes have about 10K each.
Looks like the distribution kept the ratio somehow. Is this right?
Can it be fixed somehow without running a rebalancing job? I don't
mind keeping that initial 55-item difference; I just wanted to set up
the cluster to keep an equal data distribution all the time.
Thanks
Eugene