Hi there Ashish,
I think it is safe to assume that once you change the PartitionResolver
strategy, you might have to reload the data.
I will not commit to a definitive “Yes, you have to reload the data and cannot
load it again from disk” answer, but I think the answer will become
self-evident when you change the region configuration, as some settings on a
region cannot be amended after creation.
I don’t know if you have considered this yet, but it sounds like you have some
“complex” string key that you parse for the common part. Have you considered
maybe using an object like:
import java.io.DataInput;
import java.io.DataOutput;
import java.io.IOException;

import org.apache.geode.DataSerializable;

public class ComplexKey implements DataSerializable {

    private String commonPartitioningKey;
    private String key;

    // No-arg constructor is required by DataSerializable.
    public ComplexKey() {}

    public ComplexKey(String commonPartitioningKey, String key) {
        this.commonPartitioningKey = commonPartitioningKey;
        this.key = key;
    }

    // hashCode/equals use only the natural key, so gets behave as before.
    @Override
    public int hashCode() {
        return key.hashCode();
    }

    @Override
    public boolean equals(Object obj) {
        if (this == obj) return true;
        if (!(obj instanceof ComplexKey)) return false;
        return this.key.equals(((ComplexKey) obj).key);
    }

    public String getCommonPartitioningKey() {
        return commonPartitioningKey;
    }

    public void setCommonPartitioningKey(String commonPartitioningKey) {
        this.commonPartitioningKey = commonPartitioningKey;
    }

    public String getKey() {
        return key;
    }

    public void setKey(String key) {
        this.key = key;
    }

    // writeUTF/readUTF require non-null values, so both parts of the key
    // must be set before the key is serialized.
    @Override
    public void toData(DataOutput out) throws IOException {
        out.writeUTF(commonPartitioningKey);
        out.writeUTF(key);
    }

    @Override
    public void fromData(DataInput in) throws IOException, ClassNotFoundException {
        commonPartitioningKey = in.readUTF();
        key = in.readUTF();
    }
}
This way you can still do a get using the natural key of the object, but the
PartitionResolver can partition according to the commonPartitioningKey. Imo, it
just cleanly separates the partitioning logic from the natural-key logic.
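To show how the two fit together, here is a minimal PartitionResolver sketch
that routes on the commonPartitioningKey field of the key class above. The
class name is my own invention; it only runs inside a Geode member, registered
on the region via cache.xml or the region factory.

```java
import org.apache.geode.cache.EntryOperation;
import org.apache.geode.cache.PartitionResolver;

// Hypothetical resolver: routes each entry by the common part of its key.
// Entries that share a commonPartitioningKey hash to the same bucket, and
// for colocated regions that bucket lives on the same member.
public class CommonKeyResolver implements PartitionResolver<ComplexKey, Object> {

    @Override
    public Object getRoutingObject(EntryOperation<ComplexKey, Object> opDetails) {
        // No string parsing needed; the key object carries the routing part.
        return opDetails.getKey().getCommonPartitioningKey();
    }

    @Override
    public String getName() {
        return getClass().getSimpleName();
    }

    @Override
    public void close() {
        // no resources to release
    }
}
```

The same resolver class (or at least the same routing logic) should be
configured on every region you want colocated, so that equal routing objects
map to equal buckets across all of them.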
BE AWARE: you should not use PDX serialization for keys, so stick to
Serializable or DataSerializable.
As for functions: you should see no difference. Colocation just means that the
same bucket number of colocated regions is stored on the same server. What you
can now use is the notion of “local” data across colocated regions, so you
don’t need to go across the network to access colocated data. Functions can
then run using local data only and don’t need to go across the network when
they need data from another region. It might improve performance a little.
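As an illustration, a data-aware function can read colocated data through the
local-data views that Geode exposes. This is only a sketch (the class name and
the aggregation step are made up), and like any function it has to be deployed
and executed on the cluster:

```java
import java.util.Map;

import org.apache.geode.cache.Region;
import org.apache.geode.cache.execute.Function;
import org.apache.geode.cache.execute.FunctionContext;
import org.apache.geode.cache.execute.RegionFunctionContext;
import org.apache.geode.cache.partition.PartitionRegionHelper;

// Hypothetical data-aware function that stays on local buckets.
public class LocalLookupFunction implements Function<Object> {

    @Override
    public void execute(FunctionContext<Object> context) {
        RegionFunctionContext rfc = (RegionFunctionContext) context;

        // Local view of the region the function was executed on.
        Region<Object, Object> localData =
            PartitionRegionHelper.getLocalDataForContext(rfc);

        // Local views of the colocated regions: same buckets, same member,
        // so these reads never leave this server.
        Map<String, Region<?, ?>> colocated =
            PartitionRegionHelper.getLocalColocatedRegions(rfc);

        // ... look up related entries in the colocated views and aggregate ...
        context.getResultSender().lastResult(localData.size());
    }

    @Override
    public String getId() {
        return "LocalLookupFunction"; // hypothetical id
    }
}
```

The key point is to read the colocated regions through
getLocalColocatedRegions rather than through a plain region reference;
otherwise the gets can still be routed to other members.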
Anyway, lots of information. Reach out if you get stuck or don’t understand
something.
—Udo
On Jul 16, 2020, 9:38 AM -0700, aashish choudhary
<[email protected]>, wrote:
Hi,
We are seeing some performance issues with partitioned regions: when we
execute a data-aware function, some of the calls to other regions inside the
function go to different nodes for further processing. So we are trying to
implement data colocation between those regions.
We will be using custom partitioning of data by implementing the
PartitionResolver interface.
Questions
I believe we would need to import/export the data again after creating regions
with colocation. Please confirm.
We have regions with different keys, but all regions have the first part of
the key in common (separated by _), so in the class implementing
PartitionResolver we just take the first part of the key for routing. Will
this custom-partition the data correctly?
Do we need to make any changes to how we read data in functions after enabling
data colocation?
With best regards,
Ashish