Hi Kishore,

Thanks for creating the JIRA. I will try to respond to this mail here, but please let me know if you would prefer to continue the discussion on the issue in the JIRA going forward.
> The reasoning behind having a consistent naming scheme is to provide a
> consistent mechanism of assigning partitions to nodes even after restarts.
> This is important for stateful systems where we don't want to move the data
> on restarts. Another (not really technical but more practical) reason is to
> avoid rogue instances connecting to the cluster with random ids due to code
> bugs or misconfiguration.

I see the need for stable instance naming to avoid a complete reshuffle on cluster restart. However, IMO this is a consequence of Helix's design of having ZooKeeper be the single source of truth even when the cluster is not running.

Let's say Helix took an alternate approach: while the cluster is running, ZooKeeper is used as the source of truth for the locations of resource partitions. When the cluster starts up, however, ZK begins with a clean slate that is incrementally populated as instances join, based on the partitions each instance reports during the "join" process. From that point on, Helix continues doing what it does today.

With this approach, instance names would matter only while the cluster is running, with no stability requirement across restarts. That said, this is a huge change for Helix, and I am sure you have probably thought about it as a possible direction - I would like to hear your thoughts on this topic.
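For concreteness, the alternate startup flow sketched above could look roughly like this. Every class and method name here is hypothetical, not a real Helix API:

```java
import java.util.*;

// A minimal sketch (not real Helix code) of the alternate approach:
// ZooKeeper starts empty after a cluster restart, and the partition
// mapping is rebuilt from what each instance reports when it joins.
public class JoinTimeMapping {
    // partition name -> owning instance, rebuilt incrementally on join
    private final Map<String, String> partitionOwners = new HashMap<>();

    // Called when an instance joins; it reports the partitions it already
    // holds locally, and the cluster's view is populated from those reports.
    public void onInstanceJoin(String instance, List<String> localPartitions) {
        for (String p : localPartitions) {
            // first reporter wins; a real system would need conflict resolution
            partitionOwners.putIfAbsent(p, instance);
        }
    }

    public String ownerOf(String partition) {
        return partitionOwners.get(partition);
    }
}
```

The point being: instance names only need to be unique while the cluster is up, because the mapping is derived from reported data rather than persisted against stable names.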

I completely agree with the need to handle the rogue/misconfigured instances case.


> This requirement has come up multiple times at LinkedIn and on other
> threads. Would a feature like auto-create instance on join and delete on
> leave be helpful? We can have this flag set at the cluster level when the
> cluster is created, so we can throw an exception if the flag is set to
> false and the node is not already created.

While the above feature would be great for adding new instances with little configuration (and for zero-configuration testing), there still needs to be a way to handle restarting a loaded cluster without triggering a massive reshuffle.
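To make sure I understand the proposal, the cluster-level flag might behave roughly like this. The names are purely illustrative, not Helix's actual config API:

```java
import java.util.*;

// A rough sketch (illustrative names only, not Helix's actual config API) of
// the proposed cluster-level flag: when auto-join is disabled, an unknown
// instance is rejected with an exception instead of being silently added.
public class ClusterJoinPolicy {
    private final boolean autoJoinAllowed;
    private final Set<String> knownInstances = new HashSet<>();

    public ClusterJoinPolicy(boolean autoJoinAllowed) {
        this.autoJoinAllowed = autoJoinAllowed;
    }

    // Pre-register an instance, as an admin would do today.
    public void registerInstance(String name) {
        knownInstances.add(name);
    }

    // Join path: auto-create when the flag allows it, otherwise require
    // that the instance was created ahead of time.
    public void join(String instance) {
        if (knownInstances.contains(instance)) {
            return; // already known to the cluster
        }
        if (!autoJoinAllowed) {
            throw new IllegalStateException(
                "instance " + instance + " is not registered and auto-join is disabled");
        }
        knownInstances.add(instance); // auto-create on join
    }
}
```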


Thanks,
Vinayak
