Hi Vinayak,

That's a good observation. We do use ZooKeeper as the source of truth, and it makes it easier for a central component to reason about the state and take appropriate actions.
We do have logic for reporting the partitions a node owns when it starts up; see the carryOverPrevious code, where we populate the partitions that were hosted by a participant in its previous life. But we reset the state to the initial state (OFFLINE), since it is not a good idea to carry over the previous state associated with a partition. Currently this happens automatically because the previous state is already maintained in ZooKeeper, and it is straightforward for us to know the previous state because the participants start with the same id.

In your case, you can choose to maintain the information locally; when the node starts up it can upload that information to ZooKeeper, and the central component can either choose to keep the existing mapping or reshuffle, based on the use case requirement. To make this work with your scenario where the instance name keeps changing, we would need an additional API that takes the partitions a node owns as input while joining the cluster.

Note that one of the challenges you will face when you change instance names dynamically is coming up with an assignment algorithm that is idempotent: given a set of instance names and partitions (tasks), come up with a mapping of partitions to instance names. I have appended rough sketches of both ideas after the quoted mail below.

Good idea on moving this to a JIRA. I wish there was a tool to convert an email discussion into a JIRA :) Email is good because it allows others to provide their input (not many check JIRA :)).

Thanks,
Kishore G

On Tue, Feb 26, 2013 at 9:48 AM, Vinayak Borkar <[email protected]> wrote:

> Hi Kishore,
>
> Thanks for creating the JIRA. I will try to respond to this mail here, but
> please let me know if you would like to continue further discussion on the
> issue in the JIRA going forward.
>
>> The reasoning behind having a consistent naming scheme is to provide a
>> consistent mechanism of assigning partitions to nodes even after restarts.
>> This is important for stateful systems where we don't want to move the data
>
> I see the need for stability in naming instances to avoid a complete
> reshuffle on cluster restart. However, IMO this is a consequence of Helix's
> design of having ZooKeeper be the single source of truth when the cluster
> is not running.
>
> Let's say Helix had an alternate approach:
>
> While the cluster is running, let's say that ZooKeeper is used as the
> source of truth regarding the locations of partitions of resources. On the
> other hand, when the cluster starts up, say ZK starts with a clean slate
> that is incrementally populated as instances join the cluster, based on
> the partitions reported by each instance during the "join" process. After
> this point, say Helix continued doing what it does today.
>
> With this approach, instance names matter only while the cluster is
> running and have no stability requirements across restarts. However, this
> is a huge change for Helix and I am sure you guys probably thought about
> this as a possible direction - I would like to hear your thoughts on this
> topic.
>
>> on restarts. Another (not really technical but more practical) reason is
>> to avoid rogue instances connecting to the cluster with a random id due
>> to code bugs or misconfiguration.
>
> I completely agree with the need to handle the rogue/misconfigured
> instances case.
>
>> This requirement has come up multiple times at LinkedIn and on other
>> threads. Will a feature like auto create instance on join and delete on
>> leave be helpful? We can have this flag set at the cluster level when the
>> cluster is created, so we can throw an exception if the flag is set to
>> false and the node is not already created.
>
> While the above feature would be great for adding new instances with
> little configuration (and for zero-configuration while testing), there
> still needs to be a way to handle a loaded cluster restart without leading
> to a massive reshuffle.
>
> Thanks,
> Vinayak
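P.S. Here is a rough sketch of how a node could publish the partitions it
owned in its previous life when it joins, using the plain ZooKeeper client.
The znode path (/MYCLUSTER/OWNED_PARTITIONS/<instance>) and the helper name
are made up just for illustration; this is not an existing Helix API.

import java.util.List;
import org.apache.zookeeper.CreateMode;
import org.apache.zookeeper.ZooDefs;
import org.apache.zookeeper.ZooKeeper;

public class OwnedPartitionReporter {
  // Publishes the locally remembered partition list so the controller can
  // decide whether to keep the existing mapping or reshuffle.
  // Assumes the parent znodes already exist.
  public static void reportOwnedPartitions(ZooKeeper zk, String instanceName,
      List<String> ownedPartitions) throws Exception {
    String path = "/MYCLUSTER/OWNED_PARTITIONS/" + instanceName;
    byte[] data = String.join(",", ownedPartitions).getBytes("UTF-8");
    if (zk.exists(path, false) == null) {
      // Ephemeral, so the claim disappears if the node dies before the
      // controller acts on it.
      zk.create(path, data, ZooDefs.Ids.OPEN_ACL_UNSAFE, CreateMode.EPHEMERAL);
    } else {
      zk.setData(path, data, -1);
    }
  }
}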

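And a minimal sketch of an idempotent assignment: sort both inputs and assign
by modulo, so the same set of instance names and partitions always yields the
same mapping regardless of discovery order. This naive scheme still moves many
partitions when a single instance is renamed; consistent hashing or a sticky
rebalancer would reduce that movement. It is just an illustration, not the
Helix rebalancer.

import java.util.ArrayList;
import java.util.Collection;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class DeterministicAssigner {
  // Given the same sets of instance names and partitions, always produces
  // the same partition-to-instance mapping.
  public static Map<String, String> assign(Collection<String> instances,
      Collection<String> partitions) {
    List<String> sortedInstances = new ArrayList<>(instances);
    List<String> sortedPartitions = new ArrayList<>(partitions);
    Collections.sort(sortedInstances);
    Collections.sort(sortedPartitions);
    Map<String, String> mapping = new TreeMap<>();
    for (int i = 0; i < sortedPartitions.size(); i++) {
      mapping.put(sortedPartitions.get(i),
          sortedInstances.get(i % sortedInstances.size()));
    }
    return mapping;
  }
}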