Hi Vinayak,

We have encountered a similar scenario at LinkedIn.

In one case, they have a predefined set of ports per host, and nodes are created in Helix upfront for those ports. When the nodes start up, they check whether that instance (host:port) has already been taken; if not, they join the cluster with the host name. In another use case, they come up with the name (host_port) based on what's available, create the instance using Helix admin, and then join the cluster.

The reasoning behind a consistent naming scheme is to provide a consistent mechanism for assigning partitions to nodes even after restarts. This is important for stateful systems, where we don't want to move data on restarts. Another (not really technical, but more practical) reason is to keep rogue instances from connecting to the cluster with a random id because of code bugs or misconfiguration.

But we don't really enforce having a host and port as part of the instance name (at least not by design). All we enforce is a unique name for each instance across the cluster. So a node can come up with a unique id and add itself to the cluster. It can still set its port, host, etc. in the config so that it's discoverable by other nodes in the system. In your case, you should probably also drop the instance on disconnect.

NOTE: there are some command line tools that assume the host:port format. Those are bugs and need to be fixed. But if you use the Java API directly, you should not have a problem. In any case, let us know if you have a problem setting your own unique id.

This requirement has come up multiple times at LinkedIn and on other threads. Would a feature like auto-creating the instance on join and deleting it on leave be helpful? We could set this flag at the cluster level when the cluster is created, and throw an exception if the flag is set to false and the node has not already been created.

Thanks,
Kishore G

On Mon, Feb 25, 2013 at 7:28 PM, Vinayak Borkar <[email protected]> wrote:

> Hi Shi,
>
> Thanks for your response.
>
> The Helix documentation suggests that the recommended way to name
> instances is to use a combination of host name and port number. I
> suppose adding the port number allows multiple instances to run on the
> same machine. However, this also means that each instance needs to be
> given a dedicated port number that is known upfront and is guaranteed
> to be stable across cluster restarts.
>
> In my particular application, no assumption is made about the
> availability of specific ports on the machine that runs the agent.
> Instead, the agent on startup opens a socket with port 0, getting a
> free port assigned to the socket, which is then used for further
> communication with that agent for as long as the agent is alive. This
> strategy of not depending on specific ports allows us to run multiple
> agents on the same machine (mostly for testing) without worrying about
> the agents trying to bind to the same port for RPC. In production, this
> scheme lets our agents run on the server machines without regard to
> which ports are available, allowing for zero configuration.
>
> I am in the process of porting this application to use Helix as the
> cluster management platform and am trying to figure out the best way to
> do so. To get around the problem, I think I will need to find a more
> stable way to name my instances so that they keep their name regardless
> of which port they are bound to.
>
> Have you encountered other use cases that needed an alternate way to
> name the instances instead of using host name and port number?
>
> Thanks,
> Vinayak
>
>> Hi Vinayak:
>>
>> In this scenario, the Helix admin command / API (see
>> http://helix.incubator.apache.org/Tutorial.html) can be used to add
>> the instance with the newly generated name to the cluster, and then
>> the instance can start with the name.
>> But doing this may require the idealstate of the resource hosted in
>> the Helix cluster to be recalculated after the new instance is added,
>> unless the resource is in auto-rebalance mode.
>>
>> Can you share some more details about your use case?
>>
>> Thanks,
>> -Shi
>>
>> On Mon, Feb 25, 2013 at 1:29 PM, Vinayak Borkar <[email protected]> wrote:
>>
>>> Hi guys,
>>>
>>> I am trying to use Helix in a system where the "instances" start up
>>> and listen on a free port that is not pre-configured before the
>>> application starts. This is done so that the application does not
>>> rely on the availability of specific ports. As a result, the instance
>>> name (host, port) is not known upfront. However, Helix requires the
>>> instance to be created in Helix before it connects. Any ideas to get
>>> out of this situation? Is there a way to tell Helix to create an
>>> instance on receiving a connection from an instance?
>>>
>>> Thanks,
>>> Vinayak
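
For illustration only, here is a minimal sketch of the naming approach discussed in this thread: the agent keeps a unique id that is independent of its port, and binds to whatever free port the OS hands out. Everything named here (the class `StableInstanceName`, the `agent_` prefix, the `instance.id` file) is a hypothetical choice for the example, not something Helix prescribes. The sketch deliberately makes no Helix calls; the comment in `main` marks where the instance would be registered under that id.

```java
import java.io.IOException;
import java.net.ServerSocket;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.UUID;

public class StableInstanceName {

    // Returns the id stored in idFile. On first start the file does not exist,
    // so a new random id is generated and persisted for later restarts.
    public static String loadOrCreateInstanceId(Path idFile) throws IOException {
        if (Files.exists(idFile)) {
            return new String(Files.readAllBytes(idFile), StandardCharsets.UTF_8).trim();
        }
        String id = "agent_" + UUID.randomUUID(); // hypothetical naming scheme
        if (idFile.getParent() != null) {
            Files.createDirectories(idFile.getParent());
        }
        Files.write(idFile, id.getBytes(StandardCharsets.UTF_8));
        return id;
    }

    public static void main(String[] args) throws IOException {
        Path idFile = Paths.get(args.length > 0 ? args[0] : "instance.id");
        String instanceId = loadOrCreateInstanceId(idFile);

        // Port 0 asks the OS for any free port, as described in the thread.
        try (ServerSocket socket = new ServerSocket(0)) {
            int port = socket.getLocalPort();
            // Assumption: this is the point where the agent would add itself to
            // the cluster under instanceId and record host/port in its config so
            // other nodes can discover it. Omitted to keep the sketch free of
            // external dependencies.
            System.out.println(instanceId + " listening on port " + port);
        }
    }
}
```

Because the id lives in a file rather than being derived from the port, restarting the agent yields the same instance name even though the port changes. If agents are meant to be disposable, the file can simply be skipped, and the instance dropped on disconnect as suggested above.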
