This is the JIRA I created for this requirement: https://issues.apache.org/jira/browse/HELIX-19

On Tue, Feb 26, 2013 at 8:30 AM, kishore g <[email protected]> wrote:

> Hi Vinayak,
>
> We have encountered a similar scenario at LinkedIn.
>
> In one case they have a predefined set of ports per host, and nodes are
> created in Helix upfront for those ports. When the nodes start up, they
> check whether anyone has already taken that instance (host:port); if
> not, they join the cluster with the host name.
>
> In another use case, they come up with the name (host_port) based on
> what's available, create the instance using helix admin, and then join
> the cluster.
>
> The reasoning behind having a consistent naming scheme is to provide a
> consistent mechanism of assigning partitions to nodes even after
> restarts. This is important for stateful systems where we don't want to
> move the data on restarts. Another (not really technical but more
> practical) reason is to avoid rogue instances connecting to the cluster
> with a random id due to code bugs or misconfiguration.
>
> But we don't really enforce having a host and port as part of the
> instance name (at least not by design). All we enforce is a unique name
> for each instance across the cluster. So a node can come up with a
> unique id and add itself to the cluster. It can still set its port,
> host, etc. in the config so that it's discoverable by other nodes in
> the system. Probably in your case, you should also drop the instance on
> disconnect.
>
> NOTE: there are some command line tools that assume the host:port
> format; those are bugs and need to be fixed. But if you use the Java
> API directly you should not have a problem. In any case, let us know if
> you have a problem setting your own unique id.
>
> This requirement has come up multiple times at LinkedIn and on other
> threads. Will a feature like auto-create instance on join and delete on
> leave be helpful? We can have this flag set at the cluster level when
> the cluster is created, so we can throw an exception if the flag is set
> to false and the node is not already created.
>
> Thanks
> Kishore G
>
> On Mon, Feb 25, 2013 at 7:28 PM, Vinayak Borkar <[email protected]> wrote:
>
>> Hi Shi,
>>
>> Thanks for your response.
>>
>> The Helix documentation suggests that the recommended way to name
>> instances is to use a combination of host name and port number. I
>> suppose adding the port number allows multiple instances to run on the
>> same machine. However, this also means that each instance needs to be
>> provided a dedicated port number that is known upfront and is ensured
>> to be stable across cluster restarts.
>>
>> In my particular application, no assumption is made about the
>> availability of specific ports on the machine that runs the agent.
>> Instead, the agent on startup opens a socket with port 0, getting a
>> free port assigned to the socket, which is then used for further
>> communication with that agent for as long as the agent is alive. This
>> strategy of not depending on specific ports allows us to run multiple
>> agents on the same machine (mostly for testing) without worrying about
>> the agents trying to bind to the same port for RPC. In production this
>> scheme lets our agents run on the server machines without regard to
>> what ports are available, allowing for zero configuration.
>>
>> I am in the process of porting this application to use Helix as the
>> cluster management platform and trying to figure out what the best way
>> would be to do so. To get around the problem, I think I will need to
>> figure out a more stable way to name my instances so that they
>> maintain their name regardless of which port they are bound to.
>>
>> Have you encountered other use cases that needed an alternate way to
>> name the instances instead of using hostname and port numbers?
>>
>> Thanks,
>> Vinayak
>>
>>> Hi Vinayak:
>>>
>>> In this scenario, the Helix admin command / API (see
>>> http://helix.incubator.apache.org/Tutorial.html) can be used to add
>>> the instance with the newly generated name into the cluster, and then
>>> the instance can start with that name. But doing this may require the
>>> idealstate of the resource hosted in the Helix cluster to be
>>> re-calculated after the new instance is added, unless the resource is
>>> in auto-rebalance mode.
>>>
>>> Can you share some more details about your use case?
>>>
>>> Thanks,
>>> -Shi
>>>
>>> On Mon, Feb 25, 2013 at 1:29 PM, Vinayak Borkar <[email protected]> wrote:
>>>
>>>> Hi guys,
>>>>
>>>> I am trying to use Helix in a system where the "instances" start up
>>>> and listen on a free port that is not pre-configured before the
>>>> application starts. This is done so that the application does not
>>>> rely on the availability of specific ports. As a result, the
>>>> instance name (host, port) is not known upfront. However, Helix
>>>> requires the instance to be created in Helix before it connects. Any
>>>> ideas to get out of this situation? Is there a way to tell Helix to
>>>> create an instance on receiving a connection from an instance?
>>>>
>>>> Thanks,
>>>> Vinayak
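
For illustration only, here is a rough JDK-only sketch of the naming idea discussed in this thread: the instance picks a unique id that does not depend on its port, keeps it stable across restarts, and binds to port 0 to get whatever port is free. The class name, the "agent_" prefix and the id file location are made up for the example. The step that would register the name with Helix (through the Helix admin API) and publish host and port in the instance config is deliberately left out, so treat this as a sketch under those assumptions and not as how Helix does it.

```java
import java.io.IOException;
import java.net.InetAddress;
import java.net.ServerSocket;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.UUID;

// Hypothetical helper, not part of Helix.
class StableInstanceName {

    // Returns the id stored in idFile; on first use, generates a random
    // id and persists it so that later restarts reuse the same name.
    public static String loadOrCreateId(Path idFile) throws IOException {
        if (Files.exists(idFile)) {
            return new String(Files.readAllBytes(idFile), StandardCharsets.UTF_8).trim();
        }
        String id = "agent_" + UUID.randomUUID(); // prefix is an arbitrary choice
        Files.write(idFile, id.getBytes(StandardCharsets.UTF_8));
        return id;
    }

    public static void main(String[] args) throws IOException {
        // Assumed location for the id file; pass a path as args[0] to override.
        Path idFile = args.length > 0
                ? Paths.get(args[0])
                : Paths.get(System.getProperty("java.io.tmpdir"), "agent-instance.id");
        String instanceName = loadOrCreateId(idFile);

        // Port 0 asks the OS for any free port; getLocalPort() reports the one chosen.
        try (ServerSocket socket = new ServerSocket(0)) {
            String host = InetAddress.getLocalHost().getHostName();
            int port = socket.getLocalPort();
            // At this point the instance would be added to the cluster under
            // instanceName, with host and port stored in its config (not shown).
            System.out.println(instanceName + " listening on " + host + ":" + port);
        }
    }
}
```

Because the name lives in a file and not in the port number, the agent keeps the same instance name after a restart even though the port it binds to will usually differ. If instances should be dropped on disconnect, as suggested above, the id file is unnecessary and a fresh random id per start would do.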
