Re: Dynamically configuring instances

kishore g Tue, 26 Feb 2013 08:31:00 -0800

Hi Vinayak,

We have encountered a similar scenario at LinkedIn.

In one case, they have a predefined set of ports per host, and nodes are
created in Helix up front for those ports. When a node starts up, it checks
whether that instance (host:port) has already been taken; if not, it joins
the cluster under that name.
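
A minimal sketch of that slot check, assuming an in-memory registry stands
in for the cluster state. SlotRegistry, claim and release are hypothetical
names for illustration, not Helix APIs, and the host_port naming is just one
possible convention.

```java
import java.util.List;
import java.util.Optional;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

// Hypothetical stand-in for the cluster's view of which predefined
// instances are currently taken. Not a Helix class.
final class SlotRegistry {
    // instance name -> id of the process that currently owns it
    private final ConcurrentMap<String, String> owners = new ConcurrentHashMap<>();

    // Atomically claims the first free predefined slot on the given host.
    // Returns the claimed instance name, or empty if every slot is taken.
    Optional<String> claim(String host, List<Integer> ports, String ownerId) {
        for (int port : ports) {
            String instance = host + "_" + port;
            // putIfAbsent returns null only when nobody held the slot before.
            if (owners.putIfAbsent(instance, ownerId) == null) {
                return Optional.of(instance);
            }
        }
        return Optional.empty();
    }

    // Frees a slot so another process can take it, e.g. on shutdown.
    void release(String instance) {
        owners.remove(instance);
    }
}
```

A node would call claim with the port list configured for its host and join
the cluster under the returned name, or refuse to start if nothing is free.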

In another use case, they come up with the name (host_port) based on what's
available, create the instance using the Helix admin, and then join the
cluster.

The reasoning behind having a consistent naming scheme is to provide a
consistent mechanism for assigning partitions to nodes even after restarts.
This is important for stateful systems where we don't want to move the data
on restarts. Another (not really technical but more practical) reason is to
avoid rogue instances connecting to the cluster with a random id due to code
bugs or misconfiguration.

But we don't really enforce having a host and port as part of the instance
name (at least not by design). All we enforce is a unique name for each
instance across the cluster. So a node can come up with a unique id and add
itself to the cluster. It can still set its port, host, etc. in the config
so that it's discoverable by other nodes in the system. In your case, you
should probably also drop the instance on disconnect.
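
As a rough illustration of decoupling the name from the port, here is a
sketch that persists a generated id to a local file and lets the OS pick the
port. StableInstanceId, loadOrCreate and bindFreePort are hypothetical
helpers, not part of Helix, and the "agent_" prefix and the id file location
are assumptions.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.ServerSocket;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.UUID;

// Hypothetical helper, not a Helix class: keeps the instance name stable
// across restarts even though the listening port changes on every start.
final class StableInstanceId {
    private StableInstanceId() {}

    // Returns the id stored in idFile, or generates and persists a new one
    // the first time the agent starts on this machine.
    static String loadOrCreate(Path idFile) {
        try {
            if (Files.exists(idFile)) {
                byte[] bytes = Files.readAllBytes(idFile);
                return new String(bytes, StandardCharsets.UTF_8).trim();
            }
            String id = "agent_" + UUID.randomUUID();
            Files.write(idFile, id.getBytes(StandardCharsets.UTF_8));
            return id;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Binds to port 0 so the OS assigns any free port. The caller owns the
    // returned socket and can read the port with getLocalPort().
    static ServerSocket bindFreePort() {
        try {
            return new ServerSocket(0);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

The id from loadOrCreate would serve as the instance name, while the host
and the port from getLocalPort() would go into that instance's config for
other nodes to look up. Note that a persisted id only matters if the instance
is kept across restarts; if it is dropped on disconnect, a fresh id per start
works as well.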

NOTE: there are some command-line tools that assume the host:port format;
those are bugs and need to be fixed. But if you use the Java API directly,
you should not have a problem. In any case, let us know if you have problems
setting your own unique id.

This requirement has come up multiple times at LinkedIn and on other
threads. Would a feature like auto-creating the instance on join and
deleting it on leave be helpful? We can have this flag set at the cluster
level when the cluster is created, so that we can throw an exception if the
flag is set to false and the node has not already been created.
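
To make the proposed semantics concrete, here is a small in-memory model.
It is only a sketch of the idea: ClusterModel and its methods are
hypothetical and do not correspond to any existing Helix API.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical model of a cluster-level "auto create on join" flag.
final class ClusterModel {
    private final boolean autoCreateOnJoin;
    private final Set<String> configured = ConcurrentHashMap.newKeySet();
    private final Set<String> live = ConcurrentHashMap.newKeySet();

    // The flag is fixed when the cluster is created.
    ClusterModel(boolean autoCreateOnJoin) {
        this.autoCreateOnJoin = autoCreateOnJoin;
    }

    // Explicit creation up front, as is required today.
    void addInstance(String name) {
        configured.add(name);
    }

    // Join under the given name. Unknown names are created on the fly when
    // the flag is set and rejected with an exception otherwise.
    void join(String name) {
        if (!configured.contains(name)) {
            if (!autoCreateOnJoin) {
                throw new IllegalStateException("instance not created: " + name);
            }
            configured.add(name);
        }
        if (!live.add(name)) {
            throw new IllegalStateException("instance already connected: " + name);
        }
    }

    // Leave the cluster. In this simple model, an auto-create cluster drops
    // the instance on leave no matter how it was originally created.
    void leave(String name) {
        live.remove(name);
        if (autoCreateOnJoin) {
            configured.remove(name);
        }
    }

    boolean isConfigured(String name) {
        return configured.contains(name);
    }

    boolean isLive(String name) {
        return live.contains(name);
    }
}
```

With the flag off, the model behaves like the current requirement: join fails
unless addInstance was called first.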

Thanks
Kishore G

On Mon, Feb 25, 2013 at 7:28 PM, Vinayak Borkar <[email protected]> wrote:

> Hi Shi,
>
> Thanks for your response.
>
> The Helix documentation suggests that the recommended way to name
> instances is to use a combination of host name and port number. I suppose
> adding the port number allows multiple instances to run on the same
> machine. However, this also means that each instance needs to be provided a
> dedicated port number that is known upfront and is ensured to be stable
> across cluster restarts.
>
> In my particular application, no assumption is made about the availability
> of specific ports on the machine that runs the agent. Instead, the agent on
> startup opens a socket with port 0, getting a free port assigned to the
> socket, which is then used for further communication with that agent for
> the duration that the agent is alive. This strategy of not depending on
> specific ports allows us to run multiple agents on the same machine (mostly
> for testing) without worrying about the agents trying to bind to the same
> port for RPC. In production this scheme lets our agents run on the server
> machines without regard to what ports are available, allowing for
> zero-configuration.
>
> I am in the process of porting this application to use Helix as the
> cluster management platform and trying to figure out what the best way
> would be to do so. To get around the problem, I think I will need to figure
> out a more stable way to name my instances so that they maintain their name
> regardless of which port they are bound to.
>
> Have you encountered other use cases that needed an alternate way to name
> the instances instead of using hostname and port numbers?
>
> Thanks,
> Vinayak
>
>
>> Hi Vinayak:
>>
>> In this scenario, Helix admin command / API (see
>> http://helix.incubator.apache.org/Tutorial.html) can be used to add the
>> instance with the newly generated name into the cluster, and then the
>> instance can start with the name. But doing this may require the
>> idealstate of the resource hosted in the Helix cluster to be recalculated
>> after the new instance is added, unless the resource is in auto-rebalance
>> mode.
>>
>> Can you share some more details about your use case?
>>
>> Thanks,
>> -Shi
>>
>>
>> On Mon, Feb 25, 2013 at 1:29 PM, Vinayak Borkar <[email protected]> wrote:
>>
>>> Hi guys,
>>>
>>> I am trying to use Helix in a system where the "instances" start up and
>>> listen on a free port that is not pre-configured before the application
>>> starts. This is done so that the application does not rely on the
>>> availability of specific ports. As a result, the instance name (host,
>>> port) is not known upfront. However, Helix requires the instance be
>>> created in Helix before it connects. Any ideas to get out of this
>>> situation? Is there a way to tell Helix to create an instance on
>>> receiving a connection from an instance?
>>>
>>> Thanks,
>>> Vinayak
>>>
>>>
