Re: General Architecture built around Helix

Santiago Perez Fri, 14 Jun 2013 17:34:11 -0700

Helix user here (not a developer), so take my words with a grain of salt.

Regarding 6, you might want to consider how the node.js instance should
behave if it loses its own connection to zk. You'll probably want to kill it
in that case too; otherwise it no longer sees changes under LIVEINSTANCES and
could miss the fact that the JVM lost its connection as well.
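
A rough sketch of that check, in TypeScript since the worker is node.js. Every
name in it (`ZkConnectionState`, `WatchdogInput`, `shouldTerminate`) is made up
for illustration and is not a Helix or ZooKeeper client API; how the connection
state and the LIVEINSTANCES children are obtained depends on the zookeeper
binding in use.

```typescript
// Illustrative only: these types and functions are hypothetical, not part of
// Helix or of any ZooKeeper client library.
type ZkConnectionState = "connected" | "disconnected" | "expired";

interface WatchdogInput {
  // Name under which the paired JVM participant registers in LIVEINSTANCES.
  instanceName: string;
  // State of the node.js process's own ZooKeeper session.
  zkState: ZkConnectionState;
  // Children of LIVEINSTANCES as of the last successful read.
  liveInstances: string[];
}

// Returns true when the node.js process should end itself.
function shouldTerminate(input: WatchdogInput): boolean {
  // Without its own connection the process gets no watch events, so the last
  // read of LIVEINSTANCES may be stale and cannot be trusted.
  if (input.zkState !== "connected") {
    return true;
  }
  // Connected: terminate if the paired participant is no longer listed.
  return input.liveInstances.indexOf(input.instanceName) === -1;
}
```

The caller would exit the process when this returns true. Whether to allow a
short grace period for transient disconnects is a separate design choice that
the sketch does not cover.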


Regards,
Santiago


On Fri, Jun 14, 2013 at 6:30 PM, Lance Co Ting Keh <[email protected]> wrote:

> We have a working prototype of something like the #2 you proposed above.
> We're using the standard Helix participant, and in the @Transition methods
> of the state model we send commands to node.js via HTTP.
>
> I want to run you through our general architecture to make sure we are not
> violating anything on the Helix side. As a reminder, what we need to
> guarantee is that at any given time one and only one node.js process is in
> charge of a task.
>
> 1. A machine with N cores will have N (pending testing) node.js processes
> running
> 2. Each of the N node processes is paired with its own Helix participant
> (N separate JVM instances; the reason for this comes later)
> 3. A separate Helix controller will run on each machine, and the controllers
> will simply leader-elect across machines.
> 4. The spectator router will likely be HAProxy, so a JVM will run alongside
> it on the Linux host to serve as the Helix spectator
> 5. The state machine for each will simply be the ONLINEOFFLINE model.
> (However, I do get error messages saying that I haven't defined an OFFLINE
> to DROPPED transition. I was going to ask you about this, but it is a minor
> detail compared to the rest of the architecture.)
> 5. A simple Bash script will serve as a watchdog for each node.js and Helix
> participant pair. If either of the two is "dead", the other process must
> immediately be SIGKILLed, hence the need for one JVM serving as Helix
> participant for every node.js process
> 6. Each node.js instance sets a watch on /LIVEINSTANCES directly in
> ZooKeeper as an extra safety net. If it finds that it is NOT in the live
> instances, it likely means that its JVM participant lost its connection to
> ZooKeeper but the JVM process is still running, so the Bash script has not
> terminated the node server. In this case the node server must end its own
> process.
>
> Thank you for all your help.
>
> Sincerely,
> Lance
>
> On Wed, Jun 12, 2013 at 9:07 PM, kishore g <[email protected]> wrote:
>
>> Hi Lance,
>>
>> Thanks for your interest in Helix. There are two possible approaches:
>>
>> 1. Similar to what you suggested: write a Helix participant in a non-JVM
>> language, which in your case is node.js. There seem to be quite a few
>> node.js libraries that can interact with ZooKeeper. A Helix participant
>> does the following (you got it right, but I am providing the right
>> sequence):
>>
>>    1. Creates an ephemeral node under LIVEINSTANCES
>>    2. Watches the /INSTANCES/<PARTICIPANT_NAME>/MESSAGES node for transitions
>>    3. After a transition is completed, updates
>>    /INSTANCES/<PARTICIPANT_NAME>/CURRENTSTATE
>>
>> The controller does most of the heavy lifting of ensuring that these
>> transitions lead to the desired configuration. It's quite easy to
>> re-implement this in any other language; the most difficult part would be
>> the ZooKeeper binding. We have used the Java bindings and they are solid.
>> This is at a very high level. There are some more details I have left out,
>> like handling connection loss and session expiry, that will require some
>> thinking.
>>
>>
>> 2. The other option is to use the Helix agent as a proxy. We added the Helix
>> agent as part of 0.6.1 but haven't documented it yet. Here is the gist of
>> what it does. Think of it as a generic state transition handler. You can
>> configure Helix to run a specific system command as part of each
>> transition. The Helix agent is a separate process that runs alongside your
>> actual process. Instead of the actual process getting the transition, the
>> Helix agent gets it. As part of this transition the Helix agent can invoke
>> APIs on the actual process via RPC, HTTP, etc. The Helix agent simply acts
>> as a proxy to the actual process.
>>
>> I have another approach and will try to write it up tonight, but before
>> that I have a few questions:
>>
>>
>>    1. How many node.js servers run on each node: one or more than one?
>>    2. Is the spectator/router Java based or non-Java based?
>>    3. Can you provide more details about your state machine?
>>
>>
>> thanks,
>> Kishore G
>>
>> On Wed, Jun 12, 2013 at 11:07 AM, Lance Co Ting Keh <[email protected]> wrote:
>>
>>> Hi, my name is Lance Co Ting Keh and I work at Box. You guys did a
>>> tremendous job with Helix. We are looking to use it to manage a cluster
>>> primarily running Node.js. Our model for using Helix would be to have
>>> node.js or some other non-JVM library be *Participants*, a router as a
>>> *Spectator*, and another set of machines to serve as the *Controllers*
>>> (pending testing, we may just run master-slave controllers on the same
>>> instances as the Participants). The participants will be interacting with
>>> Zookeeper in two ways: one is to receive Helix state transition messages
>>> through the instance of the HelixManager <Participant>, and the other is to
>>> interact directly with Zookeeper just to maintain ephemeral nodes within
>>> /INSTANCES. We would maintain ephemeral nodes directly in Zookeeper,
>>> instead of using InstanceConfig and calling addInstance on HelixAdmin,
>>> because of the basic health checking baked into maintaining ephemeral
>>> nodes. Otherwise we would have to write a health checker for Node.js and
>>> the JVM running the Participant. Are there better alternatives for non-JVM
>>> Helix participants? I corresponded with Kishore briefly and he mentioned
>>> HelixAgents, specifically the ProcessMonitorThread that came out in the
>>> last release.
>>>
>>>
>>> Thank you very much!
>>>
>>>  Lance Co Ting Keh
>>>
>>
>>
>
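
The participant sequence kishore lists in the quoted message (ephemeral node
under LIVEINSTANCES, watch on MESSAGES, update of CURRENTSTATE) could be
sketched roughly as follows. This is a toy in TypeScript: `FakeStore` is a
hypothetical in-memory stand-in for ZooKeeper, `startParticipant` and
`onTransition` are made-up names, and the message format (the target state as
a plain string) is an assumption. Real Helix messages, paths and session
handling are more involved.

```typescript
// Illustrative only: FakeStore stands in for ZooKeeper, and the paths follow
// the simplified names used in this thread rather than Helix's exact layout.
type Watcher = (path: string, data: string) => void;

class FakeStore {
  private nodes: { [path: string]: string } = {};
  private watchers: { [path: string]: Watcher[] } = {};

  // Stores data at a path and notifies anyone watching that exact path.
  set(path: string, data: string): void {
    this.nodes[path] = data;
    const list = this.watchers[path] || [];
    for (const w of list) {
      w(path, data);
    }
  }

  get(path: string): string | undefined {
    return this.nodes[path];
  }

  // Registers a persistent watcher (real ZooKeeper watches fire only once).
  watch(path: string, watcher: Watcher): void {
    const list = this.watchers[path] || [];
    list.push(watcher);
    this.watchers[path] = list;
  }
}

// Runs the three participant steps against the store. onTransition is the
// application callback (for example, an HTTP call to the node.js process).
function startParticipant(
  store: FakeStore,
  name: string,
  onTransition: (toState: string) => void
): void {
  // Step 1: announce liveness (an ephemeral node in real ZooKeeper).
  store.set("/LIVEINSTANCES/" + name, "alive");
  // Step 2: watch the message node for transition requests.
  store.watch("/INSTANCES/" + name + "/MESSAGES", (_path, toState) => {
    onTransition(toState);
    // Step 3: record the new state once the transition has completed.
    store.set("/INSTANCES/" + name + "/CURRENTSTATE", toState);
  });
}
```

In this toy the controller's role is played by whoever writes to the MESSAGES
path. Connection loss and session expiry, which kishore flags as needing more
thought, are not modelled at all.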
