Hi Lance,

That's a fairly simple fix. I will provide the fix tomorrow.
Thanks,
Kishore G

On Sun, Jun 23, 2013 at 2:18 PM, Lance Co Ting Keh <[email protected]> wrote:

> Hi Kishore,
>
> Hope you are having a restful weekend. I was just wondering when I should
> normally expect the bug fix to go through?
>
> Thank you very much,
> Lance
>
> On Tue, Jun 18, 2013 at 1:36 PM, Lance Co Ting Keh <[email protected]> wrote:
>
>> Thanks Kishore, here is the link to the bug:
>> https://issues.apache.org/jira/browse/HELIX-131
>>
>> On Tue, Jun 18, 2013 at 9:13 AM, kishore g <[email protected]> wrote:
>>
>>> My bad, I didn't realize that you needed HelixAdmin to actually create
>>> the cluster. Please file a bug; the fix is quite simple.
>>>
>>> Thanks,
>>> Kishore G
>>>
>>> On Tue, Jun 18, 2013 at 9:00 AM, Lance Co Ting Keh <[email protected]> wrote:
>>>
>>>> Thanks Kishore. Would you like me to file a bug report for the first
>>>> solution?
>>>>
>>>> Also, with the use of the factory, I get the following error message:
>>>>
>>>>   [error] org.apache.helix.HelixException: Initial cluster structure is
>>>>   not set up for cluster: dev-box-cluster
>>>>
>>>> It seems it did not create the appropriate zNodes for me. Was there
>>>> something I was supposed to initialize before calling the factory?
>>>>
>>>> Thank you,
>>>> Lance
>>>>
>>>> On Mon, Jun 17, 2013 at 8:09 PM, kishore g <[email protected]> wrote:
>>>>
>>>>> Hi Lance,
>>>>>
>>>>> It looks like we are not setting the connection timeout while
>>>>> connecting to ZooKeeper in ZKHelixAdmin.
>>>>>
>>>>> The fix is to change line 99 in ZkHelixAdmin.java from
>>>>>
>>>>>   _zkClient = new ZkClient(zkAddress);
>>>>>
>>>>> to
>>>>>
>>>>>   _zkClient = new ZkClient(zkAddress, timeout * 1000);
>>>>>
>>>>> Another workaround is to use HelixManager to get HelixAdmin:
>>>>>
>>>>>   manager = HelixManagerFactory.getZKHelixManager(cluster, "Admin",
>>>>>       InstanceType.ADMINISTRATOR, zkAddress);
>>>>>   manager.connect();
>>>>>   admin = manager.getClusterManagmentTool();
>>>>>
>>>>> This will wait for 60 seconds before failing.
>>>>> Thanks,
>>>>> Kishore G
>>>>>
>>>>> On Mon, Jun 17, 2013 at 6:15 PM, Lance Co Ting Keh <[email protected]> wrote:
>>>>>
>>>>>> Thank you, Kishore. I'll definitely test the memory consumption of one
>>>>>> JVM per node.js server first. If it's too much, we'll likely go with
>>>>>> your proposed design but execute kills via the OS. This is to ensure
>>>>>> there are no rogue servers.
>>>>>>
>>>>>> I have a small implementation question. When calling new ZKHelixAdmin
>>>>>> (val admin = new ZKHelixAdmin("")), if it fails it retries again and
>>>>>> again indefinitely. Is there a method I can override to limit the
>>>>>> number of reconnects and just have it fail?
>>>>>>
>>>>>> Lance
>>>>>>
>>>>>> On Sun, Jun 16, 2013 at 11:56 PM, kishore g <[email protected]> wrote:
>>>>>>
>>>>>>> Hi Lance,
>>>>>>>
>>>>>>> Looks good to me. Having a JVM per node.js server might add
>>>>>>> additional overhead; you should definitely run this with the
>>>>>>> production configuration and ensure that it does not impact
>>>>>>> performance. If you find it consuming too many resources, you can
>>>>>>> try this approach instead:
>>>>>>>
>>>>>>> 1. Have one agent per node.
>>>>>>> 2. Instead of creating a separate Helix agent per node.js process,
>>>>>>> create multiple participants within the same agent. Each participant
>>>>>>> represents one node.js process.
>>>>>>> 3. The monitoring of participant LIVEINSTANCES and the killing of
>>>>>>> node.js processes can be done by one of the Helix agents. You create
>>>>>>> another resource using the leader-standby model. Only one Helix agent
>>>>>>> will be the leader; it will monitor the LIVEINSTANCES, and if any
>>>>>>> Helix agent dies it can ask the node.js servers to kill themselves
>>>>>>> (you can use HTTP or any other mechanism of your choice). The idea
>>>>>>> here is to designate one leader in the system to ensure that the
>>>>>>> Helix agent and node.js act like a pair.
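On Lance's earlier question about limiting the infinite reconnects of new ZKHelixAdmin: until Helix exposes such a setting, one workaround is to bound the attempts from the caller's side. A minimal sketch in plain Java — the `tryConnect` helper and its parameters are illustrative assumptions, not Helix API; you would wrap the `new ZKHelixAdmin(zkAddress)` call inside the `Callable`:

```java
import java.util.concurrent.Callable;

class BoundedRetry {
    /**
     * Runs the given connection attempt at most maxAttempts times, sleeping
     * between attempts. Returns true on the first success, and false once
     * the attempt budget is exhausted instead of retrying forever.
     */
    static boolean tryConnect(Callable<Boolean> attempt,
                              int maxAttempts,
                              long sleepMillis) throws InterruptedException {
        for (int i = 0; i < maxAttempts; i++) {
            try {
                if (attempt.call()) {
                    return true; // connected
                }
            } catch (Exception e) {
                // Treat a thrown exception as a failed attempt; fall through.
            }
            if (i < maxAttempts - 1) {
                Thread.sleep(sleepMillis);
            }
        }
        return false; // give up
    }
}
```

With this wrapper a bad ZooKeeper address fails after a fixed number of attempts rather than looping, and the caller decides what "fail" means (throw, log, or exit).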
>>>>>>> You can try this only if you find that the overhead of the JVM is
>>>>>>> significant with the approach you have listed.
>>>>>>>
>>>>>>> Thanks,
>>>>>>> Kishore G
>>>>>>>
>>>>>>> On Fri, Jun 14, 2013 at 8:37 PM, Lance Co Ting Keh <[email protected]> wrote:
>>>>>>>
>>>>>>>> Thank you for your advice, Santiago. That is certainly part of the
>>>>>>>> design as well.
>>>>>>>>
>>>>>>>> Best,
>>>>>>>> Lance
>>>>>>>>
>>>>>>>> On Fri, Jun 14, 2013 at 5:32 PM, Santiago Perez <[email protected]> wrote:
>>>>>>>>
>>>>>>>>> Helix user here (not developer), so take my words with a grain of
>>>>>>>>> salt.
>>>>>>>>>
>>>>>>>>> Regarding your last point (the node.js instance watching
>>>>>>>>> LIVEINSTANCES): you might want to consider the behavior of the
>>>>>>>>> node.js instance if that instance loses its own connection to ZK;
>>>>>>>>> you'll probably want to kill it too, otherwise you could be
>>>>>>>>> ignoring the fact that the JVM lost the connection as well.
>>>>>>>>>
>>>>>>>>> Regards,
>>>>>>>>> Santiago
>>>>>>>>>
>>>>>>>>> On Fri, Jun 14, 2013 at 6:30 PM, Lance Co Ting Keh <[email protected]> wrote:
>>>>>>>>>
>>>>>>>>>> We have a working prototype of basically something like #2 you
>>>>>>>>>> proposed above. We're using the standard Helix participant, and in
>>>>>>>>>> the @Transition handlers of the state model we send commands to
>>>>>>>>>> node.js via HTTP.
>>>>>>>>>>
>>>>>>>>>> I want to run you through our general architecture to make sure we
>>>>>>>>>> are not violating anything on the Helix side. As a reminder, what
>>>>>>>>>> we need to guarantee is that at any given time one and only one
>>>>>>>>>> node.js process is in charge of a task.
>>>>>>>>>>
>>>>>>>>>> 1. A machine with N cores will have N (pending testing) node.js
>>>>>>>>>> processes running.
>>>>>>>>>> 2. Associated with each of the N node processes are also N Helix
>>>>>>>>>> participants (separate JVM instances -- the reason for this comes
>>>>>>>>>> later).
>>>>>>>>>> 3. A separate Helix controller will be running on the machine and
>>>>>>>>>> will just leader-elect between machines.
>>>>>>>>>> 4. The spectator router will likely be HAProxy, and thus the Linux
>>>>>>>>>> machine running it will also run a JVM to serve as the Helix
>>>>>>>>>> spectator.
>>>>>>>>>> 5. The state machine for each will simply be the ONLINEOFFLINE
>>>>>>>>>> model. (However, I do get error messages saying that I haven't
>>>>>>>>>> defined an OFFLINE-to-DROPPED transition; I was going to ask you
>>>>>>>>>> about this, but it is a minor detail compared to the rest of the
>>>>>>>>>> architecture.)
>>>>>>>>>> 6. A simple bash script will serve as a watchdog on each node.js
>>>>>>>>>> and Helix participant pair. If either of the two is "dead", the
>>>>>>>>>> other process must immediately be SIGKILLed, hence the need for
>>>>>>>>>> one JVM serving as a Helix participant for every node.js process.
>>>>>>>>>> 7. Each node.js instance sets a watch on /LIVEINSTANCES directly
>>>>>>>>>> in ZooKeeper as an extra safety blanket. If it finds that it is
>>>>>>>>>> NOT in the live instances, it likely means that its JVM
>>>>>>>>>> participant lost its connection to ZooKeeper but the process is
>>>>>>>>>> still running, so the bash script has not terminated the node
>>>>>>>>>> server. In this case the node server must end its own process.
>>>>>>>>>>
>>>>>>>>>> Thank you for all your help.
>>>>>>>>>>
>>>>>>>>>> Sincerely,
>>>>>>>>>> Lance
>>>>>>>>>>
>>>>>>>>>> On Wed, Jun 12, 2013 at 9:07 PM, kishore g <[email protected]> wrote:
>>>>>>>>>>
>>>>>>>>>>> Hi Lance,
>>>>>>>>>>>
>>>>>>>>>>> Thanks for your interest in Helix. There are two possible
>>>>>>>>>>> approaches.
>>>>>>>>>>>
>>>>>>>>>>> 1. Similar to what you suggested: write a Helix participant in a
>>>>>>>>>>> non-JVM language, which in your case is node.js. There seem to be
>>>>>>>>>>> quite a few implementations in node.js that can interact with
>>>>>>>>>>> ZooKeeper.
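The self-termination check Lance describes (a node.js instance watching /LIVEINSTANCES and killing itself if absent), combined with Santiago's caveat about the watcher's own connection, reduces to a small pure decision function. A sketch in Java for illustration — names are assumptions, and a node.js implementation would apply the same logic to the children of /LIVEINSTANCES:

```java
import java.util.Set;

class LivenessCheck {
    /**
     * Given whether this process's own ZooKeeper connection is alive, the
     * current children of /LIVEINSTANCES, and the name of the paired Helix
     * participant, decide whether the process should kill itself.
     */
    static boolean shouldSelfTerminate(boolean ownZkConnected,
                                       Set<String> liveInstances,
                                       String participantName) {
        // If we cannot observe LIVEINSTANCES at all, err on the side of
        // dying (Santiago's point about losing our own ZK connection).
        if (!ownZkConnected) {
            return true;
        }
        // Absence from the live set means the paired JVM participant has
        // lost its ZooKeeper session, so this process must stop too.
        return !liveInstances.contains(participantName);
    }
}
```

The watchdog pair then stays symmetric: the bash script kills whichever half dies first, and this check covers the case where the JVM's session expired while both processes are still running.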
>>>>>>>>>>> A Helix participant does the following (you got it right, but I
>>>>>>>>>>> am providing the right sequence):
>>>>>>>>>>>
>>>>>>>>>>> 1. Create an ephemeral node under LIVEINSTANCES.
>>>>>>>>>>> 2. Watch the /INSTANCES/<PARTICIPANT_NAME>/MESSAGES node for
>>>>>>>>>>> transitions.
>>>>>>>>>>> 3. After a transition is completed, update
>>>>>>>>>>> /INSTANCES/<PARTICIPANT_NAME>/CURRENTSTATE.
>>>>>>>>>>>
>>>>>>>>>>> The controller does most of the heavy lifting of ensuring that
>>>>>>>>>>> these transitions lead to the desired configuration. It's quite
>>>>>>>>>>> easy to re-implement this in any other language; the most
>>>>>>>>>>> difficult part would be the ZooKeeper binding. We have used the
>>>>>>>>>>> Java bindings and they are solid. This is at a very high level;
>>>>>>>>>>> there are some more details I have left out, like handling
>>>>>>>>>>> connection loss/session expiry etc., that will require some
>>>>>>>>>>> thinking.
>>>>>>>>>>>
>>>>>>>>>>> 2. The other option is to use the Helix agent as a proxy: we
>>>>>>>>>>> added the Helix agent as part of 0.6.1; we haven't documented it
>>>>>>>>>>> yet. Here is the gist of what it does. Think of it as a generic
>>>>>>>>>>> state transition handler. You can configure Helix to run a
>>>>>>>>>>> specific system command as part of each transition. The Helix
>>>>>>>>>>> agent is a separate process that runs alongside your actual
>>>>>>>>>>> process. Instead of the actual process getting the transition,
>>>>>>>>>>> the Helix agent gets the transition. As part of this transition
>>>>>>>>>>> the Helix agent can invoke APIs on the actual process via RPC,
>>>>>>>>>>> HTTP, etc. The Helix agent simply acts as a proxy to the actual
>>>>>>>>>>> process.
>>>>>>>>>>>
>>>>>>>>>>> I have another approach and will try to write it up tonight, but
>>>>>>>>>>> before that I have a few questions:
>>>>>>>>>>>
>>>>>>>>>>> 1. How many node.js servers run on each node: one, or more than
>>>>>>>>>>> one?
>>>>>>>>>>> 2. Is the spectator/router Java or non-Java based?
>>>>>>>>>>> 3. Can you provide more details about your state machine?
>>>>>>>>>>>
>>>>>>>>>>> Thanks,
>>>>>>>>>>> Kishore G
>>>>>>>>>>>
>>>>>>>>>>> On Wed, Jun 12, 2013 at 11:07 AM, Lance Co Ting Keh <[email protected]> wrote:
>>>>>>>>>>>
>>>>>>>>>>>> Hi, my name is Lance Co Ting Keh and I work at Box. You guys did
>>>>>>>>>>>> a tremendous job with Helix. We are looking to use it to manage
>>>>>>>>>>>> a cluster primarily running Node.js. Our model for using Helix
>>>>>>>>>>>> would be to have node.js or some other non-JVM library be the
>>>>>>>>>>>> *Participants*, a router as a *Spectator*, and another set of
>>>>>>>>>>>> machines to serve as the *Controllers* (pending testing, we may
>>>>>>>>>>>> just run master-slave controllers on the same instances as the
>>>>>>>>>>>> Participants). The participants will interact with ZooKeeper in
>>>>>>>>>>>> two ways: one is to receive Helix state transition messages
>>>>>>>>>>>> through the instance of the HelixManager <Participant>, and the
>>>>>>>>>>>> other is to interact with ZooKeeper directly just to maintain
>>>>>>>>>>>> ephemeral nodes within /INSTANCES. Maintaining ephemeral nodes
>>>>>>>>>>>> directly in ZooKeeper would be done instead of using
>>>>>>>>>>>> InstanceConfig and calling addInstance on HelixAdmin because of
>>>>>>>>>>>> the basic health checking baked into maintaining ephemeral
>>>>>>>>>>>> nodes. Otherwise we would have to write a health checker between
>>>>>>>>>>>> Node.js and the JVM running the Participant. Are there better
>>>>>>>>>>>> alternatives for non-JVM Helix participants?
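For reference, the znodes a non-JVM participant would touch, following the three-step sequence Kishore gives above, can be written down as simple path builders. This is a sketch based only on the paths named in this thread; the exact layout in a given Helix release (for example, current states being nested under a session id) should be verified before relying on it:

```java
class HelixPaths {
    // Ephemeral node whose presence marks the participant as live (step 1).
    static String liveInstance(String cluster, String participant) {
        return "/" + cluster + "/LIVEINSTANCES/" + participant;
    }

    // Node to watch for state-transition messages from the controller (step 2).
    static String messages(String cluster, String participant) {
        return "/" + cluster + "/INSTANCES/" + participant + "/MESSAGES";
    }

    // Node the participant updates after completing a transition (step 3).
    static String currentState(String cluster, String participant) {
        return "/" + cluster + "/INSTANCES/" + participant + "/CURRENTSTATE";
    }
}
```

A node.js participant would create `liveInstance(...)` as an ephemeral znode on connect, set a watch on `messages(...)`, and write to `currentState(...)` after each handled transition.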
>>>>>>>>>>>> I corresponded with Kishore briefly, and he mentioned Helix
>>>>>>>>>>>> agents, specifically ProcessMonitorThread, which came out in the
>>>>>>>>>>>> last release.
>>>>>>>>>>>>
>>>>>>>>>>>> Thank you very much!
>>>>>>>>>>>>
>>>>>>>>>>>> Lance Co Ting Keh
