Hi Kishore,

Hope you are having a restful weekend. I was wondering when I should expect the bug fix to go through?
Thank you very much,
Lance

On Tue, Jun 18, 2013 at 1:36 PM, Lance Co Ting Keh <[email protected]> wrote:
> Thanks Kishore, here is the link to the bug:
> https://issues.apache.org/jira/browse/HELIX-131
>
> On Tue, Jun 18, 2013 at 9:13 AM, kishore g <[email protected]> wrote:
>> My bad, I didn't realize that you needed HelixAdmin to actually create
>> the cluster. Please file a bug; the fix is quite simple.
>>
>> Thanks,
>> Kishore G
>>
>> On Tue, Jun 18, 2013 at 9:00 AM, Lance Co Ting Keh <[email protected]> wrote:
>>> Thanks Kishore. Would you like me to file a bug for the first
>>> solution?
>>>
>>> Also, with the use of the factory I get the following error message:
>>>
>>>     [error] org.apache.helix.HelixException: Initial cluster structure
>>>     is not set up for cluster: dev-box-cluster
>>>
>>> It seems it did not create the appropriate zNodes for me. Was there
>>> something I was supposed to initialize before calling the factory?
>>>
>>> Thank you,
>>> Lance
>>>
>>> On Mon, Jun 17, 2013 at 8:09 PM, kishore g <[email protected]> wrote:
>>>> Hi Lance,
>>>>
>>>> Looks like we are not setting the connection timeout while connecting
>>>> to ZooKeeper in ZKHelixAdmin.
>>>>
>>>> The fix is to change line 99 in ZKHelixAdmin.java from
>>>>
>>>>     _zkClient = new ZkClient(zkAddress);
>>>>
>>>> to
>>>>
>>>>     _zkClient = new ZkClient(zkAddress, timeout * 1000);
>>>>
>>>> Another workaround is to use HelixManager to get a HelixAdmin:
>>>>
>>>>     manager = HelixManagerFactory.getZKHelixManager(cluster, "Admin",
>>>>         InstanceType.ADMINISTRATOR, zkAddress);
>>>>     manager.connect();
>>>>     admin = manager.getClusterManagmentTool();
>>>>
>>>> This will wait for 60 seconds before failing.
>>>>
>>>> Thanks,
>>>> Kishore G
>>>>
>>>> On Mon, Jun 17, 2013 at 6:15 PM, Lance Co Ting Keh <[email protected]> wrote:
>>>>> Thank you, Kishore. I'll definitely test the memory consumption of
>>>>> one JVM per node.js server first.
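A side note on the fix quoted above: both the missing ZkClient timeout and Lance's later complaint about ZKHelixAdmin retrying forever come down to bounding the connect attempt. Helix's ZkClient does not expose a retry cap, so the plain-JDK sketch below is a hypothetical wrapper, not Helix API; the class name, method name, and parameters are invented for illustration. It shows how a caller can bail out after a fixed number of attempts instead of looping indefinitely:

```java
import java.util.concurrent.Callable;

// Hypothetical helper (not part of Helix): retry a connect call a bounded
// number of times, then fail, instead of retrying forever.
public class BoundedRetry {
    public static <T> T connectWithRetries(Callable<T> connect, int maxAttempts) {
        Exception last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return connect.call();   // succeeded: hand back the client
            } catch (Exception e) {
                last = e;                // remember the failure; a real version
                                         // would also back off before retrying
            }
        }
        throw new IllegalStateException(
            "gave up after " + maxAttempts + " attempts", last);
    }
}
```

Something like `connectWithRetries(() -> new ZKHelixAdmin(zkAddress), 3)` would then throw after three failed constructions instead of retrying infinitely; a production version would add a backoff between attempts.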
>>>>> If it's too much, we'll likely go with your proposed design but
>>>>> execute kills via the OS. This is to ensure there are no rogue
>>>>> servers.
>>>>>
>>>>> I have a small implementation question: when a call to new
>>>>> ZKHelixAdmin fails (val admin = new ZKHelixAdmin("")), it retries
>>>>> again and again infinitely. Is there a method I can override to
>>>>> limit the number of reconnects and just have it fail?
>>>>>
>>>>> Lance
>>>>>
>>>>> On Sun, Jun 16, 2013 at 11:56 PM, kishore g <[email protected]> wrote:
>>>>>> Hi Lance,
>>>>>>
>>>>>> Looks good to me. Having a JVM per node.js server might add
>>>>>> additional overhead; you should definitely run this with production
>>>>>> configuration and ensure that it does not impact performance. If
>>>>>> you find it consuming too many resources, you can try this
>>>>>> approach:
>>>>>>
>>>>>>   1. Have one Helix agent per node.
>>>>>>   2. Instead of creating a separate Helix agent per node.js
>>>>>>      process, create multiple participants within the same agent.
>>>>>>      Each participant represents one node.js process.
>>>>>>   3. Let one of the Helix agents handle the monitoring of
>>>>>>      participant LIVEINSTANCES and the killing of node.js
>>>>>>      processes. You create another resource using the
>>>>>>      leader-standby model; only one Helix agent will be the leader,
>>>>>>      and it will monitor the LIVEINSTANCES. If any Helix agent
>>>>>>      dies, the leader can ask that agent's node.js server to kill
>>>>>>      itself (you can use HTTP or any other mechanism of your
>>>>>>      choice). The idea is to designate one leader in the system to
>>>>>>      ensure that each Helix agent and its node.js process act as a
>>>>>>      pair.
>>>>>>
>>>>>> Try this only if you find that the JVM overhead is significant with
>>>>>> the approach you have listed.
>>>>>>
>>>>>> Thanks,
>>>>>> Kishore G
>>>>>>
>>>>>> On Fri, Jun 14, 2013 at 8:37 PM, Lance Co Ting Keh <[email protected]> wrote:
>>>>>>> Thank you for your advice, Santiago.
>>>>>>> That is certainly part of the design as well.
>>>>>>>
>>>>>>> Best,
>>>>>>> Lance
>>>>>>>
>>>>>>> On Fri, Jun 14, 2013 at 5:32 PM, Santiago Perez <[email protected]> wrote:
>>>>>>>> Helix user here (not a developer), so take my words with a grain
>>>>>>>> of salt.
>>>>>>>>
>>>>>>>> Regarding point 7: you might want to consider the behavior of the
>>>>>>>> node.js instance if that instance itself loses its connection to
>>>>>>>> ZK; you'll probably want to kill it then too. Otherwise you could
>>>>>>>> just as well ignore the fact that the JVM lost its connection.
>>>>>>>>
>>>>>>>> Regards,
>>>>>>>> Santiago
>>>>>>>>
>>>>>>>> On Fri, Jun 14, 2013 at 6:30 PM, Lance Co Ting Keh <[email protected]> wrote:
>>>>>>>>> We have a working prototype of basically something like the #2
>>>>>>>>> you proposed above. We're using the standard Helix participant,
>>>>>>>>> and in the @Transition methods of the state model we send
>>>>>>>>> commands to node.js via HTTP.
>>>>>>>>>
>>>>>>>>> I want to run you through our general architecture to make sure
>>>>>>>>> we are not violating anything on the Helix side. As a reminder,
>>>>>>>>> what we need to guarantee is that at any given time one and only
>>>>>>>>> one node.js process is in charge of a task.
>>>>>>>>>
>>>>>>>>>   1. A machine with N cores will have N (pending testing)
>>>>>>>>>      node.js processes running.
>>>>>>>>>   2. Associated with each of the N node processes are also N
>>>>>>>>>      Helix participants (separate JVM instances; the reason for
>>>>>>>>>      this comes later).
>>>>>>>>>   3. A separate Helix controller will be running on each
>>>>>>>>>      machine, and the controllers will simply leader-elect
>>>>>>>>>      between machines.
>>>>>>>>>   4. The spectator/router will likely be HAProxy, so the Linux
>>>>>>>>>      box running it will also run a JVM to serve as the Helix
>>>>>>>>>      spectator.
>>>>>>>>>   5. The state machine for each will simply be the OnlineOffline
>>>>>>>>>      model. (However, I do get error messages saying that I
>>>>>>>>>      haven't defined an OFFLINE-to-DROPPED transition; I was
>>>>>>>>>      going to ask you about this, but it is a minor detail
>>>>>>>>>      compared to the rest of the architecture.)
>>>>>>>>>   6. A simple bash script will serve as a watchdog on each
>>>>>>>>>      node.js and Helix participant pair. If either of the two is
>>>>>>>>>      "dead", the other process must immediately be SIGKILLed;
>>>>>>>>>      hence the need for one JVM serving as a Helix participant
>>>>>>>>>      for every node.js process.
>>>>>>>>>   7. Each node.js instance sets a watch on /LIVEINSTANCES
>>>>>>>>>      directly in ZooKeeper as an extra safety blanket. If it
>>>>>>>>>      finds that it is NOT in the live instances, it likely means
>>>>>>>>>      that its JVM participant lost its connection to ZooKeeper
>>>>>>>>>      while the process is still running, so the bash script has
>>>>>>>>>      not terminated the node server. In this case the node
>>>>>>>>>      server must end its own process.
>>>>>>>>>
>>>>>>>>> Thank you for all your help.
>>>>>>>>>
>>>>>>>>> Sincerely,
>>>>>>>>> Lance
>>>>>>>>>
>>>>>>>>> On Wed, Jun 12, 2013 at 9:07 PM, kishore g <[email protected]> wrote:
>>>>>>>>>> Hi Lance,
>>>>>>>>>>
>>>>>>>>>> Thanks for your interest in Helix. There are two possible
>>>>>>>>>> approaches.
>>>>>>>>>>
>>>>>>>>>> 1. Similar to what you suggested: write a Helix participant in
>>>>>>>>>> a non-JVM language, which in your case is node.js. There seem
>>>>>>>>>> to be quite a few node.js implementations that can interact
>>>>>>>>>> with ZooKeeper. A Helix participant does the following (you got
>>>>>>>>>> it right, but I am providing the right sequence):
>>>>>>>>>>
>>>>>>>>>>   1. Create an ephemeral node under LIVEINSTANCES.
>>>>>>>>>>   2. Watch the /INSTANCES/<PARTICIPANT_NAME>/MESSAGES node for
>>>>>>>>>>      transitions.
>>>>>>>>>>   3. After a transition is completed, update
>>>>>>>>>>      /INSTANCES/<PARTICIPANT_NAME>/CURRENTSTATE.
>>>>>>>>>>
>>>>>>>>>> The controller does most of the heavy lifting of ensuring that
>>>>>>>>>> these transitions lead to the desired configuration. It's quite
>>>>>>>>>> easy to re-implement this in any other language; the most
>>>>>>>>>> difficult part would be the ZooKeeper binding. We have used the
>>>>>>>>>> Java bindings and they are solid. This is at a very high level;
>>>>>>>>>> there are some details I have left out, like handling
>>>>>>>>>> connection loss and session expiry, that will require some
>>>>>>>>>> thinking.
>>>>>>>>>>
>>>>>>>>>> 2. The other option is to use the Helix agent as a proxy. We
>>>>>>>>>> added the Helix agent as part of 0.6.1; we haven't documented
>>>>>>>>>> it yet. Here is the gist of what it does: think of it as a
>>>>>>>>>> generic state transition handler. You can configure Helix to
>>>>>>>>>> run a specific system command as part of each transition. The
>>>>>>>>>> Helix agent is a separate process that runs alongside your
>>>>>>>>>> actual process; instead of the actual process receiving the
>>>>>>>>>> transition, the Helix agent receives it. As part of the
>>>>>>>>>> transition, the Helix agent can invoke APIs on the actual
>>>>>>>>>> process via RPC, HTTP, etc. The Helix agent simply acts as a
>>>>>>>>>> proxy for the actual process.
>>>>>>>>>>
>>>>>>>>>> I have another approach and will try to write it up tonight,
>>>>>>>>>> but before that I have a few questions:
>>>>>>>>>>
>>>>>>>>>>   1. How many node.js servers run on each node: one, or more
>>>>>>>>>>      than one?
>>>>>>>>>>   2. Is the spectator/router Java based or non-Java based?
>>>>>>>>>>   3. Can you provide more details about your state machine?
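To make the three-step participant sequence described above concrete without pulling in Helix or ZooKeeper dependencies, here is a stdlib-only Java sketch in which a plain map stands in for the ZooKeeper tree. In real code, step 1 would create an ephemeral znode and step 2 would set a watch via a ZooKeeper client; the ToyParticipant class and its method names are invented for illustration, not Helix API.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Illustrative participant following the three-step sequence.
// The "store" map stands in for the ZooKeeper tree; a real participant
// would create an ephemeral LIVEINSTANCES znode and watch MESSAGES.
public class ToyParticipant {
    private final String name;
    private final Map<String, String> store;

    public ToyParticipant(String name, Map<String, String> store) {
        this.name = name;
        this.store = store;
    }

    // Step 1: announce liveness under LIVEINSTANCES.
    public void register() {
        store.put("/LIVEINSTANCES/" + name, "alive");
    }

    // Steps 2 and 3: consume a pending transition message for a partition
    // and record the resulting state under CURRENTSTATE.
    public void handleMessage(String partition) {
        String key = "/INSTANCES/" + name + "/MESSAGES/" + partition;
        String toState = store.remove(key);   // take the pending message
        if (toState != null) {
            store.put("/INSTANCES/" + name + "/CURRENTSTATE/" + partition, toState);
        }
    }
}
```

A controller would write, say, /INSTANCES/node1/MESSAGES/p0 = ONLINE and later read the CURRENTSTATE path to verify the transition took effect.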
>>>>>>>>>> Thanks,
>>>>>>>>>> Kishore G
>>>>>>>>>>
>>>>>>>>>> On Wed, Jun 12, 2013 at 11:07 AM, Lance Co Ting Keh <[email protected]> wrote:
>>>>>>>>>>> Hi, my name is Lance Co Ting Keh and I work at Box. You guys
>>>>>>>>>>> did a tremendous job with Helix. We are looking to use it to
>>>>>>>>>>> manage a cluster primarily running node.js. Our model for
>>>>>>>>>>> using Helix would be to have node.js or some other non-JVM
>>>>>>>>>>> library be the *Participants*, a router be the *Spectator*,
>>>>>>>>>>> and another set of machines serve as the *Controllers*
>>>>>>>>>>> (pending testing, we may just run master-slave controllers on
>>>>>>>>>>> the same instances as the participants). The participants
>>>>>>>>>>> will interact with ZooKeeper in two ways: one is to receive
>>>>>>>>>>> Helix state transition messages through the
>>>>>>>>>>> HelixManager <Participant> instance, and the other is to
>>>>>>>>>>> interact with ZooKeeper directly, just to maintain ephemeral
>>>>>>>>>>> nodes within /INSTANCES. We would maintain ephemeral nodes
>>>>>>>>>>> directly in ZooKeeper, instead of using InstanceConfig and
>>>>>>>>>>> calling addInstance on HelixAdmin, because of the basic
>>>>>>>>>>> health checking baked into maintaining ephemeral nodes;
>>>>>>>>>>> otherwise we would have to write a health checker between
>>>>>>>>>>> node.js and the JVM running the participant. Are there better
>>>>>>>>>>> alternatives for non-JVM Helix participants? I corresponded
>>>>>>>>>>> with Kishore briefly and he mentioned Helix agents,
>>>>>>>>>>> specifically ProcessMonitorThread, which came out in the last
>>>>>>>>>>> release.
>>>>>>>>>>>
>>>>>>>>>>> Thank you very much!
>>>>>>>>>>>
>>>>>>>>>>> Lance Co Ting Keh
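A closing note on the OFFLINE-to-DROPPED error Lance mentions: Helix's OnlineOffline model has three states (OFFLINE, ONLINE, DROPPED), and the controller expects a handler for the OFFLINE to DROPPED transition so it can drop a partition entirely. The sketch below shows only the callback shape in plain Java; in real Helix the class would extend org.apache.helix.participant.statemachine.StateModel with @Transition-annotated methods, and notifyNodeJs here is a hypothetical stand-in for the HTTP call to the paired node.js server.

```java
import java.util.ArrayList;
import java.util.List;

// Sketch of an OnlineOffline-style transition handler. The OFFLINE ->
// DROPPED callback is the one Lance's error message says was missing.
public class OnlineOfflineHandler {
    private String state = "OFFLINE";
    private final List<String> sent = new ArrayList<>();  // commands "sent" to node.js

    // Hypothetical stand-in for an HTTP call to the paired node.js server.
    private void notifyNodeJs(String command) {
        sent.add(command);
    }

    public void onBecomeOnlineFromOffline() {
        notifyNodeJs("start");     // tell node.js to take charge of the task
        state = "ONLINE";
    }

    public void onBecomeOfflineFromOnline() {
        notifyNodeJs("stop");      // tell node.js to release the task
        state = "OFFLINE";
    }

    // Required so the controller can drop the partition entirely.
    public void onBecomeDroppedFromOffline() {
        notifyNodeJs("cleanup");
        state = "DROPPED";
    }

    public String getState() { return state; }
    public List<String> getSent() { return sent; }
}
```

Defining the DROPPED callback, even as a no-op, should silence the error Lance describes, since the controller's state model requires every reachable transition to have a handler.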
