My bad, I didn't realize that you needed HelixAdmin to actually create the cluster. Please file a bug; the fix is quite simple.
Thanks,
Kishore G

On Tue, Jun 18, 2013 at 9:00 AM, Lance Co Ting Keh <[email protected]> wrote:

> Thanks Kishore. Would you like me to file a bug fix for the first
> solution?
>
> Also, with the use of the factory, I get the following error message:
> [error] org.apache.helix.HelixException: Initial cluster structure is not set up for cluster: dev-box-cluster
>
> It seems it did not create the appropriate znodes for me. Was there
> something I was supposed to initialize before calling the factory?
>
> Thank you
> Lance
>
> On Mon, Jun 17, 2013 at 8:09 PM, kishore g <[email protected]> wrote:
>
>> Hi Lance,
>>
>> Looks like we are not setting the connection timeout while connecting to
>> ZooKeeper in ZKHelixAdmin.
>>
>> The fix is to change line 99 in ZKHelixAdmin.java from
>>   _zkClient = new ZkClient(zkAddress);
>> to
>>   _zkClient = new ZkClient(zkAddress, timeout * 1000);
>>
>> Another workaround is to use HelixManager to get HelixAdmin:
>>
>>   manager = HelixManagerFactory.getZKHelixManager(cluster, "Admin",
>>       InstanceType.ADMINISTRATOR, zkAddress);
>>   manager.connect();
>>   admin = manager.getClusterManagmentTool();
>>
>> This will wait for 60 seconds before failing.
>> Thanks,
>> Kishore G
>>
>> On Mon, Jun 17, 2013 at 6:15 PM, Lance Co Ting Keh <[email protected]> wrote:
>>
>>> Thank you Kishore. I'll definitely test the memory consumption of one JVM
>>> per node.js server first. If it's too much we'll likely do your proposed
>>> design but execute kills via the OS. This is to ensure no rogue servers.
>>>
>>> I have a small implementation question. When calling new ZKHelixAdmin,
>>> if it fails it retries again and again indefinitely
>>> (val admin = new ZKHelixAdmin("")). Is there a method I can override to
>>> limit the number of reconnects and just have it fail?
>>>
>>> Lance
>>>
>>> On Sun, Jun 16, 2013 at 11:56 PM, kishore g <[email protected]> wrote:
>>>
>>>> Hi Lance,
>>>>
>>>> Looks good to me.
>>>> Having a JVM per node.js server might add additional overhead. You
>>>> should definitely run this with a production configuration and ensure
>>>> that it does not impact performance. If you find it consuming too many
>>>> resources, you can probably try this approach:
>>>>
>>>> 1. Have one agent per node.
>>>> 2. Instead of creating a separate Helix agent per node.js process, you
>>>>    can create multiple participants within the same agent. Each
>>>>    participant will represent a node.js process.
>>>> 3. The monitoring of participant LIVEINSTANCES and killing of node.js
>>>>    processes can be done by one of the Helix agents. You create another
>>>>    resource using the leader-standby model. Only one Helix agent will be
>>>>    the leader; it will monitor LIVEINSTANCES, and if any Helix agent
>>>>    dies it can ask the node.js servers to kill themselves (you can use
>>>>    HTTP or any other mechanism of your choice). The idea here is to
>>>>    designate one leader in the system to ensure that the Helix agent and
>>>>    node.js act like a pair.
>>>>
>>>> You can try this only if you find that the overhead of the JVM is
>>>> significant with the approach you have listed.
>>>>
>>>> Thanks,
>>>> Kishore G
>>>>
>>>> On Fri, Jun 14, 2013 at 8:37 PM, Lance Co Ting Keh <[email protected]> wrote:
>>>>
>>>>> Thank you for your advice, Santiago. That is certainly part of the
>>>>> design as well.
>>>>>
>>>>> Best,
>>>>> Lance
>>>>>
>>>>> On Fri, Jun 14, 2013 at 5:32 PM, Santiago Perez <[email protected]> wrote:
>>>>>
>>>>>> Helix user here (not developer), so take my words with a grain of salt.
>>>>>>
>>>>>> Regarding 6, you might want to consider the behavior of the node.js
>>>>>> instance if that instance loses its connection to ZK. You'll probably
>>>>>> want to kill it too; otherwise you could ignore the fact that the JVM
>>>>>> lost the connection too.
>>>>>>
>>>>>> Regards,
>>>>>> Santiago
>>>>>>
>>>>>> On Fri, Jun 14, 2013 at 6:30 PM, Lance Co Ting Keh <[email protected]> wrote:
>>>>>>
>>>>>>> We have a working prototype of basically something like #2 you
>>>>>>> proposed above. We're using the standard Helix participant, and on the
>>>>>>> @Transitions of the state model we send commands to node.js via HTTP.
>>>>>>>
>>>>>>> I want to run you through our general architecture to make sure we
>>>>>>> are not violating anything on the Helix side. As a reminder, what we
>>>>>>> need to guarantee is that at any given time one and only one node.js
>>>>>>> process is in charge of a task.
>>>>>>>
>>>>>>> 1. A machine with N cores will have N (pending testing) node.js
>>>>>>>    processes running.
>>>>>>> 2. Associated with the N node processes are N Helix participants
>>>>>>>    (separate JVM instances; the reason for this comes later).
>>>>>>> 3. A separate Helix controller will be running on the machine and
>>>>>>>    will just leader-elect between machines.
>>>>>>> 4. The spectator router will likely be HAProxy, and thus a Linux
>>>>>>>    kernel will run a JVM to serve as the Helix spectator.
>>>>>>> 5. The state machine for each will simply be ONLINEOFFLINE mode.
>>>>>>>    (However, I do get error messages that say that I haven't defined
>>>>>>>    an OFFLINE to DROPPED mode. I was going to ask you about this, but
>>>>>>>    it is a minor detail compared to the rest of the architecture.)
>>>>>>> 5. A simple Bash script will serve as a watchdog on each node.js and
>>>>>>>    Helix participant pair. If either of the two is "dead", the other
>>>>>>>    process must immediately be SIGKILLed, hence the need for one JVM
>>>>>>>    serving as Helix participant for every node.js process.
>>>>>>> 6. Each node.js instance sets a watch on /LIVEINSTANCES directly in
>>>>>>>    ZooKeeper as an extra safety blanket.
>>>>>>>    If it finds that it is NOT in the live instances, it likely means
>>>>>>>    that its JVM participant lost its connection to ZooKeeper but the
>>>>>>>    process is still running, so the Bash script has not terminated
>>>>>>>    the node server. In this case the node server must end its own
>>>>>>>    process.
>>>>>>>
>>>>>>> Thank you for all your help.
>>>>>>>
>>>>>>> Sincerely,
>>>>>>> Lance
>>>>>>>
>>>>>>> On Wed, Jun 12, 2013 at 9:07 PM, kishore g <[email protected]> wrote:
>>>>>>>
>>>>>>>> Hi Lance,
>>>>>>>>
>>>>>>>> Thanks for your interest in Helix. There are two possible approaches.
>>>>>>>>
>>>>>>>> 1. Similar to what you suggested: write a Helix participant in a
>>>>>>>>    non-JVM language, which in your case is node.js. There seem to be
>>>>>>>>    quite a few implementations in node.js that can interact with
>>>>>>>>    ZooKeeper. A Helix participant does the following (you got it
>>>>>>>>    right, but I am providing the right sequence):
>>>>>>>>
>>>>>>>>    1. Creates an ephemeral node under LIVEINSTANCES.
>>>>>>>>    2. Watches the /INSTANCES/<PARTICIPANT_NAME>/MESSAGES node for
>>>>>>>>       transitions.
>>>>>>>>    3. After a transition is completed, updates
>>>>>>>>       /INSTANCES/<PARTICIPANT_NAME>/CURRENTSTATE.
>>>>>>>>
>>>>>>>>    The controller does most of the heavy lifting of ensuring that
>>>>>>>>    these transitions lead to the desired configuration. It's quite
>>>>>>>>    easy to re-implement this in any other language; the most
>>>>>>>>    difficult thing would be the ZooKeeper binding. We have used the
>>>>>>>>    Java bindings and they are solid. This is at a very high level;
>>>>>>>>    there are some more details I have left out, like handling
>>>>>>>>    connection loss, session expiry, etc., that will require some
>>>>>>>>    thinking.
>>>>>>>>
>>>>>>>> 2. The other option is to use the Helix agent as a proxy. We added
>>>>>>>>    the Helix agent as part of 0.6.1; we haven't documented it yet.
>>>>>>>>    Here is the gist of what it does.
>>>>>>>>    Think of it as a generic state transition handler. You can
>>>>>>>>    configure Helix to run a specific system command as part of each
>>>>>>>>    transition. The Helix agent is a separate process that runs
>>>>>>>>    alongside your actual process. Instead of the actual process
>>>>>>>>    getting the transition, the Helix agent gets the transition. As
>>>>>>>>    part of this transition the Helix agent can invoke APIs on the
>>>>>>>>    actual process via RPC, HTTP, etc. The Helix agent simply acts as
>>>>>>>>    a proxy to the actual process.
>>>>>>>>
>>>>>>>> I have another approach and will try to write it up tonight, but
>>>>>>>> before that I have a few questions:
>>>>>>>>
>>>>>>>> 1. How many node.js servers run on each node: one or more than one?
>>>>>>>> 2. Is the spectator/router Java-based or non-Java-based?
>>>>>>>> 3. Can you provide more details about your state machine?
>>>>>>>>
>>>>>>>> Thanks,
>>>>>>>> Kishore G
>>>>>>>>
>>>>>>>> On Wed, Jun 12, 2013 at 11:07 AM, Lance Co Ting Keh <[email protected]> wrote:
>>>>>>>>
>>>>>>>>> Hi, my name is Lance Co Ting Keh and I work at Box. You guys did a
>>>>>>>>> tremendous job with Helix. We are looking to use it to manage a
>>>>>>>>> cluster primarily running Node.js. Our model for using Helix would
>>>>>>>>> be to have node.js or some other non-JVM library be *Participants*,
>>>>>>>>> a router as a *Spectator*, and another set of machines to serve as
>>>>>>>>> the *Controllers* (pending testing, we may just run master-slave
>>>>>>>>> controllers on the same instances as the Participants).
>>>>>>>>> The participants will be interacting with ZooKeeper in two ways:
>>>>>>>>> one is to receive Helix state transition messages through the
>>>>>>>>> instance of the HelixManager <Participant>, and the other is to
>>>>>>>>> interact directly with ZooKeeper just to maintain ephemeral nodes
>>>>>>>>> within /INSTANCES. Maintaining ephemeral nodes directly in
>>>>>>>>> ZooKeeper would be done instead of using InstanceConfig and calling
>>>>>>>>> addInstance on HelixAdmin because of the basic health checking
>>>>>>>>> baked into maintaining ephemeral nodes. If not, we would then have
>>>>>>>>> to write a health checker from Node.js and the JVM running the
>>>>>>>>> Participant. Are there better alternatives for non-JVM Helix
>>>>>>>>> participants? I corresponded with Kishore briefly and he mentioned
>>>>>>>>> HelixAgents, specifically the ProcessMonitorThread that came out in
>>>>>>>>> the last release.
>>>>>>>>>
>>>>>>>>> Thank you very much!
>>>>>>>>>
>>>>>>>>> Lance Co Ting Keh
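
On the question earlier in this thread about making new ZKHelixAdmin(...) give up instead of retrying forever: until the connection timeout is set in the constructor, one possible workaround is to run the blocking call on a separate thread and stop waiting after a deadline. The sketch below is plain Java with no Helix dependency. BoundedConnect and callWithTimeout are hypothetical names, not part of Helix, and the sleeping Callable only stands in for the blocking constructor. Whether the real constructor stops retrying when its thread is interrupted is an untested assumption.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

/** Hypothetical helper (not part of Helix): bounds how long a blocking call may run. */
final class BoundedConnect {

    private BoundedConnect() {}

    /**
     * Runs blockingCall on a worker thread and waits at most the given timeout.
     * Throws TimeoutException if the call has not returned by then.
     */
    static <T> T callWithTimeout(Callable<T> blockingCall, long timeout, TimeUnit unit)
            throws TimeoutException, ExecutionException, InterruptedException {
        // Daemon thread, so a call that ignores interruption cannot keep the JVM alive.
        ExecutorService executor = Executors.newSingleThreadExecutor(runnable -> {
            Thread worker = new Thread(runnable, "bounded-connect");
            worker.setDaemon(true);
            return worker;
        });
        Future<T> future = executor.submit(blockingCall);
        try {
            return future.get(timeout, unit);
        } catch (TimeoutException e) {
            // Interrupts the worker; this only stops calls that honour interruption.
            future.cancel(true);
            throw e;
        } finally {
            executor.shutdownNow();
        }
    }

    public static void main(String[] args) throws Exception {
        // Stand-in for a constructor that blocks while it keeps retrying,
        // for example: () -> new ZKHelixAdmin(zkAddress)
        Callable<String> neverConnects = () -> {
            Thread.sleep(60_000);
            return "connected";
        };
        try {
            callWithTimeout(neverConnects, 200, TimeUnit.MILLISECONDS);
        } catch (TimeoutException e) {
            System.out.println("gave up after timeout");
        }
    }
}
```

With Helix on the classpath, the stand-in Callable would be replaced by the real constructor call. Setting the timeout inside ZKHelixAdmin, as Kishore suggests, remains the cleaner fix, since this wrapper cannot guarantee that the abandoned worker thread actually stops.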
