Hi Lance,

We have a test case that covers the scenario you described:
https://github.com/apache/incubator-helix/blob/master/helix-core/src/test/java/org/apache/helix/integration/TestStandAloneCMMain.java

Thanks,
Kishore G

On Wed, Jun 26, 2013 at 3:48 PM, Shi Lu wrote:
> Hi Lance,
>
> Here is how leader election works when there are multiple controllers.
> Suppose controllers x, y, and z all try to control the same cluster:
>
> 1. x, y, and z all try to create the ZooKeeper ephemeral node
>    /clusterName/CONTROLLER/LEADER.
>
> 2. Only one controller creates the ephemeral node successfully; it then
>    starts controlling the cluster.
>
> 3. The other controllers fail to create the ephemeral node (the leader has
>    already created it), so they register a ZooKeeper change listener on the
>    /clusterName/CONTROLLER/LEADER node. When that node goes away, they try
>    to create it again, and whichever one succeeds takes control of the
>    cluster.
>
> So in the two-controller case, when you shut down controller A, it may
> take some time for controller B to start controlling the cluster.
>
> Can you share your test code?
>
> Thanks,
> -Shi
>
>
> On Wed, Jun 26, 2013 at 8:43 AM, Lance Co Ting Keh wrote:
>
>> Hi guys,
>>
>> I tried naming the controllers differently. I first had one controller
>> running, and it was printing that isLeader() was true. When I brought up
>> a second, differently named controller, the first controller printed that
>> it was NOT the leader and the new controller became the leader. Then I
>> shut down the current leader (the second controller), but the first
>> controller kept printing that it was NOT the leader. Somehow leader
>> election happened once and did not happen again. The only way I'm
>> creating the controllers is this:
>>
>> controllerManager = HelixControllerMain.startHelixController(zkAddress,
>>     clusterName, "controller", HelixControllerMain.STANDALONE);
>>
>> AND
>>
>> controllerManager = HelixControllerMain.startHelixController(zkAddress,
>>     clusterName, "controller2", HelixControllerMain.STANDALONE);
>>
>> and I'm checking with controllerManager.isLeader(). Am I doing
>> something wrong?
>>
>> Thank you,
>> Lance
>>
>>
>> On Fri, Jun 21, 2013 at 1:51 PM, Lance Co Ting Keh wrote:
>>
>>> Thank you very much for the quick response, guys.
>>>
>>>
>>> On Fri, Jun 21, 2013 at 1:49 PM, Zhen Zhang wrote:
>>>
>>>> Yes. Using different names for the controllers is a quick workaround.
>>>>
>>>> From: Lance Co Ting Keh
>>>> Date: Friday, June 21, 2013 1:47 PM
>>>> Subject: Re: Controller fault tolerance
>>>>
>>>> Okay, thank you. But for now, the quick fix is to make sure the
>>>> controllers have different names?
>>>>
>>>>
>>>> On Fri, Jun 21, 2013 at 1:44 PM, Zhen Zhang wrote:
>>>>
>>>>> This is a known bug in Helix:
>>>>> https://issues.apache.org/jira/browse/HELIX-123
>>>>>
>>>>> The problem is that we compare the controller's instance name but not
>>>>> its session id, so if you start two controllers with the same name,
>>>>> isLeader() returns true on both. We will fix it shortly.
>>>>>
>>>>> Thanks,
>>>>> Jason
>>>>>
>>>>> From: Lance Co Ting Keh
>>>>> Date: Friday, June 21, 2013 1:39 PM
>>>>> Subject: Re: Controller fault tolerance
>>>>>
>>>>> Hi Kishore,
>>>>>
>>>>> I tried starting two controllers programmatically as you suggested:
>>>>>
>>>>> controllerManager = HelixControllerMain.startHelixController(zkAddress,
>>>>>     clusterName, "controller", HelixControllerMain.STANDALONE);
>>>>>
>>>>> I then called isLeader() on both managers
>>>>> (http://helix.incubator.apache.org/apidocs/reference/org/apache/helix/HelixManager.html#isLeader())
>>>>> and both of them returned true.
>>>>> They're both on the same ZooKeeper instance and in the same cluster.
>>>>> The controllers are running, so I'm not sure whether leader election is
>>>>> actually working or whether I'm misinterpreting the isLeader() function.
>>>>>
>>>>> Thanks,
>>>>> Lance
>>>>>
>>>>>
>>>>> On Mon, Jun 17, 2013 at 9:22 AM, Manikumar Reddy wrote:
>>>>>
>>>>>> Hi Kishore,
>>>>>>
>>>>>> Thanks for the quick response.
>>>>>>
>>>>>> Regards,
>>>>>> Kumar
>>>>>>
>>>>>>
>>>>>> On Mon, Jun 17, 2013 at 8:18 PM, kishore g wrote:
>>>>>>
>>>>>>> Hi Kumar,
>>>>>>>
>>>>>>> You can start multiple controllers; only one of them will be active
>>>>>>> and the rest will be in standby mode. If the active controller fails,
>>>>>>> one of the standbys will become active and start managing the cluster.
>>>>>>>
>>>>>>> You can start the controllers either from the command line or
>>>>>>> programmatically.
>>>>>>>
>>>>>>> Command line:
>>>>>>>
>>>>>>> ./run-helix-controller.sh --zkSvr localhost:2199 --cluster <clustername>
>>>>>>>
>>>>>>> Using the Helix API:
>>>>>>>
>>>>>>> controllerManager = HelixControllerMain.startHelixController(zkAddress,
>>>>>>>     clusterName, "controller", HelixControllerMain.STANDALONE);
>>>>>>>
>>>>>>> Hope this helps.
>>>>>>>
>>>>>>> Thanks,
>>>>>>> Kishore G
>>>>>>>
>>>>>>>
>>>>>>> On Mon, Jun 17, 2013 at 7:01 AM, Manikumar Reddy wrote:
>>>>>>>
>>>>>>>> Hi,
>>>>>>>>
>>>>>>>> I am trying to understand the fault-tolerance mechanism of the Helix
>>>>>>>> controller/cluster manager. A single controller would be a single
>>>>>>>> point of failure, so what options or techniques are available to
>>>>>>>> achieve controller fault tolerance? Any pointers, recipes, or code
>>>>>>>> snippets?
>>>>>>>>
>>>>>>>> Regards,
>>>>>>>> Kumar
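Shi's three-step description of controller leader election can be modeled in plain Java. The sketch below is a toy, single-threaded simulation, not Helix or ZooKeeper code: `FakeZk` is a hypothetical in-memory stand-in that models only ephemeral nodes, one-shot delete watches, and session expiry, and the controller names and numeric session ids are illustrative assumptions.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Toy model of the controller leader election described in the thread.
// ASSUMPTION: FakeZk is a hypothetical in-memory stand-in for ZooKeeper,
// not a real client API; it models only what the election needs.
public class LeaderElectionSketch {

    static final String LEADER_PATH = "/clusterName/CONTROLLER/LEADER";

    static class FakeZk {
        // Ephemeral nodes: path -> id of the session that created the node.
        private final Map<String, Long> ephemeral = new HashMap<>();
        // One-shot watches: path -> callbacks to run when the node is deleted.
        private final Map<String, List<Runnable>> deleteWatches = new HashMap<>();

        // Like an ephemeral create: returns false if the node already exists.
        boolean createEphemeral(String path, long sessionId) {
            return ephemeral.putIfAbsent(path, sessionId) == null;
        }

        void watchDelete(String path, Runnable callback) {
            deleteWatches.computeIfAbsent(path, p -> new ArrayList<>()).add(callback);
        }

        Long ownerOf(String path) {
            return ephemeral.get(path);
        }

        // When a session ends, its ephemeral nodes vanish and delete watches fire.
        void expireSession(long sessionId) {
            List<String> deleted = new ArrayList<>();
            for (Map.Entry<String, Long> e : ephemeral.entrySet()) {
                if (e.getValue() == sessionId) deleted.add(e.getKey());
            }
            for (String path : deleted) {
                ephemeral.remove(path);
                List<Runnable> watchers = deleteWatches.remove(path); // one-shot
                if (watchers != null) watchers.forEach(Runnable::run);
            }
        }
    }

    static class Controller {
        final String name;
        final long sessionId;
        private final FakeZk zk;

        Controller(String name, long sessionId, FakeZk zk) {
            this.name = name;
            this.sessionId = sessionId;
            this.zk = zk;
        }

        // Steps 1-3: try to create LEADER; if another controller holds it,
        // watch the node and try again once it is gone.
        void tryToLead() {
            if (!zk.createEphemeral(LEADER_PATH, sessionId)) {
                zk.watchDelete(LEADER_PATH, this::tryToLead);
            }
        }

        // Leader iff the LEADER node was created by this controller's session.
        boolean isLeader() {
            Long owner = zk.ownerOf(LEADER_PATH);
            return owner != null && owner == sessionId;
        }
    }

    public static void main(String[] args) {
        FakeZk zk = new FakeZk();
        Controller x = new Controller("x", 1L, zk);
        Controller y = new Controller("y", 2L, zk);
        Controller z = new Controller("z", 3L, zk);
        for (Controller c : List.of(x, y, z)) c.tryToLead();
        // prints "true false false"
        System.out.println(x.isLeader() + " " + y.isLeader() + " " + z.isLeader());

        zk.expireSession(x.sessionId); // the leader's session ends
        // prints "false true false"
        System.out.println(x.isLeader() + " " + y.isLeader() + " " + z.isLeader());
    }
}
```

In a real deployment the LEADER node disappears only when the leader's ZooKeeper session ends. If a controller process dies without closing its session, ZooKeeper removes its ephemeral nodes only after the session timeout, which is one reason a standby can take a while to take over.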
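Jason's explanation of HELIX-123 in this thread comes down to which fields the leader check compares. The sketch below contrasts a name-only check with a name-plus-session check. It is illustrative only: `LeaderRecord`, `isLeaderByName`, and `isLeaderByNameAndSession` are hypothetical, and it assumes the LEADER node records the winning controller's instance name and ZooKeeper session id. None of these are Helix classes or methods.

```java
// Illustrates the HELIX-123 symptom with hypothetical types (not Helix's API).
// ASSUMPTION: the LEADER node stores the winner's instance name and session id.
public class LeaderCheckSketch {

    static final class LeaderRecord {
        final String instanceName;
        final long sessionId;

        LeaderRecord(String instanceName, long sessionId) {
            this.instanceName = instanceName;
            this.sessionId = sessionId;
        }
    }

    // Buggy check: any controller with the winner's name believes it is leader.
    static boolean isLeaderByName(LeaderRecord leader, String myName, long mySessionId) {
        return leader != null && leader.instanceName.equals(myName);
    }

    // Fixed check: the name and the ZooKeeper session must both match.
    static boolean isLeaderByNameAndSession(LeaderRecord leader, String myName, long mySessionId) {
        return leader != null
                && leader.instanceName.equals(myName)
                && leader.sessionId == mySessionId;
    }

    public static void main(String[] args) {
        // Two controllers both named "controller"; the one in session 101 won.
        LeaderRecord leader = new LeaderRecord("controller", 101L);

        // prints "true true": both same-named controllers think they lead
        System.out.println(isLeaderByName(leader, "controller", 101L) + " "
                + isLeaderByName(leader, "controller", 202L));
        // prints "true false": only the winning session leads
        System.out.println(isLeaderByNameAndSession(leader, "controller", 101L) + " "
                + isLeaderByNameAndSession(leader, "controller", 202L));
    }
}
```

This also shows why giving each controller a distinct name works as the quick workaround discussed above: with different names, even a name-only comparison can no longer match the losing controller.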
