Re: Controller fault tolerance

kishore g Wed, 26 Jun 2013 16:13:19 -0700

Hi Lance,

We have a test case that tests the scenario you described.
https://github.com/apache/incubator-helix/blob/master/helix-core/src/test/java/org/apache/helix/integration/TestStandAloneCMMain.java


Thanks,
Kishore G


On Wed, Jun 26, 2013 at 3:48 PM, Shi Lu <[email protected]> wrote:

> Hi Lance:
>
> Here is how the multiple controller leader election works:
>
> Suppose controllers x, y, and z all try to control a cluster:
>
> 1. x, y, and z all try to create a ZooKeeper ephemeral node
> /clusterName/CONTROLLER/LEADER
>
> 2. Only one controller creates the ephemeral node successfully; it then
> starts controlling the cluster.
>
> 3. The other controllers fail to create the ephemeral node (the leader has
> already created it), so they register a ZooKeeper change listener on
> the /clusterName/CONTROLLER/LEADER ephemeral node. If that node
> disappears, they try to create it again, and whichever one succeeds takes
> control of the cluster.
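A rough sketch of the three steps above, in case it helps to see them as code. This is an in-memory simulation only, not Helix's implementation: LeaderNode stands in for the /clusterName/CONTROLLER/LEADER ephemeral node, and every class and method name here (LeaderNode, Controller, campaign, sessionClosed) is made up for illustration. No real ZooKeeper or Helix calls are involved.

```java
import java.util.ArrayList;
import java.util.List;

class LeaderElectionSketch {

    // Hypothetical stand-in for the ephemeral LEADER node in ZooKeeper.
    static final class LeaderNode {
        private Controller owner;                       // null: node does not exist
        private final List<Controller> watchers = new ArrayList<>();

        // Steps 1-2: only the first create succeeds.
        synchronized boolean tryCreate(Controller c) {
            if (owner != null) return false;
            owner = c;
            return true;
        }

        // Step 3: a controller that lost the race watches the node.
        synchronized void watch(Controller c) { watchers.add(c); }

        // Simulates a session ending. If it was the leader's session, the node
        // goes away and every watcher is notified so it can try again.
        void sessionClosed(Controller c) {
            List<Controller> toNotify;
            synchronized (this) {
                watchers.remove(c);
                if (owner != c) return;
                owner = null;
                toNotify = new ArrayList<>(watchers);
                watchers.clear();
            }
            for (Controller w : toNotify) w.campaign();
        }

        synchronized Controller owner() { return owner; }
    }

    static final class Controller {
        final String name;
        private final LeaderNode node;

        Controller(String name, LeaderNode node) { this.name = name; this.node = node; }

        // Try to create the node; on failure, watch it and wait.
        void campaign() {
            if (!node.tryCreate(this)) node.watch(this);
        }

        boolean isLeader() { return node.owner() == this; }

        void shutdown() { node.sessionClosed(this); }
    }

    public static void main(String[] args) {
        LeaderNode node = new LeaderNode();
        Controller x = new Controller("x", node);
        Controller y = new Controller("y", node);
        Controller z = new Controller("z", node);
        x.campaign(); y.campaign(); z.campaign();
        System.out.println("leader: " + node.owner().name);
        x.shutdown();
        System.out.println("leader after x stops: " + node.owner().name);
    }
}
```

In the simulation the handover is immediate. With a real ZooKeeper ensemble, the ephemeral node is removed only once the leader's session is closed or expires, which is one reason a standby may take a while to notice.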
>
> So in the two-controller case, when you shut down controller A, it may
> take some time for controller B to start controlling the cluster.
>
> Can you share your test code?
>
> Thanks,
> -Shi
>
>
> On Wed, Jun 26, 2013 at 8:43 AM, Lance Co Ting Keh <[email protected]> wrote:
>
>> Hi guys,
>>
>> I tried naming the controllers differently. I first had one controller
>> running, and it printed that it "isLeader()". When I brought up a second
>> controller with a different name, the first controller printed that it was
>> NOT the leader, and the new controller became the leader. Then I shut down
>> the current leader (the second controller), but the first controller kept
>> printing that it was NOT the leader. Somehow leader election happened once
>> and did not happen again. The only way I'm starting the controllers is this:
>>
>>       controllerManager = HelixControllerMain.startHelixController(zkAddress,
>>         clusterName, "controller", HelixControllerMain.STANDALONE);
>>
>>  AND
>>
>>       controllerManager = HelixControllerMain.startHelixController(zkAddress,
>>         clusterName, "controller2", HelixControllerMain.STANDALONE);
>>
>> and I'm checking by calling controllerManager.isLeader(). Am I doing
>> something wrong?
>>
>> Thank you
>> Lance
>>
>>
>> On Fri, Jun 21, 2013 at 1:51 PM, Lance Co Ting Keh <[email protected]> wrote:
>>
>>> Thank you very much for the quick response guys
>>>
>>>
>>> On Fri, Jun 21, 2013 at 1:49 PM, Zhen Zhang <[email protected]> wrote:
>>>
>>>>  Yes. Using different names for the controllers is a quick workaround.
>>>>
>>>>   From: Lance Co Ting Keh <[email protected]>
>>>> Reply-To: "[email protected]" <[email protected]>
>>>> Date: Friday, June 21, 2013 1:47 PM
>>>> To: "[email protected]" <[email protected]>
>>>> Subject: Re: Controller fault tolerance
>>>>
>>>>   Okay, thank you. But for now, is the quick fix to make sure the
>>>> controllers are named differently?
>>>>
>>>>
>>>> On Fri, Jun 21, 2013 at 1:44 PM, Zhen Zhang <[email protected]> wrote:
>>>>
>>>>>  This is a known bug in Helix:
>>>>> https://issues.apache.org/jira/browse/HELIX-123
>>>>>
>>>>>  The problem is that we compare only the controller's instance name,
>>>>> not its session id, so if you start two controllers with the same name,
>>>>> isLeader() returns true for both. We will fix it shortly.
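To illustrate the difference between the two checks, here is a small self-contained sketch. It is not the Helix code or the actual fix for HELIX-123. The types (LeaderRecord, Manager), the method names, and the session id strings are all hypothetical, and it assumes the leader node records both the name and the session id of the controller that created it.

```java
class IsLeaderCheckSketch {

    // Hypothetical view of what the LEADER node records about its creator.
    static final class LeaderRecord {
        final String instanceName;
        final String sessionId;

        LeaderRecord(String instanceName, String sessionId) {
            this.instanceName = instanceName;
            this.sessionId = sessionId;
        }
    }

    // Hypothetical controller-side manager with a name and a session id.
    static final class Manager {
        final String instanceName;
        final String sessionId;

        Manager(String instanceName, String sessionId) {
            this.instanceName = instanceName;
            this.sessionId = sessionId;
        }

        LeaderRecord asLeaderRecord() {
            return new LeaderRecord(instanceName, sessionId);
        }

        // Name-only check: every manager that shares the leader's name passes.
        boolean isLeaderByName(LeaderRecord leader) {
            return leader != null && instanceName.equals(leader.instanceName);
        }

        // Name plus session check: only the session that created the node passes.
        boolean isLeaderBySession(LeaderRecord leader) {
            return leader != null
                && instanceName.equals(leader.instanceName)
                && sessionId.equals(leader.sessionId);
        }
    }

    public static void main(String[] args) {
        // Two controllers started with the same name but different sessions.
        Manager first = new Manager("controller", "session-1");
        Manager second = new Manager("controller", "session-2");
        LeaderRecord leader = first.asLeaderRecord();   // first won the election

        System.out.println("by name:    " + first.isLeaderByName(leader)
            + " " + second.isLeaderByName(leader));
        System.out.println("by session: " + first.isLeaderBySession(leader)
            + " " + second.isLeaderBySession(leader));
    }
}
```

With the name-only check both managers report themselves as leader; with the session check only the one that created the node does.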
>>>>>
>>>>>  Thanks,
>>>>> Jason
>>>>>
>>>>>   From: Lance Co Ting Keh <[email protected]>
>>>>> Reply-To: "[email protected]" <[email protected]>
>>>>> Date: Friday, June 21, 2013 1:39 PM
>>>>> To: "[email protected]" <[email protected]>
>>>>> Subject: Re: Controller fault tolerance
>>>>>
>>>>>   Hi Kishore,
>>>>>
>>>>>  I tried starting two controllers programmatically like you mentioned:
>>>>>
>>>>> controllerManager = HelixControllerMain.startHelixController(zkAddress,
>>>>>           clusterName, "controller", HelixControllerMain.STANDALONE);
>>>>>
>>>>> I then called isLeader() on both managers
>>>>> (http://helix.incubator.apache.org/apidocs/reference/org/apache/helix/HelixManager.html#isLeader())
>>>>> and both of them returned true. They're obviously both on the same
>>>>> ZooKeeper instance and in the same cluster. The controllers are running,
>>>>> so I'm not sure whether leader election is actually working properly
>>>>> or I'm misinterpreting the isLeader() function.
>>>>>
>>>>> Thanks
>>>>> Lance
>>>>>
>>>>> On Mon, Jun 17, 2013 at 9:22 AM, Manikumar Reddy <[email protected]> wrote:
>>>>>
>>>>>> Hi Kishore,
>>>>>>
>>>>>> Thanks for the quick response.
>>>>>>
>>>>>> Regards,
>>>>>> Kumar
>>>>>>
>>>>>>
>>>>>> On Mon, Jun 17, 2013 at 8:18 PM, kishore g <[email protected]> wrote:
>>>>>>
>>>>>>> Hi Kumar,
>>>>>>>
>>>>>>>  You can start multiple controllers; only one of them will be
>>>>>>> active and the rest will be in standby mode. If the active controller
>>>>>>> fails, one of the standbys will become active and start managing the
>>>>>>> cluster.
>>>>>>>
>>>>>>>  You can start the controllers either from the command line or
>>>>>>> programmatically.
>>>>>>>
>>>>>>>  Command line:
>>>>>>>
>>>>>>> ./run-helix-controller.sh --zkSvr localhost:2199 --cluster <clustername>
>>>>>>>
>>>>>>>  Using the Helix API:
>>>>>>>
>>>>>>> controllerManager = HelixControllerMain.startHelixController(zkAddress,
>>>>>>>           clusterName, "controller", HelixControllerMain.STANDALONE);
>>>>>>>
>>>>>>> Hope this helps.
>>>>>>>
>>>>>>> thanks,
>>>>>>> Kishore G
>>>>>>>
>>>>>>> On Mon, Jun 17, 2013 at 7:01 AM, Manikumar Reddy <
>>>>>>> [email protected]> wrote:
>>>>>>>
>>>>>>>> Hi,
>>>>>>>>
>>>>>>>> I am trying to understand the Helix controller/cluster manager
>>>>>>>> fault-tolerance mechanism.
>>>>>>>> A single controller is a single point of failure. So what
>>>>>>>> options/techniques are available to
>>>>>>>> achieve controller fault tolerance? Any pointers/recipes/code
>>>>>>>> snippets?
>>>>>>>>
>>>>>>>> Regards,
>>>>>>>> Kumar
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>
>>>>>
>>>>
>>>
>>
>
