Re: How to handle total Zookeeper restart

Ming Fang Mon, 04 Mar 2013 04:50:47 -0800

I manually brought down Zookeeper and erased the Zookeeper data on purpose, as a 
test.
The goal is to find a way to 
1) continue processing in the event of total Helix/Zookeeper failure and loss of 
state.
2) recover gracefully once Helix/Zookeeper are restarted.
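
For goal 2, an application-level approach might look roughly like the sketch
below. It is only an illustration under stated assumptions: ClusterConnection
is a hypothetical stand-in for a HelixManager (it is not a Helix interface),
and stateLooksLost is a placeholder check that the application would have to
supply. Nothing in this thread confirms that Helix exposes such a check. The
sketch only drops the old connection and creates a fresh one; it does not
recreate any cluster configuration that was erased along with the Zookeeper
data.

```java
import java.util.function.BooleanSupplier;
import java.util.function.Supplier;

/** Hypothetical stand-in for a cluster connection such as a HelixManager. */
interface ClusterConnection {
    void connect();
    void disconnect();
}

/** Recreates the connection when the application decides cluster state is gone. */
class ReconnectingClient {
    private final Supplier<ClusterConnection> factory;   // builds a fresh connection
    private final BooleanSupplier stateLooksLost;        // application-supplied check (assumption)
    private ClusterConnection current;
    private int recreations = 0;

    ReconnectingClient(Supplier<ClusterConnection> factory, BooleanSupplier stateLooksLost) {
        this.factory = factory;
        this.stateLooksLost = stateLooksLost;
    }

    /** Creates and connects the initial connection. */
    void start() {
        current = factory.get();
        current.connect();
    }

    /**
     * Intended to be called periodically. Local processing is never stopped here;
     * only the cluster connection is replaced. Returns true if a new connection
     * was created and connected.
     */
    boolean checkOnce() {
        if (!stateLooksLost.getAsBoolean()) {
            return false; // cluster state looks fine, nothing to do
        }
        if (current != null) {
            try {
                current.disconnect();
            } catch (RuntimeException e) {
                // The old session may already be dead; ignore and move on.
            }
        }
        ClusterConnection fresh = factory.get();
        try {
            fresh.connect();
        } catch (RuntimeException e) {
            return false; // Zookeeper still unavailable; retry on the next check
        }
        current = fresh;
        recreations++;
        return true;
    }

    /** Number of times the connection has been recreated. */
    int recreations() {
        return recreations;
    }
}
```

The application would call checkOnce() from a timer or watchdog thread while
its normal processing carries on, which is how the sketch tries to cover goal 1
as well.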


On Mar 4, 2013, at 12:44 AM, kishore g <[email protected]> wrote:

> Hi Ming,
> 
> Helix depends on the data in zookeeper. It's OK for zookeeper to restart, and 
> Helix will handle it, but if zookeeper loses its state (data directory) then 
> unfortunately we cannot recover the state.
> 
> How did you lose the zookeeper cluster (including state)?
> 
> thanks,
> Kishore G
> 
> 
> 
> 
> On Sun, Mar 3, 2013 at 8:58 PM, Ming Fang <[email protected]> wrote:
> Hi
> 
> When I have a working Helix cluster with all participants working fine, and 
> for whatever reason I lose the entire Zookeeper cluster (including all state),
> what is the best way to handle this?
> 
> Ideally I want all the participants to continue working, so that the only 
> capability I would lose is Helix's ability to fail over.
> Upon restart of Zookeeper, the Controllers and Participants should register 
> their latest state back to the new Zookeeper cluster.
> However, my tests thus far show that even though the HelixManagers 
> reconnect, they do not write the necessary data into Zookeeper for the 
> cluster to function correctly.
> For example, the external view callbacks are not showing the participants at 
> all.
> 
> Is this something Helix should handle or is it up to the applications to 
> detect the failure and then recreate new HelixManagers?
> 
> Thanks
> --ming
>
