Re: FYI - Apache ZooKeeper Backup, a Treatise

Flavio Junqueira Thu, 16 Jun 2016 01:04:14 -0700

Great write-up, Jordan, thanks!

Whether to back up zk data or not is possibly an open topic for this community, 
even though we have discussed it at times. My sense has been that, precisely 
because of the issues you mention in your post, it is typically best to have a 
way to recreate the data after a disaster rather than back it up. I think 
there could be three general scenarios in which folks would prefer to back up 
data, but correct me if these aren't accurate:


- The data in zk isn't stored elsewhere, so it can't be recreated: zk isn't a 
regular database, so I'd think it is best not to store such data in it and to 
keep only cluster data or metadata there.
- There is just a lot of data and I'd rather have a shorter time to recover: 
zk in general shouldn't have that much data in db, but let's go with the 
assumption that for the requirements of the application it is a lot. For such a 
case, it probably depends on whether your application can efficiently and 
effectively recover from a backup. Basically, as pointed out in the post, the 
data could be inconsistent and cause trouble if you don't think about the 
corner cases. 
- The code to recreate the zk metadata for my application is super complex: if 
you decide to code against zk, it is good to think about whether reconstructing 
the state after a disaster is doable and, if it is, to design and implement the 
application so that it can reconstruct that state.

Also, we typically provision enough replicas, often replicating across data 
centers, to make sure that the data isn't all gone. Having more replicas does 
not completely rule out the possibility of a disaster, but in such rare cases 
we resort to the expensive path of recreating the data.

I personally have never worked with an application that was taking backups of 
zk data in prod, so I'm really interested in what others think. 

-Flavio


> On 16 Jun 2016, at 00:43, Jordan Zimmerman <jor...@jordanzimmerman.com> wrote:
> 
> FYI - I wrote a blog about backing up ZooKeeper:
> 
> https://www.elastic.co/blog/zookeeper-backup-a-treatise
> 
> -Jordan
