Great write-up, Jordan, thanks! Whether or not to back up zk data is possibly still an open topic for this community, even though we have discussed it at times. My sense has been that, precisely because of the issues you mention in your post, it is typically better to have a way to recreate the data after a disaster than to back it up. I think there are three general scenarios in which folks would prefer to back up data, but please correct me if these aren't accurate:

- The data in zk isn't stored anywhere else, so it can't be recreated: zk isn't a regular database, so I'd think it is best not to store that kind of data in it and to focus on cluster data or metadata.

- There is just a lot of data and I'd rather have a shorter time to recover: zk in general shouldn't hold that much data, but let's go with the assumption that, for the requirements of the application, it is a lot. In such a case, it probably depends on whether your application can efficiently and effectively recover from a backup. Basically, as pointed out in the post, the data could be inconsistent and cause trouble if you don't think about the corner cases.

- The code to recreate the zk metadata for my application is super complex: if you decide to code against zk, it is good to think about whether reconstructing the state after a disaster is doable and, if it is, to design and implement that reconstruction.

Also, we typically provision enough replicas, often replicating across data centers, to make sure that the data isn't all gone. Having more replicas does not completely rule out the possibility of a disaster, but in such rare cases we resort to the expensive path.

I personally have never worked with an application that was taking backups of zk data in prod, so I'm really interested in what others think.

-Flavio

> On 16 Jun 2016, at 00:43, Jordan Zimmerman <jor...@jordanzimmerman.com> wrote:
>
> FYI - I wrote a blog about backing up ZooKeeper:
>
> https://www.elastic.co/blog/zookeeper-backup-a-treatise
>
> -Jordan