I disagree with the original post that this is a problem, even in EC2. Having the persistent copy on disk is exactly what makes the rolling restart work so well.
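To illustrate why the on-disk copy helps a rolling restart, here is a toy sketch in Python. It is not ZooKeeper code: the `Member` class, the JSON snapshot file, and `rolling_restart` are all hypothetical names made up for this example, and real ZooKeeper uses its own snapshot and transaction-log formats. The only point is that a member that loses its memory on restart can reload its state from local disk while the rest of the ensemble keeps a quorum.

```python
import json
import os
import tempfile


class Member:
    """Toy cluster member: in-memory state plus a snapshot file on local disk."""

    def __init__(self, name, data_dir):
        self.name = name
        self.snapshot_path = os.path.join(data_dir, f"{name}.snapshot.json")
        self.state = {}
        self.up = True

    def apply(self, key, value):
        # Update memory, then persist the whole state (a crude "snapshot").
        self.state[key] = value
        with open(self.snapshot_path, "w") as f:
            json.dump(self.state, f)

    def stop(self):
        # Stopping the process loses everything held in memory.
        self.state = {}
        self.up = False

    def start(self):
        # On startup, recover state from the local snapshot if one exists.
        if os.path.exists(self.snapshot_path):
            with open(self.snapshot_path) as f:
                self.state = json.load(f)
        self.up = True


def rolling_restart(members):
    """Restart members one at a time, checking a majority stays up."""
    quorum = len(members) // 2 + 1
    for member in members:
        member.stop()
        assert sum(m.up for m in members) >= quorum, "quorum lost"
        member.start()


def demo():
    with tempfile.TemporaryDirectory() as data_dir:
        members = [Member(f"zk{i}", data_dir) for i in range(3)]
        # Pretend the same write was replicated to every member.
        for member in members:
            member.apply("/config/mode", "active")
        rolling_restart(members)
        return [member.state for member in members]


if __name__ == "__main__":
    print(demo())
```

In this sketch every member comes back with its data intact without copying anything over the network, which is the convenience the local copy buys during a restart.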
I think that the misunderstanding is that this on-disk image is critical to cluster function. It is not critical because it is replicated to all cluster members. This means that any member can disappear and a new instance can replace it with no big cost other than the temporary load of copying the current snapshot from some cluster member.

On Mon, Jul 6, 2009 at 11:33 AM, Mahadev Konar <maha...@yahoo-inc.com> wrote:

> > In the documentation of zookeeper, I have read that
> > zookeeper saves snapshots of the in-memory data in the file system. Is
> > that needed for recovery? Logically, it would be much easier for me if
> > this is not the case.
>
> Yes, zookeeper keeps persistent state on disk. This is used for recovery
> and correctness of zookeeper.