Dominic Williams wrote:
1/ If a node crashes or something else goes wrong, you leave behind
persistent nodes. Over time these will grow and grow, rather like the old
tmp folders used to fill with files under Windows.
That's true. You either need to use ephemeral nodes, or use persistent
nodes together with a "garbage collector" (implicit or explicit gc). In
most cases it's preferable to use ephemerals.
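
To make the "explicit gc" option a bit more concrete, here is a rough
sketch in Python. It is a toy model only: a plain dict stands in for the
znode tree, and it assumes the application records an owner id for each
persistent node and has some way to tell which owners are still alive.
The function name and the paths are made up for illustration; none of
this is the ZooKeeper API.

```python
# Toy model only: a dict stands in for the znode tree; this is not the ZooKeeper API.

def collect_garbage(nodes, live_owners):
    """Delete persistent nodes whose recorded owner is no longer alive.

    nodes: dict mapping path -> owner id (hypothetical bookkeeping that the
           application would have to store itself).
    live_owners: set of owner ids currently known to be running.
    Returns the sorted list of paths that were removed.
    """
    stale = sorted(path for path, owner in nodes.items() if owner not in live_owners)
    for path in stale:
        del nodes[path]
    return stale


nodes = {
    "/locks/lock-0000000001": "worker-a",
    "/locks/lock-0000000002": "worker-b",  # worker-b crashed and left this behind
}
removed = collect_garbage(nodes, live_owners={"worker-a"})
```

With ephemerals you get this cleanup for free when the session goes
away, which is the main reason they are usually the better choice.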
2/ Persistent nodes = nasty scalability *bottleneck* because you're actually
having to write to disk somewhere.
This is not actually how ZK works. All znodes, whether persistent or
ephemeral, are written to disk persistently. Ephemeral nodes are tied to
the session that created them: as long as the session is alive, the
ephemeral node is alive. Sessions themselves are stored
persistently/reliably by the ZK cluster. This means you can shut down
the entire cluster and restart it, and all sessions/ephemerals will be
maintained. Sessions can also move from server to server (if, say,
network connectivity to server A fails, or server A itself fails, then
the client will move to server B). The session and all of its ephemerals
are maintained (well, as long as the client moves within the expiration
timeout value).
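
If it helps, the session/ephemeral relationship can be pictured with a
small Python model. This is only a sketch of the semantics described
above, kept in memory; the class and method names (ToyCluster,
move_session, expire_session) are invented for the example and are not
part of any ZooKeeper client.

```python
# Toy in-memory model, not the ZooKeeper client API; all names here are made up.

class ToyCluster:
    def __init__(self):
        self.sessions = {}    # session id -> server the client is connected to
        self.ephemerals = {}  # path -> owning session id

    def open_session(self, session_id, server):
        self.sessions[session_id] = server

    def create_ephemeral(self, path, session_id):
        if session_id not in self.sessions:
            raise KeyError("no such session: %s" % session_id)
        self.ephemerals[path] = session_id

    def move_session(self, session_id, new_server):
        # Reconnecting to another server keeps the session, so nothing is deleted.
        self.sessions[session_id] = new_server

    def expire_session(self, session_id):
        # Expiry ends the session and removes every ephemeral it owns.
        del self.sessions[session_id]
        for path in [p for p, owner in self.ephemerals.items() if owner == session_id]:
            del self.ephemerals[path]


cluster = ToyCluster()
cluster.open_session("s1", server="A")
cluster.create_ephemeral("/locks/lock-1", "s1")

cluster.move_session("s1", new_server="B")  # server A failed, client moved in time
alive_after_move = "/locks/lock-1" in cluster.ephemerals

cluster.expire_session("s1")                # client did not come back in time
alive_after_expiry = "/locks/lock-1" in cluster.ephemerals
```

The point of the model is that the ephemeral's lifetime follows the
session, not the connection to any one server.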
To avoid this I'm actually thinking of writing a locking system where you
work out the existing chain not by enumerating sequential children, but by
looking at the contents of each temporary lock node to see what it is
waiting on. But... that's quite horrible. Was wondering whether there is
some technical reason why ephemeral nodes can't have children??
There are a few cases to think about.
1) obviously ephemeral nodes can't have persistent children, this just
doesn't make sense
2) ephemeral nodes have an owner - the session that created them. so it
would also not make sense (in my mind at least) to have an ephemeral
/foo with another ephemeral /foo/bar with a different owner.
3) so you are left with "ephemerals can be a child of an ephemeral with
the same owner".
4) there are also issues of order, in particular what the "deletion
order" should be: depth first, breadth first, etc...
I believe the answer so far has been "we don't do this because it's
fairly complicated and we haven't seen any use cases that require it."
In the cases I've seen so far there was either a misunderstanding of how
zk worked, or a simpler way available.
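
For what it's worth, cases 1-4 above can be written down as a small
Python sketch. To be clear, this is a hypothetical rule, not something
ZooKeeper implements (it does not allow any children under an ephemeral
node). The dict layout, the function names, and the choice of
deepest-first deletion are assumptions made just for the illustration.

```python
# Hypothetical rule check; ZooKeeper itself allows no children under an ephemeral node.

def may_create_child(parent, child):
    """parent/child are dicts: {"ephemeral": bool, "owner": session id or None}."""
    if not parent["ephemeral"]:
        return True   # ordinary persistent parent: any child is fine
    if not child["ephemeral"]:
        return False  # case 1: persistent child of an ephemeral
    # cases 2 and 3: an ephemeral child is only sensible with the same owner
    return child["owner"] == parent["owner"]


def deletion_order(paths):
    """One possible answer to case 4: deepest paths first, children before parents."""
    return sorted(paths, key=lambda p: (-p.count("/"), p))


foo = {"ephemeral": True, "owner": "s1"}
case1 = may_create_child(foo, {"ephemeral": False, "owner": None})
case2 = may_create_child(foo, {"ephemeral": True, "owner": "s2"})
case3 = may_create_child(foo, {"ephemeral": True, "owner": "s1"})
order = deletion_order(["/foo", "/foo/bar", "/foo/bar/baz", "/foo/qux"])
```

Even in this toy form you can see that the rule and the deletion order
are extra semantics someone would have to define and maintain.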
Does that make sense? Thoughts?
Patrick