Hi folks,

I was considering using ZooKeeper to implement a replication protocol because of its global ordering guarantee. In my case, operations are logged by creating persistent sequential znodes. Knowing the name of the last applied znode, backups can identify pending operations and apply them in order. Because I want to allow backups to join the system at any time, I will not delete a znode before a checkpoint. Thus, I can end up with thousands of child nodes, and consequently ZooKeeper.getChildren() calls might be very expensive, since a huge list of nodes will be returned.
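Here is a minimal sketch of the backup side of this scheme. It is written in plain Java so it runs without a ZooKeeper server. It assumes the log znodes are created with CreateMode.PERSISTENT_SEQUENTIAL under a plain "op-" prefix, so each name ends in ZooKeeper's 10-digit, zero-padded counter. The class and method names (PendingOps, pendingAfter) are hypothetical. The list passed in stands for what getChildren() would return. getChildren() does not guarantee any ordering of the children, so the sketch sorts by the counter itself.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

/**
 * Hypothetical sketch: compute which log entries a backup still has to apply.
 * Assumes children were created as PERSISTENT_SEQUENTIAL with the prefix "op-",
 * so every name ends in ZooKeeper's 10-digit counter, e.g. "op-0000000012".
 */
public class PendingOps {

    /** Extracts the trailing 10-digit sequence number appended by ZooKeeper. */
    static long sequenceOf(String name) {
        return Long.parseLong(name.substring(name.length() - 10));
    }

    /**
     * Returns the children created after lastApplied, in log order.
     * lastApplied may be null for a backup that has not applied anything yet.
     */
    static List<String> pendingAfter(List<String> children, String lastApplied) {
        long applied = lastApplied == null ? -1 : sequenceOf(lastApplied);
        List<String> pending = new ArrayList<>();
        for (String child : children) {
            if (sequenceOf(child) > applied) {
                pending.add(child);
            }
        }
        // getChildren() gives no ordering guarantee, so sort by the counter.
        pending.sort(Comparator.comparingLong(PendingOps::sequenceOf));
        return pending;
    }

    public static void main(String[] args) {
        // Stand-in for the (unordered) result of getChildren("/log", false).
        List<String> children = List.of(
            "op-0000000013", "op-0000000010", "op-0000000014",
            "op-0000000011", "op-0000000012");
        System.out.println(pendingAfter(children, "op-0000000011"));
    }
}
```

Running main prints [op-0000000012, op-0000000013, op-0000000014]. The whole child list still has to be fetched and scanned even when only a few entries are pending, which is the cost described above.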
I thought of using another znode to store the name of the last created znode. So if the last applied znode was op-11 and the last created znode was op-14, I would try to read op-12 and op-13. However, to protect against partial failure, I have to encode some extra information in the names of the znodes (I am using <session-id>-<local sequence number>). Thus it is not possible to predict their names (they will be op-<almost random string>-<zookeeper seq number>). Consequently, I will have to call getChildren() anyway.

Has anybody faced the same issue? Has anybody found a better solution? I was thinking of extending the ZooKeeper code to provide some kind of indexed access to child znodes, but I don't know how easy or sensible that would be.

Thanks,
André
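The following sketch illustrates why the extra naming information forces a getChildren() call. For illustration only, it assumes that the session id is written in hex and that '-' separates the fields, giving names such as op-1f3a-7-0000000012. TaggedOps, findOwnOp and nameForSequence are hypothetical names. The sketch also assumes one use of the tag: a client that lost its connection during create() can look for its own (session id, local sequence number) tag to see whether the write went through. A backup that only knows a counter value, by contrast, must search the list to recover the full name.

```java
import java.util.List;
import java.util.Optional;

/**
 * Hypothetical sketch of the naming scheme "op-<session-id>-<local seq>-<zk seq>".
 * Assumptions (not from the post): session id in hex, '-' as separator.
 */
public class TaggedOps {

    static final class OpName {
        final long sessionId;
        final long localSeq;
        final long zkSeq;

        OpName(long sessionId, long localSeq, long zkSeq) {
            this.sessionId = sessionId;
            this.localSeq = localSeq;
            this.zkSeq = zkSeq;
        }
    }

    /** Parses "op-<hex session>-<local seq>-<10-digit zk seq>". */
    static OpName parse(String name) {
        String[] parts = name.substring("op-".length()).split("-");
        return new OpName(
            Long.parseUnsignedLong(parts[0], 16),
            Long.parseLong(parts[1]),
            Long.parseLong(parts[2]));
    }

    /** Prefix a client would pass to create(); ZooKeeper appends the counter. */
    static String prefixFor(long sessionId, long localSeq) {
        // The trailing '-' keeps local seq 7 from matching 70, 71, ...
        return "op-" + Long.toHexString(sessionId) + "-" + localSeq + "-";
    }

    /** Assumed recovery check after connection loss: did my create() succeed? */
    static Optional<String> findOwnOp(List<String> children, long sessionId, long localSeq) {
        String prefix = prefixFor(sessionId, localSeq);
        return children.stream().filter(c -> c.startsWith(prefix)).findFirst();
    }

    /** A counter value alone cannot rebuild the name; the list must be searched. */
    static Optional<String> nameForSequence(List<String> children, long zkSeq) {
        return children.stream().filter(c -> parse(c).zkSeq == zkSeq).findFirst();
    }

    public static void main(String[] args) {
        // Stand-in for the result of getChildren("/log", false).
        List<String> children = List.of(
            "op-1f3a-6-0000000011",
            "op-2b07-1-0000000012",
            "op-1f3a-7-0000000013");
        System.out.println(findOwnOp(children, 0x1f3aL, 7));
        System.out.println(nameForSequence(children, 12));
    }
}
```

Running main prints Optional[op-1f3a-7-0000000013] and then Optional[op-2b07-1-0000000012]. Both lookups scan the full child list, so neither avoids the cost of getChildren().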