>> What's the order there?

Watchers are triggered at the end of processing a transaction (create / delete / setData and so on), after the data tree is updated. A fired watcher event is queued on the server's response queue (which guarantees FIFO order within a session). The client-server connection runs on top of TCP, which also guarantees FIFO order. Request processing on the server side is also ordered within a session; in particular, a write request stalls all read requests from the same session. Combined, these guarantee that a client will not observe new data before it receives the corresponding watcher notification.
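
To make the ordering argument concrete, here is a minimal toy model in Python. It is not ZooKeeper code: `ToyServer`, `Session` and `outbox` are names made up for illustration, and the single `outbox` deque stands in for both the per-session response queue and the TCP stream. It only shows that if the tree is updated first and the watch event is enqueued before any later read response on the same FIFO queue, the reader cannot see the new value ahead of the event.

```python
from collections import deque


class Session:
    """Hypothetical client session: one FIFO queue of outgoing messages."""

    def __init__(self, name):
        self.name = name
        self.outbox = deque()  # stands in for response queue + TCP stream


class ToyServer:
    """Hypothetical single server processing requests one at a time."""

    def __init__(self):
        self.tree = {}     # path -> data
        self.watches = {}  # path -> sessions holding a one-shot data watch

    def get_data(self, session, path, watch=False):
        if watch:
            self.watches.setdefault(path, []).append(session)
        session.outbox.append(("response", path, self.tree.get(path)))

    def set_data(self, session, path, data):
        # 1. Update the data tree first.
        self.tree[path] = data
        # 2. Fire watches at the end of the transaction. pop() removes
        #    them, so each watch fires at most once.
        for watcher in self.watches.pop(path, []):
            watcher.outbox.append(("watch_event", path, None))
        # 3. Reply to the writer.
        session.outbox.append(("response", path, "ok"))


server = ToyServer()
writer, reader = Session("writer"), Session("reader")

server.set_data(writer, "/a", "v1")
server.get_data(reader, "/a", watch=True)  # reads v1, sets a watch
server.set_data(writer, "/a", "v2")        # enqueues the event for reader
server.get_data(reader, "/a")              # read of v2 is queued after it

messages = list(reader.outbox)
for message in messages:
    print(message)
```

In this sketch the reader's queue holds the `v1` response, then the watch event, then the `v2` response, so a client draining the queue in order always processes the event before the new data. No acknowledgement or second copy of the data is needed for that.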

Note that this assumes a single session. There is no ordering guarantee if the session expires. For example, it's entirely possible that a client never receives watch events if the session they belong to expired and the client established a new session: the server-side watcher might be long gone at that point (and note that a watcher fires only once).

>> Is the node data not overwritten in-place and two copies are kept until all watch notifications are acknowledged,

No, there is no copy-on-write, nor any sort of ACK between client and server for watcher notifications.

>> we can assume an unexpected network delay

I don't think network delay will violate the aforementioned ordering guarantee, but I am happy to discuss further if you have a concrete counterexample.

On Thu, Oct 22, 2020 at 4:29 PM Marcin Copik <mco...@gmail.com> wrote:

> Hi!
>
> I've been trying to understand the ZooKeeper semantics when it comes
> to ordering of watch notifications and other requests. Based on the
> technical documentation and the book, I've been able to follow the
> main rules but they seem to be a bit unclear when it comes to an
> actual implementation.
>
> (A) Order w.r.t to updated node data - I found the following statement
> in the documentation: "ZooKeeper provides an ordering guarantee: a
> client will never see a change for which it has set a watch until it
> first sees the watch event."
> At the same time, I found the following passage in ZooKeeper's book:
> "One important guarantee of notifications is that they are delivered
> to a client before any other change is made to the same znode. If a
> client sets a watch to a znode and there are two consecutive updates
> to the znode, the client receives the notification after the first
> update and before it has a chance to observe the second update by,
> say, reading the znode data."
> (B) Order of watches - if there are two state changes u and u', watch
> notifications corresponding to both of them must be delivered in the
> same order.
> (C) Order of system changes - if there are state updates u and u'
> related to nodes a and b, respectively, and a client has set a watch
> on node a, the client **cannot** read new value of b before seeing a
> watch event related to a.
>
> Questions:
> 1) I'm confused by the order of operations implied by (A): clients
> can't observe the new state before receiving a watch event but when
> they receive it, the new state must be available. Thus, the server
> must send watch notifications to each client and update the node data.
> What's the order there? No matter if we choose to first update node
> contents or to process watch invocation, it can happen that a client
> receives watch notification and performs a read before node data is
> updated (stale data), or client reads updated data before receiving a
> watch notification. Such scenarios are not likely, but they can happen
> with non-deterministic delays.
>
> How is it resolved in ZooKeeper? Is the node data not overwritten
> in-place and two copies are kept until all watch notifications are
> acknowledged, when we no longer need an old copy to return stale
> value? Are read requests from client X stalled until the watch event
> is acknowledged by client X? Or is there another solution employed
> there?
>
> 2) Does C) imply that all updates received by a local server are
> applied in a serial manner, i.e., for update u the server must receive
> an acknowledgement of watch notification from each client interested
> in it, before proceeding with update u'? Otherwise, a read issued by
> the client might return the new data before it has received the watch
> event; we can assume an unexpected network delay.
>
> Thanks in advance for any help you can provide!
> Best regards,
> Marcin Copik
>