2013/6/25 Joseph Gentle <[email protected]> > > >> When peers connect, they send each other missing ops. Figuring out > >> which ops are missing can be surprisingly tricky - but we'll figure > >> that out later. New ops must be ingested in order, so we always ingest > >> an operation after ingesting all of its parents. > > Just use a Merkle Tree that is at the same time a prefix tree with respect to the hashes of the ops (explanation below). The bandwidth usage is O(1) if both clients are in sync and O(log n) if they have one or few different ops and O(n) in the worst case, where n in the number of ops.
Constructing the tree is simple. Let the hash function output 20 bytes and let's encode this in hex. This results in a hash-string of 40 hex-characters for each operation. Each node hashes over the hashes of its children. Leaf-nodes correspond to operations and thus use the hash value of their respective operation. The tree-invariant is that all siblings on level x share the same prefix of x hex-characters. The tree is not sent over the network. Instead, clients start comparing the hashes at the root. Two clients compare their root hash. If it is equal, the entire tree is equal and therefore they are in sync. If not, they download all direct children and repeat the procedure for each sub-tree rooted by one of these children. For example, if child number 3 has a different hash, but all others share the same hash, then we have learned that there are one or more ops with a hash of 3xxxx... that are different and need syncing. Typically we can limit the depth of the tree to few levels. 8 levels already yield a tree that could store 16^8 possible ops. So in the worst case two clients need to wait for 8 round-trips to determine a missing op. In addition, each client sends a time stamp. So when syncing we report the last time stamp received from this client and ask for all ops this client received later. If these are few, then simply get them (even if we know some of the ops already, because we got them from another client). If there are too many ops, fall back to the merkle tree. With a good approximation of RTT and bandwidth, it is easy to calculate which algorithm is the best to sync two clients. Greetings Torben
