I want to start the discussion around what OT algorithms to use. This is going to get technical. This is not a bug. Please ask simple but tangential questions (like "how does wave's current crypto system work?") in a new thread instead of this one.

Requirements:
- Arbitrary peers can synchronize data.
- We can collaboratively edit tree structures, key-value pairs and rich text. (I'd like arbitrary JSON editing.)
- Fast enough to work on mobile devices. The protocol needs to be low bandwidth and require a low number of network round-trips.
- Ideally, a low storage overhead.

Nice-to-have:
- End-to-end encryption of content. (In the wake of PRISM, the less information my wave server needs to know about me, the better.)
- Some wave version of OTR[1].
- Presence & cursors.
- Simple. At least, as simple as we can manage. Let's at least try :/

Is there anything else I'm missing? Some of this stuff is only tangentially related to OT (like cursors and encryption), but it's still important that we bring it up.

Given that we want to be able to connect arbitrary peers, I think it makes sense to use a message encryption & signing system. Currently, wave servers sign their users' messages. To support users collaborating over a LAN without servers, we should allow users to sign (& encrypt) their own messages. Unfortunately, the signature becomes invalid if the operation is transformed. Consider peers A and B (communicating over a LAN), and a disconnected peer C. A and B each generate simultaneous operations and sign them. A sends its operation to B, who transforms it. B then connects to C and sends both operations. For C to verify A's operation, C must see the original (untransformed) version of the operation. I think the only way we can make this work securely is by only ever sending original operations, never operations after they've been transformed.
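To make that concrete, here's a rough sketch in JavaScript. The op format, the Ed25519 keys and the transform function are all placeholders I made up, not a proposal - the point is only that IDs and signatures are computed over the original operation bytes, and transformation only ever happens on a local copy after verification:

    const crypto = require('crypto');

    // The wire format only ever carries the *original* operation bytes.
    // (Keys would come from e.g. crypto.generateKeyPairSync('ed25519').)
    function makeWireOp(opData, parents, privateKey) {
      const bytes = Buffer.from(JSON.stringify({ opData, parents }));
      return {
        id: crypto.createHash('sha256').update(bytes).digest('hex'),
        bytes,                              // exact bytes that were hashed & signed
        sig: crypto.sign(null, bytes, privateKey),
      };
    }

    // On receipt: verify against the original bytes first, then transform a
    // local copy by our concurrent ops (using whatever transform function we
    // end up with). The original stays untouched so we can forward it - and
    // its signature - to other peers later.
    function ingestWireOp(wireOp, publicKey, concurrentOps, transform) {
      if (!crypto.verify(null, wireOp.bytes, publicKey, wireOp.sig))
        throw new Error('bad signature');
      let op = JSON.parse(wireOp.bytes).opData;
      for (const other of concurrentOps) op = transform(op, other);
      return op;  // transformed copy, safe to apply to the local document
    }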
As far as I know, there are three broad directions we could take algorithmically:

1. CRDTs (like Logoot[2]).
2. 'Classical' OT using vector clocks for versioning.
3. The OT system that Torben Weis was working on. (Some might remember him from the wave summit a few years ago.) I have been chatting on and off with him about this since the summit. He hasn't published anything on it yet, though. It's similar to classical OT, except using git-style version hashes.

CRDTs are probably the most elegant solution. They avoid the entire complicated mess of transformation. However:

- They add a significant overhead to the document size (we must store a unique name between every two characters in the wave).
- As far as I know, nobody has implemented arbitrary JSON editing using CRDTs. Implementing a set is very difficult.
- Even without transform, we'll still have to store operations anyway so peers can verify signatures.

I'd like to know what we should expect the actual document size overhead to be. I expect it to be at least a 5x size increase, but I don't know for sure. The papers I've read cleverly minimize the overhead by only doing OT on a line-by-line basis, but that's unacceptable for wave.

As I understand it, vector clocks are the size of the number of unique peers which have ever connected to the network. (They contain an integer version number per peer on the wave.) A vector clock must be stored in each operation - which always seemed unpalatable enough to me that I didn't explore them further. Note that this is per *peer*, not per user. If we want full p2p, that's one entry per device which has ever edited a wave.
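For concreteness, here's roughly what that data structure looks like (a plain JavaScript sketch, names mine): a clock is just a map from peer ID to how many ops that peer has generated, and two ops are concurrent exactly when neither clock dominates the other - which is when you need to transform.

    // peerId -> number of ops that peer has generated. The map grows with
    // every device that ever edits the wave, which is the overhead above.
    function increment(clock, peerId) {
      return { ...clock, [peerId]: (clock[peerId] || 0) + 1 };
    }

    // 'a happened before b' iff every entry of a is <= the matching entry
    // of b, and at least one is strictly less.
    function happenedBefore(a, b) {
      let strictlyLess = false;
      for (const peer of new Set([...Object.keys(a), ...Object.keys(b)])) {
        const ai = a[peer] || 0, bi = b[peer] || 0;
        if (ai > bi) return false;
        if (ai < bi) strictlyLess = true;
      }
      return strictlyLess;
    }

    // If neither op's clock happened before the other, the ops are
    // concurrent and must be transformed against each other.
    function concurrent(a, b) {
      return !happenedBefore(a, b) && !happenedBefore(b, a);
    }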
There are probably some clever ways we could minimize that size. Off the top of my head:

- Most operations are generated on the same site as their single parent. These operations only increment the local version number. We could special-case this. I don't know how much better this would make the system - I suspect a lot.
- If we want, we could build a hybrid system where most clients use a client-server protocol like the current system. Servers could then use vector clocks or whatever to confer amongst themselves. This way we only need one element in the clock per server. However, we lose most of the benefits of having a p2p OT system to begin with.

Torben's system is the one I know the most about. I'm biased to think it's the right answer because I understand it better than the others. I just spent the last couple of hours writing up how it works, but I'm going to run it by Torben before posting it to this list. It works like this:

First, we need a transform function that supports TP2. The simplest way to do this is to add tombstones to the data type. (We'll need to do this for the vector clock based system as well, by the way.) I have a plaintext implementation which does this in javascript[3], although it's quite slow (~150k transforms per second). I expect that an optimized version of this will be around 10-20 million transforms/second in C[4] for simple ops, and probably a few million per second in Java.

Second, we need an inverse transform function (prune), such that P(T(a, b), b) = a.

In its current incarnation, the algorithm works as follows. Consider a DAG of all the operations. The closest metaphor is a network of commits in git. Every operation has a unique hash identifier and one or more parents (except the root, which has no parents). It'll look something like this: https://dl.dropboxusercontent.com/u/2494815/ops.png ... except it'll usually be waaaay less complicated than that. The cool thing about this structure is that if you apply all of the operations in that graph in *any order*, you'll end up in the same document state (so long as each node comes after all its parents and you use the transform function correctly).

Each peer stores a history list of operations. Each operation is stored twice. One copy is its original form (the exact bytes which were used for hashing and signing); when sending the operation to a remote peer, this is the copy we send. The second copy of the op is transformed. At all times, the transformed ops in the history list are written such that you could directly apply every operation in the history list (in order) to a new document and you would reach the current document state. The history lists on different peers don't have to be in the same order. We also store a copy of the current document snapshot, because it's really useful for the UI.

Consider the DAG again. Every operation in the history list is a node in that DAG, with its parents as other nodes. The DAG will have at least one node with no children (= the bottom nodes in that diagram I linked above). These childless nodes form the *frontier*. When a new operation is created, all ops in the frontier become the new node's parents. The new node also stores the number of ancestors it has (= the history list length). I'll get to why that's useful in a moment.
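Putting that description into code, a peer's state and the creation of a local op might look roughly like this (JavaScript; field names are mine, not a proposed format):

    const crypto = require('crypto');

    const peer = {
      history: [],    // [{id, originalBytes, transformed, parents, ancestorCount}]
      frontier: [],   // ids of the ops that currently have no children
      snapshot: null, // current document state, kept around for the UI
    };

    // A locally created op has the whole current frontier as its parents,
    // and remembers how many ancestors it has (= current history length).
    function createOp(peer, opData) {
      const originalBytes = Buffer.from(JSON.stringify({ opData, parents: peer.frontier }));
      const entry = {
        id: crypto.createHash('sha256').update(originalBytes).digest('hex'),
        originalBytes,                   // exact bytes used for hashing & signing
        transformed: opData,             // copy kept consistent with our history order
        parents: peer.frontier.slice(),
        ancestorCount: peer.history.length,
      };
      peer.history.push(entry);
      peer.frontier = [entry.id];        // the new op is now the only childless node
      return entry;
    }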
The history list has a special power because of the prune function: operations which don't depend on one another can be swapped in place:

    swap = (a, b) ->
      b' = prune(b, a)
      a' = transform(a, b)

When peers connect, they send each other missing ops. Figuring out which ops are missing can be surprisingly tricky - but we'll figure that out later. New ops must be ingested in order, so we always ingest an operation after ingesting all of its parents.

So you're given an op from another peer to ingest. We need to transform that operation by all the operations we have locally that aren't among the op's ancestors. You can't just arbitrarily transform two ops by one another - they both have to be applicable to the same document. So what we do is rewrite our local history so that all of the operation's ancestors are at the start of our history list, and all of the other operations are at the end. Once we've done that, the operation can be transformed by all the ops in the second half of our history list, applied, and appended to the end of the history list.

Remember, we know how many ancestors the new operation has - we saved that when the op was created. We also have all of that operation's parents in our history list. So we find the position of the last of the op's parents in the history list. If that position (counting from 1) equals the new op's ancestor count, we're golden.

Imagine your history list contains:

A B X Y

and the peer you're connecting to has:

A B C

(C has B as a parent). C has 2 ancestors (A and B). When you ingest C, you find B (its last parent), which is at position 2 in the history list. So you know that everything after that in the history list isn't one of C's ancestors, and you can transform C by X, Y and then apply it.

If our history list instead looks like this:

X A B Y

... then you'll find B is too late in your history list. You need to rewrite your history list to move the X after B. You use the swap function to do that, swapping X and A, then X and B. Your history then looks like A B X Y, like we wanted, so you can proceed as above.
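Here's a rough sketch of that ingestion step in JavaScript, building on the createOp sketch above. The transform/prune functions, the apply() call and the ancestor-set computation are placeholders; real code would also need signature checks and frontier bookkeeping.

    // Walk the DAG backwards from the op's parents to collect its ancestors.
    function ancestorIds(peer, op) {
      const byId = new Map(peer.history.map(e => [e.id, e]));
      const seen = new Set();
      const stack = op.parents.slice();
      while (stack.length) {
        const id = stack.pop();
        if (seen.has(id)) continue;
        seen.add(id);
        const entry = byId.get(id);
        if (entry) stack.push(...entry.parents);
      }
      return seen;
    }

    // Swap two adjacent, independent history entries, as in the swap
    // pseudocode above (assuming transform()/prune() for the TP2 type).
    // Only the transformed copies change; the signed original bytes don't.
    function swapAdjacent(history, i) {
      const a = history[i], b = history[i + 1];
      const bPruned = prune(b.transformed, a.transformed);
      const aSwapped = transform(a.transformed, b.transformed);
      history[i] = { ...b, transformed: bPruned };
      history[i + 1] = { ...a, transformed: aSwapped };
    }

    // newOp is parsed from its original bytes: {id, opData, parents, ancestorCount}
    function ingestOp(peer, newOp) {
      const ancestors = ancestorIds(peer, newOp);

      // Rewrite the history so the first ancestorCount entries are exactly
      // the op's ancestors: bubble each ancestor left past any non-ancestors
      // sitting in front of it (the X A B Y -> A B X Y rewrite above).
      let placed = 0;
      for (let i = 0; i < peer.history.length && placed < newOp.ancestorCount; i++) {
        if (!ancestors.has(peer.history[i].id)) continue;
        for (let j = i; j > placed; j--) swapAdjacent(peer.history, j - 1);
        placed++;
      }

      // Transform the new op by everything that isn't an ancestor, apply it,
      // and append it to the history.
      let op = newOp.opData;
      for (let i = newOp.ancestorCount; i < peer.history.length; i++) {
        op = transform(op, peer.history[i].transformed);
      }
      peer.snapshot = apply(peer.snapshot, op);   // hypothetical apply()
      peer.history.push({ ...newOp, transformed: op });
    }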
This algorithm requires a lot of transforms, but I think most of them are unavoidable because of the signing / trust issue. It also requires that every operation (i.e., every key press) has a unique ID. That's probably going to be a 16-byte hash or something, which seems like a lot of extra bytes for typing one character.

I implemented it here: https://github.com/josephg/tp2stuff/blob/master/node2.coffee ... though the randomizer setup I have (a handful of peers, all randomly generating operations and periodically, randomly, connecting to each other to sync up) needs a lot of transforms. 5 clients generating operations, with two random clients syncing about every 5 operations, results in over 1000x as many transforms as operations (10k ops -> 11M transforms). Doing the same thing with 3 clients 'only' needs 2M transforms. Performance stays steady over time (it doesn't get worse as the document grows like CRDTs will) - this scatterplot shows the number of seconds per 30 iterations on one of my tests: https://dl.dropboxusercontent.com/u/2494815/Screen%20Shot%202013-06-24%20at%201.01.42%20PM.png

I think this huge number of transforms is one of the costs of doing encryption & signing of the ops. We might be able to lower the number of transforms required if we trust our servers with our keys - but I'm not sure I want to do that. Real-world performance should be much better than that anyway, but yeah - it makes me nervous. I'm probably going to run more tests to see what kind of performance we can expect in practice.

Blah. That was a lot of words. Thoughts?

-J

[1] http://www.cypherpunks.ca/otr/
[2] http://hal.archives-ouvertes.fr/docs/00/34/59/11/PDF/main.pdf
[3] https://github.com/share/ottypes/blob/master/lib/text-tp2.js
[4] Based on my extremely fast non-tombstone equivalent data structure here: https://github.com/share/libot/blob/master/text.h