> 1. CRDTs (like Logoot[1])
> 2. 'Classical' OT using vector clocks for versioning.
> 3. The OT system [...] similar to classical OT, except using git style
>    version hashes.
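(For anyone who hasn't read [1]: the core idea of a Logoot-style CRDT is that every character gets a globally unique, totally ordered position identifier, and the document is just the characters sorted by position. Here's my own toy sketch of that idea - simplified, not the paper's actual algorithm:)

```python
import bisect

# Toy Logoot-flavoured sequence CRDT (my own simplified sketch, not the
# paper's exact algorithm). Every character gets a globally unique,
# totally ordered position identifier - a tuple of (digit, site_id)
# pairs - and the document is just the characters sorted by position.
# Deletion removes the pair outright; handling *concurrent* deletion
# safely is where the paper's tombstone / vector-clock caveat comes in.

BASE = 2 ** 16  # digit space at each depth of the identifier tree

def pos_between(left, right, site):
    """Make a fresh position strictly between left and right (left < right)."""
    path, i = [], 0
    while True:
        l = left[i][0] if i < len(left) else 0
        r = right[i][0] if i < len(right) else BASE
        if r - l > 1:
            path.append((l + 1, site))  # found room at this depth
            return tuple(path)
        # no room here: copy left's component and descend one level deeper
        path.append((l, left[i][1] if i < len(left) else site))
        i += 1

class LogootDoc:
    def __init__(self):
        self.chars = []  # sorted list of (position, char)

    def insert(self, index, char, site):
        left = self.chars[index - 1][0] if index > 0 else ()
        right = self.chars[index][0] if index < len(self.chars) else ((BASE, 0),)
        pos = pos_between(left, right, site)
        bisect.insort(self.chars, (pos, char))
        return pos  # the (pos, char) pair is the op you'd broadcast

    def delete(self, pos):
        self.chars = [c for c in self.chars if c[0] != pos]

    def text(self):
        return ''.join(ch for _, ch in self.chars)
```

Note how every character drags a variable-length identifier around with it - that's exactly where the "at least 5x" document size overhead estimate comes from.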
A quick look at Logoot's paper says that deletion is not generally possible
with CRDTs (p5, section 4) without either a) tombstones or b) vector clocks.
So I suspect it would be possible to make use of CRDTs (which seem the best
approach) with the git-style hash to fully allow deletion.

> CRDTs are probably the most elegant solution. They avoid the entire
> complicated mess of transformation. However:
> - They add a significant overhead to the document size (we must store
>   a unique name between every two characters in the wave)
> - As far as I know, nobody has implemented arbitrary JSON editing
>   using CRDTs. Implementing a set is very difficult.
> - Even without transform, we'll still have to store operations anyway
>   so peers can verify signatures.
>
> I'd like to know what we should expect the actual document size
> overhead to be. I expect it to be at least a 5x size increase, but I
> don't know for sure. The papers I've read cleverly minimize the
> overhead by only doing OT on a line-by-line basis, but that's
> unacceptable for wave.

Really? Wave currently has special handling for <line></line> elements in
the document structure (IIRC for the sole purpose of making its OT easier -
there was some reason not to store the document as <line>data</line> but as
<line></line>data). Many systems (e.g. *nix) take the idea of a line quite
seriously, since they all favour text-based approaches. The protocol
shouldn't really be handling arbitrary binary data (should it?), and JSON
can easily have newlines inserted arbitrarily without a problem.

> [...] a plaintext implementation which does this in javascript[3], although
> its quite slow (~150k transforms per second). I expect that an
> optimized version of this will be around 10-20 million transforms /
> second in C[4] for simple ops, and probably a few million per second
> in java.

Nice numbers! (Heh, can we see some actual data? :) )

> When peers connect, they send each other missing ops.
> Figuring out which ops are missing can be surprisingly tricky - but
> we'll figure that out later. New ops must be ingested in order, so we
> always ingest an operation after ingesting all of its parents.

Alarm bells start ringing here... Care to elaborate on how we could
reasonably do this? It sounds difficult.

> This algorithm requires a lot of transforms, but I think most of them
> are unavoidable because of the signing / trust issue.

Is there some way to use GPG-style signing chains to avoid having to
retransform the whole thing whilst maintaining trust?

> It also requires that every operation (ie, every key press) has a
> unique ID. That's probably going to be a 16 byte hash or something,
> which seems like a lot of extra bytes for typing one character.

This is a client problem. There is not always a reason to make a single
character its own operation.

> Blah. That was a lot of words. Thoughts?

See above. :)

Ali
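P.S. For anyone wondering what a "transform" actually is here: for plain text it's mostly index arithmetic - shift one op's index to account for a concurrent op so both peers converge. A rough sketch of the shape of it (mine, not the code from [3], which uses a richer component-based op format and also handles delete-vs-delete overlap, omitted here):

```python
# Ops here are just ('ins', index, text) or ('del', index, length).
# This sketch only transforms insertions; ties at the same index are
# broken by letting one fixed side win, which both peers must agree on.

def transform_insert(idx, against, tie_left=False):
    """Shift an insertion index so it applies after `against` has been applied."""
    kind, aidx, arg = against
    if kind == 'ins' and (aidx < idx or (aidx == idx and not tie_left)):
        idx += len(arg)              # concurrent insert landed before us
    elif kind == 'del' and aidx < idx:
        idx -= min(arg, idx - aidx)  # concurrent delete removed text before us
    return idx

def apply_ins(doc, idx, text):
    return doc[:idx] + text + doc[idx:]

# Convergence check: two concurrent inserts, applied in either order.
left = apply_ins("abc", 2, "X")
left = apply_ins(left, transform_insert(1, ('ins', 2, "X")), "Y")
right = apply_ins("abc", 1, "Y")
right = apply_ins(right, transform_insert(2, ('ins', 1, "Y"), tie_left=True), "X")
assert left == right == "aYbXc"
```

The benchmark numbers above are presumably counting calls like `transform_insert` - which is why millions per second in C sounds plausible: it's a handful of comparisons and additions per op pair.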
