> 1. CRDTs (like Logoot[1])
> 2. 'Classical' OT using vector clocks for versioning.
> 3. The OT system [...] similar to classical OT, except using git style
>    version hashes.
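(For anyone who hasn't read [1]: the core idea of a Logoot-style CRDT is that every character gets a globally unique, totally ordered position identifier, and the document is just the characters sorted by position. Here's my own toy sketch of that idea - simplified, not the paper's actual algorithm:)

```python
import bisect

# Toy Logoot-flavoured sequence CRDT (my own simplified sketch, not the
# paper's exact algorithm). Every character gets a globally unique,
# totally ordered position identifier - a tuple of (digit, site_id)
# pairs - and the document is just the characters sorted by position.
# Deletion removes the pair outright; handling *concurrent* deletion
# safely is where the paper's tombstone / vector-clock caveat comes in.

BASE = 2 ** 16  # digit space at each depth of the identifier tree

def pos_between(left, right, site):
    """Make a fresh position strictly between left and right (left < right)."""
    path, i = [], 0
    while True:
        l = left[i][0] if i < len(left) else 0
        r = right[i][0] if i < len(right) else BASE
        if r - l > 1:
            path.append((l + 1, site))  # found room at this depth
            return tuple(path)
        # no room here: copy left's component and descend one level deeper
        path.append((l, left[i][1] if i < len(left) else site))
        i += 1

class LogootDoc:
    def __init__(self):
        self.chars = []  # sorted list of (position, char)

    def insert(self, index, char, site):
        left = self.chars[index - 1][0] if index > 0 else ()
        right = self.chars[index][0] if index < len(self.chars) else ((BASE, 0),)
        pos = pos_between(left, right, site)
        bisect.insort(self.chars, (pos, char))
        return pos  # the (pos, char) pair is the op you'd broadcast

    def delete(self, pos):
        self.chars = [c for c in self.chars if c[0] != pos]

    def text(self):
        return ''.join(ch for _, ch in self.chars)
```

Note how every character drags a variable-length identifier around with it - that's exactly where the "at least 5x" document size overhead estimate comes from.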
A quick look at Logoot's paper says that deletion is not generally possible
with CRDTs (p5, section 4) without either a) tombstones or b) vector clocks.
So I suspect it would be possible to make use of CRDTs (which seem the best
approach) with the git-style hash to fully allow deletion.

> CRDTs are probably the most elegant solution. They avoid the entire
> complicated mess of transformation. However:
> - They add a significant overhead to the document size (we must store
>   a unique name between every two characters in the wave)
> - As far as I know, nobody has implemented arbitrary JSON editing
>   using CRDTs. Implementing a set is very difficult.
> - Even without transform, we'll still have to store operations anyway
>   so peers can verify signatures.
>
> I'd like to know what we should expect the actual document size
> overhead to be. I expect it to be at least a 5x size increase, but I
> don't know for sure. The papers I've read cleverly minimize the
> overhead by only doing OT on a line-by-line basis, but that's
> unacceptable for wave.

Really? Wave currently has special handling for <line></line> elements in
the document structure (IIRC for the sole purpose of making its OT easier -
there was some reason not to store the document as <line>data</line> but as
<line></line>data). Many systems (e.g. *nix) take the idea of a line quite
seriously, since they all favour text-based approaches. The protocol
shouldn't really be handling arbitrary binary data (should it?), and JSON
can easily have newlines inserted arbitrarily without a problem.

> [...] a plaintext implementation which does this in javascript[3], although
> its quite slow (~150k transforms per second). I expect that an
> optimized version of this will be around 10-20 million transforms /
> second in C[4] for simple ops, and probably a few million per second
> in java.

Nice numbers! (Heh, can we see some actual data? :) )

> When peers connect, they send each other missing ops.
> Figuring out which ops are missing can be surprisingly tricky - but
> we'll figure that out later. New ops must be ingested in order, so we
> always ingest an operation after ingesting all of its parents.

Alarm bells start ringing here... Care to elaborate on how we could
reasonably do this? It sounds difficult.

> This algorithm requires a lot of transforms, but I think most of them
> are unavoidable because of the signing / trust issue.

Is there some way to use GPG-style signing chains to avoid having to
retransform the whole thing whilst maintaining trust?

> It also requires that every operation (ie, every key press) has a
> unique ID. That's probably going to be a 16 byte hash or something,
> which seems like a lot of extra bytes for typing one character.

This is a client problem. There is not always a reason to make a single
character its own operation.

> Blah. That was a lot of words. Thoughts?

See above. :)

Ali
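P.S. For anyone wondering what a "transform" actually is here: for plain text it's mostly index arithmetic - shift one op's index to account for a concurrent op so both peers converge. A rough sketch of the shape of it (mine, not the code from [3], which uses a richer component-based op format and also handles delete-vs-delete overlap, omitted here):

```python
# Ops here are just ('ins', index, text) or ('del', index, length).
# This sketch only transforms insertions; ties at the same index are
# broken by letting one fixed side win, which both peers must agree on.

def transform_insert(idx, against, tie_left=False):
    """Shift an insertion index so it applies after `against` has been applied."""
    kind, aidx, arg = against
    if kind == 'ins' and (aidx < idx or (aidx == idx and not tie_left)):
        idx += len(arg)              # concurrent insert landed before us
    elif kind == 'del' and aidx < idx:
        idx -= min(arg, idx - aidx)  # concurrent delete removed text before us
    return idx

def apply_ins(doc, idx, text):
    return doc[:idx] + text + doc[idx:]

# Convergence check: two concurrent inserts, applied in either order.
left = apply_ins("abc", 2, "X")
left = apply_ins(left, transform_insert(1, ('ins', 2, "X")), "Y")
right = apply_ins("abc", 1, "Y")
right = apply_ins(right, transform_insert(2, ('ins', 1, "Y"), tie_left=True), "X")
assert left == right == "aYbXc"
```

The benchmark numbers above are presumably counting calls like `transform_insert` - which is why millions per second in C sounds plausible: it's a handful of comparisons and additions per op pair.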
