On Jul 31, 2011, at 12:58 PM, Simon Slavin wrote:
> These two go together. Multi-master replication (one example of which is a
> document store) is relatively easy. Datestamp every value (document) and
> whichever one has the latest date is the one you want.
I hear that a lot, but it makes me pretty uncomfortable. People like
timestamps because there's some kind of implied causality. It's not there in
reality, though, and this type of resolution can lead to harmful (i.e. data
loss) results. Just because something happened after something else doesn't
mean that it had all the same knowledge that went into the first decision.
And those are the best cases. Lamport clocks (and the more general
vector clocks) exist because they *explicitly* state causality. That is, if I
have perfectly synchronized clocks and two applications running on machines
immediately next to each other (trying to avoid the relativity argument[0]),
event A can occur that changes the data to a particular state and can be picked
up by server one, but not server two. Server one and server two can then go to
change the data at roughly the same time, but server two is slightly slower
and its write comes in last. Now server two's write wins and eats event A along
with server one's change, *because* it reacted more slowly. Your
timestamp-automated conflict resolver favors slower machines that lag behind.
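
A minimal sketch of that scenario in Python, assuming perfectly synchronized
clocks as above. The names Write and lww_resolve, the timestamps, and the
document contents are made up for illustration; this is not any particular
database's resolver.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Write:
    server: str
    timestamp: float   # wall-clock seconds, assumed perfectly synchronized
    value: frozenset   # the full document state this server wants to store


def lww_resolve(a: Write, b: Write) -> Write:
    """Last-write-wins: keep whichever write carries the later timestamp."""
    return a if a.timestamp >= b.timestamp else b


base = frozenset({"original"})

# Event A is picked up by server one, but not by server two.
seen_by_one = base | {"event A"}
seen_by_two = base

# Both servers change the data at roughly the same time; server two is 5ms slower.
write_one = Write("one", 100.000, seen_by_one | {"change from one"})
write_two = Write("two", 100.005, seen_by_two | {"change from two"})

winner = lww_resolve(write_one, write_two)

# Everything server one knew that the winning write did not.
lost = write_one.value - winner.value
```

The resolver never looks at what each writer had seen, only at when it wrote,
so the slower server's write replaces the better-informed one.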
With explicit causality, you state that state B succeeds state A
because you knew about state A, regardless of when you made your decision. If
state B' tries to succeed state A without knowing about state B, that can
happen on an isolated system, but it will introduce a conflict once that system
learns that state B had already succeeded state A. Now we have two successors
for state A, and only your application's conflict resolver can make sense of
what state the document should end up in.
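
One way to picture this is a graph where every revision names the revision it
was derived from. The sketch below is a toy under that assumption; the
RevisionGraph class and its methods are hypothetical and are not CouchDB's API
or storage format.

```python
from collections import defaultdict


class RevisionGraph:
    """Each revision records the revision it was derived from (its parent)."""

    def __init__(self, root):
        self.known = {root}
        self.children = defaultdict(list)

    def add(self, rev, parent):
        if parent not in self.known:
            raise ValueError(f"unknown parent {parent!r}")
        if rev not in self.known:          # re-learning a revision is a no-op
            self.known.add(rev)
            self.children[parent].append(rev)

    def leaves(self):
        """Revisions that nothing has succeeded yet."""
        return sorted(r for r in self.known if not self.children.get(r))

    def in_conflict(self):
        return len(self.leaves()) > 1


# Server one knew about A and produced B.
server_one = RevisionGraph("A")
server_one.add("B", parent="A")

# Server two, isolated, also knew about A and produced B'.
server_two = RevisionGraph("A")
server_two.add("B'", parent="A")
conflict_before = server_two.in_conflict()

# Replication delivers B to server two: A now has two successors.
server_two.add("B", parent="A")
conflict_after = server_two.in_conflict()
```

No timestamps are involved. The conflict shows up purely because two revisions
claim the same parent, and it is left for the application to resolve.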
(Note that in the case of CouchDB, if state B and state B' were the
same, this would be recognized as not a conflict, but that's rather a special
case.)
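
A rough sketch of why identical successors need not conflict, assuming revision
identifiers are derived deterministically from the parent and the content. The
revision_id function here is a simplified stand-in invented for illustration,
not CouchDB's actual revision algorithm.

```python
import hashlib
import json


def revision_id(parent_id, body):
    """Hypothetical deterministic id: same parent + same body => same id."""
    payload = json.dumps([parent_id, body], sort_keys=True).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()[:12]


def successors(parent_id, bodies):
    """Distinct successor revisions of one parent after merging all writers."""
    return {revision_id(parent_id, body) for body in bodies}


state_a = revision_id(None, {"title": "draft"})

# Two servers independently arrive at the same state: one successor.
same = successors(state_a, [{"title": "final"}, {"title": "final"}])

# Two servers arrive at different states: two successors, hence a conflict.
different = successors(state_a, [{"title": "final"}, {"title": "final v2"}])
```

Under that assumption, B and B' collapse into a single revision when their
content matches, so state A ends up with only one successor.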
[0]: I always try to avoid the relativity argument, but when you're dealing
with systems that are far apart, whether an event happened before another event
is entirely up to the perspective of the observer. My CouchDB on Mars might
see and react to an event minutes before we can observe it on Earth. About one
(Earth) year later, Earth might observe and react to this event minutes before
Mars can. In both cases, an event coming from the other direction would have
flipped its arrival order between the two.
While that may seem like a silly thing to discuss, the exact thing happens
locally. The theoretical floor of ping time between the east and west coast of
the US is about 18 milliseconds. In practice, you'll get something closer to
40ms. Do you know how much stuff happens within 40ms? From the perspective of
the east coast, anything happening on the east coast will appear to occur
considerably sooner than anything happening at "the same time" on the west
coast.
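
To make the observer effect concrete, here is a small sketch that assumes a
one-way delay of 20ms (half of the 40ms round trip above) and zero delay for
local events. The function name, event names, and delay constant are
assumptions for illustration only.

```python
ONE_WAY_MS = 20.0  # assumed: half of the ~40ms practical round trip


def observed_order(events, observer):
    """Order events by when they become visible at the observer's coast.

    events: list of (name, origin, emit_time_ms) tuples.
    """
    def arrival(event):
        _name, origin, emitted = event
        delay = 0.0 if origin == observer else ONE_WAY_MS
        return emitted + delay

    return [name for name, _origin, _t in sorted(events, key=arrival)]


# Two writes that happen at "the same time" on opposite coasts.
events = [
    ("east write", "east", 0.0),
    ("west write", "west", 0.0),
]

east_view = observed_order(events, "east")
west_view = observed_order(events, "west")
```

Each coast sees its own write first, so the two observers disagree on which
one "happened before" the other.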
--
dustin sallings
_______________________________________________
sqlite-users mailing list
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users