On Jul 31, 2011, at 12:58 PM, Simon Slavin wrote:

> These two go together.  Multi-master replication (one example of which is a 
> document store) is relatively easy.  Datestamp every value (document) and 
> whichever one has the latest date is the one you want.

        I hear that a lot, but it makes me pretty uncomfortable.  People like 
timestamps because there's some kind of implied causality.  It's not there in 
reality, though, and this type of resolution can lead to harmful (i.e. data 
loss) results.  Just because something happened after something else doesn't 
mean that it had all the same knowledge that went into the first decision.

        And those are the best cases.  Lamport clocks (and the more general 
vector clocks) exist because they *explicitly* state causality.  That is, if I 
have perfectly synchronized clocks and two applications running on machines 
immediately next to each other (trying to avoid the relativity argument[0]), 
event A can occur that changes the data to a particular state and can be picked 
up by server one, but not server two.  Server one and server two can go to 
change the data at roughly the same time, but server two was slightly slower 
and it came in last.  Now server two is just eating data *because* it's 
reacting more slowly.  Your timestamp-automated conflict resolver favors slower 
machines that stay behind.
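
        A minimal sketch of that failure, in Python.  Everything here is made 
up for illustration (the Write record, lww_resolve, the stock/price/note 
fields), and it assumes the perfectly synchronized clocks described above:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Write:
    """One server's proposed new version of a document (hypothetical type)."""
    value: dict
    timestamp: float  # wall-clock seconds, assumed perfectly synchronized
    server: str


def lww_resolve(a: Write, b: Write) -> Write:
    """Last-write-wins: keep whichever write carries the later timestamp."""
    return a if a.timestamp >= b.timestamp else b


# The document before event A, and after event A changed it.
base = {"stock": 10}
after_a = {"stock": 5}

# Server one saw event A and builds its change on top of it.
one = Write(value={**after_a, "note": "reorder"}, timestamp=100.000, server="one")

# Server two never saw event A; it edits the stale copy 10 ms later.
two = Write(value={**base, "price": 3}, timestamp=100.010, server="two")

winner = lww_resolve(one, two)
print(winner.server, winner.value)
```

        The resolver picks server two, so both event A's change and server 
one's note are gone, and nothing in the result records that anything was lost.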

        With explicit causality, you state that state B succeeds state A 
because we knew about state A regardless of when we made our decision.  If 
state B' tries to succeed state A without knowing about state B, then it can 
happen on an isolated system, but will introduce a conflict when it learns that 
something had already done this.  Now we have two successors for state A and 
only your application's conflict resolver can make sense of what it means for 
what state the document should be in.
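
        As a rough sketch of what "explicit causality" buys you, here is a toy 
vector clock comparison in Python.  The function names (bump, compare) and 
node names are assumptions for the example, and this is not how CouchDB itself 
tracks revisions; it only shows how two successors of state A are detected as a 
conflict without consulting a wall clock:

```python
def bump(clock: dict, node: str) -> dict:
    """Return a copy of the clock with this node's counter incremented."""
    new = dict(clock)
    new[node] = new.get(node, 0) + 1
    return new


def compare(a: dict, b: dict) -> str:
    """Relate two vector clocks (dicts mapping node name -> counter)."""
    nodes = set(a) | set(b)
    a_le_b = all(a.get(n, 0) <= b.get(n, 0) for n in nodes)
    b_le_a = all(b.get(n, 0) <= a.get(n, 0) for n in nodes)
    if a_le_b and b_le_a:
        return "equal"
    if a_le_b:
        return "before"      # b knew about everything in a
    if b_le_a:
        return "after"       # a knew about everything in b
    return "concurrent"      # neither knew about the other: a conflict


state_a = bump({}, "one")               # state A, written on node "one"
state_b = bump(state_a, "one")          # state B, made knowing about A
state_b_prime = bump(state_a, "two")    # state B', made knowing A but not B

print(compare(state_a, state_b))        # A happened before B
print(compare(state_b, state_b_prime))  # B and B' are concurrent
```

        When compare reports "concurrent", neither write is discarded 
automatically; both successors are handed to the application's conflict 
resolver.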

        (note that in the case of CouchDB if state B and state B' were the 
same, this would be recognized as not a conflict, but that's rather a special 
case).



[0]: I always try to avoid the relativity argument, but when you're dealing 
with systems that are far apart, whether an event happened before another event 
is entirely up to the perspective of the observer.  My CouchDB on Mars might 
see and react to an event hours before we can observe it on Earth.  About one 
(Earth) year later, Earth might observe and react to this event hours before 
Mars can.  In both cases, an event coming from the other direction would have 
flipped its arrival order between the two.

While that may seem like a silly thing to discuss, the exact thing happens 
locally.  The theoretical floor of ping time between the east and west coast of 
the US is about 18 milliseconds.  In practice, you'll get something closer to 
40ms.  Do you know how much stuff happens within 40ms?  From the perspective of 
the east coast, anything happening on the east coast will appear to occur 
considerably sooner than anything happening at "the same time" on the west 
coast.

-- 
dustin sallings



_______________________________________________
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
