Ok, but if you're using a system of time that isn't server clock oriented (Sachin's document revision ID, and my fixed and necessarily consistent base timestamp [B's always know their parent A's exact recorded timestamp]), isn't the principle of using timestamps to force a particular update out of several to win still sound?
> as using the clocks is only valid if clocks are perfectly sync'ed, which they are not Clock skew is a problem which doesn't seem to be a factor in either use case given that both have a consistent external source of truth for timestamp. On Thu, Mar 12, 2015 at 12:58 PM, Jonathan Haddad <j...@jonhaddad.com> wrote: > In most datacenters you're going to see significant variance in your > server times. Likely > 20ms between servers in the same rack. Even > google, using atomic clocks, has 1-7ms variance. [1] > > I would +1 Tyler's advice here, as using the clocks is only valid if > clocks are perfectly sync'ed, which they are not, and likely never will be > in our lifetime. > > [1] http://queue.acm.org/detail.cfm?id=2745385 > > > On Thu, Mar 12, 2015 at 7:04 AM Eric Stevens <migh...@gmail.com> wrote: > >> > It's possible, but you'll end up with problems when attempting to >> overwrite or delete entries >> >> I'm wondering if you can elucidate on that a little bit, do you just mean >> that it's easy to forget to always set your timestamp correctly, and if you >> goof it up, it makes it difficult to recover from (i.e. you issue a delete >> with system timestamp instead of document version, and that's way larger >> than your document version would ever be, so you can never write that >> document again)? Or is there some bug in write timestamps that can cause >> the wrong entry to win the write contention? >> >> We're looking at doing something similar to keep a live max value column >> in a given table, our setup is as follows: >> >> CREATE TABLE a ( >> id <whatever>, >> time timestamp, >> max_b_foo int, >> PRIMARY KEY (id) >> ); >> CREATE TABLE b ( >> b_id <whatever>, >> a_id <whatever>, >> a_timestamp timestamp, >> foo int, >> PRIMARY KEY (a_id, b_id) >> ); >> >> The idea being that there's a one-to-many relationship between *a* and >> *b*. We want *a* to know what the maximum value is in *b* for field >> *foo* so we can avoid reading *all* *b* when we want to resolve *a*. You >> can see that we can't just use *b*'s clustering key to resolve that with >> LIMIT 1; also this is for DSE Solr, which wouldn't be able to query a by >> max b.foo anyway. So when we write to *b*, we also write to *a* with >> something like >> >> UPDATE a USING TIMESTAMP ${b.a_timestamp.toMicros + b.foo} SET max_b_foo >> = ${b.foo} WHERE id = ${b.a_id} >> >> Assuming that we don't run afoul of related antipatterns such as >> repeatedly overwriting the same value indefinitely, this strikes me as >> sound if unorthodox practice, as long as conflict resolution in Cassandra >> isn't broken in some subtle way. We also designed this to be safe from >> getting write timestamps greatly out of sync with clock time so that >> non-timestamped operations (especially delete) if done accidentally will >> still have a reasonable chance of having the expected results. >> >> So while it may not be the intended use case for write timestamps, and >> there are definitely gotchas if you are not careful or misunderstand the >> consequences, as far as I can see the logic behind it is sound but does >> rely on correct conflict resolution in Cassandra. I'm curious if I'm >> missing or misunderstanding something important. >> >> On Wed, Mar 11, 2015 at 4:11 PM, Tyler Hobbs <ty...@datastax.com> wrote: >> >>> Don't use the version as your timestamp. It's possible, but you'll end >>> up with problems when attempting to overwrite or delete entries. >>> >>> Instead, make the version part of the primary key: >>> >>> CREATE TABLE document_store (document_id bigint, version int, document >>> text, PRIMARY KEY (document_id, version)) WITH CLUSTERING ORDER BY (version >>> desc) >>> >>> That way you don't have to worry about overwriting higher versions with >>> a lower one, and to read the latest version, you only have to do: >>> >>> SELECT * FROM document_store WHERE document_id = ? LIMIT 1; >>> >>> Another option is to use lightweight transactions (i.e. UPDATE ... SET >>> docuement = ?, version = ? WHERE document_id = ? IF version < ?), but >>> that's going to make writes much more expensive. >>> >>> On Wed, Mar 11, 2015 at 12:45 AM, Sachin Nikam <skni...@gmail.com> >>> wrote: >>> >>>> I am planning to use the Update...USING TIMESTAMP... statement to make >>>> sure that I do not overwrite fresh data with stale data while having to >>>> avoid doing at least LOCAL_QUORUM writes. >>>> >>>> Here is my table structure. >>>> >>>> Table=DocumentStore >>>> DocumentID (primaryKey, bigint) >>>> Document(text) >>>> Version(int) >>>> >>>> If the service receives 2 write requests with Version=1 and Version=2, >>>> regardless of the order of arrival, the business requirement is that we end >>>> up with Version=2 in the database. >>>> >>>> Can I use the following CQL Statement? >>>> >>>> Update DocumentStore using <versionValue> >>>> SET Document=<documentValue>, >>>> Version=<versionValue> >>>> where DocumentID=<documentIDValue>; >>>> >>>> Has anybody used something like this? If so was the behavior as >>>> expected? >>>> >>>> Regards >>>> Sachin >>>> >>> >>> >>> >>> -- >>> Tyler Hobbs >>> DataStax <http://datastax.com/> >>> >> >>