I agree with Tyler, in the normal run of a live application I would not recommend the use of the timestamp, and use other ways to *version* *inserts*. Otherwise you may fall in the *upsert* pitfalls that Tyler mentions.
However I find there’s a legitimate use the USING TIMESTAMP trick, when migrating data form another datastore. The trick is at some point to enable the application to start writing cassandra *without* any timestamp setting on the statements. ⇐ for fresh data Then start a migration batch that will use a write time with an older date (i.e. when there’s *no* possible *collision* with other data). ⇐ for older data *This tricks has been used in prod with billions of records.* -- Brice On Thu, Mar 12, 2015 at 10:42 PM, Eric Stevens <migh...@gmail.com> wrote: > Ok, but if you're using a system of time that isn't server clock oriented > (Sachin's document revision ID, and my fixed and necessarily consistent > base timestamp [B's always know their parent A's exact recorded > timestamp]), isn't the principle of using timestamps to force a particular > update out of several to win still sound? > > > as using the clocks is only valid if clocks are perfectly sync'ed, > which they are not > > Clock skew is a problem which doesn't seem to be a factor in either use > case given that both have a consistent external source of truth for > timestamp. > > On Thu, Mar 12, 2015 at 12:58 PM, Jonathan Haddad <j...@jonhaddad.com> > wrote: > >> In most datacenters you're going to see significant variance in your >> server times. Likely > 20ms between servers in the same rack. Even >> google, using atomic clocks, has 1-7ms variance. [1] >> >> I would +1 Tyler's advice here, as using the clocks is only valid if >> clocks are perfectly sync'ed, which they are not, and likely never will be >> in our lifetime. >> >> [1] http://queue.acm.org/detail.cfm?id=2745385 >> >> >> On Thu, Mar 12, 2015 at 7:04 AM Eric Stevens <migh...@gmail.com> wrote: >> >>> > It's possible, but you'll end up with problems when attempting to >>> overwrite or delete entries >>> >>> I'm wondering if you can elucidate on that a little bit, do you just >>> mean that it's easy to forget to always set your timestamp correctly, and >>> if you goof it up, it makes it difficult to recover from (i.e. you issue a >>> delete with system timestamp instead of document version, and that's way >>> larger than your document version would ever be, so you can never write >>> that document again)? Or is there some bug in write timestamps that can >>> cause the wrong entry to win the write contention? >>> >>> We're looking at doing something similar to keep a live max value column >>> in a given table, our setup is as follows: >>> >>> CREATE TABLE a ( >>> id <whatever>, >>> time timestamp, >>> max_b_foo int, >>> PRIMARY KEY (id) >>> ); >>> CREATE TABLE b ( >>> b_id <whatever>, >>> a_id <whatever>, >>> a_timestamp timestamp, >>> foo int, >>> PRIMARY KEY (a_id, b_id) >>> ); >>> >>> The idea being that there's a one-to-many relationship between *a* and >>> *b*. We want *a* to know what the maximum value is in *b* for field >>> *foo* so we can avoid reading *all* *b* when we want to resolve *a*. >>> You can see that we can't just use *b*'s clustering key to resolve that >>> with LIMIT 1; also this is for DSE Solr, which wouldn't be able to query a >>> by max b.foo anyway. So when we write to *b*, we also write to *a* >>> with something like >>> >>> UPDATE a USING TIMESTAMP ${b.a_timestamp.toMicros + b.foo} SET max_b_foo >>> = ${b.foo} WHERE id = ${b.a_id} >>> >>> Assuming that we don't run afoul of related antipatterns such as >>> repeatedly overwriting the same value indefinitely, this strikes me as >>> sound if unorthodox practice, as long as conflict resolution in Cassandra >>> isn't broken in some subtle way. We also designed this to be safe from >>> getting write timestamps greatly out of sync with clock time so that >>> non-timestamped operations (especially delete) if done accidentally will >>> still have a reasonable chance of having the expected results. >>> >>> So while it may not be the intended use case for write timestamps, and >>> there are definitely gotchas if you are not careful or misunderstand the >>> consequences, as far as I can see the logic behind it is sound but does >>> rely on correct conflict resolution in Cassandra. I'm curious if I'm >>> missing or misunderstanding something important. >>> >>> On Wed, Mar 11, 2015 at 4:11 PM, Tyler Hobbs <ty...@datastax.com> wrote: >>> >>>> Don't use the version as your timestamp. It's possible, but you'll end >>>> up with problems when attempting to overwrite or delete entries. >>>> >>>> Instead, make the version part of the primary key: >>>> >>>> CREATE TABLE document_store (document_id bigint, version int, document >>>> text, PRIMARY KEY (document_id, version)) WITH CLUSTERING ORDER BY (version >>>> desc) >>>> >>>> That way you don't have to worry about overwriting higher versions with >>>> a lower one, and to read the latest version, you only have to do: >>>> >>>> SELECT * FROM document_store WHERE document_id = ? LIMIT 1; >>>> >>>> Another option is to use lightweight transactions (i.e. UPDATE ... SET >>>> docuement = ?, version = ? WHERE document_id = ? IF version < ?), but >>>> that's going to make writes much more expensive. >>>> >>>> On Wed, Mar 11, 2015 at 12:45 AM, Sachin Nikam <skni...@gmail.com> >>>> wrote: >>>> >>>>> I am planning to use the Update...USING TIMESTAMP... statement to make >>>>> sure that I do not overwrite fresh data with stale data while having to >>>>> avoid doing at least LOCAL_QUORUM writes. >>>>> >>>>> Here is my table structure. >>>>> >>>>> Table=DocumentStore >>>>> DocumentID (primaryKey, bigint) >>>>> Document(text) >>>>> Version(int) >>>>> >>>>> If the service receives 2 write requests with Version=1 and Version=2, >>>>> regardless of the order of arrival, the business requirement is that we >>>>> end >>>>> up with Version=2 in the database. >>>>> >>>>> Can I use the following CQL Statement? >>>>> >>>>> Update DocumentStore using <versionValue> >>>>> SET Document=<documentValue>, >>>>> Version=<versionValue> >>>>> where DocumentID=<documentIDValue>; >>>>> >>>>> Has anybody used something like this? If so was the behavior as >>>>> expected? >>>>> >>>>> Regards >>>>> Sachin >>>>> >>>> >>>> >>>> >>>> -- >>>> Tyler Hobbs >>>> DataStax <http://datastax.com/> >>>> >>> >>> >