> do you just mean that it's easy to forget to always set your timestamp
> correctly, and if you goof it up, it makes it difficult to recover from
> (i.e. you issue a delete with system timestamp instead of document version,
> and that's way larger than your document version would ever be, so you can
> never write that document again)?

Yes, that's basically what I meant. Plus, if you need to make a manual
correction to a document, you'll need to increment the version, which would
presumably cause problems for your application. It's possible to handle
all of this correctly if you take care, but I wouldn't trust myself to
always get this right.

> @Tyler
> With your recommendation, won't I end up saving all the version(s) of the
> document? In my case the document is pretty huge (~5 MB) and each document
> has up to 10 versions. And you already highlighted that lightweight
> transactions are very expensive.

You can always delete older versions to free up space.

Using lightweight transactions may be a decent option if you don't have
really high write throughput and aren't expecting high contention (which I
don't think you are). I recommend testing this out with your application
to see how it performs for you.

On Sun, Mar 22, 2015 at 7:02 PM, Sachin Nikam <skni...@gmail.com> wrote:

> @Eric Stevens
> Thanks for representing my position while I came back to this thread.
>
> @Tyler
> With your recommendation, won't I end up saving all the version(s) of the
> document? In my case the document is pretty huge (~5 MB) and each document
> has up to 10 versions. And you already highlighted that lightweight
> transactions are very expensive.
>
> Also as Eric mentions, can you elaborate on what kind of problems could
> happen when we try to overwrite or delete data?
> Regards
> Sachin
>
> On Fri, Mar 13, 2015 at 4:23 AM, Brice Dutheil <brice.duth...@gmail.com>
> wrote:
>
>> I agree with Tyler, in the normal run of a live application I would not
>> recommend the use of the timestamp, and use other ways to *version*
>> *inserts*. Otherwise you may fall in the *upsert* pitfalls that Tyler
>> mentions.
>>
>> However, I find there’s a legitimate use for the USING TIMESTAMP trick:
>> migrating data from another datastore.
>>
>> The trick is at some point to enable the application to start writing to
>> Cassandra *without* any timestamp setting on the statements. ⇐ for fresh
>> data
>> Then start a migration batch that will use a write time with an older
>> date (i.e. when there’s *no* possible *collision* with other data). ⇐
>> for older data
>>
>> *This trick has been used in prod with billions of records.*
>>
>>
>> -- Brice
>>
>> On Thu, Mar 12, 2015 at 10:42 PM, Eric Stevens <migh...@gmail.com> wrote:
>>
>>> Ok, but if you're using a system of time that isn't server clock
>>> oriented (Sachin's document revision ID, and my fixed and necessarily
>>> consistent base timestamp [B's always know their parent A's exact recorded
>>> timestamp]), isn't the principle of using timestamps to force a particular
>>> update out of several to win still sound?
>>>
>>> > as using the clocks is only valid if clocks are perfectly sync'ed,
>>> which they are not
>>>
>>> Clock skew is a problem which doesn't seem to be a factor in either use
>>> case given that both have a consistent external source of truth for
>>> timestamp.
>>>
>>> On Thu, Mar 12, 2015 at 12:58 PM, Jonathan Haddad <j...@jonhaddad.com>
>>> wrote:
>>>
>>>> In most datacenters you're going to see significant variance in your
>>>> server times. Likely > 20ms between servers in the same rack. Even
>>>> Google, using atomic clocks, has 1-7ms variance. [1]
>>>>
>>>> I would +1 Tyler's advice here, as using the clocks is only valid if
>>>> clocks are perfectly sync'ed, which they are not, and likely never will be
>>>> in our lifetime.
>>>>
>>>> [1] http://queue.acm.org/detail.cfm?id=2745385
>>>>
>>>>
>>>> On Thu, Mar 12, 2015 at 7:04 AM Eric Stevens <migh...@gmail.com> wrote:
>>>>
>>>>> > It's possible, but you'll end up with problems when attempting to
>>>>> overwrite or delete entries
>>>>>
>>>>> I'm wondering if you can elaborate on that a little bit. Do you just
>>>>> mean that it's easy to forget to always set your timestamp correctly, and
>>>>> if you goof it up, it makes it difficult to recover from (i.e. you issue a
>>>>> delete with system timestamp instead of document version, and that's way
>>>>> larger than your document version would ever be, so you can never write
>>>>> that document again)? Or is there some bug in write timestamps that can
>>>>> cause the wrong entry to win the write contention?
>>>>>
>>>>> We're looking at doing something similar to keep a live max value
>>>>> column in a given table. Our setup is as follows:
>>>>>
>>>>> CREATE TABLE a (
>>>>>   id <whatever>,
>>>>>   time timestamp,
>>>>>   max_b_foo int,
>>>>>   PRIMARY KEY (id)
>>>>> );
>>>>> CREATE TABLE b (
>>>>>   b_id <whatever>,
>>>>>   a_id <whatever>,
>>>>>   a_timestamp timestamp,
>>>>>   foo int,
>>>>>   PRIMARY KEY (a_id, b_id)
>>>>> );
>>>>>
>>>>> The idea being that there's a one-to-many relationship between *a*
>>>>> and *b*. We want *a* to know what the maximum value is in *b* for
>>>>> field *foo* so we can avoid reading *all* *b* when we want to resolve
>>>>> *a*. You can see that we can't just use *b*'s clustering key to
>>>>> resolve that with LIMIT 1; also this is for DSE Solr, which wouldn't be
>>>>> able to query a by max b.foo anyway.
>>>>> So when we write to *b*, we
>>>>> also write to *a* with something like
>>>>>
>>>>> UPDATE a USING TIMESTAMP ${b.a_timestamp.toMicros + b.foo} SET
>>>>> max_b_foo = ${b.foo} WHERE id = ${b.a_id}
>>>>>
>>>>> Assuming that we don't run afoul of related antipatterns such as
>>>>> repeatedly overwriting the same value indefinitely, this strikes me as
>>>>> sound if unorthodox practice, as long as conflict resolution in Cassandra
>>>>> isn't broken in some subtle way. We also designed this to be safe from
>>>>> getting write timestamps greatly out of sync with clock time so that
>>>>> non-timestamped operations (especially delete) if done accidentally will
>>>>> still have a reasonable chance of having the expected results.
>>>>>
>>>>> So while it may not be the intended use case for write timestamps, and
>>>>> there are definitely gotchas if you are not careful or misunderstand the
>>>>> consequences, as far as I can see the logic behind it is sound but does
>>>>> rely on correct conflict resolution in Cassandra. I'm curious if I'm
>>>>> missing or misunderstanding something important.
>>>>>
>>>>> On Wed, Mar 11, 2015 at 4:11 PM, Tyler Hobbs <ty...@datastax.com>
>>>>> wrote:
>>>>>
>>>>>> Don't use the version as your timestamp. It's possible, but you'll
>>>>>> end up with problems when attempting to overwrite or delete entries.
>>>>>>
>>>>>> Instead, make the version part of the primary key:
>>>>>>
>>>>>> CREATE TABLE document_store (document_id bigint, version int,
>>>>>> document text, PRIMARY KEY (document_id, version)) WITH CLUSTERING ORDER
>>>>>> BY (version desc)
>>>>>>
>>>>>> That way you don't have to worry about overwriting higher versions
>>>>>> with a lower one, and to read the latest version, you only have to do:
>>>>>>
>>>>>> SELECT * FROM document_store WHERE document_id = ? LIMIT 1;
>>>>>>
>>>>>> Another option is to use lightweight transactions (i.e. UPDATE ...
>>>>>> SET document = ?, version = ? WHERE document_id = ? IF version < ?), but
>>>>>> that's going to make writes much more expensive.
>>>>>>
>>>>>> On Wed, Mar 11, 2015 at 12:45 AM, Sachin Nikam <skni...@gmail.com>
>>>>>> wrote:
>>>>>>
>>>>>>> I am planning to use the Update...USING TIMESTAMP... statement to
>>>>>>> make sure that I do not overwrite fresh data with stale data, while
>>>>>>> avoiding the need for at least LOCAL_QUORUM writes.
>>>>>>>
>>>>>>> Here is my table structure.
>>>>>>>
>>>>>>> Table=DocumentStore
>>>>>>> DocumentID (primaryKey, bigint)
>>>>>>> Document(text)
>>>>>>> Version(int)
>>>>>>>
>>>>>>> If the service receives 2 write requests with Version=1 and
>>>>>>> Version=2, regardless of the order of arrival, the business requirement
>>>>>>> is that we end up with Version=2 in the database.
>>>>>>>
>>>>>>> Can I use the following CQL Statement?
>>>>>>>
>>>>>>> Update DocumentStore USING TIMESTAMP <versionValue>
>>>>>>> SET Document=<documentValue>,
>>>>>>> Version=<versionValue>
>>>>>>> where DocumentID=<documentIDValue>;
>>>>>>>
>>>>>>> Has anybody used something like this? If so was the behavior as
>>>>>>> expected?
>>>>>>>
>>>>>>> Regards
>>>>>>> Sachin
>>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>> --
>>>>>> Tyler Hobbs
>>>>>> DataStax <http://datastax.com/>
>>>>>>
>>>>>
>>>>>
>>>
>>
>

--
Tyler Hobbs
DataStax <http://datastax.com/>
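
For anyone reading this thread later, the delete pitfall discussed at the top can be sketched as a small simulation. This is only a toy model of last-write-wins resolution, not Cassandra code: the `LwwRow` class and `demo` function are hypothetical names made up for illustration, and ties on equal timestamps are simply rejected here, which is a simplification of what Cassandra actually does.

```python
import time


class LwwRow:
    """Toy single-cell row using last-write-wins on the write timestamp."""

    def __init__(self):
        self.value = None
        self.timestamp = None  # timestamp of the mutation that currently wins
        self.deleted = False   # True when the winning mutation is a tombstone

    def _wins(self, timestamp):
        # A mutation wins only with a strictly larger timestamp (simplified).
        return self.timestamp is None or timestamp > self.timestamp

    def write(self, value, timestamp):
        if self._wins(timestamp):
            self.value, self.timestamp, self.deleted = value, timestamp, False
            return True
        return False

    def delete(self, timestamp):
        if self._wins(timestamp):
            self.value, self.timestamp, self.deleted = None, timestamp, True
            return True
        return False

    def read(self):
        return None if self.deleted else self.value


def demo():
    row = LwwRow()

    # Version used as the write timestamp: arrival order does not matter.
    row.write("doc v2", timestamp=2)
    row.write("doc v1", timestamp=1)  # stale write, loses to version 2
    after_out_of_order = row.read()

    # Accidental delete stamped with the wall clock in microseconds,
    # which is far larger than any document version will ever be.
    row.delete(timestamp=int(time.time() * 1_000_000))

    # Every later version-stamped write now loses to the tombstone.
    accepted = row.write("doc v3", timestamp=3)
    return after_out_of_order, accepted, row.read()


if __name__ == "__main__":
    print(demo())
```

In this model the out-of-order writes resolve to version 2 as Sachin wants, but after the wall-clock delete the version 3 write is silently dropped, which is the "you can never write that document again" case Eric describes.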