Re: CQL 3.x Update ...USING TIMESTAMP...

Eric Stevens Thu, 12 Mar 2015 14:45:15 -0700

Ok, but if you're using a system of time that isn't server clock oriented
(Sachin's document revision ID, and my fixed and necessarily consistent
base timestamp [B's always know their parent A's exact recorded
timestamp]), isn't the principle of using timestamps to force a particular
update out of several to win still sound?


> as using the clocks is only valid if clocks are perfectly sync'ed, which
they are not

Clock skew is a problem which doesn't seem to be a factor in either use
case given that both have a consistent external source of truth for
timestamp.

On Thu, Mar 12, 2015 at 12:58 PM, Jonathan Haddad <j...@jonhaddad.com> wrote:

> In most datacenters you're going to see significant variance in your
> server times.  Likely > 20ms between servers in the same rack.  Even
> google, using atomic clocks, has 1-7ms variance.  [1]
>
> I would +1 Tyler's advice here, as using the clocks is only valid if
> clocks are perfectly sync'ed, which they are not, and likely never will be
> in our lifetime.
>
> [1] http://queue.acm.org/detail.cfm?id=2745385
>
>
> On Thu, Mar 12, 2015 at 7:04 AM Eric Stevens <migh...@gmail.com> wrote:
>
>> > It's possible, but you'll end up with problems when attempting to
>> overwrite or delete entries
>>
>> I'm wondering if you can elucidate on that a little bit, do you just mean
>> that it's easy to forget to always set your timestamp correctly, and if you
>> goof it up, it makes it difficult to recover from (i.e. you issue a delete
>> with system timestamp instead of document version, and that's way larger
>> than your document version would ever be, so you can never write that
>> document again)?  Or is there some bug in write timestamps that can cause
>> the wrong entry to win the write contention?
>>
>> We're looking at doing something similar to keep a live max value column
>> in a given table, our setup is as follows:
>>
>> CREATE TABLE a (
>>   id <whatever>,
>>   time timestamp,
>>   max_b_foo int,
>>   PRIMARY KEY (id)
>> );
>> CREATE TABLE b (
>>   b_id <whatever>,
>>   a_id <whatever>,
>>   a_timestamp timestamp,
>>   foo int,
>>   PRIMARY KEY (a_id, b_id)
>> );
>>
>> The idea being that there's a one-to-many relationship between *a* and
>> *b*.  We want *a* to know what the maximum value is in *b* for field
>> *foo* so we can avoid reading *all* *b* when we want to resolve *a*. You
>> can see that we can't just use *b*'s clustering key to resolve that with
>> LIMIT 1; also this is for DSE Solr, which wouldn't be able to query a by
>> max b.foo anyway.  So when we write to *b*, we also write to *a* with
>> something like
>>
>> UPDATE a USING TIMESTAMP ${b.a_timestamp.toMicros + b.foo} SET max_b_foo
>> = ${b.foo} WHERE id = ${b.a_id}
>>
>> Assuming that we don't run afoul of related antipatterns such as
>> repeatedly overwriting the same value indefinitely, this strikes me as
>> sound if unorthodox practice, as long as conflict resolution in Cassandra
>> isn't broken in some subtle way.  We also designed this to be safe from
>> getting write timestamps greatly out of sync with clock time so that
>> non-timestamped operations (especially delete) if done accidentally will
>> still have a reasonable chance of having the expected results.
>>
>> So while it may not be the intended use case for write timestamps, and
>> there are definitely gotchas if you are not careful or misunderstand the
>> consequences, as far as I can see the logic behind it is sound but does
>> rely on correct conflict resolution in Cassandra.  I'm curious if I'm
>> missing or misunderstanding something important.
>>
>> On Wed, Mar 11, 2015 at 4:11 PM, Tyler Hobbs <ty...@datastax.com> wrote:
>>
>>> Don't use the version as your timestamp.  It's possible, but you'll end
>>> up with problems when attempting to overwrite or delete entries.
>>>
>>> Instead, make the version part of the primary key:
>>>
>>> CREATE TABLE document_store (document_id bigint, version int, document
>>> text, PRIMARY KEY (document_id, version)) WITH CLUSTERING ORDER BY (version
>>> desc)
>>>
>>> That way you don't have to worry about overwriting higher versions with
>>> a lower one, and to read the latest version, you only have to do:
>>>
>>> SELECT * FROM document_store WHERE document_id = ? LIMIT 1;
>>>
>>> Another option is to use lightweight transactions (i.e. UPDATE ... SET
>>> docuement = ?, version = ? WHERE document_id = ? IF version < ?), but
>>> that's going to make writes much more expensive.
>>>
>>> On Wed, Mar 11, 2015 at 12:45 AM, Sachin Nikam <skni...@gmail.com>
>>> wrote:
>>>
>>>> I am planning to use the Update...USING TIMESTAMP... statement to make
>>>> sure that I do not overwrite fresh data with stale data while having to
>>>> avoid doing at least LOCAL_QUORUM writes.
>>>>
>>>> Here is my table structure.
>>>>
>>>> Table=DocumentStore
>>>> DocumentID (primaryKey, bigint)
>>>> Document(text)
>>>> Version(int)
>>>>
>>>> If the service receives 2 write requests with Version=1 and Version=2,
>>>> regardless of the order of arrival, the business requirement is that we end
>>>> up with Version=2 in the database.
>>>>
>>>> Can I use the following CQL Statement?
>>>>
>>>> Update DocumentStore using <versionValue>
>>>> SET  Document=<documentValue>,
>>>> Version=<versionValue>
>>>> where DocumentID=<documentIDValue>;
>>>>
>>>> Has anybody used something like this? If so was the behavior as
>>>> expected?
>>>>
>>>> Regards
>>>> Sachin
>>>>
>>>
>>>
>>>
>>> --
>>> Tyler Hobbs
>>> DataStax <http://datastax.com/>
>>>
>>
>>

Re: CQL 3.x Update ...USING TIMESTAMP...

Reply via email to