Tyler,
I can consider trying out lightweight transactions, but here are my concerns:
#1. We have 2 data centers located close by, with plans to expand to more
data centers that are even further apart geographically.
#2. How will lightweight transactions be impacted when there is a high level
of network contention for cross-data-center traffic?
#3. Do you know of any real examples where companies have used lightweight
transactions in a multi-data-center setup?
Regards
Sachin
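For concreteness, a sketch of the kind of conditional write being discussed, run from cqlsh against the DocumentStore schema described later in the thread. The LOCAL_SERIAL setting is one way to keep the Paxos round out of the cross-data-center path; it is an assumption here, not something anyone in the thread has tested:

-- Confine the Paxos round for conditional writes to the local data center.
SERIAL CONSISTENCY LOCAL_SERIAL;
-- Keep regular (non-conditional) reads and writes local as well.
CONSISTENCY LOCAL_QUORUM;

-- Only apply the write if the stored version is older.
UPDATE DocumentStore
SET Document = ?, Version = ?
WHERE DocumentID = ?
IF Version < ?;

The trade-off relevant to concern #2: LOCAL_SERIAL avoids cross-data-center round trips, but it only linearizes conditional writes within one data center, so concurrent writers in different data centers can still conflict.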
On Tue, Mar 24, 2015 at 10:56 AM, Tyler Hobbs <ty...@datastax.com> wrote:

>> do you just mean that it's easy to forget to always set your timestamp
>> correctly, and if you goof it up, it makes it difficult to recover from
>> (i.e. you issue a delete with the system timestamp instead of the
>> document version, and that's way larger than your document version would
>> ever be, so you can never write that document again)?
>
> Yes, that's basically what I meant. Plus, if you need to make a manual
> correction to a document, you'll need to increment the version, which
> would presumably cause problems for your application. It's possible to
> handle all of this correctly if you take care, but I wouldn't trust
> myself to always get this right.
>
>> @Tyler
>> With your recommendation, won't I end up saving all the version(s) of
>> the document? In my case the document is pretty huge (~5 MB) and each
>> document has up to 10 versions. And you already highlighted that
>> lightweight transactions are very expensive.
>
> You can always delete older versions to free up space.
>
> Using lightweight transactions may be a decent option if you don't have
> really high write throughput and aren't expecting high contention (which
> I don't think you are). I recommend testing this out with your
> application to see how it performs for you.
>
> On Sun, Mar 22, 2015 at 7:02 PM, Sachin Nikam <skni...@gmail.com> wrote:
>
>> @Eric Stevens
>> Thanks for representing my position while I came back to this thread.
>>
>> @Tyler
>> With your recommendation, won't I end up saving all the version(s) of
>> the document? In my case the document is pretty huge (~5 MB) and each
>> document has up to 10 versions. And you already highlighted that
>> lightweight transactions are very expensive.
>>
>> Also, as Eric mentions, can you elaborate on what kind of problems could
>> happen when we try to overwrite or delete data?
>> Regards
>> Sachin
>>
>> On Fri, Mar 13, 2015 at 4:23 AM, Brice Dutheil <brice.duth...@gmail.com>
>> wrote:
>>
>>> I agree with Tyler: in the normal run of a live application I would not
>>> recommend using the timestamp; use other ways to *version* *inserts*.
>>> Otherwise you may fall into the *upsert* pitfalls that Tyler mentions.
>>>
>>> However, I find there's a legitimate use for the USING TIMESTAMP trick
>>> when migrating data from another datastore.
>>>
>>> The trick is, at some point, to enable the application to start writing
>>> to Cassandra *without* any timestamp setting on the statements. ⇐ for
>>> fresh data
>>> Then start a migration batch that will use a write time with an older
>>> date (i.e. when there's *no* possible *collision* with other data). ⇐
>>> for older data
>>>
>>> *This trick has been used in prod with billions of records.*
>>>
>>> -- Brice
>>>
>>> On Thu, Mar 12, 2015 at 10:42 PM, Eric Stevens <migh...@gmail.com>
>>> wrote:
>>>
>>>> Ok, but if you're using a system of time that isn't server clock
>>>> oriented (Sachin's document revision ID, and my fixed and necessarily
>>>> consistent base timestamp [B's always know their parent A's exact
>>>> recorded timestamp]), isn't the principle of using timestamps to force
>>>> a particular update out of several to win still sound?
>>>>
>>>> > as using the clocks is only valid if clocks are perfectly sync'ed,
>>>> > which they are not
>>>>
>>>> Clock skew is a problem which doesn't seem to be a factor in either
>>>> use case, given that both have a consistent external source of truth
>>>> for timestamps.
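A sketch of the migration trick Brice describes above, written against the DocumentStore schema from this thread; the cutover date and its microsecond value are illustrative assumptions:

-- Fresh data: live application writes carry no explicit timestamp,
-- so the server assigns the current time in microseconds.
INSERT INTO DocumentStore (DocumentID, Document, Version)
VALUES (?, ?, ?);

-- Older data: the migration batch stamps every write with a timestamp
-- predating the cutover (1420070400000000 = 2015-01-01 00:00:00 UTC in
-- microseconds), so a live write always wins over a migrated row.
INSERT INTO DocumentStore (DocumentID, Document, Version)
VALUES (?, ?, ?)
USING TIMESTAMP 1420070400000000;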
>>>>
>>>> On Thu, Mar 12, 2015 at 12:58 PM, Jonathan Haddad <j...@jonhaddad.com>
>>>> wrote:
>>>>
>>>>> In most datacenters you're going to see significant variance in your
>>>>> server times, likely > 20ms between servers in the same rack. Even
>>>>> Google, using atomic clocks, has 1-7ms variance. [1]
>>>>>
>>>>> I would +1 Tyler's advice here, as using the clocks is only valid if
>>>>> clocks are perfectly sync'ed, which they are not, and likely never
>>>>> will be in our lifetime.
>>>>>
>>>>> [1] http://queue.acm.org/detail.cfm?id=2745385
>>>>>
>>>>> On Thu, Mar 12, 2015 at 7:04 AM Eric Stevens <migh...@gmail.com>
>>>>> wrote:
>>>>>
>>>>>> > It's possible, but you'll end up with problems when attempting to
>>>>>> > overwrite or delete entries
>>>>>>
>>>>>> I'm wondering if you can elucidate on that a little bit. Do you just
>>>>>> mean that it's easy to forget to always set your timestamp
>>>>>> correctly, and if you goof it up, it makes it difficult to recover
>>>>>> from (i.e. you issue a delete with the system timestamp instead of
>>>>>> the document version, and that's way larger than your document
>>>>>> version would ever be, so you can never write that document again)?
>>>>>> Or is there some bug in write timestamps that can cause the wrong
>>>>>> entry to win the write contention?
>>>>>>
>>>>>> We're looking at doing something similar to keep a live max value
>>>>>> column in a given table. Our setup is as follows:
>>>>>>
>>>>>> CREATE TABLE a (
>>>>>>   id <whatever>,
>>>>>>   time timestamp,
>>>>>>   max_b_foo int,
>>>>>>   PRIMARY KEY (id)
>>>>>> );
>>>>>> CREATE TABLE b (
>>>>>>   b_id <whatever>,
>>>>>>   a_id <whatever>,
>>>>>>   a_timestamp timestamp,
>>>>>>   foo int,
>>>>>>   PRIMARY KEY (a_id, b_id)
>>>>>> );
>>>>>>
>>>>>> The idea being that there's a one-to-many relationship between *a*
>>>>>> and *b*. We want *a* to know what the maximum value is in *b* for
>>>>>> field *foo* so we can avoid reading *all* of *b* when we want to
>>>>>> resolve *a*. You can see that we can't just use *b*'s clustering key
>>>>>> to resolve that with LIMIT 1; also, this is for DSE Solr, which
>>>>>> wouldn't be able to query a by max b.foo anyway. So when we write to
>>>>>> *b*, we also write to *a* with something like:
>>>>>>
>>>>>> UPDATE a USING TIMESTAMP ${b.a_timestamp.toMicros + b.foo}
>>>>>> SET max_b_foo = ${b.foo} WHERE id = ${b.a_id}
>>>>>>
>>>>>> Assuming that we don't run afoul of related antipatterns, such as
>>>>>> repeatedly overwriting the same value indefinitely, this strikes me
>>>>>> as sound if unorthodox practice, as long as conflict resolution in
>>>>>> Cassandra isn't broken in some subtle way. We also designed this to
>>>>>> be safe from getting write timestamps greatly out of sync with clock
>>>>>> time, so that non-timestamped operations (especially deletes), if
>>>>>> done accidentally, will still have a reasonable chance of having the
>>>>>> expected results.
>>>>>>
>>>>>> So while it may not be the intended use case for write timestamps,
>>>>>> and there are definitely gotchas if you are not careful or
>>>>>> misunderstand the consequences, as far as I can see the logic behind
>>>>>> it is sound, but it does rely on correct conflict resolution in
>>>>>> Cassandra. I'm curious if I'm missing or misunderstanding something
>>>>>> important.
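To make the failure mode Eric asks about concrete, a sketch with illustrative values (DocumentID 42 and the timestamps are assumptions, not from the thread). Version numbers used as write timestamps are tiny compared to the server's wall clock in microseconds, so one unstamped delete shadows every later versioned write:

-- Versioned writes use the small document version as the write timestamp.
UPDATE DocumentStore USING TIMESTAMP 7
SET Document = 'v7', Version = 7
WHERE DocumentID = 42;

-- An accidental delete without USING TIMESTAMP gets the server clock,
-- roughly 1427000000000000 microseconds since the epoch in March 2015.
DELETE FROM DocumentStore WHERE DocumentID = 42;

-- This write "loses" to the tombstone above (8 < 1427000000000000) and is
-- silently dropped; the document can never be written again this way.
UPDATE DocumentStore USING TIMESTAMP 8
SET Document = 'v8', Version = 8
WHERE DocumentID = 42;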
>>>>>>
>>>>>> On Wed, Mar 11, 2015 at 4:11 PM, Tyler Hobbs <ty...@datastax.com>
>>>>>> wrote:
>>>>>>
>>>>>>> Don't use the version as your timestamp. It's possible, but you'll
>>>>>>> end up with problems when attempting to overwrite or delete
>>>>>>> entries.
>>>>>>>
>>>>>>> Instead, make the version part of the primary key:
>>>>>>>
>>>>>>> CREATE TABLE document_store (
>>>>>>>   document_id bigint,
>>>>>>>   version int,
>>>>>>>   document text,
>>>>>>>   PRIMARY KEY (document_id, version)
>>>>>>> ) WITH CLUSTERING ORDER BY (version DESC);
>>>>>>>
>>>>>>> That way you don't have to worry about overwriting higher versions
>>>>>>> with a lower one, and to read the latest version, you only have to
>>>>>>> do:
>>>>>>>
>>>>>>> SELECT * FROM document_store WHERE document_id = ? LIMIT 1;
>>>>>>>
>>>>>>> Another option is to use lightweight transactions (i.e. UPDATE ...
>>>>>>> SET document = ?, version = ? WHERE document_id = ? IF version
>>>>>>> < ?), but that's going to make writes much more expensive.
>>>>>>>
>>>>>>> On Wed, Mar 11, 2015 at 12:45 AM, Sachin Nikam <skni...@gmail.com>
>>>>>>> wrote:
>>>>>>>
>>>>>>>> I am planning to use the UPDATE ... USING TIMESTAMP ... statement
>>>>>>>> to make sure that I do not overwrite fresh data with stale data,
>>>>>>>> while avoiding having to do at least LOCAL_QUORUM writes.
>>>>>>>>
>>>>>>>> Here is my table structure:
>>>>>>>>
>>>>>>>> Table=DocumentStore
>>>>>>>> DocumentID (primary key, bigint)
>>>>>>>> Document (text)
>>>>>>>> Version (int)
>>>>>>>>
>>>>>>>> If the service receives 2 write requests with Version=1 and
>>>>>>>> Version=2, regardless of the order of arrival, the business
>>>>>>>> requirement is that we end up with Version=2 in the database.
>>>>>>>>
>>>>>>>> Can I use the following CQL statement?
>>>>>>>>
>>>>>>>> UPDATE DocumentStore USING TIMESTAMP <versionValue>
>>>>>>>> SET Document=<documentValue>,
>>>>>>>>     Version=<versionValue>
>>>>>>>> WHERE DocumentID=<documentIDValue>;
>>>>>>>>
>>>>>>>> Has anybody used something like this? If so, was the behavior as
>>>>>>>> expected?
>>>>>>>>
>>>>>>>> Regards
>>>>>>>> Sachin
>>>>>>>
>>>>>>> --
>>>>>>> Tyler Hobbs
>>>>>>> DataStax <http://datastax.com/>
>
> --
> Tyler Hobbs
> DataStax <http://datastax.com/>
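Finally, a sketch of the cleanup Tyler mentions above ("you can always delete older versions to free up space"), against the document_store schema with version as a clustering column. The range form assumes the cluster runs a Cassandra version that supports range deletes on clustering columns (3.0+); on older versions, delete versions individually:

-- Range delete: drop all versions older than a cutoff (Cassandra 3.0+).
DELETE FROM document_store WHERE document_id = ? AND version < ?;

-- Portable alternative: delete one known older version at a time.
DELETE FROM document_store WHERE document_id = ? AND version = ?;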