Completely understand! All I am quibbling over is whether a CL of Quorum guarantees consistency or not. That is what the documentation says, right? If the result of a read at CL Quorum depends on which nodes respond first, or on other more convoluted conditions, then a Quorum read/write is not consistent, by any definition.
I can still use Cassandra, and will use it, luv it!!! But let us not make this statement in the Wiki architecture section:

-------------------------------------------------------------
More specifically:
R = read replica count
W = write replica count
N = replication factor
Q = *QUORUM* (Q = N / 2 + 1)

- If W + R > N, you will have consistency
  - W=1, R=N
  - W=N, R=1
  - W=Q, R=Q where Q = N / 2 + 1

Cassandra provides consistency when R + W > N (read replica count + write replica count > replication factor).
-------------------------------------------------------------

On Thu, Feb 24, 2011 at 11:22 AM, Sylvain Lebresne <sylv...@datastax.com> wrote:

> On Thu, Feb 24, 2011 at 6:01 PM, Anthony John <chirayit...@gmail.com> wrote:
>
>> If you are correct and you are probably closer to the code - then CL of
>> Quorum does not guarantee a consistency.
>
> If the operation succeeds, it does (for some definition of consistency, which
> is: following reads at Quorum will be guaranteed to see the new value of an
> update at quorum). If it fails, then no, it does not guarantee consistency.
>
> It is important to note that the word consistency has multiple meanings. In
> particular, when we are talking of consistency in Cassandra, we are not
> talking of the same definition as the C in ACID (see:
> http://www.allthingsdistributed.com/2007/12/eventually_consistent.html)
>
>> On Thu, Feb 24, 2011 at 10:54 AM, Sylvain Lebresne
>> <sylv...@datastax.com> wrote:
>>
>>> On Thu, Feb 24, 2011 at 5:34 PM, Anthony John <chirayit...@gmail.com> wrote:
>>>
>>>> >>Time stamps are not used for conflict resolution - unless it is part
>>>> >>of the application logic!!!
>>>>
>>>> >>What is your definition of conflict resolution? Because if you update
>>>> >>twice the same column (which I'll call a conflict), then the timestamps
>>>> >>are used to decide which update wins (which I'll call a resolution).
>>>>
>>>> I understand what you are saying, and yes semantics is very important
>>>> here.
>>>> And yes we are responding to the immediate questions without covering
>>>> all questions in the thread.
>>>>
>>>> The point being made here is that the timestamp of the column is not
>>>> used by Cassandra to figure out what data to return.
>>>
>>> Not quite true.
>>>
>>>> E.g. - Quorum is 2 nodes - and RF of 3 over N1/2/3
>>>> A Quorum write comes and adds/updates the time stamp (TS2) of a
>>>> particular data element. It succeeds on N1 - fails on N2/3. So the write is
>>>> returned as failed - right ?
>>>> Now a Quorum read comes in for exactly the same piece of data that the
>>>> write failed for.
>>>> So N1 has TS2 but both N2/3 have the old TS (say TS1)
>>>> And the read succeeds - will it return TS1 or TS2?
>>>>
>>>> I submit it will return TS1 - the old TS.
>>>
>>> It all depends on which (first 2) nodes respond to the read (since RF=3,
>>> that can be any two of N1/N2/N3). If N1 is part of the two that make the
>>> quorum, then TS2 will be returned, because Cassandra will compare the
>>> timestamps and decide what to return based on this. If N2/N3 respond
>>> however, both timestamps will be TS1 and so, after timestamp resolution, it
>>> will still be TS1 that will be returned.
>>> So yes, the timestamp is used for conflict resolution.
>>>
>>> In your example, you could get TS1 back because a failed write can leave
>>> your cluster in an inconsistent state. You'd have to retry the quorum write,
>>> and only when it succeeds can you be guaranteed that a quorum read will
>>> always return TS2.
>>>
>>> This is because when a write fails, Cassandra doesn't guarantee that the
>>> write did not make it in (there is no revert).
>>>
>>>> Are we on the same page with this interpretation ?
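The N1/N2/N3 scenario discussed above can be walked through with a toy model. This is only a sketch, not Cassandra code: the `replicas` dictionary, `quorum_size` and `quorum_read` are hypothetical names made up for illustration, and the model assumes a read simply returns the value with the highest timestamp among the replicas that answered.

```python
from itertools import combinations

# Hypothetical replica state after the failed quorum write:
# only N1 accepted the new value (timestamp 2); N2 and N3 still hold timestamp 1.
replicas = {"N1": (2, "new"), "N2": (1, "old"), "N3": (1, "old")}


def quorum_size(replication_factor):
    """Q = N / 2 + 1, using integer division."""
    return replication_factor // 2 + 1


def quorum_read(responders):
    """Return the (timestamp, value) pair with the highest timestamp among responders."""
    return max(replicas[name] for name in responders)


q = quorum_size(len(replicas))

# Enumerate every possible set of q replicas that could answer the read first.
results = {pair: quorum_read(pair) for pair in combinations(sorted(replicas), q)}

for pair, (timestamp, value) in results.items():
    print(pair, "->", f"TS{timestamp}", value)
```

Under these assumptions, any responding pair that includes N1 yields TS2, while the pair N2/N3 yields TS1, which is the behaviour both sides of the exchange describe for a read that follows a failed write.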
>>>>
>>>> Regards,
>>>>
>>>> -JA
>>>>
>>>> On Thu, Feb 24, 2011 at 10:12 AM, Sylvain Lebresne <
>>>> sylv...@datastax.com> wrote:
>>>>
>>>>> On Thu, Feb 24, 2011 at 4:52 PM, Anthony John
>>>>> <chirayit...@gmail.com> wrote:
>>>>>
>>>>>> Sylvain,
>>>>>>
>>>>>> Time stamps are not used for conflict resolution - unless it is part
>>>>>> of the application logic!!!
>>>>>
>>>>> What is your definition of conflict resolution? Because if you update
>>>>> twice the same column (which I'll call a conflict), then the timestamps
>>>>> are used to decide which update wins (which I'll call a resolution).
>>>>>
>>>>>> You can have "lost updates" w/Cassandra. You need to use 3rd-party
>>>>>> products - cages, for example - to get ACID type consistency.
>>>>>
>>>>> Then again, you'll have to define what you are calling "lost updates".
>>>>> Provided you use a reasonable consistency level, Cassandra provides fairly
>>>>> strong durability guarantees, so for some definition you don't "lose
>>>>> updates".
>>>>>
>>>>> That being said, I never pretended that Cassandra provided any ACID
>>>>> guarantee. ACID relates to transactions, which Cassandra doesn't support.
>>>>> If we're talking about the guarantees of transactions, then by all means,
>>>>> Cassandra won't provide them. And yes, you can use cages or the like to get
>>>>> transactions. But that was not the point of the thread, was it? The thread
>>>>> is about vector clocks, and that has nothing to do with transactions
>>>>> (vector clocks certainly don't give you transactions).
>>>>>
>>>>> Sorry if I wasn't clear in my mail, but I was only responding to why so
>>>>> far I don't think vector clocks would really provide much for Cassandra.
>>>>>
>>>>> --
>>>>> Sylvain
>>>>>
>>>>>> -JA
>>>>>>
>>>>>> On Thu, Feb 24, 2011 at 7:41 AM, Sylvain Lebresne <
>>>>>> sylv...@datastax.com> wrote:
>>>>>>
>>>>>>> On Thu, Feb 24, 2011 at 3:22 AM, Anthony John <chirayit...@gmail.com
>>>>>>> > wrote:
>>>>>>>
>>>>>>>> Apologies : For some reason my response on the original mail keeps
>>>>>>>> bouncing back, thus this new one!
>>>>>>>> > On the other hand, the same article says:
>>>>>>>> > "For conditional writes to work, the condition must be evaluated
>>>>>>>> > at all update sites before the write can be allowed to succeed."
>>>>>>>> >
>>>>>>>> > This means, that when doing such an update CL=ALL must be used
>>>>>>>>
>>>>>>>> Sorry, but I am confused by that entire thread!
>>>>>>>>
>>>>>>>> Questions:-
>>>>>>>> 1. Does Cassandra implement any kind of data locking - at any
>>>>>>>> granularity whether it be row/colF/Col ?
>>>>>>>
>>>>>>> No locking, no.
>>>>>>>
>>>>>>>> 2. If the answer to 1 above is NO! - how does CL ALL prevent
>>>>>>>> conflicts. Concurrent updates on exactly the same piece of data on
>>>>>>>> different nodes can still mess each other up, right ?
>>>>>>>
>>>>>>> Not sure why you are taking CL.ALL specifically. But in any CL,
>>>>>>> updating the same piece of data means the same column value. In that
>>>>>>> case, the resolution rules are the following:
>>>>>>> - If the updates have different timestamps, keep the one with the
>>>>>>> higher timestamp. That is, the more recent of two updates wins.
>>>>>>> - If the timestamps are the same, then it compares the values (byte
>>>>>>> comparison) and keeps the highest value. This is just to break ties in a
>>>>>>> consistent manner.
>>>>>>>
>>>>>>> So if you do two truly concurrent updates (that is, from two places at
>>>>>>> the same instant), then you'll end up with one of the updates. This is at
>>>>>>> the column level.
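The two resolution rules listed above (higher timestamp wins; on a tie, the higher value by byte comparison wins) can be sketched as a small function. This is an illustration only, not Cassandra's actual code: `Column` and `reconcile` are hypothetical names, and Python's lexicographic `bytes` comparison is assumed to stand in for the byte comparison mentioned in the mail.

```python
from typing import NamedTuple


class Column(NamedTuple):
    """Hypothetical stand-in for one version of a column."""
    name: bytes
    value: bytes
    timestamp: int


def reconcile(a: Column, b: Column) -> Column:
    """Pick the winning version of the same column.

    Rule 1: the version with the higher timestamp wins.
    Rule 2: if the timestamps are equal, the higher value (byte comparison)
            wins, so every replica breaks the tie the same way.
    """
    if a.timestamp != b.timestamp:
        return a if a.timestamp > b.timestamp else b
    return a if a.value >= b.value else b


older = Column(b"picture", b"cat.png", 1000)
newer = Column(b"picture", b"dog.png", 2000)
tie_a = Column(b"picture", b"apple.png", 3000)
tie_b = Column(b"picture", b"pear.png", 3000)

print(reconcile(older, newer).value)  # version with timestamp 2000
print(reconcile(tie_a, tie_b).value)  # tie broken by comparing the values
```

Because the function returns the same winner whichever order its arguments arrive in, two replicas that see the same pair of updates converge on the same value without any locking.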
>>>>>>>
>>>>>>> However, if that simple conflict detection/resolution mechanism is
>>>>>>> not good enough for some of your use cases and you need to keep two
>>>>>>> concurrent updates, it is easy enough. Just make sure that the updates
>>>>>>> don't end up in the same column. This is easily achieved by appending some
>>>>>>> unique identifier to the column name, for instance. And when reading, do a
>>>>>>> slice and reconcile whatever you get back with whatever logic makes sense.
>>>>>>> If you do that, congrats, you've roughly emulated what vector clocks would
>>>>>>> do. Btw, no locking or anything needed.
>>>>>>>
>>>>>>> In my experience, for most things the timestamp resolution is enough.
>>>>>>> If the same user updates their profile picture twice on your web site at
>>>>>>> the same microsecond, it's usually fine to end up with one of the two
>>>>>>> pictures. In the rare case where you need something more specific, using
>>>>>>> the Cassandra data model usually solves the problem easily. The reason for
>>>>>>> not having vector clocks in Cassandra is that so far, we haven't really
>>>>>>> found many examples where that is not the case.
>>>>>>>
>>>>>>> --
>>>>>>> Sylvain
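The "append a unique identifier to the column name, then slice and reconcile on read" idea from the oldest mail in the thread can be sketched as follows. This is a toy in-memory model under stated assumptions, not a Cassandra client: the `row` dictionary stands in for one row, `write_unique` and `read_slice` are hypothetical helpers, and the merge step (keep every concurrent update) is just one possible application-level policy.

```python
import uuid

# Hypothetical in-memory stand-in for one row: column name -> (value, timestamp).
row = {}


def write_unique(row, base_name, value, timestamp):
    """Store the update under a column name made unique with a random suffix,
    so concurrent writers never overwrite each other's column."""
    column_name = f"{base_name}:{uuid.uuid4().hex}"
    row[column_name] = (value, timestamp)
    return column_name


def read_slice(row, base_name):
    """Return every (value, timestamp) stored under the given base name,
    imitating a slice over the column-name prefix."""
    prefix = base_name + ":"
    return [cell for name, cell in sorted(row.items()) if name.startswith(prefix)]


# Two writers update the same logical field at the same timestamp.
write_unique(row, "cart", "add book", 1000)
write_unique(row, "cart", "add pen", 1000)

# On read, the application sees both updates and applies its own merge logic;
# here it simply keeps all of them.
concurrent = read_slice(row, "cart")
merged = sorted(value for value, _ in concurrent)
print(merged)
```

With plain timestamp resolution one of the two cart updates would have been dropped; in this sketch both survive and the application decides how to combine them, with no locking involved.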