I see the point - apologies for putting everyone through this! It was just militating against my mental model.
In summary, here is my takeaway. It is simple stuff but, IMO, important for concluding this thread (I hope):

1. I was splitting hairs over a failed (partial) Quorum write. Such an event should be immediately followed by the same write retried over a connection to another node (potentially using the connection caches of client implementations), or by a read at CL ALL, because the write could have partially gone through.
2. Timestamps are used in determining the latest version (correcting the false impression I was propagating).

Finally, the "W + R > N" statement for Quorum CL holds, but it can be broken by a failed write, since it is unclear whether the new value got written on any server or not. Is that a fair characterization?

Bottom line: unlike a traditional DBMS, errors do not trigger automatic cleanup and rollback; app code has to follow up if immediate, and not eventual, consistency is desired. I made that leap in almost all cases, I think, except the case of a failed write.

My bad, and I can live with this!

Regards,

-JA

On Thu, Feb 24, 2011 at 11:50 AM, Sylvain Lebresne <sylv...@datastax.com> wrote:

> On Thu, Feb 24, 2011 at 6:33 PM, Anthony John <chirayit...@gmail.com> wrote:
>
>> Completely understand!
>>
>> All that I am quibbling over is whether a CL of quorum guarantees
>> consistency or not. That is what the documentation says - right. IF for a CL
>> of Q read - it depends on which node returns read first to determine the
>> actual returned result or other more convoluted conditions, then a Quorum
>> read/write is not consistent, by any definition.
>
> But that's the point. The definition of consistency we are talking about
> has no meaning if you consider only a quorum read. The definition (which is
> the de facto definition of consistency in 'eventually consistent') makes
> sense if we talk about a write followed by a read. And it is
> considering a succeeding write followed by a succeeding read.
> And that is the statement the wiki is making.
>
> Honestly, we could debate forever on the definition of consistency and
> whatnot. Cassandra guarantees that if you do a (succeeding) write on W
> replicas and then a (succeeding) read on R replicas, and if R+W>N, then it is
> guaranteed that the read will see the preceding write. And this is what is
> called consistency in the context of eventual consistency (which is not the
> context of ACID).
>
> If this is not the definition of consistency you had in mind then by all
> means, Cassandra probably doesn't guarantee this definition. But given that the
> paragraph preceding what you pasted states clearly we are not talking about
> ACID consistency, but eventual consistency, I don't think the wiki is making
> any unfair statement.
>
> That being said, the wiki may not always be as clear as it could. But it's
> an editable wiki :)
>
> --
> Sylvain
>
>> I can still use Cassandra, and will use it, luv it!!! But let us not make
>> this statement on the Wiki architecture section:-
>>
>> -------------------------------------------------------------
>>
>> More specifically: R=read replica count W=write replica count N=replication
>> factor Q=*QUORUM* (Q = N / 2 + 1)
>>
>> - If W + R > N, you will have consistency
>>   - W=1, R=N
>>   - W=N, R=1
>>   - W=Q, R=Q where Q = N / 2 + 1
>>
>> Cassandra provides consistency when R + W > N (read replica count + write
>> replica count > replication factor).
>>
>> ----------------------------------------------------
>>
>> On Thu, Feb 24, 2011 at 11:22 AM, Sylvain Lebresne
>> <sylv...@datastax.com> wrote:
>>
>>> On Thu, Feb 24, 2011 at 6:01 PM, Anthony John <chirayit...@gmail.com> wrote:
>>>
>>>> If you are correct and you are probably closer to the code - then CL of
>>>> Quorum does not guarantee a consistency.
>>>
>>> If the operation succeeds, it does (for some definition of consistency
>>> which is: following reads at Quorum will be guaranteed to see the new value
>>> of an update at quorum).
>>> If it fails, then no, it does not guarantee
>>> consistency.
>>>
>>> It is important to note that the word consistency has multiple meanings.
>>> In particular, when we are talking of consistency in Cassandra, we are not
>>> talking of the same definition as the C in ACID (see:
>>> http://www.allthingsdistributed.com/2007/12/eventually_consistent.html)
>>>
>>>> On Thu, Feb 24, 2011 at 10:54 AM, Sylvain Lebresne <sylv...@datastax.com> wrote:
>>>>
>>>>> On Thu, Feb 24, 2011 at 5:34 PM, Anthony John <chirayit...@gmail.com> wrote:
>>>>>
>>>>>> >>Time stamps are not used for conflict resolution - unless it is
>>>>>> >>part of the application logic!!!
>>>>>>
>>>>>> >>What is your definition of conflict resolution ? Because if you
>>>>>> >>update twice the same column (which I'll call a conflict), then the
>>>>>> >>timestamps are used to decide which update wins (which I'll call a
>>>>>> >>resolution).
>>>>>>
>>>>>> I understand what you are saying, and yes semantics is very important
>>>>>> here. And yes we are responding to the immediate questions without covering
>>>>>> all questions in the thread.
>>>>>>
>>>>>> The point being made here is that the timestamp of the column is not
>>>>>> used by Cassandra to figure out what data to return.
>>>>>
>>>>> Not quite true.
>>>>>
>>>>>> E.g. - Quorum is 2 nodes - and RF of 3 over N1/2/3
>>>>>> A Quorum Write comes and add/updates the time stamp (TS2) of a
>>>>>> particular data element. It succeeds on N1 - fails on N2/3. So the write is
>>>>>> returned as failed - right ?
>>>>>> Now Quorum read comes in for exactly the same piece of data that the
>>>>>> write failed for.
>>>>>> So N1 has TS2 but both N2/3 have the old TS (say TS1)
>>>>>> And the read succeeds - Will it return TS1 or TS2.
>>>>>>
>>>>>> I submit it will return TS1 - the old TS.
>>>>>
>>>>> It all depends on which (first 2) nodes respond to the read (since
>>>>> RF=3, that can be any two of N1/N2/N3).
>>>>> If N1 is part of the two that make the
>>>>> quorum, then TS2 will be returned, because Cassandra will compare the
>>>>> timestamps and decide what to return based on this. If N2/N3 respond
>>>>> however, both timestamps will be TS1 and so, after timestamp resolution, it
>>>>> will still be TS1 that will be returned.
>>>>> So yes, timestamp is used for conflict resolution.
>>>>>
>>>>> In your example, you could get TS1 back because a failed write can leave
>>>>> your cluster in an inconsistent state. You'd have to retry the quorum write and
>>>>> only when it succeeds can you be guaranteed that a quorum read will always
>>>>> return TS2.
>>>>>
>>>>> This is because when a write fails, Cassandra doesn't guarantee that
>>>>> the write did not make it in (there is no revert).
>>>>>
>>>>>> Are we on the same page with this interpretation ?
>>>>>>
>>>>>> Regards,
>>>>>>
>>>>>> -JA
>>>>>>
>>>>>> On Thu, Feb 24, 2011 at 10:12 AM, Sylvain Lebresne <sylv...@datastax.com> wrote:
>>>>>>
>>>>>>> On Thu, Feb 24, 2011 at 4:52 PM, Anthony John <chirayit...@gmail.com> wrote:
>>>>>>>
>>>>>>>> Sylvain,
>>>>>>>>
>>>>>>>> Time stamps are not used for conflict resolution - unless it is part
>>>>>>>> of the application logic!!!
>>>>>>>
>>>>>>> What is your definition of conflict resolution ? Because if you update
>>>>>>> twice the same column (which I'll call a conflict), then the timestamps
>>>>>>> are used to decide which update wins (which I'll call a resolution).
>>>>>>>
>>>>>>>> You can have "lost updates" w/Cassandra. You need to use 3rd party
>>>>>>>> products - cages for e.g. - to get ACID type consistency.
>>>>>>>
>>>>>>> Then again, you'll have to define what you are calling "lost
>>>>>>> updates". Provided you use a reasonable consistency level, Cassandra
>>>>>>> provides fairly strong durability guarantee, so for some definition you
>>>>>>> don't "lose updates".
>>>>>>>
>>>>>>> That being said, I never pretended that Cassandra provided any ACID
>>>>>>> guarantee. ACID relates to transaction, which Cassandra doesn't support. If
>>>>>>> we're talking about the guarantees of transaction, then by all means,
>>>>>>> cassandra won't provide it. And yes you can use cages or the like to get
>>>>>>> transaction. But that was not the point of the thread, was it ? The thread
>>>>>>> is about vector clocks, and that has nothing to do with transaction (vector
>>>>>>> clocks certainly don't give you transactions).
>>>>>>>
>>>>>>> Sorry if I wasn't clear in my mail, but I was only responding to why
>>>>>>> so far I don't think vector clocks would really provide much for
>>>>>>> Cassandra.
>>>>>>>
>>>>>>> --
>>>>>>> Sylvain
>>>>>>>
>>>>>>>> -JA
>>>>>>>>
>>>>>>>> On Thu, Feb 24, 2011 at 7:41 AM, Sylvain Lebresne <sylv...@datastax.com> wrote:
>>>>>>>>
>>>>>>>>> On Thu, Feb 24, 2011 at 3:22 AM, Anthony John <chirayit...@gmail.com> wrote:
>>>>>>>>>
>>>>>>>>>> Apologies : For some reason my response on the original mail keeps
>>>>>>>>>> bouncing back, thus this new one!
>>>>>>>>>> > From the other hand, the same article says:
>>>>>>>>>> > "For conditional writes to work, the condition must be evaluated at all update
>>>>>>>>>> > sites before the write can be allowed to succeed."
>>>>>>>>>> >
>>>>>>>>>> > This means, that when doing such an update CL=ALL must be used
>>>>>>>>>>
>>>>>>>>>> Sorry, but I am confused by that entire thread!
>>>>>>>>>>
>>>>>>>>>> Questions:-
>>>>>>>>>> 1. Does Cassandra implement any kind of data locking - at any
>>>>>>>>>> granularity whether it be row/colF/Col ?
>>>>>>>>>
>>>>>>>>> No locking, no.
>>>>>>>>>
>>>>>>>>>> 2. If the answer to 1 above is NO! - how does CL ALL prevent
>>>>>>>>>> conflicts.
>>>>>>>>>> Concurrent updates on exactly the same piece of data on different
>>>>>>>>>> nodes can still mess each other up, right ?
>>>>>>>>>
>>>>>>>>> Not sure why you are taking CL.ALL specifically. But in any CL,
>>>>>>>>> updating the same piece of data means the same column value. In that case,
>>>>>>>>> the resolution rules are the following:
>>>>>>>>> - If the updates have a different timestamp, keep the one with
>>>>>>>>>   the higher timestamp. That is, the more recent of two updates wins.
>>>>>>>>> - If the timestamps are the same, then it compares the values
>>>>>>>>>   (byte comparison) and keeps the highest value. This is just to break
>>>>>>>>>   ties in a consistent manner.
>>>>>>>>>
>>>>>>>>> So if you do two truly concurrent updates (that is, from two places
>>>>>>>>> at the same instant), then you'll end up with one of the updates. This is
>>>>>>>>> at the column level.
>>>>>>>>>
>>>>>>>>> However, if that simple conflict detection/resolution mechanism is
>>>>>>>>> not good enough for some of your use cases and you need to keep two
>>>>>>>>> concurrent updates, it is easy enough. Just make sure that the updates don't
>>>>>>>>> end up in the same column. This is easily achieved by appending some unique
>>>>>>>>> identifier to the column name for instance. And when reading, do a slice and
>>>>>>>>> reconcile whatever you get back with whatever logic makes sense. If you do
>>>>>>>>> that, congrats, you've roughly emulated what vector clocks would do. Btw, no
>>>>>>>>> locking or anything needed.
>>>>>>>>>
>>>>>>>>> In my experience, for most things the timestamp resolution is
>>>>>>>>> enough. If the same user updates their profile picture twice on your web site
>>>>>>>>> at the same microsecond, it's usually fine to end up with one of the two
>>>>>>>>> pictures.
>>>>>>>>> In the rare case where you need something more specific, using the
>>>>>>>>> Cassandra data model usually solves the problem easily. The reason for not
>>>>>>>>> having vector clocks in Cassandra is that so far, we haven't really found
>>>>>>>>> many examples where it is not the case.
>>>>>>>>>
>>>>>>>>> --
>>>>>>>>> Sylvain
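
A footnote to the thread: the resolution rules quoted above (higher timestamp wins, ties broken by byte comparison of the values) and the failed quorum write example (N1 holds TS2, N2/N3 still hold TS1) can be played through in a small toy model. This is only a sketch in Python, not Cassandra's code; the names `Column`, `reconcile`, and `quorum_read` are made up for illustration, and timestamps are plain integers.

```python
from dataclasses import dataclass
from itertools import combinations


@dataclass(frozen=True)
class Column:
    """Hypothetical stand-in for one stored column version."""
    value: bytes
    timestamp: int


def reconcile(a: Column, b: Column) -> Column:
    """Pick a winner using the rules described in the thread."""
    if a.timestamp != b.timestamp:
        # Different timestamps: the more recent update wins.
        return a if a.timestamp > b.timestamp else b
    # Same timestamp: compare the raw bytes and keep the highest value,
    # purely to break ties the same way on every node.
    return a if a.value >= b.value else b


def quorum_read(replicas: dict, responders: tuple) -> Column:
    """Reconcile the versions held by the replicas that answered the read."""
    result = replicas[responders[0]]
    for name in responders[1:]:
        result = reconcile(result, replicas[name])
    return result


# State after the failed quorum write from the example: the write with
# timestamp 2 (TS2) landed on N1 only; N2 and N3 still hold timestamp 1 (TS1).
replicas = {
    "N1": Column(b"new", 2),
    "N2": Column(b"old", 1),
    "N3": Column(b"old", 1),
}

# With RF=3 and a quorum of 2, any two replicas may answer the read.
for pair in combinations(sorted(replicas), 2):
    print(pair, quorum_read(replicas, pair).value)
```

Any pair of responders that includes N1 yields the new value, while the pair N2/N3 yields the old one, which is why the failed write has to be retried until it succeeds before a quorum read is guaranteed to see it.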