Re: What happens if there is a collision?

Jérôme Verstrynge Thu, 21 Oct 2010 16:42:03 -0700

On 21/10/2010 23:40, Peter Schuller wrote:

OK. Thanks for your answer. From an email exchange I had with Jonathan, all
this means that one should re-read one's writes with quorum to make sure they
have not been overridden by timestamp-tie conflicts. I suggested sending
feedback to the writing node (in the ACK) when such timestamp-tie conflicts
happen. This would avoid having to double-check all writes for timestamp-tie
conflicts.


If multiple applications write to the same ColumnFamily/Tables, this
double-check is a must (unless a separate locking mechanism is implemented,
which would be heavier).

I'm not sure I understand what you're trying to accomplish. Given that
you have no locking/synchronization mechanism external to Cassandra,
what is it that you are actually learning from re-reading the value? A
completed write at level QUORUM means it was successfully written and
that readers reading at QUORUM will see it unless the value has been
updated subsequently.

REM: I am not trying to make this discussion longer than necessary or to play semantics. I am not into that at all and I appreciate the time you take to answer me, really.

Here is where I disagree with your conclusion when there is a timestamp tie. The write by node E will not be performed successfully (at quorum level), because of the tie resolution in favor of A somewhere in all the nodes between A and E.

Let's imagine that A initiates its column write at: 334450 ms with 'AAA' and timestamp 334450 ms.

Let's imagine that E initiates its column write at: 334451 ms with 'ZZZ' and timestamp 334450 ms.

(E is the latest write)

Let's imagine that A reaches C at 334455 ms and performs its write.

Let's imagine that E reaches C at 334456 ms and attempts to perform its write. It will lose the timestamp-tie ('AAA' is greater than 'ZZZ').

Even if there is no further writing on that same column using timestamp 334450, a quorum read won't see that 'ZZZ' value (which is the latest attempt to write/update the column).


Node A will have completed a write at QUORUM level.

Node E will have completed a write at QUORUM level, but its value won't be registered and it won't be notified about it.
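
To make the scenario concrete, here is a minimal Python sketch of it. It is a toy model, not Cassandra code: `Cell`, `Replica`, `reconcile`, `quorum_write` and `quorum_read` are hypothetical names, and the tie rule is a stand-in chosen so that 'AAA' beats 'ZZZ' as in the example above (the comparison Cassandra actually uses may order the values differently).

```python
from dataclasses import dataclass
from functools import reduce
from typing import List, Optional


@dataclass(frozen=True)
class Cell:
    value: str
    timestamp: int  # client-supplied timestamp, in ms


def reconcile(current: Optional[Cell], incoming: Cell) -> Cell:
    """Pick the surviving cell: higher timestamp wins, ties are broken on value.

    ASSUMPTION: the tie rule (smaller string wins) is a stand-in chosen so that
    'AAA' beats 'ZZZ' as in the scenario; Cassandra's real rule may differ.
    """
    if current is None:
        return incoming
    if incoming.timestamp != current.timestamp:
        return incoming if incoming.timestamp > current.timestamp else current
    return incoming if incoming.value < current.value else current


class Replica:
    def __init__(self) -> None:
        self.cell: Optional[Cell] = None

    def write(self, cell: Cell) -> bool:
        self.cell = reconcile(self.cell, cell)
        return True  # plain ACK: says nothing about whether the value was kept


def quorum_write(replicas: List[Replica], cell: Cell) -> bool:
    # The write "completes" once a majority of replicas have acknowledged it.
    acks = sum(r.write(cell) for r in replicas)
    return acks >= len(replicas) // 2 + 1


def quorum_read(replicas: List[Replica]) -> Optional[Cell]:
    # Read from a majority and reconcile the answers with the same rule.
    quorum = len(replicas) // 2 + 1
    cells = [r.cell for r in replicas[:quorum] if r.cell is not None]
    return reduce(reconcile, cells, None)


replicas = [Replica() for _ in range(3)]
a_ok = quorum_write(replicas, Cell("AAA", 334450))  # A's write arrives first
e_ok = quorum_write(replicas, Cell("ZZZ", 334450))  # E's write arrives second
print(a_ok, e_ok, quorum_read(replicas).value)
```

Under these assumptions both writes report completion, yet the quorum read returns 'AAA': E's ACK looks exactly like A's.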

Hence, I disagree with your conclusion that a quorum write implies that it was successfully written. It is not the case for E. I know we could play semantics about the meaning of 'successful write' here, but that would lead us nowhere and that is not my point.

But even if you re-read, that does not remove
the fundamental potential for a race condition (i.e., you still don't
know when you see the result of your read whether it wasn't just
overwritten anyway just after you did your read).

Perhaps I'm misunderstanding what you're trying to do?

I totally agree there is a risk of a race condition.

Here is what I am trying to do and why:

If there is no timestamp-tie between A and E, then I have no issue.

If there is a timestamp-tie, then the context becomes uncertain for E, out of the blue. If application E can't be sure about what has been saved in Cassandra, it cannot rely on what it has in memory. It is a vicious circle. It can't anticipate the potential actions of A on the column either. This is unusual for any application, but maybe this is the price to pay for using Cassandra. Fair enough.

If E is not informed of the timestamp tie, then it is left alone in the dark. This is why I say Cassandra is not deterministic to E. The result of a write is potentially non-deterministic in what it actually performs.

If E was aware that it lost a timestamp-tie, it would know that there is a possible gap between its internal memory representation and what it tried to save into Cassandra. That is, EVEN if there is no further write on that same column (or, in other words, regardless of any potential subsequent races).

If E was informed it lost a timestamp-tie, it could re-read the column (and let's assume that there is no further write in between, but this does not change anything to the argument). It could spot that its write for timestamp value 334450 ms failed, and also the reason why ('AAA' greater than 'ZZZ'). It could issue a new write, which eventually could result in another timestamp-tie, but at least it would be informed about it too... It would have a safety net.

The case I am trying to cover is the case where the context for application E becomes invalid because of a successful write call to Cassandra without registration of 'ZZZ'. How can Cassandra call it a successful write, when in fact, it isn't for application E? I believe Cassandra should notify application E one way or another. This is why I mentioned an extra timestamp-tie flag in the write ACK sent by nodes back to node E.
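
A rough Python sketch of what such a flag could look like follows. Everything in it is hypothetical: the `Ack` type and its `tie_lost` field are not part of any existing Cassandra ACK, `Replica` and `write_and_check` are illustrative names, and the tie rule is again a stand-in in which 'AAA' beats 'ZZZ'.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Ack:
    tie_lost: bool  # hypothetical flag, not part of any existing Cassandra ACK


class Replica:
    def __init__(self) -> None:
        self.value: Optional[str] = None
        self.timestamp: Optional[int] = None

    def write(self, value: str, timestamp: int) -> Ack:
        if self.timestamp is None or timestamp > self.timestamp:
            self.value, self.timestamp = value, timestamp
            return Ack(tie_lost=False)
        if timestamp == self.timestamp and value != self.value:
            # ASSUMPTION: stand-in tie rule (smaller string wins), chosen so
            # that 'AAA' beats 'ZZZ' as in the scenario above.
            if value < self.value:
                self.value = value
                return Ack(tie_lost=False)
            return Ack(tie_lost=True)  # the incoming value was discarded
        return Ack(tie_lost=False)  # older timestamp or identical write: no tie


def write_and_check(replicas: List[Replica], value: str, timestamp: int) -> bool:
    """Return True if any replica reported that this write lost a tie."""
    acks = [r.write(value, timestamp) for r in replicas]
    return any(a.tie_lost for a in acks)


replicas = [Replica() for _ in range(3)]
write_and_check(replicas, "AAA", 334450)      # A's write
if write_and_check(replicas, "ZZZ", 334450):  # E's write ties and loses
    # E now knows its value was not registered and can retry later.
    write_and_check(replicas, "ZZZ", 334457)
print(replicas[0].value)
```

In this sketch E learns about the lost tie from the ACK alone, without re-reading every write, and its retry with a later timestamp goes through.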


The subsequent question I have is:

If 'value breaks timestamp-tie', how does Cassandra behave in case of updates? If there is a column with value 'AAA' at 334450 ms and an application explicitly wants to update this value to 'ZZZ' for 334450 ms, it seems like the timestamp-tie will prevent that. Hence, the update/mutation would be non-deterministic to E. It seems like one should first delete the existing record and write a new one (and that could lead to race conditions and timestamp-ties too).

My conclusion so far is that a timestamp-tie boolean would help resolve potentially non-deterministic situations which can appear randomly at any time. Implementing locks would completely prevent these situations, but then, locks should be implemented for all writes on all tables if two application instances have access to them. It is a light/inexpensive versus heavy/costly safety net situation.

I think this should be documented, because engineers will hit that 'local' non-determinism issue for sure if two instances of their applications perform 'completed writes' in the same column family. Completed does not mean successful, even with quorum (or ALL). They ought to know it.


Jérôme
