On 22/10/2010 2:27, Nicholas Knight wrote:
On Oct 22, 2010, at 7:41 AM, Jérôme Verstrynge wrote:
Let's imagine that A initiates its column write at: 334450 ms with 'AAA' and 
timestamp 334450 ms
Let's imagine that E initiates its column write at: 334451 ms with 'ZZZ' and
timestamp 334450 ms
(E is the latest write)

Let's imagine that A reaches C at 334455 ms and performs its write.
Let's imagine that E reaches C at 334456 ms and attempts to perform its write.
It will lose the timestamp-tie ('AAA' is greater than 'ZZZ').
How is this any different from E's perspective than if A had come along a 
moment later with timestamp 334452?
If this results in only one entry, then I am happy. If this results in two entries (334450 and 334452), then the situation is different and does not correspond to my argument.
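
To make the "one entry or two" question concrete, here is a minimal toy model of a store that keeps a single (timestamp, value) cell per column and reconciles every incoming write against it. It is only a sketch: `ToyColumnStore`, `reconcile` and `tie_winner` are hypothetical names, not Cassandra code, and the tie-break rule is passed in as a parameter because the sketch does not assert which ordering Cassandra actually applies. Passing `min` reproduces this thread's premise that 'AAA' beats 'ZZZ'.

```python
from typing import Callable, Dict, Tuple

Cell = Tuple[int, str]  # (client-supplied timestamp, value)


def reconcile(current: Cell, incoming: Cell,
              tie_winner: Callable[[str, str], str]) -> Cell:
    """Return the one cell that survives for a column."""
    if incoming[0] != current[0]:
        # Different timestamps: the higher timestamp wins outright.
        return incoming if incoming[0] > current[0] else current
    # Equal timestamps: the values decide, using the supplied rule.
    return (current[0], tie_winner(current[1], incoming[1]))


class ToyColumnStore:
    """In-memory stand-in for one replica; hypothetical, not a real client."""

    def __init__(self, tie_winner: Callable[[str, str], str]) -> None:
        self.cells: Dict[str, Cell] = {}
        self.tie_winner = tie_winner

    def write(self, column: str, timestamp: int, value: str) -> None:
        incoming = (timestamp, value)
        current = self.cells.get(column)
        if current is None:
            self.cells[column] = incoming
        else:
            self.cells[column] = reconcile(current, incoming, self.tie_winner)


# Scenario from the thread: A and E both write with timestamp 334450.
tie_store = ToyColumnStore(tie_winner=min)  # assumption: 'AAA' wins the tie
tie_store.write("col", 334450, "AAA")       # A reaches the node first
tie_store.write("col", 334450, "ZZZ")       # E arrives a moment later

# Nicholas's variant: A comes along later with timestamp 334452.
later_store = ToyColumnStore(tie_winner=min)
later_store.write("col", 334450, "ZZZ")
later_store.write("col", 334452, "AAA")
```

In this model both scenarios leave exactly one cell for the column: (334450, 'AAA') in the first and (334452, 'AAA') in the second. The difference is only in which timestamp E's value lost against.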

When I read http://wiki.apache.org/cassandra/DataModel, the column section explicitly says: "All values are supplied by the client, including the 'timestamp'."

Hence, nothing in this documentation explicitly guarantees that only one record is created.

What you describe is an application in *desperate* need of either a serious 
redesign, or a distributed locking mechanism.

This really isn't a Cassandra-specific problem, Cassandra just happens to be 
the distributed storage system at issue. Any such system without a locking 
mechanism will present some form of this problem, and the answer will be the 
same: Avoid it in the application design, or incorporate a locking mechanism 
into the application.
I agree about the problem not being specific to Cassandra. I have nothing against Cassandra. In fact, I am fascinated by it and am considering using it in my own projects.

If there is a timestamp-tie, then the context becomes uncertain for E, out of 
the blue.
If application E can't be sure about what has been saved in Cassandra, it
cannot rely on what it has in memory. It is a vicious circle. It can't
anticipate A's potential actions on the column either.
And how is this different from E's data being overwritten with a later 
timestamp? Either way, what E thinks is in Cassandra really isn't.
Well, E knows that it can't predict the value for future timestamp values coming from other nodes. Fine. What I am worried about is that it can't predict the value for its own timestamp.

If you need to make sure you have consistency at this level, you *need* a 
locking mechanism.
This is unusual for any application, but maybe this is the price to pay for
using Cassandra. Fair enough.
Hardly. Any non-serial application that doesn't use some form of locking has 
this exact same problem at all levels of storage, possibly even in its internal 
variables.
I have not argued against locking as a potential solution. I am only suggesting something lighter.

If E is not informed of the timestamp tie, then it is left alone in the dark. 
Hence, this is why I say Cassandra is not deterministic to E. The result of a 
write is potentially non-deterministic in what it actually performs.
Cassandra is deterministic for a given input. What you're saying is you aren't 
properly controlling the input that your application is giving it.
You are making my point (lol). No matter what an application writes, it should re-read its own writes for determinism for a given timestamp when other application instances are writing to the same 'table'.

If E was aware that it lost a timestamp-tie, it would know that there is a 
possible gap between its internal memory representation and what it tried to 
save into Cassandra. That is, EVEN if there is no further write on that same 
column (or, in other words, regardless of any potential subsequent races).
What is the significance of this?
If you know there is no timestamp collision, then you know you don't need to re-read for determinism. Otherwise you should. In a situation where you can't know, you should automatically re-read, which is expensive (or implement a locking mechanism).
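
The re-read described above could look roughly like the sketch below. `FakeStore` is a hypothetical in-memory stand-in with `write` and `read` methods, not a Cassandra client, and its tie-break (the value that compares lower wins) simply follows this thread's premise that 'AAA' beats 'ZZZ'. `write_and_check` is likewise an illustrative name.

```python
from typing import Dict, Optional, Tuple

Cell = Tuple[int, str]  # (client-supplied timestamp, value)


class FakeStore:
    """In-memory stand-in for the cluster; hypothetical, for illustration."""

    def __init__(self) -> None:
        self._cells: Dict[str, Cell] = {}

    def write(self, column: str, timestamp: int, value: str) -> None:
        current = self._cells.get(column)
        wins = (
            current is None
            or timestamp > current[0]
            # Assumption taken from the thread: on a tie, 'AAA' beats 'ZZZ'.
            or (timestamp == current[0] and value < current[1])
        )
        if wins:
            self._cells[column] = (timestamp, value)

    def read(self, column: str) -> Optional[Cell]:
        return self._cells.get(column)


def write_and_check(store: FakeStore, column: str,
                    timestamp: int, value: str) -> str:
    """Write, then re-read to see whether the write is what got stored."""
    store.write(column, timestamp, value)
    stored = store.read(column)
    if stored == (timestamp, value):
        return "stored"
    if stored is not None and stored[0] == timestamp:
        # Same timestamp, different value: this write lost a tie.
        return "lost_tie"
    # A different timestamp is stored: an ordinary overwrite, not a tie.
    return "superseded"


store = FakeStore()
store.write("col", 334450, "AAA")                        # A's write
outcome = write_and_check(store, "col", 334450, "ZZZ")   # E's write
```

Here `outcome` is "lost_tie". The cost is one extra read per write, and against a real cluster another write could still land between the write and the re-read, so the check narrows the uncertainty rather than removing it.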

If E was informed it lost a timestamp-tie, it could re-read the column (and 
let's assume that there is no further write in between, but this does not 
change anything to the argument). It could spot that its write for timestamp 
value 334450 ms failed, and also the reason why ('AAA' greater than 'ZZZ'). It
could operate a new write, which eventually could result in another 
timestamp-tie, but at least it would be informed about it too... It would have 
a safety net.
To what end? A and E would apparently get into some sort of never-ending fight. 
The application as described is broken and needs to be fixed.
No, no fight, since E would know it can't win: it holds the losing value 'ZZZ' for the given timestamp.

The case I am trying to cover is the case where the context for application E 
becomes invalid because of a successful write call to Cassandra without 
registration of 'ZZZ'. How can Cassandra call it a successful write, when in 
fact, it isn't for application E? I believe Cassandra should notify application 
E one way or another. This is why I mentioned an extra timestamp-tie flag in 
the write ACK sent by nodes back to node E.
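
As a rough illustration of the flag proposed above, the sketch below has a replica return an acknowledgement that says whether the value was applied and whether an equal timestamp with a different value was already stored. This is only the idea from this thread, not an existing Cassandra feature; `WriteAck`, `ToyReplica` and both field names are hypothetical, and the tie-break (the lower value wins) again follows the thread's premise that 'AAA' beats 'ZZZ'.

```python
from dataclasses import dataclass
from typing import Dict, Tuple

Cell = Tuple[int, str]  # (client-supplied timestamp, value)


@dataclass(frozen=True)
class WriteAck:
    applied: bool        # True if the written value is now the stored one
    timestamp_tie: bool  # True if an equal timestamp held a different value


class ToyReplica:
    """In-memory stand-in for a single replica; hypothetical."""

    def __init__(self) -> None:
        self._cells: Dict[str, Cell] = {}

    def write(self, column: str, timestamp: int, value: str) -> WriteAck:
        current = self._cells.get(column)
        if current is None or timestamp > current[0]:
            self._cells[column] = (timestamp, value)
            return WriteAck(applied=True, timestamp_tie=False)
        if timestamp < current[0]:
            # An older timestamp never replaces a newer one.
            return WriteAck(applied=False, timestamp_tie=False)
        if value == current[1]:
            # Same timestamp and same value: an idempotent retry, no conflict.
            return WriteAck(applied=True, timestamp_tie=False)
        if value < current[1]:
            # Assumption from the thread: the lower value wins the tie.
            self._cells[column] = (timestamp, value)
            return WriteAck(applied=True, timestamp_tie=True)
        return WriteAck(applied=False, timestamp_tie=True)


replica = ToyReplica()
ack_a = replica.write("col", 334450, "AAA")  # A's write
ack_e = replica.write("col", 334450, "ZZZ")  # E's write, same timestamp
```

With this model `ack_e` reports `applied=False` and `timestamp_tie=True`, which is the signal E would need in order to know its in-memory state no longer matches what is stored, without paying for a re-read on every write.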
Here's part of the problem. You're seeing E as a distinct application from A 
which can behave completely independently. You need to stop thinking like that. 
It leads to broken architectures.

Even if the E and A processes come from entirely different code bases, you need 
to start by thinking of them as one application. That application is broken.
I am not going to argue this, because it is not related to my argument. I mean no offense by saying this.

The subsequent question I have is:

If 'value breaks timestamp-tie', how does Cassandra behave in case of updates? 
If there is a column with value 'AAA' at 334450 ms and an application 
explicitly wants to update this value to 'ZZZ' for 334450 ms, it seems like
the timestamp-tie will prevent that. Hence, the update/mutation would be
non-deterministic to E. It seems like one should first delete the existing record
and write a new one (and that could lead to race conditions and timestamp-ties 
too).
You need a locking mechanism. Timestamps aren't the droids you're looking for.
In this case, I do agree that explicit updates on a given timestamp can't be achieved without locks.

I think this should be documented, because engineers will hit that 'local' 
non-deterministic issue for sure if two instances of their applications perform
'completed writes' in the same column family. Completed does not mean 
successful, even with quorum (or ALL). They ought to know it.
I'm honestly not sure why they wouldn't. One need only perform a very cursory 
investigation of Cassandra to realize that addition of a locking mechanism is 
necessary for many applications, such as the one described here.
Again, I am not saying locks are not a solution. I was just suggesting a lighter solution for the issue I was raising. Implementing locks in a Cassandra-like system is tricky. The proposed solutions so far are costly and heavy.
-NK
Thanks for your answer.

Jérôme
