On 22/10/2010 2:27, Nicholas Knight wrote:
On Oct 22, 2010, at 7:41 AM, Jérôme Verstrynge wrote:
Let's imagine that A initiates its column write at: 334450 ms with 'AAA' and
timestamp 334450 ms
Let's imagine that E initiates its column write at: 334451 ms with 'ZZZ' and
timestamp 334450 ms
(E is the latest write)
Let's imagine that A reaches C at 334455 ms and performs its write.
Let's imagine that E reaches C at 334456 ms and attempts to perform its write.
It will lose the timestamp tie (in this example, 'AAA' is assumed to win the tie over 'ZZZ').
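For reference, the reconciliation rule can be sketched as a toy model (not the real implementation): the higher timestamp wins, and on a timestamp tie the greater value wins. Note that under standard byte ordering 'ZZZ' actually compares greater than 'AAA', so 'ZZZ' would survive; which value wins is incidental to the argument about ties being resolved by value.

```python
def reconcile(a, b):
    """Toy last-write-wins reconciliation: each cell is (value, timestamp).
    Higher timestamp wins; on a timestamp tie, the greater value wins."""
    if a[1] != b[1]:
        return a if a[1] > b[1] else b
    return a if a[0] > b[0] else b

# A's and E's writes carry the same timestamp 334450:
print(reconcile(("AAA", 334450), ("ZZZ", 334450)))  # ('ZZZ', 334450)
```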
How is this any different from E's perspective than if A had come along a
moment later with timestamp 334452?
If this results in only one entry, then I am happy. If this results in
two entries (334450 and 334452), then the situation is different and
does not correspond to my argument.
When I read http://wiki.apache.org/cassandra/DataModel, the column
section explicitly says: "All values are supplied by the client,
including the 'timestamp'."
Hence, there is nothing that explicitly guarantees that only one record
is created from this documentation.
What you describe is an application in *desperate* need of either a serious
redesign, or a distributed locking mechanism.
This really isn't a Cassandra-specific problem, Cassandra just happens to be
the distributed storage system at issue. Any such system without a locking
mechanism will present some form of this problem, and the answer will be the
same: Avoid it in the application design, or incorporate a locking mechanism
into the application.
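As a sketch of the second option (single-process only; across machines you would need a distributed coordinator such as ZooKeeper, and the helper below is made up for illustration): serializing writers behind a lock with a strictly increasing timestamp source makes timestamp ties impossible by construction.

```python
import itertools
import threading

_lock = threading.Lock()
_clock = itertools.count(334450)  # strictly increasing timestamp source

def locked_write(store, key, value):
    """Serialize writers so no two writes can ever share a timestamp.
    'store' is any dict-like stand-in, not a real Cassandra client."""
    with _lock:
        ts = next(_clock)      # unique while the lock is held
        store[key] = (value, ts)
        return ts

store = {}
t1 = locked_write(store, "col", "AAA")
t2 = locked_write(store, "col", "ZZZ")
assert t1 != t2  # no tie is possible between serialized writers
```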
I agree about the problem not being specific to Cassandra. I have
nothing against Cassandra. In fact, I am fascinated by it and consider
using it in my own projects.
If there is a timestamp-tie, then the context becomes uncertain for E, out of
the blue.
If application E can't be sure about what has been saved in Cassandra, it
cannot rely on what it has in memory. It is a vicious circle. It can't
anticipate the potential actions of A on the column either.
And how is this different from E's data being overwritten with a later
timestamp? Either way, what E thinks is in Cassandra really isn't.
Well, E knows that it can't predict the value for future timestamp
values coming from other nodes. Fine. What I am worried about is that it
can't predict the value for its own timestamp.
If you need to make sure you have consistency at this level, you *need* a
locking mechanism.
This is unusual for any application, but maybe this is the price to pay for
using Cassandra. Fair enough.
Hardly. Any non-serial application that doesn't use some form of locking has
this exact same problem at all levels of storage, possibly even in its internal
variables.
I have not argued against locking as a potential solution. I am only
suggesting something lighter.
If E is not informed of the timestamp tie, then it is left alone in the dark.
That is why I say Cassandra is not deterministic to E: the result of a
write call is potentially non-deterministic in what it actually stores.
Cassandra is deterministic for a given input. What you're saying is you aren't
properly controlling the input that your application is giving it.
You are making my point (lol). No matter what an application writes, it
should re-read its own writes to get determinism for a given timestamp when
other application instances are writing to the same 'table'.
If E was aware that it lost a timestamp-tie, it would know that there is a
possible gap between its internal memory representation and what it tried to
save into Cassandra. That is, EVEN if there is no further write on that same
column (or, in other words, regardless of any potential subsequent races).
What is the significance of this?
If you know there is no timestamp collision, then you know you don't
need to re-read for determinism. Otherwise you should. In a situation
where you can't know, you should automatically re-read, which is
expensive (or implement a locking mechanism).
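The re-read safety net could look like this (a hypothetical helper against a toy in-memory last-write-wins store; a real client would issue a Cassandra read at the same consistency level):

```python
class TinyLWWStore:
    """In-memory stand-in for a last-write-wins store (not Cassandra).
    Writes never fail visibly; a lost timestamp tie stays silent."""
    def __init__(self):
        self.cells = {}  # key -> (value, timestamp)

    def write(self, key, value, ts):
        cur = self.cells.get(key)
        # Keep the cell with the higher (timestamp, value) pair.
        if cur is None or (ts, value) > (cur[1], cur[0]):
            self.cells[key] = (value, ts)

    def read(self, key):
        cell = self.cells.get(key)
        return cell[0] if cell else None

def write_and_verify(store, key, value, ts):
    """Write, then re-read to detect a silently lost timestamp tie."""
    store.write(key, value, ts)
    stored = store.read(key)
    return stored == value, stored

s = TinyLWWStore()
s.write("col", "ZZZ", 334450)                        # another instance's write
ok, now = write_and_verify(s, "col", "AAA", 334450)  # same timestamp, lower value
print(ok, now)  # False ZZZ -- only the re-read reveals the lost tie
```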
If E was informed it lost a timestamp-tie, it could re-read the column (and
let's assume that there is no further write in between, but this does not
change anything to the argument). It could spot that its write for timestamp
value 334450 ms failed, and also the reason why ('AAA' greater than 'ZZZ'). It
could operate a new write, which eventually could result in another
timestamp-tie, but at least it would be informed about it too... It would have
a safety net.
To what end? A and E would apparently get into some sort of never-ending fight.
The application as described is broken and needs to be fixed.
No, no fight, since E would know it can't win: it has the lower
hand ('ZZZ') for the given timestamp.
The case I am trying to cover is the case where the context for application E
becomes invalid because of a successful write call to Cassandra without
registration of 'ZZZ'. How can Cassandra call it a successful write, when in
fact, it isn't for application E? I believe Cassandra should notify application
E one way or another. This is why I mentioned an extra timestamp-tie flag in
the write ACK sent by nodes back to node E.
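The proposed flag might look something like this (purely hypothetical; Cassandra's real write acknowledgement carries no such field):

```python
from dataclasses import dataclass

@dataclass
class WriteAck:
    """Hypothetical write acknowledgement extended with the proposed
    timestamp-tie flag. Not part of any real Cassandra protocol."""
    success: bool
    timestamp_tie_lost: bool = False

ack = WriteAck(success=True, timestamp_tie_lost=True)
if ack.timestamp_tie_lost:
    # E now knows its in-memory view may diverge from what was stored,
    # even though the write itself "succeeded".
    pass
```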
Here's part of the problem. You're seeing E as a distinct application from A
which can behave completely independently. You need to stop thinking like that.
It leads to broken architectures.
Even if the E and A processes come from entirely different code bases, you need
to start by thinking of them as one application. That application is broken.
I am not going to argue this, because it is not related to my argument.
I mean no offense by saying this.
The subsequent question I have is:
If 'value breaks timestamp-tie', how does Cassandra behave in case of updates?
If there is a column with value 'AAA' at 334450 ms and an application
explicitly wants to update this value to 'ZZZ' for 334450 ms, it seems like
the timestamp tie will prevent that. Hence, the update/mutation would be
non-deterministic to E. It seems like one should first delete the existing record
and write a new one (and that could lead to race conditions and timestamp ties
too).
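A toy illustration of the problem (values chosen here so the attempted update compares lower and loses; with the opposite ordering it would win, and that value-dependence is exactly the non-determinism being described):

```python
# Each cell is (value, timestamp); ties are broken by the greater value.
existing = ("ZZZ", 334450)
update = ("AAA", 334450)   # in-place 'update' at the same timestamp
winner = max(existing, update, key=lambda c: (c[1], c[0]))
print(winner)  # ('ZZZ', 334450): the update is silently discarded
```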
You need a locking mechanism. Timestamps aren't the droids you're looking for.
In this case, I do agree that explicit updates on a given timestamp
can't be achieved without locks.
I think this should be documented, because engineers will hit that 'local'
non-deterministic issue for sure if two instances of their application perform
'completed writes' in the same column family. Completed does not mean
successful, even with QUORUM (or ALL). They ought to know it.
I'm honestly not sure why they wouldn't. One need only perform a very cursory
investigation of Cassandra to realize that addition of a locking mechanism is
necessary for many applications, such as the one described here.
Again, I am not saying locks are not a solution. I was just suggesting a
lighter solution for the issue I was raising. Implementing locks in
a Cassandra-like system is tricky. The proposed solutions so far are
costly and heavy.
-NK
Thanks for your answer.
Jérôme