Hi Chesnay Sorry for asking the question in a confusing manner. Being new to flink, there are many questions swirling around in my head.
Thanks for the details in your answers. Here's the facts , as I see them: (a) Cassandra Counters are not idempotent (b) The failures, in context of Cassandra, are not the typical failures of an ACID transaction. The failure indicate that the operation was not able to continue at the specified transaction level; meaning that at least one of the nodes didn't ack back in the requisite amount of time the reads or the writes. This failure is NOT indicative of the fact that some node (or many ) might have seen and processed the reads or writes; just that at least one of the nodes did not. There is no rollback either. The antientropy features of Cassandra will kick in and attempt to correct the situation internal to Cassandra. From an external system, though, the situation is different....if such failure occurs, one could try to retry the operation (specifically writes) again outside of Cassandra; provided one has the ability to do so through an intermediate layer (think flink)and the write is specifically modeled to be idempotent in the data model (specifically Rowkey design). One could model the data model so as to make Flink work exceptionally well with Cassandra; except counter tables. There is no way in Cassandra currently to model an idempotent counter table that I know of. Therefore an event replay that affects a counter might end up double counting. When will the Cassandra sink be released? I am ready to test it out even now. Hello Milind, I'm not entirely sure i fully understood your question, but I'll try anyway :) There is now way to provide exactly-once semantics for Cassandra's counters. As such we (will) only provide exactly-once semantics for a subset of Cassandra operations; idempotent inserts/updates. There are several things that would allow exactly-once semantics: - transactions - rather obvious i think - replaying/rollback to a given state - replay for sources/rollback for sinks upon failure - an atomic idempotent update across 2 tables. - allows tracking every read/write made; selectively re-read/write upon failure One of the key requisites is proper failure reporting though; if an update fails we *need to know*. As far as i know Cassandra doesn't make this guarantee. Regards, Chesnay Schepler On 10.05.2016 07:48, milind parikh wrote: Given FLINK 3311 & 3332, I am wondering it would be possible, without idempotent counters in Cassandra, to deliver on an exactly once sink into Cassandra. I do note that the verbiage/disc2 in 3332 does warn the user that this is not exactly "exactly once" sink. However my question has to do with whether having idempotent counters and a Data model that enables all other idempotent operations are a necessary prerequisite to exactly once semantics in flink. Asked a different way, what source and sink would enable a end-to-end exactly - once semantics, in the current state-of-the-art, with Flink in the middle. Thanks Milind