Hi Aaron,

Thanks for your input.

On Fri, Sep 21, 2012 at 9:56 AM, aaron morton <aa...@thelastpickle.com> wrote:
> The commit log is essentially internal implementation. The total size of the
> commit log is restricted, and the multiple files used to represent segments
> are recycled. So once all the memtables have been flushed for segment it may
> be overwritten.
>
> To archive the segments see the conf/commitlog_archiving.properties file.
>
> Large rows will bypass the commit log.
>
> A write commited to the commit log may still be considered a failure if CL
> nodes do not succeed.

So if I understand you correctly, one shouldn't code against what is
essentially an internal artefact: it could change as the Cassandra code base
evolves and, furthermore, may not contain the information an application
thinks it should contain.

> IMHO it's a better design to multiplex the data stream at the application
> level.

That's a fair point, and I could multicast the data at that level. The reason
I was considering querying the commit log is that I would prefer to implement
a state-based synchronization as opposed to an event-driven synchronization
(which is what the app-layer multicast and the AOP solution Brian suggested
would be). I'd rather know from Cassandra what Cassandra thinks it has got
than trust an event stream that can only infer what information Cassandra
should theoretically hold.

The use case I am looking at should be reconcilable, so I'm trying to avoid
trusting that all of the events were actually sent correctly, arrived
correctly and were written to the target storage without any bugs. I also
want to detect the scenario in which portions of the data written to the
target system get accidentally updated or nuked via a back door.
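
To make the distinction concrete, here is a rough sketch of what I mean by
state-based reconciliation. It is only an illustration under assumptions:
the two stores are represented as plain in-memory dicts (row key -> columns),
and row_digest and reconcile are hypothetical helper names, not Cassandra
APIs. In practice the rows would have to be read back from each system.

```python
import hashlib


def row_digest(columns):
    """Hash a row's columns in sorted order so equal rows give equal digests."""
    h = hashlib.sha256()
    for name in sorted(columns):
        h.update(repr((name, columns[name])).encode("utf-8"))
    return h.hexdigest()


def reconcile(source_rows, target_rows):
    """Compare what each store actually holds, ignoring how it got there."""
    # Rows the source has but the target never received (or lost).
    missing = sorted(k for k in source_rows if k not in target_rows)
    # Rows that appeared in the target without a counterpart in the source.
    unexpected = sorted(k for k in target_rows if k not in source_rows)
    # Rows present on both sides whose contents differ.
    changed = sorted(
        k
        for k in source_rows
        if k in target_rows
        and row_digest(source_rows[k]) != row_digest(target_rows[k])
    )
    return {"missing": missing, "unexpected": unexpected, "changed": changed}
```

Because the comparison works on current state rather than on a log of
events, it would also flag a row that was changed or deleted via a back door,
which an event stream alone could not show.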

So in summary, given that there is no out-of-the-box way of saying to
Cassandra "give me all mutations since timestamp X", I would either have to
go for an event-driven approach or reconsider the layout of the Cassandra
store so that I could reconcile it efficiently.

Thanks for your help,

Cheers,
Ben