The batch size can be large, so in-memory ordering isn't an option, unfortunately.

On Thu, Dec 22, 2016 at 7:09 AM, Jesse Hodges <hodges.je...@gmail.com> wrote:

> Depending on the expected max out of order window, why not order them in
> memory? Then you don't need to reread from Cassandra, in case of a problem
> you can reread data from Kafka.
>
> -Jesse
>
> > On Dec 21, 2016, at 7:24 PM, Ali Akhtar <ali.rac...@gmail.com> wrote:
> >
> > - I'm receiving a batch of messages to a Kafka topic.
> >
> > Each message has a timestamp, however the messages can arrive / get
> > processed out of order. I.e event 1's timestamp could've been a few
> > seconds before event 2, and event 2 could still get processed before
> > event 1.
> >
> > - I know the number of messages that are sent per batch.
> >
> > - I need to process the messages in order. The messages are basically
> > providing the history of an item. I need to be able to track the history
> > accurately (i.e, if an event occurred 3 times, i need to accurately log
> > the dates of the first, 2nd, and 3rd time it occurred).
> >
> > The approach I'm considering is:
> >
> > - Creating a cassandra table which is ordered by the timestamp of the
> > messages.
> >
> > - Once a batch of messages has arrived, writing them all to cassandra,
> > counting on them being ordered by the timestamp even if they are
> > processed out of order.
> >
> > - Then iterating over the messages in the cassandra table, to process
> > them in order.
> >
> > However, I'm concerned about Cassandra's eventual consistency. Could it
> > be that even though I wrote the messages, they are not there when I try
> > to read them (which would be almost immediately after they are written)?
> >
> > Should I enforce consistency = ALL to make sure the messages will be
> > available immediately after being written?
> >
> > Is there a better way to handle this thru either Kafka streams or
> > Cassandra?
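
On the consistency = ALL question quoted above, a minimal sketch of the usual overlap rule may help: a read is expected to see a preceding successful write when the replicas acknowledging the write plus the replicas consulted by the read exceed the replication factor. The thread does not state a replication factor or topology, so the single-datacenter setup and the value 3 used here are assumptions, and the function names `replicas_required` and `read_sees_write` are made up for illustration (they are not part of any Cassandra driver).

```python
def replicas_required(level: str, replication_factor: int) -> int:
    """Replicas that must respond at a given consistency level.

    Assumes a single datacenter; only ONE, QUORUM and ALL are modelled.
    """
    if level == "ONE":
        return 1
    if level == "QUORUM":
        # A quorum is a strict majority of the replicas.
        return replication_factor // 2 + 1
    if level == "ALL":
        return replication_factor
    raise ValueError(f"unsupported consistency level: {level}")


def read_sees_write(write_level: str, read_level: str,
                    replication_factor: int) -> bool:
    """True when the read and write replica sets must overlap (R + W > RF)."""
    w = replicas_required(write_level, replication_factor)
    r = replicas_required(read_level, replication_factor)
    return r + w > replication_factor


# Hypothetical replication factor of 3, not taken from the thread.
for write_level, read_level in [("ONE", "ONE"),
                                ("QUORUM", "QUORUM"),
                                ("ALL", "ONE")]:
    print(write_level, read_level, read_sees_write(write_level, read_level, 3))
```

Under these assumptions, writing and reading at QUORUM already satisfies the overlap condition, so ALL may not be needed; ALL on writes also means a write fails whenever any single replica is unavailable.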