Hi,

I have a setup (in mind) where data is written to Kafka and persisted in HDFS (e.g., using Camus), so that I have an all-time archive of all stream data ever received. Now I want to process that entire archive first and, when I am done with that, continue seamlessly with the live stream, using Spark Streaming. (In a perfect world, Kafka would have infinite retention and I would always use the Kafka receiver, starting from offset 0.)

Does anyone have an idea how to realize such a setup? Would I write a custom receiver that first reads the HDFS files and then connects to Kafka? Is there an existing solution for this use case?
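For the record, here is a rough sketch of the two-phase approach I am considering, assuming Spark 1.x with the spark-streaming-kafka artifact; the HDFS path, ZooKeeper host, group id, and the process function are all placeholders, not actual values from my setup. The part I don't know how to solve is the hand-off between the two phases (records arriving in Kafka while phase 1 is still running):

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.kafka.KafkaUtils

object ArchiveThenLive {
  // Placeholder for the actual per-record application logic.
  def process(record: String): Unit = { /* ... */ }

  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("archive-then-live")
    val sc = new SparkContext(conf)

    // Phase 1: batch-process the all-time archive written by Camus.
    // "hdfs:///camus/topics/mytopic" is an illustrative path.
    val archive = sc.textFile("hdfs:///camus/topics/mytopic/*")
    archive.foreach(record => process(record))

    // Phase 2: once the archive is done, switch to the live Kafka stream,
    // applying the same logic to each record.
    val ssc = new StreamingContext(sc, Seconds(10))
    val stream = KafkaUtils.createStream(
      ssc, "zkhost:2181", "archive-then-live-group", Map("mytopic" -> 1))
    stream.map(_._2).foreachRDD(rdd => rdd.foreach(record => process(record)))

    ssc.start()
    ssc.awaitTermination()
  }
}
```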
Thanks,
Tobias