Hi,

I have a setup in mind where data is written to Kafka and persisted to
HDFS (e.g., using Camus) so that I have an all-time archive of all stream
data ever received. Now I want to process that entire archive first and,
once I am done with it, continue with the live stream, using Spark
Streaming. (In a perfect world, Kafka would have infinite retention and I
would always use the Kafka receiver, starting from offset 0.)
Does anyone have an idea how to realize such a setup? Would I write a
custom receiver that first reads the HDFS files and then connects to Kafka?
Is there an existing solution for this use case?
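
To make it concrete, here is a rough, untested sketch of the naive
two-phase version I can imagine (the HDFS path, ZooKeeper address, group
id, topic name "events" and processRecord are all made-up placeholders).
The part I don't see how to solve is the handoff: records that arrive
between the end of phase 1 and the start of phase 2 are missed unless
Kafka's retention still covers that gap.

  import org.apache.spark.{SparkConf, SparkContext}
  import org.apache.spark.streaming.{Seconds, StreamingContext}
  import org.apache.spark.streaming.kafka.KafkaUtils

  object ArchiveThenLive {
    // Placeholder for whatever per-record processing I actually need.
    def processRecord(line: String): Unit = { /* ... */ }

    def main(args: Array[String]): Unit = {
      val conf = new SparkConf().setAppName("ArchiveThenLive")

      // Phase 1: batch job over the all-time archive persisted in HDFS.
      val sc = new SparkContext(conf)
      sc.textFile("hdfs:///data/kafka-archive/events/*")  // made-up path
        .foreach(processRecord)
      sc.stop()

      // Phase 2: continue with the live stream from Kafka. Records
      // arriving between the two phases are lost unless Kafka retention
      // still covers them -- this gap is exactly my problem.
      val ssc = new StreamingContext(conf, Seconds(10))
      val stream = KafkaUtils.createStream(
        ssc, "zk1:2181", "archive-then-live", Map("events" -> 1))
      stream.map(_._2).foreachRDD(_.foreach(processRecord))
      ssc.start()
      ssc.awaitTermination()
    }
  }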

Thanks
Tobias
