Re: Using Kafka and Flink for batch processing of a batch data source

2016-07-21 Thread Suneel Marthi
rocessing engine such as Spark? I do like the choices that adopting the unified programming model outlined in Apache Beam/Google Cloud Dataflow SDK and this purports to have runners for both Flink and Spark. Regards, Leith *From: *Till R

Re: Using Kafka and Flink for batch processing of a batch data source

2016-07-21 Thread milind parikh
programming model outlined in Apache Beam/Google Cloud Dataflow SDK and this purports to have runners for both Flink and Spark. Regards, Leith *From: *Till Rohrmann *Date: *Wednesday, 20 July 2016 at 5:05 PM *To: * *Subject: *Re: Using Kafka and Flink for batch processing of a batch data sou

Re: Using Kafka and Flink for batch processing of a batch data source

2016-07-20 Thread Leith Mudge
Date: Wednesday, 20 July 2016 at 5:05 PM To: Subject: Re: Using Kafka and Flink for batch processing of a batch data source At the moment there is also no batch source for Kafka. I'm also not so sure how you would define a batch given a Kafka stream. Only reading till a certain offset? Or maybe

Re: Using Kafka and Flink for batch processing of a batch data source

2016-07-20 Thread Till Rohrmann
At the moment there is also no batch source for Kafka. I'm also not so sure how you would define a batch given a Kafka stream. Only reading till a certain offset? Or maybe until one has read n messages? I think it's best to write the batch data to HDFS or another batch data store. Cheers, Till O
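
The two batch boundaries raised in this reply (reading up to a certain offset, or until n messages have been read) can be sketched roughly as below. This is only an illustration over an in-memory stand-in for a single Kafka partition, not a real consumer: `Record`, `endless_stream` and `bounded_batch` are hypothetical names and are not part of the Kafka or Flink APIs.

```python
from itertools import count, islice, takewhile
from typing import Iterable, Iterator, NamedTuple, Optional


class Record(NamedTuple):
    """Hypothetical stand-in for one message in a single partition."""
    offset: int
    value: str


def endless_stream() -> Iterator[Record]:
    """Simulates an unbounded stream: offsets 0, 1, 2, ... with no end."""
    for offset in count():
        yield Record(offset, f"message-{offset}")


def bounded_batch(
    stream: Iterable[Record],
    end_offset: Optional[int] = None,
    max_messages: Optional[int] = None,
) -> Iterator[Record]:
    """Cut a 'batch' out of a stream using either or both boundaries.

    end_offset   -- stop before the first record whose offset is >= end_offset
    max_messages -- stop after this many records have been returned
    """
    records: Iterable[Record] = stream
    if end_offset is not None:
        # Assumes offsets arrive in increasing order, as within one partition.
        records = takewhile(lambda r: r.offset < end_offset, records)
    if max_messages is not None:
        records = islice(records, max_messages)
    return iter(records)


if __name__ == "__main__":
    for record in bounded_batch(endless_stream(), end_offset=3):
        print(record.offset, record.value)
```

Either boundary turns the endless stream into a finite one, but both are arbitrary cut points chosen by the reader rather than a property of the data, which is the difficulty the reply points at.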

Re: Using Kafka and Flink for batch processing of a batch data source

2016-07-19 Thread milind parikh
It likely does not make sense to publish a file ("batch data") into Kafka unless the file is very small. An improvised pub-sub mechanism for Kafka could be to (a) write the file into a persistent store outside of Kafka and (b) publish a message into Kafka about that write so as to enable proce

Using Kafka and Flink for batch processing of a batch data source

2016-07-19 Thread Leith Mudge
I am currently working on an architecture for a big data streaming and batch processing platform. I am planning on using Apache Kafka for a distributed messaging system to handle data from streaming data sources and then pass on to Apache Flink for stream processing. I would also like to use Fli