Hi Aseem,

You can run Apache Flume to consume messages from Kafka and write them to S3/HDFS in a streaming (micro-batch) fashion.
Writes to S3/HDFS should be in micro batches. You can write every message individually (I'm not very sure whether S3 supports append), but it won't be performant.

https://flume.apache.org/

Thanks
Sudev

On Tue, Dec 6, 2016 at 5:14 PM Sharninder <sharnin...@gmail.com> wrote:

> What do you mean by streaming way? The logic to push to S3 will be in your
> consumer, so it totally depends on how you want to read and store. I think
> that's an easier way to do what you want to, instead of trying to back up
> Kafka and then read messages from there. Not even sure that's possible.
>
> On Tue, Dec 6, 2016 at 5:11 PM, Aseem Bansal <asmbans...@gmail.com> wrote:
>
> > I get that we can read them and store them in batches but is there some
> > streaming way?
> >
> > On Tue, Dec 6, 2016 at 5:09 PM, Aseem Bansal <asmbans...@gmail.com> wrote:
> >
> > > Because we need to do exploratory data analysis and machine learning. We
> > > need to back up the messages somewhere so that the data scientists can
> > > query/load them.
> > >
> > > So we need something like a router that just opens up a new consumer
> > > group which just keeps on storing them to S3.
> > >
> > > On Tue, Dec 6, 2016 at 5:05 PM, Sharninder Khera <sharnin...@gmail.com> wrote:
> > >
> > >> Why not just have a parallel consumer read all messages from whichever
> > >> topics you're interested in and store them wherever you want to? You
> > >> don't need to "backup" Kafka messages.
> > >>
> > >> _____________________________
> > >> From: Aseem Bansal <asmbans...@gmail.com>
> > >> Sent: Tuesday, December 6, 2016 4:55 PM
> > >> Subject: Storing Kafka Message JSON to deep storage like S3
> > >> To: <users@kafka.apache.org>
> > >>
> > >> Hi
> > >>
> > >> Has anyone stored Kafka JSON messages in deep storage like S3?
> > >> We are looking to back up all of our raw Kafka JSON messages for
> > >> exploration. S3, HDFS, MongoDB come to mind initially.
> > >>
> > >> I know that they can be stored in Kafka itself, but that does not seem
> > >> like a good option as we won't be able to query them, and the
> > >> configurations of the machines running Kafka will have to be increased
> > >> as we go. Something like S3 we won't have to manage.
>
> --
> Sharninder
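
For anyone who prefers the plain-consumer route over Flume, the micro-batching idea discussed in this thread can be sketched roughly as below. This is only an illustration, not something from the thread: the `MicroBatcher` class, its parameters, the `kafka-backup/` key prefix, and the in-memory sink are all hypothetical names. In a real setup the messages would come from a Kafka consumer loop, and the sink would upload each batch to S3 (for example with boto3's `put_object`) as one object per batch, so no append support is needed.

```python
import json
import time
from typing import Callable, List, Optional


class MicroBatcher:
    """Buffers messages and hands them to a sink as one JSON-lines blob.

    Hypothetical helper for illustration only. `sink` is any callable taking
    (key, body); in production it might wrap an S3 upload.
    """

    def __init__(self, sink: Callable[[str, bytes], None],
                 max_messages: int = 1000,
                 max_age_seconds: float = 60.0,
                 clock: Callable[[], float] = time.monotonic):
        self._sink = sink
        self._max_messages = max_messages
        self._max_age = max_age_seconds
        self._clock = clock
        self._buffer: List[dict] = []
        self._opened_at: Optional[float] = None
        self._batch_no = 0

    def add(self, message: dict) -> None:
        """Buffer one message; flush if the batch is full or too old."""
        if not self._buffer:
            # The batch "opens" when its first message arrives.
            self._opened_at = self._clock()
        self._buffer.append(message)
        # Note: the age limit is only checked when a message arrives,
        # so an idle stream needs an explicit flush() call.
        too_big = len(self._buffer) >= self._max_messages
        too_old = self._clock() - self._opened_at >= self._max_age
        if too_big or too_old:
            self.flush()

    def flush(self) -> None:
        """Write the buffered messages as a single object and reset."""
        if not self._buffer:
            return
        body = "\n".join(json.dumps(m) for m in self._buffer).encode("utf-8")
        key = f"kafka-backup/batch-{self._batch_no:08d}.jsonl"
        self._sink(key, body)
        self._batch_no += 1
        self._buffer = []
        self._opened_at = None


if __name__ == "__main__":
    # In-memory stand-in for S3 so the sketch runs without any services.
    uploaded = {}
    batcher = MicroBatcher(lambda k, b: uploaded.__setitem__(k, b),
                           max_messages=3)
    for i in range(7):
        batcher.add({"id": i})
    batcher.flush()  # write whatever is left over
    print(sorted(uploaded))
```

Writing one object per batch keeps the number of S3 requests low. If the consumer commits its Kafka offsets only after a successful flush, a crash would lead to re-reading a batch rather than losing it.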