Hi Aseem,

You can run Apache Flume to consume messages from Kafka and write them to
S3/HDFS in a streaming (micro-batch) fashion.

Writes to S3/HDFS should be done in micro-batches. You could write every
message individually (I'm not very sure S3 supports append), but it won't be
performant.
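
To make the micro-batch idea concrete, here is a rough Python sketch of the
buffering logic only. It is not how Flume does it internally; the class name
MicroBatcher, its parameters, and the flush callback are all made up for
illustration. In a real setup the callback would upload the batch as one
S3 object or HDFS file, and the messages would come from your Kafka consumer.

```python
import time
from typing import Callable, List, Optional


class MicroBatcher:
    """Hypothetical helper: buffer messages, hand them off one batch at a time."""

    def __init__(self, flush: Callable[[List[str]], None],
                 max_messages: int = 1000, max_age_seconds: float = 60.0,
                 clock: Callable[[], float] = time.monotonic):
        self._flush = flush              # assumed to write one object/file per batch
        self._max_messages = max_messages
        self._max_age = max_age_seconds
        self._clock = clock
        self._buffer: List[str] = []
        self._opened_at: Optional[float] = None

    def add(self, message: str) -> None:
        if not self._buffer:
            # The first message of a batch starts the age timer.
            self._opened_at = self._clock()
        self._buffer.append(message)
        # The age is only checked when a message arrives, so a quiet topic
        # needs an explicit close_batch() call (for example on shutdown).
        too_big = len(self._buffer) >= self._max_messages
        too_old = self._clock() - self._opened_at >= self._max_age
        if too_big or too_old:
            self.close_batch()

    def close_batch(self) -> None:
        if self._buffer:
            batch, self._buffer = self._buffer, []
            self._flush(batch)


# Usage: a list stands in for S3/HDFS, and a loop stands in for the consumer.
written: List[List[str]] = []
batcher = MicroBatcher(written.append, max_messages=3)
for i in range(7):
    batcher.add('{"id": %d}' % i)
batcher.close_batch()  # flush the remainder
print([len(batch) for batch in written])
```

Seven messages with a batch size of 3 end up as three writes instead of
seven, which is the whole point of batching.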

https://flume.apache.org/

Thanks
Sudev

On Tue, Dec 6, 2016 at 5:14 PM Sharninder <sharnin...@gmail.com> wrote:

> What do you mean by streaming way? The logic to push to S3 will be in your
> consumer, so it totally depends on how you want to read and store. I think
> that's an easier way to do what you want to, instead of trying to backup
> kafka and then read messages from there. Not even sure that's possible.
>
> On Tue, Dec 6, 2016 at 5:11 PM, Aseem Bansal <asmbans...@gmail.com> wrote:
>
> > I get that we can read them and store them in batches but is there some
> > streaming way?
> >
> > On Tue, Dec 6, 2016 at 5:09 PM, Aseem Bansal <asmbans...@gmail.com> wrote:
> >
> > > Because we need to do exploratory data analysis and machine learning. We
> > > need to backup the messages somewhere so that the data scientists can
> > > query/load them.
> > >
> > > So we need something like a router that just opens up a new consumer group
> > > which just keeps on storing them to S3.
> > >
> > > On Tue, Dec 6, 2016 at 5:05 PM, Sharninder Khera <sharnin...@gmail.com> wrote:
> > >
> > > >> Why not just have a parallel consumer read all messages from whichever
> > > >> topics you're interested in and store them wherever you want to? You don't
> > > >> need to "backup" Kafka messages.
> > >>
> > >>                 _____________________________
> > >> From: Aseem Bansal <asmbans...@gmail.com>
> > >> Sent: Tuesday, December 6, 2016 4:55 PM
> > >> Subject: Storing Kafka Message JSON to deep storage like S3
> > >> To:  <users@kafka.apache.org>
> > >>
> > >>
> > >> Hi
> > >>
> > > >> Has anyone done a storage of Kafka JSON messages to deep storage like S3.
> > > >> We are looking to back up all of our raw Kafka JSON messages for
> > > >> Exploration. S3, HDFS, MongoDB come to mind initially.
> > >>
> > > >> I know that it can be stored in kafka itself but storing them in Kafka
> > > >> itself does not seem like a good option as we won't be able to query it
> > > >> and the configurations of machines containing kafka will have to be
> > > >> increased as we go. Something like S3 we won't have to manage.
> > >>
> > >>
> > >>
> > >>
> > >>
> > >
> > >
> >
>
>
>
> --
> --
> Sharninder
>
