We are using Apache Flume as a router to consume data from Kafka and push it to HDFS. As of Flume 1.6, a Kafka Channel, Source and Sink are available out of the box.
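For illustration only, an agent along these lines might be configured roughly as in the sketch below. This is not our actual configuration: the agent name (a1), component names, ZooKeeper address, topic, consumer group, and HDFS path are all placeholders, and the property names are the ones I recall from the Flume 1.6 user guide, so please check them against the documentation for your version.

```properties
# Hypothetical agent "a1": Kafka source -> memory channel -> HDFS sink.
a1.sources = kafka-in
a1.channels = mem
a1.sinks = hdfs-out

# Kafka source (Flume 1.6 style; connects through ZooKeeper).
# Host, topic and group id are placeholders.
a1.sources.kafka-in.type = org.apache.flume.source.kafka.KafkaSource
a1.sources.kafka-in.zookeeperConnect = zk1.example.com:2181
a1.sources.kafka-in.topic = example-topic
a1.sources.kafka-in.groupId = flume-hdfs
a1.sources.kafka-in.channels = mem

# In-memory channel; sizes are arbitrary example values.
a1.channels.mem.type = memory
a1.channels.mem.capacity = 10000
a1.channels.mem.transactionCapacity = 1000

# HDFS sink writing plain (uncompressed) event bodies.
# Roll a new file every 300 seconds; size- and count-based rolling disabled.
a1.sinks.hdfs-out.type = hdfs
a1.sinks.hdfs-out.channel = mem
a1.sinks.hdfs-out.hdfs.path = hdfs://namenode.example.com:8020/data/kafka/example-topic
a1.sinks.hdfs-out.hdfs.fileType = DataStream
a1.sinks.hdfs-out.hdfs.rollInterval = 300
a1.sinks.hdfs-out.hdfs.rollSize = 0
a1.sinks.hdfs-out.hdfs.rollCount = 0
```

A memory channel keeps the sketch short but can lose events if the agent dies; a file channel or the Kafka Channel would be the more durable choice.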
Here is the blog post from Cloudera:

http://blog.cloudera.com/blog/2014/11/flafka-apache-flume-meets-apache-kafka-for-event-processing/

Thanks,
Vivek Thakre

On Thu, Oct 22, 2015 at 2:29 PM, Hawin Jiang <hawin.ji...@gmail.com> wrote:

> Very useful information for us.
> Thanks Guozhang.
>
> On Oct 22, 2015 2:02 PM, "Guozhang Wang" <wangg...@gmail.com> wrote:
>
> > Hi Adrian,
> >
> > Another alternative approach is to use Kafka's own Copycat framework
> > for data ingress / egress. It will be released in our 0.9.0 version,
> > expected in Nov.
> >
> > Under Copycat, users can write different "connectors" instantiated
> > for different source / sink systems, and for your case there is an
> > in-built HDFS connector coming along with the framework itself. You
> > can find more details in these Kafka wikis / java docs:
> >
> > https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=58851767
> >
> > https://s3-us-west-2.amazonaws.com/confluent-files/copycat-docs-wip/intro.html
> >
> > Guozhang
> >
> > On Thu, Oct 22, 2015 at 12:52 PM, Henry Cai <h...@pinterest.com.invalid>
> > wrote:
> >
> > > Take a look at Secor:
> > >
> > > https://github.com/pinterest/secor
> > >
> > > Secor is a no-frills Kafka->HDFS ingestion tool. It doesn't depend
> > > on any underlying systems such as Hadoop; it only uses the Kafka
> > > high-level consumer to balance the workload. Very easy to
> > > understand and manage. It's probably the 2nd most popular
> > > Kafka/HDFS ingestion tool (behind Camus). Lots of web companies use
> > > it to do their Kafka data ingestion (Pinterest/Uber/Airbnb).
> > >
> > > On Thu, Oct 22, 2015 at 3:56 AM, Adrian Woodhead
> > > <awoodh...@hotels.com> wrote:
> > >
> > > > Hello all,
> > > >
> > > > We're looking at options for getting data from Kafka onto HDFS,
> > > > and Camus looks like the natural choice for this. It's also
> > > > evident that LinkedIn, who originally created Camus, are taking
> > > > things in a different direction and are advising people to use
> > > > their Gobblin ETL framework instead. We feel that Gobblin is
> > > > overkill for many simple use cases and Camus seems a much
> > > > simpler and better fit. The problem now is that, with LinkedIn
> > > > apparently withdrawing official support for it, it appears that
> > > > any changes to Camus are being managed by various forks of it,
> > > > and it looks like everyone is building and using their own
> > > > versions. Wouldn't it be better for a community to form around
> > > > one official fork so development efforts can be focused on this?
> > > > Any thoughts on this?
> > > >
> > > > Thanks,
> > > >
> > > > Adrian
> >
> > --
> > -- Guozhang