Hey Jay,

It's awesome to get a reply from one of the key Kafka contributors :). Thanks for suggesting Kafka Connect.
A few follow-up questions:

1. How does Kafka Connect deal with HDFS small files? (I assume setting a large flush.size lets the user maintain a minimum HDFS file size.)
2. Does Kafka Connect keep the file handle open until the file is committed? (Flume keeps file handles open, resulting in too many open files.)
3. Can I write a custom serializer for Kafka Connect?

Thanks,
R P

________________________________________
From: Jay Kreps <j...@confluent.io>
Sent: Thursday, February 11, 2016 11:45 AM
To: users@kafka.apache.org
Subject: Re: What is the best way to write Kafka data into HDFS?

Check out Kafka Connect:
http://www.confluent.io/blog/how-to-build-a-scalable-etl-pipeline-with-kafka-connect

-Jay

On Wed, Feb 10, 2016 at 5:09 PM, R P <hadoo...@outlook.com> wrote:
> Hello All,
>  New Kafka user here. What is the best way to write Kafka data into HDFS?
> I have looked into the following options and found that Flume is the quickest
> and easiest to set up.
>
> 1. Flume
> 2. KaBoom
> 3. Kafka Hadoop Loader
> 4. Camus -> Gobblin
>
> Although Flume can result in small-file problems when your data is
> partitioned and some partitions generate only sporadic data.
>
> What are some best practices and options to write data from Kafka to HDFS?
>
> Thanks,
> R P
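P.S. For context on the flush.size question, here is roughly what I had in mind for controlling file size in the Confluent HDFS sink. This is a minimal sketch, not a tested config: the connector name, topic, and namenode address are made up, and the flush.size / rotate.interval.ms values are illustrative, not recommendations.

```properties
# Hypothetical HDFS sink connector config (values are illustrative)
name=hdfs-sink-example
connector.class=io.confluent.connect.hdfs.HdfsSinkConnector
tasks.max=1
topics=my-topic
hdfs.url=hdfs://namenode.example.com:8020

# Commit a file only after this many records, so sporadic partitions
# don't produce lots of tiny HDFS files
flush.size=100000

# Also rotate on a time interval, so slow partitions still commit
# eventually instead of holding a file open indefinitely
rotate.interval.ms=600000
```

My assumption is that a larger flush.size trades latency for fewer, larger files, with the time-based rotation as a backstop for low-volume partitions.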