Hi there,

I recently started using Kafka for our data analysis pipeline, and it works
very well.

One problem for us so far has been expanding our cluster when we need more
storage space.
Kafka provides some scripts to help with this, but the process hasn't been
smooth.

To make this work smoothly, it seems Kafka has to redo work that a
distributed file system has already solved.
So I'm wondering if there are any thoughts on making Kafka work on top of
HDFS. Maybe the Kafka storage engine could be made pluggable, with HDFS as
one option?
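To make the idea a bit more concrete, a pluggable storage layer might look
roughly like the sketch below. This is purely hypothetical (the interface and
method names are made up, just to show the shape of the abstraction); an
HDFS-backed implementation would be one provider, the current local-disk log
another:

    import java.io.IOException;

    // Hypothetical sketch only, not an existing Kafka interface: the broker
    // would append and read log data through this abstraction, and an
    // HDFS-backed implementation would leave replication, failed disks and
    // rebalancing to HDFS instead of Kafka.
    public interface LogStorage {

        /** Append a record batch to a topic partition; returns the base offset assigned to it. */
        long append(String topic, int partition, byte[] recordBatch) throws IOException;

        /** Read up to maxBytes of records starting at startOffset. */
        byte[] read(String topic, int partition, long startOffset, int maxBytes) throws IOException;

        /** Drop data below the given offset (retention/cleanup). */
        void truncateBefore(String topic, int partition, long offset) throws IOException;
    }

An HDFS-backed implementation could then map each topic partition to files in
HDFS, while the existing local-disk log stays as the default.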

The pro would be that HDFS already handles storage management (replication,
corrupted disks/machines, migration, load balancing, etc.) very well, which
would free Kafka and its users from that burden; the con would be performance
degradation.
Since Kafka performs very well, it would probably remain competitive in most
situations even with some degree of degradation.

Best,
-- 
Hangjun Ye