Hi there,

I recently started using Kafka for our data analysis pipeline, and it works very well.
One problem for us so far has been expanding the cluster when we need more storage space. Kafka provides some scripts to help with this, but the process wasn't smooth. To make it work well, it seems Kafka has to redo jobs that a distributed file system has already solved.

So I'm wondering: are there any thoughts on making Kafka work on top of HDFS? Perhaps the Kafka storage engine could be made pluggable, with HDFS as one option?

The pros: HDFS already handles storage management (replication, corrupted disks/machines, migration, load balancing, etc.) very well, which would free Kafka and its users from that burden. The cons: likely some performance degradation. But since Kafka performs so well, it would probably remain competitive in most situations even with some degradation.

Best,
-- Hangjun Ye
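P.S. For context, the expansion workflow I mean is the one built around Kafka's partition-reassignment tool. A rough sketch of the steps is below; the topic name, broker IDs, and file names are made-up examples, and the exact flags may differ between Kafka versions:

```shell
# Sketch of expanding a cluster with Kafka's reassignment tooling.
# "my-topic", broker IDs 5,6, and the file names are illustrative only.

# 1. List the topics whose partitions should move.
cat > topics-to-move.json <<'EOF'
{"topics": [{"topic": "my-topic"}], "version": 1}
EOF

# 2. Generate a candidate reassignment plan targeting the new brokers.
bin/kafka-reassign-partitions.sh --zookeeper localhost:2181 \
  --topics-to-move-json-file topics-to-move.json \
  --broker-list "5,6" --generate

# 3. Save the proposed assignment it prints as reassignment.json,
#    then execute the plan.
bin/kafka-reassign-partitions.sh --zookeeper localhost:2181 \
  --reassignment-json-file reassignment.json --execute

# 4. Re-run with --verify until all partitions report completion.
bin/kafka-reassign-partitions.sh --zookeeper localhost:2181 \
  --reassignment-json-file reassignment.json --verify
```

Each data move here is copied between brokers by Kafka itself, which is exactly the kind of storage-management work HDFS would otherwise take care of.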