Take a look at Camus <https://github.com/linkedin/camus/>
François Langelier
Software Engineering Student - École de Technologie Supérieure <http://www.etsmtl.ca/>
Captain, Club Capra <http://capra.etsmtl.ca/>
VP-Communication - CS Games <http://csgames.org> 2014
Jeux de Génie <http://www.jdgets.com/> 2011 to 2014
Treasurer, Fraternité du Piranha <http://fraternitedupiranha.com/> 2012-2014
Organizing Committee, Olympiades ÉTS 2012
Compétition Québécoise d'Ingénierie 2012 - Senior Competition

On 19 May 2014 05:28, Hangjun Ye <yehang...@gmail.com> wrote:

> Hi there,
>
> I recently started to use Kafka for our data analysis pipeline and it works
> very well.
>
> One problem for us so far is expanding our cluster when we need more storage
> space.
> Kafka provides some scripts to help with this, but the process wasn't
> smooth.
>
> To make it work perfectly, it seems Kafka needs to do some jobs that a
> distributed file system has already done.
> So I'm just wondering if there are any thoughts on making Kafka work on top
> of HDFS? Maybe make the Kafka storage engine pluggable, with HDFS as one
> option?
>
> The pros might be that HDFS already handles storage management
> (replication, corrupted disk/machine, migration, load balance, etc.) very
> well, and it frees Kafka and the users from that burden; the cons might
> be performance degradation.
> As Kafka does very well on performance, possibly even with some degree of
> degradation, it would still be competitive for most situations.
>
> Best,
> --
> Hangjun Ye
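
For illustration only, the "pluggable storage engine" idea in the quoted message could look roughly like the sketch below. Everything in it is hypothetical: `LogStorage`, `InMemoryLogStorage`, and `Broker` are made-up names, not Kafka classes, and the in-memory backend merely stands in for a local-disk or HDFS-backed one that would implement the same two methods.

```python
from abc import ABC, abstractmethod
from collections import defaultdict
from typing import Dict, List


class LogStorage(ABC):
    """Hypothetical storage-engine interface (an assumption, not a Kafka API)."""

    @abstractmethod
    def append(self, partition: str, message: bytes) -> int:
        """Append a message to a partition log and return its offset."""

    @abstractmethod
    def read(self, partition: str, offset: int, max_messages: int) -> List[bytes]:
        """Return up to max_messages messages starting at offset."""


class InMemoryLogStorage(LogStorage):
    """Stand-in backend; a local-disk or HDFS backend would plug in here."""

    def __init__(self) -> None:
        self._logs: Dict[str, List[bytes]] = defaultdict(list)

    def append(self, partition: str, message: bytes) -> int:
        log = self._logs[partition]
        log.append(message)
        return len(log) - 1  # offset of the message just written

    def read(self, partition: str, offset: int, max_messages: int) -> List[bytes]:
        if offset < 0:
            raise ValueError("offset must be non-negative")
        return self._logs[partition][offset:offset + max_messages]


class Broker:
    """Toy broker that only talks to the storage interface."""

    def __init__(self, storage: LogStorage) -> None:
        self._storage = storage

    def produce(self, partition: str, message: bytes) -> int:
        return self._storage.append(partition, message)

    def fetch(self, partition: str, offset: int, max_messages: int) -> List[bytes]:
        return self._storage.read(partition, offset, max_messages)
```

The point of the sketch is only that the broker depends on the interface, so replication, migration, and rebalancing could in principle be delegated to whichever backend is plugged in; it says nothing about the performance cost raised in the quoted message.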