I think it would be nice if the recommended setup for Kafka were JBOD and not RAID, because:
* it makes it easy to "test" Kafka on an existing Hadoop/Spark cluster
* it allows co-location: for example, we co-locate Kafka and Spark Streaming (our Spark Streaming app is Kafka partition location aware)

Ideally Kafka would survive a disk failure and only report partial loss, just like an HDFS datanode does. I realize this is a big ask...

On Tue, Jan 20, 2015 at 12:25 PM, Yang Fang <franklin.f...@gmail.com> wrote:

> I think the best way is RAID, not JBOD. If one disk in a JBOD setup fails,
> the broker shuts down, and then it takes a long time to recover. Brokers that
> keep running for a long time will lead more and more partitions. I/O pressure
> will be unbalanced.
> btw, I use Kafka 0.8.0-beta1
>