Hello,

I want to size a Kafka cluster with just one topic, and I'm going to
process the data with Spark and other applications.

If I have six hard drives per node, is Kafka smart enough to deal with
them? I guess that memory is very important at this point, since all the
data gets cached in memory. Is it possible to configure Kafka to use
several directories, as HDFS does, each one on a different disk?
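This is what I have in mind, if I understand the log.dirs broker setting
correctly (the paths below are just examples):

    # server.properties - one log directory per physical disk (example paths)
    log.dirs=/data/disk1/kafka-logs,/data/disk2/kafka-logs,/data/disk3/kafka-logs,/data/disk4/kafka-logs,/data/disk5/kafka-logs,/data/disk6/kafka-logs

If I've read the docs right, the broker spreads new partitions across those
directories, but please correct me if that's not how it balances them.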

I'm not sure about the number of partitions either. I have read
http://www.confluent.io/blog/how-to-choose-the-number-of-topicspartitions-in-a-kafka-cluster/
and they talk about numbers of partitions much higher than I had thought. Is
it normal to have a topic with 1000 partitions? I was thinking about two to
four partitions per node. Is my thinking wrong?
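If I follow the rough sizing formula from that post, partitions =
max(t/p, t/c), where t is the target throughput and p and c are the
per-partition producer and consumer throughput, then with made-up numbers
(just to check I understand it):

    t = 100 MB/s, p = 10 MB/s, c = 20 MB/s
    partitions = max(100/10, 100/20) = max(10, 5) = 10

which is still far from 1000, so maybe those very high counts only make
sense for much bigger workloads?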

As I'm going to process the data with Spark, I could set the number of
partitions to at most the number of Spark executors, always thinking about
the future and sizing it somewhat higher than that.
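To be concrete about the Spark side, here is a rough sketch of what I'm
planning, assuming the Kafka direct stream integration where each Kafka
partition maps to one Spark partition (broker addresses, topic name and
group id are placeholders):

    import org.apache.kafka.common.serialization.StringDeserializer
    import org.apache.spark.SparkConf
    import org.apache.spark.streaming.{Seconds, StreamingContext}
    import org.apache.spark.streaming.kafka010.KafkaUtils
    import org.apache.spark.streaming.kafka010.LocationStrategies.PreferConsistent
    import org.apache.spark.streaming.kafka010.ConsumerStrategies.Subscribe

    object KafkaSizingSketch {
      def main(args: Array[String]): Unit = {
        val ssc = new StreamingContext(
          new SparkConf().setAppName("kafka-sizing-sketch"), Seconds(10))

        // Placeholder connection settings, just for illustration
        val kafkaParams = Map[String, Object](
          "bootstrap.servers" -> "broker1:9092,broker2:9092",
          "key.deserializer" -> classOf[StringDeserializer],
          "value.deserializer" -> classOf[StringDeserializer],
          "group.id" -> "sizing-test",
          "auto.offset.reset" -> "latest",
          "enable.auto.commit" -> (false: java.lang.Boolean)
        )

        // With the direct stream, each Kafka partition becomes one Spark partition
        val stream = KafkaUtils.createDirectStream[String, String](
          ssc, PreferConsistent, Subscribe[String, String](Array("my-topic"), kafkaParams))

        // So getNumPartitions here should equal the topic's partition count
        stream.foreachRDD(rdd => println(s"Spark partitions: ${rdd.getNumPartitions}"))

        ssc.start()
        ssc.awaitTermination()
      }
    }

If that one-to-one mapping is right, then the partition count directly caps
the parallelism I can get on the Spark side, which is why I want to get it
right from the start.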
