Hello, I want to size a Kafka cluster with just one topic. I'm going to process the data with Spark and other applications.
If I have six hard drives per node, is Kafka smart enough to deal with them? I guess that memory is very important here and that all data is cached in memory. Is it possible to configure Kafka to use many directories, as HDFS does, each one on a different disk?

I'm not sure about the number of partitions either. I have read http://www.confluent.io/blog/how-to-choose-the-number-of-topicspartitions-in-a-kafka-cluster/ and they talk about a number of partitions much higher than I had thought. Is it normal to have a topic with 1000 partitions? I was thinking about two to four partitions per node. Is my thinking wrong?

Since I'm going to process the data with Spark, I could set the number of partitions equal to the number of Spark executors at most, always thinking about the future and sizing higher than that.
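To make the multi-disk question concrete, here is a sketch of what I understand the setup would look like; the mount paths, topic name, and partition/replication counts are made-up examples, not a recommendation:

```shell
# server.properties on each broker: list one log directory per physical disk.
# Kafka spreads partitions across the listed directories, so all six disks
# share the I/O load (paths below are hypothetical).
#
#   log.dirs=/disk1/kafka-logs,/disk2/kafka-logs,/disk3/kafka-logs,/disk4/kafka-logs,/disk5/kafka-logs,/disk6/kafka-logs

# Create the topic with an explicit partition count (example values only):
kafka-topics.sh --create \
  --bootstrap-server localhost:9092 \
  --topic my-topic \
  --partitions 12 \
  --replication-factor 3
```

Is this the right way to think about it, i.e. one `log.dirs` entry per disk and then choosing `--partitions` based on the expected Spark parallelism?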