I have a Spark Streaming application which reads data from Kafka and saves the transformation results to HDFS. My Kafka topic originally has 8 partitions, and I repartition the data to 100 partitions to increase the parallelism of the Spark job. Now I am wondering: if I increase the number of Kafka partitions to 100 instead of calling repartition(100), will performance improve? (I know the repartition step costs a lot of CPU.) And if I set the number of Kafka partitions to 100, would that have any negative effect on efficiency? I only have one production environment, so it is not convenient for me to test this myself.
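To make the two options concrete, here is a rough sketch in plain Python (not Spark code) of what I am comparing. It assumes the direct Kafka stream approach, where each Kafka partition becomes one Spark partition; the `Plan` class and the two helper functions are hypothetical names used only for illustration.

```python
from dataclasses import dataclass


@dataclass
class Plan:
    name: str
    read_tasks: int        # tasks that pull records from Kafka in each batch
    shuffle: bool          # whether a repartition (shuffle) stage is needed
    downstream_tasks: int  # tasks running the transformation and the HDFS write


def plan_with_repartition(kafka_partitions: int, target: int) -> Plan:
    # Option A: keep the topic as it is and call repartition(target) in Spark.
    # Assumption: one read task per Kafka partition (direct stream).
    return Plan("repartition", kafka_partitions, True, target)


def plan_with_more_kafka_partitions(target: int) -> Plan:
    # Option B: grow the topic to `target` partitions and drop the repartition call.
    return Plan("kafka-only", target, False, target)


a = plan_with_repartition(8, 100)
b = plan_with_more_kafka_partitions(100)
for p in (a, b):
    print(p)
```

Under these assumptions, both options run the transformation with 100 tasks, but option A reads with only 8 tasks and pays for a shuffle, while option B reads with 100 tasks and has no shuffle.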
Thanks! Regards, Junfeng Chen