Hi all, I'm running some tests with Spark Streaming and Kafka from a Python script. I use the second integration method, the receiver-less Direct Approach. It is supposed to create batch RDDs whose number of partitions matches the number of partitions in the Kafka topic, and I'm trying to verify that. What's the easiest way to verify it? I also tried co-locating YARN, Spark, and Kafka to check whether RDD partitions are placed according to the leaders of the topic's partitions, and they are not. Can you confirm that RDD partition placement does not depend on the location of the Kafka partitions, i.e. that co-locating Kafka with Spark is not a must-have, or at least that Spark does not take advantage of it?
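For what it's worth, here is a sketch of how I understand the verification could be done with the direct approach: each batch RDD from `KafkaUtils.createDirectStream` exposes `offsetRanges()`, one range per Kafka partition, so comparing that list against `rdd.getNumPartitions()` should show the 1:1 mapping. The broker address, topic name, and batch interval below are placeholder assumptions, not values from my setup:

```python
# Sketch: checking that the direct (receiver-less) Kafka stream yields
# one RDD partition per Kafka topic partition.
# "localhost:9092" and "my_topic" are hypothetical placeholders.

def summarize(offset_ranges):
    """Summarize one batch's offset ranges as (partition_ids, message_count).

    Each element is expected to look like Spark's OffsetRange, i.e. it has
    .partition, .fromOffset and .untilOffset attributes.
    """
    ids = sorted(o.partition for o in offset_ranges)
    count = sum(o.untilOffset - o.fromOffset for o in offset_ranges)
    return ids, count


def main():
    # Requires a live Spark + Kafka setup; call main() explicitly to run.
    from pyspark import SparkContext
    from pyspark.streaming import StreamingContext
    from pyspark.streaming.kafka import KafkaUtils

    sc = SparkContext(appName="direct-kafka-check")
    ssc = StreamingContext(sc, 5)  # 5-second batches (arbitrary choice)

    stream = KafkaUtils.createDirectStream(
        ssc, ["my_topic"], {"metadata.broker.list": "localhost:9092"})

    def report(rdd):
        # With the direct approach each RDD partition should map to
        # exactly one Kafka partition, so these two counts should match.
        ids, count = summarize(rdd.offsetRanges())
        print("Kafka partitions:", ids,
              "| RDD partitions:", rdd.getNumPartitions(),
              "| messages in batch:", count)

    stream.foreachRDD(report)
    ssc.start()
    ssc.awaitTermination()
```

Running this against a topic and repartitioning the topic between batches should make the change visible in the printed counts.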
Since the parallelism is determined directly (one RDD partition per Kafka partition), I suppose the biggest part of the tuning comes down to the number of Kafka partitions (leaving aside network configuration and the management of Spark resources)? Thank you