Hi list,

In an excellent blog post on Kafka and Spark Streaming integration (http://www.michael-noll.com/blog/2014/10/01/kafka-spark-streaming-integration-example-tutorial/), Michael Noll puts forward a hypothesis about the number of partitions of the RDDs created by input DStreams. His hypothesis is that the number of partitions per RDD is batchInterval / spark.streaming.blockInterval. I guess this is based on the following extract from the Spark Streaming Programming Guide, section "Level of Parallelism in Data Receiving":
"Another parameter that should be considered is the receiver's blocking interval. For most receivers, the received data is coalesced together into large blocks of data before storing inside Spark's memory. The number of blocks in each batch determines the number of tasks that will be used to process the received data in a map-like transformation. This blocking interval is determined by the configuration parameter spark.streaming.blockInterval (https://spark.apache.org/docs/1.1.0/configuration.html), and the default value is 200 milliseconds."

Could someone confirm whether that hypothesis is true or false? And if it is false, is there any way to know the number of partitions per RDD for an input DStream?

Thanks a lot for your help.

Greetings,
Juan Rodriguez
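For concreteness, the hypothesis being asked about amounts to the following arithmetic (just a sketch with example numbers; the helper name is mine, and the 200 ms value is the documented default of spark.streaming.blockInterval):

```scala
// Hypothesis (unconfirmed): each receiver cuts the incoming stream into one
// block every blockInterval, so a batch of length batchInterval should contain
// batchInterval / blockInterval blocks, and each block would become one
// partition of that batch's RDD.
def hypothesizedPartitions(batchIntervalMs: Long, blockIntervalMs: Long): Long =
  batchIntervalMs / blockIntervalMs

// e.g. a 2-second batch with the default 200 ms block interval
val partitions = hypothesizedPartitions(2000, 200) // 2000 / 200 = 10 partitions per RDD
```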
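One way to check this empirically, rather than relying on the hypothesis, is to log the partition count of each batch's RDD with foreachRDD (a sketch only; it assumes you already have an input DStream from a running StreamingContext, here called `lines`):

```scala
import org.apache.spark.streaming.dstream.DStream

// Print each batch RDD's partition count so the hypothesized value
// batchInterval / blockInterval can be compared against reality.
def logPartitionCounts[T](lines: DStream[T]): Unit =
  lines.foreachRDD { rdd =>
    println(s"RDD ${rdd.id} has ${rdd.partitions.length} partitions")
  }
```

Running this against a receiver-based input DStream should show whether the partition count per batch actually matches the ratio of the two intervals.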