Can you post your actual code?

On Thu, Mar 10, 2016 at 9:55 PM, Mukul Gupta <mukul.gu...@aricent.com> wrote:

> Hi All,
>
> I was running the following test:
>
> Setup
> 9 VMs running spark workers with 1 spark executor each. 1 VM running kafka
> and the spark master. Spark version is 1.6.0. Kafka version is 0.9.0.1.
> Spark is using its own resource manager and is not running over YARN.
>
> Test
> I created a kafka topic with 3 partitions. Next I used
> "KafkaUtils.createDirectStream" to get a DStream:
>
> JavaPairInputDStream<String, String> stream = KafkaUtils.createDirectStream(…);
> JavaDStream stream1 = stream.map(func1);
> stream1.print();
>
> where func1 just contains a sleep followed by returning the value.
>
> Observation
> The RDD partition corresponding to partition 1 of kafka was processed first,
> on one of the spark executors. Once its processing finished, the RDD
> partitions corresponding to the remaining two kafka partitions were
> processed in parallel on different spark executors. I expected all three RDD
> partitions to be processed in parallel, since there were spark executors
> available that were lying idle.
>
> I re-ran the test after increasing the partitions of the kafka topic to 5.
> Again the RDD partition corresponding to partition 1 of kafka was processed
> first on one of the spark executors, and only once its processing finished
> were the RDD partitions corresponding to the remaining four kafka partitions
> processed in parallel on different spark executors.
>
> I am not clear why spark waits for operations on the first RDD partition to
> finish when it could process the remaining partitions in parallel. Am I
> missing any configuration? Any help is appreciated.
>
> Thanks,
> Mukul
>
> ________________________________
> View this message in context: Kafka + Spark streaming, RDD partitions not
> processed in parallel
> Sent from the Apache Spark User List mailing list archive at Nabble.com.
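While waiting for the actual code, the behaviour Mukul expected can be sketched in plain Java, without Spark or Kafka: several "partitions" handed to a pool of idle workers, each running a func1-style sleep-then-return, should all make progress at the same time, so the total wall time is roughly one sleep rather than one sleep per partition. The partition names, the one-second sleep, and the ExecutorService analogy below are illustrative assumptions, not the original test code.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class ParallelPartitions {

    // Analogue of the func1 from the thread: sleep, then return the value.
    static String func1(String value) {
        try {
            Thread.sleep(1000);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return value;
    }

    public static void main(String[] args) throws Exception {
        // Three "kafka partitions" (hypothetical names for illustration).
        List<String> partitions = Arrays.asList("p0", "p1", "p2");

        // One worker per partition, mimicking idle executors.
        ExecutorService pool = Executors.newFixedThreadPool(partitions.size());
        long start = System.nanoTime();

        List<Future<String>> futures = new ArrayList<>();
        for (String p : partitions) {
            futures.add(pool.submit(() -> func1(p)));
        }
        for (Future<String> f : futures) {
            System.out.println(f.get());
        }

        long elapsedMs = (System.nanoTime() - start) / 1_000_000;
        // The three 1 s sleeps overlap, so this is ~1 s, well under 3 s.
        System.out.println("elapsed < 3000 ms: " + (elapsedMs < 3000));
        pool.shutdown();
    }
}
```

The observation in the thread corresponds to one partition being drained before the pool touches the others, which would make the elapsed time closer to the sum of the sleeps instead.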