I am running into some problems with Spark Streaming when reading from Kafka. I am using Spark 1.2.0 built on CDH5. The example is based on:

https://github.com/apache/spark/blob/master/examples/scala-2.10/src/main/scala/org/apache/spark/examples/streaming/KafkaWordCount.scala

* It works with the default implementation:

    val topicMap = topics.split(",").map((_, numThreads.toInt)).toMap
    val lines = KafkaUtils.createStream(ssc, zkQuorum, group, topicMap).map(_._2)
* However, when I changed it to receive in parallel, as shown below:

    val topicMap = topics.split(",").map((_, 1)).toMap
    val parallelInputs = (1 to numThreads.toInt).map { _ =>
      KafkaUtils.createStream(ssc, zkQuorum, group, topicMap)
    }
    ssc.union(parallelInputs)

After the change, the job stages just hang and never finish. It looks as if no action is triggered on the streaming job. When I check the "Streaming" tab, it shows the message below:

    Batch Processing Statistics
    No statistics have been generated yet.

Am I doing anything wrong in the parallel receiving part? For completeness, a minimal version of the full job I am running is attached below.
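To make the context concrete, here is a minimal end-to-end sketch of the job, boiled down from the KafkaWordCount example. The object name, the batch interval, and the print() output operation are illustrative stand-ins, not the exact values from my job:

    import org.apache.spark.SparkConf
    import org.apache.spark.streaming.{Seconds, StreamingContext}
    import org.apache.spark.streaming.kafka.KafkaUtils

    // Hypothetical driver, condensed from the KafkaWordCount example.
    object ParallelKafkaWordCount {
      def main(args: Array[String]): Unit = {
        val Array(zkQuorum, group, topics, numThreads) = args

        // As I understand it, each receiver pins one core, so the app needs
        // more cores than receivers for batches to actually be processed.
        val sparkConf = new SparkConf().setAppName("ParallelKafkaWordCount")
        val ssc = new StreamingContext(sparkConf, Seconds(2))

        // One receiver per thread; each receiver consumes the topic(s) with a
        // single consumer thread, so parallelism comes from the receiver count.
        val topicMap = topics.split(",").map((_, 1)).toMap
        val parallelInputs = (1 to numThreads.toInt).map { _ =>
          KafkaUtils.createStream(ssc, zkQuorum, group, topicMap)
        }
        val lines = ssc.union(parallelInputs).map(_._2)

        // An output operation is required to trigger execution of each batch.
        val wordCounts = lines.flatMap(_.split(" ")).map((_, 1L)).reduceByKey(_ + _)
        wordCounts.print()

        ssc.start()
        ssc.awaitTermination()
      }
    }

--
Chen Song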