Hey,

Spark 1.1.0
Kafka 0.8.1.1
Hadoop (YARN/HDFS) 2.5.1

I have a five-partition Kafka topic.  I can create a single Kafka receiver via 
KafkaUtils.createStream with five threads in the topic map and consume messages 
fine.  Sifting through the user list and Google, I see that it's possible to 
split the Kafka receiver among the Spark workers so that I can have a 
receiver per partition, with these distributed across the workers rather than 
localized on one node.  I'm following something like this (but for Kafka, 
obviously):  
https://github.com/apache/spark/blob/ae58aea2d1435b5bb011e68127e1bcddc2edf5b2/extras/kinesis-asl/src/main/java/org/apache/spark/examples/streaming/JavaKinesisWordCountASL.java#L132
From the Streaming Programming Guide: "Receiving multiple data streams can 
therefore be achieved by creating multiple input DStreams and configuring them 
to receive different partitions of the data stream from the source(s)."

However, I'm not able to consume any messages from Kafka after I perform the 
union operation.  Again, with a single multi-threaded receiver I can consume 
messages fine.  If I create five receivers in a loop and call jssc.union(…), 
I get:

INFO scheduler.ReceiverTracker: Stream 0 received 0 blocks
INFO scheduler.ReceiverTracker: Stream 1 received 0 blocks
INFO scheduler.ReceiverTracker: Stream 2 received 0 blocks
INFO scheduler.ReceiverTracker: Stream 3 received 0 blocks
INFO scheduler.ReceiverTracker: Stream 4 received 0 blocks
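For reference, the loop is roughly the following sketch (the ZooKeeper quorum, 
consumer group, and topic name below are placeholders for my actual values):

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

import org.apache.spark.streaming.api.java.JavaPairDStream;
import org.apache.spark.streaming.api.java.JavaStreamingContext;
import org.apache.spark.streaming.kafka.KafkaUtils;

// ... jssc is my JavaStreamingContext, created earlier ...

int numPartitions = 5;
List<JavaPairDStream<String, String>> streams =
    new ArrayList<JavaPairDStream<String, String>>(numPartitions);
for (int i = 0; i < numPartitions; i++) {
  // One receiver per partition, each with a single consumer thread.
  Map<String, Integer> topicMap = new HashMap<String, Integer>();
  topicMap.put("my-topic", 1);
  streams.add(KafkaUtils.createStream(
      jssc, "zkhost:2181", "my-consumer-group", topicMap));
}

// Union the five streams into one DStream for downstream processing.
JavaPairDStream<String, String> unioned =
    jssc.union(streams.get(0), streams.subList(1, streams.size()));
unioned.print();
```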

Do I need to do anything to the unioned DStream?  Am I going about this 
incorrectly?

Thanks in advance.

Matt
