Hey,

Spark 1.1.0, Kafka 0.8.1.1, Hadoop (YARN/HDFS) 2.5.1
I have a five-partition Kafka topic. I can create a single Kafka receiver via KafkaUtils.createStream with five threads in the topic map and consume messages fine. Sifting through the user list and Google, I see that it's possible to split the Kafka receivers among the Spark workers, so that I can have a receiver per partition and have these distributed to workers rather than localized to the driver. I'm following something like this (but for Kafka, obviously): https://github.com/apache/spark/blob/ae58aea2d1435b5bb011e68127e1bcddc2edf5b2/extras/kinesis-asl/src/main/java/org/apache/spark/examples/streaming/JavaKinesisWordCountASL.java#L132

From the Streaming Programming Guide: "Receiving multiple data streams can therefore be achieved by creating multiple input DStreams and configuring them to receive different partitions of the data stream from the source(s)."

However, I'm not able to consume any messages from Kafka after I perform the union operation. Again, if I create a single, multi-threaded receiver, I can consume messages fine. If I create 5 receivers in a loop and call jssc.union(…), I get:

INFO scheduler.ReceiverTracker: Stream 0 received 0 blocks
INFO scheduler.ReceiverTracker: Stream 1 received 0 blocks
INFO scheduler.ReceiverTracker: Stream 2 received 0 blocks
INFO scheduler.ReceiverTracker: Stream 3 received 0 blocks
INFO scheduler.ReceiverTracker: Stream 4 received 0 blocks

Do I need to do anything else with the unioned DStream? Am I going about this incorrectly?

Thanks in advance,
Matt
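P.S. For reference, here's roughly what my receiver loop looks like (a trimmed sketch, not my exact job -- the ZooKeeper quorum, group id, and topic name are placeholders):

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

import org.apache.spark.SparkConf;
import org.apache.spark.streaming.Duration;
import org.apache.spark.streaming.api.java.JavaPairDStream;
import org.apache.spark.streaming.api.java.JavaPairReceiverInputDStream;
import org.apache.spark.streaming.api.java.JavaStreamingContext;
import org.apache.spark.streaming.kafka.KafkaUtils;

public class MultiReceiverKafka {
  public static void main(String[] args) {
    SparkConf conf = new SparkConf().setAppName("multi-receiver-kafka");
    JavaStreamingContext jssc = new JavaStreamingContext(conf, new Duration(2000));

    String zkQuorum = "zkhost:2181";  // placeholder
    String groupId = "my-group";      // placeholder
    int numReceivers = 5;             // one receiver per topic partition

    // Create one single-threaded Kafka receiver per partition
    List<JavaPairDStream<String, String>> streams =
        new ArrayList<JavaPairDStream<String, String>>();
    for (int i = 0; i < numReceivers; i++) {
      Map<String, Integer> topicMap = new HashMap<String, Integer>();
      topicMap.put("my-topic", 1);    // one consumer thread in this receiver
      JavaPairReceiverInputDStream<String, String> stream =
          KafkaUtils.createStream(jssc, zkQuorum, groupId, topicMap);
      streams.add(stream);
    }

    // Union the five DStreams into one before further processing
    JavaPairDStream<String, String> unioned =
        jssc.union(streams.get(0), streams.subList(1, streams.size()));

    unioned.print();
    jssc.start();
    jssc.awaitTermination();
  }
}
```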