Please ignore this question, as I've figured out what my problem was.

In case anyone else runs into something similar, the problem was on
the Kafka side.  I was using the console producer to generate the
messages going into the Kafka logs.  By default, this producer sends
all of the messages to the same partition unless you specify the
"--new-producer" parameter.
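
For reference, the invocation that spread messages evenly across the
partitions for me looked roughly like this (a sketch based on the
Kafka 0.8.x/0.9.0 console producer; the broker and topic names are
placeholders):

  bin/kafka-console-producer.sh --broker-list broker1:9092 \
      --topic my-topic --new-producer

As I understand it, without that flag the old producer picks one
partition and sticks with it until its metadata refresh interval
expires, which is why everything was landing in a single partition.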

Thanks,
Jorge

On Wed, Feb 3, 2016 at 12:44 PM, Jorge Rodriguez <jo...@bloomreach.com>
wrote:

> Hello Spark users,
>
> We are setting up our first batch of Spark Streaming pipelines, and I
> am running into an issue which I am not sure how to resolve, but which
> seems like it should be fairly trivial.
>
> I am using the receiver-mode Kafka consumer that comes with Spark,
> running in standalone mode.  I've set up two receivers, which are
> consuming from a 4-broker, 4-partition Kafka topic.
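>
> For context, the receiver setup looks roughly like the following (a
> minimal sketch, not our exact code; the ZooKeeper quorum, group id,
> and topic name are placeholders):
>
>   import org.apache.spark.SparkConf
>   import org.apache.spark.streaming.{Seconds, StreamingContext}
>   import org.apache.spark.streaming.kafka.KafkaUtils
>
>   val conf = new SparkConf().setAppName("kafka-receiver-test")
>   val ssc = new StreamingContext(conf, Seconds(5))
>
>   // Two receiver-based streams in the same consumer group, with two
>   // consumer threads each; Kafka's high-level consumer balances the
>   // topic's 4 partitions across the 4 threads.
>   val streams = (1 to 2).map { _ =>
>     KafkaUtils.createStream(ssc, "zk1:2181", "my-group",
>       Map("my-topic" -> 2))
>   }
>
>   // Union the receiver streams into one DStream for processing.
>   val unified = ssc.union(streams)
>   unified.count().print()
>
>   ssc.start()
>   ssc.awaitTermination()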
>
> If you look at the image below, you will see that *even though I have
> two receivers, only one of them ever consumes data at a time*.  I
> believe this to be my current bottleneck for scaling.
>
> What am I missing?
>
> To me, the order in which events are consumed is not important.  I
> just want to optimize for maximum throughput.
>
>
> [image: Inline image 1]
>
> Thanks in advance for any help or tips!
>
> Jorge
>
