Please ignore this question; I've figured out what my problem was. In case anyone else runs into something similar: the problem was on the Kafka side. I was using the console producer to generate the messages going into the Kafka logs. This producer sends all messages to the same partition unless you specify the "--new-producer" parameter.
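For reference, an invocation of roughly this shape enables the flag (this is a sketch assuming a Kafka 0.8.2-era `kafka-console-producer.sh`; the broker list and topic name are placeholders, not values from the original setup):

```shell
# Placeholders: substitute your own broker list and topic name.
kafka-console-producer.sh \
  --broker-list broker1:9092,broker2:9092 \
  --topic my-topic \
  --new-producer
```

With the flag set, keyless messages are spread across partitions instead of all landing on one.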
Thanks,
Jorge

On Wed, Feb 3, 2016 at 12:44 PM, Jorge Rodriguez <jo...@bloomreach.com> wrote:

> Hello Spark users,
>
> We are setting up our first batch of Spark Streaming pipelines, and I am
> running into an issue which I am not sure how to resolve, but it seems
> like it should be fairly trivial.
>
> I am using the receiver-mode Kafka consumer that comes with Spark, running
> in standalone mode. I've set up two receivers, which are consuming from a
> 4-broker, 4-partition Kafka topic.
>
> If you look at the image below, you will see that *even though I have two
> receivers, only one of them ever consumes data at a time*. I believe this
> to be my current bottleneck for scaling.
>
> What am I missing?
>
> To me, the order of events consumed is not important. I just want to
> optimize for maximum throughput.
>
> [image: Inline image 1]
>
> Thanks in advance for any help or tips!
>
> Jorge