If you call repartition() on the original stream you can set the level of
parallelism after the data is ingested from Kafka. I'm not sure how it maps
Kafka topic partitions to tasks for the ingest itself, though.
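
For example, something like this (an untested sketch against the
receiver-based KafkaUtils.createStream API from spark-streaming-kafka;
the ZK quorum, group id, and topic name are placeholders):

    import org.apache.spark.SparkConf
    import org.apache.spark.streaming.{Seconds, StreamingContext}
    import org.apache.spark.streaming.kafka.KafkaUtils

    val conf = new SparkConf().setAppName("kafka-repartition-sketch")
    val ssc = new StreamingContext(conf, Seconds(2))

    // One receiver-based input stream on a topic with 8 partitions.
    val stream = KafkaUtils.createStream(
      ssc, "zk-host:2181", "my-group", Map("my-topic" -> 1))

    // Spread the received data across 8 partitions so downstream
    // stages can run 8 tasks instead of 1.
    val repartitioned = stream.repartition(8)

    repartitioned.foreachRDD { rdd =>
      // Each batch is now processed by up to 8 parallel tasks.
      rdd.foreach { case (_, msg) => println(msg) }
    }

    ssc.start()
    ssc.awaitTermination()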


On Thu, Mar 27, 2014 at 11:09 AM, Scott Clasen <scott.cla...@gmail.com> wrote:

> I have a simple streaming job that creates a Kafka input stream on a topic
> with 8 partitions, and does a foreachRDD.
>
> The job and tasks are running on Mesos; there are two tasks running, but
> only one of them is doing anything.
>
> I also set spark.streaming.concurrentJobs=8, but still only one task is
> doing work. I would have expected each task to take a subset of the
> partitions.
>
> Is there a way to make more than one task share the work here, or are my
> expectations off?
