Thanks, Sean. This was not yet digested data for me :)

"The number of partitions in a streaming RDD is determined by the
block interval and the batch interval."  I have seen the bit on
spark.streaming.blockInterval in the doc, but I didn't connect it with the
batch interval and the number of partitions.
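
To make the connection concrete, here is a minimal Python sketch of the
arithmetic Sean describes. It assumes a single receiver-based input stream in
which each block becomes one partition. The helper names partitions_per_batch
and task_waves are hypothetical, made up for illustration, and are not Spark
APIs. The 200 ms figure is assumed to be the spark.streaming.blockInterval
default.

```python
import math


def partitions_per_batch(batch_interval_ms: int, block_interval_ms: int) -> int:
    """Blocks (and so partitions) one receiver contributes to each batch RDD."""
    if batch_interval_ms <= 0 or block_interval_ms <= 0:
        raise ValueError("intervals must be positive")
    return batch_interval_ms // block_interval_ms


def task_waves(num_partitions: int, slots: int) -> int:
    """Rounds of tasks needed when every slot (core) runs one task at a time."""
    if slots <= 0:
        raise ValueError("slots must be positive")
    return math.ceil(num_partitions / slots)


# Sean's example: 10 s batches with 1 s blocks.
partitions = partitions_per_batch(10_000, 1_000)
print(partitions)                         # 10
print(task_waves(partitions, 4))          # 3 rounds on a 4-core machine
print(partitions_per_batch(10_000, 200))  # 50, assuming a 200 ms block interval
```

Under the same assumptions, an RDD with a single partition needs only one
task no matter how many slots local[N] offers, which would be consistent with
seeing just one "Executor task launch worker-0" thread in the logs.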

On Mon, May 11, 2015 at 5:34 PM, Sean Owen <so...@cloudera.com> wrote:

> You might have a look at the Spark docs to start. 1 batch = 1 RDD, but
> 1 RDD can have many partitions. And should, for scale. You do not
> submit multiple jobs to get parallelism.
>
> The number of partitions in a streaming RDD is determined by the block
> interval and the batch interval. If you have a batch interval of 10s
> and block interval of 1s you'll get 10 partitions of data in the RDD.
>
> On Mon, May 11, 2015 at 10:29 PM, Dmitry Goldenberg
> <dgoldenberg...@gmail.com> wrote:
> > Understood. We'll use the multi-threaded code we already have.
> >
> > How are these execution slots filled up? I assume each slot is dedicated to
> > one submitted task.  If that's the case, how is each task distributed then,
> > i.e. how is that task run in a multi-node fashion?  Say 1000 batches/RDDs
> > are extracted out of Kafka, how does that relate to the number of executors
> > vs. task slots?
> >
> > Presumably we can fill up the slots with multiple instances of the same
> > task... How do we know how many to launch?
> >
> > On Mon, May 11, 2015 at 5:20 PM, Sean Owen <so...@cloudera.com> wrote:
> >>
> >> BTW I think my comment was wrong as Marcelo demonstrated. In
> >> standalone mode you'd have one worker, and you do have one executor,
> >> but his explanation is right. But, you certainly have execution slots
> >> for each core.
> >>
> >> Are you talking about your own user code? You can make threads, but
> >> that's nothing to do with Spark then. If you run code on your driver,
> >> it's not distributed. If you run Spark over an RDD with 1 partition,
> >> only one task works on it.
> >>
> >> On Mon, May 11, 2015 at 10:16 PM, Dmitry Goldenberg
> >> <dgoldenberg...@gmail.com> wrote:
> >> > Sean,
> >> >
> >> > How does this model actually work? Let's say we want to run one job as N
> >> > threads executing one particular task, e.g. streaming data out of Kafka
> >> > into a search engine.  How do we configure our Spark job execution?
> >> >
> >> > Right now, I'm seeing this job running as a single thread. And it's quite
> >> > a bit slower than just running a simple utility with a thread executor
> >> > with a thread pool of N threads doing the same task.
> >> >
> >> > The performance I'm seeing of running the Kafka-Spark Streaming job is 7
> >> > times slower than that of the utility.  What's pulling Spark back?
> >> >
> >> > Thanks.
> >> >
> >> >
> >> > On Mon, May 11, 2015 at 4:55 PM, Sean Owen <so...@cloudera.com> wrote:
> >> >>
> >> >> You have one worker with one executor with 32 execution slots.
> >> >>
> >> >> On Mon, May 11, 2015 at 9:52 PM, dgoldenberg <dgoldenberg...@gmail.com> wrote:
> >> >> > Hi,
> >> >> >
> >> >> > Is there anything special one must do, running locally and submitting a
> >> >> > job like so:
> >> >> >
> >> >> > spark-submit \
> >> >> >         --class "com.myco.Driver" \
> >> >> >         --master local[*]  \
> >> >> >         ./lib/myco.jar
> >> >> >
> >> >> > In my logs, I'm only seeing log messages with the thread identifier of
> >> >> > "Executor task launch worker-0".
> >> >> >
> >> >> > There are 4 cores on the machine so I expected 4 threads to be at play.
> >> >> > Running with local[32] did not yield 32 worker threads.
> >> >> >
> >> >> > Any recommendations? Thanks.
> >> >> >
> >> >> >
> >> >> >
> >> >> > --
> >> >> > View this message in context:
> >> >> > http://apache-spark-user-list.1001560.n3.nabble.com/Running-Spark-in-local-mode-seems-to-ignore-local-N-tp22851.html
> >> >> > Sent from the Apache Spark User List mailing list archive at Nabble.com.
> >> >> >
> >> >
> >> >
> >
> >
>
