Thanks, Sean. I hadn't fully digested this yet :) "The number of partitions in a streaming RDD is determined by the block interval and the batch interval." I had seen the bit on spark.streaming.blockInterval in the docs, but I hadn't connected it with the batch interval and the number of partitions.
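A minimal sketch of how the two intervals relate, assuming the receiver-based streaming API (the socket source, interval values, and object name below are only illustrative, not from this thread): with a 10s batch interval and a 1s block interval, each batch RDD should arrive with roughly 10 partitions.

    import org.apache.spark.SparkConf
    import org.apache.spark.streaming.{Seconds, StreamingContext}

    object BlockIntervalSketch {
      def main(args: Array[String]): Unit = {
        val conf = new SparkConf()
          .setAppName("block-interval-sketch")
          .setMaster("local[4]")                           // one core is taken by the receiver
          .set("spark.streaming.blockInterval", "1000")    // 1000 ms = 1s per block

        val ssc = new StreamingContext(conf, Seconds(10))  // 10s batch interval

        // Any receiver-based source behaves the same way; a socket stream keeps the sketch small.
        val lines = ssc.socketTextStream("localhost", 9999)

        lines.foreachRDD { rdd =>
          // Expect roughly batchInterval / blockInterval = 10 partitions per batch.
          println(s"partitions in this batch: ${rdd.partitions.length}")
        }

        ssc.start()
        ssc.awaitTermination()
      }
    }

(The direct Kafka stream is a different case: there, the number of RDD partitions follows the number of Kafka topic partitions rather than the block interval.)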
On Mon, May 11, 2015 at 5:34 PM, Sean Owen <so...@cloudera.com> wrote:
> You might have a look at the Spark docs to start. 1 batch = 1 RDD, but
> 1 RDD can have many partitions. And should, for scale. You do not
> submit multiple jobs to get parallelism.
>
> The number of partitions in a streaming RDD is determined by the block
> interval and the batch interval. If you have a batch interval of 10s
> and a block interval of 1s you'll get 10 partitions of data in the RDD.
>
> On Mon, May 11, 2015 at 10:29 PM, Dmitry Goldenberg
> <dgoldenberg...@gmail.com> wrote:
> > Understood. We'll use the multi-threaded code we already have.
> >
> > How are these execution slots filled up? I assume each slot is
> > dedicated to one submitted task. If that's the case, how is each task
> > distributed then, i.e. how is that task run in a multi-node fashion?
> > Say 1000 batches/RDDs are extracted out of Kafka; how does that relate
> > to the number of executors vs. task slots?
> >
> > Presumably we can fill up the slots with multiple instances of the
> > same task... How do we know how many to launch?
> >
> > On Mon, May 11, 2015 at 5:20 PM, Sean Owen <so...@cloudera.com> wrote:
> >>
> >> BTW I think my comment was wrong, as Marcelo demonstrated. In
> >> standalone mode you'd have one worker, and you do have one executor,
> >> but his explanation is right. But, you certainly have execution slots
> >> for each core.
> >>
> >> Are you talking about your own user code? You can make threads, but
> >> that's nothing to do with Spark then. If you run code on your driver,
> >> it's not distributed. If you run Spark over an RDD with 1 partition,
> >> only one task works on it.
> >>
> >> On Mon, May 11, 2015 at 10:16 PM, Dmitry Goldenberg
> >> <dgoldenberg...@gmail.com> wrote:
> >> > Sean,
> >> >
> >> > How does this model actually work? Let's say we want to run one job
> >> > as N threads executing one particular task, e.g. streaming data out
> >> > of Kafka into a search engine. How do we configure our Spark job
> >> > execution?
> >> >
> >> > Right now, I'm seeing this job running as a single thread. And it's
> >> > quite a bit slower than just running a simple utility with a thread
> >> > executor with a thread pool of N threads doing the same task.
> >> >
> >> > The performance I'm seeing of running the Kafka-Spark Streaming job
> >> > is 7 times slower than that of the utility. What's pulling Spark
> >> > back?
> >> >
> >> > Thanks.
> >> >
> >> >
> >> > On Mon, May 11, 2015 at 4:55 PM, Sean Owen <so...@cloudera.com> wrote:
> >> >>
> >> >> You have one worker with one executor with 32 execution slots.
> >> >>
> >> >> On Mon, May 11, 2015 at 9:52 PM, dgoldenberg <dgoldenberg...@gmail.com>
> >> >> wrote:
> >> >> > Hi,
> >> >> >
> >> >> > Is there anything special one must do, running locally and
> >> >> > submitting a job like so:
> >> >> >
> >> >> >   spark-submit \
> >> >> >     --class "com.myco.Driver" \
> >> >> >     --master local[*] \
> >> >> >     ./lib/myco.jar
> >> >> >
> >> >> > In my logs, I'm only seeing log messages with the thread
> >> >> > identifier of "Executor task launch worker-0".
> >> >> >
> >> >> > There are 4 cores on the machine so I expected 4 threads to be at
> >> >> > play. Running with local[32] did not yield 32 worker threads.
> >> >> >
> >> >> > Any recommendations? Thanks.
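A minimal sketch of one way to keep those execution slots busy, assuming the receiver-based KafkaUtils.createStream API (the ZooKeeper address, topic, group id, and object name are placeholders): with a single receiver the batch RDDs may have only a few partitions, so repartition each batch to roughly the number of cores before the expensive per-record work, which lets that stage run as many parallel tasks.

    import org.apache.spark.SparkConf
    import org.apache.spark.streaming.{Seconds, StreamingContext}
    import org.apache.spark.streaming.kafka.KafkaUtils

    object KafkaParallelismSketch {
      def main(args: Array[String]): Unit = {
        val conf = new SparkConf().setAppName("kafka-parallelism-sketch")
        val ssc  = new StreamingContext(conf, Seconds(10))

        // One receiver pulls from Kafka; (topic -> 1) means one consumer thread.
        val messages = KafkaUtils.createStream(
          ssc, "zk-host:2181", "indexer-group", Map("my-topic" -> 1))

        messages
          .map(_._2)         // keep only the message payload
          .repartition(32)   // spread each batch across roughly as many partitions as task slots
          .foreachRDD { rdd =>
            rdd.foreachPartition { docs =>
              // Each partition is processed by one task in one execution slot;
              // open the search-engine client here and index the records.
              docs.foreach(doc => ())   // placeholder for the real indexing call
            }
          }

        ssc.start()
        ssc.awaitTermination()
      }
    }

With local[32] the 32 slots only get used if the RDDs actually have that many partitions; a single-partition RDD still runs as one task, which matches the single "Executor task launch worker-0" thread seen in the logs.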