Good catch, I was not aware of this setting. I’m wondering, though, whether it also generates a shuffle or whether the data is still processed by the node on which it’s ingested, so that you’re not gated by the number of cores on one machine.
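For comparison, the explicit way I know of to redistribute received data is DStream.repartition(), which definitely does incur a shuffle. A minimal sketch, assuming a receiver-based stream (socketTextStream stands in here for any custom receiver; names and values are illustrative):

  import org.apache.spark.SparkConf
  import org.apache.spark.streaming.{Seconds, StreamingContext}

  val conf = new SparkConf().setAppName("ingest-test")
  val ssc = new StreamingContext(conf, Seconds(2))

  // Any receiver-based DStream; the receiver itself runs on one executor.
  val stream = ssc.socketTextStream("localhost", 9999)

  // repartition() redistributes each batch RDD across the cluster,
  // at the cost of shuffling the received data.
  val redistributed = stream.repartition(40)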
-adrian

On 9/25/15, 5:27 PM, "Silvio Fiorito" <silvio.fior...@granturing.com> wrote:

>One thing you should look at is your batch duration and
>spark.streaming.blockInterval.
>
>Those two settings control how many partitions are generated for each RDD
>(batch) of the DStream when using a receiver (vs. the direct approach).
>
>So if you have a 2 second batch duration and the default blockInterval of
>200ms, this will create 10 partitions. This means you can have at most 10
>parallel tasks (as long as you have the cores available) running at a time
>for a map-like operation.
>
>On 9/25/15, 9:08 AM, "nib...@free.fr" <nib...@free.fr> wrote:
>
>>Hello,
>>I use a custom receiver in order to receive JMS messages from MQ servers.
>>I want to benefit from the YARN cluster. My questions are:
>>
>>- Is it possible to have only one node receiving the JMS messages and to
>>parallelize the RDD over all the cluster nodes?
>>- Is it also possible to parallelize the message receiver over the
>>cluster nodes?
>>
>>A code example for both items would be great, because the parallelization
>>mechanism in the code is not crystal clear to me...
>>
>>Thanks
>>Nicolas
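To make the arithmetic above concrete, a minimal sketch of the two settings Silvio mentions (the app name is illustrative): a 2000 ms batch with a 200 ms blockInterval yields 2000 / 200 = 10 blocks, i.e. 10 partitions per batch RDD.

  import org.apache.spark.SparkConf
  import org.apache.spark.streaming.{Seconds, StreamingContext}

  // 2000 ms batch duration / 200 ms blockInterval = 10 blocks,
  // so each batch RDD will have 10 partitions.
  val conf = new SparkConf()
    .setAppName("jms-ingest")
    .set("spark.streaming.blockInterval", "200ms")

  val ssc = new StreamingContext(conf, Seconds(2))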
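And on Nicolas's two questions, the usual patterns, as a hedged sketch (MyJmsReceiver, its constructor argument, and the broker URL are hypothetical stand-ins for the real JMS code): a single receiver occupies one core on one executor, and repartition() then spreads each batch over the cluster; alternatively, several receivers can be started and unioned into one DStream.

  import org.apache.spark.SparkConf
  import org.apache.spark.storage.StorageLevel
  import org.apache.spark.streaming.{Seconds, StreamingContext}
  import org.apache.spark.streaming.receiver.Receiver

  // Hypothetical custom receiver; the real JMS/MQ plumbing goes in onStart().
  class MyJmsReceiver(brokerUrl: String)
      extends Receiver[String](StorageLevel.MEMORY_AND_DISK_2) {
    def onStart(): Unit = { /* connect to MQ, call store(message) per message */ }
    def onStop(): Unit = { /* close the JMS connection */ }
  }

  val ssc = new StreamingContext(new SparkConf().setAppName("jms"), Seconds(2))

  // (1) One receiver, then spread the processing over the whole cluster:
  val single = ssc.receiverStream(new MyJmsReceiver("tcp://mq-host:61616"))
  val spread = single.repartition(40)

  // (2) Several receivers (each occupies one core on some executor),
  // unioned into a single DStream:
  val streams = (1 to 4).map(_ => ssc.receiverStream(new MyJmsReceiver("tcp://mq-host:61616")))
  val unioned = ssc.union(streams)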