Good catch, I was not aware of this setting. I’m wondering, though, whether it also generates a shuffle or whether the data is still processed by the node on which it’s ingested, so that you’re not gated by the number of cores on one machine.
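For comparison, the explicit way I know of to redistribute received data is DStream.repartition(), which definitely does incur a shuffle. A minimal sketch, assuming a receiver-based stream (socketTextStream stands in here for any custom receiver; names and values are illustrative):

  import org.apache.spark.SparkConf
  import org.apache.spark.streaming.{Seconds, StreamingContext}

  val conf = new SparkConf().setAppName("ingest-test")
  val ssc = new StreamingContext(conf, Seconds(2))

  // Any receiver-based DStream; the receiver itself runs on one executor.
  val stream = ssc.socketTextStream("localhost", 9999)

  // repartition() redistributes each batch RDD across the cluster,
  // at the cost of shuffling the received data.
  val redistributed = stream.repartition(40)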
-adrian

On 9/25/15, 5:27 PM, "Silvio Fiorito" <silvio.fior...@granturing.com> wrote:

>One thing you should look at is your batch duration and
>spark.streaming.blockInterval.
>
>Those two settings control how many partitions are generated for each RDD
>(batch) of the DStream when using a receiver (vs. the direct approach).
>
>So if you have a 2 second batch duration and the default blockInterval of
>200ms, this will create 10 partitions. This means you can have at most 10
>parallel tasks (as long as you have the cores available) running at a time
>for a map-like operation.
>
>On 9/25/15, 9:08 AM, "nib...@free.fr" <nib...@free.fr> wrote:
>
>>Hello,
>>I use a custom receiver in order to receive JMS messages from MQ servers.
>>I want to benefit from the YARN cluster. My questions are:
>>
>>- Is it possible to have only one node receiving the JMS messages and to
>>parallelize the RDD over all the cluster nodes?
>>- Is it also possible to parallelize the message receiver over the
>>cluster nodes?
>>
>>A code example for both items would be great, because the parallelization
>>mechanism in the code is not crystal clear to me...
>>
>>Thanks
>>Nicolas
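To make the arithmetic above concrete, a minimal sketch of the two settings Silvio mentions (the app name is illustrative): a 2000 ms batch with a 200 ms blockInterval yields 2000 / 200 = 10 blocks, i.e. 10 partitions per batch RDD.

  import org.apache.spark.SparkConf
  import org.apache.spark.streaming.{Seconds, StreamingContext}

  // 2000 ms batch duration / 200 ms blockInterval = 10 blocks,
  // so each batch RDD will have 10 partitions.
  val conf = new SparkConf()
    .setAppName("jms-ingest")
    .set("spark.streaming.blockInterval", "200ms")

  val ssc = new StreamingContext(conf, Seconds(2))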
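And on Nicolas's two questions, the usual patterns, as a hedged sketch (MyJmsReceiver, its constructor argument, and the broker URL are hypothetical stand-ins for the real JMS code): a single receiver occupies one core on one executor, and repartition() then spreads each batch over the cluster; alternatively, several receivers can be started and unioned into one DStream.

  import org.apache.spark.SparkConf
  import org.apache.spark.storage.StorageLevel
  import org.apache.spark.streaming.{Seconds, StreamingContext}
  import org.apache.spark.streaming.receiver.Receiver

  // Hypothetical custom receiver; the real JMS/MQ plumbing goes in onStart().
  class MyJmsReceiver(brokerUrl: String)
      extends Receiver[String](StorageLevel.MEMORY_AND_DISK_2) {
    def onStart(): Unit = { /* connect to MQ, call store(message) per message */ }
    def onStop(): Unit = { /* close the JMS connection */ }
  }

  val ssc = new StreamingContext(new SparkConf().setAppName("jms"), Seconds(2))

  // (1) One receiver, then spread the processing over the whole cluster:
  val single = ssc.receiverStream(new MyJmsReceiver("tcp://mq-host:61616"))
  val spread = single.repartition(40)

  // (2) Several receivers (each occupies one core on some executor),
  // unioned into a single DStream:
  val streams = (1 to 4).map(_ => ssc.receiverStream(new MyJmsReceiver("tcp://mq-host:61616")))
  val unioned = ssc.union(streams)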