One thing you should look at is your batch duration and spark.streaming.blockInterval
These two settings control how many partitions are generated for each RDD (batch) of the DStream when using a receiver (as opposed to the direct approach). So if you have a 2-second batch duration and the default blockInterval of 200 ms, this will create 10 partitions per batch. That means you can have at most 10 parallel tasks (as long as you have the cores available) running at a time for a map-like operation.

On 9/25/15, 9:08 AM, "nib...@free.fr" <nib...@free.fr> wrote:
>Hello,
>I used a custom receiver in order to receive JMS messages from MQ servers.
>I want to benefit from the Yarn cluster; my questions are:
>
>- Is it possible to have only one node receiving JMS messages and parallelize
>the RDD over all the cluster nodes?
>- Is it possible to also parallelize the message receiver over the cluster nodes?
>
>If you have a code example for both items, that would be great, because the
>parallelization mechanism in the code is not crystal clear to me...
>
>Tks
>Nicolas
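
The settings above can be sketched roughly like this (a configuration sketch only, assuming a hypothetical `JmsReceiver` class and constructor arguments — not a drop-in implementation):

```scala
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}

val conf = new SparkConf()
  .setAppName("jms-streaming")                    // hypothetical app name
  .set("spark.streaming.blockInterval", "200ms")  // default value, shown explicitly

// 2-second batches: 2000 ms / 200 ms blockInterval = 10 blocks, i.e.
// 10 partitions per batch RDD, so at most 10 parallel map-like tasks.
val ssc = new StreamingContext(conf, Seconds(2))

// A single receiver occupies one executor core, but the RDDs it produces
// are still processed across the whole cluster. To parallelize *receiving*
// itself, the usual Spark Streaming pattern is to start several receivers
// and union their streams:
val streams = (1 to 3).map(_ => ssc.receiverStream(new JmsReceiver(brokerUrl, queueName)))
val unified = ssc.union(streams)
```

Note that each receiver permanently takes one core on the cluster, so make sure the number of cores allocated to the application exceeds the number of receivers, or no cores will be left for processing.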