Hi Naveen,

By default, parallelize splits the data into a default number of partitions, which you can control cluster-wide with the property spark.default.parallelism. If you just want a specific call to use a different number of partitions, you can instead call sc.parallelize(data, numPartitions). The default value of the property is documented at http://spark.apache.org/docs/latest/configuration.html#spark-properties
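To make the splitting concrete, here is a small plain-Java sketch of how a parallelized collection gets cut into roughly equal, contiguous slices. This is a simplified illustration, not Spark's actual source, and the class/method names (SliceDemo, slice) are made up for the example:

```java
import java.util.ArrayList;
import java.util.List;

public class SliceDemo {

    // Simplified sketch: cut data into numSlices contiguous chunks whose
    // sizes differ by at most one element.
    static <T> List<List<T>> slice(List<T> data, int numSlices) {
        List<List<T>> slices = new ArrayList<>();
        int n = data.size();
        for (int i = 0; i < numSlices; i++) {
            int start = (int) ((long) i * n / numSlices);
            int end = (int) ((long) (i + 1) * n / numSlices);
            slices.add(new ArrayList<>(data.subList(start, end)));
        }
        return slices;
    }

    public static void main(String[] args) {
        List<Integer> data = new ArrayList<>();
        for (int i = 0; i < 1000; i++) data.add(i);

        // With 2 slices (as in your 2-node example), 1000 integers
        // split into two partitions of 500 each.
        List<List<Integer>> twoWay = slice(data, 2);
        System.out.println(twoWay.get(0).size() + " " + twoWay.get(1).size()); // 500 500

        // Asking for 8 partitions instead gives 8 chunks of 125.
        List<List<Integer>> eightWay = slice(data, 8);
        System.out.println(eightWay.size()); // 8
    }
}
```

So if you want more (or fewer) tasks per executor, the lever is the partition count you hand to parallelize, not anything in the data itself.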
Cheers,

Holden :)

On Thu, Nov 6, 2014 at 10:43 PM, Naveen Kumar Pokala <npok...@spcapitaliq.com> wrote:

> Hi,
>
> JavaRDD<Integer> distData = sc.parallelize(data);
>
> On what basis does parallelize split the data into multiple datasets? How to
> handle it if we want these many datasets to be executed per executor?
>
> For example, my data is a list of 1000 integers and I have a 2-node YARN
> cluster. It is dividing it into 2 batches of size 500.
>
> Regards,
>
> Naveen.

--
Cell : 425-233-8271