Subject: Re: Parallelize on spark context
Hi Naveen,

By default, when we call parallelize, the data is split into the default number of partitions (which we can control with the property spark.default.parallelism). If we want a specific call to parallelize to use a different number of partitions, we can instead call sc.parallelize(data, numSlices).
Hi,

JavaRDD<Integer> distData = sc.parallelize(data);

On what basis does parallelize split the data into multiple datasets? How can we control how many of these datasets are executed per executor? For example, my data is a list of 1000 integers and I have a 2-node YARN cluster. It is dividing into 2 batches.
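One way to see how parallelize slices the 1000 integers is to glom() each partition into a single list and print the sizes; this is only a local sketch, with local[2] chosen to mimic the observed 2-way split (on YARN the default partition count comes from the cluster configuration):

    import java.util.ArrayList;
    import java.util.List;

    import org.apache.spark.SparkConf;
    import org.apache.spark.api.java.JavaRDD;
    import org.apache.spark.api.java.JavaSparkContext;

    public class PartitionInspect {
        public static void main(String[] args) {
            JavaSparkContext sc = new JavaSparkContext(
                    new SparkConf().setAppName("PartitionInspect").setMaster("local[2]"));

            List<Integer> data = new ArrayList<>();
            for (int i = 0; i < 1000; i++) data.add(i);

            // parallelize slices the list evenly across partitions; with
            // 2 default slices, each partition gets roughly 500 elements.
            JavaRDD<Integer> distData = sc.parallelize(data);

            // glom() turns each partition into one List, so collect()
            // returns one list per partition and we can print their sizes.
            List<List<Integer>> parts = distData.glom().collect();
            for (int i = 0; i < parts.size(); i++) {
                System.out.println("partition " + i + ": " + parts.get(i).size() + " elements");
            }

            sc.stop();
        }
    }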