Re: Parallelize on spark context

2014-11-07 Thread _soumya_

RE: Parallelize on spark context

2014-11-06 Thread Naveen Kumar Pokala

Re: Parallelize on spark context

2014-11-06 Thread Holden Karau
Hi Naveen, by default when we call parallelize, the data will be split into the default number of partitions (which we can control with the property spark.default.parallelism). If we just want a specific instance of parallelize to have a different number of partitions, we can instead call sc.parallelize(data, numPartitions).
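
For illustration, a minimal runnable sketch of the two options described above (the master URL, app name, and partition counts are illustrative assumptions, not from the original message):

    import org.apache.spark.SparkConf;
    import org.apache.spark.api.java.JavaRDD;
    import org.apache.spark.api.java.JavaSparkContext;

    import java.util.ArrayList;
    import java.util.List;

    public class ParallelizeExample {
        public static void main(String[] args) {
            SparkConf conf = new SparkConf()
                    .setAppName("ParallelizeExample")
                    .setMaster("local[*]")              // illustrative; normally set via spark-submit
                    .set("spark.default.parallelism", "4"); // illustrative default partition count
            JavaSparkContext sc = new JavaSparkContext(conf);

            List<Integer> data = new ArrayList<>();
            for (int i = 0; i < 1000; i++) {
                data.add(i);
            }

            // Uses spark.default.parallelism (4 partitions here).
            JavaRDD<Integer> byDefault = sc.parallelize(data);

            // Overrides the default for this one RDD: 8 partitions.
            JavaRDD<Integer> explicit = sc.parallelize(data, 8);

            System.out.println("default:  " + byDefault.partitions().size());
            System.out.println("explicit: " + explicit.partitions().size());

            sc.stop();
        }
    }

Here parallelize(data) picks up spark.default.parallelism, while parallelize(data, 8) fixes the partition count for that one RDD only.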

Parallelize on spark context

2014-11-06 Thread Naveen Kumar Pokala
Hi, JavaRDD<Integer> distData = sc.parallelize(data); On what basis does parallelize split the data into multiple datasets? How can we control how many of these datasets are executed per executor? For example, my data is a list of 1000 integers and I have a 2-node YARN cluster. It is dividing it into 2 batches.
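
For reference, a minimal sketch of the scenario described above, run in local mode (the local[2] master is an assumed stand-in for the 2-node YARN cluster): parallelize cuts the list into contiguous, roughly equal slices, one per partition, and glom() makes that split visible:

    import org.apache.spark.SparkConf;
    import org.apache.spark.api.java.JavaSparkContext;

    import java.util.ArrayList;
    import java.util.List;

    public class InspectPartitions {
        public static void main(String[] args) {
            SparkConf conf = new SparkConf()
                    .setAppName("InspectPartitions")
                    .setMaster("local[2]"); // assumed stand-in for the 2-node cluster
            JavaSparkContext sc = new JavaSparkContext(conf);

            List<Integer> data = new ArrayList<>();
            for (int i = 0; i < 1000; i++) {
                data.add(i);
            }

            // glom() gathers each partition into a list, exposing the split.
            List<List<Integer>> partitions = sc.parallelize(data).glom().collect();
            for (int p = 0; p < partitions.size(); p++) {
                System.out.println("partition " + p + ": "
                        + partitions.get(p).size() + " elements");
            }

            sc.stop();
        }
    }

With two default partitions, this prints two slices of roughly 500 elements each, matching the "2 batches" observed above.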