Hi Tamas,

Can you try to set mapred.map.tasks and see if it works?
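As a sketch of what Yin is suggesting (assuming a HiveContext called sqlContext, as was standard in Spark at the time; the property value 100 is just an illustrative number):

```sql
-- Run through sqlContext.sql(...) before the Hive query;
-- hints to Hive/Hadoop how many map tasks (and hence input
-- partitions) to aim for. It is a hint, not a hard guarantee.
SET mapred.map.tasks=100;
```

Whether the requested number is honored depends on the InputFormat; for non-splittable input (e.g. gzip files) it will be ignored.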
Thanks,
Yin

On Thu, Oct 2, 2014 at 10:33 AM, Tamas Jambor <jambo...@gmail.com> wrote:
> That would work - I normally use Hive queries through Spark SQL; I
> have not seen an option like that there.
>
> On Thu, Oct 2, 2014 at 3:13 PM, Ashish Jain <ashish....@gmail.com> wrote:
> > If you are using textFile() to read data in, it also takes a parameter
> > for the minimum number of partitions to create. Would that not work
> > for you?
> >
> > On Oct 2, 2014 7:00 AM, "jamborta" <jambo...@gmail.com> wrote:
> >>
> >> Hi all,
> >>
> >> I have been testing repartitioning to ensure that my algorithms get a
> >> similar amount of data.
> >>
> >> I noticed that repartitioning is very expensive. Is there a way to
> >> force Spark to create a certain number of partitions when the data is
> >> read in? How does it decide on the partition size initially?
> >>
> >> Thanks,
> >>
> >> --
> >> View this message in context:
> >> http://apache-spark-user-list.1001560.n3.nabble.com/partition-size-for-initial-read-tp15603.html
> >> Sent from the Apache Spark User List mailing list archive at Nabble.com.
> >>
> >> ---------------------------------------------------------------------
> >> To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
> >> For additional commands, e-mail: user-h...@spark.apache.org
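On the original question of how the initial partition count is decided: sc.textFile(path, minPartitions) delegates to Hadoop's FileInputFormat, which sizes splits from the file size, the HDFS block size, and the requested minimum number of splits. The following is a simplified model of that calculation (it assumes a single uncompressed file and the default minimum split size of 1 byte; real InputFormats add more corner cases):

```python
import math

def num_splits(total_size, block_size, requested_min_splits=1):
    """Approximate the number of input splits (and hence initial RDD
    partitions) that Hadoop's FileInputFormat would produce for one
    uncompressed file. Simplified sketch, not the exact Hadoop code."""
    # Hadoop computes a "goal size" by dividing the data evenly
    # across the requested number of splits...
    goal_size = total_size // max(requested_min_splits, 1)
    # ...then caps each split at the block size, so splits never
    # straddle HDFS block boundaries more than necessary.
    split_size = max(1, min(goal_size, block_size))
    return math.ceil(total_size / split_size)

# A 1 GB file with 128 MB HDFS blocks yields 8 initial partitions.
print(num_splits(1 << 30, 128 << 20))      # 8
# Requesting at least 32 partitions shrinks the split size instead.
print(num_splits(1 << 30, 128 << 20, 32))  # 32
```

This is why passing minPartitions to textFile() is much cheaper than calling repartition() afterwards: the extra partitions are created at read time by shrinking the split size, with no shuffle.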