Hi Tamas,

Can you try to set mapred.map.tasks and see if it works?
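As a sketch of what Yin is suggesting (assuming a HiveContext called sqlContext, as was standard in Spark at the time; the property value 100 is just an illustrative number):

```sql
-- Run through sqlContext.sql(...) before the Hive query;
-- hints to Hive/Hadoop how many map tasks (and hence input
-- partitions) to aim for. It is a hint, not a hard guarantee.
SET mapred.map.tasks=100;
```

Whether the requested number is honored depends on the InputFormat; for non-splittable input (e.g. gzip files) it will be ignored.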
Thanks,
Yin

On Thu, Oct 2, 2014 at 10:33 AM, Tamas Jambor <jambo...@gmail.com> wrote:
> That would work - I normally use Hive queries through Spark SQL; I
> have not seen an option like that there.
>
> On Thu, Oct 2, 2014 at 3:13 PM, Ashish Jain <ashish....@gmail.com> wrote:
> > If you are using textFile() to read data in, it also takes a parameter
> > for the minimum number of partitions to create. Would that not work
> > for you?
> >
> > On Oct 2, 2014 7:00 AM, "jamborta" <jambo...@gmail.com> wrote:
> >>
> >> Hi all,
> >>
> >> I have been testing repartitioning to ensure that my algorithms get a
> >> similar amount of data.
> >>
> >> I noticed that repartitioning is very expensive. Is there a way to
> >> force Spark to create a certain number of partitions when the data is
> >> read in? How does it decide on the partition size initially?
> >>
> >> Thanks,
> >>
> >> --
> >> View this message in context:
> >> http://apache-spark-user-list.1001560.n3.nabble.com/partition-size-for-initial-read-tp15603.html
> >> Sent from the Apache Spark User List mailing list archive at Nabble.com.
> >>
> >> ---------------------------------------------------------------------
> >> To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
> >> For additional commands, e-mail: user-h...@spark.apache.org
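On the original question of how the initial partition count is decided: sc.textFile(path, minPartitions) delegates to Hadoop's FileInputFormat, which sizes splits from the file size, the HDFS block size, and the requested minimum number of splits. The following is a simplified model of that calculation (it assumes a single uncompressed file and the default minimum split size of 1 byte; real InputFormats add more corner cases):

```python
import math

def num_splits(total_size, block_size, requested_min_splits=1):
    """Approximate the number of input splits (and hence initial RDD
    partitions) that Hadoop's FileInputFormat would produce for one
    uncompressed file. Simplified sketch, not the exact Hadoop code."""
    # Hadoop computes a "goal size" by dividing the data evenly
    # across the requested number of splits...
    goal_size = total_size // max(requested_min_splits, 1)
    # ...then caps each split at the block size, so splits never
    # straddle HDFS block boundaries more than necessary.
    split_size = max(1, min(goal_size, block_size))
    return math.ceil(total_size / split_size)

# A 1 GB file with 128 MB HDFS blocks yields 8 initial partitions.
print(num_splits(1 << 30, 128 << 20))      # 8
# Requesting at least 32 partitions shrinks the split size instead.
print(num_splits(1 << 30, 128 << 20, 32))  # 32
```

This is why passing minPartitions to textFile() is much cheaper than calling repartition() afterwards: the extra partitions are created at read time by shrinking the split size, with no shuffle.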