I'm not sure exactly what you're trying to do, but take a look at
rdd.toLocalIterator if you haven't already.
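
For what it's worth, toLocalIterator brings the data to the driver one partition at a time, so the driver only needs enough memory for the largest partition rather than the whole data set. Here is a rough sketch of how that might be used from Python. The names process_in_batches, handle_batch and batch_size are my own, and FakeRDD is only a hypothetical stand-in so the snippet runs without a cluster; with a real RDD you would pass the RDD itself.

```python
from itertools import islice


def process_in_batches(rdd, handle_batch, batch_size=10000):
    """Stream elements to the driver and pass them to handle_batch in chunks.

    Only relies on rdd.toLocalIterator(), which fetches one partition
    at a time instead of materializing everything like collect().
    """
    it = rdd.toLocalIterator()
    total = 0
    while True:
        # Pull at most batch_size elements from the iterator.
        batch = list(islice(it, batch_size))
        if not batch:
            break
        handle_batch(batch)
        total += len(batch)
    return total


class FakeRDD:
    """Hypothetical stand-in for a Spark RDD (not part of Spark)."""

    def __init__(self, partitions):
        self.partitions = partitions

    def toLocalIterator(self):
        # Yield elements partition by partition, as the real method does.
        for part in self.partitions:
            for x in part:
                yield x


sizes = []
count = process_in_batches(
    FakeRDD([[1, 2, 3], [4, 5], [6]]),
    lambda batch: sizes.append(len(batch)),
    batch_size=4,
)
```

Note that this still moves every element through the driver. If you only need a small result per partition, computing it on the executors first and collecting just that is likely cheaper.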

On Tue, Dec 30, 2014 at 6:16 AM, Sean Owen <so...@cloudera.com> wrote:

> collect()-ing a partition still implies copying it to the driver, but
> you're suggesting you can't collect() the whole data set to the
> driver. What do you mean: collect() 1 partition? or collect() some
> smaller result from each partition?
>
> On Tue, Dec 30, 2014 at 11:54 AM, DEVAN M.S. <msdeva...@gmail.com> wrote:
> > Hi all,
> > I have one large data set. When I check the number of partitions, it
> > shows 43.
> > We can't collect() the whole data set into memory, so I am thinking of
> > calling collect() on each partition instead, so that each result is small.
> >
> > Any thoughts?
> >