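I'm not sure exactly what you're trying to do, but take a look at rdd.toLocalIterator if you haven't already. It gives you an iterator over the whole RDD on the driver while fetching only one partition at a time, so the driver needs memory for roughly the largest partition rather than the full data set.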
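
In case it helps, here is a rough sketch of that idea in Python. It assumes only that the object you pass in has a toLocalIterator() method, as PySpark RDDs do; process_in_batches, handle_batch and batch_size are names I made up for illustration, not part of Spark.

```python
from itertools import islice


def process_in_batches(rdd, handle_batch, batch_size=1000):
    """Stream an RDD to the driver and process it in bounded chunks.

    Hypothetical helper: `rdd` is anything exposing toLocalIterator()
    (for example a PySpark RDD). Elements are pulled lazily, so the
    whole data set is never materialised on the driver at once.
    """
    rows = iter(rdd.toLocalIterator())
    total = 0
    while True:
        # Take at most batch_size elements from the stream.
        batch = list(islice(rows, batch_size))
        if not batch:
            break
        handle_batch(batch)
        total += len(batch)
    return total
```

With a real RDD you would call something like process_in_batches(my_rdd, write_to_disk), where write_to_disk is whatever you want done with each chunk. Note that each partition is still copied to the driver in turn, so this only helps if a single partition fits in driver memory.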

On Tue, Dec 30, 2014 at 6:16 AM, Sean Owen <so...@cloudera.com> wrote:
> collect()-ing a partition still implies copying it to the driver, but
> you're suggesting you can't collect() the whole data set to the
> driver. What do you mean: collect() 1 partition? or collect() some
> smaller result from each partition?
>
> On Tue, Dec 30, 2014 at 11:54 AM, DEVAN M.S. <msdeva...@gmail.com> wrote:
> > Hi all,
> > i have one large data-set. when i am getting the number of partitions its
> > showing 43.
> > We can't collect() the large data-set in to memory so i am thinking like
> > this, collect() each partitions so that it will be small in size.
> >
> > Any thoughts ?