What's your use case and what are you trying to achieve? Maybe there's a better way of doing it.
Thanks
Best Regards

On Fri, May 8, 2015 at 10:20 AM, Richard Alex Hofer <rho...@andrew.cmu.edu> wrote:
> Hi,
> I'm working on a project in Spark and am trying to understand what's going
> on. Right now to try and understand what's happening we came up with this
> snippet of code which very roughly resembles what we're actually doing.
> When trying to run this our master node ends up quickly using up its memory
> even though all of our RDDs are very small. Can someone explain what's
> going on here and how we can avoid it?
>
>     a = sc.parallelize(xrange(100), 10)
>     b = a
>
>     for i in xrange(100000):
>         a = a.map(lambda x: x + 1)
>         if i % 300 == 0:
>             # We do this to try and force some of our RDD to evaluate
>             a.persist()
>             a.foreachPartition(lambda _: None)
>             b.unpersist()
>             b = a
>     a.collect()
>     b.unpersist()
>
> -Richard Hofer
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
> For additional commands, e-mail: user-h...@spark.apache.org
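One likely explanation (an assumption on my part, not confirmed by the thread): each `.map()` call appends a step to the RDD's lineage, and `persist()` caches the *data* without shortening the recorded plan, so after 100,000 iterations the driver holds a 100,000-deep chain of RDD objects. A plain-Python analogy of that growth, with no Spark required:

```python
# Hedged sketch: a plain-Python analogy for RDD lineage growth.
# Each Spark .map() wraps the previous RDD in a new one, much like each
# composed function below closes over the previous one. Persisting the
# data does not discard the chain of wrapper objects on the driver.

def build_chain(n):
    """Compose n increment steps, mimicking n chained map() calls."""
    f = lambda x: x
    for _ in range(n):
        prev = f
        # Each step holds a reference to the one before it, so memory
        # (and call depth) grows linearly with n -- like RDD lineage.
        f = lambda x, prev=prev: prev(x) + 1
    return f

chain = build_chain(500)
print(chain(0))  # evaluates 500 nested closures
```

In Spark the usual remedy is to truncate the lineage periodically, e.g. by calling `RDD.checkpoint()` (after `SparkContext.setCheckpointDir(...)`) on the persisted RDD, so the driver can drop the chain of parent RDDs instead of retaining all 100,000 of them.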