What's your use case, and what are you trying to achieve? Maybe there's a
better way of doing it.
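
That said, if the real job is a long chain of transformations like the
snippet below, the usual suspect is the RDD lineage: every map() adds a
node to the DAG that the driver has to track, so 100,000 chained maps can
exhaust driver memory even when the data itself is tiny. persist() caches
the data but does not truncate the lineage; checkpoint() does. A minimal
sketch of what I mean, assuming a reachable checkpoint directory (the path
below is just a placeholder):

sc.setCheckpointDir("/tmp/spark-checkpoints")  # placeholder path

a = sc.parallelize(xrange(100), 10)
b = a
for i in xrange(100000):
    a = a.map(lambda x: x + 1)
    if i % 300 == 0:
        a.persist()        # cache first so checkpointing doesn't recompute the chain
        a.checkpoint()     # lineage is dropped once the RDD is materialized
        a.foreachPartition(lambda _: None)  # action to force materialization now
        b.unpersist()      # release the previously cached RDD
        b = a
a.collect()
b.unpersist()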

Thanks
Best Regards

On Fri, May 8, 2015 at 10:20 AM, Richard Alex Hofer <rho...@andrew.cmu.edu>
wrote:

> Hi,
> I'm working on a project in Spark and am trying to understand what's
> going on. To investigate, we came up with this snippet of code, which
> very roughly resembles what we're actually doing. When we run it, our
> master node quickly uses up its memory even though all of our RDDs are
> very small. Can someone explain what's going on here and how we can
> avoid it?
>
> a = sc.parallelize(xrange(100), 10)
> b = a
>
> for i in xrange(100000):
>     a = a.map(lambda x: x + 1)
>     if i % 300 == 0:
>         # We do this to try and force some of our RDD to evaluate
>         a.persist()
>         a.foreachPartition(lambda _: None)
>         b.unpersist()
>         b = a
> a.collect()
> b.unpersist()
>
> -Richard Hofer
>
