Have you set the persistence level of the RDD to MEMORY_ONLY_SER
(http://spark.apache.org/docs/latest/programming-guide.html#rdd-persistence)?
spark.rdd.compress only compresses serialized partitions, so if you're just
calling cache(), the default persistence level is MEMORY_ONLY (deserialized
objects) and that setting will have no effect.
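
For example, something along these lines in the shell should let the
compression setting take effect (rough sketch; the input path is just a
placeholder and `sc` is the shell's SparkContext):

scala> import org.apache.spark.storage.StorageLevel
scala> val rdd = sc.textFile("hdfs://...")        // placeholder input
scala> rdd.persist(StorageLevel.MEMORY_ONLY_SER)  // serialized, so spark.rdd.compress applies
scala> rdd.count()                                // materialize the cache

Afterwards the "Size in Memory" column on the Storage page should reflect
the compressed size. Note that you may need to unpersist first, since Spark
won't change the storage level of an RDD that has already been cached.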


On Thu, Jun 5, 2014 at 4:41 PM, Xu (Simon) Chen <xche...@gmail.com> wrote:

> I have a working set larger than available memory, thus I am hoping to
> turn on rdd compression so that I can store more in-memory. Strangely it
> made no difference. The number of cached partitions, fraction cached, and
> size in memory remain the same. Any ideas?
>
> I confirmed that rdd compression wasn't on before and it was on for the
> second test.
>
> scala> sc.getConf.getAll foreach println
> ...
> (spark.rdd.compress,true)
> ...
>
> I haven't tried lzo vs snappy, but my guess is that either one should
> provide at least some benefit...
>
> Thanks.
> -Simon
>
>
