Have you set the persistence level of the RDD to MEMORY_ONLY_SER (http://spark.apache.org/docs/latest/programming-guide.html#rdd-persistence)? If you're calling cache(), the default persistence level is MEMORY_ONLY, so spark.rdd.compress will have no effect, since compression only applies to serialized storage levels.
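As a minimal sketch (assuming the Spark 1.x Scala API, with a hypothetical input path), persisting explicitly with MEMORY_ONLY_SER instead of cache() is what lets spark.rdd.compress kick in:

import org.apache.spark.storage.StorageLevel

// hypothetical input; substitute your own data source
val rdd = sc.textFile("hdfs:///path/to/data")

// serialized in-memory storage; spark.rdd.compress applies to this level
rdd.persist(StorageLevel.MEMORY_ONLY_SER)

// force materialization so the cached size shows up in the UI
rdd.count()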
On Thu, Jun 5, 2014 at 4:41 PM, Xu (Simon) Chen <xche...@gmail.com> wrote:
> I have a working set larger than available memory, so I am hoping to
> turn on RDD compression to store more in memory. Strangely, it made no
> difference: the number of cached partitions, fraction cached, and size
> in memory remain the same. Any ideas?
>
> I confirmed that RDD compression wasn't on before and was on for the
> second test.
>
> scala> sc.getConf.getAll foreach println
> ...
> (spark.rdd.compress,true)
> ...
>
> I haven't tried lzo vs snappy, but my guess is that either one should
> provide at least some benefit.
>
> Thanks.
> -Simon