On Fri, Aug 1, 2014 at 12:39 PM, Sean Owen <so...@cloudera.com> wrote:
> Isn't this your worker running out of its memory for computations,
> rather than for caching RDDs?

I'm not sure how to interpret the stack trace, but let's say that's true.
I'm seeing this even with something as simple as a = sc.textFile().cache()
followed by a.count(). Spark shouldn't need that much memory for this kind
of work, no?

> then the answer is that you should tell
> it to use less memory for caching.

I can try that. That's done by changing spark.storage.memoryFraction,
right? (Rough sketch of what I mean below my sig.)

This still seems strange, though. The default fraction of the JVM left for
non-cache activity (1 - 0.6 = 40%
<http://spark.apache.org/docs/latest/configuration.html#execution-behavior>)
should be plenty for just counting elements. I'm using m1.xlarge nodes,
which have 15GB of memory apiece.

Nick
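
P.S. In case it helps others following along, here's a minimal sketch of
what I mean by lowering spark.storage.memoryFraction in PySpark. The app
name, the input path, and the 0.3 value are illustrative placeholders, not
from my actual job:

    from pyspark import SparkConf, SparkContext

    # Leave more of the JVM heap for execution by shrinking the cache
    # fraction below its 0.6 default. The 0.3 here is an arbitrary example.
    conf = (SparkConf()
            .setAppName("cache-count-test")  # hypothetical app name
            .set("spark.storage.memoryFraction", "0.3"))

    sc = SparkContext(conf=conf)

    # Same minimal workload as above; the path is a placeholder.
    a = sc.textFile("s3n://some-bucket/some-data").cache()
    print(a.count())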