On Fri, Aug 1, 2014 at 12:39 PM, Sean Owen <so...@cloudera.com> wrote:
> Isn't this your worker running out of its memory for computations,
> rather than for caching RDDs?

I'm not sure how to interpret the stack trace, but let's say that's true.
I'm seeing this even with something as simple as a = sc.textFile().cache()
followed by a.count(). Spark shouldn't need that much memory for this kind
of work, no?

> then the answer is that you should tell
> it to use less memory for caching.

I can try that. That's done by changing spark.storage.memoryFraction,
right? (Rough sketch of what I mean below my sig.)

This still seems strange, though. The default fraction of the JVM left for
non-cache activity (1 - 0.6 = 40%
<http://spark.apache.org/docs/latest/configuration.html#execution-behavior>)
should be plenty for just counting elements. I'm using m1.xlarge nodes,
which have 15GB of memory apiece.

Nick
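
P.S. In case it helps others following along, here's a minimal sketch of
what I mean by lowering spark.storage.memoryFraction in PySpark. The app
name, the input path, and the 0.3 value are illustrative placeholders, not
from my actual job:

    from pyspark import SparkConf, SparkContext

    # Leave more of the JVM heap for execution by shrinking the cache
    # fraction below its 0.6 default. The 0.3 here is an arbitrary example.
    conf = (SparkConf()
            .setAppName("cache-count-test")  # hypothetical app name
            .set("spark.storage.memoryFraction", "0.3"))

    sc = SparkContext(conf=conf)

    # Same minimal workload as above; the path is a placeholder.
    a = sc.textFile("s3n://some-bucket/some-data").cache()
    print(a.count())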