Thanks! Sounds like my rough understanding was roughly right :)
Definitely understand cached RDDs can add to the memory requirements.
Luckily, like you mentioned, you can configure Spark to flush that to disk
and bound its total size in memory via spark.storage.memoryFraction, so I
have a
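As a back-of-envelope check on that bound: under Spark's legacy (pre-1.6) memory model, the memory reserved for cached RDD blocks is roughly the executor heap times spark.storage.memoryFraction (default 0.6) times spark.storage.safetyFraction (default 0.9). The setting names are real Spark configuration keys; the helper function below is just an illustrative sketch, not Spark's actual accounting code.

```python
def storage_memory_bound(executor_heap_bytes,
                         memory_fraction=0.6,   # spark.storage.memoryFraction default
                         safety_fraction=0.9):  # spark.storage.safetyFraction default
    """Rough upper bound on bytes used for cached RDD blocks."""
    return executor_heap_bytes * memory_fraction * safety_fraction

# A 4 GiB executor heap caps cached RDDs at roughly 2.16 GiB by default.
heap = 4 * 1024**3
print(storage_memory_bound(heap))
```

Blocks beyond this bound are evicted (or spilled to disk, if the RDD's storage level allows it), which is what makes the cache's memory footprint bounded rather than proportional to the data set.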
I'm trying to determine how to bound my memory use in a job working with
more data than can simultaneously fit in RAM. From reading the tuning
guide, my impression is that Spark's memory usage is roughly the following:
(A) in-memory RDD use + (B) in-memory shuffle use + (C) transient memory
use
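The A + B + C breakdown above can be sketched as a simple budget. The two fractions mirror real legacy Spark settings (spark.storage.memoryFraction, default 0.6, and spark.shuffle.memoryFraction, default 0.2); "transient" here is just whatever heap remains for task objects, deserialization buffers, and so on. This is an illustrative model of the accounting, not Spark's implementation.

```python
def memory_budget(heap_bytes, storage_fraction=0.6, shuffle_fraction=0.2):
    """Split an executor heap into the three rough categories."""
    storage = heap_bytes * storage_fraction      # (A) cached RDD blocks
    shuffle = heap_bytes * shuffle_fraction      # (B) shuffle aggregation buffers
    transient = heap_bytes - storage - shuffle   # (C) everything else
    return {"storage": storage, "shuffle": shuffle, "transient": transient}

budget = memory_budget(8 * 1024**3)
for category, size in budget.items():
    print(category, size / 1024**3, "GiB")
```

With the defaults, only about 20% of the heap is left for (C), which is one reason transient per-task memory (large deserialized records, big groupByKey values) is often what actually causes OOMs rather than the cache.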
Keith, do you mean bound as in (a) strictly control to some quantifiable
limit, or (b) try to minimize the amount used by each task?
If (a), then that is outside the scope of Spark's memory management, which
you should think of as an application-level (that is, above-JVM) mechanism.
In this scope,
A dash of both. I want to know enough that I can reason about, rather
than strictly control, the amount of memory Spark will use. If I have a
big data set, I want to understand how I can design the job so that Spark's
memory consumption stays below my available resources. Or alternatively,
if it's