Short answer: yes. Take a look at:
http://spark.apache.org/docs/latest/running-on-yarn.html

Look for "memoryOverhead".
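
In your case that would mean leaving the JVM heap where it is and asking YARN for extra headroom on top of it. A minimal sketch, assuming a Spark 1.2-era build on YARN and purely illustrative sizes (the overhead properties take megabytes; you can also pass them with --conf on spark-submit):

    import org.apache.spark.{SparkConf, SparkContext}

    // Illustrative values only: keep the 8 GB executor heap, but tell YARN to
    // size the container with extra headroom for native allocations
    // (zlib, NIO buffers, etc.) that the GC never sees.
    val conf = new SparkConf()
      .setAppName("matlab-export")  // hypothetical app name
      .set("spark.executor.memory", "8g")
      .set("spark.yarn.executor.memoryOverhead", "2048")
      .set("spark.yarn.driver.memoryOverhead", "1024")
    val sc = new SparkContext(conf)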
On Mon, Jan 12, 2015 at 2:06 PM, Michael Albert
<m_albert...@yahoo.com.invalid> wrote:
> Greetings!
>
> My executors apparently are being terminated because they are
> "running beyond physical memory limits" according to the
> "yarn-hadoop-nodemanager" logs on the worker nodes
> (/mnt/var/log/hadoop on AWS EMR).
> I'm setting the "driver-memory" to 8G.
> However, looking at "stdout" in userlogs, I can see GC going on, but the
> lines look like "6G -> 5G(7.2G), 0.45secs", so the GC seems to think that
> the process is using about 6G of space, not 8G.
> However, "ps aux" shows an RSS hovering just below 8G.
>
> The process does a "mapPartitionsWithIndex", and it uses compression,
> which (I believe) calls into the native zlib library
> (the overall purpose is to convert each partition into a "matlab" file).
>
> Could it be that the YARN container is counting both the memory used by
> the JVM proper and the memory used by zlib, but that the GC only "sees"
> the "internal" memory? So the GC keeps the memory usage "reasonable",
> e.g., 6G in an 8G container, but then zlib grabs some memory, and the
> YARN container then terminates the task?
>
> If so, is there anything I can do to tell YARN to watch for a larger
> memory limit than I tell the JVM to use for its memory?
>
> Thanks!
>
> Sincerely,
> Mike
>

--
Marcelo
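
(For anyone hitting this later: the pattern Michael describes, compressing each partition through java.util.zip, which delegates to native zlib, might look roughly like the sketch below. The Deflater's internal state is allocated outside the JVM heap, so it never shows up in the GC log, yet it counts toward the process RSS that YARN's physical-memory check enforces. Names, data, and sizes here are made up for illustration, not taken from the original job.)

    import java.io.ByteArrayOutputStream
    import java.util.zip.{Deflater, DeflaterOutputStream}
    import org.apache.spark.{SparkConf, SparkContext}

    object NativeZlibSketch {
      def main(args: Array[String]): Unit = {
        val sc = new SparkContext(new SparkConf().setAppName("native-zlib-sketch"))
        val rdd = sc.parallelize(1 to 1000000, 8)  // hypothetical stand-in data

        // Compress each partition with java.util.zip, which calls into native
        // zlib: the Deflater's internal state lives off the JVM heap, so the
        // GC log under-reports what the container is actually using.
        val sizes = rdd.mapPartitionsWithIndex { (idx, iter) =>
          val deflater = new Deflater(Deflater.BEST_SPEED)
          val buf = new ByteArrayOutputStream()
          val out = new DeflaterOutputStream(buf, deflater)
          iter.foreach(n => out.write(n.toString.getBytes("UTF-8")))
          out.close()
          deflater.end()  // explicitly frees the native zlib state
          Iterator((idx, buf.size()))
        }
        println(sizes.collect().mkString(", "))
        sc.stop()
      }
    }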