Hey,

We're moving from Spark 0.7.x to Spark 0.8 (or master actually, to get the latest shuffle improvements), and after some false starts getting our EMR scripts to start our slaves with a ulimit higher than 4,000 (sudo -i -u hadoop ... FYI), we're seeing some OOMEs.
I took a heap dump, and it shows 14,000 DiskBlockObjectWriters, stored in two 7,000-element arrays of BlockObjectWriters in the ShuffleWriterGroup. Each DBOWriter seems to hold 300 KB of buffers from its ning LZFOutputStream.

We run with modest slaves: 4 GB of RAM allocated to Spark and ~2 GB left for the OS/worker (which is probably conservative), so:

14,000 * 0.3 MB = 4,200 MB = basically all of our heap

Note that I don't believe there is a leak here--IIRC we have 2 threads per backend, so each of them running a shuffle (two active ShuffleWriterGroups) is expected.

So, my question is: is this expected? With the shuffle improvements, is there now a base level of RAM required for Spark that is >4 GB? I'll bump up to the next instance size and try that, but I'm just curious as to others' thoughts. I can send the heap dump around offline if requested (it's 120 MB compressed).

- Stephen
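
P.S. For reference, here is the back-of-envelope arithmetic above as a quick Scala sketch (the figures come from my heap dump; the 300 KB per-writer buffer size is just what I observed for the ning LZFOutputStream, not a documented constant):

    // Rough estimate of memory held by open shuffle writers on one slave.
    val activeShuffleGroups = 2     // two task threads, each with an active ShuffleWriterGroup
    val writersPerGroup     = 7000  // one DiskBlockObjectWriter per reduce bucket
    val bufferPerWriterMB   = 0.3   // observed LZF buffer overhead per writer (heap dump)
    val totalMB = activeShuffleGroups * writersPerGroup * bufferPerWriterMB
    // ~4,200 MB, i.e. essentially our whole 4 GB Spark heap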