Hey,

We're moving from Spark 0.7.x to Spark 0.8 (or master, actually, to get
the latest shuffle improvements), and after some false starts getting
our EMR scripts to start our slaves with a ulimit higher than 4,000
(via sudo -i -u hadoop ..., FYI), we're seeing some OOMEs.

I took a heap dump, and it's dominated by 14,000 DiskBlockObjectWriters,
which are stored in two 7,000-element arrays of BlockObjectWriters in
the ShuffleWriterGroups. Each DiskBlockObjectWriter seems to hold ~300k
of buffers from its Ning LZFOutputStream.
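
In case it helps picture it, here's a rough sketch of the shape I'm seeing
in the dump (BucketWriter/WriterGroup are made-up names, not Spark's actual
classes, and I'm assuming the Ning stream allocates its chunk buffers up
front): one writer per reduce bucket per running shuffle, each wrapping a
compressed stream that carries its own buffers:

    import java.io.{File, FileOutputStream}
    import com.ning.compress.lzf.LZFOutputStream

    // One writer per reduce bucket; the LZFOutputStream brings its own
    // chunk buffers, so each writer pins a fixed slice of heap even
    // before much data has been written through it.
    class BucketWriter(file: File) {
      private val out = new LZFOutputStream(new FileOutputStream(file))
      def write(bytes: Array[Byte]) { out.write(bytes) }
      def close() { out.close() }
    }

    // One group per concurrently running shuffle: numBuckets writers each,
    // so buffer memory scales with (concurrent shuffles) x (reduce buckets).
    // (Each bucket is also an open file, which I assume is what the ulimit
    // bump was for.)
    class WriterGroup(dir: File, numBuckets: Int) {
      val writers: Array[BucketWriter] =
        Array.tabulate(numBuckets)(i => new BucketWriter(new File(dir, "bucket-" + i)))
      def closeAll() { writers.foreach(_.close()) }
    }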

We run with modest slaves: 4 GB of RAM allocated to Spark, ~2 GB left for
the OS/worker (which is probably conservative), so:

    14,000 * ~0.3 MB = ~4,200 MB = basically all of our heap
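
Or, if anyone wants to poke at the numbers, the same arithmetic as a tiny
standalone Scala snippet (the 2 / 7,000 / 300k inputs are just what I read
off the heap dump, not anything Spark reports directly):

    // Back-of-the-envelope check with the numbers from the heap dump.
    object ShuffleBufferEstimate extends App {
      val concurrentShuffles = 2      // two active writer groups (2 task threads)
      val bucketsPerShuffle  = 7000   // writers per group, per the heap dump
      val perWriterKb        = 300    // ~300k of LZF buffers per writer

      val writers = concurrentShuffles * bucketsPerShuffle   // 14,000
      val totalMb = writers * perWriterKb / 1000              // ~4,200
      println(writers + " writers * ~" + perWriterKb + "k each = ~" +
        totalMb + " MB of compression buffers, vs a 4 GB heap")
    }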

Note that I don't believe there is a leak here--IIRC we have 2 threads
per backend, so each of them running a shuffle (two active
ShuffleWriterGroups) is expected.

So, my question is: is this expected? With the shuffle improvements, is
there now a base level of RAM required for Spark that's >4 GB?

I'll bump up to the next instance size and try that, but I'm curious to
hear others' thoughts. I can send around the heap dump offline if
requested (it's 120 MB compressed).

- Stephen
