Hey,

We're moving from Spark 0.7.x to Spark 0.8 (or master actually, to get the latest shuffle improvements), and after some false starts getting our EMR scripts to start our slaves with a ulimit higher than 4,000 (sudo -i -u hadoop ... FYI), we're seeing some OOMEs.
I took a heap dump, and it shows 14,000 DiskBlockObjectWriters, stored in two 7,000-element arrays of BlockObjectWriters in the ShuffleWriterGroup. Each DBOWriter seems to hold 300 KB of buffers from its ning LZFOutputStream.

We run with modest slaves: 4 GB of RAM allocated to Spark and ~2 GB left for the OS/worker (which is probably conservative), so:

14,000 * 0.3 MB = 4,200 MB = basically all of our heap

Note that I don't believe there is a leak here--IIRC we have 2 threads per backend, so each of them running a shuffle (two active ShuffleWriterGroups) is expected.

So, my question is: is this expected? With the shuffle improvements, is there now a base level of RAM required for Spark that is >4 GB? I'll bump up to the next instance size and try that, but I'm just curious as to others' thoughts. I can send the heap dump around offline if requested (it's 120 MB compressed).

- Stephen
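
P.S. For reference, here is the back-of-envelope arithmetic above as a quick Scala sketch (the figures come from my heap dump; the 300 KB per-writer buffer size is just what I observed for the ning LZFOutputStream, not a documented constant):

    // Rough estimate of memory held by open shuffle writers on one slave.
    val activeShuffleGroups = 2     // two task threads, each with an active ShuffleWriterGroup
    val writersPerGroup     = 7000  // one DiskBlockObjectWriter per reduce bucket
    val bufferPerWriterMB   = 0.3   // observed LZF buffer overhead per writer (heap dump)
    val totalMB = activeShuffleGroups * writersPerGroup * bufferPerWriterMB
    // ~4,200 MB, i.e. essentially our whole 4 GB Spark heap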