Snappy sounds like it'd be a better solution here. LZF requires a pretty sizeable buffer per stream (accounting for the 300k you're seeing). It looks like you have 7000 reducers, and each one requires an LZF-compressed stream. Snappy has a much lower overhead per stream, so I'd give it a try.
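In 0.8 the codec is selected with the spark.io.compression.codec property, so switching is just a config change. A minimal sketch of what that looks like (set the properties before the SparkContext is created; the class name and the Snappy block-size property are what I believe the 0.8 config uses, and the master URL/app name are placeholders, so double-check against your version's docs):

    import org.apache.spark.SparkContext

    // Sketch: pick Snappy for shuffle/IO compression in Spark 0.8.
    // These are plain Java system properties and must be set before
    // the SparkContext is constructed.
    System.setProperty("spark.io.compression.codec",
      "org.apache.spark.io.SnappyCompressionCodec")

    // Snappy's per-stream buffer (default 32768 bytes) can also be
    // shrunk to cut the per-DiskBlockObjectWriter overhead further:
    System.setProperty("spark.io.compression.snappy.block.size", "8192")

    val sc = new SparkContext("spark://master:7077", "shuffle-test")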
Thanks, by the way, for the very detailed problem description and trying it on Spark 0.8! Hopefully we can get this issue resolved.

On Fri, Oct 25, 2013 at 8:30 AM, Stephen Haberman <stephen.haber...@gmail.com> wrote:

> > We're moving from Spark 0.7.x to Spark 0.8 (or master actually, to get
> > the latest shuffle improvement)
>
> FWIW I went back from Spark master to Spark 0.8 and am seeing the same
> behavior: OOMEs because of ~14,000 DiskBlockObjectWriters with 300k of
> memory each.
>
> So, it is not a change from the recent shuffle patch.
>
> - Stephen