FYI, after further troubleshooting I am logging this as https://issues.apache.org/jira/browse/SPARK-12511
On Tuesday, 22 December 2015, 18:16, Antony Mayi <antonym...@yahoo.com> wrote:

I narrowed it down to the problem described, for example, here:
https://bugs.openjdk.java.net/browse/JDK-6293787

It is the mass finalization of zip Inflater/Deflater objects, which cannot keep up with the rate at which these instances are being garbage collected. As the JDK bug report (which was not accepted as a bug) suggests, this comes down to suboptimal destruction of the instances. I am not sure where the zip usage comes from: for all the compressors used in Spark I was using the default Snappy codec. I am now trying with all the spark.*.compress options disabled, and so far this seems to have improved things dramatically: finalization appears to be keeping up and the heap is stable. Any input is still welcome!

On Tuesday, 22 December 2015, 12:17, Ted Yu <yuzhih...@gmail.com> wrote:

This might be related, but the jmap output there looks different:
http://stackoverflow.com/questions/32537965/huge-number-of-io-netty-buffer-poolthreadcachememoryregioncacheentry-instances

On Tue, Dec 22, 2015 at 2:59 AM, Antony Mayi <antonym...@yahoo.com.invalid> wrote:

I have a streaming app (PySpark 1.5.2 on YARN) that is crashing due to an OOM in the driver (the JVM part, not the Python part); no matter how big a heap is assigned, it eventually runs out. When I check the heap, it is all taken up by byte-array ([B) items of io.netty.buffer.PoolThreadCache. The number of io.netty.buffer.PoolThreadCache$MemoryRegionCache$Entry instances stays constant, yet the number of [B instances keeps growing, as does the number of Finalizer instances.
When I check the Finalizer instances, they all belong to ZipFile$ZipFileInputStream and ZipFile$ZipFileInflaterInputStream.

 num     #instances         #bytes  class name
----------------------------------------------
   1:        123556      278723776  [B
   2:        258988       10359520  java.lang.ref.Finalizer
   3:        174620        9778720  java.util.zip.Deflater
   4:         66684        7468608  org.apache.spark.executor.TaskMetrics
   5:         80070        7160112  [C
   6:        282624        6782976  io.netty.buffer.PoolThreadCache$MemoryRegionCache$Entry
   7:        206371        4952904  java.lang.Long

The platform is using Netty 3.6.6 and OpenJDK 1.8 (I tried 1.7 as well, with the same issue). Would anyone have a clue how to troubleshoot further? Thx.
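For reference on the finalization backlog identified in the follow-up message above: the Java SE 8 Javadoc for java.util.zip.Deflater says end() should be called when the compressor is no longer being used, and that finalize() will otherwise call it automatically, which is the slow path that can fall behind. The sketch below only illustrates releasing the native zlib state explicitly with end() in a finally block; it is not Spark's code and does not identify where Spark creates these objects. The class name ZipCodecSketch and its methods are hypothetical.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.zip.DataFormatException;
import java.util.zip.Deflater;
import java.util.zip.Inflater;

// Hypothetical helper: compresses/decompresses a byte[] and frees the
// Deflater/Inflater native memory with end() instead of leaving it to finalization.
public class ZipCodecSketch {

    static byte[] compress(byte[] input) {
        Deflater deflater = new Deflater();
        try {
            deflater.setInput(input);
            deflater.finish(); // no more input will follow
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            byte[] buf = new byte[4096];
            while (!deflater.finished()) {
                int n = deflater.deflate(buf);
                out.write(buf, 0, n);
            }
            return out.toByteArray();
        } finally {
            deflater.end(); // release native zlib state deterministically
        }
    }

    static byte[] decompress(byte[] input) throws DataFormatException {
        Inflater inflater = new Inflater();
        try {
            inflater.setInput(input);
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            byte[] buf = new byte[4096];
            while (!inflater.finished()) {
                int n = inflater.inflate(buf);
                out.write(buf, 0, n);
                // No progress and not finished: input is truncated or needs a preset dictionary.
                if (n == 0 && !inflater.finished()
                        && (inflater.needsInput() || inflater.needsDictionary())) {
                    throw new DataFormatException("truncated or dictionary-compressed input");
                }
            }
            return out.toByteArray();
        } finally {
            inflater.end(); // same explicit release on the decompression side
        }
    }

    public static void main(String[] args) throws DataFormatException {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < 100; i++) {
            sb.append("spark streaming driver heap ");
        }
        byte[] original = sb.toString().getBytes(StandardCharsets.UTF_8);
        byte[] packed = compress(original);
        byte[] unpacked = decompress(packed);
        System.out.println("original=" + original.length + " compressed=" + packed.length
                + " roundTripOk=" + Arrays.equals(original, unpacked));
    }
}
```

The same idea applies to the ZipFile input streams seen among the Finalizer referents: closing them explicitly (for example with try-with-resources) releases their resources without waiting for the finalizer thread.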