FYI: after further troubleshooting, I have logged this as
https://issues.apache.org/jira/browse/SPARK-12511 

    On Tuesday, 22 December 2015, 18:16, Antony Mayi <antonym...@yahoo.com> 
wrote:
 
 

 I narrowed it down to the problem described, for example, here:
https://bugs.openjdk.java.net/browse/JDK-6293787
It is the mass finalization of zip Inflater/Deflater objects, which can't keep
up with the rate at which these instances are being garbage collected. As the
JDK bug report (which was not accepted as a bug) suggests, this comes down to
suboptimal destruction of the instances.
I am not sure where the zip usage comes from: for all the compressors used in
Spark I was using the default snappy codec.
I am now trying with all the spark.*.compress options disabled, and so far the
situation seems dramatically improved: finalization looks to be keeping up and
the heap is stable.
Any input is still welcome! 
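
For reference, a minimal sketch of how the spark.*.compress switches mentioned above could be turned off in one place. The list of keys is an assumption (these are the boolean compression settings documented for Spark 1.5; the message does not say exactly which ones were changed), and both helper names are hypothetical.

```python
# Assumed set of boolean compression switches from the Spark 1.5 configuration
# docs; the original message does not list which ones were actually disabled.
COMPRESS_KEYS = [
    "spark.broadcast.compress",
    "spark.rdd.compress",
    "spark.shuffle.compress",
    "spark.shuffle.spill.compress",
    "spark.eventLog.compress",
]


def compression_disabled_conf(keys=COMPRESS_KEYS):
    """Return (key, value) pairs that set every given switch to "false"."""
    return [(key, "false") for key in keys]


def as_submit_args(pairs):
    """Render the pairs as spark-submit command-line arguments."""
    args = []
    for key, value in pairs:
        args.extend(["--conf", "{}={}".format(key, value)])
    return args


if __name__ == "__main__":
    # Prints the flags to append to a spark-submit invocation.
    print(" ".join(as_submit_args(compression_disabled_conf())))
```

In a pyspark driver the same pairs could instead be passed to SparkConf().setAll(...) before the context is created.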

    On Tuesday, 22 December 2015, 12:17, Ted Yu <yuzhih...@gmail.com> wrote:
 
 

 This might be related, but the jmap output there looks different:
http://stackoverflow.com/questions/32537965/huge-number-of-io-netty-buffer-poolthreadcachememoryregioncacheentry-instances

On Tue, Dec 22, 2015 at 2:59 AM, Antony Mayi <antonym...@yahoo.com.invalid> 
wrote:

I have a streaming app (pyspark 1.5.2 on YARN) that is crashing due to driver
OOM (the JVM part, not Python); no matter how big a heap is assigned, it
eventually runs out.
When checking the heap, it is all taken by "byte" items of
io.netty.buffer.PoolThreadCache. The number of
io.netty.buffer.PoolThreadCache$MemoryRegionCache$Entry instances is constant,
yet the number of [B "bytes" keeps growing, as does the number of Finalizer
instances. When checking the Finalizer instances, they are all for
ZipFile$ZipFileInputStream and ZipFile$ZipFileInflaterInputStream.
 num     #instances         #bytes  class name
----------------------------------------------
   1:        123556      278723776  [B
   2:        258988       10359520  java.lang.ref.Finalizer
   3:        174620        9778720  java.util.zip.Deflater
   4:         66684        7468608  org.apache.spark.executor.TaskMetrics
   5:         80070        7160112  [C
   6:        282624        6782976  io.netty.buffer.PoolThreadCache$MemoryRegionCache$Entry
   7:        206371        4952904  java.lang.Long
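
To see which classes are actually growing, rather than which are merely large, two histograms like the one above (as printed by jmap -histo <pid>) can be compared. The following is only a sketch: it assumes both dumps are available as text, and the function names are hypothetical.

```python
import re

# Matches rows such as "   1:        123556      278723776  [B"
ROW = re.compile(r"^\s*\d+:\s+(\d+)\s+(\d+)\s+(\S+)")


def parse_histo(text):
    """Map class name -> (instances, bytes) for each row of a jmap -histo dump."""
    result = {}
    for line in text.splitlines():
        match = ROW.match(line)
        if match:
            instances, size, name = match.groups()
            result[name] = (int(instances), int(size))
    return result


def growth(before, after):
    """Return (class, instance delta, byte delta), largest byte growth first."""
    deltas = []
    for name, (inst, size) in after.items():
        old_inst, old_size = before.get(name, (0, 0))
        deltas.append((name, inst - old_inst, size - old_size))
    return sorted(deltas, key=lambda d: d[2], reverse=True)
```

Calling growth(parse_histo(first_dump), parse_histo(second_dump)) on dumps taken a few minutes apart lists the fastest-growing classes first.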
The platform is using netty 3.6.6 and OpenJDK 1.8 (I tried 1.7 as well, with
the same issue).
Would anyone have a clue how to troubleshoot this further?
Thanks.



 
   

 
  
