While the driver still has heap space, batches complete well below 15 seconds.

Once it starts to run out of heap space, processing time rises to about 1.5
minutes, scheduling delay to around 4 minutes, and total delay to around 5.5
minutes. I usually shut the application down at that point.

The number of stages (and pending stages) seems quite high and increases over
time.

Id    Description                                   Submitted            Duration  Stages: Succeeded/Total  Tasks: Succeeded/Total
4584  foreachRDD at HDFSPersistence.java:52         2016/05/30 16:23:52  1.9 min   36/36 (4964 skipped)     285/285 (28026 skipped)
4586  transformToPair at SampleCalculator.java:88   2016/05/30 16:25:02  0.2 s     1/1                      4/4
4585  (Unknown Stage Name)                          2016/05/30 16:23:52  1.2 min   1/1                      1/1
4582  (Unknown Stage Name)                          2016/05/30 16:21:51  48 s      1/1 (4063 skipped)       12/12 (22716 skipped)
4583  (Unknown Stage Name)                          2016/05/30 16:21:51  48 s      1/1                      1/1
4580  (Unknown Stage Name)                          2016/05/30 16:16:38  4.0 min   36/36 (4879 skipped)     285/285 (27546 skipped)
4581  (Unknown Stage Name)                          2016/05/30 16:16:38  0.1 s     1/1                      4/4
4579  (Unknown Stage Name)                          2016/05/30 16:15:53  45 s      1/1                      1/1
4578  (Unknown Stage Name)                          2016/05/30 16:14:38  1.3 min   1/1 (3993 skipped)       12/12 (22326 skipped)
4577  (Unknown Stage Name)                          2016/05/30 16:14:37  0.8 s     1/1                      1/1

Is this what you mean by pending stages?

I have taken a few heap dumps, but I'm not sure how to identify the
problematic classes in them.
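
For anyone following along: the histogram in my original post came from
jmap -histo:live against the driver process. Dumps can also be captured
programmatically from inside the driver via the HotSpot diagnostic MXBean;
below is a minimal sketch, assuming a HotSpot JVM (the class name and output
path are illustrative, not part of our actual job):

import java.lang.management.ManagementFactory;
import com.sun.management.HotSpotDiagnosticMXBean;

public class HeapDumper {
    // Writes an .hprof snapshot of the current JVM (call inside the driver).
    public static void dump(String path) throws Exception {
        HotSpotDiagnosticMXBean bean = ManagementFactory.newPlatformMXBeanProxy(
                ManagementFactory.getPlatformMBeanServer(),
                "com.sun.management:type=HotSpotDiagnostic",
                HotSpotDiagnosticMXBean.class);
        // live=true forces a full GC first, so the dump contains only
        // reachable objects and leaked ones stand out from garbage.
        bean.dumpHeap(path, true);
    }
}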

From: Shahbaz <shahzadh...@gmail.com>
Sent: May 30, 2016 3:25 PM
To: Dancuart, Christian
Cc: user
Subject: Re: Spark Streaming heap space out of memory

Hi Christian,


  *   What is the processing time of each of your batches? Is it exceeding 15
seconds? (See the listener sketch below for one way to log this per batch.)
  *   How many jobs are queued?
  *   Can you take a heap dump and see which objects are occupying the heap?
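
One way to capture the per-batch numbers is a StreamingListener registered on
the context, e.g. jssc.ssc().addStreamingListener(new BatchDelayLogger()).
A rough Java sketch follows; the callback set matches Spark 1.x (it has
changed across versions), and BatchDelayLogger is just an illustrative name:

import org.apache.spark.streaming.scheduler.BatchInfo;
import org.apache.spark.streaming.scheduler.StreamingListener;
import org.apache.spark.streaming.scheduler.StreamingListenerBatchCompleted;
import org.apache.spark.streaming.scheduler.StreamingListenerBatchStarted;
import org.apache.spark.streaming.scheduler.StreamingListenerBatchSubmitted;
import org.apache.spark.streaming.scheduler.StreamingListenerReceiverError;
import org.apache.spark.streaming.scheduler.StreamingListenerReceiverStarted;
import org.apache.spark.streaming.scheduler.StreamingListenerReceiverStopped;

// Logs scheduling delay and processing time for every completed batch.
public class BatchDelayLogger implements StreamingListener {
    @Override
    public void onBatchCompleted(StreamingListenerBatchCompleted completed) {
        BatchInfo info = completed.batchInfo();
        // Both delays are scala.Option[Long]; defined once the batch has run.
        if (info.schedulingDelay().isDefined() && info.processingDelay().isDefined()) {
            System.out.println("batch " + info.batchTime()
                    + " schedulingDelay=" + info.schedulingDelay().get() + " ms"
                    + " processingTime=" + info.processingDelay().get() + " ms");
        }
    }

    // Empty stubs: a Java implementation of the Scala trait must supply
    // every callback (Scala 2.10/2.11 trait defaults are not inherited).
    @Override public void onBatchSubmitted(StreamingListenerBatchSubmitted s) { }
    @Override public void onBatchStarted(StreamingListenerBatchStarted s) { }
    @Override public void onReceiverStarted(StreamingListenerReceiverStarted s) { }
    @Override public void onReceiverError(StreamingListenerReceiverError s) { }
    @Override public void onReceiverStopped(StreamingListenerReceiverStopped s) { }
}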

Regards,
Shahbaz


On Tue, May 31, 2016 at 12:21 AM, christian.dancu...@rbc.com wrote:
Hi All,

We have a Spark Streaming v1.4/Java 8 application that slows down and
eventually runs out of heap space. The less driver memory we give it, the
faster it happens.

Appended are our Spark configuration and a snapshot of the heap taken using
jmap on the driver process. The counts of RDDInfo, $colon$colon and [C objects
keep growing as we watch. We also tried G1GC, but it behaves the same way.

Our dependency graph contains multiple updateStateByKey() calls. For each,
we explicitly set the checkpoint interval to 240 seconds.

Our batch interval is set to 15 seconds, and there are no delays at the start
of the process.
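
For concreteness, a stripped-down sketch of the pattern (the socket source,
class name, sum logic, and checkpoint directory are placeholders, not our
actual job; Spark 1.x's Java API uses Guava's Optional for state):

import com.google.common.base.Optional;
import org.apache.spark.SparkConf;
import org.apache.spark.streaming.Durations;
import org.apache.spark.streaming.api.java.JavaPairDStream;
import org.apache.spark.streaming.api.java.JavaStreamingContext;
import scala.Tuple2;
import java.util.List;

public class StateSketch {
    public static void main(String[] args) throws Exception {
        SparkConf conf = new SparkConf().setAppName("state-sketch");
        // 15-second batch interval, as in our job.
        JavaStreamingContext jssc =
                new JavaStreamingContext(conf, Durations.seconds(15));
        jssc.checkpoint("hdfs:///tmp/state-sketch"); // hypothetical path

        // Placeholder source; count occurrences of each line.
        JavaPairDStream<String, Long> counts = jssc
                .socketTextStream("localhost", 9999)
                .mapToPair(line -> new Tuple2<>(line, 1L));

        JavaPairDStream<String, Long> state = counts.updateStateByKey(
                (List<Long> values, Optional<Long> prev) -> {
                    long sum = prev.or(0L);
                    for (Long v : values) sum += v;
                    return Optional.of(sum);
                });

        // Checkpoint this state stream every 240 seconds (a multiple of the
        // 15-second batch interval) instead of the default interval.
        state.checkpoint(Durations.seconds(240));

        state.print();
        jssc.start();
        jssc.awaitTermination();
    }
}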

Spark configuration (Spark Driver Memory: 6GB, Spark Executor Memory: 2GB):
spark.streaming.minRememberDuration=180s
spark.ui.showConsoleProgress=false
spark.streaming.receiver.writeAheadLog.enable=true
spark.streaming.unpersist=true
spark.streaming.stopGracefullyOnShutdown=true
spark.streaming.ui.retainedBatches=10
spark.ui.retainedJobs=10
spark.ui.retainedStages=10
spark.worker.ui.retainedExecutors=10
spark.worker.ui.retainedDrivers=10
spark.sql.ui.retainedExecutions=10
spark.serializer=org.apache.spark.serializer.KryoSerializer
spark.kryoserializer.buffer.max=128m
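
(Equivalently, the same values can be set on the SparkConf before the context
is created; a snippet with the properties above, not our full setup:)

SparkConf conf = new SparkConf()
        .set("spark.streaming.minRememberDuration", "180s")
        .set("spark.ui.showConsoleProgress", "false")
        .set("spark.streaming.receiver.writeAheadLog.enable", "true")
        .set("spark.streaming.unpersist", "true")
        .set("spark.streaming.stopGracefullyOnShutdown", "true")
        .set("spark.streaming.ui.retainedBatches", "10")
        .set("spark.ui.retainedJobs", "10")
        .set("spark.ui.retainedStages", "10")
        .set("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
        .set("spark.kryoserializer.buffer.max", "128m");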

num     #instances         #bytes  class name
----------------------------------------------
   1:       8828200      565004800  org.apache.spark.storage.RDDInfo
   2:      20794893      499077432  scala.collection.immutable.$colon$colon
   3:       9646097      459928736  [C
   4:       9644398      231465552  java.lang.String
   5:      12760625      204170000  java.lang.Integer
   6:         21326      111198632  [B
   7:        556959       44661232  [Lscala.collection.mutable.HashEntry;
   8:       1179788       37753216  java.util.concurrent.ConcurrentHashMap$Node
   9:       1169264       37416448  java.util.Hashtable$Entry
  10:        552707       30951592  org.apache.spark.scheduler.StageInfo
  11:        367107       23084712  [Ljava.lang.Object;
  12:        556948       22277920  scala.collection.mutable.HashMap
  13:          2787       22145568  [Ljava.util.concurrent.ConcurrentHashMap$Node;
  14:        116997       12167688  org.apache.spark.executor.TaskMetrics
  15:        360425        8650200  java.util.concurrent.LinkedBlockingQueue$Node
  16:        360417        8650008  org.apache.spark.deploy.history.yarn.HandleSparkEvent
  17:          8332        8478088  [Ljava.util.Hashtable$Entry;
  18:        351061        8425464  scala.collection.mutable.ArrayBuffer
  19:        116963        8421336  org.apache.spark.scheduler.TaskInfo
  20:        446136        7138176  scala.Some
  21:        211968        5087232  io.netty.buffer.PoolThreadCache$MemoryRegionCache$Entry
  22:        116963        4678520  org.apache.spark.scheduler.SparkListenerTaskEnd
  23:        107679        4307160  org.apache.spark.executor.ShuffleWriteMetrics
  24:         72162        4041072  org.apache.spark.executor.ShuffleReadMetrics
  25:        117223        3751136  scala.collection.mutable.ListBuffer
  26:         81473        3258920  org.apache.spark.executor.InputMetrics
  27:        125903        3021672  org.apache.spark.rdd.RDDOperationScope
  28:         91455        2926560  java.util.HashMap$Node
  29:            89        2917776  [Lscala.concurrent.forkjoin.ForkJoinTask;
  30:        116957        2806968  org.apache.spark.scheduler.SparkListenerTaskStart
  31:          2122        2188568  [Lorg.apache.spark.scheduler.StageInfo;
  32:         16411        1819816  java.lang.Class
  33:         87862        1405792  org.apache.spark.scheduler.SparkListenerUnpersistRDD
  34:         22915         916600  org.apache.spark.storage.BlockStatus
  35:          5887         895568  [Ljava.util.HashMap$Node;
  36:           480         855552  [Lio.netty.buffer.PoolThreadCache$MemoryRegionCache$Entry;
  37:          7569         834968  [I
  38:          9626         770080  org.apache.spark.rdd.MapPartitionsRDD
  39:         31748         761952  java.lang.Long



