While the application still has heap space, batches complete well under the 15-second interval. Once it starts to run out of space, processing time climbs to about 1.5 minutes, scheduling delay to around 4 minutes, and total delay to around 5.5 minutes. I usually shut it down at that point.
The number of stages (and pending stages) does seem to be quite high and increases over time:

4584  foreachRDD at HDFSPersistence.java:52        2016/05/30 16:23:52  1.9 min  36/36 (4964 skipped)  285/285 (28026 skipped)
4586  transformToPair at SampleCalculator.java:88  2016/05/30 16:25:02  0.2 s    1/1                   4/4
4585  (Unknown Stage Name)                         2016/05/30 16:23:52  1.2 min  1/1                   1/1
4582  (Unknown Stage Name)                         2016/05/30 16:21:51  48 s     1/1 (4063 skipped)    12/12 (22716 skipped)
4583  (Unknown Stage Name)                         2016/05/30 16:21:51  48 s     1/1                   1/1
4580  (Unknown Stage Name)                         2016/05/30 16:16:38  4.0 min  36/36 (4879 skipped)  285/285 (27546 skipped)
4581  (Unknown Stage Name)                         2016/05/30 16:16:38  0.1 s    1/1                   4/4
4579  (Unknown Stage Name)                         2016/05/30 16:15:53  45 s     1/1                   1/1
4578  (Unknown Stage Name)                         2016/05/30 16:14:38  1.3 min  1/1 (3993 skipped)    12/12 (22326 skipped)
4577  (Unknown Stage Name)                         2016/05/30 16:14:37  0.8 s    1/1                   1/1

Is this what you mean by pending stages? I have taken a few heap dumps, but I am not sure which classes I should be looking at as problematic.

From: Shahbaz [mailto:shahzadh...@gmail.com]
Sent: 2016, May, 30 3:25 PM
To: Dancuart, Christian
Cc: user
Subject: Re: Spark Streaming heap space out of memory

Hi Christian,

* What is the processing time of each of your batches? Is it exceeding 15 seconds?
* How many jobs are queued?
* Can you take a heap dump and see which objects are occupying the heap?

Regards,
Shahbaz

On Tue, May 31, 2016 at 12:21 AM, christian.dancu...@rbc.com <christian.dancu...@rbc.com> wrote:

Hi All,

We have a Spark Streaming v1.4 / Java 8 application that slows down and eventually runs out of heap space. The less driver memory it has, the faster this happens. Appended are our Spark configuration and a snapshot of the heap taken with jmap on the driver process. The RDDInfo, $colon$colon and [C objects keep growing as we observe. We also tried G1GC, but it behaves the same way.

Our dependency graph contains multiple updateStateByKey() calls. For each, we explicitly set the checkpoint interval to 240 seconds. Our batch interval is set to 15 seconds, with no delays at the start of the process.
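For context, a minimal sketch (Java 8 against the Spark Streaming 1.4 API) of the kind of setup described above. The application name, checkpoint directory, socket source, and state function are hypothetical placeholders; only the 15-second batch interval, the updateStateByKey() usage, and the explicit 240-second checkpoint interval come from the report.

import java.util.List;

import scala.Tuple2;

import com.google.common.base.Optional;

import org.apache.spark.SparkConf;
import org.apache.spark.api.java.function.Function2;
import org.apache.spark.streaming.Durations;
import org.apache.spark.streaming.api.java.JavaPairDStream;
import org.apache.spark.streaming.api.java.JavaStreamingContext;

public class StatefulStreamSketch {
    public static void main(String[] args) throws Exception {
        // Master, driver memory (6GB) and executor memory (2GB) are assumed to come from spark-submit.
        SparkConf conf = new SparkConf().setAppName("stateful-stream-sketch");

        // 15-second batch interval, as in the report.
        JavaStreamingContext jssc = new JavaStreamingContext(conf, Durations.seconds(15));

        // A checkpoint directory is required for updateStateByKey(); the path is a placeholder.
        jssc.checkpoint("hdfs:///tmp/streaming-checkpoints");

        // Placeholder source: (key, 1) pairs derived from a socket text stream.
        JavaPairDStream<String, Long> counts = jssc
                .socketTextStream("localhost", 9999)
                .mapToPair(line -> new Tuple2<>(line, 1L));

        // Hypothetical state update function: running sum per key (Spark 1.x uses Guava's Optional here).
        Function2<List<Long>, Optional<Long>, Optional<Long>> updateFn = (values, state) -> {
            long sum = state.or(0L);
            for (Long v : values) {
                sum += v;
            }
            return Optional.of(sum);
        };

        JavaPairDStream<String, Long> running = counts.updateStateByKey(updateFn);

        // Explicit 240-second checkpoint interval on the stateful stream, as described above.
        running.checkpoint(Durations.seconds(240));

        running.print();

        jssc.start();
        jssc.awaitTermination();
    }
}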
Spark configuration (Spark Driver Memory: 6GB, Spark Executor Memory: 2GB):

spark.streaming.minRememberDuration=180s
spark.ui.showConsoleProgress=false
spark.streaming.receiver.writeAheadLog.enable=true
spark.streaming.unpersist=true
spark.streaming.stopGracefullyOnShutdown=true
spark.streaming.ui.retainedBatches=10
spark.ui.retainedJobs=10
spark.ui.retainedStages=10
spark.worker.ui.retainedExecutors=10
spark.worker.ui.retainedDrivers=10
spark.sql.ui.retainedExecutions=10
spark.serializer=org.apache.spark.serializer.KryoSerializer
spark.kryoserializer.buffer.max=128m

Heap snapshot (jmap histogram on the driver process):

 num     #instances         #bytes  class name
----------------------------------------------
   1:       8828200      565004800  org.apache.spark.storage.RDDInfo
   2:      20794893      499077432  scala.collection.immutable.$colon$colon
   3:       9646097      459928736  [C
   4:       9644398      231465552  java.lang.String
   5:      12760625      204170000  java.lang.Integer
   6:         21326      111198632  [B
   7:        556959       44661232  [Lscala.collection.mutable.HashEntry;
   8:       1179788       37753216  java.util.concurrent.ConcurrentHashMap$Node
   9:       1169264       37416448  java.util.Hashtable$Entry
  10:        552707       30951592  org.apache.spark.scheduler.StageInfo
  11:        367107       23084712  [Ljava.lang.Object;
  12:        556948       22277920  scala.collection.mutable.HashMap
  13:          2787       22145568  [Ljava.util.concurrent.ConcurrentHashMap$Node;
  14:        116997       12167688  org.apache.spark.executor.TaskMetrics
  15:        360425        8650200  java.util.concurrent.LinkedBlockingQueue$Node
  16:        360417        8650008  org.apache.spark.deploy.history.yarn.HandleSparkEvent
  17:          8332        8478088  [Ljava.util.Hashtable$Entry;
  18:        351061        8425464  scala.collection.mutable.ArrayBuffer
  19:        116963        8421336  org.apache.spark.scheduler.TaskInfo
  20:        446136        7138176  scala.Some
  21:        211968        5087232  io.netty.buffer.PoolThreadCache$MemoryRegionCache$Entry
  22:        116963        4678520  org.apache.spark.scheduler.SparkListenerTaskEnd
  23:        107679        4307160  org.apache.spark.executor.ShuffleWriteMetrics
  24:         72162        4041072  org.apache.spark.executor.ShuffleReadMetrics
  25:        117223        3751136  scala.collection.mutable.ListBuffer
  26:         81473        3258920  org.apache.spark.executor.InputMetrics
  27:        125903        3021672  org.apache.spark.rdd.RDDOperationScope
  28:         91455        2926560  java.util.HashMap$Node
  29:            89        2917776  [Lscala.concurrent.forkjoin.ForkJoinTask;
  30:        116957        2806968  org.apache.spark.scheduler.SparkListenerTaskStart
  31:          2122        2188568  [Lorg.apache.spark.scheduler.StageInfo;
  32:         16411        1819816  java.lang.Class
  33:         87862        1405792  org.apache.spark.scheduler.SparkListenerUnpersistRDD
  34:         22915         916600  org.apache.spark.storage.BlockStatus
  35:          5887         895568  [Ljava.util.HashMap$Node;
  36:           480         855552  [Lio.netty.buffer.PoolThreadCache$MemoryRegionCache$Entry;
  37:          7569         834968  [I
  38:          9626         770080  org.apache.spark.rdd.MapPartitionsRDD
  39:         31748         761952  java.lang.Long
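For completeness, a hedged sketch of applying the same settings programmatically through SparkConf on the driver, assuming they are not already supplied via spark-defaults.conf or --conf on spark-submit; the class and application name are placeholders, and driver/executor memory are still expected to be passed to spark-submit rather than set here.

import org.apache.spark.SparkConf;

public class DriverConfSketch {
    public static SparkConf buildConf() {
        // Same settings as the configuration listed above, set in code instead of a properties file.
        return new SparkConf()
                .setAppName("streaming-app") // placeholder name
                .set("spark.streaming.minRememberDuration", "180s")
                .set("spark.ui.showConsoleProgress", "false")
                .set("spark.streaming.receiver.writeAheadLog.enable", "true")
                .set("spark.streaming.unpersist", "true")
                .set("spark.streaming.stopGracefullyOnShutdown", "true")
                .set("spark.streaming.ui.retainedBatches", "10")
                .set("spark.ui.retainedJobs", "10")
                .set("spark.ui.retainedStages", "10")
                .set("spark.worker.ui.retainedExecutors", "10")
                .set("spark.worker.ui.retainedDrivers", "10")
                .set("spark.sql.ui.retainedExecutions", "10")
                .set("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
                .set("spark.kryoserializer.buffer.max", "128m");
    }
}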