I suspect it may be a deadlock in BlockGenerator. Could you check that on your end?
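If it really is a JVM-level deadlock, one way to check (besides taking a jstack thread dump) is to probe the JVM from inside the application with the standard java.lang.management API. Below is a minimal, hypothetical sketch (the object and thread names are illustrative, not from this thread). Note the caveat: findDeadlockedThreads only reports monitor/Lock deadlocks, so a producer thread merely blocked on a full ArrayBlockingQueue will not show up here; a full thread dump is still the more useful artifact.

import java.lang.management.ManagementFactory

object DeadlockProbe {
  def main(args: Array[String]): Unit = {
    val mx = ManagementFactory.getThreadMXBean
    // findDeadlockedThreads returns null when no deadlock is found
    Option(mx.findDeadlockedThreads()) match {
      case Some(ids) =>
        // Resolve thread infos, including locked monitors and synchronizers
        mx.getThreadInfo(ids, true, true).foreach { info =>
          println(s"Deadlocked thread: ${info.getThreadName}")
          info.getStackTrace.foreach(frame => println(s"    at $frame"))
        }
      case None =>
        println("No JVM-level deadlock detected.")
    }
  }
}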
On Thu, Feb 4, 2016 at 4:14 PM, Udo Fholl <udofholl1...@gmail.com> wrote:

> Thank you for your response.
>
> Unfortunately I cannot share a thread dump. What are you looking for,
> exactly?
>
> Here is the list of the 50 biggest objects (by retained size, descending):
>
> java.util.concurrent.ArrayBlockingQueue#
> java.lang.Object[]#
> org.apache.spark.streaming.receiver.BlockGenerator$Block#
> scala.collection.mutable.ArrayBuffer#
> java.lang.Object[]#
> org.apache.spark.streaming.receiver.BlockGenerator#
> scala.collection.mutable.ArrayBuffer#
> java.lang.Object[]#
> scala.concurrent.forkjoin.ForkJoinPool#
> scala.concurrent.forkjoin.ForkJoinPool$WorkQueue[]#
> org.apache.spark.streaming.receiver.BlockGenerator$Block#
> scala.collection.mutable.ArrayBuffer#
> java.lang.Object[]#
> org.apache.spark.streaming.receiver.BlockGenerator$Block#
> scala.collection.mutable.ArrayBuffer#
> java.lang.Object[]#
> org.apache.spark.storage.MemoryStore#
> java.util.LinkedHashMap#
> scala.concurrent.forkjoin.ForkJoinPool$WorkQueue#
> scala.concurrent.forkjoin.ForkJoinTask[]#
> scala.concurrent.forkjoin.ForkJoinPool$WorkQueue#
> scala.concurrent.forkjoin.ForkJoinTask[]#
> scala.concurrent.forkjoin.ForkJoinPool$WorkQueue#
> scala.concurrent.forkjoin.ForkJoinTask[]#
> org.apache.spark.streaming.receiver.BlockGenerator$Block#
> scala.collection.mutable.ArrayBuffer#
> java.lang.Object[]#
> org.apache.spark.streaming.receiver.BlockGenerator$Block#
> scala.collection.mutable.ArrayBuffer#
> java.lang.Object[]#
> org.apache.spark.streaming.receiver.BlockGenerator$Block#
> scala.collection.mutable.ArrayBuffer#
> java.lang.Object[]#
> org.apache.spark.streaming.receiver.BlockGenerator$Block#
> scala.collection.mutable.ArrayBuffer#
> java.lang.Object[]#
> org.apache.spark.streaming.receiver.BlockGenerator$Block#
> scala.collection.mutable.ArrayBuffer#
> java.lang.Object[]#
> org.apache.spark.streaming.receiver.BlockGenerator$Block#
> scala.collection.mutable.ArrayBuffer#
> java.lang.Object[]#
> org.apache.spark.streaming.receiver.BlockGenerator$Block#
> scala.collection.mutable.ArrayBuffer#
> java.lang.Object[]#
> scala.collection.Iterator$$anon$
> org.apache.spark.InterruptibleIterator#
> scala.collection.IndexedSeqLike$Elements#
> scala.collection.mutable.ArrayOps$ofRef#
> java.lang.Object[]#
>
> On Thu, Feb 4, 2016 at 7:14 PM, Shixiong(Ryan) Zhu <shixi...@databricks.com> wrote:
>
>> Hey Udo,
>>
>> mapWithState usually uses much more memory than updateStateByKey, since
>> it caches the states in memory.
>>
>> However, from your description, it looks like BlockGenerator cannot push
>> data into BlockManager; there may be something wrong in BlockGenerator.
>> Could you share the top 50 objects in the heap dump, and the thread dump?
>>
>> On Wed, Feb 3, 2016 at 9:52 AM, Udo Fholl <udofholl1...@gmail.com> wrote:
>>
>>> Hi all,
>>>
>>> I recently migrated from 'updateStateByKey' to 'mapWithState' and now I
>>> see a huge increase in memory usage. Most of it is a massive
>>> "BlockGenerator" (which points to a massive "ArrayBlockingQueue" that in
>>> turn points to a huge "Object[]").
>>>
>>> I'm pretty sure it has to do with my code, but I barely changed anything;
>>> I just adapted the function.
>>>
>>> Did anyone run into this?
>>>
>>> Best regards,
>>> Udo.
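For reference, here is a minimal sketch of the kind of mapWithState pipeline discussed above, with a state timeout so that idle keys are evicted and the state cached in memory stays bounded. The running word count, socket source, port, and checkpoint path are illustrative assumptions, not Udo's actual code; this is written against the Spark 1.6 streaming API.

import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Minutes, Seconds, State, StateSpec, StreamingContext}

object MapWithStateSketch {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("MapWithStateSketch").setMaster("local[2]")
    val ssc = new StreamingContext(conf, Seconds(5))
    ssc.checkpoint("/tmp/mapwithstate-checkpoint")  // mapWithState requires checkpointing

    // Hypothetical source: word pairs from a socket
    val pairs = ssc.socketTextStream("localhost", 9999)
      .flatMap(_.split(" "))
      .map(word => (word, 1))

    // Running count per key. Keys idle longer than the timeout are evicted,
    // which bounds the state that mapWithState caches in memory.
    val mappingFunc = (word: String, one: Option[Int], state: State[Int]) => {
      if (state.isTimingOut()) {
        // The key is being evicted; calling state.update here is not allowed
        (word, state.getOption.getOrElse(0))
      } else {
        val sum = one.getOrElse(0) + state.getOption.getOrElse(0)
        state.update(sum)
        (word, sum)
      }
    }

    pairs.mapWithState(StateSpec.function(mappingFunc).timeout(Minutes(30)))
      .print()

    ssc.start()
    ssc.awaitTermination()
  }
}

Note that a timeout bounds only the state itself; if the BlockGenerator queue is what is growing, the problem is upstream of mapWithState, which is why the thread dump matters here.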