I suspect it may be a deadlock in BlockGenerator. Could you check that on your end?
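If it really is a JVM-level deadlock, one way to check (besides taking a jstack thread dump) is to probe the JVM from inside the application with the standard java.lang.management API. Below is a minimal, hypothetical sketch (the object and thread names are illustrative, not from this thread). Note the caveat: findDeadlockedThreads only reports monitor/Lock deadlocks, so a producer thread merely blocked on a full ArrayBlockingQueue will not show up here; a full thread dump is still the more useful artifact.

import java.lang.management.ManagementFactory

object DeadlockProbe {
  def main(args: Array[String]): Unit = {
    val mx = ManagementFactory.getThreadMXBean
    // findDeadlockedThreads returns null when no deadlock is found
    Option(mx.findDeadlockedThreads()) match {
      case Some(ids) =>
        // Resolve thread infos, including locked monitors and synchronizers
        mx.getThreadInfo(ids, true, true).foreach { info =>
          println(s"Deadlocked thread: ${info.getThreadName}")
          info.getStackTrace.foreach(frame => println(s"    at $frame"))
        }
      case None =>
        println("No JVM-level deadlock detected.")
    }
  }
}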
On Thu, Feb 4, 2016 at 4:14 PM, Udo Fholl <udofholl1...@gmail.com> wrote:

> Thank you for your response.
>
> Unfortunately I cannot share a thread dump. What are you looking for,
> exactly?
>
> Here is the list of the 50 biggest objects (by retained size, descending):
>
> java.util.concurrent.ArrayBlockingQueue#
> java.lang.Object[]#
> org.apache.spark.streaming.receiver.BlockGenerator$Block#
> scala.collection.mutable.ArrayBuffer#
> java.lang.Object[]#
> org.apache.spark.streaming.receiver.BlockGenerator#
> scala.collection.mutable.ArrayBuffer#
> java.lang.Object[]#
> scala.concurrent.forkjoin.ForkJoinPool#
> scala.concurrent.forkjoin.ForkJoinPool$WorkQueue[]#
> org.apache.spark.streaming.receiver.BlockGenerator$Block#
> scala.collection.mutable.ArrayBuffer#
> java.lang.Object[]#
> org.apache.spark.streaming.receiver.BlockGenerator$Block#
> scala.collection.mutable.ArrayBuffer#
> java.lang.Object[]#
> org.apache.spark.storage.MemoryStore#
> java.util.LinkedHashMap#
> scala.concurrent.forkjoin.ForkJoinPool$WorkQueue#
> scala.concurrent.forkjoin.ForkJoinTask[]#
> scala.concurrent.forkjoin.ForkJoinPool$WorkQueue#
> scala.concurrent.forkjoin.ForkJoinTask[]#
> scala.concurrent.forkjoin.ForkJoinPool$WorkQueue#
> scala.concurrent.forkjoin.ForkJoinTask[]#
> org.apache.spark.streaming.receiver.BlockGenerator$Block#
> scala.collection.mutable.ArrayBuffer#
> java.lang.Object[]#
> org.apache.spark.streaming.receiver.BlockGenerator$Block#
> scala.collection.mutable.ArrayBuffer#
> java.lang.Object[]#
> org.apache.spark.streaming.receiver.BlockGenerator$Block#
> scala.collection.mutable.ArrayBuffer#
> java.lang.Object[]#
> org.apache.spark.streaming.receiver.BlockGenerator$Block#
> scala.collection.mutable.ArrayBuffer#
> java.lang.Object[]#
> org.apache.spark.streaming.receiver.BlockGenerator$Block#
> scala.collection.mutable.ArrayBuffer#
> java.lang.Object[]#
> org.apache.spark.streaming.receiver.BlockGenerator$Block#
> scala.collection.mutable.ArrayBuffer#
> java.lang.Object[]#
> org.apache.spark.streaming.receiver.BlockGenerator$Block#
> scala.collection.mutable.ArrayBuffer#
> java.lang.Object[]#
> scala.collection.Iterator$$anon$
> org.apache.spark.InterruptibleIterator#
> scala.collection.IndexedSeqLike$Elements#
> scala.collection.mutable.ArrayOps$ofRef#
> java.lang.Object[]#
>
> On Thu, Feb 4, 2016 at 7:14 PM, Shixiong(Ryan) Zhu <shixi...@databricks.com> wrote:
>
>> Hey Udo,
>>
>> mapWithState usually uses much more memory than updateStateByKey, since
>> it caches the states in memory.
>>
>> However, from your description, it looks like BlockGenerator cannot push
>> data into BlockManager; there may be something wrong in BlockGenerator.
>> Could you share the top 50 objects in the heap dump, and the thread dump?
>>
>> On Wed, Feb 3, 2016 at 9:52 AM, Udo Fholl <udofholl1...@gmail.com> wrote:
>>
>>> Hi all,
>>>
>>> I recently migrated from 'updateStateByKey' to 'mapWithState' and now I
>>> see a huge increase in memory usage. Most of it is a massive
>>> "BlockGenerator" (which points to a massive "ArrayBlockingQueue" that in
>>> turn points to a huge "Object[]").
>>>
>>> I'm pretty sure it has to do with my code, but I barely changed anything;
>>> I just adapted the function.
>>>
>>> Did anyone run into this?
>>>
>>> Best regards,
>>> Udo.
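For reference, here is a minimal sketch of the kind of mapWithState pipeline discussed above, with a state timeout so that idle keys are evicted and the state cached in memory stays bounded. The running word count, socket source, port, and checkpoint path are illustrative assumptions, not Udo's actual code; this is written against the Spark 1.6 streaming API.

import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Minutes, Seconds, State, StateSpec, StreamingContext}

object MapWithStateSketch {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("MapWithStateSketch").setMaster("local[2]")
    val ssc = new StreamingContext(conf, Seconds(5))
    ssc.checkpoint("/tmp/mapwithstate-checkpoint")  // mapWithState requires checkpointing

    // Hypothetical source: word pairs from a socket
    val pairs = ssc.socketTextStream("localhost", 9999)
      .flatMap(_.split(" "))
      .map(word => (word, 1))

    // Running count per key. Keys idle longer than the timeout are evicted,
    // which bounds the state that mapWithState caches in memory.
    val mappingFunc = (word: String, one: Option[Int], state: State[Int]) => {
      if (state.isTimingOut()) {
        // The key is being evicted; calling state.update here is not allowed
        (word, state.getOption.getOrElse(0))
      } else {
        val sum = one.getOrElse(0) + state.getOption.getOrElse(0)
        state.update(sum)
        (word, sum)
      }
    }

    pairs.mapWithState(StateSpec.function(mappingFunc).timeout(Minutes(30)))
      .print()

    ssc.start()
    ssc.awaitTermination()
  }
}

Note that a timeout bounds only the state itself; if the BlockGenerator queue is what is growing, the problem is upstream of mapWithState, which is why the thread dump matters here.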