Correction: the computation does not actually stall - it complains a bit that the directories cannot be created and then eventually moves on to the next superstep. I guess this means I'm actually fitting all the data in memory?
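For anyone hitting the same "failed to create directory" messages: a minimal sketch, assuming a Giraph 1.0-era build, of pointing the out-of-core partition store at an explicitly writable local path. The option name giraph.partitionsDirectory and its _bsp/_partitions default are my reading of GiraphConstants, not something stated in this thread, so verify them against your version; the path below is a placeholder.

    import org.apache.giraph.conf.GiraphConfiguration;

    public class OocPartitionsDir {
        public static void main(String[] args) {
            GiraphConfiguration conf = new GiraphConfiguration();
            // Spill partitions to disk instead of keeping the whole
            // graph in memory.
            conf.setBoolean("giraph.useOutOfCoreGraph", true);
            // Placeholder path: choose a local directory the task user
            // can actually create; the default is _bsp/_partitions.
            conf.set("giraph.partitionsDirectory", "/tmp/giraph_partitions");
        }
    }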
On 9/10/13, Alexander Asplund <alexaspl...@gmail.com> wrote:
> Thanks, disabling the GC overhead limit did the trick!
>
> I did however run into another issue - the computation ends up
> stalling when it tries to write partitions to disk. All the workers
> keep sending out messages that DiskBackedPartitionStore failed to
> create the directory _bsp/_partitions/_jobxxxxx/part-vertices-xxx.
>
> On 9/10/13, Claudio Martella <claudio.marte...@gmail.com> wrote:
>> As David mentions, even with OOC the objects are still created (and
>> yes, often destroyed soon after being spilled to disk), putting
>> pressure on the GC. Moreover, as the graph grows in size, the
>> in-memory vertices are not the only growing chunk of memory; there
>> are other memory stores around the codebase that get filled, such as
>> caches.
>>
>> Try increasing the heap to something reasonable for your machines.
>>
>> On Tue, Sep 10, 2013 at 3:21 AM, David Boyd
>> <db...@data-tactics-corp.com> wrote:
>>
>>> Alexander:
>>>     You might try turning off the GC overhead limit
>>> (-XX:-UseGCOverheadLimit). You could also turn on verbose GC logging
>>> (-verbose:gc -Xloggc:/tmp/@taskid@.gc) to see what is happening.
>>> Because the OOC still has to create and destroy objects, I suspect
>>> that the heap is just getting really fragmented.
>>>
>>> There are also options you can set with Java to change the type of
>>> garbage collection and how it is scheduled.
>>>
>>> You might up the heap size slightly - what is the default heap size
>>> on your cluster?
>>>
>>> On 9/9/2013 8:33 PM, Alexander Asplund wrote:
>>>
>>>> A small note: I'm not seeing any partitions directory being created
>>>> under _bsp, which is where I understand they should be appearing.
>>>>
>>>> On 9/10/13, Alexander Asplund <alexaspl...@gmail.com> wrote:
>>>>
>>>>> Really appreciate the swift responses! Thanks again.
>>>>>
>>>>> I have not tried both increasing the mapper heap and decreasing
>>>>> the max number of partitions at the same time. I first did tests
>>>>> with increased mapper heap, but reset the setting after it
>>>>> apparently caused other large-volume, non-Giraph jobs to crash
>>>>> nodes when reducers were also running.
>>>>>
>>>>> I'm curious why increasing the mapper heap is a requirement.
>>>>> Shouldn't OOC mode be able to work with the amount of heap that is
>>>>> available? Is there some agreement on the minimum amount of heap
>>>>> necessary for OOC to succeed, to guide the choice of mapper heap?
>>>>>
>>>>> Either way, I will try increasing the mapper heap again as much as
>>>>> possible, which hopefully will let the job run.
>>>>>
>>>>> On 9/9/13, Claudio Martella <claudio.marte...@gmail.com> wrote:
>>>>>
>>>>>> Did you extend the heap available to the mapper tasks, e.g.
>>>>>> through mapred.child.java.opts?
>>>>>>
>>>>>> On Tue, Sep 10, 2013 at 12:50 AM, Alexander Asplund
>>>>>> <alexaspl...@gmail.com> wrote:
>>>>>>
>>>>>>> Thanks for the reply.
>>>>>>>
>>>>>>> I tried setting giraph.maxPartitionsInMemory to 1, but I'm still
>>>>>>> getting OOM: GC limit exceeded.
>>>>>>>
>>>>>>> Are there any particular cases the OOC will not be able to
>>>>>>> handle, or is it supposed to work in all cases? If the latter,
>>>>>>> it might be that I have made some configuration error.
>>>>>>>
>>>>>>> I do have one concern that might indicate I have done something
>>>>>>> wrong: to allow OOC to activate without crashing I had to modify
>>>>>>> the trunk code.
>>>>>>> This was because Giraph relied on guava-12, and
>>>>>>> DiskBackedPartitionStore used hasInt() - a method which does not
>>>>>>> exist in guava-11, which hadoop 2 depends on. At runtime, guava
>>>>>>> 11 was being used.
>>>>>>>
>>>>>>> I suppose this problem might indicate that I'm submitting the
>>>>>>> job using the wrong binary. Currently I am including the Giraph
>>>>>>> dependencies with the jar, and running it using hadoop jar.
>>>>>>>
>>>>>>> On 9/7/13, Claudio Martella <claudio.marte...@gmail.com> wrote:
>>>>>>>
>>>>>>>> OOC is also used at the input superstep. Try to decrease the
>>>>>>>> number of partitions kept in memory.
>>>>>>>>
>>>>>>>> On Sat, Sep 7, 2013 at 1:37 AM, Alexander Asplund
>>>>>>>> <alexaspl...@gmail.com> wrote:
>>>>>>>>
>>>>>>>>> Hi,
>>>>>>>>>
>>>>>>>>> I'm trying to process a graph that is about 3 times the size
>>>>>>>>> of available memory. On the other hand, there is plenty of
>>>>>>>>> disk space. I have enabled the giraph.useOutOfCoreGraph
>>>>>>>>> property, but it still crashes with OutOfMemoryError: GC limit
>>>>>>>>> exceeded when I try running my job.
>>>>>>>>>
>>>>>>>>> I'm wondering if the spilling is supposed to work during the
>>>>>>>>> input step. If so, are there any additional steps that must be
>>>>>>>>> taken to ensure it functions?
>>>>>>>>>
>>>>>>>>> Regards,
>>>>>>>>> Alexander Asplund
>>>>>>>>
>>>>>>>> --
>>>>>>>> Claudio Martella
>>>>>>>> claudio.marte...@gmail.com
>>>>>>>
>>>>>>> --
>>>>>>> Alexander Asplund
>>>>>>
>>>>>> --
>>>>>> Claudio Martella
>>>>>> claudio.marte...@gmail.com
>>>>>
>>>>> --
>>>>> Alexander Asplund
>>>
>>> --
>>> ========= mailto:db...@data-tactics.com ============
>>> David W. Boyd
>>> Director, Engineering
>>> 7901 Jones Branch, Suite 700
>>> Mclean, VA 22102
>>> office: +1-571-279-2122
>>> fax: +1-703-506-6703
>>> cell: +1-703-402-7908
>>> ============ http://www.data-tactics.com/ ============
>>> First Robotic Mentor - FRC, FTC - www.iliterobotics.org
>>> President - USSTEM Foundation - www.usstem.org
>>
>> --
>> Claudio Martella
>> claudio.marte...@gmail.com
>
> --
> Alexander Asplund

--
Alexander Asplund
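A compact sketch of the knobs suggested over the course of this thread. The option names (mapred.child.java.opts, giraph.useOutOfCoreGraph, giraph.maxPartitionsInMemory) and JVM flags are the ones quoted above; the -Xmx value and log path are placeholders to adapt to your cluster.

    import org.apache.hadoop.conf.Configuration;

    public class GiraphOocJvmSettings {
        public static void main(String[] args) {
            Configuration conf = new Configuration();
            // Mapper JVM flags per David's suggestion: larger heap
            // (placeholder size), GC overhead limit off, verbose GC
            // logging. Hadoop expands @taskid@ in child opts.
            conf.set("mapred.child.java.opts",
                "-Xmx2g -XX:-UseGCOverheadLimit -verbose:gc -Xloggc:/tmp/@taskid@.gc");
            // Out-of-core graph with a single partition held in memory,
            // per Claudio's suggestion, to force aggressive spilling.
            conf.setBoolean("giraph.useOutOfCoreGraph", true);
            conf.setInt("giraph.maxPartitionsInMemory", 1);
        }
    }

Note that even with only one partition in memory, message stores and caches still have to fit in the heap, as Claudio points out above.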