Correction: the computation does not actually stall - it complains a bit that the directories cannot be created and then eventually moves on to the next superstep. I guess this means I'm actually fitting all the data in memory?
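For anyone hitting the same "failed to create directory" messages: a minimal sketch, assuming a Giraph 1.0-era build, of pointing the out-of-core partition store at an explicitly writable local path. The option name giraph.partitionsDirectory and its _bsp/_partitions default are my reading of GiraphConstants, not something stated in this thread, so verify them against your version; the path below is a placeholder.

    import org.apache.giraph.conf.GiraphConfiguration;

    public class OocPartitionsDir {
        public static void main(String[] args) {
            GiraphConfiguration conf = new GiraphConfiguration();
            // Spill partitions to disk instead of keeping the whole
            // graph in memory.
            conf.setBoolean("giraph.useOutOfCoreGraph", true);
            // Placeholder path: choose a local directory the task user
            // can actually create; the default is _bsp/_partitions.
            conf.set("giraph.partitionsDirectory", "/tmp/giraph_partitions");
        }
    }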
On 9/10/13, Alexander Asplund <alexaspl...@gmail.com> wrote:
> Thanks, disabling the GC overhead limit did the trick!
>
> I did however run into another issue - the computation ends up
> stalling when it tries to write partitions to disk. All the workers
> keep sending out messages that DiskBackedPartitionStore failed to
> create the directory _bsp/_partitions/_jobxxxxx/part-vertices-xxx.
>
> On 9/10/13, Claudio Martella <claudio.marte...@gmail.com> wrote:
>> As David mentions, even with OOC the objects are still created (and
>> yes, often destroyed soon after being spilled to disk), putting
>> pressure on the GC. Moreover, as the graph grows in size, the
>> in-memory vertices are not the only growing chunk of memory; there
>> are other memory stores around the codebase that get filled, such as
>> caches.
>>
>> Try increasing the heap to something reasonable for your machines.
>>
>> On Tue, Sep 10, 2013 at 3:21 AM, David Boyd
>> <db...@data-tactics-corp.com> wrote:
>>
>>> Alexander:
>>>     You might try turning off the GC overhead limit
>>> (-XX:-UseGCOverheadLimit). You could also turn on verbose GC logging
>>> (-verbose:gc -Xloggc:/tmp/@taskid@.gc) to see what is happening.
>>> Because the OOC still has to create and destroy objects, I suspect
>>> that the heap is just getting really fragmented.
>>>
>>> There are also options you can set with Java to change the type of
>>> garbage collection and how it is scheduled.
>>>
>>> You might up the heap size slightly - what is the default heap size
>>> on your cluster?
>>>
>>> On 9/9/2013 8:33 PM, Alexander Asplund wrote:
>>>
>>>> A small note: I'm not seeing any partitions directory being created
>>>> under _bsp, which is where I understand they should be appearing.
>>>>
>>>> On 9/10/13, Alexander Asplund <alexaspl...@gmail.com> wrote:
>>>>
>>>>> Really appreciate the swift responses! Thanks again.
>>>>>
>>>>> I have not tried both increasing the mapper heap and decreasing
>>>>> the max number of partitions at the same time. I first did tests
>>>>> with increased mapper heap, but reset the setting after it
>>>>> apparently caused other large-volume, non-Giraph jobs to crash
>>>>> nodes when reducers were also running.
>>>>>
>>>>> I'm curious why increasing the mapper heap is a requirement.
>>>>> Shouldn't OOC mode be able to work with the amount of heap that is
>>>>> available? Is there some agreement on the minimum amount of heap
>>>>> necessary for OOC to succeed, to guide the choice of mapper heap?
>>>>>
>>>>> Either way, I will try increasing the mapper heap again as much as
>>>>> possible, which hopefully will let the job run.
>>>>>
>>>>> On 9/9/13, Claudio Martella <claudio.marte...@gmail.com> wrote:
>>>>>
>>>>>> Did you extend the heap available to the mapper tasks, e.g.
>>>>>> through mapred.child.java.opts?
>>>>>>
>>>>>> On Tue, Sep 10, 2013 at 12:50 AM, Alexander Asplund
>>>>>> <alexaspl...@gmail.com> wrote:
>>>>>>
>>>>>>> Thanks for the reply.
>>>>>>>
>>>>>>> I tried setting giraph.maxPartitionsInMemory to 1, but I'm still
>>>>>>> getting OOM: GC limit exceeded.
>>>>>>>
>>>>>>> Are there any particular cases the OOC will not be able to
>>>>>>> handle, or is it supposed to work in all cases? If the latter,
>>>>>>> it might be that I have made some configuration error.
>>>>>>>
>>>>>>> I do have one concern that might indicate I have done something
>>>>>>> wrong: to allow OOC to activate without crashing I had to modify
>>>>>>> the trunk code.
>>>>>>> This was because Giraph relied on guava-12, and
>>>>>>> DiskBackedPartitionStore used hasInt() - a method which does not
>>>>>>> exist in guava-11, which hadoop 2 depends on. At runtime, guava
>>>>>>> 11 was being used.
>>>>>>>
>>>>>>> I suppose this problem might indicate that I'm submitting the
>>>>>>> job using the wrong binary. Currently I am including the Giraph
>>>>>>> dependencies with the jar, and running it using hadoop jar.
>>>>>>>
>>>>>>> On 9/7/13, Claudio Martella <claudio.marte...@gmail.com> wrote:
>>>>>>>
>>>>>>>> OOC is also used at the input superstep. Try to decrease the
>>>>>>>> number of partitions kept in memory.
>>>>>>>>
>>>>>>>> On Sat, Sep 7, 2013 at 1:37 AM, Alexander Asplund
>>>>>>>> <alexaspl...@gmail.com> wrote:
>>>>>>>>
>>>>>>>>> Hi,
>>>>>>>>>
>>>>>>>>> I'm trying to process a graph that is about 3 times the size
>>>>>>>>> of available memory. On the other hand, there is plenty of
>>>>>>>>> disk space. I have enabled the giraph.useOutOfCoreGraph
>>>>>>>>> property, but it still crashes with OutOfMemoryError: GC limit
>>>>>>>>> exceeded when I try running my job.
>>>>>>>>>
>>>>>>>>> I'm wondering if the spilling is supposed to work during the
>>>>>>>>> input step. If so, are there any additional steps that must be
>>>>>>>>> taken to ensure it functions?
>>>>>>>>>
>>>>>>>>> Regards,
>>>>>>>>> Alexander Asplund
>>>>>>>>
>>>>>>>> --
>>>>>>>> Claudio Martella
>>>>>>>> claudio.marte...@gmail.com
>>>>>>>
>>>>>>> --
>>>>>>> Alexander Asplund
>>>>>>
>>>>>> --
>>>>>> Claudio Martella
>>>>>> claudio.marte...@gmail.com
>>>>>
>>>>> --
>>>>> Alexander Asplund
>>>
>>> --
>>> ========= mailto:db...@data-tactics.com ============
>>> David W. Boyd
>>> Director, Engineering
>>> 7901 Jones Branch, Suite 700
>>> Mclean, VA 22102
>>> office: +1-571-279-2122
>>> fax: +1-703-506-6703
>>> cell: +1-703-402-7908
>>> ============ http://www.data-tactics.com/ ============
>>> First Robotic Mentor - FRC, FTC - www.iliterobotics.org
>>> President - USSTEM Foundation - www.usstem.org
>>
>> --
>> Claudio Martella
>> claudio.marte...@gmail.com
>
> --
> Alexander Asplund

--
Alexander Asplund
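A compact sketch of the knobs suggested over the course of this thread. The option names (mapred.child.java.opts, giraph.useOutOfCoreGraph, giraph.maxPartitionsInMemory) and JVM flags are the ones quoted above; the -Xmx value and log path are placeholders to adapt to your cluster.

    import org.apache.hadoop.conf.Configuration;

    public class GiraphOocJvmSettings {
        public static void main(String[] args) {
            Configuration conf = new Configuration();
            // Mapper JVM flags per David's suggestion: larger heap
            // (placeholder size), GC overhead limit off, verbose GC
            // logging. Hadoop expands @taskid@ in child opts.
            conf.set("mapred.child.java.opts",
                "-Xmx2g -XX:-UseGCOverheadLimit -verbose:gc -Xloggc:/tmp/@taskid@.gc");
            // Out-of-core graph with a single partition held in memory,
            // per Claudio's suggestion, to force aggressive spilling.
            conf.setBoolean("giraph.useOutOfCoreGraph", true);
            conf.setInt("giraph.maxPartitionsInMemory", 1);
        }
    }

Note that even with only one partition in memory, message stores and caches still have to fit in the heap, as Claudio points out above.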