Thanks, disabling the GC overhead limit did the trick! I did, however, run into another issue: the computation stalls when it tries to write partitions to disk. All the workers keep emitting messages saying that DiskBackedPartitionStore failed to create directory _bsp/_partitions/_jobxxxxx/part-vertices-xxx
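One thing that may be worth ruling out (a guess, not something established in this thread): if the store creates its directories with java.io.File.mkdirs(), that call returns false both when creation fails and when the directory already exists, so a "failed to create directory" message alone would not distinguish the two cases. The sketch below is plain Java and independent of Giraph; the class name PartitionDirCheck and the job_example path component are made up for illustration.

```java
import java.io.File;

public class PartitionDirCheck {
    /**
     * Calls mkdirs() on the given directory and reports both the return
     * value and whether the directory is actually usable afterwards.
     */
    public static String describe(File dir) {
        // true only if this call created the directory; false otherwise,
        // including when the directory was already there.
        boolean created = dir.mkdirs();
        boolean usable = dir.isDirectory() && dir.canWrite();
        return "mkdirs=" + created + " usable=" + usable;
    }

    public static void main(String[] args) {
        // Hypothetical location under the JVM temp directory, mimicking the
        // relative layout from the message above.
        File base = new File(System.getProperty("java.io.tmpdir"),
                "ooc-check-" + System.nanoTime());
        File dir = new File(base, "_bsp/_partitions/job_example");

        System.out.println(describe(dir)); // first call creates the directory
        System.out.println(describe(dir)); // second call: mkdirs() is false, directory still usable
    }
}
```

Running the same check from a task's working directory with the real relative path would show whether the directory is genuinely missing or unwritable, or merely already present.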
On 9/10/13, Claudio Martella <claudio.marte...@gmail.com> wrote:
> As David mentions, even with OOC, the objects are still created (and yes,
> often soon destroyed after being spilled to disk), putting pressure on the GC.
> Moreover, with the increase in size of the graph, the number of in-memory
> vertices is not the only increasing chunk of memory, as there are other
> memory stores around the codebase that get filled, such as caches etc.
>
> Try increasing the heap to something reasonable for your machines.
>
>
> On Tue, Sep 10, 2013 at 3:21 AM, David Boyd
> <db...@data-tactics-corp.com> wrote:
>
>> Alexander:
>> You might try turning off the GC Overhead limit (-XX:-UseGCOverheadLimit).
>> Also you could turn on verbose GC logging
>> (-verbose:gc -Xloggc:/tmp/@taskid@.gc) to see what is happening.
>> Because the OOC still has to create and destroy objects I suspect that
>> the heap is just getting really fragmented.
>>
>> There are options that you can set with Java to change the type of
>> garbage collection and how it is scheduled as well.
>>
>> You might up the heap size slightly - what is the default heap size on
>> your cluster?
>>
>>
>> On 9/9/2013 8:33 PM, Alexander Asplund wrote:
>>
>>> A small note: I'm not seeing any partitions directory being formed
>>> under _bsp, which is where I have understood that they should be
>>> appearing.
>>>
>>> On 9/10/13, Alexander Asplund <alexaspl...@gmail.com> wrote:
>>>
>>>> Really appreciate the swift responses! Thanks again.
>>>>
>>>> I have not both increased mapper tasks and decreased max number of
>>>> partitions at the same time. I first did tests with increased Mapper
>>>> heap available, but reset the setting after it apparently caused
>>>> other, large volume, non-Giraph jobs to crash nodes when reducers also
>>>> were running.
>>>>
>>>> I'm curious why increasing mapper heap is a requirement. Shouldn't the
>>>> OOC mode be able to work with the amount of heap that is available? Is
>>>> there some agreement on the minimum amount of heap necessary for OOC
>>>> to succeed, to guide the choice of Mapper heap amount?
>>>>
>>>> Either way, I will try increasing mapper heap again as much as
>>>> possible, which hopefully will run.
>>>>
>>>> On 9/9/13, Claudio Martella <claudio.marte...@gmail.com> wrote:
>>>>
>>>>> Did you extend the heap available to the mapper tasks? e.g. through
>>>>> mapred.child.java.opts.
>>>>>
>>>>>
>>>>> On Tue, Sep 10, 2013 at 12:50 AM, Alexander Asplund
>>>>> <alexaspl...@gmail.com> wrote:
>>>>>
>>>>>> Thanks for the reply.
>>>>>>
>>>>>> I tried setting giraph.maxPartitionsInMemory to 1, but I'm still
>>>>>> getting OOM: GC limit exceeded.
>>>>>>
>>>>>> Are there any particular cases the OOC will not be able to handle, or
>>>>>> is it supposed to work in all cases? If the latter, it might be that
>>>>>> I have made some configuration error.
>>>>>>
>>>>>> I do have one concern that might indicate I have done something wrong:
>>>>>> to allow OOC to activate without crashing I had to modify the trunk
>>>>>> code. This was because Giraph relied on guava-12 and
>>>>>> DiskBackedPartitionStore used hasInt() - a method which does not
>>>>>> exist in guava-11, which hadoop 2 depends on. At runtime guava 11 was
>>>>>> being used.
>>>>>>
>>>>>> I suppose this problem might indicate I'm submitting the job
>>>>>> using the wrong binary. Currently I am including the giraph
>>>>>> dependencies with the jar, and running using hadoop jar.
>>>>>>
>>>>>> On 9/7/13, Claudio Martella <claudio.marte...@gmail.com> wrote:
>>>>>>
>>>>>>> OOC is used also at input superstep. Try to decrease the number of
>>>>>>> partitions kept in memory.
>>>>>>>
>>>>>>>
>>>>>>> On Sat, Sep 7, 2013 at 1:37 AM, Alexander Asplund
>>>>>>> <alexaspl...@gmail.com> wrote:
>>>>>>>
>>>>>>>> Hi,
>>>>>>>>
>>>>>>>> I'm trying to process a graph that is about 3 times the size of
>>>>>>>> available memory. On the other hand, there is plenty of disk space.
>>>>>>>> I have enabled the giraph.useOutOfCoreGraph property, but it still
>>>>>>>> crashes with outOfMemoryError: GC limit exceeded when I try running
>>>>>>>> my job.
>>>>>>>>
>>>>>>>> I'm wondering if the spilling is supposed to work during the input
>>>>>>>> step. If so, are there any additional steps that must be taken to
>>>>>>>> ensure it functions?
>>>>>>>>
>>>>>>>> Regards,
>>>>>>>> Alexander Asplund
>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>> --
>>>>>>> Claudio Martella
>>>>>>> claudio.marte...@gmail.com
>>>>>>>
>>>>>>>
>>>>>> --
>>>>>> Alexander Asplund
>>>>>>
>>>>>>
>>>>>
>>>>> --
>>>>> Claudio Martella
>>>>> claudio.marte...@gmail.com
>>>>>
>>>>>
>>>> --
>>>> Alexander Asplund
>>>>
>>>>
>>>
>>
>> --
>> ========= mailto:db...@data-tactics.com ============
>> David W. Boyd
>> Director, Engineering
>> 7901 Jones Branch, Suite 700
>> Mclean, VA 22102
>> office: +1-571-279-2122
>> fax: +1-703-506-6703
>> cell: +1-703-402-7908
>> ============== http://www.data-tactics.com.com/ ============
>> First Robotic Mentor - FRC, FTC - www.iliterobotics.org
>> President - USSTEM Foundation - www.usstem.org
>>
>> The information contained in this message may be privileged
>> and/or confidential and protected from disclosure.
>> If the reader of this message is not the intended recipient
>> or an employee or agent responsible for delivering this message
>> to the intended recipient, you are hereby notified that any
>> dissemination, distribution or copying of this communication
>> is strictly prohibited. If you have received this communication
>> in error, please notify the sender immediately by replying to
>> this message and deleting the material from any computer.
>>
>>
>>
>
>
>
> --
> Claudio Martella
> claudio.marte...@gmail.com
>

--
Alexander Asplund