I have no idea without the logs, especially when it happens rarely.
On Fri, Sep 13, 2013 at 12:33 AM, Alexander Asplund <alexaspl...@gmail.com> wrote:

> Actually, why is it saying it fails to create the directory in the first
> place, when it is trying to write files?
>
> On Sep 12, 2013 3:04 PM, "Alexander Asplund" <alexaspl...@gmail.com> wrote:
>
>> I can also add that there is no such issue with DiskBackedMessageStore.
>> It successfully creates a large number of store files, and never starts
>> failing.
>>
>> On Sep 12, 2013 2:11 PM, "Alexander Asplund" <alexaspl...@gmail.com> wrote:
>>
>>> It's very strange: it is definitely failing on some partitions.
>>> Currently the disk usage of an offloading worker corresponds roughly to
>>> the size of its part of the graph, but the worker attempts to create
>>> additional partitions, and this fails.
>>>
>>> On Sep 12, 2013 2:07 PM, "Alexander Asplund" <alexaspl...@gmail.com> wrote:
>>>
>>>> Actually, I take that back. It seems it does succeed in creating
>>>> partitions; it just struggles with it sometimes. Should I be worried
>>>> about these errors if partition directories seem to be filling up?
>>>>
>>>> On Sep 11, 2013 6:38 PM, "Claudio Martella" <claudio.marte...@gmail.com> wrote:
>>>>
>>>>> Giraph does not offload partitions or messages to HDFS in the
>>>>> out-of-core module. It uses local disk on the computing nodes. By
>>>>> default, it uses the tasktracker local directory where, for example,
>>>>> the distributed cache is stored.
>>>>>
>>>>> Could you provide the stack trace Giraph is spitting out when failing?
>>>>>
>>>>> On Thu, Sep 12, 2013 at 12:54 AM, Alexander Asplund <alexaspl...@gmail.com> wrote:
>>>>>
>>>>>> Hi,
>>>>>>
>>>>>> I'm still trying to get Giraph to work on a graph that requires more
>>>>>> memory than is available. The problem is that when the workers try to
>>>>>> offload partitions, the offloading fails. The DiskBackedPartitionStore
>>>>>> fails to create the directory
>>>>>> _bsp/_partitions/job-xxxx/part-vertices-xxx (roughly from recall).
>>>>>>
>>>>>> The input or computation will then continue for a while, which I
>>>>>> believe is because it is still managing to hold everything in memory,
>>>>>> but at some point it reaches the limit where there simply is no more
>>>>>> heap space, and it crashes with OOM.
>>>>>>
>>>>>> Has anybody had this problem with Giraph failing to make HDFS
>>>>>> directories?
>>>>>
>>>>> --
>>>>> Claudio Martella
>>>>> claudio.marte...@gmail.com

-- Claudio Martella claudio.marte...@gmail.com
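Since out-of-core Giraph writes to local disk rather than HDFS, one thing worth trying is pointing the partition store at a directory that is known to be writable by the task user on every worker node. A rough sketch of such an invocation, assuming the Giraph 1.0-era option names (`giraph.useOutOfCoreGraph`, `giraph.maxPartitionsInMemory`, `giraph.partitionsDirectory`); the computation class, input/output formats, and paths below are placeholders, not from this thread:

```shell
# Sketch: run a Giraph job with out-of-core partitions enabled, and with
# the partition spill directory set explicitly to a local path that the
# tasktracker user can write to. All class names and paths are examples.
hadoop jar giraph-examples.jar org.apache.giraph.GiraphRunner \
    my.app.MyComputation \
    -vif my.app.MyVertexInputFormat \
    -vip /input/graph \
    -vof org.apache.giraph.io.formats.IdWithValueTextOutputFormat \
    -op /output/graph \
    -w 4 \
    -ca giraph.useOutOfCoreGraph=true \
    -ca giraph.maxPartitionsInMemory=8 \
    -ca giraph.partitionsDirectory=/data/local/giraph/_bsp/_partitions
```

If the failures in this thread come from an unwritable or missing default directory, verifying on each node that the chosen path exists, sits on a local filesystem (not HDFS), and is writable by the user running the task would confirm or rule that out.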