Hi all, I'm seeing the same problem. I'm pasting the part of the logs that looks most relevant, in case it helps. It appears in the log of every Hadoop slave node.
2013-09-23 12:34:29,908 INFO org.apache.giraph.comm.SendPartitionCache: SendPartitionCache: maxVerticesPerTransfer = 10000
2013-09-23 12:34:29,908 INFO org.apache.giraph.comm.SendPartitionCache: SendPartitionCache: maxEdgesPerTransfer = 80000
2013-09-23 12:34:29,917 INFO org.apache.giraph.worker.InputSplitsHandler: reserveInputSplit: Reserved input split path /_hadoopBsp/job_201307021917_1469/_vertexInputSplitDir/20, overall roughly 0.0% input splits reserved
2013-09-23 12:34:29,919 INFO org.apache.giraph.worker.InputSplitsCallable: getInputSplit: Reserved /_hadoopBsp/job_201307021917_1469/_vertexInputSplitDir/20 from ZooKeeper and got input split 'hdfs://hadoop-master:54310/<some path>/part-r-00002:402653184+14270392'
2013-09-23 12:34:29,935 WARN org.apache.hadoop.io.compress.snappy.LoadSnappy: Snappy native library is available
2013-09-23 12:34:29,935 INFO org.apache.hadoop.io.compress.snappy.LoadSnappy: Snappy native library loaded
2013-09-23 12:34:31,491 INFO org.apache.giraph.worker.InputSplitsCallable: loadFromInputSplit: Finished loading /_hadoopBsp/job_201307021917_1469/_vertexInputSplitDir/20 (v=9209, e=782750)
2013-09-23 12:34:31,496 INFO org.apache.giraph.worker.InputSplitsHandler: reserveInputSplit: Reserved input split path /_hadoopBsp/job_201307021917_1469/_vertexInputSplitDir/131, overall roughly 0.71428573% input splits reserved
2013-09-23 12:34:31,497 INFO org.apache.giraph.worker.InputSplitsCallable: getInputSplit: Reserved /_hadoopBsp/job_201307021917_1469/_vertexInputSplitDir/131 from ZooKeeper and got input split 'hdfs://hadoop-master:54310/home/<some path>/part-r-00018:335544320+67108864'
2013-09-23 12:34:35,374 INFO org.apache.giraph.worker.InputSplitsCallable: loadFromInputSplit: Finished loading /_hadoopBsp/job_201307021917_1469/_vertexInputSplitDir/131 (v=44211, e=3680393)
2013-09-23 12:34:35,378 INFO org.apache.giraph.worker.InputSplitsHandler: reserveInputSplit: Reserved input split path /_hadoopBsp/job_201307021917_1469/_vertexInputSplitDir/113, overall roughly 1.4285715% input splits reserved
2013-09-23 12:34:35,378 INFO org.apache.giraph.worker.InputSplitsCallable: getInputSplit: Reserved /_hadoopBsp/job_201307021917_1469/_vertexInputSplitDir/113 from ZooKeeper and got input split 'hdfs://hadoop-master:54310/home/<some path>/part-r-00016:67108864+67108864'
2013-09-23 12:34:38,161 ERROR org.apache.giraph.partition.DiskBackedPartitionStore: offloadPartition: Failed to create directory _bsp/_partitions/job_201307021917_1469/partition-365_vertices
2013-09-23 12:34:38,171 ERROR org.apache.giraph.partition.DiskBackedPartitionStore: offloadPartition: Failed to create directory _bsp/_partitions/job_201307021917_1469/partition-245_vertices
2013-09-23 12:34:38,181 ERROR org.apache.giraph.partition.DiskBackedPartitionStore: offloadPartition: Failed to create directory _bsp/_partitions/job_201307021917_1469/partition-25_vertices
2013-09-23 12:34:38,190 ERROR org.apache.giraph.partition.DiskBackedPartitionStore: offloadPartition: Failed to create directory _bsp/_partitions/job_201307021917_1469/partition-85_vertices
2013-09-23 12:34:38,205 ERROR org.apache.giraph.partition.DiskBackedPartitionStore: offloadPartition: Failed to create directory _bsp/_partitions/job_201307021917_1469/partition-345_vertices
2013-09-23 12:34:38,216 ERROR org.apache.giraph.partition.DiskBackedPartitionStore: offloadPartition: Failed to create directory _bsp/_partitions/job_201307021917_1469/partition-285_vertices
2013-09-23 12:34:38,228 ERROR org.apache.giraph.partition.DiskBackedPartitionStore: offloadPartition: Failed to create directory _bsp/_partitions/job_201307021917_1469/partition-205_vertices
2013-09-23 12:34:38,240 ERROR org.apache.giraph.partition.DiskBackedPartitionStore: offloadPartition: Failed to create directory _bsp/_partitions/job_201307021917_1469/partition-265_vertices
2013-09-23 12:34:38,255 ERROR org.apache.giraph.partition.DiskBackedPartitionStore:
offloadPartition: Failed to create directory _bsp/_partitions/job_201307021917_1469/partition-5_vertices
2013-09-23 12:34:38,834 INFO org.apache.giraph.worker.InputSplitsCallable: loadFromInputSplit: Finished loading /_hadoopBsp/job_201307021917_1469/_vertexInputSplitDir/113 (v=43776, e=3684432)
2013-09-23 12:34:38,838 INFO org.apache.giraph.worker.InputSplitsHandler: reserveInputSplit: Reserved input split path /_hadoopBsp/job_201307021917_1469/_vertexInputSplitDir/1, overall roughly 2.857143% input splits reserved

On Fri, Sep 13, 2013 at 11:27 AM, Claudio Martella <claudio.marte...@gmail.com> wrote:
> I have no idea without the logs, especially when it happens rarely.
>
> On Fri, Sep 13, 2013 at 12:33 AM, Alexander Asplund <alexaspl...@gmail.com> wrote:
>>
>> Actually, why is it saying it fails to create a directory in the first
>> place, when it is trying to write files?
>>
>> On Sep 12, 2013 3:04 PM, "Alexander Asplund" <alexaspl...@gmail.com> wrote:
>>>
>>> I can also add that there is no such issue with DiskBackedMessageStore.
>>> It successfully creates a large number of store files, and never starts
>>> failing.
>>>
>>> On Sep 12, 2013 2:11 PM, "Alexander Asplund" <alexaspl...@gmail.com> wrote:
>>>>
>>>> It's very strange: it is definitely failing on some partitions.
>>>> Currently the disk usage of an offloading worker corresponds roughly
>>>> to the size of its part of the graph, but the worker attempts to
>>>> create additional partitions, and this fails.
>>>>
>>>> On Sep 12, 2013 2:07 PM, "Alexander Asplund" <alexaspl...@gmail.com> wrote:
>>>>>
>>>>> Actually, I take that back. It seems it does succeed in creating
>>>>> partitions; it just struggles with it sometimes. Should I be worried
>>>>> about these errors if the partition directories seem to be filling up?
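One detail that may be relevant to that last question (this is an assumption on my part, based on the java.io.File API rather than on the Giraph sources): File#mkdirs() returns false both when creation genuinely fails and when the directory already exists. If DiskBackedPartitionStore logs the error whenever mkdirs() returns false, a second offload into an already-created partition directory would produce this message even though nothing is wrong. A minimal sketch:

```java
import java.io.File;
import java.nio.file.Files;

public class MkdirsDemo {
    public static void main(String[] args) throws Exception {
        // Fresh scratch area; "partition-5_vertices" mirrors the directory
        // names in the log above but is otherwise arbitrary.
        File base = Files.createTempDirectory("giraph-ooc-demo").toFile();
        File dir = new File(base, "partition-5_vertices");

        // First call creates the directory and returns true.
        System.out.println("first mkdirs: " + dir.mkdirs());
        // Second call finds it already present and returns false,
        // even though no error occurred and the directory is usable.
        System.out.println("second mkdirs: " + dir.mkdirs());
    }
}
```

So errors paired with directories that keep filling up would be consistent with a benign repeated-mkdirs case rather than a real write failure.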
>>>>>
>>>>> On Sep 11, 2013 6:38 PM, "Claudio Martella" <claudio.marte...@gmail.com> wrote:
>>>>>>
>>>>>> Giraph does not offload partitions or messages to HDFS in the
>>>>>> out-of-core module. It uses local disk on the computing nodes. By
>>>>>> default, it uses the tasktracker local directory where, for example,
>>>>>> the distributed cache is stored.
>>>>>>
>>>>>> Could you provide the stacktrace Giraph is spitting out when it fails?
>>>>>>
>>>>>> On Thu, Sep 12, 2013 at 12:54 AM, Alexander Asplund <alexaspl...@gmail.com> wrote:
>>>>>>>
>>>>>>> Hi,
>>>>>>>
>>>>>>> I'm still trying to get Giraph to work on a graph that requires more
>>>>>>> memory than is available. The problem is that when the workers try
>>>>>>> to offload partitions, the offloading fails. The
>>>>>>> DiskBackedPartitionStore fails to create the directory
>>>>>>> _bsp/_partitions/job-xxxx/part-vertices-xxx (roughly from recall).
>>>>>>>
>>>>>>> The input or computation will then continue for a while, which I
>>>>>>> believe is because it is still managing to hold everything in
>>>>>>> memory, but at some point it reaches the limit where there simply is
>>>>>>> no more heap space, and it crashes with an OOM.
>>>>>>>
>>>>>>> Has anybody had this problem with Giraph failing to make HDFS
>>>>>>> directories?
>>>>>>
>>>>>> --
>>>>>> Claudio Martella
>>>>>> claudio.marte...@gmail.com
>
> --
> Claudio Martella
> claudio.marte...@gmail.com
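For anyone hitting this later: since, as Claudio says, the out-of-core store writes to local disk rather than HDFS, it may be worth pointing it explicitly at a local directory that is known to be writable by the tasktracker user and to have enough free space. A hedged sketch of what that invocation might look like; the jar, computation class, and paths below are placeholders, and the -ca parameter names should be verified against GiraphConstants for your Giraph release:

```shell
# Placeholder jar/class/paths; verify parameter names for your Giraph version.
hadoop jar giraph-with-dependencies.jar org.apache.giraph.GiraphRunner \
    my.app.MyComputation \
    -ca giraph.useOutOfCoreGraph=true \
    -ca giraph.maxPartitionsInMemory=10 \
    -ca giraph.partitionsDirectory=/data/local/giraph/_partitions
```

If the directory is explicitly set and writable and the "Failed to create directory" errors persist while the directories still fill up, that would point more toward a spurious error message than a real disk problem.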