Hi folks,

We can successfully run Giraph on 1B vertices and around 20B edges on
our cluster, which is great. But when we run it on the actual data, with
5B vertices and around 50B edges, we see issues in the final step, while
the partitions are being offloaded. Since the dataset is huge for our
cluster, we use giraph.useOutOfCoreGraph and giraph.useOutOfCoreMessages
to spill data to disk when memory is overloaded. With this setup, all the
supersteps finish within around 4 hours. But in the final step, after the
task status reports that it is saving vertices, the job hangs after
writing a few partitions. This happens consistently in our case. I have
played with all the config params and nothing helps. Any suggestions
would be really helpful. Thanks a lot.

 The log snippet:

2013-10-14 10:24:20,144 INFO
org.apache.giraph.worker.BspServiceWorker: saveVertices: Starting to
save 26146422 vertices
2013-10-14 10:24:20,183 INFO
org.apache.giraph.partition.DiskBackedPartitionStore:
offloadPartition: writing partition vertices 1922 to
/mnt/diskg/mapred/local/taskTracker/sundi133/jobcache/job_201310130212_0013/attempt_201310130212_0013_m_000060_0/work/_bsp/_partitions/job_201310130212_0013/partition-1922_vertices
2013-10-14 10:24:20,307 WARN org.apache.giraph.bsp.BspService:
process: Unknown and unprocessed event
(path=/_hadoopBsp/job_201310130212_0013/_applicationAttemptsDir/0/_superstepDir/15/_addressesAndPartitions,
type=NodeDeleted, state=SyncConnected)
2013-10-14 10:24:20,431 WARN org.apache.giraph.bsp.BspService:
process: Unknown and unprocessed event
(path=/_hadoopBsp/job_201310130212_0013/_applicationAttemptsDir/0/_superstepDir/15/_superstepFinished,
type=NodeDeleted, state=SyncConnected)
2013-10-14 10:24:20,555 INFO
org.apache.giraph.worker.BspServiceWorker: processEvent: Job state
changed, checking to see if it needs to restart
2013-10-14 10:24:20,640 INFO org.apache.giraph.bsp.BspService:
getJobState: Job state already exists
(/_hadoopBsp/job_201310130212_0013/_masterJobState)
2013-10-14 10:24:22,928 INFO
org.apache.giraph.partition.DiskBackedPartitionStore:
offloadPartition: writing partition vertices 13762 to
/mnt/diskg/mapred/local/taskTracker/sundi133/jobcache/job_201310130212_0013/attempt_201310130212_0013_m_000060_0/work/_bsp/_partitions/job_201310130212_0013/partition-13762_vertices
2013-10-14 10:24:27,648 INFO
org.apache.giraph.partition.DiskBackedPartitionStore:
offloadPartition: writing partition vertices 23682 to
/mnt/diskg/mapred/local/taskTracker/sundi133/jobcache/job_201310130212_0013/attempt_201310130212_0013_m_000060_0/work/_bsp/_partitions/job_201310130212_0013/partition-23682_vertices
2013-10-14 10:24:30,557 INFO
org.apache.giraph.partition.DiskBackedPartitionStore:
offloadPartition: writing partition vertices 14882 to
/mnt/diskg/mapred/local/taskTracker/sundi133/jobcache/job_201310130212_0013/attempt_201310130212_0013_m_000060_0/work/_bsp/_partitions/job_201310130212_0013/partition-14882_vertices
2013-10-14 10:24:32,935 INFO
org.apache.giraph.partition.DiskBackedPartitionStore:
offloadPartition: writing partition vertices 11842 to
/mnt/diskg/mapred/local/taskTracker/sundi133/jobcache/job_201310130212_0013/attempt_201310130212_0013_m_000060_0/work/_bsp/_partitions/job_201310130212_0013/partition-11842_vertices
2013-10-14 10:24:33,714 INFO
org.apache.giraph.partition.DiskBackedPartitionStore:
offloadPartition: writing partition vertices 962 to
/mnt/diskg/mapred/local/taskTracker/sundi133/jobcache/job_201310130212_0013/attempt_201310130212_0013_m_000060_0/work/_bsp/_partitions/job_201310130212_0013/partition-962_vertices
2013-10-14 10:24:35,184 INFO
org.apache.giraph.worker.BspServiceWorker: saveVertices: Saved 978047
out of 26146422 vertices, on partition 5 out of 160
2013-10-14 10:24:35,187 INFO
org.apache.giraph.partition.DiskBackedPartitionStore:
offloadPartition: writing partition vertices 22722 to
/mnt/diskg/mapred/local/taskTracker/sundi133/jobcache/job_201310130212_0013/attempt_201310130212_0013_m_000060_0/work/_bsp/_partitions/job_201310130212_0013/partition-22722_vertices
2013-10-14 10:24:37,276 INFO
org.apache.giraph.partition.DiskBackedPartitionStore:
offloadPartition: writing partition vertices 21762 to
/mnt/diskg/mapred/local/taskTracker/sundi133/jobcache/job_201310130212_0013/attempt_201310130212_0013_m_000060_0/work/_bsp/_partitions/job_201310130212_0013/partition-21762_vertices
2013-10-14 10:24:39,868 INFO
org.apache.giraph.partition.DiskBackedPartitionStore:
offloadPartition: writing partition vertices 11362 to
/mnt/diskg/mapred/local/taskTracker/sundi133/jobcache/job_201310130212_0013/attempt_201310130212_0013_m_000060_0/work/_bsp/_partitions/job_201310130212_0013/partition-11362_vertices
2013-10-14 10:24:41,391 INFO
org.apache.giraph.partition.DiskBackedPartitionStore:
offloadPartition: writing partition vertices 482 to
/mnt/diskg/mapred/local/taskTracker/sundi133/jobcache/job_201310130212_0013/attempt_201310130212_0013_m_000060_0/work/_bsp/_partitions/job_201310130212_0013/partition-482_vertices
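
As a rough sanity check on the log above, the two saveVertices lines can be used to extrapolate how long the save would take if it kept going at the same pace. This is only a sketch: the constant save rate is an assumption, and the helper names (timestamp, estimate_remaining_seconds) are made up for illustration. The two log lines are copied from this worker's log and re-joined after mail wrapping.

```python
import re
from datetime import datetime

# Two lines copied from the worker log above (re-joined after mail wrapping).
START = ("2013-10-14 10:24:20,144 INFO org.apache.giraph.worker.BspServiceWorker: "
         "saveVertices: Starting to save 26146422 vertices")
PROGRESS = ("2013-10-14 10:24:35,184 INFO org.apache.giraph.worker.BspServiceWorker: "
            "saveVertices: Saved 978047 out of 26146422 vertices, on partition 5 out of 160")

TS_FORMAT = "%Y-%m-%d %H:%M:%S,%f"


def timestamp(line):
    """Parse the leading 'YYYY-MM-DD HH:MM:SS,mmm' timestamp of a log line."""
    return datetime.strptime(line[:23], TS_FORMAT)


def estimate_remaining_seconds(start_line, progress_line):
    """Extrapolate the time left in saveVertices.

    Assumes the save rate observed so far stays constant (an assumption,
    not something the log guarantees). Returns (remaining_seconds, rate).
    """
    match = re.search(r"Saved (\d+) out of (\d+) vertices", progress_line)
    if match is None:
        raise ValueError("not a saveVertices progress line")
    saved, total = int(match.group(1)), int(match.group(2))
    elapsed = (timestamp(progress_line) - timestamp(start_line)).total_seconds()
    rate = saved / elapsed  # vertices per second so far
    return (total - saved) / rate, rate


if __name__ == "__main__":
    remaining, rate = estimate_remaining_seconds(START, PROGRESS)
    print(f"rate: {rate:.0f} vertices/s, estimated remaining: {remaining:.0f} s")
```

At the pace seen in the first 15 seconds, this worker would need only a few more minutes to save its remaining vertices, far less than the 7200 seconds in the failure message. That suggests the save is stalling rather than just running slowly, although this extrapolation cannot show where it stalls.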

------------------------------


*The error shown on the job failure page for each attempt*



FAILED

Task attempt_201310130212_0013_m_000001_0 failed to report status for
7200 seconds. Killing!


-- 
Best Regards,
Jyotirmoy Sundi
Data Engineer,
Admobius

San Francisco, CA 94158
