Thanks, I will try that out. The rewriting needed in saveVertices to match the new interfaces does not seem too big. Did you find out later what the underlying cause of this issue might be?
Thanks
Sund

On Mon, Oct 14, 2013 at 11:26 AM, Manuel Lagang <manuellag...@gmail.com> wrote:

> I also had the same issues when I used the out-of-core features, even for
> trivial datasets, when I used the 1.0.0-RC3 branch. The job would seem to
> finish all supersteps, but it would hang during the final output of data to
> HDFS. I found that if I used the latest code in trunk instead (which
> required some rewriting to match the new interface), then my jobs would
> finish fine.
>
>
> On Mon, Oct 14, 2013 at 11:13 AM, Jyotirmoy Sundi <sundi...@gmail.com> wrote:
>
>> Hi folks,
>> We are successfully able to run Giraph for 1B vertices and
>> around 20B edges in our cluster. This is great. But when we run it over 5B
>> vertices over the actual data and around 50B edges we see some issues in
>> the final step while offloading the partitions. Since the dataset is huge
>> for our cluster, we are using giraph.useOutOfCoreGraph and
>> giraph.useOutOfCoreMessages
>> to spill the data when overloaded. With this setup all the supersteps
>> finished within around 4 hours. But in the final step after reporting
>> saving vertices in task status, it hangs after writing a few partitions, it
>> is happening consistently in our case. I played with all the config
>> params and nothing is helping out, any suggestions from you will be really
>> helpful. Thanks a lot.
>>
>> The log snippet:
>>
>> 2013-10-14 10:24:20,144 INFO org.apache.giraph.worker.BspServiceWorker: saveVertices: Starting to save 26146422 vertices
>> 2013-10-14 10:24:20,183 INFO org.apache.giraph.partition.DiskBackedPartitionStore: offloadPartition: writing partition vertices 1922 to /mnt/diskg/mapred/local/taskTracker/sundi133/jobcache/job_201310130212_0013/attempt_201310130212_0013_m_000060_0/work/_bsp/_partitions/job_201310130212_0013/partition-1922_vertices
>> 2013-10-14 10:24:20,307 WARN org.apache.giraph.bsp.BspService: process: Unknown and unprocessed event (path=/_hadoopBsp/job_201310130212_0013/_applicationAttemptsDir/0/_superstepDir/15/_addressesAndPartitions, type=NodeDeleted, state=SyncConnected)
>> 2013-10-14 10:24:20,431 WARN org.apache.giraph.bsp.BspService: process: Unknown and unprocessed event (path=/_hadoopBsp/job_201310130212_0013/_applicationAttemptsDir/0/_superstepDir/15/_superstepFinished, type=NodeDeleted, state=SyncConnected)
>> 2013-10-14 10:24:20,555 INFO org.apache.giraph.worker.BspServiceWorker: processEvent: Job state changed, checking to see if it needs to restart
>> 2013-10-14 10:24:20,640 INFO org.apache.giraph.bsp.BspService: getJobState: Job state already exists (/_hadoopBsp/job_201310130212_0013/_masterJobState)
>> 2013-10-14 10:24:22,928 INFO org.apache.giraph.partition.DiskBackedPartitionStore: offloadPartition: writing partition vertices 13762 to /mnt/diskg/mapred/local/taskTracker/sundi133/jobcache/job_201310130212_0013/attempt_201310130212_0013_m_000060_0/work/_bsp/_partitions/job_201310130212_0013/partition-13762_vertices
>> 2013-10-14 10:24:27,648 INFO org.apache.giraph.partition.DiskBackedPartitionStore: offloadPartition: writing partition vertices 23682 to /mnt/diskg/mapred/local/taskTracker/sundi133/jobcache/job_201310130212_0013/attempt_201310130212_0013_m_000060_0/work/_bsp/_partitions/job_201310130212_0013/partition-23682_vertices
>> 2013-10-14 10:24:30,557 INFO org.apache.giraph.partition.DiskBackedPartitionStore: offloadPartition: writing partition vertices 14882 to /mnt/diskg/mapred/local/taskTracker/sundi133/jobcache/job_201310130212_0013/attempt_201310130212_0013_m_000060_0/work/_bsp/_partitions/job_201310130212_0013/partition-14882_vertices
>> 2013-10-14 10:24:32,935 INFO org.apache.giraph.partition.DiskBackedPartitionStore: offloadPartition: writing partition vertices 11842 to /mnt/diskg/mapred/local/taskTracker/sundi133/jobcache/job_201310130212_0013/attempt_201310130212_0013_m_000060_0/work/_bsp/_partitions/job_201310130212_0013/partition-11842_vertices
>> 2013-10-14 10:24:33,714 INFO org.apache.giraph.partition.DiskBackedPartitionStore: offloadPartition: writing partition vertices 962 to /mnt/diskg/mapred/local/taskTracker/sundi133/jobcache/job_201310130212_0013/attempt_201310130212_0013_m_000060_0/work/_bsp/_partitions/job_201310130212_0013/partition-962_vertices
>> 2013-10-14 10:24:35,184 INFO org.apache.giraph.worker.BspServiceWorker: saveVertices: Saved 978047 out of 26146422 vertices, on partition 5 out of 160
>> 2013-10-14 10:24:35,187 INFO org.apache.giraph.partition.DiskBackedPartitionStore: offloadPartition: writing partition vertices 22722 to /mnt/diskg/mapred/local/taskTracker/sundi133/jobcache/job_201310130212_0013/attempt_201310130212_0013_m_000060_0/work/_bsp/_partitions/job_201310130212_0013/partition-22722_vertices
>> 2013-10-14 10:24:37,276 INFO org.apache.giraph.partition.DiskBackedPartitionStore: offloadPartition: writing partition vertices 21762 to /mnt/diskg/mapred/local/taskTracker/sundi133/jobcache/job_201310130212_0013/attempt_201310130212_0013_m_000060_0/work/_bsp/_partitions/job_201310130212_0013/partition-21762_vertices
>> 2013-10-14 10:24:39,868 INFO org.apache.giraph.partition.DiskBackedPartitionStore: offloadPartition: writing partition vertices 11362 to /mnt/diskg/mapred/local/taskTracker/sundi133/jobcache/job_201310130212_0013/attempt_201310130212_0013_m_000060_0/work/_bsp/_partitions/job_201310130212_0013/partition-11362_vertices
>> 2013-10-14 10:24:41,391 INFO org.apache.giraph.partition.DiskBackedPartitionStore: offloadPartition: writing partition vertices 482 to /mnt/diskg/mapred/local/taskTracker/sundi133/jobcache/job_201310130212_0013/attempt_201310130212_0013_m_000060_0/work/_bsp/_partitions/job_201310130212_0013/partition-482_vertices
>>
>> ------------------------------
>>
>> *The error shown on the job failure page for each attempt*
>>
>> FAILED
>>
>> Task attempt_201310130212_0013_m_000001_0 failed to report status for 7200 seconds. Killing!
>>
>> --
>> Best Regards,
>> Jyotirmoy Sundi
>> Data Engineer,
>> Admobius
>>
>> San Francisco, CA 94158
>

--
Best Regards,
Jyotirmoy Sundi
Data Engineer,
Admobius
San Francisco, CA 94158
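
For anyone comparing how far saveVertices gets before the hang across task attempts, here is a minimal sketch in Python that pulls the progress counters out of a worker log. It assumes the log contains lines in the same "saveVertices: Saved X out of Y vertices, on partition P out of Q" form as the snippet quoted above, and it tolerates the ">>" quote markers that mail clients insert. The names parse_save_progress and PROGRESS_RE are made up for this illustration and are not part of Giraph.

```python
import re

# Matches the BspServiceWorker progress line as it appears in the quoted log.
PROGRESS_RE = re.compile(
    r"saveVertices: Saved (\d+) out of (\d+) vertices, "
    r"on partition (\d+) out of (\d+)"
)


def parse_save_progress(log_text):
    """Return (saved, total, partition, partitions) tuples found in log_text."""
    # Drop mail quote markers such as '>>' and collapse all whitespace,
    # so a progress line that was wrapped in an email still matches.
    flat = re.sub(r"\s*(?:>\s*)+", " ", log_text)
    flat = re.sub(r"\s+", " ", flat)
    return [tuple(int(g) for g in m.groups()) for m in PROGRESS_RE.finditer(flat)]


if __name__ == "__main__":
    sample = (
        ">> 2013-10-14 10:24:35,184 INFO org.apache.giraph.worker.BspServiceWorker: "
        ">> saveVertices: Saved 978047 out of 26146422 vertices, on partition 5 out of "
        ">> 160"
    )
    for saved, total, part, parts in parse_save_progress(sample):
        print(f"{saved}/{total} vertices ({saved / total:.2%}), partition {part}/{parts}")
```

On the line quoted in this thread, that works out to roughly 3.7% of the worker's vertices written at partition 5 of 160, which at least confirms the stall happens very early in the output phase.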