The latest trunk compiled without my needing to change any interfaces, apart from adding a new exception to one of the classes.

On Mon, Oct 14, 2013 at 11:40 AM, Jyotirmoy Sundi <sundi...@gmail.com> wrote:

> Thanks, will try that out. Rewriting in saveVertices to match the new
> interfaces does not seem too big.
> Did you find out later what might be a potential issue for the same?
>
> Thanks
> Sund
>
>
> On Mon, Oct 14, 2013 at 11:26 AM, Manuel Lagang <manuellag...@gmail.com> wrote:
>
>> I also had the same issues when I used the out-of-core features, even for
>> trivial datasets, when I used the 1.0.0-RC3 branch. The job would seem to
>> finish all supersteps, but it would hang during the final output of data to
>> HDFS. I found that if I used the latest code in trunk instead (which
>> required some rewriting to match the new interface), then my jobs would
>> finish fine.
>>
>>
>> On Mon, Oct 14, 2013 at 11:13 AM, Jyotirmoy Sundi <sundi...@gmail.com> wrote:
>>
>>> Hi folks,
>>> We are successfully able to run Giraph for 1B vertices and
>>> around 20B edges in our cluster. This is great. But when we run it over 5B
>>> vertices over the actual data and around 50B edges we see some issues in
>>> the final step while offloading the partitions. Since the dataset is huge
>>> for our cluster, we are using giraph.useOutOfCoreGraph and
>>> giraph.useOutOfCoreMessages to spill the data when overloaded. With this
>>> setup all the supersteps finished within around 4 hours. But in the final
>>> step, after reporting "saving vertices" in the task status, it hangs after
>>> writing a few partitions. It is happening consistently in our case. I
>>> played with all the config params and nothing is helping out. Any
>>> suggestions from you will be really helpful. Thanks a lot.
>>>
>>> The log snippet:
>>>
>>> 2013-10-14 10:24:20,144 INFO org.apache.giraph.worker.BspServiceWorker: saveVertices: Starting to save 26146422 vertices
>>> 2013-10-14 10:24:20,183 INFO org.apache.giraph.partition.DiskBackedPartitionStore: offloadPartition: writing partition vertices 1922 to /mnt/diskg/mapred/local/taskTracker/sundi133/jobcache/job_201310130212_0013/attempt_201310130212_0013_m_000060_0/work/_bsp/_partitions/job_201310130212_0013/partition-1922_vertices
>>> 2013-10-14 10:24:20,307 WARN org.apache.giraph.bsp.BspService: process: Unknown and unprocessed event (path=/_hadoopBsp/job_201310130212_0013/_applicationAttemptsDir/0/_superstepDir/15/_addressesAndPartitions, type=NodeDeleted, state=SyncConnected)
>>> 2013-10-14 10:24:20,431 WARN org.apache.giraph.bsp.BspService: process: Unknown and unprocessed event (path=/_hadoopBsp/job_201310130212_0013/_applicationAttemptsDir/0/_superstepDir/15/_superstepFinished, type=NodeDeleted, state=SyncConnected)
>>> 2013-10-14 10:24:20,555 INFO org.apache.giraph.worker.BspServiceWorker: processEvent: Job state changed, checking to see if it needs to restart
>>> 2013-10-14 10:24:20,640 INFO org.apache.giraph.bsp.BspService: getJobState: Job state already exists (/_hadoopBsp/job_201310130212_0013/_masterJobState)
>>> 2013-10-14 10:24:22,928 INFO org.apache.giraph.partition.DiskBackedPartitionStore: offloadPartition: writing partition vertices 13762 to /mnt/diskg/mapred/local/taskTracker/sundi133/jobcache/job_201310130212_0013/attempt_201310130212_0013_m_000060_0/work/_bsp/_partitions/job_201310130212_0013/partition-13762_vertices
>>> 2013-10-14 10:24:27,648 INFO org.apache.giraph.partition.DiskBackedPartitionStore: offloadPartition: writing partition vertices 23682 to /mnt/diskg/mapred/local/taskTracker/sundi133/jobcache/job_201310130212_0013/attempt_201310130212_0013_m_000060_0/work/_bsp/_partitions/job_201310130212_0013/partition-23682_vertices
>>> 2013-10-14 10:24:30,557 INFO org.apache.giraph.partition.DiskBackedPartitionStore: offloadPartition: writing partition vertices 14882 to /mnt/diskg/mapred/local/taskTracker/sundi133/jobcache/job_201310130212_0013/attempt_201310130212_0013_m_000060_0/work/_bsp/_partitions/job_201310130212_0013/partition-14882_vertices
>>> 2013-10-14 10:24:32,935 INFO org.apache.giraph.partition.DiskBackedPartitionStore: offloadPartition: writing partition vertices 11842 to /mnt/diskg/mapred/local/taskTracker/sundi133/jobcache/job_201310130212_0013/attempt_201310130212_0013_m_000060_0/work/_bsp/_partitions/job_201310130212_0013/partition-11842_vertices
>>> 2013-10-14 10:24:33,714 INFO org.apache.giraph.partition.DiskBackedPartitionStore: offloadPartition: writing partition vertices 962 to /mnt/diskg/mapred/local/taskTracker/sundi133/jobcache/job_201310130212_0013/attempt_201310130212_0013_m_000060_0/work/_bsp/_partitions/job_201310130212_0013/partition-962_vertices
>>> 2013-10-14 10:24:35,184 INFO org.apache.giraph.worker.BspServiceWorker: saveVertices: Saved 978047 out of 26146422 vertices, on partition 5 out of 160
>>> 2013-10-14 10:24:35,187 INFO org.apache.giraph.partition.DiskBackedPartitionStore: offloadPartition: writing partition vertices 22722 to /mnt/diskg/mapred/local/taskTracker/sundi133/jobcache/job_201310130212_0013/attempt_201310130212_0013_m_000060_0/work/_bsp/_partitions/job_201310130212_0013/partition-22722_vertices
>>> 2013-10-14 10:24:37,276 INFO org.apache.giraph.partition.DiskBackedPartitionStore: offloadPartition: writing partition vertices 21762 to /mnt/diskg/mapred/local/taskTracker/sundi133/jobcache/job_201310130212_0013/attempt_201310130212_0013_m_000060_0/work/_bsp/_partitions/job_201310130212_0013/partition-21762_vertices
>>> 2013-10-14 10:24:39,868 INFO org.apache.giraph.partition.DiskBackedPartitionStore: offloadPartition: writing partition vertices 11362 to /mnt/diskg/mapred/local/taskTracker/sundi133/jobcache/job_201310130212_0013/attempt_201310130212_0013_m_000060_0/work/_bsp/_partitions/job_201310130212_0013/partition-11362_vertices
>>> 2013-10-14 10:24:41,391 INFO org.apache.giraph.partition.DiskBackedPartitionStore: offloadPartition: writing partition vertices 482 to /mnt/diskg/mapred/local/taskTracker/sundi133/jobcache/job_201310130212_0013/attempt_201310130212_0013_m_000060_0/work/_bsp/_partitions/job_201310130212_0013/partition-482_vertices
>>>
>>> ------------------------------
>>>
>>> *The error shown in the job failure page for each attempt*
>>>
>>> FAILED
>>>
>>> Task attempt_201310130212_0013_m_000001_0 failed to report status for 7200 seconds. Killing!
>>>
>>> --
>>> Best Regards,
>>> Jyotirmoy Sundi
>>> Data Engineer,
>>> Admobius
>>>
>>> San Francisco, CA 94158
>>>
>>
>
> --
> Best Regards,
> Jyotirmoy Sundi
> Data Engineer,
> Admobius
>
> San Francisco, CA 94158
>

--
Best Regards,
Jyotirmoy Sundi
Data Engineer,
Admobius
San Francisco, CA 94158
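
For anyone comparing runs while debugging the same hang, it may help to pull out how far saveVertices got and when the task last logged anything before the timeout. The following is only a rough sketch: it assumes raw, unwrapped task log lines in the format quoted in this thread (not the email-wrapped form), and the names `save_progress`, `PROGRESS` and `TIMESTAMP` are made up for illustration and are not part of Giraph.

```python
import re
from datetime import datetime

# Progress line format as it appears in the log snippet quoted in this thread.
PROGRESS = re.compile(
    r"saveVertices: Saved (\d+) out of (\d+) vertices, "
    r"on partition (\d+) out of (\d+)"
)
# Timestamp prefix of each log line, e.g. "2013-10-14 10:24:35,184".
TIMESTAMP = re.compile(r"\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3}")


def save_progress(log_text):
    """Summarize the last saveVertices progress report in a task log.

    Returns (fraction_saved, partition, total_partitions, last_timestamp),
    or None if the log contains no progress line.
    """
    matches = PROGRESS.findall(log_text)
    if not matches:
        return None
    saved, total, part, parts = map(int, matches[-1])
    stamps = TIMESTAMP.findall(log_text)
    # The last timestamp in the log shows when the task went quiet.
    last = (
        datetime.strptime(stamps[-1], "%Y-%m-%d %H:%M:%S,%f")
        if stamps
        else None
    )
    return saved / total, part, parts, last
```

Run against the worker log quoted in this thread, the last progress report (978047 of 26146422 vertices, partition 5 of 160) works out to roughly 3.7% of that worker's vertices saved, and the last timestamp gives a reference point against the 7200-second status timeout.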