The latest trunk compiled without needing to change any interfaces, apart
from adding a new exception to one of the classes.
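
For anyone debugging the same hang, here is a minimal sketch (not part of
Giraph) that pulls the last "saveVertices: Saved ... out of ..." progress
line from a task log, to show how far the save got before the task stopped
reporting status. It assumes raw task log text without mail quote markers
and the message format seen in the log snippet quoted below; the function
name last_save_progress is hypothetical.

```python
import re

# Progress line format as it appears in the quoted BspServiceWorker log, e.g.
# "saveVertices: Saved 978047 out of 26146422 vertices, on partition 5 out of 160"
# \s+ is used between words so that an entry wrapped across lines still matches.
PROGRESS = re.compile(
    r"saveVertices:\s+Saved\s+(\d+)\s+out\s+of\s+(\d+)\s+vertices,"
    r"\s+on\s+partition\s+(\d+)\s+out\s+of\s+(\d+)"
)


def last_save_progress(log_text):
    """Return (saved, total, partition, partitions) from the last
    progress line in log_text, or None if there is no such line."""
    matches = PROGRESS.findall(log_text)
    if not matches:
        return None
    return tuple(int(value) for value in matches[-1])


if __name__ == "__main__":
    sample = (
        "2013-10-14 10:24:35,184 INFO org.apache.giraph.worker.BspServiceWorker: "
        "saveVertices: Saved 978047 out of 26146422 vertices, on partition 5 out of 160"
    )
    saved, total, part, parts = last_save_progress(sample)
    print(f"{saved / total:.1%} of vertices saved, at partition {part} of {parts}")
```

Running this over the log of each failed attempt should show whether every
worker stalls at roughly the same point or only some of them do.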


On Mon, Oct 14, 2013 at 11:40 AM, Jyotirmoy Sundi <sundi...@gmail.com> wrote:

> Thanks, will try that out. The rewriting in saveVertices to match the new
> interfaces does not seem too big.
> Did you find out later what the potential cause of the issue might be?
>
> Thanks
> Sund
>
>
> On Mon, Oct 14, 2013 at 11:26 AM, Manuel Lagang <manuellag...@gmail.com> wrote:
>
>> I had the same issue with the out-of-core features on the 1.0.0-RC3
>> branch, even for trivial datasets. The job would seem to finish all
>> supersteps, but it would hang during the final output of data to HDFS.
>> I found that if I used the latest code in trunk instead (which required
>> some rewriting to match the new interface), then my jobs would finish
>> fine.
>>
>>
>> On Mon, Oct 14, 2013 at 11:13 AM, Jyotirmoy Sundi <sundi...@gmail.com> wrote:
>>
>>> Hi folks,
>>>           We are successfully able to run Giraph for 1B vertices and
>>> around 20B edges on our cluster, which is great. But when we run it over
>>> the actual data, with 5B vertices and around 50B edges, we see some issues
>>> in the final step while offloading the partitions. Since the dataset is
>>> huge for our cluster, we are using giraph.useOutOfCoreGraph and
>>> giraph.useOutOfCoreMessages to spill the data when overloaded. With this
>>> setup all the supersteps finished within around 4 hours. But in the final
>>> step, after reporting "saving vertices" in the task status, it hangs after
>>> writing a few partitions. This happens consistently in our case. I played
>>> with all the config params and nothing helps, so any suggestions from you
>>> would be really helpful. Thanks a lot.
>>>
>>>  The log snippet:
>>>
>>> 2013-10-14 10:24:20,144 INFO org.apache.giraph.worker.BspServiceWorker: 
>>> saveVertices: Starting to save 26146422 vertices
>>> 2013-10-14 10:24:20,183 INFO 
>>> org.apache.giraph.partition.DiskBackedPartitionStore: offloadPartition: 
>>> writing partition vertices 1922 to 
>>> /mnt/diskg/mapred/local/taskTracker/sundi133/jobcache/job_201310130212_0013/attempt_201310130212_0013_m_000060_0/work/_bsp/_partitions/job_201310130212_0013/partition-1922_vertices
>>> 2013-10-14 10:24:20,307 WARN org.apache.giraph.bsp.BspService: process: 
>>> Unknown and unprocessed event 
>>> (path=/_hadoopBsp/job_201310130212_0013/_applicationAttemptsDir/0/_superstepDir/15/_addressesAndPartitions,
>>>  type=NodeDeleted, state=SyncConnected)
>>> 2013-10-14 10:24:20,431 WARN org.apache.giraph.bsp.BspService: process: 
>>> Unknown and unprocessed event 
>>> (path=/_hadoopBsp/job_201310130212_0013/_applicationAttemptsDir/0/_superstepDir/15/_superstepFinished,
>>>  type=NodeDeleted, state=SyncConnected)
>>> 2013-10-14 10:24:20,555 INFO org.apache.giraph.worker.BspServiceWorker: 
>>> processEvent: Job state changed, checking to see if it needs to restart
>>> 2013-10-14 10:24:20,640 INFO org.apache.giraph.bsp.BspService: getJobState: 
>>> Job state already exists (/_hadoopBsp/job_201310130212_0013/_masterJobState)
>>> 2013-10-14 10:24:22,928 INFO 
>>> org.apache.giraph.partition.DiskBackedPartitionStore: offloadPartition: 
>>> writing partition vertices 13762 to 
>>> /mnt/diskg/mapred/local/taskTracker/sundi133/jobcache/job_201310130212_0013/attempt_201310130212_0013_m_000060_0/work/_bsp/_partitions/job_201310130212_0013/partition-13762_vertices
>>> 2013-10-14 10:24:27,648 INFO 
>>> org.apache.giraph.partition.DiskBackedPartitionStore: offloadPartition: 
>>> writing partition vertices 23682 to 
>>> /mnt/diskg/mapred/local/taskTracker/sundi133/jobcache/job_201310130212_0013/attempt_201310130212_0013_m_000060_0/work/_bsp/_partitions/job_201310130212_0013/partition-23682_vertices
>>> 2013-10-14 10:24:30,557 INFO 
>>> org.apache.giraph.partition.DiskBackedPartitionStore: offloadPartition: 
>>> writing partition vertices 14882 to 
>>> /mnt/diskg/mapred/local/taskTracker/sundi133/jobcache/job_201310130212_0013/attempt_201310130212_0013_m_000060_0/work/_bsp/_partitions/job_201310130212_0013/partition-14882_vertices
>>> 2013-10-14 10:24:32,935 INFO 
>>> org.apache.giraph.partition.DiskBackedPartitionStore: offloadPartition: 
>>> writing partition vertices 11842 to 
>>> /mnt/diskg/mapred/local/taskTracker/sundi133/jobcache/job_201310130212_0013/attempt_201310130212_0013_m_000060_0/work/_bsp/_partitions/job_201310130212_0013/partition-11842_vertices
>>> 2013-10-14 10:24:33,714 INFO 
>>> org.apache.giraph.partition.DiskBackedPartitionStore: offloadPartition: 
>>> writing partition vertices 962 to 
>>> /mnt/diskg/mapred/local/taskTracker/sundi133/jobcache/job_201310130212_0013/attempt_201310130212_0013_m_000060_0/work/_bsp/_partitions/job_201310130212_0013/partition-962_vertices
>>> 2013-10-14 10:24:35,184 INFO org.apache.giraph.worker.BspServiceWorker: 
>>> saveVertices: Saved 978047 out of 26146422 vertices, on partition 5 out of 
>>> 160
>>> 2013-10-14 10:24:35,187 INFO 
>>> org.apache.giraph.partition.DiskBackedPartitionStore: offloadPartition: 
>>> writing partition vertices 22722 to 
>>> /mnt/diskg/mapred/local/taskTracker/sundi133/jobcache/job_201310130212_0013/attempt_201310130212_0013_m_000060_0/work/_bsp/_partitions/job_201310130212_0013/partition-22722_vertices
>>> 2013-10-14 10:24:37,276 INFO 
>>> org.apache.giraph.partition.DiskBackedPartitionStore: offloadPartition: 
>>> writing partition vertices 21762 to 
>>> /mnt/diskg/mapred/local/taskTracker/sundi133/jobcache/job_201310130212_0013/attempt_201310130212_0013_m_000060_0/work/_bsp/_partitions/job_201310130212_0013/partition-21762_vertices
>>> 2013-10-14 10:24:39,868 INFO 
>>> org.apache.giraph.partition.DiskBackedPartitionStore: offloadPartition: 
>>> writing partition vertices 11362 to 
>>> /mnt/diskg/mapred/local/taskTracker/sundi133/jobcache/job_201310130212_0013/attempt_201310130212_0013_m_000060_0/work/_bsp/_partitions/job_201310130212_0013/partition-11362_vertices
>>> 2013-10-14 10:24:41,391 INFO 
>>> org.apache.giraph.partition.DiskBackedPartitionStore: offloadPartition: 
>>> writing partition vertices 482 to 
>>> /mnt/diskg/mapred/local/taskTracker/sundi133/jobcache/job_201310130212_0013/attempt_201310130212_0013_m_000060_0/work/_bsp/_partitions/job_201310130212_0013/partition-482_vertices
>>>
>>> ------------------------------
>>>
>>>
>>> *The error shown on the job failure page for each attempt*
>>>
>>>
>>>
>>> FAILED
>>>
>>>
>>> Task attempt_201310130212_0013_m_000001_0 failed to report status for 7200 
>>> seconds. Killing!
>>>
>>>
>>> --
>>> Best Regards,
>>> Jyotirmoy Sundi
>>> Data Engineer,
>>> Admobius
>>>
>>> San Francisco, CA 94158
>>>
>>
>>
>
>



-- 
Best Regards,
Jyotirmoy Sundi
Data Engineer,
Admobius

San Francisco, CA 94158
