Thanks, Hassan. I have removed checkpointing, but I am now getting a
different error.
*Script:*
hadoop jar
/usr/local/giraph.back.1.2.0/giraph-examples/target/giraph-examples-1.2.0-SNAPSHOT-for-hadoop-2.7.0-jar-with-dependencies.jar
org.apache.giraph.GiraphRunner -Dmapreduce.task.timeout=12000000
-Dmapred.job.tracker=ip-172-31-42-220.eu-west-1.compute.internal:8021
-Dmapreduce.map.memory.mb=23480 -Dmapreduce.map.java.opts=-Xmx22480m
org.apache.giraph.examples.ConnectedComponentsComputation -vif
org.apache.giraph.io.formats.IntIntNullTextInputFormat -vip /test/input_10M
-vof org.apache.giraph.io.formats.IdWithValueTextOutputFormat -op
/test/ouput_10M -w 5 -ca
giraph.userPartitionCount=150,giraph.SplitMasterWorker=true,giraph.isStaticGraph=true,giraph.maxPartitionsInMemory=5,mapred.map.max.attempts=2,giraph.maxMessagesInMemory=100,giraph.useOutOfCoreMessages=true,giraph.useOutOfCoreGraph=true
*Exception:*
2016-05-15 05:34:28,113 INFO [ooc-io-0] org.apache.giraph.ooc.OutOfCoreIOCallable: call: execution of IO command LoadPartitionIOCommand: (partitionId = 107, superstep = 0) failed!
2016-05-15 05:34:28,114 ERROR [ooc-io-0] org.apache.giraph.utils.LogStacktraceCallable: Execution of callable failed
java.lang.RuntimeException: java.io.EOFException
    at org.apache.giraph.ooc.OutOfCoreIOCallable.call(OutOfCoreIOCallable.java:76)
    at org.apache.giraph.ooc.OutOfCoreIOCallable.call(OutOfCoreIOCallable.java:30)
    at org.apache.giraph.utils.LogStacktraceCallable.call(LogStacktraceCallable.java:51)
    at java.util.concurrent.FutureTask.run(FutureTask.java:266)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    at java.lang.Thread.run(Thread.java:745)
Caused by: java.io.EOFException
    at java.io.DataInputStream.readInt(DataInputStream.java:392)
    at org.apache.hadoop.io.IntWritable.readFields(IntWritable.java:47)
    at org.apache.giraph.ooc.data.DiskBackedPartitionStore.readOutEdges(DiskBackedPartitionStore.java:286)
    at org.apache.giraph.ooc.data.DiskBackedPartitionStore.loadInMemoryPartitionData(DiskBackedPartitionStore.java:329)
    at org.apache.giraph.ooc.data.OutOfCoreDataManager.loadPartitionData(OutOfCoreDataManager.java:195)
    at org.apache.giraph.ooc.data.DiskBackedPartitionStore.loadPartitionData(DiskBackedPartitionStore.java:360)
    at org.apache.giraph.ooc.io.LoadPartitionIOCommand.execute(LoadPartitionIOCommand.java:64)
    at org.apache.giraph.ooc.OutOfCoreIOCallable.call(OutOfCoreIOCallable.java:72)
    ... 6 more
2016-05-15 05:34:28,117 INFO [ooc-io-0] org.apache.giraph.ooc.OutOfCoreIOCallableFactory: afterExecute: an out-of-core thread terminated unexpectedly with java.util.concurrent.ExecutionException: java.lang.RuntimeException: java.io.EOFException
2016-05-15 05:34:28,441 INFO [compute-0] org.apache.giraph.ooc.FixedOutOfCoreEngine: doneProcessingPartition: processing partition 117 is done!
2016-05-15 05:34:29,111 INFO [compute-0] org.apache.giraph.ooc.FixedOutOfCoreEngine: doneProcessingPartition: processing partition 27 is done!
2016-05-15 05:34:29,620 INFO [compute-0] org.apache.giraph.ooc.FixedOutOfCoreEngine: doneProcessingPartition: processing partition 127 is done!
2016-05-15 05:34:30,123 INFO [compute-0] org.apache.giraph.ooc.FixedOutOfCoreEngine: doneProcessingPartition: processing partition 22 is done!
2016-05-15 05:34:30,123 INFO [compute-0] org.apache.giraph.ooc.FixedOutOfCoreEngine: getNextPartition: waiting until a partition becomes available!
2016-05-15 05:34:31,123 ERROR [compute-0] org.apache.giraph.utils.LogStacktraceCallable: Execution of callable failed
java.lang.RuntimeException: Job Failed due to a failure in an out-of-core IO thread
    at org.apache.giraph.ooc.FixedOutOfCoreEngine.getNextPartition(FixedOutOfCoreEngine.java:81)
    at org.apache.giraph.ooc.data.DiskBackedPartitionStore.getNextPartition(DiskBackedPartitionStore.java:187)
    at org.apache.giraph.graph.ComputeCallable.call(ComputeCallable.java:153)
    at org.apache.giraph.graph.ComputeCallable.call(ComputeCallable.java:69)
    at org.apache.giraph.utils.LogStacktraceCallable.call(LogStacktraceCallable.java:51)
    at java.util.concurrent.FutureTask.run(FutureTask.java:266)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    at java.lang.Thread.run(Thread.java:745)
2016-05-15 05:34:31,124 ERROR [main] org.apache.giraph.graph.GraphMapper: Caught an unrecoverable exception Exception occurred
java.lang.IllegalStateException: Exception occurred
    at org.apache.giraph.utils.ProgressableUtils.getResultsWithNCallables(ProgressableUtils.java:253)
    at org.apache.giraph.graph.GraphTaskManager.processGraphPartitions(GraphTaskManager.java:761)
    at org.apache.giraph.graph.GraphTaskManager.execute(GraphTaskManager.java:349)
    at org.apache.giraph.graph.GraphMapper.run(GraphMapper.java:92)
    at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:787)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:341)
    at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:168)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:422)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)
    at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:162)
Caused by: java.util.concurrent.ExecutionException: java.lang.RuntimeException: Job Failed due to a failure in an out-of-core IO thread
    at java.util.concurrent.FutureTask.report(FutureTask.java:122)
    at java.util.concurrent.FutureTask.get(FutureTask.java:206)
    at org.apache.giraph.utils.ProgressableUtils.getResultsWithNCallables(ProgressableUtils.java:250)
    ... 10 more
Caused by: java.lang.RuntimeException: Job Failed due to a failure in an out-of-core IO thread
    at org.apache.giraph.ooc.FixedOutOfCoreEngine.getNextPartition(FixedOutOfCoreEngine.java:81)
    at org.apache.giraph.ooc.data.DiskBackedPartitionStore.getNextPartition(DiskBackedPartitionStore.java:187)
    at org.apache.giraph.graph.ComputeCallable.call(ComputeCallable.java:153)
    at org.apache.giraph.graph.ComputeCallable.call(ComputeCallable.java:69)
    at org.apache.giraph.utils.LogStacktraceCallable.call(LogStacktraceCallable.java:51)
    at java.util.concurrent.FutureTask.run(FutureTask.java:266)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    at java.lang.Thread.run(Thread.java:745)
2016-05-15 05:34:31,125 ERROR [main] org.apache.giraph.worker.BspServiceWorker: unregisterHealth: Got failure, unregistering health on /_hadoopBsp/job_1463146675144_0036/_applicationAttemptsDir/0/_superstepDir/0/_workerHealthyDir/ip-172-31-37-39.eu-west-1.compute.internal_2 on superstep 0
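
For reference, the innermost cause in the first trace is
DataInputStream.readInt throwing EOFException, which the JDK does when
fewer than four bytes remain in the stream. That would be consistent
with the partition data read back from disk being shorter than the
reader expects, though the trace alone does not prove why. The snippet
below is only a minimal standalone sketch of that JDK behaviour, not
Giraph code; the class name TruncatedReadDemo and the method hitsEof
are made up for illustration.

```java
import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.io.EOFException;
import java.io.IOException;

public class TruncatedReadDemo {
  /**
   * Tries to read `count` 4-byte ints from `data`.
   * Returns true if the stream ends before all of them could be read.
   */
  public static boolean hitsEof(byte[] data, int count) throws IOException {
    try (DataInputStream in =
        new DataInputStream(new ByteArrayInputStream(data))) {
      for (int i = 0; i < count; i++) {
        // readInt needs 4 bytes; it throws EOFException if fewer remain.
        in.readInt();
      }
      return false;
    } catch (EOFException e) {
      return true;
    }
  }

  public static void main(String[] args) throws IOException {
    byte[] complete = {0, 0, 0, 1, 0, 0, 0, 2}; // two full ints
    byte[] truncated = {0, 0, 0, 1, 0, 0};      // second int cut short
    System.out.println(hitsEof(complete, 2));   // prints "false"
    System.out.println(hitsEof(truncated, 2));  // prints "true"
  }
}
```

In the trace above the same call is reached through
IntWritable.readFields inside DiskBackedPartitionStore.readOutEdges, so
the reader ran out of bytes while loading partition 107.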
On Sun, May 15, 2016 at 3:54 AM, Hassan Eslami <[email protected]> wrote:
> Hi Ramesh!
>
> Thanks for bringing this up, and thanks for trying out the new out-of-core
> mechanism. The new out-of-core mechanism has not been integrated with
> checkpointing yet. This is part of an ongoing project, and we should have
> the integration within a few weeks. In the meantime, you can try
> out-of-core without checkpointing enabled.
>
> Best,
> Hassan
>
>
> On Saturday, May 14, 2016, Ramesh Krishnan <[email protected]>
> wrote:
>
>> Please find attached the correct logs for the concurrent exception.
>>
>> 2016-05-14 19:10:55,733 ERROR [ooc-io-0] org.apache.giraph.utils.LogStacktraceCallable: Execution of callable failed
>> java.lang.RuntimeException: java.io.EOFException
>>     at org.apache.giraph.ooc.OutOfCoreIOCallable.call(OutOfCoreIOCallable.java:76)
>>     at org.apache.giraph.ooc.OutOfCoreIOCallable.call(OutOfCoreIOCallable.java:30)
>>     at org.apache.giraph.utils.LogStacktraceCallable.call(LogStacktraceCallable.java:51)
>>     at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>>     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>>     at java.lang.Thread.run(Thread.java:745)
>> Caused by: java.io.EOFException
>>     at java.io.DataInputStream.readInt(DataInputStream.java:392)
>>     at org.apache.hadoop.io.IntWritable.readFields(IntWritable.java:47)
>>     at org.apache.giraph.ooc.data.DiskBackedPartitionStore.readOutEdges(DiskBackedPartitionStore.java:286)
>>     at org.apache.giraph.ooc.data.DiskBackedPartitionStore.loadInMemoryPartitionData(DiskBackedPartitionStore.java:329)
>>     at org.apache.giraph.ooc.data.OutOfCoreDataManager.loadPartitionData(OutOfCoreDataManager.java:195)
>>     at org.apache.giraph.ooc.data.DiskBackedPartitionStore.loadPartitionData(DiskBackedPartitionStore.java:360)
>>     at org.apache.giraph.ooc.io.LoadPartitionIOCommand.execute(LoadPartitionIOCommand.java:64)
>>     at org.apache.giraph.ooc.OutOfCoreIOCallable.call(OutOfCoreIOCallable.java:72)
>>     ... 6 more
>> 2016-05-14 19:10:55,737 INFO [ooc-io-0] org.apache.giraph.ooc.OutOfCoreIOCallableFactory: afterExecute: an out-of-core thread terminated unexpectedly with java.util.concurrent.ExecutionException: java.lang.RuntimeException: java.io.EOFException
>> 2016-05-14 19:10:55,739 INFO [checkpoint-vertices-7] org.apache.giraph.ooc.FixedOutOfCoreEngine: getNextPartition: waiting until a partition becomes available!
>> 2016-05-14 19:10:56,426 ERROR [checkpoint-vertices-6] org.apache.giraph.utils.LogStacktraceCallable: Execution of callable failed
>> java.lang.RuntimeException: Job Failed due to a failure in an out-of-core IO thread
>>     at org.apache.giraph.ooc.FixedOutOfCoreEngine.getNextPartition(FixedOutOfCoreEngine.java:81)
>>     at org.apache.giraph.ooc.data.DiskBackedPartitionStore.getNextPartition(DiskBackedPartitionStore.java:187)
>>     at org.apache.giraph.worker.BspServiceWorker$3$1.call(BspServiceWorker.java:1398)
>>     at org.apache.giraph.worker.BspServiceWorker$3$1.call(BspServiceWorker.java:1392)
>>     at org.apache.giraph.utils.LogStacktraceCallable.call(LogStacktraceCallable.java:51)
>>     at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>>     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>>     at java.lang.Thread.run(Thread.java:745)
>>
>>
>>
>> On Sun, May 15, 2016 at 1:02 AM, Ramesh Krishnan <[email protected]>
>> wrote:
>>
>>>
>>> Hi Team,
>>>
>>> I have the latest build of Giraph running on a 5-node cluster. When I
>>> try to use the out-of-core graph option on a huge data set (about
>>> 600 million edges), I run into the following exception. Please find
>>> below the script being executed and the exception logs. I have tried
>>> everything I could think of and could not avoid this issue, so I would
>>> really appreciate your help.
>>>
>>> *Script:*
>>> hadoop jar
>>> /usr/local/giraph.back.1.2.0/giraph-examples/target/giraph-examples-1.2.0-SNAPSHOT-for-hadoop-2.7.0-jar-with-dependencies.jar
>>> org.apache.giraph.GiraphRunner -Dmapreduce.task.timeout=12000000
>>> -Dmapred.job.tracker=ip-172-31-42-220.eu-west-1.compute.internal:8021
>>> -Dmapreduce.map.memory.mb=23480 -Dmapreduce.map.java.opts=-Xmx22480m
>>> org.apache.giraph.examples.ConnectedComponentsComputation -vif
>>> org.apache.giraph.io.formats.IntIntNullTextInputFormat -vip /test/input_10M
>>> -vof org.apache.giraph.io.formats.IdWithValueTextOutputFormat -op
>>> /test/ouput_10M -w 5 -ca
>>> giraph.userPartitionCount=150,giraph.SplitMasterWorker=true,giraph.isStaticGraph=true,giraph.maxPartitionsInMemory=10,mapred.map.max.attempts=2,giraph.maxMessagesInMemory=100,giraph.numOutputThreads=10,giraph.useOutOfCoreMessages=true,giraph.numOutputThreads=4,giraph.numInputThreads=4,giraph.useOutOfCoreGraph=true,giraph.cleanupCheckpointsAfterSuccess=true,giraph.checkpointFrequency=1
>>>
>>>
>>> Thanks,
>>> Ramesh
>>>
>>>
>>