Thanks, Hassan. I have removed checkpointing, but the job still fails, now with a different error.

*Script:*
hadoop jar /usr/local/giraph.back.1.2.0/giraph-examples/target/giraph-examples-1.2.0-SNAPSHOT-for-hadoop-2.7.0-jar-with-dependencies.jar \
  org.apache.giraph.GiraphRunner \
  -Dmapreduce.task.timeout=12000000 \
  -Dmapred.job.tracker=ip-172-31-42-220.eu-west-1.compute.internal:8021 \
  -Dmapreduce.map.memory.mb=23480 \
  -Dmapreduce.map.java.opts=-Xmx22480m \
  org.apache.giraph.examples.ConnectedComponentsComputation \
  -vif org.apache.giraph.io.formats.IntIntNullTextInputFormat \
  -vip /test/input_10M \
  -vof org.apache.giraph.io.formats.IdWithValueTextOutputFormat \
  -op /test/ouput_10M \
  -w 5 \
  -ca giraph.userPartitionCount=150,giraph.SplitMasterWorker=true,giraph.isStaticGraph=true,giraph.maxPartitionsInMemory=5,mapred.map.max.attempts=2,giraph.maxMessagesInMemory=100,giraph.useOutOfCoreMessages=true,giraph.useOutOfCoreGraph=true

*Exception:*
2016-05-15 05:34:28,113 INFO [ooc-io-0] org.apache.giraph.ooc.OutOfCoreIOCallable: call: execution of IO command LoadPartitionIOCommand: (partitionId = 107, superstep = 0) failed!
2016-05-15 05:34:28,114 ERROR [ooc-io-0] org.apache.giraph.utils.LogStacktraceCallable: Execution of callable failed
java.lang.RuntimeException: java.io.EOFException
    at org.apache.giraph.ooc.OutOfCoreIOCallable.call(OutOfCoreIOCallable.java:76)
    at org.apache.giraph.ooc.OutOfCoreIOCallable.call(OutOfCoreIOCallable.java:30)
    at org.apache.giraph.utils.LogStacktraceCallable.call(LogStacktraceCallable.java:51)
    at java.util.concurrent.FutureTask.run(FutureTask.java:266)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    at java.lang.Thread.run(Thread.java:745)
Caused by: java.io.EOFException
    at java.io.DataInputStream.readInt(DataInputStream.java:392)
    at org.apache.hadoop.io.IntWritable.readFields(IntWritable.java:47)
    at org.apache.giraph.ooc.data.DiskBackedPartitionStore.readOutEdges(DiskBackedPartitionStore.java:286)
    at org.apache.giraph.ooc.data.DiskBackedPartitionStore.loadInMemoryPartitionData(DiskBackedPartitionStore.java:329)
    at org.apache.giraph.ooc.data.OutOfCoreDataManager.loadPartitionData(OutOfCoreDataManager.java:195)
    at org.apache.giraph.ooc.data.DiskBackedPartitionStore.loadPartitionData(DiskBackedPartitionStore.java:360)
    at org.apache.giraph.ooc.io.LoadPartitionIOCommand.execute(LoadPartitionIOCommand.java:64)
    at org.apache.giraph.ooc.OutOfCoreIOCallable.call(OutOfCoreIOCallable.java:72)
    ... 6 more
2016-05-15 05:34:28,117 INFO [ooc-io-0] org.apache.giraph.ooc.OutOfCoreIOCallableFactory: afterExecute: an out-of-core thread terminated unexpectedly with java.util.concurrent.ExecutionException: java.lang.RuntimeException: java.io.EOFException
2016-05-15 05:34:28,441 INFO [compute-0] org.apache.giraph.ooc.FixedOutOfCoreEngine: doneProcessingPartition: processing partition 117 is done!
2016-05-15 05:34:29,111 INFO [compute-0] org.apache.giraph.ooc.FixedOutOfCoreEngine: doneProcessingPartition: processing partition 27 is done!
2016-05-15 05:34:29,620 INFO [compute-0] org.apache.giraph.ooc.FixedOutOfCoreEngine: doneProcessingPartition: processing partition 127 is done!
2016-05-15 05:34:30,123 INFO [compute-0] org.apache.giraph.ooc.FixedOutOfCoreEngine: doneProcessingPartition: processing partition 22 is done!
2016-05-15 05:34:30,123 INFO [compute-0] org.apache.giraph.ooc.FixedOutOfCoreEngine: getNextPartition: waiting until a partition becomes available!
2016-05-15 05:34:31,123 ERROR [compute-0] org.apache.giraph.utils.LogStacktraceCallable: Execution of callable failed
java.lang.RuntimeException: Job Failed due to a failure in an out-of-core IO thread
    at org.apache.giraph.ooc.FixedOutOfCoreEngine.getNextPartition(FixedOutOfCoreEngine.java:81)
    at org.apache.giraph.ooc.data.DiskBackedPartitionStore.getNextPartition(DiskBackedPartitionStore.java:187)
    at org.apache.giraph.graph.ComputeCallable.call(ComputeCallable.java:153)
    at org.apache.giraph.graph.ComputeCallable.call(ComputeCallable.java:69)
    at org.apache.giraph.utils.LogStacktraceCallable.call(LogStacktraceCallable.java:51)
    at java.util.concurrent.FutureTask.run(FutureTask.java:266)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    at java.lang.Thread.run(Thread.java:745)
2016-05-15 05:34:31,124 ERROR [main] org.apache.giraph.graph.GraphMapper: Caught an unrecoverable exception Exception occurred
java.lang.IllegalStateException: Exception occurred
    at org.apache.giraph.utils.ProgressableUtils.getResultsWithNCallables(ProgressableUtils.java:253)
    at org.apache.giraph.graph.GraphTaskManager.processGraphPartitions(GraphTaskManager.java:761)
    at org.apache.giraph.graph.GraphTaskManager.execute(GraphTaskManager.java:349)
    at org.apache.giraph.graph.GraphMapper.run(GraphMapper.java:92)
    at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:787)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:341)
    at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:168)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:422)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)
    at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:162)
Caused by: java.util.concurrent.ExecutionException: java.lang.RuntimeException: Job Failed due to a failure in an out-of-core IO thread
    at java.util.concurrent.FutureTask.report(FutureTask.java:122)
    at java.util.concurrent.FutureTask.get(FutureTask.java:206)
    at org.apache.giraph.utils.ProgressableUtils.getResultsWithNCallables(ProgressableUtils.java:250)
    ... 10 more
Caused by: java.lang.RuntimeException: Job Failed due to a failure in an out-of-core IO thread
    at org.apache.giraph.ooc.FixedOutOfCoreEngine.getNextPartition(FixedOutOfCoreEngine.java:81)
    at org.apache.giraph.ooc.data.DiskBackedPartitionStore.getNextPartition(DiskBackedPartitionStore.java:187)
    at org.apache.giraph.graph.ComputeCallable.call(ComputeCallable.java:153)
    at org.apache.giraph.graph.ComputeCallable.call(ComputeCallable.java:69)
    at org.apache.giraph.utils.LogStacktraceCallable.call(LogStacktraceCallable.java:51)
    at java.util.concurrent.FutureTask.run(FutureTask.java:266)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    at java.lang.Thread.run(Thread.java:745)
2016-05-15 05:34:31,125 ERROR [main] org.apache.giraph.worker.BspServiceWorker: unregisterHealth: Got failure, unregistering health on /_hadoopBsp/job_1463146675144_0036/_applicationAttemptsDir/0/_superstepDir/0/_workerHealthyDir/ip-172-31-37-39.eu-west-1.compute.internal_2 on superstep 0

On Sun, May 15, 2016 at 3:54 AM, Hassan Eslami <hsn.esl...@gmail.com> wrote:

> Hi Ramesh!
>
> Thanks for bringing this up, and thanks for trying out the new out-of-core
> mechanism.
> The new out-of-core mechanism has not been integrated with
> checkpointing yet. This is part of an ongoing project, and we should have
> the integration within a few weeks. In the meantime, you can try
> out-of-core without checkpointing enabled.
>
> Best,
> Hassan
>
>
> On Saturday, May 14, 2016, Ramesh Krishnan <ramesh.154...@gmail.com>
> wrote:
>
>> PFA the correct logs for the concurrent exception
>>
>> 2016-05-14 19:10:55,733 ERROR [ooc-io-0] org.apache.giraph.utils.LogStacktraceCallable: Execution of callable failed
>> java.lang.RuntimeException: java.io.EOFException
>>     at org.apache.giraph.ooc.OutOfCoreIOCallable.call(OutOfCoreIOCallable.java:76)
>>     at org.apache.giraph.ooc.OutOfCoreIOCallable.call(OutOfCoreIOCallable.java:30)
>>     at org.apache.giraph.utils.LogStacktraceCallable.call(LogStacktraceCallable.java:51)
>>     at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>>     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>>     at java.lang.Thread.run(Thread.java:745)
>> Caused by: java.io.EOFException
>>     at java.io.DataInputStream.readInt(DataInputStream.java:392)
>>     at org.apache.hadoop.io.IntWritable.readFields(IntWritable.java:47)
>>     at org.apache.giraph.ooc.data.DiskBackedPartitionStore.readOutEdges(DiskBackedPartitionStore.java:286)
>>     at org.apache.giraph.ooc.data.DiskBackedPartitionStore.loadInMemoryPartitionData(DiskBackedPartitionStore.java:329)
>>     at org.apache.giraph.ooc.data.OutOfCoreDataManager.loadPartitionData(OutOfCoreDataManager.java:195)
>>     at org.apache.giraph.ooc.data.DiskBackedPartitionStore.loadPartitionData(DiskBackedPartitionStore.java:360)
>>     at org.apache.giraph.ooc.io.LoadPartitionIOCommand.execute(LoadPartitionIOCommand.java:64)
>>     at org.apache.giraph.ooc.OutOfCoreIOCallable.call(OutOfCoreIOCallable.java:72)
>>     ... 6 more
>> 2016-05-14 19:10:55,737 INFO [ooc-io-0] org.apache.giraph.ooc.OutOfCoreIOCallableFactory: afterExecute: an out-of-core thread terminated unexpectedly with java.util.concurrent.ExecutionException: java.lang.RuntimeException: java.io.EOFException
>> 2016-05-14 19:10:55,739 INFO [checkpoint-vertices-7] org.apache.giraph.ooc.FixedOutOfCoreEngine: getNextPartition: waiting until a partition becomes available!
>> 2016-05-14 19:10:56,426 ERROR [checkpoint-vertices-6] org.apache.giraph.utils.LogStacktraceCallable: Execution of callable failed
>> java.lang.RuntimeException: Job Failed due to a failure in an out-of-core IO thread
>>     at org.apache.giraph.ooc.FixedOutOfCoreEngine.getNextPartition(FixedOutOfCoreEngine.java:81)
>>     at org.apache.giraph.ooc.data.DiskBackedPartitionStore.getNextPartition(DiskBackedPartitionStore.java:187)
>>     at org.apache.giraph.worker.BspServiceWorker$3$1.call(BspServiceWorker.java:1398)
>>     at org.apache.giraph.worker.BspServiceWorker$3$1.call(BspServiceWorker.java:1392)
>>     at org.apache.giraph.utils.LogStacktraceCallable.call(LogStacktraceCallable.java:51)
>>     at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>>     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>>     at java.lang.Thread.run(Thread.java:745)
>>
>>
>>
>> On Sun, May 15, 2016 at 1:02 AM, Ramesh Krishnan <ramesh.154...@gmail.com> wrote:
>>
>>> Hi Team,
>>>
>>> I have the latest build of Giraph running on a 5-node cluster. When I
>>> try to use the out-of-core graph option for a huge data set (about 600 million
>>> edges), I run into the following exception. Please find below the script being
>>> executed and the exception logs. I have tried every approach I could think of
>>> and could not avoid this issue. I really need your help.
>>>
>>> *Script:*
>>> hadoop jar /usr/local/giraph.back.1.2.0/giraph-examples/target/giraph-examples-1.2.0-SNAPSHOT-for-hadoop-2.7.0-jar-with-dependencies.jar \
>>>   org.apache.giraph.GiraphRunner \
>>>   -Dmapreduce.task.timeout=12000000 \
>>>   -Dmapred.job.tracker=ip-172-31-42-220.eu-west-1.compute.internal:8021 \
>>>   -Dmapreduce.map.memory.mb=23480 \
>>>   -Dmapreduce.map.java.opts=-Xmx22480m \
>>>   org.apache.giraph.examples.ConnectedComponentsComputation \
>>>   -vif org.apache.giraph.io.formats.IntIntNullTextInputFormat \
>>>   -vip /test/input_10M \
>>>   -vof org.apache.giraph.io.formats.IdWithValueTextOutputFormat \
>>>   -op /test/ouput_10M \
>>>   -w 5 \
>>>   -ca giraph.userPartitionCount=150,giraph.SplitMasterWorker=true,giraph.isStaticGraph=true,giraph.maxPartitionsInMemory=10,mapred.map.max.attempts=2,giraph.maxMessagesInMemory=100,giraph.numOutputThreads=10,giraph.useOutOfCoreMessages=true,giraph.numOutputThreads=4,giraph.numInputThreads=4,giraph.useOutOfCoreGraph=true,giraph.cleanupCheckpointsAfterSuccess=true,giraph.checkpointFrequency=1
>>>
>>>
>>> *Exception:*
>>> hadoop jar /usr/local/giraph.back.1.2.0/giraph-examples/target/giraph-examples-1.2.0-SNAPSHOT-for-hadoop-2.7.0-jar-with-dependencies.jar \
>>>   org.apache.giraph.GiraphRunner \
>>>   -Dmapreduce.task.timeout=12000000 \
>>>   -Dmapred.job.tracker=ip-172-31-42-220.eu-west-1.compute.internal:8021 \
>>>   -Dmapreduce.map.memory.mb=23480 \
>>>   -Dmapreduce.map.java.opts=-Xmx22480m \
>>>   org.apache.giraph.examples.ConnectedComponentsComputation \
>>>   -vif org.apache.giraph.io.formats.IntIntNullTextInputFormat \
>>>   -vip /test/input_10M \
>>>   -vof org.apache.giraph.io.formats.IdWithValueTextOutputFormat \
>>>   -op /test/ouput_10M \
>>>   -w 5 \
>>>   -ca giraph.userPartitionCount=150,giraph.SplitMasterWorker=true,giraph.isStaticGraph=true,giraph.maxPartitionsInMemory=10,mapred.map.max.attempts=2,giraph.maxMessagesInMemory=100,giraph.numOutputThreads=10,giraph.useOutOfCoreMessages=true,giraph.numOutputThreads=4,giraph.numInputThreads=4,giraph.useOutOfCoreGraph=true,giraph.cleanupCheckpointsAfterSuccess=true,giraph.checkpointFrequency=1
>>>
>>> *thanks*
>>>
>>> *Ramesh*
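
A note on the innermost cause in the traces above: both runs bottom out in java.io.DataInputStream.readInt, reached through IntWritable.readFields while DiskBackedPartitionStore.readOutEdges is loading a partition. readInt throws EOFException when the stream ends before four bytes are available. The standalone Java sketch below shows only that behaviour. It is not Giraph code: the class and method names (TruncatedIntStreamSketch, writeInts, countReadable) are hypothetical, and the "count header followed by values" layout is an assumption made for illustration, not Giraph's actual on-disk format. Whether a short or partially written file is what happened in this job is not established by the logs.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.EOFException;
import java.io.IOException;

// Hypothetical illustration only; not part of Giraph or Hadoop.
public class TruncatedIntStreamSketch {

    // Writes a header saying `declared` ints follow, then writes only `written` ints.
    // Passing written < declared simulates a stream that is shorter than its header claims.
    static byte[] writeInts(int declared, int written) throws IOException {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(buffer)) {
            out.writeInt(declared);
            for (int i = 0; i < written; i++) {
                out.writeInt(i);
            }
        }
        return buffer.toByteArray();
    }

    // Reads the header, then tries to read that many ints.
    // Returns how many ints were read before the stream ended.
    static int countReadable(byte[] data) throws IOException {
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(data))) {
            int declared = in.readInt();
            int read = 0;
            try {
                while (read < declared) {
                    in.readInt(); // throws EOFException if fewer than 4 bytes remain
                    read++;
                }
            } catch (EOFException e) {
                // The stream ended before the declared number of ints was read.
            }
            return read;
        }
    }

    public static void main(String[] args) throws IOException {
        System.out.println("complete stream:  " + countReadable(writeInts(3, 3)));
        System.out.println("truncated stream: " + countReadable(writeInts(3, 2)));
    }
}
```

In the sketch the reader swallows the EOFException in order to report a count. In the traces above the exception is not handled at that level, so it propagates out of the out-of-core IO thread and the compute thread then fails with "Job Failed due to a failure in an out-of-core IO thread".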