Hi Claudio,

As I mentioned in my previous posts, I was hitting an error while running the Giraph job with the checkpointing feature turned on. I was able to fix one of the errors, shown below:
Task Id : attempt_201401310947_0001_m_000001_0, Status : FAILED
org.apache.hadoop.ipc.RemoteException: java.io.IOException: File /user/hduser/_bsp/_checkpoints/job_201401310947_0001/4.kanha-Vostro-1014_1.metadata could only be replicated to 0 nodes, instead of 1

Then I ran the Giraph job again, and this time it failed with the following error:

14/02/01 23:12:33 INFO job.GiraphJob: run: Tracking URL: http://localhost:50030/jobdetails.jsp?jobid=job_201402012227_0003
14/02/01 23:12:58 INFO job.HaltApplicationUtils$DefaultHaltInstructionsWriter: writeHaltInstructions: To halt after next superstep execute: 'bin/halt-application --zkServer kanha-Vostro-1014:22181 --zkNode /_hadoopBsp/job_201402012227_0003/_haltComputation'
14/02/01 23:12:58 INFO mapred.JobClient: Running job: job_201402012227_0003
14/02/01 23:12:59 INFO mapred.JobClient:  map 50% reduce 0%
14/02/01 23:13:02 INFO mapred.JobClient:  map 100% reduce 0%
14/02/01 23:13:30 INFO mapred.JobClient:  map 50% reduce 0%
14/02/01 23:13:38 INFO mapred.JobClient: Task Id : attempt_201402012227_0003_m_000000_0, Status : FAILED
java.lang.Throwable: Child Error
        at org.apache.hadoop.mapred.TaskRunner.run(TaskRunner.java:271)
Caused by: java.io.IOException: Task process exit with nonzero status of 1.
        at org.apache.hadoop.mapred.TaskRunner.run(TaskRunner.java:258)
attempt_201402012227_0003_m_000000_0: SLF4J: Class path contains multiple SLF4J bindings.
attempt_201402012227_0003_m_000000_0: SLF4J: Found binding in [file:/app/hadoop/tmp/mapred/local/taskTracker/hduser/jobcache/job_201402012227_0003/jars/org/slf4j/impl/StaticLoggerBinder.class]
attempt_201402012227_0003_m_000000_0: SLF4J: Found binding in [jar:file:/usr/local/hadoop/lib/slf4j-log4j12-1.4.3.jar!/org/slf4j/impl/StaticLoggerBinder.class]
attempt_201402012227_0003_m_000000_0: SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
attempt_201402012227_0003_m_000000_0: SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
14/02/01 23:13:54 INFO mapred.JobClient:  map 100% reduce 0%
14/02/01 23:23:48 INFO mapred.JobClient: Task Id : attempt_201402012227_0003_m_000001_0, Status : FAILED
java.lang.IllegalStateException: run: Caught an unrecoverable exception createExt: Failed to create /_hadoopBsp/job_201402012227_0003/_applicationAttemptsDir/0/_superstepDir/2/_workerFinishedDir/kanha-Vostro-1014_1 after 3 tries!
        at org.apache.giraph.graph.GraphMapper.run(GraphMapper.java:101)
        at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:763)
        at org.apache.hadoop.mapred.MapTask.run(MapTask.java:369)
        at org.apache.hadoop.mapred.Child$4.run(Child.java:259)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:415)
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1059)
        at org.apache.hadoop.mapred.Child.main(Child.java:253)
Caused by: java.lang.IllegalStateException: createExt: Failed to create /_hadoopBsp/job_201402012227_0003/_applicationAttemptsDir/0/_superstepDir/2/_workerFinishedDir/kanha-Vostro-1014_1 after 3 tries!
        at org.apache.giraph.zk.ZooKeeperExt.createExt(ZooKeeperExt.java:182)
        at org.apache.giraph.worker.BspServiceWorker.writeFinshedSuperstepInfoToZK(BspServiceWorker.java:899)
        at org.apache.giraph.worker.BspServiceWorker.finishSuperstep(BspServiceWorker.java:769)
        at org.apache.giraph.graph.GraphTaskManager.completeSuperstepAndCollectStats(GraphTaskManager.java:398)
        at org.apache.giraph.graph.GraphTaskManager.execute(GraphTaskManager.java:289)
        at org.apache.giraph.graph.GraphMapper.run(GraphMapper.java:91)
        ... 7 more
Task attempt_201402012227_0003_m_000001_0 failed to report status for 601 seconds. Killing!
attempt_201402012227_0003_m_000001_0: SLF4J: Class path contains multiple SLF4J bindings.
attempt_201402012227_0003_m_000001_0: SLF4J: Found binding in [file:/app/hadoop/tmp/mapred/local/taskTracker/hduser/jobcache/job_201402012227_0003/jars/org/slf4j/impl/StaticLoggerBinder.class]
attempt_201402012227_0003_m_000001_0: SLF4J: Found binding in [jar:file:/usr/local/hadoop/lib/slf4j-log4j12-1.4.3.jar!/org/slf4j/impl/StaticLoggerBinder.class]
attempt_201402012227_0003_m_000001_0: SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
attempt_201402012227_0003_m_000001_0: SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
attempt_201402012227_0003_m_000001_0: log4j:WARN No appenders could be found for logger (org.apache.zookeeper.ClientCnxn).
attempt_201402012227_0003_m_000001_0: log4j:WARN Please initialize the log4j system properly.
14/02/01 23:23:49 INFO mapred.JobClient:  map 50% reduce 0%
14/02/01 23:24:03 INFO mapred.JobClient: Job complete: job_201402012227_0003
14/02/01 23:24:03 INFO mapred.JobClient: Counters: 5
14/02/01 23:24:03 INFO mapred.JobClient:   Job Counters
14/02/01 23:24:03 INFO mapred.JobClient:     SLOTS_MILLIS_MAPS=1295731
14/02/01 23:24:03 INFO mapred.JobClient:     Total time spent by all reduces waiting after reserving slots (ms)=0
14/02/01 23:24:03 INFO mapred.JobClient:     Total time spent by all maps waiting after reserving slots (ms)=0
14/02/01 23:24:03 INFO mapred.JobClient:     Launched map tasks=4
14/02/01 23:24:03 INFO mapred.JobClient:     SLOTS_MILLIS_REDUCES=0
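Since the earlier "could only be replicated to 0 nodes" error looked like the DataNode running short on disk space (see my mail below), I want to rule that out again before digging into the ZooKeeper createExt failures. This is roughly what I plan to run to check it; the /app/hadoop/tmp path is just where the Hadoop temp/data directories live on my machine, so it may differ on other setups:

    # Check that NameNode, DataNode, JobTracker and TaskTracker are all running
    jps

    # Ask HDFS how much capacity and remaining space the DataNode reports
    hadoop dfsadmin -report

    # Check free space on the partition holding the Hadoop data directories
    df -h /app/hadoop/tmp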
Seeking your suggestion.

Jyoti


On Fri, Jan 31, 2014 at 12:41 PM, Jyoti Yadav <rao.jyoti26ya...@gmail.com> wrote:

> Thanks Claudio for your reply..
> I think it is the problem due to less hard disk space.
> /app/hadoop/tmp/dfs/name/data this directory is almost full..
>
> Should i format my namenode?? Will it create any problem??
> I know if i format ,i will lose all my data residing in hdfs.
> Before formatting it,i will take backup of all the input files used to run
> giraph job..
>
> Seeking your suggestions..
> Thanks
>
>
> On Fri, Jan 31, 2014 at 10:47 AM, Claudio Martella <
> claudio.marte...@gmail.com> wrote:
>
>>
>> On Fri, Jan 31, 2014 at 5:58 AM, Jyoti Yadav <rao.jyoti26ya...@gmail.com> wrote:
>>
>>> could only be replicated to 0 nodes, instead of 1
>>
>>
>> this is not a problem related to giraph, but to hdfs. please see
>> http://wiki.apache.org/hadoop/CouldOnlyBeReplicatedTo
>>
>>
>> --
>> Claudio Martella
>>
>
>
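P.S. For the backup of the input files that I mentioned in my earlier mail quoted above, this is roughly what I had in mind before formatting anything; the "input" directory name is only an example from my setup:

    # Make a local directory to hold the backup
    mkdir -p /home/hduser/hdfs-backup

    # Copy the Giraph input files out of HDFS onto the local disk
    hadoop fs -copyToLocal /user/hduser/input /home/hduser/hdfs-backup/

    # Verify the copy before touching the namenode
    ls -R /home/hduser/hdfs-backup/input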