This looks more like the ZooKeeper/YARN issues mentioned in the past. Unfortunately, I do not have a YARN instance to test this with. Does anyone else have any insights here?

On 1/10/14 1:48 PM, Kristen Hardwick wrote:
Hi all, I'm requesting help again! I'm trying to get this SimpleShortestPathsComputation example working, but I'm stuck again. Now the job begins to run and seems to work until the final step (it performs 3 supersteps), but the overall job is failing.

In the master, among other things, I see:

...
14/01/10 15:04:17 INFO master.MasterThread: setup: Took 0.87 seconds.
14/01/10 15:04:17 INFO master.MasterThread: input superstep: Took 0.708 seconds.
14/01/10 15:04:17 INFO master.MasterThread: superstep 0: Took 0.158 seconds.
14/01/10 15:04:17 INFO master.MasterThread: superstep 1: Took 0.344 seconds.
14/01/10 15:04:17 INFO master.MasterThread: superstep 2: Took 0.064 seconds.
14/01/10 15:04:17 INFO master.MasterThread: shutdown: Took 0.162 seconds.
14/01/10 15:04:17 INFO master.MasterThread: total: Took 2.31 seconds.
14/01/10 15:04:17 INFO yarn.GiraphYarnTask: Master is ready to commit final job output data.
14/01/10 15:04:18 INFO yarn.GiraphYarnTask: Master has committed the final job output data.
...

To me, that looks promising - like the job was successful. However, in the WORKER_ONLY containers, I see these things:

...
14/01/10 15:04:17 INFO graph.GraphTaskManager: cleanup: Starting for WORKER_ONLY
14/01/10 15:04:17 WARN bsp.BspService: process: Unknown and unprocessed event (path=/_hadoopBsp/giraph_yarn_application_1389300168420_0024/_applicationAttemptsDir/0/_superstepDir/1/_addressesAndPartitions, type=NodeDeleted, state=SyncConnected)
14/01/10 15:04:17 INFO worker.BspServiceWorker: processEvent : partitionExchangeChildrenChanged (at least one worker is done sending partitions)
14/01/10 15:04:17 WARN bsp.BspService: process: Unknown and unprocessed event (path=/_hadoopBsp/giraph_yarn_application_1389300168420_0024/_applicationAttemptsDir/0/_superstepDir/1/_superstepFinished, type=NodeDeleted, state=SyncConnected)
14/01/10 15:04:17 INFO netty.NettyClient: stop: reached wait threshold, 1 connections closed, releasing NettyClient.bootstrap resources now.
14/01/10 15:04:17 INFO worker.BspServiceWorker: processEvent: Job state changed, checking to see if it needs to restart
14/01/10 15:04:17 INFO bsp.BspService: getJobState: Job state already exists (/_hadoopBsp/giraph_yarn_application_1389300168420_0024/_masterJobState)
14/01/10 15:04:17 INFO yarn.GiraphYarnTask: [STATUS: task-1] saveVertices: Starting to save 2 vertices using 1 threads
14/01/10 15:04:17 INFO worker.BspServiceWorker: saveVertices: Starting to save 2 vertices using 1 threads
14/01/10 15:04:17 INFO worker.BspServiceWorker: processEvent: Job state changed, checking to see if it needs to restart
14/01/10 15:04:17 INFO bsp.BspService: getJobState: Job state already exists (/_hadoopBsp/giraph_yarn_application_1389300168420_0024/_masterJobState)
14/01/10 15:04:17 INFO bsp.BspService: getJobState: Job state path is empty! - /_hadoopBsp/giraph_yarn_application_1389300168420_0024/_masterJobState
14/01/10 15:04:17 ERROR zookeeper.ClientCnxn: Error while calling watcher
java.lang.NullPointerException
        at java.io.StringReader.<init>(StringReader.java:50)
        at org.json.JSONTokener.<init>(JSONTokener.java:66)
        at org.json.JSONObject.<init>(JSONObject.java:402)
        at org.apache.giraph.bsp.BspService.getJobState(BspService.java:716)
        at org.apache.giraph.worker.BspServiceWorker.processEvent(BspServiceWorker.java:1563)
        at org.apache.giraph.bsp.BspService.process(BspService.java:1095)
        at org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:519)
        at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:495)
14/01/10 15:04:17 WARN bsp.BspService: process: Unknown and unprocessed event (path=/_hadoopBsp/giraph_yarn_application_1389300168420_0024/_vertexInputSplitsAllReady, type=NodeDeleted, state=SyncConnected)
14/01/10 15:04:17 WARN bsp.BspService: process: Unknown and unprocessed event (path=/_hadoopBsp/giraph_yarn_application_1389300168420_0024/_applicationAttemptsDir/0/_superstepDir/2/_addressesAndPartitions, type=NodeDeleted, state=SyncConnected)
14/01/10 15:04:17 INFO worker.BspServiceWorker: processEvent : partitionExchangeChildrenChanged (at least one worker is done sending partitions)
14/01/10 15:04:17 WARN bsp.BspService: process: Unknown and unprocessed event (path=/_hadoopBsp/giraph_yarn_application_1389300168420_0024/_applicationAttemptsDir/0/_superstepDir/2/_superstepFinished, type=NodeDeleted, state=SyncConnected)
...
14/01/10 15:04:17 WARN hdfs.DFSClient: DataStreamer Exception
org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.hdfs.server.namenode.LeaseExpiredException): No lease on /user/spry/Shortest/_temporary/1/_temporary/attempt_1389300168420_0024_m_000001_1/part-m-00001: File does not exist. Holder DFSClient_NONMAPREDUCE_-643344145_1 does not have any open files.
        at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkLease(FSNamesystem.java:2755)
        at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.analyzeFileState(FSNamesystem.java:2567)
        at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:2480)
        at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.addBlock(NameNodeRpcServer.java:555)
        at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.addBlock(ClientNamenodeProtocolServerSideTranslatorPB.java:387)
...

I apologize for the wall of error messages, but I tried to leave in at least some of the parts that might be useful. I put the entire YARN log here: http://tny.cz/af229738

Has anyone ever seen this before? This is the command I'm using to run:

hadoop jar giraph-core/target/giraph-1.1.0-SNAPSHOT-for-hadoop-2.2.0-jar-with-dependencies.jar org.apache.giraph.GiraphRunner -Dgiraph.SplitMasterWorker=false -Dgiraph.zkList="localhost:2181" -Dgiraph.zkSessionMsecTimeout=600000 -Dgiraph.useInputSplitLocality=false org.apache.giraph.examples.SimpleShortestPathsComputation -vif org.apache.giraph.io.formats.JsonLongDoubleFloatDoubleVertexInputFormat -vip /user/spry/input -vof org.apache.giraph.io.formats.IdWithValueTextOutputFormat -op /user/spry/Shortest -w 1

My setup is still the same as the other email if you saw it:

I compiled Giraph with this command, and everything built successfully except "Apache Giraph Distribution" which it doesn't seem like I need:

mvn -Phadoop_yarn -Dhadoop.version=2.2.0 -DskipTests clean package

I am running with the following components:

Single node cluster
Giraph 1.1
Hadoop 2.2.0 (Hortonworks)
Java 1.7.0_45

Thanks in advance,
-Kristen Hardwick

