This looks more like the ZooKeeper/YARN issues mentioned in the past.
Unfortunately, I do not have a YARN instance to test this with. Does
anyone else have any insight here?
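One observation, offered as a guess rather than a diagnosis: in the worker log below, the NullPointerException comes right after "getJobState: Job state path is empty!", and its top frame is the StringReader constructor reached through JSONObject. That is what you would see if the job-state data read from ZooKeeper came back null and was passed straight to the JSON parser. The sketch below uses only the JDK to show that failure mode and one possible null guard. JobStateGuard, openReader, and openReaderOrNull are made-up names for illustration, not Giraph code, and I have not checked what BspService.getJobState actually does at line 716.

```java
import java.io.StringReader;

// Hypothetical illustration only; none of these names exist in Giraph.
final class JobStateGuard {

    // Mirrors the failing pattern: hand whatever was read from the znode
    // directly to a StringReader. Throws NullPointerException if data is null.
    static StringReader openReader(String znodeData) {
        return new StringReader(znodeData);
    }

    // One possible guard: treat a missing or empty znode payload as
    // "no job state available" instead of trying to parse it.
    static StringReader openReaderOrNull(String znodeData) {
        if (znodeData == null || znodeData.isEmpty()) {
            return null;
        }
        return new StringReader(znodeData);
    }

    public static void main(String[] args) {
        // Guarded path: a null payload is reported, not parsed.
        System.out.println("guarded, null payload: " + openReaderOrNull(null));

        // Unguarded path: reproduces the exception type seen in the trace.
        try {
            openReader(null);
        } catch (NullPointerException e) {
            System.out.println("unguarded, null payload: NullPointerException");
        }
    }
}
```

If that reading is right, the exception may just be a symptom of the worker's watcher firing while the job-state znode is being cleaned up at shutdown, which would not by itself explain the LeaseExpiredException further down.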
On 1/10/14 1:48 PM, Kristen Hardwick wrote:
Hi all, I'm requesting help again! I'm trying to get this
SimpleShortestPathsComputation example working, but I'm stuck again.
Now the job begins to run and seems to work until the final step (it
performs 3 supersteps), but the overall job is failing.
In the master, among other things, I see:
...
14/01/10 15:04:17 INFO master.MasterThread: setup: Took 0.87 seconds.
14/01/10 15:04:17 INFO master.MasterThread: input superstep: Took
0.708 seconds.
14/01/10 15:04:17 INFO master.MasterThread: superstep 0: Took 0.158
seconds.
14/01/10 15:04:17 INFO master.MasterThread: superstep 1: Took 0.344
seconds.
14/01/10 15:04:17 INFO master.MasterThread: superstep 2: Took 0.064
seconds.
14/01/10 15:04:17 INFO master.MasterThread: shutdown: Took 0.162 seconds.
14/01/10 15:04:17 INFO master.MasterThread: total: Took 2.31 seconds.
14/01/10 15:04:17 INFO yarn.GiraphYarnTask: Master is ready to commit
final job output data.
14/01/10 15:04:18 INFO yarn.GiraphYarnTask: Master has committed the
final job output data.
...
To me, that looks promising - like the job was successful. However, in
the WORKER_ONLY containers, I see these things:
...
14/01/10 15:04:17 INFO graph.GraphTaskManager: cleanup: Starting for
WORKER_ONLY
14/01/10 15:04:17 WARN bsp.BspService: process: Unknown and
unprocessed event
(path=/_hadoopBsp/giraph_yarn_application_1389300168420_0024/_applicationAttemptsDir/0/_superstepDir/1/_addressesAndPartitions,
type=NodeDeleted, state=SyncConnected)
14/01/10 15:04:17 INFO worker.BspServiceWorker: processEvent :
partitionExchangeChildrenChanged (at least one worker is done sending
partitions)
14/01/10 15:04:17 WARN bsp.BspService: process: Unknown and
unprocessed event
(path=/_hadoopBsp/giraph_yarn_application_1389300168420_0024/_applicationAttemptsDir/0/_superstepDir/1/_superstepFinished,
type=NodeDeleted, state=SyncConnected)
14/01/10 15:04:17 INFO netty.NettyClient: stop: reached wait
threshold, 1 connections closed, releasing NettyClient.bootstrap
resources now.
14/01/10 15:04:17 INFO worker.BspServiceWorker: processEvent: Job
state changed, checking to see if it needs to restart
14/01/10 15:04:17 INFO bsp.BspService: getJobState: Job state already
exists
(/_hadoopBsp/giraph_yarn_application_1389300168420_0024/_masterJobState)
14/01/10 15:04:17 INFO yarn.GiraphYarnTask: [STATUS: task-1]
saveVertices: Starting to save 2 vertices using 1 threads
14/01/10 15:04:17 INFO worker.BspServiceWorker: saveVertices: Starting
to save 2 vertices using 1 threads
14/01/10 15:04:17 INFO worker.BspServiceWorker: processEvent: Job
state changed, checking to see if it needs to restart
14/01/10 15:04:17 INFO bsp.BspService: getJobState: Job state already
exists
(/_hadoopBsp/giraph_yarn_application_1389300168420_0024/_masterJobState)
14/01/10 15:04:17 INFO bsp.BspService: getJobState: Job state path is
empty! -
/_hadoopBsp/giraph_yarn_application_1389300168420_0024/_masterJobState
14/01/10 15:04:17 ERROR zookeeper.ClientCnxn: Error while calling watcher
java.lang.NullPointerException
at java.io.StringReader.<init>(StringReader.java:50)
at org.json.JSONTokener.<init>(JSONTokener.java:66)
at org.json.JSONObject.<init>(JSONObject.java:402)
at org.apache.giraph.bsp.BspService.getJobState(BspService.java:716)
at org.apache.giraph.worker.BspServiceWorker.processEvent(BspServiceWorker.java:1563)
at org.apache.giraph.bsp.BspService.process(BspService.java:1095)
at org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:519)
at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:495)
14/01/10 15:04:17 WARN bsp.BspService: process: Unknown and
unprocessed event
(path=/_hadoopBsp/giraph_yarn_application_1389300168420_0024/_vertexInputSplitsAllReady,
type=NodeDeleted, state=SyncConnected)
14/01/10 15:04:17 WARN bsp.BspService: process: Unknown and
unprocessed event
(path=/_hadoopBsp/giraph_yarn_application_1389300168420_0024/_applicationAttemptsDir/0/_superstepDir/2/_addressesAndPartitions,
type=NodeDeleted, state=SyncConnected)
14/01/10 15:04:17 INFO worker.BspServiceWorker: processEvent :
partitionExchangeChildrenChanged (at least one worker is done sending
partitions)
14/01/10 15:04:17 WARN bsp.BspService: process: Unknown and
unprocessed event
(path=/_hadoopBsp/giraph_yarn_application_1389300168420_0024/_applicationAttemptsDir/0/_superstepDir/2/_superstepFinished,
type=NodeDeleted, state=SyncConnected)
...
14/01/10 15:04:17 WARN hdfs.DFSClient: DataStreamer Exception
org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.hdfs.server.namenode.LeaseExpiredException):
No lease on
/user/spry/Shortest/_temporary/1/_temporary/attempt_1389300168420_0024_m_000001_1/part-m-00001:
File does not exist. Holder DFSClient_NONMAPREDUCE_-643344145_1 does
not have any open files.
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkLease(FSNamesystem.java:2755)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.analyzeFileState(FSNamesystem.java:2567)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:2480)
at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.addBlock(NameNodeRpcServer.java:555)
at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.addBlock(ClientNamenodeProtocolServerSideTranslatorPB.java:387)
...
I apologize for the wall of error messages, but I tried to leave in at
least some of the parts that might be useful. I put the entire YARN
log here: http://tny.cz/af229738
Has anyone ever seen this before? This is the command I'm using to run:
hadoop jar giraph-core/target/giraph-1.1.0-SNAPSHOT-for-hadoop-2.2.0-jar-with-dependencies.jar \
  org.apache.giraph.GiraphRunner \
  -Dgiraph.SplitMasterWorker=false \
  -Dgiraph.zkList="localhost:2181" \
  -Dgiraph.zkSessionMsecTimeout=600000 \
  -Dgiraph.useInputSplitLocality=false \
  org.apache.giraph.examples.SimpleShortestPathsComputation \
  -vif org.apache.giraph.io.formats.JsonLongDoubleFloatDoubleVertexInputFormat \
  -vip /user/spry/input \
  -vof org.apache.giraph.io.formats.IdWithValueTextOutputFormat \
  -op /user/spry/Shortest \
  -w 1
My setup is still the same as in my other email, if you saw it.
I compiled Giraph with this command, and everything built successfully
except "Apache Giraph Distribution", which I don't seem to need:
mvn -Phadoop_yarn -Dhadoop.version=2.2.0 -DskipTests clean package
I am running with the following components:
Single node cluster
Giraph 1.1
Hadoop 2.2.0 (Hortonworks)
Java 1.7.0_45
Thanks in advance,
-Kristen Hardwick