Hi Avery (or anyone else who knows),

Could you please give me some details that would help me find the earlier
threads that might address this issue? I searched Google with various
combinations of "giraph datastreamer exception yarn lease expired
zookeeper" and didn't find anything that seemed relevant.

Is it possible that this is just a memory issue on my end? I'm running inside
a VM: a single-node cluster with 8 GB of memory allocated to it. Could
that have anything to do with it? Right now I'm reading through the code to
find out how to lower the amount of memory allocated to the containers.
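For reference, here is a sketch of the same job with a smaller per-task heap. It assumes that giraph.yarn.task.heap.mb is the Giraph 1.1 property that sets the heap (in MB) requested for each YARN task container, and that its default is 1024 MB; both are assumptions worth checking against GiraphConstants. The 512 MB value is a hypothetical starting point for an 8 GB VM, not a tested setting. By default the script only prints the command; set RUN=1 to launch it.

```shell
#!/usr/bin/env bash
# Sketch: rerun the same Giraph job with a smaller per-container heap.
# ASSUMPTION: giraph.yarn.task.heap.mb is the property controlling the heap
# (in MB) requested for each YARN task container in Giraph 1.1.
set -euo pipefail

HEAP_MB="${HEAP_MB:-512}"   # hypothetical value; tune for the 8 GB VM
JAR=giraph-core/target/giraph-1.1.0-SNAPSHOT-for-hadoop-2.2.0-jar-with-dependencies.jar

# Same arguments as the original command, plus the (assumed) heap property.
cmd=(hadoop jar "$JAR" org.apache.giraph.GiraphRunner
  -Dgiraph.SplitMasterWorker=false
  -Dgiraph.zkList="localhost:2181"
  -Dgiraph.zkSessionMsecTimeout=600000
  -Dgiraph.useInputSplitLocality=false
  -Dgiraph.yarn.task.heap.mb="$HEAP_MB"
  org.apache.giraph.examples.SimpleShortestPathsComputation
  -vif org.apache.giraph.io.formats.JsonLongDoubleFloatDoubleVertexInputFormat
  -vip /user/spry/input
  -vof org.apache.giraph.io.formats.IdWithValueTextOutputFormat
  -op /user/spry/Shortest
  -w 1)

# Print the command by default; set RUN=1 to actually launch it.
if [[ "${RUN:-0}" == "1" ]]; then
  "${cmd[@]}"
else
  printf '%s\n' "${cmd[*]}"
fi
```

Running it as `HEAP_MB=768 RUN=1 ./rerun.sh` (the script name is hypothetical) would launch the job with a 768 MB heap instead.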

Thanks,
Kristen


On Fri, Jan 10, 2014 at 8:45 PM, Avery Ching <ach...@apache.org> wrote:

> This looks more like the ZooKeeper/YARN issues mentioned in the past.
> Unfortunately, I do not have a YARN instance to test this with.  Does
> anyone else have any insights here?
>
>
> On 1/10/14 1:48 PM, Kristen Hardwick wrote:
>
> Hi all, I'm requesting help again! I'm trying to get the
> SimpleShortestPathsComputation example working, but I'm stuck again. The
> job now starts and seems to run correctly through the final step (it
> performs 3 supersteps), but the overall job fails.
>
>  In the master, among other things, I see:
>
> ...
> 14/01/10 15:04:17 INFO master.MasterThread: setup: Took 0.87 seconds.
> 14/01/10 15:04:17 INFO master.MasterThread: input superstep: Took 0.708 seconds.
> 14/01/10 15:04:17 INFO master.MasterThread: superstep 0: Took 0.158 seconds.
> 14/01/10 15:04:17 INFO master.MasterThread: superstep 1: Took 0.344 seconds.
> 14/01/10 15:04:17 INFO master.MasterThread: superstep 2: Took 0.064 seconds.
> 14/01/10 15:04:17 INFO master.MasterThread: shutdown: Took 0.162 seconds.
> 14/01/10 15:04:17 INFO master.MasterThread: total: Took 2.31 seconds.
> 14/01/10 15:04:17 INFO yarn.GiraphYarnTask: Master is ready to commit final job output data.
> 14/01/10 15:04:18 INFO yarn.GiraphYarnTask: Master has committed the final job output data.
> ...
>
> To me, that looks promising, as if the job succeeded. However, in the
> WORKER_ONLY containers, I see this:
>
> ...
> 14/01/10 15:04:17 INFO graph.GraphTaskManager: cleanup: Starting for WORKER_ONLY
> 14/01/10 15:04:17 WARN bsp.BspService: process: Unknown and unprocessed event (path=/_hadoopBsp/giraph_yarn_application_1389300168420_0024/_applicationAttemptsDir/0/_superstepDir/1/_addressesAndPartitions, type=NodeDeleted, state=SyncConnected)
> 14/01/10 15:04:17 INFO worker.BspServiceWorker: processEvent : partitionExchangeChildrenChanged (at least one worker is done sending partitions)
> 14/01/10 15:04:17 WARN bsp.BspService: process: Unknown and unprocessed event (path=/_hadoopBsp/giraph_yarn_application_1389300168420_0024/_applicationAttemptsDir/0/_superstepDir/1/_superstepFinished, type=NodeDeleted, state=SyncConnected)
> 14/01/10 15:04:17 INFO netty.NettyClient: stop: reached wait threshold, 1 connections closed, releasing NettyClient.bootstrap resources now.
> 14/01/10 15:04:17 INFO worker.BspServiceWorker: processEvent: Job state changed, checking to see if it needs to restart
> 14/01/10 15:04:17 INFO bsp.BspService: getJobState: Job state already exists (/_hadoopBsp/giraph_yarn_application_1389300168420_0024/_masterJobState)
> 14/01/10 15:04:17 INFO yarn.GiraphYarnTask: [STATUS: task-1] saveVertices: Starting to save 2 vertices using 1 threads
> 14/01/10 15:04:17 INFO worker.BspServiceWorker: saveVertices: Starting to save 2 vertices using 1 threads
> 14/01/10 15:04:17 INFO worker.BspServiceWorker: processEvent: Job state changed, checking to see if it needs to restart
> 14/01/10 15:04:17 INFO bsp.BspService: getJobState: Job state already exists (/_hadoopBsp/giraph_yarn_application_1389300168420_0024/_masterJobState)
> 14/01/10 15:04:17 INFO bsp.BspService: getJobState: Job state path is empty! - /_hadoopBsp/giraph_yarn_application_1389300168420_0024/_masterJobState
> 14/01/10 15:04:17 ERROR zookeeper.ClientCnxn: Error while calling watcher
> java.lang.NullPointerException
>         at java.io.StringReader.<init>(StringReader.java:50)
>         at org.json.JSONTokener.<init>(JSONTokener.java:66)
>         at org.json.JSONObject.<init>(JSONObject.java:402)
>         at org.apache.giraph.bsp.BspService.getJobState(BspService.java:716)
>         at org.apache.giraph.worker.BspServiceWorker.processEvent(BspServiceWorker.java:1563)
>         at org.apache.giraph.bsp.BspService.process(BspService.java:1095)
>         at org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:519)
>         at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:495)
> 14/01/10 15:04:17 WARN bsp.BspService: process: Unknown and unprocessed event (path=/_hadoopBsp/giraph_yarn_application_1389300168420_0024/_vertexInputSplitsAllReady, type=NodeDeleted, state=SyncConnected)
> 14/01/10 15:04:17 WARN bsp.BspService: process: Unknown and unprocessed event (path=/_hadoopBsp/giraph_yarn_application_1389300168420_0024/_applicationAttemptsDir/0/_superstepDir/2/_addressesAndPartitions, type=NodeDeleted, state=SyncConnected)
> 14/01/10 15:04:17 INFO worker.BspServiceWorker: processEvent : partitionExchangeChildrenChanged (at least one worker is done sending partitions)
> 14/01/10 15:04:17 WARN bsp.BspService: process: Unknown and unprocessed event (path=/_hadoopBsp/giraph_yarn_application_1389300168420_0024/_applicationAttemptsDir/0/_superstepDir/2/_superstepFinished, type=NodeDeleted, state=SyncConnected)
> ...
> 14/01/10 15:04:17 WARN hdfs.DFSClient: DataStreamer Exception
> org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.hdfs.server.namenode.LeaseExpiredException): No lease on /user/spry/Shortest/_temporary/1/_temporary/attempt_1389300168420_0024_m_000001_1/part-m-00001: File does not exist. Holder DFSClient_NONMAPREDUCE_-643344145_1 does not have any open files.
>         at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkLease(FSNamesystem.java:2755)
>         at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.analyzeFileState(FSNamesystem.java:2567)
>         at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:2480)
>         at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.addBlock(NameNodeRpcServer.java:555)
>         at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.addBlock(ClientNamenodeProtocolServerSideTranslatorPB.java:387)
> ...
>
> I apologize for the wall of error messages; I tried to keep at least the
> parts that might be useful. The entire YARN log is here:
> http://tny.cz/af229738
>
> Has anyone seen this before? This is the command I'm using to run the job:
>
> hadoop jar giraph-core/target/giraph-1.1.0-SNAPSHOT-for-hadoop-2.2.0-jar-with-dependencies.jar \
>   org.apache.giraph.GiraphRunner \
>   -Dgiraph.SplitMasterWorker=false \
>   -Dgiraph.zkList="localhost:2181" \
>   -Dgiraph.zkSessionMsecTimeout=600000 \
>   -Dgiraph.useInputSplitLocality=false \
>   org.apache.giraph.examples.SimpleShortestPathsComputation \
>   -vif org.apache.giraph.io.formats.JsonLongDoubleFloatDoubleVertexInputFormat \
>   -vip /user/spry/input \
>   -vof org.apache.giraph.io.formats.IdWithValueTextOutputFormat \
>   -op /user/spry/Shortest \
>   -w 1
>
> My setup is the same as in my previous email, if you saw it:
>
> I compiled Giraph with this command, and everything built successfully
> except the "Apache Giraph Distribution" module, which I don't think I need:
>
> mvn -Phadoop_yarn -Dhadoop.version=2.2.0 -DskipTests clean package
>
> I am running with the following components:
>
>   Single-node cluster
>   Giraph 1.1
>   Hadoop 2.2.0 (Hortonworks)
>   Java 1.7.0_45
>
> Thanks in advance,
> -Kristen Hardwick
>
>
>
